# String Extensions
This chapter covers ES6 improvements and enhancements to strings. The next chapter covers new methods added to the String object.
# Unicode Representation of Characters
ES6 strengthened Unicode support. JavaScript allows a character to be written in the \uxxxx form, where xxxx is the character's Unicode code point in hexadecimal.
"\u0061"
// "a"
However, this notation only covers characters with code points between \u0000 and \uFFFF. A character outside that range must be written as two double-byte units (a surrogate pair).
"\uD842\uDFB7"
// "𠮷"
"\u20BB7"
// " 7"
The code above shows that if a value greater than 0xFFFF (such as \u20BB7) is placed directly after \u, JavaScript interprets it as \u20BB followed by 7. Many fonts have no glyph for \u20BB, so it typically shows up as a blank or a placeholder, followed by a 7.
ES6 improved this by allowing code points to be placed inside curly braces for correct interpretation.
"\u{20BB7}"
// "𠮷"
"\u{41}\u{42}\u{43}"
// "ABC"
let hello = 123;
hell\u{6F} // 123
'\u{1F680}' === '\uD83D\uDE80'
// true
In the code above, the last example demonstrates that the curly brace notation is equivalent to the four-byte UTF-16 encoding.
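The equivalence between '\u{1F680}' and '\uD83D\uDE80' follows from the UTF-16 surrogate-pair arithmetic. The sketch below is only an illustration of that arithmetic; the helper name toSurrogatePair is hypothetical and not part of the language.

```javascript
// Hypothetical helper: split a code point above 0xFFFF into its
// UTF-16 high and low surrogates.
function toSurrogatePair(codePoint) {
  if (codePoint <= 0xFFFF || codePoint > 0x10FFFF) {
    throw new RangeError('expected a code point in 0x10000..0x10FFFF');
  }
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1F680);
console.log(high.toString(16), low.toString(16)); // d83d de80
console.log(String.fromCharCode(high, low) === '\u{1F680}'); // true
```

The same calculation applied to 0x20BB7 gives 0xD842 and 0xDFB7, which matches the "\uD842\uDFB7" form shown earlier.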
With the curly-brace notation, JavaScript has six ways to write a character: the literal character itself and the five escape forms below.
'\z' === 'z' // true
'\172' === 'z' // true
'\x7A' === 'z' // true
'\u007A' === 'z' // true
'\u{7A}' === 'z' // true
# String Iterator Interface
ES6 added an iterator interface to strings (see the Iterator chapter for details), enabling strings to be traversed with for...of loops.
for (let codePoint of 'foo') {
console.log(codePoint)
}
// "f"
// "o"
// "o"
Beyond traversing strings, the biggest advantage of this iterator is its ability to recognize code points greater than 0xFFFF, which the traditional for loop cannot handle.
let text = String.fromCodePoint(0x20BB7);
for (let i = 0; i < text.length; i++) {
console.log(text[i]);
}
// " "
// " "
for (let i of text) {
console.log(i);
}
// "𠮷"
In the code above, the string text contains only one character, but the for loop treats it as two characters (both non-printable), while the for...of loop correctly recognizes it as a single character.
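Because the string iterator walks code points rather than UTF-16 code units, it also gives a simple way to count the characters in a string. A minimal sketch, where the sample string is chosen purely for illustration:

```javascript
const text = 'a\u{20BB7}b'; // "a𠮷b"

console.log(text.length);      // 4 (UTF-16 code units)
console.log([...text].length); // 3 (code points, via the string iterator)

// Collect the code point of each iterated character as hex.
const codePoints = [];
for (const ch of text) {
  codePoints.push(ch.codePointAt(0).toString(16));
}
console.log(codePoints); // ['61', '20bb7', '62']
```

The spread syntax uses the same iterator interface as for...of, which is why [...text].length reports 3.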
# Direct Input of U+2028 and U+2029
JavaScript strings allow direct input of characters as well as their escape forms. For example, the Unicode code point for the Chinese character "中" is U+4e2d. You can directly input this character in a string, or use its escape form \u4e2d — both are equivalent.
'中' === '\u4e2d' // true
However, JavaScript specifies 5 characters that cannot be used directly in strings and must use their escape forms instead.
- U+005C: reverse solidus
- U+000D: carriage return
- U+2028: line separator
- U+2029: paragraph separator
- U+000A: line feed
For example, a string cannot directly contain a backslash — it must be escaped as \\ or \u005c.
This rule itself is not problematic. The issue is that the JSON format allows strings to directly contain U+2028 (line separator) and U+2029 (paragraph separator). As a result, JSON text that JSON.parse accepts could still throw a SyntaxError when it was evaluated as JavaScript source, for example with eval.
const json = '"\u2028"';
JSON.parse(json); // fine: JSON allows a raw U+2028 inside a string
eval(json); // SyntaxError in engines that predate ES2019
Since the JSON format is frozen (RFC 7159) and cannot be modified, ES2019 allows JavaScript strings to directly contain U+2028 (line separator) and U+2029 (paragraph separator), which eliminates this error.
const PS = eval("'\u2029'");
According to this proposal, the code above will not throw an error.
Note that template strings already allow direct input of these two characters. Additionally, regular expressions still do not allow direct input of these two characters, which is not an issue since JSON was never intended to directly contain regular expressions.
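The following sketch checks that behavior in an engine that implements ES2019. The variable names are illustrative, and the eval call is expected to throw a SyntaxError in older engines.

```javascript
const LS = '\u2028'; // line separator
const PS = '\u2029'; // paragraph separator

// JSON text whose string value contains the raw characters (no escapes).
const jsonText = '"' + LS + PS + '"';

// JSON.parse accepts the raw characters.
const parsed = JSON.parse(jsonText);

// In an ES2019-compliant engine the same text is also a valid
// JavaScript string literal, so evaluating it as source succeeds.
const evaluated = eval(jsonText);

console.log(parsed === evaluated); // true on ES2019+ engines
console.log(parsed.length);        // 2
```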
# Improvements to JSON.stringify()
According to the standard, JSON data must be UTF-8 encoded. However, the current JSON.stringify() method may return strings that do not conform to the UTF-8 standard.
Specifically, the UTF-8 standard stipulates that code points between 0xD800 and 0xDFFF cannot be used alone; they must be used in pairs. For example, \uD834\uDF06 consists of two code points that must be paired together to represent the character 𝌆. This is a workaround for representing characters with code points greater than 0xFFFF. Using \uD834 or \uDF06 individually is invalid, and reversing the order does not work either, because \uDF06\uD834 does not correspond to any character.
The problem with JSON.stringify() is that it may return individual code points between 0xD800 and 0xDFFF.
JSON.stringify('\u{D834}') // "\u{D834}"
To ensure that valid UTF-8 characters are returned, ES2019 changed the behavior of JSON.stringify(). When it encounters individual code points between 0xD800 and 0xDFFF, or invalid surrogate pairs, it returns escape strings, leaving it up to the application to decide how to handle them.
JSON.stringify('\u{D834}') // '"\\ud834"'
JSON.stringify('\uDF06\uD834') // '"\\udf06\\ud834"'
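The ES2019 behavior can be checked with a few comparisons. This is a small sketch that assumes an ES2019-compliant engine; the variable names are illustrative.

```javascript
const pair = '\uD834\uDF06';     // valid surrogate pair
const lone = '\uD834';           // lone high surrogate
const reversed = '\uDF06\uD834'; // low surrogate before high: invalid pair

console.log(JSON.stringify(pair) === '"\uD834\uDF06"'); // true: emitted as-is
console.log(JSON.stringify(lone));     // "\ud834" (an escape sequence)
console.log(JSON.stringify(reversed)); // "\udf06\ud834"

// The escaped output still round-trips through JSON.parse.
console.log(JSON.parse(JSON.stringify(lone)) === lone); // true
```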
# Template Strings
In traditional JavaScript, output templates were typically written like this (the example below uses jQuery methods).
$('#result').append(
'There are <b>' + basket.count + '</b> ' +
'items in your basket, ' +
'<em>' + basket.onSale +
'</em> are on sale!'
);
This approach is quite cumbersome. ES6 introduced template strings to solve this problem.
$('#result').append(`
There are <b>${basket.count}</b> items
in your basket, <em>${basket.onSale}</em>
are on sale!
`);
Template strings are enhanced strings, delimited by backticks (`). They can be used as regular strings, to define multi-line strings, or to embed variables within strings.
// Regular string
`In JavaScript '\n' is a line-feed.`
// Multi-line string
`In JavaScript this is
not legal.`
console.log(`string text line 1
string text line 2`);
// Embedding variables in strings
let name = "Bob", time = "today";
`Hello ${name}, how are you ${time}?`
The template strings in the code above all use backticks. If you need to use a backtick within a template string, escape it with a backslash.
let greeting = `\`Yo\` World!`;
If you use a template string to represent multi-line strings, all spaces and indentation will be preserved in the output.
$('#list').html(`
<ul>
<li>first</li>
<li>second</li>
</ul>
`);
In the code above, all spaces and newlines in the template string are preserved. For example, there will be a newline before the <ul> tag. If you don't want this newline, you can use the trim method to remove it.
$('#list').html(`
<ul>
<li>first</li>
<li>second</li>
</ul>
`.trim());
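The effect of trim on a multi-line template string can be seen without jQuery. A minimal sketch with a shortened list, where the variable name html is illustrative:

```javascript
const html = `
<ul>
  <li>first</li>
</ul>
`;

console.log(html.startsWith('\n'));          // true: newline after the opening backtick
console.log(html.endsWith('\n'));            // true: newline before the closing backtick
console.log(html.trim().startsWith('<ul>')); // true
console.log(html.trim().endsWith('</ul>'));  // true
```

Note that trim only removes whitespace at the two ends; the indentation inside the string is kept.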
To embed variables in a template string, write the variable name inside ${}.
function authorize(user, action) {
if (!user.hasPrivilege(action)) {
throw new Error(
// Traditional approach:
// 'User '
// + user.name
// + ' is not authorized to do '
// + action
// + '.'
`User ${user.name} is not authorized to do ${action}.`);
}
}
Inside the curly braces, you can place any JavaScript expression, perform calculations, and reference object properties.
let x = 1;
let y = 2;
`${x} + ${y} = ${x + y}`
// "1 + 2 = 3"
`${x} + ${y * 2} = ${x + y * 2}`
// "1 + 4 = 5"
let obj = {x: 1, y: 2};
`${obj.x + obj.y}`
// "3"
You can also call functions within template strings.
function fn() {
return "Hello World";
}
`foo ${fn()} bar`
// foo Hello World bar
If the value inside the curly braces is not a string, it will be converted to a string following the standard rules. For example, if the curly braces contain an object, the object's toString method will be called by default.
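The conversion rule can be illustrated with a few non-string values. In this sketch the point object and its toString method are made up for the example.

```javascript
const point = {
  x: 1,
  y: 2,
  // Custom toString, picked up by the template string conversion.
  toString() {
    return `(${this.x}, ${this.y})`;
  },
};

console.log(`${point}`);     // "(1, 2)"
console.log(`${{}}`);        // "[object Object]"
console.log(`${[1, 2, 3]}`); // "1,2,3"
console.log(`${null} ${undefined} ${42}`); // "null undefined 42"
```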
If a variable in a template string has not been declared, an error will be thrown.
// Variable place has not been declared
let msg = `Hello, ${place}`;
// Error
Since the content inside the curly braces of a template string is JavaScript code, if the curly braces contain a string, it will be output as-is.
`Hello ${'World'}`
// "Hello World"
Template strings can even be nested.
const tmpl = addrs => `
<table>
${addrs.map(addr => `
<tr><td>${addr.first}</td></tr>
<tr><td>${addr.last}</td></tr>
`).join('')}
</table>
`;
In the code above, another template string is nested inside an interpolation (${...}) of the outer template string. Here is how to use it.
const data = [
{ first: '<Jane>', last: 'Bond' },
{ first: 'Lars', last: '<Croft>' },
];
console.log(tmpl(data));
// <table>
//
// <tr><td><Jane></td></tr>
// <tr><td>Bond</td></tr>
//
// <tr><td>Lars</td></tr>
// <tr><td><Croft></td></tr>
//
// </table>
If you need to reference the template string itself and execute it when needed, you can write it as a function.
let func = (name) => `Hello ${name}!`;
func('Jack') // "Hello Jack!"
In the code above, the template string is written as the return value of a function. Executing this function is equivalent to executing the template string.
# Example: Template Compilation
Let's look at an example of generating a formal template using template strings.
let template = `
<ul>
<% for(let i=0; i < data.supplies.length; i++) { %>
<li><%= data.supplies[i] %></li>
<% } %>
</ul>
`;
The code above places a regular template inside a template string. The template uses <%...%> to place JavaScript code and <%= ... %> to output JavaScript expressions.
How do we compile this template string?
One approach is to convert it into a JavaScript expression string.
echo('<ul>');
for(let i=0; i < data.supplies.length; i++) {
echo('<li>');
echo(data.supplies[i]);
echo('</li>');
};
echo('</ul>');
This conversion can be done using regular expressions.
let evalExpr = /<%=(.+?)%>/g;
let expr = /<%([\s\S]+?)%>/g;
template = template
.replace(evalExpr, '`); \n echo( $1 ); \n echo(`')
.replace(expr, '`); \n $1 \n echo(`');
template = 'echo(`' + template + '`);';
Then, wrap template inside a function and return it.
let script =
`(function parse(data){
let output = "";
function echo(html){
output += html;
}
${ template }
return output;
})`;
return script;
Assemble the above into a template compilation function called compile.
function compile(template){
const evalExpr = /<%=(.+?)%>/g;
const expr = /<%([\s\S]+?)%>/g;
template = template
.replace(evalExpr, '`); \n echo( $1 ); \n echo(`')
.replace(expr, '`); \n $1 \n echo(`');
template = 'echo(`' + template + '`);';
let script =
`(function parse(data){
let output = "";
function echo(html){
output += html;
}
${ template }
return output;
})`;
return script;
}
The compile function is used as follows.
let parse = eval(compile(template));
div.innerHTML = parse({ supplies: [ "broom", "mop", "cleaner" ] });
// <ul>
// <li>broom</li>
// <li>mop</li>
// <li>cleaner</li>
// </ul>
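The usage above depends on a div element from the page. As a sketch that runs without a DOM, the block below repeats compile from the text so that it is self-contained and checks the generated HTML as a plain string. The names listTemplate and render are illustrative.

```javascript
// compile is reproduced from the text above so this block runs on its own.
function compile(template) {
  const evalExpr = /<%=(.+?)%>/g;
  const expr = /<%([\s\S]+?)%>/g;

  template = template
    .replace(evalExpr, '`); \n echo( $1 ); \n echo(`')
    .replace(expr, '`); \n $1 \n echo(`');

  template = 'echo(`' + template + '`);';

  return `(function parse(data){
    let output = "";
    function echo(html){
      output += html;
    }
    ${template}
    return output;
  })`;
}

const listTemplate = `
<ul>
  <% for(let i=0; i < data.supplies.length; i++) { %>
    <li><%= data.supplies[i] %></li>
  <% } %>
</ul>
`;

// eval turns the generated source text into a callable function.
const render = eval(compile(listTemplate));
const html = render({ supplies: ['broom', 'mop', 'cleaner'] });

console.log(html.includes('<li>broom</li>')); // true
console.log(html.match(/<li>/g).length);      // 3
```

Because the compiled template is executed with eval, this approach is only reasonable for templates you control; template text from an untrusted source would run as arbitrary code.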
# Tagged Templates
The capabilities of template strings go beyond what was described above. A template string can be placed immediately after a function name, and that function will be called to process the template string. This is known as the "tagged template" feature.
alert`123`
// equivalent to
alert(['123'])
A tagged template is not actually a template — it is a special form of function invocation. The "tag" refers to the function, and the template string that follows is its argument.
However, if the template string contains variables, it is not a simple invocation. Instead, the template string is first processed into multiple arguments before the function is called.
let a = 5;
let b = 10;
tag`Hello ${ a + b } world ${ a * b }`;
// equivalent to
tag(['Hello ', ' world ', ''], 15, 50);
In the code above, the template string is preceded by the identifier tag, which is a function. The return value of the entire expression is the return value of the tag function after processing the template string.
The tag function receives multiple arguments in sequence.
function tag(stringArr, value1, value2){
// ...
}
// equivalent to
function tag(stringArr, ...values){
// ...
}
The first argument of the tag function is an array whose members are the parts of the template string that do not contain variable substitutions. In other words, variable substitution only occurs between the first and second members, between the second and third members, and so on.
The remaining arguments of the tag function are the values of the template string variables after substitution. In this example, since the template string contains two variables, tag receives value1 and value2 as arguments.
The actual values of all arguments to the tag function are as follows.
- First argument: ['Hello ', ' world ', '']
- Second argument: 15
- Third argument: 50
In other words, the tag function is effectively called as follows.
tag(['Hello ', ' world ', ''], 15, 50)
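One consequence of this calling convention is that the string array always has exactly one more member than there are substitution values. The sketch below makes that visible with a hypothetical tag named inspect, which only reports what it receives.

```javascript
// Hypothetical tag that just reports what it receives.
function inspect(strings, ...values) {
  return { strings: [...strings], values };
}

const a = 5;
const b = 10;

const result = inspect`Hello ${a + b} world ${a * b}`;
console.log(result.strings); // ['Hello ', ' world ', '']
console.log(result.values);  // [15, 50]
console.log(result.strings.length === result.values.length + 1); // true

// A substitution at the very start produces a leading empty string.
console.log(inspect`${a}!`.strings); // ['', '!']
```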
We can write the code for the tag function as needed. Below is one implementation of the tag function, along with the output.
let a = 5;
let b = 10;
function tag(s, v1, v2) {
console.log(s[0]);
console.log(s[1]);
console.log(s[2]);
console.log(v1);
console.log(v2);
return "OK";
}
tag`Hello ${ a + b } world ${ a * b}`;
// "Hello "
// " world "
// ""
// 15
// 50
// "OK"
Here is a more complex example.
let total = 30;
let msg = passthru`The total is ${total} (${total*1.05} with tax)`;
function passthru(literals) {
let result = '';
let i = 0;
while (i < literals.length) {
result += literals[i++];
if (i < arguments.length) {
result += arguments[i];
}
}
return result;
}
msg // "The total is 30 (31.5 with tax)"
The example above demonstrates how to reassemble all the arguments back into their original positions.
Here is the passthru function rewritten using rest parameters.
function passthru(literals, ...values) {
let output = "";
let index;
for (index = 0; index < values.length; index++) {
output += literals[index] + values[index];
}
output += literals[index];
return output;
}
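The same reassembly can be written more compactly. The variant below is a sketch rather than a replacement for passthru: the name interleave is hypothetical, and it relies on reduce being called without an initial value.

```javascript
// Hypothetical variant of passthru built on Array.prototype.reduce.
// With no initial value, reduce starts from literals[0] at index 1,
// so values[i - 1] is the substitution that precedes literals[i].
function interleave(literals, ...values) {
  return literals.reduce((out, literal, i) => out + values[i - 1] + literal);
}

const total = 30;
console.log(interleave`The total is ${total} (${total * 2} doubled)`);
// "The total is 30 (60 doubled)"
```

A template with no substitutions has a single literal, and reduce then returns that literal unchanged.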
An important application of tagged templates is filtering HTML strings to prevent users from inputting malicious content.
let message =
SaferHTML`<p>${sender} has sent you a message.</p>`;
function SaferHTML(templateData) {
let s = templateData[0];
for (let i = 1; i < arguments.length; i++) {
let arg = String(arguments[i]);
// Escape special characters in the substitution.
s += arg.replace(/&/g, "&amp;")
.replace(/</g, "&lt;")
.replace(/>/g, "&gt;");
// Don't escape special characters in the template.
s += templateData[i];
}
return s;
}
In the code above, the sender variable is often user-provided. After processing by the SaferHTML function, all special characters within it will be escaped.
let sender = '<script>alert("abc")</script>'; // malicious code
let message = SaferHTML`<p>${sender} has sent you a message.</p>`;
message
// <p>&lt;script&gt;alert("abc")&lt;/script&gt; has sent you a message.</p>
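When a substitution can end up inside a quoted attribute value, quotation marks need escaping as well. The following sketch extends the idea under that assumption; escapeHtml and safeHtml are hypothetical names, and this is not a complete HTML sanitizer.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML
// text and in quoted attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical tag: literal parts pass through, substitutions are escaped.
function safeHtml(strings, ...values) {
  return strings.reduce(
    (out, literal, i) => out + escapeHtml(values[i - 1]) + literal
  );
}

const sender = '<script>alert("abc")</script>';
console.log(safeHtml`<p title="${sender}">${sender} has sent you a message.</p>`);
```

The ampersand is replaced first so that the entities produced by the later replacements are not escaped a second time.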
Another application of tagged templates is multilingual translation (internationalization).
i18n`Welcome to ${siteName}, you are visitor number ${visitorNumber}!`
// "Welcome to xxx, you are visitor number xxxx!"
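The text does not show how such an i18n tag is implemented. One possible shape, offered only as a sketch, is to turn the template into a lookup key with numbered placeholders and then fill the values into the translated message. The messages catalog, its key format, and the German sample text are all assumptions made for this example.

```javascript
// Hypothetical message catalog keyed by the template with {n} placeholders.
const messages = {
  'Welcome to {0}, you are visitor number {1}!':
    'Willkommen bei {0}, Sie sind Besucher Nummer {1}!',
};

// Hypothetical tag: look up a translation, then fill in the values.
function i18n(strings, ...values) {
  // Rebuild a lookup key such as "Welcome to {0}, you are ...".
  const key = strings.reduce((out, literal, i) => `${out}{${i - 1}}${literal}`);
  // Fall back to the source text when no translation exists.
  const translated = messages[key] || key;
  return translated.replace(/\{(\d+)\}/g, (_, n) => String(values[Number(n)]));
}

const siteName = 'example.com';
const visitorNumber = 1234;
console.log(i18n`Welcome to ${siteName}, you are visitor number ${visitorNumber}!`);
// "Willkommen bei example.com, Sie sind Besucher Nummer 1234!"
```

Numbered placeholders let a translation reorder the values, which plain string concatenation cannot do.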
Template strings alone cannot replace template libraries like Mustache, as they lack conditional logic and loop processing. However, through tag functions, you can add these features yourself.
// The hashTemplate function below
// is a custom template processing function
let libraryHtml = hashTemplate`
<ul>
#for book in ${myBooks}
<li><i>#{book.title}</i> by #{book.author}</li>
#end
</ul>
`;
Beyond that, you can even use tagged templates to embed other languages within JavaScript.
jsx`
<div>
<input
ref='input'
onChange='${this.handleChange}'
defaultValue='${this.state.value}' />
${this.state.value}
</div>
`
The code above uses a jsx function to convert a DOM string into a React object. You can find the specific implementation of the jsx function on GitHub.
Below is a hypothetical example that uses a java function to run Java code within JavaScript code.
java`
class HelloWorldApp {
public static void main(String[] args) {
System.out.println("Hello World!"); // Display the string.
}
}
`
HelloWorldApp.main();
The first argument to the template processing function (the template string array) also has a raw property.
console.log`123`
// ["123", raw: Array[1]]
In the code above, the argument received by console.log is actually an array. This array has a raw property that stores the original string before escape processing.
Consider the following example.
tag`First line\nSecond line`
function tag(strings) {
console.log(strings.raw[0]);
// strings.raw[0] is "First line\\nSecond line"
// prints "First line\nSecond line"
}
In the code above, the first argument strings of the tag function has a raw property, which is also an array. Its members correspond one-to-one to those of the strings array, except that escape sequences are left unprocessed. For example, if the strings array is ["First line\nSecond line"], then the strings.raw array is ["First line\\nSecond line"]: strings.raw treats \n as the two characters \ and n rather than as a newline character. This makes it easy to retrieve the original template text before escape processing.
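The difference between the processed and raw forms can be checked directly. In this sketch the tag name both is hypothetical; String.raw is the built-in tag that returns the raw text.

```javascript
// Hypothetical tag returning the processed and raw form of the first part.
function both(strings) {
  return { cooked: strings[0], raw: strings.raw[0] };
}

const { cooked, raw } = both`First line\nSecond line`;

console.log(cooked.includes('\n'));      // true: \n became a real newline
console.log(raw.includes('\n'));         // false
console.log(raw.includes('\\n'));        // true: backslash followed by n
console.log(raw.length - cooked.length); // 1

// The built-in String.raw tag returns the raw text directly.
console.log(String.raw`First line\nSecond line` === raw); // true
```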
# Limitations of Template Strings
As mentioned earlier, tagged templates can embed other languages. However, template strings escape strings by default, making it impossible to embed other languages directly.
For example, tagged templates can embed the LaTeX language.
function latex(strings) {
// ...
}
let document = latex`
\newcommand{\fun}{\textbf{Fun!}} // works fine
\newcommand{\unicode}{\textbf{Unicode!}} // error
\newcommand{\xerxes}{\textbf{King!}} // error
Breve over the h goes \u{h}ere // error
`
In the code above, the template string embedded in the variable document is perfectly valid LaTeX, but the JavaScript engine will throw an error. The reason is string escaping.
Template strings treat \u00FF and \u{42} as Unicode characters and escape them, so \unicode causes a parsing error. Similarly, \x56 is treated as a hexadecimal string escape, so \xerxes also causes an error. In other words, \u and \x have special meanings in LaTeX, but JavaScript escapes them.
To solve this problem, ES2018 relaxed the restrictions on string escaping in tagged templates. If an invalid string escape is encountered, the corresponding member of the strings array is set to undefined instead of an error being thrown, and the original string can still be obtained from the raw property.
function tag(strs) {
strs[0] === undefined
strs.raw[0] === "\\unicode and \\u{55}";
}
tag`\unicode and \u{55}`
In the code above, the template string would normally throw an error, but because the restrictions on string escaping have been relaxed, it no longer does. The JavaScript engine sets strs[0] to undefined, but the raw property can still retrieve the original string, so the tag function can still process the original string.
Note that this relaxation of string escaping only takes effect when parsing strings in tagged templates. In non-tagged-template contexts, errors will still be thrown.
let bad = `bad escape sequence: \unicode`; // error
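With that relaxation in place, a tag can rebuild its text entirely from the raw property. The sketch below assumes an ES2018-compliant engine; the body of latex is a hypothetical minimal implementation, not the one the text leaves elided.

```javascript
// Hypothetical tag: rebuild the text from strings.raw so that sequences
// such as \unicode or \xerxes, which are invalid JavaScript escapes,
// reach the caller untouched.
function latex(strings, ...values) {
  return strings.raw.reduce((out, raw, i) => out + String(values[i - 1]) + raw);
}

const title = 'Unicode!';
const source = latex`\newcommand{\unicode}{\textbf{${title}}}`;
console.log(source); // \newcommand{\unicode}{\textbf{Unicode!}}

// The processed value of a part with an invalid escape is undefined.
const cookedFirst = ((strings) => strings[0])`\xerxes`;
console.log(cookedFirst); // undefined
```

Reading only strings.raw avoids the undefined members altogether, at the cost of handling every backslash sequence in the tag itself.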