# New String Methods

This chapter introduces the new methods added to the String object.

# String.fromCodePoint()

ES5 provides the String.fromCharCode() method, which returns a character from a Unicode code point. However, this method cannot handle code points greater than 0xFFFF.

String.fromCharCode(0x20BB7)
// "ஷ"

In the code above, String.fromCharCode() cannot handle code points greater than 0xFFFF, so 0x20BB7 overflows: the leading hexadecimal digit 2 is discarded, and the method returns the character for code point U+0BB7 instead of the character for U+20BB7.
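
The truncation can be checked directly. The following is a minimal sketch; the variable names are illustrative only.

```javascript
// String.fromCharCode() converts each argument to an unsigned 16-bit integer,
// so only the low 16 bits of 0x20BB7 survive.
const codePoint = 0x20BB7;
const low16 = codePoint & 0xFFFF;

console.log(low16.toString(16)); // "bb7"
console.log(String.fromCharCode(codePoint) === String.fromCharCode(low16)); // true
console.log(String.fromCharCode(codePoint) === '\u0BB7'); // true
```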

ES6 provides the String.fromCodePoint() method, which can handle characters with code points greater than 0xFFFF, addressing the limitation of String.fromCharCode(). Functionally, it is the inverse of the codePointAt() method described below.

String.fromCodePoint(0x20BB7)
// "𠮷"
String.fromCodePoint(0x78, 0x1f680, 0x79) === 'x\uD83D\uDE80y'
// true

In the code above, if String.fromCodePoint receives multiple arguments, they are merged into a single string and returned.

Note that fromCodePoint is defined on the String object, while codePointAt is defined on string instances.
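
The inverse relationship between the two methods can be sketched as a round trip. The example also shows what happens with a value outside the Unicode range (above 0x10FFFF); the variable names are illustrative only.

```javascript
const ch = '\u{20BB7}'; // the character 𠮷

// codePointAt() is called on a string instance...
const cp = ch.codePointAt(0);
// ...while fromCodePoint() is called on String itself.
const roundTrip = String.fromCodePoint(cp);

console.log(cp.toString(16)); // "20bb7"
console.log(roundTrip === ch); // true

// Values that are not valid code points are rejected with a RangeError.
let rejected = false;
try {
  String.fromCodePoint(0x110000);
} catch (err) {
  rejected = err instanceof RangeError;
}
console.log(rejected); // true
```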

# String.raw()

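ES6 also provides a raw() method on the native String object. It returns the string with escape sequences left uninterpreted: every backslash is kept as a literal backslash (the same result as writing an extra backslash before each backslash in an ordinary string literal), with substitutions already filled in. It is often used for processing template strings.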

String.raw`Hi\n${2+3}!`
// Returns the characters Hi\n5! (a backslash followed by n); as a string literal this is written "Hi\\n5!"

String.raw`Hi\u000A!`
// Returns the characters Hi\u000A!; as a string literal this is written "Hi\\u000A!"

If the backslashes in the original string are already escaped, String.raw() will escape them again.

String.raw`Hi\\n`
// Returns "Hi\\\\n"

String.raw`Hi\\n` === "Hi\\\\n" // true

The String.raw() method can serve as a basic method for processing template strings. It replaces all variables and escapes backslashes, making the result ready for use as a string.
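
Two situations where keeping backslashes literal is convenient are Windows-style paths and regular expression sources. The following sketch compares an ordinary template string with a String.raw one; the path and pattern are made-up examples.

```javascript
// In an ordinary template string, \n and \t are interpreted as newline and tab.
const cooked = `C:\new\table`;
// With String.raw, the backslashes are kept as written.
const raw = String.raw`C:\new\table`;

console.log(cooked.length); // 10
console.log(raw.length);    // 12
console.log(raw === 'C:\\new\\table'); // true

// A regular expression source without doubled backslashes.
const decimal = new RegExp(String.raw`^\d+\.\d+$`);
console.log(decimal.test('3.14')); // true
console.log(decimal.test('3x14')); // false
```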

String.raw() is essentially a normal function, but it is specifically designed as a tag function for template strings. If written as a regular function call, its first argument should be an object with a raw property, and the value of the raw property should be an array corresponding to the parsed values of the template string.

// `foo${1 + 2}bar`
// equivalent to
String.raw({ raw: ['foo', 'bar'] }, 1 + 2) // "foo3bar"

In the code above, the first argument to String.raw() is an object whose raw property is equivalent to the array obtained after parsing the original template string.
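
A few more calls in the regular function form may make the interleaving clearer. This is a sketch; the variable names are illustrative only.

```javascript
// Each raw chunk is followed by the substitution at the same position.
const threeChunks = String.raw({ raw: ['a', 'b', 'c'] }, 1, 2);
console.log(threeChunks); // "a1b2c"

// Substitutions beyond the last gap between chunks are ignored.
const extraIgnored = String.raw({ raw: ['a', 'b'] }, 1, 2, 3);
console.log(extraIgnored); // "a1b"

// raw may be any array-like value; a string is treated as a list of characters.
const fromString = String.raw({ raw: 'test' }, 0, 1, 2);
console.log(fromString); // "t0e1s2t"
```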

As a function, the implementation of String.raw() is essentially as follows.

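// Simplified sketch: it assumes there is exactly one more raw chunk
// than there are substitution values, as is the case for a tagged template.
String.raw = function (strings, ...values) {
  let output = '';
  let index;
  // Interleave each raw chunk with the substitution that follows it.
  for (index = 0; index < values.length; index++) {
    output += strings.raw[index] + values[index];
  }

  // Append the final raw chunk after the last substitution.
  output += strings.raw[index];
  return output;
};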
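
To try this logic without overwriting the built-in, the same idea can be written as a separate tag function and compared against String.raw. The name rawSketch is hypothetical, and the sketch only covers the tagged-template case.

```javascript
// A stand-alone tag function that mirrors the simplified implementation.
function rawSketch(strings, ...values) {
  let output = '';
  strings.raw.forEach((chunk, i) => {
    output += chunk;
    // Only add a substitution when one exists for this position.
    if (i < values.length) {
      output += values[i];
    }
  });
  return output;
}

const name = 'world';
const viaSketch = rawSketch`Hi\n${name}!`;
const viaBuiltin = String.raw`Hi\n${name}!`;

console.log(viaSketch === viaBuiltin); // true
console.log(viaSketch === 'Hi\\nworld!'); // true
```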

# Instance Method: codePointAt()

Internally, JavaScript stores characters in UTF-16 format, with each character taking up a fixed 2 bytes. For characters that require 4 bytes of storage (Unicode code points greater than 0xFFFF), JavaScript treats them as two characters.

var s = "𠮷";

s.length // 2
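s.charAt(0) // '\uD842' (a lone surrogate, not a displayable character)
s.charAt(1) // '\uDFB7' (a lone surrogate, not a displayable character)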
s.charCodeAt(0) // 55362
s.charCodeAt(1) // 57271

In the code above, the Chinese character "𠮷" (note: this is not the "吉" in "吉祥") has a code point of 0x20BB7. Its UTF-16 encoding is 0xD842 0xDFB7 (decimal 55362 57271), requiring 4 bytes of storage. For such 4-byte characters, JavaScript cannot handle them correctly: the string length is incorrectly reported as 2, charAt() cannot read the entire character, and charCodeAt() can only return the values of the first two bytes and the last two bytes separately.
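
The two values 0xD842 and 0xDFB7 follow from the standard UTF-16 surrogate pair arithmetic. The following sketch reproduces the calculation; the function name toSurrogatePair is hypothetical, and it assumes its input is a code point above 0xFFFF.

```javascript
// Split a code point above 0xFFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;          // 20-bit value
  const high = 0xD800 + (offset >> 10);        // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);       // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x20BB7);

console.log(high, low); // 55362 57271
console.log(high.toString(16), low.toString(16)); // "d842" "dfb7"
console.log(String.fromCharCode(high, low) === '\u{20BB7}'); // true
```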

ES6 provides the codePointAt() method, which correctly handles characters stored in 4 bytes and returns the code point of a character.

let s = '𠮷a';

s.codePointAt(0) // 134071
s.codePointAt(1) // 57271

s.codePointAt(2) // 97

The argument to codePointAt() is the position of the character in the string (starting from 0). In the code above, JavaScript treats "𠮷a" as three characters. The codePointAt method correctly identifies "𠮷" at the first character position and returns its decimal code point 134071 (hexadecimal 20BB7). For the second character (the trailing two bytes of "𠮷") and the third character "a", codePointAt() returns the same results as charCodeAt().

In summary, codePointAt() correctly returns the code point of characters that UTF-16 stores in 4 bytes (as a surrogate pair). For regular characters stored in two bytes, it returns the same result as charCodeAt().

The codePointAt() method returns the code point as a decimal value. To get the hexadecimal value, use the toString() method for conversion.

let s = '𠮷a';

s.codePointAt(0).toString(16) // "20bb7"
s.codePointAt(2).toString(16) // "61"

You may have noticed that the argument to codePointAt() is still not a true character index. For example, in the code above, a is the second character of string s, so its position should be 1, but you must pass 2 to codePointAt(). One way to solve this is to use a for...of loop, which correctly recognizes 32-bit UTF-16 characters.

let s = '𠮷a';
for (let ch of s) {
  console.log(ch.codePointAt(0).toString(16));
}
// 20bb7
// 61

Another approach is to use the spread operator (...) to expand the string.

let arr = [...'𠮷a']; // arr.length === 2
arr.forEach(
  ch => console.log(ch.codePointAt(0).toString(16))
);
// 20bb7
// 61
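
Building on the spread approach, small helpers can count characters and look up a code point by character position rather than by UTF-16 position. This is a sketch; the helper names codePointLength and codePointAtIndex are hypothetical.

```javascript
// Number of code points (not UTF-16 code units) in a string.
function codePointLength(str) {
  return [...str].length;
}

// Code point of the character at a given character position,
// or undefined when the position is out of range.
function codePointAtIndex(str, index) {
  const ch = [...str][index];
  return ch === undefined ? undefined : ch.codePointAt(0);
}

const s = '\u{20BB7}a'; // '𠮷a'

console.log(s.length);               // 3
console.log(codePointLength(s));     // 2
console.log(codePointAtIndex(s, 0)); // 134071
console.log(codePointAtIndex(s, 1)); // 97
```

Spreading the whole string on every lookup is simple but linear in the string length, so this suits short strings better than repeated lookups in long ones.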

The codePointAt() method is the simplest way to test whether a character is composed of two bytes or four bytes.

function is32Bit(c) {
  return c.codePointAt(0) > 0xFFFF;
}

is32Bit("𠮷") // true
is32Bit("a") // false

# Instance Method: normalize()

Many European languages have diacritical marks and accent marks. Unicode provides two ways to represent them. One is to directly provide the accented character, such as Ǒ (\u01D1). The other is to use combining characters — a combination of the base character and the accent mark — where two characters combine into one, such as O (\u004F) and ˇ (\u030C) combining to form Ǒ (\u004F\u030C).

These two representations are visually and semantically equivalent, but JavaScript cannot recognize them as the same.

'\u01D1' === '\u004F\u030C' // false

'\u01D1'.length // 1
'\u004F\u030C'.length // 2

The code above shows that JavaScript treats the combining characters as two characters, causing the two representations to be unequal.

ES6 provides the normalize() method on string instances, which normalizes different representations of a character into the same form. This is known as Unicode normalization.

'\u01D1'.normalize() === '\u004F\u030C'.normalize()
// true
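
In practice this comparison is often wrapped in a small helper so that user input and stored text are normalized the same way before being compared. A minimal sketch, with a hypothetical helper name:

```javascript
// Compare two strings after normalizing both to the default form (NFC).
function equalsNormalized(a, b) {
  return a.normalize() === b.normalize();
}

const precomposed = '\u01D1';      // single code point
const combined = '\u004F\u030C';   // base letter + combining caron

console.log(precomposed === combined);                // false
console.log(equalsNormalized(precomposed, combined)); // true
console.log(equalsNormalized('a', 'b'));              // false
```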

The normalize method accepts an argument that specifies the normalization form. The four possible values are:

NFC: the default, short for "Normalization Form Canonical Composition". Returns the composed character built from multiple simple characters. "Canonical equivalence" means visual and semantic equivalence.
NFD: short for "Normalization Form Canonical Decomposition". Returns the simple characters obtained by decomposing a composed character under canonical equivalence.
NFKC: short for "Normalization Form Compatibility Composition". Returns the composed character. "Compatibility equivalence" means semantically equivalent but not visually equivalent, such as "囍" and "喜喜". (This is only an illustration; normalize cannot handle Chinese characters.)
NFKD: short for "Normalization Form Compatibility Decomposition". Returns the simple characters obtained by decomposing a composed character under compatibility equivalence.

'\u004F\u030C'.normalize('NFC').length // 1
'\u004F\u030C'.normalize('NFD').length // 2

The code above shows that the NFC parameter returns the composed form of the character, while the NFD parameter returns the decomposed form.
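
The compatibility forms can be illustrated with the ligature ﬁ (U+FB01), which is compatibility-equivalent to the two letters "fi" but not canonically equivalent to them. The example also shows that an unknown form name is rejected; the variable names are illustrative only.

```javascript
const ligature = '\uFB01'; // the single character ﬁ

// Canonical forms leave the ligature untouched.
console.log(ligature.normalize('NFC') === ligature); // true
console.log(ligature.normalize('NFD') === ligature); // true

// Compatibility forms replace it with the two letters f and i.
console.log(ligature.normalize('NFKC')); // "fi"
console.log(ligature.normalize('NFKD')); // "fi"

// Any value other than the four form names throws a RangeError.
let invalidFormRejected = false;
try {
  'a'.normalize('NFX');
} catch (err) {
  invalidFormRejected = err instanceof RangeError;
}
console.log(invalidFormRejected); // true
```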

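However, the normalize method can only compose a sequence when Unicode defines a precomposed character for it. A sequence of combining characters with no precomposed equivalent still consists of several code points after normalization. In such cases, regular expressions must be used to determine the character by its Unicode code point range.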
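
One way to apply the regular expression approach is to decompose the string first and then match combining marks by their code point range. The sketch below assumes the marks of interest lie in the Combining Diacritical Marks block (U+0300 to U+036F); the names COMBINING_MARKS and stripDiacritics are hypothetical.

```javascript
// Matches code points in the Combining Diacritical Marks block.
const COMBINING_MARKS = /[\u0300-\u036F]/g;

// Decompose, then drop the combining marks to recover the base letters.
function stripDiacritics(str) {
  return str.normalize('NFD').replace(COMBINING_MARKS, '');
}

console.log(stripDiacritics('\u01D1'));        // "O"
console.log(stripDiacritics('Cafe\u0301'));    // "Cafe"
console.log(stripDiacritics('plain'));         // "plain"
```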

# Instance Methods: includes(), startsWith(), endsWith()

Traditionally, JavaScript only had the indexOf method to determine whether one string is contained within another. ES6 provides three new methods.

includes(): Returns a boolean indicating whether the argument string was found.
startsWith(): Returns a boolean indicating whether the argument string is at the beginning of the original string.
endsWith(): Returns a boolean indicating whether the argument string is at the end of the original string.

let s = 'Hello world!';

s.startsWith('Hello') // true
s.endsWith('!') // true
s.includes('o') // true

All three methods support a second argument that specifies the position from which to begin searching.

let s = 'Hello world!';

s.startsWith('world', 6) // true
s.endsWith('Hello', 5) // true
s.includes('Hello', 6) // false

The code above shows that when the second argument n is used, endsWith behaves differently from the other two methods. It operates on the first n characters, while the other two methods operate from position n to the end of the string.
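
The difference can be restated in terms of slice(): startsWith and includes behave as if the string had been cut from position n onward, while endsWith behaves as if it had been cut to its first n characters. The sketch below checks these equivalences for the example string. It also shows that, unlike some older search methods, all three reject a regular expression argument.

```javascript
const s = 'Hello world!';

// startsWith / includes: search from position 6 to the end.
console.log(s.startsWith('world', 6) === s.slice(6).startsWith('world')); // true
console.log(s.includes('Hello', 6) === s.slice(6).includes('Hello'));     // true

// endsWith: search within the first 5 characters only.
console.log(s.endsWith('Hello', 5) === s.slice(0, 5).endsWith('Hello'));  // true

// Passing a regular expression throws a TypeError.
let regexRejected = false;
try {
  s.includes(/Hello/);
} catch (err) {
  regexRejected = err instanceof TypeError;
}
console.log(regexRejected); // true
```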

# Instance Method: repeat()

The repeat method returns a new string by repeating the original string n times.

'x'.repeat(3) // "xxx"
'hello'.repeat(2) // "hellohello"
'na'.repeat(0) // ""

If the argument is a decimal, it is truncated to an integer (for positive values this is the same as rounding down).

'na'.repeat(2.9) // "nana"

If the argument to repeat is a negative number or Infinity, an error will be thrown.

'na'.repeat(Infinity)
// RangeError
'na'.repeat(-1)
// RangeError

However, if the argument is a decimal between 0 and -1, it is treated as 0. This is because the value is first truncated to an integer: decimals between 0 and -1 become -0, which repeat treats as 0.

'na'.repeat(-0.9) // ""

The argument NaN is treated as 0.

'na'.repeat(NaN) // ""

If the argument to repeat is a string, it will first be converted to a number.

'na'.repeat('na') // ""
'na'.repeat('3') // "nanana"
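
A typical use of repeat() is building indentation or separator lines. The following sketch indents every line of a text by a given level; the function name indent and its parameters are hypothetical.

```javascript
// Prefix every line of `text` with `level` copies of `unit`.
function indent(text, level, unit = '  ') {
  const prefix = unit.repeat(level);
  return text
    .split('\n')
    .map(line => prefix + line)
    .join('\n');
}

console.log(indent('a\nb', 2) === '    a\n    b'); // true
console.log(indent('a', 0) === 'a');               // true

// A separator line; the argument rules described above still apply.
console.log('-'.repeat(10));  // "----------"
console.log('-'.repeat('3')); // "---"
```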

# Instance Methods: padStart() and padEnd()

ES2017 introduced string padding functionality. If a string is shorter than a specified length, it will be padded at the beginning or end. padStart() pads at the beginning, and padEnd() pads at the end.

'x'.padStart(5, 'ab') // 'ababx'
'x'.padStart(4, 'ab') // 'abax'

'x'.padEnd(5, 'ab') // 'xabab'
'x'.padEnd(4, 'ab') // 'xaba'

In the code above, padStart() and padEnd() each accept two arguments: the first is the maximum length for the padded string, and the second is the string used for padding.

If the length of the original string equals or exceeds the maximum length, padding has no effect and the original string is returned.

'xxx'.padStart(2, 'ab') // 'xxx'
'xxx'.padEnd(2, 'ab') // 'xxx'

If the combined length of the padding string and original string exceeds the maximum length, the padding string is truncated.

'abc'.padStart(10, '0123456789')
// '0123456abc'

If the second argument is omitted, spaces are used for padding by default.

'x'.padStart(4) // '   x'
'x'.padEnd(4) // 'x   '

A common use of padStart() is padding numbers to a specified number of digits. The following code generates 10-digit numeric strings.

'1'.padStart(10, '0') // "0000000001"
'12'.padStart(10, '0') // "0000000012"
'123456'.padStart(10, '0') // "0000123456"
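
The same technique is handy for clock-style output. Because padStart() is a string method, numbers have to be converted with String() first. A minimal sketch, assuming a non-negative whole number of seconds; the function name formatDuration is hypothetical.

```javascript
// Format a number of seconds as hh:mm:ss with zero-padded fields.
function formatDuration(totalSeconds) {
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  return [hours, minutes, seconds]
    .map(n => String(n).padStart(2, '0'))
    .join(':');
}

console.log(formatDuration(3725)); // "01:02:05"
console.log(formatDuration(59));   // "00:00:59"
```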

Another use is indicating string formats.

'12'.padStart(10, 'YYYY-MM-DD') // "YYYY-MM-12"
'09-12'.padStart(10, 'YYYY-MM-DD') // "YYYY-09-12"

# Instance Methods: trimStart() and trimEnd()

ES2019 added the trimStart() and trimEnd() methods to string instances. Their behavior is consistent with trim(): trimStart() removes whitespace from the beginning of the string, and trimEnd() removes whitespace from the end. Both return new strings without modifying the original.

const s = '  abc  ';

s.trim() // "abc"
s.trimStart() // "abc  "
s.trimEnd() // "  abc"

In the code above, trimStart() only removes whitespace from the beginning, preserving the trailing whitespace. trimEnd() behaves similarly.

In addition to spaces, these two methods also handle tab characters, newline characters, and other invisible whitespace characters at the beginning (or end) of a string.
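
A short sketch with mixed whitespace shows this behavior; the variable name is illustrative only.

```javascript
// Tab, newline and spaces in front; a space and a newline at the end.
const messy = '\t\n  abc \n';

console.log(messy.trimStart() === 'abc \n');    // true
console.log(messy.trimEnd() === '\t\n  abc');   // true
console.log(messy.trim() === 'abc');            // true

// The original string is left unchanged.
console.log(messy.length); // 9
```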

Browsers also implement two additional methods: trimLeft() is an alias for trimStart(), and trimRight() is an alias for trimEnd().

# Instance Method: matchAll()

The matchAll() method returns all matches of a regular expression in the current string. See the "Regular Expression Extensions" chapter for details.
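
As a brief preview, matchAll() returns an iterator of match results, each carrying its capture groups and position. The regular expression must have the g flag, otherwise a TypeError is thrown. The sample text and variable names below are illustrative only.

```javascript
const text = 'a1b22c333';

// Spread the iterator into an array of match results.
const matches = [...text.matchAll(/(\d)\d*/g)];

// Keep the full match, the first capture group, and the position.
const summary = matches.map(m => ({
  match: m[0],
  firstDigit: m[1],
  index: m.index,
}));

console.log(summary.map(item => item.match)); // [ '1', '22', '333' ]
console.log(summary.map(item => item.index)); // [ 1, 3, 6 ]

// A regular expression without the g flag is rejected.
let nonGlobalRejected = false;
try {
  text.matchAll(/\d/);
} catch (err) {
  nonGlobalRejected = err instanceof TypeError;
}
console.log(nonGlobalRejected); // true
```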

Last Updated: 2026/03/21, 12:14:36
