📅  Last modified: 2023-12-03 14:38:46.122000             🧑  Author: Mango
Unicode is a character encoding standard that assigns a unique number, called a code point, to every character across all writing systems. JavaScript strings are sequences of UTF-16 code units, and the language provides various methods and properties to work with Unicode characters in them. In this article, we'll explore how to handle Unicode in JavaScript.
Unicode escapes represent Unicode characters by their hexadecimal code point value. In JavaScript, you can use the escape sequence \u{codePoint} to include Unicode characters in a string.
const smiley = '\u{1F60A}';
console.log(smiley); // Output: 😊
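To make the escape syntax more concrete, here is a small sketch comparing three ways of writing the same character. The variable names are only illustrative; the point is that the brace form, the older four-digit form written as a surrogate pair, and String.fromCodePoint() should all yield identical strings.

```javascript
// Three ways to write U+1F60A; all produce the same string.
const braceEscape = '\u{1F60A}';          // code point escape (ES2015 and later)
const surrogateEscape = '\uD83D\uDE0A';   // the same character as a UTF-16 surrogate pair
const fromNumber = String.fromCodePoint(0x1F60A);

console.log(braceEscape === surrogateEscape); // true
console.log(braceEscape === fromNumber);      // true

// A four-digit \uXXXX escape on its own only reaches U+0000 to U+FFFF.
const eAcute = '\u00E9';
console.log(eAcute); // é
```

The brace form is usually the easier one to read, because it shows the code point directly rather than its UTF-16 encoding.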
JavaScript provides several string properties and methods for working with Unicode characters. Some examples include:
String.length: The length property returns the number of UTF-16 code units in a string, not the number of characters. Characters outside the Basic Multilingual Plane (BMP) occupy two code units each, so length overcounts them.
String.charAt(): The charAt() method returns the single UTF-16 code unit at a specified index. For a character outside the BMP, it returns only one half of the surrogate pair rather than the whole character.
String.codePointAt(): The codePointAt() method returns the Unicode code point of the character that starts at a given code unit index. If the index points at the second half of a surrogate pair, it returns only the value of that trailing surrogate.
String.fromCodePoint(): The fromCodePoint() method creates a string from a sequence of Unicode code points.
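const astronaut = '\u{1F469}\u200D\u{1F680}'; // 👩‍🚀: woman + Zero Width Joiner + rocket
console.log(astronaut.length); // Output: 5 (2 + 1 + 2 UTF-16 code units)
console.log(astronaut.charAt(0)); // Output: a lone high surrogate (the first half of 👩)
console.log(astronaut.codePointAt(0)); // Output: 128105 (U+1F469, 👩)
console.log(astronaut.codePointAt(2)); // Output: 8205 (Zero Width Joiner)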
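Because length counts code units, a common workaround is to count code points by spreading the string into an array. The sketch below shows that, along with a round trip through codePointAt() and String.fromCodePoint(). The helper name codePointsOf is hypothetical and exists only for this illustration.

```javascript
const word = 'a\u{1F600}b'; // 'a😀b': two BMP letters around one non-BMP character

const codeUnits = word.length;        // counts UTF-16 code units
const codePoints = [...word].length;  // the string iterator yields whole code points

console.log(codeUnits);  // 4
console.log(codePoints); // 3

// Hypothetical helper: list the code points of a string.
function codePointsOf(str) {
  return Array.from(str, (ch) => ch.codePointAt(0));
}

const points = codePointsOf(word);
console.log(points); // [ 97, 128512, 98 ]

// fromCodePoint() accepts several code points and rebuilds the string.
const rebuilt = String.fromCodePoint(...points);
console.log(rebuilt === word); // true
```

Note that counting code points still does not count user-perceived characters: a sequence joined with a Zero Width Joiner consists of several code points.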
Regular expressions in JavaScript can be used to match Unicode characters. The u flag enables full Unicode matching, so that patterns treat a character outside the BMP as one unit instead of two. The flag is also required for Unicode property escapes such as \p{Script=Han}.
const text = 'Hello 世界';
const regex = /\p{Script=Han}/u;
console.log(text.match(regex)[0]); // Output: 世
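The effect of the u flag is easiest to see with the dot metacharacter. The following sketch, with illustrative variable names, tests the same pattern with and without the flag, and then combines the u flag with the g flag to collect every Han character rather than only the first.

```javascript
const emoji = '\u{1F60A}'; // one character, two UTF-16 code units

// Without u, '.' matches a single code unit, so the two-unit string fails ^.$
const withoutU = /^.$/.test(emoji);
// With u, '.' matches a whole code point.
const withU = /^.$/u.test(emoji);

console.log(withoutU); // false
console.log(withU);    // true

// With the g flag, match() returns every match instead of just the first.
const mixed = 'Hello 世界';
const hanChars = mixed.match(/\p{Script=Han}/gu);
console.log(hanChars); // [ '世', '界' ]
```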
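JavaScript's for...of statement iterates over a string by code point, so it keeps characters outside the BMP intact instead of splitting their surrogate pairs. It does not group code points into user-perceived characters, however: each flag emoji below consists of two regional indicator symbols, and the loop yields them separately.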
const flags = '🇺🇳🇩🇪';
for (const flag of flags) {
  console.log(flag);
}
// Output:
// 🇺
// 🇳
// 🇩
// 🇪
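When the goal is to iterate over what a reader would see as single characters, such as whole flags, one option is Intl.Segmenter with grapheme granularity. This is a sketch that assumes a runtime where Intl.Segmenter is available (recent Node.js versions and current browsers); the variable names are illustrative.

```javascript
// Two flags, written as escapes: regional indicators U+N and D+E.
const twoFlags = '\u{1F1FA}\u{1F1F3}\u{1F1E9}\u{1F1EA}';

// Assumption: the runtime provides Intl.Segmenter.
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const graphemes = Array.from(segmenter.segment(twoFlags), (s) => s.segment);

console.log(twoFlags.length);      // 8 UTF-16 code units
console.log([...twoFlags].length); // 4 code points
console.log(graphemes.length);     // 2 user-perceived characters

for (const grapheme of graphemes) {
  console.log(grapheme); // one whole flag per iteration
}
```

The three counts differ because each level groups the one below it: code units form code points, and code points form grapheme clusters.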
Unicode defines several normalization forms so that equivalent sequences of characters can be converted to a single representation. JavaScript provides a method to normalize strings:
String.normalize(): The normalize() method converts a string to one of the four Unicode normalization forms: NFC, NFD, NFKC, or NFKD. It uses NFC when called without an argument.
const nfd = '\u0041\u0308'; // 'Ä' decomposed (NFD)
const nfc = nfd.normalize('NFC'); // 'Ä' composed (NFC)
console.log(nfc === '\u00C4'); // Output: true
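A typical reason to normalize is comparing strings that look the same but are encoded differently. The sketch below uses a hypothetical helper, equalsNormalized, to compare under NFC, and also shows how the compatibility form NFKC differs from NFC on a ligature.

```javascript
const composed = '\u00E9';    // é as one precomposed code point
const decomposed = 'e\u0301'; // e followed by a combining acute accent

console.log(composed === decomposed); // false: different code point sequences

// Hypothetical helper: compare two strings after normalizing both to NFC.
function equalsNormalized(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

console.log(equalsNormalized(composed, decomposed)); // true

// Compatibility forms (NFKC, NFKD) also replace characters such as ligatures.
const ligature = '\uFB01'; // the single-character ligature 'ﬁ'
console.log(ligature.normalize('NFC') === ligature); // true: NFC leaves it alone
console.log(ligature.normalize('NFKC'));             // fi (two separate letters)
```

NFC is usually a reasonable choice for comparison and storage, while NFKC changes the text more aggressively and suits cases such as search or identifier matching.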
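In conclusion, JavaScript has robust support for Unicode characters, provided you keep the difference between UTF-16 code units, code points, and user-perceived characters in mind. Understanding how to work with Unicode in JavaScript is essential for handling multilingual text and ensuring proper string manipulation.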