📅  Last modified: 2023-12-03 15:35:31.677000             🧑  Author: Mango
unorm is a lightweight Unicode normalization library for Node.js.
As a programmer working with text, you have probably run into Unicode normalization. Unicode defines four normalization forms (NFC, NFD, NFKC, NFKD) because the same text can be encoded as different sequences of code points: "é", for example, can be a single code point (U+00E9) or an "e" followed by a combining acute accent (U+0301). Strings that look identical but are stored differently compare as unequal, which causes problems in text processing, sorting, and searching.
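To make the problem concrete, the sketch below compares two encodings of "café". It uses JavaScript's built-in String.prototype.normalize as a stand-in so that it runs without any dependency; the variable names are illustrative only.

```javascript
// Two encodings of the same visible text "café".
const composed = 'caf\u00e9';     // é as a single code point (U+00E9)
const decomposed = 'cafe\u0301';  // e followed by a combining acute accent (U+0301)

// A plain comparison sees two different code unit sequences.
const rawEqual = composed === decomposed;

// Normalizing both sides to the same form makes the comparison meaningful.
const normalizedEqual =
  composed.normalize('NFC') === decomposed.normalize('NFC');

console.log(rawEqual, normalizedEqual);          // false true
console.log(composed.length, decomposed.length); // 4 5
```

The same idea applies to sorting and searching: normalize both the stored text and the query to one form before comparing them.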
unorm provides a simple API for normalizing Unicode text in Node.js. It supports all four normalization forms and also acts as a polyfill for String.prototype.normalize on runtimes that lack it.
To use unorm in your Node.js projects, install it via npm:
npm install unorm
unorm exposes one function per normalization form. Here is a short example:
const unorm = require('unorm');
const input = 'caf\u00e9';
const nfc = unorm.nfc(input);
const nfd = unorm.nfd(input);
const nfkc = unorm.nfkc(input);
const nfkd = unorm.nfkd(input);
console.log(nfc);
// Output: café
// A string is in NFC exactly when NFC normalization leaves it unchanged.
console.log(nfc === unorm.nfc(nfc));
// Output: true
console.log(nfd === unorm.nfc(nfd));
// Output: false
console.log(nfkc === unorm.nfc(nfkc));
// Output: true
console.log(nfkd === unorm.nfc(nfkd));
// Output: false
In this example, we create four normalized versions of the string 'caf\u00e9' (which contains a precomposed "é"). unorm exports the lowercase functions nfc, nfd, nfkc, and nfkd. It does not provide a dedicated normalization check, so each result is tested by comparing it with its own NFC normalization. The NFC and NFKC results are already in NFC, while the NFD and NFKD results are not, because they store "é" as "e" followed by a combining accent.
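Printed output hides most of the difference between the forms, so it can help to look at code points directly. The following sketch does that for a string containing both a compatibility ligature and an accented letter. It relies on the built-in String.prototype.normalize as a dependency-free stand-in for the library's functions, and the codePoints helper is hypothetical.

```javascript
// Hypothetical helper: list a string's code points as U+XXXX labels.
function codePoints(str) {
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// "ﬁ" (U+FB01, a compatibility ligature) followed by "é" (U+00E9).
const sample = '\ufb01\u00e9';

const forms = {};
for (const form of ['NFC', 'NFD', 'NFKC', 'NFKD']) {
  forms[form] = codePoints(sample.normalize(form));
}

console.log(forms);
// NFC:  U+FB01 U+00E9
// NFD:  U+FB01 U+0065 U+0301
// NFKC: U+0066 U+0069 U+00E9
// NFKD: U+0066 U+0069 U+0065 U+0301
```

The canonical forms (NFC, NFD) keep the ligature intact, while the compatibility forms (NFKC, NFKD) replace it with the plain letters "f" and "i", which changes the text and cannot be reversed.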
unorm exports the following functions:
nfc(input: string): string
Returns the Unicode Normalization Form C (NFC) of the input string: canonical decomposition followed by canonical composition.
nfd(input: string): string
Returns the Unicode Normalization Form D (NFD) of the input string: canonical decomposition.
nfkc(input: string): string
Returns the Unicode Normalization Form KC (NFKC) of the input string: compatibility decomposition followed by canonical composition.
nfkd(input: string): string
Returns the Unicode Normalization Form KD (NFKD) of the input string: compatibility decomposition.
shimApplied: boolean
Indicates whether unorm installed its String.prototype.normalize polyfill because the runtime had no native implementation.
unorm does not export a normalization test or a canonical combining class lookup. To test whether a string is already in a given form, compare it with the output of the corresponding function, for example input === unorm.nfc(input).
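A normalization check can always be written as a comparison between a string and its normalized output. The sketch below shows one way to do this. The isInForm helper is hypothetical and is not part of unorm, and it uses the built-in String.prototype.normalize so that it runs without dependencies.

```javascript
// Hypothetical helper (not part of unorm): true when `str` is already in `form`.
// Uses the built-in String.prototype.normalize so the sketch has no dependencies.
function isInForm(str, form = 'NFC') {
  return str === str.normalize(form);
}

const precomposed = 'caf\u00e9';               // already in NFC
const expanded = precomposed.normalize('NFD'); // 'cafe\u0301'

console.log(isInForm(precomposed));           // true
console.log(isInForm(expanded));              // false, not in NFC
console.log(isInForm(expanded, 'NFD'));       // true
console.log(isInForm('plain ascii', 'NFKD')); // true, ASCII is unchanged by every form
```

This version normalizes the whole string on every call, which is simple but does more work than a dedicated quick-check algorithm would for long inputs.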
unorm is a simple library for Unicode text normalization in Node.js. Its API is small and easy to use, making it a good choice for anyone who needs to normalize Unicode strings.