📜  unorm npm (1)

📅  Last modified: 2023-12-03 15:35:31.677000             🧑  Author: Mango

unorm

unorm is a lightweight Unicode normalization library for Node.js.

Introduction

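As a programmer working with text, you're likely familiar with the concept of Unicode normalization. Unicode often allows the same text to be encoded as more than one sequence of code points: "é", for example, can be a single precomposed character or an "e" followed by a combining accent. Unicode defines four normalization forms (NFC, NFD, NFKC, NFKD) that convert such equivalent sequences to one consistent representation. Mixing unnormalized or differently normalized text can lead to problems in comparison, sorting, and searching.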
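To make the problem concrete, the sketch below builds "café" in two canonically equivalent ways and shows that the strings compare unequal until both are normalized. It uses JavaScript's built-in String.prototype.normalize rather than unorm so that it runs without any dependency; the variable names are illustrative only.

```javascript
// Two canonically equivalent spellings of "café".
const composed = 'caf\u00e9';    // "é" as one code point (U+00E9)
const decomposed = 'cafe\u0301'; // "e" followed by a combining acute accent (U+0301)

// They render identically but differ as code point sequences.
const rawEqual = composed === decomposed;
const lengths = [composed.length, decomposed.length];

// Normalizing both sides to the same form makes the comparison reliable.
const nfcEqual = composed.normalize('NFC') === decomposed.normalize('NFC');
const nfdEqual = composed.normalize('NFD') === decomposed.normalize('NFD');

console.log(rawEqual); // false
console.log(lengths);  // [ 4, 5 ]
console.log(nfcEqual); // true
console.log(nfdEqual); // true
```

Which form is chosen matters less than applying the same form to both sides before comparing.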

unorm is a library that provides a simple API for normalizing Unicode text in Node.js. It supports all four normalization forms and also installs a String.prototype.normalize polyfill on runtimes that lack a native implementation.

Installation

To use unorm in your Node.js projects, you can install it via npm:

npm install unorm

Usage

unorm provides a simple API with one function per normalization form. Testing whether a string is already normalized is done by comparing it with its normalized result. Here are a few examples:

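const unorm = require('unorm');

// unorm exports lowercase functions: nfc, nfd, nfkc and nfkd.
const input = 'caf\u00e9'; // "é" as the single code point U+00E9
const nfc = unorm.nfc(input);
const nfd = unorm.nfd(input);
const nfkc = unorm.nfkc(input);
const nfkd = unorm.nfkd(input);

console.log(nfc);
// Output: café

// isNormalized is a local helper, not part of unorm: a string is in NFC
// exactly when normalizing it to NFC leaves it unchanged.
const isNormalized = (str) => str === unorm.nfc(str);

console.log(isNormalized(nfc));
// Output: true

console.log(isNormalized(nfd));
// Output: false

console.log(isNormalized(nfkc));
// Output: true

console.log(isNormalized(nfkd));
// Output: false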

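In this example, we're creating four normalized versions of the string "caf\u00e9" (which contains the precomposed character "é", U+00E9) and then testing whether each result is in NFC. The NFC and NFKC results pass because both forms end with a composition step and keep "é" as a single code point. The NFD and NFKD results fail the NFC test because they decompose "é" into "e" followed by the combining acute accent U+0301. They are still correctly normalized in their own forms; they are simply not NFC.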
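A check against any of the four forms can be sketched as a small helper, since a string is in a given form exactly when normalizing it to that form returns it unchanged. The helper name isNormalizedAs is hypothetical, and the sketch relies on the built-in String.prototype.normalize so that it runs without dependencies; with unorm, the same comparison would be made against the result of the matching unorm function.

```javascript
// Hypothetical helper (not part of unorm): a string is in a given form
// exactly when normalizing it to that form returns the same string.
function isNormalizedAs(str, form = 'NFC') {
  return str === str.normalize(form);
}

const precomposed = 'caf\u00e9';                     // "é" as U+00E9
const decomposedForm = precomposed.normalize('NFD'); // "e" + U+0301

console.log(isNormalizedAs(precomposed));           // true
console.log(isNormalizedAs(decomposedForm));        // false
console.log(isNormalizedAs(decomposedForm, 'NFD')); // true
console.log(isNormalizedAs(precomposed, 'NFD'));    // false
```

Normalizing the whole string just to test it is the simplest approach, not the fastest; it is usually adequate for short strings.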

API

unorm exports the following functions:

nfc(input: string): string

Returns the input string in Unicode Normalization Form C (NFC): canonical decomposition followed by canonical composition.

nfd(input: string): string

Returns the input string in Unicode Normalization Form D (NFD): canonical decomposition.

nfkc(input: string): string

Returns the input string in Unicode Normalization Form KC (NFKC): compatibility decomposition followed by canonical composition.

nfkd(input: string): string

Returns the input string in Unicode Normalization Form KD (NFKD): compatibility decomposition.

shimApplied: boolean

True when unorm installed its own String.prototype.normalize polyfill because the runtime did not provide a native one, false otherwise.

unorm has no dedicated function for testing whether a string is already normalized. To check, compare the string with its normalized result, for example input === unorm.nfc(input).
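The difference between the canonical forms (NFC, NFD) and the compatibility forms (NFKC, NFKD) is easiest to see on characters that have only a compatibility mapping. The sketch below uses the built-in String.prototype.normalize so that it runs as-is; it is an illustration of the forms themselves, and the variable names are assumptions.

```javascript
const ligature = '\ufb01';  // "ﬁ", LATIN SMALL LIGATURE FI
const squared = 'x\u00b2';  // "x²", x followed by SUPERSCRIPT TWO

// Canonical normalization leaves compatibility characters untouched.
const ligatureNfc = ligature.normalize('NFC');

// Compatibility normalization folds them into their plain equivalents.
const ligatureNfkc = ligature.normalize('NFKC');
const squaredNfkd = squared.normalize('NFKD');

console.log(ligatureNfc === ligature); // true
console.log(ligatureNfkc);             // fi
console.log(ligatureNfkc.length);      // 2
console.log(squaredNfkd);              // x2
```

Because the compatibility forms discard distinctions such as ligatures and superscripts, they suit searching and matching better than storing text that must round-trip unchanged.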

Conclusion

unorm is a simple library for Unicode text normalization in Node.js. Its API is small and easy to use, making it a practical choice for code that needs to normalize Unicode strings before comparing, sorting, or searching them.