The Normalizer class

简介

Normalization is a process that involves transforming characters and sequences of characters into a formally-defined underlying representation. This process is most important when text needs to be compared for sorting and searching, but it is also used when storing text to ensure that the text is stored in a consistent representation.

The Unicode Consortium has defined a number of normalization forms reflecting the various needs of applications:

Normalization Form D (NFD) - Canonical Decomposition
Normalization Form C (NFC) - Canonical Decomposition followed by Canonical Composition
Normalization Form KD (NFKD) - Compatibility Decomposition
Normalization Form KC (NFKC) - Compatibility Decomposition followed by Canonical Composition

The different forms are defined in terms of a set of transformations on the text, transformations that are expressed by both an algorithm and a set of data files.

Class synopsis

Normalizer

Normalizer {

/* Methods */

static bool isNormalized ( string $input [, string $form ] )

static string normalize ( string $input [, string $form ] )

}

预定义常量

The following constants define the normalization form used by the normalizer:

Normalizer::FORM_C (string): Normalization Form C (NFC) - Canonical Decomposition followed by Canonical Composition
Normalizer::FORM_D (string): Normalization Form D (NFD) - Canonical Decomposition
Normalizer::FORM_KC (string): Normalization Form KC (NFKC) - Compatibility Decomposition, followed by Canonical Composition
Normalizer::FORM_KD (string): Normalization Form KD (NFKD) - Compatibility Decomposition
Normalizer::NONE (string): No decomposition/composition
Normalizer::OPTION_DEFAULT (string): Default normalization options

参见

Normalizer::isNormalized — Checks if the provided string is already in the specified normalization form.
Normalizer::normalize — Normalizes the input provided and returns the normalized string

The Normalizer class

简介

Class synopsis

预定义常量

参见

Table of Contents