Java : Normalizer with Examples

Normalizer (Java SE 18 & JDK 18) API Examples.
This page provides code examples for most Normalizer methods.


Summary

This class provides the method normalize which transforms Unicode text into an equivalent composed or decomposed form, allowing for easier sorting and searching of text. The normalize method supports the standard normalization forms described in Unicode Standard Annex #15 — Unicode Normalization Forms.
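As a small illustration of why normalization helps searching and comparison, the sketch below compares two spellings of "é" that look the same on screen but use different code points. The class name `NormalizedCompare` and the helper `equalsNfc` are made up for this example and are not part of the JDK.

```java
import java.text.Normalizer;

final class NormalizedCompare {

    // Hypothetical helper: compares two strings after normalizing both to NFC.
    static boolean equalsNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        final var composed = "\u00E9";    // LATIN SMALL LETTER E WITH ACUTE (one code point)
        final var decomposed = "e\u0301"; // "e" followed by COMBINING ACUTE ACCENT

        System.out.println(composed.equals(decomposed));     // false
        System.out.println(equalsNfc(composed, decomposed)); // true
    }
}
```

Choosing NFC here is only one option. Any form works for comparison as long as both sides use the same one.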

Class diagram

import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;

System.out.println(Arrays.toString(Normalizer.Form.values())); // [NFD, NFC, NFKD, NFKC]

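// The first entry is U+212B ANGSTROM SIGN (Å), not U+00C5; only U+212B is unnormalized in all four forms.
final var sources = List.of("\u212B", "¼", "⑩", "㌀");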

for (final var src : sources) {
    System.out.println("----------");
    System.out.println("src : " + src);

    for (final var form : Normalizer.Form.values()) {
        if (!Normalizer.isNormalized(src, form)) {
            System.out.println("  " + form + " : " + Normalizer.normalize(src, form));
        }
    }
}

// Result
// ↓
//----------
//src : Å
//  NFD : Å
//  NFC : Å
//  NFKD : Å
//  NFKC : Å
//----------
//src : ¼
//  NFKD : 1⁄4
//  NFKC : 1⁄4
//----------
//src : ⑩
//  NFKD : 10
//  NFKC : 10
//----------
//src : ㌀
//  NFKD : アパート
//  NFKC : アパート
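In the result above, the four forms of Å look identical because the differences are only visible at the code point level. The sketch below prints the code points instead of the glyphs. It assumes the source string is U+212B (ANGSTROM SIGN), which is what the printed result implies. `CodePointDump` and `codePoints` are names invented for this example.

```java
import java.text.Normalizer;
import java.util.stream.Collectors;

final class CodePointDump {

    // Hypothetical helper: formats every code point of s as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        final var src = "\u212B"; // ANGSTROM SIGN (assumed source character)

        for (final var form : Normalizer.Form.values()) {
            System.out.println(form + " : " + codePoints(Normalizer.normalize(src, form)));
        }

        // Result
        // ↓
        //NFD : U+0041 U+030A
        //NFC : U+00C5
        //NFKD : U+0041 U+030A
        //NFKC : U+00C5
    }
}
```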

Methods

static boolean isNormalized (CharSequence src, Normalizer.Form form)

Determines whether the given sequence of char values is already normalized in the specified form.

final var src = "⑩";
final var form = Normalizer.Form.NFKC;

System.out.println(Normalizer.isNormalized(src, form)); // false

final var normalized = Normalizer.normalize(src, form);
System.out.println(normalized); // 10
System.out.println(Normalizer.isNormalized(normalized, form)); // true
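One possible use of isNormalized is as a guard, so that text already in the target form is returned untouched. The helper `normalizeIfNeeded` below is a hypothetical wrapper written for this example, not a JDK method. Whether the check saves any time in practice is an assumption that would need measuring.

```java
import java.text.Normalizer;

final class NormalizeIfNeeded {

    // Hypothetical helper: returns src itself when it is already in the requested form,
    // otherwise returns the normalized copy.
    static String normalizeIfNeeded(String src, Normalizer.Form form) {
        return Normalizer.isNormalized(src, form) ? src : Normalizer.normalize(src, form);
    }

    public static void main(String[] args) {
        final var form = Normalizer.Form.NFKC;

        System.out.println(normalizeIfNeeded("10", form));     // 10 (already normalized, returned as-is)
        System.out.println(normalizeIfNeeded("\u2469", form)); // 10 (converted from CIRCLED NUMBER TEN)
    }
}
```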

static String normalize (CharSequence src, Normalizer.Form form)

Normalizes a sequence of char values according to the specified normalization form.

For an example, see isNormalized(CharSequence src, Normalizer.Form form) above.
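A common application of normalize is removing accents: decompose to NFD, then delete the combining marks. The sketch below is one way to do that. `StripAccents` and `stripAccents` are names invented for this example, and the regex `\p{M}` (Unicode category Mark) is a choice made here, not something the Normalizer API prescribes.

```java
import java.text.Normalizer;

final class StripAccents {

    // Hypothetical helper: decomposes to NFD so accents become separate combining marks,
    // then removes every character in the Unicode "Mark" category.
    static String stripAccents(String src) {
        final var decomposed = Normalizer.normalize(src, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        final var src = "Cr\u00E8me br\u00FBl\u00E9e"; // Crème brûlée

        System.out.println(stripAccents(src)); // Creme brulee
    }
}
```

Characters without a decomposition, such as "ø", are left unchanged by this approach.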

