Unicode equivalence
From Wikipedia, the free encyclopedia
Unicode contains numerous characters to maintain compatibility with existing standards, some of which are functionally equivalent to other characters or sequences of characters. Because of this, Unicode defines certain code point sequences as equivalent. For example, the letter n followed by the combining tilde (U+0303) is equivalent to the single precomposed character ñ (U+00F1). Unicode defines two kinds of equivalence: canonical and compatibility.
Canonical Equivalence
Canonical equivalence is the narrower of the two forms: characters that are visually distinct remain distinct, even when they are functionally equivalent. For example, superscript and subscript numbers, full-width and half-width katakana characters, and ligatures and their component letters are all canonically distinct. The precomposed ñ and the two-character sequence of n followed by a combining tilde, however, are canonically equivalent.
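As a rough illustration, canonical equivalence can be tested by reducing both strings to the same canonical normalization form and comparing the results. The sketch below assumes Python's standard `unicodedata` module and the NFD normalization form; the helper name `canonically_equivalent` is hypothetical and is not part of any library.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u00F1"   # ñ as a single code point
decomposed = "n\u0303"   # n followed by the combining tilde

# The raw code point sequences differ, so a plain comparison fails.
print(precomposed == decomposed)                        # False
# After canonical decomposition the two sequences are identical.
print(canonically_equivalent(precomposed, decomposed))  # True

# Visually distinct forms stay distinct under canonical equivalence.
print(canonically_equivalent("\u00B2", "2"))    # superscript two vs. 2: False
print(canonically_equivalent("\uFB01", "fi"))   # fi ligature vs. f + i: False
print(canonically_equivalent("\uFF76", "\u30AB"))  # half-width vs. full-width katakana KA: False
```

Comparing after NFC instead of NFD would give the same answers here; the only requirement is that both sides use the same canonical form.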
Compatibility Equivalence
Compatibility equivalence is broader than canonical equivalence, as it is more concerned with preserving meaning than style. Superscript, subscript, and normal numbers are all considered equivalent, though this can cause problems when superscript numbers are intended as exponents in mathematical expressions. Full-width and half-width katakana characters are also equivalent, as are ligatures and their component letter sequences. Anything that is canonically equivalent is also compatibility equivalent, but the converse does not hold in general.
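The broader relation can be sketched the same way, this time using a compatibility normalization form. The example assumes Python's standard `unicodedata` module and the NFKD and NFKC forms; the helper name `compatibility_equivalent` is hypothetical.

```python
import unicodedata

def compatibility_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after compatibility decomposition (NFKD)."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)

# Forms that differ only in style are treated as equivalent.
print(compatibility_equivalent("\u00B2", "2"))       # superscript two vs. 2: True
print(compatibility_equivalent("\uFB01", "fi"))      # fi ligature vs. f + i: True
print(compatibility_equivalent("\uFF76", "\u30AB"))  # half-width vs. full-width katakana KA: True

# Canonically equivalent sequences are also compatibility equivalent.
print(compatibility_equivalent("\u00F1", "n\u0303"))  # True

# The cost: compatibility normalization discards the exponent styling.
squared = "x\u00B2"
print(unicodedata.normalize("NFKC", squared))  # x2
```

The last line shows the problem with exponents: once the text is normalized, "x squared" and "x2" can no longer be told apart. For that reason, compatibility normalization tends to suit searching and matching better than storing text.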