Binary Ordered Compression for Unicode
From Wikipedia, the free encyclopedia
Unicode |
---|
Encodings |
UCS |
Mapping |
Bi-directional text |
BOM |
Han unification |
Unicode and HTML |
Unicode and e-mail |
Unicode typefaces |
BOCU-1 is a MIME compatible Unicode compression scheme. BOCU stands for Binary Ordered Compression for Unicode. BOCU-1 combines the wide applicability of UTF-8 with the compactness of SCSU. This Unicode encoding is useful for compressing short strings, and it maintains code point order. Usually, the zip, bzip2, and other industry standard algorithms compact larger amounts of Unicode text more efficiently.
SCSU was created as a Unicode compression scheme with a byte/code point ratio similar to language-specific code pages. It has not been widely adopted although it fulfills the criteria for an IANA charset and is registered with IANA. SCSU is not suitable for MIME “text” media types. For example, SCSU cannot be used directly in emails and similar protocols. SCSU requires a complicated encoder design for good performance.
It is worth noting that SCSU has been adopted as an official Unicode Technical Standard. BOCU-1 has not been officially adopted by the Unicode consortium, but Unicode Technical Note #6 does describe this encoding in more detail.
[edit] External links
- Unicode Technical Note #6 BOCU-1: MIME Compatible Unicode Compression
- International Components for Unicode A library that can convert between BOCU-1 and other Unicode encodings