Hyphenation algorithm
From Wikipedia, the free encyclopedia
One reason the rules of word-breaking, and therefore hyphenation algorithms, are complex is that different 'dialects' of English tend to disagree on them: American English tends to break words by sound, while British English tends to look first at the origins of the word and then at sound. A large number of exceptions further complicates matters.
Some rules of thumb for human readers can be found in 'On Hyphenation - Anarchy of Pedantry'. Among algorithmic approaches to hyphenation, the one implemented in the TeX typesetting system is widely used. It is thoroughly documented in the first two volumes of Computers and Typesetting and in Frank Liang's dissertation. Contrary to the belief that TeX relies on a large dictionary of exceptions, the point of Liang's work was to make the algorithm as accurate as practically possible and to keep any exception dictionary small. In TeX's original hyphenation patterns for US English, the exception list contains only fourteen words.
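The pattern idea behind the TeX approach can be sketched briefly. In Liang's scheme each pattern places digits between letters, with '.' marking a word edge. At every position between two letters, the largest digit from any matching pattern wins; an odd value permits a break there and an even value forbids it. Words found in the exception dictionary skip the patterns entirely. The following is a minimal Python sketch under stated assumptions, not TeX's actual implementation. The nine patterns and the single exception entry are illustrative and far smaller than TeX's real US English tables. `parse_pattern`, `hyphenate`, and `show` are hypothetical helper names. The `left_min=2` and `right_min=3` defaults mirror plain TeX's `\lefthyphenmin` and `\righthyphenmin`.

```python
"""Minimal sketch of Liang-style pattern hyphenation (the approach used by TeX).

Assumption: PATTERNS and EXCEPTIONS are tiny illustrative tables, not TeX's
real US English data; all function names here are hypothetical helpers.
"""

# Digits sit between letters; '.' would mark a word edge in edge-anchored patterns.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]

# Hypothetical exception entry: listed words bypass the patterns entirely.
EXCEPTIONS = {"table": "ta-ble"}


def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split 'hen5at' into letters 'henat' and inter-letter values [0,0,0,5,0,0]."""
    letters = "".join(ch for ch in pattern if not ch.isdigit())
    values = [0] * (len(letters) + 1)
    pos = 0  # number of letters seen so far = slot the next digit belongs to
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns=PATTERNS, exceptions=EXCEPTIONS, left_min=2, right_min=3):
    """Return the indices i in word such that a hyphen may go before word[i]."""
    word = word.lower()
    if word in exceptions:
        # Exception entries spell out their breaks directly with '-'.
        positions, index = [], 0
        for ch in exceptions[word]:
            if ch == "-":
                positions.append(index)
            else:
                index += 1
        return positions

    dotted = f".{word}."
    # scores[k] is the value for the gap just before dotted[k].
    scores = [0] * (len(dotted) + 1)
    for letters, values in map(parse_pattern, patterns):
        start = dotted.find(letters)
        while start != -1:  # visit every (possibly overlapping) match
            for offset, value in enumerate(values):
                scores[start + offset] = max(scores[start + offset], value)
            start = dotted.find(letters, start + 1)

    # word[i] is dotted[i + 1]; odd scores allow a break, even ones forbid it.
    return [
        i
        for i in range(left_min, len(word) - right_min + 1)
        if scores[i + 1] % 2 == 1
    ]


def show(word: str) -> str:
    """Render the permitted breaks of word with '-' characters."""
    breaks = set(hyphenate(word))
    return "".join(("-" if i in breaks else "") + ch for i, ch in enumerate(word))


if __name__ == "__main__":
    print(show("hyphenation"))  # hy-phen-ation
    print(show("table"))        # ta-ble
```

With these patterns, "hyphenation" gets odd values only before the "p" (from `hy3ph`) and before the second "a" (from `hen5at`, which outweighs `n2at`). It therefore comes out as hy-phen-ation. Checking the exception dictionary before the patterns is what lets a short list patch the few words the patterns get wrong.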
Ports of the TeX hyphenation algorithm are available as libraries for several programming languages, including Perl, Ruby, PostScript, and Java.
References
- "On Hyphenation - Anarchy of Pedantry". PC Update, the magazine of Melbourne PC User Group, Australia. Retrieved on October 6, 2005.
- Liang, Franklin Mark (1983). "Word Hy-phen-a-tion by Com-put-er". PhD dissertation, Stanford University.
- "TeX-Hyphen". Comprehensive Perl Archive Network. Retrieved on October 18, 2005.
- "text-hyphen". RubyForge. Retrieved on October 18, 2005.
- "Knuth-Liang hyphenation for the PostScript® language". anastigmatix.net. Retrieved on October 6, 2005.
- "TeXHyphenator-J: TeX Hyphenator in Java". Retrieved on September 14, 2006.