Biometric word list

From Wikipedia, the free encyclopedia

A biometric word list is a list of words for conveying data bytes clearly and unambiguously over a voice channel. Such lists are analogous in purpose to the NATO phonetic alphabet used by pilots, except that a longer list of words is used, with each word corresponding to one of the 256 possible byte values.

The first (and possibly only) biometric word list was designed in 1995 by Patrick Juola, a computational linguist, and Philip Zimmermann, creator of PGP. The words were chosen for their phonetic distinctiveness, using genetic algorithms to select lists of words with optimum separation in phoneme space. Grady Ward's Moby Pronunciator list served as the raw material for the word search.

The Zimmermann/Juola list was originally designed for PGPfone, where it allowed the two parties to compare a short authentication string and so detect a man-in-the-middle (MITM) attack. It was called a biometric word list because the authentication depended on the two human users recognizing each other's distinct voices as they read and compared the words over the voice channel; this bound the identity of the speaker to the words, which helped protect against the MITM attack. The list can also be used in many situations where no biometric binding of identity is needed, so calling it a biometric word list may be imprecise. Later, it was used in PGP to compare and verify PGP public key fingerprints over a voice channel, which PGP applications call the "biometric" representation. When the list was applied to PGP, it was further refined, with contributions by Jon Callas.

The scheme uses two lists, each containing 256 phonetically distinct words, with each word representing a different byte value between 0 and 255. Two lists are used because reading aloud long random sequences of words risks three kinds of errors: 1) transposition of two consecutive words, 2) duplicated words, or 3) omitted words. To detect all three kinds of errors, the two lists are used alternately for the even-offset bytes and the odd-offset bytes in the byte sequence. Each byte value is therefore represented by two different words, depending on whether the byte appears at an even or an odd offset from the beginning of the byte sequence. Such an error typically places two words from the same list next to each other, which the listener can notice. The two lists are readily distinguished by the number of syllables: words in the even list have two syllables, and words in the odd list have three. The two-list scheme was suggested by Zhahai Stewart.
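The alternation check described above can be sketched in a few lines of Python. This is only an illustration, not the PGP implementation: the function name first_parity_error is hypothetical, and the lookup sets hold just four word pairs taken from the published list rather than all 256.

```python
# A few (two-syllable, three-syllable) word pairs from the list.
# Assumption: the real lists hold 256 words each; four pairs are enough here.
PAIRS = [
    ("bison", "butterfat"),
    ("breakup", "certify"),
    ("glitter", "Hamilton"),
    ("classic", "consulting"),
]
EVEN_WORDS = {even for even, _ in PAIRS}  # words used at even offsets
ODD_WORDS = {odd for _, odd in PAIRS}     # words used at odd offsets


def first_parity_error(words):
    """Return the index of the first word that does not come from the list
    expected at its offset, or None if the alternation is intact."""
    for offset, word in enumerate(words):
        expected = EVEN_WORDS if offset % 2 == 0 else ODD_WORDS
        if word not in expected:
            return offset
    return None


good = ["bison", "certify", "glitter", "consulting"]
transposed = ["certify", "bison", "glitter", "consulting"]
duplicated = ["bison", "certify", "certify", "glitter", "consulting"]
omitted = ["bison", "glitter", "consulting"]

for name, seq in [("good", good), ("transposed", transposed),
                  ("duplicated", duplicated), ("omitted", omitted)]:
    print(name, first_parity_error(seq))
```

In this sketch each of the three error types breaks the even/odd alternation at some offset. A word dropped from the very end of a sequence leaves the alternation intact, so this check alone would not flag it.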

Examples of words from the two parallel lists are as follows:

Byte (hex value)	Two-syllable word	Three-syllable word
20	bison	butterfat
29	breakup	certify
6B	glitter	Hamilton
FE	woodlark	yesteryear
38	classic	consulting
0D	ancient	asteroid
D2	standard	sensation

For example, the fourteen-digit hexadecimal sequence 20 29 6B FE 38 0D D2 (seven bytes) is represented by the seven-word sequence "bison certify glitter yesteryear classic asteroid standard": bytes at even offsets (0, 2, 4, 6) take the two-syllable word from the table, and bytes at odd offsets (1, 3, 5) take the three-syllable word.
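The encoding of this example can be reproduced with a minimal Python sketch. It assumes a partial lookup table containing only the seven byte values from the table above; the names WORDS and encode are illustrative and not taken from PGP.

```python
# Partial lookup table: only the seven byte values shown in the table.
# byte value -> (two-syllable word for even offsets,
#                three-syllable word for odd offsets)
WORDS = {
    0x20: ("bison", "butterfat"),
    0x29: ("breakup", "certify"),
    0x6B: ("glitter", "Hamilton"),
    0xFE: ("woodlark", "yesteryear"),
    0x38: ("classic", "consulting"),
    0x0D: ("ancient", "asteroid"),
    0xD2: ("standard", "sensation"),
}


def encode(data):
    """Map each byte to a word, choosing the list by the byte's offset parity."""
    return [WORDS[byte][offset % 2] for offset, byte in enumerate(data)]


print(" ".join(encode(bytes.fromhex("20296BFE380DD2"))))
# prints: bison certify glitter yesteryear classic asteroid standard
```

A byte value outside the seven listed here raises a KeyError in this sketch; a complete implementation would need all 256 entries of both lists.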