Caverphone
From Wikipedia, the free encyclopedia
The Caverphone phonetic matching algorithm was created in the Caversham Project at the University of Otago in New Zealand in 2002. The algorithm is designed to allow for the accents present in the study area (the southern part of the city of Dunedin, New Zealand).
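The exact algorithm is as follows. The steps are applied in order. Uppercase letters produced along the way are not changed by later rules. The digit 3 is a temporary placeholder for a vowel, and the digit 2 marks a character that will be removed.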
- Convert to lowercase
- Remove anything that is not a letter a-z
- If the name starts with
  - cough, make it cou2f
  - rough, make it rou2f
  - tough, make it tou2f
  - enough, make it enou2f
  - gn, make it 2n
- If the name ends with
  - mb, make it m2
- Replace (in this order)
  - cq with 2q
  - ci with si
  - ce with se
  - cy with sy
  - tch with 2ch
  - c with k
  - q with k
  - x with k
  - v with f
  - dg with 2g
  - tio with sio
  - tia with sia
  - d with t
  - ph with fh
  - b with p
  - sh with s2
  - z with s
  - any initial vowel with an A
  - all other vowels with a 3
  - 3gh3 with 3kh3
  - gh with 22
  - g with k
  - groups of the letter s with a single S
  - groups of the letter t with a single T
  - groups of the letter p with a single P
  - groups of the letter k with a single K
  - groups of the letter f with a single F
  - groups of the letter m with a single M
  - groups of the letter n with a single N
  - w3 with W3
  - wy with Wy
  - wh3 with Wh3
  - why with Why
  - w with 2
  - any initial h with an A
  - all other occurrences of h with a 2
  - r3 with R3
  - ry with Ry
  - r with 2
  - l3 with L3
  - ly with Ly
  - l with 2
  - j with y
  - y3 with Y3
  - y with 2
- Remove all 2s
- Remove all 3s
- Put six 1s on the end
- Take the first six characters as the code
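The steps above can be sketched in Python as an ordered table of regular-expression substitutions. Each substitution is applied to the whole string with `re.sub` before the next one runs. This is a minimal sketch, not a reference implementation. The names `_RULES` and `caverphone` are hypothetical. The sketch also assumes that the vowels are a, e, i, o and u, and that the `mb` rule applies only at the end of a name.

```python
import re

# Hypothetical rule table: ordered (regex, replacement) pairs transcribed from
# the step list. Each rule is applied to the whole string before the next one.
_RULES = [
    # Start-of-name rules
    (r"^cough", "cou2f"), (r"^rough", "rou2f"), (r"^tough", "tou2f"),
    (r"^enough", "enou2f"), (r"^gn", "2n"),
    # End-of-name rule (assumption: mb is matched only at the end of the name)
    (r"mb$", "m2"),
    # Consonant and letter-group replacements
    (r"cq", "2q"), (r"ci", "si"), (r"ce", "se"), (r"cy", "sy"),
    (r"tch", "2ch"), (r"c", "k"), (r"q", "k"), (r"x", "k"), (r"v", "f"),
    (r"dg", "2g"), (r"tio", "sio"), (r"tia", "sia"), (r"d", "t"),
    (r"ph", "fh"), (r"b", "p"), (r"sh", "s2"), (r"z", "s"),
    # Vowels (assumption: a, e, i, o, u): initial one -> A, the rest -> placeholder 3
    (r"^[aeiou]", "A"), (r"[aeiou]", "3"),
    (r"3gh3", "3kh3"), (r"gh", "22"), (r"g", "k"),
    # Collapse each run of these letters into a single uppercase letter
    (r"s+", "S"), (r"t+", "T"), (r"p+", "P"), (r"k+", "K"),
    (r"f+", "F"), (r"m+", "M"), (r"n+", "N"),
    # Capitalize w in the listed contexts; mark every other w with 2
    (r"w3", "W3"), (r"wy", "Wy"), (r"wh3", "Wh3"), (r"why", "Why"), (r"w", "2"),
    # An initial h becomes A; every other h is marked with 2
    (r"^h", "A"), (r"h", "2"),
    # Capitalize r, l and y before a vowel placeholder (or y); otherwise mark with 2
    (r"r3", "R3"), (r"ry", "Ry"), (r"r", "2"),
    (r"l3", "L3"), (r"ly", "Ly"), (r"l", "2"),
    (r"j", "y"), (r"y3", "Y3"), (r"y", "2"),
    # Drop the placeholders
    (r"2", ""), (r"3", ""),
]


def caverphone(name: str) -> str:
    """Encode name as a six-character Caverphone-style code (sketch)."""
    text = re.sub(r"[^a-z]", "", name.lower())  # lowercase, keep only a-z
    for pattern, replacement in _RULES:
        text = re.sub(pattern, replacement, text)
    return (text + "111111")[:6]  # append six 1s, keep the first six characters


if __name__ == "__main__":
    for n in ("Lee", "Thompson"):
        print(n, caverphone(n))
```

Running the script prints L11111 for Lee and TMPSN1 for Thompson. Keeping the rules in one ordered table makes their dependence on order explicit. For example, `l3` must be capitalized to `L3` before the general `l` rule turns every remaining lowercase `l` into 2.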
Examples
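Lee -> lee (lowercase) -> l33 (vowels to 3) -> L33 (l3 to L3) -> L (remove 3s) -> L111111 (add six 1s) -> L11111 (first six characters)
Thompson -> thompson (lowercase) -> th3mps3n (vowels to 3) -> Th3MPS3N (groups of s, t, p, m and n capitalized) -> T23MPS3N (h to 2) -> TMPSN (remove 2s and 3s) -> TMPSN111111 (add six 1s) -> TMPSN1 (first six characters)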
External links
- Project Dedupe http://dedupe.sourceforge.net