String metrics
From Wikipedia, the free encyclopedia
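String metrics (also called string similarity or string distance functions) measure how similar, or how far apart, two strings are. Not all of them are metrics in the strict mathematical sense: several are similarity coefficients or asymmetric divergences. The resulting scores can be used in fuzzy string searching, where a query is matched against strings that are close to it rather than identical.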
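As an illustration of how a distance can drive fuzzy searching, here is a minimal Python sketch built on the Levenshtein distance from the list below. It computes the distance with the standard Wagner-Fischer dynamic-programming recurrence. The helper names `levenshtein` and `fuzzy_search` and the `max_distance` threshold are illustrative assumptions, not the API of any particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (Wagner-Fischer DP)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row over the shorter string
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep a match)
            ))
        previous = current
    return previous[-1]


def fuzzy_search(query: str, candidates, max_distance: int = 2):
    """Hypothetical helper: return (candidate, distance) pairs whose
    Levenshtein distance to query is at most max_distance, closest first."""
    scored = [(c, levenshtein(query, c)) for c in candidates]
    hits = [(c, d) for c, d in scored if d <= max_distance]
    return sorted(hits, key=lambda pair: pair[1])  # stable: ties keep input order
```

For example, `levenshtein("kitten", "sitting")` returns 3 (two substitutions and one insertion). Because `fuzzy_search` computes a full distance against every candidate, this linear scan is only practical for small candidate lists.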
List of metrics
- Hamming distance
- Levenshtein distance
- Needleman-Wunsch distance or Sellers algorithm
- Smith-Waterman distance
- Gotoh distance or Smith-Waterman-Gotoh distance
- Monge-Elkan distance
- Block distance, L1 distance or city block distance
- Jaro distance metric
- Jaro-Winkler distance
- Soundex distance metric
- Matching coefficient
- Dice’s coefficient
- Jaccard similarity, Jaccard coefficient or Tanimoto coefficient
- Overlap coefficient
- Euclidean distance or L2 distance
- Cosine similarity
- Variational distance
- Hellinger distance (closely related to, but not the same as, the Bhattacharyya distance; both are derived from the Bhattacharyya coefficient)
- Information radius (Jensen-Shannon divergence)
- Harmonic mean
- Skew divergence
- Confusion probability
- Tau metric, an approximation of the Kullback-Leibler divergence
- Fellegi-Sunter metric (SFS)
- TF-IDF (TF/IDF)
- Maximal matches
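Several of the listed measures compare strings position by position or as collections of tokens. The Python sketch below covers Hamming distance and the Jaccard, Dice, overlap and cosine measures. It assumes that strings are tokenised into overlapping character bigrams; that choice, and the convention that two token-less strings count as identical, are assumptions made for illustration (word tokens or longer q-grams are equally possible).

```python
from collections import Counter
from math import sqrt


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def bigram_counts(s: str) -> Counter:
    """Multiset of overlapping character bigrams (assumed tokenisation)."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def jaccard(a: str, b: str) -> float:
    """Size of intersection / size of union of the bigram sets."""
    x, y = set(bigram_counts(a)), set(bigram_counts(b))
    if not x and not y:
        return 1.0  # assumed convention for two token-less strings
    return len(x & y) / len(x | y)


def dice(a: str, b: str) -> float:
    """2 * size of intersection / (|X| + |Y|) over the bigram sets."""
    x, y = set(bigram_counts(a)), set(bigram_counts(b))
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


def overlap(a: str, b: str) -> float:
    """Size of intersection / size of the smaller bigram set."""
    x, y = set(bigram_counts(a)), set(bigram_counts(b))
    if not x or not y:
        return 1.0 if x == y else 0.0
    return len(x & y) / min(len(x), len(y))


def cosine(a: str, b: str) -> float:
    """Cosine of the angle between the bigram count vectors."""
    x, y = bigram_counts(a), bigram_counts(b)
    if not x or not y:
        return 1.0 if x == y else 0.0
    dot = sum(n * y[g] for g, n in x.items())  # missing bigrams count as 0
    norm = sqrt(sum(n * n for n in x.values())) * sqrt(sum(n * n for n in y.values()))
    return dot / norm
```

For "night" and "nacht" the only shared bigram is "ht", so `jaccard` gives 1/7 while `dice`, `overlap` and `cosine` each give 0.25. `hamming("karolin", "kathrin")` returns 3.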
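The Jaro and Jaro-Winkler measures in the list count characters that match within a sliding window and then penalise transpositions. The sketch below follows the commonly cited formulation: a match window of max(|s1|, |s2|) / 2 - 1, transpositions counted as half the number of out-of-order matched characters, and a Winkler prefix bonus with scaling factor p = 0.1 applied to at most 4 leading characters. These parameter defaults are the usual conventions rather than something this article specifies, and some variants only apply the prefix bonus above a threshold.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)  # how far apart matches may be
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:  # each s2 char matches at most once
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters taken in order; each disagreement is half a transposition.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix
    (p and max_prefix are the conventional defaults, assumed here)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

With these conventions, "MARTHA" and "MARHTA" have six matches and one transposition, giving a Jaro similarity of 17/18 (about 0.944) and a Jaro-Winkler similarity of about 0.961 thanks to the shared prefix "MAR".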
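The distributional measures in the list (such as Hellinger distance and the Jensen-Shannon divergence) compare probability distributions rather than strings directly, so a string first has to be turned into a distribution. The sketch below assumes, purely for illustration, that each string is represented by the relative frequencies of its characters. It computes the Hellinger distance scaled to [0, 1], and the Jensen-Shannon divergence in bits as the average Kullback-Leibler divergence of each distribution from their midpoint.

```python
from collections import Counter
from math import log2, sqrt


def char_distribution(s: str) -> dict:
    """Relative frequency of each character in s (assumed representation)."""
    if not s:
        raise ValueError("an empty string has no character distribution")
    return {ch: n / len(s) for ch, n in Counter(s).items()}


def hellinger(p: dict, q: dict) -> float:
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    keys = p.keys() | q.keys()
    total = sum((sqrt(p.get(k, 0.0)) - sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return sqrt(total / 2)


def kl_divergence(p: dict, q: dict) -> float:
    """Kullback-Leibler divergence D(p || q) in bits.
    Requires q[k] > 0 wherever p[k] > 0, otherwise it is infinite."""
    return sum(pk * log2(pk / q[k]) for k, pk in p.items() if pk > 0)


def jensen_shannon(p: dict, q: dict) -> float:
    """Average KL divergence of p and q from their midpoint m; in [0, 1] bits."""
    keys = p.keys() | q.keys()
    m = {k: (p.get(k, 0.0) + q.get(k, 0.0)) / 2 for k in keys}
    return (kl_divergence(p, m) + kl_divergence(q, m)) / 2
```

Because the midpoint m is positive wherever either input is, the Jensen-Shannon divergence stays finite even when the two strings share no characters, unlike the plain Kullback-Leibler divergence. For "ab" and "cd" both functions return 1.0, their maximum.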
External links
- http://www.dcs.shef.ac.uk/~sam/stringmetrics.html: a fairly complete overview of string metrics