Boyer–Moore–Horspool algorithm
From Wikipedia, the free encyclopedia
In computer science, the Boyer–Moore–Horspool algorithm or Horspool's algorithm is an algorithm for finding substrings in strings.
It is a simplification of the Boyer-Moore algorithm which is related to the Knuth-Morris-Pratt algorithm. The algorithm trades space for time in order to obtain an average case complexity of O(N) on random text, although it has O(MN) in the worst case.
[edit] Example implementation
Here is an example implementation of the Boyer-Moore-Horspool algorithm, written in C.
#include <string.h> #include <limits.h> /* The constant UCHAR_MAX is assumed to contain the maximum * value of the input character type. Typically it's 255. * size_t and ssize_t are unsigned and signed types for * representing sizes. (signed meaning that it can be negative.) * If your system doesn't have those, substitute them with * unsigned int and int respectively. */ static size_t max(ssize_t a, ssize_t b) { return a>b ? a : b; } /* Returns a pointer to the first occurrence of "needle" * within "haystack", or NULL if not found. */ const unsigned char* memmem_boyermoore_simplified (const unsigned char* haystack, size_t hlen, const unsigned char* needle, size_t nlen) { size_t occ[UCHAR_MAX+1]; size_t a, b, hpos; /* Sanity checks on the parameters */ if(nlen > hlen || nlen <= 0 || !haystack || !needle) return NULL; /* Preprocess */ /* Initialize the table to default value */ for(a=0; a<UCHAR_MAX+1; a=a+1) occ[a] = nlen; /* Then populate it with the analysis of the needle */ b=nlen; for(a=0; a<nlen; a=a+1) { b=b-1; occ[needle[a]] = b; } /* Start searching from the end of needle (this is not a typo) */ hpos = nlen-1; while(hpos < hlen) { /* Compare the needle backwards, and stop when first mismatch is found */ size_t npos = nlen-1; while(haystack[hpos] == needle[npos]) { if(npos == 0) return haystack + hpos; hpos = hpos-1; npos = npos-1; } /* Find out how much ahead we can skip based * on the byte that was found */ hpos += max(nlen - npos, occ[haystack[hpos]]); } return NULL; }
[edit] See also
[edit] External links
- FASMLIB contains implementation in x86 32bit assembly (procedure "mem.find")
- "BMH.ASM" is an implementation of the Boyer–Moore–Horspool algorithm in x86 32bit assembly. ("BM.ASM" implements the Boyer–Moore string search algorithm).