
Boyer–Moore string-search algorithm


to skip sections of the text, resulting in a lower constant factor than many other string search algorithms. In general, the algorithm runs faster as the pattern length increases. The key features of the algorithm are to match on the tail of the pattern rather than the head, and to skip along the text in jumps of multiple characters rather than searching every single character in the text.
Before the introduction of this algorithm, the usual way to search within text was to examine each character of the text for the first character of the pattern. Once that was found, the subsequent characters of the text would be compared to the characters of the pattern. If no match occurred then
in the pattern, then a partial shift of the pattern along the text is done to line up along the matching character and the process is repeated. Jumping along the text to make comparisons rather than checking every character in the text decreases the number of comparisons that have to be made, which
being searched for (the pattern), but not the string being searched in (the text). It is thus well-suited for applications in which the pattern is much shorter than the text or where it persists across multiple searches. The Boyer–Moore algorithm uses information gathered during the preprocessing step
Index 1: we matched the N, and it was preceded by something other than A. Now look at the pattern starting from the end: where do we have N preceded by something other than A? There are two other N's, but both are preceded by A. That means no part of the good suffix can be useful to us, so shift by
The good-suffix rule is markedly more complex in both concept and implementation than the bad-character rule. Like the bad-character rule, it also exploits the algorithm's feature of comparisons beginning at the end of the pattern and proceeding towards the pattern's start. It can be described as
is reached (which means there is a match) or a mismatch occurs, upon which the alignment is shifted forward (to the right) according to the maximum value permitted by a number of rules. The comparisons are performed again at the new alignment, and the process repeats until the alignment is shifted
The key insight in this algorithm is that if the end of the pattern is compared to the text, then jumps along the text can be made rather than checking every character of the text. The reason that this works is that in lining up the pattern against the text, the last character of the pattern is
speeds up the process of checking whether a match has occurred at the given alignment by skipping explicit character comparisons. This uses information gleaned during the pre-processing of the pattern in conjunction with suffix match lengths recorded at each match attempt. Storing suffix match
compared to the character in the text. If the characters do not match, there is no need to continue searching backwards along the text. If the character in the text does not match any of the characters in the pattern, then the next character in the text to check is located
in 1977. The original paper contained static tables for computing the pattern shifts without an explanation of how to produce them. The algorithm for producing the tables was published in a follow-on paper; this paper contained errors which were later corrected by
Index 0: no characters matched, and the character read was not an N. The good-suffix length is zero. Since there are plenty of letters in the pattern that are also not N, we have minimal information here, and shifting by 1 is the least interesting result.
Index | Mismatch | Shift
0 | N | 1
1 | AN | 8
2 | MAN | 3
3 | NMAN | 6
4 | ANMAN | 6
5 | PANMAN | 6
6 | NPANMAN | 6
7 | ANPANMAN | 6
Index 2: We matched the AN, and it was preceded by something other than M. In the middle of the pattern there is an AN preceded by P, so it becomes the shift candidate. Shifting that AN to the right to line up with our match is a shift of 3.
Methods vary on the exact form the table for the bad-character rule should take, but a simple constant-time lookup solution is as follows: create a 2D table which is indexed first by the index of the character
in 1979. As opposed to shifting, the Galil rule deals with speeding up the actual comparisons done at each alignment by skipping sections that are known to match. Suppose that at an alignment
A shift is calculated by applying two rules: the bad-character rule and the good-suffix rule. The actual shifting offset is the maximum of the shifts calculated by these rules.
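How the two rules combine can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the bad-character rule is reduced to a last-occurrence dictionary instead of a full table, the good-suffix shifts are found by brute force instead of linear-time preprocessing, and `k` here marks the text index aligned with the start of the pattern.

```python
def good_suffix_shifts(pattern: str) -> list[int]:
    """shifts[c] is the good-suffix shift after c characters have matched.

    Brute-force construction, chosen for clarity over speed (an assumption
    of this sketch). shifts[len(pattern)] is the shift after a full match.
    """
    m = len(pattern)
    shifts = []
    for matched in range(m + 1):
        j = m - 1 - matched  # index of the mismatched pattern character (-1 on a full match)
        for s in range(1, m + 1):
            # The matched suffix must agree with the pattern moved right by s.
            suffix_ok = all(
                pattern[i - s] == pattern[i]
                for i in range(j + 1, m)
                if i - s >= 0
            )
            # The character landing on the mismatch position must be a different one.
            differs = j - s < 0 or pattern[j - s] != pattern[j]
            if suffix_ok and differs:
                shifts.append(s)  # s == m always qualifies, so this is always reached
                break
    return shifts


def boyer_moore_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    last = {ch: i for i, ch in enumerate(pattern)}  # rightmost index of each character
    good = good_suffix_shifts(pattern)
    matches = []
    k = 0  # text index aligned with pattern[0]
    while k <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[k + j]:
            j -= 1  # compare from the end of the pattern backwards
        if j < 0:
            matches.append(k)
            k += good[m]
        else:
            bad_char = j - last.get(text[k + j], -1)  # may be zero or negative
            k += max(bad_char, good[m - 1 - j])       # take the larger proposal
    return matches


print(boyer_moore_search("XANPANMANPANMANX", "ANPANMAN"))  # prints [1, 7]
```

Because the good-suffix shift is always at least 1, taking the maximum keeps the search moving forward even when the simplified bad-character rule proposes a non-positive shift.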
improves the performance of the Boyer–Moore–Horspool algorithm. The order in which it compares the characters of a sub-string in a given string differs from that of the Boyer–Moore–Horspool algorithm.
Index 3 & up: the matched suffixes do not match anything else in the pattern, but the trailing suffix AN matches the start of the pattern, so the shifts here are all 6.
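The shifts listed for ANPANMAN can be reproduced with a short brute-force check. The sketch below is an illustration only, not the linear-time preprocessing used in practice, and the function name `good_suffix_shifts` is hypothetical. For each number of matched characters it tries shifts 1, 2, 3, and so on, and keeps the first one where the matched suffix still lines up with the shifted pattern and the character landing on the mismatch position differs from the one that just failed.

```python
def good_suffix_shifts(pattern: str) -> list[int]:
    """shifts[c] is the good-suffix shift when c trailing characters matched."""
    m = len(pattern)
    shifts = []
    for matched in range(m):
        j = m - 1 - matched  # index of the pattern character that mismatched
        for s in range(1, m + 1):
            # Every matched position must see the same character after the shift
            # (positions that fall off the left end of the pattern are unconstrained).
            suffix_ok = all(
                pattern[i - s] == pattern[i]
                for i in range(j + 1, m)
                if i - s >= 0
            )
            # The character moved onto position j must not repeat the failed one.
            differs = j - s < 0 or pattern[j - s] != pattern[j]
            if suffix_ok and differs:
                shifts.append(s)  # s == m always qualifies, so the loop always ends here
                break
    return shifts


print(good_suffix_shifts("ANPANMAN"))  # prints [1, 8, 3, 6, 6, 6, 6, 6]
```

The list index corresponds to the Index column: entry 1 is the full-length shift of 8, entry 2 is the shift of 3 onto the AN preceded by P, and every later entry is 6 because only the trailing AN matches the start of the pattern.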
is the size of the alphabet. This space is for the original delta1 bad-character table in the C and Java implementations and the good-suffix table.
the text would again be checked character by character in an effort to find a match. Thus almost every character in the text needs to be examined.
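For comparison, the character-by-character approach described here can be sketched as follows. The helper name `naive_search` is an assumption made for illustration; it tries every alignment in turn and compares from the head of the pattern.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    matches = []
    for k in range(n - m + 1):  # every alignment is tried, none is skipped
        j = 0
        while j < m and text[k + j] == pattern[j]:
            j += 1  # compare left to right from the first pattern character
        if j == m:
            matches.append(k)
    return matches


print(naive_search("XANPANMANPANMANX", "ANPANMAN"))  # prints [1, 7]
```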
The Galil rule, in its original version, is only effective for versions that output multiple matches. It updates the substring range only on
in the worst case. This is easy to see when both pattern and text consist solely of the same repeated character. However, inclusion of the
In addition to increasing the efficiency of Boyer–Moore, the Galil rule is required for proving linear-time execution in the worst case.
at which the comparison process failed (assuming such a failure occurred). The next occurrence of that character to the left in
is found), and another for use when the general case returns no meaningful result. These tables will be designated
The shift rules are implemented as constant-time table lookups, using tables generated during the preprocessing of
with the last instance (the least shift amount) taking precedence. All unused characters are set as
i.e. a full match. A generalized version for dealing with submatches was reported in 1985 as the
Soda '91. Philadelphia, Pennsylvania: Society for Industrial and Applied Mathematics: 224–233.
is the length of the pattern string, which we are searching for in the text, which is of length
The pattern is shifted right (in this case by 2) so that the next occurrence of the character
Rytter, Wojciech (1980). "A Correct Preprocessing Algorithm for Boyer–Moore String-Searching".
space complexity (make_delta1, makeCharTable). This is the same as the original delta1 and the
3449: 3391: 3251: 3179: 944:
is found, and a shift which brings that occurrence in line with the mismatched occurrence in
The Boyer–Moore algorithm as presented in the original paper has worst-case running time of
that is the standard benchmark for practical string-search literature. It was developed by
Gusfield, Dan (1999) , "Chapter 2 - Exact Matching: Classical Comparison-Based Methods",
This article incorporates text from this source, which is available under the
The good-suffix rule requires two tables: one for use in the general case (where a copy
"On improving the worst case running time of the Boyer–Moore string matching algorithm"
This runtime is for finding all occurrences of the pattern, without the Galil rule.
is a simplification of the Boyer–Moore algorithm using only the bad-character rule.
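A search that relies on the bad-character idea alone can be sketched as below. The sketch follows the commonly described Horspool formulation, in which the shift is chosen from the text character aligned with the last pattern position; the function name `horspool_search` and the dictionary-based table are assumptions made for illustration.

```python
def horspool_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # For each character among the first m - 1 pattern positions, store the
    # distance from its rightmost occurrence to the last pattern position.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    matches = []
    k = 0  # text index aligned with pattern[0]
    while k <= n - m:
        if text[k:k + m] == pattern:
            matches.append(k)
        # Characters absent from the table allow a jump of the full pattern length.
        k += shift.get(text[k + m - 1], m)
    return matches


print(horspool_search("XANPANMANPANMANX", "ANPANMAN"))  # prints [1, 7]
```

The single table keeps preprocessing small, at the cost of giving up the good-suffix shifts.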
for predicate based matching within ranges as a part of the Phobos Runtime Library.
by performing explicit character comparisons at different alignments. Instead of a
lengths requires an additional table equal in size to the text being searched.
2506: 3063:. SFCS '77. Washington, District of Columbia: IEEE Computer Society: 189–195. 2814: 3530: 3226: 3203: 3095:"Tight bounds on the complexity of the Boyer–Moore string matching algorithm" 2995: 2883: 2848: 2810: 2777: 2498: 3057:"A new proof of the linearity of the Boyer–Moore string searching algorithm" 3429: 3246: 3094: 3061:
Proceedings of the 18th Annual Symposium on Foundations of Computer Science
2802: 2721: 2603: 2490: 121: 2986: 2969: 2768: 2751: 3439: 3068: 1944:
is defined to be zero if there is no position satisfying the condition.
Proceedings of the 2nd Annual ACM-SIAM Symposium on Discrete Algorithms
Hume, Andrew; Sunday, Daniel (November 1991). "Fast String Searching".
is proposed. If the mismatched character does not occur to the left in
or -1 if there is no such occurrence. The proposed shift will then be
Various implementations exist in different programming languages. In
A simple but important optimization of Boyer–Moore was put forth by
and such that the character preceding that suffix is not equal to
(10). New York: Association for Computing Machinery: 762–772.
so that a prefix of the shifted pattern matches a suffix of
"The Boyer–Moore–Galil String Searching Strategies Revisited"
(9). New York: Association for Computing Machinery: 505–508.
"Constructing a Good Suffix Table - Understanding an example"
occur in the text, running time of the original algorithm is
to the left of the current character (which is the middle
moving backward. The strings are matched from the end of
is the length of the pattern. If the character in the text
3164: 3017:
Apostolico, Alberto; Giancarlo, Raffaele (February 1986).
(1 ed.), Cambridge University Press, pp. 19–21,
in the pattern. This lookup will return the occurrence of
2618: 710:. The comparisons continue until either the beginning of 612:), Boyer–Moore uses information gained by preprocessing 2261: 1664:
to the right by the least amount (past the left end of
is the largest such substring for the given alignment.
The Boyer–Moore algorithm searches for occurrences of
denotes the input text to be searched. Its length is
Richard Cole's 1991 paper proving runtime linearity
3016: 2586:it is part of the Standard Library since C++17 and 2560: 2475: 2413:can be recorded without explicitly comparing past 2244: 2207: 2168: 2121: 2088: 2051: 2014: 1972: 1934: 1901: 1862: 1817: 1771: 1252: 1216: 1168: 1140: 1100: 1064: 1031: 1001: 936:The bad-character rule considers the character in 660: 602: 391:denotes the string to be searched for, called the 2792: 952:, a shift is proposed that moves the entirety of 899:Demonstration of bad-character rule with pattern 640:More formally, the algorithm begins at alignment 3528: 2801: 1740:respectively. Their definitions are as follows: 719:, which means no further matches will be found. 2388:. Thus if the comparisons get down to position 1469:Demonstration of good-suffix rule with pattern 637:is the key to the efficiency of the algorithm. 915:(in the pattern) in the column marked with an 3180: 3047: 2489:appear in the text. This was first proved by 1550:Then find, if it exists, the right-most copy 2574:results in linear runtime across all cases. 1117:The C and Java implementations below have a 1110:space, assuming a finite alphabet of length 3145:Original paper on the Boyer-Moore algorithm 2902:Algorithms on Strings, Trees, and Sequences 2372:, in the next comparison phase a prefix of 1982:denote the length of the largest suffix of 1658:does not exist, then shift the left end of 1217:{\displaystyle \operatorname {len} (p)-1-i} 3187: 3173: 2742: 2707: 2617:The Boyer–Moore algorithm is also used in 2065:Both of these tables are constructible in 1601:differs from the character to the left of 1154:. This table maps a character at position 2985: 2855: 2830: 2767: 1703:If no such shift is possible, then shift 737: 628:characters farther along the text, where 3359:Comparison of regular-expression engines 2899: 969:in the alphabet and second by the index 616:to skip as many alignments as possible. 
19:For the Boyer–Moore theorem prover, see 3150:An example of the Boyer-Moore algorithm 3121: 3088: 3086: 2960: 2958: 2736: 2527:comparisons in the worst case in 1991. 1265: 1253:{\displaystyle \operatorname {len} (p)} 3542:Computer-related introductions in 1977 3529: 2895: 2893: 2861: 576:of all alignments (of which there are 3320:Zhu–Takaoka string matching algorithm 3168: 2964: 2131:space. The alignment shift for index 2028:, if one exists. If none exists, let 3128:FreeBSD-current mailing list archive 3092: 3083: 2955: 2520:gave a proof with an upper bound of 2262:Shift Example using pattern ANPANMAN 686:are then compared starting at index 73:Θ(m) preprocessing + Ω(n/m) matching 3285:Boyer–Moore string-search algorithm 2890: 2752:"A Fast String Searching Algorithm" 2254:is zero or a match has been found. 98:Boyer–Moore string-search algorithm 60:Θ(m) preprocessing + O(mn) matching 13: 3041: 2815:"Fast pattern matching in strings" 2577: 2355:such that its left end is between 1781:is the largest position less than 1718: 1713:(length of P) places to the right. 14: 3553: 3374:Nondeterministic finite automaton 3315:Two-way string-matching algorithm 3138: 2710:Software: Practice and Experience 2288: 1586:and the character to the left of 1510:Suppose for a given alignment of 3122:Haertel, Mike (21 August 2010). 3093:Cole, Richard (September 1991). 2942: 1270: 959: 742: 516:such that the last character of 3115: 3010: 2516:comparisons in the worst case. 
2509:in 1980 with an upper bound of 1619:to the right so that substring 407:denotes the character at index 3290:Boyer–Moore–Horspool algorithm 3280:Apostolico–Giancarlo algorithm 3156:, co-inventor of the algorithm 2917: 2701: 2678: 2662: 2642:Apostolico–Giancarlo algorithm 2635:Boyer–Moore–Horspool algorithm 2602:there is an implementation in 2555: 2546: 2470: 2458: 2441: 2436:Apostolico–Giancarlo algorithm 2239: 2233: 2202: 2196: 2163: 2157: 2116: 2110: 2083: 2077: 2046: 2040: 2009: 1994: 1967: 1961: 1929: 1923: 1896: 1884: 1857: 1854: 1848: 1839: 1812: 1797: 1766: 1760: 1247: 1241: 1199: 1193: 1135: 1129: 1095: 1086: 1059: 1053: 907:. There is a mismatch between 729: 559: 132: 1: 2694: 674:is aligned with the start of 3295:Knuth–Morris–Pratt algorithm 3222:Damerau–Levenshtein distance 1688:. This includes cases where 981:with the next-highest index 956:past the point of mismatch. 7: 3486:Compressed pattern matching 3212:Approximate string matching 3194: 2628: 2278:the full pattern length 8. 
10: 3558: 3537:String matching algorithms 3491:Longest common subsequence 3402:Needleman–Wunsch algorithm 3272:String-searching algorithm 2592:generic Boyer–Moore search 102:string-searching algorithm 18: 16:String searching algorithm 3501:Sequential pattern mining 3468: 3415: 3382: 3349: 3341:Commentz-Walter algorithm 3329:Multiple string searching 3328: 3270: 3262:Wagner–Fischer algorithm 3202: 3023:SIAM Journal on Computing 2864:SIAM Journal on Computing 2819:SIAM Journal on Computing 2600:Go (programming language) 2594:implementation under the 2485:only if the pattern does 2380:must match the substring 2024:that is also a prefix of 77: 64: 51: 41: 31: 27:Boyer–Moore string search 3511:String rewriting systems 3496:Longest common substring 3407:Smith–Waterman algorithm 3232:Gestalt pattern matching 2655: 2608:D (programming language) 911:(in the input text) and 3445:Generalized suffix tree 3369:Thompson's construction 2221:should only be used if 1152:BMH bad-character table 543:occurs at an alignment 3397:Hirschberg's algorithm 3124:"why GNU grep is fast" 2722:10.1002/spe.4380211105 2562: 2477: 2476:{\displaystyle O(n+m)} 2246: 2209: 2170: 2123: 2090: 2063: 2053: 2016: 1974: 1946: 1936: 1903: 1864: 1819: 1773: 1716: 1634:aligns with substring 1254: 1218: 1170: 1142: 1102: 1066: 1033: 1003: 1002:{\displaystyle j<i} 738:The bad-character rule 662: 604: 520:is aligned with index 346:Alignments of pattern 3252:Levenshtein automaton 3242:Jaro–Winkler distance 3152:from the homepage of 2987:10.1145/359146.359148 2769:10.1145/359842.359859 2563: 2561:{\displaystyle O(nm)} 2501:in 1977, followed by 2478: 2247: 2210: 2171: 2124: 2091: 2054: 2017: 1975: 1947: 1937: 1904: 1865: 1820: 1774: 1742: 1694:is an exact match of 1508: 1262:as a sentinel value. 1255: 1219: 1171: 1143: 1103: 1101:{\displaystyle O(km)} 1067: 1034: 1004: 663: 605: 603:{\displaystyle n-m+1} 3300:Rabin–Karp algorithm 3257:Levenshtein distance 2807:Morris, James H. Jr. 
2540: 2452: 2227: 2184: 2145: 2122:{\displaystyle O(m)} 2104: 2089:{\displaystyle O(m)} 2071: 2034: 1988: 1955: 1917: 1878: 1833: 1827:matches a suffix of 1791: 1754: 1534:matches a suffix of 1266:The good-suffix rule 1232: 1184: 1160: 1141:{\displaystyle O(k)} 1123: 1080: 1065:{\displaystyle O(1)} 1047: 1017: 987: 646: 582: 364:. A match occurs at 3455:Ternary search tree 3069:10.1109/SFCS.1977.3 2405:, an occurrence of 2208:{\displaystyle m-H} 2169:{\displaystyle m-L} 1580:is not a suffix of 1032:{\displaystyle i-j} 661:{\displaystyle k=m} 28: 3384:Sequence alignment 3351:Regular expression 2968:(September 1979). 2931:. 11 December 2014 2558: 2473: 2322:down to character 2242: 2205: 2166: 2119: 2086: 2049: 2012: 1970: 1932: 1899: 1863:{\displaystyle P]} 1860: 1815: 1769: 1250: 1214: 1166: 1138: 1098: 1062: 1029: 999: 670:, so the start of 658: 600: 574:brute-force search 431:starting at index 415:, counting from 1. 26: 3524: 3523: 3516:String operations 2811:Pratt, Vaughan R. 2748:Moore, J Strother 2716:(11): 1221–1248. 2530:When the pattern 2314:is compared with 2245:{\displaystyle L} 2052:{\displaystyle H} 2015:{\displaystyle P} 1973:{\displaystyle H} 1935:{\displaystyle L} 1902:{\displaystyle P} 1818:{\displaystyle P} 1785:such that string 1772:{\displaystyle L} 1466: 1465: 1169:{\displaystyle i} 896: 895: 551:is equivalent to 489:is the length of 485:in range , where 462:is the length of 458:in range , where 343: 342: 90: 89: 3549: 3481:Pattern matching 3435:Suffix automaton 3237:Hamming distance 3189: 3182: 3175: 3166: 3165: 3154:J Strother Moore 3132: 3131: 3119: 3113: 3112: 3090: 3081: 3080: 3049:Guibas, Leonidas 3045: 3039: 3038: 3014: 3008: 3007: 2989: 2962: 2953: 2946: 2940: 2938: 2936: 2921: 2915: 2914: 2897: 2888: 2887: 2859: 2853: 2852: 2834: 2803:Knuth, Donald E. 2799: 2790: 2789: 2771: 2750:(October 1977). 2744:Boyer, Robert S. 
2740: 2734: 2733: 2705: 2688: 2682: 2676: 2666: 2612:BoyerMooreFinder 2569: 2567: 2565: 2564: 2559: 2526: 2515: 2484: 2482: 2480: 2479: 2474: 2433: 2421: 2412: 2404: 2396: 2387: 2379: 2371: 2362: 2354: 2345: 2337: 2329: 2321: 2313: 2305: 2253: 2251: 2249: 2248: 2243: 2220: 2216: 2214: 2212: 2211: 2206: 2177: 2175: 2173: 2172: 2167: 2138: 2134: 2130: 2128: 2126: 2125: 2120: 2097: 2095: 2093: 2092: 2087: 2060: 2058: 2056: 2055: 2050: 2027: 2023: 2021: 2019: 2018: 2013: 1981: 1979: 1977: 1976: 1971: 1943: 1941: 1939: 1938: 1933: 1910: 1908: 1906: 1905: 1900: 1871: 1869: 1867: 1866: 1861: 1826: 1824: 1822: 1821: 1816: 1784: 1780: 1778: 1776: 1775: 1770: 1747: 1739: 1735: 1729: 1655: 1625: 1592: 1577: 1556: 1493: 1447: 1442: 1437: 1400: 1395: 1390: 1382: 1377: 1372: 1347: 1342: 1337: 1276: 1261: 1259: 1257: 1256: 1251: 1225: 1223: 1221: 1220: 1215: 1177: 1175: 1173: 1172: 1167: 1149: 1147: 1145: 1144: 1139: 1113: 1109: 1107: 1105: 1104: 1099: 1074:lookup time and 1073: 1071: 1069: 1068: 1063: 1040: 1038: 1036: 1035: 1030: 1010: 1008: 1006: 1005: 1000: 980: 976: 972: 968: 955: 951: 947: 943: 939: 923:(in the pattern 874: 831: 800: 748: 725: 718: 715:past the end of 713: 709: 706:to the start of 705: 701: 697: 693: 689: 685: 681: 678:. Characters in 677: 673: 669: 667: 665: 664: 659: 631: 627: 615: 611: 609: 607: 606: 601: 571: 567: 395:. 
Its length is 138: 110:J Strother Moore 100:is an efficient 94:computer science 82:space complexity 29: 25: 3557: 3556: 3552: 3551: 3550: 3548: 3547: 3546: 3527: 3526: 3525: 3520: 3464: 3411: 3378: 3364:Regular grammar 3345: 3324: 3305:Raita algorithm 3266: 3217:Bitap algorithm 3198: 3193: 3141: 3136: 3135: 3120: 3116: 3109: 3091: 3084: 3053:Odlyzko, Andrew 3046: 3042: 3035:10.1137/0215007 3015: 3011: 2963: 2956: 2934: 2932: 2923: 2922: 2918: 2912: 2898: 2891: 2876:10.1137/0209037 2860: 2856: 2841:10.1137/0206024 2800: 2793: 2741: 2737: 2706: 2702: 2697: 2692: 2691: 2683: 2679: 2667: 2663: 2658: 2650:Raita algorithm 2631: 2580: 2578:Implementations 2541: 2538: 2537: 2535: 2521: 2510: 2453: 2450: 2449: 2447: 2444: 2426: 2420: 2414: 2406: 2398: 2395: 2389: 2381: 2373: 2370: 2364: 2356: 2353: 2347: 2339: 2331: 2323: 2315: 2307: 2304: 2298: 2291: 2268: 2264: 2259: 2257: 2228: 2225: 2224: 2222: 2218: 2185: 2182: 2181: 2179: 2146: 2143: 2142: 2140: 2136: 2132: 2105: 2102: 2101: 2099: 2072: 2069: 2068: 2066: 2035: 2032: 2031: 2029: 2025: 1989: 1986: 1985: 1983: 1956: 1953: 1952: 1950: 1918: 1915: 1914: 1912: 1879: 1876: 1875: 1873: 1834: 1831: 1830: 1828: 1792: 1789: 1788: 1786: 1782: 1755: 1752: 1751: 1749: 1745: 1737: 1733: 1727: 1721: 1653: 1623: 1590: 1575: 1554: 1503: 1502: 1501: 1491: 1445: 1440: 1435: 1398: 1393: 1388: 1380: 1375: 1370: 1345: 1340: 1335: 1273: 1268: 1233: 1230: 1229: 1227: 1185: 1182: 1181: 1179: 1161: 1158: 1157: 1155: 1124: 1121: 1120: 1118: 1111: 1081: 1078: 1077: 1075: 1048: 1045: 1044: 1042: 1018: 1015: 1014: 1012: 988: 985: 984: 982: 978: 974: 970: 966: 962: 953: 949: 945: 941: 937: 934: 933: 932: 872: 829: 798: 745: 740: 732: 723: 716: 711: 707: 703: 699: 695: 691: 687: 683: 679: 675: 671: 647: 644: 643: 641: 629: 625: 613: 583: 580: 579: 577: 569: 565: 562: 477:is a substring 450:is a substring 371: 370: 369: 355: 135: 115:Wojciech Rytter 106:Robert S. 
Boyer 24: 17: 12: 11: 5: 3555: 3545: 3544: 3539: 3522: 3521: 3519: 3518: 3513: 3508: 3503: 3498: 3493: 3488: 3483: 3478: 3472: 3470: 3466: 3465: 3463: 3462: 3457: 3452: 3447: 3442: 3437: 3432: 3427: 3421: 3419: 3417:Data structure 3413: 3412: 3410: 3409: 3404: 3399: 3394: 3388: 3386: 3380: 3379: 3377: 3376: 3371: 3366: 3361: 3355: 3353: 3347: 3346: 3344: 3343: 3338: 3332: 3330: 3326: 3325: 3323: 3322: 3317: 3312: 3310:Trigram search 3307: 3302: 3297: 3292: 3287: 3282: 3276: 3274: 3268: 3267: 3265: 3264: 3259: 3254: 3249: 3244: 3239: 3234: 3229: 3224: 3219: 3214: 3208: 3206: 3200: 3199: 3192: 3191: 3184: 3177: 3169: 3163: 3162: 3157: 3147: 3140: 3139:External links 3137: 3134: 3133: 3114: 3107: 3082: 3040: 3009: 2954: 2929:Stack Overflow 2916: 2910: 2889: 2870:(3): 509–512. 2854: 2832:10.1.1.93.8147 2825:(2): 323–350. 2791: 2735: 2699: 2698: 2696: 2693: 2690: 2689: 2677: 2660: 2659: 2657: 2654: 2630: 2627: 2579: 2576: 2557: 2554: 2551: 2548: 2545: 2472: 2469: 2466: 2463: 2460: 2457: 2443: 2440: 2418: 2393: 2368: 2351: 2346:is shifted to 2302: 2290: 2289:The Galil rule 2287: 2265: 2263: 2260: 2241: 2238: 2235: 2232: 2204: 2201: 2198: 2195: 2192: 2189: 2165: 2162: 2159: 2156: 2153: 2150: 2118: 2115: 2112: 2109: 2085: 2082: 2079: 2076: 2048: 2045: 2042: 2039: 2011: 2008: 2005: 2002: 1999: 1996: 1993: 1969: 1966: 1963: 1960: 1931: 1928: 1925: 1922: 1898: 1895: 1892: 1889: 1886: 1883: 1859: 1856: 1853: 1850: 1847: 1844: 1841: 1838: 1814: 1811: 1808: 1805: 1802: 1799: 1796: 1768: 1765: 1762: 1759: 1720: 1717: 1715: 1714: 1701: 1647: 1522:, a substring 1468: 1467: 1464: 1463: 1460: 1457: 1454: 1451: 1448: 1443: 1438: 1433: 1430: 1427: 1424: 1421: 1417: 1416: 1413: 1410: 1407: 1404: 1401: 1396: 1391: 1386: 1383: 1378: 1373: 1368: 1364: 1363: 1360: 1357: 1354: 1351: 1348: 1343: 1338: 1333: 1330: 1327: 1324: 1321: 1317: 1316: 1313: 1310: 1307: 1304: 1301: 1298: 1295: 1292: 1289: 1286: 1283: 1280: 1274: 1272: 1269: 1267: 1264: 1249: 1246: 1243: 1240: 1237: 1213: 1210: 1207: 
1204: 1201: 1198: 1195: 1192: 1189: 1165: 1137: 1134: 1131: 1128: 1097: 1094: 1091: 1088: 1085: 1061: 1058: 1055: 1052: 1028: 1025: 1022: 998: 995: 992: 961: 958: 898: 897: 894: 893: 890: 887: 884: 881: 878: 875: 870: 867: 864: 861: 857: 856: 853: 850: 847: 844: 841: 838: 835: 832: 827: 824: 820: 819: 816: 813: 810: 807: 804: 801: 796: 793: 790: 787: 783: 782: 779: 776: 773: 770: 767: 764: 761: 758: 755: 752: 746: 744: 741: 739: 736: 731: 728: 657: 654: 651: 599: 596: 593: 590: 587: 561: 558: 557: 556: 529: 494: 467: 440: 435:and ending at 416: 400: 384: 345: 344: 341: 340: 337: 334: 331: 328: 325: 322: 319: 316: 312: 311: 308: 305: 302: 299: 296: 293: 290: 287: 283: 282: 279: 276: 273: 270: 267: 264: 261: 258: 254: 253: 250: 247: 244: 241: 238: 235: 232: 229: 225: 224: 221: 218: 215: 212: 209: 206: 203: 200: 196: 195: 192: 189: 186: 183: 180: 177: 174: 171: 167: 166: 163: 160: 157: 154: 151: 148: 145: 142: 136: 134: 131: 120:The algorithm 88: 87: 84: 75: 74: 71: 62: 61: 58: 49: 48: 43: 42:Data structure 39: 38: 33: 15: 9: 6: 4: 3: 2: 3554: 3543: 3540: 3538: 3535: 3534: 3532: 3517: 3514: 3512: 3509: 3507: 3504: 3502: 3499: 3497: 3494: 3492: 3489: 3487: 3484: 3482: 3479: 3477: 3474: 3473: 3471: 3467: 3461: 3458: 3456: 3453: 3451: 3448: 3446: 3443: 3441: 3438: 3436: 3433: 3431: 3428: 3426: 3423: 3422: 3420: 3418: 3414: 3408: 3405: 3403: 3400: 3398: 3395: 3393: 3390: 3389: 3387: 3385: 3381: 3375: 3372: 3370: 3367: 3365: 3362: 3360: 3357: 3356: 3354: 3352: 3348: 3342: 3339: 3337: 3334: 3333: 3331: 3327: 3321: 3318: 3316: 3313: 3311: 3308: 3306: 3303: 3301: 3298: 3296: 3293: 3291: 3288: 3286: 3283: 3281: 3278: 3277: 3275: 3273: 3269: 3263: 3260: 3258: 3255: 3253: 3250: 3248: 3245: 3243: 3240: 3238: 3235: 3233: 3230: 3228: 3227:Edit distance 3225: 3223: 3220: 3218: 3215: 3213: 3210: 3209: 3207: 3205: 3204:String metric 3201: 3197: 3190: 3185: 3183: 3178: 3176: 3171: 3170: 3167: 3161: 3158: 3155: 3151: 3148: 3146: 3143: 3142: 3129: 3125: 3118: 3110: 3108:0-89791-376-0 
3104: 3100: 3096: 3089: 3087: 3078: 3074: 3070: 3066: 3062: 3058: 3054: 3050: 3044: 3036: 3032: 3028: 3024: 3020: 3013: 3005: 3001: 2997: 2993: 2988: 2983: 2979: 2975: 2971: 2967: 2961: 2959: 2952: 2950: 2945: 2930: 2926: 2920: 2913: 2911:0-521-58519-8 2907: 2903: 2896: 2894: 2885: 2881: 2877: 2873: 2869: 2865: 2858: 2850: 2846: 2842: 2838: 2833: 2828: 2824: 2820: 2816: 2812: 2808: 2804: 2798: 2796: 2787: 2783: 2779: 2775: 2770: 2765: 2761: 2757: 2753: 2749: 2745: 2739: 2731: 2727: 2723: 2719: 2715: 2711: 2704: 2700: 2686: 2681: 2674: 2670: 2665: 2661: 2653: 2651: 2646: 2643: 2638: 2636: 2626: 2624: 2620: 2615: 2613: 2609: 2605: 2601: 2597: 2593: 2590:provides the 2589: 2585: 2575: 2573: 2552: 2549: 2543: 2533: 2528: 2525: 2519: 2514: 2508: 2504: 2500: 2496: 2492: 2488: 2467: 2464: 2461: 2455: 2439: 2437: 2431: 2430: 2423: 2417: 2411: 2410: 2403: 2402: 2392: 2386: 2385: 2378: 2377: 2367: 2361: 2360: 2350: 2344: 2343: 2336: 2335: 2328: 2327: 2320: 2319: 2312: 2311: 2301: 2296: 2286: 2283: 2279: 2275: 2271: 2270:Explanation: 2258: 2255: 2236: 2230: 2199: 2193: 2190: 2187: 2160: 2154: 2151: 2148: 2113: 2107: 2098:time and use 2080: 2074: 2062: 2043: 2037: 2006: 2003: 2000: 1997: 1991: 1964: 1958: 1945: 1926: 1920: 1893: 1890: 1887: 1881: 1851: 1845: 1842: 1836: 1809: 1806: 1803: 1800: 1794: 1763: 1757: 1741: 1731: 1730: 1719:Preprocessing 1712: 1708: 1707: 1702: 1699: 1698: 1693: 1692: 1687: 1686: 1681: 1680: 1675: 1674: 1669: 1668: 1663: 1662: 1657: 1656: 1648: 1645: 1644: 1639: 1638: 1633: 1632: 1627: 1626: 1618: 1617: 1612: 1611: 1606: 1605: 1600: 1599: 1594: 1593: 1585: 1584: 1579: 1578: 1570: 1569: 1564: 1563: 1558: 1557: 1549: 1548: 1547: 1545: 1544: 1539: 1538: 1533: 1532: 1527: 1526: 1521: 1520: 1515: 1514: 1507: 1499: 1495: 1494: 1486: 1482: 1481: 1476: 1472: 1461: 1458: 1455: 1452: 1449: 1444: 1439: 1434: 1431: 1428: 1425: 1422: 1419: 1418: 1414: 1411: 1408: 1405: 1402: 1397: 1392: 1387: 1384: 1379: 1374: 1369: 1366: 1365: 1361: 1358: 1355: 1352: 1349: 1344: 

Index

Nqthm
String search
String
Worst-case
performance
Best-case
performance
Worst-case
space complexity
computer science
string-searching algorithm
Robert S. Boyer
J Strother Moore
Wojciech Rytter
preprocesses
string
substring
brute-force search
BMH bad-character table
Zvi Galil
Apostolico–Giancarlo algorithm
Knuth
Morris
Pratt
Guibas
Odlyzko
Richard Cole
Galil rule
C++
Boost

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.