2944:
129:
to skip sections of the text, resulting in a lower constant factor than many other string search algorithms. In general, the algorithm runs faster as the pattern length increases. The key features of the algorithm are to match on the tail of the pattern rather than the head, and to skip along the text in jumps of multiple characters rather than searching every single character in the text.
619:
Before the introduction of this algorithm, the usual way to search within text was to examine each character of the text for the first character of the pattern. Once that was found, the subsequent characters of the text would be compared to the characters of the pattern. If no match occurred, then
636:
in the pattern, then a partial shift of the pattern along the text is done to line up along the matching character and the process is repeated. Jumping along the text to make comparisons rather than checking every character in the text decreases the number of comparisons that have to be made, which
128:
being searched for (the pattern), but not the string being searched in (the text). It is thus well-suited for applications in which the pattern is much shorter than the text or where it persists across multiple searches. The Boyer–Moore algorithm uses information gathered during the preprocessing step
2277:
Index 1: we matched the N, and it was preceded by something other than A. Looking at the pattern from the end, where is there an N preceded by something other than A? There are two other N's, but both are preceded by A. That means no part of the good suffix can be useful to us, so shift by
1505:
The good-suffix rule is markedly more complex in both concept and implementation than the bad-character rule. Like the bad-character rule, it also exploits the algorithm's feature of comparisons beginning at the end of the pattern and proceeding towards the pattern's start. It can be described as
714:
is reached (which means there is a match) or a mismatch occurs, upon which the alignment is shifted forward (to the right) according to the maximum value permitted by a number of rules. The comparisons are performed again at the new alignment, and the process repeats until the alignment is shifted
623:
The key insight in this algorithm is that if the end of the pattern is compared to the text, then jumps along the text can be made rather than checking every character of the text. The reason that this works is that in lining up the pattern against the text, the last character of the pattern is
2644:
speeds up the process of checking whether a match has occurred at the given alignment by skipping explicit character comparisons. This uses information gleaned during the preprocessing of the pattern in conjunction with suffix match lengths recorded at each match attempt. Storing suffix match
624:
compared to the character in the text. If the characters do not match, there is no need to continue searching backwards along the text. If the character in the text does not match any of the characters in the pattern, then the next character in the text to check is located
112:
in 1977. The original paper contained static tables for computing the pattern shifts without an explanation of how to produce them. The algorithm for producing the tables was published in a follow-on paper; this paper contained errors which were later corrected by
2273:
Index 0: no characters matched; the character read was not an N. The good-suffix length is zero. Since there are plenty of letters in the pattern that are also not N, we have minimal information here, and shifting by 1 is the least interesting result.
2266:
Index | Mismatch | Shift
0 | N | 1
1 | AN | 8
2 | MAN | 3
3 | NMAN | 6
4 | ANMAN | 6
5 | PANMAN | 6
6 | NPANMAN | 6
7 | ANPANMAN | 6
2281:
Index 2: we matched the AN, and it was preceded by something other than M. In the middle of the pattern there is an AN preceded by P, so it becomes the shift candidate. Shifting that AN to the right to line up with our match is a shift of 3.
964:
Methods vary on the exact form the table for the bad-character rule should take, but a simple constant-time lookup solution is as follows: create a 2D table which is indexed first by the index of the character
2297:
in 1979. As opposed to shifting, the Galil rule deals with speeding up the actual comparisons done at each alignment by skipping sections that are known to match. Suppose that at an alignment
1222:
1258:
734:
A shift is calculated by applying two rules: the bad-character rule and the good-suffix rule. The actual shifting offset is the maximum of the shifts calculated by these rules.
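The way the two rules combine can be sketched in Python. This is a minimal illustration, not one of the implementations referred to elsewhere in this article: the function name `boyer_moore_search` and the helper `good_suffix_shift` are hypothetical, the bad-character table is the simplified one that stores only the rightmost index of each character, and the good-suffix shifts are computed by brute force rather than by the linear-time preprocessing.

```python
def boyer_moore_search(text, pattern):
    """Return all 0-based match positions of pattern in text (illustrative sketch)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Simplified bad-character table (an assumption of this sketch):
    # the rightmost index at which each character occurs in the pattern.
    last = {ch: i for i, ch in enumerate(pattern)}

    def good_suffix_shift(j):
        """Smallest shift consistent with a mismatch at index j (-1 = full match)."""
        for s in range(1, m):
            # The matched suffix must line up with an equal substring,
            # or run off the left end of the pattern.
            suffix_fits = all(q < s or pattern[q - s] == pattern[q]
                              for q in range(j + 1, m))
            # The character moved under the mismatch must differ.
            char_differs = j < s or pattern[j - s] != pattern[j]
            if suffix_fits and char_differs:
                return s
        return m

    # Brute-force preprocessing; gs[j + 1] is the shift for a mismatch at j.
    gs = [good_suffix_shift(j) for j in range(-1, m)]

    matches = []
    k = 0  # text index aligned with pattern[0]
    while k <= n - m:
        j = m - 1
        while j >= 0 and text[k + j] == pattern[j]:
            j -= 1  # compare from the end of the pattern backwards
        if j < 0:
            matches.append(k)
            k += gs[0]
        else:
            bad_char = j - last.get(text[k + j], -1)  # may be zero or negative
            k += max(1, bad_char, gs[j + 1])  # take the larger proposal
    return matches
```

Each rule on its own proposes a shift that cannot skip a match, so taking the maximum is safe; the `max(1, ...)` guard only matters when the simplified bad-character table proposes a non-positive shift.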
2652:
improves the performance of the Boyer–Moore–Horspool algorithm. Its pattern for searching for a particular substring in a given string differs from that of the Boyer–Moore–Horspool algorithm.
2285:
Index 3 & up: the matched suffixes do not match anything else in the pattern, but the trailing suffix AN matches the start of the pattern, so the shifts here are all 6.
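The shift column of the ANPANMAN example can be reproduced with a short brute-force Python sketch. The function name `good_suffix_shifts` is hypothetical, and the cubic-time search over candidate shifts is chosen for readability; it is not the linear-time table construction.

```python
def good_suffix_shifts(pattern):
    """Shift for each count of matched trailing characters (brute force)."""
    m = len(pattern)
    shifts = []
    for matched in range(m):       # characters matched from the end
        j = m - 1 - matched        # pattern index where the mismatch occurred
        for s in range(1, m + 1):
            # Every matched position must see an equal character after the
            # shift, or fall off the left end of the pattern.
            suffix_fits = all(q < s or pattern[q - s] == pattern[q]
                              for q in range(j + 1, m))
            # The character brought under the mismatch must be different.
            char_differs = j < s or pattern[j - s] != pattern[j]
            if suffix_fits and char_differs:
                shifts.append(s)   # s == m always qualifies, so this is reached
                break
    return shifts


print(good_suffix_shifts("ANPANMAN"))  # [1, 8, 3, 6, 6, 6, 6, 6]
```

The list is indexed by the number of characters matched before the mismatch, which corresponds to the Index column of the example.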
is the size of the alphabet. This space is for the original delta1 bad-character table in the C and Java implementations and the good-suffix table.
620:
the text would again be checked character by character in an effort to find a match. Thus almost every character in the text needs to be examined.
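For contrast, the character-by-character approach described here can be sketched in Python as follows. The function name `naive_search` is hypothetical and the code is only a minimal illustration of the older method.

```python
def naive_search(text, pattern):
    """Return every 0-based index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    matches = []
    for k in range(n - m + 1):     # try every alignment in turn
        j = 0
        while j < m and text[k + j] == pattern[j]:
            j += 1                 # compare left to right from the head
        if j == m:
            matches.append(k)
    return matches
```

Every alignment is tried, so almost every character of the text is examined at least once, which is the cost that skipping along the text avoids.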
3541:
2425:
The Galil rule, in its original version, is only effective for versions that output multiple matches. It updates the substring range only on
3424:
2570:
in the worst case. This is easy to see when both pattern and text consist solely of the same repeated character. However, inclusion of the
2422:. In addition to increasing the efficiency of Boyer–Moore, the Galil rule is required for proving linear-time execution in the worst case.
1151:
2924:
3319:
3358:
940:
at which the comparison process failed (assuming such a failure occurred). The next occurrence of that character to the left in
is found), and another for use when the general case returns no meaningful result. These tables will be designated
722:
The shift rules are implemented as constant-time table lookups, using tables generated during the preprocessing of
1226:, with the last instance—the least shift amount—taking precedence. All unused characters are set as
3495:
3299:
3231:
2831:
2607:
2434:, i.e. a full match. A generalized version for dealing with submatches was reported in 1985 as the
3444:
3101:. Soda '91. Philadelphia, Pennsylvania: Society for Industrial and Applied Mathematics: 224–233.
2587:
2948:
2943:
2671:
is the length of the pattern string, which we are searching for in the text, which is of length
919:. The pattern is shifted right (in this case by 2) so that the next occurrence of the character
2862:
Rytter, Wojciech (1980). "A Correct
Preprocessing Algorithm for Boyer–Moore String-Searching".
2826:
1150:
space complexity (make_delta1, makeCharTable). This is the same as the original delta1 and the
3449:
3391:
3251:
3179:
944:
is found, and a shift which brings that occurrence in line with the mismatched occurrence in
2451:
2446:
The Boyer–Moore algorithm as presented in the original paper has worst-case running time of
986:
3256:
3144:
2539:
1079:
581:
104:
that is the standard benchmark for practical string-search literature. It was developed by
2103:
2070:
1122:
1046:
8:
3454:
3159:
2900:
Gusfield, Dan (1999) , "Chapter 2 - Exact
Matching: Classical Comparison-Based Methods",
2183:
2144:
1016:
645:
2947: This article incorporates text from this source, which is available under the
1832:
1723:
The good-suffix rule requires two tables: one for use in the general case (where a copy
3383:
3350:
3072:
3048:
2999:
2970:"On improving the worst case running time of the Boyer–Moore string matching algorithm"
2781:
2725:
2502:
2226:
2033:
1987:
1954:
1916:
1877:
1790:
1753:
1159:
573:
3515:
3102:
2991:
2905:
2879:
2844:
2773:
2675:. This runtime is for finding all occurrences of the pattern, without the Galil rule.
3056:
2785:
2637:
is a simplification of the Boyer–Moore algorithm using only the bad-character rule.
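A minimal Python sketch of this bad-character-only variant is given below. The function name `horspool_search` is hypothetical, and the shift table follows the usual Horspool formulation, keyed on the text character aligned with the last pattern position.

```python
def horspool_search(text, pattern):
    """Return all 0-based match positions using only a bad-character table."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Distance from the last occurrence of each character (ignoring the final
    # pattern position) to the end of the pattern. Later occurrences overwrite
    # earlier ones, so the smallest shift wins.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    matches = []
    k = 0  # text index aligned with pattern[0]
    while k <= n - m:
        j = m - 1
        while j >= 0 and text[k + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(k)
        # Shift by the table entry for the text character under the last
        # pattern position; characters absent from the table shift by m.
        k += shift.get(text[k + m - 1], m)
    return matches
```

For example, `horspool_search("ANPANMAN", "AN")` returns `[0, 3, 6]`.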
2614:
for predicate based matching within ranges as a part of the Phobos
Runtime Library.
by performing explicit character comparisons at different alignments. Instead of a
109:
93:
81:
3363:
3304:
3216:
2806:
2743:
2649:
2517:
2494:
114:
105:
68:
55:
3416:
3309:
3052:
2645:
lengths requires an additional table equal in size to the text being searched.
2506:
3063:. SFCS '77. Washington, District of Columbia: IEEE Computer Society: 189–195.
2814:
3530:
3226:
3203:
3095:"Tight bounds on the complexity of the Boyer–Moore string matching algorithm"
2995:
2883:
2848:
2810:
2777:
2498:
3057:"A new proof of the linearity of the Boyer–Moore string searching algorithm"
3429:
3246:
3094:
3061:
Proceedings of the 18th Annual
Symposium on Foundations of Computer Science
2802:
2721:
2603:
2490:
121:
2986:
2969:
2768:
2751:
3439:
3068:
1944:
is defined to be zero if there is no position satisfying the condition.
3099:
Proceedings of the 2nd Annual ACM-SIAM Symposium on
Discrete Algorithms
2708:
Hume, Andrew; Sunday, Daniel (November 1991). "Fast String
Searching".
2571:
948:
is proposed. If the mismatched character does not occur to the left in
1011:
or -1 if there is no such occurrence. The proposed shift will then be
2965:
2582:
Various implementations exist in different programming languages. In
2294:
424:
3034:
2875:
2840:
2293:
A simple but important optimization of Boyer–Moore was put forth by
1872:
and such that the character preceding that suffix is not equal to
3475:
2762:(10). New York: Association for Computing Machinery: 762–772.
1676:) so that a prefix of the shifted pattern matches a suffix of
3019:"The Boyer–Moore–Galil String Searching Strategies Revisited"
2980:(9). New York: Association for Computing Machinery: 505–508.
2925:"Constructing a Good Suffix Table - Understanding an example"
2583:
2534:
occur in the text, running time of the original algorithm is
20:
1275:
927:) to the left of the current character (which is the middle
747:
137:
3459:
2622:
702:, moving backward. The strings are matched from the end of
632:
is the length of the pattern. If the character in the text
3164:
3017:
Apostolico, Alberto; Giancarlo, Raffaele (February 1986).
2904:(1 ed.), Cambridge University Press, pp. 19–21,
973:
in the pattern. This lookup will return the occurrence of
2618:
710:. The comparisons continue until either the beginning of
612:), Boyer–Moore uses information gained by preprocessing
2261:
1664:
to the right by the least amount (past the left end of
1546:
is the largest such substring for the given alignment.
564:
The Boyer–Moore algorithm searches for occurrences of
denotes the input text to be searched. Its length is
2797:
2795:
3160:
Richard Cole's 1991 paper proving runtime linearity
3016:
2586:it is part of the Standard Library since C++17 and
2560:
2475:
2413:can be recorded without explicitly comparing past
936:The bad-character rule considers the character in
660:
602:
391:denotes the string to be searched for, called the
2792:
952:, a shift is proposed that moves the entirety of
899:Demonstration of bad-character rule with pattern
640:More formally, the algorithm begins at alignment
3528:
2801:
1740:respectively. Their definitions are as follows:
719:, which means no further matches will be found.
2388:. Thus if the comparisons get down to position
1469:Demonstration of good-suffix rule with pattern
637:is the key to the efficiency of the algorithm.
915:(in the pattern) in the column marked with an
3180:
3047:
2489:appear in the text. This was first proved by
1550:Then find, if it exists, the right-most copy
2574:results in linear runtime across all cases.
1117:The C and Java implementations below have a
1110:space, assuming a finite alphabet of length
3145:Original paper on the Boyer-Moore algorithm
2902:Algorithms on Strings, Trees, and Sequences
2372:, in the next comparison phase a prefix of
1982:denote the length of the largest suffix of
1658:does not exist, then shift the left end of
1217:{\displaystyle \operatorname {len} (p)-1-i}
3187:
3173:
2742:
2707:
2617:The Boyer–Moore algorithm is also used in
2065:Both of these tables are constructible in
1601:differs from the character to the left of
1154:. This table maps a character at position
2985:
2855:
2830:
2767:
1703:If no such shift is possible, then shift
737:
628:characters farther along the text, where
3359:Comparison of regular-expression engines
2899:
969:in the alphabet and second by the index
616:to skip as many alignments as possible.
19:For the Boyer–Moore theorem prover, see
3150:An example of the Boyer-Moore algorithm
3121:
3088:
3086:
2960:
2958:
2736:
2527:comparisons in the worst case in 1991.
1265:
1253:{\displaystyle \operatorname {len} (p)}
3542:Computer-related introductions in 1977
3529:
2895:
2893:
2861:
576:of all alignments (of which there are
3320:Zhu–Takaoka string matching algorithm
3168:
2964:
2131:space. The alignment shift for index
2028:, if one exists. If none exists, let
3128:FreeBSD-current mailing list archive
3092:
3083:
2955:
2520:gave a proof with an upper bound of
2262:Shift Example using pattern ANPANMAN
686:are then compared starting at index
73:Θ(m) preprocessing + Ω(n/m) matching
3285:Boyer–Moore string-search algorithm
2890:
2752:"A Fast String Searching Algorithm"
2254:is zero or a match has been found.
98:Boyer–Moore string-search algorithm
60:Θ(m) preprocessing + O(mn) matching
13:
3041:
2815:"Fast pattern matching in strings"
2577:
2355:such that its left end is between
1781:is the largest position less than
1718:
1713:(length of P) places to the right.
14:
3553:
3374:Nondeterministic finite automaton
3315:Two-way string-matching algorithm
3138:
2710:Software: Practice and Experience
2288:
1586:and the character to the left of
1510:Suppose for a given alignment of
3122:Haertel, Mike (21 August 2010).
3093:Cole, Richard (September 1991).
2942:
1270:
959:
742:
516:such that the last character of
3115:
3010:
2516:comparisons in the worst case.
2509:in 1980 with an upper bound of
1619:to the right so that substring
407:denotes the character at index
3290:Boyer–Moore–Horspool algorithm
3280:Apostolico–Giancarlo algorithm
3156:, co-inventor of the algorithm
2917:
2701:
2678:
2662:
2642:Apostolico–Giancarlo algorithm
2635:Boyer–Moore–Horspool algorithm
2602:there is an implementation in
2555:
2546:
2470:
2458:
2441:
2436:Apostolico–Giancarlo algorithm
2239:
2233:
2202:
2196:
2163:
2157:
2116:
2110:
2083:
2077:
2046:
2040:
2009:
1994:
1967:
1961:
1929:
1923:
1896:
1884:
1857:
1854:
1848:
1839:
1812:
1797:
1766:
1760:
1247:
1241:
1199:
1193:
1135:
1129:
1095:
1086:
1059:
1053:
907:. There is a mismatch between
729:
559:
132:
1:
2694:
674:is aligned with the start of
3295:Knuth–Morris–Pratt algorithm
3222:Damerau–Levenshtein distance
1688:. This includes cases where
981:with the next-highest index
956:past the point of mismatch.
7:
3486:Compressed pattern matching
3212:Approximate string matching
3194:
2628:
2278:the full pattern length 8.
10:
3558:
3537:String matching algorithms
3491:Longest common subsequence
3402:Needleman–Wunsch algorithm
3272:String-searching algorithm
2592:generic Boyer–Moore search
102:string-searching algorithm
18:
16:String searching algorithm
3501:Sequential pattern mining
3468:
3415:
3382:
3349:
3341:Commentz-Walter algorithm
3329:Multiple string searching
3328:
3270:
3262:Wagner–Fischer algorithm
3202:
3023:SIAM Journal on Computing
2864:SIAM Journal on Computing
2819:SIAM Journal on Computing
2600:Go (programming language)
2594:implementation under the
2485:only if the pattern does
2380:must match the substring
2024:that is also a prefix of
77:
64:
51:
41:
31:
27:Boyer–Moore string search
3511:String rewriting systems
3496:Longest common substring
3407:Smith–Waterman algorithm
3232:Gestalt pattern matching
2655:
2608:D (programming language)
911:(in the input text) and
3445:Generalized suffix tree
3369:Thompson's construction
2221:should only be used if
1152:BMH bad-character table
543:occurs at an alignment
3397:Hirschberg's algorithm
3124:"why GNU grep is fast"
2722:10.1002/spe.4380211105
2562:
2477:
2476:{\displaystyle O(n+m)}
2246:
2209:
2170:
2123:
2090:
2063:
2053:
2016:
1974:
1946:
1936:
1903:
1864:
1819:
1773:
1716:
1634:aligns with substring
1254:
1218:
1170:
1142:
1102:
1066:
1033:
1003:
1002:{\displaystyle j<i}
738:The bad-character rule
662:
604:
520:is aligned with index
346:Alignments of pattern
3252:Levenshtein automaton
3242:Jaro–Winkler distance
3152:from the homepage of
2987:10.1145/359146.359148
2769:10.1145/359842.359859
2563:
2561:{\displaystyle O(nm)}
2501:in 1977, followed by
2478:
2247:
2210:
2171:
2124:
2091:
2054:
2017:
1975:
1947:
1937:
1904:
1865:
1820:
1774:
1742:
1694:is an exact match of
1508:
1262:as a sentinel value.
1255:
1219:
1171:
1143:
1103:
1101:{\displaystyle O(km)}
1067:
1034:
1004:
663:
605:
603:{\displaystyle n-m+1}
3300:Rabin–Karp algorithm
3257:Levenshtein distance
2807:Morris, James H. Jr.
2540:
2452:
2227:
2184:
2145:
2122:{\displaystyle O(m)}
2104:
2089:{\displaystyle O(m)}
2071:
2034:
1988:
1955:
1917:
1878:
1833:
1827:matches a suffix of
1791:
1754:
1534:matches a suffix of
1266:The good-suffix rule
1232:
1184:
1160:
1141:{\displaystyle O(k)}
1123:
1080:
1065:{\displaystyle O(1)}
1047:
1017:
987:
646:
582:
364:. A match occurs at
3455:Ternary search tree
3069:10.1109/SFCS.1977.3
2405:, an occurrence of
2208:{\displaystyle m-H}
2169:{\displaystyle m-L}
1580:is not a suffix of
1032:{\displaystyle i-j}
661:{\displaystyle k=m}
28:
3384:Sequence alignment
3351:Regular expression
2968:(September 1979).
2931:. 11 December 2014
2558:
2473:
2322:down to character
2242:
2205:
2166:
2119:
2086:
2049:
2012:
1970:
1932:
1899:
1863:{\displaystyle P]}
1860:
1815:
1769:
1250:
1214:
1166:
1138:
1098:
1062:
1029:
999:
670:, so the start of
658:
600:
574:brute-force search
431:starting at index
415:, counting from 1.
26:
3524:
3523:
3516:String operations
2811:Pratt, Vaughan R.
2748:Moore, J Strother
2716:(11): 1221–1248.
2530:When the pattern
2314:is compared with
2245:{\displaystyle L}
2052:{\displaystyle H}
2015:{\displaystyle P}
1973:{\displaystyle H}
1935:{\displaystyle L}
1902:{\displaystyle P}
1818:{\displaystyle P}
1785:such that string
1772:{\displaystyle L}
1466:
1465:
1169:{\displaystyle i}
896:
895:
551:is equivalent to
489:is the length of
485:in range , where
462:is the length of
458:in range , where
343:
342:
90:
89:
3549:
3481:Pattern matching
3435:Suffix automaton
3237:Hamming distance
3189:
3182:
3175:
3166:
3165:
3154:J Strother Moore
3132:
3131:
3119:
3113:
3112:
3090:
3081:
3080:
3049:Guibas, Leonidas
3045:
3039:
3038:
3014:
3008:
3007:
2989:
2962:
2953:
2946:
2940:
2938:
2936:
2921:
2915:
2914:
2897:
2888:
2887:
2859:
2853:
2852:
2834:
2803:Knuth, Donald E.
2799:
2790:
2789:
2771:
2750:(October 1977).
2744:Boyer, Robert S.
2740:
2734:
2733:
2705:
2688:
2682:
2676:
2666:
2612:BoyerMooreFinder
1074:lookup time and
923:(in the pattern
874:
831:
800:
748:
725:
718:
715:past the end of
713:
709:
706:to the start of
705:
701:
697:
693:
689:
685:
681:
678:. Characters in
677:
673:
669:
667:
665:
664:
659:
631:
627:
615:
611:
609:
607:
606:
601:
571:
567:
395:. Its length is
138:
110:J Strother Moore
100:is an efficient
94:computer science
82:space complexity
29:
25:
3557:
3556:
3552:
3551:
3550:
3548:
3547:
3546:
3527:
3526:
3525:
3520:
3464:
3411:
3378:
3364:Regular grammar
3345:
3324:
3305:Raita algorithm
3266:
3217:Bitap algorithm
3198:
3193:
3141:
3136:
3135:
3120:
3116:
3109:
3091:
3084:
3053:Odlyzko, Andrew
3046:
3042:
3035:10.1137/0215007
3015:
3011:
2963:
2956:
2934:
2932:
2923:
2922:
2918:
2912:
2898:
2891:
2876:10.1137/0209037
2860:
2856:
2841:10.1137/0206024
2800:
2793:
2741:
2737:
2706:
2702:
2697:
2692:
2691:
2683:
2679:
2667:
2663:
2658:
2650:Raita algorithm
2631:
2580:
2578:Implementations
477:is a substring
450:is a substring
371:
370:
369:
355:
135:
115:Wojciech Rytter
106:Robert S. Boyer
24:
17:
12:
11:
5:
3417:Data structure
3310:Trigram search
3139:External links
3137:
3134:
3133:
3114:
3107:
3082:
3040:
3009:
2954:
2929:Stack Overflow
2916:
2910:
2889:
2870:(3): 509–512.
2854:
2832:10.1.1.93.8147
2825:(2): 323–350.
2791:
2735:
2699:
2698:
2696:
2693:
2690:
2689:
2677:
2660:
2659:
2657:
2654:
2630:
2627:
2579:
2576:
2557:
2554:
2551:
2548:
2545:
2472:
2469:
2466:
2463:
2460:
2457:
2443:
2440:
2418:
2393:
2368:
2351:
2346:is shifted to
2302:
2290:
2289:The Galil rule
1522:, a substring
435:and ending at
120:The algorithm
88:
87:
84:
75:
74:
71:
62:
61:
58:
49:
48:
43:
42:Data structure
39:
38:
33:
15:
9:
6:
4:
3:
2:
3227:Edit distance
3225:
3223:
3220:
3218:
3215:
3213:
3210:
3209:
3207:
3205:
3204:String metric
3201:
3197:
3190:
3185:
3183:
3178:
3176:
3171:
3170:
3167:
3161:
3158:
3155:
3151:
3148:
3146:
3143:
3142:
3129:
3125:
3118:
3110:
3108:0-89791-376-0
3104:
3100:
3096:
3089:
3087:
3078:
3074:
3070:
3066:
3062:
3058:
3054:
3050:
3044:
3036:
3032:
3028:
3024:
3020:
3013:
3005:
3001:
2997:
2993:
2988:
2983:
2979:
2975:
2971:
2967:
2961:
2959:
2952:
2950:
2945:
2930:
2926:
2920:
2913:
2911:0-521-58519-8
2907:
2903:
2896:
2894:
2885:
2881:
2877:
2873:
2869:
2865:
2858:
2850:
2846:
2842:
2838:
2833:
2828:
2824:
2820:
2816:
2812:
2808:
2804:
2798:
2796:
2787:
2783:
2779:
2775:
2770:
2765:
2761:
2757:
2753:
2749:
2745:
2739:
2731:
2727:
2723:
2719:
2715:
2711:
2704:
2700:
2686:
2681:
2674:
2670:
2665:
2661:
2653:
2651:
2646:
2643:
2638:
2636:
2626:
2624:
2620:
2615:
2613:
2609:
2605:
2601:
2597:
2593:
2590:provides the
2270:Explanation:
2258:
2255:
2236:
2230:
2199:
2193:
2190:
2187:
2160:
2154:
2151:
2148:
2113:
2107:
2098:time and use
2080:
2074:
2062:
2043:
2037:
2006:
2003:
2000:
1997:
1991:
1964:
1958:
1945:
1926:
1920:
1893:
1890:
1887:
1881:
1851:
1845:
1842:
1836:
1809:
1806:
1803:
1800:
1794:
1763:
1757:
1741:
1731:
1730:
1719:Preprocessing
960:Preprocessing
36:String search
34:
30:
22:
3430:Suffix array
3336:Aho–Corasick
3284:
3247:Lee distance
3127:
3117:
3098:
3060:
3043:
3026:
3022:
3012:
2977:
2973:
2949:CC BY-SA 3.0
2941:
2933:. Retrieved
2928:
2919:
2901:
2867:
2863:
2857:
2822:
2818:
2759:
2755:
2738:
2713:
2709:
2703:
2684:
2680:
2672:
2668:
2664:
2647:
2639:
2632:
2616:
2598:library. In
2595:
2581:
2531:
2529:
2523:
2518:Richard Cole
2512:
2486:
2445:
2428:
2427:
2424:
2415:
2408:
2407:
2400:
2399:
2390:
2383:
2382:
2375:
2374:
2365:
2358:
2357:
2348:
2341:
2340:
2333:
2332:
2325:
2324:
2317:
2316:
2309:
2308:
2299:
2292:
2284:
2280:
2276:
2272:
2269:
2256:
2139:is given by
2064:
1948:
1743:
1725:
1724:
1722:
1710:
1705:
1704:
1696:
1695:
1690:
1689:
1684:
1683:
1678:
1677:
1672:
1671:
1666:
1665:
1660:
1659:
1651:
1650:
1642:
1641:
1636:
1635:
1630:
1629:
1621:
1620:
1615:
1614:
1609:
1608:
1603:
1602:
1597:
1596:
1588:
1587:
1582:
1581:
1573:
1572:
1567:
1566:
1561:
1560:
1552:
1551:
1542:
1541:
1540:and suppose
1536:
1535:
1530:
1529:
1524:
1523:
1518:
1517:
1512:
1511:
1509:
1504:
1497:
1489:
1488:
1484:
1479:
1478:
1474:
1470:
1178:to shift by
1116:
963:
935:
928:
924:
920:
916:
912:
908:
904:
900:
733:
721:
639:
633:
622:
618:
563:
552:
548:
544:
540:
536:
532:
525:
521:
517:
513:
509:
508:is an index
505:
501:
497:
490:
486:
482:
478:
474:
470:
463:
459:
455:
451:
447:
443:
439:, inclusive.
436:
432:
428:
423:denotes the
419:
418:
412:
408:
403:
402:
396:
392:
387:
386:
380:
375:
374:
365:
361:
357:
351:
347:
122:preprocesses
119:
97:
91:
3440:Suffix tree
2442:Performance
1271:Description
931:) is found.
743:Description
730:Shift rules
560:Description
133:Definitions
69:performance
56:performance
3531:Categories
3029:: 98–105.
2695:References
2572:Galil rule
2338:. Then if
1571:such that
537:occurrence
427:of string
411:of string
79:Worst-case
53:Worst-case
2996:0001-0782
2974:Comm. ACM
2966:Galil, Z.
2884:0097-5397
2849:0097-5397
2827:CiteSeerX
2778:0001-0782
2756:Comm. ACM
2604:search.go
2596:Algorithm
2295:Zvi Galil
2191:−
2152:−
2061:be zero.
1891:−
1744:For each
1506:follows:
1239:
1209:−
1203:−
1191:
1024:−
589:−
498:alignment
481:for some
454:for some
425:substring
117:in 1980.
66:Best-case
3055:(1977).
2951:license.
2813:(1977).
2786:15892987
2629:Variants
1613:. Shift
1477:. Here,
1475:ANAMPNAM
352:ANPANMAN
350:to text
3077:6470193
3004:1333465
2935:30 July
2730:5902579
2610:uses a
2507:Odlyzko
1041:, with
905:NNAAMAN
393:pattern
2503:Guibas
2497:, and
2495:Morris
471:suffix
444:prefix
126:string
96:, the
86:Θ(k+m)
46:String
3073:S2CID
3000:S2CID
2782:S2CID
2726:S2CID
2656:Notes
2588:Boost
2499:Pratt
2491:Knuth
533:match
356:from
32:Class
21:Nqthm
3103:ISBN
2992:ISSN
2937:2024
2906:ISBN
2880:ISSN
2845:ISSN
2774:ISSN
2648:The
2640:The
2633:The
2623:grep
2532:does
2505:and
2363:and
1949:Let
1736:and
1516:and
1487:and
994:<
694:and
682:and
124:the
108:and
3065:doi
3031:doi
2982:doi
2872:doi
2837:doi
2764:doi
2718:doi
2621:'s
2619:GNU
2584:C++
2487:not
2432:= 0
2397:of
2330:of
2178:or
2135:in
1843:1..
1709:by
1682:in
1670:in
1649:If
1640:in
1628:in
1607:in
1595:in
1565:in
1559:of
1528:of
1496:is
1483:is
1236:len
1188:len
977:in
698:in
690:in
568:in
547:if
539:of
535:or
524:of
512:in
504:to
500:of
496:An
473:of
446:of
366:k=5
362:k=8
360:to
358:k=3
348:PAN
92:In
3126:.
3097:.
3071:.
3059:.
3051:;
3027:15
3025:.
3021:.
2998:.
2990:.
2978:22
2976:.
2972:.
2927:.
2878:.
2866:.
2843:.
2835:.
2821:.
2817:.
2809:;
2805:;
2780:.
2772:.
2760:20
2758:.
2754:.
2746:;
2724:.
2714:21
2712:.
2625:.
2606:.
2493:,
2438:.
2306:,
2217:.
1911:.
1748:,
1473:=
1462:-
1415:-
1362:-
1315:-
1114:.
903:=
892:-
855:-
818:-
781:-
726:.
634:is
531:A
469:A
442:A
339:-
310:-
281:-
252:-
223:-
194:-
165:-
3130:.
3111:.
3079:.
3067::
3037:.
3033::
3006:.
2984::
2939:.
2886:.
2874::
2868:9
2851:.
2839::
2823:6
2788:.
2766::
2732:.
2720::
2685:k
2673:n
2669:m
2556:)
2553:m
2550:n
2547:(
2544:O
2524:n
2522:3
2513:n
2511:5
2471:)
2468:m
2465:+
2462:n
2459:(
2456:O
2429:c
2419:1
2416:k
2409:P
2401:T
2394:1
2391:k
2384:T
2376:P
2369:1
2366:k
2359:c
2352:2
2349:k
2342:P
2334:T
2326:c
2318:T
2310:P
2303:1
2300:k
2240:]
2237:i
2234:[
2231:L
2219:H
2203:]
2200:i
2197:[
2194:H
2188:m
2164:]
2161:i
2158:[
2155:L
2149:m
2137:P
2133:i
2117:)
2114:m
2111:(
2108:O
2084:)
2081:m
2078:(
2075:O
2047:]
2044:i
2041:[
2038:H
2026:P
2010:]
2007:m
2004:.
2001:.
1998:i
1995:[
1992:P
1968:]
1965:i
1962:[
1959:H
1930:]
1927:i
1924:[
1921:L
1897:]
1894:1
1888:i
1885:[
1882:P
1858:]
1855:]
1852:i
1849:[
1846:L
1840:[
1837:P
1813:]
1810:m
1807:.
1804:.
1801:i
1798:[
1795:P
1783:m
1767:]
1764:i
1761:[
1758:L
1746:i
1738:H
1734:L
1728:′
1726:t
1711:m
1706:P
1700:.
1697:P
1691:t
1685:T
1679:t
1673:T
1667:t
1661:P
1654:′
1652:t
1646:.
1643:T
1637:t
1631:P
1624:′
1622:t
1616:P
1610:P
1604:t
1598:P
1591:′
1589:t
1583:P
1576:′
1574:t
1568:P
1562:t
1555:′
1553:t
1543:t
1537:P
1531:T
1525:t
1519:T
1513:P
1500:.
1498:P
1492:′
1490:t
1485:T
1480:t
1471:P
1459:M
1456:A
1453:N
1450:P
1446:M
1441:A
1436:N
1432:A
1429:-
1426:-
1423:-
1420:-
1412:-
1409:-
1406:-
1403:-
1399:M
1394:A
1389:N
1385:P
1381:M
1376:A
1371:N
1367:A
1359:P
1356:A
1353:N
1350:A
1346:M
1341:A
1336:N
1332:A
1329:P
1326:N
1323:A
1320:M
1312:-
1309:-
1306:-
1303:-
1300:K
1297:-
1294:-
1291:X
1288:-
1285:-
1282:-
1279:-
1248:)
1245:p
1242:(
1212:i
1206:1
1200:)
1197:p
1194:(
1164:i
1136:)
1133:k
1130:(
1127:O
1112:k
1096:)
1093:m
1090:k
1087:(
1084:O
1060:)
1057:1
1054:(
1051:O
1027:j
1021:i
997:i
991:j
979:P
975:c
971:i
967:c
954:P
950:P
946:T
942:P
938:T
929:A
925:P
921:N
917:X
913:A
909:N
901:P
889:N
886:A
883:M
880:A
877:A
873:N
869:N
866:-
863:-
860:-
852:-
849:-
846:N
843:A
840:M
837:A
834:A
830:N
826:N
823:-
815:M
812:A
809:N
806:A
803:M
799:N
795:A
792:P
789:N
786:A
778:-
775:-
772:K
769:-
766:-
763:X
760:-
757:-
754:-
751:-
724:P
717:T
712:P
708:P
704:P
700:T
696:k
692:P
688:m
684:T
680:P
676:T
672:P
656:m
653:=
650:k
630:m
626:m
614:P
598:1
595:+
592:m
586:n
570:T
566:P
555:.
553:T
549:P
545:k
541:P
528:.
526:T
522:k
518:P
514:T
510:k
506:T
502:P
493:.
491:S
487:l
483:i
479:S
475:S
466:.
464:S
460:l
456:i
452:S
448:S
437:j
433:i
429:S
420:S
413:S
409:i
404:S
399:.
397:m
388:P
383:.
381:n
376:T
368:.
354:,
336:N
333:A
330:P
327:-
324:-
321:-
318:-
315:-
307:-
304:N
301:A
298:P
295:-
292:-
289:-
286:-
278:-
275:-
272:N
269:A
266:P
263:-
260:-
257:-
249:-
246:-
243:-
240:N
237:A
234:P
231:-
228:-
220:-
217:-
214:-
211:-
208:N
205:A
202:P
199:-
191:-
188:-
185:-
182:-
179:-
176:N
173:A
170:P
162:N
159:A
156:M
153:N
150:A
147:P
144:N
141:A
23:.
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.