641:
224:
636:{\displaystyle \operatorname {lev} (a,b)={\begin{cases}|a|&{\text{ if }}|b|=0,\\|b|&{\text{ if }}|a|=0,\\\operatorname {lev} {\big (}\operatorname {tail} (a),\operatorname {tail} (b){\big )}&{\text{ if }}\operatorname {head} (a)=\operatorname {head} (b),\\1+\min {\begin{cases}\operatorname {lev} {\big (}\operatorname {tail} (a),b{\big )}\\\operatorname {lev} {\big (}a,\operatorname {tail} (b){\big )}\\\operatorname {lev} {\big (}\operatorname {tail} (a),\operatorname {tail} (b){\big )}\\\end{cases}}&{\text{ otherwise}}\end{cases}}}
1150:
27:
1252:, the objective is to find matches for short strings in many longer texts, in situations where a small number of differences is to be expected. The short strings could come from a dictionary, for instance. Here, one of the strings is typically short, while the other is arbitrarily long. This has a wide range of applications, for instance,
66:
for measuring the difference between two sequences. The
Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. It is named after Soviet mathematician
1235:
An example where the
Levenshtein distance between two strings of the same length is strictly less than the Hamming distance is given by the pair "flaw" and "lawn". Here the Levenshtein distance equals 2 (delete "f" from the front; insert "n" at the end). The
1709:. The table is easy to construct one row at a time starting with row 0. When the entire table has been built, the desired distance is in the table in the last row and column, representing the distance between all of the characters in
2699:
It turns out that only two rows of the table – the previous row and the current row being calculated – are needed for the construction, if one does not want to reconstruct the edited input strings.
802:
1267:
The
Levenshtein distance can also be computed between two longer strings, but the cost to compute it, which is roughly proportional to the product of the two string lengths, makes this impractical. Thus, when used to aid in
3397:
Assuming that intelligibility is inversely related to linguistic distance ... the content words the percentage of cognates (related directly or via a synonym) ... lexical relatedness ... grammatical relatedness
929:
1349:
is usually defined as a parameterizable metric calculated with a specific set of allowed edit operations, and each operation is assigned a cost (possibly infinite). This is further generalized by DNA
1089:
3110:
1157:
For example, the
Levenshtein distance between "kitten" and "sitting" is 3, since the following 3 edits change one into the other, and there is no way to do it with fewer than 3 edits:
216:
834:
664:
1603:
A more efficient method would never repeat the same distance calculation. For example, the
Levenshtein distance of all possible suffixes might be stored in an array
985:
1224:
is an upper bound on the
Levenshtein distance. The Hamming distance is the number of positions at which the corresponding symbols in the two strings are different.
178:
148:
118:
1768:
1703:
1679:
1659:
1621:
1132:
1112:
1025:
1005:
958:
854:
704:
684:
3700:
3860:
1287:: the higher the linguistic distance, the lower the mutual intelligibility, and the lower the linguistic distance, the higher the mutual intelligibility.
709:
3838:
4699:
859:
4594:
4249:
3693:
4633:
4559:
4418:
4564:
3390:
3328:
1227:
The
Levenshtein distance between two strings is no greater than the sum of their Levenshtein distances from a third string (
4569:
4159:
3850:
3686:
3023:. It can compute the optimal edit sequence, and not just the edit distance, in the same asymptotic time and space bounds.
1600:
This implementation is very inefficient because it recomputes the
Levenshtein distance of the same substrings many times.
4413:
2123:
Two examples of the resulting matrix (hovering over a tagged number reveals the operation performed to get that number):
1803:
3451:
4780:
4554:
4020:
3225:
3157:
1321:
1034:
4823:
4648:
4589:
4461:
4174:
4005:
3300:
Levenshtein, Vladimir I. (February 1966). "Binary codes capable of correcting deletions, insertions, and reversals".
4496:
3945:
3175:
3056:
1310:
4362:
4015:
4676:
4010:
3755:
1371:
183:
807:
4279:
4000:
3034:
efficiently determine whether a string has an edit distance lower than a given constant from a given string.
3020:
1257:
4818:
4681:
4536:
3972:
3205:
1726:
1354:
1153:
Edit distance matrix for two words using cost of substitution as 1 and cost of deletion or insertion as 0.5
649:
31:
Edit distance matrix for two words using cost of substitution as 1 and cost of deletion or insertion as 0.5
3262:
83:
4760:
4486:
4317:
4302:
4274:
4139:
4134:
3709:
1339:
1314:
1249:
1190:
A simple example of a deletion can be seen with "uninformed" and "uniformed" which have a distance of 1:
4765:
4643:
4610:
4546:
4054:
4025:
3803:
3220:
1789:
of the first string and all prefixes of the second, then we can compute the values in the matrix in a
4775:
4671:
4615:
4516:
4470:
3897:
3750:
3479:
3215:
3016:
1786:
3605:
3493:
3348:
466:
257:
4770:
4574:
4506:
4423:
4347:
4079:
4035:
3920:
3818:
3380:
3546:
4813:
4719:
4327:
4297:
3964:
3047:
2668:
3798:
1778:
th column of the matrix, with the first row having index 0 and the first column having index 0.
4184:
3877:
3855:
3845:
3813:
3788:
3600:
3541:
3488:
3343:
1284:
1269:
78:, although that term may also denote a larger family of distance metrics known collectively as
4724:
4666:
4526:
4454:
4044:
3031:
1793:
fashion, and thus find the distance between the two full strings as the last value computed.
3532:
Schulz, Klaus U.; Mihov, Stoyan (2002). "Fast String
Correction with Levenshtein-Automata".
4397:
4073:
4049:
3902:
3635:
3592:
3510:
3309:
3185:
1782:
1028:
963:
68:
8:
4729:
4377:
4307:
4264:
4220:
3992:
3982:
3977:
3865:
1797:
1790:
1305:, which are calculated using a different set of allowable edit operations. For instance,
1280:
1228:
153:
123:
97:
3639:
3596:
3382:
Receptive multilingualism: linguistic analyses, language policies, and didactic concepts
3313:
4658:
4625:
4387:
4259:
4124:
3887:
3870:
3728:
3625:
3582:
3559:
3514:
3471:
3433:
3411:
3361:
3257:
3236:
3190:
1738:
1688:
1664:
1626:
1606:
1350:
1261:
1117:
1097:
1010:
990:
934:
839:
689:
669:
47:
2990:// since data in v1 is always invalidated, a swap without copy could be more efficient
2703:
The
Levenshtein distance may be calculated iteratively using the following algorithm:
4790:
4392:
4104:
3912:
3823:
3657:
3622:
Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false)
3563:
3518:
3386:
3365:
3195:
3437:
1781:
Computing the Levenshtein distance is based on the observation that if we reserve a
1211:
It is at least the absolute value of the difference of the sizes of the two strings.
4785:
4755:
4709:
4511:
4447:
4269:
4154:
4129:
3930:
3833:
3579:
Polylogarithmic approximation for edit distance and the asymmetric query complexity
3551:
3498:
3423:
3353:
3288:[Binary codes capable of correcting deletions, insertions, and reversals].
3251:
3200:
2691:
operations. At the end, the bottom-right element of the array contains the answer.
1328:
1237:
1221:
1207:
The Levenshtein distance has several simple upper and lower bounds. These include:
55:
3673:
1386:, together with their lengths, and returns the Levenshtein distance between them:
797:{\displaystyle \operatorname {tail} (x_{0}x_{1}\dots x_{n})=x_{1}x_{2}\dots x_{n}}
4638:
4579:
4491:
4381:
4342:
4337:
4205:
3935:
3808:
3783:
3765:
3506:
2671:
maintained throughout the algorithm is that we can transform the initial segment
1149:
26:
4691:
4584:
4089:
4069:
3793:
3467:
1331:
allows only substitution, hence, it only applies to strings of the same length.
1276:, the compared strings are usually short to help improve speed of comparisons.
1273:
3678:
3555:
4807:
4501:
4478:
4352:
4164:
4144:
3925:
3210:
1346:
1335:
1302:
1296:
1279:
In linguistics, the Levenshtein distance is used as a metric to quantify the
1253:
79:
63:
4704:
4521:
4332:
3950:
3241:
3502:
3428:
3357:
4714:
4289:
4169:
3882:
3775:
3723:
3141:
It has been shown that the Levenshtein distance of two strings of length
51:
1283:, or how different two languages are from one another. It is related to
3892:
3624:. Forty-Seventh Annual ACM on Symposium on Theory of Computing (STOC).
2765:// that distance is the number of characters to append to s to make t.
1811:
1317:
of two adjacent characters alongside insertion, deletion, substitution;
3760:
3472:"A linear space algorithm for computing maximal common subsequences"
3285:Двоичные коды с исправлением выпадений, вставок и замещений символов
1324:(LCS) distance allows only insertion and deletion, not substitution;
4235:
4215:
4200:
4179:
4149:
4094:
4059:
3940:
2540:
2315:
2191:
1526:-- Otherwise try all three possible actions and select the best one
3630:
3587:
2831:// edit distance is delete (i + 1) chars from s to match empty t
1731:
This section uses 1-based strings rather than 0-based strings. If
924:{\displaystyle \operatorname {head} (x_{0}x_{1}\dots x_{n})=x_{0}}
4750:
4372:
4230:
4210:
4084:
3828:
3743:
3283:
3246:
2987:// copy v1 (current row) to v0 (previous row) for next iteration
2825:// calculate v1 (current row distances) from the previous row v0
1869:// for all i and j, d will hold the Levenshtein distance between
1357:, which make an operation's cost depend on where it is applied.
1290:
3738:
3733:
3577:
Andoni, Alexandr; Krauthgamer, Robert; Onak, Krzysztof (2010).
3378:
3230:
1448:-- If t is empty, the distance is the number of characters in s
1430:-- If s is empty, the distance is the number of characters in t
1260:, and software to assist natural-language translation based on
1094:
The first element in the minimum corresponds to deletion (from
3656:
Black, Paul E., ed. (14 August 2008), "Levenshtein distance",
1872:// the first i characters of s and the first j characters of t
4428:
4064:
3659:
Dictionary of Algorithms and Data Structures [online]
3170:
3233:(an open source search engine that implements edit distance)
1511:-- If the first characters are the same, they can be ignored
4734:
3180:
629:
615:
4439:
3534:
International Journal of Document Analysis and Recognition
1941:// target prefixes can be reached from empty source prefix
1905:// source prefixes can be transformed into empty string by
4225:
1134:), the second to insertion and the third to replacement.
3299:
3042:
The Levenshtein distance between two strings of length
3005:// after the last swap, the results of v1 are now in v0
2651:
2614:
2577:
2503:
2456:
2346:
2284:
2253:
2222:
1807:
by Robert A. Wagner and Michael J. Fischer.
1370:
This is a straightforward, but inefficient, recursive
3662:, U.S. National Institute of Standards and Technology
3581:. IEEE Symp. Foundations of Computer Science (FOCS).
3576:
3059:
2762:// this row is A: edit distance from an empty s to t;
1834:, and returns the Levenshtein distance between them:
1741:
1691:
1667:
1629:
1609:
1120:
1100:
1037:
1013:
993:
966:
937:
862:
842:
810:
712:
692:
672:
652:
227:
186:
156:
126:
100:
3674:
Rosetta Code implementations of Levenshtein distance
3379:
Jan D. ten Thije; Ludger Zeevaert (1 January 2007),
3414:(1974), "The String-to-String Correction Problem",
1800:, is discussed, with variants, in the 1974 article
3104:
1762:
1697:
1673:
1653:
1615:
1126:
1106:
1083:
1019:
999:
979:
952:
923:
848:
828:
796:
698:
678:
658:
635:
210:
172:
142:
112:
2694:
4805:
3911:
3281:
2759:// initialize v0 (the previous row of distances)
1217:It is zero if and only if the strings are equal.
1084:{\displaystyle \operatorname {head} (x)=x_{0}=x}
458:
74:Levenshtein distance may also be referred to as
3708:
2738:// create two work vectors of integer distances
3329:"A guided tour to approximate string matching"
1785:to hold the Levenshtein distances between all
1214:It is at most the length of the longer string.
686:is a string of all but the first character of
40:measuring the difference between two sequences
4455:
3694:
3409:
3105:{\displaystyle (\log n)^{O(1/\varepsilon )},}
2849:// use formula to fill in the rest of the row
1720:
1291:Relationship with other edit distance metrics
607:
567:
550:
522:
505:
477:
400:
360:
94:The Levenshtein distance between two strings
16:Computer science metric for string similarity
3619:
3453:Fast, memory efficient Levenshtein algorithm
1592:-- Character is replaced (a replaced with b)
3531:
3136:
4462:
4448:
3701:
3687:
3466:
3449:
2466:
2461:
2374:
25:
3629:
3604:
3586:
3545:
3492:
3427:
3372:
3347:
3122:is a free parameter to be tuned, in time
1202:
211:{\displaystyle \operatorname {lev} (a,b)}
4634:Comparison of regular-expression engines
1796:This algorithm, an example of bottom-up
1148:
1137:This definition corresponds directly to
829:{\displaystyle \operatorname {head} (x)}
3326:
3156:for any ε greater than zero unless the
1220:If the strings have the same size, the
4806:
3620:Backurs, Arturs; Indyk, Piotr (2015).
3320:
4595:Zhu–Takaoka string matching algorithm
4443:
3682:
3655:
3385:, John Benjamins Publishing Company,
1556:-- Character is inserted (b inserted)
1198:formed → uniformed (deletion of "n").
659:{\displaystyle \operatorname {tail} }
4160:Simple Knowledge Organization System
2126:
1580:-- Character is deleted (a deleted)
1301:There are other popular measures of
1168:itten (substitution of "s" for "k"),
4560:Boyer–Moore string-search algorithm
1804:String-to-string correction problem
13:
3525:
3450:Hjelmqvist, Sten (26 March 2012),
3226:Longest common subsequence problem
3158:strong exponential time hypothesis
1139:the naive recursive implementation
71:, who defined the metric in 1965.
14:
4835:
4649:Nondeterministic finite automaton
4590:Two-way string-matching algorithm
4175:Thesaurus (information retrieval)
3649:
3196:Homology of sequences in genetics
1661:is the distance between the last
1378:function that takes two strings,
3037:
1179:n (substitution of "i" for "e"),
2685:
2678:
2672:
1944:// by inserting every character
1243:
4565:Boyer–Moore–Horspool algorithm
4555:Apostolico–Giancarlo algorithm
3756:Natural language understanding
3613:
3570:
3460:
3443:
3403:
3275:
3094:
3080:
3073:
3060:
2695:Iterative with two matrix rows
1814:implementation for a function
1757:
1745:
1648:
1642:
1639:
1633:
1360:
1186:(insertion of "g" at the end).
1138:
1078:
1072:
1050:
1044:
947:
941:
905:
869:
823:
817:
755:
719:
602:
596:
584:
578:
545:
539:
494:
488:
442:
436:
424:
418:
395:
389:
377:
371:
335:
327:
315:
307:
289:
281:
269:
261:
246:
234:
205:
193:
166:
158:
136:
128:
1:
4280:Optical character recognition
3269:
1258:optical character recognition
89:
4570:Knuth–Morris–Pratt algorithm
4497:Damerau–Levenshtein distance
3973:Multi-document summarization
3176:Damerau–Levenshtein distance
1365:
1311:Damerau–Levenshtein distance
7:
4761:Compressed pattern matching
4487:Approximate string matching
4469:
4303:Latent Dirichlet allocation
4275:Natural language generation
4140:Machine-readable dictionary
4135:Linguistic Linked Open Data
3710:Natural language processing
3163:
3145:cannot be computed in time
3026:
2828:// first element of v1 is A
1250:approximate string matching
1007:th character of the string
82:. It is closely related to
10:
4840:
4766:Longest common subsequence
4677:Needleman–Wunsch algorithm
4547:String-searching algorithm
4055:Explicit semantic analysis
3804:Deep linguistic processing
3290:Доклады Академии Наук СССР
3221:Locality-sensitive hashing
3019:combines this method with
2879:// calculating costs for A
1908:// dropping all characters
1810:This is a straightforward
1724:
1721:Iterative with full matrix
1713:and all the characters in
1322:longest common subsequence
1294:
1144:
836:is the first character of
180:respectively) is given by
84:pairwise string alignments
4776:Sequential pattern mining
4743:
4690:
4657:
4624:
4616:Commentz-Walter algorithm
4604:Multiple string searching
4603:
4545:
4537:Wagner–Fischer algorithm
4477:
4406:
4361:
4316:
4288:
4248:
4193:
4115:
4103:
4034:
3991:
3963:
3898:Word-sense disambiguation
3774:
3751:Computational linguistics
3716:
3556:10.1007/s10032-002-0082-8
3480:Communications of the ACM
3327:Navarro, Gonzalo (2001).
3282:В. И. Левенштейн (1965).
3263:Sørensen similarity index
1256:, correction systems for
36:
24:
4824:Quantitative linguistics
4786:String rewriting systems
4771:Longest common substring
4682:Smith–Waterman algorithm
4507:Gestalt pattern matching
4424:Natural Language Toolkit
4348:Pronunciation assessment
4250:Automatic identification
4080:Latent semantic analysis
4036:Distributional semantics
3921:Compound-term processing
3819:Named-entity recognition
3483:(Submitted manuscript).
3298:Appeared in English as:
3284:
3206:Hunt–Szymanski algorithm
3137:Computational complexity
2705:
1836:
1818:that takes two strings,
1727:Wagner–Fischer algorithm
1388:
1355:Smith–Waterman algorithm
1272:in applications such as
931:). Either the notation
4720:Generalized suffix tree
4644:Thompson's construction
4328:Automated essay scoring
4298:Document classification
3965:Automatic summarization
1353:algorithms such as the
4672:Hirschberg's algorithm
4185:Universal Dependencies
3878:Terminology extraction
3861:Semantic decomposition
3856:Semantic role labeling
3846:Part-of-speech tagging
3814:Information extraction
3799:Coreference resolution
3789:Collocation extraction
3302:Soviet Physics Doklady
3106:
3017:Hirschberg's algorithm
1764:
1699:
1675:
1655:
1617:
1285:mutual intelligibility
1270:fuzzy string searching
1203:Upper and lower bounds
1154:
1128:
1108:
1085:
1021:
1001:
981:
954:
925:
850:
830:
798:
700:
680:
660:
637:
212:
174:
144:
114:
4527:Levenshtein automaton
4517:Jaro–Winkler distance
3946:Sentence segmentation
3503:10.1145/360825.360861
3429:10.1145/321796.321811
3358:10.1145/375360.375365
3336:ACM Computing Surveys
3216:Jaro–Winkler distance
3107:
1765:
1705:characters of string
1700:
1681:characters of string
1676:
1656:
1618:
1152:
1129:
1109:
1086:
1022:
1002:
987:is used to refer the
982:
980:{\displaystyle x_{n}}
955:
926:
851:
831:
799:
701:
681:
661:
638:
213:
175:
145:
115:
4575:Rabin–Karp algorithm
4532:Levenshtein distance
4398:Voice user interface
4109:datasets and corpora
4050:Document-term matrix
3903:Word-sense induction
3186:Dynamic time warping
3057:
3032:Levenshtein automata
1739:
1689:
1665:
1627:
1607:
1374:implementation of a
1118:
1098:
1035:
1029:counting from 0
1011:
991:
964:
935:
860:
840:
808:
710:
690:
670:
650:
225:
184:
154:
124:
98:
69:Vladimir Levenshtein
60:Levenshtein distance
20:Levenshtein distance
4819:Dynamic programming
4730:Ternary search tree
4378:Interactive fiction
4308:Pachinko allocation
4265:Speech segmentation
4221:Google Ngram Viewer
3993:Machine translation
3983:Text simplification
3978:Sentence extraction
3866:Semantic similarity
3640:2014arXiv1412.0348B
3597:2010arXiv1005.4033A
3412:Fischer, Michael J.
3410:Wagner, Robert A.;
3314:1966SPhD...10..707L
3050:to within a factor
2711:LevenshteinDistance
2683:using a minimum of
1842:LevenshteinDistance
1816:LevenshteinDistance
1798:dynamic programming
1791:dynamic programming
1281:linguistic distance
1229:triangle inequality
173:{\displaystyle |b|}
143:{\displaystyle |a|}
113:{\displaystyle a,b}
21:
4659:Sequence alignment
4626:Regular expression
4388:Question answering
4260:Speech recognition
4125:Corpus linguistics
4105:Language resources
3888:Textual entailment
3871:Sentiment analysis
3416:Journal of the ACM
3258:Numerical taxonomy
3237:Manhattan distance
3191:Euclidean distance
3102:
3021:divide and conquer
1760:
1695:
1671:
1651:
1613:
1351:sequence alignment
1262:translation memory
1155:
1124:
1104:
1081:
1017:
997:
977:
950:
921:
846:
826:
794:
696:
676:
656:
633:
628:
614:
208:
170:
140:
110:
48:information theory
19:
4799:
4798:
4791:String operations
4437:
4436:
4393:Virtual assistant
4318:Computer-assisted
4244:
4243:
4001:Computer-assisted
3959:
3958:
3951:Word segmentation
3913:Text segmentation
3851:Semantic analysis
3839:Syntactic parsing
3824:Ontology learning
3468:Hirschberg, D. S.
3392:978-90-272-1926-8
2664:
2663:
2657:
2656:
2380:
2379:
1763:{\displaystyle m}
1698:{\displaystyle j}
1674:{\displaystyle i}
1654:{\displaystyle M}
1616:{\displaystyle M}
1127:{\displaystyle b}
1107:{\displaystyle a}
1020:{\displaystyle x}
1000:{\displaystyle n}
953:{\displaystyle x}
849:{\displaystyle x}
699:{\displaystyle x}
679:{\displaystyle x}
624:
410:
324:
278:
44:
43:
4831:
4756:Pattern matching
4710:Suffix automaton
4512:Hamming distance
4464:
4457:
4450:
4441:
4440:
4414:Formal semantics
4363:Natural language
4270:Speech synthesis
4252:and data capture
4155:Semantic network
4130:Lexical resource
4113:
4112:
3931:Lexical analysis
3909:
3908:
3834:Semantic parsing
3703:
3696:
3689:
3680:
3679:
3670:
3669:
3667:
3644:
3643:
3633:
3617:
3611:
3610:
3608:
3590:
3574:
3568:
3567:
3549:
3529:
3523:
3522:
3496:
3476:
3464:
3458:
3456:
3447:
3441:
3440:
3431:
3407:
3401:
3399:
3376:
3370:
3369:
3351:
3333:
3324:
3318:
3317:
3297:
3279:
3252:Optimal matching
3201:Hamming distance
3155:
3144:
3132:
3121:
3111:
3109:
3108:
3103:
3098:
3097:
3090:
3045:
3012:
3009:
3006:
3003:
3000:
2997:
2994:
2991:
2988:
2985:
2982:
2981:substitutionCost
2979:
2976:
2973:
2970:
2967:
2964:
2961:
2958:
2955:
2952:
2949:
2946:
2943:
2942:substitutionCost
2940:
2937:
2934:
2931:
2928:
2927:substitutionCost
2925:
2922:
2919:
2916:
2913:
2910:
2907:
2904:
2901:
2898:
2895:
2892:
2889:
2886:
2883:
2880:
2877:
2874:
2871:
2868:
2865:
2862:
2859:
2856:
2853:
2850:
2847:
2844:
2841:
2838:
2835:
2832:
2829:
2826:
2823:
2820:
2817:
2814:
2811:
2808:
2805:
2802:
2799:
2796:
2793:
2790:
2787:
2784:
2781:
2778:
2775:
2772:
2769:
2766:
2763:
2760:
2757:
2754:
2751:
2748:
2745:
2742:
2739:
2736:
2733:
2730:
2727:
2724:
2721:
2718:
2715:
2712:
2709:
2690:
2689:
2688:
2682:
2681:
2676:
2675:
2653:
2616:
2579:
2542:
2505:
2468:
2463:
2458:
2386:
2385:
2376:
2348:
2317:
2286:
2255:
2224:
2193:
2133:
2132:
2127:
2119:
2116:
2113:
2110:
2107:
2106:substitutionCost
2104:
2101:
2098:
2095:
2092:
2089:
2086:
2083:
2080:
2077:
2074:
2071:
2068:
2065:
2062:
2059:
2056:
2053:
2050:
2049:substitutionCost
2047:
2044:
2041:
2038:
2035:
2034:substitutionCost
2032:
2029:
2026:
2023:
2020:
2017:
2014:
2011:
2008:
2005:
2002:
1999:
1996:
1993:
1990:
1987:
1984:
1981:
1978:
1975:
1972:
1969:
1966:
1963:
1960:
1957:
1954:
1951:
1948:
1945:
1942:
1939:
1936:
1933:
1930:
1927:
1924:
1921:
1918:
1915:
1912:
1909:
1906:
1903:
1900:
1897:
1894:
1891:
1888:
1885:
1882:
1879:
1876:
1873:
1870:
1867:
1864:
1861:
1858:
1855:
1852:
1849:
1846:
1843:
1840:
1817:
1769:
1767:
1766:
1761:
1716:
1712:
1708:
1704:
1702:
1701:
1696:
1684:
1680:
1678:
1677:
1672:
1660:
1658:
1657:
1652:
1622:
1620:
1619:
1614:
1596:
1593:
1590:
1587:
1584:
1581:
1578:
1575:
1572:
1569:
1566:
1563:
1560:
1557:
1554:
1551:
1548:
1545:
1542:
1539:
1536:
1533:
1530:
1527:
1524:
1521:
1518:
1515:
1512:
1509:
1506:
1503:
1500:
1497:
1494:
1491:
1488:
1485:
1482:
1479:
1476:
1473:
1470:
1467:
1464:
1461:
1458:
1455:
1452:
1449:
1446:
1443:
1440:
1437:
1434:
1431:
1428:
1425:
1422:
1419:
1416:
1413:
1410:
1407:
1404:
1401:
1398:
1395:
1392:
1377:
1329:Hamming distance
1238:Hamming distance
1222:Hamming distance
1133:
1131:
1130:
1125:
1113:
1111:
1110:
1105:
1090:
1088:
1087:
1082:
1065:
1064:
1026:
1024:
1023:
1018:
1006:
1004:
1003:
998:
986:
984:
983:
978:
976:
975:
959:
957:
956:
951:
930:
928:
927:
922:
920:
919:
904:
903:
891:
890:
881:
880:
855:
853:
852:
847:
835:
833:
832:
827:
803:
801:
800:
795:
793:
792:
780:
779:
770:
769:
754:
753:
741:
740:
731:
730:
705:
703:
702:
697:
685:
683:
682:
677:
665:
663:
662:
657:
642:
640:
639:
634:
632:
631:
625:
622:
618:
617:
611:
610:
571:
570:
554:
553:
526:
525:
509:
508:
481:
480:
411:
408:
404:
403:
364:
363:
338:
330:
325:
322:
318:
310:
292:
284:
279:
276:
272:
264:
217:
215:
214:
209:
179:
177:
176:
171:
169:
161:
149:
147:
146:
141:
139:
131:
119:
117:
116:
111:
56:computer science
29:
22:
18:
4839:
4838:
4834:
4833:
4832:
4830:
4829:
4828:
4804:
4803:
4800:
4795:
4739:
4686:
4653:
4639:Regular grammar
4620:
4599:
4580:Raita algorithm
4541:
4492:Bitap algorithm
4473:
4468:
4438:
4433:
4402:
4382:Syntax guessing
4364:
4357:
4343:Predictive text
4338:Grammar checker
4319:
4312:
4284:
4251:
4240:
4206:Bank of English
4189:
4117:
4108:
4099:
4030:
3987:
3955:
3907:
3809:Distant reading
3784:Argument mining
3770:
3766:Text processing
3712:
3707:
3665:
3663:
3652:
3647:
3618:
3614:
3606:10.1.1.208.2079
3575:
3571:
3530:
3526:
3494:10.1.1.348.4774
3474:
3465:
3461:
3448:
3444:
3408:
3404:
3393:
3377:
3373:
3349:10.1.1.452.6317
3331:
3325:
3321:
3286:
3280:
3276:
3272:
3267:
3166:
3146:
3142:
3139:
3123:
3116:
3086:
3076:
3072:
3058:
3055:
3054:
3043:
3040:
3029:
3014:
3013:
3010:
3007:
3004:
3001:
2998:
2995:
2992:
2989:
2986:
2983:
2980:
2977:
2974:
2971:
2968:
2965:
2962:
2959:
2956:
2953:
2950:
2947:
2944:
2941:
2938:
2935:
2932:
2929:
2926:
2923:
2920:
2917:
2914:
2911:
2908:
2905:
2902:
2899:
2896:
2893:
2890:
2887:
2884:
2881:
2878:
2875:
2872:
2869:
2866:
2863:
2860:
2857:
2854:
2851:
2848:
2845:
2842:
2839:
2836:
2833:
2830:
2827:
2824:
2821:
2818:
2815:
2812:
2809:
2806:
2803:
2800:
2797:
2794:
2791:
2788:
2785:
2782:
2779:
2776:
2773:
2770:
2767:
2764:
2761:
2758:
2755:
2752:
2749:
2746:
2743:
2740:
2737:
2734:
2731:
2728:
2725:
2722:
2719:
2716:
2713:
2710:
2707:
2697:
2686:
2684:
2679:
2673:
2665:
2121:
2120:
2117:
2114:
2112:// substitution
2111:
2108:
2105:
2102:
2099:
2096:
2093:
2090:
2087:
2084:
2081:
2078:
2075:
2072:
2069:
2066:
2063:
2060:
2057:
2054:
2051:
2048:
2045:
2042:
2039:
2036:
2033:
2030:
2027:
2024:
2021:
2018:
2015:
2012:
2009:
2006:
2003:
2000:
1997:
1994:
1991:
1988:
1985:
1982:
1979:
1976:
1973:
1970:
1967:
1964:
1961:
1958:
1955:
1952:
1949:
1946:
1943:
1940:
1937:
1934:
1931:
1928:
1925:
1922:
1919:
1916:
1913:
1910:
1907:
1904:
1901:
1898:
1895:
1892:
1889:
1886:
1883:
1880:
1877:
1874:
1871:
1868:
1865:
1862:
1859:
1856:
1853:
1850:
1847:
1844:
1841:
1838:
1815:
1774:th row and the
1740:
1737:
1736:
1729:
1723:
1714:
1710:
1706:
1690:
1687:
1686:
1682:
1666:
1663:
1662:
1628:
1625:
1624:
1608:
1605:
1604:
1598:
1597:
1594:
1591:
1588:
1585:
1582:
1579:
1576:
1573:
1570:
1567:
1564:
1561:
1558:
1555:
1552:
1549:
1546:
1543:
1540:
1537:
1534:
1531:
1528:
1525:
1522:
1519:
1516:
1513:
1510:
1507:
1504:
1501:
1498:
1495:
1492:
1489:
1486:
1483:
1480:
1477:
1474:
1471:
1468:
1465:
1462:
1459:
1456:
1453:
1450:
1447:
1444:
1441:
1438:
1435:
1432:
1429:
1426:
1423:
1420:
1417:
1414:
1411:
1408:
1405:
1402:
1399:
1396:
1393:
1390:
1375:
1368:
1363:
1299:
1293:
1246:
1205:
1182:sittin → sittin
1147:
1119:
1116:
1115:
1099:
1096:
1095:
1060:
1056:
1036:
1033:
1032:
1012:
1009:
1008:
992:
989:
988:
971:
967:
965:
962:
961:
936:
933:
932:
915:
911:
899:
895:
886:
882:
876:
872:
861:
858:
857:
841:
838:
837:
809:
806:
805:
788:
784:
775:
771:
765:
761:
749:
745:
736:
732:
726:
722:
711:
708:
707:
691:
688:
687:
671:
668:
667:
666:of some string
651:
648:
647:
627:
626:
623: otherwise
621:
619:
613:
612:
606:
605:
566:
565:
556:
555:
549:
548:
521:
520:
511:
510:
504:
503:
476:
475:
462:
461:
449:
448:
407:
405:
399:
398:
359:
358:
349:
348:
334:
326:
321:
319:
314:
306:
303:
302:
288:
280:
275:
273:
268:
260:
253:
252:
226:
223:
222:
185:
182:
181:
165:
157:
155:
152:
151:
135:
127:
125:
122:
121:
99:
96:
95:
92:
32:
17:
12:
11:
5:
4837:
4827:
4826:
4821:
4816:
4814:String metrics
4797:
4796:
4794:
4793:
4788:
4783:
4778:
4773:
4768:
4763:
4758:
4753:
4747:
4745:
4741:
4740:
4738:
4737:
4732:
4727:
4722:
4717:
4712:
4707:
4702:
4696:
4694:
4692:Data structure
4688:
4687:
4685:
4684:
4679:
4674:
4669:
4663:
4661:
4655:
4654:
4652:
4651:
4646:
4641:
4636:
4630:
4628:
4622:
4621:
4619:
4618:
4613:
4607:
4605:
4601:
4600:
4598:
4597:
4592:
4587:
4585:Trigram search
4582:
4577:
4572:
4567:
4562:
4557:
4551:
4549:
4543:
4542:
4540:
4539:
4534:
4529:
4524:
4519:
4514:
4509:
4504:
4499:
4494:
4489:
4483:
4481:
4475:
4474:
4467:
4466:
4459:
4452:
4444:
4435:
4434:
4432:
4431:
4426:
4421:
4416:
4410:
4408:
4404:
4403:
4401:
4400:
4395:
4390:
4385:
4375:
4369:
4367:
4365:user interface
4359:
4358:
4356:
4355:
4350:
4345:
4340:
4335:
4330:
4324:
4322:
4314:
4313:
4311:
4310:
4305:
4300:
4294:
4292:
4286:
4285:
4283:
4282:
4277:
4272:
4267:
4262:
4256:
4254:
4246:
4245:
4242:
4241:
4239:
4238:
4233:
4228:
4223:
4218:
4213:
4208:
4203:
4197:
4195:
4191:
4190:
4188:
4187:
4182:
4177:
4172:
4167:
4162:
4157:
4152:
4147:
4142:
4137:
4132:
4127:
4121:
4119:
4110:
4101:
4100:
4098:
4097:
4092:
4090:Word embedding
4087:
4082:
4077:
4070:Language model
4067:
4062:
4057:
4052:
4047:
4041:
4039:
4032:
4031:
4029:
4028:
4023:
4021:Transfer-based
4018:
4013:
4008:
4003:
3997:
3995:
3989:
3988:
3986:
3985:
3980:
3975:
3969:
3967:
3961:
3960:
3957:
3956:
3954:
3953:
3948:
3943:
3938:
3933:
3928:
3923:
3917:
3915:
3906:
3905:
3900:
3895:
3890:
3885:
3880:
3874:
3873:
3868:
3863:
3858:
3853:
3848:
3843:
3842:
3841:
3836:
3826:
3821:
3816:
3811:
3806:
3801:
3796:
3794:Concept mining
3791:
3786:
3780:
3778:
3772:
3771:
3769:
3768:
3763:
3758:
3753:
3748:
3747:
3746:
3741:
3731:
3726:
3720:
3718:
3714:
3713:
3706:
3705:
3698:
3691:
3683:
3677:
3676:
3671:
3651:
3650:External links
3648:
3646:
3645:
3612:
3569:
3524:
3487:(6): 341–343.
3459:
3442:
3422:(1): 168–173,
3402:
3391:
3371:
3319:
3308:(8): 707–710.
3292:(in Russian).
3273:
3271:
3268:
3266:
3265:
3260:
3255:
3249:
3244:
3239:
3234:
3228:
3223:
3218:
3213:
3208:
3203:
3198:
3193:
3188:
3183:
3178:
3173:
3167:
3165:
3162:
3138:
3135:
3113:
3112:
3101:
3096:
3093:
3089:
3085:
3082:
3079:
3075:
3071:
3068:
3065:
3062:
3039:
3036:
3028:
3025:
2706:
2696:
2693:
2662:
2661:
2655:
2654:
2649:
2646:
2643:
2640:
2637:
2634:
2631:
2628:
2625:
2621:
2620:
2617:
2612:
2609:
2606:
2603:
2600:
2597:
2594:
2591:
2587:
2586:
2583:
2580:
2575:
2572:
2569:
2566:
2563:
2560:
2557:
2553:
2552:
2549:
2546:
2543:
2538:
2535:
2532:
2529:
2526:
2523:
2519:
2518:
2515:
2512:
2509:
2506:
2501:
2498:
2495:
2492:
2489:
2485:
2484:
2481:
2478:
2475:
2472:
2469:
2464:
2459:
2454:
2451:
2447:
2446:
2443:
2440:
2437:
2434:
2431:
2428:
2425:
2422:
2419:
2416:
2415:
2412:
2409:
2406:
2403:
2400:
2397:
2394:
2391:
2389:
2383:
2382:
2381:
2378:
2377:
2372:
2369:
2366:
2363:
2360:
2357:
2354:
2350:
2349:
2344:
2341:
2338:
2335:
2332:
2329:
2326:
2322:
2321:
2318:
2313:
2310:
2307:
2304:
2301:
2298:
2294:
2293:
2290:
2287:
2282:
2279:
2276:
2273:
2270:
2266:
2265:
2262:
2259:
2256:
2251:
2248:
2245:
2242:
2238:
2237:
2234:
2231:
2228:
2225:
2220:
2217:
2214:
2210:
2209:
2206:
2203:
2200:
2197:
2194:
2189:
2186:
2182:
2181:
2178:
2175:
2172:
2169:
2166:
2163:
2160:
2157:
2156:
2153:
2150:
2147:
2144:
2141:
2138:
2136:
2125:
1837:
1759:
1756:
1753:
1750:
1747:
1744:
1725:Main article:
1722:
1719:
1694:
1670:
1650:
1647:
1644:
1641:
1638:
1635:
1632:
1612:
1389:
1367:
1364:
1362:
1359:
1344:
1343:
1332:
1325:
1318:
1295:Main article:
1292:
1289:
1274:record linkage
1254:spell checkers
1245:
1242:
1233:
1232:
1225:
1218:
1215:
1212:
1204:
1201:
1200:
1199:
1188:
1187:
1180:
1169:
1146:
1143:
1123:
1103:
1080:
1077:
1074:
1071:
1068:
1063:
1059:
1055:
1052:
1049:
1046:
1043:
1040:
1016:
996:
974:
970:
949:
946:
943:
940:
918:
914:
910:
907:
902:
898:
894:
889:
885:
879:
875:
871:
868:
865:
845:
825:
822:
819:
816:
813:
791:
787:
783:
778:
774:
768:
764:
760:
757:
752:
748:
744:
739:
735:
729:
725:
721:
718:
715:
695:
675:
655:
644:
643:
630:
620:
616:
609:
604:
601:
598:
595:
592:
589:
586:
583:
580:
577:
574:
569:
564:
561:
558:
557:
552:
547:
544:
541:
538:
535:
532:
529:
524:
519:
516:
513:
512:
507:
502:
499:
496:
493:
490:
487:
484:
479:
474:
471:
468:
467:
465:
460:
457:
454:
451:
450:
447:
444:
441:
438:
435:
432:
429:
426:
423:
420:
417:
414:
409: if
406:
402:
397:
394:
391:
388:
385:
382:
379:
376:
373:
370:
367:
362:
357:
354:
351:
350:
347:
344:
341:
337:
333:
329:
323: if
320:
317:
313:
309:
305:
304:
301:
298:
295:
291:
287:
283:
277: if
274:
271:
267:
263:
259:
258:
256:
251:
248:
245:
242:
239:
236:
233:
230:
207:
204:
201:
198:
195:
192:
189:
168:
164:
160:
138:
134:
130:
109:
106:
103:
91:
88:
42:
41:
38:
34:
33:
30:
15:
9:
6:
4:
3:
2:
4836:
4825:
4822:
4820:
4817:
4815:
4812:
4811:
4809:
4802:
4792:
4789:
4787:
4784:
4782:
4779:
4777:
4774:
4772:
4769:
4767:
4764:
4762:
4759:
4757:
4754:
4752:
4749:
4748:
4746:
4742:
4736:
4733:
4731:
4728:
4726:
4723:
4721:
4718:
4716:
4713:
4711:
4708:
4706:
4703:
4701:
4698:
4697:
4695:
4693:
4689:
4683:
4680:
4678:
4675:
4673:
4670:
4668:
4665:
4664:
4662:
4660:
4656:
4650:
4647:
4645:
4642:
4640:
4637:
4635:
4632:
4631:
4629:
4627:
4623:
4617:
4614:
4612:
4609:
4608:
4606:
4602:
4596:
4593:
4591:
4588:
4586:
4583:
4581:
4578:
4576:
4573:
4571:
4568:
4566:
4563:
4561:
4558:
4556:
4553:
4552:
4550:
4548:
4544:
4538:
4535:
4533:
4530:
4528:
4525:
4523:
4520:
4518:
4515:
4513:
4510:
4508:
4505:
4503:
4502:Edit distance
4500:
4498:
4495:
4493:
4490:
4488:
4485:
4484:
4482:
4480:
4479:String metric
4476:
4472:
4465:
4460:
4458:
4453:
4451:
4446:
4445:
4442:
4430:
4427:
4425:
4422:
4420:
4419:Hallucination
4417:
4415:
4412:
4411:
4409:
4405:
4399:
4396:
4394:
4391:
4389:
4386:
4383:
4379:
4376:
4374:
4371:
4370:
4368:
4366:
4360:
4354:
4353:Spell checker
4351:
4349:
4346:
4344:
4341:
4339:
4336:
4334:
4331:
4329:
4326:
4325:
4323:
4321:
4315:
4309:
4306:
4304:
4301:
4299:
4296:
4295:
4293:
4291:
4287:
4281:
4278:
4276:
4273:
4271:
4268:
4266:
4263:
4261:
4258:
4257:
4255:
4253:
4247:
4237:
4234:
4232:
4229:
4227:
4224:
4222:
4219:
4217:
4214:
4212:
4209:
4207:
4204:
4202:
4199:
4198:
4196:
4192:
4186:
4183:
4181:
4178:
4176:
4173:
4171:
4168:
4166:
4165:Speech corpus
4163:
4161:
4158:
4156:
4153:
4151:
4148:
4146:
4145:Parallel text
4143:
4141:
4138:
4136:
4133:
4131:
4128:
4126:
4123:
4122:
4120:
4114:
4111:
4106:
4102:
4096:
4093:
4091:
4088:
4086:
4083:
4081:
4078:
4075:
4071:
4068:
4066:
4063:
4061:
4058:
4056:
4053:
4051:
4048:
4046:
4043:
4042:
4040:
4037:
4033:
4027:
4024:
4022:
4019:
4017:
4014:
4012:
4009:
4007:
4006:Example-based
4004:
4002:
3999:
3998:
3996:
3994:
3990:
3984:
3981:
3979:
3976:
3974:
3971:
3970:
3968:
3966:
3962:
3952:
3949:
3947:
3944:
3942:
3939:
3937:
3936:Text chunking
3934:
3932:
3929:
3927:
3926:Lemmatisation
3924:
3922:
3919:
3918:
3916:
3914:
3910:
3904:
3901:
3899:
3896:
3894:
3891:
3889:
3886:
3884:
3881:
3879:
3876:
3875:
3872:
3869:
3867:
3864:
3862:
3859:
3857:
3854:
3852:
3849:
3847:
3844:
3840:
3837:
3835:
3832:
3831:
3830:
3827:
3825:
3822:
3820:
3817:
3815:
3812:
3810:
3807:
3805:
3802:
3800:
3797:
3795:
3792:
3790:
3787:
3785:
3782:
3781:
3779:
3777:
3776:Text analysis
3773:
3767:
3764:
3762:
3759:
3757:
3754:
3752:
3749:
3745:
3742:
3740:
3737:
3736:
3735:
3732:
3730:
3727:
3725:
3722:
3721:
3719:
3717:General terms
3715:
3711:
3704:
3699:
3697:
3692:
3690:
3685:
3684:
3681:
3675:
3672:
3661:
3660:
3654:
3653:
3641:
3637:
3632:
3627:
3623:
3616:
3607:
3602:
3598:
3594:
3589:
3584:
3580:
3573:
3565:
3561:
3557:
3553:
3548:
3547:10.1.1.16.652
3543:
3539:
3535:
3528:
3520:
3516:
3512:
3508:
3504:
3500:
3495:
3490:
3486:
3482:
3481:
3473:
3469:
3463:
3455:
3454:
3446:
3439:
3435:
3430:
3425:
3421:
3417:
3413:
3406:
3398:
3394:
3388:
3384:
3383:
3375:
3367:
3363:
3359:
3355:
3350:
3345:
3341:
3337:
3330:
3323:
3315:
3311:
3307:
3303:
3296:(4): 845–848.
3295:
3291:
3287:
3278:
3274:
3264:
3261:
3259:
3256:
3253:
3250:
3248:
3245:
3243:
3240:
3238:
3235:
3232:
3229:
3227:
3224:
3222:
3219:
3217:
3214:
3212:
3211:Jaccard index
3209:
3207:
3204:
3202:
3199:
3197:
3194:
3192:
3189:
3187:
3184:
3182:
3179:
3177:
3174:
3172:
3169:
3168:
3161:
3159:
3153:
3149:
3134:
3130:
3126:
3119:
3099:
3091:
3087:
3083:
3077:
3069:
3066:
3063:
3053:
3052:
3051:
3049:
3038:Approximation
3035:
3033:
3024:
3022:
3018:
2975:insertionCost
2897:insertionCost
2704:
2701:
2692:
2670:
2660:
2650:
2647:
2644:
2641:
2638:
2635:
2632:
2629:
2626:
2623:
2622:
2618:
2613:
2610:
2607:
2604:
2601:
2598:
2595:
2592:
2589:
2588:
2584:
2581:
2576:
2573:
2570:
2567:
2564:
2561:
2558:
2555:
2554:
2550:
2547:
2544:
2539:
2536:
2533:
2530:
2527:
2524:
2521:
2520:
2516:
2513:
2510:
2507:
2502:
2499:
2496:
2493:
2490:
2487:
2486:
2482:
2479:
2476:
2473:
2470:
2465:
2460:
2455:
2452:
2449:
2448:
2444:
2441:
2438:
2435:
2432:
2429:
2426:
2423:
2420:
2418:
2417:
2413:
2410:
2407:
2404:
2401:
2398:
2395:
2392:
2390:
2388:
2387:
2384:
2373:
2370:
2367:
2364:
2361:
2358:
2355:
2352:
2351:
2345:
2342:
2339:
2336:
2333:
2330:
2327:
2324:
2323:
2319:
2314:
2311:
2308:
2305:
2302:
2299:
2296:
2295:
2291:
2288:
2283:
2280:
2277:
2274:
2271:
2268:
2267:
2263:
2260:
2257:
2252:
2249:
2246:
2243:
2240:
2239:
2235:
2232:
2229:
2226:
2221:
2218:
2215:
2212:
2211:
2207:
2204:
2201:
2198:
2195:
2190:
2187:
2184:
2183:
2179:
2176:
2173:
2170:
2167:
2164:
2161:
2159:
2158:
2154:
2151:
2148:
2145:
2142:
2139:
2137:
2135:
2134:
2131:
2130:
2129:
2128:
2124:
1835:
1833:
1829:
1825:
1821:
1813:
1808:
1806:
1805:
1799:
1794:
1792:
1788:
1784:
1779:
1777:
1773:
1754:
1751:
1748:
1742:
1735:is a matrix,
1734:
1728:
1718:
1692:
1685:and the last
1668:
1645:
1636:
1630:
1610:
1601:
1387:
1385:
1381:
1373:
1358:
1356:
1352:
1348:
1347:Edit distance
1341:
1340:transposition
1337:
1336:Jaro distance
1333:
1330:
1326:
1323:
1319:
1316:
1315:transposition
1312:
1308:
1307:
1306:
1304:
1303:edit distance
1298:
1297:Edit distance
1288:
1286:
1282:
1277:
1275:
1271:
1265:
1263:
1259:
1255:
1251:
1241:
1239:
1230:
1226:
1223:
1219:
1216:
1213:
1210:
1209:
1208:
1197:
1193:
1192:
1191:
1185:
1181:
1178:
1174:
1170:
1167:
1163:
1160:
1159:
1158:
1151:
1142:
1140:
1135:
1121:
1101:
1092:
1075:
1069:
1066:
1061:
1057:
1053:
1047:
1041:
1038:
1030:
1014:
994:
972:
968:
944:
938:
916:
912:
908:
900:
896:
892:
887:
883:
877:
873:
866:
863:
843:
820:
814:
811:
789:
785:
781:
776:
772:
766:
762:
758:
750:
746:
742:
737:
733:
727:
723:
716:
713:
693:
673:
653:
599:
593:
590:
587:
581:
575:
572:
562:
559:
542:
536:
533:
530:
527:
517:
514:
500:
497:
491:
485:
482:
472:
469:
463:
455:
452:
445:
439:
433:
430:
427:
421:
415:
412:
392:
386:
383:
380:
374:
368:
365:
355:
352:
345:
342:
339:
331:
311:
299:
296:
293:
285:
265:
254:
249:
243:
240:
237:
231:
228:
221:
220:
219:
202:
199:
196:
190:
187:
162:
132:
107:
104:
101:
87:
85:
81:
80:edit distance
77:
76:edit distance
72:
70:
65:
64:string metric
61:
57:
53:
49:
39:
35:
28:
23:
4801:
4705:Suffix array
4611:Aho–Corasick
4531:
4522:Lee distance
4333:Concordancer
3729:Bag-of-words
3664:, retrieved
3658:
3621:
3615:
3578:
3572:
3540:(1): 67–85.
3537:
3533:
3527:
3484:
3478:
3462:
3452:
3445:
3419:
3415:
3405:
3396:
3381:
3374:
3342:(1): 31–88.
3339:
3335:
3322:
3305:
3301:
3293:
3289:
3277:
3242:Metric space
3151:
3147:
3140:
3128:
3124:
3117:
3114:
3048:approximated
3041:
3030:
3015:
2969:deletionCost
2882:deletionCost
2702:
2698:
2666:
2658:
2122:
2097:// insertion
1831:
1827:
1823:
1819:
1809:
1801:
1795:
1780:
1775:
1771:
1732:
1730:
1602:
1599:
1383:
1379:
1369:
1345:
1338:allows only
1300:
1278:
1266:
1247:
1244:Applications
1234:
1206:
1195:
1189:
1183:
1176:
1172:
1165:
1161:
1156:
1136:
1093:
645:
93:
75:
73:
59:
45:
4715:Suffix tree
4290:Topic model
4170:Text corpus
4016:Statistical
3883:Text mining
3724:AI-complete
2082:// deletion
1361:Computation
1313:allows the
120:(of length
52:linguistics
4808:Categories
4011:Rule-based
3893:Truecasing
3761:Stop words
3666:2 November
3270:References
3160:is false.
1830:of length
1822:of length
1812:pseudocode
646:where the
90:Definition
4320:reviewing
4118:standards
4116:Types and
3631:1412.0348
3601:CiteSeerX
3588:1005.4033
3564:207046453
3542:CiteSeerX
3519:207694727
3489:CiteSeerX
3366:207551224
3344:CiteSeerX
3254:algorithm
3092:ε
3067:
2669:invariant
1583:lDistance
1559:lDistance
1532:lDistance
1502:lDistance
1451:lDistance
1433:lDistance
1415:lDistance
1391:lDistance
1376:lDistance
1366:Recursive
1042:
893:…
867:
815:
782:…
743:…
717:
594:
576:
563:
537:
518:
486:
473:
434:
416:
387:
369:
356:
232:
191:
4236:Wikidata
4216:FrameNet
4201:BabelNet
4180:Treebank
4150:PropBank
4095:Word2vec
4060:fastText
3941:Stemming
3470:(1975).
3438:13381535
3164:See also
3027:Automata
2708:function
1839:function
1787:prefixes
1623:, where
1175:n → sitt
1164:itten →
4781:Sorting
4751:Parsing
4471:Strings
4407:Related
4373:Chatbot
4231:WordNet
4211:DBpedia
4085:Seq2seq
3829:Parsing
3744:Trigram
3636:Bibcode
3593:Bibcode
3511:0375829
3310:Bibcode
3247:MinHash
3046:can be
2963:minimum
2750:declare
2741:declare
2064:minimum
1890:element
1875:declare
1770:is the
1523:minimum
1372:Haskell
1145:Example
1031:, thus
804:), and
4380:(c.f.
4038:models
4026:Neural
3739:Bigram
3734:n-gram
3603:
3562:
3544:
3517:
3509:
3491:
3436:
3389:
3364:
3346:
3231:Lucene
3120:> 0
3115:where
3008:return
2659:
2115:return
1826:, and
1783:matrix
1589:t'
1586:s'
1574:t'
1562:s'
1550:t'
1544:s'
1508:t'
1505:s'
1478:t'
1463:s'
1442:length
1424:length
1240:is 4.
856:(i.e.
706:(i.e.
218:where
58:, the
54:, and
4744:Other
4700:DAFSA
4667:BLAST
4429:spaCy
4074:large
4065:GloVe
3626:arXiv
3583:arXiv
3560:S2CID
3515:S2CID
3475:(PDF)
3434:S2CID
3362:S2CID
3332:(PDF)
3171:agrep
2677:into
1409:->
1406:->
1403:=>
62:is a
37:Class
4735:Trie
4725:Rope
4194:Data
4045:BERT
3668:2016
3387:ISBN
3181:diff
2999:with
2993:swap
2936:else
2858:from
2804:from
2774:from
2726:char
2717:char
2667:The
2043:else
2004:from
1983:from
1953:from
1917:from
1902:zero
1887:each
1857:char
1848:char
1802:The
1514:else
1499:then
1382:and
1334:the
1327:the
1320:the
1309:the
1171:sitt
1039:head
864:head
812:head
714:tail
654:tail
591:tail
573:tail
534:tail
483:tail
431:head
413:head
384:tail
366:tail
150:and
4226:UBY
3552:doi
3499:doi
3424:doi
3354:doi
3294:163
3064:log
2852:for
2798:for
2768:for
2753:int
2744:int
1998:for
1977:for
1947:for
1911:for
1884:set
1878:int
1412:Int
1248:In
1194:uni
1114:to
960:or
560:lev
515:lev
470:lev
459:min
353:lev
229:lev
188:lev
46:In
4810::
3634:.
3599:.
3591:.
3558:.
3550:.
3536:.
3513:.
3507:MR
3505:.
3497:.
3485:18
3477:.
3432:,
3420:21
3418:,
3395:,
3360:.
3352:.
3340:33
3338:.
3334:.
3306:10
3304:.
3133:.
3011:v0
3002:v1
2996:v0
2960::=
2957:v1
2948:v0
2945::=
2933:v0
2930::=
2912:if
2903:v1
2900::=
2888:v0
2885::=
2864:to
2834:v1
2810:to
2789:v0
2780:to
2756:v1
2747:v0
2624:y
2619:4
2590:a
2585:5
2556:d
2551:6
2522:n
2517:6
2488:u
2483:7
2450:S
2445:8
2414:y
2411:a
2408:d
2405:r
2402:u
2399:t
2396:a
2393:S
2353:g
2325:n
2320:3
2297:i
2292:3
2269:t
2264:4
2241:t
2236:5
2213:i
2208:6
2185:s
2180:6
2155:n
2152:e
2149:t
2146:t
2143:i
2140:k
2061::=
2052::=
2037::=
2019:if
2010:to
1989:to
1971::=
1959:to
1935::=
1923:to
1899:to
1893:in
1717:.
1577:),
1493:==
1487:if
1397:Eq
1394:::
1264:.
1231:).
1141:.
1091:.
1027:,
86:.
50:,
4463:e
4456:t
4449:v
4384:)
4107:,
4076:)
4072:(
3702:e
3695:t
3688:v
3642:.
3638::
3628::
3609:.
3595::
3585::
3566:.
3554::
3538:5
3521:.
3501::
3457:.
3426::
3400:.
3368:.
3356::
3316:.
3312::
3154:)
3152:n
3150:(
3148:O
3143:n
3131:)
3129:n
3127:(
3125:O
3118:ε
3100:,
3095:)
3088:/
3084:1
3081:(
3078:O
3074:)
3070:n
3061:(
3044:n
2984:)
2978:,
2972:,
2966:(
2954:1
2951:+
2939::
2924::
2921:t
2918:=
2915:s
2909:1
2906:+
2894:1
2891:+
2876::
2873:1
2870:-
2867:n
2861:0
2855:j
2846:1
2843:+
2840:i
2837:=
2822::
2819:1
2816:-
2813:m
2807:0
2801:i
2795:i
2792:=
2786::
2783:n
2777:0
2771:i
2735::
2732:)
2729:t
2723:,
2720:s
2714:(
2687:d
2680:t
2674:s
2652:3
2648:4
2645:5
2642:5
2639:5
2636:4
2633:4
2630:5
2627:6
2615:3
2611:4
2608:4
2605:4
2602:4
2599:3
2596:4
2593:5
2582:4
2578:3
2574:4
2571:3
2568:3
2565:3
2562:3
2559:4
2548:5
2545:4
2541:3
2537:3
2534:2
2531:2
2528:2
2525:3
2514:5
2511:4
2508:3
2504:2
2500:2
2497:1
2494:1
2491:2
2480:6
2477:5
2474:4
2471:3
2467:2
2462:1
2457:0
2453:1
2442:7
2439:6
2436:5
2433:4
2430:3
2427:2
2424:1
2421:0
2375:3
2371:4
2368:4
2365:5
2362:6
2359:7
2356:7
2347:2
2343:3
2340:3
2337:4
2334:5
2331:6
2328:6
2316:2
2312:2
2309:3
2306:4
2303:5
2300:5
2289:2
2285:1
2281:2
2278:3
2275:4
2272:4
2261:3
2258:2
2254:1
2250:2
2247:3
2244:3
2233:4
2230:3
2227:2
2223:1
2219:2
2216:2
2205:5
2202:4
2199:3
2196:2
2192:1
2188:1
2177:5
2174:4
2171:3
2168:2
2165:1
2162:0
2118:d
2109:)
2103:+
2100:d
2094:,
2091:1
2088:+
2085:d
2079:,
2076:1
2073:+
2070:d
2067:(
2058:d
2055:1
2046::
2040:0
2031::
2028:t
2025:=
2022:s
2016::
2013:m
2007:1
2001:i
1995::
1992:n
1986:1
1980:j
1974:j
1968:d
1965::
1962:n
1956:1
1950:j
1938:i
1932:d
1929::
1926:m
1920:1
1914:i
1896:d
1881:d
1866::
1863:)
1860:t
1854:,
1851:s
1845:(
1832:n
1828:t
1824:m
1820:s
1776:j
1772:i
1758:]
1755:j
1752:,
1749:i
1746:[
1743:m
1733:m
1715:t
1711:s
1707:t
1693:j
1683:s
1669:i
1649:]
1646:j
1643:[
1640:]
1637:i
1634:[
1631:M
1611:M
1595:]
1571::
1568:b
1565:(
1553:,
1547:)
1541::
1538:a
1535:(
1529:[
1520:+
1517:1
1496:b
1490:a
1484:=
1481:)
1475::
1472:b
1469:(
1466:)
1460::
1457:a
1454:(
1445:s
1439:=
1436:s
1427:t
1421:=
1418:t
1400:a
1384:t
1380:s
1342:.
1196:n
1184:g
1177:i
1173:e
1166:s
1162:k
1122:b
1102:a
1079:]
1076:0
1073:[
1070:x
1067:=
1062:0
1058:x
1054:=
1051:)
1048:x
1045:(
1015:x
995:n
973:n
969:x
948:]
945:n
942:[
939:x
917:0
913:x
909:=
906:)
901:n
897:x
888:1
884:x
878:0
874:x
870:(
844:x
824:)
821:x
818:(
790:n
786:x
777:2
773:x
767:1
763:x
759:=
756:)
751:n
747:x
738:1
734:x
728:0
724:x
720:(
694:x
674:x
608:)
603:)
600:b
597:(
588:,
585:)
582:a
579:(
568:(
551:)
546:)
543:b
540:(
531:,
528:a
523:(
506:)
501:b
498:,
495:)
492:a
489:(
478:(
464:{
456:+
453:1
446:,
443:)
440:b
437:(
428:=
425:)
422:a
419:(
401:)
396:)
393:b
390:(
381:,
378:)
375:a
372:(
361:(
346:,
343:0
340:=
336:|
332:a
328:|
316:|
312:b
308:|
300:,
297:0
294:=
290:|
286:b
282:|
270:|
266:a
262:|
255:{
250:=
247:)
244:b
241:,
238:a
235:(
206:)
203:b
200:,
197:a
194:(
167:|
163:b
159:|
137:|
133:a
129:|
108:b
105:,
102:a
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.