Knowledge

Levenshtein distance

Source 📝

641: 224: 636:{\displaystyle \operatorname {lev} (a,b)={\begin{cases}|a|&{\text{ if }}|b|=0,\\|b|&{\text{ if }}|a|=0,\\\operatorname {lev} {\big (}\operatorname {tail} (a),\operatorname {tail} (b){\big )}&{\text{ if }}\operatorname {head} (a)=\operatorname {head} (b),\\1+\min {\begin{cases}\operatorname {lev} {\big (}\operatorname {tail} (a),b{\big )}\\\operatorname {lev} {\big (}a,\operatorname {tail} (b){\big )}\\\operatorname {lev} {\big (}\operatorname {tail} (a),\operatorname {tail} (b){\big )}\\\end{cases}}&{\text{ otherwise}}\end{cases}}} 1150: 27: 1252:, the objective is to find matches for short strings in many longer texts, in situations where a small number of differences is to be expected. The short strings could come from a dictionary, for instance. Here, one of the strings is typically short, while the other is arbitrarily long. This has a wide range of applications, for instance, 66:
for measuring the difference between two sequences. The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. It is named after Soviet mathematician
1235:
An example where the Levenshtein distance between two strings of the same length is strictly less than the Hamming distance is given by the pair "flaw" and "lawn". Here the Levenshtein distance equals 2 (delete "f" from the front; insert "n" at the end). The
1709:. The table is easy to construct one row at a time starting with row 0. When the entire table has been built, the desired distance is in the table in the last row and column, representing the distance between all of the characters in 2699:
It turns out that only two rows of the table – the previous row and the current row being calculated – are needed for the construction, if one does not want to reconstruct the edited input strings.
802: 1267:
The Levenshtein distance can also be computed between two longer strings, but the cost to compute it, which is roughly proportional to the product of the two string lengths, makes this impractical. Thus, when used to aid in
3397:
Assuming that intelligibility is inversely related to linguistic distance ... the content words the percentage of cognates (related directly or via a synonym) ... lexical relatedness ... grammatical relatedness
929: 1349:
is usually defined as a parameterizable metric calculated with a specific set of allowed edit operations, and each operation is assigned a cost (possibly infinite). This is further generalized by DNA
1089: 3110: 1157:
For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following 3 edits change one into the other, and there is no way to do it with fewer than 3 edits:
216: 834: 664: 1603:
A more efficient method would never repeat the same distance calculation. For example, the Levenshtein distance of all possible suffixes might be stored in an array
985: 1224:
is an upper bound on the Levenshtein distance. The Hamming distance is the number of positions at which the corresponding symbols in the two strings are different.
178: 148: 118: 1768: 1703: 1679: 1659: 1621: 1132: 1112: 1025: 1005: 958: 854: 704: 684: 3700: 3860: 1287:: the higher the linguistic distance, the lower the mutual intelligibility, and the lower the linguistic distance, the higher the mutual intelligibility. 709: 3838: 4699: 859: 4594: 4249: 3693: 4633: 4559: 4418: 4564: 3390: 3328: 1227:
The Levenshtein distance between two strings is no greater than the sum of their Levenshtein distances from a third string (
4569: 4159: 3850: 3686: 3023:. It can compute the optimal edit sequence, and not just the edit distance, in the same asymptotic time and space bounds. 1600:
This implementation is very inefficient because it recomputes the Levenshtein distance of the same substrings many times.
4413: 2123:
Two examples of the resulting matrix (hovering over a tagged number reveals the operation performed to get that number):
1803: 3451: 4780: 4554: 4020: 3225: 3157: 1321: 1034: 4823: 4648: 4589: 4461: 4174: 4005: 3300:
Levenshtein, Vladimir I. (February 1966). "Binary codes capable of correcting deletions, insertions, and reversals".
4496: 3945: 3175: 3056: 1310: 4362: 4015: 4676: 4010: 3755: 1371: 183: 807: 4279: 4000: 3034:
efficiently determine whether a string has an edit distance lower than a given constant from a given string.
3020: 1257: 4818: 4681: 4536: 3972: 3205: 1726: 1354: 1153:
Edit distance matrix for two words using cost of substitution as 1 and cost of deletion or insertion as 0.5
649: 31:
Edit distance matrix for two words using cost of substitution as 1 and cost of deletion or insertion as 0.5
3262: 83: 4760: 4486: 4317: 4302: 4274: 4139: 4134: 3709: 1339: 1314: 1249: 1190:
A simple example of a deletion can be seen with "uninformed" and "uniformed" which have a distance of 1:
4765: 4643: 4610: 4546: 4054: 4025: 3803: 3220: 1789:
of the first string and all prefixes of the second, then we can compute the values in the matrix in a
4775: 4671: 4615: 4516: 4470: 3897: 3750: 3479: 3215: 3016: 1786: 3605: 3493: 3348: 466: 257: 4770: 4574: 4506: 4423: 4347: 4079: 4035: 3920: 3818: 3380: 3546: 4813: 4719: 4327: 4297: 3964: 3047: 2668: 3798: 1778:
th column of the matrix, with the first row having index 0 and the first column having index 0.
4184: 3877: 3855: 3845: 3813: 3788: 3600: 3541: 3488: 3343: 1284: 1269: 78:, although that term may also denote a larger family of distance metrics known collectively as 4724: 4666: 4526: 4454: 4044: 3031: 1793:
fashion, and thus find the distance between the two full strings as the last value computed.
3532:
Schulz, Klaus U.; Mihov, Stoyan (2002). "Fast String Correction with Levenshtein-Automata".
4397: 4073: 4049: 3902: 3635: 3592: 3510: 3309: 3185: 1782: 1028: 963: 68: 8: 4729: 4377: 4307: 4264: 4220: 3992: 3982: 3977: 3865: 1797: 1790: 1305:, which are calculated using a different set of allowable edit operations. For instance, 1280: 1228: 153: 123: 97: 3639: 3596: 3382:
Receptive multilingualism: linguistic analyses, language policies, and didactic concepts
3313: 4658: 4625: 4387: 4259: 4124: 3887: 3870: 3728: 3625: 3582: 3559: 3514: 3471: 3433: 3411: 3361: 3257: 3236: 3190: 1738: 1688: 1664: 1626: 1606: 1350: 1261: 1117: 1097: 1010: 990: 934: 839: 689: 669: 47: 2990:// since data in v1 is always invalidated, a swap without copy could be more efficient 2703:
The Levenshtein distance may be calculated iteratively using the following algorithm:
4790: 4392: 4104: 3912: 3823: 3657: 3622:
Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false)
3563: 3518: 3386: 3365: 3195: 3437: 1781:
Computing the Levenshtein distance is based on the observation that if we reserve a
1211:
It is at least the absolute value of the difference of the sizes of the two strings.
4785: 4755: 4709: 4511: 4447: 4269: 4154: 4129: 3930: 3833: 3579:
Polylogarithmic approximation for edit distance and the asymmetric query complexity
3551: 3498: 3423: 3353: 3288:[Binary codes capable of correcting deletions, insertions, and reversals]. 3251: 3200: 2691:
operations. At the end, the bottom-right element of the array contains the answer.
1328: 1237: 1221: 1207:
The Levenshtein distance has several simple upper and lower bounds. These include:
55: 3673: 1386:, together with their lengths, and returns the Levenshtein distance between them: 797:{\displaystyle \operatorname {tail} (x_{0}x_{1}\dots x_{n})=x_{1}x_{2}\dots x_{n}} 4638: 4579: 4491: 4381: 4342: 4337: 4205: 3935: 3808: 3783: 3765: 3506: 2671:
maintained throughout the algorithm is that we can transform the initial segment
1149: 26: 4691: 4584: 4089: 4069: 3793: 3467: 1331:
allows only substitution, hence, it only applies to strings of the same length.
1276:, the compared strings are usually short to help improve speed of comparisons. 1273: 3678: 3555: 4807: 4501: 4478: 4352: 4164: 4144: 3925: 3210: 1346: 1335: 1302: 1296: 1279:
In linguistics, the Levenshtein distance is used as a metric to quantify the
1253: 79: 63: 4704: 4521: 4332: 3950: 3241: 3502: 3428: 3357: 4714: 4289: 4169: 3882: 3775: 3723: 3141:
It has been shown that the Levenshtein distance of two strings of length
51: 1283:, or how different two languages are from one another. It is related to 3892: 3624:. Forty-Seventh Annual ACM on Symposium on Theory of Computing (STOC). 2765:// that distance is the number of characters to append to s to make t. 1811: 1317:
of two adjacent characters alongside insertion, deletion, substitution;
3760: 3472:"A linear space algorithm for computing maximal common subsequences" 3285:Двоичные коды с исправлением выпадений, вставок и замещений символов 1324:(LCS) distance allows only insertion and deletion, not substitution; 4235: 4215: 4200: 4179: 4149: 4094: 4059: 3940: 2540: 2315: 2191: 1526:-- Otherwise try all three possible actions and select the best one 3630: 3587: 2831:// edit distance is delete (i + 1) chars from s to match empty t 1731:
This section uses 1-based strings rather than 0-based strings. If
924:{\displaystyle \operatorname {head} (x_{0}x_{1}\dots x_{n})=x_{0}} 4750: 4372: 4230: 4210: 4084: 3828: 3743: 3283: 3246: 2987:// copy v1 (current row) to v0 (previous row) for next iteration 2825:// calculate v1 (current row distances) from the previous row v0 1869:// for all i and j, d will hold the Levenshtein distance between 1357:, which make an operation's cost depend on where it is applied. 1290: 3738: 3733: 3577:
Andoni, Alexandr; Krauthgamer, Robert; Onak, Krzysztof (2010).
3378: 3230: 1448:-- If t is empty, the distance is the number of characters in s 1430:-- If s is empty, the distance is the number of characters in t 1260:, and software to assist natural-language translation based on 1094:
The first element in the minimum corresponds to deletion (from
3656:
Black, Paul E., ed. (14 August 2008), "Levenshtein distance",
1872:// the first i characters of s and the first j characters of t 4428: 4064: 3659:
Dictionary of Algorithms and Data Structures [online]
3170: 3233:(an open source search engine that implements edit distance) 1511:-- If the first characters are the same, they can be ignored 4734: 3180: 629: 615: 4439: 3534:
International Journal of Document Analysis and Recognition
1941:// target prefixes can be reached from empty source prefix 1905:// source prefixes can be transformed into empty string by 4225: 1134:), the second to insertion and the third to replacement. 3299: 3042:
The Levenshtein distance between two strings of length
3005:// after the last swap, the results of v1 are now in v0 2651: 2614: 2577: 2503: 2456: 2346: 2284: 2253: 2222: 1807:
by Robert A. Wagner and Michael J. Fischer.
1370:
This is a straightforward, but inefficient, recursive
3662:, U.S. National Institute of Standards and Technology 3581:. IEEE Symp. Foundations of Computer Science (FOCS). 3576: 3059: 2762:// this row is A: edit distance from an empty s to t; 1834:, and returns the Levenshtein distance between them: 1741: 1691: 1667: 1629: 1609: 1120: 1100: 1037: 1013: 993: 966: 937: 862: 842: 810: 712: 692: 672: 652: 227: 186: 156: 126: 100: 3674:
Rosetta Code implementations of Levenshtein distance
3379:
Jan D. ten Thije; Ludger Zeevaert (1 January 2007),
3414:(1974), "The String-to-String Correction Problem", 1800:, is discussed, with variants, in the 1974 article 3104: 1762: 1697: 1673: 1653: 1615: 1126: 1106: 1083: 1019: 999: 979: 952: 923: 848: 828: 796: 698: 678: 658: 635: 210: 172: 142: 112: 2694: 4805: 3911: 3281: 2759:// initialize v0 (the previous row of distances) 1217:It is zero if and only if the strings are equal. 1084:{\displaystyle \operatorname {head} (x)=x_{0}=x} 458: 74:Levenshtein distance may also be referred to as 3708: 2738:// create two work vectors of integer distances 3329:"A guided tour to approximate string matching" 1785:to hold the Levenshtein distances between all 1214:It is at most the length of the longer string. 686:is a string of all but the first character of 40:measuring the difference between two sequences 4455: 3694: 3409: 3105:{\displaystyle (\log n)^{O(1/\varepsilon )},} 2849:// use formula to fill in the rest of the row 1720: 1291:Relationship with other edit distance metrics 607: 567: 550: 522: 505: 477: 400: 360: 94:The Levenshtein distance between two strings 16:Computer science metric for string similarity 3619: 3453:Fast, memory efficient Levenshtein algorithm 1592:-- Character is replaced (a replaced with b) 3531: 3136: 4462: 4448: 3701: 3687: 3466: 3449: 2466: 2461: 2374: 25: 3629: 3604: 3586: 3545: 3492: 3427: 3372: 3347: 3122:is a free parameter to be tuned, in time 1202: 211:{\displaystyle \operatorname {lev} (a,b)} 4634:Comparison of regular-expression engines 1796:This algorithm, an example of bottom-up 1148: 1137:This definition corresponds directly to 829:{\displaystyle \operatorname {head} (x)} 3326: 3156:for any ε greater than zero unless the 1220:If the strings have the same size, the 4806: 3620:Backurs, Arturs; Indyk, Piotr (2015). 3320: 4595:Zhu–Takaoka string matching algorithm 4443: 3682: 3655: 3385:, John Benjamins Publishing Company, 1556:-- Character is inserted (b inserted) 1198:formed → uniformed (deletion of "n"). 659:{\displaystyle \operatorname {tail} } 4160:Simple Knowledge Organization System 2126: 1580:-- Character is deleted (a deleted) 1301:There are other popular measures of 1168:itten (substitution of "s" for "k"), 4560:Boyer–Moore string-search algorithm 1804:String-to-string correction problem 13: 3525: 3450:Hjelmqvist, Sten (26 March 2012), 3226:Longest common subsequence problem 3158:strong exponential time hypothesis 1139:the naive recursive implementation 71:, who defined the metric in 1965. 14: 4835: 4649:Nondeterministic finite automaton 4590:Two-way string-matching algorithm 4175:Thesaurus (information retrieval) 3649: 3196:Homology of sequences in genetics 1661:is the distance between the last 1378:function that takes two strings, 3037: 1179:n (substitution of "i" for "e"), 2685: 2678: 2672: 1944:// by inserting every character 1243: 4565:Boyer–Moore–Horspool algorithm 4555:Apostolico–Giancarlo algorithm 3756:Natural language understanding 3613: 3570: 3460: 3443: 3403: 3275: 3094: 3080: 3073: 3060: 2695:Iterative with two matrix rows 1814:implementation for a function 1757: 1745: 1648: 1642: 1639: 1633: 1360: 1186:(insertion of "g" at the end). 1138: 1078: 1072: 1050: 1044: 947: 941: 905: 869: 823: 817: 755: 719: 602: 596: 584: 578: 545: 539: 494: 488: 442: 436: 424: 418: 395: 389: 377: 371: 335: 327: 315: 307: 289: 281: 269: 261: 246: 234: 205: 193: 166: 158: 136: 128: 1: 4280:Optical character recognition 3269: 1258:optical character recognition 89: 4570:Knuth–Morris–Pratt algorithm 4497:Damerau–Levenshtein distance 3973:Multi-document summarization 3176:Damerau–Levenshtein distance 1365: 1311:Damerau–Levenshtein distance 7: 4761:Compressed pattern matching 4487:Approximate string matching 4469: 4303:Latent Dirichlet allocation 4275:Natural language generation 4140:Machine-readable dictionary 4135:Linguistic Linked Open Data 3710:Natural language processing 3163: 3145:cannot be computed in time 3026: 2828:// first element of v1 is A 1250:approximate string matching 1007:th character of the string 82:. It is closely related to 10: 4840: 4766:Longest common subsequence 4677:Needleman–Wunsch algorithm 4547:String-searching algorithm 4055:Explicit semantic analysis 3804:Deep linguistic processing 3290:Доклады Академии Наук СССР 3221:Locality-sensitive hashing 3019:combines this method with 2879:// calculating costs for A 1908:// dropping all characters 1810:This is a straightforward 1724: 1721:Iterative with full matrix 1713:and all the characters in 1322:longest common subsequence 1294: 1144: 836:is the first character of 180:respectively) is given by 84:pairwise string alignments 4776:Sequential pattern mining 4743: 4690: 4657: 4624: 4616:Commentz-Walter algorithm 4604:Multiple string searching 4603: 4545: 4537:Wagner–Fischer algorithm 4477: 4406: 4361: 4316: 4288: 4248: 4193: 4115: 4103: 4034: 3991: 3963: 3898:Word-sense disambiguation 3774: 3751:Computational linguistics 3716: 3556:10.1007/s10032-002-0082-8 3480:Communications of the ACM 3327:Navarro, Gonzalo (2001). 3282:В. И. Левенштейн (1965). 3263:Sørensen similarity index 1256:, correction systems for 36: 24: 4824:Quantitative linguistics 4786:String rewriting systems 4771:Longest common substring 4682:Smith–Waterman algorithm 4507:Gestalt pattern matching 4424:Natural Language Toolkit 4348:Pronunciation assessment 4250:Automatic identification 4080:Latent semantic analysis 4036:Distributional semantics 3921:Compound-term processing 3819:Named-entity recognition 3483:(Submitted manuscript). 3298:Appeared in English as: 3284: 3206:Hunt–Szymanski algorithm 3137:Computational complexity 2705: 1836: 1818:that takes two strings, 1727:Wagner–Fischer algorithm 1388: 1355:Smith–Waterman algorithm 1272:in applications such as 931:). Either the notation 4720:Generalized suffix tree 4644:Thompson's construction 4328:Automated essay scoring 4298:Document classification 3965:Automatic summarization 1353:algorithms such as the 4672:Hirschberg's algorithm 4185:Universal Dependencies 3878:Terminology extraction 3861:Semantic decomposition 3856:Semantic role labeling 3846:Part-of-speech tagging 3814:Information extraction 3799:Coreference resolution 3789:Collocation extraction 3302:Soviet Physics Doklady 3106: 3017:Hirschberg's algorithm 1764: 1699: 1675: 1655: 1617: 1285:mutual intelligibility 1270:fuzzy string searching 1203:Upper and lower bounds 1154: 1128: 1108: 1085: 1021: 1001: 981: 954: 925: 850: 830: 798: 700: 680: 660: 637: 212: 174: 144: 114: 4527:Levenshtein automaton 4517:Jaro–Winkler distance 3946:Sentence segmentation 3503:10.1145/360825.360861 3429:10.1145/321796.321811 3358:10.1145/375360.375365 3336:ACM Computing Surveys 3216:Jaro–Winkler distance 3107: 1765: 1705:characters of string 1700: 1681:characters of string 1676: 1656: 1618: 1152: 1129: 1109: 1086: 1022: 1002: 987:is used to refer the 982: 980:{\displaystyle x_{n}} 955: 926: 851: 831: 799: 701: 681: 661: 638: 213: 175: 145: 115: 4575:Rabin–Karp algorithm 4532:Levenshtein distance 4398:Voice user interface 4109:datasets and corpora 4050:Document-term matrix 3903:Word-sense induction 3186:Dynamic time warping 3057: 3032:Levenshtein automata 1739: 1689: 1665: 1627: 1607: 1374:implementation of a 1118: 1098: 1035: 1029:counting from 0 1011: 991: 964: 935: 860: 840: 808: 710: 690: 670: 650: 225: 184: 154: 124: 98: 69:Vladimir Levenshtein 60:Levenshtein distance 20:Levenshtein distance 4819:Dynamic programming 4730:Ternary search tree 4378:Interactive fiction 4308:Pachinko allocation 4265:Speech segmentation 4221:Google Ngram Viewer 3993:Machine translation 3983:Text simplification 3978:Sentence extraction 3866:Semantic similarity 3640:2014arXiv1412.0348B 3597:2010arXiv1005.4033A 3412:Fischer, Michael J. 3410:Wagner, Robert A.; 3314:1966SPhD...10..707L 3050:to within a factor 2711:LevenshteinDistance 2683:using a minimum of 1842:LevenshteinDistance 1816:LevenshteinDistance 1798:dynamic programming 1791:dynamic programming 1281:linguistic distance 1229:triangle inequality 173:{\displaystyle |b|} 143:{\displaystyle |a|} 113:{\displaystyle a,b} 21: 4659:Sequence alignment 4626:Regular expression 4388:Question answering 4260:Speech recognition 4125:Corpus linguistics 4105:Language resources 3888:Textual entailment 3871:Sentiment analysis 3416:Journal of the ACM 3258:Numerical taxonomy 3237:Manhattan distance 3191:Euclidean distance 3102: 3021:divide and conquer 1760: 1695: 1671: 1651: 1613: 1351:sequence alignment 1262:translation memory 1155: 1124: 1104: 1081: 1017: 997: 977: 950: 921: 846: 826: 794: 696: 676: 656: 633: 628: 614: 208: 170: 140: 110: 48:information theory 19: 4799: 4798: 4791:String operations 4437: 4436: 4393:Virtual assistant 4318:Computer-assisted 4244: 4243: 4001:Computer-assisted 3959: 3958: 3951:Word segmentation 3913:Text segmentation 3851:Semantic analysis 3839:Syntactic parsing 3824:Ontology learning 3468:Hirschberg, D. S. 3392:978-90-272-1926-8 2664: 2663: 2657: 2656: 2380: 2379: 1763:{\displaystyle m} 1698:{\displaystyle j} 1674:{\displaystyle i} 1654:{\displaystyle M} 1616:{\displaystyle M} 1127:{\displaystyle b} 1107:{\displaystyle a} 1020:{\displaystyle x} 1000:{\displaystyle n} 953:{\displaystyle x} 849:{\displaystyle x} 699:{\displaystyle x} 679:{\displaystyle x} 624: 410: 324: 278: 44: 43: 4831: 4756:Pattern matching 4710:Suffix automaton 4512:Hamming distance 4464: 4457: 4450: 4441: 4440: 4414:Formal semantics 4363:Natural language 4270:Speech synthesis 4252:and data capture 4155:Semantic network 4130:Lexical resource 4113: 4112: 3931:Lexical analysis 3909: 3908: 3834:Semantic parsing 3703: 3696: 3689: 3680: 3679: 3670: 3669: 3667: 3644: 3643: 3633: 3617: 3611: 3610: 3608: 3590: 3574: 3568: 3567: 3549: 3529: 3523: 3522: 3496: 3476: 3464: 3458: 3456: 3447: 3441: 3440: 3431: 3407: 3401: 3399: 3376: 3370: 3369: 3351: 3333: 3324: 3318: 3317: 3297: 3279: 3252:Optimal matching 3201:Hamming distance 3155: 3144: 3132: 3121: 3111: 3109: 3108: 3103: 3098: 3097: 3090: 3045: 3012: 3009: 3006: 3003: 3000: 2997: 2994: 2991: 2988: 2985: 2982: 2981:substitutionCost 2979: 2976: 2973: 2970: 2967: 2964: 2961: 2958: 2955: 2952: 2949: 2946: 2943: 2942:substitutionCost 2940: 2937: 2934: 2931: 2928: 2927:substitutionCost 2925: 2922: 2919: 2916: 2913: 2910: 2907: 2904: 2901: 2898: 2895: 2892: 2889: 2886: 2883: 2880: 2877: 2874: 2871: 2868: 2865: 2862: 2859: 2856: 2853: 2850: 2847: 2844: 2841: 2838: 2835: 2832: 2829: 2826: 2823: 2820: 2817: 2814: 2811: 2808: 2805: 2802: 2799: 2796: 2793: 2790: 2787: 2784: 2781: 2778: 2775: 2772: 2769: 2766: 2763: 2760: 2757: 2754: 2751: 2748: 2745: 2742: 2739: 2736: 2733: 2730: 2727: 2724: 2721: 2718: 2715: 2712: 2709: 2690: 2689: 2688: 2682: 2681: 2676: 2675: 2653: 2616: 2579: 2542: 2505: 2468: 2463: 2458: 2386: 2385: 2376: 2348: 2317: 2286: 2255: 2224: 2193: 2133: 2132: 2127: 2119: 2116: 2113: 2110: 2107: 2106:substitutionCost 2104: 2101: 2098: 2095: 2092: 2089: 2086: 2083: 2080: 2077: 2074: 2071: 2068: 2065: 2062: 2059: 2056: 2053: 2050: 2049:substitutionCost 2047: 2044: 2041: 2038: 2035: 2034:substitutionCost 2032: 2029: 2026: 2023: 2020: 2017: 2014: 2011: 2008: 2005: 2002: 1999: 1996: 1993: 1990: 1987: 1984: 1981: 1978: 1975: 1972: 1969: 1966: 1963: 1960: 1957: 1954: 1951: 1948: 1945: 1942: 1939: 1936: 1933: 1930: 1927: 1924: 1921: 1918: 1915: 1912: 1909: 1906: 1903: 1900: 1897: 1894: 1891: 1888: 1885: 1882: 1879: 1876: 1873: 1870: 1867: 1864: 1861: 1858: 1855: 1852: 1849: 1846: 1843: 1840: 1817: 1769: 1767: 1766: 1761: 1716: 1712: 1708: 1704: 1702: 1701: 1696: 1684: 1680: 1678: 1677: 1672: 1660: 1658: 1657: 1652: 1622: 1620: 1619: 1614: 1596: 1593: 1590: 1587: 1584: 1581: 1578: 1575: 1572: 1569: 1566: 1563: 1560: 1557: 1554: 1551: 1548: 1545: 1542: 1539: 1536: 1533: 1530: 1527: 1524: 1521: 1518: 1515: 1512: 1509: 1506: 1503: 1500: 1497: 1494: 1491: 1488: 1485: 1482: 1479: 1476: 1473: 1470: 1467: 1464: 1461: 1458: 1455: 1452: 1449: 1446: 1443: 1440: 1437: 1434: 1431: 1428: 1425: 1422: 1419: 1416: 1413: 1410: 1407: 1404: 1401: 1398: 1395: 1392: 1377: 1329:Hamming distance 1238:Hamming distance 1222:Hamming distance 1133: 1131: 1130: 1125: 1113: 1111: 1110: 1105: 1090: 1088: 1087: 1082: 1065: 1064: 1026: 1024: 1023: 1018: 1006: 1004: 1003: 998: 986: 984: 983: 978: 976: 975: 959: 957: 956: 951: 930: 928: 927: 922: 920: 919: 904: 903: 891: 890: 881: 880: 855: 853: 852: 847: 835: 833: 832: 827: 803: 801: 800: 795: 793: 792: 780: 779: 770: 769: 754: 753: 741: 740: 731: 730: 705: 703: 702: 697: 685: 683: 682: 677: 665: 663: 662: 657: 642: 640: 639: 634: 632: 631: 625: 622: 618: 617: 611: 610: 571: 570: 554: 553: 526: 525: 509: 508: 481: 480: 411: 408: 404: 403: 364: 363: 338: 330: 325: 322: 318: 310: 292: 284: 279: 276: 272: 264: 217: 215: 214: 209: 179: 177: 176: 171: 169: 161: 149: 147: 146: 141: 139: 131: 119: 117: 116: 111: 56:computer science 29: 22: 18: 4839: 4838: 4834: 4833: 4832: 4830: 4829: 4828: 4804: 4803: 4800: 4795: 4739: 4686: 4653: 4639:Regular grammar 4620: 4599: 4580:Raita algorithm 4541: 4492:Bitap algorithm 4473: 4468: 4438: 4433: 4402: 4382:Syntax guessing 4364: 4357: 4343:Predictive text 4338:Grammar checker 4319: 4312: 4284: 4251: 4240: 4206:Bank of English 4189: 4117: 4108: 4099: 4030: 3987: 3955: 3907: 3809:Distant reading 3784:Argument mining 3770: 3766:Text processing 3712: 3707: 3665: 3663: 3652: 3647: 3618: 3614: 3606:10.1.1.208.2079 3575: 3571: 3530: 3526: 3494:10.1.1.348.4774 3474: 3465: 3461: 3448: 3444: 3408: 3404: 3393: 3377: 3373: 3349:10.1.1.452.6317 3331: 3325: 3321: 3286: 3280: 3276: 3272: 3267: 3166: 3146: 3142: 3139: 3123: 3116: 3086: 3076: 3072: 3058: 3055: 3054: 3043: 3040: 3029: 3014: 3013: 3010: 3007: 3004: 3001: 2998: 2995: 2992: 2989: 2986: 2983: 2980: 2977: 2974: 2971: 2968: 2965: 2962: 2959: 2956: 2953: 2950: 2947: 2944: 2941: 2938: 2935: 2932: 2929: 2926: 2923: 2920: 2917: 2914: 2911: 2908: 2905: 2902: 2899: 2896: 2893: 2890: 2887: 2884: 2881: 2878: 2875: 2872: 2869: 2866: 2863: 2860: 2857: 2854: 2851: 2848: 2845: 2842: 2839: 2836: 2833: 2830: 2827: 2824: 2821: 2818: 2815: 2812: 2809: 2806: 2803: 2800: 2797: 2794: 2791: 2788: 2785: 2782: 2779: 2776: 2773: 2770: 2767: 2764: 2761: 2758: 2755: 2752: 2749: 2746: 2743: 2740: 2737: 2734: 2731: 2728: 2725: 2722: 2719: 2716: 2713: 2710: 2707: 2697: 2686: 2684: 2679: 2673: 2665: 2121: 2120: 2117: 2114: 2112:// substitution 2111: 2108: 2105: 2102: 2099: 2096: 2093: 2090: 2087: 2084: 2081: 2078: 2075: 2072: 2069: 2066: 2063: 2060: 2057: 2054: 2051: 2048: 2045: 2042: 2039: 2036: 2033: 2030: 2027: 2024: 2021: 2018: 2015: 2012: 2009: 2006: 2003: 2000: 1997: 1994: 1991: 1988: 1985: 1982: 1979: 1976: 1973: 1970: 1967: 1964: 1961: 1958: 1955: 1952: 1949: 1946: 1943: 1940: 1937: 1934: 1931: 1928: 1925: 1922: 1919: 1916: 1913: 1910: 1907: 1904: 1901: 1898: 1895: 1892: 1889: 1886: 1883: 1880: 1877: 1874: 1871: 1868: 1865: 1862: 1859: 1856: 1853: 1850: 1847: 1844: 1841: 1838: 1815: 1774:th row and the 1740: 1737: 1736: 1729: 1723: 1714: 1710: 1706: 1690: 1687: 1686: 1682: 1666: 1663: 1662: 1628: 1625: 1624: 1608: 1605: 1604: 1598: 1597: 1594: 1591: 1588: 1585: 1582: 1579: 1576: 1573: 1570: 1567: 1564: 1561: 1558: 1555: 1552: 1549: 1546: 1543: 1540: 1537: 1534: 1531: 1528: 1525: 1522: 1519: 1516: 1513: 1510: 1507: 1504: 1501: 1498: 1495: 1492: 1489: 1486: 1483: 1480: 1477: 1474: 1471: 1468: 1465: 1462: 1459: 1456: 1453: 1450: 1447: 1444: 1441: 1438: 1435: 1432: 1429: 1426: 1423: 1420: 1417: 1414: 1411: 1408: 1405: 1402: 1399: 1396: 1393: 1390: 1375: 1368: 1363: 1299: 1293: 1246: 1205: 1182:sittin → sittin 1147: 1119: 1116: 1115: 1099: 1096: 1095: 1060: 1056: 1036: 1033: 1032: 1012: 1009: 1008: 992: 989: 988: 971: 967: 965: 962: 961: 936: 933: 932: 915: 911: 899: 895: 886: 882: 876: 872: 861: 858: 857: 841: 838: 837: 809: 806: 805: 788: 784: 775: 771: 765: 761: 749: 745: 736: 732: 726: 722: 711: 708: 707: 691: 688: 687: 671: 668: 667: 666:of some string 651: 648: 647: 627: 626: 623: otherwise 621: 619: 613: 612: 606: 605: 566: 565: 556: 555: 549: 548: 521: 520: 511: 510: 504: 503: 476: 475: 462: 461: 449: 448: 407: 405: 399: 398: 359: 358: 349: 348: 334: 326: 321: 319: 314: 306: 303: 302: 288: 280: 275: 273: 268: 260: 253: 252: 226: 223: 222: 185: 182: 181: 165: 157: 155: 152: 151: 135: 127: 125: 122: 121: 99: 96: 95: 92: 32: 17: 12: 11: 5: 4837: 4827: 4826: 4821: 4816: 4814:String metrics 4797: 4796: 4794: 4793: 4788: 4783: 4778: 4773: 4768: 4763: 4758: 4753: 4747: 4745: 4741: 4740: 4738: 4737: 4732: 4727: 4722: 4717: 4712: 4707: 4702: 4696: 4694: 4692:Data structure 4688: 4687: 4685: 4684: 4679: 4674: 4669: 4663: 4661: 4655: 4654: 4652: 4651: 4646: 4641: 4636: 4630: 4628: 4622: 4621: 4619: 4618: 4613: 4607: 4605: 4601: 4600: 4598: 4597: 4592: 4587: 4585:Trigram search 4582: 4577: 4572: 4567: 4562: 4557: 4551: 4549: 4543: 4542: 4540: 4539: 4534: 4529: 4524: 4519: 4514: 4509: 4504: 4499: 4494: 4489: 4483: 4481: 4475: 4474: 4467: 4466: 4459: 4452: 4444: 4435: 4434: 4432: 4431: 4426: 4421: 4416: 4410: 4408: 4404: 4403: 4401: 4400: 4395: 4390: 4385: 4375: 4369: 4367: 4365:user interface 4359: 4358: 4356: 4355: 4350: 4345: 4340: 4335: 4330: 4324: 4322: 4314: 4313: 4311: 4310: 4305: 4300: 4294: 4292: 4286: 4285: 4283: 4282: 4277: 4272: 4267: 4262: 4256: 4254: 4246: 4245: 4242: 4241: 4239: 4238: 4233: 4228: 4223: 4218: 4213: 4208: 4203: 4197: 4195: 4191: 4190: 4188: 4187: 4182: 4177: 4172: 4167: 4162: 4157: 4152: 4147: 4142: 4137: 4132: 4127: 4121: 4119: 4110: 4101: 4100: 4098: 4097: 4092: 4090:Word embedding 4087: 4082: 4077: 4070:Language model 4067: 4062: 4057: 4052: 4047: 4041: 4039: 4032: 4031: 4029: 4028: 4023: 4021:Transfer-based 4018: 4013: 4008: 4003: 3997: 3995: 3989: 3988: 3986: 3985: 3980: 3975: 3969: 3967: 3961: 3960: 3957: 3956: 3954: 3953: 3948: 3943: 3938: 3933: 3928: 3923: 3917: 3915: 3906: 3905: 3900: 3895: 3890: 3885: 3880: 3874: 3873: 3868: 3863: 3858: 3853: 3848: 3843: 3842: 3841: 3836: 3826: 3821: 3816: 3811: 3806: 3801: 3796: 3794:Concept mining 3791: 3786: 3780: 3778: 3772: 3771: 3769: 3768: 3763: 3758: 3753: 3748: 3747: 3746: 3741: 3731: 3726: 3720: 3718: 3714: 3713: 3706: 3705: 3698: 3691: 3683: 3677: 3676: 3671: 3651: 3650:External links 3648: 3646: 3645: 3612: 3569: 3524: 3487:(6): 341–343. 3459: 3442: 3422:(1): 168–173, 3402: 3391: 3371: 3319: 3308:(8): 707–710. 3292:(in Russian). 3273: 3271: 3268: 3266: 3265: 3260: 3255: 3249: 3244: 3239: 3234: 3228: 3223: 3218: 3213: 3208: 3203: 3198: 3193: 3188: 3183: 3178: 3173: 3167: 3165: 3162: 3138: 3135: 3113: 3112: 3101: 3096: 3093: 3089: 3085: 3082: 3079: 3075: 3071: 3068: 3065: 3062: 3039: 3036: 3028: 3025: 2706: 2696: 2693: 2662: 2661: 2655: 2654: 2649: 2646: 2643: 2640: 2637: 2634: 2631: 2628: 2625: 2621: 2620: 2617: 2612: 2609: 2606: 2603: 2600: 2597: 2594: 2591: 2587: 2586: 2583: 2580: 2575: 2572: 2569: 2566: 2563: 2560: 2557: 2553: 2552: 2549: 2546: 2543: 2538: 2535: 2532: 2529: 2526: 2523: 2519: 2518: 2515: 2512: 2509: 2506: 2501: 2498: 2495: 2492: 2489: 2485: 2484: 2481: 2478: 2475: 2472: 2469: 2464: 2459: 2454: 2451: 2447: 2446: 2443: 2440: 2437: 2434: 2431: 2428: 2425: 2422: 2419: 2416: 2415: 2412: 2409: 2406: 2403: 2400: 2397: 2394: 2391: 2389: 2383: 2382: 2381: 2378: 2377: 2372: 2369: 2366: 2363: 2360: 2357: 2354: 2350: 2349: 2344: 2341: 2338: 2335: 2332: 2329: 2326: 2322: 2321: 2318: 2313: 2310: 2307: 2304: 2301: 2298: 2294: 2293: 2290: 2287: 2282: 2279: 2276: 2273: 2270: 2266: 2265: 2262: 2259: 2256: 2251: 2248: 2245: 2242: 2238: 2237: 2234: 2231: 2228: 2225: 2220: 2217: 2214: 2210: 2209: 2206: 2203: 2200: 2197: 2194: 2189: 2186: 2182: 2181: 2178: 2175: 2172: 2169: 2166: 2163: 2160: 2157: 2156: 2153: 2150: 2147: 2144: 2141: 2138: 2136: 2125: 1837: 1759: 1756: 1753: 1750: 1747: 1744: 1725:Main article: 1722: 1719: 1694: 1670: 1650: 1647: 1644: 1641: 1638: 1635: 1632: 1612: 1389: 1367: 1364: 1362: 1359: 1344: 1343: 1332: 1325: 1318: 1295:Main article: 1292: 1289: 1274:record linkage 1254:spell checkers 1245: 1242: 1233: 1232: 1225: 1218: 1215: 1212: 1204: 1201: 1200: 1199: 1188: 1187: 1180: 1169: 1146: 1143: 1123: 1103: 1080: 1077: 1074: 1071: 1068: 1063: 1059: 1055: 1052: 1049: 1046: 1043: 1040: 1016: 996: 974: 970: 949: 946: 943: 940: 918: 914: 910: 907: 902: 898: 894: 889: 885: 879: 875: 871: 868: 865: 845: 825: 822: 819: 816: 813: 791: 787: 783: 778: 774: 768: 764: 760: 757: 752: 748: 744: 739: 735: 729: 725: 721: 718: 715: 695: 675: 655: 644: 643: 630: 620: 616: 609: 604: 601: 598: 595: 592: 589: 586: 583: 580: 577: 574: 569: 564: 561: 558: 557: 552: 547: 544: 541: 538: 535: 532: 529: 524: 519: 516: 513: 512: 507: 502: 499: 496: 493: 490: 487: 484: 479: 474: 471: 468: 467: 465: 460: 457: 454: 451: 450: 447: 444: 441: 438: 435: 432: 429: 426: 423: 420: 417: 414: 409: if  406: 402: 397: 394: 391: 388: 385: 382: 379: 376: 373: 370: 367: 362: 357: 354: 351: 350: 347: 344: 341: 337: 333: 329: 323: if  320: 317: 313: 309: 305: 304: 301: 298: 295: 291: 287: 283: 277: if  274: 271: 267: 263: 259: 258: 256: 251: 248: 245: 242: 239: 236: 233: 230: 207: 204: 201: 198: 195: 192: 189: 168: 164: 160: 138: 134: 130: 109: 106: 103: 91: 88: 42: 41: 38: 34: 33: 30: 15: 9: 6: 4: 3: 2: 4836: 4825: 4822: 4820: 4817: 4815: 4812: 4811: 4809: 4802: 4792: 4789: 4787: 4784: 4782: 4779: 4777: 4774: 4772: 4769: 4767: 4764: 4762: 4759: 4757: 4754: 4752: 4749: 4748: 4746: 4742: 4736: 4733: 4731: 4728: 4726: 4723: 4721: 4718: 4716: 4713: 4711: 4708: 4706: 4703: 4701: 4698: 4697: 4695: 4693: 4689: 4683: 4680: 4678: 4675: 4673: 4670: 4668: 4665: 4664: 4662: 4660: 4656: 4650: 4647: 4645: 4642: 4640: 4637: 4635: 4632: 4631: 4629: 4627: 4623: 4617: 4614: 4612: 4609: 4608: 4606: 4602: 4596: 4593: 4591: 4588: 4586: 4583: 4581: 4578: 4576: 4573: 4571: 4568: 4566: 4563: 4561: 4558: 4556: 4553: 4552: 4550: 4548: 4544: 4538: 4535: 4533: 4530: 4528: 4525: 4523: 4520: 4518: 4515: 4513: 4510: 4508: 4505: 4503: 4502:Edit distance 4500: 4498: 4495: 4493: 4490: 4488: 4485: 4484: 4482: 4480: 4479:String metric 4476: 4472: 4465: 4460: 4458: 4453: 4451: 4446: 4445: 4442: 4430: 4427: 4425: 4422: 4420: 4419:Hallucination 4417: 4415: 4412: 4411: 4409: 4405: 4399: 4396: 4394: 4391: 4389: 4386: 4383: 4379: 4376: 4374: 4371: 4370: 4368: 4366: 4360: 4354: 4353:Spell checker 4351: 4349: 4346: 4344: 4341: 4339: 4336: 4334: 4331: 4329: 4326: 4325: 4323: 4321: 4315: 4309: 4306: 4304: 4301: 4299: 4296: 4295: 4293: 4291: 4287: 4281: 4278: 4276: 4273: 4271: 4268: 4266: 4263: 4261: 4258: 4257: 4255: 4253: 4247: 4237: 4234: 4232: 4229: 4227: 4224: 4222: 4219: 4217: 4214: 4212: 4209: 4207: 4204: 4202: 4199: 4198: 4196: 4192: 4186: 4183: 4181: 4178: 4176: 4173: 4171: 4168: 4166: 4165:Speech corpus 4163: 4161: 4158: 4156: 4153: 4151: 4148: 4146: 4145:Parallel text 4143: 4141: 4138: 4136: 4133: 4131: 4128: 4126: 4123: 4122: 4120: 4114: 4111: 4106: 4102: 4096: 4093: 4091: 4088: 4086: 4083: 4081: 4078: 4075: 4071: 4068: 4066: 4063: 4061: 4058: 4056: 4053: 4051: 4048: 4046: 4043: 4042: 4040: 4037: 4033: 4027: 4024: 4022: 4019: 4017: 4014: 4012: 4009: 4007: 4006:Example-based 4004: 4002: 3999: 3998: 3996: 3994: 3990: 3984: 3981: 3979: 3976: 3974: 3971: 3970: 3968: 3966: 3962: 3952: 3949: 3947: 3944: 3942: 3939: 3937: 3936:Text chunking 3934: 3932: 3929: 3927: 3926:Lemmatisation 3924: 3922: 3919: 3918: 3916: 3914: 3910: 3904: 3901: 3899: 3896: 3894: 3891: 3889: 3886: 3884: 3881: 3879: 3876: 3875: 3872: 3869: 3867: 3864: 3862: 3859: 3857: 3854: 3852: 3849: 3847: 3844: 3840: 3837: 3835: 3832: 3831: 3830: 3827: 3825: 3822: 3820: 3817: 3815: 3812: 3810: 3807: 3805: 3802: 3800: 3797: 3795: 3792: 3790: 3787: 3785: 3782: 3781: 3779: 3777: 3776:Text analysis 3773: 3767: 3764: 3762: 3759: 3757: 3754: 3752: 3749: 3745: 3742: 3740: 3737: 3736: 3735: 3732: 3730: 3727: 3725: 3722: 3721: 3719: 3717:General terms 3715: 3711: 3704: 3699: 3697: 3692: 3690: 3685: 3684: 3681: 3675: 3672: 3661: 3660: 3654: 3653: 3641: 3637: 3632: 3627: 3623: 3616: 3607: 3602: 3598: 3594: 3589: 3584: 3580: 3573: 3565: 3561: 3557: 3553: 3548: 3547:10.1.1.16.652 3543: 3539: 3535: 3528: 3520: 3516: 3512: 3508: 3504: 3500: 3495: 3490: 3486: 3482: 3481: 3473: 3469: 3463: 3455: 3454: 3446: 3439: 3435: 3430: 3425: 3421: 3417: 3413: 3406: 3398: 3394: 3388: 3384: 3383: 3375: 3367: 3363: 3359: 3355: 3350: 3345: 3341: 3337: 3330: 3323: 3315: 3311: 3307: 3303: 3296:(4): 845–848. 3295: 3291: 3287: 3278: 3274: 3264: 3261: 3259: 3256: 3253: 3250: 3248: 3245: 3243: 3240: 3238: 3235: 3232: 3229: 3227: 3224: 3222: 3219: 3217: 3214: 3212: 3211:Jaccard index 3209: 3207: 3204: 3202: 3199: 3197: 3194: 3192: 3189: 3187: 3184: 3182: 3179: 3177: 3174: 3172: 3169: 3168: 3161: 3159: 3153: 3149: 3134: 3130: 3126: 3119: 3099: 3091: 3087: 3083: 3077: 3069: 3066: 3063: 3053: 3052: 3051: 3049: 3038:Approximation 3035: 3033: 3024: 3022: 3018: 2975:insertionCost 2897:insertionCost 2704: 2701: 2692: 2670: 2660: 2650: 2647: 2644: 2641: 2638: 2635: 2632: 2629: 2626: 2623: 2622: 2618: 2613: 2610: 2607: 2604: 2601: 2598: 2595: 2592: 2589: 2588: 2584: 2581: 2576: 2573: 2570: 2567: 2564: 2561: 2558: 2555: 2554: 2550: 2547: 2544: 2539: 2536: 2533: 2530: 2527: 2524: 2521: 2520: 2516: 2513: 2510: 2507: 2502: 2499: 2496: 2493: 2490: 2487: 2486: 2482: 2479: 2476: 2473: 2470: 2465: 2460: 2455: 2452: 2449: 2448: 2444: 2441: 2438: 2435: 2432: 2429: 2426: 2423: 2420: 2418: 2417: 2413: 2410: 2407: 2404: 2401: 2398: 2395: 2392: 2390: 2388: 2387: 2384: 2373: 2370: 2367: 2364: 2361: 2358: 2355: 2352: 2351: 2345: 2342: 2339: 2336: 2333: 2330: 2327: 2324: 2323: 2319: 2314: 2311: 2308: 2305: 2302: 2299: 2296: 2295: 2291: 2288: 2283: 2280: 2277: 2274: 2271: 2268: 2267: 2263: 2260: 2257: 2252: 2249: 2246: 2243: 2240: 2239: 2235: 2232: 2229: 2226: 2221: 2218: 2215: 2212: 2211: 2207: 2204: 2201: 2198: 2195: 2190: 2187: 2184: 2183: 2179: 2176: 2173: 2170: 2167: 2164: 2161: 2159: 2158: 2154: 2151: 2148: 2145: 2142: 2139: 2137: 2135: 2134: 2131: 2130: 2129: 2128: 2124: 1835: 1833: 1829: 1825: 1821: 1813: 1808: 1806: 1805: 1799: 1794: 1792: 1788: 1784: 1779: 1777: 1773: 1754: 1751: 1748: 1742: 1735:is a matrix, 1734: 1728: 1718: 1692: 1685:and the last 1668: 1645: 1636: 1630: 1610: 1601: 1387: 1385: 1381: 1373: 1358: 1356: 1352: 1348: 1347:Edit distance 1341: 1340:transposition 1337: 1336:Jaro distance 1333: 1330: 1326: 1323: 1319: 1316: 1315:transposition 1312: 1308: 1307: 1306: 1304: 1303:edit distance 1298: 1297:Edit distance 1288: 1286: 1282: 1277: 1275: 1271: 1265: 1263: 1259: 1255: 1251: 1241: 1239: 1230: 1226: 1223: 1219: 1216: 1213: 1210: 1209: 1208: 1197: 1193: 1192: 1191: 1185: 1181: 1178: 1174: 1170: 1167: 1163: 1160: 1159: 1158: 1151: 1142: 1140: 1135: 1121: 1101: 1092: 1075: 1069: 1066: 1061: 1057: 1053: 1047: 1041: 1038: 1030: 1014: 994: 972: 968: 944: 938: 916: 912: 908: 900: 896: 892: 887: 883: 877: 873: 866: 863: 843: 820: 814: 811: 789: 785: 781: 776: 772: 766: 762: 758: 750: 746: 742: 737: 733: 727: 723: 716: 713: 693: 673: 653: 599: 593: 590: 587: 581: 575: 572: 562: 559: 542: 536: 533: 530: 527: 517: 514: 500: 497: 491: 485: 482: 472: 469: 463: 455: 452: 445: 439: 433: 430: 427: 421: 415: 412: 392: 386: 383: 380: 374: 368: 365: 355: 352: 345: 342: 339: 331: 311: 299: 296: 293: 285: 265: 254: 249: 243: 240: 237: 231: 228: 221: 220: 219: 202: 199: 196: 190: 187: 162: 132: 107: 104: 101: 87: 85: 81: 80:edit distance 77: 76:edit distance 72: 70: 65: 64:string metric 61: 57: 53: 49: 39: 35: 28: 23: 4801: 4705:Suffix array 4611:Aho–Corasick 4531: 4522:Lee distance 4333:Concordancer 3729:Bag-of-words 3664:, retrieved 3658: 3621: 3615: 3578: 3572: 3540:(1): 67–85. 3537: 3533: 3527: 3484: 3478: 3462: 3452: 3445: 3419: 3415: 3405: 3396: 3381: 3374: 3342:(1): 31–88. 3339: 3335: 3322: 3305: 3301: 3293: 3289: 3277: 3242:Metric space 3151: 3147: 3140: 3128: 3124: 3117: 3114: 3048:approximated 3041: 3030: 3015: 2969:deletionCost 2882:deletionCost 2702: 2698: 2666: 2658: 2122: 2097:// insertion 1831: 1827: 1823: 1819: 1809: 1801: 1795: 1780: 1775: 1771: 1732: 1730: 1602: 1599: 1383: 1379: 1369: 1345: 1338:allows only 1300: 1278: 1266: 1247: 1244:Applications 1234: 1206: 1195: 1189: 1183: 1176: 1172: 1165: 1161: 1156: 1136: 1093: 645: 93: 75: 73: 59: 45: 4715:Suffix tree 4290:Topic model 4170:Text corpus 4016:Statistical 3883:Text mining 3724:AI-complete 2082:// deletion 1361:Computation 1313:allows the 120:(of length 52:linguistics 4808:Categories 4011:Rule-based 3893:Truecasing 3761:Stop words 3666:2 November 3270:References 3160:is false. 1830:of length 1822:of length 1812:pseudocode 646:where the 90:Definition 4320:reviewing 4118:standards 4116:Types and 3631:1412.0348 3601:CiteSeerX 3588:1005.4033 3564:207046453 3542:CiteSeerX 3519:207694727 3489:CiteSeerX 3366:207551224 3344:CiteSeerX 3254:algorithm 3092:ε 3067:⁡ 2669:invariant 1583:lDistance 1559:lDistance 1532:lDistance 1502:lDistance 1451:lDistance 1433:lDistance 1415:lDistance 1391:lDistance 1376:lDistance 1366:Recursive 1042:⁡ 893:… 867:⁡ 815:⁡ 782:… 743:… 717:⁡ 594:⁡ 576:⁡ 563:⁡ 537:⁡ 518:⁡ 486:⁡ 473:⁡ 434:⁡ 416:⁡ 387:⁡ 369:⁡ 356:⁡ 232:⁡ 191:⁡ 4236:Wikidata 4216:FrameNet 4201:BabelNet 4180:Treebank 4150:PropBank 4095:Word2vec 4060:fastText 3941:Stemming 3470:(1975). 3438:13381535 3164:See also 3027:Automata 2708:function 1839:function 1787:prefixes 1623:, where 1175:n → sitt 1164:itten → 4781:Sorting 4751:Parsing 4471:Strings 4407:Related 4373:Chatbot 4231:WordNet 4211:DBpedia 4085:Seq2seq 3829:Parsing 3744:Trigram 3636:Bibcode 3593:Bibcode 3511:0375829 3310:Bibcode 3247:MinHash 3046:can be 2963:minimum 2750:declare 2741:declare 2064:minimum 1890:element 1875:declare 1770:is the 1523:minimum 1372:Haskell 1145:Example 1031:, thus 804:), and 4380:(c.f. 4038:models 4026:Neural 3739:Bigram 3734:n-gram 3603:  3562:  3544:  3517:  3509:  3491:  3436:  3389:  3364:  3346:  3231:Lucene 3120:> 0 3115:where 3008:return 2659: 2115:return 1826:, and 1783:matrix 1589:t' 1586:s' 1574:t' 1562:s' 1550:t' 1544:s' 1508:t' 1505:s' 1478:t' 1463:s' 1442:length 1424:length 1240:is 4. 856:(i.e. 706:(i.e. 218:where 58:, the 54:, and 4744:Other 4700:DAFSA 4667:BLAST 4429:spaCy 4074:large 4065:GloVe 3626:arXiv 3583:arXiv 3560:S2CID 3515:S2CID 3475:(PDF) 3434:S2CID 3362:S2CID 3332:(PDF) 3171:agrep 2677:into 1409:-> 1406:-> 1403:=> 62:is a 37:Class 4735:Trie 4725:Rope 4194:Data 4045:BERT 3668:2016 3387:ISBN 3181:diff 2999:with 2993:swap 2936:else 2858:from 2804:from 2774:from 2726:char 2717:char 2667:The 2043:else 2004:from 1983:from 1953:from 1917:from 1902:zero 1887:each 1857:char 1848:char 1802:The 1514:else 1499:then 1382:and 1334:the 1327:the 1320:the 1309:the 1171:sitt 1039:head 864:head 812:head 714:tail 654:tail 591:tail 573:tail 534:tail 483:tail 431:head 413:head 384:tail 366:tail 150:and 4226:UBY 3552:doi 3499:doi 3424:doi 3354:doi 3294:163 3064:log 2852:for 2798:for 2768:for 2753:int 2744:int 1998:for 1977:for 1947:for 1911:for 1884:set 1878:int 1412:Int 1248:In 1194:uni 1114:to 960:or 560:lev 515:lev 470:lev 459:min 353:lev 229:lev 188:lev 46:In 4810:: 3634:. 3599:. 3591:. 3558:. 3550:. 3536:. 3513:. 3507:MR 3505:. 3497:. 3485:18 3477:. 3432:, 3420:21 3418:, 3395:, 3360:. 3352:. 3340:33 3338:. 3334:. 3306:10 3304:. 3133:. 3011:v0 3002:v1 2996:v0 2960::= 2957:v1 2948:v0 2945::= 2933:v0 2930::= 2912:if 2903:v1 2900::= 2888:v0 2885::= 2864:to 2834:v1 2810:to 2789:v0 2780:to 2756:v1 2747:v0 2624:y 2619:4 2590:a 2585:5 2556:d 2551:6 2522:n 2517:6 2488:u 2483:7 2450:S 2445:8 2414:y 2411:a 2408:d 2405:r 2402:u 2399:t 2396:a 2393:S 2353:g 2325:n 2320:3 2297:i 2292:3 2269:t 2264:4 2241:t 2236:5 2213:i 2208:6 2185:s 2180:6 2155:n 2152:e 2149:t 2146:t 2143:i 2140:k 2061::= 2052::= 2037::= 2019:if 2010:to 1989:to 1971::= 1959:to 1935::= 1923:to 1899:to 1893:in 1717:. 1577:), 1493:== 1487:if 1397:Eq 1394::: 1264:. 1231:). 1141:. 1091:. 1027:, 86:. 50:, 4463:e 4456:t 4449:v 4384:) 4107:, 4076:) 4072:( 3702:e 3695:t 3688:v 3642:. 3638:: 3628:: 3609:. 3595:: 3585:: 3566:. 3554:: 3538:5 3521:. 3501:: 3457:. 3426:: 3400:. 3368:. 3356:: 3316:. 3312:: 3154:) 3152:n 3150:( 3148:O 3143:n 3131:) 3129:n 3127:( 3125:O 3118:ε 3100:, 3095:) 3088:/ 3084:1 3081:( 3078:O 3074:) 3070:n 3061:( 3044:n 2984:) 2978:, 2972:, 2966:( 2954:1 2951:+ 2939:: 2924:: 2921:t 2918:= 2915:s 2909:1 2906:+ 2894:1 2891:+ 2876:: 2873:1 2870:- 2867:n 2861:0 2855:j 2846:1 2843:+ 2840:i 2837:= 2822:: 2819:1 2816:- 2813:m 2807:0 2801:i 2795:i 2792:= 2786:: 2783:n 2777:0 2771:i 2735:: 2732:) 2729:t 2723:, 2720:s 2714:( 2687:d 2680:t 2674:s 2652:3 2648:4 2645:5 2642:5 2639:5 2636:4 2633:4 2630:5 2627:6 2615:3 2611:4 2608:4 2605:4 2602:4 2599:3 2596:4 2593:5 2582:4 2578:3 2574:4 2571:3 2568:3 2565:3 2562:3 2559:4 2548:5 2545:4 2541:3 2537:3 2534:2 2531:2 2528:2 2525:3 2514:5 2511:4 2508:3 2504:2 2500:2 2497:1 2494:1 2491:2 2480:6 2477:5 2474:4 2471:3 2467:2 2462:1 2457:0 2453:1 2442:7 2439:6 2436:5 2433:4 2430:3 2427:2 2424:1 2421:0 2375:3 2371:4 2368:4 2365:5 2362:6 2359:7 2356:7 2347:2 2343:3 2340:3 2337:4 2334:5 2331:6 2328:6 2316:2 2312:2 2309:3 2306:4 2303:5 2300:5 2289:2 2285:1 2281:2 2278:3 2275:4 2272:4 2261:3 2258:2 2254:1 2250:2 2247:3 2244:3 2233:4 2230:3 2227:2 2223:1 2219:2 2216:2 2205:5 2202:4 2199:3 2196:2 2192:1 2188:1 2177:5 2174:4 2171:3 2168:2 2165:1 2162:0 2118:d 2109:) 2103:+ 2100:d 2094:, 2091:1 2088:+ 2085:d 2079:, 2076:1 2073:+ 2070:d 2067:( 2058:d 2055:1 2046:: 2040:0 2031:: 2028:t 2025:= 2022:s 2016:: 2013:m 2007:1 2001:i 1995:: 1992:n 1986:1 1980:j 1974:j 1968:d 1965:: 1962:n 1956:1 1950:j 1938:i 1932:d 1929:: 1926:m 1920:1 1914:i 1896:d 1881:d 1866:: 1863:) 1860:t 1854:, 1851:s 1845:( 1832:n 1828:t 1824:m 1820:s 1776:j 1772:i 1758:] 1755:j 1752:, 1749:i 1746:[ 1743:m 1733:m 1715:t 1711:s 1707:t 1693:j 1683:s 1669:i 1649:] 1646:j 1643:[ 1640:] 1637:i 1634:[ 1631:M 1611:M 1595:] 1571:: 1568:b 1565:( 1553:, 1547:) 1541:: 1538:a 1535:( 1529:[ 1520:+ 1517:1 1496:b 1490:a 1484:= 1481:) 1475:: 1472:b 1469:( 1466:) 1460:: 1457:a 1454:( 1445:s 1439:= 1436:s 1427:t 1421:= 1418:t 1400:a 1384:t 1380:s 1342:. 1196:n 1184:g 1177:i 1173:e 1166:s 1162:k 1122:b 1102:a 1079:] 1076:0 1073:[ 1070:x 1067:= 1062:0 1058:x 1054:= 1051:) 1048:x 1045:( 1015:x 995:n 973:n 969:x 948:] 945:n 942:[ 939:x 917:0 913:x 909:= 906:) 901:n 897:x 888:1 884:x 878:0 874:x 870:( 844:x 824:) 821:x 818:( 790:n 786:x 777:2 773:x 767:1 763:x 759:= 756:) 751:n 747:x 738:1 734:x 728:0 724:x 720:( 694:x 674:x 608:) 603:) 600:b 597:( 588:, 585:) 582:a 579:( 568:( 551:) 546:) 543:b 540:( 531:, 528:a 523:( 506:) 501:b 498:, 495:) 492:a 489:( 478:( 464:{ 456:+ 453:1 446:, 443:) 440:b 437:( 428:= 425:) 422:a 419:( 401:) 396:) 393:b 390:( 381:, 378:) 375:a 372:( 361:( 346:, 343:0 340:= 336:| 332:a 328:| 316:| 312:b 308:| 300:, 297:0 294:= 290:| 286:b 282:| 270:| 266:a 262:| 255:{ 250:= 247:) 244:b 241:, 238:a 235:( 206:) 203:b 200:, 197:a 194:( 167:| 163:b 159:| 137:| 133:a 129:| 108:b 105:, 102:a

Index


information theory
linguistics
computer science
string metric
Vladimir Levenshtein
edit distance
pairwise string alignments
counting from 0
the naive recursive implementation

Hamming distance
triangle inequality
Hamming distance
approximate string matching
spell checkers
optical character recognition
translation memory
fuzzy string searching
record linkage
linguistic distance
mutual intelligibility
Edit distance
edit distance
Damerau–Levenshtein distance
transposition
longest common subsequence
Hamming distance
Jaro distance
transposition

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.