Statistical machine translation

Statistical machine translation (SMT) was a machine translation approach that superseded the previous, rule-based approach, because the rule-based approach required explicit description of each and every linguistic rule, which was costly and often did not generalize to other languages. Since 2003, the statistical approach has itself been gradually superseded by the deep-learning-based approach of neural machine translation.

The first ideas of statistical machine translation were introduced by Warren Weaver in 1949, including the ideas of applying Claude Shannon's information theory. Statistical machine translation was re-introduced in the late 1980s and early 1990s by researchers at IBM's Thomas J. Watson Research Center.

Basis

The idea behind statistical machine translation comes from information theory. A document is translated according to the probability distribution p(e|f) that a string e in the target language (for example, English) is the translation of a string f in the source language (for example, French).

The problem of modeling the probability distribution p(e|f) has been approached in a number of ways. One approach which lends itself well to computer implementation is to apply Bayes' theorem, that is

    p(e|f) ∝ p(f|e) p(e),

where the translation model p(f|e) is the probability that the source string is the translation of the target string, and the language model p(e) is the probability of seeing that target-language string. This decomposition is attractive, as it splits the problem into two subproblems. Finding the best translation ẽ is done by picking the one that gives the highest probability:

    ẽ = argmax_{e ∈ e*} p(e|f) = argmax_{e ∈ e*} p(f|e) p(e).

For a rigorous implementation of this, one would have to perform an exhaustive search by going through all strings e* in the native language. Performing the search efficiently is the work of a machine translation decoder that uses the foreign string, heuristics, and other methods to limit the search space while keeping acceptable quality. This trade-off between quality and time usage can also be found in speech recognition.
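
To make the decomposition concrete, here is a minimal sketch of noisy-channel scoring in Python. The TRANSLATION and LM tables, the French input and all probabilities are invented for illustration; a real system would estimate the translation model from a parallel corpus and the language model from monolingual data.

    import itertools
    import math

    # Toy stand-ins for the two sub-models of the decomposition. Every
    # entry is invented for illustration.
    TRANSLATION = {                       # p(f | e) for single words
        ("il", "he"): 0.9, ("il", "it"): 0.6,
        ("parle", "speaks"): 0.8, ("parle", "talks"): 0.7,
    }
    LM = {                                # bigram p(word | previous word)
        ("<s>", "he"): 0.3, ("<s>", "it"): 0.2,
        ("he", "speaks"): 0.1, ("he", "talks"): 0.05,
        ("it", "speaks"): 0.01, ("it", "talks"): 0.01,
    }

    def score(french, english):
        """log p(f|e) + log p(e) under the toy word-for-word models."""
        logp = 0.0
        for f, e in zip(french, english):                # translation model
            logp += math.log(TRANSLATION.get((f, e), 1e-9))
        for prev, e in zip(["<s>"] + english, english):  # language model
            logp += math.log(LM.get((prev, e), 1e-9))
        return logp

    french = ["il", "parle"]
    options = [["he", "it"], ["speaks", "talks"]]        # candidates per word
    candidates = [list(c) for c in itertools.product(*options)]
    best = max(candidates, key=lambda e: score(french, e))
    print(best)                                          # ['he', 'speaks']

Even in this toy setting the two sub-models play their intended roles: the translation table keeps candidates faithful to the source, while the language model prefers fluent target strings.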

As the translation systems were not able to store all native strings and their translations, a document was typically translated sentence by sentence, but even this was not enough. Language models were typically approximated by smoothed n-gram models, and similar approaches have been applied to translation models, but there was additional complexity due to the different sentence lengths and word orders of the languages.
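
A minimal sketch of such a smoothed n-gram language model, assuming a tiny invented corpus and the simplest scheme (add-one smoothing); production systems trained on far larger corpora with stronger smoothing:

    from collections import Counter

    # Invented toy corpus; a real language model was trained on very
    # large monolingual corpora.
    corpus = [["the", "house", "is", "small"],
              ["the", "house", "is", "big"],
              ["a", "house", "is", "a", "home"]]

    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1

    def p_bigram(prev, word):
        # Add-one smoothing: unseen bigrams keep a small non-zero probability.
        return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

    def p_sentence(sent):
        tokens = ["<s>"] + sent + ["</s>"]
        p = 1.0
        for prev, word in zip(tokens, tokens[1:]):
            p *= p_bigram(prev, word)
        return p

    # A fluent word order scores higher than a scrambled one:
    print(p_sentence(["the", "house", "is", "small"]))
    print(p_sentence(["small", "the", "is", "house"]))

This ranking of fluent over scrambled orderings is what the decoder leans on when choosing among candidate translations.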

The statistical translation models were initially word-based (Models 1-5 from IBM, the Hidden Markov model from Stephan Vogel, and Model 6 from Franz Josef Och), but significant advances were made with the introduction of phrase-based models. Later work incorporated syntax or quasi-syntactic structures.

Benefits

The most frequently cited benefits of statistical machine translation over the rule-based approach were:

More efficient use of human and data resources. There were many parallel corpora in machine-readable format and even more monolingual data, and, generally, SMT systems were not tailored to any specific pair of languages.
More fluent translations, owing to the use of a language model.

Shortcomings

Corpus creation can be costly.
Specific errors are hard to predict and fix.
Results may have superficial fluency that masks translation problems.
Statistical machine translation usually works less well for language pairs with significantly different word order.
The benefits obtained for translation between Western European languages are not representative of results for other language pairs, owing to smaller training corpora and greater grammatical differences.

Phrase-based translation

In phrase-based translation, the aim was to reduce the restrictions of word-based translation by translating whole sequences of words, where the lengths may differ. The sequences of words were called blocks or phrases; however, typically they were not linguistic phrases but phrasemes, found using statistical methods from corpora. It has been shown that restricting the phrases to linguistic phrases (syntactically motivated groups of words; see syntactic categories) decreased the quality of translation.

The chosen phrases were further mapped one-to-one based on a phrase translation table, and could be reordered. This table could be learnt based on word alignment, or directly from a parallel corpus. The second model was trained using the expectation-maximization algorithm, similarly to the word-based IBM model.
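
A minimal sketch of the phrase-table idea, assuming an invented table and the simplest possible decoding (greedy, monotone, no reordering); real systems learned millions of weighted phrase pairs and searched over segmentations and reorderings:

    # Invented phrase table: source phrase -> (target phrase, probability).
    PHRASE_TABLE = {
        ("il",): ("he", 0.9),
        ("parle",): ("speaks", 0.8),
        ("ne", "parle", "pas"): ("does not speak", 0.6),
    }

    def translate(source):
        """Greedy left-to-right decoding: always take the longest known
        source phrase starting at the current position, no reordering."""
        out, score, i = [], 1.0, 0
        while i < len(source):
            for length in range(len(source) - i, 0, -1):  # longest match first
                phrase = tuple(source[i:i + length])
                if phrase in PHRASE_TABLE:
                    target, p = PHRASE_TABLE[phrase]
                    out.append(target)
                    score *= p
                    i += length
                    break
            else:
                out.append(source[i])  # pass unknown words through unchanged
                i += 1
        return " ".join(out), score

    translation, p = translate(["il", "ne", "parle", "pas"])
    print(translation, round(p, 2))    # he does not speak 0.54

The three-word phrase "ne parle pas" is translated as a unit, which is exactly the many-to-many mapping that word-based translation could not express.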
276: 163: 83: 2106: 1782: 1758: 1611: 912: 882: 791: 690: 678: 666: 532: 310:
is the probability that the source string is the translation of the target string, and the
786:
Sentence alignment is usually either provided by the corpus or obtained by aforementioned
316: 8: 2086: 2016: 1973: 1929: 1701: 1691: 1686: 1574: 1330: 587: 27: 518:{\displaystyle {\tilde {e}}=arg\max _{e\in e^{*}}p(e|f)=arg\max _{e\in e^{*}}p(f|e)p(e)} 2096: 1968: 1833: 1596: 1579: 1437: 855: 561: 140: 120: 74: 55: 1169: 2101: 1813: 1621: 1532: 1271: 1247: 1217: 1012: 969: 1298: 1978: 1863: 1838: 1639: 1542: 1292:— Includes links to freely available statistical machine translation software 1194: 1151: 1131: 1111: 1016: 902: 614: 1239: 2090: 2051: 2046: 1914: 1644: 1517: 1492: 1474: 1289: 928: 892: 1006: 963: 1798: 1778: 1502: 1116: 1099: 1053: 311: 51: 1387: 557: 2152: 2061: 1873: 1853: 1634: 198: 137:
in the target language (for example, English) is the translation of a string
47: 32: 2041: 1659: 1055: 977: 620:
Generally, SMT systems were not tailored to any specific pair of languages.
556:
in the native language. Performing the search efficiently is the work of a

Hierarchical phrase-based translation

Hierarchical phrase-based translation combined the phrase-based and syntax-based approaches to translation. It used synchronous context-free grammar rules, but the grammars could be constructed by an extension of methods for phrase-based translation, without reference to linguistically motivated syntactic constituents. This idea was first introduced in Chiang's Hiero system (2005).
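
A minimal sketch of what a synchronous rule with a gap buys. The single rule below, X -> < ne X1 pas , does not X1 >, and the tiny lexicon are invented; a Hiero-style system extracted millions of such rules automatically from word-aligned bitext:

    # One invented Hiero-style synchronous rule for French negation,
    #     X -> < ne X1 pas , does not X1 >
    # plus terminal (word-by-word) rules.
    LEXICAL = {"il": "he", "parle": "speak"}

    def translate(tokens):
        # Try the gapped rule: "ne" ... "pas" with sub-span X1 in between.
        if "ne" in tokens and "pas" in tokens:
            i, j = tokens.index("ne"), tokens.index("pas")
            gap = translate(tokens[i + 1:j])     # translate X1 recursively
            before = translate(tokens[:i])       # material left of "ne"
            return (before + " " if before else "") + "does not " + gap
        # Fall back to the terminal rules.
        return " ".join(LEXICAL.get(t, t) for t in tokens)

    print(translate(["il", "ne", "parle", "pas"]))   # he does not speak

The gap lets one rule handle the discontinuous "ne ... pas" pattern for any inner material, which a flat phrase table can only cover pair by pair.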

Challenges with statistical machine translation

Problems that statistical machine translation did not solve included:

Sentence alignment

In parallel corpora, single sentences in one language can be found translated into several sentences in the other, and vice versa: long sentences may be broken up and short sentences may be merged. Some languages even use writing systems without clear indication of a sentence end (for example, Thai). Sentence aligning can be performed through the Gale-Church alignment algorithm. Through this and other mathematical models, efficient search and retrieval of the highest-scoring sentence alignment is possible.
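
A minimal sketch of length-based sentence alignment in the spirit of Gale-Church, assuming a crude length-ratio cost rather than their actual probabilistic model, and only 1-1, 1-2 and 2-1 beads:

    def cost(src_len, tgt_len):
        # Crude penalty for mismatched character lengths; Gale and Church
        # used a probabilistic model of the length ratio instead.
        return abs(src_len - tgt_len) / (src_len + tgt_len + 1)

    def align(src, tgt):
        n, m = len(src), len(tgt)
        INF = float("inf")
        best = [[INF] * (m + 1) for _ in range(n + 1)]
        back = [[None] * (m + 1) for _ in range(n + 1)]
        best[0][0] = 0.0
        for i in range(n + 1):
            for j in range(m + 1):
                if best[i][j] == INF:
                    continue
                for di, dj in ((1, 1), (1, 2), (2, 1)):   # bead types
                    if i + di <= n and j + dj <= m:
                        c = best[i][j] + cost(
                            sum(len(s) for s in src[i:i + di]),
                            sum(len(t) for t in tgt[j:j + dj]))
                        if c < best[i + di][j + dj]:
                            best[i + di][j + dj] = c
                            back[i + di][j + dj] = (i, j)
        beads, i, j = [], n, m                  # trace the best path back
        while (i, j) != (0, 0):
            pi, pj = back[i][j]
            beads.append((src[pi:i], tgt[pj:j]))
            i, j = pi, pj
        return beads[::-1]

    src = ["A short one.", "A much, much longer sentence that was split."]
    tgt = ["A short one.", "A much, much longer sentence", "that was split."]
    for bead in align(src, tgt):
        print(bead)

The dynamic program pairs the long source sentence with the two target sentences it was split into, purely on the evidence of character lengths.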
658: 595: 591: 1054:
P. Brown; S. Della Pietra; V. Della Pietra; R. Mercer (1993).
689:
Syntax-based translation was based on the idea of translating
2137: 1773: 819: 580: 73:
The idea behind statistical machine translation comes from
617:
in machine-readable format and even more monolingual data.
1934: 1170:"Has AI surpassed humans at translation? Not even close!" 625:
More fluent translations owing to use of a language model
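
A minimal sketch of how such an alignment model could be learned: IBM Model 1 trained by expectation-maximization on an invented three-sentence corpus. Word-translation probabilities start uniform and sharpen as fractional co-occurrence counts accumulate:

    from collections import defaultdict

    # Invented sentence-aligned toy corpus (German-English).
    corpus = [("das haus".split(), "the house".split()),
              ("das buch".split(), "the book".split()),
              ("ein buch".split(), "a book".split())]

    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # t(f|e), uniform at start

    for _ in range(10):                           # EM iterations
        count = defaultdict(float)                # expected co-occurrences
        total = defaultdict(float)
        for fs, es in corpus:
            es = ["NULL"] + es                    # allow alignment to nothing
            for f in fs:                          # E-step: fractional counts
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    count[(f, e)] += t[(f, e)] / z
                    total[e] += t[(f, e)] / z
        for (f, e) in count:                      # M-step: re-normalize
            t[(f, e)] = count[(f, e)] / total[e]

    print(round(t[("haus", "house")], 2))   # high: learned from co-occurrence
    print(round(t[("das", "the")], 2))      # high: "das" always pairs with "the"

The same fractional-count pattern, scaled up, produced the translation tables that word-based and phrase-based systems relied on.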
584: 59: 712: 1235: 1233: 1328: 535: 380: 348: 319: 279: 207: 166: 160:

Statistical anomalies

Real-world training sets could override the intended translation with a statistically more frequent alternative. An example of such an anomaly was that "I took the train to Berlin" was mis-translated as "I took the train to Paris", due to the statistical abundance of "train to Paris" in the training set.

Idiom and register

Depending on the corpora used, idioms and differences in linguistic register might not receive a translation that accurately represents the original intent. For example, the popular Canadian Hansard bilingual corpus primarily consists of examples of parliamentary speech, in which "Hear, Hear!" is frequently associated with "Bravo!". A model built on this corpus and used to translate ordinary speech in a conversational register would therefore incorrectly translate the word "hear" as "Bravo!".

This problem is connected with word alignment: in very specific contexts an idiomatic expression could align with words that resulted in an idiomatic expression of the same meaning in the target language, but that alignment was unlikely to work in any other context. For that reason, idioms could only be subjected to phrasal alignment, as they could not be decomposed further without losing their meaning. The problem was specific to word-based translation.

Out of vocabulary (OOV) words

SMT systems typically stored different word forms as separate symbols without any relation to each other, and word forms or phrases that were not in the training data could not be translated. This might be because of a lack of training data, changes in the human domain where the system is used, or differences in morphology.

Different word orders

Word order differs between languages. Some classification can be done by naming the typical order of subject (S), verb (V) and object (O) in a sentence, and one can talk, for instance, of SVO or VSO languages. There are also additional differences in word orders, for instance where modifiers for nouns are located, or where the same words are used as a question or a statement.

In speech recognition, the speech signal and the corresponding textual representation can be mapped to each other in blocks in order. This is not always the case with the same text in two languages. For SMT, the machine translator could only manage small sequences of words, and word order had to be thought of by the program designer. Attempts at solutions have included re-ordering models, where a distribution of location changes for each item of translation is estimated from aligned bi-text. Different location changes can be ranked with the help of the language model, and the best can be selected.

See also

AppTek
Cache language model
Duolingo
Europarl corpus
Example-based machine translation
Google Translate
Hybrid machine translation
Microsoft Translator
Moses (machine translation)
Rule-based machine translation
SDL Language Weaver
Statistical parsing

Notes and references

W. Weaver (1955). "Translation" (1949). In: Machine Translation of Languages. MIT Press, Cambridge, MA.
P. Brown, J. Cocke, S. Della Pietra, V. Della Pietra, F. Jelinek, R. Mercer, P. Roossin (1988). "A statistical approach to language translation". Coling'88. Association for Computational Linguistics: 71-76.
P. Brown, J. Cocke, S. Della Pietra, V. Della Pietra, F. Jelinek, J. Lafferty, R. Mercer, P. Roossin (1990). "A statistical approach to machine translation". Computational Linguistics 16(2): 79-85.
W. J. Hutchins, H. Somers (1992). An Introduction to Machine Translation. ISBN 978-0-12-362830-5.
P. Brown, S. Della Pietra, V. Della Pietra, R. Mercer (1993). "The mathematics of statistical machine translation: parameter estimation". Computational Linguistics 19(2): 263-311.
S. Vogel, H. Ney, C. Tillmann (1996). "HMM-based Word Alignment in Statistical Translation". In COLING '96: The 16th International Conference on Computational Linguistics, pp. 836-841, Copenhagen, Denmark.
F. J. Och, H. Ney (2003). "A Systematic Comparison of Various Statistical Alignment Models". Computational Linguistics 29: 19-51. doi:10.1162/089120103321337421.
P. Koehn, F. J. Och, D. Marcu (2003). "Statistical Phrase-Based Translation". In Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL).
D. Chiang (2005). "A Hierarchical Phrase-Based Model for Statistical Machine Translation". In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05).
P. Koehn (2010). Statistical Machine Translation. Cambridge University Press. ISBN 978-0-521-87415-1.
P. Williams, R. Sennrich, M. Post, P. Koehn (2016). Syntax-based Statistical Machine Translation. Morgan & Claypool Publishers. ISBN 978-1-62705-502-4.
S. Zhou (July 25, 2018). "Has AI surpassed humans at translation? Not even close!". Skynet Today.

External links

Annotated list of statistical natural language processing resources: includes links to freely available statistical machine translation software.
