Knowledge

Grammar-based code

Source đź“ť

1775: 1765: 31: 237:, which constructs a grammar that may be reducible, i.e., contain repeats, where the entropy-coding cost of "spelling out" the repeats is less than the cost creating and entropy-coding a rule to capture them. (In general, the compression-optimal SLG is not irreducible, and the Smallest Grammar Problem is different from the actual SLG compression problem.) 116: 231:
is a greedy algorithm using the strategy of most-frequent-first substitution. The compressive performance is powerful, although the main memory space requirement is very large.
495:
Conrad, Kennon J.; Wilson, Paul R. (2016). "Grammatical Ziv-Lempel Compression: Achieving PPM-Class Text Compression Ratios with LZ-Class Decompression Speed".
225:
is a classical grammar compression algorithm that sequentially translates an input text into a CFG, and then the produced CFG is encoded by an arithmetic coder.
180: 156: 136: 202:, and many other new universal lossless compression algorithms. Grammar-based codes are universal in the sense that they can achieve asymptotically the 162:) is known to be NP-hard, so many grammar-transform algorithms are proposed from theoretical and practical viewpoints. Generally, the produced grammar 599: 1267: 1078: 967: 38: 1473: 1296: 1090: 781: 302:
Charikar, M.; Lehman, E.; Liu, D.; Panigrahy, R.; Prabharakan, M.; Sahai, A.; Shelat, A. (2005), "The Smallest Grammar Problem",
1478: 1055: 512: 1208: 410:
Nevill-Manning, C. G.; Witten, I. H. (1997), "Identifying Hierarchical Structure in Sequences: A linear-time algorithm",
17: 1585: 1323: 1262: 1073: 1023: 846: 706: 691: 592: 1698: 1708: 1546: 1397: 1316: 1110: 1681: 1301: 1095: 883: 1778: 814: 1443: 1768: 1671: 1213: 771: 585: 276:
Kieffer, J. C.; Yang, E.-H. (2000), "Grammar-based codes: A new class of universal lossless source codes",
1814: 761: 756: 75: 1703: 1630: 1468: 1448: 1392: 1050: 841: 644: 551: 1804: 1713: 1654: 1580: 1428: 1018: 1013: 868: 711: 69: 1718: 1291: 1085: 786: 159: 1659: 1030: 917: 873: 686: 669: 659: 1809: 1284: 1035: 819: 664: 1556: 1688: 257: 34: 543: 374:
Ziv, J.; Lempel, A. (1978), "Compression of individual sequences via variable rate coding",
1372: 834: 796: 617: 65: 8: 1603: 1494: 1453: 1438: 1407: 1402: 1311: 1218: 1151: 1120: 1105: 888: 565: 198:, the multilevel pattern matching (MPM) algorithm, variations of the incremental parsing 1676: 1646: 1625: 1531: 1463: 1357: 1045: 861: 746: 726: 721: 518: 445: 419: 357: 319: 222: 165: 141: 121: 42: 1257: 1620: 1608: 1590: 1458: 1342: 1279: 1125: 1040: 996: 957: 639: 508: 252: 183: 1595: 1551: 1524: 1519: 1377: 1362: 1272: 1181: 1176: 1005: 738: 716: 608: 522: 500: 478: 463: 449: 437: 429: 391: 383: 361: 349: 323: 311: 285: 247: 61: 1514: 1328: 1252: 1233: 1203: 1171: 1137: 696: 634: 571: 555: 1306: 1100: 829: 824: 681: 654: 626: 396: 1798: 1613: 1561: 1228: 1223: 1198: 1130: 751: 649: 387: 218:
The compression programs of the following are available from external links.
199: 337: 315: 1734: 701: 676: 577: 560: 203: 1693: 1571: 1367: 1243: 1193: 504: 1750: 1541: 1536: 1423: 1382: 1188: 195: 538: 441: 353: 289: 482: 433: 574:- implementation of Sequitur, Re-Pair, and parallel Re-Pair in Java. 1664: 1509: 1166: 424: 158:. The problem of finding a smallest grammar for an input sequence ( 68:(CFG) for the string to be compressed. Examples include universal 1433: 907: 856: 228: 207: 338:"Universal lossless compression via multilevel pattern matching" 947: 1782: 1387: 980: 927: 336:
Kieffer, J. C.; Yang, E.-H.; Nelson, G.; Cosman, P. (2000),
194:
The class of grammar-based codes is very broad. It includes
937: 791: 776: 766: 234: 46: 301: 30: 912: 878: 548: 335: 168: 144: 124: 78: 37:(with start symbol Ăź) for the second sentence of the 409: 182:is further compressed by statistical encoders like 174: 150: 130: 110: 1796: 544:Description of grammar-based codes with example 189: 64:algorithms based on the idea of constructing a 593: 461: 607: 494: 412:Journal of Artificial Intelligence Research 275: 600: 586: 423: 395: 373: 39:United States Declaration of Independence 72:algorithms. To compress a data sequence 29: 213: 14: 1797: 497:2016 Data Compression Conference (DCC) 464:"Offline Dictionary-Based Compression" 581: 462:Larsson, N. J.; Moffat, A. (2000), 111:{\displaystyle x=x_{1}\cdots x_{n}} 27:Lossless data compression algorithm 24: 118:, a grammar-based code transforms 25: 1826: 532: 1774: 1773: 1764: 1763: 41:. Each blue character denotes a 210:source with a finite alphabet. 488: 455: 403: 367: 329: 295: 269: 13: 1: 568:a version of Gonzalo Navarro. 263: 49:-compression of the sentence. 190:Examples and characteristics 138:into a context-free grammar 45:; they were obtained from a 7: 241: 10: 1831: 1655:Compressed data structures 977:RLE + BWT + MTF + Huffman 645:Asymmetric numeral systems 1759: 1743: 1727: 1645: 1570: 1502: 1493: 1416: 1350: 1341: 1242: 1159: 1150: 1066: 1014:Discrete cosine transform 1004: 995: 944:LZ77 + Huffman + context 897: 807: 737: 625: 616: 539:GLZA discussion and paper 70:lossless data compression 58:Grammar-based compression 1719:Smallest grammar problem 388:10.1109/TIT.1978.1055934 160:smallest grammar problem 1660:Compressed suffix array 1209:Nyquist–Shannon theorem 471:Proceedings of the IEEE 376:IEEE Trans. Inf. Theory 342:IEEE Trans. Inf. Theory 316:10.1109/tit.2005.850116 304:IEEE Trans. Inf. Theory 278:IEEE Trans. Inf. Theory 176: 152: 132: 112: 50: 1689:Kolmogorov complexity 1557:Video characteristics 934:LZ77 + Huffman + ANS 258:Straight-line grammar 177: 153: 133: 113: 35:Straight-line grammar 33: 1779:Compression software 1373:Compression artifact 1329:Psychoacoustic model 505:10.1109/DCC.2016.119 214:Practical algorithms 166: 142: 122: 76: 66:context-free grammar 1769:Compression formats 1408:Texture compression 1403:Standard test image 1219:Silence compression 206:of any stationary, 54:Grammar-based codes 18:Grammar-based codes 1815:Information theory 1677:Information theory 1532:Display resolution 1358:Chroma subsampling 747:Byte pair encoding 692:Shannon–Fano–Elias 554:2008-10-13 at the 397:10338.dmlcz/142945 172: 148: 128: 108: 51: 43:nonterminal symbol 1792: 1791: 1641: 1640: 1591:Deblocking filter 1489: 1488: 1337: 1336: 1146: 1145: 991: 990: 514:978-1-5090-1853-6 477:(11): 1722–1732, 354:10.1109/18.850665 290:10.1109/18.841160 253:Grammar induction 184:arithmetic coding 175:{\displaystyle G} 151:{\displaystyle G} 131:{\displaystyle x} 16:(Redirected from 1822: 1805:Data compression 1777: 1776: 1767: 1766: 1596:Lapped transform 1500: 1499: 1378:Image resolution 1363:Coding tree unit 1348: 1347: 1157: 1156: 1002: 1001: 623: 622: 609:Data compression 602: 595: 588: 579: 578: 527: 526: 492: 486: 485: 483:10.1109/5.892708 468: 459: 453: 452: 434:10.1613/jair.374 427: 407: 401: 400: 399: 371: 365: 364: 348:(4): 1227–1245, 333: 327: 326: 310:(7): 2554–2576, 299: 293: 292: 273: 248:Dictionary coder 181: 179: 178: 173: 157: 155: 154: 149: 137: 135: 134: 129: 117: 115: 114: 109: 107: 106: 94: 93: 21: 1830: 1829: 1825: 1824: 1823: 1821: 1820: 1819: 1795: 1794: 1793: 1788: 1755: 1739: 1723: 1704:Rate–distortion 1637: 1566: 1485: 1412: 1333: 1238: 1234:Sub-band coding 1142: 1067:Predictive type 1062: 987: 954:LZSS + Huffman 904:LZ77 + Huffman 893: 803: 739:Dictionary type 733: 635:Adaptive coding 612: 606: 556:Wayback Machine 535: 530: 515: 499:. p. 586. 493: 489: 466: 460: 456: 408: 404: 372: 368: 334: 330: 300: 296: 274: 270: 266: 244: 216: 200:Lempel-Ziv code 192: 167: 164: 163: 143: 140: 139: 123: 120: 119: 102: 98: 89: 85: 77: 74: 73: 28: 23: 22: 15: 12: 11: 5: 1828: 1818: 1817: 1812: 1807: 1790: 1789: 1787: 1786: 1771: 1760: 1757: 1756: 1754: 1753: 1747: 1745: 1741: 1740: 1738: 1737: 1731: 1729: 1725: 1724: 1722: 1721: 1716: 1711: 1706: 1701: 1696: 1691: 1686: 1685: 1684: 1674: 1669: 1668: 1667: 1662: 1651: 1649: 1643: 1642: 1639: 1638: 1636: 1635: 1634: 1633: 1628: 1618: 1617: 1616: 1611: 1606: 1598: 1593: 1588: 1583: 1577: 1575: 1568: 1567: 1565: 1564: 1559: 1554: 1549: 1544: 1539: 1534: 1529: 1528: 1527: 1522: 1517: 1506: 1504: 1497: 1491: 1490: 1487: 1486: 1484: 1483: 1482: 1481: 1476: 1471: 1466: 1456: 1451: 1446: 1441: 1436: 1431: 1426: 1420: 1418: 1414: 1413: 1411: 1410: 1405: 1400: 1395: 1390: 1385: 1380: 1375: 1370: 1365: 1360: 1354: 1352: 1345: 1339: 1338: 1335: 1334: 1332: 1331: 1326: 1321: 1320: 1319: 1314: 1309: 1304: 1299: 1289: 1288: 1287: 1277: 1276: 1275: 1270: 1260: 1255: 1249: 1247: 1240: 1239: 1237: 1236: 1231: 1226: 1221: 1216: 1211: 1206: 1201: 1196: 1191: 1186: 1185: 1184: 1179: 1174: 1163: 1161: 1154: 1148: 1147: 1144: 1143: 1141: 1140: 1138:Psychoacoustic 1135: 1134: 1133: 1128: 1123: 1115: 1114: 1113: 1108: 1103: 1098: 1093: 1083: 1082: 1081: 1070: 1068: 1064: 1063: 1061: 1060: 1059: 1058: 1053: 1048: 1038: 1033: 1028: 1027: 1026: 1021: 1010: 1008: 1006:Transform type 999: 993: 992: 989: 988: 986: 985: 984: 983: 975: 974: 973: 970: 962: 961: 960: 952: 951: 950: 942: 941: 940: 932: 931: 930: 922: 921: 920: 915: 910: 901: 899: 895: 894: 892: 891: 886: 881: 876: 871: 866: 865: 864: 859: 849: 844: 839: 838: 837: 827: 822: 817: 811: 809: 805: 804: 802: 801: 800: 799: 794: 789: 784: 779: 774: 769: 764: 759: 749: 743: 741: 735: 734: 732: 731: 730: 729: 724: 719: 714: 704: 699: 694: 689: 684: 679: 674: 673: 672: 667: 662: 652: 647: 642: 637: 631: 629: 620: 614: 613: 605: 604: 597: 590: 582: 576: 575: 572:GrammarViz 2.0 569: 563: 558: 549:Sequitur codes 546: 541: 534: 533:External links 531: 529: 528: 513: 487: 454: 402: 382:(5): 530–536, 366: 328: 294: 284:(3): 737–754, 267: 265: 262: 261: 260: 255: 250: 243: 240: 239: 238: 232: 226: 215: 212: 191: 188: 171: 147: 127: 105: 101: 97: 92: 88: 84: 81: 26: 9: 6: 4: 3: 2: 1827: 1816: 1813: 1811: 1810:Coding theory 1808: 1806: 1803: 1802: 1800: 1784: 1780: 1772: 1770: 1762: 1761: 1758: 1752: 1749: 1748: 1746: 1742: 1736: 1733: 1732: 1730: 1726: 1720: 1717: 1715: 1712: 1710: 1707: 1705: 1702: 1700: 1697: 1695: 1692: 1690: 1687: 1683: 1680: 1679: 1678: 1675: 1673: 1670: 1666: 1663: 1661: 1658: 1657: 1656: 1653: 1652: 1650: 1648: 1644: 1632: 1629: 1627: 1624: 1623: 1622: 1619: 1615: 1612: 1610: 1607: 1605: 1602: 1601: 1599: 1597: 1594: 1592: 1589: 1587: 1584: 1582: 1579: 1578: 1576: 1573: 1569: 1563: 1562:Video quality 1560: 1558: 1555: 1553: 1550: 1548: 1545: 1543: 1540: 1538: 1535: 1533: 1530: 1526: 1523: 1521: 1518: 1516: 1513: 1512: 1511: 1508: 1507: 1505: 1501: 1498: 1496: 1492: 1480: 1477: 1475: 1472: 1470: 1467: 1465: 1462: 1461: 1460: 1457: 1455: 1452: 1450: 1447: 1445: 1442: 1440: 1437: 1435: 1432: 1430: 1427: 1425: 1422: 1421: 1419: 1415: 1409: 1406: 1404: 1401: 1399: 1396: 1394: 1391: 1389: 1386: 1384: 1381: 1379: 1376: 1374: 1371: 1369: 1366: 1364: 1361: 1359: 1356: 1355: 1353: 1349: 1346: 1344: 1340: 1330: 1327: 1325: 1322: 1318: 1315: 1313: 1310: 1308: 1305: 1303: 1300: 1298: 1295: 1294: 1293: 1290: 1286: 1283: 1282: 1281: 1278: 1274: 1271: 1269: 1266: 1265: 1264: 1261: 1259: 1256: 1254: 1251: 1250: 1248: 1245: 1241: 1235: 1232: 1230: 1229:Speech coding 1227: 1225: 1224:Sound quality 1222: 1220: 1217: 1215: 1212: 1210: 1207: 1205: 1202: 1200: 1199:Dynamic range 1197: 1195: 1192: 1190: 1187: 1183: 1180: 1178: 1175: 1173: 1170: 1169: 1168: 1165: 1164: 1162: 1158: 1155: 1153: 1149: 1139: 1136: 1132: 1129: 1127: 1124: 1122: 1119: 1118: 1116: 1112: 1109: 1107: 1104: 1102: 1099: 1097: 1094: 1092: 1089: 1088: 1087: 1084: 1080: 1077: 1076: 1075: 1072: 1071: 1069: 1065: 1057: 1054: 1052: 1049: 1047: 1044: 1043: 1042: 1039: 1037: 1034: 1032: 1029: 1025: 1022: 1020: 1017: 1016: 1015: 1012: 1011: 1009: 1007: 1003: 1000: 998: 994: 982: 979: 978: 976: 971: 969: 966: 965: 964:LZ77 + Range 963: 959: 956: 955: 953: 949: 946: 945: 943: 939: 936: 935: 933: 929: 926: 925: 923: 919: 916: 914: 911: 909: 906: 905: 903: 902: 900: 896: 890: 887: 885: 882: 880: 877: 875: 872: 870: 867: 863: 860: 858: 855: 854: 853: 850: 848: 845: 843: 840: 836: 833: 832: 831: 828: 826: 823: 821: 818: 816: 813: 812: 810: 806: 798: 795: 793: 790: 788: 785: 783: 780: 778: 775: 773: 770: 768: 765: 763: 760: 758: 755: 754: 753: 750: 748: 745: 744: 742: 740: 736: 728: 725: 723: 720: 718: 715: 713: 710: 709: 708: 705: 703: 700: 698: 695: 693: 690: 688: 685: 683: 680: 678: 675: 671: 668: 666: 663: 661: 658: 657: 656: 653: 651: 648: 646: 643: 641: 638: 636: 633: 632: 630: 628: 624: 621: 619: 615: 610: 603: 598: 596: 591: 589: 584: 583: 580: 573: 570: 567: 566:Re-Pair codes 564: 562: 561:Re-Pair codes 559: 557: 553: 550: 547: 545: 542: 540: 537: 536: 524: 520: 516: 510: 506: 502: 498: 491: 484: 480: 476: 472: 465: 458: 451: 447: 443: 439: 435: 431: 426: 421: 417: 413: 406: 398: 393: 389: 385: 381: 377: 370: 363: 359: 355: 351: 347: 343: 339: 332: 325: 321: 317: 313: 309: 305: 298: 291: 287: 283: 279: 272: 268: 259: 256: 254: 251: 249: 246: 245: 236: 233: 230: 227: 224: 221: 220: 219: 211: 209: 205: 201: 197: 187: 185: 169: 161: 145: 125: 103: 99: 95: 90: 86: 82: 79: 71: 67: 63: 59: 55: 48: 44: 40: 36: 32: 19: 1735:Hutter Prize 1699:Quantization 1604:Compensation 1398:Quantization 1121:Compensation 851: 687:Shannon–Fano 627:Entropy type 496: 490: 474: 470: 457: 418:(4): 67–82, 415: 411: 405: 379: 375: 369: 345: 341: 331: 307: 303: 297: 281: 277: 271: 217: 204:entropy rate 193: 57: 53: 52: 1694:Prefix code 1547:Frame types 1368:Color space 1194:Convolution 924:LZ77 + ANS 835:Incremental 808:Other types 727:Levenshtein 196:block codes 62:compression 1799:Categories 1751:Mark Adler 1709:Redundancy 1626:Daubechies 1609:Estimation 1542:Frame rate 1464:Daubechies 1424:Chain code 1383:Macroblock 1189:Companding 1126:Estimation 1046:Daubechies 752:Lempel–Ziv 712:Exp-Golomb 640:Arithmetic 442:10289/1186 425:cs/9709102 264:References 1728:Community 1552:Interlace 938:Zstandard 717:Fibonacci 707:Universal 665:Canonical 96:⋯ 1714:Symmetry 1682:Timeline 1665:FM-index 1510:Bit rate 1503:Concepts 1351:Concepts 1214:Sampling 1167:Bit rate 1160:Concepts 862:Sequitur 697:Tunstall 670:Modified 660:Adaptive 618:Lossless 552:Archived 242:See also 223:Sequitur 1672:Entropy 1621:Wavelet 1600:Motion 1459:Wavelet 1439:Fractal 1434:Deflate 1417:Methods 1204:Latency 1117:Motion 1041:Wavelet 958:LHA/LZH 908:Deflate 857:Re-Pair 852:Grammar 682:Shannon 655:Huffman 611:methods 523:3116024 450:2957960 362:8191526 324:6900082 229:Re-Pair 208:ergodic 1783:codecs 1744:People 1647:Theory 1614:Vector 1131:Vector 948:Brotli 898:Hybrid 797:Snappy 650:Golomb 521:  511:  448:  360:  322:  1574:parts 1572:Codec 1537:Frame 1495:Video 1479:SPIHT 1388:Pixel 1343:Image 1297:ACELP 1268:ADPCM 1258:ÎĽ-law 1253:A-law 1246:parts 1244:Codec 1152:Audio 1091:ACELP 1079:ADPCM 1056:SPIHT 997:Lossy 981:bzip2 972:LZHAM 928:LZFSE 830:Delta 722:Gamma 702:Unary 677:Range 519:S2CID 467:(PDF) 446:S2CID 420:arXiv 358:S2CID 320:S2CID 1586:DPCM 1393:PSNR 1324:MDCT 1317:WLPC 1302:CELP 1263:DPCM 1111:WLPC 1096:CELP 1074:DPCM 1024:MDCT 968:LZMA 869:LDCT 847:DPCM 792:LZWL 782:LZSS 777:LZRW 767:LZJB 509:ISBN 235:GLZA 60:are 47:gzip 1631:DWT 1581:DCT 1525:VBR 1520:CBR 1515:ABR 1474:EZW 1469:DWT 1454:RLE 1444:KLT 1429:DCT 1312:LSP 1307:LAR 1292:LPC 1285:FFT 1182:VBR 1177:CBR 1172:ABR 1106:LSP 1101:LAR 1086:LPC 1051:DWT 1036:FFT 1031:DST 1019:DCT 918:LZS 913:LZX 889:RLE 884:PPM 879:PAQ 874:MTF 842:DMC 820:CTW 815:BWT 787:LZW 772:LZO 762:LZ4 757:842 501:doi 479:doi 438:hdl 430:doi 392:hdl 384:doi 350:doi 312:doi 286:doi 56:or 1801:: 1449:LP 1280:FT 1273:DM 825:CM 517:. 507:. 475:88 473:, 469:, 444:, 436:, 428:, 414:, 390:, 380:24 378:, 356:, 346:46 344:, 340:, 318:, 308:51 306:, 282:46 280:, 186:. 1785:) 1781:( 601:e 594:t 587:v 525:. 503:: 481:: 440:: 432:: 422:: 416:7 394:: 386:: 352:: 314:: 288:: 170:G 146:G 126:x 104:n 100:x 91:1 87:x 83:= 80:x 20:)

Index

Grammar-based codes

Straight-line grammar
United States Declaration of Independence
nonterminal symbol
gzip
compression
context-free grammar
lossless data compression
smallest grammar problem
arithmetic coding
block codes
Lempel-Ziv code
entropy rate
ergodic
Sequitur
Re-Pair
GLZA
Dictionary coder
Grammar induction
Straight-line grammar
doi
10.1109/18.841160
doi
10.1109/tit.2005.850116
S2CID
6900082
"Universal lossless compression via multilevel pattern matching"
doi
10.1109/18.850665

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.

↑