Character encodings in HTML

While Hypertext Markup Language (HTML) has been in use since 1991, HTML 4.0 from December 1997 was the first standardized version where international characters were given reasonably complete treatment. When an HTML document includes special characters outside the range of seven-bit ASCII, two goals are worth considering: the information's integrity, and universal browser display.

Specifying the document's character encoding

There are two general ways to specify which character encoding is used in the document.

First, the web server can include the character encoding or "charset" in the Hypertext Transfer Protocol (HTTP) Content-Type header, which would typically look like this:

    Content-Type: text/html; charset=utf-8

This method gives the HTTP server a convenient way to alter a document's encoding according to content negotiation; certain HTTP server software can do it, for example Apache with the module mod_charset_lite.

Second, a declaration can be included within the document itself.

For HTML it is possible to include this information inside the head element near the top of the document:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

HTML5 also allows the following syntax to mean exactly the same:

    <meta charset="utf-8">

XHTML documents have a third option: to express the character encoding via XML declaration, as follows:

    <?xml version="1.0" encoding="utf-8"?>

With this second approach, because the character encoding cannot be known until the declaration is parsed, there is a problem knowing which character encoding is used in the document up to and including the declaration itself. If the character encoding is an ASCII extension, then the content up to and including the declaration itself should be pure ASCII and this will work correctly. For character encodings that are not ASCII extensions (i.e. not a superset of ASCII), such as UTF-16BE and UTF-16LE, a processor of HTML, such as a web browser, should be able to parse the declaration in some cases through the use of heuristics.

Encoding detection algorithm

As of HTML5 the recommended charset is UTF-8. An "encoding sniffing algorithm" is defined in the specification to determine the character encoding of the document based on multiple sources of input, including:

Explicit user instruction
An explicit meta tag within the first 1024 bytes of the document
A byte order mark (BOM) within the first three bytes of the document
The HTTP Content-Type or other transport layer information
Analysis of the document bytes looking for specific sequences or ranges of byte values, and other tentative detection mechanisms

If the encoding is detected incorrectly, characters outside of the printable ASCII range (32 to 126) usually appear incorrectly. This presents few problems for English-speaking users, but other languages regularly (in some cases, always) require characters outside that range. In Chinese, Japanese, and Korean (CJK) language environments where there are several different multi-byte encodings in use, auto-detection is also often employed. Finally, browsers usually permit the user to override an incorrect charset label manually as well.

It is increasingly common for multilingual websites and websites in non-Western languages to use UTF-8, which allows use of the same encoding for all languages. UTF-16 or UTF-32, which can be used for all languages as well, are less widely used because they can be harder to handle in programming languages that assume a byte-oriented ASCII superset encoding, and they are less efficient for text with a high frequency of ASCII characters, which is usually the case for HTML documents.

Successful viewing of a page is not necessarily an indication that its encoding is specified correctly. If the page's creator and reader are both assuming some platform-specific character encoding, and the server does not send any identifying information, then the reader will nonetheless see the page as the creator intended, but other readers on different platforms or with different native languages will not see the page as intended.

Permitted encodings

The WHATWG Encoding Standard, referenced by recent HTML standards (the current WHATWG HTML Living Standard, as well as the formerly competing W3C HTML 5.0 and 5.1), specifies a list of encodings which browsers must support. The HTML standards forbid support of other encodings. The Encoding Standard further stipulates that new formats, new protocols (even when existing formats are used) and authors of new documents are required to use UTF-8 exclusively.

Besides UTF-8, the following encodings are explicitly listed in the HTML standard itself, with reference to the Encoding Standard:

ISO-8859-2
ISO-8859-7
ISO-8859-8
Windows-874: also specified for TIS-620, ISO-8859-11 and related labels.
Windows-1250
Windows-1251
Windows-1252: also specified for ASCII, ISO-8859-1 and related labels.
Windows-1254: also specified for ISO-8859-9 and related labels.
Windows-1255
Windows-1256
Windows-1257
Windows-1258
GB 18030: specified with 0xA3A0 as a duplicate encoding of the ideographic space (U+3000) for compatibility reasons, and as such excluding U+E5E5 (a private use character). Also, specified with 0x80 accepted as an alternative encoding of the euro sign (U+20AC; see Windows-936). Otherwise, follows the mappings from the 2005 standard.
Big5: the Hong Kong Supplementary Character Set variant, although most of the HKSCS extensions (those with lead bytes less than 0xA1) are not included by the encoder, only by the decoder.
Shift JIS: the specification includes IBM and NEC extensions, and is more precisely Windows-31J.
ISO-2022-JP: the specification uses the same index as used for Shift JIS (insofar as is within reach), i.e. includes NEC extensions. Half-width kana is converted to fullwidth by the encoder, but accepted using an escape sequence (ESC 0x28 0x49) by the decoder. Shift Out and Shift In (0x0E and 0x0F) are excluded entirely to prevent attacks.
EUC-KR: actually Unified Hangul Code (Windows-949), which is a superset which covers the entire Hangul Syllables block.
UTF-16BE: specified for decoding only; form submissions from UTF-16-coded documents are to be encoded in UTF-8.
UTF-16LE: for compatibility with deployed content, also specified for the plain UTF-16 label, although a byte order mark (BOM), if present, takes priority over any label. Specified for decoding only; form submissions from UTF-16-coded documents are to be encoded in UTF-8.
x-user-defined: maps 0x00 through 0x7F to U+0000 through U+007F, and 0x80 through 0xFF to U+F780 through U+F7FF (a Private Use Area range), such that the low 8 bits of the code point always match the original byte.

The following additional encodings are listed in the Encoding Standard, and support for them is therefore also required:

Code page 866
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-8-I: uses the same encoder and decoder as ISO-8859-8, but is not subject to the visual-order behaviour which is used for documents labelled as ISO-8859-8.
ISO-8859-10
ISO-8859-13
ISO-8859-14
ISO-8859-15
ISO-8859-16
KOI8-R
KOI8-U / KOI8-RU: titled KOI8-U and specified for both KOI8-U and KOI8-RU labels; follows KOI8-RU in positions 0xAE and 0xBE (i.e. includes Ў/ў).
Mac OS Roman
Windows-1253
Mac OS Cyrillic
GBK: also specified for GB2312 and related labels. Handled the same as GB 18030 for decoding purposes. For encoding purposes, labelling as GBK (or GB 2312) excludes four-byte codes, and favours the one-byte 0x80 representation for U+20AC.
EUC-JP: the specification uses the same index as used for Shift JIS (insofar as is within reach of the EUC code set 1), i.e. includes NEC extensions. JIS X 0212 is included for decoding only.

The following encodings are listed as explicit examples of forbidden encodings:

CESU-8
UTF-7
BOCU-1
SCSU
EBCDIC
UTF-32

The standard also defines a "replacement" decoder, which maps all content labelled as certain encodings to the replacement character (�), refusing to process it at all. This is intended to prevent attacks (e.g. cross-site scripting) which may exploit a difference between the client and server in what encodings are supported in order to mask malicious content. Although the same security concern applies to ISO-2022-JP and UTF-16, which also allow sequences of ASCII bytes to be interpreted differently, this approach was not seen as feasible for them since they are comparatively more frequently used in deployed content. The following encodings receive this treatment:

ISO-2022-KR
ISO-2022-CN
ISO-2022-CN-EXT
HZ-GB-2312

Character references

In addition to native character encodings, characters can also be encoded as character references, which can be numeric character references (decimal or hexadecimal) or character entity references. Character entity references are also sometimes referred to as named entities, or HTML entities for HTML. HTML's usage of character references derives from SGML.

HTML character references

A numeric character reference in HTML refers to a character by its Universal Character Set/Unicode code point, and uses the format

    &#nnnn;

or

    &#xhhhh;

where nnnn is the code point in decimal form, and hhhh is the code point in hexadecimal form. The x must be lowercase in XML documents. The nnnn or hhhh may be any number of digits and may include leading zeros. The hhhh may mix uppercase and lowercase, though uppercase is the usual style.

Not all web browsers or email clients used by receivers of HTML documents, or text editors used by authors of HTML documents, will be able to render all HTML characters. Most modern software is able to display most or all of the characters for the user's language, and will draw a box or other clear indicator for characters they cannot render.

For codes from 0 to 127, the original 7-bit ASCII standard set, most of these characters can be used without a character reference. Codes from 160 to 255 can all be created using character entity names. Only a few higher-numbered codes can be created using entity names, but all can be created by numeric character reference.

Character entity references can also have the format &name; where name is a case-sensitive alphanumeric string. For example, "λ" can also be encoded as &lambda; in an HTML document. The character entity references &lt;, &gt;, &quot; and &amp; are predefined in HTML and SGML, because <, >, " and & are already used to delimit markup. This notably did not include XML's &apos; (') entity prior to HTML5. For a list of all named HTML character entity references along with the versions in which they were introduced, see List of XML and HTML character entity references.

Unnecessary use of HTML character references may significantly reduce HTML readability. If the character encoding for a web page is chosen appropriately, then HTML character references are usually only required for markup delimiting characters as mentioned above, and for a few special characters (or none at all if a native Unicode encoding like UTF-8 is used). Incorrect HTML entity escaping may also open up security vulnerabilities for injection attacks such as cross-site scripting. If HTML attributes are left unquoted, certain characters, most importantly whitespace such as space and tab, must be escaped using entities. Other languages related to HTML have their own methods of escaping characters.

XML character references

Unlike traditional HTML with its large range of character entity references, in XML there are only five predefined character entity references. These are used to escape characters that are markup sensitive in certain contexts:

&amp; → & (ampersand, U+0026)
&lt; → < (less-than sign, U+003C)
&gt; → > (greater-than sign, U+003E)
&quot; → " (quotation mark, U+0022)
&apos; → ' (apostrophe, U+0027)

All other character entity references have to be defined before they can be used. For example, use of &eacute; (which gives é, Latin lower-case E with acute accent, U+00E9 in Unicode) in an XML document will generate an error unless the entity has already been defined. XML also requires that the x in hexadecimal numeric references be in lowercase: for example &#xA1b; rather than &#XA1b;. XHTML, which is an XML application, supports the HTML entity set, along with XML's predefined entities.
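The charset in an HTTP header such as `Content-Type: text/html; charset=utf-8` can be read programmatically. A minimal sketch, assuming Python's standard `email.message.Message` class is an acceptable stand-in for a full HTTP header parser (it understands MIME parameters, including quoted values); the helper name `charset_from_content_type` is hypothetical:

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    # Message parses MIME-style parameters such as charset="..." for us.
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None if it is absent.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=utf-8"))          # utf-8
print(charset_from_content_type('text/html; charset="ISO-8859-2"'))   # iso-8859-2
print(charset_from_content_type("text/html"))                         # None
```

Because charset names are case-insensitive, normalizing to lowercase makes later comparisons against a table of supported labels straightforward.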
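The remark that an in-document declaration works when the encoding is an ASCII extension can be checked directly: the bytes of `<meta charset="utf-8">` come out identical under several ASCII-compatible codecs, but not under UTF-16. A sketch using Python's built-in codec names (`cp1252`, `iso8859_2`, `shift_jis`, `euc_kr`, `gb18030`); the `guess_utf16` helper is a hypothetical heuristic for illustration, not the procedure any browser actually uses:

```python
DECLARATION = '<meta charset="utf-8">'
ASCII_COMPATIBLE = ["utf-8", "cp1252", "iso8859_2", "shift_jis", "euc_kr", "gb18030"]


def declaration_bytes(codec_names):
    """Encode the declaration under each codec and return the distinct byte strings."""
    return {DECLARATION.encode(name) for name in codec_names}


def guess_utf16(prefix):
    """Hypothetical heuristic: in UTF-16, an ASCII '<' is paired with a zero byte."""
    if prefix.startswith(b"<\x00"):
        return "utf-16le"
    if prefix.startswith(b"\x00<"):
        return "utf-16be"
    return None
```

For every ASCII-compatible codec listed, `declaration_bytes` yields a single byte string, so a parser can read the declaration before it knows which of them applies. The UTF-16 forms interleave zero bytes, which is why a different (heuristic) path is needed for them.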
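The inputs listed for the encoding sniffing algorithm can be combined in a precedence order. The sketch below assumes a simplified order: byte order mark first, then the HTTP charset, then a scan of the first 1024 bytes for a meta declaration, then a fallback. It omits user overrides and byte-pattern analysis. Its regular expression is far looser than the HTML standard's prescan, and the `windows-1252` fallback is an assumption (browsers choose a locale-dependent default). The function name `sniff_encoding` is hypothetical.

```python
import codecs
import re

# Byte order marks are checked first (UTF-32 is a forbidden encoding, so it is omitted).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
]

# Loose pattern covering both <meta charset=...> and the http-equiv form.
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def sniff_encoding(body, http_charset=None, default="windows-1252"):
    """Simplified sketch: BOM, then HTTP charset, then meta prescan, then fallback."""
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    if http_charset:
        return http_charset.strip().lower()
    # Only the first 1024 bytes are examined for a meta declaration.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return default
```

For example, a page that starts with a UTF-8 BOM is treated as UTF-8 even if its meta tag says otherwise, and a meta tag placed after the first 1024 bytes is ignored.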
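The x-user-defined mapping is simple enough to write out in full. Python ships no codec under that name, so this is a hand-written sketch. The function names are hypothetical, and raising `ValueError` for unencodable characters is an assumption. Adding 0xF700 to a byte in 0x80 to 0xFF lands it in U+F780 to U+F7FF while leaving its low 8 bits unchanged:

```python
def decode_x_user_defined(data):
    """Decode bytes as x-user-defined: ASCII is unchanged, 0x80-0xFF map to U+F780-U+F7FF."""
    return "".join(chr(b) if b < 0x80 else chr(0xF700 + b) for b in data)


def encode_x_user_defined(text):
    """Inverse mapping; any other code point is treated as an encoding error here."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif 0xF780 <= cp <= 0xF7FF:
            out.append(cp - 0xF700)
        else:
            raise ValueError(f"U+{cp:04X} cannot be encoded as x-user-defined")
    return bytes(out)
```

Because the mapping is one-to-one over all 256 byte values, decoding and then re-encoding any byte string returns it unchanged.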
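The `&#nnnn;` and `&#xhhhh;` numeric reference formats can be handled with a short regular expression. This is a sketch for illustration only. The helper names are hypothetical, and it requires the trailing semicolon. It also skips the error recovery that real HTML parsers apply to invalid or out-of-range values. In Python, the standard library's `html.unescape` already handles both numeric and named references.

```python
import re

# Decimal (&#nnnn;) or hexadecimal (&#xhhhh;) references; HTML accepts x or X.
NUMERIC_REF = re.compile(r"&#(?:([0-9]+)|[xX]([0-9A-Fa-f]+));")


def decode_numeric_refs(text):
    """Replace well-formed numeric character references with the characters they name."""
    def replace(match):
        decimal, hexadecimal = match.groups()
        code_point = int(decimal, 10) if decimal is not None else int(hexadecimal, 16)
        return chr(code_point)  # raises ValueError above U+10FFFF
    return NUMERIC_REF.sub(replace, text)


def to_numeric_ref(ch, hexadecimal=False):
    """Encode one character as a numeric reference (lowercase x, uppercase hex digits)."""
    code_point = ord(ch)
    return f"&#x{code_point:X};" if hexadecimal else f"&#{code_point};"
```

Leading zeros and mixed-case hex digits both decode to the same character, and named references such as `&lambda;` are left untouched by this function.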
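The five predefined XML entities and the rule that every other entity must be defined first can both be observed with Python's standard-library parser (`xml.etree.ElementTree`, backed by expat). In this sketch, `escape_xml` and `parses_as_xml` are hypothetical helper names. Escaping all five characters everywhere is a conservative choice; XML itself requires it only in some contexts.

```python
import xml.etree.ElementTree as ET

# The five entities XML predefines; everything else must be declared first.
XML_PREDEFINED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}


def escape_xml(text):
    """Escape markup-sensitive characters one at a time, so '&' is never double-escaped."""
    return "".join(XML_PREDEFINED.get(ch, ch) for ch in text)


def parses_as_xml(fragment):
    """Return True if the standard-library parser accepts the fragment as well-formed XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True
```

With these helpers, `<p>&eacute;</p>` is rejected because the entity is undefined, while the numeric reference `<p>&#xe9;</p>` parses to é.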
See also

Charset sniffing – used by many browsers when character encoding metadata is not available
Unicode and HTML
Language code
List of XML and HTML character entity references

External links

Online HTML entity encoder & decoder tool
Character entity references in HTML4
The Definitive Guide to Web Character Encoding
HTML Entity Encoding chapter of Browser Security Handbook – more information about current browsers and their entity handling
The Open Web Application Security Project's wiki article on cross-site scripting (XSS)

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.