Knowledge

Universal Character Set characters

Source 📝

892:. Number forms primarily consist of precomposed fractions and Roman numerals. Like other areas of composing sequences of characters, the Unicode approach prefers the flexibility of composing fractions by combining characters together. In this case to create fractions, one combines numbers with the fraction slash character (U+2044). As an example of the flexibility this approach provides, there are nineteen precomposed fraction characters included within the UCS. However, there are an infinity of possible fractions. By using composing characters the infinity of fractions is handled by 11 characters (0-9 and the fraction slash). No character set could include code points for every precomposed fraction. Ideally a text system should present the same glyphs for a fraction whether it is one of the precomposed fractions (such as 1066:. Several blocks in the UCS are devoted almost entirely to compatibility characters. Compatibility characters are those included for support of legacy text handling systems that do not make a distinction between character and glyph the way Unicode does. For example, many Arabic letters are represented by a different glyph when the letter appears at the end of a word than when the letter appears at the beginning of a word. Unicode's approach prefers to have these letters mapped to the same character for ease of internal machine text processing and storage. To complement this approach, the text software must select different glyph variants for display of the character based on its context. Over 4000 characters are included for such compatibility reasons. 2410:". Traditionally, other character sets assigned a unique character code point for each diacritic modified letter used in each language. Unicode seeks to create a more flexible approach by allowing combining diacritic characters to combine with any letter. This has the potential to significantly reduce the number of active code points needed for the character set. As an example, consider a language that uses the Latin script and combines the diaeresis with the upper- and lower-case letters "a", "o", and "u". With the Unicode approach, only the diaeresis diacritic character needs to be added to the character set to use with the Latin letters: "a", "A", "o", "O", "u", and "U": seven characters in all. A legacy character sets needs to add six 2356:) permanently reserved for internal use, and therefore guaranteed to never be assigned to a character. Each of the 17 planes has its two ending code points set aside as noncharacters. So, noncharacters are: U+FFFE and U+FFFF on the BMP, U+1FFFE and U+1FFFF on Plane 1, and so on, up to U+10FFFE and U+10FFFF on Plane 16, for a total of 34 code points. In addition, there is a contiguous range of another 32 noncharacter code points in the BMP: U+FDD0..U+FDEF. Software implementations are free to use these code points for internal use. One particularly useful example of a noncharacter is the code point U+FFFE. This code point has the reverse UTF-16/UCS-2 byte sequence of the 2221:(PUA). The Unicode standard recognizes code points within PUAs as legitimate Unicode character codes, but does not assign them any (abstract) character. Instead, individuals, organizations, software vendors, operating system vendors, font vendors and communities of end-users are free to use them as they see fit. Within closed systems, characters in the PUA can operate unambiguously, allowing such systems to represent characters or glyphs not defined in Unicode. In public systems their use is more problematic, since there is no registry and no way to prevent several organizations from adopting the same code points for different purposes. One example of such a conflict is 1128:, provide for contextual substitution and positioning of glyphs, a simple text layout engine might rely entirely on the font for all decisions of glyph choice and placement. In the same situation a more complex engine may combine information from the font with its own rules to achieve its own idea of best rendering. To implement all recommendations of the Unicode specification, a text engine must be prepared to work with fonts of any level of sophistication, since contextual substitution and positioning rules do not exist in some font formats and are optional in the rest. The 1219: 1078:. The UCS includes 2048 code points in the Basic Multilingual Plane (BMP) for surrogate code point pairs. Together these surrogates allow any code point in the sixteen other planes to be addressed by using two surrogate code points. This provides a simple built-in method for encoding the 20.1 bit UCS within a 16 bit encoding such as UTF-16. In this way UTF-16 can represent any character within the BMP with a single 16-bit word. Characters outside the BMP are then encoded using two 16-bit words (4 octets or bytes total) using the surrogate pairs. 1973:(NEXT LINE) to be whitespace, even though Unicode does. Whitespace characters are characters typically designated for programming environments. Often they have no syntactic meaning in such programming environments and are ignored by the machine interpreters. Unicode designates the legacy control characters U+0009 through U+000D and U+0085 as whitespace characters, as well as all characters whose General Category property value is Separator. There are 25 total whitespace characters as of Unicode 16.0. 1989:(U+200C) control the joining and ligation of glyphs. The joiner does not cause characters that would not otherwise join or ligate to do so, but when paired with the non-joiner these characters can be used to control the joining and ligating properties of the surrounding two joining or ligating characters. The Combining Grapheme Joiner (U+034F) is used to distinguish two base characters as one common base or digraph, mostly for underlying text processing, collation of strings, case folding and so on. 56: 4724: 4713: 850:. An important advance conceived by Unicode in designing the UCS and related algorithms for handling text was the introduction of combining diacritic marks. By providing accents that can combine with any letter character, the Unicode and the UCS reduce significantly the number of characters needed. While the UCS also includes precomposed characters, these were included primarily to facilitate support within UCS for non-Unicode text processing systems. 49: 2506: 1214:) shows the synthesized common fraction on the left and the precomposed fraction glyph on the right as a rendering the plain text string "1 1⁄4 1¼". Depending on the text environment, the single string "1 1⁄4" might yield either result, the one on the right through substitution of the fraction sequence with the single precomposed fraction glyph. 1243:. If the displaying software is incapable of mapping the fraction to a unit, then it can also be displayed as a simple linear sequence as a fallback (for example, 3/4). If the fraction is to be separated from a previous number, then a space can be used, choosing the appropriate width (normal, thin, zero width, and so on). For example, 1 + 778:(0—9, A—F) preceding the four final ones: hence U+24321 is in Plane 2, U+4321 is in Plane 0 (implicitly read U+04321), and U+10A200 would be in Plane 16 (hex 10 = decimal 16). Within one plane, the range of code points is hexadecimal 0000—FFFF, yielding a maximum of 65536 code points. Planes restrict code points to a subset of that range. 793:
allocates entire blocks of similar characters: for example all the characters belonging to the same script or all similarly purposed symbols get assigned to a single block. Blocks may also maintain unassigned or reserved code points when the Consortium expects a block to require additional assignments.
2453:
Every character in Unicode is defined by a large and growing set of properties. Most of these properties are not part of Universal Character Set. The properties facilitate text processing including collation or sorting of text, identifying words, sentences and graphemes, rendering or imaging text and
2198:
The majority of code points in actual use have been assigned to abstract characters. This includes private-use characters, which though not formally designated by the Unicode standard for a particular purpose, require a sender and recipient to have agreed in advance how they should be interpreted for
2115:
Aside from the original ASCII space, the other spaces are all compatibility characters. In this context this means that they effectively add no semantic content to the text, but instead provide styling control. Within Unicode, this non-semantic styling control is often referred to as rich text and is
1997:
The most common word separator is a space (U+0020). However, there are other word joiners and separators that also indicate a break between words and participate in line-breaking algorithms. The No-Break Space (U+00A0) also produces a baseline advance without a glyph but inhibits rather than enabling
1355:
While Unicode is designed to handle multiple languages, multiple writing systems and even text that flows either left-to-right or right-to-left with minimal author intervention, there are special circumstances where the mix of bidirectional text can become intricate—requiring more author control. For
1183:
Primarily for mathematics, the Invisible Separator (U+2063) provides a separator between characters where punctuation or space may be omitted such as in a two-dimensional index like i⁣j. Invisible Times (U+2062) and Function Application (U+2061) are useful in mathematics text where the multiplication
2073:
The space character (U+0020) typically input by the space bar on a keyboard serves semantically as a word separator in many languages. For legacy reasons, the UCS also includes spaces of varying sizes that are compatibility equivalents for the space character. While these spaces of varying width are
2058:
These provide Unicode with native paragraph and line separators independent of the legacy encoded ASCII control characters such as carriage return (U+000A), linefeed (U+000D), and Next Line (U+0085). Unicode does not provide for other ASCII formatting control characters which presumably then are not
1325:
The render-time directional type of a neutral character can remain ambiguous when the mark is placed on the boundary between directional changes. To address this, Unicode includes characters that have strong directionality, have no glyph associated with them, and are ignorable by systems that do not
759:
Unicode and ISO divide the set of code points into 17 planes, each capable of containing 65536 distinct characters or 1,114,112 total. As of 2024 (Unicode 16.0) ISO and the Unicode Consortium has only allocated characters and blocks in seven of the 17 planes. The others remain empty and reserved for
2591:
For numerals, this property indicates the numeric value of the character. Decimal digits have all three values set to the same value, presentational rich text compatibility characters and other Arabic-Indic non-decimal digits typically have only the latter two properties set to the numeric value of
2479:
This is a permanent name assigned by the joint cooperation of Unicode and the ISO UCS. A few known poorly chosen names exist and are acknowledged (e.g. U+FE18 PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET, which is misspelled – should be BRACKET) but will not be changed, in order to
2394:
Whereas many other character sets assign a character for every possible glyph representation of the character, Unicode seeks to treat characters separately from glyphs. This distinction is not always unambiguous; however, a few examples will help illustrate the distinction. Often two characters may
1317:
While lexical characters (that is, letters) are normally specific to a single writing script, some symbols and punctuation marks are used across many writing scripts. Unicode could have created duplicate symbols in the repertoire that differ only by directional type, but chose instead to unify them
1292:
being the substituted precomposed fraction, rather than synthesized). In the face of problems like this, those who wish to rely on the recommended Unicode behavior should choose fonts known to synthesize fractions or text layout software known to produce Unicode's recommended behavior regardless of
2537:
Since diacritics and other combining marks can be expressed with multiple characters in Unicode the "Combining Class" property allows characters to be differentiated by the type of combining character it represents. The combining class can be expressed as an integer between 0 and 255 or as a named
2438:
However, for UCS and Unicode in particular, the preferred approach is to always encode or map that letter to the same character no matter where it appears in a word. Then the distinct forms of each letter are determined by the font and text layout software methods. In this way, the internal memory
1039:
in what Unicode refers to as Unihan (for Unified Han). With Unihan, the text layout software must work together with the available fonts and these Unicode characters to produce the appropriate glyph for the appropriate language. Despite unifying these characters, the UCS still includes over 97,000
2434:
written in an Arabic script are always cursive. Each letter has many different forms. UCS includes 730 Arabic form characters that decompose to just 88 unique Arabic characters. However, these additional Arabic characters are included so that text processing software may translate text from other
2074:
important in typography, the Unicode processing model calls for such visual effects to be handled by rich text, markup and other such protocols. They are included in the Unicode repertoire primarily to handle lossless roundtrip transcoding from other character set encodings. These spaces include:
1263:
More sophisticated layout engines face two practical choices: they can follow Unicode's recommendation, or they can rely on the font's own instructions for synthesizing fractions. By ignoring the font's instructions, the layout engine can guarantee Unicode's recommended behavior. By following the
1167:
byte order. Conversely, if the first two bytes are 0xFF, 0xFE, then the text stream may be assumed to be encoded as UTF-16LE because, read as a 16-bit little-endian value, the bytes yield the expected 0xFEFF byte order mark. This assumption becomes questionable, however, if the next two bytes are
2138:
Several characters are designed to help control line-breaks either by discouraging them (no-break characters) or suggesting line breaks such as the soft hyphen (U+00AD) (sometimes called the "shy hyphen"). Such characters, though designed for styling, are probably indispensable for the intricate
1340:
Surrounding a bidirectionally neutral character by the left-to-right mark will force the character to behave as a left-to-right character while surrounding it by the right-to-left mark will force it to behave as a right-to-left character. The behavior of these characters is detailed in Unicode's
1238:
The standard form of a fraction built using the fraction slash is defined as follows: any sequence of one or more decimal digits (General Category = Nd), followed by the fraction slash, followed by any sequence of one or more decimal digits. Such a fraction should be displayed as a unit, such as
2285:
without resorting to more-than-16-bit-word representations. There are 1024 "high" surrogates (D800–DBFF) and 1024 "low" surrogates (DC00–DFFF). By combining a pair of surrogates, the remaining characters in all the other planes can be addressed (1024 × 1024 = 1048576 code points in the other 16
1271:
The problem with following the font's instructions is that the simpler font formats have no way to specify fraction synthesis behavior. Meanwhile, the more complex formats do not require the font to specify fraction synthesis behavior and therefore many do not. Most fonts of complex formats can
1259:
By following this Unicode recommendation, text processing systems yield sophisticated symbols from plain text alone. Here the presence of the fraction slash character instructs the layout engine to synthesize a fraction from all consecutive digits preceding and following the slash. In practice,
792:
Unicode adds a block property to UCS that further divides each plane into separate blocks. Each block is a grouping of characters by their use such as "mathematical operators" or "Hebrew script characters". When assigning characters to previously unassigned code points, the Consortium typically
1105:
Unicode codifies over a hundred thousand characters. Most of those represent graphemes for processing as linear text. Some, however, either do not represent graphemes, or, as graphemes, require exceptional treatment. Unlike the ASCII control characters and other characters included for legacy
2422:
UCS includes thousands of characters that Unicode designates as compatibility characters. These are characters that were included in UCS in order to provide distinct code points for characters that other character sets differentiate, but would not be differentiated in the Unicode approach to
1184:
of terms or the application of a function is implied without any glyph indicating the operation. Unicode 5.1 introduces the Mathematical Invisible Plus character as well (U+2064) which may indicate that an integral number followed by a fraction should denote their sum, but not their product.
812:. Though Unicode refers to these as a Latin script block, these two blocks contain many characters that are commonly useful outside of the Latin script. In general, not all characters in a given block need be of the same script, and a given script can occur in several different blocks. 2564:
Indicates the character's glyph must be reversed or mirrored within the bidirectional algorithm. Mirrored glyphs can be provided by font makers, extracted from other characters related through the "Bidirectional Mirroring Glyph" property or synthesized by the text rendering system.
1998:
a line-break. The Zero Width Space (U+200B) allows a line-break but provides no space: in a sense joining, rather than separating, two words. Finally, the Word Joiner (U+2060) inhibits line breaks and also involves none of the white space produced by a baseline advance.
2395:
be combined typographically to improve the readability of the text. For example, the three letter sequence "ffi" may be treated as a single glyph. Other character sets would often assign a code point to this glyph in addition to the individual letters: "f" and "i".
1301:
Writing direction is the direction glyphs are placed on the page in relation to forward progression of characters in the Unicode string. English and other languages of Latin script have left-to-right writing direction. Several major writing scripts, such as
1090:. The consortium guarantees certain code points will never be assigned a character and calls these noncharacter code points. These include the range U+FDD0..U+FDEF, and the last two code points of each plane (ending in the hexadecimal digits FFFE and FFFF). 2161:
The break inhibiting characters are meant to be equivalent to a character sequence wrapped in the Word Joiner U+2060. However, the Word Joiner may be appended before or after any character that would allow a line-break to inhibit such line-breaking.
1949:
Unicode provides a list of characters it deems whitespace characters for interoperability support. Software Implementations and other standards may use the term to denote a slightly different set of characters. For example, Java does not consider
1260:
results vary because of the complicated interplay between fonts and layout engines. Simple text layout engines tend not to synthesize fractions at all, and instead draw the glyphs as a linear sequence as described in the Unicode fallback scheme.
2336:
Since high surrogate values in the range DB80–DBFF always produce values in the Private Use planes, the high surrogate range can be further divided into (normal) high surrogates (D800–DB7F) and "high private use surrogates" (DB80–DBFF).
1931:
When a combining mark is adjacent to a non-combining mark code point, text rendering applications should superimpose the combining mark onto the glyph represented by the other code point to form a grapheme according to a set of rules.
2493:
The Unicode code point is a number also permanently assigned along with the "Name" property and included in the companion UCS. The usual custom is to represent the code point as hexadecimal number with the prefix "U+" in front.
862:. Many mathematics, technical, geometrical and other symbols are included within the UCS. This provides distinct symbols with their own code point or character rather than relying on switching fonts to provide symbolic glyphs. 1174:
The Unicode specification does not require the use of byte order marks in text streams. It further states that they should not be used in situations where some other method of signaling the encoding form is already in use.
2059:
part of the Unicode plain text processing model. These legacy formatting control characters include Tab (U+0009), Line Tabulation or Vertical Tab (U+000B), and Form Feed (U+000C) which is also thought of as a page break.
878:. Unicode designates many of the letterlike symbols as compatibility characters usually because they can be in plain text by substituting glyphs for a composing sequence of characters: for example substituting the glyph 1035:. By far the largest portion of the UCS is devoted to ideographs used in languages of Eastern Asia. While the glyph representation of these ideographs have diverged in the languages that use them, the UCS unifies these 2181:
Both the break inhibiting and break enabling characters participate with other punctuation and whitespace characters to enable text imaging systems to determine line breaks within the Unicode Line Breaking Algorithm.
900:). However, web browsers are not typically that sophisticated with Unicode and text handling. Doing so ensures that precomposed fractions and combining sequence fractions will appear compatible next to each other. 2371:
of the standard later stated that this was leading to "inappropriate over-rejection", clarifying that " are not illegal in interchange nor do they cause ill-formed Unicode text", and removing the original claim.
856:. Along with unifying diacritical marks, the UCS also sought to unify punctuation across scripts. Many scripts also contain punctuation, however, when that punctuation has no similar semantics in other scripts. 1910:) used can depict visual variations of the same character. It is possible that two different graphemes can have the exact same glyph or are visually so close that the average reader cannot tell them apart. 1199: 2190:
All code points given some kind of purpose or use are considered designated code points. Of those, they may be assigned to an abstract character, or otherwise designated for some other purpose.
1171:
The UTF-8 sequence corresponding to U+FEFF is 0xEF, 0xBB, 0xBF. This sequence has no meaning in other Unicode encoding forms, so it may serve to indicate that that stream is encoded as UTF-8.
834:. As of 2024 (Unicode 16.0), the UCS identifies 168 scripts that are, or have been, used throughout of the world. Many more are in various approval stages for future inclusion of the UCS. 2380:
All other code points, being those not designated, are referred to as being reserved. These code points may be assigned for a particular use in future versions of the Unicode standard.
1168:
both 0x00; either the text begins with a null character (U+0000), or the correct encoding is actually UTF-32LE, in which the full 4-byte sequence FF FE 00 00 is one character, the BOM.
2578:
This property indicates the code point of another character whose glyph can serve as the mirrored glyph for the present character when mirroring within the bidirectional algorithm.
1113:, while others do not affect text layout at all, but instead affect the way text strings are collated, matched or otherwise processed. Other special-purpose characters, such as the 2116:
outside the thrust of Unicode's goals. Rather than using different spaces in different contexts, this styling should instead be handled through intelligent text layout software.
1132:
is an example: complex fonts may or may not supply positioning rules in the presence of the fraction slash character to create a fraction, while fonts in simple formats cannot.
1120:
Unicode does not specify the division of labor between font and text layout software (or "engine") when rendering Unicode text. Because the more complex font formats, such as
2439:
for the characters remains identical regardless of where the character appears in a word. This greatly simplifies searching, sorting and other text processing operations.
2430:
style, the letter "i" may take different forms whether it appears at the beginning of a word, the end of a word, the middle of a word or in isolation. Languages such as
1084:. The consortium provides several private use blocks and planes that can be assigned characters within various communities, as well as operating system and font vendors. 2592:
the character while numerals unrelated to Arabic Indic digits such as Roman Numerals or Hanzhou/Suzhou numerals typically have only the "Numeric Value" indicated.
2256:, Plane 15 and Plane 16, each have their own PUAs of 65,534 private-use characters (with the final two code points of each plane being noncharacters). These are 1159:
byte order because 0xFE, 0xFF read as a 16-bit little endian word would be U+FFFE, which is meaningless. The sequence also has no meaning in any arrangement of
450:
16.0, released in September 2024, 299,056 (27%) of these code points are allocated, 155,063 (14%) have been assigned characters, 137,468 (12%) are reserved for
1920:
is an example where a character can be represented by more than one code point. It can be U+00C4, or U+0041U+0308. U+0041 is the familiar A and U+0308 is the
1356:
these circumstances, Unicode includes five other characters to control the complex embedding of left-to-right text within right-to-left text and vice versa:
824:
and subcategory. The general categories are: letter, mark, number, punctuation, symbol, or control (in other words a formatting or non-graphical character).
301: 2340:
Isolated surrogate code points have no general interpretation; consequently, no character code charts or names lists are provided for this range. In the
1935:
The word BÄM would therefore be three graphemes. It may be made up of three code points or more depending on how the characters are actually composed.
2426:
The chief reason for this differentiation was that Unicode makes a distinction between characters and glyphs. For example, when writing English in a
2538:
value. The integer values allow the combining marks to be reordered into a canonical order to make string comparison of identical strings possible.
3616: 619: 3558: 1318:
and assign them a neutral directional type. They acquire direction at render time from adjacent characters. Some of these characters also have a
489: 360: 2414:
letters with a diaeresis in addition to the six code points it uses for the letters without diaeresis: twelve character code points in total.
4681: 1913:
A grapheme is almost always represented by one code point, for example the LATIN CAPITAL LETTER A is represented by only code point U+0041.
3171: 2650:
Indicates the character is ignorable for implementations and that no glyph, last resort glyph, or replacement character need be displayed.
2217:
The UCS includes 137,468 private-use characters, which are code points for private use spread across three different blocks, each called a
415:, which can result in the same sequence of codes having multiple interpretations depending on the character encoding in use, resulting in 1117:, generally have no effect on text rendering, though sophisticated text layout software may choose to subtly adjust spacing around them. 2126:
Ideographic Space (U+3000): behaves as an ideographic separator and generally rendered as white space of the same width as an ideograph.
1193: 4666: 2267:
PUAs are a concept inherited from certain Asian encoding systems. These systems had private use areas to encode what the Japanese call
2360:(U+FEFF). If a stream of text contains this noncharacter, this is a good indication the text has been interpreted with the incorrect 539:, (not a joint project with ISO, but rather a publication of the Unicode Consortium,) provides other implementation details such as: 364: 2290:, they must always appear in pairs, as a high surrogate followed by a low surrogate, thus using 32 bits to denote one code point. 4686: 3476: 2524:
The general category is expressed as a two-letter sequence such as "Lu" for uppercase letter or "Nd", for decimal digit number.
3461: 1023:. Devoted to ideographs and other characters to support languages in China, Japan, Korea (CJK), Taiwan, Vietnam, and Thailand. 183: 2669:
Unicode provides an online database to interactively query the entire Unicode character repertoire by the various properties.
3705: 3384: 3700: 2663:
Unicode never removes characters from the repertoire, but on occasion Unicode has deprecated a small number of characters.
2454:
so on. Below is a list of some of the core properties. There are many others documented in the Unicode Character Database.
767:. This is to help ease the transition for legacy software since the Basic Multilingual Plane is addressable with just two 59: 411:
map, it can be used to represent multiple languages at the same time. This avoids the confusion of using multiple legacy
40: 1234:
The fraction slash character (U+2044) has special behavior in the Unicode Standard: (section 6.2, Other Punctuation)
561: 3533: 3164: 2683: 2129:
Ogham Space Mark (U+1680): this character is sometimes displayed with a glyph and other times as only white space.
3538: 3453: 3354: 1280:
glyph. But because many of them will not issue instructions to synthesize fractions, a plain text string such as
1163:
encoding, so, in summary, it serves as a fairly reliable indication that the text stream is encoded as UTF-16 in
841: 4015: 3734: 2367:
Versions of the Unicode standard from 3.1.0 to 6.3.0 claimed that noncharacters "should never be interchanged".
2787: 1147:
If the stream's first byte is 0xFE and the second 0xFF, then the stream's text is not likely to be encoded in
400: 3835: 3649: 3633: 3596: 3443: 3379: 3199: 2769: 2402:
modified letters as separate characters that, when rendered, become a single glyph. For example, an "o" with
2341: 424:
UCS has a potential capacity of over 1 million characters. Each UCS character is abstractly represented by a
2805: 1226:. This font supplies the text layout software with instructions to synthesize the fraction according to the 274: 265: 4574: 4444: 3759: 3369: 3138: 2248:
The Basic Multilingual Plane (Plane 0) contains 6,400 private-user characters in the eponymously named PUA
1268:
because placement and shaping of the digits will be tuned to that particular font at that particular size.
1106:
round-trip capabilities, these other special-purpose characters endow plain text with important semantics.
462:, leaving the remaining 815,056 (73%) unallocated. The number of encoded characters is made up as follows: 2282: 4693: 3810: 3621: 3421: 3157: 516:
in ISO/IEC 10646 includes the combination of the code point and its name, Unicode adds many other useful
4373: 3860: 3695: 3690: 3339: 3244: 2678: 2448: 2435:
character sets to UCS and back again without any loss of information crucial for non-Unicode software.
2234: 1925: 1314:
to each character to inform text processors how sequences of characters should be ordered on the page.
1100: 607: 517: 120: 17: 4378: 3754: 4298: 3575: 3284: 1125: 404: 156: 721:. The entity must either be predefined (built into the markup language) or explicitly declared in a 3389: 1434: 722: 283: 230: 165: 147: 138: 4175: 2230: 1322:
property indicating the glyph should be rendered in mirror-image when used in right-to-left text.
4728: 4629: 4549: 3985: 3890: 3239: 3227: 2966: 2904: 2868: 2200: 637: 544: 352: 111: 36: 30: 3065: 4753: 4318: 4065: 3920: 3855: 3570: 3438: 3344: 3303: 1894:
The term "character" is not well defined, and what we are referring to most of the time is the
292: 252: 243: 221: 209: 196: 4514: 4268: 4263: 4140: 3553: 3318: 2403: 2226: 1986: 1921: 384: 376: 368: 174: 4220: 4499: 4413: 4333: 4055: 4020: 3900: 1944: 569: 64: 8: 4544: 4454: 4398: 3764: 3744: 3543: 3528: 3433: 3359: 3349: 2068: 412: 4539: 4645: 4564: 4534: 4504: 4484: 4120: 4100: 3850: 3416: 3293: 3289: 3194: 3141:
Unicode Wiki with all 98884 graphic characters of Unicode 5.0 as gifs, full text search
2253: 1350: 801: 529: 340: 336: 2890: 1198: 874:. These symbols appear like combinations of many common Latin scripts letters such as 4614: 4524: 4509: 4348: 4313: 4125: 3965: 3790: 3601: 3313: 3254: 2389: 2368: 2344:, individual surrogate codes are used to embed undecodable bytes in Unicode strings. 2212: 1982: 1222:
A more elaborate example of fraction slash usage: plain text "4 221⁄225" rendered in
1218: 775: 768: 592: 478: 317: 4040: 771:. The characters outside the first plane usually have very specialized or rare use. 4758: 4717: 4676: 4569: 4519: 4408: 4368: 4293: 4283: 4273: 4145: 4130: 4035: 4010: 3885: 3865: 3725: 3611: 3364: 3323: 596: 536: 525: 482: 388: 380: 129: 3144: 3109: 2551:
Indicates the type of character for applying the Unicode bidirectional algorithm.
396: 4671: 4624: 4609: 4469: 4434: 4429: 4363: 4353: 4303: 4170: 4160: 4155: 4105: 4075: 3945: 3935: 3895: 3795: 3710: 3685: 3374: 3279: 3249: 2357: 2242: 1307: 1303: 1141: 754: 603: 443: 3880: 3091: 3030:
representative glyph. To see the official Unicode representative glyph, see the
16:"Unicode characters" redirects here. For a complete list of UCS characters, see 4579: 4559: 4479: 4459: 4449: 4358: 4205: 4135: 4110: 4090: 4045: 4030: 4005: 3955: 3930: 3875: 3830: 2745: 1036: 996:(chess, checkers, go, dice, dominoes, mahjong, playing cards, and many others). 493: 2926: 2823: 4747: 4599: 4584: 4464: 4403: 4288: 4230: 4225: 4190: 4165: 4115: 4000: 3990: 3840: 3785: 3628: 3426: 3222: 2626: 1223: 1211: 1156: 1005: 840:. The UCS devotes several blocks (over 300 characters) to characters for the 805: 787: 623: 588: 565: 521: 344: 2941: 2273:(rare characters not normally found in fonts) in application-specific ways. 1310:, have right-to-left writing direction. The Unicode specification assigns a 55: 4594: 4328: 4323: 4278: 4180: 4095: 4085: 3995: 3970: 3910: 3825: 3805: 3800: 3780: 3659: 3606: 2703: 392: 3204: 4650: 4489: 4474: 4388: 4258: 4235: 4200: 4070: 4050: 4025: 3845: 3308: 3298: 3010:"Unicode Technical Note #27 — Known Anomalies in Unicode Character Names" 713: 685: 557: 439: 434: 372: 3580: 2847: 4529: 3980: 3870: 3506: 3214: 2634: 2361: 2352:
The unhyphenated term "noncharacter" refers to 66 code points (labeled
2222: 1265: 1164: 797: 550: 512:
of the character: what one might think of as its address. Meanwhile, a
425: 65:
Index of predominant national and selected regional or minority scripts
1151:, since those bytes are invalid in UTF-8. It is also not likely to be 704:
may mix uppercase and lowercase, though uppercase is the usual style.
205: 4393: 4308: 4240: 3975: 3749: 3669: 3664: 3565: 3548: 2630: 2399: 745:
is the case-sensitive name of the entity. The semicolon is required.
584: 576: 504:
will be used interchangeably. However, when a distinction is made, a
87: 2333:
are the numeric values of the high and low surrogates respectively.
1272:
instruct the layout engine to replace a plain text sequence such as
4619: 4589: 4439: 4424: 4419: 4210: 3960: 3940: 3905: 3815: 3654: 3471: 1903: 1895: 1207: 1121: 417: 82: 76: 48: 4060: 3044: 3031: 3009: 2281:
The UCS uses surrogates to address characters outside the initial
1109:
Some special characters can alter the layout of text, such as the
466:
149,641 graphical characters (some of which do not have a visible
4343: 4338: 4215: 4150: 4080: 3820: 3180: 2427: 1227: 677: 509: 447: 442:), used to represent each character within the internal logic of 98: 2724: 4604: 4383: 4195: 4185: 3915: 3501: 3496: 3466: 2987: 2431: 2287: 1516: 1160: 1152: 763:
Most characters are currently assigned to the first plane: the
700:
may be any number of digits and may include leading zeros. The
808:. As a result, the first 128 characters are also identical to 796:
The first 256 code points in the UCS correspond with those of
351:. The Universal Coded Character Set, most commonly called the 4698: 4554: 4494: 3950: 3925: 3491: 3486: 3481: 2269: 1899: 1148: 809: 467: 93: 2746:"FAQ - Private-Use Characters, Noncharacters, and Sentinels" 1144:(BOM) U+FEFF hints at the encoding form and its byte order. 2383: 1907: 725:(DTD). The format is the same as for any entity reference: 629: 564:
algorithm"), where text on the same line may shift between
553:
of characters and character strings for different languages
387:
values. By creating this mapping, the UCS enables computer
23:
Complete list of the characters available on most computers
3149: 3133: 2505: 2407: 1917: 1771:
Variation Selector-17 through -256 (U+E0100–U+E01EF)
1264:
font's instructions, the layout engine can achieve better
2119:
Three other writing-system-specific word separators are:
1194:
Unicode subscripts and superscripts § Fraction slash
1140:
When appearing at the head of a text file or stream, the
636:
numeric character reference refers to a character by its
633: 2824:"UTN #2: A General Method for Rendering Combining Marks" 774:
Each plane corresponds with the value of the one or two
428:, an integer between 0 and 1,114,111 (1,114,112 = 2 + 2 3096:
Unicode Standard Annex #44 — Unicode Character Database
2511:
The representative glyphs are provided in code charts.
1938: 1768:
Variation Selector-1 through -16 (U+FE00–U+FE0F)
1110: 930:
Graphical representations of many control characters.
587:
enter these characters into programs through various
2942:"Non-decodable Bytes in System Character Interfaces" 1499:
Egyptian Hieroglyph Begin Walled Enclosure (U+1343E)
1466:
Egyptian Hieroglyph Insert At Bottom Start (U+13433)
1388: 1976: 1344: 1296: 602:The UCS can be divided in various ways, such as by 3724: 3026:official Unicode representative glyph, but merely 3007: 2788:"Section 4.12: Characters with Unusual Properties" 1776:Tag characters (U+E0001 and U+E0020–U+E007F) 1502:Egyptian Hieroglyph End Walled Enclosure (U+1343F) 1472:Egyptian Hieroglyph Insert At Bottom End (U+13435) 1463:Egyptian Hieroglyph Insert At Top Start (U+13432) 896:) or a composing sequence of characters (such as 4745: 2133: 1757:Mongolian Free Variation Selector Three (U+180D) 1515:Brahmi-derived script dead-character formation ( 620:List of XML and HTML character entity references 3271: 2888: 1469:Egyptian Hieroglyph Insert At Top End (U+13434) 1460:Egyptian Hieroglyph Horizontal Joiner (U+13431) 613: 492:maintains the basic mapping of characters from 474: 349:characters in the Universal Coded Character Set 3090:Davis, Mark; Iancu, Laurențiu; Whistler, Ken. 3089: 3016: 3008:Freytag, Asmus; McGowan, Rick; Whistler, Ken. 1992: 1754:Mongolian Free Variation Selector Two (U+180C) 1751:Mongolian Free Variation Selector One (U+180B) 1490:Egyptian Hieroglyph Insert At Bottom (U+1343B) 1484:Egyptian Hieroglyph Insert At Middle (U+13439) 3165: 3110:"Unicode Utilities: Character Property Index" 1841:Shorthand Format Continuing Overlap (U+1BCA1) 1801:Ideographic Description (U+2FF0–U+2FFB) 1493:Egyptian Hieroglyph Begin Enclosure (U+1343C) 1457:Egyptian Hieroglyph Vertical Joiner (U+13430) 1094: 2260:, which ranges from U+F0000 to U+FFFFD, and 1898:. A grapheme is represented visually by its 1889: 1609:Syloti Nagri Sign Alternate Hasanta (U+A82C) 1475:Egyptian Hieroglyph Overlay Middle (U+13436) 1114: 454:, 2,048 are used to enable the mechanism of 367:10646), is an international standard to map 2417: 2154:Tibetan Mark Delimiter Tsheg Bstar (U+0F0C) 2139:types of line-breaking they make possible. 1546:Malayalam Sign Vertical Bar Virama (U+0D3B) 1496:Egyptian Hieroglyph End Enclosure (U+1343D) 1487:Egyptian Hieroglyph Insert At Top (U+1343A) 1478:Egyptian Hieroglyph Begin Segment (U+13437) 1111:zero-width joiner and zero-width non-joiner 451: 4667:Cultural, political, and religious symbols 3172: 3158: 2848:"UAX #14: Unicode Line Breaking Algorithm" 2264:, which ranges from U+100000 to U+10FFFD. 2252:, which ranges from U+E000 to U+F8FF. The 1654:Tulu-Tigalari Sign Looped Virama (U+113CF) 1400:Interlinear Annotation Terminator (U+FFFB) 1178: 535:In addition to the UCS, the supplementary 2976:. The Unicode Consortium. September 2022. 2927:"Surrogate Support in Microsoft Products" 2914:. The Unicode Consortium. September 2022. 2878:. The Unicode Consortium. September 2022. 2812:. The Unicode Consortium. September 2024. 2794:. The Unicode Consortium. September 2024. 2776:. The Unicode Consortium. September 2024. 2206: 2174:Tibetan Mark Intersyllabic Tsheg (U+0F0B) 1838:Shorthand Format Letter Overlap (U+1BCA0) 1481:Egyptian Hieroglyph End Segment (U+13438) 1397:Interlinear Annotation Separator (U+FFFA) 820:Unicode assigns to every UCS character a 640:/Unicode code point, and uses the format 347:2 jointly collaborate on the list of the 3092:"Table 9. Property Table § PropList.txt" 2939: 2384:Characters, grapheme clusters and glyphs 2293:A surrogate pair denotes the code point 1798:Ideographic variation indicator (U+303E) 1728:Historical Viramas with other functions 1217: 1197: 882:for the composed sequence of characters 711:refers to a character by the name of an 692:must be lowercase in XML documents. The 54: 3200:ISO/IEC 10646 (Universal Character Set) 2442: 2375: 1549:Malayalam Sign Circular Virama (U+0D3C) 717:which has the desired character as its 41:question marks, boxes, or other symbols 4746: 2869:"Section 23.5: Private-Use Characters" 2193: 2185: 1882:Object Replacement Character (U+FFFC) 1690:Zanabazar Square Sign Virama (U+11A34) 1630:Brahmi Sign Old Tamil Virama (U+11070) 1394:Interlinear Annotation Anchor (U+FFF9) 3723: 3153: 3066:"UAX #44: Unicode Character Database" 1864:Activate Arabic Form Shaping (U+206D) 1827:Musical Symbol Begin Phrase (U+1D179) 832:Modern, Historic, and Ancient Scripts 470:, but are still counted as graphical) 407:from one to another. Because it is a 3701:International Components for Unicode 3650:Common Locale Data Repository (CLDR) 3072:. The Unicode Consortium. 2014-06-05 2850:. The Unicode Consortium. 2016-06-01 1861:Inhibit Arabic Form Shaping (U+206C) 1858:Activate Symmetric Swapping (U+206B) 1844:Shorthand Format Down Step (U+1BCA2) 1717:Gurung Khema Sign Tholhoma (U+1612F) 1702:Masaram Gondi Sign Halanta (U+11D44) 1693:Zanabazar Square Subjoiner (U+11A47) 2045: 1939:Whitespace, joiners, and separators 1855:Inhibit Symmetric Swapping (U+206A) 1830:Musical Symbol End Phrase (U+1D17A) 1821:Musical Symbol Begin Slur (U+1D177) 1809:Musical Symbol Begin Beam (U+1D173) 1651:Tulu-Tigalari Sign Virama (U+113CE) 1366:Pop directional formatting (U+202C) 1203: 1129: 356: 13: 4682:Mathematical operators and symbols 2924: 2806:"Section 6.2: General Punctuation" 2770:"Section 2.13: Special Characters" 2390:Unicode § Abstract characters 2123:Mongolian Vowel Separator (U+180E) 2111:Medium Mathematical Space (U+205F) 1906:(often erroneously referred to as 1847:Shorthand Format Up Step (U+1BCA3) 1815:Musical Symbol Begin Tie (U+1D175) 1782:Tifinagh Consonant Joiner (U+2D7F) 1760:Mongolian Vowel Separator (U+180E) 1681:Dives Akuru Sign Halanta (U+1193D) 1606:Syloti Nagri Sign Hasanta (U+A806) 1449:Kaithi Number Sign Above (U+110CD) 1443:Arabic Piastre Mark Above (U+0891) 1404: 1135: 459: 47: 14: 4770: 3127: 1824:Musical Symbol End Slur (U+1D178) 1812:Musical Symbol End Beam (U+1D174) 1687:Nandinagari Sign Virama (U+119E0) 1657:Tulu-Tigalari Conjoiner (U+113D0) 1428:Arabic Number Mark Above (U+0605) 1389:Interlinear annotation characters 1187: 4723: 4722: 4712: 4711: 4694:Phonetic symbols (including IPA) 2684:Unicode compatibility characters 2504: 2480:ensure specification stability. 2398:In addition, Unicode approaches 2347: 2262:Supplementary Private Use Area-B 2258:Supplementary Private Use Area-A 1977:Grapheme joiners and non-joiners 1918:LATIN CAPITAL A WITH DIAERESIS Ä 1852:Deprecated Alternate Formatting 1818:Musical Symbol End Tie (U+1D176) 1561:Thai Character Yamakkan (U+0E4E) 1440:Arabic Pound Mark Above (U+0890) 1384:Pop directional isolate (U+2069) 1363:Right-to-left embedding (U+202B) 1360:Left-to-right embedding (U+202A) 1345:Bidirectional general formatting 1297:Bidirectional neutral formatting 606:, block, character category, or 496:to code point. Often, the terms 3102: 3083: 3058: 3037: 3001: 2980: 2959: 2940:v. Löwis, Martin (2009-04-22). 2933: 2918: 2905:"Section 23.6: Surrogates Area" 2897: 2882: 1740:Meetei Mayek Apun Iyek (U+ABED) 1720:Kirat Rai Sign Virama (U+16D6B) 1699:Bhaiksuki Sign Virama (U+11C3F) 1645:Khudawadi Sign Virama (U+112EA) 1612:Saurashtra Sign Virama (U+A8C4) 1594:Sundanese Sign Pamaaeh (U+1BAA) 1558:Thai Character Phinthu (U+0E3A) 1555:Sinhala Sign Al-Lakuna (U+0DCA) 1522:Devanagari Sign Virama (U+094D) 1419:Arabic Footnote Marker (U+0602) 1372:Right-to-left override (U+202E) 1369:Left-to-right override (U+202D) 1230:rule described in this section. 842:International Phonetic Alphabet 838:International Phonetic Alphabet 455: 383:, and other domains, to unique 3145:Unicode Characters by Property 2889:Michael Everson (2004-01-15). 2861: 2840: 2816: 2798: 2780: 2762: 2738: 2717: 2696: 2157:Narrow no-break space (U+202F) 1885:Replacement Character (U+FFFD) 1867:National Digit Shapes (U+206E) 1748:Mongolian Variation Selectors 1708:Gunjala Gondi Virama (U+11D97) 1705:Masaram Gondi Virama (U+11D45) 1597:Sundanese Sign Virama (U+1BAB) 1588:Tai Tham Sign Ra Haam (U+1A7A) 1576:Hanunoo Sign Pamudpod (U+1734) 1573:Tagalog Sign Pamudpod (U+1715) 1552:Malayalam Sign Virama (U+0D4D) 1510:Brahmi Number Joiner (U+1107F) 1378:Right-to-left isolate (U+2067) 1375:Left-to-right isolate (U+2066) 520:to the character set, such as 1: 3634:International Ideographs Core 3444:International Ideographs Core 3385:Alias names and abbreviations 2967:"Section 23.7: Noncharacters" 2689: 2625:Indicates the character is a 2571:Bidirectional Mirroring Glyph 2276: 2134:Line-break control characters 1870:Nominal Digit Shapes (U+206F) 1731:Tibetan Mark Halanta (U+0F84) 1723:Kirat Rai Sign Saat (U+16D6C) 1666:Siddham Sign Virama (U+115BF) 1663:Tirhuta Sign Virama (U+114C2) 1648:Grantha Sign Virama (U+1134D) 1639:Sharada Sign Virama (U+111C0) 1564:Lao Sign Pali Virama (U+0EBA) 1531:Gujarati Sign Virama (U+0ACD) 1528:Gurmukhi Sign Virama (U+0A4D) 1381:First strong isolate (U+2068) 952:Optical Character Recognition 815: 3856:CJK Unified Ideographs (Han) 3706:People involved with Unicode 2988:"Unicode Character Database" 2946:Python Enhancement Proposals 2148:Non-breaking hyphen (U+2011) 2054:Paragraph Separator (U+2029) 1765:Generic Variation Selectors 1684:Dives Akuru Virama (U+1193E) 1642:Khojki Sign Virama (U+11235) 1633:Kaithi Sign Virama (U+110B9) 1621:Meetei Mayek Virama (U+AAF6) 1585:Tai Tham Sign Sakot (U+1A60) 1570:Tagalog Sign Virama (U+1714) 1567:Myanmar Sign Virama (U+1039) 1543:Kannada Sign Virama (U+0CCD) 1525:Bengali Sign Virama (U+09CD) 1446:Kaithi Number Sign (U+110BD) 1326:process bidirectional text: 614:Character reference overview 421:if the wrong one is chosen. 7: 3179: 2672: 2342:Python programming language 2239:klingon mummification glyph 2090:Three-Per-Em Space (U+2004) 1993:Word joiners and separators 1696:Soyombo Subjoiner (U+11A99) 1678:Dogra Sign Virama (U+11839) 1672:Takri Sign Virama (U+116B6) 1624:Kharoshthi Virama (U+10A3F) 1591:Balinese Adeg Adeg (U+1B44) 1540:Telugu Sign Virama (U+0C4D) 1431:Arabic End of Ayah (U+06DD) 1425:Arabic Sign Samvat (U+0604) 1413:Arabic Number Sign (U+0600) 1336:Right-to-left mark (U+200F) 1333:Left-to-right mark (U+200E) 1330:Arabic letter mark (U+061C) 848:Combining Diacritical Marks 458:, and 66 are designated as 371:, discrete symbols used in 359:UCS, official designation: 10: 4775: 3696:Ideographic Research Group 3691:ConScript Unicode Registry 2891:"Klingon: U+F8D0 - U+F8FF" 2679:ConScript Unicode Registry 2449:Unicode character property 2446: 2387: 2235:ConScript Unicode Registry 2210: 2102:Punctuation Space (U+2008) 2093:Four-Per-Em Space (U+2005) 2066: 1942: 1926:combining diacritical mark 1734:Myanmar Sign Asat (U+103A) 1711:Kawi Sign Killer (U+11F41) 1675:Ahom Sign Killer (U+1172B) 1669:Modi Sign Virama (U+1163F) 1660:Newa Sign Virama (U+11442) 1579:Khmer Sign Viriam (U+17D1) 1537:Tamil Sign Virama (U+0BCD) 1534:Oriya Sign Virama (U+0B4D) 1422:Arabic Sign Safha (U+0603) 1416:Arabic Sign Sanah (U+0601) 1348: 1191: 1101:Unicode control characters 1098: 1095:Special-purpose characters 785: 752: 709:character entity reference 617: 597:virtual character palettes 475:special purpose characters 18:List of Unicode characters 15: 4707: 4659: 4638: 4249: 3773: 3733: 3719: 3678: 3642: 3589: 3576:Regional indicator symbol 3519: 3452: 3409: 3402: 3332: 3285:Combining grapheme joiner 3270: 3263: 3213: 3187: 2590: 2177:Zero-width space (U+200B) 2096:Six-Per-Em Space (U+2006) 2062: 1890:Characters vs code points 1876: 1835:Shorthand Format Control 1790:Ogham Space Mark (U+1680) 1618:Javanese Pangkon (U+A9C0) 1603:Batak Panongonan (U+1BF3) 1582:Khmer Sign Coeng (U+17D2) 1341:Bidirectional Algorithm. 1126:Apple Advanced Typography 800:, the most popular 8-bit 781: 748: 560:bidirectional text ("the 543:mappings between UCS and 314: 70: 63: 4729:Category: Unicode blocks 3534:Compatibility characters 3112:. The Unicode Consortium 3047:. The Unicode Consortium 2990:. The Unicode Consortium 2727:. The Unicode Consortium 2706:. The Unicode Consortium 2418:Compatibility characters 2283:Basic Multilingual Plane 2151:No-break space (U+00A0) 2024:Zero Width Space U+200B 1743:Chakma Maayyaa (U+11134) 1737:Limbu Sign Sa-I (U+193B) 1714:Kawi Conjoiner (U+11F42) 1519:and similar diacritics) 1435:Syriac Abbreviation Mark 1410:Prefixed format control 1064:Compatibility Characters 765:Basic Multilingual Plane 723:Document Type Definition 591:, for example, physical 3454:Comparison of encodings 3380:Halfwidth and fullwidth 3235:Universal Character Set 3070:General Category Values 3045:"Character Code Charts" 2354:<not a character> 2201:information interchange 2051:Line Separator (U+2028) 1806:Musical Format Control 1636:Chakma Virama (U+11133) 1627:Brahmi Virama (U+11046) 1600:Batak Pangolat (U+1BF2) 1179:Mathematical invisibles 1115:mathematical invisibles 1013:Yijing Hexagram Symbols 982:Symbols and Pictographs 638:Universal Character Set 353:Universal Character Set 4379:Inscriptional Parthian 4066:Nyiakeng Puachue Hmong 3728:and symbols in Unicode 3345:CJK Unified Ideographs 2704:"The Unicode Standard" 2557:Bidirectional Mirrored 2544:Bidirectional Category 2476:LATIN CAPITAL LETTER A 2207:Private-use characters 1922:COMBINING DIAERESIS ̈ 1615:Rejang Virama (U+A953) 1257: 1231: 1215: 332: 52: 29:This article contains 4515:Old Persian cuneiform 4374:Inscriptional Pahlavi 4269:Ancient North Arabian 4264:Anatolian hieroglyphs 3554:Precomposed character 3390:Whitespace characters 3319:Zero-width non-joiner 3098:. Unicode Consortium. 3012:. Unicode Consortium. 2725:"Roadmaps to Unicode" 2388:Further information: 2099:Figure Space (U+2007) 2067:Further information: 2036:No-Break Space U+00A0 1987:zero-width non-joiner 1454:Egyptian Hieroglyphs 1349:Further information: 1276:with the precomposed 1236: 1221: 1201: 970:Miscellaneous Symbols 684:is the code point in 676:is the code point in 385:machine-readable data 58: 51: 4334:Egyptian hieroglyphs 3539:Duplicate characters 3355:Duplicate characters 2974:The Unicode Standard 2912:The Unicode Standard 2876:The Unicode Standard 2810:The Unicode Standard 2792:The Unicode Standard 2774:The Unicode Standard 2500:Representative Glyph 2443:Character properties 2376:Reserved code points 2237:'s use of U+F8FF as 2171:Soft hyphen (U+00AD) 2009:No baseline advance 1971:<control-0085> 1945:Whitespace character 1251:+ 4 is displayed as 1027:Radicals and Strokes 545:other character sets 4399:Khitan small script 3836:Canadian Aboriginal 3571:Variation sequences 3529:Combining character 3439:Variation sequences 3350:Combining character 2584:Decimal Digit Value 2194:Assigned characters 2186:Types of code point 2108:Hair Space (U+200A) 2105:Thin Space (U+2009) 2069:Space (punctuation) 2039:Word Joiner U+2060 1284:may well render as 1052:Duployan shorthands 413:character encodings 318:Featural-alphabetic 4639:Notational scripts 4590:Tagalog (Baybayin) 4299:Caucasian Albanian 3622:numeric references 3597:Domain names (IDN) 3417:Bidirectional text 3294:Right-to-left mark 3290:Left-to-right mark 3245:Character property 3195:Unicode Consortium 3134:Unicode Consortium 2254:Private Use Planes 2084:En Space (U+2002) 2030:Inhibit line-break 1351:Bidirectional text 1232: 1216: 1070:Control Characters 1058:Sutton SignWriting 1040:Unihan ideographs. 988:Alchemical Symbols 802:character encoding 776:hexadecimal digits 608:character property 583:Computer software 341:ISO/IEC JTC 1/SC 2 337:Unicode Consortium 333: 322:    302:Canadian syllabics 53: 31:special characters 4741: 4740: 4737: 4736: 4718:Category: Unicode 3755:Punctuation marks 3737:inherited scripts 3643:Related standards 3617:entity references 3515: 3514: 3398: 3397: 3314:Zero-width joiner 3139:decodeunicode.org 2925:Kaplan, Michael. 2667: 2666: 2643:Default Ignorable 2534:Not_Reordered (0) 2213:Private Use Areas 2087:Em Space (U+2003) 2043: 2042: 1983:zero-width joiner 1250: 1246: 556:an algorithm for 330: 329: 309: 308: 37:rendering support 4766: 4726: 4725: 4715: 4714: 4677:Control Pictures 4630:Zanabazar Square 4369:Imperial Aramaic 4252:historic scripts 3721: 3720: 3581:Emoji skin color 3407: 3406: 3324:Zero-width space 3268: 3267: 3255:Private Use Area 3240:Character charts 3174: 3167: 3160: 3151: 3150: 3121: 3120: 3118: 3117: 3106: 3100: 3099: 3087: 3081: 3080: 3078: 3077: 3062: 3056: 3055: 3053: 3052: 3041: 3035: 3020: 3014: 3013: 3005: 2999: 2998: 2996: 2995: 2984: 2978: 2977: 2971: 2963: 2957: 2956: 2954: 2953: 2937: 2931: 2930: 2922: 2916: 2915: 2909: 2901: 2895: 2894: 2886: 2880: 2879: 2873: 2865: 2859: 2858: 2856: 2855: 2844: 2838: 2837: 2835: 2834: 2820: 2814: 2813: 2802: 2796: 2795: 2784: 2778: 2777: 2766: 2760: 2759: 2757: 2756: 2742: 2736: 2735: 2733: 2732: 2721: 2715: 2714: 2712: 2711: 2700: 2521:Uppercase_Letter 2517:General Category 2508: 2457: 2456: 2355: 2250:Private Use Area 2240: 2219:Private Use Area 2143:Break inhibiting 2081:Em Quad (U+2001) 2078:En Quad (U+2000) 2046:Other separators 2015:Allow line-break 2006:Baseline advance 2001: 2000: 1972: 1969: 1967: 1962: 1959: 1956: 1954: 1312:directional type 1291: 1287: 1283: 1279: 1275: 1254: 1248: 1245:ZERO WIDTH SPACE 1244: 1242: 1046:Musical Notation 946:Braille Patterns 928:Control Pictures 922:Legacy Computing 916:Geometric Shapes 899: 895: 885: 881: 877: 822:general category 737: 731: 719:replacement text 668: 662: 652: 646: 537:Unicode Standard 446:software. As of 438: 389:software vendors 373:natural language 358: 325: 323: 299: 290: 281: 272: 263: 250: 241: 228: 219: 214: 203: 194: 181: 172: 163: 154: 145: 136: 127: 118: 109: 73: 72: 61: 60: 4774: 4773: 4769: 4768: 4767: 4765: 4764: 4763: 4744: 4743: 4742: 4733: 4703: 4687:List by subject 4660:Symbols, emojis 4655: 4634: 4550:Psalter Pahlavi 4251: 4245: 4106:Pracalit (Newa) 3921:Hanifi Rohingya 3769: 3745:Combining marks 3736: 3729: 3715: 3711:Han unification 3674: 3638: 3585: 3521: 3511: 3448: 3394: 3328: 3272:Special purpose 3259: 3209: 3183: 3178: 3130: 3125: 3124: 3115: 3113: 3108: 3107: 3103: 3088: 3084: 3075: 3073: 3064: 3063: 3059: 3050: 3048: 3043: 3042: 3038: 3021: 3017: 3006: 3002: 2993: 2991: 2986: 2985: 2981: 2969: 2965: 2964: 2960: 2951: 2949: 2938: 2934: 2923: 2919: 2907: 2903: 2902: 2898: 2887: 2883: 2871: 2867: 2866: 2862: 2853: 2851: 2846: 2845: 2841: 2832: 2830: 2828:www.unicode.org 2822: 2821: 2817: 2804: 2803: 2799: 2786: 2785: 2781: 2768: 2767: 2763: 2754: 2752: 2750:www.unicode.org 2744: 2743: 2739: 2730: 2728: 2723: 2722: 2718: 2709: 2707: 2702: 2701: 2697: 2692: 2675: 2530:Combining Class 2451: 2445: 2420: 2392: 2386: 2378: 2358:byte order mark 2353: 2350: 2320: 2312: 2308: 2300: 2279: 2238: 2215: 2209: 2203:to take place. 2196: 2188: 2136: 2071: 2065: 2048: 2032: 2017: 1995: 1979: 1970: 1965: 1964: 1960: 1957: 1952: 1951: 1947: 1941: 1892: 1879: 1407: 1405:Script-specific 1391: 1353: 1347: 1299: 1289: 1285: 1281: 1277: 1273: 1252: 1240: 1196: 1190: 1181: 1142:byte order mark 1138: 1136:Byte order mark 1103: 1097: 897: 893: 883: 879: 875: 827:Types include: 818: 790: 784: 757: 755:Plane (Unicode) 751: 735: 729: 707:In contrast, a 666: 660: 650: 644: 626: 616: 444:text processing 433: 395:, and transmit— 331: 326: 321: 316: 310: 305: 304: 297: 295: 288: 286: 279: 277: 270: 268: 261: 256: 255: 248: 246: 239: 234: 233: 226: 224: 217: 215: 212: 201: 199: 192: 187: 186: 179: 177: 170: 168: 161: 159: 152: 150: 143: 141: 134: 132: 125: 123: 116: 114: 107: 90: 85: 66: 46: 45: 44: 35:Without proper 24: 21: 12: 11: 5: 4772: 4762: 4761: 4756: 4739: 4738: 4735: 4734: 4732: 4731: 4720: 4708: 4705: 4704: 4702: 4701: 4696: 4691: 4690: 4689: 4679: 4674: 4669: 4663: 4661: 4657: 4656: 4654: 4653: 4648: 4642: 4640: 4636: 4635: 4633: 4632: 4627: 4622: 4617: 4612: 4607: 4602: 4597: 4592: 4587: 4582: 4577: 4572: 4567: 4562: 4557: 4552: 4547: 4542: 4537: 4532: 4527: 4522: 4517: 4512: 4507: 4502: 4497: 4492: 4487: 4482: 4477: 4472: 4467: 4462: 4457: 4452: 4447: 4442: 4437: 4432: 4427: 4422: 4417: 4411: 4406: 4401: 4396: 4391: 4386: 4381: 4376: 4371: 4366: 4361: 4356: 4351: 4346: 4341: 4336: 4331: 4326: 4321: 4316: 4311: 4306: 4301: 4296: 4291: 4286: 4281: 4276: 4271: 4266: 4261: 4255: 4253: 4247: 4246: 4244: 4243: 4238: 4233: 4228: 4223: 4218: 4213: 4208: 4203: 4198: 4193: 4188: 4183: 4178: 4173: 4168: 4163: 4158: 4153: 4148: 4143: 4141:Sorang Sompeng 4138: 4133: 4128: 4123: 4118: 4113: 4108: 4103: 4098: 4093: 4088: 4083: 4078: 4073: 4068: 4063: 4058: 4053: 4048: 4043: 4038: 4033: 4031:Miao (Pollard) 4028: 4023: 4018: 4013: 4008: 4003: 3998: 3993: 3988: 3983: 3978: 3973: 3968: 3963: 3958: 3953: 3948: 3943: 3938: 3933: 3928: 3923: 3918: 3913: 3908: 3903: 3898: 3893: 3888: 3883: 3878: 3873: 3868: 3863: 3858: 3853: 3848: 3843: 3838: 3833: 3828: 3823: 3818: 3813: 3808: 3803: 3798: 3793: 3788: 3783: 3777: 3775: 3774:Modern scripts 3771: 3770: 3768: 3767: 3762: 3757: 3752: 3747: 3741: 3739: 3731: 3730: 3717: 3716: 3714: 3713: 3708: 3703: 3698: 3693: 3688: 3682: 3680: 3679:Related topics 3676: 3675: 3673: 3672: 3667: 3662: 3657: 3652: 3646: 3644: 3640: 3639: 3637: 3636: 3631: 3626: 3625: 3624: 3619: 3609: 3604: 3599: 3593: 3591: 3587: 3586: 3584: 3583: 3578: 3573: 3568: 3563: 3562: 3561: 3551: 3546: 3541: 3536: 3531: 3525: 3523: 3517: 3516: 3513: 3512: 3510: 3509: 3504: 3499: 3494: 3489: 3484: 3479: 3474: 3469: 3464: 3458: 3456: 3450: 3449: 3447: 3446: 3441: 3436: 3431: 3430: 3429: 3419: 3413: 3411: 3404: 3400: 3399: 3396: 3395: 3393: 3392: 3387: 3382: 3377: 3372: 3367: 3362: 3357: 3352: 3347: 3342: 3336: 3334: 3330: 3329: 3327: 3326: 3321: 3316: 3311: 3306: 3301: 3296: 3287: 3282: 3276: 3274: 3265: 3261: 3260: 3258: 3257: 3252: 3247: 3242: 3237: 3232: 3231: 3230: 3219: 3217: 3211: 3210: 3208: 3207: 3202: 3197: 3191: 3189: 3185: 3184: 3177: 3176: 3169: 3162: 3154: 3148: 3147: 3142: 3136: 3129: 3128:External links 3126: 3123: 3122: 3101: 3082: 3057: 3036: 3015: 3000: 2979: 2958: 2932: 2917: 2896: 2881: 2860: 2839: 2815: 2797: 2779: 2761: 2737: 2716: 2694: 2693: 2691: 2688: 2687: 2686: 2681: 2674: 2671: 2665: 2664: 2661: 2658: 2652: 2651: 2648: 2645: 2639: 2638: 2623: 2620: 2614: 2613: 2610: 2604: 2603: 2600: 2594: 2593: 2589: 2586: 2580: 2579: 2576: 2573: 2567: 2566: 2562: 2559: 2553: 2552: 2549: 2546: 2540: 2539: 2535: 2532: 2526: 2525: 2522: 2519: 2513: 2512: 2509: 2502: 2496: 2495: 2491: 2488: 2482: 2481: 2477: 2474: 2468: 2467: 2464: 2461: 2447:Main article: 2444: 2441: 2419: 2416: 2385: 2382: 2377: 2374: 2369:Corrigendum #9 2349: 2346: 2323: 2322: 2318: 2310: 2306: 2298: 2278: 2275: 2243:Klingon script 2231:the Apple logo 2211:Main article: 2208: 2205: 2195: 2192: 2187: 2184: 2179: 2178: 2175: 2172: 2168: 2167: 2166:Break enabling 2159: 2158: 2155: 2152: 2149: 2145: 2144: 2135: 2132: 2131: 2130: 2127: 2124: 2113: 2112: 2109: 2106: 2103: 2100: 2097: 2094: 2091: 2088: 2085: 2082: 2079: 2064: 2061: 2056: 2055: 2052: 2047: 2044: 2041: 2040: 2037: 2034: 2026: 2025: 2022: 2019: 2011: 2010: 2007: 2004: 1994: 1991: 1978: 1975: 1961:NO-BREAK SPACE 1943:Main article: 1940: 1937: 1891: 1888: 1887: 1886: 1883: 1878: 1875: 1874: 1873: 1872: 1871: 1868: 1865: 1862: 1859: 1856: 1850: 1849: 1848: 1845: 1842: 1839: 1833: 1832: 1831: 1828: 1825: 1822: 1819: 1816: 1813: 1810: 1804: 1803: 1802: 1799: 1793: 1792: 1791: 1785: 1784: 1783: 1777: 1774: 1773: 1772: 1769: 1763: 1762: 1761: 1758: 1755: 1752: 1746: 1745: 1744: 1741: 1738: 1735: 1732: 1726: 1725: 1724: 1721: 1718: 1715: 1712: 1709: 1706: 1703: 1700: 1697: 1694: 1691: 1688: 1685: 1682: 1679: 1676: 1673: 1670: 1667: 1664: 1661: 1658: 1655: 1652: 1649: 1646: 1643: 1640: 1637: 1634: 1631: 1628: 1625: 1622: 1619: 1616: 1613: 1610: 1607: 1604: 1601: 1598: 1595: 1592: 1589: 1586: 1583: 1580: 1577: 1574: 1571: 1568: 1565: 1562: 1559: 1556: 1553: 1550: 1547: 1544: 1541: 1538: 1535: 1532: 1529: 1526: 1523: 1513: 1512: 1511: 1505: 1504: 1503: 1500: 1497: 1494: 1491: 1488: 1485: 1482: 1479: 1476: 1473: 1470: 1467: 1464: 1461: 1458: 1452: 1451: 1450: 1447: 1444: 1441: 1438: 1432: 1429: 1426: 1423: 1420: 1417: 1414: 1406: 1403: 1402: 1401: 1398: 1395: 1390: 1387: 1386: 1385: 1382: 1379: 1376: 1373: 1370: 1367: 1364: 1361: 1346: 1343: 1338: 1337: 1334: 1331: 1298: 1295: 1249:FRACTION SLASH 1224:Apple Chancery 1212:Apple Chancery 1204:fraction slash 1192:Main article: 1189: 1188:Fraction slash 1186: 1180: 1177: 1137: 1134: 1130:fraction slash 1096: 1093: 1092: 1091: 1085: 1079: 1073: 1067: 1061: 1055: 1049: 1043: 1042: 1041: 1037:Han characters 1030: 1018: 1017: 1016: 1010: 1002: 997: 991: 985: 979: 973: 967: 961: 955: 949: 943: 940:Block Elements 937: 931: 925: 919: 913: 907: 901: 887: 869: 857: 851: 845: 835: 817: 814: 786:Main article: 783: 780: 753:Main article: 750: 747: 739: 738: 670: 669: 654: 653: 615: 612: 581: 580: 573: 554: 547: 530:directionality 508:refers to the 494:character name 487: 486: 471: 328: 327: 315: 312: 311: 307: 306: 296: 287: 278: 269: 260: 259: 257: 247: 238: 237: 235: 225: 216: 200: 191: 190: 188: 178: 169: 160: 151: 142: 133: 124: 115: 106: 105: 102: 101: 96: 91: 81: 79: 71: 68: 67: 39:, you may see 27: 26: 25: 22: 9: 6: 4: 3: 2: 4771: 4760: 4757: 4755: 4754:IEC standards 4752: 4751: 4749: 4730: 4721: 4719: 4710: 4709: 4706: 4700: 4697: 4695: 4692: 4688: 4685: 4684: 4683: 4680: 4678: 4675: 4673: 4670: 4668: 4665: 4664: 4662: 4658: 4652: 4649: 4647: 4644: 4643: 4641: 4637: 4631: 4628: 4626: 4623: 4621: 4618: 4616: 4613: 4611: 4610:Tulu Tigalari 4608: 4606: 4603: 4601: 4598: 4596: 4593: 4591: 4588: 4586: 4585:Sylheti Nagri 4583: 4581: 4578: 4576: 4575:South Arabian 4573: 4571: 4568: 4566: 4563: 4561: 4558: 4556: 4553: 4551: 4548: 4546: 4543: 4541: 4538: 4536: 4533: 4531: 4528: 4526: 4523: 4521: 4518: 4516: 4513: 4511: 4508: 4506: 4503: 4501: 4500:Old Hungarian 4498: 4496: 4493: 4491: 4488: 4486: 4483: 4481: 4478: 4476: 4473: 4471: 4468: 4466: 4463: 4461: 4458: 4456: 4453: 4451: 4448: 4446: 4443: 4441: 4438: 4436: 4433: 4431: 4428: 4426: 4423: 4421: 4418: 4415: 4412: 4410: 4407: 4405: 4402: 4400: 4397: 4395: 4392: 4390: 4387: 4385: 4382: 4380: 4377: 4375: 4372: 4370: 4367: 4365: 4362: 4360: 4357: 4355: 4352: 4350: 4347: 4345: 4342: 4340: 4337: 4335: 4332: 4330: 4327: 4325: 4322: 4320: 4317: 4315: 4312: 4310: 4307: 4305: 4302: 4300: 4297: 4295: 4292: 4290: 4287: 4285: 4282: 4280: 4277: 4275: 4272: 4270: 4267: 4265: 4262: 4260: 4257: 4256: 4254: 4248: 4242: 4239: 4237: 4234: 4232: 4229: 4227: 4224: 4222: 4219: 4217: 4214: 4212: 4209: 4207: 4204: 4202: 4199: 4197: 4194: 4192: 4189: 4187: 4184: 4182: 4179: 4177: 4174: 4172: 4169: 4167: 4164: 4162: 4159: 4157: 4154: 4152: 4149: 4147: 4144: 4142: 4139: 4137: 4134: 4132: 4129: 4127: 4124: 4122: 4119: 4117: 4114: 4112: 4109: 4107: 4104: 4102: 4099: 4097: 4094: 4092: 4089: 4087: 4084: 4082: 4079: 4077: 4074: 4072: 4069: 4067: 4064: 4062: 4059: 4057: 4054: 4052: 4049: 4047: 4044: 4042: 4039: 4037: 4034: 4032: 4029: 4027: 4024: 4022: 4021:Mende Kikakui 4019: 4017: 4016:Masaram Gondi 4014: 4012: 4009: 4007: 4004: 4002: 4001:Lisu (Fraser) 3999: 3997: 3994: 3992: 3989: 3987: 3984: 3982: 3979: 3977: 3974: 3972: 3969: 3967: 3964: 3962: 3959: 3957: 3954: 3952: 3949: 3947: 3944: 3942: 3939: 3937: 3934: 3932: 3929: 3927: 3924: 3922: 3919: 3917: 3914: 3912: 3909: 3907: 3904: 3902: 3901:Gunjala Gondi 3899: 3897: 3894: 3892: 3889: 3887: 3884: 3882: 3879: 3877: 3874: 3872: 3869: 3867: 3864: 3862: 3859: 3857: 3854: 3852: 3849: 3847: 3844: 3842: 3839: 3837: 3834: 3832: 3829: 3827: 3824: 3822: 3819: 3817: 3814: 3812: 3809: 3807: 3804: 3802: 3799: 3797: 3794: 3792: 3789: 3787: 3784: 3782: 3779: 3778: 3776: 3772: 3766: 3763: 3761: 3758: 3756: 3753: 3751: 3748: 3746: 3743: 3742: 3740: 3738: 3732: 3727: 3722: 3718: 3712: 3709: 3707: 3704: 3702: 3699: 3697: 3694: 3692: 3689: 3687: 3684: 3683: 3681: 3677: 3671: 3668: 3666: 3663: 3661: 3658: 3656: 3653: 3651: 3648: 3647: 3645: 3641: 3635: 3632: 3630: 3627: 3623: 3620: 3618: 3615: 3614: 3613: 3610: 3608: 3605: 3603: 3600: 3598: 3595: 3594: 3592: 3588: 3582: 3579: 3577: 3574: 3572: 3569: 3567: 3564: 3560: 3557: 3556: 3555: 3552: 3550: 3547: 3545: 3542: 3540: 3537: 3535: 3532: 3530: 3527: 3526: 3524: 3518: 3508: 3505: 3503: 3500: 3498: 3495: 3493: 3490: 3488: 3485: 3483: 3480: 3478: 3475: 3473: 3470: 3468: 3465: 3463: 3460: 3459: 3457: 3455: 3451: 3445: 3442: 3440: 3437: 3435: 3432: 3428: 3427:ISO/IEC 14651 3425: 3424: 3423: 3420: 3418: 3415: 3414: 3412: 3408: 3405: 3401: 3391: 3388: 3386: 3383: 3381: 3378: 3376: 3373: 3371: 3368: 3366: 3363: 3361: 3358: 3356: 3353: 3351: 3348: 3346: 3343: 3341: 3338: 3337: 3335: 3331: 3325: 3322: 3320: 3317: 3315: 3312: 3310: 3307: 3305: 3302: 3300: 3297: 3295: 3291: 3288: 3286: 3283: 3281: 3278: 3277: 3275: 3273: 3269: 3266: 3262: 3256: 3253: 3251: 3248: 3246: 3243: 3241: 3238: 3236: 3233: 3229: 3226: 3225: 3224: 3221: 3220: 3218: 3216: 3212: 3206: 3203: 3201: 3198: 3196: 3193: 3192: 3190: 3186: 3182: 3175: 3170: 3168: 3163: 3161: 3156: 3155: 3152: 3146: 3143: 3140: 3137: 3135: 3132: 3131: 3111: 3105: 3097: 3093: 3086: 3071: 3067: 3061: 3046: 3040: 3033: 3029: 3025: 3019: 3011: 3004: 2989: 2983: 2975: 2968: 2962: 2947: 2943: 2936: 2928: 2921: 2913: 2906: 2900: 2892: 2885: 2877: 2870: 2864: 2849: 2843: 2829: 2825: 2819: 2811: 2807: 2801: 2793: 2789: 2783: 2775: 2771: 2765: 2751: 2747: 2741: 2726: 2720: 2705: 2699: 2695: 2685: 2682: 2680: 2677: 2676: 2670: 2662: 2659: 2657: 2654: 2653: 2649: 2646: 2644: 2641: 2640: 2636: 2632: 2628: 2627:CJK ideograph 2624: 2621: 2619: 2616: 2615: 2611: 2609: 2608:Numeric Value 2606: 2605: 2601: 2599: 2596: 2595: 2587: 2585: 2582: 2581: 2577: 2574: 2572: 2569: 2568: 2563: 2560: 2558: 2555: 2554: 2550: 2548:Left_To_Right 2547: 2545: 2542: 2541: 2536: 2533: 2531: 2528: 2527: 2523: 2520: 2518: 2515: 2514: 2510: 2507: 2503: 2501: 2498: 2497: 2492: 2489: 2487: 2484: 2483: 2478: 2475: 2473: 2470: 2469: 2465: 2462: 2459: 2458: 2455: 2450: 2440: 2436: 2433: 2429: 2424: 2415: 2413: 2409: 2405: 2401: 2396: 2391: 2381: 2373: 2370: 2365: 2363: 2359: 2348:Noncharacters 2345: 2343: 2338: 2334: 2332: 2328: 2316: 2304: 2296: 2295: 2294: 2291: 2289: 2284: 2274: 2272: 2271: 2265: 2263: 2259: 2255: 2251: 2246: 2244: 2236: 2233:, versus the 2232: 2228: 2224: 2220: 2214: 2204: 2202: 2191: 2183: 2176: 2173: 2170: 2169: 2165: 2164: 2163: 2156: 2153: 2150: 2147: 2146: 2142: 2141: 2140: 2128: 2125: 2122: 2121: 2120: 2117: 2110: 2107: 2104: 2101: 2098: 2095: 2092: 2089: 2086: 2083: 2080: 2077: 2076: 2075: 2070: 2060: 2053: 2050: 2049: 2038: 2035: 2031: 2028: 2027: 2023: 2020: 2016: 2013: 2012: 2008: 2005: 2003: 2002: 1999: 1990: 1988: 1985:(U+200D) and 1984: 1974: 1946: 1936: 1933: 1929: 1927: 1923: 1919: 1916:The grapheme 1914: 1911: 1909: 1905: 1901: 1897: 1884: 1881: 1880: 1869: 1866: 1863: 1860: 1857: 1854: 1853: 1851: 1846: 1843: 1840: 1837: 1836: 1834: 1829: 1826: 1823: 1820: 1817: 1814: 1811: 1808: 1807: 1805: 1800: 1797: 1796: 1794: 1789: 1788: 1786: 1781: 1780: 1778: 1775: 1770: 1767: 1766: 1764: 1759: 1756: 1753: 1750: 1749: 1747: 1742: 1739: 1736: 1733: 1730: 1729: 1727: 1722: 1719: 1716: 1713: 1710: 1707: 1704: 1701: 1698: 1695: 1692: 1689: 1686: 1683: 1680: 1677: 1674: 1671: 1668: 1665: 1662: 1659: 1656: 1653: 1650: 1647: 1644: 1641: 1638: 1635: 1632: 1629: 1626: 1623: 1620: 1617: 1614: 1611: 1608: 1605: 1602: 1599: 1596: 1593: 1590: 1587: 1584: 1581: 1578: 1575: 1572: 1569: 1566: 1563: 1560: 1557: 1554: 1551: 1548: 1545: 1542: 1539: 1536: 1533: 1530: 1527: 1524: 1521: 1520: 1518: 1514: 1509: 1508: 1506: 1501: 1498: 1495: 1492: 1489: 1486: 1483: 1480: 1477: 1474: 1471: 1468: 1465: 1462: 1459: 1456: 1455: 1453: 1448: 1445: 1442: 1439: 1436: 1433: 1430: 1427: 1424: 1421: 1418: 1415: 1412: 1411: 1409: 1408: 1399: 1396: 1393: 1392: 1383: 1380: 1377: 1374: 1371: 1368: 1365: 1362: 1359: 1358: 1357: 1352: 1342: 1335: 1332: 1329: 1328: 1327: 1323: 1321: 1320:bidi-mirrored 1315: 1313: 1309: 1305: 1294: 1269: 1267: 1261: 1256: 1235: 1229: 1225: 1220: 1213: 1209: 1205: 1200: 1195: 1185: 1176: 1172: 1169: 1166: 1162: 1158: 1157:little-endian 1154: 1150: 1145: 1143: 1133: 1131: 1127: 1123: 1118: 1116: 1112: 1107: 1102: 1089: 1088:Noncharacters 1086: 1083: 1080: 1077: 1074: 1071: 1068: 1065: 1062: 1059: 1056: 1053: 1050: 1047: 1044: 1038: 1034: 1031: 1028: 1025: 1024: 1022: 1019: 1014: 1011: 1008: 1007: 1006:Tai Xuan Jing 1003: 1001: 1000:Chess Symbols 998: 995: 992: 989: 986: 983: 980: 977: 974: 971: 968: 965: 962: 959: 956: 953: 950: 947: 944: 941: 938: 935: 932: 929: 926: 923: 920: 917: 914: 911: 908: 905: 902: 891: 888: 873: 870: 867: 864: 863: 861: 858: 855: 852: 849: 846: 843: 839: 836: 833: 830: 829: 828: 825: 823: 813: 811: 807: 806:Western world 803: 799: 794: 789: 788:Unicode block 779: 777: 772: 770: 766: 761: 756: 746: 744: 734: 728: 727: 726: 724: 720: 716: 715: 710: 705: 703: 699: 695: 691: 687: 683: 679: 675: 665: 659: 658: 657: 649: 643: 642: 641: 639: 635: 631: 625: 624:Unicode input 621: 611: 609: 605: 600: 598: 594: 590: 589:input methods 586: 578: 574: 571: 570:right-to-left 567: 566:left-to-right 563: 559: 555: 552: 548: 546: 542: 541: 540: 538: 533: 531: 527: 523: 519: 515: 511: 507: 503: 499: 495: 491: 484: 480: 476: 472: 469: 465: 464: 463: 461: 460:noncharacters 457: 453: 449: 445: 441: 436: 431: 427: 422: 420: 419: 414: 410: 406: 402: 398: 394: 390: 386: 382: 378: 374: 370: 366: 362: 354: 350: 346: 342: 338: 319: 313: 303: 294: 285: 276: 267: 258: 254: 245: 236: 232: 223: 211: 207: 198: 189: 185: 176: 167: 158: 149: 140: 131: 122: 113: 104: 103: 100: 97: 95: 92: 89: 84: 80: 78: 75: 74: 69: 62: 57: 50: 42: 38: 34: 32: 19: 4465:Meetei Mayek 4416:(Chorasmian) 4319:Cypro-Minoan 4096:Pahawh Hmong 3911:Gurung Khema 3660:ISO/IEC 8859 3502:UTF-32/UCS-4 3497:UTF-16/UCS-2 3304:Variant form 3234: 3114:. Retrieved 3104: 3095: 3085: 3074:. Retrieved 3069: 3060: 3049:. Retrieved 3039: 3027: 3023: 3018: 3003: 2992:. Retrieved 2982: 2973: 2961: 2950:. Retrieved 2945: 2935: 2920: 2911: 2899: 2884: 2875: 2863: 2852:. Retrieved 2842: 2831:. Retrieved 2827: 2818: 2809: 2800: 2791: 2782: 2773: 2764: 2753:. Retrieved 2749: 2740: 2729:. Retrieved 2719: 2708:. Retrieved 2698: 2668: 2655: 2642: 2617: 2607: 2597: 2583: 2570: 2556: 2543: 2529: 2516: 2499: 2485: 2471: 2452: 2437: 2425: 2423:characters. 2421: 2411: 2397: 2393: 2379: 2366: 2351: 2339: 2335: 2330: 2326: 2324: 2314: 2302: 2292: 2286:planes). In 2280: 2268: 2266: 2261: 2257: 2249: 2247: 2218: 2216: 2197: 2189: 2180: 2160: 2137: 2118: 2114: 2072: 2057: 2029: 2021:Space U+0020 2018:(Separators) 2014: 1996: 1980: 1948: 1934: 1930: 1915: 1912: 1893: 1795:Ideographic 1354: 1339: 1324: 1319: 1316: 1311: 1300: 1270: 1262: 1258: 1237: 1233: 1182: 1173: 1170: 1146: 1139: 1119: 1108: 1104: 1087: 1081: 1075: 1069: 1063: 1057: 1051: 1045: 1032: 1026: 1020: 1012: 1004: 999: 993: 987: 981: 975: 969: 963: 957: 951: 945: 939: 933: 927: 921: 915: 910:Mathematical 909: 903: 890:Number Forms 889: 871: 865: 859: 853: 847: 837: 831: 826: 821: 819: 795: 791: 773: 764: 762: 760:future use. 758: 742: 740: 732: 718: 712: 708: 706: 701: 697: 693: 689: 681: 673: 671: 663: 655: 647: 627: 601: 582: 577:case-folding 568:("LTR") and 534: 524:, category, 513: 505: 501: 497: 488: 429: 423: 416: 408: 393:interoperate 348: 334: 213:   208: / 166:Neo-Tifinagh 28: 4651:SignWriting 4520:Old Sogdian 4490:Nandinagari 4414:Khwarezmian 4324:Dives Akuru 4250:Ancient and 4236:Warang Citi 4101:Pau Cin Hau 4056:New Tai Lue 4051:Nag Mundari 4026:Medefaidrin 3735:Common and 3544:Equivalence 3522:code points 3520:On pairs of 3434:Equivalence 3309:Word joiner 3299:Soft hyphen 3215:Code points 3032:code charts 2618:Ideographic 2598:Digit Value 2412:precomposed 2199:meaningful 1202:Example of 1082:Private Use 994:Game Pieces 934:Box Drawing 854:Punctuation 686:hexadecimal 452:private use 440:code points 401:UCS-encoded 397:interchange 377:mathematics 275:South Indic 266:North Indic 4748:Categories 4545:Phoenician 4530:Old Uyghur 4525:Old Turkic 4510:Old Permic 4505:Old Italic 4455:Manichaean 4349:Glagolitic 4126:Saurashtra 3871:Devanagari 3750:Diacritics 3507:UTF-EBCDIC 3410:Algorithms 3403:Processing 3340:Characters 3264:Characters 3116:2015-06-09 3076:2016-08-09 3051:2016-08-09 2994:2016-08-09 2952:2016-08-09 2854:2016-08-09 2833:2020-12-16 2755:2023-10-24 2731:2024-09-12 2710:2016-08-09 2690:References 2656:Deprecated 2635:Han script 2486:Code Point 2362:endianness 2277:Surrogates 2225:'s use of 1288:(with the 1266:typography 1206:use. This 1165:big-endian 1099:See also: 1076:Surrogates 1033:Ideographs 872:Letterlike 816:Categories 798:ISO 8859-1 688:form. The 680:form, and 618:See also: 558:laying out 551:collations 549:different 518:properties 506:code point 502:code point 483:formatting 456:surrogates 426:code point 369:characters 83:ogographic 77:Alphabetic 4540:ʼPhags-pa 4535:Palmyrene 4485:Nabataean 4409:Khudawadi 4394:Kharosthi 4309:Cuneiform 4284:Bhaiksuki 4279:Bassa Vah 4146:Sundanese 4121:Samaritan 4036:Mongolian 4011:Malayalam 3976:Kirat Rai 3686:Anomalies 3670:ISO 15924 3665:DIN 91379 3566:Z-variant 3549:Homoglyph 3422:Collation 2948:. PEP 383 2631:logograph 2404:diaeresis 2400:diacritic 2033:(Joiners) 1779:Tifinagh 976:Emoticons 958:Technical 593:keyboards 585:end users 579:algorithm 514:character 498:character 432:17 × 2 = 409:universal 157:Mongolian 4672:Currency 4646:Duployan 4620:Vithkuqi 4615:Ugaritic 4470:Meroitic 4440:Mahajani 4425:Linear B 4420:Linear A 4211:Tifinagh 4176:Tai Viet 4171:Tai Tham 4161:Tagbanwa 4076:Ol Chiki 3966:Kayah Li 3961:Katakana 3946:Javanese 3941:Hiragana 3931:Hanunuoo 3906:Gurmukhi 3896:Gujarati 3886:Georgian 3861:Cyrillic 3851:Cherokee 3816:Bopomofo 3796:Balinese 3791:Armenian 3655:GB 18030 3472:Punycode 3360:Numerals 3292: / 3205:Versions 2673:See also 2466:Details 2460:Property 1904:typeface 1896:grapheme 1437:(U+070F) 1208:typeface 1122:OpenType 964:Dingbats 866:Currency 418:mojibake 339:and the 324:Limited. 284:Ethiopic 231:Cherokee 148:Georgian 139:Armenian 121:Cyrillic 4759:Unicode 4580:Soyombo 4570:Sogdian 4565:Siddham 4560:Sharada 4480:Multani 4460:Marchen 4450:Mandaic 4445:Makasar 4359:Grantha 4344:Elymaic 4339:Elbasan 4314:Cypriot 4274:Avestan 4216:Tirhuta 4206:Tibetan 4151:Sunuwar 4136:Sinhala 4131:Shavian 4111:Ranjana 4091:Osmanya 4081:Ol Onal 4006:Lontara 3956:Kannada 3866:Deseret 3831:Burmese 3821:Braille 3811:Bengali 3765:Numbers 3726:Scripts 3375:Symbols 3365:Scripts 3188:Unicode 3181:Unicode 2633:in the 2463:Example 2428:cursive 2309:) × 400 2241:in the 1507:Brahmi 1282:221⁄225 1228:Unicode 860:Symbols 804:in the 678:decimal 661:&#x 572:("RTL") 510:integer 479:control 448:Unicode 405:strings 99:Abugida 88:yllabic 4727:  4716:  4625:Yezidi 4605:Todhri 4600:Tangut 4435:Lydian 4430:Lycian 4404:Khojki 4384:Kaithi 4364:Hatran 4354:Gothic 4304:Coptic 4294:Carian 4289:Brāhmī 4231:Wancho 4196:Thaana 4191:Telugu 4186:Tangsa 4166:Tai Le 4156:Syriac 4116:Rejang 3991:Lepcha 3936:Hebrew 3916:Hangul 3841:Chakma 3786:Arabic 3760:Spaces 3467:CESU-8 3462:BOCU-1 3370:Spaces 2490:U+0041 2432:Arabic 2325:where 2317:- DC00 2305:- D800 2288:UTF-16 2227:U+F8FF 2063:Spaces 1968: 1966:U+0085 1958:  1955: 1953:U+00A0 1902:. The 1877:Others 1787:Ogham 1517:Virama 1308:Hebrew 1304:Arabic 1293:font. 1247:+ 3 + 1161:UTF-32 1153:UTF-16 904:Arrows 782:Blocks 769:octets 749:Planes 741:where 714:entity 672:where 645:&# 528:, and 526:script 437:110000 300:  298:  293:Thaana 291:  289:  282:  280:  273:  271:  264:  262:  253:Hebrew 251:  249:  244:Arabic 242:  240:  229:  227:  220:  218:  204:  202:  195:  193:  184:Hangul 182:  180:  173:  171:  164:  162:  155:  153:  146:  144:  137:  135:  128:  126:  119:  117:  110:  108:  4699:Emoji 4595:Takri 4555:Runic 4495:Ogham 4329:Dogra 4181:Tamil 4086:Osage 4061:Nüshu 3996:Limbu 3986:Latin 3971:Khmer 3951:Kanji 3926:Hanja 3891:Greek 3881:Geʽez 3876:Garay 3826:Buhid 3806:Batak 3801:Bamum 3781:Adlam 3629:Input 3607:Fonts 3602:Email 3590:Usage 3492:UTF-8 3487:UTF-7 3482:UTF-1 3333:Lists 3250:Plane 3223:Block 2970:(PDF) 2908:(PDF) 2872:(PDF) 2660:False 2647:False 2622:False 2297:10000 2270:gaiji 2223:Apple 1900:glyph 1286:22½25 1149:UTF-8 810:ASCII 730:& 604:plane 522:block 468:glyph 403:text 381:music 357:abbr. 222:Hanja 210:Kanji 197:Hanzi 175:Osage 130:Greek 112:Latin 94:Abjad 4475:Modi 4389:Kawi 4259:Ahom 4221:Toto 4201:Thai 4071:Odia 4046:N'Ko 3846:Cham 3612:HTML 3559:list 3477:SCSU 3228:List 3022:Not 2629:: a 2612:NaN 2602:NaN 2472:Name 2329:and 2229:for 1981:The 1924:, a 1908:font 1306:and 743:name 733:name 702:hhhh 698:hhhh 694:nnnn 682:hhhh 674:nnnn 664:hhhh 648:nnnn 630:HTML 622:and 562:BiDi 500:and 481:and 477:for 473:237 335:The 206:Kana 86:and 4226:Vai 4041:Mru 3981:Lao 3280:BOM 3024:the 2588:NaN 2575:N/A 2406:: " 2313:+ ( 2301:+ ( 1963:or 1274:1⁄2 1155:in 1124:or 1021:CJK 898:1⁄3 884:c/o 696:or 656:or 634:XML 632:or 628:An 595:or 490:ISO 391:to 365:IEC 361:ISO 4750:: 4241:Yi 3094:. 3068:. 2972:. 2944:. 2910:. 2874:. 2826:. 2808:. 2790:. 2772:. 2748:. 2637:. 2561:no 2364:. 2319:16 2311:16 2307:16 2299:16 2245:. 1928:. 1253:1¾ 610:. 599:. 575:a 532:. 435:0x 430:or 379:, 375:, 345:WG 3173:e 3166:t 3159:v 3119:. 3079:. 3054:. 3034:. 3028:a 2997:. 2955:. 2929:. 2893:. 2857:. 2836:. 2758:. 2734:. 2713:. 2408:ö 2331:L 2327:H 2321:) 2315:L 2303:H 1290:½ 1278:½ 1255:. 1241:¾ 1210:( 1072:. 1060:. 1054:. 1048:. 1029:. 1015:. 1009:. 990:. 984:. 978:. 972:. 966:. 960:. 954:. 948:. 942:. 936:. 924:. 918:. 912:. 906:. 894:⅓ 886:. 880:℅ 876:℅ 868:. 844:. 736:; 690:x 667:; 651:; 485:. 399:— 363:/ 355:( 343:/ 320:. 43:. 33:. 20:.

Index

List of Unicode characters
special characters
rendering support
question marks, boxes, or other symbols


Alphabetic
ogographic
yllabic
Abjad
Abugida
Latin
Cyrillic
Greek
Armenian
Georgian
Mongolian
Neo-Tifinagh
Osage
Hangul
Hanzi
Kana
Kanji
Hanja
Cherokee
Arabic
Hebrew
North Indic
South Indic
Ethiopic

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.