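| Char | Code Point | HTML Dec | HTML Hex | UTF-8 Bytes | Category |
|---|---|---|---|---|---|
| H | U+0048 | `&#72;` | `&#x48;` | 0x48 | ASCII |
| e | U+0065 | `&#101;` | `&#x65;` | 0x65 | ASCII |
| l | U+006C | `&#108;` | `&#x6C;` | 0x6C | ASCII |
| l | U+006C | `&#108;` | `&#x6C;` | 0x6C | ASCII |
| o | U+006F | `&#111;` | `&#x6F;` | 0x6F | ASCII |
| · | U+0020 | `&#32;` | `&#x20;` | 0x20 | ASCII |
| 🌍 | U+1F30D | `&#127757;` | `&#x1F30D;` | 0xF0 0x9F 0x8C 0x8D | Emoji |
| ! | U+0021 | `&#33;` | `&#x21;` | 0x21 | ASCII |
| · | U+0020 | `&#32;` | `&#x20;` | 0x20 | ASCII |
| こ | U+3053 | `&#12371;` | `&#x3053;` | 0xE3 0x81 0x93 | Hiragana / Katakana |
| ん | U+3093 | `&#12435;` | `&#x3093;` | 0xE3 0x82 0x93 | Hiragana / Katakana |
| に | U+306B | `&#12395;` | `&#x306B;` | 0xE3 0x81 0xAB | Hiragana / Katakana |
| ち | U+3061 | `&#12385;` | `&#x3061;` | 0xE3 0x81 0xA1 | Hiragana / Katakana |
| は | U+306F | `&#12399;` | `&#x306F;` | 0xE3 0x81 0xAF | Hiragana / Katakana |
| · | U+0020 | `&#32;` | `&#x20;` | 0x20 | ASCII |
| – | U+2013 | `&#8211;` | `&#x2013;` | 0xE2 0x80 0x93 | Other Unicode |
| · | U+0020 | `&#32;` | `&#x20;` | 0x20 | ASCII |
| c | U+0063 | `&#99;` | `&#x63;` | 0x63 | ASCII |
| a | U+0061 | `&#97;` | `&#x61;` | 0x61 | ASCII |
| f | U+0066 | `&#102;` | `&#x66;` | 0x66 | ASCII |
| é | U+00E9 | `&#233;` | `&#xE9;` | 0xC3 0xA9 | Latin-1 Supplement |
| · | U+0020 | `&#32;` | `&#x20;` | 0x20 | ASCII |
| r | U+0072 | `&#114;` | `&#x72;` | 0x72 | ASCII |
| é | U+00E9 | `&#233;` | `&#xE9;` | 0xC3 0xA9 | Latin-1 Supplement |
| s | U+0073 | `&#115;` | `&#x73;` | 0x73 | ASCII |
| u | U+0075 | `&#117;` | `&#x75;` | 0x75 | ASCII |
| m | U+006D | `&#109;` | `&#x6D;` | 0x6D | ASCII |
| é | U+00E9 | `&#233;` | `&#xE9;` | 0xC3 0xA9 | Latin-1 Supplement |
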
Frequently Asked Questions
What information does the Unicode Inspector show?
For each character it shows: the Unicode code point (U+XXXX), HTML entity (decimal and hex), UTF-8 bytes, and Unicode category.
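As a rough illustration of how such a row could be computed, here is a minimal JavaScript sketch. The function name `inspect` and the row field names are hypothetical and are not taken from the tool itself; the category column is left out because the tool's classification rules are not described here.

```javascript
// Sketch only: `inspect` and its field names are assumptions for illustration.
function inspect(text) {
  const encoder = new TextEncoder(); // TextEncoder always produces UTF-8
  const rows = [];
  // Iterating a string with for...of yields whole code points,
  // not individual UTF-16 code units.
  for (const char of text) {
    const cp = char.codePointAt(0);
    const hex = cp.toString(16).toUpperCase();
    rows.push({
      char,
      codePoint: "U+" + hex.padStart(4, "0"),
      htmlDec: "&#" + cp + ";",
      htmlHex: "&#x" + hex + ";",
      // Format each UTF-8 byte as 0xNN, separated by spaces.
      utf8: Array.from(
        encoder.encode(char),
        (b) => "0x" + b.toString(16).toUpperCase().padStart(2, "0")
      ).join(" "),
    });
  }
  return rows;
}

console.log(inspect("é🌍"));
```

For the input above, the first row would carry `U+00E9`, `&#233;`, `&#xE9;` and `0xC3 0xA9`, matching the corresponding row format in the table.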
Does it support emoji?
Yes. Emoji are Unicode code points and are analysed like any other character. Multi-codepoint sequences (e.g. family emoji) are split into their component code points.
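To see what "split into their component code points" means in practice, the following JavaScript sketch lists the code points of a family emoji built from three people joined by zero-width joiners (U+200D). The helper name `codePoints` is hypothetical.

```javascript
// Family emoji: man + ZWJ + woman + ZWJ + girl, written with escapes
// so the sequence stays visible in the source.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

// Hypothetical helper: Array.from iterates a string by code point.
function codePoints(text) {
  return Array.from(
    text,
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Logs five entries: U+1F468, U+200D, U+1F469, U+200D, U+1F467
console.log(codePoints(family));
```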
Can I click to copy a code point?
Yes. Click any code point, HTML decimal, or HTML hex value to copy it to your clipboard.
Is there a character limit?
No hard limit, though very long texts may make the table difficult to navigate.
What is Unicode?
Unicode is the universal character encoding standard: it assigns a unique number, called a code point, to every character it covers, across writing systems, symbol sets, and emoji. Code points are written as `U+XXXX` in hexadecimal; for example, the letter A is `U+0041`, and the emoji 🚀 is `U+1F680`.
Unicode 16.0 defines 154,998 characters across 168 scripts. The codespace is organized into 17 planes of 65,536 code points each. The Basic Multilingual Plane (BMP, U+0000–U+FFFF) covers most living scripts. The supplementary planes (U+10000–U+10FFFF) include historic scripts, musical notation, mathematical alphanumeric symbols, and most emoji.
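Because each plane holds exactly 65,536 (0x10000) code points, the plane number is simply the code point shifted right by 16 bits. A small JavaScript sketch, with `planeOf` as a hypothetical helper name:

```javascript
// Hypothetical helper: returns the plane number (0 to 16) of a code point.
function planeOf(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError("not a Unicode code point: " + codePoint);
  }
  return codePoint >> 16; // divide by 0x10000, discarding the remainder
}

console.log(planeOf(0x0041));   // 0  (BMP)
console.log(planeOf(0x1F680));  // 1  (first supplementary plane)
console.log(planeOf(0x10FFFF)); // 16 (last plane)
```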
How UTF-8 Encodes Code Points
UTF-8 is the most widely used Unicode encoding. It is variable-width: ASCII characters use 1 byte; most European and Middle Eastern scripts use 2 bytes; the rest of the BMP, including Chinese, Japanese, and Korean, uses 3 bytes; emoji and other supplementary-plane characters use 4 bytes. UTF-8 is backwards-compatible with ASCII: any pure ASCII file is already valid UTF-8.
| Code point range | Bytes | Examples |
|---|---|---|
| U+0000 – U+007F | 1 | ASCII: A, z, 0, space |
| U+0080 – U+07FF | 2 | Latin-1 Supplement, Arabic, Hebrew, Greek |
| U+0800 – U+FFFF | 3 | Chinese, Japanese, Korean, symbols |
| U+10000 – U+10FFFF | 4 | Emoji, historic scripts, math alphabets |

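
The ranges in the table map directly onto bit operations. The JavaScript sketch below encodes a single code point by hand; `encodeUtf8` is a hypothetical name, and in real code the built-in `TextEncoder` does this for you.

```javascript
// Hand-rolled UTF-8 encoder for one code point (illustration only).
function encodeUtf8(cp) {
  const isSurrogate = cp >= 0xD800 && cp <= 0xDFFF;
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10FFFF || isSurrogate) {
    throw new RangeError("not a Unicode scalar value: " + cp);
  }
  if (cp <= 0x7F) {
    return [cp]; // 0xxxxxxx
  }
  if (cp <= 0x7FF) {
    // 110xxxxx 10xxxxxx
    return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)];
  }
  if (cp <= 0xFFFF) {
    // 1110xxxx 10xxxxxx 10xxxxxx
    return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)];
  }
  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xF0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3F),
    0x80 | ((cp >> 6) & 0x3F),
    0x80 | (cp & 0x3F),
  ];
}

// Format the bytes as hex for display.
const toHex = (bytes) => bytes.map((b) => b.toString(16).toUpperCase()).join(" ");

console.log(toHex(encodeUtf8(0x41)));    // 41
console.log(toHex(encodeUtf8(0xE9)));    // C3 A9
console.log(toHex(encodeUtf8(0x3053)));  // E3 81 93
console.log(toHex(encodeUtf8(0x1F30D))); // F0 9F 8C 8D
```

Surrogate code points (U+D800 to U+DFFF) are rejected because they are not valid in UTF-8.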
Common Unicode Bugs in Code
- String length vs code point count: in JavaScript, `'\ud83d\ude80'.length === 2` because code points outside the BMP, including most emoji, are stored as a surrogate pair. Use `[...str].length` for the code point count, or `Intl.Segmenter` for the true grapheme count.
- Normalization issues: the letter é can be encoded as a single code point (`U+00E9`) or as `e + U+0301` (combining accent). They look identical but are not equal unless normalized with `str.normalize('NFC')`.
- Invisible characters: zero-width space (`U+200B`), BOM (`U+FEFF`), and direction overrides can hide in strings and cause mysterious bugs.
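
The three bugs above can be reproduced in a few lines of JavaScript. This is a sketch, not a complete solution: `graphemeCount` and `findInvisibles` are hypothetical helper names, and the character class in `findInvisibles` is an illustrative selection of invisible characters, not an exhaustive list.

```javascript
// 1. Length vs code points vs graphemes
const rocket = "\u{1F680}";
console.log(rocket.length);      // 2 (UTF-16 code units)
console.log([...rocket].length); // 1 (code points)

function graphemeCount(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(text)].length;
}

// 2. Normalization: same visible letter, different code points
const composed = "\u00E9";    // single code point
const decomposed = "e\u0301"; // e + combining acute accent
console.log(composed === decomposed);                                   // false
console.log(composed.normalize("NFC") === decomposed.normalize("NFC")); // true

// 3. Invisible characters: zero-width space and joiners, direction marks,
// embeddings, overrides and isolates, and the BOM (assumed selection).
function findInvisibles(text) {
  const pattern = /[\u200B-\u200F\u202A-\u202E\u2066-\u2069\uFEFF]/g;
  return Array.from(text.matchAll(pattern), (m) => ({
    index: m.index,
    codePoint: "U+" + m[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));
}

console.log(findInvisibles("a\u200Bb\uFEFF"));
```

Normalizing both sides before comparing, and scanning user input for invisible characters before storing it, are two common ways to guard against these issues.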