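| Char | Code Point | HTML Dec | HTML Hex | UTF-8 Bytes | Category |
|---|---|---|---|---|---|
| H | U+0048 | &#72; | &#x48; | 0x48 | ASCII |
| e | U+0065 | &#101; | &#x65; | 0x65 | ASCII |
| l | U+006C | &#108; | &#x6C; | 0x6C | ASCII |
| l | U+006C | &#108; | &#x6C; | 0x6C | ASCII |
| o | U+006F | &#111; | &#x6F; | 0x6F | ASCII |
| · | U+0020 | &#32; | &#x20; | 0x20 | ASCII |
| 🌍 | U+1F30D | &#127757; | &#x1F30D; | 0xF0 0x9F 0x8C 0x8D | Emoji |
| ! | U+0021 | &#33; | &#x21; | 0x21 | ASCII |
| · | U+0020 | &#32; | &#x20; | 0x20 | ASCII |
| こ | U+3053 | &#12371; | &#x3053; | 0xE3 0x81 0x93 | Hiragana / Katakana |
| ん | U+3093 | &#12435; | &#x3093; | 0xE3 0x82 0x93 | Hiragana / Katakana |
| に | U+306B | &#12395; | &#x306B; | 0xE3 0x81 0xAB | Hiragana / Katakana |
| ち | U+3061 | &#12385; | &#x3061; | 0xE3 0x81 0xA1 | Hiragana / Katakana |
| は | U+306F | &#12399; | &#x306F; | 0xE3 0x81 0xAF | Hiragana / Katakana |
| · | U+0020 | &#32; | &#x20; | 0x20 | ASCII |
| – | U+2013 | &#8211; | &#x2013; | 0xE2 0x80 0x93 | Other Unicode |
| · | U+0020 | &#32; | &#x20; | 0x20 | ASCII |
| c | U+0063 | &#99; | &#x63; | 0x63 | ASCII |
| a | U+0061 | &#97; | &#x61; | 0x61 | ASCII |
| f | U+0066 | &#102; | &#x66; | 0x66 | ASCII |
| é | U+00E9 | &#233; | &#xE9; | 0xC3 0xA9 | Latin-1 Supplement |
| · | U+0020 | &#32; | &#x20; | 0x20 | ASCII |
| r | U+0072 | &#114; | &#x72; | 0x72 | ASCII |
| é | U+00E9 | &#233; | &#xE9; | 0xC3 0xA9 | Latin-1 Supplement |
| s | U+0073 | &#115; | &#x73; | 0x73 | ASCII |
| u | U+0075 | &#117; | &#x75; | 0x75 | ASCII |
| m | U+006D | &#109; | &#x6D; | 0x6D | ASCII |
| é | U+00E9 | &#233; | &#xE9; | 0xC3 0xA9 | Latin-1 Supplement |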

Frequently Asked Questions

What information does the Unicode Inspector show?

For each character it shows: the Unicode code point (U+XXXX), HTML entity (decimal and hex), UTF-8 bytes, and Unicode category.
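As a rough illustration of how such per-character rows could be computed, here is a minimal sketch in JavaScript. The function name `inspectString` and the shape of the returned objects are assumptions made for this example, not the tool's actual code. The category column is left out because the tool's classification rules are not described here.

```javascript
// Hypothetical helper (not the tool's own code): describe each code point
// of a string with its U+ notation, HTML entities, and UTF-8 bytes.
function inspectString(text) {
  const encoder = new TextEncoder(); // TextEncoder always produces UTF-8
  const rows = [];
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const char of text) {
    const cp = char.codePointAt(0);
    const hex = cp.toString(16).toUpperCase();
    rows.push({
      char,
      codePoint: "U+" + hex.padStart(4, "0"),
      htmlDec: "&#" + cp + ";",
      htmlHex: "&#x" + hex + ";",
      // Array.from with a mapper yields strings; Uint8Array.map would not.
      utf8: Array.from(encoder.encode(char), (byte) =>
        "0x" + byte.toString(16).toUpperCase().padStart(2, "0")
      ).join(" "),
    });
  }
  return rows;
}

console.log(inspectString("H\u{1F30D}"));
```

Iterating with `for...of` rather than indexing by position is the key choice: indexing would split the globe emoji into two surrogate halves.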

Does it support emoji?

Yes. Emoji are Unicode code points and are analysed like any other character. Multi-codepoint sequences (e.g. family emoji) are split into their component code points.
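To show what splitting a sequence into component code points looks like, the sketch below breaks a family emoji (man, woman, girl joined by zero-width joiners) into its parts. The helper name `splitCodePoints` is an assumption for this example; the emoji is written with escapes so the code does not depend on font or editor support.

```javascript
// Hypothetical helper: list the code points of a string in U+ notation.
function splitCodePoints(text) {
  // Array.from iterates by code point, not by UTF-16 code unit.
  return Array.from(text, (char) =>
    "U+" + char.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Man + ZWJ + woman + ZWJ + girl: one visible glyph, five code points.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
console.log(splitCodePoints(family));
// [ 'U+1F468', 'U+200D', 'U+1F469', 'U+200D', 'U+1F467' ]
```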

Can I click to copy a code point?

Yes. Click any code point, HTML decimal, or HTML hex value to copy it to your clipboard.

Is there a character limit?

No hard limit, though very long texts may make the table difficult to navigate.

What is Unicode?

Unicode is the universal character encoding standard. It assigns a unique number, called a code point, to every character in every writing system, along with symbols and emoji. Code points are written as U+XXXX in hexadecimal: for example, the letter A is U+0041, and the emoji 🚀 is U+1F680.

Unicode 16.0 defines 154,998 characters across 168 scripts. The code space is organized into 17 planes of 65,536 code points each. The Basic Multilingual Plane (BMP, U+0000–U+FFFF) covers most living scripts. The supplementary planes (U+10000–U+10FFFF) include historic scripts, musical notation, mathematical alphanumeric symbols, and most emoji.
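Because each plane holds 65,536 (0x10000) code points, the plane number is simply the code point shifted right by 16 bits. The following is a minimal sketch; the function name `planeOf` is an assumption for this example.

```javascript
// Hypothetical helper: return the plane number (0 to 16) of a code point.
function planeOf(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError("not a valid Unicode code point");
  }
  // Each plane spans 0x10000 code points, so the plane is the top bits.
  return codePoint >> 16;
}

console.log(planeOf(0x0041));   // 0  (letter A, in the BMP)
console.log(planeOf(0x1F680));  // 1  (rocket emoji)
console.log(planeOf(0x10FFFF)); // 16 (last code point)
```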

How UTF-8 Encodes Code Points

UTF-8 is the most widely used Unicode encoding. It is variable-width: ASCII characters use 1 byte; most European and Middle Eastern scripts use 2 bytes; the rest of the BMP, including Chinese, Japanese, and Korean, uses 3 bytes; emoji and other supplementary-plane characters use 4 bytes. UTF-8 is backwards-compatible with ASCII: any pure ASCII file is already valid UTF-8.

| Code point range | Bytes | Examples |
|---|---|---|
| U+0000 – U+007F | 1 | ASCII: A, z, 0, space |
| U+0080 – U+07FF | 2 | Latin-1 Supplement, Arabic, Hebrew, Greek |
| U+0800 – U+FFFF | 3 | Chinese, Japanese, Korean, symbols |
| U+10000 – U+10FFFF | 4 | Emoji, historic scripts, math alphabets |
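The four ranges above map directly onto four bit-packing cases. The sketch below encodes a single code point by hand to make the layout visible; the function name `encodeUtf8` is an assumption for this example, and in real code the built-in `TextEncoder` does the same job.

```javascript
// Hypothetical helper: encode one code point as an array of UTF-8 byte values.
function encodeUtf8(cp) {
  const isSurrogate = cp >= 0xD800 && cp <= 0xDFFF; // not encodable in UTF-8
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10FFFF || isSurrogate) {
    throw new RangeError("not an encodable Unicode code point");
  }
  if (cp <= 0x7F) {
    return [cp]; // 0xxxxxxx
  }
  if (cp <= 0x7FF) {
    // 110xxxxx 10xxxxxx
    return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)];
  }
  if (cp <= 0xFFFF) {
    // 1110xxxx 10xxxxxx 10xxxxxx
    return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)];
  }
  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xF0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3F),
    0x80 | ((cp >> 6) & 0x3F),
    0x80 | (cp & 0x3F),
  ];
}

const toHex = (bytes) =>
  bytes.map((b) => "0x" + b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(toHex(encodeUtf8(0x48)));    // 0x48
console.log(toHex(encodeUtf8(0xE9)));    // 0xC3 0xA9
console.log(toHex(encodeUtf8(0x3053)));  // 0xE3 0x81 0x93
console.log(toHex(encodeUtf8(0x1F30D))); // 0xF0 0x9F 0x8C 0x8D
```

The leading byte's high bits announce the sequence length, and every continuation byte starts with `10`, which is why a decoder can resynchronize after a corrupted byte.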

Common Unicode Bugs in Code

  • String length vs code point count: in JavaScript, '\ud83d\ude80'.length === 2 because code points above U+FFFF are stored as a UTF-16 surrogate pair. Use [...str].length to count code points, or Intl.Segmenter to count user-perceived characters (grapheme clusters).
  • Normalization issues: the letter é can be encoded as a single code point (U+00E9) or as e + U+0301 (combining acute accent). The two forms look identical but do not compare equal unless both are normalized, for example with str.normalize('NFC').
  • Invisible characters: zero-width space (U+200B), BOM (U+FEFF), and direction overrides (such as U+202E) can hide in strings and cause hard-to-trace bugs.
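The three pitfalls above can be reproduced in a few lines. This is a minimal sketch: the helper names `countGraphemes` and `findInvisible` are assumptions for this example, and the set of characters treated as invisible is illustrative rather than complete.

```javascript
// 1. Code units vs code points vs grapheme clusters.
// Rocket emoji followed by "e" + combining acute accent.
const sample = "\u{1F680}e\u0301";

// Hypothetical helper: count user-perceived characters.
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(text)].length;
}

console.log(sample.length);          // 4 UTF-16 code units
console.log([...sample].length);     // 3 code points
console.log(countGraphemes(sample)); // 2 grapheme clusters

// 2. Normalization: precomposed vs decomposed e-acute.
const composed = "\u00E9";
const decomposed = "e\u0301";
console.log(composed === decomposed);                  // false
console.log(composed === decomposed.normalize("NFC")); // true

// 3. Invisible characters. Hypothetical helper that reports zero-width
// space, BOM, and bidirectional control characters (illustrative set).
function findInvisible(text) {
  const pattern = /[\u200B\u202A-\u202E\u2066-\u2069\uFEFF]/gu;
  return Array.from(text.matchAll(pattern), (match) => ({
    index: match.index,
    codePoint:
      "U+" + match[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));
}

console.log(findInvisible("ab\u200Bc")); // [ { index: 2, codePoint: 'U+200B' } ]
```

When comparing user input, normalizing both sides to the same form before the comparison avoids the second pitfall; NFC is a common choice, though which form suits a given application is a design decision.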