Unicode & Emoji Inspector – Code Points, UTF-8 Bytes

Text to Inspect (28 code points)

Char	UTF-8 Bytes	Category
H	0x48	ASCII
e	0x65	ASCII
l	0x6C	ASCII
l	0x6C	ASCII
o	0x6F	ASCII
·	0x20	ASCII
🌍	0xF0 0x9F 0x8C 0x8D	Emoji
!	0x21	ASCII
·	0x20	ASCII
こ	0xE3 0x81 0x93	Hiragana / Katakana
ん	0xE3 0x82 0x93	Hiragana / Katakana
に	0xE3 0x81 0xAB	Hiragana / Katakana
ち	0xE3 0x81 0xA1	Hiragana / Katakana
は	0xE3 0x81 0xAF	Hiragana / Katakana
·	0x20	ASCII
–	0xE2 0x80 0x93	Other Unicode
·	0x20	ASCII
c	0x63	ASCII
a	0x61	ASCII
f	0x66	ASCII
é	0xC3 0xA9	Latin-1 Supplement
·	0x20	ASCII
r	0x72	ASCII
é	0xC3 0xA9	Latin-1 Supplement
s	0x73	ASCII
u	0x75	ASCII
m	0x6D	ASCII
é	0xC3 0xA9	Latin-1 Supplement
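The UTF-8 column above can be reproduced in a few lines of JavaScript. The sketch below assumes a runtime with the standard TextEncoder (current browsers, Node.js 11+). The helper names utf8Hex and inspect are hypothetical and are not taken from the inspector's own source.

```javascript
// Sketch: reproduce the Char / UTF-8 Bytes columns of the table.
// Assumes the standard TextEncoder, which always encodes to UTF-8.
const encoder = new TextEncoder();

// Format one character's UTF-8 bytes the way the table does, e.g. "0xC3 0xA9".
function utf8Hex(char) {
  return Array.from(encoder.encode(char))
    .map((b) => "0x" + b.toString(16).toUpperCase().padStart(2, "0"))
    .join(" ");
}

// Array.from iterates a string by code point, so the globe emoji stays a
// single entry instead of splitting into two UTF-16 surrogates.
function inspect(text) {
  return Array.from(text, (char) => ({ char, bytes: utf8Hex(char) }));
}

// é is written as the precomposed U+00E9 to match the table.
const sample = "Hello \u{1F30D}! こんにちは \u2013 caf\u00E9 r\u00E9sum\u00E9";
const rows = inspect(sample);
console.log(rows.length); // 28
for (const { char, bytes } of rows) {
  // The table shows a space as "·"; do the same here.
  console.log(`${char === " " ? "·" : char}\t${bytes}`);
}
```

If the input contained a decomposed é (e + U+0301), the count would rise above 28, because the combining accent is a separate code point.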

Frequently Asked Questions

What information does the Unicode Inspector show?

For each character it shows the Unicode code point (U+XXXX), the HTML numeric character reference in decimal and hex (e.g. &#233; and &#xE9; for é), the UTF-8 bytes, and the Unicode category.
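As a minimal sketch, assuming JavaScript and a hypothetical helper named describeChar (not the tool's actual code), the code point and both HTML references can be derived from String.prototype.codePointAt:

```javascript
// Sketch: derive the code point and decimal/hex HTML references for one
// character. describeChar is a hypothetical name used only for illustration.
function describeChar(char) {
  const cp = char.codePointAt(0); // full code point, even above U+FFFF
  const hex = cp.toString(16).toUpperCase();
  return {
    codePoint: "U+" + hex.padStart(4, "0"), // at least 4 hex digits by convention
    htmlDecimal: `&#${cp};`,
    htmlHex: `&#x${hex};`,
  };
}

// describeChar("A")       -> codePoint "U+0041", htmlDecimal "&#65;",     htmlHex "&#x41;"
// describeChar("\u00E9")  -> codePoint "U+00E9", htmlDecimal "&#233;",    htmlHex "&#xE9;"
// describeChar("🚀")      -> codePoint "U+1F680", htmlDecimal "&#128640;", htmlHex "&#x1F680;"
console.log(describeChar("A").codePoint);
```

Padding to four digits follows the usual U+ convention; characters above U+FFFF simply get five or six digits.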

Does it support emoji?

Yes. Emoji are Unicode code points and are analysed like any other character. Multi-codepoint sequences (e.g. family emoji) are split into their component code points.
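For example, a family emoji (man, woman, girl) is three person emoji joined by ZERO WIDTH JOINER (U+200D). The JavaScript sketch below, which uses only the standard library, splits it into its five component code points as the answer above describes:

```javascript
// Sketch: split a multi-code-point emoji sequence into its components.
// man + ZWJ + woman + ZWJ + girl, written with escapes for clarity.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

const toUPlus = (cp) => "U+" + cp.toString(16).toUpperCase().padStart(4, "0");

// Array.from walks the string by code point, not by UTF-16 code unit.
const parts = Array.from(family, (ch) => toUPlus(ch.codePointAt(0)));
console.log(parts.join(" ")); // U+1F468 U+200D U+1F469 U+200D U+1F467
```

On platforms that support the sequence it renders as a single glyph, even though it is five code points.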

Can I click to copy a code point?

Yes. Click any code point, HTML decimal, or HTML hex value to copy it to your clipboard.

Is there a character limit?

No hard limit, though very long texts may make the table difficult to navigate.

What is Unicode?

Unicode is the universal character encoding standard. It assigns a unique number, called a code point, to every character of the world's writing systems, as well as to symbols and emoji. Code points are written as U+ followed by at least four hexadecimal digits; for example, the letter A is U+0041 and the emoji 🚀 is U+1F680.

The current Unicode standard (version 16.0) defines nearly 155,000 characters across 168 scripts, organized into 17 planes of 65,536 code points each. The Basic Multilingual Plane (BMP, U+0000–U+FFFF) covers most living scripts. The higher, supplementary planes (U+10000–U+10FFFF) hold historic scripts, musical notation, mathematical alphanumeric symbols, and most emoji, although some older emoji such as ☀ (U+2600) sit in the BMP.
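The plane of a code point follows from simple arithmetic: each plane spans 0x10000 code points, so the plane number is the code point shifted right by 16 bits. A JavaScript sketch, where planeOf is a hypothetical helper name:

```javascript
// Sketch: find which of the 17 planes (0..16) a code point belongs to.
function planeOf(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff) {
    throw new RangeError(`not a Unicode code point: ${cp}`);
  }
  return cp >>> 16; // 0 = BMP, 1 = Supplementary Multilingual Plane, ...
}

for (const ch of ["A", "\u00E9", "こ", "\u{1F680}"]) {
  const cp = ch.codePointAt(0);
  const plane = planeOf(cp);
  const label = plane === 0 ? "BMP" : `plane ${plane}`;
  console.log(ch, "U+" + cp.toString(16).toUpperCase().padStart(4, "0"), label);
}
// A U+0041 BMP
// é U+00E9 BMP
// こ U+3053 BMP
// 🚀 U+1F680 plane 1
```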

How UTF-8 Encodes Code Points

UTF-8 is the most widely used Unicode encoding. It is variable-width: ASCII characters use 1 byte; Latin letters with diacritics, Greek, Cyrillic, Hebrew, and Arabic use 2 bytes; the rest of the BMP, including Chinese, Japanese, and Korean, uses 3 bytes; and code points above U+FFFF, including most emoji, use 4 bytes. UTF-8 is backwards-compatible with ASCII: any pure ASCII file is already valid UTF-8.

Code point range	Bytes	Examples
U+0000 – U+007F	1	ASCII: A, z, 0, space
U+0080 – U+07FF	2	Latin-1 suppl., Greek, Cyrillic, Hebrew, Arabic
U+0800 – U+FFFF	3	Chinese, Japanese, Korean, symbols, some emoji (e.g. ☀ U+2600)
U+10000 – U+10FFFF	4	Most emoji, historic scripts, math alphabets
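The table translates directly into bit manipulation: the lead byte's high bits announce the sequence length (0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx), and each continuation byte has the form 10xxxxxx, carrying six payload bits. The hand-written encoder below is an illustrative sketch; in real code the built-in TextEncoder does this job. It rejects surrogate code points (U+D800–U+DFFF), which UTF-8 cannot encode, whereas TextEncoder substitutes U+FFFD for lone surrogates.

```javascript
// Sketch: encode one code point to UTF-8 bytes following the range table.
function encodeUtf8(cp) {
  if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError("not a Unicode scalar value: " + cp);
  }
  if (cp <= 0x7f) return [cp]; // 1 byte: 0xxxxxxx
  if (cp <= 0x7ff) {
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 2 bytes
  }
  if (cp <= 0xffff) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)]; // 3 bytes
  }
  return [
    0xf0 | (cp >> 18), // 4 bytes: lead byte holds the top 3 bits
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

const hex = (bytes) =>
  bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
console.log(hex(encodeUtf8(0x48)));    // 48          (H)
console.log(hex(encodeUtf8(0xe9)));    // C3 A9       (é)
console.log(hex(encodeUtf8(0x3053)));  // E3 81 93    (こ)
console.log(hex(encodeUtf8(0x1f30d))); // F0 9F 8C 8D (🌍)
```

These match the byte sequences in the inspector table at the top of the page.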

Common Unicode Bugs in Code

String length vs code point count: in JavaScript, '\ud83d\ude80'.length === 2 because length counts UTF-16 code units, and characters above U+FFFF (here 🚀, U+1F680) are stored as a surrogate pair. Use [...str].length to count code points, or Intl.Segmenter with { granularity: 'grapheme' } to count user-perceived characters (graphemes).
Normalization issues: the letter é can be encoded as a single code point (U+00E9) or as e followed by U+0301 (combining acute accent). They look identical but compare as unequal unless both strings are normalized first, e.g. with str.normalize('NFC').
Invisible characters: zero-width space (U+200B), the byte order mark (U+FEFF), and bidirectional overrides such as U+202E can hide in strings and cause mysterious bugs.
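The three pitfalls can be reproduced in a short JavaScript sketch. It assumes Node.js 16+ or a current browser for Intl.Segmenter. The set of characters matched by the hypothetical INVISIBLE pattern is an illustrative choice, not a complete or standard list.

```javascript
// 1. Length vs code points vs graphemes.
const rocket = "\u{1F680}";
console.log(rocket.length, [...rocket].length); // 2 1

const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"; // ZWJ family emoji
const graphemes = (s) =>
  [...new Intl.Segmenter("en", { granularity: "grapheme" }).segment(s)].length;
console.log(family.length, [...family].length, graphemes(family)); // 8 5 1

// 2. Normalization: precomposed vs decomposed é.
const nfc = "caf\u00E9";  // é as U+00E9
const nfd = "cafe\u0301"; // e + U+0301 COMBINING ACUTE ACCENT
console.log(nfc === nfd);                                   // false
console.log(nfc.normalize("NFC") === nfd.normalize("NFC")); // true

// 3. Invisible characters: strip ZWSP, BOM, and bidi embedding/override/isolate
// controls before comparing. ZWJ and ZWNJ are deliberately NOT stripped.
const INVISIBLE = /[\u200B\uFEFF\u202A-\u202E\u2066-\u2069]/g;
const dirty = "\uFEFFadmin\u200B";
console.log(dirty === "admin", dirty.replace(INVISIBLE, "") === "admin"); // false true
```

ZWJ (U+200D) and ZWNJ (U+200C) are left out of the strip list because emoji sequences and some scripts, such as Persian, depend on them.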