HTML Entity Encoder / Decoder

Encode/decode HTML entities (&amp;, &lt;, &gt;, named, numeric, hex). Full HTML5 entity set.


Runs entirely in your browser. Your input never leaves your device.


How it works

What HTML entities are

HTML entities are text sequences that represent characters with special meaning in HTML, or characters that are hard to type. The browser sees &lt; and renders <; you see &amp; in source and the page shows &. They exist because HTML uses <, >, &, and " as structural markers, so if you want those characters to display rather than parse, you need an escape sequence.

There are three forms:

Named references: &lt; &gt; &amp; &quot; &apos; and hundreds more for special symbols such as &copy; (©), &euro; (€), and &mdash; (em dash)
Decimal numeric references: &#60; for <, &#38; for &
Hexadecimal numeric references: &#x3C; for <, &#x26; for &

All three forms are equivalent. Named references are more readable; numeric references are universal (they work even if the parser doesn't know the name). The hex form is common in programmatically generated output because it maps directly to Unicode code points.
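One way to check that equivalence is to decode each form and compare the results. The sketch below uses Python's standard html module purely for illustration (it is not the code this tool runs), and the variable names are made up for the example.

```python
import html

# The same character written as a named, a decimal, and a hexadecimal reference.
forms = ["&lt;", "&#60;", "&#x3C;"]

# html.unescape resolves named and numeric character references.
decoded = [html.unescape(form) for form in forms]
print(decoded)

# Every form should decode to the single character "<".
all_same = all(ch == "<" for ch in decoded)
print(all_same)
```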

When encoding actually matters

Displaying code in HTML

If you're building a blog, docs site, or a code renderer, every < in a code example must be written as &lt; or the browser will try to parse it as a tag. This is the most common everyday use case. Copy a code snippet into HTML without encoding and watch Chrome try to evaluate <script> as a real tag.

XSS prevention

Cross-site scripting attacks often inject <script> tags or event handlers like onclick="malicious()" through user-supplied input. When you entity-encode everything coming from users before inserting it into HTML, angle brackets and quotes become inert text characters instead of parser instructions.

<!-- Dangerous: user input interpolated raw -->
<p>Welcome, <script>alert('XSS')</script>!</p>

<!-- Safe: entity-encoded -->
<p>Welcome, &lt;script&gt;alert('XSS')&lt;/script&gt;!</p>

The browser renders the second version as literal text — the script never runs.
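The same idea in code, as a minimal sketch: escape the untrusted value at the moment it is placed into HTML text content. This uses Python's standard html.escape for illustration; render_greeting is a hypothetical helper, not part of any library.

```python
import html


def render_greeting(user_name: str) -> str:
    """Return a <p> element with user_name escaped for HTML text content."""
    # html.escape replaces &, < and >, and by default both quote characters too.
    safe_name = html.escape(user_name)
    return f"<p>Welcome, {safe_name}!</p>"


payload = "<script>alert('XSS')</script>"
greeting = render_greeting(payload)
print(greeting)
```

Note that html.escape also turns the apostrophes into &#x27;, which the hand-written example leaves alone. Both outputs are inert as text content; escaping quotes matters mainly inside attribute values.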

Important nuance: entity encoding is the right defense when inserting into HTML context. Inserting user data into a JavaScript string, a CSS value, or a URL parameter each requires different escaping — entity encoding alone doesn't protect those contexts.

Preventing double-encoding

Double-encoding is a common bug: you encode & to &amp;, then encode again and get &amp;amp;. The page renders &amp; instead of &. This happens when:

A template escapes HTML, and you also call an encode function on the same string
Data is stored already-encoded in the database, then encoded again at render time

The fix is to decode first (or check whether the input is already encoded), then encode once at the final HTML insertion point.
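A short sketch of the bug and of the decode-then-encode fix, using Python's standard html module. The helper encode_once is hypothetical and assumes the input never contains entity-like text that is meant literally.

```python
import html

raw = "Fish & Chips"

once = html.escape(raw)    # correct: one level of encoding
twice = html.escape(once)  # bug: the & of "&amp;" is escaped a second time

print(once)
print(twice)


def encode_once(value: str) -> str:
    """Hypothetical helper: normalise to raw text, then escape exactly once."""
    return html.escape(html.unescape(value))


# Gives the same result whether the input was already encoded or not.
print(encode_once(raw))
print(encode_once(once))
```

The trade-off is that decoding first also rewrites input where a user really did type entity text, so tracking whether a value is raw or encoded is usually more robust than guessing.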

Named vs numeric references: when to choose

Use named references for human-maintained HTML files: &copy; and &mdash; are far more readable than &#169; and &#8212;. Named references have been standardized in HTML5 and are universally supported.

Use numeric references when:

Generating HTML programmatically and you don't want to maintain a named-reference lookup table
The character doesn't have a named reference (most Unicode characters above the HTML4 set don't)
You're dealing with XML, where only &lt; &gt; &amp; &quot; &apos; are guaranteed to be recognized by default

Use hex references (&#x…) when working closely with Unicode code points. A code point like U+1F600 (😀) becomes &#x1F600; directly.
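Generating numeric references needs no lookup table, only the code point. A minimal sketch with two hypothetical helper names, each expecting a single character:

```python
def decimal_reference(char: str) -> str:
    """Return the decimal numeric reference for one character."""
    return f"&#{ord(char)};"


def hex_reference(char: str) -> str:
    """Return the hexadecimal numeric reference for one character."""
    # :X formats the code point as upper-case hex, matching the U+XXXX notation.
    return f"&#x{ord(char):X};"


smiley = "\U0001F600"  # U+1F600 GRINNING FACE

print(hex_reference(smiley))
print(decimal_reference(smiley))
print(hex_reference("<"))
```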

What NOT to entity-encode

Entity encoding is for HTML context only. Applying it elsewhere breaks things:

URL parameters: use percent-encoding (%26, not &amp;). Putting &amp; in a URL gives you the literal characters &amp; in the query string.
JavaScript strings: use \' or \" or template literals. &quot; in a JS string is the six characters &quot;, not a quote.
CSS values: Unicode escapes look like \003C, not &lt;.
JSON: JSON uses \" for quotes inside strings; HTML entities are meaningless there.
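To make the contrast concrete, here is the same value escaped for three of these contexts with Python's standard library. This is an illustrative sketch; the sample value and variable names are assumptions, and CSS escaping is left out.

```python
import html
import json
from urllib.parse import quote

value = 'Tom & "Jerry"'

# HTML context: entity encoding.
as_html = html.escape(value)

# URL query value: percent-encoding (safe="" so nothing is left unescaped).
as_url = quote(value, safe="")

# JSON string: backslash escapes, wrapped in double quotes.
as_json = json.dumps(value)

print(as_html)
print(as_url)
print(as_json)
```

Each output is only correct in its own context, which is why the escaping function should be chosen at the point of insertion.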

The `he` library

This tool uses he (HTML entities) by Mathias Bynens. It covers the full HTML5 named character reference list (over 2000 entries), handles all three entity forms, and is correct about edge cases that trip up hand-rolled encoders, such as the semicolon being optional for some legacy named references, or &not; and &notin; mapping to different Unicode characters.

The encoding function escapes &, <, >, ", ', and optionally all non-ASCII characters. The decoding function handles both well-formed and some malformed entities. All processing is in-browser; nothing leaves your machine.

Practical examples

Encoding a code snippet for HTML:

Input: if (a < b && c > d) { return "<br>"; }
Encoded: if (a &lt; b &amp;&amp; c &gt; d) { return &quot;&lt;br&gt;&quot;; }

Decoding an entity string:

Input: &lt;script&gt;alert(&quot;hello&quot;)&lt;/script&gt;
Decoded: <script>alert("hello")</script>

Numeric reference:

Input: &#x1F4BB; → 💻 (U+1F4BB, PERSONAL COMPUTER)
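These examples can be reproduced as a round trip: encode, decode, and confirm the original text comes back. The sketch uses Python's standard html module rather than the he library, so quoting details may differ from this tool's output.

```python
import html

snippet = 'if (a < b && c > d) { return "<br>"; }'

encoded = html.escape(snippet)    # safe to place inside HTML
decoded = html.unescape(encoded)  # back to the original source text

print(encoded)
print(decoded == snippet)

# A hexadecimal numeric reference decodes to the character at that code point.
laptop = html.unescape("&#x1F4BB;")
print(laptop)
```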

Security and privacy

All encoding and decoding runs locally in your browser. Your input is never sent to our servers. The he library is MIT-licensed and auditable on GitHub.

FAQ

What's the difference between &lt; and &#60; and &#x3C;?

All three represent the same character: <. &lt; is the named reference (most readable), &#60; is the decimal numeric reference, and &#x3C; is the hexadecimal numeric reference. Browsers handle all three identically. Named references are better for hand-written HTML; numeric and hex forms are common in programmatically generated markup because they map directly to Unicode code points without needing a lookup table.

Does HTML entity encoding prevent XSS?

It prevents XSS in HTML text content context — if you're inserting user input between tags, entity-encoding <, >, &, and " makes injected markup inert. But it's not a universal XSS fix. If you're inserting user data into a JavaScript string, a href attribute, a CSS value, or a URL parameter, each context needs its own escaping. Entity encoding applied to the wrong context either does nothing or breaks your output.

What causes double-encoding like &amp; showing up on a page?

Double-encoding happens when a string is entity-encoded more than once. & becomes &amp;, and if that's encoded again it becomes &amp;amp;. The page then renders &amp; instead of &. The fix: decode first to check whether the input is already encoded, then encode exactly once at the point where the string is inserted into HTML, not at both storage time and render time.

Why should I entity-encode when templating engines auto-escape?

Most templating engines (React JSX, Jinja2, Handlebars, Blade) auto-escape by default, which is great. You need manual entity encoding when: using dangerouslySetInnerHTML, v-html, innerHTML, or any raw-HTML insertion API; generating HTML strings outside a template engine; writing static HTML by hand; or sanitizing stored content that was entered through a rich-text editor.

Can I use HTML entities inside a URL?

No. URLs use percent-encoding, not HTML entities. In https://example.com/?q=a&b=c, the & is a URL delimiter, and it must be written as %26 if it's literal data. If you put &amp; in a raw URL (outside HTML), it is read as the five-character string &amp;, not as &. Use URL encoding for query strings and HTML encoding for HTML attributes that happen to contain URLs; the two are layered, not interchangeable.
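The layering can be sketched in two steps: percent-encode the data while building the URL, then entity-encode the finished URL when it goes into an attribute. The parameter names below are assumptions chosen for the example.

```python
import html
from urllib.parse import urlencode

# Step 1: build the URL. urlencode percent-encodes the literal & inside the value.
params = {"q": "a&b", "lang": "en"}
url = "https://example.com/?" + urlencode(params)

# Step 2: place the URL in an HTML attribute. The & that separates parameters
# becomes &amp; in the markup, and the browser decodes it back before navigating.
link = f'<a href="{html.escape(url)}">Search</a>'

print(url)
print(link)
```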

Is my input sent to a server?

No. All encoding and decoding runs in your browser using the he library. Nothing is transmitted. You can verify by opening the network tab while you type — there are zero outbound requests.

What characters does this tool encode?

By default it encodes the five characters that have structural meaning in HTML: &, <, >, ", '. For the "encode all non-ASCII" option, every character outside the ASCII range (code points > 127) is additionally emitted as a numeric reference — useful when you need pure-ASCII output, such as embedding HTML inside systems that don't handle UTF-8 reliably.
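For a rough idea of what pure-ASCII output looks like, the sketch below escapes the structural characters and then replaces every non-ASCII character with a numeric reference. It relies on Python's built-in xmlcharrefreplace error handler, which emits decimal references; this tool's exact output format may differ. encode_ascii_only is a hypothetical name.

```python
import html


def encode_ascii_only(text: str) -> str:
    """Escape HTML-significant characters, then make the result pure ASCII."""
    escaped = html.escape(text)
    # xmlcharrefreplace turns each non-ASCII character into a decimal reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


result = encode_ascii_only("café <b> ©")
print(result)
print(result.isascii())
```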

Why do some named entities not need a semicolon?

Legacy HTML (before HTML5) allowed certain named references without the trailing semicolon: &amp was sometimes accepted alongside &amp;. Modern parsers still tolerate the omission for backward compatibility, but it's technically malformed. Always include the semicolon in new code; the he library decodes both forms but always emits the correct form with a semicolon.
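The tolerant decoding behaviour can be observed with Python's html.unescape, which applies the HTML5 rules for valid and invalid character references. This is only an illustration of those rules; it makes no claim about how he implements them.

```python
import html

# With and without the trailing semicolon: both decode to "&".
with_semicolon = html.unescape("&amp;")
without_semicolon = html.unescape("&amp")

# &notin; is its own reference (U+2209), while &notit; is not a known name,
# so the legacy prefix &not (U+00AC) is decoded and "it;" is left as text.
notin = html.unescape("&notin;")
not_then_text = html.unescape("&notit;")

print(with_semicolon, without_semicolon)
print(notin, not_then_text)
```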