About HTML Entity Encoder
Encode reserved characters as named, numeric (decimal), or hex entities. Decode handles all three forms cleanly. Optional non-ASCII-only encoding for compact output that still survives legacy charsets. Useful when embedding user content in HTML, when parsing scraped pages, and when sanitizing strings against XSS.
Why HTML entities still exist
HTML mixes text content and markup in one stream, using a small set of reserved characters: <, >, &, ", '. Any of those in your data must be escaped or the parser will read it as part of the markup. Entity encoding is the bridge: < becomes &lt;, which the parser renders as a literal less-than character.
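Here is a minimal sketch in Python of the five-character escape described above. The names `RESERVED` and `escape_reserved` are illustrative. They are not the tool's internals.

```python
# The five characters HTML reserves for markup, mapped to their escapes.
# "&#39;" is used for the apostrophe because "&apos;" is not defined in HTML 4.
RESERVED = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
}

# str.maketrans accepts a dict of single characters -> replacement strings.
_TABLE = str.maketrans(RESERVED)

def escape_reserved(text: str) -> str:
    """Replace every reserved character in a single pass.

    One translate() pass avoids the classic chained-replace bug where
    escaping "<" to "&lt;" first and "&" second yields "&amp;lt;".
    """
    return text.translate(_TABLE)

print(escape_reserved('a < b & "c"'))  # a &lt; b &amp; &quot;c&quot;
```

If you chain `str.replace` calls instead, replace `&` first. Otherwise the ampersands introduced by earlier replacements get escaped a second time.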
Early web pages were often served in ASCII or single-byte legacy encodings. Numeric entities let you embed any Unicode codepoint (&#x1F600; for 😀) even when the page's encoding cannot represent it directly. Modern UTF-8 pages rarely need this, but the option survives for legacy systems.
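A numeric reference is just the character's Unicode codepoint written in decimal or hex. The sketch below uses a hypothetical helper, `numeric_refs`, to build both forms. It uses Python's standard `html.unescape` to confirm that both decode to the same character.

```python
import html

def numeric_refs(ch: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal character references for one character."""
    cp = ord(ch)  # Unicode codepoint, e.g. 0x1F600 for the grinning face
    return f"&#{cp};", f"&#x{cp:X};"

dec, hexa = numeric_refs("😀")
print(dec, hexa)  # &#128512; &#x1F600;

# Both forms decode back to the same character.
assert html.unescape(dec) == html.unescape(hexa) == "😀"
```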
Encoding modes
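- Named (&amp;, &lt;, &gt;): readable, the default for hand-written HTML.
- Decimal (&#38;, &#60;): universal; every parser since 1995 reads it.
- Hex (&#x26;, &#x3C;): often seen in XML output and in some security configs.
- Non-ASCII only: leave ASCII unchanged and encode only characters above 127. The output is compact and survives legacy charsets. Because <, & and quotes pass through untouched, this mode is not an XSS defense on its own.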
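The four modes above can be sketched as one function. The mode names and two behaviours here are assumptions, not the tool's documented behaviour. First, the named, decimal and hex modes escape only the five reserved characters. Second, non-ASCII-only mode uses hex references.

```python
import html

MODES = ("named", "decimal", "hex", "non_ascii")  # hypothetical mode names

# Named forms for the five reserved characters. "&apos;" is defined in
# HTML5 and XML but not in HTML 4.
NAMED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}

def encode(text: str, mode: str = "named") -> str:
    """Encode text in one of four modes (assumed behaviour, see lead-in).

    named/decimal/hex: escape only the five reserved characters.
    non_ascii: escape only codepoints above 127, as hex references,
    leaving all ASCII (including < and &) untouched.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    out = []
    for ch in text:
        cp = ord(ch)
        if mode == "non_ascii":
            out.append(f"&#x{cp:X};" if cp > 127 else ch)
        elif ch not in NAMED:
            out.append(ch)
        elif mode == "named":
            out.append(NAMED[ch])
        elif mode == "decimal":
            out.append(f"&#{cp};")
        else:  # mode == "hex"
            out.append(f"&#x{cp:X};")
    return "".join(out)

sample = '<café & "tea">'
for m in MODES:
    encoded = encode(sample, m)
    # html.unescape decodes named, decimal and hex references alike.
    assert html.unescape(encoded) == sample
    print(f"{m:10} {encoded}")
```

In `non_ascii` mode only the é is encoded. The <, & and quotes pass through unchanged, so that output still has to go through one of the other modes before it is safe to place in markup.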
Common workflows
Sanitize user input for HTML output. Encode user-supplied strings before injecting into a template. Modern frameworks do this for you; for raw HTML generation, this tool is a backstop.
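For raw HTML generation in Python, the standard library's `html.escape` covers the five reserved characters. The sketch below uses a hypothetical helper, `render_comment`, to show the escaping applied in both an attribute value and element text.

```python
import html

def render_comment(author: str, body: str) -> str:
    """Build an HTML fragment from untrusted strings (hypothetical helper).

    html.escape(s, quote=True) escapes &, <, >, " and ', so the values are
    safe inside element text and inside quoted attribute values.
    """
    return (
        f'<div class="comment" data-author="{html.escape(author, quote=True)}">'
        f"{html.escape(body)}</div>"
    )

out = render_comment('x" onmouseover="alert(1)', "<script>alert(1)</script>")
print(out)
```

This only covers HTML text and quoted attributes. Values placed inside `<script>` blocks, `style` attributes, or URLs need escaping rules specific to those contexts.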
Decode scraped content. Pages scraped through HTTP clients arrive with entities intact. Decode here to get the plain text — useful for analysis or NLP preprocessing.
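Python's `html.unescape` follows the HTML5 rules for character references. It resolves named, decimal and hex forms in one call. Here is a small sketch on a made-up scraped fragment:

```python
import html

# A fragment as it might arrive from an HTTP client, with named,
# decimal and hex references mixed together.
scraped = "Tom &amp; Jerry &#169; caf&#233; &#xE9;t&#xE9;"
text = html.unescape(scraped)
print(text)  # Tom & Jerry © café été

# Decode exactly once: a second pass would turn a literal "&amp;lt;"
# (the visible text "&lt;") into "<".
assert html.unescape(html.unescape("&amp;lt;")) == "<"
```

`html.unescape` only resolves references. Any tags in the input stay in the output.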
Verify your escaping. Paste suspected XSS input, see exactly what the encoder outputs. If <script> survives, your encoder is wrong.
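The same check can be automated. The sketch below runs a few sample payloads through any encoder function and flags outputs that still contain a raw markup character. `check_encoder` and the payload list are illustrative assumptions, not a complete XSS test suite.

```python
import html

# Sample payloads; extend with your own suspected XSS inputs.
PAYLOADS = [
    "<script>alert(1)</script>",
    '"><img src=x onerror=alert(1)>',
    "'; alert(1); //",
]

def check_encoder(encode) -> list[str]:
    """Return the payloads whose encoded form still contains < > " or '."""
    failures = []
    for p in PAYLOADS:
        out = encode(p)
        if any(ch in out for ch in "<>\"'"):
            failures.append(p)
    return failures

print(check_encoder(html.escape))  # []

# A naive encoder that only handles "<" fails every payload.
print(check_encoder(lambda s: s.replace("<", "&lt;")))
```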
Embed code snippets in HTML. Pasting <div> into an HTML page creates an element. Encoded as &lt;div&gt;, it displays as text.
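For a snippet shown inside `<pre><code>`, only &, < and > strictly need escaping. The sketch below passes `quote=False` to `html.escape` so that quotes in the snippet stay readable in the page source.

```python
import html

snippet = '<div class="note">Hello & welcome</div>'
# quote=False: inside element text, only &, < and > need escaping.
fragment = f"<pre><code>{html.escape(snippet, quote=False)}</code></pre>"
print(fragment)
```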
Why entities defeat XSS
XSS attacks rely on user content being interpreted as markup. If the user's <script>alert(1)</script> becomes &lt;script&gt;alert(1)&lt;/script&gt; in the HTML, the browser sees text, not code. Escape consistently in every HTML text and attribute context and this injection path closes. Forget once and a single field becomes injectable. The mechanism is simple; discipline is everything.
Frequently asked questions
Which characters must be encoded?
& as &amp;, < as &lt;, > as &gt;, " as &quot;, ' as &#39; (or &apos;). Inside attribute values, the quote character that delimits the value must be escaped. In element text, escaping > and " is not strictly required, but it is harmless.
Named, decimal, or hex?
Named (&amp;) is the most readable. Decimal (&#38;) is the most compatible: every parser since the 1990s reads it. Hex (&#x26;) is often seen in XML output. Pick by audience.
Is this enough to prevent XSS?
How do I encode emoji?
&#x1F600; (or &#128512;) for 😀. Modern parsers also accept the raw character directly when the page is served as UTF-8.
Why does &nbsp; matter?
Can I decode whole HTML pages?
Last updated: 2025-01-15