TypeParser

Punycode IDN

Convert internationalized domain names.

About Punycode IDN

Encode Unicode domains (héllo.com, 例え.jp) to Punycode (xn--hllo-bpa.com, xn--r8jz45g.jp) and decode them back. Conversion is per label, so ASCII labels such as .com stay as they are. Useful when registering an IDN, parsing email addresses with non-ASCII domains, or auditing potential homograph attacks.
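Per-label conversion can be sketched in Python with the standard library's punycode codec. The helper names are hypothetical, and the sketch skips the normalization and validity checks a full IDNA implementation performs (the third-party idna package handles those).

```python
def label_to_ascii(label: str) -> str:
    """Encode one label; ASCII-only labels such as "com" pass through unchanged."""
    if label.isascii():
        return label
    # Raw RFC 3492 Punycode plus the "xn--" ACE prefix.
    return "xn--" + label.encode("punycode").decode("ascii")


def label_to_unicode(label: str) -> str:
    """Decode one label if it carries the "xn--" prefix, otherwise return it as-is."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def domain_to_ascii(domain: str) -> str:
    # Each dot-separated label is converted independently.
    return ".".join(label_to_ascii(label) for label in domain.split("."))


def domain_to_unicode(domain: str) -> str:
    return ".".join(label_to_unicode(label) for label in domain.split("."))


print(domain_to_ascii("héllo.com"))         # xn--hllo-bpa.com
print(domain_to_ascii("例え.jp"))            # xn--r8jz45g.jp
print(domain_to_unicode("xn--r8jz45g.jp"))  # 例え.jp
```

Skipping ASCII-only labels matters: the raw codec would otherwise turn "com" into "com-".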

What Punycode encodes

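The DNS protocol predates Unicode, and DNS hostnames are limited to ASCII letters, digits, and hyphens. To support domains in any language without changing the wire format, IDNA (RFC 5891) maps each label to ASCII using an encoding called Punycode (RFC 3492). Each label that contains non-ASCII characters is encoded on its own and prefixed with xn--, giving xn--<encoded>.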
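To show what the encoding does internally, here is a from-scratch sketch of the RFC 3492 encoder for a single label, using the RFC's parameter values and encoding procedure. It is illustrative only: it does no IDNA validation, and punycode_encode is a hypothetical name.

```python
# RFC 3492 parameter values for Punycode.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def _digit(d: int) -> str:
    # Digit values 0..25 map to 'a'..'z', 26..35 map to '0'..'9'.
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def _adapt(delta: int, numpoints: int, first: bool) -> int:
    # Bias adaptation after each encoded delta.
    delta = delta // DAMP if first else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def punycode_encode(label: str) -> str:
    codepoints = [ord(c) for c in label]
    # Basic (ASCII) code points are copied first, in order.
    output = [c for c in label if ord(c) < 0x80]
    b = h = len(output)
    if b > 0:
        output.append("-")  # delimiter between basic and encoded parts
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(codepoints):
        # Next smallest code point not yet handled.
        m = min(c for c in codepoints if c >= n)
        delta += (m - n) * (h + 1)
        n = m
        for c in codepoints:
            if c < n:
                delta += 1
            elif c == n:
                # Emit delta as a variable-length base-36 integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    output.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(_digit(q))
                bias = _adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


print(punycode_encode("héllo"))  # hllo-bpa
print(punycode_encode("💩"))     # ls8h
```

The second call produces the label behind xn--ls8h.la. In practice, Python's built-in punycode codec gives the same output and is the safer choice.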

Modern browsers and email clients convert at the boundary — humans see Unicode, the network sees ASCII.

Common workflows

Register an IDN. Pick the Unicode domain you want, encode it to Punycode, and register the encoded form with your registrar.

Audit DNS records. A hostname with any label starting with xn-- is an IDN; decode it to see the human-readable form. Useful when reviewing certificate transparency logs or domain reports.
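An audit pass over a list of hostnames (for example, names pulled from certificate transparency logs) might look like this sketch. decode_hostname and find_idns are hypothetical helpers; malformed xn-- labels are kept in raw form instead of raising.

```python
def decode_hostname(host: str) -> str:
    labels = []
    for label in host.rstrip(".").split("."):
        if label.lower().startswith("xn--"):
            try:
                label = label[4:].encode("ascii").decode("punycode")
            except UnicodeError:
                pass  # malformed ACE label: keep the raw form for manual review
        labels.append(label)
    return ".".join(labels)


def find_idns(hostnames):
    """Yield (ascii_form, unicode_form) for every hostname with at least one xn-- label."""
    for host in hostnames:
        if any(label.lower().startswith("xn--") for label in host.split(".")):
            yield host, decode_hostname(host)


hosts = ["example.com", "xn--hllo-bpa.com", "mail.xn--r8jz45g.jp"]
for ascii_form, unicode_form in find_idns(hosts):
    print(f"{ascii_form} -> {unicode_form}")
```

Checking every label, not just the first, catches subdomains like mail.xn--r8jz45g.jp.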

Detect homograph risks. Decode an IDN and inspect its code points. A label that mixes Latin and Cyrillic is a red flag.
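Inspecting code points needs only the standard library's punycode codec and unicodedata. inspect_label is a hypothetical helper; the example decodes xn--pple-43d, the ACE form of аpple spelled with a Cyrillic а.

```python
import unicodedata


def inspect_label(label: str):
    """Return (character, code point, Unicode name) for each character of a label."""
    if label.lower().startswith("xn--"):
        label = label[4:].encode("ascii").decode("punycode")
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in label]


for ch, cp, name in inspect_label("xn--pple-43d"):
    print(ch, cp, name)
```

The first row reports U+0430 CYRILLIC SMALL LETTER A while the remaining rows are LATIN letters, which is exactly the mix to look for.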

Process email at scale. Email logs often record domains in Punycode; decoding them makes the logs readable for analysis.

Mixed-script detection

Browsers protect against homographs by showing Punycode when a label mixes scripts (Cyrillic + Latin). The tool flags potential homograph cases so you can verify before trusting an IDN domain.
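A rough mixed-script check can be sketched as below. This is not the tool's actual implementation: Python's standard library does not expose the Unicode Script property, so the sketch approximates a character's script from the first word of its Unicode name, and the SCRIPT_GROUPS table is a hypothetical simplification that lets Japanese labels mix Han and kana. It only catches labels that mix scripts, not spoofs written entirely in one lookalike script.

```python
import unicodedata

# Hypothetical grouping: scripts that legitimately appear together in one label
# (Japanese mixes Han, Hiragana and Katakana). Real browser policies are more detailed.
SCRIPT_GROUPS = {
    "CJK": "HAN+KANA",
    "HIRAGANA": "HAN+KANA",
    "KATAKANA": "HAN+KANA",
    "KATAKANA-HIRAGANA": "HAN+KANA",
}


def approx_script(ch: str) -> str:
    # Heuristic: first word of the character name, e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC".
    # This is NOT the real Unicode Script property.
    first_word = unicodedata.name(ch, "UNKNOWN").split(" ")[0]
    return SCRIPT_GROUPS.get(first_word, first_word)


def is_mixed_script(domain: str) -> bool:
    """Flag a domain if any single label contains letters from more than one script."""
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            label = label[4:].encode("ascii").decode("punycode")
        # Digits, hyphens and symbols are ignored; only letters count.
        scripts = {approx_script(ch) for ch in label if ch.isalpha()}
        if len(scripts) > 1:
            return True
    return False


print(is_mixed_script("xn--pple-43d.com"))  # True: Cyrillic а followed by Latin "pple"
print(is_mixed_script("xn--hllo-bpa.com"))  # False: all Latin
print(is_mixed_script("xn--r8jz45g.jp"))    # False: Han and Hiragana share a group
```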

Frequently asked questions

Why does héllo.com need conversion?
DNS hostnames may only contain ASCII letters, digits, and hyphens. Internationalized domains encode each non-ASCII label as Punycode prefixed with xn--, so héllo.com becomes xn--hllo-bpa.com. Browsers and email clients display the Unicode form to humans.
What is a homograph attack?
Using lookalike characters from other scripts to mimic legitimate domains, e.g. Cyrillic "а" (U+0430) instead of Latin "a" (U+0061) in аpple.com. Browsers detect the mixed scripts and show the Punycode form (xn--pple-43d.com) to flag suspicion.
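Can .com be Unicode?
No. The .com TLD is ASCII, although some TLDs are themselves internationalized (for example, .рф is xn--p1ai). Each label converts independently: a label may contain any Unicode characters that IDNA permits, and the dots stay as ASCII separators.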
How does email handle this?
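The local part (the name before @) follows different rules: it has no Punycode form, and non-ASCII local parts require the SMTPUTF8 extension (RFC 6531, part of the EAI standards). The domain part follows IDN rules, and most modern mail stacks convert it to Punycode for SMTP transport.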
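The transport-side conversion can be sketched with a hypothetical address_for_smtp helper: it Punycode-encodes the non-ASCII domain labels and reports whether the local part would still need SMTPUTF8. It assumes a plain name@domain address and does not handle quoted local parts or validate the address.

```python
from typing import Tuple


def address_for_smtp(address: str) -> Tuple[str, bool]:
    """Convert the domain part of an address to Punycode.

    Returns the converted address and whether the local part still needs SMTPUTF8.
    """
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError(f"not an email address: {address!r}")
    labels = []
    for label in domain.split("."):
        if not label.isascii():
            label = "xn--" + label.encode("punycode").decode("ascii")
        labels.append(label)
    # The local part has no ASCII-compatible encoding, so it is left untouched.
    needs_smtputf8 = not local.isascii()
    return local + "@" + ".".join(labels), needs_smtputf8


print(address_for_smtp("user@héllo.com"))    # ('user@xn--hllo-bpa.com', False)
print(address_for_smtp("ユーザー@例え.jp"))  # ('ユーザー@xn--r8jz45g.jp', True)
```

Splitting on the last @ keeps the domain intact even if a quoted local part happens to contain an @.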
How do browsers display IDNs?
Modern browsers show the Unicode form in the address bar by default but switch to Punycode if the domain mixes scripts in a homograph-suspicious way.
What about emoji domains?
They exist, although IDNA2008 disallows emoji, so only some registries accept them. 💩.la is a real (now defunct) emoji domain; it encodes to xn--ls8h.la.

Last updated: 2025-01-15