What percent-encoding is and why it exists
URLs were standardized in 1994 (RFC 1738) with a restricted alphabet: ASCII letters, digits, and a handful of special characters, some of which carry structural meaning (/, ?, &, =, #). Anything else (a space, an accented letter, an emoji) must be represented some other way so that browser and server can transmit it without ambiguity. That mechanism is percent-encoding (a.k.a. URL encoding), currently defined in RFC 3986.
The basic rule
Each unsafe byte in a URL is replaced by % followed by the two hex digits representing that byte. A space (byte 0x20) becomes %20. A ñ in UTF-8 is two bytes (0xC3 0xB1) and encodes to %C3%B1. Encoding operates on bytes, not characters, so the result depends on the character encoding used to turn the text into bytes (UTF-8 has been the de facto standard on the web for about 20 years).
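The byte-level rule can be sketched in a few lines of JavaScript. This is an illustration rather than a replacement for the built-in functions: percentEncode is a hypothetical helper name, and the sketch assumes that only letters, digits and - _ . ~ are safe while every other byte gets encoded.

```javascript
// Hypothetical helper: percent-encode the UTF-8 bytes of `text`.
// Assumption: only letters, digits and - _ . ~ are left as-is.
function percentEncode(text) {
  const bytes = new TextEncoder().encode(text); // text -> UTF-8 bytes
  let out = "";
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    if (/[A-Za-z0-9\-_.~]/.test(ch)) {
      out += ch; // safe byte: copy unchanged
    } else {
      // unsafe byte: "%" plus two uppercase hex digits
      out += "%" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode(" "));      // "%20"
console.log(percentEncode("\u00F1")); // ñ -> "%C3%B1" (two UTF-8 bytes)
```

Because the loop walks bytes rather than characters, a single character such as ñ yields two %XX groups.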
encodeURI vs encodeURIComponent
- encodeURI: respects URL structure. It does not encode / ? & = # :. Use when you have a complete URL with special characters in the path that should still be navigable.
- encodeURIComponent: encodes aggressively. It also encodes / ? & = # :. Use for values going inside a parameter (after =) that must not be confused with structure.
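A short comparison may make the difference concrete. The URL and values below are made-up examples; the output comments show what the standard built-ins return for them.

```javascript
// Made-up example URL containing spaces and an ampersand.
const url = "https://example.com/a path/?q=coffee & cake#top";

// encodeURI keeps the structural characters and only escapes the spaces.
const whole = encodeURI(url);
console.log(whole);
// "https://example.com/a%20path/?q=coffee%20&%20cake#top"

// encodeURIComponent also escapes &, so the value stays one parameter.
const value = encodeURIComponent("coffee & cake");
console.log(value);
// "coffee%20%26%20cake"

// Structural characters inside a value are escaped as well.
const full = encodeURIComponent("a/b?c=d#e:f");
console.log(full);
// "a%2Fb%3Fc%3Dd%23e%3Af"
```

Note that encodeURI left the & in the query untouched, which is exactly why it is the wrong tool for individual parameter values.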
Typical scenarios
- Building URLs dynamically. If you concatenate ?q= with a user search, that search must go through encodeURIComponent. If the user typed "coffee & cake", an unencoded & looks like a new parameter to the browser.
- Calling REST APIs. Same idea: any value traveling in a query string must be encoded.
- Decoding what you receive. When you read req.query in a backend, frameworks usually decode for you, but if you handle raw URLs you must decode manually.
- UTM and tracking. If a campaign uses accents in the medium or campaign field, encode them; otherwise analytics tools lose them.
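The scenarios above might look like this in practice. The host example.com, the q parameter and the campaign name are placeholders chosen for illustration, and the manual decoding step is deliberately simplistic (it assumes a single q parameter).

```javascript
// 1. Building a URL from user input (placeholder host and parameter name).
const userSearch = "coffee & cake";
const href = "https://example.com/search?q=" + encodeURIComponent(userSearch);
console.log(href); // "https://example.com/search?q=coffee%20%26%20cake"

// 2. Reading it back through the URL API, which decodes for you.
const q = new URL(href).searchParams.get("q");
console.log(q); // "coffee & cake"

// 3. Handling the raw string yourself: decode manually.
//    Simplified: assumes "q" is the only parameter.
const rawValue = href.split("?q=")[1];
const decoded = decodeURIComponent(rawValue);
console.log(decoded); // "coffee & cake"

// 4. A campaign value with an accent ("café launch", hypothetical name).
const utm = "utm_campaign=" + encodeURIComponent("caf\u00E9 launch");
console.log(utm); // "utm_campaign=caf%C3%A9%20launch"
```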
Reserved characters
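RFC 3986 defines two groups. Reserved characters can carry structural meaning: : / ? # [ ] @ ! $ & ' ( ) * + , ; =. Unreserved characters are always safe: letters, digits, - _ . ~. Everything else must be encoded. encodeURIComponent encodes every reserved character except ! ' ( ) *, which it leaves untouched (a holdover from the older RFC 2396); encodeURI additionally leaves the structural delimiters ; / ? : @ & = + $ , # alone, although it does encode [ and ].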
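One practical wrinkle: JavaScript's encodeURIComponent leaves ! ' ( ) * unescaped even though RFC 3986 lists them as reserved. If a consumer expects strict RFC 3986 output, a small wrapper along these lines is a common workaround. The name encodeRFC3986 is a hypothetical helper, not a built-in.

```javascript
// Hypothetical helper: encodeURIComponent plus the five reserved
// characters it leaves alone (! ' ( ) *).
function encodeRFC3986(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("(a)*!'")); // "(a)*!'"  (unchanged)
console.log(encodeRFC3986("(a)*!'"));      // "%28a%29%2A%21%27"
```

Whether the stricter form is needed depends on the receiving system; many servers accept both.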
Common mistakes
- Double-encoding. %20 becomes %2520. If the server decodes only once, you end up with literal %20 in your data.
- Forgetting + in query strings. Some systems encode space as + (legacy of application/x-www-form-urlencoded). Know which one you're dealing with.
- Encoding domains. Hostnames with non-ASCII characters use IDN (punycode), not percent-encoding. café.com becomes xn--caf-dma.com, not caf%C3%A9.com.
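Each of these mistakes can be reproduced in a few lines. The values are illustrative, and the hostname example assumes a runtime with a standard WHATWG URL implementation (modern browsers or a recent Node.js).

```javascript
// Double-encoding: the % of %20 is itself encoded as %25.
const once = encodeURIComponent("a b");    // "a%20b"
const twice = encodeURIComponent(once);    // "a%2520b"
const decodedOnce = decodeURIComponent(twice);
console.log(decodedOnce); // "a%20b" (a literal %20 ends up in the data)

// Plus signs: decodeURIComponent does not treat + as a space,
// while URLSearchParams follows the form-urlencoded convention.
const plusKept = decodeURIComponent("coffee+cake");
const plusAsSpace = new URLSearchParams("q=coffee+cake").get("q");
console.log(plusKept);    // "coffee+cake"
console.log(plusAsSpace); // "coffee cake"

// Hostnames: the URL API converts non-ASCII hosts to punycode.
const host = new URL("https://caf\u00E9.com/").hostname;
console.log(host); // "xn--caf-dma.com"
```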