Every piece of text you see on a screen is, at its core, a sequence of numbers. The way those numbers map to characters is the world of character encoding. I have spent years debugging encoding issues, and understanding how encoding works will save you from some of the most frustrating bugs in software development.
In this guide, I will cover the encoding schemes that matter most to web developers: Base64, URL encoding, HTML entities, Unicode, and more. You will understand not just how to use them, but why they exist and when each is the right choice.
ASCII: Where It All Started
The American Standard Code for Information Interchange (ASCII) was first published in 1963 and, in the form used today, defines 128 characters: the English alphabet, digits, punctuation, and control characters, each represented by a number from 0 to 127 (7 bits). ASCII worked for English but completely ignored the rest of the world. There was no way to represent accented characters, Chinese, Arabic, or the thousands of other symbols humans use. This limitation drove the development of every encoding system we will discuss here.
Base64: Turning Binary into Text
Base64 is an encoding scheme that converts binary data into a text representation using 64 printable ASCII characters. Its best-known early use was email (the MIME standard), where binary attachments needed to travel through text-only channels.
How Base64 Works
The algorithm processes input bytes in groups of three (24 bits). Each group is split into four 6-bit chunks, and each 6-bit value maps to one of 64 characters: A-Z, a-z, 0-9, +, and /. If the input length is not divisible by three, the final group is padded with = signs so the output length stays a multiple of four: one = when the final group holds two bytes, two = when it holds one byte.
// JavaScript Base64 encoding and decoding (btoa/atob only handle characters up to U+00FF)
const encoded = btoa("Hello, World!");
console.log(encoded); // "SGVsbG8sIFdvcmxkIQ=="
const decoded = atob("SGVsbG8sIFdvcmxkIQ==");
console.log(decoded); // "Hello, World!"
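To make the grouping concrete, here is a minimal sketch of the algorithm described above, written by hand instead of calling btoa(). The helper name base64FromBytes and the ALPHABET constant are my own for illustration; in real code you would normally use the built-in functions.

```javascript
// The 64-character alphabet: index 0-25 = A-Z, 26-51 = a-z, 52-61 = 0-9, then + and /.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: accepts a Uint8Array or a plain array of byte values.
function base64FromBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const hasSecond = i + 1 < bytes.length;
    const hasThird = i + 2 < bytes.length;

    // Pack up to three bytes into one 24-bit group (missing bytes count as 0).
    const group =
      (bytes[i] << 16) |
      ((hasSecond ? bytes[i + 1] : 0) << 8) |
      (hasThird ? bytes[i + 2] : 0);

    // Split the group into four 6-bit values and map each to a character.
    out += ALPHABET[(group >> 18) & 63];
    out += ALPHABET[(group >> 12) & 63];
    out += hasSecond ? ALPHABET[(group >> 6) & 63] : "=";
    out += hasThird ? ALPHABET[group & 63] : "=";
  }
  return out;
}

const helloBytes = new TextEncoder().encode("Hello, World!");
console.log(base64FromBytes(helloBytes)); // "SGVsbG8sIFdvcmxkIQ=="
```

The input is 13 bytes, so the last group contains a single byte and the output ends with two = signs.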
When to Use Base64
Common use cases include embedding small images as data URIs in HTML/CSS, encoding email attachments via MIME, including binary data in JSON API payloads, and storing binary data in text-only database fields. The trade-off is size: Base64 increases data by approximately 33%. Try it with our Base64 Encoder/Decoder.
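One practical catch when putting text into Base64 payloads: btoa() throws on characters above U+00FF, so non-Latin text has to be converted to UTF-8 bytes first. The following is a sketch of one way to do that; the helper names utf8ToBase64 and base64ToUtf8 are assumptions of mine, not standard APIs.

```javascript
// Hypothetical helper: encode text as UTF-8 bytes, then Base64-encode those bytes.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte); // one code unit per byte, always <= 0xFF
  }
  return btoa(binary);
}

// Hypothetical helper: reverse the steps above.
function base64ToUtf8(base64) {
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const payload = utf8ToBase64("h\u00E9llo \u{1F600}");
console.log(base64ToUtf8(payload) === "h\u00E9llo \u{1F600}"); // true
```

For very large inputs, building the intermediate string one byte at a time may be slow, so treat this as an illustration, not a tuned implementation.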
URL Encoding: Making URLs Safe
URLs have a restricted character set. Spaces and non-ASCII characters are not allowed by the specification at all, while ampersands, question marks, and slashes are reserved as delimiters, so they cannot appear as literal data without changing how the URL is parsed. URL encoding (also called percent-encoding) solves this by replacing each unsafe character with a percent sign followed by two hex digits for every byte of its UTF-8 encoding.
Space -> %20
& -> %26
= -> %3D
? -> %3F
/ -> %2F
cafe latte -> cafe%20latte
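The mapping above can be sketched by hand to show where the hex values come from. The percentEncode helper below is hypothetical and deliberately strict: it encodes everything except the RFC 3986 unreserved characters (letters, digits, - . _ ~), which is slightly more aggressive than the built-in encodeURIComponent().

```javascript
// Hypothetical helper: percent-encode every character that is not "unreserved".
function percentEncode(text) {
  const unreserved = /^[A-Za-z0-9\-._~]$/;
  let out = "";
  for (const ch of text) {
    if (unreserved.test(ch)) {
      out += ch;
      continue;
    }
    // Non-ASCII characters become several bytes in UTF-8, each encoded as %XX.
    for (const byte of new TextEncoder().encode(ch)) {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("cafe latte")); // "cafe%20latte"
console.log(percentEncode("a&b=c"));      // "a%26b%3Dc"
console.log(percentEncode("caf\u00E9"));  // "caf%C3%A9"
```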
encodeURIComponent vs. encodeURI
JavaScript provides two functions for URL encoding, and using the wrong one is a common source of bugs:
const query = "name=Alice & Bob";
// encodeURIComponent: encodes everything except A-Z a-z 0-9 - _ . ! ~ * ' ( )
encodeURIComponent(query);
// "name%3DAlice%20%26%20Bob"
// encodeURI: preserves URL structure characters (: / ? # & = etc.)
encodeURI("https://example.com/search?q=" + query);
// "https://example.com/search?q=name=Alice%20&%20Bob" (BROKEN!)
// Correct approach for query parameters:
"https://example.com/search?q=" + encodeURIComponent(query);
// "https://example.com/search?q=name%3DAlice%20%26%20Bob"
The rule: use encodeURIComponent() for individual query parameter values and path segments. Use encodeURI() only when encoding a complete URL. In practice, I reach for encodeURIComponent() about 95% of the time. Experiment with our URL Encoder/Decoder.
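When building whole query strings, one alternative worth knowing is the standard URL and URLSearchParams API, which encodes each value for you. A short sketch using the same example value (the "page" parameter is made up for illustration):

```javascript
const url = new URL("https://example.com/search");

// Each value is encoded independently, so "=" and "&" inside it stay data.
url.searchParams.set("q", "name=Alice & Bob");
url.searchParams.set("page", "2");

console.log(url.toString());
// "https://example.com/search?q=name%3DAlice+%26+Bob&page=2"

// Reading the parameter back gives the original, decoded value.
console.log(url.searchParams.get("q")); // "name=Alice & Bob"
```

Note that URLSearchParams uses form encoding, so spaces become + instead of %20. That is fine inside a query string, but it is a difference to keep in mind if you compare output against encodeURIComponent().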
HTML Entities: Safe Characters in Markup
In HTML, certain characters have special meaning. The less-than sign < opens a tag. The ampersand & starts an entity reference. If you want to display these characters as literal text, you must encode them as HTML entities.
Named vs. Numeric Entities
HTML entities come in two forms. Named entities use readable labels like &amp;lt; for the less-than sign and &amp;amp; for the ampersand. Numeric entities use the character's code point in decimal (&amp;#60;) or hex (&amp;#x3C;). Named entities are easier to remember, while numeric entities work for any Unicode character.
XSS Prevention
HTML entity encoding is critical for security. If your application displays user-generated content without encoding it, an attacker can inject malicious <script> tags that execute in every visitor's browser. This is Cross-Site Scripting (XSS), one of the most common web vulnerabilities. Always encode user input before inserting it into HTML. Convert and inspect HTML entities with our HTML Entity Encoder/Decoder.
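As a rough illustration of what "encode user input" means, here is a minimal escaping sketch. The escapeHtml name and the HTML_ESCAPES table are my own; this covers HTML text content and quoted attribute values only, and in a real project I would rely on the template engine's or framework's built-in escaping.

```javascript
// The five characters that matter in HTML text and quoted attribute values.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Hypothetical helper: replace each special character with its entity.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const userComment = '<script>alert("xss")</script>';
console.log(escapeHtml(userComment));
// "&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;"
```

The browser renders the escaped string as visible text instead of executing it. Other contexts (inline JavaScript, URLs, CSS) need their own encoding rules, which this sketch does not handle.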
Unicode: One Encoding to Rule Them All
Unicode aims to include every character from every writing system on Earth, plus mathematical symbols, technical symbols, and emoji. It defines more than 150,000 characters, and each new version adds more.
Code Points
Every Unicode character is assigned a unique number called a code point, written as U+ followed by a hex value:
A U+0041
e (accented) U+00E9
Sigma U+03A3
Han character U+4E16
Grinning face U+1F600
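You can look these values up yourself in JavaScript. This small sketch converts between characters and the U+ notation; toCodePointLabel is a hypothetical helper name, while codePointAt() and String.fromCodePoint() are standard.

```javascript
// Hypothetical helper: format the first code point of a string as "U+XXXX".
function toCodePointLabel(ch) {
  const codePoint = ch.codePointAt(0);
  return "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}

console.log(toCodePointLabel("A"));         // "U+0041"
console.log(toCodePointLabel("\u00E9"));    // "U+00E9"
console.log(toCodePointLabel("\u4E16"));    // "U+4E16"
console.log(toCodePointLabel("\u{1F600}")); // "U+1F600"

// And the reverse direction: from a code point to a character.
console.log(String.fromCodePoint(0x03a3)); // "Σ"
```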
UTF-8 Encoding
UTF-8 is the dominant encoding on the web (used by over 98% of websites). It is a variable-width encoding that uses 1 to 4 bytes per character:
| Code Point Range | Bytes | Example |
|---|---|---|
| U+0000 to U+007F | 1 | ASCII characters (A, 5, !) |
| U+0080 to U+07FF | 2 | Latin accented, Greek, Cyrillic |
| U+0800 to U+FFFF | 3 | Chinese, Japanese, Korean, most symbols |
| U+10000 to U+10FFFF | 4 | Emoji, historic scripts, math symbols |
The beauty of UTF-8 is backward compatibility: any valid ASCII text is also valid UTF-8, making migration nearly painless for English-language systems.
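To see the variable width in practice, the sketch below prints the UTF-8 bytes for one character from each row of the table. The utf8Hex helper is my own name; TextEncoder is the standard API and always produces UTF-8.

```javascript
const encoder = new TextEncoder();

// Hypothetical helper: show the UTF-8 bytes of a string as uppercase hex.
function utf8Hex(text) {
  return Array.from(encoder.encode(text), (byte) =>
    byte.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}

console.log(utf8Hex("A"));         // "41"          (1 byte)
console.log(utf8Hex("\u00E9"));    // "C3 A9"       (2 bytes)
console.log(utf8Hex("\u4E16"));    // "E4 B8 96"    (3 bytes)
console.log(utf8Hex("\u{1F600}")); // "F0 9F 98 80" (4 bytes)
```

The single byte for "A" is the same value ASCII uses (0x41), which is the backward compatibility in action.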
Surrogate Pairs and Emoji
JavaScript strings use UTF-16 internally. Characters with code points above U+FFFF (like most emoji) are stored as surrogate pairs, which are two 16-bit code units that together represent a single character:
const emoji = "\u{1F600}"; // Grinning face
console.log(emoji.length); // 2 (two UTF-16 code units!)
console.log([...emoji].length); // 1 (one code point)
console.log(emoji.codePointAt(0)); // 128512 (decimal for U+1F600)
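This is why string.length in JavaScript can give surprising results with emoji. If you need to count code points rather than UTF-16 code units, use [...string].length or Array.from(string).length. Even that can overcount what a user perceives as one character: emoji built from several code points (flags, skin-tone variants, family sequences) still count as more than one. For user-perceived characters (grapheme clusters), Intl.Segmenter is the right tool where it is available. Explore Unicode characters and conversions with our Unicode Converter.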
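If you are curious where the two code units come from, the arithmetic is short enough to sketch. The toSurrogatePair helper is hypothetical; it follows the UTF-16 rule of subtracting 0x10000 and splitting the remaining 20 bits into two 10-bit halves.

```javascript
// Hypothetical helper: split a code point above U+FFFF into its UTF-16 surrogates.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("Only code points from U+10000 to U+10FFFF use surrogate pairs");
  }
  const offset = codePoint - 0x10000;      // 20 bits remain
  const high = 0xd800 + (offset >> 10);    // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);   // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f600);
console.log(high.toString(16), low.toString(16)); // "d83d" "de00"

// The two code units together are the same string as the emoji itself.
console.log(String.fromCharCode(high, low) === "\u{1F600}"); // true
```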
Binary and Hexadecimal Representations
At the lowest level, all data is binary: sequences of ones and zeros. Hexadecimal (base 16) provides a more compact way to represent binary data, where each hex digit maps to exactly 4 bits:
Character: A
ASCII: 65
Binary: 01000001
Hex: 41
Character: Z
ASCII: 90
Binary: 01011010
Hex: 5A
Hex appears everywhere: CSS color codes (#FF5733), memory addresses, MAC addresses, and cryptographic hashes. Convert between text, binary, and hex using our Binary to Text Converter and Hex Editor.
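The conversions in the example above take one line each in JavaScript. Here is a small sketch; describeChar is a hypothetical helper, and it only looks at the first UTF-16 code unit, so it is meant for ASCII input.

```javascript
// Hypothetical helper: decimal, binary, and hex views of an ASCII character.
function describeChar(ch) {
  const code = ch.charCodeAt(0);
  return {
    decimal: code,
    binary: code.toString(2).padStart(8, "0"),
    hex: code.toString(16).toUpperCase().padStart(2, "0"),
  };
}

const a = describeChar("A");
console.log(a.decimal, a.binary, a.hex); // 65 "01000001" "41"

// Going the other way: parse binary or hex text back into a number.
console.log(parseInt("01011010", 2));                 // 90
console.log(String.fromCharCode(parseInt("5A", 16))); // "Z"
```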
Morse Code: A Historical Perspective
Before digital encoding, there was Morse code. Developed by Samuel Morse and Alfred Vail in the 1830s, Morse code encodes letters as sequences of short signals (dots) and long signals (dashes). It was the first widely adopted system for encoding text for electronic transmission, and it remained in active use for maritime communication until the late 1990s.
H ....
E .
L .-..
L .-..
O ---
"HELLO" in Morse: .... . .-.. .-.. ---
Morse code is a variable-length encoding: common letters (E, T) get short codes, while rare ones (Q, Z) get longer ones. This is the same principle behind modern Huffman coding. Play with it using our Morse Code Translator.
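A lookup table is all it takes to reproduce the example above. This is a minimal sketch covering only the letters A to Z; the toMorse helper and its " / " word separator are my own choices, and any character missing from the table is silently skipped.

```javascript
// International Morse code for the 26 Latin letters.
const MORSE = {
  A: ".-", B: "-...", C: "-.-.", D: "-..", E: ".", F: "..-.", G: "--.",
  H: "....", I: "..", J: ".---", K: "-.-", L: ".-..", M: "--", N: "-.",
  O: "---", P: ".--.", Q: "--.-", R: ".-.", S: "...", T: "-", U: "..-",
  V: "...-", W: ".--", X: "-..-", Y: "-.--", Z: "--..",
};

// Hypothetical helper: letters are separated by spaces, words by " / ".
function toMorse(text) {
  return text
    .toUpperCase()
    .split(" ")
    .map((word) =>
      [...word]
        .filter((ch) => ch in MORSE)
        .map((ch) => MORSE[ch])
        .join(" ")
    )
    .join(" / ");
}

console.log(toMorse("HELLO")); // ".... . .-.. .-.. ---"

// Variable length in action: the common E is one signal, the rare Q is four.
console.log(MORSE.E.length, MORSE.Q.length); // 1 4
```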
Common Pitfalls and How to Avoid Them
Double Encoding
One of the most frequent encoding bugs is applying the same encoding twice. For URL encoding, this turns a space into %20, and then into %2520 because the percent sign itself gets encoded. The fix is simple: always encode exactly once, at the boundary where data enters a new context.
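The bug is easy to reproduce in a console. In this sketch the variable names are just for illustration:

```javascript
const raw = "hello world";

const once = encodeURIComponent(raw);
console.log(once);  // "hello%20world"

// Encoding the already-encoded value also encodes the "%" sign.
const twice = encodeURIComponent(once);
console.log(twice); // "hello%2520world"

// A receiver that decodes once now sees a literal "%20" instead of a space.
console.log(decodeURIComponent(twice)); // "hello%20world"
```

If you see %25 followed by two hex digits in a URL, double encoding is usually the first thing I would check.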
Encoding Mismatches and Mojibake
If a file is saved as UTF-8 but the server declares it as ISO-8859-1, characters outside the ASCII range will display as garbled text, known as "mojibake." Always ensure your HTML declares <meta charset="UTF-8"> and your server sends a matching Content-Type header (for example, Content-Type: text/html; charset=utf-8). In 2026, the answer is almost always UTF-8.
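You can simulate the mismatch in a few lines. The sketch below encodes a string as UTF-8 and then misreads the bytes as ISO-8859-1, where every byte maps directly to the code point with the same value. I do the misreading by hand here so the example does not depend on which legacy encodings a given TextDecoder build supports.

```javascript
// "café" as UTF-8 bytes: 63 61 66 C3 A9 (the é takes two bytes).
const utf8Bytes = new TextEncoder().encode("caf\u00E9");

// Wrong: treat each byte as its own ISO-8859-1 character.
const misdecoded = Array.from(utf8Bytes, (byte) =>
  String.fromCharCode(byte)
).join("");
console.log(misdecoded); // "cafÃ©"

// Right: decode the same bytes as UTF-8.
console.log(new TextDecoder("utf-8").decode(utf8Bytes)); // "café"
```

The telltale "Ã" followed by another symbol is the classic sign of UTF-8 text being read with a single-byte Western encoding.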
Storing Encoded Values
If a user submits "Tom & Jerry" and you store it as "Tom &amp;amp; Jerry" in the database, you now have an encoded string masquerading as raw text. When your template engine encodes it again on output, the user sees "Tom &amp;amp; Jerry" rendered literally on the page instead of "Tom & Jerry". The rule: store raw data, encode only at the point of output.
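Here is the same scenario as a sketch. The escapeHtml function stands in for whatever escaping your template engine applies on output; it is a simplified, hypothetical version, not any particular engine's implementation.

```javascript
// Stand-in for a template engine's output escaping (simplified).
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

const submitted = "Tom & Jerry";

// Anti-pattern: encode before storing, then the template encodes again.
const storedEncoded = escapeHtml(submitted);
console.log(escapeHtml(storedEncoded)); // "Tom &amp;amp; Jerry"

// Recommended: store the raw value and encode once, at output time.
const storedRaw = submitted;
console.log(escapeHtml(storedRaw));     // "Tom &amp; Jerry"
```

The first output renders in the browser as the literal text "Tom &amp; Jerry", while the second renders as "Tom & Jerry", which is what the user typed.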
A Quick Reference
| Encoding | Use Case | Size Overhead |
|---|---|---|
| Base64 | Binary data in text contexts | ~33% |
| URL Encoding | Special characters in URLs | 3 characters per encoded byte |
| HTML Entities | Special characters in HTML | 4 to 8 chars per entity |
| UTF-8 | Universal text encoding | 1 to 4 bytes per character |
| Hex | Byte-level data inspection | 2x |
Wrapping Up
Character encoding touches almost everything in web development. Getting it right means your application works correctly for users worldwide. Getting it wrong means mojibake, broken URLs, and security vulnerabilities.
Here are the BoltQuickTools encoding resources mentioned in this article:
- Base64 Encoder/Decoder for converting between binary and text
- URL Encoder/Decoder for percent-encoding and decoding
- HTML Entity Encoder/Decoder for safe HTML output
- Unicode Converter for exploring code points and encodings
- Binary to Text Converter for binary and text conversions
- Hex Editor for byte-level data inspection
- Morse Code Translator for encoding text as Morse code
All of these tools run entirely in your browser with no data leaving your device. Whether you are debugging an encoding issue or just exploring how different encoding schemes work, they are free and ready to use.