# Base16 explained

## Why hex?
Computers think in binary. Humans don't. A single byte -- eight bits -- can hold 256 different values, and writing those out as ones and zeros gets unreadable fast. The value 255, for instance, is 11111111 in binary. That's eight identical digits. Good luck spotting an error in a long dump of those.
Hexadecimal (base-16) solves this by mapping each group of four bits to a single character. Since four bits can represent 16 values (0 through 15), you need exactly 16 symbols: the digits 0 through 9 for values zero to nine, and the letters A through F for values ten to fifteen[1]. One byte becomes exactly two hex characters. Always. No padding, no variable-length output, no ambiguity.
That's the fundamental appeal. Hex is a compact, lossless, human-readable representation of binary data.
## A brief history of the notation
The hexadecimal system didn't emerge from the computing world fully formed. Before hex became standard, many computer architectures used octal (base-8) notation -- the PDP-8 and early IBM machines among them. Octal maps three bits to one digit, which works beautifully for machines with 12-bit or 36-bit word lengths. But it's awkward for 8-bit bytes, because 8 isn't evenly divisible by 3.
The turning point came on April 7, 1964, when IBM announced the System/360[2]. The S/360 standardized the 8-bit byte as the fundamental unit of memory -- a decision that shaped virtually all computing architecture that followed[3]. With bytes now 8 bits wide, hex notation was the natural fit: two hex digits per byte, no leftovers.
The A-F letter convention was popularized through the Fortran IV manual for the System/360, published around 1966[4]. Not everyone was happy about it. In 1968, Bruce Alan Martin of Brookhaven National Laboratory wrote a letter to the Communications of the ACM calling the choice of A through F "ridiculous" and proposing entirely new symbols that would visually reflect binary structure[5]. His proposal didn't catch on, obviously. We're still using A-F sixty years later.
## How Base16 encoding works
The encoding itself is almost trivially simple. Take any sequence of bytes. For each byte:
- Split it into two nibbles (4-bit halves) -- the high nibble and the low nibble.
- Look up each nibble in the Base16 alphabet: 0123456789ABCDEF.
- Concatenate the two resulting characters.
That's it. The ASCII letter "A" (decimal 65, binary 01000001) splits into 0100 (4) and 0001 (1), giving you 41. The string "foobar" becomes 666F6F626172[1].
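The nibble-splitting steps above take only a few lines of Python. The standard library's `bytes.hex()` does the same job in one call, but spelling it out makes the mechanics visible:

```python
def base16_encode(data: bytes) -> str:
    """Encode bytes as Base16 by splitting each byte into two nibbles."""
    alphabet = "0123456789ABCDEF"
    out = []
    for byte in data:
        high = byte >> 4    # high nibble: the top four bits
        low = byte & 0x0F   # low nibble: the bottom four bits
        out.append(alphabet[high])
        out.append(alphabet[low])
    return "".join(out)

print(base16_encode(b"A"))       # 41
print(base16_encode(b"foobar"))  # 666F6F626172
```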
There's no padding needed -- unlike Base64 and Base32, where the input length doesn't always divide evenly into the encoding's bit groupings, Base16 maps bytes one-to-one to character pairs. Every byte produces exactly two hex characters, every time[1].
The encoding is also case-insensitive. 4a and 4A represent the same byte. RFC 4648 notes this explicitly, though it warns that in certain security contexts, the case difference could theoretically be exploited to leak information through a covert channel[1].
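Python's built-in hex helpers illustrate both points: the decoder accepts either case, while the encoder consistently picks one (lowercase, in Python's case):

```python
# Decoding is case-insensitive -- 4a and 4A are the same byte (ASCII "J")
assert bytes.fromhex("4a") == bytes.fromhex("4A") == b"J"

# Encoding picks one case; bytes.hex() always emits lowercase
assert b"J".hex() == "4a"
```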
## RFC 4648: the formal specification
Base16 was formally standardized in October 2006 as part of RFC 4648, authored by Simon Josefsson[1]. The RFC covers Base16, Base32, and Base64 together, treating them as a family of encodings with the same basic purpose -- representing binary data in text-safe form -- but with different efficiency tradeoffs.
The earlier RFC 3548 (published in 2003) covered the same ground, but RFC 4648 obsoleted it with clearer language and additional security considerations[6].
These three encodings form an efficiency spectrum. Base64 encodes 6 bits per output character, Base32 encodes 5, and Base16 encodes just 4. That means hex encoding is the least space-efficient of the three -- it doubles the size of the input data. But it's also the simplest and the most universally readable, which is why it shows up in so many places where compactness isn't the priority.
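Python's `base64` module implements all three RFC 4648 encodings, which makes the size spectrum easy to see. Thirty input bytes (divisible by both 3 and 5, so no padding) encode as follows:

```python
import base64

data = b"\x00" * 30  # 30 arbitrary input bytes

print(len(base64.b16encode(data)))  # 60 characters: 2 per byte
print(len(base64.b32encode(data)))  # 48 characters: 8 per 5 bytes
print(len(base64.b64encode(data)))  # 40 characters: 4 per 3 bytes
```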
## Where you see hex every day
Hex is everywhere in programming and systems work, often in places you stop noticing after a while.
CSS color codes. The #RRGGBB notation that web developers use daily is hex encoding of three bytes -- one each for red, green, and blue intensity. #FF0000 is pure red (255, 0, 0). The W3C formalized this in CSS Level 1, published as a Recommendation on December 17, 1996[7]. CSS has since gained shorthand (#F00), alpha channels (#FF000080), and modern color functions, but the six-digit hex format remains probably the most commonly written hex in the world.
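Because a #RRGGBB color is just three hex-encoded bytes, parsing one is a single `bytes.fromhex()` call. A minimal sketch (the function name is mine, not from any library):

```python
def parse_hex_color(color: str) -> tuple[int, int, int]:
    """Parse a #RRGGBB CSS color into (red, green, blue) byte values."""
    r, g, b = bytes.fromhex(color.lstrip("#"))  # 6 hex digits -> 3 bytes
    return (r, g, b)

print(parse_hex_color("#FF0000"))  # (255, 0, 0) -- pure red
```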
MAC addresses. Every network interface has a 48-bit hardware address written as six octets, each rendered as two hex digits and separated by colons or hyphens -- something like 00:1A:2B:3C:4D:5E. The format is defined by IEEE 802 standards and documented in RFC 7042[8][9]. Six bytes, twelve hex digits. Clean and unambiguous.
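Producing that colon-separated form from six raw bytes is a one-liner -- format each byte as two uppercase hex digits and join:

```python
mac = bytes([0x00, 0x1A, 0x2B, 0x3C, 0x4D, 0x5E])  # a 48-bit hardware address

# Two hex digits per octet, joined with colons, as in IEEE 802 notation
formatted = ":".join(f"{b:02X}" for b in mac)
print(formatted)  # 00:1A:2B:3C:4D:5E
```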
Cryptographic hashes. When you see a SHA-256 hash, you're looking at 32 bytes (256 bits) rendered as 64 hex characters[10]. Something like e3b0c44298fc1c14... -- that's hex. Same for MD5 (32 hex characters), SHA-1 (40 characters), and practically every other hash output you'll encounter. I find it interesting that most developers work with these hex strings daily without thinking of them as "Base16 encoded data," but that's exactly what they are.
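You can see the raw-bytes-to-hex relationship directly with `hashlib`. The e3b0c442... fragment above happens to be the SHA-256 of the empty input:

```python
import hashlib

digest = hashlib.sha256(b"").digest()  # raw 32-byte digest
hex_digest = digest.hex()              # the familiar 64-character form

print(len(digest), len(hex_digest))    # 32 64
print(hex_digest[:16])                 # e3b0c44298fc1c14
```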
Memory addresses and binary inspection. Debuggers, hex editors, and memory dump tools all display data in hex. The convention of showing 16 bytes per line with both hex and ASCII representations goes back decades and remains the standard way to inspect binary files.
## Hex in programming languages
Most programming languages adopted the 0x prefix for hexadecimal literals. The convention traces back to the C programming language, which Dennis Ritchie developed in the early 1970s at Bell Labs[11]. C inherited some notation ideas from BCPL but introduced 0x specifically to disambiguate hex constants from decimal and octal[11]. The first edition of The C Programming Language by Kernighan and Ritchie, published in 1978, documented this notation.
From C, the 0x prefix spread to C++, Java, JavaScript, Python, Rust, Go, and nearly every language with C-family syntax. A few languages do it differently -- Haskell uses 0x too, but Ada uses 16#FF#, and some assemblers use a trailing h (as in FFh). The 0x convention won overwhelmingly, though.
Escape sequences in strings are another common hex surface. C-style languages use \x41 to represent a byte with hex value 41 (the letter "A"). Python has the same, plus \u0041 for Unicode code points. URLs use percent-encoding: %20 is a space (hex 20, decimal 32). Different syntax, same underlying idea -- representing byte values as pairs of hex digits.
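A quick Python demonstration ties these surfaces together -- hex literals, \x escapes, and percent-encoding all spell byte values as pairs of hex digits:

```python
from urllib.parse import quote, unquote

# A hex literal and a \x string escape name the same value
assert 0x41 == 65
assert "\x41" == "A"  # the byte with hex value 41 is ASCII "A"

# Percent-encoding uses the same two-digit idea with a % prefix
assert quote(" ") == "%20"    # space is hex 20, decimal 32
assert unquote("%20") == " "
```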
| Language | Hex literal | String escape | Notes |
|---|---|---|---|
| C/C++ | 0xFF | \xFF | Original source of 0x convention |
| Java | 0xFF | \u00FF | Unicode escapes only |
| JavaScript | 0xFF | \xFF or \u00FF | Both byte and Unicode escapes |
| Python | 0xFF | \xFF or \u00FF | Also bytes.hex() and bytes.fromhex() |
| Rust | 0xFF | \xFF | Strict -- only valid byte values in byte strings |
| Go | 0xFF | \xFF | Also fmt.Sprintf("%x", data) |
## Base16 vs. Base64: when to use which
The choice between hex and Base64 comes down to what matters more -- readability or compactness.
Base16 doubles the data size. Every input byte becomes two output characters. Base64, by contrast, produces roughly 4 characters for every 3 input bytes -- about a 33% size increase[1]. For embedding a 20 KB image in a data URI, that difference matters. For displaying a 32-byte hash, it doesn't.
Hex is also trivially decodable by humans. You can read a hex dump and mentally convert pairs to bytes. You can't do that with Base64 -- the 6-bit groupings don't align with byte boundaries, so a single Base64 character doesn't correspond to any meaningful unit of the original data.
My rule of thumb: if a human will read the encoded data, use hex. If a machine will transport it and size matters, use Base64. If you're encoding data for a URL, consider Base64url (the URL-safe variant from RFC 4648). If you need something in between -- readable but more compact than hex -- Base32 exists, though I've rarely seen it used outside of specific protocols like TOTP[1].
## Citations
1. RFC 4648: The Base16, Base32, and Base64 Data Encodings. S. Josefsson, October 2006.
2. IBM: The IBM System/360. Retrieved March 4, 2026.
3. G. M. Amdahl, G. A. Blaauw, F. P. Brooks, Jr.: Architecture of the IBM System/360. IBM Journal of Research and Development, Vol. 8, No. 2, April 1964.
4. IBM: IBM System/360 Fortran IV Language, 1966. Referenced in historical accounts of hexadecimal notation standardization.
5. Bruce Alan Martin: Letters to the Editor -- On Binary Notation. Communications of the ACM, Vol. 11, No. 10, October 1968.
6. RFC 3548: The Base16, Base32, and Base64 Data Encodings. S. Josefsson, July 2003.
7. W3C: Cascading Style Sheets, level 1. W3C Recommendation, December 17, 1996. Retrieved March 4, 2026.
8. RFC 7042: IANA Considerations and IETF Protocol and Documentation Usage for IEEE 802 Parameters. Retrieved March 4, 2026.
9. IEEE: Standard for Local and Metropolitan Area Networks: Overview and Architecture. IEEE 802-2014. Retrieved March 4, 2026.
10. NIST: FIPS PUB 180-4 -- Secure Hash Standard (SHS). Retrieved March 4, 2026.
11. Brian W. Kernighan and Dennis M. Ritchie: The C Programming Language. Prentice Hall, 1978.
Updated: March 4, 2026