Charset

The character encoding that specifies how bytes in a text part are mapped to readable characters. Common charsets include UTF-8, ISO-8859-1, and Shift_JIS; a mismatch causes garbled text known as mojibake.

Every text part of an email has a character encoding — a rule that maps byte values to characters. The charset is declared in the Content-Type header, for example: Content-Type: text/plain; charset="UTF-8". UTF-8 is the dominant encoding today because it can represent every character in Unicode, but older messages may use regional encodings such as ISO-8859-1 (Western European), ISO-2022-JP (Japanese), GB2312 (Simplified Chinese), or Windows-1252.
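As an illustration of how a declared charset is read and applied, here is a minimal sketch using Python's standard email package. The sample message is invented for the example, and this is not a description of how any particular mail client implements the step.

```python
from email import message_from_bytes
from email.policy import default

# A made-up message whose body is the ISO-8859-1 encoding of "café".
raw = (
    b'Content-Type: text/plain; charset="ISO-8859-1"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)

msg = message_from_bytes(raw, policy=default)

# The charset parameter of Content-Type, lowercased by the library.
declared_charset = msg.get_content_charset()

# The raw body bytes, then the text obtained by applying the declared charset.
body_bytes = msg.get_payload(decode=True)
body_text = body_bytes.decode(declared_charset).strip()
```

The same bytes decoded as UTF-8 would raise an error, since a lone 0xE9 byte is not valid UTF-8.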

When an email is displayed with the wrong charset, characters outside the basic ASCII range are rendered as nonsense symbols, a phenomenon known as mojibake (from Japanese, roughly "character transformation"). This happens when a message declares one charset but its bytes are decoded with another, when the sender labeled the message with a charset it was not actually written in, or when no charset is declared and the reader guesses incorrectly.
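Mojibake is easy to reproduce. The short Python sketch below encodes a word as UTF-8 and then decodes the same bytes as ISO-8859-1, the kind of mismatch described above; the sample word is arbitrary.

```python
original = "café"

# "é" becomes the two bytes 0xC3 0xA9 in UTF-8.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as ISO-8859-1 turns each byte into its own character.
garbled = utf8_bytes.decode("iso-8859-1")

print(garbled)  # cafÃ©
```

The ASCII letters survive because both encodings agree on that range; only the non-ASCII character is corrupted.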

Mbox Viewer detects the charset declaration from the MIME headers and applies the correct decoder for each message part. For messages that omit a charset declaration, the app applies heuristic detection to identify the encoding from the byte patterns, reducing mojibake in archives that contain mail from older or non-standard clients.
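The general idea of "use the declaration, otherwise guess" can be sketched in a few lines of Python. This is a deliberately simple illustration and an assumption on our part, not Mbox Viewer's actual detection logic: the function name decode_text_part is hypothetical, and the only heuristic is "try strict UTF-8, then fall back to Windows-1252".

```python
from typing import Optional, Tuple


def decode_text_part(data: bytes, declared: Optional[str] = None) -> Tuple[str, str]:
    """Return (text, charset_used) for the raw bytes of a text part.

    Hypothetical helper: tries the declared charset first, then strict
    UTF-8, and finally Windows-1252 with replacement characters.
    """
    candidates = []
    if declared:
        candidates.append(declared)
    candidates.append("utf-8")

    for name in candidates:
        try:
            return data.decode(name), name
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: move on to the next candidate.
            continue

    # Last resort: never raises, undefined bytes become U+FFFD.
    return data.decode("windows-1252", errors="replace"), "windows-1252"


text, used = decode_text_part(b"caf\xe9")  # no declaration, not valid UTF-8
```

Strict UTF-8 is a useful first guess because text in other 8-bit encodings rarely happens to form valid UTF-8 sequences. Real detectors go further and score byte patterns against many candidate encodings.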

Related terms

Encoded-word (RFC 2047)

An encoding scheme defined in RFC 2047 that allows non-ASCII characters in email header fields such as Subject and From, by encoding them as =?charset?encoding?encoded-text?= tokens, where encoding is B (Base64) or Q (a variant of quoted-printable).
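For illustration, Python's standard email.header module can decode such tokens. The two header values below are invented samples, one using Q encoding and one using B encoding.

```python
from email.header import decode_header, make_header

# Q encoding: "=E9" is the ISO-8859-1 byte for "é".
q_subject = "=?ISO-8859-1?Q?caf=E9?="
# B encoding: Base64 of the UTF-8 bytes of "café".
b_subject = "=?UTF-8?B?Y2Fmw6k=?="

# decode_header splits the value into (bytes, charset) pairs;
# make_header reassembles them into readable text.
q_text = str(make_header(decode_header(q_subject)))
b_text = str(make_header(decode_header(b_subject)))
```

Both values decode to the same word, which shows that the charset named inside the token, not the rest of the message, governs how a header is read.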

MIME

Multipurpose Internet Mail Extensions — the standard that defines how email messages encode non-ASCII text, HTML bodies, attachments, and other binary content within the plain-text structure of email.
