Mbox Viewer

Encoded-word (RFC 2047)

RFC 2047

An encoding scheme defined in RFC 2047 ("Encoded-Word") that allows non-ASCII characters in email header fields such as Subject and From, by encoding them as =?charset?encoding?text?= tokens.

RFC 5322 requires email header fields to contain only 7-bit US-ASCII characters. RFC 2047 provides a workaround: non-ASCII text in a header is represented as one or more "encoded words" of the form =?charset?B?...?= (Base64) or =?charset?Q?...?= (Q encoding, a variant of quoted-printable in which an underscore stands for a space). For example, a Japanese subject line might appear in the raw message as =?ISO-2022-JP?B?...?= and must be decoded before it can be displayed.
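As a small illustration of both encodings, the sketch below decodes one Base64 and one Q-encoded header value with Python's standard `email.header` module. The sample header values and the helper name `decode_header_value` are made up for this example; they do not come from any particular message or from Mbox Viewer.

```python
from email.header import decode_header, make_header


def decode_header_value(raw_value: str) -> str:
    """Decode any RFC 2047 encoded words in a raw header value."""
    # decode_header() splits the value into (bytes, charset) chunks;
    # make_header() reassembles them into a Header that str() renders as text.
    return str(make_header(decode_header(raw_value)))


# Illustrative sample values (assumptions, not taken from a real mailbox).
base64_subject = "=?UTF-8?B?44GT44KT44Gr44Gh44Gv?="   # Base64 ("B") encoding
q_subject = "=?ISO-8859-1?Q?Caf=E9_menu?="            # Q encoding

print(decode_header_value(base64_subject))  # こんにちは
print(decode_header_value(q_subject))       # Café menu
```

In the Q-encoded value, `=E9` is the ISO-8859-1 byte for "é" and the underscore decodes to a space.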

Without RFC 2047 decoding, subject lines and sender names containing accented characters, CJK characters, Arabic, or any other non-ASCII script appear as raw encoded strings that the end user cannot read. A correct implementation detects encoded-word tokens wherever they can legally appear in a header value, decodes each one using its declared charset and encoding, and discards the whitespace that separates two adjacent encoded words.
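To make the detect-and-decode step concrete, here is a minimal hand-rolled sketch in Python. It is an illustration under simplifying assumptions, not a complete RFC 2047 implementation and not Mbox Viewer's code: it finds tokens with a regular expression, decodes each with its declared charset and encoding, drops whitespace between adjacent encoded words as RFC 2047 specifies, and leaves a token untouched if its charset is unknown or its payload is malformed. The names `decode_header_value` and `_decode_word` are hypothetical.

```python
import base64
import quopri
import re

# One encoded word: =?charset?encoding?payload?=
WORD = r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?="
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")
# Whitespace separating two adjacent encoded words is not part of the text.
ADJACENT_GAP = re.compile(rf"({WORD})\s+(?={WORD})")


def _decode_word(match: re.Match) -> str:
    charset, encoding, payload = match.groups()
    charset = charset.split("*", 1)[0]  # drop an RFC 2231 "*language" suffix
    try:
        if encoding.upper() == "B":
            # Tolerate missing Base64 padding.
            padded = payload + "=" * (-len(payload) % 4)
            raw = base64.b64decode(padded)
        else:
            # header=True makes "_" decode to a space, as Q encoding requires.
            raw = quopri.decodestring(payload.encode("ascii"), header=True)
        return raw.decode(charset, errors="replace")
    except (LookupError, ValueError):
        # Unknown charset or malformed payload: keep the raw token.
        return match.group(0)


def decode_header_value(value: str) -> str:
    joined = ADJACENT_GAP.sub(r"\1", value)
    return ENCODED_WORD.sub(_decode_word, joined)


print(decode_header_value("Re: =?ISO-8859-1?Q?Caf=E9_menu?= today"))
print(decode_header_value("=?UTF-8?Q?a?= =?UTF-8?Q?b?="))
```

In practice a tested library routine, such as Python's `email.header.decode_header`, is usually a safer choice than a custom parser; the sketch only shows the moving parts.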

Mbox Viewer decodes RFC 2047 encoded words in all header fields when building its message list and search index. As a result, searching for a name written in its original script (for example, a Japanese sender name) matches correctly even though the underlying MBOX file stores the name in encoded form.
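The effect on search can be sketched in a few lines of Python. This is a hypothetical illustration of the idea, not a description of how Mbox Viewer is implemented: the `messages` list, the sample addresses, and the `search` helper are all invented for the example, and a real index would be built once rather than decoded on every query.

```python
from email.header import decode_header, make_header


def decoded(raw_value: str) -> str:
    """Return a header value with RFC 2047 encoded words decoded."""
    return str(make_header(decode_header(raw_value)))


def search(messages: list[dict[str, str]], query: str) -> list[dict[str, str]]:
    """Return messages whose decoded header values contain the query."""
    needle = query.casefold()
    return [
        msg for msg in messages
        if any(needle in decoded(value).casefold() for value in msg.values())
    ]


# Raw header values as they might be stored in an MBOX file (made-up samples).
messages = [
    {"From": "=?UTF-8?B?5bGx55Sw?= <yamada@example.com>", "Subject": "Meeting"},
    {"From": "Alice <alice@example.com>", "Subject": "=?ISO-8859-1?Q?Caf=E9_menu?="},
]

print(len(search(messages, "山田")))  # the first message matches
print(len(search(messages, "café")))  # the second message matches
```

Searching the raw strings for "山田" would find nothing, because the file contains only the Base64 text `5bGx55Sw`.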

