Email & MBOX glossary
Key terms for email archives, formats, protocols and the anatomy of a message — each with its own page, explained in plain language.
File formats
MBOX: A plain-text file format that stores multiple email messages concatenated together, each beginning with a "From " separator line. It is the format Google Takeout produces when you export your Gmail archive.
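The "From " separator is what lets a reader split one file back into individual messages. A minimal sketch using Python's standard `mailbox` module; the sample addresses, dates, and the `list_subjects` helper are made up for illustration:

```python
import mailbox
import os
import tempfile

# Two tiny messages in MBOX form; each starts with a "From " separator line.
SAMPLE = (
    "From alice@example.com Mon Jan  1 00:00:00 2024\n"
    "From: alice@example.com\n"
    "Subject: First\n"
    "\n"
    "Hello.\n"
    "\n"
    "From bob@example.com Tue Jan  2 00:00:00 2024\n"
    "From: bob@example.com\n"
    "Subject: Second\n"
    "\n"
    "Hi again.\n"
)

def list_subjects(mbox_text):
    """Write the text to a temporary file and return each message's Subject."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbox")
        with open(path, "w", newline="\n") as f:
            f.write(mbox_text)
        box = mailbox.mbox(path, create=False)
        try:
            # Iterating the mailbox yields one message per "From " separator.
            return [msg["Subject"] for msg in box]
        finally:
            box.close()

subjects = list_subjects(SAMPLE)
print(subjects)
```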
EML: A single-message file in MIME format, containing headers, body, and attachments. EML files are widely supported across email clients and are useful for archiving or sharing individual messages.
MSG: Microsoft Outlook's proprietary binary format for a single email message, storing headers, body, and attachments in a Compound File Binary Format (OLE structured storage) container. Unlike EML, MSG is not a plain-text format.
PST: Personal Storage Table, Microsoft Outlook's container file for an entire mailbox, including folders, messages, contacts, and calendar items. A closely related format, OST (Offline Storage Table), holds the local cache Outlook uses for offline sync with Exchange or Microsoft 365.
Maildir: A mailbox format that stores each email message as a separate file within a directory hierarchy, rather than concatenating all messages into a single file like MBOX.
Protocols & services
IMAP: Internet Message Access Protocol, the standard protocol for accessing email stored on a server, keeping messages synchronized across multiple devices without downloading and deleting them.
POP3: Post Office Protocol 3, an older email retrieval protocol that downloads messages from a server to a local device, typically removing them from the server afterward.
SMTP: Simple Mail Transfer Protocol, the standard protocol used to send and relay email messages between mail servers. It is used for outgoing mail only; reading email requires IMAP or POP3.
Google Takeout: Google's official service for exporting your personal data, including Gmail. For email, it produces one or more MBOX files containing all your messages and their Gmail labels.
Gmail labels: Gmail's tagging system that assigns one or more labels to each message, serving the role that folders play in traditional email clients. A single message can carry multiple labels simultaneously.
Message structure
Email headers: The structured metadata block at the beginning of an email message, containing fields like From, To, Subject, Date, and numerous technical fields that describe how the message was composed, routed, and encoded.
MIME: Multipurpose Internet Mail Extensions, the standard that defines how email messages encode non-ASCII text, HTML bodies, attachments, and other binary content within the plain-text structure of email.
Multipart: A MIME message structure that combines multiple content parts (such as plain text, HTML, and attachments) in a single message, each separated by a unique boundary string.
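To make the nesting concrete, here is a small sketch that builds a multipart message with Python's standard `email` package and lists its parts. The subject, body text, and attachment bytes are placeholders invented for the example:

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("Plain-text body")                       # text/plain part
msg.add_alternative("<p>HTML body</p>", subtype="html")  # text/html alternative
msg.add_attachment(
    b"%PDF-1.4 placeholder bytes",                       # not a real PDF
    maintype="application",
    subtype="pdf",
    filename="report.pdf",
)

# walk() visits the container parts first, then the parts they hold.
part_types = [part.get_content_type() for part in msg.walk()]
print(part_types)

# Serializing the message generates the boundary string that separates parts.
raw = msg.as_string()
print(raw.splitlines()[0:3])
```

The outer `multipart/mixed` container holds a `multipart/alternative` container (plain text plus HTML) and the attachment, which is a common layout for messages with both a rich body and files.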
Message-ID: An identifier, intended to be globally unique, assigned to each email message and carried in the Message-ID header. It is used to track messages, build conversation threads, and detect duplicates when merging archives.
Email headers (In-Reply-To and References) that link a reply to the message it responds to, enabling mail clients and archive tools to group related messages into conversation threads.
Envelope: The delivery metadata used by SMTP servers to route an email message, specifically the envelope sender (MAIL FROM) and envelope recipients (RCPT TO), which may differ from the visible From and To headers.
Attachment: A file (such as a PDF, image, or spreadsheet) embedded in an email message and encoded as a MIME part, separate from the message body, intended for the recipient to save or open.
Inline image: An image embedded directly into an HTML email body using a Content-ID (cid:) reference, rather than attached as a separate downloadable file. The image data is stored as a MIME part within the same message.
Encoding & charsets
Charset: The character encoding that specifies how bytes in a text part are mapped to readable characters. Common charsets include UTF-8, ISO-8859-1, and Shift_JIS; a mismatch causes garbled text known as mojibake.
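A short illustration of how a charset mismatch produces mojibake: the same bytes are decoded once with the wrong charset and once with the right one. The sample word is arbitrary.

```python
text = "café"
raw = text.encode("utf-8")           # the bytes as they would sit in the message

garbled = raw.decode("iso-8859-1")   # wrong charset: each byte becomes one character
correct = raw.decode("utf-8")        # right charset: the two-byte sequence becomes "é"

print(garbled)  # cafÃ©
print(correct)  # café
```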
Base64: A binary-to-text encoding scheme that represents arbitrary binary data using only 64 printable ASCII characters, widely used in email to safely transmit attachments and binary content.
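As a quick sketch with Python's standard `base64` module, two bytes that are not printable text become four ASCII characters and decode back unchanged. The input bytes are arbitrary.

```python
import base64

data = b"\x00\xff"                     # two bytes that are not printable text
encoded = base64.b64encode(data)       # ASCII-only representation
decoded = base64.b64decode(encoded)    # original bytes restored

print(encoded)  # b'AP8='  ("=" pads the final group to four characters)
print(decoded == data)
```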
Quoted-printable: A MIME transfer encoding that represents text with mostly ASCII characters, escaping non-ASCII bytes as =XX hex sequences. It keeps the majority of the text human-readable in the raw message source.
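For comparison, the same kind of round trip with Python's standard `quopri` module shows that the ASCII letters stay readable and only the non-ASCII bytes are escaped. The sample word is arbitrary.

```python
import quopri

raw = "café".encode("utf-8")
encoded = quopri.encodestring(raw)     # ASCII letters pass through untouched
decoded = quopri.decodestring(encoded)

print(encoded)  # b'caf=C3=A9'
print(decoded.decode("utf-8"))
```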
An encoding scheme defined in RFC 2047 ("Encoded-Word") that allows non-ASCII characters in email header fields such as Subject and From, by encoding them as =?charset?encoding?text?= tokens.
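A sketch of decoding such a token with Python's standard `email.header` module. The Subject value is a made-up example that uses the Q (quoted-printable style) variant.

```python
from email.header import decode_header, make_header

raw_subject = "=?UTF-8?Q?caf=C3=A9?="

# decode_header splits the value into (bytes, charset) pairs;
# make_header joins them back into readable text.
parts = decode_header(raw_subject)
subject = str(make_header(parts))

print(parts)
print(subject)  # café
```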
Concepts & features
Threading: The process of grouping related email messages into conversations by following In-Reply-To and References header links, typically using the JWZ algorithm, with the resulting conversation tree displayed with up to four levels of nesting.
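The core idea of linking replies to parents can be sketched in a few lines. This is a deliberately simplified illustration, not the full JWZ algorithm (it skips placeholder containers for missing messages, subject-based grouping, and loop detection); the dictionary layout and the `build_threads` name are assumptions made for the example.

```python
def build_threads(messages):
    """Link each message to a parent that exists in the archive.

    messages: list of dicts with keys 'id', 'in_reply_to', 'references'.
    Returns (roots, children), where children maps an id to its reply ids.
    """
    known = {m["id"] for m in messages}
    children = {m["id"]: [] for m in messages}
    roots = []
    for m in messages:
        # Try In-Reply-To first, then References from most recent to oldest.
        candidates = [m.get("in_reply_to")] + list(reversed(m.get("references", [])))
        parent = next((c for c in candidates if c in known and c != m["id"]), None)
        if parent is None:
            roots.append(m["id"])      # no known ancestor: starts a thread
        else:
            children[parent].append(m["id"])
    return roots, children

messages = [
    {"id": "<a@x>", "in_reply_to": None, "references": []},
    {"id": "<b@x>", "in_reply_to": "<a@x>", "references": ["<a@x>"]},
    # The direct parent is missing from the archive, so fall back to References.
    {"id": "<c@x>", "in_reply_to": "<missing@x>", "references": ["<a@x>", "<missing@x>"]},
]
roots, children = build_threads(messages)
print(roots, children)
```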
Deduplication: The process of detecting and removing duplicate email messages from an archive, typically by comparing Message-ID values, to avoid redundancy when merging multiple MBOX files.
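A minimal sketch of Message-ID comparison using Python's standard `email` package. The `deduplicate` helper and the sample messages are hypothetical; keeping messages that lack a Message-ID is a design choice made here so that nothing is dropped by mistake.

```python
from email import message_from_string

def deduplicate(raw_messages):
    """Keep the first message seen for each Message-ID."""
    seen = set()
    unique = []
    for raw in raw_messages:
        msg_id = message_from_string(raw)["Message-ID"]
        key = msg_id.strip() if msg_id else None
        if key is not None and key in seen:
            continue                   # same Message-ID already kept
        if key is not None:
            seen.add(key)
        unique.append(raw)             # messages without an ID are always kept
    return unique

archive_a = ["Message-ID: <1@x>\nSubject: A\n\nbody\n"]
archive_b = [
    "Message-ID: <1@x>\nSubject: A\n\nbody\n",   # duplicate of the message above
    "Message-ID: <2@x>\nSubject: B\n\nbody\n",
]
merged = deduplicate(archive_a + archive_b)
print(len(merged))
```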
A parsing technique that reads a file incrementally in small chunks rather than loading the entire file into memory at once, enabling tools to open and index very large MBOX files — tens or hundreds of gigabytes — with low memory usage.
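One way to picture this is a scanner that reads an MBOX one line at a time and records the byte offset of each "From " separator, so memory use depends on the longest line rather than on the file size. The `index_offsets` function and sample data are illustrative assumptions, not a description of any particular tool.

```python
import io

def index_offsets(stream):
    """Return the byte offset of every message start in a binary MBOX stream."""
    offsets = []
    pos = 0
    prev_blank = True                  # the start of the file counts as a boundary
    for line in stream:                # reads one line at a time
        if prev_blank and line.startswith(b"From "):
            offsets.append(pos)
        prev_blank = line in (b"\n", b"\r\n")
        pos += len(line)
    return offsets

data = (
    b"From a@x Mon\nSubject: A\n\nbody\n\n"
    b"From b@x Tue\nSubject: B\n\nbody\n"
)
offsets = index_offsets(io.BytesIO(data))
print(offsets)
```

With a real archive the same function can be given a file opened in binary mode, for example `open(path, "rb")`, and the recorded offsets can later be used with `seek()` to load a single message.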
A compact index file that Mbox Viewer writes alongside an MBOX archive after the first parse, storing message byte offsets and metadata to enable near-instant reopens without re-scanning the entire file.
HTML email: An email message whose body is formatted with HTML and CSS, allowing rich typography, layout, colors, and images. Most modern email is HTML, but it must be rendered carefully because it can reference remote content, such as tracking images, that raises privacy and security concerns.
Universal binary: A macOS application bundle that contains natively compiled code for both Apple Silicon (ARM64) and Intel (x86_64) architectures, so it runs on either processor without emulation.
App Sandbox is a macOS security feature that restricts an application's access to system resources, files, and network connections, limiting the potential impact if the app or its dependencies have a vulnerability.