JSON vs YAML vs MessagePack vs Protobuf: When to Use Each Format

JSON has no comments, no date type, and a number precision problem that caused Twitter to change their API. Here's how JSON compares to XML, YAML, MessagePack, and Protocol Buffers, when each format makes sense, and what JSON Schema adds.

JSON dominates data interchange, but it began as a subset of JavaScript syntax, and its limitations matter

JSON (JavaScript Object Notation) was extracted from JavaScript by Douglas Crockford around 2001 as a simple alternative to XML for data exchange. It was defined in RFC 4627 (2006) and later more precisely in RFC 8259 (2017). Its success was remarkable: it became the default data interchange format for web APIs, configuration files, and data pipelines.

But JSON has well-documented limitations — no comments, no type system beyond the basic six, no binary support, no schema enforcement — and understanding when alternatives like YAML, MessagePack, or Protocol Buffers are the better choice is practical knowledge for any developer working with data.

JSON's actual data types

JSON supports exactly six value types:

null
boolean (true or false)
number (no separate integer/float distinction in the spec)
string (UTF-8)
array (ordered list of values)
object (unordered set of key-value pairs, keys must be strings)

What JSON doesn't have:

Dates (dates are strings — there's no standard date format, though ISO 8601 is convention)
Binary data (binary must be Base64-encoded to a string)
Integers vs. floats (the number type covers both)
Comments
Undefined (JavaScript's undefined doesn't serialise to JSON)
Functions
Circular references (will fail to serialise)
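
A minimal Python sketch of how these gaps surface in practice, using only the standard library. The record fields are made up for illustration, and ISO 8601 plus Base64 are the conventions mentioned above rather than anything JSON itself mandates.

```python
import base64
import json
from datetime import datetime, timezone

# Hypothetical record containing two values JSON has no type for.
record = {
    "created": datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    "thumbnail": b"\x89PNG\r\n",
}

# Dates and bytes are not JSON types, so the default encoder refuses them.
try:
    json.dumps(record)
    raised = False
except TypeError:
    raised = True

# Workaround: convert to strings yourself before encoding.
encodable = {
    "created": record["created"].isoformat(),
    "thumbnail": base64.b64encode(record["thumbnail"]).decode("ascii"),
}
text = json.dumps(encodable)

# Circular references cannot be represented either.
loop = {}
loop["self"] = loop
try:
    json.dumps(loop)
    circular_error = None
except ValueError as exc:
    circular_error = str(exc)
```

The receiving side has to know which strings are dates and which are Base64, because nothing in the JSON text marks them as such.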

The number precision problem: the JSON specification puts no limit on the size or precision of numbers, but JavaScript, along with many other parsers, decodes every JSON number as an IEEE 754 double-precision float. The largest integer a double can represent safely is 2⁵³ − 1 (9,007,199,254,740,991); integers beyond that range can silently lose precision. Large IDs (common in databases using 64-bit integers) should therefore be transmitted as strings in JSON.

Twitter learned this the hard way: after developers found that JavaScript's JSON.parse() was silently rounding large numeric tweet IDs, the API began returning each ID as a string (id_str) alongside the numeric id field.
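
The rounding is easy to reproduce. The sketch below uses Python, whose json module keeps integers exact by default, so the parse_int hook is used to imitate a consumer that decodes every number as a double. The id_str field name follows the convention used by Twitter's API; the payload itself is hypothetical.

```python
import json

big_id = 2**53 + 1  # 9007199254740993, one past the safe integer limit

# A double cannot tell this value apart from its neighbour 2**53.
collides = float(big_id) == float(2**53)

payload = json.dumps({"id": big_id})

# Python keeps the integer exact...
exact = json.loads(payload)["id"]

# ...but a consumer that stores numbers as doubles does not.
lossy = json.loads(payload, parse_int=float)["id"]

# Sending the ID as a string as well sidesteps the problem for every consumer.
safe_payload = json.dumps({"id": big_id, "id_str": str(big_id)})
```

Here exact is still 9007199254740993, while lossy has become 9007199254740992.0.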

JSON Schema: adding type enforcement

JSON has no built-in type validation, but JSON Schema, a separate specification, lets you describe the structure a JSON document must have and validate documents against it. Common uses:

API request/response validation
Configuration file validation
Documentation generation from schema definitions

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "minLength": 1
    },
    "age": {
      "type": "integer",
      "minimum": 0,
      "maximum": 150
    },
    "email": {
      "type": "string",
      "format": "email"
    }
  },
  "required": ["name", "email"]
}

Validation libraries exist for most languages: jsonschema (Python), ajv (JavaScript/Node.js), Newtonsoft.Json.Schema (.NET).
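
To make the keywords in the schema above concrete, here is a hand-rolled Python sketch of the checks a validator performs for that one schema. It is an illustration only, not a substitute for a real validation library: the check_user function is hypothetical, it hard-codes this schema, and it simplifies "integer" by rejecting floats such as 30.0.

```python
def check_user(doc):
    """Return a list of problems found in doc; an empty list means it passed."""
    if not isinstance(doc, dict):
        return ["document must be an object"]
    problems = []

    # "required": ["name", "email"]
    for key in ("name", "email"):
        if key not in doc:
            problems.append(f"missing required property: {key}")

    # "name": string with minLength 1
    if "name" in doc:
        if not isinstance(doc["name"], str):
            problems.append("name must be a string")
        elif len(doc["name"]) < 1:
            problems.append("name must not be empty")

    # "age": integer between 0 and 150. bool is excluded explicitly because
    # Python treats True/False as ints, while JSON Schema does not.
    if "age" in doc:
        age = doc["age"]
        if isinstance(age, bool) or not isinstance(age, int):
            problems.append("age must be an integer")
        elif not 0 <= age <= 150:
            problems.append("age must be between 0 and 150")

    # "email": only the type is checked here; in draft 2020-12 "format" is an
    # annotation unless the validator is configured to assert it.
    if "email" in doc and not isinstance(doc["email"], str):
        problems.append("email must be a string")

    return problems
```

Note that age is optional in the schema, so a document without it passes; only name and email are required.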

JSON vs. XML: the context where JSON won

XML (eXtensible Markup Language) dominated data interchange in the early 2000s. JSON replaced it in most web API contexts for reasons that are straightforward in retrospect:

Verbosity: XML requires closing tags. <name>John</name> vs "name": "John". The same data is usually larger as XML than as JSON, though the exact ratio depends on the document.

Developer experience: JSON maps directly to native data structures in JavaScript, Python, Ruby, and most modern languages. Parsing XML requires an XML library; parsing JSON is built into JavaScript natively.

Simplicity: JSON has six types. XML has a richer but more complex model — attributes, elements, namespaces, schemas (XSD), transforms (XSLT), queries (XPath/XQuery), and more.

Where XML is still appropriate:

Documents with mixed content (text with embedded formatting)
Contexts requiring comments in the data itself (though JSON5 and JSONC address this)
Systems already using XML-based standards (SOAP, RSS, Atom, SVG, XHTML)
Configurations requiring complex document validation (DocBook, DITA)

YAML: the human-friendly alternative

YAML (YAML Ain't Markup Language) has been designed as a superset of JSON since version 1.2, so nearly all valid JSON is also valid YAML. On top of JSON it adds:

Comments (#)
Multi-line strings
Anchors and aliases (reference reuse)
More readable syntax for nested structures

# Configuration file (YAML)
database:
  host: localhost
  port: 5432
  name: mydb
  
features:
  - search
  - notifications
  - analytics

Where YAML is used: configuration files (Kubernetes, Docker Compose, GitHub Actions, Ansible), CI/CD pipeline definitions. YAML is less suitable for machine-to-machine data exchange due to parsing complexity.

YAML's pitfall: ambiguous type coercion. YAML 1.1 parsers treat unquoted yes, no, on, off, true, and false (in several capitalisations) as booleans. The best-known casualty is the country code NO (Norway): a perfectly valid string that is read as false unless it is quoted, which has caused subtle bugs in real configuration files. YAML 1.2 restricts booleans to true and false, but YAML 1.1 parsers remain in wide use.
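
One defensive habit when generating YAML is to quote any string that a YAML 1.1 parser could resolve as a boolean. The Python sketch below lists the tokens from the YAML 1.1 boolean type; the quote_if_ambiguous helper is hypothetical, and some parsers recognise only a subset of these tokens, in which case the extra quotes are harmless.

```python
# Plain scalars that the YAML 1.1 boolean type resolves to true or false.
YAML11_BOOLEANS = {
    "y", "Y", "yes", "Yes", "YES", "n", "N", "no", "No", "NO",
    "true", "True", "TRUE", "false", "False", "FALSE",
    "on", "On", "ON", "off", "Off", "OFF",
}

def quote_if_ambiguous(value: str) -> str:
    """Wrap value in double quotes if YAML 1.1 could read it as a boolean."""
    if value in YAML11_BOOLEANS:
        return f'"{value}"'
    return value

country_codes = ["DE", "NO", "SE"]
yaml_lines = [f"- {quote_if_ambiguous(code)}" for code in country_codes]
```

With this guard the Norway entry is emitted as - "NO", which every YAML version reads as a string.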

MessagePack: binary JSON

MessagePack encodes a JSON-like data model (plus a native binary type) in a compact binary format instead of text. Payloads are often in the region of 20–50% smaller than the equivalent JSON, depending on the data, and decoding works at the byte level, so there is no text parsing overhead.

import json
import msgpack  # third-party package: pip install msgpack

data = {"name": "Alice", "age": 30, "active": True}

json_bytes = json.dumps(data).encode()  # 44 bytes with default separators
msgpack_bytes = msgpack.packb(data)     # 25 bytes
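
Where the saving comes from is easier to see by encoding the same dictionary by hand. The Python sketch below implements only the few MessagePack type codes this example needs (nil, booleans, small non-negative integers, short strings, small maps); pack_small is a hypothetical helper for illustration, not a replacement for the msgpack library.

```python
def pack_small(value):
    """Encode a tiny subset of the MessagePack format (illustration only)."""
    if value is True:
        return b"\xc3"
    if value is False:
        return b"\xc2"
    if value is None:
        return b"\xc0"
    if isinstance(value, int) and 0 <= value <= 127:
        return bytes([value])                      # positive fixint: 1 byte
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw  # fixstr: 1-byte header
    if isinstance(value, dict) and len(value) <= 15:
        out = bytes([0x80 | len(value)])           # fixmap: 1-byte header
        for key, item in value.items():
            out += pack_small(key) + pack_small(item)
        return out
    raise ValueError(f"outside the subset handled by this sketch: {value!r}")

encoded = pack_small({"name": "Alice", "age": 30, "active": True})
```

Each string costs one header byte instead of two quote characters, the number 30 and the value true take one byte each, and the colons, commas, and braces disappear entirely.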

When to use MessagePack: high-throughput APIs where bandwidth and parsing speed matter, inter-service communication where human readability isn't required, storing large volumes of structured data.

Protocol Buffers: schema-first serialisation

Protocol Buffers (protobuf), developed by Google, take a fundamentally different approach: define your schema first, then generate serialisation/deserialisation code.

syntax = "proto3";

message User {
  string name = 1;
  int32 age = 2;
  bool active = 3;
}

The compiled code handles serialisation to a compact binary format. Advantages:

Smaller payload than JSON (often several times smaller, depending on the data)
Faster serialisation/deserialisation
Schema enforcement at compile time
Backward/forward compatibility with field numbering
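
The role of field numbers becomes clearer when the User message is encoded by hand. The Python sketch below follows the protobuf wire format for the three fields above: each field is written as a tag (field number and wire type packed into a varint) followed by its value. It is a simplified illustration with assumptions: the helper names are hypothetical, age must be non-negative, and unlike real proto3 encoders it does not skip fields that hold default values.

```python
def varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)

def tag(field_number: int, wire_type: int) -> bytes:
    """Pack the field number and wire type into a single varint."""
    return varint((field_number << 3) | wire_type)

def encode_user(name: str, age: int, active: bool) -> bytes:
    raw_name = name.encode("utf-8")
    return (
        tag(1, 2) + varint(len(raw_name)) + raw_name  # string name = 1
        + tag(2, 0) + varint(age)                      # int32 age = 2
        + tag(3, 0) + varint(int(active))              # bool active = 3
    )

wire = encode_user("Alice", 30, True)
```

The field names never appear on the wire, only their numbers, which is why the result is 11 bytes and why renaming a field is safe while renumbering it is not.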

Where protobuf is used: Google's internal APIs, gRPC (Google's RPC framework), performance-critical microservices.

Trade-offs: requires schema files and code generation step; binary format is not human-readable without tooling.

JSON5 and JSONC: extending JSON with comments

Some configuration contexts need comments and other conveniences that JSON doesn't support. Two non-standard extensions address this:

JSON5: extends JSON with comments, trailing commas, unquoted keys, single-quoted strings, and more. Used in some configuration tools.

JSONC (JSON with Comments): a minimal extension adding just comment support. Used by VS Code's settings files.

Neither is valid JSON — they require their own parsers.
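
If only comments are needed, one common approach is to strip them before handing the text to a standard JSON parser. The Python sketch below does this with a regular expression that matches strings first, so comment markers inside string values are left alone. The loads_jsonc name and the settings are hypothetical, and the sketch handles comments only, not trailing commas or the other JSON5 extensions.

```python
import json
import re

# Match a JSON string first so that // or /* inside a string is not touched.
_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*|/\*.*?\*/', re.DOTALL)

def loads_jsonc(text: str):
    """Parse JSON that may contain // and /* */ comments (sketch)."""
    def keep_strings(match):
        token = match.group(0)
        # Strings are kept as-is; comments are replaced by a single space.
        return token if token.startswith('"') else " "
    return json.loads(_TOKEN.sub(keep_strings, text))

settings = loads_jsonc('''
{
  // editor preferences
  "fontSize": 14,
  "docs": "https://example.com/a//b", /* the slashes in the URL are kept */
  "wrap": true
}
''')
```

Because multi-line comments are collapsed, line numbers in any parse error may no longer match the original file.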

How to use the JSON Formatter on sadiqbd.com

Paste your JSON — minified or formatted
Format/Pretty print — adds indentation and line breaks for readability
Minify — removes whitespace, reducing size for transmission
Validate — syntax errors are highlighted with line numbers
Use for debugging — paste raw API responses to inspect their structure
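
The same three operations can be scripted with Python's standard json module, which is handy in build scripts or tests. This is a minimal sketch of the equivalent steps, not a description of how the online tool is implemented; the sample input and the validate helper are made up for illustration.

```python
import json

raw = '{"user":{"name":"Alice","tags":["a","b"]},"ok":true}'

parsed = json.loads(raw)

# Pretty print: indentation and line breaks for readability.
pretty = json.dumps(parsed, indent=2)

# Minify: drop the optional whitespace after separators.
minified = json.dumps(parsed, separators=(",", ":"))

def validate(text: str):
    """Return None if text is valid JSON, else (line, column, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return (err.lineno, err.colno, err.msg)
    return None

# The value for "age" is missing, so the parser stops at the closing brace.
error = validate('{\n  "name": "Alice",\n  "age": \n}')
```

On Python 3.7 and later, dictionaries preserve insertion order, so the key order of the input survives the round trip.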

Frequently Asked Questions

Why doesn't JSON support comments? Crockford has said he removed comments because people were using them to hold parsing directives, a practice that would have destroyed interoperability. His suggested workaround is to write comments anyway and strip them with a minifier before passing the file to a JSON parser. Either way, the omission is baked into the spec.

When should I use JSON vs. YAML for configuration files? YAML for human-edited configuration files (more readable, supports comments). JSON for machine-generated config (simpler, no ambiguity, universally supported). Both are fine; team preference often determines the choice.

Is the JSON Formatter free? Yes — completely free, no sign-up required.

JSON's simplicity is what made it win — six types, no schema, trivially parseable in any language. Its limitations are the price of that simplicity. Knowing when those limitations matter and which alternative addresses them is the practical skill that complements the tool.

Try the JSON Formatter free at sadiqbd.com — pretty print, minify, and validate any JSON instantly.