
JSON vs YAML vs XML: Which Data Format to Use

Side-by-side comparison of JSON, YAML, and XML with the same payload, data-type support, and concrete recommendations for when to pick each.

JSON, YAML, and XML solve the same broad problem (encoding structured data as text), but their syntax, ecosystems, and trade-offs differ enough that picking the wrong one costs you in tooling, readability, or security. This guide compares them head-to-head and gives a clear rule for when to reach for each.

The three formats at a glance

Property              | JSON                 | YAML                         | XML
----------------------|----------------------|------------------------------|---------------------------
Primary use today     | APIs, wire format    | Config, IaC, CI pipelines    | Documents, legacy systems
Comments              | No                   | Yes (#)                      | Yes (<!-- -->)
Schema                | JSON Schema (add-on) | JSON Schema (via converters) | XSD, RELAX NG, DTD
Anchors / references  | No                   | Yes (&, *)                   | Limited (via DTD)
Attributes vs content | No distinction       | No distinction               | Yes (attributes + text)
Parsing complexity    | Trivial              | High (whitespace, types)     | High (namespaces, DTD)
Human writability     | OK                   | Best                         | Worst
File size (typical)   | Small                | Smallest                     | Largest

The same payload in three formats

A user record, in JSON:

{
  "id": 42,
  "name": "Ada Lovelace",
  "active": true,
  "tags": ["admin", "founder"]
}

Same record, in YAML:

id: 42
name: Ada Lovelace
active: true
tags:
  - admin
  - founder

Same record, in XML:

<user id="42" active="true">
  <name>Ada Lovelace</name>
  <tags>
    <tag>admin</tag>
    <tag>founder</tag>
  </tags>
</user>

Notice a structural choice that XML forces on you and that JSON and YAML do not: id and active can be either attributes of the element or child elements. That flexibility is why XML mappings are bespoke per project.
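To make the "bespoke mapping" point concrete, here is a minimal sketch that reads the XML record above with Python's standard xml.etree.ElementTree and rebuilds the same shape as the JSON record. The function name user_from_xml is hypothetical, and the decision to treat id and active as attributes is exactly the kind of per-project convention the paragraph describes.

```python
import json
import xml.etree.ElementTree as ET

XML_USER = """\
<user id="42" active="true">
  <name>Ada Lovelace</name>
  <tags>
    <tag>admin</tag>
    <tag>founder</tag>
  </tags>
</user>
"""

def user_from_xml(xml_text: str) -> dict:
    """Map the <user> element onto the shape of the JSON record (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        # Attributes always arrive as strings; the mapping decides the real type.
        "id": int(root.get("id")),
        "name": root.findtext("name"),
        "active": root.get("active") == "true",
        # Each <tag> child becomes one list item.
        "tags": [tag.text for tag in root.iter("tag")],
    }

record = user_from_xml(XML_USER)
print(json.dumps(record))
```

If a different producer emitted <id>42</id> as a child element instead, this function would need to change, which is why the mapping cannot be generated blindly.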

Data types

  • JSON has six types: string, number, boolean, null, object, array. No date, no binary.
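  • YAML has the JSON set plus timestamps, plus implicit typing: under YAML 1.1 rules, unquoted 1.0, 1, yes, no, on, off, null, and ~ all resolve to non-string values. This is the famous Norway problem: the country code NO becomes the boolean false unless quoted. YAML 1.2 narrowed booleans to true and false, but many widely used parsers still follow the 1.1 rules.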
  • XML has no types in the base spec — everything is text. XSD adds a type system, but it is a separate document and rarely used outside enterprise.
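The Norway problem is easier to see with a toy resolver. The sketch below is not a YAML parser. It is a deliberately simplified, hypothetical model of how YAML 1.1 style implicit typing classifies an unquoted scalar, and it covers only a subset of the spellings; real parsers differ in detail.

```python
import re

# Hypothetical, simplified model of YAML 1.1 implicit typing for unquoted
# scalars. It covers only a subset of the boolean spellings and the simplest
# number forms.
BOOL_WORDS = re.compile(
    r"yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF"
)
TRUE_WORDS = {"yes", "true", "on"}

def resolve_plain_scalar(text: str):
    """Return the value an unquoted scalar would take under these rules."""
    if text in ("~", "null", "Null", "NULL", ""):
        return None
    if BOOL_WORDS.fullmatch(text):
        return text.lower() in TRUE_WORDS
    if re.fullmatch(r"[-+]?[0-9]+", text):
        return int(text)
    if re.fullmatch(r"[-+]?[0-9]+\.[0-9]*", text):
        return float(text)
    return text  # everything else stays a string

for raw in ["NO", "Norway", "on", "1", "1.0", "~"]:
    print(repr(raw), "->", repr(resolve_plain_scalar(raw)))
```

Quoting the value ("NO") sidesteps the resolver entirely, which is why the usual advice is to quote any string that could be mistaken for another type.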

For schema-less exchange where basic types matter, JSON wins on simplicity. If you need to model rich types, lean on a schema (JSON Schema, OpenAPI, or XSD).
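As a small illustration of "no date" in JSON, the following sketch uses Python's standard json module. The event record and the encode_extra helper are made up for the example. The common convention shown here, an ISO 8601 string, only works if both sides agree on it, which is the gap a schema fills.

```python
import json
from datetime import date

# Hypothetical record containing a value JSON has no type for.
event = {"name": "launch", "day": date(2024, 5, 17)}

try:
    json.dumps(event)
except TypeError as exc:
    print("not serialisable:", exc)

def encode_extra(value):
    """Fallback encoder: write dates as ISO 8601 strings (a convention, not a JSON type)."""
    if isinstance(value, date):
        return value.isoformat()
    raise TypeError(f"unsupported type: {type(value).__name__}")

wire = json.dumps(event, default=encode_extra)
decoded = json.loads(wire)

# The receiver gets a plain string and must know to parse it back.
restored = date.fromisoformat(decoded["day"])
print(wire, type(decoded["day"]).__name__, restored)
```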

Comments, anchors, and namespaces

JSON has none of these. YAML has comments and anchors, but no namespaces:

defaults: &defaults
  region: us-east-1
  timeout: 30

prod:
  <<: *defaults
  bucket: prod-bucket

staging:
  <<: *defaults
  bucket: staging-bucket

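Anchors are the killer feature for hand-edited config; together with comments, they are a big part of why YAML is the dominant choice for Kubernetes manifests, GitHub Actions workflows, and Ansible playbooks. The << merge key used here comes from YAML 1.1 and is not part of the YAML 1.2 core schema, so confirm that your parser supports it.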
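For readers unfamiliar with the merge key, this sketch shows the data a loader that supports << would be expected to produce for the YAML above, written out by hand as Python dictionaries. It does not parse YAML. It only illustrates that the anchor is expanded into copies, so the JSON form repeats the shared values.

```python
import json

# Hand-written equivalent of the YAML above after anchors and merge keys
# are resolved (assumes a loader that supports the << merge key).
defaults = {"region": "us-east-1", "timeout": 30}

config = {
    "defaults": defaults,
    # "<<: *defaults" behaves like spreading the defaults into the mapping.
    "prod": {**defaults, "bucket": "prod-bucket"},
    "staging": {**defaults, "bucket": "staging-bucket"},
}

as_json = json.dumps(config, indent=2)
print(as_json)
```

The JSON output contains the region and timeout three times, once per mapping, because JSON has no way to say "same as defaults".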

XML's superpower is namespaces and a document/mixed-content model — useful for things like SOAP, SVG, and word-processing documents where text and markup interleave, but overkill for plain records.
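A short sketch with Python's standard xml.etree.ElementTree shows both ideas. The mixed-content fragment and the tiny SVG document are made-up inputs. Only the SVG namespace URI is the real one.

```python
import xml.etree.ElementTree as ET

# Mixed content: text and markup interleave inside one element.
p = ET.fromstring("<p>Hello <b>world</b>!</p>")
bold = p[0]
# ElementTree splits the text around the child: text before, text inside, text after.
pieces = [p.text, bold.text, bold.tail]
plain = "".join(p.itertext())
print(pieces, plain)

# Namespaces: the element name is qualified by a URI, not just a local name.
SVG_NS = "http://www.w3.org/2000/svg"
svg = ET.fromstring(f'<svg xmlns="{SVG_NS}"><rect width="10" height="5"/></svg>')
rect = svg.find(f"{{{SVG_NS}}}rect")
print(rect.tag, rect.get("width"))
```

The text/tail split has no natural counterpart in a JSON object or YAML mapping, which is the practical meaning of "mixed content".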

Size, parsing speed, and security

JSON parsers are fast: the grammar is tiny, and engines such as V8 ship a heavily optimised native JSON.parse. YAML parsers must implement a much larger grammar and are correspondingly slower; pathological documents have caused real CVEs (the "billion laughs" attack works in YAML and XML, never in JSON, because JSON has no aliases or entities to expand).

XML's external entities have been a persistent security issue (XXE). If you parse XML from untrusted sources, disable external entity resolution explicitly.
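How you disable entity resolution depends on the parser. As one conservative sketch in Python's standard library, the hypothetical helper below goes a step further than turning off external entities: it rejects any document that declares a DOCTYPE at all, which rules out both XXE and entity-expansion payloads. The names parse_untrusted and DoctypeForbidden are invented for this example. The document is parsed twice, which is acceptable for a sketch but not tuned for speed.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

class DoctypeForbidden(ValueError):
    """Raised when untrusted XML declares a DOCTYPE (hypothetical exception)."""

def parse_untrusted(xml_text: str) -> ET.Element:
    """Parse XML only if it contains no DOCTYPE declaration."""
    def reject_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    # First pass: a bare expat parser whose only job is to spot a DOCTYPE.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = reject_doctype
    checker.Parse(xml_text, True)

    # Second pass: build the tree now that the document is known to be DTD-free.
    return ET.fromstring(xml_text)

BOMB = """<!DOCTYPE lolz [
  <!ENTITY a "lol">
  <!ENTITY b "&a;&a;&a;">
]>
<lolz>&b;</lolz>"""

try:
    parse_untrusted(BOMB)
except DoctypeForbidden as exc:
    print("rejected:", exc)

print(parse_untrusted("<user><name>Ada</name></user>").findtext("name"))
```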

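For pure size, well-written YAML is usually smallest, JSON in the middle, and XML largest, sometimes by a factor of two on the same data. Minified JSON narrows the gap with YAML on deeply nested data, because YAML pays for indentation at every level.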
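The size claim is easy to check for the user record from earlier. This sketch measures the byte length of each encoding. The YAML and XML texts are copied from the examples above as string literals, so no YAML library is needed. A single small record is only one data point, so treat the ordering as indicative rather than general.

```python
import json

record = {"id": 42, "name": "Ada Lovelace", "active": True, "tags": ["admin", "founder"]}

# The YAML and XML encodings from the examples above, as plain text.
yaml_text = "id: 42\nname: Ada Lovelace\nactive: true\ntags:\n  - admin\n  - founder\n"
xml_text = (
    '<user id="42" active="true">\n'
    "  <name>Ada Lovelace</name>\n"
    "  <tags>\n"
    "    <tag>admin</tag>\n"
    "    <tag>founder</tag>\n"
    "  </tags>\n"
    "</user>\n"
)

sizes = {
    "yaml": len(yaml_text.encode("utf-8")),
    "json (compact)": len(json.dumps(record, separators=(",", ":")).encode("utf-8")),
    "json (pretty)": len(json.dumps(record, indent=2).encode("utf-8")),
    "xml": len(xml_text.encode("utf-8")),
}

# Print the encodings from smallest to largest.
for name, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{name:15} {size} bytes")
```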

When to pick which

  • HTTP APIs and wire formats — JSON. Universal support, fast parsers, no surprises.
  • Configuration that humans hand-edit — YAML. Comments and anchors pay for themselves.
  • Build/CI pipeline definitions — YAML, because the ecosystem (GitHub Actions, GitLab CI, CircleCI) standardised on it.
  • Document-oriented data with mixed content — XML. Think DOCX, SVG, RSS, SOAP. JSON cannot represent Hello <b>world</b>! cleanly.
  • Greenfield enterprise integration — JSON with OpenAPI. XML/SOAP only if a partner forces it.
  • Anything machine-to-machine, high volume — JSON, or move to a binary format like Protocol Buffers or MessagePack.

Converting between them

In practice you will need to move data between formats: pulling a YAML config into a JSON API request, or accepting XML from a legacy partner and re-emitting JSON. The conversions are mostly mechanical when types are simple.
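As an example of a mechanical conversion, the sketch below turns the JSON user record into XML with Python's standard library. The to_element helper and its conventions are assumptions for illustration: every key becomes a child element, list items are wrapped in a hypothetical <item> tag, and keys are assumed to be valid XML names.

```python
import json
import xml.etree.ElementTree as ET

def to_element(tag: str, value) -> ET.Element:
    """Convert JSON-style data to an element tree (hypothetical conventions)."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_element(key, child))
    elif isinstance(value, list):
        for item in value:
            elem.append(to_element("item", item))
    elif isinstance(value, bool):
        # Checked before the generic case so True becomes "true", not "True".
        elem.text = "true" if value else "false"
    elif value is not None:
        elem.text = str(value)
    return elem

payload = json.loads(
    '{"id": 42, "name": "Ada Lovelace", "active": true, "tags": ["admin", "founder"]}'
)
xml_text = ET.tostring(to_element("user", payload), encoding="unicode")
print(xml_text)
```

Note that the result uses child elements for id and active, unlike the hand-written XML earlier, which used attributes. Both are valid encodings of the same record.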

The lossy edges are: YAML's anchors are expanded into copies on conversion, XML's attribute/element distinction collapses, and a single child element in XML is indistinguishable from a one-item array. Document your conversion conventions.
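The array ambiguity can be shown in a few lines. This sketch uses a hypothetical to_data converter that ignores attributes and turns repeated child tags into lists. Without extra information it cannot tell a one-item list from a single value, so a list_tags convention is passed in to settle it.

```python
import xml.etree.ElementTree as ET

def to_data(elem: ET.Element, list_tags=frozenset()):
    """Naive XML-to-dict conversion (hypothetical helper; attributes are dropped)."""
    if len(elem) == 0:
        return elem.text
    out = {}
    for child in elem:
        value = to_data(child, list_tags)
        if child.tag in list_tags:
            # Convention: these tags are always lists, even with one item.
            out.setdefault(child.tag, []).append(value)
        elif child.tag in out:
            # A repeated tag is promoted to a list on its second occurrence.
            existing = out[child.tag]
            if not isinstance(existing, list):
                out[child.tag] = existing = [existing]
            existing.append(value)
        else:
            out[child.tag] = value
    return out

one = ET.fromstring("<tags><tag>admin</tag></tags>")
two = ET.fromstring("<tags><tag>admin</tag><tag>founder</tag></tags>")

print(to_data(one))            # a bare string for the single tag
print(to_data(two))            # a list once the tag repeats
print(to_data(one, {"tag"}))   # a one-item list once the convention is stated
```

Consumers of the converted JSON would see the type of tag change with the number of items unless the convention is written down and applied consistently.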

Next steps