
Comparing Two JSON Files: A Diff Guide

Why text-diffing JSON is unreliable and how to compute a structural diff — key-order independence, array identity, JSON Patch output.

Two JSON documents that look different in a text diff might be semantically identical. Two that look identical might have a subtle mismatch buried in an array. JSON's tree structure means a useful diff has to operate on the parsed structure, not on the text. This guide explains why text-diffing JSON fails, how a structural diff works, and how to choose a strategy for arrays — the one place even structural diffs disagree.

Why text-diffing JSON fails

git diff and diff -u are line-based text diffs. They show you that two strings of bytes differ. For JSON they fail in two ways:

Key order. JSON objects are unordered, but serialisers have to pick an order: JSON.stringify writes string keys in insertion order (integer-like keys come first, in ascending order). Two correct payloads with the same data can serialise to different byte sequences:

{"a": 1, "b": 2}
{"b": 2, "a": 1}

A text diff reports the line as changed. A structural diff says "identical".

Formatting. Different indentation, trailing newlines, escaped vs unescaped non-ASCII, spaces after colons — all change the bytes without changing the meaning. See minify vs prettify JSON for the whitespace dimension.

The text diff isn't wrong — it's answering a different question. "Are these bytes identical" and "does this JSON have the same meaning" are not the same question.
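The two questions can be put side by side in a few lines. This is a minimal sketch using Python's standard json module; the variable names are illustrative only, and the second payload is the key-order example from above with some extra whitespace added.

```python
import json

a_text = '{"a": 1, "b": 2}'
b_text = '{"b": 2,\n  "a": 1}\n'

# Byte-level comparison: the question a text diff answers.
same_bytes = a_text == b_text

# Structural comparison: parse first, then compare the values.
# Python dict equality ignores key order.
same_meaning = json.loads(a_text) == json.loads(b_text)

print(same_bytes, same_meaning)  # prints: False True
```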

Structural diff: by key, recursively

A structural diff walks both trees in parallel:

  1. If both values are objects, compare key sets. For each key, recurse on the paired values. Report keys only-in-A as removed, only-in-B as added.
  2. If both values are arrays, walk in parallel (see the array section below).
  3. If both values are scalars, compare directly.
  4. If the types differ at any node, report a type change.

Given:

// A
{ "name": "Ada", "age": 36, "city": "London" }
// B
{ "name": "Ada", "age": 37, "country": "UK" }

A useful diff says:

age: 36 → 37          (changed)
city: "London"        (removed)
country: "UK"         (added)

…not "two lines differ in the text output."
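The four rules can be sketched as one recursive function. This is an illustrative sketch, not the implementation of any particular tool: the names json_type and diff and the record shape (kind, path, old, new) are assumptions made for the example, arrays are paired by index for simplicity, and paths are slash-joined for display without any escaping.

```python
from typing import Any, List, Tuple

def json_type(value: Any) -> str:
    """Name the JSON type of a parsed value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"

def diff(a: Any, b: Any, path: str = "") -> List[Tuple]:
    """Return (kind, path, old, new) records for every difference."""
    if json_type(a) != json_type(b):
        return [("type_changed", path, a, b)]
    changes: List[Tuple] = []
    if isinstance(a, dict):
        for key in sorted(a.keys() | b.keys()):
            child = f"{path}/{key}"
            if key not in b:
                changes.append(("removed", child, a[key], None))
            elif key not in a:
                changes.append(("added", child, None, b[key]))
            else:
                changes.extend(diff(a[key], b[key], child))
        return changes
    if isinstance(a, list):
        # Simplest array strategy: pair elements by position.
        for i in range(max(len(a), len(b))):
            child = f"{path}/{i}"
            if i >= len(b):
                changes.append(("removed", child, a[i], None))
            elif i >= len(a):
                changes.append(("added", child, None, b[i]))
            else:
                changes.extend(diff(a[i], b[i], child))
        return changes
    return [] if a == b else [("changed", path, a, b)]

A = {"name": "Ada", "age": 36, "city": "London"}
B = {"name": "Ada", "age": 37, "country": "UK"}
for record in diff(A, B):
    print(record)
# ('changed', '/age', 36, 37)
# ('removed', '/city', 'London', None)
# ('added', '/country', None, 'UK')
```

Because the function recurses into nested objects and arrays, a change deep in the tree is reported at its own path rather than as one replacement of the whole subtree.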

Added, removed, changed semantics

The three change types are the building blocks of every structural JSON diff and of the standardised diff formats. They map to:

  • Added — key exists in B, not in A.
  • Removed — key exists in A, not in B.
  • Changed — key exists in both, values differ. (Sometimes broken out as "type changed" if the new value is a different JSON type.)

When the changed value is itself an object or array, you can either emit one "changed" record with the whole new subtree, or recurse and emit fine-grained changes inside. Fine-grained is almost always more useful.

Arrays: by index vs by identity

Arrays are the hard case. Three reasonable strategies:

By index. Pair A[0] with B[0], A[1] with B[1], etc. Cheap. Works when arrays are short and position is meaningful (matrices, RGB triples). Fails catastrophically when you prepend or remove a single element — every subsequent index is a "change."

[1, 2, 3] → [0, 1, 2, 3]

By index: every element changed plus one added. By human inspection: one element inserted at the head.
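The prepend failure is easy to reproduce. The function below is a hypothetical sketch of by-index pairing (diff_by_index and its record shape are made up for this example); it compares elements with plain equality and does not recurse.

```python
from itertools import zip_longest

MISSING = object()  # sentinel for "no element at this index"

def diff_by_index(a, b):
    """Pair a[i] with b[i] and report every position that differs."""
    changes = []
    for i, (x, y) in enumerate(zip_longest(a, b, fillvalue=MISSING)):
        if x is MISSING:
            changes.append(("added", i, y))
        elif y is MISSING:
            changes.append(("removed", i, x))
        elif x != y:
            changes.append(("changed", i, x, y))
    return changes

print(diff_by_index([1, 2, 3], [0, 1, 2, 3]))
# [('changed', 0, 1, 0), ('changed', 1, 2, 1), ('changed', 2, 3, 2), ('added', 3, 3)]
```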

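By LCS (longest common subsequence). This is the approach text diff tools are built on. It finds the largest matching subsequence and treats everything else as inserts and deletes, so it handles the prepend case sensibly. The straightforward dynamic-programming version costs O(n·m) time for arrays of lengths n and m.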
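As a rough illustration, here is the textbook dynamic-programming LCS turned into a list of keep/insert/delete operations. The name lcs_diff and the operation tuples are assumptions for this sketch; production diff tools typically use more refined algorithms, and elements are compared here with plain equality.

```python
def lcs_diff(a, b):
    """Return keep/insert/delete operations that turn list a into list b."""
    n, m = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start, following the longer remaining LCS.
    ops = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append(("keep", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("delete", a[i]))
            i += 1
        else:
            ops.append(("insert", b[j]))
            j += 1
    ops.extend(("delete", x) for x in a[i:])
    ops.extend(("insert", y) for y in b[j:])
    return ops

print(lcs_diff([1, 2, 3], [0, 1, 2, 3]))
# [('insert', 0), ('keep', 1), ('keep', 2), ('keep', 3)]
```

The table is what makes the cost quadratic: it holds one cell per pair of positions in the two arrays.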

By identity key. Match elements by an id field (or any chosen key). Best for arrays of records where order doesn't matter. Requires the caller to specify the key:

// A
[{ "id": 1, "name": "Ada" }, { "id": 2, "name": "Alan" }]
// B
[{ "id": 2, "name": "Alan" }, { "id": 1, "name": "Ada Lovelace" }]

By index: both elements changed. By id: id=1 had its name change, id=2 is unchanged. The id-based result is what humans almost always want.
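Identity matching can be sketched by indexing both arrays on the chosen key first. This example assumes every element is an object that carries the key and that key values are unique within each array; diff_by_id is a hypothetical helper name, not a function of any tool mentioned here.

```python
def diff_by_id(a, b, key="id"):
    """Match array elements by an identity field instead of by position."""
    a_by_id = {item[key]: item for item in a}
    b_by_id = {item[key]: item for item in b}
    changes = []
    for ident, old in a_by_id.items():
        if ident not in b_by_id:
            changes.append(("removed", ident, old))
        elif old != b_by_id[ident]:
            changes.append(("changed", ident, old, b_by_id[ident]))
    for ident, new in b_by_id.items():
        if ident not in a_by_id:
            changes.append(("added", ident, new))
    return changes

A = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Alan"}]
B = [{"id": 2, "name": "Alan"}, {"id": 1, "name": "Ada Lovelace"}]
print(diff_by_id(A, B))
# [('changed', 1, {'id': 1, 'name': 'Ada'}, {'id': 1, 'name': 'Ada Lovelace'})]
```

A fuller version would recurse into each matched pair to report which field changed, rather than returning both whole records.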

The JSON Diff tool lets you choose the array strategy per diff. It defaults to LCS, with id matching as an opt-in when you specify the id field.

Normalising before diffing

Before running the diff, normalise both inputs so accidental differences don't appear as changes:

  • Sort object keys recursively. A "did the meaning change" diff treats {a,b} and {b,a} as identical.
  • Pretty-print. Same indentation on both sides.
  • Strip insignificant whitespace inside strings? No — string contents are meaningful. Don't touch them.
  • Normalise number formats. 1.0 and 1 parse to the same value in JavaScript; some diffs treat them as identical, some as different (because their string form differs). Decide which you want.

A combined formatter-plus-diff pipeline is format(A) → format(B) → diff, which is what the /json/formatter + /json/diff pair does.
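One way to sketch the format step in Python is a parse-and-reserialise round trip. The function name canonical is made up for this example, and it says nothing about how the tools above are implemented. Note that Python's json module keeps 1 and 1.0 distinct on output, so number formats are not unified by this step.

```python
import json

def canonical(text: str) -> str:
    """Parse JSON text and re-serialise it with sorted keys and fixed indentation."""
    return json.dumps(
        json.loads(text),
        sort_keys=True,       # recursive: nested objects are sorted too
        indent=2,             # same indentation on both sides
        ensure_ascii=False,   # escaped and unescaped non-ASCII end up the same
    )

left = canonical('{"b": 2, "a": {"y": "caf\\u00e9", "x": 1}}')
right = canonical('{"a":{"x":1,"y":"café"},\n "b":2}\n')
print(left == right)  # prints: True
```

After this step even a plain text diff of the two outputs is free of key-order and whitespace noise; string contents are left untouched.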

JSON Patch and JSON Merge Patch

For programmatic use, two standardised diff output formats:

JSON Patch (RFC 6902) is a sequence of operations: add, remove, replace, move, copy, test. The example diff above as a patch:

[
  { "op": "replace", "path": "/age", "value": 37 },
  { "op": "remove", "path": "/city" },
  { "op": "add", "path": "/country", "value": "UK" }
]

This is a natural format to send over the wire to an HTTP PATCH endpoint; its registered media type is application/json-patch+json.
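Generating such a patch from two objects can be sketched as follows. This is a simplified illustration, not a complete RFC 6902 generator: make_patch and escape_pointer are hypothetical names, only add, remove and replace are emitted, and arrays are replaced whole rather than diffed.

```python
def escape_pointer(token: str) -> str:
    # JSON Pointer (RFC 6901): "~" becomes "~0" and "/" becomes "~1".
    # "~" has to be escaped first.
    return token.replace("~", "~0").replace("/", "~1")

def make_patch(a: dict, b: dict, path: str = "") -> list:
    """Emit add / remove / replace operations that turn object a into object b."""
    ops = []
    for key in a:
        child = f"{path}/{escape_pointer(key)}"
        if key not in b:
            ops.append({"op": "remove", "path": child})
        elif isinstance(a[key], dict) and isinstance(b[key], dict):
            ops.extend(make_patch(a[key], b[key], child))
        elif a[key] != b[key]:
            # Caveat: Python treats 1 == True and 1 == 1.0 as equal,
            # so a stricter version would compare JSON types as well.
            ops.append({"op": "replace", "path": child, "value": b[key]})
    for key in b:
        if key not in a:
            child = f"{path}/{escape_pointer(key)}"
            ops.append({"op": "add", "path": child, "value": b[key]})
    return ops

A = {"name": "Ada", "age": 36, "city": "London"}
B = {"name": "Ada", "age": 37, "country": "UK"}
patch = make_patch(A, B)
print(patch)
```

For the example pair this produces the same three operations as the patch shown above.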

JSON Merge Patch (RFC 7396) is a simpler format that looks like the target document with null for removals:

{
  "age": 37,
  "city": null,
  "country": "UK"
}

Easier to read, but it can't represent setting a field to null (because null means "delete"), and it replaces arrays wholesale rather than patching individual elements. Pick Merge Patch for human-edited diffs, JSON Patch for machine-generated ones.
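Applying a merge patch takes only a few lines. The sketch below follows the recursive procedure described in RFC 7396, written in Python with None standing in for JSON null; the function name merge_patch is chosen for this example, and the function returns a new object rather than modifying the target in place.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch to target and return the result."""
    if not isinstance(patch, dict):
        # A non-object patch (scalar, array, null) replaces the target outright.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "delete this member"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result

A = {"name": "Ada", "age": 36, "city": "London"}
patch = {"age": 37, "city": None, "country": "UK"}
print(merge_patch(A, patch))
# {'name': 'Ada', 'age': 37, 'country': 'UK'}
```

The null limitation is visible in the code: a None value in the patch always removes the member, so there is no way to ask for the member to be set to null.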

Compare yours

Paste two payloads into the JSON Diff tool. It computes a structural diff (key-order independent, by default), highlights added / removed / changed nodes, and can emit a JSON Patch.

Next steps