Comparing Two JSON Files: A Diff Guide
Why text-diffing JSON is unreliable and how to compute a structural diff — key-order independence, array identity, JSON Patch output.
Two JSON documents that look different in a text diff might be semantically identical. Two that look identical might have a subtle mismatch buried in an array. JSON's tree structure means a useful diff has to operate on the parsed structure, not on the text. This guide explains why text-diffing JSON fails, how a structural diff works, and how to choose a strategy for arrays — the one place even structural diffs disagree.
Why text-diffing JSON fails
git diff and diff -u are line-based text diffs. They show you that
two strings of bytes differ. For JSON they fail in two ways:
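Key order. JSON objects are unordered, but a serialiser has to pick
an order. JavaScript's JSON.stringify writes string keys in insertion
order (integer-like keys come first, in ascending order), so two correct
payloads with the same data can serialise to different byte sequences: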
{"a": 1, "b": 2}
{"b": 2, "a": 1}
A text diff reports the line as changed. A structural diff reports the two documents as identical.
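The difference between the two questions is easy to check. The snippet below is a minimal sketch in Python (the language choice is an assumption made for illustration), using only the standard json module:

```python
import json

a_text = '{"a": 1, "b": 2}'
b_text = '{"b": 2, "a": 1}'

# Byte-level question: are the two strings the same?
texts_equal = a_text == b_text

# Structural question: do the parsed values match?
# Python dicts compare by key/value pairs, not by key order.
parsed_equal = json.loads(a_text) == json.loads(b_text)

print(texts_equal, parsed_equal)  # False True
```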
Formatting. Different indentation, trailing newlines, escaped vs unescaped non-ASCII, spaces after colons — all change the bytes without changing the meaning. See minify vs prettify JSON for the whitespace dimension.
The text diff isn't wrong — it's answering a different question. "Are these bytes identical" and "does this JSON have the same meaning" are not the same question.
Structural diff: by key, recursively
A structural diff walks both trees in parallel:
- If both values are objects, compare key sets. For each key, recurse on the paired values. Report keys only-in-A as removed, only-in-B as added.
- If both values are arrays, walk in parallel (see the array section below).
- If both values are scalars, compare directly.
- If the types differ at any node, report a type change.
Given:
// A
{ "name": "Ada", "age": 36, "city": "London" }
// B
{ "name": "Ada", "age": 37, "country": "UK" }
A useful diff says:
age: 36 → 37 (changed)
city: "London" (removed)
country: "UK" (added)
…not "two lines differ in the text output."
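The recursive walk described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation of any particular tool: the names `json_type` and `diff` are hypothetical, arrays are paired by index for simplicity, and paths are built by joining keys with "/" without any escaping.

```python
from typing import Any, Iterator

def json_type(value: Any) -> str:
    """Name the JSON type of a parsed value."""
    if isinstance(value, bool):  # bool first: in Python, bool subclasses int
        return "boolean"
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")

def diff(a: Any, b: Any, path: str = "") -> Iterator[tuple]:
    """Yield (kind, path, old, new) records for every difference."""
    if json_type(a) != json_type(b):
        yield ("type_changed", path, a, b)
    elif isinstance(a, dict):
        for key in sorted(a.keys() | b.keys()):
            child = f"{path}/{key}"
            if key not in b:
                yield ("removed", child, a[key], None)
            elif key not in a:
                yield ("added", child, None, b[key])
            else:
                yield from diff(a[key], b[key], child)
    elif isinstance(a, list):
        # Simplest array strategy: pair elements by index.
        for i in range(max(len(a), len(b))):
            child = f"{path}/{i}"
            if i >= len(b):
                yield ("removed", child, a[i], None)
            elif i >= len(a):
                yield ("added", child, None, b[i])
            else:
                yield from diff(a[i], b[i], child)
    elif a != b:
        yield ("changed", path, a, b)

doc_a = {"name": "Ada", "age": 36, "city": "London"}
doc_b = {"name": "Ada", "age": 37, "country": "UK"}
changes = list(diff(doc_a, doc_b))
for change in changes:
    print(change)
```

Run on the two documents above, the sketch yields one changed record for /age, one removed record for /city, and one added record for /country, and nothing for the unchanged name.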
Added, removed, changed semantics
The three change types are the building blocks of every structural JSON diff and of the standardised diff formats. They map to:
- Added — key exists in B, not in A.
- Removed — key exists in A, not in B.
- Changed — key exists in both, values differ. (Sometimes broken out as "type changed" if the new value is a different JSON type.)
When the changed value is itself an object or array, you can either emit one "changed" record with the whole new subtree, or recurse and emit fine-grained changes inside. Fine-grained is almost always more useful.
Arrays: by index vs by identity
Arrays are the hard case. Three reasonable strategies:
By index. Pair A[0] with B[0], A[1] with B[1], etc. Cheap.
Works when arrays are short and position is meaningful (matrices, RGB
triples). Fails catastrophically when you prepend or remove a single
element — every subsequent index is a "change."
[1, 2, 3] → [0, 1, 2, 3]
By index: every element changed plus one added. By human inspection: one element inserted at the head.
By LCS (longest common subsequence). The same idea diff uses for
lines of text. Finds the largest matching subsequence and treats
everything else as inserts and deletes. Handles the prepend case
sensibly. Cost is O(n·m) time for arrays of lengths n and m.
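To make the LCS strategy concrete, here is a minimal sketch of the textbook dynamic-programming version in Python. The function name `lcs_diff` and the ("keep" / "insert" / "delete") record shape are assumptions made for this example; real tools typically use more refined variants.

```python
def lcs_diff(a: list, b: list) -> list[tuple[str, object]]:
    """Return keep/insert/delete operations that turn a into b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start, following the longer remaining LCS.
    ops: list[tuple[str, object]] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append(("keep", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("delete", a[i]))
            i += 1
        else:
            ops.append(("insert", b[j]))
            j += 1
    ops.extend(("delete", item) for item in a[i:])
    ops.extend(("insert", item) for item in b[j:])
    return ops

prepend_ops = lcs_diff([1, 2, 3], [0, 1, 2, 3])
print(prepend_ops)
```

For the prepend example, the result is a single insert of 0 followed by three keeps, which matches what a human reader would say happened.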
By identity key. Match elements by an id field (or any chosen
key). Best for arrays of records where order doesn't matter. Requires
the caller to specify the key:
// A
[{ "id": 1, "name": "Ada" }, { "id": 2, "name": "Alan" }]
// B
[{ "id": 2, "name": "Alan" }, { "id": 1, "name": "Ada Lovelace" }]
By index: both elements changed. By id: id=1 had its name change,
id=2 is unchanged. The id-based result is what humans almost always
want.
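Identity matching can be sketched as a pair of lookups keyed by the chosen field. The code below is an illustration only: `diff_by_id` is a hypothetical name, and it assumes every element is an object that carries the key and that key values are unique within each array.

```python
def diff_by_id(a: list[dict], b: list[dict], key: str = "id") -> dict:
    """Match records by an identity field and classify the differences."""
    a_by_id = {item[key]: item for item in a}
    b_by_id = {item[key]: item for item in b}

    result = {"added": [], "removed": [], "changed": []}
    for ident, item in b_by_id.items():
        if ident not in a_by_id:
            result["added"].append(item)
    for ident, item in a_by_id.items():
        if ident not in b_by_id:
            result["removed"].append(item)
        elif item != b_by_id[ident]:
            # Same identity, different content: report old and new record.
            result["changed"].append((ident, item, b_by_id[ident]))
    return result

records_a = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Alan"}]
records_b = [{"id": 2, "name": "Alan"}, {"id": 1, "name": "Ada Lovelace"}]
by_id = diff_by_id(records_a, records_b)
print(by_id)
```

Element order is ignored entirely here. A fuller version would recurse into each changed pair to report field-level changes rather than whole records.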
The JSON Diff tool lets you choose the array strategy per diff. It defaults to LCS, with id matching as an opt-in when you specify the id field.
Normalising before diffing
Before running the diff, normalise both inputs so accidental differences don't appear as changes:
- Sort object keys recursively. A "did the meaning change" diff treats {a,b} and {b,a} as identical.
- Pretty-print. Same indentation on both sides.
- Strip insignificant whitespace inside strings? No. String contents are meaningful. Don't touch them.
- Normalise number formats. 1.0 and 1 parse to the same value in JavaScript; some diffs treat them as identical, some as different (because their string form differs). Decide which you want.
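As a rough sketch, the first two steps fit in one call to Python's standard json module. The helper name `normalise` is an assumption for this example. Note that Python parses 1 as an int and 1.0 as a float, so this particular sketch keeps the two number forms distinct in its output.

```python
import json

def normalise(text: str) -> str:
    """Parse, then re-serialise with sorted keys and fixed indentation."""
    value = json.loads(text)
    # sort_keys sorts object keys at every nesting level;
    # ensure_ascii=False writes non-ASCII characters unescaped on both sides.
    return json.dumps(value, sort_keys=True, indent=2, ensure_ascii=False)

left = normalise('{"b": 2, "a": {"y": 1, "x": 0}}')
right = normalise('{ "a": {"x": 0, "y": 1},\n  "b": 2 }')
print(left == right)  # True
```

After this step even a plain text diff of the two outputs becomes more useful, because key order and whitespace no longer contribute noise.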
A combined formatter-plus-diff pipeline is diff(format(A), format(B)): format both inputs, then diff the results. That is what the /json/formatter + /json/diff pair does.
JSON Patch and JSON Merge Patch
For programmatic use, two standardised diff output formats:
JSON Patch (RFC 6902) is a sequence of operations: add, remove,
replace, move, copy, test. The example diff above as a patch:
[
{ "op": "replace", "path": "/age", "value": 37 },
{ "op": "remove", "path": "/city" },
{ "op": "add", "path": "/country", "value": "UK" }
]
This is the right format to send over the wire for a PATCH endpoint.
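Producing such a patch from two parsed documents is a small extension of the structural walk. The sketch below is a simplified illustration, not a complete RFC 6902 generator: `make_patch` and its helpers are hypothetical names, only add, remove and replace are emitted, and arrays are replaced wholesale rather than diffed element by element. Path segments are escaped as JSON Pointer (RFC 6901) requires.

```python
import json
from typing import Any

def escape(token: str) -> str:
    # JSON Pointer escaping: "~" becomes "~0", then "/" becomes "~1".
    return token.replace("~", "~0").replace("/", "~1")

def same(a: Any, b: Any) -> bool:
    # Compare canonical serialisations so that true/1 and 1/1.0,
    # which Python's == treats as equal, count as different.
    return json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)

def make_patch(a: Any, b: Any, path: str = "") -> list[dict]:
    """Build a list of JSON Patch operations that turns a into b."""
    if isinstance(a, dict) and isinstance(b, dict):
        ops: list[dict] = []
        for key in sorted(a.keys() | b.keys()):
            child = f"{path}/{escape(key)}"
            if key not in b:
                ops.append({"op": "remove", "path": child})
            elif key not in a:
                ops.append({"op": "add", "path": child, "value": b[key]})
            else:
                ops.extend(make_patch(a[key], b[key], child))
        return ops
    if same(a, b):
        return []
    return [{"op": "replace", "path": path, "value": b}]

doc_a = {"name": "Ada", "age": 36, "city": "London"}
doc_b = {"name": "Ada", "age": 37, "country": "UK"}
patch = make_patch(doc_a, doc_b)
print(json.dumps(patch, indent=2))
```

For the example documents this produces the same three operations as the patch shown above.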
JSON Merge Patch (RFC 7396) is a simpler format that looks like the
target document with null for removals:
{
"age": 37,
"city": null,
"country": "UK"
}
Easier to read, but can't represent setting a field to null (because
null means "delete"). Pick Merge Patch for human-edited diffs, JSON
Patch for machine-generated ones.
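Applying a merge patch is short enough to sketch in full. The function below follows the recursive procedure described in RFC 7396, written in Python for illustration; the name `merge_patch` is chosen for this example. It also shows why null cannot be stored as a value: a null in the patch always removes the key.

```python
from typing import Any

def merge_patch(target: Any, patch: Any) -> Any:
    """Apply a JSON Merge Patch to target and return the result."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target outright (arrays included).
        return patch
    if not isinstance(target, dict):
        target = {}
    result = dict(target)  # shallow copy so the input is not mutated
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "delete this key"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result

original = {"name": "Ada", "age": 36, "city": "London"}
merged = merge_patch(original, {"age": 37, "city": None, "country": "UK"})
print(merged)
```

The same rule means arrays are always replaced as a whole, so Merge Patch has no way to express "insert one element".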
Compare yours
Paste two payloads into the JSON Diff tool. It computes a structural diff (key-order independent, by default), highlights added / removed / changed nodes, and can emit a JSON Patch.
Next steps
- JSONPath explained — narrow the diff to a subtree using a JSONPath query.
- Common JSON syntax errors — when the diff fails because one document won't parse.