Working With Large JSON Files
Strategies for JSON files that don't fit comfortably in memory — streaming parsers, NDJSON, chunking, compression, and tooling that doesn't choke.
A 5 MB JSON file is fine. A 500 MB JSON file will take down your editor,
your JSON.parse, and your memory budget if you treat it like a 5 MB
file. This guide is the survival kit for big JSON — what makes it slow,
which parser to reach for, when to switch formats entirely, and the
tools that don't choke when you load a 100k-record array.
Why big JSON is slow
The default parser in every mainstream language — JSON.parse,
json.loads, json.Unmarshal — is whole-document, in-memory. It
reads the entire file into a string, then allocates an object graph that
mirrors the document. Two costs scale with size:
- Memory — the parsed structure is typically 3–5× the file size in RAM (object headers, hash buckets, boxed numbers). A 500 MB file routinely becomes 2 GB of heap.
- Latency — parsing is single-pass and largely serial. A 500 MB JSON document takes seconds even on fast hardware, during which the parsing thread is unavailable.
The whole-document parser is the right choice for documents under a few megabytes. Above that, you need a different strategy.
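A rough way to check these numbers on your own machine is to time JSON.parse on synthetic data and compare heap usage before and after. The sketch below is illustrative only: the record shape and the helper names (makeSyntheticJson, measureParse) are made up for this example, and the heap delta is approximate because the garbage collector can run at any point.

```javascript
// Build a synthetic array of records. The field names are invented for illustration.
function makeSyntheticJson(recordCount) {
  const records = [];
  for (let i = 0; i < recordCount; i++) {
    records.push({ id: String(i), total: i % 500, status: "shipped" });
  }
  return JSON.stringify(records);
}

// Parse the text with the whole-document parser and report what it cost.
function measureParse(text) {
  const heapBefore = process.memoryUsage().heapUsed;
  const start = performance.now();
  const parsed = JSON.parse(text);
  const elapsedMs = performance.now() - start;
  const heapAfter = process.memoryUsage().heapUsed;
  return {
    records: parsed.length,
    textBytes: Buffer.byteLength(text),
    elapsedMs,
    // Approximate: garbage collection can happen mid-measurement.
    heapDeltaBytes: heapAfter - heapBefore,
  };
}

console.log(measureParse(makeSyntheticJson(100_000)));
```

Increasing the record count by 10× at a time shows how quickly both figures grow on your hardware.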
Streaming parsers
A streaming parser emits events as it walks the input, rather than
building a tree. The classic interface is SAX-style: onStartObject,
onKey, onValue, onEndArray. You see each record once and choose
what to keep.
Libraries:
- JavaScript — clarinet, stream-json, JSONStream.
- Python — ijson.
- Go — encoding/json's Decoder with Token().
- Rust — serde_json::Deserializer::from_reader.
A stream-json example that extracts records from a huge top-level
array without ever holding the full document:
import fs from "node:fs";
import { parser } from "stream-json";
import { streamArray } from "stream-json/streamers/StreamArray.js";

fs.createReadStream("orders.json")
  .pipe(parser())
  .pipe(streamArray())
  .on("data", ({ value }) => {
    // value is one record; process it and drop it
    if (value.total > 100) console.log(value.id);
  })
  .on("end", () => console.log("done"));
Memory stays flat regardless of file size because each record is parsed and discarded.
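To make the mechanism concrete, here is a dependency-free sketch of what an array streamer does internally: it tracks nesting depth and string state, and hands each top-level element to a callback as soon as its text is complete. It is not how stream-json is implemented. createArrayStreamer is a hypothetical helper, it assumes the top-level value is an array, and it does no validation.

```javascript
// Minimal sketch: emit each element of a top-level JSON array as soon as its
// text is complete. Assumes the document's top-level value is an array.
function createArrayStreamer(onRecord) {
  let depth = 0;        // nesting depth of [] and {} outside strings
  let inString = false; // true while inside a JSON string literal
  let escaped = false;  // true right after a backslash inside a string
  let current = "";     // text of the element currently being collected

  function flush() {
    const text = current.trim();
    current = "";
    if (text !== "") onRecord(JSON.parse(text));
  }

  return {
    write(chunk) {
      for (const c of chunk) {
        if (inString) {
          current += c;
          if (escaped) escaped = false;
          else if (c === "\\") escaped = true;
          else if (c === '"') inString = false;
        } else if (c === '"') {
          inString = true;
          current += c;
        } else if (c === "[" || c === "{") {
          depth += 1;
          if (depth > 1) current += c; // the outermost "[" belongs to no element
        } else if (c === "]" || c === "}") {
          depth -= 1;
          if (depth >= 1) current += c;
          else flush(); // the top-level array just closed
        } else if (c === "," && depth === 1) {
          flush(); // a comma at depth 1 separates two elements
        } else if (depth >= 1) {
          current += c;
        }
      }
    },
  };
}

const streamer = createArrayStreamer((order) => {
  if (order.total > 100) console.log(order.id);
});
// Chunk boundaries can fall anywhere, including mid-record.
streamer.write('[{"id":"1","total":150},{"id":"2",');
streamer.write('"total":85}]');
```

Only one element's text is held at a time, which is why memory stays flat. With a real file the chunks could come from fs.createReadStream(path, { encoding: "utf8" }). A production parser also validates its input and avoids per-character string concatenation.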
NDJSON / line-delimited JSON
The simplest "streaming-friendly" format is one JSON value per line:
{"id":"1","total":42}
{"id":"2","total":85}
{"id":"3","total":17}
This is NDJSON, also called JSON Lines; files usually carry a .ndjson or
.jsonl extension. Most cloud loggers emit it and most analytics platforms
ingest it. Parsing is one line per JSON.parse call: trivial to stream,
trivial to split across workers, trivial to grep from a shell.
If you control the producer of a large dataset, emit NDJSON instead of a giant array. Consumers will thank you.
import readline from "node:readline";
import fs from "node:fs";

const rl = readline.createInterface({
  input: fs.createReadStream("orders.jsonl"),
  crlfDelay: Infinity, // treat \r\n as a single line break
});

for await (const line of rl) {
  if (!line.trim()) continue; // skip blank lines
  const record = JSON.parse(line);
  // ... handle one record here
}
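On the producer side, emitting NDJSON instead of a giant array can be as small as the sketch below. toNdjsonLines and parseNdjson are hypothetical helper names, and the input is assumed to be JSON-serializable objects. parseNdjson works on a whole string, so it is only meant for small samples and tests. It also shows a side benefit of the format: one malformed line can be reported and skipped without losing the rest.

```javascript
// Serialize records one per line. JSON.stringify without an indent argument
// escapes newlines inside strings, so each record stays on a single line.
function* toNdjsonLines(records) {
  for (const record of records) {
    yield JSON.stringify(record) + "\n";
  }
}

// Parse NDJSON text, skipping blank lines and collecting malformed ones
// instead of aborting on the first error.
function parseNdjson(text) {
  const records = [];
  const errors = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (!line.trim()) return;
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      errors.push({ lineNumber: index + 1, message: err.message });
    }
  });
  return { records, errors };
}

const orders = [
  { id: "1", total: 42 },
  { id: "2", total: 85 },
];
const ndjson = [...toNdjsonLines(orders)].join("");
console.log(parseNdjson(ndjson).records.length);
```

For real output, each yielded line would be written to a file or response stream as it is produced rather than joined in memory.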
Splitting and paginating large payloads
If a producer insists on emitting a single huge document, two patterns recover sanity:
- Server-side pagination — expose ?page=1&pageSize=1000 parameters or cursors. Each page is a small JSON document. Always include a next cursor or total count so clients know when to stop.
- Client-side chunking — when reading from disk, use a streaming parser and process N records at a time before flushing to the database / network / output stream.
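Both patterns can be sketched with async generators. Everything named here is an assumption for illustration: fetchPage is a caller-supplied function, the { items, next } response shape is invented, and makeFakeFetchPage stands in for a real HTTP call so the example runs on its own.

```javascript
// Follow `next` cursors until the server stops returning one.
// `fetchPage(cursor)` is hypothetical and must resolve to
// { items: [...], next: string | null }.
async function* paginate(fetchPage) {
  let cursor = null;
  do {
    const page = await fetchPage(cursor);
    yield* page.items;
    cursor = page.next ?? null;
  } while (cursor !== null);
}

// Group any sync or async iterable into arrays of at most `size` items,
// so each batch can be flushed to a database or output stream in one call.
async function* inBatches(source, size) {
  let batch = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch;
}

// Stand-in for a paginated endpoint, backed by an in-memory array.
function makeFakeFetchPage(data, pageSize) {
  return async (cursor) => {
    const start = cursor === null ? 0 : Number(cursor);
    const end = start + pageSize;
    return {
      items: data.slice(start, end),
      next: end < data.length ? String(end) : null,
    };
  };
}

async function demo() {
  const data = Array.from({ length: 25 }, (_, i) => ({ id: String(i) }));
  const fetchPage = makeFakeFetchPage(data, 10);
  for await (const batch of inBatches(paginate(fetchPage), 7)) {
    console.log(`flushing ${batch.length} records`);
  }
}

demo();
```

inBatches accepts any iterable, so the same helper works for records coming out of a streaming parser when reading from disk.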
For ad hoc exploration of a giant document, /json/viewer lazily expands subtrees instead of materialising the whole graph.
Compression on the wire
JSON compresses extremely well — typically 80–90% reduction with gzip, even better with Brotli or Zstandard. Repeated keys and whitespace are exactly what these algorithms exploit.
For HTTP endpoints serving large JSON:
- Set Content-Encoding: gzip (or br, or zstd) when the client's Accept-Encoding header allows it, and either pre-compress at the origin or compress on the fly.
- Pre-compress static large JSON in your build pipeline; serve .json.gz and .json.br directly.
- Don't bother minifying JSON before compressing; the compressor recovers most of the whitespace savings.
See minify vs prettify JSON for the trade-offs.
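The claims in this section are easy to measure on your own data. The sketch below gzips the same records in minified and pretty-printed form and reports the sizes. It uses the web-standard CompressionStream, which is a global in Node 18+ and modern browsers, and therefore covers gzip only. On a server you would more likely use node:zlib or let a reverse proxy compress. gzipSize and compareSizes are hypothetical helpers and the sample records are synthetic.

```javascript
// Gzip a string and return the compressed size in bytes.
async function gzipSize(text) {
  const stream = new Blob([text]).stream().pipeThrough(new CompressionStream("gzip"));
  const compressed = await new Response(stream).arrayBuffer();
  return compressed.byteLength;
}

// Compare raw and gzipped sizes for minified and pretty-printed output.
async function compareSizes(records) {
  const minified = JSON.stringify(records);
  const pretty = JSON.stringify(records, null, 2);
  return {
    minifiedBytes: new Blob([minified]).size,
    prettyBytes: new Blob([pretty]).size,
    minifiedGzipBytes: await gzipSize(minified),
    prettyGzipBytes: await gzipSize(pretty),
  };
}

// Synthetic sample; real data with more varied values compresses less well.
const sample = Array.from({ length: 5000 }, (_, i) => ({
  id: String(i),
  total: i % 500,
  status: "shipped",
}));

compareSizes(sample).then((sizes) => console.log(sizes));
```

Comparing minifiedGzipBytes with prettyGzipBytes shows how much of the whitespace overhead survives compression for your payloads.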
Tools that don't choke
Most text editors and JSON viewers attempt to parse the whole document into a DOM-like structure for syntax highlighting, which is what makes them lock up on big files. Better strategies:
- The JSON Viewer — lazy tree expansion; subtrees are parsed on demand.
- The JSON Formatter — uses a Web Worker for files above ~1.5 MB so the UI stays responsive.
- jq -c for line-by-line processing in a terminal.
- less for raw inspection — never load 500 MB into Vim.
When to switch off JSON entirely
JSON is text. Text is verbose. For workloads that are throughput-limited or cost-limited, a binary format is the right answer:
- Protocol Buffers / FlatBuffers — schema-driven, very fast, very compact.
- MessagePack — JSON-like data model, binary encoding, drop-in for most JSON code.
- Avro / Parquet — schema-based file formats. Avro is row-oriented and suits record streams; Parquet is columnar and ideal for analytics workloads.
- CBOR — IETF binary JSON, used in IoT and WebAuthn.
Convert to a columnar format like Parquet if you find yourself running the same aggregations across millions of records.
Practical workflow
A repeatable process for "I've been handed a 2 GB JSON file":
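- Inspect the first few kilobytes with head -c to confirm the shape (single document vs NDJSON). Counting lines with head -n is unreliable here, because a minified document is often one enormous line.
- If NDJSON, you're done — stream it.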
- If a single array of records, use a streaming parser scoped to the array.
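- If a deeply-nested document, select the part you care about with a JSONPath-style path filter applied while streaming, then stream that. Most JSONPath libraries operate on an already-parsed document, which defeats the purpose at this size.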
- If you need to share the result, flatten to CSV for tabular data — see JSON to CSV: flattening nested data.
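The first step of this workflow can be automated with a small heuristic. The sketch below classifies a file from a text sample of its first bytes. sniffJsonShape is a hypothetical helper, and the rule it applies is an assumption: if the first line parses on its own and more content follows, treat the file as NDJSON. A single-record NDJSON file is indistinguishable from a one-line document, so that case falls through to "object" or "array".

```javascript
// Classify a JSON file from a sample of its first bytes.
// Returns "ndjson", "array", "object", or "unknown".
function sniffJsonShape(sample) {
  const text = sample.trimStart();
  const newline = text.indexOf("\n");
  // NDJSON heuristic: the first line is a complete value and more content follows.
  if (newline !== -1 && text.slice(newline + 1).trim() !== "") {
    try {
      JSON.parse(text.slice(0, newline));
      return "ndjson";
    } catch {
      // The first line is not a complete value: one multi-line document.
    }
  }
  if (text.startsWith("[")) return "array";
  if (text.startsWith("{")) return "object";
  return "unknown";
}

console.log(sniffJsonShape('{"id":"1","total":42}\n{"id":"2","total":85}\n'));
console.log(sniffJsonShape('[{"id":"1","total":42},{"id":"2","total":85}]'));
```

In Node, a sample could be obtained by reading a bounded range, for example fs.createReadStream(path, { start: 0, end: 65535, encoding: "utf8" }), so the 2 GB file is never loaded in full.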
Next steps
- JSONPath explained — extract the slice of a large document you actually need.
- JSON to CSV: flattening nested data — a flat format that handles millions of records gracefully.
- /json/viewer — explore big documents without melting your browser.