Working With Large JSON Files

Strategies for JSON files that don't fit comfortably in memory — streaming parsers, NDJSON, chunking, compression, and tooling that doesn't choke.

2026-05-13

A 5 MB JSON file is fine. A 500 MB JSON file will take down your editor, your JSON.parse, and your memory budget if you treat it like a 5 MB file. This guide is the survival kit for big JSON — what makes it slow, which parser to reach for, when to switch formats entirely, and the tools that don't choke when you load a 100k-record array.

Why big JSON is slow

The default parser in every mainstream language — JSON.parse, json.loads, json.Unmarshal — is whole-document, in-memory. It reads the entire file into a string, then allocates an object graph that mirrors the document. Two costs scale with size:

Memory — the parsed structure is typically 3–5× the file size in RAM (object headers, hash buckets, boxed numbers). A 500 MB file routinely becomes 2 GB of heap.
Latency — parsing is single-pass and largely serial. A 500 MB JSON document takes seconds even on fast hardware, during which the parsing thread is unavailable.

The whole-document parser is the right choice for documents under a few megabytes. Above that, you need a different strategy.

Streaming parsers

A streaming parser emits events as it walks the input, rather than building a tree. The classic interface is SAX-style: onStartObject, onKey, onValue, onEndArray. You see each record once and choose what to keep.

Libraries:

JavaScript — clarinet, stream-json, JSONStream.
Python — ijson.
Go — encoding/json's Decoder with Token().
Rust — serde_json::Deserializer::from_reader.

A stream-json example that extracts records from a huge top-level array without ever holding the full document:

import fs from "node:fs";
import { parser } from "stream-json";
import { streamArray } from "stream-json/streamers/StreamArray.js";

fs.createReadStream("orders.json")
  .pipe(parser())
  .pipe(streamArray())
  .on("data", ({ value }) => {
    // value is one record; process it and drop it
    if (value.total > 100) console.log(value.id);
  })
  .on("end", () => console.log("done"));

Memory stays flat regardless of file size because each record is parsed and discarded.

NDJSON / line-delimited JSON

The simplest "streaming-friendly" format is one JSON value per line:

{"id":"1","total":42}
{"id":"2","total":85}
{"id":"3","total":17}

This is NDJSON (or JSON Lines, or .ndjson / .jsonl). Every cloud logger emits it. Every analytics platform ingests it. Parsing is one line per JSON.parse call — trivial to stream, trivial to split across workers, trivial to grep from a shell.

If you control the producer of a large dataset, emit NDJSON instead of a giant array. Consumers will thank you.

import readline from "node:readline";
import fs from "node:fs";

const rl = readline.createInterface({ input: fs.createReadStream("orders.jsonl") });
for await (const line of rl) {
  if (!line.trim()) continue;
  const record = JSON.parse(line);
  // ...
}

Splitting and paginating large payloads

If a producer insists on emitting a single huge document, two patterns recover sanity:

Server-side pagination — ?page=1&pageSize=1000 cursors. Each page is a small JSON document. Always include a next cursor or total count so clients know when to stop.
Client-side chunking — when reading from disk, use a streaming parser and process N records at a time before flushing to the database / network / output stream.

For ad hoc exploration of a giant document, /json/viewer lazily expands subtrees instead of materialising the whole graph.

Compression on the wire

JSON compresses extremely well — typically 80–90% reduction with gzip, even better with Brotli or Zstandard. Repeated keys and whitespace are exactly what these algorithms exploit.

For HTTP endpoints serving large JSON:

Set Content-Encoding: gzip (or br, or zstd) and pre-compress at the origin, or compress on the fly.
Pre-compress static large JSON in your build pipeline; serve .json.gz and .json.br directly.
Don't bother minifying JSON before compressing — the compressor recovers the whitespace savings.

See minify vs prettify JSON for the trade-offs.

Tools that don't choke

Most text editors and JSON viewers attempt to parse the whole document into a DOM-like structure for syntax highlighting, which is what makes them lock up on big files. Better strategies:

The JSON Viewer — lazy tree expansion; subtrees are parsed on demand.
The JSON Formatter — uses a Web Worker for files above ~1.5 MB so the UI stays responsive.
jq -c for line-by-line processing in a terminal.
less for raw inspection — never load 500 MB into Vim.

When to switch off JSON entirely

JSON is text. Text is verbose. For workloads that are throughput-limited or cost-limited, a binary format is the right answer:

Protocol Buffers / FlatBuffers — schema-driven, very fast, very compact.
MessagePack — JSON-like data model, binary encoding, drop-in for most JSON code.
Avro / Parquet — columnar, ideal for analytics workloads.
CBOR — IETF binary JSON, used in IoT and WebAuthn.

Convert to a columnar format like Parquet if you find yourself running the same aggregations across millions of records.

Practical workflow

A repeatable process for "I've been handed a 2 GB JSON file":

Inspect the first 1000 lines with head to confirm the shape (single document vs NDJSON).
If NDJSON, you're done — stream it.
If a single array of records, use a streaming parser scoped to the array.
If a deeply-nested document, use JSONPath to extract the part you care about, then stream that.
If you need to share the result, flatten to CSV for tabular data — see JSON to CSV: flattening nested data.

Next steps

JSONPath explained — extract the slice of a large document you actually need.
JSON to CSV: flattening nested data — a flat format that handles millions of records gracefully.
/json/viewer — explore big documents without melting your browser.