Skip to content

Working With Large JSON Files

Strategies for JSON files that don't fit comfortably in memory — streaming parsers, NDJSON, chunking, compression, and tooling that doesn't choke.

A 5 MB JSON file is fine. A 500 MB JSON file will take down your editor, your JSON.parse, and your memory budget if you treat it like a 5 MB file. This guide is the survival kit for big JSON — what makes it slow, which parser to reach for, when to switch formats entirely, and the tools that don't choke when you load a 100k-record array.

Why big JSON is slow

The default parser in every mainstream language — JSON.parse, json.loads, json.Unmarshal — is whole-document, in-memory. It reads the entire file into a string, then allocates an object graph that mirrors the document. Two costs scale with size:

  • Memory — the parsed structure is typically 3–5× the file size in RAM (object headers, hash buckets, boxed numbers). A 500 MB file routinely becomes 2 GB of heap.
  • Latency — parsing is single-pass and largely serial. A 500 MB JSON document takes seconds even on fast hardware, during which the parsing thread is unavailable.

The whole-document parser is the right choice for documents under a few megabytes. Above that, you need a different strategy.

Streaming parsers

A streaming parser emits events as it walks the input, rather than building a tree. The classic interface is SAX-style: onStartObject, onKey, onValue, onEndArray. You see each record once and choose what to keep.

Libraries:

  • JavaScriptclarinet, stream-json, JSONStream.
  • Pythonijson.
  • Goencoding/json's Decoder with Token().
  • Rustserde_json::Deserializer::from_reader.

A stream-json example that extracts records from a huge top-level array without ever holding the full document:

import fs from "node:fs";
import { parser } from "stream-json";
import { streamArray } from "stream-json/streamers/StreamArray.js";

fs.createReadStream("orders.json")
  .pipe(parser())
  .pipe(streamArray())
  .on("data", ({ value }) => {
    // value is one record; process it and drop it
    if (value.total > 100) console.log(value.id);
  })
  .on("end", () => console.log("done"));

Memory stays flat regardless of file size because each record is parsed and discarded.

NDJSON / line-delimited JSON

The simplest "streaming-friendly" format is one JSON value per line:

{"id":"1","total":42}
{"id":"2","total":85}
{"id":"3","total":17}

This is NDJSON (or JSON Lines, or .ndjson / .jsonl). Every cloud logger emits it. Every analytics platform ingests it. Parsing is one line per JSON.parse call — trivial to stream, trivial to split across workers, trivial to grep from a shell.

If you control the producer of a large dataset, emit NDJSON instead of a giant array. Consumers will thank you.

import readline from "node:readline";
import fs from "node:fs";

const rl = readline.createInterface({ input: fs.createReadStream("orders.jsonl") });
for await (const line of rl) {
  if (!line.trim()) continue;
  const record = JSON.parse(line);
  // ...
}

Splitting and paginating large payloads

If a producer insists on emitting a single huge document, two patterns recover sanity:

  • Server-side pagination?page=1&pageSize=1000 cursors. Each page is a small JSON document. Always include a next cursor or total count so clients know when to stop.
  • Client-side chunking — when reading from disk, use a streaming parser and process N records at a time before flushing to the database / network / output stream.

For ad hoc exploration of a giant document, /json/viewer lazily expands subtrees instead of materialising the whole graph.

Compression on the wire

JSON compresses extremely well — typically 80–90% reduction with gzip, even better with Brotli or Zstandard. Repeated keys and whitespace are exactly what these algorithms exploit.

For HTTP endpoints serving large JSON:

  • Set Content-Encoding: gzip (or br, or zstd) and pre-compress at the origin, or compress on the fly.
  • Pre-compress static large JSON in your build pipeline; serve .json.gz and .json.br directly.
  • Don't bother minifying JSON before compressing — the compressor recovers the whitespace savings.

See minify vs prettify JSON for the trade-offs.

Tools that don't choke

Most text editors and JSON viewers attempt to parse the whole document into a DOM-like structure for syntax highlighting, which is what makes them lock up on big files. Better strategies:

  • The JSON Viewer — lazy tree expansion; subtrees are parsed on demand.
  • The JSON Formatter — uses a Web Worker for files above ~1.5 MB so the UI stays responsive.
  • jq -c for line-by-line processing in a terminal.
  • less for raw inspection — never load 500 MB into Vim.

When to switch off JSON entirely

JSON is text. Text is verbose. For workloads that are throughput-limited or cost-limited, a binary format is the right answer:

  • Protocol Buffers / FlatBuffers — schema-driven, very fast, very compact.
  • MessagePack — JSON-like data model, binary encoding, drop-in for most JSON code.
  • Avro / Parquet — columnar, ideal for analytics workloads.
  • CBOR — IETF binary JSON, used in IoT and WebAuthn.

Convert to a columnar format like Parquet if you find yourself running the same aggregations across millions of records.

Practical workflow

A repeatable process for "I've been handed a 2 GB JSON file":

  1. Inspect the first 1000 lines with head to confirm the shape (single document vs NDJSON).
  2. If NDJSON, you're done — stream it.
  3. If a single array of records, use a streaming parser scoped to the array.
  4. If a deeply-nested document, use JSONPath to extract the part you care about, then stream that.
  5. If you need to share the result, flatten to CSV for tabular data — see JSON to CSV: flattening nested data.

Next steps