JSON to CSV: Flattening Nested Data
How to convert nested JSON into flat CSV rows — dotted keys, array handling, escaping, encoding. Real examples plus the lossy edges to watch for.
CSV is the universal currency of data analysis tools — Excel, Google Sheets, every database import, every BI tool. JSON, by contrast, is tree-shaped. Going from one to the other isn't a 1-to-1 mapping; flattening makes choices that lose information. This guide covers the decisions you need to make explicitly, so the conversion doesn't surprise you downstream.
Why CSV needs flat rows
CSV is a 2D format: a header row plus N data rows, all with the same number of columns. JSON is N-dimensional — objects nest, arrays nest inside objects, objects nest inside arrays. Any conversion has to project the tree onto a grid.
Three operations do this projection:
- Pick a "row entity" — usually the elements of a top-level array.
- Flatten nested objects into dot-pathed columns.
- Decide what to do with nested arrays — explode, join, or index.
Mapping an array of objects to rows
The easy case. The JSON is a flat array of flat objects:
[
{ "id": 1, "name": "Ada", "active": true },
{ "id": 2, "name": "Alan", "active": false }
]
Every key becomes a column; every element becomes a row:
id,name,active
1,Ada,true
2,Alan,false
The header row is the union of keys across all records. If a key is missing from a row, the cell is empty.
Flattening nested objects (dotted keys)
Nest one level, dot the path:
[
{
"id": 1,
"name": "Ada",
"address": { "city": "London", "zip": "SW1A 2AA" }
}
]
id,name,address.city,address.zip
1,Ada,London,SW1A 2AA
For multiple levels, just keep dotting: address.country.code. The
notation is the same one JSONPath uses, so
the column names tell you exactly which field they came from. Some tools
use address_city (underscore) or address__city (double underscore)
instead — pick a convention and document it.
Arrays inside records
This is where conversion gets opinionated. There are three reasonable strategies, each lossy in a different way.
Strategy A: explode. One row per array element. Best when array elements are the real records:
[
{ "id": 1, "tags": ["admin", "founder"] }
]
id,tag
1,admin
1,founder
This is what jq does with [.id, .tags[]] | @csv and what SQL would do
with UNNEST. Two issues — you lose the natural "one row per user"
shape, and if a record has multiple arrays you have to explode the
cross-product or pick one.
Strategy B: join. Encode the array as a single delimited string:
id,tags
1,"admin;founder"
Best when the array is small and consumers can split. Choose a delimiter that doesn't appear in the values — semicolon or pipe usually.
Strategy C: index. One column per array position:
id,tags.0,tags.1
1,admin,founder
Best when the array has a fixed small length and position is meaningful
(e.g. RGB triples). Falls apart for variable-length arrays — you end up
with tags.0 through tags.99 mostly empty.
There's no universally right answer. Pick based on what the consumer does with the output.
Headers, delimiters, quoting, encoding
The four CSV variables that bite in production:
- Delimiter. RFC 4180 says comma. European Excel says semicolon. TSV uses tab. Always set the delimiter explicitly in your converter and document it.
- Quoting. Values containing the delimiter, a newline, or a double
quote must be quoted. The quoting char is
", and a literal"inside a quoted value is doubled:". Soshe said "hi"becomes"she said ""hi""". - Headers. Optional in RFC 4180 but you should always emit them. Without headers, the conversion is opaque to the consumer.
- Encoding. UTF-8 is the only safe answer. Excel for Windows still
defaults to a regional 8-bit encoding when double-clicking a
.csvfile — adding a UTF-8 BOM (EF BB BF) at the start of the file makes Excel auto-detect UTF-8. Yes, this is the same BOM the JSON syntax errors guide tells you to avoid in JSON. CSV is friendlier to it.
Lossy conversions and how to mitigate
Conversions that lose information:
- Type loss — CSV has no types.
"true",true, and1are all the same stringtrueon the wire. Pin a convention: numbers as decimals, booleans astrue/falseor1/0, dates as ISO 8601. - Null vs empty — there is no
nullin CSV. Most tools use the empty string. If you need to distinguish "missing" from "empty string", encode them differently (e.g.\Nfor null, à la PostgreSQLCOPY). - Nested array order — when you explode, the original within-record order is preserved per row, but consumers can't always tell what the parent record was unless you carry an id column.
- Object identity inside arrays — exploding arrays of objects
multiplies columns:
orders[*].id,orders[*].totalneed either an index suffix or a row explode.
The mitigation is the same in every case: write down your conversion conventions and treat them as part of the data contract.
Working with big inputs
If the JSON is large, don't load it all into memory before converting. See working with large JSON files for streaming patterns — a streaming JSON parser feeding a streaming CSV writer is the right shape, and processes arbitrarily large inputs in constant memory.
Convert yours
Paste JSON into /json/convert/csv — it offers strategy A, B, and C for nested arrays, a delimiter picker, and a UTF-8-BOM toggle for Excel compatibility. Everything runs locally; no data leaves your browser.
Next steps
- Working with large JSON files — streaming JSON-to-CSV for huge inputs.
- JSON vs YAML vs XML — choosing the right format upstream of conversion.