Parsing JSON/XML/YML in the command line with jq/yq/xq
Table of contents
- Install
- Get the first item of the array
- Get only a specific field from the array items
- Get the length
- Filter entries with select
- Unescape character from JSON string with fromjson filter (parse)
- Return specific fields in a row with string interpolation
- Pick specific fields from an object
- Select with pipes and match
- Select with pipes and contains (includes)
- Get all the possible properties (keys)
- Get all the possible values (values)
- Get the items that their value is an array
- Find which items have a value that is an array, and get the items' keys and counts
- Drop (exclude, delete, remove) a property (field) from an object path
- Add a property (field) to an object
- Sort results
- Sort results and keep the initial structure
- How to remove items from a nested array based on a filter and keep the rest structure unchanged
- Update with pipe assignment operator |= multiple properties at once, maybe even on different levels of nesting.
- Unique field
- How to minify a JSON file
- Count objects if the output is an object stream (?) instead of an array with --compact-output, -c option (keywords: minify, minified, minimize, compress)
- Convert JSON to csv with jq
- Get the results as raw strings (without quotes)
- Multiple expressions (search keyword: conditions) in select
- jq ternary if then else elseif conditions
- SQL-like operators, IN: check if the value of a property is in an array of known values
- Get a range from an array
- Combine arrays with jq with the slurp option
- Links
Install
Install jq with a package manager. For example, in Ubuntu with apt:
$ sudo apt install jq
The xq/yq executables are part of the Python package yq (https://github.com/kislyuk/yq), which transforms XML/YML files to JSON and calls jq with the filter you provide. Install it with:
# You need Python 3 and its pip package manager installed first.
$ pip3 install yq
xq/yq call jq, so the filters will be the same.
See https://stedolan.github.io/jq/manual/
Get the first item of the array
$ curl https://jsonplaceholder.typicode.com/users | jq '.[0]'
Get only a specific field from the array items
Also count the unique values:
$ curl https://jsonplaceholder.typicode.com/users |
jq '.[].address.city' | sort | uniq -c
Get the length
Get the length of an array.
$ cat facebook.xml | xq '.rss.channel.item | length'
Get the length of nested fields. Be careful, though, because if the field is a string, instead of an array, you’ll get the string length.
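The string-versus-array pitfall can be reproduced on a small made-up object (the field names single and many are hypothetical, not from the feed). One possible guard, sketched here, is to check the type before calling length:

```shell
# Hypothetical sample: one field is a string, the other an array.
sample='{"single":"abc","many":["a","b"]}'

# length of a string is its number of characters
string_len=$(echo "$sample" | jq '.single | length')
# length of an array is its number of elements
array_len=$(echo "$sample" | jq '.many | length')
# guard: count a lone string as one item instead of measuring it
guarded_len=$(echo "$sample" | jq '.single | if type == "array" then length else 1 end')

echo "$string_len $array_len $guarded_len"
# prints: 3 2 1
```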
$ cat facebook.xml | xq '.rss.channel.item[]."g:additional_image_link" | length'
Filter entries with select
$ curl https://jsonplaceholder.typicode.com/users |
jq '.[] | select(.username == "Bret")'
Use and/or:
$ curl https://jsonplaceholder.typicode.com/users |
jq '.[] | select(.username == "Bret" and .address.city == "Gwenborough")'
Unescape character from JSON string with fromjson filter (parse)
$ cat | jq '. | fromjson'
> "{\"result\":[{\"battery.current\":0,\"vehicle.mileage\":437.974}]}"
prints:
{
"result": [
{
"battery.current": 0,
"vehicle.mileage": 437.974
}
]
}
Or decode JSON strings in your programming language.
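When the escaped JSON sits inside a field rather than at the top level, fromjson can be combined with the |= operator to decode only that field. A minimal sketch on a made-up object (the id and payload names are hypothetical):

```shell
# Hypothetical sample: "payload" holds JSON that was serialized into a string.
# printf '%s' is used so the backslashes reach jq untouched.
decoded=$(printf '%s' '{"id":7,"payload":"{\"ok\":true}"}' | jq -c '.payload |= fromjson')
echo "$decoded"
# prints: {"id":7,"payload":{"ok":true}}
```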
Return specific fields in a row with string interpolation
$ curl https://jsonplaceholder.typicode.com/users |
jq '.[] | select(.username == "Bret") | "\(.id) \(.name) \(.company.name)"'
Pick specific fields from an object
$ jq '.urlset.url[] | { loc, lastmod }' kids-shop-123-sitemap.json | head
{
"loc": "https://kids-shop-123.gr/kathisma-daxtilidi-baniou-panda-prasino-lorelli",
"lastmod": "2023-04-27T21:54:05+00:00"
}
{
"loc": "https://kids-shop-123.gr/kathisma-daxtilidi-baniou-panda-roz-lorelli",
"lastmod": "2023-02-28T13:52:06+00:00"
}
{
"loc": "https://kids-shop-123.gr/1022057-Lorelli-pipila-silikonis-MY-FRIEND-me-kallyma1022057-blue",
Select with pipes and match
Note the pipe inside select(..).
$ cat facebook.xml |
xq '.rss.channel.item[] | select(."g:link" | match("https:\/\/kids-shop-123.gr\/.*\/"))."g:link"' |
grep -ivE route
Select with pipes and contains (includes)
$ cat facebook.xml |
xq '.rss.channel.item[] | select(."g:link" | contains("Lorelli")) | ."g:link"'
Get all the possible properties (keys)
From an array of objects with keys built-in function.
$ cat facebook.xml | xq '.rss.channel.item[] | keys' | sort | uniq
You can also use keys[], which emits each key name as a separate output instead of one array of key names per object.
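A minimal sketch of how keys and keys[] differ, on inline sample data (the objects are made up for illustration):

```shell
# Hypothetical sample array; the objects do not share the same keys.
sample='[{"a":1,"b":2},{"b":3,"c":4}]'

# keys: one array of key names per object
per_object=$(echo "$sample" | jq -c '.[] | keys')
echo "$per_object"
# prints:
# ["a","b"]
# ["b","c"]

# keys[]: every key name as its own output line, easy to sort and deduplicate
all_keys=$(echo "$sample" | jq -r '.[] | keys[]' | sort -u)
echo "$all_keys"
# prints:
# a
# b
# c
```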
Get all the possible values (values)
From an array of objects with values[] built-in function.
$ jq '.[][1].ips | values[]' ./x64/Debug/user-stats.json
# From input:
[
[
667204,
{
"ips": {
"122.127.33.1": 1685028929
}
}
],
[
675978,
{
"ips": {
"10.26.218.58": 1685028929
}
}
],
...
# Output:
1685028929
1685028929
1685028976
...
Get the items that their value is an array
Use the type built-in function:
$ cat shop-2023-06-17.json |
jq '.products.product[] | select( .manufacturer == "CRISTIANA MASI") | select( values[] | type == "array" )'
Find which items have a value that is an array, and get the items’ keys and counts
to_entries transforms an object to an array of key/value objects. It’s useful if you want to filter on one of the key/value pair and then reference the other.
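A small sketch of what to_entries produces, on a made-up object (the images and title fields are hypothetical):

```shell
# Hypothetical sample object with one array value and one string value.
sample='{"images":["a.jpg","b.jpg"],"title":"Hat"}'

# to_entries turns the object into an array of {key, value} objects
entries=$(echo "$sample" | jq -c 'to_entries')
echo "$entries"
# prints: [{"key":"images","value":["a.jpg","b.jpg"]},{"key":"title","value":"Hat"}]

# filter on .value, while .key is still available for the output
array_keys=$(echo "$sample" | jq -r 'to_entries[] | select(.value | type == "array") | "\(.key) \(.value | length)"')
echo "$array_keys"
# prints: images 2
```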
$ cat ~/Desktop/shop/shop-2024-02-10.json |
jq -r '.data.post[] | to_entries | .[] | select(.value | type == "array") | "\(.key) \(.value | length)"' |
sort |
uniq -c |
sort |
awk 'BEGIN{print "count, key, array_length"} {print $0}'
# prints
count, key, array_length
1 ImageFilename 3
1 ImageFilename 8
1 ImageURL 3
1 ImageURL 8
1 ΠροϊόνΣύνθεση 2
2 ΠροϊόνΔιαστάσεις 2
2 ΠροϊόνΧρώμα 4
3 Κατηγορίεςπροϊόντων 3
12 ΠροϊόνΧρώμα 2
21 ImageFilename 2
21 ImageURL 2
55 Κατηγορίεςπροϊόντων 2
The use case here is to see which objects from an array have array properties and, at the same time, print the key and the length of each array. This is useful when importing a feed with wpallimport: it shows the maximum number of additional images or attributes to account for, so that no existing data is lost.
$ cat kids-shop-123-facebook-2023-06-19.json |
jq -c '.rss.channel.item[] | to_entries[] | if .value | type == "array" then { key, length: .value | length } else null end' |
grep -v null | head
Drop (exclude, delete, remove) a property (field) from an object path
Use the del(path_expression).
$ echo '[
{ "name": "Mark", "age": 12, "password": "123456" },
{ "name": "John", "age": 21, "password": "123" }
]' |
jq 'del(.[].password)'
# prints
[
{
"name": "Mark",
"age": 12
},
{
"name": "John",
"age": 21
}
]
Add a property (field) to an object
Add the profit_target_sent property to all objects, with a default value of 0.
$ jq '.records[][1] += { "profit_target_sent": 0.0 }' my_file.json > my_file.json.ready
Sort results
Notice that it’s item, not item[].
$ cat facebook.xml |
xq '.rss.channel.item | sort_by(."g:quantity") | .[]."g:quantity"'
Sort results and keep the initial structure
Use |= vs |: See https://stackoverflow.com/a/30332672
$ jq '.urlset.url |= sort_by (.loc)' kids-shop-123-sitemap-real.json | head
{
"urlset": {
"@xmlns": "http://www.sitemaps.org/schemas/sitemap/0.9",
"@xmlns:image": "http://www.google.com/schemas/sitemap-image/1.1",
"url": [
{
"loc": "https://kids-shop-123.gr/10010060003-marsipo-lorelli-prasino",
"changefreq": "weekly",
"lastmod": "2021-11-03T13:49:11+00:00",
"priority": "1.0",
Explanation:
$ echo '{ "numbers": [{ "value": 2 }, { "value": 3 }, { "value": 1 }] }' |
jq '.numbers | sort_by ( .value )'
# | (pipe) did not keep numbers property.
[
{
"value": 1
},
{
"value": 2
},
{
"value": 3
}
]
$ echo '{ "numbers": [{ "value": 2 }, { "value": 3 }, { "value": 1 }] }' |
jq '.numbers |= sort_by ( .value )'
# |= kept numbers property.
{
"numbers": [
{
"value": 1
},
{
"value": 2
},
{
"value": 3
}
]
}
How to remove items from a nested array based on a filter and keep the rest structure unchanged
Again, use the assignment operator (|=)
$ echo '{"message":"Hi","items":[1,2,3]}' |
jq -c '.items[] |= select(. >2)'
{"message":"Hi","items":[3]}
Note this works for jq version >= 1.7.
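On older jq versions, a commonly used alternative is to update the array as a whole with map(select(...)) instead of updating each element. A sketch on the same sample input:

```shell
# map(select(...)) rebuilds the items array in one step; the rest of the object is kept.
filtered=$(echo '{"message":"Hi","items":[1,2,3]}' | jq -c '.items |= map(select(. > 2))')
echo "$filtered"
# prints: {"message":"Hi","items":[3]}
```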
Update with pipe assignment operator |= multiple properties at once, maybe even on different levels of nesting.
Do what you did above, then use a regular pipe to update the second property.
$ echo '{"message":"Hi","items":[1,2,3],"hello":{"items":[1,4,5]}}' |
jq -c '.items[] |= select(. >2) | .hello.items[] |= select(. >2)'
{"message":"Hi","items":[3],"hello":{"items":[4,5]}}
Unique field
Like sort_by, unique_by takes the whole array as input, so use item, not item[]:
$ cat facebook.xml |
xq '.rss.channel.item | unique_by(."g:quantity") | .[]."g:quantity"'
How to minify a JSON file
Use the -c (compact) option:
$ echo '{
"message": "Hello World",
"number": 123
}' | jq -c '.'
# prints:
{"message":"Hello World","number":123}
Count objects if the output is an object stream (?) instead of an array with --compact-output, -c option (keywords: minify, minified, minimize, compress)
The length works if the output is an array:
$ cat input/current-shop-export-22-09-2022.xml | xq '.data.post | length'
# Prints 761.
But if the output consists of individual objects, the length is applied as a filter on each object:
$ cat input/current-shop-export-22-09-2022.xml | xq '.data.post[] | length'
# Prints:
# 25
# 25
# ...
This comes up when you have already filtered the array and want to count how many items satisfy the filter. In this case, you can use the -c (compact) option to print each item on a single line and pipe the output to wc to count the number of lines:
$ cat input/current-shop-export-22-09-2022.xml |
xq -c '.data.post[] | select(."Ετικέτεςπροϊόντος" | type == "array")' |
wc -l
# Prints 208
Convert JSON to csv with jq
See https://stackoverflow.com/questions/32960857/how-to-convert-arbitrary-simple-json-to-csv-using-jq
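When the columns are known in advance, a shorter filter is often enough. A minimal sketch on made-up data (the id and name fields are hypothetical): emit a header row, then one array per object, and format each with @csv.

```shell
# Hypothetical sample: a flat array of objects with the same two fields.
sample='[{"id":1,"name":"Ann"},{"id":2,"name":"Bob"}]'

# header row first, then one row per object; @csv quotes strings and leaves numbers bare
csv=$(echo "$sample" | jq -r '["id","name"], (.[] | [.id, .name]) | @csv')
echo "$csv"
# prints:
# "id","name"
# 1,"Ann"
# 2,"Bob"
```

The filter from the linked answer, shown next, discovers the columns itself and so handles arbitrary keys.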
$ jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv' input.json > output.csv
Get the results as raw strings (without quotes)
Say you don’t want the string quotes:
$ echo '["A", "B", "C"]' | jq '.[]'
"A"
"B"
"C"
Use the -r option:
$ echo '["A", "B", "C"]' | jq -r '.[]'
A
B
C
-r: output raw strings, not JSON texts.
Multiple expressions (search keyword: conditions) in select
I want to select the tags from Woocommerce products that are arrays (they can also be a single string) and their length is equal to 8:
$ cat input/current-shop-export-22-09-2022.xml |
xq '.data.post[] | select(."Ετικέτεςπροϊόντος" | (type == "array") and length == 8)'
Enclose the first expression in parentheses and use the and keyword for the second expression, see https://stackoverflow.com/questions/33057420/jq-select-multiple-conditions, https://stedolan.github.io/jq/manual/#select(boolean_expression)
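The same pattern can be tried without the shop export. A sketch on made-up data (the id and tags fields are hypothetical), showing both and and or inside select:

```shell
# Hypothetical sample: "tags" is sometimes a string and sometimes an array.
sample='[{"id":1,"tags":"solo"},{"id":2,"tags":["a","b"]},{"id":3,"tags":["a","b","c"]}]'

# and: tags must be an array with exactly 2 elements
both=$(echo "$sample" | jq -c '.[] | select(.tags | (type == "array") and length == 2) | .id')
echo "$both"
# prints: 2

# or: tags is a plain string, or has exactly 3 elements
either=$(echo "$sample" | jq -c '[.[] | select(.tags | (type == "string") or length == 3) | .id]')
echo "$either"
# prints: [1,3]
```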
jq ternary if then else elseif conditions
My product categories are either a single category (string) or an array of categories (strings). I want to return the single category string, but I want to flatten the category arrays. For that reason, I will use if-then-else, see https://stedolan.github.io/jq/manual/#if-then-else and the example below:
$ cat input/current-shop-export-22-09-2022.xml |
xq -r '.data.post[]."Κατηγορίεςπροϊόντων" | if type == "array" then .[] else . end' |
sort | uniq -c
SQL-like operators, IN: check if the value of a property is in an array of known values
$ jq '.records[] | select(.[0] | IN(20008,20009,20010))' profits.json
Get a range from an array
Use the [start_index:end_index] syntax (the end index is exclusive):
# get the first 2 million elements from the array:
$ jq '.[0:2000000]' results-1705071239.json > results-1705071239-first-2m.json
You can also query items from the end of the array by using negative indices.
You can also omit the end index if you want everything from the start index until the last element of the array.
See “jq: Select range” https://stackoverflow.com/questions/45548604/jq-select-range
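A short sketch of the slice variants on a small made-up array:

```shell
sample='[0,1,2,3,4,5]'

# start index is inclusive, end index is exclusive
middle=$(echo "$sample" | jq -c '.[2:4]')
# a negative start index counts from the end of the array
last_two=$(echo "$sample" | jq -c '.[-2:]')
# omitted end index: from index 2 to the last element
from_two=$(echo "$sample" | jq -c '.[2:]')

echo "$middle $last_two $from_two"
# prints: [2,3] [4,5] [2,3,4,5]
```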
Combine arrays with jq with the slurp option
# the input is
# [] length 200
# [] length 50
# [] length 550
# that resulted from multiple files
$ jq 'transform-stuff' *json | jq -s 'add'
# prints
[each object merged here] length 800
The -s option: read (slurp) all inputs into an array; apply filter to it.
The filter in this case is add.
More examples to understand what’s happening:
# this works but our data is not in this form
$ echo '[[1,2],[3,4]]' | jq 'add'
[
1,
2,
3,
4
]
# our data is in this form and the output is not what we want...
# it applies the filter on each individual array.
$ echo '[1,2][3,4]' | jq 'add'
3
7
# jq slurp to the rescue:
$ echo '[1,2][3,4]' | jq -s 'add'
[
1,
2,
3,
4
]
# just jq -s also doesn't work in case you're wondering
$ echo '[1,2][3,4]' | jq -s
[
[
1,
2
],
[
3,
4
]
]
Links
See https://stedolan.github.io/jq/manual/ and https://remysharp.com/drafts/jq-recipes