Parsing JSON/XML/YML in the command line with jq/yq/xq

Last update: 31 December, 2025

Install

Install jq with a package manager. For example, in Ubuntu with apt:

$ sudo apt install jq

The xq/yq executables are part of the Python package yq (https://github.com/kislyuk/yq), which transforms XML/YAML files to JSON and calls jq with the filter you provide. Install it with:

# You need Python 3 installed first, along with its pip package manager.
$ pip3 install yq

xq/yq call jq, so the filters will be the same.

See https://stedolan.github.io/jq/manual/

Get the first item of the array

$ curl https://jsonplaceholder.typicode.com/users | jq '.[0]'

Get only a specific field from the array items

Also count the unique values:

$ curl https://jsonplaceholder.typicode.com/users |
 jq '.[].address.city' | sort | uniq -c

Get the length

Get the length of an array.

$ cat facebook.xml | xq  '.rss.channel.item | length'

Get the length of nested fields. Be careful, though, because if the field is a string, instead of an array, you’ll get the string length.

$ cat facebook.xml | xq  '.rss.channel.item[]."g:additional_image_link" | length'
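
To make the string-versus-array pitfall concrete, here is a minimal sketch. The inline feed and the img field name are made up for illustration; only the length behaviour comes from jq itself.

```shell
# Hypothetical feed: the first item has one image link (a string),
# the second has two (an array).
feed='{"items":[{"img":"a.jpg"},{"img":["b.jpg","c.jpg"]}]}'

# length gives the character count for the string and the element count for the array.
raw_lengths=$(printf '%s' "$feed" | jq -c '[.items[].img | length]')
echo "$raw_lengths"   # [5,2]

# Wrapping a lone string in an array first turns every length into an image count.
image_counts=$(printf '%s' "$feed" | jq -c '[.items[].img | if type == "array" then . else [.] end | length]')
echo "$image_counts"  # [1,2]
```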

Filter entries with select

$ curl https://jsonplaceholder.typicode.com/users |
 jq '.[] | select(.username == "Bret")'

Use and/or:

$ curl https://jsonplaceholder.typicode.com/users |
 jq '.[] | select(.username == "Bret" and .address.city == "Gwenborough")'
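
The or keyword works the same way. A small sketch with a trimmed-down, made-up user list, so it runs without network access:

```shell
# Hypothetical sample with the city flattened to a top-level field.
users='[{"username":"Bret","city":"Gwenborough"},{"username":"Antonette","city":"Wisokyburgh"},{"username":"Samantha","city":"McKenziehaven"}]'

# Keep a user if either condition is true, then print only the usernames.
matched=$(printf '%s' "$users" | jq -c '[.[] | select(.username == "Bret" or .city == "Wisokyburgh") | .username]')
echo "$matched"  # ["Bret","Antonette"]
```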

Unescape characters from a JSON string with the fromjson filter (parse)

$ cat | jq '. | fromjson'

> "{\"result\":[{\"battery.current\":0,\"vehicle.mileage\":437.974}]}"

prints:

{
 "result": [
   {
     "battery.current": 0,
     "vehicle.mileage": 437.974
   }
 ]
}

Or decode JSON strings in your programming language.
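
If the escaped JSON sits inside a field of a larger object, fromjson can probably be combined with |= to decode it in place. A sketch, assuming a hypothetical record with a payload field:

```shell
# Hypothetical record whose "payload" field holds JSON encoded as a string.
record='{"id":1,"payload":"{\"ok\":true}"}'

# |= replaces the field with its decoded value and keeps the rest of the object.
decoded=$(printf '%s' "$record" | jq -c '.payload |= fromjson')
echo "$decoded"  # {"id":1,"payload":{"ok":true}}
```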

Return specific fields in a row with string interpolation

$ curl https://jsonplaceholder.typicode.com/users |
 jq '.[] | select(.username == "Bret") | "\(.id) \(.name) \(.company.name)"'
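
A different technique that gives a similar one-row-per-item output is the @tsv format string, which joins an array with tabs. This is an alternative to string interpolation, not the same thing; the sample data is made up.

```shell
# Hypothetical two-user sample.
users='[{"id":1,"name":"Leanne Graham"},{"id":2,"name":"Ervin Howell"}]'

# Build an array per user and let @tsv join it with tabs; -r prints it unquoted.
rows=$(printf '%s' "$users" | jq -r '.[] | [.id, .name] | @tsv')
echo "$rows"
```

Tab-separated output is convenient to pipe into cut, awk, or column -t.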

Pick specific fields from an object

$ jq '.urlset.url[] | { loc, lastmod }' kids-shop-123-sitemap.json | head

{
 "loc": "https://kids-shop-123.gr/kathisma-daxtilidi-baniou-panda-prasino-lorelli",
 "lastmod": "2023-04-27T21:54:05+00:00"
}
{
 "loc": "https://kids-shop-123.gr/kathisma-daxtilidi-baniou-panda-roz-lorelli",
 "lastmod": "2023-02-28T13:52:06+00:00"
}
{
 "loc": "https://kids-shop-123.gr/1022057-Lorelli-pipila-silikonis-MY-FRIEND-me-kallyma1022057-blue",

Select with pipes and match

Note the pipe inside select(..).

$ cat facebook.xml |
 xq  '.rss.channel.item[] | select(."g:link" | match("https:\/\/kids-shop-123.gr\/.*\/"))."g:link"' |
 grep -ivE route

Select with pipes and contains (includes)

$ cat facebook.xml |
 xq  '.rss.channel.item[] | select(."g:link" | contains("Lorelli")) | ."g:link"'

Get all the possible properties (keys)

From an array of objects, with the keys built-in function:

$ cat facebook.xml | xq  '.rss.channel.item[] | keys' | sort | uniq

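You can also use keys[]. The difference: keys returns one array of key names per object, while keys[] emits each key name as a separate output, which suits sort | uniq better.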
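
A short sketch that runs keys and keys[] side by side on made-up inline data:

```shell
# Hypothetical array of two objects with partly overlapping keys.
items='[{"b":1,"a":2},{"c":3,"a":4}]'

# keys: one sorted array of key names per object.
per_object=$(printf '%s' "$items" | jq -c '.[] | keys')
echo "$per_object"
# ["a","b"]
# ["a","c"]

# keys[]: one key name per output line, easy to deduplicate.
all_keys=$(printf '%s' "$items" | jq -r '.[] | keys[]' | sort -u)
echo "$all_keys"
# a
# b
# c
```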

Get all the possible values (values)

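From an array of objects with values[]. Note that values is not the counterpart of keys: it passes through any input that is not null, so on an object values[] behaves like .[].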

$ jq '.[][1].ips | values[]' ./x64/Debug/user-stats.json
# From input:
[
 [
   667204,
   {
     "ips": {
       "122.127.33.1": 1685028929
     }
   }
 ],
 [
   675978,
   {
     "ips": {
       "10.26.218.58": 1685028929
     }
   }
 ],
...
# Output:
1685028929
1685028929
1685028976
...
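
A minimal sketch of what values does, on made-up data shaped like the input above. As far as I can tell, values[] and the plain iterator .[] give the same result on an object, and values on its own just drops nulls.

```shell
# Hypothetical object shaped like one "ips" entry.
stats='{"ips":{"10.26.218.58":1685028929,"122.127.33.1":1685028976}}'

with_values=$(printf '%s' "$stats" | jq -c '[.ips | values[]]')
with_iterator=$(printf '%s' "$stats" | jq -c '[.ips[]]')
echo "$with_values"    # [1685028929,1685028976]
echo "$with_iterator"  # [1685028929,1685028976]

# On its own, values only filters out null inputs.
non_null=$(echo '[1,null,2]' | jq -c 'map(values)')
echo "$non_null"       # [1,2]
```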

Get the items that have a value that is an array

Use the type built-in function:

$ cat shop-2023-06-17.json |
 jq '.products.product[] | select( .manufacturer == "CRISTIANA MASI") | select( values[] | type == "array" )'
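
One thing to watch: select( values[] | ... ) emits the item once for every matching value, so an item with two array properties is printed twice. A sketch of that, with any(generator; condition) as one possible way to get each item once. The product object is made up.

```shell
# Hypothetical product with two array-valued properties.
product='{"sku":"x1","colors":["red","blue"],"sizes":["s","m"]}'

# One output per array-valued property: the same object appears twice.
with_values=$(printf '%s' "$product" | jq -c 'select(values[] | type == "array")' | wc -l | tr -d ' ')

# any(...) collapses the checks into a single boolean: the object appears once.
with_any=$(printf '%s' "$product" | jq -c 'select(any(.[]; type == "array"))' | wc -l | tr -d ' ')

echo "$with_values $with_any"  # 2 1
```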

Find which items have a value that is an array, and get the items’ keys and counts

to_entries transforms an object to an array of key/value objects. It’s useful if you want to work on either the key or the value and then reference the other one.

$ cat ~/Desktop/shop/shop-2024-02-10.json |
 jq -r '.data.post[] | to_entries | .[] | select(.value | type == "array") | "\(.key) \(.value | length)"' |
 sort |
 uniq -c |
 sort |
 awk 'BEGIN{print "count, key, array_length"} {print $0}'

# prints
count, key, array_length
     1 ImageFilename 3
     1 ImageFilename 8
     1 ImageURL 3
     1 ImageURL 8
     1 ΠροϊόνΣύνθεση 2
     2 ΠροϊόνΔιαστάσεις 2
     2 ΠροϊόνΧρώμα 4
     3 Κατηγορίεςπροϊόντων 3
    12 ΠροϊόνΧρώμα 2
    21 ImageFilename 2
    21 ImageURL 2
    55 Κατηγορίεςπροϊόντων 2

The use case here is to see which objects in an array have array properties and, at the same time, print the key and the length of each array. This is useful when I import a feed with wpallimport and need to know the maximum number of additional images or attributes to account for (and not lose existing data).

$ cat kids-shop-123-facebook-2023-06-19.json |
 jq -c '.rss.channel.item[] | to_entries[] | if .value | type == "array" then { key, length: .value | length } else null end' |
 grep -v null | head
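
To see what to_entries actually produces before any filtering, here is a tiny sketch on a made-up product object:

```shell
# Hypothetical product with one array property and one string property.
product='{"images":["a.jpg","b.jpg","c.jpg"],"sku":"x1"}'

# Each property becomes a {key, value} object.
entries=$(printf '%s' "$product" | jq -c 'to_entries')
echo "$entries"
# [{"key":"images","value":["a.jpg","b.jpg","c.jpg"]},{"key":"sku","value":"x1"}]

# Filter on the value, then print the key next to the array length.
array_keys=$(printf '%s' "$product" | jq -r 'to_entries[] | select(.value | type == "array") | "\(.key) \(.value | length)"')
echo "$array_keys"  # images 3
```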

Drop (exclude, delete, remove) a property (field) from an object path

Use del(path_expression):

$ echo '[
 { "name": "Mark", "age": 12, "password": "123456" },
 { "name": "John", "age": 21, "password": "123" }
]' |
 jq 'del(.[].password)'

# prints
[
 {
   "name": "Mark",
   "age": 12
 },
 {
   "name": "John",
   "age": 21
 }
]

Add a property (field) to an object

Add the profit_target_sent property to all objects, with a default value of 0.

$ jq '.records[][1] += { "profit_target_sent": 0.0 }' my_file.json > my_file.json.ready

Sort results

Notice that it’s item, not item[]: sort_by takes the whole array as its input.

$ cat facebook.xml |
 xq  '.rss.channel.item | sort_by(."g:quantity") | .[]."g:quantity"'
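
One caveat, assuming the quantities arrive as strings (which is usually the case for text converted from XML): strings sort alphabetically, not numerically. A sketch with made-up data and a hypothetical q field, using tonumber inside sort_by:

```shell
# Hypothetical items whose quantity is a string.
items='[{"q":"10"},{"q":"9"},{"q":"100"}]'

# String comparison: "10" < "100" < "9".
as_strings=$(printf '%s' "$items" | jq -c 'sort_by(.q) | map(.q)')
echo "$as_strings"  # ["10","100","9"]

# Convert to a number only for the sort key; the stored values stay strings.
as_numbers=$(printf '%s' "$items" | jq -c 'sort_by(.q | tonumber) | map(.q)')
echo "$as_numbers"  # ["9","10","100"]
```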

Sort results and keep the initial structure

Use |= instead of |. See https://stackoverflow.com/a/30332672

$ jq '.urlset.url |= sort_by (.loc)' kids-shop-123-sitemap-real.json | head

{
 "urlset": {
   "@xmlns": "http://www.sitemaps.org/schemas/sitemap/0.9",
   "@xmlns:image": "http://www.google.com/schemas/sitemap-image/1.1",
   "url": [
     {
       "loc": "https://kids-shop-123.gr/10010060003-marsipo-lorelli-prasino",
       "changefreq": "weekly",
       "lastmod": "2021-11-03T13:49:11+00:00",
       "priority": "1.0",

Explanation:

$ echo '{ "numbers": [{ "value": 2 }, { "value": 3 }, { "value": 1 }] }' |
 jq '.numbers | sort_by ( .value )'
# | (pipe) did not keep numbers property.
[
 {
   "value": 1
 },
 {
   "value": 2
 },
 {
   "value": 3
 }
]

$ echo '{ "numbers": [{ "value": 2 }, { "value": 3 }, { "value": 1 }] }' |
 jq '.numbers |= sort_by ( .value )'

# |= kept numbers property.
{
 "numbers": [
   {
     "value": 1
   },
   {
     "value": 2
   },
   {
     "value": 3
   }
 ]
}

How to remove items from a nested array based on a filter and keep the rest of the structure unchanged

Again, use the update-assignment operator (|=):

$ echo '{"message":"Hi","items":[1,2,3]}' |
 jq -c '.items[] |= select(. >2)'

{"message":"Hi","items":[3]}

Note that this works in jq version 1.7 and later.
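
On older jq versions, a variant that should give the same result is to update the array as a whole with map(select(...)) instead of updating each element. A sketch on the same input:

```shell
input='{"message":"Hi","items":[1,2,3]}'

# Update .items as a whole: map(select(...)) rebuilds the array from the kept elements.
filtered=$(printf '%s' "$input" | jq -c '.items |= map(select(. > 2))')
echo "$filtered"  # {"message":"Hi","items":[3]}
```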

Update multiple properties at once with the update-assignment operator (|=), even at different levels of nesting

Do what you did above, then use a regular pipe to chain the update of the second property.

$ echo '{"message":"Hi","items":[1,2,3],"hello":{"items":[1,4,5]}}' |
 jq -c '.items[] |= select(. >2) | .hello.items[] |= select(. >2)'

{"message":"Hi","items":[3],"hello":{"items":[4,5]}}

Unique field

unique_by works like sort_by: it takes the whole array as its input (item, not item[]).

$ cat facebook.xml |
 xq  '.rss.channel.item | unique_by(."g:quantity") | .[]."g:quantity"'

How to minify a JSON file

Use the -c (compact) option:

$ echo '{
  "message": "Hello World",
  "number": 123
}' | jq -c '.'

# prints:
{"message":"Hello World","number":123}

Count objects if the output is a stream of objects instead of an array, with the --compact-output (-c) option (keywords: minify, minified, minimize, compress)

The length works if the output is an array:

$ cat input/current-shop-export-22-09-2022.xml | xq '.data.post | length'
# Prints 761.

But if the output consists of individual objects, the length is applied as a filter on each object:

$ cat input/current-shop-export-22-09-2022.xml | xq '.data.post[] | length'
# Prints:
# 25
# 25
# ...

This comes up when you have already filtered the array and you want to count how many items satisfy the filter. In this case, you can use the -c (compact) option to print each item on a single line and pipe the output to wc to count the number of lines:

$ cat input/current-shop-export-22-09-2022.xml |
 xq -c  '.data.post[] | select(."Ετικέτεςπροϊόντος" | type == "array")' |
 wc -l
# Prints 208
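
Another option is to count inside jq: collect the filtered stream back into an array with [ ... ] and take its length. A sketch on made-up data with a hypothetical tags field:

```shell
# Hypothetical posts: tags is an array in two of them and a string in one.
posts='[{"tags":["a","b"]},{"tags":"c"},{"tags":["d"]}]'

# The square brackets gather every selected item into one array, so length counts items.
count=$(printf '%s' "$posts" | jq '[.[] | select(.tags | type == "array")] | length')
echo "$count"  # 2
```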

Convert JSON to CSV with jq

See https://stackoverflow.com/questions/32960857/how-to-convert-arbitrary-simple-json-to-csv-using-jq

$ jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv' input.json > output.csv
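
To see what that filter does, here it is on a tiny made-up input where the objects do not share all keys. Missing keys end up as empty fields.

```shell
# Hypothetical input: the second object lacks "b" and the first lacks "c".
rows_json='[{"a":1,"b":"x"},{"a":2,"c":"y"}]'

# $cols is the sorted union of all keys; every row is laid out in that column order.
csv=$(printf '%s' "$rows_json" | jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv')
echo "$csv"
# "a","b","c"
# 1,"x",
# 2,,"y"
```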

Get the results as raw strings (without quotes)

Say you don’t want the string quotes:

$ echo '["A", "B", "C"]' | jq '.[]'

"A"
"B"
"C"

Use the -r option:

$ echo '["A", "B", "C"]' | jq -r '.[]'

A
B
C

The -r option, from the jq help: output raw strings, not JSON texts.

Multiple expressions (search keyword: conditions) in select

I want to select the tags from WooCommerce products that are arrays (they can also be a single string) and whose length is equal to 8:

$ cat input/current-shop-export-22-09-2022.xml |
 xq  '.data.post[] | select(."Ετικέτεςπροϊόντος" | (type == "array") and length == 8)'

Enclose the first expression in parentheses and use the and keyword for the second expression, see https://stackoverflow.com/questions/33057420/jq-select-multiple-conditions, https://stedolan.github.io/jq/manual/#select(boolean_expression)

jq ternary if then else elseif conditions

My product categories are either a single category (string) or an array of categories (strings). I want to return the single category string, but I want to flatten the category arrays. For that reason, I will use if-then-else, see https://stedolan.github.io/jq/manual/#if-then-else and the example below:

$ cat input/current-shop-export-22-09-2022.xml |
 xq -r '.data.post[]."Κατηγορίεςπροϊόντων" | if type == "array" then .[] else . end' |
 sort | uniq -c
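
The same if-then-else on a small made-up input, so the flattening is easy to follow. The cat field name is hypothetical.

```shell
# Hypothetical posts: one with a single category, one with an array of categories.
posts='[{"cat":"toys"},{"cat":["toys","bath"]}]'

# Arrays are expanded into their elements; plain strings pass through unchanged.
flat=$(printf '%s' "$posts" | jq -r '.[].cat | if type == "array" then .[] else . end')
echo "$flat"
# toys
# toys
# bath
```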

SQL-like operators, IN: check if the value of a property is in an array of known values

$ jq '.records[] | select(.[0] | IN(20008,20009,20010))' profits.json

Get a range from an array

Use the [start_index:end_index] syntax:

# get the first 2 million elements from the array:

$ jq '.[0:2000000]' results-1705071239.json > results-1705071239-first-2m.json

You can also query items from the end of the array by using negative indices.

You can also omit the end index to get everything from the start index to the last element of the array.

See “jq: Select range” https://stackoverflow.com/questions/45548604/jq-select-range
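
A quick sketch of the slice variants on a small made-up array:

```shell
numbers='[0,1,2,3,4,5]'

# Start index is inclusive, end index is exclusive.
first_two=$(printf '%s' "$numbers" | jq -c '.[0:2]')
echo "$first_two"   # [0,1]

# A negative start index counts from the end of the array.
last_two=$(printf '%s' "$numbers" | jq -c '.[-2:]')
echo "$last_two"    # [4,5]

# No end index: everything from the start index to the last element.
from_third=$(printf '%s' "$numbers" | jq -c '.[2:]')
echo "$from_third"  # [2,3,4,5]
```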

Combine arrays with jq with the slurp option

# the input is
# [] length 200
# [] length 50
# [] length 550
# that resulted from multiple files

$ jq 'transform-stuff' *json | jq -s 'add'

# prints
[each object merged here] length 800

The -s option, from the jq help: read (slurp) all inputs into an array; apply filter to it.

The filter in this case is add.

More examples to understand what’s happening:

# this works but our data is not in this form
$ echo '[[1,2],[3,4]]' | jq 'add'
[
 1,
 2,
 3,
 4
]

# our data is in this form and the output is not what we want...
# it applies the filter on each individual array.
$ echo '[1,2][3,4]' | jq 'add'
3
7

# jq slurp to the rescue:
$ echo '[1,2][3,4]' | jq -s 'add'
[
 1,
 2,
 3,
 4
]

# just jq -s also doesn't work in case you're wondering
$ echo '[1,2][3,4]' | jq -s
[
 [
   1,
   2
 ],
 [
   3,
   4
 ]
]

See https://stedolan.github.io/jq/manual/ and https://remysharp.com/drafts/jq-recipes
