JSON Parsing for Command-Line and Shell Scripts with jq

JSON is ubiquitous. Maybe you work with a service that ingests or produces JSON, and maybe you want to automate some tasks around it with shell scripts. However, the hierarchical structure of JSON makes this difficult with grep, sed, or awk. Enter jq.

jq is a utility that lets you “slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text”. In this post, I’ll show how I’ve used jq as part of a backup script.
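As a quick taste, here's jq pulling a single field out of an inline JSON document (the document and field name are invented for illustration):

echo '{"name": "ada", "languages": ["en", "fr"]}' | jq -r '.name'
# ada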

Script Description

I needed to back up some databases hosted on AWS’s RDS service. These databases can only be accessed from IP addresses in the same subnet. I needed to run the backups every day, but didn’t want to pay for an EC2 instance to be running all day for less than 10 minutes of work. So, I configured an instance that I would use for the backups, and placed it in a “stopped” state. My script would then do the following.

  1. Start the EC2 instance.
  2. Wait until the instance has started.
  3. Get the instance’s public IP address.
  4. Wait for the instance to accept an SSH connection.
  5. Fetch the DB credentials from AWS Secrets Manager.
  6. Run pg_dump via ssh and store the dump files locally.
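Steps 1, 2, and 4 don’t involve jq; for context, a rough sketch of them might look like the following (assuming $PROXY_SERVER_ID and $SSH_KEY are set, and using the $IP_ADDRESS obtained in step 3):

# Step 1: start the stopped instance
aws ec2 start-instances --instance-ids "$PROXY_SERVER_ID"

# Step 2: block until AWS reports the instance as running
aws ec2 wait instance-running --instance-ids "$PROXY_SERVER_ID"

# Step 4: retry until sshd accepts a connection
until ssh -o ConnectTimeout=5 -i "$SSH_KEY" ubuntu@"$IP_ADDRESS" true; do
    sleep 5
done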


The third and fifth steps require jq.

Getting the Server IP Address

Once the server is started, its details can be accessed via aws ec2 describe-instances. This command returns details about CPU architecture, subnet configuration, mounted volumes, and more. All we need is the public IP address. One of the basic functions of jq is filtering JSON objects.

INSTANCE_DESCRIPTION=$(aws ec2 describe-instances --instance-ids "$PROXY_SERVER_ID")
IP_ADDRESS=$(echo "$INSTANCE_DESCRIPTION" | jq -r '.Reservations[0].Instances[0].NetworkInterfaces[0].Association.PublicIp')

In the above snippet, the full description is first loaded into $INSTANCE_DESCRIPTION. That output is then piped through jq, and the public IP address is accessed at the path .Reservations[0].Instances[0].NetworkInterfaces[0].Association.PublicIp. This syntax should be fairly familiar. The initial dot operator (.) is a reference to the root object, and the path drills down the hierarchy, indexing into arrays ([0]) where necessary. jq can also iterate over every element of an array, but because we specified the instance id (--instance-ids $PROXY_SERVER_ID), we only need the first (and only) element. The -r flag tells jq to output the “raw result”, rather than a string wrapped in quote marks.
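To see the difference, here’s the empty index ([]), which iterates over every element of an array, run against a small hand-written document rather than real describe-instances output:

echo '{"Reservations": [{"Instances": [{"InstanceId": "i-aaa"}, {"InstanceId": "i-bbb"}]}]}' \
    | jq -r '.Reservations[].Instances[].InstanceId'
# i-aaa
# i-bbb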

Fetching DB Credentials

Fetching the DB credentials is much the same. The raw blob is fetched via the AWS command-line tools, and the desired data is then parsed out using jq.

DB_SECRET_PATH=[DB PATH WITHIN AWS SECRETS]
SECRET=$(aws secretsmanager get-secret-value --secret-id "$DB_SECRET_PATH" | jq -r '.SecretString')
DBUSER=$(echo "$SECRET" | jq -r '.username')
DBNAME=$(echo "$SECRET" | jq -r '.dbname')
PASS=$(echo "$SECRET" | jq -r '.password')
HOST=$(echo "$SECRET" | jq -r '.host')

DATE_STR=$(date +"%y-%m-%d")
ssh ubuntu@"$IP_ADDRESS" -i "$SSH_KEY" \
    "echo $PASS | pg_dump --host=$HOST --username=$DBUSER $DBNAME | gzip" \
    > "$BACKUP_DIR/db-name.dump.$DATE_STR.sql.gz"

Parsing out the secret is a little more complicated because the secret body is a serialized JSON string within a JSON document.

SECRET=$(aws secretsmanager get-secret-value --secret-id "$DB_SECRET_PATH" | jq -r '.SecretString')

The result of aws secretsmanager get-secret-value includes the secret body at the path .SecretString. Again, the raw result is stored in a shell variable.
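As an aside, jq’s fromjson builtin parses a serialized JSON string, so the same extraction could be done in a single pass without the intermediate variable; a sketch:

DBUSER=$(aws secretsmanager get-secret-value --secret-id "$DB_SECRET_PATH" \
    | jq -r '.SecretString | fromjson | .username')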

Next, each value is picked off the resulting secret object.

DBUSER=$(echo "$SECRET" | jq -r '.username')
DBNAME=$(echo "$SECRET" | jq -r '.dbname')
PASS=$(echo "$SECRET" | jq -r '.password')
HOST=$(echo "$SECRET" | jq -r '.host')

As with $INSTANCE_DESCRIPTION in the previous section, we pipe the JSON string through jq and pick specific values out by name (again with -r, so the values aren’t wrapped in quote marks). The values are then used to connect to the database.

DATE_STR=$(date +"%y-%m-%d")
ssh ubuntu@"$IP_ADDRESS" -i "$SSH_KEY" \
    "echo $PASS | pg_dump --host=$HOST --username=$DBUSER $DBNAME | gzip" \
    > "$BACKUP_DIR/db-name.dump.$DATE_STR.sql.gz"

BONUS! Echoing the Password to pg_dump

Like other Postgres utilities, pg_dump doesn’t accept a password as a command-line argument; it prompts for one instead, and with no terminal attached (as in our ssh invocation) it reads the answer from standard input. We can avoid user interaction by piping the password into pg_dump.

echo $PASS | pg_dump --host=$HOST --username=$DBUSER $DBNAME
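If piping the password ever proves fragile, another common approach (documented by Postgres itself) is the PGPASSWORD environment variable, set on the remote side of the ssh invocation:

ssh ubuntu@"$IP_ADDRESS" -i "$SSH_KEY" \
    "PGPASSWORD=$PASS pg_dump --host=$HOST --username=$DBUSER $DBNAME | gzip" \
    > "$BACKUP_DIR/db-name.dump.$DATE_STR.sql.gz"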

Bye!

jq is a great tool, and is really easy to get a grip on if you are already familiar with JSON. Hopefully you found this post helpful. Thanks for reading!