Inspect text that contains control characters in bash

Last update: 05 July, 2023

My use case: See what the IFS shell variable contains.

Some context: IFS (the internal field separator, sometimes expanded as input field separator) is a shell variable. The shell uses it to split the result of an unquoted expansion into words (fields). Note that awk does not read IFS; it has its own FS variable for separating the fields (columns) of a line.

For example, if a variable contains the text 1 2 3 and IFS is set to the space character, an unquoted expansion of that variable gives you 3 fields: 1, 2, and 3. If IFS is set to tab only, though, you get just 1 field, a field that contains the whole text 1 2 3.

It’s also used by the read shell built-in, a utility that reads a line from the standard input (or from a file, through redirection) and splits it into fields (see help read).

It’s also worth noting that IFS can contain a list of characters, not only one character.
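A minimal sketch of how read treats an IFS that lists more than one character. The sample line and the variable names (line, first, second, third) are made up for illustration; the IFS assignment is placed in front of read so that it applies to that single command only.

```shell
# Sample input with two different separators (hypothetical data).
line="alpha:beta,gamma"

# IFS lists both ':' and ','; either one ends a field.
# Prefixing the assignment scopes it to this read call only.
IFS=':,' read -r first second third <<EOF
$line
EOF

printf '%s\n' "$first" "$second" "$third"
```

This should print alpha, beta, and gamma on separate lines, and the shell's own IFS is left untouched afterwards.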

See Bourne shell reserved variables if you want to learn about more of these variables.

So, let’s see what the IFS variable contains in my terminal:

$ echo "${IFS}"

# no visible output

I get no visible output, but if I pipe the result to the xxd command:

$ echo "${IFS}" | xxd

00000000: 2009 0a0a                                 ...
# xxd output explanation:
# offset: | hex bytes, grouped two bytes at a time by default | textual representation

I get 4 bytes. The second 0a is an extra newline character; it is there because echo appends a newline to the end of its output.

We can use the -n option of echo to not append a newline (type help echo for more options).

$ echo -n "${IFS}" | xxd

00000000: 2009 0a                                   ..
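A small sketch of the same check done with printf '%s', which never appends a newline, so no -n flag is needed. Two assumptions here: od (the POSIX dump utility) stands in for xxd purely for portability, and hex_bytes is a hypothetical helper name, not something from this post.

```shell
# hex_bytes: print the bytes of its first argument in hex (hypothetical helper).
# printf '%s' writes the string as-is, with no trailing newline.
hex_bytes() {
  printf '%s' "$1" | od -An -tx1
}

# In a fresh shell with the default IFS this should show the bytes 20 09 0a.
hex_bytes "$IFS"
```

The exact spacing of od's output differs between systems, but the byte values should be the same as in the xxd output.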

It seems that my IFS variable consists of 3 characters:

  1. 0x20 the space
  2. 0x09 the tab (horizontal)
  3. 0x0a the line feed, aka new line, or \n, or LF.
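If you would rather not decode hex by hand, here is a sketch that asks bash to print the value in escaped form. It assumes bash specifically: the %q format is an extension of bash's printf built-in and is not part of POSIX printf.

```shell
# Assumes bash: %q prints the argument quoted so that it could be reused as shell input.
# Control characters are shown as backslash escapes inside $'...'.
printf '%q\n' "$IFS"
# With the default IFS this prints: $' \t\n'
```

The three characters in the output (space, \t, \n) line up with the three bytes listed above.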

The Wikipedia page List of Unicode characters will probably serve you well if you want to translate hex codes to Unicode characters.

Text to UTF-8 bytes

You can also use xxd to see the UTF-8 encoding (Unicode) of some text.

In the following example, I print my name (Mark) in English:

$ echo -n "Mark" | xxd

00000000: 4d61 726b                                Mark

Not too exciting: I gave it 4 characters and got back 4 bytes.

But if I type my name in Greek:

$ echo -n "Μάρκος" | xxd

00000000: ce9c ceac cf81 ceba cebf cf82            ............

I get the UTF-8 encoding, that is, how the text is stored as bytes (in a file, for example). I gave it 6 characters and got back 12 bytes.

A “quick” way to verify this is with JavaScript’s encodeURIComponent method:

$ node
Welcome to Node.js vx.x.x.
Type ".help" for more information.
> encodeURIComponent("Μάρκος").split("%").join(" ").toLowerCase();

' ce 9c ce ac cf 81 ce ba ce bf cf 82'
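The character and byte counts can also be checked without leaving the shell. This is a sketch under one assumption: the character count from ${#name} depends on the current locale, so it is only 6 when the shell runs in a UTF-8 locale. The byte count from wc -c does not depend on the locale.

```shell
name="Μάρκος"

# Length in characters; locale-dependent (6 only in a UTF-8 locale).
chars=${#name}

# Length in bytes; the arithmetic expansion strips any padding wc adds.
bytes=$(( $(printf '%s' "$name" | wc -c) ))

printf 'chars=%s bytes=%s\n' "$chars" "$bytes"
```

In a UTF-8 locale this should report 6 characters and 12 bytes, matching the xxd output.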

Bash default encoding

If you want to see the default encoding in your shell, search for environment variables whose names start with LC (for locale) or LANG:

$ printenv | grep -iE 'LC|LANG'

LANG=en_US.UTF-8

The printenv command above retrieves the values of all environment variables. The | (pipe) symbol redirects the printenv output to the grep command.

The grep command is used to search for patterns in text. The options used are:

  • -i: Performs a case-insensitive search.
  • -E: Enables extended regular expressions for pattern matching.

The pattern 'LC|LANG' specifies the search criteria. It matches lines that contain either “LC” or “LANG” anywhere in the line, and because of -i, in any letter case.
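Because that pattern matches anywhere in the line, it can also pick up unrelated variables. A stricter sketch anchors the match to the start of the variable name. The helper name locale_vars and the sample variables (including CALC_HOME) are hypothetical, chosen only to show the difference.

```shell
# locale_vars: keep only lines whose name starts with LANG or LC_ (hypothetical helper).
# ^ anchors the match to the beginning of the line.
locale_vars() {
  grep -E '^(LANG|LC_)'
}

# Made-up sample input; CALC_HOME contains "LC" but is not a locale variable.
printf '%s\n' 'LANG=en_US.UTF-8' 'CALC_HOME=/opt/calc' 'LC_TIME=el_GR.UTF-8' | locale_vars
```

With the sample input, only the LANG and LC_TIME lines should be printed. On a real system the same filter can be applied as printenv | locale_vars.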
