Inspect text that contains control characters in bash
My use case: See what the IFS environment variable contains.
Some context: IFS is the input field separator (the Bash manual calls it the internal field separator). It is a shell variable that the shell itself uses to split the results of unquoted expansions into fields. Programs such as awk do not use it: awk separates the fields (columns) of a line with its own FS variable.
For example, if a line contains the text 1 2 3 and ${IFS} is set to the space character, splitting the line gives you 3 fields: 1, 2, and 3. If IFS is set to tab, though, you get only 1 field, one that contains the whole text 1 2 3.
It’s also used by the read shell built-in, a utility that’s used to read lines and fields from files and from the standard input (see help read).
It’s also worth noting that IFS can contain a list of characters, not only one character.
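To make the splitting behaviour concrete, here is a minimal sketch that uses the read built-in with a temporary IFS. The variable names (line, a, b, c, and so on) and the sample strings are only illustrative, and a here-document is used so the snippet does not depend on bash-only syntax.

```shell
line="1 2 3"

# IFS set to a space: read splits the line into three fields.
IFS=' ' read -r a b c <<EOF
$line
EOF
echo "a=$a b=$b c=$c"

# IFS set to a tab: the line contains no tab, so the whole text lands in x.
tab=$(printf '\t')
IFS="$tab" read -r x y z <<EOF
$line
EOF
echo "x=$x y=$y z=$z"

# IFS holding several characters: either ':' or ',' separates fields.
IFS=':,' read -r p q r <<EOF
one:two,three
EOF
echo "p=$p q=$q r=$r"
```

Putting the IFS assignment directly in front of read changes it for that one command only, so the shell's own IFS stays as it was.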
See Bourne shell reserved variables if you want to learn more about these variables.
So, let’s see what the IFS variable contains in my terminal:
$ echo "${IFS}"
# no visible output
I get no visible output, but if I pipe the result to the xxd command:
$ echo "${IFS}" | xxd
00000000: 2009 0a0a ...
# xxd output explanation:
# line address: | hex bytes in pairs of 2 by default | textual representation
I get 4 characters. There’s an extra newline character (0a) at the end because echo appends a newline to its output.
We can use the -n option of echo to not append a newline (type help echo for more options).
$ echo -n "${IFS}" | xxd
00000000: 2009 0a ..
It seems that my IFS environment variable consists of 3 characters:
- 0x20 the space
- 0x09 the tab (horizontal)
- 0x0a the line feed, aka new line, or \n, or LF.
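A variation on the same inspection, as a sketch: printf '%s' writes the value without a trailing newline, which avoids relying on echo -n, and od (a standard POSIX tool) can stand in for xxd where xxd is not installed. The snippet assumes IFS still has its usual default value.

```shell
# Hex bytes of IFS, one per column; printf '%s' adds no trailing newline.
printf '%s' "${IFS}" | od -An -tx1

# The same bytes shown as characters and backslash escapes (\t, \n).
printf '%s' "${IFS}" | od -An -c
```

With the default value, the first command should show the same three bytes as above: 20 09 0a.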
The Wikipedia page List of Unicode characters will probably serve you well if you want to translate hex codes to Unicode characters.
Text to UTF-8 bytes
You can also use xxd to see the UTF-8 encoding (Unicode) of some text.
In the following example, I print my name (Mark) in English:
$ echo -n "Mark" | xxd
00000000: 4d61 726b Mark
Not too exciting: I gave it 4 characters, I got back 4 bytes.
But if I type my name in Greek:
$ echo -n "Μάρκος" | xxd
00000000: ce9c ceac cf81 ceba cebf cf82 ............
I get the UTF-8 encoding, that is, how the text is stored in a file as bytes. I gave it “6” characters, I got back 12 bytes.
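The byte and character counts can also be checked with wc, as a rough sketch. It assumes the script itself is saved as UTF-8. The character count from wc -m additionally depends on the current locale, so it is only expected to be 6 under a UTF-8 locale.

```shell
name="Μάρκος"

# Count bytes: each of these Greek letters takes 2 bytes in UTF-8.
printf '%s' "$name" | wc -c

# Count characters: the result depends on the locale (LANG / LC_ALL / LC_CTYPE).
printf '%s' "$name" | wc -m
```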
A “quick” way to verify these bytes is with JavaScript’s encodeURIComponent function:
$ node
Welcome to Node.js vx.x.x.
Type ".help" for more information.
> encodeURIComponent("Μάρκος").split("%").join(" ").toLowerCase();
' ce 9c ce ac cf 81 ce ba ce bf cf 82'
Bash default encoding
If you want to see the default encoding in your shell, search for environment variables whose names start with LC (stands for locale) or LANG:
$ printenv | grep -iE 'LC|LANG'
LANG=en_US.UTF-8
The printenv command above retrieves the values of all environment variables. The | (pipe) symbol redirects the printenv output to the grep command.
The grep command is used to search for patterns in text. The options used are:
- -i: Performs a case-insensitive search.
- -E: Enables extended regular expressions for pattern matching.
The pattern 'LC|LANG' specifies the search criteria. It matches lines that contain either “LC” or “LANG” anywhere in the line, in any letter case because of -i.
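Because the pattern is unanchored and case-insensitive, it can also pick up unrelated variables. The sketch below runs both the original pattern and a stricter, anchored one against made-up sample data (the variable names and values in sample are hypothetical, not taken from a real environment).

```shell
sample='LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
WELCOME_MSG=hello
HOME=/home/mark'

# Original pattern: also matches WELCOME_MSG, because "WELCOME" contains "LC".
printf '%s\n' "$sample" | grep -iE 'LC|LANG'

# Anchored pattern: only names that start with LC_ or LANG.
printf '%s\n' "$sample" | grep -E '^(LC_|LANG)'
```

On a real system, the anchored pattern can be applied to the printenv output in the same way, and the locale command prints the effective locale settings.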