Get specific columns from multiple files and paste (combine) them into a new file

Last update: 25 June, 2023

Assume that the first "file" (the output of du) shows the disk usage of each file in column 1 and the file path in column 2:

# file 1
$ du -sh ~/bin/*

0       /c/Users/asdf/bin/abswin
0       /c/Users/asdf/bin/abwin
37M     /c/Users/asdf/bin/BrowserStackLocal.exe
0       /c/Users/asdf/bin/cloc
0       /c/Users/asdf/bin/curltime
960K    /c/Users/asdf/bin/cvdump.exe

And file 2 has the file size in column 5:

# "file" 2
$ ls -lha ~/bin/*

lrwxrwxrwx 1 asdf 197121   27 Nov 21  2022 /c/Users/asdf/bin/abswin -> /c/abs-from-laragon/abs.exe
lrwxrwxrwx 1 asdf 197121   56 Nov 20  2022 /c/Users/asdf/bin/abwin -> /c/laragon/bin/apache/httpd-2.4.47-win64-VS16/bin/ab.exe*
-rwxr-xr-x 1 asdf 197121  37M Dec 16  2022 /c/Users/asdf/bin/BrowserStackLocal.exe*
lrwxrwxrwx 1 asdf 197121   66 Oct 28  2022 /c/Users/asdf/bin/cloc -> /c/Users/asdf/Desktop/dev/personal-projects/shell-scripts/clock.sh*
lrwxrwxrwx 1 asdf 197121   69 Nov  2  2022 /c/Users/asdf/bin/curltime -> /c/Users/asdf/Desktop/dev/personal-projects/shell-scripts/curltime.sh*
-rwxr-xr-x 1 asdf 197121 957K May  3 14:40 /c/Users/asdf/bin/cvdump.exe*

If you don't have files in your ~/bin directory, you can grab some files from the /bin directory. For example, replace du -sh ~/bin/* with du -sh /bin/* | head -n 5. The bin folder of your home directory (the ~ character means home directory) is typically included in the PATH variable, so you can run the programs in it from any folder.

Let’s assume that I want the following output: first, column 2 of file 1 (the path); second, column 5 of file 2 (the file size); and finally, column 1 of file 1 (the disk usage):

/c/Users/asdf/bin/abswin        27      0
/c/Users/asdf/bin/abwin 56      0
/c/Users/asdf/bin/BrowserStackLocal.exe 37M     37M
/c/Users/asdf/bin/cloc  66      0
/c/Users/asdf/bin/curltime      69      0
/c/Users/asdf/bin/cvdump.exe    957K    960K

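Use cut --fields LIST (short form: cut -f LIST) or awk '{print $5}' to print specific columns, and use the paste utility to combine those columns side by side. cut works for the du output because du separates its columns with a tab, which is the default delimiter of cut; the ls output is aligned with a variable number of spaces, so awk is the easier tool there. For example: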

$ paste \
  <(du -sh ~/bin/* | cut -f 2) \
  <(ls -lha ~/bin/* | awk '{print $5}') \
  <(du -sh ~/bin/* | cut -f 1)

/c/Users/asdf/bin/abswin        27      0
/c/Users/asdf/bin/abwin 56      0
/c/Users/asdf/bin/BrowserStackLocal.exe 37M     37M
/c/Users/asdf/bin/cloc  66      0
/c/Users/asdf/bin/curltime      69      0
/c/Users/asdf/bin/cvdump.exe    957K    960K

In the command above, I use process substitution, <(...), to hand the output of each pipeline to the paste utility as if it were a file. Think of process substitution as a temporary file that never gets saved anywhere. The du command reports the disk usage of each file, the cut command extracts specific fields, and the awk command prints the fifth field.
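To make the "temporary file" idea concrete, here is a minimal sketch that builds the same table twice: once with explicit temporary files and once with process substitution. The sample data and the names workdir, du.txt, and ls.txt are made up for the demo, and it assumes a shell that supports <(...), such as bash.

```shell
# Hypothetical demo data in a throwaway directory (not the real ~/bin).
workdir=$(mktemp -d)
printf '0\t/demo/a\n37M\t/demo/b\n' > "$workdir/du.txt"   # imitates du -sh output
printf '27\n37M\n' > "$workdir/ls.txt"                    # imitates column 5 of ls -lha

# Version 1: write each column to an explicit temporary file, then paste.
cut -f 2 "$workdir/du.txt" > "$workdir/names.txt"
cut -f 1 "$workdir/du.txt" > "$workdir/sizes.txt"
with_files=$(paste "$workdir/names.txt" "$workdir/ls.txt" "$workdir/sizes.txt")

# Version 2: same result, but <(...) supplies the columns without named files.
with_procsub=$(paste <(cut -f 2 "$workdir/du.txt") "$workdir/ls.txt" <(cut -f 1 "$workdir/du.txt"))

printf '%s\n' "$with_procsub"
rm -r "$workdir"
```

Both variables should hold the same tab-separated lines; the second version simply skips the cleanup of names.txt and sizes.txt.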

The command also uses pipes (|) to send the output of one command to the next, and it combines single-letter options: -sh is the same as -s -h. In du and ls, the -h option means human-readable; in cut, the -f option means field (or column). If you don’t understand a command, copy and paste it into the Explain shell web application.
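As a small illustration of how the field options behave, the sketch below runs cut and awk on two made-up lines: one tab-separated like du output and one space-aligned like ls -l output. The variable names and the /demo path are hypothetical.

```shell
# Hypothetical sample lines imitating du (tab-separated) and ls -l (space-aligned).
du_line=$(printf '960K\t/demo/cvdump.exe')
ls_line='-rwxr-xr-x 1 asdf 197121 957K May  3 14:40 /demo/cvdump.exe'

# cut splits on TAB by default, so field 2 is the path.
path=$(printf '%s\n' "$du_line" | cut -f 2)

# awk splits on runs of spaces or tabs, so $5 is the size despite the padding.
size=$(printf '%s\n' "$ls_line" | awk '{print $5}')

# cut -f on a line that contains no TAB passes the whole line through unchanged.
unsplit=$(printf '%s\n' "$ls_line" | cut -f 5)

printf '%s\t%s\n' "$path" "$size"
```

The last case is the reason the ls column is extracted with awk rather than cut: without a tab in the line, cut has nothing to split on.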
