
3 Jun 2015

Sorting on Fields

The "sort" utility seems pretty obvious, but it can catch you out in odd little ways. Here's a simple example. Given the machine sizings below, how can a script display them in a sensible order?

Size      CPU   Memory (MB)
Tiny        1     2048
Small       1     4096
Medium      2     4096
Big         2     8192
Large       4     8192
XL          4    16384
XXL         8    32768

Start with a template file which declares two associative arrays (these need bash 4 or later) - call it size.tmpl. This can be read in by the main script via "source ./size.tmpl" (or just ". ./size.tmpl"):

Download size.tmpl
declare -A CPU RAM

# CPU Count
CPU['Tiny']=1
CPU['Small']=1
CPU['Medium']=2
CPU['Big']=2
CPU['Large']=4
CPU['XL']=4
CPU['XXL']=8

# RAM in MB
RAM['Tiny']=2048     # 2GB
RAM['Small']=4096    # 4GB
RAM['Medium']=4096   # 4GB
RAM['Big']=8192      # 8GB
RAM['Large']=8192    # 8GB
RAM['XL']=16384      # 16GB
RAM['XXL']=32768     # 32GB

Then create a script, sort.sh, which reads those arrays and displays their contents:

#!/bin/bash
. ./size.tmpl

printf "%-10s%4s%8s\n" Size CPU RAM   # show header

for SIZE in "${!CPU[@]}"
do
  printf "%-10s%4d%8d\n" "${SIZE}" "${CPU[$SIZE]}" "${RAM[$SIZE]}"
done

Unfortunately, bash makes no promise about the order in which it returns the keys of an associative array, so the rows come out jumbled:

$ ./sort.sh
Size       CPU     RAM
XL           4   16384
Medium       2    4096
Tiny         1    2048
Small        1    4096
Large        4    8192
Big          2    8192
XXL          8   32768

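The answer is to use sort. The -n switch tells it to compare numerically (so that "9" comes before "10", for example), and -k gives it keys to sort on. By default, fields are separated by whitespace, which is what we have here, so we just need "sort -n -k 3 -k 2". This tells it to sort on column 3 (RAM) first, and to use column 2 (CPU) to break ties. Strictly speaking, "-k 3" means "from field 3 to the end of the line" and "-k 2" means "from field 2 to the end of the line", but with -n only the leading number of each key is compared, so that works here. Only the loop's output is piped through sort, so the header stays on top: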

Download sort.sh
#!/bin/bash
. ./size.tmpl

printf "%-10s%4s%8s\n" Size CPU RAM   # show header

for SIZE in "${!CPU[@]}"
do
  printf "%-10s%4d%8d\n" \
    "${SIZE}" "${CPU[$SIZE]}" "${RAM[$SIZE]}"
done  | sort -n -k 3 -k 2

This now gives sensibly ordered output:

$ ./sort.sh
Size       CPU     RAM
Tiny         1    2048
Small        1    4096
Medium       2    4096
Big          2    8192
Large        4    8192
XL           4   16384
XXL          8   32768

And so we have a nicely formatted display, sorted by RAM and then by CPU.
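If you would rather not rely on -n ignoring the rest of the line, each key can be limited to a single field. The following is only a sketch: sort_sizes is a made-up helper name, and the sample rows are a subset of the table above. "-k3,3n" means "field 3 only, compared numerically", and "-k2,2n" does the same for field 2 as the tie-breaker.

```shell
#!/bin/bash
# sort_sizes: hypothetical helper, not part of the original scripts.
# Reads whitespace-separated "Size CPU RAM" rows on stdin.
sort_sizes() {
  # -k3,3n : primary key is field 3 only (RAM), compared numerically
  # -k2,2n : ties are broken on field 2 only (CPU), compared numerically
  sort -k3,3n -k2,2n
}

# printf reuses its format string for each group of three arguments
printf '%-10s%4d%8d\n' XL 4 16384 Big 2 8192 Large 4 8192 Tiny 1 2048 | sort_sizes
```

For this data the result should match "sort -n -k 3 -k 2"; the bounded form just states the intent more exactly.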

Bonus Points

For bonus points, we can tell sort more about the input format. If it were CSV, for example, we can use "sort -t," to tell it that the comma separates the fields:

Download sort-csv.sh
#!/bin/bash
. ./size.tmpl

echo "Size,CPU,RAM"

for SIZE in "${!CPU[@]}"
do
  printf "%s,%d,%d\n" \
    "${SIZE}" "${CPU[$SIZE]}" "${RAM[$SIZE]}"
done  | sort -t, -n -k 3 -k 2

This produces sorted CSV, which you can redirect to a file:

$ ./sort-csv.sh
Size,CPU,RAM
Tiny,1,2048
Small,1,4096
Medium,2,4096
Big,2,8192
Large,4,8192
XL,4,16384
XXL,8,32768
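The scripts above keep the header out of the sort by printing it outside the pipeline. If the CSV already exists with its header as the first line, one possible approach is to read the header off first and sort only what is left. This is a sketch under that assumption; sort_csv_body is a made-up name, and it expects exactly one header line.

```shell
#!/bin/bash
# sort_csv_body: hypothetical helper, not part of the original scripts.
# Reads CSV on stdin, assuming the first line is a header.
sort_csv_body() {
  local header
  IFS= read -r header || return 1   # consume the header line only
  printf '%s\n' "$header"           # pass it through unsorted
  sort -t, -k3,3n -k2,2n            # sort the remaining rows: RAM, then CPU
}

printf 'Size,CPU,RAM\nXL,4,16384\nSmall,1,4096\nTiny,1,2048\n' | sort_csv_body
```

Usage with a file would look like "sort_csv_body < sizes.csv > sorted.csv", where both file names are placeholders.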

Steve's Bourne / Bash shell scripting tips