8th May 2017

Sorting Files by Name

Including advanced sorts, and coping with spaces

Sorting files is normally a fairly straightforward task: "ls -lSr" sorts them by size (smallest to largest), "ls -ltr" sorts them by last-modified time (oldest to newest), and so on. For more advanced sorts, say into numerical order, you may want to pipe the file names through the dedicated "sort" utility. For example, here I have a set of files which I'd like to sort numerically. For an extra twist, the names contain spaces which, as we shall see, complicates things slightly.

$ ls
10. Blade Runner  2. Pulp Fiction
11. The Shining   3. The Wizard of Oz
12. Fight Club    4. 2001 - A Space Odyssey
13. Alien         5. Schindler's List
14. Toy Story     6. Star Wars
15. The Matrix    7. Se7en
16. Amelie        8. To Kill a Mockingbird
1. Casablanca     9. The Silence of the Lambs
$ 

These files are listed in alphabetical order by default; "10" comes before "11", which is good, but also before "1" and "2", which is not what we want. We will need to pass these names to "sort -n", which compares the leading number on each line as a number rather than character by character.

$ ls | sort -n
1. Casablanca
2. Pulp Fiction
3. The Wizard of Oz
4. 2001 - A Space Odyssey
5. Schindler's List
6. Star Wars
7. Se7en
8. To Kill a Mockingbird
9. The Silence of the Lambs
10. Blade Runner
11. The Shining
12. Fight Club
13. Alien
14. Toy Story
15. The Matrix
16. Amelie
$ 
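To see the two orderings side by side without needing the files themselves, here is a minimal sketch. The three names are a sample taken from the listing above, and LC_ALL=C is my own addition to make the plain sort predictable; your system locale may order punctuation differently.

```shell
# Three sample names, deliberately out of order.
names='10. Blade Runner
2. Pulp Fiction
1. Casablanca'

# Plain sort compares character by character, so "10" lands before "2".
plain=$(printf '%s\n' "$names" | LC_ALL=C sort)

# sort -n reads the leading digits as a number, so 2 lands before 10.
numeric=$(printf '%s\n' "$names" | LC_ALL=C sort -n)

printf 'plain sort:\n%s\n\nsort -n:\n%s\n' "$plain" "$numeric"
```

Only the leading number is compared numerically; "sort -n" stops reading the number at the first character that cannot be part of one.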

As an aside, you may notice that "ls" by itself formatted the output into columns; this is because it knew it was writing to a terminal, so it prettified the output a little. "ls -1" (that's the digit one, not the letter l) tells "ls" to always list one file per line. When its output goes to a pipe, a file, or anything else that is not a terminal, "ls" defaults to writing one file per line anyway.
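A script can make the same kind of decision with the shell's "-t" test. This is only a sketch of the idea, not a description of how "ls" is implemented, and describe_stdout is a made-up function name:

```shell
# [ -t 1 ] succeeds when file descriptor 1 (standard output) is a terminal.
describe_stdout() {
  if [ -t 1 ]; then
    echo "terminal"
  else
    echo "not a terminal"
  fi
}

describe_stdout          # result depends on where the script's output goes
describe_stdout | cat    # standard output is a pipe here: "not a terminal"
```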

Now that we have sorted the files, we can iterate through them in a loop. Here is one well-intentioned attempt:

$ for filename in `ls | sort -n`
> do
>   echo "Found file: ${filename}"
> done

Unfortunately, that does not cope with the spaces in these filenames. The shell splits the output of the backticked command on whitespace, so each word is passed to the for loop as a separate item:

Found file: 1.
Found file: Casablanca
Found file: 2.
Found file: Pulp
Found file: Fiction
Found file: 3.
Found file: The
Found file: Wizard
. . .
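The splitting can be reproduced without any files at all. In this sketch the variable names (list, item, words) are my own, and the two names are a sample from the listing; it assumes a Bourne-style shell with the default IFS of space, tab and newline.

```shell
# Two names, one per line, as "ls | sort -n" would emit them.
list='1. Casablanca
2. Pulp Fiction'

# An unquoted expansion is split on every space and newline,
# so the loop runs once per word rather than once per name.
words=0
for item in $list
do
  words=$((words + 1))
  echo "item ${words}: ${item}"
done

echo "total items: ${words}"
```

Two names go in, but the loop body runs five times.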

Probably the tidiest way around this problem is to pipe the output into a while loop; "while read filename" reads its input line by line into the variable named filename. So rather than grabbing the output of "ls | sort -n" into a for loop, we pipe it into a while loop, like this:

$ ls | sort -n | while read filename
> do
>   echo "Found file: ${filename}"
> done

This produces the desired result: spaces are preserved, and the files are listed in the order that you would expect:

Found file: 1. Casablanca
Found file: 2. Pulp Fiction
Found file: 3. The Wizard of Oz
Found file: 4. 2001 - A Space Odyssey
Found file: 5. Schindler's List
. . .
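For names like these, a plain "read" is enough. If your filenames might begin with spaces or contain backslashes, a commonly used hardening is "IFS= read -r". The sketch below compares the two on a made-up name; the variable names are my own.

```shell
# A made-up name with two leading spaces and a backslash.
name='  odd\name'

# Plain read trims leading whitespace and treats the backslash as an escape.
plain=$(printf '%s\n' "$name" | { read value; printf '%s' "$value"; })

# IFS= keeps the leading whitespace; -r keeps the backslash.
exact=$(printf '%s\n' "$name" | { IFS= read -r value; printf '%s' "$value"; })

printf 'read:         [%s]\n' "$plain"
printf 'IFS= read -r: [%s]\n' "$exact"
```

Neither form copes with a newline inside a filename, since the loop is still line-based.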

These files can now be processed however you wish. This technique could be useful for a Flyway-style database migration tool, for example, where you name all of your SQL files to indicate the order in which they should be run.
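As a rough sketch of that idea, the script below walks a directory of numbered SQL files in numerical order. Everything specific here is hypothetical: the file names, the temporary directory, and apply_migration, which merely prints what a real database client would be asked to run.

```shell
# Hypothetical migration directory, created just for this sketch.
workdir=$(mktemp -d)
touch "$workdir/1. create tables.sql" \
      "$workdir/2. add users.sql" \
      "$workdir/10. add index.sql"

# Stand-in for a real database client: it only reports the file name.
apply_migration() {
  echo "Applying: $1"
}

# List the names, sort them numerically, and handle one whole line at a time.
# IFS= and -r stop read from trimming spaces or interpreting backslashes.
applied=$(ls "$workdir" | sort -n | while IFS= read -r filename
do
  apply_migration "$filename"
done)

echo "$applied"
rm -rf "$workdir"
```

The output is captured with $( ) because the while loop runs at the end of a pipeline, where many shells execute it in a subshell; variables set inside the loop would not be visible afterwards.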

This is one of those problems where the answer seems obvious once you know it, but for those less familiar with the shell and how it splits strings, working it out by trial and error can be frustrating and time-consuming. Hopefully this short page has helped you to understand what is going on, and why the for loop didn't work where the while loop did.

Steve's Bourne / Bash shell scripting tips