UNIX Shell/Command-Line Primer

I’ll start with the unintuitive things that took me the longest to learn. It can be hard to figure out which commands are standalone programs and which are built into bash… sometimes there is both a program version and a shell version! Eg, test. It’s very UNIX-y to have even the most basic things be programs, like ls. Some things must be built-in, like fg/bg (foreground and background jobs), for, etc.

You get help with programs using --help or man. You get help with bash built-ins using help (eg, help for). Built-ins take precedence over programs with the same name. You can check how bash will resolve a name with type (eg, type echo); whereis only looks for program files on disk. help on its own gives a listing of built-in commands. Some examples of commands that exist both as built-ins and as programs:

echo
true/false
[      (sic!)
kill
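
To see the built-in vs program split for yourself, here is a minimal sketch. It assumes a POSIX-ish shell (bash, dash, zsh) and that an echo program is on your $PATH; the variable names are just for illustration. In bash, type -a echo additionally lists every match, built-in first.

```shell
# Ask the shell how it resolves each name. `command -v` is POSIX:
# it prints just the name for a built-in and a full path for a program.
for name in echo true '[' kill ls; do
    printf '%-5s -> %s\n' "$name" "$(command -v "$name")"
done

# Plain `echo` runs the shell's built-in...
builtin_says=$(echo "hi")

# ...but `env` only knows about programs on $PATH, so this runs the program version.
program_says=$(env echo "hi")

echo "built-in: $builtin_says, program: $program_says"
```

Both versions print the same thing here, which is why the difference usually goes unnoticed until their options or edge cases diverge.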

There are basically three categories of UNIX shell:

  • simple standards-compliant (POSIX) shell: /bin/sh
  • bash and (mostly) bash-compatible: /usr/bin/bash, zsh
  • non-bash compatible: ksh, csh, fish, xonsh, etc

For things to run on any UNIX machine (including macOS), stick to POSIX shell… but in reality bash is almost always installed. It isn’t the default everywhere, though: macOS ships an old bash (3.2) but has used zsh as its default shell since Catalina, and FreeBSD doesn’t install bash at all by default. The bash/zsh/csh thing is like emacs/vi. You can see which you are running with echo $0.
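
One catch: inside a script, $0 is the script's name rather than the shell's. A rough sketch of another way to tell, assuming you only care about bash vs zsh vs "something else" (the shell_kind variable is made up for this example; BASH_VERSION and ZSH_VERSION are set by those shells themselves):

```shell
# bash sets BASH_VERSION and zsh sets ZSH_VERSION; plain POSIX shells set neither.
# Note that bash still sets BASH_VERSION when it is invoked as "sh".
if [ -n "${BASH_VERSION:-}" ]; then
    shell_kind=bash
elif [ -n "${ZSH_VERSION:-}" ]; then
    shell_kind=zsh
else
    shell_kind=other
fi

echo "running under: $shell_kind"
```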

To make a script on UNIX, you make the file executable (chmod +x thefile.sh), and add a “shebang” line (the #! is called “shebang”):

#!/usr/bin/bash

You can put any program on the shebang line. The kernel runs that program with the path to your file as an argument, so the program reads and interprets the file itself:

#!/usr/bin/python3

or

#!/usr/bin/parallel

For portability, and to avoid issues like “python3 is installed under /opt, not /usr/bin”, you can do this:

#!/usr/bin/env python3

The “env” will look up the program by name on the $PATH.
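
Putting the pieces together, here is a small sketch that writes a script, makes it executable, and runs it. The hello.sh name and the temp directory are placeholders, and it assumes your temp directory allows executing files.

```shell
# Create a throwaway directory so nothing is left lying around.
workdir=$(mktemp -d)
script="$workdir/hello.sh"

# Write a two-line script. The quoted 'EOF' stops $0 and $# from being
# expanded now; they are expanded later, when the script itself runs.
cat > "$script" <<'EOF'
#!/bin/sh
echo "hello from $0 with $# argument(s)"
EOF

# Without the executable bit, running "$script" fails with "Permission denied".
chmod +x "$script"

# The kernel reads the #! line and effectively runs: /bin/sh /path/to/hello.sh one two
output=$("$script" one two)
echo "$output"

rm -r "$workdir"
```

The script sees its own path in $0, which is the shebang mechanism at work: the interpreter is handed the file's path, not its contents.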

To do a for loop in one line (globbing with * is safer than looping over ls output, which breaks on filenames with spaces):

for f in /somedir/*; do echo "$f"; done

In a script, for better legibility:

for f in /somedir/*; do
    echo "$f"
done

If statements:

if [ -e /some/file ]; then echo 'file exists'; fi

if [ -e /some/file ]; then
    echo 'file exists'
fi

You’ll see those brackets (single [ or double [[), and it took me forever to learn about them… help [ and help if are not helpful, you need help test.
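
What finally made the brackets click for me is that [ is just another spelling of test, and if only looks at the exit status. A minimal sketch (the variable names are made up; it sticks to single [ because [[ is a bash/zsh/ksh keyword and not part of POSIX sh):

```shell
tmpfile=$(mktemp)    # an empty file that definitely exists

# `[` is a command like any other: exit status 0 means true.
[ -e "$tmpfile" ]; exists_status=$?
test -e "$tmpfile"; test_status=$?    # same check, without the brackets

# -s is true only if the file exists AND is not empty.
if [ -s "$tmpfile" ]; then nonempty=yes; else nonempty=no; fi

# Strings compare with =, integers with -eq / -lt / -gt and friends.
if [ "abc" = "abc" ] && [ 2 -lt 10 ]; then
    comparison="both true"
fi

rm -f "$tmpfile"
echo "$exists_status $test_status $nonempty $comparison"
```

Note the spaces around [ and ]: they are separate arguments to a command, so [-e file] does not work.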

Specific Commands

There are a bajillion command line commands. Here are some helpful categories…

For doing csv/tsv/column data munging:

cut - select columns
grep/rg - select rows (by pattern), use '-v' to invert
join - what it says
sort - note the -n flag for numerical. lexical/case etc are a mess
uniq - collapse adjacent duplicate lines (so sort first), use '-c' to count
sed - search/replace using regex
rg - more complex search/replace using the -o and -r options
awk - simple string substitutions, re-order columns with 'print'
tr - character-level search/replace or filtering (eg, lowercase)
xsv - a whole lot!

Here’s an example of the above; I run things like this all day:

cut -d' ' -f1 ~/.bash_history | sort | uniq -c | sort -n
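
To make that pipeline easier to follow, here is the same idea run on a tiny made-up history, plus one-liners for tr, awk and sed. All of the sample data and variable names are invented for this sketch.

```shell
# A fake six-line history instead of the real ~/.bash_history.
history_sample='ls -l
cd /tmp
ls
git status
ls -a
git log'

# First word of each line -> group identical words -> count -> smallest count first.
counts=$(printf '%s\n' "$history_sample" | cut -d' ' -f1 | sort | uniq -c | sort -n)
echo "$counts"

# The most-used command is on the last line; awk picks the second column.
most_used=$(printf '%s\n' "$counts" | tail -n 1 | awk '{print $2}')
echo "most used: $most_used"

# A few of the other tools on one CSV row.
row='Alice,30,NYC'
upper=$(printf '%s\n' "$row" | tr '[:lower:]' '[:upper:]')      # character-level mapping
swapped=$(printf '%s\n' "$row" | awk -F, '{print $3 "," $1}')   # pick and re-order columns
replaced=$(printf '%s\n' "$row" | sed 's/NYC/New York/')        # regex search/replace
printf '%s\n' "$upper" "$swapped" "$replaced"
```

The sort before uniq matters: uniq only merges lines that are next to each other.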

For doing batch operations:

xargs - superseded by parallel
parallel - like 'for' but in parallel. can also work on piped stdin
    I basically never use 'for' loops after learning this command
find - helpful for finding files, but can also run a command for each
    file found
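
A small sketch of the batch style, using only find and xargs since parallel is often not installed. The file names are placeholders, and the xargs line assumes paths without spaces.

```shell
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/c.log"

# find can run a command once per match; {} is replaced by the file's path.
txt_names=$(find "$workdir" -name '*.txt' -exec basename {} \; | sort)

# xargs turns lines on stdin into arguments; -n 1 means one argument per run.
# Plain xargs splits on whitespace, hence the no-spaces assumption above.
txt_count=$(find "$workdir" -name '*.txt' | xargs -n 1 basename | wc -l | tr -d ' ')

echo "$txt_names"
echo "$txt_count .txt files"

rm -r "$workdir"
```

With GNU parallel installed, the xargs stage could be swapped for parallel basename, which runs the jobs concurrently instead of one after another.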

Other every-day tools:

jq - the JSON swiss army knife. A whole scripting language
http (httpie) - better than curl/wget for debugging (though maybe not
    for big downloads)
xsv - can do conversions between csv types
rg/ripgrep - faster + better than grep
pv - show progress
screen/tmux - a whole world of its own