Please stop writing shell scripts

When you’re automating some task, for example packaging your application for Docker, you’ll often find yourself writing shell scripts. You might have a bash script to drive the packaging process, and another script as an entry point for the container. As your packaging grows in complexity, so does your shell script.

Everything works fine.

And then, one day, your shell script does something completely wrong.

That’s when you realize your mistake: bash, and shell scripting languages in general, are mostly broken by default. Unless you are very careful from day one, any shell script above a certain complexity level is almost guaranteed to be buggy… and retrofitting the correctness features is quite difficult.

The problem with shell scripts

Let’s focus on bash as a specific example.

Problem #1: Errors don’t stop execution

Consider the following shell script:

#!/bin/bash
touch newfile
cp newfil newfile2  # Deliberate typo
echo "Success"

What do you think will happen when we run it?

$ bash bad1.sh 
cp: cannot stat 'newfil': No such file or directory
Success

The script kept on running even though a command failed! Compare this to Python, where an exception prevents later code from running.

You can solve this by adding set -e to the top of the shell script:

#!/bin/bash
set -e
touch newfile
cp newfil newfile2  # Deliberate typo
echo "Success"

And now:

$ bash bad1.sh 
cp: cannot stat 'newfil': No such file or directory

Problem #2: Unknown variables cause no errors

Next, let’s consider the following script, which tries to add a directory to the PATH environment variable. PATH lists the directories the shell searches for executables.

#!/bin/bash
set -e
export PATH="venv/bin:$PTH"  # Typo is deliberate
ls

When we run it:

$ bash bad2.sh 
bad2.sh: line 4: ls: command not found

It can’t find ls because we had a typo, writing $PTH instead of $PATH. Bash didn’t complain about the unset variable; it silently expanded $PTH to an empty string, so the new PATH no longer contains the system directories. In Python you’d get a NameError exception; in a compiled language the code wouldn’t even compile. In bash the script just keeps running; what could go wrong?

The solution is set -u:

#!/bin/bash
set -eu
export PATH="venv/bin:$PTH"  # Typo is deliberate
ls

And now bash catches the typo:

$ bash bad2.sh
bad2.sh: line 3: PTH: unbound variable
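
Once set -u is on, variables that are legitimately optional need an explicit default. Here is a minimal sketch using bash’s standard parameter expansions; the variable names EXTRA_BIN_DIR and DEPLOY_ENV are hypothetical, chosen only for illustration:

```shell
#!/bin/bash
set -eu

# Hypothetical variable names, for illustration only.
unset EXTRA_BIN_DIR DEPLOY_ENV

# ${VAR:-default} expands to the default when VAR is unset or empty,
# so an optional variable doesn't trip set -u.
extra="${EXTRA_BIN_DIR:-venv/bin}"
echo "extra=$extra"

# ${VAR:?message} aborts with the message when VAR is unset or empty.
# It runs in a subshell here so this demo script can keep going.
if ( : "${DEPLOY_ENV:?must be set}" ) 2>/dev/null; then
    echo "DEPLOY_ENV is set"
else
    echo "DEPLOY_ENV is missing"
fi
```

With both variables unset, this prints extra=venv/bin followed by DEPLOY_ENV is missing. The point is that every "this may be unset" decision now has to be written down, rather than happening silently.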

Problem #3: Pipes don’t catch errors

We thought we solved the failing command problem with set -e, but we didn’t solve all cases:

#!/bin/bash
set -eu
nonexistentprogram | echo
echo "Success!"

and when we run it:

$ bash bad3.sh 
bad3.sh: line 3: nonexistentprogram: command not found

Success! 

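By default, a pipeline’s exit status is that of its last command, so the failure of nonexistentprogram is hidden by echo succeeding. The solution is set -o pipefail: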

#!/bin/bash
set -euo pipefail
nonexistentprogram | echo
echo "Success!"

Now:

$ bash bad3.sh 
bad3.sh: line 3: nonexistentprogram: command not found
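
The effect is easiest to see on the exit status itself, since bash normally reports only the last command’s status for a pipeline. In this sketch, false and true stand in for a failing and a succeeding program, and pipeline_status is a hypothetical helper name:

```shell
#!/bin/bash

# Print the exit status of the pipeline `false | true` in a fresh bash,
# passing any extra options through (for example: -o pipefail).
pipeline_status() {
    bash "$@" -c 'false | true; echo $?'
}

echo "default:  $(pipeline_status)"
echo "pipefail: $(pipeline_status -o pipefail)"
```

This prints a status of 0 for the default case and 1 with pipefail. Only the second gives set -e something to act on.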

At this point we’ve implemented (most of) the unofficial bash strict mode. But that’s still not enough.
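
The remaining piece of the strict mode, as it is commonly described, is a narrower IFS, which stops unquoted expansions from being split on spaces. A minimal sketch; the files variable, its contents, and the count_words helper are made up for illustration:

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical data: two file names, the first containing a space.
files=$'my report.txt\nnotes.txt'

# Count how many words the unquoted expansion of $files splits into.
count_words() {
    # Deliberately unquoted so that word splitting applies.
    set -- $files
    echo "$#"
}

echo "default IFS: $(count_words)"

# Split on newlines and tabs only, not on spaces.
IFS=$'\n\t'
echo "strict IFS:  $(count_words)"
```

With the default IFS the two names split into 3 words; with the narrower IFS they split into 2, matching the actual number of files.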

Problem #4: Subshells are weird

Note: An earlier version of this article had incorrect information about subshells. Thanks to Loris Lucido for pointing out my error.

Using the $() syntax (command substitution), you can run a command in a subshell and capture its output:

#!/bin/bash
set -euo pipefail
export VAR=$(echo hello | nonexistentprogram)
echo "Success!"

When we run it:

$ bash bad4.sh 
bad4.sh: line 3: nonexistentprogram: command not found
Success!

What’s going on? An error in a subshell isn’t treated as an error when the subshell is part of a command’s arguments. Here the exit status bash checks is that of export, which succeeds, so the subshell’s error just gets thrown away.

The one exception is setting a variable directly, so we need to write our code like this:

#!/bin/bash
set -euo pipefail
VAR=$(echo hello | nonexistentprogram)
export VAR
echo "Success!"

And now our program operates correctly:

$ bash good4.sh 
good4.sh: line 3: nonexistentprogram: command not found
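
The same masking applies to other builtins that take an assignment as an argument, such as local inside a function. A sketch with hypothetical function names, using false as a stand-in for a failing program:

```shell
#!/bin/bash

# Hypothetical functions, for illustration only.
masked() {
    local status=0
    # local's own exit status (0) replaces the substitution's failure.
    local var=$(false) || status=$?
    echo "masked: $status"
}

unmasked() {
    local var status=0
    # Declared first, assigned separately: the assignment's status is
    # the status of the command substitution.
    var=$(false) || status=$?
    echo "unmasked: $status"
}

masked
unmasked
```

This prints masked: 0 and unmasked: 1, so the declare-then-assign pattern is needed for local just as it is for export.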

This is probably a sufficient demonstration of bash’s bad behavior, but it’s certainly not a complete demonstration.

Some bad reasons to use shell scripts

What are some reasons you might want to use shell scripts anyway?

Bad reason #1: It’s always there!

Pretty much every Unix-y computing environment will have a basic shell. So if you’re writing some packaging or startup scripts, it’s tempting to use a tool that you know will be there.

Thing is, if you’re packaging a Python application, you can pretty much guarantee that the development environment, CI, and the runtime environment will all have Python installed. So why not use a programming language that actually handles errors by default?

More broadly, pretty much every programming language with a decently-sized userbase will have some sort of scripting-oriented library or idioms. Rust, for example, has xshell, and other libraries as well. So in most cases you can use your programming language of choice instead of a shell script.

Bad reason #2: Just write correct code!

In theory, if you know what you’re doing, and you stay focused and don’t forget any of the boilerplate, you can write correct shell scripts, even quite complex ones. You can even write unit tests.

In practice:

  • You’re probably not working alone; it’s unlikely everyone on your team has the relevant expertise.
  • Everyone gets tired, gets distracted, and otherwise ends up making mistakes.
  • Almost every complex shell script I’ve seen was lacking the set -euo pipefail invocation, and adding it after the fact is quite difficult (usually impossible).
  • I’m not sure I’ve ever seen an automated test for a shell script. I’m sure they exist, but they’re quite rare.
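
For a sense of what such a test involves, here is a rough sketch in plain bash with no test framework. The script under test (copy.sh) and the expect helper are both hypothetical, written only for this illustration:

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical script under test: copy a file, then report success.
workdir=$(mktemp -d)
cat > "$workdir/copy.sh" <<'EOF'
#!/bin/bash
set -euo pipefail
cp "$1" "$2"
echo "Success"
EOF

# Run a command and report whether it passed or failed as expected.
# Usage: expect <pass|fail> <label> command...
expect() {
    local want=$1 label=$2 got=pass
    shift 2
    "$@" >/dev/null 2>&1 || got=fail
    if [ "$got" = "$want" ]; then
        echo "ok: $label"
    else
        echo "FAILED: $label"
    fi
}

touch "$workdir/newfile"
expect pass "copies an existing file" \
    bash "$workdir/copy.sh" "$workdir/newfile" "$workdir/newfile2"
expect fail "stops on a missing source file" \
    bash "$workdir/copy.sh" "$workdir/newfil" "$workdir/newfile3"

rm -rf "$workdir"
```

Even this small harness has to work around set -e (the || inside expect) to observe a failure without dying itself, which is part of why such tests are rare.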

Bad reason #3: Shellcheck will catch all these bugs!

If you are writing shell programs, shellcheck is a very useful way to catch bugs. Unfortunately, it’s not enough on its own.

Consider the following program:

#!/bin/bash
echo "$(nonexistentprogram | grep foo)"
export VAR="$(nonexistentprogram | grep bar)"
cp x /nosuchdirectory/
echo "$VAR $UNKNOWN_VAR"
echo "success!"

If we run this program it will print “success!”, even though it has 4 separate problems (at least):

$ bash bad6.sh 
bad6.sh: line 2: nonexistentprogram: command not found

bad6.sh: line 3: nonexistentprogram: command not found
cp: cannot stat 'x': No such file or directory
 
success!

How does shellcheck do? It will catch some of the problems… but not all:

  1. If you run shellcheck, it will point out the issue with the export.
  2. If you run shellcheck -o all, so it runs all checks, it will also point out the problem with echo "$(nonexistentprogram ...)". That is, assuming you are using v0.8 or later; v0.8 was released in November 2021. Older versions don’t have this check, so a Linux distribution release that predates it will give you a shellcheck that doesn’t catch that problem.
  3. It doesn’t suggest set -euo pipefail.

If you’re relying on shellcheck I strongly recommend upgrading and making sure you run with -o all.

Stop writing shell scripts

Shell scripts are fine in some situations:

  • For one-off scripts that you are manually supervising, you can get away with laxer practices.
  • Sometimes you really have no guarantees that another programming language is available, and you need to use the shell to get things going.
  • For sufficiently simple cases (a few commands run sequentially, with no subshells, conditional logic, or loops), set -euo pipefail is sufficient, as long as you also run shellcheck -o all.
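
Here is a sketch of one way conditional logic can undermine set -e: bash ignores errexit for commands run as part of a condition, including everything inside a function called from one. The build function is a hypothetical name, and false stands in for a failing step:

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical function, for illustration only.
build() {
    false                      # stand-in for a failing build step
    echo "build kept going"    # still runs when called from a condition
}

if build; then
    echo "caller saw success"
fi
```

This prints build kept going and then caller saw success, despite set -euo pipefail at the top. Calling build on a line of its own, outside any condition, would stop the script at the false.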

As soon as you find yourself doing anything beyond that, you’re much better off using a less error-prone programming language. And given most software tends to grow over time, your best bet is starting with something a little less broken.