Please stop writing shell scripts
When you’re automating some task, for example packaging your application for Docker, you’ll often find yourself writing shell scripts.
You might have a bash script to drive the packaging process, and another script as an entry point for the container.
As your packaging grows in complexity, so does your shell script.
Everything works fine.
And then, one day, your shell script does something completely wrong.
That’s when you realize your mistake: bash, and shell scripting languages in general, are mostly broken by default.
Unless you are very careful from day one, any shell script above a certain complexity level is almost guaranteed to be buggy… and retrofitting the correctness features is quite difficult.
The problem with shell scripts
Let’s focus on bash as a specific example.
Problem #1: Errors don’t stop execution
Consider the following shell script:
#!/bin/bash
touch newfile
cp newfil newfile2 # Deliberate typo
echo "Success"
What do you think will happen when we run it?
$ bash bad1.sh
cp: cannot stat 'newfil': No such file or directory
Success
The script kept on running even though a command failed! Compare this to Python, where an exception prevents later code from running.
You can solve this by adding set -e to the top of the shell script:
#!/bin/bash
set -e
touch newfile
cp newfil newfile2 # Deliberate typo, don't omit!
echo "Success"
And now:
$ bash bad1.sh
cp: cannot stat 'newfil': No such file or directory
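For comparison, here is a minimal Python sketch of the same mistake. The function name package and the temporary directory are illustrative assumptions, not part of the original script; the point is only that the failed copy raises an exception, so the later code never runs.

```python
import shutil
import tempfile
from pathlib import Path


def package(workdir: Path) -> str:
    """Hypothetical mirror of bad1.sh: create a file, then copy it."""
    (workdir / "newfile").touch()
    # Deliberate typo: "newfil" does not exist, so shutil.copy raises
    # FileNotFoundError and the return statement below is never reached.
    shutil.copy(workdir / "newfil", workdir / "newfile2")
    return "Success"


with tempfile.TemporaryDirectory() as tmp:
    try:
        print(package(Path(tmp)))
    except FileNotFoundError as error:
        print(f"Stopped early: {error}")
```

No opt-in flag is needed here: stopping on failure is the default, and carrying on requires an explicit try/except.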
Problem #2: Unknown variables cause no errors
Next let’s consider the following script, which tries to add a directory to the PATH environment variable.
PATH is how the location of executables is found.
#!/bin/bash
set -e
export PATH="venv/bin:$PTH" # Typo is deliberate
ls
When we run it:
$ bash bad2.sh
bad2.sh: line 4: ls: command not found
It can’t find ls because we had a typo, writing $PTH instead of $PATH, and bash didn’t complain about an unknown environment variable.
In Python you’d get a NameError exception; in a compiled language the code wouldn’t even compile.
In bash the script just keeps running; what could go wrong?
The solution is set -u:
#!/bin/bash
set -eu
export PATH="venv/bin:$PTH" # Typo is deliberate
ls
And now bash catches the typo:
$ bash bad2.sh
bad2.sh: line 3: PTH: unbound variable
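To make the Python comparison concrete, here is a small sketch. The names extend_path and require_env are hypothetical helpers invented for illustration. A misspelled Python variable raises NameError when the line runs, and looking up an unset environment variable with os.environ[...] raises KeyError rather than quietly yielding an empty string.

```python
import os


def extend_path() -> str:
    """Hypothetical mirror of bad2.sh: build a PATH with venv/bin in front."""
    path = os.environ.get("PATH", "")
    # Deliberate typo: "pth" was never defined, so this line raises NameError
    # instead of silently substituting an empty string.
    return "venv/bin:" + pth


def require_env(name: str) -> str:
    """Look up an environment variable, raising KeyError if it is unset."""
    return os.environ[name]


try:
    extend_path()
except NameError as error:
    print(f"Caught the typo: {error}")
```

Note that the NameError only appears when the faulty line actually executes, so a linter or type checker is still worth running.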
Problem #3: Pipes don’t catch errors
We thought we solved the failing command problem with set -e, but we didn’t solve all cases:
#!/bin/bash
set -eu
nonexistentprogram | echo
echo "Success!"
and when we run it:
$ bash bad3.sh
bad3.sh: line 3: nonexistentprogram: command not found
Success!
The solution is set -o pipefail:
#!/bin/bash
set -euo pipefail
nonexistentprogram | echo
echo "Success!"
Now:
$ bash bad3.sh
bad3.sh: line 3: nonexistentprogram: command not found
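As a point of comparison, a pipeline can be sketched in Python with the standard subprocess module. The pipeline helper below is a hypothetical name, and it is a simplification: it buffers the first command’s entire output in memory rather than streaming it, which is an assumption that only suits small outputs. What it does show is that a failure at either stage raises an exception.

```python
import subprocess
import sys


def pipeline(first: list[str], second: list[str]) -> str:
    """Hypothetical helper: run `first | second`, raising if either fails."""
    # check=True raises CalledProcessError on a non-zero exit status; a
    # missing executable raises FileNotFoundError before anything runs.
    produced = subprocess.run(first, check=True, capture_output=True, text=True)
    consumed = subprocess.run(
        second, input=produced.stdout, check=True, capture_output=True, text=True
    )
    return consumed.stdout


try:
    pipeline(["nonexistentprogram"], [sys.executable, "-c", "print('unreachable')"])
except FileNotFoundError as error:
    print(f"Pipeline stopped: {error}")
```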
At this point we’ve implemented (most of) the unofficial bash strict mode.
But that’s still not enough.
Problem #4: Subshells are weird
Note: An earlier version of this article had incorrect information about subshells. Thanks to Loris Lucido for pointing out my error.
Using the $() syntax, you can launch a subshell:
#!/bin/bash
set -euo pipefail
export VAR=$(echo hello | nonexistentprogram)
echo "Success!"
When we run it:
$ bash bad4.sh
bad4.sh: line 3: nonexistentprogram: command not found
Success!
What’s going on? A subshell’s failure isn’t treated as an error when the subshell is part of a command’s arguments. Here the exit status bash checks is that of export, which succeeds, so the subshell’s error just gets thrown away.
The one exception is setting a variable directly, so we need to write our code like this:
#!/bin/bash
set -euo pipefail
VAR=$(echo hello | nonexistentprogram)
export VAR
echo "Success!"
And now our program operates correctly:
$ bash good4.sh
good4.sh: line 3: nonexistentprogram: command not found
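The equivalent step in Python might look like the sketch below. The helper names capture and export_var are assumptions made for illustration. Because the command’s failure raises before the assignment happens, there is no way to end up with a silently empty variable.

```python
import os
import subprocess
import sys


def capture(command: list[str]) -> str:
    """Hypothetical helper: return a command's stdout, raising on failure."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def export_var(name: str, command: list[str]) -> None:
    # capture() raises before the assignment runs, so a failed command can
    # never leave an empty or half-set variable behind.
    os.environ[name] = capture(command)


try:
    export_var("VAR", ["nonexistentprogram"])
except FileNotFoundError as error:
    print(f"VAR was not set: {error}")
```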
This is probably a sufficient demonstration of bash’s bad behavior, but it’s certainly not a complete demonstration.
Some bad reasons to use shell scripts
What are some reasons you might want to use shell scripts anyway?
Bad reason #1: It’s always there!
Pretty much every Unix-y computing environment will have a basic shell. So if you’re writing some packaging or startup scripts, it’s tempting to use a tool that you know will be there.
Thing is, if you’re packaging a Python application, you can pretty much guarantee that the development environment, CI, and the runtime environment will all have Python installed. So why not use a programming language that actually handles errors by default?
More broadly, pretty much every programming language with a decently-sized userbase will have some sort of scripting-oriented library or idioms.
Rust, for example, has xshell, and other libraries as well.
So in most cases you can use your programming language of choice instead of a shell script.
Bad reason #2: Just write correct code!
In theory, if you know what you’re doing, and you stay focused and don’t forget any of the boilerplate, you can write correct shell scripts, even quite complex ones. You can even write unit tests.
In practice:
- You’re probably not working alone; it’s unlikely everyone on your team has the relevant expertise.
- Everyone gets tired, gets distracted, and otherwise ends up making mistakes.
- Almost every complex shell script I’ve seen was lacking the set -euo pipefail invocation, and adding it after the fact is quite difficult (usually impossible).
- I’m not sure I’ve ever seen an automated test for a shell script. I’m sure they exist, but they’re quite rare.
Bad reason #3: Shellcheck will catch all these bugs!
If you are writing shell programs, shellcheck is a very useful way to catch bugs.
Unfortunately, it’s not enough on its own.
Consider the following program:
#!/bin/bash
echo "$(nonexistentprogram | grep foo)"
export VAR="$(nonexistentprogram | grep bar)"
cp x /nosuchdirectory/
echo "$VAR $UNKNOWN_VAR"
echo "success!"
If we run this program it will print “success!”, even though it has 4 separate problems (at least):
$ bash bad6.sh
bad6.sh: line 2: nonexistentprogram: command not found
bad6.sh: line 3: nonexistentprogram: command not found
cp: cannot stat 'x': No such file or directory
success!
How does shellcheck do? It will catch some of the problems… but not all:
- If you run shellcheck, it will point out the issue with the export.
- If you run shellcheck -o all, so it runs all checks, it will also point out the problem with echo "$(nonexistentprogram ...)". That is, assuming you are using v0.8, which was released in November 2021. Older versions didn’t have this check, so any Linux distribution predating that will give you a shellcheck that doesn’t catch that problem.
- It doesn’t suggest set -euo pipefail.
If you’re relying on shellcheck I strongly recommend upgrading and making sure you run with -o all.
Stop writing shell scripts
Shell scripts are fine in some situations:
- For one-off scripts that you are manually supervising, you can get away with laxer practices.
- Sometimes you really have no guarantees that another programming language is available, and you need to use the shell to get things going.
- For sufficiently simple cases, just running a few commands sequentially, with no subshells, conditional logic, or loops, set -euo pipefail is sufficient (and make sure you use shellcheck -o all).
As soon as you find yourself doing anything beyond that, you’re much better off using a less error-prone programming language. And given most software tends to grow over time, your best bet is starting with something a little less broken.