Please stop writing shell scripts
When you’re automating some task, for example packaging your application for Docker, you’ll often find yourself writing shell scripts.
You might have a bash script to drive the packaging process, and another script as an entry point for the container.
As your packaging grows in complexity, so does your shell script.
Everything works fine.
And then, one day, your shell script does something completely wrong.
That’s when you realize your mistake:
bash, and shell scripting languages in general, are mostly broken by default.
Unless you are very careful from day one, any shell script above a certain complexity level is almost guaranteed to be buggy… and retrofitting the correctness features is quite difficult.
The problem with shell scripts
Let’s focus on bash as a specific example.
Problem #1: Errors don’t stop execution
Consider the following shell script:
#!/bin/bash
touch newfile
cp newfil newfile2  # Deliberate typo
echo "Success"
What do you think will happen when we run it?
$ bash bad1.sh
cp: cannot stat 'newfil': No such file or directory
Success
The script kept on running even though a command failed! Compare this to Python, where an exception prevents later code from running.
You can solve this by adding set -e to the top of the shell script:
#!/bin/bash
set -e
touch newfile
cp newfil newfile2  # Deliberate typo, don't omit!
echo "Success"
$ bash bad1.sh
cp: cannot stat 'newfil': No such file or directory
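Whoever runs the script, whether a CI job or a Dockerfile RUN step, cares about its exit status. The following minimal sketch runs the same failing cp with and without set -e in child bash processes and records each exit status. The file paths are made up for the demo:

```shell
#!/bin/bash
# Run a script body in a child bash process and print its exit status.
run() {
    local status=0
    bash -c "$1" >/dev/null 2>&1 || status=$?
    echo "$status"
}

# Same failing cp, once without and once with set -e.
without=$(run 'cp /nonexistent/newfil newfile2; echo "Success"')
with=$(run 'set -e; cp /nonexistent/newfil newfile2; echo "Success"')

echo "without set -e: exit status $without"
echo "with set -e:    exit status $with"
```

Without set -e, the child exits with 0 because its last command, the echo, succeeded. With set -e, it exits with cp's non-zero status, so the caller can see that something went wrong.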
Problem #2: Unknown variables cause no errors
Next let’s consider the following script, which tries to add a directory to the PATH environment variable. PATH is the list of directories the shell searches to find executables.
#!/bin/bash
set -e
export PATH="venv/bin:$PTH"  # Typo is deliberate
ls
When we run it:
$ bash bad2.sh
bad2.sh: line 4: ls: command not found
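It can’t find ls because we had a typo, writing $PTH instead of $PATH. The unknown variable expands to an empty string, so PATH no longer includes the usual system directories such as /bin and /usr/bin. And bash didn’t complain about an unknown environment variable. In Python you’d get a NameError exception; in a compiled language the code wouldn’t even compile. In bash the script just keeps running; what could go wrong?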
The solution is set -u:
#!/bin/bash
set -eu
export PATH="venv/bin:$PTH"  # Typo is deliberate
ls
And now bash catches the typo:
$ bash bad2.sh
bad2.sh: line 3: PTH: unbound variable
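A side effect of set -u is that a variable you genuinely intend to be optional now aborts the script when it is unset. Bash's ${VAR:-default} parameter expansion is the usual way to opt out for that one variable, and typos elsewhere are still caught. The following is a minimal sketch. EXTRA_PATH and LOG_LEVEL are hypothetical variable names:

```shell
#!/bin/bash
set -eu

# EXTRA_PATH and LOG_LEVEL are hypothetical optional settings; unset them
# so the demo behaves the same regardless of the calling environment.
unset EXTRA_PATH LOG_LEVEL

# ${VAR:-} expands to an empty string when VAR is unset, instead of
# triggering the "unbound variable" error from set -u.
extra="${EXTRA_PATH:-}"

# ${VAR:-default} substitutes a default when VAR is unset or empty.
log_level="${LOG_LEVEL:-info}"

# A plain reference to a misspelled name is still caught. Run it in a child
# bash so this demo keeps going, and record the child's exit status.
typo_status=0
bash -c 'set -u; unset PTH; echo "$PTH"' 2>/dev/null || typo_status=$?

echo "extra='$extra' log_level='$log_level' typo_status=$typo_status"
```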
Problem #3: Pipes don’t catch errors
We thought we solved the failing command problem with set -e, but we didn’t solve all cases:
#!/bin/bash
set -eu
nonexistentprogram | echo
echo "Success!"
and when we run it:
$ bash bad3.sh
bad3.sh: line 3: nonexistentprogram: command not found
Success!
The solution is set -o pipefail:
#!/bin/bash
set -euo pipefail
nonexistentprogram | echo
echo "Success!"
$ bash bad3.sh
bad3.sh: line 3: nonexistentprogram: command not found
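The underlying rule is that, by default, a pipeline's exit status is just the exit status of its last command. The following minimal sketch makes this visible, using false and true as stand-ins for a failing and a succeeding command. Bash's PIPESTATUS array records the status of every stage:

```shell
#!/bin/bash
# By default a pipeline reports only the exit status of its LAST command,
# so the failing `false` is hidden behind the successful `true`.
false | true
default_status=$?

# PIPESTATUS holds one exit status per stage of the most recent pipeline;
# copy it immediately, because the next command overwrites it.
false | true
stages=("${PIPESTATUS[@]}")

# With pipefail, the pipeline fails if any stage fails.
set -o pipefail
pipefail_status=0
false | true || pipefail_status=$?
set +o pipefail

echo "default: $default_status, stages: ${stages[*]}, pipefail: $pipefail_status"
```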
At this point we’ve implemented (most of) the unofficial bash strict mode. But that’s still not enough.
Problem #4: Subshells are weird
Note: An earlier version of this article had incorrect information about subshells. Thanks to Loris Lucido for pointing out my error.
With the $() syntax, you can launch a subshell:
#!/bin/bash
set -euo pipefail
export VAR=$(echo hello | nonexistentprogram)
echo "Success!"
When we run it:
$ bash bad4.sh
bad4.sh: line 3: nonexistentprogram: command not found
Success!
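What’s going on? A failure inside a subshell isn’t treated as an error if the subshell is part of a command’s arguments. Here that command is export, which itself succeeds, so its exit status is the one set -e sees, and the subshell’s error just gets thrown away.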
The one exception is setting a variable directly, so we need to write our code like this:
#!/bin/bash
set -euo pipefail
VAR=$(echo hello | nonexistentprogram)
export VAR
echo "Success!"
And now our program operates correctly:
$ bash good4.sh
good4.sh: line 3: nonexistentprogram: command not found
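The same masking happens with bash's other declaration builtins, such as local inside a function. The successful exit status of local replaces the status of the command substitution. ShellCheck flags this pattern as SC2155. The following is a minimal sketch that uses false as a stand-in for any failing command and runs each variant in its own strict-mode bash process:

```shell
#!/bin/bash
# Each variant runs in a separate bash process with strict mode on, so one
# of them aborting does not stop this demo.

masked_script='
set -euo pipefail
f() {
    local value=$(false)   # local exits 0, hiding the failure
    echo "still running"
}
f'

unmasked_script='
set -euo pipefail
f() {
    local value
    value=$(false)         # plain assignment: set -e stops here
    echo "still running"
}
f'

masked_out=$(bash -c "$masked_script" 2>&1) || true
unmasked_out=$(bash -c "$unmasked_script" 2>&1) || true

echo "masked:   '$masked_out'"
echo "unmasked: '$unmasked_out'"
```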
This is probably a sufficient demonstration of bash’s bad behavior, but it’s certainly not a complete demonstration.
Some bad reasons to use shell scripts
What are some reasons you might want to use shell scripts anyway?
Bad reason #1: It’s always there!
Pretty much every Unix-y computing environment will have a basic shell. So if you’re writing some packaging or startup scripts, it’s tempting to use a tool that you know will be there.
Thing is, if you’re packaging a Python application, you can pretty much guarantee that the development environment, CI, and the runtime environment will all have Python installed. So why not use a programming language that actually handles errors by default?
More broadly, pretty much every programming language with a decently-sized userbase will have some sort of scripting-oriented library or idioms.
Rust, for example, has xshell, as well as other libraries.
So in most cases you can use your programming language of choice instead of a shell script.
Bad reason #2: Just write correct code!
In theory, if you know what you’re doing, and you stay focused and don’t forget any of the boilerplate, you can write correct shell scripts, even quite complex ones. You can even write unit tests. In practice:
- You’re probably not working alone; it’s unlikely everyone on your team has the relevant expertise.
- Everyone gets tired, gets distracted, and otherwise ends up making mistakes.
- Almost every complex shell script I’ve seen was lacking the set -euo pipefail invocation, and adding it after the fact is quite difficult (usually impossible).
- I’m not sure I’ve ever seen an automated test for a shell script. I’m sure they exist, but they’re quite rare.
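A unit test for a shell function can be as simple as the following hand-rolled sketch. No test framework is assumed, and strip_slash and assert_eq are hypothetical names. Even here, you need to remember the subshell rule from Problem #4:

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical function under test: removes one trailing slash from a path.
strip_slash() {
    local path="$1"
    echo "${path%/}"
}

# Hand-rolled assertion helper; returning 1 aborts the script under set -e.
assert_eq() {
    local actual="$1" expected="$2"
    if [[ "$actual" != "$expected" ]]; then
        echo "FAIL: expected '$expected', got '$actual'" >&2
        return 1
    fi
}

# Assign first: a failure inside "$(...)" used directly as an argument
# would be ignored (Problem #4).
result=$(strip_slash /tmp/)
assert_eq "$result" "/tmp"
result=$(strip_slash /tmp)
assert_eq "$result" "/tmp"
echo "all tests passed"
```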
Bad reason #3: Shellcheck will catch all these bugs!
If you are writing shell programs, shellcheck is a very useful way to catch bugs.
Unfortunately, it’s not enough on its own.
Consider the following program:
#!/bin/bash
echo "$(nonexistentprogram | grep foo)"
export VAR="$(nonexistentprogram | grep bar)"
cp x /nosuchdirectory/
echo "$VAR $UNKNOWN_VAR"
echo "success!"
If we run this program it will print “success!”, even though it has 4 separate problems (at least):
$ bash bad6.sh
bad6.sh: line 2: nonexistentprogram: command not found
bad6.sh: line 3: nonexistentprogram: command not found
cp: cannot stat 'x': No such file or directory
success!
What does shellcheck do? It will catch some of the problems… but not all:
- If you run shellcheck, it will point out the issue with the export.
- If you run shellcheck -o all, so it runs all checks, it will also point out the problem with echo "$(nonexistentprogram ...)". That is, assuming you are using v0.8 (released in November 2021) or later. Older versions didn’t have this check, so any Linux distribution release predating that will give you a shellcheck that doesn’t catch that problem.
- It doesn’t suggest set -euo pipefail.
If you’re relying on shellcheck, I strongly recommend upgrading and making sure you run with shellcheck -o all.
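Here is one way bad6.sh might be rewritten so that each of its problems would stop the script. It adds strict mode at the top and assigns command substitutions to variables on their own, instead of burying them in echo or export arguments. The sketch writes the fixed script to a temporary file and runs it to confirm that it fails instead of printing “success!”. It stops at the very first problem, so the later fixes only come into play once the earlier lines succeed. Note that with pipefail, grep exiting with status 1 because nothing matched also counts as a failure.

```shell
#!/bin/bash
# Write the corrected script to a temporary file, then run it.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
set -euo pipefail

# Assign first, then use: a failing pipeline inside $(...) now stops the script.
output="$(nonexistentprogram | grep foo)"
echo "$output"

# Assign separately from export, since export would mask the failure.
VAR="$(nonexistentprogram | grep bar)"
export VAR

cp x /nosuchdirectory/          # set -e stops on a failing command
echo "$VAR $UNKNOWN_VAR"        # set -u rejects the unset variable
echo "success!"
EOF

status=0
result=$(bash "$script" 2>/dev/null) || status=$?
rm -f "$script"
echo "exit status: $status, output: '$result'"
```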
Stop writing shell scripts
Shell scripts are fine in some situations:
- For one-off scripts that you are manually supervising, you can get away with laxer practices.
- Sometimes you really have no guarantees that another programming language is available, and you need to use the shell to get things going.
- For sufficiently simple cases, just running a few commands sequentially, with no subshells, conditional logic, or loops, set -euo pipefail is sufficient (and make sure you use shellcheck -o all).
As soon as you find yourself doing anything beyond that, you’re much better off using a less error-prone programming language. And given most software tends to grow over time, your best bet is starting with something a little less broken.