When your CI is taking forever on AWS EC2, it might be EBS
You’re running your test suite or your Docker image packaging on an EC2 server. And it’s slow.
- docker pull takes 15 minutes just to verify the images it downloaded in 1 minute.
- dnf installs take another 15 minutes.
- conda install takes even more time.
It’s a fast machine with plenty of memory. In fact, you may be using a custom machine precisely because Travis CI or Circle CI builders are underpowered. And yet it’s still taking forever.
Quite possibly the problem is with your EBS disk’s IOPS.
IOPS as bottleneck
Your EC2 virtual machine has a virtual hard drive, typically an AWS EBS volume. And these drives have limited I/O operations per second (“IOPS”), a limited number of reads and writes per second.
For the default general purpose gp2 disk type there are two limits:
- The standard IOPS, 3 IOPS per GiB of storage, with a minimum of 100 regardless of volume size. If you have a 100GiB EBS volume it will do 300 IOPS; a 500GiB volume will do 1500 IOPS.
- The burst IOPS of 3000.
The way the burst IOPS works is that you get a balance of 5.4 million I/O credits, which gets used up at a rate of up to 3000 per second. Once the credits are used up you’re back to the standard IOPS, and over time the credit balance rebuilds. (You can get the full details in the AWS documentation.)
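To make the arithmetic concrete, here is a minimal sketch using only the numbers quoted above. The function names are made up for illustration, and the burst-duration formula assumes credits keep refilling at the standard rate while you burst; it also ignores any upper cap on gp2 IOPS for very large volumes.

```python
# gp2 numbers taken from the text above.
IOPS_PER_GIB = 3
MIN_STANDARD_IOPS = 100
BURST_IOPS = 3000
BURST_CREDITS = 5_400_000


def standard_iops(size_gib: int) -> int:
    """Standard (non-burst) IOPS for a gp2 volume of the given size."""
    return max(MIN_STANDARD_IOPS, IOPS_PER_GIB * size_gib)


def seconds_of_full_burst(size_gib: int) -> float:
    """How long a full credit balance lasts under sustained 3000 IOPS load.

    Assumption: credits refill at the standard rate while bursting, so the
    net drain is (burst - standard) credits per second.
    """
    base = standard_iops(size_gib)
    if base >= BURST_IOPS:
        return float("inf")  # the standard rate already matches the burst rate
    return BURST_CREDITS / (BURST_IOPS - base)


def slowdown_when_exhausted(size_gib: int) -> float:
    """Ratio of burst IOPS to standard IOPS: the worst-case I/O slowdown."""
    return max(1.0, BURST_IOPS / standard_iops(size_gib))


for size in (20, 100, 500):
    print(
        f"{size} GiB: {standard_iops(size)} IOPS, "
        f"burst lasts ~{seconds_of_full_burst(size):.0f}s, "
        f"up to {slowdown_when_exhausted(size):.0f}x slower once exhausted"
    )
```

Under these assumptions a 100GiB volume can sustain the full burst rate for about 2000 seconds, roughly half an hour of heavy I/O, which a few back-to-back CI runs can plausibly use up.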
For application servers, this works great: you’re not doing a lot of I/O once your application has started running. For CI workloads—tests and packaging—limited IOPS can be a performance problem.
When you download a Docker image, operating system package, or Python package, you are doing lots and lots of disk I/O. The packages get written to disk, they get re-read, they get unpacked, and lots of small files are written to disk. It all adds up.
A few concurrent CI runs might use up all of your burst credits, and if you have a 100GiB volume, you suddenly drop from 3000 IOPS to 300 IOPS. Now installing packages can take as much as 10× as long, because reading from and writing to disk takes so much longer.
In general this problem is much more likely if you have a small EBS volume, since it will have fewer IOPS. And you can get a hint that it’s happening if package installs are particularly slow.
But you can also explicitly check.
In the AWS console for EC2, go to Volumes, and look at the Monitoring tab for your particular volume. One of the graphs will be “Burst Balance”. If the balance has flatlined and is at 0%, that means you’ve got no credit left for now, and you’re running at the standard, non-burst IOPS.
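If you would rather check from a script than from the console, the same graph is backed by the CloudWatch BurstBalance metric in the AWS/EBS namespace. The following is a rough sketch rather than a definitive tool: the function name, the three-hour window, and the volume ID in the usage comment are all placeholders, and it expects you to pass in a boto3 CloudWatch client.

```python
from datetime import datetime, timedelta, timezone


def lowest_burst_balance(cloudwatch, volume_id, hours=3):
    """Return the lowest BurstBalance percentage seen recently, or None.

    `cloudwatch` is expected to be a boto3 CloudWatch client.
    """
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EBS",
        MetricName="BurstBalance",
        Dimensions=[{"Name": "VolumeId", "Value": volume_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,  # one datapoint per 5 minutes
        Statistics=["Minimum"],
    )
    points = response.get("Datapoints", [])
    if not points:
        return None  # no data, e.g. the volume type has no burst balance
    return min(point["Minimum"] for point in points)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   balance = lowest_burst_balance(boto3.client("cloudwatch"), "vol-0123456789abcdef0")
#   if balance is not None and balance < 10:
#       print("Burst credits nearly exhausted; CI I/O is probably throttled")
```

A value at or near 0 during your slow CI runs is the same signal as the flatlined graph.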
Solving the problem
Given an existing EBS volume, the easiest way to solve the problem is to increase the size of your volume.
For example, 500GiB will give you 1500 IOPS, a much more reasonable minimum than the 300 IOPS you’d get for 100GiB.
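As a sketch of what resizing could look like in code, the snippet below works out the gp2 size needed for a target standard IOPS (at 3 IOPS per GiB) and asks EC2 to grow the volume. The helper names are hypothetical, and `ec2` is assumed to be a boto3 EC2 client. After the volume grows you would typically still need to extend the partition and filesystem on the instance, which is not shown here.

```python
IOPS_PER_GIB = 3  # gp2 standard rate from the text above


def size_for_standard_iops(target_iops: int) -> int:
    """Smallest gp2 size in GiB whose standard IOPS reaches the target."""
    return (target_iops + IOPS_PER_GIB - 1) // IOPS_PER_GIB  # ceiling division


def grow_volume_for_iops(ec2, volume_id: str, target_iops: int) -> int:
    """Grow the volume if needed; return the resulting size in GiB.

    `ec2` is expected to be a boto3 EC2 client.
    """
    new_size = size_for_standard_iops(target_iops)
    current = ec2.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]["Size"]
    if new_size <= current:
        return current  # already large enough; EBS volumes cannot shrink
    ec2.modify_volume(VolumeId=volume_id, Size=new_size)
    return new_size


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   grow_volume_for_iops(boto3.client("ec2"), "vol-0123456789abcdef0", 1500)
```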
- You can switch to an io1 type EBS volume, which has configurable dedicated IOPS, but it’s probably not worth the cost just for running tests.
- You can switch to using local instance storage.
Some EC2 instances have dedicated NVMe SSD storage with vastly more IOPS, e.g.
If your test suite is unreasonably slow on EC2, do check for this problem. I’ve seen it take the run time of a Docker build from over an hour (not sure how much over, because at that point it timed out, unfinished) to just 15 minutes—and diagnosing and fixing the problem takes only 5 minutes of your time.