Why Does CI Suck

· ngp's blog


CI is one of those things that most developers will have encountered at some point. It's widely considered a "best practice" for some tortured definition of "best" (and debatable definitions of "practice"), and yet the industry has ossified around shell commands and container gunk embedded in YAML DSL garbage. It's horrible in nearly every way I can think of. You know what's insane? Having to make a commit and push every time you want to check if your code works[1]. Are there technical reasons for it? Sorta? Are they worth it? Absolutely not. How did developers test their code before the advent of YAML, modern git forges, scrumlords, and the other garbage that pollutes the industry? If they were on [[UNIX]], they ran make(1); if they were on Windows, they probably didn't test their code (it's not like Microsoft does)[2]. Java devs have had Jenkins (formerly Hudson) for quite a while, and I feel like it's really the only major platform to largely get the core idea right (while also getting a lot of really important details completely, horribly wrong).

Where did we go wrong?

make(1) is weird. It's crufty, old, and ubiquitous. Stuart Feldman wrote it at Bell Labs in 1976. It takes some really good ideas (file-dependency-based task execution) and mixes them up with some really bad ones (text processing, weirdly obtuse syntax) that make learning it just difficult enough to scare off people with better things to do with their time. Its syntax is similar enough to sh(1) that it lulls you into believing it's the same. It's not, and you will get bitten by it fairly quickly. But it's everywhere, and it gets you 80% of the way there, so it sticks around. Aside from the syntax, it's relatively simple. There have been many attempts to replace it (use [[Plan9]]'s mk(1), you cowards), but none are as popular.
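The good idea buried in make(1) is small enough to sketch. Below is a minimal Python illustration, not make's actual implementation: a target is rebuilt only if it is missing or older than one of its prerequisites, and prerequisites are brought up to date first. The `Rule` type and the `is_stale`/`build` names are hypothetical, and unlike real make this sketch does no cycle detection, pattern rules, or variable expansion.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Rule:
    """A make-style rule: a target file, the files it depends on, and a recipe."""
    target: str
    prereqs: list[str] = field(default_factory=list)
    recipe: Callable[[], None] = lambda: None


def is_stale(rule: Rule) -> bool:
    """A target is out of date if it is missing or older than any prerequisite."""
    if not os.path.exists(rule.target):
        return True
    target_mtime = os.path.getmtime(rule.target)
    return any(os.path.getmtime(p) > target_mtime for p in rule.prereqs)


def build(name: str, rules: dict[str, Rule],
          rebuilt: Optional[list[str]] = None) -> list[str]:
    """Bring `name` up to date, prerequisites first; return the targets rebuilt."""
    if rebuilt is None:
        rebuilt = []
    rule = rules.get(name)
    if rule is None:
        # A plain source file with no rule must already exist, as in make.
        if not os.path.exists(name):
            raise FileNotFoundError(f"no rule to make target {name!r}")
        return rebuilt
    for prereq in rule.prereqs:
        build(prereq, rules, rebuilt)  # depth-first: dependencies before dependents
    if is_stale(rule):
        rule.recipe()  # run the recipe only when something is out of date
        rebuilt.append(name)
    return rebuilt
```

Running `build` twice in a row rebuilds nothing the second time. That is the entire trick, and it is the part that the rest of make's syntax tends to obscure.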

What is the core idea of CI?

Running the damn code. That's it. The most useful aspect of writing and running automated tests of any kind is creating a simulated environment in which to execute the code and check that it did what you intended. There's a ton of complexity that goes into making that work right. That random library you apt-get installed onto your system two years ago and never thought about again? Yeah, that matters. Sometimes quite a lot. One of the first challenges in building automated test execution is preparing an "environment" in which the code can be built and executed. Below is a short, incomplete list of some of the things you need to consider when creating this environment:

That doesn't seem like a long list, but the simple terms hide a lot of nuance and complexity. make(1) excels at the doing part, but it has no real way to manage the where part. Fortunately for us, most of these have been "solved" by a collection of utilities, pseudo-standards, and kernel APIs that together are called containers. Docker is the most popular, but there's nothing particularly special about Docker itself these days. I personally use podman(1), as it is less opinionated about what services I have running on my system (it doesn't need a long-running daemon).

Containers solve one very critical problem when it comes to CI: creating a mostly reproducible userspace in which to build and run code. Despite their shortcomings, containers are a good concept. In general, OCI (Open Container Initiative) images can knock out most of the above list and are widely understood, which makes them an ideal foundation for something like CI.
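To make the "where" part concrete, here is a minimal sketch that assembles a podman(1) invocation running a build inside an image with the local checkout bind-mounted. The function names and the `/src` mount point are hypothetical choices for illustration. The flags used (`--rm`, `--volume`, `--workdir`) are standard `podman run` options, but the image you pass in is up to you.

```python
import shlex
import subprocess
from pathlib import Path


def container_argv(image: str, checkout: Path, command: list[str]) -> list[str]:
    """Build a podman(1) command line that runs `command` inside `image`,
    with the host checkout bind-mounted at /src (a hypothetical path)."""
    return [
        "podman", "run",
        "--rm",                                    # discard the container on exit
        "--volume", f"{checkout.resolve()}:/src",  # mount the working tree
        "--workdir", "/src",                       # start inside the mounted tree
        image,
        *command,
    ]


def run_in_container(image: str, checkout: Path, command: list[str]) -> None:
    """Echo the command (like `set -x`), then run it; raise on a nonzero exit."""
    argv = container_argv(image, checkout, command)
    print("+", shlex.join(argv), flush=True)
    subprocess.run(argv, check=True)
```

Something like `run_in_container("docker.io/library/gcc:14", Path("."), ["make", "check"])` would then run the same `make check` you would type by hand, just inside a known userspace. The image reference there is only a placeholder. Pinning by digest instead of a mutable tag gets you closer to "reproducible", and on SELinux hosts the mount may also need podman's `:Z` relabel option.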

Why is every CI platform so complicated?

A mix of incentives and feature creep. CI platforms generally monetize by being a middle-man between you and AWS or whatever public cloud provider they've decided to buy bulk compute from. Most commonly, they charge per minute of runner time, scaled by the number of CPU "cores" available to the executor. Modern CI platforms cram in event-driven execution, time-interval-driven periodic tasks, and a smattering of RBAC that make things more confusing and complicated[3]. There is something to be said for reducing the barrier to entry, and there is a place for hosted/managed executor platforms, but they should not be the default.

For some very basic cases, it is nice to be able to click a few buttons, check some boxes, and have features set up for you. Unfortunately, it's rarely that simple. Once you've invested in an opinionated framework, breaking away from its happy path becomes significantly harder: not only do you have to learn how to solve the problem at hand, you also have to learn how the framework wants you to solve it!

It could be better.

The best "DevOps" (whatever that means) engineer I've worked with had a wonderful suggestion: whatever CI platform you use should be a thin wrapper around something you can execute locally.

This makes intuitive sense: developers typically test their code locally before pushing it and requesting reviews (at least I do). The sooner a bug or failure is caught, the less time and the fewer CPU cycles are wasted. Further, if you're burning that electricity anyway, why duplicate the work on underpowered cloud VMs after already checking it locally? Some platforms have tried to implement this[4], with marginal success. The benefits are clearly there, so why did they fail?
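One way to read the "thin wrapper" suggestion is as a single local entrypoint that both you and the CI runner invoke. Below is a minimal sketch of that idea, assuming a hypothetical Python `ci.py` whose step list (`make`, `make check`) is purely illustrative. It runs each step in order, stops at the first failure, and reports that step's exit code. The hosted job then shrinks to one step: run the same script.

```python
import shlex
import subprocess
import sys

# Hypothetical pipeline: every step is a plain command you could also type by hand.
STEPS: list[tuple[str, list[str]]] = [
    ("build", ["make"]),
    ("test", ["make", "check"]),
]


def run_steps(steps: list[tuple[str, list[str]]]) -> int:
    """Run steps in order; stop at the first failure and return its exit code (0 if all pass)."""
    for name, argv in steps:
        print(f"==> {name}: {shlex.join(argv)}", flush=True)
        result = subprocess.run(argv)
        if result.returncode != 0:
            print(f"==> {name} failed with exit code {result.returncode}", file=sys.stderr)
            return result.returncode
    return 0


def main() -> None:
    """Entry point for both a developer's shell and the CI runner."""
    sys.exit(run_steps(STEPS))
```

In a real `ci.py`, the usual `if __name__ == "__main__": main()` guard would go at the bottom. Because the CI runner only calls `main()`, a failure in CI can be reproduced with the exact same command on your own machine.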

CI, as it should be

There are a few things that should be core principles of any honest CI platform.

A basic outline for the design of a platform using these principles can be found [[CI How it Should Be|here]]. That design is a work-in-progress and likely to change.



  1. A few platforms solve this in different ways. [[Sourcehut]] has a mostly acceptable solution, but it falls into the same """declarative""" pitfalls as nearly every other platform.

  2. For legal reasons, this is a joke.

  3. That isn't to say that RBAC isn't helpful or nice. [[GitLab]]'s repository access lists for the ephemeral CI job token are very nice for building multi-repo projects. This is a pattern I like and wish were more common, but it also brings a lot of complexity and requires a shift in thinking about "access", which may be dangerous for certain risk profiles.

  4. https://earthly.dev/ - I think it failed because the incentives for building a good platform that people want to use are heavily at odds with making money. Their bespoke format also makes it a tougher sell, even if it does streamline certain operations.