The time elapsed between writing and shipping is the room temp petri dish where pathological symptoms breed and snowball. Longer lead times lead to larger code diffs and slower code reviews. This means anyone reviewing or revising these nightmare diffs has to pause and swap the full context in and out of their mind any time they switch gears, from writing code to reviewing and back again.
Thus the elastic feedback loop of the development cycle begins to stretch out longer and longer, as people spend more and more time waiting on each other—to review, to comment, to deploy, to make requested changes—and more and more time paging state in and out of their brains and trying to remember where they were and what they were trying to do.
But it gets worse. Most deploys are run by hand, at some indeterminate interval after the code was written, not by the person who wrote the code, and—worst of all—with many developers’ changes batched up at once. It is this, above all else, that severs the engineer from the effects of their work, and makes it impossible for them to practice responsible ownership over the full lifecycle of their code.
Batching up changes means you cannot practice observability-driven development; you can’t expect engineers to watch their code go out and ensure it is working in production. You don’t know when your code is going out and you don’t know who is responsible for a change in production behavior; therefore, you have no accountability or autonomy and cannot practice software ownership over your own code. Ultimately, your engineers will spend a higher percentage of their time waiting, fumbling, or dealing with tech debt, not moving the business forward.
Since so many of your existing engineers are tied up, you will need to hire many more of them, which carries higher coordination costs. Soon, you will need specialized roles such as SRE, ops, QA, build engineers, etc. to help you heroically battle each of the symptoms in isolation. Then you will need more managers and TPMs, and so on. Guess what! Now you’re a big, expensive company, and you lumber like a dinosaur.
It gets slower, so it gets harder and more complicated, so you need more resources to manage the complexity and battle the side effects, which creates even more complexity and surface area. This is the death spiral of software development. But there is another way. You can fix the problem at the source, by focusing relentlessly on the length of time between when a line of code is written and when it is fully deployed to production. Fixate on that interval, automate that interval, track that interval, dedicate real engineering resources to shrinking it over time.
Until that interval is short enough to be a functional feedback loop, all you will be doing is managing the symptoms of dysfunction. The longer the delay, the more of these symptoms will appear, and the more time your teams will spend running around bailing leaks.
How short? 15 minutes is great, under an hour is probably fine. Predictability matters as much as length, which is why human gates are a disqualifier. But if your current interval is something like 15 days, take heart—any work you put into shortening this interval will pay off. Any improvement pays dividends.
This is where lots of people look dubious. “I’m not sure my team can handle that”, they may say. “This is all very well and good for Facebook or Google, but we aren’t them.”
[…] This may surprise you, but continuous deployment is far and away the easiest way to write, ship, and run code in production. This is the counterintuitive truth about software: making lots of little changes swiftly is infinitely easier than making a few bulky changes slowly.
Think of it this way. Which of these bugs would be easier for you to find, understand, repro, and fix: a bug in the code you know you wrote earlier today or a bug in the code someone on your team probably wrote last year?
Source: Fulfilling the promise of CI/CD, Charity Majors, https://stackoverflow.blog/2021/12/20/fulfilling-the-promise-of-ci-cd/