  • The real cost of infrastructure drift — and why most teams don’t measure it

    Every organization has infrastructure drift. The production environment doesn’t quite match the staging environment. The architecture diagram in Confluence doesn’t match either of them. The Terraform files were written six months ago and nobody’s sure they’re current. The engineer who knew how the payment service connected to the database left in March.

    This is normal. It happens everywhere. Most teams manage it through a combination of institutional knowledge, careful engineers, and occasional heroics when something breaks.

    What most teams don’t do is measure what it costs. And the number, when you add it up honestly, tends to be surprising.

    What infrastructure drift actually is.

    Drift is the gap between what you believe your infrastructure to be and what it actually is. It takes several forms:

    Design-to-deployment drift

    The architecture was designed one way. It was deployed slightly differently — under pressure, with shortcuts, with pragmatic changes that seemed small at the time. Nobody updated the design. Six months later, the design document is fiction.

    Environment drift

    Production and staging diverge. A configuration change was applied to production in an incident but never backported. A new service was added to staging for testing and never cleaned up. The two environments were supposed to be mirrors. They’re not.
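
    As a minimal sketch of how this kind of divergence can be surfaced, the snippet below compares two flat configuration snapshots. The function name, the snapshot format, and every key and value are hypothetical; in practice the snapshots would come from whatever your own tooling can export.

```python
from typing import Any


def diff_environments(prod: dict[str, Any], staging: dict[str, Any]) -> dict[str, Any]:
    """Compare two flat config snapshots and report where they diverge."""
    prod_keys, staging_keys = set(prod), set(staging)
    shared = prod_keys & staging_keys
    return {
        # Present in production only, e.g. an incident fix never backported.
        "only_in_prod": sorted(prod_keys - staging_keys),
        # Present in staging only, e.g. a test service never cleaned up.
        "only_in_staging": sorted(staging_keys - prod_keys),
        # Present in both environments, but with different values.
        "changed": {
            key: (prod[key], staging[key])
            for key in sorted(shared)
            if prod[key] != staging[key]
        },
    }


# Hypothetical snapshots; all keys and values are invented for illustration.
prod = {
    "payments.db_pool_size": 50,
    "payments.timeout_s": 30,
    "gateway.rate_limit": 1000,
}
staging = {
    "payments.db_pool_size": 20,
    "payments.timeout_s": 30,
    "mock-ledger.enabled": True,
}

report = diff_environments(prod, staging)
print(report)
```

    Run on a schedule, a report like this turns ‘supposed to be mirrors’ into something you can check rather than assume.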

    Documentation drift

    The architecture diagram was accurate when it was drawn. Two hundred deployments later, it’s an archaeological artifact. Anyone using it to understand the system is working from wrong information.

    Ownership drift

    The team that built the service moved on to something else. Nobody formally transferred ownership. The service still runs. When it breaks, or when an auditor asks who owns it, the answer is a spreadsheet that hasn’t been updated since Q2.

    Why teams don’t measure the cost.

    The cost of drift is real but diffuse. It shows up as:

    • Extra hours spent on onboarding new engineers who have to reverse-engineer the real system
    • Incident investigation time spent figuring out what the infrastructure actually looks like before diagnosing the problem
    • Deployment delays caused by environment inconsistencies discovered mid-deployment
    • Architecture review cycles that take three times as long because the baseline documentation is wrong
    • Compliance preparation sprints before audits — weeks of engineering time spent reconstructing evidence that should have been continuous

    None of these show up as a line item called ‘drift cost.’ They show up as engineering time, delayed releases, and compliance scrambles. Because the cost is distributed across many activities and never attributed to drift specifically, it tends to be invisible.

    A framework for estimating your drift cost.

    Here’s a rough model. Adjust the numbers for your organization — the point is the structure, not the specific figures.

    Deployment cycle overhead

    Take your average time from architecture decision to running infrastructure. In many organizations this runs 60–90 days. Now estimate how much of that time is spent on manual translation: taking the architecture design and turning it into IaC, configuration, and deployment scripts, then debugging what was lost in translation.

    If 40% of a 90-day cycle is manual translation and debugging, that’s 36 days per project; with one engineer tied up on that work, 36 person-days. At an average fully loaded engineering cost of €500–800 per day, that’s roughly €18,000–€29,000 per project in deployment overhead alone.

    For an organization running 10 infrastructure projects per year: €180,000–€290,000 annually, just in deployment cycle overhead.
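
    The arithmetic above is simple enough to put in a few lines of Python so you can substitute your own numbers. This is only a sketch: the function and parameter names are invented here, and the `engineers` parameter reflects the assumption behind the 36 person-days, namely that one engineer-equivalent is tied up for the translation share of the cycle.

```python
def deployment_overhead(cycle_days, translation_share, day_rate,
                        projects_per_year, engineers=1):
    """Return (person_days, cost_per_project, cost_per_year) in EUR."""
    # Days of each cycle lost to manual translation and debugging.
    person_days = cycle_days * translation_share * engineers
    cost_per_project = person_days * day_rate
    return person_days, cost_per_project, cost_per_project * projects_per_year


# Figures from the text: 90-day cycle, 40% translation, 10 projects per year.
for day_rate in (500, 800):
    days, per_project, per_year = deployment_overhead(90, 0.40, day_rate, 10)
    print(f"EUR {day_rate}/day: {days:.0f} person-days, "
          f"EUR {per_project:,.0f} per project, EUR {per_year:,.0f} per year")
```

    If more than one engineer is involved in the translation work, raising `engineers` scales the result linearly, which is usually the first adjustment worth making.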

    Incident investigation overhead

    When an incident happens in a drifted environment, the first 30–60 minutes are often spent figuring out what the infrastructure looks like — before any actual diagnosis begins. How often does this happen in your environment? Once a week? Three times a month?

    Assume 20 significant incidents per year, each requiring 45 minutes of infrastructure archaeology at the start, involving an average of 3 engineers. That’s 45 hours of senior engineering time per year. At the same €500–800 day rate and an 8-hour day, that is about 5.6 person-days, or roughly €2,800–€4,500, spent on investigation overhead that would be near-zero with a current infrastructure model.

    Compliance preparation

    How long did your last compliance audit preparation take? How many engineering weeks went into pulling together the asset inventory, the change history, the architecture documentation? This is the most visible form of drift cost because it happens on a schedule and the effort is obvious.

    A mid-sized financial institution preparing for a DORA assessment might spend 4–8 engineering weeks on documentation preparation. At €500–800/day and five working days per week, that’s €10,000–€32,000 per audit cycle, for evidence that should have been continuous and automatic.

    Onboarding friction

    A new infrastructure engineer joins. How long before they have an accurate picture of the real infrastructure: not the version in the wiki, not the version in someone’s memory, but what’s actually running? Two weeks? A month?

    If you hire 4 infrastructure engineers per year and each loses 3 weeks of productive time to system archaeology, that’s 12 weeks (60 working days) of senior engineering time, roughly €30,000–€48,000 annually in onboarding friction directly attributable to infrastructure drift.

    Adding it up.

    Using the conservative end of each estimate:

    • Deployment cycle overhead: €180,000/year
    • Incident investigation overhead: €2,800/year
    • Compliance preparation: €10,000/year (one audit cycle)
    • Onboarding friction: €30,000/year

    Conservative total: roughly €223,000 per year, for a mid-sized organization running 10 infrastructure projects annually.
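
    To run the same model against your own numbers, the four components can be combined in one small script. Treat it as a sketch under stated assumptions rather than a finished calculator: the class, field, and function names are invented here, and the 8-hour working day, 5-day working week, and one audit cycle per year are assumptions used to convert hours and weeks into person-days.

```python
from dataclasses import dataclass

HOURS_PER_DAY = 8  # assumption: converts incident hours into person-days
DAYS_PER_WEEK = 5  # assumption: working days in an engineering week


@dataclass
class DriftInputs:
    day_rate: float                   # fully loaded cost per engineer-day, EUR
    projects_per_year: int = 10
    cycle_days: float = 90
    translation_share: float = 0.40   # share of the cycle lost to translation
    incidents_per_year: int = 20
    archaeology_minutes: float = 45   # per incident, before diagnosis starts
    engineers_per_incident: int = 3
    audit_prep_weeks: float = 4
    audits_per_year: int = 1          # assumption: one audit cycle per year
    hires_per_year: int = 4
    onboarding_weeks_lost: float = 3


def drift_cost(inputs: DriftInputs) -> dict[str, float]:
    """Estimate annual drift cost per category, in EUR."""
    person_days = {
        "deployment": (inputs.cycle_days * inputs.translation_share
                       * inputs.projects_per_year),
        "incidents": (inputs.incidents_per_year * inputs.archaeology_minutes / 60
                      * inputs.engineers_per_incident / HOURS_PER_DAY),
        "compliance": (inputs.audit_prep_weeks * DAYS_PER_WEEK
                       * inputs.audits_per_year),
        "onboarding": (inputs.hires_per_year * inputs.onboarding_weeks_lost
                       * DAYS_PER_WEEK),
    }
    costs = {name: days * inputs.day_rate for name, days in person_days.items()}
    costs["total"] = sum(costs.values())
    return costs


# Conservative end of each estimate: EUR 500/day and 4 weeks of audit prep.
costs = drift_cost(DriftInputs(day_rate=500))
for name, eur in costs.items():
    print(f"{name:<12} EUR {eur:>10,.0f}")
```

    Changing a single field, for example `audit_prep_weeks=8` or `day_rate=800`, shows how sensitive the total is to each assumption.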

    That’s the part of drift cost that shows up in your P&L as engineering time. It doesn’t include the harder-to-measure costs: delayed product launches, the architecture debt that accumulates when shortcuts become permanent, and the compliance risk when documentation doesn’t match reality.

    The question isn’t whether infrastructure drift costs you money. It does. The question is whether you’re measuring it — and whether you’re willing to do something about it.

    What reducing drift actually looks like.

    Drift isn’t inevitable. It’s the result of a specific architectural failure: the gap between the tool used to design infrastructure and the tool used to deploy, operate, and document it.

    When design, deployment, discovery, and documentation all live in the same system — updated from the same source of truth — drift approaches zero. The architecture diagram is generated from the real infrastructure, not maintained separately from it. New environments are provisioned from the same model that describes production. Change history is continuous and automatic.

    Our first customer reduced their deployment cycle from 90 days to 10 days after implementing this approach. The reduction wasn’t magic — it was the elimination of the manual translation layer between design and deployment, and the elimination of the debugging that translation introduces.

    What to do with this.

    If you’ve never run this calculation for your organization, do it. It takes about an hour and the result is usually enough to justify a meaningful investment in infrastructure automation and governance tooling.

    If you’d like to see what the numbers look like for your specific environment — based on your infrastructure, your team size, and your compliance obligations — we can work through it together in a demo.
