Author: ikxx

  • Draw your infrastructure. Deploy it. Here’s how that actually works.

    The idea sounds simple: draw an architecture diagram, press a button, get Terraform. But anyone who has tried to automate the path from architecture design to deployable infrastructure knows it’s not simple. The gap between a diagram and working IaC is where most of the complexity lives.

    In this post I want to explain how the draw-to-deploy workflow actually functions — the technical pieces involved, where the hard problems are, and what makes it reliable versus fragile.

    Why the gap exists.

    Architecture diagrams and infrastructure as code are two different representations of the same thing — but they’re designed for different audiences and optimized for different purposes.

    A diagram is optimized for human understanding. It shows relationships, groupings, and high-level structure. It deliberately abstracts away implementation details. A service and a container are distinct boxes in a diagram. How the container image is built, what resources it requests, what health check it uses — none of that is in the diagram.

    IaC is optimized for machine execution. It’s explicit about every detail. A Kubernetes deployment manifest needs a container image, resource limits, liveness probes, environment variables, and dozens of other fields that have no representation in a diagram.

    Bridging these two representations is the core technical challenge of draw-to-deploy. You’re translating from human-optimized to machine-optimized, and the translation requires filling in a lot of information that exists in the diagram implicitly or not at all.

    The technical pieces.

    1. A structured diagram format

    The first requirement is that diagrams are structured, not just visual. A PNG export of a Visio diagram is a picture — you can look at it, but a system can’t parse it meaningfully. Draw-to-deploy requires a format where the diagram structure is machine-readable.

    draw.io (also known as diagrams.net) stores diagrams as XML. The XML encodes what shapes exist, what they’re connected to, and what labels they carry. It’s not infrastructure-aware by default — a box labeled ‘service’ is just a labeled box — but it’s parseable.

    Mermaid is more structured. A Mermaid architecture diagram explicitly declares node types and relationships in a syntax designed for parsing. The semantic information is more reliable.

    The more structured the input format, the more the system can infer about the infrastructure being described.
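
    To make "parseable" concrete, here is a minimal sketch of reading shapes and connectors out of draw.io XML. It assumes an uncompressed export (draw.io can also store the diagram payload compressed, which would need decoding first), and the sample diagram and the parse_drawio helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, uncompressed draw.io export: two shapes and one connector.
DRAWIO_XML = """
<mxfile>
  <diagram name="Page-1">
    <mxGraphModel>
      <root>
        <mxCell id="0"/>
        <mxCell id="1" parent="0"/>
        <mxCell id="2" value="orders-service" vertex="1" parent="1"/>
        <mxCell id="3" value="PostgreSQL" vertex="1" parent="1"/>
        <mxCell id="4" value="reads/writes" edge="1" source="2" target="3" parent="1"/>
      </root>
    </mxGraphModel>
  </diagram>
</mxfile>
"""


def parse_drawio(xml_text):
    """Return (nodes, edges) from uncompressed draw.io XML.

    nodes maps cell id -> label; edges is a list of (source, target, label).
    """
    root = ET.fromstring(xml_text.strip())
    nodes, edges = {}, []
    for cell in root.iter("mxCell"):
        if cell.get("vertex") == "1":
            nodes[cell.get("id")] = cell.get("value", "")
        elif cell.get("edge") == "1":
            edges.append((cell.get("source"), cell.get("target"), cell.get("value", "")))
    return nodes, edges


nodes, edges = parse_drawio(DRAWIO_XML)
```

    Note what the parser gets: ids, labels, and connections. Nothing in the XML says that "PostgreSQL" is a database, which is exactly the gap the next step has to close.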

    2. Component recognition and classification

    Given a parseable diagram, the next step is identifying what each component is. A box labeled ‘PostgreSQL’ or a standard database icon is recognizable. A box with a custom label might be a service, a database, a queue, a load balancer — the system needs to determine which.

    This is where an infrastructure-aware diagram format pays off. If the diagram is drawn in a tool that has explicit node types — ‘this is a Kubernetes service, this is an RDS instance, this is an S3 bucket’ — classification is reliable. If you’re parsing a free-form diagram, you’re relying on label matching and shape heuristics, which is less reliable.

    Good component recognition produces a typed infrastructure model: Service A (type: Kubernetes Deployment), connected to Database B (type: PostgreSQL RDS), running in Environment C (type: AWS VPC).
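
    A rough sketch of that trade-off, assuming a tiny keyword table: an explicit node type is trusted, while a label match is recorded as a guess that someone should confirm. The LABEL_HINTS table, the Component fields, and the classify function are illustrative names, not part of any particular tool.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword table; a real one would be far larger.
LABEL_HINTS = {
    "postgres": "database",
    "rds": "database",
    "kafka": "queue",
    "rabbitmq": "queue",
    "nginx": "load_balancer",
}


@dataclass
class Component:
    name: str
    kind: str        # e.g. "database", "service", "unknown"
    confident: bool  # True only when the type was explicit in the diagram


def classify(label: str, explicit_type: Optional[str] = None) -> Component:
    """Prefer an explicit node type; fall back to label keywords."""
    if explicit_type:
        return Component(label, explicit_type, confident=True)
    lowered = label.lower()
    for keyword, kind in LABEL_HINTS.items():
        if keyword in lowered:
            return Component(label, kind, confident=False)
    return Component(label, "unknown", confident=False)


db = classify("PostgreSQL")                    # guessed from the label
svc = classify("billing", explicit_type="service")  # declared in the diagram
mystery = classify("Thing 7")                  # nothing to go on
```

    Keeping the confident flag on the model means the review step can list every guessed type instead of silently generating code from it.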

    3. Relationship extraction

    Connections in a diagram represent relationships. But ‘relationship’ is underspecified. Two components connected by an arrow might have a network connection, a dependency, a data flow, an API call, or a simple ‘runs inside’ relationship. The type of relationship determines what IaC is generated.

    A Service connected to a Database generates different IaC than a Service connected to a Message Queue. Getting this right requires either explicit relationship labeling in the diagram or inference from component types.

    The rule I apply: if the relationship type matters for IaC generation (and it almost always does), it should be explicitly labeled. Rely on inference only for relationships where the type is unambiguous — a container connecting to a cluster node is always a ‘runs on’ relationship.
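
    That rule can be sketched in a few lines. The UNAMBIGUOUS table and the resolve_relationship function are assumptions made up for this example; the point is only the order of precedence: explicit label first, inference for known-unambiguous pairs, and a review flag for everything else.

```python
# Hypothetical table of component-type pairs with only one sensible meaning.
UNAMBIGUOUS = {
    ("container", "cluster_node"): "runs_on",
}


def resolve_relationship(source_kind, target_kind, label=None):
    """Return (relationship, needs_review) for one diagram connection."""
    if label:
        # An explicit label in the diagram always wins.
        return label, False
    inferred = UNAMBIGUOUS.get((source_kind, target_kind))
    if inferred:
        return inferred, False
    # No label and no safe inference: do not guess, ask a human.
    return "unspecified", True


labeled = resolve_relationship("service", "database", label="reads_writes")
inferred = resolve_relationship("container", "cluster_node")
ambiguous = resolve_relationship("service", "queue")
```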

    4. Template-based code generation

    With a typed infrastructure model and explicit relationships, code generation is the most tractable part of the problem. Each component type maps to a template. The templates are parameterized by the values extracted from the diagram — name, environment, scale, connectivity.

    A Kubernetes Deployment template has fields for: name, image, replicas, resource requests and limits, environment variables, volume mounts, and health checks. Some of these come from the diagram (name, perhaps replicas if specified). Others come from defaults appropriate for the component type and environment. Some require explicit values that must be provided.

    The output is IaC that is correct in structure and partially complete in content. The engineer filling in the remaining values has a valid starting point — not a blank file.
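
    As an illustration, here is a minimal sketch of one such template using only Python's standard library. The template covers just a handful of the Deployment fields mentioned above, and the REQUIRED and DEFAULTS tables and the render_deployment function are hypothetical names chosen for this example.

```python
from string import Template

# A deliberately small Kubernetes Deployment template.
DEPLOYMENT_TEMPLATE = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $name
spec:
  replicas: $replicas
  selector:
    matchLabels:
      app: $name
  template:
    metadata:
      labels:
        app: $name
    spec:
      containers:
        - name: $name
          image: $image
""")

REQUIRED = ("name", "image")  # must come from the diagram or the engineer
DEFAULTS = {"replicas": 1}    # hypothetical fallback when the diagram is silent


def render_deployment(from_diagram, provided=None):
    """Return (manifest_text, missing_keys)."""
    values = {**DEFAULTS, **from_diagram, **(provided or {})}
    missing = [key for key in REQUIRED if key not in values]
    # safe_substitute leaves unknown placeholders (e.g. $image) in place,
    # so the gap stays visible in the generated file.
    return DEPLOYMENT_TEMPLATE.safe_substitute(values), missing


manifest, missing = render_deployment({"name": "orders", "replicas": 3})
```

    Here the diagram supplied the name and replica count, and the image is reported as missing rather than invented. That is the "valid starting point" in miniature: the structure is filled in and the open values are listed.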

    5. The defaults problem

    This is the hardest part of reliable draw-to-deploy, and the part that most implementations get wrong.

    A diagram doesn’t specify resource limits. What does the generated Terraform request? A diagram doesn’t specify backup retention. What does the generated RDS configuration set? A diagram doesn’t specify encryption. What does the generated storage configuration use?

    If the generator picks arbitrary defaults, the result is either insecure (defaulting to no encryption), over-provisioned (defaulting to large instance sizes), or under-specified (generating a skeleton that can’t actually deploy without significant manual completion).

    Sensible defaults require infrastructure knowledge baked into the generator. The defaults for a production PostgreSQL RDS instance should be different from the defaults for a development one. The defaults for infrastructure in a GDPR-regulated environment should include encryption settings that defaults for a non-regulated environment might not require.

    Context-aware defaults — defaults that depend on the environment, the regulatory context, and the component type — are what separate a useful draw-to-deploy implementation from one that generates technically valid but operationally wrong code.
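
    One way to picture context-aware defaults is as layers that are merged in order: component type first, then environment, then regulatory context. The sketch below assumes that layering; every setting name and value in it is illustrative, not a recommendation or a real provider schema.

```python
# Hypothetical default layers; later layers override earlier ones.
BASE = {
    "postgres": {
        "instance_class": "db.t3.micro",
        "backup_retention_days": 1,
        "storage_encrypted": True,
        "multi_az": False,
    },
}
BY_ENVIRONMENT = {
    ("postgres", "production"): {
        "instance_class": "db.m5.large",
        "backup_retention_days": 14,
        "multi_az": True,
    },
}
BY_REGULATION = {
    # "require_customer_managed_key" is a made-up setting name.
    ("postgres", "gdpr"): {"require_customer_managed_key": True},
}


def resolve_defaults(kind, environment, regulations=()):
    """Merge defaults for a component type, environment, and regulations."""
    merged = dict(BASE.get(kind, {}))
    merged.update(BY_ENVIRONMENT.get((kind, environment), {}))
    for regulation in regulations:
        merged.update(BY_REGULATION.get((kind, regulation), {}))
    return merged


dev = resolve_defaults("postgres", "development")
prod_gdpr = resolve_defaults("postgres", "production", ["gdpr"])
```

    The same PostgreSQL box in the diagram yields a small, single-zone instance for development and a larger, multi-zone one with stricter key handling for regulated production, without the diagram having to say so.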

    What makes it reliable.

    A draw-to-deploy workflow is reliable when:

    • The diagram format is structured and infrastructure-aware, not free-form
    • Component types are explicit, not inferred from labels
    • Relationship types are labeled, not guessed
    • Defaults are context-aware — they know what environment and regulatory context they’re generating for
    • The generated code is reviewed by an engineer before deployment: the diagram-to-IaC step fills in structure; it does not substitute for judgment

    That last point matters. Draw-to-deploy is not a push-button deploy system. It’s a system that eliminates the manual, error-prone translation layer between design and code, and produces a correct-structure, sane-defaults starting point that an engineer reviews and approves. The engineer is still in the loop — they’re just not spending their time transcribing box positions into YAML.

    What the workflow looks like in practice.

    Here’s the practical sequence with Vernix.one:

    • Draw the architecture in Vernix.one’s visual designer, or import an existing draw.io or Mermaid diagram
    • Vernix.one parses the diagram, classifies components, and extracts relationships
    • The infrastructure model is built — a structured, typed representation of what the diagram describes
    • Select the target environment (development, staging, production) and IaC format (Terraform, Pulumi, Ansible)
    • Vernix.one generates IaC using environment-appropriate defaults and your organization’s standards
    • Review the generated code — verify the defaults are right, fill in any values that require manual specification (specific secrets, custom configurations)
    • Deploy through your normal pipeline

    The time from diagram to deployable IaC is minutes, not days. The generated code is structurally correct and uses defaults appropriate for the environment. The engineer’s time is spent on review and refinement, not transcription.

    The closed loop.

    The workflow doesn’t stop at deployment. After the infrastructure is running, Vernix.one’s discovery engine scans the live environment and updates the infrastructure model. If the deployed infrastructure diverges from the diagram — a scale change, a configuration update, an emergency fix — the model reflects the divergence.

    The diagram is no longer a static document that becomes wrong the moment it’s published. It’s a view into the current state of the infrastructure model, updated from the real environment. When you look at the diagram tomorrow, it reflects what’s running tomorrow.

    This is the closed loop that makes the workflow durable rather than a one-time convenience. Design flows into deployment. Deployment flows back into the model. The model always reflects reality.
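
    The comparison at the heart of that loop can be sketched as a diff between the modeled state and the discovered state. This is a simplified illustration under my own assumptions (both sides reduced to plain dictionaries, a hypothetical diff_model function), not a description of how any discovery engine is implemented.

```python
def diff_model(modeled, discovered):
    """Compare two {component: {field: value}} mappings.

    Returns a sorted list of (component, field_or_reason, modeled, actual).
    """
    drift = []
    for name in sorted(set(modeled) | set(discovered)):
        if name not in discovered:
            drift.append((name, "missing_in_environment", None, None))
        elif name not in modeled:
            drift.append((name, "missing_in_model", None, None))
        else:
            fields = sorted(set(modeled[name]) | set(discovered[name]))
            for field in fields:
                want = modeled[name].get(field)
                have = discovered[name].get(field)
                if want != have:
                    drift.append((name, field, want, have))
    return drift


modeled = {"orders": {"replicas": 3}}
discovered = {"orders": {"replicas": 5}, "debug-proxy": {"replicas": 1}}
drift = diff_model(modeled, discovered)
```

    In this example the scan finds a scale change on one service and a component nobody drew. Feeding both back into the model is what keeps the diagram current.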

    If you want to see this with your own diagrams — bring a draw.io file, a Mermaid diagram, or even a Terraform file, and we’ll run the full workflow in a demo session.


  • The real cost of infrastructure drift — and why most teams don’t measure it

    Every organization has infrastructure drift. The production environment doesn’t quite match the staging environment. The architecture diagram in Confluence doesn’t match either of them. The Terraform files were written six months ago and nobody’s sure they’re current. The engineer who knew how the payment service connected to the database left in March.

    This is normal. It happens everywhere. Most teams manage it through a combination of institutional knowledge, careful engineers, and occasional heroics when something breaks.

    What most teams don’t do is measure what it costs. And the number, when you add it up honestly, tends to be surprising.

    What infrastructure drift actually is.

    Drift is the gap between what you believe your infrastructure to be and what it actually is. It takes several forms:

    Design-to-deployment drift

    The architecture was designed one way. It was deployed slightly differently — under pressure, with shortcuts, with pragmatic changes that seemed small at the time. Nobody updated the design. Six months later, the design document is fiction.

    Environment drift

    Production and staging diverge. A configuration change was applied to production in an incident but never backported. A new service was added to staging for testing and never cleaned up. The two environments were supposed to be mirrors. They’re not.

    Documentation drift

    The architecture diagram was accurate when it was drawn. Two hundred deployments later, it’s an archaeological artifact. Anyone using it to understand the system is working from wrong information.

    Ownership drift

    The team that built the service moved on to something else. Nobody formally transferred ownership. The service still runs. When it breaks, or when an auditor asks who owns it, the answer is a spreadsheet that hasn’t been updated since Q2.

    Why teams don’t measure the cost.

    The cost of drift is real but diffuse. It shows up as:

    • Extra hours spent on onboarding new engineers who have to reverse-engineer the real system
    • Incident investigation time spent figuring out what the infrastructure actually looks like before diagnosing the problem
    • Deployment delays caused by environment inconsistencies discovered mid-deployment
    • Architecture review cycles that take three times as long because the baseline documentation is wrong
    • Compliance preparation sprints before audits — weeks of engineering time spent reconstructing evidence that should have been continuous

    None of these show up as a line item called ‘drift cost.’ They show up as engineering time, delayed releases, and compliance scrambles. Because the cost is distributed across many activities and never attributed to drift specifically, it tends to be invisible.

    A framework for estimating your drift cost.

    Here’s a rough model. Adjust the numbers for your organization — the point is the structure, not the specific figures.

    Deployment cycle overhead

    Take your average time from architecture decision to running infrastructure. For most organizations this is 60–90 days. Now estimate how much of that time is spent on manual translation: taking the architecture design and turning it into IaC, configuration, and deployment scripts, then debugging what was lost in translation.

    If 40% of a 90-day cycle is manual translation and debugging, that’s 36 person-days per project. At an average fully-loaded engineering cost of €500–800 per day, that’s €18,000–€29,000 per project in deployment overhead alone.

    For an organization running 10 infrastructure projects per year: €180,000–€290,000 annually, just in deployment cycle overhead.
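
    The arithmetic is simple enough to put in a function so you can plug in your own numbers. The function name and parameters below are my own; the inputs in the example are the ones used in the text.

```python
def deployment_overhead(cycle_days, manual_percent, rate_low, rate_high, projects_per_year):
    """Estimate the cost of manual translation and debugging.

    Returns (manual_days, (low, high) per project, (low, high) per year).
    """
    manual_days = cycle_days * manual_percent / 100
    per_project = (manual_days * rate_low, manual_days * rate_high)
    per_year = tuple(cost * projects_per_year for cost in per_project)
    return manual_days, per_project, per_year


manual_days, per_project, per_year = deployment_overhead(
    cycle_days=90, manual_percent=40, rate_low=500, rate_high=800, projects_per_year=10
)
```

    With these inputs the function returns 36 person-days, €18,000 to €28,800 per project, and €180,000 to €288,000 per year, which the text rounds to €29,000 and €290,000 at the upper end.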

    Incident investigation overhead

    When an incident happens in a drifted environment, the first 30–60 minutes are often spent figuring out what the infrastructure looks like — before any actual diagnosis begins. How often does this happen in your environment? Once a week? Three times a month?

    Assume 20 significant incidents per year, each requiring 45 minutes of infrastructure archaeology at the start, involving an average of 3 engineers. That’s 45 hours of senior engineering time per year. At the same €500–800 day rate and an 8-hour day, that is roughly €2,800–€4,500 in direct engineering cost, not counting the cost of the extended outage itself, and it would be near-zero with a current infrastructure model.

    Compliance preparation

    How long did your last compliance audit preparation take? How many engineering weeks went into pulling together the asset inventory, the change history, the architecture documentation? This is the most visible form of drift cost because it happens on a schedule and the effort is obvious.

    A mid-sized financial institution preparing for a DORA assessment might spend 4–8 engineering weeks on documentation preparation. That is 20–40 person-days; at €500–800/day, it comes to €10,000–€32,000 per audit cycle, for evidence that should have been continuous and automatic.

    Onboarding friction

    A new infrastructure engineer joins. How long before they have an accurate picture of the real infrastructure (not the version in the wiki, not the version in someone’s memory, but what’s actually running)? Two weeks? A month?

    If you hire 4 infrastructure engineers per year and each loses 3 weeks of productive time to system archaeology, that’s 12 weeks (60 person-days) of senior engineering time, or roughly €30,000–€48,000 annually in onboarding friction directly attributable to infrastructure drift.

    Adding it up.

    Using the conservative end of each estimate, and assuming one audit cycle per year:

    • Deployment cycle overhead: €180,000/year
    • Incident investigation overhead: €2,800/year
    • Compliance preparation: €10,000/year
    • Onboarding friction: €30,000/year

    Conservative total: roughly €223,000 per year, for a mid-sized organization running 10 infrastructure projects annually.

    That’s the cost of drift that shows up in your P&L as ‘engineering time’ and ‘delayed releases.’ It doesn’t include the harder-to-measure costs: the delayed product launches, the architecture debt that accumulates when shortcuts become permanent, the compliance risk when documentation doesn’t match reality.

    The question isn’t whether infrastructure drift costs you money. It does. The question is whether you’re measuring it — and whether you’re willing to do something about it.

    What reducing drift actually looks like.

    Drift isn’t inevitable. It’s the result of a specific architectural failure: the gap between the tool used to design infrastructure and the tool used to deploy, operate, and document it.

    When design, deployment, discovery, and documentation all live in the same system — updated from the same source of truth — drift approaches zero. The architecture diagram is generated from the real infrastructure, not maintained separately from it. New environments are provisioned from the same model that describes production. Change history is continuous and automatic.

    Our first customer reduced their deployment cycle from 90 days to 10 days after implementing this approach. The reduction wasn’t magic — it was the elimination of the manual translation layer between design and deployment, and the elimination of the debugging that translation introduces.

    What to do with this.

    If you’ve never run this calculation for your organization, do it. It takes about an hour and the result is usually enough to justify a meaningful investment in infrastructure automation and governance tooling.

    If you’d like to see what the numbers look like for your specific environment — based on your infrastructure, your team size, and your compliance obligations — we can work through it together in a demo.


  • What DORA actually requires from your IT infrastructure

    DORA, the Digital Operational Resilience Act, has applied across the EU since 17 January 2025. If you work at a bank, insurance company, investment firm, payment processor, or any of the other financial entities it covers, you are now legally required to comply.

    The problem is that DORA is a regulation written by lawyers, for lawyers. The actual text runs to dozens of articles, each referencing others, with definitions that require their own definitions. Reading it and knowing what you need to do to your infrastructure are two very different things.

    This post cuts through it. Here’s what DORA actually requires from your IT infrastructure — organized by what you need to have, what you need to document, and what you need to prove.

    First: who does DORA apply to?

    DORA covers ‘financial entities’ as defined in Article 2. The list is long but the main categories are:

    • Credit institutions (banks)
    • Payment institutions
    • Electronic money institutions
    • Investment firms
    • Crypto-asset service providers
    • Insurance and reinsurance undertakings
    • Occupational pension funds
    • Credit rating agencies
    • Data reporting service providers

    The regulation also applies to ICT third-party service providers that serve these entities — cloud providers, software vendors, managed service providers — though their obligations differ.

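    If you’re unsure whether your organization is covered, assume it is and verify with legal counsel. The regulation leaves administrative penalties for financial entities to the member states, so the amounts depend on national law; commonly cited figures reach €10 million or 2% of total annual worldwide turnover.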

    What DORA requires — broken down by infrastructure topic.

    1. ICT risk management framework (Articles 5–16)

    You need a documented ICT risk management framework. This isn’t a policy document — it requires actual implementation evidence.

    Specifically, your framework must include:

    • An up-to-date map of all ICT systems, assets, and their interdependencies
    • Classification of systems by criticality
    • Identification of all dependencies on third-party ICT providers
    • Documented processes for identifying, assessing, and managing ICT risks

    The phrase ‘up-to-date map of all ICT systems’ is doing a lot of work here. It means a continuously maintained inventory — not a spreadsheet that someone updates before audits. DORA assessors will ask when it was last verified, how it’s kept current, and what process updates it when infrastructure changes.
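
    A small sketch of what "how it’s kept current" can look like in practice: every asset carries a last-verified date, and anything older than a threshold is flagged. The 30-day threshold, the inventory shape, and the stale_assets function are assumptions for illustration; DORA does not prescribe a specific interval.

```python
from datetime import date, timedelta


def stale_assets(inventory, today, max_age_days=30):
    """Return names of assets not verified within max_age_days.

    inventory maps asset name -> date of last verification.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, verified in inventory.items() if verified < cutoff)


# Hypothetical inventory entries.
inventory = {
    "payments-api": date(2025, 5, 28),
    "orders-db": date(2025, 2, 10),
    "legacy-batch": date(2024, 11, 3),
}
stale = stale_assets(inventory, today=date(2025, 6, 1))
```

    A check like this answers the "when was it last verified" question per asset instead of per spreadsheet.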

    2. ICT-related incident management (Articles 17–23)

    You need to be able to detect, classify, and respond to ICT incidents. Infrastructure requirements here include:

    • Monitoring capability across critical systems — you need to know when something goes wrong, fast
    • Documented incident response procedures with clear escalation paths
    • The ability to reconstruct what your infrastructure looked like at the time of an incident
    • Logging that supports root-cause analysis

    The last two points are often missed. DORA requires you to be able to explain what happened and why. That requires a change history: a record of what your infrastructure looked like before, during, and after an incident.
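
    To show why a change history is enough to answer "what did it look like at the time", here is a minimal sketch that replays recorded changes up to a given moment. The Change record and the state_at function are hypothetical; a real system would also need to handle deletions and concurrent changes.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class Change:
    at: datetime
    actor: str
    component: str
    field: str
    before: Any
    after: Any


def state_at(changes, moment):
    """Replay changes up to and including `moment`, oldest first."""
    state = {}
    for change in sorted(changes, key=lambda c: c.at):
        if change.at > moment:
            break
        state.setdefault(change.component, {})[change.field] = change.after
    return state


# Hypothetical history for one database.
history = [
    Change(datetime(2025, 3, 1, 9, 0), "alice", "orders-db", "instance_class", None, "db.m5.large"),
    Change(datetime(2025, 3, 2, 14, 30), "bob", "orders-db", "instance_class", "db.m5.large", "db.m5.xlarge"),
]
during_incident = state_at(history, datetime(2025, 3, 2, 12, 0))
```

    Asking for the state at noon on 2 March returns the configuration before the afternoon resize, along with a record of who made each change.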

    3. Digital operational resilience testing (Articles 24–27)

    You need to regularly test your ICT systems for resilience. For most financial entities this means:

    • Basic testing: vulnerability assessments and network security testing
    • Advanced testing (for significant entities): Threat-Led Penetration Testing (TLPT)

    The infrastructure requirement here is documentation — evidence that tests were conducted, what was tested, what was found, and what was remediated. This evidence needs to be traceable back to specific infrastructure components.

    4. Third-party ICT risk (Articles 28–44)

    This is the part that catches organizations off guard. DORA requires you to manage the ICT risk of your third-party providers — not just accept their security certifications and move on.

    The key requirements:

    • Maintain a register of all ICT third-party service providers
    • Classify providers by criticality
    • Conduct due diligence before engagement and ongoing monitoring
    • Include specific contractual provisions in agreements with critical providers

    Article 30 is particularly important for infrastructure teams. It requires that contracts with third-party ICT providers include provisions for exit — the ability to transition services away from the provider. This applies to any tool or platform that touches your critical infrastructure, including governance and monitoring tools.
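
    A register becomes useful once you can query it. The sketch below assumes a very small record per provider and flags critical providers whose exit plan is missing or has never been exercised. The field names and the exit_gaps function are my own simplification, not the register format regulators expect.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    critical: bool
    has_exit_plan: bool
    exit_plan_tested: bool = False


def exit_gaps(register):
    """Names of critical providers with a missing or untested exit plan."""
    return [
        p.name
        for p in register
        if p.critical and not (p.has_exit_plan and p.exit_plan_tested)
    ]


# Hypothetical register entries.
register = [
    Provider("cloud-platform", critical=True, has_exit_plan=True, exit_plan_tested=True),
    Provider("monitoring-saas", critical=True, has_exit_plan=True),
    Provider("iac-tooling", critical=True, has_exit_plan=False),
    Provider("team-chat", critical=False, has_exit_plan=False),
]
gaps = exit_gaps(register)
```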

    5. Information and intelligence sharing (Article 45)

    DORA encourages (and for significant entities, may require) participation in information sharing about cyber threats. The infrastructure implication is that you need to be able to extract and share relevant threat intelligence from your environment without exposing sensitive operational data.

    The documentation DORA assessors will ask for.

    When a DORA assessment happens — either internal audit or regulatory inspection — here’s the documentation that will be requested:

    • ICT asset inventory — complete, current, and with a process for keeping it current
    • System criticality classification — which systems are critical, how that was determined
    • Change history — evidence that infrastructure changes are tracked and reviewed
    • Third-party provider register — all providers, criticality classification, contractual status
    • Incident records — classification, response timeline, infrastructure impact
    • Testing evidence — what was tested, when, by whom, findings, and remediation
    • Exit plans — documented capability to exit critical third-party arrangements

    The common thread: you need to be able to produce this documentation quickly, for any point in time, and have it reflect what actually happened — not what was planned.

    The most common DORA gaps in infrastructure teams.

    The asset inventory is manual and out of date

    Every infrastructure team has some form of asset inventory. Almost none of them are current. The moment a new service is deployed, a container is scaled, or a configuration changes, the inventory is wrong. DORA requires a process for keeping it current — not a best-effort spreadsheet.

    The change history is in Git, Jira, and memory

    Most teams can tell you what changed if you ask the right person. They can’t always tell you what changed across all environments, across all teams, for a specific time window, in a format that maps to specific ICT systems. DORA needs that mapping.

    Third-party dependencies aren’t fully mapped

    Cloud providers are obvious. But what about the IaC tools, the monitoring platforms, the compliance scanning tools? The vendor behind any software or service that touches your infrastructure may count as a third-party ICT provider under DORA. Most teams haven’t mapped all of them.

    Exit capability is theoretical, not demonstrated

    Most organizations can tell you they could exit a provider ‘in principle.’ DORA requires that exit capability be documented and demonstrated. For many tools — especially those deeply integrated with infrastructure management — exit is more complex than assumed.

    What a DORA-ready infrastructure looks like.

    A DORA-compliant infrastructure is one where:

    • Every component is discovered, classified, and documented automatically — not manually
    • Every change is tracked with a timestamp, an actor, and a before/after state
    • Compliance checks run continuously, not just before assessments
    • Reports can be generated for any historical state, not just the current one
    • All tooling is self-hosted or has documented, exercisable exit capability

    None of these are aspirational. DORA assessors will expect evidence, not intent.

    The good news: if you have continuous infrastructure discovery, versioned change tracking, and automated compliance checking, most of the evidence DORA requires is already there.

    What to do next.

    If you’re an IT leader at an EU financial institution, the first step is understanding your current gap. Specifically:

    • Do you have a current, complete ICT asset inventory? How is it maintained?
    • Do you have a change history that maps changes to specific ICT components?
    • Have you mapped all third-party ICT dependencies, including tooling?
    • Do you have documented exit capability for critical providers?

    These aren’t trick questions. They’re the questions a DORA assessor will ask. If any of them give you pause, that’s where to start.

    Vernix.one was built specifically for this problem. It discovers infrastructure automatically, maintains a versioned history of every change, runs continuous compliance checks against DORA requirements, and is self-hosted — so it satisfies Article 30’s exit requirement by design. We can show you exactly what your DORA compliance status looks like today, against your real infrastructure, in a single session.
