DevOps & Automation

OpenTofu at Scale: What Actually Changes When You Cut the HashiCorp Cord

1 June 2026·9 min read·Kineticor Team

Nearly three years after HashiCorp moved Terraform to the Business Source Licence, OpenTofu has stopped being a protest fork and started being a serious operational choice. The Linux Foundation project ships on its own cadence, state encryption is in, provider iteration with for_each is in, and most of the major CI providers and registries treat it as a first-class citizen. We are now seeing OpenTofu in production at UK enterprises that, eighteen months ago, would have laughed the idea out of the room.

What we are not seeing is anyone admitting how messy the migration actually is. Vendor blog posts make it sound as if you are one brew install opentofu away from done. In the field, on real estates with three hundred state files, eight provider versions, and a CI system that nobody fully owns, it is a project, and a programme decision rather than a tooling decision. This post is the version we wish every CTO had read before kicking one off.

The licence is the trigger, not the reason

Most teams move because their procurement function flagged the BSL exposure during a renewal review, or because HashiCorp's pricing for Terraform Enterprise made the FinOps case obvious once the seat count crossed a threshold. Those are valid triggers. They are not, on their own, an engineering reason to migrate.

The engineering reasons we actually see hold up are these. First, the velocity of meaningful features in OpenTofu (provider-defined functions, early static evaluation of variables and locals, provider iteration with for_each, native state encryption) outpaced upstream Terraform for a stretch. Second, the registry split lets you pin to a community provider when an upstream one is delayed or quietly broken. Third, the governance model is no longer a single vendor's roadmap, which matters if you are running multi-decade systems in regulated industries.

If none of those land for your estate, do not move. A migration you cannot justify in engineering terms will not get the attention it needs once the novelty wears off, and you will end up with two tools instead of one.

What actually breaks on day one

The HCL surface is largely compatible, and the binary is a drop-in for most modules written before Terraform 1.6. The breakage is rarely in the language. It is in everything around it.

The first thing that bites is provider sourcing. OpenTofu defaults to its own registry at registry.opentofu.org. If your modules declare hashicorp/aws as the source string, that still resolves — but if you have an internal mirror configured as a network mirror or filesystem mirror, you will discover that your .terraformrc equivalent is now ~/.tofurc, and your CI image probably does not ship it. We have watched a pipeline silently fall back to the public registry from a runner that was supposed to be air-gapped. That is the kind of finding that ends up in an audit report.
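A pre-flight check on the runner is one way to catch the silent fallback described above. The following is a minimal sketch in Python, assuming the CLI configuration uses the documented provider_installation block with network_mirror, filesystem_mirror and direct methods. The function name mirror_only and the sample mirror URL are hypothetical, and the check is a deliberately strict text heuristic, not an HCL parser: any direct block, even one with an exclude list, fails it.

```python
import re


def mirror_only(cli_config_text: str) -> bool:
    """Return True if the CLI config installs providers only from a mirror.

    Heuristic: requires a provider_installation block that declares a
    mirror method and contains no `direct` method at all.
    """
    # Drop whole-line comments so commented-out blocks are not counted.
    text = re.sub(r"(?m)^\s*(#|//).*$", "", cli_config_text)
    if "provider_installation" not in text:
        # No explicit block means the default installation behaviour applies.
        return False
    has_mirror = re.search(r"\b(network_mirror|filesystem_mirror)\s*\{", text) is not None
    has_direct = re.search(r"\bdirect\s*\{", text) is not None
    return has_mirror and not has_direct


# Hypothetical config contents, for illustration only.
LOCKED_DOWN = """
provider_installation {
  network_mirror {
    url = "https://mirror.example.internal/providers/"
  }
}
"""

WITH_FALLBACK = """
provider_installation {
  network_mirror {
    url = "https://mirror.example.internal/providers/"
  }
  direct {}
}
"""

if __name__ == "__main__":
    print(mirror_only(LOCKED_DOWN))    # mirror only
    print(mirror_only(WITH_FALLBACK))  # direct fallback present
    print(mirror_only(""))             # config missing or empty
```

Run against the contents of whatever config file the CI image actually ships, a check like this fails the job early when the file is missing or permits direct installation.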

The second is state file backends. The S3 backend works. The Terraform Cloud / HCP backend obviously does not. Teams that adopted HCP for the workspace UX rather than the runs themselves are in for a rethink — Spacelift, env0, Scalr and Terramate all want that seat, and they are not equivalent. Choose deliberately; this is the part of the stack your engineers will live in every day, not the binary.

The third is module sources. Private module registries, particularly anything fronted by Terraform Cloud, need their own migration. Git-sourced modules are unaffected, which is one more reason to standardise on them anyway.
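To size this part of the work, it helps to inventory module sources before the first wave. Below is a rough sketch, assuming private registry modules use the app.terraform.io/ORG/NAME/PROVIDER source form and that a regular expression over source = "..." lines is good enough for a first pass. The function names and category labels are ours, not part of any tool, and provider source entries in required_providers will also be picked up and land in the "other" bucket.

```python
import re

# Matches `source = "..."` assignments; a text heuristic, not an HCL parser.
SOURCE_RE = re.compile(r'^\s*source\s*=\s*"([^"]+)"', re.MULTILINE)


def classify_source(source: str) -> str:
    """Bucket a source string by how much migration work it implies."""
    if source.startswith(("./", "../")):
        return "local"
    if source.startswith(("git::", "git@", "github.com/")):
        return "git"
    if source.startswith("app.terraform.io/"):
        return "hcp-private-registry"
    return "other"


def summarise_sources(hcl_text: str) -> dict[str, list[str]]:
    """Group every source string found in the text by category."""
    result: dict[str, list[str]] = {}
    for src in SOURCE_RE.findall(hcl_text):
        result.setdefault(classify_source(src), []).append(src)
    return result


# Hypothetical module declarations, for illustration only.
SAMPLE = '''
module "vpc" {
  source = "git::https://example.com/platform/vpc.git?ref=v1.2.0"
}
module "cluster" {
  source  = "app.terraform.io/example-org/cluster/aws"
  version = "2.0.0"
}
module "naming" {
  source = "./modules/naming"
}
'''

if __name__ == "__main__":
    for kind, sources in summarise_sources(SAMPLE).items():
        print(kind, sources)
```

Anything in the hcp-private-registry bucket needs a new home before cutover; the git and local buckets should need no change.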

Plan the cutover by blast radius, not by inventory

The instinct is to count state files and build a Gantt chart. Resist it. The right unit of work is the blast radius of a single state — what it owns, who depends on it, what breaks if a plan goes wrong at three in the morning.

Sort your estate into three tiers. Tier one is everything underneath the Landing Zone: the Control Tower customisations, the SCP module, the central logging account, the audit trails. These move last, because the blast radius is the whole estate. Tier two is shared platform: networking, transit gateways, shared services accounts, the IAM Identity Center permission sets. Tier three is application workloads: usually most of the state file count, but the smallest individual blast radius.
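The tiering can be captured as data rather than as a chart. Here is a minimal sketch, assuming each state is tagged with its tier and a rough count of the other states that depend on it. The StateUnit type, its fields and the example names are hypothetical, and the dependant count is only one possible proxy for blast radius.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateUnit:
    name: str
    tier: int        # 1 = landing zone, 2 = shared platform, 3 = workloads
    dependants: int  # how many other states consume this one's outputs


def cutover_order(states: list[StateUnit]) -> list[StateUnit]:
    """Order states so the smallest blast radius moves first.

    Tier three goes first and tier one last; within a tier, states with
    fewer dependants go earlier. The name is a tie-breaker for stable output.
    """
    return sorted(states, key=lambda s: (-s.tier, s.dependants, s.name))


# Hypothetical estate, for illustration only.
ESTATE = [
    StateUnit("scp-baseline", tier=1, dependants=40),
    StateUnit("transit-gateway", tier=2, dependants=12),
    StateUnit("orders-api", tier=3, dependants=0),
    StateUnit("billing-api", tier=3, dependants=1),
]

if __name__ == "__main__":
    for unit in cutover_order(ESTATE):
        print(unit.tier, unit.name)
```

Slicing the ordered list into fixed-size batches gives the waves; the point is that the order comes from blast radius, not from where a state happens to sit in an inventory spreadsheet.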

Run the tier-three migration first, in waves, against a non-production copy of each workload's state. The goal of the first wave is not to be done — it is to find the actual provider version skew, the modules with old terraform blocks that pinned a version range you forgot about, and the bits of glue code in your CI that quietly assumed the binary was called terraform. By the time you reach tier one, the team has worked through the boring failures on workloads they can re-create.
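The forgotten version pins are easy to surface ahead of the first wave. The sketch below assumes that pins live in required_version strings and that any constraint with an upper bound (an exact version, ~>, < or <=) deserves a human look, since the binary evaluates the constraint against its own version number. The helper names are hypothetical and the parsing is a heuristic: a flagged constraint is a prompt for review, not proof of a failure.

```python
import re

REQUIRED_VERSION_RE = re.compile(r'required_version\s*=\s*"([^"]+)"')
# Optional operator followed by the first digit of a version number.
PART_RE = re.compile(r"\s*(~>|>=|<=|!=|>|<|=)?\s*\d")


def has_upper_bound(constraint: str) -> bool:
    """True if any comma-separated part caps the acceptable version."""
    for part in constraint.split(","):
        match = PART_RE.match(part)
        if match and match.group(1) in (None, "=", "~>", "<", "<="):
            return True
    return False


def find_pins(hcl_text: str) -> list[tuple[str, bool]]:
    """Return (constraint, needs_review) for every required_version found."""
    return [(c, has_upper_bound(c)) for c in REQUIRED_VERSION_RE.findall(hcl_text)]


# Hypothetical module header, for illustration only.
SAMPLE = '''
terraform {
  required_version = ">= 1.3, < 1.6"
}
'''

if __name__ == "__main__":
    for constraint, needs_review in find_pins(SAMPLE):
        print(constraint, "REVIEW" if needs_review else "ok")
```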

The state file itself is the risk, not the binary

The single highest-impact mistake we see is treating state file conversion as a copy operation. The format is compatible — but every state file change is a chance to corrupt a production estate.

Three controls actually matter. First, take a manual aws s3 cp snapshot of every state file before any cutover, into a separate bucket with object lock and a thirty-day retention. Do not rely on S3 versioning alone: the bucket's policy and versioning configuration can be edited, and old versions can be deleted by anyone with the right permissions. Second, run tofu plan against the migrated state before tofu apply, and review the plan diff against a saved terraform plan from the same commit. If the diff is anything other than empty or trivially noisy, stop and investigate. Third, keep both binaries available on the runner for the duration of the cutover window so you can roll back without a rebuild.
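The plan comparison can be made mechanical. This is a minimal sketch, assuming both plans have been exported with the show -json subcommand of each binary and that comparing the planned actions per resource address is a sufficient first filter. It reads the resource_changes list from the JSON plan representation; the function names are ours, and it does not compare attribute-level values, so an empty result narrows the review but does not replace it.

```python
import json


def planned_actions(plan_json: str) -> dict[str, tuple[str, ...]]:
    """Map resource address to planned actions, ignoring no-op entries."""
    plan = json.loads(plan_json)
    return {
        rc["address"]: tuple(rc["change"]["actions"])
        for rc in plan.get("resource_changes", [])
        if rc["change"]["actions"] != ["no-op"]
    }


def plan_drift(terraform_plan_json: str, tofu_plan_json: str) -> dict:
    """Return addresses whose planned actions differ between the two plans.

    Each value is a (terraform_actions, tofu_actions) pair; None means the
    address had no planned change on that side.
    """
    before = planned_actions(terraform_plan_json)
    after = planned_actions(tofu_plan_json)
    return {
        addr: (before.get(addr), after.get(addr))
        for addr in sorted(before.keys() | after.keys())
        if before.get(addr) != after.get(addr)
    }


# Hypothetical plan fragments, for illustration only.
TF_PLAN = json.dumps({"resource_changes": [
    {"address": "aws_s3_bucket.logs", "change": {"actions": ["no-op"]}},
    {"address": "aws_instance.web", "change": {"actions": ["update"]}},
]})
TOFU_PLAN = json.dumps({"resource_changes": [
    {"address": "aws_s3_bucket.logs", "change": {"actions": ["update"]}},
    {"address": "aws_instance.web", "change": {"actions": ["update"]}},
]})

if __name__ == "__main__":
    drift = plan_drift(TF_PLAN, TOFU_PLAN)
    print(drift or "no action-level drift")
```

A non-empty result is the "stop and investigate" signal; wiring it into the pipeline as a failing step keeps the decision out of a tired reviewer's hands.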

CI and policy-as-code are where the project either lands or stalls

The actual delivery work is rarely in the modules. It is in the pipeline. setup-terraform in GitHub Actions, the Atlantis configuration, the Sentinel policies, the Checkov rules that hard-coded the binary name in their config — every one of those needs a deliberate decision.
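Finding those hard-coded assumptions is mostly a search problem. The sketch below scans pipeline text for direct terraform invocations and for the hashicorp/setup-terraform action. The subcommand list, the function name and the sample workflow are illustrative assumptions, and wrapper scripts or variables holding the binary name will slip past a pattern match like this.

```python
import re

# A bare `terraform <subcommand>` call, not preceded by a word character or hyphen.
BINARY_CALL = re.compile(
    r"(?<![\w-])terraform\s+"
    r"(init|plan|apply|destroy|validate|fmt|show|output|import|state|workspace)\b"
)
SETUP_ACTION = re.compile(r"hashicorp/setup-terraform")


def find_hardcoded(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for every line that assumes the old binary."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if BINARY_CALL.search(line) or SETUP_ACTION.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical workflow fragment, for illustration only.
WORKFLOW = "\n".join([
    "steps:",
    "  - uses: hashicorp/setup-terraform@v3",
    "  - run: terraform init -input=false",
    "  - run: tofu plan -out=tfplan",
    "  - run: echo \"terraform is great\"",
])

if __name__ == "__main__":
    for number, line in find_hardcoded(WORKFLOW):
        print(f"{number}: {line}")
```

Each hit is a decision to record, not just a string to replace: some callers should switch binaries, some should take the binary name from configuration, and some should be retired.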

If you were using Sentinel, you are now choosing between OPA, Conftest, or one of the commercial OpenTofu-aware policy engines. None of them are drop-in. Plan a month of policy rewriting and a careful regression suite, because the policy layer is what stops a junior engineer from destroying a production VPC at half past four on a Friday. We have seen teams treat this as a backlog item and pay for it three sprints later.
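Whichever engine you choose, the regression suite has the same shape: a table of plan fixtures paired with the decision the old policy set would have made. The following is a sketch of that shape in Python, with a stand-in rule (refuse any plan that deletes an aws_vpc) in place of a real policy engine. The rule, the fixtures and the function names are all hypothetical; in practice the decision would come from evaluating your rewritten policies against the same fixtures.

```python
def deny_reasons(plan: dict) -> list[str]:
    """Stand-in policy: refuse any plan that deletes an aws_vpc."""
    return [
        f"{rc['address']}: deleting a VPC is not allowed"
        for rc in plan.get("resource_changes", [])
        if rc.get("type") == "aws_vpc" and "delete" in rc["change"]["actions"]
    ]


def vpc_change(actions: list[str]) -> dict:
    """Build a one-resource plan fixture in the JSON plan shape."""
    return {"resource_changes": [
        {"address": "aws_vpc.main", "type": "aws_vpc", "change": {"actions": actions}},
    ]}


# (case name, plan fixture, should the plan be denied?)
CASES = [
    ("create only", vpc_change(["create"]), False),
    ("in-place update", vpc_change(["update"]), False),
    ("destroy", vpc_change(["delete"]), True),
    ("replace", vpc_change(["delete", "create"]), True),
    ("empty plan", {}, False),
]


def run_regression(cases) -> list[str]:
    """Return the names of cases where the decision differs from the expectation."""
    return [
        name
        for name, plan, expect_denied in cases
        if bool(deny_reasons(plan)) != expect_denied
    ]


if __name__ == "__main__":
    failures = run_regression(CASES)
    print("all cases pass" if not failures else f"regressions: {failures}")
```

The value is in the table, not the harness: every case you capture from the old policy set is one behaviour you can prove the new one still enforces.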

How Kineticor Can Help

We have led OpenTofu migrations on multi-account AWS estates for regulated UK organisations — discovery, blast-radius mapping, CI cutover, policy rewrite, and the boring state-file hygiene that decides whether the project ships or stalls. We are senior engineers who have run these programmes end-to-end; you get the same people scoping the work and writing the modules. If a Terraform-to-OpenTofu migration is on your roadmap for the next two quarters, or you are weighing the licence exposure during a renewal, get in touch — we will give you an honest read on whether it is worth doing now, or whether your team is better off staying put for another cycle.

— Danish


Danish Muhammad

Founder, Kineticor

I help businesses achieve their vision by making the cloud work for them: efficiently, securely, and at scale. Beyond technical solutions, I focus on solving real-world challenges, aligning cloud strategy with business goals, and building high-performing teams. My background is technical delivery; my passion is solving people problems.