AI-Native Engineering

Stuck at Level 2 AI? Your Engineering Platform Is the Bottleneck

21 April 2026·8 min read·Kineticor Team
Almost every enterprise engineering organisation I have walked into this year has the same shape of problem. They rolled out GitHub Copilot or Amazon Q Developer eighteen months ago, measured seat adoption, declared a win, and then... nothing much changed. Cycle time is roughly where it was. The same programmes are still late for the same reasons. Engineers are producing snippets slightly faster inside the same broken delivery pipeline.

McKinsey's April 2026 note on AI-native engineering puts language around this. Level 2 is AI autocomplete — individual developer productivity, inline suggestions, the thing most teams already have. Level 3 is agentic workflows — agents executing multi-step tasks against real systems. Level 4 is agent factories — fleets of agents owned by product teams, orchestrated by humans, producing 20x the throughput of a traditional squad. The note argues, correctly in my view, that most organisations are stuck at Level 2 and that the jump to Level 3 is not a model problem or a tooling problem. It is a platform problem.

The Platform Is the Prerequisite

Agents that write code are not useful. Agents that ship code are useful. The difference is everything sitting between a prompt and production: source control conventions, CI, test harnesses, preview environments, infrastructure provisioning, secrets, observability, incident response, rollback. An agent operating without that scaffolding is a liability. An agent operating inside a well-designed internal developer platform is a force multiplier.

This is why the teams already running Level 3 workloads almost all had a mature platform engineering practice in place first. Backstage serves as the agent's orientation layer: service catalogue, ownership, API schemas, runbooks. Crossplane or similar is the provisioning surface: the agent requests a resource against a golden path, not a raw CloudFormation template. Terraform modules or CDK constructs are versioned and linted. There is an eval harness. There is a sandbox account with SCPs that deny destructive actions outright. Without this, every agent run is a spin of the roulette wheel.

Failure Modes We See Repeatedly

Teams attempting Level 3 without the platform underneath tend to fail in predictable ways.

Snowflake infrastructure at agent scale. An agent asked to "set up a new service" will invent a VPC, an RDS instance, and an IAM role from scratch every time. Multiply that by 50 engineers and you have unmanageable drift inside a quarter. The fix is not a better prompt. It is a Crossplane composition or a Backstage scaffolder template that the agent is forced to call, with the raw cloud APIs locked down by SCPs.
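
As a rough illustration of "raw cloud APIs locked down by SCPs", the sketch below builds a deny policy for the three create calls mentioned above, exempting a single provisioning role. The role name `platform-provisioner` and the helper `build_golden_path_scp` are assumptions made for the example. The action names and the `aws:PrincipalArn` condition key are standard IAM, but a real policy would cover far more actions and needs testing in a sandbox organisation before rollout.

```python
import json

# Hypothetical role: the only principal allowed to call the raw create APIs,
# for example the role a Crossplane AWS provider assumes.
PROVISIONER_ROLE_PATTERN = "arn:aws:iam::*:role/platform-provisioner"

# IAM action names for the three resources the paragraph mentions.
RAW_CREATE_ACTIONS = ["ec2:CreateVpc", "rds:CreateDBInstance", "iam:CreateRole"]


def build_golden_path_scp(actions, allowed_role_pattern):
    """Return an SCP document that denies `actions` to every principal
    except roles whose ARN matches `allowed_role_pattern`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRawProvisioningOutsideGoldenPath",
                "Effect": "Deny",
                "Action": sorted(actions),
                "Resource": "*",
                "Condition": {
                    "ArnNotLike": {"aws:PrincipalArn": allowed_role_pattern}
                },
            }
        ],
    }


scp = build_golden_path_scp(RAW_CREATE_ACTIONS, PROVISIONER_ROLE_PATTERN)
print(json.dumps(scp, indent=2))
```

Attached to the agent's account or OU, a policy of this shape means the agent's own role gets an explicit deny on the raw calls, so the only route to a database is through whatever the provisioning role exposes.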

No eval, no trust. Teams run an agent over a change, merge the output, and discover three weeks later that a subtle regression has been in production the whole time. You need the same rigour you apply to human code: contract tests, integration tests, preview environments, staged rollouts. If your pipeline cannot catch a bad change from a human, it will not catch a bad change from an agent either.
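
A minimal sketch of what an eval gate can look like, assuming the agent can be called as a plain function from prompt to output. `EvalTask`, `run_evals`, `gate`, the two tasks and `stub_agent` are all hypothetical names for illustration. A real harness would run against preview environments and real test suites rather than string checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalTask:
    task_id: str                  # stable ID so results can be compared across runs
    prompt: str                   # what the agent is asked to do
    check: Callable[[str], bool]  # returns True if the output is acceptable


def run_evals(agent: Callable[[str], str], tasks: List[EvalTask]) -> Dict[str, bool]:
    """Run every task through the agent and record pass/fail per task."""
    results = {}
    for task in tasks:
        try:
            results[task.task_id] = bool(task.check(agent(task.prompt)))
        except Exception:
            # A crash counts as a failure, not as a skipped task.
            results[task.task_id] = False
    return results


def gate(results: Dict[str, bool], min_pass_rate: float) -> bool:
    """Return True only if the pass rate meets the threshold."""
    if not results:
        return False  # no evidence is not a pass
    return sum(results.values()) / len(results) >= min_pass_rate


# Hypothetical task suite, kept in version control next to the agent config.
TASKS = [
    EvalTask("bump-version", "Bump version 1.2.3 to the next patch",
             lambda out: "1.2.4" in out),
    EvalTask("no-secrets", "Write a config loader",
             lambda out: "AKIA" not in out),
]


def stub_agent(prompt: str) -> str:
    """Stand-in for a real agent call; always returns the same text."""
    return "version = 1.2.4"


results = run_evals(stub_agent, TASKS)
print(results, "merge allowed:", gate(results, min_pass_rate=1.0))
```

The point of the gate is that it runs on every agent or model change, in the same pipeline that already gates human changes.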

Identity and audit gaps. Agents are being handed long-lived IAM access keys, stored in plaintext in a config file somewhere, with permissions an engineer would never be granted directly. When the audit question comes — who changed this, and why — there is no answer. Agents need proper identity (IAM Identity Center, scoped roles assumed via STS), short-lived credentials, and every action in CloudTrail tagged with the agent's session.
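
To make the short-lived credential point concrete, here is a sketch that builds the parameters for an STS `AssumeRole` call, with the session name and session tags carrying the agent and task identity. The helper name, the `agent-id` and `task-id` tag keys, and the example ARN are assumptions. The parameter names themselves (`RoleArn`, `RoleSessionName`, `DurationSeconds`, `Tags`) are the ones STS accepts. The call itself is left as a comment so the example runs without AWS access.

```python
import re

MIN_DURATION_SECONDS = 900  # STS rejects AssumeRole durations below 15 minutes
# STS session names: 2-64 characters from this set.
_SESSION_NAME_OK = re.compile(r"^[\w+=,.@-]{2,64}$", re.ASCII)


def build_assume_role_params(role_arn, agent_id, task_id, duration_seconds=900):
    """Build AssumeRole parameters that tie a short-lived session to one agent task."""
    session_name = f"agent-{agent_id}-{task_id}"
    if not _SESSION_NAME_OK.match(session_name):
        raise ValueError(f"invalid session name: {session_name!r}")
    if duration_seconds < MIN_DURATION_SECONDS:
        raise ValueError("duration_seconds must be at least 900")
    return {
        "RoleArn": role_arn,
        # The session name appears in the assumed-role ARN that CloudTrail records.
        "RoleSessionName": session_name,
        "DurationSeconds": duration_seconds,
        # Session tags need sts:TagSession allowed in the role's trust policy.
        "Tags": [
            {"Key": "agent-id", "Value": agent_id},
            {"Key": "task-id", "Value": task_id},
        ],
    }


params = build_assume_role_params(
    "arn:aws:iam::123456789012:role/agent-deployer",  # hypothetical role
    agent_id="codegen-1",
    task_id="task-42",
)
print(params["RoleSessionName"])

# With boto3 installed and credentials available, the call would be:
#   import boto3
#   creds = boto3.client("sts").assume_role(**params)["Credentials"]
```

Because every session name encodes the agent and the task, "who changed this, and why" becomes a CloudTrail query rather than an investigation.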

Runaway token and compute spend. An agent in a retry loop against Bedrock can spend four figures an hour. We have seen it. Without budget guardrails (Bedrock throttling, per-agent cost allocation tags, circuit breakers at the MCP server layer) you find out on the monthly invoice.
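
A circuit breaker for spend does not need to be sophisticated. The sketch below is one possible shape: a rolling-window budget that the tool layer checks before each model call. `SpendCircuitBreaker` and its parameters are hypothetical, and the per-call costs are made up. In practice the cost figure would come from token counts and your own pricing table.

```python
import time
from collections import deque


class SpendCircuitBreaker:
    """Refuses new calls once spend inside a rolling window would exceed a budget."""

    def __init__(self, budget_usd, window_seconds, clock=time.monotonic):
        self.budget_usd = budget_usd
        self.window_seconds = window_seconds
        self._clock = clock          # injectable so the breaker can be tested
        self._events = deque()       # (timestamp, cost_usd), oldest first

    def spent_in_window(self):
        """Drop events older than the window and return the remaining total."""
        cutoff = self._clock() - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        return sum(cost for _, cost in self._events)

    def allow(self, estimated_cost_usd):
        """True if a call with this estimated cost still fits in the budget."""
        return self.spent_in_window() + estimated_cost_usd <= self.budget_usd

    def record(self, cost_usd):
        """Record the actual cost of a completed call."""
        self._events.append((self._clock(), cost_usd))


# Simulate an agent stuck in a retry loop, using a fake clock.
now = [0.0]
breaker = SpendCircuitBreaker(budget_usd=10.0, window_seconds=3600, clock=lambda: now[0])

calls_made = 0
for _ in range(100):              # the agent would retry 100 times
    if not breaker.allow(3.0):    # each retry is estimated at 3 USD
        break                     # breaker is open: stop and alert a human
    breaker.record(3.0)
    calls_made += 1

print("calls made before the breaker opened:", calls_made)
```

The injected clock is a deliberate choice: it lets the window logic be tested without sleeping, which matters if the breaker itself is going to sit in the eval suite.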

What Level 3 Actually Looks Like on AWS

The concrete stack we are building with clients who are serious about this:

  • Amazon Bedrock AgentCore for agent orchestration, with Claude or Nova as the primary model and a clear routing policy for simpler tasks.
  • MCP servers exposing the platform's capabilities — service catalogue reads, infra provisioning requests, deployment triggers, log queries — as tools with typed schemas, not freeform shell access.
  • Backstage as the context surface. The agent reads ownership, dependencies, and runbooks from the same place a human engineer would.
  • Crossplane compositions as the only path to new infrastructure. The agent cannot skip the golden path because the raw AWS APIs are denied at the SCP level from its execution account.
  • A dedicated agent execution account per environment, with GuardDuty monitoring for anomalous behaviour, CloudTrail centralised, and AWS Config rules enforcing tag hygiene for agent-owned resources.
  • An eval harness — a versioned set of tasks and expected outcomes, run against every agent and model change, with results in the same dashboards as your deployment metrics.
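
To illustrate the "typed schemas, not freeform shell access" point in the MCP bullet above, here is a sketch of a single tool definition and an argument check. The `name`, `description` and `inputSchema` fields follow the shape MCP uses for tool definitions. The tool itself (`request_database`), its fields, and the hand-rolled `validate_arguments` function are assumptions for the example. A real server would use its SDK's validation or a full JSON Schema validator instead.

```python
# Hypothetical tool: the agent can ask for a database, but only in these terms.
REQUEST_DATABASE_TOOL = {
    "name": "request_database",
    "description": "Request a database for a catalogued service via the golden path.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "service": {"type": "string"},
            "size": {"type": "string", "enum": ["small", "medium", "large"]},
            "environment": {"type": "string", "enum": ["dev", "staging", "prod"]},
        },
        "required": ["service", "size", "environment"],
        "additionalProperties": False,
    },
}


def validate_arguments(tool, arguments):
    """Minimal stand-in for JSON Schema validation: handles only string
    properties, enums, required fields, and rejection of unknown fields."""
    schema = tool["inputSchema"]
    properties = schema["properties"]
    errors = []
    for key in schema.get("required", []):
        if key not in arguments:
            errors.append(f"missing required field: {key}")
    for key, value in arguments.items():
        if key not in properties:
            errors.append(f"unexpected field: {key}")
        elif not isinstance(value, str):
            errors.append(f"{key} must be a string")
        elif "enum" in properties[key] and value not in properties[key]["enum"]:
            errors.append(f"{key} must be one of {properties[key]['enum']}")
    return errors


good = {"service": "billing", "size": "small", "environment": "dev"}
bad = {"service": "billing", "size": "huge", "environment": "dev", "shell": "rm -rf /"}

print(validate_arguments(REQUEST_DATABASE_TOOL, good))
print(validate_arguments(REQUEST_DATABASE_TOOL, bad))
```

The narrow schema is the control: the agent can express "a small database for billing in dev" and nothing else, so there is no argument through which it can smuggle a bespoke VPC.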

None of this is exotic. All of it is work. Most of it is platform engineering work that teams were already behind on before agents turned up and made the gap visible.

The Operating Model Shift Is Harder Than the Tech

The technical stack is the easy half. The harder half is changing how work is organised. Level 4 — the agent factory — requires product-oriented teams that own outcomes, not tickets. It requires engineers who see themselves as orchestrators and reviewers rather than keystroke-producers. It requires leadership that measures delivered customer outcomes, not agent adoption rates or seats licensed. These shifts are uncomfortable for organisations whose operating model is still fundamentally a project-based delivery factory. Mandating Copilot seats without touching any of this produces Level 2 and stops.

Where to Start

If your organisation is stuck at Level 2, the honest diagnosis is usually one of three things. You do not have a golden path for provisioning, so every service is bespoke. You do not have an eval harness, so you cannot tell a good change from a bad one without a human reading every diff. Or you do not have a product operating model, so your engineers are still on a ticket treadmill and have no headroom to operate as orchestrators. Pick the one that is most broken and fix it first. The agent work will not land until that gap is closed.

How Kineticor Can Help

Kineticor builds the engineering platforms that make Level 3 AI-native delivery possible. We have senior engineers who have run the Backstage, Crossplane, Bedrock and EKS rollouts inside regulated UK enterprises, and who have made the SCP, IAM and audit work hold up to scrutiny. We own the end-to-end path — discovery, architecture, build, enablement — so the platform is handed over to your teams working, documented, and safe to extend. If your Copilot rollout has plateaued and you want a concrete plan to move past autocomplete, get in touch.

— Danish


Danish Muhammad

Founder, Kineticor

I help businesses achieve their vision by making the cloud work for them: efficiently, securely, and at scale. Beyond technical solutions, I focus on solving real-world challenges, aligning cloud strategy with business goals, and building high-performing teams. My background is technical delivery; my passion is solving people problems.