
In February 2025, Andrej Karpathy tweeted a throwaway thought that would define an entire year of debate:

"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists."

Twelve months later, Karpathy himself had moved on. In his 2025 LLM Year in Review, he described going from "80% manual coding + 20% agents in November to 80% agent coding + 20% manual edits in December." He stopped calling it vibe coding. He started calling it agentic engineering.

Meanwhile, Spotify's co-CEO Gustav Söderström told investors on a Q4 2025 earnings call that the company's best developers "have not written a single line of code since December."

The industry is moving so fast that the framework people use to describe AI development is already outdated. Most people think of it as a ladder:

  1. Vibe Coding — the beginner level
  2. AI-Assisted — the responsible middle
  3. Agentic — the advanced tier

That ladder is wrong. These aren't levels you graduate through. They're layers you stack together — like a sandwich.

Layer 1: Vibe Coding — The Top Slice

Karpathy's original tweet captured something real. When you're exploring an idea, prototyping a concept, or hacking on a weekend project, fighting the AI is counterproductive. Give in to the vibes. Let the model generate. See what emerges.

This layer is powerful for:

  • Rapid prototyping and proof-of-concepts
  • Exploring design spaces quickly
  • Internal tools nobody else will maintain
  • Hackathons and creative experiments

But the data is unforgiving about its limits.

A CodeRabbit study analyzing 470 real-world pull requests found that AI-generated pull requests contained roughly 1.7x more issues than human-written ones. On security specifically, AI code was 2.74x more likely to introduce XSS vulnerabilities and 1.88x more likely to introduce improper password handling.

Vibe coding works when the stakes are low and iteration speed matters more than code quality. It's the top slice of bread — where ideas begin, not where products ship.

Layer 2: AI-Assisted — The Filling That Can Spoil

This is where most companies live today: developers using Copilot, ChatGPT, or Cursor as autocomplete-on-steroids. Write code, get suggestions, accept or reject.

It sounds like the responsible approach. The data says otherwise.

The METR randomized controlled trial, one of the most rigorous studies to date on AI-assisted coding, found that experienced open-source developers were actually 19% slower when using AI tools. The kicker? Those same developers believed they were 20% faster. Perception and reality pointed in opposite directions by almost the same margin.
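To make that gap concrete, here is a back-of-the-envelope calculation. The 60-minute baseline is a made-up number, and reading "20% faster" as "20% less time" is an assumption for illustration, not METR's exact methodology.

```python
# Hypothetical task: how long it "felt" versus how long it actually took.
BASELINE_MINUTES = 60.0    # assumed time without AI (illustrative only)
ACTUAL_SLOWDOWN = 0.19     # measured result cited above: 19% slower with AI
PERCEIVED_SPEEDUP = 0.20   # what developers believed: 20% faster

actual = BASELINE_MINUTES * (1 + ACTUAL_SLOWDOWN)       # time really spent
perceived = BASELINE_MINUTES * (1 - PERCEIVED_SPEEDUP)  # time it felt like
gap = actual - perceived                                # the blind spot

print(f"actual: {actual:.1f} min, perceived: {perceived:.1f} min, gap: {gap:.1f} min")
# actual: 71.4 min, perceived: 48.0 min, gap: 23.4 min
```

Under these assumptions, a one-hour task leaves more than twenty minutes that the developer never notices losing.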

Google's 2024 DORA report, a widely used benchmark for software delivery performance, pointed the same way: a 25% increase in AI adoption was associated with an estimated 7.2% reduction in delivery stability and a 1.5% drop in throughput. The likely culprit, by the report's own hypothesis: AI makes generating code feel effortless, so changesets grow larger and get less review.

AI-assisted development without discipline doesn't save time — it redistributes it. You save time writing. You lose it debugging, reviewing, and fixing the subtle bugs AI introduced.

It's the filling of the sandwich — essential, but it can spoil the whole thing if you're not careful.

Layer 3: Agentic — The Foundation

This is where the real shift is happening. Agentic coding isn't autocomplete. It's delegation. You define the task, the constraints, and the acceptance criteria. The agent writes the code, runs the tests, iterates on failures, and presents the result.
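The delegation loop described above can be sketched in a few lines. This is a toy illustration, not how any particular agent product works: `TaskSpec`, `delegate`, and `stub_agent` are hypothetical names, and the "agent" is a hard-coded stand-in where a real system would call a model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class TaskSpec:
    """What the developer hands over (field names are illustrative)."""
    task: str
    constraints: List[str]
    acceptance_checks: List[Callable[[str], bool]]  # each check inspects candidate code


@dataclass
class Result:
    code: str
    attempts: int
    passed: bool
    failures: List[str] = field(default_factory=list)


def run_check(check: Callable[[str], bool], code: str) -> bool:
    """A check that raises counts as a failure, not a crash."""
    try:
        return bool(check(code))
    except Exception:
        return False


def delegate(spec: TaskSpec, agent: Callable[[TaskSpec, List[str]], str],
             max_attempts: int = 3) -> Result:
    """Ask for code, run the acceptance checks, feed failures back, repeat."""
    failures: List[str] = []
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = agent(spec, failures)
        failures = [c.__name__ for c in spec.acceptance_checks if not run_check(c, code)]
        if not failures:
            return Result(code, attempt, True)
    return Result(code, max_attempts, False, failures)


def load(code: str) -> Dict[str, Any]:
    """Execute candidate code in a fresh namespace (fine for a demo, unsafe for untrusted code)."""
    namespace: Dict[str, Any] = {}
    exec(code, namespace)
    return namespace


def lowercases_and_hyphenates(code: str) -> bool:
    return load(code)["slugify"]("Hello World") == "hello-world"


def handles_empty_input(code: str) -> bool:
    return load(code)["slugify"]("") == ""


def stub_agent(spec: TaskSpec, failures: List[str]) -> str:
    """Stand-in for a real agent: the first draft forgets to lowercase, the retry fixes it."""
    if not failures:
        return "def slugify(s):\n    return '-'.join(s.split())\n"
    return "def slugify(s):\n    return '-'.join(s.lower().split())\n"


spec = TaskSpec(
    task="Write slugify(s) that turns a title into a URL slug",
    constraints=["standard library only"],
    acceptance_checks=[lowercases_and_hyphenates, handles_empty_input],
)
result = delegate(spec, stub_agent)
print(result.passed, result.attempts)  # True 2
```

Notice where the human effort went: into the spec and the acceptance checks, not into the function body. The loop is only as good as the checks it runs.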

The evidence is building fast:

  • Spotify built an internal system called "Honk" on top of Claude Code, enabling real-time remote code deployment. According to Söderström, their best engineers haven't written code manually since December 2025.
  • Karpathy described the transition: 80% manual in November, 80% agent in December. He now says "you are not writing the code directly 99% of the time."
  • Apple integrated AI agents directly into Xcode, signaling that agentic workflows are becoming the default IDE experience.

But here's what people miss: agentic coding requires more engineering skill, not less. You need to write precise specifications. You need to architect systems the agent can reason about. You need to review diffs critically. The developer's role shifts from typing code to engineering outcomes.

It's the bottom slice — the foundation everything else rests on.

The Sandwich

The mistake is treating these as competing approaches. They're complementary layers of one workflow:

Layer                 Best For                              Risk If Used Alone
Vibe Coding (top)     Ideation, prototypes, exploration     Security vulnerabilities, doesn't scale
AI-Assisted (middle)  Boilerplate, docs, routine tasks      19% slower on complex tasks, false confidence
Agentic (bottom)      Production code, testing, deployment  Requires strong specs and architecture

A mature team uses all three:

  1. Vibe code the prototype to validate the idea fast
  2. Hand it to agents to build the production version with tests, error handling, and security
  3. Use AI-assisted tools for the routine work in between — documentation, boilerplate, code review

The sandwich isn't a hierarchy. It's a workflow.
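
If it helps to see the "right layer for the job" idea as something a team could write down, here is a minimal sketch. The pairings come straight from the "Best For" list above; the task-kind labels and the `pick_layer` helper are hypothetical, and a real team would tune the mapping to its own work.

```python
from enum import Enum


class Layer(Enum):
    VIBE = "vibe coding"
    ASSISTED = "AI-assisted"
    AGENTIC = "agentic"


# "Best For" pairings from the article; the labels themselves are illustrative.
BEST_FOR = {
    "ideation": Layer.VIBE,
    "prototype": Layer.VIBE,
    "exploration": Layer.VIBE,
    "boilerplate": Layer.ASSISTED,
    "docs": Layer.ASSISTED,
    "routine": Layer.ASSISTED,
    "production": Layer.AGENTIC,
    "testing": Layer.AGENTIC,
    "deployment": Layer.AGENTIC,
}


def pick_layer(task_kind: str) -> Layer:
    """Return the layer suited to a kind of task, or fail loudly on unknown kinds."""
    try:
        return BEST_FOR[task_kind.lower()]
    except KeyError:
        raise ValueError(f"unknown task kind: {task_kind!r}") from None


# One feature moving through the sandwich: prototype, production build, then docs.
for kind in ["prototype", "production", "docs"]:
    print(f"{kind}: {pick_layer(kind).value}")
```

The point of the lookup is not automation. It is that the choice of layer is made per task, on purpose, rather than once per team.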

What This Means for Your Team

The companies pulling ahead aren't the ones who picked one level. They're the ones who learned when to use each layer.

If you're only vibe coding, you're building on sand. The prototypes feel magical, but the code won't survive contact with real users, real scale, or real security requirements.

If you're only AI-assisted, you're in the productivity trap. You feel faster. The data says otherwise. And your competitors who went agentic are running circles around you.

If you're only agentic, you're over-engineering the exploration phase. Not every idea needs a full agent pipeline. Sometimes you need to vibe first to know if the idea is even worth building.

The winning move is all three, used deliberately.

Karpathy didn't abandon vibe coding — he evolved past it for production work while keeping it for exploration. Spotify didn't skip straight to agents — they built the infrastructure to support agentic workflows at scale.

The question isn't "which level are you on?" It's "are you using the right layer for the job?"