What Six Months of Vibe Coding Actually Taught Me
I'm a product leader, not a software engineer. Over the last six months I built and shipped real software without one: this site, a multi-agent research pipeline, an 11-part playbook. Here is what worked, what failed, and the lesson underneath both.
Perspective · June 2026
TL;DR
- The biggest unlock wasn't a clever prompt. It was a capability jump: the 1M-token context window changed what I could hold together at once, and my whole way of building changed with it.
- It 10x'd me on structured, verifiable work and walled me out completely on taste-dependent creative work (a pixel-art game I had to kill after weeks).
- The skeptics are mostly looking backward. If I can already ship this fast as a non-engineer, the question isn't whether it works. It's why you aren't using it at every step of your own work.
What I actually built
I did not set out to write an essay about vibe coding. I set out to build things, and the essay is the residue. Over six months, as a product person who codes rather than a career engineer, I built and shipped: this entire site; a multi-agent research pipeline that powers the deeply-sourced articles here; the S&P 500 valuation tool across all 500+ companies; and an 11-part Product Management playbook with diagrams, quizzes, and its own measurement pipeline.
The thing I'm proudest of isn't any one of those. It's that I can now be the thinker and the builder in the same person: product manager and vibe coder at once. I don't feel chained to needing a software engineer to put my ideas into the world. I can move fast, I can debate the trade-offs with myself instead of negotiating someone else's opinion on how it "should" be done, and I can fail and learn on my own clock. After fifteen years of carrying frameworks and SEO ideas in my head, I could finally test them. That included learning GEO (Generative Engine Optimization)[12], how you get found through LLMs, by simply trying, repeating, and watching what happened.
Where it 10x'd me
The honest answer to "what made the biggest difference" is not a technique. It was a capability jump. When models moved to a 1-million-token context window[10] (for me, around the Opus 4.6 era), it was a step change. Suddenly I could hold an entire architecture in one working session: the AI could see across the whole project instead of losing the thread halfway through. Auto-compaction helped too; I stopped babysitting a context-percentage meter and waiting to compact by hand. My ideas scaled because the tool could finally keep up with the size of them.
The second multiplier was discipline, not capability: ruthlessly streamlining to only the skills and steering files that matter, and building a "helicopter-level" architecture for how I work with AI. That setup then trickles down into every project, which is the real force multiplier. Get the operating model right once and every subsequent build inherits it.
Where it failed me: the pixel-art game
The honest counter-example. I wanted my resume to be an interactive Pokémon-style world: you walk from city to city and each one uncovers a chapter of my career. I found someone else's convincing game to work backwards from, and I tried to build on that base.
It was wall after wall after wall. The AI could not do the front-end and design work the idea needed: it couldn't create pixel art, couldn't produce anything that looked remotely like a city, couldn't even rearrange the art from a working code base without breaking the sprites. I spent days on it, spread across weeks, coming back again and again to keep the project afloat. Eventually I had to accept that my current stack simply wasn't going to get it done. I canned that part of the project.
The lesson wasn't "AI can't build games." It was narrower and more useful: vibe coding, on the stack within my reach, is not yet ready for taste-dependent creative work like original pixel art. The further you get from a verifiable spec and the closer you get to "is this good?" judgment, the worse it performs today. I still think the idea is cool, and I'd happily hear from anyone who's cracked it. The 70% problem is real[9]: the AI sprints you most of the way, and then the last stretch, the part that needs taste, is on you.
Three more honest failure modes
Complex multi-country, multi-language front-ends. On an earlier, bigger build, around Opus 4.5 and prior models, I bug-fixed endlessly. The AI couldn't see through its own inconsistencies, and the context windows of the time choked on the complexity. What I didn't realize then was that most of those limits were not permanent; they were about to be lifted by better models, the 1M context window, and stronger steering files and skills. I got through it eventually. I still don't think the AI is fully there for that level of localized complexity, but the trajectory is unmistakable, and that's the point.
Two people on one codebase. The other unsolved headache: a friend and I wanted to build a pet project together, autonomously, across time zones. The plan was to divide by area (he takes the backend, I take the front-end, site structure, marketing, go-to-market) and each ship on our own. Even that was hard to get off the ground. Working over each other on the same code without conflict is a real problem I don't have a clean answer to yet, even looking at what other orgs do. The answer seems to be hard compartmentalization plus the right push mechanism. I'm confident a good pattern exists; I just haven't found it yet. I'll revisit it.
And the most current one: I'm a bit disappointed with the stock picker. It's been hard to keep the analysis rigorous and turn it into something that automatically re-runs as a scheduled job. It needs more time, and I'm keeping it honest by saying so. It's a work in progress, not a finished trophy.
The lesson, and where the skeptics are wrong
The aha moment was that context-window jump, but the deeper lesson is about stance. While I was shipping, I kept reading people on LinkedIn confidently posting that vibe coding gets you nothing. I fundamentally disagree, and I think that's already backward thinking. If you genuinely understand the power of these models, you have to project forward: what we're doing right now is the beginning. If a non-engineer can already feel this empowered and deliver output this fast, the future is far more exciting than the skeptics' snapshot suggests.
That's also where I part ways with the studies. METR found experienced developers were 19% slower with AI, even though they had expected to be 24% faster and still believed afterward that they had been about 20% faster[4], and I take the perception gap seriously. But other field experiments already show meaningful throughput gains[5][6], and the production signal is loud[7]. Where AI "didn't work," it's usually because an org wasn't willing to change how it works, or because the study caught a moment in time on a capability curve that moves monthly. Reality on the ground hits hard and fast: people who know how to use these tools are faster, full stop.
The one real risk is that it still takes a lot of critical thinking to know what "better" looks like. That's why this gets harder, not easier, for the most junior engineers: companies are realizing the best engineer in an AI world isn't the one you can mold. It's the one who already has the experience, the architectural judgment, and the critical thinking to orchestrate the AI. That's where the 10x lives. The craft is changing, not disappearing.
What I'd tell a Chief AI Officer
If you're at a top tech company, your people should be vibe coding on the side, and you should emulate that culture, even if it's just a couple of hackathons, to set the pace and help the org learn faster. You can't impose that. But here's what you can impose: methodical use of generative AI at specific steps, for specific roles. A product manager should use it to prototype, and should not be shy about overlapping with what a UX person used to own.
Because the jobs themselves are about to change. What a PM, a software engineer, or a UX designer is "supposed to do" may be reshaped completely, and some of those roles may merge. People should not be afraid of stepping outside today's job description to use the tools in front of them to improve their throughput and their thinking. So if there's one sentence to take from this:
At every step of your work, you should be asking: why am I not using AI here?
That's the muscle: not a tool, not a prompt, a reflex. I built everything on this site by exercising it, and I'm increasingly pointing the same skill set at things that matter more than a portfolio: helping family and friends, and pro-bono work. It's also, quietly, a healthier hobby than the video games it replaced, though I'd argue it scratches the same itch. Build, fail, learn, repeat. The loop is the whole point.
A note on the stack, since people ask: the tools that earned their place for me are Kiro and Claude Code, plus a multi-agent setup that lets several agents work a problem in parallel when I want materially higher quality. The specific tools will change; the operating model is what lasts.
Sources & Further Reading
12 sources researched for this article. Last updated when the page was published.
The vibe coding origin + landscape
- [1] Andrej Karpathy on "vibe coding" (origin of the term). Andrej Karpathy (via Simon Willison), 2025-02-06. Where the term began: coding by describing intent and accepting the AI's output.
- [2] Vibe coding: definition and origin. Wikipedia, 2026.
- [3] Not all AI-assisted programming is vibe coding. Simon Willison, 2025-03-19. The useful distinction between vibe coding and disciplined, reviewed AI-assisted work.
Productivity evidence (the contested part)
- [4] Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. METR, 2025-07-10. The RCT: developers expected to be 24% faster, were 19% slower. The perception gap.
- [5] The Effects of Generative AI on Software Developer Performance. Cui, Demirer, et al. / NBER, 2024. Large field experiments showing meaningful throughput gains from AI coding tools.
- [6] Quantifying GitHub Copilot's impact on developer productivity and happiness. GitHub, 2022.
Production reality + walkbacks
- [7] Claude Code: engineers merging materially more PRs. Anthropic, 2025. Production signal: real teams shipping more with AI coding agents.
- [8] Klarna AI assistant handles two-thirds of service chats, and the later human rebalancing. Klarna / reporting, 2024–2025. The hype, and the correction; why org willingness matters as much as capability.
- [9] The 70% problem: hard truths about AI-assisted coding. Addy Osmani, 2025. AI sprints you to ~70%; the last stretch needs human taste, matching the pixel-art wall.
The capability curve (why timing matters)
- [10] Long-context models and what 1M-token windows unlock. Anthropic, 2025–2026. The capability jump that changed how much architecture I could hold at once.
- [11] Measuring AI ability to complete long tasks (autonomous task-length doubling). METR, 2025-03-19. Why "it can't do X today" is a snapshot on a fast-moving curve.
- [12] GEO: Generative Engine Optimization. Aggarwal et al. (arXiv 2311.09735), KDD 2024. Being found through LLMs; one of the frameworks I could finally test by building.