Production Playbook · Intermediate

Vibe Coding to Production: How to Ship AI Agents That Survive Real Users

Tijo Gaucher

April 20, 2026·16 min read

The vibe coding market hit $4.7B in 2026 with 92% developer adoption. Cursor, Claude Code, v0, Replit Agent, and a dozen others made it possible to ship a working AI agent in an afternoon. The catch: there is a canyon between “I vibe coded it” and “it is actually deployed reliably.” This is the bridge.

TL;DR

Vibe coding to production is a four-stage pipeline: audit what the AI actually built, harden the boring infrastructure layer (auth, rate limits, secrets, errors), observe what runs in the wild, and deploy behind a sandbox. Most vibe-coded agents fail in production because they skipped stages 2–4. This guide shows what to fix and in what order — with concrete checklists for AI agents specifically. RapidClaw includes the production layer by default for OpenClaw and Hermes Agent.

Want the production layer included by default?

Try RapidClaw

What Vibe Coding Actually Is (And Why It Exploded)

Andrej Karpathy coined the phrase in February 2025: “There’s a new kind of coding I call vibe coding, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists.” A year later it stopped being a meme and became a market. By Q1 2026, vibe coding tools represented roughly $4.7B in annualized spend, with surveys placing developer adoption around 92% in some form — Cursor, Windsurf, Claude Code, v0, Replit Agent, Bolt, Lovable, and a long tail of niche tools.

What changed is not just speed. It is the kind of person who can now ship working software. A solo founder describes a multi-tool AI agent in plain English and has a working prototype in three hours. Three years ago that took a team and a quarter. The prototype works. It demos beautifully. The investor video gets shared.

Then someone tries to use it for real work. And it falls over.

The Production Readiness Gap

A vibe-coded prototype is optimized to look like it works on the happy path the developer described. Production is the set of every path the developer didn’t. The gap between the two is not about code quality — the AI often writes cleaner code than a tired human at 11 p.m. The gap is about everything the AI was never told to consider.

What the demo proves

  • The feature exists and renders
  • The happy path returns a result
  • The UI looks coherent
  • The data shape is plausible
  • You, logged in, can use it

What the demo does NOT prove

  • It survives concurrent users
  • It handles rate limits and timeouts
  • Auth actually isolates tenants
  • Secrets are not in the repo
  • The agent loop has a cost ceiling

For AI agents specifically, the gap is wider than for regular CRUD apps. An agent has tools. It calls APIs. It loops. It can spend money. A vibe-coded chatbot that hallucinates a wrong fact is embarrassing — a vibe-coded agent that loops on a paid API at 4 a.m. with no rate limit is a $2,000 invoice and a Slack message that starts with “hey, do you know what happened with…”

The 90/10 reality of agent demos

90% of the AI agent demos shared on X in any given week would not survive 24 hours of real production traffic without intervention. This is not a knock on vibe coding — it is a knock on confusing “works in dev” with “works for users.” If you want context on why agents in particular are fragile, see why AI agents fail in production for the five most common failure modes.

Common Pitfalls of Vibe-Coded AI Agents

Across hundreds of vibe-coded agents shipped in the last year, the same eight pitfalls show up again and again. Each one is fixable in under an hour — and each one is why the prototype melts in production.

1. Hardcoded secrets in the repo

API keys baked into the source because the model wanted to "just make it work." Often committed to a public GitHub repo before anyone notices.

2. No rate limiting on tool calls

The agent can loop indefinitely on web_search, send_email, or any other tool. One stuck loop = real money on the API bill.

3. Silent error swallowing

try/except blocks that catch everything and return an empty result. The agent keeps marching forward on data it never actually got.

4. Auth that "works in dev"

Every test ran as the same logged-in user. Once two real tenants exist, you discover the JWT was never validated and user A can read user B's data.

5. Zero observability

No structured logs, no traces, no metrics. When something breaks at 2 a.m. there is nothing to look at except the user complaint.

6. Model version pinning ignored

Code calls "claude-sonnet-latest" or "gpt-4". When the provider rolls a new version, behavior shifts overnight and tests written against the old version pass anyway.

7. Unvalidated tool inputs

The agent decides what arguments to pass to your tools. A prompt injection nudges it into deleting a row, sending an email to attacker@example.com, or running an unbounded SQL query.

8. No cost ceiling

No daily token budget per user, per task, per tenant. The first abusive user discovers your agent and your Anthropic invoice triples.

How to Audit a Vibe-Coded Agent (4-Pass Review)

Before you fix anything, you need to know what you have. The audit is not a code review — you do not need to read every line. You need to map four things: secrets, surface area, failure modes, and cost. One pass each.

Pass 1 — Secrets sweep

Vibe-coded code routinely contains pasted-in API keys, even from tools that “know better.” Run gitleaks across the repo and the full git history:

audit_secrets.sh
#!/bin/bash
# Pass 1: Secrets sweep — run before anything else

# Install gitleaks
brew install gitleaks  # or: docker pull zricethezav/gitleaks

# Scan working tree
gitleaks detect --source . --verbose --report-path gitleaks-tree.json

# Scan full git history (vibe-coded repos often have leaked keys in commits)
gitleaks detect --source . --log-opts="--all" --report-path gitleaks-history.json

# Quick grep for the obvious ones
# Note: grep's --include does not expand {js,ts,...} braces — list each glob
grep -rEn "sk-[a-zA-Z0-9]{20,}" \
  --include="*.js" --include="*.ts" --include="*.py" \
  --include="*.env" --include="*.json" --include="*.yaml" .
grep -rEn "AKIA[0-9A-Z]{16}" \
  --include="*.js" --include="*.ts" --include="*.py" \
  --include="*.env" --include="*.json" --include="*.yaml" .

# If you find anything: rotate the key NOW, then rewrite history.
# Do NOT just delete the line and recommit — the key is still in the history.

Pass 2 — Surface area map

List every external thing the agent can touch. For a typical vibe-coded agent, that means: every tool the agent can call, every HTTP endpoint your app exposes, every third-party API called, and every database write. Write it down on one page. If you can’t fit it on a page, your agent has too much surface for one person to safely operate.

surface_area.md
# Agent Surface Area — fill this in for every vibe-coded agent

## Tools the agent can call
- web_search           → external HTTP, costs money per call
- send_email           → side effect, can spam users
- read_calendar        → reads PII
- write_database       → mutates state, no rollback
- execute_shell        → REMOVE THIS unless you really meant it

## Endpoints exposed
- POST /api/chat       → unauthenticated? rate limited?
- POST /api/webhook    → who can call this? signed?
- GET  /api/admin/*    → admin auth check?

## Third-party APIs called
- Anthropic            → cost: ~$3 per 1M tokens, no hard cap
- Stripe               → side effect, real charges
- SendGrid             → spam vector

## Database writes
- users.subscription   → can the agent change billing? 🚨
- agent_logs           → unbounded growth?

Pass 3 — Failure mode walk

For each tool and endpoint from Pass 2, write down what happens when:

  • The external API times out
  • The external API returns a 429 rate limit
  • The model refuses (“I can’t help with that”)
  • The model returns malformed JSON
  • The user sends an empty input, a huge input, or a prompt-injection input
  • The same user fires 100 requests in 10 seconds

For most vibe-coded agents, the answer to half of these is “I don’t know.” That is the audit’s job — surface the unknowns before users do.
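One failure mode from the list above — the model returning malformed JSON — is cheap to handle defensively everywhere. A minimal sketch (the function name and return shape are illustrative, not from any particular framework):

```python
import json

def parse_model_json(raw: str) -> dict:
    """Defensive parse of model output. Malformed JSON becomes a structured
    error the agent loop can retry on, instead of an exception that kills
    the task mid-flight."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as e:
        return {"error": "malformed_json", "detail": str(e), "raw_prefix": raw[:200]}
    if not isinstance(parsed, dict):
        # Model returned a list or bare value where an object was expected
        return {"error": "unexpected_shape", "raw_prefix": raw[:200]}
    return parsed

print(parse_model_json('{"action": "search"}'))    # parsed dict
print(parse_model_json('Sure! Here is the JSON:'))  # structured error, no crash
```

The same pattern — known input shapes pass through, everything else becomes a structured error — applies to every row of the failure-mode walk.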

Pass 4 — Cost ceiling

What is the maximum amount of money this agent can spend in 24 hours, assuming the worst-case loop and the worst-case adversarial user? If the answer is “unlimited,” you have not built a product — you have built a way to lose money. Set a hard cap before going live. Even a generous one.
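A hard cap can be as simple as a counter checked before every paid call. The sketch below is an in-process version for a single worker — in production you would back it with a shared store (e.g. Redis) so the cap survives restarts and applies across processes; the class and method names are illustrative:

```python
import time

class DailySpendCap:
    """Minimal in-process daily spend cap — a sketch, not a shared,
    crash-safe implementation."""

    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0
        self.window_start = time.time()

    def record(self, cost_usd: float) -> None:
        # Reset the counter every 24 hours
        if time.time() - self.window_start > 86_400:
            self.spent_usd = 0.0
            self.window_start = time.time()
        self.spent_usd += cost_usd

    def allow(self, estimated_cost_usd: float) -> bool:
        """Refuse the call if it would push today's spend over the cap."""
        return self.spent_usd + estimated_cost_usd <= self.cap_usd

cap = DailySpendCap(cap_usd=50.0)
cap.record(49.0)
print(cap.allow(0.5))  # True  — still under the $50 ceiling
print(cap.allow(2.0))  # False — would exceed it, hard stop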

The audit deliverable

At the end of the four passes, you should have one page describing: (1) the surface area, (2) the gaps in failure handling, (3) the cost ceiling. If those three fit on a page, you have a path to production. If they don’t, the agent needs to be scoped down before being shipped, not after.

How to Harden Vibe-Coded Code

Hardening is the boring layer. The AI did not write it because you did not ask. Add it now, in this order — the cheapest fixes with the biggest blast radius come first.

1. Move secrets to a secret manager

Anything from Pass 1 goes into Vault, AWS Secrets Manager, Doppler, or 1Password Secrets — never into .env files committed to git, and never into the source. Rotate any key that was ever in the repo, even if you removed it.

2. Add rate limits on every tool

Per-tool sliding window. The full pattern is in our AI agent firewall setup guide — copy the rate-limit decorator and apply it to every tool the agent can call. This single change prevents the most expensive class of failure: the runaway loop.
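If you want the shape of the pattern before reading the full guide, here is a minimal stand-in — per-process only, and the decorator name and error shape are illustrative:

```python
import time
from collections import deque
from functools import wraps

def rate_limited(max_calls: int, window_s: float):
    """Sliding-window rate limiter sketch. Once you run multiple workers,
    the call log needs a shared store instead of a local deque."""
    calls: deque = deque()

    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            # Drop timestamps that have fallen out of the window
            while calls and now - calls[0] > window_s:
                calls.popleft()
            if len(calls) >= max_calls:
                # Structured error the agent can see and back off on
                return {"error": "rate_limited", "retry_after_s": window_s}
            calls.append(now)
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@rate_limited(max_calls=3, window_s=60.0)
def web_search(query: str) -> dict:
    return {"results": [f"stub result for {query}"]}

for _ in range(5):
    print(web_search("latest news"))  # calls 4 and 5 return the rate_limited error
```

Note the limiter returns an error object instead of raising — the agent sees the refusal and can wait, rather than the loop dying or, worse, retrying blind.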

3. Validate every tool input

The model decides arguments. You decide what is allowed. Wrap every tool with a Pydantic (or Zod) schema and reject anything that doesn’t match:

tool_validation.py
from pydantic import BaseModel, EmailStr, constr, ValidationError
from typing import Literal

class SendEmailArgs(BaseModel):
    """Schema enforced before the agent's email tool runs."""
    to: EmailStr                         # must be a real email
    subject: constr(min_length=1, max_length=200)
    body: constr(min_length=1, max_length=10_000)
    priority: Literal["low", "normal", "high"] = "normal"

def send_email_tool(raw_args: dict) -> dict:
    try:
        args = SendEmailArgs(**raw_args)
    except ValidationError as e:
        # Return error TO THE AGENT — don't crash, don't silently succeed
        return {"error": "invalid_input", "details": e.errors()}

    # Defense-in-depth: enforce allow-list on recipients in production
    if not args.to.endswith(("@your-company.com", "@trusted-domain.com")):
        return {"error": "recipient_not_allowed", "to": args.to}

    # ... actually send the email
    return {"status": "sent", "to": args.to}

4. Replace silent excepts with structured errors

Find every except Exception: pass and every .catch(() => null). Replace each one with structured error logging and a thrown error or returned error object. The agent can recover from a known error. It cannot recover from a None that it didn’t know was a failure.
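The before/after looks like this — `api_get` is a hypothetical stand-in for whatever HTTP client the agent uses, stubbed here to always time out so the sketch runs offline:

```python
import logging

logger = logging.getLogger("agent.tools")

def api_get(path: str) -> dict:
    """Stub for a real HTTP call (hypothetical) — always times out here."""
    raise TimeoutError("upstream took too long")

# Before: the pattern to hunt down — failure becomes an invisible None
def fetch_profile_bad(user_id: str):
    try:
        return api_get(f"/users/{user_id}")
    except Exception:
        pass  # the agent now reasons over data it never got

# After: log with context, return a structured error the agent can act on
def fetch_profile(user_id: str) -> dict:
    try:
        return api_get(f"/users/{user_id}")
    except TimeoutError:
        logger.error("tool=fetch_profile user_id=%s error=timeout", user_id)
        return {"error": "timeout", "tool": "fetch_profile"}
    except Exception as e:
        logger.error("tool=fetch_profile user_id=%s error=%s", user_id, e)
        return {"error": "upstream_failure", "detail": str(e)}

print(fetch_profile_bad("u_123"))  # None — indistinguishable from "no data"
print(fetch_profile("u_123"))      # {'error': 'timeout', 'tool': 'fetch_profile'}
```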

5. Pin model versions and write behavior tests

Replace claude-sonnet-latest with the exact version string (e.g. claude-sonnet-4-6). Then write 5–10 behavior tests covering the actual decisions you care about: “given this support ticket, does the agent escalate?”, “given this calendar conflict, does the agent decline?”. Unit-testing the wrapper code is fine. Unit-testing the agent’s decisions is what catches the silent regressions when you bump the model.
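A behavior test can be this small. The agent call below is a stub so the sketch runs offline — your real `run_agent` would invoke the pinned model and return the decision it made; the function and test names are illustrative:

```python
PINNED_MODEL = "claude-sonnet-4-6"  # exact version string, never "-latest"

def run_agent(ticket: str, model: str = PINNED_MODEL) -> str:
    """Stand-in for the real agent call, stubbed to run offline.
    The real version calls the pinned model and returns its decision."""
    if "refund" in ticket.lower() and "$" in ticket:
        return "escalate"
    return "auto_reply"

# Behavior tests pin down decisions, not wrapper code.
# Run with pytest; re-run the whole suite whenever you bump PINNED_MODEL.
def test_refund_ticket_escalates():
    assert run_agent("Customer demands a $400 refund") == "escalate"

def test_password_question_auto_replies():
    assert run_agent("How do I reset my password?") == "auto_reply"
```

Five to ten of these, run against the real model on every version bump, is what turns "the provider rolled a new version" from a silent regression into a failing CI job.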

6. Add observability before deploy, not after

Structured logs (one line of JSON per agent action), traces (one span per tool call), and metrics (tool call count, latency, token cost). The full setup is in self-hosted AI agent observability. Without this you cannot debug production. With it, the same incident takes 10 minutes instead of 4 hours.
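The "one line of JSON per agent action" habit costs almost nothing to adopt. A minimal sketch — the field names are illustrative, so match them to whatever schema your log backend expects:

```python
import json
import sys
import time

def log_agent_action(tool: str, status: str, latency_ms: float,
                     tokens: int, cost_usd: float, **context) -> dict:
    """Emit one JSON line per agent action — queryable later, greppable now."""
    record = {
        "ts": time.time(),
        "tool": tool,
        "status": status,
        "latency_ms": round(latency_ms, 1),
        "tokens": tokens,
        "cost_usd": round(cost_usd, 6),
        **context,  # tenant, task_id, whatever you need to slice on
    }
    sys.stdout.write(json.dumps(record) + "\n")
    return record

log_agent_action("web_search", "ok", latency_ms=812.4,
                 tokens=1540, cost_usd=0.0046, tenant="acme")
```

Call it once per tool invocation and the cost-per-tenant metric falls out of a single aggregation query over these lines.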

Deployment Best Practices for Vibe-Coded Agents

Once you have audited and hardened, deployment is mostly about constraining blast radius. The defaults below assume a single-region, single-tenant setup — scale up only when you have to.

deploy_checklist.md
# Vibe-Coded Agent — Pre-Deploy Checklist

## Isolation
[ ] Agent runs in a sandboxed container, not on the host
[ ] Egress is default-deny — only known API endpoints allowed
[ ] No access to host filesystem unless explicitly mounted

## Identity & secrets
[ ] No secrets in env vars baked into the image
[ ] Secrets pulled from a manager at runtime
[ ] Each tool has a scoped key with minimum permissions
[ ] All keys rotated within last 30 days

## Limits
[ ] Per-tool rate limit (sliding window)
[ ] Per-user daily token budget
[ ] Per-tenant daily spend cap, hard-stopped
[ ] Max tool calls per agent task

## Failure handling
[ ] Every tool returns a structured error object (not None)
[ ] Timeouts on every external call (default: 30s)
[ ] Retries are bounded (max 3, exponential backoff)
[ ] Circuit breaker on repeated failures

## Observability
[ ] Structured JSON logs to a queryable backend
[ ] Distributed traces with span per tool call
[ ] Metrics: latency, error rate, cost per tenant
[ ] Alert: cost spike > 3x baseline in 1h
[ ] Alert: error rate > 5% in 5min

## Rollout
[ ] Feature-flagged behind a per-tenant toggle
[ ] Canary to 1% of traffic before full rollout
[ ] One-click rollback documented in the runbook
[ ] On-call knows where the runbook is
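The failure-handling items in the checklist share one shape: bounded retries with backoff, ending in a structured error. A minimal sketch, assuming timeouts are set inside the tool call itself (on the HTTP client) — this wrapper only bounds how often you retry:

```python
import time

def call_with_retries(fn, max_retries: int = 3, base_delay_s: float = 1.0):
    """Bounded retries with exponential backoff — a sketch of the
    checklist items, not a full circuit breaker."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            last_error = e
            time.sleep(base_delay_s * (2 ** attempt))  # 1s, 2s, 4s — never infinite
    # Structured error, not an exception the agent loop can't see
    return {"error": "max_retries_exceeded", "detail": str(last_error)}
```

A circuit breaker adds one more piece: after N consecutive `max_retries_exceeded` results from the same tool, stop calling it at all for a cooldown period.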

None of this is novel. None of it is interesting. All of it is the difference between an agent that runs for a week and an agent that runs for a year. For the deeper architectural picture, see the AI agent hosting complete guide and the OpenClaw production deployment guide.

How RapidClaw Bridges the Vibe-to-Production Pipeline

Everything in the hardening and deployment sections above is doable. It is also a week of work, and a week that is not building features. RapidClaw exists because someone needed to do this for OpenClaw and Hermes Agent and decided to do it once, properly, for everyone.

When you push a vibe-coded agent to a RapidClaw deployment, the production layer is already there:

Sandboxed execution

Per-agent containers with default-deny network egress and no host access. The agent cannot exfiltrate what it cannot reach.

Built-in rate limiting

Per-tool, per-user, and per-tenant limits applied automatically. The runaway-loop incident becomes impossible by construction.

Scoped secret management

Keys rotated automatically, each tool gets its own. No more leaked-key incidents from a copy-paste demo.

Observability included

Structured logs, traces, and per-tenant cost metrics from the first request, not bolted on after the first outage.

Hard cost ceilings

Daily spend caps per tenant. The first abusive user does not turn into a $4,000 invoice — the agent stops and pages you.

One-click rollback

Every deploy is versioned. If the new agent regresses, roll back from the dashboard before the incident review starts.

You still bring the agent. You still make the product decisions. RapidClaw replaces the week of plumbing with a deploy command. If you’re comparing your options, the CrewAI alternatives that handle deployment and OpenClaw alternatives posts cover the broader landscape.

The Vibe-to-Production Pipeline, In Order

1. Audit (one afternoon)

Four passes — secrets, surface area, failure modes, cost. Output: one page describing what you actually built.

2. Harden (two to four days)

Secrets manager, rate limits, input validation, structured errors, model pinning, observability. The boring layer the AI did not write.

3. Deploy behind a sandbox (one day)

Container isolation, default-deny egress, scoped keys, hard cost cap. Either build it yourself or use a platform that includes it.

4. Observe and iterate (forever)

The agent will surprise you in production. Logs, metrics, and traces are how you find out before users do. Treat the first month post-launch as continuous audit, not maintenance.

Vibe coding is not the problem. It is the most important shift in how software gets built since open source. The problem is treating the prototype as the product. The pipeline above is what closes that gap — whether you build it yourself or run on RapidClaw.


Skip the production plumbing

Push your vibe-coded OpenClaw or Hermes Agent to RapidClaw and ship with sandboxing, rate limits, scoped keys, and observability included. Production-ready in under two minutes.

Deploy with RapidClaw