Skip to content
Back to Blog
Token Economics · AI Agents

Why AI Agents Cost $100K/Year (And the Fix)

Apr 5, 2026
6 min read
TB

Tijo Bear

CEO, Rapid Claw

"I'm spending $300 a day per agent. That's $100,000 a year — just for one."

— Jason Calacanis, All-In Podcast, 2026

Jason Calacanis disclosed on the All-In Podcast that he's spending $300 per day per AI agent — roughly $100,000 a year. TechCrunch picked it up with the headline "Are AI tokens the new signing bonus?" The number sounds absurd. But it's not wrong. It's what happens when you run frontier models without routing — and it's a number that will become a standard line item on P&Ls across the industry before the end of the year.

The purpose of this post is to show exactly where that $100K comes from, why it's structurally unavoidable without smart routing, and what the math looks like once you add a routing layer. This is practical accounting, not marketing.

The $300/day math, broken down

A production AI agent doing research, writing, and multi-step tasks might generate 500K–2M tokens per day at frontier model prices. Current pricing for capable frontier models sits in the $3–15 per million tokens range. At a blended average of $5 per million tokens, here's the scale:

  • 1M tokens/day = $5/day
  • 2M tokens/day = $10/day
  • 10M tokens/day = $50/day
  • 60M tokens/day = $300/day

To hit $300/day on a single agent you'd need 60 million tokens — that's either a very active continuous agent with long context windows, or more likely: multiple agents running in parallel all billed under one umbrella, or a team running premium models with no routing layer. Calacanis almost certainly falls into the latter category. No routing means every sub-task — whether it's formatting JSON or synthesizing a legal document — runs on the same frontier model at the same per-token rate.

At $300/day, annualized cost is $109,500. Round it to $100K because some days are lighter. The number is real.

Why token costs spiral without routing

The core problem is that agentic workflows are not uniform. An agent running a research-and-write task might spend 80% of its tokens on mechanical sub-steps: parsing a URL response, reformatting structured data, classifying a result, extracting fields from a document. These tasks do not need a frontier model. A lightweight model at $0.25/million tokens handles them correctly.

But when there's no routing layer, every sub-task goes to the same model. The task that says "format this JSON as a markdown table" costs the same as the task that says "synthesize these six conflicting legal opinions and flag the highest-risk clause." The second task benefits from a frontier model. The first one is just burning money.

At low volumes this is annoying. At scale — five agents, ten agents, continuous operation — it's the difference between a sustainable cost structure and a CFO conversation you don't want to have.

The agentic multiplier — and why you can't engineer your way out of it

Jensen Huang put a number on this at GTC 2026: agentic tasks consume approximately 1,000x more tokens than a standard prompt. For continuous agents operating in the background, that figure climbs to 1,000,000x.

The mechanics are straightforward. An agentic task involves planning, tool calls, intermediate reasoning, error handling, context accumulation across steps, and final synthesis. Each of those stages consumes tokens. Long contexts compound the cost because each new completion is generated over the full accumulated context. An agent that's been running for two hours and has built up 200K tokens of working memory pays for those 200K tokens on every subsequent inference call.

You cannot prompt-engineer your way out of this. Shorter system prompts help at the margins. Aggressive context pruning helps a little more. But the fundamental shape of the cost curve — driven by multi-step reasoning over accumulating context — is structural. The only lever that materially changes the equation is routing.

How smart routing changes the math

Smart routing classifies each sub-task and sends it to the cheapest model that can handle it correctly. Simple tasks — formatting, extraction, classification, structured output — route to models priced at $0.25–$0.80 per million tokens. Complex tasks — multi-step reasoning, synthesis, code generation with domain context — route to frontier models at $3–15 per million tokens.

In practice, the ratio of cheap-to-expensive tasks in a typical agentic workflow is roughly 70/30 by token volume. Routing those 70% of tokens to lightweight models at a 10–20x price difference produces a 60–80% reduction in total token spend. The math on Calacanis's number:

  • Unrouted: $300/day → $109,500/year
  • With 60% reduction: $120/day → $43,800/year
  • With 80% reduction: $60/day → $21,900/year

$100K/year becomes $20–40K/year. The agent does the same work. The routing layer handles the difference.

What Rapid Claw does

Smart routing is built into the OpenClaw platform on Rapid Claw. You don't configure it. The routing layer runs automatically and classifies each task before it hits an inference endpoint.

The $29/mo plan includes $20 in AI tokens. Because of routing, that $20 typically covers 3–5x the effective work you'd get spending the same $20 at raw API rates — the cheap tasks are dramatically cheaper, which stretches the budget for the frontier-model tasks that actually need it.

On overages: if you exceed your included token budget, the agent throttles and notifies you. There's no automatic charge. You decide whether to add tokens or let the agent queue. This is intentional — the worst-case scenario for AI cost management is a silent runaway agent billing you at frontier rates until you notice a credit card statement three weeks later. That doesn't happen here.

Who should actually care about this

Solo developers and small teams running 1–5 agents: token costs at this scale are the difference between a project that's financially sustainable and one that quietly gets shut down. A $100K/year cost structure doesn't work on a solo project budget. A $20–40K/year cost structure — or significantly less on the lower end — does.

Enterprise teams running 50+ agents: this is literally a line item. At 50 agents running at Calacanis-scale, you're looking at $5M/year unrouted vs. $1–2M/year with routing. That's a budget conversation that finance will have with or without you — better to have the routing answer ready before they ask.

If you're running zero agents today, the cost discussion is premature. Get one deployed first. But build on infrastructure that has routing from day one, because retrofitting it later is harder than starting with it.

Deploy your first agent with smart routing built in

$29/mo includes $20 in AI tokens. Routing handles cost optimization automatically. Overages throttle and notify — no surprise charges.

Deploy Free — from $29/mo