How to Deploy AI Agents: Complete Step-by-Step Guide
Deploying AI agents isn't like deploying a web app. GPU requirements, state management, unpredictable token costs, and long-running processes make it a different beast entirely. This guide covers every step from infrastructure to production monitoring — with real commands, real configs, and real costs.
April 3, 2026 · 22 min read
6 steps to deploy · ~2 hrs estimated setup time · $70+ monthly starting cost
TL;DR
Deploying AI agents requires choosing a deployment model (self-hosted, managed, or hybrid), setting up infrastructure with enough CPU/RAM for browser automation, installing an agent framework like OpenClaw, configuring monitoring, and planning for scale. Budget $70-250/month for a solo setup, or start with Rapid Claw at $29/month to skip the infrastructure entirely.
Want to deploy AI agents in under 2 minutes?
Try Rapid Claw

Why Deploying AI Agents Is Different from Deploying Regular Apps
If you've deployed web apps before, you might assume deploying an AI agent is similar. It isn't. Here's why AI agent deployment is its own discipline:
GPU & Compute Requirements
AI agents need substantially more compute than typical web services. Browser automation, screen capture, and model inference are CPU and memory intensive. Running local models adds GPU requirements on top of that.
State Management
Web apps are mostly stateless — agents are not. Each agent maintains conversation history, browser state, file system context, and tool outputs. A crashed agent can't just restart from scratch. You need persistence and recovery strategies.
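In practice, the persistence strategy can start as simple as checkpointing agent state to disk after every completed step, so a restart resumes instead of replaying the whole task. A minimal sketch; the `AgentState` shape, file layout, and function names are illustrative, not part of any framework's API:

```python
import json
from pathlib import Path

CHECKPOINT_DIR = Path("checkpoints")  # hypothetical location


def save_checkpoint(agent_id: str, state: dict) -> Path:
    """Persist agent state atomically so a crash mid-write can't corrupt it."""
    CHECKPOINT_DIR.mkdir(exist_ok=True)
    path = CHECKPOINT_DIR / f"{agent_id}.json"
    tmp = path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)  # atomic rename on POSIX filesystems
    return path


def load_checkpoint(agent_id: str):
    """Resume from the last completed step, or None for a fresh start."""
    path = CHECKPOINT_DIR / f"{agent_id}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text())


# On restart, the runner resumes instead of starting from scratch:
state = load_checkpoint("agent-7") or {"step": 0, "history": []}
state["step"] += 1
state["history"].append("visited example.com")
save_checkpoint("agent-7", state)
```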
Unpredictable Costs
A web app's cost scales predictably with traffic. An AI agent's cost scales with task complexity — a single bad prompt can trigger a retry loop that burns through hundreds of dollars in API calls. Cost guardrails aren't optional.
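The guardrail against retry loops is conceptually small: bound the attempts, bound the spend, and stop hard on whichever limit trips first. A sketch under assumed interfaces (the `call` convention of returning a result-or-None plus the attempt's cost is invented for illustration):

```python
import time


class BudgetExceeded(RuntimeError):
    pass


def run_with_guardrails(call, max_attempts=3, max_cost_usd=5.0, base_delay=1.0):
    """Retry a flaky LLM call with exponential backoff, stopping hard
    when either the attempt cap or the cost cap is hit."""
    spent = 0.0
    for attempt in range(max_attempts):
        result, cost = call()  # assumed to return (result_or_None, cost_in_usd)
        spent += cost
        if spent > max_cost_usd:
            raise BudgetExceeded(f"spent ${spent:.2f} > cap ${max_cost_usd:.2f}")
        if result is not None:
            return result, spent
        time.sleep(base_delay * 2 ** attempt)  # back off before retrying
    raise RuntimeError(f"gave up after {max_attempts} attempts (${spent:.2f} spent)")


# Demo with a fake call that fails twice then succeeds:
attempts = iter([(None, 0.10), (None, 0.10), ("summary text", 0.10)])
result, spent = run_with_guardrails(lambda: next(attempts), base_delay=0)
```

Without the cost cap, a call that keeps "almost" succeeding retries until your card limit intervenes; with it, the failure is loud and cheap.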
Long-Running Processes
Web requests finish in milliseconds. Agent tasks can run for minutes or hours. Your infrastructure needs to handle long-lived processes, timeouts, graceful shutdowns, and task resumption after failures.
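The deadline half of this can be sketched with a worker pool and a hard timeout; on expiry the task goes back to the queue rather than vanishing. This is a thread-based sketch only (names are illustrative): truly cancelling a stuck browser session needs process-level kill, which threads cannot do.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as TaskTimeout


def run_task_with_timeout(task_fn, timeout_s):
    """Run a long-lived agent task with a hard deadline; on timeout,
    signal the caller to requeue it for later resumption."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(task_fn)
        try:
            return ("done", future.result(timeout=timeout_s))
        except TaskTimeout:
            return ("requeue", None)  # hand back to the queue, don't drop it


# A task that overruns its deadline gets flagged for requeue:
status, _ = run_task_with_timeout(lambda: time.sleep(0.2) or "report", timeout_s=0.05)
# A fast task completes normally:
fast_status, value = run_task_with_timeout(lambda: "report", timeout_s=1.0)
```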
The good news: these challenges are solvable. This guide walks through each one systematically. If you want to understand what AI agents are before diving into deployment, read our introduction to OpenClaw and AI agent platforms first.
Prerequisites — What You Need Before Starting
Before you touch any infrastructure, make sure these are in place. Missing a prerequisite is the most common reason deployments stall on day one.
- An LLM API key — You need access to at least one LLM provider (OpenAI, Anthropic, Google, or a local model). Anthropic Claude is recommended for complex agent tasks. Budget $50-500/month depending on usage.
- A framework decision — This guide uses OpenClaw as the reference framework (250k+ GitHub stars, most popular open-source option). The deployment principles apply to any agent framework.
- Terminal / SSH proficiency — You should be comfortable with command-line operations, SSH, editing config files, and reading logs. If you're not, start with a managed platform and come back to self-hosting later.
- Docker installed (or willingness to install it) — Containerized deployment is the standard for agent frameworks. Docker handles dependencies, isolation, and reproducibility.
- A budget — Minimum $70/month for a basic solo setup. See the AI agent cost calculator for detailed estimates based on your specific workload, or read our deep dive on why AI agent token costs can hit $100K/year at scale.
Minimum hardware: 4 CPU cores, 8 GB RAM, 80 GB SSD. For 5+ concurrent agents: 8+ cores, 16+ GB RAM. For local model inference: add an NVIDIA GPU with 16+ GB VRAM (A10G, RTX 4090, or similar).
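A quick way to sanity-check a box against those minimums before installing anything. Stdlib-only sketch; the thresholds mirror the numbers above, and the RAM check works on Linux only:

```python
import os
import shutil

MIN_CORES, MIN_RAM_GB, MIN_DISK_GB = 4, 8, 80  # minimums from the text above


def preflight(path: str = "/") -> list:
    """Return a list of hardware warnings; empty means the box meets the minimums."""
    warnings = []
    cores = os.cpu_count() or 0
    if cores < MIN_CORES:
        warnings.append(f"only {cores} CPU cores (want >= {MIN_CORES})")
    # RAM via sysconf is Linux-specific; skip the check elsewhere
    if hasattr(os, "sysconf") and "SC_PHYS_PAGES" in os.sysconf_names:
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
        if ram_gb < MIN_RAM_GB:
            warnings.append(f"only {ram_gb:.1f} GB RAM (want >= {MIN_RAM_GB})")
    disk_gb = shutil.disk_usage(path).free / 1e9
    if disk_gb < MIN_DISK_GB:
        warnings.append(f"only {disk_gb:.0f} GB free disk (want >= {MIN_DISK_GB})")
    return warnings


issues = preflight()
```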
Step 1: Choose Your Deployment Model
This is the most consequential decision you'll make, and it's hard to change later. There are three deployment models, each with real trade-offs:
Self-Hosted
You provision and manage everything: servers, networking, security, updates, backups. Full control, full responsibility. This is the right choice if you have DevOps expertise in-house, need to keep data on-premises for compliance, or want to run local models without sending data to third-party APIs.
Managed Platform
A hosting provider like Rapid Claw handles all infrastructure — you just configure agents and bring your own API keys. Best for teams that want to focus on building agents, not managing servers. You trade some control for dramatically less ops work.
Hybrid
Run sensitive workloads on your own infrastructure while offloading less sensitive or bursty work to a managed platform. Common in enterprise environments where some data can't leave your network but you still want managed scaling for other tasks.
| Factor | Self-Hosted | Managed Platform | Hybrid |
|---|---|---|---|
| Setup Time | 2-8 hours | Under 5 minutes | 4-12 hours |
| Monthly Cost | $50-200 + API costs | $29-199 + API costs | $100-500 + API costs |
| Ops Burden | High — you own everything | None — fully managed | Medium — split responsibility |
| Data Control | Full — on your hardware | Vendor-dependent | Full — sensitive data stays local |
| Scaling | Manual — you configure it | Automatic | Semi-automatic |
| Best For | Teams with DevOps expertise, strict compliance needs | Solo devs, startups, teams that want to focus on building agents | Enterprise with mixed workloads, regulated industries |
Our recommendation: If you're deploying AI agents for the first time, start with a managed platform to validate your use case. You can always migrate to self-hosted later — our migration guide covers both directions. The worst outcome is spending weeks on infrastructure only to discover your agent design needs rethinking.
Step 2: Infrastructure Setup
If you chose self-hosted or hybrid, this step is where you provision your infrastructure. We'll walk through a VPS-based setup — the most common starting point for teams deploying their first agents.
Provision a VPS
Recommended providers: Hetzner (best price/performance in EU), DigitalOcean, AWS Lightsail, or Vultr. Here's what to select:
# Recommended specs for 2-3 concurrent agents:
# - 4-8 CPU cores (AMD EPYC or Intel Xeon)
# - 8-16 GB RAM
# - 80-160 GB NVMe SSD
# - Ubuntu 22.04 LTS or Debian 12

# Example: Hetzner CPX31 — 4 vCPU, 8GB RAM, 160GB — ~$15/month
# Example: DigitalOcean Premium — 4 vCPU, 8GB RAM, 100GB — ~$48/month
Secure the Server
# Create a non-root user
adduser deploy
usermod -aG sudo deploy

# Set up SSH key authentication
su - deploy
mkdir -p ~/.ssh && chmod 700 ~/.ssh
# Add your public key to ~/.ssh/authorized_keys

# Disable password authentication
sudo sed -i 's/PasswordAuthentication yes/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart sshd

# Configure firewall
sudo ufw allow 22/tcp   # SSH
sudo ufw allow 80/tcp   # HTTP
sudo ufw allow 443/tcp  # HTTPS
sudo ufw enable
GPU Allocation (Optional)
If you plan to run local models (Llama, Mistral, etc.) alongside your agent framework, you'll need GPU compute. This is optional if you're using API-based models (OpenAI, Anthropic, etc.).
# Install NVIDIA drivers (Ubuntu 22.04)
sudo apt install -y nvidia-driver-535

# Install NVIDIA Container Toolkit for Docker GPU access
# (the Docker steps below require Docker itself, installed in Step 3)
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU access in Docker
docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi
Network Configuration
AI agents make a lot of outbound requests — web browsing, API calls, file downloads. Make sure your network config supports this:
- Outbound ports 80 and 443 open (most VPS providers allow this by default)
- DNS resolution working (test with: dig google.com)
- Domain pointed to your server IP (A record) for HTTPS access
- Rate limiting configured at the proxy layer — the reverse proxy limits inbound traffic to your dashboard and API, while throttling a runaway agent's outbound requests to external services requires a forward proxy
For a deeper look at securing your agent's API connections, read our guide on securing AI APIs: best practices.
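The inbound half of that rate limiting can live in the Nginx config from Step 3. A hedged example; the zone name and the 10 r/s figure are starting-point assumptions, not recommendations for every workload:

```nginx
# In /etc/nginx/nginx.conf, inside the http block: define a shared limit zone
limit_req_zone $binary_remote_addr zone=agents:10m rate=10r/s;

# In the server block from Step 3:
location / {
    limit_req zone=agents burst=20 nodelay;  # absorb short spikes, reject floods
    proxy_pass http://127.0.0.1:3000;
}
```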
Step 3: Agent Framework Installation
We'll use OpenClaw as our reference framework. If you're using a different framework, the concepts (containerized deployment, environment config, API key management) are the same — only the specific commands differ.
Install Docker & Docker Compose
# Install Docker Engine (Ubuntu/Debian)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker

# Verify installation
docker --version
docker compose version
Clone and Configure OpenClaw
# Clone the repository
sudo mkdir -p /opt/openclaw && sudo chown $USER:$USER /opt/openclaw
git clone https://github.com/openclaw/openclaw.git /opt/openclaw
cd /opt/openclaw

# Copy the example environment file
cp .env.example .env
Configure Environment Variables
Edit .env with your actual values. These are the critical settings:
# /opt/openclaw/.env

# --- LLM Provider API Keys ---
OPENAI_API_KEY=sk-...your-key-here
ANTHROPIC_API_KEY=sk-ant-...your-key-here

# --- Security ---
# Generate with: openssl rand -hex 32
# (command substitution does not execute inside a .env file — paste the output)
OPENCLAW_SECRET_KEY=paste-generated-hex-here

# --- Display Settings ---
OPENCLAW_RESOLUTION=1920x1080
OPENCLAW_COLOR_DEPTH=24

# --- Performance ---
OPENCLAW_MAX_CONCURRENT_AGENTS=3
OPENCLAW_SMART_ROUTING=true    # Route simple tasks to cheaper models
OPENCLAW_TASK_TIMEOUT=3600     # Max seconds per task (1 hour)

# --- Cost Controls ---
OPENCLAW_MAX_TOKENS_PER_TASK=500000
OPENCLAW_DAILY_SPEND_LIMIT=50  # USD - critical safety net

# --- Networking ---
OPENCLAW_PORT=3000
OPENCLAW_HOST=0.0.0.0
Critical: Set OPENCLAW_DAILY_SPEND_LIMIT before your first deployment. Without it, a misbehaving agent can rack up hundreds of dollars in API calls within hours. We've seen it happen — set the limit, test, then raise it gradually.
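One way to make the limit non-optional is a preflight check in whatever wrapper script starts the stack: refuse to boot if the variable is missing or zero. A sketch; the file name and parsing are illustrative, not part of OpenClaw:

```python
from pathlib import Path


def check_spend_limit(env_path: str) -> float:
    """Return the configured daily spend limit, or abort if it isn't set."""
    for line in Path(env_path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if line.startswith("OPENCLAW_DAILY_SPEND_LIMIT="):
            value = float(line.split("=", 1)[1])
            if value > 0:
                return value
    raise SystemExit("refusing to start: OPENCLAW_DAILY_SPEND_LIMIT is not set")


# Demo against a throwaway env file:
Path("demo.env").write_text("OPENCLAW_DAILY_SPEND_LIMIT=50  # USD\n")
limit = check_spend_limit("demo.env")
```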
Set Up SSL with Nginx
# Install Nginx and Certbot
sudo apt install -y nginx certbot python3-certbot-nginx
# Create Nginx config
sudo tee /etc/nginx/sites-available/openclaw > /dev/null <<'NGINX'
server {
server_name agents.yourdomain.com;
location / {
proxy_pass http://127.0.0.1:3000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support (required for live agent view)
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
# Long timeouts for agent tasks
proxy_read_timeout 3600s;
proxy_send_timeout 3600s;
}
}
NGINX
# Enable and get SSL
sudo ln -s /etc/nginx/sites-available/openclaw /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
sudo certbot --nginx -d agents.yourdomain.com

Step 4: Deploy Your First Agent
Everything is configured. Time to bring it up and run your first agent task.
Start the Stack
cd /opt/openclaw

# Pull images and start all services
docker compose pull
docker compose up -d

# Verify all containers are running
docker compose ps

# Expected output:
# NAME                STATUS
# openclaw-app        running (healthy)
# openclaw-browser    running (healthy)
# openclaw-redis      running (healthy)
# openclaw-postgres   running (healthy)
Verify the Dashboard
Open https://agents.yourdomain.com in your browser. You should see the OpenClaw dashboard. Log in with the secret key from your .env file.
Create Your First Task
Start simple to validate everything works end-to-end:
1. Create a new agent from the dashboard.
2. Give it a simple task: "Go to news.ycombinator.com and summarize the top 5 stories."
3. Watch the live view. You'll see the agent's browser, its reasoning, and each action in real time.
4. Check the logs if anything fails:
docker compose logs -f openclaw-app
# You can also create agents via the CLI:
docker compose exec openclaw-app openclaw agent create \
  --name "hn-summarizer" \
  --task "Go to news.ycombinator.com and summarize the top 5 stories" \
  --model "claude-sonnet-4-6" \
  --max-tokens 50000
If that works, congratulations — you have a production AI agent. Now let's make sure it stays healthy.
Step 5: Monitoring & Observability
Running agents without monitoring is like driving without a dashboard. You need visibility into what your agents are doing, how much they're costing, and when things go wrong. For a comprehensive deep-dive, see our guide on AI agent observability.
Essential Metrics to Track
Reliability:
- Task success / failure rate
- Average task completion time
- Agent crash / restart count
- Queue depth and wait times

Cost:
- Tokens consumed per task
- Daily / weekly spend by agent
- Cost per successful task
- Spend rate alerts and limits

Infrastructure:
- CPU and memory utilization
- Disk usage and I/O
- Network throughput
- Container health checks

Security:
- Failed authentication attempts
- Unusual outbound traffic patterns
- API key usage anomalies
- Agent permission violations
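Metrics only pay off with alerts attached. Here is a sketch of Prometheus alerting rules against the metrics endpoint configured below; the metric names are assumptions for illustration, not a documented OpenClaw exporter schema:

```yaml
# alerts.yml — load via rule_files in prometheus.yml
groups:
  - name: agent-alerts
    rules:
      - alert: HighTaskFailureRate
        expr: rate(openclaw_tasks_failed_total[15m]) / rate(openclaw_tasks_total[15m]) > 0.2
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "More than 20% of agent tasks failing over 15 minutes"
      - alert: SpendNearDailyLimit
        expr: openclaw_daily_spend_usd > 0.8 * openclaw_daily_spend_limit_usd
        labels:
          severity: warn
        annotations:
          summary: "Daily LLM spend past 80% of the configured limit"
```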
Structured Logging Setup
# docker-compose.override.yml — add logging config
services:
  openclaw-app:
    logging:
      driver: "json-file"
      options:
        max-size: "50m"
        max-file: "5"
        tag: "openclaw-app"
    environment:
      - OPENCLAW_LOG_LEVEL=info
      - OPENCLAW_LOG_FORMAT=json    # Structured JSON for log aggregation
      - OPENCLAW_METRICS_PORT=9090  # Prometheus metrics endpoint

# Basic Prometheus scrape config (prometheus.yml)
scrape_configs:
  - job_name: 'openclaw'
    scrape_interval: 15s
    static_configs:
      - targets: ['openclaw-app:9090']
# Then visualize in Grafana with dashboards for:
# - Agent task throughput and latency
# - Token consumption over time
# - Error rates by agent and task type
# - Infrastructure resource usage

Step 6: Scaling Your Deployment
A single server gets you started, but production workloads need scaling strategies. Here's how to grow from one server to a fleet.
Vertical Scaling (Quick Win)
The simplest path: upgrade your server. Going from 4 cores / 8GB to 8 cores / 32GB typically doubles your concurrent agent capacity. Most VPS providers let you resize with minimal downtime.
Horizontal Scaling
When a single server isn't enough, add more servers behind a load balancer:
# Architecture for horizontal scaling:
#
# [Load Balancer (Nginx/HAProxy)]
#   ├── [Agent Server 1] ── 3 concurrent agents
#   ├── [Agent Server 2] ── 3 concurrent agents
#   └── [Agent Server 3] ── 3 concurrent agents
#         │
#         ├── [Shared Redis]    ← task queue & state
#         └── [Shared Postgres] ← persistence
#
# Each agent server runs its own OpenClaw instance.
# Redis handles task distribution and inter-agent communication.
# Postgres stores task history, agent configs, and results.
Multi-Agent Orchestration
At scale, you're not just running more agents — you're coordinating them. OpenClaw supports multi-agent patterns:
- Task queues — agents pull tasks from a shared Redis queue, so work distributes automatically across servers.
- Agent specialization — dedicate some agents to web research, others to data processing, others to email management. Route tasks based on type.
- Sub-agent delegation — a coordinator agent breaks down complex tasks and delegates subtasks to specialized worker agents.
- Auto-scaling — use container orchestration (Kubernetes, Docker Swarm) to spin up additional agent instances when queue depth exceeds thresholds.
# Kubernetes HPA example for auto-scaling agents
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: openclaw-agents
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: openclaw-agent
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: redis_queue_depth
        target:
          type: AverageValue
          averageValue: "5"  # Scale up when >5 tasks queued per pod

For enterprise-grade scaling patterns, see our enterprise AI agent deployment guide.
Cost Breakdown: What AI Agent Deployment Actually Costs
The #1 question we hear: "How much will this cost me?" Here's a realistic breakdown across three deployment scales. For a personalized estimate, use our AI agent cost calculator.
| Cost Category | Hobby / Solo | Startup (5-10 agents) | Scale (50+ agents) |
|---|---|---|---|
| Infrastructure | $20-50/mo | $100-300/mo | $500-2,000/mo |
| LLM API Costs | $50-200/mo | $500-2,000/mo | $5,000-20,000+/mo |
| GPU (if local models) | $0 (API only) | $200-800/mo | $2,000-10,000/mo |
| Monitoring / Logging | $0 (self-managed) | $50-150/mo | $200-500/mo |
| DevOps Time | 2-4 hrs/week | 10-20 hrs/week | Full-time hire |
| Total Estimated | $70-250/mo | $850-3,250/mo | $7,700-32,500+/mo |
LLM API costs are the wildcard. A single agent running complex multi-step tasks can consume 500K-2M tokens per day. Read our deep dive on why AI agent token costs hit $100K/year to understand where the money goes and how to optimize.
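The arithmetic behind that range is worth internalizing. A quick sketch using an illustrative blended price of $5 per million tokens (an assumption for the math, not a quote from any provider):

```python
def monthly_api_cost(tokens_per_day: float, usd_per_million_tokens: float,
                     days: int = 30) -> float:
    """Project a monthly API bill from daily token consumption."""
    return tokens_per_day * usd_per_million_tokens / 1_000_000 * days


# The 500K-2M tokens/day range from the text:
low = monthly_api_cost(500_000, 5.0)     # 500K tokens/day
high = monthly_api_cost(2_000_000, 5.0)  # 2M tokens/day
```

At those assumptions, a single busy agent lands between $75 and $300 per month in API costs alone, before infrastructure.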
Cost optimization tip: Enable smart routing (OPENCLAW_SMART_ROUTING=true) to automatically send simple tasks to cheaper models and complex tasks to capable ones. Teams using smart routing report 30-50% cost reductions with no impact on task quality.
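The idea behind smart routing can be approximated in a few lines: classify each task, then pick a model tier. A toy sketch; the heuristic and model names are placeholders, not OpenClaw's actual routing logic:

```python
CHEAP_MODEL = "small-fast-model"      # placeholder model names
CAPABLE_MODEL = "large-capable-model"


def route(task: str) -> str:
    """Crude complexity heuristic: multi-step or long tasks go to the big model."""
    multi_step = any(kw in task.lower() for kw in ("then", "compare", "analyze"))
    return CAPABLE_MODEL if multi_step or len(task) > 200 else CHEAP_MODEL


cheap = route("Fetch the page title")
capable = route("Fetch the page, then compare prices across three vendors")
```

Production routers use better signals (token estimates, tool requirements, past success rates), but the cost win comes from the same place: most tasks are simple and don't need the most expensive model.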
Top 5 Deployment Mistakes (and How to Avoid Them)
1. No spend limits on API keys
We've seen teams burn $500+ in a single afternoon from a retry loop. Always set OPENCLAW_DAILY_SPEND_LIMIT and configure alerts at 50% and 80% of your budget. Additionally, set spending limits directly with your LLM provider.
2. Undersizing the server
AI agents are resource-hungry. A 2-core/4GB server will crash under browser automation load. Start with at least 4 cores / 8GB, and monitor resource usage in the first week. The OOM killer is not your friend.
3. Skipping security basics
An AI agent with internet access and your API keys is a high-value target. At minimum: enable HTTPS, disable password SSH, configure a firewall, and rotate API keys quarterly. Read our security best practices guide before going to production.
4. No monitoring until something breaks
"It was working fine" is not a monitoring strategy. Set up logging, metrics, and alerts from day one. You need to know about failures before your users do — not when they file a support ticket.
5. Over-engineering the first deployment
You don't need Kubernetes, multi-region failover, and a custom orchestration layer on day one. Start with a single server, one agent, one task. Validate it works. Then scale. The teams that move fastest start simple and add complexity only when the workload demands it.
Ready to Deploy AI Agents?
Whether you self-host or go managed, you now have the complete playbook for deploying AI agents in production. If you want to skip the infrastructure and get straight to building — Rapid Claw deploys a fully configured OpenClaw instance in under 2 minutes.