Pillar Guide · Intermediate

How to Deploy AI Agents: Complete Step-by-Step Guide

Deploying AI agents isn't like deploying a web app. GPU requirements, state management, unpredictable token costs, and long-running processes make it a different beast entirely. This guide covers every step from infrastructure to production monitoring — with real commands, real configs, and real costs.

Tijo Gaucher

April 3, 2026 · 22 min read

  • Steps to deploy: 6
  • Estimated setup time: ~2 hrs
  • Monthly starting cost: $70+

TL;DR

Deploying AI agents requires choosing a deployment model (self-hosted, managed, or hybrid), setting up infrastructure with enough CPU/RAM for browser automation, installing an agent framework like OpenClaw, configuring monitoring, and planning for scale. Budget $70-250/month for a solo setup, or start with Rapid Claw at $29/month to skip the infrastructure entirely.

Want to deploy AI agents in under 2 minutes?

Try Rapid Claw

Why Deploying AI Agents Is Different from Deploying Regular Apps

If you've deployed web apps before, you might assume deploying an AI agent is similar. It isn't. Here's why AI agent deployment is its own discipline:

GPU & Compute Requirements

AI agents need substantially more compute than typical web services. Browser automation, screen capture, and model inference are CPU and memory intensive. Running local models adds GPU requirements on top of that.

State Management

Web apps are mostly stateless — agents are not. Each agent maintains conversation history, browser state, file system context, and tool outputs. A crashed agent can't just restart from scratch. You need persistence and recovery strategies.
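
In practice, persistence starts with keeping database and queue state on named volumes, so a restarted container resumes rather than starts from scratch. A minimal sketch, assuming the compose stack shown later in this guide (service and volume names here are illustrative):

```yaml
# docker-compose.override.yml — named volumes survive container restarts
services:
  openclaw-postgres:
    volumes:
      - openclaw_pg_data:/var/lib/postgresql/data
  openclaw-redis:
    volumes:
      - openclaw_redis_data:/data

volumes:
  openclaw_pg_data:
  openclaw_redis_data:
```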

Unpredictable Costs

A web app's cost scales predictably with traffic. An AI agent's cost scales with task complexity — a single bad prompt can trigger a retry loop that burns through hundreds of dollars in API calls. Cost guardrails aren't optional.

Long-Running Processes

Web requests finish in milliseconds. Agent tasks can run for minutes or hours. Your infrastructure needs to handle long-lived processes, timeouts, graceful shutdowns, and task resumption after failures.
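
At the process level, this usually means enforcing deadlines and giving tasks a chance to checkpoint before being killed. A sketch using GNU coreutils `timeout` (the `sleep 60` stands in for a real agent task):

```shell
# Demo: a "task" that overruns its 2-second deadline.
# timeout sends SIGTERM at the limit, then SIGKILL 5s later if it hangs on.
rc=0
timeout --signal=TERM --kill-after=5 2 sleep 60 || rc=$?
echo "exit=$rc"   # 124 means the deadline was hit
# In production: wrap the real agent task the same way, and treat 124 as
# "checkpoint and resume", not "retry from scratch".
[ "$rc" -eq 124 ] && echo "deadline hit — schedule a resume"
```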

The good news: these challenges are solvable. This guide walks through each one systematically. If you want to understand what AI agents are before diving into deployment, read our introduction to OpenClaw and AI agent platforms first.

Prerequisites — What You Need Before Starting

Before you touch any infrastructure, make sure these are in place. Missing a prerequisite is the most common reason deployments stall on day one.

  • An LLM API key — You need access to at least one LLM provider (OpenAI, Anthropic, Google, or a local model). Anthropic Claude is recommended for complex agent tasks. Budget $50-500/month depending on usage.
  • A framework decision — This guide uses OpenClaw as the reference framework (250k+ GitHub stars, most popular open-source option). The deployment principles apply to any agent framework.
  • Terminal / SSH proficiency — You should be comfortable with command-line operations, SSH, editing config files, and reading logs. If you're not, start with a managed platform and come back to self-hosting later.
  • Docker installed (or willingness to install it) — Containerized deployment is the standard for agent frameworks. Docker handles dependencies, isolation, and reproducibility.
  • A budget — Minimum $70/month for a basic solo setup. See the AI agent cost calculator for detailed estimates based on your specific workload, or read our deep dive on why AI agent token costs can hit $100K/year at scale.

Minimum hardware: 4 CPU cores, 8 GB RAM, 80 GB SSD. For 5+ concurrent agents: 8+ cores, 16+ GB RAM. For local model inference: add an NVIDIA GPU with 16+ GB VRAM (A10G, RTX 4090, or similar).
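
A quick preflight sketch to confirm a Linux box meets these minimums (thresholds mirror the figures above; adjust for your own workload):

```shell
# preflight.sh — check the host against the minimum specs above
cores=$(nproc)
mem_gb=$(awk '/MemTotal/ {printf "%d", $2/1024/1024}' /proc/meminfo)
disk_gb=$(df -BG --output=avail / | tail -1 | tr -dc '0-9')
echo "cores=$cores ram=${mem_gb}GB free_disk=${disk_gb}GB"
[ "$cores" -ge 4 ]   || echo "WARN: fewer than 4 CPU cores"
[ "$mem_gb" -ge 7 ]  || echo "WARN: less than 8 GB RAM"   # /proc rounds down
[ "$disk_gb" -ge 80 ] || echo "WARN: less than 80 GB free disk"
```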

Step 1: Choose Your Deployment Model

This is the most consequential decision you'll make, and it's hard to change later. There are three deployment models, each with real trade-offs:

Self-Hosted

You provision and manage everything: servers, networking, security, updates, backups. Full control, full responsibility. This is the right choice if you have DevOps expertise in-house, need to keep data on-premises for compliance, or want to run local models without sending data to third-party APIs.

Managed Platform

A hosting provider like Rapid Claw handles all infrastructure — you just configure agents and bring your own API keys. Best for teams that want to focus on building agents, not managing servers. You trade some control for dramatically less ops work.

Hybrid

Run sensitive workloads on your own infrastructure while offloading less sensitive or bursty work to a managed platform. Common in enterprise environments where some data can't leave your network but you still want managed scaling for other tasks.

| Factor | Self-Hosted | Managed Platform | Hybrid |
| --- | --- | --- | --- |
| Setup Time | 2-8 hours | Under 5 minutes | 4-12 hours |
| Monthly Cost | $50-200 + API costs | $29-199 + API costs | $100-500 + API costs |
| Ops Burden | High — you own everything | None — fully managed | Medium — split responsibility |
| Data Control | Full — on your hardware | Vendor-dependent | Full — sensitive data stays local |
| Scaling | Manual — you configure it | Automatic | Semi-automatic |
| Best For | Teams with DevOps expertise, strict compliance needs | Solo devs, startups, teams that want to focus on building agents | Enterprise with mixed workloads, regulated industries |

Our recommendation: If you're deploying AI agents for the first time, start with a managed platform to validate your use case. You can always migrate to self-hosted later — our migration guide covers both directions. The worst outcome is spending weeks on infrastructure only to discover your agent design needs rethinking.

Step 2: Infrastructure Setup

If you chose self-hosted or hybrid, this step is where you provision your infrastructure. We'll walk through a VPS-based setup — the most common starting point for teams deploying their first agents.

Provision a VPS

Recommended providers: Hetzner (best price/performance in EU), DigitalOcean, AWS Lightsail, or Vultr. Here's what to select:

# Recommended specs for 2-3 concurrent agents:
# - 4-8 CPU cores (AMD EPYC or Intel Xeon)
# - 8-16 GB RAM
# - 80-160 GB NVMe SSD
# - Ubuntu 22.04 LTS or Debian 12

# Example: Hetzner CPX31 — 4 vCPU, 8GB RAM, 160GB — ~$15/month
# Example: DigitalOcean Premium — 4 vCPU, 8GB RAM, 100GB — ~$48/month

Secure the Server

# Create a non-root user
adduser deploy
usermod -aG sudo deploy

# Set up SSH key authentication
su - deploy
mkdir -p ~/.ssh && chmod 700 ~/.ssh
# Add your public key to ~/.ssh/authorized_keys

# Disable password authentication (the directive is often commented out by default)
sudo sed -i -E 's/^#?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart sshd

# Configure firewall
sudo ufw allow 22/tcp    # SSH
sudo ufw allow 80/tcp    # HTTP
sudo ufw allow 443/tcp   # HTTPS
sudo ufw enable

GPU Allocation (Optional)

If you plan to run local models (Llama, Mistral, etc.) alongside your agent framework, you'll need GPU compute. This is optional if you're using API-based models (OpenAI, Anthropic, etc.).

# Install NVIDIA drivers (Ubuntu 22.04)
sudo apt install -y nvidia-driver-535

# Install NVIDIA Container Toolkit for Docker GPU access
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU access in Docker
docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi

Network Configuration

AI agents make a lot of outbound requests — web browsing, API calls, file downloads. Make sure your network config supports this:

  • Outbound ports 80 and 443 open (most VPS providers allow this by default)
  • DNS resolution working (test with: dig google.com)
  • Domain pointed to your server IP (A record) for HTTPS access
  • Rate limiting configured at the reverse proxy level to prevent runaway agents from hammering external services
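
For the reverse-proxy rate limiting above, a minimal Nginx sketch (zone name and rates are illustrative, not OpenClaw defaults). Note this throttles inbound requests to the agent API; capping what agents do outbound generally requires an egress proxy or the framework's own limits:

```nginx
# /etc/nginx/conf.d/openclaw-ratelimit.conf (illustrative values)
# Allow each client ~10 requests/second, tracked per source IP.
limit_req_zone $binary_remote_addr zone=openclaw_api:10m rate=10r/s;
```

Then add `limit_req zone=openclaw_api burst=20 nodelay;` inside the `location /` block of your site config, and reload with `sudo nginx -t && sudo systemctl reload nginx`.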

For a deeper look at securing your agent's API connections, read our guide on securing AI APIs: best practices.

Step 3: Agent Framework Installation

We'll use OpenClaw as our reference framework. If you're using a different framework, the concepts (containerized deployment, environment config, API key management) are the same — only the specific commands differ.

Install Docker & Docker Compose

# Install Docker Engine (Ubuntu/Debian)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker

# Verify installation
docker --version
docker compose version

Clone and Configure OpenClaw

# Clone the repository
sudo mkdir -p /opt/openclaw && sudo chown $USER:$USER /opt/openclaw
git clone https://github.com/openclaw/openclaw.git /opt/openclaw
cd /opt/openclaw

# Copy the example environment file
cp .env.example .env

Configure Environment Variables

Edit .env with your actual values. These are the critical settings:

# /opt/openclaw/.env

# --- LLM Provider API Keys ---
OPENAI_API_KEY=sk-...your-key-here
ANTHROPIC_API_KEY=sk-ant-...your-key-here

# --- Security ---
# Generate once with `openssl rand -hex 32` and paste the result here;
# .env files are read literally, so $(...) command substitution won't run.
OPENCLAW_SECRET_KEY=your-64-char-hex-string

# --- Display Settings ---
OPENCLAW_RESOLUTION=1920x1080
OPENCLAW_COLOR_DEPTH=24

# --- Performance ---
OPENCLAW_MAX_CONCURRENT_AGENTS=3
OPENCLAW_SMART_ROUTING=true        # Route simple tasks to cheaper models
OPENCLAW_TASK_TIMEOUT=3600         # Max seconds per task (1 hour)

# --- Cost Controls ---
OPENCLAW_MAX_TOKENS_PER_TASK=500000
OPENCLAW_DAILY_SPEND_LIMIT=50     # USD - critical safety net

# --- Networking ---
OPENCLAW_PORT=3000
OPENCLAW_HOST=0.0.0.0

Critical: Set OPENCLAW_DAILY_SPEND_LIMIT before your first deployment. Without it, a misbehaving agent can rack up hundreds of dollars in API calls within hours. We've seen it happen — set the limit, test, then raise it gradually.

Set Up SSL with Nginx

# Install Nginx and Certbot
sudo apt install -y nginx certbot python3-certbot-nginx

# Create Nginx config
sudo tee /etc/nginx/sites-available/openclaw > /dev/null <<'NGINX'
server {
    server_name agents.yourdomain.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (required for live agent view)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        # Long timeouts for agent tasks
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
    }
}
NGINX

# Enable and get SSL
sudo ln -s /etc/nginx/sites-available/openclaw /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
sudo certbot --nginx -d agents.yourdomain.com

Step 4: Deploy Your First Agent

Everything is configured. Time to bring it up and run your first agent task.

Start the Stack

cd /opt/openclaw

# Pull images and start all services
docker compose pull
docker compose up -d

# Verify all containers are running
docker compose ps

# Expected output:
# NAME                STATUS
# openclaw-app        running (healthy)
# openclaw-browser    running (healthy)
# openclaw-redis      running (healthy)
# openclaw-postgres   running (healthy)

Verify the Dashboard

Open https://agents.yourdomain.com in your browser. You should see the OpenClaw dashboard. Log in with the secret key from your .env file.
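
Before opening the browser, you can sanity-check from the server with a small polling helper (a generic sketch; the URL and timing are whatever fits your setup):

```shell
# wait_http URL [TIMEOUT_SECONDS] — poll until the URL answers or time runs out
wait_http() {
  url=$1; limit=${2:-60}; waited=0
  until curl -fs -o /dev/null "$url"; do
    waited=$((waited + 2))
    [ "$waited" -ge "$limit" ] && return 1
    sleep 2
  done
}
# Usage: wait_http http://127.0.0.1:3000 60 && echo "dashboard is up"
```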

Create Your First Task

Start simple to validate everything works end-to-end:

  1. Create a new agent from the dashboard.
  2. Give it a simple task: "Go to news.ycombinator.com and summarize the top 5 stories."
  3. Watch the live view. You'll see the agent's browser, its reasoning, and each action in real time.
  4. Check the logs if anything fails: docker compose logs -f openclaw-app

# You can also create agents via the CLI:
docker compose exec openclaw-app openclaw agent create \
  --name "hn-summarizer" \
  --task "Go to news.ycombinator.com and summarize the top 5 stories" \
  --model "claude-sonnet-4-6" \
  --max-tokens 50000

If that works, congratulations — you have a production AI agent. Now let's make sure it stays healthy.

Step 5: Monitoring & Observability

Running agents without monitoring is like driving without a dashboard. You need visibility into what your agents are doing, how much they're costing, and when things go wrong. For a comprehensive deep-dive, see our guide on AI agent observability.

Essential Metrics to Track

Agent Health
  • Task success / failure rate
  • Average task completion time
  • Agent crash / restart count
  • Queue depth and wait times
Cost Tracking
  • Tokens consumed per task
  • Daily / weekly spend by agent
  • Cost per successful task
  • Spend rate alerts and limits
Infrastructure
  • CPU and memory utilization
  • Disk usage and I/O
  • Network throughput
  • Container health checks
Security
  • Failed authentication attempts
  • Unusual outbound traffic patterns
  • API key usage anomalies
  • Agent permission violations

Structured Logging Setup

# docker-compose.override.yml — add logging config
services:
  openclaw-app:
    logging:
      driver: "json-file"
      options:
        max-size: "50m"
        max-file: "5"
        tag: "openclaw-app"
    environment:
      - OPENCLAW_LOG_LEVEL=info
      - OPENCLAW_LOG_FORMAT=json    # Structured JSON for log aggregation
      - OPENCLAW_METRICS_PORT=9090  # Prometheus metrics endpoint

# Basic Prometheus scrape config (prometheus.yml)
scrape_configs:
  - job_name: 'openclaw'
    scrape_interval: 15s
    static_configs:
      - targets: ['openclaw-app:9090']

# Then visualize in Grafana with dashboards for:
# - Agent task throughput and latency
# - Token consumption over time
# - Error rates by agent and task type
# - Infrastructure resource usage

Step 6: Scaling Your Deployment

A single server gets you started, but production workloads need scaling strategies. Here's how to grow from one server to a fleet.

Vertical Scaling (Quick Win)

The simplest path: upgrade your server. Going from 4 cores / 8GB to 8 cores / 32GB typically doubles your concurrent agent capacity. Most VPS providers let you resize with minimal downtime.

Horizontal Scaling

When a single server isn't enough, add more servers behind a load balancer:

# Architecture for horizontal scaling:
#
# [Load Balancer (Nginx/HAProxy)]
#     ├── [Agent Server 1] ── 3 concurrent agents
#     ├── [Agent Server 2] ── 3 concurrent agents
#     └── [Agent Server 3] ── 3 concurrent agents
#           │
#           └── [Shared Redis]     ← task queue & state
#           └── [Shared Postgres]  ← persistence
#
# Each agent server runs its own OpenClaw instance.
# Redis handles task distribution and inter-agent communication.
# Postgres stores task history, agent configs, and results.
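
The load-balancer piece of that diagram can be sketched in Nginx (hostnames are placeholders; `least_conn` is a sensible assumption here because long-running agent tasks make round-robin uneven):

```nginx
# /etc/nginx/conf.d/openclaw-upstream.conf (hostnames are placeholders)
upstream openclaw_pool {
    least_conn;                   # send new work to the least-busy node
    server agent1.internal:3000;
    server agent2.internal:3000;
    server agent3.internal:3000;
}
```

In the Step 3 site config, point `proxy_pass` at `http://openclaw_pool` instead of `http://127.0.0.1:3000`.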

Multi-Agent Orchestration

At scale, you're not just running more agents — you're coordinating them. OpenClaw supports multi-agent patterns:

  • Task queues — agents pull tasks from a shared Redis queue, so work distributes automatically across servers.
  • Agent specialization — dedicate some agents to web research, others to data processing, others to email management. Route tasks based on type.
  • Sub-agent delegation — a coordinator agent breaks down complex tasks and delegates subtasks to specialized worker agents.
  • Auto-scaling — use container orchestration (Kubernetes, Docker Swarm) to spin up additional agent instances when queue depth exceeds thresholds.

# Kubernetes HPA example for auto-scaling agents
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: openclaw-agents
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: openclaw-agent
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: redis_queue_depth
        target:
          type: AverageValue
          averageValue: "5"    # Scale up when >5 tasks queued per pod

For enterprise-grade scaling patterns, see our enterprise AI agent deployment guide.

Cost Breakdown: What AI Agent Deployment Actually Costs

The #1 question we hear: "How much will this cost me?" Here's a realistic breakdown across three deployment scales. For a personalized estimate, use our AI agent cost calculator.

| Cost Category | Hobby / Solo | Startup (5-10 agents) | Scale (50+ agents) |
| --- | --- | --- | --- |
| Infrastructure | $20-50/mo | $100-300/mo | $500-2,000/mo |
| LLM API Costs | $50-200/mo | $500-2,000/mo | $5,000-20,000+/mo |
| GPU (if local models) | $0 (API only) | $200-800/mo | $2,000-10,000/mo |
| Monitoring / Logging | $0 (self-managed) | $50-150/mo | $200-500/mo |
| DevOps Time | 2-4 hrs/week | 10-20 hrs/week | Full-time hire |
| Total Estimated | $70-250/mo | $850-3,250/mo | $7,700-32,500+/mo |

LLM API costs are the wildcard. A single agent running complex multi-step tasks can consume 500K-2M tokens per day. Read our deep dive on why AI agent token costs hit $100K/year to understand where the money goes and how to optimize.
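
To make that concrete, a back-of-envelope sketch (every number below is an illustrative assumption, not a quoted price):

```shell
# One busy agent at ~1M tokens/day, blended price $10 per million tokens.
tokens_per_day=1000000
price_per_mtok=10                          # USD, illustrative
daily=$(( tokens_per_day / 1000000 * price_per_mtok ))
monthly=$(( daily * 30 ))
echo "baseline: ~\$$daily/day, ~\$$monthly/month"
# Smart routing: 60% of traffic moves to a model at 1/5 the price.
routed=$(( (monthly * 40 + monthly * 60 / 5) / 100 ))
echo "with routing: ~\$$routed/month ($(( (monthly - routed) * 100 / monthly ))% less)"
```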

Cost optimization tip: Enable smart routing (OPENCLAW_SMART_ROUTING=true) to automatically send simple tasks to cheaper models and complex tasks to capable ones. Teams using smart routing report 30-50% cost reductions with no impact on task quality.

Top 5 Deployment Mistakes (and How to Avoid Them)

1. No spend limits on API keys

We've seen teams burn $500+ in a single afternoon from a retry loop. Always set OPENCLAW_DAILY_SPEND_LIMIT and configure alerts at 50% and 80% of your budget. Additionally, set spending limits directly with your LLM provider.

2. Undersizing the server

AI agents are resource-hungry. A 2-core/4GB server will crash under browser automation load. Start with at least 4 cores / 8GB, and monitor resource usage in the first week. The OOM killer is not your friend.

3. Skipping security basics

An AI agent with internet access and your API keys is a high-value target. At minimum: enable HTTPS, disable password SSH, configure a firewall, and rotate API keys quarterly. Read our security best practices guide before going to production.

4. No monitoring until something breaks

"It was working fine" is not a monitoring strategy. Set up logging, metrics, and alerts from day one. You need to know about failures before your users do — not when they file a support ticket.

5. Over-engineering the first deployment

You don't need Kubernetes, multi-region failover, and a custom orchestration layer on day one. Start with a single server, one agent, one task. Validate it works. Then scale. The teams that move fastest start simple and add complexity only when the workload demands it.


Ready to Deploy AI Agents?

Whether you self-host or go managed, you now have the complete playbook for deploying AI agents in production. If you want to skip the infrastructure and get straight to building — Rapid Claw deploys a fully configured OpenClaw instance in under 2 minutes.