You’ve built an amazing AI application locally. Now you need to deploy it. Simple, right?

Except your laptop has 64GB RAM, local model files, cached embeddings, and environment variables scattered across three different files. Production has… none of that.

Here’s how to bridge the gap between “works on my machine” and “running reliably in production.”

The Production Deployment Checklist

Before we dive into specifics, here’s what you need:

Component            Purpose                          Example Tools
Containerization     Reproducible environments        Docker, Podman
Orchestration        Manage multiple containers       Docker Compose, Kubernetes
Reverse Proxy        Handle HTTPS, routing            Caddy, Nginx, Traefik
CI/CD                Automated testing & deployment   GitHub Actions, GitLab CI
Secrets Management   Secure API keys, passwords       Vault, AWS Secrets Manager
Monitoring           Know when things break           Grafana, Datadog, Sentry
Logging              Debug production issues          Loki, CloudWatch, Better Stack

Step 1: Dockerize Your Application

Docker ensures your app runs the same everywhere. Here’s a production-ready Dockerfile for a Python AI application:

# Multi-stage build for smaller images
FROM python:3.11-slim as builder

WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Final stage
FROM python:3.11-slim

WORKDIR /app

# Don't run as root
RUN useradd -m appuser

# Copy only what we need from the builder, owned by the app user
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
COPY --chown=appuser:appuser . .

# Make sure scripts are in PATH
ENV PATH=/home/appuser/.local/bin:$PATH

USER appuser

# Health check (raise_for_status makes non-2xx responses fail the check)
HEALTHCHECK --interval=30s --timeout=3s \
  CMD python -c "import requests; requests.get('http://localhost:8000/health').raise_for_status()"

# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Key Docker Best Practices

  • Multi-stage builds: Reduce final image size by 60-80%
  • Don’t run as root: Security best practice
  • Health checks: Let orchestrators know whether the container is healthy
  • Specific base images: Use python:3.11-slim, not python:latest
  • .dockerignore: Exclude unnecessary files (node_modules, .git, cache)
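
A .dockerignore for a project like this might look something like the following; the exact entries are placeholders for whatever actually lives in your repo:

# .dockerignore
# Keep secrets and local junk out of the build context
.git
.env
__pycache__/
*.pyc
.venv/
node_modules/
tests/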

Step 2: Environment Configuration

Never hardcode API keys or secrets. Use environment variables:

# .env.example (check this into git)
ANTHROPIC_API_KEY=sk-ant-xxx
DATABASE_URL=postgresql://localhost/mydb
REDIS_URL=redis://localhost:6379
LOG_LEVEL=info

# .env (never commit this!)
ANTHROPIC_API_KEY=sk-ant-real-key-here
DATABASE_URL=postgresql://user:pass@prod-db.com/prod

Load them in your app:

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    anthropic_api_key: str
    database_url: str
    redis_url: str = "redis://localhost:6379"  # default
    log_level: str = "info"

    class Config:
        env_file = ".env"

settings = Settings()
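
Then pass settings into your clients instead of reading os.environ all over the codebase. A minimal sketch with the anthropic SDK, continuing the same module:

from anthropic import AsyncAnthropic

# `settings` is the Settings() instance defined above; pydantic has already
# validated that the required values are present
client = AsyncAnthropic(api_key=settings.anthropic_api_key)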

Step 3: Docker Compose for Local Development

Run your entire stack with one command:

version: '3.8'

services:
  app:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://postgres:password@db:5432/mydb
      - REDIS_URL=redis://redis:6379
    env_file:
      - .env
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    volumes:
      - ./app:/app  # hot reload in dev
    restart: unless-stopped

  db:
    image: postgres:15-alpine
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=mydb
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

  # Vector database for RAG
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage

volumes:
  postgres_data:
  redis_data:
  qdrant_data:

Run everything:

docker compose up -d

Your app, database, Redis, and vector database are now running.
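
A quick sanity check that everything actually came up:

# List each service and its health status
docker compose ps

# Tail the app's logs
docker compose logs -f app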

Step 4: Caddy for HTTPS and Reverse Proxy

Caddy automatically provisions SSL certificates from Let’s Encrypt. Configuration is beautifully simple:

# Caddyfile

ai.yourdomain.com {
    # Automatic HTTPS!
    reverse_proxy app:8000

    # Rate limiting (requires the caddy-ratelimit plugin, i.e. a custom Caddy build via xcaddy)
    rate_limit {
        zone app_zone {
            key {remote_host}
            events 100
            window 1m
        }
    }

    # Logging
    log {
        output file /var/log/caddy/access.log
        format json
    }
}

# Separate domain for admin panel
admin.yourdomain.com {
    reverse_proxy app:8000

    # Basic auth
    basicauth {
        admin $2a$14$hashed_password_here
    }
}
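
The basicauth value is a bcrypt hash, not the plaintext password; Caddy will generate one for you:

# Prompts for a password and prints a bcrypt hash to paste into the Caddyfile
caddy hash-password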

Add Caddy to docker-compose.yml (and declare caddy_data and caddy_config under the top-level volumes: key):

  caddy:
    image: caddy:2-alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy_data:/data
      - caddy_config:/config
    restart: unless-stopped

Boom. HTTPS, rate limiting, and structured access logs in about 20 lines.

Step 5: CI/CD Pipeline

Automate testing and deployment with GitHub Actions:

# .github/workflows/deploy.yml

name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov

      - name: Run tests
        run: pytest --cov=app tests/

      - name: Run LLM evals
        run: python scripts/run_evals.py
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

  deploy:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'

    steps:
      - uses: actions/checkout@v3

      - name: Log in to your container registry
        # Needed so the push below can authenticate
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.REGISTRY_USERNAME }}
          password: ${{ secrets.REGISTRY_PASSWORD }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: your-registry/ai-app:latest

      - name: Deploy to server
        uses: appleboy/ssh-action@master
        with:
          host: ${{ secrets.SERVER_HOST }}
          username: ${{ secrets.SERVER_USER }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          script: |
            cd /opt/ai-app
            docker compose pull
            docker compose up -d
            docker compose exec app python scripts/migrate.py

Now every push to main:

  1. Runs unit tests
  2. Runs LLM evaluations
  3. Builds Docker image
  4. Deploys to production
  5. Runs database migrations

All automatically.
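
One gotcha: docker compose pull only updates services that reference an image, not ones defined with a local build: directive. On the production server, point the app service at the registry image (the tag here matches the one pushed above):

# docker-compose.yml on the production host
services:
  app:
    image: your-registry/ai-app:latest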

Step 6: Monitoring and Observability

You need to know when things break. Set up Grafana + Prometheus:

# docker-compose.yml additions (declare prometheus_data and grafana_data under volumes: as well)

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=your_password_here
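
The prometheus.yml mounted above can start out as a single scrape job pointed at the app container, assuming the app exposes a /metrics endpoint on port 8000 (shown below):

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'ai-app'
    static_configs:
      - targets: ['app:8000']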

Instrument your app:

from prometheus_client import Counter, Histogram
import time

llm_requests = Counter('llm_requests_total', 'Total LLM requests')
llm_latency = Histogram('llm_request_duration_seconds', 'LLM request latency')
llm_cost = Counter('llm_cost_dollars', 'Total LLM cost in dollars')

async def call_llm(prompt: str):
    # `client` is your AsyncAnthropic instance; `calculate_cost` is your own
    # helper that turns token usage into dollars
    llm_requests.inc()
    start = time.time()

    response = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )

    # Record latency and cost
    llm_latency.observe(time.time() - start)
    llm_cost.inc(calculate_cost(response.usage))

    return response
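
Prometheus still needs an endpoint to scrape. With FastAPI, one option is to mount the client library's ASGI app; a minimal sketch, where app is your FastAPI instance:

from prometheus_client import make_asgi_app

# Serve every registered metric at /metrics
app.mount("/metrics", make_asgi_app())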

Once Prometheus scrapes these metrics, you can build Grafana dashboards showing:

  • Request volume
  • Latency (p50, p95, p99)
  • Error rates
  • Cost per hour
  • Cache hit rates

Step 7: Blue-Green Deployments

Deploy new versions without downtime:

# Deploy new version (green)
docker compose -f docker-compose.green.yml up -d

# Test it on separate port
curl http://localhost:8001/health

# If good, switch traffic (update Caddy)
# If bad, kill green and keep blue

Or use Kubernetes for automatic rolling updates.
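
With the Caddy setup from Step 4, "switching traffic" can be as small as editing the upstream; a sketch, assuming your blue and green stacks expose services named app_blue and app_green:

# Caddyfile: point the live domain at the green stack
ai.yourdomain.com {
    reverse_proxy app_green:8000   # was app_blue:8000
}

Then reload without dropping connections:

docker compose exec caddy caddy reload --config /etc/caddy/Caddyfile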

Common Production Issues (And Fixes)

Issue: Out of Memory

Symptom: Container keeps restarting
Fix: Set memory limits in docker-compose.yml:

services:
  app:
    deploy:
      resources:
        limits:
          memory: 4G
        reservations:
          memory: 2G

Issue: Slow Performance

Symptom: Requests timing out
Fix: Add Redis caching, increase worker processes, use async I/O
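
A minimal sketch of the caching idea, using redis-py's asyncio client and the call_llm function from Step 6 (key scheme and TTL are illustrative):

import hashlib
import redis.asyncio as redis

cache = redis.from_url("redis://redis:6379")

async def cached_llm_call(prompt: str) -> str:
    # Hash the prompt so keys stay a fixed, safe size
    key = "llm:" + hashlib.sha256(prompt.encode()).hexdigest()

    if (hit := await cache.get(key)) is not None:
        return hit.decode()

    response = await call_llm(prompt)
    text = response.content[0].text

    # Cache for an hour; identical prompts skip the API entirely
    await cache.set(key, text, ex=3600)
    return text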

Issue: Database Connection Exhaustion

Symptom: “Too many connections” errors
Fix: Use connection pooling (SQLAlchemy, asyncpg), increase DB max connections

Issue: Secrets Leaked in Logs

Symptom: API keys visible in logs
Fix: Scrub logs, use structured logging with sensitive field redaction

The Production Deployment Runbook

When deploying a major change:

  1. ☐ Test locally with docker compose up
  2. ☐ Run full test suite including LLM evals
  3. ☐ Deploy to staging environment first
  4. ☐ Run smoke tests on staging
  5. ☐ Deploy to 10% of prod traffic (canary)
  6. ☐ Monitor for 1 hour
  7. ☐ If metrics look good, deploy to 100%
  8. ☐ If anything breaks, rollback immediately
  9. ☐ Keep deployment window open for 24 hours

Cost Optimization

Running AI in production can get expensive. Optimize:

  • Use spot instances for batch jobs (save 60-80%)
  • Auto-scale workers based on queue depth
  • Cache LLM responses aggressively (Redis with 1-hour TTL)
  • Use smaller models where quality difference is minimal
  • Batch similar requests to save on API calls

Security Hardening

Essential security practices:

  • Run containers as non-root user
  • Use secrets management (Vault, AWS Secrets Manager)
  • Enable rate limiting (prevent abuse)
  • Scan Docker images for vulnerabilities (Trivy, Snyk)
  • Use network policies (isolate services)
  • Enable audit logging (track all API calls)
  • Rotate API keys regularly

Backup and Disaster Recovery

What happens if your server dies?

  1. Database backups: Automated daily backups to S3 (see the sketch after this list)
  2. Vector DB backups: Regular snapshots of Qdrant/Weaviate
  3. Configuration backups: Store in git (Infrastructure as Code)
  4. Recovery time objective: Can you restore in under 1 hour?
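
For the Postgres piece, a nightly cron job can be as simple as this (bucket name and paths are placeholders, and it assumes the AWS CLI is configured on the host):

#!/bin/sh
# /etc/cron.daily/backup-db
docker compose -f /opt/ai-app/docker-compose.yml exec -T db \
  pg_dump -U postgres mydb | gzip > /tmp/mydb-$(date +%F).sql.gz
aws s3 cp /tmp/mydb-$(date +%F).sql.gz s3://your-backup-bucket/postgres/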

The Bottom Line

Deploying AI applications is more complex than traditional apps because:

  • They depend on external APIs (LLMs)
  • They have ML-specific failure modes
  • They can be expensive to run at scale
  • They require continuous monitoring and improvement

But with the right DevOps practices, you can run AI in production with confidence.

Start simple (Docker + Docker Compose), add complexity only when needed (Kubernetes), and always measure what matters (latency, cost, user satisfaction).

Now go deploy that AI app. The world is waiting.