Deployment

Overview

Guardian API is deployed to Google Cloud Run, a fully managed serverless platform that automatically scales your containerized application. This guide covers deployment options and configuration.

Quick Deploy

The fastest way to deploy Guardian API to production:

Prerequisites

Google Cloud Platform account with billing enabled
gcloud CLI installed and configured
GitHub repository with backend code

Enable APIs

gcloud services enable \
    cloudbuild.googleapis.com \
    run.googleapis.com \
    artifactregistry.googleapis.com

Create Artifact Registry

gcloud artifacts repositories create guardian-api \
    --repository-format=docker \
    --location=us-central1 \
    --description="GuardianAPI Docker images"

Deploy with Cloud Build

gcloud builds submit --config=backend/cloudbuild.yaml

Your API will be deployed and accessible at a Cloud Run URL.

Deployment Options

Cloud Build (Recommended)
Manual Deploy
Docker Compose (Local)

Automated CI/CD PipelineCloud Build automatically builds and deploys your Docker container:

# Deploy with custom CORS origins
gcloud builds submit --config=backend/cloudbuild.yaml \
    --substitutions=_CORS_ORIGINS="https://yourapp.com"

Benefits:

Automatic builds from GitHub
Integrated with Cloud Run
Build caching for faster deployments
Configurable via cloudbuild.yaml

Build Configuration:

Machine type: E2_HIGHCPU_8
Timeout: 20 minutes
Automatic tag: latest + commit SHA

Build and Deploy ManuallyFor custom workflows or testing:

# Build Docker image
cd backend
docker build -t gcr.io/YOUR_PROJECT_ID/guardian-api:latest .

# Push to Artifact Registry
docker tag gcr.io/YOUR_PROJECT_ID/guardian-api:latest \
    us-central1-docker.pkg.dev/YOUR_PROJECT_ID/guardian-api/guardian-api:latest
docker push us-central1-docker.pkg.dev/YOUR_PROJECT_ID/guardian-api/guardian-api:latest

# Deploy to Cloud Run
gcloud run deploy guardian-api \
    --image us-central1-docker.pkg.dev/YOUR_PROJECT_ID/guardian-api/guardian-api:latest \
    --region us-central1 \
    --platform managed \
    --allow-unauthenticated \
    --port 8080 \
    --memory 2Gi \
    --cpu 2 \
    --timeout 300 \
    --max-instances 10

Local DevelopmentRun the full stack locally:

# Clone repository
git clone https://github.com/Ksmith18skc/GuardianAPI.git
cd GuardianAPI

# Start services
docker-compose up -d

# API available at http://localhost:8000

Includes:

FastAPI backend (port 8000)
Redis for rate limiting (port 6379)
Automatic model loading

Cloud Run Configuration

Resource Allocation

Production-ready configuration:

Setting	Value	Purpose
Memory	2Gi	Sufficient for all ML models
CPU	2 cores	Fast inference performance
Timeout	300s	Allows model loading on cold start
Max Instances	10	Scale to handle traffic spikes
Min Instances	0	Scale to zero when idle (cost optimization)

Update Configuration

Change resource allocation after deployment:

gcloud run services update guardian-api \
    --region us-central1 \
    --memory 4Gi \
    --cpu 2 \
    --timeout 300 \
    --max-instances 20

Environment Variables

CORS_ORIGINS (Required)

Purpose: Specify which frontend domains can access your APIFormat: Comma-separated list of URLsExample:

CORS_ORIGINS=https://yourapp.com,https://staging.yourapp.com

Default:

https://guardian.korymsmith.dev
http://localhost:5173
http://127.0.0.1:5173

Set in Cloud Run:

gcloud run services update guardian-api \
    --set-env-vars CORS_ORIGINS=https://yourapp.com

LOG_LEVEL (Optional)

Purpose: Control logging verbosityOptions: DEBUG, INFO, WARNING, ERROR, CRITICALDefault: INFOExample:

gcloud run services update guardian-api \
    --set-env-vars LOG_LEVEL=DEBUG

REDIS_URL (Optional)

Purpose: Enable rate limiting with RedisFormat: rediss://default:<token>@<host>:<port>Example (Upstash Redis):

REDIS_URL=rediss://default:abc123@xyz.upstash.io:6379

Note: Rate limiting fails open if not configured (allows all requests)Set securely:

# Store in Secret Manager
gcloud secrets create redis-url --data-file=-
# Paste your Redis URL and press Ctrl+D

# Mount in Cloud Run
gcloud run services update guardian-api \
    --update-secrets REDIS_URL=redis-url:latest

HUGGINGFACE_HUB_TOKEN (Optional)

Purpose: Improve HuggingFace model download reliabilityNote: Public models don’t require a token, but having one can help with rate limitsGet token: HuggingFace SettingsSet securely:

gcloud run services update guardian-api \
    --update-secrets HUGGINGFACE_HUB_TOKEN=hf-token:latest

Continuous Deployment

Set up automatic deployments when you push to GitHub:

Connect Repository

gcloud builds triggers create github \
    --repo-name=GuardianAPI \
    --repo-owner=YOUR_GITHUB_USERNAME \
    --branch-pattern="^main$" \
    --build-config=backend/cloudbuild.yaml

Configure Trigger Settings

In the Google Cloud Console:

Navigate to Cloud Build > Triggers
Find your trigger
Add substitution variables:
- _CORS_ORIGINS: Your frontend URL
- _LOG_LEVEL: INFO (or DEBUG)

Automatic Deployments

Every push to main now:

Triggers Cloud Build
Builds Docker image
Pushes to Artifact Registry
Deploys to Cloud Run
Creates new revision

Health Monitoring

Health Endpoint

Check API health and model status:

# Get your Cloud Run URL
SERVICE_URL=$(gcloud run services describe guardian-api \
    --region us-central1 \
    --format 'value(status.url)')

# Check health
curl ${SERVICE_URL}/v1/health

Expected Response:

{
  "status": "healthy",
  "models": {
    "sexism": "loaded",
    "toxicity": "loaded",
    "rules": "loaded"
  }
}

View Logs

Monitor your API in real-time:

# Real-time logs
gcloud run services logs tail guardian-api --region us-central1

# Historical logs (last 50 entries)
gcloud run services logs read guardian-api --region us-central1 --limit 50

Cloud Run Metrics

View in Google Cloud Console:

Request count and rate
Response latency (p50, p95, p99)
Error rate
Instance count
Memory and CPU usage

Troubleshooting

Service Won't Start

Check logs:

gcloud run services logs read guardian-api --region us-central1

Common issues:

Missing dependencies in requirements.txt
Python version mismatch (requires 3.11+)
Model files not included in Docker build
Insufficient memory allocation (needs 2Gi minimum)

Models Not Loading

Symptoms:

Health endpoint shows models as “not loaded”
Moderation requests fail with 500 errors

Solutions:

Check logs for HuggingFace download errors
Verify sufficient memory (2Gi recommended)
Add HUGGINGFACE_HUB_TOKEN if rate limited
Increase timeout to 300s for model loading

CORS Errors

Symptoms:

Frontend can’t connect to API
Browser console shows CORS errors

Solutions:

Verify CORS_ORIGINS includes your frontend URL
Check for exact match (including https://)
Avoid trailing slashes in URLs

Verify environment variable is set:

gcloud run services describe guardian-api \
    --format 'value(spec.template.spec.containers[0].env)'

Cold Start Delays

Cause: Cloud Run scales to zero when idleBehavior: First request after idle takes 10-30 secondsSolutions:

Accept delay (most cost-effective)

Set minimum instances:

gcloud run services update guardian-api \
    --min-instances 1

Note: This prevents scale-to-zero (increases cost)

Use Cloud Scheduler to ping API every 5 minutes

Cost Optimization

Free Tier

Google Cloud Run includes generous free tier:

2 million requests/month
360,000 GB-seconds of memory
180,000 vCPU-seconds

Most small to medium applications stay within free tier.

Cost Management

Scale to Zero

Set min-instances=0 to avoid charges when idleSavings: Only pay for actual usageTrade-off: Cold start delays

Right-Size Resources

Start with 2Gi memory and scale if neededMonitor: Check Cloud Run metrics for actual usageAdjust: Increase only if hitting limits

Set Billing Alerts

Configure budget alerts in GCPRecommended: Alert at 50%, 80%, 100% of budgetPrevents: Unexpected charges

Limit Max Instances

Cap auto-scaling to prevent runaway costsRecommended: Start with 10 max instancesAdjust: Based on traffic patterns

Security Best Practices

Use Secret Manager

Never commit secrets to repository:

# Store Redis URL securely
echo "rediss://your-redis-url" | gcloud secrets create redis-url --data-file=-

# Mount in Cloud Run
gcloud run services update guardian-api \
    --update-secrets REDIS_URL=redis-url:latest

Configure CORS Properly

Only allow specific frontend origins:

# Good: Specific domains
CORS_ORIGINS=https://yourapp.com,https://staging.yourapp.com

# Bad: Wildcard (never use in production)
CORS_ORIGINS=*

Enable Rate Limiting

Configure Redis to prevent abuse:

# Set up Upstash Redis
# Get URL from https://upstash.com

# Configure in Cloud Run
gcloud run services update guardian-api \
    --update-secrets REDIS_URL=redis-url:latest

See Rate Limiting for details.

Monitor Logs

Regularly check logs for suspicious activity:

# Check for errors
gcloud run services logs read guardian-api \
    --region us-central1 \
    --filter="severity>=ERROR"

Next Steps

Configure Environment

Set up environment variables and configuration

Enable Rate Limiting

Protect your API with rate limiting

API Reference

Explore API endpoints and schemas

Architecture Guide

Understand the system architecture

Additional Resources

Detailed Deployment Guide: backend/DEPLOYMENT.md in repository
Cloud Build Config: backend/cloudbuild.yaml
Google Cloud Run Docs: cloud.google.com/run
Docker Configuration: backend/Dockerfile

Production Checklist

Before going live, verify:

Getting Started

Core Concepts

SDKs

Configuration

Use Cases

About

Overview

Quick Deploy

Deployment Options

Cloud Run Configuration

Resource Allocation

Update Configuration

Environment Variables

Continuous Deployment

Health Monitoring

Health Endpoint

View Logs

Cloud Run Metrics

Troubleshooting

Cost Optimization

Free Tier

Cost Management

Scale to Zero

Right-Size Resources

Set Billing Alerts

Limit Max Instances

Security Best Practices

Next Steps

Configure Environment

Enable Rate Limiting

API Reference

Architecture Guide

Additional Resources

Production Checklist

Getting Started

Core Concepts

SDKs

Configuration

Use Cases

About

​Overview

​Quick Deploy

​Deployment Options

​Cloud Run Configuration

​Resource Allocation

​Update Configuration

​Environment Variables

​Continuous Deployment

​Health Monitoring

​Health Endpoint

​View Logs

​Cloud Run Metrics

​Troubleshooting

​Cost Optimization

​Free Tier

​Cost Management

Scale to Zero

Right-Size Resources

Set Billing Alerts

Limit Max Instances

​Security Best Practices

​Next Steps

Configure Environment

Enable Rate Limiting

API Reference

Architecture Guide

​Additional Resources

​Production Checklist

Overview

Quick Deploy

Deployment Options

Cloud Run Configuration

Resource Allocation

Update Configuration

Environment Variables

Continuous Deployment

Health Monitoring

Health Endpoint

View Logs

Cloud Run Metrics

Troubleshooting

Cost Optimization

Free Tier

Cost Management

Security Best Practices

Next Steps

Additional Resources

Production Checklist