
Overview

Guardian API is deployed to Google Cloud Run, a fully managed serverless platform that automatically scales your containerized application. This guide covers deployment options and configuration.

Quick Deploy

The fastest way to deploy Guardian API to production:
1. Prerequisites

  • Google Cloud Platform account with billing enabled
  • gcloud CLI installed and configured
  • GitHub repository with backend code
2. Enable APIs

gcloud services enable \
    cloudbuild.googleapis.com \
    run.googleapis.com \
    artifactregistry.googleapis.com
3. Create Artifact Registry

gcloud artifacts repositories create guardian-api \
    --repository-format=docker \
    --location=us-central1 \
    --description="GuardianAPI Docker images"
4. Deploy with Cloud Build

gcloud builds submit --config=backend/cloudbuild.yaml
Your API will be deployed and accessible at a Cloud Run URL.

Deployment Options

Cloud Run Configuration

Resource Allocation

Production-ready configuration:
Setting        Value    Purpose
Memory         2Gi      Sufficient for all ML models
CPU            2 cores  Fast inference performance
Timeout        300s     Allows model loading on cold start
Max Instances  10       Scale to handle traffic spikes
Min Instances  0        Scale to zero when idle (cost optimization)

Update Configuration

Change resource allocation after deployment:
gcloud run services update guardian-api \
    --region us-central1 \
    --memory 4Gi \
    --cpu 2 \
    --timeout 300 \
    --max-instances 20

Environment Variables

CORS_ORIGINS

Purpose: Specify which frontend domains can access your API
Format: Comma-separated list of URLs
Example:
CORS_ORIGINS=https://yourapp.com,https://staging.yourapp.com
Default:
  • https://guardian.korymsmith.dev
  • http://localhost:5173
  • http://127.0.0.1:5173
Set in Cloud Run:
gcloud run services update guardian-api \
    --update-env-vars CORS_ORIGINS=https://yourapp.com
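Passing more than one origin needs some care, because gcloud also uses commas to separate the entries of its env-var flags. The sketch below builds a flag value that switches gcloud's list delimiter to `@` with the `^@^` prefix, so the commas stay inside the value. The helper name `cors_env_flag` is hypothetical, and its checks only mirror the CORS advice in this guide (explicit scheme, no trailing slash).

```shell
#!/usr/bin/env bash
# cors_env_flag (hypothetical helper): prints a value for --update-env-vars
# that carries several comma-separated origins in CORS_ORIGINS.
cors_env_flag() {
  local origin joined=""
  for origin in "$@"; do
    case "$origin" in
      http://*/|https://*/)
        echo "trailing slash: $origin" >&2; return 1 ;;
      http://*|https://*)
        ;;  # looks like a bare origin, accept it
      *)
        echo "missing http(s) scheme: $origin" >&2; return 1 ;;
    esac
    joined="${joined:+${joined},}${origin}"
  done
  # "^@^" tells gcloud to split entries on "@" instead of ","
  printf '^@^CORS_ORIGINS=%s\n' "$joined"
}

cors_env_flag https://yourapp.com https://staging.yourapp.com

# To apply it (assumes gcloud is installed and authenticated):
#   gcloud run services update guardian-api --region us-central1 \
#       --update-env-vars "$(cors_env_flag https://yourapp.com https://staging.yourapp.com)"
```

Using `--update-env-vars` rather than `--set-env-vars` leaves the service's other environment variables in place.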
LOG_LEVEL

Purpose: Control logging verbosity
Options: DEBUG, INFO, WARNING, ERROR, CRITICAL
Default: INFO
Example:
gcloud run services update guardian-api \
    --update-env-vars LOG_LEVEL=DEBUG
REDIS_URL

Purpose: Enable rate limiting with Redis
Format: rediss://default:<token>@<host>:<port>
Example (Upstash Redis):
REDIS_URL=rediss://default:abc123@xyz.upstash.io:6379
Note: Rate limiting fails open if REDIS_URL is not configured (all requests are allowed).
Set securely:
# Store in Secret Manager
gcloud secrets create redis-url --data-file=-
# Paste your Redis URL and press Ctrl+D

# Mount in Cloud Run
gcloud run services update guardian-api \
    --update-secrets REDIS_URL=redis-url:latest
HUGGINGFACE_HUB_TOKEN

Purpose: Improve HuggingFace model download reliability
Note: Public models don’t require a token, but having one can help with rate limits
Get token: HuggingFace Settings
Set securely:
gcloud run services update guardian-api \
    --update-secrets HUGGINGFACE_HUB_TOKEN=hf-token:latest
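The `--update-secrets` commands above only work if the secrets exist and the service's identity can read them. The following is a minimal sketch of that setup, under two assumptions that are not stated in this guide: the token is exported as `HF_TOKEN` in your shell, and the service runs as the default Compute Engine service account. Adjust `SERVICE_ACCOUNT` if your service uses a dedicated one.

```shell
# Secret Manager must be enabled before secrets can be created.
gcloud services enable secretmanager.googleapis.com

# Assumption: HF_TOKEN holds your HuggingFace token.
# printf avoids storing a trailing newline in the secret.
printf '%s' "$HF_TOKEN" | gcloud secrets create hf-token --data-file=-

# Assumption: the service runs as the default Compute Engine service account.
PROJECT_NUMBER=$(gcloud projects describe "$(gcloud config get-value project)" \
    --format 'value(projectNumber)')
SERVICE_ACCOUNT="${PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

# Let that account read both secrets referenced in this guide.
for secret in redis-url hf-token; do
  gcloud secrets add-iam-policy-binding "$secret" \
      --member "serviceAccount:${SERVICE_ACCOUNT}" \
      --role roles/secretmanager.secretAccessor
done
```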

Continuous Deployment

Set up automatic deployments when you push to GitHub:
1. Connect Repository

gcloud builds triggers create github \
    --repo-name=GuardianAPI \
    --repo-owner=YOUR_GITHUB_USERNAME \
    --branch-pattern="^main$" \
    --build-config=backend/cloudbuild.yaml
2. Configure Trigger Settings

In the Google Cloud Console:
  • Navigate to Cloud Build > Triggers
  • Find your trigger
  • Add substitution variables:
    • _CORS_ORIGINS: Your frontend URL
    • _LOG_LEVEL: INFO (or DEBUG)
3. Automatic Deployments

Every push to main now:
  1. Triggers Cloud Build
  2. Builds Docker image
  3. Pushes to Artifact Registry
  4. Deploys to Cloud Run
  5. Creates new revision

Health Monitoring

Health Endpoint

Check API health and model status:
# Get your Cloud Run URL
SERVICE_URL=$(gcloud run services describe guardian-api \
    --region us-central1 \
    --format 'value(status.url)')

# Check health
curl "${SERVICE_URL}/v1/health"
Expected Response:
{
  "status": "healthy",
  "models": {
    "sexism": "loaded",
    "toxicity": "loaded",
    "rules": "loaded"
  }
}
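For scripted checks, for example after a deployment, the response can be verified automatically. This is a rough sketch that assumes the response has the shape shown above. The function name `all_models_loaded` is hypothetical, and it uses `grep` rather than a JSON parser to avoid extra dependencies.

```shell
#!/usr/bin/env bash
# Succeeds only if the health JSON on stdin reports "healthy" and the three
# models from the expected response (sexism, toxicity, rules) as "loaded".
all_models_loaded() {
  local body model
  body=$(cat)
  printf '%s' "$body" \
    | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"' || return 1
  for model in sexism toxicity rules; do
    printf '%s' "$body" \
      | grep -Eq "\"${model}\"[[:space:]]*:[[:space:]]*\"loaded\"" || return 1
  done
}

# Poll the live service only when SERVICE_URL is set (see the command above).
# Several attempts leave room for a cold start while models load.
if [ -n "${SERVICE_URL:-}" ]; then
  for attempt in 1 2 3 4 5; do
    if curl -fsS --max-time 60 "${SERVICE_URL}/v1/health" | all_models_loaded; then
      echo "healthy after ${attempt} attempt(s)"
      break
    fi
    sleep 10
  done
fi
```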

View Logs

Monitor your API in real-time:
# Real-time logs
gcloud run services logs tail guardian-api --region us-central1

# Historical logs (last 50 entries)
gcloud run services logs read guardian-api --region us-central1 --limit 50

Cloud Run Metrics

View in Google Cloud Console:
  • Request count and rate
  • Response latency (p50, p95, p99)
  • Error rate
  • Instance count
  • Memory and CPU usage

Troubleshooting

Build or startup failures

Check logs:
gcloud run services logs read guardian-api --region us-central1
Common issues:
  • Missing dependencies in requirements.txt
  • Python version mismatch (requires 3.11+)
  • Model files not included in Docker build
  • Insufficient memory allocation (needs 2Gi minimum)
Models not loading

Symptoms:
  • Health endpoint shows models as “not loaded”
  • Moderation requests fail with 500 errors
Solutions:
  1. Check logs for HuggingFace download errors
  2. Verify sufficient memory (2Gi recommended)
  3. Add HUGGINGFACE_HUB_TOKEN if rate limited
  4. Increase timeout to 300s for model loading
CORS errors

Symptoms:
  • Frontend can’t connect to API
  • Browser console shows CORS errors
Solutions:
  1. Verify CORS_ORIGINS includes your frontend URL
  2. Check for exact match (including https://)
  3. Avoid trailing slashes in URLs
  4. Verify environment variable is set:
    gcloud run services describe guardian-api \
        --format 'value(spec.template.spec.containers[0].env)'
    
Cold starts

Cause: Cloud Run scales to zero when idle
Behavior: First request after idle takes 10-30 seconds
Solutions:
  • Accept delay (most cost-effective)
  • Set minimum instances:
    gcloud run services update guardian-api \
        --min-instances 1
    
    Note: This prevents scale-to-zero (increases cost)
  • Use Cloud Scheduler to ping API every 5 minutes
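The Cloud Scheduler option can be sketched as follows. The job name `guardian-api-keepwarm` is a placeholder, and the sketch assumes the service accepts unauthenticated requests on the health endpoint. A service that requires authentication would also need an OIDC service account configured on the job.

```shell
# Cloud Scheduler must be enabled in the project.
gcloud services enable cloudscheduler.googleapis.com

# Look up the service URL.
SERVICE_URL=$(gcloud run services describe guardian-api \
    --region us-central1 \
    --format 'value(status.url)')

# Ping the health endpoint every 5 minutes.
# Assumption: the endpoint is reachable without authentication.
gcloud scheduler jobs create http guardian-api-keepwarm \
    --location us-central1 \
    --schedule "*/5 * * * *" \
    --uri "${SERVICE_URL}/v1/health" \
    --http-method GET
```

An instance may still be recycled between pings, so this reduces cold starts rather than eliminating them.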

Cost Optimization

Free Tier

Google Cloud Run includes a generous free tier:
  • 2 million requests/month
  • 360,000 GB-seconds of memory
  • 180,000 vCPU-seconds
Most small to medium applications stay within the free tier.
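To relate these allowances to the production configuration in this guide (2 CPU, 2Gi), the arithmetic below estimates how many hours per month one instance can spend handling requests before the compute allowances run out. It is a rough estimate: it assumes you are billed only while requests are being processed, and it ignores the per-request allowance and billing rounding.

```shell
#!/usr/bin/env bash
# Free-tier allowances quoted above.
FREE_VCPU_SECONDS=180000
FREE_GB_SECONDS=360000

# Production configuration from this guide.
CPU=2
MEMORY_GB=2

# Hours of busy-instance time each allowance covers.
cpu_hours=$(( FREE_VCPU_SECONDS / CPU / 3600 ))
mem_hours=$(( FREE_GB_SECONDS / MEMORY_GB / 3600 ))

# The smaller of the two is the binding limit.
free_hours=$(( cpu_hours < mem_hours ? cpu_hours : mem_hours ))

echo "CPU allowance covers ${cpu_hours}h, memory allowance covers ${mem_hours}h"
echo "Free busy-instance time per month: about ${free_hours}h"
```

With these numbers the CPU allowance is the limiting factor, at about 25 hours of active instance time per month.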

Cost Management

Scale to Zero

Set min-instances=0 to avoid charges when idle
Savings: Only pay for actual usage
Trade-off: Cold start delays

Right-Size Resources

Start with 2Gi memory and scale if needed
Monitor: Check Cloud Run metrics for actual usage
Adjust: Increase only if hitting limits

Set Billing Alerts

Configure budget alerts in GCP
Recommended: Alert at 50%, 80%, 100% of budget
Prevents: Unexpected charges
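Budget alerts can also be created from the command line. This is a sketch only: `BILLING_ACCOUNT_ID`, the display name, and the 50USD amount are placeholders chosen for illustration, and the Cloud Billing Budget API may need to be enabled first.

```shell
# Assumption: BILLING_ACCOUNT_ID is set to one of the IDs printed by
#   gcloud billing accounts list
# The amount and display name are illustrative placeholders.
gcloud billing budgets create \
    --billing-account "$BILLING_ACCOUNT_ID" \
    --display-name "guardian-api-monthly" \
    --budget-amount 50USD \
    --threshold-rule percent=0.5 \
    --threshold-rule percent=0.8 \
    --threshold-rule percent=1.0
```

A budget only sends alerts and does not cap spending, so the max-instances limit remains the actual safeguard.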

Limit Max Instances

Cap auto-scaling to prevent runaway costs
Recommended: Start with 10 max instances
Adjust: Based on traffic patterns

Security Best Practices

1. Use Secret Manager

Never commit secrets to repository:
# Store Redis URL securely (printf avoids the trailing newline that echo would add)
printf '%s' "rediss://your-redis-url" | gcloud secrets create redis-url --data-file=-

# Mount in Cloud Run
gcloud run services update guardian-api \
    --update-secrets REDIS_URL=redis-url:latest
2. Configure CORS Properly

Only allow specific frontend origins:
# Good: Specific domains
CORS_ORIGINS=https://yourapp.com,https://staging.yourapp.com

# Bad: Wildcard (never use in production)
CORS_ORIGINS=*
3. Enable Rate Limiting

Configure Redis to prevent abuse:
# Set up Upstash Redis
# Get URL from https://upstash.com

# Configure in Cloud Run
gcloud run services update guardian-api \
    --update-secrets REDIS_URL=redis-url:latest
See Rate Limiting for details.
4. Monitor Logs

Regularly check logs for suspicious activity:
# Check for errors
gcloud run services logs read guardian-api \
    --region us-central1 \
    --filter="severity>=ERROR"
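The same check can be run as a Cloud Logging query, which uses the Logging filter language and can be extended with further conditions. A minimal sketch, assuming the service name `guardian-api` used throughout this guide:

```shell
# Errors from the guardian-api Cloud Run service over the last day.
gcloud logging read \
    'resource.type="cloud_run_revision" AND resource.labels.service_name="guardian-api" AND severity>=ERROR' \
    --freshness 1d \
    --limit 50
```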

Next Steps

Additional Resources

  • Detailed Deployment Guide: backend/DEPLOYMENT.md in repository
  • Cloud Build Config: backend/cloudbuild.yaml
  • Google Cloud Run Docs: cloud.google.com/run
  • Docker Configuration: backend/Dockerfile

Production Checklist

Before going live, verify: