Overview
Guardian API is deployed to Google Cloud Run, a fully managed serverless platform that automatically scales your containerized application. This guide covers deployment options and configuration.

Quick Deploy
The fastest way to deploy Guardian API to production:

1. Prerequisites

- Google Cloud Platform account with billing enabled
- gcloud CLI installed and configured
- GitHub repository with backend code
2. Enable APIs
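The required Google Cloud services can be enabled from the CLI; a sketch (the exact API list is an assumption based on the stack this guide describes):

```shell
# Enable Cloud Run, Cloud Build, and Artifact Registry for the project
gcloud services enable \
  run.googleapis.com \
  cloudbuild.googleapis.com \
  artifactregistry.googleapis.com
```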
3. Create Artifact Registry
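Creating a Docker repository might look like this (repository name and region are assumptions):

```shell
# Create a Docker-format repository to hold the built images
gcloud artifacts repositories create guardian-api \
  --repository-format=docker \
  --location=us-central1
```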
4. Deploy with Cloud Build
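Kicking off a build from the repository root might look like this (the config path is taken from the Additional Resources section below):

```shell
# Build and deploy using the checked-in Cloud Build config
gcloud builds submit --config backend/cloudbuild.yaml .
```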
Deployment Options
- Cloud Build (Recommended)
- Manual Deploy
- Docker Compose (Local)
Automated CI/CD Pipeline

Cloud Build automatically builds and deploys your Docker container.

Benefits:
- Automatic builds from GitHub
- Integrated with Cloud Run
- Build caching for faster deployments
- Configurable via cloudbuild.yaml
- Machine type: E2_HIGHCPU_8
- Timeout: 20 minutes
- Automatic tags: latest + commit SHA
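A minimal cloudbuild.yaml matching the settings above could look like the following sketch; the image path, service name, and region are assumptions (the real file lives at backend/cloudbuild.yaml):

```yaml
steps:
  # Build and tag the image with both "latest" and the commit SHA
  - name: gcr.io/cloud-builders/docker
    args:
      - build
      - -t
      - us-central1-docker.pkg.dev/$PROJECT_ID/guardian-api/api:latest
      - -t
      - us-central1-docker.pkg.dev/$PROJECT_ID/guardian-api/api:$COMMIT_SHA
      - backend
  # Push all tags to Artifact Registry
  - name: gcr.io/cloud-builders/docker
    args: [push, --all-tags, us-central1-docker.pkg.dev/$PROJECT_ID/guardian-api/api]
  # Deploy the freshly pushed image to Cloud Run
  - name: gcr.io/google.com/cloudsdktool/cloud-sdk
    entrypoint: gcloud
    args:
      - run
      - deploy
      - guardian-api
      - --image=us-central1-docker.pkg.dev/$PROJECT_ID/guardian-api/api:$COMMIT_SHA
      - --region=us-central1
options:
  machineType: E2_HIGHCPU_8
timeout: 1200s  # 20 minutes
```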
Cloud Run Configuration
Resource Allocation
Production-ready configuration:

| Setting | Value | Purpose |
|---|---|---|
| Memory | 2Gi | Sufficient for all ML models |
| CPU | 2 cores | Fast inference performance |
| Timeout | 300s | Allows model loading on cold start |
| Max Instances | 10 | Scale to handle traffic spikes |
| Min Instances | 0 | Scale to zero when idle (cost optimization) |
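The table above maps to deploy flags; a sketch (service name, image path, and region are assumptions):

```shell
gcloud run deploy guardian-api \
  --image us-central1-docker.pkg.dev/PROJECT_ID/guardian-api/api:latest \
  --region us-central1 \
  --memory 2Gi \
  --cpu 2 \
  --timeout 300 \
  --max-instances 10 \
  --min-instances 0 \
  --allow-unauthenticated
```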
Update Configuration
Change resource allocation after deployment:

Environment Variables
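Both resource allocation and the environment variables below can be changed on a deployed service with gcloud run services update; a sketch (service name and region are assumptions):

```shell
# Adjust resources and set an environment variable without redeploying
gcloud run services update guardian-api \
  --region us-central1 \
  --memory 2Gi \
  --cpu 2 \
  --update-env-vars "LOG_LEVEL=INFO"
```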
CORS_ORIGINS (Required)
Purpose: Specify which frontend domains can access your API

Format: Comma-separated list of URLs

Default: https://guardian.korymsmith.dev,http://localhost:5173,http://127.0.0.1:5173
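Setting the variable explicitly might look like this (service name is an assumption):

```shell
gcloud run services update guardian-api \
  --update-env-vars "CORS_ORIGINS=https://guardian.korymsmith.dev,http://localhost:5173"
```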
LOG_LEVEL (Optional)
Purpose: Control logging verbosity

Options: DEBUG, INFO, WARNING, ERROR, CRITICAL

Default: INFO
REDIS_URL (Optional)
Purpose: Enable rate limiting with Redis

Format: rediss://default:<token>@<host>:<port>

Note: Rate limiting fails open if not configured (allows all requests)

Set securely via Secret Manager.
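A sketch of storing the Redis URL (for example, one from Upstash) in Secret Manager and wiring it to the service; secret and service names are assumptions:

```shell
# Store the Redis URL as a secret rather than a plain env var
echo -n "rediss://default:<token>@<host>:<port>" | \
  gcloud secrets create guardian-redis-url --data-file=-

# Expose the secret to the service as REDIS_URL
gcloud run services update guardian-api \
  --update-secrets REDIS_URL=guardian-redis-url:latest
```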
HUGGINGFACE_HUB_TOKEN (Optional)
Purpose: Improve HuggingFace model download reliability

Note: Public models don’t require a token, but having one can help with rate limits

Get token: HuggingFace Settings

Set securely via Secret Manager.
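The same Secret Manager pattern works here; a sketch (secret name, token value, and service name are assumptions):

```shell
# Store the HuggingFace token and mount it as an env var
echo -n "hf_xxx" | gcloud secrets create hf-token --data-file=-
gcloud run services update guardian-api \
  --update-secrets HUGGINGFACE_HUB_TOKEN=hf-token:latest
```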
Continuous Deployment
Set up automatic deployments when you push to GitHub:

1. Connect Repository
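After connecting the GitHub repository in the console, a trigger can also be created from the CLI; a sketch (repository owner/name are assumptions, config path from Additional Resources):

```shell
gcloud builds triggers create github \
  --repo-owner=YOUR_GITHUB_USER \
  --repo-name=guardian \
  --branch-pattern='^main$' \
  --build-config=backend/cloudbuild.yaml
```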
2. Configure Trigger Settings
In the Google Cloud Console:
- Navigate to Cloud Build > Triggers
- Find your trigger
- Add substitution variables:
  - _CORS_ORIGINS: Your frontend URL
  - _LOG_LEVEL: INFO (or DEBUG)
3. Automatic Deployments
Every push to main now:

- Triggers Cloud Build
- Builds Docker image
- Pushes to Artifact Registry
- Deploys to Cloud Run
- Creates new revision
Health Monitoring
Health Endpoint
Check API health and model status via the health endpoint.

View Logs
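Logs can be streamed or queried from the CLI; a sketch (service name and region are assumptions):

```shell
# Tail recent logs for the service (beta command)
gcloud beta run services logs tail guardian-api --region us-central1

# Or query them through Cloud Logging
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="guardian-api"' \
  --limit 50
```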
Monitor your API in real time.

Cloud Run Metrics
View in Google Cloud Console:

- Request count and rate
- Response latency (p50, p95, p99)
- Error rate
- Instance count
- Memory and CPU usage
Troubleshooting
Service Won't Start
Check the Cloud Run logs (see View Logs above). Common issues:
- Missing dependencies in requirements.txt
- Python version mismatch (requires 3.11+)
- Model files not included in Docker build
- Insufficient memory allocation (needs 2Gi minimum)
Models Not Loading
Symptoms:

- Health endpoint shows models as “not loaded”
- Moderation requests fail with 500 errors

Fixes:

- Check logs for HuggingFace download errors
- Verify sufficient memory (2Gi recommended)
- Add HUGGINGFACE_HUB_TOKEN if rate limited
- Increase timeout to 300s for model loading
CORS Errors
Symptoms:

- Frontend can’t connect to API
- Browser console shows CORS errors

Fixes:

- Verify CORS_ORIGINS includes your frontend URL
- Check for exact match (including https://)
- Avoid trailing slashes in URLs
- Verify the environment variable is set:
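One way to inspect the deployed value (service name, region, and format path are assumptions):

```shell
# Print the environment variables of the running revision template
gcloud run services describe guardian-api \
  --region us-central1 \
  --format 'value(spec.template.spec.containers[0].env)'
```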
Cold Start Delays
Cause: Cloud Run scales to zero when idle

Behavior: First request after idle takes 10-30 seconds

Solutions:
- Accept delay (most cost-effective)
- Set minimum instances:
Note: This prevents scale-to-zero (increases cost)
- Use Cloud Scheduler to ping API every 5 minutes
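Keeping a warm instance, per the second solution, might look like this (service name is an assumption):

```shell
# Keep one instance warm; prevents scale-to-zero but increases cost
gcloud run services update guardian-api --min-instances 1
```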
Cost Optimization
Free Tier
Google Cloud Run includes a generous free tier:

- 2 million requests/month
- 360,000 GB-seconds of memory
- 180,000 vCPU-seconds
Cost Management
Scale to Zero
Set min-instances=0 to avoid charges when idle
Savings: Only pay for actual usage
Trade-off: Cold start delays

Right-Size Resources
Start with 2Gi memory and scale if needed
Monitor: Check Cloud Run metrics for actual usage
Adjust: Increase only if hitting limits
Set Billing Alerts
Configure budget alerts in GCP
Recommended: Alert at 50%, 80%, 100% of budget
Prevents: Unexpected charges
Limit Max Instances
Cap auto-scaling to prevent runaway costs
Recommended: Start with 10 max instances
Adjust: Based on traffic patterns
Security Best Practices
1. Use Secret Manager
Never commit secrets to repository:
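A sketch of the Secret Manager workflow (secret name and service account are assumptions):

```shell
# Create a secret from stdin rather than committing it to git
echo -n "super-secret-value" | gcloud secrets create api-secret --data-file=-

# Allow the Cloud Run runtime service account to read it
gcloud secrets add-iam-policy-binding api-secret \
  --member serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com \
  --role roles/secretmanager.secretAccessor
```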
2. Configure CORS Properly
Only allow specific frontend origins:
3. Enable Rate Limiting

Configure REDIS_URL (see Environment Variables above) so rate limiting is active rather than failing open.
4. Monitor Logs
Regularly check logs for suspicious activity:
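A starting point for spotting anomalies (the log filter is an assumption):

```shell
# Surface recent 4xx/5xx responses for review
gcloud logging read \
  'resource.type="cloud_run_revision" AND httpRequest.status>=400' \
  --limit 50
```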
Next Steps
- Configure Environment: Set up environment variables and configuration
- Enable Rate Limiting: Protect your API with rate limiting
- API Reference: Explore API endpoints and schemas
- Architecture Guide: Understand the system architecture
Additional Resources
- Detailed Deployment Guide: backend/DEPLOYMENT.md in repository
- Cloud Build Config: backend/cloudbuild.yaml
- Google Cloud Run Docs: cloud.google.com/run
- Docker Configuration: backend/Dockerfile