
Overview

Guardian API uses a sophisticated multi-model ensemble architecture that combines machine learning models and rule-based heuristics to provide comprehensive content moderation.

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                        Client Request                        │
│                  POST /v1/moderate/text                      │
└──────────────────────────┬──────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│                    Preprocessing Layer                       │
│  • URL removal      • Mention removal    • Emoji handling   │
│  • Normalization    • Feature extraction                    │
└──────────────────────────┬──────────────────────────────────┘

              ┌────────────┼────────────┐
              │            │            │
              ▼            ▼            ▼
        ┌─────────┐  ┌─────────┐  ┌─────────┐
        │ Model 1 │  │ Model 2 │  │ Model 3 │
        │ Sexism  │  │Toxicity │  │  Rules  │
        │  LASSO  │  │RoBERTa  │  │ Engine  │
        └────┬────┘  └────┬────┘  └────┬────┘
             │            │            │
             └────────────┼────────────┘

              ┌────────────────────┐
              │    Model 4         │
              │  Ensemble Layer    │
              │  • Score fusion    │
              │  • Conflict res.   │
              │  • Severity calc.  │
              └──────────┬─────────┘


              ┌───────────────────┐
              │  JSON Response    │
              │  • Labels         │
              │  • Scores         │
              │  • Ensemble       │
              │  • Metadata       │
              └───────────────────┘

Components

1. Preprocessing Layer

The preprocessing layer prepares text for model inference:
Text Cleaning:
  • Removes URLs (http://, https://, www.)
  • Removes user mentions (@username)
  • Handles emojis (converts to descriptions)
  • Normalizes whitespace
  • Converts to lowercase
Feature Extraction:
  • Text length
  • Caps abuse detection (>70% uppercase)
  • Character repetition detection (3+ repeated chars)
  • Exclamation mark count
  • Sentiment indicators
Code Location: backend/app/core/preprocessing.py
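
A minimal sketch of what this stage could look like (illustrative only; the actual implementation lives in backend/app/core/preprocessing.py and may differ, and emoji handling is omitted here):

import re

def clean_text(text: str) -> str:
    """Illustrative cleaning pass: strip URLs and mentions, normalize whitespace and case."""
    text = re.sub(r"(https?://\S+|www\.\S+)", "", text)  # remove URLs
    text = re.sub(r"@\w+", "", text)                     # remove @mentions
    text = re.sub(r"\s+", " ", text).strip()             # normalize whitespace
    return text.lower()

def extract_features(text: str) -> dict:
    """Illustrative feature extraction matching the bullets above."""
    letters = [c for c in text if c.isalpha()]
    caps_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return {
        "length": len(text),
        "caps_abuse": caps_ratio > 0.7,                          # >70% uppercase
        "char_repetition": bool(re.search(r"(.)\1{2,}", text)),  # 3+ repeated chars
        "exclamation_count": text.count("!"),
    }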

2. Model Layer

Three independent models analyze the content in parallel:

Sexism Classifier

  • Type: LASSO Regression
  • Training: ~40k tweets
  • Features: 2500 n-grams (1-2)
  • Output: Binary classification + confidence score
  • Threshold: 0.400
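
A hypothetical scikit-learn sketch of a model with this shape (TF-IDF 1-2 gram features capped at 2500, LASSO score thresholded at 0.4); the real training data, hyperparameters, and weights are not shown here:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline

# Hypothetical pipeline: 1-2 word n-grams, 2500 features, LASSO regressor.
sexism_model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), max_features=2500),
    Lasso(alpha=0.001),  # alpha is a placeholder value
)
# sexism_model.fit(train_texts, train_labels)  # ~40k labeled tweets, labels in {0, 1}

def classify_sexism(text: str, threshold: float = 0.4) -> dict:
    """Score the text and apply the decision threshold."""
    score = float(sexism_model.predict([text])[0])
    return {"sexist": score >= threshold, "confidence": score}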

Toxicity Model

  • Type: HuggingFace Transformer
  • Model: unitary/unbiased-toxic-roberta
  • Labels: 6 toxicity categories
  • Device: CUDA if available
  • Fallback: CPU with warning
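
A minimal sketch of loading this model with the standard HuggingFace pipeline API (the production wrapper adds its own warning and error handling):

import torch
from transformers import pipeline

# Use the GPU when available, otherwise fall back to CPU.
device = 0 if torch.cuda.is_available() else -1

toxicity_model = pipeline(
    "text-classification",
    model="unitary/unbiased-toxic-roberta",
    top_k=None,      # return scores for every toxicity label
    device=device,
)

scores = toxicity_model(["example input"])[0]  # list of {"label": ..., "score": ...}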

Rule Engine

  • Type: Heuristic-based
  • Rules: JSON configuration files
  • Checks: Slurs, threats, self-harm, profanity
  • Pattern Matching: Regex + exact matching
  • Extensible: Easy to add new rules
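
A hedged sketch of how a rule check might work. The rule structure shown is hypothetical; the real rules are JSON files under backend/app/models/rules/:

import re

# Hypothetical in-memory form of a JSON rule file: a category plus exact terms
# and optional regex patterns.
RULES = [
    {"category": "threat", "terms": [], "patterns": [r"\bkill\s+you\b"]},
    {"category": "profanity", "terms": ["damn"], "patterns": []},
]

def check_rules(text: str) -> list[dict]:
    """Return a hit for every rule whose terms or patterns match the text."""
    lowered = text.lower()
    hits = []
    for rule in RULES:
        term_hit = any(term in lowered for term in rule["terms"])
        pattern_hit = any(re.search(p, lowered) for p in rule["patterns"])
        if term_hit or pattern_hit:
            hits.append({"category": rule["category"]})
    return hits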

3. Ensemble Layer

The ensemble layer combines outputs from all three models:

1. Score Aggregation

Weighted fusion of model scores:
  • 35% Sexism classifier
  • 35% Toxicity model
  • 30% Rule engine
ensemble_score = (
    0.35 * sexism_score +
    0.35 * toxicity_score +
    0.30 * rule_score
)

2. Conflict Resolution

Rule-based detections override low model scores for critical issues:
  • Threat detected → Override ensemble
  • Self-harm detected → Override ensemble
  • Slur detected → Boost ensemble score
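
A sketch of how those overrides might be applied to the fused score (the specific floor and boost values here are assumptions, not the values used in ensemble.py):

def resolve_conflicts(ensemble_score: float, rule_categories: set[str]) -> float:
    """Apply rule-based overrides on top of the weighted ensemble score."""
    if "threat" in rule_categories or "self_harm" in rule_categories:
        return max(ensemble_score, 0.9)        # critical detections override a low score
    if "slur" in rule_categories:
        return min(1.0, ensemble_score + 0.2)  # slurs boost the score
    return ensemble_score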

3. Severity Calculation

Maps scores to severity levels:
  • 0.0 - 0.3: Low
  • 0.3 - 0.6: Moderate
  • 0.6 - 1.0: High
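
As a sketch, the mapping could be implemented like this (how the exact boundary values are handled is an assumption):

def severity(score: float) -> str:
    """Map an ensemble score in [0, 1] to a severity label."""
    if score < 0.3:
        return "low"
    if score < 0.6:
        return "moderate"
    return "high"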

4. Primary Issue

Identifies the main concern based on the highest individual model score.
Code Location: backend/app/core/ensemble.py

4. Response Layer

Structured JSON response with comprehensive moderation data. See Response Structure for details.

Design Principles

Parallel Processing

All models run in parallel for optimal performance:
# Each model analyzes the text independently
sexism_result = sexism_model.predict(text)
toxicity_result = toxicity_model.predict(text)
rules_result = rule_engine.check(text)

# Results combined in ensemble
final = ensemble.aggregate(sexism_result, toxicity_result, rules_result)
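
One way the parallel dispatch could be implemented is with a thread pool (illustrative; it reuses the model objects from the snippet above, and the actual mechanism in the codebase may differ):

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=3) as pool:
    sexism_future = pool.submit(sexism_model.predict, text)
    toxicity_future = pool.submit(toxicity_model.predict, text)
    rules_future = pool.submit(rule_engine.check, text)

final = ensemble.aggregate(
    sexism_future.result(),
    toxicity_future.result(),
    rules_future.result(),
)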

Fail-Safe Design

  • Model failures don’t crash the API
  • Toxicity model falls back to CPU if GPU unavailable
  • Rate limiting fails open (allows requests if Redis down)
  • Individual model errors logged but don’t block response
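
A sketch of that pattern, where an individual model error is logged and replaced with a neutral score instead of failing the request (illustrative, not the exact handler in the codebase):

import logging

logger = logging.getLogger("guardian")

def safe_predict(model, text: str, default: float = 0.0) -> float:
    """Run one model and degrade gracefully if it raises."""
    try:
        return model.predict(text)
    except Exception:
        logger.exception("model %s failed; falling back to default score", type(model).__name__)
        return default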

Extensibility

Easy to add new models or rules:
  1. New ML Model: Implement in backend/app/models/
  2. New Rules: Add JSON file in backend/app/models/rules/
  3. New Ensemble Logic: Modify backend/app/core/ensemble.py
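
For step 1, a new model needs to expose the same kind of predict interface the ensemble already consumes. A hypothetical skeleton:

# backend/app/models/my_new_model.py  (hypothetical module name)

class MyNewModel:
    """Skeleton for an additional classifier plugged into the ensemble."""

    def __init__(self, weights_path: str):
        self.weights_path = weights_path  # load real weights here

    def predict(self, text: str) -> dict:
        score = 0.0  # placeholder: compute a real score in [0, 1]
        return {"score": score, "label": score >= 0.5}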

Performance

Response Time

  • Average: 20-40ms
  • With GPU: 15-25ms
  • Batch requests: 5-10ms per text

Throughput

  • Single instance: ~50 req/sec
  • With rate limiting: Configurable
  • Scalable: Horizontal scaling supported

Technology Stack

Layer            Technology                  Purpose
API Framework    FastAPI                     REST API, async support
Server           Uvicorn                     ASGI server
ML Framework     PyTorch, Scikit-learn       Model inference
NLP              HuggingFace Transformers    Toxicity detection
Rate Limiting    Redis (Upstash)             Optional rate limiting
Validation       Pydantic v2                 Request/response validation

Deployment Architecture

Deployment options:
  • Single Instance (shown below)
  • Load Balanced
  • Containerized
┌────────────────┐
│   Uvicorn      │
│   (Port 8000)  │
└────────┬───────┘

┌────────▼───────┐
│  FastAPI App   │
│  • Models      │
│  • Endpoints   │
└────────┬───────┘

┌────────▼───────┐
│   Redis        │
│  (Rate Limit)  │
└────────────────┘

Best for: Development, small-scale production
Capacity: ~50 requests/second

Next Steps