Overview
The Guardian API returns structured JSON responses with comprehensive moderation data from all three detection models.
Response Schema
Top-Level Fields
text (string)
The input text that was moderated. May be truncated if very long.
label (object)
Outputs from all three detection models.
ensemble (object)
Final moderation decision combining all models.
meta (object)
Request metadata including processing time and models used.
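In client code, these four fields map directly onto the parsed JSON body. A minimal sketch, assuming the response has already been parsed into a dict (the `unpack` helper and `response_body` name are placeholders, not part of the API):

```python
def unpack(response_body: dict):
    """Split a parsed Guardian API response into its four top-level fields."""
    text = response_body["text"]          # the (possibly truncated) input text
    label = response_body["label"]        # per-model outputs: sexism, toxicity, rules
    ensemble = response_body["ensemble"]  # final combined moderation decision
    meta = response_body["meta"]          # processing time and model versions
    return text, label, ensemble, meta
```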
Label Structure
Sexism Label
| Field | Type | Description |
|---|---|---|
| score | float | LASSO model confidence (0.0 to 1.0) |
| severity | string | Severity level based on score |
| model_version | string | Model identifier for tracking |
| threshold_met | boolean | True if score ≥ 0.400 |
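For example, threshold_met should mirror a direct comparison of score against the documented 0.400 cutoff. A hypothetical helper (the function name is illustrative):

```python
def sexism_flagged(response_body: dict) -> bool:
    """Equivalent to reading threshold_met directly (True if score >= 0.400)."""
    sexism = response_body["label"]["sexism"]
    return sexism["score"] >= 0.400  # mirrors the threshold_met field
```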
Toxicity Label
| Field | Type | Description |
|---|---|---|
| overall | float | Maximum toxicity across all categories |
| insult | float | Personal insult score (0-1) |
| threat | float | Threatening language score (0-1) |
| identity_attack | float | Identity-based attack score (0-1) |
| profanity | float | Profane language score (0-1) |
| model_version | string | Model identifier |
The overall score is automatically set to at least the maximum of all sub-category scores.
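A client can use this guarantee to identify which sub-category drove the overall score. A minimal sketch (field names come from the table above; the helper name is a placeholder):

```python
def dominant_toxicity_category(toxicity: dict) -> str:
    """Return the highest-scoring sub-category of a parsed label.toxicity object."""
    sub_scores = {k: toxicity[k] for k in ("insult", "threat", "identity_attack", "profanity")}
    top = max(sub_scores, key=sub_scores.get)
    # The API guarantees overall >= every sub-category score.
    assert toxicity["overall"] >= sub_scores[top]
    return top
```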
Rules Label
| Field | Type | Description |
|---|---|---|
| slur_detected | boolean | True if slurs found |
| threat_detected | boolean | True if threat patterns matched |
| self_harm_flag | boolean | True if self-harm phrases found |
| profanity_flag | boolean | True if profanity detected |
| caps_abuse | boolean | True if >70% uppercase |
| character_repetition | boolean | True if 3+ repeated characters |
| model_version | string | Model identifier |
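The caps_abuse and character_repetition heuristics can be approximated client-side for pre-filtering. The sketch below follows the documented thresholds (>70% uppercase, 3+ repeated characters); it is an illustration, not the service's actual implementation:

```python
import re

def caps_abuse(text: str) -> bool:
    """True if more than 70% of alphabetic characters are uppercase."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    return sum(c.isupper() for c in letters) / len(letters) > 0.70

def character_repetition(text: str) -> bool:
    """True if any character repeats 3 or more times in a row."""
    return re.search(r"(.)\1{2,}", text) is not None
```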
Ensemble Structure
The ensemble object provides the final moderation decision.
Summary Values
| Value | Score Range | Meaning |
|---|---|---|
| likely_safe | 0.0 - 0.1 | No significant harmful content |
| potentially_harmful | 0.1 - 0.3 | Some harmful indicators present |
| likely_harmful | 0.3 - 0.6 | Probable harmful content |
| highly_harmful | 0.6 - 1.0 | Strong evidence of harmful content |
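If you need to reproduce this mapping locally (for example, to bucket historical scores), a direct translation of the table looks like the sketch below. The boundary handling (upper bound exclusive) is an assumption; the documented ranges do not specify it:

```python
def summarize(score: float) -> str:
    """Map an ensemble score to a summary bucket per the table above.
    Treating the upper bound of each range as exclusive is an assumption."""
    if score < 0.1:
        return "likely_safe"
    if score < 0.3:
        return "potentially_harmful"
    if score < 0.6:
        return "likely_harmful"
    return "highly_harmful"
```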
Primary Issue Values
| Value | Description |
|---|---|
"none" | No significant issues detected |
"sexism" | Sexist content is the main concern |
"toxicity" | Toxic language is the main concern |
"slur" | Slur detected by rules |
"threat" | Threat detected by rules |
"self_harm" | Self-harm content detected |
"harmful_content" | Generic harmful content |
Meta Structure
Metadata about the request and processing:
| Field | Type | Description |
|---|---|---|
| processing_time_ms | integer | Total time to process request (milliseconds) |
| models_used | array | List of model versions used |
Complete Example
Example request/response pairs cover three scenarios: harmful content, safe content, and mixed signals.
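As an illustration of the overall response shape, the sketch below assembles the fields documented above into one object. Every value is invented for illustration, shown as a Python dict rather than raw JSON; the severity string and model_version identifiers are assumptions:

```python
# Illustrative response shape only; all values below are invented.
example_response = {
    "text": "the moderated input text...",
    "label": {
        "sexism": {
            "score": 0.62,
            "severity": "high",                   # assumed severity value
            "model_version": "sexism-lasso-v1",   # assumed identifier
            "threshold_met": True,
        },
        "toxicity": {
            "overall": 0.71,
            "insult": 0.71,
            "threat": 0.05,
            "identity_attack": 0.33,
            "profanity": 0.48,
            "model_version": "toxicity-v1",       # assumed identifier
        },
        "rules": {
            "slur_detected": False,
            "threat_detected": False,
            "self_harm_flag": False,
            "profanity_flag": True,
            "caps_abuse": False,
            "character_repetition": False,
            "model_version": "rules-v1",          # assumed identifier
        },
    },
    "ensemble": {
        "summary": "likely_harmful",
        "primary_issue": "toxicity",
    },
    "meta": {
        "processing_time_ms": 42,
        "models_used": ["sexism-lasso-v1", "toxicity-v1", "rules-v1"],
    },
}
```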
Using Response Data
Decision Making
1. Check ensemble.summary
Use the summary for quick decisions, as sketched after this list:
- "likely_safe": Allow content
- "potentially_harmful": Flag for review
- "likely_harmful" or "highly_harmful": Block or moderate
2. Check ensemble.primary_issue
Understand why content was flagged (see the routing sketch after this list):
- Show specific feedback to users
- Route to appropriate moderators
- Apply category-specific rules
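For instance, primary_issue can drive which review queue a flagged item lands in. A minimal sketch; the queue names are placeholders for your own moderation workflow:

```python
# Placeholder queue names; adjust to your own moderation workflow.
REVIEW_QUEUES = {
    "sexism": "sexism-review",
    "toxicity": "toxicity-review",
    "slur": "urgent-review",
    "threat": "urgent-review",
    "self_harm": "safety-escalation",
    "harmful_content": "general-review",
}

def route(ensemble: dict) -> str | None:
    """Return the review queue for a flagged response, or None if no issue."""
    issue = ensemble["primary_issue"]
    return None if issue == "none" else REVIEW_QUEUES.get(issue, "general-review")
```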
3. Use individual labels for detail
Access specific model outputs for:
- Detailed reporting
- Custom threshold logic (see the sketch after this list)
- Audit trails
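As an example of custom threshold logic, a stricter policy might flag sexism scores below the model's own 0.400 cutoff. The 0.25 value here is an arbitrary illustration:

```python
def strict_sexism_flag(label: dict, cutoff: float = 0.25) -> bool:
    """Flag content using a stricter cutoff than the default threshold_met (score >= 0.400)."""
    return label["sexism"]["score"] >= cutoff
```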
4. Monitor processing_time_ms
Track performance (a logging sketch follows this list):
- Identify slow requests
- Optimize infrastructure
- Set appropriate timeouts
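One way to act on this field is to log requests that exceed a latency budget. A minimal sketch; the 500 ms budget is an arbitrary example:

```python
import logging

logger = logging.getLogger("guardian")

def check_latency(meta: dict, budget_ms: int = 500) -> None:
    """Warn when a moderation request exceeded the latency budget."""
    elapsed = meta["processing_time_ms"]
    if elapsed > budget_ms:
        logger.warning(
            "Slow moderation request: %d ms (budget %d ms), models=%s",
            elapsed, budget_ms, meta["models_used"],
        )
```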