Overview
The Guardian API returns structured JSON responses with comprehensive moderation data from all three detection models.
Response Schema
{
  "text": string,          // Original or sanitized input
  "label": {               // Individual model outputs
    "sexism": { ... },
    "toxicity": { ... },
    "rules": { ... }
  },
  "ensemble": { ... },     // Combined decision
  "meta": { ... }          // Request metadata
}
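The sketch below shows how a client might read these top-level fields in Python. The endpoint URL and request shape are placeholders, not part of this reference; substitute your own deployment details.
import requests

# Placeholder endpoint; substitute your Guardian API deployment URL.
resp = requests.post(
    "https://your-guardian-host/moderate",
    json={"text": "Your input text here"},
    timeout=5,
)
data = resp.json()

print(data["text"])                        # original or sanitized input
print(data["label"]["sexism"]["score"])    # individual model output
print(data["ensemble"]["summary"])         # combined decision
print(data["meta"]["processing_time_ms"])  # request metadata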
Top-Level Fields
text (string)
The input text that was moderated. May be truncated if very long.
{
  "text": "Your input text here"
}
label (object)
Outputs from all three detection models.
ensemble (object)
Final moderation decision combining all models.
meta (object)
Request metadata including processing time and models used.
Label Structure
Sexism Label
{
  "label": {
    "sexism": {
      "score": 0.724,                      // Confidence score (0-1)
      "severity": "high",                  // "low", "moderate", or "high"
      "model_version": "sexism_lasso_v1",  // Model identifier
      "threshold_met": true                // Whether score exceeds threshold
    }
  }
}
| Field | Type | Description |
| --- | --- | --- |
| score | float | LASSO model confidence (0.0 to 1.0) |
| severity | string | Severity level based on score |
| model_version | string | Model identifier for tracking |
| threshold_met | boolean | True if score ≥ 0.400 |
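As a rough illustration of how a client might act on these fields (the action names are placeholders; the 0.400 threshold comes from the table above):
def sexism_action(response):
    # Placeholder actions keyed on the documented threshold_met and severity fields.
    sexism = response["label"]["sexism"]
    if not sexism["threshold_met"]:  # score below 0.400
        return "allow"
    return "block" if sexism["severity"] == "high" else "review"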
Toxicity Label
{
  "label": {
    "toxicity": {
      "overall": 0.742,                    // Overall toxicity score
      "insult": 0.631,                     // Insult score
      "threat": 0.123,                     // Threat score
      "identity_attack": 0.412,            // Identity attack score
      "profanity": 0.584,                  // Profanity score
      "model_version": "toxic_roberta_v1"  // Model identifier
    }
  }
}
| Field | Type | Description |
| --- | --- | --- |
| overall | float | Maximum toxicity across all categories |
| insult | float | Personal insult score (0-1) |
| threat | float | Threatening language score (0-1) |
| identity_attack | float | Identity-based attack score (0-1) |
| profanity | float | Profane language score (0-1) |
| model_version | string | Model identifier |
The overall score is automatically set to at least the maximum of all sub-category scores.
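In other words, every response should satisfy the invariant sketched below; this is a client-side sanity check, not something you need to compute yourself.
def overall_is_consistent(toxicity):
    # The documented invariant: overall >= max of the sub-category scores.
    sub_scores = [
        toxicity["insult"],
        toxicity["threat"],
        toxicity["identity_attack"],
        toxicity["profanity"],
    ]
    return toxicity["overall"] >= max(sub_scores)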
Rules Label
{
  "label": {
    "rules": {
      "slur_detected": false,         // Slur flag
      "threat_detected": true,        // Threat pattern flag
      "self_harm_flag": false,        // Self-harm phrase flag
      "profanity_flag": true,         // Profanity flag
      "caps_abuse": false,            // Excessive caps flag
      "character_repetition": false,  // Repeated chars flag
      "model_version": "rules_v1"     // Model identifier
    }
  }
}
| Field | Type | Description |
| --- | --- | --- |
| slur_detected | boolean | True if slurs found |
| threat_detected | boolean | True if threat patterns matched |
| self_harm_flag | boolean | True if self-harm phrases found |
| profanity_flag | boolean | True if profanity detected |
| caps_abuse | boolean | True if >70% uppercase |
| character_repetition | boolean | True if 3+ repeated characters |
| model_version | string | Model identifier |
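The caps_abuse and character_repetition heuristics can be pictured with the sketch below. This is only an illustration of the documented thresholds (>70% uppercase, 3+ repeated characters), not the service's actual implementation, and counting only alphabetic characters for the uppercase ratio is an assumption.
import re

def caps_abuse(text):
    # >70% uppercase; restricting the ratio to alphabetic characters is an assumption.
    letters = [c for c in text if c.isalpha()]
    return bool(letters) and sum(c.isupper() for c in letters) / len(letters) > 0.70

def character_repetition(text):
    # Any character repeated 3 or more times in a row.
    return re.search(r"(.)\1{2,}", text) is not None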
Ensemble Structure
The ensemble object provides the final moderation decision:
{
  "ensemble": {
    "summary": "likely_harmful",  // Overall assessment
    "primary_issue": "sexism",    // Main concern
    "score": 0.612,               // Combined score (0-1)
    "severity": "high"            // "low", "moderate", or "high"
  }
}
Summary Values
| Value | Score Range | Meaning |
| --- | --- | --- |
| likely_safe | 0.0 - 0.1 | No significant harmful content |
| potentially_harmful | 0.1 - 0.3 | Some harmful indicators present |
| likely_harmful | 0.3 - 0.6 | Probable harmful content |
| highly_harmful | 0.6 - 1.0 | Strong evidence of harmful content |
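A sketch of that mapping; which bucket the boundary values fall into is an assumption, not documented behavior.
def summary_for(score):
    # Buckets follow the Summary Values table; boundary handling is assumed.
    if score < 0.1:
        return "likely_safe"
    if score < 0.3:
        return "potentially_harmful"
    if score < 0.6:
        return "likely_harmful"
    return "highly_harmful"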
Primary Issue Values
| Value | Description |
| --- | --- |
| "none" | No significant issues detected |
| "sexism" | Sexist content is the main concern |
| "toxicity" | Toxic language is the main concern |
| "slur" | Slur detected by rules |
| "threat" | Threat detected by rules |
| "self_harm" | Self-harm content detected |
| "harmful_content" | Generic harmful content |
Meta Structure
Metadata about the request and processing:
{
  "meta": {
    "processing_time_ms": 24,  // Total processing time
    "models_used": [           // Models that ran
      "sexism_lasso_v1",
      "toxic_roberta_v1",
      "rules_v1"
    ]
  }
}
| Field | Type | Description |
| --- | --- | --- |
| processing_time_ms | integer | Total time to process request (milliseconds) |
| models_used | array | List of model versions used |
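For example, a client could watch processing_time_ms to spot slow calls; the 100 ms threshold below is an arbitrary example value, not a documented limit.
import logging

def log_slow_request(response, slow_ms=100):
    # slow_ms is an example threshold, not a documented limit.
    meta = response["meta"]
    if meta["processing_time_ms"] > slow_ms:
        logging.warning(
            "Slow moderation request: %d ms (models: %s)",
            meta["processing_time_ms"],
            ", ".join(meta["models_used"]),
        )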
Complete Examples
Harmful Content
Request:
{
  "text": "Women belong in the kitchen"
}
Response:
{
  "text": "Women belong in the kitchen",
  "label": {
    "sexism": {
      "score": 0.847,
      "severity": "high",
      "model_version": "sexism_lasso_v1",
      "threshold_met": true
    },
    "toxicity": {
      "overall": 0.621,
      "insult": 0.543,
      "threat": 0.087,
      "identity_attack": 0.621,
      "profanity": 0.124,
      "model_version": "toxic_roberta_v1"
    },
    "rules": {
      "slur_detected": false,
      "threat_detected": false,
      "self_harm_flag": false,
      "profanity_flag": false,
      "caps_abuse": false,
      "character_repetition": false,
      "model_version": "rules_v1"
    }
  },
  "ensemble": {
    "summary": "highly_harmful",
    "primary_issue": "sexism",
    "score": 0.689,
    "severity": "high"
  },
  "meta": {
    "processing_time_ms": 27,
    "models_used": [
      "sexism_lasso_v1",
      "toxic_roberta_v1",
      "rules_v1"
    ]
  }
}
Safe Content
Request:
{
  "text": "I love this product! It's amazing!"
}
Response:
{
  "text": "I love this product! It's amazing!",
  "label": {
    "sexism": {
      "score": 0.043,
      "severity": "low",
      "model_version": "sexism_lasso_v1",
      "threshold_met": false
    },
    "toxicity": {
      "overall": 0.021,
      "insult": 0.012,
      "threat": 0.008,
      "identity_attack": 0.010,
      "profanity": 0.015,
      "model_version": "toxic_roberta_v1"
    },
    "rules": {
      "slur_detected": false,
      "threat_detected": false,
      "self_harm_flag": false,
      "profanity_flag": false,
      "caps_abuse": false,
      "character_repetition": false,
      "model_version": "rules_v1"
    }
  },
  "ensemble": {
    "summary": "likely_safe",
    "primary_issue": "none",
    "score": 0.024,
    "severity": "low"
  },
  "meta": {
    "processing_time_ms": 19,
    "models_used": [
      "sexism_lasso_v1",
      "toxic_roberta_v1",
      "rules_v1"
    ]
  }
}
Mixed Signals
Request:
{
  "text": "This is garbage but I'll try again"
}
Response:
{
  "text": "This is garbage but I'll try again",
  "label": {
    "sexism": {
      "score": 0.123,
      "severity": "low",
      "model_version": "sexism_lasso_v1",
      "threshold_met": false
    },
    "toxicity": {
      "overall": 0.412,
      "insult": 0.234,
      "threat": 0.067,
      "identity_attack": 0.089,
      "profanity": 0.412,
      "model_version": "toxic_roberta_v1"
    },
    "rules": {
      "slur_detected": false,
      "threat_detected": false,
      "self_harm_flag": false,
      "profanity_flag": true,
      "caps_abuse": false,
      "character_repetition": false,
      "model_version": "rules_v1"
    }
  },
  "ensemble": {
    "summary": "likely_harmful",
    "primary_issue": "toxicity",
    "score": 0.341,
    "severity": "moderate"
  },
  "meta": {
    "processing_time_ms": 22,
    "models_used": [
      "sexism_lasso_v1",
      "toxic_roberta_v1",
      "rules_v1"
    ]
  }
}
Using Response Data
Decision Making
Check ensemble.summary
Use the summary for quick decisions:
"likely_safe": Allow content
"potentially_harmful": Flag for review
"likely_harmful" or "highly_harmful": Block or moderate
Check ensemble.primary_issue
Understand why content was flagged (see the routing sketch before the Example Implementation below):
Show specific feedback to users
Route to appropriate moderators
Apply category-specific rules
Use individual labels for detail
Access specific model outputs for:
Detailed reporting
Custom threshold logic
Audit trails
Monitor processing_time_ms
Track performance:
Identify slow requests
Optimize infrastructure
Set appropriate timeouts
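The sketch below ties the primary_issue guidance above to moderator routing. The queue names are placeholders for whatever teams or workflows you use.
# Placeholder queue names; map primary_issue values to your own workflows.
REVIEW_QUEUES = {
    "sexism": "gender-policy-review",
    "toxicity": "abuse-review",
    "slur": "abuse-review",
    "threat": "safety-escalation",
    "self_harm": "safety-escalation",
    "harmful_content": "general-review",
}

def route_for_review(response):
    # Returns a queue name, or None when primary_issue is "none" or unrecognized.
    issue = response["ensemble"]["primary_issue"]
    return REVIEW_QUEUES.get(issue)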
Example Implementation
def handle_moderation_result(response):
    ensemble = response["ensemble"]

    if ensemble["summary"] == "highly_harmful":
        # Block immediately
        return "BLOCK"
    elif ensemble["summary"] == "likely_harmful":
        # Check specific issues
        if ensemble["primary_issue"] in ["threat", "self_harm"]:
            return "BLOCK_AND_ALERT"
        else:
            return "HOLD_FOR_REVIEW"
    elif ensemble["summary"] == "potentially_harmful":
        # Allow with monitoring
        return "ALLOW_WITH_FLAG"
    else:
        # Safe to publish
        return "ALLOW"
Next Steps
API Reference Try the moderation endpoint
Ensemble Learn how scores are calculated
Python SDK Use the Python SDK
JavaScript SDK Use the JavaScript SDK