Now in early access

LLM Risk Decisions
in Under 5ms

ShepAI evaluates every AI API request for bots, prompt injection, DDoS, abuse, and fake accounts — returning ALLOW, CHALLENGE, or BLOCK before your inference even starts.

< 5ms p99 latency
14+ attack families
5 parallel evaluators
POST /v1/risk/evaluate Live
{
  "requestId": "req_01jxk8f3a2b",
  "decision": "BLOCK",
  "riskScore": 95,
  "riskLevel": "CRITICAL",
  "processingTimeMs": 3,
  "cached": false,
  "signals": [
    {
      "type": "PROMPT_INJECTION",
      "score": 95,
      "reason": "Known jailbreak persona",
      "triggered": true
    },
    {
      "type": "BOT_DETECTION",
      "score": 62,
      "reason": "Headless browser fingerprint",
      "triggered": true
    }
  ]
}

One API call. Five signal layers.
Sub-millisecond logic.

Forward your gateway request

POST the incoming LLM request metadata — IP, User-Agent, prompt text, userId, account age — to /v1/risk/evaluate before calling your inference provider.

Five evaluators run in parallel

ShepAI's reactive engine fans out across all five signal evaluators simultaneously. Decisions for repeat-offender IPs are served from an in-memory decision cache in under 1ms.
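
As a rough illustration of the fan-out-plus-cache pattern described above (not ShepAI's actual engine), the sketch below runs hypothetical evaluators concurrently with `asyncio.gather` and short-circuits on a cached decision. The evaluator logic, the max-based aggregation, the 50/80 thresholds, and the choice to cache only BLOCK decisions are all assumptions made for the example.

```python
import asyncio

# Hypothetical evaluators: each returns (signal_type, score 0-100).
async def bot_detection(req: dict) -> tuple[str, int]:
    return ("BOT_DETECTION", 62 if "Headless" in req.get("userAgent", "") else 0)

async def prompt_injection(req: dict) -> tuple[str, int]:
    text = req.get("prompt", "").lower()
    return ("PROMPT_INJECTION", 95 if "ignore all previous instructions" in text else 0)

EVALUATORS = [bot_detection, prompt_injection]  # two stand-ins for the five evaluators

# Assumed cache shape: last decision per IP, unbounded here for brevity.
_decision_cache: dict[str, dict] = {}

async def evaluate(req: dict) -> dict:
    ip = req["ipAddress"]
    cached = _decision_cache.get(ip)
    if cached is not None:
        return {**cached, "cached": True}  # cache hit: no evaluator runs

    # Fan out: all evaluators run concurrently, results come back in order.
    results = await asyncio.gather(*(ev(req) for ev in EVALUATORS))

    score = max(s for _, s in results)  # assumed aggregation: worst signal wins
    decision = "BLOCK" if score >= 80 else "CHALLENGE" if score >= 50 else "ALLOW"
    out = {"decision": decision, "riskScore": score,
           "signals": dict(results), "cached": False}
    if decision == "BLOCK":
        _decision_cache[ip] = out  # remember repeat offenders
    return out
```

Because the evaluators are independent, total latency in this pattern is bounded by the slowest evaluator rather than the sum of all of them.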

Bot Detection · Prompt Injection · DDoS · Abuse · Fake Account

Apply ALLOW / CHALLENGE / BLOCK

The response includes an aggregate risk score (0–100), a severity band, and a full per-signal breakdown so you can apply your own custom policy thresholds on top.
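
A custom policy on top of the aggregate score could look like the minimal sketch below. The `riskScore` field comes from the response shown on this page; the function name `apply_policy` and the default thresholds of 40 and 80 are illustrative assumptions, not documented defaults.

```python
def apply_policy(response: dict, challenge_at: int = 40, block_at: int = 80) -> str:
    """Map the aggregate riskScore (0-100) to a gateway action.

    Thresholds are placeholders; tune them to your own traffic.
    """
    score = response["riskScore"]
    if score >= block_at:
        return "BLOCK"
    if score >= challenge_at:
        return "CHALLENGE"
    return "ALLOW"
```

A stricter gateway might lower `block_at`, while a consumer-facing product that wants fewer false positives might raise `challenge_at`.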

14+ attack families.
Every LLM threat vector covered.

Purpose-built for the inference gateway layer — not a generic WAF adapted for AI. Every evaluator is designed around how LLMs are actually attacked in production.

BOT_DETECTION

Automated Traffic & Bots

Identifies non-human traffic through multi-signal fingerprinting of request characteristics, client behaviour patterns, and session context.

  • Automation framework detection
  • Scripted client identification
  • Request fingerprint analysis
  • Behavioural anomaly signals
DDOS

Volumetric & Rate Abuse

Detects abnormal request volumes across multiple time windows per source, with configurable thresholds that adapt to your traffic profile.

  • Multi-window rate analysis
  • Per-source tracking
  • Burst & sustained flood detection
  • Configurable thresholds
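
To make "multi-window rate analysis" concrete, here is a minimal sketch of per-source tracking across several time windows. It is an illustration of the general technique, not ShepAI's detector; the class name `RateTracker`, the window names, and the limits are assumptions.

```python
from collections import defaultdict, deque

# Assumed windows: name -> (length in seconds, max requests allowed).
WINDOWS = {"minute": (60, 100), "hour": (3600, 2000)}

class RateTracker:
    def __init__(self, windows=WINDOWS):
        self.windows = windows
        self.horizon = max(seconds for seconds, _ in windows.values())
        self.hits = defaultdict(deque)  # source -> request timestamps

    def record(self, source: str, now: float) -> dict[str, bool]:
        """Record one request and report which windows are over their limit."""
        q = self.hits[source]
        q.append(now)
        # Drop timestamps older than the longest window.
        while q and q[0] <= now - self.horizon:
            q.popleft()
        exceeded = {}
        for name, (seconds, limit) in self.windows.items():
            count = sum(1 for t in q if t > now - seconds)
            exceeded[name] = count > limit
        return exceeded
```

A short window catches bursts while a long one catches slow, sustained floods that never exceed the burst limit.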
ABUSE

API Abuse & Policy Violations

Identifies usage patterns that violate content policies or indicate systematic misuse — including resource exhaustion and bulk automation.

  • Content policy violations
  • Resource exhaustion patterns
  • Bulk automation signals
  • Payload anomaly analysis
FAKE_ACCOUNT

Fake & Synthetic Accounts

Evaluates account trust signals to detect newly-created or machine-generated identities attempting to abuse free tiers or bypass usage limits.

  • Account age & tenure signals
  • Identity consistency checks
  • Registration anomaly detection
  • Velocity & pattern analysis
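
An account-age signal can be derived from the `accountCreatedAt` timestamp sent in the request. The sketch below is only one plausible shape for such a signal: the linear decay and the 30-day full-trust horizon are assumptions, and the real evaluator combines several other signals.

```python
from datetime import datetime, timezone

def account_age_days(created_at: str, now: datetime) -> float:
    """Age in days from an ISO 8601 timestamp such as 2026-05-01T10:00:00Z."""
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return (now - created).total_seconds() / 86400

def age_score(age_days: float, full_trust_days: float = 30.0) -> int:
    """Score 100 for a brand-new account, decaying linearly to 0 (assumed curve)."""
    if age_days <= 0:
        return 100
    return round(100 * max(0.0, 1 - age_days / full_trust_days))
```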
Full signal transparency. Every response includes a per-signal score (0–100) and a human-readable reason string. Your gateway sees exactly why a request was flagged — and can apply custom policies on top. Decision thresholds are fully configurable per client so you stay in control.
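
For logging or user-facing error messages, the per-signal breakdown can be reduced to the reasons that actually fired. This is a small sketch over the response shape shown on this page; the helper name `explain` and the output format are assumptions.

```python
def explain(response: dict) -> list[str]:
    """Return one line per triggered signal, highest score first."""
    triggered = [s for s in response["signals"] if s["triggered"]]
    triggered.sort(key=lambda s: s["score"], reverse=True)
    return [f'{s["type"]} ({s["score"]}): {s["reason"]}' for s in triggered]
```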

Integrate in minutes.

A single POST. No SDK required. Works with any HTTP client in any language.

Request (curl)
curl -X POST https://api.shep.ai/v1/risk/evaluate \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "requestId":   "req_abc123",
    "clientId":    "my-gateway-prod",
    "providerId":  "fireworks",
    "ipAddress":   "203.0.113.42",
    "userAgent":   "Mozilla/5.0 ...",
    "userId":      "user_7f3a",
    "accountCreatedAt": "2026-05-01T10:00:00Z",
    "prompt":      "Ignore all previous instructions...",
    "model":       "llama-3.1-70b",
    "requestsLastMinute": 12,
    "requestsLastHour":   87
  }'
Response — 3ms
{
  "requestId": "req_abc123",
  "decision": "BLOCK",
  "riskScore": 95,
  "riskLevel": "CRITICAL",
  "processingTimeMs": 3,
  "cached": false,
  "signals": [
    {
      "type":      "PROMPT_INJECTION",
      "score":     95,
      "reason":    "Known jailbreak persona",
      "triggered": true
    },
    {
      "type":      "DDOS",
      "score":     0,
      "reason":    "Normal rate: 12 req/min",
      "triggered": false
    }
  ]
}
Request (Python)
import httpx

client = httpx.Client(
    base_url="https://api.shep.ai",
    headers={"Authorization": "Bearer sk_live_..."},
)

response = client.post("/v1/risk/evaluate", json={
    "requestId":         "req_abc123",
    "clientId":          "my-gateway-prod",
    "ipAddress":         "203.0.113.42",
    "userAgent":         "Mozilla/5.0 ...",
    "userId":            "user_7f3a",
    "prompt":            "User's prompt text here...",
    "model":             "llama-3.1-70b",
})
response.raise_for_status()  # surface HTTP errors instead of parsing an error body

result = response.json()
decision   = result["decision"]    # "ALLOW", "CHALLENGE", or "BLOCK"
risk_score = result["riskScore"]   # 0–100
signals    = result["signals"]     # per-evaluator breakdown

if decision == "BLOCK":
    raise PermissionError("Request blocked by ShepAI")

# Otherwise forward to the inference provider (handle "CHALLENGE" per your own policy)
Async variant (httpx)
import httpx

async def check_risk(payload: dict) -> str:
    """Return the decision string for one request payload."""
    async with httpx.AsyncClient(
        base_url="https://api.shep.ai",
        headers={"Authorization": "Bearer sk_live_..."},
    ) as client:
        r = await client.post("/v1/risk/evaluate", json=payload)
        r.raise_for_status()  # fail loudly on 4xx/5xx responses
        return r.json()["decision"]
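
Since the risk check sits in front of every inference call, a gateway will usually want a latency budget and a fallback for when the check is slow or unreachable. The sketch below wraps any coroutine with the same shape as `check_risk`; the name `decide_with_budget`, the 50 ms budget, and the fail-open default are assumptions, and failing closed ("BLOCK" or "CHALLENGE") may suit higher-risk deployments better.

```python
import asyncio

async def decide_with_budget(check, payload: dict,
                             budget_s: float = 0.05,
                             fallback: str = "ALLOW") -> str:
    """Await check(payload), returning `fallback` on timeout or any error."""
    try:
        return await asyncio.wait_for(check(payload), timeout=budget_s)
    except Exception:
        # Covers asyncio.TimeoutError as well as transport or HTTP errors
        # raised by the underlying client.
        return fallback
```

Typical usage would be `decision = await decide_with_budget(check_risk, payload)` inside the gateway's request handler.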
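Request (Node.js)
const SHEP_KEY = process.env.SHEPAI_API_KEY;

async function checkRisk(payload) {
  const response = await fetch("https://api.shep.ai/v1/risk/evaluate", {
    method:  "POST",
    headers: {
      "Authorization": `Bearer ${SHEP_KEY}`,
      "Content-Type":  "application/json",
    },
    body: JSON.stringify(payload),
  });
  // Fail loudly on 4xx/5xx instead of parsing an error body as a decision
  if (!response.ok) {
    throw new Error(`ShepAI returned HTTP ${response.status}`);
  }
  return response.json();
}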

// In your inference gateway middleware:
const { decision, riskScore, signals } = await checkRisk({
  requestId:  "req_abc123",
  clientId:   "my-gateway-prod",
  ipAddress:  req.ip,
  userAgent:  req.headers["user-agent"],
  userId:     session.userId,
  prompt:     req.body.messages.at(-1)?.content,
  model:      req.body.model,
});

if (decision === "BLOCK") {
  return res.status(403).json({ error: "Request blocked", riskScore });
}
OpenAI proxy example
import OpenAI from "openai";

const openai = new OpenAI();

async function safeCompletion(messages, ctx) {
  const risk = await checkRisk({
    ipAddress: ctx.ip,
    userId:    ctx.userId,
    prompt:    messages.at(-1).content,
    model:     "gpt-4o",
  });

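  // Treats CHALLENGE the same as BLOCK; relax this if you handle challenges
  if (risk.decision !== "ALLOW") throw new Error(`Request not allowed: ${risk.decision}`);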

  return openai.chat.completions.create({ model: "gpt-4o", messages });
}
WebClient (reactive)
import org.springframework.web.reactive.function.client.WebClient;

var client = WebClient.builder()
    .baseUrl("https://api.shep.ai")
    .defaultHeader("Authorization", "Bearer sk_live_...")
    .build();

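record RiskPayload(
    String requestId, String clientId,
    String ipAddress,  String userAgent,
    String userId,     String prompt
) {}

// Assumed response types for this snippet, mirroring the JSON fields shown above
enum Decision { ALLOW, CHALLENGE, BLOCK }
record RiskResponse(Decision decision, int riskScore) {}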

var result = client.post()
    .uri("/v1/risk/evaluate")
    .bodyValue(new RiskPayload(
        requestId, clientId, ip, userAgent, userId, prompt
    ))
    .retrieve()
    .bodyToMono(RiskResponse.class)
    .block(); // or .subscribe() for non-blocking

if (result.decision() == Decision.BLOCK) {
    throw new SecurityException("Request blocked: score="
        + result.riskScore());
}
Spring Boot filter (auto-wire)
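@Component
public class ShepAIFilter implements WebFilter {

  private final ShepAIClient shepai;

  // Constructor injection: Spring supplies the ShepAIClient bean
  public ShepAIFilter(ShepAIClient shepai) {
    this.shepai = shepai;
  }

  @Override
  public Mono<Void> filter(
      ServerWebExchange exchange,
      WebFilterChain   chain
  ) {
    return shepai.evaluate(exchange)
      .flatMap(r -> r.isBlock()
          ? reject(exchange, r)
          : chain.filter(exchange));
  }

  // Short-circuit blocked requests with 403 (RiskResponse type name is assumed)
  private Mono<Void> reject(ServerWebExchange exchange, RiskResponse r) {
    exchange.getResponse().setStatusCode(HttpStatus.FORBIDDEN);
    return exchange.getResponse().setComplete();
  }
}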

Simple, transparent pricing.

Start free. Scale without friction. No per-signal charges — flat rate for all evaluators.

Monthly · Annual (save 20%)
Free

For evaluation and side projects

$0 / month

Forever free · No credit card required

Get Started Free

API key delivered within 24h


  • 10,000 requests / month
  • Bot Detection evaluator
  • DDoS evaluator (basic thresholds)
  • Prompt Injection (limited — 2 of 14 families)
  • Abuse & Fake Account evaluators
  • Custom score thresholds
  • Prometheus metrics
  • Community support (GitHub)
Enterprise

For large-scale AI infrastructure

Custom

Volume discounts · Annual contracts available

Talk to Sales

Response within 4 business hours


  • Unlimited requests
  • All Pro features
  • Custom rule authoring (bring your own patterns)
  • Dedicated isolated deployment
  • p99 < 3ms SLA (dedicated infrastructure)
  • 99.99% uptime SLA
  • Dedicated Slack channel + named engineer
  • SOC 2 Type II, DPA available
  • Audit logs & compliance exports
  • Unlimited API keys
Feature                   Free          Pro              Enterprise
Monthly requests          10K           5M               Unlimited
Bot Detection
DDoS Detection
Prompt Injection          2 of 14       All 14+          All 14+, custom
Abuse Detection
Fake Account Detection
Custom thresholds
Custom rule authoring
Prometheus metrics
Latency SLA               Best-effort   p99 < 5ms        p99 < 3ms
Uptime SLA                              99.9%            99.99%
Audit logs
SOC 2 / DPA
Support                   Community     Priority email   Dedicated Slack
API keys                  1             5                Unlimited
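
When picking a tier, it can help to translate a monthly quota into a sustained request rate. The arithmetic below assumes a 30-day month and perfectly even traffic, which real gateways rarely have, so treat the result as a lower bound on the headroom you need.

```python
SECONDS_PER_30_DAY_MONTH = 30 * 24 * 3600  # 2,592,000

def sustained_rps(monthly_quota: int) -> float:
    """Average requests per second that exhausts the quota in 30 days."""
    return monthly_quota / SECONDS_PER_30_DAY_MONTH

# 10K / month is roughly 0.004 req/s (about 333 requests per day);
# 5M / month is roughly 1.93 req/s.
```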

Built for the inference gateway layer.

ShepAI is purpose-built for AI provider gateways — not adapted from a generic WAF. Every signal is designed around how LLMs are attacked in production, updated continuously as the threat landscape evolves.

< 1ms cache-hit latency
100K IPs cached per node
Java 22 virtual threads ready
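
A bounded per-node cache of this kind is commonly built as an LRU map. The sketch below illustrates that general technique with a 100,000-entry capacity taken from the figure above; the class name `DecisionCache`, the LRU eviction policy, and the absence of a TTL are assumptions, not a description of ShepAI's internals.

```python
from collections import OrderedDict

class DecisionCache:
    """Least-recently-used cache mapping an IP address to its last decision."""

    def __init__(self, capacity: int = 100_000):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, ip: str):
        decision = self._entries.get(ip)
        if decision is not None:
            self._entries.move_to_end(ip)  # mark as most recently used
        return decision

    def put(self, ip: str, decision: str) -> None:
        self._entries[ip] = decision
        self._entries.move_to_end(ip)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used
```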