ShepAI — LLM Risk Decisions in Under 5ms

What We Detect

6 evaluators. Every LLM threat vector covered.

Purpose-built for the inference gateway layer — not a generic WAF adapted for AI. Every evaluator is designed around how LLMs are actually attacked in production.

PROMPT_INJECTION

LLM Prompt Injection & Jailbreaks

Multi-layer detection covering the full spectrum of adversarial prompt techniques used against production LLMs — from social engineering to structural manipulation.

BOT_DETECTION

Automated Traffic & Bots

Identifies non-human traffic through multi-signal fingerprinting of request characteristics, client behaviour patterns, and session context.

DDOS

Volumetric & Rate Abuse

Detects abnormal request volumes across multiple time windows per source, with configurable thresholds that adapt to your traffic profile.

ABUSE

API Abuse & Policy Violations

Identifies usage patterns that violate content policies or indicate systematic misuse — including resource exhaustion and bulk automation.

FAKE_ACCOUNT

Fake & Synthetic Accounts

Evaluates account trust signals to detect newly created or machine-generated identities attempting to abuse free tiers or bypass usage limits.

DISTILLATION

Model Extraction & Knowledge Theft

Detects systematic attempts to clone or replicate your AI model's capabilities through structured bulk queries, capability mapping, and output harvesting campaigns.

Full signal transparency. Every response includes a per-signal score (0–100) and a human-readable reason string. Your gateway sees exactly why a request was flagged — and can apply custom policies on top. Decision thresholds are fully configurable per client so you stay in control.
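As a rough sketch of what a custom policy on top of the per-signal scores could look like in a gateway, the function below re-checks the `signals` array against thresholds of your own. Only the `type`, `score`, `signals`, and `decision` field names come from the sample response; the `apply_policy` name and the threshold values are illustrative assumptions, not part of the ShepAI API.

```python
# Hypothetical gateway-side policy layered on top of a ShepAI response.
# Threshold values are illustrative assumptions, not ShepAI defaults.
CUSTOM_THRESHOLDS = {
    "PROMPT_INJECTION": 70,
    "DISTILLATION": 80,
}

def apply_policy(result: dict, thresholds: dict = CUSTOM_THRESHOLDS) -> str:
    """Return "BLOCK" if any signal meets its custom threshold,
    otherwise fall back to the decision ShepAI returned."""
    for signal in result.get("signals", []):
        limit = thresholds.get(signal["type"])
        if limit is not None and signal["score"] >= limit:
            return "BLOCK"
    return result["decision"]

sample = {
    "decision": "ALLOW",
    "signals": [
        {"type": "PROMPT_INJECTION", "score": 75, "reason": "example", "triggered": False},
        {"type": "DDOS", "score": 0, "reason": "example", "triggered": False},
    ],
}
print(apply_policy(sample))  # BLOCK
```

Signal types without an entry in the threshold table simply defer to the returned decision, so the policy only ever tightens the result.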

Quick Start

Integrate in minutes.

A single POST. No SDK required. Works with any HTTP client in any language.

Request (cURL)
curl -X POST https://shepai.cloud/v1/risk/evaluate \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "requestId":   "req_abc123",
    "clientId":    "my-gateway-prod",
    "providerId":  "fireworks",
    "ipAddress":   "203.0.113.42",
    "userAgent":   "Mozilla/5.0 ...",
    "userId":      "user_7f3a",
    "accountCreatedAt": "2026-05-01T10:00:00Z",
    "prompt":      "Ignore all previous instructions...",
    "model":       "llama-3.1-70b",
    "requestsLastMinute": 12,
    "requestsLastHour":   87
  }'
Response — 3ms
{
  "requestId": "req_abc123",
  "decision": "BLOCK",
  "riskScore": 95,
  "riskLevel": "CRITICAL",
  "processingTimeMs": 3,
  "cached": false,
  "signals": [
    {
      "type":      "PROMPT_INJECTION",
      "score":     95,
      "reason":    "Known jailbreak persona",
      "triggered": true
    },
    {
      "type":      "DDOS",
      "score":     0,
      "reason":    "Normal rate: 12 req/min",
      "triggered": false
    }
  ]
}
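The request body carries `requestsLastMinute` and `requestsLastHour`, which suggests the gateway supplies those counts itself. One possible way to track them per source is a small in-memory sliding window, sketched below. The `RateCounter` class is hypothetical and single-process only; a real deployment would more likely keep the counters in a shared store.

```python
import time
from collections import defaultdict, deque

class RateCounter:
    """Hypothetical per-source request counter (single process, in memory)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._hits = defaultdict(deque)  # source -> timestamps, oldest first

    def record(self, source: str) -> dict:
        """Record one request and return counts for the last minute and hour."""
        now = self._clock()
        hits = self._hits[source]
        hits.append(now)
        # Drop timestamps that fall outside the largest window (one hour).
        while hits and now - hits[0] >= 3600:
            hits.popleft()
        last_minute = sum(1 for t in hits if now - t < 60)
        return {"requestsLastMinute": last_minute, "requestsLastHour": len(hits)}

counter = RateCounter()
# Merge the counts into the payload before calling /v1/risk/evaluate.
payload = {"ipAddress": "203.0.113.42", **counter.record("203.0.113.42")}
```

The clock is injectable so the windows can be exercised in tests without sleeping.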

Request (Python)
import httpx

client = httpx.Client(
    base_url="https://shepai.cloud",
    headers={"Authorization": "Bearer sk_live_..."},
)

response = client.post("/v1/risk/evaluate", json={
    "requestId":         "req_abc123",
    "clientId":          "my-gateway-prod",
    "ipAddress":         "203.0.113.42",
    "userAgent":         "Mozilla/5.0 ...",
    "userId":            "user_7f3a",
    "prompt":            "User's prompt text here...",
    "model":             "llama-3.1-70b",
})
response.raise_for_status()  # fail loudly on auth or validation errors

result = response.json()
if result["decision"] == "BLOCK":
    raise PermissionError("Request blocked by ShepAI")

# Not blocked: handle CHALLENGE if you use it, otherwise forward to the inference provider
decision   = result["decision"]    # "ALLOW" or "CHALLENGE"
risk_score = result["riskScore"]   # 0–100
signals    = result["signals"]     # per-evaluator breakdown
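Async variant (httpx)
import asyncio
import httpx

async def check_risk(payload: dict) -> str:
    """Return the ShepAI decision for one request payload."""
    # For brevity a client is created per call; in production, reuse one
    # AsyncClient across requests to avoid repeated connection setup.
    async with httpx.AsyncClient(
        base_url="https://shepai.cloud",
        headers={"Authorization": "Bearer sk_live_..."},
    ) as client:
        r = await client.post("/v1/risk/evaluate", json=payload)
        r.raise_for_status()
        return r.json()["decision"]

# Usage: decision = asyncio.run(check_risk(payload))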
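Because the risk check sits on the request path, a gateway may want to cap how long it waits for a decision. The sketch below wraps any async check in a time budget using only the standard library. The `decide_with_budget` helper, the 50 ms default, and the fail-open fallback are all assumptions made for illustration; whether to fail open or closed is a policy choice the source does not prescribe.

```python
import asyncio

async def decide_with_budget(check, payload: dict, budget_s: float = 0.05,
                             fallback: str = "ALLOW") -> str:
    """Await check(payload) but give up after budget_s seconds.

    `check` is any coroutine function returning a decision string, such as
    an async risk-check helper. `fallback` is the decision used on timeout:
    "ALLOW" fails open, "BLOCK" fails closed. Both defaults are assumptions.
    """
    try:
        return await asyncio.wait_for(check(payload), timeout=budget_s)
    except asyncio.TimeoutError:
        return fallback

# Demo with stand-in coroutines instead of a real network call.
async def fast_check(payload: dict) -> str:
    return "BLOCK"

async def slow_check(payload: dict) -> str:
    await asyncio.sleep(1)
    return "BLOCK"

async def demo() -> tuple:
    return (
        await decide_with_budget(fast_check, {}),
        await decide_with_budget(slow_check, {}, budget_s=0.01),
    )

print(asyncio.run(demo()))  # ('BLOCK', 'ALLOW')
```

The slow check is cancelled when the budget expires, so a stalled call does not hold the request open.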

Request (Node.js)
const SHEP_KEY = process.env.SHEPAI_API_KEY;

async function checkRisk(payload) {
  const response = await fetch("https://shepai.cloud/v1/risk/evaluate", {
    method:  "POST",
    headers: {
      "Authorization": `Bearer ${SHEP_KEY}`,
      "Content-Type":  "application/json",
    },
    body: JSON.stringify(payload),
  });
  if (!response.ok) {
    throw new Error(`ShepAI request failed: HTTP ${response.status}`);
  }
  return response.json();
}

// In your inference gateway middleware:
const { decision, riskScore, signals } = await checkRisk({
  requestId:  "req_abc123", // use a unique ID per request
  clientId:   "my-gateway-prod",
  ipAddress:  req.ip,
  userAgent:  req.headers["user-agent"],
  userId:     session.userId,
  prompt:     req.body.messages.at(-1)?.content,
  model:      req.body.model,
});

if (decision === "BLOCK") {
  return res.status(403).json({ error: "Request blocked", riskScore });
}
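OpenAI proxy example
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// checkRisk is the helper defined in the previous example
async function safeCompletion(messages, ctx) {
  const risk = await checkRisk({
    ipAddress: ctx.ip,
    userId:    ctx.userId,
    prompt:    messages.at(-1).content,
    model:     "gpt-4o",
  });

  // Anything other than ALLOW (BLOCK or CHALLENGE) stops the request here
  if (risk.decision !== "ALLOW") {
    throw new Error(`Request not allowed: ${risk.decision}`);
  }

  return openai.chat.completions.create({ model: "gpt-4o", messages });
}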

WebClient (reactive)
import org.springframework.web.reactive.function.client.WebClient;

// Application-defined types mirroring the JSON request and response
enum Decision { ALLOW, CHALLENGE, BLOCK }

record RiskPayload(
    String requestId, String clientId,
    String ipAddress,  String userAgent,
    String userId,     String prompt
) {}

// Only the response fields used below
record RiskResponse(Decision decision, int riskScore) {}

var client = WebClient.builder()
    .baseUrl("https://shepai.cloud")
    .defaultHeader("Authorization", "Bearer sk_live_...")
    .build();

var result = client.post()
    .uri("/v1/risk/evaluate")
    .bodyValue(new RiskPayload(
        requestId, clientId, ip, userAgent, userId, prompt
    ))
    .retrieve()
    .bodyToMono(RiskResponse.class)
    .block(); // blocks the calling thread; in reactive code, compose the Mono instead

if (result.decision() == Decision.BLOCK) {
    throw new SecurityException("Request blocked: score="
        + result.riskScore());
}
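Spring Boot filter (auto-wire)
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import org.springframework.web.server.WebFilter;
import org.springframework.web.server.WebFilterChain;
import reactor.core.publisher.Mono;

@Component
public class ShepAIFilter implements WebFilter {
  // ShepAIClient is an application-defined wrapper around the HTTP call above
  private final ShepAIClient shepai;

  public ShepAIFilter(ShepAIClient shepai) { this.shepai = shepai; } // injected by Spring

  @Override
  public Mono<Void> filter(
      ServerWebExchange exchange,
      WebFilterChain   chain
  ) {
    return shepai.evaluate(exchange)
      .flatMap(r -> r.isBlock()
          ? reject(exchange)
          : chain.filter(exchange));
  }

  // Reply with 403 Forbidden and end the exchange without calling the chain
  private Mono<Void> reject(ServerWebExchange exchange) {
    exchange.getResponse().setStatusCode(HttpStatus.FORBIDDEN);
    return exchange.getResponse().setComplete();
  }
}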

Pricing

Simple, transparent pricing.

Start free. Scale without friction. No per-signal charges: each plan is a flat rate covering every evaluator it includes.

Billing: monthly or annual (annual saves 20%)

Free

For evaluation and side projects

$0 / month

Forever free · No credit card required

API key delivered within 24h

✓ 10,000 requests / month
✓ Bot Detection evaluator
✓ DDoS evaluator (basic thresholds)
– Prompt Injection (limited: 2 of 14 families)
– Abuse & Fake Account evaluators
– Custom score thresholds
– Prometheus metrics
✓ Community support (GitHub)

One API call. Six signal layers.
Sub-millisecond logic.

Forward your gateway request

Six evaluators run in parallel

Apply ALLOW / CHALLENGE / BLOCK
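
The last step, acting on the decision, could be sketched as a small dispatch in the gateway. The three decision values come from the text above; the `gateway_action` name, the HTTP status codes, and the reading of CHALLENGE as a request for additional verification are illustrative assumptions rather than documented ShepAI behaviour.

```python
# Hypothetical mapping from a ShepAI decision to a gateway HTTP response.
# Status codes and messages are illustrative choices.
def gateway_action(decision: str) -> tuple:
    """Return (http_status, message); status 200 means forward upstream."""
    if decision == "ALLOW":
        return 200, "forward to inference provider"
    if decision == "CHALLENGE":
        return 401, "require additional verification"
    if decision == "BLOCK":
        return 403, "request blocked"
    # Unknown values are treated like BLOCK so new decisions never pass silently.
    return 403, "unrecognised decision"

print(gateway_action("CHALLENGE"))  # (401, 'require additional verification')
```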

Built for the inference gateway layer.

Feature	Free	Pro	Enterprise
Monthly requests	10K	5M	Unlimited
Bot Detection	✓	✓	✓
DDoS Detection	✓	✓	✓
Prompt Injection	Partial	Full	Full + custom
Abuse Detection	–	✓	✓
Fake Account Detection	–	✓	✓
Model Extraction Detection	–	✓	✓
Custom thresholds	–	✓	✓
Custom rule authoring	–	–	✓
Prometheus metrics	–	✓	✓
Latency SLA	Best-effort	p99 < 5ms	p99 < 3ms
Uptime SLA	–	99.9%	99.99%
Audit logs	–	–	✓
SOC 2 / DPA	–	–	✓
Support	Community	Priority email	Dedicated Slack
API keys	1	5	Unlimited
