Request lifecycle

Every request passes through an ordered chain of policy gates in the data plane before it reaches a provider. Each gate either lets the request continue or rejects it. The gates run in a fixed order so that, for example, a request is authenticated before any budget is charged, and a cache hit avoids upstream spend entirely.

The gates in order

  1. Key-auth: validates the API key and resolves the caller's identity (the organization.project.user consumer). Every downstream gate keys on this identity.
  2. Guardrails: screens the request with PII masking, pattern-based prompt-injection detection, and semantic prompt-injection detection. A match short-circuits with a 403 before any model is called.
  3. Routing: resolves the client's logical model (e.g. coding-default) to the configured provider and upstream model. For agent traffic, the equivalent step routes to the registered MCP server.
  4. Budget & limits: checks the consumer's monthly USD budget (hierarchical) and per-minute token limit. If either cap is exceeded, the request is rejected before it reaches the provider.
  5. Semantic cache: if enabled, a prompt similar to an earlier one is answered from the cache, skipping the provider and saving the full token spend.
  6. Provider / MCP server: the request reaches your upstream. The response flows back, and usage is recorded for budgets, dashboards, and the audit trail.
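
The ordering above can be pictured as a simple short-circuiting pipeline. The following Python sketch is illustrative only and is not the product's implementation: every name in it (Request, Response, ROUTES, CACHE, run_chain, the example upstream name) is hypothetical, the guardrail is a single hard-coded pattern, and the cache is an exact-match dictionary standing in for similarity search.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    api_key: Optional[str]
    model: str            # logical model name sent by the client
    prompt: str
    upstream: str = ""    # filled in by the routing gate


@dataclass
class Response:
    status: int
    source: str           # which gate (or the provider) produced this response
    body: str = ""


# A gate returns None to pass the request on, or a Response to end the chain.
Gate = Callable[[Request], Optional[Response]]

# Hypothetical routing table and cache contents.
ROUTES: Dict[str, str] = {"coding-default": "example-provider/example-model"}
CACHE: Dict[str, str] = {}


def key_auth(req: Request) -> Optional[Response]:
    # Stand-in check: any non-empty key is treated as valid.
    return None if req.api_key else Response(401, "key-auth")


def guardrails(req: Request) -> Optional[Response]:
    # Stand-in for the real rule set: one hard-coded injection pattern.
    if "ignore previous instructions" in req.prompt.lower():
        return Response(403, "guardrails")
    return None


def routing(req: Request) -> Optional[Response]:
    # Resolve the logical model to an upstream; unknown names pass through
    # unchanged (an assumption, the source does not say what happens here).
    req.upstream = ROUTES.get(req.model, req.model)
    return None


def make_budget_gate(remaining_usd: float) -> Gate:
    def budget_and_limits(req: Request) -> Optional[Response]:
        return Response(429, "budget-and-limits") if remaining_usd <= 0 else None
    return budget_and_limits


def semantic_cache(req: Request) -> Optional[Response]:
    # Exact-match lookup as a stand-in for similarity search.
    hit = CACHE.get(req.prompt)
    return Response(200, "semantic-cache", hit) if hit is not None else None


def provider(req: Request) -> Response:
    return Response(200, "provider", f"answer from {req.upstream}")


def run_chain(req: Request, gates: List[Gate]) -> Response:
    for gate in gates:
        response = gate(req)
        if response is not None:
            return response      # rejected, or answered from the cache
    return provider(req)


GATES: List[Gate] = [key_auth, guardrails, routing, make_budget_gate(5.0), semantic_cache]
```

Because the first non-None result ends the chain, a request with no key never reaches the budget gate, and a cache hit never reaches the provider, which matches the ordering rationale given in the introduction.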

Which gate rejected my request?

The status code tells you which gate stopped a request:

| Status | Gate | Meaning | What to do |
| --- | --- | --- | --- |
| 401 Unauthorized | Key-auth | Missing or invalid API key | Check the key and the Authorization: Bearer … header; see Manage API keys |
| 403 Blocked | Guardrails | Request matched a PII or prompt-injection rule | Review Blocked requests; report a false positive if needed |
| 429 Too Many Requests | Budget & limits | Over the monthly budget or per-minute token limit | See Usage & budget; an admin can adjust budgets & limits |
| 5xx | Provider | The upstream provider or MCP server failed | Check the provider status in the Providers screen |
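
On the client side, the table can be turned into a small lookup that names the likely gate for a given status code. This is a minimal sketch that assumes only the status codes listed in the table; the Diagnosis type and diagnose function are hypothetical names, not part of any SDK.

```python
from typing import NamedTuple


class Diagnosis(NamedTuple):
    gate: str
    action: str


def diagnose(status: int) -> Diagnosis:
    """Map an HTTP status code to the gate that most likely rejected the request."""
    if status == 401:
        return Diagnosis("Key-auth", "check the API key and the Authorization header")
    if status == 403:
        return Diagnosis("Guardrails", "review Blocked requests")
    if status == 429:
        return Diagnosis("Budget & limits", "see Usage & budget")
    if 500 <= status <= 599:
        return Diagnosis("Provider", "check the provider status")
    # Anything else is not a rejection covered by the table.
    return Diagnosis("none", "not a rejection listed in the table")
```

For example, diagnose(429).gate evaluates to "Budget & limits", so a client could log the gate name alongside the failed request.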

For how identities and tenancy shape these checks, see Multi-tenancy.
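
As a rough illustration of how a hierarchical budget might interact with the organization.project.user identity, the sketch below checks a cap at each level, broadest first. How the product actually evaluates the hierarchy is not described here, so the semantics (a request is over budget if any level with a cap has reached it) are an assumption, and the names scopes, exceeded_scope, budgets, and spent are hypothetical.

```python
from typing import Dict, List, Optional


def scopes(consumer: str) -> List[str]:
    """Expand 'org.project.user' into its levels, broadest first."""
    parts = consumer.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def exceeded_scope(
    consumer: str,
    budgets: Dict[str, float],   # monthly USD cap per scope (assumed layout)
    spent: Dict[str, float],     # month-to-date USD spend per scope (assumed layout)
) -> Optional[str]:
    """Return the broadest scope whose spend has reached its cap, else None."""
    for scope in scopes(consumer):
        cap = budgets.get(scope)
        if cap is not None and spent.get(scope, 0.0) >= cap:
            return scope
    return None
```

Under these assumptions, a user with personal headroom is still stopped when the enclosing project or organization has used up its cap; scopes without a configured cap are skipped.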
