Semantic cache

The semantic cache returns a stored answer when a new prompt is close in meaning to an earlier one, which skips the model call and the token spend that call would have incurred. It is per-project, opt-in, and tenant-isolated.
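As a rough illustration of what "similar in meaning" can look like in code, the sketch below compares embedding vectors with cosine similarity and returns the closest stored answer only if it clears a threshold. The function names `cosine` and `lookup`, the in-memory list of entries, and the toy two-dimensional vectors are all assumptions made for this example; the product's actual matching logic is not described on this page.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lookup(query_vec, entries, threshold):
    """Return the most similar cached answer, or None when nothing clears the threshold.

    entries is a list of (embedding, answer) pairs. In a real system the
    embeddings would come from an embedding model, which is not shown here.
    """
    best_answer, best_score = None, -1.0
    for vec, answer in entries:
        score = cosine(query_vec, vec)
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer if best_score >= threshold else None


# Hypothetical cached entries with toy embeddings.
entries = [([1.0, 0.0], "answer A"), ([0.0, 1.0], "answer B")]
close_match = lookup([0.9, 0.1], entries, threshold=0.9)   # near the first entry
no_match = lookup([1.0, 1.0], entries, threshold=0.9)      # equally far from both
```

A miss (`None`) is the signal to call the model as usual and, if desired, store the new answer.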

Who can do this

Org admins (for their own organization) and platform admins can configure the cache on Projects → Semantic Cache.

Enable and tune

Open Projects → Semantic Cache and toggle it on.
Set the TTL — how long a cached answer stays valid.
Set the similarity threshold — how close a new prompt must be to a cached one to count as a hit (higher = stricter, fewer but safer hits).
Choose the key strategy — whether to match on the last question or the full conversation.
Save.
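The settings above can be pictured as a small configuration object. This is a sketch under stated assumptions: the class name `SemanticCacheSettings`, the field names, the default values, and the strategy labels `last_question` and `full_conversation` are invented for illustration and are not the product's actual schema.

```python
from dataclasses import dataclass

KEY_STRATEGIES = ("last_question", "full_conversation")  # hypothetical labels


@dataclass(frozen=True)
class SemanticCacheSettings:
    enabled: bool = False               # opt-in, so off by default
    ttl_seconds: int = 3600             # assumed default: one hour
    similarity_threshold: float = 0.95  # assumed default: start strict
    key_strategy: str = "last_question"

    def __post_init__(self):
        if self.ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")
        if not 0.0 <= self.similarity_threshold <= 1.0:
            raise ValueError("similarity_threshold must be between 0 and 1")
        if self.key_strategy not in KEY_STRATEGIES:
            raise ValueError(f"key_strategy must be one of {KEY_STRATEGIES}")


def text_to_match(messages, strategy):
    """Pick the text that is compared against cached prompts."""
    if strategy == "last_question":
        user_turns = [m["content"] for m in messages if m["role"] == "user"]
        return user_turns[-1] if user_turns else ""
    # full_conversation: every turn contributes, so context changes the match
    return "\n".join(f'{m["role"]}: {m["content"]}' for m in messages)


def is_fresh(stored_at, now, ttl_seconds):
    """A cached answer is valid only while it is younger than the TTL."""
    return now - stored_at < ttl_seconds
```

Matching on the last question tends to produce more hits, while matching on the full conversation keeps an answer from being reused in a different context.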

Figure: the Semantic Cache tab.

How it helps

A cache hit is answered before the request reaches the provider, so it costs nothing against the project's budget. Cached responses are isolated per project: one project's answers are never served to another. Savings appear in the project's usage and dashboards.
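The two properties described here, answering before the provider is called and keeping each project's entries separate, can be sketched as follows. To keep the focus on those two points, matching is reduced to a normalized exact match; a real semantic cache would compare embeddings instead. `ProjectScopedCache`, `answer`, and `call_model` are hypothetical names used only for this example.

```python
from collections import defaultdict


class ProjectScopedCache:
    def __init__(self):
        # One independent bucket per project, so entries never cross projects.
        self._store = defaultdict(dict)
        self.provider_calls = 0

    def answer(self, project_id, prompt, call_model):
        """Return (response, was_hit); call_model runs only on a miss."""
        key = " ".join(prompt.lower().split())  # stand-in for semantic matching
        bucket = self._store[project_id]
        if key in bucket:
            return bucket[key], True            # hit: provider is never called
        self.provider_calls += 1                # miss: this call is billed
        response = call_model(prompt)
        bucket[key] = response
        return response, False


def fake_model(prompt):
    """Placeholder for a provider call."""
    return "response to: " + prompt


cache = ProjectScopedCache()
first = cache.answer("project-a", "What is our refund policy?", fake_model)
second = cache.answer("project-a", "what is our  refund policy?", fake_model)
other = cache.answer("project-b", "What is our refund policy?", fake_model)
```

In this run the second request for `project-a` is served from the cache, while the same prompt in `project-b` is a miss because the buckets are separate.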

Tuning the threshold

Start strict (high threshold) and lower it gradually while watching quality. Too low, and unrelated prompts may share an answer; too high, and you get few hits.
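One way to watch quality while lowering the threshold is to replay a small labeled sample and count, for each candidate threshold, how many prompts would hit and how many of those hits would be wrong. The sketch below assumes you have such a sample; the `sweep` function and the similarity scores in `sample` are synthetic values made up for illustration, not measurements.

```python
def sweep(pairs, thresholds):
    """Count hits and wrong hits at each threshold.

    pairs is a list of (similarity, same_intent) tuples, where same_intent
    records whether a human judged the cached answer to fit the new prompt.
    """
    rows = []
    for t in thresholds:
        hits = [same for score, same in pairs if score >= t]
        rows.append({
            "threshold": t,
            "hits": len(hits),
            "wrong_hits": hits.count(False),
        })
    return rows


# Synthetic sample: (similarity to nearest cached prompt, did the answer fit?)
sample = [(0.99, True), (0.96, True), (0.91, True), (0.90, False), (0.80, False)]
report = sweep(sample, [0.95, 0.85])
```

On this toy sample, dropping the threshold from 0.95 to 0.85 doubles the hits from 2 to 4 but admits one wrong answer, which is the trade-off to watch for when tuning.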

Next steps

Semantic guard — the same vector approach applied to safety.
Budgets & limits — see cache savings against spend.
