
SLOs & Budgets (Pilot)

This document outlines the initial Service Level Objectives (SLOs) and resource budgets for the Cloudflare RAG service during its pilot phase.


Service Level Objectives (SLOs)

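Metric | Threshold | Notes
RAG Query Latency (p95) | ≤ 1.5 seconds | Measured on a "warm" worker. Per-stage budgets are listed below.
Uptime | ≥ 99.5% | Measured by the /rag/health endpoint.

Per-stage latency budgets (warm worker):
- embed: ≤ 300 ms
- search: ≤ 600 ms
- rerank: ≤ 300 ms

The stage budgets sum to 1.2 seconds, which leaves 300 ms of the 1.5-second total unallocated.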
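As a rough illustration of how these thresholds could be checked against collected timings, here is a minimal Python sketch. The function names, the nearest-rank percentile method, and the choice to evaluate each stage at p95 are assumptions made for the example; the document only states the thresholds themselves.

```python
# Thresholds copied from the SLO table, in milliseconds.
TOTAL_P95_BUDGET_MS = 1500
STAGE_BUDGETS_MS = {"embed": 300, "search": 600, "rerank": 300}


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def check_latency(stage_samples_ms, total_samples_ms):
    """Return a list of readable SLO violations (empty if none).

    Assumption: each stage is judged at p95, like the total.
    """
    violations = []
    for stage, budget in STAGE_BUDGETS_MS.items():
        observed = p95(stage_samples_ms[stage])
        if observed > budget:
            violations.append(f"{stage}: p95 {observed} ms > {budget} ms")
    total = p95(total_samples_ms)
    if total > TOTAL_P95_BUDGET_MS:
        violations.append(f"total: p95 {total} ms > {TOTAL_P95_BUDGET_MS} ms")
    return violations


def allowed_downtime_minutes(uptime_target=0.995, days=30):
    """Downtime allowance implied by an uptime target over a window."""
    return (1 - uptime_target) * days * 24 * 60
```

Under these assumptions, a 99.5% uptime target over a 30-day window allows roughly 216 minutes (3.6 hours) of downtime, which is what `allowed_downtime_minutes()` computes.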

Resource & Cost Budgets

These are soft limits for the pilot phase to control costs and usage.

Resource | Budget (per month) | Environment
RAG Queries | ≤ 10,000 | dev + stage combined
Vector Count | ≤ 50,000 | Per environment (dev, stage)
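Because these are soft limits, one plausible way to apply them is to emit warnings rather than reject requests. The sketch below assumes usage counters are already available as per-environment dictionaries; the function name, the input shape, and the reading of "Vector Count" as a ceiling on stored vectors are all hypothetical.

```python
MONTHLY_QUERY_BUDGET = 10_000   # RAG queries, dev + stage combined
VECTOR_BUDGET_PER_ENV = 50_000  # vectors, applied to each environment


def budget_warnings(queries_by_env, vectors_by_env):
    """Return warning strings for any soft budget that is exceeded.

    queries_by_env: e.g. {"dev": 4000, "stage": 7000} for the current month
    vectors_by_env: e.g. {"dev": 12000, "stage": 51000}
    """
    warnings = []

    # The query budget is shared, so compare the combined total.
    total_queries = sum(queries_by_env.values())
    if total_queries > MONTHLY_QUERY_BUDGET:
        warnings.append(
            f"queries: {total_queries} > {MONTHLY_QUERY_BUDGET} (dev + stage)"
        )

    # The vector budget is checked separately for each environment.
    for env, count in sorted(vectors_by_env.items()):
        if count > VECTOR_BUDGET_PER_ENV:
            warnings.append(f"vectors[{env}]: {count} > {VECTOR_BUDGET_PER_ENV}")

    return warnings
```

Returning a list keeps the check side-effect free, so the caller can decide whether to log, alert, or ignore each warning during the pilot.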

Retry Policy

  • AI Gateway: The Cloudflare AI Gateway automatically handles retries for requests to third-party providers.
  • API Client: To avoid compounding retries, configure the Labeeb API's HTTP client with retry(0), meaning no client-side retries, when it calls endpoints through the AI Gateway.
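To see why stacked retries compound, consider the worst case where every client attempt triggers a full retry sequence at the gateway. The small Python sketch below does that arithmetic; the retry counts in the usage lines are hypothetical and are not taken from the AI Gateway's actual configuration.

```python
def worst_case_upstream_attempts(client_retries, gateway_retries):
    """Upper bound on provider calls for one logical request.

    Assumes each client attempt (1 initial + client_retries) can cause
    the gateway to make 1 initial call plus gateway_retries retries.
    """
    return (client_retries + 1) * (gateway_retries + 1)


# Hypothetical gateway setting of 2 retries:
print(worst_case_upstream_attempts(0, 2))  # client uses retry(0)
print(worst_case_upstream_attempts(3, 2))  # client also retries 3 times
```

With the client set to zero retries, the upper bound equals the gateway's own attempt count (3 in this example); with 3 client retries on top, it grows to 12.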