AI-Box Service: The Intelligence Playbook
Service Status: Operational
This document is the primary operational manual for the AI-Box Service. As the intelligence core of the Labeeb platform, its reliability and performance are critical for all data enrichment and search functions. This playbook provides comprehensive, actionable guidance for on-call engineers to deploy, monitor, and troubleshoot this service. All procedures are designed for clarity, accuracy, and safe execution under pressure.
1. Mission & Scope
The AI-Box's mission is to provide a unified, high-performance interface for all machine learning models and complex retrieval logic used by the Labeeb platform.
It is a robust FastAPI application designed to be a stateless, scalable compute engine. It abstracts the complexity of interacting with ML models and search backends, providing a clean, versioned API for other services to consume.
Scope of Responsibilities

- Is Responsible For:
    - Executing Hybrid Search: Providing the `/retrieve` endpoint to run complex search queries against the OpenSearch cluster (an example request is sketched after this list).
    - Evidence Packaging: Containing the logic to build structured `EvidencePack` objects from raw search results.
    - Hosting ML Models: Serving AI models for tasks like S1 (Check-Worthiness) and S2 (Named Entity Recognition).
    - Model Abstraction: Providing a consistent API interface, even if the underlying ML models are swapped or updated.
- Is NOT Responsible For:
    - Data Persistence: The AI-Box is a stateless service; it does not own any data stores.
    - Data Ingestion: It does not receive data from external sources; it only processes data it is asked to analyze.
    - Job Scheduling: It does not run background jobs; it operates synchronously, responding to direct API requests.
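To make the retrieval contract concrete, here is a minimal sketch of a hybrid-search call. The request fields and port match the warm-up examples in section 4; the response schema is not documented on this page, so the output description is an assumption.

```bash
# Hybrid search that returns a structured EvidencePack
# (field names match the warm-up examples in section 4)
curl -s -X POST http://localhost:8001/retrieve_pack \
  -H "Content-Type: application/json" \
  -d '{"query": "renewable energy subsidies", "k_bm25": 50, "k_knn": 50, "k_rrf": 60, "rerank": true}'
```

Based on the parameter names, `k_bm25` and `k_knn` presumably size the lexical and vector candidate sets before RRF fusion.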
2. Operational Characteristics
Understanding the unique behavior of the AI-Box is key to operating it effectively.
- Stateless Compute Layer: The service is fundamentally stateless. It holds no data in memory between requests and owns no database tables. This makes it highly scalable and resilient; new instances can be added or replaced without complex state migration, simplifying deployments and autoscaling.
- Synchronous API: Unlike other services that may use queues, the AI-Box operates entirely synchronously. Requests are processed in real time, making latency a primary performance metric (a quick latency probe is sketched after this list).
- Model-Driven: The service's identity and functionality are defined by the machine learning models it loads at startup. A change in a model can significantly alter the service's behavior, performance, and resource footprint.
- Resource Intensive: Depending on the models loaded, this service can be memory and CPU intensive. Resource allocation must be monitored closely.
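Because every request is handled synchronously, a single timed call gives a direct read on user-facing latency. A minimal probe using curl's timing output, assuming the same endpoint and port as the warm-up examples in section 4:

```bash
# Time one synchronous /retrieve request end to end
curl -s -o /dev/null -w 'total: %{time_total}s\n' \
  -X POST http://localhost:8001/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "latency probe"}'
```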
3. Key Performance Indicators (KPIs)
As an on-call engineer, these are the primary metrics you should monitor to assess the health of the AI-Box.
| Metric | Prometheus Query | Threshold (Example) | Why It Matters |
|---|---|---|---|
| P95 Latency | `histogram_quantile(0.95, rate(aibox_request_duration_seconds_bucket[5m]))` | > 500ms | Indicates a slow response time for the majority of users. The primary measure of user experience. |
| Error Rate | `rate(aibox_requests_total{code=~"5.."}[5m]) / rate(aibox_requests_total[5m])` | > 2% | A high error rate indicates a systemic problem with the service or its dependencies. |
| Request Rate | `rate(aibox_requests_total[5m])` | N/A | Provides a baseline of service traffic. Sudden drops can indicate upstream issues. |
| CPU Usage | `rate(container_cpu_usage_seconds_total{container="ai-box"}[5m])` | > 85% | Sustained high CPU can lead to increased latency and request queuing. |
| RRF Merge Time | `rate(aibox_retrieval_rrf_ms_sum[5m]) / rate(aibox_retrieval_rrf_ms_count[5m])` | > 30ms (avg) | Average time spent in RRF fusion; spikes can indicate large candidate sets or backend lag. |
| Memory Usage | `container_memory_usage_bytes{container="ai-box"}` | > 90% of limit | High memory usage risks the container being OOM-killed by the orchestrator. |
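Any of these KPIs can be spot-checked ad hoc by running the same PromQL against the Prometheus HTTP API. In this sketch, `prometheus:9090` is an assumed address; substitute the host used by your monitoring stack.

```bash
# Spot-check P95 latency via the Prometheus HTTP API
curl -sG 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.95, rate(aibox_request_duration_seconds_bucket[5m]))'
```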
4. Standard Deployment Process
This checklist outlines the standard procedure for deploying the AI-Box service. Example commands for steps 2-4 are sketched at the end of this section.

1. Prepare Environment Variables: Ensure all required environment variables are set in your deployment environment. Refer to the Environment & Configuration page for a complete list.
2. Build Docker Image: Build the Docker image for the AI-Box service.
3. Deploy to Target Environment: Update your deployment configuration to use the new image tag and apply the changes.
4. Verify Deployment: Perform a health check to ensure the service is running and can connect to its downstream dependencies.
5. Warm the Model Cache: After a fresh deployment, the ML models may not be loaded into memory. Send a sample request to each key endpoint to "warm up" the service and ensure the first real user request is fast.

```bash
# Warm the S1 model (feature-flagged via ENABLE_AIB_15)
curl -X POST http://localhost:8001/s1/score \
  -H "Content-Type: application/json" \
  -d '{"text": "warmup"}'

# Warm the retrieval models
curl -X POST http://localhost:8001/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "warmup"}'

# Warm the evidence pack (hydration is optional)
curl -X POST http://localhost:8001/retrieve_pack \
  -H "Content-Type: application/json" \
  -d '{"query": "warmup", "k_bm25": 10, "k_knn": 0, "k_rrf": 60, "rerank": false}'
```
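For reference, a minimal sketch of steps 2-4. The image tag scheme, the Compose service name, and the `/health` path are assumptions; substitute the conventions used in your environment.

```bash
# Step 2: build and tag the image (tag scheme is an assumption)
docker build -t ai-box:$(git rev-parse --short HEAD) .

# Step 3: roll out the new tag (assumes a Docker Compose deployment)
docker compose up -d ai-box

# Step 4: health check (the /health path is an assumption; use your
# service's actual health endpoint)
curl -fsS http://localhost:8001/health
```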
5. Structured Incident Playbooks
This section provides direct links to detailed runbooks for common operational incidents affecting the AI-Box service.
- :material-timer-sand-alert: High Search Latency: Playbook for when the `/retrieve` endpoint is responding slowly.
- ML Model Loading Failure: Playbook for when the service fails to start due to an error in loading an ML model.
- Downstream Dependency Failure: Playbook for handling errors when the AI-Box cannot connect to OpenSearch or the main API (a first-pass connectivity check is sketched after this list).
- Reranker Timeout: Playbook for when the reranking step in the retrieval process is causing timeouts.
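As a first triage step for the dependency playbook, check connectivity to the search backend and confirm the AI-Box itself responds. The `opensearch:9200` address is an assumption; substitute your cluster's actual host.

```bash
# Check OpenSearch cluster health from the AI-Box host
# (opensearch:9200 is an assumed address)
curl -s 'http://opensearch:9200/_cluster/health?pretty'

# Confirm the AI-Box itself is up before blaming dependencies
curl -s -o /dev/null -w '%{http_code}\n' \
  -X POST http://localhost:8001/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "ping"}'
```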