# AI-Box Architecture
This document provides a detailed overview of the AI-Box service's internal architecture and its role as the intelligence core of the Labeeb platform.
## 1. Service Responsibilities Matrix
**System-Wide Context**
The Labeeb platform is a distributed system: a failure in one service can manifest as a symptom in another. This matrix defines clear ownership and responsibilities for each service and is the foundation of our incident response process.
| Service | Tech | Core Responsibility | Inputs | Outputs | Depends On |
|---|---|---|---|---|---|
| API | Laravel/PHP | Central gateway, orchestrates jobs, owns PG & OS writes. | Ingest batches, client requests. | API responses, jobs. | PG, Redis, OS, AI-Box. |
| AI-Box | Python/FastAPI | Hosts AI models (search, NER, etc.). | API jobs/requests. | Analysis results (JSON). | OS, API (for hydration). |
| Scraper | Python/FastAPI | Fetches & normalizes articles from external sources. | Profiles, external websites. | Ingest batches. | API (for ingestion). |
| Search | OpenSearch | Provides search capabilities. | Indexing requests, search queries. | Search results. | (None) |
| Frontend | Next.js | User interface. | User actions. | Web pages. | API. |
## 2. Internal Architecture & Data Flow
**Architectural Principles**
The AI-Box is designed as a stateless, specialized service for hosting and executing AI models. Its architecture prioritizes performance and clear separation from the main platform data stores.
- Model Hosting: Its primary purpose is to abstract away the complexity of running different AI models (for search, NER, etc.) behind a clean REST API.
- Read-Only Search Access: The AI-Box has direct, read-only access to the OpenSearch cluster to perform high-performance search queries.
- Hydration via API: It does not have access to the main PostgreSQL database. To enrich search results with metadata (a process called "hydration"), it must call back to the main API service (see the sketch after this list).
- Stateless by Design: The service holds no persistent state of its own, making it highly scalable and resilient. Any instance can serve any request.
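The snippet below is a minimal sketch of that hydration callback. The endpoint path, payload shape, and internal hostname are assumptions for illustration; only the constraint that metadata comes from the API, never directly from PostgreSQL, is taken from this document.

```python
# Hypothetical hydration helper: the AI-Box asks the Labeeb API for full
# document metadata because it has no direct PostgreSQL access.
import httpx

LABEEB_API_URL = "http://api.labeeb.internal"  # assumed internal hostname

async def hydrate_documents(doc_ids: list[str]) -> list[dict]:
    """Fetch full metadata for a list of document IDs from the main API."""
    async with httpx.AsyncClient(base_url=LABEEB_API_URL, timeout=10.0) as client:
        response = await client.post(
            "/internal/documents/hydrate",  # hypothetical endpoint name
            json={"ids": doc_ids},
        )
        response.raise_for_status()
        return response.json()["documents"]
```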
### Data Flow Diagram (DFD): Evidence Pack Retrieval
This diagram illustrates the flow for the `/retrieve_pack` endpoint, the most comprehensive use case for the service; a code-level sketch of the same flow follows the diagram.
```mermaid
flowchart TD
    subgraph "User / Client"
        C[Client]:::ext
    end
    subgraph "AI-Box Service"
        R[Router]:::svc
        S[Retrieve Service]:::svc
        H[Hydration Logic]:::svc
    end
    subgraph "Downstream Dependencies"
        OS[(OpenSearch)]:::ext
        API[(Labeeb API)]:::ext
    end

    C -- "POST /retrieve_pack" --> R
    R -- "Query" --> S
    S -- "BM25 + kNN Queries" --> OS
    OS -- "Raw Search Hits" --> S
    S -- "Fused & Ranked Hits" --> H
    H -- "Request Metadata for Doc IDs" --> API
    API -- "Full Document Metadata" --> H
    H -- "Hydrated Evidence Pack" --> R
    R -- "Final Response" --> C

    classDef ext fill:#e0f2fe,stroke:#0ea5e9,stroke-width:1px;
    classDef svc fill:#f8fafc,stroke:#64748b,stroke-width:1px;
    classDef store fill:#f0fdf4,stroke:#22c55e,stroke-width:1px;
```
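As a companion to the diagram, the sketch below walks the same path in code. The index name, field names, and fusion strategy (reciprocal rank fusion here) are assumptions for illustration, and it reuses the hypothetical `hydrate_documents` helper sketched earlier; the actual Retrieve Service may differ in all of these details.

```python
# Illustrative /retrieve_pack flow: BM25 + kNN against OpenSearch, fuse and
# rank the hits, hydrate via the Labeeb API, and assemble the evidence pack.
from opensearchpy import AsyncOpenSearch

os_client = AsyncOpenSearch(hosts=["http://opensearch:9200"])  # read-only access

async def retrieve_pack(query: str, query_vector: list[float], size: int = 10) -> dict:
    # 1. Lexical (BM25) and semantic (kNN) queries against OpenSearch.
    bm25 = await os_client.search(
        index="articles",  # assumed index name
        body={"query": {"match": {"text": query}}, "size": size},
    )
    knn = await os_client.search(
        index="articles",
        body={"query": {"knn": {"embedding": {"vector": query_vector, "k": size}}}},
    )

    # 2. Fuse and rank both hit lists (reciprocal rank fusion as an example).
    scores: dict[str, float] = {}
    for hits in (bm25["hits"]["hits"], knn["hits"]["hits"]):
        for rank, hit in enumerate(hits):
            scores[hit["_id"]] = scores.get(hit["_id"], 0.0) + 1.0 / (60 + rank)
    ranked_ids = sorted(scores, key=scores.get, reverse=True)[:size]

    # 3. Hydrate: full metadata comes from the Labeeb API, not PostgreSQL
    #    (see the hydrate_documents sketch in the principles section).
    documents = await hydrate_documents(ranked_ids)

    # 4. Assemble the evidence pack returned to the client.
    return {"query": query, "evidence": documents}
```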
## 3. Architectural Decisions (ADRs)
This section documents key architectural decisions for the AI-Box service.
### ADR-AIB-15: S1 Check-Worthiness Baseline
- Status: Accepted (reversible via `S1_MODE` flag)
**Context:**
Heuristic scoring could not reliably detect factual claims across languages. We introduce a zero-shot NLI model (XLM-R large) as a multilingual baseline: it covers 100+ languages without task-specific training and classifies each sentence as `checkworthy` or `not_checkworthy` out of the box.
**Decision:**
- Load `joeddav/xlm-roberta-large-xnli` (mounted at `/models/xlm-roberta-large-xnli`) via Hugging Face's `pipeline`; a condensed sketch follows this list.
- Batch requests and truncate inputs, computing an `is_checkworthy` flag against a configurable threshold (`S1_THRESHOLD`, default 0.55). The default favors recall after spot-checking Arabic and English samples but can be tuned per deployment.
- Expose runtime flags (`S1_MODE`, `S1_MODEL_ID`, `S1_MAX_TOKENS`, `S1_BATCH_SIZE`, `S1_DEVICE`) with a safe fallback to the legacy heuristic.
- Emit Prometheus metrics for requests, latency, and active mode to observe rollout health.
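A condensed sketch of this decision is below, assuming the Hugging Face `transformers` zero-shot pipeline. Apart from `S1_THRESHOLD` (0.55 per this ADR), the defaults shown are illustrative, and truncation via `S1_MAX_TOKENS`, the `S1_MODE` heuristic fallback, and error handling are omitted for brevity.

```python
# Condensed sketch of the S1 check-worthiness scorer described in this ADR.
import os

from transformers import pipeline

S1_MODEL_ID = os.getenv("S1_MODEL_ID", "/models/xlm-roberta-large-xnli")
S1_THRESHOLD = float(os.getenv("S1_THRESHOLD", "0.55"))
S1_BATCH_SIZE = int(os.getenv("S1_BATCH_SIZE", "8"))
S1_DEVICE = int(os.getenv("S1_DEVICE", "-1"))  # -1 = CPU

# Zero-shot NLI classifier loaded from the mounted model directory.
classifier = pipeline("zero-shot-classification", model=S1_MODEL_ID, device=S1_DEVICE)

def score_sentences(sentences: list[str]) -> list[dict]:
    """Classify each sentence as checkworthy / not_checkworthy."""
    results = classifier(
        sentences,
        candidate_labels=["checkworthy", "not_checkworthy"],
        batch_size=S1_BATCH_SIZE,
    )
    scored = []
    for sentence, result in zip(sentences, results):
        # Pick the entailment score assigned to the "checkworthy" label.
        score = dict(zip(result["labels"], result["scores"]))["checkworthy"]
        scored.append(
            {
                "sentence": sentence,
                "score": score,
                "is_checkworthy": score >= S1_THRESHOLD,
            }
        )
    return scored
```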
**Consequences:**
- Better recall of claims in Arabic/English without retraining.
- p95 latency on CPU is roughly 0.1 s for a batch of eight sentences with XLM-R. Monitor `aibox_request_duration_seconds` for regressions when switching hardware or models (a minimal instrumentation sketch follows this ADR).
- Fallback path ensures no downtime if the model is missing or fails to load.
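For reference, a minimal instrumentation sketch using `prometheus_client` is shown below. Only the `aibox_request_duration_seconds` name appears in this document; the other metric names and labels are illustrative assumptions.

```python
# Illustrative Prometheus metrics for requests, latency, and active S1 mode.
from prometheus_client import Counter, Gauge, Histogram

REQUESTS = Counter("aibox_requests_total", "Scoring requests handled", ["mode"])
LATENCY = Histogram(
    "aibox_request_duration_seconds", "Request latency in seconds", ["endpoint"]
)
ACTIVE_MODE = Gauge("aibox_s1_active_mode", "1 = model-backed S1, 0 = legacy heuristic")

def record_request(endpoint: str, mode: str, duration_seconds: float) -> None:
    """Record one request so rollout health can be monitored."""
    REQUESTS.labels(mode=mode).inc()
    LATENCY.labels(endpoint=endpoint).observe(duration_seconds)
    ACTIVE_MODE.set(1 if mode == "model" else 0)
```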