
AI-Box Architecture

This document provides a detailed overview of the AI-Box service's internal architecture and its role as the intelligence core of the Labeeb platform.


1. Service Responsibilities Matrix

System-Wide Context

The Labeeb platform is a distributed system. A failure in one service can manifest as a symptom in another. This matrix defines the clear ownership and responsibility of each service, which is the foundation of our incident response process.

| Service  | Tech           | Core Responsibility                                      | Inputs                             | Outputs                  | Depends On               |
|----------|----------------|----------------------------------------------------------|------------------------------------|--------------------------|--------------------------|
| API      | Laravel/PHP    | Central gateway, orchestrates jobs, owns PG & OS writes. | Ingest batches, client requests.   | API responses, jobs.     | PG, Redis, OS, AI-Box.   |
| AI-Box   | Python/FastAPI | Hosts AI models (search, NER, etc.).                     | API jobs/requests.                 | Analysis results (JSON). | OS, API (for hydration). |
| Scraper  | Python/FastAPI | Fetches & normalizes articles from external sources.     | Profiles, external websites.       | Ingest batches.          | API (for ingestion).     |
| Search   | OpenSearch     | Provides search capabilities.                            | Indexing requests, search queries. | Search results.          | (None)                   |
| Frontend | Next.js        | User interface.                                          | User actions.                      | Web pages.               | API.                     |

2. Internal Architecture & Data Flow

Architectural Principles

The AI-Box is designed as a stateless, specialized service for hosting and executing AI models. Its architecture prioritizes performance and clear separation from the main platform data stores.

  • Model Hosting: Its primary purpose is to abstract away the complexity of running different AI models (for search, NER, etc.) behind a clean REST API.
  • Read-Only Search Access: The AI-Box has direct, read-only access to the OpenSearch cluster to perform high-performance search queries.
  • Hydration via API: It does not have access to the main PostgreSQL database. To enrich search results with metadata (a process called "hydration"), it must call back to the main API service (see the sketch after this list).
  • Stateless by Design: The service holds no persistent state of its own, making it highly scalable and resilient. Any instance can serve any request.
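
As a rough illustration of the hydration principle, the sketch below shows how a stateless AI-Box component might call back to the main API to enrich a set of document IDs. The endpoint path, payload shape, and LABEEB_API_URL variable are hypothetical, introduced only for this example.

```python
# Hypothetical sketch of "hydration via API": the AI-Box has no PostgreSQL
# access, so it asks the main Labeeb API for document metadata. The endpoint
# path, payload shape, and environment variable below are illustrative
# assumptions, not the real contract.
import os

import httpx

LABEEB_API_URL = os.environ.get("LABEEB_API_URL", "http://api:8000")


async def hydrate_documents(doc_ids: list[str]) -> list[dict]:
    """Fetch full metadata for a list of document IDs from the main API."""
    async with httpx.AsyncClient(base_url=LABEEB_API_URL, timeout=10.0) as client:
        resp = await client.post("/internal/documents/hydrate", json={"ids": doc_ids})
        resp.raise_for_status()
        return resp.json()["documents"]
```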

Data Flow Diagram (DFD): Evidence Pack Retrieval

This diagram illustrates the flow for the /retrieve_pack endpoint, which is the most comprehensive use case for the service.

flowchart TD
    subgraph "User / Client"
        C[Client]:::ext
    end

    subgraph "AI-Box Service"
        R[Router]:::svc
        S[Retrieve Service]:::svc
        H[Hydration Logic]:::svc
    end

    subgraph "Downstream Dependencies"
        OS[(OpenSearch)]:::ext
        API[(Labeeb API)]:::ext
    end

    C -- "POST /retrieve_pack" --> R
    R -- "Query" --> S
    S -- "BM25 + kNN Queries" --> OS
    OS -- "Raw Search Hits" --> S
    S -- "Fused & Ranked Hits" --> H
    H -- "Request Metadata for Doc IDs" --> API
    API -- "Full Document Metadata" --> H
    H -- "Hydrated Evidence Pack" --> R
    R -- "Final Response" --> C

    classDef ext fill:#e0f2fe,stroke:#0ea5e9,stroke-width:1px;
    classDef svc fill:#f8fafc,stroke:#64748b,stroke-width:1px;
    classDef store fill:#f0fdf4,stroke:#22c55e,stroke-width:1px;
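
The sketch below condenses the retrieval half of this flow, assuming the opensearch-py client, an index named "articles", and fields named "text" and "embedding"; those names, the result sizes, and the fusion constant are illustrative assumptions, not the service's actual implementation.

```python
# Illustrative sketch of the /retrieve_pack retrieval step: run BM25 and kNN
# queries against OpenSearch, then fuse the two rankings with reciprocal-rank
# fusion. Index name, field names, and constants are assumptions.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "opensearch", "port": 9200}])


def bm25_hits(query: str, size: int = 20) -> list[dict]:
    body = {"size": size, "query": {"match": {"text": query}}}
    return client.search(index="articles", body=body)["hits"]["hits"]


def knn_hits(vector: list[float], size: int = 20) -> list[dict]:
    body = {"size": size, "query": {"knn": {"embedding": {"vector": vector, "k": size}}}}
    return client.search(index="articles", body=body)["hits"]["hits"]


def fuse(bm25: list[dict], knn: list[dict], k: int = 60) -> list[str]:
    """Reciprocal-rank fusion: merge the two ranked hit lists into one ID ranking."""
    scores: dict[str, float] = {}
    for hits in (bm25, knn):
        for rank, hit in enumerate(hits):
            scores[hit["_id"]] = scores.get(hit["_id"], 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

The fused document IDs would then be passed to the hydration step (the call back to the Labeeb API) before the router assembles the final evidence pack response.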

3. Architectural Decisions (ADRs)

This section documents key architectural decisions for the AI-Box service.

ADR-AIB-15: S1 Check-Worthiness Baseline

  • Status: Accepted (Reversible via S1_MODE flag)

Context: Heuristic scoring could not reliably detect factual claims across languages. We introduced a zero-shot NLI model (XLM-R large) as a multilingual baseline: it covers 100+ languages without task-specific training and classifies each sentence as checkworthy or not_checkworthy out of the box.

Decision:

  • Load joeddav/xlm-roberta-large-xnli (mounted at /models/xlm-roberta-large-xnli) via Hugging Face's pipeline.
  • Batch requests and truncate inputs, computing an is_checkworthy flag against a configurable threshold (S1_THRESHOLD, default 0.55). The default favors recall after spot-checking Arabic and English samples but can be tuned per deployment.
  • Expose runtime flags (S1_MODE, S1_MODEL_ID, S1_MAX_TOKENS, S1_BATCH_SIZE, S1_DEVICE) with a safe fallback to the legacy heuristic.
  • Emit Prometheus metrics for requests, latency, and active mode to observe rollout health.
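
A minimal sketch of this decision, assuming Hugging Face's transformers is installed; the model ID and the S1_THRESHOLD default come from this ADR, while the function name and result handling are illustrative rather than the service's actual code.

```python
# Minimal sketch of the S1 baseline: zero-shot NLI classification of sentences
# as checkworthy / not_checkworthy. The model ID and threshold default come
# from this ADR; the function name and result handling are illustrative.
import os

from transformers import pipeline

S1_THRESHOLD = float(os.environ.get("S1_THRESHOLD", "0.55"))

classifier = pipeline(
    "zero-shot-classification",
    model="joeddav/xlm-roberta-large-xnli",  # mounted locally in production
)


def score_sentences(sentences: list[str]) -> list[dict]:
    """Score each sentence and flag it as checkworthy against S1_THRESHOLD."""
    results = classifier(sentences, candidate_labels=["checkworthy", "not_checkworthy"])
    if isinstance(results, dict):  # a single sentence yields a dict, not a list
        results = [results]
    scored = []
    for sentence, result in zip(sentences, results):
        score = dict(zip(result["labels"], result["scores"]))["checkworthy"]
        scored.append(
            {"sentence": sentence, "score": score, "is_checkworthy": score >= S1_THRESHOLD}
        )
    return scored
```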

Consequences:

  • Better recall of claims in Arabic/English without retraining.
  • Latency on CPU for a batch of eight sentences with XLM-R is roughly 0.1s at p95. Monitor aibox_request_duration_seconds for regressions when switching hardware or models.
  • Fallback path ensures no downtime if the model is missing or fails to load.
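
As a rough illustration of the fallback path, assuming S1_MODE takes values such as "nli" and "heuristic" and that a legacy heuristic scorer already exists; the function names below are placeholders, not the real module layout.

```python
# Illustrative fallback wiring: if the NLI model cannot be loaded, keep serving
# with the legacy heuristic. Flag values and function names are assumptions.
import logging
import os

S1_MODE = os.environ.get("S1_MODE", "nli")


def legacy_heuristic_score(sentence: str) -> float:
    """Placeholder for the pre-existing heuristic scorer."""
    return 0.0


def build_scorer():
    """Return an NLI-based scorer when possible, otherwise the legacy heuristic."""
    if S1_MODE == "nli":
        try:
            from transformers import pipeline

            clf = pipeline(
                "zero-shot-classification",
                model=os.environ.get("S1_MODEL_ID", "joeddav/xlm-roberta-large-xnli"),
            )

            def nli_score(sentence: str) -> float:
                result = clf(sentence, candidate_labels=["checkworthy", "not_checkworthy"])
                return dict(zip(result["labels"], result["scores"]))["checkworthy"]

            return nli_score
        except Exception:
            logging.exception("S1 NLI model unavailable, falling back to heuristic")
    return legacy_heuristic_score
```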