Architecture

High-Level Overview

  • Service name: sinatools (branded as NLP Lab sidecar).
  • Purpose: Provide Arabic NLP microservices (currently powered by the SinaTools SDK) with optional dialect enhancements.
  • Interface: REST over HTTP, OpenAPI v3 spec available at sinatools/openapi.json.
  • Deployment: Docker Compose service mounting sinatools/app into /app with Uvicorn entrypoint.

Module Layout

sinatools/app/
├── main.py              # Entrypoint: loads src.app.create_app()
├── src/
│   ├── app.py           # FastAPI factory, router registration, warmup logic
│   ├── config.py        # Paths to datasets and shared constants
│   ├── routers/         # Feature-specific FastAPI routers
│   ├── services/        # Cached SDK loaders, Nabra helpers
│   └── utils.py         # Normalisation, tokenisation, similarity helpers
└── tests/
    └── test_api.py      # FastAPI TestClient coverage for all endpoints
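The helpers in src/utils.py are not reproduced here; as a rough illustration, Arabic normalisation for matching typically strips diacritics and unifies letter variants. A minimal sketch (the function name and exact rules are illustrative, not the module's actual API):

```python
import re

# Arabic diacritics (tashkeel) plus tatweel, commonly stripped before matching.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

def normalize_arabic(text: str) -> str:
    """Illustrative normalisation: strip diacritics, unify letter variants."""
    text = _DIACRITICS.sub("", text)
    text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0629", "\u0647")                 # taa marbuta -> haa
    text = text.replace("\u0649", "\u064A")                 # alef maqsura -> yaa
    return text.strip()
```

Lexicon lookups and similarity helpers would then compare normalised forms rather than raw input.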

Components

  • Routers: Each router (ner, wsd, morph, dialect, relation, health) defines request models, response schemas, and error handling.
  • Services: Thin wrappers exposing cached SDK functions; the Nabra service handles CSV ingestion and glossary lexicon building.
  • Datasets: Nabra CSVs are mounted under /app/Nabra; glosses for WSD are pulled from sinatools.wsd.glosses_dic.
  • SDK: The SinaTools Python package bundles the Wojood, Salma, Alma, and Hadath models.
  • Entry / Warmup: src/app.py optionally preloads models based on the SINA_WARM environment variable.

Data Flow

  1. Client calls a REST endpoint.
  2. Router validates payload via Pydantic models.
  3. Router invokes the corresponding service (cached SDK loader or Nabra lookup).
  4. The response is normalised into JSON, adding metadata fields such as sense_url, lemma_forms, and match_type.
  5. FastAPI returns the response; OpenAPI schema and docs auto-update.
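Step 4 can be sketched as a small envelope function. The field names (sense_url, lemma_forms, match_type) appear above; the helper name and URL scheme below are hypothetical:

```python
def normalise_response(raw: dict, match_type: str = "exact") -> dict:
    """Wrap a raw SDK result into the JSON shape returned to clients.

    Adds metadata fields (sense_url, lemma_forms, match_type) on top of
    whatever the underlying SDK call or Nabra lookup produced.
    """
    sense_id = raw.get("sense_id")
    return {
        **raw,
        # Hypothetical URL pattern; the real service defines its own.
        "sense_url": f"/wsd/senses/{sense_id}" if sense_id else None,
        "lemma_forms": raw.get("lemma_forms", []),
        "match_type": match_type,
    }
```

Keeping this enrichment in one place means every router returns the same metadata shape, which in turn keeps the auto-generated OpenAPI schema consistent.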

Dependencies & Integrations

  • No direct datastore; all data lives in memory or in local files.
  • Downstream services consume these APIs for content tagging, search ranking, and UI tooltips.
  • Observability hooks (metrics/logging) can be added later via FastAPI middleware.
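A latency-logging hook could be added without touching any router. Since a FastAPI app is an ASGI app, a pure-ASGI middleware needs no framework imports; a minimal sketch (class and logger names are illustrative):

```python
import logging
import time

logger = logging.getLogger("sinatools.access")

class TimingMiddleware:
    """ASGI middleware that logs method, path, and wall-clock latency."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        start = time.perf_counter()
        try:
            await self.app(scope, receive, send)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s %s %.1fms", scope["method"], scope["path"], elapsed_ms)
```

With FastAPI this would be registered via app.add_middleware(TimingMiddleware); metrics export could hang off the same hook later.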

Extensibility Points

  • New models: Drop in additional routers/services to integrate future NLP tasks under the same sidecar.
  • Datasets: Additional dialect corpora can reuse the glossary patterns.
  • Auth / Rate Limiting: Currently unauthenticated; add FastAPI middleware or put a shared gateway in front if needed.