Architecture

High-Level Overview

  • Service name: sinatools (branded as NLP Lab sidecar).
  • Purpose: Provide Arabic NLP microservices (currently powered by the SinaTools SDK) with optional dialect enhancements.
  • Interface: REST over HTTP, OpenAPI v3 spec available at sinatools/openapi.json.
  • Deployment: Docker Compose service mounting sinatools/app into /app with Uvicorn entrypoint.

Module Layout

sinatools/app/
├── main.py              # Entrypoint: loads src.app.create_app()
├── src/
│   ├── app.py           # FastAPI factory, router registration, warmup logic
│   ├── config.py        # Paths to datasets and shared constants
│   ├── routers/         # Feature-specific FastAPI routers
│   ├── services/        # Cached SDK loaders, Nabra helpers
│   └── utils.py         # Normalisation, tokenisation, similarity helpers
└── tests/
    └── test_api.py      # FastAPI TestClient coverage for all endpoints
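The helpers in src/utils.py are not reproduced here; as a rough illustration, Arabic normalisation for matching typically strips diacritics and unifies letter variants. A minimal sketch (the function name and exact rules are illustrative, not the module's actual API):

```python
import re

# Arabic diacritics (tashkeel) plus tatweel, commonly stripped before matching.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

def normalize_arabic(text: str) -> str:
    """Illustrative normalisation: strip diacritics, unify letter variants."""
    text = _DIACRITICS.sub("", text)
    text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0629", "\u0647")                 # taa marbuta -> haa
    text = text.replace("\u0649", "\u064A")                 # alef maqsura -> yaa
    return text.strip()
```

Lexicon lookups and similarity helpers would then compare normalised forms rather than raw input.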

Components

  • Routers: Each router (ner, wsd, morph, dialect, relation, health) defines request models, response schemas, and error handling.
  • Services: Thin wrappers exposing cached SDK functions; the Nabra service handles CSV ingestion and glossary lexicon building.
  • Datasets: Nabra CSVs are mounted under /app/Nabra; glosses for WSD are pulled from sinatools.wsd.glosses_dic.
  • SDK: The SinaTools Python package bundles the Wojood, Salma, Alma, and Hadath models.
  • Entry / Warmup: src/app.py optionally preloads models based on the SINA_WARM environment variable.

Data Flow

  1. Client calls a REST endpoint.
  2. Router validates payload via Pydantic models.
  3. Router invokes the corresponding service (cached SDK loader or Nabra lookup).
  4. The response is normalised into JSON, adding metadata fields such as sense_url, lemma_forms, and match_type.
  5. FastAPI returns the response; OpenAPI schema and docs auto-update.
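Step 4 can be sketched as a small envelope function. The field names (sense_url, lemma_forms, match_type) appear above; the helper name and URL scheme below are hypothetical:

```python
def normalise_response(raw: dict, match_type: str = "exact") -> dict:
    """Wrap a raw SDK result into the JSON shape returned to clients.

    Adds metadata fields (sense_url, lemma_forms, match_type) on top of
    whatever the underlying SDK call or Nabra lookup produced.
    """
    sense_id = raw.get("sense_id")
    return {
        **raw,
        # Hypothetical URL pattern; the real service defines its own.
        "sense_url": f"/wsd/senses/{sense_id}" if sense_id else None,
        "lemma_forms": raw.get("lemma_forms", []),
        "match_type": match_type,
    }
```

Keeping this enrichment in one place means every router returns the same metadata shape, which in turn keeps the auto-generated OpenAPI schema consistent.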

Dependencies & Integrations

  • No direct datastore; all data lives in memory or in local files.
  • Downstream services consume these APIs for content tagging, search ranking, and UI tooltips.
  • Observability hooks (metrics/logging) can be added later via FastAPI middleware.
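A latency-logging hook could be added without touching any router. Since a FastAPI app is an ASGI app, a pure-ASGI middleware needs no framework imports; a minimal sketch (class and logger names are illustrative):

```python
import logging
import time

logger = logging.getLogger("sinatools.access")

class TimingMiddleware:
    """ASGI middleware that logs method, path, and wall-clock latency."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        start = time.perf_counter()
        try:
            await self.app(scope, receive, send)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s %s %.1fms", scope["method"], scope["path"], elapsed_ms)
```

With FastAPI this would be registered via app.add_middleware(TimingMiddleware); metrics export could hang off the same hook later.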

Extensibility Points

  • New models: Drop in additional routers/services to integrate future NLP tasks under the same sidecar.
  • Datasets: Additional dialect corpora can reuse the glossary patterns.
  • Auth / Rate Limiting: Currently unauthenticated; add FastAPI middleware or put a shared gateway in front if needed.