
Requirements

Overview

The NLP Lab sidecar is our deployable shell for Arabic NLP capabilities. Today it wraps the SinaTools SDK models (NER, WSD, morphology, relation extraction) plus the offline Nabra dialect corpus utilities, and it will host additional models as they come online.

Runtime Environment

  • Python: 3.11 (configured via SINA_PYTHON_VERSION).
  • Base image: Derived from our sinatools/Dockerfile with the SinaTools SDK installed from PyPI.
  • Process: uvicorn main:app (worker process count controlled via UVICORN_WORKERS).
  • CPU / Memory: Align with container defaults; tune UVICORN_WORKERS for throughput.

Required Assets

  • SinaTools SDK models: Pulled during the image build; includes Wojood, Salma, Alma, and Hadath (collectively our current Arabic NLP stack).
  • Nabra dataset: CSV files mounted at /app/Nabra (Nabra-dataset.csv, Nabra RowText_sentences.csv). These provide dialect annotations and glosses for the glossary endpoint.
  • OpenAPI schema: Generated at sinatools/openapi.json for reference and SDK generation.
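Loading the mounted Nabra CSVs could look like the sketch below. This is illustrative only: the column names (`sentence`, `dialect`, `gloss`) and the helper functions are assumptions, not documented fields of the Nabra files.

```python
import csv
from pathlib import Path

# Mount point from the asset list above.
NABRA_DIR = Path("/app/Nabra")

def load_nabra_rows(path):
    """Yield one dict per row from a Nabra CSV file."""
    with open(path, newline="", encoding="utf-8") as fh:
        yield from csv.DictReader(fh)

def build_gloss_index(rows):
    """Group rows by dialect so a glossary lookup can fetch them quickly.

    The "dialect" key is a hypothetical column name, not a confirmed
    field of the Nabra dataset.
    """
    index = {}
    for row in rows:
        index.setdefault(row.get("dialect", "unknown"), []).append(row)
    return index

# Example wiring (assumes the container mount is present):
#   rows = load_nabra_rows(NABRA_DIR / "Nabra-dataset.csv")
#   index = build_gloss_index(rows)
```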

Configuration

Variable          Purpose                                                  Default
SINA_PORT         Uvicorn listen port inside the container                 8000
SINA_HOST_PORT    Published host port                                      8000
SINA_WARM         Model preloading: 0 (lazy) or all (preload on startup)   0
SINA_ENABLE_CORS  Enable permissive CORS during development                false
UVICORN_WORKERS   Worker count for Uvicorn                                 2
SINA_VERSION      SDK version tag (for metadata only)                      0.1.36
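A minimal sketch of reading this configuration from the environment; the defaults mirror the table above. The variable names are the documented ones, but the helper itself is illustrative, not the service's actual startup code.

```python
import os

def load_config(env=os.environ):
    """Read sidecar settings with the documented defaults."""
    return {
        "port": int(env.get("SINA_PORT", "8000")),
        "host_port": int(env.get("SINA_HOST_PORT", "8000")),
        "warm": env.get("SINA_WARM", "0"),  # "0" (lazy) or "all" (preload)
        "enable_cors": env.get("SINA_ENABLE_CORS", "false").lower() == "true",
        "workers": int(env.get("UVICORN_WORKERS", "2")),
        "version": env.get("SINA_VERSION", "0.1.36"),
    }
```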

Local Development

  • Mount sinatools/app into the container (already configured in docker-compose.override.yml).
  • Install dev dependencies (pytest, requests) inside the container if deeper testing is needed.
  • Run pytest /app/tests/test_api.py after changes.

External Integrations

  • Platform: Other services call the REST endpoints; no direct DB or message queue dependency.
  • Monitoring: TBD (hook into existing Prometheus scrape once we expose metrics).
  • Feature toggles: API parameters handle optional metadata (include_gloss, include_lemma). No runtime config yet.
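The optional-metadata toggles could shape a response roughly as sketched here. The token fields and the helper name are assumptions for illustration; only the parameter names `include_gloss` and `include_lemma` come from the API.

```python
def annotate_token(token, include_gloss=False, include_lemma=False):
    """Build a response entry, attaching optional metadata only when the
    corresponding toggle is set. Field names are hypothetical."""
    out = {"text": token["text"]}
    if include_lemma:
        out["lemma"] = token.get("lemma")
    if include_gloss:
        out["gloss"] = token.get("gloss")
    return out
```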