
Requirements

Overview

The NLP Lab sidecar is our deployable shell for Arabic NLP capabilities. Today it wraps the SinaTools SDK models (NER, WSD, morphology, relation extraction) plus the offline Nabra dialect corpus utilities, and it will host additional models as they come online.

Runtime Environment

  • Python: 3.11 (configured via SINA_PYTHON_VERSION).
  • Base image: Derived from our sinatools/Dockerfile with the SinaTools SDK installed from PyPI.
  • Process: uvicorn main:app (worker process count controlled via UVICORN_WORKERS).
  • CPU / Memory: Align with container defaults; tune UVICORN_WORKERS for throughput.

Required Assets

  • SinaTools SDK models: Pulled during the image build; includes Wojood, Salma, Alma, and Hadath (collectively our current Arabic NLP stack).
  • Nabra dataset: CSV files mounted at /app/Nabra (Nabra-dataset.csv, Nabra RowText_sentences.csv). These provide dialect annotations and glosses for the glossary endpoint.
  • OpenAPI schema: Generated at sinatools/openapi.json for reference and SDK generation.
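Loading the mounted Nabra CSVs could look like the sketch below. This is illustrative only: the column names (`sentence`, `dialect`, `gloss`) and the helper functions are assumptions, not documented fields of the Nabra files.

```python
import csv
from pathlib import Path

# Mount point from the asset list above.
NABRA_DIR = Path("/app/Nabra")

def load_nabra_rows(path):
    """Yield one dict per row from a Nabra CSV file."""
    with open(path, newline="", encoding="utf-8") as fh:
        yield from csv.DictReader(fh)

def build_gloss_index(rows):
    """Group rows by dialect so a glossary lookup can fetch them quickly.

    The "dialect" key is a hypothetical column name, not a confirmed
    field of the Nabra dataset.
    """
    index = {}
    for row in rows:
        index.setdefault(row.get("dialect", "unknown"), []).append(row)
    return index

# Example wiring (assumes the container mount is present):
#   rows = load_nabra_rows(NABRA_DIR / "Nabra-dataset.csv")
#   index = build_gloss_index(rows)
```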

Configuration

Variable          Purpose                                                  Default
SINA_PORT         Uvicorn listen port inside the container                 8000
SINA_HOST_PORT    Published host port                                      8000
SINA_WARM         Model preloading: 0 (lazy) or all (preload on startup)   0
SINA_ENABLE_CORS  Enable permissive CORS during development                false
UVICORN_WORKERS   Worker count for Uvicorn                                 2
SINA_VERSION      SDK version tag (for metadata only)                      0.1.36
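A minimal sketch of reading this configuration from the environment; the defaults mirror the table above. The variable names are the documented ones, but the helper itself is illustrative, not the service's actual startup code.

```python
import os

def load_config(env=os.environ):
    """Read sidecar settings with the documented defaults."""
    return {
        "port": int(env.get("SINA_PORT", "8000")),
        "host_port": int(env.get("SINA_HOST_PORT", "8000")),
        "warm": env.get("SINA_WARM", "0"),  # "0" (lazy) or "all" (preload)
        "enable_cors": env.get("SINA_ENABLE_CORS", "false").lower() == "true",
        "workers": int(env.get("UVICORN_WORKERS", "2")),
        "version": env.get("SINA_VERSION", "0.1.36"),
    }
```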

Local Development

  • Mount sinatools/app into the container (already configured in docker-compose.override.yml).
  • Install dev dependencies (pytest, requests) inside the container if deeper testing is needed.
  • Run pytest /app/tests/test_api.py after changes.

External Integrations

  • Platform: Other services call the REST endpoints; no direct DB or message queue dependency.
  • Monitoring: TBD (hook into existing Prometheus scrape once we expose metrics).
  • Feature toggles: API parameters handle optional metadata (include_gloss, include_lemma). No runtime config yet.
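The optional-metadata toggles could shape a response roughly as sketched here. The token fields and the helper name are assumptions for illustration; only the parameter names `include_gloss` and `include_lemma` come from the API.

```python
def annotate_token(token, include_gloss=False, include_lemma=False):
    """Build a response entry, attaching optional metadata only when the
    corresponding toggle is set. Field names are hypothetical."""
    out = {"text": token["text"]}
    if include_lemma:
        out["lemma"] = token.get("lemma")
    if include_gloss:
        out["gloss"] = token.get("gloss")
    return out
```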