Runbook: NLP Lab Sidecar

Use during incidents

The checklists below focus on stabilising the NLP Lab sidecar, which currently hosts the SinaTools SDK models and related datasets. Copy/paste commands as-is. Reference the linked sections for deeper analysis.

First 5 Minutes

# Health probe
curl -fsS "${NLP_LAB_URL:-http://localhost:8000}/health" | jq .

# Inspect logs
docker logs --tail=100 sinatools

# Quick functional test (NER)
curl -fsS "${NLP_LAB_URL:-http://localhost:8000}/ner" \
  -H 'Content-Type: application/json' \
  -d '{"text": "اختبار", "mode": "flat"}' | jq .
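
The same two checks can also be scripted so that both results land in one place. The following is a minimal Python sketch rather than part of the service: http_json and first_five_minutes are hypothetical helper names, and the sketch assumes both endpoints return JSON bodies, as the jq pipes above suggest.

```python
import json
import os
import urllib.request


def http_json(url, payload=None, timeout=5.0):
    """GET `url`, or POST `payload` as JSON when one is given; return the decoded body."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    request = urllib.request.Request(url, data=data, headers=headers)
    # urlopen raises HTTPError on 4xx/5xx responses, similar to curl's -f flag.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def first_five_minutes(base_url=None, fetch=http_json):
    """Run the health probe and the NER smoke test; return {check: (ok, detail)}."""
    if base_url is None:
        # Same fallback as ${NLP_LAB_URL:-http://localhost:8000} in the shell commands.
        base_url = os.environ.get("NLP_LAB_URL") or "http://localhost:8000"
    base_url = base_url.rstrip("/")
    checks = {
        "health": ("/health", None),
        "ner": ("/ner", {"text": "اختبار", "mode": "flat"}),
    }
    results = {}
    for name, (path, payload) in checks.items():
        try:
            results[name] = (True, fetch(base_url + path, payload))
        except Exception as exc:  # HTTP errors, timeouts, refused connections, bad JSON
            results[name] = (False, repr(exc))
    return results
```

Calling first_five_minutes() with no arguments probes the URL from NLP_LAB_URL; the fetch parameter exists only so the function can be exercised without a running container.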

Decision Tree (High-Level)

flowchart TD
  A[API errors or 5xx] --> B{Health endpoint ok?}
  B -- no --> C[Restart sinatools container]:::act
  B -- yes --> D{Specific feature failing?}
  D -- NER/WSD --> E[Inspect SDK model availability]:::act
  D -- Dialect --> F[Verify /app/Nabra mount and CSV integrity]:::act
  D -- Morph/Glossary --> G[Check include_* parameters and logs]:::act
classDef act fill:#e6ffed,stroke:#4CAF50;
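
For tooling that needs the same routing, the flowchart can be written as a small lookup. This is an illustrative Python sketch: next_action and the lowercase feature keys are hypothetical names, while the action strings are copied from the chart. The chart defines no branch for a healthy service with no specific failing feature, so the sketch returns None in that case.

```python
RESTART = "Restart sinatools container"

# Feature keys are illustrative; the actions are the chart's leaf nodes.
FEATURE_ACTIONS = {
    "ner": "Inspect SDK model availability",
    "wsd": "Inspect SDK model availability",
    "dialect": "Verify /app/Nabra mount and CSV integrity",
    "morph": "Check include_* parameters and logs",
    "glossary": "Check include_* parameters and logs",
}


def next_action(health_ok, failing_feature=None):
    """Return the flowchart's action, or None when the chart has no matching branch."""
    if not health_ok:
        return RESTART
    if failing_feature is None:
        return None
    return FEATURE_ACTIONS.get(failing_feature.strip().lower())
```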

Degradation Patterns

Model warmup latency: When SINA_WARM=all, initial requests may hang if weights are missing; confirm the SinaTools SDK bundle is present and reinstall if necessary.
Missing gloss data: /wsd?include_gloss=true returning null gloss values indicates that glosses_dic failed to load; rebuild the image or re-run the SDK download script.
Dialect lookup failures: a 404 with a suggestion means no close match was found in the Nabra dataset. If this happens for sentences that are known to match, verify the /app/Nabra CSVs and inspect the logs for NabraDataMissingError.
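
The missing-gloss pattern can be detected mechanically on a saved /wsd response. The sketch below is hedged: the response schema is not documented here, so it assumes only that gloss values sit under a key named "gloss" somewhere in the JSON, and count_null_gloss is a hypothetical helper name.

```python
def count_null_gloss(payload, key="gloss"):
    """Count entries under `key` whose value is null, at any nesting depth."""
    if isinstance(payload, dict):
        total = 0
        for name, value in payload.items():
            if name == key and value is None:
                total += 1
            else:
                # Descend into nested objects and arrays.
                total += count_null_gloss(value, key)
        return total
    if isinstance(payload, list):
        return sum(count_null_gloss(item, key) for item in payload)
    return 0  # Scalars cannot contain a gloss entry.
```

A non-zero count on a request made with include_gloss=true points at the glosses_dic loading problem described above; adjust the key name if the real response uses a different field.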

Operational Commands

Restart Service

docker compose restart sinatools

Tail Logs

docker logs -f sinatools

Run Tests In-Container

docker exec sinatools pytest /app/tests/test_api.py

Regenerate OpenAPI

docker exec sinatools python3 -c "from src.app import create_app; import json, pathlib; pathlib.Path('/tmp/openapi.json').write_text(json.dumps(create_app().openapi()))"
docker cp sinatools:/tmp/openapi.json sinatools/openapi.json
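
After regenerating, a quick sanity check on the copied file can catch an empty or truncated schema. This is a sketch under assumptions: missing_paths and check_openapi_file are hypothetical helpers, and the expected list contains only endpoints mentioned in this runbook, which may or may not all be published in the schema.

```python
import json
from pathlib import Path

# Endpoints referenced elsewhere in this runbook; extend as needed.
EXPECTED_PATHS = ("/health", "/ner", "/wsd")


def missing_paths(spec, expected=EXPECTED_PATHS):
    """Return the expected endpoints absent from the spec's top-level `paths` object."""
    documented = spec.get("paths", {})
    return [path for path in expected if path not in documented]


def check_openapi_file(path="sinatools/openapi.json"):
    """Load the regenerated schema from disk and report missing endpoints."""
    spec = json.loads(Path(path).read_text(encoding="utf-8"))
    return missing_paths(spec)
```

An empty list from check_openapi_file() means every expected endpoint is documented; a JSON decode error suggests the file was not written completely.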

Troubleshooting Playbooks

5xx Responses:
  - Check container logs for tracebacks.
  - Run the health check and a sample request.
  - If SDK import errors appear, rebuild the image (docker compose build sinatools).
Glossary returning empty matches:
  - Confirm Nabra-dataset.csv exists and is readable.
  - Recreate the lexicon cache by restarting the container.
  - Verify similarity_threshold is not set too high.
Inconsistent morphology results:
  - Ensure lemma metadata was requested with include_lemma=true.
  - Check the logs for ValueError conversions; these were fixed in the modular build, so redeploy if an old image is in use.
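
The log checks in these playbooks can be roughed out as a marker scan over captured docker logs output. This sketch assumes SDK import failures surface as Python's standard ImportError or ModuleNotFoundError; scan_logs and LOG_MARKERS are hypothetical names, and the hints simply restate the steps above.

```python
from collections import Counter

# Marker substring -> the runbook step it points to.
LOG_MARKERS = {
    "Traceback (most recent call last)": "5xx: read the traceback in the container logs",
    "ImportError": "5xx: rebuild the image (docker compose build sinatools)",
    "ModuleNotFoundError": "5xx: rebuild the image (docker compose build sinatools)",
    "NabraDataMissingError": "Dialect: verify the /app/Nabra CSVs",
    "ValueError": "Morphology: redeploy if an old image is in use",
}


def scan_logs(text):
    """Count the log lines that contain each known marker."""
    hits = Counter()
    for line in text.splitlines():
        for marker in LOG_MARKERS:
            if marker in line:
                hits[marker] += 1
    return hits
```

One way to use it is to save docker logs --tail=100 sinatools to a file, pass the file contents to scan_logs, and look up each marker with a non-zero count in LOG_MARKERS.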

Escalation

Primary: NLP Lab on-call (#nlp-lab Slack channel).
Secondary: Platform SRE if container-level issues persist.
Provide docker logs, failing requests, and current compose hash when escalating.