API Observability Guide

This document describes the observability signals for the API service. The API is the central component of the platform, so its health is critical, and understanding its logs and metrics is essential for troubleshooting.


1. Logging

The API service uses structured logging to provide detailed, queryable insight into its operations. All logs are written to standard output.

How to View Logs

Use the following command to tail the live logs for the API service:

docker compose logs -f api

Key Log Messages

When troubleshooting, look for these specific log messages:

  • "message": "payload too large"
    • Meaning: The Scraper service attempted to send a body over the 1.5 MB limit.
    • Action: The API returns 413 with Retry-After: 30 and an X-Request-ID header. Investigate spikes to ensure clients are backing off; correlate with the logged request_id.
  • "message": "batch too large"
    • Meaning: More than the allowed number of articles were submitted in a single request.
    • Action: The API returns 413 with Retry-After: 30 and an X-Request-ID header. Split batches and monitor for repeated occurrences.
  • "message": "conflict for external_id ..."
    • Meaning: The Scraper sent an article that already exists in the database but with different content, indicating a potential content hash mismatch.
    • Action: This may require manual investigation to determine why the content has changed.
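  • Illuminate\Database\QueryException
    • Meaning: A database query failed. Laravel raises this exception for any failed query, so check the wrapped driver error; a connection error means the API cannot communicate with the PostgreSQL database.
    • Action: Treat a connection failure as a high-severity incident. Escalate to the database administrator immediately.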
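As a rough aid for the messages above, the following shell sketch narrows a log stream down to the key messages and pulls out every line for one request. The function names filter_key_logs and logs_for_request are hypothetical helpers, not part of the service, and the sketch assumes each log entry is written on a single line that contains the message text and the request_id value verbatim.

```shell
# Hypothetical helpers; both read log lines on stdin and print the matches.

# Keep only lines containing one of the key messages from this guide.
filter_key_logs() {
  grep -F \
    -e 'payload too large' \
    -e 'batch too large' \
    -e 'conflict for external_id' \
    -e 'QueryException'
}

# Keep only lines that mention the given request ID (fixed-string match).
logs_for_request() {
  grep -F -e "$1"
}

# Example usage (assumes the compose service is named "api", as above):
#   docker compose logs --no-color --since 15m api | filter_key_logs
#   docker compose logs --no-color api | logs_for_request "<request_id>"
```

Fixed-string matching (grep -F) is used so that characters in the messages or IDs are never interpreted as regular expression syntax.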

2. Key Metrics & Dashboards

The API service's performance and health are monitored through a combination of application metrics and queue monitoring.

Laravel Horizon Dashboard

Primary Monitoring Tool: Horizon

The most critical observability tool for the API service is the Laravel Horizon dashboard. Horizon provides a real-time view of the Redis queue, including job throughput, failure rates, and retry statistics.

  • URL: http://localhost/horizon (or your production equivalent)
  • What to Watch:
    • Failed Jobs: A rising number of failed jobs is the primary indicator of a problem with the ingestion or analysis pipeline.
    • Queue Wait Times: High wait times indicate that the queue workers are overloaded or stuck; you may need to scale up the number of worker processes.

Prometheus Metrics

The API exposes a Prometheus-compatible /metrics endpoint.

Locked Down

The endpoint is protected in non-local environments. Set a METRICS_TOKEN env var and present it via the X-Metrics-Token header.

# dev-only example
export METRICS_TOKEN=dev
curl -H "X-Metrics-Token: $METRICS_TOKEN" http://localhost/metrics

In production, use a strong token and restrict network access at the ingress or firewall level.

The token is stored in the service's .env file (METRICS_TOKEN). To rotate without downtime: (1) update the value in your secret store and Prometheus scrape config, (2) redeploy the service to pick up the new token, then (3) remove the old token after verifying scrapes succeed with the new one.
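Step (3) depends on confirming that the new token works before the old one is retired. One way to check by hand is sketched below. check_metrics_token and NEW_METRICS_TOKEN are hypothetical names, and the sketch only assumes that a valid token yields HTTP 200, since the status returned for a rejected token is not specified here.

```shell
# Hypothetical helper: succeed only if the metrics URL answers 200 for the token.
# Usage: check_metrics_token <metrics-url> <token>
check_metrics_token() {
  # -w prints only the HTTP status code; the body is discarded.
  status=$(curl -s -o /dev/null --max-time 5 \
    -w '%{http_code}' \
    -H "X-Metrics-Token: $2" "$1") || status=000
  [ "$status" = "200" ]
}

# Example usage (NEW_METRICS_TOKEN is an illustrative variable name):
#   check_metrics_token http://localhost/metrics "$NEW_METRICS_TOKEN" \
#     && echo "new token accepted"
```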

Metrics

| Metric | Type | Labels | Units | Description |
|---|---|---|---|---|
| ingest_requests_total | Counter | outcome | requests | Total ingest requests by outcome |
| ingest_body_bytes_total | Counter | | bytes | Sum of request body sizes |
| search_requests_total | Counter | mode, outcome | requests | Search requests grouped by retrieval mode |
| queue_latency_seconds | Histogram | | seconds | Time jobs spend waiting in the queue |
| aibox_rrf_mode | Gauge | mode | 1 | Active AI‑Box RRF mode |
| ai_classify_requests_total | Counter | task, upstream, status | requests | Total classify requests by task and upstream |
| ai_classify_latency_ms | Histogram | task, upstream | milliseconds | Latency distribution for classification |
| ai_classify_failures_total | Counter | task, reason | failures | Classification failures grouped by reason |
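To spot-check a value without a dashboard, the text returned by /metrics can be summarized on the command line. The sketch below assumes the endpoint follows the standard Prometheus text exposition format (one sample per line with the value as the last field, no trailing timestamps, and histograms exposing _sum and _count series). sum_metric and histogram_mean are hypothetical helper names.

```shell
# Hypothetical helpers; both read Prometheus text exposition on stdin.

# Sum every series of one metric across all label combinations.
sum_metric() {
  awk -v name="$1" '
    function is_series(s) { return index($0, s "{") == 1 || index($0, s " ") == 1 }
    /^#/ { next }                          # skip HELP and TYPE comment lines
    is_series(name) { total += $NF; found = 1 }
    END { if (found) print total; else exit 1 }
  '
}

# Mean observation of a histogram since process start: _sum divided by _count.
histogram_mean() {
  awk -v name="$1" '
    function is_series(s) { return index($0, s "{") == 1 || index($0, s " ") == 1 }
    is_series(name "_sum")   { sum += $NF }
    is_series(name "_count") { count += $NF }
    END { if (count > 0) printf "%.3f\n", sum / count; else exit 1 }
  '
}

# Example usage:
#   curl -fsS -H "X-Metrics-Token: $METRICS_TOKEN" http://localhost/metrics \
#     | sum_metric ingest_requests_total
```

Note that histogram_mean reports a lifetime average, so it smooths over recent spikes; for alerting on current queue wait times, a rate-based query in Prometheus is the better fit.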

Dashboards & Alerts


3. Health Checks

The API service provides a simple HTTP health check endpoint.

  • Endpoint: /api/v1/health (Note: This may not be exposed publicly and may only be accessible from within the Docker network).
  • Command (from another container):
    curl http://api/api/v1/health
    
  • Success Response: A 200 OK response with a simple JSON body.
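For scripts that need to wait until the API is up (for example after a restart), the check above can be wrapped in a retry loop. This is a minimal sketch: wait_for_health is a hypothetical helper, the attempt count and delay are arbitrary defaults, and it treats any response below HTTP 400 as healthy without inspecting the JSON body.

```shell
# Hypothetical helper: poll a health URL until it responds or attempts run out.
# Usage: wait_for_health <url> [attempts] [delay-seconds]
wait_for_health() {
  url="$1"
  attempts="${2:-10}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx responses.
    if curl -fsS --max-time 2 -o /dev/null "$url"; then
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example usage (from another container on the Docker network):
#   wait_for_health http://api/api/v1/health 15 2 || echo "API did not become healthy"
```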