
---
title: "Runbook: High Ingestion Error Rate"
description: A playbook for diagnosing and resolving a high rate of 4xx errors at the article ingestion endpoint.
icon: material/tray-alert
---


# Runbook: High Ingestion Error Rate

**Impact:** High - Data Loss

This alert fires when the API service is rejecting a high percentage of incoming requests from the Scraper service. This is a critical issue, as it means that valid data is being fetched by the scraper but is not being saved to the platform, leading to silent data loss.

## Triage Checklist (5 Minutes)

Your immediate goal is to identify the type of error and the source of the invalid data.

1.  **Identify the HTTP Error Code:** Check the API logs to determine the specific 4xx status code being returned; the code tells you the nature of the failure. If the dominant code is not obvious, see the tally sketch after this checklist.

    ```bash
    # Look for POST requests to /api/v1/ingest/articles
    docker compose logs --tail=200 api
    ```

    - `422 Unprocessable Entity`: the request body failed validation.
    - `401 Unauthorized` or `403 Forbidden`: the `INGEST_TOKEN` is incorrect.
    - `413 Payload Too Large`: the scraper is sending a batch that exceeds the size limit.

2.  **Check Scraper Logs for Error Details:** The scraper's logs often contain the full error response from the API, including detailed validation messages.

    ```bash
    docker compose logs --tail=100 scraper | grep "Ingest batch failed"
    ```

3.  **Isolate the Problematic Scraper Profile:** If the errors seem content-related, they are likely coming from a single, misconfigured scraper profile. The API logs may not have this context, but the scraper logs might; correlate timestamps to identify the source.

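If the dominant status code is not obvious from scanning the logs, a quick tally helps. This is a minimal sketch: it assumes the API writes access-log style lines that include the request path and a numeric status code, so adjust the grep patterns to your actual log format.

```bash
# Tally 4xx status codes on the ingest endpoint over a recent log window.
# Assumes access-log style lines containing the path and the status code;
# adjust both grep patterns to match the API's actual log format.
docker compose logs --tail=2000 api \
  | grep '/api/v1/ingest/articles' \
  | grep -oE ' 4[0-9]{2} ' \
  | sort | uniq -c | sort -rn
```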

## Remediation Playbooks

Based on the HTTP status code you identified, select the appropriate playbook.

### Symptom: 422 Unprocessable Entity

The API is returning `422 Unprocessable Entity` errors, which means the scraper is sending data that violates the API's contract.

1.  **Find the Validation Error:** The API's log entry for the 422 response should contain a JSON object detailing which field failed validation and why (e.g., `"title": ["The title field is required."]`). A sketch for tallying these details follows this list.

2.  **Identify the Root Cause in the Scraper:** This error is almost always caused by a bug or a recent change in a specific scraper provider, which is failing to extract a required field or is extracting data in an incorrect format.

3.  **Implement a Fix in the Scraper:** The fix belongs in the relevant provider file under `scraper/app/scraping/providers/`. You may need to add better error handling or adjust a CSS selector.

4.  **Deploy the Scraper Fix:** A code change in the scraper requires a new Docker image to be built and deployed (for example, `docker compose build scraper && docker compose up -d scraper`).

5.  **Temporarily Disable the Failing Profile:** While a fix is being developed, the safest immediate action is to disable the failing profile in the scraper so it stops sending invalid data. Follow the procedure in the Profile Failures Runbook.

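To see which fields fail most often, a pipeline like the one below can help. It is a sketch under two assumptions: the API logs the validation-error JSON on the same line as the 422 response, and `jq` is installed on the host; adapt it to your real log format.

```bash
# Extract and tally validation-error details from recent 422 log lines.
# Assumes the error JSON (e.g. {"title": ["..."]}) appears on the same
# line as the 422 response, and that jq is available on the host.
docker compose logs --tail=1000 api \
  | grep ' 422 ' \
  | grep -oE '\{.*\}' \
  | jq -r 'to_entries[] | "\(.key): \(.value | join("; "))"' 2>/dev/null \
  | sort | uniq -c | sort -rn
```
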
### Symptom: 401 Unauthorized or 403 Forbidden

1.  **Verify `INGEST_TOKEN`:** This error means the `INGEST_TOKEN` environment variable in the scraper service's configuration does not match the one expected by the `api` service. See the comparison sketch after this list.

2.  **Correct the Environment Variable:** Ensure the `INGEST_TOKEN` value is identical in the `.env` files for both services.

3.  **Restart the Scraper Service:** Restart the scraper container so it picks up the corrected environment variable.

    ```bash
    docker compose restart scraper
    ```
    
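To confirm a mismatch, compare the value each running container actually sees rather than reading the `.env` files, since a container keeps a stale value until it is restarted. A minimal sketch, assuming both images ship a `printenv` binary:

```bash
# Compare the INGEST_TOKEN each running container actually sees.
# Assumes both images include printenv (true for most base images).
diff \
  <(docker compose exec -T api printenv INGEST_TOKEN) \
  <(docker compose exec -T scraper printenv INGEST_TOKEN) \
  && echo "tokens match" \
  || echo "tokens differ"
```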

### Symptom: 413 Payload Too Large

1.  **Check Scraper Batch Size:** The scraper's `INGEST_BATCH_SIZE` environment variable may be set too high.

2.  **Reduce Batch Size:** Lower the value of `INGEST_BATCH_SIZE` in the scraper's `.env` file and restart the service.

3.  **Check API Limits (if necessary):** If reducing the batch size is not desirable, you can instead raise `INGEST_MAX_BODY_BYTES` in the API's `.env` file, but do so with caution: accepting larger request bodies can impact API performance. A sketch for checking both values follows this list.

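Before changing either value, look at both knobs side by side. The `.env` paths below are assumptions about the repo layout; adjust them to wherever your services keep their configuration.

```bash
# Show the scraper's batch size and the API's body limit together.
# The scraper/.env and api/.env paths are assumptions; adjust to your layout.
grep -H 'INGEST_BATCH_SIZE' scraper/.env
grep -H 'INGEST_MAX_BODY_BYTES' api/.env
```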

## Post-Incident Actions

- **Strengthen Contract Testing:** Implement automated contract tests between the scraper and the API so validation issues are caught in CI before they reach production. A hand-rolled smoke-test sketch follows this list.
- **Improve API Error Logging:** Enhance the API's exception handler to always log the full validation error details, making it easier to diagnose which fields are failing.
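
Until real contract tests exist, even a hand-rolled smoke test can catch gross contract breaks after a scraper change. The sketch below is illustrative only: the port, the bearer-token auth scheme, and the payload shape are assumptions rather than the API's documented contract, so substitute a known-good request captured from the scraper.

```bash
# Hand-rolled contract smoke test: POST a sample article batch and print
# the HTTP status; expect a 2xx. The port, auth scheme, and payload shape
# are assumptions; replace them with a known-good request from the scraper.
curl -sS -o /dev/null -w '%{http_code}\n' \
  -X POST 'http://localhost:8000/api/v1/ingest/articles' \
  -H "Authorization: Bearer ${INGEST_TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{"articles": [{"title": "smoke test", "url": "https://example.com/a"}]}'
```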