Scraper Service Runbook

For On-Call Engineers

This document is the primary operational playbook for the Scraper Service. It contains standardized procedures for deployment, maintenance, and incident response. Read and execute these steps carefully.


1. Standard Deployment Process

Objective: To safely deploy a new version of the Scraper service to production.

This process assumes the new Docker image (labeeb/scraper:new-version) has already been built and pushed to the container registry by the CI/CD pipeline.

Deployment Checklist

  • 1. Announce Deployment:

    • Notify the team in the appropriate channel (e.g., #ops) that you are beginning a deployment.
  • 2. Place System in Maintenance (if required):

    • If the deployment includes breaking changes to profiles or providers, consider pausing the scheduler via the API.
      # Note: this endpoint is illustrative and not yet implemented
      curl -X POST -H "Authorization: Bearer ..." http://scraper.labeeb.internal/scheduler/pause
      
  • 3. Update Service Configuration:

    • Pull the latest docker-compose.yml or Kubernetes manifest that points to the new image version.
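    • To confirm the manifest now points at the new image tag, a quick check (a sketch; assumes a Git checkout of the deployment repository):
      git pull
      grep "image:" docker-compose.yml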
  • 4. Perform Rolling Restart:

    • Execute the update command to recreate the service on the new image. Note that a single-instance Docker Compose deployment incurs a brief restart window rather than true zero downtime.
      # For Docker Compose deployments: pull the pre-built image, then recreate only the scraper service
      docker compose pull scraper
      docker compose up -d --no-deps scraper
      
  • 5. Verify Deployment:

    • Check that the new container is running and healthy.
      docker compose ps scraper
      curl http://localhost:9001/health
      
    • Tail the logs to ensure the service started without any fatal errors.
      docker compose logs -f scraper
      
  • 6. Announce Completion:

    • Notify the team that the deployment is complete and the service is operational.
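    • If you paused the scheduler in step 2, resume it. Assuming a counterpart to the pause endpoint exists (illustrative, like the pause endpoint; not yet implemented):
      curl -X POST -H "Authorization: Bearer ..." http://scraper.labeeb.internal/scheduler/resume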

2. Incident Response Playbooks

This section contains step-by-step checklists for responding to common alerts and incidents.

Playbook: Ingestion Failures (Upstream API Errors)

  • Alert Trigger: ScraperIngestionFailureRateHigh
  • Symptom: Scraper logs show repeated errors when sending data to the core API (e.g., 4xx or 5xx status codes).

Incident Response Checklist

  • 1. Acknowledge the Alert: Acknowledge the alert in your monitoring system to notify the team you are investigating.

  • 2. Identify the Error: Check the Scraper logs to identify the specific error message and status code.

    docker compose logs -f scraper | grep "ingest_client"
    

  • 3. Triage Based on Status Code:

    • 401 / 403 (Unauthorized / Forbidden):
      • Meaning: The INGEST_TOKEN is incorrect or has expired.
      • Action: Verify that the INGEST_TOKEN in the Scraper's environment matches the one expected by the API service. Update and restart the Scraper if necessary (a quick check sketch follows this list).
    • 409 (Conflict):
      • Meaning: The API is reporting a data conflict (e.g., a duplicate external_id with a different content_hash).
      • Action: This is likely a data issue, not a service failure. Investigate the specific article URL in the logs. No immediate action is usually required unless all ingestions are failing with conflicts.
    • 5xx (Server Errors):
      • Meaning: The upstream API service is unhealthy.
      • Action: This is not a Scraper issue. Escalate to the team responsible for the API service. See the API Service Runbook for its troubleshooting procedures.
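    > A quick first check for the 401 / 403 case is to inspect the token currently loaded in the container (a minimal sketch; assumes the image includes a shell with printenv):

      docker compose exec scraper printenv INGEST_TOKEN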
  • 4. Resolve the Incident: Once the root cause is identified and fixed, resolve the alert in your monitoring system and document the incident.

Playbook: Profile Validation Errors

  • Alert Trigger: ScraperInvalidProfilesDetected (via CI/CD pipeline or log monitoring)
  • Symptom: The service fails to start, or the logs warn that invalid profiles were skipped.

Incident Response Checklist

  • 1. Identify the Invalid Profile: Check the service logs at startup for detailed validation errors.

    docker compose logs scraper
    
    > The log will specify the filename (e.g., profiles/new-source.json) and the reason for failure (e.g., provider: unknown_provider).

  • 2. Correct the Profile:

    • Open the invalid JSON profile.
    • Compare its structure against the official schema defined in scraper/app/data/schemas/profile.schema.json.
    • Fix the error (e.g., correct a typo in the provider name, fix a data type).
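    • Optionally, validate the fix locally before reloading. One option is the check-jsonschema CLI (an assumption; install it with pip install check-jsonschema if needed):
      check-jsonschema --schemafile scraper/app/data/schemas/profile.schema.json profiles/new-source.json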
  • 3. Reload Profiles without Restarting:

    • Use the /profiles/reload endpoint to apply the fix immediately.
      curl -X POST http://localhost:9001/profiles/reload
      
  • 4. Verify the Fix:

    • Check the logs again to confirm the profile loaded successfully.
    • Call the GET /profiles endpoint to ensure the corrected profile is now listed.
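    • For example, using the local port shown elsewhere in this runbook:
      curl http://localhost:9001/profiles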

3. Routine Operations & Maintenance

Objective: To perform regular health checks and preventative maintenance on the Scraper service.

Weekly Maintenance Checklist

  • 1. Review Log Volume:

    • Check the disk space consumed by the scraper's logs and any .jsonl output files if write_to_disk is used.
    • Ensure log rotation is configured correctly.
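    • One way to check the size of the container's log file under Docker's default json-file logging driver (a sketch; requires host access, typically as root):
      docker inspect --format='{{.LogPath}}' "$(docker compose ps -q scraper)" | xargs ls -lh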
  • 2. Audit Scraping Performance:

    • Review monitoring dashboards for scrape job durations. Identify any providers that are consistently slow or timing out.
    • Consider disabling or refactoring poorly performing providers.
  • 3. Check for Dependency Updates:

    • Periodically check for updates to key dependencies like FastAPI, requests, and BeautifulSoup to incorporate performance and security improvements.
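    • A quick way to list outdated packages, assuming a pip-based environment inside the container:
      docker compose exec scraper pip list --outdated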