
Scraper Service Dependencies

This document provides a categorized inventory of the Python dependencies required to run the Scraper service. Understanding these dependencies is critical for security scanning, performance monitoring, and troubleshooting.

Source of Truth

The canonical list of production dependencies is maintained in scraper/requirements.txt. Development and CI-related dependencies are in scraper/requirements-dev.txt.
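As a troubleshooting aid, the pinned minimums can be compared against what is actually installed in an environment. The sketch below is a minimal illustration, not part of the service: it assumes the requirements file uses simple `name>=version` lines mirroring the tables in this document, and the helper names (`parse_requirements`, `meets_minimum`, `check_installed`) are hypothetical. The version comparison is deliberately naive (numeric release segments only) and ignores pre-release tags and other specifier operators.

```python
import re
from importlib import metadata

# Matches simple lines such as "fastapi>=0.111.0"; only ">=" is handled (assumption).
_REQ_LINE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*>=\s*([0-9][0-9A-Za-z.]*)")


def parse_requirements(text):
    """Return {package_name: minimum_version} for 'name>=version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = _REQ_LINE.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def version_tuple(version):
    """Naive key: '2.31.0' -> (2, 31, 0). Anything after the numeric release is ignored."""
    release = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(p) for p in release.group(0).split(".")) if release else ()


def meets_minimum(installed, minimum):
    """Compare two versions after padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(pins):
    """Return {package_name: 'ok' | 'missing' | 'too old (<version>)'}."""
    report = {}
    for name, minimum in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    # Sample text standing in for the contents of scraper/requirements.txt.
    sample = "fastapi>=0.111.0\nuvicorn>=0.30.0  # ASGI server\n"
    for package, status in check_installed(parse_requirements(sample)).items():
        print(f"{package}: {status}")
```

For anything beyond a quick check, a full specifier parser is a better fit than the naive comparison used here.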


Production Dependencies

These packages are required for the service to run in a production environment.

Core Application & API

Package | Version | Core Responsibility
fastapi | >=0.111.0 | The primary web framework for building the API.
uvicorn | >=0.30.0 | The high-performance ASGI server that runs the FastAPI application.
pydantic | >=2.5.0 | Used for all data modeling, validation, and settings management.
python-dotenv | >=1.0.1 | Manages environment variables by loading them from .env files.

Scraping & HTTP Clients

Package | Version | Core Responsibility
requests | >=2.31.0 | The primary HTTP client for making requests to external websites.
httpx | >=0.27.0 | An alternative HTTP client that supports both synchronous and asynchronous requests.
tenacity | >=8.3.0 | Provides retry logic for network requests, which is crucial for handling transient failures.

Parsing & Content Normalization

Package | Version | Core Responsibility
beautifulsoup4 | >=4.12.3 | The main library for parsing HTML and XML content.
lxml | >=5.0 | A high-performance XML and HTML parser used by BeautifulSoup.
feedparser | >=6.0.11 | The specialized library for parsing RSS and Atom feeds.
newspaper3k | >=0.2.8 | Used for advanced article extraction and content cleaning.
dateparser | >=1.2.0 | Parses human-readable date strings from various languages into standard datetime objects.
python-dateutil | >=2.9.0 | Provides powerful extensions to the standard datetime module.

Scheduling & Validation

Package | Version | Core Responsibility
APScheduler | >=3.10.4 | The engine for the built-in cron-based job scheduler.
jsonschema | >=4.22.0 | Validates the structure and content of the JSON scraping profiles.

Development & Testing Dependencies

These packages are required for running local tests and CI/CD pipelines. They are defined in scraper/requirements-dev.txt.

  • pytest: The core framework for running unit and integration tests.
  • pytest-cov: A pytest plugin for measuring code coverage.
  • ruff: An extremely fast Python linter and code formatter.
  • black: An opinionated code formatter that enforces a consistent style.
  • mypy: The static type checker for ensuring type safety.
  • openapi-spec-validator: Validates the service's OpenAPI specification file.