This library scrapes time-stamped ('as-of') data from CSV-style files into MariaDB.
Its main features are:
- Uploading data from strongly-typed Polars DataFrames.
- Querying data into Polars DataFrames, with column types inferred from the database schema.
- A scrape specification that:
  - Defines pipelines for typing, enriching, and normalizing data before uploading.
  - Allows construction of the 'as-of' time from file attributes or as a function of the input columns.
  - Catalogs the history of scrape inputs to prevent duplication.
  - Supports per-file transactional scraping (either the processing for a file succeeds, or the transaction is rolled back).
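The catalog and per-file transaction behavior described above can be illustrated with a minimal sketch. This is not the library's API: it uses the standard-library `sqlite3` module as a stand-in for MariaDB, and the function names (`make_db`, `scrape_file`), the table layout, the `name`/`value` CSV columns, and the choice of the file's modification time as the 'as-of' time are all assumptions made for illustration.

```python
import csv
import hashlib
import os
import sqlite3
from datetime import datetime, timezone


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a data table and a scrape catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE catalog (digest TEXT PRIMARY KEY, path TEXT)")
    conn.execute("CREATE TABLE data (name TEXT, value REAL, as_of TEXT)")
    return conn


def scrape_file(conn: sqlite3.Connection, path: str) -> bool:
    """Load one CSV file in a single transaction.

    Returns False without touching the database if the file content
    was already cataloged, True if the file was loaded.
    """
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()

    # Catalog check: identical content is never loaded twice.
    seen = conn.execute(
        "SELECT 1 FROM catalog WHERE digest = ?", (digest,)
    ).fetchone()
    if seen:
        return False

    # Assumed as-of rule: the file's modification time, in UTC.
    mtime = os.path.getmtime(path)
    as_of = datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat()

    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so a bad row late in the file undoes the rows inserted before it.
    with conn, open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            conn.execute(
                "INSERT INTO data (name, value, as_of) VALUES (?, ?, ?)",
                (row["name"], float(row["value"]), as_of),
            )
        conn.execute(
            "INSERT INTO catalog (digest, path) VALUES (?, ?)", (digest, path)
        )
    return True
```

In this sketch, calling `scrape_file` twice on the same file loads it only once, and a file with an unparseable `value` raises `ValueError` and leaves both tables unchanged. Recording the catalog entry inside the same transaction as the data rows is what keeps the two consistent.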
- Create a virtual environment:
    python3 -m venv .venv
    source .venv/bin/activate
- Install development dependencies:
    poetry install --with dev
- Run tests:
    poetry run pytest
- Make docs. The documentation will be generated in the docs/_build/html directory:
    cd docs && poetry run make html
This project follows these code style guidelines:
- Use type hints for all function parameters and return values
- Follow PEP 8 style guide
- Use Google-style docstrings
- Keep functions focused and single-purpose
- Write comprehensive tests for new features
Run make check to check the code style.
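As a rough illustration of the style guidelines, the sketch below shows a small single-purpose function with full type hints and a Google-style docstring. The function `parse_as_of` is hypothetical and is not part of this project's code.

```python
from datetime import datetime, timezone


def parse_as_of(stamp: str, fmt: str = "%Y%m%d") -> datetime:
    """Parse a date stamp into a timezone-aware UTC datetime.

    Args:
        stamp: Date text, for example ``"20240131"``.
        fmt: ``datetime.strptime`` format string describing ``stamp``.

    Returns:
        The parsed datetime with its time zone set to UTC.

    Raises:
        ValueError: If ``stamp`` does not match ``fmt``.
    """
    # strptime yields a naive datetime; attach UTC explicitly.
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
```

The `Args`, `Returns`, and `Raises` sections are the standard Google-style docstring headings, and the type hints cover every parameter and the return value.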
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the terms specified in the LICENSE file.