Closed
Description
pyarrow is an optional dependency of google-cloud-bigquery, but it's made mandatory by python-bigquery-sqlalchemy.
As pyarrow is quite large on disk (100 MB on x86_64 Linux), I don't want to install it when it's unused in my application. (AFAICT I don't need `google-cloud-bigquery-storage` either, but that's not huge.)
I suggest:
- Remove the direct dependencies on `pyarrow` and `google-cloud-bigquery-storage`.
- Add a `bqstorage` extra that depends on `google-cloud-bigquery[bqstorage]`. That'll respect upstream's version bounds without introducing local bounds that could cause conflicts for users.
- Document that users wanting improved performance with large result sets should install `bigquery-sqlalchemy[bqstorage]`.
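
To make the proposal concrete, here is a minimal sketch of how the dependency metadata might look after the change. It is not taken from the project's actual `setup.py`: the variable layout and the unpinned requirement strings are assumptions for illustration, and real version bounds are deliberately left out.

```python
# Hypothetical excerpt of a setup.py after the proposed change.
# The requirement strings are illustrative; the project's real version
# bounds are not reproduced here.

# Core requirements: pyarrow and google-cloud-bigquery-storage are no
# longer listed directly.
install_requires = [
    "google-cloud-bigquery",
    "sqlalchemy",
]

# The optional extra delegates to upstream's own "bqstorage" extra, so any
# version bounds for pyarrow and google-cloud-bigquery-storage come from
# google-cloud-bigquery rather than being duplicated locally.
extras_require = {
    "bqstorage": ["google-cloud-bigquery[bqstorage]"],
}

# These two values would then be passed to setuptools.setup(
#     install_requires=install_requires,
#     extras_require=extras_require,
#     ...
# )
```

With metadata along these lines, a plain install would skip pyarrow, while users who want the faster path for large result sets would opt in by requesting the `bqstorage` extra at install time.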
There's an existing PR at #470 but it looks like it has stalled out, so I'm filing this issue to provide a blueprint for someone who's able to do this work.