pyarrow dependency should be optional #743

Closed
@twm

Description

pyarrow is an optional dependency of google-cloud-bigquery, but it's made mandatory by python-bigquery-sqlalchemy.

pyarrow is quite large on disk (100 MB on x86_64 Linux), so I don't want to install it when my application doesn't use it. As far as I can tell I don't need google-cloud-bigquery-storage either, but that one isn't huge.

I suggest:

  1. Remove the direct dependencies on pyarrow and google-cloud-bigquery-storage.
  2. Add a bqstorage extra that depends on google-cloud-bigquery[bqstorage]. That'll respect upstream's version bounds without introducing local bounds that could cause conflicts for users.
  3. Document that users wanting improved performance with large result sets should install bigquery-sqlalchemy[bqstorage].
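
To make the proposal concrete, here is a rough sketch of how the dependency lists might look after steps 1 and 2. It is not the project's actual setup.py: the list contents are illustrative, version bounds are deliberately left out, and `resolve_requirements` is a hypothetical helper that only shows what an installer would end up with.

```python
# Hypothetical excerpt; in a real setup.py these would be passed to
# setuptools.setup(install_requires=..., extras_require=...).

# Step 1: the core requirements no longer name pyarrow or
# google-cloud-bigquery-storage directly.
install_requires = [
    "google-cloud-bigquery",
    "sqlalchemy",
]

# Step 2: the optional extra defers to upstream's own extra, so upstream's
# version bounds for pyarrow and the storage client apply unchanged.
extras_require = {
    "bqstorage": ["google-cloud-bigquery[bqstorage]"],
}


def resolve_requirements(extras=()):
    """Illustrative only: requirements selected for the given extras."""
    selected = list(install_requires)
    for name in extras:
        selected.extend(extras_require[name])
    return selected


# Plain install: pyarrow is not pulled in by this package.
print(resolve_requirements())
# Install with the extra (step 3): upstream's bqstorage extra is added.
print(resolve_requirements(["bqstorage"]))
```

With this layout, a user who wants the faster path for large result sets would opt in with something like `pip install 'bigquery-sqlalchemy[bqstorage]'`, using the package name as written in this issue (the published distribution name may differ).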

There's an existing PR at #470 but it looks like it has stalled out, so I'm filing this issue to provide a blueprint for someone who's able to do this work.

Metadata

Assignees

No one assigned

    Labels

    api: bigquery (Issues related to the googleapis/python-bigquery-sqlalchemy API.)
    priority: p2 (Moderately-important priority. Fix may not be included in next release.)
    type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
