Skip to content

feat: Allow jobs to be run in a different project #1180

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 23, 2025

Conversation

withnale
Copy link
Contributor

@withnale withnale commented Mar 21, 2025

EDIT: For clarity modified PR definition to reference billing_project_id instead of job_project_id based on comments below.


Fixes #1181 🦕

This implements the logic to allow jobs to be run in a different project and allows for an explicit separation of jobs and data.

If the billing_project_id engine parameter is specified:

  • the bigquery client will be created with the default project_id=billing_project_id
  • any calls for datasets and tables now explicitly reference project_id which is the data project, specified in the bigquery:// URL.
  • any SQL for the data project must be fully qualified to include the project_id.dataset.table instead of just dataset.table

If billing_project_id is not specified it will just default to project_id from the bigquery:// URL.


Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Copy link

google-cla bot commented Mar 21, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@product-auto-label product-auto-label bot added size: s Pull request size is small. api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. labels Mar 21, 2025
@withnale withnale force-pushed the job_project_id branch 2 times, most recently from c0befe0 to 6392440 Compare March 21, 2025 12:10
@withnale withnale marked this pull request as ready for review March 21, 2025 12:54
@withnale withnale requested review from a team as code owners March 21, 2025 12:54
@withnale withnale requested a review from tswast March 21, 2025 12:54
@tswast tswast added the kokoro:run Add this label to force Kokoro to re-run the tests. label Mar 21, 2025
@tswast
Copy link
Collaborator

tswast commented Mar 21, 2025

Could you add some tests for this feature?

@yoshi-kokoro yoshi-kokoro removed the kokoro:run Add this label to force Kokoro to re-run the tests. label Mar 21, 2025
@withnale
Copy link
Contributor Author

No problem. I'll change it to billing_project_id.

As far as tests go, I'll look into that. I didn't want to write the tests if it was unlikely to be merged.

@withnale
Copy link
Contributor Author

I've believe I've fixed the existing unit tests and added one that is billing_project_id aware, but not sure how to wire in any cursor/bqclient based tests, since it would mean making the existing mocks project aware. Any suggestions?

@Linchin Linchin added kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Mar 27, 2025
@yoshi-kokoro yoshi-kokoro removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Mar 27, 2025
@Linchin
Copy link
Contributor

Linchin commented Mar 27, 2025

I've believe I've fixed the existing unit tests and added one that is billing_project_id aware, but not sure how to wire in any cursor/bqclient based tests, since it would mean making the existing mocks project aware. Any suggestions?

We can change the mock dbapi to take project_id but with a default value:

    def list_datasets(self, project="myproject"):
        return [
            google.cloud.bigquery.Dataset(f"{project}.mydataset"),
            google.cloud.bigquery.Dataset(f"{project}.yourdataset"),
        ]

@Linchin
Copy link
Contributor

Linchin commented Mar 27, 2025

I think this change is breaking the system tests, could you fix that too? Also it would be nice if we could add a system test for this change. You can query bigquery public dataset so it's in a different project.

@alvarowolfx alvarowolfx assigned Linchin and unassigned alvarowolfx Mar 31, 2025
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: s Pull request size is small. labels Apr 6, 2025
@withnale
Copy link
Contributor Author

withnale commented Apr 7, 2025

Ok. I've spent quite a bit of time looking at the system tests for the past few days and was struggling to get the tests working as I wanted them to, covering all of the different combinations. I've created a test_sqlalchemy_bigquery_remote.py file to separate these out.

In short, I wanted to be confident that the library would function correctly with all of the URL/table scenarios listed below.

REMOTE_TESTCASES = [
        ("bigquery://bigquery-public-data", "bigquery-public-data.fda_drug.drug_enforcement"),
        ("bigquery://bigquery-public-data", "fda_drug.drug_enforcement"),
        ("bigquery://bigquery-public-data/fda_drug", "bigquery-public-data.fda_drug.drug_enforcement"),
        ("bigquery://bigquery-public-data/fda_drug", "drug_enforcement"),
        ("bigquery://bigquery-public-data/fda_drug", "fda_drug.drug_enforcement"),
    ]

I was especially trying to get the second entry to work. As part of this I dug deeper into this library and the native bigquery library and found the dataset_project_id connection option, which did exactly what was required. With the combination of default dataset and default project logic it makes the change much cleaner.

There is now a matrix of project settings, since depending what is specified it will use the project from the URL, the billing_project_id field or the default from authentication. I'm not sure if there are any tests right now that cover all three being different. If you feel these are necessary can you outline how they might look?

I believe all of the pre-existing unit tests and system tests should now be working.

On a related note... I'm not that familiar with bigquery-public-data datasets and especially how often they change. The existing query I am using in the above tests is just asserting against the values in a specific row of fda_drug.drug_enforcement - do you have an suggestions about how to make this robust/simple?

@Linchin Linchin added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 21, 2025
@Linchin Linchin changed the title Allow jobs to be run in a different project feat: Allow jobs to be run in a different project Apr 21, 2025
@yoshi-kokoro yoshi-kokoro removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Apr 21, 2025
@Linchin Linchin added kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Apr 22, 2025
@yoshi-kokoro yoshi-kokoro removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Apr 22, 2025
@Linchin Linchin added kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Apr 22, 2025
@yoshi-kokoro yoshi-kokoro removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Apr 22, 2025
@Linchin Linchin added kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Apr 22, 2025
@yoshi-kokoro yoshi-kokoro removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Apr 22, 2025
@Linchin Linchin added the owlbot:run Add this label to trigger the Owlbot post processor. label Apr 23, 2025
@Linchin Linchin self-requested a review April 23, 2025 17:07
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Apr 23, 2025
@Linchin Linchin enabled auto-merge (squash) April 23, 2025 17:07
@Linchin Linchin merged commit eea4994 into googleapis:main Apr 23, 2025
12 checks passed
@withnale
Copy link
Contributor Author

Hello. Do you know when you're likely to cut a new tagged release?

@Linchin
Copy link
Contributor

Linchin commented Apr 28, 2025

Thanks for asking! I will make a release now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow jobs to be run in a different project
5 participants