Adding Iceberg REST Catalog Examples for Dataflow Documentation #10149
Summary of Changes
Hello @tarun-google, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces new Dataflow documentation examples that illustrate how to interact with Apache Iceberg tables using a REST catalog, specifically leveraging Google Cloud Storage (GCS) for data storage. The changes include examples for both streaming data writes and Change Data Capture (CDC) reads, alongside necessary dependency updates and robust integration tests to validate the new functionalities.
Highlights
- Enhanced Apache Beam Integration: The Apache Beam SDK has been upgraded to version 2.67.0, which enables access to newer features beneficial for table management, such as the 'create database.table if not exists' functionality.
- GCS-backed Iceberg Examples: The iceberg-gcp dependency has been added, and two new Java examples (ApacheIcebergRestCatalogStreamingWrite.java and ApacheIcebergCDCRead.java) have been introduced. These examples demonstrate streaming data operations with Iceberg tables that are stored in Google Cloud Storage (GCS) buckets and accessed via a REST catalog.
- Comprehensive Integration Tests: A new integration test, testApacheIcebergRestCatalog, has been added to ApacheIcebergIT.java. This test verifies the end-to-end functionality of the new streaming write and CDC read examples, ensuring proper data persistence and metadata creation within GCS.
Code Review
This pull request adds new examples for using Apache Iceberg with a REST catalog, specifically for CDC reads and streaming writes. The changes include updating the Beam SDK version, adding the iceberg-gcp dependency, and providing two new example pipelines with an integration test. My review focuses on improving the robustness of the new test and enhancing the documentation in the new example code. I've identified a critical issue in the integration test where a variable is initialized with a null value, which would cause the test to fail. I've also suggested adding a comment to one of the new examples to warn users about OAuth token expiration, which is a crucial piece of information for running these streaming pipelines for extended periods.
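To make the token expiration concern concrete, here is a minimal sketch, not taken from the pull request, that checks whether a token fetched once at launch would lapse before a pipeline is expected to finish. The class and method names are hypothetical, and the one-hour lifetime used in main is an assumption about a typical OAuth access token rather than a value stated in this review.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper illustrating why a static OAuth token is risky for long-running jobs. */
public class TokenLifetimeCheck {

  /**
   * Returns true if a token that expires at tokenExpiry would lapse before a
   * pipeline started at start and running for plannedRun has finished.
   */
  public static boolean expiresDuringRun(Instant start, Duration plannedRun, Instant tokenExpiry) {
    return tokenExpiry.isBefore(start.plus(plannedRun));
  }

  public static void main(String[] args) {
    Instant start = Instant.now();
    // Assumption: the access token is valid for about one hour from launch.
    Instant tokenExpiry = start.plus(Duration.ofHours(1));

    // A streaming job expected to run for a day would outlive such a token.
    boolean stale = expiresDuringRun(start, Duration.ofDays(1), tokenExpiry);
    System.out.println("Token expires during run: " + stale);
  }
}
```

A check like this only flags the problem. A long-running streaming pipeline would still need some way to refresh credentials rather than relying on a token captured at launch.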
dataflow/snippets/src/test/java/com/example/dataflow/ApacheIcebergIT.java
dataflow/snippets/src/main/java/com/example/dataflow/ApacheIcebergCDCRead.java
Here is the summary of changes. You are about to add 2 region tags.
This comment is generated by snippet-bot.
I see there is a failure in com.example.dataflow.KafkaReadIT, which is not related to these changes.
Adding reviewers from the Dataflow team: @chamikaramj @ahmedabu98 @VeronicaWasson
Description
Fixes # b/427973623
I have already added complex RESTCatalog examples to the apache/beam cookbook. These are simpler versions that we can use to drive the Dataflow documentation.
Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.
Checklist
- pom.xml parent set to latest shared-configuration
- mvn clean verify: required
- mvn -P lint checkstyle:check: required
- mvn -P lint clean compile pmd:cpd-check spotbugs:check: advisory only