
Commit b8fcb45

Add BigQueryKmsKey Dataflow sample (GoogleCloudPlatform#1559)
* Add BigQueryKmsKey Dataflow sample
* Updated READMEs to use the "Getting started guide"
1 parent 3e8ab8a commit b8fcb45

File tree

7 files changed: +619, -108 lines


dataflow/README.md

Lines changed: 115 additions & 0 deletions
# Getting started with Google Cloud Dataflow

[![Open in Cloud Shell](http://gstatic.com/cloudssh/images/open-btn.svg)](https://console.cloud.google.com/cloudshell/editor)

[Apache Beam](https://beam.apache.org/)
is an open source, unified model for defining both batch and streaming data-parallel processing pipelines.
This guide walks you through all the steps needed to run an Apache Beam pipeline on the
[Google Cloud Dataflow](https://cloud.google.com/dataflow) runner.
## Setting up your Google Cloud project

The following instructions help you prepare your Google Cloud project.

1. Install the [Cloud SDK](https://cloud.google.com/sdk/docs/).
   > *Note:* This is not required in
   > [Cloud Shell](https://console.cloud.google.com/cloudshell/editor)
   > since it already has the Cloud SDK pre-installed.
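
   A quick, optional way to confirm the installation is to print the Cloud SDK version; any reasonably recent release works for this guide.

   ```sh
   # Verify that the gcloud command line tool is installed and on your PATH.
   gcloud --version
   ```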
1. Create a new Google Cloud project via the
   [*New Project* page](https://console.cloud.google.com/projectcreate),
   or via the `gcloud` command line tool.

   ```sh
   export PROJECT=your-google-cloud-project-id
   gcloud projects create $PROJECT
   ```

1. Set up the Cloud SDK for your GCP project.

   ```sh
   gcloud init
   ```
1. [Enable billing](https://cloud.google.com/billing/docs/how-to/modify-project).
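
   If you prefer the command line, the sketch below links the project to an existing billing account; the billing account ID is a placeholder you must replace with your own, and on older Cloud SDK releases these commands may live under `gcloud beta billing`.

   ```sh
   # List the billing accounts you have access to.
   gcloud billing accounts list

   # Link the project to one of them (replace the placeholder ID).
   gcloud billing projects link $PROJECT --billing-account 000000-AAAAAA-BBBBBB
   ```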
1. [Enable the APIs](https://console.cloud.google.com/flows/enableapi?apiid=dataflow,compute_component,storage_component,storage_api,logging,cloudresourcemanager.googleapis.com,iam.googleapis.com):
   Dataflow, Compute Engine, Cloud Storage, Cloud Storage JSON,
   Stackdriver Logging, Cloud Resource Manager, and IAM API.
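
   Equivalently, you can enable the same services with `gcloud`; the identifiers below are the standard service names for these APIs, listed here as a convenience sketch.

   ```sh
   # Enable the required services on the current project.
   gcloud services enable \
     dataflow.googleapis.com \
     compute.googleapis.com \
     storage-component.googleapis.com \
     storage-api.googleapis.com \
     logging.googleapis.com \
     cloudresourcemanager.googleapis.com \
     iam.googleapis.com
   ```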
1. Create a service account JSON key via the
   [*Create service account key* page](https://console.cloud.google.com/apis/credentials/serviceaccountkey),
   or via the `gcloud` command line tool.
   Here is how to do it through the *Create service account key* page.

   * From the **Service account** list, select **New service account**.
   * In the **Service account name** field, enter a name.
   * From the **Role** list, select **Project > Owner** **(*)**.
   * Click **Create**. A JSON file that contains your key downloads to your computer.

   Alternatively, you can use `gcloud` through the command line.

   ```sh
   export PROJECT=$(gcloud config get-value project)
   export SA_NAME=samples
   export IAM_ACCOUNT=$SA_NAME@$PROJECT.iam.gserviceaccount.com

   # Create the service account.
   gcloud iam service-accounts create $SA_NAME --display-name $SA_NAME

   # Set the role to Project Owner (*).
   gcloud projects add-iam-policy-binding $PROJECT \
     --member serviceAccount:$IAM_ACCOUNT \
     --role roles/owner

   # Create a JSON file with the service account credentials.
   gcloud iam service-accounts keys create path/to/your/credentials.json \
     --iam-account=$IAM_ACCOUNT
   ```

   > **(*)** *Note:* The **Role** field authorizes your service account to access resources.
   > You can view and change this field later by using the
   > [GCP Console IAM page](https://console.cloud.google.com/iam-admin/iam).
   > If you are developing a production app, specify more granular permissions than **Project > Owner**.
   > For more information, see
   > [Granting roles to service accounts](https://cloud.google.com/iam/docs/granting-roles-to-service-accounts).

   For more information, see
   [Creating and managing service accounts](https://cloud.google.com/iam/docs/creating-managing-service-accounts).
1. Set your `GOOGLE_APPLICATION_CREDENTIALS` environment variable to point to your service account key file.

   ```sh
   export GOOGLE_APPLICATION_CREDENTIALS=path/to/your/credentials.json
   ```
## Setting up a Java development environment

The following instructions help you prepare your development environment.

1. Download and install the
   [Java Development Kit](https://adoptopenjdk.net/?variant=openjdk11&jvmVariant=openj9).
   Verify that the
   [JAVA_HOME](https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/envvars001.html)
   environment variable is set and points to your JDK installation.

   ```sh
   $JAVA_HOME/bin/java --version
   ```

1. Download and install
   [Apache Maven](http://maven.apache.org/download.cgi)
   by following the
   [Maven installation guide](http://maven.apache.org/install.html)
   for your specific operating system.

   ```sh
   mvn --version
   ```

1. *[optional]* Set up an IDE like
   [IntelliJ](https://www.jetbrains.com/idea/),
   [VS Code](https://code.visualstudio.com),
   [Eclipse](https://www.eclipse.org/ide/),
   [NetBeans](https://netbeans.org),
   etc.

dataflow/encryption-keys/README.md

Lines changed: 179 additions & 0 deletions
# Using customer-managed encryption keys

[![Open in Cloud Shell](http://gstatic.com/cloudssh/images/open-btn.svg)](https://console.cloud.google.com/cloudshell/editor)

This sample demonstrates how to use
[cryptographic encryption keys](https://cloud.google.com/kms/)
for the I/O connectors in an
[Apache Beam](https://beam.apache.org) pipeline.
For more information, see the
[Using customer-managed encryption keys](https://cloud.google.com/dataflow/docs/guides/customer-managed-encryption-keys)
docs page.
## Before you begin

Follow the
[Getting started with Google Cloud Dataflow](../README.md)
page, and make sure you have a Google Cloud project with billing enabled
and a *service account JSON key* set up in your `GOOGLE_APPLICATION_CREDENTIALS` environment variable.
Additionally, for this sample you need the following:
1. [Enable the APIs](https://console.cloud.google.com/flows/enableapi?apiid=bigquery,cloudkms.googleapis.com):
   BigQuery and Cloud KMS API.
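
   You can also enable these two services from the command line; `bigquery.googleapis.com` and `cloudkms.googleapis.com` are the standard service names for BigQuery and Cloud KMS.

   ```sh
   # Enable the BigQuery and Cloud KMS APIs on the current project.
   gcloud services enable bigquery.googleapis.com cloudkms.googleapis.com
   ```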
1. Create a Cloud Storage bucket.

   ```sh
   export BUCKET=your-gcs-bucket
   gsutil mb gs://$BUCKET
   ```
1. [Create a symmetric key ring](https://cloud.google.com/kms/docs/creating-keys).
   For best results, use a [regional location](https://cloud.google.com/kms/docs/locations).
   This example uses a `global` key for simplicity.

   ```sh
   export KMS_KEYRING=samples-keyring
   export KMS_KEY=samples-key

   # Create a key ring.
   gcloud kms keyrings create $KMS_KEYRING --location global

   # Create a key.
   gcloud kms keys create $KMS_KEY --location global \
     --keyring $KMS_KEYRING --purpose encryption
   ```

   > *Note:* Although you can destroy the
   > [*key version material*](https://cloud.google.com/kms/docs/destroy-restore),
   > you [cannot delete keys and key rings](https://cloud.google.com/kms/docs/object-hierarchy#lifetime).
   > Key rings and keys do not have billable costs or quota limitations,
   > so their continued existence does not impact costs or production limits.
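
   To double-check that the key was created, you can list the keys in the key ring; this is the same command the `KMS_KEY_ID` export further below relies on.

   ```sh
   # List the keys in the key ring to confirm the new key exists.
   gcloud kms keys list --location global --keyring $KMS_KEYRING
   ```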
1. Grant Encrypter/Decrypter permissions to the *Dataflow*, *Compute Engine*, and *BigQuery* service accounts.

   ```sh
   export PROJECT=$(gcloud config get-value project)
   export PROJECT_NUMBER=$(gcloud projects list --filter $PROJECT --format "value(PROJECT_NUMBER)")

   # Grant Encrypter/Decrypter permissions to the Dataflow service account.
   gcloud projects add-iam-policy-binding $PROJECT \
     --member serviceAccount:service-$PROJECT_NUMBER@dataflow-service-producer-prod.iam.gserviceaccount.com \
     --role roles/cloudkms.cryptoKeyEncrypterDecrypter

   # Grant Encrypter/Decrypter permissions to the Compute Engine service account.
   gcloud projects add-iam-policy-binding $PROJECT \
     --member serviceAccount:service-$PROJECT_NUMBER@compute-system.iam.gserviceaccount.com \
     --role roles/cloudkms.cryptoKeyEncrypterDecrypter

   # Grant Encrypter/Decrypter permissions to the BigQuery service account.
   gcloud projects add-iam-policy-binding $PROJECT \
     --member serviceAccount:bq-$PROJECT_NUMBER@bigquery-encryption.iam.gserviceaccount.com \
     --role roles/cloudkms.cryptoKeyEncrypterDecrypter
   ```
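
   To verify the bindings afterwards, one option (a sketch; the `--flatten`/`--filter` flags follow the usual `gcloud` pattern for listing the members of a role) is:

   ```sh
   # List the members that hold the Encrypter/Decrypter role on the project.
   gcloud projects get-iam-policy $PROJECT \
     --flatten="bindings[].members" \
     --filter="bindings.role:roles/cloudkms.cryptoKeyEncrypterDecrypter" \
     --format="value(bindings.members)"
   ```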
1. Clone the `java-docs-samples` repository.

   ```sh
   git clone https://github.com/GoogleCloudPlatform/java-docs-samples.git
   ```

1. Navigate to the sample code directory.

   ```sh
   cd java-docs-samples/dataflow/encryption-keys
   ```
## BigQueryKmsKey example

* [BigQueryKmsKey.java](src/main/java/com/example/dataflow/cmek/BigQueryKmsKey.java)
* [pom.xml](pom.xml)

The following sample gets some data from the
[NASA wildfires public BigQuery dataset](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=nasa_wildfire&t=past_week&page=table)
using a customer-managed encryption key, and dumps that data into the specified `outputBigQueryTable`
using the same customer-managed encryption key.
Make sure you have the following variables set up:

```sh
# Set the project ID and GCS bucket.
export PROJECT=$(gcloud config get-value project)
export BUCKET=your-gcs-bucket

# Set the KMS key ID.
export KMS_KEYRING=samples-keyring
export KMS_KEY=samples-key
export KMS_KEY_ID=$(gcloud kms keys list --location global --keyring $KMS_KEYRING --filter $KMS_KEY --format "value(NAME)")

# Output BigQuery dataset and table name.
export DATASET=samples
export TABLE=dataflow_kms
```

Create the BigQuery dataset where the output table resides.

```sh
# Create the BigQuery dataset.
bq mk --dataset $PROJECT:$DATASET
```
To run the sample using the Cloud Dataflow runner:

```sh
mvn compile exec:java \
  -Dexec.mainClass=com.example.dataflow.cmek.BigQueryKmsKey \
  -Dexec.args="\
    --outputBigQueryTable=$PROJECT:$DATASET.$TABLE \
    --kmsKey=$KMS_KEY_ID \
    --project=$PROJECT \
    --tempLocation=gs://$BUCKET/samples/dataflow/kms/tmp \
    --runner=DataflowRunner"
```

> *Note:* To run locally, you can omit the `--runner` command line argument and it defaults to the `DirectRunner`.
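
For example, a local run on the `DirectRunner` is the same command with the `--runner` flag dropped (shown here as a convenience sketch; it still writes to the same BigQuery output table):

```sh
mvn compile exec:java \
  -Dexec.mainClass=com.example.dataflow.cmek.BigQueryKmsKey \
  -Dexec.args="\
    --outputBigQueryTable=$PROJECT:$DATASET.$TABLE \
    --kmsKey=$KMS_KEY_ID \
    --project=$PROJECT \
    --tempLocation=gs://$BUCKET/samples/dataflow/kms/tmp"
```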
You can check your submitted Cloud Dataflow jobs in the [GCP Console Dataflow page](https://console.cloud.google.com/dataflow) or by using `gcloud`.

```sh
gcloud dataflow jobs list
```

Finally, check the contents of the BigQuery table.

```sh
bq query --use_legacy_sql=false "SELECT * FROM \`$PROJECT.$DATASET.$TABLE\`"
```
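
To also confirm that the output table is protected by your key, you can inspect the table metadata (a sketch; for CMEK-protected tables `bq` reports the key under `encryptionConfiguration.kmsKeyName`):

```sh
# Show the table metadata, including its encryption configuration.
bq show --format=prettyjson $PROJECT:$DATASET.$TABLE
```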
## Cleanup

To avoid incurring charges to your GCP account for the resources used:

```sh
# Remove only the files created by this sample.
gsutil -m rm -rf "gs://$BUCKET/samples/dataflow/kms"

# [optional] Remove the Cloud Storage bucket.
gsutil rb gs://$BUCKET

# Remove the BigQuery table.
bq rm -f -t $PROJECT:$DATASET.$TABLE

# [optional] Remove the BigQuery dataset and all its tables.
bq rm -rf -d $PROJECT:$DATASET

# Revoke Encrypter/Decrypter permissions from the Dataflow service account.
gcloud projects remove-iam-policy-binding $PROJECT \
  --member serviceAccount:service-$PROJECT_NUMBER@dataflow-service-producer-prod.iam.gserviceaccount.com \
  --role roles/cloudkms.cryptoKeyEncrypterDecrypter

# Revoke Encrypter/Decrypter permissions from the Compute Engine service account.
gcloud projects remove-iam-policy-binding $PROJECT \
  --member serviceAccount:service-$PROJECT_NUMBER@compute-system.iam.gserviceaccount.com \
  --role roles/cloudkms.cryptoKeyEncrypterDecrypter

# Revoke Encrypter/Decrypter permissions from the BigQuery service account.
gcloud projects remove-iam-policy-binding $PROJECT \
  --member serviceAccount:bq-$PROJECT_NUMBER@bigquery-encryption.iam.gserviceaccount.com \
  --role roles/cloudkms.cryptoKeyEncrypterDecrypter
```
