Skip to content

DLP => v2 #1056

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Mar 20, 2018
Merged

DLP => v2 #1056

merged 25 commits into from
Mar 20, 2018

Conversation

jabubake
Copy link
Contributor

@jabubake jabubake commented Mar 9, 2018

(WIP)
Pending tasks:

@jabubake jabubake added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Mar 9, 2018
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Mar 9, 2018
jabubake and others added 2 commits March 13, 2018 10:03
(WIP)
Pending tasks:
-> Update / Add Tests
-> Region tag / comment review
-> Submit for code review + fixes
-> Merge once google-cloud-java PR :
googleapis/google-cloud-java#2958 is
released
@jabubake jabubake requested a review from ace-n March 15, 2018 22:52
/**
* [START dlp_deidentify_mask]
*
* <p>Deidentify a string by masking sensitive information with a character using the DLP API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: is the <p> intentional?

If not, please apply any required changes throughout this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

.setValue(string)
ByteContentItem byteContentItem =
ByteContentItem.newBuilder()
.setType(ByteContentItem.BytesType.TEXT_UTF8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: is this necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// (Optional) The encrypted ('wrapped') AES-256 key to use when shifting dates
// This key should be encrypted using the Cloud KMS key specified above
// If this is specified, then 'keyName' and 'contextFieldId' must also be set
// const wrappedKey = 'YOUR_ENCRYPTED_AES_256_KEY'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/const/String/ ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// If this is specified, then 'keyName' and 'contextFieldId' must also be set
// const wrappedKey = 'YOUR_ENCRYPTED_AES_256_KEY'

// If contextFieldId , keyName or wrappedKey is set : all three arguments must be valid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit - remove spaces before , and : characters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


while ((line = br.readLine()) != null) {
// convert csv rows to Table.Row
rows.add(convertCsvRowToTableRow(line));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should include this in our snippet - IMO it's not intuitive enough to leave the exercise to the reader.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind - it is included in the snippet. :)

assertTrue(Pattern.compile(
"Most common value occurs \\d time\\(s\\)").matcher(output).find());

assertThat(output, containsString("Most common value occurs"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should we check for the full string here? (e.g. to ensure that a digit is printed)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

private ByteArrayOutputStream bout;
private PrintStream out;

private String topicId = "dlp-tests";
private String subscriptionId = "dlp-test";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: should these be based on environment variables?

(I don't know how the Java samples are tested - so perhaps I'm wrong about this. If you do change this, apply those changes throughout this PR.)

public void testCreateInspectTemplate() throws Exception {
Templates.main(new String[] {
"-c",
"-displayName", String.format("test-name-%s", new Date()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use some sort of UUID, instead of the current time (which may not be globally unique)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

@Test
public void testCreateInspectTemplate() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question/thought: would it be better to combine the {Create, List, Delete}Template tests into one? That way the test for DeleteTemplate (which depends on templates having been created) will only run if CreateTemplate passes

assertFalse(output.contains(text));
assertTrue(Pattern.compile("My SSN is \\w+").matcher(output).find());
assertThat(
output, containsString("Successfully saved date-shift output to: results.temp.csv"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also check the contents of results.temp.csv against a file containing the desired results?

@ace-n
Copy link
Contributor

ace-n commented Mar 19, 2018

@jabubake you were right - that was a lot of Java code 😄 Nice work 👍

(Mostly nits, but there are some bigger things we may want to consider changing)

@ace-n ace-n requested a review from lesv March 20, 2018 01:12
ace-n
ace-n previously approved these changes Mar 20, 2018
@ace-n
Copy link
Contributor

ace-n commented Mar 20, 2018

LGTM - Les, would you mind taking a look (in case Jisha is busy)?

Copy link
Contributor

@lesv lesv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL - So, in general, I'd prefer smaller snippets, but it's probably ok here.

For testing, at least in Java, it's often better to snippet inside the method, and let the method return a result you check, rather that always looking at the printf results.

Ok to merge after you complete my comments, but please mention me so that I look at the last few commits.


/**
* [START dlp_inspect_gcs]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be before /**

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
} else {
System.out.println("No findings.");
}
} catch (Exception e) {
e.printStackTrace();
System.out.println("Error in inspectGCSFileAsync: " + e.getMessage());
}
// [END dlp_inspect_gcs]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be one line down ?? after final } since we aren't hiding a return to the test app.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


/**
* [START dlp_inspect_bigquery]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b4 /**

Copy link
Contributor

@ace-n ace-n Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
return tableRowBuilder.build();
}
// [END dlp_deidentify_date_shift]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow - one long snippet. Should it be shorter?

.build();
subscriber.startAsync();

// wait for job completion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you show them this without commenting on what you might do in a production system?

Copy link
Contributor

@ace-n ace-n Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with the following:

// Wait for job completion semi-synchronously
// For long jobs, consider using a truly asynchronous execution model such as Cloud Functions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


/**
* [START dlp_l_diversity]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be outside the /**

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// wait for job completion
try{
done.get(1, TimeUnit.MINUTES);
Thread.sleep(500);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -384,10 +557,9 @@ private static void calculateLDiversity(
// [END dlp_l_diversity]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move outside final }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

public class Templates {

/**
* [START dlp_create_inspect_template]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move outside /**

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Ann,01/01/1970,4532908762519852,07/21/1996
James,03/06/1988,4301261899725540,04/09/2001
Dan,08/14/1945,4620761856015295,11/15/2011
Laura,11/03/1992,4564981067258901,01/04/2017
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should there be a newline here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ace-n
Copy link
Contributor

ace-n commented Mar 20, 2018

@simonz130 expressed similar feedback about .NET - perhaps (when we're not concerned with launches) we can chat about whether/how we want to redesign our tests.

// For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
try{
done.get(1, TimeUnit.MINUTES);
Thread.sleep(500); // Wait for the job to become available
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid Thread.sleep in the sample. the done blocks anyway.

// For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
try{
done.get(1, TimeUnit.MINUTES);
Thread.sleep(500); // Wait for the job to become available
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not comfortable with this, since this sleep amount seems aribtrary

@ace-n
Copy link
Contributor

ace-n commented Mar 20, 2018

@lesv FYI Jisha is wrapping this up - feel free to take a look in the future.

@jabubake jabubake removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Mar 20, 2018
@jabubake
Copy link
Contributor Author

Tests pass now. @lesv do take a final pass and merge if you approve.
FYI, @ace-n

@lesv
Copy link
Contributor

lesv commented Mar 20, 2018

Well, other than there being a merge conflict that I resolved... (let's see that tests work.)

@lesv
Copy link
Contributor

lesv commented Mar 20, 2018

@dpebot merge when green

@dpebot
Copy link
Contributor

dpebot commented Mar 20, 2018

Okay! I'll merge when all statuses are green and all reviewers approve.

@dpebot dpebot self-assigned this Mar 20, 2018
Copy link
Contributor

@ace-n ace-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTMing to unblock @dpebot

@lesv lesv merged commit d332e2e into master Mar 20, 2018
@lesv lesv deleted the dlp-v2 branch March 20, 2018 20:25
Sita04 pushed a commit that referenced this pull request Feb 7, 2023
* DLP => v2
(WIP)
Pending tasks:
-> Update / Add Tests
-> Region tag / comment review
-> Submit for code review + fixes
-> Merge once google-cloud-java PR :
googleapis/google-cloud-java#2958 is
released

* Update to most recent versioning.

* Updated DeIdentification samples and tests.

* Revert pubsub to public version.

* Fix Inspect samples/tests (minus pubsub).

* Updated Jobs and add tests.

* Updated Metadata classes.

* Updated QuickStart tests and samples.

* Updated Redact samples and tests.

* Updated RiskAnalysis.

* Update Template samples.

* Update trigger tests.

* Make Checkstyle Happy Again.

* Fix (and ignore) tests using pubsub.

* Update PR tests to complete all tests before returning results.  (#1065)

* Return results of all tests.

* Use for loop instead of while.

* WIP: Address PR feedback, part 1

* Update deps

* Address PR feedback

* Remove mvn clean verify failure

* Add ReID FPE sample

* Address PR feedback

* Add k-map sample

* checkstyle fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants