-
Notifications
You must be signed in to change notification settings - Fork 6.5k
add Risk samples #1411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add Risk samples #1411
Conversation
dlp/risk.py
Outdated
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
"""Sample app that uses the Data Loss Prevent API to """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is unfinished. Maybe "to perform risk anaylsis"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dlp/risk.py
Outdated
def numerical_risk_analysis(project, table_project_id, dataset_id, table_id, | ||
column_name, topic_id, subscription_id, | ||
timeout=180): | ||
"""Uses the Data Loss Prevention API to computes risk metrics of a column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"to compute risk metrics"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dlp/risk.py
Outdated
print('Value Range: [{}, {}]'.format( | ||
results.min_value.integer_value, | ||
results.max_value.integer_value)) | ||
print('25% quantile value: {}\n' # NOTE: change this to unique |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this #NOTE mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a note to ask you about switching this to be more like the NodeJS samples. The NodeJS samples only print the quantiles with unique values. Here I am summarizing by providing the quartiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion on the output, since I don't really understand the functionality of the endpoint very well. = ) It should be fine as it is.
dlp/risk.py
Outdated
.categorical_stats_result | ||
.value_frequency_histogram_buckets) | ||
# Print bucket stats | ||
for i in range(len(histogram_buckets)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally in python you would do "for bucket in histogram_buckets" instead of indexing them by number. Likewise in the examples below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dlp/risk.py
Outdated
|
||
def k_map_estimate_analysis(project, table_project_id, dataset_id, table_id, | ||
topic_id, subscription_id, quasi_ids, info_types, | ||
region_code='USA', timeout=180): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default for region_code also needs to be added as "default=" to the corresponding command-line flag. Otherwise, the flag's default value of None will override this default when it's actually called.
Alternatively, if region_code can be safely omitted, maybe this should default to None in the first place?
GCLOUD_PROJECT = os.getenv('GCLOUD_PROJECT') | ||
TOPIC_ID = 'dlp-test' | ||
SUBSCRIPTION_ID = 'dlp-test-subscription' | ||
DATASET_ID = 'integration_tests_dlp' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these datasets public? (So, will they work for our test runner as well?)
No description provided.