Skip to content

[DO NOT MERGE] Upgrade pubsub samples. #543

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Conversation

jmdobry
Copy link
Member

@jmdobry jmdobry commented Sep 26, 2016

Summary of changes:

  • Added region tags that match those in Pub/Sub samples for other languages
  • Switched from gcloud to google-cloud-pubsub package (can undo, I imagine you'll do this on master before this PR gets merged)
  • Consolidated samples into topics.py and subscriptions.py to match the Node.js and Ruby PRs
  • Added a get_subscription sample
  • Added a list_subscriptions sample (in addition to the existing list_topic_subscriptions sample
  • Broke the tests

Compare to:

Proposals implemented by the 3 PRs:

  • Show samples as method definitions
    • Principle from Sample Code Usability Study: "Make inputs and outputs explicit, i.e. what should the reader change to adapt to his/her own needs."
    • The inputs really can't be more obvious than a function signature (and still be easily testable, maintainable samples)
  • Use the exact same region tags across language, to enable automated sample tracking.
    • Less complex and less brittle than regular expressions (but yes, they're ugly)
  • Do not show importing within region tags (was already done for the python samples)
    • Leave showing installing/importing to the Client Libraries Pages on cloud.google.com and the Reference Docs for the client libraries. Anyone who opens the actual sample file will see the import at the top of the file.
  • Client library instantiation relies on implicit project ID (let the client library infer the project ID from the environment)
  • Be consistent with code comments across languages
    • Principle from Code Sample Usability Study: "Maintain consistency in style throughout samples"
  • Be consistent with variables names across languages
    • Principle from Code Sample Usability Study: "Name your variables carefully and consistently across samples."
    • Principle from Code Sample Usability Study: "Maintain consistency in style throughout samples"
  • Be consistent with printed output across languages
    • Principle from Code Sample Usability Study: "Maintain consistency in style throughout samples"
  • Be consistent with CLI across languages
    • Principle from Code Sample Usability Study: "Maintain consistency in style throughout samples"

@jmdobry
Copy link
Member Author

jmdobry commented Sep 26, 2016

Not done, don't merge. Also I want to fix the tests but can't figure out how to run them.

@theacodes
Copy link
Contributor

@jmdobry can you summarize the changes?

@theacodes theacodes changed the title Upgrade pubsub samples. [DO NOT MERGE] Upgrade pubsub samples. Sep 26, 2016
@jmdobry
Copy link
Member Author

jmdobry commented Sep 26, 2016

Will do

@theacodes
Copy link
Contributor

Top-level comments:

Consolidated samples into topics.py and subscriptions.py to match the Node.js and Ruby PRs

I did like the separation of iam-related samples in its own file. I organized these to match doc pages.

Use the exact same region tags across language, to enable automated sample tracking.

I'm pretty -1 on this. I've gotten feedback that these confuse users. Principle of least surprise.

Do not show importing or instantiating client library within region tags

I'm -1 on this, the justification "Avoid massive duplication of boilerplate import/instantiate code that gets out of date" isn't relevant when it's literally one statement with zero parameters.

from google.cloud import pubsub


pubsub_client = pubsub.Client()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no bueno:

  • It's a global variable (not a constant)
  • There should not be side-effects for top-level statements.
  • This is not easily testable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

# References an existing topic, e.g. "my-topic"
topic = pubsub_client.topic(topic_name)

project_id = os.environ['GCLOUD_PROJECT'] or 'YOUR_PROJECT_ID'
Copy link
Contributor

@theacodes theacodes Sep 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, lets not do this. pubsub_client.project.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah cool, yeah that works better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL google-cloud-node supports this too, nice!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed



# [START pubsub_get_subscription_metadata]
def get_subscription_metadata(topic_name, subscription_name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_subscription is just a good as get_subscription_metadata and is more succinct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

# Retrieves the IAM policy for the subscription
policy = subscription.get_iam_policy()

print('Policy for subscription: {}'.format(policy.to_api_repr()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_api_repr shouldn't be used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@jmdobry jmdobry force-pushed the pubsub-sample-style branch from e4305ae to 96ec3bc Compare September 28, 2016 00:33
@jmdobry
Copy link
Member Author

jmdobry commented Sep 28, 2016

Consolidated samples into topics.py and subscriptions.py to match the Node.js and Ruby PRs

I did like the separation of iam-related samples in its own file. I organized these to match doc pages.

The docs are being completely rewritten and won't have the same organization, so here I just went for consistency with other languages and the somewhat common "organization by resource type"

Use the exact same region tags across language, to enable automated sample tracking.

I'm pretty -1 on this. I've gotten feedback that these confuse users. Principle of least surprise.

Yeah, still pending on whether we can reliably track samples using regular expressions.

Do not show importing or instantiating client library within region tags

I'm -1 on this, the justification "Avoid massive duplication of boilerplate import/instantiate code that gets out of date" isn't relevant when it's literally one statement with zero parameters.

Fixed

@theacodes
Copy link
Contributor

The docs are being completely rewritten and won't have the same organization, so here I just went for consistency with other languages and the somewhat common "organization by resource type"

That seems.. weird. These samples are getting big - the cognitive load of completely understanding one is now getting high. I'm not a fan of it.

@beccasaurus
Copy link
Contributor

beccasaurus commented Sep 28, 2016

My feedback on the proposed samples

  • Method definitions (I propose that we do NOT include them in snippets)
  • Library imports (I propose that we DO include them in snippets)

Method Definitions

Show samples as method definitions

  • Principle from Sample Code Usability Study: "Make inputs and outputs explicit, i.e. what should the reader change to adapt to his/her own needs."

I propose that code without a wrapping method definition is more clear and makes it easier for the reader to adapt the code to his/her own needs.

As an example, see the code samples for the API reference docs,
eg. Buckets.insert
for the Google Cloud Storage API. These samples satisfy the same goal by providing simple placeholder variables.

// TODO: Change placeholders below to values for parameters to the Insert() method:

// A valid API project identifier.
string project = "";

// TODO Add code to assign values to properties of 'content'.
Data.Bucket content = new Data.Bucket();

BucketsResource.InsertRequest request = storageService.Buckets.Insert(content, project);
Data.Bucket response = request.Execute();

Documenting snippet variables

Method definitions typically include simple parameter names such as project_id, bucket_name, dataset_id. While the variable names are easy to understand, the type of value that should be passed is not always obvious. Is project_id a number? Are any of the parameters objects, or all simple primitive types?

To solve this, each variable MUST include a description. I see 3 options for how to include a description of the variable in the code snippet:

  1. Add parameter comments to the method that document each parameter
  2. Add a comment near the usage of each variable?
  3. Set the parameter to an example value
Method parameters

1. For the first option, each language typically uses its own idiomatic commenting style for documenting parameters. For example:

/**
* Creates a new bucket with the given name.
*
* @param {string} name The name of the new bucket.
* @param {function} cb The callback function.
*/
function createBucket (name, callback) {
  // ...
}

These comments can take up many lines of code, using up valuable vertical space. Because they follow language idioms for method parameter documentation, they can often be more syntactically complex and difficult to read at a glance than simple comments. C# parameter documentation, for example, is written in XML and is overkill for use in snippets.

These are not ideal due to readability and vertical space.

Comments in method code

2. The second option might be to add comments near where each variable is used. For example, when bucket_name is used for the first time we might add a comment above the usage of bucket_name describing to the user what the bucket_name variable should be:

# Name of a new bucket to create, e.g. "my-bucket"
bucket = storage.create_bucket bucket_name

This is helpful, although... what if multiple variables are introduced on the same line?

# extract_uri references a Google Storage file, e.g. "gs://my-bucket/my-file.csv"
# format references a the exported file format, e.g. "csv" or "json"
extract_job = table.extract extract_uri, format

These are not ideal because:

  1. The developer cannot see a list all of the required variables with a description of their values at a glance
  2. This does not provide a deterministic template that works for all samples. As a code sample author, I have to ask myself "where should I add comments?" Using an ideal template, the code sample author should not have to make such decisions.
Placeholder variables

3. The third option might be to add placeholder variables with example values / descriptions for each of the variables required to successfully run the snippet code. This is similar to what is currently used by some of the code snippets shown in API usage documentation 1.

def upload_file bucket_name, file_name, local_file_path
  # bucket_name     = "Name of Google Cloud Storage bucket"
  # local_file_path = "Path to local file to upload"

  bucket = storage.bucket bucket_name

  bucket.create_file local_file_path, file_name
end

The third option might allow users to copy/paste the snippet into their own file and uncomment existing variable definitions to run the code successfully without having to define any variables by themselves. Users will very likely also find it useful to see a list of all required variables at the top of the snippet.

I propose that this is ideal.

  • Developers can copy/paste the snippet and all of the required variable definitions are already present, waiting to be uncommented.
  • Developers can see all variables at a glace, as the first line of every sample shows a list of variables used.
  • Code sample authors do not have to decide where to add comments because this is part of the template. ALWAYS add the variable definitions with descriptions to the top of the snippet.
  • Minimal vertical space is used. At the top of each snippet, 1 line is required to define and describe each variable, followed by a newline.

Why we do not need to include method signatures

For all of the 3 possible options above, none of them benefit from being wrapped in a method definition. Method definitions take up 2 extra lines of vertical space and they result in the first line of code may be something like public void CreateBucket(string bucketName). What benefit does this line provide to developers?

Import Client Library in snippets

Do not show importing within region tags

  • Leave showing installing/importing to the Client Libraries Pages on cloud.google.com and the Reference Docs for the client libraries. Anyone who opens the actual sample file will see the import at the top of the file.

We have found that many users copy/paste our snippets into their own code files.

When a developer copy/pastes a snippet following these proposals, they will discover that the snippet DOES NOT FUNCTION independently for 2 reasons:

  1. The required imports are missing
  2. The instantiation of a client library is missing

As a developer, I may search Google looking for how to use a product such as Google Cloud Storage looking for a way to upload a file. When I scan through the documentation and find the documentation for uploading a file, I am likely to read the snippet code and attempt to upload a file using that code. I may not take the time to read the text surrounding the code snippet.

I propose that this is ideal.

  • Developers can solve the task at hand using one snippet without having to scan and search the documentation, looking for where a magic $storage_client variable is defined or figuring out what dependency to require
  • Developers don't have to search the documentation or browse through additional source code to answer these questions.

Example

As an example, this is the template that Ruby code sample snippets use today.

This snippet follows all of the proposals included in this comment and are the result of templatizing code sample snippets based on opinions of those gathering user study data on sample usability.

Translating a string using the Google Translate API

# api_key       = "Your API access key"
# text          = "The text you would like to translate"
# language_code = "The ISO 639-1 code of language to translate to, eg. 'en'"

require "google/cloud"

gcloud    = Google::Cloud.new
translate = gcloud.translate api_key

translation = translate.translate text, to: language_code

puts "Translated '#{text}' to '#{translation.text.inspect}'"
puts "Original language: #{translation.from} translated to: #{translation.to}"

Occassionally, variables are not simple strings but rather complex objects. BigQuery, for example, includes a snippet for inserting row data into a table and the row data is a complex object (List<Map>).

# project_id = "Your Google Cloud project ID"
# dataset_id = "ID of the dataset containing table"
# table_id   = "ID of the table to import data into"
# row_data   = [{ column1: value, column2: value }, ...]

require "google/cloud"

gcloud   = Google::Cloud.new project_id
bigquery = gcloud.bigquery
dataset  = bigquery.dataset dataset_id
table    = dataset.table table_id

response = table.insert row_data

if response.success?
  puts "Inserted rows successfully"
else
  puts "Failed to insert #{response.error_rows.count} rows"
end

@beccasaurus
Copy link
Contributor

@jonparrott

These samples are getting big - the cognitive load of completely understanding one is now getting high. I'm not a fan of it.

+1 I adore small, simple samples.

Whenever I see [START all] I think it's a bad smell. I really dislike seeing whole applications embedded into a document. I usually don't read those embedded samples because they're too long. Anything over ~ 20 or maybe 50 LOC or so, my eyes glaze over

@jmdobry
Copy link
Member Author

jmdobry commented Sep 29, 2016

He meant that the topics.py and subscriptions.py files are getting big, even though the individual snippets within those files are still about as small as they can be without expanding them with more comments and variable declarations.

I propose that code without a wrapping method definition is more clear and makes it easier for the reader to adapt the code to his/her own needs.

The problem is that most languages can't do imports within a method definition. Perhaps this works out okay in Ruby, but in Node.js dynamic imports are an anti-pattern. They can't be statically analyzed, and when Node.js implements ES2015 modules the language itself will enforce that imports be top-level statements only. This definitely won't work with Go, Java, and .NET, unless we want to do one snippet per file, which I don't think we want to do, and even then the samples for those languages would have to show a method definition.

An argument could be made that a method signature can be self-documenting. The majority of our samples deal with primitives like strings and numbers. This example seems pretty clear to me (though maybe not a good measure?):

def create_bucket(bucket_name):
  storage_client = Storage.client()

  # Creates a new bucket, e.g. "my-new-bucket"
  bucket = storage_client.create_bucket(bucket_name)

  print('Bucket {} created'.format(bucket.name))

Granted, this is a really simple example. For more complex samples, like writing a log entry (and your BigQuery row example), it's not enough for the method signature to simply accept a resource parameter because resource could be an arbitrarily complex object (see MonitoredResource), and the sample would be near impossible to understand. The sample would need to show an actual example value for resource, e.g. { type: 'global }.

We could change the Cloud Storage sample to this:

def create_bucket(bucket_name):
  # [START storage_create_bucket]
  storage_client = Storage.client()

  # Creates a new bucket, e.g. "my-new-bucket"
  bucket = storage_client.create_bucket(bucket_name)

  print('Bucket {} created'.format(bucket.name))
  # [END storage_create_bucket]

but now it's not clear where the bucket_name variable comes from. But, we can improve it by adding another comment:

def create_bucket(bucket_name):
  # [START storage_create_bucket]
  storage_client = Storage.client()

  # The name for the new bucket, e.g.
  # bucket_name = "my-new-bucket"

  # Creates a new bucket
  bucket = storage_client.create_bucket(bucket_name)

  print('Bucket {} created'.format(bucket.name))
  # [END storage_create_bucket]

But why not just show the method signature? I would say that decomposition and code re-use via functions is more common than writing apps that import, instantiate, and make a method call all at the top level. To reduce inconsistencies between our samples are how people write apps, my vote goes for showing the method definition. (Also, the user doesn't have to copy the method definition when they copy code from our docs. But if it's not there, they definitely can't copy it.)

This would be my ideal "quickstart" Cloud Storage sample, which would be shown on the Cloud Storage Client Libraries page:

# [START storage_quickstart]
# Imports the Google Cloud client library
from gcloud import storage

# Your Google Cloud Platform project ID
project_id = 'YOUR_PROJECT_ID'

# Instantiates a client
storage_client = storage.Client(project=project_id)

# The name for the new bucket
bucket_name = 'my-new-bucket'

# Creates the new bucket
bucket = storage_client.create_bucket(bucket_name)
# [END storage_quickstart]

And this would be my ideal "everything else" type of snippet:

(The snippet would be prefaced with a sentence like this: "For how to create a Cloud Storage client, refer to Cloud Storage Client Libraries.". The linked page would show installing and importing the client library.)

def create_bucket(bucket_name):
  # Instantiates a client
  storage_client = Storage.client()

  # The name for the new bucket, e.g.
  # bucket_name = "my-new-bucket"

  # Creates a new bucket
  bucket = storage_client.create_bucket(bucket_name)

  print('Bucket {} created'.format(bucket.name))

I'm okay with adding more comments that make it clear what type value a variable holds, e.g. "my-new-bucket", but I still prefer showing the method definition. For Node.js at least, there will be this random callback variable floating around if the method signature isn't shown. And as was discussed yesterday with the team, we want to explicitly show passing in the project ID to the client for the "quickstart" examples (the ones on the Client Libraries pages, like these), but everywhere else let the client library infer the project ID from the environment or the service account file.

@beccasaurus
Copy link
Contributor

@jmdobry jmdobry closed this Oct 17, 2016
@theacodes theacodes deleted the pubsub-sample-style branch November 15, 2016 06:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants