Face detection beta features #1414
Conversation
tested against the released client library 1.1.0.
679b6a8 to 95a0b50
emotions = frame.attributes[0].emotions

# from videointelligence.enums
emotion_labels = (
Is there any way to use the videointelligence.enums object directly or pull this programmatically? How badly will this break if the enum changes in the future?
The simplest way is for the enums in videointelligence.enums to extend Python's Enum class, which would allow us to map the enum values back to their names. However, the client libraries do not do that (yet).
I don't know of a clean way to pull the names programmatically, and as it stands now the sample code can break in these possible ways:
- The order of the enums changes (unlikely), in which case the printout of the sample will be incorrect.
- More enums are added (possible) and the data we use for testing receives those added enums in the API responses, in which case we would get an IndexError in our testing.
- (Worst) more enums are added, but the testing data does not receive them, so we don't notice the sample is broken. Users relying on the same tuple to map enums to names might get an IndexError with their data.
The best long-term solution is simply to extend Enum. When that happens we will need to come back and update these samples.
+1. Programmatically, this keeps showing up in Python samples and, as a user, it seems really frustrating that Python makes you do this... you get the proto's enum value index via the library, but not the enum value name?
I don't think any of the other languages have this problem?
(Make this better! 😭)
Without fixing the client library generation tool, this is the closest I could come up with:
from google.cloud import videointelligence_v1p1beta1 as videointelligence

emotion_label = {value: name
                 for name, value in vars(videointelligence.enums.Emotion).items()
                 if not name.startswith('__')}
I would rather not have this as is in a code snippet - but perhaps abstracted away as a helper function? The reader of the sample would not be able to see the full list of enums this way.
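A rough sketch of what that helper might look like, assuming videointelligence.enums.Emotion stays a plain class with integer attributes (emotion_name is a hypothetical name, not part of the library):
from google.cloud import videointelligence_v1p1beta1 as videointelligence

def emotion_name(value):
    """Hypothetical helper: map an Emotion enum value back to its name."""
    # Invert the class attributes (ints) of the generated enum class,
    # skipping dunder attributes such as __module__ and __doc__.
    names = {v: n for n, v in vars(videointelligence.enums.Emotion).items()
             if not n.startswith('__')}
    return names.get(value, 'UNKNOWN')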
To be clear, I wasn't thinking about fixing this in the code sample, but rather in google-cloud-core.
IMO the code sample should use an inline, literal list with the values. The display name of the enum value should be printed by getting it by index, as we do today.
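A minimal sketch of that inline-literal approach; the label names and their order below are illustrative only and would need to be checked against videointelligence.enums.Emotion:
# The tuple order must match the integer values of the Emotion enum;
# the names here are assumptions for illustration.
emotion_labels = (
    'EMOTION_UNSPECIFIED', 'AMUSEMENT', 'ANGER', 'CONCENTRATION',
    'CONTENTMENT', 'DESIRE', 'DISAPPOINTMENT', 'FEAR', 'INTEREST',
    'PRIDE', 'SADNESS', 'SURPRISE')

# em.emotion is the integer enum value from the API response (as in the
# snippet above), so indexing the tuple yields its display name.
print(emotion_labels[em.emotion])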
I'll move this to a thread on google-cloud-core 😄
Code in this PR regarding Enum LGTM
Looks pretty good, mostly stuff for discussion
Usage Examples:
    python beta_snippets.py boxes \
        gs://python-docs-samples-tests/video/googlework_short.mp4
Should we make another demo project to hold public files that can be used across languages?
# limitations under the License.

"""This application demonstrates face detection, label detection,
explicit content, and shot change detection using the Google Cloud API.
... face detection, face emotions, video transcription
done.
# include_bounding_boxes must be set to True for include_emotions
# to work.
config = videointelligence.types.FaceConfig(
    include_bounding_boxes=True,
Is this still required? I thought we had them pull that?
done.
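For reference, a hedged sketch of how the face config might be wired into the request; the face_detection_config field name and this annotate_video signature are assumptions based on the beta client surface:
config = videointelligence.types.FaceConfig(
    include_bounding_boxes=True, include_emotions=True)
# Attach the config to a VideoContext and pass it along with the request.
context = videointelligence.types.VideoContext(face_detection_config=config)
operation = video_client.annotate_video(
    input_uri=gcs_uri, features=features, video_context=context)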
# [START video_face_bounding_boxes]
def face_bounding_boxes(path):
Can they use a local file too or just a GCS file?
They can. The request needs to be only slightly different, such as https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/video/cloud-client/analyze/analyze.py#L134
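For illustration, a hedged sketch of that local-file variant; the FACE_DETECTION feature name, client class, and file path are assumptions here:
import io

from google.cloud import videointelligence_v1p1beta1 as videointelligence

video_client = videointelligence.VideoIntelligenceServiceClient()
# Assumed feature name; adjust to the feature being demonstrated.
features = [videointelligence.enums.Feature.FACE_DETECTION]

# Read the local file and send its bytes as input_content instead of a GCS URI.
with io.open('resources/video.mp4', 'rb') as f:  # hypothetical local path
    input_content = f.read()

operation = video_client.annotate_video(
    input_content=input_content, features=features)
result = operation.result(timeout=600)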
Do we only need to provide local file snippets, no GCS snippets?
Here we are showing only GCS snippets.
OH, do Python samples all use path for GCS? I figured that was for a local file.
It should be uri, based on existing samples, e.g. https://cloud.google.com/vision/docs/detecting-labels#vision-label-detection-python:

def detect_labels(path):
    """Detects labels in the file."""
    client = vision.ImageAnnotatorClient()

def detect_labels_uri(uri):
    """Detects labels in the file located in Google Cloud Storage or on the
    Web."""
    client = vision.ImageAnnotatorClient()
Let's make this change across the board to all 3 samples
You are right - it should be uri, or rather gcs_uri (since currently the API only supports that). Fixed.
print('\tSegment: {}\n'.format(positions))

# There are typically many frames for each face,
# here we print information on only the first frame.
This comment confused me a little, especially with the possibility of segments above.
"There are typically many frames for each face, since a face is in a segment of time" (I don't know if that helps clarify it or just makes it more confusing)
Could a face appear in multiple segments?
Like:
Face 1: (Segment 0:00 --> 0:05, 0:15 --> 0:25) then the face would be in all the frames between those segments.
I slightly reworded this comment, please take a look.
For face detection, each face indeed corresponds to a segment. If the same person appears in two separate segments of the same video, two separate faces will be returned, each with its own segment.
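To make that concrete, a minimal sketch with illustrative field names (the nested segment/time-offset access is an assumption modeled on the existing samples):
# `faces` stands in for the list of face annotations in the response.
for i, face in enumerate(faces):
    for segment in face.segments:
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        # The same person seen in two separate time ranges comes back as
        # two separate faces, each with its own segment.
        print('Face {}: {:.2f}s to {:.2f}s'.format(i, start_time, end_time))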
for time_offset, emotion_label, score in frame_emotions:
    print('\t{:04.2f}s: {:14}({:4.3f})'.format(
        time_offset, emotion_label, score))
print('\n')
Should we merely print out the results without sorting based on score? That would help simplify the code. If we were going across the whole segment to determine the most likely emotion across all frames, I think that would be cool, but not as simple.
Also, by printing out the time_offset, isn't it always the same, since we only get the one frame for each face?
Here we are actually sorting by time_offset - an earlier version of the API did not sort the frames by it. Emotion is a frame-level output, since detecting the change of emotions over time is useful.
The output might look like this:
30.56s: INTEREST (0.606)
30.60s: INTEREST (0.582)
30.63s: INTEREST (0.561)
30.66s: AMUSEMENT (0.559)
30.70s: AMUSEMENT (0.556)
30.73s: AMUSEMENT (0.549)
30.76s: CONTENTMENT (0.564)
30.80s: CONTENTMENT (0.634)
30.83s: CONTENTMENT (0.688)
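In other words, something along these lines, assuming frame_emotions is a list of (time_offset, emotion_label, score) tuples built from the frames:
# Older API versions did not return frames in time order, so sort by the
# first tuple element (the time offset in seconds) before printing.
frame_emotions.sort(key=lambda triple: triple[0])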
features = [
    videointelligence.enums.Feature.SPEECH_TRANSCRIPTION
]
One line?
done.
result = operation.result(timeout=180)

annotation_results = result.annotation_results[0]
Can you add a comment that notes you are only pulling out the first result, or is there only one result?
done.
Looks great (1 tiny fix)
positions = '{}s to {}s'.format(start_time, end_time)
print('\tSegment: {}\n'.format(positions))

# Each detected may appear in many frames of the video.
Each detected face
(I like the rewording)
emotion, score = sorted(
    [(em.emotion, em.score) for em in emotions],
    key=lambda p: p[1])[-1]
Non-trivial logic alert!
@nnegrey I think I misunderstood your comment about this part - indeed I was sorting by emotion scores. The reason is that the API returns one score for each emotion, and here I am trying to show only the one that scores highest.
Added a comment here to clarify what I was doing. Please take a look.
This line is really impressive:
- multiple return values
- sorting
- list comprehension
- lambda
- -1 index used
# every emotion gets a score, here we sort them by
# scores and keep only the one that scores the highest.
most_likely_emotion = sorted(emotions, key=lambda em: em.score)[-1]
Still somewhat impressive!
- sorting
- lambda expression
- -1
Simplified. Thanks for taking the time to explain the issues to me!
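For reference, an equivalent one-liner with max() avoids the sort entirely (same emotion objects as in the snippet above):
# Pick the emotion with the highest score without sorting the whole list.
most_likely_emotion = max(emotions, key=lambda em: em.score)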
# [START video_speech_transcription]
def speech_transcription(input_uri):
The other 2 functions use gcs_uri but this one uses input_uri. We should make them consistent (and I'd take a look at whatever variable we use in all of the other Vision samples and use that for consistency).
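For illustration, the consistent signatures might look like this (hypothetical stubs):
def face_bounding_boxes(gcs_uri):
    ...

def face_emotions(gcs_uri):
    ...

def speech_transcription(gcs_uri):
    ...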
This PR adds samples for the following beta features:
- Face detection (bounding boxes)
- Face emotions
- Speech transcription