Face detection beta features #1414
Conversation
tested against the released client library 1.1.0.
679b6a8 to 95a0b50
emotions = frame.attributes[0].emotions

# from videointelligence.enums
emotion_labels = (
Is there any way to use the videointelligence.enums object directly or pull this programmatically? How badly will this break if the enum changes in the future?
The simplest way is for the enums in videointelligence.enums to extend Python's Enum class, which would allow us to map the enum values back to their names. However, the client libraries do not do that (yet).
I don't know of a clean way to pull the names programmatically, and as it stands now the sample code can break in these possible ways:
- The order of the enums changes (unlikely), in which case the printout of the sample will be incorrect.
- More enums are added (possible) and the data we use for testing receives those added enums in the API responses, in which case we would get an IndexError in our testing.
- (Worst) more enums are added, but the testing data does not receive them, so we don't notice the sample is broken. Users relying on the same tuple to map enums to names might get an IndexError with their data.
The best long-term solution is simply to extend Enum. When that happens we will need to come back and update these samples.
+1. Programmatically, this keeps showing up in Python samples and, as a user, it seems really frustrating that Python makes you do this... you get the proto's enum value index via the library, but not the enum value name?
I don't think any of the other languages have this problem?
(Make this better! 😭)
Without fixing the client library generation tool, this is the closest I could come up with:
from google.cloud import videointelligence_v1p1beta1 as videointelligence

emotion_label = {value: name
                 for name, value in vars(videointelligence.enums.Emotion).items()
                 if not name.startswith('__')}
I would rather not have this as is in a code snippet - but perhaps abstracted away as a helper function? The reader of the sample would not be able to see the full list of enums this way.
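A rough sketch of what that helper might look like, assuming videointelligence.enums.Emotion stays a plain class with integer attributes (emotion_name is a hypothetical name, not part of the library):
from google.cloud import videointelligence_v1p1beta1 as videointelligence

def emotion_name(value):
    """Hypothetical helper: map an Emotion enum value back to its name."""
    # Invert the class attributes (ints) of the generated enum class,
    # skipping dunder attributes such as __module__ and __doc__.
    names = {v: n for n, v in vars(videointelligence.enums.Emotion).items()
             if not n.startswith('__')}
    return names.get(value, 'UNKNOWN')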
To be clear, I wasn't thinking about fixing this in the code sample, but rather in google-cloud-core.
IMO the code sample should use an inline, literal list with the values. The display name of the enum value should be printed by getting it by index, as we do today.
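A minimal sketch of that inline-literal approach; the label names and their order below are illustrative only and would need to be checked against videointelligence.enums.Emotion:
# The tuple order must match the integer values of the Emotion enum;
# the names here are assumptions for illustration.
emotion_labels = (
    'EMOTION_UNSPECIFIED', 'AMUSEMENT', 'ANGER', 'CONCENTRATION',
    'CONTENTMENT', 'DESIRE', 'DISAPPOINTMENT', 'FEAR', 'INTEREST',
    'PRIDE', 'SADNESS', 'SURPRISE')

# em.emotion is the integer enum value from the API response (as in the
# snippet above), so indexing the tuple yields its display name.
print(emotion_labels[em.emotion])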
I'll move this to a thread on google-cloud-core 😄
Code in this PR regarding Enum LGTM
Looks pretty good, mostly stuff for discussion
Usage Examples:
    python beta_snippets.py boxes \
        gs://python-docs-samples-tests/video/googlework_short.mp4
Should we make another demo project to hold public files that can be used across languages?
# limitations under the License.

"""This application demonstrates face detection, label detection,
explicit content, and shot change detection using the Google Cloud API.
... face detection, face emotions, video transcription
done.
# include_bounding_boxes must be set to True for include_emotions
# to work.
config = videointelligence.types.FaceConfig(
    include_bounding_boxes=True,
Is this still required? I thought we had them pull that?
done.
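For reference, a hedged sketch of how the face config might be wired into the request; the face_detection_config field name and this annotate_video signature are assumptions based on the beta client surface:
config = videointelligence.types.FaceConfig(
    include_bounding_boxes=True, include_emotions=True)
# Attach the config to a VideoContext and pass it along with the request.
context = videointelligence.types.VideoContext(face_detection_config=config)
operation = video_client.annotate_video(
    input_uri=gcs_uri, features=features, video_context=context)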
# [START video_face_bounding_boxes]
def face_bounding_boxes(path):
Can they use a local file too or just a GCS file?
They can. The request needs to be only slightly different, such as https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/video/cloud-client/analyze/analyze.py#L134
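For illustration, a hedged sketch of that local-file variant; the FACE_DETECTION feature name, client class, and file path are assumptions here:
import io

from google.cloud import videointelligence_v1p1beta1 as videointelligence

video_client = videointelligence.VideoIntelligenceServiceClient()
# Assumed feature name; adjust to the feature being demonstrated.
features = [videointelligence.enums.Feature.FACE_DETECTION]

# Read the local file and send its bytes as input_content instead of a GCS URI.
with io.open('resources/video.mp4', 'rb') as f:  # hypothetical local path
    input_content = f.read()

operation = video_client.annotate_video(
    input_content=input_content, features=features)
result = operation.result(timeout=600)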
Do we only need to provide local file snippets, no GCS snippets?
Here we are showing only GCS snippets.
OH, do Python samples all use path for GCS? I figured that was for a local file.
It should be uri, based on existing samples, e.g. https://cloud.google.com/vision/docs/detecting-labels#vision-label-detection-python:

def detect_labels(path):
    """Detects labels in the file."""
    client = vision.ImageAnnotatorClient()

def detect_labels_uri(uri):
    """Detects labels in the file located in Google Cloud Storage or on the
    Web."""
    client = vision.ImageAnnotatorClient()
Let's make this change across the board to all 3 samples
You are right - it should be uri, or rather gcs_uri (since currently the API only supports that). Fixed.
print('\tSegment: {}\n'.format(positions))

# There are typically many frames for each face,
# here we print information on only the first frame.
This comment confused me a little, especially with the possibility of segments above.
"There are typically many frames for each face, since a face is in a segment of time" (I don't know if that helps clarify it or just makes it more confusing)
Could a face appear in multiple segments?
Like:
Face 1: (Segment 0:00 --> 0:05, 0:15 --> 0:25) then the face would be in all the frames between those segments.
I slightly reworded this comment, please take a look.
For face detection, each face indeed corresponds to a segment. If the same person appears in two separate segments of the same video, two separate faces will be returned, each with its own segment.
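To make that concrete, a minimal sketch with illustrative field names (the nested segment/time-offset access is an assumption modeled on the existing samples):
# `faces` stands in for the list of face annotations in the response.
for i, face in enumerate(faces):
    for segment in face.segments:
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        # The same person seen in two separate time ranges comes back as
        # two separate faces, each with its own segment.
        print('Face {}: {:.2f}s to {:.2f}s'.format(i, start_time, end_time))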
for time_offset, emotion_label, score in frame_emotions:
    print('\t{:04.2f}s: {:14}({:4.3f})'.format(
        time_offset, emotion_label, score))
print('\n')
Should we merely print out the results without sorting based on score? That would help simplify the code. If we were going across the whole segment to determine the most likely emotion across all frames, I think that would be cool, but not as simple.
Also, by printing out the time_offset, isn't it always the same, since we only get the one frame for each face?
Here we are actually sorting by time_offset - an earlier version of the API did not sort the frames by it. Emotion is a frame-level output, since detecting the change of emotions over time is useful.
The output might look like this:
30.56s: INTEREST (0.606)
30.60s: INTEREST (0.582)
30.63s: INTEREST (0.561)
30.66s: AMUSEMENT (0.559)
30.70s: AMUSEMENT (0.556)
30.73s: AMUSEMENT (0.549)
30.76s: CONTENTMENT (0.564)
30.80s: CONTENTMENT (0.634)
30.83s: CONTENTMENT (0.688)
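In other words, something along these lines, assuming frame_emotions is a list of (time_offset, emotion_label, score) tuples built from the frames:
# Older API versions did not return frames in time order, so sort by the
# first tuple element (the time offset in seconds) before printing.
frame_emotions.sort(key=lambda triple: triple[0])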
features = [
    videointelligence.enums.Feature.SPEECH_TRANSCRIPTION
]
One line?
done.
result = operation.result(timeout=180)

annotation_results = result.annotation_results[0]
Can you add a comment that notes you are only pulling out the first result, or is there only one result?
done.
Looks great (1 tiny fix)
positions = '{}s to {}s'.format(start_time, end_time)
print('\tSegment: {}\n'.format(positions))

# Each detected may appear in many frames of the video.
Each detected face
(I like the rewording)
emotion, score = sorted(
    [(em.emotion, em.score) for em in emotions],
    key=lambda p: p[1])[-1]
Non-trivial logic alert!
@nnegrey I think I misunderstood your comment about this part - indeed I was sorting by emotion scores. The reason is that the API returns one score for each emotion, and here I am trying to show only the one that scores highest.
Added a comment here to clarify what I was doing. Please take a look.
This line is really impressive:
- multiple return values
- sorting
- list comprehension
- lambda
- -1 index used
# every emotion gets a score, here we sort them by
# scores and keep only the one that scores the highest.
most_likely_emotion = sorted(emotions, key=lambda em: em.score)[-1]
Still somewhat impressive!
- sorting
- lambda expression
- -1
Simplified. Thanks for taking the time to explain the issues to me!
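For reference, an equivalent one-liner with max() avoids the sort entirely (same emotion objects as in the snippet above):
# Pick the emotion with the highest score without sorting the whole list.
most_likely_emotion = max(emotions, key=lambda em: em.score)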
# [START video_speech_transcription]
def speech_transcription(input_uri):
The other 2 functions use gcs_uri but this one uses input_uri. We should make them consistent (and I'd take a look at whatever variable we use in all of the other Vision samples and use that for consistency).
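For illustration, the consistent signatures might look like this (hypothetical stubs):
def face_bounding_boxes(gcs_uri):
    ...

def face_emotions(gcs_uri):
    ...

def speech_transcription(gcs_uri):
    ...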
This PR adds samples for the following beta features:
- Face detection (bounding boxes)
- Face emotions
- Speech transcription