
Commit 6305421

Merged commit includes the following changes:
-- 195269567 by Zhichao Lu: Remove image summaries during train mode.
-- 195147413 by Zhichao Lu: SSDLite config for mobilenet v2.
-- 194883585 by Zhichao Lu: Simplify TPU-compatible nearest neighbor upsampling using reshape and broadcasting (see the sketch after this list).
-- 194851009 by Zhichao Lu: Include AVA v2.1 detection models in the model zoo.
-- 194292198 by Zhichao Lu: Add option to evaluate any checkpoint (without requiring write access to its directory or overwriting any existing logs there).
-- 194122420 by Zhichao Lu: Fix incorrect num_gt_boxes_per_image and num_det_boxes_per_image values; they should not be the expanded dim.
-- 193974479 by Zhichao Lu: Fix a bug in the coco evaluator.
-- 193959861 by Zhichao Lu: Read the default batch size from the config file.
-- 193737238 by Zhichao Lu: Fix data augmentation functions.
-- 193576336 by Zhichao Lu: Add support for training keypoints.
-- 193409179 by Zhichao Lu: Update protobuf requirements to 3+ in installation docs.
-- 193382651 by Zhichao Lu: Update coco evaluation metrics to allow a batch of image info rather than a single image.
-- 193244778 by Zhichao Lu: Remove deprecated batch_norm_trainable field from the ssd mobilenet v2 config.
-- 193228972 by Zhichao Lu: Make sure the final layers are also resized proportionally to conv_depth_ratio.
-- 193204364 by Zhichao Lu: Do not add batch norm parameters to the final conv2d ops that predict box encodings and class scores in the weight-shared conv box predictor. This allows us to set a proper bias and force initial predictions to be background when using focal loss.
-- 193137342 by Zhichao Lu: Add a util function to visualize a value histogram as a tf.summary.image.
-- 193119411 by Zhichao Lu: Add support for reading in logits as groundtruth labels and applying an optional temperature (scaling) before softmax, in support of distillation.
-- 193087707 by Zhichao Lu: Post-process now works again in train mode.
-- 193067658 by Zhichao Lu: Fix flakiness in testSSDRandomCropWithMultiClassScores due to randomness.
-- 192922089 by Zhichao Lu: Add option to set dropout for the classification net in the weight-shared box predictor.
-- 192850747 by Zhichao Lu: Remove an inaccurate caveat from a proto file.
-- 192837477 by Zhichao Lu: Extend to accept different ratios of conv channels.
-- 192813444 by Zhichao Lu: Add an option for one_box_for_all_classes to the box_predictor.
-- 192624207 by Zhichao Lu: Update the trainer to allow reading multiclass scores.
-- 192583425 by Zhichao Lu: Add an implementation of the Visual Relations Detection evaluation metric (per-image evaluation).
-- 192529600 by Zhichao Lu: Modify the ssd meta arch to allow the option of not adding an implicit background class.
-- 192512429 by Zhichao Lu: Refactor model_tpu_main.py files and move the continuous eval loop into model_lib.py.
-- 192494267 by Zhichao Lu: Update create_pascal_tf_record.py and create_pet_tf_record.py.
-- 192485456 by Zhichao Lu: Enforce that all eval metric ops have valid Python strings.
-- 192472546 by Zhichao Lu: Set regularize_depthwise to true in mobilenet_v1_argscope.
-- 192421843 by Zhichao Lu: Refactor Mask-RCNN to put all mask prediction code in the third stage.
-- 192320460 by Zhichao Lu: Return eval_on_train_input_fn from create_estimator_and_inputs() rather than using train_input_fn in EVAL mode (which would still apply data augmentation).
-- 192226678 by Zhichao Lu: Access TPUEstimator and CrossShardOptimizer from the tf namespace.
-- 192195514 by Zhichao Lu: Fix a test that was flaky due to randomness.
-- 192166224 by Zhichao Lu: Minor fixes to match the git repo.
-- 192147130 by Zhichao Lu: Use shape utils for assertions in the feature extractor.
-- 192132440 by Zhichao Lu: Class-agnostic masks for mask_rcnn.
-- 192006190 by Zhichao Lu: Add a learning rate summary in EVAL mode in model.py.
-- 192004845 by Zhichao Lu: Migrate away from the Experiment class, as it is now deprecated. Also, refactor into a separate model library and binaries.
-- 191957195 by Zhichao Lu: Add classification_loss and localization_loss metrics for TPU jobs.
-- 191932855 by Zhichao Lu: Add an option to skip the last striding in mobilenet. The modified network has nominal output stride 16 instead of 32.
-- 191787921 by Zhichao Lu: Add an option to override base feature extractor hyperparams in SSD models. This allows using the same set of hyperparams for the complete feature extractor (base + new layers) if desired.
-- 191743097 by Zhichao Lu: Add an attribute to the SSD model to indicate which fields in the prediction dictionary have a batch dimension. This will be useful for future video models.
-- 191668425 by Zhichao Lu: Internal change.
-- 191649512 by Zhichao Lu: Introduce two parameters in ssd.proto (freeze_batchnorm, inplace_batchnorm_update) and set up slim arg_scopes in ssd_meta_arch.py so that they apply to all batchnorm ops in the predict() method. This centralizes control of freezing and of in-place batchnorm updates.
-- 191620303 by Zhichao Lu: Modify the preprocessor to support multiclass scores.

PiperOrigin-RevId: 195269567
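Change 194883585 above replaces a TPU-incompatible resize op with pure reshape-and-broadcast arithmetic. A minimal sketch of the idea, assuming NHWC input with statically known shape (the function name and exact signature here are illustrative):

```python
import tensorflow as tf

def nearest_neighbor_upsampling(input_tensor, scale):
  """Nearest-neighbor upsampling without tf.image resize ops.

  Reshapes to [batch, height, 1, width, 1, channels], broadcasts
  against a ones tensor to replicate every pixel scale*scale times,
  then collapses back to NHWC. All ops involved are TPU compatible.
  """
  with tf.name_scope('nearest_neighbor_upsampling'):
    batch, height, width, channels = input_tensor.get_shape().as_list()
    output = tf.reshape(
        input_tensor, [batch, height, 1, width, 1, channels]) * tf.ones(
            [1, 1, scale, 1, scale, 1], dtype=input_tensor.dtype)
    return tf.reshape(output, [batch, height * scale, width * scale, channels])
```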
1 parent 5f9f6b8 commit 6305421

7 files changed: +424, -29 lines


research/object_detection/README.md

Lines changed: 9 additions & 0 deletions
@@ -90,6 +90,15 @@ reporting an issue.
 
 ## Release information
 
+### April 30, 2018
+
+We have released a Faster R-CNN detector with a ResNet-101 feature extractor trained on [AVA](https://research.google.com/ava/) v2.1.
+Compared with other commonly used object detectors, it changes the action classification loss function to a per-class sigmoid loss to handle boxes with multiple labels.
+The model, trained on the training split of AVA v2.1 for 1.5M iterations, achieves a mean AP of 11.25% over 60 classes on the validation split of AVA v2.1.
+For more details please refer to this [paper](https://arxiv.org/abs/1705.08421).
+
+<b>Thanks to contributors</b>: Chen Sun, David Ross
+
 ### April 2, 2018
 
 Supercharge your mobile phones with the next generation mobile object detector!
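The per-class sigmoid loss mentioned in the note above treats each of the 60 AVA action classes as an independent binary decision, which is what lets one person box carry several action labels at once. A self-contained sketch of the contrast with softmax (shapes and tensors below are illustrative, not from the released model):

```python
import tensorflow as tf

# Illustrative setup: 8 person boxes, 60 AVA action classes. `labels`
# is multi-hot because one box may have several simultaneous actions.
logits = tf.random_normal([8, 60])
labels = tf.cast(tf.random_uniform([8, 60]) > 0.9, tf.float32)

# Per-class sigmoid loss: each class is scored independently, so
# multiple positive labels per box are handled naturally. A softmax
# cross-entropy, by contrast, assumes exactly one true class per box.
per_class_loss = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=labels, logits=logits)
loss = tf.reduce_mean(tf.reduce_sum(per_class_loss, axis=1))
```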
Lines changed: 240 additions & 0 deletions
@@ -0,0 +1,240 @@
+item {
+  name: "bend/bow (at the waist)"
+  id: 1
+}
+item {
+  name: "crouch/kneel"
+  id: 3
+}
+item {
+  name: "dance"
+  id: 4
+}
+item {
+  name: "fall down"
+  id: 5
+}
+item {
+  name: "get up"
+  id: 6
+}
+item {
+  name: "jump/leap"
+  id: 7
+}
+item {
+  name: "lie/sleep"
+  id: 8
+}
+item {
+  name: "martial art"
+  id: 9
+}
+item {
+  name: "run/jog"
+  id: 10
+}
+item {
+  name: "sit"
+  id: 11
+}
+item {
+  name: "stand"
+  id: 12
+}
+item {
+  name: "swim"
+  id: 13
+}
+item {
+  name: "walk"
+  id: 14
+}
+item {
+  name: "answer phone"
+  id: 15
+}
+item {
+  name: "carry/hold (an object)"
+  id: 17
+}
+item {
+  name: "climb (e.g., a mountain)"
+  id: 20
+}
+item {
+  name: "close (e.g., a door, a box)"
+  id: 22
+}
+item {
+  name: "cut"
+  id: 24
+}
+item {
+  name: "dress/put on clothing"
+  id: 26
+}
+item {
+  name: "drink"
+  id: 27
+}
+item {
+  name: "drive (e.g., a car, a truck)"
+  id: 28
+}
+item {
+  name: "eat"
+  id: 29
+}
+item {
+  name: "enter"
+  id: 30
+}
+item {
+  name: "hit (an object)"
+  id: 34
+}
+item {
+  name: "lift/pick up"
+  id: 36
+}
+item {
+  name: "listen (e.g., to music)"
+  id: 37
+}
+item {
+  name: "open (e.g., a window, a car door)"
+  id: 38
+}
+item {
+  name: "play musical instrument"
+  id: 41
+}
+item {
+  name: "point to (an object)"
+  id: 43
+}
+item {
+  name: "pull (an object)"
+  id: 45
+}
+item {
+  name: "push (an object)"
+  id: 46
+}
+item {
+  name: "put down"
+  id: 47
+}
+item {
+  name: "read"
+  id: 48
+}
+item {
+  name: "ride (e.g., a bike, a car, a horse)"
+  id: 49
+}
+item {
+  name: "sail boat"
+  id: 51
+}
+item {
+  name: "shoot"
+  id: 52
+}
+item {
+  name: "smoke"
+  id: 54
+}
+item {
+  name: "take a photo"
+  id: 56
+}
+item {
+  name: "text on/look at a cellphone"
+  id: 57
+}
+item {
+  name: "throw"
+  id: 58
+}
+item {
+  name: "touch (an object)"
+  id: 59
+}
+item {
+  name: "turn (e.g., a screwdriver)"
+  id: 60
+}
+item {
+  name: "watch (e.g., TV)"
+  id: 61
+}
+item {
+  name: "work on a computer"
+  id: 62
+}
+item {
+  name: "write"
+  id: 63
+}
+item {
+  name: "fight/hit (a person)"
+  id: 64
+}
+item {
+  name: "give/serve (an object) to (a person)"
+  id: 65
+}
+item {
+  name: "grab (a person)"
+  id: 66
+}
+item {
+  name: "hand clap"
+  id: 67
+}
+item {
+  name: "hand shake"
+  id: 68
+}
+item {
+  name: "hand wave"
+  id: 69
+}
+item {
+  name: "hug (a person)"
+  id: 70
+}
+item {
+  name: "kiss (a person)"
+  id: 72
+}
+item {
+  name: "lift (a person)"
+  id: 73
+}
+item {
+  name: "listen to (a person)"
+  id: 74
+}
+item {
+  name: "push (another person)"
+  id: 76
+}
+item {
+  name: "sing to (e.g., self, a person, a group)"
+  id: 77
+}
+item {
+  name: "take (an object) from (a person)"
+  id: 78
+}
+item {
+  name: "talk to (e.g., self, a person, a group)"
+  id: 79
+}
+item {
+  name: "watch (a person)"
+  id: 80
+}
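The new file above is a standard object_detection label map in protobuf text format. A hedged sketch of consuming it with the repo's label_map_util (the path is hypothetical; point it at wherever the file is saved):

```python
from object_detection.utils import label_map_util

# Hypothetical path to the label map shown above.
label_map_path = 'ava_label_map_v2.1.pbtxt'

label_map = label_map_util.load_labelmap(label_map_path)
# Ids here are sparse (1..80 with gaps), so max_num_classes must cover
# the largest id or trailing entries would be dropped.
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=80)
category_index = label_map_util.create_category_index(categories)
print(category_index[1]['name'])  # -> "bend/bow (at the waist)"
```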

research/object_detection/g3doc/detection_model_zoo.md

Lines changed: 8 additions & 1 deletion
@@ -91,7 +91,7 @@ Some remarks on frozen inference graphs:
 
 ## Kitti-trained models {#kitti-models}
 
-Model name | Speed (ms) | Pascal mAP@0.5 (ms) | Outputs
+Model name | Speed (ms) | Pascal mAP@0.5 | Outputs
 ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---: | :-------------: | :-----:
 [faster_rcnn_resnet101_kitti](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_kitti_2018_01_28.tar.gz) | 79 | 87 | Boxes
 
@@ -103,6 +103,13 @@ Model name
 [faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2018_01_28.tar.gz) | 347 | | Boxes
 
 
+## AVA v2.1 trained models {#ava-models}
+
+Model name | Speed (ms) | Pascal mAP@0.5 | Outputs
+----------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---: | :-------------: | :-----:
+[faster_rcnn_resnet101_ava_v2.1](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_ava_v2.1_2018_04_30.tar.gz) | 93 | 11 | Boxes
+
+
 [^1]: See [MSCOCO evaluation protocol](http://cocodataset.org/#detections-eval).
 [^2]: This is PASCAL mAP with a slightly different way of true positives computation: see [Open Images evaluation protocol](evaluation_protocols.md#open-images).
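Each table entry links to a tarball that, following the zoo's convention, contains a frozen inference graph. A minimal sketch of loading the AVA model for inference (the extracted directory name is assumed from the tarball name above):

```python
import tensorflow as tf

# Assumed layout after extracting the tarball linked above; zoo
# tarballs conventionally contain a frozen_inference_graph.pb.
frozen_graph_path = ('faster_rcnn_resnet101_ava_v2.1_2018_04_30/'
                     'frozen_inference_graph.pb')

graph_def = tf.GraphDef()
with tf.gfile.GFile(frozen_graph_path, 'rb') as f:
  graph_def.ParseFromString(f.read())

detection_graph = tf.Graph()
with detection_graph.as_default():
  tf.import_graph_def(graph_def, name='')
  # Standard zoo tensor names; the graph can now be run in a session.
  image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
  boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
```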

research/object_detection/model_lib.py

Lines changed: 18 additions & 19 deletions
@@ -325,16 +325,16 @@ def tpu_scaffold():
     }
 
     eval_metric_ops = None
-    if mode in (tf.estimator.ModeKeys.TRAIN, tf.estimator.ModeKeys.EVAL):
+    if mode == tf.estimator.ModeKeys.EVAL:
       class_agnostic = (fields.DetectionResultFields.detection_classes
                         not in detections)
       groundtruth = _get_groundtruth_data(detection_model, class_agnostic)
       use_original_images = fields.InputDataFields.original_image in features
-      original_images = (
+      eval_images = (
           features[fields.InputDataFields.original_image] if use_original_images
           else features[fields.InputDataFields.image])
       eval_dict = eval_util.result_dict_for_single_example(
-          original_images[0:1],
+          eval_images[0:1],
           features[inputs.HASH_KEY][0],
           detections,
           groundtruth,
@@ -355,22 +355,21 @@ def tpu_scaffold():
         img_summary = tf.summary.image('Detections_Left_Groundtruth_Right',
                                        detection_and_groundtruth)
 
-      if mode == tf.estimator.ModeKeys.EVAL:
-        # Eval metrics on a single example.
-        eval_metrics = eval_config.metrics_set
-        if not eval_metrics:
-          eval_metrics = ['coco_detection_metrics']
-        eval_metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
-            eval_metrics, category_index.values(), eval_dict,
-            include_metrics_per_category=False)
-        for loss_key, loss_tensor in iter(losses_dict.items()):
-          eval_metric_ops[loss_key] = tf.metrics.mean(loss_tensor)
-        for var in optimizer_summary_vars:
-          eval_metric_ops[var.op.name] = (var, tf.no_op())
-        if img_summary is not None:
-          eval_metric_ops['Detections_Left_Groundtruth_Right'] = (
-              img_summary, tf.no_op())
-        eval_metric_ops = {str(k): v for k, v in eval_metric_ops.iteritems()}
+      # Eval metrics on a single example.
+      eval_metrics = eval_config.metrics_set
+      if not eval_metrics:
+        eval_metrics = ['coco_detection_metrics']
+      eval_metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
+          eval_metrics, category_index.values(), eval_dict,
+          include_metrics_per_category=False)
+      for loss_key, loss_tensor in iter(losses_dict.items()):
+        eval_metric_ops[loss_key] = tf.metrics.mean(loss_tensor)
+      for var in optimizer_summary_vars:
+        eval_metric_ops[var.op.name] = (var, tf.no_op())
+      if img_summary is not None:
+        eval_metric_ops['Detections_Left_Groundtruth_Right'] = (
+            img_summary, tf.no_op())
+      eval_metric_ops = {str(k): v for k, v in eval_metric_ops.iteritems()}
 
     if use_tpu:
       return tf.contrib.tpu.TPUEstimatorSpec(
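One detail worth noting in the block above: wrapping each scalar loss in tf.metrics.mean turns a per-batch value into a streaming metric that the Estimator averages over the entire eval dataset. (The final iteritems() call in the diff is Python 2 specific.) A standalone sketch of that pattern, with illustrative tensors and keys:

```python
import tensorflow as tf

# tf.metrics.mean returns (value_op, update_op); the Estimator runs
# update_op on every eval batch and reports value_op at the end, so the
# logged loss is a dataset-level average rather than a single batch.
loss_tensor = tf.constant(0.5)
eval_metric_ops = {'Losses/total_loss': tf.metrics.mean(loss_tensor)}

spec = tf.estimator.EstimatorSpec(
    mode=tf.estimator.ModeKeys.EVAL,
    loss=loss_tensor,
    eval_metric_ops=eval_metric_ops)
```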
