Commit 5fefdcd

Merge pull request justadudewhohacks#72 from justadudewhohacks/tiny-yolov2-seperable-conv2d
Tiny yolov2 seperable conv2d
2 parents 661f228 + 4b4ecdb commit 5fefdcd

65 files changed: 2088 additions, 21968 deletions

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -2,4 +2,6 @@ node_modules
 .rpt2_cache
 .env*
 tmp
-weights_uncompressed
+proto
+weights_uncompressed
+weights_unused

.npmignore

Lines changed: 1 addition & 0 deletions
@@ -5,5 +5,6 @@ examples
 proto
 weights
 weights_uncompressed
+weights_unused
 test
 tools

README.md

Lines changed: 27 additions & 42 deletions
@@ -27,7 +27,7 @@ Table of Contents:
 * **[Face Detection & 5 Point Face Landmarks - MTCNN](#usage-face-detection-mtcnn)**
 * **[Face Recognition](#usage-face-recognition)**
 * **[68 Point Face Landmark Detection](#usage-face-landmark-detection)**
-* **[Full Face Detection and Recognition Pipeline](#usage-full-face-detection-and-recognition-pipeline)**
+* **[Shortcut Functions for Full Face Description](#shortcut-functions)**

 ## Examples

@@ -89,15 +89,15 @@ The face detection model has been trained on the [WIDERFACE dataset](http://mmla

 ### Face Detection - Tiny Yolo v2

-The Tiny Yolo v2 based face detector can easily adapt to different input image sizes, thus can be used as an alternative to SSD Mobilenet v1 to trade off accuracy for performance (inference time). In general the model is not as accurate as SSD Mobilenet v1 but can achieve faster inference for lower image sizes.
+The Tiny Yolo v2 implementation is a very performant face detector, which can easily adapt to different input image sizes, thus can be used as an alternative to SSD Mobilenet v1 to trade off accuracy for performance (inference time). In general the model's ability to locate smaller face bounding boxes is not as accurate as SSD Mobilenet v1.

-The Tiny Yolo v2 implementation is still experimental, meaning there is room for optimization (future work). The trained model weights are provided in the [azFace](https://github.com/azmathmoosa/azFace) project.
+The face detector has been trained on a custom dataset of ~10K images labeled with bounding boxes and uses depthwise separable convolutions instead of regular convolutions, which ensures very fast inference and allows for a quantized model size of only 1.7MB, making the model extremely mobile and web friendly. Thus, the Tiny Yolo v2 face detector should be your GO-TO face detector on mobile devices.

 <a name="about-face-detection-mtcnn"></a>

 ### Face Detection & 5 Point Face Landmarks - MTCNN

-MTCNN (Multi-task Cascaded Convolutional Neural Networks) represents an alternative face detector to SSD Mobilenet v1 and Tiny Yolo v2, which offers much more room for configuration and is able to achieve much lower processing times. MTCNN is a 3 stage cascaded CNN, which simultanously returns 5 face landmark points along with the bounding boxes and scores for each face. By limiting the minimum size of faces expected in an image, MTCNN allows you to process frames from your webcam in realtime. Additionally with 2MB, the size of the weights file is only a third of the size of the quantized SSD Mobilenet v1 model (~6MB).
+MTCNN (Multi-task Cascaded Convolutional Neural Networks) represents an alternative face detector to SSD Mobilenet v1 and Tiny Yolo v2, which offers much more room for configuration. By tuning the input parameters, MTCNN is able to detect a wide range of face bounding box sizes. MTCNN is a 3 stage cascaded CNN, which simultaneously returns 5 face landmark points along with the bounding boxes and scores for each face. By limiting the minimum size of faces expected in an image, MTCNN allows you to process frames from your webcam in realtime. Additionally, the model size is only 2MB.

 MTCNN has been presented in the paper [Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks](https://kpzhang93.github.io/MTCNN_face_detection_alignment/paper/spl.pdf) by Zhang et al. and the model weights are provided in the official [repo](https://github.com/kpzhang93/MTCNN_face_detection_alignment) of the MTCNN implementation.

@@ -164,7 +164,7 @@ await net.load('/models/face_detection_model-weights_manifest.json')
 // await net.load('/models/face_landmark_68_model-weights_manifest.json')
 // await net.load('/models/face_recognition_model-weights_manifest.json')
 // await net.load('/models/mtcnn_model-weights_manifest.json')
-// await net.load('/models/tiny_yolov2_model-weights_manifest.json')
+// await net.load('/models/tiny_yolov2_separable_conv_model-weights_manifest.json')

 // or simply load all models
 await net.load('/models')
@@ -197,7 +197,7 @@ const maxResults = 10

 // inputs can be html canvas, img or video element or their ids ...
 const myImg = document.getElementById('myImg')
-const detections = await faceapi.locateFaces(myImg, minConfidence, maxResults)
+const detections = await faceapi.ssdMobilenetv1(myImg, minConfidence, maxResults)
 ```

 Draw the detected faces to a canvas:
@@ -356,7 +356,7 @@ const rightEyeBrow = landmarks.getRightEyeBrow()
 Compute the Face Landmarks for Detected Faces:

 ``` javascript
-const detections = await faceapi.locateFaces(input)
+const detections = await faceapi.ssdMobilenetv1(input)

 // get the face tensors from the image (have to be disposed manually)
 const faceTensors = await faceapi.extractFaceTensors(input, detections)
@@ -366,50 +366,35 @@ const landmarksByFace = await Promise.all(faceTensors.map(t => faceapi.detectLan
 faceTensors.forEach(t => t.dispose())
 ```

-<a name="usage-full-face-detection-and-recognition-pipeline"></a>
+<a name="shortcut-functions"></a>

-### Full Face Detection and Recognition Pipeline
+### Shortcut Functions for Full Face Description

-After face detection has been performed, I would recommend to align the bounding boxes of the detected faces before passing them to the face recognition net, which will make the computed face descriptor much more accurate. Fortunately, the api can do this for you under the hood. You can obtain the full face descriptions (location, landmarks and descriptor) of each face in an input image as follows:
+After face detection has been performed, I would recommend aligning the bounding boxes of the detected faces before passing them to the face recognition net, which will make the computed face descriptor much more accurate. Fortunately, the api can do this for you under the hood by providing convenient shortcut functions. You can obtain the full face descriptions (location, landmarks and descriptor) of each face in an input image as follows.

-``` javascript
-const fullFaceDescriptions = await faceapi.allFaces(input, minConfidence)
-
-const fullFaceDescription0 = fullFaceDescriptions[0]
-console.log(fullFaceDescription0.detection) // bounding box & score
-console.log(fullFaceDescription0.landmarks) // 68 point face landmarks
-console.log(fullFaceDescription0.descriptor) // face descriptor
+Using the SSD Mobilenet v1 face detector + 68 point face landmark detector:

+``` javascript
+const fullFaceDescriptions = await faceapi.allFacesSsdMobilenetv1(input, minConfidence)
 ```

-You can also do everything manually as shown in the following:
+Using the Tiny Yolo v2 face detector + 68 point face landmark detector:

 ``` javascript
-// first detect the face locations
-const detections = await faceapi.locateFaces(input, minConfidence)
-
-// get the face tensors from the image (have to be disposed manually)
-const faceTensors = (await faceapi.extractFaceTensors(input, detections))
+const fullFaceDescriptions = await faceapi.allFacesTinyYolov2(input, { inputSize: 'md' })
+```

-// detect landmarks and get the aligned face image bounding boxes
-const alignedFaceBoxes = await Promise.all(faceTensors.map(
-  async (faceTensor, i) => {
-    const faceLandmarks = await faceapi.detectLandmarks(faceTensor)
-    return faceLandmarks.align(detections[i])
-  }
-))
+Or with MTCNN face detection + 5 point face landmarks:

-// free memory for face image tensors after we detected the face landmarks
-faceTensors.forEach(t => t.dispose())
-
-// get the face tensors for the aligned face images from the image (have to be disposed manually)
-const alignedFaceTensors = (await faceapi.extractFaceTensors(input, alignedFaceBoxes))
+``` javascript
+const fullFaceDescriptions = await faceapi.allFacesMtcnn(input, { minFaceSize: 20 })
+```

-// compute the face descriptors from the aligned face images
-const descriptors = await Promise.all(alignedFaceTensors.map(
-  faceTensor => faceapi.computeFaceDescriptor(faceTensor)
-))
+The shortcut functions return an array of FullFaceDescriptions:

-// free memory for face image tensors after we computed their descriptors
-alignedFaceTensors.forEach(t => t.dispose())
-```
+``` javascript
+const fullFaceDescription0 = fullFaceDescriptions[0]
+console.log(fullFaceDescription0.detection) // bounding box & score
+console.log(fullFaceDescription0.landmarks) // face landmarks
+console.log(fullFaceDescription0.descriptor) // face descriptor
+```
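
Reviewer note: the README change credits depthwise separable convolutions for the small model size. As a rough sanity check of that reasoning, the sketch below compares weight counts for a regular convolution and a depthwise separable one. The function names and the layer shape (3x3 kernel, 256 input channels, 512 output channels) are illustrative assumptions and are not taken from the model in this commit; bias terms are ignored.

```javascript
// Weights in a regular k x k convolution: every output channel has its own
// k x k filter over every input channel.
function regularConvParams(kernelSize, inChannels, outChannels) {
  return kernelSize * kernelSize * inChannels * outChannels
}

// Weights in a depthwise separable convolution: one k x k filter per input
// channel (depthwise), followed by a 1 x 1 convolution that mixes channels
// (pointwise).
function separableConvParams(kernelSize, inChannels, outChannels) {
  const depthwise = kernelSize * kernelSize * inChannels
  const pointwise = inChannels * outChannels
  return depthwise + pointwise
}

// Hypothetical layer shape, chosen only for illustration.
const regular = regularConvParams(3, 256, 512)
const separable = separableConvParams(3, 256, 512)

console.log(`regular: ${regular} weights`)
console.log(`separable: ${separable} weights`)
console.log(`reduction factor: ${(regular / separable).toFixed(1)}x`)
```

For a 3x3 kernel the reduction approaches 9x as the number of output channels grows, which is consistent in spirit with the smaller weights file described in the README, although the actual 1.7MB figure also depends on quantization.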

examples/public/commons.js

Lines changed: 4 additions & 0 deletions
@@ -146,6 +146,10 @@ function renderNavBar(navbarId, exampleUri) {
   uri: 'tiny_yolov2_face_detection_webcam',
   name: 'Tiny Yolov2 Face Detection Webcam'
 },
+{
+  uri: 'tiny_yolov2_face_recognition',
+  name: 'Tiny Yolov2 Face Recognition'
+},
 {
   uri: 'batch_face_landmarks',
   name: 'Batch Face Landmarks'

examples/server.js

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ app.get('/mtcnn_face_recognition_webcam', (req, res) => res.sendFile(path.join(v
 app.get('/tiny_yolov2_face_detection', (req, res) => res.sendFile(path.join(viewsDir, 'tinyYolov2FaceDetection.html')))
 app.get('/tiny_yolov2_face_detection_video', (req, res) => res.sendFile(path.join(viewsDir, 'tinyYolov2FaceDetectionVideo.html')))
 app.get('/tiny_yolov2_face_detection_webcam', (req, res) => res.sendFile(path.join(viewsDir, 'tinyYolov2FaceDetectionWebcam.html')))
+app.get('/tiny_yolov2_face_recognition', (req, res) => res.sendFile(path.join(viewsDir, 'tinyYolov2FaceRecognition.html')))
 app.get('/batch_face_landmarks', (req, res) => res.sendFile(path.join(viewsDir, 'batchFaceLandmarks.html')))
 app.get('/batch_face_recognition', (req, res) => res.sendFile(path.join(viewsDir, 'batchFaceRecognition.html')))
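
Reviewer note: every route visible in this hunk maps a snake_case URI to a camelCase view file, so each new example needs a matching pair of names. The sketch below shows that mapping as a small pure function. `viewFileForUri` is a hypothetical helper, not something in `examples/server.js`, and the naming convention is only inferred from the routes shown here; it may not hold for every route in the file.

```javascript
// Hypothetical helper (not part of examples/server.js): derive the view file
// name from a route URI by camel-casing its snake_case segments.
function viewFileForUri(uri) {
  const camelCased = uri.replace(/_([a-z0-9])/g, (match, ch) => ch.toUpperCase())
  return `${camelCased}.html`
}

// Route URIs copied from the hunk above.
const uris = [
  'tiny_yolov2_face_detection',
  'tiny_yolov2_face_recognition',
  'batch_face_landmarks'
]

// Print each route next to the view file the convention implies.
uris.forEach(uri => console.log(`/${uri} -> ${viewFileForUri(uri)}`))
```

With a helper like this, the route list could be registered in a loop instead of one `app.get` call per example, at the cost of making the file names less greppable.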

examples/views/faceDetectionVideo.html

Lines changed: 10 additions & 6 deletions
@@ -53,6 +53,15 @@
 let modelLoaded = false
 let result

+let forwardTimes = []
+
+function updateTimeStats(timeInMs) {
+  forwardTimes = [timeInMs].concat(forwardTimes).slice(0, 30)
+  const avgTimeInMs = forwardTimes.reduce((total, t) => total + t) / forwardTimes.length
+  $('#time').val(`${Math.round(avgTimeInMs)} ms`)
+  $('#fps').val(`${faceapi.round(1000 / avgTimeInMs)}`)
+}
+
 function onIncreaseThreshold() {
   minConfidence = Math.min(faceapi.round(minConfidence + 0.1), 1.0)
   $('#minConfidence').val(minConfidence)
@@ -63,11 +72,6 @@
   $('#minConfidence').val(minConfidence)
 }

-function displayTimeStats(timeInMs) {
-  $('#time').val(`${timeInMs} ms`)
-  $('#fps').val(`${faceapi.round(1000 / timeInMs)}`)
-}
-
 async function onPlay(videoEl) {
   if(videoEl.paused || videoEl.ended || !modelLoaded)
     return false
@@ -79,7 +83,7 @@

   const ts = Date.now()
   result = await faceapi.locateFaces(videoEl, minConfidence)
-  displayTimeStats(Date.now() - ts)
+  updateTimeStats(Date.now() - ts)

   faceapi.drawDetection('overlay', result.map(det => det.forSize(width, height)))
   setTimeout(() => onPlay(videoEl))
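
Reviewer note: `updateTimeStats` replaces the per-frame reading with an average over the 30 most recent forward passes, which smooths the displayed time and FPS. The same logic can be sketched without jQuery or `faceapi` as below. `pushSample` and `average` are hypothetical names, and `Math.round` stands in for `faceapi.round`, whose exact rounding precision is not shown in this diff.

```javascript
// Prepend the newest sample and keep at most `windowSize` entries, newest
// first, mirroring how the diff maintains `forwardTimes`.
function pushSample(samples, value, windowSize = 30) {
  return [value].concat(samples).slice(0, windowSize)
}

// Arithmetic mean of the samples. Returns NaN for an empty array, so callers
// should push at least one sample first (the diff always does).
function average(samples) {
  return samples.reduce((total, t) => total + t, 0) / samples.length
}

// Simulated forward-pass durations in milliseconds (made-up values).
let forwardTimes = []
for (const timeInMs of [40, 60, 50]) {
  forwardTimes = pushSample(forwardTimes, timeInMs)
}

const avgTimeInMs = average(forwardTimes)
console.log(`${Math.round(avgTimeInMs)} ms`)
console.log(`${Math.round(1000 / avgTimeInMs)} fps`)
```

Passing an initial value of `0` to `reduce` is a small deviation from the diff; it avoids a `TypeError` if the function is ever called on an empty array.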

examples/views/tinyYolov2FaceDetectionVideo.html

Lines changed: 10 additions & 6 deletions
@@ -63,6 +63,15 @@
 let sizeType = 'md'
 let modelLoaded = false

+let forwardTimes = []
+
+function updateTimeStats(timeInMs) {
+  forwardTimes = [timeInMs].concat(forwardTimes).slice(0, 30)
+  const avgTimeInMs = forwardTimes.reduce((total, t) => total + t) / forwardTimes.length
+  $('#time').val(`${Math.round(avgTimeInMs)} ms`)
+  $('#fps').val(`${faceapi.round(1000 / avgTimeInMs)}`)
+}
+
 function onIncreaseThreshold() {
   scoreThreshold = Math.min(faceapi.round(scoreThreshold + 0.1), 1.0)
   $('#scoreThreshold').val(scoreThreshold)
@@ -78,11 +87,6 @@
   $('#sizeType').val(sizeType)
 }

-function displayTimeStats(timeInMs) {
-  $('#time').val(`${timeInMs} ms`)
-  $('#fps').val(`${faceapi.round(1000 / timeInMs)}`)
-}
-
 async function onPlay(videoEl) {
   if(videoEl.paused || videoEl.ended || !modelLoaded)
     return false
@@ -99,7 +103,7 @@

   const ts = Date.now()
   result = await faceapi.tinyYolov2(videoEl, forwardParams)
-  displayTimeStats(Date.now() - ts)
+  updateTimeStats(Date.now() - ts)

   faceapi.drawDetection('overlay', result.map(det => det.forSize(width, height)))
   setTimeout(() => onPlay(videoEl))

examples/views/tinyYolov2FaceDetectionWebcam.html

Lines changed: 11 additions & 7 deletions
@@ -64,6 +64,15 @@
 let sizeType = '160'
 let modelLoaded = false

+let forwardTimes = []
+
+function updateTimeStats(timeInMs) {
+  forwardTimes = [timeInMs].concat(forwardTimes).slice(0, 30)
+  const avgTimeInMs = forwardTimes.reduce((total, t) => total + t) / forwardTimes.length
+  $('#time').val(`${Math.round(avgTimeInMs)} ms`)
+  $('#fps').val(`${faceapi.round(1000 / avgTimeInMs)}`)
+}
+
 function onIncreaseThreshold() {
   scoreThreshold = Math.min(faceapi.round(scoreThreshold + 0.1), 1.0)
   $('#scoreThreshold').val(scoreThreshold)
@@ -79,11 +88,6 @@
   $('#sizeType').val(sizeType)
 }

-function displayTimeStats(timeInMs) {
-  $('#time').val(`${timeInMs} ms`)
-  $('#fps').val(`${faceapi.round(1000 / timeInMs)}`)
-}
-
 async function onPlay(videoEl) {
   if(videoEl.paused || videoEl.ended || !modelLoaded)
     return false
@@ -100,7 +104,7 @@

   const ts = Date.now()
   result = await faceapi.tinyYolov2(videoEl, forwardParams)
-  displayTimeStats(Date.now() - ts)
+  updateTimeStats(Date.now() - ts)

   faceapi.drawDetection('overlay', result.map(det => det.forSize(width, height)))
   setTimeout(() => onPlay(videoEl))
@@ -124,7 +128,7 @@
 }

 $(document).ready(function() {
-  renderNavBar('#navbar', 'tiny_yolov2_face_detection_video')
+  renderNavBar('#navbar', 'tiny_yolov2_face_detection_webcam')

   const sizeTypeSelect = $('#sizeType')
   sizeTypeSelect.val(sizeType)
