Hello, this is Goto from Toranoana Lab.

This is the day 11 article in our summer series.
The previous entry was "Gemini APIを触ってみる" (Trying out the Gemini API) by A.M.

Do you like rockhopper penguins? With their distinctive yellow crest feathers, unique looks, and comical movements, they are utterly charming!

In this article, I take the pre-trained face landmark detection model published for TensorFlow.js, run inference on camera video, and use the results to draw rockhopper-penguin-style eyebrows.

TensorFlow.js

www.tensorflow.org

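TensorFlow.js is a JavaScript library developed by Google that makes it possible to train and run machine learning models in JavaScript runtimes. This lets web developers build machine learning into web applications with familiar technology, without a dedicated backend server.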

Main technologies used

TensorFlow.js: a JavaScript library for machine learning
tfjs-models/face-landmarks-detection: a pre-trained face landmark detection model
TypeScript: TensorFlow.js libraries such as @tensorflow/tfjs ship with TypeScript support, so you can develop with types!

Face landmark detection

This time I use a model that is already provided for TensorFlow.js.

github.com

Installation

I installed the packages by following the documentation.

github.com

npm install @tensorflow/tfjs-core
npm install @tensorflow/tfjs-converter
npm install @tensorflow/tfjs-backend-webgl
npm install @tensorflow-models/face-detection
npm install @tensorflow-models/face-landmarks-detection
npm install @mediapipe/face_mesh

Note: To work around errors, I installed some of the packages with the --legacy-peer-deps option.
Note: @mediapipe/face_mesh was required at runtime, so I installed it as well.

Preparing the model

Load the model through the package and configure its options.

github.com

// main.ts
// Some parts are omitted
import * as faceLandmarksDetection from "@tensorflow-models/face-landmarks-detection";
import "@tensorflow/tfjs-core";
import "@tensorflow/tfjs-backend-webgl";
import { MediaPipeFaceMeshTfjsModelConfig } from "@tensorflow-models/face-landmarks-detection";

async function loadModel() {
  const model = faceLandmarksDetection.SupportedModels.MediaPipeFaceMesh;
  const detectorConfig: MediaPipeFaceMeshTfjsModelConfig = {
    runtime: "tfjs",
    refineLandmarks: false,
  };
  return await faceLandmarksDetection.createDetector(model, detectorConfig);
}
async function main() {
  const detector = await loadModel();
}

main();

Inference

Now that the model is ready, let's run inference!
For estimateFaces, I pass staticImageMode: false as an option, which makes the inference behavior well suited to video.

https://github.com/tensorflow/tfjs-models/tree/master/face-landmarks-detection/src/tfjs#run-inference

// main.ts
// Some parts are omitted
const video = document.getElementById("video") as HTMLVideoElement;
const canvas = document.getElementById("canvas") as HTMLCanvasElement;
const ctx = canvas.getContext("2d") as CanvasRenderingContext2D;

async function setupCamera() {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: true,
  });
  video.srcObject = stream;
  return new Promise((resolve) => {
    video.onloadedmetadata = () => {
      resolve(video);
    };
  });
}

async function detectFace(detector: FaceLandmarksDetector) {
  const estimationConfig = {
    flipHorizontal: false,
    staticImageMode: false,
  };
  const faces = await detector.estimateFaces(video, estimationConfig);
}

async function main() {
  await setupCamera();
  const detector = await loadModel();

  video.play();
  setInterval(() => {
    detectFace(detector);
  }, 100);
}

main();
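
One thing to watch with the setInterval approach above: if a single inference takes longer than the 100 ms interval, calls to detectFace can pile up. Below is a minimal sketch of one way to skip a tick while the previous run is still in flight. createGuardedRunner is a hypothetical helper written for this article, not part of TensorFlow.js, and it assumes the task is an async function.

```typescript
// Hypothetical helper (not part of TensorFlow.js): wraps an async task so that
// a new run is skipped while the previous one is still in flight.
// Assumes `task` is an async function, i.e. it never throws synchronously.
function createGuardedRunner(task: () => Promise<void>): () => boolean {
  let running = false;
  return () => {
    if (running) {
      // The previous run has not finished yet, so skip this tick.
      return false;
    }
    running = true;
    task().then(
      () => {
        running = false;
      },
      (error: unknown) => {
        // Log the failure and release the guard so later ticks can run.
        console.error("task failed:", error);
        running = false;
      }
    );
    return true;
  };
}
```

In main, this could be wired in as setInterval(createGuardedRunner(() => detectFace(detector)), 100), so that slow frames are dropped instead of queued.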

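Interpreting the inference results

Let's read through the output of estimateFaces.
The output is an array with one entry per detected face, each containing the face's bounding box and an array of keypoints.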

[
  {
    "keypoints": [
      {
        "x": 331.91830494912404,
        "y": 269.55585071912424,
        "z": -15.809893296379931,
        "name": "lips"
      },
      {
        "x": 334.5417844648529,
        "y": 243.24231764032362,
        "z": -42.690887722326195
      },
      // ...
    ],
    "box": {
      "xMin": 230.61824963185705,
      "yMin": 95.33102606265295,
      "xMax": 436.0677465572189,
      "yMax": 329.50020553335946,
      "width": 205.44949692536187,
      "height": 234.16917947070652
    }
  }
]
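
To make the shape above easier to work with, here is a small sketch that models it with local types and adds two helpers. The interfaces and the function names boxCenter and keypointsByName are illustrative assumptions that mirror the JSON shown above; they are not imported from the library. The sample values are copied from the output above.

```typescript
// Local, illustrative types mirroring the JSON output shown above
// (assumption: defined here for the example, not imported from the library).
interface Keypoint {
  x: number;
  y: number;
  z?: number;
  name?: string; // only some keypoints carry a name, e.g. "lips"
}

interface BoundingBox {
  xMin: number;
  yMin: number;
  xMax: number;
  yMax: number;
  width: number;
  height: number;
}

interface Face {
  keypoints: Keypoint[];
  box: BoundingBox;
}

// Returns the center point of a face's bounding box.
function boxCenter(box: BoundingBox): { x: number; y: number } {
  return { x: (box.xMin + box.xMax) / 2, y: (box.yMin + box.yMax) / 2 };
}

// Returns every keypoint whose `name` matches the given label.
function keypointsByName(face: Face, name: string): Keypoint[] {
  return face.keypoints.filter((keypoint) => keypoint.name === name);
}

// The first two keypoints and the box from the sample output above.
const sampleFace: Face = {
  keypoints: [
    { x: 331.91830494912404, y: 269.55585071912424, z: -15.809893296379931, name: "lips" },
    { x: 334.5417844648529, y: 243.24231764032362, z: -42.690887722326195 },
  ],
  box: {
    xMin: 230.61824963185705,
    yMin: 95.33102606265295,
    xMax: 436.0677465572189,
    yMax: 329.50020553335946,
    width: 205.44949692536187,
    height: 234.16917947070652,
  },
};

const center = boxCenter(sampleFace.box);
const lips = keypointsByName(sampleFace, "lips");
```

Note that in the sample, width and height are simply xMax - xMin and yMax - yMin.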

You can check which keypoint index corresponds to which position on the face here:

github.com

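Drawing rockhopper penguin eyebrows

Now let's draw some cool yellow eyebrows!
This time I connect keypoints to draw something that looks the part.

The drawing code connects the selected keypoints in order and fills the result as a canvas path.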

function drawEyebrow(keypoints: Keypoint[]) {
  if (keypoints.length >= 2) {
    ctx.fillStyle = "yellow";
    ctx.beginPath();
    ctx.moveTo(keypoints[0].x, keypoints[0].y);
    keypoints.forEach((keypoint) => {
      ctx.lineTo(keypoint.x, keypoint.y);
    });
    ctx.closePath();
    ctx.fill();
  }
}
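
As an alternative to issuing moveTo and lineTo calls directly, the same closed polygon can be expressed as SVG path data, which is easy to log and inspect. The sketch below is a variation on drawEyebrow, not the code used in this article; toSvgPath and the local Point type are hypothetical names introduced for the example.

```typescript
// Local, illustrative point type (assumption: only x and y are needed here).
interface Point {
  x: number;
  y: number;
}

// Hypothetical helper: converts ordered points into SVG path data for a
// closed polygon. Returns an empty string when there are fewer than 2 points,
// matching the length check in drawEyebrow.
function toSvgPath(points: Point[]): string {
  if (points.length < 2) {
    return "";
  }
  const segments = points.map(
    // The first point starts the path ("M"); the rest are line segments ("L").
    (point, index) => `${index === 0 ? "M" : "L"} ${point.x} ${point.y}`
  );
  // "Z" closes the path back to the first point.
  return [...segments, "Z"].join(" ");
}
```

In the browser, the resulting string can be passed to the Path2D constructor and filled with ctx.fill(new Path2D(toSvgPath(keypoints))).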

async function detectFace(detector: FaceLandmarksDetector) {
  const estimationConfig = {
    flipHorizontal: false,
    staticImageMode: false,
  };
  const faces = await detector.estimateFaces(video, estimationConfig);

  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);

  const rightEyebrowCorners = [107, 65, 52, 53, 46, 234, 139, 162, 71, 21, 68];
  const leftEyebrowCorners = [336, 296, 300, 293, 334, 333, 337];

  faces.forEach((face) => {
    const rightEyebrowKeypoints = rightEyebrowCorners.map(
      (index) => face.keypoints[index]
    );
    const leftEyebrowKeypoints = leftEyebrowCorners.map(
      (index) => face.keypoints[index]
    );

    // Draw the eyebrows
    drawEyebrow(rightEyebrowKeypoints);
    drawEyebrow(leftEyebrowKeypoints);
  });
}
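
detectFace stretches each video frame to canvas.width by canvas.height before drawing the eyebrows. If the keypoints are reported in the pixel coordinates of the source frame (an assumption about the detector output that is worth verifying in your own setup) and the canvas has a different size, the eyebrows will land in the wrong place. Here is a minimal sketch of rescaling points between the two coordinate spaces; scalePoints, Point2D, and Size are hypothetical names introduced for this example.

```typescript
// Local, illustrative types (assumption: not imported from the library).
interface Point2D {
  x: number;
  y: number;
}

interface Size {
  width: number;
  height: number;
}

// Hypothetical helper: maps points from the source frame's coordinate space
// to the target (canvas) coordinate space by scaling each axis independently.
function scalePoints(points: Point2D[], source: Size, target: Size): Point2D[] {
  if (source.width <= 0 || source.height <= 0) {
    throw new Error("source size must be positive");
  }
  const scaleX = target.width / source.width;
  const scaleY = target.height / source.height;
  return points.map((point) => ({
    x: point.x * scaleX,
    y: point.y * scaleY,
  }));
}
```

With this, the eyebrow keypoints could be passed through scalePoints before drawEyebrow, using the video's frame size as the source and the canvas size as the target. When both sizes match, the points come back unchanged.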

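Wrap-up

Using the pre-trained TensorFlow.js face landmark detection model, I built a web app that processes camera video.
With TensorFlow.js you can work with machine learning in familiar JavaScript, which lowers the barrier to building machine learning into web apps!
I highly recommend TensorFlow.js to web developers who want to get started with machine learning!
There are also many other interesting models besides the one used here, so please take a look!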

Finally, let me introduce some recommended aquariums where you can meet rockhopper penguins.