iterative · daavoo · Aug 3, 2023
diff --git a/.devcontainer.json b/.devcontainer.json
@@ -1,6 +1,7 @@
 {
   "name": "example-repos-dev",
   "image": "mcr.microsoft.com/devcontainers/python:3.10",
+  "runArgs": ["--ipc=host"],
   "extensions": ["Iterative.dvc", "ms-python.python", "redhat.vscode-yaml"],
   "features": {
       "ghcr.io/devcontainers/features/nvidia-cuda:1": {

diff --git a/example-get-started-experiments/code/README.md b/example-get-started-experiments/code/README.md
@@ -1,7 +1,5 @@
-[![DVC](https://img.shields.io/badge/-Open_in_Studio-grey.svg?style=flat-square&logo=dvc)](https://studio.iterative.ai/team/Iterative/projects/example-get-started-experiments-y8toqd433r) 
-[![DVC-metrics](https://img.shields.io/badge/dynamic/json?style=flat-square&colorA=grey&colorB=F46737&label=Dice%20Metric&url=https://github.com/iterative/example-get-started-experiments/raw/main/results/evaluate/metrics.json&query=dice_multi)](https://github.com/iterative/example-get-started-experiments/raw/main/results/evaluate/metrics.json)
-
-[Train Report](./results/train/report.md) - [Evaluation Report](./results/evaluate/report.md)
+[![DVC Studio](https://img.shields.io/badge/-Open_in_Studio-grey.svg?style=flat-square&logo=dvc)](https://studio.iterative.ai/team/Iterative/projects/example-get-started-experiments-y8toqd433r) 
+[![DVC-metrics](https://img.shields.io/badge/dynamic/json?style=flat-square&colorA=grey&colorB=F46737&label=mAP50%28M%29&url=https://github.com/iterative/example-get-started-experiments/raw/main/dvclive/metrics.json&query=metrics/mAP50(M))](https://github.com/iterative/example-get-started-experiments/raw/main/dvclive/metrics.json)
 
 # DVC Get Started: Experiments
 
@@ -11,8 +9,6 @@ This is an auto-generated repository for use in [DVC](https://dvc.org)
 This is a Computer Vision (CV) project that solves the problem of segmenting out 
 swimming pools from satellite images. 
 
-[Example results](./results/evaluate/plots/images/)
-
 We use a slightly modified version of the [BH-Pools dataset](http://patreo.dcc.ufmg.br/2020/07/29/bh-pools-watertanks-datasets/):
 we split the original 4k images into tiles of 1024x1024 pixels.
 
@@ -58,7 +54,7 @@ $ dvc pull
 ## Running in your environment
 
 Run [`dvc exp run`](https://man.dvc.org/exp/run) to reproduce the
-[pipeline](https://dvc.org/doc/user-guide/pipelines/defining-pipelinese):
+[pipeline](https://dvc.org/doc/user-guide/pipelines/defining-pipelines):
 
 ```console
 $ dvc exp run
@@ -107,47 +103,3 @@ This tag also contains a GitHub Actions workflow that reruns the pipeline if any
  changes are introduced to the pipeline-related files. 
 [CML](https://cml.dev/) is used in this workflow to provision a cloud-based GPU 
 machine as well as report model performance results in Pull Requests.
-
-## Deploying the model
-
-Check out the [PR](https://github.com/iterative/example-get-started-experiments/pulls)
-that adds this model to 
-[Iterative Studio Model Registry](https://dvc.org/doc/studio/user-guide/model-registry/what-is-a-model-registry).
-You can [trigger CI/CD](https://dvc.org/doc/studio/user-guide/model-registry/use-models#deploying-and-publishing-models-in-cicd)
-by [registering versions](https://dvc.org/doc/studio/user-guide/model-registry/register-version)
-and [assigning stages](https://dvc.org/doc/studio/user-guide/model-registry/assign-stage)
-in Model Registry, building and publishing Docker images with the model,
-or deploying the model to the cloud.
-
-## Project structure
-
-The data files, DVC files, and results change as stages are created one by one.
-After cloning and using [`dvc pull`](https://man.dvc.org/pull) to download
-data, models, and plots tracked by DVC, the workspace should look like this:
-
-```console
-$ tree -L 2
-.
-├── LICENSE
-├── README.md
-├── data.            # <-- Directory with raw and intermediate data
-│   ├── pool_data    # <-- Raw image data
-│   ├── pool_data.dvc # <-- .dvc file - a placeholder/pointer to raw data
-│   ├── test_data    # <-- Processed test data
-│   └── train_data   # <-- Processed train data
-├── dvc.lock
-├── dvc.yaml         # <-- DVC pipeline file
-├── models
-│   └── model.pkl    # <-- Trained model file
-├── notebooks
-│   └── TrainSegModel.ipynb # <-- Initial notebook (refactored into `dvc.yaml`) 
-├── params.yaml      # <-- Parameters file
-├── requirements.txt # <-- Python dependencies needed in the project
-├── results          # <-- DVCLive reports and plots
-│   ├── evaluate
-│   └── train
-└── src              # <-- Source code to run the pipeline stages
-    ├── data_split.py
-    ├── evaluate.py
-    └── train.py
-```
diff --git a/example-get-started-experiments/code/TrainSegModel.ipynb b/example-get-started-experiments/code/TrainSegModel.ipynb
@@ -0,0 +1,178 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import shutil\n",
+    "from pathlib import Path\n",
+    "\n",
+    "import cv2\n",
+    "from ultralytics import YOLO\n",
+    "\n",
+    "DATA = Path(\"datasets\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Load data and split it into train/test\n",
+    "\n",
+    "We have some [data in DVC](https://dvc.org/doc/start/data-management/data-versioning) that we can pull. \n",
+    "\n",
+    "This data includes:\n",
+    "* satellite images\n",
+    "* masks of the swimming pools in each satellite image\n",
+    "\n",
+    "DVC can help connect your data to your repo, but it isn't necessary to have your data in DVC to start tracking experiments with DVC and DVCLive."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!dvc pull"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Convert to YOLO Dataset format\n",
+    "\n",
+    "See the [Ultralytics segmentation dataset format](https://docs.ultralytics.com/datasets/segment/)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def mask_to_yolo_annotation(mask):\n",
+    "    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)\n",
+    "    annotation = \"\"\n",
+    "    for contour in contours:\n",
+    "        single_annotation = \"0\"\n",
+    "        for col, row in contour.reshape(-1, 2):\n",
+    "            single_annotation += f\" {round(col / mask.shape[1], 3)} {round(row / mask.shape[0], 3)}\"\n",
+    "        annotation += f\"{single_annotation}\\n\"\n",
+    "    return annotation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "test_regions = [\"REGION_1-\"]\n",
+    "\n",
+    "train_data_dir = DATA / \"yolo_dataset\" / \"train\"\n",
+    "train_data_dir.mkdir(exist_ok=True, parents=True)\n",
+    "test_data_dir = DATA / \"yolo_dataset\" / \"val\"\n",
+    "test_data_dir.mkdir(exist_ok=True, parents=True)\n",
+    "\n",
+    "for img_path in DATA.glob(\"pool_data/images/*.jpg\"):\n",
+    "    yolo_annotation = mask_to_yolo_annotation(\n",
+    "        cv2.imread(\n",
+    "            str(DATA / \"pool_data\" / \"masks\" / f\"{img_path.stem}.png\"),\n",
+    "            cv2.IMREAD_GRAYSCALE\n",
+    "        )\n",
+    "    )\n",
+    "\n",
+    "    if any(region in str(img_path) for region in test_regions):\n",
+    "        dst = test_data_dir / img_path.name\n",
+    "    else:\n",
+    "        dst = train_data_dir / img_path.name\n",
+    "\n",
+    "    shutil.copy(img_path, dst)\n",
+    "    dst.with_suffix(\".txt\").write_text(yolo_annotation)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "yolo_dataset_yaml = DATA / \"yolo_dataset.yaml\"\n",
+    "yolo_dataset_yaml.write_text(\n",
+    "    \"\"\"\n",
+    "path: ./yolo_dataset\n",
+    "train: train\n",
+    "val: val\n",
+    "\n",
+    "names:\n",
+    "  0: pool\n",
+    "    \"\"\"\n",
+    ")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Train model\n",
+    "Set up model training, using DVCLive to capture the results of each experiment."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "imgsz = 512\n",
+    "epochs = 20\n",
+    "model = \"yolov8n-seg.pt\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "yolo = YOLO(model)\n",
+    "\n",
+    "yolo.train(data=yolo_dataset_yaml, epochs=epochs, imgsz=imgsz)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.6"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "949777d72b0d2535278d3dc13498b2535136f6dfe0678499012e853ee9abcab1"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
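
Note on the annotation format written by `mask_to_yolo_annotation` in the notebook above: each label line is a class id followed by polygon vertices as `x y` pairs, with `x` normalized by image width and `y` by image height. OpenCV returns contour points in `(x, y)` order, so the first value of each point is the column. The following is a minimal sketch of that formatting step in plain Python, without OpenCV; the function name `polygon_to_yolo_line` and its signature are hypothetical and not part of this PR.

```python
def polygon_to_yolo_line(points, width, height, class_id=0):
    """Format one polygon as a YOLO segmentation label line.

    `points` is a sequence of (x, y) pixel coordinates, the order OpenCV
    uses for contour points. This helper is illustrative only; its name
    and signature are assumptions, not code from the notebook.
    """
    fields = [str(class_id)]
    for x, y in points:
        # Normalize x by image width and y by image height, rounded to 3 places.
        fields.append(str(round(x / width, 3)))
        fields.append(str(round(y / height, 3)))
    return " ".join(fields)


# A triangle on a 1024x512 image (wider than tall, so axis order matters).
line = polygon_to_yolo_line([(0, 0), (512, 0), (512, 256)], width=1024, height=512)
print(line)  # prints "0 0.0 0.0 0.5 0.0 0.5 0.5"
```

Using a non-square image in the example makes a swapped axis visible: dividing `y` by the width would give `0.25` instead of `0.5` for the last vertex.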
diff --git a/example-get-started-experiments/code/data/.gitignore b/example-get-started-experiments/code/data/.gitignore
diff --git a/example-get-started-experiments/code/datasets/.gitignore b/example-get-started-experiments/code/datasets/.gitignore
@@ -0,0 +1,2 @@
+/pool_data
+/yolo_dataset
diff --git a/...arted-experiments/code/data/pool_data.dvc → ...d-experiments/code/datasets/pool_data.dvc b/...arted-experiments/code/data/pool_data.dvc → ...d-experiments/code/datasets/pool_data.dvc