{ "cells": [ { "cell_type": "markdown", "id": "09937637-0401-42a3-a54e-bf20a3256464", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "id": "fbf7bc42-ba7d-498f-9b82-09584215a5db", "metadata": {}, "source": [ "# Analyzing Hugging Face Datasets\n", "\n", "[![Open in Google Colab](https://img.shields.io/badge/Open%20in%20Colab-blue?style=for-the-badge&logo=google-colab&labelColor=gray)](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/analyzing-hf-datasets.ipynb)\n", "[![Open in Kaggle](https://img.shields.io/badge/Open%20in%20Kaggle-blue?style=for-the-badge&logo=kaggle&labelColor=gray)](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/analyzing-hf-datasets.ipynb)\n", "[![Explore the Docs](https://img.shields.io/badge/Explore%20the%20Docs-blue?style=for-the-badge&labelColor=gray&logo=read-the-docs)](https://visual-layer.readme.io/docs/analyzing-hf-datasets)\n", "\n", "This notebook shows how you can use fastdup to analyze any dataset from [Hugging Face Datasets](https://huggingface.co/docs/datasets/index).\n", "\n", "We will analyze an image classification dataset for:\n", "\n", "+ Duplicates / near-duplicates\n", "+ Outliers\n", "+ Wrong labels" ] }, { "cell_type": "markdown", "id": "34d4d2db", "metadata": {}, "source": [ "## Installation" ] }, { "cell_type": "code", "execution_count": null, "id": "9e2c36c9", "metadata": {}, "outputs": [], "source": [ "import sys\n", "if \"google.colab\" in sys.modules:\n", " # Running in Google Colab\n", " !pip install --force-reinstall --no-cache-dir numpy==1.26.4 scipy fastdup datasets\n", "else:\n", " # Running outside Colab\n", " !pip install -Uq fastdup datasets\n" ] }, { "cell_type": "code", "execution_count": null, "id": "68fcab3a", "metadata": {}, "outputs": [], "source": [ "!pip install -Uq pillow" ] }, { "cell_type": "markdown", "id": "4dea523f", "metadata": {}, "source": [ "Now, test the installation. If there's no error message, we are ready to go." 
] }, { "cell_type": "code", "execution_count": 2, "id": "655330c1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'2.0.21'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import fastdup\n", "fastdup.__version__" ] }, { "cell_type": "markdown", "id": "40145087", "metadata": {}, "source": [ "## Load Dataset\n", "\n", "In this example, we load the Tiny ImageNet dataset from [Hugging Face Datasets](https://huggingface.co/datasets).\n", "\n", "Tiny ImageNet contains 100,000 images of 200 classes (500 for each class) downsized to 64×64 colored images. Each class has 500 training images, 50 validation images, and 50 test images.\n", "\n", "Let's load the dataset into our local directory." ] }, { "cell_type": "code", "execution_count": 3, "id": "9fb0fffc-ba54-4b77-beff-e068ae2f7753", "metadata": {}, "outputs": [], "source": [ "from fastdup.datasets import FastdupHFDataset" ] }, { "cell_type": "code", "execution_count": 4, "id": "ae56315c-07d6-4559-9a4d-fdf1772918e7", "metadata": {}, "outputs": [], "source": [ "dataset = FastdupHFDataset(\"zh-plus/tiny-imagenet\", split=\"train\")" ] }, { "cell_type": "markdown", "id": "be18cac4", "metadata": {}, "source": [ "We can inspect the `dataset` object." 
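, "\n", "\n", "One thing worth checking during inspection is the class balance described above (200 classes with 500 training images each, 100,000 in total). The sketch below uses only the standard library and a handful of made-up labels; `check_balance` is a hypothetical helper written for this illustration, not a fastdup or Hugging Face Datasets API.

```python
from collections import Counter


def check_balance(labels, expected_per_class):
    # Count images per label and return only the classes whose count
    # differs from the expected number.
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n != expected_per_class}


# Toy labels for illustration only: classes 0 and 1 have 3 images, class 2 has 2.
toy_labels = [0, 0, 0, 1, 1, 1, 2, 2]
print(check_balance(toy_labels, expected_per_class=3))  # prints {2: 2}
```

For the training split used here, the labels would presumably come from the `label` feature of the dataset and the expected count would be 500; an empty dictionary would then indicate a perfectly balanced split."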
] }, { "cell_type": "code", "execution_count": 5, "id": "85ea7e08", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Dataset({\n", " features: ['image', 'label'],\n", " num_rows: 100000\n", "})" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset" ] }, { "cell_type": "code", "execution_count": 6, "id": "3e05ba85", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'image': ,\n", " 'label': 0}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset[0]" ] }, { "cell_type": "code", "execution_count": 7, "id": "e1078a54", "metadata": {}, "outputs": [ { "data": { "image/jpeg": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCABAAEADASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwDskuERD+8HTPBzSR3aSSr5TNjOGPrWQbmRbMqsy5IxtVdvFMsbtwzBiRkjLYqPrjfw6GWLzWpiYOnZJPc7S3QTKMc59KfcgQr+8JBFZ+j3UhjlZP3ihiPmIAH0qW+uhM4jYgD2Oa7JYmNSm2eQsM6ckjo4Y91hCUJB2Agj3qIJ5jNGSpdMMMZ6f0qSzmQWMYHVFCkfQUyN991xwdnzfn/+uuWUko3Z6kdGkTxzKQxZsADOTWbqk0T6LfTp8p+zsSCMEfKetXDIqSAHl1OxuevoazvFMwi8OXrYALKFJ9iQP60k9LocrWZ5zHdfaozFJMBIPu5GMipYA8Uu/cnyDpn71U7tYoZ1yQHU8Ov3T7GlBEc+4NkHnr3rz4rqccKV5GraQrPvneeRQrHKK2B65qaSbP3ScjjGf8/59Kp21zGkbq7BQT1pHuA5/dce561tG63Oz2MpT0OqsNXEdzB9plCRkbXOfyz/
AJ71fl8S6NFLv+1kMD2jY/riuOtbCW5Oe3XJNU9RdbW8S1QbpW4z6ZFNzVrM7IYVzdrnpS6rZzK0sFzHKjEElWBwSOPp0rA8a3oXw0SDkSzBM/TJ/pXnd1e/Zbh3thPthZUlmyF2lugAzlvw9R06Vt381/f6MltdPmONshlAzkZGCO/X2rSE7RszCtQcW7O5gwXJjiMZPDAEjPGRUgl3tknpWaJ4dwPnISemCDmnqLq4DLbrhB959uTj2rne5lFKO5prKZCAe5q9HKIzyOvFc5LdQacPMY/ORgeppbTVLu+kYvCqRg/KwbFU03sdcKsep21tqYhiwMdOtYl80r6pHfRAFM/Nu7HvUUd7BawN5x38HAJxj61zep+NzaxtbW9uMPyRu6UowbN6ddU5XOou9Nsr+8F3vCI+HePzNqkjpuXocVV1rxLDZutrbzRSMuFJUggepJFcTca9c6hpkLQrIsruUfZwvAH88/oadpGlC1vXm1SOQx8bUH8R7Z9qv2a6mFbELojXhtoftyh1UAfxE45roYxdhTGjKEI4G3H8qxbjOC2C7Zz0qW01Ge2xufdjoh6is2rq5wyg3qipd6aJrvEku+aJskY4xTjItsGbI2rzkUpvI5llWVW3k7jIHA5+lUcPcoCsR8sH7xP4f5+ta/CjWnDXUoXN5e6kXKsYIgQFTHJHvVu1sbd1/wBKiR3xwfalldpWZdq5GCSox+B4FLPcyCMhnJQDaMDIX6Z5H4Uc9jZxiNtbnTrORolIKtztx3/zipjqJdwq4aJeEVu1UtisiMNvA5KnJJ605Y/LYkyE44KsuCvPenzXBU0f/9k=", "image/png": "iVBORw0KGgoAAAANSUhEUgAAAEAAAABACAIAAAAlC+aJAAAg90lEQVR4AS2ayY4cWZaebbg2Dz7ERDI4ZWblVKpu9aBu5KIFbbTQA2gh9FtoJegN9Ajdy94I6I0AtQQU0JIWmgBJKKmrKlFisipJBoNBMiJ8NHOb7ZqZvuOUZ2RmhLu52b1n+M9//nPN6V/9Z8OaOt/rQ2/0HGNSbm96nWVo12h1boz3sfPhUXj3MHo7t9aOMdq+MZn83zFHbzLMaTSHcZqmKApsa7InbQ6FqQ+WPqip9bm8m06T5eXJhTLMOsuVacVhNBrDZrOJZoFt231Tt2VZF6VrmScnJ2Ga9NOgjWmwDNM0ubU5WbZpje3Yt+16m/d6nJ2djo5jKuff/M2/VUalJ8sYpt4wLWOyTNOwtDUNo1m1Bl907NFom7o/7KvGVmPoO7E/moaaLK6WiydjMuQ1DAO/mcZojvLDHflHT4NjOY6tLMsyj9dprYui6HTvhUHb9G2XO5YZhpFtmG1RZVlmuc6AVViIsg0WzjOw0mRwhzRNbcsp285x3KLrhoGbdcowYq4JRrxg6WkajZHn9v1gj5PlB3ZoWW7f6ao+aMP3fW856ZgPLGtyDNM2JttiE/hjmgbDmPgONuC944q55ygrcVwMbeEo13V127VtV7d17KSWcgJl+57jY+m+t4IgjuOxHydbVjxxY9Nk3dhl0oPceDKUbfODNXh7MMZBNjDh26M1h9HmC4qVjJ3ZuYmrw6ELjdZu+imfrN41e8eJir6yFK7hNoZjGDZ/8BzT6HucYvGuMrSyXMvyTT0qzDcYSimu4IUFRlvVUz1MY7bP5T7KFG8ru+t6VjmOo+0oWfokLsXFlsU9xDiu4mlGh+G71otC31MDIdVVSo+9PTlcMeBitulag2nW49SGulB1Zum105VW1TudqXplJmpKjCl0JtM1WOuEE4hV7GHK7iVqLYMtiQf4g0eSHh5GY516EPMZVu+Mvq3cMCqqQ344cMHs9PTk1Cu2+6asozThJqyeQDLF3A5ftierLqsokoWzI4Ky7fGUZjPqcOIql5CeOt0OtjYdqzKanS5IhM1Q7oymdoyD0rU9GGY/9pmjlpjbNTE/ISKmEsPzX/IfE0gI4lnNco221V3jYGXb5gO
eRgazLVaMT/wwCKPI87yJpZq253pWOg5d37XNpAhm0mgyHI/c4Ct6HF++fPnZ02ez2UxcMfVDX09jZ6tJXc0rh4yexrat9YTXjWKqV2Nejt1uqkpHG647era2tTJGbyin4eAM3Ne3HUcWRLwrw1YGt9Xa6HpLd5hGD207Nc3UVMQ/Nycb2RnXf8p4HLDb50EcEIKH/X5sm2WakFhKWWS4NRFFFlDBpnXXaz3WB7xVkOLD2JN2wVzydpp039Xq/zSvfcMnGHgmaGErp7a6tV1spyq3m8GzvMB2fQs0U6Ckodtqb7vKNiPHCBzHA2FcVu8btmO0vUQrgTuMmKIHYgCwDnA1Bok0YkspwRNHkc2u7gmTPM+zQx6fn8znqRrHKs8ncEwC8YhuI2Zt26rd73bPnj3zXBcE643BjYgaYyy77XatXoxvLqLzeZyAq27gB0lQN2W+3k1hOE1W3hauri6800C5th4czWJa1R6CUKV+wP3NwbAGssQkJ8gGComa7LY1a6BcdyaRNBgEUNs0mJPg6XuNw/U48Ky8yE7PzrqmdF1VVUWgQIOJnThesMmztqkkwI7x9vTpU+Ac83iVW+uW1ML12+02igM1/2bphN5gDm3X1FM1VuaqyD6Wa9tJWsqIKd7ny2arJ9DIqpbRItuTbrnVFPPFWZIkhmMemrFntfh1BJrrkWeMPcAJ2rr4i3QmLPgDz+BKCTyrG3pBJ9ziqVa34+QCMHw/y4rZ0gKw664t6iIMYvyFr7A5BvXcwPJIILc1+rqruY+6z9/uSuVaoKLtEHmTWTTNMDbVrhEANwkNxypaCt1Ud6NjNM1dsdljbX/qEteaQvJ5NPTgeZEeJ9KcKBv7BkdRlYmcgEwhbChxgu7k5giYgxmEBs+bTB0ETlMfJsNr+xo/d32txzZOZ3hwl5dAahAEfdOO1LRpYM+u58ECyrom/LCMutteY4fQj5IwGRyflGmGHsPopnFHyjkpT5i45mjYg+mym7GZuVMYBCeJq4am3t2bQTOOjoTHOHVd0zUHTUiNHZCJXRzbIhNd6j5QYVCjgUcLxAczgogI6RzXqg+NYXZYjfRwAupOGxgDKZ7VDRYZJk3MUHXZPK/AdvpB122/3WXigcuLc3LGp7JHoYY2ZIembvCmZ4MumLbDaoPyyG7l+W6kgkHbjg6UVmPVlm2b70cn6wn+eNFPRj+w/coaKs8YfNeKLAUyN01jTyoK3CMp4HYD+NrpztZYUOB2MlrT0rYa+IlTf73Ppv0UzRZEQFVWJoXOcnzbY/XAg6Mdvk5QHQ4HywFUCR5HqcCh1g2UQteePMscB4NEG3oeJeXHtR3Xb5UOiNhhLHerxtqZk46SUwJUAmUYATKATUyEwQacQapBAFRgOW1VW4PluwEb4MGsugXsJ0C/sa2+aSolvtEKLxo6SdNNlmdFBrPguUMxFnU1i2fEEoSEr+MIPIBrq6bF9Gq93buUN91PdU0SST0OXPKsO1QWKWwOMMzW6WsI3tSMzRBO7tiVjmvMA2d5OlfBrDVDsK3hYqFRALVUGWOsYGxUqPlshgeGfsLJtu2xAuyJcbAu4W66I/gzD4FX0MnCN441+aFX5cXhkHl+AjfR/QjogM8kgAHZBQj1RHXkto3uCU2Ks8tbVVv0cGNHWZRNMIQYgskqGJfrhh7ehoX0XbPLi6mqgyCyYSNEcpXVQ1EOVq3NlucD/G1psjKjm9SIdz3bnnqYjqCToSjTlErLGnvXVrqFxVi6qVU6o+5CAnmA1dQe1KHv890+nTt4VMNph0/s1+YiPACVgtqx/lo3arFYEP2GMouqLNqyG4eel27At4kIdiyIQZMXaZSeRafx5M7N6X7b6Uldv78d3t+BC3nTJsuzttNRFFPDiA8HQmlMXUv+tNaim4Up2PP2xxdffPF14Hjkd2wZoet6o19U+UmSdk2zmdpxHp/MFjbg59m56Vw8+6zqJ2UPnWMGYUKJ7AhM06jqxouDNzdvYER0Aor
V2k1LtIF0mBxbwZDIefir6zmOYfdERtP1Rq1UHPkgn3QOlM/s9rboGj9OBtvM6/z84gIIj9yAOM12h8D1Yj84HIrV6s5/KATbD5zr69dpOvfDGFraFIdx7ADZIEgHoyFWW23kVXUeLCR8qnLW1a4/Y61lpaumBuXxhrQslPQjLeVPAlLd3d0BcIQdpVEAZ8AF+IxPDcsV/o2/IAWdoXQkcEaQQKa7QQt3GnrHU0kSUTsxxHJ5Uu4PPJ245yur7WaZzLKySIrDMVDVzfubx6CG78e+69leA7xTjsaGcj1M3YC5DXsz5b2pncSvjaHKd7ZqYHpN3zmWN/bac4WZGwBJfhDaQiR0MIeyDFpvYVGSXU852uyOgGL22ASa3WhoMbtu6ybTe6soQfwgDJ4uH5NK4XwGO4fK0endr+9m0ew8vFh9vPccF5qwW629+exuu4r8kExr+7YZWi8k6xQ7HyysBO8cqJ82t4BTKUWZsGNv7p2wnbqpjc5YLCICH57ZjZ1L7Fsmxtpud1hZCOIsjcuqYqHsLPB8oAo07Npj1aAk9g3cJg7C2Iuge/t8PSPqdKsM7+LizHK9sm8xBPRzvd4ul6eHrNjW7dnpKfzx+vrmZJa2um/2jXPqQujJXmIPJkEvt93tKGtK2WTX0UoNoStMKXC1OdZ9TetmeT6tHuABF439FAilXEKY+q7frNZ4Tcq6BMwxmniAR9D7XhrPtM+bpsZzVUcNOInmoR80FT5oaZA6uHdXkTTKJXJxuA3+BKF3c3OdROl8PttsV2z77Ows3268JHYdOwh9QAO2m+12+2zr+XBx0q/VTTu0iAIUtpaCjW+N2I0Xiecoen3PCW9X28btDSioDSGsYhpQy6Eqb/c74g63qyLLhXkb036z5RnpfCYQSW9I6XVMCuCxfxeKDCfDQTSvkGn6jrquAng113kq2xV9NxL6VYHtunmUstc83y8gycZENrdlYZNhUOi2ff3DyyY/vHv7rigO5SHrSATdQr9h+Jat//CPfvbZF08H2yfio2gxViW5libL+rBvikLHURB4LDgvC4oJ7iXyWKhDUpL7sO2m6WazznV9CJJNd62nrm8PVU/zI/KGwHkjMNCPEhLwewpyp3b36+XZWZlnk6ZuOGWeUwRZN9TCw/7DcPfuPQ0h1dIaplcvXmS3d9iP9t7sdGAaoelSgjCKOer/9u9+fvXssjXtaL78s3/8T2Bt2WFX74o0nHXVwdAncDpYt3iMHB57BfwQVgAUzQxQZYNU3GwabM/Jd9nYdIkfQyibuvMdP4lieBi+DkkJYrDXcMwqPyySlLYvdDyCFDWJ1loaHYO2TY1llW1q2XANIOO/IQVs2taFf5IKPNKcwOTYth8+eLrfrRa6D2qzKbLrt3d//bv3y0eXD598TkBu9sXDy8e+sjerW9NzZrNkU2QwapXEsVAqgES5m81ut9lzz4cPFxCYieawgxqAmwPxI0Sm7SDYBD/qBTXaNuF7MGLID+ALM7EDAu+TLkBg1zC2wWprqLPAN4UZNKO1oJ/WECjpuug2afioadzf6we3bH3Tffe7K6rYUNOvVVv9odgSl/qrb/5ek8TjLIWuFDlqwH4Yu5PFDOIg6pfveO4CHc1crbdUeKATcQLBBkL9Ca2kCaQc6ZHifxRuYCfsG/SSX5SW/YNjRJ1DU0h8tN1QUxO7YKICsegJwkYNwv28BPDHyQGORiMC/RyH6+nKrLLu2+r3Pv/mx/c3aZL4gbperSFkXhD83f/6H/d3zyBOhqcObXXYbTpjVIEnlZiXcsYoSR+cnXtuWBd1tt0tZkuwgi4AwYyOmABzaB770Qa9pTDQt8CuRvzhEIK2F9Fh0oNjq7odkQd6bQ8jEQ/e4DterF1+UACOWhKqDLoAbZ5DNSL8i0OLYKL18yfPX7768dHlI3+5+J/f//pnX3/726s3+0N+dv7g4/XVenUbLWaf//TL7/74D96v7n68emMlobww8yHLASqSHO0SV5A
Mjk0CulLDaWdBLBhZQ6+IJqamHrGSrov4sTzLT1wEsFHWXbX8CB3qtT9MATanTrFdHoBsCj1xndD3o4B08udp5HGntiizbba6a/Id3Q+t/j/9Z38exMnb6xse+/O//XnZVF9+/llVZmeL+RfPHoODL375y92Hjxdp+o/+5E8UaI3ctdnsP96t2nYDbeLmaZyg3xGm4Cm1mRUQZ2wDjwOwKJeiI/bmAHDTN1L77HG32RAq6vhD/0VoHttw1o4lQFtsb9PpogEHvk+0YZFAqf3QF/kuz7aHfOfb5jJJtGX+1V//azdN//1/+ls3TbK6nOAPgUdfD8qDwj4pJOJDtK/LD++vgWnKqFQ44LQqu3yf9f4YRQlNJMBMGICz0GuSlRZEmYpqCLsDbOh2RokW+QNoa4qaqBCVEdURSYqdsnj2CUqzcN9BR6KcxXHouwq3tGXFPdBcKReIpfiHVhnrbUDlNPmb//gfZuen//xf/ov//etf/pf/+t9/ePnyq6++YgGPH1y4gbs7ZHXfHbJ9ip6x3dCaHfpuSKO5Z4fb7b7M8q6q3eU5rYNYGz1Dih5dkGiINlWxF+JB9ACoLVy/a9heHIZcbGhpQ3vZydGFtqjV6CpIE44LowO6MIwkOcE1IAZW6LwNd3JNm5QIPK9pD7SHi9P029//9uWL7/P9+urHl9/92XcXZ7NXr+7zbHXinShzIPZ8NjyLre9/9fr+ttysi6k1v/rs628+/8kyjs9i8QATAJQQ8JE1sn4iknqHeENJIy2MvjWaiv6DcKfPdDpNS4grUNa5wok9I3KkYQEgRfaGhMOv5IfV4zQk8pvrG5iUb/t1XruTM/PnULnAQxuhkvTXv33h0tcWeeQMd29/XN9dn9AEetSVbdXs2nY3n9ERizqN7mBDe7B9FKxhPwSAB9ZTNmkvyT15qmiggCHvTAPCCcxD5g6o4pKjVEFQXoRePHP8h32A+aTO0QPkLnWc+s2PBOKxAnJ7IJiAbNuenj1w6coR+byffvEk2Nx+ePrkNy9e/NVf/GXZlJdPHv/D7/6UzvL84UOYDq1C11daaBRdMbQSkiN01sbQxCPxK/qHML4OPQcKxAWsU6BQ1D5qeAfusR9yVzgUzR1jGFCSmOJ6ebEzkp4ygWLNetk/qiGWkIGEZAV/NAwjRrgjOyc1eUu4TF26NPt1pxv97ZffLmbz3736EWc9++y5PZqH7HB6erruNnf39z13Q6aqS/JR1XVNYiHiYRIKAsUdwGlqunJZJUWLX1gSQU0s8BHXEOgA1NHckIORoOJjFsY6MLvshOjuR/QO9iExg/xgNay+Uz3blg10PXiFVu4FPsQFkrfvCBvDDNzLyws1IIyoi+U5alWcJvQbq9324flD4He12RyykhnCPs8/3H7kCUrkX9btiHzHoEp0Xmkq6wDh4Yg22JUuSIY9UvkZZDAAEYTiB9vLNcdPCQ7ASnKbi6TZk4GHLJe6N3RMuPAoej8v+Zyq0vQqTpXnorOjUvVNV1ERmhpSnIYpJqM2ABG1jeZlL9KFbbkNzUdFEWVkNN5+3Hz4sEpgu5QtYRMi5wwQW4WazmOQ5iT6YcCiaUowE7HE7zEHWZRsCPxlFxOhL5xMpAzeII/RRGF88FKL8RiSsYg5Evrs0NJsgMfxrJLRhlh9mp0sYX59ickCOqRf/+b//vT3fw+pAV9Ki0AEIhymM8jloa4OVbfNy/Vm8+bdB9x9uThDMAsIDCk3QtaODfERwWVLcIFPToDLE3Wi70mUED7siSgUq/PDG/KS/336k+USMFgBaZI2SpJEemuJSLktaaXpDKbNPqNpj5dL2DtNF5STbED+X+92iNuAnuWhDKR0nDDndLa8L8p1Xry7XX28u4Pg+VF4vy+kI8PdmIpE5BmSu0wRugZugxPEB/iHf2Rt+B6bi0PYBv/DP+KXYziB+yQLVuDF82Sl8hWTzpBUhrzKyI7SxfuQkk6Tdjm1z9HzeElskXEkRlGWput/3G5
DlBHTLtuG4ocVPqzXRNeb9zcfV/c0cMxO3SjpTOOH63fKd1xMBLbQevJ4FjTAdZTiDR/RGgGj7VgKEjEWPk6A2CbtQUkNYmQF/7N9FzWJ1ckshhEYnQplD4+hKpk2mi0FmyqHbktgsU920nSoyy3Jz9co9aePL9uyuft4O4DiOEcjsu9dL9DmVO42QOAPb99c3bwzPe/9dg1dfPL8We+oQ1nQr0INhFt6tkMMCflRkxRNZSFVULwI9SgBpBLsmmWHfXkYNPcJ0IJF02O4ZoMvsKIxOTmh5DGhZeVHiQ4WW9tlSYeuXIcpFdfQATomFnPCOMUKJG5nWAd8XrdN2xw6nfUa0QWwf/vumlYQkpSV5Srbseicux/qmpLkOduaOV6dLhZffv21oiUV42EJNDyUV+LcVqgwvGRQZxudBgxyus39Dt2pPj85R65jk5M/EXaUAGnK+66oEUEAEjqeI2S5gRAG8KZjnshwlQx1/CCK0hkKFNeygfpQg2kZyocumqreluW+aXb3d0/s50zw6AyQB9+u7le7jRfFiFA0tijGUJTenM4uzmFHn//kCwUDIchJRNZqoMQZ0iv9/3ekgQFb8/1OBmxMPNNkFiQx3aIgKulBvZbhGp2y3t3coA9AyNHkkGqg6GAL1AephorTdBrGzQ4QaICovKxAJib1BBITha7pmeHl5ERdv3p7dVcVj588w1FmMB3aH5CpoW6gIQcE0LQoQygff/QP/vD551+Iss1CMBWrEQ2CVEail+SkfCIKUSHkRRPgxCGq4MXpOYEm1BpXksqmhe7MEI4Gf7fbIccWRcWKT5cM7eZsAn5/eb4kk0gq6MQBat9vGD+sVptdxoSY2RmHGA6wGOlGWkJvQEXs3l6z16+++fIXf/fLsq6IUt5MFnNaMAYsYeCenC7iJEQavL37oMhUkFYWBFIK0aKGCopjvoo5W1WRfbPFHAme3gvxCa4vbaRAkUH0N/TsCNZFRUVDYWcqxQGFTz+2j9DnZkXFVB3D47Rdfuh1xgAeMfGeXtFWbABFbLPaVBVjHowitCrLi199/+unX3z2i1/8gnXrrnVDn4MSViPz4/nJEmXl1avfHZdBByIxT7Nh04BDSxDj5PiAY2EPcYvreL5PrUEUgXURJGgXAodM/doeHp5tZfV4iQkVRx1QFGezBeNoQoiW2lNmy6zN6Gwbxm+WBfOvCVxHYZZfaM+rmh/WDgPjvty4QfDhVVXbzS6JU0E/yz85Oz17cCb51nVwcuyy3qybpl2enahfff898jpNEqOoisbiOJyK0wj0Q0MOg5R/54tkPltioXKW3H+8x/xkdl4dVuv1fp9jOEd5cFfqUZik0WzueszaSN+aWd9JitqVI/pCYUlZigZYcn+/BnCpBts83yFGdcKRsBukSAvhmk4vHiD3//F3f/r66g0HDNI0efjwcr6cbTNGxhtUDIScbMxcptfJyQLDb8sc+Gf2RDQZIeSWjHX9JKGkIz6zE4pVxwjHaJGsAS5ACXHvfr/l6Yxe/MSP5gvqFXOQCgGIoydwcpmMhO83G9Qk7kvhIAhB0k1WHBjeMc1yvTCJK8YIVovoQfBAytw4lEIKfBGPvvOzP/j7ZBdC5M3NDR56eP4Axfv29pb52jKa0SGiEruma7mDh/ZrDigME8BB5Xr2/Lnol8oSttiUpAf5gHLIBAYQpA6RKpQkUiWMEdEWNIzgAUBw1FeEYpBP5BlHQhjU8FM13eGAMrrndBNL8f0AEhuG8YMHDIBNNsCbOJb7g0oC4qYRQ5I9DxaNd8BCPLq5X2Ed+mk4qiAOmAMpQ3Ljy6isHr+GbhgHru/c399/omjUMokln+kBi3F2u9uqbMAtoJ3EmMYG7OIx5JGwUfKGID9SUlEBuomPCKHV/Wa1WjHZ3e/3BFIQR0ch0IgiZ3a6ZJrNTUAF9ObNnm5Lpne6b20jAdOTWYq7YOFsLc92bACA5tEsDOQQddZVirMN9Dj0JsQx4i4DG8ZfPfn
fweM7p3KYSLA0rkH1wmDYL46Y0g8bZwcayI1ojUStxO5oqEoYA30Bnadt3d18ePXqDRhPRh5Zeo9ggUhD0Way55qoK3IkivEl35lHSeuKjMFWQTpAURkRoxAm02ipDUjcNBgFUgXHYmOwUZ/EYlxGc03nQocxRgFNGkhCX1C1Xs8hAsJRzmlQp63zBxeAOc9jAyg9hDUiOXdE29MmxLCF0BI/SK1y6mwYbgjYm/fZdoPNzpenmJns5OskGb/L4Q8m2LAJkTKYpHKCBL1ZztLVZQkJJlo41wRYY3XGgYSzgCAlGV3jSPCJchHn2BDf4alkHlQnjAJGQ2ivnmJkwItJFrIndZCBgrPfcE7pAHI7pkcQj640X2wAiKHRogDiblRH6CCE77e/ecE+Yi84nS/ncYTZJVUw/DFrGdvIlBldmh6Ecy2jIVMZGCRD267NNztavQ5WhgxIakszyPkZajLVlzBl1zheznyBXUZAqaIa0x8zrRqMkLEuN0ce4BQbVpXGwrQRrd3ANGF6jAfaGAEtSn3PleM1HK5B+RJ6IVwI5g9PKPZ7DIz3ySION8mITo47imhAmOA3xACW1be1sgIQARK7X6+ICMgHJB+tn5rF3bgSw1BesRT60qfGg8VL78WLrchu0KA5wLMDJcogCx89e4IWZfGNgeiEYIuI7AApNbs3SYau7gaXHEVmo0LXTBcIAkQKqD18S1o3TjtWxbOnj1klpYmABuHJN/EAqipnowh9c5KjKlzNKgwk3p6v8Od2s2cSjmY3zph1i7wqLwwJSApVk4m/9OtkG+dzCCqiBetwAkGUNszZDZvbdTyL03lK4aNVYLxPBeC4AP1oUzYI608fP8Ouh30GF4LA4QEcKN7kZQ3XV69hRHEYwF5ZkGhnPcNtC7kVdEKKcx3/6ZPL87MHjAavrq4YjmAgPqXu7LPs6eNHFZNWAJHZVN3OOUxK8A2cifFlsGKa83QmXAFJBUKHC8kWzOYRP5xbom1WTpWV9PsMQhnT4yrm+XRSjLKQZOiJcY0wAUookQ4vqDmPo/3IotilUchAKQyR5nvS5vLyIakPBPE8XsxeKSdBhP4aXV29fvnyBSzr0aNHaRq/f//xhx9+UEC4stJEpioiGYmeJFoORF3ChHNpEAHKDO2uYbJOGCKTdmHn4AmfURykQA3HIGlbZlhu4MEsOGUrDKgdr16+GzqTsek8mYsqYnvrfsXUNqBrcd2urHALsY9UQ4tIBP/21Y+fWBfvEwUwDNRFPuUjL1DTYeTYmAzDIzkyFyZ8awCeF2liEyYdg7OROQq2oH5KtrAVZg6uR8qKDAN8w6kYXEO1QSFaXTQsZBWuQ6ykE6Ed4bwKeS5VqWFiiQcb5oigSgRPAK5cn44iC/YtvJe4tDkKIGxMQFSaTJsopKDgZ6QidshEnWNvDFv3eQZT4AI5+kd6MI6jDjOopDu2DGKMF6056Qv+YlnGpFQGadJF+BDBhhLBF5COGcKJCsqLwilNMG8zyKDA2bS7MUIne2XaX7SNOZacgSFJVkgDN3fcHWtRU1kB4ZEVBT3afnOP/gyLQLeiXr15+xpCKzxXawZb9KKoCVIcaZHR6o5H6HAOT+msjiDl1AIEBDPzFXYCl8FXmH6Wztg/L94XaZBhD2KHTMePL4qUbB7VkOXLcQk5HiKHoDwmd6gavRwLkIN7nOn1KNAUASgLG+BibsqXZa6z3zO04tk0ZFSWI+91nj5/FiUzkI0zDRBejAag49BHD87RfFgNnfb9+3v8EwWwzjSOI6g4u8U4dEz8Qu4KbJfiDS6ghiAPfcppHq1IVK7D4XJcnBaLM8PoFHI6jSzi2AvOICkkhIAtxo/SzMAUQptSjTG5BVWGvox18wnPePDgAYezgBOQFdO+fffB2+0xKmtlZXK4SroPe71DZKZH15B55FuOr++3tKXXj84uCWlaVb5L3W1zqWvUBN4sipxHkMosg/8SS7j9eEZY3gZ8CCx0Yk4RyIkrCB3
ncRCH256DCKPtWViU4tMUIx5GRmAR9FNSyEHFrqEvZfVyROJkhj8YcV8foOjT5eVljgBS5HxIM8W34LSyW2qKjJnlv3iGF8kNsb1f3bb1jPPmHMUHGMWf9EpJ8tVPvn79+jV9M6C8nNGUoQzUdOr/DxeojuhFQ78kAAAAAElFTkSuQmCC", "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset[0]['image']" ] }, { "cell_type": "code", "execution_count": 8, "id": "07daca49", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset[0]['label']" ] }, { "cell_type": "code", "execution_count": 9, "id": "403cdf68-b3c3-4c64-b96b-4a0e1d86a526", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " filename label\n", "0 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71384.jpg 142\n", "1 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71204.jpg 142\n", "2 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71036.jpg 142\n", "3 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71014.jpg 142\n", "4 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71334.jpg 142\n", "... ... ...\n", "99995 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63864.jpg 127\n", "99996 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63822.jpg 127\n", "99997 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63874.jpg 127\n", "99998 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63824.jpg 127\n", "99999 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63752.jpg 127\n", "\n", "[100000 rows x 2 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.annotations" ] }, { "cell_type": "markdown", "id": "6aac94ea", "metadata": {}, "source": [ "## Run fastdup" ] }, { "cell_type": "code", "execution_count": null, "id": "4f02adbc", "metadata": {}, "outputs": [], "source": [ "fd = fastdup.create(input_dir=dataset.img_dir)\n", "fd.run(annotations=dataset.annotations)" ] }, { "cell_type": "markdown", "id": "676d9175", "metadata": {}, "source": [ "## Inspect Issues" ] }, { "cell_type": "markdown", "id": "1017106b", "metadata": {}, "source": [ "There are several methods we can use to inspect the issues found:\n", "\n", "```python\n", "fd.vis.duplicates_gallery() # create a visual gallery of duplicates\n", "fd.vis.outliers_gallery() # create a visual gallery of anomalies\n", "fd.vis.component_gallery() # create a visualization of connected components\n", "fd.vis.stats_gallery() # create a visualization of images statistics (e.g. 
blur)\n", "fd.vis.similarity_gallery() # create a gallery of similar images\n", "```" ] }, { "cell_type": "code", "execution_count": 11, "id": "8f558b89", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/dnth/anaconda3/envs/fastdup2021/lib/python3.10/site-packages/fastdup/galleries.py:102: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df[out_col] = df[in_col].apply(lambda x: get_label_func.get(x, MISSING_LABEL))\n", "/home/dnth/anaconda3/envs/fastdup2021/lib/python3.10/site-packages/fastdup/galleries.py:102: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df[out_col] = df[in_col].apply(lambda x: get_label_func.get(x, MISSING_LABEL))\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "0ae5d5bafd954cf895f406161a28a492", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/20 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Duplicates Report\n", " \n", " \n", "\n", "\n", "\n", "
\n",
"Duplicates Report\n",
"\n",
"| Distance | From | To | From_Label | To_Label |\n",
"| --- | --- | --- | --- | --- |\n",
"| 1.0 | /67/33675.jpg | /125/62847.jpg | 67 | 125 |\n",
"| 1.0 | /199/99746.jpg | /177/88551.jpg | 199 | 177 |\n",
"| 1.0 | /190/95277.jpg | /13/6631.jpg | 190 | 13 |\n",
"| 1.0 | /141/70895.jpg | /8/4204.jpg | 141 | 8 |\n",
"| 1.0 | /14/7463.jpg | /198/99073.jpg | 14 | 198 |\n",
"| 1.0 | /35/17643.jpg | /37/18797.jpg | 35 | 37 |\n",
"| 1.0 | /180/90258.jpg | /174/87495.jpg | 180 | 174 |\n",
"| 1.0 | /67/33640.jpg | /125/62558.jpg | 67 | 125 |\n",
"| 1.0 | /138/69225.jpg | /102/51355.jpg | 138 | 102 |\n",
"| 1.0 | /67/33973.jpg | /125/62815.jpg | 67 | 125 |\n",
"| 1.0 | /17/8525.jpg | /16/8111.jpg | 17 | 16 |\n",
"
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.duplicates_gallery()" ] }, { "cell_type": "code", "execution_count": 12, "id": "de484e82", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "cefa94b323da4ffc9cbb80e79cc58552", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/20 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Outliers Report\n", " \n", " \n", "\n", "\n", "\n", "
\n",
"Outliers Report\n",
"\n",
"Showing image outliers, one per row\n",
"\n",
"| Distance | Path | label |\n",
"| --- | --- | --- |\n",
"| 0.600712 | /198/99254.jpg | 198 |\n",
"| 0.639867 | /12/6152.jpg | 12 |\n",
"| 0.642672 | /94/47232.jpg | 94 |\n",
"| 0.654982 | /35/17626.jpg | 35 |\n",
"| 0.663625 | /10/5240.jpg | 10 |\n",
"| 0.665014 | /173/86745.jpg | 173 |\n",
"| 0.666785 | /197/98818.jpg | 197 |\n",
"| 0.668334 | /54/27267.jpg | 54 |\n",
"| 0.668349 | /78/39235.jpg | 78 |\n",
"| 0.668735 | /196/98461.jpg | 196 |\n",
"| 0.66936 | /54/27129.jpg | 54 |\n",
"| 0.671666 | /84/42148.jpg | 84 |\n",
"| 0.672583 | /145/72520.jpg | 145 |\n",
"| 0.673422 | /94/47006.jpg | 94 |\n",
"| 0.67446 | /196/98207.jpg | 196 |\n",
"| 0.674789 | /196/98021.jpg | 196 |\n",
"| 0.676092 | /197/98911.jpg | 197 |\n",
"| 0.677318 | /87/43785.jpg | 87 |\n",
"| 0.678071 | /160/80147.jpg | 160 |\n",
"| 0.678366 | /140/70208.jpg | 140 |\n",
"
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.outliers_gallery()" ] }, { "cell_type": "code", "execution_count": 13, "id": "c5a7080b-04ff-42e3-8bdc-eb91d16e695d", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "89884a8af2164c9d8de7292e1d9e7252", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/7287 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Similarity Report, label_score\n", " \n", " \n", "\n", "\n", "\n", "
\n", "
\n", "
\n", " \n", " \"logo\"\n", " \n", "
\n", " \n", "\n", "
\n", "
\n", "
\n", " For the new and interactive data exploration\n", " \n", " Read more \n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", " fastdup.explore()\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "

Similarity Report, label_score

\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label0
from/0/35.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.906011/85/42517.jpg85
0.905423/190/95331.jpg190
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/513.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.911764/85/42716.jpg85
0.907565/166/83446.jpg166
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/515.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.933797/9/4557.jpg9
0.931858/93/46608.jpg93
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/521.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.916001/5/2756.jpg5
0.915641/5/2731.jpg5
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/650.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.923444/7/3800.jpg7
0.909015/17/8647.jpg17
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/657.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.930722/166/83497.jpg166
0.930567/17/8749.jpg17
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/671.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.925565/198/99447.jpg198
0.917802/17/8681.jpg17
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/692.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.915914/15/7715.jpg15
0.907856/2/1496.jpg2
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/712.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.906601/197/98725.jpg197
0.903525/196/98381.jpg196
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/732.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.906571/5/2741.jpg5
0.900979/9/4555.jpg9
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/737.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.930051/17/8949.jpg17
0.926385/35/17505.jpg35
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/763.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.942992/195/97948.jpg195
0.940392/17/8642.jpg17
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/769.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.923526/46/23481.jpg46
0.914404/5/2583.jpg5
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/852.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.923934/148/74041.jpg148
0.920057/7/3839.jpg7
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/857.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.906768/24/12085.jpg24
0.904972/191/95642.jpg191
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/868.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.909222/3/1599.jpg3
0.905293/111/55995.jpg111
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/871.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.914131/145/72763.jpg145
0.913387/9/4771.jpg9
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/899.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.911091/7/3800.jpg7
0.905312/5/2839.jpg5
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/948.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.924178/40/20152.jpg40
0.921691/198/99487.jpg198
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/964.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.939224/35/17505.jpg35
0.939199/17/8749.jpg17
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
fromtolabellabel2distancescorelength
4/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/0/35.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/190/95331.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/85/42517.jpg][0, 0][190, 85][0.905423, 0.906011]0.02
14/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/513.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/166/83446.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/85/42716.jpg][1, 1][166, 85][0.907565, 0.911764]0.02
16/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/515.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/93/46608.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/9/4557.jpg][1, 1][93, 9][0.931858, 0.933797]0.02
19/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/521.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/5/2731.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/5/2756.jpg][1, 1][5, 5][0.915641, 0.916001]0.02
68/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/650.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/17/8647.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/7/3800.jpg][1, 1][17, 7][0.909015, 0.923444]0.02
........................
7273/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49882.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49977.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49544.jpg][99, 99][99, 99][0.904351, 0.913638]100.02
7275/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49895.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49799.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49806.jpg][99, 99][99, 99][0.92261, 0.92414]100.02
7279/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49919.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49734.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49875.jpg][99, 99][99, 99][0.904262, 0.913118]100.02
7283/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49940.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49877.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49858.jpg][99, 99][99, 99][0.91175, 0.914616]100.02
7285/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49977.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49644.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49705.jpg][99, 99][99, 99][0.913667, 0.917606]100.02
\n", "

3796 rows × 7 columns

\n", "
" ], "text/plain": [ " from to label label2 distance score length\n", "4 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/0/35.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/190/95331.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/85/42517.jpg] [0, 0] [190, 85] [0.905423, 0.906011] 0.0 2\n", "14 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/513.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/166/83446.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/85/42716.jpg] [1, 1] [166, 85] [0.907565, 0.911764] 0.0 2\n", "16 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/515.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/93/46608.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/9/4557.jpg] [1, 1] [93, 9] [0.931858, 0.933797] 0.0 2\n", "19 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/521.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/5/2731.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/5/2756.jpg] [1, 1] [5, 5] [0.915641, 0.916001] 0.0 2\n", "68 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/650.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/17/8647.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/7/3800.jpg] [1, 1] [17, 7] [0.909015, 0.923444] 0.0 2\n", "... ... ... ... ... ... ... 
...\n", "7273 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49882.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49977.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49544.jpg] [99, 99] [99, 99] [0.904351, 0.913638] 100.0 2\n", "7275 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49895.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49799.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49806.jpg] [99, 99] [99, 99] [0.92261, 0.92414] 100.0 2\n", "7279 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49919.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49734.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49875.jpg] [99, 99] [99, 99] [0.904262, 0.913118] 100.0 2\n", "7283 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49940.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49877.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49858.jpg] [99, 99] [99, 99] [0.91175, 0.914616] 100.0 2\n", "7285 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49977.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49644.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49705.jpg] [99, 99] [99, 99] [0.913667, 0.917606] 100.0 2\n", "\n", "[3796 rows x 7 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.similarity_gallery(slice='diff')" ] }, { "cell_type": "markdown", "id": "f94163c2-23ce-45b6-ac7d-876bcc78b7c7", "metadata": {}, "source": [ "## Interactive Exploration\n", "In addition to the static visualizations presented above, fastdup also offers interactive exploration of the dataset.\n", "\n", "To explore the dataset and its issues interactively in a browser, run:" ] }, { "cell_type": "code", 
"execution_count": null, "id": "c18ee572-b40c-4886-9f71-e271baa1dec1", "metadata": {}, "outputs": [], "source": [ "fd.explore()" ] }, { "cell_type": "markdown", "id": "f0419c95-e735-40ba-85c8-381f818c131e", "metadata": {}, "source": [ "> 🗒 **Note** - This currently requires you to sign up (for free) to view the interactive exploration. Alternatively, you can visualize fastdup results in a non-interactive way using fastdup's built-in galleries shown in the cells above.\n", "\n", "You'll be presented with a web interface that lets you conveniently view, filter, and curate your dataset.\n", "\n", "\n", "![image.png](https://vl-blog.s3.us-east-2.amazonaws.com/fastdup_assets/cloud_preview.gif)" ] }, { "cell_type": "markdown", "id": "c4bbde22-82ab-4b8b-af3d-6850cb591cca", "metadata": {}, "source": [ "## Wrap Up\n", "\n", "That's a wrap! In this notebook, we showed how to use fastdup to analyze a labeled Hugging Face dataset for duplicates, outliers, and potentially wrong labels.\n", "\n", "\n", "Next, feel free to check out other tutorials:\n", "\n", "+ ⚡ [**Quickstart**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb): Learn how to install fastdup, load a dataset, and analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, and dark/bright/blurry images, and how to view visually similar image clusters. If you're new, start here!\n", "+ 🧹 [**Clean Image Folder**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb): Learn how to analyze a folder of images for potential issues, clean it, and export a list of problematic files for further action. If you have an unorganized folder of images, this is a good place to start.\n", "+ 🖼 [**Analyze Image Classification Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb): Learn how to load a labeled image classification dataset and analyze it for potential issues. 
If you have a labeled dataset in an ImageNet-style folder structure, have a go!\n", "+ 🎁 [**Analyze Object Detection Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb): Learn how to load bounding box annotations for object detection and analyze them for potential issues. If you have a COCO-style labeled object detection dataset, give this example a try.\n", "\n", "As usual, feedback is welcome! Questions? Drop by our [Slack channel](https://visualdatabase.slack.com/join/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA#/shared-invite/email) or open an issue on [GitHub](https://github.com/visual-layer/fastdup/issues).\n" ] }, { "cell_type": "markdown", "id": "3e734821-bb09-4236-9d8e-9fb4ecf47d12", "metadata": {}, "source": [ "
\n", "
\n", " \n", " \"site\"\n", " \n", " \"blog\"\n", " \n", " \"github\"\n", " \n", " \"slack\"\n", " \n", " \"linkedin\"\n", " \n", " \"youtube\"\n", " \n", " \"twitter\"\n", "
\n", "
\n", "
\n", " \"logo\"\n", "
Copyright © 2024 Visual Layer. All rights reserved.
\n", "
\n", "\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 5 }