|
2 | 2 | "cells": [
|
3 | 3 | {
|
4 | 4 | "cell_type": "markdown",
|
5 |
| - "id": "0f289675", |
| 5 | + "id": "e61ebd40", |
6 | 6 | "metadata": {},
|
7 | 7 | "source": [
|
8 | 8 | "<h3>NLP Tutorial: Text Classification Using Spacy Word Embeddings</h3>"
|
9 | 9 | ]
|
10 | 10 | },
|
11 | 11 | {
|
12 | 12 | "cell_type": "markdown",
|
13 |
| - "id": "d66b56e0", |
| 13 | + "id": "9b6e2a9d", |
14 | 14 | "metadata": {},
|
15 | 15 | "source": [
|
16 | 16 | "#### Problem Statement\n",
|
|
41 | 41 | {
|
42 | 42 | "cell_type": "code",
|
43 | 43 | "execution_count": 4,
|
44 |
| - "id": "16363902", |
| 44 | + "id": "5e5e94c2", |
45 | 45 | "metadata": {},
|
46 | 46 | "outputs": [
|
47 | 47 | {
|
|
136 | 136 | {
|
137 | 137 | "cell_type": "code",
|
138 | 138 | "execution_count": 5,
|
139 |
| - "id": "a3b3c28e", |
| 139 | + "id": "62809942", |
140 | 140 | "metadata": {},
|
141 | 141 | "outputs": [
|
142 | 142 | {
|
|
159 | 159 | },
|
160 | 160 | {
|
161 | 161 | "cell_type": "markdown",
|
162 |
| - "id": "045db059", |
| 162 | + "id": "7134e686", |
163 | 163 | "metadata": {},
|
164 | 164 | "source": [
|
165 | 165 | "From the above, we can see that the labels (classes) occur almost equally often, so the dataset is balanced. There is no class-imbalance problem and hence no need to apply balancing techniques such as undersampling or oversampling."
|
|
168 | 168 | {
|
169 | 169 | "cell_type": "code",
|
170 | 170 | "execution_count": 6,
|
171 |
| - "id": "3522310e", |
| 171 | + "id": "cbc8320c", |
172 | 172 | "metadata": {},
|
173 | 173 | "outputs": [
|
174 | 174 | {
|
|
256 | 256 | },
|
257 | 257 | {
|
258 | 258 | "cell_type": "markdown",
|
259 |
| - "id": "9a247477", |
| 259 | + "id": "db288e82", |
260 | 260 | "metadata": {},
|
261 | 261 | "source": [
|
262 | 262 | "**Get spacy word vectors and store them in a pandas dataframe**"
|
|
265 | 265 | {
|
266 | 266 | "cell_type": "code",
|
267 | 267 | "execution_count": null,
|
268 |
| - "id": "5a26a0f7", |
| 268 | + "id": "d09b11d0", |
269 | 269 | "metadata": {},
|
270 | 270 | "outputs": [],
|
271 | 271 | "source": [
|
|
276 | 276 | {
|
277 | 277 | "cell_type": "code",
|
278 | 278 | "execution_count": 7,
|
279 |
| - "id": "135943df", |
| 279 | + "id": "c80141f0", |
280 | 280 | "metadata": {},
|
281 | 281 | "outputs": [],
|
282 | 282 | "source": [
|
|
287 | 287 | {
|
288 | 288 | "cell_type": "code",
|
289 | 289 | "execution_count": 39,
|
290 |
| - "id": "bd570f99", |
| 290 | + "id": "e07897f7", |
291 | 291 | "metadata": {},
|
292 | 292 | "outputs": [
|
293 | 293 | {
|
|
385 | 385 | {
|
386 | 386 | "cell_type": "code",
|
387 | 387 | "execution_count": 42,
|
388 |
| - "id": "602ea3c4", |
| 388 | + "id": "84f8b618", |
389 | 389 | "metadata": {},
|
390 | 390 | "outputs": [],
|
391 | 391 | "source": [
|
|
402 | 402 | {
|
403 | 403 | "cell_type": "code",
|
404 | 404 | "execution_count": 47,
|
405 |
| - "id": "d3cbdab6", |
| 405 | + "id": "e8d475c0", |
406 | 406 | "metadata": {},
|
407 | 407 | "outputs": [],
|
408 | 408 | "source": [
|
|
415 | 415 | {
|
416 | 416 | "cell_type": "code",
|
417 | 417 | "execution_count": 51,
|
418 |
| - "id": "a5d3a48c", |
| 418 | + "id": "53b6072f", |
419 | 419 | "metadata": {},
|
420 | 420 | "outputs": [
|
421 | 421 | {
|
|
449 | 449 | {
|
450 | 450 | "cell_type": "code",
|
451 | 451 | "execution_count": 52,
|
452 |
| - "id": "8fa685ad", |
| 452 | + "id": "0e074362", |
453 | 453 | "metadata": {},
|
454 | 454 | "outputs": [
|
455 | 455 | {
|
|
477 | 477 | {
|
478 | 478 | "cell_type": "code",
|
479 | 479 | "execution_count": 53,
|
480 |
| - "id": "c6b36c17", |
| 480 | + "id": "46c78b8f", |
481 | 481 | "metadata": {
|
482 | 482 | "scrolled": true
|
483 | 483 | },
|
|
516 | 516 | },
|
517 | 517 | {
|
518 | 518 | "cell_type": "markdown",
|
519 |
| - "id": "aa8b852f", |
| 519 | + "id": "4e8bb2b8", |
520 | 520 | "metadata": {},
|
521 | 521 | "source": [
|
522 | 522 | "**Confusion Matrix**"
|
523 | 523 | ]
|
524 | 524 | },
|
525 | 525 | {
|
526 | 526 | "cell_type": "code",
|
527 |
| - "execution_count": null, |
528 |
| - "id": "80bc0ff6", |
| 527 | + "execution_count": 55, |
| 528 | + "id": "e54d8240", |
529 | 529 | "metadata": {},
|
530 |
| - "outputs": [], |
| 530 | + "outputs": [ |
| 531 | + { |
| 532 | + "data": { |
| 533 | + "text/plain": [ |
| 534 | + "Text(69.0, 0.5, 'Truth')" |
| 535 | + ] |
| 536 | + }, |
| 537 | + "execution_count": 55, |
| 538 | + "metadata": {}, |
| 539 | + "output_type": "execute_result" |
| 540 | + }, |
| 541 | + { |
| 542 | + "data": { |
| 543 | + "image/png": "<base64-encoded PNG of the confusion-matrix heatmap (720x504, x-axis 'Prediction', y-axis 'Truth'), omitted>", |
| 544 | + "text/plain": [ |
| 545 | + "<Figure size 720x504 with 2 Axes>" |
| 546 | + ] |
| 547 | + }, |
| 548 | + "metadata": { |
| 549 | + "needs_background": "light" |
| 550 | + }, |
| 551 | + "output_type": "display_data" |
| 552 | + } |
| 553 | + ], |
531 | 554 | "source": [
|
532 | 555 | "#finally print the confusion matrix for the best model\n",
|
533 | 556 | "from sklearn.metrics import confusion_matrix\n",
|
|
541 | 564 | "plt.xlabel('Prediction')\n",
|
542 | 565 | "plt.ylabel('Truth')"
|
543 | 566 | ]
|
| 567 | + }, |
| 568 | + { |
| 569 | + "cell_type": "markdown", |
| 570 | + "id": "e6320b77", |
| 571 | + "metadata": {}, |
| 572 | + "source": [ |
| 573 | + "#### Key Takeaways\n", |
| 574 | + "\n", |
| 575 | + "1. The KNN model, which did not perform well with vectorization techniques such as Bag of Words and TF-IDF because of their very **high-dimensional vector space**, performed really well with GloVe vectors: they have only **300 dimensions** and provide good embeddings (similar and related words have similar vectors) for the given text data.\n", |
| 576 | + "\n", |
| 577 | + "2. The MultinomialNB model performed decently well but did not make the top of the list because the 300-dimensional vectors contain negative values, and the Multinomial Naive Bayes model cannot be fit on data with **negative values**. To work around this, we used the **Min-Max scaler** to rescale all values to the range 0 to 1. This rescaling can lose some variance and information in the data. Even so, we got decent recall and F1 scores." |
| 578 | + ] |
544 | 579 | }
|
545 | 580 | ],
|
546 | 581 | "metadata": {
|
|
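The second takeaway added in this diff relies on min-max scaling to remove negative values before fitting MultinomialNB. The following is a minimal pure-Python sketch of that per-feature rescaling, `(x - min) / (max - min)`. The helper names `fit_min_max` and `transform_min_max` and the toy vectors are hypothetical and not taken from the notebook, which presumably uses scikit-learn's `MinMaxScaler` for the same computation. The sketch also shows one caveat: when the minima and maxima are learned from the training split only, a test value below the training minimum still scales to a negative number unless it is clipped.

```python
from typing import List, Tuple

Vector = List[float]


def fit_min_max(rows: List[Vector]) -> Tuple[Vector, Vector]:
    """Learn per-feature minima and maxima from the training rows."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    return mins, maxs


def transform_min_max(
    rows: List[Vector], mins: Vector, maxs: Vector, clip: bool = False
) -> List[Vector]:
    """Rescale each feature with (x - min) / (max - min).

    With clip=True, results are forced into [0, 1], which matters for
    rows that were not part of the data used to learn mins and maxs.
    """
    scaled = []
    for row in rows:
        out = []
        for x, lo, hi in zip(row, mins, maxs):
            span = hi - lo
            # A constant feature has zero span; map it to 0.0 (a choice made for this sketch).
            value = (x - lo) / span if span != 0 else 0.0
            if clip:
                value = min(1.0, max(0.0, value))
            out.append(value)
        scaled.append(out)
    return scaled


# Toy 3-dimensional "embeddings" (hypothetical values with negatives).
train = [[-0.5, 0.2, 1.0], [0.5, -0.2, 3.0], [0.0, 0.0, 2.0]]
test = [[-1.0, 0.1, 2.5]]  # first feature lies below the training minimum

mins, maxs = fit_min_max(train)
train_scaled = transform_min_max(train, mins, maxs)
test_scaled = transform_min_max(test, mins, maxs)
test_clipped = transform_min_max(test, mins, maxs, clip=True)

print(train_scaled)
print(test_scaled)
print(test_clipped)
```

Every training value lands in the range 0 to 1, so a model that requires non-negative features can be fit on `train_scaled`. The unclipped test row shows why the scaler should be fit on the training split and why clipping (or a similar safeguard) may be worth considering before prediction.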