Tutorial.ipynb


Uploaded by

labib1.ahmed.1

{

"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Intro to DataFrames"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9],[10,11,12]], columns=[\"A\", \"B\", \"C\"], index=[\"x\",\"y\",\"z\",\"zz\"])"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>x</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>y</th>\n",
" <td>4</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>z</th>\n",
" <td>7</td>\n",
" <td>8</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>zz</th>\n",
" <td>10</td>\n",
" <td>11</td>\n",
" <td>12</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C\n",
"x 1 2 3\n",
"y 4 5 6\n",
"z 7 8 9\n",
"zz 10 11 12"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>z</th>\n",
" <td>7</td>\n",
" <td>8</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>zz</th>\n",
" <td>10</td>\n",
" <td>11</td>\n",
" <td>12</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C\n",
"z 7 8 9\n",
"zz 10 11 12"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.tail(2)"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['A', 'B', 'C'], dtype='object')"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['x', 'y', 'z', 'zz']"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.index.tolist()"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Index: 4 entries, x to zz\n",
"Data columns (total 3 columns):\n",
" # Column Non-Null Count Dtype\n",
"--- ------ -------------- -----\n",
" 0 A 4 non-null int64\n",
" 1 B 4 non-null int64\n",
" 2 C 4 non-null int64\n",
"dtypes: int64(3)\n",
"memory usage: 128.0+ bytes\n"
]
}
],
"source": [
"df.info()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>4.000000</td>\n",
" <td>4.000000</td>\n",
" <td>4.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>5.500000</td>\n",
" <td>6.500000</td>\n",
" <td>7.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>3.872983</td>\n",
" <td>3.872983</td>\n",
" <td>3.872983</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>2.000000</td>\n",
" <td>3.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>3.250000</td>\n",
" <td>4.250000</td>\n",
" <td>5.250000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>5.500000</td>\n",
" <td>6.500000</td>\n",
" <td>7.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>7.750000</td>\n",
" <td>8.750000</td>\n",
" <td>9.750000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>10.000000</td>\n",
" <td>11.000000</td>\n",
" <td>12.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C\n",
"count 4.000000 4.000000 4.000000\n",
"mean 5.500000 6.500000 7.500000\n",
"std 3.872983 3.872983 3.872983\n",
"min 1.000000 2.000000 3.000000\n",
"25% 3.250000 4.250000 5.250000\n",
"50% 5.500000 6.500000 7.500000\n",
"75% 7.750000 8.750000 9.750000\n",
"max 10.000000 11.000000 12.000000"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"A 4\n",
"B 4\n",
"C 4\n",
"dtype: int64"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.nunique()"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1, 4, 7, 10])"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['A'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(4, 3)"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"12"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.size"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>x</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>y</th>\n",
" <td>4</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>z</th>\n",
" <td>7</td>\n",
" <td>8</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>zz</th>\n",
" <td>10</td>\n",
" <td>11</td>\n",
" <td>12</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C\n",
"x 1 2 3\n",
"y 4 5 6\n",
"z 7 8 9\n",
"zz 10 11 12"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading DataFrames from Files"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"coffee = pd.read_csv('./warmup-data/coffee.csv')"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"results = pd.read_parquet('./data/results.parquet')\n",
"bios = pd.read_csv('./data/bios.csv')"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [],
"source": [
"# Read the \"results\" sheet from an Excel workbook\n",
"olympics_data = pd.read_excel('./data/olympics-data.xlsx', sheet_name=\"results\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Accessing Data with Pandas"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Day Coffee Type Units Sold\n",
"0 Monday Espresso 25\n",
"1 Monday Latte 15\n",
"2 Tuesday Espresso 30\n",
"3 Tuesday Latte 20\n",
"4 Wednesday Espresso 35\n",
"5 Wednesday Latte 25\n",
"6 Thursday Espresso 40\n",
"7 Thursday Latte 30\n",
"8 Friday Espresso 45\n",
"9 Friday Latte 35\n",
"10 Saturday Espresso 45\n",
"11 Saturday Latte 35\n",
"12 Sunday Espresso 45\n",
"13 Sunday Latte 35\n"
]
}
],
"source": [
"print(coffee)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Coffee Type</th>\n",
" <th>Units Sold</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Monday</td>\n",
" <td>Espresso</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Monday</td>\n",
" <td>Latte</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Tuesday</td>\n",
" <td>Espresso</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Tuesday</td>\n",
" <td>Latte</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Wednesday</td>\n",
" <td>Espresso</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Wednesday</td>\n",
" <td>Latte</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Thursday</td>\n",
" <td>Espresso</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Thursday</td>\n",
" <td>Latte</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Friday</td>\n",
" <td>Espresso</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Friday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Saturday</td>\n",
" <td>Espresso</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Saturday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Sunday</td>\n",
" <td>Espresso</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Sunday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Coffee Type Units Sold\n",
"0 Monday Espresso 25\n",
"1 Monday Latte 15\n",
"2 Tuesday Espresso 30\n",
"3 Tuesday Latte 20\n",
"4 Wednesday Espresso 35\n",
"5 Wednesday Latte 25\n",
"6 Thursday Espresso 40\n",
"7 Thursday Latte 30\n",
"8 Friday Espresso 45\n",
"9 Friday Latte 35\n",
"10 Saturday Espresso 45\n",
"11 Saturday Latte 35\n",
"12 Sunday Espresso 45\n",
"13 Sunday Latte 35"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display(coffee)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Coffee Type</th>\n",
" <th>Units Sold</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Monday</td>\n",
" <td>Espresso</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Monday</td>\n",
" <td>Latte</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Tuesday</td>\n",
" <td>Espresso</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Tuesday</td>\n",
" <td>Latte</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Wednesday</td>\n",
" <td>Espresso</td>\n",
" <td>35</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Coffee Type Units Sold\n",
"0 Monday Espresso 25\n",
"1 Monday Latte 15\n",
"2 Tuesday Espresso 30\n",
"3 Tuesday Latte 20\n",
"4 Wednesday Espresso 35"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.head()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Coffee Type</th>\n",
" <th>Units Sold</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Wednesday</td>\n",
" <td>Espresso</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Wednesday</td>\n",
" <td>Latte</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Thursday</td>\n",
" <td>Espresso</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Thursday</td>\n",
" <td>Latte</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Friday</td>\n",
" <td>Espresso</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Friday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Saturday</td>\n",
" <td>Espresso</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Saturday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Sunday</td>\n",
" <td>Espresso</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Sunday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Coffee Type Units Sold\n",
"4 Wednesday Espresso 35\n",
"5 Wednesday Latte 25\n",
"6 Thursday Espresso 40\n",
"7 Thursday Latte 30\n",
"8 Friday Espresso 45\n",
"9 Friday Latte 35\n",
"10 Saturday Espresso 45\n",
"11 Saturday Latte 35\n",
"12 Sunday Espresso 45\n",
"13 Sunday Latte 35"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.tail(10)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Coffee Type</th>\n",
" <th>Units Sold</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Saturday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Monday</td>\n",
" <td>Espresso</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Thursday</td>\n",
" <td>Espresso</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Thursday</td>\n",
" <td>Latte</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Sunday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Coffee Type Units Sold\n",
"11 Saturday Latte 35\n",
"0 Monday Espresso 25\n",
"6 Thursday Espresso 40\n",
"7 Thursday Latte 30\n",
"13 Sunday Latte 35"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.sample(5) # Pass random_state=<int> to make the sample reproducible"
]
},
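{
"cell_type": "markdown",
"metadata": {},
"source": [
"`sample` draws rows at random, so repeated runs generally return different rows. The cell below is a minimal sketch of making the draw repeatable with `random_state`. The `demo` frame and the seed value 42 are made up for illustration and are not part of the tutorial data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in data, not the tutorial's coffee.csv\n",
"demo = pd.DataFrame({\"Units Sold\": [25, 15, 30, 20, 35, 25]})\n",
"\n",
"# The same seed selects the same rows on every call\n",
"first = demo.sample(3, random_state=42)\n",
"second = demo.sample(3, random_state=42)\n",
"first.equals(second)"
]
},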
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Day Monday\n",
"Coffee Type Espresso\n",
"Units Sold 25\n",
"Name: 0, dtype: object"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# loc: label-based selection\n",
"# coffee.loc[rows, columns]\n",
"\n",
"coffee.loc[0]"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Coffee Type</th>\n",
" <th>Units Sold</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Monday</td>\n",
" <td>Espresso</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Monday</td>\n",
" <td>Latte</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Wednesday</td>\n",
" <td>Latte</td>\n",
" <td>25</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Coffee Type Units Sold\n",
"0 Monday Espresso 25\n",
"1 Monday Latte 15\n",
"5 Wednesday Latte 25"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.loc[[0,1,5]]"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Units Sold</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Wednesday</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Thursday</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Thursday</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Friday</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Friday</td>\n",
" <td>35</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Units Sold\n",
"5 Wednesday 25\n",
"6 Thursday 40\n",
"7 Thursday 30\n",
"8 Friday 45\n",
"9 Friday 35"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.loc[5:9, [\"Day\", \"Units Sold\"]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### iloc"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Units Sold</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Monday</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Monday</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Tuesday</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Tuesday</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Wednesday</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Wednesday</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Thursday</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Thursday</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Friday</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Friday</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Saturday</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Saturday</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Sunday</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Sunday</td>\n",
" <td>35</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Units Sold\n",
"0 Monday 25\n",
"1 Monday 15\n",
"2 Tuesday 30\n",
"3 Tuesday 20\n",
"4 Wednesday 35\n",
"5 Wednesday 25\n",
"6 Thursday 40\n",
"7 Thursday 30\n",
"8 Friday 45\n",
"9 Friday 35\n",
"10 Saturday 45\n",
"11 Saturday 35\n",
"12 Sunday 45\n",
"13 Sunday 35"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.iloc[:, [0,2]]"
]
},
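{
"cell_type": "markdown",
"metadata": {},
"source": [
"One difference between the two indexers is easy to miss: a `.loc` slice includes its end label, while an `.iloc` slice stops before its end position, like ordinary Python slicing. The cell below is a minimal sketch on a made-up `demo` frame (hypothetical values, default integer index)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in data, not the tutorial's coffee.csv\n",
"demo = pd.DataFrame({\n",
"    \"Day\": [\"Monday\", \"Tuesday\", \"Wednesday\", \"Thursday\"],\n",
"    \"Units Sold\": [25, 30, 35, 40],\n",
"})\n",
"\n",
"by_label = demo.loc[1:3, \"Units Sold\"]  # labels 1, 2 and 3 (end included)\n",
"by_position = demo.iloc[1:3, 1]  # positions 1 and 2 only (end excluded)\n",
"len(by_label), len(by_position)"
]
},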
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Using a Column as the Index"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"coffee.index = coffee[\"Day\"]"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Coffee Type</th>\n",
" <th>Units Sold</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Day</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Monday</th>\n",
" <td>Monday</td>\n",
" <td>Espresso</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Monday</th>\n",
" <td>Monday</td>\n",
" <td>Latte</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Tuesday</th>\n",
" <td>Tuesday</td>\n",
" <td>Espresso</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Tuesday</th>\n",
" <td>Tuesday</td>\n",
" <td>Latte</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Wednesday</th>\n",
" <td>Wednesday</td>\n",
" <td>Espresso</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Wednesday</th>\n",
" <td>Wednesday</td>\n",
" <td>Latte</td>\n",
" <td>25</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Coffee Type Units Sold\n",
"Day \n",
"Monday Monday Espresso 25\n",
"Monday Monday Latte 15\n",
"Tuesday Tuesday Espresso 30\n",
"Tuesday Tuesday Latte 20\n",
"Wednesday Wednesday Espresso 35\n",
"Wednesday Wednesday Latte 25"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.loc[\"Monday\":\"Wednesday\"]"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"coffee = pd.read_csv('./warmup-data/coffee.csv')"
]
},
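{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assigning `coffee[\"Day\"]` to the index, as above, leaves `Day` as both the index and a regular column. A possible alternative is `set_index`, which moves the column into the index and returns a new frame. The cell below is a sketch only; `demo` is a made-up stand-in for the coffee data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in data, not the tutorial's coffee.csv\n",
"demo = pd.DataFrame({\n",
"    \"Day\": [\"Monday\", \"Monday\", \"Tuesday\"],\n",
"    \"Coffee Type\": [\"Espresso\", \"Latte\", \"Espresso\"],\n",
"    \"Units Sold\": [25, 15, 30],\n",
"})\n",
"\n",
"by_day = demo.set_index(\"Day\")  # Day becomes the index; demo itself is unchanged\n",
"monday_rows = by_day.loc[\"Monday\"]  # every row labelled Monday\n",
"restored = by_day.reset_index()  # Day is a regular column again\n",
"restored.columns.tolist()"
]
},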
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Setting Values"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"coffee.loc[1:3, \"Units Sold\"] = 10"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Optimized way to get single values (.at & .iat)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"25"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.at[0,\"Units Sold\"]"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Latte'"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.iat[3,1]"
]
},
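{
"cell_type": "markdown",
"metadata": {},
"source": [
"`.at` and `.iat` also accept assignment, which makes them a convenient way to update a single cell. The cell below is a minimal sketch on a made-up `demo` frame (hypothetical values)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in data, not the tutorial's coffee.csv\n",
"demo = pd.DataFrame({\n",
"    \"Coffee Type\": [\"Espresso\", \"Latte\"],\n",
"    \"Units Sold\": [25, 15],\n",
"})\n",
"\n",
"demo.at[0, \"Units Sold\"] = 28  # by row label and column name\n",
"demo.iat[1, 1] = 18  # by row position and column position\n",
"demo"
]
},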
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Getting Columns"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 Monday\n",
"1 Monday\n",
"2 Tuesday\n",
"3 Tuesday\n",
"4 Wednesday\n",
"5 Wednesday\n",
"6 Thursday\n",
"7 Thursday\n",
"8 Friday\n",
"9 Friday\n",
"10 Saturday\n",
"11 Saturday\n",
"12 Sunday\n",
"13 Sunday\n",
"Name: Day, dtype: object"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.Day"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 Monday\n",
"1 Monday\n",
"2 Tuesday\n",
"3 Tuesday\n",
"4 Wednesday\n",
"5 Wednesday\n",
"6 Thursday\n",
"7 Thursday\n",
"8 Friday\n",
"9 Friday\n",
"10 Saturday\n",
"11 Saturday\n",
"12 Sunday\n",
"13 Sunday\n",
"Name: Day, dtype: object"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee[\"Day\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sort Values"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Coffee Type</th>\n",
" <th>Units Sold</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Friday</td>\n",
" <td>Espresso</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Saturday</td>\n",
" <td>Espresso</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Sunday</td>\n",
" <td>Espresso</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Thursday</td>\n",
" <td>Espresso</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Wednesday</td>\n",
" <td>Espresso</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Friday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Saturday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Sunday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Thursday</td>\n",
" <td>Latte</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Monday</td>\n",
" <td>Espresso</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Wednesday</td>\n",
" <td>Latte</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Monday</td>\n",
" <td>Latte</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Tuesday</td>\n",
" <td>Espresso</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Tuesday</td>\n",
" <td>Latte</td>\n",
" <td>10</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Coffee Type Units Sold\n",
"8 Friday Espresso 45\n",
"10 Saturday Espresso 45\n",
"12 Sunday Espresso 45\n",
"6 Thursday Espresso 40\n",
"4 Wednesday Espresso 35\n",
"9 Friday Latte 35\n",
"11 Saturday Latte 35\n",
"13 Sunday Latte 35\n",
"7 Thursday Latte 30\n",
"0 Monday Espresso 25\n",
"5 Wednesday Latte 25\n",
"1 Monday Latte 10\n",
"2 Tuesday Espresso 10\n",
"3 Tuesday Latte 10"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.sort_values([\"Units Sold\"], ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Coffee Type</th>\n",
" <th>Units Sold</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Friday</td>\n",
" <td>Espresso</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Saturday</td>\n",
" <td>Espresso</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Sunday</td>\n",
" <td>Espresso</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Thursday</td>\n",
" <td>Espresso</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Wednesday</td>\n",
" <td>Espresso</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Friday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Saturday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Sunday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Thursday</td>\n",
" <td>Latte</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Monday</td>\n",
" <td>Espresso</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Wednesday</td>\n",
" <td>Latte</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Tuesday</td>\n",
" <td>Espresso</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Monday</td>\n",
" <td>Latte</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Tuesday</td>\n",
" <td>Latte</td>\n",
" <td>10</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Coffee Type Units Sold\n",
"8 Friday Espresso 45\n",
"10 Saturday Espresso 45\n",
"12 Sunday Espresso 45\n",
"6 Thursday Espresso 40\n",
"4 Wednesday Espresso 35\n",
"9 Friday Latte 35\n",
"11 Saturday Latte 35\n",
"13 Sunday Latte 35\n",
"7 Thursday Latte 30\n",
"0 Monday Espresso 25\n",
"5 Wednesday Latte 25\n",
"2 Tuesday Espresso 10\n",
"1 Monday Latte 10\n",
"3 Tuesday Latte 10"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.sort_values([\"Units Sold\", \"Coffee Type\"], ascending=[False, True])"
]
},
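{
"cell_type": "markdown",
"metadata": {},
"source": [
"`sort_values` returns a new DataFrame and leaves the original row order untouched; the old index labels travel with the rows. The cell below is a minimal sketch, on a made-up `demo` frame, of keeping the sorted result and renumbering its rows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in data, not the tutorial's coffee.csv\n",
"demo = pd.DataFrame({\n",
"    \"Coffee Type\": [\"Latte\", \"Espresso\", \"Latte\"],\n",
"    \"Units Sold\": [15, 30, 20],\n",
"})\n",
"\n",
"ranked = demo.sort_values(\"Units Sold\", ascending=False)  # demo keeps its order\n",
"ranked = ranked.reset_index(drop=True)  # renumber rows from 0\n",
"ranked"
]
},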
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Iterate over a DataFrame with a for loop"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0\n",
"Day Monday\n",
"Coffee Type Espresso\n",
"Units Sold 25\n",
"Name: 0, dtype: object\n",
"Coffee Type of Row: Espresso\n",
"1\n",
"Day Monday\n",
"Coffee Type Latte\n",
"Units Sold 10\n",
"Name: 1, dtype: object\n",
"Coffee Type of Row: Latte\n",
"2\n",
"Day Tuesday\n",
"Coffee Type Espresso\n",
"Units Sold 10\n",
"Name: 2, dtype: object\n",
"Coffee Type of Row: Espresso\n",
"3\n",
"Day Tuesday\n",
"Coffee Type Latte\n",
"Units Sold 10\n",
"Name: 3, dtype: object\n",
"Coffee Type of Row: Latte\n",
"4\n",
"Day Wednesday\n",
"Coffee Type Espresso\n",
"Units Sold 35\n",
"Name: 4, dtype: object\n",
"Coffee Type of Row: Espresso\n",
"5\n",
"Day Wednesday\n",
"Coffee Type Latte\n",
"Units Sold 25\n",
"Name: 5, dtype: object\n",
"Coffee Type of Row: Latte\n",
"6\n",
"Day Thursday\n",
"Coffee Type Espresso\n",
"Units Sold 40\n",
"Name: 6, dtype: object\n",
"Coffee Type of Row: Espresso\n",
"7\n",
"Day Thursday\n",
"Coffee Type Latte\n",
"Units Sold 30\n",
"Name: 7, dtype: object\n",
"Coffee Type of Row: Latte\n",
"8\n",
"Day Friday\n",
"Coffee Type Espresso\n",
"Units Sold 45\n",
"Name: 8, dtype: object\n",
"Coffee Type of Row: Espresso\n",
"9\n",
"Day Friday\n",
"Coffee Type Latte\n",
"Units Sold 35\n",
"Name: 9, dtype: object\n",
"Coffee Type of Row: Latte\n",
"10\n",
"Day Saturday\n",
"Coffee Type Espresso\n",
"Units Sold 45\n",
"Name: 10, dtype: object\n",
"Coffee Type of Row: Espresso\n",
"11\n",
"Day Saturday\n",
"Coffee Type Latte\n",
"Units Sold 35\n",
"Name: 11, dtype: object\n",
"Coffee Type of Row: Latte\n",
"12\n",
"Day Sunday\n",
"Coffee Type Espresso\n",
"Units Sold 45\n",
"Name: 12, dtype: object\n",
"Coffee Type of Row: Espresso\n",
"13\n",
"Day Sunday\n",
"Coffee Type Latte\n",
"Units Sold 35\n",
"Name: 13, dtype: object\n",
"Coffee Type of Row: Latte\n"
]
}
],
"source": [
"for index, row in coffee.iterrows():\n",
" print(index)\n",
" print(row)\n",
" print(\"Coffee Type of Row:\", row[\"Coffee Type\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Filtering Data"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>athlete_id</th>\n",
" <th>name</th>\n",
" <th>born_date</th>\n",
" <th>born_city</th>\n",
" <th>born_region</th>\n",
" <th>born_country</th>\n",
" <th>NOC</th>\n",
" <th>height_cm</th>\n",
" <th>weight_kg</th>\n",
" <th>died_date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>Jean-François Blanchy</td>\n",
" <td>1886-12-12</td>\n",
" <td>Bordeaux</td>\n",
" <td>Gironde</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1960-10-02</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>Arnaud Boetsch</td>\n",
" <td>1969-04-01</td>\n",
" <td>Meulan</td>\n",
" <td>Yvelines</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>183.0</td>\n",
" <td>76.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>Jean Borotra</td>\n",
" <td>1898-08-13</td>\n",
" <td>Biarritz</td>\n",
" <td>Pyrénées-Atlantiques</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>183.0</td>\n",
" <td>76.0</td>\n",
" <td>1994-07-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>Jacques Brugnon</td>\n",
" <td>1895-05-11</td>\n",
" <td>Paris VIIIe</td>\n",
" <td>Paris</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>168.0</td>\n",
" <td>64.0</td>\n",
" <td>1978-03-20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>Albert Canet</td>\n",
" <td>1878-04-17</td>\n",
" <td>Wandsworth</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>France</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1930-07-25</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" athlete_id name born_date born_city \\\n",
"0 1 Jean-François Blanchy 1886-12-12 Bordeaux \n",
"1 2 Arnaud Boetsch 1969-04-01 Meulan \n",
"2 3 Jean Borotra 1898-08-13 Biarritz \n",
"3 4 Jacques Brugnon 1895-05-11 Paris VIIIe \n",
"4 5 Albert Canet 1878-04-17 Wandsworth \n",
"\n",
" born_region born_country NOC height_cm weight_kg
died_date \n",
"0 Gironde FRA France NaN NaN 1960-
10-02 \n",
"1 Yvelines FRA France 183.0 76.0
NaN \n",
"2 Pyrénées-Atlantiques FRA France 183.0 76.0 1994-
07-17 \n",
"3 Paris FRA France 168.0 64.0 1978-
03-20 \n",
"4 England GBR France NaN NaN 1930-
07-25 "
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bios.head()"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>athlete_id</th>\n",
" <th>name</th>\n",
" <th>born_date</th>\n",
" <th>born_city</th>\n",
" <th>born_region</th>\n",
" <th>born_country</th>\n",
" <th>NOC</th>\n",
" <th>height_cm</th>\n",
" <th>weight_kg</th>\n",
" <th>died_date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>5089</th>\n",
" <td>5108</td>\n",
" <td>Viktor Pankrashkin</td>\n",
" <td>1957-06-19</td>\n",
" <td>Moskva (Moscow)</td>\n",
" <td>Moskva</td>\n",
" <td>RUS</td>\n",
" <td>Soviet Union</td>\n",
" <td>220.0</td>\n",
" <td>112.0</td>\n",
" <td>1993-07-24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5583</th>\n",
" <td>5606</td>\n",
" <td>Paulinho Villas Boas</td>\n",
" <td>1963-01-26</td>\n",
" <td>São Paulo</td>\n",
" <td>São Paulo</td>\n",
" <td>BRA</td>\n",
" <td>Brazil</td>\n",
" <td>217.0</td>\n",
" <td>106.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5673</th>\n",
" <td>5696</td>\n",
" <td>Gunther Behnke</td>\n",
" <td>1963-01-19</td>\n",
" <td>Leverkusen</td>\n",
" <td>Nordrhein-Westfalen</td>\n",
" <td>GER</td>\n",
" <td>Germany</td>\n",
" <td>221.0</td>\n",
" <td>114.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5716</th>\n",
" <td>5739</td>\n",
" <td>Uwe Blab</td>\n",
" <td>1962-03-26</td>\n",
" <td>München (Munich)</td>\n",
" <td>Bayern</td>\n",
" <td>GER</td>\n",
" <td>Germany West Germany</td>\n",
" <td>218.0</td>\n",
" <td>110.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5781</th>\n",
" <td>5804</td>\n",
" <td>Tommy Burleson</td>\n",
" <td>1952-02-24</td>\n",
" <td>Crossnore</td>\n",
" <td>North Carolina</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>223.0</td>\n",
" <td>102.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5796</th>\n",
" <td>5819</td>\n",
" <td>Andy Campbell</td>\n",
" <td>1956-07-21</td>\n",
" <td>Melbourne</td>\n",
" <td>Victoria</td>\n",
" <td>AUS</td>\n",
" <td>Australia</td>\n",
" <td>218.0</td>\n",
" <td>93.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6223</th>\n",
" <td>6250</td>\n",
" <td>Lars Hansen</td>\n",
" <td>1954-09-27</td>\n",
" <td>København (Copenhagen)</td>\n",
" <td>Hovedstaden</td>\n",
" <td>DEN</td>\n",
" <td>Canada</td>\n",
" <td>216.0</td>\n",
" <td>105.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6270</th>\n",
" <td>6298</td>\n",
" <td>Hu Zhangbao</td>\n",
" <td>1963-04-05</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>People's Republic of China</td>\n",
" <td>216.0</td>\n",
" <td>135.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6409</th>\n",
" <td>6440</td>\n",
" <td>Sergey Kovalenko</td>\n",
" <td>1947-08-11</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Soviet Union</td>\n",
" <td>216.0</td>\n",
" <td>111.0</td>\n",
" <td>2004-11-18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6420</th>\n",
" <td>6451</td>\n",
" <td>Jānis Krūmiņš</td>\n",
" <td>1930-01-30</td>\n",
" <td>Cēsis</td>\n",
" <td>Cēsu novads</td>\n",
" <td>LAT</td>\n",
" <td>Soviet Union</td>\n",
" <td>218.0</td>\n",
" <td>141.0</td>\n",
" <td>1994-11-20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6504</th>\n",
" <td>6537</td>\n",
" <td>Luc Longley</td>\n",
" <td>1969-01-19</td>\n",
" <td>Melbourne</td>\n",
" <td>Victoria</td>\n",
" <td>AUS</td>\n",
" <td>Australia</td>\n",
" <td>220.0</td>\n",
" <td>135.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6722</th>\n",
" <td>6755</td>\n",
" <td>Shaquille O'Neal</td>\n",
" <td>1972-03-06</td>\n",
" <td>Newark</td>\n",
" <td>New Jersey</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>216.0</td>\n",
" <td>137.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6937</th>\n",
" <td>6972</td>\n",
" <td>David Robinson</td>\n",
" <td>1965-08-06</td>\n",
" <td>Key West</td>\n",
" <td>Florida</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>216.0</td>\n",
" <td>107.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6978</th>\n",
" <td>7013</td>\n",
" <td>Arvydas Sabonis</td>\n",
" <td>1964-12-19</td>\n",
" <td>Kaunas</td>\n",
" <td>Kaunas</td>\n",
" <td>LTU</td>\n",
" <td>Lithuania Soviet Union</td>\n",
" <td>223.0</td>\n",
" <td>122.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7074</th>\n",
" <td>7111</td>\n",
" <td>Paulo da Silva</td>\n",
" <td>1963-07-21</td>\n",
" <td>São Paulo</td>\n",
" <td>São Paulo</td>\n",
" <td>BRA</td>\n",
" <td>Brazil</td>\n",
" <td>217.0</td>\n",
" <td>106.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7188</th>\n",
" <td>7226</td>\n",
" <td>Vladimir Tkachenko</td>\n",
" <td>1957-09-20</td>\n",
" <td>Golovinka</td>\n",
" <td>Krasnodar Kray</td>\n",
" <td>RUS</td>\n",
" <td>Soviet Union</td>\n",
" <td>220.0</td>\n",
" <td>110.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7281</th>\n",
" <td>7320</td>\n",
" <td>Stojko Vranković</td>\n",
" <td>1964-01-22</td>\n",
" <td>Drniš</td>\n",
" <td>Šibensko-kninska županija</td>\n",
" <td>CRO</td>\n",
" <td>Croatia Yugoslavia</td>\n",
" <td>217.0</td>\n",
" <td>115.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7376</th>\n",
" <td>7416</td>\n",
" <td>Eurelijus Žukauskas</td>\n",
" <td>1973-08-22</td>\n",
" <td>Klaipėda</td>\n",
" <td>Klaipėda</td>\n",
" <td>LTU</td>\n",
" <td>Lithuania</td>\n",
" <td>218.0</td>\n",
" <td>115.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52608</th>\n",
" <td>52983</td>\n",
" <td>Aleksey Kazakov</td>\n",
" <td>1976-03-18</td>\n",
" <td>Naberezhnye Chelny</td>\n",
" <td>Respublika Tatarstan</td>\n",
" <td>RUS</td>\n",
" <td>Russian Federation</td>\n",
" <td>217.0</td>\n",
" <td>102.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>82100</th>\n",
" <td>82753</td>\n",
" <td>Frédéric Weis</td>\n",
" <td>1977-06-22</td>\n",
" <td>Thionville</td>\n",
" <td>Moselle</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>218.0</td>\n",
" <td>110.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>89070</th>\n",
" <td>89782</td>\n",
" <td>Yao Ming</td>\n",
" <td>1980-09-12</td>\n",
" <td>Xuhui District</td>\n",
" <td>Shanghai</td>\n",
" <td>CHN</td>\n",
" <td>People's Republic of China</td>\n",
" <td>226.0</td>\n",
" <td>141.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>89075</th>\n",
" <td>89787</td>\n",
" <td>Roberto Dueñas</td>\n",
" <td>1975-11-01</td>\n",
" <td>Madrid</td>\n",
" <td>Madrid</td>\n",
" <td>ESP</td>\n",
" <td>Spain</td>\n",
" <td>221.0</td>\n",
" <td>137.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>107408</th>\n",
" <td>108533</td>\n",
" <td>Peter John Ramos</td>\n",
" <td>1985-05-23</td>\n",
" <td>Fajardo</td>\n",
" <td>Puerto Rico</td>\n",
" <td>PUR</td>\n",
" <td>Puerto Rico</td>\n",
" <td>219.0</td>\n",
" <td>113.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>112312</th>\n",
" <td>113568</td>\n",
" <td>Stanko Barać</td>\n",
" <td>1986-08-13</td>\n",
" <td>Mostar</td>\n",
" <td>Hercegovačko-neretvanski kanton</td>\n",
" <td>BIH</td>\n",
" <td>Croatia</td>\n",
" <td>217.0</td>\n",
" <td>110.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>112332</th>\n",
" <td>113588</td>\n",
" <td>Andreas Glyniadakis</td>\n",
" <td>1981-08-26</td>\n",
" <td>Chania</td>\n",
" <td>Kriti</td>\n",
" <td>GRE</td>\n",
" <td>Greece</td>\n",
" <td>216.0</td>\n",
" <td>115.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>112337</th>\n",
" <td>113593</td>\n",
" <td>Hamed Haddadi</td>\n",
" <td>1985-05-19</td>\n",
" <td>Ahvaz</td>\n",
" <td>Khuzestan</td>\n",
" <td>IRI</td>\n",
" <td>Islamic Republic of Iran</td>\n",
" <td>218.0</td>\n",
" <td>110.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>118663</th>\n",
" <td>120400</td>\n",
" <td>Timofey Mozgov</td>\n",
" <td>1986-07-16</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Russian Federation</td>\n",
" <td>216.0</td>\n",
" <td>113.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>118676</th>\n",
" <td>120415</td>\n",
" <td>Dmitry Musersky</td>\n",
" <td>1988-10-29</td>\n",
" <td>Makiïvka</td>\n",
" <td>Donetsk</td>\n",
" <td>UKR</td>\n",
" <td>Russian Federation</td>\n",
" <td>219.0</td>\n",
" <td>104.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>120266</th>\n",
" <td>122147</td>\n",
" <td>Zhang Zhaoxu</td>\n",
" <td>1987-11-18</td>\n",
" <td>Binzhou</td>\n",
" <td>Shandong</td>\n",
" <td>CHN</td>\n",
" <td>People's Republic of China</td>\n",
" <td>221.0</td>\n",
" <td>110.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>121694</th>\n",
" <td>123709</td>\n",
" <td>Salah Mejri</td>\n",
" <td>1986-06-15</td>\n",
" <td>Jendouba</td>\n",
" <td>Jendouba</td>\n",
" <td>TUN</td>\n",
" <td>Tunisia</td>\n",
" <td>216.0</td>\n",
" <td>110.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>123850</th>\n",
" <td>126093</td>\n",
" <td>Tyson Chandler</td>\n",
" <td>1982-10-02</td>\n",
" <td>Hanford</td>\n",
" <td>California</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>216.0</td>\n",
" <td>107.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>130460</th>\n",
" <td>133147</td>\n",
" <td>Li Muhao</td>\n",
" <td>1992-06-02</td>\n",
" <td>Guiyang</td>\n",
" <td>Guizhou</td>\n",
" <td>CHN</td>\n",
" <td>People's Republic of China</td>\n",
" <td>218.0</td>\n",
" <td>115.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>130461</th>\n",
" <td>133148</td>\n",
" <td>Zhou Qi</td>\n",
" <td>1996-01-16</td>\n",
" <td>Xinxiang</td>\n",
" <td>Henan</td>\n",
" <td>CHN</td>\n",
" <td>People's Republic of China</td>\n",
" <td>217.0</td>\n",
" <td>95.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>138671</th>\n",
" <td>142084</td>\n",
" <td>Ondřej Balvín</td>\n",
" <td>1992-09-20</td>\n",
" <td>Ústí nad Labem</td>\n",
" <td>Ústecký kraj</td>\n",
" <td>CZE</td>\n",
" <td>Czechia</td>\n",
" <td>216.0</td>\n",
" <td>107.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>139365</th>\n",
" <td>142836</td>\n",
" <td>Moustapha Fall</td>\n",
" <td>1992-02-23</td>\n",
" <td>Paris</td>\n",
" <td>Paris</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>218.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" athlete_id name born_date
born_city \\\n",
"5089 5108 Viktor Pankrashkin 1957-06-19 Moskva
(Moscow) \n",
"5583 5606 Paulinho Villas Boas 1963-01-26 São
Paulo \n",
"5673 5696 Gunther Behnke 1963-01-19
Leverkusen \n",
"5716 5739 Uwe Blab 1962-03-26 München
(Munich) \n",
"5781 5804 Tommy Burleson 1952-02-24
Crossnore \n",
"5796 5819 Andy Campbell 1956-07-21
Melbourne \n",
"6223 6250 Lars Hansen 1954-09-27 København
(Copenhagen) \n",
"6270 6298 Hu Zhangbao 1963-04-05
NaN \n",
"6409 6440 Sergey Kovalenko 1947-08-11
NaN \n",
"6420 6451 Jānis Krūmiņš 1930-01-30
Cēsis \n",
"6504 6537 Luc Longley 1969-01-19
Melbourne \n",
"6722 6755 Shaquille O'Neal 1972-03-06
Newark \n",
"6937 6972 David Robinson 1965-08-06 Key
West \n",
"6978 7013 Arvydas Sabonis 1964-12-19
Kaunas \n",
"7074 7111 Paulo da Silva 1963-07-21 São
Paulo \n",
"7188 7226 Vladimir Tkachenko 1957-09-20
Golovinka \n",
"7281 7320 Stojko Vranković 1964-01-22
Drniš \n",
"7376 7416 Eurelijus Žukauskas 1973-08-22
Klaipėda \n",
"52608 52983 Aleksey Kazakov 1976-03-18 Naberezhnye
Chelny \n",
"82100 82753 Frédéric Weis 1977-06-22
Thionville \n",
"89070 89782 Yao Ming 1980-09-12 Xuhui
District \n",
"89075 89787 Roberto Dueñas 1975-11-01
Madrid \n",
"107408 108533 Peter John Ramos 1985-05-23
Fajardo \n",
"112312 113568 Stanko Barać 1986-08-13
Mostar \n",
"112332 113588 Andreas Glyniadakis 1981-08-26
Chania \n",
"112337 113593 Hamed Haddadi 1985-05-19
Ahvaz \n",
"118663 120400 Timofey Mozgov 1986-07-16
NaN \n",
"118676 120415 Dmitry Musersky 1988-10-29
Makiïvka \n",
"120266 122147 Zhang Zhaoxu 1987-11-18
Binzhou \n",
"121694 123709 Salah Mejri 1986-06-15
Jendouba \n",
"123850 126093 Tyson Chandler 1982-10-02
Hanford \n",
"130460 133147 Li Muhao 1992-06-02
Guiyang \n",
"130461 133148 Zhou Qi 1996-01-16
Xinxiang \n",
"138671 142084 Ondřej Balvín 1992-09-20 Ústí nad
Labem \n",
"139365 142836 Moustapha Fall 1992-02-23
Paris \n",
"\n",
" born_region born_country \\\n",
"5089 Moskva RUS \n",
"5583 São Paulo BRA \n",
"5673 Nordrhein-Westfalen GER \n",
"5716 Bayern GER \n",
"5781 North Carolina USA \n",
"5796 Victoria AUS \n",
"6223 Hovedstaden DEN \n",
"6270 NaN NaN \n",
"6409 NaN NaN \n",
"6420 Cēsu novads LAT \n",
"6504 Victoria AUS \n",
"6722 New Jersey USA \n",
"6937 Florida USA \n",
"6978 Kaunas LTU \n",
"7074 São Paulo BRA \n",
"7188 Krasnodar Kray RUS \n",
"7281 Šibensko-kninska županija CRO \n",
"7376 Klaipėda LTU \n",
"52608 Respublika Tatarstan RUS \n",
"82100 Moselle FRA \n",
"89070 Shanghai CHN \n",
"89075 Madrid ESP \n",
"107408 Puerto Rico PUR \n",
"112312 Hercegovačko-neretvanski kanton BIH \n",
"112332 Kriti GRE \n",
"112337 Khuzestan IRI \n",
"118663 NaN NaN \n",
"118676 Donetsk UKR \n",
"120266 Shandong CHN \n",
"121694 Jendouba TUN \n",
"123850 California USA \n",
"130460 Guizhou CHN \n",
"130461 Henan CHN \n",
"138671 Ústecký kraj CZE \n",
"139365 Paris FRA \n",
"\n",
" NOC height_cm weight_kg died_date \n",
"5089 Soviet Union 220.0 112.0 1993-07-24 \n",
"5583 Brazil 217.0 106.0 NaN \n",
"5673 Germany 221.0 114.0 NaN \n",
"5716 Germany West Germany 218.0 110.0 NaN \n",
"5781 United States 223.0 102.0 NaN \n",
"5796 Australia 218.0 93.0 NaN \n",
"6223 Canada 216.0 105.0 NaN \n",
"6270 People's Republic of China 216.0 135.0 NaN \n",
"6409 Soviet Union 216.0 111.0 2004-11-18 \n",
"6420 Soviet Union 218.0 141.0 1994-11-20 \n",
"6504 Australia 220.0 135.0 NaN \n",
"6722 United States 216.0 137.0 NaN \n",
"6937 United States 216.0 107.0 NaN \n",
"6978 Lithuania Soviet Union 223.0 122.0 NaN \n",
"7074 Brazil 217.0 106.0 NaN \n",
"7188 Soviet Union 220.0 110.0 NaN \n",
"7281 Croatia Yugoslavia 217.0 115.0 NaN \n",
"7376 Lithuania 218.0 115.0 NaN \n",
"52608 Russian Federation 217.0 102.0 NaN \n",
"82100 France 218.0 110.0 NaN \n",
"89070 People's Republic of China 226.0 141.0 NaN \n",
"89075 Spain 221.0 137.0 NaN \n",
"107408 Puerto Rico 219.0 113.0 NaN \n",
"112312 Croatia 217.0 110.0 NaN \n",
"112332 Greece 216.0 115.0 NaN \n",
"112337 Islamic Republic of Iran 218.0 110.0 NaN \n",
"118663 Russian Federation 216.0 113.0 NaN \n",
"118676 Russian Federation 219.0 104.0 NaN \n",
"120266 People's Republic of China 221.0 110.0 NaN \n",
"121694 Tunisia 216.0 110.0 NaN \n",
"123850 United States 216.0 107.0 NaN \n",
"130460 People's Republic of China 218.0 115.0 NaN \n",
"130461 People's Republic of China 217.0 95.0 NaN \n",
"138671 Czechia 216.0 107.0 NaN \n",
"139365 France 218.0 NaN NaN "
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bios.loc[bios[\"height_cm\"] > 215]"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>height_cm</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>5089</th>\n",
" <td>Viktor Pankrashkin</td>\n",
" <td>220.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5583</th>\n",
" <td>Paulinho Villas Boas</td>\n",
" <td>217.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5673</th>\n",
" <td>Gunther Behnke</td>\n",
" <td>221.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5716</th>\n",
" <td>Uwe Blab</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5781</th>\n",
" <td>Tommy Burleson</td>\n",
" <td>223.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5796</th>\n",
" <td>Andy Campbell</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6223</th>\n",
" <td>Lars Hansen</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6270</th>\n",
" <td>Hu Zhangbao</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6409</th>\n",
" <td>Sergey Kovalenko</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6420</th>\n",
" <td>Jānis Krūmiņš</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6504</th>\n",
" <td>Luc Longley</td>\n",
" <td>220.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6722</th>\n",
" <td>Shaquille O'Neal</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6937</th>\n",
" <td>David Robinson</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6978</th>\n",
" <td>Arvydas Sabonis</td>\n",
" <td>223.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7074</th>\n",
" <td>Paulo da Silva</td>\n",
" <td>217.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7188</th>\n",
" <td>Vladimir Tkachenko</td>\n",
" <td>220.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7281</th>\n",
" <td>Stojko Vranković</td>\n",
" <td>217.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7376</th>\n",
" <td>Eurelijus Žukauskas</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52608</th>\n",
" <td>Aleksey Kazakov</td>\n",
" <td>217.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>82100</th>\n",
" <td>Frédéric Weis</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>89070</th>\n",
" <td>Yao Ming</td>\n",
" <td>226.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>89075</th>\n",
" <td>Roberto Dueñas</td>\n",
" <td>221.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>107408</th>\n",
" <td>Peter John Ramos</td>\n",
" <td>219.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>112312</th>\n",
" <td>Stanko Barać</td>\n",
" <td>217.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>112332</th>\n",
" <td>Andreas Glyniadakis</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>112337</th>\n",
" <td>Hamed Haddadi</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>118663</th>\n",
" <td>Timofey Mozgov</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>118676</th>\n",
" <td>Dmitry Musersky</td>\n",
" <td>219.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>120266</th>\n",
" <td>Zhang Zhaoxu</td>\n",
" <td>221.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>121694</th>\n",
" <td>Salah Mejri</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>123850</th>\n",
" <td>Tyson Chandler</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>130460</th>\n",
" <td>Li Muhao</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>130461</th>\n",
" <td>Zhou Qi</td>\n",
" <td>217.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>138671</th>\n",
" <td>Ondřej Balvín</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>139365</th>\n",
" <td>Moustapha Fall</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name height_cm\n",
"5089 Viktor Pankrashkin 220.0\n",
"5583 Paulinho Villas Boas 217.0\n",
"5673 Gunther Behnke 221.0\n",
"5716 Uwe Blab 218.0\n",
"5781 Tommy Burleson 223.0\n",
"5796 Andy Campbell 218.0\n",
"6223 Lars Hansen 216.0\n",
"6270 Hu Zhangbao 216.0\n",
"6409 Sergey Kovalenko 216.0\n",
"6420 Jānis Krūmiņš 218.0\n",
"6504 Luc Longley 220.0\n",
"6722 Shaquille O'Neal 216.0\n",
"6937 David Robinson 216.0\n",
"6978 Arvydas Sabonis 223.0\n",
"7074 Paulo da Silva 217.0\n",
"7188 Vladimir Tkachenko 220.0\n",
"7281 Stojko Vranković 217.0\n",
"7376 Eurelijus Žukauskas 218.0\n",
"52608 Aleksey Kazakov 217.0\n",
"82100 Frédéric Weis 218.0\n",
"89070 Yao Ming 226.0\n",
"89075 Roberto Dueñas 221.0\n",
"107408 Peter John Ramos 219.0\n",
"112312 Stanko Barać 217.0\n",
"112332 Andreas Glyniadakis 216.0\n",
"112337 Hamed Haddadi 218.0\n",
"118663 Timofey Mozgov 216.0\n",
"118676 Dmitry Musersky 219.0\n",
"120266 Zhang Zhaoxu 221.0\n",
"121694 Salah Mejri 216.0\n",
"123850 Tyson Chandler 216.0\n",
"130460 Li Muhao 218.0\n",
"130461 Zhou Qi 217.0\n",
"138671 Ondřej Balvín 216.0\n",
"139365 Moustapha Fall 218.0"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bios.loc[bios[\"height_cm\"] > 215, [\"name\", \"height_cm\"]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Short-hand syntax (without .loc)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>height_cm</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>5089</th>\n",
" <td>Viktor Pankrashkin</td>\n",
" <td>220.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5583</th>\n",
" <td>Paulinho Villas Boas</td>\n",
" <td>217.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5673</th>\n",
" <td>Gunther Behnke</td>\n",
" <td>221.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5716</th>\n",
" <td>Uwe Blab</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5781</th>\n",
" <td>Tommy Burleson</td>\n",
" <td>223.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5796</th>\n",
" <td>Andy Campbell</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6223</th>\n",
" <td>Lars Hansen</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6270</th>\n",
" <td>Hu Zhangbao</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6409</th>\n",
" <td>Sergey Kovalenko</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6420</th>\n",
" <td>Jānis Krūmiņš</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6504</th>\n",
" <td>Luc Longley</td>\n",
" <td>220.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6722</th>\n",
" <td>Shaquille O'Neal</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6937</th>\n",
" <td>David Robinson</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6978</th>\n",
" <td>Arvydas Sabonis</td>\n",
" <td>223.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7074</th>\n",
" <td>Paulo da Silva</td>\n",
" <td>217.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7188</th>\n",
" <td>Vladimir Tkachenko</td>\n",
" <td>220.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7281</th>\n",
" <td>Stojko Vranković</td>\n",
" <td>217.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7376</th>\n",
" <td>Eurelijus Žukauskas</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52608</th>\n",
" <td>Aleksey Kazakov</td>\n",
" <td>217.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>82100</th>\n",
" <td>Frédéric Weis</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>89070</th>\n",
" <td>Yao Ming</td>\n",
" <td>226.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>89075</th>\n",
" <td>Roberto Dueñas</td>\n",
" <td>221.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>107408</th>\n",
" <td>Peter John Ramos</td>\n",
" <td>219.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>112312</th>\n",
" <td>Stanko Barać</td>\n",
" <td>217.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>112332</th>\n",
" <td>Andreas Glyniadakis</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>112337</th>\n",
" <td>Hamed Haddadi</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>118663</th>\n",
" <td>Timofey Mozgov</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>118676</th>\n",
" <td>Dmitry Musersky</td>\n",
" <td>219.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>120266</th>\n",
" <td>Zhang Zhaoxu</td>\n",
" <td>221.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>121694</th>\n",
" <td>Salah Mejri</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>123850</th>\n",
" <td>Tyson Chandler</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>130460</th>\n",
" <td>Li Muhao</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>130461</th>\n",
" <td>Zhou Qi</td>\n",
" <td>217.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>138671</th>\n",
" <td>Ondřej Balvín</td>\n",
" <td>216.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>139365</th>\n",
" <td>Moustapha Fall</td>\n",
" <td>218.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name height_cm\n",
"5089 Viktor Pankrashkin 220.0\n",
"5583 Paulinho Villas Boas 217.0\n",
"5673 Gunther Behnke 221.0\n",
"5716 Uwe Blab 218.0\n",
"5781 Tommy Burleson 223.0\n",
"5796 Andy Campbell 218.0\n",
"6223 Lars Hansen 216.0\n",
"6270 Hu Zhangbao 216.0\n",
"6409 Sergey Kovalenko 216.0\n",
"6420 Jānis Krūmiņš 218.0\n",
"6504 Luc Longley 220.0\n",
"6722 Shaquille O'Neal 216.0\n",
"6937 David Robinson 216.0\n",
"6978 Arvydas Sabonis 223.0\n",
"7074 Paulo da Silva 217.0\n",
"7188 Vladimir Tkachenko 220.0\n",
"7281 Stojko Vranković 217.0\n",
"7376 Eurelijus Žukauskas 218.0\n",
"52608 Aleksey Kazakov 217.0\n",
"82100 Frédéric Weis 218.0\n",
"89070 Yao Ming 226.0\n",
"89075 Roberto Dueñas 221.0\n",
"107408 Peter John Ramos 219.0\n",
"112312 Stanko Barać 217.0\n",
"112332 Andreas Glyniadakis 216.0\n",
"112337 Hamed Haddadi 218.0\n",
"118663 Timofey Mozgov 216.0\n",
"118676 Dmitry Musersky 219.0\n",
"120266 Zhang Zhaoxu 221.0\n",
"121694 Salah Mejri 216.0\n",
"123850 Tyson Chandler 216.0\n",
"130460 Li Muhao 218.0\n",
"130461 Zhou Qi 217.0\n",
"138671 Ondřej Balvín 216.0\n",
"139365 Moustapha Fall 218.0"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bios[bios['height_cm'] > 215][[\"name\",\"height_cm\"]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Multiple filter conditions"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>athlete_id</th>\n",
" <th>name</th>\n",
" <th>born_date</th>\n",
" <th>born_city</th>\n",
" <th>born_region</th>\n",
" <th>born_country</th>\n",
" <th>NOC</th>\n",
" <th>height_cm</th>\n",
" <th>weight_kg</th>\n",
" <th>died_date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>5781</th>\n",
" <td>5804</td>\n",
" <td>Tommy Burleson</td>\n",
" <td>1952-02-24</td>\n",
" <td>Crossnore</td>\n",
" <td>North Carolina</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>223.0</td>\n",
" <td>102.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6722</th>\n",
" <td>6755</td>\n",
" <td>Shaquille O'Neal</td>\n",
" <td>1972-03-06</td>\n",
" <td>Newark</td>\n",
" <td>New Jersey</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>216.0</td>\n",
" <td>137.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6937</th>\n",
" <td>6972</td>\n",
" <td>David Robinson</td>\n",
" <td>1965-08-06</td>\n",
" <td>Key West</td>\n",
" <td>Florida</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>216.0</td>\n",
" <td>107.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>123850</th>\n",
" <td>126093</td>\n",
" <td>Tyson Chandler</td>\n",
" <td>1982-10-02</td>\n",
" <td>Hanford</td>\n",
" <td>California</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>216.0</td>\n",
" <td>107.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" athlete_id name born_date born_city born_region \\\n",
"5781 5804 Tommy Burleson 1952-02-24 Crossnore North Carolina \n",
"6722 6755 Shaquille O'Neal 1972-03-06 Newark New Jersey \n",
"6937 6972 David Robinson 1965-08-06 Key West Florida \n",
"123850 126093 Tyson Chandler 1982-10-02 Hanford California \n",
"\n",
" born_country NOC height_cm weight_kg died_date \n",
"5781 USA United States 223.0 102.0 NaN \n",
"6722 USA United States 216.0 137.0 NaN \n",
"6937 USA United States 216.0 107.0 NaN \n",
"123850 USA United States 216.0 107.0 NaN "
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bios[(bios['height_cm'] > 215) & (bios['born_country']=='USA')]"
]
},
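{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each comparison in the filter above sits in its own parentheses because `&` binds more tightly than `>` and `==` in Python. The cell below is a minimal sketch on a small made-up frame (`toy` and its values are illustrative assumptions, not rows from the dataset) showing `&` (and), `|` (or), and `~` (not) side by side."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, used only to illustrate the operators\n",
"toy = pd.DataFrame({\n",
"    \"height_cm\": [223.0, 216.0, 180.0, 218.0],\n",
"    \"born_country\": [\"USA\", \"USA\", \"FRA\", \"CHN\"],\n",
"})\n",
"\n",
"# Naming each boolean mask keeps the combined filter readable\n",
"tall = toy[\"height_cm\"] > 215\n",
"usa = toy[\"born_country\"] == \"USA\"\n",
"\n",
"tall_and_usa = toy[tall & usa]   # both conditions hold\n",
"tall_or_usa = toy[tall | usa]    # at least one condition holds\n",
"tall_not_usa = toy[tall & ~usa]  # tall, but not born in the USA\n",
"\n",
"print(len(tall_and_usa), len(tall_or_usa), len(tall_not_usa))"
]
},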
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Filter by string conditions"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>athlete_id</th>\n",
" <th>name</th>\n",
" <th>born_date</th>\n",
" <th>born_city</th>\n",
" <th>born_region</th>\n",
" <th>born_country</th>\n",
" <th>NOC</th>\n",
" <th>height_cm</th>\n",
" <th>weight_kg</th>\n",
" <th>died_date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1897</th>\n",
" <td>1907</td>\n",
" <td>Keith Hanlon</td>\n",
" <td>1966-09-01</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Ireland</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3505</th>\n",
" <td>3517</td>\n",
" <td>Keith Wallace</td>\n",
" <td>1961-03-29</td>\n",
" <td>Preston</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>165.0</td>\n",
" <td>51.0</td>\n",
" <td>1999-12-31</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6228</th>\n",
" <td>6255</td>\n",
" <td>Keith Hartley</td>\n",
" <td>1940-10-15</td>\n",
" <td>Vancouver</td>\n",
" <td>British Columbia</td>\n",
" <td>CAN</td>\n",
" <td>Canada</td>\n",
" <td>200.0</td>\n",
" <td>85.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8898</th>\n",
" <td>8946</td>\n",
" <td>Keith Mwila</td>\n",
" <td>1966-01-01</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Zambia</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1993-01-09</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12053</th>\n",
" <td>12118</td>\n",
" <td>Keith Hervey</td>\n",
" <td>1898-11-03</td>\n",
" <td>Fulham</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1973-02-22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>109900</th>\n",
" <td>111105</td>\n",
" <td>Keith Cumberpatch</td>\n",
" <td>1927-08-25</td>\n",
" <td>Christchurch</td>\n",
" <td>Canterbury</td>\n",
" <td>NZL</td>\n",
" <td>New Zealand</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2013-11-15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>115973</th>\n",
" <td>117348</td>\n",
" <td>Keith Sanderson</td>\n",
" <td>1975-02-02</td>\n",
" <td>Plymouth</td>\n",
" <td>Massachusetts</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>183.0</td>\n",
" <td>95.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>117676</th>\n",
" <td>119195</td>\n",
" <td>Duncan Keith</td>\n",
" <td>1983-07-16</td>\n",
" <td>Winnipeg</td>\n",
" <td>Manitoba</td>\n",
" <td>CAN</td>\n",
" <td>Canada</td>\n",
" <td>185.0</td>\n",
" <td>88.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>122121</th>\n",
" <td>124176</td>\n",
" <td>Keith Ferguson</td>\n",
" <td>1979-09-07</td>\n",
" <td>Sale</td>\n",
" <td>Victoria</td>\n",
" <td>AUS</td>\n",
" <td>Australia</td>\n",
" <td>176.0</td>\n",
" <td>78.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>127310</th>\n",
" <td>129749</td>\n",
" <td>Tracy Keith-Matchitt</td>\n",
" <td>1990-03-30</td>\n",
" <td>Palmerston North</td>\n",
" <td>Manawatu-Wanganui</td>\n",
" <td>NZL</td>\n",
" <td>Cook Islands</td>\n",
" <td>167.0</td>\n",
" <td>60.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>70 rows × 10 columns</p>\n",
"</div>"
],
"text/plain": [
" athlete_id name born_date born_city \\\n",
"1897 1907 Keith Hanlon 1966-09-01 NaN \n",
"3505 3517 Keith Wallace 1961-03-29 Preston \n",
"6228 6255 Keith Hartley 1940-10-15 Vancouver \n",
"8898 8946 Keith Mwila 1966-01-01 NaN \n",
"12053 12118 Keith Hervey 1898-11-03 Fulham \n",
"... ... ... ... ... \n",
"109900 111105 Keith Cumberpatch 1927-08-25 Christchurch \n",
"115973 117348 Keith Sanderson 1975-02-02 Plymouth \n",
"117676 119195 Duncan Keith 1983-07-16 Winnipeg \n",
"122121 124176 Keith Ferguson 1979-09-07 Sale \n",
"127310 129749 Tracy Keith-Matchitt 1990-03-30 Palmerston North \n",
"\n",
" born_region born_country NOC height_cm weight_kg \\\n",
"1897 NaN NaN Ireland NaN NaN \n",
"3505 England GBR Great Britain 165.0 51.0 \n",
"6228 British Columbia CAN Canada 200.0 85.0 \n",
"8898 NaN NaN Zambia NaN NaN \n",
"12053 England GBR Great Britain NaN NaN \n",
"... ... ... ... ... ... \n",
"109900 Canterbury NZL New Zealand NaN NaN \n",
"115973 Massachusetts USA United States 183.0 95.0 \n",
"117676 Manitoba CAN Canada 185.0 88.0 \n",
"122121 Victoria AUS Australia 176.0 78.0 \n",
"127310 Manawatu-Wanganui NZL Cook Islands 167.0 60.0 \n",
"\n",
" died_date \n",
"1897 NaN \n",
"3505 1999-12-31 \n",
"6228 NaN \n",
"8898 1993-01-09 \n",
"12053 1973-02-22 \n",
"... ... \n",
"109900 2013-11-15 \n",
"115973 NaN \n",
"117676 NaN \n",
"122121 NaN \n",
"127310 NaN \n",
"\n",
"[70 rows x 10 columns]"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bios[bios['name'].str.contains(\"keith\", case=False)]"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>athlete_id</th>\n",
" <th>name</th>\n",
" <th>born_date</th>\n",
" <th>born_city</th>\n",
" <th>born_region</th>\n",
" <th>born_country</th>\n",
" <th>NOC</th>\n",
" <th>height_cm</th>\n",
" <th>weight_kg</th>\n",
" <th>died_date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>7</td>\n",
" <td>Patrick Chila</td>\n",
" <td>1969-11-27</td>\n",
" <td>Ris-Orangis</td>\n",
" <td>Essonne</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>180.0</td>\n",
" <td>73.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>119</th>\n",
" <td>120</td>\n",
" <td>Patrick Wheatley</td>\n",
" <td>1899-01-20</td>\n",
" <td>Vryheid</td>\n",
" <td>KwaZulu-Natal</td>\n",
" <td>RSA</td>\n",
" <td>Great Britain</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1967-11-05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>319</th>\n",
" <td>320</td>\n",
" <td>Patrick De Koning</td>\n",
" <td>1961-04-23</td>\n",
" <td>Dendermonde</td>\n",
" <td>Oost-Vlaanderen</td>\n",
" <td>BEL</td>\n",
" <td>Belgium</td>\n",
" <td>178.0</td>\n",
" <td>92.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1897</th>\n",
" <td>1907</td>\n",
" <td>Keith Hanlon</td>\n",
" <td>1966-09-01</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Ireland</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2115</th>\n",
" <td>2125</td>\n",
" <td>Patrick Jopp</td>\n",
" <td>1962-01-08</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Switzerland</td>\n",
" <td>176.0</td>\n",
" <td>67.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>143975</th>\n",
" <td>147633</td>\n",
" <td>Patrick Chinyemba</td>\n",
" <td>2001-01-03</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Zambia</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>144172</th>\n",
" <td>147850</td>\n",
" <td>Patrick Jakob</td>\n",
" <td>1996-10-17</td>\n",
" <td>Sankt Johann in Tirol</td>\n",
" <td>Tirol</td>\n",
" <td>AUT</td>\n",
" <td>Austria</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>144547</th>\n",
" <td>148239</td>\n",
" <td>Patrick Galbraith</td>\n",
" <td>1986-03-11</td>\n",
" <td>Haderslev</td>\n",
" <td>Syddanmark</td>\n",
" <td>DEN</td>\n",
" <td>Denmark</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>144565</th>\n",
" <td>148257</td>\n",
" <td>Patrick Russell</td>\n",
" <td>1993-01-04</td>\n",
" <td>Gentofte</td>\n",
" <td>Hovedstaden</td>\n",
" <td>DEN</td>\n",
" <td>Denmark</td>\n",
" <td>186.0</td>\n",
" <td>93.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145435</th>\n",
" <td>149158</td>\n",
" <td>Patrick Gasienica</td>\n",
" <td>1998-11-28</td>\n",
" <td>McHenry</td>\n",
" <td>Illinois</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2023-06-12</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>303 rows × 10 columns</p>\n",
"</div>"
],
"text/plain": [
" athlete_id name born_date born_city \\\n",
"6 7 Patrick Chila 1969-11-27 Ris-Orangis \n",
"119 120 Patrick Wheatley 1899-01-20 Vryheid \n",
"319 320 Patrick De Koning 1961-04-23 Dendermonde \n",
"1897 1907 Keith Hanlon 1966-09-01 NaN \n",
"2115 2125 Patrick Jopp 1962-01-08 NaN \n",
"... ... ... ... ... \n",
"143975 147633 Patrick Chinyemba 2001-01-03 NaN \n",
"144172 147850 Patrick Jakob 1996-10-17 Sankt Johann in Tirol \n",
"144547 148239 Patrick Galbraith 1986-03-11 Haderslev \n",
"144565 148257 Patrick Russell 1993-01-04 Gentofte \n",
"145435 149158 Patrick Gasienica 1998-11-28 McHenry \n",
"\n",
" born_region born_country NOC height_cm weight_kg \\\n",
"6 Essonne FRA France 180.0 73.0 \n",
"119 KwaZulu-Natal RSA Great Britain NaN NaN \n",
"319 Oost-Vlaanderen BEL Belgium 178.0 92.0 \n",
"1897 NaN NaN Ireland NaN NaN \n",
"2115 NaN NaN Switzerland 176.0 67.0 \n",
"... ... ... ... ... ... \n",
"143975 NaN NaN Zambia NaN NaN \n",
"144172 Tirol AUT Austria NaN NaN \n",
"144547 Syddanmark DEN Denmark NaN NaN \n",
"144565 Hovedstaden DEN Denmark 186.0 93.0 \n",
"145435 Illinois USA United States NaN NaN \n",
"\n",
" died_date \n",
"6 NaN \n",
"119 1967-11-05 \n",
"319 NaN \n",
"1897 NaN \n",
"2115 NaN \n",
"... ... \n",
"143975 NaN \n",
"144172 NaN \n",
"144547 NaN \n",
"144565 NaN \n",
"145435 2023-06-12 \n",
"\n",
"[303 rows x 10 columns]"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Regex syntax\n",
"bios[bios['name'].str.contains('keith|patrick', case=False)]"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [

"/var/folders/rz/zcgcqm0x1bl9cj8slq9l2s1c0000gn/T/ipykernel_10015/351328603.py:10: UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.\n",
" repeated_letters = bios[bios['name'].str.contains(r'(.)\\1', na=False)]\n",
"/var/folders/rz/zcgcqm0x1bl9cj8slq9l2s1c0000gn/T/ipykernel_10015/351328603.py:25: UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.\n",
" start_end_same = bios[bios['name'].str.contains(r'^(.).*\\1$', na=False)]\n",
"/var/folders/rz/zcgcqm0x1bl9cj8slq9l2s1c0000gn/T/ipykernel_10015/351328603.py:31: UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.\n",
" three_or_more_vowels = bios[bios['name'].str.contains(r'([AEIOUaeiou].*){3,}', na=False)]\n"
]
}
],
"source": [
"# Other cool regex filters\n",
"\n",
"# Find athletes born in cities that start with a vowel:\n",
"vowel_cities = bios[bios['born_city'].str.contains(r'^[AEIOUaeiou]', na=False)]\n",
"\n",
"# Find athletes with names that contain exactly two vowels:\n",
"two_vowels = bios[bios['name'].str.contains(r'^[^AEIOUaeiou]*[AEIOUaeiou][^AEIOUaeiou]*[AEIOUaeiou][^AEIOUaeiou]*$', na=False)]\n",
"\n",
"# Find athletes with names that have repeated consecutive characters\n",
"# (e.g., \"Emmett\"; the match is case-sensitive, so the \"Aa\" in \"Aaron\" does not count):\n",
"repeated_letters = bios[bios['name'].str.contains(r'(.)\\1', na=False)]\n",
"\n",
"# Find athletes with names ending in 'son' or 'sen':\n",
"son_sen_names = bios[bios['name'].str.contains(r'son$|sen$', case=False, na=False)]\n",
"\n",
"# Find athletes born in a year starting with '19':\n",
"born_19xx = bios[bios['born_date'].str.contains(r'^19', na=False)]\n",
"\n",
"# Find athletes with names that do not contain any vowels:\n",
"no_vowels = bios[bios['name'].str.contains(r'^[^AEIOUaeiou]*$', na=False)]\n",
"\n",
"# Find athletes whose names contain a hyphen or an apostrophe:\n",
"hyphen_apostrophe = bios[bios['name'].str.contains(r\"[-']\", na=False)]\n",
"\n",
"# Find athletes with names that start and end with the same letter (ignoring case):\n",
"start_end_same = bios[bios['name'].str.contains(r'^(.).*\\1$', na=False, case=False)]\n",
"\n",
"# Find athletes with a born_city that has exactly 7 characters:\n",
"city_seven_chars = bios[bios['born_city'].str.contains(r'^.{7}$', na=False)]\n",
"\n",
"# Find athletes with names containing three or more vowels:\n",
"three_or_more_vowels = bios[bios['name'].str.contains(r'([AEIOUaeiou].*){3,}', na=False)]\n"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>athlete_id</th>\n",
" <th>name</th>\n",
" <th>born_date</th>\n",
" <th>born_city</th>\n",
" <th>born_region</th>\n",
" <th>born_country</th>\n",
" <th>NOC</th>\n",
" <th>height_cm</th>\n",
" <th>weight_kg</th>\n",
" <th>died_date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: [athlete_id, name, born_date, born_city, born_region, born_country, NOC, height_cm, weight_kg, died_date]\n",
"Index: []"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# regex=False treats the pattern as a literal substring, so 'keith|patrick' (pipe included) matches no name\n",
"bios[bios['name'].str.contains('keith|patrick', case=False, regex=False)]"
]
},
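{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `regex=False`, `str.contains` still does a substring search; it only stops interpreting characters such as `|` as regex syntax. The cell below is a minimal sketch on a made-up Series (`names` and its values are hypothetical) comparing a literal search, the same search in regex mode via `re.escape`, and a whole-value match with `isin`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"import pandas as pd\n",
"\n",
"# Hypothetical names, used only for illustration\n",
"names = pd.Series([\"Keith Hanlon\", \"Patrick Jopp\", \"A|B Smith\", None])\n",
"\n",
"# Literal substring search: '|' is an ordinary character here\n",
"literal = names.str.contains(\"A|B\", regex=False, na=False)\n",
"\n",
"# Same literal search in regex mode, with metacharacters escaped\n",
"escaped = names.str.contains(re.escape(\"A|B\"), na=False)\n",
"\n",
"# Whole-value matches need == or isin rather than str.contains\n",
"exact = names.isin([\"Keith Hanlon\", \"Patrick Jopp\"])\n",
"\n",
"print(literal.tolist(), escaped.tolist(), exact.tolist())"
]
},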
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>athlete_id</th>\n",
" <th>name</th>\n",
" <th>born_date</th>\n",
" <th>born_city</th>\n",
" <th>born_region</th>\n",
" <th>born_country</th>\n",
" <th>NOC</th>\n",
" <th>height_cm</th>\n",
" <th>weight_kg</th>\n",
" <th>died_date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3505</th>\n",
" <td>3517</td>\n",
" <td>Keith Wallace</td>\n",
" <td>1961-03-29</td>\n",
" <td>Preston</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>165.0</td>\n",
" <td>51.0</td>\n",
" <td>1999-12-31</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12053</th>\n",
" <td>12118</td>\n",
" <td>Keith Hervey</td>\n",
" <td>1898-11-03</td>\n",
" <td>Fulham</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1973-02-22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14577</th>\n",
" <td>14674</td>\n",
" <td>Keith Harrison</td>\n",
" <td>1933-03-28</td>\n",
" <td>Birmingham</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16166</th>\n",
" <td>16281</td>\n",
" <td>Keith Reynolds</td>\n",
" <td>1963-12-25</td>\n",
" <td>Solihull</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>173.0</td>\n",
" <td>68.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18734</th>\n",
" <td>18862</td>\n",
" <td>Keith Sinclair</td>\n",
" <td>1945-06-26</td>\n",
" <td>Sunderland</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>190.0</td>\n",
" <td>79.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29897</th>\n",
" <td>30123</td>\n",
" <td>Keith Langley</td>\n",
" <td>1961-06-03</td>\n",
" <td>Aldershot</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>173.0</td>\n",
" <td>70.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34011</th>\n",
" <td>34275</td>\n",
" <td>Keith Remfry</td>\n",
" <td>1947-11-17</td>\n",
" <td>Ealing</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>193.0</td>\n",
" <td>114.0</td>\n",
" <td>2015-09-16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46885</th>\n",
" <td>47234</td>\n",
" <td>Keith Collin</td>\n",
" <td>1937-01-18</td>\n",
" <td>Marylebone</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>168.0</td>\n",
" <td>63.0</td>\n",
" <td>1991-03-06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50929</th>\n",
" <td>51288</td>\n",
" <td>Keith Carter</td>\n",
" <td>1924-08-30</td>\n",
" <td>Akron</td>\n",
" <td>Ohio</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2013-05-03</td>\n",
" </tr>\n",
" <tr>\n",
" <th>51185</th>\n",
" <td>51544</td>\n",
" <td>Keith Russell</td>\n",
" <td>1948-01-15</td>\n",
" <td>Mesa</td>\n",
" <td>Arizona</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>188.0</td>\n",
" <td>73.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52913</th>\n",
" <td>53288</td>\n",
" <td>Keith Erickson</td>\n",
" <td>1944-04-19</td>\n",
" <td>San Francisco</td>\n",
" <td>California</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>196.0</td>\n",
" <td>86.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>55317</th>\n",
" <td>55712</td>\n",
" <td>Keith Boxell</td>\n",
" <td>1958-05-06</td>\n",
" <td>Clapham</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>170.0</td>\n",
" <td>87.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>57818</th>\n",
" <td>58226</td>\n",
" <td>Keith Peache</td>\n",
" <td>1947-08-10</td>\n",
" <td>Lewisham</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>180.0</td>\n",
" <td>98.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>61748</th>\n",
" <td>62202</td>\n",
" <td>Keith Grogono</td>\n",
" <td>1912-11-04</td>\n",
" <td>Stratford</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1999-03-22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>62620</th>\n",
" <td>63086</td>\n",
" <td>Keith Musto</td>\n",
" <td>1936-01-12</td>\n",
" <td>Rochford</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>174.0</td>\n",
" <td>72.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>62678</th>\n",
" <td>63144</td>\n",
" <td>Keith Notary</td>\n",
" <td>1960-01-22</td>\n",
" <td>Merritt Island</td>\n",
" <td>Florida</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>170.0</td>\n",
" <td>66.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>68324</th>\n",
" <td>68841</td>\n",
" <td>Keith Angus</td>\n",
" <td>1943-04-05</td>\n",
" <td>Sheffield</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>170.0</td>\n",
" <td>59.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>68472</th>\n",
" <td>68989</td>\n",
" <td>Keith Cullen</td>\n",
" <td>1972-06-13</td>\n",
" <td>Ilford</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>177.0</td>\n",
" <td>61.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>68997</th>\n",
" <td>69514</td>\n",
" <td>Keith Stock</td>\n",
" <td>1957-03-18</td>\n",
" <td>Woolwich</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>176.0</td>\n",
" <td>73.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>77550</th>\n",
" <td>78141</td>\n",
" <td>Keith Brantly</td>\n",
" <td>1962-05-23</td>\n",
" <td>Scott Air Force Base</td>\n",
" <td>Illinois</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>180.0</td>\n",
" <td>64.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>84097</th>\n",
" <td>84766</td>\n",
" <td>Keith Christiansen</td>\n",
" <td>1944-07-14</td>\n",
" <td>International Falls</td>\n",
" <td>Minnesota</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>165.0</td>\n",
" <td>69.0</td>\n",
" <td>2018-11-05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>94646</th>\n",
" <td>95413</td>\n",
" <td>Keith Meyer</td>\n",
" <td>1938-06-20</td>\n",
" <td>Geneva</td>\n",
" <td>Illinois</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2010-07-25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95267</th>\n",
" <td>96037</td>\n",
" <td>Keith Oliver</td>\n",
" <td>1947-10-27</td>\n",
" <td>Liverpool</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>172.0</td>\n",
" <td>68.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96452</th>\n",
" <td>97229</td>\n",
" <td>Keith Schellenberg</td>\n",
" <td>1929-03-13</td>\n",
" <td>Middlesbrough</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2019-10-28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97499</th>\n",
" <td>98286</td>\n",
" <td>Keith Tkachuk</td>\n",
" <td>1972-03-28</td>\n",
" <td>Melrose</td>\n",
" <td>Massachusetts</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>188.0</td>\n",
" <td>102.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98068</th>\n",
" <td>98860</td>\n",
" <td>Keith Wegeman</td>\n",
" <td>1929-08-28</td>\n",
" <td>Denver</td>\n",
" <td>Colorado</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1974-08-22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>99921</th>\n",
" <td>100722</td>\n",
" <td>Keith Carney</td>\n",
" <td>1970-02-03</td>\n",
" <td>Providence</td>\n",
" <td>Rhode Island</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>188.0</td>\n",
" <td>93.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>115973</th>\n",
" <td>117348</td>\n",
" <td>Keith Sanderson</td>\n",
" <td>1975-02-02</td>\n",
" <td>Plymouth</td>\n",
" <td>Massachusetts</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>183.0</td>\n",
" <td>95.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" athlete_id name born_date born_city \\\n",
"3505 3517 Keith Wallace 1961-03-29 Preston \n",
"12053 12118 Keith Hervey 1898-11-03 Fulham \n",
"14577 14674 Keith Harrison 1933-03-28 Birmingham \n",
"16166 16281 Keith Reynolds 1963-12-25 Solihull \n",
"18734 18862 Keith Sinclair 1945-06-26 Sunderland \n",
"29897 30123 Keith Langley 1961-06-03 Aldershot \n",
"34011 34275 Keith Remfry 1947-11-17 Ealing \n",
"46885 47234 Keith Collin 1937-01-18 Marylebone \n",
"50929 51288 Keith Carter 1924-08-30 Akron \n",
"51185 51544 Keith Russell 1948-01-15 Mesa \n",
"52913 53288 Keith Erickson 1944-04-19 San Francisco \n",
"55317 55712 Keith Boxell 1958-05-06 Clapham \n",
"57818 58226 Keith Peache 1947-08-10 Lewisham \n",
"61748 62202 Keith Grogono 1912-11-04 Stratford \n",
"62620 63086 Keith Musto 1936-01-12 Rochford \n",
"62678 63144 Keith Notary 1960-01-22 Merritt Island \n",
"68324 68841 Keith Angus 1943-04-05 Sheffield \n",
"68472 68989 Keith Cullen 1972-06-13 Ilford \n",
"68997 69514 Keith Stock 1957-03-18 Woolwich \n",
"77550 78141 Keith Brantly 1962-05-23 Scott Air Force Base \n",
"84097 84766 Keith Christiansen 1944-07-14 International Falls \n",
"94646 95413 Keith Meyer 1938-06-20 Geneva \n",
"95267 96037 Keith Oliver 1947-10-27 Liverpool \n",
"96452 97229 Keith Schellenberg 1929-03-13 Middlesbrough \n",
"97499 98286 Keith Tkachuk 1972-03-28 Melrose \n",
"98068 98860 Keith Wegeman 1929-08-28 Denver \n",
"99921 100722 Keith Carney 1970-02-03 Providence \n",
"115973 117348 Keith Sanderson 1975-02-02 Plymouth \n",
"\n",
" born_region born_country NOC height_cm weight_kg \\\n",
"3505 England GBR Great Britain 165.0 51.0 \n",
"12053 England GBR Great Britain NaN NaN \n",
"14577 England GBR Great Britain NaN NaN \n",
"16166 England GBR Great Britain 173.0 68.0 \n",
"18734 England GBR Great Britain 190.0 79.0 \n",
"29897 England GBR Great Britain 173.0 70.0 \n",
"34011 England GBR Great Britain 193.0 114.0 \n",
"46885 England GBR Great Britain 168.0 63.0 \n",
"50929 Ohio USA United States NaN NaN \n",
"51185 Arizona USA United States 188.0 73.0 \n",
"52913 California USA United States 196.0 86.0 \n",
"55317 England GBR Great Britain 170.0 87.0 \n",
"57818 England GBR Great Britain 180.0 98.0 \n",
"61748 England GBR Great Britain NaN NaN \n",
"62620 England GBR Great Britain 174.0 72.0 \n",
"62678 Florida USA United States 170.0 66.0 \n",
"68324 England GBR Great Britain 170.0 59.0 \n",
"68472 England GBR Great Britain 177.0 61.0 \n",
"68997 England GBR Great Britain 176.0 73.0 \n",
"77550 Illinois USA United States 180.0 64.0 \n",
"84097 Minnesota USA United States 165.0 69.0 \n",
"94646 Illinois USA United States NaN NaN \n",
"95267 England GBR Great Britain 172.0 68.0 \n",
"96452 England GBR Great Britain NaN NaN \n",
"97499 Massachusetts USA United States 188.0 102.0 \n",
"98068 Colorado USA United States NaN NaN \n",
"99921 Rhode Island USA United States 188.0 93.0 \n",
"115973 Massachusetts USA United States 183.0 95.0 \n",
"\n",
" died_date \n",
"3505 1999-12-31 \n",
"12053 1973-02-22 \n",
"14577 NaN \n",
"16166 NaN \n",
"18734 NaN \n",
"29897 NaN \n",
"34011 2015-09-16 \n",
"46885 1991-03-06 \n",
"50929 2013-05-03 \n",
"51185 NaN \n",
"52913 NaN \n",
"55317 NaN \n",
"57818 NaN \n",
"61748 1999-03-22 \n",
"62620 NaN \n",
"62678 NaN \n",
"68324 NaN \n",
"68472 NaN \n",
"68997 NaN \n",
"77550 NaN \n",
"84097 2018-11-05 \n",
"94646 2010-07-25 \n",
"95267 NaN \n",
"96452 2019-10-28 \n",
"97499 NaN \n",
"98068 1974-08-22 \n",
"99921 NaN \n",
"115973 NaN "
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Combine isin (membership in a list of values) with str.startswith (prefix match)\n",
"bios[bios['born_country'].isin([\"USA\", \"FRA\", \"GBR\"]) & (bios['name'].str.startswith(\"Keith\"))]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Query functions"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>athlete_id</th>\n",
" <th>name</th>\n",
" <th>born_date</th>\n",
" <th>born_city</th>\n",
" <th>born_region</th>\n",
" <th>born_country</th>\n",
" <th>NOC</th>\n",
" <th>height_cm</th>\n",
" <th>weight_kg</th>\n",
" <th>died_date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>11030</th>\n",
" <td>11088</td>\n",
" <td>David Halpern</td>\n",
" <td>1955-08-18</td>\n",
" <td>Seattle</td>\n",
" <td>Washington</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>178.0</td>\n",
" <td>79.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12800</th>\n",
" <td>12870</td>\n",
" <td>Todd Trewin</td>\n",
" <td>1958-04-20</td>\n",
" <td>Seattle</td>\n",
" <td>Washington</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>180.0</td>\n",
" <td>75.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15476</th>\n",
" <td>15583</td>\n",
" <td>Scott McKinley</td>\n",
" <td>1968-10-15</td>\n",
" <td>Seattle</td>\n",
" <td>Washington</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>183.0</td>\n",
" <td>75.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29079</th>\n",
" <td>29293</td>\n",
" <td>Joyce Tanac</td>\n",
" <td>1950-09-27</td>\n",
" <td>Seattle</td>\n",
" <td>Washington</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>156.0</td>\n",
" <td>49.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31135</th>\n",
" <td>31371</td>\n",
" <td>Bill Kuhlemeier</td>\n",
" <td>1908-01-14</td>\n",
" <td>Seattle</td>\n",
" <td>Washington</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2001-07-08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>133392</th>\n",
" <td>136331</td>\n",
" <td>Hans Struzyna</td>\n",
" <td>1989-03-31</td>\n",
" <td>Seattle</td>\n",
" <td>Washington</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>188.0</td>\n",
" <td>91.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>135448</th>\n",
" <td>138662</td>\n",
" <td>Maude Davis Crossland</td>\n",
" <td>2003-03-19</td>\n",
" <td>Seattle</td>\n",
" <td>Washington</td>\n",
" <td>USA</td>\n",
" <td>Colombia</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>136993</th>\n",
" <td>140229</td>\n",
" <td>Jenell Berhorst</td>\n",
" <td>2003-12-13</td>\n",
" <td>Seattle</td>\n",
" <td>Washington</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>143507</th>\n",
" <td>147159</td>\n",
" <td>Nevin Harrison</td>\n",
" <td>2002-06-02</td>\n",
" <td>Seattle</td>\n",
" <td>Washington</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>175.0</td>\n",
" <td>73.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145446</th>\n",
" <td>149169</td>\n",
" <td>Corinne Stoddard</td>\n",
" <td>2001-08-15</td>\n",
" <td>Seattle</td>\n",
" <td>Washington</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>102 rows × 10 columns</p>\n",
"</div>"
],
"text/plain": [
" athlete_id name born_date born_city born_region \\\n",
"11030 11088 David Halpern 1955-08-18 Seattle Washington \n",
"12800 12870 Todd Trewin 1958-04-20 Seattle Washington \n",
"15476 15583 Scott McKinley 1968-10-15 Seattle Washington \n",
"29079 29293 Joyce Tanac 1950-09-27 Seattle Washington \n",
"31135 31371 Bill Kuhlemeier 1908-01-14 Seattle Washington \n",
"... ... ... ... ... ... \n",
"133392 136331 Hans Struzyna 1989-03-31 Seattle Washington \n",
"135448 138662 Maude Davis Crossland 2003-03-19 Seattle Washington \n",
"136993 140229 Jenell Berhorst 2003-12-13 Seattle Washington \n",
"143507 147159 Nevin Harrison 2002-06-02 Seattle Washington \n",
"145446 149169 Corinne Stoddard 2001-08-15 Seattle Washington \n",
"\n",
" born_country NOC height_cm weight_kg died_date \n",
"11030 USA United States 178.0 79.0 NaN \n",
"12800 USA United States 180.0 75.0 NaN \n",
"15476 USA United States 183.0 75.0 NaN \n",
"29079 USA United States 156.0 49.0 NaN \n",
"31135 USA United States NaN NaN 2001-07-08 \n",
"... ... ... ... ... ... \n",
"133392 USA United States 188.0 91.0 NaN \n",
"135448 USA Colombia NaN NaN NaN \n",
"136993 USA United States NaN NaN NaN \n",
"143507 USA United States 175.0 73.0 NaN \n",
"145446 USA United States NaN NaN NaN \n",
"\n",
"[102 rows x 10 columns]"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bios.query('born_country == \"USA\" and born_city == \"Seattle\"')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adding / Removing Columns"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Coffee Type</th>\n",
" <th>Units Sold</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Monday</td>\n",
" <td>Espresso</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Monday</td>\n",
" <td>Latte</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Tuesday</td>\n",
" <td>Espresso</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Tuesday</td>\n",
" <td>Latte</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Wednesday</td>\n",
" <td>Espresso</td>\n",
" <td>35</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Coffee Type Units Sold\n",
"0 Monday Espresso 25\n",
"1 Monday Latte 10\n",
"2 Tuesday Espresso 10\n",
"3 Tuesday Latte 10\n",
"4 Wednesday Espresso 35"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.head()"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [],
"source": [
"coffee['price'] = 4.99"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [],
"source": [
"coffee['new_price'] = np.where(coffee['Coffee Type']=='Espresso', 3.99, 5.99)"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Coffee Type</th>\n",
" <th>Units Sold</th>\n",
" <th>price</th>\n",
" <th>new_price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Monday</td>\n",
" <td>Espresso</td>\n",
" <td>25.0</td>\n",
" <td>3.99</td>\n",
" <td>3.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Monday</td>\n",
" <td>Latte</td>\n",
" <td>10.0</td>\n",
" <td>5.99</td>\n",
" <td>5.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Tuesday</td>\n",
" <td>Espresso</td>\n",
" <td>NaN</td>\n",
" <td>3.99</td>\n",
" <td>3.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Tuesday</td>\n",
" <td>Latte</td>\n",
" <td>NaN</td>\n",
" <td>5.99</td>\n",
" <td>5.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Wednesday</td>\n",
" <td>Espresso</td>\n",
" <td>35.0</td>\n",
" <td>3.99</td>\n",
" <td>3.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Wednesday</td>\n",
" <td>Latte</td>\n",
" <td>25.0</td>\n",
" <td>5.99</td>\n",
" <td>5.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Thursday</td>\n",
" <td>Espresso</td>\n",
" <td>40.0</td>\n",
" <td>3.99</td>\n",
" <td>3.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Thursday</td>\n",
" <td>Latte</td>\n",
" <td>30.0</td>\n",
" <td>5.99</td>\n",
" <td>5.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Friday</td>\n",
" <td>Espresso</td>\n",
" <td>45.0</td>\n",
" <td>3.99</td>\n",
" <td>3.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Friday</td>\n",
" <td>Latte</td>\n",
" <td>35.0</td>\n",
" <td>5.99</td>\n",
" <td>5.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Saturday</td>\n",
" <td>Espresso</td>\n",
" <td>45.0</td>\n",
" <td>3.99</td>\n",
" <td>3.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Saturday</td>\n",
" <td>Latte</td>\n",
" <td>35.0</td>\n",
" <td>5.99</td>\n",
" <td>5.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Sunday</td>\n",
" <td>Espresso</td>\n",
" <td>45.0</td>\n",
" <td>3.99</td>\n",
" <td>3.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Sunday</td>\n",
" <td>Latte</td>\n",
" <td>35.0</td>\n",
" <td>5.99</td>\n",
" <td>5.99</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Coffee Type Units Sold price new_price\n",
"0 Monday Espresso 25.0 3.99 3.99\n",
"1 Monday Latte 10.0 5.99 5.99\n",
"2 Tuesday Espresso NaN 3.99 3.99\n",
"3 Tuesday Latte NaN 5.99 5.99\n",
"4 Wednesday Espresso 35.0 3.99 3.99\n",
"5 Wednesday Latte 25.0 5.99 5.99\n",
"6 Thursday Espresso 40.0 3.99 3.99\n",
"7 Thursday Latte 30.0 5.99 5.99\n",
"8 Friday Espresso 45.0 3.99 3.99\n",
"9 Friday Latte 35.0 5.99 5.99\n",
"10 Saturday Espresso 45.0 3.99 3.99\n",
"11 Saturday Latte 35.0 5.99 5.99\n",
"12 Sunday Espresso 45.0 3.99 3.99\n",
"13 Sunday Latte 35.0 5.99 5.99"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [],
"source": [
"coffee.drop(columns=['price'], inplace=True)\n",
"\n",
"# the below would also have worked\n",
"# coffee = coffee.drop(columns=['price'])"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [],
"source": [
"coffee = coffee[['Day', 'Coffee Type', 'Units Sold', 'new_price']]"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [],
"source": [
"coffee['revenue'] = coffee['Units Sold'] * coffee['new_price']"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Coffee Type</th>\n",
" <th>Units Sold</th>\n",
" <th>new_price</th>\n",
" <th>revenue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Monday</td>\n",
" <td>Espresso</td>\n",
" <td>25.0</td>\n",
" <td>3.99</td>\n",
" <td>99.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Monday</td>\n",
" <td>Latte</td>\n",
" <td>10.0</td>\n",
" <td>5.99</td>\n",
" <td>59.90</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Tuesday</td>\n",
" <td>Espresso</td>\n",
" <td>NaN</td>\n",
" <td>3.99</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Tuesday</td>\n",
" <td>Latte</td>\n",
" <td>NaN</td>\n",
" <td>5.99</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Wednesday</td>\n",
" <td>Espresso</td>\n",
" <td>35.0</td>\n",
" <td>3.99</td>\n",
" <td>139.65</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Wednesday</td>\n",
" <td>Latte</td>\n",
" <td>25.0</td>\n",
" <td>5.99</td>\n",
" <td>149.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Thursday</td>\n",
" <td>Espresso</td>\n",
" <td>40.0</td>\n",
" <td>3.99</td>\n",
" <td>159.60</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Thursday</td>\n",
" <td>Latte</td>\n",
" <td>30.0</td>\n",
" <td>5.99</td>\n",
" <td>179.70</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Friday</td>\n",
" <td>Espresso</td>\n",
" <td>45.0</td>\n",
" <td>3.99</td>\n",
" <td>179.55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Friday</td>\n",
" <td>Latte</td>\n",
" <td>35.0</td>\n",
" <td>5.99</td>\n",
" <td>209.65</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Saturday</td>\n",
" <td>Espresso</td>\n",
" <td>45.0</td>\n",
" <td>3.99</td>\n",
" <td>179.55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Saturday</td>\n",
" <td>Latte</td>\n",
" <td>35.0</td>\n",
" <td>5.99</td>\n",
" <td>209.65</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Sunday</td>\n",
" <td>Espresso</td>\n",
" <td>45.0</td>\n",
" <td>3.99</td>\n",
" <td>179.55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Sunday</td>\n",
" <td>Latte</td>\n",
" <td>35.0</td>\n",
" <td>5.99</td>\n",
" <td>209.65</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Coffee Type Units Sold new_price revenue\n",
"0 Monday Espresso 25.0 3.99 99.75\n",
"1 Monday Latte 10.0 5.99 59.90\n",
"2 Tuesday Espresso NaN 3.99 NaN\n",
"3 Tuesday Latte NaN 5.99 NaN\n",
"4 Wednesday Espresso 35.0 3.99 139.65\n",
"5 Wednesday Latte 25.0 5.99 149.75\n",
"6 Thursday Espresso 40.0 3.99 159.60\n",
"7 Thursday Latte 30.0 5.99 179.70\n",
"8 Friday Espresso 45.0 3.99 179.55\n",
"9 Friday Latte 35.0 5.99 209.65\n",
"10 Saturday Espresso 45.0 3.99 179.55\n",
"11 Saturday Latte 35.0 5.99 209.65\n",
"12 Sunday Espresso 45.0 3.99 179.55\n",
"13 Sunday Latte 35.0 5.99 209.65"
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {},
"outputs": [],
"source": [
"coffee.rename(columns={'new_price': 'price'}, inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"bios_new = bios.copy()"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"bios_new['first_name'] = bios_new['name'].str.split(' ').str[0]"
]
},
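{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible extension of the split above, shown as a hedged sketch on a tiny stand-in frame (`demo` and the `rest_of_name` column are illustrative names, not part of the dataset). `str.split` with `n=1` and `expand=True` returns the first token and the remainder as two columns, assuming the first space-separated token is the given name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical two-row frame standing in for bios_new\n",
"demo = pd.DataFrame({'name': ['Keith Hanlon', 'Maude Davis Crossland']})\n",
"\n",
"# n=1 splits only on the first space; expand=True returns a DataFrame with two columns\n",
"demo[['first_name', 'rest_of_name']] = demo['name'].str.split(' ', n=1, expand=True)\n",
"demo"
]
},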
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>athlete_id</th>\n",
" <th>name</th>\n",
" <th>born_date</th>\n",
" <th>born_city</th>\n",
" <th>born_region</th>\n",
" <th>born_country</th>\n",
" <th>NOC</th>\n",
" <th>height_cm</th>\n",
" <th>weight_kg</th>\n",
" <th>died_date</th>\n",
" <th>first_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1897</th>\n",
" <td>1907</td>\n",
" <td>Keith Hanlon</td>\n",
" <td>1966-09-01</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Ireland</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Keith</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3505</th>\n",
" <td>3517</td>\n",
" <td>Keith Wallace</td>\n",
" <td>1961-03-29</td>\n",
" <td>Preston</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>165.0</td>\n",
" <td>51.0</td>\n",
" <td>1999-12-31</td>\n",
" <td>Keith</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6228</th>\n",
" <td>6255</td>\n",
" <td>Keith Hartley</td>\n",
" <td>1940-10-15</td>\n",
" <td>Vancouver</td>\n",
" <td>British Columbia</td>\n",
" <td>CAN</td>\n",
" <td>Canada</td>\n",
" <td>200.0</td>\n",
" <td>85.0</td>\n",
" <td>NaN</td>\n",
" <td>Keith</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8898</th>\n",
" <td>8946</td>\n",
" <td>Keith Mwila</td>\n",
" <td>1966-01-01</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Zambia</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1993-01-09</td>\n",
" <td>Keith</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12053</th>\n",
" <td>12118</td>\n",
" <td>Keith Hervey</td>\n",
" <td>1898-11-03</td>\n",
" <td>Fulham</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Great Britain</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1973-02-22</td>\n",
" <td>Keith</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>99921</th>\n",
" <td>100722</td>\n",
" <td>Keith Carney</td>\n",
" <td>1970-02-03</td>\n",
" <td>Providence</td>\n",
" <td>Rhode Island</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>188.0</td>\n",
" <td>93.0</td>\n",
" <td>NaN</td>\n",
" <td>Keith</td>\n",
" </tr>\n",
" <tr>\n",
" <th>102227</th>\n",
" <td>103168</td>\n",
" <td>Keith Beavers</td>\n",
" <td>1983-02-09</td>\n",
" <td>London</td>\n",
" <td>Ontario</td>\n",
" <td>CAN</td>\n",
" <td>Canada</td>\n",
" <td>185.0</td>\n",
" <td>75.0</td>\n",
" <td>NaN</td>\n",
" <td>Keith</td>\n",
" </tr>\n",
" <tr>\n",
" <th>109900</th>\n",
" <td>111105</td>\n",
" <td>Keith Cumberpatch</td>\n",
" <td>1927-08-25</td>\n",
" <td>Christchurch</td>\n",
" <td>Canterbury</td>\n",
" <td>NZL</td>\n",
" <td>New Zealand</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2013-11-15</td>\n",
" <td>Keith</td>\n",
" </tr>\n",
" <tr>\n",
" <th>115973</th>\n",
" <td>117348</td>\n",
" <td>Keith Sanderson</td>\n",
" <td>1975-02-02</td>\n",
" <td>Plymouth</td>\n",
" <td>Massachusetts</td>\n",
" <td>USA</td>\n",
" <td>United States</td>\n",
" <td>183.0</td>\n",
" <td>95.0</td>\n",
" <td>NaN</td>\n",
" <td>Keith</td>\n",
" </tr>\n",
" <tr>\n",
" <th>122121</th>\n",
" <td>124176</td>\n",
" <td>Keith Ferguson</td>\n",
" <td>1979-09-07</td>\n",
" <td>Sale</td>\n",
" <td>Victoria</td>\n",
" <td>AUS</td>\n",
" <td>Australia</td>\n",
" <td>176.0</td>\n",
" <td>78.0</td>\n",
" <td>NaN</td>\n",
" <td>Keith</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>64 rows × 11 columns</p>\n",
"</div>"
],
"text/plain": [
" athlete_id name born_date born_city \\\n",
"1897 1907 Keith Hanlon 1966-09-01 NaN \n",
"3505 3517 Keith Wallace 1961-03-29 Preston \n",
"6228 6255 Keith Hartley 1940-10-15 Vancouver \n",
"8898 8946 Keith Mwila 1966-01-01 NaN \n",
"12053 12118 Keith Hervey 1898-11-03 Fulham \n",
"... ... ... ... ... \n",
"99921 100722 Keith Carney 1970-02-03 Providence \n",
"102227 103168 Keith Beavers 1983-02-09 London \n",
"109900 111105 Keith Cumberpatch 1927-08-25 Christchurch \n",
"115973 117348 Keith Sanderson 1975-02-02 Plymouth \n",
"122121 124176 Keith Ferguson 1979-09-07 Sale \n",
"\n",
" born_region born_country NOC height_cm weight_kg \\\n",
"1897 NaN NaN Ireland NaN NaN \n",
"3505 England GBR Great Britain 165.0 51.0 \n",
"6228 British Columbia CAN Canada 200.0 85.0 \n",
"8898 NaN NaN Zambia NaN NaN \n",
"12053 England GBR Great Britain NaN NaN \n",
"... ... ... ... ... ... \n",
"99921 Rhode Island USA United States 188.0 93.0 \n",
"102227 Ontario CAN Canada 185.0 75.0 \n",
"109900 Canterbury NZL New Zealand NaN NaN \n",
"115973 Massachusetts USA United States 183.0 95.0 \n",
"122121 Victoria AUS Australia 176.0 78.0 \n",
"\n",
" died_date first_name \n",
"1897 NaN Keith \n",
"3505 1999-12-31 Keith \n",
"6228 NaN Keith \n",
"8898 1993-01-09 Keith \n",
"12053 1973-02-22 Keith \n",
"... ... ... \n",
"99921 NaN Keith \n",
"102227 NaN Keith \n",
"109900 2013-11-15 Keith \n",
"115973 NaN Keith \n",
"122121 NaN Keith \n",
"\n",
"[64 rows x 11 columns]"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bios_new.query('first_name == \"Keith\"')"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"bios_new['born_datetime'] = pd.to_datetime(bios_new['born_date'])"
]
},
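{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of how `pd.to_datetime` can be made tolerant of bad input, using made-up values rather than the real `born_date` column. With `errors='coerce'`, entries that cannot be parsed become `NaT` instead of raising. A year extracted from a series that contains `NaT` typically comes back as floats, which is consistent with values such as `1886.0` in a derived year column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up values: one valid date, one malformed string, one missing entry\n",
"raw = pd.Series(['1966-09-01', 'not a date', None])\n",
"\n",
"# errors='coerce' turns unparseable entries into NaT instead of raising an error\n",
"parsed = pd.to_datetime(raw, errors='coerce')\n",
"\n",
"# Missing dates propagate as NaN in the extracted year\n",
"parsed.dt.year"
]
},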
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"bios_new['born_year'] = bios_new['born_datetime'].dt.year"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>born_year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Jean-François Blanchy</td>\n",
" <td>1886.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Arnaud Boetsch</td>\n",
" <td>1969.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Jean Borotra</td>\n",
" <td>1898.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Jacques Brugnon</td>\n",
" <td>1895.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Albert Canet</td>\n",
" <td>1878.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145495</th>\n",
" <td>Polina Luchnikova</td>\n",
" <td>2002.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145496</th>\n",
" <td>Valeriya Merkusheva</td>\n",
" <td>1999.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145497</th>\n",
" <td>Yuliya Smirnova</td>\n",
" <td>1998.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145498</th>\n",
" <td>André Foussard</td>\n",
" <td>1899.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145499</th>\n",
" <td>Bill Phillips</td>\n",
" <td>1913.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>145500 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" name born_year\n",
"0 Jean-François Blanchy 1886.0\n",
"1 Arnaud Boetsch 1969.0\n",
"2 Jean Borotra 1898.0\n",
"3 Jacques Brugnon 1895.0\n",
"4 Albert Canet 1878.0\n",
"... ... ...\n",
"145495 Polina Luchnikova 2002.0\n",
"145496 Valeriya Merkusheva 1999.0\n",
"145497 Yuliya Smirnova 1998.0\n",
"145498 André Foussard 1899.0\n",
"145499 Bill Phillips 1913.0\n",
"\n",
"[145500 rows x 2 columns]"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bios_new[['name','born_year']]"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"bios_new.to_csv('./data/bios_new.csv', index=False)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"bios['height_category'] = bios['height_cm'].apply(lambda x: 'Short' if x < 165 else ('Average' if x < 185 else 'Tall'))"
]
},
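{
"cell_type": "markdown",
"metadata": {},
"source": [
"One caveat with the lambda above: comparisons against `NaN` evaluate to `False`, so a missing height falls through to the final branch and is labelled `'Tall'`. A hedged alternative, sketched on made-up heights, is `pd.cut`, which leaves missing values as `NaN`. The bin edges mirror the thresholds used above; the lower edge of 0 is an assumption that heights are positive."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Made-up heights, including one missing value\n",
"heights = pd.Series([160.0, 183.0, 190.0, np.nan], name='height_cm')\n",
"\n",
"# right=False gives the bins [0, 165), [165, 185), [185, inf), matching the < comparisons in the lambda\n",
"categories = pd.cut(\n",
"    heights,\n",
"    bins=[0, 165, 185, np.inf],\n",
"    labels=['Short', 'Average', 'Tall'],\n",
"    right=False,\n",
")\n",
"categories"
]
},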
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"def categorize_athlete(row):\n",
"    if row['height_cm'] < 175 and row['weight_kg'] < 70:\n",
"        return 'Lightweight'\n",
"    elif row['height_cm'] < 185 or row['weight_kg'] <= 80:\n",
"        return 'Middleweight'\n",
"    else:\n",
"        return 'Heavyweight'\n",
"\n",
"bios['Category'] = bios.apply(categorize_athlete, axis=1)"
]
},
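{
"cell_type": "markdown",
"metadata": {},
"source": [
"Row-wise `apply` works, but it calls the Python function once per row. A vectorized sketch of the same rules with `np.select` is shown below on a made-up frame. The extra first condition and the `'Unknown'` label are assumptions added here so that rows with a missing height or weight are not silently labelled `'Heavyweight'`; this treats partially missing rows differently from the function above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Made-up measurements; the last row has missing values\n",
"demo = pd.DataFrame({\n",
"    'height_cm': [168.0, 183.0, 200.0, np.nan],\n",
"    'weight_kg': [64.0, 76.0, 95.0, np.nan],\n",
"})\n",
"\n",
"# Conditions are checked in order; the first one that is True wins\n",
"conditions = [\n",
"    demo['height_cm'].isna() | demo['weight_kg'].isna(),\n",
"    (demo['height_cm'] < 175) & (demo['weight_kg'] < 70),\n",
"    (demo['height_cm'] < 185) | (demo['weight_kg'] <= 80),\n",
"]\n",
"choices = ['Unknown', 'Lightweight', 'Middleweight']\n",
"\n",
"# Rows matching no condition fall back to the default label\n",
"demo['Category'] = np.select(conditions, choices, default='Heavyweight')\n",
"demo"
]
},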
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>athlete_id</th>\n",
" <th>name</th>\n",
" <th>born_date</th>\n",
" <th>born_city</th>\n",
" <th>born_region</th>\n",
" <th>born_country</th>\n",
" <th>NOC</th>\n",
" <th>height_cm</th>\n",
" <th>weight_kg</th>\n",
" <th>died_date</th>\n",
" <th>height_category</th>\n",
" <th>Category</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>Jean-François Blanchy</td>\n",
" <td>1886-12-12</td>\n",
" <td>Bordeaux</td>\n",
" <td>Gironde</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1960-10-02</td>\n",
" <td>Tall</td>\n",
" <td>Heavyweight</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>Arnaud Boetsch</td>\n",
" <td>1969-04-01</td>\n",
" <td>Meulan</td>\n",
" <td>Yvelines</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>183.0</td>\n",
" <td>76.0</td>\n",
" <td>NaN</td>\n",
" <td>Average</td>\n",
" <td>Middleweight</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>Jean Borotra</td>\n",
" <td>1898-08-13</td>\n",
" <td>Biarritz</td>\n",
" <td>Pyrénées-Atlantiques</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>183.0</td>\n",
" <td>76.0</td>\n",
" <td>1994-07-17</td>\n",
" <td>Average</td>\n",
" <td>Middleweight</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>Jacques Brugnon</td>\n",
" <td>1895-05-11</td>\n",
" <td>Paris VIIIe</td>\n",
" <td>Paris</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>168.0</td>\n",
" <td>64.0</td>\n",
" <td>1978-03-20</td>\n",
" <td>Average</td>\n",
" <td>Lightweight</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>Albert Canet</td>\n",
" <td>1878-04-17</td>\n",
" <td>Wandsworth</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>France</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1930-07-25</td>\n",
" <td>Tall</td>\n",
" <td>Heavyweight</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" athlete_id name born_date born_city \\\n",
"0 1 Jean-François Blanchy 1886-12-12 Bordeaux \n",
"1 2 Arnaud Boetsch 1969-04-01 Meulan \n",
"2 3 Jean Borotra 1898-08-13 Biarritz \n",
"3 4 Jacques Brugnon 1895-05-11 Paris VIIIe \n",
"4 5 Albert Canet 1878-04-17 Wandsworth \n",
"\n",
" born_region born_country NOC height_cm weight_kg \\\n",
"0 Gironde FRA France NaN NaN \n",
"1 Yvelines FRA France 183.0 76.0 \n",
"2 Pyrénées-Atlantiques FRA France 183.0 76.0 \n",
"3 Paris FRA France 168.0 64.0 \n",
"4 England GBR France NaN NaN \n",
"\n",
" died_date height_category Category \n",
"0 1960-10-02 Tall Heavyweight \n",
"1 NaN Average Middleweight \n",
"2 1994-07-17 Average Middleweight \n",
"3 1978-03-20 Average Lightweight \n",
"4 1930-07-25 Tall Heavyweight "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bios.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Merging & Concatenating Data"
]
},
{
"cell_type": "code",
"execution_count": 210,
"metadata": {},
"outputs": [],
"source": [
"nocs = pd.read_csv('./data/noc_regions.csv')"
]
},
{
"cell_type": "code",
"execution_count": 211,
"metadata": {},
"outputs": [],
"source": [
"bios_new = pd.merge(bios, nocs, left_on='born_country', right_on='NOC', how='left')"
]
},
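{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both frames in the merge above carry a column named `NOC`, so pandas appends suffixes (`_x` and `_y` by default) to tell them apart. The sketch below uses made-up stand-ins for `bios` and `nocs` to show two optional parameters of `pd.merge`: `suffixes` for readable column names and `indicator=True` for a `_merge` column that flags rows with no match. The frame contents and suffix names are illustrative assumptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up stand-ins; both frames have a column called NOC\n",
"left = pd.DataFrame({\n",
"    'name': ['A', 'B', 'C'],\n",
"    'born_country': ['FRA', 'GBR', None],\n",
"    'NOC': ['France', 'France', 'Ireland'],\n",
"})\n",
"right = pd.DataFrame({'NOC': ['FRA', 'GBR'], 'region': ['France', 'UK']})\n",
"\n",
"# how='left' keeps every row of the left frame, matched or not\n",
"merged = pd.merge(\n",
"    left, right,\n",
"    left_on='born_country', right_on='NOC',\n",
"    how='left',\n",
"    suffixes=('_bios', '_noc'),  # replaces the default _x / _y\n",
"    indicator=True,              # adds a _merge column: 'both' or 'left_only'\n",
")\n",
"merged"
]
},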
{
"cell_type": "code",
"execution_count": 212,
"metadata": {},
"outputs": [],
"source": [
"bios_new.rename(columns={'region': 'born_country_full'}, inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 218,
"metadata": {},
"outputs": [],
"source": [
"usa = bios[bios['born_country']=='USA'].copy()\n",
"gbr = bios[bios['born_country']=='GBR'].copy()"
]
},
{
"cell_type": "code",
"execution_count": 224,
"metadata": {},
"outputs": [],
"source": [
"new_df = pd.concat([usa,gbr])"
]
},
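{
"cell_type": "markdown",
"metadata": {},
"source": [
"`pd.concat` stacks the frames and keeps each frame's original row labels. When the inputs are not slices of the same frame, that can leave duplicate labels. A small sketch on made-up frames shows the default behaviour next to `ignore_index=True`, which renumbers the rows from 0."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up frames whose index labels overlap\n",
"a = pd.DataFrame({'name': ['A', 'B']}, index=[10, 20])\n",
"b = pd.DataFrame({'name': ['C']}, index=[10])\n",
"\n",
"kept = pd.concat([a, b])                      # labels 10, 20, 10 are preserved\n",
"fresh = pd.concat([a, b], ignore_index=True)  # labels become 0, 1, 2\n",
"fresh"
]
},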
{
"cell_type": "code",
"execution_count": 226,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>athlete_id</th>\n",
" <th>name</th>\n",
" <th>born_date</th>\n",
" <th>born_city</th>\n",
" <th>born_region</th>\n",
" <th>born_country</th>\n",
" <th>NOC</th>\n",
" <th>height_cm</th>\n",
" <th>weight_kg</th>\n",
" <th>died_date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>144811</th>\n",
" <td>148512</td>\n",
" <td>Benjamin Alexander</td>\n",
" <td>1983-05-08</td>\n",
" <td>London</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Jamaica</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>144815</th>\n",
" <td>148517</td>\n",
" <td>Ashley Watson</td>\n",
" <td>1993-10-28</td>\n",
" <td>Peterborough</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Jamaica</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145005</th>\n",
" <td>148716</td>\n",
" <td>Peder Kongshaug</td>\n",
" <td>2001-08-13</td>\n",
" <td>Wimbledon</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Norway</td>\n",
" <td>184.0</td>\n",
" <td>86.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145319</th>\n",
" <td>149041</td>\n",
" <td>Axel Brown</td>\n",
" <td>1992-04-02</td>\n",
" <td>Harrogate</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>Trinidad and Tobago</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145388</th>\n",
" <td>149111</td>\n",
" <td>Jean-Luc Baker</td>\n",
" <td>1993-10-07</td>\n",
" <td>Burnley</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>United States</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" athlete_id name born_date born_city born_region \\\n",
"144811 148512 Benjamin Alexander 1983-05-08 London England \n",
"144815 148517 Ashley Watson 1993-10-28 Peterborough England \n",
"145005 148716 Peder Kongshaug 2001-08-13 Wimbledon England \n",
"145319 149041 Axel Brown 1992-04-02 Harrogate England \n",
"145388 149111 Jean-Luc Baker 1993-10-07 Burnley England \n",
"\n",
" born_country NOC height_cm weight_kg died_date \n",
"144811 GBR Jamaica NaN NaN NaN \n",
"144815 GBR Jamaica NaN NaN NaN \n",
"145005 GBR Norway 184.0 86.0 NaN \n",
"145319 GBR Trinidad and Tobago NaN NaN NaN \n",
"145388 GBR United States NaN NaN NaN "
]
},
"execution_count": 226,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_df.tail()"
]
},
{
"cell_type": "code",
"execution_count": 228,
"metadata": {},
"outputs": [],
"source": [
"combined_df = pd.merge(results, bios, on='athlete_id', how='left')"
]
},
{
"cell_type": "code",
"execution_count": 229,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>type</th>\n",
" <th>discipline</th>\n",
" <th>event</th>\n",
" <th>as</th>\n",
" <th>athlete_id</th>\n",
" <th>noc</th>\n",
" <th>team</th>\n",
" <th>place</th>\n",
" <th>tied</th>\n",
" <th>medal</th>\n",
" <th>name</th>\n",
" <th>born_date</th>\n",
" <th>born_city</th>\n",
" <th>born_region</th>\n",
" <th>born_country</th>\n",
" <th>NOC</th>\n",
" <th>height_cm</th>\n",
" <th>weight_kg</th>\n",
" <th>died_date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1912.0</td>\n",
" <td>Summer</td>\n",
" <td>Tennis</td>\n",
" <td>Singles, Men (Olympic)</td>\n",
" <td>Jean-François Blanchy</td>\n",
" <td>1</td>\n",
" <td>FRA</td>\n",
" <td>None</td>\n",
" <td>17.0</td>\n",
" <td>True</td>\n",
" <td>None</td>\n",
" <td>Jean-François Blanchy</td>\n",
" <td>1886-12-12</td>\n",
" <td>Bordeaux</td>\n",
" <td>Gironde</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1960-10-02</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1912.0</td>\n",
" <td>Summer</td>\n",
" <td>Tennis</td>\n",
" <td>Doubles, Men (Olympic)</td>\n",
" <td>Jean-François Blanchy</td>\n",
" <td>1</td>\n",
" <td>FRA</td>\n",
" <td>Jean Montariol</td>\n",
" <td>NaN</td>\n",
" <td>False</td>\n",
" <td>None</td>\n",
" <td>Jean-François Blanchy</td>\n",
" <td>1886-12-12</td>\n",
" <td>Bordeaux</td>\n",
" <td>Gironde</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1960-10-02</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1920.0</td>\n",
" <td>Summer</td>\n",
" <td>Tennis</td>\n",
" <td>Singles, Men (Olympic)</td>\n",
" <td>Jean-François Blanchy</td>\n",
" <td>1</td>\n",
" <td>FRA</td>\n",
" <td>None</td>\n",
" <td>32.0</td>\n",
" <td>True</td>\n",
" <td>None</td>\n",
" <td>Jean-François Blanchy</td>\n",
" <td>1886-12-12</td>\n",
" <td>Bordeaux</td>\n",
" <td>Gironde</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1960-10-02</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1920.0</td>\n",
" <td>Summer</td>\n",
" <td>Tennis</td>\n",
" <td>Doubles, Mixed (Olympic)</td>\n",
" <td>Jean-François Blanchy</td>\n",
" <td>1</td>\n",
" <td>FRA</td>\n",
" <td>Jeanne Vaussard</td>\n",
" <td>8.0</td>\n",
" <td>True</td>\n",
" <td>None</td>\n",
" <td>Jean-François Blanchy</td>\n",
" <td>1886-12-12</td>\n",
" <td>Bordeaux</td>\n",
" <td>Gironde</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1960-10-02</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1920.0</td>\n",
" <td>Summer</td>\n",
" <td>Tennis</td>\n",
" <td>Doubles, Men (Olympic)</td>\n",
" <td>Jean-François Blanchy</td>\n",
" <td>1</td>\n",
" <td>FRA</td>\n",
" <td>Jacques Brugnon</td>\n",
" <td>4.0</td>\n",
" <td>False</td>\n",
" <td>None</td>\n",
" <td>Jean-François Blanchy</td>\n",
" <td>1886-12-12</td>\n",
" <td>Bordeaux</td>\n",
" <td>Gironde</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1960-10-02</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year type discipline event as \\\n",
"0 1912.0 Summer Tennis Singles, Men (Olympic) Jean-François Blanchy \n",
"1 1912.0 Summer Tennis Doubles, Men (Olympic) Jean-François Blanchy \n",
"2 1920.0 Summer Tennis Singles, Men (Olympic) Jean-François Blanchy \n",
"3 1920.0 Summer Tennis Doubles, Mixed (Olympic) Jean-François Blanchy \n",
"4 1920.0 Summer Tennis Doubles, Men (Olympic) Jean-François Blanchy \n",
"\n",
" athlete_id noc team place tied medal \\\n",
"0 1 FRA None 17.0 True None \n",
"1 1 FRA Jean Montariol NaN False None \n",
"2 1 FRA None 32.0 True None \n",
"3 1 FRA Jeanne Vaussard 8.0 True None \n",
"4 1 FRA Jacques Brugnon 4.0 False None \n",
"\n",
" name born_date born_city born_region born_country \\\n",
"0 Jean-François Blanchy 1886-12-12 Bordeaux Gironde FRA \n",
"1 Jean-François Blanchy 1886-12-12 Bordeaux Gironde FRA \n",
"2 Jean-François Blanchy 1886-12-12 Bordeaux Gironde FRA \n",
"3 Jean-François Blanchy 1886-12-12 Bordeaux Gironde FRA \n",
"4 Jean-François Blanchy 1886-12-12 Bordeaux Gironde FRA \n",
"\n",
" NOC height_cm weight_kg died_date \n",
"0 France NaN NaN 1960-10-02 \n",
"1 France NaN NaN 1960-10-02 \n",
"2 France NaN NaN 1960-10-02 \n",
"3 France NaN NaN 1960-10-02 \n",
"4 France NaN NaN 1960-10-02 "
]
},
"execution_count": 229,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"combined_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Handling Null Values"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"coffee.loc[[2,3], 'Units Sold'] = np.nan"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 25.00\n",
"1 10.00\n",
"2 33.75\n",
"3 33.75\n",
"4 35.00\n",
"5 25.00\n",
"6 40.00\n",
"7 30.00\n",
"8 45.00\n",
"9 35.00\n",
"10 45.00\n",
"11 35.00\n",
"12 45.00\n",
"13 35.00\n",
"Name: Units Sold, dtype: float64"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Assign the result back to coffee['Units Sold'] if you want these changes to stick\n",
"coffee['Units Sold'].fillna(coffee['Units Sold'].mean())"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 25.000000\n",
"1 10.000000\n",
"2 18.333333\n",
"3 26.666667\n",
"4 35.000000\n",
"5 25.000000\n",
"6 40.000000\n",
"7 30.000000\n",
"8 45.000000\n",
"9 35.000000\n",
"10 45.000000\n",
"11 35.000000\n",
"12 45.000000\n",
"13 35.000000\n",
"Name: Units Sold, dtype: float64"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# coffee['Units Sold'] = coffee['Units Sold'].interpolate()\n",
"coffee['Units Sold'].interpolate()"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Coffee Type</th>\n",
" <th>Units Sold</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Monday</td>\n",
" <td>Espresso</td>\n",
" <td>25.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Monday</td>\n",
" <td>Latte</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Wednesday</td>\n",
" <td>Espresso</td>\n",
" <td>35.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Wednesday</td>\n",
" <td>Latte</td>\n",
" <td>25.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Thursday</td>\n",
" <td>Espresso</td>\n",
" <td>40.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Thursday</td>\n",
" <td>Latte</td>\n",
" <td>30.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Friday</td>\n",
" <td>Espresso</td>\n",
" <td>45.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Friday</td>\n",
" <td>Latte</td>\n",
" <td>35.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Saturday</td>\n",
" <td>Espresso</td>\n",
" <td>45.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Saturday</td>\n",
" <td>Latte</td>\n",
" <td>35.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Sunday</td>\n",
" <td>Espresso</td>\n",
" <td>45.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Sunday</td>\n",
" <td>Latte</td>\n",
" <td>35.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Coffee Type Units Sold\n",
"0 Monday Espresso 25.0\n",
"1 Monday Latte 10.0\n",
"4 Wednesday Espresso 35.0\n",
"5 Wednesday Latte 25.0\n",
"6 Thursday Espresso 40.0\n",
"7 Thursday Latte 30.0\n",
"8 Friday Espresso 45.0\n",
"9 Friday Latte 35.0\n",
"10 Saturday Espresso 45.0\n",
"11 Saturday Latte 35.0\n",
"12 Sunday Espresso 45.0\n",
"13 Sunday Latte 35.0"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.dropna(subset=['Units Sold'])  # Use inplace=True if you want to update the coffee df"
]
},
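{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the three approaches above, assuming a small hypothetical `demo` frame rather than the `coffee` data. `fillna`, `interpolate`, and `dropna` each return a new object, so nothing changes in the original until the result is assigned back."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical demo frame (not the notebook's coffee data)\n",
"demo = pd.DataFrame({'units': [10.0, np.nan, np.nan, 40.0]})\n",
"\n",
"filled = demo['units'].fillna(demo['units'].mean())  # gaps become the mean of 10 and 40\n",
"interpolated = demo['units'].interpolate()  # linear fill between the neighbours\n",
"dropped = demo.dropna(subset=['units'])  # keeps only rows 0 and 3\n",
"\n",
"# demo is still unchanged here; assign back to keep a result\n",
"demo['units'] = interpolated"
]
},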
{
"cell_type": "code",
"execution_count": 253,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Coffee Type</th>\n",
" <th>Units Sold</th>\n",
" <th>price</th>\n",
" <th>revenue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Monday</td>\n",
" <td>Espresso</td>\n",
" <td>15.0</td>\n",
" <td>3.99</td>\n",
" <td>99.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Monday</td>\n",
" <td>Latte</td>\n",
" <td>15.0</td>\n",
" <td>5.99</td>\n",
" <td>89.85</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Wednesday</td>\n",
" <td>Espresso</td>\n",
" <td>35.0</td>\n",
" <td>3.99</td>\n",
" <td>139.65</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Wednesday</td>\n",
" <td>Latte</td>\n",
" <td>25.0</td>\n",
" <td>5.99</td>\n",
" <td>149.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Thursday</td>\n",
" <td>Espresso</td>\n",
" <td>40.0</td>\n",
" <td>3.99</td>\n",
" <td>159.60</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Thursday</td>\n",
" <td>Latte</td>\n",
" <td>30.0</td>\n",
" <td>5.99</td>\n",
" <td>179.70</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Friday</td>\n",
" <td>Espresso</td>\n",
" <td>45.0</td>\n",
" <td>3.99</td>\n",
" <td>179.55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Friday</td>\n",
" <td>Latte</td>\n",
" <td>35.0</td>\n",
" <td>5.99</td>\n",
" <td>209.65</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Saturday</td>\n",
" <td>Espresso</td>\n",
" <td>45.0</td>\n",
" <td>3.99</td>\n",
" <td>179.55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Saturday</td>\n",
" <td>Latte</td>\n",
" <td>35.0</td>\n",
" <td>5.99</td>\n",
" <td>209.65</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Sunday</td>\n",
" <td>Espresso</td>\n",
" <td>45.0</td>\n",
" <td>3.99</td>\n",
" <td>179.55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Sunday</td>\n",
" <td>Latte</td>\n",
" <td>35.0</td>\n",
" <td>5.99</td>\n",
" <td>209.65</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Coffee Type Units Sold price revenue\n",
"0 Monday Espresso 15.0 3.99 99.75\n",
"1 Monday Latte 15.0 5.99 89.85\n",
"4 Wednesday Espresso 35.0 3.99 139.65\n",
"5 Wednesday Latte 25.0 5.99 149.75\n",
"6 Thursday Espresso 40.0 3.99 159.60\n",
"7 Thursday Latte 30.0 5.99 179.70\n",
"8 Friday Espresso 45.0 3.99 179.55\n",
"9 Friday Latte 35.0 5.99 209.65\n",
"10 Saturday Espresso 45.0 3.99 179.55\n",
"11 Saturday Latte 35.0 5.99 209.65\n",
"12 Sunday Espresso 45.0 3.99 179.55\n",
"13 Sunday Latte 35.0 5.99 209.65"
]
},
"execution_count": 253,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee[coffee['Units Sold'].notna()]"
]
},
{
"cell_type": "code",
"execution_count": 245,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Coffee Type</th>\n",
" <th>Units Sold</th>\n",
" <th>price</th>\n",
" <th>revenue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Monday</td>\n",
" <td>Espresso</td>\n",
" <td>15.000000</td>\n",
" <td>3.99</td>\n",
" <td>99.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Monday</td>\n",
" <td>Latte</td>\n",
" <td>15.000000</td>\n",
" <td>5.99</td>\n",
" <td>89.85</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Tuesday</td>\n",
" <td>Espresso</td>\n",
" <td>21.666667</td>\n",
" <td>3.99</td>\n",
" <td>119.70</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Tuesday</td>\n",
" <td>Latte</td>\n",
" <td>28.333333</td>\n",
" <td>5.99</td>\n",
" <td>119.80</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Wednesday</td>\n",
" <td>Espresso</td>\n",
" <td>35.000000</td>\n",
" <td>3.99</td>\n",
" <td>139.65</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Wednesday</td>\n",
" <td>Latte</td>\n",
" <td>25.000000</td>\n",
" <td>5.99</td>\n",
" <td>149.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Thursday</td>\n",
" <td>Espresso</td>\n",
" <td>40.000000</td>\n",
" <td>3.99</td>\n",
" <td>159.60</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Thursday</td>\n",
" <td>Latte</td>\n",
" <td>30.000000</td>\n",
" <td>5.99</td>\n",
" <td>179.70</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Friday</td>\n",
" <td>Espresso</td>\n",
" <td>45.000000</td>\n",
" <td>3.99</td>\n",
" <td>179.55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Friday</td>\n",
" <td>Latte</td>\n",
" <td>35.000000</td>\n",
" <td>5.99</td>\n",
" <td>209.65</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Saturday</td>\n",
" <td>Espresso</td>\n",
" <td>45.000000</td>\n",
" <td>3.99</td>\n",
" <td>179.55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Saturday</td>\n",
" <td>Latte</td>\n",
" <td>35.000000</td>\n",
" <td>5.99</td>\n",
" <td>209.65</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Sunday</td>\n",
" <td>Espresso</td>\n",
" <td>45.000000</td>\n",
" <td>3.99</td>\n",
" <td>179.55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Sunday</td>\n",
" <td>Latte</td>\n",
" <td>35.000000</td>\n",
" <td>5.99</td>\n",
" <td>209.65</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Coffee Type Units Sold price revenue\n",
"0 Monday Espresso 15.000000 3.99 99.75\n",
"1 Monday Latte 15.000000 5.99 89.85\n",
"2 Tuesday Espresso 21.666667 3.99 119.70\n",
"3 Tuesday Latte 28.333333 5.99 119.80\n",
"4 Wednesday Espresso 35.000000 3.99 139.65\n",
"5 Wednesday Latte 25.000000 5.99 149.75\n",
"6 Thursday Espresso 40.000000 3.99 159.60\n",
"7 Thursday Latte 30.000000 5.99 179.70\n",
"8 Friday Espresso 45.000000 3.99 179.55\n",
"9 Friday Latte 35.000000 5.99 209.65\n",
"10 Saturday Espresso 45.000000 3.99 179.55\n",
"11 Saturday Latte 35.000000 5.99 209.65\n",
"12 Sunday Espresso 45.000000 3.99 179.55\n",
"13 Sunday Latte 35.000000 5.99 209.65"
]
},
"execution_count": 245,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Aggregating Data"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>athlete_id</th>\n",
" <th>name</th>\n",
" <th>born_date</th>\n",
" <th>born_city</th>\n",
" <th>born_region</th>\n",
" <th>born_country</th>\n",
" <th>NOC</th>\n",
" <th>height_cm</th>\n",
" <th>weight_kg</th>\n",
" <th>died_date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>Jean-François Blanchy</td>\n",
" <td>1886-12-12</td>\n",
" <td>Bordeaux</td>\n",
" <td>Gironde</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1960-10-02</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>Arnaud Boetsch</td>\n",
" <td>1969-04-01</td>\n",
" <td>Meulan</td>\n",
" <td>Yvelines</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>183.0</td>\n",
" <td>76.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>Jean Borotra</td>\n",
" <td>1898-08-13</td>\n",
" <td>Biarritz</td>\n",
" <td>Pyrénées-Atlantiques</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>183.0</td>\n",
" <td>76.0</td>\n",
" <td>1994-07-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>Jacques Brugnon</td>\n",
" <td>1895-05-11</td>\n",
" <td>Paris VIIIe</td>\n",
" <td>Paris</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>168.0</td>\n",
" <td>64.0</td>\n",
" <td>1978-03-20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>Albert Canet</td>\n",
" <td>1878-04-17</td>\n",
" <td>Wandsworth</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>France</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1930-07-25</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" athlete_id name born_date born_city \\\n",
"0 1 Jean-François Blanchy 1886-12-12 Bordeaux \n",
"1 2 Arnaud Boetsch 1969-04-01 Meulan \n",
"2 3 Jean Borotra 1898-08-13 Biarritz \n",
"3 4 Jacques Brugnon 1895-05-11 Paris VIIIe \n",
"4 5 Albert Canet 1878-04-17 Wandsworth \n",
"\n",
" born_region born_country NOC height_cm weight_kg died_date \n",
"0 Gironde FRA France NaN NaN 1960-10-02 \n",
"1 Yvelines FRA France 183.0 76.0 NaN \n",
"2 Pyrénées-Atlantiques FRA France 183.0 76.0 1994-07-17 \n",
"3 Paris FRA France 168.0 64.0 1978-03-20 \n",
"4 England GBR France NaN NaN 1930-07-25 "
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bios.head()"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"born_city\n",
"Budapest 1378\n",
"Moskva (Moscow) 883\n",
"Oslo 708\n",
"Stockholm 629\n",
"Praha (Prague) 600\n",
" ... \n",
"Bodrogkisfalud 1\n",
"Ternberg 1\n",
"Klaus 1\n",
"Plaški 1\n",
"Dulwich Hill 1\n",
"Name: count, Length: 22368, dtype: int64"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bios['born_city'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"born_region\n",
"California 1634\n",
"New York 990\n",
"Illinois 585\n",
"Pennsylvania 530\n",
"Massachusetts 530\n",
"New Jersey 381\n",
"Texas 368\n",
"Minnesota 365\n",
"Ohio 328\n",
"Michigan 319\n",
"Name: count, dtype: int64"
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bios[bios['born_country']=='USA']['born_region'].value_counts().head(10)"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"born_region\n",
"Utah 91\n",
"Missouri 91\n",
"North Carolina 86\n",
"Arizona 83\n",
"New Hampshire 83\n",
"Vermont 68\n",
"Mississippi 66\n",
"Alabama 64\n",
"Kentucky 62\n",
"Tennessee 62\n",
"Nebraska 60\n",
"Rhode Island 56\n",
"Montana 55\n",
"South Carolina 50\n",
"Maine 50\n",
"Alaska 45\n",
"Arkansas 42\n",
"Idaho 41\n",
"New Mexico 38\n",
"Nevada 36\n",
"South Dakota 27\n",
"West Virginia 24\n",
"Delaware 22\n",
"North Dakota 16\n",
"Wyoming 14\n",
"Name: count, dtype: int64"
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bios[bios['born_country']=='USA']['born_region'].value_counts().tail(25)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Groupby function in Pandas"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Coffee Type\n",
"Espresso 235.0\n",
"Latte 170.0\n",
"Name: Units Sold, dtype: float64"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.groupby(['Coffee Type'])['Units Sold'].sum()"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Coffee Type\n",
"Espresso 39.166667\n",
"Latte 28.333333\n",
"Name: Units Sold, dtype: float64"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.groupby(['Coffee Type'])['Units Sold'].mean()"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>Units Sold</th>\n",
" <th>price</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Coffee Type</th>\n",
" <th>Day</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"7\" valign=\"top\">Espresso</th>\n",
" <th>Friday</th>\n",
" <td>45.0</td>\n",
" <td>3.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Monday</th>\n",
" <td>25.0</td>\n",
" <td>3.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Saturday</th>\n",
" <td>45.0</td>\n",
" <td>3.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Sunday</th>\n",
" <td>45.0</td>\n",
" <td>3.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Thursday</th>\n",
" <td>40.0</td>\n",
" <td>3.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Tuesday</th>\n",
" <td>0.0</td>\n",
" <td>3.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Wednesday</th>\n",
" <td>35.0</td>\n",
" <td>3.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"7\" valign=\"top\">Latte</th>\n",
" <th>Friday</th>\n",
" <td>35.0</td>\n",
" <td>5.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Monday</th>\n",
" <td>10.0</td>\n",
" <td>5.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Saturday</th>\n",
" <td>35.0</td>\n",
" <td>5.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Sunday</th>\n",
" <td>35.0</td>\n",
" <td>5.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Thursday</th>\n",
" <td>30.0</td>\n",
" <td>5.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Tuesday</th>\n",
" <td>0.0</td>\n",
" <td>5.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Wednesday</th>\n",
" <td>25.0</td>\n",
" <td>5.99</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Units Sold price\n",
"Coffee Type Day \n",
"Espresso Friday 45.0 3.99\n",
" Monday 25.0 3.99\n",
" Saturday 45.0 3.99\n",
" Sunday 45.0 3.99\n",
" Thursday 40.0 3.99\n",
" Tuesday 0.0 3.99\n",
" Wednesday 35.0 3.99\n",
"Latte Friday 35.0 5.99\n",
" Monday 10.0 5.99\n",
" Saturday 35.0 5.99\n",
" Sunday 35.0 5.99\n",
" Thursday 30.0 5.99\n",
" Tuesday 0.0 5.99\n",
" Wednesday 25.0 5.99"
]
},
"execution_count": 94,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coffee.groupby(['Coffee Type', 'Day']).agg({'Units Sold': 'sum', 'price': 'mean'})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Pivot Tables"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {},
"outputs": [],
"source": [
"pivot = coffee.pivot(columns='Coffee Type', index='Day', values='revenue')"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Coffee Type\n",
"Espresso 937.65\n",
"Latte 1018.30\n",
"dtype: float64"
]
},
"execution_count": 105,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pivot.sum()"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Day\n",
"Friday 389.20\n",
"Monday 159.65\n",
"Saturday 389.20\n",
"Sunday 389.20\n",
"Thursday 339.30\n",
"Tuesday 0.00\n",
"Wednesday 289.40\n",
"dtype: float64"
]
},
"execution_count": 106,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pivot.sum(axis=1)"
]
},
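{
"cell_type": "markdown",
"metadata": {},
"source": [
"`pivot` needs each (index, column) pair to appear only once. As a hedged sketch on a hypothetical `sales` frame (not the `coffee` data), `pivot_table` handles repeated pairs by aggregating them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical data with a repeated (day, kind) pair\n",
"sales = pd.DataFrame({\n",
"    'day': ['Mon', 'Mon', 'Mon', 'Tue'],\n",
"    'kind': ['Latte', 'Latte', 'Espresso', 'Latte'],\n",
"    'revenue': [10.0, 5.0, 8.0, 12.0],\n",
"})\n",
"\n",
"# pivot() would raise a ValueError here because (Mon, Latte) appears twice\n",
"table = sales.pivot_table(index='day', columns='kind', values='revenue', aggfunc='sum')\n",
"table.sum(axis=1)  # total revenue per day"
]
},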
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Using datetime with Groupby"
]
},
{
"cell_type": "code",
"execution_count": 293,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year_born</th>\n",
" <th>month_born</th>\n",
" <th>name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1437</th>\n",
" <td>1970.0</td>\n",
" <td>1.0</td>\n",
" <td>239</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1461</th>\n",
" <td>1972.0</td>\n",
" <td>1.0</td>\n",
" <td>229</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1629</th>\n",
" <td>1986.0</td>\n",
" <td>1.0</td>\n",
" <td>227</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1497</th>\n",
" <td>1975.0</td>\n",
" <td>1.0</td>\n",
" <td>227</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1617</th>\n",
" <td>1985.0</td>\n",
" <td>1.0</td>\n",
" <td>225</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>1857.0</td>\n",
" <td>5.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>1857.0</td>\n",
" <td>7.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>1857.0</td>\n",
" <td>8.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98</th>\n",
" <td>1857.0</td>\n",
" <td>9.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1884</th>\n",
" <td>2009.0</td>\n",
" <td>1.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1885 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" year_born month_born name\n",
"1437 1970.0 1.0 239\n",
"1461 1972.0 1.0 229\n",
"1629 1986.0 1.0 227\n",
"1497 1975.0 1.0 227\n",
"1617 1985.0 1.0 225\n",
"... ... ... ...\n",
"95 1857.0 5.0 1\n",
"96 1857.0 7.0 1\n",
"97 1857.0 8.0 1\n",
"98 1857.0 9.0 1\n",
"1884 2009.0 1.0 1\n",
"\n",
"[1885 rows x 3 columns]"
]
},
"execution_count": 293,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bios['born_date'] = pd.to_datetime(bios['born_date'])\n",
"bios['month_born'] = bios['born_date'].dt.month\n",
"bios['year_born'] = bios['born_date'].dt.year\n",
"bios.groupby([bios['year_born'],bios['month_born']])['name'].count().reset_index().sort_values('name', ascending=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Functionality"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {},
"outputs": [],
"source": [
"# Useful methods: shift(), rank(), cumsum(), rolling()"
]
},
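{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the four methods named above, using a hypothetical `rev` frame. The column names here are illustrative assumptions and this is not necessarily how the columns in the `latte` output further down were built."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical revenue figures (not the notebook's data)\n",
"rev = pd.DataFrame({'revenue': [100.0, 150.0, 120.0, 180.0]})\n",
"\n",
"rev['yesterday'] = rev['revenue'].shift(1)  # previous row's value; first row is NaN\n",
"rev['pct_of_yesterday'] = rev['revenue'] / rev['yesterday'] * 100\n",
"rev['cumulative'] = rev['revenue'].cumsum()  # running total\n",
"rev['rank'] = rev['revenue'].rank(ascending=False)  # 1.0 marks the highest revenue\n",
"rev['2row_sum'] = rev['revenue'].rolling(2).sum()  # NaN until the window holds 2 rows\n",
"rev"
]
},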
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"latte = coffee[coffee['Coffee Type']==\"Latte\"].copy()\n",
"latte['3day'] = latte['Units Sold'].rolling(3).sum()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Day</th>\n",
" <th>Coffee Type</th>\n",
" <th>Units Sold</th>\n",
" <th>price</th>\n",
" <th>revenue</th>\n",
" <th>yesterday_revenue</th>\n",
" <th>pct_change</th>\n",
" <th>cumulative_revenue</th>\n",
" <th>3day</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Monday</td>\n",
" <td>Latte</td>\n",
" <td>15.000000</td>\n",
" <td>5.99</td>\n",
" <td>89.85</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>189.6</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Tuesday</td>\n",
" <td>Latte</td>\n",
" <td>28.333333</td>\n",
" <td>5.99</td>\n",
" <td>119.80</td>\n",
" <td>89.85</td>\n",
" <td>133.333333</td>\n",
" <td>429.1</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Wednesday</td>\n",
" <td>Latte</td>\n",
" <td>25.000000</td>\n",
" <td>5.99</td>\n",
" <td>149.75</td>\n",
" <td>119.80</td>\n",
" <td>125.000000</td>\n",
" <td>718.5</td>\n",
" <td>68.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Thursday</td>\n",
" <td>Latte</td>\n",
" <td>30.000000</td>\n",
" <td>5.99</td>\n",
" <td>179.70</td>\n",
" <td>149.75</td>\n",
" <td>120.000000</td>\n",
" <td>1057.8</td>\n",
" <td>83.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Friday</td>\n",
" <td>Latte</td>\n",
" <td>35.000000</td>\n",
" <td>5.99</td>\n",
" <td>209.65</td>\n",
" <td>179.70</td>\n",
" <td>116.666667</td>\n",
" <td>1447.0</td>\n",
" <td>90.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Saturday</td>\n",
" <td>Latte</td>\n",
" <td>35.000000</td>\n",
" <td>5.99</td>\n",
" <td>209.65</td>\n",
" <td>209.65</td>\n",
" <td>100.000000</td>\n",
" <td>1836.2</td>\n",
" <td>100.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Sunday</td>\n",
" <td>Latte</td>\n",
" <td>35.000000</td>\n",
" <td>5.99</td>\n",
" <td>209.65</td>\n",
" <td>209.65</td>\n",
" <td>100.000000</td>\n",
" <td>2225.4</td>\n",
" <td>105.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Day Coffee Type Units Sold price revenue yesterday_revenue \\\n",
"1 Monday Latte 15.000000 5.99 89.85 NaN \n",
"3 Tuesday Latte 28.333333 5.99 119.80 89.85 \n",
"5 Wednesday Latte 25.000000 5.99 149.75 119.80 \n",
"7 Thursday Latte 30.000000 5.99 179.70 149.75 \n",
"9 Friday Latte 35.000000 5.99 209.65 179.70 \n",
"11 Saturday Latte 35.000000 5.99 209.65 209.65 \n",
"13 Sunday Latte 35.000000 5.99 209.65 209.65 \n",
"\n",
" pct_change cumulative_revenue 3day \n",
"1 NaN 189.6 NaN \n",
"3 133.333333 429.1 NaN \n",
"5 125.000000 718.5 68.333333 \n",
"7 120.000000 1057.8 83.333333 \n",
"9 116.666667 1447.0 90.000000 \n",
"11 100.000000 1836.2 100.000000 \n",
"13 100.000000 2225.4 105.000000 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"latte"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Functionality (cont.)\n",
"These two libraries didn't actually make it into the final video.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install pyjanitor"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>day</th>\n",
" <th>coffee_type</th>\n",
" <th>units_sold</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Monday</td>\n",
" <td>Espresso</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Monday</td>\n",
" <td>Latte</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Tuesday</td>\n",
" <td>Espresso</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Tuesday</td>\n",
" <td>Latte</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Wednesday</td>\n",
" <td>Espresso</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Wednesday</td>\n",
" <td>Latte</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Thursday</td>\n",
" <td>Espresso</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Thursday</td>\n",
" <td>Latte</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Friday</td>\n",
" <td>Espresso</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Friday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Saturday</td>\n",
" <td>Espresso</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Saturday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Sunday</td>\n",
" <td>Espresso</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Sunday</td>\n",
" <td>Latte</td>\n",
" <td>35</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" day coffee_type units_sold\n",
"0 Monday Espresso 25\n",
"1 Monday Latte 15\n",
"2 Tuesday Espresso 30\n",
"3 Tuesday Latte 20\n",
"4 Wednesday Espresso 35\n",
"5 Wednesday Latte 25\n",
"6 Thursday Espresso 40\n",
"7 Thursday Latte 30\n",
"8 Friday Espresso 45\n",
"9 Friday Latte 35\n",
"10 Saturday Espresso 45\n",
"11 Saturday Latte 35\n",
"12 Sunday Espresso 45\n",
"13 Sunday Latte 35"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import janitor\n",
"\n",
"coffee.clean_names()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install skimpy"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">╭──────────────────────────────────────────────── skimpy summary ─────────────────────────────────────────────────╮\n",
"│ <span style=\"font-style: italic\"> Data Summary </span>
<span style=\"font-style: italic\"> Data Types </span>
│\n",
"│ ┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓ ┏━━━━━━━━━━━━━┳━━━━━━━┓
│\n",
"│ ┃<span style=\"color: #008080; text-decoration-color: #008080; font-
weight: bold\"> dataframe </span>┃<span style=\"color: #008080; text-
decoration-color: #008080; font-weight: bold\"> Values </span>┃ ┃<span
style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\"> Column
Type </span>┃<span style=\"color: #008080; text-decoration-color: #008080; font-
weight: bold\"> Count </span>┃
│\n",
"│ ┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩ ┡━━━━━━━━━━━━━╇━━━━━━━┩
│\n",
"│ │ Number of rows │ 308408 │ │ string │ 7 │
│\n",
"│ │ Number of columns │ 11 │ │ float64 │ 2 │
│\n",
"│ └───────────────────┴────────┘ │ int64 │ 1 │
│\n",
"│ │ bool │ 1 │
│\n",
"│ └─────────────┴───────┘
│\n",
"│ <span style=\"font-style: italic\">
number </span> │\n",
"│
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━
━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓ │\n",
"│ ┃<span style=\"font-weight: bold\"> column_name </span>┃<span
style=\"font-weight: bold\"> NA </span>┃<span style=\"font-weight: bold\"> NA
% </span>┃<span style=\"font-weight: bold\"> mean </span>┃<span style=\"font-
weight: bold\"> sd </span>┃<span style=\"font-weight: bold\"> p0
</span>┃<span style=\"font-weight: bold\"> p25 </span>┃<span style=\"font-
weight: bold\"> p50 </span>┃<span style=\"font-weight: bold\"> p75
</span>┃<span style=\"font-weight: bold\"> p100 </span>┃<span style=\"font-
weight: bold\"> hist </span>┃ │\n",
"│
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━
━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩ │\n",
"│ │ <span style=\"color: #af87ff; text-decoration-color: #af87ff\">year
</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
2601</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
0.84</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
2000</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
31</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
1900</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
2000</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
2000</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
2000</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
2000</span> │ <span style=\"color: #008000; text-decoration-color: #008000\">▁▂▂▅▇▇
</span> │ │\n",
"│ │ <span style=\"color: #af87ff; text-decoration-color:
#af87ff\">athlete_id </span> │ <span style=\"color: #008080; text-decoration-
color: #008080\"> 0</span> │ <span style=\"color: #008080; text-decoration-
color: #008080\"> 0</span> │ <span style=\"color: #008080; text-decoration-
color: #008080\"> 73000</span> │ <span style=\"color: #008080; text-decoration-
color: #008080\"> 41000</span> │ <span style=\"color: #008080; text-decoration-
color: #008080\"> 1</span> │ <span style=\"color: #008080; text-decoration-
color: #008080\"> 34000</span> │ <span style=\"color: #008080; text-decoration-
color: #008080\"> 74000</span> │ <span style=\"color: #008080; text-decoration-
color: #008080\"> 110000</span> │ <span style=\"color: #008080; text-decoration-
color: #008080\"> 150000</span> │ <span style=\"color: #008000; text-decoration-
color: #008000\">▆▇▆▇▇▅ </span> │ │\n",
"│ │ <span style=\"color: #af87ff; text-decoration-color: #af87ff\">place
</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
25215</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
8.18</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
16</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
19</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
1</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
5</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
9</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
20</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
180</span> │ <span style=\"color: #008000; text-decoration-color: #008000\"> ▇▁
</span> │ │\n",
"│
└────────────────┴─────────┴───────┴────────┴────────┴───────┴────────┴────────┴───
──────┴─────────┴─────────┘ │\n",
"│ <span style=\"font-style: italic\">
bool </span> │\n",
"│
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓ │\n",
"│ ┃<span style=\"font-weight: bold\"> column_name
</span>┃<span style=\"font-weight: bold\"> true </span>┃<span
style=\"font-weight: bold\"> true rate </span>┃<span
style=\"font-weight: bold\"> hist </span>┃ │\n",
"│
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩ │\n",
"│ │ <span style=\"color: #af87ff; text-decoration-color: #af87ff\">tied
</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
45940</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
0.15</span> │ <span style=\"color: #008000; text-decoration-color: #008000\">
▇ ▁ </span> │ │\n",
"│
└───────────────────────────────────┴───────────────────┴──────────────────────────
────┴─────────────────────┘ │\n",
"│ <span style=\"font-style: italic\">
string </span> │\n",
"│
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━
━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓ │\n",
"│ ┃<span style=\"font-weight: bold\"> column_name </span>┃<span
style=\"font-weight: bold\"> NA </span>┃<span style=\"font-weight:
bold\"> NA % </span>┃<span style=\"font-weight: bold\"> words per row
</span>┃<span style=\"font-weight: bold\"> total words </span>┃ │\n",
"│
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━
━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩ │\n",
"│ │ <span style=\"color: #af87ff; text-decoration-color: #af87ff\">type
</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
2601</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
0.84</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
0.99</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
305807</span> │ │\n",
"│ │ <span style=\"color: #af87ff; text-decoration-color:
#af87ff\">discipline </span> │ <span style=\"color: #008080; text-
decoration-color: #008080\"> 1</span> │ <span style=\"color: #008080;
text-decoration-color: #008080\"> 0</span> │ <span style=\"color: #008080;
text-decoration-color: #008080\"> 2</span> │ <span
style=\"color: #008080; text-decoration-color: #008080\">
610211</span> │ │\n",
"│ │ <span style=\"color: #af87ff; text-decoration-color: #af87ff\">event
</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
0</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
0</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
4.2</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
1303323</span> │ │\n",
"│ │ <span style=\"color: #af87ff; text-decoration-color: #af87ff\">as
</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
0</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
0</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
2.1</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
634574</span> │ │\n",
"│ │ <span style=\"color: #af87ff; text-decoration-color: #af87ff\">noc
</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
1</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
0</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
1</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
308407</span> │ │\n",
"│ │ <span style=\"color: #af87ff; text-decoration-color: #af87ff\">team
</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
186694</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
60.53</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
0.62</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
190405</span> │ │\n",
"│ │ <span style=\"color: #af87ff; text-decoration-color: #af87ff\">medal
</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
264269</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
85.69</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
0.14</span> │ <span style=\"color: #008080; text-decoration-color: #008080\">
44139</span> │ │\n",
"│
└─────────────────────────┴───────────────┴─────────────┴──────────────────────────
──┴───────────────────────┘ │\n",
"╰────────────────────────────────────────────────────── End
──────────────────────────────────────────────────────╯\n",
"</pre>\n"
],
"text/plain": [
"╭──────────────────────────────────────────────── skimpy summary
─────────────────────────────────────────────────╮\n",
"│ \u001b[3m Data Summary \u001b[0m \u001b[3m Data Types \u001b[0m │\n",
"│ ┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓ ┏━━━━━━━━━━━━━┳━━━━━━━┓
│\n",
"│ ┃\u001b[1;36m \u001b[0m\u001b[1;36mdataframe \u001b[0m\u001b[1;36m \u001b[0m┃\u001b[1;36m \u001b[0m\u001b[1;36mValues\u001b[0m\u001b[1;36m \u001b[0m┃ ┃\u001b[1;36m \u001b[0m\u001b[1;36mColumn Type\u001b[0m\u001b[1;36m \u001b[0m┃\u001b[1;36m \u001b[0m\u001b[1;36mCount\u001b[0m\u001b[1;36m \u001b[0m┃ │\n",
"│ ┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩ ┡━━━━━━━━━━━━━╇━━━━━━━┩
│\n",
"│ │ Number of rows │ 308408 │ │ string │ 7 │
│\n",
"│ │ Number of columns │ 11 │ │ float64 │ 2 │
│\n",
"│ └───────────────────┴────────┘ │ int64 │ 1 │
│\n",
"│ │ bool │ 1 │
│\n",
"│ └─────────────┴───────┘
│\n",
"│ \u001b[3m number
\u001b[0m │\n",
"│
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━
━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓ │\n",
"│ ┃\u001b[1m \u001b[0m\u001b[1mcolumn_name \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mNA \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mNA % \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mmean \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1msd \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mp0 \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mp25 \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mp50 \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mp75 \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mp100 \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mhist \u001b[0m\u001b[1m \u001b[0m┃ │\n",
"│
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━
━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩ │\n",
"│ │ \u001b[38;5;141myear \u001b[0m │ \u001b[36m 2601\u001b[0m │ \u001b[36m 0.84\u001b[0m │ \u001b[36m 2000\u001b[0m │ \u001b[36m 31\u001b[0m │ \u001b[36m 1900\u001b[0m │ \u001b[36m 2000\u001b[0m │ \u001b[36m 2000\u001b[0m │ \u001b[36m 2000\u001b[0m │ \u001b[36m 2000\u001b[0m │ \u001b[32m▁▂▂▅▇▇ \u001b[0m │ │\n",
"│ │ \u001b[38;5;141mathlete_id \u001b[0m │ \u001b[36m 0\u001b[0m │ \u001b[36m 0\u001b[0m │ \u001b[36m 73000\u001b[0m │ \u001b[36m 41000\u001b[0m │ \u001b[36m 1\u001b[0m │ \u001b[36m 34000\u001b[0m │ \u001b[36m 74000\u001b[0m │ \u001b[36m 110000\u001b[0m │ \u001b[36m 150000\u001b[0m │ \u001b[32m▆▇▆▇▇▅ \u001b[0m │ │\n",
"│ │ \u001b[38;5;141mplace \u001b[0m │ \u001b[36m 25215\u001b[0m │ \u001b[36m 8.18\u001b[0m │ \u001b[36m 16\u001b[0m │ \u001b[36m 19\u001b[0m │ \u001b[36m 1\u001b[0m │ \u001b[36m 5\u001b[0m │ \u001b[36m 9\u001b[0m │ \u001b[36m 20\u001b[0m │ \u001b[36m 180\u001b[0m │ \u001b[32m ▇▁ \u001b[0m │ │\n",
"│
└────────────────┴─────────┴───────┴────────┴────────┴───────┴────────┴────────┴───
──────┴─────────┴─────────┘ │\n",
"│ \u001b[3m bool
\u001b[0m │\n",
"│
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓ │\n",
"│ ┃\u001b[1m \u001b[0m\u001b[1mcolumn_name \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mtrue \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mtrue rate \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mhist \u001b[0m\u001b[1m \u001b[0m┃ │\n",
"│
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩ │\n",
"│ │ \u001b[38;5;141mtied \u001b[0m │ \u001b[36m
45940\u001b[0m │ \u001b[36m 0.15\u001b[0m │ \u001b[32m
▇ ▁ \u001b[0m │ │\n",
"│
└───────────────────────────────────┴───────────────────┴──────────────────────────
────┴─────────────────────┘ │\n",
"│ \u001b[3m string
\u001b[0m │\n",
"│
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━
━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓ │\n",
"│ ┃\u001b[1m \u001b[0m\u001b[1mcolumn_name \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mNA \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mNA % \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mwords per row \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mtotal words \u001b[0m\u001b[1m \u001b[0m┃ │\n",
"│
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━
━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩ │\n",
"│ │ \u001b[38;5;141mtype \u001b[0m │ \u001b[36m 2601\u001b[0m │ \u001b[36m 0.84\u001b[0m │ \u001b[36m 0.99\u001b[0m │ \u001b[36m 305807\u001b[0m │ │\n",
"│ │ \u001b[38;5;141mdiscipline \u001b[0m │ \u001b[36m 1\u001b[0m │ \u001b[36m 0\u001b[0m │ \u001b[36m 2\u001b[0m │ \u001b[36m 610211\u001b[0m │ │\n",
"│ │ \u001b[38;5;141mevent \u001b[0m │ \u001b[36m 0\u001b[0m │ \u001b[36m 0\u001b[0m │ \u001b[36m 4.2\u001b[0m │ \u001b[36m 1303323\u001b[0m │ │\n",
"│ │ \u001b[38;5;141mas \u001b[0m │ \u001b[36m 0\u001b[0m │ \u001b[36m 0\u001b[0m │ \u001b[36m 2.1\u001b[0m │ \u001b[36m 634574\u001b[0m │ │\n",
"│ │ \u001b[38;5;141mnoc \u001b[0m │ \u001b[36m 1\u001b[0m │ \u001b[36m 0\u001b[0m │ \u001b[36m 1\u001b[0m │ \u001b[36m 308407\u001b[0m │ │\n",
"│ │ \u001b[38;5;141mteam \u001b[0m │ \u001b[36m 186694\u001b[0m │ \u001b[36m 60.53\u001b[0m │ \u001b[36m 0.62\u001b[0m │ \u001b[36m 190405\u001b[0m │ │\n",
"│ │ \u001b[38;5;141mmedal \u001b[0m │ \u001b[36m 264269\u001b[0m │ \u001b[36m 85.69\u001b[0m │ \u001b[36m 0.14\u001b[0m │ \u001b[36m 44139\u001b[0m │ │\n",
"│
└─────────────────────────┴───────────────┴─────────────┴──────────────────────────
──┴───────────────────────┘ │\n",
"╰────────────────────────────────────────────────────── End
──────────────────────────────────────────────────────╯\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from skimpy import skim\n",
"\n",
"skim(results)"
]
},
{
"cell_type": "code",
"execution_count": 319,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 14 entries, 0 to 13\n",
"Data columns (total 7 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Day 14 non-null object \n",
" 1 Coffee Type 14 non-null object \n",
" 2 Units Sold 14 non-null float64\n",
" 3 price 14 non-null float64\n",
" 4 revenue 14 non-null float64\n",
" 5 yesterday_revenue 12 non-null float64\n",
" 6 pct_change 12 non-null float64\n",
"dtypes: float64(5), object(2)\n",
"memory usage: 916.0+ bytes\n"
]
}
],
"source": [
"coffee.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## New Functionality"
]
},
{
"cell_type": "code",
"execution_count": 335,
"metadata": {},
"outputs": [],
"source": [
"results_numpy = pd.read_csv('./data/results.csv')\n",
"results_arrow = pd.read_csv('./data/results.csv', engine='pyarrow', dtype_backend='pyarrow')"
]
},
{
"cell_type": "code",
"execution_count": 337,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 308408 entries, 0 to 308407\n",
"Data columns (total 11 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 year 305807 non-null float64\n",
" 1 type 305807 non-null object \n",
" 2 discipline 308407 non-null object \n",
" 3 event 308408 non-null object \n",
" 4 as 308408 non-null object \n",
" 5 athlete_id 308408 non-null int64 \n",
" 6 noc 308407 non-null object \n",
" 7 team 121714 non-null object \n",
" 8 place 283193 non-null float64\n",
" 9 tied 308408 non-null bool \n",
" 10 medal 44139 non-null object \n",
"dtypes: bool(1), float64(2), int64(1), object(7)\n",
"memory usage: 23.8+ MB\n"
]
}
],
"source": [
"results_numpy.info()"
]
},
{
"cell_type": "code",
"execution_count": 338,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 308408 entries, 0 to 308407\n",
"Data columns (total 11 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 year 305807 non-null double[pyarrow]\n",
" 1 type 305807 non-null string[pyarrow]\n",
" 2 discipline 308407 non-null string[pyarrow]\n",
" 3 event 308408 non-null string[pyarrow]\n",
" 4 as 308408 non-null string[pyarrow]\n",
" 5 athlete_id 308408 non-null int64[pyarrow] \n",
" 6 noc 308407 non-null string[pyarrow]\n",
" 7 team 121714 non-null string[pyarrow]\n",
" 8 place 283193 non-null double[pyarrow]\n",
" 9 tied 308408 non-null bool[pyarrow] \n",
" 10 medal 44139 non-null string[pyarrow]\n",
"dtypes: bool[pyarrow](1), double[pyarrow](2), int64[pyarrow](1), string[pyarrow](7)\n",
"memory usage: 37.5 MB\n"
]
}
],
"source": [
"results_arrow.info()"
]
},
{
"cell_type": "code",
"execution_count": 349,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>athlete_id</th>\n",
" <th>name</th>\n",
" <th>born_date</th>\n",
" <th>born_city</th>\n",
" <th>born_region</th>\n",
" <th>born_country</th>\n",
" <th>NOC</th>\n",
" <th>height_cm</th>\n",
" <th>weight_kg</th>\n",
" <th>died_date</th>\n",
" <th>month_born</th>\n",
" <th>year_born</th>\n",
" <th>height_rank</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>Jean-François Blanchy</td>\n",
" <td>1886-12-12</td>\n",
" <td>Bordeaux</td>\n",
" <td>Gironde</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1960-10-02</td>\n",
" <td>12.0</td>\n",
" <td>1886.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>Arnaud Boetsch</td>\n",
" <td>1969-04-01</td>\n",
" <td>Meulan</td>\n",
" <td>Yvelines</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>183.0</td>\n",
" <td>76.0</td>\n",
" <td>NaN</td>\n",
" <td>4.0</td>\n",
" <td>1969.0</td>\n",
" <td>27597.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>Jean Borotra</td>\n",
" <td>1898-08-13</td>\n",
" <td>Biarritz</td>\n",
" <td>Pyrénées-Atlantiques</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>183.0</td>\n",
" <td>76.0</td>\n",
" <td>1994-07-17</td>\n",
" <td>8.0</td>\n",
" <td>1898.0</td>\n",
" <td>27597.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>Jacques Brugnon</td>\n",
" <td>1895-05-11</td>\n",
" <td>Paris VIIIe</td>\n",
" <td>Paris</td>\n",
" <td>FRA</td>\n",
" <td>France</td>\n",
" <td>168.0</td>\n",
" <td>64.0</td>\n",
" <td>1978-03-20</td>\n",
" <td>5.0</td>\n",
" <td>1895.0</td>\n",
" <td>83975.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>Albert Canet</td>\n",
" <td>1878-04-17</td>\n",
" <td>Wandsworth</td>\n",
" <td>England</td>\n",
" <td>GBR</td>\n",
" <td>France</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1930-07-25</td>\n",
" <td>4.0</td>\n",
" <td>1878.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" athlete_id name born_date born_city \\\n",
"0 1 Jean-François Blanchy 1886-12-12 Bordeaux \n",
"1 2 Arnaud Boetsch 1969-04-01 Meulan \n",
"2 3 Jean Borotra 1898-08-13 Biarritz \n",
"3 4 Jacques Brugnon 1895-05-11 Paris VIIIe \n",
"4 5 Albert Canet 1878-04-17 Wandsworth \n",
"\n",
" born_region born_country NOC height_cm weight_kg \\\n",
"0 Gironde FRA France NaN NaN \n",
"1 Yvelines FRA France 183.0 76.0 \n",
"2 Pyrénées-Atlantiques FRA France 183.0 76.0 \n",
"3 Paris FRA France 168.0 64.0 \n",
"4 England GBR France NaN NaN \n",
"\n",
" died_date month_born year_born height_rank \n",
"0 1960-10-02 12.0 1886.0 NaN \n",
"1 NaN 4.0 1969.0 27597.5 \n",
"2 1994-07-17 8.0 1898.0 27597.5 \n",
"3 1978-03-20 5.0 1895.0 83975.0 \n",
"4 1930-07-25 4.0 1878.0 NaN "
]
},
"execution_count": 349,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"filtered_bios = bios[(bios['born_region'] == 'New Hampshire') | (bios['born_city'] == 'San Francisco')]\n",
"\n",
"bios.head()"
]
},
{
"cell_type": "code",
"execution_count": 351,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Date</th>\n",
" <th>Item</th>\n",
" <th>Units Sold</th>\n",
" <th>Price Per Unit</th>\n",
" <th>Salesperson</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2024-05-01</td>\n",
" <td>Apple</td>\n",
" <td>30</td>\n",
" <td>1.00</td>\n",
" <td>John</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2024-05-01</td>\n",
" <td>Banana</td>\n",
" <td>21</td>\n",
" <td>0.50</td>\n",
" <td>John</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2024-05-01</td>\n",
" <td>Orange</td>\n",
" <td>15</td>\n",
" <td>0.75</td>\n",
" <td>John</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2024-05-02</td>\n",
" <td>Apple</td>\n",
" <td>40</td>\n",
" <td>1.00</td>\n",
" <td>Alice</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2024-05-02</td>\n",
" <td>Banana</td>\n",
" <td>34</td>\n",
" <td>0.50</td>\n",
" <td>Alice</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>2024-05-03</td>\n",
" <td>Orange</td>\n",
" <td>20</td>\n",
" <td>0.75</td>\n",
" <td>John</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>2024-05-03</td>\n",
" <td>Apple</td>\n",
" <td>45</td>\n",
" <td>1.00</td>\n",
" <td>Alice</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2024-05-03</td>\n",
" <td>Orange</td>\n",
" <td>25</td>\n",
" <td>0.75</td>\n",
" <td>John</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Date Item Units Sold Price Per Unit Salesperson\n",
"0 2024-05-01 Apple 30 1.00 John\n",
"1 2024-05-01 Banana 21 0.50 John\n",
"2 2024-05-01 Orange 15 0.75 John\n",
"3 2024-05-02 Apple 40 1.00 Alice\n",
"4 2024-05-02 Banana 34 0.50 Alice\n",
"5 2024-05-03 Orange 20 0.75 John\n",
"6 2024-05-03 Apple 45 1.00 Alice\n",
"7 2024-05-03 Orange 25 0.75 John"
]
},
"execution_count": 351,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"# Creating a DataFrame\n",
"data = {\n",
" 'Date': ['2024-05-01', '2024-05-01', '2024-05-01', '2024-05-02', '2024-05-02', '2024-05-03', '2024-05-03', '2024-05-03'],\n",
" 'Item': ['Apple', 'Banana', 'Orange', 'Apple', 'Banana', 'Orange', 'Apple', 'Orange'],\n",
" 'Units Sold': [30, 21, 15, 40, 34, 20, 45, 25],\n",
" 'Price Per Unit': [1.0, 0.5, 0.75, 1.0, 0.5, 0.75, 1.0, 0.75],\n",
" 'Salesperson': ['John', 'John', 'John', 'Alice', 'Alice', 'John', 'Alice', 'John']\n",
"}\n",
"\n",
"df = pd.DataFrame(data)\n",
"\n",
"# Display the DataFrame\n",
"df\n"
]
},
{
"cell_type": "code",
"execution_count": 353,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Salesperson</th>\n",
" <th>Alice</th>\n",
" <th>John</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2024-05-01</th>\n",
" <td>NaN</td>\n",
" <td>66.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2024-05-02</th>\n",
" <td>74.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2024-05-03</th>\n",
" <td>45.0</td>\n",
" <td>45.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Salesperson Alice John\n",
"Date \n",
"2024-05-01 NaN 66.0\n",
"2024-05-02 74.0 NaN\n",
"2024-05-03 45.0 45.0"
]
},
"execution_count": 353,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pivot_table = pd.pivot_table(df, values='Units Sold', index='Date', columns='Salesperson', aggfunc='sum')\n",
"pivot_table\n"
]
},
{
"cell_type": "code",
"execution_count": 356,
"metadata": {},
"outputs": [
{
"data": {
"image/png":
"iVBORw0KGgoAAAANSUhEUgAAA1EAAAIjCAYAAADiGJHUAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIH
ZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAAA9hAAAPYQGoP6dpA
ABVyUlEQVR4nO3deVxU9eL/
8feArAq4oBIp2leTRBFT0UxzKZfMNMtuWuKe1W3Q0vKmLS7dbntm5dz8tlqZZnqvS5lbZnIr+4YVtoCWZVK
5kiKCqMic3x9e5tfINgdnmIF5PR8PHjXnnDnnPYcPyts55zMWwzAMAQAAAABcEuDtAAAAAABQk1CiAAAAAM
AEShQAAAAAmECJAgAAAAATKFEAAAAAYAIlCgAAAABMoEQBAAAAgAmUKAAAAAAwgRIFAAAAACZQogB41Zw5c
2SxWKrlWH369FGfPn0cjz/++GNZLBatWLGiWo4/
btw4tWzZslqOVVX5+fm69dZbFRMTI4vForvvvtujx1u0aJEsFou2b99e6bbnfv9qs19+
+UUWi0WLFi2q8nOffvpp9wdzgcVi0Zw5c7xy7BI14WetZOz/
8ssv3o4CoAooUQDcpuSXgpKv0NBQxcbGauDAgXr++ed1/
Phxtxxn3759mjNnjjIyMtyyP3fy5WyuePTRR7Vo0SL99a9/1VtvvaXRo0dX+pzi4mLFxsbKYrFo3bp1ZW7z
z3/+s0qFwF0effRRrVq1yu37bdmypa699toy11V3Sa+KDz74wOuFx1Xvv/+
+rr76ajVq1EihoaFq06aN7r33Xv3xxx/ejgbAD1GiALjdww8/rLfeeksvvviiJk+eLEm6++67lZiYqG++
+cZp2wcffFCFhYWm9r9v3z7NnTvXdFHZuHGjNm7caOo5ZlWU7eWXX9auXbs8evzz9dFHH+myyy7T7NmzlZK
Sos6dO7v0nP3796tly5Z6++23y9ymtpYoT2nRooUKCwtdKrHn44MPPtDcuXPdus/
CwkI9+OCDbt3nvffeqyFDhujAgQO67777tGDBAvXr108LFixQUlKSz/
9clWX06NEqLCxUixYtvB0FQBXU8XYAALXPoEGD1KVLF8fjmTNn6qOPPtK1116roUOHKisrS2FhYZKkOnXqq
E4dz/5RdOLECYWHhys4ONijx6lMUFCQV4/vikOHDikhIcHUcxYvXqxOnTpp7Nixuv/+
+1VQUKC6det6KKF/KHkntyZyd+6lS5fqmWee0YgRI/T2228rMDDQsW7cuHHq27ev/
vKXv+irr77y+J8l7hQYGOj0WgDULLwTBaBaXHnllXrooYe0d+9eLV682LG8rHuiNm3apJ49e6p+/
fqqV6+e4uPjdf/990s6e4lUcnKyJGn8+PGOSwdL3uXo06eP2rdvry+//FK9evVSeHi447nl3VNTXFys++
+/XzExMapbt66GDh2qX3/91Wmbli1baty4caWe+
+d9VpatrPs0CgoKdM8996h58+YKCQlRfHy8nn76aRmG4bSdxWJRamqqVq1apfbt2yskJETt2rXT+vXryz7h
5zh06JAmTpyopk2bKjQ0VElJSXrjjTcc60suPduzZ4/Wrl3ryF7Z/
RqFhYVauXKlRo4cqZtuukmFhYVavXq10zYtW7bU999/
r61btzr2e+734dSpU5o2bZoaN26sunXr6vrrr9fhw4crfV2nTp3S7Nmz1bp1a4WEhKh58+b629/+plOnTjm
2sVgsKigo0BtvvOE4/p+/l7///rsmTJigpk2bOs7ra6+9Vumxq8qV45V3T9Ty5cuVkJCg0NBQtW/
fXitXrqzw/
p+XXnpJrVq1UkhIiJKTk5Wenu5YN27cONlsNklyugy3xDvvvKPOnTsrIiJCkZGRSkxM1HPPPVfp6zv3nqiS
n/Hdu3dr3Lhxql+/vqKiojR+/
HidOHGi0v3NnTtXDRo00EsvvVSqdHTt2lX33Xefvv3223IvmzQMQy1bttR1111Xat3JkycVFRWl22+/
XdL//zl49913NXfuXF144YWKiIjQjTfeqGPHjunUqVO6++671aRJE9WrV0/jx493Gmslrz81NVVvv/
224uPjFRoaqs6dOystLc1pu/LuiVq3bp169+7tOO/JyclasmSJY/2PP/
6o4cOHKyYmRqGhoWrWrJlGjhypY8eOVXouAbhPzfknGwA13ujRo3X//fdr48aNmjRpUpnbfP/
997r22mvVoUMHPfzwwwoJCdHu3bv16aefSpLatm2rhx9+WLNmzdJtt92mK664QpJ0+eWXO/bxxx9/
aNCgQRo5cqRSUlLUtGnTCnP94x//kMVi0X333adDhw5p/vz56tevnzIyMhzvmLnClWx/
ZhiGhg4dqi1btmjixInq2LGjNmzYoOnTp+v333/Xs88+67T9J598on//
+9+68847FRERoeeff17Dhw9Xdna2GjVqVG6uwsJC9enTR7t371ZqaqouuugiLV+
+XOPGjVNubq7uuusutW3bVm+99ZamTp2qZs2a6Z577pEkNW7cuMLXvGbNGuXn52vkyJGKiYlRnz599Pbbb+
uWW25xbDN//nxNnjxZ9erV0wMPPCBJpb4nkydPVoMGDTR79mz98ssvmj9/
vlJTU7Vs2bJyj2232zV06FB98sknuu2229S2bVt9++23evbZZ/
XDDz84Lt976623dOutt6pr16667bbbJEmtWrWSJB08eFCXXXaZ4xffxo0ba926dZo4caLy8vJcmlijqKhIO
Tk5pZaX9Uvt+Rxv7dq1GjFihBITE/XYY4/p6NGjmjhxoi688MIyt1+yZImOHz+u22+/XRaLRU8+
+aRuuOEG/fzzzwoKCtLtt9+uffv2adOmTXrrrbecnrtp0ybdfPPNuuqqq/TEE09IkrKysvTpp5/
qrrvuqvSclOWmm27SRRddpMcee0xfffWVXnnlFTVp0sSx/
7L8+OOP2rVrl8aNG6fIyMgytxkzZoxmz56t999/XyNHjiy13mKxKCUlRU8+
+aSOHDmihg0bOta99957ysvLU0pKitNzHnvsMYWFhWnGjBnavXu3XnjhBQUFBSkgIEBHjx7VnDlz9Pnnn2v
RokW66KKLNGvWLKfnb926VcuWLdOUKVMUEhKif/
7zn7r66qv1xRdfqH379uW+3kWLFmnChAlq166dZs6cqfr16+vrr7/
W+vXrdcstt+j06dMaOHCgTp06pcmTJysmJka///673n//
feXm5ioqKqrcfQNwMwMA3OT11183JBnp6enlbhMVFWVceumljsezZ882/
vxH0bPPPmtIMg4fPlzuPtLT0w1Jxuuvv15qXe/evQ1JxsKFC8tc17t3b8fjLVu2GJKMCy+80MjLy3Msf/
fddw1JxnPPPedY1qJFC2Ps2LGV7rOibGPHjjVatGjheLxq1SpDkvHII484bXfjjTcaFovF2L17t2OZJCM4O
Nhp2Y4dOwxJxgsvvFDqWH82f/
58Q5KxePFix7LTp08b3bt3N+rVq+f02lu0aGEMHjy4wv392bXXXmv06NHD8fill14y6tSpYxw6dMhpu3bt2
jmdpxIlY6Zfv36G3W53LJ86daoRGBho5ObmOpade67feustIyAgwPjPf/7jtM+FCxcakoxPP/
3Usaxu3bplfv8mTpxoXHDBBUZOTo7T8pEjRxpRUVHGiRMnKnz9LVq0MCRV+LV8+XLTx9uzZ0+pcZSYmGg0a
9bMOH78uGPZxx9/bEhyGlclz23UqJFx5MgRx/LVq1cbkoz33nvPscxqtRpl/
Spw1113GZGRkcaZM2cqfP1lkWTMnj3b8bjkZ3zChAlO211//fVGo0aNKtxXyc/
Is88+W+F2kZGRRqdOnRyPz/
1Z27VrlyHJePHFF52eN3ToUKNly5aOsVfyZ0L79u2N06dPO7a7+eabDYvFYgwaNMjp+d27d3c6jmEYju/
79u3bHcv27t1rhIaGGtdff71jWcnY37Nnj2EYhpGbm2tEREQY3bp1MwoLC532WZLv66+/LjWmAHgHl/
MBqFb16tWrcJa++vXrS5JWr14tu91epWOEhIRo/PjxLm8/ZswYRUREOB7feOONuuCCC/
TBBx9U6fiu+uCDDxQYGKgpU6Y4Lb/nnntkGEapme769evneAdFkjp06KDIyEj9/
PPPlR4nJiZGN998s2NZUFCQpkyZovz8fG3durVK+f/44w9t2LDBab/
Dhw93XA5lxm233eZ0KdkVV1yh4uJi7d27t9znLF++XG3bttUll1yinJwcx9eVV14pSdqyZUuFxzQMQ//
61780ZMgQGYbhtI+BAwfq2LFj+uqrryrN3q1bN23atKnU17lTjJ/P8fbt26dvv/1WY8aMUb169RzLe/
furcTExDKfM2LECDVo0MDxuOSd0crGi3T257CgoECbNm2qdFtX3XHHHU6Pr7jiCv3xxx/
Ky8sr9zklf1b8+eezLBERERXup02bNurWrZvTxCdHjhzRunXrNGrUqFKXFI8ZM8bpHsZu3brJMAxNmDDBab
tu3brp119/1ZkzZ5yWd+/
e3WlSlri4OF133XXasGGDiouLy8y4adMmHT9+XDNmzCh1X1lJvpJ3mjZs2ODSpZAAPIcSBaBa5efnV/
gL0YgRI9SjRw/deuutatq0qUaOHKl3333XVKG68MILTU0icfHFFzs9tlgsat26tcc/
v2Xv3r2KjY0tdT7atm3rWP9ncXFxpfbRoEEDHT16tNLjXHzxxQoIcP4jv7zjuGrZsmUqKirSpZdeqt27d2v
37t06cuRIqV9WXXHuayv55b+i1/bjjz/q+++/V+PGjZ2+2rRpI+nsfWAVOXz4sHJzc/
XSSy+V2kdJCa9sH5IUHR2tfv36lfo6d2bD8zleyfeodevWpdaVtUyq2jktceedd6pNmzYaNGiQmjVrpgkTJ
rh8/115qpKn5Gejso9HOH78eKVFa8yYMfr0008d53L58uUqKioqcwbEc7OWlJfmzZuXWm6320tdunnunynS
2SJ34sSJcu/1++mnnySpwsv9LrroIk2bNk2vvPKKoqOjNXDgQNlsNu6HAryAe6IAVJvffvtNx44dK/
eXPkkKCwtTWlqatmzZorVr12r9+vVatmyZrrzySm3cuNGl2azM3MfkqvI+ELi4uLjaZtgq7zjGOZNQVJeSo
tSjR48y1//888/6n//5H5f2VZXXZrfblZiYqHnz5pW5/
txfeMt6viSlpKRo7NixZW7ToUOHCvdhRnUf73zGS5MmTZSRkaENGzZo3bp1WrdunV5//
wOMgkYRu7Ph9wAAAABJRU5ErkJggg==",
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Assuming your DataFrame is named 'bios' and already loaded\n",
"# First, filter out rows where the height_cm data is missing\n",
"bios_filtered = bios.dropna(subset=['height_cm'])\n",
"\n",
"# Plotting the histogram\n",
"plt.figure(figsize=(10, 6))\n",
"plt.hist(bios_filtered['height_cm'], bins=20, color='blue', edgecolor='black')\n",
"\n",
"plt.title('Distribution of Athlete Heights in Olympics')\n",
"plt.xlabel('Height in cm')\n",
"plt.ylabel('Number of Athletes')\n",
"plt.grid(True)\n",
"\n",
"# Use a logarithmic y-axis so sparsely populated bins in the tails remain visible\n",
"plt.yscale('log')\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What Next???"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check out some of my other tutorials:\n",
"- [Cleaning Data w/ Pandas](https://www.youtube.com/live/oad9tVEsfI0?si=qnDOg9BSRFxcP5gZ)\n",
"- [Solving 100 Python Pandas Problems](https://youtu.be/i7v2m-ebXB4?si=VSJHnZryqMv8GW54)\n",
"- [Real-world Data Analysis Problems w/ Python Pandas](https://youtu.be/eMOA1pPVUc4)\n",
"\n",
"Platforms to try:\n",
"- [Stratascratch](https://stratascratch.com/?via=keith)\n",
"- [Analyst Builder](https://www.analystbuilder.com/?via=keith)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "tutorial",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
