Skip to content

Commit 579120f

Browse files
authored
Merge pull request animator#619 from kRiShNa-429407/main
Add Content: Handling Missing Values In Pandas
2 parents c9b8146 + 8c95bb1 commit 579120f

File tree

3 files changed

+276
-0
lines changed

3 files changed

+276
-0
lines changed
Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
Make,Colour,Odometer,Doors,Price
2+
Toyota,White,150043,4,"$4,000"
3+
Honda,Red,87899,4,"$5,000"
4+
Toyota,Blue,,3,"$7,000"
5+
BMW,Black,11179,5,"$22,000"
6+
Nissan,White,213095,4,"$3,500"
7+
Toyota,Green,,4,"$4,500"
8+
Honda,,,4,"$7,500"
9+
Honda,Blue,,4,
10+
Toyota,White,60000,,
11+
,White,31600,4,"$9,700"
Lines changed: 264 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,264 @@
1+
# Handling Missing Values in Pandas
2+
3+
In real life, many datasets arrive with missing data either because it exists and was not collected or it never existed.
4+
5+
In Pandas missing data is represented by two values:
6+
7+
* `None` : None is simply is `keyword` refer as empty or none.
8+
* `NaN` : Acronym for `Not a Number`.
9+
10+
There are several useful functions for detecting, removing, and replacing null values in Pandas DataFrame:
11+
12+
1. `isnull()`
13+
2. `notnull()`
14+
3. `dropna()`
15+
4. `fillna()`
16+
5. `replace()`
17+
18+
## 2. Checking for missing values using `isnull()` and `notnull()`
19+
20+
Let's import pandas and our fancy car-sales dataset having some missing values.
21+
22+
```python
23+
import pandas as pd
24+
25+
car_sales_missing_df = pd.read_csv("Datasets/car-sales-missing-data.csv")
26+
print(car_sales_missing_df)
27+
```
28+
29+
Make Colour Odometer Doors Price
30+
0 Toyota White 150043.0 4.0 $4,000
31+
1 Honda Red 87899.0 4.0 $5,000
32+
2 Toyota Blue NaN 3.0 $7,000
33+
3 BMW Black 11179.0 5.0 $22,000
34+
4 Nissan White 213095.0 4.0 $3,500
35+
5 Toyota Green NaN 4.0 $4,500
36+
6 Honda NaN NaN 4.0 $7,500
37+
7 Honda Blue NaN 4.0 NaN
38+
8 Toyota White 60000.0 NaN NaN
39+
9 NaN White 31600.0 4.0 $9,700
40+
41+
42+
43+
```python
44+
## Using isnull()
45+
46+
print(car_sales_missing_df.isnull())
47+
```
48+
49+
Make Colour Odometer Doors Price
50+
0 False False False False False
51+
1 False False False False False
52+
2 False False True False False
53+
3 False False False False False
54+
4 False False False False False
55+
5 False False True False False
56+
6 False True True False False
57+
7 False False True False True
58+
8 False False False True True
59+
9 True False False False False
60+
61+
62+
Note here:
63+
* `True` means for `NaN` values
64+
* `False` means for no `Nan` values
65+
66+
If we want to find the number of missing values in each column use `isnull().sum()`.
67+
68+
69+
```python
70+
print(car_sales_missing_df.isnull().sum())
71+
```
72+
73+
Make 1
74+
Colour 1
75+
Odometer 4
76+
Doors 1
77+
Price 2
78+
dtype: int64
79+
80+
81+
You can also check presense of null values in a single column.
82+
83+
84+
```python
85+
print(car_sales_missing_df["Odometer"].isnull())
86+
```
87+
88+
0 False
89+
1 False
90+
2 True
91+
3 False
92+
4 False
93+
5 True
94+
6 True
95+
7 True
96+
8 False
97+
9 False
98+
Name: Odometer, dtype: bool
99+
100+
101+
102+
```python
103+
## using notnull()
104+
105+
print(car_sales_missing_df.notnull())
106+
```
107+
108+
Make Colour Odometer Doors Price
109+
0 True True True True True
110+
1 True True True True True
111+
2 True True False True True
112+
3 True True True True True
113+
4 True True True True True
114+
5 True True False True True
115+
6 True False False True True
116+
7 True True False True False
117+
8 True True True False False
118+
9 False True True True True
119+
120+
121+
Note here:
122+
* `True` means no `NaN` values
123+
* `False` means for `NaN` values
124+
125+
`isnull()` means having null values so it gives boolean `True` for NaN values. And `notnull()` means having no null values so it gives `True` for no NaN value.
126+
127+
## 2. Filling missing values using `fillna()`, `replace()`.
128+
129+
130+
```python
131+
## Filling missing values with a single value using `fillna`
132+
print(car_sales_missing_df.fillna(0))
133+
```
134+
135+
Make Colour Odometer Doors Price
136+
0 Toyota White 150043.0 4.0 $4,000
137+
1 Honda Red 87899.0 4.0 $5,000
138+
2 Toyota Blue 0.0 3.0 $7,000
139+
3 BMW Black 11179.0 5.0 $22,000
140+
4 Nissan White 213095.0 4.0 $3,500
141+
5 Toyota Green 0.0 4.0 $4,500
142+
6 Honda 0 0.0 4.0 $7,500
143+
7 Honda Blue 0.0 4.0 0
144+
8 Toyota White 60000.0 0.0 0
145+
9 0 White 31600.0 4.0 $9,700
146+
147+
148+
149+
```python
150+
## Filling missing values with the previous value using `ffill()`
151+
print(car_sales_missing_df.ffill())
152+
```
153+
154+
Make Colour Odometer Doors Price
155+
0 Toyota White 150043.0 4.0 $4,000
156+
1 Honda Red 87899.0 4.0 $5,000
157+
2 Toyota Blue 87899.0 3.0 $7,000
158+
3 BMW Black 11179.0 5.0 $22,000
159+
4 Nissan White 213095.0 4.0 $3,500
160+
5 Toyota Green 213095.0 4.0 $4,500
161+
6 Honda Green 213095.0 4.0 $7,500
162+
7 Honda Blue 213095.0 4.0 $7,500
163+
8 Toyota White 60000.0 4.0 $7,500
164+
9 Toyota White 31600.0 4.0 $9,700
165+
166+
167+
168+
```python
169+
## illing null value with the next ones using 'bfill()'
170+
print(car_sales_missing_df.bfill())
171+
```
172+
173+
Make Colour Odometer Doors Price
174+
0 Toyota White 150043.0 4.0 $4,000
175+
1 Honda Red 87899.0 4.0 $5,000
176+
2 Toyota Blue 11179.0 3.0 $7,000
177+
3 BMW Black 11179.0 5.0 $22,000
178+
4 Nissan White 213095.0 4.0 $3,500
179+
5 Toyota Green 60000.0 4.0 $4,500
180+
6 Honda Blue 60000.0 4.0 $7,500
181+
7 Honda Blue 60000.0 4.0 $9,700
182+
8 Toyota White 60000.0 4.0 $9,700
183+
9 NaN White 31600.0 4.0 $9,700
184+
185+
186+
#### Filling a null values using `replace()` method
187+
188+
Now we are going to replace the all `NaN` value in the data frame with -125 value
189+
190+
For this we will also need numpy
191+
192+
193+
```python
194+
import numpy as np
195+
196+
print(car_sales_missing_df.replace(to_replace = np.nan, value = -125))
197+
```
198+
199+
Make Colour Odometer Doors Price
200+
0 Toyota White 150043.0 4.0 $4,000
201+
1 Honda Red 87899.0 4.0 $5,000
202+
2 Toyota Blue -125.0 3.0 $7,000
203+
3 BMW Black 11179.0 5.0 $22,000
204+
4 Nissan White 213095.0 4.0 $3,500
205+
5 Toyota Green -125.0 4.0 $4,500
206+
6 Honda -125 -125.0 4.0 $7,500
207+
7 Honda Blue -125.0 4.0 -125
208+
8 Toyota White 60000.0 -125.0 -125
209+
9 -125 White 31600.0 4.0 $9,700
210+
211+
212+
## 3. Dropping missing values using `dropna()`
213+
214+
In order to drop a null values from a dataframe, we used `dropna()` function this function drop Rows/Columns of datasets with Null values in different ways.
215+
216+
#### Dropping rows with at least 1 null value.
217+
218+
219+
```python
220+
print(car_sales_missing_df.dropna(axis = 0)) ##Now we drop rows with at least one Nan value (Null value)
221+
```
222+
223+
Make Colour Odometer Doors Price
224+
0 Toyota White 150043.0 4.0 $4,000
225+
1 Honda Red 87899.0 4.0 $5,000
226+
3 BMW Black 11179.0 5.0 $22,000
227+
4 Nissan White 213095.0 4.0 $3,500
228+
229+
230+
#### Dropping rows if all values in that row are missing.
231+
232+
233+
```python
234+
print(car_sales_missing_df.dropna(how = 'all',axis = 0)) ## If not have leave the row as it is
235+
```
236+
237+
Make Colour Odometer Doors Price
238+
0 Toyota White 150043.0 4.0 $4,000
239+
1 Honda Red 87899.0 4.0 $5,000
240+
2 Toyota Blue NaN 3.0 $7,000
241+
3 BMW Black 11179.0 5.0 $22,000
242+
4 Nissan White 213095.0 4.0 $3,500
243+
5 Toyota Green NaN 4.0 $4,500
244+
6 Honda NaN NaN 4.0 $7,500
245+
7 Honda Blue NaN 4.0 NaN
246+
8 Toyota White 60000.0 NaN NaN
247+
9 NaN White 31600.0 4.0 $9,700
248+
249+
250+
#### Dropping columns with at least 1 null value
251+
252+
253+
```python
254+
print(car_sales_missing_df.dropna(axis = 1))
255+
```
256+
257+
Empty DataFrame
258+
Columns: []
259+
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
260+
261+
262+
Now we drop a columns which have at least 1 missing values.
263+
264+
Here the dataset becomes empty after `dropna()` because each column as atleast 1 null value so it remove that columns resulting in an empty dataframe.

contrib/pandas/index.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,3 +6,4 @@
66
- [Group By Functions with Pandas](GroupBy_Functions_Pandas.md)
77
- [Excel using Pandas DataFrame](excel_with_pandas.md)
88
- [Importing and Exporting Data in Pandas](import-export.md)
9+
- [Handling Missing Values in Pandas](handling-missing-values.md)

0 commit comments

Comments
 (0)