Commit 8dc01e2

Add files via upload
1 parent 18b4c52 commit 8dc01e2

1 file changed

+273
-0
lines changed

@@ -0,0 +1,273 @@
# Handling Missing Values in Pandas

**Until now we have worked with complete data, i.e. data with no missing values. In real-world datasets, however, missing values are one of the main problems.**

*Many datasets arrive with missing data, either because a value exists but was not collected or because it never existed.*

In Pandas, missing data is represented by two values:

* `None`: Python's built-in null object, used to mark an empty or absent value.
* `NaN`: short for `Not a Number`, a special floating-point value.

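As a minimal sketch of how the two markers behave, the example below builds two small Series by hand (the values are made up for illustration and are not taken from the car-sales dataset). It assumes only that `isnull()` treats both `None` and `NaN` as missing.

```python
import numpy as np
import pandas as pd

# In a numeric Series, None and np.nan both end up as NaN (dtype float64).
numbers = pd.Series([1, None, np.nan])
print(numbers.dtype)
print(numbers.isnull().tolist())

# In a Series of strings, None also counts as a missing value.
colours = pd.Series(["White", None, "Blue"])
print(colours.isnull().tolist())
```
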
**Pandas provides several useful functions for detecting, removing, and replacing null values in a DataFrame:**

1. `isnull()`
2. `notnull()`
3. `dropna()`
4. `fillna()`
5. `replace()`

## 1. Checking for missing values using `isnull()` and `notnull()`

Let's import pandas and load our car-sales dataset, which has some missing values.


```python
import pandas as pd
```


```python
car_sales_missing_df = pd.read_csv("https://raw.githubusercontent.com/kRiShNa-429407/learn-python/main/contrib/pandas/Datasets/car-sales-missing-data.csv")
print(car_sales_missing_df)
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue       NaN    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green       NaN    4.0   $4,500
    6   Honda    NaN       NaN    4.0   $7,500
    7   Honda   Blue       NaN    4.0      NaN
    8  Toyota  White   60000.0    NaN      NaN
    9     NaN  White   31600.0    4.0   $9,700



```python
# Using isnull()

print(car_sales_missing_df.isnull())
```

        Make  Colour  Odometer  Doors  Price
    0  False   False     False  False  False
    1  False   False     False  False  False
    2  False   False      True  False  False
    3  False   False     False  False  False
    4  False   False     False  False  False
    5  False   False      True  False  False
    6  False    True      True  False  False
    7  False   False      True  False   True
    8  False   False     False   True   True
    9   True   False     False  False  False


Note here:
* `True` marks a `NaN` (missing) value
* `False` marks a value that is present

To find the number of missing values in each column, use `isnull().sum()`.


```python
print(car_sales_missing_df.isnull().sum())
```

    Make        1
    Colour      1
    Odometer    4
    Doors       1
    Price       2
    dtype: int64


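The same boolean mask can be summarised in other ways. The sketch below uses a hypothetical four-row frame built inline (not the real dataset) and shows the per-column count, the total count, and the fraction of missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the car-sales data, built inline for illustration.
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", None, "BMW"],
    "Odometer": [150043.0, np.nan, np.nan, 11179.0],
})

missing_per_column = df.isnull().sum()          # count of missing values per column
total_missing = int(missing_per_column.sum())   # count over the whole frame
missing_share = df.isnull().mean()              # fraction of missing values per column

print(missing_per_column)
print(total_missing)
print(missing_share)
```

Taking the mean works because `True` counts as 1 and `False` as 0.
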
You can also check for the presence of null values in a single column.


```python
print(car_sales_missing_df["Odometer"].isnull())
```

    0    False
    1    False
    2     True
    3    False
    4    False
    5     True
    6     True
    7     True
    8    False
    9    False
    Name: Odometer, dtype: bool



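A boolean Series like this one can also be used as a filter. The sketch below, on a small hypothetical frame, selects the rows whose `Odometer` value is missing and the rows where it is present.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows, built inline for illustration.
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota", "BMW"],
    "Odometer": [150043.0, 87899.0, np.nan, 11179.0],
})

# Rows where the Odometer reading is missing.
missing_odometer = df[df["Odometer"].isnull()]

# Rows where the Odometer reading is present.
known_odometer = df[df["Odometer"].notnull()]

print(missing_odometer)
print(known_odometer)
```
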
```python
# Using notnull()

print(car_sales_missing_df.notnull())
```

        Make  Colour  Odometer  Doors  Price
    0   True    True      True   True   True
    1   True    True      True   True   True
    2   True    True     False   True   True
    3   True    True      True   True   True
    4   True    True      True   True   True
    5   True    True     False   True   True
    6   True   False     False   True   True
    7   True    True     False   True  False
    8   True    True      True  False  False
    9  False    True      True   True   True


Note here:
* `True` marks a value that is present
* `False` marks a `NaN` (missing) value

#### A little note here: `isnull()` asks "is this value null?", so it returns `True` for `NaN` values. `notnull()` asks the opposite, so it returns `True` for values that are not `NaN`.

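The relationship between the two methods can be checked directly. This is a small sketch on a hypothetical Series; it also uses `isna()` and `notna()`, which pandas provides as aliases of `isnull()` and `notnull()`.

```python
import numpy as np
import pandas as pd

# Hypothetical Odometer readings with one missing value.
odometer = pd.Series([150043.0, np.nan, 11179.0])

is_missing = odometer.isnull()
is_present = odometer.notnull()

# notnull() is the element-wise negation of isnull().
print(is_present.equals(~is_missing))

# isna() and notna() give the same results as isnull() and notnull().
print(odometer.isna().equals(is_missing))
print(odometer.notna().equals(is_present))
```
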
## 2. Filling missing values using `fillna()` and `replace()`


```python
# Fill every missing value with a single value using `fillna()`
print(car_sales_missing_df.fillna(0))
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue       0.0    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green       0.0    4.0   $4,500
    6   Honda      0       0.0    4.0   $7,500
    7   Honda   Blue       0.0    4.0        0
    8  Toyota  White   60000.0    0.0        0
    9       0  White   31600.0    4.0   $9,700



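Filling every column with `0` puts a number into text columns such as `Make` and `Colour`. One possible alternative, sketched here on a hypothetical frame, is to pass `fillna()` a dictionary that gives each column its own fill value. The choices of `"Unknown"` and the column mean are assumptions for illustration, not a recommendation from the original tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows, built inline for illustration.
df = pd.DataFrame({
    "Colour": ["White", None, "Blue"],
    "Odometer": [100.0, np.nan, 300.0],
})

# Map each column name to the value used for its missing entries.
fill_values = {
    "Colour": "Unknown",
    "Odometer": df["Odometer"].mean(),  # mean() skips NaN by default
}

filled = df.fillna(fill_values)
print(filled)
```

`fillna()` returns a new DataFrame here, so `df` itself still contains its missing values.
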
```python
# Filling missing values with the previous value using `ffill()`
print(car_sales_missing_df.ffill())
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue   87899.0    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green  213095.0    4.0   $4,500
    6   Honda  Green  213095.0    4.0   $7,500
    7   Honda   Blue  213095.0    4.0   $7,500
    8  Toyota  White   60000.0    4.0   $7,500
    9  Toyota  White   31600.0    4.0   $9,700



```python
# Filling missing values with the next value using `bfill()`
print(car_sales_missing_df.bfill())
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue   11179.0    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green   60000.0    4.0   $4,500
    6   Honda   Blue   60000.0    4.0   $7,500
    7   Honda   Blue   60000.0    4.0   $9,700
    8  Toyota  White   60000.0    4.0   $9,700
    9     NaN  White   31600.0    4.0   $9,700


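In the outputs above, a single known value is copied across a run of three consecutive gaps in `Odometer`. If that is not wanted, both methods accept a `limit` argument that caps how many consecutive missing values are filled. The sketch below uses a hypothetical Series to show the effect.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with a run of three consecutive gaps.
odometer = pd.Series([87899.0, np.nan, np.nan, np.nan, 60000.0])

# Fill at most one missing value after each known value.
forward = odometer.ffill(limit=1)

# Fill at most one missing value before each known value.
backward = odometer.bfill(limit=1)

print(forward)
print(backward)
```
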
#### Filling null values using the `replace()` method

**Now we are going to replace every `NaN` value in the DataFrame with the value -125.**

*We import NumPy so that we can refer to `NaN` as `np.nan`.*


```python
import numpy as np
```


```python
print(car_sales_missing_df.replace(to_replace=np.nan, value=-125))
```

         Make Colour  Odometer   Doors    Price
    0  Toyota  White  150043.0     4.0   $4,000
    1   Honda    Red   87899.0     4.0   $5,000
    2  Toyota   Blue    -125.0     3.0   $7,000
    3     BMW  Black   11179.0     5.0  $22,000
    4  Nissan  White  213095.0     4.0   $3,500
    5  Toyota  Green    -125.0     4.0   $4,500
    6   Honda   -125    -125.0     4.0   $7,500
    7   Honda   Blue    -125.0     4.0     -125
    8  Toyota  White   60000.0  -125.0     -125
    9    -125  White   31600.0     4.0   $9,700


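`replace()` also works in the other direction. Some datasets mark missing entries with placeholder text instead of `NaN`; the sketch below assumes two such hypothetical placeholders, `"?"` and `"N/A"`, and converts them to `NaN` so that `isnull()` can detect them.

```python
import numpy as np
import pandas as pd

# Hypothetical column in which "?" and "N/A" stand for missing values.
colours = pd.Series(["White", "?", "Blue", "N/A"])

# Replace both placeholders with NaN.
cleaned = colours.replace(["?", "N/A"], np.nan)

print(cleaned)
print(cleaned.isnull().sum())
```
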
## 3. Dropping missing values using `dropna()`

**To drop null values from a DataFrame, we use the `dropna()` function. It can drop rows or columns that contain null values in several different ways.**

#### Dropping rows with at least 1 null value


```python
# Drop every row that contains at least one NaN (null) value
print(car_sales_missing_df.dropna(axis=0))
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500


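Dropping on any missing value removed six of the ten rows. When only some columns matter, the `subset` argument restricts the check to those columns. The sketch below, on a hypothetical frame, drops rows only when `Odometer` is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows, built inline for illustration.
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", None],
    "Odometer": [150043.0, np.nan, 31600.0],
})

# Only a missing Odometer value causes a row to be dropped.
kept = df.dropna(subset=["Odometer"])
print(kept)
```

The last row survives even though its `Make` is missing, because `Make` is not in `subset`.
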
#### Dropping rows if all values in that row are missing


```python
# Drop a row only if every value in it is missing; otherwise leave the row as it is
print(car_sales_missing_df.dropna(how="all", axis=0))
```

         Make Colour  Odometer  Doors    Price
    0  Toyota  White  150043.0    4.0   $4,000
    1   Honda    Red   87899.0    4.0   $5,000
    2  Toyota   Blue       NaN    3.0   $7,000
    3     BMW  Black   11179.0    5.0  $22,000
    4  Nissan  White  213095.0    4.0   $3,500
    5  Toyota  Green       NaN    4.0   $4,500
    6   Honda    NaN       NaN    4.0   $7,500
    7   Honda   Blue       NaN    4.0      NaN
    8  Toyota  White   60000.0    NaN      NaN
    9     NaN  White   31600.0    4.0   $9,700

No row in this dataset is completely empty, so every row is kept.


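Because the car-sales data has no fully empty row, the effect of `how="all"` is easier to see on a hypothetical frame that does have one. The sketch also shows `thresh`, which keeps a row only if it has at least that many non-missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 has one known value, row 2 has none.
df = pd.DataFrame({
    "Make": ["Toyota", None, None],
    "Colour": ["White", "Red", None],
    "Odometer": [150043.0, np.nan, np.nan],
})

# Drops only row 2, where every value is missing.
no_empty_rows = df.dropna(how="all")

# Keeps only rows with at least 2 non-missing values.
at_least_two = df.dropna(thresh=2)

print(no_empty_rows)
print(at_least_two)
```
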
#### Dropping columns with at least 1 null value


```python
print(car_sales_missing_df.dropna(axis=1))
```

    Empty DataFrame
    Columns: []
    Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


With `axis=1`, `dropna()` drops every column that has at least 1 missing value.

**Here the DataFrame becomes empty after `dropna()` because every column has at least 1 null value, so all of the columns are removed.**
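
To avoid losing every column, one option is to combine `axis=1` with `thresh`, which keeps a column only if it has at least that many non-missing values. The sketch below uses a hypothetical frame and an arbitrary threshold of 3; both are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: Make has 3 known values, Odometer 1, Doors 4.
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW", None],
    "Odometer": [150043.0, np.nan, np.nan, np.nan],
    "Doors": [4.0, 4.0, 5.0, 4.0],
})

# Drops every column that has any missing value.
strict = df.dropna(axis=1)

# Keeps columns that have at least 3 non-missing values.
lenient = df.dropna(axis=1, thresh=3)

print(strict.columns.tolist())
print(lenient.columns.tolist())
```
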