Chapter1 PDF
constraints
CLEANING DATA IN PYTHON
Adel Nehme
Content Developer @ DataCamp
Course outline
SalesOrderID int64
Revenue object
Quantity int64
dtype: object
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31465 entries, 0 to 31464
Data columns (total 3 columns):
SalesOrderID 31465 non-null int64
Revenue 31465 non-null object
Quantity 31465 non-null int64
dtypes: int64(2), object(1)
memory usage: 737.5+ KB
'23153$1457$36865$32474$472$27510$16158$5694$6876$40487$807$6893$9153$6895$4216..
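The output above is what summing an object (string) column looks like: the values are concatenated rather than added. A minimal sketch of one way to repair this, assuming the DataFrame is called `sales` (a name not shown in the slides) and that every Revenue value ends in a `$` sign; the three sample rows are illustrative, using the first values visible in the concatenated string:

```python
import pandas as pd

# Hypothetical sample; the DataFrame name and the Quantity values are assumptions.
sales = pd.DataFrame({
    "SalesOrderID": [1, 2, 3],
    "Revenue": pd.Series(["23153$", "1457$", "36865$"], dtype=object),
    "Quantity": [7, 3, 9],
})

# Summing strings joins them end to end instead of adding numbers.
concatenated = sales["Revenue"].sum()

# Strip the trailing "$" and convert the column to integers.
sales["Revenue"] = sales["Revenue"].str.strip("$").astype("int")

# Verify the constraint before relying on it.
assert pd.api.types.is_integer_dtype(sales["Revenue"])

total = sales["Revenue"].sum()
```

If the revenue values could contain decimals, converting with `astype("float")` would be the safer choice.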
df['marriage_status'].describe()
marriage_status
...
mean 1.4
std 0.20
min 0.00
50% 1.8 ...
marriage_status
count 241
unique 4
top 1
freq 120
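The two summaries above differ because the column is numeric in the first and categorical in the second. A short sketch of the conversion that likely sits between them, assuming the codes are stored as integers; the sample values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample of integer-coded marriage statuses.
df = pd.DataFrame({"marriage_status": [0, 1, 1, 2, 3, 1]})

# As integers, describe() reports mean, std, min, and so on,
# which are meaningless for category codes.
numeric_summary = df["marriage_status"].describe()

# Convert to a categorical dtype so pandas treats the codes as labels.
df["marriage_status"] = df["marriage_status"].astype("category")

# Now describe() reports count, unique, top and freq instead.
categorical_summary = df["marriage_status"].describe()
```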
Motivation
movies.head()
movie_name avg_rating
0 The Godfather 5
1 Frozen 2 3
2 Shrek 4
...
movie_name avg_rating
23 A Beautiful Mind 6
65 La Vita e Bella 6
77 Amelie 6
# Assert statement
assert movies['avg_rating'].max() <= 5
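With the ratings of 6 shown earlier, the assert statement above would raise an `AssertionError`. A hedged sketch of how the offending rows might be located and then treated; capping at the maximum of 5 is only one possible choice (dropping the rows is another), and the sample data is a small illustrative subset:

```python
import pandas as pd

# Illustrative subset of the movies data shown above.
movies = pd.DataFrame({
    "movie_name": ["The Godfather", "Frozen 2", "Shrek", "A Beautiful Mind"],
    "avg_rating": [5, 3, 4, 6],
})

# Locate the rows that violate the 1-to-5 rating range.
out_of_range = movies[movies["avg_rating"] > 5]

# One possible treatment (an assumption, not the only option):
# cap values above the limit at the limit itself.
movies.loc[movies["avg_rating"] > 5, "avg_rating"] = 5

# The constraint now holds, so the assert passes silently.
assert movies["avg_rating"].max() <= 5
```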
subscription_date object
user_name object
Country object
dtype: object
# Convert to DateTime
user_signups['subscription_date'] = pd.to_datetime(user_signups['subscription_date'])
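A self-contained sketch of the conversion above, followed by a check that it worked. The sample rows and the ISO date format are assumptions, since the slides do not show the raw values:

```python
import pandas as pd

# Hypothetical sample; the real user_signups data is not shown in the slides.
user_signups = pd.DataFrame({
    "subscription_date": ["2019-01-05", "2019-03-17"],
    "user_name": ["user_a", "user_b"],
    "Country": ["Lebanon", "Canada"],
})

# Convert the text column to datetime values.
user_signups["subscription_date"] = pd.to_datetime(user_signups["subscription_date"])

# Confirm the column now has a datetime dtype.
assert pd.api.types.is_datetime64_any_dtype(user_signups["subscription_date"])

# Datetime columns support date arithmetic and comparisons.
earliest = user_signups["subscription_date"].min()
```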
What are duplicate values?
All columns have the same values
1 False
... ....
22 True
23 False
... ...
keep: Whether to keep the first ('first'), last ('last'), or all (False) duplicate values.
inplace: Drop duplicated rows directly inside the DataFrame without creating a new object (True).
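To make the `keep` argument concrete, here is a small sketch using `.duplicated()`. The column names and values are invented for illustration; only the DataFrame name `height_weight` comes from the slides:

```python
import pandas as pd

# Hypothetical sample in which rows 0 and 2 are identical.
height_weight = pd.DataFrame({
    "first_name": ["Ana", "Ben", "Ana", "Cy"],
    "height": [170, 182, 170, 165],
    "weight": [60, 80, 60, 55],
})

# Default keep='first': the first occurrence is not flagged, later copies are.
flag_later_copies = height_weight.duplicated()

# keep='last': the last occurrence is not flagged, earlier copies are.
flag_earlier_copies = height_weight.duplicated(keep="last")

# keep=False: every row that has a duplicate is flagged.
flag_all_copies = height_weight.duplicated(keep=False)

# Inspect all duplicated rows side by side before deciding what to drop.
duplicate_rows = height_weight[flag_all_copies]
```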
# Drop duplicates
height_weight.drop_duplicates(inplace = True)
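A runnable sketch of the call above, with a check that no complete duplicates remain afterwards. The sample data is hypothetical:

```python
import pandas as pd

# Hypothetical sample in which rows 0 and 2 are identical.
height_weight = pd.DataFrame({
    "first_name": ["Ana", "Ben", "Ana", "Cy"],
    "height": [170, 182, 170, 165],
    "weight": [60, 80, 60, 55],
})

rows_before = len(height_weight)

# Drop complete duplicates in place, keeping the first occurrence of each.
height_weight.drop_duplicates(inplace=True)

rows_after = len(height_weight)

# No fully duplicated rows should remain.
assert not height_weight.duplicated().any()
```

Without `inplace=True`, `drop_duplicates()` returns a new DataFrame and leaves the original unchanged, so the result would need to be assigned back to a variable.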