
DS_task-2

The document outlines a process for performing Exploratory Data Analysis (EDA) on a weather dataset using Python libraries like pandas, matplotlib, and seaborn. It includes data loading, inspection, visualization, and handling missing values, along with insights derived from the dataset such as temperature averages based on weather summaries. The analysis aims to identify patterns and relationships within the weather data.

Uploaded by

dsp16026

Perform Exploratory Data Analysis on the dataset to extract meaningful insights using pandas and matplotlib.pyplot.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# load the dataset
df = pd.read_csv("/content/weatherHistory.csv")
print(df.head())

                  Formatted Date        Summary Precip Type  Temperature (C)  \
0  2006-04-01 00:00:00.000 +0200  Partly Cloudy        rain         9.472222
1  2006-04-01 01:00:00.000 +0200  Partly Cloudy        rain         9.355556
2  2006-04-01 02:00:00.000 +0200  Mostly Cloudy        rain         9.377778
3  2006-04-01 03:00:00.000 +0200  Partly Cloudy        rain         8.288889
4  2006-04-01 04:00:00.000 +0200  Mostly Cloudy        rain         8.755556

   Apparent Temperature (C)  Humidity  Wind Speed (km/h)  \
0                  7.388889      0.89            14.1197
1                  7.227778      0.86            14.2646
2                  9.377778      0.89             3.9284
3                  5.944444      0.83            14.1036
4                  6.977778      0.83            11.0446

   Wind Bearing (degrees)  Visibility (km)  Loud Cover  Pressure (millibars)  \
0                   251.0          15.8263         0.0               1015.13
1                   259.0          15.8263         0.0               1015.63
2                   204.0          14.9569         0.0               1015.94
3                   269.0          15.8263         0.0               1016.41
4                   259.0          15.8263         0.0               1016.51

                       Daily Summary
0  Partly cloudy throughout the day.
1  Partly cloudy throughout the day.
2  Partly cloudy throughout the day.
3  Partly cloudy throughout the day.
4  Partly cloudy throughout the day.

# inspecting data
print(df.shape)
(96453, 12)

print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96453 entries, 0 to 96452
Data columns (total 12 columns):
 #   Column                    Non-Null Count  Dtype
 0   Formatted Date            96453 non-null  object
 1   Summary                   96453 non-null  object
 2   Precip Type               95936 non-null  object
 3   Temperature (C)           96453 non-null  float64
 4   Apparent Temperature (C)  96453 non-null  float64
 5   Humidity                  96453 non-null  float64
 6   Wind Speed (km/h)         96453 non-null  float64
 7   Wind Bearing (degrees)    96453 non-null  float64
 8   Visibility (km)           96453 non-null  float64
 9   Loud Cover                96453 non-null  float64
 10  Pressure (millibars)      96453 non-null  float64
 11  Daily Summary             96453 non-null  object
dtypes: float64(8), object(4)
memory usage: 8.8+ MB
None
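
The gap between the 96453 entries and the 95936 non-null values reported for Precip Type can be turned into explicit missing counts. The following is a minimal sketch on a small made-up frame; the `toy` data and the `missing_report` helper are illustrative assumptions and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the weather data (not the real file).
toy = pd.DataFrame({
    "Summary": ["Clear", "Foggy", "Clear", "Overcast"],
    "Precip Type": ["rain", np.nan, "snow", np.nan],
    "Temperature (C)": [9.5, 1.2, -3.0, 7.4],
})


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return non-null counts, missing counts and missing percentages per column."""
    total = len(frame)
    non_null = frame.count()      # the same counts that df.info() prints
    missing = total - non_null    # rows minus non-null values
    return pd.DataFrame({
        "non_null": non_null,
        "missing": missing,
        "missing_pct": (missing / total * 100).round(2),
    })


report = missing_report(toy)
print(report)
```

Applied to the full dataset, the same subtraction gives 96453 - 95936 = 517 missing Precip Type values.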

print(df.describe())
       Temperature (C)  Apparent Temperature (C)      Humidity  \
count     96453.000000              96453.000000  96453.000000
mean         11.932678                 10.855029      0.734899
std           9.551546                 10.696847      0.195473
min         -21.822222                -27.716667      0.000000
25%           4.688889               [illegible]      0.600000
50%          12.000000                 12.000000      0.780000
75%          18.838889                 18.838889      0.890000
max          39.905556                 39.344444      1.000000

       Wind Speed (km/h)  Wind Bearing (degrees)  Visibility (km)  Loud Cover  \
count       96453.000000            96453.000000     96453.000000     96453.0
mean           10.810640              187.509232        10.347325         0.0
std             6.913571              107.383428         4.192123         0.0
min             0.000000                0.000000         0.000000         0.0
25%             5.828200              116.000000         8.339800         0.0
50%             9.965900              180.000000      [illegible]         0.0
75%            14.135800              290.000000        14.812000         0.0
max            63.852600              359.000000        16.100000         0.0

       Pressure (millibars)
count          96453.000000
mean            1003.235956
std              116.969906
min                0.000000
25%             1011.900000
50%             1016.450000
75%             1021.090000
max             1046.380000
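
Two things stand out in these summary statistics: Loud Cover has a mean, standard deviation and maximum of 0, so it is constant, and Pressure (millibars) has a minimum of 0, which is not a physically plausible surface reading. The sketch below shows one way such columns could be flagged. The values are made up, and treating a pressure of exactly 0 as a placeholder for a missing reading is an assumption, not something the source states.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows (not taken from the real file).
toy = pd.DataFrame({
    "Temperature (C)": [9.47, 9.36, 9.38, 8.29],
    "Loud Cover": [0.0, 0.0, 0.0, 0.0],
    "Pressure (millibars)": [1015.13, 0.0, 1015.94, 1016.41],
})

# Columns with a single distinct value carry no information for EDA.
constant_cols = [c for c in toy.columns if toy[c].nunique(dropna=False) == 1]

# Assumption: a pressure of exactly 0 encodes a missing reading.
zero_pressure = int((toy["Pressure (millibars)"] == 0).sum())

# Drop the constant columns and mark the suspicious zeros as missing.
cleaned = toy.drop(columns=constant_cols)
cleaned["Pressure (millibars)"] = cleaned["Pressure (millibars)"].replace(0.0, np.nan)

print(constant_cols, zero_pressure)
print(cleaned)
```

Whether to drop or merely flag such columns depends on the analysis; the sketch only makes the two anomalies visible.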

print(df.columns)
Index(['Formatted Date', 'Summary', 'Precip Type', 'Temperature (C)',
       'Apparent Temperature (C)', 'Humidity', 'Wind Speed (km/h)',
       'Wind Bearing (degrees)', 'Visibility (km)', 'Loud Cover',
       'Pressure (millibars)', 'Daily Summary'],
      dtype='object')

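# visualizing the data
plt.figure(figsize=(10, 6))  # width is illegible in the source; 10 is an assumed value
# plot function name is garbled in the source; histplot is inferred from kde=True
sns.histplot(data=df, x='Temperature (C)', kde=True)
plt.title('Distribution of Numerical Feature')
plt.show()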

[Figure: histogram titled "Distribution of Numerical Feature"; x-axis: Temperature (C), y-axis ticks from 500 to 3000]

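plt.figure(figsize=(10, 6))  # width is illegible in the source; 10 is an assumed value
# plot function name is missing in the source; scatterplot is assumed from the x/y arguments
sns.scatterplot(data=df, x='Wind Speed (km/h)', y='Wind Bearing (degrees)')
plt.title('Relationship between Features')
plt.show()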
[Figure: plot titled "Relationship between Features"; x-axis: Wind Speed (km/h)]

plt.figure(figsize=(10, 6))  # width is illegible in the source; 10 is an assumed value
# Calculate correlation matrix, excluding non-numeric columns
sns.heatmap(df.select_dtypes(include=['number']).corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()

[Figure: heatmap titled "Correlation Matrix" over Temperature (C), Apparent Temperature (C), Humidity, Wind Speed (km/h), Wind Bearing (degrees), Visibility (km), Loud Cover and Pressure (millibars); the individual cell values are not reliably recoverable from the extracted text]
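
A correlation heatmap is easier to act on when the strongest pairs are also listed as numbers. The sketch below ranks the pairs by absolute correlation. It runs on invented values, and the names `toy`, `pairs` and `ranked` are illustrative, not from the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows (not taken from the real file).
toy = pd.DataFrame({
    "Temperature (C)": [9.5, 12.0, 18.8, 4.7, 21.3],
    "Apparent Temperature (C)": [7.4, 12.0, 18.8, 2.3, 21.3],
    "Humidity": [0.89, 0.78, 0.60, 0.93, 0.45],
    "Summary": ["Clear", "Foggy", "Clear", "Overcast", "Clear"],
})

# Same step as the notebook: correlate the numeric columns only.
corr = toy.select_dtypes(include=["number"]).corr()

# Keep the upper triangle without the diagonal so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Rank pairs by the strength of the relationship, ignoring the sign.
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

Sorting on the absolute value keeps strong negative relationships, such as temperature against humidity, near the top instead of at the bottom.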

# identifying patterns and relationships
print(df.groupby('Summary')['Temperature (C)'].mean())
Summary
Breezy                                  7.92201
Breezy and Dry                          6 [remaining digits illegible]
Breezy and Foggy                       -0.510317
Breezy and Mostly Cloudy               11.093411
Breezy and Overcast                     7.241614
Breezy and Partly Cloudy               12.492761
Clear                                  11.925109
Dangerously windy and Partly Cloudy     8.944444
Drizzle                                10.847578
Dry                                    29.083668
Dry and Mostly Cloudy                  26.838492
Dry and Partly Cloudy                  26.605749
Foggy                                   1.464?35
Humid and Mostly Cloudy                20.886389
Humid and Overcast                     21.515079
Humid and Partly Cloudy                21.568301
Light Rain                             10.021517
Mostly Cloudy                          12.629334
Overcast                                7.516502
Partly Cloudy                          16.024782
Rain                                   10.896111
Windy                                   6.804861
Windy and Dry                          27.222222
Windy and Foggy                        11.876389
Windy and Mostly Cloudy                11.834603
Windy and Overcast                      7.932963
Windy and Partly Cloudy                 9.968?76
Name: Temperature (C), dtype: float64
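
A group mean on its own can mislead when a category contains only a handful of rows. The sketch below reports the row count next to each mean and sorts from warmest to coldest. The sample rows are invented and `summary_stats` is an illustrative name.

```python
import pandas as pd

# Hypothetical sample rows (not taken from the real file).
toy = pd.DataFrame({
    "Summary": ["Clear", "Clear", "Foggy", "Foggy", "Foggy", "Dry"],
    "Temperature (C)": [12.0, 14.0, 1.0, 2.0, 3.0, 29.0],
})

# Mean temperature and number of observations per weather summary.
summary_stats = (
    toy.groupby("Summary")["Temperature (C)"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary_stats)
```

With the count column in view, a high mean that rests on a single observation is easy to tell apart from one backed by thousands of rows.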

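# plot call is garbled in the source (only x='Summary' survives); countplot is inferred from the count axis
sns.countplot(data=df, x='Summary')
plt.title('Distribution of Categorical Feature')
plt.show()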

[Figure: count plot titled "Distribution of Categorical Feature"; x-axis: Summary, y-axis ticks from 5000 to 30000]
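
With more than two dozen Summary categories, a count plot tends to be easier to read when the bars are ordered by frequency. The sketch below computes that order with pandas on invented rows. Passing it to the plot through the `order` parameter of `sns.countplot` is a suggestion and does not appear in the original notebook.

```python
import pandas as pd

# Hypothetical sample rows (not taken from the real file).
toy = pd.DataFrame({
    "Summary": ["Partly Cloudy", "Clear", "Partly Cloudy",
                "Foggy", "Partly Cloudy", "Clear"],
})

# value_counts() sorts categories from most to least frequent.
counts = toy["Summary"].value_counts()
order = counts.index.tolist()
print(counts)

# The order could then be handed to the plot, for example:
#   sns.countplot(data=toy, x="Summary", order=order)
#   plt.xticks(rotation=90)  # keeps long category labels legible
```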

plt.figure(figsize=(10, 6))  # width is illegible in the source; 10 is an assumed value
sns.boxplot(data=df, x='Summary', y='Temperature (C)')
plt.title('Relationship between Categorical and Numerical Feature')
plt.show()
[Figure: box plot titled "Relationship between Categorical and Numerical Feature"]
# handling the missing values
print(df.isnull().sum())
df = df.dropna()
Formatted Date                0
Summary                       0
Precip Type                 517
Temperature (C)               0
Apparent Temperature (C)      0
Humidity                      0
Wind Speed (km/h)             0
Wind Bearing (degrees)        0
Visibility (km)               0
Loud Cover                    0
Pressure (millibars)          0
Daily Summary                 0
dtype: int64
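
Because `dropna()` removes every row that contains at least one missing value, it helps to check how many rows it discards. The sketch below does that on invented rows and also shows a fill-based alternative. The `"unknown"` label is an assumption made for illustration and is not a choice made in the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows (not taken from the real file).
toy = pd.DataFrame({
    "Precip Type": ["rain", np.nan, "snow", np.nan, "rain"],
    "Temperature (C)": [9.5, 1.2, -3.0, 7.4, 8.8],
})

# Option used in the notebook: drop rows with any missing value.
rows_before = len(toy)
dropped = toy.dropna()
rows_removed = rows_before - len(dropped)
print(f"dropna() removed {rows_removed} of {rows_before} rows")

# Alternative (an assumption): keep all rows and label the gap explicitly.
filled = toy.fillna({"Precip Type": "unknown"})
print(filled["Precip Type"].value_counts())
```

On the full dataset, where Precip Type is the only column with missing values, `df.dropna()` removes 517 of the 96453 rows and leaves 95936.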

Name: T.LAVANYA
Roll no: 21MQ1A0531
College: Sri Vasavi Institute of Engineering and Technology
Email: lavanyaterli.2003@gmail.com
