AIDS C04 Session 24


21CS2213RA

AI for Data Science

Session -24

Contents: Density diagrams, Mean, Standard Deviation, Median, Quantiles and Correlations

The topics covered
• Representing statistical measures:
• Density diagrams
• Mean, Standard Deviation
• Median
• Quantiles
• Correlations
Density Plot
• A Density plot is a smoothed, continuous version of a histogram
estimated from the data.
• The most common form of estimation is known as kernel density
estimation.
• In this method, a continuous curve (the kernel) is drawn at every
individual data point and all of these curves are then added together
to make a single smooth density estimation.
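The "one curve per data point, then add them up" idea can be sketched directly in NumPy. This is only an illustration, assuming a Gaussian kernel and a hand-picked bandwidth; the function names `gaussian_kernel` and `kde` and the sample values are made up for this sketch and are not from the slides.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal probability density evaluated at u."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde(data, x_vals, bandwidth):
    """Add one Gaussian bump per data point, evaluated on x_vals."""
    data = np.asarray(data, dtype=float)
    x_vals = np.asarray(x_vals, dtype=float)
    # Row i holds the scaled distance from x_vals[i] to every data point.
    u = (x_vals[:, None] - data[None, :]) / bandwidth
    # Sum the bumps, then normalise so the curve integrates to 1.
    return gaussian_kernel(u).sum(axis=1) / (len(data) * bandwidth)

sample = [2.0, 3.5, 4.0, 7.0]               # hypothetical data
grid = np.linspace(-5.0, 15.0, 2001)        # points at which to evaluate the curve
density = kde(sample, grid, bandwidth=1.0)  # bandwidth chosen by hand

# Riemann-sum estimate of the area under the estimated density.
area = float(np.sum(density) * (grid[1] - grid[0]))
print("area under the curve:", round(area, 4))
```

A larger bandwidth widens every bump and gives a smoother curve; a smaller one follows the individual data points more closely.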
Why Density Plot?
• It visualizes the distribution of data over a continuous interval or time
period.
• This chart is a variation of a Histogram that uses kernel smoothing to
plot values, allowing for smoother distributions by smoothing out the
noise.
• The peaks of a Density Plot help display where values are
concentrated over the interval.
• An advantage Density Plots have over Histograms is that they're better at
showing the distribution shape because they're not affected by the
number of bins used (each bar in a typical histogram is one bin). The
shape does still depend on the chosen bandwidth (the smoothing parameter).
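To see the bin dependence mentioned above, the short sketch below histograms one sample with two different bin counts. The seed and the bin counts (5 and 30) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
data = rng.normal(10, 3, 100)    # 100 draws, mean 10, standard deviation 3

# Histogram the same sample with a coarse and a fine binning.
coarse_heights, coarse_edges = np.histogram(data, bins=5, density=True)
fine_heights, fine_edges = np.histogram(data, bins=30, density=True)

# With density=True, bar heights times bar widths sum to 1 in both cases,
# yet the outline traced by the bars changes with the number of bins.
coarse_area = float(np.sum(coarse_heights * np.diff(coarse_edges)))
fine_area = float(np.sum(fine_heights * np.diff(fine_edges)))

print(len(coarse_heights), "bars, area", round(coarse_area, 6))
print(len(fine_heights), "bars, area", round(fine_area, 6))
```

Both histograms describe the same data, but the coarse one hides detail and the fine one looks noisy, which is the effect a density plot avoids.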
Example of Density Plot
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Generate data: 100 draws from a normal distribution (mean 10, std 3)
data = np.random.normal(10, 3, 100)

# Fit the density estimate; bw_method=0.5 is the smoothing parameter
density = gaussian_kde(data, bw_method=0.5)

# Specify the limits of the x-axis over which the density is evaluated
x_vals = np.linspace(0, 20, 200)

plt.plot(x_vals, density(x_vals))
plt.show()
Statistical measures
• Statistics, in general, is the practice of collecting, tabulating,
and interpreting numerical data.
• With statistics, we can see how data can be used to solve complex
problems.
Descriptive Statistics

• Descriptive statistics generally means describing the data with the
help of some representative methods like charts, tables, Excel files,
etc.
• The data is described in such a way that it can express some
meaningful information that can also be used to find some future
trends.
• Describing and summarizing a single variable is called univariate
analysis.
• Describing a statistical relationship between two variables is
called bivariate analysis.
• Describing the statistical relationship between multiple variables is
called multivariate analysis.
Mean
• It is the sum of the observations divided by the total number of
observations. It is also called the average.
• The statistics.mean() function returns the mean of the data passed as its
argument. If the passed argument is empty, StatisticsError is raised.
• Example:

# mean()
import statistics

# initializing list
li = [1, 2, 3, 3, 2, 2, 2, 1]

# using mean() to calculate the average of the list elements
print("The average of list values is : ", end="")
print(statistics.mean(li))
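As a small check of the definition, the sketch below computes the mean of the same list by hand (sum divided by count), compares it with statistics.mean(), and shows the StatisticsError raised for an empty input. The variable names are chosen for this illustration only.

```python
import statistics

li = [1, 2, 3, 3, 2, 2, 2, 1]

# Mean by definition: sum of the observations divided by their count.
manual_mean = sum(li) / len(li)
library_mean = statistics.mean(li)

# An empty input has no mean, so statistics.mean() raises StatisticsError.
try:
    statistics.mean([])
    empty_raised = False
except statistics.StatisticsError:
    empty_raised = True

print(manual_mean, library_mean, empty_raised)
```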
median()
• The median() function is used to calculate the median, i.e. the middle
element of the sorted data (for an even number of elements, the average of
the two middle ones). If the passed argument is empty, StatisticsError is raised.
Calculating Median
• Step 1: Arrange the data in increasing order and then find the middle
value.
• Step 2: Calculate the median using the function.
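The two steps above can be sketched as follows. The helper `median_by_hand` and both sample lists are hypothetical, written only to mirror the steps; statistics.median() is the standard-library function.

```python
import statistics

def median_by_hand(values):
    """Step 1: sort the data, then pick the middle value (or average the two middle ones)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle element
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: mean of the two middle elements

odd_data = [7, 1, 5, 3, 9]    # hypothetical sample with an odd number of values
even_data = [4, 1, 3, 2]      # hypothetical sample with an even number of values

# Step 2: the library function gives the same answers.
print(median_by_hand(odd_data), statistics.median(odd_data))
print(median_by_hand(even_data), statistics.median(even_data))
```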
Mode
• Mode is the value which occurs most often in the data set. In the slide's
example, 150 occurs twice, so it is the mode.
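The data set behind the slide's example is not reproduced in this text, so the sketch below uses made-up values in which 150 appears twice. It relies on statistics.mode() and statistics.multimode() (the latter needs Python 3.8 or newer).

```python
import statistics
from collections import Counter

values = [120, 150, 135, 150, 180]   # hypothetical data, 150 appears twice

# Count how often each value occurs; the mode is the most frequent one.
counts = Counter(values)
mode_value = statistics.mode(values)

# multimode() returns every value tied for the highest count.
tied_modes = statistics.multimode([1, 1, 2, 2, 3])

print(mode_value, counts[mode_value], tied_modes)
```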
Correlations and Heatmap
• A correlation heatmap is a graphical representation of a correlation
matrix representing the correlation between different variables.
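• The correlation coefficient can take any value from -1 to 1: -1 is a
perfect negative linear relationship, 1 is a perfect positive one, and 0
means no linear relationship.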
• Correlation between two random variables or bivariate data does not
necessarily imply a causal relationship.
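As an illustration of the -1 to 1 range, the sketch below computes the Pearson correlation coefficient by hand and compares it with np.corrcoef. The slides do not name a specific coefficient, so Pearson is an assumption here; the helper `pearson` and the sample arrays are made up for this example.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]      # rises exactly in step with x
y_down = [10, 8, 6, 4, 2]    # falls exactly in step with x

r_up = pearson(x, y_up)
r_down = pearson(x, y_down)
r_numpy = float(np.corrcoef(x, y_up)[0, 1])   # library value for comparison

print(round(r_up, 6), round(r_down, 6), round(r_numpy, 6))
```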
How to create seaborn correlation heatmap
Steps:
• Install seaborn package
• Ex: pip install seaborn
• Import all required modules
• Import the file where your data is stored
• Plot a heatmap
• Display it using matplotlib
Example
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sb

# import file with data
data = pd.read_csv("data.csv")

# correlation matrix of the numeric columns
corr = data.select_dtypes("number").corr()
print(corr)
dataplot = sb.heatmap(corr, cmap="YlGnBu", annot=True)

# displaying heatmap
plt.show()
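The example above reads data.csv, which is not included with the slides. As a minimal sketch of the matrix that the heatmap colours, the following builds a small in-memory DataFrame instead; the column names and values are invented for illustration, and only pandas is needed to run it.

```python
import pandas as pd

# Hypothetical data standing in for data.csv.
frame = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 71, 80],
    "hours_slept": [9, 8, 8, 6, 5],
    "student": ["a", "b", "c", "d", "e"],   # non-numeric column, excluded below
})

# Keep only numeric columns, then compute the pairwise correlation matrix.
corr = frame.select_dtypes("number").corr()
print(corr.round(2))

# Each cell is the correlation between the row variable and the column variable;
# the diagonal is 1 because every variable correlates perfectly with itself.
study_vs_score = float(corr.loc["hours_studied", "exam_score"])
study_vs_sleep = float(corr.loc["hours_studied", "hours_slept"])
```

Passing `corr` to sb.heatmap(), as in the slide example, would colour each of these cells by its value.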
Thank you
