Mall Customer Data Analysis PDF
Project planning –
https://colab.research.google.com/drive/1wFtP4v8SzFKKYBfFiUZlIe1gRHZ6NmSe#printMode=true 1/10
12/6/21, 1:19 PM Mall Customer Data Analysis.ipynb - Colaboratory
import pandas as pd
data = pd.read_csv("Mall_Customers.csv")
data.head(10)
   CustomerID   Genre  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
4           5  Female   31                  17                      40
5           6  Female   22                  17                      76
6           7  Female   35                  18                       6
7           8  Female   23                  18                      94
8           9    Male   64                  19                       3
9          10  Female   30                  19                      72
data.isnull().sum()
CustomerID                0
Genre                     0
Age                       0
Annual Income (k$)        0
Spending Score (1-100)    0
dtype: int64
3- Rename Data Frame column names if required.
data.rename(columns = {'Genre':'Gender','Spending Score (1-100)':'SpendingScore','Annual I
data.head(3)
   CustomerID  Gender  Age  AnnualIncome(k$)  SpendingScore
0           1    Male   19                15             39
1           2    Male   21                15             81
2           3  Female   20                16              6
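The rename call above is cut off in this printout. A minimal sketch of the same pattern follows, assuming the third target name is `AnnualIncome(k$)` (inferred from the column names that appear in later output, not from the truncated line) and using a small hypothetical frame in place of the CSV:

```python
import pandas as pd

# Hypothetical three-row stand-in for Mall_Customers.csv
# (values copied from the head() output shown earlier).
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Genre": ["Male", "Male", "Female"],
    "Age": [19, 21, 20],
    "Annual Income (k$)": [15, 15, 16],
    "Spending Score (1-100)": [39, 81, 6],
})

# Assumed mapping: the first two entries appear in the original call,
# the third is inferred from later output.
new_names = {
    "Genre": "Gender",
    "Spending Score (1-100)": "SpendingScore",
    "Annual Income (k$)": "AnnualIncome(k$)",
}

# rename returns a new frame; the original is unchanged unless inplace=True.
renamed = sample.rename(columns=new_names)
print(list(renamed.columns))
```

Because `rename` returns a copy by default, the result has to be assigned back (or `inplace=True` passed) for later cells to see the new names.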
x = data.iloc[:, [2, 3, 4]]  # select Age, AnnualIncome(k$), SpendingScore
x
     Age  AnnualIncome(k$)  SpendingScore
0     19                15             39
1     21                15             81
2     20                16              6
3     23                16             77
4     31                17             40
..   ...               ...            ...
195   35               120             79
196   45               126             28
197   32               126             74
198   32               137             18
199   30               137             83
x.dtypes
Age                 int64
AnnualIncome(k$)    int64
SpendingScore       int64
dtype: object
5- Perform descriptive statistics and calculate mean, median etc.
import statistics as st
df = pd.DataFrame(x)
df.mean()
Age                 38.85
AnnualIncome(k$)    60.56
SpendingScore       50.20
dtype: float64
st.median(x)
'AnnualIncome(k$)'
st.mode(x)
'Age'
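The two results above are column labels rather than numbers: iterating over a DataFrame yields its column names, so `statistics.median` and `statistics.mode` operate on the labels. The sketch below reproduces that behaviour on a small hypothetical frame (not the notebook's data) and contrasts it with the pandas methods that compute a value per column:

```python
import statistics as st
import pandas as pd

# Hypothetical frame reusing the column names from this notebook.
demo = pd.DataFrame({
    "Age": [19, 21, 20, 23, 31],
    "AnnualIncome(k$)": [15, 15, 16, 16, 17],
    "SpendingScore": [39, 81, 6, 77, 40],
})

# Iterating a DataFrame yields column labels, so these work on strings.
label_median = st.median(demo)  # middle label in sorted order
label_mode = st.mode(demo)      # first label, since each label occurs once

# pandas computes the statistic per column instead.
col_median = demo.median()

# The statistics module does work on a single numeric column.
age_median = st.median(demo["Age"])

print(label_median, label_mode)
print(col_median)
```

For per-column results on the full data, `df.median()` and `df.mode()` are the pandas counterparts of the `df.mean()` call used above.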
df.mode()
    Age  AnnualIncome(k$)  SpendingScore
0  32.0                54           42.0
1   NaN                78            NaN
df.std()
Age                 13.969007
AnnualIncome(k$)    26.264721
SpendingScore       25.823522
dtype: float64
import seaborn as sns
sns.boxplot(data=x)
<AxesSubplot:>
data.corr()
-0.32722684603909014
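`DataFrame.corr()` returns a matrix, whereas a single float like the one above is what a column-to-column call such as `Series.corr` returns. The sketch below shows both on a hypothetical frame. The column pair is an illustrative choice, since the printout does not show which call produced the value. It also assumes pandas 1.5 or later, where `numeric_only=True` can be passed so that a text column such as `Gender` is skipped.

```python
import pandas as pd

# Hypothetical data; not the values from Mall_Customers.csv.
demo = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Female"],
    "Age": [19, 21, 20, 23, 31],
    "SpendingScore": [39, 81, 6, 77, 40],
})

# Full Pearson correlation matrix over the numeric columns only.
corr_matrix = demo.corr(numeric_only=True)

# Correlation between one pair of columns: a single float.
pair_corr = demo["Age"].corr(demo["SpendingScore"])

print(corr_matrix)
print(pair_corr)
```

The scalar equals the matching off-diagonal entry of the matrix, so either form can be used to read off one pair.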
sns.heatmap(data.corr(),annot=True)
<AxesSubplot:>
sns.heatmap(x.corr(),annot=True)
<AxesSubplot:>
(array([[17., 20., 12., 30., 43., 21., 24., 16., 17.],
        [ 0., 25., 59., 47., 40., 19., 10.,  0.,  0.]]),
 array([ 1.        , 11.88888889, 22.77777778, 33.66666667, 44.55555556,
        55.44444444, 66.33333333, 77.22222222, 88.11111111, 99.        ]),
 <a list of 2 BarContainer objects>)
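The tuple above has the shape that `matplotlib.pyplot.hist` returns when it is given two data series: an array of counts per series, the shared bin edges, and a list of bar containers. The call itself is missing from this printout, so the sketch below uses hypothetical random data and an assumed `bins=9` (the output shows ten edges) only to show how the result can be unpacked.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical series of 200 values each; not the notebook's columns.
rng = np.random.default_rng(0)
a = rng.integers(1, 100, size=200)
b = rng.integers(18, 71, size=200)

# With a list of series, hist uses one common set of bin edges for all of them.
counts, edges, containers = plt.hist([a, b], bins=9)

print(counts.shape)      # (number of series, number of bins)
print(len(edges))        # number of bins + 1
print(len(containers))   # one BarContainer per series
```

Since the bin range spans both series, each row of `counts` sums to the number of values in that series.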
x.columns
x
     Age  AnnualIncome(k$)  SpendingScore
0     19                15             39
1     21                15             81
2     20                16              6
3     23                16             77
4     31                17             40
..   ...               ...            ...
196   45               126             28
197   32               126             74
198   32               137             18
199   30               137             83
200 rows × 3 columns
sns.displot(data=data, x="AnnualIncome(k$)", hue='Gender')
<seaborn.axisgrid.FacetGrid at 0x1f8ff324220>
data.dtypes
CustomerID           int64
Gender              object
Age                  int64
AnnualIncome(k$)     int64
SpendingScore        int64
dtype: object
11- Perform Train and Test split for Client data and fit into required Model.
88 34 58
58 27 46
113 19 64
149 34 78
42 34 36
...
151 39 78
67 68 48
25 29 28
12- Create model as per requirement and perform classification/regression/clustering....
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
lr = LinearRegression()
array([40.41222091, 61.34408401, 50.75108989, 54.33957352, 47.61793098,
       40.95952258, 57.40374973, 45.67135234, 32.58447792, 45.26776466,
       62.2822877 , 62.73761242, 49.14367625, 52.28952082, 52.8701253 ,
       57.46698388, 40.85029995, 61.60276918, 50.81432404, 43.3384317 ,
       38.20002002, 52.60569158, 54.43155047, 37.97582621, 41.26994478,
       54.49478462, 44.06968726, 54.71897843, 33.79405243, 57.52446947,
       31.87046803, 33.79405243, 39.93390196, 60.957742  , 59.84014444,
       44.62848604, 47.75014784, 56.37238055, 39.18540072, 35.61991133,
       50.77408413, 38.07355171, 31.27717789, 49.55301249, 52.34125785,
       34.25512571, 59.05715184, 54.58676157, 60.81977658, 45.37123872,
       41.48383001, 58.94792922, 44.08693294, 59.74816749, 50.16929687,
       49.32307012, 42.03113167, 54.42580191, 61.67175189, 57.36350981])
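The splitting, fitting, and prediction calls that produced the array above are not shown in this printout. A minimal sketch of the usual scikit-learn pattern follows, under stated assumptions: synthetic data stands in for the CSV, `Age` and `AnnualIncome(k$)` are taken as features with `SpendingScore` as the target, and `test_size=0.3` is chosen because 60 predictions from 200 rows would match a 30% split. None of these choices is confirmed by the source.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 200-row customer table (hypothetical values).
rng = np.random.default_rng(42)
synthetic = pd.DataFrame({
    "Age": rng.integers(18, 71, size=200),
    "AnnualIncome(k$)": rng.integers(15, 138, size=200),
    "SpendingScore": rng.integers(1, 100, size=200),
})

# Assumed feature/target choice.
X = synthetic[["Age", "AnnualIncome(k$)"]]
y = synthetic["SpendingScore"]

# Hold out 30% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit on the training rows, then predict the held-out rows.
lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

print(y_pred.shape)
```

Comparing `y_pred` against `y_test`, for example with `lr.score(X_test, y_test)`, gives a first indication of how well a linear model fits the held-out rows.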