Fundamentos de Estatística para Ciência de Dados (Fundamentals of Statistics for Data Science)

Filipe J. Zabala

2020-11-07
[Figure: Histogram of altura. x-axis: altura (1.50 to 1.75); y-axis: Frequency (0 to 40).]
[Figure: Density curve. x-axis: x (-3 to 3); y-axis: dnorm(x) (0.0 to 0.4).]
[Figure: Density curve. x-axis: x (60 to 140); y-axis: dnorm(x, mean = mu, sd = sigma) (0.000 to 0.020).]
[Figure: Histogram of ma. x-axis: ma (90 to 105); y-axis: Frequency (0 to 50).]
[Figure: Density curve. x-axis: x (-3 to 3); y-axis: dnorm(x) (0.0 to 0.4).]
[Figure: Density curve. x-axis: x (-3 to 3); y-axis: dt(x, df = n - 1) (0.0 to 0.4).]
[Figure: Density curve. x-axis: x (0 to 80); y-axis: dchisq(x, df = n - 1) (0.00 to 0.04).]
[Figure: Density curve. x-axis: x (0 to 20); y-axis: dchisq(x, df = 5) (0.00 to 0.15).]
[Figure: Plot of x (4 to 10) by group g.]
[Figure: Regression diagnostic plots in four panels: Residuals vs Fitted (Residuals against Fitted values), Normal Q-Q (Standardized residuals against Theoretical Quantiles), Scale-Location (Standardized residuals against Fitted values), and Cook's distance (Cook's distance against Obs. number). Observations 8, 11 and 12 are labelled.]
[Figure: Plot of x (4 to 10) by group g.]
[Figure: Scatterplot matrix of X1, X2, X3, X4, X5, X6, X7, X8, Y1 and Y2.]
[Figure: Variances of principal components 1 to 4, in two panels: prcomp(df) and prcomp(df, scale = T).]
[Figure: Six pairwise scatterplots coloured by Species (setosa, versicolor, virginica): Sepal.Width vs Sepal.Length, Petal.Length vs Sepal.Length, Petal.Width vs Sepal.Length, Petal.Length vs Sepal.Width, Petal.Width vs Sepal.Width, and Petal.Width vs Petal.Length.]
[Figure: PCA biplot coloured by Species (setosa, versicolor, virginica), with loadings for Petal.Length, Petal.Width, Sepal.Length and Sepal.Width. x-axis: PC1 (92.46%); y-axis: PC2 (5.31%).]
[Figure: PCA biplot coloured by Species (setosa, versicolor, virginica), with loadings for Petal.Length, Petal.Width, Sepal.Length and Sepal.Width. x-axis: PC1 (72.96%); y-axis: PC2 (22.85%).]
[Figure: Cluster Dendrogram of dUSA, hclust (*, "complete"). y-axis: Height (0 to 6); leaves: the 50 US states.]
[Figure: Cluster Dendrogram. y-axis: Height (0 to 6); leaves: the 50 US states.]
[Figure: Cluster Dendrogram. y-axis: Height (0 to 6); leaves: the 50 US states.]
[Figure: Cluster Dendrogram. y-axis: Height (0 to 6); leaves: the 50 US states.]
[Figure: Cluster Dendrogram. y-axis: Height (0 to 6); leaves: the 50 US states.]
[Figure: Scatterplot. x-axis: Sepal.Length (-2 to 2); y-axis: Sepal.Width (-2 to 3).]
[Figure: Scatterplot. x-axis: Sepal.Length (-2 to 2); y-axis: Sepal.Width (-2 to 3).]
[Figure: Scatterplot. x-axis: Sepal.Length (-2 to 2); y-axis: Sepal.Width (-2 to 3).]
[Figure: Cluster plot of observations 1 to 150 with 2 clusters. x-axis: Dim1 (73%); y-axis: Dim2 (22.9%).]
[Figure: Cluster plot of observations 1 to 150 with 3 clusters. x-axis: Dim1 (73%); y-axis: Dim2 (22.9%).]

[Figure: Cluster plot of observations 1 to 150 with 4 clusters. x-axis: Dim1 (73%); y-axis: Dim2 (22.9%).]
[Figure: Optimal number of clusters. x-axis: Number of clusters k (1 to 10); y-axis: Total Within Sum of Square.]
[Figure: Optimal number of clusters. x-axis: Number of clusters k (1 to 10); y-axis: Average silhouette width.]
[Figure: Optimal number of clusters. x-axis: Number of clusters k (1 to 10); y-axis: Gap statistic (k).]