Similarity
Similarity
Similarity
• Similarity measure
• Numerical measure of how alike two data objects are.
• Is higher when objects are more alike.
• Often falls in the range [0,1]
• Dissimilarity measure
• Numerical measure of how different two data objects are
• Lower when objects are more alike
• Minimum dissimilarity is often 0
• Upper limit varies
• Proximity refers to a similarity or dissimilarity
Similarity/Dissimilarity for Simple Attributes
The following table shows the similarity and dissimilarity
between two objects, x and y, with respect to a single, simple
attribute.
Euclidean Distance
• Euclidean Distance
3
point x y
2 p1
p1 0 2
p3 p4
1
p2 2 0
p2 p3 3 1
0 p4 5 1
0 1 2 3 4 5 6
p1 p2 p3 p4
p1 0 2.828 3.162 5.099
p2 2.828 0 1.414 3.162
p3 3.162 1.414 0 2
p4 5.099 3.162 2 0
Distance Matrix
Minkowski Distance
•
Minkowski Distance: Examples
• r = 2. Euclidean distance
8
Distances for real vectors
• Vectors 𝑥 = 𝑥1 , … , 𝑥𝑑 and 𝑦 = (𝑦1 , … , 𝑦𝑑 )
• Lp norms or Minkowski distance:
𝐿𝑝 𝑥, 𝑦 = 𝑥1 − 𝑦1 𝑝 + ⋯ + 𝑥𝑑 − 𝑦𝑑 𝑝 1ൗ𝑝
• L∞ norm:
𝐿∞ 𝑥, 𝑦 = max 𝑥1 − 𝑦1 , … , |𝑥𝑑 − 𝑦𝑑 |
• The limit of Lp as p goes to infinity.
Minkowski Distance
L1 p1 p2 p3 p4
p1 0 4 4 6
p2 4 0 2 4
p3 4 2 0 2
p4 6 4 2 0
point x y
p1 0 2 L2 p1 p2 p3 p4
p2 2 0 p1 0 2.828 3.162 5.099
p3 3 1 p2 2.828 0 1.414 3.162
p4 5 1 p3 3.162 1.414 0 2
p4 5.099 3.162 2 0
L p1 p2 p3 p4
p1 0 2 3 5
p2 2 0 1 3
p3 3 1 0 2
p4 5 3 2 0
Distance Matrix
Example of Distances
y = (9,8)
L2-norm:
𝑑𝑖𝑠𝑡(𝑥, 𝑦) = 42 + 32 = 5
5 3
L1-norm:
4 𝑑𝑖𝑠𝑡(𝑥, 𝑦) = 4 + 3 = 7
x = (5,5)
L∞-norm:
𝑑𝑖𝑠𝑡(𝑥, 𝑦) = max 3,4 = 4
11
Common Properties of a Distance
• Distances, such as the Euclidean distance, have
some well known properties.
1. d(x, y) 0 for all x and y and d(x, y) = 0 if and only if
x = y.
2. d(x, y) = d(y, x) for all x and y. (Symmetry)
3. d(x, z) d(x, y) + d(y, z) for all points x, y, and z.
(Triangle Inequality)
where d(x, y) is the distance (dissimilarity) between points
(data objects), x and y.
• Example:
d1 = 3 2 0 5 0 0 0 2 0 0
d2 = 1 0 0 0 0 0 0 1 0 2
<d1, d2> ( d1 • d2 ) =3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5
|| d1 || = (3*3+2*2+0*0+5*5+0*0+0*0+0*0+2*2+0*0+0*0)0.5 = (42) 0.5 = 6.481
|| d2 || = (1*1+0*0+0*0+0*0+0*0+0*0+0*0+1*1+0*0+2*2) 0.5 = (6) 0.5 = 2.449
cos(d1, d2 ) = 0.3150
Common Properties of a Similarity
• Similarities, also have some well known
properties.
1. s(x, y) = 1 (or maximum similarity) only if x = y.
2. s(x, y) = s(y, x) for all x and y. (Symmetry)
• Cosine distance:
𝐷𝑖𝑠𝑡(𝑋, 𝑌) = 1 − cos(𝑋, 𝑌)