Cluster Analysis in Python Chapter4 PDF

This document discusses using k-means clustering to identify dominant colors in images. It explains that images are made up of pixels with red, green, and blue values and k-means can be used to cluster the RGB values and find cluster centers representing dominant colors. It also provides code samples for converting an image to pixels, clustering the pixels, and displaying the dominant color clusters.

Dominant colors in images
CLUSTER ANALYSIS IN PYTHON

Shaumik Daityari
Business Analyst
Dominant colors in images
All images consist of pixels

Each pixel has three values: Red, Green and Blue

Pixel color: combination of these RGB values

Perform k-means on standardized RGB values to find cluster centers

Uses: Identifying features in satellite images

Feature identification in satellite images

Tools to find dominant colors
Convert image to pixels: matplotlib.image.imread

Display colors of cluster centers: matplotlib.pyplot.imshow

Convert image to RGB matrix
import matplotlib.image as img

# Read the image as a (height, width, 3) array of RGB values
image = img.imread('sea.jpg')
image.shape

(475, 764, 3)

r = []
g = []
b = []

for row in image:
    for pixel in row:
        # A pixel contains RGB values
        temp_r, temp_g, temp_b = pixel
        r.append(temp_r)
        g.append(temp_g)
        b.append(temp_b)
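
The nested loop above can also be written as a single reshape. The following is a minimal sketch, assuming a tiny hypothetical 2 x 2 array in place of the result of img.imread('sea.jpg'); the pixel values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 2 RGB image standing in for img.imread('sea.jpg')
image = np.array([[[252, 252, 255], [75, 81, 103]],
                  [[10, 30, 40], [120, 90, 200]]], dtype=np.uint8)

# Keep the first three channels (drops an alpha channel if one is present)
# and flatten height x width into one row per pixel
flat = image[..., :3].reshape(-1, 3)

pixels = pd.DataFrame(flat, columns=['red', 'green', 'blue'])
print(pixels)
```

Each row of the reshaped array is one pixel, so the three columns match the r, g and b lists built by the loop.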

Data frame with RGB values
import pandas as pd

pixels = pd.DataFrame({'red': r,
                       'blue': b,
                       'green': g})
pixels.head()

red    blue    green
252    255     252
75     103     81
...    ...     ...
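
The later slides cluster columns named scaled_red, scaled_blue and scaled_green, which are not created in the code shown. One way to build them, as a sketch only, is SciPy's whiten, which divides each column by its standard deviation. The four pixel rows below are hypothetical.

```python
import pandas as pd
from scipy.cluster.vq import whiten

# Hypothetical pixel values; in the slides these come from the image loop
pixels = pd.DataFrame({'red': [252, 75, 10, 120],
                       'blue': [255, 103, 40, 200],
                       'green': [252, 81, 30, 90]})

channels = ['red', 'blue', 'green']

# whiten() rescales every column to unit standard deviation
scaled = whiten(pixels[channels].to_numpy(dtype=float))
for i, color in enumerate(channels):
    pixels['scaled_' + color] = scaled[:, i]

print(pixels.head())
```

Multiplying a scaled value by the standard deviation of its original column recovers the raw 0-255 value, which is what the "Find dominant colors" step relies on.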

Create an elbow plot
from scipy.cluster.vq import kmeans
import seaborn as sns
import matplotlib.pyplot as plt

distortions = []
num_clusters = range(1, 11)

# Create a list of distortions from the kmeans method
for i in num_clusters:
    cluster_centers, distortion = kmeans(pixels[['scaled_red', 'scaled_blue',
                                                 'scaled_green']], i)
    distortions.append(distortion)

# Create a data frame with two lists - number of clusters and distortions
elbow_plot = pd.DataFrame({'num_clusters': num_clusters,
                           'distortions': distortions})

# Create a line plot of num_clusters and distortions
sns.lineplot(x='num_clusters', y='distortions', data=elbow_plot)
plt.xticks(num_clusters)
plt.show()
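
To see what the elbow looks like numerically, here is a small sketch on synthetic data rather than image pixels. The two blobs, their sizes and the range of k are assumptions chosen for illustration.

```python
import numpy as np
from scipy.cluster.vq import kmeans, whiten

rng = np.random.default_rng(42)

# Hypothetical data: two well-separated blobs in three dimensions
data = np.vstack([rng.normal(0, 1, size=(100, 3)),
                  rng.normal(10, 1, size=(100, 3))])
scaled = whiten(data)

distortions = []
for k in range(1, 6):
    # kmeans returns the cluster centers and the mean distance to them
    _, distortion = kmeans(scaled, k)
    distortions.append(distortion)

# Reduction in distortion gained by each additional cluster
drops = [a - b for a, b in zip(distortions, distortions[1:])]
print(distortions)
print(drops)
```

With two real groups in the data, the largest drop should occur when going from one cluster to two, and later drops should be much smaller. That flattening is the "elbow" the plot is meant to reveal.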

Elbow plot

Find dominant colors
# Keep the channels in red, green, blue order, which plt.imshow expects
cluster_centers, _ = kmeans(pixels[['scaled_red', 'scaled_green',
                                    'scaled_blue']], 2)

colors = []

# Find standard deviations of the original channels, in the same order
r_std, g_std, b_std = pixels[['red', 'green', 'blue']].std()

# Undo the standardization, then scale RGB values to the 0-1 range
for cluster_center in cluster_centers:
    scaled_r, scaled_g, scaled_b = cluster_center
    colors.append((
        scaled_r * r_std / 255,
        scaled_g * g_std / 255,
        scaled_b * b_std / 255
    ))

Display dominant colors
# Dimensions: 2 x 3 (N x 3 matrix)
print(colors)

[(0.08192923122023911, 0.34205845943857993, 0.2824002984155429),
 (0.893281510956742, 0.899818770315129, 0.8979114272960784)]

# Dimensions: 1 x 2 x 3 (1 x N x 3 matrix)
plt.imshow([colors])
plt.show()
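
A quick way to check the whole pipeline is to run it on an image whose dominant colors are known in advance. The sketch below assumes a hypothetical 20 x 20 image made of two noisy color patches; the colors, noise level and array names are illustrative and not from the slides.

```python
import numpy as np
from scipy.cluster.vq import kmeans, whiten

rng = np.random.default_rng(0)

# Hypothetical test image: left half orange-red, right half blue
image = np.zeros((20, 20, 3))
image[:, :10] = (200, 50, 30)
image[:, 10:] = (20, 120, 220)
image += rng.integers(-5, 6, size=image.shape)   # mild pixel noise

pixels = image.reshape(-1, 3)       # one row per pixel, columns R, G, B
stds = pixels.std(axis=0)           # per-channel standard deviations
scaled = whiten(pixels)             # divide each channel by its std

cluster_centers, _ = kmeans(scaled, 2)

# Undo the scaling, then map 0-255 values to the 0-1 range imshow expects
colors = cluster_centers * stds / 255
colors = colors[np.argsort(colors[:, 0])]   # sort by red for a stable order
print(colors)
```

The recovered rows should be close to the two patch colors divided by 255. Passing [colors] to plt.imshow would then draw them as two swatches, as on the slide.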

Next up: exercises
Document clustering
Document clustering: concepts
1. Clean data before processing

2. Determine the importance of the terms in a document (in TF-IDF matrix)

3. Cluster the TF-IDF matrix

4. Find top terms, documents in each cluster

Clean and tokenize data
Convert text into smaller parts called tokens, clean data for processing
from nltk.tokenize import word_tokenize
import re

def remove_noise(text, stop_words=()):
    tokens = word_tokenize(text)
    cleaned_tokens = []
    for token in tokens:
        # Strip everything except letters and digits
        token = re.sub('[^A-Za-z0-9]+', '', token)
        if len(token) > 1 and token.lower() not in stop_words:
            # Get lowercase
            cleaned_tokens.append(token.lower())
    return cleaned_tokens

# Example stop words (an assumption; the list used for the output below is not shown)
stop_words = ['it', 'is', 'we', 'are', 'having', 'the']
remove_noise("It is lovely weather we are having. "
             "I hope the weather continues.", stop_words)

['lovely', 'weather', 'hope', 'weather', 'continues']
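
For experimenting without the NLTK tokenizer data installed, the same cleaning idea can be sketched with the standard library alone. This swaps word_tokenize for a regular expression, so it is not the slide's method; remove_noise_simple and example_stop_words are hypothetical names.

```python
import re

def remove_noise_simple(text, stop_words=()):
    # Split on runs of letters and digits instead of calling word_tokenize
    tokens = re.findall(r'[A-Za-z0-9]+', text)
    stop_words = {word.lower() for word in stop_words}
    # Keep lowercase tokens longer than one character that are not stop words
    return [token.lower() for token in tokens
            if len(token) > 1 and token.lower() not in stop_words]

# Hypothetical stop-word list for the example sentence
example_stop_words = ['it', 'is', 'we', 'are', 'having', 'the']
cleaned = remove_noise_simple(
    "It is lovely weather we are having. I hope the weather continues.",
    example_stop_words)
print(cleaned)
```

A regular expression handles contractions and punctuation differently from word_tokenize, so the two approaches can disagree on messier text.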

Document term matrix and sparse matrices
Document term matrix formed

Sparse matrix is created

Most elements in matrix are zeros
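
As a small illustration of why the matrix is sparse, the sketch below builds a document term matrix from three hypothetical sentences with scikit-learn's CountVectorizer and counts how many entries are zero.

```python
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical mini-corpus
docs = ["the room was clean",
        "the staff was friendly",
        "breakfast was cold"]

vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)   # rows: documents, columns: terms

# Share of entries that are zero
zero_fraction = 1 - dtm.nnz / (dtm.shape[0] * dtm.shape[1])

print(sparse.issparse(dtm))   # the result is stored in a sparse format
print(dtm.shape, dtm.nnz)
print(dtm.toarray())
```

Even with three short documents more than half of the entries are zero, and the share grows quickly as the vocabulary gets larger.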


TF-IDF (Term Frequency - Inverse Document Frequency)
A weighted measure: evaluate how important a word is to a document in a collection

from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vectorizer = TfidfVectorizer(max_df=0.8, max_features=50,
                                   min_df=0.2, tokenizer=remove_noise)
tfidf_matrix = tfidf_vectorizer.fit_transform(data)
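
To make the weighting concrete, here is a sketch on three hypothetical sentences using TfidfVectorizer with its default settings. The word "was" appears in every document, "the" in two and "room" in one, so their weights in the first document should increase in that order.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus
docs = ["the room was clean",
        "the staff was friendly",
        "breakfast was cold"]

vectorizer = TfidfVectorizer()
weights = vectorizer.fit_transform(docs).toarray()

# Map each term present in the first document to its TF-IDF weight
vocab = vectorizer.vocabulary_
first_doc = {term: weights[0, idx]
             for term, idx in vocab.items() if weights[0, idx] > 0}

for term, weight in sorted(first_doc.items(), key=lambda item: -item[1]):
    print(term, round(weight, 3))
```

With default settings each row is scaled to unit length, so weights are comparable across documents of different lengths.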

Clustering with sparse matrix
kmeans() in SciPy does not support sparse matrices

Use .todense() to convert to a matrix

cluster_centers, distortion = kmeans(tfidf_matrix.todense(), num_clusters)

Top terms per cluster
Cluster centers: lists with a size equal to the number of terms

Each value in the cluster center is its importance

Create a dictionary and print top terms

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
terms = tfidf_vectorizer.get_feature_names_out()

for i in range(num_clusters):
    center_terms = dict(zip(terms, list(cluster_centers[i])))
    sorted_terms = sorted(center_terms, key=center_terms.get, reverse=True)
    print(sorted_terms[:3])

['room', 'hotel', 'staff']
['bad', 'location', 'breakfast']

More considerations
Work with hyperlinks, emoticons etc.

Normalize words (run, ran, running -> run)

.todense() may not work with large datasets
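
A back-of-the-envelope calculation shows why a dense conversion can fail on a large corpus. The corpus size and the average number of distinct terms per document below are assumptions picked for illustration.

```python
# Hypothetical corpus dimensions
n_documents, n_terms = 100_000, 50_000
terms_per_document = 100        # assumed average non-zero entries per row

# Dense storage: one float64 (8 bytes) for every document-term pair
dense_bytes = n_documents * n_terms * 8

# CSR sparse storage: float64 values, int32 column indices, int32 row pointers
nnz = n_documents * terms_per_document
sparse_bytes = nnz * 8 + nnz * 4 + (n_documents + 1) * 4

dense_gib = dense_bytes / 2**30
sparse_gib = sparse_bytes / 2**30
print(round(dense_gib, 2), round(sparse_gib, 2))
```

Under these assumptions the dense matrix needs tens of gibibytes while the sparse one fits in a fraction of one, which is why limiting the vocabulary (for example with max_features) matters before densifying.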

Next up: exercises!

Clustering with multiple features
Basic checks
# Cluster centers
print(fifa.groupby('cluster_labels')[['scaled_heading_accuracy',
                                      'scaled_volleys', 'scaled_finishing']].mean())

cluster_labels    scaled_heading_accuracy    scaled_volleys    scaled_finishing
0                 3.21                       2.83              2.76
1                 0.71                       0.64              0.58

# Cluster sizes
print(fifa.groupby('cluster_labels')['ID'].count())

cluster_labels    count
0                 886
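
The two checks can be tried on a toy table. The five rows below are hypothetical and only reuse the column names from the slide.

```python
import pandas as pd

# Hypothetical stand-in for the fifa data frame
fifa = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'scaled_heading_accuracy': [3.0, 3.4, 0.6, 0.8, 0.7],
    'scaled_volleys': [2.8, 3.0, 0.5, 0.7, 0.6],
    'cluster_labels': [0, 0, 1, 1, 1],
})

# Cluster centers: mean of each scaled feature per cluster
centers = fifa.groupby('cluster_labels')[['scaled_heading_accuracy',
                                          'scaled_volleys']].mean()

# Cluster sizes: number of rows per cluster
sizes = fifa.groupby('cluster_labels')['ID'].count()

print(centers)
print(sizes)
```

Centers that differ clearly between clusters, and sizes that are not wildly unbalanced, are the two things these checks are meant to surface.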

Visualizations
Visualize cluster centers

Visualize other variables for each cluster

# Plot cluster centers
(fifa.groupby('cluster_labels')[scaled_features]
     .mean()
     .plot(kind='bar'))
plt.show()

Top items in clusters
# Get the name column of top 5 players in each cluster
for cluster in fifa['cluster_labels'].unique():
    print(cluster, fifa[fifa['cluster_labels'] == cluster]['name'].values[:5])

Cluster Label    Top Players
0                ['Cristiano Ronaldo' 'L. Messi' 'Neymar' 'L. Suárez' 'R. Lewandowski']
1                ['M. Neuer' 'De Gea' 'G. Buffon' 'T. Courtois' 'H. Lloris']
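
Taking the first rows of each cluster only yields "top" players if the data frame is already sorted by some rating. The sketch below makes that sort explicit; the 'overall' column and the player names are hypothetical.

```python
import pandas as pd

# Hypothetical players with an assumed rating column named 'overall'
fifa = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E', 'F'],
    'overall': [80, 94, 88, 91, 75, 89],
    'cluster_labels': [0, 0, 1, 1, 0, 1],
})

# Sort by rating first so that head() really picks the best players
ranked = fifa.sort_values('overall', ascending=False)

top_players = {label: group['name'].head(2).tolist()
               for label, group in ranked.groupby('cluster_labels')}
print(top_players)
```

groupby keeps the row order within each group, so the sort applied beforehand carries through to the per-cluster lists.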

Feature reduction
Factor analysis

Multidimensional scaling
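
Both techniques are available in scikit-learn. The following is a minimal sketch, assuming hypothetical data with six scaled features driven by two underlying factors; it only shows how each method reduces the features to two columns that could then be clustered or plotted.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.manifold import MDS

rng = np.random.default_rng(0)

# Hypothetical data: 50 players, 6 features generated from 2 latent factors
latent = rng.normal(size=(50, 2))
loadings = rng.normal(size=(2, 6))
features = latent @ loadings + 0.1 * rng.normal(size=(50, 6))

# Factor analysis: explain the features through a few latent factors
factors = FactorAnalysis(n_components=2, random_state=0).fit_transform(features)

# Multidimensional scaling: place points in 2-D while preserving distances
embedding = MDS(n_components=2, random_state=0).fit_transform(features)

print(factors.shape, embedding.shape)
```

Either reduced representation can be passed to kmeans in place of the full feature set when many features are correlated.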

Final exercises!
Farewell!
What comes next?
Clustering is one of the exploratory steps

More courses on DataCamp

Practice, practice, practice!

Until next time