Running Head: DATA MINING 1
Data Mining
Student’s Name:
Institutional Affiliation:
Part 1
1. What is K-means from a basic standpoint?
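K-means is a centroid-based clustering technique that originated in vector quantization. Its
main aim is to partition a set of observations into k clusters, with each observation assigned to
the cluster whose mean (centroid) is nearest.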
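The partitioning idea can be illustrated with a minimal sketch of the standard iterative procedure (often called Lloyd's algorithm). The function name, the sample points, and the choice of starting centroids are assumptions made for illustration only.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Alternate between assigning points and updating centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # nothing moved, so the partition is stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

With these starting centroids the first three points end up in one cluster and the last three in the other.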
2. What are the various types of clusters, and why is the distinction important?
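Clustering methods are commonly grouped into four types: partitioning, hierarchical,
density-based, and distribution model-based clustering. The distinction is important because
each type makes different assumptions about what a cluster is, so the appropriate choice depends
on the structure of the data.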
Partitioning Clustering
The technique divides a data set into a preset number of groups. The method is also
referred to as the centroid-based technique (Tan et al., 2016). In this method, a centroid is
formed for each cluster so that every data point is closer to the centroid of its own cluster than
to the centroid of any other cluster.
Hierarchical Clustering
The technique organizes a data set into a hierarchy of nested clusters, so the user does not
have to specify the number of clusters before training the model (Tan et al., 2016). The method is
also referred to as a connectivity-based technique.
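A minimal sketch of the bottom-up (agglomerative) form of this idea is given below. It uses single linkage, which is only one of several possible linkage rules, and the sample values are made up for illustration.

```python
import math

def single_linkage(points):
    """Merge the two closest clusters repeatedly and record each merge."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# Hypothetical 1-D observations, written as 1-tuples.
points = [(0.0,), (1.0,), (5.0,), (5.5,), (20.0,)]
merges = single_linkage(points)
for left, right, distance in merges:
    print(left, right, distance)
```

No cluster count is passed in. The merge history is the hierarchy, and a number of clusters can be chosen afterwards by stopping at any merge distance.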
Density-Based Clustering
This is a widely used clustering technique in which clusters are formed from dense regions
of data points that are separated by regions of lower density.
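As an illustration, the sketch below follows the general scheme of DBSCAN, one well-known density-based algorithm that is not named in the text above. The parameter names eps and min_pts and the sample points are assumptions chosen for the example.

```python
import math

def dbscan(points, eps, min_pts):
    """Return a cluster id per point, or -1 for points treated as noise."""
    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels

# Two dense groups and one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The isolated point at (50, 50) has too few neighbors to join either dense region, so it is labeled as noise rather than forced into a cluster.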
Distribution Model-Based Clustering
This technique groups data points according to the probability that they were generated by
the same distribution, for example a Gaussian distribution.
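A small sketch of the probabilistic view follows. It assumes a mixture of two one-dimensional Gaussian components with made-up weights, means, and standard deviations, and computes the probability that a value belongs to each component.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def responsibilities(x, components):
    """Probability that x belongs to each (weight, mean, std) component."""
    weighted = [w * gaussian_pdf(x, mean, std) for w, mean, std in components]
    total = sum(weighted)
    return [p / total for p in weighted]

# Hypothetical mixture: two equally weighted components centered at 0 and 4.
components = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
near_first = responsibilities(0.0, components)
midpoint = responsibilities(2.0, components)
print(near_first)
print(midpoint)
```

A value close to one component's mean is assigned almost entirely to that component, while a value halfway between the two means is split evenly.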
3. What are the strengths and weaknesses of K-means?
Strengths of k-means
First, the method is relatively simple to implement. Besides, it scales to large data sets. The
technique also guarantees convergence, although possibly only to a local optimum, and easily
adapts to new examples. Lastly, it can be generalized to clusters of different sizes and shapes,
for instance, elliptical clusters.
Weaknesses
One of the weaknesses of K-means is its dependence on the initial centroid values. In
addition, the number of clusters (k) has to be chosen manually, for example by using a "Loss vs.
Clusters" plot to determine the optimal k (Tan et al., 2016). Lastly, the technique has trouble
clustering data in which the clusters vary in size and density.
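The "Loss vs. Clusters" idea can be sketched by running K-means for several values of k and recording the within-cluster sum of squared distances each time. The one-dimensional data and the deterministic initialization below are assumptions for illustration; a different initialization could give different losses, which is exactly the dependence on initial values noted above.

```python
def kmeans_1d(values, k, max_iter=100):
    """K-means on 1-D data; returns the centroids and the grouped values."""
    ordered = sorted(values)
    # Assumed initialization: spread the starting centroids over the sorted data.
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    groups = []
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, groups

def loss(centroids, groups):
    """Sum of squared distances from each value to its cluster centroid."""
    return sum((v - c) ** 2 for c, g in zip(centroids, groups) for v in g)

# Hypothetical data with three natural groups.
values = [1.0, 0.5, 1.5, 5.0, 4.5, 5.5, 10.0, 9.5, 10.5]
losses = [loss(*kmeans_1d(values, k)) for k in range(1, 5)]
for k, value in enumerate(losses, start=1):
    print(k, round(value, 3))
```

The loss falls sharply up to k = 3 and only slightly afterwards. That bend in the curve is what the plot is used to find.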
4. What is a cluster evaluation?
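In data mining, cluster evaluation (also called cluster validation) is the assessment of the
quality of a clustering result, for example whether the clusters reflect real structure in the data
and whether the number of clusters is appropriate.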
Select at least two types of cluster evaluation and discuss the concepts of each method.
Fuzzy Clustering
Typically, fuzzy clustering is a technique in which every data point can belong to more
than one cluster, each with a degree of membership (Tan et al., 2016). The technique entails
assigning the data points to clusters so that items in the same cluster are as similar as possible,
while items that belong to different clusters are as dissimilar as possible.
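The notion of partial membership can be sketched with the membership formula used in fuzzy c-means, one common fuzzy clustering algorithm. The cluster centers are taken as given here, and the fuzzifier m = 2 and the sample values are assumptions for illustration.

```python
def fuzzy_memberships(value, centers, m=2.0):
    """Degree of membership of a 1-D value in each cluster; degrees sum to 1."""
    dists = [abs(value - c) for c in centers]
    if 0.0 in dists:
        # The value sits exactly on a center: full membership there.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # Closer centers receive larger membership degrees.
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

centers = [0.0, 10.0]  # hypothetical cluster centers
close_to_first = fuzzy_memberships(2.0, centers)
in_between = fuzzy_memberships(5.0, centers)
print(close_to_first)
print(in_between)
```

A value near the first center belongs mostly, but not exclusively, to the first cluster, and a value halfway between the centers belongs equally to both.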
Constraint-Based (Semi-Supervised) Clustering
It belongs to semi-supervised learning algorithms and combines must-link constraints,
cannot-link constraints, or both with a data clustering algorithm. Each constraint defines a
relationship between two data instances (Tan et al., 2016). A must-link constraint specifies two
instances that should be placed in the same cluster. In contrast, a cannot-link constraint
specifies two instances that should not be placed in the same cluster.
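The two constraint types can be made concrete with a short sketch that checks a cluster assignment against them. The function name, the labels, and the constraint pairs are hypothetical.

```python
def violated_constraints(labels, must_link, cannot_link):
    """List every pairwise constraint that the assignment breaks."""
    violations = []
    for a, b in must_link:
        if labels[a] != labels[b]:  # the pair should share a cluster
            violations.append(("must-link", a, b))
    for a, b in cannot_link:
        if labels[a] == labels[b]:  # the pair should be kept apart
            violations.append(("cannot-link", a, b))
    return violations

# Hypothetical assignment of four instances to two clusters.
labels = [0, 0, 1, 1]
violations = violated_constraints(
    labels,
    must_link=[(0, 1), (1, 2)],
    cannot_link=[(0, 3), (2, 3)],
)
print(violations)
```

A constrained clustering algorithm would use such a check while assigning instances, rejecting or penalizing assignments that break a constraint.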
Part 2
1. What is the definition of data mining that the author mentions? How is this
different from our current understanding of data mining?
Križanić (2020) provides a tentative definition of data mining in the case study.
According to the author, data mining involves integrating various efficient methods for analyzing
large and complex collections of data (Križanić, 2020). It also involves extracting useful and
unexpected patterns from the data.

By comparison, the current understanding of data mining is the extraction of usable
data from a larger and more complex collection of raw data. In other words, data mining
involves analyzing trends and patterns in large data collections with the help of software
tools. Currently, data mining is closely tied to data warehousing, data collection, and
computer processing. More importantly, the current understanding of data mining covers the
techniques used in applications such as spam email filtering, fraud detection, credit risk
management, and database marketing.
2. What is the premise of the use case and findings?
Križanić (2020) argues that data mining is highly applicable in the education setting,
specifically for higher education institutions in Croatia. Educational data mining has therefore
been combined with big data to examine students' actions and behavior in e-courses (Križanić,
2020). The premise of the use case is that educational data mining can be carried out using the
decision tree technique and cluster analysis as data mining approaches. The case also used event
logs downloaded from an e-learning environment to analyze student behavior (Križanić, 2020).
Thus, data mining was used to analyze students' achievement in midterm exams based on their
behavior in the e-course, supporting the finding that students performed better in midterm exams
after accessing the learning materials for the lectures.
3. What type of tools are used in the use case's data mining aspect, and how are they
used?
Data mining uses many tools, such as Teradata, Python, SPSS, SAS, and Oracle Data Mining.
These tools are based on data mining techniques such as classification analysis, association rule
learning, and anomaly detection. More importantly, they are based on clustering analysis,
decision trees, and regression analysis.

The case study by Križanić (2020) mainly uses cluster analysis and decision trees in the
data mining aspect. Notably, cluster analysis was executed by organizing collections of patterns
into groups based on the similarity of students' behavior in using course materials. In addition,
the decision tree was the key technique for generating a representation of decision-making that
enabled defining different classes of objects for a deeper evaluation of how students learned.
Cluster analysis is mainly used to identify similar patterns of behavior. Decision trees are easy
to comprehend and are well suited to classification problems. However, they are sensitive to the
data used in their construction and are deemed a less natural model for regression. The key
benefit associated with decision trees is that many efficient algorithms exist, which makes it
easy to find approximately optimal tree structures.
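How a decision tree defines classes can be sketched with a single split chosen by Gini impurity, one common splitting criterion. The data below (number of accesses to course materials and a pass/fail outcome) are invented for illustration and are not taken from the case study.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Find the threshold on xs that minimizes the weighted Gini impurity."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must leave data on both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (t, score)
    return best

# Hypothetical records: material accesses per student and exam outcome.
accesses = [1, 2, 3, 8, 9, 10]
passed = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(accesses, passed)
print(threshold, impurity)
```

A full tree repeats this search inside each resulting group. Because the chosen threshold depends directly on the training records, a small change in the data can change the tree, which is the sensitivity mentioned above.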
4. Were the tools used appropriately for the use case? Why or why not?
In my view, the tools used in the case were appropriate. This is because cluster analysis
played a critical part by organizing different collections of patterns into distinct groups based
on the similarity of the students' behavior. The decision tree, in turn, helped generate a
representation of decision-making, which enabled the definition of classes of objects for deeper
analysis.

References
Križanić, S. (2020). Educational data mining using cluster analysis and decision tree technique:
A case study. International Journal of Engineering Business Management, 12,
1847979020908675.
Tan, P. N., Steinbach, M., & Kumar, V. (2016). Introduction to data mining. Pearson Education
India.
