Running Head:: Data Mining 1
Data Mining
Student’s Name:
Institutional Affiliation:
Part 1
This is a clustering technique that encompasses vector quantization. Its main aim is to
2. What are the various types of clusters, and why is the distinction important?
Partitioning Clustering
The technique divides a data set into a predetermined number of groups. The method is
also referred to as the centroid-based technique (Tan et al., 2016). In this method, a centroid
is formed for each cluster so that the distance from the centroid to the data points in that cluster is minimal,
Hierarchical Clustering
The technique divides a data set into numerous clusters, and the user does not need to specify
the number of clusters to be generated before training the model (Tan et al., 2016). The method is also
Density-Based Clustering
This is the most commonly used clustering technique and is mainly formed by
It encompasses the identification of probability of all data points in a cluster from similar
Strengths of k-means
First, the method is relatively simple to implement. Besides, it scales to large data sets. The
technique also guarantees convergence and easily adapts to new examples. Lastly, it generalizes
Weaknesses
The technique requires k to be chosen manually, using the "Loss vs. Clusters" plot to determine the
optimal k (Tan et al., 2016). Moreover, the technique has trouble clustering data in which the
clusters vary in size and density.
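The "Loss vs. Clusters" idea can be illustrated with a minimal sketch. The code below is not taken from Tan et al. (2016); it is a plain-Python version of the standard k-means loop, run on a small made-up data set, and it assumes that "loss" means the sum of squared distances from each point to its assigned centroid. The function names, the toy points, and the seed are all hypothetical choices made for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, iterations=50, seed=0):
    """Basic k-means; returns (centroids, labels, loss)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random points as initial centroids
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the centroid to the mean of its members.
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep the old centroid if the cluster became empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    labels = assign(points, centroids)
    loss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, loss


# Hypothetical toy data: three small, well-separated groups.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 0.8), (8.9, 1.2),
]

# Loss for each candidate k; plotting these values gives the "Loss vs. Clusters" curve.
losses = {k: kmeans(points, k)[2] for k in range(1, 5)}
for k, loss in losses.items():
    print(k, round(loss, 3))
```

The value of k is usually read off at the point where adding another cluster stops reducing the loss by much. Because the starting centroids are random, a different seed may produce a different curve, so in practice the run is often repeated several times.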
The method encompasses sharing both mutual problem solving and successes across a
cluster of projects.
Select at least two types of cluster evaluation and discuss the concepts of each method.
Fuzzy Clustering
Typically, fuzzy clustering is a technique in which every data point
can belong to more than one cluster (Tan et al., 2016). The technique entails
assigning data points to clusters so that items in the same cluster are as similar as possible.
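The idea that a point can belong to more than one cluster can be made concrete with a short sketch. The code below assumes the membership formula commonly used in fuzzy c-means, in which a point's degree of membership in a cluster depends on its distance to that cluster's center relative to its distances to all centers. The function name, the fuzzifier value m = 2, and the example centers are illustrative assumptions and are not taken from the source.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Return one membership degree per center; the degrees sum to 1."""
    dists = [math.dist(point, c) for c in centers]
    # If the point sits exactly on one or more centers, split full membership among them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


# Hypothetical example: two cluster centers and one point lying closer to the first.
centers = [(0.0, 0.0), (4.0, 0.0)]
memberships = fuzzy_memberships((1.0, 0.0), centers)
print(memberships)
```

Here the point receives a high membership in the nearer cluster and a low, but non-zero, membership in the farther one, which is the difference from a hard assignment such as k-means.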
constraints, must-link constraints, or the two with a data clustering algorithm. Both types of
constraint define a relationship between two data instances (Tan et al., 2016). For instance, a must-link
constraint specifies two instances that should be placed in the same cluster. In contrast, a cannot-link
constraint specifies two instances that should not be placed in the same cluster.
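A brief sketch shows how the two kinds of constraint can be checked against a clustering result. It assumes that each constraint is given as a pair of data-instance indices and that the clustering is a list of cluster labels, one per instance. The function name and the example labels are hypothetical.

```python
def violated_constraints(labels, must_link, cannot_link):
    """List the constraints that a given cluster assignment breaks."""
    violations = []
    for a, b in must_link:
        # A must-link pair is violated when the two instances land in different clusters.
        if labels[a] != labels[b]:
            violations.append(("must-link", a, b))
    for a, b in cannot_link:
        # A cannot-link pair is violated when the two instances share a cluster.
        if labels[a] == labels[b]:
            violations.append(("cannot-link", a, b))
    return violations


# Hypothetical clustering of four instances into clusters 0 and 1.
labels = [0, 0, 1, 1]
must_link = [(0, 1), (1, 2)]
cannot_link = [(2, 3), (0, 3)]

violations = violated_constraints(labels, must_link, cannot_link)
print(violations)
```

A constrained clustering algorithm would use a check of this kind while assigning points, rejecting or penalizing assignments that break a constraint.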
Part 2
1. What is the definition of data mining that the author mentions? How is this
Križanić (2020) has provided a tentative definition of data mining in the case study.
According to the author, data mining involves integrating various efficient methods for analyzing
a large and complex collection of data (Križanić, 2020). Thus, data mining also involves
Consequently, the current understanding of data mining involves the extraction of usable
data from a larger, more complex collection of raw data. In other words, data mining
involves analyzing trends and patterns in a large collection of data by using software
tools. Currently, data mining is highly applicable in data warehousing, data collection, and
computer processing. More importantly, the current understanding of data mining encompasses the
techniques used in data extraction for applications such as spam email filtering, fraud detection,
Križanić (2020) indicates that data mining is highly applicable in the education setting for
higher education institutions in Croatia. Accordingly, educational data mining has been integrated
with big data to examine students' actions and behavior in e-courses (Križanić, 2020).
The premise of the use case is that educational data mining can be justified through the use of
the decision tree technique and cluster analysis as data mining approaches. The case also used event
logs downloaded from an e-learning environment to analyze student behavior (Križanić,
2020). Thus, data mining was used to analyze students' achievement on midterm exams based
on their behavior in the e-course, thereby supporting the finding that students performed better in
3. What type of tools are used in the use case's data mining aspect, and how are they
used?
Data mining uses many tools, such as Teradata, Python, SPSS, SAS, and Oracle Data Mining,
among others. These tools are based on data mining techniques such as classification analysis,
association rule learning, and anomaly detection. More importantly, they are based on clustering analysis,
The case study by Križanić (2020) mainly uses cluster analysis and decision trees in the
data mining aspect. Notably, cluster analysis was executed by organizing collections of patterns into
groups based on the similarity of students' behavior in using course materials. In addition,
the decision tree was the key technique for generating a representation of
decision-making that enabled defining different classes of objects for
deeper evaluation of how students learned. Cluster analysis is mainly used to
identify similar patterns of behavior. Decision trees are easy to comprehend and are well
suited to classification problems. However, they are sensitive to the data used in their
construction and are considered a less natural model for regression. The key benefit associated with
decision trees is that there are many efficient algorithms, which makes it easy to find
4. Were the tools used appropriately for the use case? Why or why not?
In my view, the tools used in the case were appropriate. This is because
cluster analysis played a critical part by organizing different collections of patterns into
distinct groups based on the similarity of students' behavior. On the other hand, the decision
References
Križanić, S. (2020). Educational data mining using cluster analysis and decision tree technique:
1847979020908675.
Tan, P. N., Steinbach, M., & Kumar, V. (2016). Introduction to data mining. Pearson Education
India.