Long Quiz 3
I. A sanity check, done by validating the decision rules with domain experts, will determine
whether the decision rules are sound.
The most informative attribute is chosen using entropy-based methods. Entropy is a measure
of what characteristic of the attribute?
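Entropy-based attribute selection, as used when growing a decision tree, can be sketched as below. This is a minimal illustration, assuming entropy is measured in bits over the class labels; the function names and the toy labels are hypothetical, not taken from the quiz.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(labels, attribute_values):
    """Drop in entropy after splitting the labels on one attribute's values."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


labels = ["yes", "yes", "no", "no"]
print(entropy(labels))                                 # 1.0 (evenly mixed labels)
print(information_gain(labels, ["a", "a", "b", "b"]))  # 1.0 (split yields pure subsets)
```

The attribute with the largest information gain is the one picked for the split.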
All of the following are always true about the k-means algorithm EXCEPT
I. To remedy the issue of units of measurement, the attributes may be rescaled by dividing
each attribute value by the standard deviation of that attribute.
II. The choice of the unit of measurement for a particular attribute is important because it
directly affects the cluster membership of the data points.
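Rescaling by the standard deviation, as in statement I, can be sketched as follows. This assumes the population standard deviation (`statistics.pstdev`); the height data is hypothetical. Because multiplying an attribute by a constant multiplies its standard deviation by the same constant, the rescaled values no longer depend on the unit chosen.

```python
import statistics


def rescale_by_std(values):
    """Divide every value of one attribute by that attribute's standard deviation."""
    # Population standard deviation is assumed; statistics.stdev gives the sample version.
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("attribute has zero spread; cannot rescale")
    return [v / sd for v in values]


heights_cm = [150, 160, 170, 180]
heights_m = [1.50, 1.60, 1.70, 1.80]

# After rescaling, centimetres and metres give (numerically) the same values.
print(rescale_by_std(heights_cm))
print(rescale_by_std(heights_m))
```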
Assuming 1,000 transactions, with {bread, wine} appearing in 450 of them, {wine} appearing in
600, and {bread} appearing in 500, then lift(bread → wine) is equal to
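The lift formula, lift(X → Y) = support(X ∪ Y) / (support(X) · support(Y)), can be checked with the counts given in the question. This sketch uses exact fractions to avoid rounding noise; the function name is illustrative.

```python
from fractions import Fraction


def lift(count_xy, count_x, count_y, n_transactions):
    """lift(X -> Y) = support(X ∪ Y) / (support(X) * support(Y))."""
    support_xy = Fraction(count_xy, n_transactions)
    support_x = Fraction(count_x, n_transactions)
    support_y = Fraction(count_y, n_transactions)
    return support_xy / (support_x * support_y)


# Counts from the question: {bread, wine} = 450, {bread} = 500, {wine} = 600, N = 1,000.
print(float(lift(450, 500, 600, 1000)))  # 1.5
```

A lift above 1 means bread and wine appear together more often than they would if they were independent.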
Which of the following is ALWAYS TRUE about the considerations regarding the implementation of
k-means?
II. The k-means algorithm is sensitive to the starting positions of the initial centroids.
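The sensitivity to initial centroids can be demonstrated with a minimal one-dimensional sketch of Lloyd's k-means. The data points and both sets of starting centroids are hypothetical, chosen so that the two runs converge to different clusterings.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain (Lloyd's) k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid (ties go to the lower index).
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep the old one if the cluster is empty.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # assignments are stable: converged
        centroids = new_centroids
    return centroids, clusters


points = [0, 1, 10, 11, 20, 21]
print(kmeans_1d(points, [0, 21])[1])  # [[0, 1, 10], [11, 20, 21]]
print(kmeans_1d(points, [0, 1])[1])   # [[0, 1], [10, 11, 20, 21]]
```

Both runs stop at a stable assignment, but the within-cluster sum of squares is about 121.3 for the first and 101.5 for the second, which is why k-means is often run several times with different starting centroids.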
Which of the following are possible questions that association rules can address?
Assuming 1,000 transactions, with {bread, wine} appearing in 450 of them, {wine} appearing in
600, and {bread} appearing in 500, then confidence(bread → wine) is equal to
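Confidence is the conditional share of transactions containing X that also contain Y. A minimal sketch with the question's counts follows; the function name is illustrative.

```python
from fractions import Fraction


def confidence(count_xy, count_x):
    """confidence(X -> Y) = support(X ∪ Y) / support(X) = count(X ∪ Y) / count(X)."""
    # The total number of transactions cancels, so only the two itemset counts matter.
    return Fraction(count_xy, count_x)


# {bread, wine} = 450 transactions, {bread} = 500 transactions.
print(float(confidence(450, 500)))  # 0.9
```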
Mathematically, this is the percentage of transactions that contain a particular itemset.
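Counting the share of transactions that contain an itemset can be sketched directly over a list of baskets. The baskets below are hypothetical and only illustrate the computation.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    target = set(itemset)
    hits = sum(1 for t in transactions if target <= set(t))
    return hits / len(transactions)


# Hypothetical transactions for illustration only.
baskets = [
    {"bread", "wine"},
    {"bread"},
    {"wine", "cheese"},
    {"bread", "wine", "cheese"},
]
print(support({"bread", "wine"}, baskets) * 100)  # 50.0 (percent)
```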
Referring to the figure below, give the number of leaves and the maximum depth of the decision
tree.
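The figure is not reproduced here, so the sketch below uses a hypothetical stand-in tree to show how leaves and maximum depth are counted. It assumes depth is measured in edges from the root (a lone root has depth 0); some texts count levels instead, which adds 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A decision-tree node; a node with no children is a leaf."""
    label: str
    children: List["Node"] = field(default_factory=list)


def count_leaves(node: Node) -> int:
    if not node.children:
        return 1
    return sum(count_leaves(child) for child in node.children)


def max_depth(node: Node) -> int:
    """Number of edges on the longest root-to-leaf path (a lone root has depth 0)."""
    if not node.children:
        return 0
    return 1 + max(max_depth(child) for child in node.children)


# Hypothetical tree, not the one in the figure.
tree = Node("Outlook", [
    Node("Humidity", [Node("No"), Node("Yes")]),
    Node("Yes"),
    Node("Wind", [Node("No"), Node("Yes")]),
])
print(count_leaves(tree), max_depth(tree))  # 5 2
```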
All of the following are applications of clustering EXCEPT
Which of the following is ALWAYS TRUE about the considerations regarding the object
attributes when performing cluster analysis?
I. When choosing which attributes to use, it is important to understand which attributes will be
known at the time a new object is assigned to a cluster.
II. Whenever possible and based on the data, it is best to have as many attributes as possible.
This approach to improving the Apriori algorithm adds new candidate itemsets only when all of
their subsets are estimated to be frequent.
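The "all subsets are frequent" test that this statement relies on can be sketched as a small helper. The function name and the frequent 2-itemsets below are hypothetical.

```python
from itertools import combinations


def all_subsets_frequent(candidate, frequent_smaller):
    """True if every (k-1)-item subset of a k-item candidate is in `frequent_smaller`."""
    k = len(candidate)
    return all(frozenset(sub) in frequent_smaller
               for sub in combinations(candidate, k - 1))


# Hypothetical frequent 2-itemsets.
frequent_pairs = {
    frozenset({"bread", "wine"}),
    frozenset({"bread", "cheese"}),
}
# {wine, cheese} is not frequent, so the 3-itemset is not added as a candidate.
print(all_subsets_frequent({"bread", "wine", "cheese"}, frequent_pairs))  # False
```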
II. The goal of association rule mining is to discover interesting relationships hidden in large
datasets.
II. Clustering methods find the similarities between objects according to the object attributes
and group the similar objects into clusters.
The following is the result of running k-means on a dataset of 620 high school students, with
attributes for their grades in English, Math and Science. Based on this result, to which cluster
does a student belong whose grades in English, Math and Science are 75, 70 and 72,
respectively?
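Assigning a new student to a cluster means finding the nearest centroid. The k-means result referred to in the question is not reproduced here, so the centroids below are hypothetical placeholders; substitute the actual centroids from the result. Euclidean distance is assumed.

```python
import math

# Hypothetical centroids (English, Math, Science); not the values from the quiz result.
centroids = {
    "Cluster 1": (90, 88, 89),
    "Cluster 2": (78, 72, 75),
    "Cluster 3": (65, 60, 62),
}


def nearest_cluster(point, centroids):
    """Return the name of the centroid closest to `point` in Euclidean distance."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))


print(nearest_cluster((75, 70, 72), centroids))  # Cluster 2 (for these made-up centroids)
```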
Which of the following is TRUE about the Apriori algorithm?
II. The Apriori algorithm uses ‘pruning’ to control the exponential growth of candidate itemsets.
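A minimal, unoptimized sketch of the level-wise Apriori procedure shows where pruning happens: a k-item candidate is discarded before its support is counted if any of its (k-1)-item subsets is infrequent. The transactions and the minimum support of 0.5 are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return every itemset whose support is at least `min_support` (a fraction)."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    level = {frozenset([item]) for b in baskets for item in b}
    level = {s for s in level if support(s) >= min_support}
    frequent = set(level)
    k = 2
    while level:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset,
        # so its support never has to be counted.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
        frequent |= level
        k += 1
    return frequent


data = [{"bread", "wine"}, {"bread", "cheese"}, {"bread", "wine", "cheese"}, {"wine"}]
for itemset in sorted(apriori(data, 0.5), key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset))
```

Here {bread, wine, cheese} is pruned without being counted, because its subset {wine, cheese} appears in only one of the four transactions.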
Which of the following is TRUE about unsupervised learning?
I. Unsupervised learning refers to the problem of finding hidden structures within unlabeled
data.