Mining Frequent Patterns From Very High Dimensional Data: A Top-Down Row Enumeration Approach
based mining algorithm. To show its effectiveness, we design an algorithm, called TD-Close, to mine a complete set of frequent closed patterns, and compare it with two bottom-up search algorithms, Carpenter [5] and FPclose [15]. Here are the main contributions of this paper:

(1) A top-down search method and a novel row-enumeration tree are proposed to take advantage of the pruning power of the minimum support threshold. Our experiments and analysis show that this cuts down the search space dramatically. This is critical for mining high dimensional data, because the dataset is usually big, and without pruning the huge search space, one has to generate a very large set of candidate itemsets for checking.

(2) A new method, called closeness-checking, is developed to check efficiently and effectively whether a pattern is closed. Unlike other existing closeness-checking methods, it needs to scan neither the mining data set nor the result set, and it is easy to integrate with the top-down search process. The correctness of this method is demonstrated by both theoretical proof and experimental results.

(3) An algorithm using the above two methods is designed and implemented to discover a complete set of frequent closed patterns. Experimental results show that this algorithm is more efficient and uses less memory than the bottom-up search styled algorithms Carpenter and FPclose.

The remainder of the paper is organized as follows. In Section 2, we present some preliminaries and the mining task. In Section 3, we describe our top-down search strategy and compare it with the bottom-up search strategy. We present the new algorithm in Section 4 and conduct an experimental study in Section 5. Finally, we review related work in Section 6 and conclude the study in Section 7.

2 Preliminaries

A table T is a triple (S, I, R), where S = {r1, r2, ..., rn} is a set of rows, each identified by a row id (rid) ri, I = {i1, i2, ..., im} is a set of items, and R ⊆ S × I is a relation. For an ri ∈ S and an ij ∈ I, (ri, ij) ∈ R denotes that ri contains ij, or ij is contained by ri.

Let TT be the transposed table of T, in which each row corresponds to an item ij and consists of the set of rids that contain ij in T. For clarity, we call each row of TT a tuple. Table TT is a triple (I, S, R), where R ⊆ S × I is a relation. For a rid ri ∈ S and an item ij ∈ I, (ri, ij) ∈ R denotes that ij contains ri, or ri is contained by ij.

Example 2.1 (Table and transposed table) Table 2.1 shows an example table T with 4 attributes (columns): A, B, C and D. The corresponding transposed table TT is shown in Table 2.2. For simplicity, we use the number i (i = 1, 2, ..., n) instead of ri to represent each rid. In order to describe our search strategy and mining algorithm clearly, we need to define an order on these rows. In this paper, we use the numerical order of rids, i.e., a row j is greater than a row k if j > k.

Let the minimum support threshold (denoted minsup) be set to 2. All tuples whose number of rids is less than minsup are deleted from TT. The table TT shown in Table 2.2 has already been pruned by minsup. This kind of pruning will be further explained in the following sections.

In this paper we aim to discover the set of frequent closed patterns. Some concepts related to it are defined as follows.

Table 2.1 An example table T

ri   A    B    C    D
1    a1   b1   c1   d1
2    a1   b1   c2   d2
3    a1   b1   c1   d2
4    a2   b1   c2   d2
5    a2   b2   c2   d3

Table 2.2 Transposed table TT of T

itemset   rowset
a1        1, 2, 3
a2        4, 5
b1        1, 2, 3, 4
c1        1, 3
c2        2, 4, 5
d2        2, 3, 4
Definition 2.1 Given an itemset I and a rowset S, define

r(I) = { ri ∈ S | ∀ij ∈ I, (ri, ij) ∈ R }
i(S) = { ij ∈ I | ∀ri ∈ S, (ri, ij) ∈ R }

Based on these definitions, we define C(I) as the closure of an itemset I, and C(S) as the closure of a rowset S, as follows:

C(I) = i(r(I))
C(S) = r(i(S))

Note that Definition 2.1 is applicable to both table T and table TT.

Definition 2.2 (Closed itemset and closed rowset) An itemset I is called a closed itemset iff I = C(I). Likewise, a rowset S is called a closed rowset iff S = C(S).

Definition 2.3 (Frequent itemset and large rowset) Given an absolute, user-specified threshold minsup, an itemset I is called frequent if |r(I)| ≥ minsup, and a rowset S is called large if |S| ≥ minsup, where |r(I)| is called the support of itemset I and minsup is called the minimum support threshold. |S| is called the size of rowset S, and minsup is accordingly also called the minimum size threshold. Further, an itemset I is called a frequent closed itemset if it is both closed and frequent. Likewise, a rowset S is called a large closed rowset if it is both closed and large.

Example 2.2 (Closed itemset and closed rowset) In Table 2.1, for the itemset {b1, c2}, r({b1, c2}) = {2, 4} and i({2, 4}) = {b1, c2, d2}, so C({b1, c2}) = {b1, c2, d2}. Therefore, {b1, c2} is not a closed itemset. If minsup = 2, it is a frequent itemset. In Table 2.2, for the rowset S = {1, 2}, i({1, 2}) = {a1, b1} and r({a1, b1}) = {1, 2, 3}, so C(S) = {1, 2, 3}. Rowset {1, 2} is therefore not a closed rowset, but {1, 2, 3} apparently is.

Mining task: Originally, we want to find, from table T, all of the frequent closed itemsets that satisfy the minimum support threshold minsup. After transposing T into the transposed table TT, the mining task becomes finding, from table TT, all of the large closed rowsets that satisfy the minimum size threshold minsup.
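To make these operators concrete, the following minimal Python sketch implements the transposition step and the r, i, and closure operators on the example table T. The sketch and all helper names are our illustration, not the paper's implementation; it reproduces Table 2.2 and Example 2.2.

# Illustrative sketch of Definition 2.1 and the closure operators on Table 2.1.
from collections import defaultdict

TABLE = {1: {"a1", "b1", "c1", "d1"}, 2: {"a1", "b1", "c2", "d2"},
         3: {"a1", "b1", "c1", "d2"}, 4: {"a2", "b1", "c2", "d2"},
         5: {"a2", "b2", "c2", "d3"}}

def transpose(table, minsup):
    """Build TT: item -> rowset, pruning tuples with support below minsup."""
    tt = defaultdict(set)
    for rid, items in table.items():
        for item in items:
            tt[item].add(rid)
    return {item: rids for item, rids in tt.items() if len(rids) >= minsup}

def r_of(itemset):
    """r(I): rids of the rows that contain every item of I."""
    return {rid for rid, items in TABLE.items() if itemset <= items}

def i_of(rowset):
    """i(S): items common to every row of S."""
    rows = [TABLE[rid] for rid in rowset]
    return set.intersection(*rows) if rows else set()

# C(I) = i(r(I)) and C(S) = r(i(S)); Example 2.2 and Table 2.2 check out:
assert transpose(TABLE, 2)["c2"] == {2, 4, 5}
assert i_of(r_of({"b1", "c2"})) == {"b1", "c2", "d2"}   # {b1, c2} is not closed
assert r_of(i_of({1, 2})) == {1, 2, 3}                  # {1, 2} is not closed
assert r_of(i_of({1, 2, 3})) == {1, 2, 3}               # {1, 2, 3} is closed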
3 Top-down Search Strategy

Before presenting our top-down search strategy, we first look at the bottom-up search strategy used by the previous mining algorithms [5, 6, 7, 8]. For simplicity, we use Carpenter as a representative of this group of algorithms, since they use the same kind of search strategy.

3.1 Bottom-up Search Strategy

Figure 3.1 shows a row enumeration tree that uses the bottom-up search strategy. By bottom-up we mean that along every search path, we search the row enumeration space from small rowsets to large ones: first single rows, then 2-rowsets, ..., and finally n-rowsets. Both depth-first and breadth-first search of this tree belong to this search strategy.

In Figure 3.1, each node represents a rowset. Our mining task is to discover all of the large closed rowsets, so the main constraint for mining is the size of a rowset. Since this constraint is monotonic with respect to the bottom-up search order, it is hard to prune the row enumeration search space early. For example, suppose minsup is set to 3; although obviously none of the nodes in the first two levels below the root can satisfy this constraint, these nodes still need to be checked [5, 6, 7, 8]. As a result, as minsup increases, the time needed to complete the mining process cannot decrease rapidly. This limits the applicability of this kind of algorithm in real situations.

Figure 3.1 Bottom-up row enumeration tree

In addition, the memory cost of this kind of bottom-up search is also big. Take Carpenter as an example. Similar to several other algorithms [6, 7, 8], Carpenter uses a pointer list to point to each tuple belonging to an x-conditional transposed table. For a table with n rows, the maximum number of different levels of pointer lists that need to remain in memory is n, although among them the first (minsup − 1) levels of pointer lists do not contribute to the final result.

These observations motivate the proposal of our method.

3.2 Top-down Search Strategy

Almost all of the frequent pattern mining algorithms that deal with the data set without transposition use the anti-monotonicity property of minsup to speed up the mining process. For a transposed data set, the minsup
constraint maps to the size of a rowset. In order to stop further search once minsup can no longer be satisfied, we propose to exploit a top-down search strategy instead of the bottom-up one. To do so, we design a row enumeration tree that follows this strategy, shown in Figure 3.2. Contrary to the bottom-up search strategy, the top-down search strategy means that along each search path the rowsets are checked from large ones to small ones.

In Figure 3.2, each node represents a rowset. We define the level of the root node as 0, so the highest level for a data set with n rows is (n − 1).

It is easy to see from Figure 3.2 that for a table with n rows, if the user-specified minimum support threshold is minsup, we do not need to search any rowsets at levels greater than (n − minsup) of the row enumeration tree. For example, suppose minsup = 3; for the data set shown in Table 2.1, we can stop further search at level 2, because the rowsets represented by nodes at levels 3 and 4 cannot contribute to the set of frequent patterns at all.

Figure 3.2 Top-down row enumeration tree

For each node of this tree we maintain a conditional table, called an x-excluded transposed table; for adjacent nodes, the corresponding tables are called parent table and child table respectively. Following is the definition of the x-excluded transposed table.

Definition 3.1 (x-excluded transposed table) Given a rowset x = {ri1, ri2, ..., rik} ordered so that ri1 > ri2 > ... > rik, a minimum support threshold minsup, and its parent table TT|p, an x-excluded transposed table TT|x is a table in which each tuple contains only rids less than any of the rids in x, and at the same time contains all of the rids greater than any of the rids in x. Rowset x is called an excluded rowset.

Example 3.1 (x-excluded transposed table) For the transposed table TT shown in Table 2.2, two of its x-excluded transposed tables, TT|54 and TT|4, are shown in Tables 3.1 and 3.2 respectively, assuming minsup = 2.

Table 3.1 shows the x-excluded transposed table TT|54, where x = {5, 4}. In this table, each tuple contains only rids that are less than 4, and contains at least two such rids, since minsup is 2. Since the largest rid in the original data set is 5, there are no other rids that the tuples are required to contain. The procedure to get this table is shown in Example 3.2.

Table 3.2 is the x-excluded transposed table TT|4, where x = {4}. Its parent table is the table shown in Table 2.2. Each tuple in TT|4 must contain rid 5, as it is greater than 4, and must in the meantime contain at least one rid less than 4, as minsup is set to 2. As a result, among the tuples in Table 2.2, only those containing rid 5 can be candidate tuples of TT|4, and only tuples a2 and c2 satisfy this condition. But tuple a2 does not satisfy minsup after rid 4 is excluded, so only one tuple is left in TT|4. Note that although the current size of tuple c2 in TT|4 is 1, its actual size is 2, since it contains rid 5, which is not listed explicitly in the table.

Table 3.1 TT|54

itemset   rowset
a1        1, 2, 3
b1        1, 2, 3
c1        1, 3
d2        2, 3

Table 3.2 TT|4

itemset   rowset
c2        2

In general, an x-excluded transposed table TT|x can be obtained from its parent table by the following steps:

(1) Select as candidate tuples the tuples of the parent table that contain all of the rids greater than ri1.
(2) For each tuple obtained in the first step, keep only the rids less than rik.
(3) Get rid of the tuples containing fewer than (minsup − j) rids, where j is the number of rids in S greater than ri1.

The reason for the operation in step 3 will be given in Section 4. Note that the original transposed table corresponds to TT|∅, where ∅ is the empty rowset.

Figure 3.3 shows the excluded row enumeration tree corresponding to the row enumeration tree shown in Figure 3.2. This tree shows the parent-child relationship between the excluded rowsets.

Figure 3.3 Excluded row enumeration tree
Example 3.2 (Procedure to get an x-excluded transposed table) Take TT|54 as an example; here are the steps to get it. Table TT|5, shown in Table 3.3, is its parent table.

Table 3.3 TT|5

itemset   rowset
a1        1, 2, 3
b1        1, 2, 3, 4
c1        1, 3
c2        2, 4
d2        2, 3, 4

(1) Each tuple in table TT|5 is a candidate tuple of TT|54, as there is no rid greater than 5 in the original data set.
(2) After excluding the rids not less than 4, the table is as shown in Table 3.4.
(3) Since tuple c2 contains only one rid, it does not satisfy minsup and is thus pruned from TT|54. We then get the final TT|54 shown in Table 3.1.

Table 3.4 TT|54 without pruning

itemset   rowset
a1        1, 2, 3
b1        1, 2, 3
c1        1, 3
c2        2
d2        2, 3

From Definition 3.1 and the above procedure we can see that, due to the minsup threshold, the x-excluded transposed tables become smaller and smaller, so the search space shrinks rapidly.
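As an illustration, the following Python sketch materializes Definition 3.1 directly from the full transposed table TT rather than incrementally from a parent table; the direct construction and all names are our own simplification, but the sketch reproduces Tables 3.1 and 3.2.

# Illustrative construction of TT|x from the full transposed table (item -> rowset).
TT = {"a1": {1, 2, 3}, "a2": {4, 5}, "b1": {1, 2, 3, 4},
      "c1": {1, 3}, "c2": {2, 4, 5}, "d2": {2, 3, 4}}

def excluded_table(tt, x, n, minsup):
    lo = min(x)
    # Rids above min(x) that are not excluded remain implicitly contained.
    required = {r for r in range(lo + 1, n + 1) if r not in x}
    result = {}
    for item, rids in tt.items():
        if not required <= rids:                 # step 1: candidate tuples only
            continue
        explicit = {r for r in rids if r < lo}   # step 2: keep rids below min(x)
        if len(explicit) >= minsup - len(required):   # step 3: size pruning
            result[item] = explicit
    return result

assert excluded_table(TT, {5, 4}, 5, 2) == {
    "a1": {1, 2, 3}, "b1": {1, 2, 3}, "c1": {1, 3}, "d2": {2, 3}}  # Table 3.1
assert excluded_table(TT, {4}, 5, 2) == {"c2": {2}}                # Table 3.2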
As for the memory cost, in order to compare with Carpenter, we also use a pointer list to simulate the x-excluded transposed table. What is different is that this pointer list keeps track of rowsets from the end of each tuple of TT, and we also split it according to the current rid. We do not discuss the details of the implementation due to space limitations. What is clear, however, is that when we stop searching at level (n − minsup), we do not need to spend any memory on the excluded transposed tables corresponding to nodes at levels greater than (n − minsup), and we can release the memory used for the nodes along the current search path. Therefore, compared to Carpenter, our method saves more memory. This is also demonstrated in our experimental study, as Carpenter often runs out of memory before completion.

4 Algorithm

To mine frequent closed itemsets from high dimensional data using the top-down search strategy, we design an algorithm, TD-Close, and compare it with the corresponding bottom-up based algorithm, Carpenter. In this section, we first present our new closeness-checking method and then describe the new algorithm.

4.1 Closeness-Checking Method

To avoid generating all the frequent itemsets during the mining process, it is important to perform the closeness checking as early as possible. Thus we have developed an efficient closeness-checking method, based on the following lemmas.
Lemma 4.1 Given an itemset I and a rowset S, the following two equations hold:

r(I) = r(i(r(I)))  (1)
i(S) = i(r(i(S)))  (2)

Proof. Since r(I) is the set of rows that share a given set of items, and i(S) is the set of items common to a set of rows, according to these semantics and Definition 2.1, the two equations are obviously correct.

Lemma 4.2 In transposed table TT, a rowset S is closed iff it can be represented by an intersection of a set of tuples, that is,

∃I such that S = r(I) = ∩j r({ij}), where ij ∈ I and I = {i1, i2, ..., il}.

Proof. First, we prove that if S is closed then S = r(I) holds. If S is closed, then according to Definition 2.2, S = C(S) = r(i(S)); we can always set I = i(S), so S = r(I) holds. Next we prove that if S = r(I) holds then S is closed. If S = r(I) holds for some itemset I, then according to Definition 2.1, C(S) = r(i(S)) = r(i(r(I))). Based on Lemma 4.1 we have r(i(r(I))) = r(I) = S, so C(S) = S holds. Therefore S is closed.

Lemma 4.3 Given a rowset S, in transposed table TT, consider every tuple ij containing S, that is, every ij ∈ i(S). If S ≠ ∩j r({ij}), where ij ranges over i(S), then S is not closed.

Proof. For every tuple ij ∈ i(S), S ⊆ r({ij}), so S ⊆ ∩j r({ij}) apparently holds. If S ≠ ∩j r({ij}), then S ⊂ ∩j r({ij}), which means that there exists at least one more rid, say y, such that S ∪ {y} ⊆ ∩j r({ij}) = r(i(S)). So S ⊂ r(i(S)) = C(S). Therefore S is not closed.

Lemmas 4.2 and 4.3 are the basis of our closeness-checking method. In order to speed up this kind of checking, we add some additional information to each x-excluded transposed table: the third column of the tables shown in Tables 4.1 and 4.3. This so-called skip-rowset of a tuple is a set of rids that keeps track of the rids that have been excluded from that tuple in all of its parent tables. When two tuples in an x-excluded transposed table have the same rowset, they are merged into one tuple, and the intersection of the two corresponding skip-rowsets becomes the skip-rowset of the merged tuple.

Example 4.1 (Skip-rowset and merge of an x-excluded transposed table) In Example 3.2, when we got TT|54 from its parent TT|5, we excluded rid 4 from tuples b1 and d2 respectively. The skip-rowsets of these two tuples in TT|5 were empty, as they do not contain rid 5 in TT|∅. Therefore, the skip-rowset of each of these two tuples in TT|54 is {4}. Table 4.1 shows TT|54 with this kind of skip-rowset.

In Table 4.1, the first two tuples have the same rowset, {1, 2, 3}. After merging these two tuples, the table becomes Table 4.2. The skip-rowset of the merged tuple becomes empty, because the intersection of an empty set and any other set is still empty. An empty intersection means that the current rowset is the result of an intersection of tuples. When it is time to output a rowset, its skip-rowset is checked: if it is empty, then by Lemma 4.2 the rowset must be a closed rowset.

Table 4.1 TT|54 with skip-rowset

itemset   rowset    skip-rowset
a1        1, 2, 3
b1        1, 2, 3   4
c1        1, 3
d2        2, 3      4

Table 4.2 TT|54 after merge

itemset   rowset    skip-rowset
a1 b1     1, 2, 3
c1        1, 3
d2        2, 3      4

Table 4.3 TT|4 with skip-rowset

itemset   rowset    skip-rowset
c2        2         4
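The merge-and-intersect step can be sketched in a few lines of Python; the tuple representation and all names are ours, not the paper's. An empty skip-rowset licenses output as closed; a tuple with a non-empty skip-rowset is simply not output at the current node.

# Illustrative merge of tuples with identical rowsets, intersecting their
# skip-rowsets (Example 4.1). A tuple is (itemset, rowset, skip_rowset).
def merge_tuples(tuples):
    merged = {}
    for items, rowset, skip in tuples:
        key = frozenset(rowset)
        if key in merged:
            old_items, old_skip = merged[key]
            merged[key] = (old_items | items, old_skip & skip)  # intersect skips
        else:
            merged[key] = (set(items), set(skip))
    return [(items, set(key), skip) for key, (items, skip) in merged.items()]

tt54 = [({"a1"}, {1, 2, 3}, set()),      # Table 4.1
        ({"b1"}, {1, 2, 3}, {4}),
        ({"c1"}, {1, 3}, set()),
        ({"d2"}, {2, 3}, {4})]

for items, rowset, skip in merge_tuples(tt54):
    status = "closed, may output" if not skip else "defer"
    print(sorted(items), sorted(rowset), status)
# {a1, b1} with rowset {1, 2, 3} now has an empty skip-rowset, as in Table 4.2.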
4.2 The TD-Close Algorithm

Based on the top-down search strategy and the closeness-checking method, we design an algorithm, called TD-Close, to mine all of the frequent closed patterns from table T.

Figure 4.1 shows the main steps of the algorithm. It begins with the transposition operation that transforms table T into the transposed table TT. Then, after initializing the set of frequent closed patterns FCP to the empty set and excludedSize to 0, the subroutine TopDownMine is called to process each x-excluded transposed table and find all of the frequent closed itemsets. The general processing order of rowsets is equivalent to a depth-first search of the row enumeration tree shown in Figure 3.2.
Subroutine TopDownMine takes an x-excluded transposed table and two other variables, cMinsup and excludedSize, as parameters, and checks each candidate rowset of the x-excluded transposed table to see if it is closed. Candidate rowsets are those large rowsets that occur at least once in table TT. Parameter cMinsup is a dynamically changing minimum support threshold, as indicated in step 5, and excludedSize is the size of rowset x. There are five main steps in this subroutine, which are explained one by one as follows.

Algorithm TD-Close
Input: Table T, and minimum support threshold, minsup
Output: A complete set of frequent closed patterns, FCP
Method:
1. Transform T into transposed table TT
2. Initialize FCP = ∅ and excludedSize = 0
3. Call TopDownMine(TT|∅, minsup, excludedSize)

Subroutine TopDownMine(TT|x, cMinsup, excludedSize)
Method:
1. Pruning 1: if excludedSize >= (n − minsup), return.
2. Pruning 2: if the size of TT|x is 1, output the corresponding itemset if the rowset is closed, and then return.
3. Pruning 3: derive TT|xy and TT'|x, where y is the largest rid among the rids in tuples of TT|x,
   TT'|x = {tuple ti | ti ∈ TT|x and ti contains y},
   TT|xy = {tuple ti | ti ∈ TT|x, and if ti contains y, the size of ti must be greater than cMinsup}.
   Note that we delete y from the tuples of both TT|xy and TT'|x.
4. Output: add to FCP the itemset corresponding to each rowset in TT|xy with the largest size, say k, and ending with rid k.
5. Recursive calls:
   TopDownMine(TT|xy, cMinsup, excludedSize + 1)
   TopDownMine(TT'|x, cMinsup − 1, excludedSize)

Figure 4.1 Algorithm TD-Close

In step 1, we apply pruning strategy 1 to stop processing the current excluded transposed table.

Pruning strategy 1: If excludedSize is equal to or greater than (n − minsup), then there is no need for any further recursive calls of TopDownMine.

ExcludedSize is the number of rids excluded from the current table. If it is not less than (n − minsup), the size of each rowset in the current transposed table must be less than minsup, so these rowsets cannot possibly become large.

In step 2, we apply pruning strategy 2 to stop further recursive calls.

Pruning strategy 2: If an x-excluded transposed table contains only one tuple, it is not necessary to make any further recursive calls to deal with its child transposed tables.

The reason for this pruning strategy is apparent. Suppose the rowset corresponding to this tuple is S. From the itemset point of view, no child transposed table of the current table can produce an itemset different from the one corresponding to rowset S. From the rowset point of view, each rowset Si corresponding to a child transposed table of S is a subset of S, and Si cannot be closed, because r(i(Si)) ⊇ S holds, and therefore Si ⊂ r(i(Si)) holds.

Of course, before returning according to pruning strategy 2, the current rowset S might be a closed rowset, so if its skip-rowset is empty, we need to output it first.

Example 4.2 (Pruning strategy 2) Table TT|4, shown in Table 4.3, contains only one tuple. After we check this tuple to see if it is closed (it is apparently not closed, as its skip-rowset is not empty), we do not need to produce any child excluded transposed table from it. That is, in the excluded row enumeration tree shown in Figure 3.3, all of the child nodes of node {4} are pruned. This is because no subset of the current rowset can be closed anymore, since it is already contained in a larger rowset.

Step 3 derives from TT|x two child excluded transposed tables, TT|xy and TT'|x, where y is the largest rid among all rids in tuples of TT|x. These two tables correspond to a partition of the current table TT|x: TT|xy is the sub-table without y, and TT'|x is the sub-table in which every tuple contains y. Since every rowset that will be derived from TT'|x must contain y, we delete y from TT'|x and at the same time decrease cMinsup by 1. Pruning strategy 3 is applied to shrink table TT|xy.

Pruning strategy 3: Each tuple t containing rid y in TT|x is deleted from TT|xy if the size of t (that is, the number of rids t contains) equals cMinsup.

Example 4.3 (Pruning strategy 3) Suppose we have just finished dealing with table TT|54, shown in Table 4.2, and we need to create TT|543 with cMinsup being 2. Then, according to pruning strategy 3, tuples c1 and d2 are pruned from TT|543, because after excluding rid 3 from these two tuples, their sizes become less than cMinsup, although they currently satisfy the minsup threshold. As a result, there is only one tuple, {a1b1}, left in TT|543, as shown in Table 4.4.
Table 4.4 TT|543

itemset   rowset   skip-rowset
a1 b1     1, 2     3

We can get these two tables, TT|xy and TT'|x, by the following steps. First, for each tuple t in TT|x containing rid y, delete y from it and copy it to TT'|x. Then check whether the size of t is now less than cMinsup: if not, keep t and in the meantime put y into its skip-rowset; otherwise remove t from TT|x. Finally, TT|x becomes TT|xy, and at the same time TT'|x is obtained.
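A compact Python sketch of this split (our own rendering, with a table as a list of (itemset, rowset, skip-rowset) triples and implicit rids left to the caller) reproduces Table 4.4 and the child table TT'|54:

# Illustrative derivation of the two child tables from TT|x (step 3).
def split_table(table, cminsup):
    y = max(r for _, rows, _ in table for r in rows)   # largest rid present
    tt_xy, tt_x_prime = [], []
    for items, rows, skip in table:
        if y in rows:
            rest = rows - {y}
            tt_x_prime.append((items, rest, set(skip)))  # y kept implicitly
            if len(rest) >= cminsup:                     # pruning strategy 3
                tt_xy.append((items, rest, skip | {y}))  # y moves to the skip-rowset
        else:
            tt_xy.append((items, set(rows), set(skip)))
    return tt_xy, tt_x_prime, y

tt54 = [({"a1", "b1"}, {1, 2, 3}, set()),   # Table 4.2
        ({"c1"}, {1, 3}, set()),
        ({"d2"}, {2, 3}, {4})]
tt543, tt54_prime, y = split_table(tt54, 2)
assert y == 3
assert tt543 == [({"a1", "b1"}, {1, 2}, {3})]   # Table 4.4: c1 and d2 pruned
# tt54_prime (rid 3 implicit) is then processed with cMinsup decreased to 1.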
Step 4 is the major output step. If there is a tuple in TT|xy with an empty skip-rowset whose size, say k, is the largest among the tuples of TT|xy and which contains rid k, then output it. Since this tuple has the largest size among all of the tuples in TT|xy, there is no other tuple that can contribute to it, and therefore it can be output. For example, in table TT|54, shown in Table 4.2, the itemset a1b1 corresponding to rowset {1, 2, 3} can be output, as its size is 3, which is larger than the sizes of the other two tuples, and it also contains the largest rid, 3.

Step 5 makes two recursive calls to deal with TT|xy and TT'|x respectively.
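To make the overall search concrete, here is a deliberately simplified, brute-force Python sketch of top-down row enumeration with pruning strategy 1, using Lemma 4.2 (S is closed iff S = r(i(S))) for closeness checking in place of the skip-rowset machinery. Everything here is illustrative, not the paper's optimized implementation.

# Brute-force top-down row enumeration on the running example (minsup = 2).
TABLE = {1: {"a1", "b1", "c1", "d1"}, 2: {"a1", "b1", "c2", "d2"},
         3: {"a1", "b1", "c1", "d2"}, 4: {"a2", "b1", "c2", "d2"},
         5: {"a2", "b2", "c2", "d3"}}
MINSUP, N = 2, len(TABLE)

def i_of(rowset):
    rows = [TABLE[r] for r in rowset]
    return set.intersection(*rows) if rows else set()

def r_of(itemset):
    return {r for r, items in TABLE.items() if itemset <= items}

results = set()

def top_down(kept, y, excluded):
    # Pruning strategy 1 (boundary adapted because this sketch checks
    # rowsets at the leaves): once more than n - minsup rids are excluded,
    # no rowset on this branch can still be large.
    if excluded > N - MINSUP:
        return
    if y == 0:                       # every rid decided: `kept` is the rowset
        items = i_of(kept)
        # Closed iff kept == r(i(kept)) (Lemma 4.2); skip the empty itemset.
        if len(kept) >= MINSUP and items and kept == r_of(items):
            results.add((frozenset(items), frozenset(kept)))
        return
    top_down(kept | {y}, y - 1, excluded)   # keep rid y: larger rowsets first
    top_down(kept, y - 1, excluded + 1)     # then exclude rid y

top_down(set(), N, 0)
assert (frozenset({"a1", "b1"}), frozenset({1, 2, 3})) in results
assert len(results) == 8   # the 8 frequent closed itemsets of Table 2.1

On this 5-row example the sketch visits every rowset of size at least minsup; TD-Close avoids such exhaustive enumeration by carrying the x-excluded tables and skip-rowsets down the same tree.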
5 Experimental Study

In this section, we study the performance of the algorithm TD-Close. Experiments were conducted on both synthetic data sets and real microarray data sets. Carpenter has already shown much better performance than column enumeration-based algorithms such as CLOSET [10] and CHARM [11], so we only compare our algorithm with Carpenter and with another column enumeration-based algorithm, FPclose, which won the FIMI'03 best implementation award [15, 16]. All experiments were performed on a PC with a Pentium-4 1.5 GHz CPU, 1 GB of RAM, and a 30 GB hard disk. All of the runtimes plotted in the figures include both computation time and I/O time.

For algorithm FPclose, we downloaded the source of the implementation from the FIMI repository [17]. For algorithm Carpenter, we implemented it to the best of our knowledge according to paper [5] and its subsequent papers [6, 7, 8], and we also improved it by using a faster closeness-checking method, backward pruning, which is used in several other algorithms dealing with microarray data [6, 8]. The original closeness-checking method is that, before outputting each newly found itemset, we must check whether it has already been found: if not, output it; otherwise, discard it. This method is slower than the backward pruning method, because an itemset from a very high dimensional data set usually contains a large number of items and is not in a specific order, so comparison against a large number of large itemsets takes a long time. This is also the reason why some algorithms proposed after Carpenter use the backward checking method. However, even with the backward pruning strategy, when dealing with every rowset Carpenter still needs to scan the corresponding conditional transposed table to find the largest common rowset, and also needs to scan the original transposed table to do the backward pruning, so it requires many scans of the transposed table. In addition, to make the comparison fair, we use a flat table to represent the transposed table TT instead of other data structures, such as FP-tree, that might speed up the search to some extent.

5.1 Synthetic Data Sets

In order to test the performance of our top-down strategy based algorithm TD-Close with respect to several aspects, we use synthetic data sets first. Figures 5.1 to 5.8 illustrate the change of running time as minsup decreases for data sets with different numbers of dimensions and tuples and different cardinalities. We use D#T#C# to denote a specific dataset, where D# stands for dimension (the number of attributes of the data set), T# for the number of tuples, and C# for cardinality (the number of values per dimension, or attribute). All data are generated randomly. In these experiments, D# ranges from 4000 to 10000, T# varies among 100, 150 and 200, and C# varies among 8, 10 and 12.

To test the performance of the three algorithms with respect to the number of dimensions, we created data sets with 4000, 6000, 8000 and 10000 dimensions respectively. Figures 5.1 to 5.4 show the effect of changing dimensionality on the runtime of the three algorithms.

We can see from Figure 5.1 that as minsup decreases, the runtime of all three algorithms increases, and TD-Close is the fastest among them. The runtimes of TD-Close and Carpenter grow rather slowly, but when minsup reaches 10, the runtime of FPclose increases dramatically. This is because when minsup reaches 10, the number of frequent single items increases dramatically. Since FPclose is a column enumeration-based algorithm, the increase in the number of frequent items leads to an explosion of the number of frequent itemsets that need to be checked one by one. On the other hand, row enumeration-based algorithms such as TD-Close and Carpenter search the row combination space, and the number of rows influences their runtime much more than the number of frequent items does. The reason that TD-Close uses less time than Carpenter is that TD-Close can prune the search space much more than Carpenter can, and can stop searching much earlier.

Figures 5.2 to 5.4 indicate the same trend as Figure 5.1. What is different is that we
cannot get all of the runtimes for Carpenter and FPclose on some datasets. For example, for dataset D6000T100C10, we cannot get the runtime for Carpenter when minsup is less than 13, nor for FPclose when minsup is less than 11, either because the algorithm cannot run to the end in reasonable time (within several hours) or because a memory error occurred during running. The same situation also occurs in some of the other experiments.

[Figures 5.1 to 5.6 plot runtime (s) against minsup for the three algorithms; most captions were lost in extraction.]

Figure 5.4 Runtime for D10000T100C10

Figure 5.5 Runtime for D4000T150C10
At relatively high values of minsup, FPclose runs a little faster than TD-Close. However, when minsup becomes relatively small, TD-Close runs much faster than FPclose. The reason is the same as explained above: when minsup is high, FPclose can cut the itemset space down to a very small size using the minsup threshold, so the search time is limited. But once the number of frequent items becomes large, the itemset search space increases exponentially, while the number of rows remains relatively stable.

In the above two groups of experiments, the cardinality of each data set is set to 10, which means that each dimension has 10 distinct values. To test the performance with respect to different cardinalities, two other datasets were created, with 12 distinct values and 8 distinct values per dimension respectively. The number of dimensions is 4000 and the number of tuples is 100. The experimental results are shown in Figures 5.7 and 5.8.

Figure 5.7 Runtime for D4000T100C12

Figure 5.8 Runtime for D4000T100C8

Figure 5.7 shows that when minsup becomes small, for example less than 9, both TD-Close and Carpenter spend less time than FPclose. When minsup reaches 6, the runtimes of TD-Close and Carpenter become almost the same. This is because when minsup becomes very low, the part of the search space that can be pruned by minsup becomes small, so the time saved by this kind of pruning in TD-Close becomes less.

Compared with data set D4000T100C12, data set D4000T100C8 is relatively denser: with the same minsup threshold, the latter produces more frequent patterns. Figure 5.8 shows almost the same behavior of TD-Close and FPclose, but obviously both take much more time than on the former data set for the same minsup threshold. This is also the reason why we cannot get the exact runtime of Carpenter on it.

From these experimental results, one can see that for some relatively high minsup values, FPclose is a little faster than TD-Close. But in those cases the runtime of both TD-Close and FPclose is usually less than one minute, or only several minutes, so the difference is not that significant. However, when minsup becomes low, TD-Close clearly outperforms both FPclose and Carpenter.

5.2 Real Data Sets

Besides testing on the synthetic data sets, we also tested our algorithm on three real microarray data sets: clinical data on ALL-AML leukemia, lung cancer and breast cancer [5, 6]. We take the corresponding training data sets for our experiments. ALL-AML has 38 rows and 7129 dimensions, Lung Cancer has 32 rows and 12533 dimensions, and Breast Cancer has 78 rows and 24481 dimensions. Before the frequent pattern mining algorithms are applied, the data sets are discretized using the equi-depth binning method. To test the performance on different cardinalities, we produce two sets of discretized data sets, one with five bins per dimension and another with 10 bins per dimension.
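For reference, equi-depth (equal-frequency) binning assigns roughly the same number of values to each bin per dimension. The following is a minimal sketch of this discretization (our illustration, assuming numeric values, not the paper's preprocessing code):

# Illustrative equi-depth discretization of one dimension into k bins.
def equi_depth_bins(values, k):
    """Map each value to one of k bins so bins hold ~equal numbers of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * k // len(values)   # bin = scaled rank of the value
    return bins

# Six expression values, three bins: two values per bin.
print(equi_depth_bins([0.2, 0.5, 0.1, 0.9, 0.4, 0.7], 3))   # [0, 1, 0, 2, 1, 2]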
The first group of experiments is done on these three data sets with 5 bins per dimension. Figures 5.9 to 5.11 show the runtimes of TD-Close, Carpenter, and FPclose at different minsup values. Note that the y-axes in these figures are on a logarithmic scale, and we plot the figures as minsup increases so that the pruning power of minsup can be seen clearly.

Figure 5.9 shows that as minsup increases, the runtimes of both TD-Close and FPclose decrease dramatically, while the runtime of Carpenter remains relatively stable. This is because Carpenter still needs to search the rowset space in which the size of each rowset is less than minsup, although such rowsets obviously cannot satisfy minsup. This figure also indicates that among the three algorithms, TD-Close is the fastest. We did not get the runtime of FPclose when minsup is
7, because when minsup is 8 it already spends a much longer time than the other two algorithms.

[Figures 5.9 to 5.11: runtime (s) vs. minsup for TD-Close, Carpenter, and FPclose on the real data sets.]

For the row enumeration algorithms TD-Close and Carpenter, only rowsets are checked instead of itemsets, and the number of rowsets does not change that dramatically.

For the breast cancer data set, we cannot get the runtime for FPclose when minsup is equal to or less than 16 because of a memory error (it runs out of memory). Similarly, Carpenter cannot run to completion when [...]

6 Related Work

[...] algorithms can use the minsup threshold to stop further searching the supersets of an itemset once the itemset does not satisfy the minsup threshold. However, for very high dimensional data sets, since n becomes very large, [...]
based classification rules. Farmer also searches the row enumeration tree in depth-first order. In [7], algorithm Cobbler is proposed to find frequent closed itemsets by integrating the row enumeration method with the column enumeration method. It shows high performance in experiments on a data set with high dimensionality and a relatively large number of rows. In [8], an algorithm is proposed to find the top-k classification rules for each row of a microarray data set. All of these algorithms aim to facilitate the mining of frequent patterns by searching the row enumeration space, and they all search that space in a bottom-up style.

7 Conclusions

In this paper we propose a top-down search strategy for mining frequent closed patterns from very high dimensional data such as microarray data. Existing algorithms, such as Carpenter and several other related algorithms, adopt a bottom-up fashion to search the row enumeration space, which makes the pruning power of the minimum support threshold (minsup) very weak and therefore results in a long mining process, even for high minsup, and a high memory cost. To solve this problem, based on our top-down search strategy, a top-down style row enumeration method and an effective closeness-checking method are proposed. A new algorithm, TD-Close, is designed and implemented for mining a complete set of frequent closed itemsets from high dimensional data. Several pruning strategies are developed to speed up the search. Both analysis and experimental study show that these methods are effective and useful. Future work includes integrating the top-down row enumeration method and the column enumeration method for frequent pattern mining from datasets that are both long and deep, and mining classification rules based on association rules using the top-down search strategy.

References

[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), pp. 487-499, Sept. 1994.
[2] C. Niehrs and N. Pollet. Synexpression groups in eukaryotes. Nature, 402: 483-487, 1999.
[3] Y. Cheng and G. M. Church. Biclustering of expression data. In Proc. of the 8th Intl. Conf. Intelligent Systems for Molecular Biology, 2000.
[4] J. Yang, H. Wang, W. Wang, and P. S. Yu. Enhanced biclustering on gene expression data. In Proc. of the 3rd IEEE Symposium on Bioinformatics and Bioengineering (BIBE), Washington, DC, Mar. 2003.
[5] F. Pan, G. Cong, A. K. H. Tung, J. Yang, and M. J. Zaki. CARPENTER: Finding closed patterns in long biological datasets. In Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2003.
[6] G. Cong, A. K. H. Tung, X. Xu, F. Pan, and J. Yang. FARMER: Finding interesting rule groups in microarray datasets. In Proc. 23rd ACM Int. Conf. Management of Data, 2004.
[7] F. Pan, A. K. H. Tung, G. Cong, and X. Xu. COBBLER: Combining column and row enumeration for closed pattern discovery. In Proc. 2004 Int. Conf. on Scientific and Statistical Database Management (SSDBM'04), Santorini Island, Greece, June 2004, pp. 21-30.
[8] G. Cong, K.-L. Tan, A. K. H. Tung, and X. Xu. Mining top-k covering rule groups for gene expression data. In Proc. 24th ACM Int. Conf. Management of Data, 2005.
[9] C. Creighton and S. Hanash. Mining gene expression databases for association rules. Bioinformatics, 19, 2003.
[10] J. Pei, J. Han, and R. Mao. CLOSET: An efficient algorithm for mining frequent closed itemsets. In Proc. 2000 ACM-SIGMOD Int. Workshop on Data Mining and Knowledge Discovery (DMKD'00), pp. 11-20, Dallas, TX, May 2000.
[11] M. Zaki and C. Hsiao. CHARM: An efficient algorithm for closed association rule mining. In Proc. of 2002 SIAM Data Mining Conf., 2002.
[12] J. Han and J. Pei. Mining frequent patterns by pattern growth: methodology and implications. KDD Exploration, 2, 2000.
[13] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proc. 7th Int. Conf. Database Theory (ICDT'99), Jan. 1999.
[14] J. Li and L. Wong. Identifying good diagnostic genes or gene groups from gene expression data by using the concept of emerging patterns. Bioinformatics, 18:725-734, 2002.
[15] G. Grahne and J. Zhu. Efficiently using prefix-trees in mining frequent itemsets. In Proc. of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, Melbourne, Florida, Nov. 2003.
[16] http://fimi.cs.helsinki.fi/fimi03/
[17] http://fimi.cs.helsinki.fi/
[18] M. J. Zaki. Scalable algorithms for association mining. IEEE Trans. Knowledge and Data Engineering, 12:372-390, 2000.
[19] J. Wang, J. Han, and J. Pei. CLOSET+: Searching for the best strategies for mining frequent closed itemsets. In Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C., Aug. 2003.