Distributed Data Mining
Grigorios Tsoumakas*
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, 54124
Greece
voice: +30 2310-998418
fax: +30 2310-998419
email: greg@csd.auth.gr
Ioannis Vlahavas
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, 54124
Greece
voice: +30 2310-998418
fax: +30 2310-998419
email: vlahavas@csd.auth.gr
(* Corresponding author)
Distributed Data Mining
Grigorios Tsoumakas, Aristotle University of Thessaloniki, Greece
Ioannis Vlahavas, Aristotle University of Thessaloniki, Greece
INTRODUCTION
The continuous developments in information and communication technology have recently led to the appearance of distributed computing environments, which comprise several different sources of large volumes of data and several computing units. The most prominent example of a distributed environment is the Internet, where increasingly more databases and data streams appear that deal with several areas, such as meteorology, oceanography and economics. In addition, the Internet constitutes the communication medium for geographically distributed information systems, such as NASA's earth observing system (eos.gsfc.nasa.gov). Other examples of distributed environments that have been developed in the last few years are sensor networks for process monitoring and grids, where a large number of computing and storage units are interconnected over a high-speed network.
The application of the classical knowledge discovery process in distributed
environments requires the collection of distributed data in a data warehouse for
central processing. However, this is usually either ineffective or infeasible for the
following reasons:
(1) Storage cost. The storage requirements of a central system are obviously enormous. A classical example concerns data from astronomy, and especially images from earth and space telescopes. The size of such databases is reaching the scale of exabytes (10^18 bytes) and is increasing at a high pace. The central storage of the data of all the telescopes on the planet would require a data warehouse of enormous cost.
(2) Communication cost. The transfer of huge data volumes over a network can take an extremely long time and incur an unbearable financial cost. Even a small volume of data might create problems in wireless network environments with limited bandwidth. Note also that communication may be a continuous overhead, as distributed databases are not always constant and unchangeable. On the contrary, it is common to have databases that are frequently updated with new data, or data streams that constantly record information (e.g. remote sensing, sports statistics, etc.).
(3) Computational cost. The computational cost of mining a central data warehouse is much higher than the sum of the costs of analyzing smaller parts of the data, which could moreover be analyzed in parallel. In a grid, for example, it is easy to gather the data at a central location; however, a distributed mining approach would make better use of the available resources.
(4) Private and sensitive data. Many popular data mining applications deal with sensitive data, such as people's medical and financial records. The central collection of such data is not desirable, as it puts privacy at risk. In certain cases (e.g. banking, telecommunications) the data might belong to different, perhaps competing, organizations that want to exchange knowledge without exchanging raw private data.
This article is concerned with Distributed Data Mining algorithms, methods
and systems that deal with the above issues in order to discover knowledge from
distributed data in an effective and efficient way.
BACKGROUND
Distributed Data Mining (DDM) (Fu, 2001; Park & Kargupta, 2003) is concerned with the application of the classical Data Mining procedure in a distributed computing environment, trying to make the best use of the available resources (communication network, computing units and databases). Data Mining takes place both locally at each distributed site and at a global level, where the local knowledge is fused in order to discover global knowledge.
A typical architecture of a DDM approach is depicted in Figure 1. The first
phase normally involves the analysis of the local database at each distributed site.
Then, the discovered knowledge is usually transmitted to a merger site, where the
integration of the distributed local models is performed. The results are transmitted
back to the distributed databases, so that all sites become updated with the global
knowledge. In some approaches, instead of a merger site, the local models are broadcast to all other sites, so that each site can compute the global model in parallel.
Figure 1: Typical architecture of Distributed Data Mining approaches. [The figure depicts N distributed databases (DB 1 ... DB N) and a merger site, connected through five steps: (1) local mining at each database; (2) transmission of local statistics and/or models to the merger site; (3) global mining at the merger site; (4) transmission of the global statistics and/or model back to the local databases; (5) update of the local models.]
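The following minimal Python sketch illustrates this generic flow. The function names (local_mine, merge) are illustrative placeholders rather than part of any particular system.

def distributed_mining(databases, local_mine, merge):
    # (1) local mining at each distributed site
    local_models = [local_mine(db) for db in databases]
    # (2)-(3) transmit the local models to a merger site and mine globally
    global_model = merge(local_models)
    # (4)-(5) transmit the global model back, updating every site
    return [global_model for _ in databases]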
Distributed databases may have homogeneous or heterogeneous schemata. In
the former case, the attributes describing the data are the same in each distributed
database. This is often the case when the databases belong to the same organization
(e.g. local stores of a chain). In the latter case the attributes differ among the
distributed databases. In certain applications a key attribute might be present in the
heterogeneous databases, which will allow the association between tuples. In other
applications the target attribute for prediction might be common across all distributed
databases.
MAIN FOCUS
Distributed Classification and Regression
Approaches for distributed classification and regression are mainly inspired by the area of ensemble methods, including Stacking, Boosting, Voting and others. Some distributed approaches are straightforward adaptations of ensemble methods to a distributed computing environment, while others extend the existing approaches in order to minimize the communication and coordination costs that arise.
Chan and Stolfo (1993) applied the idea of Stacked Generalization (Wolpert,
1992) to DDM via their meta-learning methodology. They focused on combining
distributed data sets and investigated various schemes for structuring the meta-level
training examples. They showed that meta-learning exhibits better performance than majority voting on a number of domains. Knowledge Probing (Guo &
Sutiwaraphun, 1999) builds on the idea of meta-learning and in addition uses an
independent data set, called the probing set, in order to discover a comprehensible
model. The output of a meta-learning system on this independent data set together
with the attribute value vector of the same data set are used as training examples for a
learning algorithm that outputs a final model.
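As an illustration, the following sketch implements the core of such a meta-learning scheme with scikit-learn. The choice of base and meta-level learners, and the use of a shared validation set with numeric class labels, are illustrative assumptions rather than details of the original papers.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

def meta_learn(site_data, X_val, y_val):
    # train one base classifier locally at each site
    base = [DecisionTreeClassifier().fit(X, y) for X, y in site_data]
    # meta-level training examples: one column of predictions per base model
    Z = np.column_stack([m.predict(X_val) for m in base])
    meta = LogisticRegression(max_iter=1000).fit(Z, y_val)
    def predict(X):
        return meta.predict(np.column_stack([m.predict(X) for m in base]))
    return predict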
The Collective Data Mining (CDM) framework (Kargupta, Park, Hershberger
& Johnson, 2000) allows the learning of classification and regression models over
heterogeneous databases. It is based on the observation that any function can be
represented using an appropriate set of basis functions. Initially, CDM generates
approximate orthonormal basis coefficients at each site. It then moves an
appropriately chosen sample of each data set to a single site and generates the
approximate basis coefficients corresponding to non-linear cross terms. Finally, it
combines the local models and transforms the global model into the user-specified
canonical representation.
A number of approaches have been presented for learning a single rule set
from distributed data. Hall, Chawla and Bowyer (1997; 1998) present an approach
that involves learning decision trees in parallel from disjoint data, converting trees to
rules and then combining the rules into a single rule set. Hall, Chawla, Bowyer and
Kegelmeyer (2000) present a similar approach for the same case, with the difference
that rule learning algorithms are used locally. In both approaches, the rule
combination step starts by taking the union of the distributed rule sets and continues
by resolving any conflicts that arise. Cho and Wüthrich (2002) present a different
approach that starts by learning a single rule for each class from each distributed site.
Subsequently, the rules of each class are sorted according to a criterion that is a
combination of confidence, support and deviation, and finally the top k rules are
selected to form the final rule set. Conflicts that appear during the classification of
new instances are resolved using the technique of relative deviation (Wüthrich, 1997).
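The rule-selection step of the latter approach can be sketched as follows. Rules are represented here as (class, condition, confidence, support) tuples, and the scoring function is a simple placeholder for the paper's combination of confidence, support and deviation.

from collections import defaultdict

def select_rules(site_rule_sets, k):
    by_class = defaultdict(list)
    for rules in site_rule_sets:          # one rule set per distributed site
        for rule in rules:                # rule = (class, condition, conf, sup)
            by_class[rule[0]].append(rule)
    final = []
    for cls, rules in by_class.items():
        # placeholder score standing in for the paper's criterion
        rules.sort(key=lambda r: r[2] * r[3], reverse=True)
        final.extend(rules[:k])           # keep the top k rules per class
    return final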
Fan, Stolfo and Zhang (1999) present d-sampling AdaBoost, an extension to
the generalized AdaBoost learning algorithm (Schapire and Singer, 1999) for DDM.
At each round of the algorithm, a different site takes the role of training a weak model
using the locally available examples weighted properly so as to become a distribution.
Then, the update coefficient α_t is computed based on the examples of all distributed
sites and the weights of all examples are updated. Experimental results show that the
performance of the proposed algorithm is in most cases comparable to or better than
learning a single classifier from the union of the distributed data sets, but only in
certain cases comparable to boosting that single classifier. The distributed boosting
algorithm of Lazarevic and Obradovic (2001) at each round learns a weak model in
each distributed site in parallel. These models are exchanged among the sites in order
to form an ensemble, which takes the role of the hypothesis. Then, the local weight
vectors are updated at each site and their sums are broadcast to all distributed sites. In this way, each distributed site maintains a local version of the global distribution without the need to exchange the complete weight vector. Experimental results show that the proposed algorithm achieves classification accuracy comparable to, or even slightly better than, boosting on the union of the distributed data sets.
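A simplified simulation of such a distributed boosting scheme is sketched below, assuming scikit-learn, labels in {-1, +1} and decision stumps as weak learners. The unweighted vote of the site models and the exact weight bookkeeping are simplifications of the published algorithms, not a faithful reimplementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def distributed_boosting(sites, rounds=10):
    # sites: list of (X, y) pairs; the global distribution is spread
    # across the local weight vectors
    total = sum(len(y) for _, y in sites)
    weights = [np.ones(len(y)) / total for _, y in sites]
    hypotheses, alphas = [], []
    for _ in range(rounds):
        # each site trains a weak model on its local data, in parallel
        models = [DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
                  for (X, y), w in zip(sites, weights)]
        # models are exchanged; their vote acts as the round's hypothesis
        def h(X, models=models):
            return np.sign(sum(m.predict(X) for m in models))
        # local weighted errors; broadcasting their scalar sum gives
        # every site the global error
        err = sum(float(w[h(X) != y].sum()) for (X, y), w in zip(sites, weights))
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        # local weight updates; only the local weight sums are broadcast
        weights = [w * np.exp(-alpha * y * h(X)) for (X, y), w in zip(sites, weights)]
        z = sum(w.sum() for w in weights)
        weights = [w / z for w in weights]
        hypotheses.append(h)
        alphas.append(alpha)
    return lambda X: np.sign(sum(a * g(X) for a, g in zip(alphas, hypotheses)))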
Distributed Association Rule Mining
Agrawal and Shafer (1996) discuss three parallel algorithms for mining
association rules. One of those, the Count Distribution (CD) algorithm, focuses on
minimizing the communication cost, and is therefore suitable for mining association
rules in a distributed computing environment. CD uses the Apriori algorithm
(Agrawal and Srikant, 1994) locally at each data site. In each pass k of the algorithm,
each site generates the same candidate k-itemsets based on the globally frequent
itemsets of the previous phase. Then, each site calculates the local support counts of
the candidate itemsets and broadcasts them to the rest of the sites, so that global
support counts can be computed at each site. Subsequently, each site computes the frequent k-itemsets based on the global counts of the candidate itemsets. The communication complexity of CD in pass k is O(|Ck|·n^2), where Ck is the set of candidate k-itemsets and n is the number of sites. In addition, CD involves a synchronization step, as each site waits to receive the local support counts from every other site.
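A toy simulation of CD's counting passes follows. Summing the local counters stands in for the all-to-all broadcast of support counts, and candidate generation omits the subset-pruning step of the full Apriori algorithm.

from collections import Counter

def local_counts(transactions, candidates):
    # local support counts of the candidate itemsets at one site
    return Counter(c for t in transactions for c in candidates if c <= t)

def count_distribution(sites, min_count, max_k=3):
    # sites: list of sites, each a list of transactions (sets of items)
    candidates = {frozenset([i]) for site in sites for t in site for i in t}
    frequent, k = [], 1
    while candidates and k <= max_k:
        global_counts = Counter()
        for site in sites:                       # identical candidates everywhere
            global_counts += local_counts(site, candidates)
        freq_k = {c for c, n in global_counts.items() if n >= min_count}
        frequent.extend(freq_k)
        k += 1                                   # next pass: join step only
        candidates = {a | b for a in freq_k for b in freq_k if len(a | b) == k}
    return frequent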
Another algorithm that is based on Apriori is the Distributed Mining of
Association rules (DMA) algorithm (Cheung, Ng, Fu & Fu, 1996), which also appears as the Fast Distributed Mining of association rules (FDM) algorithm in (Cheung, Han, Ng, Fu & Fu, 1996). DMA generates a smaller number of candidate itemsets
than CD, by pruning at each site the itemsets that are not locally frequent. In addition,
it uses polling sites to optimize the exchange of support counts among sites, reducing
the communication complexity in pass k to O(|Ck|·n), where Ck is the set of candidate
k-itemsets and n is the number of sites. However, the performance enhancements of
DMA over CD are based on the assumption that the data distributions at the different
sites are skewed. When this assumption is violated, DMA actually introduces a larger
overhead than CD due to its higher complexity.
The Optimized Distributed Association rule Mining (ODAM) algorithm
(Ashrafi, Taniar & Smith, 2004) follows the paradigm of CD and DMA, but attempts
to minimize communication and synchronization costs in two ways. At the local
mining level, it proposes a technical extension to the Apriori algorithm. It reduces the size of transactions by: i) deleting the items that were not found frequent in the previous step, and ii) deleting duplicate transactions, but keeping track of them through a counter. It then attempts to fit the remaining transactions into main memory
in order to avoid disk access costs. At the communication level, it minimizes the total
message exchange by sending support counts of candidate itemsets to a single site,
called receiver. The receiver broadcasts the globally frequent itemsets back to the
distributed sites.
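The transaction-reduction idea can be sketched in a few lines; the counter keeps the full support information while each distinct reduced transaction is scanned only once.

from collections import Counter

def reduce_transactions(transactions, frequent_items):
    reduced = Counter()
    for t in transactions:
        # i) drop items that were not frequent in the previous step
        trimmed = frozenset(i for i in t if i in frequent_items)
        if trimmed:
            # ii) collapse duplicates, tracking them through a counter
            reduced[trimmed] += 1
    return reduced   # maps reduced transaction -> number of occurrences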
Distributed Clustering
Johnson and Kargupta (1999) present the Collective Hierarchical Clustering
(CHC) algorithm for clustering distributed heterogeneous data sets, which share a
common key attribute. CHC comprises three stages: i) local hierarchical clustering at
each site, ii) transmission of the local dendrograms to a facilitator site, and iii)
generation of a global dendrogram. CHC estimates a lower and an upper bound for
the distance between any two given data points, based on the information of the local
dendrograms. It then clusters the data points using a function on these bounds (e.g.
average) as a distance metric. The resulting global dendrogram is an approximation of
the dendrogram that would be produced if all data were gathered at a single site.
Samatova, Ostrouchov, Geist and Melechko (2002) present the RACHET algorithm for clustering distributed homogeneous data sets. RACHET applies a
hierarchical clustering algorithm locally at each site. For each cluster in the hierarchy
it maintains a set of descriptive statistics, which form a condensed summary of the
data points in the cluster. The local dendrograms along with the descriptive statistics
are transmitted to a merging site, which agglomerates them in order to construct the
final global dendrogram. Experimental results show that RACHET achieves good
quality of clustering compared to a centralized hierarchical clustering algorithm, with
minimal communication cost.
Januzaj, Kriegel and Pfeifle (2004) present the Density Based Distributed
Clustering (DBDC) algorithm. Initially, DBDC uses the DBSCAN clustering
algorithm locally at each distributed site. Subsequently, a small number of
representative points that accurately describe each local cluster are selected. Finally,
DBDC applies the DBSCAN algorithm on the representative points in order to
produce the global clustering model.
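A rough sketch of this two-level scheme, assuming scikit-learn's DBSCAN, is given below. Using each local cluster's mean as its single representative is a crude simplification of the paper's representative points.

import numpy as np
from sklearn.cluster import DBSCAN

def dbdc(local_datasets, eps=0.5, min_samples=5):
    reps = []
    for X in local_datasets:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        for c in set(labels) - {-1}:             # label -1 marks noise
            reps.append(X[labels == c].mean(axis=0))
    reps = np.vstack(reps)                       # assumes some cluster was found
    # global model: DBSCAN over the union of the local representatives
    return reps, DBSCAN(eps=2 * eps, min_samples=2).fit_predict(reps)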
Database Clustering
Real-world, physically distributed databases have an intrinsic data skewness
property. The data distributions at different sites are not identical. For example, data
related to a disease from hospitals around the world might have varying distributions
due to different nutrition habits, climate and quality of life. The same is true for
buying patterns identified in supermarkets at different regions of a country. Web document classifiers trained from the directories of different Web portals are another example.
Neglecting the above phenomenon may introduce problems in the resulting knowledge. If all databases are considered as a single logical entity, then the idiosyncrasies of the different sites will not be detected. On the other hand, if each database is mined separately, then knowledge that concerns more than one database might be lost. The solution that several researchers have followed is to cluster the databases themselves, identify groups of similar databases, and apply DDM methods on each group of databases.
Parthasarathy and Ogihara (2000) present an approach to clustering distributed databases based on association rules. The clustering method used is an extension of hierarchical agglomerative clustering that uses a measure of the similarity of the association rules at each database. McClean, Scotney, Greer and Páircéir (2001)
consider the clustering of heterogeneous databases that hold aggregate count data.
They experimented with the Euclidean metric and the Kullback-Leibler information
divergence for measuring the distance of aggregate data. Tsoumakas, Angelis and
Vlahavas (2003) consider the clustering of databases in distributed classification
tasks. They cluster the classification models that are produced at each site based on
the differences of their predictions on a validation data set. Experimental results show that combining the classifiers within each cluster leads to better performance than combining all classifiers to produce a global model or using an individual classifier at each site.
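The following sketch illustrates the underlying idea: classifiers are clustered hierarchically on their pairwise disagreement rates over a common validation set. The linkage method and the way disagreement is measured are illustrative choices, not the paper's exact procedure.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_classifiers(models, X_val, n_clusters):
    preds = np.array([m.predict(X_val) for m in models])
    n = len(models)
    # pairwise disagreement rates as a condensed distance vector
    condensed = np.array([np.mean(preds[i] != preds[j])
                          for i in range(n) for j in range(i + 1, n)])
    return fcluster(linkage(condensed, method="average"),
                    n_clusters, criterion="maxclust")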
FUTURE TRENDS
One trend that can be noticed in recent years is the implementation of DDM systems using emerging distributed computing paradigms, such as Web services, and the application of DDM algorithms in emerging distributed environments, such as mobile networks, sensor networks, grids and peer-to-peer networks.
Cannataro and Talia (2003) introduced a reference software architecture for knowledge discovery on top of computational grids, called the Knowledge Grid. Datta, Bhaduri, Giannella, Wolff and Kargupta (2006) present an overview of DDM applications and algorithms for P2P environments. McConnell and Skillicorn (2005)
present a distributed approach for prediction in sensor networks, while Davidson and
Ravi (2005) present a distributed approach for data pre-processing in sensor networks.
CONCLUSION
DDM enables learning over huge volumes of data that are situated at different
geographical locations. It supports several interesting applications, ranging from fraud
and intrusion detection, to market basket analysis over a wide area, to knowledge
discovery from remote sensing data around the globe.
As the network is increasingly becoming the computer, DDM algorithms and systems will continue to play an important role. New distributed applications will arise in the near future, and DDM will be challenged to provide robust analytics solutions for them.
REFERENCES
Agrawal, R. & Shafer J.C. (1996). Parallel Mining of Association Rules. IEEE
Transactions on Knowledge and Data Engineering, 8(6), 962-969.
Agrawal, R. & Srikant, R. (1994, September). Fast Algorithms for Mining Association
Rules. In Proceedings of the 20th International Conference on Very Large
Databases (VLDB'94), Santiago, Chile, 487-499.
Ashrafi, M. Z., Taniar, D. & Smith, K. (2004). ODAM: An Optimized Distributed
Association Rule Mining Algorithm. IEEE Distributed Systems Online, 5(3).
Cannataro, M. & Talia, D. (2003). The Knowledge Grid. Communications of the
ACM, 46(1), 89-93.
Chan, P. & Stolfo, S. (1993). Toward parallel and distributed learning by meta-
learning. In Proceedings of AAAI Workshop on Knowledge Discovery in
Databases, 227-240.
Cheung, D.W., Han, J., Ng, V., Fu, A.W. & Fu, Y. (1996, December). A Fast
Distributed Algorithm for Mining Association Rules. In Proceedings of the 4th
International Conference on Parallel and Distributed Information System
(PDIS-96), Miami Beach, Florida, USA, 31-42.
Cheung, D.W., Ng, V., Fu, A.W. & Fu, Y. (1996). Efficient Mining of Association
Rules in Distributed Databases. IEEE Transactions on Knowledge and Data
Engineering, 8(6), 911-922.
Cho, V. & Wüthrich, B. (2002). Distributed Mining of Classification Rules.
Knowledge and Information Systems, 4, 1-30.
Datta, S., Bhaduri, K., Giannella, C., Wolff, R. & Kargupta, H. (2006). Distributed
Data Mining in Peer-to-Peer Networks, IEEE Internet Computing 10(4), 18-
26.
Davidson, I. & Ravi, A. (2005). Distributed Pre-Processing of Data on Networks of
Berkeley Motes Using Non-Parametric EM. In Proceedings of 1st International
Workshop on Data Mining in Sensor Networks, 17-27.
Fan, W., Stolfo, S. & Zhang, J. (1999, August). The Application of AdaBoost for
Distributed, Scalable and On-Line Learning. In Proceedings of the 5th ACM
SIGKDD International Conference on Knowledge Discovery and Data
Mining, San Diego, California, USA, 362-366.
Fu, Y. (2001). Distributed Data Mining: An Overview. Newsletter of the IEEE
Technical Committee on Distributed Processing, Spring 2001, pp.5-9.
Guo, Y. & Sutiwaraphun, J. (1999). Probing Knowledge in Distributed Data Mining.
In Proceedings of the Third Pacific-Asia Conference on Methodologies for
Knowledge Discovery and Data Mining (PAKDD-99), 443-452.
Hall, L.O., Chawla, N., Bowyer, K. & Kegelmeyer, W.P. (2000). Learning Rules
from Distributed Data. In M. Zaki & C. Ho (Eds.), Large-Scale Parallel Data
Mining. (pp. 211-220). LNCS 1759, Springer.
Hall, L.O., Chawla, N. & Bowyer, K. (1998, July). Decision Tree Learning on Very
Large Data Sets. In Proceedings of the IEEE Conference on Systems, Man
and Cybernetics.
Hall, L.O., Chawla, N. & Bowyer, K. (1997). Combining Decision Trees Learned in
Parallel. In Proceedings of the Workshop on Distributed Data Mining of the
4th ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining.
Johnson, E.L. & Kargupta, H. (1999). Collective Hierarchical Clustering from
Distributed, Heterogeneous Data. In M. Zaki & C. Ho (Eds.), Large-Scale
Parallel Data Mining. (pp. 221-244). LNCS 1759, Springer.
Kargupta, H., Park, B.-H., Hershberger, D. & Johnson, E. (2000). Collective Data
Mining: A New Perspective Toward Distributed Data Mining. In H. Kargupta
& P. Chan (Eds.), Advances in Distributed and Parallel Knowledge Discovery.
(pp. 133-184). AAAI Press.
Lazarevic, A. & Obradovic, Z. (2001, August). The Distributed Boosting Algorithm.
In Proceedings of the 7th ACM SIGKDD international conference on
Knowledge discovery and data mining, San Francisco, California, USA, 311-
316.
McClean, S., Scotney, B., Greer, K. & Páircéir, R. (2001). Conceptual Clustering of
Heterogeneous Distributed Databases. In Proceedings of the PKDD’01
Workshop on Ubiquitous Data Mining.
McConnell, S. & Skillicorn, D. (2005). A Distributed Approach for Prediction in
Sensor Networks. In Proceedings of the 1st International Workshop on Data
Mining in Sensor Networks, 28-37.
Park, B. & Kargupta, H. (2003). Distributed Data Mining: Algorithms, Systems, and
Applications. In N. Ye (Ed.), The Handbook of Data Mining. (pp. 341-358).
Lawrence Erlbaum Associates.
Parthasarathy, S. & Ogihara, M. (2000). Clustering Distributed Homogeneous
Databases. In Proceedings of the 4th European Conference on Principles of
Data Mining and Knowledge Discovery (PKDD-00), Lyon, France, September
13-16, 566-574.
Samatova, N.F., Ostrouchov, G., Geist, A. & Melechko, A.V. (2002). RACHET: An
Efficient Cover-Based Merging of Clustering Hierarchies from Distributed
Datasets. Distributed and Parallel Databases 11, 157-180.
Schapire, R. & Singer, Y. (1999). Improved boosting algorithms using confidence-
rated predictions. Machine Learning 37(3), 297-336.
Tsoumakas, G., Angelis, L. & Vlahavas, I. (2003). Clustering Classifiers for
Knowledge Discovery from Physically Distributed Databases. Data &
Knowledge Engineering 49(3), 223-242.
Wolpert, D. (1992). Stacked Generalization. Neural Networks 5, 241-259.
Wüthrich, B. (1997). Discovering probabilistic decision rules. International Journal of Intelligent Systems in Accounting, Finance and Management 6, 269-277.
KEY TERMS AND THEIR DEFINITIONS
Data Skewness: The observation that the probability distributions of the same attributes in different distributed databases are often very different.
Distributed Data Mining (DDM): A research area that is concerned with the
development of efficient algorithms and systems for knowledge discovery in
distributed computing environments.
Global Mining: The combination of the local models and/or sufficient statistics in
order to produce the global model that corresponds to all distributed data.
Grid: A network of computer systems that share resources in order to provide a high
performance computing platform.
Homogeneous and Heterogeneous Databases: The schemata of Homogeneous
(Heterogeneous) databases contain the same (different) attributes.
Local Mining: The application of data mining algorithms to the local data of each distributed site.
Sensor Network: A network of spatially distributed devices that use sensors in order to monitor environmental conditions.