Database Systems Journal vol. III, no. 3/2012

Data mining in Cloud Computing

Ruxandra-Ștefania PETRE
Bucharest Academy of Economic Studies

This paper describes how data mining is used in cloud computing. Data mining is used for extracting potentially useful information from raw data. The integration of data mining techniques into normal day-to-day activities has become commonplace. Every day people are confronted with targeted advertising, and data mining techniques help businesses become more efficient by reducing costs.
Data mining techniques and applications are very much needed in the cloud computing paradigm. The implementation of data mining techniques through Cloud computing will allow users to retrieve meaningful information from a virtually integrated data warehouse, reducing the costs of infrastructure and storage.
Keywords: Cloud Computing, Data mining

1 Introduction
The Internet is becoming an increasingly vital tool in our everyday life, both professional and personal, as its users become more numerous. It is not surprising that business is increasingly conducted over the Internet.
Perhaps one of the most revolutionary concepts of recent years is Cloud Computing. The Cloud, as it is often referred to, involves using computing resources (hardware and software) that are delivered as a service over the Internet (shown as a cloud in most IT diagrams).
Instead of building their own IT infrastructure to host databases or software, many companies are choosing to have a third party host them on its large servers, so that the company has access to its data and software over the Internet.
The use of Cloud Computing is gaining popularity due to its mobility, high availability and low cost. On the other hand, it brings more threats to the security of the company's data and information.
To an equally significant extent, data mining techniques have evolved and become more widely used in recent years, and discovering knowledge in databases has become increasingly vital in various fields: business, medicine, science and engineering, spatial data etc.
The emerging Cloud Computing trend provides its users with the unique benefit of unprecedented access to valuable data that can be turned into valuable insight to help them achieve their business objectives.

2 Some aspects regarding Cloud Computing
Cloud computing represents both the software and the hardware delivered as services over the Internet.
Cloud Computing is a new concept that defines the use of computing as a utility and that has recently attracted significant attention.
Figure 1 below illustrates the computing paradigm shift of the last half century through six distinct phases: [1]
Phase 1: people used terminals to connect to powerful mainframes shared by many users.
Phase 2: stand-alone personal computers became powerful enough to satisfy users' daily work.
Phase 3: computer networks allowed multiple computers to connect to each other.
Phase 4: local networks could connect to other local networks to establish a more global network.
Phase 5: the electronic grid facilitated shared computing power and storage resources.
Phase 6: Cloud Computing allows the exploitation of all available resources on the Internet in a scalable and simple way.

Figure 1. Computing paradigm shift of the last half century [1]

As defined by the National Institute of Standards and Technology, Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
This cloud model is composed of five essential characteristics, three service models, and four deployment models. [2]
The essential characteristics of cloud computing are on-demand self-service, broad network access, resource pooling, rapid elasticity and measured service.
The service models that compose cloud computing are Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS).
The deployment models of cloud computing are private cloud, community cloud, public cloud and hybrid cloud.
Table 1 presents details on the top cloud computing companies and the key features of their products:

Table 1 Top Cloud Computing Companies and Key Features [3]
Cloud Name | Key Feature
Sun Microsystems Sun Cloud | More available applications than any other open OS.
IBM Dynamic | Integrated power management to help you plan, predict, monitor and actively manage power consumption of your BladeCenter.
Amazon EC2 | Designed to make web-scale computing easier for developers.
Google App Engine | No limit to the free trial period if you do not exceed the quota allotted.
Microsoft Azure | Currently offering a development accelerator discount plan: 15-30 % discount off consumption charges for the first 6 months.
AT&T Synaptic Hosting | Use fully on-demand infrastructure or combine it with dedicated components to meet specialized requirements.
GoGrid Cloud Computing | Free load balancing and free 24/7 support.
Salesforce | Offers cloud solutions for automation, customer service and platform, respectively. Transparency through real-time information on system performance and security at

Cloud computing represents all possible resources on the Internet, offering seemingly unlimited computing power.
As cloud computing becomes a more significant technology trend, it could reshape the IT sector and the IT marketplace.

3 Some aspects regarding Data mining
Data mining means finding useful patterns or trends in large amounts of data.
Data mining is defined as a type of database analysis that attempts to discover useful patterns or relationships in a group of data. The analysis uses advanced statistical methods, such as cluster analysis, and sometimes employs artificial intelligence or neural network techniques. A major goal of data mining is to discover previously unknown relationships among the data, especially when the data come from different databases. [4]
The most important data mining techniques and their descriptions are presented in Table 2 below:

Table 2 Data mining techniques [5]
Technique | Description
Clustering | Useful for exploring data and finding natural groupings. Members of a cluster are more like each other than they are like members of a different cluster. Common examples include finding new customer segments and life sciences discovery.
Classification | Most commonly used technique for predicting a specific outcome such as response / no-response, high / medium / low value customer, likely to buy / not buy.
Association | Finds rules associated with frequently co-occurring items, used for market basket analysis, cross-sell, root cause analysis. Useful for product bundling, in-store placement, and defect analysis.
Regression | Technique for predicting a continuous numerical outcome such as customer lifetime value, house value, process yield rates.
Attribute Importance | Ranks attributes according to the strength of their relationship with a target attribute. Use cases include finding the factors most associated with customers who respond to an offer, or the factors most associated with healthy patients.
Anomaly Detection | Identifies unusual or suspicious cases based on deviation from the norm. Common examples include health care fraud, expense report fraud, and tax compliance.
Feature Extraction | Produces new attributes as linear combinations of existing attributes. Applicable for text data, latent semantic analysis, data compression, data decomposition and projection, and pattern recognition.

Considering the varied data mining techniques and the great need for discovering patterns and trends in data that lead to knowledge that could not be obtained otherwise, it is no wonder that data mining is used in the most varied fields of activity.
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.
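The Clustering row of Table 2 can be illustrated with k-means, one common clustering algorithm. The paper does not name a specific algorithm, so the choice of k-means, the function name `kmeans` and the toy customer data below are assumptions made purely for illustration. This minimal sketch uses only the Python standard library (Python 3.8+ for `math.dist`) and groups hypothetical customers into two segments.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on 2-D points.

    Centroids start at the first k points: reproducible for a toy example,
    but real implementations usually use random or k-means++ seeding.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: every point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (annual spend in thousands, store visits per month).
customers = [(1.0, 2.0), (8.0, 8.0), (1.5, 1.8), (9.0, 9.5), (1.2, 2.2), (8.5, 9.0)]
labels, centroids = kmeans(customers, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The low-spend and high-spend customers end up in separate clusters, matching the "new customer segments" use case named in the table.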
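For the Association row of Table 2 (market basket analysis), association rules are usually scored by support and confidence. The sketch below is a deliberately simple brute-force pair counter, not the Apriori algorithm used by most association-rule tools; the function `pair_rules`, its default thresholds and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Mine one-item-to-one-item association rules (a -> b).

    support(a -> b)    = share of baskets containing both a and b
    confidence(a -> b) = share of baskets containing a that also contain b

    Brute-force pair counting, not the full Apriori algorithm: adequate for
    a toy basket list, and limited to a single item on each side of a rule.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = set(basket)  # ignore duplicate items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical market baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these four baskets, the rule butter -> bread gets confidence 1.00, because every basket containing butter also contains bread; this is the kind of co-occurrence that supports product bundling or in-store placement decisions.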

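The Anomaly Detection row of Table 2 describes flagging cases that deviate from the norm, such as expense report fraud. One simple way to measure that deviation, assumed here rather than taken from the paper, is the modified z-score based on the median absolute deviation (MAD), with the commonly used cut-off of 3.5. The function `flag_anomalies` and the expense amounts are hypothetical.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold.

    modified z = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation. Median and MAD are robust: a single extreme value
    barely moves them, unlike the mean and the standard deviation.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # at least half the values equal the median; score undefined
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical expense reports (amounts in EUR); the last one is suspicious.
expenses = [120, 95, 110, 105, 130, 100, 115, 980]
print(flag_anomalies(expenses))  # [7]
```

The median-based score is chosen deliberately: in a sample of eight values, a single outlier can never reach an ordinary z-score of 3 (the maximum is about 2.65), so a rule based on the mean and standard deviation would miss the suspicious report.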
The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. [6]
Businesses can predict how well a product will sell, or develop new advertising campaigns, by using the new relationships revealed by data mining algorithms.
The medical sector benefits from data mining techniques, and geographical data can also be analyzed better by using data mining.
Governments can discern illegal or embargoed activities carried out by individuals, associations or other governments by implementing data mining techniques.
In short, data mining has found uses in most fields of activity.

4 Data mining in Cloud Computing
Data mining techniques and applications are very much needed in the cloud computing paradigm.
As cloud computing penetrates more and more areas of business and scientific computing, it becomes an important area for data mining to focus on.
Cloud computing denotes the new trend in Internet services that rely on clouds of servers to handle tasks. Data mining in cloud computing is the process of extracting structured information from unstructured or semi-structured web data sources.
Data mining in Cloud Computing allows organizations to centralize the management of software and data storage, with assurance of efficient, reliable and secure services for their users. [6]
As Cloud computing refers to software and hardware delivered as services over the Internet, data mining software is also provided this way in Cloud computing.
The main effects of data mining tools being delivered by the Cloud are:
- the customer only pays for the data mining tools that he needs, which reduces his costs since he does not have to pay for complex data mining suites that he would not use exhaustively;
- the customer does not have to maintain a hardware infrastructure, as he can apply data mining through a browser; this means that he only has to pay the costs generated by using Cloud computing.
Using data mining through Cloud computing reduces the barriers that keep small companies from benefiting from data mining instruments.
The implementation of data mining techniques through Cloud computing will allow users to retrieve meaningful information from a virtually integrated data warehouse, reducing the costs of infrastructure and storage.

5 Conclusions
Data mining technologies provided through Cloud computing are absolutely necessary for today's businesses to make proactive, knowledge-driven decisions, as they help them predict future trends and behaviors.
This paper provides an overview of the necessity and utility of data mining in cloud computing. As the need for data mining tools grows every day, the ability to integrate them in cloud computing becomes more and more pressing.

References
[1] Jeffrey Voas and Jia Zhang, Cloud Computing: New Wine or Just a New

Ruxandra-Ștefania PETRE graduated from the Faculty of Cybernetics, Statistics and Economic Informatics of the Academy of Economic Studies in 2010. She graduated from the Business Support Databases Master programme of the Academy of Economic Studies in 2012. Since November 2011 she has been a Junior System Architect at LOXON Solutions, where she develops and implements Business Intelligence and Data Warehousing solutions for the banking system.

