Data Analytics - Unit - 1


UNIT 1
Introduction to Data Analytics

CONTENTS
Part-1 : Introduction to Data Analytics: Sources and Nature of Data, Classification of Data (Structured, Semi-Structured, Unstructured), Characteristics of Data
Part-2 : Introduction to Big Data Platform, Need of Data Analytics
Part-3 : Evolution of Analytic Scalability, Analytic Process and Tools, Analysis Vs Reporting, Modern Data Analytic Tools, Applications of Data Analysis
Part-4 : Data Analytics Lifecycle: Need, Key Roles for Successful Analytic Projects, Various Phases of Data Analytic Life Cycle: Discovery, Data Preparations
Part-5 : Model Planning, Model Building, Communicating Results, Operationalization

PART-1
Introduction to Data Analytics: Sources and Nature of Data, Classification of Data (Structured, Semi-Structured, Unstructured), Characteristics of Data.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 1.1. What is data analytics?
Answer
1. Data analytics is the science of analyzing raw data in order to draw conclusions about that information.
2. Any type of information can be subjected to data analytics techniques to get insights that can be used to improve things.
3. Data analytics techniques can help in finding the trends and metrics that can be used to optimize processes and increase the overall efficiency of a business or system.
4. Many of the techniques and processes of data analytics have been automated into mechanical processes and algorithms that work over raw data for human consumption.
5. For example, manufacturing companies often record the runtime, downtime, and work queue for various machines and then analyze the data to better plan the workloads so that the machines operate closer to peak capacity.

Que 1.2. Explain the sources of data (or Big Data).
Answer
Three primary sources of Big Data are:
1. Social data:
a. Social data comes from the likes, tweets and retweets, comments, video uploads, and general media that are uploaded and shared via social media platforms.
b. This kind of data provides invaluable insights into consumer behaviour and sentiment and can be enormously influential in marketing analytics.
c. The public web is another good source of social data, and tools like Google Trends can be used to good effect to increase the volume of big data.
2. Machine data:
a. Machine data is defined as information which is generated by industrial equipment, sensors that are installed in machinery, and even web logs which track user behaviour.
b. This type of data is expected to grow exponentially as the Internet of Things grows ever more pervasive and expands around the world.
c. Sensors such as medical devices, smart meters, road cameras, satellites, games and the rapidly growing Internet of Things will deliver high velocity, value, volume and variety of data in the very near future.
3. Transactional data:
a. Transactional data is generated from all the daily transactions that take place both online and offline.
b. Invoices, payment orders, storage records and delivery receipts are characterized as transactional data.

Que 1.3. Write short notes on classification of data.
Answer
1. Unstructured data:
a. Unstructured data is the rawest form of data.
b. It is data that has no inherent structure, which may include text documents, PDFs, images, and video.
c. This data is often stored in a repository of files.
2. Structured data:
a. Structured data is tabular data (rows and columns) which is very well defined.
b. It is data containing a defined data type, format, and structure, which may include transaction data, traditional RDBMS, CSV files, and simple spreadsheets.
3. Semi-structured data:
a. Semi-structured data consists of textual data files with a distinct pattern that enables parsing, such as Extensible Markup Language (XML) data files or JSON.
b. A consistent format is defined; however, the structure is not very strict.
c. Semi-structured data are often stored as files.

Que 1.4. Differentiate between structured, semi-structured and unstructured data.
Answer
Properties | Structured data | Semi-structured data | Unstructured data
Technology | It is based on relational database tables. | It is based on XML/RDF. | It is based on character and binary data.
Transaction management | Matured transaction management and various concurrency techniques. | Transaction management is adapted from DBMS. | No transaction management and no concurrency.
Flexibility | It is schema dependent and less flexible. | It is more flexible than structured data but less flexible than unstructured data. | It is very flexible and there is an absence of schema.
Scalability | It is very difficult to scale the database schema. | It is more scalable than structured data. | It is very scalable.
Query performance | Structured queries allow complex joining. | Queries over anonymous nodes are possible. | Only textual queries are possible.

Que 1.5. Explain the characteristics of Big Data.
Answer
Big Data is characterized by four dimensions:
1. Volume:
a. Volume is concerned with the scale of data, i.e., the volume at which the data is growing.
b. The volume of data is growing rapidly due to several applications of business, social, web and scientific explorations.
2. Velocity:
a. Velocity is the speed at which data is increasing, which demands analysis of streaming data.
b. The velocity is due to the growing speed of business intelligence applications such as trading, transactions in the telecom and banking domains, the growing number of internet connections with the increased usage of the internet, etc.
3. Variety: It depicts the different forms of data available for analysis, such as structured, semi-structured and unstructured.
4. Veracity:
a. Veracity is concerned with the uncertainty or inaccuracy of the data.
b. In many cases the data will be inaccurate; hence filtering and selecting the data which is actually needed is a complicated task.
c. A lot of statistical and analytical processing has to go into data cleansing to choose intrinsic data for decision making.

PART-2
Introduction to Big Data Platform, Need of Data Analytics.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 1.6. Write a short note on big data platform.
Answer
1. A big data platform is a type of IT solution that combines the features and capabilities of several big data applications and utilities within a single solution.
2. It is an enterprise class IT platform that enables an organization to develop, deploy, operate and manage a big data infrastructure/environment.
3. A big data platform generally consists of big data storage, servers, databases, big data management, business intelligence and other big data management utilities.
4. It also supports custom development, querying and integration with other systems.
5. The primary benefit of a big data platform is to reduce the complexity of multiple vendors/solutions into one cohesive solution.
6. Big data platforms are also delivered through the cloud, where the provider offers all-inclusive big data solutions and services.

Que 1.7. What are the features of big data platform?
Answer
Features of a Big Data analytics platform:
1. A Big Data platform should be able to accommodate new platforms and tools based on the business requirement.
2. It should support linear scale-out.
3. It should have the capability for rapid deployment.
4. It should support a variety of data formats.
5. The platform should provide data analysis and reporting tools.
6. It should provide real-time data analysis software.
7. It should have tools for searching the data through large data sets.

Que 1.8. Why is there a need for data analytics?
Answer
Need of data analytics:
1. It optimizes business performance.
2. It helps to make better decisions.
3. It helps to analyze customer trends and solutions.

PART-3
Evolution of Analytic Scalability, Analytic Process and Tools, Analysis Vs Reporting, Modern Data Analytic Tools, Applications of Data Analysis.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 1.9. What are the steps involved in data analysis?
Answer
Steps involved in data analysis are:
1. Determine the data:
a. The first step is to determine the data requirements or how the data is grouped.
b. Data may be separated by age, demographic, income, or gender.
c. Data values may be numerical or be divided by category.
2. Collection of data:
a. The second step in data analytics is the process of collecting the data.
b. This can be done through a variety of sources such as computers, online sources, cameras, environmental sources, or through personnel.
3. Organization of data:
a. The third step is to organize the data.
b. Once the data is collected, it must be organized so it can be analyzed.
c. Organization may take place on a spreadsheet or other form of software that can take statistical data.
4. Cleaning of data:
a. In the fourth step, the data is cleaned up before analysis.
b. This means it is scrubbed and checked to ensure there is no duplication or error, and that it is not incomplete.
c. This step helps correct any errors before the data goes on to a data analyst to be analyzed.

Que 1.10. Write a short note on the evolution of analytic scalability.
Answer
1. In analytic scalability, we have to pull the data together in a separate analytics environment and then start performing analysis.
[Figure: Database 1, Database 2, Database 3 and Database 4 feed an analytic server or PC; the heavy processing occurs in the analytic environment.]
2. Analysts do the merge operation on the data sets, which contain rows and columns.
3. The columns represent information about the customers such as name, spending level, or status.
4. In a merge or join, two or more data sets are combined together. They are typically merged/joined so that specific rows of one data set or table are combined with specific rows of another.
5. Analysts also do data preparation. Data preparation is made up of joins, aggregations, derivations, and transformations. In this process, they pull data from various sources and merge it all together to create the variables required for an analysis.
6. A Massively Parallel Processing (MPP) system is the most mature, proven, and widely deployed mechanism for storing and analyzing large amounts of data.
7. An MPP database breaks the data into independent pieces managed by independent storage and central processing unit (CPU) resources.
[Fig. 1.10.1. Massively Parallel Processing system data storage: a traditional database will query a one-terabyte table one row at a time, whereas an MPP system splits the table into ten 100-GB chunks and runs 10 simultaneous 100-GB queries.]
8. MPP systems build in redundancy to make recovery easy.
9. MPP systems have resource management tools:
a. Manage the CPU and disk space.
b. Query optimizer.

Que 1.11. Write short notes on the evolution of analytic process.
Answer
1. With an increased level of scalability, the analytic processes need to be updated to take advantage of it.
2. This can be achieved with the use of analytical sandboxes, which provide analytic professionals with a scalable environment to build advanced analytics processes.
3. One of the uses of an MPP database system is to facilitate the building and deployment of advanced analytic processes.
4. An analytic sandbox is the mechanism to utilize an enterprise data warehouse.
5. If used appropriately, an analytic sandbox can be one of the primary drivers of value in the world of big data.
Analytical sandbox:
1. An analytic sandbox provides a set of resources with which in-depth analysis can be done to answer critical business questions.
2. An analytic sandbox is ideal for data exploration, development of analytical processes, proof of concepts, and prototyping.
3. Once things progress into ongoing, user-managed processes or production processes, the sandbox should not be involved.
4. A sandbox is going to be leveraged by a fairly small set of users.
5. There will be data created within the sandbox that is segregated from the production database.
6. Sandbox users will also be allowed to load data of their own for brief time periods as part of a project, even if that data is not part of the official enterprise data model.

Que 1.12. Explain modern data analytic tools.
Answer
Modern data analytic tools:
1. Apache Hadoop:
a. Apache Hadoop is a big data analytics tool; it is a Java-based free software framework.
b. It helps in effective storage of huge amounts of data in a storage place known as a cluster.
c. It runs in parallel on a cluster and also has the ability to process huge data across all nodes in it.
d. Hadoop has a storage system popularly known as the Hadoop Distributed File System (HDFS), which splits a large volume of data and distributes it across the many nodes present in a cluster.
2. KNIME:
a. KNIME analytics platform is one of the leading open solutions for data-driven innovation.
b. This tool helps in discovering the potential hidden in a huge volume of data; it also mines for fresh insights and helps forecast future trends.
3. OpenRefine:
a. OpenRefine is one of the efficient tools for working on messy and large volumes of data.
b. It includes cleansing data and transforming that data from one format to another.
c. It helps to explore large data sets easily.
4. Orange:
a. Orange is a famous open-source data visualization tool that helps in data analysis for beginners as well as experts.
b. This tool provides interactive workflows with a large toolbox of options for building them, which helps in analysing and visualizing data.
5. RapidMiner:
a. RapidMiner operates using visual programming, and it is also very capable of manipulating, analyzing and modeling data.
b. RapidMiner makes the work of data science teams easier and more productive by providing an open-source platform for all their jobs, like machine learning, data preparation, and model deployment.
6. R-programming:
a. R is a free, open-source programming language and software environment for statistical computing and graphics.
b. It is used by data miners for developing statistical software and for data analysis.
c. It has become a highly popular tool for big data in recent years.
7. Datawrapper:
a. It is an online data visualization tool for making interactive charts.
b. It uses data files in CSV, PDF or Excel format.
c. Datawrapper generates visualizations in the form of bar, line, map etc. They can be embedded into any other website as well.
8. Tableau:
a. Tableau is another popular big data tool. It is simple and very intuitive to use.
b. It communicates the insights of the data through data visualization.
c. Through Tableau, an analyst can check a hypothesis and explore the data before starting to work on it extensively.

Que 1.13. What are the benefits of analytic sandbox from the view of an analytic professional?
Answer
Benefits of analytic sandbox from the view of an analytic professional:
1. Independence: Analytic professionals will be able to work independently on the database system without needing to continually go back and ask for permissions for specific projects.
2. Flexibility: Analytic professionals will have the flexibility to use whatever business intelligence, statistical analysis, or visualization tools they need.
3. Efficiency: Analytic professionals will be able to leverage the existing enterprise data warehouse or data mart, without having to move or migrate data.
4. Freedom: Analytic professionals can reduce their focus on the administration of systems and production processes by shifting those maintenance tasks to IT.
5. Speed: Massive speed improvement will be realized with the move to parallel processing. This also enables rapid iteration and the ability to "fail fast" and take more risks to innovate.

Que 1.14. What are the benefits of analytic sandbox from the view of IT?
Answer
Benefits of analytic sandbox from the view of IT:
1. Centralization: IT will be able to centrally manage a sandbox environment just as every other database environment on the system is managed.
2. Streamlining: A sandbox will greatly simplify the promotion of analytic processes into production, since there will be a consistent platform for both development and deployment.
3. Simplicity: There will be no more processes built during development that have to be totally rewritten to run in the production environment.
4. Control: IT will be able to control the sandbox environment, balancing sandbox needs and the needs of other users. The production environment is safe from an experiment gone wrong in the sandbox.
5. Costs: Big cost savings can be realized by consolidating many analytic data marts into one central system.

Que 1.15. Explain the applications of data analytics.
Answer
Applications of data analytics:
1. Security: Data analytics applications or, more specifically, predictive analysis has also helped in dropping crime rates in certain areas.
2. Transportation:
a. Data analytics can be used to revolutionize transportation.
b. It can be used especially in areas where we need to transport a large number of people to a specific area and require seamless transportation.
3. Risk detection:
a. Many organizations were struggling under debt, and they wanted a solution to the problem of fraud.
b. They already had enough customer data in their hands, and so they applied data analytics.
c. They used a 'divide and conquer' policy with the data, analyzing recent expenditure, profiles, and any other important information to understand the probability of a customer defaulting.
4. Delivery:
a. Several top logistic companies are using data analysis to examine collected data and improve their overall efficiency.
b. Using data analytics applications, the companies were able to find the best shipping routes and delivery times, as well as the most cost-efficient means of transport.
5. Fast internet allocation:
a. While it might seem that allocating fast internet in every area makes a city 'Smart', in reality it is more important to engage in smart allocation. Smart allocation means understanding how bandwidth is being used in specific areas and for the right cause.
b. It is also important to shift the data allocation based on timing and priority. It is assumed that financial and commercial areas require the most bandwidth during weekdays, while residential areas require it during the weekends. But the situation is much more complex, and data analytics can help solve it.
c. For example, using applications of data analysis, a community can draw the attention of high-tech industries, and in such cases higher bandwidth will be required in those areas.
6. Internet searching:
a. When we use Google, we are using one of the many data analytics applications employed by the company.
b. Most search engines like Google, Bing, Yahoo, AOL etc. use data analytics. These search engines use different algorithms to deliver the best result for a search query.
7. Digital advertisement:
a. Data analytics has revolutionized digital advertising.
b. Digital billboards in cities as well as banners on websites, that is, most of the advertisement sources nowadays, use data analytics through data algorithms.

Que 1.16. What are the different types of Big Data analytics?
Answer
Different types of Big Data analytics are:
1. Descriptive analytics:
a. It uses data aggregation and data mining to provide insight into the past.
b. Descriptive analytics describes or summarizes raw data and makes it interpretable by humans.
2. Diagnostic analytics:
a. It is a form of advanced analytics which examines data or content to answer the question "Why did it happen?"
b. It helps an analyst look deeper into an issue to find its root causes.
c. It is characterized by techniques such as drill-down, data discovery, data mining and correlations.
3. Predictive analytics:
a. It uses statistical models and forecasting techniques to understand the future.
b. It provides companies with actionable insights based on data.
c. It provides estimates about the likelihood of a future outcome.
4. Prescriptive analytics:
a. It uses optimization and simulation algorithms to advise on possible outcomes.
b. It allows users to "prescribe" a number of different possible actions and guides them towards a solution.
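A minimal Python sketch can make the contrast between descriptive and predictive analytics concrete. It assumes a small, made-up series of monthly sales. The first function only summarizes what has already happened. The second fits a straight-line trend by ordinary least squares and extrapolates one period ahead. The data values and the names describe() and linear_forecast() are hypothetical illustrations, not part of any particular tool. Real predictive work would use a proper statistical library and would validate the model.

```python
from statistics import mean

# Hypothetical monthly sales figures (made-up data for illustration).
sales = [120.0, 135.0, 150.0, 165.0, 180.0]


def describe(values):
    """Descriptive analytics: summarize what has already happened."""
    return {
        "count": len(values),
        "total": sum(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


def linear_forecast(values, steps_ahead=1):
    """Predictive analytics (simple sketch): fit y = a + b*x by ordinary
    least squares over x = 0..n-1, then extrapolate steps_ahead periods."""
    n = len(values)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * (n - 1 + steps_ahead)


print(describe(sales))         # {'count': 5, 'total': 750.0, 'mean': 150.0, 'min': 120.0, 'max': 180.0}
print(linear_forecast(sales))  # 195.0
```

Here the series rises by exactly 15 per month, so the fitted slope is 15 and the one-step forecast is 195. With real, noisy data the forecast is only an estimate, which is why predictive analytics speaks of likelihoods.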

PART-4
Data Analytics Life Cycle: Need, Key Roles for Successful Analytic Projects, Various Phases of Data Analytic Life Cycle: Discovery, Data Preparations.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 1.17. Explain the key roles for a successful analytics project.
Answer
Key roles for a successful analytics project are:
1. Business user:
a. A business user is someone who understands the domain area and usually benefits from the results.
b. This person can consult and advise the project team on the context of the project, the value of the results, and how the outputs will be operationalized.
c. Usually a business analyst, line manager, or deep subject matter expert in the project domain fulfills this role.
2. Project sponsor:
a. The project sponsor is responsible for the start of the project.
b. The sponsor provides the impetus and requirements for the project and defines the core business problem.
c. The sponsor generally provides the funding and gauges the degree of value from the final outputs of the working team.
3. Project manager: The project manager ensures that key milestones and objectives are met on time and at the expected quality.
4. Business Intelligence Analyst:
a. The Business Intelligence Analyst provides business domain expertise based on a deep understanding of the data, key performance indicators (KPIs), key metrics, and business intelligence from a reporting perspective.
b. Business Intelligence Analysts generally create dashboards and reports and have knowledge of the data feeds and sources.
5. Database Administrator (DBA):
a. The DBA provisions and configures the database environment to support the analytics needs of the working team.
b. These responsibilities may include providing access to key databases or tables and ensuring that the appropriate security levels are in place for the data repositories.
6. Data engineer: The data engineer leverages deep technical skills to assist with tuning SQL queries for data management and data extraction, and provides support for data ingestion into the analytic sandbox.
7. Data scientist:
a. The data scientist provides subject matter expertise for analytical techniques, data modeling, and applying valid analytical techniques to given business problems.
b. They ensure that the overall analytics objectives are met.
c. They design and execute analytical methods and approaches with the data available to the project.

Que 1.18. Explain the various phases of the data analytics life cycle.
Answer
The various phases of the data analytics life cycle are:
Phase 1: Discovery:
1. In Phase 1, the team learns the business domain, including relevant history, such as whether the organization or business unit has attempted similar projects in the past from which they can learn.
2. The team assesses the resources available to support the project in terms of people, technology, time, and data.
3. Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases, and formulating initial hypotheses (IHs) to test and to begin learning the data.
Phase 2: Data preparation:
1. Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project.
2. The team needs to execute extract, load, and transform (ELT) or extract, transform and load (ETL) to get data into the sandbox.
3. Data should be transformed in the ETL process so that the team can work with it and analyze it.
4. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data.
Phase 3: Model planning:
1. In Phase 3, the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase.
2. The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models.
Phase 4: Model building:
1. In Phase 4, the team develops datasets for testing, training, and production purposes.
2. In addition, in this phase the team builds and executes models based on the work done in the model planning phase.
3. The team also considers whether its existing tools will be adequate for running the models, or whether it will need a more robust environment for executing models and workflows.
Phase 5: Communicate results:
1. In Phase 5, the team, in collaboration with major stakeholders, determines whether the results of the project are a success or a failure based on the criteria developed in Phase 1.
2. The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey the findings to stakeholders.
Phase 6: Operationalize:
1. In Phase 6, the team delivers final reports, briefings, code, and technical documents.
2. In addition, the team may run a pilot project to implement the models in a production environment.

Que 1.19. What are the activities that should be performed while identifying potential data sources during the discovery phase?
Answer
The main activities performed while identifying potential data sources during the discovery phase are:
1. Identify data sources:
a. Make a list of candidate data sources the team may need to test the initial hypotheses outlined in this phase.
b. Make an inventory of the datasets currently available and of those that can be purchased or otherwise acquired for the tests the team wants to perform.
2. Capture aggregate data sources:
a. This is for previewing the data and providing a high-level understanding.
b. It enables the team to gain a quick overview of the data and to perform further exploration on specific areas.
3. Review the raw data:
a. Obtain preliminary data from initial data feeds.
b. Begin understanding the interdependencies among the data attributes, and become familiar with the content of the data, its quality, and its limitations.
4. Evaluate the data structures and tools needed: The data type and structure dictate which tools the team can use to analyze the data.
5. Scope the sort of data infrastructure needed for this type of problem: In addition to the tools needed, the data influences the kind of infrastructure that is required, such as disk storage and network capacity.
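Reviewing the raw data (activity 3 above) usually starts with a quick profile of a preliminary extract. The sketch below assumes a small, hypothetical CSV feed held in a string. It uses only Python's standard csv module to count, for each attribute, the missing values and the distinct values. These counts give a first view of the data's quality and limitations. The feed contents and the profile() helper are illustrative assumptions.

```python
import csv
import io

# Hypothetical preliminary extract from an initial data feed (made-up rows).
RAW_FEED = """customer_id,age,city,monthly_spend
101,34,Delhi,2500
102,,Pune,1800
103,45,Delhi,
104,29,,3200
"""


def profile(rows, columns):
    """For each attribute, count missing (empty) values and distinct
    non-missing values: a first look at the data's quality."""
    report = {}
    for column in columns:
        values = [row[column] for row in rows]
        present = [v for v in values if v is not None and v.strip() != ""]
        report[column] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


reader = csv.DictReader(io.StringIO(RAW_FEED))
rows = list(reader)
print(len(rows), "records")
for column, stats in profile(rows, reader.fieldnames).items():
    print(column, stats)
```

On this sample the profile reports one missing value each for age, city and monthly_spend, and none for customer_id. Gaps like these are exactly what the team must understand before deciding how to condition the data.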

Que 1.20. Explain the sub-phases of data preparation.
Answer
The sub-phases of data preparation are:
1. Preparing an analytic sandbox:
a. The first sub-phase of data preparation requires the team to obtain an analytic sandbox in which the team can explore the data without interfering with live production databases.
b. When developing the analytic sandbox, it is a best practice to collect all kinds of data there, as team members need access to high volumes and varieties of data for a Big Data analytics project.
c. This can include everything from summary-level aggregated data and structured data to raw data feeds and unstructured text data from call logs or web logs.
2. Performing ETLT:
a. In ETL, users perform extract, transform, load processes to extract data from a datastore, perform data transformations, and load the data back into the datastore.
b. The analytic sandbox approach instead favours extract, load, and then transform (ELT). In this case, the data is extracted in its raw form and loaded into the datastore, where analysts can choose to transform the data into a new state or leave it in its original, raw condition.
3. Learning about the data:
a. A critical aspect of a data science project is to become familiar with the data itself.
b. Spending time to learn the nuances of the datasets provides context to understand what constitutes a reasonable value and expected output.
c. In addition, it is important to catalogue the data sources that the team has access to and to identify additional data sources that the team can leverage.
4. Data conditioning:
a. Data conditioning refers to the process of cleaning data, normalizing datasets, and performing transformations on the data.
b. Data conditioning can involve many complex steps to join or merge datasets or otherwise get datasets into a state that enables analysis in further phases.
c. Data conditioning is often viewed as a preprocessing step for data analysis.
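The sandbox's extract-load-then-transform approach, followed by data conditioning, can be sketched with Python's built-in sqlite3 module. In this sketch an in-memory SQLite database is an assumed stand-in for the analytic sandbox, and the table names and rows are hypothetical. Raw records are loaded exactly as received. A cleaned, normalized and joined table is then derived inside the datastore, while the raw tables stay in their original condition.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the analytic sandbox.
con = sqlite3.connect(":memory:")

# Extract + Load: hypothetical raw records land in the sandbox exactly as received.
con.execute("CREATE TABLE raw_orders (order_id INTEGER, customer_id INTEGER, amount TEXT)")
con.execute("CREATE TABLE raw_customers (customer_id INTEGER, region TEXT)")
con.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 101, " 250.0"), (2, 102, "400"), (2, 102, "400"), (3, 103, None)],
)
con.executemany(
    "INSERT INTO raw_customers VALUES (?, ?)",
    [(101, "north"), (102, "SOUTH"), (103, "North")],
)

# Transform (data conditioning) inside the sandbox: derive a new table and
# leave the raw tables in their original condition.
con.execute(
    """
    CREATE TABLE orders_clean AS
    SELECT DISTINCT o.order_id,
           o.customer_id,
           CAST(TRIM(o.amount) AS REAL) AS amount,  -- fix type, strip stray spaces
           LOWER(c.region) AS region                 -- normalize category labels
    FROM raw_orders AS o
    JOIN raw_customers AS c ON c.customer_id = o.customer_id
    WHERE o.amount IS NOT NULL                        -- drop incomplete records
    """
)

summary = con.execute(
    "SELECT region, SUM(amount) FROM orders_clean GROUP BY region ORDER BY region"
).fetchall()
print(summary)
```

The raw tables still hold all four original order rows, including the duplicate and the incomplete record. A classic ETL pipeline would instead have cleaned the data before loading it, and the raw form would be lost.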

PART-5
Model Planning, Model Building, Communicating Results, Operationalization.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 1.21. What are the activities that are performed in the model planning phase?
Answer
The activities performed in the model planning phase are:
1. Assess the structure of the datasets:
a. The structure of the datasets is one factor that dictates the tools and analytical techniques for the next phase.
b. Different tools and approaches are required depending on whether the team plans to analyze textual data or transactional data.
2. Ensure that the analytical techniques enable the team to meet the business objectives and to accept or reject the working hypotheses.
3. Determine whether the situation warrants a single model or a series of techniques as part of a larger analytic workflow.
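During model planning the team explores the relationships between variables before choosing key variables (Phase 3 in Que 1.18). One simple screening step ranks candidate input variables by the absolute value of their Pearson correlation with the target. This step is an assumption here, not something the text prescribes. The variable names and values below are made up. Correlation captures only linear relationships, so it is a starting point for exploration, not a selection rule.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical target and candidate input variables (made-up values).
target = [10, 12, 15, 19, 24]  # e.g. monthly sales
candidates = {
    "ad_spend": [1, 2, 3, 4, 5],
    "store_age": [7, 3, 9, 1, 5],
    "discount_pct": [5, 4, 4, 2, 1],
}

# Rank candidates by strength of linear relationship, ignoring its sign.
scores = {name: pearson(values, target) for name, values in candidates.items()}
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
for name in ranked:
    print(f"{name:<13} r = {scores[name]:+.2f}")
```

A strongly negative correlation (here discount_pct) can be as informative as a strongly positive one, which is why the ranking uses the absolute value.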

Que 1.22. What are the common tools for the model planning phase?
Answer
Common tools for the model planning phase:
1. R:
a. It has a complete set of modeling capabilities and provides a good environment for building interpretive models with high-quality code.
b. It has the ability to interface with databases via an ODBC connection and to execute statistical tests and analyses against Big Data via an open-source connection.
2. SQL Analysis services: SQL Analysis services can perform in-database analytics of common data mining functions, involved aggregations, and basic predictive models.
3. SAS/ACCESS:
a. SAS/ACCESS provides integration between SAS and the analytics sandbox via multiple data connectors such as ODBC, JDBC, and OLE DB.
b. SAS itself is generally used on file extracts, but with SAS/ACCESS, users can connect to relational databases (such as Oracle) and data warehouse appliances, files, and enterprise applications.

Que 1.23. Explain the common commercial tools for the model building phase.
Answer
Common commercial tools for the model building phase are:
1. SAS Enterprise Miner:
a. SAS Enterprise Miner allows users to run predictive and descriptive models based on large volumes of data from across the enterprise.
b. It interoperates with other large data stores, has many partnerships, and is built for enterprise-level computing and analytics.
2. SPSS Modeler provided by IBM: It offers methods to explore and analyze data through a GUI.
3. Matlab: Matlab provides a high-level language for performing a variety of data analytics, algorithms, and data exploration.
4. Alpine Miner: Alpine Miner provides a GUI frontend for users to develop analytic workflows and interact with Big Data tools and platforms on the backend.
5. STATISTICA and Mathematica are also popular and well-regarded data mining and analytics tools.

Que 1.24. Explain the common open-source tools for the model building phase.
Answer
Free or open-source tools are:
1. R and PL/R:
a. R provides a good environment for building interpretive models, and PL/R is a procedural language for PostgreSQL with R.
b. Using this approach means that R commands can be executed in-database.
c. This technique provides higher performance and is more scalable than running R in memory.
2. Octave:
a. It is a free software programming language for computational modeling, and it has some of the functionality of Matlab.
b. Octave is used in major universities when teaching machine learning.
3. WEKA: WEKA is a free data mining software package with an analytic workbench. The functions created in WEKA can be executed within Java code.
4. Python: Python is a programming language that provides toolkits for machine learning and analysis, such as numpy, scipy and pandas, and for related data visualization using matplotlib.
5. MADlib: SQL in-database implementations, such as MADlib, provide an alternative to in-memory desktop analytical tools. MADlib provides an open-source machine learning library of algorithms that can be executed in-database, for PostgreSQL.
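Phase 4 (model building) develops separate datasets for training and testing. The sketch below uses plain Python so that it runs without extra packages. The toolkits named above (numpy, scipy, pandas) offer ready-made equivalents. The sketch assumes hypothetical (x, y) pairs and holds out a quarter of them for testing. It fits a straight line by least squares on the training set only and reports the mean absolute error on the held-out test set. The function names are illustrative, not taken from any particular library.

```python
import random
from statistics import mean


def train_test_split(records, test_fraction=0.25, seed=42):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(pairs):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - b * x_bar, b


def mean_absolute_error(model, pairs):
    """Average absolute gap between observed y and the model's prediction."""
    a, b = model
    return mean(abs(y - (a + b * x)) for x, y in pairs)


# Hypothetical data: y is roughly 3 + 2x with a small alternating error.
data = [(x, 3 + 2 * x + (0.5 if x % 2 else -0.5)) for x in range(20)]
train, test = train_test_split(data)
model = fit_line(train)  # fitted on the training set only
print("intercept=%.2f slope=%.2f" % model)
print("test MAE=%.2f" % mean_absolute_error(model, test))
```

The model is fitted on the training set and its error is measured only on the held-out test set. This gives a fairer estimate of how it will behave on new data than measuring error on the data it was fitted to.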
