BDA MODULE-2 (Unit-2)


UNIT-II

Analytical Approaches
• The ways adopted to sort, analyze, or solve problems are known as analytical approaches.
• Many analytical approaches have been in use over the years. However, as the size of the data to be analyzed grew, newer analytical approaches were adopted.
The commonly used analytical approaches are as follows:
i) Ensemble methods
ii) Text data analytics approach
i) ENSEMBLE METHODS:
• It refers to a process or method of generating multiple models and combining them to solve a specific problem.
• The main aim of using ensemble methods is to minimize the probability of selecting a poor model and also to improve the performance of a model.

• Ensemble models are based on the theory of the wisdom of crowds.
• According to James Surowiecki, The Wisdom of Crowds explains the principle of group thinking, based on the concept that the masses are better problem solvers, forecasters, and decision makers than any one individual.

• Example of an ensemble algorithm: bagging, which stands for bootstrap aggregating.
• A model comprised of many models is called an ensemble model.
• Ensemble learning includes bagging, boosting, and stacking.
[Figure: data is fed to multiple algorithms (SVM, regression, NB), whose outputs are combined to produce the final prediction]
• Bootstrap aggregation, also known as bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression.
• It decreases the variance and helps to avoid overfitting.
• It is usually applied to decision tree methods.
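The bagging idea above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the one-threshold "stump" learner, the toy data, and all function names are assumptions made for the example, not something taken from these notes.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap replicate)."""
    return [rng.choice(rows) for _ in rows]

def train_stump(rows):
    """Hypothetical weak learner: choose the threshold on x with the fewest mistakes."""
    best = None
    for threshold, _ in rows:
        # Predict class 1 when x >= threshold, else class 0, and count errors.
        errors = sum((x >= threshold) != bool(y) for x, y in rows)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]

def bagging_predict(thresholds, x):
    """Aggregate the individual stump predictions by majority vote."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]  # toy (x, label) pairs
stumps = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
```

Each stump sees a slightly different resampled dataset, and the vote averages out their individual quirks, which is where the variance reduction comes from.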

Ex: The Random Forest model uses bagging.
[Figure: original dataset, bootstrapping, classifier 1 ... classifier K, ensemble model]
Boosting
- It is an ensemble modeling technique that attempts to build a strong classifier from a number of weak classifiers.
- It is done by building a model from weak models in series.
- Firstly, a model is built from the training data. Then the second model is built, which tries to correct the errors present in the first model.
- This procedure is continued and models are added until either the complete training data set is predicted correctly or the maximum number of models has been added.

Types of Boosting
i) Gradient Boosting: It is a boosting technique that builds a final model from the sum of several weak learning algorithms that were trained on the same dataset. It operates on the idea of stagewise addition.
For example, the first model is fitted to the data, and a second model is then built to reduce the errors of the first.
The data given to gradient boosting must be in the form of numerical or categorical data, and the loss function used to generate the residuals must be differentiable at all times.
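Stagewise addition can be illustrated with squared loss, where the residual each new model is fitted to is simply the target minus the current prediction. The sketch below is an assumption-laden toy: the single-split stump, the learning rate of 0.5, and the data are all made up for illustration.

```python
def fit_stump(xs, residuals):
    """Hypothetical weak learner: one split on x, predicting the mean residual on each side."""
    best = None
    for split in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < split]
        right = [r for x, r in zip(xs, residuals) if x >= split]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x < split else rmean

def gradient_boost(xs, ys, rounds=20, learning_rate=0.5):
    base = sum(ys) / len(ys)  # stage 0: a constant prediction (the mean)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        # For squared loss, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    # The final model is the sum of the base value and all scaled weak learners.
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = gradient_boost(xs, ys)
```

Calling `model(2)` or `model(5)` returns the summed prediction; each round only has to explain what the earlier rounds got wrong.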

ii) XGBoost:
• The full name of the XGBoost algorithm is the eXtreme Gradient Boosting algorithm; it is an extreme variation of the previous gradient boosting technique.
• XGBoost applies a regularization approach.
• Additionally, it works better when the dataset contains both numerical and categorical variables.

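iii) AdaBoost:
• This model is built by identifying the wrongly classified points and giving them more weightage in the next round of the process.
• It works on the principle of the stagewise addition method, where multiple weak learners are used to get strong learners.
• The alpha parameter (the weight given to a weak learner) is inversely related to the error of that weak learner.
[Figure: Model 1, Model 2, Model 3 trained in sequence, each with its own weight]
• A group of weak learners is combined to form one strong learner.
• It decreases bias, not variance.
• Models are weighted according to their performance.
• In this, the classifiers are trained sequentially. Ex: AdaBoost.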
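The relation between a weak learner's error and its alpha, and the re-weighting of misclassified points, can be sketched as follows. This uses the commonly taught binary AdaBoost formulas with labels coded as -1/+1; the function names and the five-point example are assumptions for illustration only.

```python
import math

def learner_weight(error):
    """Alpha for a weak learner with weighted error rate `error` (0 < error < 1)."""
    return 0.5 * math.log((1 - error) / error)

def reweight(weights, y_true, y_pred, alpha):
    """Raise the weight of misclassified points, lower the rest, then renormalise.

    Labels are assumed to be coded as -1 / +1.
    """
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return [w / total for w in updated]

y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]   # the last point is misclassified
weights = [0.2] * 5           # start with equal weights
error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
alpha = learner_weight(error)
new_weights = reweight(weights, y_true, y_pred, alpha)
```

With one point in five misclassified, the misclassified point ends up carrying half of the total weight, so the next weak learner is pushed to get it right.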
RANDOM FOREST:
• The Random Forest, or Random Decision Forest, is a supervised machine learning algorithm used for classification, regression, and other tasks using decision trees.
• Random Forests are particularly well-suited for handling large and complex datasets, dealing with high-dimensional feature spaces, and providing insights into feature importance.
Some points that explain why we should use the RF:
• It takes less training time compared to other algorithms.
• It predicts output with high accuracy.
• It runs efficiently even on a large dataset.
• It can also maintain accuracy when a large proportion of data is missing.

[Figure: Random forest classification. An instance is passed to Tree 1 ... Tree n; each tree outputs a class (class X, class Y); majority voting gives the final class]
• Random forest is another ensemble algorithm that generates multiple models, obtains several predictions from these models, and merges them into a single predicted classification.
• Random forest is a data-mining algorithm in which random samples of the data are generated, multiple trees are constructed, and a random subset of inputs (called predictors) is evaluated for deciding each tree.
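Two of the ingredients described above, the random subset of predictors and the majority vote, are easy to show in isolation. The predictor names and the three tree votes below are hypothetical; no actual trees are trained in this sketch.

```python
import random
from collections import Counter

def random_feature_subset(feature_names, k, rng):
    """Pick k predictors at random, as is done when growing each tree."""
    return rng.sample(feature_names, k)

def majority_vote(tree_predictions):
    """Merge per-tree class predictions into one final class."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(42)
features = ["age", "income", "region", "tenure"]   # hypothetical predictor names
subsets = [random_feature_subset(features, 2, rng) for _ in range(3)]
final_class = majority_vote(["X", "Y", "X"])        # votes from three hypothetical trees
```

Because every tree looks at a different sample and a different subset of predictors, the trees make different mistakes, and the vote tends to cancel them out.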

ii) Text Data Analysis
• The processing and modeling of textual data to gain useful business insights is called text data analysis.
• Text data analysis cannot be done without the aid of specialized tools.
• An essential part of text data analysis is text mining, which involves obtaining meaningful and high-quality information from textual data. The high-quality information is derived by finding relationships and useful business patterns and trends.
• The text may be a massive medical dictation, or even text that can be scanned from a hard copy and converted into electronic format.
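A very small text-mining step, counting the most frequent meaningful terms across documents, might look like the sketch below. The stop-word list, the sample feedback sentences, and the function names are assumptions chosen for illustration; real text mining would use a much richer pipeline.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "was", "in", "for"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_terms(documents, n=3):
    """Count non-stop-word terms across all documents and return the n most frequent."""
    counts = Counter(
        tok for doc in documents for tok in tokenize(doc) if tok not in STOP_WORDS
    )
    return counts.most_common(n)

docs = [
    "Delivery was late and the package was damaged.",
    "Late delivery again; support was helpful.",
    "The package arrived on time.",
]
terms = top_terms(docs)
```

Here the frequent terms point at a recurring delivery problem, which is the kind of business pattern text mining is meant to surface.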

Data source      Data format                Data structure type
Literature       TXT, DOC, HTML, or PDF     Unstructured
Web page                                    Semi-structured
News articles                               Unstructured
E-mail           TXT, MSG, or EML
History of Analytical Tools
• In the late 1960s, analytics was considered to be a very daunting task due to the unavailability of appropriate computation tools and systems.
• In those days, data analysis was performed by using mainframes and by developing numerous programs or codes using the complicated Job Control Language (JCL) or COBOL.
Early statistical methods (before the 1970s)
- Limited hardware capabilities (like IBM mainframes, Fortran)
- Simple programs

Development of DBMS (1970-1990)
- SQL, structured data, Job Control Language (JCL), scripting languages
- OLAP, multi-dimensional data analysis
anayrit
• By the late 1990s, all the commercial analytical tools offered user interfaces.
• From then onward, there were tremendous improvements in user interfaces to include more powerful graphics, workflow diagrams, and applications that are mainly focused on specific point solutions.

Rise of Data Warehouse (1980-1990)
- It is a process that organizations use to combine data from different sources into a single database.

Emergence of data mining (1990s)
- Advanced statistical software such as SAS and SPSS, used to perform predictive analytics, classification, and clustering.
- Point solutions: software packages that solve a specific group of problems.

Big Data Era (2000s)
- Large volume and variety of data
- NoSQL, unstructured data
- Hadoop
- Machine learning

Modern era
- ML frameworks (PyTorch, scikit-learn) emerged and enabled more sophisticated ML models.
- Cloud computing and BDA (AWS, Google Cloud, Azure BDA services)

Modern visualization tools
- Tableau, graphs, charts
- Visualization tools such as ADVIZOR, Spotfire, JMP, and QlikView have enormously helped functional and business analytics professionals move beyond just graphics of data, and help them connect the dots between seemingly unrelated pieces of data, between information from various sources, the timeline, and more.
Popular Analytical Tools
• Various types of analytical tools are available in the market, but no company can buy and implement all of them.
• The decision to invest in an analytical tool needs careful consideration on the part of a company of various parameters, such as:
- the type of the company,
- the kind of products or services it offers,
- the locations and ways in which it interacts with customers,
- the stakeholders involved in the business,
- the manner in which the company receives data from customers,
- the amount of data generated by all these various sources and activities, and
- future goals and perspectives, etc.
• A careful consideration of all these factors helps companies select the tool that is best suited to their individual requirements.

The following are some popular analytical tools:
• The R Project for Statistical Computing
• IBM SPSS
• SAS

The R Project for Statistical Computing
• The R Project for Statistical Computing is popularly known as R. It is a free, open-source analytics package.
• R is an implementation of the S programming language, which was earlier being used for statistical analysis. It derives its name from its creators, Ross Ihaka and Robert Gentleman.
• R is widely used by several analytics professionals, and is usually used in academic research and development environments.
• R is object oriented and platform independent (works on all operating systems).
• R is used to analyze statistical information, and for graphical representation, reporting, and data modeling.
• R can be used with C, Java, and Python.
• Common libraries: ggplot2 for data visualization, dplyr for data manipulation, caret for machine learning, and shiny.
R has the following remarkable features:
• As compared to other analytical tools, R is more ...
• R can be easily linked with other programming languages, making it possible to embed it in different applications.
• R can now be used from commercial analytics tools.
• R is an extensible language: developers can write their own software and distribute it as add-on packages.

Advantages:
• New modeling and analysis approaches
• Compatibility with different types of statistical tasks, such as data processing, graphic visualization, and analysis of all types of data
• It is usually used in domains such as finance, biology, and market research, and by companies such as Twitter and Facebook.

Limitations:
• Lack of scalability
• Slower than ...
• The R software runs in memory, as compared to ...
IBM SPSS:
• SPSS stands for Statistical Package for the Social Sciences.
• The SPSS analytical tool was introduced in 1968.
• It is owned by International Business Machines Corporation (IBM).
• SPSS is a powerful statistical software platform for data management, statistical analysis, and graphical representation of data.
• The functionality of IBM SPSS can be accessed with the help of a proprietary 4GL (fourth-generation programming language) known as syntax language, and by using a graphical interface offering menus.

Key features:
i) Data management: We can easily import and manage large datasets, create custom data transformations, and manage missing data.
ii) Statistical analysis: It offers a wide range of statistical procedures, including descriptive statistics, inferential statistics, and regression analysis.
iii) Graphical representation: It provides detailed charting tools for creating charts and graphs to visualize your data.
iv) Advanced analytics: It can perform modeling, analysis, and decision trees.
• Besides the point-and-click interface, it supports command syntax for repetitive tasks.


• SPSS commands are executed one line at a time to update tables and add results to the Output Editor window. You can also store the executed syntax, with its time of execution, in this window.
• IBM SPSS Statistics can read data from and write to ASCII files, databases, and tables of other statistical software. This tool also helps in providing basic data management functions, such as sorting, aggregation, and table merging.
• IBM SPSS Statistics can send the output directly to a file instead of the Output Editor window. The file can be in the .sav, HTML, or XML format.
• The Output Management System (OMS) helps in storing the outputs obtained from a large number of related calculations in a single file.
• IBM SPSS Statistics can be run on different platforms such as Windows, Mac OS X, and Unix.

Applications:
• It is widely used in various fields, including social sciences, healthcare, marketing, hypothesis testing, and predictive analysis.
SAS
• SAS stands for Statistical Analysis System.
• SAS refers to an information delivery system, which is computing- and hardware-independent.
• SAS was introduced in 1976 for the IBM mainframe.
• A SAS program consists of data steps and procedure (PROC) steps.
• SAS products are commonly known as modules.
• Its ability to handle large datasets and perform complex statistical analyses, together with its flexibility, scalability, and statistical capabilities, makes it a top choice for analysts and researchers.

Features of SAS
i) SAS statistical software: SAS offers a variety of statistical software. The information collected by using SAS statistical analysis allows organizations to develop procedures, make enterprise decisions, and retain their customers.
ii) Data and text mining: Organizations collect large amounts of data. Data and text mining of the collected business data helps them make better decisions and strategy.
iii) Data visualization: SAS data visualization provides user-friendly interfaces to present the results of the analysis as graphs and charts.

iv) Forecasting: SAS uniquely supports all types of forecasting and forecast analysis. SAS has several tools that help to forecast various types of processes. Forecasting helps organizations plan for the future, expect variations, identify new business strategies, understand previous strategies, and plan the business process.

v) Optimization and project scheduling: SAS tools provide optimization and simulation techniques to achieve maximum results while operating within restrictions and limited resources. These enable organizations to:
• Improve control and strategy
• Consider contingencies
• Determine the best allocation of resources
• Plan deployment


vi) Model management: SAS Model Manager simplifies the tedious process of creating, managing, and deploying analytical models. The model manager regularly verifies the accuracy and usefulness of the models.
vii) Quality improvement: SAS provides Quality Control (QC) tools to enhance the quality of products and processes and to increase customer satisfaction. It encourages organizations to go beyond traditional basic process control and use advanced statistical analysis to improve the product development process.

Other features:
i) Data management: It can import, export, and transform data in different formats, and provides robust tools for managing data.
ii) Descriptive statistics: It allows statistical analysis including median, variance, and standard deviation.
iii) Inferential tests: It supports chi-square tests and ANOVA.
iv) Regression analysis: It is used for linear, logistic, and multiple regression analysis.
v) Predictive models: It provides tools like SAS ensemble models, decision trees, and neural networks.
vi) Data visualization is supported.
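The descriptive statistics listed above are the same quantities in any tool. As a neutral illustration (Python's standard `statistics` module, not SAS syntax), with a made-up sample:

```python
import statistics

scores = [12, 15, 15, 18, 20]   # hypothetical sample

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "variance": statistics.variance(scores),   # sample variance (n - 1 denominator)
    "std_dev": statistics.stdev(scores),       # sample standard deviation
}
```

Note that `variance` and `stdev` use the sample (n - 1) formulas; `pvariance` and `pstdev` are the population versions.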
Applications:
• Fraud detection, healthcare, finance, and marketing.
• SAS works well with different types of operating systems and computers, such as mainframes, personal computers, and Macintoshes.
Comparing Various Analytical Tools
The following points are a brief comparison of different analytical tools such as R, IBM SPSS, and SAS.
i) User interface:
• Before the introduction of Enterprise Guide in SAS, SPSS was significantly ahead of its rivals with regard to user-friendliness, as it has an interface similar to Microsoft standards.
• SPSS is also very successful in sectors such as marketing and the social sciences.
• IBM SPSS Statistics has a graphical interface.
• R does not have a universal graphical interface, and not all of its functions are integrated into the various graphical interfaces distributed for it.

ii) Decision making:
• IBM SPSS Statistics is considered better than SAS because of its lower price and the possibility of obtaining decision trees without the need to purchase the data mining suite. In the case of SAS, you will have to buy Enterprise Miner.
• IBM SPSS offers more tree algorithms than R, and thus it is a better choice in cases where you need to make a lot of decisions.
iii) File management and stability:
• SAS is more complete and flexible with regard to file management. It has an enhanced procedure called TRANSPOSE to manage tables.
• Suppose there is a need to merge two tables and an index is applied on the join keys. Then it is more efficient to take the help of an SQL join instead of a SAS merge, especially when it comes to merging large tables.
• In terms of performance for large volumes, SAS is considered the most stable of the three systems.
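Why an index on the join keys helps can be seen in a small sketch: one table is indexed by key once, and each row of the other table then finds its matches by lookup instead of by scanning. This is a generic illustration in Python, not SAS or SQL code; the tables and column names are hypothetical.

```python
def index_by_key(rows, key):
    """Build a lookup from join-key value to the rows that carry it."""
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index

def inner_join(left, right, key):
    """Join two lists of dict rows on `key`, probing an index built on the right table."""
    right_index = index_by_key(right, key)
    joined = []
    for lrow in left:
        for rrow in right_index.get(lrow[key], []):
            joined.append({**lrow, **rrow})
    return joined

# Hypothetical tables
customers = [{"cust_id": 1, "name": "Asha"}, {"cust_id": 2, "name": "Ravi"}]
orders = [
    {"cust_id": 1, "amount": 250},
    {"cust_id": 1, "amount": 90},
    {"cust_id": 3, "amount": 40},
]
result = inner_join(customers, orders, "cust_id")
```

Only customer 1 has matching orders, so the result holds two joined rows; customer 2 and the order for customer 3 drop out, as in an inner join.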

iv) Data management:
• SAS has an edge over IBM SPSS in its data management functions.
• A major drawback of R is that most of its functions have to load all the data into memory prior to execution. This puts a limit on the volumes that can be handled.
• However, some packages have started to break free of this design, such as the biglm package for linear models.
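The general idea of fitting a model without holding all the data in memory can be sketched with a simple straight-line fit that keeps only running totals. This is an illustration of chunked processing in Python under made-up data; it is not a description of how the biglm package is implemented.

```python
def fit_line_in_chunks(chunks):
    """Fit y = intercept + slope * x, reading the data one chunk at a time.

    Only five running totals are kept in memory, never the full dataset.
    """
    n = sx = sy = sxx = sxy = 0.0
    for chunk in chunks:
        for x, y in chunk:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

def make_chunks(total, size):
    """Hypothetical data source: yields points on y = 2x + 1 in fixed-size chunks."""
    for start in range(0, total, size):
        yield [(x, 2 * x + 1) for x in range(start, min(start + size, total))]

intercept, slope = fit_line_in_chunks(make_chunks(total=1000, size=100))
```

Because the generator produces one chunk at a time, memory use depends on the chunk size rather than on the total number of rows.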

v) Documentation:
• The technical documentation designed for SAS is very comprehensive; for instance, there are almost 8000 pages just for the STAT module, which is just a part of SAS.
• In comparison to IBM SPSS Statistics, many more functions are predefined, such as mathematical functions and financial functions. These include depreciation, compound interest, cash flow, hyperbolic functions, factorials, combinations, and arrangements.
• Further, the predictive and descriptive algorithms available in R and SAS are greater in number as compared to IBM SPSS Statistics.
• The statistical indicators in SAS are more detailed, and the parameter settings finer, as compared to SPSS, which makes the language more flexible and complete.

[Table: R vs. IBM SPSS vs. SAS, rated Good / Better / Excellent / Moderate on: user interface, decision making, file management and stability, data management, documentation, required coding, language (point-and-click or command-line based interface), visualization capabilities, descriptive algorithms, and predefined functions]
