BDA MODULE-2 (Unit-2)
Analytical Approaches
• The ways adopted to sort, analyze, or solve problems are known as analytical approaches.
• ... problem ... performance of a model and also improve the ...
Wisdom of Crowds
• According to James Surowiecki, The Wisdom of Crowds explains the principle of group think: it is based on the concept that the masses are better problem solvers, forecasters, and decision makers than any one individual.
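The claim that the masses decide better than one individual can be checked with a small calculation. The sketch below is only an illustration under assumptions that are not in the notes: every member votes independently, each is right with the same probability `p_correct`, and the crowd decides by strict majority. The function name `majority_accuracy` is hypothetical.

```python
from math import comb


def majority_accuracy(n_voters: int, p_correct: float) -> float:
    """Probability that a strict majority of independent voters is right.

    Assumption: every voter is correct with the same probability p_correct
    and votes independently of the others.
    """
    needed = n_voters // 2 + 1  # smallest number of votes that is a strict majority
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 25, 101):
        print(n, round(majority_accuracy(n, 0.6), 3))
```

With `p_correct = 0.6`, one voter is right 60% of the time, three voters about 64.8%, and five voters about 68.3%. The gain depends on the independence assumption; voters who all make the same mistakes add nothing.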
Ensemble (... algorithm)
• Bagging stands for bootstrap aggregating.
• A model comprised of many models is called an ensemble model.
[Figure: ensemble model using multiple algorithms (SVM, Regression, NB, ...) on the data]
• Bootstrap aggregation, also known as bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression.
• It decreases the variance and helps to avoid overfitting.
• It is usually applied to decision tree methods.
[Figure: original dataset, bootstrapping into samples, classifier 1 ... classifier K, ensemble model]
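The bagging steps described above (draw bootstrap samples from the original dataset, train one classifier per sample, combine them into an ensemble) can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the weak learner is assumed to be a one-feature threshold rule (a decision stump), labels are assumed to be 0 or 1, and the names `fit_stump`, `bagging_fit`, and `bagging_predict` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a threshold rule 'predict 1 when x > threshold'.

    points is a list of (x, label) pairs with labels 0 or 1. The threshold
    with the fewest training errors is returned.
    """
    best_threshold, best_errors = None, len(points) + 1
    for threshold, _ in points:
        errors = sum((x > threshold) != bool(y) for x, y in points)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(points, n_models=25, seed=0):
    """Train n_models stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(points) items WITH replacement.
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models


def bagging_predict(models, x):
    """Aggregate the stumps by majority vote."""
    votes = Counter(int(x > threshold) for threshold in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    data = [(float(x), int(x >= 5)) for x in range(10)]
    ensemble = bagging_fit(data)
    print(bagging_predict(ensemble, 0.0), bagging_predict(ensemble, 9.0))
```

Each stump sees a slightly different sample, so the individual thresholds vary; the vote averages that variation out, which is the variance reduction the notes mention.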
Boosting
- It is an ensemble modeling technique that attempts to build a strong classifier from a number of weak classifiers.
- It is done by building a model by using weak models.
Types of Boosting
i) Gradient Boosting: It is a boosting technique that builds a final model from the sum of several weak learning algorithms that were trained on the same dataset. It operates on the idea of stagewise addition.
• ... the first model will become ... second model, which is used to reduce the error ...
• The data fed to gradient boosting must be in the form of numerical or categorical data, and the loss function used to generate the residuals must be differentiable.
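Stagewise addition can be made concrete with a small regression sketch. It assumes squared loss (for which the negative gradient is simply the residual), a single numeric feature, and a one-split regression stump as the weak learner; none of these choices come from the notes, and the names `fit_regression_stump`, `gradient_boost`, and `predict` are hypothetical.

```python
def fit_regression_stump(xs, residuals):
    """Best single split under squared error: (threshold, left_mean, right_mean)."""
    best = None
    # Skip the largest x so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def gradient_boost(xs, ys, n_stages=50, learning_rate=0.1):
    """Return (base, learning_rate, stumps) fitted stage by stage."""
    base = sum(ys) / len(ys)  # stage 0: a constant model
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_stages):
        # Squared loss L = (y - F)^2 / 2 has negative gradient y - F (the residual).
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_regression_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # Stagewise addition: add a damped copy of the new weak model.
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
    model = gradient_boost(xs, ys)
    print(round(predict(model, 1.0), 2), round(predict(model, 8.0), 2))
```

Each stage fits the error left by the sum of the earlier stages, which is why the loss has to be differentiable: the residuals are its gradient.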
ii) AdaBoost:
• This model is built by identifying the wrongly classified instances and giving them more weightage.
• ... process ... weight ...
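The reweighting idea can be shown for a single round. The sketch below uses the standard discrete AdaBoost update for two classes, which the notes do not spell out, so treat it as an assumption; it also assumes the weak learner's weighted error lies strictly between 0 and 0.5. The function name `adaboost_round` is hypothetical.

```python
import math


def adaboost_round(weights, is_correct):
    """One AdaBoost round: returns (learner_weight, new_sample_weights).

    weights    -- current sample weights, summing to 1
    is_correct -- True where the weak learner classified the sample correctly
    Assumes the weighted error is strictly between 0 and 0.5.
    """
    error = sum(w for w, ok in zip(weights, is_correct) if not ok)
    alpha = 0.5 * math.log((1 - error) / error)  # say of this weak learner
    # Misclassified samples are scaled up, correct ones scaled down.
    updated = [
        w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, is_correct)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]


if __name__ == "__main__":
    alpha, new_weights = adaboost_round([0.2] * 5, [True, True, True, True, False])
    print(round(alpha, 3), [round(w, 3) for w in new_weights])
```

With five equally weighted samples and one mistake, the misclassified sample ends up with weight 0.5 and the four correct ones with 0.125 each, so the next weak learner is pushed to get that sample right.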
• It ... dataset ... accuracy when a large proportion of data is ...
[Figure: an instance is passed to Tree 1 ...; each tree votes class X or class Y; majority voting gives the final class]
• Random forest is another ensemble algorithm that generates multiple models and merges the predictions from these models into a single predicted classification.
• Random forest is a data-mining algorithm where (i) random samples of data are generated, (ii) multiple trees are randomly constructed, and (iii) a random subset of inputs (called predictors) is evaluated for each tree.
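The three ingredients listed above (random samples of the data, a random subset of predictors per tree, and a vote over the trees) can be sketched as follows. To keep the example short, each "tree" is assumed to be a single-split stump and labels are assumed to be 0 or 1; a real random forest grows deeper trees. The names `fit_stump`, `stump_predict`, `random_forest_fit`, and `random_forest_predict` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold, above_label) with the fewest training errors,
    searching only the predictors listed in feature_ids."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            for above_label in (0, 1):
                errors = sum(
                    (above_label if row[f] > threshold else 1 - above_label) != y
                    for row, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, above_label)
    return best[1:]


def stump_predict(stump, row):
    f, threshold, above_label = stump
    return above_label if row[f] > threshold else 1 - above_label


def random_forest_fit(rows, labels, n_trees=31, n_features=2, seed=0):
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # random sample of the data
        feature_ids = rng.sample(range(d), n_features)  # random subset of predictors
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature_ids)
        )
    return forest


def random_forest_predict(forest, row):
    """Majority voting over the trees gives the final class."""
    votes = Counter(stump_predict(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up data: the first two predictors separate the classes, the third is noise.
    rows = [
        (1.0, 1.0, 0.0), (2.0, 1.5, 1.0), (1.5, 2.0, 0.0), (2.0, 2.0, 1.0),
        (6.0, 7.0, 0.0), (7.0, 6.0, 1.0), (6.5, 6.5, 0.0), (7.0, 7.0, 1.0),
    ]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    forest = random_forest_fit(rows, labels)
    print(random_forest_predict(forest, (1.0, 1.0, 0.0)))
    print(random_forest_predict(forest, (7.0, 7.0, 1.0)))
```

Restricting each tree to a random subset of predictors makes the trees less alike than in plain bagging, so their errors are less correlated when the votes are combined.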
Analysis:
• Data analysis ... without the aid of ... specialized tools for analysis ...
• In the late ..., analytics was considered to be a very daunting task due to the unavailability of the appropriate computational tools.
• ..., data analytics was performed by using mainframes and by developing numerous program codes using the complicated Job Control Language (JCL).
[Figure: before the 1970s: easy statistical methods, limited hardware capabilities, simple programs (like IBM, Fortran)]
• It is a process that organizations ...
• ... Azure BDA (services) ...
• Tools: Tableau, graphs, charts
Modern visualization tools
• Visualization tools (such as Advizor, Spotfire, JMP, and QlikView) have enormously helped ... and business analytics professionals ... data, and help them to look beyond just seemingly unrelated pieces of data and connect the dots between ... (connect the timeline and ...) ... various information ...
Popular Analytical Tools
• Various types of analytical tools are available ...
• The decision to invest in an analytical tool ... the manner in which big data affects a company ... for customers, the amount of data generated by all these various sources & activities, future goals & perspectives, etc.
• A careful consideration of all these factors helps companies select the tool that is best suited to their individual requirements.
R
• It is a free, open source analytics package.
• R is widely used by several analytics professionals, and is usually used in the academic research and development environment.
• R is object oriented and platform independent (works on all operating systems).
• R is used to analyze statistical information, graphical representation, reporting & data modeling.
• R can be used with C, Java, Python.
• Common libraries: ggplot (data visualization), dplyr (data manipulation), shiny, ...
• R has the following remarkable features:
  - As compared to other analytical tools, R is more ... embedded in different applications ...
  - R is an extensible language: developers can ... their own software and distribute it as add-on packages.
  - ... Twitter, Facebook, ...
• Lack of scalability
• Slower than ...
• Difficult ...
IBM SPSS:
• SPSS: Statistical Package for the Social Sciences
• The SPSS analytical tool was introduced in 1968.
• Advanced analytics: ... modeling, decision trees, ... analysis ...
• IBM SPSS Statistics can read data from and write to ASCII files, databases & tables of other statistical software. This tool also helps in ...
Applications:
• It is widely used in various fields including social sciences, health care, marketing, hypothesis testing, predictive analytics ...
SAS:
• SAS stands for Statistical Analysis System.
• SAS refers to an information delivery system, which is ... computing ... and hardware-independent ... the IBM mainframe.
• SAS was introduced in 1976 ... world, to handle ... information delivery system ... data step ... procedure ...
• Its ability to handle large datasets & perform complex statistical analysis, its flexibility, scalability, ... & statistical capabilities make it a top ... statistics software.
Features of SAS
i) SAS statistical analysis: SAS statistics offers a variety of statistical procedures ... enterprise ...
• The information collected by using statistical analytics allows organizations to ... development and ... helps ... retain their customers ... business ... data mining ... and better decisions, strategy ...
ii) Data visualization: SAS data visualization provides user-friendly graphs, interfaces, ... and charts of the results of the analysis.
• SAS uniquely supports all types of forecasting analysis.
• SAS has several forecasting tools that help organizations plan for the future, identify new business strategies, expect variations, understand previous strategies, ... the business process, and forecast ... project plans and schedules.
• SAS tools provide optimization and simulation techniques to achieve maximum results while operating within restrictions and limited resources. These enable organizations to:
  - Improve control ... strategy
  - Consider contingencies
  - Determine the best allocation of resources
• Some advanced statistical analytics ... to improve the product development process ...
• Data visualization ... support ...
Applications:
• Fraud detection
• Health care, finance, marketing
• Decision making
• IBM SPSS Statistics is considered better than SAS because of its lower price and the possibility of obtaining generated decision ... without ... the need to purchase the data mining suite.
• Suppose there is ... Then it is believed ... efficient ... to ... index ... merge ... applied ... on the ... join keys ... with the help of an SQL join instead of a SAS merge.
iv) Data management
• SAS has an edge over IBM SPSS ... functions ...
v) Documentation
• ... descriptive algorithms, predefined functions ...