Data Mining (D) PDF
Data mining - Wikipedia, the free encyclopedia
Data mining
From Wikipedia, the free encyclopedia
Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD),[1] an
interdisciplinary subfield of computer science,[2][3][4] is the computational process of discovering
patterns in large data sets involving methods at the intersection of artificial intelligence, machine
learning, statistics, and database systems.[2] The overall goal of the data mining process is to extract
information from a data set and transform it into an understandable structure for further use.[2] Aside
from the raw analysis step, it involves database and data management aspects, data preprocessing, model
and inference considerations, interestingness metrics, complexity considerations, post-processing of
discovered structures, visualization, and online updating.[2]
The term is a misnomer, because the goal is the extraction of patterns and knowledge from large amounts
of data, not the extraction of data itself.[5] It is also a buzzword,[6] and is frequently applied to any
form of large-scale data or information processing (collection, extraction, warehousing, analysis, and
statistics) as well as to any application of computer decision support systems, including artificial
intelligence, machine learning, and business intelligence. The popular book "Data mining: Practical
machine learning tools and techniques with Java"[7] (which covers mostly machine learning material) was
originally to be named just "Practical machine learning", and the term "data mining" was only added for
marketing reasons.[8] Often the more general terms "(large scale) data analysis" or "analytics" are more
appropriate or, when referring to actual methods, artificial intelligence and machine learning.
The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to
extract previously unknown interesting patterns such as groups of data records (cluster analysis),
unusual records (anomaly detection) and dependencies (association rule mining). This usually involves
using database techniques such as spatial indices. These patterns can then be seen as a kind of summary
of the input data, and may be used in further analysis or, for example, in machine learning and
predictive analytics. For example, the data mining step might identify multiple groups in the data,
which can then be used to obtain more accurate prediction results by a decision support system. Neither
the data collection, data preparation, nor result interpretation and reporting are part of the data
mining step, but do belong to the overall KDD process as additional steps.
The related terms data dredging, data fishing, and data snooping refer to the use of data mining methods
to sample parts of a larger population data set that are (or may be) too small for reliable statistical
inferences to be made about the validity of any patterns discovered. These methods can, however, be used
in creating new hypotheses to test against the larger data populations.
Contents
1 Etymology
2 Background
2.1 Research and evolution
3 Process
http://en.wikipedia.org/wiki/Data_mining
1/25
25/11/2014
3.1 Preprocessing
3.2 Data mining
3.3 Results validation
4 Standards
5 Notable uses
5.1 Games
5.2 Business
5.3 Science and engineering
5.4 Human rights
5.5 Medical data mining
5.6 Spatial data mining
5.7 Temporal data mining
5.8 Sensor data mining
5.9 Visual data mining
5.10 Music data mining
5.11 Surveillance
5.12 Pattern mining
5.13 Subject-based data mining
5.14 Knowledge grid
6 Privacy concerns and ethics
6.1 Situation in Europe
6.2 Situation in the United States
7 Copyright Law
7.1 Situation in Europe
7.2 Situation in the United States
8 Software
8.1 Free open-source data mining software and applications
8.2 Commercial data-mining software and applications
8.3 Marketplace surveys
9 See also
10 References
11 Further reading
12 External links
Etymology
In the 1960s, statisticians used terms like "Data Fishing" or "Data Dredging" to refer to what they
considered the bad practice of analyzing data without an a priori hypothesis. The term "Data Mining"
appeared around 1990 in the database community. For a short time in the 1980s the phrase "database
mining" was used, but since it was trademarked by HNC, a San Diego-based company (now merged into FICO),
to pitch their Database Mining Workstation,[9] researchers consequently turned to "data mining". Other
terms used include Data Archaeology, Information Harvesting, Information Discovery, Knowledge
Extraction, etc. Gregory Piatetsky-Shapiro coined the term "Knowledge Discovery in Databases" for the
first workshop on the same topic (KDD-1989) (http://www.kdnuggets.com/meetings/kdd89/), and this term
became more popular in the AI and machine learning community. However, the term data mining became more
popular in the business and press communities.[10] Currently, Data Mining and Knowledge Discovery are
used interchangeably. Since about 2007 the term "Predictive Analytics", and since 2011 the term "Data
Science", have also been used to describe this field.
Background
The manual extraction of patterns from data has occurred for centuries. Early methods of identifying
patterns in data include Bayes' theorem (1700s) and regression analysis (1800s). The proliferation,
ubiquity and increasing power of computer technology have dramatically increased data collection,
storage, and manipulation ability. As data sets have grown in size and complexity, direct "hands-on"
data analysis has increasingly been augmented with indirect, automated data processing, aided by other
discoveries in computer science, such as neural networks, cluster analysis, genetic algorithms (1950s),
decision trees and decision rules (1960s), and support vector machines (1990s). Data mining is the
process of applying these methods with the intention of uncovering hidden patterns[11] in large data
sets. It bridges the gap from applied statistics and artificial intelligence (which usually provide the
mathematical background) to database management by exploiting the way data is stored and indexed in
databases to execute the actual learning and discovery algorithms more efficiently, allowing such
methods to be applied to ever larger data sets.
Research and evolution
The premier professional body in the field is the Association for Computing Machinery's (ACM) Special
Interest Group (SIG) on Knowledge Discovery and Data Mining (SIGKDD).[12][13] Since 1989 this ACM SIG
has hosted an annual international conference and published its proceedings,[14] and since 1999 it has
published a biannual academic journal titled "SIGKDD Explorations".[15]
Computer science conferences on data mining include:
CIKM Conference - ACM Conference on Information and Knowledge Management
DMIN Conference - International Conference on Data Mining
DMKD Conference - Research Issues on Data Mining and Knowledge Discovery
ECDM Conference - European Conference on Data Mining
ECML-PKDD Conference - European Conference on Machine Learning and Principles and Practice of
Knowledge Discovery in Databases
EDM Conference - International Conference on Educational Data Mining
ICDM Conference - IEEE International Conference on Data Mining
KDD Conference - ACM SIGKDD Conference on Knowledge Discovery and Data Mining
MLDM Conference - Machine Learning and Data Mining in Pattern Recognition
PAKDD Conference - The annual Pacific-Asia Conference on Knowledge Discovery and Data Mining
PAW Conference - Predictive Analytics World
SDM Conference - SIAM International Conference on Data Mining (SIAM)
SSTD Symposium - Symposium on Spatial and Temporal Databases
WSDM Conference - ACM Conference on Web Search and Data Mining
Data mining topics are also present at many data management/database conferences such as the ICDE
Conference, SIGMOD Conference and International Conference on Very Large Data Bases.
Process
The Knowledge Discovery in Databases (KDD) process is commonly defined with the stages:
(1) Selection
(2) Preprocessing
(3) Transformation
(4) Data Mining
(5) Interpretation/Evaluation.[1]
It exists, however, in many variations on this theme, such as the Cross Industry Standard Process for
Data Mining (CRISP-DM) which defines six phases:
(1) Business Understanding
(2) Data Understanding
(3) Data Preparation
(4) Modeling
(5) Evaluation
(6) Deployment
or a simplified process such as (1) preprocessing, (2) data mining, and (3) results validation.
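The five KDD stages can be read as a chain of functions, each consuming the output of the previous one.
The following is only a minimal sketch of that idea: the record layout, the function names, and the
toy "mining" step (a plain frequency count) are illustrative assumptions, not part of KDD or CRISP-DM.

```python
from collections import Counter

# Hypothetical raw records: (customer_id, country, purchased_item)
RAW = [
    (1, "DE", " Tea"),
    (2, "DE", "coffee"),
    (3, "FR", "tea"),
    (4, "DE", None),      # missing value, dropped during preprocessing
    (5, "DE", "TEA"),
]

def select(records, country):
    """(1) Selection: keep only the target subset of the data."""
    return [r for r in records if r[1] == country]

def preprocess(records):
    """(2) Preprocessing: drop records with missing fields."""
    return [r for r in records if all(field is not None for field in r)]

def transform(records):
    """(3) Transformation: reduce each record to a normalized item name."""
    return [r[2].strip().lower() for r in records]

def mine(items):
    """(4) Data mining: a frequency count stands in for a real algorithm."""
    return Counter(items)

def evaluate(pattern_counts, min_count):
    """(5) Interpretation/evaluation: keep only sufficiently frequent patterns."""
    return {item: n for item, n in pattern_counts.items() if n >= min_count}

def kdd(records, country, min_count):
    """Run the five stages in order."""
    selected = select(records, country)
    cleaned = preprocess(selected)
    items = transform(cleaned)
    return evaluate(mine(items), min_count)

result = kdd(RAW, "DE", min_count=2)
print(result)
```

Keeping each stage as a separate function makes it easy to revisit a single stage when the evaluation
is unsatisfactory, which is how the process is usually described as iterating.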
Polls conducted in 2002, 2004, and 2007 show that the CRISP-DM methodology is the leading methodology
used by data miners.[16][17][18] The only other data mining standard named in these polls was SEMMA.
However, 3-4 times as many people reported using CRISP-DM. Several teams of researchers have published
reviews of data mining process models,[19][20] and Azevedo and Santos conducted a comparison of CRISP-DM
and SEMMA in 2008.[21]
Preprocessing
Before data mining algorithms can be used, a target data set must be assembled. As data mining can only
uncover patterns actually present in the data, the target data set must be large enough to contain these
patterns while remaining concise enough to be mined within an acceptable time limit. A common source for
data is a data mart or data warehouse. Preprocessing is essential to analyze the multivariate data sets
before data mining. The target set is then cleaned. Data cleaning removes the observations containing
noise and those with missing data.
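As an illustration of the cleaning step, the sketch below drops observations that have a missing field
or a reading outside a plausible range. The field names, the sample readings, and the range used to
decide what counts as "noise" are assumptions made for the example; real cleaning rules depend on the
data set.

```python
import math

def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def clean(rows, low, high):
    """Drop rows with missing fields or with a temperature outside [low, high]."""
    kept = []
    for row in rows:
        if any(is_missing(v) for v in row.values()):
            continue  # observation with missing data
        if not (high >= row["temperature"] >= low):
            continue  # implausible reading, treated here as noise
        kept.append(row)
    return kept

# Hypothetical sensor readings
readings = [
    {"sensor": "a", "temperature": 21.5},
    {"sensor": "b", "temperature": None},
    {"sensor": "c", "temperature": float("nan")},
    {"sensor": "d", "temperature": 999.0},
    {"sensor": "e", "temperature": 19.0},
]

cleaned = clean(readings, low=-50.0, high=60.0)
print([row["sensor"] for row in cleaned])
```

Dropping rows is the simplest policy and matches the description above; imputing missing values is a
common alternative that this sketch does not attempt.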
Data mining
Data mining involves six common classes of tasks:[1]
Anomaly detection (outlier/change/deviation detection): the identification of unusual data records
that might be interesting, or of data errors that require further investigation.
Association rule learning (dependency modeling): searches for relationships between variables. For
example, a supermarket might gather data on customer purchasing habits. Using association rule
learning, the supermarket can determine which products are frequently bought together and use this
information for marketing purposes. This is sometimes referred to as market basket analysis.
Clustering: the task of discovering groups and structures in the data that are in some way or
another "similar", without using known structures in the data.
Classification: the task of generalizing known structure to apply to new data. For example, an
e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
Regression: attempts to find a function which models the data with the least error.
Summarization: providing a more compact representation of the data set, including visualization and
report generation.
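To make one of these task classes concrete, the sketch below flags anomalous values with a z-score
rule: a value is reported when it lies more than a chosen number of standard deviations from the mean.
This is only one simple technique among many, and the threshold of 2.0 and the sample amounts are
assumptions chosen for the example.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds threshold * std."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical purchase amounts with one unusual record
amounts = [10, 11, 9, 10, 12, 50]
outliers = zscore_outliers(amounts)
print(outliers)
```

Because the mean and standard deviation are themselves pulled by extreme values, this rule becomes
unreliable when outliers are frequent; robust statistics such as the median are often preferred then.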
Results validation
Data mining can unintentionally be misused, and can then produce results which appear to be significant
but which do not actually predict future behavior, cannot be reproduced on a new sample of data, and are
of little use. Often this results from investigating too many hypotheses and not performing proper
statistical hypothesis testing. A simple version of this problem in machine learning is known as
overfitting, but the same problem can arise at different phases of the process, and thus a train/test
split (when applicable at all) may not be sufficient to prevent this from happening.
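The effect of investigating many hypotheses can be quantified under a simplifying assumption: if each
of m independent tests has a false-positive probability alpha, the probability of at least one spurious
"significant" result is 1 - (1 - alpha)^m. The sketch below computes this, together with the Bonferroni
correction, one standard way to compensate. The function names are illustrative, and independence of
the tests is an assumption that real analyses often violate.

```python
def familywise_error_rate(alpha, num_tests):
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests

def bonferroni_threshold(alpha, num_tests):
    """Per-test significance level that keeps the overall rate at most alpha."""
    return alpha / num_tests

for m in (1, 10, 100):
    rate = familywise_error_rate(0.05, m)
    print(m, "tests:", round(rate, 3), "per-test level:", bonferroni_threshold(0.05, m))
```

With a few dozen hypotheses at the conventional 0.05 level, at least one spurious finding becomes more
likely than not, which is why unplanned searches over many patterns need stricter thresholds.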
The final step of knowledge discovery from data is to verify that the patterns produced by the data
mining algorithms occur in the wider data set. Not all patterns found by the data mining algorithms are
necessarily valid. It is common for the data mining algorithms to find patterns in the training set
which are not present
in the general data set. This is called overfitting. To overcome this, the evaluation uses a test set of
data on which the data mining algorithm was not trained. The learned patterns are applied to this test
set, and the resulting output is compared to the desired output. For example, a data mining algorithm
trying to distinguish "spam" from "legitimate" e-mails would be trained on a training set of sample
e-mails. Once trained, the learned patterns would be applied to the test set of e-mails on which it had
not been trained. The accuracy of the patterns can then be measured from how many e-mails they correctly
classify. A number of statistical methods may be used to evaluate the algorithm, such as ROC curves.
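The spam example can be sketched in a few lines. The "learner" here is deliberately naive (it memorizes
words that appear only in spam) and the sample e-mails are invented; the point is only the protocol of
training on one part of the data and measuring accuracy on a held-out part. All names are hypothetical.

```python
import random

def train_test_split(examples, test_fraction, seed):
    """Shuffle a copy of the examples and hold out the last part as a test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def train(train_set):
    """Toy learner: remember words that occur only in spam training e-mails."""
    spam_words, other_words = set(), set()
    for text, label in train_set:
        target = spam_words if label == "spam" else other_words
        target.update(text.lower().split())
    return spam_words - other_words

def classify(model, text):
    """Flag an e-mail as spam if it contains any spam-only word."""
    words = set(text.lower().split())
    return "spam" if words.intersection(model) else "legitimate"

def accuracy(model, test_set):
    """Fraction of held-out e-mails whose predicted label matches the known one."""
    if not test_set:
        raise ValueError("test set is empty")
    hits = sum(classify(model, text) == label for text, label in test_set)
    return hits / len(test_set)

# Invented sample e-mails with known labels
EMAILS = [
    ("win a free prize now", "spam"),
    ("free prize inside", "spam"),
    ("claim your prize today", "spam"),
    ("cheap pills free shipping", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("lunch on friday", "legitimate"),
    ("project status report", "legitimate"),
    ("your invoice for march", "legitimate"),
]

train_set, test_set = train_test_split(EMAILS, test_fraction=0.25, seed=0)
model = train(train_set)
print("held-out accuracy:", accuracy(model, test_set))
```

Accuracy on the training set would overstate performance for a memorizing learner like this one; only
the held-out figure says anything about e-mails the model has not seen.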
If the learned patterns do not meet the desired standards, it is necessary to re-evaluate and change the
preprocessing and data mining steps. If the learned patterns do meet the desired standards, then the
final step is to interpret the learned patterns and turn them into knowledge.
Standards
There have been some efforts to define standards for the data mining process, for example the 1999
European Cross Industry Standard Process for Data Mining (CRISP-DM 1.0) and the 2004 Java Data Mining
standard (JDM 1.0). Development on successors to these processes (CRISP-DM 2.0 and JDM 2.0) was active
in 2006, but has stalled since. JDM 2.0 was withdrawn without reaching a final draft.
For exchanging the extracted models, in particular for use in predictive analytics, the key standard is
the Predictive Model Markup Language (PMML), which is an XML-based language developed by the Data Mining
Group (DMG) and supported as exchange format by many data mining applications. As the name suggests, it
only covers prediction models, a particular data mining task of high importance to business
applications. However, extensions to cover (for example) subspace clustering have been proposed
independently of the DMG.[22]
Notable uses
See also Category: Applied data mining.
Games
Since the early 1960s, with the availability of oracles for certain combinatorial games, also called
tablebases (e.g. for 3x3-chess) with any beginning configuration, small-board dots-and-boxes,
small-board-hex, and certain endgames in chess, dots-and-boxes, and hex, a new area for data mining has
been opened. This is the extraction of human-usable strategies from these oracles. Current pattern
recognition approaches do not seem to fully acquire the high level of abstraction required to be applied
successfully. Instead, extensive experimentation with the tablebases, combined with an intensive study
of tablebase answers to well-designed problems and with knowledge of prior art (i.e., pre-tablebase
knowledge), is used to yield insightful patterns. Berlekamp (in dots-and-boxes, etc.) and John Nunn (in
chess endgames) are notable examples of researchers doing this work, though they were not and are not
involved in tablebase generation.
Business
In business, data mining is the analysis of historical business activities, stored as static data in
data warehouse databases. The goal is to reveal hidden patterns and trends. Data mining software uses
advanced pattern recognition algorithms to sift through large amounts of data to assist in discovering
previously unknown strategic business information. Examples of what businesses use data mining for
include performing market analysis to identify new product bundles, finding the root cause of
manufacturing problems, preventing customer attrition and acquiring new customers, cross-selling to
existing customers, and profiling customers with more accuracy.[23]
In today's world, raw data is being collected by companies at an exploding rate. For example,
Walmart processes over 20 million point-of-sale transactions every day. This information is stored in
a centralized database, but would be useless without some type of data mining software to analyze it.
If Walmart analyzed their point-of-sale data with data mining techniques they would be able to
determine sales trends, develop marketing campaigns, and more accurately predict customer
loyalty.[24]
Every time a credit card or a store loyalty card is used, or a warranty card is filled in, data is
collected about the user's behavior. Many people find the amount of information that companies such
as Google, Facebook, and Amazon store about us disturbing and are concerned about privacy. Although
there is the potential for our personal data to be used in harmful or unwanted ways, it is also being
used to make our lives better. For example, Ford and Audi hope to one day collect information about
customer driving patterns so they can recommend safer routes and warn drivers about dangerous road
conditions.[25]
Data mining in customer relationship management applications can contribute significantly to the
bottom line. Rather than randomly contacting a prospect or customer through a call center or sending
mail, a company can concentrate its efforts on prospects that are predicted to have a high likelihood
of responding to an offer. More sophisticated methods may be used to optimize resources across
campaigns so that one may predict to which channel and to which offer an individual is most likely to
respond (across all potential offers). Additionally, sophisticated applications could be used to
automate mailing. Once the results from data mining (potential prospect/customer and channel/offer)
are determined, this "sophisticated application" can either automatically send an e-mail or a regular
mail. Finally, in cases where many people will take an action without an offer, "uplift modeling" can
be used to determine which people have the greatest increase in response if given an offer. Uplift
modeling thereby enables marketers to focus mailings and offers on persuadable people, and not to
send offers to people who will buy the product without an offer. Data clustering can also be used to
automatically discover the segments or groups within a customer data set.
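The idea behind uplift modeling can be illustrated with its simplest estimate: for each customer
segment, compare the response rate of people who received the offer with that of a control group who
did not. The sketch below does only this two-group comparison; the segment names and outcomes are
invented, and practical uplift models are fitted statistical models rather than raw differences.

```python
def response_rate(outcomes):
    """Share of customers in a group who responded (1) rather than not (0)."""
    return sum(outcomes) / len(outcomes)

def uplift_by_segment(records):
    """records: iterable of (segment, received_offer, responded) tuples.

    Returns, per segment, treated response rate minus control response rate.
    Segments lacking either a treated or a control group are skipped.
    """
    groups = {}
    for segment, treated, responded in records:
        by_treatment = groups.setdefault(segment, {True: [], False: []})
        by_treatment[bool(treated)].append(int(responded))
    return {
        segment: response_rate(g[True]) - response_rate(g[False])
        for segment, g in groups.items()
        if g[True] and g[False]
    }

# Invented campaign results
campaign = [
    ("loyal", True, 1), ("loyal", True, 1),
    ("loyal", False, 1), ("loyal", False, 1),
    ("persuadable", True, 1), ("persuadable", True, 1),
    ("persuadable", True, 0), ("persuadable", True, 1),
    ("persuadable", False, 0), ("persuadable", False, 0),
    ("persuadable", False, 1), ("persuadable", False, 0),
]

uplift = uplift_by_segment(campaign)
print(uplift)
```

In this invented data the "loyal" segment buys with or without the offer, so its uplift is zero and
mailing it adds cost without adding response, which is the situation the paragraph above describes.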
Businesses employing data mining may see a return on investment, but also they recognize that the
number of predictive models can quickly become very large. For example, rather than using one
model to predict how many customers will churn, a business may choose to build a separate model
for each region and customer type. In situations where a large number of models need to be
maintained, some businesses turn to more automated data mining methodologies.
Data mining can be helpful to human resources (HR) departments in identifying the characteristics of
their most successful employees. Information obtained, such as universities attended by highly
successful employees, can help HR focus recruiting efforts accordingly. Additionally, Strategic
Enterprise Management applications help a company translate corporate-level goals, such as profit
and margin share targets, into operational decisions, such as production plans and workforce
levels.[26]
Market basket analysis relates to data mining use in retail sales. If a clothing store records the
purchases of customers, a data mining system could identify those customers who favor silk shirts
over cotton ones. Although some explanations of relationships may be difficult, taking advantage of
them is easier. The example deals with association rules within transaction-based data. Not all data
are transaction-based, and logical or inexact rules may also be present within a database.
Market basket analysis has been used to identify the purchase patterns of the Alpha Consumer.
Analyzing the data collected on this type of user has allowed companies to predict future buying
trends and forecast supply demands.
Data mining is a highly effective tool in the catalog marketing industry. Catalogers have a rich
database of history of their customer transactions for millions of customers dating back a number of
years. Data mining tools can identify patterns among customers and help identify the most likely
customers to respond to upcoming mailing campaigns.
Data mining for business applications can be integrated into a complex modeling and decision-making
process.[27] Reactive business intelligence (RBI) advocates a "holistic" approach that integrates
data mining, modeling, and interactive visualization into an end-to-end discovery and continuous
innovation process powered by human and automated learning.[28]
In the area of decision making, the RBI approach has been used to mine knowledge that is
progressively acquired from the decision maker, and then self-tune the decision method
accordingly.[29] The relation between the quality of a data mining system and the amount of
investment that the decision maker is willing to make was formalized by providing an economic
perspective on the value of extracted knowledge in terms of its payoff to the organization.[27] This
decision-theoretic classification framework[27] was applied to a real-world semiconductor wafer
manufacturing line, where decision rules for effectively monitoring and controlling the
semiconductor wafer fabrication line were developed.[30]
An example of data mining related to an integrated-circuit (IC) production line is described in the
paper "Mining IC Test Data to Optimize VLSI Testing."[31] In this paper, the application of data
mining and decision analysis to the problem of die-level functional testing is described. Experiments
mentioned demonstrate the ability to apply a system of mining historical die-test data to create a
probabilistic model of patterns of die failure. These patterns are then utilized to decide, in real
time, which die to test next and when to stop testing. This system has been shown, based on
experiments with historical test data, to have the potential to improve profits on mature IC
products. Other examples[32][33] of the application of data mining methodologies in semiconductor
manufacturing environments suggest that data mining methodologies may be particularly useful when
data is scarce, and the various physical and chemical parameters that affect the process exhibit
highly complex interactions. Another implication is that on-line monitoring of the semiconductor
manufacturing process using data mining may be highly effective.
Science and engineering
In recent years, data mining has been used widely in the areas of science and engineering, such as
bioinformatics, genetics, medicine, education and electrical power engineering.
In the study of human genetics, sequence mining helps address the important goal of understanding
the mapping relationship between the inter-individual variations in human DNA sequence and the
variability in disease susceptibility. In simple terms, it aims to find out how the changes in an
individual's DNA sequence affect the risks of developing common diseases such as cancer, which is
of great importance to improving methods of diagnosing, preventing, and treating these diseases. One
data mining method that is used to perform this task is known as multifactor dimensionality
reduction.[34]
In the area of electrical power engineering, data mining methods have been widely used for condition
monitoring of high-voltage electrical equipment. The purpose of condition monitoring is to obtain
valuable information on, for example, the status of the insulation (or other important safety-related
parameters). Data clustering techniques, such as the self-organizing map (SOM), have been applied
to vibration monitoring and analysis of transformer on-load tap-changers (OLTCS). Using vibration
monitoring, it can be observed that each tap change operation generates a signal that contains
information about the condition of the tap changer contacts and the drive mechanisms. Obviously,
different tap positions will generate different signals. However, there was considerable variability
amongst normal condition signals for exactly the same tap position. SOM has been applied to detect
abnormal conditions and to hypothesize about the nature of the abnormalities.[35]
Data mining methods have been applied to dissolved gas analysis (DGA) in power transformers.
DGA, as a diagnostic for power transformers, has been available for many years. Methods such as
SOM have been applied to analyze generated data and to determine trends which are not obvious to the
standard DGA ratio methods (such as Duval Triangle).[35]
In educational research, data mining has been used to study the factors leading students to
choose to engage in behaviors which reduce their learning,[36] and to understand factors influencing
university student retention.[37] A similar example of social application of data mining is its use
in expertise finding systems, whereby descriptors of human expertise are extracted, normalized, and
classified so as to facilitate the finding of experts, particularly in scientific and technical
fields. In this way, data mining can facilitate institutional memory.
Other applications include data mining of biomedical data facilitated by domain ontologies,[38]
mining clinical trial data,[39] and traffic analysis using SOM.[40]
In adverse drug reaction surveillance, the Uppsala Monitoring Centre has, since 1998, used data
mining methods to routinely screen for reporting patterns indicative of emerging drug safety issues in
the WHO global database of 4.6 million suspected adverse drug reaction incidents.[41] Recently,
similar methodology has been developed to mine large collections of electronic health records for
temporal patterns associating drug prescriptions to medical diagnoses.[42]
Data mining has been applied to software artifacts within the realm of software engineering: Mining
Software Repositories.
Human rights
Data mining of government records, particularly records of the justice system (i.e., courts, prisons),
enables the discovery of systemic human rights violations in connection to generation and publication
of invalid or fraudulent legal records by various government agencies.[43][44]
Medical data mining
In 2011, the case of Sorrell v. IMS Health, Inc., decided by the Supreme Court of the United States,
ruled that pharmacies may share information with outside companies. This practice was authorized under
the 1st Amendment of the Constitution, protecting the "freedom of speech."[45] However, the passage of
the Health Information Technology for Economic and Clinical Health Act (HITECH Act) helped to initiate
the adoption of the electronic health record (EHR) and supporting technology in the United States.[46]
The HITECH Act was signed into law on February 17, 2009 as part of the American Recovery and
Reinvestment Act (ARRA) and helped to open the door to medical data mining.[47] Prior to the signing of
this law, it was estimated that only 20% of United States-based physicians were utilizing electronic
patient
records.[46] Søren Brunak notes that the patient record becomes as information-rich as possible and
thereby maximizes the data mining opportunities.[46] Hence, electronic patient records further expand
the possibilities regarding medical data mining, thereby opening the door to a vast source of medical
data analysis.
Spatial data mining
Spatial data mining is the application of data mining methods to spatial data. The end objective of
spatial data mining is to find patterns in data with respect to geography. So far, data mining and
Geographic Information Systems (GIS) have existed as two separate technologies, each with its own
methods, traditions, and approaches to visualization and data analysis. Particularly, most contemporary
GIS have only very basic spatial analysis functionality. The immense explosion in geographically
referenced data occasioned by developments in IT, digital mapping, remote sensing, and the global
diffusion of GIS emphasizes the importance of developing data-driven inductive approaches to
geographical analysis and modeling.
Data mining offers great potential benefits for GIS-based applied decision-making. Recently, the task of
integrating these two technologies has become of critical importance, especially as various public and
private sector organizations possessing huge databases with thematic and geographically referenced data
begin to realize the huge potential of the information contained therein. Among those organizations are:
offices requiring analysis or dissemination of geo-referenced statistical data
public health services searching for explanations of disease clustering
environmental agencies assessing the impact of changing land-use patterns on climate change
geo-marketing companies doing customer segmentation based on spatial location.
Challenges in spatial mining: Geospatial data repositories tend to be very large. Moreover, existing GIS
datasets are often splintered into feature and attribute components that are conventionally archived in
hybrid data management systems. Algorithmic requirements differ substantially for relational
(attribute) data management and for topological (feature) data management.[48] Related to this is the
range and diversity of geographic data formats, which present unique challenges. The digital geographic
data revolution is creating new types of data formats beyond the traditional "vector" and "raster"
formats. Geographic data repositories increasingly include ill-structured data, such as imagery and
geo-referenced multimedia.[49]
There are several critical research challenges in geographic knowledge discovery and data mining. Miller
and Han[50] offer the following list of emerging research topics in the field:
Developing and supporting geographic data warehouses (GDW's): Spatial properties are often
reduced to simple aspatial attributes in mainstream data warehouses. Creating an integrated GDW
requires solving issues of spatial and temporal data interoperability, including differences in
semantics, referencing systems, geometry, accuracy, and position.
Better spatio-temporal representations in geographic knowledge discovery: Current geographic
knowledge discovery (GKD) methods generally use very simple representations of geographic
objects and spatial relationships. Geographic data mining methods should recognize more complex
geographic objects (i.e., lines and polygons) and relationships (i.e., non-Euclidean distances,
direction, connectivity, and interaction through attributed geographic space such as terrain).
Furthermore, the time dimension needs to be more fully integrated into these geographic
representations and relationships.
Geographic knowledge discovery using diverse data types: GKD methods should be developed
that can handle diverse data types beyond the traditional raster and vector models, including imagery
and geo-referenced multimedia, as well as dynamic data types (video streams, animation).
Temporal data mining
Data may contain attributes generated and recorded at different times. In this case finding meaningful
relationships in the data may require considering the temporal order of the attributes. A temporal
relationship may indicate a causal relationship, or simply an association.
Sensor data mining
Wireless sensor networks can be used for facilitating the collection of data for spatial data mining for
a variety of applications such as air pollution monitoring.[51] A characteristic of such networks is
that nearby sensor nodes monitoring an environmental feature typically register similar values. This
kind of data redundancy due to the spatial correlation between sensor observations inspires the
techniques for in-network data aggregation and mining. By measuring the spatial correlation between data
sampled by different sensors, a wide class of specialized algorithms can be developed to make spatial
data mining more efficient.[52]
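One simple way to measure how redundant two sensors are is the Pearson correlation of their readings
over the same time steps. The sketch below computes it and lists sensor pairs whose correlation reaches
a threshold, as candidates for aggregation. The sensor names, readings, and the 0.95 threshold are
assumptions for illustration; the cited work may use different measures.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)

def redundant_pairs(readings, threshold=0.95):
    """Return sensor pairs whose readings correlate at or above the threshold."""
    pairs = []
    for a, b in combinations(sorted(readings), 2):
        if pearson(readings[a], readings[b]) >= threshold:
            pairs.append((a, b))
    return pairs

# Hypothetical temperature readings; sensors a and b are assumed to be neighbors
readings = {
    "a": [20.1, 20.4, 21.0, 21.3],
    "b": [20.0, 20.5, 20.9, 21.4],
    "c": [35.0, 30.2, 33.1, 29.8],
}

print(redundant_pairs(readings))
```

A node that knows its readings are highly correlated with a neighbor's can, for instance, transmit only
a summary, which is the kind of in-network aggregation the paragraph above refers to.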
Visual data mining
With the transition from analog to digital, large data sets have been generated, collected, and stored.
Visual data mining aims to discover the statistical patterns, trends and information hidden in such
data in order to build predictive patterns. Studies suggest visual data mining is faster and much more
intuitive than traditional data mining.[53][54][55] See also Computer vision.
Music data mining
Data mining techniques, and in particular co-occurrence analysis, have been used to discover relevant
similarities among music corpora (radio lists, CD databases) for purposes including classifying music
into genres in a more objective manner.[56]
Surveillance
Data mining has been used by the U.S. government. Programs include the Total Information Awareness
(TIA) program, Secure Flight (formerly known as Computer-Assisted Passenger Prescreening System
(CAPPS II)), Analysis, Dissemination, Visualization, Insight, Semantic Enhancement (ADVISE),[57] and
the Multi-state Anti-Terrorism Information Exchange (MATRIX).[58] These programs have been
discontinued due to controversy over whether they violate the 4th Amendment to the United States
Constitution, although many programs that were formed under them continue to be funded by different
organizations or under different names.[59]
In the context of combating terrorism, two particularly plausible methods of data mining are "pattern
mining" and "subject-based data mining".
Pattern mining
"Pattern mining" is a data mining method that involves finding existing patterns in data. In this
context patterns often means association rules. The original motivation for searching association rules
came from the desire to analyze supermarket transaction data, that is, to examine customer behavior in
terms of the purchased products. For example, an association rule "beer ⇒ potato chips (80%)" states
that four out of five customers that bought beer also bought potato chips.
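The 80% in such a rule is its confidence: among the transactions that contain the left-hand side, the
share that also contain the right-hand side. The sketch below computes support and confidence for the
beer and potato chips rule on an invented set of baskets; the basket contents are assumptions chosen so
that four of the five beer purchases include chips.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(wanted.issubset(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent => consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    both = set(antecedent).union(consequent)
    return support(transactions, both) / base

# Invented baskets: 5 contain beer, 4 of those also contain potato chips
baskets = (
    [{"beer", "potato chips"}] * 4
    + [{"beer", "bread"}]
    + [{"milk"}] * 5
)

print("support(beer):", support(baskets, {"beer"}))
print("confidence(beer => potato chips):",
      confidence(baskets, {"beer"}, {"potato chips"}))
```

Confidence alone can mislead when the right-hand side is popular anyway, so rule miners usually report
support alongside it and often a further measure such as lift.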
In the context of pattern mining as a tool to identify terrorist activity, the National Research Council
provides the following definition: "Pattern-based data mining looks for patterns (including anomalous
data patterns) that might be associated with terrorist activity - these patterns might be regarded as
small signals in a large ocean of noise."[60][61][62] Pattern mining includes new areas such as Music
Information Retrieval (MIR), where patterns seen both in the temporal and non-temporal domains are
imported to classical knowledge discovery search methods.
Subject-based data mining
"Subject-based data mining" is a data mining method involving the search for associations between
individuals in data. In the context of combating terrorism, the National Research Council provides the
following definition: "Subject-based data mining uses an initiating individual or other datum that is
considered, based on other information, to be of high interest, and the goal is to determine what other
persons or financial transactions or movements, etc., are related to that initiating datum."[61]
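Read as a computation, this definition amounts to a graph search: records are nodes, known
associations are edges, and the search expands outward from the initiating datum. The sketch below uses
a breadth-first search limited to a fixed number of hops. The link list, the node names, and the hop
limit are hypothetical; the definition quoted above does not prescribe any particular algorithm.

```python
from collections import deque

def related_entities(links, start, max_hops):
    """Return {entity: hop distance} for entities within max_hops of start."""
    graph = {}
    for a, b in links:  # treat every association as undirected
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    del distance[start]  # report only the related entities
    return distance

# Hypothetical associations between persons (P...) and an account (Acct9)
links = [("P1", "Acct9"), ("Acct9", "P2"), ("P2", "P3"), ("P4", "P5")]

print(related_entities(links, "P1", max_hops=2))
```

The number of reachable entities typically grows very quickly with the hop limit, so a larger limit
sweeps in many unrelated people, which bears on the privacy concerns discussed later in the article.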
Knowledge grid
Knowledge discovery "On the Grid" generally refers to conducting knowledge discovery in an open
environment using grid computing concepts, allowing users to integrate data from various online data
sources, as well as make use of remote resources, for executing their data mining tasks. The earliest
example was the Discovery Net,[63][64] developed at Imperial College London, which won the "Most
Innovative Data-Intensive Application Award" at the ACM SC02 (Supercomputing 2002) conference and
exhibition, based on a demonstration of a fully interactive distributed knowledge discovery application
for a bioinformatics application. Other examples include work conducted by researchers at the University
of Calabria, who developed a Knowledge Grid architecture for distributed knowledge discovery, based on
grid computing.[65][66]
Privacy concerns and ethics
While the term "data mining" itself has no ethical implications, it is often associated with the mining
of information in relation to people's behavior (ethical and otherwise).[67]
The ways in which data mining can be used can in some cases and contexts raise questions regarding
privacy, legality, and ethics.[68] In particular, data mining government or commercial data sets for
national security or law enforcement purposes, such as in the Total Information Awareness Program or in
ADVISE, has raised privacy concerns.[69][70]
Data mining requires data preparation, which can uncover information or patterns that may compromise
confidentiality and privacy obligations. A common way for this to occur is through data aggregation. Data
aggregation involves combining data (possibly from various sources) in a way that facilitates
analysis (but that also might make identification of private, individual-level data deducible or otherwise
apparent).[71] This is not data mining per se, but a result of the preparation of data before, and for the
purposes of, the analysis. The threat to an individual's privacy comes into play when the data, once
compiled, cause the data miner, or anyone who has access to the newly compiled data set, to be able to
identify specific individuals, especially when the data were originally anonymous.[72][73][74]
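To make the aggregation risk concrete, the following Python sketch joins an "anonymous" data set with a second, named data set on attributes they share. Everything in it is an assumption for illustration: the function name link_records, the field names, and the sample rows are hypothetical, and exact matching on shared fields is only the simplest form of record linkage.

```python
def link_records(anonymous_rows, public_rows, keys):
    """Match rows from two sources that agree on every field listed in keys."""
    # Index the named (public) rows by their combination of shared attributes.
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    matches = []
    for row in anonymous_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0]["name"], row))
    return matches


# Hypothetical data: neither set alone ties a diagnosis to a name.
health = [
    {"zip": "02138", "birth_year": 1960, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1985, "sex": "M", "diagnosis": "flu"},
]
voters = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1960, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1985, "sex": "M"},
    {"name": "Carl Example", "zip": "02139", "birth_year": 1985, "sex": "M"},
]
reidentified = link_records(health, voters, keys=("zip", "birth_year", "sex"))
```

In this toy data only the first health record is re-identified, because the second one matches two named rows and therefore stays ambiguous.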
It is recommended that an individual is made aware of the following before data are collected:[71]
the purpose of the data collection and any (known) data mining projects;
how the data will be used;
who will be able to mine the data and use the data and their derivatives;
the status of security surrounding access to the data;
how collected data can be updated.
Data may also be modified so as to become anonymous, so that individuals may not readily be
identified.[71] However, even "de-identified"/"anonymized" data sets can potentially contain enough
information to allow identification of individuals, as occurred when journalists were able to find several
individuals based on a set of search histories that were inadvertently released by AOL.[75]
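One simple way to check whether a "de-identified" data set still singles people out is to count how many rows share each combination of indirectly identifying attributes. The Python sketch below applies a k-anonymity-style test, a technique the source does not name; the function name risky_groups, the field names, and the threshold k are hypothetical choices for illustration.

```python
from collections import Counter


def risky_groups(rows, keys, k=2):
    """Return attribute combinations that are shared by fewer than k rows."""
    counts = Counter(tuple(row[key] for key in keys) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical released rows with names already removed.
rows = [
    {"age_band": "30-39", "region": "North"},
    {"age_band": "30-39", "region": "North"},
    {"age_band": "60-69", "region": "South"},
]
exposed = risky_groups(rows, keys=("age_band", "region"), k=2)
```

Any combination returned here describes fewer than k people, so a row carrying it could be tied to an individual by someone who knows those attributes from another source. Passing this check does not by itself make a data set safe to release.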
Situation in Europe
Europe has rather strong privacy laws, and efforts are underway to further strengthen the rights of
consumers. However, the U.S.-E.U. Safe Harbor Principles currently effectively expose European users to
privacy exploitation by U.S. companies. As a consequence of Edward Snowden's global surveillance
disclosure, there has been increased discussion about revoking this agreement, in particular because the data
will be fully exposed to the National Security Agency, and attempts to reach an agreement have failed.
Situation in the United States
In the United States, privacy concerns have been addressed to some extent by the US Congress via the
passage of regulatory controls such as the Health Insurance Portability and Accountability Act (HIPAA).
HIPAA requires individuals to give their "informed consent" regarding information they provide and
its intended present and future uses. According to an article in Biotech Business Week, "'[i]n practice,
HIPAA may not offer any greater protection than the longstanding regulations in the research arena,' says
the AAHC. More importantly, the rule's goal of protection through informed consent is undermined by the
complexity of consent forms that are required of patients and participants, which approach a level of
incomprehensibility to average individuals."[76] This underscores the necessity for data anonymity in data
aggregation and mining practices.
U.S. information privacy legislation such as HIPAA and the Family Educational Rights and Privacy Act
(FERPA) applies only to the specific areas that each such law addresses. Use of data mining by the majority
of businesses in the U.S. is not controlled by any legislation.
Copyright law
Situation in Europe
Owing to a lack of flexibility in European copyright and database law, the mining of in-copyright works,
such as web mining, without the permission of the copyright owner is not legal. Where a database is pure
data, in Europe there is likely to be no copyright, but database rights may exist, so data mining becomes
subject to regulation by the Database Directive. On the recommendation of the Hargreaves review, the
UK government amended its copyright law in 2014[77] to allow content mining as a limitation and
exception, making the UK only the second country in the world to do so after Japan, which introduced an
exception for data mining in 2009. However, owing to the restrictions of the Copyright Directive, the UK
exception allows content mining only for non-commercial purposes. UK copyright law also does not allow
this provision to be overridden by contractual terms and conditions. The European Commission facilitated
stakeholder discussion on text and data mining in 2013, under the title of Licences for Europe.[78] The focus
on licences, rather than limitations and exceptions, as the solution to this legal issue led representatives of
universities, researchers, libraries, civil society groups and open access publishers to leave the stakeholder
dialogue in May 2013.[79]
Situation in the United States
In contrast to Europe, the flexible nature of US copyright law, and in particular fair use, means that content
mining in America, as well as in other fair use countries such as Israel, Taiwan and South Korea, is viewed as
being legal. As content mining is transformative, that is, it does not supplant the original work, it is viewed
as being lawful under fair use. For example, as part of the Google Book settlement the presiding judge on
the case ruled that Google's digitisation project of in-copyright books was lawful, in part because of the
transformative uses that the digitisation project displayed, one being text and data mining.[80]
Software
See also Category: Data mining and machine learning software.
Free open-source data mining software and applications
Carrot2: Text and search results clustering framework.
Chemicalize.org: A chemical structure miner and web search engine.
ELKI: A university research project with advanced cluster analysis and outlier detection methods
written in the Java language.
GATE: A natural language processing and language engineering tool.
KNIME: The Konstanz Information Miner, a user-friendly and comprehensive data analytics
framework.
ML-Flex: A software package that enables users to integrate with third-party machine learning
packages written in any programming language, execute classification analyses in parallel across
multiple computing nodes, and produce HTML reports of classification results.
MLPACK library: A collection of ready-to-use machine learning algorithms written in the C++
language.
Massive Online Analysis (MOA): A real-time big data stream mining tool with concept drift support,
written in the Java programming language.
NLTK (Natural Language Toolkit): A suite of libraries and programs for symbolic and statistical
natural language processing (NLP) for the Python language.
OpenNN: Open neural networks library.
Orange: A component-based data mining and machine learning software suite written in the Python
language.
R: A programming language and software environment for statistical computing, data mining, and
graphics. It is part of the GNU Project.
RapidMiner: An environment for machine learning and data mining experiments.
SCaViS: Java cross-platform data analysis framework developed at Argonne National Laboratory.
SenticNet API (http://sentic.net/api): A semantic and affective resource for opinion mining and
sentiment analysis.
Tanagra: A visualisation-oriented data mining software, also for teaching.
Torch: An open-source deep learning library for the Lua programming language and scientific
computing framework with wide support for machine learning algorithms.
UIMA: The UIMA (Unstructured Information Management Architecture) is a component framework
for analyzing unstructured content such as text, audio and video, originally developed by IBM.
Weka: A suite of machine learning software applications written in the Java programming language.
Commercial data mining software and applications
Angoss KnowledgeSTUDIO: data mining tool provided by Angoss.
Clarabridge: enterprise-class text analytics solution.
HP Vertica Analytics Platform: data mining software provided by HP.
IBM SPSS Modeler: data mining software provided by IBM.
KXEN Modeler: data mining tool provided by KXEN.
LIONsolver: an integrated software application for data mining, business intelligence, and modeling
that implements the Learning and Intelligent OptimizatioN (LION) approach.
Microsoft Analysis Services: data mining software provided by Microsoft.
NetOwl: suite of multilingual text and entity analytics products that enable data mining.
Oracle Data Mining: data mining software by Oracle.
SAS Enterprise Miner: data mining software provided by the SAS Institute.
STATISTICA Data Miner: data mining software provided by StatSoft.
Marketplace surveys
Several researchers and organizations have conducted reviews of data mining tools and surveys of data
miners. These identify some of the strengths and weaknesses of the software packages. They also provide
an overview of the behaviors, preferences and views of data miners. Some of these reports include:
2011 Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery[81]
Rexer Analytics Data Miner Surveys (2007–2013)[82]
Forrester Research 2010 Predictive Analytics and Data Mining Solutions report[83]
Gartner 2008 "Magic Quadrant" report[84]
Robert A. Nisbet's 2006 three-part series of articles "Data Mining Tools: Which One is Best For
CRM?"[85]
Haughton et al.'s 2003 Review of Data Mining Software Packages in The American Statistician[86]
Goebel & Gruenwald 1999 "A Survey of Data Mining and Knowledge Discovery Software Tools" in
SIGKDD Explorations[87]
See also
Methods
Anomaly/outlier/change detection
Association rule learning
Classification
Cluster analysis
Decision tree
Factor analysis
Genetic algorithms
Intention mining
Multilinear subspace learning
Neural networks
Regression analysis
Sequence mining
Structured data analysis
Support vector machines
Text mining
Application domains
Analytics
Bioinformatics
Business intelligence
Data analysis
Data warehouse
Decision support system
Drug discovery
Exploratory data analysis
Predictive analytics
Web mining
Application examples
See also Category: Applied data mining.
Customer analytics
Data mining in agriculture
Data mining in meteorology
Educational data mining
National Security Agency
Police-enforced ANPR in the UK
Quantitative structure–activity relationship
Surveillance / Mass surveillance (e.g., Stellar Wind)
Related topics
Data mining is about analyzing data; for information about extracting information out of data, see:
Data integration
Data transformation
Information extraction
Information integration
Named-entity recognition
Profiling (information science)
Web scraping
References
1. Fayyad, Usama; Piatetsky-Shapiro, Gregory; Smyth, Padhraic (1996). "From Data Mining to Knowledge
Discovery in Databases" (http://www.kdnuggets.com/gpspubs/aimagkddoverview1996Fayyad.pdf). Retrieved
17 December 2008.
2. "Data Mining Curriculum" (http://www.sigkdd.org/curriculum.php). ACM SIGKDD. 2006-04-30.
Retrieved 2011-10-28.
3. Clifton, Christopher (2010). "Encyclopædia Britannica: Definition of Data Mining"
(http://www.britannica.com/EBchecked/topic/1056150/datamining). Retrieved 2010-12-09.
4. Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). "The Elements of Statistical Learning: Data
Mining, Inference, and Prediction" (http://wwwstat.stanford.edu/~tibs/ElemStatLearn/). Retrieved 2012-08-07.
5. Han, Jiawei; Kamber, Micheline (2001). Data mining: concepts and techniques. Morgan Kaufmann. p. 5.
ISBN 9781558604896. "Thus, data mining should have been more appropriately named "knowledge mining from
data," which is unfortunately somewhat long"
6. See e.g. OKAIRP 2005 Fall Conference, Arizona State University
(http://www.okairp.org/documents/2005%20Fall/F05_ROMEDataQualityETC.pdf), About.com: Datamining
(http://databases.about.com/od/datamining/a/datamining.htm)
7. Witten, Ian H.; Frank, Eibe; Hall, Mark A. (30 January 2011). Data Mining: Practical Machine Learning
Tools and Techniques (3 ed.). Elsevier. ISBN 9780123748560.
8. Bouckaert, Remco R.; Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Pfahringer, Bernhard; Reutemann, Peter;
Witten, Ian H. (2010). "WEKA Experiences with a Java open-source project". Journal of Machine Learning
Research 11: 2533–2541. "the original title, "Practical machine learning", was changed ... The term "data mining"
was [added] primarily for marketing reasons."
9. Mena, Jesús (2011). Machine Learning Forensics for Law Enforcement, Security, and Intelligence. Boca
Raton, FL: CRC Press (Taylor & Francis Group). ISBN 9781439860694.
10. Piatetsky-Shapiro, Gregory; Parker, Gary (2011). "Lesson: Data Mining, and Knowledge Discovery: An
Introduction" (http://www.kdnuggets.com/data_mining_course/x1introtodataminingnotes.html). Introduction
to Data Mining. KDNuggets. Retrieved 30 August 2012.
11. Kantardzic, Mehmed (2003). Data Mining: Concepts, Models, Methods, and Algorithms. John Wiley & Sons.
ISBN 0471228524. OCLC 50055336 (https://www.worldcat.org/oclc/50055336).
12. "Microsoft Academic Search: Top conferences in data mining"
(http://academic.research.microsoft.com/?SearchDomain=2&SubDomain=7&entitytype=2). Microsoft Academic Search.
13. "Google Scholar: Top publications – Data Mining & Analysis"
(http://scholar.google.de/citations?view_op=top_venues&hl=en&vq=eng_datamininganalysis). Google Scholar.
14. Proceedings (http://www.kdd.org/conferences.php), International Conferences on Knowledge Discovery and
Data Mining, ACM, New York.
15. SIGKDD Explorations (http://www.kdd.org/explorations/about.php), ACM, New York.
16. Gregory Piatetsky-Shapiro (2002). KDnuggets Methodology Poll
(http://www.kdnuggets.com/polls/2002/methodology.htm)
17. Gregory Piatetsky-Shapiro (2004). KDnuggets Methodology Poll
(http://www.kdnuggets.com/polls/2004/data_mining_methodology.htm)
18. Gregory Piatetsky-Shapiro (2007). KDnuggets Methodology Poll
(http://www.kdnuggets.com/polls/2007/data_mining_methodology.htm)
19. Óscar Marbán, Gonzalo Mariscal and Javier Segovia (2009). A Data Mining & Knowledge Discovery Process
Model (http://cdn.intechopen.com/pdfs/5937/InTechA_data_mining_amp_knowledge_discovery_process_model.pdf).
In Data Mining and Knowledge Discovery in Real Life Applications, book edited by Julio Ponce and
Adem Karahoca, ISBN 9783902613530, pp. 438–453, February 2009, I-Tech, Vienna, Austria.
20. Lukasz Kurgan and Petr Musilek (2006). A survey of Knowledge Discovery and Data Mining process models
(http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=451120). The Knowledge
Engineering Review. Volume 21, Issue 1, March 2006, pp. 1–24, Cambridge University Press, New York, NY,
USA. doi:10.1017/S0269888906000737 (http://dx.doi.org/10.1017%2FS0269888906000737)
21. Azevedo, A. and Santos, M. F. KDD, SEMMA and CRISP-DM: a parallel overview
(http://www.iadis.net/dl/final_uploads/200812P033.pdf). In Proceedings of the IADIS European Conference on
Data Mining 2008, pp. 182–185.
22. Günnemann, Stephan; Kremer, Hardy; Seidl, Thomas (2011). "An extension of the PMML standard to
subspace clustering models". "Proceedings of the 2011 workshop on Predictive markup language modeling –
PMML '11". p. 48. doi:10.1145/2023598.2023605 (http://dx.doi.org/10.1145%2F2023598.2023605).
ISBN 9781450308373.
23. O'Brien, J. A., & Marakas, G. M. (2011). Management Information Systems. New York, NY: McGraw-
Hill/Irwin.
24. Alexander, D. (n.d.). Data Mining. Retrieved from The University of Texas at Austin: College of Liberal Arts:
http://www.laits.utexas.edu/~anorman/BUS.FOR/course.mat/Alex/
25. Goss, S. (2013, April 10). Data-mining and our personal privacy. Retrieved from The Telegraph:
http://www.macon.com/2013/04/10/2429775/dataminingandourpersonalprivacy.html
26. Monk, Ellen; Wagner, Bret (2006). Concepts in Enterprise Resource Planning, Second Edition. Boston, MA:
Thomson Course Technology. ISBN 0619216638. OCLC 224465825
(https://www.worldcat.org/oclc/224465825).
27. Elovici, Yuval; Braha, Dan (2003). "A Decision-Theoretic Approach to Data Mining"
(http://necsi.edu/affiliates/braha/IEEE_Decision_Theoretic.pdf) (PDF). IEEE Transactions on Systems, Man, and
Cybernetics, Part A: Systems and Humans 33 (1).
28. Battiti, Roberto and Brunato, Mauro. Reactive Business Intelligence. From Data to Models to Insight
(http://www.reactivebusinessintelligence.com/), Reactive Search Srl, Italy, February 2011.
ISBN 9788890579509.
29. Battiti, Roberto; Passerini, Andrea (2010). "Brain-Computer Evolutionary Multi-Objective Optimization
(BC-EMO): a genetic algorithm adapting to the decision maker"
(http://rtm.science.unitn.it/~battiti/archive/bcemo.pdf). IEEE Transactions on Evolutionary Computation 14 (15):
671–687. doi:10.1109/TEVC.2010.2058118 (http://dx.doi.org/10.1109%2FTEVC.2010.2058118).
30. Braha, Dan; Elovici, Yuval; Last, Mark (2007). "Theory of actionable data mining with application to
semiconductor manufacturing control" (http://necsi.edu/affiliates/braha/TPRS_A_165421_O.pdf) (PDF).
International Journal of Production Research 45 (13).
31. Fountain, Tony; Dietterich, Thomas; and Sudyka, Bill (2000). Mining IC Test Data to Optimize VLSI Testing
(http://web.engr.oregonstate.edu/~tgd/publications/kdd2000dlft.pdf), in Proceedings of the Sixth ACM SIGKDD
International Conference on Knowledge Discovery & Data Mining, ACM Press, pp. 18–25
32. Braha, Dan; Shmilovici, Armin (2002). "Data Mining for Improving a Cleaning Process in the Semiconductor
Industry" (http://necsi.edu/affiliates/braha/IEEECleaning_02.pdf) (PDF). IEEE Transactions on Semiconductor
Manufacturing 15 (1).
33. Braha, Dan; Shmilovici, Armin (2003). "On the Use of Decision Tree Induction for Discovery of Interactions
in a Photolithographic Process" (http://necsi.edu/affiliates/braha/IEEE_Decision_Trees.pdf) (PDF). IEEE
Transactions on Semiconductor Manufacturing 16 (4).
34. Zhu, Xingquan; Davidson, Ian (2007). Knowledge Discovery and Data Mining: Challenges and Realities. New
York, NY: Hershey. p. 18. ISBN 9781599042527.
35. McGrail, Anthony J.; Gulski, Edward; Allan, David; Birtwhistle, David; Blackburn, Trevor R.; Groot,
Edwin R. S. "Data Mining Techniques to Assess the Condition of High Voltage Electrical Plant". CIGRÉ WG
15.11 of Study Committee 15.
36. Baker, Ryan S. J. d. "Is Gaming the System State-or-Trait? Educational Data Mining Through the Multi-
Contextual Application of a Validated Behavioral Model". Workshop on Data Mining for User Modeling 2007.
37. Superby Aguirre, Juan Francisco; Vandamme, Jean-Philippe; Meskens, Nadine. "Determination of factors
influencing the achievement of the first-year university students using data mining methods". Workshop on
Educational Data Mining 2006.
38. Zhu, Xingquan; Davidson, Ian (2007). Knowledge Discovery and Data Mining: Challenges and Realities. New
York, NY: Hershey. pp. 163–189. ISBN 9781599042527.
39. Zhu, Xingquan; Davidson, Ian (2007). Knowledge Discovery and Data Mining: Challenges and Realities. New
York, NY: Hershey. pp. 31–48. ISBN 9781599042527.
40. Chen, Yudong; Zhang, Yi; Hu, Jianming; Li, Xiang (2006). "Traffic Data Analysis Using Kernel PCA and
Self-Organizing Map". IEEE Intelligent Vehicles Symposium.
41. Bate, Andrew; Lindquist, Marie; Edwards, I. Ralph; Olsson, Sten; Orre, Roland; Lansner, Anders; and de
Freitas, Rogelio Melhado. A Bayesian neural network method for adverse drug reaction signal generation
(http://dml.cs.byu.edu/~cgc/docs/atdm/W11/BCPNNADR.pdf), European Journal of Clinical Pharmacology
1998 Jun; 54(4): 315–21. PubMed (https://www.ncbi.nlm.nih.gov/pubmed/9696956)
42. Norén, G. Niklas; Bate, Andrew; Hopstadius, Johan; Star, Kristina; and Edwards, I. Ralph (2008). Temporal
Pattern Discovery for Trends and Transient Effects: Its Application to Patient Records. Proceedings of the
Fourteenth International Conference on Knowledge Discovery and Data Mining (SIGKDD 2008), Las Vegas,
NV, pp. 963–971.
43. Zernik, Joseph. Data Mining as a Civic Duty – Online Public Prisoners' Registration Systems
(http://www.scribd.com/doc/38328591/), International Journal on Social Media: Monitoring, Measurement,
Mining, 1: 84–96 (2010)
44. Zernik, Joseph. Data Mining of Online Judicial Records of the Networked US Federal Courts
(http://www.scribd.com/doc/38328585/), International Journal on Social Media: Monitoring, Measurement,
Mining, 1: 69–83 (2010)
45. David G. Savage (2011-06-24). "Pharmaceutical industry: Supreme Court sides with pharmaceutical industry in
two decisions" (http://articles.latimes.com/2011/jun/24/nation/lanacourtdrugs20110624). Los Angeles Times.
Retrieved 2012-11-07.
46. Analyzing Medical Data. (2012). Communications of the ACM 55 (6), 13–15.
doi:10.1145/2184319.2184324 (http://dx.doi.org/10.1145%2F2184319.2184324)
47. http://searchhealthit.techtarget.com/definition/HITECHAct
48. Healey, Richard G. (1991). Database Management Systems, in Maguire, David J.; Goodchild, Michael F.; and
Rhind, David W. (eds.), Geographic Information Systems: Principles and Applications, London, GB: Longman
49. Camara, Antonio S.; and Raper, Jonathan (eds.) (1999). Spatial Multimedia and Virtual Reality, London, GB:
Taylor and Francis
50. Miller, Harvey J.; and Han, Jiawei (eds.) (2001). Geographic Data Mining and Knowledge Discovery, London,
GB: Taylor & Francis
51. Ma, Y.; Richards, M.; Ghanem, M.; Guo, Y.; Hassard, J. (2008). "Air Pollution Monitoring and Mining Based
on Sensor Grid in London". Sensors 8 (6): 3601. doi:10.3390/s8063601
(http://dx.doi.org/10.3390%2Fs8063601).
52. Ma, Y.; Guo, Y.; Tian, X.; Ghanem, M. (2011). "Distributed Clustering-Based Aggregation Algorithm for
Spatial Correlated Sensor Networks". IEEE Sensors Journal 11 (3): 641. doi:10.1109/JSEN.2010.2056916
(http://dx.doi.org/10.1109%2FJSEN.2010.2056916).
53. Zhao, Kaidi; and Liu, Bing; Tirpark, Thomas M.; and Weimin, Xiao. A Visual Data Mining Framework for
Convenient Identification of Useful Knowledge (http://dl.acm.org/citation.cfm?id=1106390)
54. Keim, Daniel A. Information Visualization and Visual Data Mining
(http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.135.7051)
55. Burch, Michael; Diehl, Stephan; Weißgerber, Peter. Visual Data Mining in Software Archives
(http://dl.acm.org/citation.cfm?doid=1056018.1056024)
56. Pachet, François; Westermann, Gert; and Laigre, Damien. Musical Data Mining for Electronic Music
Distribution (http://www.csl.sony.fr/downloads/papers/2001/pachet01c.pdf), Proceedings of the 1st WedelMusic
Conference, Firenze, Italy, 2001, pp. 101–106.
57. Government Accountability Office, Data Mining: Early Attention to Privacy in Developing a Key DHS
Program Could Reduce Risks, GAO-07-293 (February 2007), Washington, DC
58. Secure Flight Program report (http://www.msnbc.msn.com/id/20604775/), MSNBC
59. "Total/Terrorism Information Awareness (TIA): Is It Truly Dead?"
(http://w2.eff.org/Privacy/TIA/20031003_comments.php). Electronic Frontier Foundation (official website).
2003. Retrieved 2009-03-15.
60. Agrawal, Rakesh; Mannila, Heikki; Srikant, Ramakrishnan; Toivonen, Hannu; and Verkamo, A. Inkeri. Fast
discovery of association rules, in Advances in knowledge discovery and data mining, MIT Press, 1996,
pp. 307–328
61. National Research Council, Protecting Individual Privacy in the Struggle Against Terrorists: A Framework
for Program Assessment, Washington, DC: National Academies Press, 2008
62. Haag, Stephen; Cummings, Maeve; Phillips, Amy (2006). Management Information Systems for the
information age. Toronto: McGraw-Hill Ryerson. p. 28. ISBN 0070955697. OCLC 63194770
(https://www.worldcat.org/oclc/63194770).
63. Ghanem, Moustafa; Guo, Yike; Rowe, Anthony; Wendel, Patrick (2002). "Grid-based knowledge discovery
services for high throughput informatics". "Proceedings 11th IEEE International Symposium on High
Performance Distributed Computing". p. 416. doi:10.1109/HPDC.2002.1029946
(http://dx.doi.org/10.1109%2FHPDC.2002.1029946). ISBN 0769516866.
64. Ghanem, Moustafa; Curcin, Vasa; Wendel, Patrick; Guo, Yike (2009). "Building and Using Analytical
Workflows in Discovery Net". "Data Mining Techniques in Grid Computing Environments". p. 119.
doi:10.1002/9780470699904.ch8 (http://dx.doi.org/10.1002%2F9780470699904.ch8). ISBN 9780470699904.
65. Cannataro, Mario; Talia, Domenico (January 2003). "The Knowledge Grid: An Architecture for Distributed
Knowledge Discovery" (http://grid.deis.unical.it/papers/pdf/CACM2003.pdf). Communications of the ACM 46
(1): 89–93. doi:10.1145/602421.602425 (http://dx.doi.org/10.1145%2F602421.602425). Retrieved 17 October
2011.
66. Talia, Domenico; Trunfio, Paolo (July 2010). "How distributed data mining tasks can thrive as knowledge
services" (http://grid.deis.unical.it/papers/pdf/CACM2010.pdf). Communications of the ACM 53 (7): 132–137.
doi:10.1145/1785414.1785451 (http://dx.doi.org/10.1145%2F1785414.1785451). Retrieved 17 October 2011.
67. Seltzer, William. "The Promise and Pitfalls of Data Mining: Ethical Issues"
(http://www.amstat.org/committees/ethics/linksdir/Jsm2005Seltzer.pdf).
68. Pitts, Chip (15 March 2007). "The End of Illegal Domestic Spying? Don't Count on It"
(http://www.washingtonspectator.com/articles/20070315surveillance_1.cfm). Washington Spectator.
69. Taipale, Kim A. (15 December 2003). "Data Mining and Domestic Security: Connecting the Dots to Make
Sense of Data" (http://www.stlr.org/cite.cgi?volume=5&article=2). Columbia Science and Technology Law
Review 5 (2). OCLC 45263753 (https://www.worldcat.org/oclc/45263753). SSRN 546782
(https://ssrn.com/abstract=546782).
70. Resig, John; and Teredesai, Ankur (2004). "A Framework for Mining Instant Messaging Services"
(http://citeseer.ist.psu.edu/resig04framework.html). Proceedings of the 2004 SIAM DM Conference.
71. Think Before You Dig: Privacy Implications of Data Mining & Aggregation
(http://www.nascio.org/publications/documents/NASCIOdataMining.pdf), NASCIO Research Brief, September
2004
72. Ohm, Paul. "Don't Build a Database of Ruin"
(http://blogs.hbr.org/cs/2012/08/dont_build_a_database_of_ruin.html). Harvard Business Review.
73. Darwin Bond-Graham, Iron Cagebook – The Logical End of Facebook's Patents
(http://www.counterpunch.org/2013/12/03/ironcagebook/), Counterpunch.org, 2013.12.03
74. Darwin Bond-Graham, Inside the Tech industry's Startup Conference
(http://www.counterpunch.org/2013/09/11/insidethetechindustrysstartupconference/), Counterpunch.org,
2013.09.11
75. AOL search data identified individuals (http://www.securityfocus.com/brief/277), SecurityFocus, August 2006
76. Biotech Business Week Editors (June 30, 2008). BIOMEDICINE; HIPAA Privacy Rule Impedes Biomedical
Research, Biotech Business Week, retrieved 17 November 2009 from LexisNexis Academic
77. Researchers Given Data Mining Right Under New UK Copyright Laws.
(http://www.outlaw.com/en/articles/2014/june/researchersgivendataminingrightundernewukcopyrightlaws/%7CUK)
Out-Law.com. Retrieved 14 November 2014
78. "Licences for Europe – Structured Stakeholder Dialogue 2013"
(http://ec.europa.eu/licencesforeuropedialogue/en/content/aboutsite). European Commission. Retrieved 14 November 2014.
79. "Text and Data Mining: Its importance and the need for change in Europe"
(http://libereurope.eu/news/textanddataminingitsimportanceandtheneedforchangeineurope/). Association of European
Research Libraries. Retrieved 14 November 2014.
80. "Judge grants summary judgment in favor of Google Books – a fair use victory"
(http://www.lexology.com/library/detail.aspx?g=a18c5b925a204d1da098a3095046a88e). Lexology.com.
Antonelli Law Ltd. Retrieved 14 November 2014.
81. Mikut, Ralf; Reischl, Markus (September–October 2011). "Data Mining Tools"
(http://onlinelibrary.wiley.com/doi/10.1002/widm.24/abstract). Wiley Interdisciplinary Reviews: Data Mining and
Knowledge Discovery 1 (5): 431–445. doi:10.1002/widm.24 (http://dx.doi.org/10.1002%2Fwidm.24). Retrieved
October 21, 2011.
82. Karl Rexer, Heather Allen, & Paul Gearan (2011). Understanding Data Miners
(http://www.analyticsmagazine.org/mayjune2011/320understandingdataminers), Analytics Magazine, May/June 2011
(INFORMS: Institute for Operations Research and the Management Sciences).
83. Kobielus, James. The Forrester Wave: Predictive Analytics and Data Mining Solutions, Q1 2010
(http://www.forrester.com/rb/Research/wave%26trade%3B_predictive_analytics_and_data_mining_solutions%2C/q/id/56077/t/2),
Forrester Research, 1 July 2008
84. Herschel, Gareth. Magic Quadrant for Customer Data-Mining Applications
(http://mediaproducts.gartner.com/reprints/sas/vol5/article3/article3.html), Gartner Inc., 1 July 2008
85. Nisbet, Robert A. (2006). Data Mining Tools: Which One is Best for CRM? Part 1
(http://www.informationmanagement.com/specialreports/20060124/10460251.html), Information Management
Special Reports, January 2006
86. Haughton, Dominique; Deichmann, Joel; Eshghi, Abdolreza; Sayek, Selin; Teebagy, Nicholas; and Topi, Heikki
(2003). A Review of Software Packages for Data Mining (http://www.jstor.org/pss/30037299), The American
Statistician, Vol. 57, No. 4, pp. 290–309
87. Goebel, Michael; Gruenwald, Le (1999). A Survey of Data Mining and Knowledge Discovery Software Tools
(https://wwwmatthes.in.tum.de/file/1klx69ggd5riv/Enterprise%202.0%20Tool%20Survey/Paper/A%20survey%20of%20data%20mining%20and%20knowledge%20discovery%20software%20tools.pdf),
SIGKDD Explorations, Vol. 1, Issue 1, pp. 20–33
Further reading
Cabena, Peter; Hadjnian, Pablo; Stadler, Rolf; Verhees, Jaap; and Zanasi, Alessandro (1997).
Discovering Data Mining: From Concept to Implementation, Prentice Hall, ISBN 0137439806
M. S. Chen, J. Han, P. S. Yu (1996). "Data mining: an overview from a database perspective
(http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading01/chen_tkde96.pdf)".
Knowledge and Data Engineering, IEEE Transactions on 8 (6), 866–883
Feldman, Ronen; and Sanger, James. The Text Mining Handbook, Cambridge University Press, ISBN
9780521836579
Guo, Yike; and Grossman, Robert (editors) (1999). High Performance Data Mining: Scaling
Algorithms, Applications and Systems, Kluwer Academic Publishers
Han, Jiawei, Micheline Kamber, and Jian Pei. Data mining: concepts and techniques. Morgan
Kaufmann, 2006.
Hastie, Trevor; Tibshirani, Robert; and Friedman, Jerome (2001). The Elements of Statistical
Learning: Data Mining, Inference, and Prediction, Springer, ISBN 0387952845
Liu, Bing (2007). Web Data Mining: Exploring Hyperlinks, Contents and Usage Data, Springer,
ISBN 3540378812
Murphy, Chris (16 May 2011). "Is Data Mining Free Speech?". InformationWeek (UMB): 12.
Nisbet, Robert; Elder, John; Miner, Gary (2009). Handbook of Statistical Analysis & Data Mining
Applications, Academic Press/Elsevier, ISBN 9780123747655
Poncelet, Pascal; Masseglia, Florent; and Teisseire, Maguelonne (editors) (October 2007). "Data
Mining Patterns: New Methods and Applications", Information Science Reference,
ISBN 9781599041629
Tan, Pang-Ning; Steinbach, Michael; and Kumar, Vipin (2005). Introduction to Data Mining, ISBN
0321321367
Theodoridis, Sergios; and Koutroumbas, Konstantinos (2009). Pattern Recognition, 4th Edition,
Academic Press, ISBN 9781597492720
Weiss, Sholom M.; and Indurkhya, Nitin (1998). Predictive Data Mining, Morgan Kaufmann
Witten, Ian H.; Frank, Eibe; Hall, Mark A. (30 January 2011). Data Mining: Practical Machine
Learning Tools and Techniques (3 ed.). Elsevier. ISBN 9780123748560. (See also Free Weka
software)
Ye, Nong (2003). The Handbook of Data Mining, Mahwah, NJ: Lawrence Erlbaum
External links
Wikimedia Commons has media related to Data mining.
Retrieved from "http://en.wikipedia.org/w/index.php?title=Data_mining&oldid=634499498"