
Data Science Syllabus

This document outlines the topics covered across five units on data science. Unit 1 introduces data processing architectures and the challenges of big data analysis, and discusses providing structure to unstructured data through machine translation, autocoding, indexing and term extraction. Unit 2 covers data types, classes of analytic techniques, and feature engineering. Unit 3 focuses on simple analytic techniques such as frequency distributions, mean and standard deviation, as well as deeper analysis such as clustering, classifying, modelling and finding relationships in data. Unit 4 applies text mining and text processing techniques to information retrieval. Finally, Unit 5 uses a case study to examine the challenges of large-scale data and frameworks such as MapReduce and Hadoop. Reading materials with more details on these topics are also provided.
Unit 1
Introduction
Introduction: Data Processing Architectures, Components and Processes, Data Stores and Data Kind, Challenges: "Big Data" and Otherwise
Special Considerations in Big Data Analysis: Background, Theory in Search of Data, Data in Search of Theory, Overfitting, Bigness Bias, Too Much Data, Fixing Data, Data Subsets in Big Data: Neither Additive nor Transitive, Additional Big Data Pitfalls
Providing Structure to Unstructured Data: Background, Machine Translation, Autocoding, Indexing and Term Extraction
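Unit 1 lists indexing and term extraction as one way of giving structure to unstructured text. As a minimal sketch, assuming a hypothetical stop-word list and a simple letters-only tokenizer (neither comes from the syllabus or its readings), an inverted index maps each extracted term to the documents that contain it, which then supports simple boolean queries:

```python
import re
from collections import defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def extract_terms(text):
    """Term extraction: lower-case, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Inverted index: map each term to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in extract_terms(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Boolean AND query: ids of documents that contain every query term."""
    terms = extract_terms(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], []))
    for term in terms[1:]:
        hits &= set(index.get(term, []))
    return sorted(hits)


docs = {
    1: "Big data needs structure.",
    2: "Indexing gives unstructured text a structure.",
    3: "Machine translation of unstructured data.",
}
index = build_index(docs)
print(search(index, "unstructured structure"))  # [2]
```

Real systems add stemming, positional postings and ranking; this sketch only shows the term-to-document mapping.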
Unit 2
Data and Features
Component Parts of Data Science: Data Types, Classes of Analytic Techniques, Learning Models, Execution Models, Fractal Analytic Model, Analytic Selection Process: Implementation Constraints
Feature Engineering: Feature Selection, Data Veracity, Application of Domain Knowledge, Curse of Dimensionality
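Feature selection in Unit 2 often starts by removing redundant features, and the first entry under "Other Interesting Reading Material" describes a Pearson-redundancy-based filter. The sketch below is a simplified illustration of correlation-based redundancy filtering, not the cited paper's method: the greedy keep-the-earlier-feature rule, the 0.9 threshold and the sample columns are all assumptions chosen for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # undefined for a constant column; treated as unrelated here
    return cov / (norm_x * norm_y)


def redundancy_filter(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold.

    `features` maps names to value columns; dict order decides which member of a
    redundant pair survives (the earlier one).
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # same quantity in other units
    "daily_steps": [8000, 12000, 5000, 9000, 7000],
}
print(redundancy_filter(features))  # ['height_cm', 'daily_steps']
```

Dropping near-duplicate columns like this shrinks the feature space before modelling, which is one practical response to the curse of dimensionality.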
Unit 3
Data and Analysis
Simple Analytic Techniques: Background, Look at the Data, Data Range, Denominator, Frequency Distributions, Mean and Standard Deviation, Estimation-Only Analyses
Deep Dive into Analysis: Background, Analytic Tasks, Clustering, Classifying, Recommending, and Modelling, Data Reduction, Normalising and Adjusting Data, Find Relationships, Not Similarities
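Clustering is one of the analytic tasks in Unit 3, and readings 10 and 11 concentrate on k-means. A minimal sketch of Lloyd's k-means iteration in plain Python, assuming points are equal-length tuples; seeding with the first k points and the sample data are assumptions made so the example is reproducible:

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm for k-means on points given as equal-length tuples.

    Seeding with the first k points is an assumption made for reproducibility;
    practical implementations usually seed randomly or with k-means++.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 8.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
print(centroids)  # approximately [(1.17, 1.17), (8.5, 8.5)]
```

Each iteration can only lower the total squared distance to the centroids, which is why the loop settles; the result still depends on the seeding, so a poor start can give a poor local optimum.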
Unit 4
Applying Nuances of Data Science to Text Processing and Information Retrieval
Text Mining: Definition, Generic Architecture, Text Mining Operations, Frequent Itemset Mining, Categorization, Document Representation, Clustering and Categorization, Bayesian Classifier
Text Processing: Tokenization, Stemming, Stop Words, n-Grams, Categorization, Information Extraction
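Unit 4 pairs text processing (tokenization, stop words, n-grams) with a Bayesian classifier for categorization. A minimal sketch chaining these steps in plain Python; the regex tokenizer, the stop-word list and the toy training sentences are hypothetical, stemming is omitted, and the classifier is multinomial naive Bayes with add-one smoothing, which is one common variant rather than the only one:

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "on"}


def tokenize(text):
    """Tokenization: lower-case the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text):
    """Tokenize, then drop stop words (no stemming in this sketch)."""
    return [t for t in tokenize(text) if t not in STOP_WORDS]


def ngrams(tokens, n):
    """All contiguous n-token sequences; n=2 gives bigrams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NaiveBayes:
    """Multinomial naive Bayes on word counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(preprocess(doc))
        self.vocab = set().union(*self.word_counts.values())
        # Log prior: log of the fraction of training documents in each class.
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = self.log_prior[c]
            for w in preprocess(doc):
                if w in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (total + v))
            scores[c] = score
        return max(scores, key=scores.get)


train_docs = [
    "The match ended in a draw",
    "A great goal in the final match",
    "The election results are in",
    "Voters went to the polls for the election",
]
train_labels = ["sport", "sport", "politics", "politics"]
clf = NaiveBayes().fit(train_docs, train_labels)
print(clf.predict("Who scored the goal in the match?"))  # sport
print(ngrams(preprocess("the final match ended"), 2))  # [('final', 'match'), ('match', 'ended')]
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow on longer documents.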
Unit 5
Big Nature of Data: Case Study
MapReduce, The Paper: Programming Model, Types and Examples, Implementation and Execution Architecture, Partitioning, Types, Combiners, Data Locality
Hadoop: Challenges at Large Scale and the Hadoop Approach, HDFS, MapReduce in Hadoop
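Unit 5 studies the MapReduce programming model from the MapReduce paper (reading 4), including partitioning and combiners. The following single-process simulation of the word-count example is only a sketch of the data flow; it is not Hadoop's API, and the function names are hypothetical. The crc32-based partitioner stands in for the paper's default "hash(key) mod R" partitioning, because Python's built-in string hash is randomized per process.

```python
import zlib
from collections import defaultdict


def map_fn(doc_name, text):
    """Map: emit (word, 1) for each word. doc_name mirrors the input key and is unused."""
    for word in text.lower().split():
        yield word, 1


def combine(pairs):
    """Combiner: sum counts per word on the map side to shrink the shuffle."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def partition(key, num_reducers):
    """Partitioner in the style of hash(key) mod R; crc32 is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_fn(word, counts):
    """Reduce: add up every partial count for one word."""
    return word, sum(counts)


def run_job(inputs, num_reducers=2):
    """Simulate one job in a single process: map, combine, shuffle, reduce."""
    # Shuffle target: one bucket per reduce task, grouping values by key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for name, text in inputs.items():  # each input stands in for one map task
        for word, count in combine(map_fn(name, text)):
            buckets[partition(word, num_reducers)][word].append(count)
    results = {}
    for bucket in buckets:  # each bucket stands in for one reduce task
        for word in sorted(bucket):  # a reducer sees its keys in sorted order
            key, total = reduce_fn(word, bucket[word])
            results[key] = total
    return results


inputs = {"doc1": "the quick fox", "doc2": "the lazy dog the end"}
print(sorted(run_job(inputs).items()))
# [('dog', 1), ('end', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 3)]
```

The combiner is safe here because addition is associative and commutative, so pre-summing on the map side does not change the final counts; the number of reducers changes only where keys are processed, not the result.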
Reading Material (in no particular order of precedence):
1. Principles of Big Data: Preparing, Sharing and Analyzing Complex Information, Jules J. Berman, First Edition, Morgan Kaufmann, 2013.
2. The Field Guide to Data Science
http://www.boozallen.com/media/file/TheFieldGuidetoDataScience.pdf
3. Understanding Big Data
ftp://129.35.224.12/software/tw/Defining_Big_Data_through_3V_v.pdf
4. Jeffrey Dean and Sanjay Ghemawat, Google, MapReduce: Simplified Data Processing on Large Clusters
http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf
5. Hadoop Tutorial
http://developer.yahoo.com/hadoop/tutorial/
6. Scalable SQL and NoSQL Data Stores
http://www.sigmod.org/publications/sigmodrecord/1012/pdfs/04.surveys.cattell.pdf
7. Oracle Information Architecture
http://www.oracle.com/technetwork/topics/entarch/articles/oeabigdataguide1522052.pdf
8. Challenges and Opportunities with Big Data
http://www.purdue.edu/discoverypark/cyber/assets/pdfs/BigDataWhitePaper.pdf

9. Feature Engineering in Text Problems
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.36.9770&rep=rep1&type=pdf
10. Clustering Explained (read at least up to the Vector Space Model and K-Means)
http://www.iula.upf.edu/materials/040701wanner.pdf
K-Means Broken Down
http://www.engineering.uiowa.edu/~ie_155/Lecture/Kmeans.pdf
11. K-Means Explained (especially the reasoning behind k-means)
http://www.croce.ggf.br/dados/K%20mean%20Clustering1.pdf
12. Naive Bayes Broken Down
http://software.ucv.ro/~cmihaescu/ro/teaching/AIR/docs/Lab4NaiveBayes.pdf
13. Text Mining Slides
http://www.vscse.org/summerschool/2013/Abbott.pdf
14. Text Mining Handbook (Chapters 1, 2, 4 and 5)
http://www.roelsbeestenboel.nl/text.pdf

Other Interesting Reading Material:
1. Feature Selection for High-Dimensional Data: A Pearson Redundancy Based Filter
http://kzi.polsl.pl/~jbiesiada/prace/selekcja/07Wroclaw.pdf
2. On the Role of Feature Selection in Machine Learning: Thesis on Feature Engineering
http://www.cs.huji.ac.il/labs/learning/Theses/Navot_PhD.pdf
3. Feature Selection Methods (Good Thesis)
http://www.cs.cmu.edu/~kdeng/thesis/feature.pdf
http://www.dccia.ua.es/~boyan/papers/TesisBoyan.pdf
A survey: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.1337&rep=rep1&type=pdf
4. Dimensionality Reduction
http://infolab.stanford.edu/~ullman/mmds/ch11.pdf
5. Bayes Explained (Slides; Very Useful)
http://www.stanford.edu/class/cs124/lec/naivebayes.pdf
6. Naive Bayes Classifiers with Examples (Very Intuitive Explanation)
http://www.cs.ucr.edu/~eamonn/CE/Bayesian%20Classification%20withInsect_examples.pdf
