BDA: Big Data Architectures and Model Management
Knowledge objectives
1. Explain the data and model management challenges in a data science
pipeline
2. Justify the need for a Data Lake for data management
3. Identify the difficulties of a Data Lake
4. Explain different model selection techniques
5. Justify the need for model management
6. Explain different model management techniques
Application Objectives
1. Given a use case, define its software architecture
2. Given a data science pipeline, manage the models and their associated
data
Data Science
Overview and Challenges
Data science (I)
Collective processes, theories, concepts, tools and technologies that enable the review,
analysis and extraction of valuable knowledge and information from raw data1
1 Source: Techopedia
Descriptive analysis (Use case)
Use case: e-commerce company selling a wide range of products
• Data: Customer data, their demographics, purchasing history
• Traditionally, focused on gathering data that are obviously valuable
• E.g., business objects representing customers, items in catalogs, purchases, contracts, etc.
• Descriptive analysis:
• Examine the frequency and distribution of purchases across different product categories
• Discover: a significant portion of the customers frequently purchase electronics (e.g., smartphones and laptops)
• Analyze their demographic information
• Discover: a large number of them belong to a young age group (18-35) and reside in urban areas
• Common tasks: ETL from multiple data sources, creating joined views, followed by filtering, aggregation, or cube materialization
• Feed report generators, front-end dashboards, and other visualization tools to support common “roll-up” and “drill-down” operations on multi-dimensional data
• Action: tailor marketing campaigns and product offerings to better cater to this specific group
Tooling: Data warehouse, BI tools
Predictive analysis (Use case)
A typical data science workflow1
[Diagram: 1. Data management: acquire data, reformat and clean data (data-centric preparation); 2. Model management: edit analysis scripts, execute scripts, inspect outputs, debug, explore alternatives (model-centric analysis); then dissemination]
1 Adapted from: Philip J. Guo. Software Tools to Facilitate Research Programming. Ph.D. dissertation, 2012.
A typical data science workflow1: Challenges
1. How to assign names to data files that are created or downloaded? How to
organize those files into directories?
2. How to keep track of provenance?
3. Where does data come from and is it still up-to-date?
4. How to store data that cannot fit on a single hard drive?
5. How to fix semantic errors, missing entries, inconsistent formatting? How to do it
efficiently for large amounts of data?
6. How to integrate data?
7. …
[Diagram: dissemination steps include make comparisons, take notes, write reports, deploy online]
1 Adapted from: Philip J. Guo. Software Tools to Facilitate Research Programming. Ph.D. dissertation, 2012.
A typical data science experiment directory1
1 Source: Philip J. Guo. Software Tools to Facilitate Research Programming. Ph.D. dissertation, 2012.
Therefore, in a real world ML system1 …
[Diagram: the ML code is only a small fraction of the overall system; it is surrounded by configuration, data collection, data verification, feature extraction, machine resource management, analysis tools, process management tools, serving infrastructure, and monitoring]
1 Source: D. Sculley et al. Hidden Technical Debt in Machine Learning Systems. NeurIPS, 2015.
Data Management Backbone
From Warehouses to Lakehouses
Michael Armbrust, Ali Ghodsi, Reynold Xin, Matei Zaharia. Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics. CIDR 2021
Data Management
Data management refers to the functionalities a DBMS must provide:
§ Ingestion: means provided to insert/upload data
§ E.g., ORACLE SQL*Loader
§ Storage: format/structures used to persist data
§ E.g., hash, B-tree, heap file
§ Modelling: arrangement of data within the available structures
§ E.g., normalization, partitioning
§ Processing: means provided to manipulate data
§ E.g., PL/SQL
§ Querying/fetching: means provided to allow users to retrieve data
§ E.g., SQL, Relational Algebra
In Big Data settings, they are the same concepts but assuming NoSQL underneath
1. Typically, a distributed system
2. Possibly with an alternative data model to the Relational one
3. Implementing ad-hoc architectural solutions
Data Management (cont.)
The majority of NoSQL databases were inspired by the necessity to be executed in clusters,
which led to “aggregate” data models: data that are accessed together.
They differ in how they structure the “aggregate” and how they allow it to be accessed.
source: https://k21academy.com/dba-to-cloud-dba/nosql-database-service-in-oracle-cloud
1st gen.: Data Warehouses
• Aid the business via analytical insights
• Extract data from operational databases
• Transform and load them into centralized DWs
• Schema-on-write
• The data model is optimized for BI operations
• Some challenges
• Compute and storage in on-premise appliances
• Requires provisioning and paying for peak workloads / large datasets
• Unstructured data
• Video, audio, text documents
• DWs cannot deal with these formats
Model-First (Load-Later)
[Diagram: three sources are ETLed into a fixed target schema. Sources: Twitter API (JSON), user feedback (user, tweet, date, location); Web Logs, user web behaviour (user, product, landing time, visits ts); In-house DB (PostgreSQL), product info (product, product features). Target schema: User, Interested In (avg(sentiment); keen: avg(landing time)/#visits), Product (popularity, top feature, bottom feature), Is part of]
Drawbacks of Model-First (Load-Later)
- Costly: on-premise appliance, provisioned and paid for the peak of user load
- Unstructured data: more and more datasets are completely unstructured; DWs cannot cope well
[Diagram: the same fixed target schema as before (Product, Interested In, User), highlighting that modelling at load time fixes the target schema up front]
2nd gen.: Data Lakes
• Idea: Load-First, Model-Later
• Modelling at load time restricts the
potential analysis that can be done
later (Big Analytics)
• Characteristics:
a) Store raw data
b) Low cost storage
c) Create on-demand
selection/processing to handle
precise analysis needs
d) Initiated by the Apache Hadoop
movement
Load-First (Model-Later)
[Diagram: the raw sources (Twitter API (JSON), user feedback; Web Logs, user web behaviour; In-house DB (relational), product info) are loaded as-is into a data repository. Analysts then build data views on demand: Analyst 1 derives Product (popularity, top feature, bottom feature), Interested In (avg(sentiment); keen: avg(landing time)/#visits), User (avg rating, list of preferences); Analyst 2 derives User (avg rating, list of preferences), Assesses (avg(sentiment)), Product, Is part of, Feature]
Drawbacks of Load-First (Model-Later)
[Diagram: the same data repository and analyst views as before, but without semantics the repository degenerates into a Data Swamp: each analyst must write complex transformations over the raw sources to build their data views]
Stonebraker (2014)
Towards semantic-awareness
[Diagram: the data repository becomes a semantic-aware data repository by adding a metadata catalog describing each file (File 1: user, tweet, date, location; File 2: user, product, landing time, visits ts; File 3: product, product features). Analysts still build their data views (Interested In, Assesses, Is part of), now guided by the catalog]
From IT-Centered to User-Centered
[Diagram: the same semantic-aware data repository with its metadata catalog; the catalog shifts the effort of building data views from IT to the analysts themselves]
A possible implementation
Sergi Nadal, Petar Jovanovic, Besim Bilalli, and Oscar Romero. Operationalizing and automating Data Governance. Journal of Big Data 2022.
A possible instantiation
Sergi Nadal, Petar Jovanovic, Besim Bilalli, and Oscar Romero. Operationalizing and automating Data Governance. Journal of Big Data 2022.
Heterogeneity of data formats
• General-Purpose Formats
• CSV (comma-separated values), JSON (JavaScript Object Notation), XML, Protobuf
• CLI/API access to DBs, KV-stores, Doc-stores, Time series DBs, etc.
• Sparse Matrix Formats
• Matrix Market: text IJV (row, col, value)
• LibSVM: text compressed sparse rows
• Scientific formats: NetCDF, HDF5
• Large-Scale Data Formats
• Parquet (columnar file format)
• Arrow (cross-platform columnar in-memory data)
• Domain-Specific Formats
• Health care: DICOM images, HL7 messages (Health Level Seven XML)
• Automotive: MDF (measurements), CDF (calibrations), ADF (auto-lead XML)
Difference between Parquet and CSV
• Columnar (hybrid) storage that brings efficiency compared to row-based files like CSV
• When querying, you can quickly skip over the non-relevant data, and aggregation queries are less time-consuming -> hardware savings and minimized latency
• Built from the ground up to support advanced data structures
• The layout is optimized for queries that process large volumes of data
• Supports flexible compression options and efficient encoding schemes
source: https://www.databricks.com/glossary/what-is-parquet
Parquet
• Row groups (RG) - horizontal partitions
• Data vertically partitioned within RGs
• Statistics per row group (aid filtering)
• E.g., min-max
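The min-max statistics above enable predicate pushdown: a reader can prove that a whole row group cannot match a filter and skip it entirely. A minimal sketch of that idea in pure Python (toy data and function names, not the actual Parquet reader):

```python
# Sketch (pure Python, hypothetical data): how per-row-group min-max
# statistics let a reader skip row groups that cannot match a filter.

ROW_GROUP_SIZE = 4

def build_row_groups(values, size=ROW_GROUP_SIZE):
    """Partition a column horizontally and record min-max stats per group."""
    groups = []
    for i in range(0, len(values), size):
        chunk = values[i:i + size]
        groups.append({"rows": chunk, "min": min(chunk), "max": max(chunk)})
    return groups

def scan_greater_than(groups, threshold):
    """Scan only row groups whose stats overlap the predicate `value > threshold`."""
    result, groups_read = [], 0
    for g in groups:
        if g["max"] <= threshold:      # stats prove no row qualifies -> skip I/O
            continue
        groups_read += 1
        result.extend(v for v in g["rows"] if v > threshold)
    return result, groups_read

groups = build_row_groups([1, 2, 3, 4, 10, 11, 12, 13, 90, 95, 99, 97])
matches, read = scan_greater_than(groups, 50)
# Only the last row group is read; the first two are skipped via min-max stats.
```

In real Parquet files the statistics live in the file footer per column chunk, so the skip decision needs only footer I/O.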
3rd gen.: Cloud Data Lakes
• Cloud Data Lakes
• Superior availability
• Geo-replication
• Low cost (pay as you go and elasticity)
• AWS S3, AWS Glacier, Azure Data Lake Storage (ADLS), Google Cloud Storage
(GCS)
• Two-tier data architectures
• Same as 2nd gen, but the DW also resides on the cloud
• AWS Redshift, Snowflake
Drawbacks of 3rd gen. Data Analytics Platforms
• Reliability
• Keeping the DL and the DW consistent is difficult and costly
• Data staleness
• Data in DW is stale compared to that of the DL
• This is a step backwards w.r.t. 1st gen where operational data was quickly available
• Limited support for advanced analytics
• None of the leading ML systems (TensorFlow, PyTorch or XGBoost) work well on
top of DWs
• These systems need to process large datasets using complex non-SQL code
• Accessing the DL, all nice features from the DW are lost (transactions, data versioning,
indexing,…)
• Total cost of ownership
• You pay double for the stored data (in the DL and in the DW)
A 4th gen.: Lakehouses
• Question: “is it possible to turn data lakes based on standard open data formats, such as Parquet and ORC, into high-performance systems that can provide both the performance and management features of data warehouses and fast, direct I/O from advanced analytics workloads?”
• The Lakehouse
• Reliable data management on data lakes
• Not just a “bunch of files”
• Support for ML and data science
• Declarative DataFrame APIs
• SQL performance
• Optimize open data layouts to compete with DWs
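“Not just a bunch of files” means a table snapshot is defined by a transaction log rather than by directory listing. A toy sketch of that idea in pure Python, inspired by Delta Lake's `_delta_log` design (all names here are hypothetical):

```python
# Toy sketch of reliable data management on a data lake: the current
# table state is computed by replaying an ordered log of committed
# add/remove file actions, not by listing whatever files exist.

log = []  # ordered list of commits (in real systems: numbered JSON files)

def commit(actions):
    """Atomically append one commit (a list of (op, path) actions)."""
    log.append(actions)

def snapshot():
    """Replay the log to compute the current set of live data files."""
    live = set()
    for actions in log:
        for op, path in actions:
            if op == "add":
                live.add(path)
            elif op == "remove":
                live.discard(path)
    return live

commit([("add", "part-0.parquet"), ("add", "part-1.parquet")])
commit([("remove", "part-0.parquet"), ("add", "part-2.parquet")])  # compaction

# snapshot() == {"part-1.parquet", "part-2.parquet"}
```

Because readers see only files referenced by a committed log entry, half-written files are invisible, which is what gives the lake DW-style transactional behaviour.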
Data Analysis Backbone
The need for Model Management
*Some of the following slides are borrowed from the ‘Architectures of ML Systems’ course of Matthias Boehm, TU Berlin
Traditional Software vs AI/Data Science1
          Software                          AI (Software + Data)
Goal      Functional correctness           Optimization of a metric, e.g., minimize loss
Quality   Depends on code                  Depends on data, code, model architecture, hyperparameters, random seeds, …
Outcome   Works deterministically          Changes due to data drift
People    Software Engineers               Software Engineers, Data Scientists, Data Engineers, Research Scientists, ML engineers
Tooling   Usually standardized within a    Often heterogeneous even within teams; few established
          dev team; established/hardened   standards and in constant change due to open source
          over decades                     innovation
1 Clemens Mewald: Announcing Databricks Machine Learning, Feature Store, AutoML, Keynote Data+AI Summit 2021.
The Data Analysis Backbone
[Diagram: the same data science workflow as before; the Data/SW Engineer owns 1. Data management (acquire data, reformat and clean data; preparation), while the Data Scientist owns 2. Model management (edit analysis scripts, execute scripts, inspect outputs, debug, explore alternatives) and dissemination]
1 Adapted from: Philip J. Guo. Software Tools to Facilitate Research Programming. Ph.D. dissertation, 2012.
[Slide: the ML tooling landscape is a thriving ecosystem of innovation, and a procurement and DevOps nightmare]
ML Lifecycle Management
[Clemens Mewald: Announcing Databricks Machine Learning, Feature Store, AutoML, Keynote Data+AI Summit 2021]
AutoML Overview
[Chris Thornton, Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. KDD 2013]
• #1 Model Selection
• Given a dataset and ML task (e.g., classification or regression)
• Select the model (type) that performs best
(e.g.: LogReg, Naïve Bayes, SVM, Decision Tree, Random Forest, DNN)
• #2 Hyperparameter Tuning
• Given a model and dataset, find the best hyperparameter values
(e.g., learning rate, regularization, kernels, kernel params, tree params)
• Validation: Generalization Error
• Goodness of fit to held-out data (e.g., 80-20 train/test)
• Cross validation (e.g., 10-fold cross validation, leave-one-out)
• AutoML Systems/Services
• Often providing both model selection and hyperparameter search
• Integrated ML system, often in distributed/cloud environments
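The cross-validation step above can be sketched in a few lines of pure Python. The learner here is deliberately trivial (a mean predictor) so the fold bookkeeping stays visible; the data and function names are illustrative:

```python
# Sketch (pure Python, toy data): estimating generalization error with
# k-fold cross validation. Each fold is held out once; the model is fit
# on the remaining folds and scored on the held-out fold.

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds."""
    fold_size, folds, start = n // k, [], 0
    for i in range(k):
        extra = 1 if i < n % k else 0
        folds.append(list(range(start, start + fold_size + extra)))
        start += fold_size + extra
    return folds

def cross_val_error(ys, k=5):
    """Average held-out squared error of a mean predictor over k folds."""
    folds, errors = k_fold_indices(len(ys), k), []
    for held_out in folds:
        train = [ys[i] for i in range(len(ys)) if i not in held_out]
        prediction = sum(train) / len(train)          # "fit" on training folds
        fold_err = sum((ys[i] - prediction) ** 2 for i in held_out) / len(held_out)
        errors.append(fold_err)
    return sum(errors) / k

ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
err = cross_val_error(ys, k=5)   # average held-out error across 5 folds
```

Leave-one-out is the special case k = n; in practice shuffled (or stratified) folds are used instead of contiguous ones.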
Basic Grid Search
• Basic approach
• Given n hyperparameters λ1, …, λn with domains Λ1, …, Λn
• Enumerate and evaluate the parameter space Λ ⊆ Λ1 × … × Λn
(often a strict subset due to the dependency structure of the params)
• Continuous hyperparameters -> discretization
• Equi-width
• Exponential (e.g., regularization 0.1, 0.01, 0.001, etc.)
• Problem: only applicable with small domains
• Heuristic: Monte-Carlo (random search, anytime)
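The enumeration above can be sketched directly with `itertools.product`. The objective function below is a made-up stand-in for a validation score, and the grid values are illustrative:

```python
# Sketch (pure Python, toy objective): enumerating a discretized
# hyperparameter grid Λ1 × Λ2 and keeping the best configuration.
from itertools import product

grid = {
    "reg": [0.1, 0.01, 0.001],   # exponential discretization
    "lr": [0.05, 0.10, 0.15],    # equi-width discretization
}

def validation_score(reg, lr):
    """Hypothetical validation accuracy; peaks at reg=0.01, lr=0.10."""
    return 1.0 - abs(reg - 0.01) - abs(lr - 0.10)

best_cfg, best_score = None, float("-inf")
for reg, lr in product(grid["reg"], grid["lr"]):   # enumerate Λ1 × Λ2
    score = validation_score(reg, lr)              # train + validate in practice
    if score > best_score:
        best_cfg, best_score = (reg, lr), score

# best_cfg == (0.01, 0.10)
```

The cost is the product of the domain sizes (here 3 × 3 = 9 evaluations), which is exactly why grid search only works for small domains and random search is the common anytime fallback.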
Basic Grid Search, cont.
• Example: Adult dataset (train 32,561 x 14)
• Binary classification (>50K)
• #1 MLogReg defaults with one-hot categoricals: Accuracy (%): 82.35
• #2 MLogReg defaults with one-hot + binning: Accuracy (%): 84.73
• #3 GridSearch MLogReg: Accuracy (%): 90.07
More sophisticated methods
• Simulated Annealing
• Recursive Random Search
• Bayesian Optimization
• Sequential Model-Based Optimization
• Fit a probabilistic model based on the first n-1 evaluated hyperparameters
• Use the model to select the next candidate
• Gaussian process (GP) models, or tree-based Bayesian Optimization
[Figure: example 1D problem, Gaussian process after 4 iterations]
[Eric Brochu, Vlad M. Cora, Nando de Freitas: A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. CoRR 2010]
Selected AutoML Systems
• Auto-WEKA
• Bayesian optimization with 28 learners, 11 ensemble/meta methods
• auto-sklearn
• Bayesian optimization with 15 classifiers, 14 feature prep, 4 data prep
• TuPAQ
• Multi-armed bandit and large-scale
• TPOT
• Genetic programming
• Other Services
• Azure ML, Amazon ML
• Google AutoML, H2O AutoML
Model Management and
Provenance
Overview Model Management
• Motivation
• Exploratory data science process -> trial and error
(preparation, feature engineering, model selection)
• Different personas (data engineer, ML expert, devops)
• Problems
• No record of experiments, insights lost along the way
• Difficult to reproduce results
• Cannot search or query models
• Difficult to collaborate
• Overview
• Experiment tracking and visualization
• Coarse-grained ML pipeline provenance and versioning
• Fine-grained data provenance (data-/ops-oriented)
Model Management Systems (MLOps)
[Hui Miao, Ang Li, Larry S. Davis, Amol Deshpande: ModelHub: Deep Learning Lifecycle Management. ICDE 2017]
ModelHub
• Versioning system for DNN models, including provenance tracking
• DSL for model exploration and enumeration queries (model selection + hyperparameters)
[Manasi Vartak, Samuel Madden:
Model Management Systems (MLOps), cont.
• MLflow [https://mlflow.org]
• An open source platform for the machine learning lifecycle
• Use of existing ML systems and various language bindings
• MLflow Tracking: logging and querying experiments
• MLflow Projects: packaging/reproduction of ML pipeline results
• MLflow Models: deployment of models in various services/tools
• MLflow Model Registry: cataloging models and managing in deployment
[Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski,
Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, Corey Zumar:
Accelerating the Machine Learning Lifecycle with MLflow. IEEE Data Eng. Bull. 41(4) 2018]
MLflow Example
MLflow UI
• Run mlflow ui in the command line on top of the folder mlruns
• Open http://localhost:5000