Lecture 6 Data Preprocessing


Data Preprocessing

Lecture 6

Conducted by
Ms. Akila Brahmana
Department of ICT
Faculty of Technology
University of Ruhuna
Outline
❑ Why preprocess the data?
❑ Data cleaning
❑ Data integration and transformation
❑ Data reduction
❑ Discretization and concept hierarchy generation
❑ Summary
Why Data Preprocessing?
❑ Data in the real world is dirty
1. Incomplete: lacking attribute values, lacking certain attributes
of interest, or containing only aggregate data
❑ e.g., occupation=“ ”
2. Noisy: containing errors or outliers
❑ e.g., Salary=“-10”
3. Inconsistent: containing discrepancies in codes or names
❑ e.g., Age=“42” Birthday=“03/07/1997”
❑ e.g., Was rating “1,2,3”, now rating “A, B, C”
❑ e.g., discrepancy between duplicate records
Why Data Preprocessing?
❑ No quality data, no quality mining results!
❑ Quality decisions must be based on quality data
❑ Data warehouse needs consistent integration of quality data
Why Is Data Dirty?
❑ Incomplete data may come from
❑ “Not applicable” data value when collected
❑ Different considerations between the time when the data was collected and when
it is analyzed.
❑ Human/hardware/software problems
❑ Noisy data (incorrect values) may come from
❑ Faulty data collection instruments
❑ Human or computer error at data entry
❑ Errors in data transmission
❑ Inconsistent data may come from
❑ Different data sources
❑ Functional dependency violation (e.g., modify some linked data)
❑ Duplicate records also need data cleaning
Why Is Data Preprocessing Important?
❑ No quality data, no quality mining results!
❑ Quality decisions must be based on quality data
❑ E.g. duplicate or missing data may cause incorrect or even
misleading statistics
❑ Data warehouse needs consistent integration of quality data
❑ Data extraction, cleaning, and transformation comprise the majority of the
work of building a data warehouse
Knowledge Discovery Process, in practice
Multi-Dimensional Measure of Data Quality
❑ A well-accepted multidimensional view:
❑ Accuracy
❑ Completeness
❑ Consistency
❑ Timeliness
❑ Believability
❑ Value added
❑ Interpretability
❑ Accessibility
Major Tasks in Data Preprocessing
❑ Data cleaning
❑ Fill in missing values, smooth noisy data, identify or remove
outliers, and resolve inconsistencies
❑ Data integration
❑ Integration of multiple databases, data cubes, or files
❑ Data transformation
❑ Normalization and aggregation
Major Tasks in Data Preprocessing
❑ Data reduction
❑ Obtains reduced representation in volume but produces the
same or similar analytical results
❑ Data discretization
❑ Part of data reduction but with particular importance,
especially for numerical data
Forms of data preprocessing
Data Preprocessing
❑ Why preprocess the data?
❑ Data cleaning
❑ Data integration and transformation
❑ Data reduction
❑ Discretization and concept hierarchy generation
❑ Summary
Data Cleaning Steps
❑ Data acquisition and metadata
❑ Missing values
❑ Unified date format
❑ Converting nominal to numeric
❑ Discretization of numeric data
Data Cleaning: Acquisition
❑ Data can be in DBMS
❑ ODBC, JDBC protocols
❑ Data in a flat file
❑ Fixed-column format
❑ Delimited format: tab, comma “,” , other
❑ E.g. Weka “arff” files use comma-delimited data
❑ Verify the number of fields before and after
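As a rough illustration of the last point, a minimal Python sketch (the file name is a made-up example) that verifies every row of a comma-delimited file has the same number of fields as the header:

```python
import csv

# Hypothetical comma-delimited input file
with open("customers.csv", newline="") as f:
    reader = csv.reader(f)
    expected = len(next(reader))          # number of fields in the header row
    for line_no, row in enumerate(reader, start=2):
        if len(row) != expected:
            print(f"line {line_no}: expected {expected} fields, got {len(row)}")
```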
Types of Attributes
❑ There are different types of attributes
❑ Nominal
❑ Examples: ID numbers, eye color, zip codes
❑ Ordinal
❑ Examples: rankings (e.g., taste of potato chips on a scale
from 1-10), grades, height in {tall, medium, short}
❑ Interval
❑ Examples: calendar dates, temperatures in Celsius or Fahrenheit
❑ Ratio
❑ Examples: temperature, length, time, counts
Data Cleaning
❑ Data cleaning tasks

❑ Fill in missing values

❑ Identify outliers and smooth out noisy data

❑ Correct inconsistent data


Missing Data
❑ Data is not always available
❑ E.g., many tuples have no recorded value for several attributes, such
as customer income in sales data
❑ Missing data may be due to
❑ equipment malfunction
❑ inconsistent with other recorded data and thus deleted
❑ data not entered due to misunderstanding
❑ certain data may not be considered important at the time of entry
❑ history or changes of the data were not registered
❑ Missing data may need to be inferred.
How to Handle Missing Data?
❑ Ignore the tuple: usually done when class label is missing (assuming the tasks in
classification—not effective when the percentage of missing values per attribute
varies considerably)
❑ Fill in the missing value manually: tedious + infeasible?
❑ Use a global constant to fill in the missing value: e.g., “unknown”, a new class?!
❑ Use the attribute mean to fill in the missing value
❑ Use the attribute mean for all samples belonging to the same class to fill in the
missing value: smarter
❑ Use the most probable value to fill in the missing value: inference-based such as
Bayesian formula or decision tree
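A minimal pandas sketch of two of these strategies (the column names and values are hypothetical): filling with the overall attribute mean, and the smarter class-conditional mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [30000, np.nan, 52000, np.nan, 61000],
    "class":  ["low",  "low", "high", "high", "high"],
})

# Fill with the overall attribute mean
df["income_mean"] = df["income"].fillna(df["income"].mean())

# Fill with the attribute mean of samples belonging to the same class
df["income_class_mean"] = df["income"].fillna(
    df.groupby("class")["income"].transform("mean")
)
print(df)
```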
Data Cleaning: Unified Date Format
❑ We want to transform all dates to the same format internally
❑ Some systems accept dates in many formats
❑ e.g. “Sep 24, 2003” , 9/24/03, 24.09.03, etc
❑ dates are transformed internally to a standard value
❑ Frequently, just the year (YYYY) is sufficient
❑ For more details, we may need the month, the day, the hour, etc
❑ Representing date as YYYYMM or YYYYMMDD
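A small Python sketch of such a transformation, assuming the three input formats shown above are the ones we expect to see:

```python
from datetime import datetime

FORMATS = ["%b %d, %Y", "%m/%d/%y", "%d.%m.%y"]   # "Sep 24, 2003", 9/24/03, 24.09.03

def to_yyyymmdd(text):
    """Try each known input format and return the date as YYYYMMDD."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y%m%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")

for s in ["Sep 24, 2003", "9/24/03", "24.09.03"]:
    print(s, "->", to_yyyymmdd(s))      # all map to 20030924
```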
Noisy Data
❑ Noise: random error or variance in a measured variable
❑ Incorrect attribute values may be due to
❑ faulty data collection instruments
❑ data entry problems
❑ data transmission problems
❑ technology limitation
❑ inconsistency in naming convention
❑ Other data problems which require data cleaning
❑ duplicate records
❑ incomplete data
❑ inconsistent data
Data Cleaning: Example
❑ Original data (fixed column format)
000000000130.06.19971979--10-3080145722 #000310 111000301.01.000100000000004
0000000000000.000000000000000.000000000000000.000000000000000.000000000000000.000000000000000.0000
00000000000. 000000000000000.000000000000000.0000000...…
000000000000000.000000000000000.000000000000000.000000000000000.000000000000000.000000000000000.00
0000000000000.000000000000000.000000000000000.000000000000000.000000000000000.000000000000000.0000
00000000000.000000000000000.000000000000000.000000000000000.000000000000000.000000000000000.000000
000000000.000000000000000.000000000000000.000000000000000.00 0000000000300.00 0000000000300.00

❑ Clean data
0000000001,199706,1979.833,8014,5722 , ,#000310 ….
,111,03,000101,0,04,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0300,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0300,0300.00
How to Handle Noisy Data?
❑ Binning method:
❑ first sort data and partition into (equi-depth) bins
❑ then smooth by bin means, smooth by bin median, smooth by bin
boundaries, etc.
❑ Clustering
❑ detect and remove outliers
❑ Combined computer and human inspection
❑ detect suspicious values and check by human
❑ Regression
❑ smooth by fitting the data into regression functions
Simple Discretization Methods: Binning
❑ Equal-width (distance) partitioning:
❑ It divides the range into N intervals of equal size: uniform grid
❑ if A and B are the lowest and highest values of the attribute, the width of intervals
will be: W = (B-A)/N.
❑ The most straightforward
❑ But outliers may dominate presentation
❑ Skewed data is not handled well.

❑ Equal-depth (frequency) partitioning:
❑ It divides the range into N intervals, each containing approximately the same
number of samples
❑ Good data scaling
❑ Managing categorical attributes can be tricky.
Binning Methods for Data Smoothing
* Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
* Partition into (equi-depth) bins:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34
* Smoothing by bin means:
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
* Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34
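A short Python sketch that reproduces this example, partitioning into equi-depth bins of size 4 and then smoothing by bin means and by bin boundaries:

```python
prices = sorted([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34])
depth = 4
bins = [prices[i:i + depth] for i in range(0, len(prices), depth)]

# Smoothing by bin means: every value becomes its bin's (rounded) mean
by_means = [[round(sum(b) / len(b))] * len(b) for b in bins]

# Smoothing by bin boundaries: every value moves to the closer of min/max
by_bounds = [[min(b) if v - min(b) <= max(b) - v else max(b) for v in b]
             for b in bins]

print(by_means)    # [[9, 9, 9, 9], [23, 23, 23, 23], [29, 29, 29, 29]]
print(by_bounds)   # [[4, 4, 4, 15], [21, 21, 25, 25], [26, 26, 26, 34]]
```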
Clustering
❑ Partition data set into clusters, and one can store cluster representation only
❑ Can be very effective if data is clustered but not if data is “smeared”
❑ Can have hierarchical clustering and be stored in multi-dimensional index tree
structures
❑ There are many choices of clustering definitions and clustering algorithms
Cluster Analysis
Regression Analysis
Regression analysis makes it possible to predict one variable on the basis of
one or more other variables.
Regress Analysis and Log-Linear Models
Linear regression: Data are modeled to fit a straight line
Y = α + βX
Two parameters, α and β, specify the line and are to be estimated from the data
at hand, using the least squares criterion on the known values of
(X1, Y1), (X2, Y2), ….
Ex: Does the weekly working time have an influence on the hourly wage of
employees?
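A minimal NumPy sketch of estimating the two parameters by least squares; the working-time and wage numbers are invented purely for illustration:

```python
import numpy as np

hours = np.array([20, 25, 30, 35, 40, 45])              # weekly working time (hypothetical)
wage = np.array([12.0, 13.5, 14.0, 15.5, 16.0, 17.5])   # hourly wage (hypothetical)

# Least-squares fit of wage = alpha + beta * hours
beta, alpha = np.polyfit(hours, wage, deg=1)             # returns [slope, intercept]
print(f"wage ≈ {alpha:.2f} + {beta:.2f} * hours")
```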
Regress Analysis and Log-Linear Models
Multiple regression: allows a response variable Y to be modeled as a linear
function of a multidimensional feature vector
Y = b0 + b1 X1 + b2 X2
Many nonlinear functions can be transformed into the above.
Ex: Do the weekly working time and the age of the employee have an
influence on the hourly wage of employees?

❑ Log-linear models: approximate discrete multidimensional probability
distributions.
The multi-way table of joint probabilities is approximated by a product of
lower-order tables.
Probability: p(a, b, c, d) ≈ αab βac γad δbcd, where each factor is a lower-order
table over the indicated subset of attributes
Conversion: Nominal to Numeric
❑ Some tools can deal with nominal values internally
❑ Other methods (neural nets, regression, nearest neighbor) require
only numeric inputs
❑ To use nominal fields in such methods, we need to convert them to
numeric values
❑ Q: Why not ignore nominal fields altogether?
❑ A: They may contain valuable information
❑ Different strategies for binary, ordered, multivalued nominal fields
Conversion
❑ How would you convert binary fields to numeric?
❑ E.g. Gender=M, F
❑ How would you convert ordered attributes to numeric?
❑ E.g. Grades
Conversion: Nominal, Few Values
❑ Multi-valued, unordered attributes with small (rule of thumb <
20) no. of values
❑ e.g. Color=Red, Orange, Yellow, …, Violet
❑ for each value v create a binary “flag” variable C_v , which is 1 if
Color=v, 0 otherwise
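A pandas sketch of creating such flag variables (one-hot encoding); the small dataframe is made up:

```python
import pandas as pd

df = pd.DataFrame({"Color": ["Red", "Yellow", "Red", "Violet"]})

# One binary "flag" column C_v per value v: 1 if Color == v, 0 otherwise
flags = pd.get_dummies(df["Color"], prefix="C").astype(int)
print(df.join(flags))
```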
Conversion: Nominal, Many Values
❑ Examples:
❑ US State Code (50 values)
❑ Profession Code (7,000 values, but only few frequent)
❑ Q: How to deal with such fields ?
❑ A: Ignore ID-like fields whose values are unique for each record
❑ For other fields, group values “naturally”:
❑ e.g. 50 US States → 3 or 5 regions
❑ Profession – select most frequent ones, group the rest
❑ Create binary flag-fields for selected values
Data Integration
❑ Data integration:
❑ combines data from multiple sources into a coherent store
❑ Schema integration:
❑ integrate metadata from different sources
❑ Entity identification problem:
❑ identify real world entities from multiple data sources, e.g., A.cust-id ≡
B.cust-#
❑ Detecting and resolving data value conflicts
❑ for the same real world entity, attribute values from different sources are
different
❑ possible reasons: different representations, different scales, e.g., metric vs.
British units
Handling Redundancy in Data Integration
❑ Redundant data often occur when integrating multiple
databases
❑ The same attribute may have different names in different
databases
❑ Careful integration of the data from multiple sources may help
reduce/avoid redundancies and inconsistencies and improve
mining speed and quality
Data Transformation
❑ Smoothing: remove noise from data
❑ Aggregation: summarization, data cube construction
❑ Generalization: concept hierarchy climbing
❑ Normalization: scaled to fall within a small, specified range
❑ min-max normalization
❑ z-score normalization
❑ normalization by decimal scaling
❑ Attribute/feature construction
❑ New attributes constructed from the given ones
Data Transformation: Normalization
❑ min-max normalization
v' = (v − minA) / (maxA − minA) × (new_maxA − new_minA) + new_minA
❑ z-score normalization
v' = (v − meanA) / stand_devA
❑ normalization by decimal scaling
v' = v / 10^j, where j is the smallest integer such that Max(|v'|) < 1
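The three formulas as a minimal Python sketch:

```python
def min_max(v, min_a, max_a, new_min=0.0, new_max=1.0):
    # v' = (v - minA) / (maxA - minA) * (new_max - new_min) + new_min
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min

def z_score(v, mean_a, std_a):
    # v' = (v - meanA) / stand_devA
    return (v - mean_a) / std_a

def decimal_scaling(values):
    # v' = v / 10^j, with j the smallest integer such that max(|v'|) < 1
    j = 0
    while max(abs(v) for v in values) / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

print(min_max(73600, 12000, 98000))   # ≈ 0.716
print(z_score(73600, 54000, 16000))   # 1.225
print(decimal_scaling([-986, 917]))   # [-0.986, 0.917]
```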
Min-max normalization
Suppose that the minimum and maximum values for the attribute
income are $12,000 and $98,000, respectively. We would like to map
income to the range [0.0, 1.0]. By min-max normalization, a value of
$73,600 for income is transformed to?
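Worked answer, using the min-max formula above: v' = (73,600 − 12,000) / (98,000 − 12,000) × (1.0 − 0.0) + 0.0 ≈ 0.716.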
z-score normalization
z-score normalization Suppose that the mean and standard
deviation of the values for the attribute income are $54,000 and
$16,000, respectively. With z-score normalization, a value of $73,600
for income is transformed to?
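Worked answer, using the z-score formula above: v' = (73,600 − 54,000) / 16,000 = 1.225.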
Data Preprocessing
❑ Why preprocess the data?

❑ Data cleaning

❑ Data integration and transformation

❑ Data reduction

❑ Discretization and concept hierarchy generation

❑ Summary
Data Reduction Strategies
❑ Why data reduction?
❑ A database/data warehouse may store terabytes of data
❑ Complex data analysis/mining may take a very long time to run on the
complete data set
❑ Data reduction
❑ Obtain a reduced representation of the data set that is much smaller in
volume but yet produce the same (or almost the same) analytical results
❑ Data reduction strategies
❑ Data cube aggregation:
❑ Dimensionality reduction — e.g., remove unimportant attributes
❑ Data Compression
❑ Numerosity reduction — e.g., fit data into models
❑ Discretization and concept hierarchy generation
Data Cube Aggregation
❑ The lowest level of a data cube (base cuboid)
❑ the aggregated data for an individual entity of interest
❑ e.g., a customer in a phone calling data warehouse.
❑ Multiple levels of aggregation in data cubes
❑ Further reduce the size of data to deal with
❑ Reference appropriate levels
❑ Use the smallest representation which is enough to solve the task
❑ Queries regarding aggregated information should be answered
using data cube, when possible
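A rough pandas sketch of climbing one aggregation level (the sales table is invented): quarterly sales are rolled up to yearly totals, which is the smaller representation to keep if queries only ever ask for yearly figures.

```python
import pandas as pd

sales = pd.DataFrame({
    "year":    [2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023],
    "quarter": ["Q1", "Q2", "Q3", "Q4", "Q1", "Q2", "Q3", "Q4"],
    "amount":  [224, 408, 350, 586, 310, 402, 380, 610],
})

# Aggregate from the quarterly level up to the yearly level
yearly = sales.groupby("year", as_index=False)["amount"].sum()
print(yearly)
```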
Dimensionality Reduction
❑ Purpose:
❑ Avoid curse of dimensionality
❑ Reduce amount of time and memory required by data mining
algorithms
❑ Allow data to be more easily visualized
❑ May help to eliminate irrelevant features or reduce noise

❑ Techniques
❑ Principal Component Analysis
❑ Singular Value Decomposition
❑ Others: supervised and non-linear techniques
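A minimal scikit-learn sketch of PCA; the data here is random and only meant to show the API shape:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))          # 100 samples, 6 original attributes

pca = PCA(n_components=2)              # keep the 2 strongest principal components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (100, 2)
print(pca.explained_variance_ratio_)   # variance captured by each component
```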
Example of Decision Tree Induction
Initial attribute set: {A1, A2, A3, A4, A5, A6}

[Figure: decision tree with A4 at the root, A1 and A6 at the next level, and
leaves labelled Class 1 / Class 2]

=> Reduced attribute set: {A1, A4, A6}
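A hedged scikit-learn sketch of the same idea: induce a decision tree over all attributes and keep only those the tree actually uses. The data, class rule, and attribute names below are made up for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
names = ["A1", "A2", "A3", "A4", "A5", "A6"]
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 3] - X[:, 5] > 0).astype(int)   # class depends on A1, A4, A6

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Attributes with non-zero importance appear in the tree; the rest can be dropped
reduced = [n for n, imp in zip(names, tree.feature_importances_) if imp > 0]
print(reduced)   # likely something close to ['A1', 'A4', 'A6']
```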


Data Compression
❑ String compression
❑ There are extensive theories and well-tuned algorithms
❑ Typically lossless
❑ But only limited manipulation is possible without expansion
❑ Audio/video compression
❑ Typically lossy compression, with progressive refinement
❑ Sometimes small fragments of signal can be reconstructed
without reconstructing the whole
Numerosity Reduction
❑ Parametric methods
❑ Assume the data fits some model, estimate model parameters,
store only the parameters, and discard the data (except possible
outliers)
❑ Log-linear models: obtain value at a point in m-D space as the
product on appropriate marginal subspaces
❑ Non-parametric methods
❑ Do not assume models
❑ Major families: histograms, clustering, sampling
Histograms
❑ A popular data reduction technique
❑ Divide data into buckets and store average (sum) for each bucket
❑ Can be constructed optimally in one dimension using dynamic programming
❑ Related to quantization problems.
[Figure: example histogram with price buckets from 10000 to 90000 on the x-axis]
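A small NumPy sketch of the idea: bucket the values into equal-width bins and keep only a summary per bucket (the prices are invented):

```python
import numpy as np

prices = np.array([12000, 15000, 31000, 33000, 47000, 52000, 55000, 71000, 88000])
edges = np.array([10000, 30000, 50000, 70000, 90000])    # 4 equal-width buckets

counts, _ = np.histogram(prices, bins=edges)              # how many values per bucket
bucket = np.digitize(prices, edges[1:-1])                 # which bucket each value falls in
means = [prices[bucket == i].mean() for i in range(len(edges) - 1)]

print(counts)   # [2 3 2 2]
print(means)    # one stored average per bucket instead of the raw values
```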
Sampling
❑ Sampling can be used as a data reduction technique because it
allows a large data set to be represented by a much smaller
random sample (or subset) of the data.
❑ Choose a representative subset of the data
❑ Sampling techniques fall into two broad categories:
▪ Probability Sampling - random selection techniques are used to
select the sample.
▪ Non-probability Sampling - non-random selection techniques
based on certain criteria are used to select the sample.
Sampling
▪ Probability Sampling
- Simple Random Sampling
- Systematic Sampling
- Stratified Sampling
- Cluster Sampling
▪ Non-probability Sampling
- Convenience Sampling
- Voluntary Response Sampling
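A pandas sketch of two of these (simple random sampling and stratified sampling); the dataframe and the region strata are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": range(1, 1001),
    "region":      ["north", "south", "east", "west"] * 250,
})

# Simple random sampling without replacement: 10% of all rows
srs = df.sample(frac=0.10, random_state=0)

# Stratified sampling: 10% from each region, preserving the strata proportions
stratified = df.groupby("region").sample(frac=0.10, random_state=0)

print(len(srs), len(stratified))   # 100 100
```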
Data Preprocessing
❑ Why preprocess the data?

❑ Data cleaning

❑ Data integration and transformation

❑ Data reduction

❑ Discretization and concept hierarchy generation

❑ Summary
Discretization
❑ Three types of attributes:
❑ Nominal — values from an unordered set e.g., color, profession
❑ Ordinal — values from an ordered set e.g., military or academic rank
❑ Continuous — real numbers e.g., integer or real numbers

❑ Discretization:
❑ divide the range of a continuous attribute into intervals
❑ Some classification algorithms only accept categorical attributes.
❑ Reduce data size by discretization
❑ Prepare for further analysis
Discretization and Concept hierarchy
❑ Discretization
reduce the number of values for a given continuous attribute by dividing the
range of the attribute into intervals. Interval labels can then be used to
replace actual data values.

❑ Concept hierarchies
reduce the data by collecting and replacing low level concepts (such as
numeric values for the attribute age) by higher level concepts (such as
young, middle-aged, or senior).
Discretization for numeric data

❑ Binning (see sections before)

❑ Histogram analysis (see sections before)

❑ Clustering analysis (see sections before)


Data Preprocessing
❑ Why preprocess the data?

❑ Data cleaning

❑ Data integration and transformation

❑ Data reduction

❑ Discretization and concept hierarchy generation

❑ Summary
Summary
❑ Data preparation is a big issue for both warehousing and mining

❑ Data preparation includes

❑ Data cleaning and data integration

❑ Data reduction and feature selection

❑ Discretization

❑ Many methods have been developed, but this is still an active area of research
Summary:
1. Data Cleaning
Real-world data can have many irrelevant and missing parts. Data cleaning is done to handle this; it involves handling
missing data, noisy data, etc.
a) Missing Data:
This situation arises when some values are missing in the data. It can be handled in various ways.
Some of them are:
1. Ignore the tuples:
This approach is suitable only when the dataset we have is quite large and multiple values are missing within a
tuple.
2. Fill the Missing values:
There are various ways to do this task. You can choose to fill the missing values manually, by the attribute mean, or by the
most probable value.
b) Noisy Data:
Noisy data is meaningless data that cannot be interpreted by machines. It can be generated by faulty data collection,
data entry errors, etc. It can be handled in the following ways:
1. Binning Method:
This method works on sorted data in order to smooth it. The whole data is divided into segments of equal size, and
various smoothing methods are then applied. Each segment is handled separately: one can replace all values in a
segment by its mean, or the bin boundary values can be used.
2. Regression:
Here data can be made smooth by fitting it to a regression function. The regression used may be linear (having one
independent variable) or multiple (having multiple independent variables).
3. Clustering:
This approach groups similar data into clusters. Outliers may go undetected, or they will fall outside the clusters.
Summary:
2. Data Transformation:
This step is taken in order to transform the data into forms suitable for the mining process. It involves the
following:

a) Normalization:
It is done in order to scale the data values in a specified range (-1.0 to 1.0 or 0.0 to 1.0)

b) Attribute Construction:
In this strategy, new attributes are constructed from the given set of attributes to help the mining process.

c) Discretization:
This is done to replace the raw values of a numeric attribute by interval labels or conceptual labels.

d) Concept Hierarchy Generation:
Here attributes are converted from a lower level to a higher level in the hierarchy. For example, the attribute “city” can
be generalized to “country”.
Summary:
3. Data Reduction:
Data mining is used to handle huge amounts of data, and analysis becomes harder as the volume grows. Data
reduction techniques address this: they aim to increase storage efficiency and reduce data storage and analysis costs.

The various steps to data reduction are:

a) Data Cube Aggregation:
Aggregation operations are applied to the data to construct the data cube.

b) Attribute Subset Selection:
Only the highly relevant attributes should be used; the rest can be discarded. For performing attribute selection, one
can use the significance level and the p-value of the attribute: attributes whose p-value is greater than the significance
level can be discarded.

c) Numerosity Reduction:
This enables storing a model of the data instead of the whole data, for example: regression models.

d) Dimensionality Reduction:
This reduces the size of the data by encoding mechanisms. It can be lossy or lossless: if the original data can be
retrieved after reconstruction from the compressed data, the reduction is called lossless; otherwise it is called lossy.
Two effective methods of dimensionality reduction are wavelet transforms and PCA (Principal Component Analysis).
Thank You!
