lOMoARcPSD|21669462
Dhiren DM LAB
B Tech (Parul University)
Studocu is not sponsored or endorsed by any college or university
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
Faculty of Engineering & Technology
Subject Name: Data Mining
Subject Code: 203105432
B. Tech IT Year Semester 7th
Annexure No:
PRACTICAL - 1
AIM: Design and create cube by identifying measures and dimensions for star schema, snowflake
schema and fact constellation schema.
Software Used: Analysis services- SQL Server-2008.
(1) Creating a New Analysis Services Project
To create a new Analysis Services project, Use the New Project dialog box in BIDS to create a new
Analysis Services project, follow these steps:
1.Select Microsoft SQL Server 2008 ⇒SQL Server Business Intelligence. Development Studio from the
Programs menu to launch Business Intelligence Development Studio.
2.Select File ⇒ New Project. ⇒
3.In the New Project dialog box, select the Business Intelligence Projects project type.
4.Select the Analysis Services Project template.
5.Name the new project Practical1 and select a convenient location to save it.
6.Click OK to create the new project.
ENROLL NO - 200303108901 1
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
(2) Defining a Data Source
To define a data source, use the Data Source Wizard. Launch this wizard by right-clicking on
the Data Sources folder in new Analysis Services project
To define a data source for the new cube, follow these steps:
1. Right-click on the Data Sources folder in Solution Explorer and select New Data
Source.
2. Read the first page of the Data Source Wizard and click Next.
3. You can base a data source on a new or an existing connection. Because you don’t
have any existing connections, click New.
4. In the Connection Manager dialog box, select the server containing your analysis
services sample database from the Server Name combo box.
5. Fill in your authentication information.
6. Select the Native OLE DB\SQL Native Client provider (this is the default provider).
7. Select the AdventureWorksDW database. Figure shows the filled-in Connection
Manager dialog box.
8. Click OK to dismiss the Connection Manager dialog box.
9. Click Next.
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
P R PS Total
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
PRACTICAL 2
AIM: Make an OLAP cube and perform Roll Up and Drill Down operations on it. Show the Apex
and Base cuboid for the same. Draw Star-net query model for the cube.
Description:
ROLL UP:
The roll-up operation (also known as drill-up or aggregation operation) performs aggregation on a data cube, by climbing
down concept hierarchies, i.e., dimension reduction. Roll-up is like zooming-out on the data cubes. Figure shows the
result of roll-up operations performed on the dimension location. The hierarchy for the location is defined as the Order
Street, city, province, or state, country. The roll-up operation aggregates the data by ascending the location hierarchy from
the level of the city to the level of the country.
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
DRILL DOWN:
The drill-down operation (also called roll-down) is the reverse operation of roll-up. Drill-down is like
zooming-in on the data cube. It navigates from less detailed record to more detailed data. Drill-down can
be performed by either stepping down a concept hierarchy for a dimension or adding additional
dimensions.
Figure shows a drill-down operation performed on the dimension time by stepping down a concept
hierarchy which is defined as day, month, quarter, and year. Drill-down appears by descending the time
hierarchy from the level of the quarter to a more detailed level of the month. Because a drill-down adds
more details to the given data, it can also be performed by adding a new dimension to a cube. For example,
a drill-down on the central cubes of the figure can occur by introducing an additional dimension, such as a
customer group.
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
SLICE
A slice is a subset of the cubes corresponding to a single value for one or more members of the dimension.
For example, a slice operation is executed when the customer wants a selection on one dimension of a
three-
dimensional cube resulting in a two-dimensional site. So, the Slice operations perform a selection on one
dimension of the given cube, thus resulting in a subcube.
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
DICE
Dice selects two or more dimensions from a given cube and provides a new sub-cube. Consider the
following diagram that shows the dice operation. The dice operation on the cube based on the following
selection criteria involves three dimensions.
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
PIVOT
Broadly, this term refers to a new view of data available within a Slice of a multidimensional OLAP Cube.
As an example: a financial analyst might want to view or “pivot” data in various ways, such as displaying
all the cities down the page and all the products across a page.
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
QUERY IN MS SERVER
1) Student Database
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
2) Cube
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
3) Roll Up
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
4) Cube and Rollup
P R PS Total
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
PRACTICAL 3
AIM: Create calculated members using arithmetic operators and member property of
dimension members.
Follow the Steps to create new calculated measure:
1. Open Cube Designer for the Analysis Services Tutorial cube, and then click the
Calculations tab.
2.On the toolbar of the Calculations tab, click New Calculated Member.
A new form appears in the Calculation Expressions pane within which you define the
properties of this new calculated member.
The new member also appears in the Script Organizer pane.
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
3. In the Name box, change the name of the calculated measure to [Parikshit].
Parikshit
[Parikshit]
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
4. In the Expression box, type a plus sign ﴾+﴿ after [Measures].
[Data Source Count]. In the Format string list, select " Currency "
.In the Non‐empty behavior list, select the checkboxes for Fact Cache Policy Count-
Configuration Info Count, and then click OK.
5.On the toolbar of the Calculations tab, click Script View, and then review the
calculation script in the Calculation Expressions pane.
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
P R PS Total
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
PRACTICAL 4
Aim: Design and Create cube by identifying measures and dimensions for Design storage
using storage mode MOLAP, ROLAP and HOLAP.
Description:
Relational OLAP(ROLAP): ROLAP is an extended RDBMS along with
multidimensional data mapping to perform the standard relational operation.
Multidimensional OLAP (MOLAP) : MOLAP Implemented operation in
multidimensional data.
Hybrid Online Analytical Processing (HOLAP) : In HOLAP approach the aggregated
totals are stored in a multidimensional database while the detailed data is stored in the
relational database. This offers both data efficiency of the ROLAP model and the
performance of the MOLAP model.
Advantages of OLAP
● OLAP is a platform for all types of business including planning, budgeting,
reporting, and analysis.
● Information and calculations are consistent in an OLAP cube. This is a crucial benefit.
● Quickly create and analyze "What if" scenarios
● Easily search OLAP database for broad or specific terms.
● OLAP provides the building blocks for business modeling tools, Data mining tools,
performance reporting tools.
● Allows users to slice and dice cube data all by various dimensions, measures, and
filters.
● It is good for analyzing time series.
● Finding some clusters and outliers is easy with OLAP.
● It is a powerful visualization online analytical process system which provides faster
response times
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
Cube View:
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
STORAGE SETTINGS :
1. MOLAP : By default MOLAP uses array-based multidimensional storage engines to
display multidimensional views of data. Basically, they use an OLAP cube.
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
2. ROLAP : It works with data that exists in a relational database. Facts and dimension tables
are stored as relational tables.It also allows multidimensional analysis of data and is the fastest
growing OLAP.
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
3.HOLAP : Hybrid OLAP is a mixture of both ROLAP and MOLAP. It offers fast
computation of MOLAP and higher scalability of ROLAP.Aggregated or computed data is
stored in a multidimensional OLAP cube and Detailed information is stored in a relational
database.
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
Custom Settings :
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
P R PS Total
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
PRACTICAL 5
Aim: Perform Pre-processing on a dataset. Apply various Filters and discuss the
effect of each filter applied.
Description :
1. WEKA : Waikato Environment for Knowledge analysis fully implemented in
JAVA we will be working and exploring on explorer.
2. Data preprocessing is a data mining technique that involves
transforming raw data into an understandable format.
3. Real-world data is often incomplete, inconsistent, lacking in certain
behaviors or trends, and is likely to contain many errors.
4. Data preprocessing is an important step in the data mining process.
5. Data pre-processing includes cleaning, normalization, transformation, feature
extraction and selection, etc.
6. The product of data pre-processing is the final training
Prerequisite :
1. Weka Tool Installation
2. Data set
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
STEPS :
1. Home page of weka
2. Click on Explorer and find the tab of Preprocess.
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
3.Go to Open file ->C Drive ->Program Files ->Weka 3-8-4 -> data and
choose any data set given for this practical I have chosen iris.arff data set
4. After selecting database :
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
5. Select the filters and start exploring each of them and observe the changes .
→Filters: -
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
1. ADD
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
2. NORMALIZE
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
3. REMOVE
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
4. REMOVERANGE
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
5. REPLACEWITHMISSINGVALUE
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
6. REMOVEDFREQUENTVALUES
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
7. REMOVEPERCENTAGE
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
8. SUBSETBYEXPRESSION
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
9. RESAMPLE
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
10. ADDID
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
11. DISCRETIZE
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
12. MERGENOMINALVALUES
P R PS Total
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
Practical 6
Aim : Perform different binning methods for smooth out noise.
● Data binning, bucketing is a data pre-processing method used to minimize
the effects of small observation errors.
● The original data values are divided into small intervals known as bins and
then they are replaced by a general value calculated for that bin.
● This has a smoothing effect on the input data and may also reduce the
chances of overfitting in case of small datasets
● There are 2 methods of dividing data into bins ”
1. Equal Frequency Binning : bins have equal frequency.
2. Equal Width Binning : bins have equal width with a range of each bin are
defined as [min + w], [min + 2w] …. [min + nw] where w = (max – min) /
(no of bins).
Source code :
#equal frequency
def equifreq(arr1, m):
a = len(arr1)
n = int(a / m)
for i in range(0, m):
arr = []
for j in range(i * n, (i + 1) * n):
if j >= a:
break
arr = arr + [arr1[j]]
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
print(arr)
#equal width
def equiwidth(arr1, m):
a = len(arr1)
w = int((max(arr1) - min(arr1)) / m)
min1 = min(arr1)
arr = []
for i in range(0, m + 1):
arr = arr + [min1 + w * i]
arri=[]
for i in range(0, m):
temp = []
for j in arr1:
if j >= arr[i] and j <= arr[i+1]:
temp += [j]
arri += [temp]
print(arri)
#data to be binned
data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
#no of bins
m=3
print("equal frequency binning")
equifreq(data, m)
print("\n\nequal width binning")
equiwidth(data, 3)
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
OUTPUT :
P R PS Total
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
Practical 7
Aim: Perform Association Rules Mining using weka tool.
Theory : -
It is an important data mining model studied extensively by the database
and data mining community.
Assume
all data are categorical.
No good
algorithm for numeric data.
Initially
used for Market Basket Analysis to find how items purchased by customers
are
related.
Support count: The support count of an itemset X, denoted by X.count, in a
data set T is
the number of transactions in T that contain X. Assume T
has n transactions.
Then,
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
Process: -
1. Choose a Random dataset:
2. Select random attributes from the given database.
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
3. Go to associate section and by default Apriori assosciator will be
there so just click on start :
P R PS Total
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
Practical 8
Aim: Perform classification with WEKA tool.
Theory : -
Classification : -
Classification is a data mining function that assigns items in a
collection to target categories or classes. The goal of classification is to
accurately predict the target class for each case in the data. For
example, a classification model could be used to identify loan
applicants as low, medium, or high credit risks.
Types of classifiers:
There are different types of classifiers, a classifier is an algorithm that
maps the input data to a specific category. Now, let us take a look at
the different types of classifiers:
1. Perceptron
2. Naive Bayes
3. Decision Tree
4. Logistic Regression
5. K-Nearest Neighbor
6. Artificial Neural Networks/Deep Learning
7. Support Vector Machine
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
Steps: -
1. Open weka tool:
2. Select explorer option and choose the dataset from the pre-upload
data :
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
3. Go to classify tab → choose j48 and then start :
4. Visualization the j48 with decision tree:
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
5. Go to classify tab → choose NaiveBayes and then start :
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
6. Visualizing NaiveBayes
P R PS Total
Downloaded by Mriyank Das (mriyankdas428@gmail.com)
lOMoARcPSD|21669462
Downloaded by Mriyank Das (mriyankdas428@gmail.com)