DWM QB Cyse


Shah and Anchor Kutchhi Engineering College, Chembur

Department of Cyber Security

Class: TE SEM: V
Subject: Data Warehousing and Mining

Module-wise Questions
Introduction to Data Warehouse

1. Differentiate between a data warehouse and a data mart.
2. How can the ETL cycle be used in a typical data warehouse? Explain with a suitable example.
3. What is meant by metadata in the context of a data warehouse? Explain the different types of
metadata stored in a data warehouse. Illustrate with a suitable example.
4. Discuss the architecture of a typical data warehouse system.
5. Write a short note on: Operational vs. decision support systems
6. Differentiate between the top-down and bottom-up approaches for building a data warehouse. Discuss the
merits and limitations of each approach.
Dimensional Modelling
1. Write a short note on:
a) Factless fact table
b) Fact Constellation
c) Updates to Dimension tables
d) Star schema
e) Snowflake schema
f) Aggregate fact tables
2. What is dimensional modeling? Design the data warehouse for a wholesale furniture company. The
data warehouse has to allow analyzing the company’s situation at least with respect to Furniture,
Customer and Time. Moreover, the company needs to analyze the furniture with respect to its type,
category and material, and the customers with respect to their spatial location, by considering at
least cities, regions and states. The company is interested in learning the quantity, income and
discount of its sales.
3. Consider the following dimensions for a hypermarket chain: Product, Store, Time and Promotion.
With respect to this business scenario, answer the following questions. Clearly state any reasonable
assumptions you make. Design a star schema. Can the star schema be converted to a
snowflake schema? Justify your answer and draw the snowflake schema for the data warehouse (clearly
mention the fact table(s), dimension table(s), their attributes and measures).
4. For a supermarket chain, consider the following dimensions: Product, Store, Time and
Promotion. The schema contains a central fact table for sales.
i) Design a star schema for the above application.
ii) Calculate the maximum number of base fact table records for the warehouse with the
values given below:
- Time period: 5 years
- Store: 300 stores reporting daily sales
- Product: 40,000 products in each store (about 4,000 sell in each store daily)

A manufacturing company has a huge sales network. To control sales, the network is divided into regions.
Each region has multiple zones. Each zone has different cities. Each salesperson is allocated
different cities. The objective is to track sales figures at different granularity levels of region and to
count the number of products sold. Design a star schema by considering levels for region, salesperson and
time. Convert the star schema to a snowflake schema.

5. What is meant by an Information Package Diagram? For recording the information requirements
for “Hotel Occupancy”, with dimensions such as time, hotel, etc., give the information package
diagram. Also draw the star schema and the snowflake schema.
6. A bank wants to develop a data warehouse for effective decision-making about its loan
schemes. The bank provides loans to customers for various purposes such as house building loans,
car loans, educational loans, personal loans, etc. The whole country is categorized into a number
of regions, namely north, south, east and west. Each region consists of a set of states. Loans are
disbursed to customers at interest rates that change from time to time. Also, at any given point
in time, the different types of loans have different rates. The data warehouse should record an
entry for each disbursement of a loan to a customer.
(a) Design an information package diagram for the application.
(b) Design a star schema for the data warehouse, clearly identifying the fact table(s),
dimension table(s), their attributes and measures.
7. Consider the following database for a chain of bookstores.
BOOKS (Booknum, Primary_author, Topic, Total_Stock, price)
BOOKSTORE (Storenum, City, State, Zip, Inventory_value)
STOCK (Storenum, Booknum, Qty)
With respect to the above business scenario, answer the following questions. Clearly state any
reasonable assumptions you make.
(a) Design an information package diagram.
(b) Design a star schema for the data warehouse, clearly identifying the fact table(s),
dimension table(s), their attributes and measures.
8. State and explain the various schemas used in data warehousing with examples for each of
them.
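
Worked sketch for question 4(ii): the usual upper-bound estimate multiplies the number of distinct values along each dimension. The Python below is only an illustration under two assumptions that the question does not state: a year is taken as 365 days (leap days ignored), and each product sold in a store on a given day is recorded under exactly one promotion. It also uses the roughly 4,000 products that sell daily per store rather than the full 40,000.

```python
# Upper bound on base fact table rows for question 4(ii).
# Assumptions (not stated in the question): 365 days per year, and each
# product sold in a store on a day appears under exactly one promotion.
DAYS_PER_YEAR = 365

years = 5
stores = 300
products_sold_per_store_per_day = 4000
promotions_per_sale = 1  # assumed

days = years * DAYS_PER_YEAR
max_fact_rows = (days * stores * products_sold_per_store_per_day
                 * promotions_per_sale)

print(f"{days} days x {stores} stores x "
      f"{products_sold_per_store_per_day} products = {max_fact_rows:,} rows")
```

Under these assumptions the estimate is 1,825 x 300 x 4,000 = 2,190,000,000 rows. A different promotion assumption changes only the last factor.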

ETL Process
1. How can the ETL cycle be used in a typical data warehouse? Explain with a suitable example.
2. Describe the steps of the ETL process.
3. Write a short note on: a) ETL Process

OLAP

1. Differentiate between OLTP and OLAP.
2. Discuss various OLAP models and their architecture.
3. We would like to view the sales data of a company with respect to three dimensions, namely
Location, Item and Time. Represent the sales data in the form of a 3-D data cube, then
perform and illustrate the Roll up, Drill Down, Slice and Dice OLAP operations on the
cube.
4. Discuss how computations can be performed efficiently on data cubes.
5. A college wants to record the marks for the courses completed by students using the
dimensions i) Course, ii) Student, iii) Time and the measure Aggregate marks.
Create a cube and describe the following OLAP operations:
1) Rollup 2) Drill Down 3) Slice 4) Pivot 5) Dice
6. Consider a data warehouse for a hospital where there are three dimensions:
a) Doctor b) Patient c) Time
Consider two measures, i) Count and ii) Charge, where Charge is the fee that the doctor
charges a patient for a visit. For the above example, create a cube and illustrate the following
OLAP operations:
1) Rollup 2) Drill Down 3) Slice 4) Pivot 5) Dice
7. Write a short note on: a) Indexing OLAP data
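
Illustrative sketch for questions 3, 5 and 6: the OLAP operations can be shown on a tiny cube held in a Python dictionary. Everything concrete here is hypothetical and chosen only for illustration: the cities, items, quarters, unit counts, the city-to-country hierarchy and the function names. None of it comes from the questions.

```python
from collections import defaultdict

# Hypothetical sales facts: (location, item, quarter) -> units sold.
# Cities, items and figures are made up purely for illustration.
CITY_TO_COUNTRY = {"Mumbai": "India", "Delhi": "India", "Toronto": "Canada"}

cube = {
    ("Mumbai", "Phone", "Q1"): 10,
    ("Mumbai", "Laptop", "Q1"): 4,
    ("Mumbai", "Phone", "Q2"): 12,
    ("Delhi", "Phone", "Q1"): 7,
    ("Delhi", "Laptop", "Q2"): 5,
    ("Toronto", "Phone", "Q1"): 9,
    ("Toronto", "Laptop", "Q2"): 6,
}

def roll_up_location(cells):
    """Roll up: aggregate the location dimension from city to country."""
    result = defaultdict(int)
    for (city, item, quarter), units in cells.items():
        result[(CITY_TO_COUNTRY[city], item, quarter)] += units
    return dict(result)

def slice_cube(cells, quarter):
    """Slice: fix one value of the time dimension, leaving a 2-D view."""
    return {(loc, item): units
            for (loc, item, q), units in cells.items() if q == quarter}

def dice_cube(cells, locations, items, quarters):
    """Dice: select a sub-cube by restricting several dimensions at once."""
    return {key: units for key, units in cells.items()
            if key[0] in locations and key[1] in items and key[2] in quarters}

def pivot(view):
    """Pivot: swap the two axes of a 2-D (location, item) view."""
    return {(item, loc): units for (loc, item), units in view.items()}

by_country = roll_up_location(cube)
q1_slice = slice_cube(cube, "Q1")
sub_cube = dice_cube(cube, {"Mumbai", "Delhi"}, {"Phone"}, {"Q1", "Q2"})
q1_pivoted = pivot(q1_slice)
```

Drill down is the reverse of roll up: moving from the country-level view in by_country back to the city-level cells in cube.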

Introduction to Data Mining

1. What is data mining? What are the techniques and applications of data mining?
2. Explain data mining as a step in KDD. Give the architecture of a typical data mining system.
3. Describe the various functionalities of data mining as a step in the process of knowledge
discovery.
4. Discuss the architecture of a typical data mining system.
5. Discuss the applications of and major issues in data mining.


Data Exploration

1. Write a short note on: Data Visualization
2. Write a short note on: Measuring similarity and dissimilarity
3. Explain types of attributes and data visualization for data exploration.

Data Preprocessing

1. Discuss the different steps involved in data preprocessing.
2. Explain data reduction in data preprocessing.
3. Define: Normalization, Binning, Histogram Analysis
4. Write a short note on: Concept Hierarchy.
Classification

1. What is classification? What are the issues in classification? Apply a statistical-based algorithm to
obtain the actual probabilities of each event to classify the new tuple as tall. Use the following data:

Person ID  Name       Gender  Height  Class
1          Kristina   Female  1.6 m   Short
2          Jim        Male    2 m     Tall
3          Maggi      Female  1.9 m   Medium
4          Marya      Female  2.1 m   Tall
5          Stephanie  Female  1.7 m   Short
6          Bob        Male    1.85 m  Medium
7          Catherine  Female  1.6 m   Short
8          Dave       Male    1.7 m   Short
9          Wilson     Male    2.2 m   Tall

2. A simple example from the stock market, involving only discrete ranges, has Profit as the categorical
class attribute, with values {Up, Down}. The training data is:

AGE  COMPETITION  TYPE      PROFIT
Old  Yes          Software  Down
Old  No           Software  Down
Old  No           Hardware  Down
Mid  Yes          Software  Down
Mid  Yes          Hardware  Down
Mid  No           Hardware  Up
Mid  No           Software  Up
New  Yes          Software  Up
New  No           Hardware  Up
New  No           Software  Up

Apply the decision tree algorithm and show the generated rules.
3. Define linear, non-linear and multiple regression. Plan a regression model for disease
development with respect to changes in weather parameters.
4. Write a short note on: Decision Tree Based Classification Approach
5. Why is naïve Bayesian classification called “naïve”? Briefly outline the major ideas of naïve
Bayesian classification.
6. Apply the Naive Bayes classifier algorithm to the buys-computer data below and classify the
tuple X = (age = “young”, income = “medium”, student = “yes”, credit_rating = “fair”).

Id  Age     Income  Student  Credit_rating  Buys computer
1   Young   High    No       Fair           No
2   Young   High    No       Good           No
3   Middle  High    No       Fair           Yes
4   Old     Medium  No       Fair           Yes
5   Old     Low     Yes      Fair           Yes
6   Old     Low     Yes      Good           No
7   Middle  Low     Yes      Good           Yes
8   Young   Medium  No       Fair           No
9   Young   Low     Yes      Fair           Yes
10  Old     Medium  Yes      Fair           Yes
11  Young   Medium  Yes      Good           Yes
12  Middle  Medium  No       Good           Yes
13  Middle  High    Yes      Fair           Yes
14  Old     Medium  No       Good           No

7. Explain classification algorithm.
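
Worked sketch for question 6: one way to check a hand calculation is to count the relative frequencies directly from the 14 training tuples. The Python below is a minimal illustration, not a prescribed solution. It uses plain frequency estimates with no Laplace smoothing, and the function and variable names are chosen here for illustration.

```python
from collections import Counter

# Training tuples from question 6:
# (age, income, student, credit_rating, buys_computer).
ROWS = [
    ("young", "high", "no", "fair", "no"),
    ("young", "high", "no", "good", "no"),
    ("middle", "high", "no", "fair", "yes"),
    ("old", "medium", "no", "fair", "yes"),
    ("old", "low", "yes", "fair", "yes"),
    ("old", "low", "yes", "good", "no"),
    ("middle", "low", "yes", "good", "yes"),
    ("young", "medium", "no", "fair", "no"),
    ("young", "low", "yes", "fair", "yes"),
    ("old", "medium", "yes", "fair", "yes"),
    ("young", "medium", "yes", "good", "yes"),
    ("middle", "medium", "no", "good", "yes"),
    ("middle", "high", "yes", "fair", "yes"),
    ("old", "medium", "no", "good", "no"),
]

def naive_bayes_scores(rows, x):
    """Return P(class) * product of P(attribute value | class) per class.

    Uses raw relative frequencies (no Laplace smoothing).
    """
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, count in class_counts.items():
        score = count / len(rows)  # prior P(class)
        for i, value in enumerate(x):
            matches = sum(1 for row in rows
                          if row[-1] == label and row[i] == value)
            score *= matches / count  # conditional P(value | class)
        scores[label] = score
    return scores

x = ("young", "medium", "yes", "fair")
scores = naive_bayes_scores(ROWS, x)
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

For this tuple the score for “yes” is (9/14)(2/9)(4/9)(6/9)(6/9), about 0.028, and the score for “no” is (5/14)(3/5)(2/5)(1/5)(2/5), about 0.007, so X is classified as buys computer = yes.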

Clustering

1. What is K-means clustering? Apply the K-means algorithm to the following data for two
clusters. Data set: {10, 4, 2, 12, 3, 20, 30, 11, 25, 31}
2. What is clustering? Explain the K-means clustering algorithm. Suppose the data for clustering is
{2, 4, 10, 12, 3, 20, 30, 11, 25}. Taking K = 2, cluster the given data using the above algorithm.
3. What is a clustering technique? Discuss the agglomerative algorithm using the following data and
plot a dendrogram using the single link approach. The following table contains sample data items
indicating the distance between the elements:

Item  E  A  C  B  D
E     0  1  2  2  3
A     1  0  2  5  3
C     2  2  0  1  6
B     2  5  1  0  3
D     3  3  6  3  0

4. Write a short note on:
a. K-means Clustering
b. Hierarchical clustering methods

5. Apply agglomerative hierarchical clustering and draw the single link and average link dendrograms for
the following distance matrix.

    A   B   C   D   E
A   0   2   6   10  9
B   2   0   3   9   8
C   6   3   0   7   5
D   10  9   7   0   4
E   9   8   5   4   0

6. Illustrate how the supermarket can use clustering methods to improve sales.

7. Find the clusters using the k-means clustering algorithm, given several objects (4 types of medicine)
where each object has two attributes or features as shown in the table below. The goal is to group
these objects into k = 2 groups of medicine based on the two features (weight index and pH).

Object      Attribute 1 (X): Weight Index  Attribute 2 (Y): pH
Medicine A  1                              1
Medicine B  2                              1
Medicine C  4                              3
Medicine D  5                              4

8. Write a short note on: DBSCAN

9. Consider the data given below and create the adjacency matrix. Apply the single link algorithm to cluster the
given data set and draw the dendrogram.

Object  Attribute 1 (X)  Attribute 2 (Y)
A       2                2
B       3                2
C       1                1
D       3                1
E       1.5              0.5
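
Worked sketch for question 9: the merge order needed for the dendrogram can be checked with a small agglomerative routine. The Python below is an illustration only. It assumes Euclidean distance, which the question does not specify, and the function name single_link is chosen here. It needs Python 3.8 or later for math.dist.

```python
import math
from itertools import combinations

# Objects from question 9. Euclidean distance is an assumption; the
# question does not name a distance measure.
POINTS = {"A": (2, 2), "B": (3, 2), "C": (1, 1), "D": (3, 1), "E": (1.5, 0.5)}

def single_link(points):
    """Agglomerative clustering with single linkage.

    Returns the merge history as (cluster_1, cluster_2, distance) tuples,
    which is the information needed to draw the dendrogram.
    """
    def link(c1, c2):
        # Single link: cluster distance = smallest pairwise point distance.
        return min(math.dist(points[p], points[q]) for p in c1 for q in c2)

    clusters = [frozenset([name]) for name in sorted(points)]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters (first pair wins on ties).
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        merges.append((sorted(c1), sorted(c2), round(link(c1, c2), 3)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges

merges = single_link(POINTS)
for left, right, dist in merges:
    print(left, right, dist)
```

Under the Euclidean assumption, C and E merge first at about 0.707, then A, B and D join one another at distance 1.0 (A-B and B-D tie, so the order of those two merges depends on tie-breaking), and the two groups finally merge at about 1.414.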

Mining Frequent Patterns and Association Rules

1. Consider the following transactions:

TID  Items
01   1, 3, 4, 6
02   2, 3, 5, 7
03   1, 2, 3, 5, 8
04   2, 5, 9, 10
05   1, 4

Apply the Apriori algorithm with a minimum support of 30% and a minimum confidence of 75%, and
find the large itemsets L.

2. What is association rule mining? Give the Apriori algorithm. Apply association rule mining to find all
frequent itemsets from the following table:

TID  Items
100  1, 2, 5
200  2, 4
300  2, 3
400  1, 2, 4
500  1, 3
600  2, 3
700  1, 3, 2, 5
800  1, 3
900  1, 2, 3

Minimum support count = 2
Minimum confidence = 70%
3. Define Multidimensional and multilevel association mining.
4. Write a short note on: FP tree
5. Consider the following transactions:

TID  Items
01   A, B, C, D
02   A, B, C, D, E, G
03   A, C, G, H, K
04   B, C, D, E, K
05   D, E, F, H, L
06   A, B, C, D, L
07   B, I, E, K, L
08   A, B, D, E, K
09   A, E, F, H, L
10   B, C, D, F

Apply the Apriori algorithm with a minimum support of 30% and a minimum confidence of
70%, and find all the association rules in the data set.
6. A database has five transactions. Let min-support = 60% and min-confidence = 80%. Find all
frequent itemsets using the Apriori algorithm. T_ID is the transaction ID.

T_ID    Items bought
T_1000  M, O, N, K, E, Y
T_1001  D, O, N, K, E, Y
T_1002  M, A, K, E
T_1003  M, U, C, K, Y
T_1004  C, O, O, K, E

7. Discuss association rule mining and mining algorithms.

A database has four transactions. Let min-support = 60% and min-confidence = 80%.

T_ID   Items bought
T_100  A, B, C
T_200  A, C
T_300  A, D
T_400  B, E, F

Find all frequent itemsets using the Apriori algorithm. List the strong association rules.

8. A database has five transactions. Let min-support = 30% and min-confidence = 70%. Find all
frequent patterns using the Apriori algorithm. List the strong association rules.

T_ID  Items bought
A     1, 3, 4, 6
B     2, 3, 5, 7
C     1, 2, 3, 5, 8
D     2, 5, 9, 10
E     1, 4
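
Worked sketch for question 8 (question 1 uses the same item lists): a compact level-wise Apriori can be used to check the frequent itemsets and rules found by hand. The Python below is a minimal illustration rather than a reference implementation, and the function names apriori and strong_rules are chosen here. With 5 transactions, a minimum support of 30% means a support count of at least 2.

```python
from itertools import combinations

# Transactions from question 8 (the same item lists as question 1).
TRANSACTIONS = {
    "A": {1, 3, 4, 6},
    "B": {2, 3, 5, 7},
    "C": {1, 2, 3, 5, 8},
    "D": {2, 5, 9, 10},
    "E": {1, 4},
}

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    baskets = list(transactions.values())
    min_count = min_support * len(baskets)

    def count(itemset):
        return sum(1 for basket in baskets if itemset <= basket)

    items = sorted(set().union(*baskets))
    level = [frozenset([i]) for i in items if count(frozenset([i])) >= min_count]
    frequent = {}
    k = 1
    while level:
        frequent.update({s: count(s) for s in level})
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and count(c) >= min_count]
    return frequent

def strong_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above threshold."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((set(antecedent),
                                  set(itemset - antecedent), confidence))
    return rules

frequent = apriori(TRANSACTIONS, min_support=0.30)
rules = strong_rules(frequent, min_confidence=0.70)
for antecedent, consequent, confidence in rules:
    print(antecedent, "->", consequent, confidence)
```

On this data the frequent itemsets are {1}, {2}, {3}, {4}, {5}, {1,3}, {1,4}, {2,3}, {2,5}, {3,5} and {2,3,5}. Five rules reach the 70% threshold, each with confidence 100%: 4 -> 1, 2 -> 5, 5 -> 2, {2,3} -> 5 and {3,5} -> 2. The same five rules satisfy the 75% threshold of question 1.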
