DMW Question Paper
Sixth Semester
Computer Science and Engineering
DATA MINING AND DATA WAREHOUSING
Time: Three hours
Maximum: 75 marks
PART A (10 × 2 = 20 marks)
Answer ALL questions.
ALL questions carry equal marks.
1. What is data generation?
2. Define preprocessing.
3. What is association mining?
4. Define correlation analysis.
5. What is classification?
6. Define classifier accuracy.
7. What is the grid-based method?
8. What are the trends in data mining?
9. Define data warehousing.
10. Write the data visualization principles.
PART B (5 × 11 = 55 marks)
Answer FIVE questions, choosing ONE full question from each Unit.
All questions carry equal marks.
UNIT-I
11. Explain the data mining primitives in detail.
Or
12. Discuss the concept of descriptive statistical measures.
UNIT-II
13. Explain mining multilevel association rules from transaction databases.
Or
14. Describe constraint-based association mining.
UNIT-III
15. Discuss decision tree induction and its issues in detail.
Or
16. Describe the association-rule-based and Bayesian classification methods.
UNIT-IV
17. Explain multidimensional analysis and descriptive mining of complex data objects.
Or
18. Discuss the cluster analysis method.
UNIT-V
19. Explain the multidimensional data model and data warehouse architecture.
Or
20. Explain the mapping of the data warehouse to an architecture.
Maximum: 75 marks
PART A (10 × 2 = 20 marks)
Answer ALL questions.
ALL questions carry equal marks.
1. Define concept hierarchy.
2. What is a descriptive model?
3. Define association rule mining.
4. Mention a few approaches to mining multilevel association rules.
5. Define the concept of prediction.
6. Specify the five criteria for the evaluation of classification and prediction.
7. What is a hierarchical method?
8. What are the different types of data used for cluster analysis?
9. Define OLAP.
10. Write short notes on the multidimensional data model.
PART B (5 × 11 = 55 marks)
Answer FIVE questions, choosing ONE full question from each Unit.
All questions carry equal marks.
UNIT-I
11. Briefly explain the integration of a data mining system with a database and data warehouse system.
Or
12. Discuss the architecture of data mining system.
UNIT-II
13. Explain mining multidimensional Boolean association rules from transaction databases.
Or
14. Discuss constraint-based association mining.
UNIT-III
15. Discuss Bayesian classification in detail.
Or
16. (a) Write short notes on prediction.
(b) Explain classifier accuracy measures.
UNIT-IV
17. Discuss the types of data in cluster analysis in detail.
Or
18. Explain in detail mining the World Wide Web.
UNIT-V
19. Discuss the components of data warehouse design and the construction of a data warehouse.
Or
20. Explain the three-tier architecture of a data warehouse.
Maximum: 75 marks
PART A (10 × 2 = 20 marks)
Answer ALL questions.
ALL questions carry equal marks.
1. What are the strategies for data reduction?
2. How is class comparison performed?
3. What is the Apriori property?
4. How can association rules be classified?
5. Differentiate prediction and classification.
6. What are bagging and boosting?
7. What is meant by nominal and ordinal variables?
8. What kinds of associations can be mined in multimedia data?
9. What are the benefits of data warehouse?
10. Define Data Mart.
PART B (5 × 11 = 55 marks)
Answer FIVE questions, choosing ONE full question from each Unit.
All questions carry equal marks.
UNIT-I
11. Discuss:
(a) Methods for data smoothing.
(b) Issues to be considered in data integration.
Or
12. Explain briefly various types of concept hierarchies with suitable example.
UNIT-II
13. Explain the FP-growth algorithm in detail.
Or
14. Discuss mining multilevel association rules from transaction databases.
UNIT-III
15. Explain the following:
(a) k-nearest-neighbour classifier.
(b) Genetic algorithm.
(c) Rough set approach.
Or
16. Briefly explain the algorithm for decision tree induction with an example, and also discuss attribute selection measures.
UNIT-IV
17. Briefly describe mining text databases.
Or
18. Explain various Hierarchical clustering methods.
UNIT-V
19. Explain the star, snowflake and fact constellation schemas for multidimensional databases.
Or
20. Explain the categorization of OLAP tools.
Maximum: 75 marks
PART A (10 × 2 = 20 marks)
Answer ALL questions.
ALL questions carry equal marks.
1. Suppose a group of 12 sales price records has been sorted as follows: 5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215. Partition them into three bins using equi-width partitioning and use smoothing by bin means to smooth the data, using a bin of depth 3.
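As set, question 1 mixes two partitionings: three equi-width bins, and bins of depth 3 (which for 12 values gives four equi-depth bins). The sketch below is a minimal illustration, not a model answer. It computes smoothing by bin means for both readings. The function names are hypothetical, and it assumes that a value falling exactly on an interval boundary goes to the upper bin, with the maximum placed in the last bin.

```python
# Sorted sales prices from the question
prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]


def equal_width_bins(values, k):
    """Split values into k intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in values:
        idx = int((v - lo) // width) if width else 0
        bins[min(idx, k - 1)].append(v)  # the maximum value goes in the last bin
    return bins


def equal_depth_bins(values, depth):
    """Split sorted values into consecutive bins holding `depth` values each."""
    return [values[i:i + depth] for i in range(0, len(values), depth)]


def smooth_by_means(bins):
    """Replace every value in a non-empty bin by the bin mean (2 decimals)."""
    return [[round(sum(b) / len(b), 2)] * len(b) for b in bins if b]


if __name__ == "__main__":
    print("equi-width, 3 bins:", smooth_by_means(equal_width_bins(prices, 3)))
    print("equi-depth, depth 3:", smooth_by_means(equal_depth_bins(prices, 3)))
```

With a width of (215 - 5) / 3 = 70, the equi-width intervals are [5, 75), [75, 145) and [145, 215]. The skewed prices therefore put nine values in the first bin and only 92 in the second. Equi-depth bins avoid that imbalance.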
2. Define data cleansing and data transformation, and justify why they are vital tasks in the integration process.
3. How can frequent itemset mining be applied to data that contains numeric attributes such as age, height, etc.?
4. What is multilevel association rule mining? Give an example from a domain such as market basket analysis or medical diagnosis.
5. How does information gain differ from gain ratio? State their use in classification problems.
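Information gain favours attributes with many distinct values. Gain ratio, as used in C4.5, divides the gain by the split information of the attribute to counter that bias. The minimal sketch below computes both measures on a hypothetical four-record dataset. It compares a two-valued attribute with a unique record ID. The function and variable names are illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(attribute, labels):
    """Entropy of the class labels minus the expected entropy after the split."""
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    n = len(labels)
    expected = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - expected


def gain_ratio(attribute, labels):
    """Information gain normalised by the split information of the attribute."""
    split_info = entropy(attribute)  # entropy of the attribute's own value distribution
    return info_gain(attribute, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy data: a two-valued attribute vs. a unique ID per record
labels = ["yes", "yes", "no", "no"]
binary_attr = ["a", "a", "b", "b"]
id_attr = ["r1", "r2", "r3", "r4"]

if __name__ == "__main__":
    for name, attr in [("binary", binary_attr), ("id", id_attr)]:
        print(name, info_gain(attr, labels), gain_ratio(attr, labels))
```

Both attributes separate the classes perfectly, so each has an information gain of 1 bit. The ID attribute's split information is log2(4) = 2 bits, which halves its gain ratio to 0.5.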
6. Differentiate between prediction and classification.
7. What is the structure produced by hierarchical clustering called? Give the structure with an example, and explain the merits and demerits of the method.
8. How does spatial data mining help in application development to design and implement a City Planner?
9. State the significance of the ETL process in a data warehouse.
10. List the data visualization principles.
PART B (5 × 11 = 55 marks)
Answer FIVE questions, choosing ONE full question from each Unit.
All questions carry equal marks.
UNIT-I
11. Elaborate on the data mining primitives. Discuss in detail the issues in discretization and concept hierarchy generation, with an example.
Or
12. Explain the major applications of data mining and the trends followed. Discuss applications in the domains of medical diagnosis and social network analysis.
UNIT-II
13. Distinguish between weak association rules and strong association rules. Compare and contrast the various issues in Apriori and FP-growth association rule mining. Find the association rules using the Apriori algorithm with min_conf = 50% and min_supp = 40% for the sales data given in the table below.
Transaction ID   Itemset
3010             Milk, Bread, Jam
3011             Bread, Butter, Juice
3012             Soda, Bread, Butter
3013             Bread, Juice, Soda
3014             Milk, Juice
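The level-wise search that question 13 asks for can be sketched in a few lines. The sketch assumes "butter" and "Butter" name the same item, and it treats both thresholds as inclusive (support ≥ 40%, confidence ≥ 50%). The function names are hypothetical. It is a plain Apriori without the hashing or partitioning optimisations.

```python
from itertools import combinations

# Sales data from the question (item names normalised to one spelling)
transactions = {
    3010: {"Milk", "Bread", "Jam"},
    3011: {"Bread", "Butter", "Juice"},
    3012: {"Soda", "Bread", "Butter"},
    3013: {"Bread", "Juice", "Soda"},
    3014: {"Milk", "Juice"},
}
MIN_SUPP = 0.40
MIN_CONF = 0.50


def apriori(txns, min_supp):
    """Return {frozenset itemset: support count} for all frequent itemsets."""
    n = len(txns)
    level = [frozenset([i]) for i in sorted({i for t in txns for i in t})]
    frequent, k = {}, 1
    while level:
        # Count support of each candidate k-itemset by scanning the transactions
        counts = {c: sum(1 for t in txns if c <= t) for c in level}
        current = {c: s for c, s in counts.items() if s / n >= min_supp}
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets must be frequent
        level = [c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, n, min_conf):
    """Generate rules lhs -> rhs with confidence >= min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = count / frequent[lhs]  # support(itemset) / support(lhs)
                if conf >= min_conf:
                    rules.append((set(lhs), set(itemset - lhs), count / n, conf))
    return rules


frequent = apriori(list(transactions.values()), MIN_SUPP)
rules = association_rules(frequent, len(transactions), MIN_CONF)

if __name__ == "__main__":
    for lhs, rhs, supp, conf in rules:
        print(f"{sorted(lhs)} -> {sorted(rhs)}  support={supp:.0%}  confidence={conf:.0%}")
```

With five transactions the minimum support count is 2. The frequent itemsets are {Milk}, {Bread}, {Butter}, {Juice}, {Soda}, {Bread, Butter}, {Bread, Juice} and {Bread, Soda}. Every candidate 3-itemset is pruned because it contains an infrequent pair such as {Butter, Juice}. All six rules derived from the three frequent pairs reach at least 50% confidence.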
Or
14. State the FP-growth algorithm and its benefits. List the methods that help improve the performance of the Apriori algorithm.
UNIT-III
15. Consider the two-class data of heights and weights of students studying in class 7 and class 8 given in the table below. Use the naive Bayes algorithm to determine the class of the student with roll no. 6.
Roll no.   Height   Weight   Class
1          47       50 kg    7
2          49       52 kg    7
3          52       55 kg    7
4          53       54 kg    8
5          52       56 kg    8
6          52       55 kg    ?
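The question does not say how the continuous height and weight values should be modelled. The sketch below makes the simplest assumption and treats each exact value as a category. It estimates P(value | class) by relative frequency, with no smoothing. A Gaussian naive Bayes model of the same data can reach a different decision. Read this as an illustration of the mechanics rather than the expected answer. The function name is hypothetical.

```python
from collections import Counter

# (height, weight in kg, class) for roll nos. 1-5; roll no. 6 is unlabelled
training = [
    (47, 50, 7), (49, 52, 7), (52, 55, 7),
    (53, 54, 8), (52, 56, 8),
]
query = (52, 55)  # roll no. 6


def naive_bayes_categorical(rows, x):
    """Score P(C) * prod_j P(x_j | C) per class, treating values as categories."""
    class_counts = Counter(row[-1] for row in rows)
    n = len(rows)
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / n  # prior P(C)
        for j, value in enumerate(x):
            matches = sum(1 for row in rows if row[-1] == c and row[j] == value)
            score *= matches / n_c  # likelihood P(x_j | C), no smoothing
        scores[c] = score
    return max(scores, key=scores.get), scores


predicted, scores = naive_bayes_categorical(training, query)

if __name__ == "__main__":
    print("scores:", scores, "-> class", predicted)
```

Under this assumption class 8 scores zero because no class-8 student weighs 55 kg. This is the zero-frequency problem that Laplace smoothing is meant to address.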
Or
16. Discuss the various factors in constructing a decision tree. State its merits and demerits.
UNIT-IV
17. What are the key ideas of grid-based clustering? Which applications benefit the most from grid-based clustering? Compare it with density-based clustering techniques.
Or
18. How does the PageRank algorithm measure the importance of a web page? Give a sketch of the computational method it uses to compute the importance of a web page. Discuss web mining issues.
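The computation asked about in question 18 can be sketched as power iteration. The sketch assumes the commonly used normalised form PR(p) = (1 - d)/N + d · Σ PR(q)/L(q), summed over the pages q that link to p. Here L(q) is the out-degree of q and d = 0.85 is the damping factor. Pages with no out-links are treated as linking to every page, which is one common convention among several. The four-page link graph is hypothetical.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=100):
    """Power iteration over a dict {page: [pages it links to]}."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        new = {p: (1.0 - d) / n for p in pages}  # random-jump (teleport) share
        for p in pages:
            out = links[p]
            if out:
                share = d * rank[p] / len(out)  # split rank evenly over out-links
                for q in out:
                    new[q] += share
            else:  # dangling page: spread its rank evenly over all pages
                for q in pages:
                    new[q] += d * rank[p] / n
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the scores have stabilised
            break
    return rank


# Hypothetical link graph: A links to B and C, B to C, C to A, D to C
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)

if __name__ == "__main__":
    for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

Page D has no in-links, so its score stays at the teleportation floor (1 - d)/N = 0.0375. C is linked from all three other pages and ends up ranked highest.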
UNIT-V
19. Suppose that a data warehouse consists of the three dimensions time, doctor and patient, and the two measures count and charge, where charge is the fee that a doctor charges a patient for a visit.
(a). Enumerate three classes of schemas that are popularly used for modeling data
warehouses.
(b). Draw a schema diagram for the above data warehouse using one of the schema classes listed in (a).
(c). To obtain the list of total fees collected by each doctor in 2004, write an SQL query, assuming the data is stored in a relational database with the schema
fee(day, month, year, doctor, hospital, patient, count, charge).
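The query asked for in part (c) can be checked against a throwaway SQLite database using Python's standard sqlite3 module. The rows inserted below are hypothetical and only exercise the query. The query filters on year, groups by doctor and sums charge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fee (day INTEGER, month INTEGER, year INTEGER, "
    "doctor TEXT, hospital TEXT, patient TEXT, count INTEGER, charge REAL)"
)
# Hypothetical sample rows, for illustration only
conn.executemany(
    "INSERT INTO fee VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (12, 1, 2004, "Dr. A", "H1", "P1", 1, 300.0),
        (3, 5, 2004, "Dr. A", "H1", "P2", 1, 150.0),
        (20, 7, 2004, "Dr. B", "H2", "P3", 1, 400.0),
        (8, 2, 2005, "Dr. A", "H1", "P1", 1, 500.0),  # not in 2004, so excluded
    ],
)

# Total fees collected by each doctor in 2004
QUERY = """
SELECT doctor, SUM(charge) AS total_fees
FROM fee
WHERE year = 2004
GROUP BY doctor
ORDER BY doctor
"""
result = conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for doctor, total in result:
        print(doctor, total)
```

The ORDER BY clause is not required by the question. It only makes the output order deterministic.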
Or
20. Discuss the data warehouse architecture layers and views regarding the design of a warehouse. Explain the complete ETL process and its components.