Mid Sem
1. Parts (a)-(f) are multiple-choice questions. One or more options may be correct. 2 marks
will be awarded for each correct answer, and 1 mark will be deducted for each wrong answer.
An answer will be considered correct iff all the correct options are chosen.
12 Marks
e. It is beneficial and practical to materialize all the views in a data cube when
(i) the number of levels in the dimensional hierarchies is very large and there are
too many dimensions
(ii) the speed of retrieval is the primary objective
(iii) the cardinality of the dimension is high
(iv) we can implement a greedy algorithm for selecting the views to be
materialized
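As background for this question, the size of a fully materialized cube can be estimated with a short calculation. The sketch below uses the standard count of cuboids, the product of (L_i + 1) over all dimensions, where L_i is the number of hierarchy levels of dimension i and the extra 1 stands for the virtual top level "all". The function name and the sample inputs are hypothetical and are not part of the exam.

```python
from math import prod


def total_cuboids(levels_per_dimension):
    """Count the cuboids in a full data cube.

    levels_per_dimension[i] is the number of hierarchy levels of
    dimension i (excluding the virtual top level 'all', which adds 1).
    """
    return prod(levels + 1 for levels in levels_per_dimension)


# Hypothetical inputs for illustration only.
print(total_cuboids([1, 1, 1]))   # 3 dimensions without hierarchies
print(total_cuboids([4] * 10))    # 10 dimensions with 4 levels each
```

The count grows multiplicatively with both the number of dimensions and the depth of their hierarchies.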
Mid Semester Exam
Course Id: 406035 Data Warehousing and Data Mining
2. Consider the following business scenario. A telecom company plans to maintain a
CRM data warehouse. There are 10 million customers of the company. Besides the usual
attributes, the company wants to maintain additional demographic information like
literacy percentage, male/female ratio, average life expectancy and average income of the
people belonging to the state to which each customer belongs. The company also wants
to maintain information about the age group, income level and marital status of its
customers. They also need to run queries like the number of married and unmarried
customers they have at any point in time.
15+3=18 Marks
3. Consider a hypothetical sales fact table that contains the columns item_code and
state_code as its dimensions. The corresponding dimension tables are also shown below.
4. Consider a 3-D data array with dimensions A, B and C. The array is partitioned
into 64 memory-based chunks. Dimension A is organized into 4 equal-sized
partitions a0, a1, a2 and a3. Similarly, dimensions B and C are each organized into
4 equal-sized partitions. Chunks are numbered 1, 2, 3, ..., 64, corresponding to the
sub-cubes a0b0c0, a1b0c0, a2b0c0, a3b0c0, a0b1c0, ..., a3b3c3, respectively. Suppose
the sizes of dimensions A, B and C are 300, 3,000 and 30,000, respectively. If we
perform multi-way array aggregation, what is the minimum memory requirement for
holding all relevant 2-D partial sums in chunk memory, if the chunks are brought
into memory in the order 1, 17, 33, 49, 5, 21, ..., 13, 29, 45, 61, 2, 18, ...?
20 Marks
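The chunk numbering and scan order described in question 4 can be sketched in a few lines. The code below is only an illustration under the usual accounting for multi-way array aggregation: the 2-D plane that aggregates out the fastest-varying dimension needs one chunk, the plane that aggregates out the middle dimension needs one row of chunks, and the plane that aggregates out the slowest-varying dimension must be held whole. All function names are hypothetical, and memory is counted in array cells, not bytes.

```python
from itertools import product

DIM_SIZES = {"A": 300, "B": 3000, "C": 30000}  # sizes stated in the question
PARTS = 4                                      # partitions per dimension


def chunk_number(a, b, c):
    """1-based number of sub-cube a{a}b{b}c{c}; A varies fastest in the numbering."""
    return 1 + a + PARTS * b + PARTS * PARTS * c


def scan_order():
    """Chunk numbers in the order given in the question:
    C varies fastest, then B, then A."""
    return [chunk_number(a, b, c) for a, b, c in product(range(PARTS), repeat=3)]


def plane_memory(order, sizes=DIM_SIZES, parts=PARTS):
    """Cells of 2-D partial sums held at once for a given scan order.

    `order` lists the dimensions from fastest-varying to slowest-varying.
    """
    fastest, middle, slowest = order
    chunk = {d: sizes[d] // parts for d in sizes}
    one_chunk = chunk[middle] * chunk[slowest]    # plane aggregating out `fastest`
    one_row = sizes[fastest] * chunk[slowest]     # plane aggregating out `middle`
    whole_plane = sizes[fastest] * sizes[middle]  # plane aggregating out `slowest`
    return one_chunk + one_row + whole_plane


print(scan_order()[:6])                 # start of the scan order in the question
print(plane_memory(("C", "B", "A")))    # order in the question
print(plane_memory(("A", "B", "C")))    # natural order 1, 2, ..., 64, for comparison
```

Comparing the two calls shows how strongly the memory requirement depends on which dimension varies slowest in the scan.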
5. Consider the following lattice of views along with a representation of the number of
rows in each view where A is the base cuboid:
If you have to choose 3 views to materialize apart from the base cuboid, which of the
views B-H would you choose and how? Assume that the cost of running a query is
linearly proportional to the number of rows in the view from which it is run.
20 Marks
[Figure: lattice of views with row counts. A: 100; B: 40; C: 60; D: 20; E: 30; F: 10; G: 5; H: 8.]
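The cost model in question 5 (query cost proportional to the number of rows of the view it is answered from) is the setting of the well-known greedy view-selection heuristic. The sketch below is a minimal illustration of that heuristic, not a model answer. It rests on two loudly stated assumptions: the row counts are read from the flattened figure as B = 40 and C = 60, and the parent-to-child edges in `CHILDREN` are hypothetical, since the edges of the lattice are not reproduced in the text. All names are illustrative.

```python
# Row counts as read from the figure (B = 40, C = 60 is an interpretation).
ROWS = {"A": 100, "B": 40, "C": 60, "D": 20, "E": 30, "F": 10, "G": 5, "H": 8}

# ASSUMPTION: parent -> children edges. These are hypothetical; substitute the
# edges from the actual lattice diagram before relying on the result.
CHILDREN = {
    "A": ["B", "C"], "B": ["D", "E"], "C": ["E", "F"],
    "D": ["G"], "E": ["G", "H"], "F": ["H"], "G": [], "H": [],
}


def answerable_from(view):
    """All views that can be computed from `view`, including itself."""
    seen, stack = set(), [view]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(CHILDREN[v])
    return seen


def greedy_select(k, base="A"):
    """Pick k views greedily by total reduction in query cost."""
    # Initially every view is answered from the base cuboid.
    cost = {v: ROWS[base] for v in ROWS}
    chosen = []
    for _ in range(k):
        def benefit(v):
            # Rows saved over all views that could be answered from v.
            return sum(max(cost[w] - ROWS[v], 0) for w in answerable_from(v))

        candidates = [v for v in sorted(ROWS) if v != base and v not in chosen]
        best = max(candidates, key=benefit)  # ties go to the alphabetically first view
        for w in answerable_from(best):
            cost[w] = min(cost[w], ROWS[best])
        chosen.append(best)
    return chosen


print(greedy_select(3))
```

With different edges the benefits, and therefore the selected views, can change, so the output is only meaningful for the assumed lattice.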