0% found this document useful (0 votes)
42 views

Tut8 QPO Qa

Uploaded by

6r8bwn769s
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
42 views

Tut8 QPO Qa

Uploaded by

6r8bwn769s
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 7

INFS2200/INFS7903 – Relational Database Systems

Tutorial 8 Query Processing and Optimization

Exercise 1. Assume that relation R has attributes A, B, C, D, and relation S has attributes D, E, F. Which of the
following are valid relational algebra transformations during query optimization and which are not?

Please explain the reason briefly when one is not a valid transformation; you do not need to give any
explanation for valid transformations. Note that corresponds to the natural join operator on the common
attribute (i.e., attribute D).

A. πA,B(πA,B,C,D(R S)) = πA,B(R S)

B. σA>15 or B<10 or C=20(R) = σA>15(σB<10(σC=20(R)))

C. πA,B(σC<12(R)) = σC<12(πA,B(R))

D. πA,F(R S) = (πA(R)) (πF(S))

Exercise 2. Consider the following relational schema and SQL query. The schema captures information about
employees, departments, and company finances (organized on a per department basis).

Emp (eid:integer, did:integer, sal:integer, hobby:char(20))


Dept (did:integer, dname:char(20), floor:integer, phone:char(10))
Finance (did:integer, budget:real, sales:real, expenses:real)

Consider the following query:

SELECT D.dname, F.budget


FROM Emp E, Dept D, FinanceF
WHERE E.did = D.did AND D.did = F.did
AND D.floor = 1 AND E.hobby = ‘camping’;

A. List the join orders (i.e., orders in which pairs of relations can be joined to compute the query result) that a
relational query optimizer will consider. (Assume that the optimizer follows the heuristic of never
considering plans that require the computation of cross-products.)

B. For one of the join orders above, identify a relational algebra tree (or a relational algebra expression) that
reflects the order of operations a query optimizer would choose.

C. Suppose that the following additional information is available: B+ tree indexes exist on Emp.did, Emp.sal,
Dept.floor, Dept.did, and Finance.did. The system’s statistics indicate that employees enjoy 200 different
hobbies, and the company owns two floors in the building, with a uniform value distribution. There are a
total of 50,000 employees and 5,000 departments (each with corresponding financial information) in the
database. The DBMS used by the company has only one join method available: single nested loop join.
a. For each of the query’s base relations (Emp, Dept, and Finance), estimate the number of tuples that
would be initially selected from that relation if all the non-join predicates on that relation were applied
to it before any join processing begins.
b. Given your answer to the above question, which of the join orders considered by the query optimizer
would be more efficient?

Exercise 3. Consider the following relational schemas and instances:

Student (SID, Name, Class, Major)


Student_Dir (SID, Address, Phone)
FK: (SID) → Student (SID)
Course (Course_No, Name, Level)
Course_Taken (Course_No, Term, SID, Grade)
FK: (Course_No) → Course (Course_No); (SID) → Student (SID)

1
INFS2200/INFS7903 – Relational Database Systems

Student
SID Name Class Major
123 John 3 CS
124 Mary 3 CS
126 Sam 2 CS
129 Julie 2 Math

Student_Dir
SID Address Phone
123 333 Library St 555---535---5263
124 219 Library St 555---963---9635
129 555 Library St 555---123---4567

Course
Course_No Name Level
CS1520 Web Programming UGrad
CS1555 Database Management Systems UGrad
CS1550 Operating Systems UGrad
CS1655 Secure Data Management and Web Applications Ugrad
CS2550 Database Management Systems Grad

Course_Taken
Course_No Term SID Grade
CS1520 Fall 11 123 3.75
CS1520 Fall 11 124 4
CS1520 Fall 11 126 3
CS1555 Fall 11 123 4
CS1555 Fall 11 124 NULL
CS1550 Spring 12 123 NULL
CS1550 Spring 12 124 NULL
CS1550 Spring 12 126 NULL
CS1550 Spring 12 129 NULL
CS2550 Spring 12 124 NULL
CS1520 Spring 12 126 NULL

For each of the relational algebra expressions below, identify the expected arity (number of attributes),
resulting schema, and min/max cardinality (number of distinct tuples) of the relation resulted from the query,
without actually evaluating the query and based only on the schemas and cardinalities of the four given
relations.

A. σTerm = 'Spring 12' (Course_Taken)

B. Course_Taken * Course

Note
(a) Cardinality means NDV (Number of Distinct Values). Cardinality of a relation means the population of
a relation, as every tuple is unique in a relation.
(b) The symbol ‘*’ corresponds to the natural join operator on the common attribute (i.e., attribute
Course_No). When there is no specific subscript, is regarded as an equal join by default. If there is
no ambiguity within the context, can also be used as a natural join.

2
INFS2200/INFS7903 – Relational Database Systems

Exercise 4. Consider the following schema, annotated with the number of records, whose population is for
the calendar year 2022. Further, there are 10 distinct values of ProdType in Product, and 5 distinct values of
Type in Customer.

Customer (CustID, Name, Type) 10,000


Invoice (InvID, CustID, Date, Amount) 10 * customer * month (1.2 million)
FK: (CustID) à Customer (CustID)
LineItem (InvID, LineNo, ProdID, Qty) 10 per invoice (12 million)
FK: (InvID) à Invoice(InvID)
FK: (ProdID) à Product(ProdID)
Product (ProdID, Description, ProdType) 1,000

A. How many tuples are there in the natural joins of them (i.e., the relations Customer, Invoice, LineItem,
Product)?

B. What would be the cost of computing these natural joins, step by step, in the sequence indicated? Is
there another sequence that would cost more, or less?

C. Suppose we want to know the types of customers who have bought a given type of product (widget) in
July. How many tuples would you expect in the result?

D. What relational projection operations are there? Can any be done immediately?

E. What relational selections can be done immediately? How big are the resulting tables?

F. What joins are remaining, in increasing order of cost?

G. How many tuples are there in the result of the cheapest join?

H. Repeat above questions F. and G. until the joins are completed. What operation is left?

Exercise 5. An SQL statement for the query of Exercise 4 is

SELECT Type
FROM Customer, invoice, LineItem, Product
WHERE Invoice.CustID = Customer.CustID
AND LineItem.InvID = Invoice.InvID
AND Product.ProdID = LineItem.ProdID
AND Product.ProdType = 'Widget'
AND Invoice.Date.Month = July;

Construct a query tree for this query and show the steps in optimization process.

3
INFS2200/INFS7903 – Relational Database Systems

Solutions of Tutorial 8
Exercise 1 Solution

A. Valid.

B. Invalid. Because it is a disjunction query. For instance, records with A>15 or B<10 but C≠20 should
appear in the query answer. However, under this transformation all those records will be eliminated when
evaluating σC=20(R).

C. Invalid. Because the projection does not contain the attributes in the selection condition (i.e., attribute C).
Under this transformation, applying πA,B first will eliminate attribute C from the intermediate result and it will
not be possible to apply the selection condition σC<12.

D. Invalid. Because the projection does not contain the attributes in the join condition (i.e., attribute D). The
condition for a natural join is R.D=S.D, since D is the common attribute. However, under this
transformation, applying πA will eliminate attribute D from relation R. Similarly, applying πF will also
eliminate attribute D from relation S.

Exercise 2 Solution:

A. There are two join orders considered, assuming that the optimizer ignores cross-products:
• First possible join order: ((E D) F)
• Second possible join order: ((D F) E)

B. A query optimizer would typically push down the selection and projection as far down the tree as possible.
For the join order ((E D) F), this would result in:

πD.dname,F.budget(
(πE.did(σE.hobby=’camping’(E)) πD.did,D.dname(σD.floor=1(D)))
πF.budget,F.did(F))

C. Given the additional information and statistics of the database:

a. Emp size = 50,000 records, E.hobby = ‘camping’


Resulting size = 50,000 * (1/200) = 250 records.

Dept size = 5,000 records, D.floor = 1


Resulting size = 5,000 * (1/2) = 2,500 records.

Finance size = 5,000 records, as there are no combined predicates for departments.
Resulting size = 5,000 records.

b. Plan ((E D) F) is more efficient because in that plan, the optimizer executes the most restrictive
operations first.

As opposed to plan ((D F) E), where there are no restrictions on the Finance relation to reduce the
resulting number of records.

Exercise 3 Solution:

A. Arity = Arity of Course_Taken = 4.

Schema = the schema of Course_Taken = (Course_No, Term, SID, Grade).


Cardinality = Cardinality of Course_Taken * Selectivity of σTerm = 'Spring 12'
• Cardinality of Course_Taken = 11;
• Selectivity is within the range of 0 to 1;

4
INFS2200/INFS7903 – Relational Database Systems

Hence, Min Cardinality = 0 and Max Cardinality = 11.

B. Arity = Arity of Course_Taken + Arity of Course - number of common attributes = 4 + 3 - 1 = 6.

Schema = (Course_No, Term, SID, Grade, Name, Level).

Attribute Course_No is a foreign key of Course_Taken that refers to Course, which means that for every
Course_Taken tuple there is exactly one matching Course tuple (because Course.Course_No is a primary
key). Hence, Cardinality = Cardinality of Course_Taken = 11.

Exercise 4 Solution:

A. The joins of these 4 relations can be performed separately. Firstly, we join the two of them:

|Customer Invoice| = 1.2 million


|LineItem Product| = 12 million;

Then the final resulting size, by joining the above two, is 12 million. Here LineItem governs the size by the
foreign keys.

B. See below the join cost calculations.

Join (10,000, 1,200,000) + Join (1,200, 000, 12,000,000) + Join (12,000,000, 1,000)

If LineItem were in the first join that would cost more since now all results have 12 million tuples, whereas
in the original sequence the first one had only 1.2 million.

C. At most 5 (the number of distinct customer types).

D. (This question is regarding Question 4-C. above. Please see Exercise 5 for a graphical illustration.)
Please refer to the graphical solutions of Exercise 5:

Projection onto customer type. It can't be done until the end.


Projection of line items on invoice ID. Must be done after join of Line item with Product.

E. (This question is regarding Question 4-C. above. Please see Exercise 5 for a graphical illustration.)
Please refer to the graphical solutions of Exercise 5:

We assume uniform distributions of product type values and monthly invoices.

Selection of Product on Product Type (around 100 tuples)


Selection of invoice on month July (around 100,000 tuples)

F. Excluding Cartesian products, the following three joins can be performed:

First Join R = Customer (invoices for July), cost = Join(10,000, 100,000), result 100,000 tuples
Second Join S = (invoices for July) LineItem, cost = 100,000 x 12,000,000, result 1,000,000 tuples
Third Join T = (LineItem for July) (products of given type), cost = Join(1,000,000, 100), result 100,000 tuples

The final join can be done on R and T: R T, cost = Join(100,000, 100,000), result 100,000 tuples

G. See above solutions of different number of tuples to Exercise 4-F.

5
INFS2200/INFS7903 – Relational Database Systems

H. After joins R, T and R T, a projection on Customer Type is needed, which requires a sorting operation
followed by duplicate elimination.

Exercise 5 Solution:

The steps of this optimisation process are following:

(1) we must build a standard canonical tree for the query.

πType
σInvoice.CustID = Customer.CustID
σLineItem.InvID = Invoice.InvID
σProduct.ProdID = LineItem.ProdID
σProductT ype = Widget
σM onth = July

(2) Then we must push down as far as possible the selections.

.CustID

6
INFS2200/INFS7903 – Relational Database Systems

(3) And now we can replace Cartesian products with joins.

(4) Here is a sequence of joins, so we can choose the less expensive arrangement.

(5) At the final step, we push down the projections to eliminate irrelevant attributes.

---ooo000O000ooo---

You might also like