R DB Design
R DB Design
R DB Design
Should contain, as far as possible, attributes of only the entity it represents. Example: A bad relational schema. semantically. Department (bad schema) Dnumber Dlocation Dname Interest rate Interest rate attribute does not fit
Should require minimum storage space and have minimum data redundancy. Example: Suppose it takes one word to store the value of an attribute. Sport (good schema)
Inst_id 1 1 2 2
This relation will require 20 words. We split this relation into S1 and S2 relations
S1 Sname Instructor-id Foorball 1 Tennis 1 Baseball 2 Golf 2 S2 Sname Instructor-id Tom 1 Peter 2
This arrangement requires 18 words to store the same information. The amount of saving increases with database size. A smaller storage space also improves searching. These are a few simple examples to illustrate the importance of good schema. Our aim is then to design good schema.
Database Processing
Consider the following Student relation. The relation indicates that students with stu_id take up some Activities and pay certain amount of fee for learning the activity. A student with any id can take any number of Activities provided they pay the fee. Assume
that this relation is stored in the database. We want to see what problems occur if this relation is modified. Student Stu_id 100 100 150 175 175 200 200 Activity Skiing Golf Swimming Squash Swimming Swimming Golf Fee 200 65 50 50 50 50 65
Modification: Suppose Student 100 gave up skiing. The first record must be deleted from the database. Effect: Database does not have information about skiing fee. More information is deleted than intended. This is not a good relation.
Modification: Suppose you want to add Baseball in the database and you are going to charge 200. Effect: Cannot add this information since there is no student. We cannot add a fact about one entity until we have an additional fact about another entity. To add this record you must have a student willing to learn baseball. You are not allowed to insert null value in Stu_id attribute. This is not a good relation.
These are called Modification Anomalies. We will study them in detail. First we define some useful terms
Prime attribute: An attribute primary key set. If primary key = {A, B, C}, then attributes A, B, and C are prime attributes. Non-prime attribute: Attribute primary key set. If R(A, B, C, D) and primary key = {A, B, C}, then attribute D is a non-prime attribute. There is no formula to establish FD. The semantics of a relation should indicate how attributes of a relation are related. Examples Suppose we create relation Sport_Fee. It holds information about fee charged for each sport. The semantics of this relation indicates that the value of the attribute Activity determines the value of the fee. This means wherever Golf appears, its fee will be 65. From this semantics we can conclude that Activity Fee. We can also see that Fee does not determine Activity (Fee Activity), i.e., Fee cannot be the left side of FD. In Sport_Fee fee value of 50 is associated with two different activities. Sport_Fee Activity Skiing Golf Squash Swimming Consider the following Parts relation. The Primary key = {SSN, Pnumber}. The FDs are: Pnumber {Plocation, Pname} and Plocation Pnumber Parts SSN 1234 1234 1235 1236 1237 1238 Pnumber 1 2 1 3 1 5 Plocation Kansas City Mission Kansas City Wichita Kansas City Kansas City Pname Car Rocket Car DC 10 Car Truck Fee 200 65 50 50
SSN {Ename, Bdate, Address, Dnumber} Dnumber {Dname, Dmgrssn} These FDs are represented as follows using FD diagram. Emp_Dept Ename SSN Bdate Address Dnumber Dname Dmgrssn Similarly FDs in Emp_Proj schema are represented as follows. Note that one of the prime attributes (e.g., Pnumber) can also be a determinant. {SSN, Pnumber} Hours. SSN Ename Pnumber {Pname, Plocation} Emp_Proj
SSN Pnumber Hours Ename Pname Plocation
IMPORTANT: One must remember that FDs in a relation are defined by the database designer. In the above examples these FDs may not be valid if they have not been defined.
To understand 1NF we define Repeating groups. In Order relation, for some Ono value, there are more than one Pno values (BT04 and BZ66). Thus Pno is a repeating group. This is not allowed in relational model where every tuple must have values (including NULL) in all its attributes. Thus, for the 3rd and 6th tuples in Order, Ono and Date must have some appropriate values. If suitable values are inserted, then the repeating groups are eliminated. This is called normalization. To normalize Order copy the values of all other attributes from the previous tuple. Thus, the 5th tuple has no
value for Ono and Date attributes. These copies are copied from the previous tuple. This process is repeated until every tuple has values for all its attributes.
Non-normalized: Order
Ono 12489 12491 12494 12495 Date 90287 90287 90487 90487 Pno AX12 BT04 BZ66 CB03 CX11 AZ52 BA74 BT04 Pno AX12 BT04 BZ66 CB03 CX11 AZ52 BA74 BT04 No_Ordered 11 1 1 4 2 2 4 1 No_Ordered 11 1 1 4 2 2 4 1
12498 12500
Normalized: Order Ono 12489 12491 12494 12495
90587 90587
Date 90287 90287 90487 90487
1245
12498
90487
90587
1248
12500
90587
90587
Normalized Order relation is in 1NF. The primary key to the not-normalized relation Order was Ono alone. The primary key to the normalized Order is now the combination of Ono and Pno. In general it is true that the primary key will expand in the transformation of not-normalize relation to 1NF. It will typically include the original primary key concatenated with the key to the repeating group.
Modification Anomalies
Consider the following Order relation. It is in 1NF.
Order
Ono Date Part_descrip Pno No_Ordered
11 1 1 4 2 2 4 1
A change to the description of BT04 requires several changes since BT04 has been duplicated as a result of normalization. Inconsistent data: In the absence of incomplete update, BT04 may have different values in other attributes. Additions: We cannot add ZZ14 until we have an order for ZZ14. Deletion: By deleting BT04 we lose that BT04 represents Stove.
Conclusion: Order relation needs some improvement to minimize anomalies. It must be normalized to 2NF. 2NF
A relation is in 2NF if it is in 1NF and non-prime attribute is fully functionally dependent on the primary key.
Order_no Date Part_no Part_desc No_Ordererd Price Primary key: Order_no, Part_no. Relation Order is in 1NF: All attributes are atomic. Relation Order is not in 2NF, because non-key attribute Part_desc is dependent upon Part_no which is a portion of the primary key. Also Date is dependent upon Order_no (part of the primary key). This type of dependency is often termed as Partial Dependency 1NF to 2NF: Split the relation into two or more via projection as follows. Steps
Identify the set of attributes that makes up the primary key, i.e., {Order_no, Part_no}. Create all subsets of the above set, i.e., {Order_no}, {Part_no} and {Order_no and Part_no}. Designate each of these subset as the primary key of a relation that contains those attributes which are dependent on these primary keys:
Primary Keys: {Order_no}, {Part_no} and {Order_no, Part_no} New relational schemas Order (Order_no, Date) Date is functionally dependent on Order_no (see diagram). Part (Part_no, Part_desc) Part_desc is functionally dependent on Part_no. Order_line (Order_no, Part_no, No_ordered, Price) Occurrences of Order Part_desc, and Order_line
Order Order_no Date 12489 12491 12494 12495 12498 12500 90287 90287 90487 90487 90587 90587
Part Part_no Part_desc AX12 AZ52 BA74 BH22 BT04 BZ66 CA14 CB03 CX11 Iron Skates Baseball Toaster Stove Washer Skillet Bike Mixer
Order_line Part_no Order_no No_ordered Price AX12 BT04 BZ66 CB03 CX11 AZ52 BA74 BT04 12489 12491 12491 12494 12495 12498 12498 12500 11 1 1 4 2 2 4 1 14.95 402.99 311.95 175.00 57.95 22.95 4.95 402.99
All these relations are in 2NF. Anomalies have been eliminated. To verify this we perform the same set of operations, which resulted in anomalies in the parent relation Order. Change: Addition: Delettion: Information loss: Change to BT04 requires only one change in Part relation. Adding a new part and its description do not need an order to exist. Deleting order 12489 does not cause AX12 to be deleted from Part. none.
Q. Does this imply that relations in 2NF do not have modification anomalies? A. No. Relations in 2NF may suffer with all modification anomalies. Example Customer Cust_no 124 256 311 315 405 412 522 567 587 622 Name Sally A Ann S Don C Tom D Al W Sally A Mary N Joe B Judy R Dan M Address 4747 Troost 215 Oak 48 College 914 Cherry 519 Watson 16 Elm 108 Pine 808 Ridge 512 Pine 419 Chip Slsrep_no Slsrep_name 2 Tom J 6 Bill S 12 Sam B 6 Bill S 12 Sam B 3 Mary J 12 Sam B 6 Bill S 6 Bill S 3 Mary J
Customer is in 2NF. It suffers with all the anomalies Update: Inconsistent data: Additions: Deletions: A change to Slsrep_name requires multiple changes. There is nothing in the design that would prohibit a Slsrep_name from having two different names. To add Slsrep_no 47, there must be a customer for 47 first If we delete all the customers of a sales rep then we lose the name of the Sales rep also
Reason for these anomalies and remedy Slsrep_no which is not a primary key, determines Slsrep_name. As a result Slsrep_no can appear many times in the relation. Cust_no determines Slsrep_name through Slsrep_no. This is a transitive dependency. A transitive dependency exists among three or more attributes. For example, suppose there are three attributes X (primary key), Y, and Z. If X Y and Y Z then X Z. So Z is transitively dependent on X. To minimize anomalies normalize Customer to 3NF by removing transitive dependency. 3NF
A relation r(R) is in 3NF if it is in 2NF and no non-prime attribute of r is transitively dependent on the primary key.
Customer relation is further normalized to the following relation using the method descrined earlier. Customer_Profile: Srep_Profile: Another example Housing
SID 100 150 200 250 300 Building Randolf Ingersol Randolf Pitkin Randolf Fee 1200 1100 1200 1100 1200
In Housing relation which is in 2NF, Fee is transitively dependent on SID. This relation has all modification anomalies. This relation can be converted to 3NF by normalization process. We convert Housing into Housing and Fee relations. Housing: SID Building Fee: Building Fee 3NF also have modification anomalies. Consider the following Advisor relation and its occurrence Advisor
Relationships A student can have one or more majors. A Major can have several faculties as advisors. A faculty member advises in only one Major area. SID cannot be a key since a student can have many Majors and therefore many advisors. A student cannot have many advisors in the same area. Keys are: Determinant: (SID,Major) Fname. (SID, Fname) Major Fname Major
This relation does not have transitive dependency. One of these sets can be selected as a primary key. Advisor is in 3NF since there is no transitive dependency but it has modification anomalies. Deletion Delete SID 300, we lose the fact that Perls advises in Psychology. Addition: Cannot add the fact that Keynes advises in Economics if there is no student enrolled. Update: If Cauchy advises in Physics then multiple changes are required. Inconsistency: Any change in Cauchy-Math will make Advisor inconsistent. Solution: Further normalization to Boyce/Codd Normal form. Boyce/Codd Normal Form (BCNF)
A relation is in BCNF if every determinant is a candidate key Advisor is not in BCNF because Fname Major, and Fname is not a candidate key. So Advisor is normalized to BCNF and the resultant relations are: Advisor: SID Fname Major: Major Fname Relations in BCNF may not entirely free from anomalies. Consider the following Student relation: Student SID Major Activity 100 Music Swimming 100 Accounting Swimming 100 Music Tennis 100 Accounting Tennis 150 Math Jogging
Semantics: A student can take more than one major and can enroll in more than one activity. Primary key: All three attributes. What is the relationship between SID and Major? It is not functional dependency, because students have several majors. There is some sort of relationship. This relationship can be illustrated by an example Suppose: Student 100 wants to enroll in Skiing. Add: tuple 100 Music Skiing. Resulting relation Student SID Major 100 Music 100 Music 100 Accounting 100 Music 100 Accounting 150 Math Activity Skiing Swimming Swimming Tennis Tennis Jogging
Semantics: It implies that Student 100 can take up skis as a Music major but he/she does not know to ski as an Accounting major. Illogical. Solution: Add tuple: 100 Accounting Skiing. Resulting relation. Student SID Major 100 Music 100 Music 100 Accounting 100 Accounting 100 Music 100 Accounting 150 Math Activity Skiing Swimming Swimming Skiing Tennis Tennis Jogging
The resulting relation is consistent. The relationship between SID and Major is a Multivalued dependency. SID determines not a single value but several values. Thus SID = 100 determines Majors [Music, Accounting] and Activities [Skiing, Swimming, Tennis]. The relation is in BCNF since all attributes make the primary key. It has anomalies. Addition: If one tuple is added then several other tuples must be added to preserve consistency. Delete: If 100 Music Skiing is deleted then 100 Accounting Skiing also has to be deleted even though Major and Activity are not related.
Student SID Major 100 Music 100 Music 100 Accounting 100 Accounting 100 Music 100 Accounting 150 Math
Solution: As usual to minimize the anomalies, Student can be projected out into the following two relations. S_major S_Activity
Multivalued dependency always occur in pairs. For the Student relation SID Major because Major depends only on the value of SID and not on the value of Activity. Similarly SID Activity. Activity does not dependent on Major since having a particular major implies nothing about Activity. The no relationship between Major and Activity creates problem in the sense that whenever we add a new Major, we must add a tuple for every value of Activity. The independence among attributes is not a problem if the attributes have a single value. For example, consider the following Student_shoe relation. Student_shoe SID Shoe-size 100 8 150 10 200 5 250 12 Primary Key: SID. In Student_shoe, Shoe_size and Marital_status are independent and have single value, anomalies cannot occur. We can delete or add any tuple with no problem. This observation leads to the definition of 4NF. 4NF A relation is in 4NF if it is in BCNF and it has no multivalue dependencies. Marital_status M S S S