Databases
Databases
Databases
1
Physical design describes the base relation, file organization, and
indexes used to achieve efficient access to the data, and any
associated integrity constraints and security measures.
Sources of information for the physical design process include
global logical data model and documentation that describes model.
Logical database design is concerned with the what; physical database
design is concerned with the how.
It describes the storage structures and access methods used to
achieve efficient access to the data.
2
model in target DBMS.
Designing base relation involves identification of all
necessary requirements about a relation starting from the name up to
the referential integrity constraints.
For each relation, need to define:
• The name of the relation;
• A list of simple attributes in brackets;
• The PK and, where appropriate, AKs and FKs.
• A list of any derived attributes and how they should be
computed;
• Referential integrity constraints for any FKs
identified. For each attribute, need to define:
• Its domain, consisting of a data type, length, and any
constraints on the domain;
• An optional default value for the attribute;
• Whether the attribute can hold nulls.
The implementation of the physical model is dependent on the
target DBMS since some has more facilities than the other in defining
database definitions.
The base relation design along with every justifiable reason should be
fully documented.
5.1.2. Design representation of derived data
While analyzing the requirement of users, we may encounter that
there are some attributes holding data that will be derived from
existing or other attributes. A decision on how to represent any
derived data present in the global logical data model in the target
DBMS should be devised.
Examine logical data model and data dictionary, and produce list
of all derived attributes. Most of the time derived attributes are not
expressed in the logical model but will be included in the data
dictionary. Whether to store derived attributes in a base relation or
calculate them when required is a decision to be made by the
designer considering the performance impact.
Option selected is based on:
• Additional cost to store the derived data and keep it consistent
with operational data from which it is derived;
• Cost to calculate it each time it is required.
Less expensive option is chosen subject to performance
constraints. The representation of derived attributes should be
fully documented.
3
database and the data model used but also with some enterprise
dependent constraints. These constraint definitions are also
dependent on the DBMS selected and enterprise level requirements.
One need to know the functionalities of the DBMS since in designing
the enterprise constraints for the target DBMS some DBMS
provide more facilities than others.
All the enterprise level constraints and the definition method in the
target DBMS should be fully documented.
4
that each transaction accesses, and/or
• Transaction usage map, indicating which relations are
potentially heavily used.
To focus on areas that may be problematic:
1. Map all transaction paths to relations.
2. Determine which relations are most frequently
accessed by transactions.
3. Analyze the data usage of selected transactions that involve
these relations.
5
• Possible performance degradation during query optimization
to consider all secondary indexes.
Guidelines for Choosing Indexes
(1) Do not index small relations.
(2) Index PK of a relation if it is not a key of the file organization.
(3) Add secondary index to a FK if it is frequently accessed.
(4) Add secondary index to any attribute that is heavily
used as a secondary key.
(5) Add secondary index on attributes that are involved in:
selection or join criteria; ORDER BY; GROUP BY; and
other operations involving sorting (such as UNION or
DISTINCT).
(6) Add secondary index on attributes involved in built-in
functions.
(7) Add secondary index on attributes that could result in an
index- only plan.
(8) Avoid indexing an attribute or relation that is frequently
updated.
(9) Avoid indexing an attribute if the query will retrieve a
significant proportion of the tuples in the relation.
(10) Avoid indexing attributes that consist of long character
strings.
6
by relaxing the normalization rules will improve the performance of
the system. Result of normalization is a logical database design that
is structurally consistent and has minimal redundancy.
However, sometimes a normalized database design does not
provide maximum processing efficiency.
It may be necessary to accept the loss of some of the benefits
of a fully normalized design in favor of performance.
Also consider that denormalization:
• Makes implementation more complex;
• Often sacrifices flexibility;
• May speed up retrievals but it slows down updates.
Denormalization refers to a refinement to relational schema such
that the degree of normalization for a modified relation is less than
the degree of at least one of the original relations.
Also use term more loosely to refer to situations where two relations
are combined into one new relation, which is still normalized but
contains more nulls than original relations.
Consider denormalization in following situations, specifically to
speed up frequent or critical transactions:
• Step 1 Combining 1:1 relationships
• Step 2 Duplicating non-key attributes in 1:* relationships to reduce joins
• Step 3 Duplicating foreign key attributes in 1:* relationships to reduce
joins
• Step 4 Introducing repeating groups
• Step 5 Merging lookup tables with base relations
• Step 6 Creating extract tables.