0% found this document useful (0 votes)
2 views

Lecture 3 M

The document outlines the Database Life Cycle, detailing stages from system definition to monitoring and maintenance, with a focus on database design, implementation, and data conversion. It explains logical database design, including system-independent mapping and tailoring schemas to specific DBMS, as well as the process of mapping ER diagrams to relational schemas. Additionally, it covers SQL basics for creating and manipulating relational databases, including table creation, views, and various SQL commands.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views

Lecture 3 M

The document outlines the Database Life Cycle, detailing stages from system definition to monitoring and maintenance, with a focus on database design, implementation, and data conversion. It explains logical database design, including system-independent mapping and tailoring schemas to specific DBMS, as well as the process of mapping ER diagrams to relational schemas. Additionally, it covers SQL basics for creating and manipulating relational databases, including table creation, views, and various SQL commands.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 29

Designing &

Implementing a
Relational Database

1
The Database Life Cycle
All databases regardless of their application area go through a similar life cycle. The Database
life cycle is a specialized process to develop a database, it follows the systems development life
cycle: -
1. System definition. The scope of the database system, its users, and its applications are
defined. The interfaces for various categories of users, the response time constraints, and
storage and processing needs are identified.
2. Database design. A complete logical and physical design of the database system on the
chosen DBMS is prepared.
3. Database implementation. This comprises the process of specifying the conceptual, external,
and internal database definitions, creating the (empty) database files, and implementing the
software applications.
4. Loading or data conversion. The database is populated either by loading the data directly or
by converting existing files into the database system format.

2
The Database Life Cycle
5. Application conversion. Any software applications from a previous system are converted to
the new system.
6. Testing and validation. The new system is tested and validated. Testing and validation of
application programs can be very involving. There are automated tools that assist in this
process, you may want to find out.
7. Operation. The database system and its applications are put into operation. Usually, the old
and the new systems are operated in parallel for a period of time.
8. Monitoring and maintenance. During the operational phase, the system is constantly
monitored and maintained. Growth and expansion can occur in both data content and software
applications. Major modifications and reorganizations may be needed from time to time.

These slides focus on stages 2, 3, and 4, the other stages are carried out just like in information
systems development.

3
Database Design - Logical
Logical database design is divided into two:-
1 System-independent mapping. In this stage, the mapping does not consider any specific
characteristics or special cases that apply to the particular DBMS implementation of the data
model. This is done through some algorithms, an example algorithm is discussed in the following
slides i.e. mapping of an ER schema to a relational schema.
2. Tailoring the schemas to a specific DBMS. Different DBMSs implement a data model by using
specific modelling features and constraints. We may have to adjust the schemas obtained in
step 1 to conform to the specific implementation features of a data model as used in the
selected DBMS.

4
System Independent Mapping:
Suppose an ER Diagram as shown
Species_ID Name Type Gene
Id Name

Sequence
1 M
Species Has Gene

Codes
for

Name
1

PID Protein
Sequence

5
Mapping the ERD to a Relation
1. Map Regular Entities – Crate a Relation R with all simple attributes, for composite attributes include
their simplest form. Choose one attribute as Primary key.
2. Map Weak Entities.- create a relation R for a weak entity & include all simple attributes (or simple
components of composite attributes) of the weak entity as attributes of R. Include the primary key of the
owner entity in R to form a composite key in the new relation.
3: Map Binary 1:1 Relationships. For each binary 1:1 relationship, identify the relations S and T that
correspond to the entities participating in R. There are three possible approaches:
(i) the foreign key approach, i.e., choose 1 of the relations to carry the foreign key & the attributes of the
relationship.
(ii) the merged relationship approach (put the two together), and
(iii) the cross reference or relationship relation approach (create a third all key relation to carry the 2
keys).
The first approach is the most useful.

6
Mapping the ERD to a Relation
4. Map Binary 1:N Relationships. For each regular binary 1:N , identify the relation S that represents
the entity at the N-side of the relationship. Include as foreign key in S the primary key of the relation T
that represents the other entity. Include any simple attributes (or simple components of composite
attributes) of the 1:N relationship type as attributes of S.
5: Map Binary M:N Relationships. For each binary M:N relationship, create a new relation S. Include
as foreign key attributes in S the primary keys of the relations that represent the participating entity
types; their combination will form the primary key of S. Also include any simple attributes of the M:N
relationship type (or simple components of composite attributes) as attributes of S.
6: Map Multivalued Attributes. For each multivalued attribute A, create a new relation R. This
relation R will include an attribute corresponding to A, plus the primary key attribute K—as a foreign
key in R—of the relation that represents the entity type or relationship type that has A as a
multivalued attribute. The primary key of R is the combination of A and K. If the multivalued attribute
is omposite, we include its simple components.

7
Mapping ER Model to Relation
7: Mapping of N-ary Relationship Types. For each n-ary relationship type R, where n > 2, create
a new relation S to represent R. Include as foreign key attributes in S the primary keys of the
relations that represent the participating entity types. Also include any simple attributes of the
n-ary relationship type (or simple components of composite attributes) as attributes of S.

8
Applying the Algorithm to the ERD
1. Mapping Regular entities: -
Species (Species_Id, Name, Type)

Gene(Gene_Id, Name, Sequence)


Protein (PID, Name, Sequence)
2. Mapping Weak entities – There are no weak entities (we may consider introducing an author
of a gene sequence as a weak entity, show how this will be dealt with)

3. Mapping 1:1 Relationships: -


Using the first approach given in the algorithm, we choose to include Gene_Id in the table
protein :- Protein (PID, Name, Sequence, Gene_Id)

9
Applying the Algorithm to the ERD
4. Mapping 1: N Relationships :-
We include the foreign key of species in Gene thus :-
Gene (Gene_Id, Name, Sequence, Species_Id)
5. Mapping Binary M : N Relationships – There is none in this example
6. Mapping Multi-valued attributes – There is none in this example, but a protein fold is an
example of a multi-valued attribute.
7. Mapping N-ary relationships – There is none in this example:

10
Applying the Algorithm
The Resulting Database Structure is:
Species (Species_Id, Name, Type)

Gene(Gene_Id, Name, Sequence, Species_Id)


Protein (PID, Name, Sequence, Gene_Id)

11
Implementing a Biological Database
There are many ways using the DBMS (MySQL), example: -
DBMS GUI User Interface,
MySQL Workbench (Create an ERD and forward engineer
your database)
SQL Statements from the Shell (You have to know the SQL
statement syntax)

12
Structured Query Language
SQL is a standard language for creating and manipulating Relational databases. It facilitates a smooth
transition from one DBMS to the other.

SQL Basic Statements:


Divided into:- Data Definition and Data Manipulation
Create Schema Select <attribute list>
Table FROM <table list>
View WHERE <condition>;
Index
Example Create statements:-
◦ Create Schema SCIDB AUTHORORIZATION ‘smoyo’;
◦ Create Table SCIDB.Species;

13
Creating the Database Tables
CREATE TABLE SCIDB.Species
( Species_Id VARCHAR(15) NOT NULL,
Name CHAR,
Type VARCHAR(30),
PRIMARY KEY (Species_Id));

14
Creating the Database
CREATE TABLE SCIDB.Gene
( Gene_Id VARCHAR(15) NOT NULL,
Name CHAR,
Sequence VARCHAR(200) NOT NULL,
PRIMARY KEY (Gene_Id)
FOREIGN KEY (Species_Id) REFERENCES Species (Species_Id));

15
Creating the Database
CREATE TABLE SCIDB.Protein
( PID VARCHAR(15) NOT NULL,
Name CHAR,
Sequence VARCHAR(200) NOT NULLL,
PRIMARY KEY (PID),
FOREIGN KEY (Gene_Id) REFERENCES Gene (Gene_Id));

16
Creating Views
A view is a subset of a database afforded a particular set of users
CREATE VIEW Human Genes
AS SELECT Species_Id, Name, Type
FROM Species
WHERE Species_Id = “HMS”;

CREATE VIEW (PID, Name, Sequence, Gene_Id)


AS SELECT *
FROM Protein
WHERE PID = ‘P45’
GROUP BY PID;

17
Basic Select Statement
The SELECT statement is used to query the database for particular data of interest
The SELECT syntax:
SELECT <attribute list>
FROM <table list>
[ WHERE <condition> ] Specifies condition
[GROUP BY <attribute>] Specifies grouping attributes
[HAVING <condition>] Specifies condition on the groups
[ ORDER BY <attribute list> ] Sorts query result at the end

Square brackets indicate options

18
Select Statement
Example 1: To List everything from a table:-
SELECT PID, Name, Sequence, Gene_Id OR SELECT *
FROM Protein; FROM Protein;

Example 2:Working with two tables


SELECT PID, Name, Gene_Id
FROM Protein, Gene
WHERE Protein. Gene_Id = Gene.Gene_Id;

19
Using Distinct & LIKE in SQL
SELECT DISTINCT Name
FROM Protein; -Only distinct tuples should remain in the result
- Eliminates duplicates

SELECT Species_Id, Name -Target all those species with this text on name
FROM Species
WHERE Name LIKE ‘%sapiens’;

20
INSERT Statement
INSERT INTO Species
VALUES ( ‘SP245’, ‘Homo sapiens’, ‘Eukaryote’);

The INSERT statement adds a single row to a relation, and the attribute values must be specified
in the order that they appear in the table.
Unspecified attributes are set to their defaults or NULL

21
The DELETE & Update Command
Example 1
DELETE FROM Species
WHERE Name =‘Homo sapiens’; Removes a row with the species Homo sapiens

Example 2
DELETE FROM Protein
WHERE Gene_Id =‘G_102’; Remove all those proteins containing the named gene
Update Command
UPDATE Species
SET Name = ‘Escherichia coli’
WHERE Species_Id = ‘SP102’; Changes species name for the named record

22
MySQL DBMS
Download the MySQL DBMS at the site below, or use any DBMS of your choice:-

https://dev.mysql.com/downloads/file/?id=479862

Follow the Installation instructions given,


Implement the database we have designed in these slides using an appropriate DBMS, add more attributes to describe
your entities.

You may use the video on this link to help you to create the database using MySQL Workbench.
https://www.youtube.com/watch?v=K6w0bZjl_Lw

23
Database Search Tools
In Public databases, stakeholders search databases for the following purposes:
Gene finding
Assigning likely function to a gene.
Understanding genome evolution.
Assisting in sequence assembly
Finding relations between genes
etc.

We need algorithms to execute such searches, speed is key

24
BLAST
Basic Local Alignment Search Tool – a family of most popular sequence search programs the
simplest being Basic BLAST
BLAST is a linear time heuristic algorithm
Main idea (basic BLAST): Homologous sequences are likely to contain a short high scoring
similarity region a hit.
Each hit gives a seed (starting point) that BLAST tries to extend on both sides

25
Some BLAST Terminology
word – substring of a sequence
word pair – pair of words of the same length.
score of a word pair – score of the gapless alignment of the two words:
HSP – high scoring sequence pair

26
BLAST Main Steps
Parameters: w = length of a hit; T = min. score of a hit (for proteins: w=3, T=13 (BLOSUM 62))
Step 1: Given a query sequence Q, compile the list of possible words which form with words in Q
high scoring word pairs.
Step 2: Scan database for exact matching with the list of words compiled in step 1.
Step 3: Extend hits from step 2.
Step 4: Evaluate significance of extended hits from step 3.
Read the Document BlAST Explained:
BLAST Algorithm Explained.pdf

27
Statistical Significance of BLAST
Is the score high enough to provide evidence of homology?
Are the scores of alignments of random sequences higher than this score?
What are is the expected number of alignments between random sequences
with score greater than this score?

28
Variants of BLAST Algorithms
Algorithm Query (Comparison) Database

BLASTN Nucleotide vs nucleotide Nucleotide


Used for searching with shorter queries, cross-species comparison

BLASTP Protein vs protein Protein


Used in general sequence identification and similarity searches

BLASTX Nucleotide, sixframe translation (protein vs Protein


protein) - for identifying potential protein products encoded by
a nucleotide query

TBLASTN Protein vs protein - for identifying database sequences Nucleotide, six frame
encoding proteins similar to the query translation

TBLASTX Nucleotide, sixframe translation - for identifying Nucleotide, six frame


nucleotide sequences similar to the query based on their coding translation
potential 29

You might also like