Databases - Introduction: Jianxin Li
Databases - Introduction: Jianxin Li
Databases - Introduction: Jianxin Li
Jianxin Li 1
Senior Lecturer
School of Computer Science & Software Engineering
University of Western Australia
1
Thanks Gordon Royle in the School of Mathematics & Statistics for his contribution to the
lectures in this course.
Jianxin Li (UWA) Introduction 1 / 33
Introduction
What is data?
Databases
There are many different types of DBMS, but relational database management
systems (RDBMS) have been the dominant paradigm for several decades.
An old-fashioned DBMS
There are many variations, but usually they would at least have:
A large card catalogue listing all books in the library
Check-out slips stored in little envelopes in each book
An on loan list formed from the check-out slips of the books on loan
An old-fashioned DBMS
A card catalogue consisted of wooden cabinets with large numbers of
drawers, each containing an index card relating to a single book.
The drawers, and cards, were alphabetically sorted, usually by author name,
to allow readers to find which titles the library stocks.
"2010 Manchester UK 4467481691" by Ricardo from Manchester, UK - Manchester Central Library, March 2010
Check-out cards
Each book had a little envelope with a checkout-card that was removed and
filled-in with the due-date and the borrowers name, while the due-date was
stamped onto a slip of paper pasted into the book.
The check-out cards were stored in due-date order, allowing library staff to
quickly access the slips when books were returned, and to determine each day
which books had become overdue.
Jianxin Li (UWA) Introduction 7 / 33
Introduction
Features of this DB
Despite being manual, the library system nevertheless displays some of the
features of a modern database:
It had separate listings (catalogues) for books and borrowers with the
check-out cards forming a third list connecting specific books to specific
borrowers.
Many organisations can (and do) manage their data simply as a collection of
files on a shared file system.
Any user with sufficient file system privileges can search, open and edit the
files.
A DBMS sits between the users and the data, and manages the interactions
between them.
The DB model
User1
User2
DBMS
User3
User4
(In general, the Coursera video lectures and supporting materials from this
course are a useful resource for this unit.)
In more detail
Data Independence I
Data Independence II
If the user wants to know the name of the student with student number
22041020 then they ask the DBMS, using something like the following SQL
(Structured Query Language) command2 .
The user does not need to know where this information is stored, what
machine it is on, what files it is in and so on the mechanics of locating and
extracting the information is left to the DBMS.
2
Do not worry if you do not understand this yet!
Jianxin Li (UWA) Introduction 14 / 33
Introduction
Efficiency
Data Integrity
Data Administration
Concurrency Control
In a large organization, there will often be several people accessing the same
data item at the same time.
While this is not a problem if all users are simply viewing the data, it becomes
a major problem if some of the users need to update the data.
For example, an airline reservation system may have several travel agents
viewing availability at the same time, but the DBMS must prevent two agents
from booking the same seat at the same time.
Application Development
A large subject
Each of the topics listed above has enough theory, practice and technology
associated with it to form an entire unit that could legitimately be called
Databases.
Ramakrishnan & Gehrke3 identify two major approaches:
Systems Emphasis
Building database systems the nuts and bolts of storage, indexing,
query optimization, transaction management.
Applications Emphasis
Using database systems data modelling, data query languages,
database-backed applications.
3
http://pages.cs.wisc.edu/~dbbook/
Jianxin Li (UWA) Introduction 20 / 33
Introduction
This unit
Terminology
Tables / relations
The table, or relation (the dark green words are the formal terms), is the
fundamental object manipulated by a relational database.
Tables / Relations
A table has
A name allowing the designer and user to refer to it
A header giving names to the columns, each of which has a specified type
Zero or more rows, each representing one object
The rows
The highlighted row refers to the book Cannery Row written by Steinbeck
and first published in 1945.
The columns
The header of each column names a general property of the objects being
stored in the table.
A column of the table stores the values of these properties, one value per
object (i.e. per row).
Thus each book has a publication date a general property of books, but
each individual book has its own publication date, and so the value of this
attribute differs between books.
Attributes
Types I
Each attribute also has a type, which refers to the kind of values that can be
stored in that column.
Some common types are:
Various numeric types, including;
INT, BIGINT, FLOAT, DOUBLE
Various time-related types, including:
YEAR, DATE, DATETIME
Various text-related types, including:
CHAR, VARCHAR, TEXT
Various other special-purpose types, including
BLOB, ENUM, BINARY
Types II
Tuples I
Tuples II
Some tuples are simply not legal, either because they have the wrong arity or
a component has the wrong type. So
is not a legal tuple because the third component does not represent a date, and
A subtlety
The structure of the table that is, the names and types of the attributes
is called the schema of the relation.
The schema of a relation is normally carefully designed and changes
infrequently, usually in response to some structural change in the business
environment
The contents of the table at any particular point in time is called an
instance of the relation
Under normal usage, rows may be added to a table, then altered and finally
deleted, and so the relation instance is frequently changing
Every actual database system omits some SQL commands, but includes some
non-standard extensions.