Notes
Notes
Notes
SivanaHamer
Universidad de Costa Rica
Escuela de Ciencias de la
Computación e Informática
CI-0127 Bases de Datos
Sivana Hamer
Escuela de Ciencias de la Computación e Informática, Universidad de Costa Rica
II 2022, v1.4.2
Este trabajo tiene una licencia de: CC BY-NC-SA 4.0
Importante: Este documento recopila contenidos de diversos de sitios web especia-
lizados, académicos y documentos compartidos por universidades. Toda la informa-
ción es utilizada con fines estrictamente académicos. Para más información, puede
ver las referencias.
2
Índice general
I Introducción 7
II Diseño 17
2. Diseño conceptual 19
2.1. Database design process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2. Entity relationship model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3. Entities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4. Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5. Relationships . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6. Weak entities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.7. Binary versus higher degree relationships . . . . . . . . . . . . . . . . . . . 25
2.8. Naming conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.9. Extended entity relationship model . . . . . . . . . . . . . . . . . . . . . . . 26
2.10.Subtype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.11.Specialization & Generalization . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.12.Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.13.Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3
III Implementación 61
4. SQL Básico 63
4.1. Structured Query Language . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2. Data definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.3. Data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.4. Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.5. Data modification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.6. Data retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.7. Database permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5. SQL Avanzado 81
5.1. NULL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.2. Nested . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.3. JOIN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.4. Aggregate functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.5. More clauses SELECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.6. Assertions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.7. Triggers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.8. Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.9. Stored procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
IV Cálidad 89
6. Evaluación de la calidad del diseño 91
6.1. Design guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.2. Functional dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.3. Normal forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.4. Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4
8.7. MINUS operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
8.8. CARTESIAN PRODUCT operation . . . . . . . . . . . . . . . . . . . . . . . . 127
8.9. JOIN operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
8.10.Sequences of operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.11.Complete set of operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.12.Equivalences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.13.Query tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
8.14.Query graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
VI Transacciones 155
10.Transacciones 157
10.1.Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
10.2.Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
10.3.Transaction properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
5
6
Parte I
Introducción
7
Capítulo 1
1.1. Data
Data are known or assumed facts. Nowadays, we generate and use data during
our day to day. For example, while I am messaging someone with my phone lots of
data is generated and consumed. Somewhere in my phone, there is a phone address
that is accessed. Furthermore, if I’m using a messaging application there are servers
with IP addresses that must be retrieved to send the message. The message itself
also contains data, the message, who sent it, from where and when. Depending on
the application other data related to the message may also be possible to recollect
such as reactions (thumbs up), replies or who has seen it.
Having access to data has real world value. Companies build their entire business
models by the information they store such as social networks. Furthermore, there
are institutions that are in charge of creating and maintaining databases such as
the national register of citizens or banks. These companies have an interleaved role
within our society, having rights to decide who can vote, what they can buy or what
products will they be aware of through advertisement. Some companies even lose
their reputation by not managing data correctly. Thus, data is quite the commodity.
Throughout our history, we have always generated, used and preserved data. Due
to the usefulness of computers we have migrated data stored in other mediums to
a digital format. Specifically, data is stored in databases (DB) that are collections of
data.
Data integrity: Changes might cause errors in the integrity of the data. How
can we ensure that the student name is the same for the enrollment data? How
can we ensure that the semester is valid? What happens if a course changes
their name?
Data access: As the data has a certain value for the institution, they might want
to retrieve certain information. How can we find all the years of birth of students
9
who have taken a course?
Data concurrency: If there are several people trying to write to the file, we
have to deal with concurrent changes. Do we allow concurrent changes? How
do we choose which value to store? How do we reduce data anomalies?
Data security: As our files are stored in a computer, someone without access
might try to gather data. How can we ensure that the data is protected?
Data descriptiveness: Let us assume that someone new is managing the file
database, without training. How descriptive is the data? How descriptive are
the data conventions? How descriptive are the data relationships?
A query is any interaction with the database. For example, retrieving all the
students or adding a new enrollment.
Someone interacts with the database requesting a query. For example, gathering
all the students currently enrolled in a course or adding a new student. People
can include database administrators (DBA) who are in charge of administering
the database system, programmers that develop apps that access the databa-
se system, and final users who use an application that accesses the database
system.
This query is processed by the query processor that efficiently gathers the data.
10
The data items related to the query are accessed and gathered from physical
storage with a storage manager.
The meta-data is data about the data (data types, structures and constraints).
For example, allowing that year to contain only integers. As the meta-data is a
collection of data about data (quite recursive) it is also a database (as seen in
the Figure).
The data represents the stored data. For example, a student entry with ID “123456789”,
name “Ana Arias” and year of birth “1991”.
A database schema describes the database design as in the the meta-data of the
database.
The database state, snapshot or instance is the actual current data in the data-
base. For example, it would be all the students in the database at a moment in
time.
Requests a query
Database system
Processes
DBMS
the query
Meta-data Data
11
levels as their relationship is mapped between each other. The levels from lowest the
lowest degree of abstraction to highest are:
Physical level: Describes how the data is stored. Specifically, in this level the
block of bytes that are stored are abstracted from the user.
Conceptual or logical level: Defines what data is stored conceptually and what
is their relationship through data models (Section 1.6). Though there is a rela-
tionship between the data and how it is physically stored, there is a physical
level of independence as the details for the mapping are abstracted from the
user.
External or view level: Provides an interaction with the database for different
users. There are different views defined for each user, limiting access to certain
parts of the database. There is a logical level of independence as we can change
what data we are accessing without having to change the logical level of the
DBMS.
Though the data can be abstracted into higher or lower levels, the data only exists
in the lowest abstraction level (physical level). A summary of all the levels with their
relationships is shown in Fig. 1.2.
External level
How we interact with the View 1 ··· View n
data Logical
independence
Conceptual level
Conceptual schema
How we model the data
Physical
independence
Physical level Physical schema
How we store the data
Figura 1.2: Levels of abstraction
High-level or conceptual: Details how the users perceive the data. For exam-
ple, the entity-relationship model.
12
Self-describing: Combines bothe description of the data (schema) with the data
values (state). For example, NoSQL or key-value.
Depending on the selected data model and/or the DBMS used, the database must
be designed.
SELECT NAME
FROM STUDENT
WHERE BYEAR = 1991;
13
Low-level or procedural: Defines the data needed with how to get the data.
This type is related to relational algebra.
Two-tier: The application is in the client machine while the database resides in
a server machine. The interaction between tiers is achieved with an application
programming interface (API) that allows clients to call the DBMS. The standard
API for DBMS is called Open Database Connectivity (ODBC).
Three-tiers: The client machine acts as the front-end with an intermediate ap-
plication server that stores the business logic. The application server thus inter-
acts with the database server to retrieve the queries. Examples of these systems
include web browsers or mobile applications.
n-tier: One can define a variable amount of n tiers. Usually, the intermediate ap-
plication server can be divided into different responsibilities to better distribute
load.
The visual structure of two and three tier architecture is shown in Fig. 1.3.
User User
Client
Application Application client
Application server
Database system Servers
Database system
Figura 1.3: Two (left) and three (right) tier DBMS architecture
1.9. History
14
Figura 1.4: Summary of database history
1950s-early 1960s. Data was moved to computer data storage such as punch cards
and tapes. There were several limitations with how the data could be read (se-
quentially) with some main memory being smaller than the data size.
late 1960s-early 1970s. Computers sequence limitations were lifted (no longer se-
quential). Relational data model began to be explored as they were defined in
the 1970s by Edgar Codd.
1990s. Database vendors added parallel capabilities and object oriented databases.
The advent of the internet changed databases to be more query focused and
process large amounts of data.
2000s. Advent of semi-structured data such as XML and JSON. Furthermore, several
open-source database systems were developed. Due to the large amount of data,
data mining became of interest and companies started to pilot NoSQL systems
for data-intensive applications.
2010s. NoSQL became adopted for some systems (social networks) and had capa-
bilities improved for users who interacted with the database. Data has been
outsourced into cloud services. Concerns with users’ privacy and security of the
data have become hot topics.
15
16
Parte II
Diseño
17
Capítulo 2
Diseño conceptual
Data requirements
Conceptual deisgn
Conceptual schema
(High-level data mo-
del)
Logical Design DBMS independent
(Data model mapping) DBMS specific
Logical Schema
(DBMS specific model)
Physical Design
Internal Schema
1. To start the design, the database designers gather and document the data re-
quirements from prospective database users. These requirements can be also
gathered with the functional or non-functional requirements of the application,
including the operations needed.
2. Based on the data requirements, the data high-level conceptual design is defined
with a conceptual schema based on a data model. These designs do not inclu-
de implementation details, being easier to understand for users and to create
a good design due to not needing to be bogged down by the physical and im-
plementation details. During or after this step, the operations in high level user
queries can also be defined.
3. Using the conceptual schema, we can implement it by using a logical design
or data model mapping. This implementation is DBMS specific and may be so-
mewhat automated using tools.
4. Finally, the physical design can be defined. This includes the internal storage
structure, file organization and indexes.
19
2.2. Entity relationship model
The entity relationship (E-R or ER) data model is a high-level or conceptual da-
ta model. In this model, data is described as entities (Section 2.3), attributes (Sec-
tion 2.4) and relationships (Section 2.5). The model can be graphically represented
in an ER diagram. For the following examples, we will try to exemplify using a course
enrollment system for our university. The ER diagram shown in Fig. 2.2.
ID STUDENT Name
Semester N
Year Last name
TAKES
ID Number ID Acronym
M
N M N 1
INSTRUCTOR IMPARTS GROUP HAS COURSE Name
2.3. Entities
An entity is a real-world “thing” or “object” distinguishable from other entities.
They may have a real world physical existence (e.g., person, building) or a concep-
tual existence (e.g., institution). In the course enrollment example entities can in-
clude instructors, courses, groups and students. An entity can be represented in an
ER diagram as a square, as shown in Fig. 2.3. Inside the square a name is defined.
Entities may be recognized as nouns.
While stored in a database, each entity will have different values for each attribute.
For example, there might be instances for a person such as p1 and p2 . Entity type are
the set of entities that have the same attributes (e.g., PERSON, BUILDING). The
entity set or entity collection are the collection of entities of a particular entity type
in the database at a point of time (for PERSON the p1 , · · · , pn ).
2.4. Attributes
Attributes are the properties that describe an entity. For example, for a person
these could include the name, year of birth, or gender. Each attribute has a value.
20
For example, for the previous attributes a value for a person p1 could be “Sivana
Hamer”, 1900 and “female”, respectively. An attribute is represented by an oval as
shown in Fig. 2.4. Attributes are connected by lines to the associated entity with
their names inside the oval. Attributes tend to be the nouns describing another noun
(entity).
Key vs non-key attributes. Every entity within a set has a unique key as a uni-
queness constraint, called the key attribute. For example, every student has an
ID that is unique to them. Another example is how every course has a unique
21
acronym in our university. The representation for a key value is shown in Fig. 2.8.
A composite variable can be also used as a key, as long as each of the subparts
together provide a unique value. For example, a license plate is unique as the
combination of the state and the number. An entity may also have more than
one key attribute, but always most have at least one. If an entity has no key
attributes, it is a weak entity (Section 2.6).
Attributes might also save NULL if for an entity a particular value is not appli-
cable (e.g., middle name if there is none) or if it is unknown (missing or is not
known). For example, we might not know someones birthday.
The values of the attribute can also be constrained between ranges and data
types, though in the ER diagram this is not represented. This is called the domain
of the attribute.
A complex attribute is a multivalued and composite attribute.
2.5. Relationships
Relationships are associations between several entities. For example, the rela-
tionship between the entity STUDENT and GROUP is TAKES. Relationships might
be recognized as verbs. It is represented in an ER diagram as a diamond, as seen
in Fig. 2.9. Relationships are connected by lines to the associated entities, with the
name inside the diamond.
The relationship set represents all the relationships of the same type.
The degree of the relationship is the number of associated entities to a rela-
tionship. For example, a binary relationship has two entities while a ternary has
three associated entities. In our current ER diagram, we can only see binary
relationships.
Every entity that participates within a relationship has a particular role. These
roles are described by role names, though they are not necessary for all the
relationships.
Relationships may be recursive or self-referencing when the same entity par-
ticipates within a relationship as different roles. For example, a PERSON in a
INSTRUCTing relationship may be the instructor or the apprentice. The repre-
sentation of this relationship is shown in Fig. 2.10.
22
INSTRUCTS
Apprentice Instructor
PERSON
The cardinality determines the maximum number of entities and entities can be
associated within a relationship set. The cardinality numbers are detailed above
the connecting lines of the relationship and the entity. The cardinality can be
used for any degree of relationship, though for this example we will use it for
two entities in a relationship, A and B. They can be:
One-to-one (1:1) Where every entity in A is associated at most with one entity
in B (Fig. 2.11). For example, every STUDENT has one ID_CARD while every
ID_CARD only has one STUDENT.
One-to-many (1:N). Where every entity in A can be associated with any num-
ber of entities (zero or more) in B, but B entities can be associated to at
most one entity in A (Fig. 2.12). For example, a COURSE may be given in
multiple GROUPs but every GROUP only has one COURSE.
23
Figura 2.13: Cardinality N:M between entities A (red) and B (green)
Total participation. Indicates that all entities must participate in the relationship.
For example, every GROUP is expected to have a STUDENT (at least 7 in
the UCR).
Partial participation. Indicates that only some entities participate in a rela-
tionship. For example, every COURSE is not expected to have a GROUP as
it might have not been given yet.
Relationships may also have attributes. Though for 1:1 or 1:N the attribute may
be associated to the entity with the 1 cardinality.
ID STUDENT Name
Semester (0,M)
Year Last name
TAKES
ID Number ID Acronym
(7,N)
(0,M) (0,N) (1,1) (0,N)
INSTRUCTOR IMPARTS GROUP HAS COURSE Name
Figura 2.14: ER diagram for course enrollment using (min, max) notation
24
2.6. Weak entities
Entities without keys are weak entities, in contrast to entities with keys that are
strong entities. Weak entities can be identified based on the combination of their
attributes and another entity type (i.e., identifying or owner entity type). The rela-
tionship between the weak and owner entity is called the identifying relationships.
An example of a weak entity is GROUP with the HAS relationship with COURSE, as
every GROUP requires a course. The weak entity type and identifying relationship
are represented in the ER diagram as a double lined box (Fig. 2.15) and diamond
(Fig. 2.16), respectively.
A weak entity has a participation constraint with their owner entity. Weak entities
may require a partial key that in combination with the owner entity gives uniqueness.
We do not require partial keys in 1 to 1 relationships. Partial keys are denoted with
dashed underlining. We can see that for the GROUP, we have the group number,
semester and year.
25
GROUP entity as a weak entity of COURSE and SEMESTER, however if we did that
we would have no need for the ternary relationship as the information within the
SEMESTER would be part of the GROUP. We must also must have the SEMESTER as
part of the key in GROUP due to the fact that a group may have the same number,
but be given in different semesters.
ID STUDENT Name
N
Last name
P M
SEMESTER TAKES COURSE Name
Diagrams are organized to be read from left to right and top to bottom.
26
Birthday Phone ID Acronym
∪
o d Credits
ASSOCIATION
∪
∪
∪
UNDEGRADUATE GRADUATE VISTING DOMESTIC
Position
Thesis
has
Credits
ISA ST o OT ISA DP
ISA DT
Thesis
has
Position
27
2.10. Subtype
A subtype or subclass of an entity are subgroups for an entity type. For example,
the STUDENT entity can be divided as UNDERGRADUATE (associates or bachelors),
GRADUATE (masters or doctorate), DOMESTIC, VISITING or as part of the student
ASSOCIATION. The different groups for an entity are the subtype, while the super-
class or supertype is the higher level entity. This relationship is also called ISA (IS
A or IS AN) type like a UNDERGRADUATE is a STUDENT. We can see that this is
either represented by a line with a circle and another line defining the hierarchical
structure between supertypes and subtypes, or with a triangle that has an ISA.
Each subtype may have different attributes, like GRADUATE students having a
flag to indicate if they completed their dissertation or thesis. Subtypes may also ha-
ve different relationships than the supertype, like VISITING student IS from another
UNIVERSITY. These relationships may also have attributes. The attributes and rela-
tionships of the superclass are inherited by the subclass.
2.12. Constraints
There are different constraints we can define between the set of subclasses.
28
(a) Disjointness constraint (b) Overlapping constraint
a double line between the circle and the superclass, or a t for total besides the
ISA. Meanwhile, if it is not mandatory for the entities in the superclass to ha-
ve a subclass, we have a partial specialization. For example, a student may be
part of the student ASSOCIATION. This would be partial specialization as not
all students would be part of the ASSOCIATION (only a select few as shown in
Subfig. 2.21b). We denote partialness with only one line between the circle and
the superclass, or with a p for partial besides the ISA.
Disjount Overlapping
Total DT OT o ST
Partial DP OP o SP
29
2.13. Review
Para cada uno de los siguientes requerimientos de datos, elabore el diseño concep-
tual utilizando modelo ER o EER. Las decisiones de diseño deben estar justificadas
con base en los requerimientos. En caso de ambigüedad u omisión en los requeri-
mientos, puede hacer suposiciones lógicas.
Simple
Sistema de bares
De: Dra. Alexandra Martínez
Parte 1:
Un bar vende muchas cervezas y una cerveza puede venderse en muchos bares.
A un tomador le pueden gustar varias cervezas, y una cerveza puede ser tomada
por muchos tomadores o ninguno.
Un tomador frecuenta bares, y un bar puede ser frecuentado por varios toma-
dores.
Parte 2:
Modifique el diseño conceptual anterior (creado anteriormente) para incluir la
siguiente restricción (requerimiento):
Sistema de ligas
De: Dra. Alexandra Martínez
Se desea crear una base de datos para almacenar información sobre ligas, equi-
pos y jugadores.
El nombre de los equipos no es único, pero ninguna liga tiene dos equipos con
el mismo nombre.
El número de los jugadores no es único, pero ningún equipo tiene dos jugadores
con el mismo número.
30
Sistema de cartas al estudiante
Se quiere crear un sistema que permita guardar los libros asociados a una carta
del estudiante.
Cada curso impartido por la persona docente tiene una carta al estudiante para
ese semestre en específico. Esto es debido a que las cartas al estudiante se
pueden actualizar en cada semestre.
Cada libro tiene autores, una versión, un año de publicación, una editorial, y un
título.
Sistema de adaptaciones
Una película puede ser adaptada de uno o varios libros. Pero, debe tener un libro
para ser una adaptación.
Un libro puede ser adaptado por varias películas. Por ejemplo, hay varías pelícu-
las que son adaptaciones de “Alicia en el país de las Maravillas” como la versión
animada de Disney y el live action de Tim Burton.
Sistema de genealogía
Suponga que se quiere registrar una genealogía familiar en una base de datos.
31
Medianos
Sistema de bibliotecas BIBS
De: Dra. Alexandra Martínez
La base de datos debe mantener información sobre las bibliotecas afiliadas, sus
libros, ejemplares y usuarios.
Cada biblioteca tiene un nombre único que la identifica, una dirección y al menos
dos números de teléfono de contacto (el de recepción y el de préstamos).
Para cada libro se debe almacenar su signatura, título, edición, año y autores.
Una signatura identifica a un libro de forma única, pero no a sus ejemplares
(pues todos tienen la misma signatura y no se podrían distinguir).
Dado un libro específico, sus ejemplares se distinguen por medio del número de
copia, que va desde 1 hasta el número total de ejemplares que hay en el consor-
cio de bibliotecas. Sin embargo, para distintos libros, pueden existir ejemplares
con el mismo número de copia. Para cada ejemplar se debe poder consultar si
está en préstamo o no.
Una biblioteca puede tener uno o más ejemplares de un libro, sin embargo, un
ejemplar debe estar asignado sólo a una biblioteca. Es posible que dos o más
bibliotecas posean ejemplares del mismo libro, pero dichos ejemplares deben
tener un número de copia distinto (es decir, no es posible que dos bibliotecas
tengan la copia N del libro X).
Cada usuario tiene un número de carné único (emitido por el consorcio de biblio-
tecas), nombre, email, teléfono y estado de morosidad. Un usuario está moroso
si, para alguno de los libros que tiene prestados, ya pasó su fecha de devolución
y aún no ha sido devuelto.
Un usuario puede sacar (en préstamo) ejemplares de cualquier biblioteca del
consorcio. Si un usuario saca varios libros a la vez, se registra un préstamo se-
parado para cada uno de ellos, pues es posible que tengan fechas de devolución
diferentes (algunos se prestan sólo por 2 semanas, otros por 3 meses, etc.). Para
cada ejemplar que se da en préstamo se debe registrar la fecha de préstamo, la
fecha esperada de devolución (periodo por el cual se le prestó el ejemplar) y la
fecha real de devolución (fecha en que el usuario devolvió físicamente el libro).
Si un usuario no ha devuelto un libro después de la fecha debida, se le pasa a
estado moroso.
32
Cada producto cuenta con un código único que lo identifica, además de marca,
precio y descripción. Por ejemplo, la caja de leche Dos Pinos con 1 % de grasa
se considera un producto distinto de la caja de leche Dos Pinos con 0 % de gra-
sa, pero dos cajas de leche Dos Pinos con 1 % de grasa son el mismo producto
para efectos del sistema (tienen el mismo código). Se debe llevar un inventario
de productos por sucursal, que permita consultar y actualizar la cantidad en
existencia de los productos por sucursal.
Una sucursal pone a la venta muchos productos y un producto puede ser vendido
en varias sucursales.
Para cada compra que realiza un cliente, se debe guardar la fecha y hora en
que se efectuó, la sucursal en la que se realizó, el monto total de la compra y el
desglose de los productos comprados con su cantidad y precio unitario.
Sistema de Twitter
Se desea crear un sistema de redes sociales similar a Twitter.
Los usuarios pueden seguir a varios usuarios y pueden ser seguidos por varios
usuarios.
Cada Tweet puede ser respondido por otros usuarios. Este tweet de respuesta es
un Tweet que puede contener información. Además, estas respuestas también
se pueden responder.
Sistema de Spotify
Se desea crear un sistema de transmisión de media como Spotify.
33
Cada álbum puede tener varias canciones. Un álbum tiene un nombre y una
imagen. Además, el álbum está asociado a varios artistas pero siempre debe
tener un artista.
Cada episodio tiene un nombre, un archivo, una fecha de publicación, una des-
cripción, duración y un identificador.
Cada estado tiene un nombre único, una abreviación única, una bandera única
y un territorio en kilómetros cuadrados. Cada condado tiene un estado, un nom-
bre y una población. Los condados se reconocen únicamente por el estado y su
nombre.
Cada juez@ tiene una cédula única (social security number), un nombre (primer
nombre, segundo nombre y primer apellido) y una cantidad de años servidos.
Una corte puede tener varios jueces, pero siempre debe tener mínimo uno. Se
puede ser juez@ en varias cortes. El puesto de juez@ en una corte particular
tiene una fecha de inicio y una fecha de fin.
En las cortes, se litigan casos legales. Existen dos tipos de casos: civiles y cri-
minales. En los casos civiles, un@ demandante demanda un@ acasud@ por una
compensación monetaria entre partidos. En cambio, en los casos criminales un@
acusad@r demanda a un@ acasud@ para una penalización legal ante un delito.
Por lo tanto, en los casos criminales se da una sentencia de meses. En los casos
civiles la persona que demanda se llama un@ demandante mientras que en los
casos civiles se llama un@ acusad@r, pero se refieren al mismo concepto.
Cada caso se puede reconocer únicamente por la corte en que se litiga el caso,
el apellido de la persona demandante o acusad@r, el apellido de la persona aca-
sud@ y la fecha de inicio de la litigación del caso. Además, los casos también
guardan la fecha final del caso y el veredicto.
Dentro de cada caso, se presenta evidencia que son archivos asociados al caso.
Cada evidencia se reconoce únicamente la combinación de un número y el caso
en el que se asocia esa evidencia. Los casos pueden tener más de una evidencia.
34
Cada persona que participa en el caso (demandante, acusad@r o acusad@) se
llama un partido. Cada partido tiene una cédula única (social security number),
un nombre (primer nombre, segundo nombre y primer apellido), una fecha de
nacimiento, y una edad.
Los partidos dentro de un caso tienen diferentes relaciones entre sí. Por ejemplo,
puede ser una relación laboral, amistad, matrimonial, entre otros.
Cada abogad@ tiene una cédula única (social security number), un nombre (pri-
mer nombre, segundo nombre y primer apellido), un tipo (transaccional o de liti-
gación), una especialización (e.g., leyes de corporaciones, divorcio) y un bufete
de abogad@s que pertenece. Además, cada abogad@ obtuvo su título máximo.
Sistema de criptomonedas
La base de datos debe mantener información acerca de sistema que compra y
venta de criptomonedas.
Cada cuenta guarda cuales tokens tiene comprados. Un token solo tiene una
persona dueña a la vez y puede ser comprado varias veces.
Cada cuenta tiene un país asociado, dado que se regula bajo las leyes de esa
nación. Los países tienen un nombre único, un código de teléfono único (e.g.,
305 en Costa Rica) y una bandera única.
Dentro de nuestro sistema, existen diversas personas usuarias que pueden uti-
lizar el servicio. Cada cuenta de una persona usuaria tiene un nombre único,
un correo único, una contraseña y una cantidad de fondos en dólares que tiene
disponibles. Las personas usuarias almacenan también una cantidad de tokens
totales de las cuales son duenas.
35
Cada cuenta puede adquirir distintos tipos de tokens de criptomonedas en una
transacción. Cada transacción tiene tokens asociados, la cuenta que realizó la
compra y la cuenta que realizó la venta. Una transacción puede tener varios
tokens y un token puede ser comprado en varias transacciones. Además, para
cada token de un tipo dentro de una transacción se tiene un precio de compra
dado en dólares.
36
Capítulo 3
Relations
A relation or “relationship" (not really the correct term) is an unordered set of
attributes. We can think of a relation as an unordered table with columns (i.e., at-
tributes) and rows (i.e., tuples). The order of both the attributes and rows does not
matter (with some exceptions).
A relational scheme, schema or intention defines a relation. It is represented as
R(A1 , A2 , · · · , An )
R is the relational name. We write relational names in upper case. Common
relational names include Q, R and S.
Ai represents an attribute.
The degree or arity of a relation is the number of attributes Ai .
The cardinality is the total number of combinations of tuples (cartesian product
×) with values |D| possible for all attributes in the relation.
Example: We can represent the relation schema of SCHOOL = S as either of the
following:
SCHOOL(N ame, Acronym, P hone_number, N umber_students) =
37
S(N ame : string, Acronym : string, P hone_number : string, N umber_students : integer)
r is the current state of the relation. We write relational states in lower case.
Common relational states include q, r and s.
tj represents a tuple. Tuples have not specified orders within the relation.
Example: We have several tuples within the SCHOOL relation (non gray rows).
Attribute
An attribute Ai describes an aspect or role of an unordered set of values within
a relation R. Each attribute has a domain of possible values within the column. The
relation and column name provides meaning to the set of values within the relation.
Example: There are several attributes for each school within our school. These
include:
Domain
A domain D = dom(Ai ) details a set of atomic (indivisible) values for attribute Ai .
As values are atomic, composite and multivalued attributes are not allowed in the
model, but can be integrated with additional representations.
For a domain we can describe the logical definitions, data type and format. We
can also define a range of possible values. The number of values within the domain
dom(Ai ) can be represented as |dom(Ai )| = |D|.
Example:
Acronym: A string that begins with a capital E. No school can have an acronym
larger than four characters.
Phone_number: In Costa Rica, phone numbers are strings of the form nnnn −
nnnn, with n as an integer value. The first n cannot be 0.
38
Tuple
A tuple represents a collection of related data values represented as t = {v1 , v2 , · · · , vm }.
t represents the tuple. We write tuples in lower case. Common tuple names in-
clude t, u and v.
NULL values can represent unknown values, values that do not apply or values
that were not retrieved. However, it has been difficult to represent different
types of NULL values in the relational model. Two NULL values do not mean
that both tuples are the same. It is best to avoid NULL values if possible.
t[Ai ] = t.Ai refers to the value vi of attribute Ai for tuple t. We can represent
multiple attributes as a list separated by commas.
tECCI =< Ciencias de la Computación e Informática, ECCI, 2511 − 8000, 888 >
3.2. Keys
Due to integrity constraints, relations have different types of keys for attributes.
Superkey
A superkey SK specifies the set of attributes that no two tuples in R can have
the same values. Therefore, it specifies uniqueness. Every relation must have one
superkey and they may have redundant attributes.
Example: We can define as a SK the attributes Name and Acronym. Another SK
can be Phone_number as no school can use the same number.
Key
A key of R is a superkey that has the additional property that removing any ad-
ditional attribute from the set of attributes of the superkey will make it no longer
a superkey. Thus it satisfies properties for uniqueness and minimalism. All keys are
superkeys, but not all superkeys are keys.
Example: The set of Name and Acronym is not a key. It is not minimal as we could
remove either Name or Acronym. However, Phone_number is a key as it is unique
and we cannot remove any attribute while maintaining its uniqueness property.
Primary key
A relational schema can have more than one key, with each key being called a
candidate key. One of the candidate keys is designated as the primary key that is
used to identify the relation. We represent this by underlining the attributes of the
key in the relational schema. The non-primary candidate keys are considered unique
keys that are not represented in the model.
39
Though in theory we can select any candidate key as the primary key, it can be
recommended to select an integer as they are faster to find while stored. However,
the relational model does not really recommend this as it is supposed to represent
business rules.
Example: Both Acronym and Phone_number are candidate keys. We can select
the Acronym as the primary key, leaving Phone_number as a unique key.
Foreign key
A foreign key in R1 references a relation R2 by storing the primary key of R2 in an
attribute in R1 . Thus, there is a relation between the tuples of R1 and R2 that have the
same key.
Example: SCHOOL could have another attribute that references the DEPART-
MENT of each school. Thus, SCHOOL could have a foreign key attribute that re-
ferences DEPARTMENT.
3.3. Constraints
For our database, we shall define several restrictions or constraints on the values
in the database states due to business rules or model restrictions.
Domain constraints. For each tuple t, the value vi of attribute Ai must be atomic
and from the domain of Ai .
Not NULL constraints. Attributes define if NULL values are permitted and must
be followed.
A database must be in valid in every relational state for every relational schema
based on the integrity constraints. A database that does not obey all integrity cons-
traints is not valid.
40
Application-based or semantic constraints
Constraints that cannot be explicitly expressed in the data model schema. These
must be informed either in the application or with SQL (triggers or assertion). As
business rules may be complex to implement in the database, it is common that they
are specified in the applications. However, constraints should always be defined at
the database level if possible. One could also implement these constraints at several
levels to ensure the validity of the database.
3.4. Mapping ER
For most of the following examples for mapping between the ER to the relational
model, we shall use Fig. 3.2 (variation of Fig. 2.2).
1 1
IDENTIFICATION HAS STUDENT Name
N
Last name
Image
TAKES
ID Acronym
M
N M N 1
INSTRUCTOR IMPARTS GROUP HAS COURSE Name
Number Semester
Year
Step 1. Create for every strong entity type E a relation R with all the simple attributes
of E as attributes of R. For composite attributes, select the simple components.
Choose one of the key attributes in E as the key of R, with a composite key in E
using all of the simple attributes as the key of R.
Note: We will also create an attribute in the relation for derivative keys. Though,
we need SQL to implement the logic of the derivative key.
41
STUDENT
Email First_name Last_name Age
IDENTIFICATION
Carnet Image
INSTRUCTOR
ID
COURSE
Acronym Name Credits
Figura 3.3: Relations created in step 1
Step 2. Create for every weak entity type W a relation R with all the simple attributes
or simple components of composite attributes of W as attributes of R. The key
of W is the combination of the primary key of the owner entities and the partial
key of the weak entity W .
GROUP
Number Semester Year Acronym
Figura 3.4: Relations created in step 2
Step 3. For every binary 1:1 relationship R, identify the entity types S and T that parti-
cipate in the relationship. There are three possibilities:
Foreign key approach. For the relation related to the entity type S, that prefe-
rible has total participation to reduce NULLs in the database, add a foreign
key of T in S. Furthermore, add the simple attributes or components as at-
tributes in S. This is the most common approach used.
IDENTIFICATION
Carnet Image Student_email
Figura 3.5: Relations updated in step 3 with the foreign key approach
Merged relation. If both participations between entities are total, we can com-
bine the entity types S and T into a single relation.
STUDENT
Email First_name Last_name Age Carnet Image
Figura 3.6: Relations updated in step 3 with the merged relation approach
42
The primary key of U will be a one of the foreign keys, while the other foreign
key will be a unique key (as it is a 1:1 relationship). We save the simple
attributes or components inside the new relation U .
HAS
Identification_Carnet Student_Email
Figura 3.7: Relations updated in step 3 with the cross-reference approach
Step 4. For every binary 1:N relationship R, we apply either the foreign key or the cross-
reference approach. The entity S selected to save the foreign key or the primary
key, respectively, is the entity at the N -side of the relationship. For example, if
GROUP was not weak this would be the entity that would either save the foreign
key or have a new table as a primary key.
Step 5. For every binary N:M relationship R, we apply the cross-reference approach. We
use both keys of the relations S and T as the primary key.
IMPARTS
ID Number Semester Year Acronym
TAKES
Email Number Semester Year Acronym
Figura 3.8: Relations created in step 5
Step 6. For every multivariate attribute, create a new relation R with attributes of the
multivariate attribute and a foreign key to the table S with the multivariate at-
tribute.
INSTRUCTOR_UNIVERSITY_DEGREES
ID University_degree
Figura 3.9: Relations created in step 6
Step 7. For every n-ary relationship above n > 2, we follow repeat step 5 but with the
primary key being all the keys of all participating entities. The participating
entities that have a cardinality of 1 are foreign keys, but not part of the primary
key.
43
STUDENT
Email First_name Last_name Age Carnet Image
INSTRUCTOR
ID
COURSE
Acronym Name Credits
GROUP
Number Semester Year Acronym
IMPARTS
ID Number Semester Year Acronym
TAKES
Email Number Semester Year Acronym
INSTRUCTOR_UNIVERSITY_DEGREES
ID University_degree
Figura 3.10: Relational model after mapping ER
44
3.5. Mapping EER
To map the EER, we can extend the previous steps with an additional step.
Step 8 Choose one of the following mapping schemes, based on the type on the cons-
traints of the subclasses:
The relationship between the constraints and the type of mapping scheme is
shown in Table 7.1.
Disjoint Overlapping
8A ✓ ✓
8B ✓ ✗
8C ✓ ✗
8D ✓ ✓
Figura 3.11: Relationship between the mapping scheme of the subclasses with the
constraints
3.6. Review
Simples
Sistema de bares
45
Sistema de ligas
De: Dra. Alexandra Martínez
Sistema de genealogía
De: Dra. Alexandra Martínez
Parte 1:
Parte 2:
46
Sistema de adaptaciones
47
Sistema de cartas al estudiante
Mediano
Sistema de bibliotecas BIBS
48
Sistema de supermercados SUPERK
De: Dra. Alexandra Martínez
Sistema de Conferencia
De: Dra. Alexandra Martínez
Sistema de Buques
De: R. Elmasri and S. Navathe, Fundamentals of database systems, 7th ed.
49
Sistema de Proyectos
50
Sistema de Propiedades
51
Sistema de Twitter
52
Sistema de Spotify
Sistema de Criptomonedas
Versión 1:
Versión 2:
53
Sistema de Pizza
54
3.7. Solutions
Sistema de bares
Sistema de ligas
Sistema de genealogía
55
Sistema de Conferencia
Sistema de Conferencia
56
Sistema de Buques
Sistema de Proyectos
57
Sistema de Propiedades
58
Sistema de Pizza
59
60
Parte III
Implementación
61
Capítulo 4
SQL Básico
Cuadro 4.1: Relationship between the terms of the relational model and SQL
Within SQL, there are different structures that represent the database, as shown
in Fig. 4.1. In the following subsections, each of these levels are detailed.
63
Database
Catalog
Schema
Tables & views
Columns & rows
Database
1
We can CREATE an instance of a database with:
USE DB_SIVANA;
Within SSMS, we can see all the databases within the Object Explorer > Databa-
ses, as shown in Fig. 4.2.
1
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-database-transact-sql
64
Catalog
A catalog is a collection of schemas 2 . All the schemas within the catalog are saved
in the INFORMATION_SCHEMA. We can find this schema in Database > Security >
Schemas (Fig. 4.3).
Schema
An SQL Schema represents a group of tables or views that belong to the same
database application. It has a:
Name: Way to identify the Schema
Authorization identifier: Indicates the user who owns the schema.
Descriptors: To define the elements within the schema.
3
We can CREATE the schema with:
We can later add the descriptors by ALTERing the schema 4 and delete the schema
by DROPping it 5 . We can determine the current schema with:
SELECT SCHEMA_NAME();
65
ALTER USER ‘sivana’ WITH DEFAULT_SCHEMA=UNIVERSITY;
We can find the schemas in Database > Security > Schemas (Fig. 4.4).
Implicitly, as we are within the default schema (dbo), while we create the table it
executes the query as if it was:
We can see the result of creating the table in Database > Tables (Fig. 4.5). There
are different types of tables for SSMS 9 . If we expand a table, we can also see its
columns, keys, constraints, triggers, indexes and statistics. Double clicking on any of
these elements will show more details for each of them.
6
https://docs.microsoft.com/en-us/sql/relational-databases/tables/
primary-and-foreign-key-constraints
7
https://docs.microsoft.com/en-us/sql/relational-databases/tables/
unique-constraints-and-check-constraints
8
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql
9
https://docs.microsoft.com/en-us/sql/relational-databases/tables/tables
66
Figura 4.5: Schemas in SSMS
To define a foreign key, the table must already exist. Therefore, if there is a foreign
key f k1 for table T1 that depends on a foreign key f k2 for table T2 such that f k1 refe-
rences T2 and f k2 references T1 . Then, the restriction can be defined by ALTERing 10
the table. If we define a column as a primary key, it will automatically add the NOT
NULL constraint.
We declare bits with a B preceeding the digits within apostrophes (e.g., B‘101’).
For dates, we can also use timezones with TIME WITH TIME ZONE. Without it,
the default is the local time zone for the SQL session.
10
https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-table-transact-sql
67
Category Data type Description Types
An integer number INT or INTEGER
Integers
SMALLINT
A real number FLOAT or REAL
Real
DOUBLE PRECISION
Numeric
Formated number,
where i is the num-
Formated DECIMAL/DEC/NUMERIC(i, j)
ber of decimal digits
and j the decimal
digits after the deci-
mal point
Fixes a char string to
Fixed length have n chars. If shor- CHAR(n)
ter, it is padded with
spaces to the right.
Chars
Strings can have a
Variable length VARCHAR(n)
variable length of
maximum n chars
Specifies large texts
Large object like documents, CLOB(ns)
where n is a size
and s the size type
(K,M,G)
Fixed length Fixes a bit string to CHAR(n)
have n bits.
Strings can have a
Variable length BIT VARYING(n)
variable length of
Bits
maximum n bits.
Specifies large bit
Large object string, where n is a BLOB(ns)
size and s the size ty-
pe (K,M,G)
Boolean that can be
Boolean BIT
either TRUE or FAL-
SE
Date Given in ‘YYYY-MM- DATE
Dates
DD’.
Time Given in ‘HH-MM- TIME
SS’.
68
11
Some implementations of SQL include more or less types (e.g., SQL does not
have an explicit BOOLEAN data type).
4.4. Constraints
We can define several constraints while creating tables. In the following subsec-
tions, different commands are explained.
NOT NULL
For an attribute, we can define a NOT NULL constraint to not allow NULL values.
This is always implicitly declared for the primary key of the relation. We can see an
example of a NOT NULL constraint for COURSE.
We can see that we can define for the columns NAME and CREDITS the constraint
after specifying the name and data type. For the primary key, it is not necessary to
specify the constraint as it is done implicitly. We can see within SSMS the restric-
tion for the columns with the constraint in Database > Tables > Table > Columns >
Column, as shown in Fig. 4.6.
DEFAULT
A default value can be set for a column, with the DEFAULT command. If no default
is set for a column, the default is NULL for all the columns without the NOT NULL
constraint. We can see an example of this constraint for COURSE.
11
https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql
69
We set that the CREDITS column will have as a default value a four. We can see
within SSMS the restriction for the columns with the constraint in Database > Tables
> Table > Constraints, as shown in Fig. 4.7.
CHECK
We can use a CHECK constraint to define a domain, row-based and general cons-
traints. We can see an example of a domain constraint for COURSE.
With this constraint, a CREDITS for a COURSE has to be 0 < CREDIT S < 13. We
can see within SSMS the restriction in Database > Tables > Table > Constraints, as
shown in Fig. 4.8.
We can also define a tuple constraint, where each row that is inserted or updated
is checked individually. For example, if we add for _GROUP columns for the num-
ber of spaces available (NUMBER_AVAILABLE_SPACES) and the number of enrolled
students (NUMBER_ENROLLED). Thus, we can define a row-based constraint to not
allow more students enrolled than the available spaces as follows:
70
CHECK (NUMBER_ENROLLED <= NUMBER_AVAILABLE_SPACES);
PRIMARY KEY
We can define a PRIMARY KEY constraint while creating the column for the table.
We can see an example of this constraint for COURSE.
We can also define a primary key of several columns after defining the columns,
as seen in the following example for GROUP.
We can see within SSMS the restriction in Table > Columns and Table > Keys, as
shown in Fig. 4.9.
UNIQUE
The UNIQUE clause defines candidate keys that are unique for the table. We can
see an example of this constraint for COURSE.
A UNIQUE constraint can also be defined for a group of columns. This can be de-
fined with the following structure, putting the list of columns inside the parentheses:
71
CREATE TABLE COURSE (
NAME VARCHAR(50) NOT NULL,
UNIQUE (NAME),
...
);
We can see within SSMS the restriction in Table > Keys, as shown in Fig. 4.10.
FOREIGN KEY
A FOREIGN KEY constraint can be defined for a table. This constraint ensures
that the constraint is followed while tuples are added or deleted, or when a foreign
key or primary key is updated. We can see an example of this constraint for GROUP.
We can see within SSMS the restriction in Table > Columns and Table > Keys, as
shown in Fig. 4.11.
We can define different types of foreign key constraints. The referential trigger
actions defines the behavior during changes to the foreign key:
NO ACTION: An error is raised and the state of the database is rolled back.
This is also known as the RESTRICT option and is the default behavior.
CASCADE: When a parent table updates or deletes a foreign key, the reference
table updates the foreign key. This does not work if the referenced or foreign
key is a TIMESTAMP.
72
SET NULL: When a parent table updates or deletes a foreign key, the reference
table sets to NULL the foreign key.
SET DEFAULT: When a parent table updates or deletes a foreign key, the re-
ference table sets to the default values the foreign key. For this constraint to
work, all columns of the foreign key must have a default. If a column is nullable
without an explicit default value, the default value is NULL.
These actions can be combined for different FOREIGN KEY restrictions. Further-
more, restrictions can fire secondary cascading chains and can be subsequently re-
peated. If the SET NULL, SET DEFAULT or CASCADE action triggers a table with
NO ACTION, all the updates are stopped and rolled back. These behaviors can be
defined for updates (ON UPDATE) or deletes (ON DELETE). We can see how this
works in the following example:
CONSTRAINT
For every constraint, we can define a name by specifying the keyword CONS-
TRAINT, then the name of the constraint and then the constraint to be named. We
can see an example for the COURSE table.
We can see within SSMS the constraint with the name, for this example, in Table
> Keys, as shown in Fig. 4.12.
73
INSERT
We can INSERT 12 into a table rows. It is possible to bulk insert rows 13 . The block
of commands are the following:
INSERT INTO: Details the table that rows will be inserted. A list of attributes
can be given inside the parentheses to define the order of the columns, but is
optional.
VALUES: Details the values that will be inserted. The order is given either impli-
citly (defined by the order in which the table was defined) or explicitly (defined
by the order given in the attribute list).
The first way to insert is to detail the table and the values for the columns, in the
same way it was defined. For example, here we can see the result of inserting a row
into COURSE.
We can also INSERT by explicitly mentioning the order of the columns while in-
serting. It is necessary to specify all the columns that cannot be NULL and have no
default. If a column can be NULL or has a default and is not specified, it will automa-
tically add a NULL or the default value, respectively. This explicit INSERT is shown
in the following example:
DELETE
We can DELETE 14 delete current rows of a table. This may activate constraints.
The full block of commands are the following:
DELETE FROM: Defines the table that rows are deleted from.
WHERE: Details a boolean condition that identifies the rows that will be deleted.
This clause is optional.
We can delete all the rows in a table with the following command:
12
https://docs.microsoft.com/en-us/sql/t-sql/statements/insert-transact-sql
13
https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql
14
https://docs.microsoft.com/en-us/sql/t-sql/statements/delete-transact-sql
74
DELETE FROM COURSE;
DELETing rows does not remove the table definition in the database. To delete a
table, we must DROP it as follows:
DROP COURSE;
To DELETE only certain rows from a table, we can define a condition using the
WHERE clause.
UPDATE
We can also UPDATE 15 column values for rows. This command is similar to DE-
LETE, but has a SET clause that defines the values of the updated columns. The full
block of commands are the following:
UPDATE <table>
SEMESTER <list of columns with values>
WHERE <condition>
SET: Details the list of columns with the values to update them.
WHERE: Details a boolean condition that identifies the rows that will be upda-
ted. This clause is optional.
UPDATE COURSE
SET CREDITS = 5;
We also define that only rows that meets a condition will be updated with the
WHERE clause, as follows:
UPDATE COURSE
SET CREDITS = 10
WHERE ACRONYM = ’CI0127’;
We can also modify the current value of a column, using the SET clause as follows:
UPDATE COURSE
SET CREDITS = CREDITS + 1;
15
https://docs.microsoft.com/en-us/sql/t-sql/queries/update-transact-sql
75
4.6. Data retrieval
To view the rows that are in the database, we can use the SELECT 16 command.
This command has multiple clauses, with the only required clauses being SELECT
and FROM. The block of commands are the following (we will see some more in
advanced SQL):
SELECT: Details the list of columns that will be retrieved from the tables. All
the columns can be retrieved with *.
WHERE: Details a boolean condition that identifies the rows that will be retrie-
ved. This clause is optional.
ORDER BY: Defines the order of the rows retrieved. This clause is optional.
We can define a simple query to select all the rows from a table that meet a condi-
tion. This is shown in the following example where all the COURSEs’ ACRONYMs for
COURSEs with more than three CREDITS are gathered. To compare values within the
WHERE clause we use comparison operators. The comparison operators grammar
for SQL and the respective math representation are shown in Table 4.3.
SELECT ACRONYM
FROM COURSE
WHERE CREDITS > 3;
Math SQL
= =
̸= <>
< <
> >
≤ <=
≥ >=
We can also define a SELECT without a WHERE clause to select all the rows in
a table as shown in the following example. Here, we gather all the ACRONYMs for
every row in COURSE.
SELECT ACRONYM
FROM COURSE;
16
https://docs.microsoft.com/en-us/sql/t-sql/queries/select-transact-sql
76
We desire to get all the columns that we can after an operation. One way to do
this is to specify all the columns manually. We could also use * in the SELECT clause,
specifying that all the columns will be retrieved. In the following example, the attri-
butes ACRONYM, NAME and CREDITS are gathered for all the rows in COURSE that
have more than three credits.
SELECT *
FROM COURSE
WHERE CREDITS > 3;
SELECT *
FROM COURSE
WHERE (CREDITS BETWEEN 3 AND 7);
In the FROM clause we can put more than one table as a list of tables. In the fo-
llowing example, we retrieve the combinations of COURSEs and GROUPs, gathering
the NAME, CREDITS, NUMBER, SEMESTER and YEAR. The combinations retrieved
have no relationship if the tables reference one another, they are all the possible
combinations between both tables.
Implicitly, if there are no attributes that have the same name within the list of ta-
bles it treats the column as if it said “Table.Column”. In our current example query, an
implicit column is NAME that is treated as COURSE.NAME. If there are two columns
with the same name in the list of tables, we must specify explicitly this reference.
If we want to specify the tables based on a condition between both tables, we can
do it with a join condition. Thus, in the WHERE clause we specify the column in a
table that relates with another table. In the following example we can see the join
condition for “C.ACRONYM = G.ACRONYM”. Note that the keyword AS creates an
alias for the tables that can be used to explicitly state the column.
In a WHERE clause we can combine various conditions with ANDs and ORs,
following boolean logic. There is also a NOT condition. We can see in the following
example how we can specify the join condition, the GROUP’s semester must be equal
to one and the GROUP’s year must be equal to 2022.
This query can be further refined to specify the order of the results with the OR-
DER BY clause. This will order the rows starting from the leftmost element of the
77
list of columns to the rightmost (i.e., order first by the first column, second by the
second column and so forth). We can further define the order with ASC (ascending
order) and DESC (descending order). The default order is ASC. An example is shown
in the following query, where the rows are ordered in ascending order by year and
then semester.
SELECT NAME, CREDITS, NUMBER
FROM COURSE AS C, _GROUP AS G
WHERE SEMESTER = 1 AND YEAR = 2022 AND C.ACRONYM = G.ACRONYM
ORDER BY YEAR, SEMESTER;
When we retrieve values from columns, we are getting all the values from all the
rows, but not the unique values within the current database state. Thus, to gather the
distinct values in the SELECT clause the DISTINCT keyword is stated before the
column name. In the following example, we can see how we can gather the distinct
YEAR values in the rows of the GROUPs.
SELECT DISTINCT YEAR
FROM _GROUP;
Character string columns can be pattern matched. We can use the % and LIKE
keywords to match an arbitrary number of zero or more characters. In the following
example, only the COURSEs with ACRONYMs that start with ‘CI’ (computer science
courses) are matched.
SELECT *
FROM COURSE
WHERE ACRONYM LIKE ’CI %’;
We can also specify the specific number of characters to match with an underscore
(_) for each character. In the following example, only the COURSEs with ACRONYMs
that start with ‘CI’ (computer science courses) and proceeded by four blank charac-
ters are retrieved.
SELECT ACRONYM
FROM _GROUP
WHERE ACRONYM LIKE ’CI____’;
SET operations can be used with SELECT queries. The operations that are defined
in SQL are:
UNION: For sets A and B, this operation includes all the elements in either A
or B are included in the resulting set.
INTERSECTION: For sets A and B, this operation includes all the elements that
are in both A and B are included in the resulting set.
EXCEPT: For sets A and B, this operation includes all the elements that are A
without any element that is shared with B.
If we define these operations as they are, they will generate the rows without dupli-
cates. If we add the keyword ALL (e.g., UNION ALL) duplicates are not eliminated.
The tables must be type-compatible relations, thys must have all the same attributes
in the same order. In the following example, the GROUPs that were imparted in the
SEMESTER 1 and the YEAR 2022 are retrieved (this example can also be done with
an AND).
78
(SELECT DISTINCT ACRONYM, NUMBER
FROM _GROUP
WHERE SEMESTER = 1)
INTERSECTION
(SELECT DISTINCT ACRONYM, NUMBER
FROM _GROUP
WHERE YEAR = 2022);
17
https://docs.microsoft.com/en-us/sql/t-sql/statements/grant-transact-sql
18
https://docs.microsoft.com/en-us/sql/t-sql/statements/revoke-transact-sql
79
80
Capítulo 5
SQL Avanzado
5.1. NULL
In SQL, if the column does not have a NOT NULL constraint, we can have a NULL
value.
A NULL value can mean that the value is unknown, value withheld or it is not
applicable. However, we do not know the type of NULL as we save the same
NULL value for all cases.
As we could compare NULL values, SQL uses a three-valued logic (TRUE, FALSE
or UNKNOWN) instead of boolean logic (TRUE or FALSE). Thus, the used truth
tables in SQL are shown in Tables 5.1, 5.2 and 5.3.
TRUE FALSE UNKNOWN
TRUE TRUE FALSE UNKNOWN
FALSE FALSE FALSE FALSE
UNKNOWN UNKNOWN FALSE UNKNOWN
TRUE FALSE
FALSE TRUE
UNKNOWN UNKNOWN
While SELECTing tuples in a WHERE clause, only tuples that are TRUE are
selected. There is an exception for outer joins.
To compare nulls we use the IS command. The following example shows how it
is used, to find all the CARNETs that have NULL IMAGEs.
81
SELECT CARNET
FROM IDENTIFICATION
WHERE IMAGE IS
NULL;
5.2. Nested
We can nest the result of a query to be used in another query. The outside query
is called the outer query, while inside the query is the innermost query. When a
nested query return a single attribute and tuple, we can use a = comparator. In any
other case, we shall use the IN to compare a value within a set of values and returns
multiple tuples. In the following example, we can see how we can get all the COURSEs
NAMEs and CREDITS that have a GROUP in the first SEMESTER of the YEAR 2022.
We can also compare multiple values using tuples as (column1 , column2 , · · · columnn )
in the outer query’s WHERE clause. There are other comparison operators that can
be used, as shown in Table 5.4.
Operator Description
IN, ANY, SOME Returns the tuples if we find a value IN the set of values of the
inner query.
ALL With a comparison operator, returns the tuples that are more,
less, equal, ... than ALL the innermost tuples.
EXISTS Returns the tuples if tuples EXIST in the innermost query.
NOT EXIST Returns the tuples if tuples does NOT EXIST in the innermost
query.
NOT EXIST Returns TRUE if there are no duplicate tuples in the innermost
query.
Figura 5.4: Possible compators with nested queries with multiple tuples
82
5.3. JOIN
We can JOIN tables (can be more than two) together in the FROM clause of a
SELECT operation. We can see in the following example how we can specify a join,
to find the GROUP’s information for those given in the first SEMESTER of the YEAR
2022.
There are different types of JOINs. We shall assume that the relation R is joined
to S, such that R is the leftmost relation and S the rightmost one.
Function Description
COUNT COUNTs the number of tuples in a query.
SUM SUMs the values of a column in a query (must be numeric).
MAX Gets the MAX value of a column in a query (must be numeric).
MIN Gets the MIN value of a column in a query (must be numeric).
AVG Gets the AVG of the values of a column in a query (must be nume-
ric).
There are several ways we can use these functions. For example, we can gather
the SUM, MAX, MIN and AVG of CREDITS for COURSEs.
83
SELECT MAX(CREDITS), MIN(CREDITS),
SUM(CREDITS),
AVG(CREDITS) AS AVG_CREDITS
FROM COURSE;
We can also count the number of current COURSEs saved in the database with
the following query.
SELECT COUNT(*)
FROM COURSE;
Thus, the description for the two new clauses (GROUP BY and HAVING) is the
following:
GROUP BY: Groups tuples based on a grouping condition. The attributes within
the grouping attribute must appear in the SELECT clause, with the desired ag-
gregate functions. A separate GROUP is created for NULL values in a condition.
We can see an example of the GROUP BY clauses in the following example. Here,
we add an AREA_NUMBER to COURSEs that has the AREA in which the course is
given. For example, computer science courses are within the engineering area. Thus,
we group all the COURSEs by the area and then gather this number, the number of
courses given by area and the average number of CREDITS.
The following example shows how to filter with the HAVING clause. This adds
from the previous example, where now the AREA_NUMBER that have an average
less than or equal to 3 are filtered from the results.
84
SELECT AREA_NUMBER, COUNT(*), AVG (CREDITS)
FROM COURSE
GROUP BY AREA_NUMBER
HAVING AVG (CREDITS) > 3;
5.6. Assertions
General constraints can be specified using declarative assertions using the CREA-
TE ASSERTION statement. The following shows an example of an assertion named
POSITIVE_CREDITS that checks that no COURSEs have non-positive CREDITs.
This constraint is only checked when tuples are inserted or updated in a specific
table.
CREATE ASSERTION should only be used when it is not possible to define a
CHECK.
The technique to write an ASSERTION is to define a query that selects all the
tuples that would violate the condition, by using the NOT EXISTS clause.
The form of an assertion is:
5.7. Triggers
Triggers 1 are constraints that are used to monitor updates, maintain consistency
of derived data and update derived data the database. We can create this constraint
with the CREATE TRIGGER command. Triggers have three components:
EVENTS (E): The events (e.g., INSERTs, UPDATEs, DELETEs, temporal period
time) that activate the trigger. The trigger can be activated BEFORE, INSTEAD
OF, or AFTER the operation that activates the trigger is executed.
CONDITION (C): Determines if the action rule shall be executed if a condition
is true, detailed in the WHERE clause. This component is optional and if it is
not defined, the action will be executed once the event is activated.
ACTION (A): SQL statements that are executed when the trigger is activated.
If a trigger has the optional FOR EACH ROW clause, then it is activated sepa-
rately for each tuple and is known as a row-level trigger. If not, it will be activated
once per statement and be known as a statement-level trigger.
1
https://docs.microsoft.com/en-us/sql/relational-databases/triggers/dml-triggers
85
5.8. Views
A view 2 is a table that is defined from other tables (base tables), creating a vir-
tual table that is not physically stored in the database. Thus there are limitations to
updating views, but not querying views. The tables used in the view are called the
defining tables. The following is an example of a view named COURSE_GROUPS that
saves the NAME, ACRONYM and number of groups of COURSEs for each COURSE.
We can execute queries over views. We can see an example where all the course
names and number of groups are retrieved for the view.
If none of the view attributes are results of aggregate functions, the view name
attributes do not need to be specified as they are inherited.
Views simplify certain queries and serve as security or authentication mecha-
nisms. With the latter, only certain access can be given to attributes.
As views have to be up-to-date, they are materialized when a query is specified
to the view.
We can remove views with the DROP VIEW command.
Usually, views data can be modified if it only has only one table. This is the case
for SQL Server 3 .
Within SSMS, we can see all the views in Databases > Views, as shown in Fig. 5.6.
86
Figura 5.6: Views in SSMS
Parameters have a name, a type (SQL data types) and a mode. The mode is IN
(input only), OUT (output only), INOUT (both input and output). Thus, if a para-
meter has IN it can receive the parameter, while OUT indicates that it returns
it.
The difference with both of these modules is that functions return data, while
stored procedures do not.
These persistent stored modules are stored persistently in the database server
and are useful if a database program is needed by several applications (reduces
effort and improved modularity), in certain situations reduce data transfer costs
between client and server, and enhance modeling power.
Within the declaration of the persistent stored modules, we can have control-
flow statements 6 .
Within SSMS, we can see all the stored procedures and functions in Databases
> Programmability > Stored Procedures and Databases > Programmability >
Functions, respectively. This is shown as shown in Fig. 5.7.
6
https://docs.microsoft.com/en-us/sql/t-sql/language-elements/control-of-flow
87
Figura 5.7: Persistent stored modules in SSMS
88
Parte IV
Cálidad
89
Capítulo 6
2. The implementation or physical level on how tuples are stored and updated.
STUDENT_SCHOOL_FACULTY
Email Name Phone_number Acronym Number_students
91
Guideline #2: Avoid redundancy
If data in the database is redundant (storing the result of natural joins), we are
wasting space and may lead to several anomalies. For example, the following table
shows how the name, phone and faculty name for the “ECCI“” school is duplicated
in t1 and t2 . Furthermore, the name of “Bob Benavidez” is repeated, though they can
be recognized only with their email.
STUDENT_SCHOOL_FACULTY
Student_mail Student_name School_acronym School_phone Faculty_name
alicia.armando23 Alicia Armando ECCI 2511-8000 Ingenería
bob.benavidez Bob Benavidez ECCI 2511-8000 Ingenería
bob.benavidez Bob Benavidez EMat 2511-6551 Ciencias Básicas
carlos.calvo Carlos Calvo EAN 2511-9180 Ciencias Sociales
daniela.delgado Daniela Delgado EMat 2511-6551 Ciencias Básicas
Deletion: While deleting certain tuples, we can lose all the information in the
database. For example if we delete “Carlos Calvo”, we lose all the information
related to the school “EAN”.
STUDENT
Email Name Position_association Fax_number Kindergarden_grade Instagram
92
STUDENT
Email Name Faculty_name
alicia.armando23 Alicia Armando Ingenería
bob.benavidez Bob Benavidez Ingenería
bob.benavidez Bob Benavidez Ciencias Básicas
carlos.calvo Carlos Calvo Ciencias Sociales
daniela.delgado Daniela Delgado Ciencias Básicas
SCHOOL
School_acronym School_phone Faculty_name
ECCI 2511-8000 Ingenería
EMat 2511-6551 Ciencias Básicas
EAN 2511-9180 Ciencias Sociales
ECCC 2511-3600 Ciencias Sociales
If we join both tables on the attribute Faculty_name, we will generate a false tuple
marked in red that did not previously exist.
STUDENT_SCHOOL_FACULTY
Student_mail Student_name School_acronym School_phone Faculty_name
alicia.armando23 Alicia Armando ECCI 2511-8000 Ingenería
bob.benavidez Bob Benavidez ECCI 2511-8000 Ingenería
bob.benavidez Bob Benavidez EMat 2511-6551 Ciencias Básicas
carlos.calvo Carlos Calvo EAN 2511-9180 Ciencias Sociales
carlos.calvo Carlos Calvo ECCC 2511-3600 Ciencias Sociales
daniela.delgado Daniela Delgado EMat 2511-6551 Ciencias Básicas
All guidelines
Guideline #1: While designing the relational schema, the attributes should be easy
to understand and explain. Thus, every entity or relationship has a straightforward
meaning representing only one concept.
Guideline #2: Design to avoid insertion, deletion or modification anomalies. If not
possible, clearly state the anomalies and update the database correctly.
Guideline #3: Avoid NULL attributes while possible. If not possible, ensure that
they do not apply to the majority of tuples in the relation.
Guideline #4: Design relation schemas so that they can be joined correctly, using
the appropriate primary and foreign keys that do not generate false tuples.
93
X → Y does not indicate that Y → X is true.
All candidate keys X can determine the attributes of any attribute Y in R. The-
refore, X → R.
Combining normal forms, ensuring to not create false tuples (nonadditive join
or lossless join) and preserving most dependencies (dependency preservation
property), we can improve the database design. We do not need to ensure the
dependency preservation property in the resulting design.
There are various normal forms, but we shall see in detail the first three: 1NF,
2NF and 3NF.
Though some normal forms do not need to satisfy the requirements of a lower
level normal form, for historical reasons it is customary that the sequence is
followed. Thus, 3NF already satisfies 1NF.
1NF
Every attribute within a relation must be atomic (a single value for any attribute)
or nested relations (tuples with relations within it). This is considered a requirement
for any flat relational model. Every attribute that does not follow this restriction is
moved to another relation that has a foreign key to the primary key of this relation.
Thus separating the non-1NF relation into two 1NF relations.
2NF
Every attribute that is part of a candidate key is a prime attribute. While, attributes
are non prime if they are not prime. 2NF expects every non prime attribute in a
relation to only be fully dependent on the primary key of the relation. Thus, for every
primary key a new relation is created with all the nonprime attributes that depend
on that primary key (based on the FD).
94
3NF
There is a transitive dependency from X → Z, when there exists a FD from X → Y
and Y → Z where Y is a non prime attribute.
3nd expects that every non prime attribute in a relation has a FD with the primary
key. As in, there can be no transitive dependencies from the primary key of the rela-
tion. To normalize this dependency, we create a new relation with the primary key as
the Y in the transitive relation and leave the attribute in the previous relation.
6.4. Review
Guías de diseño informales: Errores
1. Se tiene el siguente diseño relacional de un sistema de bugs.
BUGS_CODE:
BugID BugDescription LineNumber FileName
VACCINE_HOSPITAL_PACIENT:
ID Name PhoneNumber Dosage Notes
PLAYER:
ID Name
MATCH_PLAYER:
MatchID PlayerID Kills Deaths Assists
95
(b) Justifique porque se cumplen o no esas guías de diseño informales.
(c) Proponga las mejoras de diseño correspondientes en el caso de que no sigan
las guías de diseño informal.
BANK:
ID PhoneNumber Location
BANK_CLIENT:
CITIZEN:
ID ResidenceTown BirthDate
VOTE:
VoteID ResidenceTown ElectedCandidate
A B C
10 x hello
20 y bonjour
30 y hola
96
A B C D
triangle 100 red 3
circle 20 white 0
rectangle 100 blue 4
circle 30 white 0
square 90 black 4
A B C D E
Costa Rica America español True S
Japón Asia japonés False L
España Europa español True M
Kenya Africa swahili False M
Australia Oceania inglés False S
97
(l) ¿Existe una dependencia funcional de {C, D, E} → B? ⃝ Si ⃝ Puede
⃝ No
(m) ¿Existe una dependencia funcional de {B, C, D} → E? ⃝ Si ⃝ Puede
⃝ No
(n) ¿Existe una dependencia funcional de {B, C, D} → A? ⃝ Si ⃝ Puede
⃝ No
(ñ) ¿Existe una dependencia funcional de {A, B, C, D} → E? ⃝ Si ⃝ Puede
⃝ No
(o) ¿Existe una dependencia funcional de {A, B, D, E} → C? ⃝ Si ⃝ Puede
⃝ No
A→B (FD1)
{A, B} → C (FD2)
C→B (FD3)
(a) Detalle la clausura de {A}.
(b) ¿{A} es una posible superllave de α? ⃝ Si ⃝ No
(c) Detalle la clausura de {C}.
(d) ¿{C} es una posible superllave de α? ⃝ Si ⃝ No
{A, D} → B (FD1)
{C, D} → E (FD2)
B→A (FD3)
A→D (FD4)
{D, B} → C (FD5)
(a) Detalle la clausura de {B, D}.
(b) ¿{B, D} es una posible superllave de β? ⃝ Si ⃝ No
(c) Detalle la clausura de {C, D}.
(d) ¿{C, D} es una posible superllave de β? ⃝ Si ⃝ No
(e) Detalle la clausura de {A}.
(f) ¿{A} es una posible superllave de β? ⃝ Si ⃝ No
{T, U, V } → W, X (FD1)
W →S (FD2)
R → S, V (FD3)
98
{X, Z} → Y (FD4)
Z → T, R (FD5)
{S, R} → U (FD6)
(a) Detalle la clausura de {T, U, V }.
(b) ¿{T, U, V } es una posible superllave de γ? ⃝ Si ⃝ No
(c) Detalle la clausura de {R, T, Y }.
(d) ¿{R, T, Y } es una posible superllave de γ? ⃝ Si ⃝ No
(e) Detalle la clausura de {Z}.
(f) ¿{Z} es una posible superllave de γ? ⃝ Si ⃝ No
(g) Detalle la clausura de {R, W }.
(h) ¿{R, W } es una posible superllave de γ? ⃝ Si ⃝ No
Y →Z (FD1)
{X, W } → Z (FD2)
Y →X (FD3)
(a) Marque cuales de las siguientes son superllaves de δ. ⃝ X ⃝ W ⃝ Y
⃝ Z ⃝ {X, W } ⃝ {X, Y } ⃝ {X, Y, Z}
(b) Marque cuales de las siguientes son llaves canidatas de δ. ⃝ X ⃝ W
⃝ Y ⃝ Z ⃝ {X, W } ⃝ {X, Y } ⃝ {X, Y, Z}
{F, G} → K (FD1)
K→J (FD2)
{H, J} → F (FD3)
F → H, I (FD4)
I → G, F (FD4)
(a) Marque cuales de las siguientes son superllaves de ϵ. ⃝ F ⃝ G ⃝ H
⃝ I ⃝ J ⃝ K ⃝ {F, G} ⃝ {G, K} ⃝ {H, J} ⃝ {J, K}
⃝ {G, H, J} ⃝ {G, J, K}
(b) Marque cuales de las siguientes son llaves canidatas de ϵ. ⃝ F ⃝ G
⃝ H ⃝ I ⃝ J ⃝ K ⃝ {F, G} ⃝ {G, K} ⃝ {H, J} ⃝ {J, K}
⃝ {G, H, J} ⃝ {G, J, K}
99
3. Para la relación ζ(A, B, C, D, E, F, G, H, I, J) se tienen las siguientes dependencias
funcionales (F D):
{C, E} → I (FD1)
A→J (FD2)
{A, D} → H (FD3)
{H, I, J} → F (FD4)
D → E, G (FD5)
{D, J} → B (FD6)
(a) Marque cuales de las siguientes son superllaves de ζ. ⃝ A ⃝ E ⃝ I
⃝ {A, B} ⃝ {C, E} ⃝ {C, D} ⃝ {D, E} ⃝ {E, G} ⃝ {F, J} ⃝ {I, G}
⃝ {A, B, C} ⃝ {A, C, D} ⃝ {A, C, E} ⃝ {D, E, G} ⃝ {H, I, J}
⃝ {A, B, C, D} ⃝ {A, D, G, J} ⃝ {G, H, I, J} ⃝ {A, D, E, F, G} ⃝ {E, F, G, H, I}
(b) Marque cuales de las siguientes son llaves canidatas de ζ. ⃝ A ⃝ E ⃝ I
⃝ {A, B} ⃝ {C, E} ⃝ {C, D} ⃝ {D, E} ⃝ {E, G} ⃝ {F, J} ⃝ {I, G}
⃝ {A, B, C} ⃝ {A, C, D} ⃝ {A, C, E} ⃝ {D, E, G} ⃝ {H, I, J}
⃝ {A, B, C, D} ⃝ {A, D, G, J} ⃝ {G, H, I, J} ⃝ {A, D, E, F, G} ⃝ {E, F, G, H, I}
Formas normales
1. Se tienen las siguientes relaciones con su conjunto de dependencias funcionales.
FD2
FD3
100
CountryName ContinentID ContinentName PrimaryLanguage Population
FD1
FD2
FD2
FD3
101
VideoGameName ReleaseYear ConsoleID Genre ConsoleName Sales
FD1
FD2
FD3
FD2
102
Parte V
Funcionamiento físico
103
Capítulo 7
105
Buffer pool
Directory Header
P1 P2 P3 P4 ...
Disk
Database file
In the following sections, the building blocks of the physical memory are detailed:
files, pages and records. Their relationship is shown in Fig. 7.2.
Database
File
Page
Record
Data field Data field Data field
Record
Record
Data field Data field Data field
Record
File Data field Data field Data field
...
106
7.3. Files
Databases store data in files. These files are saved on disk using the OS, howe-
ver, only the DBMS knows how to decipher these files. Some DBMS store files in a
hierarchy, while others store everything in one file.
7.4. Pages
Each file is logically partitioned into fixed-length storage units used for data allo-
cation and transfer called pages. A page is sometimes called a block. Pages may
contain several records. Every page has a unique identifier.
There are several ways a DBMS can locate a page within a file including databa-
se heaps, sorted files, and hashes. DBMS commonly save unordered pages that are
tracked using a page directory (Fig. 7.3). This directory saves the position for every
page, retrieving the specific pages with the identifier. This page directory is the first
page within the file, thus it is easily found.
Directory
# used records
Header
offset last record
Record 4 Record
3 Record 2 Record 1
Figura 7.4: Slotted-page structure for a page
107
7.5. Records
Records are a collection of data items for different fields. They are what we know
as tuples or rows in the relational model and SQL, respectively. Each record is uni-
quely identifiable with an identifier (primary key of the table).
Usually, in a database records of different relations have different sizes. Thus,
there are several approaches used to manage these differences:
Fixed length. We assign a number of bytes n for each record.
Variable length. To represent every record with variable length (Fig. 7.5). Each
record will have a fixed-length header at the begining of the record with details
the offeset (osx ) to the data item and the size of the data item (lx ). Therefore,
every data item has the tuple (of f setx , lengthx ) Followed by this header, the infor-
mation of the data items are stored (dfx ).
Header Data
(os1 , l1 ) (os2 , l2 ) ··· (oi , li ) DF1 DF2 ··· DF3
As we can have more records in a relation than space available in a record, we can
store records in several pages. DBMS tend to save only records for the same relation
in a table. There are two ways we can save records across multiple pages (Fig. 7.6):
Unspanned. If each record is not allowed to be saved in multiple pages.
Spanned. If the record can be divided within several pages. This uses all the
space within the page. Saves more space but everytime we need a record that
is saved in multiple pages, we will need to load several pages.
We can calculate the number of records that fit in a page with the blocking factor
(bf r).
B
bf r = ⌊ ⌋
R
108
B is the number of bytes for a page. This is fixed length (i.e., all pages have the
same size).
bf r is the blocking factor. This details the number of records that fit within a
page.
We can also calculate the number of pages needed to save a number of records.
r
b=⌈ ⌉
bf r
If the records are ordered by attributes and we are searching for them, we can
need to retrieve O(logb) pages (binary search).
If not, we must iterate over every page, in worst case, to find the record. On
average, we will have to iterate over O(n/2) pages.
We can see the file, page and slot (i.e., record) information in SQL Server with the
following command for a table T .
7.6. Indexes
Though files and pages are stored in a certain order physically, databases use an
auxiliary data structure called an index to provide a faster way to find the secondary
access paths of the data within a database without altering the physical structure.
Based on an indexing field or indexing attribute, the index is created to enable fast
access on those fields. The DBMS ensures that the tables and indexes are synchroni-
zed. There is a trade-off between speed and additional resources required (additional
space and more maintenance effort for synchronization). There are two levels of in-
dexes: ordered indexes and multi-level indexes.
109
Physically ordered Physically not or-
by indexing key dered by indexing
key
Index field is key Primary index Secondary index
(key)
Index field is non-key Clustering index Secondary
index(non-key)
Cuadro 7.1: Different types of indexes based on the storage order and indexing field
There are several types of ordered indexes described in the following subsections.
The relationship between these is shown in Table 7.1.
Depending on the type index field, we may have an entry for every search key
value (dense index) or entries for some search key values (sparce or non-dense index).
Primary and clustering indexes are nondense. Secondary indexes are dense for keys,
but may be dense or nondense depending on the sorting order of the records for
non-key attributes.
Primary index
Records are sorted with the index record. The index record uses as a search key
value the key attribute of a relation. There is a search key for the first record within a
page (the anchor record). Thus, there is one key per number of pages in the ordered
data files. The structure is shown in Fig. 7.7.
Clustering index
Records are sorted with the index record. The index record uses as a search key
value a non-key attribute of a relation. As there can be repeated values, the index
saves an entry for each distinct value for the index record, pointing at the page where
the first record is found. The structure is shown in Fig. 7.8.
Secondary index
Records are not sorted with the index record. Either we can use as the search
key value the key or non-key attributes of a relation. Therefore, there needs to be an
index record for every value. The structure is shown in Fig. 7.9.
110
Primary
key
A
B
Block
Search poin- C
key ter
D
A
E
D
F
G
.. ..
. . G
Y
H
I
Primary
..
index .
Y
Z
File
Figura 7.7: Primary index
111
Non-
primary
key
1
1
Block
Search poin- 2
key ter
2
1
2
2
2
4
.. ..
. . 4
10
5
5
Clustering
..
index .
9
10
10
File
Figura 7.8: Clustering index
112
Primary
key
Y
A
Record
Search poin- Z
key ter
P
A
O
B
Q
C
D
K
.. ..
. . X
Z
D
..
Secondary .
index B
C
File
Figura 7.9: Secondary index for a primary key
113
7.9. B+ Tree
A B+Tree is a self-balancing tree that allows insertions and deletions in logn (b) time.
We can see an example in Fig. 7.10. This is a tree with n = 2 keys. Thus every node
has n + 1 = 3 references. Let us define that every node of the tree is a combination of
references (rf ) and values (v). In our example, it has the form (rf1 , v1 , rf2 , v2 , rf3 ).
There is only one inner node for this tree (in this example, the node with keys 5
and 9). Inner nodes are non-leaf nodes. Meanwhile, all the nodes in the lowest level
are leaf nodes. There is a leaf node for every value of the search field with a pointer
to either the page or record where the value of the node is stored. In this example,
we are pointing to every record. The last pointer in a leaf node saves the direction
where the following leaf node of the tree is.
The inner nodes have the following properties.
Any value in the subtree in which rf2 points to is denominated X2 . Thus, V1 <
X2 ≤ V 2 .
Any value in the subtree in which rf2 points to is denominated X3 . Thus, V2 < X3 .
Key-pointer property. Every value within the tree has the key with a pointer.
Half-full nodes property. Every non root inner node is at least half full (⌈ n+1
2
⌉−
1 ≤ numberof keys ≤ n)
Balanced property. Every leaf node has the same depth, thus it is perfectly
balanced.
5 7
1 5 6 7 9 13
1 7
5 9
6 13
114
Searching
We can take advantage of the structure of the tree to iterate through from the leaf
to the root to find specific values. We can see the steps in Algorithm 1.
Inserting
We can also insert new values into the three. Insertions must ensure that the tree
is balanced and still complies with all the conditions of the data structure. We can
see the high-level steps in Algorithm 2.
Deleting
Deletions have the same restrictions as insertions. We can see the high-level steps
in Algorithm 3.
Algorithm 1 Searching for records that has a search key value v of a B+ Tree of
order p
function find(v)
n = tree.getRootP age();
n.readP age();
p = n.getN umberP ointers();
while (!n.isLeaf N ode()) do
l = n.getLarger(v); ▷ The node n with i xi values, get those that are v ≤ xi
xi, pi = l.min(); ▷ Finds the smallest xi that is larger than v with the
respective pointer pi
if (xi.isN ull()) then ▷ v > xi
n = n.getLastP ointer(); ▷ Gets last non-null pointer of n
else if (v == xi) then ▷ v = xi
n = n.getN ext(pi); ▷ Gets the pointer following pi
else ▷ v < xi
n = pi; ▷ Gets the current pointer
end if
n.readP age();
end while
r = n.hasRecordW ithKey(v)
if (!r.isN ull()) then ▷ We did found a record with the value
return n;
else ▷ We did not found a record with the value
return null;
end if
end function
115
Algorithm 2 Inserting a record that has a search key value v of a B+ Tree of order p
function insert(v)
l = tree.f ind(v) ▷ We use a variation of the f ind(v) method
if (l.hasSpace()) then
l.insertOrder(v); ▷ Also updates the references
else ▷ We must split the node
m = l.getM iddleKey();
l, l2 = l.splitEvenly();
l.parent.insertOrder(m, l2);
end if
end function
Algorithm 3 Deleting a record that has a search key value v of a B+ Tree of order p
function delete(v)
l = tree.f ind(v) ▷ We use a variation of the f ind(v) method
if (l.hasRecordW ithKey(v)) then
l.remove(v)
if l.sibling.canRedistribute() then ▷ Siblings are adjacent nodes with the same
parent
l.redistributeSibling();
else
n, o = merge(l, l.siblings); ▷ n is the new merged node, while o is the old node
n.parent.removeRef erence(o);
end if
end if
end function
116
7.10. Review
Teoría
1. ¿Cuál es la diferencia entre memoría primaria, secundaria y terciaria?
3. ¿Cuáles son los componentes del diseño físico de la base de datos? Indique cómo
es que se relaciona cada uno.
14. ¿Por qué se utilizan estructuras auxiliares para el almacenamiento físico en las
bases de datos?
117
Árboles B+: Errores
1. Se tiene el siguiente árbol B+ de n = 2 llaves.
2 4
2 4 15 7
27 66
3 7 27 66 77 88 99
11 12
10 11 12 13
7 51 63
3 7 23 51 57 63
118
5. Se tiene el siguiente árbol B+ de n = 3 llaves.
15
5 10
1 5 7 10 11 13 17
55 66
55 66 77
119
Árboles B+: Algortimos de inserción y borrado
Para cada uno de los siguientes ejercicios, dibuje el árbol despues de realizar la
operación con su algortimo respectivo. Para cada ejercicio, aplique cada operación
en orden (una despues de otra).
3 8
3 8 20
100 250
13
3 29
2 3 13 29 31
120
(c) Agregue al árbol B+ el valor 5.
(d) Agregue al árbol B+ el valor 17.
(e) Agregue al árbol B+ el valor 43.
(f) Agregue al árbol B+ el valor 47.
(g) Agregue al árbol B+ el valor 19.
(h) Agregue al árbol B+ el valor 23.
10
3 10 21
121
(g) Agregue al árbol B+ el valor 47.
(h) Remueva del árbol B+ el valor 6.
(i) Remueva del árbol B+ el valor 3.
(j) Agregue al árbol B+ el valor 15.
(k) Agregue al árbol B+ el valor 5.
(l) Agregue al árbol B+ el valor 9.
122
Capítulo 8
Álgebra relacional
8.1. Introduction
Relational algebra is a formal language that defines a set of operations to the
relational model. A sequence of relational algebra operations is a relational algebra
expression. The results of applying an expression is a new relation that represents the
result of the query in the database. Relational algebra is important as: (1) it formally
defines relational model operations; (2) is used in processing and optimizing queries
in RDMSs; and, (3) some concepts are used in the query language for RDMSs. We
can also represent relational algebra expressions as a query tree or query graph.
The operations can be classified into two groups:
Set operations . These operators are related to set theory. These include UNION,
INTERSECTION, SET DIFFERENCE, and CARTESIAN PRODUCT.
Relation operations. These operators are developed for relational databases. The-
se include SELECT, PROJECT, and JOIN.
Furthermore, depending on the number of operators, we can also classify the ope-
rations as unary operations if they operate on only one relation (e.g., SELECT, PRO-
JECT) or binary operations if they operation if they operate on two relations (e.g.,
JOIN).
σc (R)
The number of tuples selected will always be less or equal than the number of
tuples in R.
The fraction of tuples selected by the conditions is the selectivity of the condi-
tion.
The condition can have multiple clauses separated by boolean operators (AND
or OR). We can also apply a NOT to reverse the boolean result.
123
The clauses can be between attributes (F name == Lname) or between an attri-
bute and a constant value (F name == ‘P edro′ ).
The clauses can compare one of the comparison operators {=, <, ≤, >, ≥, ̸=}. If
the values compared are unordered (e.g. Strings that represent colors) only the
operators can be used {=, ̸=}.
Example: Get all the students who are named Pedro and are born after 1990.
SELECT *
FROM STUDENT
WHERE Fname = ‘Pedro’ AND Byear > 1990
πL (R)
The columns are retrieved in the same order as they appear in the list.
The number of tuples selected will always be less or equal than the number of
tuples in R.
π in SQL is specified by the attribute list in the SELECT clause with a DIS-
TINCT.
Example: Get the first and last name for all the students.
124
8.4. RENAME operation
The RENAME operator (ρ) can rename a relation or attributes names. To rename
a relation R into a new relation S, the notation is the following:
ρS (R)
To rename attributes based on a list of attributes L = {A1 , · · · , Am } of a relation R,
the notation is the following.
R∪S
In a Venn Diagram, the UNION operation looks as following:
R S
ST U DEN T ∪ IN ST RU CT OR
125
SELECT *
FROM STUDENT
UNION
SELECT *
FROM INSTRUCTOR
R∩S
In a Venn Diagram the operation is the following:
R S
Example: Get the list of people who are both students and instructors.
ST U DEN T ∩ IN ST RU CT OR
SELECT *
FROM STUDENT
INTERSECTION
SELECT *
FROM INSTRUCTOR
R−S
In a Venn Diagram the operation is the following:
126
R S
ST U DEN T − IN ST RU CT OR
SELECT *
FROM STUDENT
EXCEPT
SELECT *
FROM INSTRUCTOR
R×S
The tuples for the new relation are generated in the order (A1 , ..., An , B1 , ..., Bm ).
× in SQL is CROSS JOIN. Also, if there are two tables in the FROM clause
without a join condition in the WHERE clause.
ST U DEN T × EN ROLLED
SELECT *
FROM STUDENT, ENROLLED
or
127
SELECT *
FROM STUDENT CROSS JOIN ENROLLED
R ▷◁c S
A THETA JOIN has one of the comparison operators {=, <, ≤, >, ≥, ̸=} denomina-
ted θ.
An EQUIJOIN uses only the = comparison operator. The columns involved in the
EQUIJOIN are repeated, as both have the same values in each column.
A NATURAL JOIN (∗) is an EQUIJOIN that gets rid of the repeated column by
combining both relations by a join attribute. The join attribute has the same
name in both relations. If there is no column with the same name, it can be
renamed using the ρ operation.
The degree and order of the resulting attributes is the same as the CARTESIAN
PRODUCT.
The join selectivity is the expected size of the join result divided by the maximum
size of nR ∗ nS .
▷◁ can be applied in SQL in various ways: (1) by specifying the join condition in
the WHERE clause; (2) using a nested relation; and, (3) using join table instruc-
tions such as JOIN, NATURAL JOIN or LEFT OUTER JOIN.
SELECT *
FROM STUDENT AS S, ENROLLED AS E
WHERE S.id = E.id
or
ST U DEN T ∗ EN ROLLED
128
SELECT *
FROM STUDENT AS S JOIN ENROLLED AS E ON S.id = E.id
SELECT *
FROM STUDENT NATURAL JOIN ENRO-
LLED
R ∩ S ≡ (R ∪ S) − ((R ∪ S) ∪ (S ∪ R))
σc (R × S) ≡ (R ▷◁c S)
8.12. Equivalences
There are many rules that can transform a relational algebra expression into an
equivalent expression. Two relational algebra expressions are equivalent if they ha-
ve the same set of attributes in a different order but the expressions represent the
same information. Thus, two expressions are equivalent if the same set of tuples is
retrieved.
Example: Get the students with the first name Pedro, born after 1990.
σF name=‘P edro′ AN DByear>1990 (ST U DEN T ) ≡ σF name=‘P edro′ (σByear>1990 (ST U DEN T ))
129
2. Commutativity of σ. σ can be applied in any order.
Example: Get the students with the first name Pedro, born after 1990.
σByear>1990 (σF name=‘P edro′ (ST U DEN T ) ≡ σF name=‘P edro′ (σByear>1990 (ST U DEN T ))
πid (πid,F name (πid,F name,Lname (ST U DEN T ))) ≡ πid (ST U DEN T )
πA1 ,A2 ,··· ,AN (σc (R)) ≡ σc (πA1 ,A2 ,··· ,AN (R))
Example: Get the id with the full name of students with the first name Pedro.
πid,F name,Lname (σF name=‘P edro′ (ST U DEN T )) ≡ σF name=‘P edro′ (πid,F name,Lname (ST U DEN T ))
R ▷◁c S ≡ S ▷◁c R
R×S ≡S×R
Example: Get the students with the first name Pedro and born after 1990, with
their course enrollment information.
Example: Get the students with the first name Pedro and with their enrolled
courses where they got grades higher than 90.
130
7. Commutativity of π with ▷◁ and ×. For a list of attributes of the projection is
called L, such that L = A1 , · · · , An , B1 , · · · , Bm where A1 , · · · , An are attributes of R
and B1 , · · · , Bn are attributes of S.
πL (R ▷◁c S) ≡ (πA1 ,··· ,An (R)) ▷◁c (πB1 ,··· ,Bm (S))
If attributes An+1 , · · · , An+k of R and Bm+1 , · · · , Bm+p of S are involved for the join
condition but are not in L, then these attributes must be added to the join con-
dition. This does not apply for the × operator.
πL (R ▷◁c S) ≡ (πA1 ,··· ,An ,An+1 ,··· ,An+k (R)) ▷◁c (πB1 ,··· ,Bm ,Bm+1 ,··· ,Bm+p (S))
Example: Get the name and course initials of students enrolled in courses.
σF name,Lname,Sigla (ST U DEN T ▷◁S.id=E.id EN ROLLED) ≡
(πF name,Lname,id (ST U DEN T )) ▷◁S.id=E.id (πid,Sigla (EN ROLLED))
Example: Get the list of all students or instructors born after 1990.
πByear>1990 (ST U DEN T ∪ T EACHER) ≡
(πByear>1990 (ST U DEN T )) ∪ (πByear>1990 (T EACHER))
131
12. Converting a (σ, ×) sequence into ▷◁. If a × is followed by a condition c of a σ,
then it can be changed into a ▷◁.
σc (R × S) ≡ (R ▷◁c S)
SELECT S.id
FROM STUDENT as S, ENROLLED
as E
WHERE S.id = E.id AND E.grade =
90
Root node
End execution πS.id
σE.grade=90 Internal nodes
Leaf nodes ▷◁S.id=E.id
S E
132
Attributes
[S.id] Edges
S.id = E.id
S E
Relation nodes E.grade = 90
Constant nodes 90
133
134
Capítulo 9
Procesamiento de queries
135
Application
Requests a query
Query SQL
Parser
Validates
the query
Query tree or graph
Query optimizer
Selects the
query plan
Execution plan
Query code
generator
Generates
the code
Query code
Runtime data-
base procesor
Executes
query code
Query result
A nested subquery block occurs when the result of an inner query is used by an
outer block.
A correlated nested subquery occurs when a tuple variable from an outer block
is used in the WHERE clause of the inner block. DBMSs use many techniques
to unnest these queries.
Example: Get the name of the students with a higher admission grade than the
student with id B66666.
Is separated into two query blocks:
Inner block. Get the admission grade of the student with id B66666.
Outer block. Get the name of the students with a higher admission grade than
c.
πF name,Lname (σAdmissionGrade>c (ST U DEN T ))
136
Application
Requests a query
SELECT S.id
FROM STUDENT as S, ENROLLED
as E
WHERE S.id = E.id AND E.grade =
90
Validates the query
πS.id
σE.grade=90
▷◁S.id=E.id
S E
STUDENT ENROLLED
Selects the query plan
πS.id
▷◁S.id=E.id
S σE.grade=90
STUDENT E
ENROLLED
S.id
B66666
C00000
Figura 9.2: Example of the inputs and outputs of each step high-level query proces-
sing
SELECT AdmissionGra-
de
FROM STUDENT
WHERE id = ‘B66666’
137
SELECT Fname, Lname
FROM STUDENT
WHERE AdmissionGrade > c
2. Move down σ operations as far down as it can go to be closer with the attributes
of the relationship (Commutativity of σ, Commuting σ with π, Commutativity of
σ with ▷◁ and ×, and Commutativity of σ with set operations).
a) If the condition involves only one table, it means it is a select condition and
can be moved all the way to the leaf node of the table.
b) If it involves two tables, that means that it represents a join condition and
can be moved above the location where the two tables are combined.
3. Rearrange the leaf nodes based on the following criteria (Commutativity ▷◁ and
×, and associativity of ▷◁, ×, ∩ and ∪).
a) Execute first the most restrictive σ, thus move the relationship before. The
most restrictive σ is the one that produces the least amount of tuples, with
the smallest absolute size, or the smallest selectivity.
b) Make sure that the order of the nodes does not cause ×. This may not be
done if the previous relations had a select to a key field.
5. Break down and move π as far down as possible. Create new π if needed. Only
leave the π needed by each operation. (Cascade of π, Commuting σ with π, Com-
mutativity of σ with ▷◁ and ×, and Commutativity of π with ∪).
Example: Get the student IDs of those who have taken the ‘CI-0127’ course born
after 1990
SELECT S.id
FROM STUDENT as S, ENROLLED as E, COURSE as C
WHERE C.sigla = ‘CI-0127’ AND S.id = E.id AND C.sigla = E.sigla AND
S.Byear > 1990
138
πS.id
σC.sigla=‘CI−0127′ AN DS.id=E.idAN DC.sigla=E.siglaAN DS.Byear>1990
× C
COURSE
S E
STUDENT ENROLLED
πS.id
σC.sigla=‘CI−0127′
σS.id=E.id
σC.sigla=E.sigla
σS.Byear>1990
× C
COURSE
S E
STUDENTENROLLED
Figura 9.4: Query tree after breaking up σ operations
πS.id
σC.sigla=E.sigla
σS.id=E.id σC.sigla=‘CI−0127′
× C
σS.Byear>1990 E COURSE
S ENROLLED
STUDENT
Figura 9.5: Query tree after moving down σ operations
139
πS.id
σS.id=E.id
×
σC.sigla=E.sigla σS.Byear>1990
× S
σC.sigla=‘CI−0127′ E STUDENT
C ENROLLED
COURSE
Figura 9.6: Query tree after selecting the most restrictive σ
πS.id
▷◁S.id=E.id
▷◁C.sigla=E.sigla σS.Byear>1990
σC.sigla=‘CI−0127′ E
S
C ENROLLED STUDENT
COURSE
Figura 9.7: Query tree after converting to ▷◁
πS.id
▷◁S.id=E.id
πE.id
πS.id
▷◁C.sigla=E.sigla
σS.Byear>1990
πC.sigla πE.sigla,E.id
S
σC.sigla=‘CI−0127′ E
STUDENT
C ENROLLED
COURSE
Figura 9.8: Query tree after moving down π
140
Pipelined evaluation. The resulting tuples are forwarded directly to the next ope-
ration through a buffer. This has an advantage in cost for writing and reading
from disk.
This approach is more suited for compiled queries, optimizing at compile time,
rather than interpreted queries that less time-consuming optimization works best.
Traditional optimization techniques are used to search the solution space to find a
solution that minimizes our cost functions (estimates).
The cost depends on several underlying metrics:
Access cost to secondary storage (a.k.a disk I/O cost). Cost of reading and
writing data blocks between main memory and secondary storage. This measu-
res the number of block transfers. This dependinds the type of access structures
and other factors such as the allocation of memory.
Disk storage cost. Cost of storing in disk any intermediate files generated by
an execution strategy for the query.
Computation cost (a.k.a CPU cost). Cost of using the CPU such as to search,
order and computing fields. This measures the CPU used, but it is tough to esti-
mate.
Memory usage cost. Cost of main memory buffers used during query execu-
tion.
Communication cost (a.k.a network cost). Cost of shipping the query and
the results from the database to the application that requests the query. This
measures the number of messages sent.
141
For large databases there is an emphasis on minimizing disk I/O cost, meanwhi-
le for smaller databases the emphasis is on minimizing CPU cost. For distributed
databases the network cost is also important to consider. We tend to only use one
single factor, the disk I/O cost, as it is difficult to assign suitable weights to each cost
component.
142
Sorting phase
The runs (portions or pieces) that are available in the buffer are sorted using an
internal algorithm. Then, the results are written back as temporary sorted runs.
Merging phase
The runs are merged with merge pases. The degree of merging (dM ) are the sorted
subfiles that can be merged in each step with at least one buffer block used to save
the result. dM = min(nB − 1, nR ).
Cost
The performance of the algorithm can be measured in disk block accesses (either
reads or writes) required. The following formula estimates the cost:
143
Disjunctive conditions have multiple conditions separated by ORs. If there is an
access path for all the conditions we can use it, but if not we have to use S1. We
can use any method from S1 to S7 for every simple condition. We have to remove
duplicates. The form is the following.
In the following subsections, each SELECT operation algorithm with their cost is
explained.
S2 - Binary search
If we are selecting with an equality comparison on an ordered file, we can binary
search.
For binary search, we can find a key attribute in log2 (bR ) time. Furthermore, if it is
a non-key attribute then we might have to retrieve ⌈( bfsArR )⌉ − 1. Given that sA records
may satisfy the condition, with bfsArR file blocks containing the selected records minus
the first record retrieved.
Cost: log2 (bR ) + ⌈( bfsArR )⌉ − 1
144
For every level (xR ) we have to check one disk block to get the results. Then, half
of the file records (bR ) will satisfy the condition c that will be accessed.
Cost: xR + b2R
S6 - B + -tree index
If we are selecting a with a secondary B + -tree index on an equality comparison,
we can use the tree to find key or non-key values. We can also use a range of values
for a range query.
For every level of the tree (xR ) we must check one disk block of the results, plus
the reference to the disk block pointer. If the index is on a non-key attribute, s records
will satisfy the condition c. If it is a range, it is xR + bI1A
2
+ 2r .
Cost: xR + 1 + s
145
The cost estimates for block transfers of selection algorithms S1 to S6 are shown
in Table ??. For the other algorithms, we can estimate the cost using these methods,
depending on the one used.
The join selectivity (js) represents the number of tuples generated after the join.
The following formula represents the js of joining the tables R and S with the
resulting relationship saved in T = R ▷◁c S.
rT 1
js = =
rR ∗ rS max(N DV (A, R), N DV (B, S))
The join cardinality (jc) is the size of the resulting file after the join operation.
The following formula represents the jc.
jc = js ∗ rR ∗ rS
The results of writing the resulting cost file to disk is represented by the follo-
wing formula.
js ∗ rR ∗ rS jc
=
bf rR S bf rR S
The join selection factor is the fraction of records joined of the relationships.
In the following subsections, each JOIN operation algorithm with their cost is ex-
plained.
146
foreach block BR in R:
foreach block BS in s:
foreach tuple r in Br :
foreach tuple s in Bs :
if (s == r):
save( s, r)
bR + ∗(rR ∗ (xB + 1 + sB ))
Primary index:
bR + ∗(rR ∗ (xB + 1))
bR + ∗(rR ∗ h))
Algorithm:
Example:
For the query:
147
foreach tuple r in R:
foreach tuple s in Index (ri = sj ):
if (s == r):
save (s, r)
Suppose ST U DEN T has a primary index with XS.id = 1. Furthermore, EN ROLLED has
a secondary index with XE.id = 2 and SE.id = 80. Also, bS = 13, bE = 2000, rE = 10000,
rS = 125, and nB = 3.
If we use EN ROLLED as the outer relation, then:
J3 - Sort-merge join
If both R and S are physically sorted by the attributes, we can sort both of the
attributes A and B and iterate by joining the smallest of the two values.
If we need to sort each table, we must include this in the cost. If not, the only cost
is the merging as we only need to scan once every table to compare the results.
Cost:
Sort: This might not be needed. We use as an example the external merge sort
strategy.
(2 ∗ b) + (2 ∗ b ∗ (logdM (nR ))) = 2 ∗ b ∗ (logdM (nR ) + 1)
Merge:
bR + bS
Total:
Sort + M erge
Algorithm:
//Optional sorts
cursorR =R.sort(A).cursor()
cursorS =S.sort(B).cursor()
while (cursorR AND cursorS ): //We stop if there are no more values
if (cursorR > cursorS ):
cursorS .next()
elif (cursorR < cursorS ):
cursorR .next()
else:
save(cursorR , cursorS )
cursorS .next()
Example:
For the query:
148
If the relations ST U DEN T and EN ROLLED are sorted at the join attribute, then:
bE + bS = 2000 + 13 = 2013
If not then we must sort ST U DEN T and EN ROLLED at the join attribute, thus with
external merge sort:
dMS = min(nB − 1, bS ) = min(2, 13) = 2
bS 13
SortS = 2 ∗ bS ∗ (⌈logdMS (⌈ ⌉ + 1)⌉ = 2 ∗ 13 ∗ (⌈log2 (⌈ ⌉))⌉ + 1) = 26 ∗ (⌈log2 (5)⌉ + 1) = 104
nB 3
dME = min(nB − 1, bE ) = min(2, 2000) = 2
bE 2000
SortE = 2∗bE ∗(⌈logdME (⌈ ⌉)⌉+1) = 2∗2000∗(⌈log2 (⌈ ⌉))⌉+1) = 4000∗(⌈log2 (667)⌉+1) = 44000
nB 3
bE + bS + SortE + SortS = 2000 + 13 + 104 + 44000 = 46117
Example:
For the query:
9.10. Review
Admisión UCR
Para que uno pueda entrar en una carrera en la Universidad de Costa Rica (UCR)
existe un proceso de admisión. Para estudiantes de nuevo ingreso, se debe tomar
un examen de admisión que es administrado anualmente. Luego, las personas que
cumplan una nota minima pueden concursar para empadronarse a una carrera. El
sistema de concurso de carrera tiene el modelo relacional de la Fig. 9.10.
149
Carrera:
Código Nombre Facultad Escuela Recinto
Aplicante:
Cédula Nombre Correo NombreColegio NotaColegio NotaAdmisión
Concurso:
CédulaAplicante CódigoCarrera Año Admitido
Álgebra relacional
Escriba los siguientes queries utilizando los operadores de álgebra relacional.
Además, se recomienda escribir las consultas en SQL.
4. Obtenga los nombres de aplicantes con una nota de admisión mayor que 600 y
una nota de colegio mayor que 9.
5. Obtenga todos los concursos a la carrera con código CI (computación) del año
2021.
9. Obtenga los nombres de las personas que aplicaron en el 2020 y en el 2021 con
una nota de admisión mayor a 720.
10. Obtenga el nombre de todas las carreras donde ningún aplicante aplicó en el
2021.
Heurísticas
Para las siguientes expresiones, vamos a usar A para la relación AP LICAN T E, CO
para la relación CON CU RSO y CA para la relación CARRERA. Se resumen algunos
atributos por efectos de espacio.
150
2. Para los siguientes queries indique cúal es el resultado de aplicar cada paso de
las heurísticas.
(a) Para la expresión σN otaAdmision>420AN DN ombreColegio=‘High School Musical’ (A) el resultado
de solamente romper σ es:
⃝ πN otaAdmision>420AN DN ombreColegio=‘High School Musical’ (A)
⃝ σN ombreColegio=‘High School Musical’AN DN otaAdmision>420 (A)
⃝ σN otaAdmision>420 (σN ombreColegio=‘High School Musical’ (A))
⃝ σN otaAdmision>420AN DN ombreColegio=‘High School Musical’ (A)
(b) Para la expresión σN otaAdmision>533 (σCedula=CedulaAplicante (A × CO)) el resultado de
solamente mover σ para abajo es:
⃝ ((σN otaAdmision>533 (A)) ▷◁Cedula=CedulaAplicante CO)
⃝ σCedula=CedulaAplicante (A × (σN otaAdmision>533 (CO)))
⃝ σN otaAdmision>533 ((σCedula=CedulaAplicante (A)) × CO)
⃝ σCedula=CedulaAplicante ((σN otaAdmision>533 (A)) × CO)
(c) Para la expresión (CA ▷◁CA.Co=CO.CoC (σCO.Admitido=‘Y ′ (CO)) ▷◁A.Ce=CO.CeA A) el re-
sultado de solamente mover σ restrictivo primero es:
⃝ (CA ▷◁CA.Co=CO.CoC (σCO.Admitido=‘Y ′ (CO)) ▷◁A.Ce=CO.CeA A)
⃝ (A ▷◁A.Ce=CO.CeA (σCO.Admitido=‘Y ′ (CO)) ▷◁CA.Co=CO.CoC CA)
⃝ (A ▷◁A.Ce=CO.CeA ((σCO.Admitido=‘Y ′ (CO)) ▷◁CA.Co=CO.CoC CA))
⃝ ((σCO.Admitido=‘Y ′ (CO)) ▷◁CA.Co=CO.CoC CA ▷◁A.Ce=CO.CeA A)
(d) Para la expresión σN otaAdmision>533 (σCedula=CedulaAplicante (A × CO)) el resultado de
solamente convertir ▷◁ es:
⃝ σN otaAdmision>533 (σCedula=CedulaAplicante (A ▷◁ CO))
⃝ σN otaAdmision>533 (A ▷◁Cedula=CedulaAplicante CO)
⃝ σCedula=CedulaAplicante (A ▷◁N otaAdmision>533 CO)
⃝ ((σN otaAdmision>533 (A)) ▷◁Cedula=CedulaAplicante CO)
(e) Para la expresión πA.Correo (CO ▷◁A.Ce=CO.CeA A) el resultado de solamente mo-
ver π para abajo es:
⃝ (πA.Ce (A)) ▷◁A.Ce=CO.CeA (πA.Correo,CO.CeA (A))
⃝ πA.Correo ((πA.Ce (A)) ▷◁A.Ce=CO.CeA (πCO.Correo,CO.CeA (A)))
⃝ πA.Correo (CO ▷◁A.Ce=CO.CeA (πA.Correo (A)))
⃝ CO ▷◁A.Ce=CO.CeA (πA.Correo (A))
Sorting
Existe una base de datos con tres millones de bloques (b = 3, 000, 000) que se quiere
ordenar usando external merge sort. nB son el número de buffers.
1. Si se tienen cuatro buffers, ¿cuántas corridas se requieren para ordenar los blo-
ques?
⃝ 10 ⃝ 15 ⃝ 120 ⃝ 500, 000 ⃝ 750, 000 ⃝ 1, 000, 000
151
2. Si se tienen seis buffers, ¿cuántos bloques se pueden ordenar en cada paso del
merge?
⃝ 5 ⃝ 6 ⃝ 100 ⃝ 500, 000 ⃝ 750, 000
3. ¿Cuántos son los buffers mínimos requeridos para poder ordenar con external
merge sort?
⃝ 1 ⃝ 2 ⃝ 3 ⃝ 5 ⃝ 10
7. ¿Cuál es el número más pequeño de buffers que se pueden tener para que se
pueda ordenar en solo dos corridas?
⃝ 555 ⃝ 556 ⃝ 1, 732 ⃝ 1, 733 ⃝ 1, 734 ⃝ 2, 007 ⃝ 2, 008 ⃝ 2, 009
8. Si se tienen veinte buffers, ¿cuál es el número de bloques más grandes que pue-
den ser ordenados en cuatro corridas?
⃝ 71 ⃝ 556 ⃝ 1, 001 ⃝ 3, 702 ⃝ 3, 703 ⃝ 50, 007 ⃝ 137, 180
Join
Suponga que se está realizando un join entre la relación aplicantes (A) y la relación
concurso (CO) sobre el atributo Cedula y CedulaAplicante. Calcule los costos de lectura
y escritura a disco. No incluya los costos de escribir en memoría los resultados.
1. ¿Cuál es el costo de realizar un nested loop join con A como la relación exterior
y CO como la relación interior?
⃝ 124, 000 ⃝ 158, 000 ⃝ 137, 800 ⃝ 210, 000 ⃝ 269, 900
2. ¿Cuál es el costo de realizar un nested loop join con CO como la relación exterior
y A como la relación interior?
⃝ 124, 000 ⃝ 158, 000 ⃝ 137, 800 ⃝ 210, 000 ⃝ 269, 900
152
5. ¿Cuál es el costo total de partition-hash join?
⃝ 3, 980 ⃝ 4, 160 ⃝ 7, 160 ⃝ 10, 340 ⃝ 15, 600 ⃝ 16, 700
Steam
Steam es uno de los distribuidores de juegos más grande del mundo. La compañía
ha distribuido más de 50, 000 juegos de PC en centenas de géneros. Dado la calidad
de su plataforma, actualmente cuenta con más de 1, 000, 000, 000 (miles de millones)
de cuentas. Cada cuenta puede comprar diversos juegos de los cuales la plataforma
recolecta información de uso. La base de datos de Steam tiene el modelo relacional
de la Fig. 9.11 para las tablas que se relacionan con la distribución de uso.
JUEGO (J):
ID Nombre Descripcion FechaLanzamiento Genero
CUENTA (CU):
ID Username Correo Nivel Pais Saldo
COMPRA (CO):
Juego Cuenta MinutosJugados FechaCompra UltimaFecha
1. Escriba la expresión de álgebra relacional que obtenga todos los géneros de jue-
gos que se han jugado por más de 50 horas (3, 000 minutos).
πJ.N ombre (σJ.ID=CO.Juego AN D CU.ID=CO.Cuenta AN D CU.P ais=‘CR′ AN D CO.U F >=‘01−01−2021′ (CU ×CO×J))
153
(a) Explique brevemente, en sus propias palabras, qué consulta realiza la ex-
presión de álgebra relacional.
(b) Dibuje el árbol canónico de la expresión de álgebra relacional.
(c) Optimice en totalidad, utilizando heurísticas, la expresión de álgebra relacio-
nal. Debe escribir el resultado de cada paso en un árbol de consulta (query
tree).
3. Suponga que se está realizando un join entre CUENTA (CU ) y COMPRA (CO)
sobre los atributos ID y Cuenta, respectivamente. Estime el costo de lectura y es-
critura a disco. No debe incluir los costos de escribir en memoría los resultados.
154
Parte VI
Transacciones
155
Capítulo 10
Transacciones
10.1. Introduction
It is common that when we execute operations in a database, from the point of
view of a user it seems as a single logical work unit. However, frequently we require
multiple operations to perform a work unit, requiring either all the work to be execu-
ted or none at all if there is a failure. Per example, if Alice wants to transfer $100 to
Bob’s bank account (logical work unit), the transfer requires to:
Other examples of systems that use transactions include ticket reservation, and
retail purchasing. To manage these types of operations, we need to perform the co-
llection of operations as one logical work unit called a transaction.
10.2. Definition
A database operates over data items A, B, C, · · · . A data item can be an attribute,
tuple, page, disk block, table or database. Furthermore, we can perform the following
operations:
write(X): Writes the variable X to a data item of the database. We asume that
the writes are saving directly to disk, though that might is usually not the case
as it is probably saved in cache or a buffer.
COMMIT: A successful end to the transaction that commits the updates safely
to the database. In SQL, it is represented by a COMMIT.
157
ROLLABACK: An unsuccessful end to the transaction, requiring to undo the
database changes. In SQL, it is represented by an ABORT.
The states of a transaction are shown in Fig. 10.1. A transaction starts in the active
state after beginning and while executing read(X) or write(X). When the transaction
has reached the end, it goes to the partially committed state. If there are no errors,
the transaction ends as a committed transaction. If there is a failure while execu-
ting operations or committing the results, the transaction enters a fail state. Thus
the results to the database are then aborted. After being aborted or commited, the
transaction is terminated.
end
abort
Failed Aborted
T1 : BEGIN;
read(A);
A := A − 100;
write(A);
read(B);
B := B + 100;
write(B);
COMMIT;
Figura 10.2: Transfering $100 from Alice’s bank account (A) to Bob’s bank account
(B)
158
T2 : BEGIN;
read(A);
A := A + 50;
write(A);
COMMIT;
Atomicity
Let us assume that account A has $1000 and account B has $500 before the transac-
tion. Suppose that there is a failure after the write(A) operation and before the write(B)
operation. Thus, the value would have $900 for A and $500 for B, destroying $100. The
database would be in an inconsistent state as the system would no longer reflect the
real world state. This inconsistent state will happen at one point in the transaction,
which will then become consistent after the write(B) operation. Thus, if the transac-
tion was never started or was guaranteed to be completed, the inconsistent state
would not be visible except during the transaction.
The atomicity means that all operations are executed or none at all. During the
transaction the system keeps track of the changes in a log file. If there is any failure,
the system restores the old values with the log files as if the transaction was not
executed. The database system is responsible for ensuring the atomicity through the
recovery system.
Consistency
We want that the result A+B to be the same before and after executing the transac-
tion. Without consistency, money could be destroyed or created (no bueno).
The consistency property ensures that executing a transaction by itself preser-
ves the consistency of the database. The programmer who codes the transaction is
responsible to ensure the consistency.
Isolation
If several transactions are executed concurrently, operations may interleave in
such a way that may produce an inconsistent state. If while transferring money from
A to B between write(A) and write(B) operation another second transaction read data
reads A or B there will be an inconsistent state. If this second transaction updates A
and B based on the inconsistent values, the database will be inconsistent after the
execution of the transactions.
The isolation property ensures that executing the transaction concurrently results
in an equivalent state as if it was done one at a time in a particular order. The database
ensures this property with the concurrency-control system.
Durability
If we have transferred the funds successfully, even if there is a system failure later
the data cannot be lost. For example, the system cannot lose that A has $900 in the
account and B has $600 in the account.
The durability property guarantees after successfully completing a transaction,
all the updates persist even if there is a system failure. To do this, we must write to
159
disk before finishing the transaction or by saving enough information to disk to be
able to reconstruct the updates after failure. The recovery system is also in charge
of ensuring durability.
160
Capítulo 11
Control de la concurrencia
11.1. Introduction
Usually, transaction processing systems allow multiple transactions to run concu-
rrently causing inconsistencies with the data. Though running transaction serially
(one at a time) is easier, there are benefits for using concurrency:
Reduced waiting time. The running time of a transaction may vary, thus if
transactions run serially a short transition might have unpredictable delays wai-
ting for a longer transaction to finish. Running a transaction concurrently, if they
use different parts for the database, is better as it allows them to share resour-
ces. Thus it reduces unpredictable delays, bringing down the average time that
a transaction takes to be completed after submission (average response time).
Still, as concurrency allows for multiple transactions to be run at the same time we
must ensure the isolation ACID property. To do this, we must use schedules to iden-
tify the order of execution (Sections 11.3). Furthermore to ensure the consistency of
the database we use several mechanisms called concurrency-control protocols (Sec-
tion 11.6).
11.2. Concurrency
While some systems may have a database system with a single-user, it is com-
mon for database systems to be multi-user. A simple way we could handle multiple
transactions is executing one after the other with only one thread (Subfig. 11.1a).
However, this is inefficient in time as there is only one transaction executing and we
require to copy the database per transaction.
Another possible approach is to process the transaction concurrently in a thread
using interleaved processing by partially executing the transaction and then suspen-
ding it to execute other operations (Subfig. 11.1b). Furthermore, we could also use
multiple threads to allow for parallel processing (Subfig. 11.1c).
161
A
Time
(a) Serial processing
A A
B B
Time
(b) Interleaved processing
A
CP U1
B
CP U2
Time
(c) Parallel processing
11.3. Schedules
The chronological order of execution of sequences of n transactions T1 , T2 , · · · , Tn is
the schedule or history. The order of the instructions inside the transaction must be
preserved.
Suppose we have a transaction T1 that transfers $100 from Alice’s bank account (A)
to Bob’s bank account (B) shown in Fig. 11.2. We also have a transaction T2 in which
$50 are deposited in cash to Alice’s bank account (A) shown in Fig. 11.3. Therefore a
schedule would be the order of execution for both of these transactions. At the end
of each transaction to detail that the transaction has entered a committed state, we
use the commit instruction.
T1 : read(A);
A := A − 100;
write(A);
read(B);
B := B + 100;
write(B);
Figura 11.2: Transfering $100 from Alice’s bank account (A) to Bob’s bank account
(B)
An example of a serial schedule is shown in Fig. 11.4. The instructors are shown
in chronological order from top to bottom, with the instructions for T1 and T2 shown
in the left and right column, respectively. If there was $1000 in A and $500 in B, the
result of the scheduled execution is $950 in A and $600 in B. As the total money in the
162
T2 : read(A);
A := A + 50;
write(A);
T1 T2
read(A);
A := A − 100;
write(A);
read(B);
B := B + 100;
write(B);
commit;
read(A);
A := A + 50;
write(A);
commit;
T1 T2
read(A);
A := A + 50;
write(A);
commit;
read(A);
A := A − 100;
write(A);
read(B);
B := B + 100;
write(B);
commit;
163
schedules that result in consistent states using the concurrency-control component.
Therefore, to ensure the consistency a concurrent schedule must be equivalent to a
serial schedule. Such schedules are serializable (Section 11.5).
T1 T2
read(A);
A := A − 100;
write(A);
read(A);
A := A + 50;
write(A);
commit;
read(B);
B := B + 100;
write(B);
commit;
T1 T2
read(A);
A := A − 100;
read(A);
A := A + 50;
write(A);
commit;
write(A);
read(B);
B := B + 100;
write(B);
commit;
Lost update
A lost update (dirty write) occurs when two transactions interleave in such a way
that the value produced by the database is incorrect. Wi represents a write to transac-
tion Ti and a Ri a read to transaction Ti . The anomaly occurs for Ti when the schedule
interleaves for Q in such a way where Wi (Q), · · · , Wj (Q).
An example is shown in Fig. 11.8. When T2 writes the result of A it is incorrect as
it reads the value of A before T1 changes the database result. Thus, the result in A
will be $1050 instead of $950 due to losing the debit of $100 to the bank account.
164
T1 T2
read(A);
A := A − 100;
read(A);
A := A + 50;
write(A);
write(A);
read(B);
A := B + 100;
write(B);
Dirty read
A dirty read (temporary update) occurs when a transaction updates a value that
then fails, while another transaction reads the data before the roll back. The problem
occurs for Ti when the schedule interleaves for Q in such a way where Wj (Q), · · · , Ri (Q).
An example is shown in Fig. 11.9. T2 reads the temporary result of A. However,
T1 has a failure and the previous data is recovered to the original value. Therefo-
re, T2 reads dirty data created by the transaction that has not been completed and
committed.
T1 T2
read(A);
A := A − 100;
write(A);
read(A);
A := A + 50;
write(A);
read(B);
ABORT;
Unrepeatable read
An unrepeatable read (fuzzy or non-repeatable read) occurs when a transaction
reads the same data twice, but the value was changed by another transaction bet-
ween reads. The issue occurs for Ti when the schedule interleaves for Q in such a
way where Ri (Q), · · · , Wj (Q), · · · Ri (Q).
An example is shown in Fig. 11.10. T1 reads a result for A at the beginning of
the transaction that is modified by T2 . When T1 reads the data of A again there are
different values for the same data.
Phantom read
A phantom read occurs when a transaction repeats a search condition but gets
a different a set of items that satisfies the condition. [y in Q] represents modifying
(inserting, updating or deleting) a tuple y for the data item Q . The anomaly occurs
165
T1 T2
read(A);
A := A − 50;
write(A);
read(A);
A := A + 50;
write(A);
read(A);
A := A − 50;
write(A);
for Ti when the schedule interleaves for Q in such a way where Ri (Q), · · · , Wj [y in
Q], · · · Ri (Q).
An example is shown in Fig. 11.11. T1 reads the total of tA money transferred in
Alice’s bank account, However, in T2 a new transfer is added to all the money transfers
tA. Therefore, when the same condition is executed in T1 a new phantom tuple exists
that was not previously there.
T1 T2
read(tA);
insert(an+1 in tA);
read(tA);
11.5. Serializability
A serializable schedule determines if the order of concurrent operations are equi-
valent to the serial execution. The serializability identifies when a schedule is se-
rializable. Serial schedules are serializable, and interleaved executions can also be
serializable though it is harder to determine. There are two types of serializability:
conflict and view. Neither definition encapsulates all serializable schedules.
Conflict serializability
If we have a schedule S with two consecutive instructions I and J of transactions
Ti and Tj , with i ̸= j. If I and J are different data items the steps can be swapped
without affecting the results. However if I and J both refer to the same data item
Q then the order may matter. As in our transactions we only have read and write
instructions only four possibilities can happen:
read-read: I = read(Q), J = read(Q). The order does not matter as, regardless of
the order of I and J, the value Q will be the same.
166
write-read: I = write(Q), J = read(Q). The order does matter, for the same reason
as the previous case.
write-write: I = write(Q), J = write(Q). The order does not affect Ti and Tj . Ho-
wever, for the next read(Q) operation the order matters as only the last write
instruction is preserved in the database. If there is no other write(Q) the order
does affect the final value of Q in the database.
T1 T2 T1 T2 T1 T2
read(A); read(A); read(A);
write(A); write(A); write(A);
read(A); read(A); read(A);
write(A); read(B); read(B);
read(B); write(A); write(B);
write(B); write(B); write(A);
(a) Concurrent schedule (b) T1 read(B) ↔ T2 write(A) (c) T1 write(B) ↔ T2 write(A)
T1 T2 T1 T2
read(A); read(A);
write(A); write(A);
read(B); read(B);
read(A); write(B);
write(B); read(A);
write(A); write(A);
(d) T1 read(B) ↔ T2 read(A) (e) T1 read(B) ↔ T2 read(A)
167
Ti executes a write(Q) before Tj executes a write(Q).
If there is an edge Ti to Tj exists in the graph, then in a serial execution S ′ Ti must
execute before Tj . If there are no cycles, then the schedule S is conflict serializable.
If the graph has a cycle, then S is not conflict serializable.
For example, the precedence graphs of schedules 1 to 4 are shown in Fig. 11.13.
For schedule 1 (Subfig. 11.13a) there is a single edge T1 → T2 as T1 executes
write(A) before T2 executes a read(A).
Schedule 2 (Subfig. 11.13b) has also only one edge T2 → T1 for a similar reason
than Schedule 1. The edge from T2 → T1 exists as T2 executes write(A) before T2
executes a read(A).
Schedule 3 (Subfig. 11.13c) has one edge T1 → T2 as T1 executes write(A) before T2
executes a read(A). Therefore, the concurrent schedule 3 is conflict serializable.
For schedule 4 (Subfig. 11.13d), there is an edge T1 → T2 as T1 executes read(A)
before T2 executes a write(A). There is also an edge T2 → T1 as T2 executes write(A)
before T2 executes write(A). Therefore, there is a cycle and the schedule is not
serializable.
T1 T2 T1 T2
T1 T2 T1 T2
The serializability order defines the order of execution of the transactions that
is consistent with the partial order of the precedence graph. To test for conflict se-
rializability, the precedence graphs is constructed and a cycle-detection algorithm
used.
View serializability
If there is a transaction T3 , shown in Fig. 11.14, that transfers $20 from Bob’s bank
account (B) to Alice’s bank account (A). A schedule 5 could be created by executing
T1 and T3 , as shown in shown in Fig. 11.15. The precedence graph of schedule 5 has
an edge from T1 → T3 as T1 executes write(A) before T3 executes read(A), and an edge
from T3 → T1 as T3 executes write(B) before T1 executes read(B). Thus, there is a cycle
and the transaction is not conflict serializable.
However, if we execute the transaction we will get an equivalent result to a serial
schedule of < T1 , T3 > due to the fact that the mathematical increment and decre-
ment operations are commutative. Therefore, there are schedules that produce the
same outcome but are not conflict equivalent. Analyzing the results instead of only
considering read and write operations is the view serializability. However, this ty-
pe of serializability is not used in practice because it is computationally complex to
determine (NP-complete).
168
T3 : read(B);
B := B − 20;
write(B);
read(A);
A := A + 20;
write(A);
Figura 11.14: Transfering $20 from Bob’s bank account (A) to Alice’s bank account
(B)
T1 T3
read(A);
A := A − 100;
write(A);
read(B);
B := B − 20;
write(B);
read(B);
B := B + 100;
write(B);
read(A);
A := A + 20;
write(A);
Pessimistic: The DBMS assumes that transactions will have conflicts, therefore
it doesn’t allow the problems to occur. Lock-based (Section 11.7) are pessimistic.
Optimistic: The DBMS assumes that conflicts between transactions are rare,
therefore it assumes it will be able to finish the transaction after commiting.
Checks are performed after the transaction is executed. Validation-based pro-
tocols (Section 11.11) and timestamp-based protocols (Section 11.10) are opti-
mistic.
Shared lock (S − LOCK). Several transactions can read at the same time, but
169
none can write. This lock can be acquired by multiple transactions at the same
time.
Exclusive lock (X − LOCK). Only one transaction can both read and write. This
lock prevents other transactions acquiring S − LOCK or X − LOCK.
We request the S − LOCK for a data item Q executing the S-LOCK(Q) instruction,
while for X − LOCK we execute the X-LOCK(Q) instruction. To release either lock
on a data item Q, we execute the UNLOCK(Q) instruction. If we hold the lock till the
end of a transaction (after a commit or abort) it is a long lock, while if we liberate it
before it is a short lock.
The transactions request locks (or upgrades) to the concurrency-control manager.
The lock requested depends on the type of transaction performed. The transaction
will have to wait to continue it’s execution until the concurrency-control manager
grants the lock. The lock will be granted until when all the incompatible locks held
by other transactions have been released. When a transaction finishes using a lock,
it must be released so other transactions can use it.
Compatible modes do not require waiting to be granted permission to use the lock
if another transaction currently holds a lock. The compatibility between the modes
is shown in Fig. 11.16. Only S − LOCKs are compatible with each other and can be
held simultaneously.
S − LOCK X − LOCK
S − LOCK ✓ ✗
X − LOCK ✗ ✗
T1 T2
X-LOCK(A);
read(A);
write(A);
UNLOCK(A);
X-LOCK(A);
write(A);
UNLOCK(A);
S-LOCK(A);
read(A);
UNLOCK(A);
If a transaction is waiting for a lock that is not granted, the transaction is starved
and cannot progress. For example, schedule 7 shown in Fig. 11.18. T1 has a
170
S −LOCK granted, while T2 requests the lock but is waiting to get granted access
to A. However, if starvation is not considered it will allow for T3 to be granted the
lock. Thus, even if T1 has unlocked A, T2 is still waiting. This can further happen
with transaction T4 , · · · , Tn . Thus, T2 cannot progress as it is left waiting by the
lock manager for the X − LOCK on A. Starvation can be handled if a lock request
is not blocked by a later request.
T1 T2 T3 T4 ···
S-LOCK(A); ···
X-LOCK(A); ···
S-LOCK(A); ···
UNLOCK(A); ···
S-LOCK(A); ···
UNLOCK(A); ···
If two transactions are waiting for the other to release a lock, a deadlock can
occur. Fig. 11.19 shows an example of a schedule that generates a deadlock.
When T2 ask for an S − LOCK on A it is not granted as T1 still holds an X − LOCK
on A. The deadlock is then generated when T1 asks for an X − LOCK on B as it
cannot be granted due to T2 having an S − LOCK on B. Therefore, neither T1 or
T2 can advance. Deadlocks are a necessary evil as they can be handled by rolling
back transactions (Section 11.8).
T1 T2
X-LOCK(A);
S-LOCK(B);
read(B);
S-LOCK(A);
write(A);
UNLOCK(B);
Figura 11.19: Schedule 8. Deadlock of data items A and B between two transactions
Locking protocols define a set of rules that must be followed to lock and unlock
data items. These protocols restrict the possible schedules and do not produce all
possible serializable schedules. In the following subsections, these protocols will be
defined. The relationship between these protocols and how long the locks are held is
shown in Fig. 11.20.
S − LOCK X − LOCK
2PL short short
Strict 2PL short long
Rigorous 2PL long long
171
Two-phase locking
The two-phase locking protocol (2PL) or basic 2PL ensures serializability by is-
suing locks and unlocks in two phases:
1. Growing or expanding phase. Transactions start by obtaining locks without
releasing them.
2. Shrinking phase. After locks are no longer needed, the transaction releases
locks and cannot obtain any new ones.
Both S − LOCKs and X − LOCKs are short locks. A schedule example is shown in
Fig. 11.21. First, both transactions T1 and T2 ask for all the locks for all the data items
used at the beginning of the transaction. For T1 , there is an X − LOCK for A and B,
while for T2 it is a X − LOCK for A. After T1 completely finishes using the X − LOCK
on A, it is released thus T2 can continue executing the operations. Furthermore, as A
is not used any more in T1 , there will be no inconsistent data states.
T1 T2
X-LOCK(A);
X-LOCK(B);
read(A);
write(A);
X-LOCK(A);
UNLOCK(A);
read(A);
write(A);
read(B);
write(B);
ABORT;
However, there are several limitations to 2PL. Deadlocks can still occur and it li-
mits concurrency. Furthermore, cascading aborts or cascading rollbacks can happen
when a transaction aborts and another transaction must be rolled back. For example,
in schedule 9 of Fig. 11.21 there is an abort at the end of T1 , thus T2 must also abort
the changes. Thus the effort of executing the instructions in T2 is wasted.
172
T1 T2
X-LOCK(A);
S-LOCK(B);
X-LOCK(A);
read(B);
UNLOCK(B);
read(A);
write(A);
UNLOCK(A);
COMMIT; read(A);
write(A);
COMMIT;
are long locks. Strict or rigorous 2PL eliminates cascading aborts, but the schedules
limit concurrency.
173
T1
T1 T2 T3
S-LOCK(A);
X-LOCK(B);
X-LOCK(C);
T2
S-LOCK(D);
S-LOCK(D);
S-LOCK(B);
S-LOCK(C); T3
X-LOCK(A);
··· ··· ···
(b) Wait-for graph to the sche-
(a) Schedule with three-way deadlock dule
the age, progress, number of data items used and number of transactions involved in
the rollback. A combination of several factors are considered. We must include the
number of rollbacks as a cost factor to not select only one victim.
When we decide the victim transaction, it has to determine how far the transaction
needs to be rolled back. A total rollback aborts all the transaction and restarts it. A
partial rollback, using the sequence of grants and requests of locks with the updates,
can determine a point to revert the changes to resume the execution from there.
Deadlock prevention
Deadlock prevention protocols ensure that the system will never enter a deadlock
state. These schemes are used if the probability of deadlock is high. Several approa-
ches have been proposed.
Another one of these approaches consists of ordering all the data items and strictly
following the ordering. A variation considering both of these schemes is to use 2PL
174
with strict ordering, thus the locks can only be requested following the order. An
example can be seen in Fig. 11.25. The locks are defined to be acquired with A first
and then B. A deadlock does not occur when T2 tries to acquire A, the transaction
must wait. If we had no order, T2 could acquire the X − LOCK(B) first and a deadlock
would happen.
T1 T2
X-LOCK(A); X-LOCK(A);
X-LOCK(B);
···
UNLOCK(B);
UNLOCK(A);
COMMIT; X-LOCK(B);
···
Wait-die (“Old waits for young”): If the requesting transaction has a higher
priority than the holding transaction, it waits. Otherwise, it is aborted. The older
transaction is allowed to wait for a younger transaction. The younger transaction
dies (aborts) if it requests the lock held by an older transaction.
Wound-wait (“Young waits for old”): If the requesting transaction has a higher
priority than the holding transaction, the holding transaction aborts and rolls
back (wounds). Otherwise, it waits. The younger transaction is allowed to wait
for an older transaction. The older transaction wounds (abort) the younger transac-
tion holding the lock.
An example for both timestamped-based schemes is shown in Fig. 11.26. For exam-
ple in Subfigure. 11.26a T1 starts before T2 , thus T1 < T2 (T1 is older). When T1 requests
a lock held by T2 , for wait-die the older transaction T1 waits for transaction T2 to re-
lease the lock. If the protocol was wound-wait then T2 is aborted and rolled back.
For another example shown in Subfigure. 11.26a T1 also starts before T2 (T1 is older).
With the wait-die scheme when T2 requests the lock held by T1 , T2 dies. While, for the
wound-die scheme T2 can wait for the older transaction.
Both timestamped-based techniques prevent deadlocks as either transactions wait
for the younger (wait-die) or older transactions (wound-wait), therefore no cycle is
created. However, both techniques have many unnecessary rollbacks.
Another method is using lock timeouts, defining a specified amount for waiting. If
the transaction was not granted the lock after waiting the specified time, it will roll
back and restart. Defining the time to wait is difficult, long times cause unnecessary
delays while short waits lead to wasted results. This protocol falls between deadlock
prevention and detection. Starvation can occur, thus there is limited applicability to
the scheme.
175
T1 T2 T1 T2
BEGIN BEGIN
BEGIN X-LOCK(A);
X-LOCK(A); BEGIN
X-LOCK(A); X-LOCK(A);
(a) Wait-die T1 waits and wound-wait T2 aborts (b) Wait-die T2 waits and wound-wait T1 aborts
Database
To determine if a lock can be granted we have to check if all the implicit locks
are compatible with requested mode. To determine this, all the descendants of the
granularity level must be checked. However, this is not efficient. In the worst case,
locking the database requires checking all the nodes in the tree.
To check more efficiently, we can use intention lock modes to not have to check
all descendant nodes. With intention mode, all the descendants are explicitly locked.
The different modes allowed are the following:
Intention-shared (IS) mode: Indicares that the descendants have explicit shared-
mode locks.
176
Intention-exclusive (IX) mode: Indicates that the descendants have explicit
exclusive-mode or shared-mode locks.
IS IX S SIX X
IS ✓ ✓ ✓ ✓ ✗
IX ✓ ✓ ✗ ✗ ✗
S ✓ ✗ ✓ ✗ ✗
SIX ✓ ✗ ✗ ✗ ✗
X ✗ ✗ ✗ ✗ ✗
Suppose that transaction T1 wants to read the data for Alice. Therefore, an S −
LOCK is acquired for the tuple with Alice’s data and a IS − LOCK on the Student
table indicating that one of its descendants (Alice’s tuple) has an S − LOCK.
Finally, let’s assume that transaction T3 scans wants to scan the student table
to update some student records. This would require gaining a SIX − LOCK on
the student table. However, this would not be possible as SIX is not compatible
with IX. Therefore, T3 would have to wait to be granted the lock. If T3 happend
before T2 , we could have acquired the lock for T3 .
T1 → IS
T2 → IX
Tables Student
T1 → S T2 → X
Tuples Alice Bob Carlos
Locks must be acquired from the root to the leaf, and released from the left to the
root. This protocol enhances concurrency and reduces overhead, but deadlocks can
still occur.
177
11.10. Timestamp-based protocols
The lock based protocols define the order at execution time, but we could also defi-
ne the serializability order in advance using a timestamp-ordering scheme. A unique
timestamp T S(Ti ) for every transaction Ti before execution.
When a new transaction Tj enters the system then T S(Ti ) < T S(Tj ). To assign the
sequential timestamp we could use different strategies. The system clock can be
used to assign a value when the transaction enters the system, however there can
be issues in edge cases (e.g., daylight savings). Another option is to use a logical
counter that increments after a new transaction enters the system, but the counter
could overflow or has issues maintaining the counter across multiple machines. A
hybrid combination of methods can be used. If T S(Ti ) < T S(Tj ) the DBMS must ensure
an equivalent serial execution of Ti appearing before Tj .
The timestamp-ordering protocol (basic T/O) executes conflicting reads and write
operations in the timestamp order, ensuring conflict serializability. To execute the
scheme, the DBMS tracks for every data item Q the last transaction that executed
successfully a read (R − T S(Q)) and a write (W − T S(Q)). The DBMS updates these
timestamps after every instruction is executed.
Read operations.
• If T S(Ti ) < W −T S(Q), a future transaction has written to data item Q before Ti
violating the T S(Ti ) < T S(Tj ) property. Therefore, Ti is aborted and restarted
with a new timestamp value.
• Else, T S(Ti ) ≥ W − T S(Q) the order T S(Ti ) < T S(Tj ) is preserved and the
read(Q) instruction is executed. The DBMS also updates R − T S(Q) = max(R −
T S(Q), T S(Ti )).
Write operations.
• If T S(Ti ) < R − T S(Q) or T S(Ti ) < W − T S(Q), a future transaction has read
or written to data item Q before Ti violating the T S(Ti ) < T S(Tj ) property.
Therefore, Ti is aborted and restarted with a new timestamp value.
• Else, T S(Ti ) ≥ R − T S(Q) and T S(Ti ) ≥ W − T S(Q) the order of execution is
ensured and the write(Q) instruction is executed. The DBMS also updates
W − T S(Q) = T S(Ti ).
The timestamped protocol example will use the schedule in Fig. 11.30. We can
assume that T S(T1 ) = 1 and T S(T2 ) = 2.
T1 T2
read(B);
read(B);
write(B);
read(A);
read(A);
read(A);
write(A);
178
When T2 executes read(B), W − T S(B) = 0. As T S(T2 ) ≥ W − T S(B) = 2 ≥ 0, T2 can
execute the instruction and R − T S(B) = max(R − T S(B), T S(T2 )) = max(1, 2) = 2.
When T2 executes write(B), W − T S(B) = 0 and R − T S(B) = 2. As T S(T2 ) ≥ W −
T S(B) = 2 ≥ 0 and T S(T2 ) ≥ R − T S(B) = 2 ≥ 2, T2 can execute the instruction and
W − T S(B) = T S(T2 ) = 2.
When T1 executes read(A), W − T S(A) = 0. As T S(T1 ) ≥ W − T S(A) = 1 ≥ 0, T1 can
execute the instruction and R − T S(A) = max(R − T S(A), T S(T1 )) = max(0, 1) = 1.
When T2 executes read(A), W − T S(A) = 0. As T S(T2 ) ≥ W − T S(A) = 2 ≥ 0, T2 can
execute the instruction and R − T S(A) = max(R − T S(A), T S(T2 )) = max(1, 2) = 2.
When T1 executes read(A), W − T S(A) = 0. As T S(T1 ) ≥ W − T S(A) = 1 ≥ 0, T1 can
execute the instruction and R − T S(A) = max(R − T S(A), T S(T1 )) = max(2, 1) = 2.
When T2 executes write(A), W − T S(A) = 0 and R − T S(A) = 2. As T S(T2 ) ≥ W −
T S(A) = 2 ≥ 0 and T S(T2 ) ≥ R − T S(a) = 2 ≥ 2, T2 can execute the instruction and
W − T S(A) = T S(T2 ) = 2.
The protocol can also be modified with Thomas’ write rule to remove unnecessary
rollbacks for write, creating view serializable conflicts. If T S(Ti ) < W − T S(Q) the
system can ignore the write as it is obsolete. This does violate the timestamp order,
but it is fine as no other transaction will read the write(Q) of Ti .
The protocol ensures conflict serializability and freedom from deadlocks, though
it has issues due to starvation, non-recoverable schedules and phantom tuples. Some
of these problems may be fixed with variations of the protocol.
1. Read phase. The data is read to a temporary copy. The transaction the modified
the temporary results (not global).
2. Validation phase. The system validates whether a serializability conflict occu-
rred. If there is a violation the transaction is aborted.
3. Write phase. The temporary results are writen to the database (global).
179
1. Ti completes all phases before Tj begins (Subfig. 11.32a).
2. Ti completes all phases before Tj enters the validation phase and the transactions
do not read the same items (Subfig. 11.32b).
3. Ti completes the read phase before Tj and the transactions do not read or write
the same items (Subfig. 11.32c).
This approach is optimistic as the validation will fail only in the worst of cases.
The protocol automatically guards against rollbacks and has no deadlocks, however
starvation can happen.
180
+ Conccurrency
READ UNCOMMITTED
+ Consistency
+ Isolation READ COMMITTED
REPEATABLE READ
SERIALIZABLE
Figura 11.33: Transaction concurrency versus database consistency with SQL-92 iso-
lation levels
S − LOCK
X − LOCK
data-item condition
READ UNCOMMITTED None None Long
READ COMMITTED Short Short Long
REPEATABLE READ Long Short Long
SERIALIZABLE Long Long Long
181
Most DBMS run the READ COMMITTED isolation level by default. We can chan-
ge the isolation level in SQL SERVER with the SET TRANSACTION ISOLATION
LEVEL <NAME>. Furthermore, by default most DBMS commit every statement af-
ter execution (automatic commit). To enable manual commits, the transaction begins
with the start transaction command (in SQL Server it starts with BEGIN TRANSAC-
TION) and ends with a COMMIT or ROLLBACK.
11.13. Review
Problemas de control de concurrencia
1. Se tiene el siguiente schedule:
T1 T2
read(A);
write(C);
write(C);
T1 T2
read(A);
write(B);
write(C);
read(D);
182
T1 T2
read(A);
read(A);
read(A);
read(B);
read(A);
read(A);
write(B);
T1 T2 T3 T4
read(A);
write(B);
read(D);
write(D);
write(C);
read(C);
write(C);
183
T1 T2 T3
read(B);
read(C);
write(B);
read(A);
write(A);
write(C);
read(C);
write(C);
write(A);
read(A);
write(B);
(b) Para cada uno de los problemas anteriores, detalle todas las instrucciones
que generan cada problema respectivo. Debe indicar para cual transacción
genera el problema respectivo.
(c) Si se quisieran evitar todos los problemas de control de concurrencia de
este, ¿cuál es el nivel de aislamiento (isolation level) mínimo que se debe
seleccionar? ⃝ Read uncommitted ⃝ Read committed ⃝ Repeatable
read. ⃝ Serializable.
(d) Explique su escogencia del nivel de aislamiento.
T1 T2 T3
write(A);
read(B);
write(C);
read(A);
read(B);
write(B);
read(C);
write(C);
write(A);
read(B);
184
Protocolos de control de concurrencia
Para cada uno de los siguientes protocolos, seleccione todos los errores que pue-
den suceder.
Serializabilidad
Para cada uno de los siguientes schedules, responda las preguntas.
T1 T2
read(A);
write(C);
write(C);
185
T1 T2
read(A);
write(B);
write(C);
read(D);
T1 T2
read(A);
read(A);
read(A);
read(B);
read(A);
read(A);
write(B);
186
T1 T2 T3 T4
read(A);
write(B);
read(D);
write(D);
write(C);
read(C);
write(C);
(e) Seleccione todas las transacciones que se deben remover para hacer que la
transacción sea serializable en conflictos.
⃝ T1 ⃝ T2 ⃝ T3 ⃝ T4 ⃝ Ya es serializable en conflictos.
T1 T2 T3
read(B);
read(C);
write(B);
read(A);
write(A);
write(C);
read(C);
write(C);
write(A);
read(A);
write(B);
187
T1 T2 T3
write(A);
read(B);
write(C);
read(A);
read(B);
write(B);
read(C);
write(C);
write(A);
read(B);
Candados
Para cada una de los siguientes schedules, indique que candado simple se debe
pedir. Asuma que despues de obtener un candado, no se libera.
T1 T2 T3
t1 read(A);
t2 read(B);
t3 write(A);
t4 write(B);
t5 write(C);
t6 read(A);
188
T1 T2 T3
t1 write(A);
t2 read(B);
t3 write(C);
t4 read(A);
t5 read(B);
t6 write(B);
t7 read(C);
t8 write(C);
t9 write(A);
t10 read(B);
T1 T2 T3
t1
t2
t3
t4
t5
t6
t7
t8
t9
t10
(a) Para cada tiempo, indique si cada candado otorgará (granted) o bloqueará
(blocked) el candado.
i. En t1 : ⃝ Otorgará ⃝ Bloqueará
ii. En t2 : ⃝ Otorgará ⃝ Bloqueará
iii. En t3 : ⃝ Otorgará ⃝ Bloqueará
iv. En t4 : ⃝ Otorgará ⃝ Bloqueará
v. En t5 : ⃝ Otorgará ⃝ Bloqueará
vi. En t6 : ⃝ Otorgará ⃝ Bloqueará
vii. En t7 : ⃝ Otorgará ⃝ Bloqueará
viii. En t8 : ⃝ Otorgará ⃝ Bloqueará
(b) Para las transacciones anteriores, dibuje el wait-for graph.
(c) ¿Existe un deadlock en el schedule anterior? ⃝ Si ⃝ No
(d) Explique porque hay o no hay un deadlock.
2. Se tiene las siguientes transacciones que piden candados simples:
(a) Para el schedule anterior, dibuje el wait-for graph.
(b) ¿Existe un deadlock en el schedule anterior? ⃝ Si ⃝ No
(c) Explique porque hay o no hay un deadlock.
189
T1 T2 T3
t1 S-LOCK(A);
t2 X-LOCK(B);
t3 X-LOCK(C);
t4 S-LOCK(A);
t5 X-LOCK(C);
t6 X-LOCK(A);
t7 S-LOCK(D);
t8 S-LOCK(A);
T1 T2 T3
t1 X − LOCK(A)
t2 S − LOCK(B)
t3 X − LOCK(C)
t4 Nada
t5 S − LOCK(B)
t6 X − LOCK(B)
t7 S − LOCK(C)
t8 X − LOCK(C)
t9 X − LOCK(A)
t10 Nada
T1 T2 T3
t1 S-LOCK(C);
t2 X-LOCK(C);
t3 X-LOCK(A);
t4 S-LOCK(B);
t5 X-LOCK(A);
t6 X-LOCK(A);
t7 S-LOCK(B);
190
(c) Si se utiliza la política wound-wait:
i. En t1 : ⃝ O ⃝ B ⃝ A ⃝ M
ii. En t2 : ⃝ O ⃝ B ⃝ A ⃝ M
iii. En t3 : ⃝ O ⃝ B ⃝ A ⃝ M
iv. En t4 : ⃝ O ⃝ B ⃝ A ⃝ M
v. En t5 : ⃝ O ⃝ B ⃝ A ⃝ M
vi. En t6 : ⃝ O ⃝ B ⃝ A ⃝ M
vii. En t7 : ⃝ O ⃝ B ⃝ A ⃝ M
T1 T2 T3
t1 X − LOCK(A)
t2 S − LOCK(B)
t3 X − LOCK(C)
t4 Nada
t5 S − LOCK(B)
t6 X − LOCK(B)
t7 S − LOCK(C)
t8 X − LOCK(C)
t9 X − LOCK(A)
t10 Nada
191
192
Bibliografía
[1] R. Elmasri and S. Navathe, Fundamentals of database systems, 7th ed. Pearson,
2016.
[4] A. Silberschatz, H. F. Korth, and S. Sudarshan, Database System Concepts, 7th ed.
New York, NY: McGraw-Hill, 2020.
[9] E. Malinowski and A. Martínez, Material de Apoyo de Bases de Datos I, 1st ed.,
Universidad de Costa Rica, 2018.
193