CS3492 DBMS Univ - QP Answer AM 2024
CS3492 DBMS Univ - QP Answer AM 2024
CS3492 DBMS Univ - QP Answer AM 2024
PART – A(10*2=20)
1. Differentiate File processing system and database management system (Understand)
File system is a method of organizing the files with a hard disk or other medium of storage. File system
arranges the files and helps in retrieving the files, when required. It is compatible with different file
types, such as mp3, doc, txt, mp4,etc and these are also grouped into directories.
DBMS, meanwhile, is the acronym for Database Management System. It is also a software used to
store and regain user’s data, while also maintaining the required security measures. This includes a
group of programmes that can help to manipulate the database. In bigger systems, DBMS helps the
users as well as third party software to store and recover the data.
2. List some relational algebra operations. (Understand)
Five basic operations in relational algebra: Selection, Projection, Cartesian product, Union, and Set
Difference.
3. Define Entity, Relationship and attributes in ER model. ( Remembering)
An entity set is a group of similar entities and these entities can have attributes. In terms of DBMS, an
entity is a table or attribute of a table in database, so by showing relationship among tables and their
attributes, ER diagram shows the complete logical structure of a database.
4. Why BCNF is preferred over 3NF? ( Understanding)
BCNF is a stronger form of normalization than 3NF because it eliminates the second condition for 3NF,
which allowed the right side of the FD to be a prime attribute. Thus, every left side of an FD in a table
must be a super key. Every table that is BCNF is also 3NF, 2NF, and 1NF, by the previous definitions.
If both transactions run concurrently, a deadlock can occur if Transaction A locks Account X and
waits for Account Y, while Transaction B locks Account Y and waits for Account X. Solution: Ensure
both transactions lock the accounts in the same order.
7 What is hash based indexing? ( Rememberibg)
Hash-based indexing does not maintain any ordering among the indexed values; rather it is based on
mapping the search-key values on a collection of buckets. Therefore it can only address equality (or
membership) queries. Tree-based indices maintain order and can thus also address range queries.
8 List three components of Query processor. ( Rememberibg)
The principal components of the query processor are the wrappers, the sources and services model
(SSM), and the planner.
1 The Sources and Services Model. The SSM stores the relationships between the concepts and roles in
the ontology and the functions used to wrap sources in CPL. ...
2 The Query Planner. ...
3 The Wrappers.
9 Define Distributed Database. ( Rememberibg)
In the most basic terms, a distributed database is a database that stores data in multiple locations instead
of one location. This means that rather than putting all data on one server or on one computer, data is
placed on multiple servers or in a cluster of computers consisting of individual nodes.
10 What are the challenges faced when using an encrypted system? ( Rememberibg)
The complexity of managing encryption and keys, maintaining performance, ensuring accessibility,
complying with regulations, and covering costs are significant hurdles that organizations must
overcome.
PART – B(5*13=65)
11.a. What is data model? List its different types. Explain with suitable example. ( Understanding)
Data Models
Definition: It is a collection of conceptual tools for describing data, relationships among data, semantics
(meaning) of data and constraints.
• Data model provides a way to describe the design of database at physical, logical and view level.
• There are various data models used in database systems and these are as follows -
• Relation model consists of collection of tables which stores data and represents the relationship
among the data.
• The table contains one or more columns and each column has unique name.
• Each table contains record of particular type, and each record type defines a fixed number of fields or
attributes.
• For example - Following figure shows the relational model by showing the relationship between
Student and Result database. For example - Student Ram lives in city Chennai and his marks are 78.
Thus the relationship between these two databases is maintained by the SeatNo. Column
Advantages:
(i) Structural Independence: Structural independence is an ability that allows us to make changes in one
database structure without affecting other. The relational level model have structural independence.
Hence making required changes in thedatabase is convenient in relational database model.
(ii)Conceptual Simplicity: The relational model allows the designer to simply focus on logical design
and not on physical design. Hence relational models are conceptually simple to understand.
(iii) Query Capability: Using simple query language (such as SQL) user can get agile information from
the database or designer can manipulate the database structure.
(iv) Easy design, maintenance and usage: The relational models can be designed logically hence they
are easy to maintain and use.
Disadvantages:
(i) Relational model requires powerful hardware and large data storage devices.
• As the name suggests the entity relationship model uses collection of basic objects called entities and
relationships.
• For example - Following is a representation of Entity Relationship modelin which the relationship
works_for is between entities Employee and Department.
Advantages:
ii) Easy to understand: The design of ER diagram is very logical and hence they are easy to design and
understand.
iv) Integrated: The ER model can be easily integrated with Relational model.
v) Easy conversion: ER model can be converted easily into other type of models.
Disadvantages:
i) Loss of information: While drawing ER model some information can be hidden or lost.
ii) Limited relationships: The ER model can represent limited relationships as compared to other
models.
iii) No Representation for data manipulation: It is not possible to represent data manipulation in ER
model.
• The object oriented languages like C++, Java, C# are becoming the
• To The object based data model combines object oriented features with relational data model.
Advantages:
i) Enriched modelling: The object based data model has capability of modelling the real world objects.
ii) Reusability: There are certain features of object oriented design such as inheritance, polymorphism
which help in re usability.
iii) Support for schema evolution: There is a tight coupling between data and b applications, hence
there is strong support for schema evolution.
iv)Improved performance: Using object based data model there can be significant improvement in
performance using object based data model.
Disadvantages:
i) Lack of universal data model: There is no universally agreed data model for an object based data
model, and most models lack a theoretical foundation.
ii) Lack of experience: In comparison with relational database management the use of object based data
model is limited. This model is more dependent on the skilled programmer.
iii) Complex: More functionalities present in object based data model make the design complex.
• The semi-structured data model permits the specification of data where individual data items of same
type may have different sets of attributes.
• The Extensible Markup Language (XML) is widely used to represent semi- structured data model.
Advantages
ii) It is flexible.
iii) It is portable.
Disadvantages
In DBMS, constraints are the set of rules that ensures that when an authorized user modifies the
database they do not disturb the data consistency and the constraints are specified within the DDL
commands like “alter” and “create” command. There are several types of constraints available in
DBMS and they are:
Domain constraints
Entity Integrity constraints
Referential Integrity constraints
Key constraints
In this article, we will only discuss domain constraints.
Domain Constraints
Domain Constraints are user-defined columns that help the user to enter the value according to the data
type. And if it encounters a wrong input it gives the message to the user that the column is not fulfilled
properly. Or in other words, it is an attribute that specifies all the possible values that the attribute can
hold like integer, character, date, time, string, etc. It defines the domain or the set of values for an
attribute and ensures that the value taken by the attribute must be an atomic value(Can’t be divided)
from its domain.
Domain Constraint = data type(integer / character/date / time / string / etc.) +
Constraints(NOT NULL / UNIQUE / PRIMARY KEY /
FOREIGN KEY / CHECK / DEFAULT)
Type of domain constraints:
There are two types of constraints that come under domain constraint and they are:
1. Domain Constraints – Not Null: Null values are the values that are unassigned or we can also say that
which are unknown or the missing attribute values and by default, a column can hold the null values.
Now as we know that the Not Null constraint restricts a column to not accept the null values which
means it only restricts a field to always contain a value which means you cannot insert a new record or
update a record without adding a value into the field.
Example: In the ’employee’ database, every employee must have a name associated with them.
Normalization is the process to eliminate data redundancy and enhance data integrity in the table.
Normalization also helps to organize the data in the database. It is a multi-step process that sets the data
into tabular form and removes the duplicated data from the relational tables.
Practical Use Widely used in database design. Used in specific cases where data integrity is
critical.
A relation is in fourth normal form (4NF) if it is in BCNF and has no multi-valued dependencies.
Multi-valued dependencies occur when an attribute depends on a set of values rather than a single
value. By eliminating multi-valued dependencies, data redundancy is further reduced.
The concept of 4NF refines the principles of BCNF. It goes a step further by ensuring the absence of
multi-valued dependencies within a table. This normalization form eliminates scenarios where a single
attribute depends on a combination of other attributes, enhancing the overall integrity of the data
structure.
b. i. Illustrate functional dependency with an example. ( Analysis )
The functional dependency is a relationship that exists between two attributes. It typically exists
between the primary key and non-key attribute within a table.
X → Y
The left side of FD is known as a determinant, the right side of the production is known as a dependent.
For example:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
Types of Functional dependency
1. Trivial functional dependency
A → B has trivial functional dependency if B is a subset of A.
The following dependencies are also trivial like: A → A, B → B
Example:
ID → Name,
Name → DOB
b. ii. Discuss about dependency preservation. (Remembering)
Dependencies are relationships between tasks that determine their order in a project. Identifying and
recording dependencies is vital for organizing work and making a project schedule. The 4 main types of
dependencies are: mandatory, discretionary, internal, or external.
13.a. Demonstrate conflict serializability and view serializability. ( Analysis)
Conflict Serializability in DBMS checks if a non-serial schedule is conflict serializable or not. View
Serializability checks if a schedule is view serializable or not. If a schedule is a view equivalent to a
Serial Schedule, it is said to be view serializable.
Conflicting Operations
Two operations are said to be conflicting if all conditions are satisfied:
Conflicting operations pair (R1(A), W2(A)) because they belong to two different transactions on the
same data item A and one of them is a write operation.
Similarly, (W1(A), W2(A)) and (W1(A), R2(A)) pairs are also conflicting.
On the other hand, the (R1(A), W2(B)) pair is non-conflicting because they operate on different data
items.
Similarly, ((W1(A), W2(B)) pair is non-conflicting.
Consider the following schedule:
-> Swapping non-conflicting operations R2(A) and R1(B) in S1, the schedule becomes,
To illustrate, let's consider a tree structure that has four levels of nodes. The top level represents the
entire database, and below it are nodes of type "area", which represent specific areas of the database.
Each area has child nodes called "files", and each file represents a specific subset of data within that
area. Importantly, no file can span more than one area.
Finally, each file has child nodes called "records", which represent individual units of data within the
file. Like files, each record is a child node of its corresponding file and cannot be present in more than
one file. Therefore, the tree can be divided into the following levels, starting from the top −
Database
Area
File
Record
Exclusive Lock
It prevents any other transaction from accessing the data. It is used to prevent other transactions from
reading or modifying the data while a transaction is writing to it.
Intention mode locks are a type of lock used in multiple granularity locking that allows multiple
transactions to acquire locks on the same resource, but with different levels of access.
There are three types of intention mode locks in multiple granularity locking −
These intention mode locks are used to optimize the locking mechanism in a database by allowing
transactions to acquire locks on multiple resources in a coordinated manner. They help prevent
deadlocks and improve concurrency in a database system.
The compatibility metrics for these lock modes are described below −
Compatibility Matrix
IS IX S SIX X
IS YES YES YES YES NO
IX YES YES NO NO NO
S YES NO YES NO NO
SIX YES NO NO NO NO
X NO NO NO NO NO
Intention lock modes are utilized in the multiple-granularity locking protocol to ensure serializability.
According to this protocol, when a transaction (T) attempts to lock a node, it must adhere to the
following guidelines
14.a Discuss B+ tree. Discuss about this Dynamic Index Structure. ( Understanding)
In a B + tree, data pointers are stored only at the leaf nodes of the tree. In a B+ tree structure of a leaf
node differs from the structure of internal nodes. The leaf nodes have an entry for every value of the
search field, along with a data pointer to the record (or to the block that contains this record).
A dynamic index structure is an index that automatically adjusts its structure as the underlying data
grows, shrinks, or otherwise changes. Unlike static index structures, which are fixed in size and
organization, dynamic index structures reorganize themselves in response to changes in the data they
index. This makes them more flexible and scalable, capable of handling varying workloads and data
distributions efficiently.
Dynamic index structures are crucial for databases that are expected to evolve over time, especially
when it comes to insertion, deletion, and updating of records.
B+ Trees
A B+ Tree is a type of self-balancing tree structure commonly used in databases and file systems to
maintain sorted data in a way that allows for efficient insertion, deletion, and search operations. B+
Trees are an extension of B-Trees but differ mainly in the way they handle leaf nodes, which contain all
the key values and point to the actual records.
The I/O costs for different file organizations in database management systems can vary
significantly depending on factors such as the type of access (sequential or random), the structure
of the file organization, and the type of operations (read, write, update, delete). Below is a
comparison of I/O costs for some common file organizations: Heap files, Sorted files, Hash files,
and B+ Tree files.
1. Heap Files
Sequential Access:
Cost: Low
Reason: Data is stored in the order it is inserted, so sequential access involves reading blocks
sequentially.
Random Access:
Cost: High
Reason: No particular order to the records, so finding a specific record may require reading many
blocks.
Insertion:
Cost: Low
Reason: New records can be appended to the end of the file.
Deletion:
Cost: Variable
Reason: Deleting a record may require searching for it, which can be costly.
Update:
Cost: High
Reason: Finding the record to update can be costly.
2. Sorted Files
Sequential Access:
Cost: Low
Reason: Records are stored in sorted order, so sequential access is efficient.
Random Access:
Cost: Medium
Reason: Binary search can be used to locate records, but this still requires logarithmic I/O
operations.
Insertion:
Cost: High
Reason: Maintaining sorted order requires shifting records.
Deletion:
Cost: Medium to High
Reason: Similar to insertion, maintaining order requires shifting records.
Update:
Cost: Medium
Reason: Updating a record may require maintaining order.
3. Hash Files
Sequential Access:
Cost: Very High
Reason: No ordering to the records, making sequential access inefficient.
Random Access:
Cost: Low
Reason: Hash function allows direct access to records.
Insertion:
Cost: Low
Reason: Records are placed according to the hash function.
Deletion:
Cost: Low to Medium
Reason: Finding the record via hash function is efficient, but may involve rehashing.
Update:
Cost: Low
Reason: Similar to insertion and deletion.
4. B+ Tree Files
Sequential Access:
Cost: Low
Reason: Leaf nodes of B+ tree are linked, enabling efficient sequential access.
Random Access:
Cost: Medium
Reason: Logarithmic search through tree levels.
Insertion:
Cost: Medium to High
Reason: Maintaining tree balance requires additional operations.
Deletion:
Cost: Medium to High
Reason: Maintaining tree balance requires additional operations.
Update:
Cost: Medium
Reason: Efficient access to records, but maintaining tree balance requires additional operations.
Summary
Heap Files: Best for workloads with heavy insertions and limited searches or updates.
Sorted Files: Efficient for range queries and sequential access, but costly for insertions and
deletions.
Hash Files: Optimal for direct access based on a key, poor for sequential access.
B+ Tree Files: Balanced performance across different types of operations, especially efficient for
range queries and ordered data access.
Selecting the appropriate file organization depends on the specific requirements of the application
and the expected workload.
15.a Explain distributed database architecture in detail. ( Understanding)
Distributed database architecture involves a single logical database that is spread physically across
computers in multiple locations connected by a network. The design aims to improve performance,
availability, and reliability while maintaining the illusion of a single database to the user. Here’s a
detailed explanation:
Each site is a database system in its own right, with its own hardware, operating system, DBMS, and
applications.
Sites are connected via a network, which could be LAN, WAN, or the internet.
Distributed Database Management System (DDBMS):
Manages the distributed database and provides an interface between users and the distributed data.
Ensures that the distributed nature of the database is transparent to the user.
Types of Distributed Databases
Homogeneous Distributed Database:
Location Transparency: Users do not need to know the physical location of the data.
Replication Transparency: Users are unaware of the data replication.
Fragmentation Transparency: Users are unaware of data fragmentation.
Distributed Transactions:
Data replication ensures that even if one site fails, others can continue to provide data.
Scalability:
Managing and maintaining a distributed database is more complex than a centralized database.
Security:
Ensuring data security across multiple sites and over a network can be challenging.
Consistency:
Key-Value Stores
Key-value stores are a type of NoSQL database that use a simple key-value pair to store data. They are
designed for simplicity, speed, and scalability, making them suitable for applications that require fast
read/write operations on large volumes of data.
Key Concepts
Key-Value Pairs:
Key: A unique identifier for the data. Typically a string, but can also be a number or other simple data
type.
Value: The data associated with the key. Can be a simple data type (string, integer) or a more complex
structure (JSON, binary data).
Storage:
Data is stored as a dictionary or hash table, where each key maps to a value.
The database can be distributed across multiple servers to ensure scalability and fault tolerance.
Operations:
Get (key): Retrieves the value associated with the given key.
Put (key, value): Stores the value associated with the given key.
Delete (key): Removes the key-value pair from the store.
Scalability:
Key Concepts
Roles:
Users are assigned roles based on their responsibilities and job functions.
Roles can be hierarchical, where higher-level roles inherit permissions from lower-level roles.
Role Permissions:
Database administrators define roles and associate them with specific database operations.
Examples: DBA role with full access, ReadOnly role with read permissions.
Assigning Roles to Users:
The DBMS enforces access controls based on the roles and permissions.
Unauthorized access attempts are logged and denied.
Auditing and Monitoring:
Users have access only to the data and operations necessary for their role.
Reduces the risk of data breaches and unauthorized access.
Simplified Management:
Easier to manage permissions through roles rather than individual user assignments.
Streamlines the process of adding, modifying, or removing access.
Compliance:
PART C (1*15=15)
Key:
The given key for the Book schema is {Author, Title}\{ \text{Author, Title} \}{Author, Title}.
Step-by-Step Normalization:
The schema Book\text{Book}Book is already in 1NF as all attributes contain atomic values.
A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the
primary key.
Step-by-Step Normalization:
The schema Collection{Collection}Collection is already in 1NF as all attributes contain atomic values.
Summary:
Thus, the Book schema is decomposed into three schemas to achieve 3NF, and the Collection schema is
already in 3NF and does not require further decomposition.
16.b Consider a B+ tree in which the maximum number of keys in a node is 5. Calculate the minimum
number of keys in any non-root node. ( Analysis)
Properties of B+ Tree
Each internal node can have at most mmm children, where mmm is the order of the B+ tree.
Here, m=6m = 6m=6 since the maximum number of keys is 5, and an internal node with kkk
Each internal node (except the root) must have at least ⌈m2⌉ frac{m}{2} ceil⌈2m⌉ children.
keys has k+1k+1k+1 children.
Minimum Number of Keys
Since each internal node must have at least ⌈m2⌉ ceil frac{m}{2} ceil⌈2m⌉ children, and each
child connection corresponds to a key in the node, the minimum number of keys is one less than
the minimum number of children.
⌈m2⌉=⌈62⌉=⌈3⌉=3 ceil frac{m}{2} ceil = ceil frac{6}{2} ceil = ceil 3 ceil = 3⌈2m⌉=⌈26⌉=⌈3⌉=3
Thus, each non-root internal node must have at least 3 children. Therefore, the minimum number of
keys in each non-root internal node is:
3−1=23 - 1 = 23−1=2
Summary
The minimum number of keys in any non-root node of a B+ tree, where the maximum number of keys
in a node is 5, is 2 .