
NATIONAL LAW INSTITUTE UNIVERSITY, BHOPAL

B.Sc. LL.B. (HONS.) [CYBER SECURITY]


Semester III
ACADEMIC SESSION 2024-25

Study Material
On

OPERATING SYSTEM AND DATABASE


MANAGEMENT SYSTEM [IC-06]

Course Teacher
Dr. Ashish Mohan Yadav

(For Private Circulation and Academic Purpose Only)


Table of contents
Chapter 1 Introduction
1.1 Meaning and Evolution of Operating Systems
1.2 Types of Operating Systems
Multitasking Operating Systems:
Timesharing Operating Systems:
Multithreading Operating Systems:
Multiprogramming Operating Systems:
Real-time Operating Systems (RTOS):
Distributed Operating Systems:
Network Operating Systems (NOS):
1.3 Operating System Concepts and Structure
Layered Operating Systems:
Monolithic Systems:
Chapter 2 Operating System Functions
2.1 Operating-System Operations
2.2 Communication Process Management
2.3 Memory Management
2.4 File System Management
2.4.1 Mass Storage Management
2.4.2 Caching
Chapter 3 OS System Calls
Chapter 4 OS Process
Meaning of OS Process
Process Control Block (PCB)
Additional Points to Consider for Process Control Block (PCB)
Advantages:
Disadvantages:
Operations on Processes
Scheduling/Dispatching
Blocking
Preemption
Process Termination
Process Schedulers in Operating System
Categories in Scheduling
Long-Term or Job Scheduler
Short-Term or CPU Scheduler
Medium-Term Scheduler
Some Other Schedulers
Comparison among Schedulers
Two-State Process Model
Context Switching
Inter Process Communication (IPC)
Overall C++ Implementation:
Advantages of IPC:
Disadvantages of IPC:
Context Switch in Operating System
Context Switch
The Need for Context Switching
Context Changes as a Trigger
Process Control Block
State Diagram of Context Switching
Working Process Context Switching
Attributes of a process
1. Process ID
2. Program counter
3. Process State
4. Priority
5. General Purpose Registers
6. List of open files
7. List of open devices
Process States
1. New
2. Ready
3. Running
4. Block or wait
5. Completion or termination
6. Suspend ready
7. Suspend wait
Operations on the Process
1. Creation
2. Scheduling
3. Execution
4. Deletion/killing
Process Scheduling in OS (Operating System)
1. Long term scheduler
2. Short term scheduler
3. Medium term scheduler
Process Queues
1. Job Queue
2. Ready Queue
3. Waiting Queue
Various Times related to the Process
1. Arrival Time
2. Burst Time
3. Completion Time
4. Turnaround time
5. Waiting Time
6. Response Time
CPU Scheduling
What is saved in the Process Control Block?
Why do we need Scheduling?
Scheduling Algorithms in OS (Operating System)
The Purpose of a Scheduling algorithm
1. First Come First Serve
2. Round Robin
3. Shortest Job First
4. Shortest remaining time first
5. Priority based scheduling
6. Highest Response Ratio Next
First Come First Serve CPU Process Scheduling in Operating Systems
Important Abbreviations
First Come First Serve
Pre Emptive Approach
Non Pre Emptive Approach
Convoy Effect In First Come First Serve (FCFS)
Characteristics of FCFS CPU Process Scheduling
Advantages of FCFS CPU Process Scheduling
Disadvantages of FCFS CPU Process Scheduling
Example
Non Pre Emptive Approach
Solution to the Above Question Example 1
Pre Emptive Approach
Semaphores in OS (Operating System)
Round Robin Scheduling Algorithm
Important Abbreviations
Round Robin CPU Scheduling
Advantages
Disadvantages
Shortest Remaining Time First (SRTF) Scheduling Algorithm
Example
Highest Response Ratio Next (HRRN) Scheduling
Chapter 5 Deadlocks in OS
What is Deadlock in Operating System (OS)?
Difference between Starvation and Deadlock
Necessary conditions for Deadlocks
Strategies for Handling Deadlock
1. Deadlock Ignorance
2. Deadlock prevention
3. Deadlock avoidance
4. Deadlock detection and recovery
Deadlock avoidance
Resources Assigned
Resources still needed
Resource Allocation Graph
Example
Deadlock Detection using RAG
Allocation Matrix
Request Matrix
Avail = (0,0,0)
Deadlock Detection and Recovery
For Resource
Preempt the resource
Rollback to a safe state
For Process
Kill a process
Kill all process
Chapter 7 Database Management System (DBMS)
Database Management System (DBMS)
DBMS Architecture Components
Data Models
Relational Database Model: Relational Database Elements
Keys
Structured Query Language
Chapter 8 DBMS Design
Chapter 9 DBMS Transactions
Chapter 10 DBMS Advancements

Chapter 1 Introduction

Meaning and Evolution of Operating Systems

An operating system (OS) is software that acts as an intermediary between computer hardware and user applications. It manages computer resources, provides services to
applications, and ensures proper execution of tasks. The meaning of operating systems lies
in their crucial role in enabling users to interact with computers effectively. Operating
systems handle tasks like memory management, process scheduling, file management,
device drivers, and user interface.

Operating systems come with a wide range of features that make them powerful, efficient,
and user-friendly. Here are some of the key features commonly found in modern operating
systems:

Multitasking: Modern operating systems can run multiple applications simultaneously, allowing users to switch between programs seamlessly.

Multiuser Support: They support multiple user accounts, enabling different users to have
their personalized settings and files on the same device.

Graphical User Interface (GUI): Most modern operating systems provide a graphical
interface that makes it easier for users to interact with the computer through windows,
icons, menus, and pointers (WIMP).

File System: Operating systems have advanced file systems that organize and manage data
efficiently, providing features like hierarchical file structures, access control, and metadata
support.

Virtual Memory: This feature allows the operating system to use a portion of the hard
drive as an extension of physical RAM, which improves system performance by allowing
larger applications to run.

Device Drivers: Operating systems include device drivers that facilitate communication
between the hardware devices (e.g., printers, graphics cards, and input devices) and the
software.

Networking Support: Modern operating systems offer built-in networking capabilities,
allowing users to connect to the internet, share files, and access remote resources.

Security Features: Security is a crucial aspect, and modern operating systems come with
various security features such as user authentication, access control, firewalls, and
encryption to protect data and the system from threats.

Task Scheduling: The operating system manages the scheduling of tasks and allocates
CPU time to various processes efficiently to ensure smooth and fair execution.

Power Management: With increasing use of mobile devices, operating systems incorporate power management features to optimize battery usage and extend device battery life.

Compatibility Support: They are designed to run a wide range of applications and support
different hardware configurations.

Updates and Patches: Modern operating systems often receive regular updates and
patches to fix bugs, enhance security, and improve performance.

Virtualization and Containers: Operating systems often support virtualization and containerization technologies, allowing users to run multiple virtual machines or isolated application environments on the same hardware.

Cloud Integration: Many modern operating systems have integrated cloud services,
enabling seamless access and synchronization of files and settings across devices.

Voice Assistants and AI Integration: Some modern operating systems incorporate voice
assistants and AI integration, enabling voice commands and intelligent features.

Evolution of Operating Systems

The evolution of operating systems refers to their historical development, progress, and
improvement over time. Operating systems have gone through several generations and
have witnessed significant advancements since the inception of computers. Here's a brief
overview of the key stages of operating system evolution:

First Generation (1940s-1950s): The earliest computers had no operating systems. Users
interacted with the hardware directly using machine language.
Second Generation (1950s-1960s): Batch processing operating systems emerged,
allowing users to submit jobs in batches, which the system executed sequentially.
Examples include IBM's OS/360.

Third Generation (1960s-1970s): Time-sharing operating systems were introduced, enabling multiple users to access a computer simultaneously through terminals. The famous example is UNIX.

Fourth Generation (1980s-1990s): Graphical User Interface (GUI) based operating systems were developed, making computers more user-friendly. Microsoft's Windows and Apple's Mac OS are prominent examples.

Fifth Generation (2000s-2010s): Modern operating systems continued to evolve with improvements in security, multitasking, and multimedia support. Mobile operating systems like Android and iOS became dominant.

Sixth Generation (present and beyond): Operating systems continue to evolve to meet
the demands of emerging technologies like artificial intelligence, virtual reality, and the
Internet of Things (IoT).

Throughout their evolution, operating systems have become more sophisticated, reliable,
and capable of handling complex tasks, making computers and other devices more
accessible and powerful for users.

Please note that this is a brief overview, and the actual historical development of operating
systems is much more intricate with numerous milestones and innovations in the field. The
ongoing evolution of operating systems is likely to continue as technology advances
further.

1.2 Types of Operating Systems

There are different types of operating systems because computer systems serve various
purposes and cater to diverse user needs. Each type of operating system is designed to
address specific requirements, use cases, and hardware configurations. The diversity in
operating systems allows for customization and specialization to optimize performance and
user experience in different computing environments. The different types of OS are:

Multitasking Operating Systems:

Multitasking operating systems allow multiple applications or processes to run simultaneously on a computer. The operating system divides the CPU's time among these
processes, giving the illusion that they are all running concurrently. This enables users to
switch between applications seamlessly and increases overall system efficiency. Modern
desktop and server operating systems are typically multitasking systems.

Timesharing Operating Systems:

Timesharing operating systems are a specific type of multitasking system that provides the
illusion of simultaneous execution of multiple processes to multiple users. The CPU rapidly
switches between different user sessions, giving each user a time slice (time-sharing) to
interact with the system. Timesharing systems were prevalent in mainframe computers and
played a significant role in making computing more accessible to multiple users.

Multithreading Operating Systems:

Multithreading operating systems enable multiple threads within a single process to run
concurrently. Threads are lightweight units of execution that share the same resources
within a process, such as memory space and file handles. Multithreading allows for better
utilization of CPU cores and can enhance the performance of applications that are designed
to take advantage of multiple threads.
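To make this concrete, here is a minimal C sketch using POSIX threads (an illustration added to this material, not a definitive implementation; compile with -pthread). Two threads run concurrently inside one process and share its global memory, so a mutex protects the shared counter:

#include <pthread.h>
#include <stdio.h>

static int shared_counter = 0;                  /* shared by all threads */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;                                  /* unused */
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);              /* protect the shared data */
        shared_counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);    /* two threads, one process */
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", shared_counter);   /* prints 200000 */
    return 0;
}

Because both threads share the same address space, no copying or kernel-mediated message passing is needed for them to see each other's updates; this is what makes threads lighter-weight than processes.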

Multiprogramming Operating Systems:

Multiprogramming operating systems aim to maximize CPU utilization by keeping multiple programs in memory simultaneously. The operating system switches between
these programs, executing portions of each program in a time-sharing manner.
Multiprogramming allows for efficient use of the CPU, as it can switch to other tasks while
one program is waiting for I/O or other operations.

Real-time Operating Systems (RTOS):

RTOS is designed for applications that require precise and deterministic timing. These
operating systems prioritize tasks based on their time-critical nature, ensuring that critical
tasks meet strict deadlines. RTOS is commonly used in embedded systems, robotics,
industrial automation, and other real-time applications.

Distributed Operating Systems:

A distributed operating system runs on multiple interconnected machines and enables them
to work together as a single unified system. Distributed operating systems facilitate
resource sharing, load balancing, and fault tolerance across the network. They are used in
large-scale computing environments and data centers to harness the combined processing
power of multiple machines.

Network Operating Systems (NOS):

Network operating systems are designed to manage and coordinate network resources and
activities. These operating systems provide functionalities like file and printer sharing, user
authentication, and centralized administration of networked devices. NOS is commonly
used in environments where multiple computers need to communicate and share resources
over a local area network (LAN) or a wide area network (WAN).

It's worth noting that many modern operating systems may incorporate features of several of these types to cater to various needs and use cases. Additionally,
the field of operating systems is continually evolving, and new advancements and
variations may emerge over time.

1.3 Operating System Concepts and Structure

Operating systems (OS) are software that manage computer hardware and provide services to user applications.

The hardware—the central processing unit (CPU), the memory, and the input/output (I/O)
devices—provides the basic computing resources for the system. The application
programs—such as word processors, spreadsheets, compilers, and Web browsers—define
the ways in which these resources are used to solve users’ computing problems. The
operating system controls the hardware and coordinates its use among the various
application programs for the various users.

We can also view a computer system as consisting of hardware, software, and data. The
operating system provides the means for proper use of these resources in the operation of
the computer system. An operating system is similar to a government. Like a government,
it performs no useful function by itself. It simply provides an environment within which
other programs can do useful work. The key concepts and structures of operating systems
are:

Kernel: The kernel is the core component of an operating system, responsible for
managing resources, providing services to applications, and acting as an intermediary
between hardware and software.

Process Management: The OS manages processes, which are instances of running programs. It allocates CPU time, memory, and other resources to processes and ensures
their proper execution.

Memory Management: The OS handles memory allocation and deallocation, keeping track of available memory and managing the virtual memory system.

File System: The file system manages the organization and storage of files and directories
on storage devices, such as hard drives.

Device Management: The OS interacts with hardware devices through device drivers,
enabling communication between software and hardware components.

User Interface: The user interface allows users to interact with the operating system and
applications. It can be command-line-based (CLI) or graphical (GUI).

Security: Operating systems implement security mechanisms to protect data and resources, control access to sensitive information, and prevent unauthorized access.

Networking: Many modern operating systems include networking support to enable communication over networks and access to remote resources.
File and Process Permissions: Operating systems define permissions for files and
processes to control who can read, write, or execute them.

Interprocess Communication (IPC): IPC mechanisms allow processes to exchange data and coordinate their activities.

Layered Operating Systems:

A layered operating system architecture divides the OS into functional layers, where each
layer provides specific services to the layer above it. The layered approach simplifies the
design, implementation, and maintenance of the operating system. Typically, the layers are
arranged in a hierarchical manner, and each layer communicates only with the layers
directly above and below it. If one layer needs to be modified or replaced, it can often be
done without affecting the other layers.

A common layered structure consists of three layers:

Application Layer: The top layer where user applications and utilities reside.

Operating System Services Layer: This layer provides services like process
management, memory management, file systems, and device drivers. It acts as an interface
between applications and the hardware.

Hardware Layer: The bottom layer interacts directly with the computer hardware,
providing drivers and low-level control over hardware components.

Monolithic Systems:

Monolithic operating systems, in contrast to layered systems, incorporate all operating system functionalities into a single, large software module. In a monolithic structure, the
kernel manages process management, memory management, file systems, device drivers,
and other services directly. This design makes the operating system more tightly integrated
but can also make it more challenging to modify or maintain.

In a monolithic system, a crash or failure in one part of the kernel can potentially affect the
entire system. However, monolithic systems can be efficient because there is no overhead
from communication between layers, as seen in layered systems.

Examples of monolithic operating systems include early versions of UNIX, Linux (to some
extent), and Microsoft Windows (prior to the introduction of the Windows NT architecture,
which follows a hybrid approach).

Chapter 2 Operating System Functions

We can view an operating system from several vantage points. One view focuses on the
services that the system provides; another, on the interface that it makes available to users
and programmers; a third, on its components and their interconnections. In this chapter, we
explore all three aspects of operating systems, showing the viewpoints of users,
programmers, and operating system designers. We consider what services an operating
system provides, how they are provided, how they are debugged, and what the various
methodologies are for designing such systems. Finally, we describe how operating systems
are created and how a computer starts its operating system.

Operating-System Operations

As mentioned earlier, modern operating systems are interrupt driven. If there are no
processes to execute, no I/O devices to service, and no users to whom to respond, an
operating system will sit quietly, waiting for something to happen. Events are almost
always signaled by the occurrence of an interrupt or a trap. A trap (or an exception) is a
software-generated interrupt caused either by an error (for example, division by zero or
invalid memory access) or by a specific request from a user program that an operating-
system service be performed. The interrupt-driven nature of an operating system defines
that system’s general structure. For each type of interrupt, separate segments of code in the

operating system determine what action should be taken. An interrupt service routine is
provided to deal with the interrupt.

Since the operating system and the users share the hardware and software resources of the
computer system, we need to make sure that an error in a user program could cause
problems only for the one program running. With sharing, many processes could be
adversely affected by a bug in one program. For example, if a process gets stuck in an
infinite loop, this loop could prevent the correct operation of many other processes. More
subtle errors can occur in a multiprogramming system, where one erroneous program might
modify another program, the data of another program, or even the operating system itself.

Without protection against these sorts of errors, either the computer must execute only one
process at a time or all output must be suspect. A properly designed operating system must
ensure that an incorrect (or malicious) program cannot cause other programs to execute
incorrectly.

At the very least, we need two separate modes of operation: user mode and kernel mode
(also called supervisor mode, system mode, or privileged mode). A bit, called the mode
bit, is added to the hardware of the computer to indicate the current mode: kernel (0) or
user (1). With the mode bit, we can distinguish between a task that is executed on behalf
of the operating system and one that is executed on behalf of the user. When the computer
system is executing on behalf of a user application, the system is in user mode. However,
when a user application requests a service from the operating system (via a system call),
the system must transition from user to kernel mode to fulfill the request. This is shown in
Figure 1.10. As we shall see, this architectural enhancement is useful for many other
aspects of system operation as well. At system boot time, the hardware starts in kernel mode. The operating system is then loaded and starts user applications in user mode.
Whenever a trap or interrupt occurs, the hardware switches from user mode to kernel mode
(that is, changes the state of the mode bit to 0). Thus, whenever the operating system gains
control of the computer, it is in kernel mode. The system always switches to user mode (by
setting the mode bit to 1) before passing control to a user program. The dual mode of
operation provides us with the means for protecting the operating system from errant
users—and errant users from one another. We accomplish this protection by designating
some of the machine instructions that may cause harm as privileged instructions. The
hardware allows privileged instructions to be executed only in kernel mode. If an attempt
is made to execute a privileged instruction in user mode, the hardware does not execute the
instruction but rather treats it as illegal and traps it to the operating system.

We can now see the life cycle of instruction execution in a computer system. Initial control
resides in the operating system, where instructions are executed in kernel mode. When
control is given to a user application, the mode is set to user mode. Eventually, control is
switched back to the operating system via an interrupt, a trap, or a system call.

System calls provide the means for a user program to ask the operating system to perform
tasks reserved for the operating system on the user program’s behalf. A system call is
invoked in a variety of ways, depending on the functionality provided by the underlying
processor. In all forms, it is the method used by a process to request action by the operating
system. A system call usually takes the form of a trap to a specific location in the interrupt
vector. This trap can be executed by a generic trap instruction, although some systems
(such as MIPS) have a specific syscall instruction to invoke a system call.

When a system call is executed, it is typically treated by the hardware as a software interrupt. Control passes through the interrupt vector to a service routine in the operating
system, and the mode bit is set to kernel mode. The system-call service routine is a part of
the operating system. The kernel examines the interrupting instruction to determine what
system call has occurred; a parameter indicates what type of service the user program is
requesting. Additional information needed for the request may be passed in registers, on
the stack, or in memory (with pointers to the memory locations passed in registers). The
kernel verifies that the parameters are correct and legal, executes the request, and returns
control to the instruction following the system call.
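As a small, hedged illustration of this flow (a sketch added to this material, not from the source text): in the C program below, the POSIX write() system call traps from user mode into kernel mode; the kernel verifies the parameters (file descriptor, buffer, length), performs the privileged I/O, and returns control to the instruction following the call.

#include <unistd.h>

int main(void) {
    const char msg[] = "hello via a system call\n";
    /* write() is a system call: execution traps into the kernel,
       which performs the I/O on the program's behalf. */
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    return 0;   /* returning from main ends in the exit() system call */
}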

Communication Process Management

A program does nothing unless its instructions are executed by a CPU. A program in
execution, as mentioned, is a process. A time-shared user program such as a compiler is a

process. A word-processing program being run by an individual user on a PC is a process.
A system task, such as sending output to a printer, can also be a process (or at least part of
one). For now, you can consider a process to be a job or a time-shared program, but later
you will learn that the concept is more general. As we shall see in Chapter 3, it is possible
to provide system calls that allow processes to create subprocesses to execute concurrently.

A process needs certain resources—including CPU time, memory, files, and I/O devices—
to accomplish its task. These resources are either given to the process when it is created or
allocated to it while it is running. In addition to the various physical and logical resources
that a process obtains when it is created, various initialization data (input) may be passed
along. For example, consider a process whose function is to display the status of a file on
the screen of a terminal. The process will be given the name of the file as input and will
execute the appropriate instructions and system calls to obtain and display the desired
information on the terminal. When the process terminates, the operating system will
reclaim any reusable resources.

A process is the unit of work in a system. A system consists of a collection of processes, some of which are operating-system processes (those that execute system code) and the
rest of which are user processes (those that execute user code). All these processes can
potentially execute concurrently—by multiplexing on a single CPU, for example.

The operating system is responsible for the following activities in connection with process
management:

• Scheduling processes and threads on the CPUs

• Creating and deleting both user and system processes

• Suspending and resuming processes

• Providing mechanisms for process synchronization

• Providing mechanisms for process communication

In a multi-programming environment, the OS decides the order in which processes have access to the processor, and how much processing time each process has. This function of
OS is called Process Scheduling. An Operating System performs the following activities
for Processor Management.

An operating system manages the processor's work by allocating various jobs to it and ensuring that each process receives enough time from the processor to function properly.

It keeps track of the status of processes (the program that performs this task is known as a traffic controller), allocates the CPU (that is, the processor) to a process, and de-allocates the processor when a process is no longer required.

Memory Management

The main memory is central to the operation of a modern computer system. Main memory
is a large array of bytes, ranging in size from hundreds of thousands to billions. Each byte
has its own address. Main memory is a repository of quickly accessible data shared by the
CPU and I/O devices. The central processor reads instructions from main memory during
the instruction-fetch cycle and both reads and writes data from main memory during the
data-fetch cycle (on a von Neumann architecture). As noted earlier, the main memory is
generally the only large storage device that the CPU is able to address and access directly.
For example, for the CPU to process data from disk, those data must first be transferred to
main memory by CPU-generated I/O calls. In the same way, instructions must be in
memory for the CPU to execute them.

For a program to be executed, it must be mapped to absolute addresses and loaded into
memory. As the program executes, it accesses program instructions and data from memory
by generating these absolute addresses. Eventually, the program terminates, its memory
space is declared available, and the next program can be loaded and executed. To improve
both the utilization of the CPU and the speed of the computer’s response to its users,
general-purpose computers must keep several programs in memory, creating a need for
memory management. Many different memory management schemes are used. These
schemes reflect various approaches, and the effectiveness of any given algorithm depends
on the situation. In selecting a memory-management scheme for a specific system, we must
take into account many factors, especially the hardware design of the system. Each
algorithm requires its own hardware support.

The operating system is responsible for the following activities in connection with memory
management:

• Keeping track of which parts of memory are currently being used and who is using them

• Deciding which processes (or parts of processes) and data to move into and out of memory

• Allocating and deallocating memory space as needed
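As a toy sketch of this bookkeeping (the frame table below is an illustrative assumption, not how any particular OS implements it), one can picture a table recording which fixed-size frames of memory are in use and by which process:

#define FRAMES 64

static int owner[FRAMES];            /* 0 = free, otherwise the owning PID */

int allocate_frame(int pid) {        /* first-fit search for a free frame  */
    for (int f = 0; f < FRAMES; f++) {
        if (owner[f] == 0) {
            owner[f] = pid;          /* track who is using which part      */
            return f;
        }
    }
    return -1;                       /* no free memory                     */
}

void free_frames(int pid) {          /* deallocate when the process exits  */
    for (int f = 0; f < FRAMES; f++)
        if (owner[f] == pid)
            owner[f] = 0;
}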

File System Management

File management is one of the most visible components of an operating system. Computers
can store information on several different types of physical media. Magnetic disk, optical
disk, and magnetic tape are the most common. Each of these media has its own
characteristics and physical organization. Each medium is controlled by a device, such as
a disk drive or tape drive, that also has its own unique characteristics. These properties
include access speed, capacity, data-transfer rate, and access method (sequential or
random).

A file is a collection of related information defined by its creator. Commonly, files represent programs (both source and object forms) and data. Data files may be numeric,
alphabetic, alphanumeric, or binary. Files may be free-form (for example, text files), or
they may be formatted rigidly (for example, fixed fields). Clearly, the concept of a file is
an extremely general one.

The operating system implements the abstract concept of a file by managing mass-storage
media, such as tapes and disks, and the devices that control them. In addition, files are
normally organized into directories to make them easier to use. Finally, when multiple
users have access to files, it may be desirable to control which user may access a file and
how that user may access it (for example, read, write, append).

The operating system is responsible for the following activities in connection with file
management:

• Creating and deleting files

• Creating and deleting directories to organize files

• Supporting primitives for manipulating files and directories

• Mapping files onto secondary storage

• Backing up files on stable (nonvolatile) storage media
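A minimal sketch of these primitives using standard POSIX system calls (the path names are illustrative and error handling is omitted):

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    mkdir("notes", 0755);                        /* create a directory    */
    int fd = open("notes/todo.txt",
                  O_CREAT | O_WRONLY, 0644);     /* create a file         */
    write(fd, "read ch. 2\n", 11);               /* manipulate the file   */
    close(fd);
    unlink("notes/todo.txt");                    /* delete the file       */
    rmdir("notes");                              /* delete the directory  */
    return 0;
}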

Mass Storage Management

As we have already seen, because main memory is too small to accommodate all data and
programs, and because the data that it holds are lost when power is lost, the computer
system must provide secondary storage to back up main memory. Most modern computer
systems use disks as the principal on-line storage medium for both programs and data. Most
programs—including compilers, assemblers, word processors, editors, and formatters—
are stored on a disk until loaded into memory. They then use the disk as both the source
and destination of their processing. Hence, the proper management of disk storage is of
central importance to a computer system. The operating system is responsible for the
following activities in connection with disk management:

• Free-space management

• Storage allocation

• Disk scheduling

Because secondary storage is used frequently, it must be used efficiently. The entire speed
of operation of a computer may hinge on the speeds of the disk subsystem and the
algorithms that manipulate that subsystem. There are, however, many uses for storage that

is slower and lower in cost (and sometimes of higher capacity) than secondary storage.
Backups of disk data, storage of seldom-used data, and long-term archival storage are some
examples. Magnetic tape drives and their tapes and CD and DVD drives and platters are
typical tertiary storage devices. The media (tapes and optical platters) vary between
WORM (write-once, read-many-times) and RW (read– write) formats.

Tertiary storage is not crucial to system performance, but it still must be managed. Some
operating systems take on this task, while others leave tertiary-storage management to
application programs. Some of the functions that operating systems can provide include
mounting and unmounting media in devices, allocating and freeing the devices for
exclusive use by processes, and migrating data from secondary to tertiary storage.

Caching

Caching in memory management refers to the practice of temporarily storing frequently accessed data in a fast and easily accessible location to improve the overall performance of
a computer system. The goal of caching is to reduce the time it takes to retrieve data by
avoiding the need to access slower storage devices, such as hard disk drives (HDDs) or
solid-state drives (SSDs). Instead, the data is kept in a faster memory location, such as
RAM (Random Access Memory) or cache memory, which can be accessed much more
quickly.

Caching works based on the "principle of locality," which states that
programs tend to access a relatively small portion of data frequently and exhibit temporal
and spatial locality:

Temporal Locality: If a particular data item is accessed once, it is likely to be accessed again in the near future. Therefore, caching stores recently accessed data in the cache
memory, making it readily available for subsequent requests.

Spatial Locality: When a data item is accessed, neighboring data items are also likely to
be accessed soon. Caching takes advantage of spatial locality by fetching and storing
contiguous blocks of data rather than individual items.
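A small C sketch of these two forms of locality (illustrative only): summing a matrix row by row touches consecutive addresses and is cache-friendly, while summing the same matrix column first strides across memory and typically causes many more cache misses.

#define N 1024
static double a[N][N];

double sum_row_major(void) {       /* cache-friendly: spatial locality   */
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];          /* consecutive memory addresses       */
    return s;
}

double sum_column_major(void) {    /* cache-hostile: large strides       */
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];          /* jumps N * sizeof(double) each step */
    return s;
}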

Caching in memory management can occur at multiple levels within a computer system:

CPU Cache: Modern CPUs have small but extremely fast cache memory located directly
on the processor chip. The CPU cache is divided into multiple levels, such as L1, L2, and

L3 caches. These caches store frequently used instructions and data, reducing the time the
CPU spends waiting for data to be fetched from main memory.

Main Memory (RAM) Cache: The main memory itself can have a cache, known as the
Memory Cache or Memory-side Cache. This cache sits between the CPU and the main
memory and serves as an intermediary to speed up memory access.

Disk Caching: Operating systems and file systems also use disk caching to store frequently
accessed data from slower storage devices (HDDs or SSDs) in faster RAM. This helps in
reducing disk I/O operations and improving overall system performance.

Web Browser Caching: Web browsers use caching to store previously downloaded web
pages, images, and other resources. When you revisit a web page, the browser can load
some of the resources from the local cache instead of re-downloading them from the
internet, leading to faster page load times.

Application-Level Caching: Some applications implement their own caching mechanisms to store frequently used data in memory. This is particularly common in databases, where
caching query results can significantly speed up data retrieval.

Caching is an essential technique to optimize memory access and improve the performance
of computer systems. By reducing the need to access slower storage devices and leveraging
the principle of locality, caching ensures that frequently used data is readily available,
leading to faster response times and a more efficient use of system resources.

Chapter 3 OS System Calls

The most central concept in any operating system is the process: an abstraction of a running
program. Everything else hinges on this concept, and the operating system designer (and
student) should have a thorough understanding of what a process is as early as possible.

Processes are one of the oldest and most important abstractions that operating systems
provide. They support the ability to have (pseudo) concurrent operation even when there
is only one CPU available. They turn a single CPU into multiple virtual CPUs. Without the
process abstraction, modern computing could not exist.

All modern computers often do several things at the same time. People used to working
with computers may not be fully aware of this fact, so a few examples may make the point
clearer. First consider a Web server. Requests come in from all over asking for Web pages.
When a request comes in, the server checks to see if the page needed is in the cache. If it
is, it is sent back; if it is not, a disk request is started to fetch it. However, from the CPU’s
perspective, disk requests take eternity. While waiting for a disk request to complete, many
more requests may come in. If there are multiple disks present, some or all of the newer
ones may be fired off to other disks long before the first request is satisfied. Clearly some
way is needed to model and control this concurrency. Processes (and especially threads)
can help here.

Now consider a user PC. When the system is booted, many processes are secretly started,
often unknown to the user. For example, a process may be started up to wait for incoming
email. Another process may run on behalf of the antivirus program to check periodically if
any new virus definitions are available. In addition, explicit user processes may be running,
printing files and backing up the user’s photos on a USB stick, all while the user is surfing
the Web. All this activity has to be managed, and a multiprogramming system supporting
multiple processes comes in very handy here.

In any multiprogramming system, the CPU switches from process to process quickly,
running each for tens or hundreds of milliseconds. While, strictly speaking, at any one
instant the CPU is running only one process, in the course of 1 second it may work on
several of them, giving the illusion of parallelism. Sometimes people speak of
pseudoparallelism in this context, to contrast it with the true hardware parallelism of
multiprocessor systems (which have two or more CPUs sharing the same physical
memory). Keeping track of multiple, parallel activities is hard for people to do. Therefore,
operating system designers over the years have evolved a conceptual model (sequential
processes) that makes parallelism easier to deal with.

Process Control Block (PCB)

While creating a process, the operating system performs several operations. To identify the
processes, it assigns a process identification number (PID) to each process. As the
operating system supports multi-programming, it needs to keep track of all the processes.
For this task, the process control block (PCB) is used to track the process’s execution status.
Each block of memory contains information about the process state, program counter, stack
pointer, status of opened files, scheduling algorithms, etc. All this information is required
and must be saved when the process is switched from one state to another. When the
process makes a transition from one state to another, the operating system must update
information in the process’s PCB. A process control block (PCB) contains information
about the process, i.e., registers, quantum, priority, etc. The process table is an array of PCBs; logically, it contains a PCB for each of the current processes in the system.

● Pointer – It is a stack pointer which is required to be saved when the process is
switched from one state to another to retain the current position of the process.
● Process state – It stores the respective state of the process.
● Process number – Every process is assigned with a unique id known as process ID
or PID which stores the process identifier.
● Program counter – It stores the counter which contains the address of the next
instruction that is to be executed for the process.
● Register – These are the CPU registers, which include the accumulator, base and index registers, and general-purpose registers.
● Memory limits – This field contains information about the memory management system used by the operating system. This may include the page tables, segment tables, etc.
● Open files list – This information includes the list of files opened for a process.
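As an illustrative sketch (the field names are assumptions made for this material, not taken from any particular kernel), a PCB can be pictured as a C structure collecting the fields above:

#include <stdint.h>

#define MAX_OPEN_FILES 16

enum proc_state { NEW, READY, RUNNING, WAITING, TERMINATED };

struct pcb {
    int             pid;                        /* process number (PID)     */
    enum proc_state state;                      /* process state            */
    uint64_t        program_counter;            /* next instruction address */
    uint64_t        registers[16];              /* saved CPU registers      */
    uint64_t        stack_pointer;              /* saved stack pointer      */
    int             priority;                   /* scheduling information   */
    void           *page_table;                 /* memory limits/page table */
    int             open_files[MAX_OPEN_FILES]; /* open files list          */
};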

Additional Points to Consider for Process Control Block (PCB)

● Interrupt handling: The PCB also contains information about the interrupts that
a process may have generated and how they were handled by the operating system.
● Context switching: The process of switching from one process to another is called
context switching. The PCB plays a crucial role in context switching by saving the state of
the current process and restoring the state of the next process.
● Real-time systems: Real-time operating systems may require additional
information in the PCB, such as deadlines and priorities, to ensure that time-critical
processes are executed in a timely manner.
● Virtual memory management: The PCB may contain information about a
process’s virtual memory management, such as page tables and page fault handling.
● Inter-process communication: The PCB can be used to facilitate inter-process
communication by storing information about shared resources and communication
channels between processes.
● Fault tolerance: Some operating systems may use multiple copies of the PCB to
provide fault tolerance in case of hardware failures or software errors.

Advantages:

Efficient process management: The process table and PCB provide an efficient way to
manage processes in an operating system. The process table contains all the information
about each process, while the PCB contains the current state of the process, such as the
program counter and CPU registers.
Resource management: The process table and PCB allow the operating system to manage
system resources, such as memory and CPU time, efficiently. By keeping track of each

process’s resource usage, the operating system can ensure that all processes have access to
the resources they need.
Process synchronization: The process table and PCB can be used to synchronize
processes in an operating system. The PCB contains information about each process’s
synchronization state, such as its waiting status and the resources it is waiting for.
Process scheduling: The process table and PCB can be used to schedule processes for
execution. By keeping track of each process’s state and resource usage, the operating
system can determine which processes should be executed next.

Disadvantages:

Overhead: The process table and PCB can introduce overhead and reduce system
performance. The operating system must maintain the process table and PCB for each
process, which can consume system resources.
Complexity: The process table and PCB can increase system complexity and make it more
challenging to develop and maintain operating systems. The need to manage and
synchronize multiple processes can make it more difficult to design and implement system
features and ensure system stability.
Scalability: The process table and PCB may not scale well for large-scale systems with
many processes. As the number of processes increases, the process table and PCB can
become larger and more difficult to manage efficiently.
Security: The process table and PCB can introduce security risks if they are not
implemented correctly. Malicious programs can potentially access or modify the process
table and PCB to gain unauthorized access to system resources or cause system instability.
● Miscellaneous accounting and status data – This field includes information about the amount of CPU used, time constraints, job or process number, etc.
The process control block also stores the register content, known as the execution context of the processor, saved when the process was blocked from running. This execution context enables the operating system to restore a process's execution context when the process returns to the running state. When the process makes a transition from one state to another, the operating system updates its information in the process's PCB. The operating system maintains pointers to each process's PCB in a process table so that it can access the PCB quickly.

Operations on Processes
A process is an activity of executing a program. Basically, it is a program under execution. Every process needs certain resources to complete its task.

The execution of a process is a complex activity. It involves various operations. Following are the operations that are performed during the execution of a process:

Creation
This is the initial step of the process execution activity. Process creation means the
construction of a new process for execution. This might be performed by the system, the
user, or the old process itself. There are several events that lead to process creation. Some such events are the following:
1. When we start the computer, the system creates several background processes.
2. A user may request to create a new process.
3. A process can create a new process itself while executing.
4. The batch system initiates a batch job.
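A minimal sketch of event 3 above, a process creating a new process itself with the POSIX fork() system call (illustrative; error handling omitted):

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();        /* the running process creates a child  */
    if (pid == 0) {
        printf("child:  pid=%d\n", getpid());
        _exit(0);              /* the child terminates                 */
    }
    printf("parent: pid=%d created child %d\n", getpid(), pid);
    wait(NULL);                /* parent reclaims the terminated child */
    return 0;
}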

Scheduling/Dispatching

Scheduling or dispatching is the event or activity in which the state of the process is changed from ready to running. It means the operating system puts the process from the ready state into the running state.
Dispatching is done by the operating system when the resources are free or the process has
higher priority than the ongoing process. There are various other cases in which the process
in the running state is preempted and the process in the ready state is dispatched by the
operating system.

Blocking

When a process invokes an input-output system call, the process is put into blocked mode, in which it waits for the input-output to complete. Hence, on the demand of the process itself, the operating system blocks the process and dispatches another process to the processor. Hence, in process-blocking operations, the operating system puts the process in a 'waiting' state.

Preemption

When a timeout occurs, meaning the process has not completed execution within the allotted time interval and the next process is ready to execute, the operating system preempts the process. This operation is only valid where CPU scheduling supports preemption.
Basically, this happens in priority scheduling where on the incoming of high priority
process the ongoing process is preempted. Hence, in process preemption operation, the
operating system puts the process in a ‘ready’ state.

Process Termination

Process termination is the activity of ending the process. In other words, process termination is the release of the computer resources taken by the process for its execution. Like creation, there may also be several events that lead to process termination. Some of them are:

• The process completes its execution fully and it indicates to the OS that it has
finished.
• The operating system itself terminates the process due to service errors.
• There may be a problem in hardware that terminates the process.
• One process can be terminated by another process.

Process Schedulers in Operating System


Process scheduling is the activity of the process manager that handles the removal of the
running process from the CPU and the selection of another process on the basis of a
particular strategy.
Process scheduling is an essential part of a Multiprogramming operating system. Such
operating systems allow more than one process to be loaded into the executable memory
at a time and the loaded process shares the CPU using time multiplexing.
Categories in Scheduling
Scheduling falls into one of two categories:

• Non-preemptive: In this case, a process's resources cannot be taken away before the process has finished running; resources are reassigned only when the running process finishes and transitions to a waiting state.
• Preemptive: In this case, the OS assigns resources to a process for a predetermined period of time. The process switches from the running state to the ready state, or from the waiting state to the ready state, during resource allocation. This switching happens because the CPU may give other processes priority and substitute the currently active process with a higher-priority process.

There are three types of process schedulers.


Long-Term or Job Scheduler
It brings the new process to the ‘Ready State’. It controls the degree of multiprogramming, i.e., the number of processes present in a ready state at any point in time. It is important that the long-term scheduler make a careful selection of both I/O-bound and CPU-bound processes. I/O-bound tasks are those that spend much of their time on input and output operations, while CPU-bound processes are those that spend their time on the CPU. The job scheduler increases efficiency by maintaining a balance between the two. It operates at a high level and is typically used in batch-processing systems.
Short-Term or CPU Scheduler

It is responsible for selecting one process from the ready state and scheduling it onto the running state. Note: the short-term scheduler only selects the process to schedule; it doesn't load the process for running. This is where all the scheduling algorithms are used. The CPU scheduler is responsible for ensuring that no process starves because of high-burst-time processes. The dispatcher is responsible for loading the process selected by the short-term scheduler onto the CPU (ready to running state); context switching is done by the dispatcher only. A dispatcher does the following:

1. Switching context.
2. Switching to user mode.
3. Jumping to the proper location in the newly loaded program.

Medium-Term Scheduler
It is responsible for suspending and resuming the process. It mainly does swapping
(moving processes from main memory to disk and vice versa). Swapping may be necessary
to improve the process mix or because a change in memory requirements has
overcommitted available memory, requiring memory to be freed up. It is helpful in maintaining a balance between the I/O-bound and CPU-bound processes. It reduces the degree of multiprogramming.
Some Other Schedulers

● I/O schedulers: I/O schedulers are in charge of managing the execution of I/O operations such as reading and writing to disks or networks. They can use various algorithms to determine the order in which I/O operations are executed, such as FCFS (First-Come, First-Served) or RR (Round Robin).
● Real-time schedulers: In real-time systems, real-time schedulers ensure that
critical tasks are completed within a specified time frame. They can prioritize and schedule
tasks using various algorithms such as EDF (Earliest Deadline First) or RM (Rate
Monotonic).

Comparison among Schedulers

● Role: The long-term scheduler is a job scheduler; the short-term scheduler is a CPU scheduler; the medium-term scheduler is a process-swapping scheduler.

● Speed: The speed of the long-term scheduler is generally lesser than that of the short-term scheduler; the short-term scheduler is the fastest among all of them; the speed of the medium-term scheduler lies in between the short-term and long-term schedulers.

● Degree of multiprogramming: The long-term scheduler controls the degree of multiprogramming; the short-term scheduler gives less control over how much multiprogramming is done; the medium-term scheduler reduces the degree of multiprogramming.

● Presence: The long-term scheduler is barely present or nonexistent in a time-sharing system; the short-term scheduler is minimal in a time-sharing system; the medium-term scheduler is a component of time-sharing systems.

● Function: The long-term scheduler selects processes and loads them into memory for execution; the short-term scheduler selects those processes which are ready to execute; the medium-term scheduler can re-introduce a process into memory so that its execution can be continued.

Two-State Process Model

The terms "running" and "non-running" are used to describe the two states of this process model.

1. Running: A newly created process joins the system in a running state when it is created.

2. Not running: Processes that are not currently running are kept in a queue, awaiting execution. Each entry in the queue contains a pointer to a specific process. The queue is implemented using linked lists; this is how the dispatcher is used. When a process is stopped, it is moved to the back of the waiting queue, and it is discarded depending on whether it succeeded or failed. In either scenario, the dispatcher then chooses a process to run from the queue.

Context Switching
In order for a process execution to be continued from the same point at a later time, context
switching is a mechanism to store and restore the state or context of a CPU in the Process
Control block. A context switcher makes it possible for multiple processes to share a single
CPU using this method. A multitasking operating system must include context switching
among its features.
The state of the currently running process is saved into the process control block when the scheduler switches the CPU from executing one process to another. The state used to set the PC, registers, etc. for the process that will run next is then loaded from its own PCB. After that, the second process can start executing.

The information that is saved and restored during a context switch, stored in the PCB, typically includes:

● Program Counter
● Scheduling information
● The base and limit register value
● Currently used register
● Changed State
● I/O State information
● Accounting information
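Conceptually, a context switch can be sketched as follows (a hedged illustration: struct pcb is the structure sketched earlier, and save_cpu_state()/load_cpu_state() are hypothetical stand-ins for work a real kernel performs in assembly):

void save_cpu_state(struct pcb *p);   /* hypothetical low-level helpers     */
void load_cpu_state(struct pcb *p);

void context_switch(struct pcb *current, struct pcb *next) {
    save_cpu_state(current);   /* store PC, registers, stack pointer ...    */
    current->state = READY;    /* or WAITING, depending on the trigger      */

    next->state = RUNNING;
    load_cpu_state(next);      /* restore the next process's saved context; */
}                              /* it resumes exactly where it left off      */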

Inter Process Communication (IPC)


A process can be of two types:

● Independent process.
● Co-operating process.

An independent process is not affected by the execution of other processes, while a co-operating process can be affected by other executing processes. Although one might think that processes running independently will execute very efficiently, in reality there are many situations when their co-operative nature can be utilized for increasing computational speed, convenience, and modularity. Inter-process communication (IPC) is a mechanism that allows processes to communicate with each other and synchronize their actions. The communication between these processes can be seen as a method of co-operation between them. Processes can communicate with each other through both:

1. Shared Memory
2. Message passing

Figure 1 below shows a basic structure of communication between processes via the shared
memory method and via the message passing method.
An operating system can implement both methods of communication. First, we will discuss
the shared memory methods of communication and then message passing. Communication
between processes using shared memory requires processes to share some variable, and it
completely depends on how the programmer will implement it. One way of communication
using shared memory can be imagined like this: Suppose process1 and process2 are
executing simultaneously, and they share some resources or use some information from
another process. Process1 generates information about certain computations or resources
being used and keeps it as a record in shared memory. When process2 needs to use the
shared information, it will check in the record stored in shared memory and take note of
the information generated by process1 and act accordingly. Processes can use shared
memory for extracting information as a record from another process as well as for
delivering any specific information to other processes.
Let’s discuss an example of communication between processes using the shared memory
method.

i) Shared Memory Method
Ex: Producer-Consumer problem
There are two processes: Producer and Consumer. The Producer produces some items and the Consumer consumes those items. The two processes share a common space or memory location known as a buffer, where the item produced by the Producer is stored and from which the Consumer consumes the item if needed. There are two versions of this problem. The first is known as the unbounded buffer problem, in which the Producer can keep producing items with no limit on the size of the buffer. The second is known as the bounded buffer problem, in which the Producer can produce up to a certain number of items before it starts waiting for the Consumer to consume them. We will discuss the bounded buffer problem. First, the Producer and the Consumer share some common memory; then the Producer starts producing items. If the number of produced items equals the size of the buffer, the Producer waits for them to be consumed by the Consumer. Similarly, the Consumer first checks for the availability of an item. If no item is available, the Consumer waits for the Producer to produce one. If items are available, the Consumer consumes them. A minimal sketch of this bounded buffer appears below.
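As a rough illustration (not part of the original example), here is a minimal C++ sketch of the bounded buffer. True inter-process shared memory would use OS facilities such as POSIX shm_open/mmap; this sketch uses two threads sharing a queue, protected by a mutex and a condition variable, purely to keep it self-contained. The buffer capacity and item count are assumptions.

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

int main() {
    const std::size_t kBufferSize = 4;  // bounded buffer capacity (assumed)
    const int kItems = 10;              // number of items to produce (assumed)
    std::queue<int> buffer;             // stands in for the shared memory region
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    std::thread producer([&] {
        for (int i = 0; i < kItems; ++i) {
            std::unique_lock<std::mutex> lock(m);
            // Bounded buffer rule: the producer waits while the buffer is full.
            cv.wait(lock, [&] { return buffer.size() < kBufferSize; });
            buffer.push(i);
            cv.notify_all();
        }
        std::lock_guard<std::mutex> lock(m);
        done = true;
        cv.notify_all();
    });

    std::thread consumer([&] {
        while (true) {
            std::unique_lock<std::mutex> lock(m);
            // The consumer waits while the buffer is empty and production continues.
            cv.wait(lock, [&] { return !buffer.empty() || done; });
            if (buffer.empty()) break;  // producer finished and nothing is left
            std::cout << "consumed " << buffer.front() << '\n';
            buffer.pop();
            cv.notify_all();
        }
    });

    producer.join();
    consumer.join();
}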
ii) Message Passing Method
Now, we will start our discussion of communication between processes via message passing. In this method, processes communicate with each other without using any kind of shared memory. If two processes p1 and p2 want to communicate with each other, they proceed as follows:

• Establish a communication link (if a link already exists, there is no need to establish it again).
• Start exchanging messages using basic primitives.
We need at least two primitives:
– send(message, destination) or send(message)
– receive(message, host) or receive(message)

The message size can be fixed or variable. A fixed size is easy for the OS designer but complicated for the programmer, whereas a variable size is easy for the programmer but complicated for the OS designer. A standard message has two parts: a header and a body.

The header is used for storing the message type, destination id, source id, message length, and control information. The control information covers things like what to do if the buffer space runs out, the sequence number, and the priority. Generally, messages are sent in FIFO style. A small sketch of such a message layout follows.
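As a hedged illustration, a message with such a header and body might be laid out as follows in C++; the field set and the fixed body size of 256 bytes are assumptions for the sketch, not a standard.

struct MessageHeader {
    int type;             // message type
    int source_id;        // sender's process id
    int destination_id;   // receiver's process id
    int length;           // number of valid bytes in the body
    int sequence_number;  // control information
    int priority;         // control information
};

struct Message {
    MessageHeader header;
    char body[256];       // fixed-size body (size is an assumption)
};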

Message Passing through Communication Link.


Direct and Indirect Communication link
Now, we will discuss the methods of implementing communication links. While implementing the link, some questions need to be kept in mind:

1. How are links established?
2. Can a link be associated with more than two processes?
3. How many links can there be between every pair of communicating processes?
4. What is the capacity of a link? Is the size of a message that the link can accommodate fixed or variable?
5. Is a link unidirectional or bi-directional?

A link has some capacity that determines the number of messages that can reside in it temporarily, for which every link has a queue associated with it, which can be of zero capacity, bounded capacity, or unbounded capacity. With zero capacity, the sender waits until the receiver informs it that the message has been received. In the non-zero capacity cases, a process does not know whether a message has been received after the send operation; for this, the sender must communicate with the receiver explicitly. The implementation of the link depends on the situation; it can be either a direct communication link or an indirect communication link.

Direct communication links are implemented when the processes use a specific process identifier for the communication, but it is hard to identify the sender ahead of time. A print server is one example. Indirect communication is done via a shared mailbox (port), which consists of a queue of messages. The sender keeps messages in the mailbox and the receiver picks them up.

Message Passing through Exchanging the Messages.

Synchronous and Asynchronous Message Passing:

A process that is blocked is one that is waiting for some event, such as a resource becoming available or the completion of an I/O operation. IPC is possible between processes on the same computer as well as between processes running on different computers, i.e., in a networked/distributed system. In both cases, the process may or may not be blocked while sending a message or attempting to receive one, so message passing may be blocking or non-blocking. Blocking is considered synchronous: a blocking send means the sender is blocked until the message is received by the receiver, and a blocking receive means the receiver blocks until a message is available. Non-blocking is considered asynchronous: a non-blocking send means the sender sends the message and continues, and a non-blocking receive means the receiver receives either a valid message or null. On careful analysis, we can conclude that it is more natural for a sender to be non-blocking after message passing, as there may be a need to send the message to different processes. However, the sender expects an acknowledgment from the receiver in case the send fails. Similarly, it is more natural for a receiver to be blocking after issuing the receive, as the information from the received message may be used for further execution. At the same time, if the message send keeps failing, the receiver will have to wait indefinitely. That is why we also consider the other possibilities of message passing. There are basically three preferred combinations:

● Blocking send and blocking receive
● Non-blocking send and non-blocking receive
● Non-blocking send and blocking receive (mostly used)

In direct message passing, the process which wants to communicate must explicitly name the recipient or sender of the communication.

e.g. send(p1, message) means send the message to p1.

Similarly, receive(p2, message) means receive the message from p2.

In this method of communication, the communication link gets established automatically. It can be either unidirectional or bidirectional, but only one link can exist between a given pair of sender and receiver, and a pair of processes should not possess more than one link between them. Symmetry and asymmetry between sending and receiving can also be implemented, i.e., either both processes name each other for sending and receiving messages, or only the sender names the receiver for sending the message and the receiver need not name the sender for receiving it. The problem with this method of communication is that if the name of one process changes, the method will not work.

In indirect message passing, processes use mailboxes (also referred to as ports) for sending and receiving messages. Each mailbox has a unique id, and processes can communicate only if they share a mailbox. A link is established only if the processes share a common mailbox, and a single link can be associated with many processes. Each pair of processes can share several communication links, and these links may be unidirectional or bi-directional. Suppose two processes want to communicate through indirect message passing; the required operations are: create a mailbox, use this mailbox for sending and receiving messages, then destroy the mailbox.
The standard primitives used are:
send(A, message), which means send the message to mailbox A.

The primitive for receiving a message works in the same way, e.g., receive(A, message). There is a problem with this mailbox implementation. Suppose more than two processes share the same mailbox and process p1 sends a message to the mailbox: which process will be the receiver? This can be solved by enforcing that only two processes can share a single mailbox, or that only one process is allowed to execute the receive at a given time, or by selecting any process randomly and notifying the sender about the receiver. A mailbox can be made private to a single sender/receiver pair and can also be shared between multiple sender/receiver pairs. A port is an implementation of such a mailbox that can have multiple senders and a single receiver. It is used in client/server applications (in this case the server is the receiver). The port is owned by the receiving process, is created by the OS on the request of the receiver process, and can be destroyed either on the request of the same receiver process or when the receiver terminates. Enforcing that only one process is allowed to execute the receive can be done using the concept of mutual exclusion: a mutex mailbox is created which is shared by n processes. The sender is non-blocking and sends the message; the first process which executes the receive enters the critical section, and all other processes block and wait. A minimal sketch of such a mailbox follows.
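The following C++ sketch (an assumption-laden illustration, not a definitive implementation) models such a port: many senders can call a non-blocking send, while a single receiver performs a blocking receive, matching the "non-blocking send and blocking receive" combination described earlier.

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

// A minimal mailbox (port): many senders, one receiver.
class Mailbox {
public:
    // Non-blocking send: enqueue the message and return immediately.
    void send(const std::string& message) {
        std::lock_guard<std::mutex> lock(m_);
        queue_.push(message);
        cv_.notify_one();
    }
    // Blocking receive: wait until a message is available.
    std::string receive() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return !queue_.empty(); });
        std::string message = queue_.front();
        queue_.pop();
        return message;
    }
private:
    std::queue<std::string> queue_;
    std::mutex m_;
    std::condition_variable cv_;
};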

Examples of IPC systems

1. POSIX: uses the shared memory method.
2. Mach: uses message passing.
3. Windows XP: uses message passing via local procedure calls.

Communication in client/server architecture:

There are various mechanisms:

● Pipe
● Socket
● Remote Procedure Calls (RPCs)

A minimal pipe example is sketched below.
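As a small illustration of the pipe mechanism, here is a minimal sketch using the POSIX pipe(), fork(), read() and write() calls from C++; it assumes a POSIX system and omits full error handling for brevity.

#include <cstdio>
#include <cstring>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    int fds[2];
    if (pipe(fds) == -1) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                       // child: the reader
        close(fds[1]);                    // close the unused write end
        char buf[64] = {0};
        read(fds[0], buf, sizeof(buf) - 1);
        printf("child received: %s\n", buf);
        close(fds[0]);
    } else {                              // parent: the writer
        close(fds[0]);                    // close the unused read end
        const char* msg = "hello via pipe";
        write(fds[1], msg, strlen(msg));
        close(fds[1]);
        wait(nullptr);                    // reap the child
    }
    return 0;
}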

Inter-process communication (IPC) is the mechanism through which processes or threads can communicate and exchange data with each other on a computer or across a network. IPC is an important aspect of modern operating systems, as it enables different processes to work together and share resources, leading to increased efficiency and flexibility.

Advantages of IPC:
• Enables processes to communicate with each other and share resources, leading to
increased efficiency and flexibility.
• Facilitates coordination between multiple processes, leading to better overall
system performance.
• Allows for the creation of distributed systems that can span multiple computers or
networks.
• Can be used to implement various synchronization and communication protocols,
such as semaphores, pipes, and sockets.

Disadvantages of IPC:

• Increases system complexity, making it harder to design, implement, and debug.
• Can introduce security vulnerabilities, as processes may be able to access or modify data belonging to other processes.
• Requires careful management of system resources, such as memory and CPU time, to ensure that IPC operations do not degrade overall system performance.
• Can lead to data inconsistencies if multiple processes try to access or modify the same data at the same time.

Overall, the advantages of IPC outweigh the disadvantages, as it is a necessary mechanism for modern operating systems and enables processes to work together and share resources in a flexible and efficient manner. However, IPC systems must be designed and implemented carefully in order to avoid potential security vulnerabilities and performance issues.

Context Switch in Operating System


An operating system is a program loaded into a computer that manages all the other programs running on it, i.e., all the application programs. In other words, the OS is an interface between the user and the computer hardware.
In this section, we will learn what context switching is in an operating system, see how it works, and understand the triggers of context switching.
Context Switch
Context switching in an operating system involves saving the context or state of a running process so that it can be restored later, and then loading the context or state of another process and running it.

Context switching refers to the method used by the system to change a process from one state to another using the CPUs present in the system to perform its job.
Example: suppose the OS has N processes, each described by a Process Control Block (PCB). One process is running and using the CPU to do its job, while other processes, including those with higher priority, queue up to use the CPU to complete their own jobs.
The Need for Context Switching
Context switching enables all processes to share a single CPU to finish their execution while the system stores the status of each task. When a process is reloaded into the system, its execution begins at the same point where it was interrupted.
The operating system's need for context switching is explained by the reasons listed below.

• One process does not directly switch to another within the system. Context switching makes it easier for the operating system to use the CPU's resources to carry out its tasks and to store a process's context while switching between multiple processes.
• Context switching enables all processes to share a single CPU to finish their execution and store the status of the system's tasks. The execution of a reloaded process resumes at the very point where it was interrupted.
• Context switching allows a single CPU to handle multiple process requests concurrently without the need for any additional processors.

Context Changes as a Trigger


The three different categories of context-switching triggers are as follows.

1. Interrupts
2. Multitasking
3. User/Kernel switch

Interrupts: When the CPU requests that data be read from a disk, for example, and an interrupt occurs, context switching lets the system switch to the interrupt handler so the hardware event can be dealt with quickly.
Multitasking: The ability for a process to be switched off the CPU so that another process can run is context switching in action. When a process is switched out, its previous state is retained so that the process can later continue running from the same point.
Kernel/User switch: This trigger is used when the OS needs to switch between user mode and kernel mode.
Process Control Block
The Process Control Block (PCB) is also known as a Task Control Block; it represents a process in the operating system. A process control block is a data structure used by the computer to store all the information about a process; it is also called the process descriptor. When a process is created (started or installed), the operating system creates a corresponding PCB.
State Diagram of Context Switching

Working Process Context Switching


The context switching of two processes, selected by priority, occurs via the ready queue and the process control block. The following steps are involved.

• The state of the current process must be saved so that it can be rescheduled later.
• The process state contains registers, credentials, and operating-system-specific information stored in the PCB.
• The PCB can be stored in a single layer in kernel memory or in a custom OS file.
• A handle to the PCB is added to the queue of processes that are ready to run.
• The operating system suspends the execution of the current process and selects a process from the ready queue by loading its PCB.
• The PCB's program counter is loaded and execution continues in the selected process.
• Process/thread priority values can affect which process is selected from the queue, which can be important.

Attributes of a process

The attributes of a process are used by the operating system to create the process control block (PCB) for each process. This is also called the context of the process. The attributes stored in the PCB are described below.

1. Process ID

When a process is created, a unique id is assigned to the process which is used for unique
identification of the process in the system.

2. Program counter

The program counter stores the address of the instruction at which the process was suspended. The CPU uses this address when the execution of the process is resumed.

3. Process State

The process, from its creation to completion, goes through various states: new, ready, running and waiting. We will discuss them in detail later.

4. Priority

Every process has its own priority. The process with the highest priority among the
processes gets the CPU first. This is also stored on the process control block.

5. General Purpose Registers

Every process has its own set of registers which are used to hold the data which is generated
during the execution of the process.

6. List of open files

During its execution, every process uses some files which need to be present in main memory. The OS maintains a list of open files in the PCB.

7. List of open devices

The OS also maintains a list of all open devices which are used during the execution of the process. A sketch of a PCB gathering these attributes follows.
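A PCB can be pictured as a record gathering the attributes listed above. The following C++ struct is only a conceptual sketch; real operating systems store considerably more, and the field types and the register count of 16 are assumptions.

#include <vector>

struct ProcessControlBlock {
    int process_id;                 // unique id assigned at creation
    long program_counter;           // address at which execution will resume
    int state;                      // new, ready, running, waiting, or terminated
    int priority;                   // scheduling priority
    long registers[16];             // saved general purpose registers (16 is an assumed count)
    std::vector<int> open_files;    // descriptors of the open files
    std::vector<int> open_devices;  // handles of the open devices
};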

Process States

State Diagram

The process, from its creation to completion, passes through various states. The minimum number of states is five.

The names of the states are not standardized, although a process may be in one of the following states during execution.

1. New

A program which is going to be picked up by the OS into the main memory is called a new
process.

2. Ready

Whenever a process is created, it directly enters the ready state, in which it waits for the CPU to be assigned. The OS picks new processes from secondary memory and puts all of them in main memory.

The processes which are ready for the execution and reside in the main memory are called
ready state processes. There can be many processes present in the ready state.

3. Running

One of the processes from the ready state will be chosen by the OS depending upon the
scheduling algorithm. Hence, if we have only one CPU in our system, the number of
running processes for a particular time will always be one. If we have n processors in the
system then we can have n processes running simultaneously.

4. Block or wait

From the Running state, a process can make the transition to the block or wait state
depending upon the scheduling algorithm or the intrinsic behavior of the process.

When a process waits for a certain resource to be assigned or for input from the user, the OS moves the process to the block or wait state and assigns the CPU to other processes.

5. Completion or termination

When a process finishes its execution, it comes to the termination state. The entire context of the process (its Process Control Block) is deleted and the process is terminated by the operating system.

6. Suspend ready
A process in the ready state which is moved to secondary memory from main memory due to a lack of resources (mainly primary memory) is said to be in the suspend ready state.

If main memory is full and a higher priority process arrives for execution, the OS has to make room for it in main memory by moving a lower priority process out into secondary memory. The suspend ready processes remain in secondary memory until main memory becomes available.

7. Suspend wait

Instead of removing a process from the ready queue, it is better to remove a blocked process which is waiting for some resource in main memory. Since it is already waiting for some resource to become available, it is better if it waits in secondary memory and makes room for the higher priority process. These processes complete their execution once main memory becomes available and their wait is finished.

Operations on the Process
1. Creation

Once the process is created, it comes into the ready queue (main memory) and is ready for execution.

2. Scheduling

Out of the many processes present in the ready queue, the operating system chooses one process and starts executing it. Selecting the process which is to be executed next is known as scheduling.

3. Execution

Once the process is scheduled for execution, the processor starts executing it. A process may move to the blocked or wait state during its execution, in which case the processor starts executing other processes.

4. Deletion/killing

Once the purpose of the process is fulfilled, the OS kills the process. The context of the process (its PCB) is deleted and the process is terminated by the operating system.

Process Scheduling in OS (Operating System)

The operating system uses the various schedulers for process scheduling described below.

1. Long term scheduler

Long term scheduler is also known as job scheduler. It chooses the processes from the pool
(secondary memory) and keeps them in the ready queue maintained in the primary
memory.

Long Term scheduler mainly controls the degree of Multiprogramming. The purpose of
long term scheduler is to choose a perfect mix of IO bound and CPU bound processes
among the jobs present in the pool.

If the job scheduler chooses more I/O bound processes, then all of the jobs may reside in the blocked state all the time and the CPU will remain idle most of the time. This will reduce the degree of multiprogramming. Therefore, the job of the long term scheduler is very critical and may affect the system for a very long time.

2. Short term scheduler

The short term scheduler is also known as the CPU scheduler. It selects one of the jobs from the ready queue and dispatches it to the CPU for execution.

A scheduling algorithm is used to select which job is going to be dispatched for execution. The job of the short term scheduler can be critical in the sense that if it selects a job whose CPU burst time is very high, then all the jobs after it will have to wait in the ready queue for a very long time.

This problem is called starvation, which may arise if the short term scheduler makes some mistakes while selecting the job.

3. Medium term scheduler

The medium term scheduler takes care of the swapped out processes. If a running process needs some I/O time for its completion, its state must change from running to waiting.

The medium term scheduler is used for this purpose. It removes the process from the running state to make room for other processes. Such processes are the swapped out processes, and this procedure is called swapping. The medium term scheduler is responsible for suspending and resuming processes.

It reduces the degree of multiprogramming. The swapping is necessary to have a perfect mix of processes in the ready queue.

Process Queues

The Operating system manages various types of queues for each of the process states. The
PCB related to the process is also stored in the queue of the same state. If the Process is
moved from one state to another state then its PCB is also unlinked from the corresponding
queue and added to the other state queue in which the transition is made.

There are the following queues maintained by the Operating system.

1. Job Queue

Initially, all the processes are stored in the job queue. It is maintained in secondary memory. The long term scheduler (job scheduler) picks some of the jobs and puts them in primary memory.

2. Ready Queue

The ready queue is maintained in primary memory. The short term scheduler picks a job from the ready queue and dispatches it to the CPU for execution.

3. Waiting Queue

When a process needs some I/O operation in order to complete its execution, the OS changes the state of the process from running to waiting. The context (PCB) associated with the process gets stored in the waiting queue and will be used by the processor when the process finishes its I/O.

Various Times related to the Process

1. Arrival Time

The time at which the process enters into the ready queue is called the arrival time.

2. Burst Time

The total amount of CPU time required to execute the whole process is called the burst time. This does not include the waiting time. It is difficult to know the execution time of a process before executing it; hence, scheduling approaches based on exact burst times cannot be implemented precisely in reality.

3. Completion Time

The Time at which the process enters into the completion state or the time at which the
process completes its execution, is called completion time.

4. Turnaround time

The total amount of time spent by the process from its arrival to its completion, is called
Turnaround time.

5. Waiting Time

The Total amount of time for which the process waits for the CPU to be assigned is called
waiting time.

6. Response Time

The difference between the arrival time and the time at which the process first gets the
CPU is called Response Time.

CPU Scheduling

In uniprogramming systems like MS-DOS, when a process waits for an I/O operation to complete, the CPU remains idle. This is an overhead, since it wastes time and causes the problem of starvation. In multiprogramming systems, however, the CPU doesn't remain idle during the waiting time of a process; it starts executing other processes. The operating system has to decide which process the CPU will be given.

In multiprogramming systems, the operating system schedules the processes on the CPU so as to maximize its utilization, and this procedure is called CPU scheduling. The operating system uses various scheduling algorithms to schedule the processes.

It is the task of the short term scheduler to schedule the CPU among the processes present in the job pool. Whenever a running process requests an I/O operation, the short term scheduler saves the current context of the process (in its PCB) and changes its state from running to waiting. While the process is in the waiting state, the short term scheduler picks another process from the ready queue and assigns the CPU to it. This procedure is called context switching.

What is saved in the Process Control Block?

The operating system maintains a process control block during the lifetime of a process. The process control block is deleted when the process is terminated or killed. The information listed earlier under the attributes of a process is saved in the process control block and changes with the state of the process.

Why do we need Scheduling?

In multiprogramming, if the long term scheduler picks more I/O bound processes, then most of the time the CPU remains idle. The task of the operating system is to optimize the utilization of resources.

If most of the running processes change their state from running to waiting, there may always be a possibility of deadlock in the system. Hence, to reduce this overhead, the OS needs to schedule the jobs to get optimal utilization of the CPU and to avoid the possibility of deadlock.

Scheduling Algorithms in OS (Operating System)

There are various algorithms which are used by the Operating System to schedule the
processes on the processor in an efficient way.

The Purpose of a Scheduling algorithm

• Maximum CPU utilization
• Fair allocation of CPU
• Maximum throughput
• Minimum turnaround time
• Minimum waiting time
• Minimum response time

There are the following algorithms which can be used to schedule the jobs.

1. First Come First Serve

It is the simplest algorithm to implement. The process with the earliest arrival time gets the CPU first: the lower the arrival time, the sooner the process gets the CPU. It is a non-preemptive type of scheduling.

2. Round Robin

In the Round Robin scheduling algorithm, the OS defines a time quantum (slice). All the
processes will get executed in the cyclic way. Each of the process will get the CPU for a
small amount of time (called time quantum) and then get back to the ready queue to wait
for its next turn. It is a preemptive type of scheduling.

3. Shortest Job First

The job with the shortest burst time gets the CPU first: the lower the burst time, the sooner the process gets the CPU. It is a non-preemptive type of scheduling.

4. Shortest remaining time first

It is the preemptive form of SJF. In this algorithm, the OS schedules the Job according to
the remaining time of the execution.

5. Priority based scheduling


In this algorithm, the priority will be assigned to each of the processes. The higher the
priority, the sooner will the process get the CPU. If the priority of the two processes is same
then they will be scheduled according to their arrival time.

6. Highest Response Ratio Next

In this scheduling Algorithm, the process with highest response ratio will be scheduled
next. This reduces the starvation in the system.

First Come First Serve CPU Process Scheduling in Operating Systems

In this section, we are going to learn an important concept in CPU process scheduling algorithms: First Come First Serve. This is the basic algorithm which every student must learn to understand the basics of CPU process scheduling algorithms.

First Come First Serve paves the way for understanding other algorithms. This algorithm has many disadvantages, but those disadvantages motivated new and more efficient algorithms. So, it is important to learn about the First Come First Serve CPU process scheduling algorithm.

Important Abbreviations

1. CPU - - - > Central Processing Unit
2. FCFS - - - > First Come First Serve
3. AT - - - > Arrival Time
4. BT - - - > Burst Time
5. WT - - - > Waiting Time
6. TAT - - - > Turn Around Time
7. CT - - - > Completion Time
8. FIFO - - - > First In First Out

First Come First Serve

First Come First Serve, shortly known as FCFS, is the first CPU process scheduling algorithm. In the First Come First Serve algorithm, we allow processes to execute in a linear manner.

This means that whichever process enters the ready queue first is executed first. This shows that the First Come First Serve algorithm follows the First In First Out (FIFO) principle.

The First Come First Serve algorithm can be executed in a preemptive and a non-preemptive manner. Before going into examples, let us understand what the preemptive and non-preemptive approaches in CPU process scheduling are.

Pre Emptive-Approach

In preemptive process scheduling, the OS allots resources to a process for a predetermined period of time. During resource allocation, the process transitions from the running state to the ready state or from the waiting state to the ready state. This switching happens because the CPU may give other processes precedence and replace the currently active process with a higher priority one.

Non-Pre Emptive-Approach

In the case of non-preemptive process scheduling, the resource cannot be withdrawn from a process before that process has finished running. Resources are switched only when the running process finishes and transitions to the waiting state.

Convoy Effect in First Come First Serve (FCFS)

The convoy effect is a phenomenon which occurs in the First Come First Serve (FCFS) scheduling algorithm when it operates in a non-preemptive way.

The non-preemptive way means that if a process or job has started execution, the operating system must run it to completion. Until the running process or job finishes, the next process or job does not start its execution. The definition of non-preemptive scheduling in terms of the operating system means that the Central Processing Unit (CPU) will be completely dedicated to the process or job started first, and the new process or job is executed only after the older process or job finishes.

There may be cases in which the Central Processing Unit allots too much time to one job. This is because in the non-preemptive FCFS scheduling algorithm, the processes or jobs are chosen in serial order. Due to this, shorter jobs or processes behind larger processes or jobs take too much time to complete their execution, and the waiting time, turnaround time, and completion time become very high.

So, when the first process is large or its completion time is too high, the convoy effect occurs in the First Come First Serve algorithm.

Let us assume that the longer job takes an infinite amount of time to complete. Then the remaining processes have to wait for the same infinite time. Due to this convoy effect created by the longer job, the starvation of the waiting processes increases very rapidly. This is the biggest disadvantage of FCFS CPU process scheduling.

Characteristics of FCFS CPU Process Scheduling

The characteristics of FCFS CPU process scheduling are:

1. Implementation is simple.
2. It does not cause any casualties while in use.
3. It can adopt both a non-preemptive and a preemptive strategy.
4. It runs each procedure in the order in which it is received.
5. Arrival time is used as the selection criterion for procedures.

Advantages of FCFS CPU Process Scheduling

The advantages of FCFS CPU Process Scheduling are:

• In order to allocate processes, it uses a First In First Out queue.
• The FCFS CPU scheduling process is straightforward and easy to implement.
• In FCFS scheduling, there is no chance of a process starving.
• As there is no consideration of process priority, it is an equitable algorithm.

Disadvantages of FCFS CPU Process Scheduling

The disadvantages of FCFS CPU process scheduling are:

• The FCFS CPU scheduling algorithm has long waiting times.
• FCFS CPU scheduling favors CPU-bound over input/output-bound operations.
• In FCFS, there is a chance of the convoy effect occurring.
• Because FCFS is so straightforward, it often isn't very effective; extended waiting periods go hand in hand with this. All other jobs are left idle while the CPU is busy processing one time-consuming job.

Problems in the First Come First Serve CPU Scheduling Algorithm

Example

S. No   Process ID   Process Name   Arrival Time   Burst Time
1       P1           A              0              9
2       P2           B              1              3
3       P3           C              1              2
4       P4           D              1              4
5       P5           E              2              3
6       P6           F              3              2

Non-Pre Emptive-Approach

Now, let us solve this problem with the help of the Scheduling Algorithm named First
Come First Serve in a Non-Preemptive Approach.

Gantt chart for the above Example 1 is: P1 (0-9) → P2 (9-12) → P3 (12-14) → P4 (14-18) → P5 (18-21) → P6 (21-23)

Turn Around Time = Completion Time - Arrival Time

Waiting Time = Turn Around Time - Burst Time

Solution to the Above Question Example 1

S. No   Process ID   Process Name   Arrival Time   Burst Time   Completion Time   Turn Around Time   Waiting Time
1       P1           A              0              9            9                 9                  0
2       P2           B              1              3            12                11                 8
3       P3           C              1              2            14                13                 11
4       P4           D              1              4            18                17                 13
5       P5           E              2              3            21                19                 16
6       P6           F              3              2            23                20                 18

The Average Completion Time is:


Average CT = ( 9 + 12 + 14 + 18 + 21 + 23 ) / 6

Average CT = 97 / 6

Average CT = 16.16667

The Average Waiting Time is:

Average WT = ( 0 + 8 + 11 + 13 + 16 + 18 ) /6

Average WT = 66 / 6

Average WT = 11

The Average Turn Around Time is:

Average TAT = ( 9 + 11 + 13 + 17 + 19 +20 ) / 6

Average TAT = 89 / 6

Average TAT = 14.83333

This is how FCFS is solved with the non-preemptive approach; a small program reproducing these numbers is sketched below.
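As an illustration, the following C++ sketch computes the same table programmatically; it assumes the formulas stated above (TAT = CT − AT, WT = TAT − BT) and the six processes of Example 1.

#include <algorithm>
#include <cstdio>

struct Proc { const char* name; int arrival, burst; };

int main() {
    // The six processes from Example 1.
    Proc p[] = {{"P1",0,9},{"P2",1,3},{"P3",1,2},{"P4",1,4},{"P5",2,3},{"P6",3,2}};
    const int n = 6;
    // FCFS: serve strictly in order of arrival time.
    std::stable_sort(p, p + n, [](const Proc& a, const Proc& b) {
        return a.arrival < b.arrival;
    });
    int clock = 0, sumTAT = 0, sumWT = 0;
    for (int i = 0; i < n; ++i) {
        if (clock < p[i].arrival) clock = p[i].arrival;  // CPU idle until arrival
        clock += p[i].burst;                 // completion time of this process
        int tat = clock - p[i].arrival;      // TAT = CT - AT
        int wt  = tat - p[i].burst;          // WT  = TAT - BT
        printf("%s CT=%d TAT=%d WT=%d\n", p[i].name, clock, tat, wt);
        sumTAT += tat; sumWT += wt;
    }
    printf("Average TAT=%.5f Average WT=%.5f\n", sumTAT / (double)n, sumWT / (double)n);
}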

Now, let us understand how they can be solved in Pre Emptive-Approach

Pre Emptive-Approach

Now, let us solve this problem with the help of the Scheduling Algorithm named First
Come First Serve in a Pre Emptive-Approach.

In the preemptive approach, at every scheduling point we search for the best process among those available, i.e., the one with the least remaining burst time.

Gantt chart for the above Example 1 is: P1 (0-1) → P3 (1-3) → P6 (3-5) → P2 (5-8) → P5 (8-11) → P4 (11-15) → P1 (15-23)

S. No   Process ID   Process Name   Arrival Time   Burst Time   Completion Time   Turn Around Time   Waiting Time
1       P1           A              0              9            23                23                 14
2       P2           B              1              3            8                 7                  4
3       P3           C              1              2            3                 2                  0
4       P4           D              1              4            15                14                 10
5       P5           E              2              3            11                9                  6
6       P6           F              3              2            5                 2                  0

Semaphores in OS (Operating System)

To get rid of the problem of wasted wake-up signals, Dijkstra proposed an approach which involves storing all the wake-up calls. Dijkstra states that, instead of giving the wake-up calls directly to the consumer, the producer can store them in a variable. Any of the consumers can read it whenever it needs to do so.

A semaphore is the variable which stores all the wake-up calls that are being transferred from producer to consumer. It is a variable on which read, modify and update operations happen atomically in kernel mode.

A semaphore cannot be implemented in user mode because a race condition may always arise when two or more processes try to access the variable simultaneously. It always needs support from the operating system to be implemented.

According to the demands of the situation, semaphores can be divided into two categories:

• Counting semaphore
• Binary semaphore (mutex)

A minimal sketch using a counting semaphore follows.
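As a minimal sketch (assuming a C++20 compiler), std::counting_semaphore can play the role of the variable that stores wake-up calls: release() stores a wake-up call, and acquire() consumes one, blocking when none are stored.

#include <iostream>
#include <semaphore>
#include <thread>

// items: counts the wake-up calls stored by the producer (initially 0).
std::counting_semaphore<> items(0);

int main() {
    std::thread consumer([] {
        for (int i = 0; i < 3; ++i) {
            items.acquire();              // wait(): consume one stored wake-up call
            std::cout << "consumed item " << i << '\n';
        }
    });
    std::thread producer([] {
        for (int i = 0; i < 3; ++i)
            items.release();              // signal(): store one wake-up call
    });
    producer.join();
    consumer.join();
}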

Round Robin Scheduling Algorithm

In this section, we are going to learn about one of the most widely used CPU process scheduling algorithms, Round Robin CPU process scheduling. This algorithm is special because it removes several of the flaws we detected in the previous CPU process scheduling algorithms.

A large part of Round Robin's popularity comes from the fact that it works in a preemptive manner. This makes it very reliable.

Important Abbreviations

1. CPU - - - > Central Processing Unit
2. AT - - - > Arrival Time
3. BT - - - > Burst Time
4. WT - - - > Waiting Time
5. TAT - - - > Turn Around Time
6. CT - - - > Completion Time
7. FIFO - - - > First In First Out
8. TQ - - - > Time Quantum

Round Robin CPU Scheduling

Round Robin CPU scheduling is among the most important CPU scheduling algorithms in the history of CPU scheduling. Round Robin CPU scheduling uses a Time Quantum (TQ). The time quantum is a fixed slice of time that is deducted from a process's remaining burst time, letting a chunk of the process be completed in each turn.

Time sharing is the main emphasis of the algorithm. Each step of this algorithm is carried out cyclically. The system defines a specific time slice, known as a time quantum.

First, the processes which are eligible to enter the ready queue do so. The first process in the ready queue is then executed for one time quantum. After that execution, the process is removed from the front of the ready queue; if the process still requires more time to complete, it is added back to the end of the ready queue.

The ready queue does not hold duplicate entries for a process that is already present in it. The ready queue is designed in such a manner that it holds only unique processes, since holding the same process twice would introduce redundancy.

Once a process's execution is complete, the ready queue does not take the completed process back.

Advantages

The advantages of Round Robin CPU scheduling are:

• A fair amount of CPU time is allocated to each job.
• Because it doesn't depend on knowing the burst time in advance, it can truly be implemented in a real system.
• It is not affected by the convoy effect or the starvation problem that occur in the First Come First Serve CPU scheduling algorithm.

Disadvantages

The disadvantages of Round Robin CPU scheduling are:

• Small operating system time slices result in decreased CPU output because of more frequent context switches.
• The Round Robin CPU scheduling approach spends more time swapping contexts.
• The time quantum has a significant impact on its performance.
• Priorities cannot be established for the procedures.

Examples:

S. No   Process ID   Arrival Time   Burst Time
1       P1           0              7
2       P2           1              4
3       P3           2              15
4       P4           3              11
5       P5           4              20
6       P6           4              9

Assume Time Quantum TQ = 5

Ready Queue:

P1, P2, P3, P4, P5, P6, P1, P3, P4, P5, P6, P3, P4, P5, P5

Gantt chart: P1 (0-5), P2 (5-9), P3 (9-14), P4 (14-19), P5 (19-24), P6 (24-29), P1 (29-31), P3 (31-36), P4 (36-41), P5 (41-46), P6 (46-50), P3 (50-55), P4 (55-56), P5 (56-61), P5 (61-66)
Average Completion Time

1. Average Completion Time = ( 31 + 9 + 55 + 56 + 66 + 50 ) / 6
2. Average Completion Time = 267 / 6
3. Average Completion Time = 44.5

Average Waiting Time

1. Average Waiting Time = ( 24 + 4 + 38 + 42 + 42 + 37 ) / 6
2. Average Waiting Time = 187 / 6
3. Average Waiting Time = 31.16667

Average Turn Around Time

1. Average Turn Around Time = ( 31 + 8 + 53 + 53 + 62 + 46 ) / 6
2. Average Turn Around Time = 253 / 6
3. Average Turn Around Time = 42.16667
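The following C++ sketch simulates Round Robin with TQ = 5 on the six processes above and reproduces the completion times used in these calculations. It assumes, as in this example, that arrivals occurring during a quantum join the queue before the preempted process and that the CPU is never idle.

#include <cstdio>
#include <queue>

int main() {
    const int n = 6, TQ = 5;                       // time quantum from the example
    int at[] = {0, 1, 2, 3, 4, 4};                 // arrival times
    int bt[] = {7, 4, 15, 11, 20, 9};              // burst times
    int rem[n], ct[n];
    for (int i = 0; i < n; ++i) rem[i] = bt[i];

    std::queue<int> ready;
    ready.push(0);                                 // P1 arrives at t = 0
    int t = 0, next = 1;                           // index of next arrival
    while (!ready.empty()) {
        int i = ready.front(); ready.pop();
        int run = rem[i] < TQ ? rem[i] : TQ;       // run one quantum or less
        t += run; rem[i] -= run;
        while (next < n && at[next] <= t) ready.push(next++);  // admit arrivals
        if (rem[i] > 0) ready.push(i);             // unfinished: back of the queue
        else ct[i] = t;                            // finished: record completion
    }
    for (int i = 0; i < n; ++i)
        printf("P%d CT=%d TAT=%d WT=%d\n",
               i + 1, ct[i], ct[i] - at[i], ct[i] - at[i] - bt[i]);
}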

Shortest Remaining Time First (SRTF) Scheduling Algorithm

This algorithm is the preemptive version of SJF scheduling. In SRTF, the execution of a process can be stopped after a certain amount of time. At the arrival of every process, the short term scheduler schedules, from among the available processes and the running process, the one with the least remaining burst time.

Once all the processes are available in the ready queue, no further preemption is done and the algorithm works as SJF scheduling. The context of a process is saved in its Process Control Block when the process is removed from execution and the next process is scheduled. This PCB is accessed the next time the process is executed.

Example

In this example, there are six jobs P1, P2, P3, P4, P5 and P6. Their arrival times and burst times are given in the table below.

Process ID   Arrival Time   Burst Time   Completion Time   Turn Around Time   Waiting Time   Response Time
P1           0              8            20                20                 12             0
P2           1              4            10                9                  5              0
P3           2              2            4                 2                  0              0
P4           3              1            5                 2                  1              1
P5           4              3            13                9                  6              6
P6           5              2            7                 2                  0              0

Avg Waiting Time = 24/6

The Gantt chart is prepared according to the arrival and burst times given in the table: P1 (0-1), P2 (1-2), P3 (2-4), P4 (4-5), P6 (5-7), P2 (7-10), P5 (10-13), P1 (13-20).

1. Since, at time 0, the only available process is P1 with CPU burst time 8. This is the
only available process in the list therefore it is scheduled.
2. The next process arrives at time unit 1. Since the algorithm we are using is SRTF which
is a preemptive one, the current execution is stopped and the scheduler checks for the
process with the least burst time.
Till now, there are two processes available in the ready queue. The OS has executed
P1 for one unit of time till now; the remaining burst time of P1 is 7 units. The burst
time of Process P2 is 4 units. Hence Process P2 is scheduled on the CPU according to
the algorithm.
3. The next process P3 arrives at time unit 2. At this time, the execution of process P3 is
stopped and the process with the least remaining burst time is searched. Since the
process P3 has 2 unit of burst time hence it will be given priority over others.
4. The Next Process P4 arrives at time unit 3. At this arrival, the scheduler will stop the
execution of P4 and check which process is having least burst time among the available
processes (P1, P2, P3 and P4). P1 and P2 are having the remaining burst time 7 units
and 3 units respectively.
5. P3 and P4 are having the remaining burst time 1 unit each. Since, both are equal hence
the scheduling will be done according to their arrival time. P3 arrives earlier than P4
and therefore it will be scheduled again.The Next Process P5 arrives at time unit 4. Till
this time, the Process P3 has completed its execution and it is no more in the list. The
scheduler will compare the remaining burst time of all the available processes. Since
the burst time of process P4 is 1 which is least among all hence this will be scheduled.

67
6. The Next Process P6 arrives at time unit 5, till this time, the Process P4 has completed
its execution. We have 4 available processes till now, that are P1 (7), P2 (3), P5 (3) and
P6 (2). The Burst time of P6 is the least among all hence P6 is scheduled. Since, now,
all the processes are available hence the algorithm will now work same as SJF. P6 will
be executed till its completion and then the process with the least remaining time will
be scheduled.

Once all the processes have arrived, no preemption is done and the algorithm works as SJF; a small simulation reproducing the table above is sketched below.
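As an illustration, the following C++ sketch simulates SRTF one time unit at a time and reproduces the table above; ties in remaining time are broken in favor of the earlier arrival, as in the walkthrough.

#include <cstdio>

int main() {
    const int n = 6;
    int at[] = {0, 1, 2, 3, 4, 5};      // arrival times from the example
    int bt[] = {8, 4, 2, 1, 3, 2};      // burst times from the example
    int rem[n], ct[n], done = 0;
    for (int i = 0; i < n; ++i) rem[i] = bt[i];

    for (int t = 0; done < n; ++t) {
        int pick = -1;
        // Choose the arrived, unfinished process with the least remaining time;
        // the strict comparison means ties go to the earlier arrival.
        for (int i = 0; i < n; ++i)
            if (at[i] <= t && rem[i] > 0 && (pick == -1 || rem[i] < rem[pick]))
                pick = i;
        if (pick == -1) continue;        // CPU idle: nothing has arrived yet
        if (--rem[pick] == 0) { ct[pick] = t + 1; ++done; }
    }
    for (int i = 0; i < n; ++i)
        printf("P%d CT=%d TAT=%d WT=%d\n",
               i + 1, ct[i], ct[i] - at[i], ct[i] - at[i] - bt[i]);
}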

Highest Response Ratio Next (HRRN) Scheduling

Highest Response Ratio Next (HRRN) is one of the most optimal scheduling algorithms. It is a non-preemptive algorithm in which the scheduling is done on the basis of an extra parameter called the response ratio. A response ratio is calculated for each of the available jobs, and the job with the highest response ratio is given priority over the others.

The response ratio is calculated by the given formula:

Response Ratio = (W + S) / S

where
W → Waiting Time
S → Service Time or Burst Time

If we look at the formula, we notice that a job with a shorter burst time is given priority, but the formula also includes an extra factor, the waiting time. Since

HRRN ∝ W
HRRN ∝ (1/S)

this algorithm not only favors shorter jobs but also takes into account the waiting time of longer jobs. Its mode is non-preemptive, hence context switching is minimal in this algorithm. A small sketch of the response-ratio computation follows.
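A small C++ sketch of the selection step follows; the waiting and service times are hypothetical values chosen only to illustrate the formula.

#include <cstdio>

// Response Ratio = (W + S) / S, computed for the jobs waiting at one instant.
int main() {
    int W[] = {9, 2, 5};    // hypothetical waiting times
    int S[] = {3, 6, 5};    // hypothetical service (burst) times
    int best = -1;
    double bestRatio = 0.0;
    for (int i = 0; i < 3; ++i) {
        double ratio = (W[i] + S[i]) / static_cast<double>(S[i]);
        printf("job %d response ratio = %.2f\n", i + 1, ratio);
        if (ratio > bestRatio) { bestRatio = ratio; best = i; }
    }
    printf("HRRN schedules job %d next\n", best + 1);
}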

Chapter 5 Deadlocks in OS

What is Deadlock in Operating System (OS)?

Every process needs some resources to complete its execution. However, a resource is granted in a sequential order:

1. The process requests some resource.
2. The OS grants the resource if it is available; otherwise, the process waits.
3. The process uses the resource and releases it on completion.

A deadlock is a situation where each process in a set waits for a resource which is assigned to some other process. In this situation, none of the processes gets executed, since the resource each one needs is held by some other process which is itself waiting for yet another resource to be released.

Let us assume that there are three processes P1, P2 and P3, and three different resources R1, R2 and R3. R1 is assigned to P1, R2 is assigned to P2 and R3 is assigned to P3.

After some time, P1 demands R2, which is being used by P2. P1 halts its execution since it can't complete without R2. P2 in turn demands R3, which is being used by P3. P2 also stops its execution because it can't continue without R3. P3 in turn demands R1, which is being used by P1, therefore P3 also stops its execution.

In this scenario, a cycle is formed among the three processes. None of the processes is progressing and they are all waiting. The computer becomes unresponsive since all the processes are blocked.

Difference between Starvation and Deadlock

Deadlock                                           Starvation

Deadlock is a situation where every process       Starvation is a situation where the low priority
is blocked and no process proceeds.               processes get blocked while the high priority
                                                  processes proceed.

Deadlock is an infinite waiting.                  Starvation is a long waiting, but not infinite.

Every deadlock is always a starvation.            Every starvation need not be a deadlock.

The requested resource is blocked by              The requested resource is continuously being
another process.                                  used by the higher priority processes.

Deadlock happens when mutual exclusion,           It occurs due to uncontrolled priority and
hold and wait, no preemption and circular         resource management.
wait occur simultaneously.

Necessary conditions for Deadlocks

1. Mutual Exclusion: A resource can only be shared in a mutually exclusive manner. It implies that two processes cannot use the same resource at the same time.
2. Hold and Wait: A process waits for some resources while holding another resource at the same time.
3. No Preemption: A resource, once allocated to a process, cannot be forcibly taken away; the process releases it only voluntarily after completing its task.
4. Circular Wait: All the processes must wait for resources in a cyclic manner, so that the last process is waiting for the resource which is being held by the first process.

Strategies for Handling Deadlock

Deadlock ignorance is the most widely used approach among all the mechanisms. It is used by many operating systems, mainly for end user systems. In this approach, the operating system assumes that deadlock never occurs; it simply ignores deadlock. This approach is best suited to a single end user system where the user uses the system only for browsing and other everyday tasks.

There is always a tradeoff between correctness and performance. Operating systems like Windows and Linux mainly focus on performance. The performance of the system would decrease if it used a deadlock handling mechanism all the time; if deadlock happens, say, 1 time out of 100, then using the deadlock handling mechanism all the time is completely unnecessary.

In these types of systems, the user simply has to restart the computer in the case of deadlock. Windows and Linux mainly use this approach.

2. Deadlock prevention

Deadlock happens only when mutual exclusion, hold and wait, no preemption and circular wait hold simultaneously. If it is possible to violate one of the four conditions at all times, then deadlock can never occur in the system.

The idea behind the approach is very simple: we have to defeat one of the four conditions, although there can be a big argument about its physical implementation in the system.

3. Deadlock avoidance

In deadlock avoidance, the operating system checks whether the system is in a safe state or an unsafe state at every step it performs. The process continues as long as the system remains in a safe state. Once the system moves to an unsafe state, the OS has to backtrack one step.

In simple words, the OS reviews each allocation so that the allocation doesn't cause deadlock in the system.

4. Deadlock detection and recovery

This approach lets the processes fall into deadlock and then periodically checks whether deadlock has occurred in the system. If it has, the OS applies some recovery methods to the system to get rid of the deadlock.

We will discuss deadlock detection and recovery later in more detail.

Deadlock avoidance

In deadlock avoidance, a request for any resource will be granted only if the resulting state of the system doesn't cause deadlock in the system. The state of the system is continuously checked for safe and unsafe states.

In order to avoid deadlocks, each process must tell the OS the maximum number of resources it may request to complete its execution.

The simplest and most useful approach states that the process should declare the maximum number of resources of each type it may ever need. The deadlock avoidance algorithm examines the resource allocations so that there can never be a circular wait condition.

Safe and Unsafe States

The resource allocation state of a system can be defined by the instances of available and
allocated resources, and the maximum instance of the resources demanded by the
processes.

A state of a system recorded at some random time is shown below.

Resources Assigned

Process Type 1 Type 2 Type 3 Type 4

A 3 0 2 2

B 0 0 1 1

C 1 1 1 0

D 2 1 4 0

Resources still needed

Process   Type 1   Type 2   Type 3   Type 4

A         1        1        0        0

B         0        1        1        2

C         1        2        1        0

D         2        1        1        2

1. E = (7 6 8 4)
2. P = (6 2 8 3)
3. A = (1 4 0 1)

The above tables and the vectors E, P and A describe the resource allocation state of the system. There are 4 processes and 4 types of resources in the system. Table 1 shows the instances of each resource assigned to each process.

Table 2 shows the instances of the resources each process still needs. Vector E represents the total instances of each resource in the system.

Vector P represents the instances of resources that have been assigned to processes. Vector A represents the number of resources that are not in use.

A state of the system is called safe if the system can allocate all the resources requested by all the processes without entering into deadlock.

If the system cannot fulfill the requests of all processes, then the state of the system is called unsafe.

The key to the deadlock avoidance approach is that a request must only be approved if the resulting state is also a safe state. A sketch of this safety check, run on the numbers above, follows.
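The following C++ sketch is a minimal illustration of the safety check, in the spirit of the Banker's algorithm, run on the tables above: it repeatedly looks for a process whose remaining need fits within the available vector, "runs" it, and reclaims its allocation.

#include <cstdio>

int main() {
    const int n = 4, m = 4;                    // 4 processes, 4 resource types
    // "Resources Assigned" and "Resources still needed" tables from above.
    int alloc[n][m] = {{3,0,2,2},{0,0,1,1},{1,1,1,0},{2,1,4,0}};
    int need[n][m]  = {{1,1,0,0},{0,1,1,2},{1,2,1,0},{2,1,1,2}};
    int avail[m]    = {1,4,0,1};               // vector A
    bool finished[n] = {false, false, false, false};
    const char* name = "ABCD";

    for (int round = 0; round < n; ++round) {
        int pick = -1;
        for (int i = 0; i < n && pick == -1; ++i) {
            if (finished[i]) continue;
            bool ok = true;                    // can i's remaining need be met?
            for (int j = 0; j < m; ++j)
                if (need[i][j] > avail[j]) { ok = false; break; }
            if (ok) pick = i;
        }
        if (pick == -1) { printf("unsafe state\n"); return 0; }
        for (int j = 0; j < m; ++j) avail[j] += alloc[pick][j];  // reclaim
        finished[pick] = true;
        printf("run %c, available now (%d %d %d %d)\n",
               name[pick], avail[0], avail[1], avail[2], avail[3]);
    }
    printf("safe state\n");
}

Running this sketch finds the safe sequence A, B, C, D, ending with the available vector equal to E = (7 6 8 4), so the state shown above is safe.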

Resource Allocation Graph

The resource allocation graph is a pictorial representation of the state of a system. As its name suggests, the resource allocation graph contains complete information about all the processes which are holding some resources or waiting for some resources.

It also contains information about all the instances of all the resources, whether they are available or being used by the processes.

In a resource allocation graph, a process is represented by a circle while a resource is represented by a rectangle. Let's see the types of vertices and edges in detail.

Vertices are mainly of two types, resource and process, and each is represented by a different shape: a circle represents a process while a rectangle represents a resource.

A resource can have more than one instance. Each instance is represented by a dot inside the rectangle.

Edges in a RAG are also of two types: one represents assignment and the other represents the wait of a process for a resource.

A resource is shown as assigned to a process if the tail of the arrow is attached to an instance of the resource and the head is attached to a process.

A process is shown as waiting for a resource if the tail of the arrow is attached to the process while the head points towards the resource.

Example

Let's consider 3 processes P1, P2 and P3, and two types of resources R1 and R2. The resources have 1 instance each.

According to the graph, R1 is being used by P1, P2 is holding R2 and waiting for R1, and P3 is waiting for R1 as well as R2.

The graph is deadlock free since no cycle is being formed in it.

Deadlock Detection using RAG

If a cycle is formed in a resource allocation graph where all the resources have a single instance, then the system is deadlocked.

In the case of a resource allocation graph with multi-instance resource types, a cycle is a necessary condition for deadlock but not a sufficient one.

The following example contains three processes P1, P2, P3 and three resources R1, R2, R3. All the resources have a single instance each.

If we analyze the graph, we can find that a cycle is formed, and since the system satisfies all four conditions of deadlock, the system is deadlocked.

Allocation Matrix

The allocation matrix can be formed by using the resource allocation graph of a system. In the allocation matrix, an entry is made for each assigned resource. For example, in the following matrix, an entry is made in the row of P1 and the column of R3, since R3 is assigned to P1.

Process R1 R2 R3

P1 0 0 1

P2 1 0 0

P3 0 1 0

Request Matrix

In the request matrix, an entry is made for each requested resource. For example, P1 needs R1, therefore an entry is made in the row of P1 and the column of R1.
Process R1 R2 R3

P1 1 0 0

P2 0 1 0

P3 0 0 1

Avail = (0, 0, 0)

Neither do we have any resource available in the system, nor is any process going to release one. Each of the processes needs at least one more resource to complete, therefore they will keep holding the resources they have.

We cannot fulfill the demand of even one process using the available resources, therefore the system is deadlocked, as determined earlier when we detected a cycle in the graph. A sketch of this detection step on the matrices above follows.
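As an illustration, the following C++ sketch applies the same idea to the allocation and request matrices above with Avail = (0, 0, 0); since no process's request can be granted, every process remains unfinished and is reported as deadlocked.

#include <cstdio>

int main() {
    const int n = 3, m = 3;
    // Allocation and Request matrices from the tables above; Avail = (0,0,0).
    int alloc[n][m]   = {{0,0,1},{1,0,0},{0,1,0}};
    int request[n][m] = {{1,0,0},{0,1,0},{0,0,1}};
    int avail[m]      = {0,0,0};
    bool finished[n]  = {false, false, false};

    bool progress = true;
    while (progress) {
        progress = false;
        for (int i = 0; i < n; ++i) {
            if (finished[i]) continue;
            bool ok = true;                    // can i's request be granted now?
            for (int j = 0; j < m; ++j)
                if (request[i][j] > avail[j]) { ok = false; break; }
            if (!ok) continue;
            for (int j = 0; j < m; ++j) avail[j] += alloc[i][j];  // release
            finished[i] = true;
            progress = true;
        }
    }
    for (int i = 0; i < n; ++i)
        if (!finished[i]) printf("P%d is deadlocked\n", i + 1);
}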

Deadlock Detection and Recovery

In this approach, the OS doesn't apply any mechanism to avoid or prevent deadlocks. Therefore, the system assumes that a deadlock will definitely occur. In order to get rid of deadlocks, the OS periodically checks the system for any deadlock. If it finds one, the OS recovers the system using some recovery techniques.

The main task of the OS here is detecting deadlocks. The OS can detect deadlocks with the help of the resource allocation graph.

For single-instance resource types, if a cycle is formed in the graph then there will definitely be a deadlock. On the other hand, for a graph with multiple-instance resource types, detecting a cycle is not enough. We have to apply the safety algorithm to the system by converting the resource allocation graph into the allocation matrix and the request matrix.

In order to recover the system from deadlock, the OS considers either resources or processes.

For Resources

Preempt the resource

We can snatch one of the resources from its owner (a process) and give it to another process, with the expectation that that process will complete its execution and release the resource sooner. Choosing which resource to snatch, however, is going to be a bit difficult.

Rollback to a safe state

The system passes through various states before getting into the deadlock state. The operating system can roll back the system to a previous safe state. For this purpose, the OS needs to implement checkpointing at every state.

The moment we get into deadlock, we roll back all the allocations to get into the previous safe state.

For Processes

Kill a process

Killing a process can solve our problem, but the bigger concern is deciding which process to kill. Generally, the operating system kills the process which has done the least amount of work so far.

Kill all processes

This is not an advisable approach, but it can be used if the problem becomes very serious. Killing all processes leads to inefficiency in the system because all the processes will have to execute again from the start.

Chapter 6 Commonly used OS

Proprietary OS: Windows

History: Microsoft Windows, often referred to simply as Windows, is a series of graphical operating systems developed and marketed by Microsoft. It traces its roots back to the early 1980s, when Microsoft started working on a graphical user interface (GUI) extension for the MS-DOS operating system. The first version, Windows 1.0, was released in 1985. Since then, Windows has evolved through numerous versions, each bringing new features, improvements, and advancements in technology.

Components and Features: Windows consists of several core components, including the
kernel, device drivers, user interface components, and system libraries. The kernel is the
heart of the operating system, responsible for managing hardware resources, memory, and
process scheduling. The user interface components include the Windows Shell, which
provides the desktop environment and graphical elements like icons and windows.

Memory Management: Windows employs virtual memory management, allowing multiple processes to run concurrently even if physical RAM is limited. It uses a paging system that swaps data between RAM and disk as needed, optimizing system performance.

File System: Windows supports the New Technology File System (NTFS) as its primary
file system. NTFS offers features like file and folder permissions, encryption, and
journaling for improved reliability. This file system is known for its robustness and support
for large storage volumes.

Benefits and Limitations: Windows offers a wide range of software compatibility due to
its popularity. It provides a familiar user interface, extensive driver support for hardware,
and compatibility with a vast array of applications. However, it has faced criticism for its
susceptibility to malware and security vulnerabilities, as well as concerns about user
privacy.

macOS:

History: The macOS operating system, previously known as Mac OS X, was developed by Apple Inc. It was built on a Unix-based foundation called NeXTSTEP, which Apple acquired from NeXT in 1996; this formed the basis for macOS, which was first released in 2001.

Components and Features: macOS includes a graphical user interface known for its
elegance and simplicity. It has a core based on the Darwin operating system, which is a
Unix-like foundation. The Aqua user interface is known for its distinctive look and feel,
characterized by its use of translucent elements and visual effects.

Memory Management: Similar to other modern operating systems, macOS utilizes virtual
memory management to allow multiple processes to run concurrently while efficiently
using physical memory.

File System: macOS originally used the Hierarchical File System (HFS) and then HFS+,
but it transitioned to the Apple File System (APFS) in recent versions. APFS offers features
like snapshot support, enhanced data integrity, and improved performance.

Benefits and Limitations: macOS is praised for its sleek design, ease of use, and strong
integration with Apple's hardware ecosystem. It is known for its focus on security and
privacy. However, it is limited by its hardware exclusivity to Apple devices and a narrower
range of software compared to Windows.

OS Security: Vulnerabilities, Threats, and Countermeasures: Both Windows and macOS have faced security challenges over the years. Vulnerabilities in the form of software bugs, coding errors, and design flaws can expose the system to various threats, including viruses, worms, malware, and unauthorized access.

Countermeasures: To mitigate these security risks, both Microsoft and Apple regularly
release updates and patches to address vulnerabilities. They also incorporate security
features like firewall protection, antivirus software integration, encryption mechanisms,
and user account controls. Users are encouraged to keep their systems updated and follow
best practices to minimize risks.

In summary, Windows and macOS are two major proprietary operating systems with
distinct histories, components, features, memory management techniques, and file systems.
While they offer various benefits, they also have limitations and face security challenges
that require ongoing vigilance and countermeasures to ensure the safety and functionality
of the systems.

Open-source OS: Linux

History: Linux is an open-source operating system kernel created by Linus Torvalds in 1991. It is modeled on the Unix operating system and was initially developed as a hobby project. Over time, it evolved into a robust and versatile operating system used in various computing environments.

Components and Features: Linux consists of the kernel, which manages hardware
resources, process scheduling, and memory management. On top of the kernel, various
components are combined to create complete operating systems, often referred to as Linux
distributions or distros. These distributions include user interfaces, system libraries,
utilities, and application software.

Devices and Drivers: Linux has strong support for a wide range of devices and hardware
architectures. The kernel includes drivers for many common hardware components, and
the open-source nature of Linux encourages the community to develop drivers for a diverse
array of devices.

Memory Management: Linux uses a virtual memory system similar to other modern
operating systems. It employs techniques like demand paging and memory mapping to
optimize memory usage and allow efficient multitasking.

File System: Linux supports a variety of file systems, including ext4, XFS, and Btrfs.
These file systems offer features such as journaling for data integrity, efficient storage
allocation, and support for large storage volumes.

Benefits and Limitations: Linux's open-source nature brings numerous benefits. The
community-driven development model encourages collaboration, rapid innovation, and
widespread adoption. Linux is highly customizable, making it suitable for a wide range of
applications from embedded systems to servers. It is known for its stability, security, and
efficient resource utilization.

However, Linux also has its limitations. Its diverse ecosystem can lead to fragmentation,
with various distributions having different approaches and package management systems.
Additionally, some users may find the learning curve steep due to its command-line
interface and configuration options.

OS Security: Vulnerabilities, Threats, and Countermeasures: Linux, like all operating systems, is susceptible to security vulnerabilities. However, its open-source nature allows vulnerabilities to be discovered and patched quickly by the community. Linux distributions often provide timely updates to address security issues.

Countermeasures: Linux's security is bolstered by its strong community involvement,
rapid patching, and security-focused distributions. Tools like SELinux (Security-Enhanced
Linux) provide additional layers of access control, and the open-source nature allows
security experts to review the code for potential vulnerabilities.

In conclusion, Linux is a powerful open-source operating system known for its flexibility,
security, and customization. Its components, features, and diverse ecosystem have led to
its widespread adoption across various computing domains. While it presents benefits,
users should be aware of its potential limitations and practice security measures to ensure
a safe and efficient computing experience.

Database Management Systems

Unit VII: Database Management System

Data are the values that we physically record in a database. The meaning assigned to the data is called information.

A database is a computer-based record-keeping system whose overall purpose is to record and maintain information.

A database system involves four major components: data, hardware, software, and users.

1. Data
The data stored in the system is partitioned into one or more databases. For tutorial
purposes it is usually convenient to assume that there is just one database, containing the
totality of all stored data in the system.

A database, then, is a repository for stored data; in general, it is both integrated and shared.

Integrated

By "integrated" we mean that the database may be thought of as a unification of several otherwise distinct data files, with any redundancy among those files partially or wholly eliminated.

Shared

By “shared” we mean that individual pieces of data in the database may be shared among
several different users, in the sense that each of those users may have access to the same
piece of data and may use it for different purposes.

2. Hardware:

The hardware consists of the secondary storage volumes (disks, drums, etc.) on which the database resides, together with the associated devices, control units, channels, and so forth. (We assume that the database is too large to be held in its entirety within the computer's primary storage.)
3. Software:

Between the physical database itself (i.e. the data as actually stored) and the users of the
system is a layer of software, usually called the database management system or DBMS.
All requests from users for access to the database are handled by the DBMS.

Queries from users (for example, from COBOL or PL/I application programs) are handled by the DBMS, which in turn uses the operating system to access the stored database.

4. Users:
Three broad classes of user are considered:
(1) Application programmer.
(2) End-user.
(3) Database administrator.

(1) Application programmer:

The application programmer is responsible for writing application programs that use the database, typically in a language such as COBOL or PL/I. These application programs operate on the data in all the usual ways: retrieving information, creating new information, and deleting or changing existing information. (All these functions are performed by issuing the appropriate requests to the DBMS.)

(2) End-user:
The second class of user, the end-user, may employ a query language provided as an integral part of the system.

(3) Data base Administrator:

The third class of user is the database administrator, or DBA. The DBA is a person or group of persons responsible for the overall performance of the system.

Conventional purpose of DBMS:

Conventional files (also called permanent files; e.g., COBOL files and other non-database files) can be illustrated by part of a savings-bank enterprise that keeps information about all customers and savings accounts in permanent system files at the bank.

The system has a number of application programs that allow the user to manipulate the files, including:

1. A program to add a new record.
2. A program to find the balance of an account.
3. A program to generate the monthly statement.
4. A program to debit and credit an account.

New application programs are added to the system as the need arises. As time goes on, more files and more application programs are added to the system. The typical file-processing system described above is supported by a conventional file system (these are non-database files).

This scheme has a number of disadvantages:

1. Mainly, we cannot share the data among applications.


2. Difficulty in accessing the data. For example, suppose a bank officer needs to find the names of all customers who live within the city PIN code 462021. The officer asks the data-processing department to generate such a list. Since this request was not anticipated when the original system was designed, the bank officer has two choices:

1. either get the list of all customers and extract the needed information manually; or
2. ask the data-processing department to have a system programmer write the necessary application program.

Both alternatives are absolutely unsatisfactory.

3. Data redundancy and inconsistency:

Since the files and application programs are created by different programmers over a long period of time, the files are likely to have different formats, and the programs may be written in several programming languages. Moreover, the same piece of information may be duplicated in several places. The conventional file-processing environment also does not allow needed data to be retrieved in a convenient and efficient manner, because query facilities such as SQL (Structured Query Language) are not available as they are in a database. All this leads to higher storage and access costs.

4. Concurrent access anomalies:

In order to improve the overall performance of the system and obtain a faster response time, many systems allow multiple users to update the data simultaneously. In such an environment, the interaction of concurrent updates may result in inconsistent data. Consider a bank account X containing $500. If two customers withdraw funds, say $100 and $50, from account X at about the same time, the concurrent execution may leave the account in an incorrect (inconsistent) state: the account may contain either $450 or $400 rather than the correct $350.
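The following minimal Python sketch (hypothetical code, not part of the banking system described above) reproduces this lost-update anomaly with two threads that both read the balance before either writes it back:

    import threading
    import time

    balance = 500

    def withdraw(amount):
        global balance
        current = balance            # both threads read the same $500 ...
        time.sleep(0.01)             # ... and hold the stale value here
        balance = current - amount   # one write overwrites the other

    t1 = threading.Thread(target=withdraw, args=(100,))
    t2 = threading.Thread(target=withdraw, args=(50,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(balance)                   # typically 450 or 400, not the correct 350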

5. Data isolation:
Since the data is scattered in various files, and the files may be in different formats, it is difficult to write new application programs to retrieve the data.

6. Integrity problem:

Data values stored in the database must satisfy certain constraints; for example, the balance of a bank account may never fall below a prescribed amount, say $25. Such constraints are enforced by adding code to the various application programs.

7. Security problem:

Not every user of the database system should be able to access all the data. For example, in a banking system, payroll personnel need to see only that part of the database that holds information about bank employees; they do not need access to information about customer accounts. It is difficult to enforce such security constraints in a conventional file system.

8. Lack of flexibility.

9. Data dependencies

Unit VII DBMS Design

Models / Approaches / Types of DBMS

There are three principal types: the relational model, the hierarchical model, and the network model.

Relational DBMS:

Data, and the relationships among the data, are represented by a collection of tables, each of which has a number of columns with unique names.

1. Supply table (S):

S# S NAME STATUS CITY

S1 Smith 20 London

S2 John 10 Paris

S3 Black 30 Paris

2. Part table (P):

P#  P NAME  COLOUR  WEIGHT  CITY
P1  Nut     Red     12      London
P2  Bolt    Green   17      Paris
P3  Screw   Blue    17      Rome
P4  Screw   Red     14      London

3. Supply and Part combine to make table SP:

S# P# QTY

S1 P1 300
S1 P2 200
S1 P3 400
S2 P1 300
S2 P2 400
S3 P2 200

Common terms used in relational database:

• Rows are generally referred to as tuples, and columns are referred to as attributes.

• The number of columns is called the arity or degree of the relation.

• Cardinality: the number of tuples in a table is called its cardinality.

• Domain: a domain is a pool of values from which the actual values appearing in a given column are drawn; the set of possible values that a given attribute can have is called its domain. It is possible for different attributes to share the same domain, e.g., GUEST.Soc_Sec_No and EMPLOYEE.Soc_Sec_No.

Several operations:

1. INSERT: given information concerning a new supplier (S4) in work area W:
   INSERT tuple from W into relation S.

2. DELETE: delete the shipment connecting part P2 and supplier S3:
   DELETE the SP tuple where P# = 'P2' and S# = 'S3'.

3. UPDATE: supplier S1 has moved from London to Delhi:
   UPDATE the S tuple where S# = 'S1', setting CITY to 'Delhi'.

Note: TUPLE, RECORD (row) and INSTANCE mean the same thing.
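To make these operations concrete, here is a minimal Python sketch (an illustrative representation only; relational systems do not actually store tables this way) that models S and SP as lists of dictionaries, one dictionary per tuple:

    # Hypothetical in-memory versions of the S and SP relations.
    S = [
        {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
        {"S#": "S2", "SNAME": "John",  "STATUS": 10, "CITY": "Paris"},
    ]
    SP = [{"S#": "S3", "P#": "P2", "QTY": 200}]

    # INSERT: add a tuple for the new supplier S4.
    S.append({"S#": "S4", "SNAME": "Clark", "STATUS": 20, "CITY": "London"})

    # DELETE: remove the SP tuple where P# = 'P2' and S# = 'S3'.
    SP = [t for t in SP if not (t["P#"] == "P2" and t["S#"] == "S3")]

    # UPDATE: supplier S1 has moved from London to Delhi.
    for t in S:
        if t["S#"] == "S1":
            t["CITY"] = "Delhi"

    print(S, SP)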

Keys:

1. Primary key:

In a relation there is an attribute whose value is unique within the relation and can therefore be used to identify the tuples of that relation. That attribute is called the primary key. For example, in the relation P the primary key is P#. A primary key can also be a combination of two attributes, e.g., (S#, P#) in the SP table.

2. Super key:

If we add additional attributes to a primary key, the resulting combination still uniquely identifies an instance of the entity set. Such a key is called a super key; a primary key is therefore a minimal super key. For example, in a bank, even though each customer has a unique Soc_Sec_No, the bank uses a unique identifier called Account_Number to identify each account, because a customer may have more than one account of the same type (for example, two current accounts and three savings accounts). The attribute Account_Number is the better choice for the primary key of ACCOUNT.

3. Candidate key:

A relation may have more than one attribute (or attribute combination) possessing the unique-identification property; in that case the relation has more than one candidate key. For example, in the supplier table the attributes S# and SNAME are both candidate keys (assuming supplier names are unique).

4. Alternate key:

An attribute which can be used for unique identification but is not the chosen primary key is called an alternate key.

5. Secondary key:

A secondary key is an attribute or combination of attributes that may not be a candidate key but that classifies the entity set on a particular characteristic. For example, EMPLOYEE may have the attribute Department, which identifies all instances of employees who belong to a given department. More than one employee may belong to a department, so Department is not a candidate key for the EMPLOYEE entity set.

6. Foreign key:

If the values of an attribute M of one table also occur as values of the primary key K of another table, both being drawn from the same domain, then M is called a foreign key.

For example:

Teacher table:

T#   T NAME     Qualification  C#
T1   Pinky      B.Sc.          BI
T24  Mr. Kumar  M.Sc.          BII

Class table:

C#   CNAME  R. No.  Capacity
BII  12     12340   20
BI   11     1462    30

Here C# in the Teacher table is the foreign key: its values are drawn from C#, the primary key of the Class table.

Integrity Rules:
1. No component of a primary key may be null (unknown). This is known as entity integrity.

2. Referential integrity: let D be a primary domain, and let R1 be a relation with an attribute A that is defined on D. Then, at any given time, each value of A in R1 must be either (a) null, or (b) equal to v, say, where v is the primary-key value of some tuple of some relation R2 (R1 and R2 not necessarily distinct) whose primary key is defined on D.

Hierarchical model:

In this model, data is organized as a collection of trees. A tree is composed of a hierarchy of elements called nodes, with one specially designated node called the root. With the exception of the root, every node is related to a node at the higher level, called its parent; no element can have more than one parent. Each element can have one or more elements related to it at a lower level; these are called its children. We can represent the shipments hierarchically, with the record type at the top of the tree (in our example, the part) as the root.

P1  NUT    RED    12  LONDON
    S1  SMITH  20  LONDON  300
    S2  JONES  10  PARIS   300

P2  BOLT   GREEN  17  PARIS
    S1  SMITH  20  LONDON  200
    S2  JONES  10  PARIS   400
    S3  BLAKE  30  PARIS   200

P3  SCREW  BLUE   17  ROME
    S1  SMITH  20  LONDON  400

P4  SCREW  RED    14  LONDON

Sample data in hierarchical form (parts superior to suppliers)

Advantages:

1. Relative Simplicity.
2. Easy to use.
Disadvantages:

Insert:
It is not possible, without introducing a special dummy part, to insert data concerning a new supplier (say S4) until that supplier supplies some part.

Delete:
The only way to delete a shipment is to delete the corresponding supplier occurrence. It follows that if we delete the only shipment for a given supplier, we lose all information on that supplier (the insert and delete anomalies are really two sides of the same coin). For example, deleting the shipment connecting P2 and S3 is handled by deleting the occurrence for S3 under part P2, which, since it is the only occurrence for S3, causes all information on S3 to be lost.

Update:
If we need to change the description of a supplier (e.g., to change the city for supplier S1 to Delhi), we are faced either with the problem of searching the entire tree to find every occurrence of supplier S1, or with the possibility of introducing an inconsistency (supplier S1 might be shown as being in Delhi at one point and in London at another).

Network model:

In the network model, data is represented by a collection of records (much like records in PL/I or Pascal), and the relationships among the data are represented by pointers, so the database can be viewed as a collection of arbitrary graphs. A given record occurrence in the network approach can have any number of immediate parents as well as any number of immediate dependents; the network model is therefore more general than the hierarchical structure. In simple words, in the network approach there can be any number of parents for one child.

Disadvantages:

1. It is a more complex structure: even to represent a small database we need a large number of pointers.

Advantages:

Insertion, deletion and modification are possible without upsetting other links.

Insert:
To insert data concerning a new supplier, we simply create a new supplier record occurrence. Initially there are no connector records for the new supplier; its chain consists of a single pointer from the supplier to itself.

Delete:
To delete the shipment connecting P2 and S3, we delete the connector record occurrence linking this supplier and this part. The two chains concerned are then adjusted appropriately.

Update:
We can change the city for supplier S1 to Delhi without any search problems and without the possibility of inconsistency, because the city for S1 appears at precisely one place in the structure.

The network data model was formalised in the late 1960s by the Database Task Group of the Conference on Data Systems Languages (DBTG/CODASYL). Their first report, which has been revised a number of times, contained detailed specifications for the network data model.

The DBTG Set

The DBTG model uses two different data structures to represent the database entities and the relationships between them: the record type and the set type. A RECORD TYPE is used to represent an entity type; it is made up of a number of data items that represent the attributes of the entity.

A SET TYPE is used to represent a directed relationship between two record types: the so-called owner record type and the member record type. The set type, like the record type, is named, and it specifies that there is a one-to-many (1:M) relationship between the owner and the member record types. A set type can have more than one record type as its member, but only one record type is allowed to be the owner in a given set type. A database may have one or more occurrences of each of its record and set types. An occurrence of a set type consists of one occurrence of the owner record type and any number of occurrences of each of its member record types. A record occurrence cannot be a member of two distinct occurrences of the same set type.

Bachman introduced a graphical notation called a data structure diagram to denote the logical relationships implied by sets. A labeled rectangle represents the corresponding entity or record type, and an arrow connecting two labeled rectangles represents a set type. For example, the DBTG set Dept_Emp is drawn as the rectangle Department connected by an arrow to the rectangle Employee.

Implementation of the Network Data Model

The record is the basic unit for representing data in the DBTG network database model. The one-to-many relationship of a set is implemented by linking the members of a given occurrence of a set to the owner record occurrence. Figure 22 shows the implementation of the set occurrence DEPT-EMP, where the owner record is Comp. Sc. and the member records are, for instance, Jancy and Santosh. (For simplicity, only one record field of each record is shown.) This method of implementation assigns one pointer (link) in each record for each set type in which the record participates and, therefore, allows a record occurrence to participate in only one occurrence of a given set type.

In the figure, the owner Comp. Sc. holds a pointer to the first member (Jancy); each member holds a pointer to the next member of the set occurrence (Jancy points to Santosh), and the last member points back to the owner. (Implementation of the DBTG set in the network model.)

A second form of network implementation, especially useful for M:N relationships, is a bit map, as depicted in the figure. A bit map is a matrix created for each relationship. Each row corresponds to the relative record number of a record of one type, and each column to the relative record number of a target record of the relationship. A 1 bit in the cell for row X and column Y means that the records corresponding to row X and column Y are associated in this relationship.

         1   2   ...  Y   ...
    1    1   0   ...  1   ...
    2    0   0   ...  1   ...
    X    1   0   ...  1   ...
    ...  .   .   ...  .   ...

Example of a bit-map implementation for a product-vendor relationship in a network.

AN ARCHITECTURE FOR A DATABASE SYSTEM

The architecture of a database system is divided into three general levels:

1. External level
2. Conceptual level
3. Internal level

The external level is the one closest to the users, that is, the one concerned with the way in which the data is viewed by individual users.

The internal level is the one closest to physical storage, that is, the one concerned with the way in which the data is actually stored.

The conceptual level is a "level of indirection" between the other two: it is a representation of the entire information content of the database.

The figure accompanying this section shows the three levels for an employee database. Two users each have their own external (logical) view: user 1 sees logical records containing Employee Name and Employee Address, while user 2 sees Employee Name, Employee Soc_Sec_No and Employee annual salary. A mapping supplied by the DBMS relates these external views to the single conceptual view (Employee name: string; Employee social security number: key; Employee annual salary). A further mapping supplied by the DBMS/OS relates the conceptual view to the internal view: the stored employee record of length 120, with fields Name (string, length 25, offset 0), Soc_sec_no (9 decimal, offset 25, unique), Department (string, length 6, offset 34), Address (string, length 51, offset 40), Skill (string, length 20, offset 91) and Salary (9.2 decimal, offset 111).
The highest level, seen by the application programmer or user, is called the external view, user view or simply the view. The next level of abstraction, the total sum of the users' views, is called the global or conceptual view. The lowest level, a description of the actual method of storing the data, is the internal view.

A scheme is an outline or plan that describes the records and relationships existing in a view. The view at each of these levels is described by a scheme.

Each external view is described by a scheme called an external scheme. The external scheme consists of definitions of the logical records and the relationships in the external view. It also contains the method of deriving the objects in the external view from the objects in the conceptual view.

The conceptual view is defined by means of the conceptual scheme. It describes all the records and relationships included in the conceptual view; there is therefore only one conceptual scheme per database. Conceptual schemes are intended to include a great many additional features, such as authorization checks and validation procedures. The conceptual scheme also contains the method of deriving the objects in the conceptual view from the objects in the internal view.

The internal view is expressed by the internal scheme. The internal view is a very low-level representation of the database. The internal scheme not only defines the various types of stored records but also specifies what indexes exist, how stored fields are represented, what physical sequence the stored records are in, and so on.

Two mappings are required in a database system with three different views. The conceptual/internal mapping defines the correspondence between the conceptual view and the stored database; it specifies how conceptual records and fields map into their stored counterparts. The external/conceptual mapping defines the correspondence between an external view and the conceptual view; some differences may exist between these two levels, for example a field may have a different format in each.

Language

(1) Each user has a language at his or her disposal. For the application programmer it will be a conventional programming language such as COBOL or PL/I; for the terminal user it will be either a query language or a special-purpose language.

(2) One important point about the user's language is that it will include a data sublanguage (DSL), i.e., that subset of the total language which is concerned with database objects and operations.

(3) Any data sublanguage is a combination of two languages:

(a) DDL (Data Definition Language), which provides the definition or description of database objects.

(b) DML (Data Manipulation Language), which is used for the manipulation or processing of the objects defined using the DDL.

DDL

(1) Example:

CREATE TABLE supplier (S_no CHAR(5) NOT NULL, Name CHAR(20), Status INT(4), City CHAR(15));

(2) A DDL is the means of declaring to the DBMS what data structures will be used.

(3) A DDL gives a logical data description and performs the following functions:

(a) It should identify the types of data records and the database files.
(b) It should give a unique name to each data-item type, record type and file type.
(c) It should specify the data-item types.
(d) It may define the length of a data item or the range of values that a data item can accept.
DML

The application program must give instructions to the DBMS and must have a means of interpreting the status messages with which the DBMS replies, saying whether the request has been satisfactorily accomplished.

Some DBMSs provide an interface which is independent of the application program. The CODASYL Database Task Group (DBTG) defined an extension of COBOL called the Data Manipulation Language (DML).

Generally, the interface between the application program and the DBMS, referred to as the DML, is embedded in a host language such as COBOL. Typical commands in the CODASYL DML are the following:

OPEN: A file or set of records is opened, i.e., made available for an application program to use.

CLOSE: A file or set of records is closed, i.e., made unavailable to the application program.

FIND: The DBMS locates a specified occurrence of a named record type. It may have to search through several records to find it.

GET: The contents of specified data items of a specified record occurrence are transferred to the program work area.

MODIFY: The values of specified data items are changed.

INSERT: The record in the program work area is inserted into one or more named groups of records.

REMOVE: A specified record occurrence is cancelled from the membership of one or more groups of records.

STORE: A new record occurrence is stored in the database, and all the necessary relationship linkages and addressing facilities are created.

ORDER: All or specified member records in a named group of records are logically reordered in ascending or descending sequence of specified keys.

DELETE: A specified record occurrence is removed from the database.

Chapter 8 DBMS DESIGN

Relational algebra is a collection of operations on relations. Each operation takes one or more relations as its operands and produces another relation as its result.

RELATIONAL ALGEBRA

Traditional set operations: Union, Intersection, Difference, Cartesian product.
Special relational operations: Selection, Projection, Join, Division.

(A) Traditional set operations:

1) UNION:
The union of two union-compatible relations A and B is the set of all tuples t belonging to either A or B (or both). For example, let A be the set of supplier tuples for suppliers located in London, and B the set of supplier tuples for suppliers who supply part P1. Then A UNION B is the set of supplier tuples for suppliers who are either located in London or supply part P1 (or both).

2) INTERSECTION:
The intersection of two union-compatible relations A and B is the set of all tuples t belonging to both A and B. In the same example, A INTERSECT B is the set of supplier tuples for suppliers who are located in London and supply part P1.

3) DIFFERENCE:
The difference between two relations, A - B (in that order), is the set of all tuples t belonging to A and not to B. In our example, A - B contains the supplier tuples for suppliers who are located in London but do not supply part P1.

4) CARTESIAN PRODUCT:
The Cartesian product is simply the concatenation of every tuple of one relation with every tuple of the other: S = A x B. For instance, let A be the set of supplier numbers and B the set of part numbers; then A x B is the set of all possible supplier-number/part-number pairs. The product multiplies two tables: if one table has M rows and the other N rows, the product has M x N rows.
R:           S:
A B C        D E F
a b c        b g a
d a f        d a f
c b d

R x S:

A B C D E F
a b c b g a
a b c d a f
d a f b g a
d a f d a f
c b d b g a
c b d d a f
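These traditional set operations can be sketched in Python using sets of tuples as relations. The data below is a hypothetical abbreviation of the examples above:

    # Two union-compatible relations: London suppliers, and suppliers of P1.
    A = {("S1", "Smith", 20, "London"), ("S4", "Clark", 20, "London")}
    B = {("S1", "Smith", 20, "London"), ("S2", "John", 10, "Paris")}

    union        = A | B   # in London, or supplying P1, or both
    intersection = A & B   # in London and supplying P1
    difference   = A - B   # in London but not supplying P1

    # Cartesian product: every pairing of a tuple of R with a tuple of S;
    # 3 rows x 2 rows gives 3 x 2 = 6 rows.
    R = {("a", "b", "c"), ("d", "a", "f"), ("c", "b", "d")}
    S = {("b", "g", "a"), ("d", "a", "f")}
    product = {r + s for r in R for s in S}
    print(len(product))  # 6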

(B) Special relational operations:

Selection and projection are unary operations, since they operate on a single relation.

1) SELECTION:
The algebraic selection operator yields a horizontal subset of a given relation, i.e., that subset of the tuples within the given relation for which a specified predicate is satisfied. For example:

S WHERE CITY = 'London'

S# SNAME STATUS CITY


S1 Smith 20 London

S4 Clark 20 London

P Where weight < 14

P# PNAME COLOR WEIGHT CITY


P1 Nut Red 12 London

P5 Com Blue 12 Paris

SP WHERE S# = ‘S1’
AND P# = ‘P1’

S# P# QTY
S1 P1 300

These are three sample selections. The general form is:

relation-name WHERE condition

e.g., S WHERE CITY = 'London'.

If X denotes the relation on which the selection is performed, then the result of the selection has exactly the same attributes as X.

2) PROJECTION:

The projection operator yields a vertical subset of a given relation: the subset obtained by selecting specified attributes, in a specified left-to-right order, and then eliminating duplicate tuples within the attributes selected.

Note: this means removing certain columns from the tuples. The general form is:

relation-name [column names (attributes)]

TABLE :
S# SNAME STATUS CITY

S1 Smith 20 London
S2 Jones 10 Paris
S3 Blake 30 Paris
S4 Clark 20 London
S5 Adams 30 Athens

The Supplier relation, and its projection over CITY:

CITY
London
Paris
Athens
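A minimal Python sketch of both unary operators, using comprehensions over the supplier relation shown above (an illustrative in-memory representation only):

    S = [
        {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
        {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
        {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
        {"S#": "S4", "SNAME": "Clark", "STATUS": 20, "CITY": "London"},
        {"S#": "S5", "SNAME": "Adams", "STATUS": 30, "CITY": "Athens"},
    ]

    # SELECTION: S WHERE CITY = 'London' keeps whole tuples satisfying the predicate.
    london = [t for t in S if t["CITY"] == "London"]

    # PROJECTION: S[CITY] keeps only the CITY column and eliminates duplicates.
    cities = {t["CITY"] for t in S}
    print(cities)  # the three distinct cities: London, Paris, Athens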

3) JOIN:

A join is equivalent to first taking the Cartesian product of two relations and then taking a selection of the product based on some logical condition. For example, take two relations R and S:

R:          S:
A B C       D E
1 2 3       3 1
4 5 6       6 2
7 8 9

R x S (the Cartesian product):

A B C D E
1 2 3 3 1
1 2 3 6 2
4 5 6 3 1
4 5 6 6 2
7 8 9 3 1
7 8 9 6 2

The theta-join of R and S, where theta is a logical expression, is the selection of those rows of R x S that satisfy theta. For example, the join of R and S with theta: B <= D gives:

A B C D E
1 2 3 3 1
1 2 3 6 2
4 5 6 6 2

Here theta is an arithmetic comparison operator: =, <, >, <=, >=. If theta is the equality operator, the join is called an equi-join. For example, with C = D:

A B C D E
1 2 3 3 1
4 5 6 6 2

Natural Join:

R:          S:
A B C       B C D
a b c       b c d
d b c       b c e
b d f       a d b
c a d

R x S (12 rows; R's columns followed by S's columns):

A B C  B C D
a b c  b c d
a b c  b c e
a b c  a d b
d b c  b c d
d b c  b c e
d b c  a d b
b d f  b c d
b d f  b c e
b d f  a d b
c a d  b c d
c a d  b c e
c a d  a d b

Of these 12 rows, we keep only those in which the values of B and C from R are the same as the values of B and C from S:

A B C D
a b c d
a b c e
d b c d
d b c e
c a d b
The natural join of R and S is applicable only when both R and S have columns named by the same attributes.

1) First we compute R x S.
2) For each attribute X that names both a column in R and a column in S, select from R x S those tuples whose values agree in the columns R.X and S.X (and keep only one copy of each such column).

The join operator, as the name suggests, allows the combining of two relations to form a single new relation. The tuples from the operand relations that participate in the operation and contribute to the result are related. The join operation thus allows the processing of relationships existing between the operand relations.
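The following minimal Python sketch follows that two-step definition: it builds the Cartesian product and keeps only the pairs that satisfy the join condition (the relation names and data here are illustrative):

    R = [{"A": 1, "B": 2, "C": 3}, {"A": 4, "B": 5, "C": 6}, {"A": 7, "B": 8, "C": 9}]
    S = [{"D": 3, "E": 1}, {"D": 6, "E": 2}]

    # Theta-join of R and S with condition B <= D: three rows survive.
    theta = [{**r, **s} for r in R for s in S if r["B"] <= s["D"]]
    print(theta)

    # Natural join over the common column S#: matching rows are combined
    # and the shared column appears once.
    SUP = [{"S#": "S1", "CITY": "London"}, {"S#": "S2", "CITY": "Paris"}]
    SP  = [{"S#": "S1", "P#": "P1"}, {"S#": "S2", "P#": "P2"}]
    natural = [{**a, **b} for a in SP for b in SUP if a["S#"] == b["S#"]]
    print(natural)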

4) DIVISION:
The division operator divides a dividend relation R of degree m + n by a divisor relation S of degree n and produces a result relation of degree m. For example:

R:
A B C D
a b c d
a b e f
b c e f
e d c d
e d e f
a b d e

S:
C D
e f

The result of dividing R by S:
A B
a b
e d
b c

A pair (A, B) appears in the result because it occurs in R combined with every (C, D) tuple of S.
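A minimal Python sketch of division, assuming relations represented as sets of tuples (hypothetical helper code): it returns the suppliers that are paired with every part in the divisor.

    # Which suppliers in SP supply every part listed in B?
    SP = {("S1", "P1"), ("S1", "P2"), ("S2", "P1"), ("S3", "P2")}
    B  = {"P1", "P2"}   # the divisor: the full set of required parts

    suppliers = {s for (s, _) in SP}
    result = {s for s in suppliers if all((s, p) in SP for p in B)}
    print(result)  # {'S1'}: only S1 supplies every part in B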

DATA DICTIONARY:

A data dictionary is a catalogue of all data types, giving their names and structure. A data dictionary defining all the data items in use is needed, and it is built up in cooperation with the users of the system.

One of the most important DBA tools is the data dictionary. The data dictionary is effectively a database in its own right: a database that contains "data about data" (that is, descriptions of the other objects in the system, rather than simply raw data). In particular, all the various schemes (external, conceptual, internal) are physically stored, in both source and object form, in the dictionary. A comprehensive dictionary will also include cross-reference information showing, for instance, which programs use which pieces of data, which departments require which reports, and so on.

In fact, the dictionary may even be integrated into the database it describes, and thus may include its own description. It should be possible to query the dictionary just like any other database, so that, for example, the DBA can easily discover which programs are likely to be affected by some proposed change to the system.

(The user interface, incidentally, may be defined as a boundary in the system below which everything is invisible to the user.)

DISTRIBUTED DATA BASE :

In a distributed system, the data is stored on several computers. The computers in a distributed database system communicate by means of various communication media, such as high-speed buses or telephone lines, and the processors may vary in size; they are referred to by a number of different names, such as node or site. In simple words, in a distributed database system the data is stored at several locations. The distribution is relevant only at the internal level, not at the conceptual or external level.

A distributed database is typically a database that is not stored in its entirety at a single physical location, but rather is spread across a network of computers that are geographically dispersed and connected by communication links. The key objective of a distributed database system is that it looks like a centralized system to its users.

For example, consider a banking system consisting of four branches located in four different cities. Each branch has its own computer, with a database consisting of all the accounts maintained at that branch; each such installation is a site. There also exists one single site that maintains information about all the branches of the bank.

A local transaction is a transaction that accesses accounts only at the site at which the transaction was initiated. A global transaction, on the other hand, either accesses accounts at a site different from the one at which it was initiated, or accesses accounts at several different sites. For example, consider a transaction to add Rs. 200 to account number 177 located at the Delhi branch. If the transaction was initiated at the Delhi branch, it is considered local; otherwise it is considered global. A transaction to transfer Rs. 200 from account 177 to account 305, which is located at the Bombay branch, is a global transaction, since accounts at two different sites are accessed as a result of its execution.

Advantages:

1) The primary advantage of a distributed database system is that data can be accessed and shared in a reliable and efficient manner.

2) If one node of the system fails, the remaining nodes are able to continue operating; thus the failure of one node does not cause a system shutdown, and the failure can be detected.

3) Since there are several nodes, each node is able to retain a degree of control over the data stored locally.

4) Speedup of query processing: if a query involves data at several sites, it may be possible to split the query into subqueries that can be executed in parallel at several sites. Such parallel computation allows faster processing of a user's query.

Disadvantages:

1) Software development cost is very high: it is difficult to implement a distributed database system.

2) There is greater potential for bugs.

3) The exchange of messages and the additional computation required to achieve inter-site coordination are a form of overhead that does not arise in centralized systems.

The sites in the system can be connected physically in a variety of ways (various network topologies).

VIEWS

A base table is a primary data structure: it has independent existence. A view is a table that does not have existence in its own right but is instead derived from one or more base tables.

For example, if the database includes a base table S with fields S#, SNAME, STATUS and CITY, then we may define a view called LONDON_SUPPLIER, with fields S#, SNAME, STATUS and CITY, derived from S by selecting those records having CITY = 'London'.

A view that restricts the user to certain rows is called a horizontal view; a vertical view restricts the user to certain columns.

If the list of column names is omitted, the columns in the view take the same names as in the underlying tables.

There are several advantages to views, including:

Security: users can be given access to only the rows/columns of a table that concern them. The table may contain data for the whole firm, but each user sees only the data for his or her department.

Data integrity: if the view shows data for a particular office, the user can enter data only for that office.

Simplicity: even if the view is a multi-table query, querying the view still looks like a single-table query.

There are also several disadvantages of views:

Performance: if the view is complex, then even simple queries can take a long time.

Update restrictions: updating data through a view may or may not be possible. If the view is complex, the DBMS may decide it cannot perform updates and make the view read-only.

The format of the view statement is as follows:

CREATE VIEW view_name AS (SELECT ...);

For example:

CREATE VIEW EMP_VIEW AS
  (SELECT emp_no, name, skill
   FROM EMPLOYEE);

THE CODD RULES

In the most basic of definitions, a DBMS can be regarded as relational only if it obeys the following three rules:

1. All information must be held in tables.

2. Retrieval of the data must be possible using the following types of operations: SELECT, JOIN and PROJECT.

3. All relationships between data must be represented explicitly in the data itself.

Together with the twelve rules stated below, these must be demonstrable within a single product for it to be termed relational.

The Twelve Rules

Rule 0: Any truly relational database must be manageable entirely through its own relational capabilities.

A relational database must be relational, wholly relational and nothing but relational.

Rule 1 : The information rule


All information is explicitly and logically represented in exactly one way – by data
values in tables.

Rule 2 : The rule of guaranteed access


Every item of data must be logically addressable by resorting to a combination of table
name, primary key value and column name.

It must be true that any item can be retrieved by supplying the table name, the primary key
value of the row holding the item and the column name in which it is to be found .

Rule 3: The systematic treatment of null values (a null value means an unavailable or inapplicable value)

It is fundamental to the DBMS that null values are supported in the representation of missing and inapplicable information.

Rule 4: The database description rule

A description of the database is held and maintained using the same logical structures used to define the data.

Rule 4 means that there must be a data dictionary within the RDBMS that is constructed of tables and/or views that can be examined using SQL.
Rule 5 : The comprehensive sub-language rule

There must be at least one language whose statements can be expressed as character strings
conforming to some well defined syntax, that is comprehensive in supporting the
following:

• Data definition
• View information
• Data manipulation
• Integrity constraints
• Authorisation
• Transaction boundaries

Rule 6: The view updating rule

All views that can be updated in theory can also be updated by the system. For example, if you define a virtual column in a view as A*B, where A and B are columns in a base table, how can you perform an update on that virtual column directly?

Rule 7: The insert and update rule

An RDBMS must do more than just be able to retrieve relational data sets: it has to be capable of inserting, updating and deleting data as a relational set.

Rule 8: The physical independence rule

User access to the database, via monitors or application programs, must remain logically consistent whenever the storage representation, or the access methods to the data, are changed.

Therefore, by way of an example, if an index is built or destroyed by the DBA on a table, any user should still retrieve the same data from that table, albeit a little more slowly.

Rule 9: The logical data independence rule

Application programs must be independent of changes made to the base tables.


This rule allows many types of database design changes to be made dynamically, without users being aware of them. It should be possible to split a table vertically into more than one fragment, as long as such splitting preserves all the original data (is non-loss) and maintains the primary key in each and every fragment. For example, the table TAB1(A, B, C, D) can be split into the fragments FRAG1(A, B) and FRAG2(A, C, D), with the primary key A repeated in each fragment. Secondly, it should be possible to combine base tables into one by way of a non-loss join.

Rule 10: Integrity rule

The relational model includes two general integrity rules.

Integrity rule 1
Integrity rule 1 is concerned with primary-key values. Before we formally state the rule, let us look at the effect of null values in prime attributes. A null value for an attribute is a value that is either not known at the time or does not apply to a given instance of the object. It may also be that a particular tuple has no value for an attribute; this fact too could be represented by a null value.

If any attribute of a primary key (a prime attribute) were permitted to have null values then, because the attributes of the key must be non-redundant, the key could not be used for the unique identification of tuples.

P:                 P:
Id   Name          Id   Name
101  Jones         101  Jones
103  Smith         @    Smith
107  Evan          107  Evan
112  Smith         @    Smith

(If the key attribute Id is allowed to be null, shown here as @, the two Smith tuples can no longer be distinguished.)

Definition: Integrity rule 1 (entity integrity)

If the attribute A of relation R is a prime attribute of R, then A cannot accept null values.

Integrity Rule 2 (Referential Integrity):

Integrity rule 2 is concerned with foreign keys, i.e., with attributes of a relation whose domains are those of the primary key of another relation. In the example below, the Manager attribute holds the employee number of the manager; Manager is a foreign key (note that here it refers to the primary key of the same relation). The chief executive officer (CEO) of the company may have himself or herself as manager, or the value may be null (@).

Emp#  Name   Manager
101   Jones  @
102   Smith  110
103   Evan   112
104   Smith  112
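A minimal Python sketch of checking Integrity Rule 2 over such a table (the data below is hypothetical and chosen so that one row deliberately violates the rule; None plays the role of the @ null above):

    EMP = [
        {"Emp#": 101, "Name": "Jones", "Manager": None},  # the CEO: null manager
        {"Emp#": 102, "Name": "Smith", "Manager": 101},   # valid: 101 exists
        {"Emp#": 103, "Name": "Evan",  "Manager": 999},   # violation: no Emp# 999
    ]

    primary_keys = {t["Emp#"] for t in EMP}
    # Every non-null Manager value must equal some existing primary-key value.
    violations = [t for t in EMP
                  if t["Manager"] is not None and t["Manager"] not in primary_keys]
    print(violations)  # -> the Evan row, whose Manager 999 matches no Emp#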
Rule 11: The distribution rule

An RDBMS must have distribution independence.

This is one of the more attractive aspects of RDBMSs: database systems built on the relational framework are well suited to today's client/server database designs.

Rule 12: The no-subversion rule (low-level access must not bypass the rules)

If an RDBMS supports a lower-level language that permits, for example, row-at-a-time processing, then this language must not be able to bypass any integrity rules or constraints of the relational language.

NORMALISATION

How do we decide the structure for a given database? Given a body of data to be represented in a database, how do we decide on a suitable logical structure for that data? In other words, how do we decide what relations are needed and what their attributes should be? This is the database design problem.

A good design principle is "ONE FACT, IN ONE PLACE".

Normalization theory is a useful aid in the design process, but it is not a panacea.

NORMAL FORM

Normalization theory is built around the concept of normal forms. A relation is said to be in a particular normal form if it satisfies a certain specified set of constraints. For example, a relation is said to be in first normal form (abbreviated 1NF) if and only if it satisfies the constraint that it contains atomic values only. (Thus every normalized relation is in 1NF, which accounts for the "first".) A composite attribute such as Date of Birth, made up of Day, Month and Year, is not atomic.

Universe of relations (normalized and unnormalized)
  1NF relations (normalized relations)
    2NF relations
      3NF relations
        BCNF relations
          4NF relations
            PJ/NF (5NF) relations

NORMAL FORMS (each normal form is a subset of the one above it)

FUNCTIONAL DEPENDENCY:

Let X and Y be two attributes of a relation. Given the value of X, if there is only one value of Y corresponding to it, then Y is said to be functionally dependent on X. This is indicated by the notation:

X → Y

Y is functionally dependent on X if there exists exactly one value of Y for each value of X.

In the suppliers-and-parts database, for example, the attributes SNAME, STATUS and CITY of relation S are each functionally dependent on attribute S#, because, given a particular value of S#, there exists precisely one corresponding value for each of SNAME, STATUS and CITY (provided that the S# value occurs in relation S, of course). In symbols:

S.S# → S.SNAME
S.S# → S.STATUS
S.S# → S.CITY

or

S.S# → S.(SNAME, STATUS, CITY)

S:
S#  SNAME  STATUS  CITY
S1  Smith  20      London
S2  James  10      Paris
S3  Black  30      Paris
S4  Clark  20      London
S5  Adams  30      Athens

S:  S# → SNAME,  S# → STATUS,  S# → CITY
P:  P# → PNAME,  P# → COLOR,  P# → WEIGHT,  P# → CITY
SP: (S#, P#) → QTY

Functional dependencies in the relations S, P, SP.
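A minimal Python sketch of testing whether a given functional dependency X → Y holds in a relation (the helper name holds is hypothetical): the dependency fails exactly when two tuples agree on X but differ on Y.

    def holds(relation, X, Y):
        seen = {}
        for t in relation:
            key = tuple(t[a] for a in X)
            val = tuple(t[a] for a in Y)
            if key in seen and seen[key] != val:
                return False       # same X value, two different Y values
            seen[key] = val
        return True

    S = [
        {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
        {"S#": "S2", "SNAME": "James", "STATUS": 10, "CITY": "Paris"},
        {"S#": "S3", "SNAME": "Black", "STATUS": 30, "CITY": "Paris"},
    ]
    print(holds(S, ["S#"], ["SNAME", "STATUS", "CITY"]))  # True: S# determines the rest
    print(holds(S, ["CITY"], ["STATUS"]))                 # False here: Paris has 10 and 30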

Composite-attribute functional dependency:

A functional dependency may also be based on a composite attribute. For example, if we write

(X, Z) → Y

it means that there is only one value of Y corresponding to given values of X and Z; in other words, Y is functionally dependent on the composite (X, Z). For example, in the relation SP:

(S#, P#) → QTY

For one value of (S#, P#) there exists one value of QTY.

FULL FUNCTIONAL DEPENDENCE:

Y is fully functionally dependent on a composite attribute (X1, X2) if it is functionally dependent on (X1, X2) and not on X1 or X2 alone. In SP, for example, QTY is fully functionally dependent on (S#, P#).

By contrast, in the relation S the composite (S#, STATUS) determines CITY, but CITY is not fully functionally dependent on it, because CITY is already dependent on S# alone.

Fully functional dependence (definition): attribute Y is fully functionally dependent on attribute X if it is functionally dependent on X and not functionally dependent on any proper subset of X (in other words, there does not exist a proper subset X' of the attributes constituting X such that Y is functionally dependent on X'). For example, in relation S the attribute CITY is functionally dependent on the composite attribute (S#, STATUS); however, it is not fully functionally dependent on this composite attribute because, of course, it is also functionally dependent on S# alone.

SP:
S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S2  P1  300
S2  P2  400
S3  P2  200

Here (S#, P#) → QTY is an example of full functional dependence.

FIRST NORMAL FORM:

A relation R is in first normal form (1NF) if and only if all underlying domains contain atomic values only.

Sub-levels (group items, as allowed in COBOL) are not permitted; that is, values must not be decomposable.

Let us suppose that information concerning suppliers and shipments, rather than being split into two separate relations (S and SP), is lumped together into a single relation FIRST (S#, STATUS, CITY, P#, QTY).

For the sake of convenience we introduce an additional constraint: STATUS is functionally dependent on CITY. (The meaning of this constraint is that a supplier's status is determined by the corresponding location; e.g., all London suppliers must have a status of 20.) We also ignore the attribute SNAME for simplicity. The primary key of FIRST is the combination (S#, P#).

We see from the figure that STATUS and CITY are not fully functionally dependent on the primary key. STATUS and CITY are also not mutually independent.

Functional dependencies in the relation FIRST (primary key (S#, P#)): (S#, P#) → QTY; S# → STATUS; S# → CITY; CITY → STATUS.


The relation FIRST suffers from several anomalies.

INSERTING:
We cannot enter the fact that a particular supplier is located in a particular city until that supplier supplies at least one part; indeed, the tabulation in the following figure does not show that supplier S5 is located in Athens. The reason is that, until S5 supplies some part, we have no appropriate primary-key value. (Remember that, by integrity rule 1, no component of a primary-key value may be null.)

In relation FIRST, a primary-key value consists of a supplier number and a part number.
FIRST
S# STATUS CITY P# QTY.
S1 20 LONDON P1 300
S1 20 LONDON P2 200
S1 20 LONDON P3 400
S1 20 LONDON P4 200
S1 20 LONDON P5 100
S1 20 LONDON P6 100
S2 10 PARIS P1 300
S2 10 PARIS P2 400
S3 10 PARIS P2 200
S4 20 LONDON P2 200
S4 20 LONDON P4 300
S4 20 LONDON P5 400
SIMPLE TABULATION OF FIRST

DELETING:
If we delete the only FIRST tuple for a particular supplier, we destroy not only the shipment connecting that supplier to some part but also the information that the supplier is located in a particular city. For example, if we delete the FIRST tuple with S# value S3 and P# value P2, we lose the information that S3 is located in Paris.

UPDATING:
The city value for a given supplier appears in FIRST many times, in general. This redundancy causes update problems. For example, if supplier S1 moves from London to Amsterdam, we are faced either with the problem of searching the FIRST relation to find every tuple connecting S1 and London (and changing each of them), or with the possibility of producing an inconsistent result (the city for S1 may be given as Amsterdam in one place and London in another).

Functional dependencies in the relations SECOND and SP: in SECOND, S# → STATUS and S# → CITY; in SP, (S#, P#) → QTY.

SECOND NORMAL FORM:

A relation R is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is fully dependent on the primary key. (An attribute is nonkey if it does not participate in the primary key.)

A relation that is in first normal form and not in second can always be reduced to an equivalent collection of 2NF relations. The reduction consists of replacing the relation by suitable projections; the collection of these projections is equivalent to the original relation, which can always be recovered by taking the natural join of the projections, so no information is lost in the process. In other words, the process is reversible. In our example, SECOND and SP are projections of FIRST, and FIRST is the natural join of SECOND and SP over S#.

Second
S# STATUS CITY

S1 20 London
S2 10 Paris
S3 10 Paris
S4 20 London
S5 30 Athens

SP
S# P# QTY.

S1 P1 300
S1 P2 200
S1 P3 400
S1 P4 200
S1 P5 100
S1 P6 100
S2 P1 300
S2 P2 400
S3 P2 200
S4 P2 200
S4 P4 300
S4 P5 400

SAMPLE TABULATION OF SECOND & SP

The problems that appeared in first normal form are solved in this way: we replace the relation FIRST by the two relations SECOND (S#, STATUS, CITY) and SP (S#, P#, QTY).

INSERTING:
We can enter the information that S5 is located in Athens, even though S5 does not currently supply any parts, by simply inserting the appropriate tuple into SECOND.

DELETING:
We can delete the shipment connecting S3 and P2 by deleting the appropriate tuple from SP; we do not lose the information that S3 is located in Paris.

UPDATING:
In the revised structure, the city for a given supplier appears once, not many times (the redundancy has been eliminated). Thus we can change the city for S1 from London to Amsterdam by changing it once and for all in the relevant SECOND tuple.

DISADVANTAGES OF SECOND NORMAL FORM:

INSERTING:
We cannot enter the fact that a particular city has a particular status value (for example, that any supplier in Rome must have a status of 50) until we have some supplier located in that city. The reason is, again, that until such a supplier exists we have no appropriate primary-key value.

DELETING:
If we delete the only SECOND tuple for a particular city, we destroy not only the information for the supplier concerned but also the information that the city has that particular status value. For example, if we delete the SECOND tuple for S5, we lose the information that the status for Athens is 30. (Once again, the insertion and deletion problems are two sides of the same coin.)

UPDATING:
The status value for a given city appears in SECOND many times, so if it changes we face either the problem of searching the SECOND relation to find every tuple for that city (say London) or the possibility of producing an inconsistent result (the status for London may be given as 20 in one place and 30 in another).

NON-LOSS DECOMPOSITION:

The reduction of FIRST to the pair (SECOND, SP) is an example of a non-loss decomposition. In general, given a relation R with (possibly composite) attributes A, B, C satisfying the functional dependency R.A → R.B, R can always be non-loss decomposed into its projections R1(A, B) and R2(A, C).
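A minimal Python sketch of such a non-loss decomposition (an illustrative representation only, with abbreviated data): FIRST is projected onto SECOND and SP, and then recovered exactly by a natural join over S#.

    FIRST = [
        {"S#": "S1", "STATUS": 20, "CITY": "London", "P#": "P1", "QTY": 300},
        {"S#": "S1", "STATUS": 20, "CITY": "London", "P#": "P2", "QTY": 200},
        {"S#": "S2", "STATUS": 10, "CITY": "Paris",  "P#": "P1", "QTY": 300},
    ]

    def project(rel, attrs):
        # Projection as a set of (attribute, value) tuples, so duplicates vanish.
        return {tuple((a, t[a]) for a in attrs) for t in rel}

    SECOND = project(FIRST, ["S#", "STATUS", "CITY"])
    SP     = project(FIRST, ["S#", "P#", "QTY"])

    # Natural join of the two projections over S# rebuilds FIRST.
    rebuilt = []
    for x in SECOND:
        for y in SP:
            dx, dy = dict(x), dict(y)
            if dx["S#"] == dy["S#"]:
                rebuilt.append({**dx, **dy})

    # Every original tuple reappears and no spurious tuples are added.
    assert all(t in rebuilt for t in FIRST) and len(rebuilt) == len(FIRST)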

TRANSITIVE DEPENDENCE:

The dependency of STATUS on S#, though functional, is transitive (via CITY): each S# value determines a CITY value, and that CITY value in turn determines the STATUS value. Transitivity leads, once again, to difficulties with update operations. In general, if A → B and B → C, then C is transitively dependent on A.
THIRD NORMAL FORM (3NF):

A relation R is in third normal form (3NF) if and only if it is in 2NF and every nonkey attribute is non-transitively dependent on the primary key.

Again, the solution to the problem is to replace the original relation (SECOND) by two projections, in this case SC (S#, CITY) and CS (CITY, STATUS). The figure shows the tabulations corresponding to the data values of the original SECOND. The process is reversible, once again, since SECOND is the join of SC and CS over CITY.

Functional dependencies: in SC, S# → CITY; in CS, CITY → STATUS.

SC
S# CITY
S1 London
S2 Paris
S3 Paris
S4 London
S5 Athens

CS
CITY STATUS
Athens 30
London 20
Paris 10

SAMPLE TABULATIONS OF SC AND CS.

Relations SC and CS are both in 3NF; relation SECOND is not. (The primary keys of SC and CS are S# and CITY, respectively.) A relation that is in second normal form and not in third can always be reduced to an equivalent collection of 3NF relations.

In this normal form we have overcome the problems of 2 NF.

INSERTION :-
If we want to enter the city and status of a supplier for which we do not yet have a supplier number, we can do so in 3NF; this was not possible in 2NF.

DELETION :-
If we delete a particular supplier, say S5, then in 2NF the city and status of that supplier are also deleted; but in 3NF we have broken S#, CITY and STATUS into two parts, so the city and its status remain in spite of the deletion of the S#.

UPDATION :-
In 2NF, if we change the status of supplier S1, i.e. the status value for London, from 20 to 30, we face either the problem of searching the SECOND relation to find every tuple for London or the possibility of producing an inconsistent result. But in 3NF we have broken the data into the two parts (S#, CITY) and (CITY, STATUS); therefore, if we want to change the status for a particular city, we take the table CS and do it there. There is no searching problem and no inconsistency, because in the table CS each city, and its status, appears a single time.

BOYCE-CODD NORMAL FORM (BCNF) :-


An attribute, possibly composite, on which some other attribute is fully functionally dependent is called a determinant. Then we can define BCNF as follows.

A relation R is in Boyce-Codd Normal Form (BCNF) if and only if every determinant is a candidate key.

The motivation for introducing BCNF is that the original 3NF definition does not satisfactorily handle the case of a relation possessing two or more composite and overlapping candidate keys.

EXAMPLE :->
Relation FIRST contains three determinants: S#, CITY, and the combination (S#, P#); of these, only (S#, P#) is a candidate key; hence FIRST is not in BCNF.

Similarly, SECOND is not in BCNF, because the determinant CITY is not a candidate key.

Relations SP, SC and CS, on the other hand, are each in BCNF, because in each case the primary key is the only determinant in the relation.

Now we present some examples in which the candidate keys overlap. Two candidate keys overlap if they involve two or more attributes each and have an attribute in common.
FOR EXAMPLE :->
For the first example we suppose again that supplier names are unique, and we consider the relation SSP (S#, SNAME, P#, QTY). The candidate keys are (S#, P#) and (SNAME, P#).

IS THIS RELATION BCNF?

The answer is no, because we have two determinants, S# and SNAME, which are not candidate keys for the relation (S# determines SNAME, and conversely).
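
The usual remedy, sketched here under the same naming assumptions as before, is to split the S#–SNAME dependency out of SSP so that every determinant becomes a candidate key (the table names SS and SSP_QTY are hypothetical):

    -- Assumed BCNF decomposition of SSP (S#, SNAME, P#, QTY).
    CREATE TABLE SS (
        SNO   CHAR(3) PRIMARY KEY,         -- S#
        SNAME VARCHAR(20) NOT NULL UNIQUE  -- supplier names assumed unique
    );

    CREATE TABLE SSP_QTY (
        SNO CHAR(3) REFERENCES SS (SNO),
        PNO CHAR(3),
        QTY INTEGER,
        PRIMARY KEY (SNO, PNO)
    );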

For Example :->


The relation Professor:
Professor (Professor Code, Dept., Head of Dept., Percent Time)
It is assumed that
1. A professor can work in more than one department.
2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.

The relationship diagram for the above relation is given in the figure and table below. The table gives the relation's attributes. The two possible composite keys are (Professor Code, Dept.) or (Professor Code, Head of Dept.). Observe that Dept. as well as Head of Dept. are not non-key attributes; they are part of a composite key.

Dependency diagram of the Professor relation: the composite key (Professor Code, Department) determines Percent Time, and Department alone determines Head of Dept.
Professor Code   Department    Head of Dept.   Percent Time
P1 Physics Ghosh 50
P1 Mathematics Krishnan 50
P2 Chemistry Rao 25
P2 Physics Ghosh 75
P3 Mathematics Krishnan 100

NORMALISATION OF RELATION “Professor”

The relation given in the above table is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted, and we lose the information that Rao is the Head of Department of Chemistry.
The normalization of the relation is done by creating a new relation for (Dept., Head of Dept.) and deleting Head of Dept. from the Professor relation. The normalized relations are shown in the following tables:

(a) Professor

Professor Code   Department    Percent Time
P1               Physics       50
P1               Mathematics   50
P2               Chemistry     25
P2               Physics       75
P3               Mathematics   100

(b) Department

Department    Head of Department
Physics       Ghosh
Mathematics   Krishnan
Chemistry     Rao

The dependency diagrams for these new relations are given in the next figure. The dependency diagrams give the important clue to this normalization step, as is clear from the figures.

(Professor Code, Department) → Percent Time; Department → Head of Dept.

DEPENDENCY DIAGRAMS OF THE NORMALIZED PROFESSOR RELATIONS
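
A hedged SQL rendering of this normalization step (the table and column names DEPT_HEAD, PROF_DEPT, etc. are assumptions made for illustration):

    -- A department and its head are now stored once, in their own relation.
    CREATE TABLE DEPT_HEAD (
        DEPARTMENT VARCHAR(20) PRIMARY KEY,
        HEAD       VARCHAR(20)
    );

    CREATE TABLE PROF_DEPT (
        PROF_CODE    CHAR(2),
        DEPARTMENT   VARCHAR(20) REFERENCES DEPT_HEAD (DEPARTMENT),
        PERCENT_TIME INTEGER,
        PRIMARY KEY (PROF_CODE, DEPARTMENT)
    );

    -- Deleting P2's rows no longer loses the fact that Rao heads Chemistry.
    DELETE FROM PROF_DEPT WHERE PROF_CODE = 'P2';
    SELECT HEAD FROM DEPT_HEAD WHERE DEPARTMENT = 'Chemistry';  -- still 'Rao'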
MULTI-VALUED DEPENDENCIES (MVD)

STUDENT
A B C

STUDENT NO PROJECT HOBBY

10 C DANCE
10 PROLOG SINGING
10 PASCAL MOVIE
20 D-BASE READING
20 BASIC PAINTING
30 ORACLE STAMP COLLECTION
30 VISUAL BASIC CRICKET MATCH

(STUDENT NO stands in a 1:M relationship to PROJECT and a 1:N relationship to HOBBY.)

R.A →→ R.B
Given a relation R with attributes A, B and C, the multi-valued dependency R.A →→ R.B holds in R if and only if the set of B values matching a given (A value, C value) pair in R depends only on the A value and is independent of the C value.

Student.No →→ Student.Project
Student.No →→ Student.Hobby

PROJECT is multi-valued dependent on STUDENT NO, and PROJECT is independent of HOBBY.

4th NORMAL FORM

A relation R is in fourth normal form (4NF) if and only if, whenever there exists an MVD in R, say A →→ B, then all attributes of R are also functionally dependent on A.

Relation STUDENT is not in 4NF, since it involves an MVD that is not an FD. So we break this relation into two tables by projection; a SQL sketch of the projections follows the tables below.

SMP

A            B
STUDENT NO   PROJECT

10           C
10           PROLOG
10           PASCAL
20           D-BASE
20           BASIC
30           ORACLE
30           VISUAL BASIC

A            C
STUDENT NO   HOBBY

10           DANCE
10           SINGING
10           MOVIE
20           READING
20           PAINTING
30           STAMP COLLECTION
30           CRICKET MATCH
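
As sketched here, the two 4NF tables are simply projections of STUDENT; in SQL the projections can be taken with SELECT DISTINCT (the table name STUDENT and its column names are assumptions):

    -- Assumed base relation carrying the two independent MVDs:
    --   STUDENT (STUDENT_NO, PROJECT, HOBBY)

    -- Projection 1: student-project (the SMP table above).
    SELECT DISTINCT STUDENT_NO, PROJECT FROM STUDENT;

    -- Projection 2: student-hobby.
    SELECT DISTINCT STUDENT_NO, HOBBY FROM STUDENT;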

CTX (with multi-valued entries)

COURSE    TEACHER        TEXT
Physics   Prof. Green    Basic Mechanics
          Prof. Brown    Principles of Optics
          Prof. Black
Math      Prof. White    Modern Algebra
                         Projective Geometry

COURSE    TEACHER        TEXT
Physics   Prof. Green    Basic Mechanics
Physics   Prof. Green    Principles of Optics
Physics   Prof. Brown    Basic Mechanics
Physics   Prof. Brown    Principles of Optics
Physics   Prof. Black    Basic Mechanics
Physics   Prof. Black    Principles of Optics
Math      Prof. White    Modern Algebra
Math      Prof. White    Projective Geometry

Sample tabulation of CTX (Normalized)

It is apparent that relation CTX contains a good deal of redundancy, leading as usual to problems over update operations. For example, to add the information that the physics course uses a new text called Advanced Mechanics, it is necessary to create three new tuples, one for each of the three teachers. Nevertheless, CTX is in BCNF, since it is "all key" and there are no other functional determinants.
Now let CTX be replaced by its two projections CT (COURSE, TEACHER) and CX (COURSE, TEXT). CT and CX are both "all key" and are thus both in BCNF.

CT                         CX
COURSE    TEACHER          COURSE    TEXT
Physics   Prof. Green      Physics   Basic Mechanics
Physics   Prof. Brown      Physics   Principles of Optics
Physics   Prof. Black      Math      Modern Algebra
Math      Prof. White      Math      Projective Geometry

SAMPLE TABULATION OF CT AND CX

We can now see that relation CTX is not in 4NF, since it involves an MVD that is not an FD at all, let alone an FD in which the determinant is a candidate key. The two projections CT and CX are in 4NF, however. Thus 4NF is an improvement over BCNF, as the join sketched below confirms.
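
A one-line check, under the same naming assumptions, that the decomposition is non-loss: joining CT and CX over COURSE reproduces the normalized CTX.

    -- The join of the two 4NF projections recovers CTX exactly.
    SELECT CT.COURSE, CT.TEACHER, CX.TEXT
    FROM   CT JOIN CX ON CT.COURSE = CX.COURSE;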

FIFTH NORMAL FORM

SPJ

S#   P#   J#
S1   P1   J2
S1   P2   J1
S2   P1   J1
S1   P1   J1

SP            PJ            SJ
S#   P#       P#   J#       S#   J#
S1   P1       P1   J2       S1   J2
S1   P2       P2   J1       S1   J1
S2   P1       P1   J1       S2   J1

JOIN OVER P#

S#   P#   J#
S1   P1   J2
S1   P1   J1
S1   P2   J1
S2   P1   J2   ← spurious row
S2   P1   J1

Joining this intermediate result with SJ over (J#, S#) eliminates the spurious row (S2, P1, J2) and yields the original SPJ.

SPJ is the join of three of its projections, but not of any two.

A relation R is in fifth normal form (5NF) – also called projection-join normal form (PJ/NF) – if and only if every join dependency in R is implied by the candidate keys of R. (We amplify below the notion of a JD being "implied by candidate keys".)

Relation SPJ is not in 5NF: its single candidate key, the combination (S#, P#, J#), certainly does not imply that the relation can be non-loss decomposed into its projections SP, PJ and SJ. The projections SP, PJ and SJ themselves are in 5NF.

For instance, in a relation where S# and SNAME are both candidate keys, the corresponding JD is implied by those keys. Thus, given a relation R, we can tell whether R is in 5NF provided we know its candidate keys and all the JDs in R.
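
A hedged sketch of the spurious-tuple argument in SQL (SP, PJ and SJ here denote the three projections of SPJ, not the earlier supplier-parts table; SNO/PNO/JNO are assumed identifiers): the two-way join produces the extra tuple (S2, P1, J2), and only the further join with the third projection removes it.

    -- Join of SP and PJ over P#: contains the spurious tuple (S2, P1, J2).
    SELECT SP.SNO, SP.PNO, PJ.JNO
    FROM   SP JOIN PJ ON SP.PNO = PJ.PNO;

    -- Joining the intermediate result with SJ over (S#, J#) eliminates
    -- the spurious tuple and recovers the original SPJ exactly.
    SELECT X.SNO, X.PNO, X.JNO
    FROM  (SELECT SP.SNO, SP.PNO, PJ.JNO
           FROM   SP JOIN PJ ON SP.PNO = PJ.PNO) AS X
          JOIN SJ ON X.SNO = SJ.SNO AND X.JNO = SJ.JNO;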

Database Administration
One of the main reasons for having a database management system is to have central control of both the data and the programs accessing that data. The person having such control over the system is called the Database Administrator.

The Database Administrator (DBA) is the person, or group of persons, responsible for the overall control of the database system.

The DBA’s responsibilities include the following

1. Deciding the storage structure and access strategy


The DBA decides how the data is to be represented in the database, and must specify the representation by writing the storage structure definition (using the internal data definition language).

2. Deciding the information content of the database.


It is the DBA's job to decide exactly what information is to be held in the database. The DBA must define the content of the database by writing the conceptual schema.

3. Liaising with users


It is the business of the DBA to liaise with users, to ensure that the data they require is available, and to write the necessary external schemas (using the appropriate external data definition language).

4. Defining authorization checks and validation procedures


Authorization checks and validation procedures may be considered as logical extensions of the conceptual schema. The conceptual DDL will include facilities for specifying such checks and procedures. The granting of different types of authorization for data access to the various users of the database is also the job of the DBA.

5. Defining a strategy for backup and recovery


Once the enterprise is committed to a database system, it becomes critically dependent on the successful operation of that system. In the event of damage to any portion of the database (caused by human error, say, or by a failure in the hardware or in the supporting operating system), it is essential to be able to repair the data concerned with a minimum of delay. The DBA must define and implement an appropriate recovery strategy, for example, periodic dumping of the database to a backup tape.

6. Monitoring performance and responding to changes in requirements


The DBA is responsible for organizing the system so as to get the performance that is "best for the enterprise", and for making the appropriate adjustments as requirements change. Any change to the details of storage and access must be accompanied by a corresponding change to the definition of the mapping to storage, so that the change is confined to the mapping and no change is needed in the applications.

DATA ITEM –
A data item is the smallest unit of named data. It may consist of any number of bits or bytes. A data item is often referred to as a field or data element. In COBOL it is called an elementary item. In a DBMS a data item is atomic: we cannot decompose it into sub-items.

DATA AGGREGATE –
A data aggregate is a collection of data items within a record, which is given a name and
referred to as a whole.
A data aggregate is called a group or group item in COBOL .

RECORD –
A Record is a named collection of data items or data aggregates.

SEGMENT –
It is the view of some authorities that there is no need to differentiate between a data aggregate and a record, because each is a collection of data items.

FILE –
A file is the named collection of all occurrences of a given type of (logical) record.

DATA BASE –
A database is a collection of the occurrences of multiple record types , containing the
relationships between records, data aggregates and data items.

ENTITIES –
An entity is a thing that exists and is distinguishable; that is, we can tell one entity from another. For example: each chair, each person, each automobile.
A group of all similar entities is called an entity set. Examples of entity sets are:

(1) All persons.
(2) All automobiles.
(3) All chairs.

ATTRIBUTES –
Entities have properties, called attributes; the qualities of an entity that we store as information are its attributes.

ENTITY RELATIONSHIP DIAGRAM (ER DIAGRAM) –

An entity relationship diagram is drawn as follows:

(1) Rectangles represent entity sets.
(2) Circles represent attributes; they are linked to their entity sets by undirected edges.
(3) Diamonds represent relationships; they are linked to their constituent entity sets by undirected edges.

E-R DIAGRAM FOR VENDOR, ITEM AND THEIR RELATIONSHIP: the entity set VENDOR (attributes vendor-code, vendor-no., vendor-addr.) is connected through the relationship SUPPLY (attributes order-no., qty. supplied, date of supply) to the entity set ITEM (attributes item-code, item-name, price/unit).

GENERALISATION AND SPECIALISATION –

Abstraction is the simplification mechanism used to hide the superfluous details of a set of objects; it allows one to concentrate on the properties that are of interest to the application. As such, car is an abstraction of a personal transportation vehicle, but it does not reveal details about model, year, colour and so on. Vehicle itself is an abstraction that includes the types car, truck and bus.

There are two main abstraction mechanisms used to model information: generalisation and aggregation. Generalisation is the abstracting process of viewing a set of objects as a single general class, by concentrating on the general characteristics of the constituent sets while suppressing or ignoring their differences. It is the union of a number of lower-level entity types for the purpose of producing a higher-level entity type. For instance, student is a generalisation of graduate or undergraduate, full-time or part-time students.

137
Generalisation is an IS_A relationship; therefore, manager IS_An employee, cook IS_An employee, waiter IS_An employee, and so forth. Specialisation is the abstracting process of introducing new characteristics to an existing class of objects to create one or more new classes of objects. This involves taking a higher-level entity and, using additional characteristics, generating lower-level entities. The lower-level entities also inherit the characteristics of the higher-level entity. In applying the characteristic size to car we can create a full-size, mid-size, compact or subcompact car. Specialisation may be seen as the reverse process of generalisation.

The entity set EMPLOYEE is a generalisation of the entity sets FULL_TIME_EMPLOYEE and PART_TIME_EMPLOYEE. The former is a generalisation of the entity sets FACULTY and STAFF; the latter, of the entity sets TEACHING and CASUAL. FACULTY and STAFF inherit the attribute Salary of the entity set FULL_TIME_EMPLOYEE, and the latter, in turn, inherits the attributes of EMPLOYEE. FULL_TIME_EMPLOYEE is a specialisation of the entity set EMPLOYEE and is differentiated by the additional attribute Salary. Similarly, PART_TIME_EMPLOYEE is a specialisation differentiated by the presence of the attribute Type.

Generalisation and Specialisation: EMPLOYEE (Employee_No, Name, Date_of_Hire) is specialised, via IS_A links, into FULL_TIME_EMPLOYEE (Salary) and PART_TIME_EMPLOYEE (Type); FULL_TIME_EMPLOYEE is further specialised into FACULTY (Degree, Interest) and STAFF (Classification), and PART_TIME_EMPLOYEE into TEACHING (Stipend) and CASUAL (Hour_Rate).

AGGREGATION –
Aggregation is the process of compiling information on an object, thereby abstracting a higher-level object. In this manner, the entity person is derived by aggregating the characteristics name, address and Social Security number. Another form of aggregation is abstracting a relationship between objects and viewing the relationship as an object. As such, the ENROLLMENT relationship between the entities student and course could be viewed as the entity REGISTRATION.

EXAMPLES OF AGGREGATION: the entity PERSON is an aggregation of the attributes Name, Address and S_S_No; likewise, the relationship between STUDENT and COURSE can be abstracted into a single higher-level object.

Important Things

• Attributes should be facts about the entity itself (for example, the employee's name, the employee's age, and so on).
• Attributes are mono-atomic, i.e. single-valued.
• We first define the mandatory attributes (not null, marked *) and then the optional ones (marked O).
• A relation has a fixed number of columns but an unbounded number of rows.
• Clearly differentiate between entities and attributes (is a phone an entity or an attribute? A contact number may simply be modelled as attributes such as Phone1_off and Phone2_res).
• The primary key (marked #) must be unique and not null.
Chapter 9 DBMS Transactions
Transaction Concept

• A transaction is a unit of program execution that accesses and possibly updates various data items.
• A transaction must see a consistent database.
• During transaction execution the database may be inconsistent.
• When the transaction is committed, the database must be consistent.
• Two main issues to deal with:
o Failures of various kinds, such as hardware failures and system crashes
o Concurrent execution of multiple transactions

ACID Properties

To preserve integrity of data, the database system must ensure:

• Atomicity. Either all operations of the transaction are properly reflected in the
database or none are.
• Consistency. Execution of a transaction in isolation preserves the consistency of
the database.
• Isolation. Although multiple transactions may execute concurrently, each
transaction must be unaware of other concurrently executing transactions.
o That is, for every pair of transactions Ti and Tj, it appears to Ti that either
Tj, finished execution before Ti started, or Tj started execution after Ti
finished.
• Durability. After a transaction completes successfully, the changes it has made to
the database persist, even if there are system failures.

Example of Fund Transfer

• Transaction to transfer $50 from account A to account B:

1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)

• Consistency requirement – the sum of A and B is unchanged by the execution of the transaction.
• Atomicity requirement – if the transaction fails after step 3 and before step 6, the system should ensure that its updates are not reflected in the database, else an inconsistency will result.
• Durability requirement – once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist despite failures.
• Isolation requirement – if, between steps 3 and 6, another transaction is allowed to access the partially updated database, it will see an inconsistent state (the sum A + B will appear smaller than it should be). A SQL sketch of the transfer follows below.
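
As a hedged sketch, the transfer can be expressed as a single SQL transaction (the ACCOUNTS table, with columns ACCT_NO and BALANCE, is an assumption; the statement that begins a transaction varies by system):

    -- Transfer $50 from account A to account B atomically.
    START TRANSACTION;   -- in SQL a transaction may also begin implicitly

    UPDATE ACCOUNTS SET BALANCE = BALANCE - 50 WHERE ACCT_NO = 'A';
    UPDATE ACCOUNTS SET BALANCE = BALANCE + 50 WHERE ACCT_NO = 'B';

    COMMIT;              -- atomicity: both updates persist, or neither does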

Transaction State

• Active, the initial state; the transaction stays in this state while it is executing
• Partially committed, after the final statement has been executed.
• Failed, after the discovery that normal execution can no longer proceed.
• Aborted, after the transaction has been rolled back and the database restored to its
state prior to the start of the transaction. Two options after it has been aborted:
o restart the transaction – only if no internal logical error
o kill the transaction
• Committed, after successful completion.

Implementation of Atomicity and Durability

The shadow-database scheme (in outline): updates are made on a shadow copy of the database, and the pointer to the current database, db_pointer, is made to point to the updated copy only after the transaction reaches the partially committed state and the updated copy has been flushed to disk. If the transaction fails at any point, the old copy is untouched and simply remains the current database.
Concurrent Executions

• Multiple transactions are allowed to run concurrently in the system. Advantages
are:
o increased processor and disk utilization, leading to better transaction
throughput: one transaction can be using the CPU while another is reading
from or writing to the disk
o reduced average response time for transactions: short transactions need
not wait behind long ones.

Schedules

• Schedules – sequences that indicate the chronological order in which instructions of concurrent transactions are executed:
o a schedule for a set of transactions must consist of all instructions of those transactions
o it must preserve the order in which the instructions appear in each individual transaction.

Example Schedules

• Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
The following is a serial schedule (Schedule 1 in the text), in which T1 is
followed by T2.

• Let T1 and T2 be the transactions defined previously. The following schedule
(Schedule 3 in the text) is not a serial schedule, but it is equivalent to Schedule 1.

• The following concurrent schedule (Schedule 4 in the text) does not preserve the value of the sum A + B.

Serializability

• A schedule in which the operations of each transaction are executed consecutively, without any interleaved operations from other transactions, is called a serial schedule.
• A schedule in which the operations from a group of transactions are interleaved is called a non-serial schedule.
• Serial execution of a set of transactions therefore preserves database consistency.
• The objective of serializability is to find non-serial schedules that allow transactions to execute concurrently, without interfering with one another, and that produce a database state that could be produced by some serial execution. We say that a non-serial schedule is correct if it produces the same result as a serial schedule.
• A (possibly concurrent) schedule is serializable if it is equivalent to a serial
schedule. Different forms of schedule equivalence give rise to the notions of:

1. conflict serializability

2. view serializability

Conflict Serializability

• Instructions li and lj of transactions Ti and Tj, respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q.

1. li = read(Q), lj = read(Q). li and lj don't conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.

• If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.
• We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
• Example of a schedule that is not conflict serializable:

T3            T4
read(Q)
              write(Q)
write(Q)

We are unable to swap instructions in the above schedule to obtain either the serial schedule < T3, T4 > or the serial schedule < T4, T3 >.

• Schedule 3 below can be transformed into Schedule 1, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions. Therefore Schedule 3 is conflict serializable.

View Serializability

• Let S and S´ be two schedules with the same set of transactions. S and S´ are view
equivalent if the following three conditions are met:

1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then
transaction Ti must, in schedule S´, also read the initial value of Q.

2. For each data item Q if transaction Ti executes read(Q) in schedule S, and that value
was produced by transaction Tj (if any), then transaction Ti must in schedule S´ also read
the value of Q that was produced by transaction Tj .

3. For each data item Q, the transaction (if any) that performs the final write(Q)
operation in schedule S must perform the final write(Q) operation in schedule S´.

As can be seen, view equivalence is based purely on reads and writes alone.

Recoverability

We need to address the effect of transaction failures on concurrently running transactions.

• Recoverable schedule – if a transaction Tj reads a data item previously written by a transaction Ti, then the commit operation of Ti must appear before the commit operation of Tj.
• The following schedule (Schedule 11) is not recoverable if T9 commits immediately after the read.
• If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence the database must ensure that schedules are recoverable.

Cascading rollback – a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule, where none of the transactions has yet committed (so the schedule is recoverable):

• If T10 fails, T11 and T12 must also be rolled back.
• This can lead to the undoing of a significant amount of work.

• Cascadeless schedules — cascading rollbacks cannot occur; for each pair of
transactions Ti and Tj such that Tj reads a data item previously written by Ti, the
commit operation of Ti appears before the read operation of Tj.
• Every cascadeless schedule is also recoverable
• It is desirable to restrict the schedules to those that are cascadeless

Implementation of Isolation

• Schedules must be conflict or view serializable, and recoverable, for the sake of
database consistency, and preferably cascadeless.
• A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency.
• Concurrency-control schemes trade off the amount of concurrency they allow against the amount of overhead that they incur.
• Some schemes allow only conflict-serializable schedules to be generated, while
others allow view-serializable schedules that are not conflict-serializable.

Transaction Definition in SQL

• Data manipulation language must include a construct for specifying the set of
actions that comprise a transaction.
• In SQL, a transaction begins implicitly.
• A transaction in SQL ends by:
o Commit work commits current transaction and begins a new one.
o Rollback work causes current transaction to abort.
• Levels of consistency specified by SQL-92:
o Serializable — default
o Repeatable read
o Read committed
o Read uncommitted

Levels of Consistency in SQL-92

• Serializable — default
• Repeatable read — only committed records to be read, repeated reads of same
record must return same value. However, a transaction may not be serializable – it
may find some records inserted by a transaction but not find others.
• Read committed — only committed records can be read, but successive reads of
record may return different (but committed) values.
• Read uncommitted — even uncommitted records may be read.
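
A short SQL-92-style sketch tying the two ideas together (the ACCOUNTS table is the same assumption as before):

    -- Request the desired consistency level before the transaction begins.
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    UPDATE ACCOUNTS SET BALANCE = BALANCE - 50 WHERE ACCT_NO = 'A';

    COMMIT WORK;    -- make the changes permanent and begin a new transaction
    -- ROLLBACK WORK;  -- alternatively, abort and undo the whole transaction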

Chapter 10 DBMS Advancements

1. Object-Based Databases: Object-based databases store data in the form of objects,


allowing for complex data structures to be represented more naturally. These
objects can encapsulate both data and the operations that can be performed on that
data. This approach is particularly useful for applications that deal with multimedia
data or other complex structures, as it aligns well with object-oriented programming
paradigms.
2. Object-Relational Features: Object-relational databases combine the best of both
worlds: the robust query capabilities of relational databases and the expressive
power of object-oriented databases. They can store structured data in tables while
also accommodating complex data types and relationships. This is achieved
through concepts like user-defined types, inheritance, and methods associated with
these types.
3. Object Data Management Group (ODMG) Object Model: The ODMG standardizes
how object-oriented databases work by defining the core concepts, such as objects,
classes, inheritance, and relationships. This standardization helps developers work
with object-oriented databases more consistently across different systems.
4. Object Definition Language (ODL): ODL is used to define the schema of an object-
oriented database. It allows you to create and manage classes, specify their
attributes, define inheritance hierarchies, and associate methods with these classes
for manipulating data.
5. Object Query Language (OQL): OQL is a query language used to retrieve and
manipulate data from object-oriented databases. It enables users to express complex
queries involving objects, their attributes, and relationships. OQL is designed to
work seamlessly with the object model.
6. Extensible Markup Language (XML) Databases: XML databases are designed to
store and manage XML documents. These databases offer efficient storage and
retrieval of XML data, allowing hierarchical and semi-structured data to be easily
managed.
7. XML Hierarchical Model: XML data is structured hierarchically using nested
elements and attributes. Each XML document follows a tree-like structure, with a
root element that has child elements, forming a parent-child relationship.
8. Document Type Definition (DTD) and XML Schema: DTD and XML Schema are
used to define the structure, constraints, and data types of XML documents. They
ensure that XML data adheres to a specific format, making validation and
interpretation more consistent.
9. XQuery: XQuery is a query language specifically designed for querying XML data.
It allows you to extract, filter, transform, and combine XML data from various sources. XQuery supports complex queries that involve navigating through XML hierarchies.
10. DBMS Security: DBMS security involves identifying vulnerabilities and threats
that could compromise the integrity, confidentiality, and availability of data.
Security controls include authentication mechanisms, authorization policies,
encryption, auditing, and access controls to mitigate these risks.
11. DBMS Digital Rights Management (DRM): DBMS DRM focuses on protecting
digital content stored in databases. It involves mechanisms to control who can
access, view, modify, and distribute digital assets. DRM systems often use
encryption, access controls, watermarks, and usage tracking to enforce content
usage policies.
