OS and DBMS Study Material - 1
Study Material
Course Teacher
Dr. Ashish Mohan Yadav
Chapter 1 Introduction
Operating systems come with a wide range of features that make them powerful, efficient,
and user-friendly. Here are some of the key features commonly found in modern operating
systems:
Multiuser Support: They support multiple user accounts, enabling different users to have
their personalized settings and files on the same device.
Graphical User Interface (GUI): Most modern operating systems provide a graphical
interface that makes it easier for users to interact with the computer through windows,
icons, menus, and pointers (WIMP).
File System: Operating systems have advanced file systems that organize and manage data
efficiently, providing features like hierarchical file structures, access control, and metadata
support.
Virtual Memory: This feature allows the operating system to use a portion of the hard
drive as an extension of physical RAM, which improves system performance by allowing
larger applications to run.
Device Drivers: Operating systems include device drivers that facilitate communication
between the hardware devices (e.g., printers, graphics cards, and input devices) and the
software.
Networking Support: Modern operating systems offer built-in networking capabilities,
allowing users to connect to the internet, share files, and access remote resources.
Security Features: Security is a crucial aspect, and modern operating systems come with
various security features such as user authentication, access control, firewalls, and
encryption to protect data and the system from threats.
Task Scheduling: The operating system manages the scheduling of tasks and allocates
CPU time to various processes efficiently to ensure smooth and fair execution.
Compatibility Support: They are designed to run a wide range of applications and support
different hardware configurations.
Updates and Patches: Modern operating systems often receive regular updates and
patches to fix bugs, enhance security, and improve performance.
Cloud Integration: Many modern operating systems have integrated cloud services,
enabling seamless access and synchronization of files and settings across devices.
Voice Assistants and AI Integration: Some modern operating systems incorporate voice
assistants and AI integration, enabling voice commands and intelligent features.
The evolution of operating systems refers to their historical development, progress, and
improvement over time. Operating systems have gone through several generations and
have witnessed significant advancements since the inception of computers. Here's a brief
overview of the key stages of operating system evolution:
First Generation (1940s-1950s): The earliest computers had no operating systems. Users
interacted with the hardware directly using machine language.
Second Generation (1950s-1960s): Batch processing operating systems emerged, allowing users to submit jobs in batches, which the system executed sequentially. Examples include IBM's OS/360.
Third Generation (1960s-1970s): Multiprogramming and time-sharing systems appeared, keeping several jobs in memory at once and switching the CPU among them; UNIX originated in this era.
Fourth Generation (1970s-1990s): Personal-computer operating systems with graphical user interfaces became widespread, such as MS-DOS, Mac OS, and Microsoft Windows.
Fifth Generation (1990s-2000s): Networked, distributed, and mobile operating systems emerged to support the internet era and handheld devices.
Sixth Generation (present and beyond): Operating systems continue to evolve to meet
the demands of emerging technologies like artificial intelligence, virtual reality, and the
Internet of Things (IoT).
Throughout their evolution, operating systems have become more sophisticated, reliable,
and capable of handling complex tasks, making computers and other devices more
accessible and powerful for users.
Please note that this is a brief overview, and the actual historical development of operating
systems is much more intricate with numerous milestones and innovations in the field. The
ongoing evolution of operating systems is likely to continue as technology advances
further.
There are different types of operating systems because computer systems serve various
purposes and cater to diverse user needs. Each type of operating system is designed to
address specific requirements, use cases, and hardware configurations. The diversity in
operating systems allows for customization and specialization to optimize performance and
user experience in different computing environments. Different types of OS are
Multitasking Operating Systems:
Timesharing operating systems are a specific type of multitasking system that provides the
illusion of simultaneous execution of multiple processes to multiple users. The CPU rapidly
switches between different user sessions, giving each user a time slice (time-sharing) to
interact with the system. Timesharing systems were prevalent in mainframe computers and
played a significant role in making computing more accessible to multiple users.
Multithreading operating systems enable multiple threads within a single process to run
concurrently. Threads are lightweight units of execution that share the same resources
within a process, such as memory space and file handles. Multithreading allows for better
utilization of CPU cores and can enhance the performance of applications that are designed
to take advantage of multiple threads.
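To make the idea concrete, here is a minimal POSIX-threads sketch in C (an illustration added for this material, assuming a Unix-like system and compilation with -pthread); it creates two threads that run concurrently inside one process and share its address space:

#include <pthread.h>
#include <stdio.h>

/* Each thread executes this function; both threads share the
   process's memory space and file handles. */
static void *worker(void *arg)
{
    int id = *(int *)arg;
    printf("thread %d running inside the same process\n", id);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    int id1 = 1, id2 = 2;

    /* Create two lightweight units of execution in one process. */
    pthread_create(&t1, NULL, worker, &id1);
    pthread_create(&t2, NULL, worker, &id2);

    /* Wait for both threads to finish before exiting. */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

On a multicore machine the two threads may run on different cores at the same time, which is exactly the improved utilization of CPU cores described above.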
Real-Time Operating Systems (RTOS):
An RTOS is designed for applications that require precise and deterministic timing. These
operating systems prioritize tasks based on their time-critical nature, ensuring that critical
tasks meet strict deadlines. RTOS is commonly used in embedded systems, robotics,
industrial automation, and other real-time applications.
Distributed Operating Systems:
A distributed operating system runs on multiple interconnected machines and enables them
to work together as a single unified system. Distributed operating systems facilitate
resource sharing, load balancing, and fault tolerance across the network. They are used in
large-scale computing environments and data centers to harness the combined processing
power of multiple machines.
Network Operating Systems:
Network operating systems are designed to manage and coordinate network resources and
activities. These operating systems provide functionalities like file and printer sharing, user
authentication, and centralized administration of networked devices. NOS is commonly
used in environments where multiple computers need to communicate and share resources
over a local area network (LAN) or a wide area network (WAN).
It's worth noting that many modern operating systems combine features of several of these types to cater to various needs and use cases. Additionally,
the field of operating systems is continually evolving, and new advancements and
variations may emerge over time.
An operating system (OS) is software that manages computer hardware and provides services to user applications.
The hardware—the central processing unit (CPU), the memory, and the input/output (I/O)
devices—provides the basic computing resources for the system. The application
programs—such as word processors, spreadsheets, compilers, and Web browsers—define
the ways in which these resources are used to solve users’ computing problems. The
operating system controls the hardware and coordinates its use among the various
application programs for the various users.
We can also view a computer system as consisting of hardware, software, and data. The
operating system provides the means for proper use of these resources in the operation of
the computer system. An operating system is similar to a government. Like a government,
it performs no useful function by itself. It simply provides an environment within which
other programs can do useful work. The key concepts and structures of operating systems
are:
Kernel: The kernel is the core component of an operating system, responsible for
managing resources, providing services to applications, and acting as an intermediary
between hardware and software.
File System: The file system manages the organization and storage of files and directories
on storage devices, such as hard drives.
Device Management: The OS interacts with hardware devices through device drivers,
enabling communication between software and hardware components.
User Interface: The user interface allows users to interact with the operating system and
applications. It can be command-line-based (CLI) or graphical (GUI).
A layered operating system architecture divides the OS into functional layers, where each
layer provides specific services to the layer above it. The layered approach simplifies the
design, implementation, and maintenance of the operating system. Typically, the layers are
arranged in a hierarchical manner, and each layer communicates only with the layers
directly above and below it. If one layer needs to be modified or replaced, it can often be
done without affecting the other layers.
Application Layer: The top layer where user applications and utilities reside.
Operating System Services Layer: This layer provides services like process
management, memory management, file systems, and device drivers. It acts as an interface
between applications and the hardware.
Hardware Layer: The bottom layer interacts directly with the computer hardware,
providing drivers and low-level control over hardware components.
Monolithic Systems:
In a monolithic system, a crash or failure in one part of the kernel can potentially affect the
entire system. However, monolithic systems can be efficient because there is no overhead
from communication between layers, as seen in layered systems.
Examples of monolithic operating systems include early versions of UNIX, Linux (to some
extent), and Microsoft Windows (prior to the introduction of the Windows NT architecture,
which follows a hybrid approach).
Chapter 2 Operating System Functions
We can view an operating system from several vantage points. One view focuses on the
services that the system provides; another, on the interface that it makes available to users
and programmers; a third, on its components and their interconnections. In this chapter, we
explore all three aspects of operating systems, showing the viewpoints of users,
programmers, and operating system designers. We consider what services an operating
system provides, how they are provided, how they are debugged, and what the various
methodologies are for designing such systems. Finally, we describe how operating systems
are created and how a computer starts its operating system.
Operating-System Operations
As mentioned earlier, modern operating systems are interrupt driven. If there are no
processes to execute, no I/O devices to service, and no users to whom to respond, an
operating system will sit quietly, waiting for something to happen. Events are almost
always signaled by the occurrence of an interrupt or a trap. A trap (or an exception) is a
software-generated interrupt caused either by an error (for example, division by zero or
invalid memory access) or by a specific request from a user program that an operating-
system service be performed. The interrupt-driven nature of an operating system defines
that system’s general structure. For each type of interrupt, separate segments of code in the
operating system determine what action should be taken. An interrupt service routine is
provided to deal with the interrupt.
Since the operating system and the users share the hardware and software resources of the
computer system, we need to make sure that an error in a user program could cause
problems only for the one program running. With sharing, many processes could be
adversely affected by a bug in one program. For example, if a process gets stuck in an
infinite loop, this loop could prevent the correct operation of many other processes. More
subtle errors can occur in a multiprogramming system, where one erroneous program might
modify another program, the data of another program, or even the operating system itself.
Without protection against these sorts of errors, either the computer must execute only one
process at a time or all output must be suspect. A properly designed operating system must
ensure that an incorrect (or malicious) program cannot cause other programs to execute
incorrectly.
At the very least, we need two separate modes of operation: user mode and kernel mode
(also called supervisor mode, system mode, or privileged mode). A bit, called the mode
bit, is added to the hardware of the computer to indicate the current mode: kernel (0) or
user (1). With the mode bit, we can distinguish between a task that is executed on behalf
of the operating system and one that is executed on behalf of the user. When the computer
system is executing on behalf of a user application, the system is in user mode. However,
when a user application requests a service from the operating system (via a system call),
the system must transition from user to kernel mode to fulfill the request. As we shall see, this architectural enhancement is useful for many other aspects of system operation as well. At system boot time, the hardware starts in kernel
mode. The operating system is then loaded and starts user applications in user mode.
Whenever a trap or interrupt occurs, the hardware switches from user mode to kernel mode
(that is, changes the state of the mode bit to 0). Thus, whenever the operating system gains
control of the computer, it is in kernel mode. The system always switches to user mode (by
setting the mode bit to 1) before passing control to a user program. The dual mode of
operation provides us with the means for protecting the operating system from errant
users—and errant users from one another. We accomplish this protection by designating
some of the machine instructions that may cause harm as privileged instructions. The
hardware allows privileged instructions to be executed only in kernel mode. If an attempt
is made to execute a privileged instruction in user mode, the hardware does not execute the
instruction but rather treats it as illegal and traps it to the operating system.
We can now see the life cycle of instruction execution in a computer system. Initial control
resides in the operating system, where instructions are executed in kernel mode. When
control is given to a user application, the mode is set to user mode. Eventually, control is
switched back to the operating system via an interrupt, a trap, or a system call.
System calls provide the means for a user program to ask the operating system to perform
tasks reserved for the operating system on the user program’s behalf. A system call is
invoked in a variety of ways, depending on the functionality provided by the underlying
processor. In all forms, it is the method used by a process to request action by the operating
system. A system call usually takes the form of a trap to a specific location in the interrupt
vector. This trap can be executed by a generic trap instruction, although some systems
(such as MIPS) have a specific syscall instruction to invoke a system call.
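As a small illustration (not part of the original text), the following C program on Linux requests the same kernel service twice: once through the usual C library wrapper and once through the generic syscall() interface, which traps into kernel mode with the system-call number and arguments:

#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    const char msg[] = "hello via system call\n";

    /* The usual route: the library wrapper issues the trap for us. */
    write(STDOUT_FILENO, msg, sizeof msg - 1);

    /* The direct route: pass the system-call number (SYS_write) and
       its arguments; the CPU switches to kernel mode, the kernel
       performs the write, and control returns in user mode. */
    syscall(SYS_write, STDOUT_FILENO, msg, sizeof msg - 1);
    return 0;
}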
A program does nothing unless its instructions are executed by a CPU. A program in
execution, as mentioned, is a process. A time-shared user program such as a compiler is a
process. A word-processing program being run by an individual user on a PC is a process.
A system task, such as sending output to a printer, can also be a process (or at least part of
one). For now, you can consider a process to be a job or a time-shared program, but later
you will learn that the concept is more general. As we shall see in Chapter 3, it is possible
to provide system calls that allow processes to create subprocesses to execute concurrently.
A process needs certain resources—including CPU time, memory, files, and I/O devices—
to accomplish its task. These resources are either given to the process when it is created or
allocated to it while it is running. In addition to the various physical and logical resources
that a process obtains when it is created, various initialization data (input) may be passed
along. For example, consider a process whose function is to display the status of a file on
the screen of a terminal. The process will be given the name of the file as input and will
execute the appropriate instructions and system calls to obtain and display the desired
information on the terminal. When the process terminates, the operating system will
reclaim any reusable resources.
The operating system is responsible for the following activities in connection with process
management:
• Allocating various jobs to the processor and ensuring that each process receives enough time from the processor to function properly.
• Keeping track of the status of processes (the program which performs this task is known as a traffic controller).
• Allocating the CPU (that is, the processor) to a process.
• De-allocating the processor when a process is no longer required.
Memory Management
The main memory is central to the operation of a modern computer system. Main memory
is a large array of bytes, ranging in size from hundreds of thousands to billions. Each byte
has its own address. Main memory is a repository of quickly accessible data shared by the
CPU and I/O devices. The central processor reads instructions from main memory during
the instruction-fetch cycle and both reads and writes data from main memory during the
data-fetch cycle (on a von Neumann architecture). As noted earlier, the main memory is
generally the only large storage device that the CPU is able to address and access directly.
For example, for the CPU to process data from disk, those data must first be transferred to
main memory by CPU-generated I/O calls. In the same way, instructions must be in
memory for the CPU to execute them.
For a program to be executed, it must be mapped to absolute addresses and loaded into
memory. As the program executes, it accesses program instructions and data from memory
by generating these absolute addresses. Eventually, the program terminates, its memory
space is declared available, and the next program can be loaded and executed. To improve
both the utilization of the CPU and the speed of the computer’s response to its users,
general-purpose computers must keep several programs in memory, creating a need for
memory management. Many different memory management schemes are used. These
schemes reflect various approaches, and the effectiveness of any given algorithm depends
on the situation. In selecting a memory-management scheme for a specific system, we must
take into account many factors especially the hardware design of the system. Each
algorithm requires its own hardware support.
The operating system is responsible for the following activities in connection with memory
management:
• Keeping track of which parts of memory are currently being used and who is using them
• Deciding which processes (or parts of processes) and data to move into and out of memory
• Allocating and deallocating memory space as needed
File Management
File management is one of the most visible components of an operating system. Computers
can store information on several different types of physical media. Magnetic disk, optical
disk, and magnetic tape are the most common. Each of these media has its own
characteristics and physical organization. Each medium is controlled by a device, such as
a disk drive or tape drive, that also has its own unique characteristics. These properties
include access speed, capacity, data-transfer rate, and access method (sequential or
random).
The operating system implements the abstract concept of a file by managing mass-storage
media, such as tapes and disks, and the devices that control them. In addition, files are
normally organized into directories to make them easier to use. Finally, when multiple
users have access to files, it may be desirable to control which user may access a file and
how that user may access it (for example, read, write, append).
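For instance, on a POSIX system the allowed access mode is chosen when a file is opened; the following small C sketch (an illustration using made-up file names) opens one file read-only and another in append mode:

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* Open an existing file for reading only. */
    int rfd = open("data.txt", O_RDONLY);

    /* Open for writing, creating the file (mode 0644) if needed;
       O_APPEND forces every write to go to the end of the file. */
    int wfd = open("log.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);

    if (wfd >= 0)
        write(wfd, "entry\n", 6);

    if (rfd >= 0) close(rfd);
    if (wfd >= 0) close(wfd);
    return 0;
}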
The operating system is responsible for the following activities in connection with file management:
• Creating and deleting files
• Creating and deleting directories to organize files
• Supporting primitives for manipulating files and directories
• Mapping files onto secondary storage
• Backing up files on stable (nonvolatile) storage media
Mass-Storage Management
As we have already seen, because main memory is too small to accommodate all data and
programs, and because the data that it holds are lost when power is lost, the computer
system must provide secondary storage to back up main memory. Most modern computer
systems use disks as the principal on-line storage medium for both programs and data. Most
programs—including compilers, assemblers, word processors, editors, and formatters—
are stored on a disk until loaded into memory. They then use the disk as both the source
and destination of their processing. Hence, the proper management of disk storage is of
central importance to a computer system. The operating system is responsible for the
following activities in connection with disk management:
• Free-space management
• Storage allocation
• Disk scheduling
Because secondary storage is used frequently, it must be used efficiently. The entire speed
of operation of a computer may hinge on the speeds of the disk subsystem and the
algorithms that manipulate that subsystem. There are, however, many uses for storage that
is slower and lower in cost (and sometimes of higher capacity) than secondary storage.
Backups of disk data, storage of seldom-used data, and long-term archival storage are some
examples. Magnetic tape drives and their tapes and CD and DVD drives and platters are
typical tertiary storage devices. The media (tapes and optical platters) vary between
WORM (write-once, read-many-times) and RW (read– write) formats.
Tertiary storage is not crucial to system performance, but it still must be managed. Some
operating systems take on this task, while others leave tertiary-storage management to
application programs. Some of the functions that operating systems can provide include
mounting and unmounting media in devices, allocating and freeing the devices for
exclusive use by processes, and migrating data from secondary to tertiary storage.
Caching
Caching works based on the principle of locality, which states that programs tend to access a relatively small portion of data frequently and exhibit temporal and spatial locality:
Temporal Locality: When a data item is accessed, it is likely to be accessed again in the near future. Caching takes advantage of temporal locality by keeping recently used items in fast memory.
Spatial Locality: When a data item is accessed, neighboring data items are also likely to be accessed soon. Caching takes advantage of spatial locality by fetching and storing contiguous blocks of data rather than individual items.
Caching in memory management can occur at multiple levels within a computer system:
CPU Cache: Modern CPUs have small but extremely fast cache memory located directly
on the processor chip. The CPU cache is divided into multiple levels, such as L1, L2, and
L3 caches. These caches store frequently used instructions and data, reducing the time the
CPU spends waiting for data to be fetched from main memory.
Main Memory (RAM) Cache: The main memory itself can have a cache, known as the
Memory Cache or Memory-side Cache. This cache sits between the CPU and the main
memory and serves as an intermediary to speed up memory access.
Disk Caching: Operating systems and file systems also use disk caching to store frequently
accessed data from slower storage devices (HDDs or SSDs) in faster RAM. This helps in
reducing disk I/O operations and improving overall system performance.
Web Browser Caching: Web browsers use caching to store previously downloaded web
pages, images, and other resources. When you revisit a web page, the browser can load
some of the resources from the local cache instead of re-downloading them from the
internet, leading to faster page load times.
Caching is an essential technique to optimize memory access and improve the performance
of computer systems. By reducing the need to access slower storage devices and leveraging
the principle of locality, caching ensures that frequently used data is readily available,
leading to faster response times and a more efficient use of system resources.
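The effect of spatial locality can be demonstrated with a short C experiment (an illustration added for this material): both loops below compute the same sum, but the first traverses the array in the order it is laid out in memory, so each fetched cache line is fully used, while the second jumps across rows and typically runs noticeably slower on a real machine:

#include <stdio.h>

#define N 1024
static double a[N][N];            /* C stores this array row by row */

int main(void)
{
    double sum = 0.0;

    /* Cache-friendly: consecutive accesses fall in the same cache line. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    /* Cache-unfriendly: each access jumps a whole row (N doubles) ahead. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];

    printf("%f\n", sum);
    return 0;
}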
Chapter 3 OS System Calls
The most central concept in any operating system is the process: an abstraction of a running
program. Everything else hinges on this concept, and the operating system designer (and
student) should have a thorough understanding of what a process is as early as possible.
Processes are one of the oldest and most important abstractions that operating systems
provide. They support the ability to have (pseudo) concurrent operation even when there is only one CPU available. They turn a single CPU into multiple virtual CPUs. Without the
process abstraction, modern computing could not exist.
All modern computers often do several things at the same time. People used to working
with computers may not be fully aware of this fact, so a few examples may make the point
clearer. First consider a Web server. Requests come in from all over asking for Web pages.
When a request comes in, the server checks to see if the page needed is in the cache. If it
is, it is sent back; if it is not, a disk request is started to fetch it. However, from the CPU’s
perspective, disk requests take eternity. While waiting for a disk request to complete, many
more requests may come in. If there are multiple disks present, some or all of the newer
ones may be fired off to other disks long before the first request is satisfied. Clearly some
way is needed to model and control this concurrency. Processes (and especially threads)
can help here.
Now consider a user PC. When the system is booted, many processes are secretly started,
often unknown to the user. For example, a process may be started up to wait for incoming
email. Another process may run on behalf of the antivirus program to check periodically if
any new virus definitions are available. In addition, explicit user processes may be running,
printing files and backing up the user’s photos on a USB stick, all while the user is surfing
the Web. All this activity has to be managed, and a multiprogramming system supporting
multiple processes comes in very handy here.
In any multiprogramming system, the CPU switches from process to process quickly,
running each for tens or hundreds of milliseconds. While, strictly speaking, at any one
instant the CPU is running only one process, in the course of 1 second it may work on
several of them, giving the illusion of parallelism. Sometimes people speak of
pseudoparallelism in this context, to contrast it with the true hardware parallelism of
multiprocessor systems (which have two or more CPUs sharing the same physical
memory). Keeping track of multiple, parallel activities is hard for people to do. Therefore,
operating system designers over the years have evolved a conceptual model (sequential
processes) that makes parallelism easier to deal with.
While creating a process the operating system performs several operations. To identify the
processes, it assigns a process identification number (PID) to each process. As the
operating system supports multi-programming, it needs to keep track of all the processes.
For this task, the process control block (PCB) is used to track the process’s execution status.
Each block of memory contains information about the process state, program counter, stack
pointer, status of opened files, scheduling algorithms, etc. All this information is required
and must be saved when the process is switched from one state to another. When the
process makes a transition from one state to another, the operating system must update
information in the process’s PCB. A process control block (PCB) contains information
about the process, i.e. registers, quantum, priority, etc. The process table is an array of PCBs, which means it logically contains a PCB for each of the current processes in the system.
● Pointer – It is a stack pointer which is required to be saved when the process is
switched from one state to another to retain the current position of the process.
● Process state – It stores the respective state of the process.
● Process number – Every process is assigned with a unique id known as process ID
or PID which stores the process identifier.
● Program counter – It stores the counter which contains the address of the next
instruction that is to be executed for the process.
● Register – These are the CPU registers, which include the accumulator, base and index registers, and general-purpose registers.
● Memory limits – This field contains information about the memory-management system used by the operating system. This may include the page tables, segment tables, etc.
● Open files list – This information includes the list of files opened for a process.
● Interrupt handling: The PCB also contains information about the interrupts that
a process may have generated and how they were handled by the operating system.
● Context switching: The process of switching from one process to another is called
context switching. The PCB plays a crucial role in context switching by saving the state of
the current process and restoring the state of the next process.
● Real-time systems: Real-time operating systems may require additional
information in the PCB, such as deadlines and priorities, to ensure that time-critical
processes are executed in a timely manner.
● Virtual memory management: The PCB may contain information about a
process’s virtual memory management, such as page tables and page fault handling.
● Inter-process communication: The PCB can be used to facilitate inter-process
communication by storing information about shared resources and communication
channels between processes.
● Fault tolerance: Some operating systems may use multiple copies of the PCB to
provide fault tolerance in case of hardware failures or software errors.
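The exact layout of a PCB is specific to each kernel; purely as an illustration, the fields described above could be gathered into a C structure like the following (all names here are hypothetical, and real kernels, such as Linux with its struct task_struct, store far more):

#include <stdint.h>

enum proc_state { NEW, READY, RUNNING, WAITING, TERMINATED };

/* A simplified, hypothetical process control block. */
struct pcb {
    int             pid;             /* process identifier */
    enum proc_state state;           /* current process state */
    uintptr_t       program_counter; /* address of the next instruction */
    uintptr_t       stack_pointer;   /* saved stack pointer */
    uint64_t        registers[16];   /* saved general-purpose registers */
    int             priority;        /* scheduling priority */
    uintptr_t       page_table;     /* memory-management information */
    int             open_files[16];  /* descriptors of open files */
    uint64_t        cpu_time_used;   /* accounting information */
    struct pcb     *next;            /* link in a ready or waiting queue */
};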
Advantages:
Efficient process management: The process table and PCB provide an efficient way to
manage processes in an operating system. The process table contains all the information
about each process, while the PCB contains the current state of the process, such as the
program counter and CPU registers.
Resource management: The process table and PCB allow the operating system to manage
system resources, such as memory and CPU time, efficiently. By keeping track of each
process’s resource usage, the operating system can ensure that all processes have access to
the resources they need.
Process synchronization: The process table and PCB can be used to synchronize
processes in an operating system. The PCB contains information about each process’s
synchronization state, such as its waiting status and the resources it is waiting for.
Process scheduling: The process table and PCB can be used to schedule processes for
execution. By keeping track of each process’s state and resource usage, the operating
system can determine which processes should be executed next.
Disadvantages:
Overhead: The process table and PCB can introduce overhead and reduce system
performance. The operating system must maintain the process table and PCB for each
process, which can consume system resources.
Complexity: The process table and PCB can increase system complexity and make it more
challenging to develop and maintain operating systems. The need to manage and
synchronize multiple processes can make it more difficult to design and implement system
features and ensure system stability.
Scalability: The process table and PCB may not scale well for large-scale systems with
many processes. As the number of processes increases, the process table and PCB can
become larger and more difficult to manage efficiently.
Security: The process table and PCB can introduce security risks if they are not
implemented correctly. Malicious programs can potentially access or modify the process
table and PCB to gain unauthorized access to system resources or cause system instability.
Miscellaneous accounting and status data – This field includes information about the amount of CPU used, time constraints, jobs or process number, etc. The process control block stores the register contents, also known as the execution context, of the processor when the process was blocked from running. This execution-context mechanism enables the operating system to restore a process's execution context when the process returns to the running state. When the process makes a transition from one state to another, the operating system updates the information in the process's PCB. The operating system maintains pointers to
each process’s PCB in a process table so that it can access the PCB quickly.
Operations on Processes
A process is an activity of executing a program. Basically, it is a program under execution. Every process needs certain resources to complete its task.
Creation
This is the initial step of the process execution activity. Process creation means the
construction of a new process for execution. This might be performed by the system, the
user, or the old process itself. There are several events that lead to process creation. Some of these events are the following (a short code sketch follows the list):
1. When we start the computer, the system creates several background processes.
2. A user may request to create a new process.
3. A process can create a new process itself while executing.
4. The batch system takes initiation of a batch job.
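On POSIX systems, events 2 and 3 above correspond to the fork() system call; here is a minimal sketch in C (assuming a Unix-like OS) in which a process creates a child, the child replaces its image with the ls program, and the parent waits for it to terminate:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();            /* the running process creates a child */

    if (pid < 0) {                 /* process creation failed */
        perror("fork");
        exit(1);
    } else if (pid == 0) {         /* child: replace its program image */
        execlp("ls", "ls", (char *)NULL);
        perror("execlp");          /* reached only if exec fails */
        exit(1);
    } else {                       /* parent: wait for the child to end */
        wait(NULL);
        printf("child %d finished\n", (int)pid);
    }
    return 0;
}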
Scheduling/Dispatching
Scheduling or dispatching is the event in which the state of the process is changed from ready to running. It means the operating system puts the process from the ready state into the running state.
Dispatching is done by the operating system when the resources are free or the process has
higher priority than the ongoing process. There are various other cases in which the process
in the running state is preempted and the process in the ready state is dispatched by the
operating system.
Blocking
When a process invokes an input-output system call that blocks it, the process is put into block mode. Block mode is basically a mode where the process waits for input-output. Hence, on the demand of the process itself, the operating system blocks the process and dispatches another process to the processor. Hence, in process-blocking operations, the operating system puts the process in a ‘waiting’ state.
Preemption
When a timeout occurs, meaning that the process has not finished in the allotted time interval and the next process is ready to execute, the operating system preempts the process. This operation is only valid where CPU scheduling supports preemption.
Basically, this happens in priority scheduling where on the incoming of high priority
process the ongoing process is preempted. Hence, in process preemption operation, the
operating system puts the process in a ‘ready’ state.
Process Termination
Process termination is the activity of ending the process. In other words, process termination is the release of the computer resources taken by the process for its execution.
Like creation, in termination also there may be several events that may lead to the process
of termination. Some of them are:
• The process completes its execution fully and it indicates to the OS that it has
finished.
• The operating system itself terminates the process due to service errors.
• There may be a problem in hardware that terminates the process.
• One process can be terminated by another process.
Scheduling can be of two types:
• Non-preemptive: In this case, a process’s resource cannot be taken before the
process has finished running. When a running process finishes and transitions to a
waiting state, resources are switched.
• Preemptive: In this case, the OS assigns resources to a process for a predetermined period of time. The process switches from the running state to the ready state or from the waiting state to the ready state during resource allocation. This switching happens because the CPU may give other processes priority and substitute the currently active process for the higher-priority process.
Short-Term Scheduler (CPU Scheduler)
It is responsible for selecting one process from the ready state and scheduling it onto the running state. Note: the short-term scheduler only selects the process to schedule; it doesn’t load the process for running. This is where all the scheduling algorithms are used. The CPU scheduler is responsible for ensuring that there is no starvation due to high-burst-time processes. The dispatcher is responsible for loading the process selected by the short-term scheduler onto the CPU (ready to running state). Context switching is done by the dispatcher only. A
dispatcher does the following:
1. Switching context.
2. Switching to user mode.
3. Jumping to the proper location in the newly loaded program.
Medium-Term Scheduler
It is responsible for suspending and resuming the process. It mainly does swapping
(moving processes from main memory to disk and vice versa). Swapping may be necessary
to improve the process mix or because a change in memory requirements has
overcommitted available memory, requiring memory to be freed up. It is helpful in
maintaining a balance between the I/O-bound and the CPU-bound processes. It reduces the
degree of multiprogramming.
Some Other Schedulers
● I/O schedulers: I/O schedulers are in charge of managing the execution of I/O
operations such as reading and writing to discs or networks. They can use various
algorithms to determine the order in which I/O operations are executed, such as FCFS
(First-Come, First-Served) or RR (Round Robin).
● Real-time schedulers: In real-time systems, real-time schedulers ensure that
critical tasks are completed within a specified time frame. They can prioritize and schedule
tasks using various algorithms such as EDF (Earliest Deadline First) or RM (Rate
Monotonic).
The three schedulers can be compared as follows:
● Long-term scheduler: It is a job scheduler. Its speed is generally lesser than that of the short-term scheduler. It controls the degree of multiprogramming.
● Short-term scheduler: It is a CPU scheduler. Its speed is the fastest among all of them. It gives less control over how much multiprogramming is done.
● Medium-term scheduler: It is a process-swapping scheduler. Its speed lies in between those of the short-term and long-term schedulers. It reduces the degree of multiprogramming.
In the two-state process model, a process is in one of the following two states:
1. Running – A newly created process joins the system in a running state when it is created.
2. Not running – Processes that are not currently running are kept in a queue and await execution. A pointer to a specific process is contained in each entry in the queue. Linked lists are used to implement the queue system; this is how the dispatcher is used. When a process is stopped, it is moved to the back of the waiting queue. The process is discarded depending on whether it succeeded or failed. In either scenario, the dispatcher then chooses a process to run from the queue.
Context Switching
In order for a process execution to be continued from the same point at a later time, context
switching is a mechanism to store and restore the state or context of a CPU in the Process
Control block. A context switcher makes it possible for multiple processes to share a single
CPU using this method. A multitasking operating system must include context switching
among its features.
The state of the currently running process is saved into the process control block when the
scheduler switches the CPU from executing one process to another. The state used to set
the PC, registers, etc. for the process that will run next is then loaded from its own PCB.
After that, the second process can start executing.
Context switching involves saving and restoring information such as:
● Program Counter
● Scheduling information
● The base and limit register value
● Currently used register
● Changed State
● I/O State information
● Accounting information
On the basis of synchronization, processes are categorized as one of the following two types:
● Independent process
● Co-operating process
An independent process is not affected by the execution of other processes, while a co-operating process can be affected by other executing processes. Though one might think that processes running independently will execute very efficiently, in reality there are many situations when the co-operative nature can be utilized for increasing computational speed, convenience, and modularity. Inter-process communication (IPC) is
a mechanism that allows processes to communicate with each other and synchronize their
actions. The communication between these processes can be seen as a method of co-
operation between them. Processes can communicate with each other through both:
1. Shared Memory
2. Message passing
An operating system can implement both methods of communication. First, we will discuss
the shared memory methods of communication and then message passing. Communication
between processes using shared memory requires processes to share some variable, and it
completely depends on how the programmer will implement it. One way of communication
using shared memory can be imagined like this: Suppose process1 and process2 are
executing simultaneously, and they share some resources or use some information from
another process. Process1 generates information about certain computations or resources
being used and keeps it as a record in shared memory. When process2 needs to use the
shared information, it will check in the record stored in shared memory and take note of
the information generated by process1 and act accordingly. Processes can use shared
memory for extracting information as a record from another process as well as for
delivering any specific information to other processes.
Let’s discuss an example of communication between processes using the shared memory
method.
i) Shared Memory Method
Ex: Producer-Consumer problem
There are two processes: Producer and Consumer. The producer produces some items and
the Consumer consumes that item. The two processes share a common space or memory
location known as a buffer where the item produced by the Producer is stored and from
which the Consumer consumes the item if needed. There are two versions of this problem:
the first one is known as the unbounded buffer problem in which the Producer can keep on
producing items and there is no limit on the size of the buffer, the second one is known as
the bounded buffer problem in which the Producer can produce up to a certain number of
items before it starts waiting for Consumer to consume it. We will discuss the bounded
buffer problem. First, the Producer and the Consumer will share some common memory,
then the producer will start producing items. If the total produced item is equal to the size
of the buffer, the producer will wait to get it consumed by the Consumer. Similarly, the
consumer will first check for the availability of the item. If no item is available, the
Consumer will wait for the Producer to produce it. If there are items available, Consumer
will consume them.
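A minimal C sketch of the bounded-buffer idea follows (an illustration added for this material): parent and child share a circular buffer through an anonymous shared mapping, with in and out indices marking the next free slot and the next item; the busy-wait loops stand in for proper synchronization primitives:

#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define SIZE 8                        /* capacity of the bounded buffer */

struct shared {
    int buf[SIZE];
    volatile int in, out;             /* next free slot / next item */
};

int main(void)
{
    /* Memory region shared between parent and child after fork(). */
    struct shared *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    s->in = s->out = 0;

    if (fork() == 0) {                /* child acts as the Consumer */
        for (int i = 0; i < 20; i++) {
            while (s->in == s->out)   /* buffer empty: wait for Producer */
                ;
            printf("consumed %d\n", s->buf[s->out]);
            s->out = (s->out + 1) % SIZE;
        }
        _exit(0);
    }

    for (int i = 0; i < 20; i++) {    /* parent acts as the Producer */
        while ((s->in + 1) % SIZE == s->out)  /* buffer full: wait */
            ;
        s->buf[s->in] = i;
        s->in = (s->in + 1) % SIZE;
    }
    wait(NULL);
    return 0;
}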
ii) Message Passing Method
Now, we will start our discussion of the communication between processes via message passing. In this method, processes communicate with each other without using any kind of shared memory. If two processes p1 and p2 want to communicate with each other, they proceed as follows:
1. Establish a communication link (if a link already exists, there is no need to establish it again).
2. Start exchanging messages using the basic primitives send(message, destination) and receive(message, host).
The message size can be of fixed size or of variable size. If it is of fixed size, it is easy for
an OS designer but complicated for a programmer and if it is of variable size then it is
easy for a programmer but complicated for the OS designer. A standard message can
have two parts: header and body.
The header part is used for storing message type, destination id, source id, message length,
and control information. The control information contains details such as what to do if the buffer space runs out, the sequence number, and the priority. Generally, messages are sent in FIFO style.
A link has some capacity that determines the number of messages that can reside in it temporarily; for this, every link has a queue associated with it, which can be of zero capacity, bounded capacity, or unbounded capacity. In zero capacity, the sender waits until
the receiver informs the sender that it has received the message. In non-zero capacity cases,
a process does not know whether a message has been received or not after the send
operation. For this, the sender must communicate with the receiver explicitly.
Implementation of the link depends on the situation; it can be either a direct communication link or an indirect communication link.
Direct communication links are implemented when the processes use a specific process identifier for the communication, but it is hard to identify the sender ahead of time (for example, a print server). Indirect communication is done via a shared mailbox (port), which consists of a queue of messages. The sender keeps messages in the mailbox and the receiver picks them up.
A process that is blocked is one that is waiting for some event, such as a resource becoming
available or the completion of an I/O operation. IPC is possible between the processes on
same computer as well as on the processes running on different computer i.e. in
networked/distributed system. In both cases, the process may or may not be blocked while
sending a message or attempting to receive a message so message passing may be blocking
or non-blocking. Blocking is considered synchronous: a blocking send means the sender will be blocked until the message is received by the receiver, and similarly, a blocking receive has the receiver block until a message is available. Non-blocking is considered asynchronous: a non-blocking send has the sender send the message and continue, and similarly, a non-blocking receive has the receiver receive a valid message or null. After a careful analysis,
we can come to a conclusion that for a sender it is more natural to be non-blocking after
message passing as there may be a need to send the message to different processes.
However, the sender expects acknowledgment from the receiver in case the send fails.
Similarly, it is more natural for a receiver to be blocking after issuing the receive as the
information from the received message may be used for further execution. At the same
time, if the message send keeps on failing, the receiver will have to wait indefinitely. That is why we also consider the other possibility of message passing. There are basically three preferred combinations:
● Blocking send and blocking receive
● Non-blocking send and non-blocking receive
● Non-blocking send and blocking receive (considered the most useful combination)
In Indirect message passing, processes use mailboxes (also referred to as ports) for
sending and receiving messages. Each mailbox has a unique id and processes can
communicate only if they share a mailbox. A link is established only if the processes share a common mailbox, and a single link can be associated with many processes. Each pair of processes can share several communication links, and these links may be unidirectional or
bi-directional. Suppose two processes want to communicate through Indirect message
passing, the required operations are: create a mailbox, use this mailbox for sending and
receiving messages, then destroy the mailbox.
The standard primitives used are:
send(A, message) which means send the message to mailbox A.
The primitive for receiving the message also works in the same way, e.g. receive(A, message). There is a problem with this mailbox implementation. Suppose there are more than two processes sharing the same mailbox, and suppose the process p1 sends a message
than two processes sharing the same mailbox and suppose the process p1 sends a message
to the mailbox, which process will be the receiver? This can be solved by either enforcing
that only two processes can share a single mailbox or enforcing that only one process is
allowed to execute the receive at a given time or select any process randomly and notify
the sender about the receiver. A mailbox can be made private to a single sender/receiver
pair and can also be shared between multiple sender/receiver pairs. A port is an implementation of such a mailbox that can have multiple senders and a single receiver. It is used in client/server applications (in this case the server is the receiver). The port is owned by the receiving process and created by the OS on the request of the receiver process, and it can be destroyed either on the request of the same receiver process or when the receiver terminates itself. Enforcing that only one process is allowed to execute the receive can be done using the concept of mutual exclusion: a mutex mailbox is created which is shared by n processes. The sender is non-blocking and sends the message. The first process which executes the receive will enter the critical section, and all other processes will block and wait.
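POSIX message queues are one real implementation of such a mailbox; the sketch below (assuming Linux, where older glibc versions need linking with -lrt) creates a named mailbox, sends a message into it, and receives it back. Any process that opens the same name shares the queue:

#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 64 };

    /* Create (or open) the mailbox named "/demo". */
    mqd_t mq = mq_open("/demo", O_CREAT | O_RDWR, 0644, &attr);

    /* Deposit a message in the mailbox (priority 0). */
    mq_send(mq, "hello", strlen("hello") + 1, 0);

    /* Pick the message up; this call blocks if the mailbox is empty. */
    char buf[64];
    mq_receive(mq, buf, sizeof buf, NULL);
    printf("received: %s\n", buf);

    mq_close(mq);
    mq_unlink("/demo");               /* destroy the mailbox */
    return 0;
}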
Other common IPC mechanisms include the following (a pipe sketch follows the list):
● Pipe
● Socket
● Remote Procedure Calls (RPCs)
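The pipe is the simplest of these three; in the minimal POSIX sketch below (added as an illustration), a parent writes a message into the pipe and its child reads it out the other end:

#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    pipe(fd);                 /* fd[0] is the read end, fd[1] the write end */

    if (fork() == 0) {        /* child: the receiver */
        char buf[32];
        close(fd[1]);
        ssize_t n = read(fd[0], buf, sizeof buf - 1);
        buf[n > 0 ? n : 0] = '\0';
        printf("child read: %s\n", buf);
        _exit(0);
    }

    close(fd[0]);             /* parent: the sender */
    write(fd[1], "hello", strlen("hello"));
    close(fd[1]);
    wait(NULL);
    return 0;
}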
Advantages of IPC:
• Enables processes to communicate with each other and share resources, leading to
increased efficiency and flexibility.
• Facilitates coordination between multiple processes, leading to better overall
system performance.
• Allows for the creation of distributed systems that can span multiple computers or
networks.
• Can be used to implement various synchronization and communication protocols,
such as semaphores, pipes, and sockets.
Disadvantages of IPC:
• Increases system complexity, making design, implementation, and debugging harder.
• Communication overhead can reduce performance, especially when large amounts of data are exchanged.
• Requires careful synchronization to avoid race conditions and deadlocks.
• Can introduce security risks if shared resources and communication channels are not properly protected.
Context switching refers to the method used by the system to change a process from one state to another using the CPUs present in the system to perform its job.
Example – Suppose the OS has N processes stored in process control blocks (PCBs). One process is running on the CPU to do its job, while other processes with the highest priority queue up to use the CPU and complete their jobs.
The Need for Context Switching
Context switching enables all processes to share a single CPU to finish their execution and store the status of the system’s tasks. When a process is reloaded into the system, its execution resumes at the very point where it was interrupted.
The operating system’s need for context switching is explained by the reasons listed below.
• One process does not directly switch to another within the system. Context
switching makes it easier for the operating system to use the CPU’s resources to
carry out its tasks and store its context while switching between multiple processes.
• Context switching enables all processes to share a single CPU to finish their execution and store the status of the system’s tasks. When a process is reloaded into the system, its execution resumes at the point where it was interrupted.
• Context switching allows a single CPU to handle multiple process requests concurrently without the need for any additional processors.
Context switching is triggered by one of the following:
1. Interrupts
2. Multitasking
3. User/Kernel switch
Interrupts: When the CPU requests that data be read from a disc and an interrupt occurs, context switching automatically switches to a part of the hardware that can handle the interrupt more quickly.
Multitasking: The ability for a process to be switched from the CPU so that another
process can run is known as context switching. When a process is switched, the previous
state is retained so that the process can continue running at the same spot in the system.
Kernel/User Switch: This trigger is used when the OS needs to switch between user mode and kernel mode.
Process Control Block
The process control block (PCB) is also known as a task control block; it represents a process in the operating system. A PCB is a data structure used by a computer to store all information about a process, and it is also called a process descriptor. When a process is created (started or installed), the operating system creates a corresponding process control block.
Attributes of a process
The attributes of the process are used by the operating system to create the process control block (PCB) for each process. This is also called the context of the process. The attributes stored in the PCB are described below.
1. Process ID
When a process is created, a unique id is assigned to the process which is used for unique
identification of the process in the system.
2. Program counter
A program counter stores the address of the next instruction of the process, i.e. the point at which the process was suspended. The CPU uses this address when the execution of this process is resumed.
3. Process State
The Process, from its creation to the completion, goes through various states which are
new, ready, running and waiting. We will discuss them later in detail.
4. Priority
Every process has its own priority. The process with the highest priority among the
processes gets the CPU first. This is also stored on the process control block.
5. Registers
Every process has its own set of registers which are used to hold the data generated during the execution of the process.
6. List of Open Files
During execution, every process uses some files which need to be present in the main memory. The OS also maintains a list of open files in the PCB.
7. List of Open Devices
The OS also maintains a list of all open devices which are used during the execution of the process.
Process States
The process, from its creation to completion, passes through various states. The minimum number of states is five.
The names of the states are not standardized, although a process may be in one of the following states during execution.
1. New
A program which is going to be picked up by the OS into the main memory is called a new
process.
2. Ready
Whenever a process is created, it directly enters the ready state, in which it waits for the CPU to be assigned. The OS picks new processes from the secondary memory and puts all of them in the main memory.
The processes which are ready for the execution and reside in the main memory are called
ready state processes. There can be many processes present in the ready state.
3. Running
One of the processes from the ready state will be chosen by the OS depending upon the
scheduling algorithm. Hence, if we have only one CPU in our system, the number of
running processes for a particular time will always be one. If we have n processors in the
system then we can have n processes running simultaneously.
4. Block or wait
From the Running state, a process can make the transition to the block or wait state
depending upon the scheduling algorithm or the intrinsic behavior of the process.
When a process waits for a certain resource to be assigned or for input from the user, the OS moves this process to the block or wait state and assigns the CPU to the other processes.
5. Completion or termination
When a process finishes its execution, it comes into the termination state. All the context of the process (the process control block) will be deleted, and the process will be terminated by the operating system.
6. Suspend ready
A process in the ready state, which is moved to secondary memory from the main memory
due to lack of the resources (mainly primary memory) is called in the suspend ready state.
If the main memory is full and a higher-priority process comes for execution, the OS has to make room for the process in main memory by moving a lower-priority process out into secondary memory. The suspend-ready processes remain in secondary memory until main memory becomes available.
7. Suspend wait
Instead of removing a process from the ready queue, it is better to remove a blocked process which is waiting for some resource in the main memory. Since it is already waiting for some resource to become available, it is better if it waits in the secondary memory and makes room for the higher-priority process. These processes complete their execution once the main memory becomes available and their wait is finished.
Chapter 4 OS Processes
The operations performed on a process throughout its lifetime are described below.
1. Creation
Once the process is created, it comes into the ready queue (main memory) and is ready for execution.
2. Scheduling
Out of the many processes present in the ready queue, the operating system chooses one process and starts executing it. Selecting the process which is to be executed next is known as scheduling.
3. Execution
Once the process is scheduled for execution, the processor starts executing it. A process may come to the blocked or wait state during execution; in that case the processor starts executing the other processes.
4. Deletion/killing
Once the purpose of the process is served, the OS kills the process. The context of the process (PCB) will be deleted and the process will be terminated by the operating system.
The operating system uses various schedulers for process scheduling, described below.
Long-Term Scheduler
The long-term scheduler is also known as the job scheduler. It chooses the processes from the pool
(secondary memory) and keeps them in the ready queue maintained in the primary
memory.
Long Term scheduler mainly controls the degree of Multiprogramming. The purpose of
long term scheduler is to choose a perfect mix of IO bound and CPU bound processes
among the jobs present in the pool.
If the job scheduler chooses more IO bound processes then all of the jobs may reside in the
blocked state all the time and the CPU will remain idle most of the time. This will reduce
49
the degree of Multiprogramming. Therefore, the Job of long term scheduler is very critical
and may affect the system for a very long time.
The short term scheduler is also known as the CPU scheduler. It selects one of the jobs from the ready queue and dispatches it to the CPU for execution.
A scheduling algorithm is used to select which job is going to be dispatched for execution. The job of the short term scheduler can be very critical in the sense that if it selects a job whose CPU burst time is very high, then all the jobs after it will have to wait in the ready queue for a very long time.
This problem is called starvation, and it may arise if the short term scheduler makes some mistakes while selecting the job.
The medium term scheduler takes care of the swapped-out processes. If a running process needs some IO time for its completion, its state must be changed from running to waiting.
The medium term scheduler is used for this purpose. It removes the process from the running state to make room for the other processes. Such processes are the swapped-out processes, and this procedure is called swapping. The medium term scheduler is responsible for suspending and resuming processes.
Process Queues
The operating system manages various queues for each of the process states. The PCB of a process is stored in the queue of the corresponding state. If a process is moved from one state to another, its PCB is unlinked from its current queue and added to the queue of the state to which the transition is made.
The following queues are maintained by the operating system.
1. Job Queue
Initially, all processes are stored in the job queue. It is maintained in secondary memory. The long term scheduler (job scheduler) picks some of the jobs and puts them in primary memory.
2. Ready Queue
The ready queue is maintained in primary memory. The short term scheduler picks a job from the ready queue and dispatches it to the CPU for execution.
3. Waiting Queue
When a process needs some IO operation in order to complete its execution, the OS changes its state from running to waiting. The context (PCB) associated with the process is stored on the waiting queue and will be used by the processor when the process finishes its IO.
Scheduling is described in terms of the following times.
1. Arrival Time
The time at which the process enters the ready queue is called the arrival time.
2. Burst Time
The total amount of CPU time required to execute the whole process is called the burst time. This does not include the waiting time. It is difficult to know the execution time of a process before actually executing it; hence scheduling problems stated purely in terms of known burst times cannot be implemented exactly in practice.
3. Completion Time
The time at which the process enters the completion state, i.e. the time at which the process completes its execution, is called the completion time.
4. Turnaround time
The total amount of time spent by the process from its arrival to its completion is called the turnaround time.
5. Waiting Time
The total amount of time for which the process waits for the CPU to be assigned is called the waiting time.
6. Response Time
The difference between the arrival time and the time at which the process first gets the CPU is called the response time.
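These definitions reduce to simple arithmetic: Turnaround Time = Completion Time - Arrival Time, Waiting Time = Turnaround Time - Burst Time, and Response Time = (time of first CPU allocation) - Arrival Time. A minimal Python sketch of these formulas follows; the function name and the sample values are illustrative only, not taken from any particular example in this material.

# Compute scheduling metrics from arrival, burst, completion and first-run times.
def metrics(arrival, burst, completion, first_run):
    turnaround = completion - arrival      # total time spent in the system
    waiting = turnaround - burst           # time spent waiting for the CPU
    response = first_run - arrival         # delay before first getting the CPU
    return turnaround, waiting, response

# Example: a process arriving at t=1 with burst 3, first scheduled at t=9,
# finishing at t=12, has turnaround 11, waiting 8 and response 8.
print(metrics(arrival=1, burst=3, completion=12, first_run=9))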
CPU Scheduling
In uniprogramming systems like MS-DOS, when a process waits for an I/O operation to complete, the CPU remains idle. This is an overhead, since it wastes time and causes the problem of starvation. In multiprogramming systems, however, the CPU does not remain idle during the waiting time of a process; it starts executing other processes. The operating system has to decide which process the CPU will be given to.
In multiprogramming systems, the operating system schedules the processes on the CPU so as to maximize its utilization, and this procedure is called CPU scheduling. The operating system uses various scheduling algorithms to schedule the processes.
It is the task of the short term scheduler to schedule the CPU among the processes present in the job pool. Whenever the running process requests some IO operation, the short term scheduler saves the current context of the process (in its PCB) and changes its state from running to waiting. While the process is in the waiting state, the short term scheduler picks another process from the ready queue and assigns the CPU to it. This procedure is called context switching.
What is saved in the Process Control Block?
The operating system maintains a process control block during the lifetime of the process. The process control block is deleted when the process is terminated or killed. The information saved in the process control block changes with the state of the process; it typically includes the process ID, process state, program counter, CPU registers, CPU scheduling information, memory management information, accounting information and IO status information.
In multiprogramming, if the long term scheduler picks more I/O-bound processes, then most of the time the CPU remains idle. The task of the operating system is to optimize the utilization of resources.
If most of the running processes change their state from running to waiting, then there may always be a possibility of deadlock in the system. Hence, to reduce this overhead, the OS needs to schedule the jobs so as to get optimal utilization of the CPU and to avoid the possibility of deadlock.
Scheduling Algorithms in OS (Operating System)
There are various algorithms which are used by the Operating System to schedule the
processes on the processor in an efficient way.
The following algorithms can be used to schedule the jobs.
1. First Come First Serve (FCFS)
It is the simplest algorithm to implement. The process with the earliest arrival time gets the CPU first. The lesser the arrival time, the sooner the process gets the CPU. It is a non-preemptive type of scheduling.
2. Round Robin
In the Round Robin scheduling algorithm, the OS defines a time quantum (slice). All the processes are executed in a cyclic way. Each process gets the CPU for a small amount of time (the time quantum) and then goes back to the ready queue to wait for its next turn. It is a preemptive type of scheduling.
3. Shortest Job First (SJF)
The job with the shortest burst time gets the CPU first. The lesser the burst time, the sooner the process gets the CPU. It is a non-preemptive type of scheduling.
4. Shortest Remaining Time First (SRTF)
It is the preemptive form of SJF. In this algorithm, the OS schedules the job according to its remaining execution time.
5. Highest Response Ratio Next (HRRN)
In this scheduling algorithm, the process with the highest response ratio is scheduled next. This reduces starvation in the system.
First Come First Serve is the most basic CPU process scheduling algorithm, and every student must learn it to understand the basics of CPU process scheduling.
First Come First Serve paves the way for understanding the other algorithms. This algorithm has many disadvantages, but those very disadvantages motivated newer and more efficient algorithms, so First Come First Serve is worth studying carefully.
The First Come First Serve CPU scheduling algorithm, abbreviated FCFS, is the first CPU process scheduling algorithm. In the First Come First Serve algorithm, processes are allowed to execute in a linear manner.
This means that whichever process enters the ready queue first is executed first. In other words, the First Come First Serve algorithm follows the First In First Out (FIFO) principle.
The First Come First Serve algorithm can be executed in a preemptive or a non-preemptive manner. Before going into examples, let us understand what the preemptive and non-preemptive approaches to CPU process scheduling are.
Preemptive Approach
In preemptive process scheduling, the OS allots the resources to a process for a predetermined period of time. A process transitions from the running state to the ready state, or from the waiting state to the ready state, during resource allocation. This switching happens because the CPU may give other processes precedence and substitute a higher-priority process for the currently active one.
Non-Preemptive Approach
Non-preemptive scheduling means that once a process or job has started execution, the operating system must run it to completion; the next process or job does not start executing until the current one has finished. In other words, the CPU is completely dedicated to the process or job that started first, and a new process or job is executed only after the older one finishes.
Convoy Effect
The convoy effect is a phenomenon which occurs in the First Come First Serve (FCFS) scheduling algorithm when it is run in this non-preemptive way.
There may be cases in which the CPU is allotted to one process for far too much time. This is because in the non-preemptive First Come First Serve approach, the processes or jobs are chosen in serial order. Due to this, short jobs behind large jobs take far too long to complete their execution, and the waiting time, turnaround time and completion time become very high.
So, when the first process is large or its completion time is very high, the convoy effect occurs in the First Come First Serve algorithm.
Suppose the long job takes an extremely long (in the limit, infinite) time to complete. Then the remaining processes have to wait for that same time. Due to this convoy effect created by the long job, starvation of the waiting processes increases very rapidly. This is the biggest disadvantage of FCFS CPU process scheduling.
Advantages of FCFS CPU Process Scheduling
1. Implementation is simple.
2. It does not cause any complications in use.
3. It can be run in both a non-preemptive and a preemptive manner.
4. It runs the processes in the order in which they are received.
5. Arrival time is used as the selection criterion for processes.
Disadvantages of FCFS CPU Process Scheduling
1. It suffers from the convoy effect described above.
2. The average waiting time and turnaround time are often high.
3. Short jobs stuck behind long jobs may starve.
Example
Consider six processes P1, P2, P3, P4, P5 and P6, named A to F, with arrival times 0, 1, 1, 1, 2, 3 and burst times 9, 3, 2, 4, 3, 2 units respectively.
Non-Preemptive Approach
Now, let us solve this problem with the First Come First Serve scheduling algorithm in its non-preemptive form.
Solution to the Above Question (Example 1)

S.No  Process ID  Process Name  Arrival Time  Burst Time  Completion Time  Turn Around Time  Waiting Time
1     P1          A             0             9           9                9                 0
2     P2          B             1             3           12               11                8
3     P3          C             1             2           14               13                11
4     P4          D             1             4           18               17                13
5     P5          E             2             3           21               19                16
6     P6          F             3             2           23               20                18
Average CT = (9 + 12 + 14 + 18 + 21 + 23) / 6 = 97 / 6 = 16.17
Average WT = (0 + 8 + 11 + 13 + 16 + 18) / 6 = 66 / 6 = 11
Average TAT = (9 + 11 + 13 + 17 + 19 + 20) / 6 = 89 / 6 = 14.83
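The figures in the table can be checked with a short non-preemptive FCFS simulation. The following Python sketch was written for this example and is not part of the original material; it assumes the CPU is never idle, which holds here because jobs arrive while earlier jobs are still running.

# Non-preemptive FCFS: processes run to completion in order of arrival.
processes = [            # (name, arrival time, burst time) from the example above
    ("P1", 0, 9), ("P2", 1, 3), ("P3", 1, 2),
    ("P4", 1, 4), ("P5", 2, 3), ("P6", 3, 2),
]

time = 0
for name, arrival, burst in sorted(processes, key=lambda p: p[1]):
    time = max(time, arrival) + burst      # run the next arrival to completion
    turnaround = time - arrival
    waiting = turnaround - burst
    print(name, "completion:", time, "turnaround:", turnaround, "waiting:", waiting)

Running this reproduces the completion, turnaround and waiting times listed in the table above.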
Preemptive Approach
Now, let us solve the same problem with the First Come First Serve scheduling algorithm in a preemptive form.
S.No  Process ID  Process Name  Arrival Time  Burst Time  Completion Time  Turn Around Time  Waiting Time
1     P1          A             0             9           23               23                14
2     P2          B             1             3           8                7                 4
3     P3          C             1             2           3                2                 0
4     P4          D             1             4           15               14                10
5     P5          E             2             3           11               9                 7
6     P6          F             3             2           5                2                 0
To avoid wasting wake-up signals, Dijkstra proposed an approach which involves storing all the wake-up calls. Instead of giving the wake-up calls directly to the consumer, the producer can store each wake-up call in a variable, and any of the consumers can read it whenever it needs to do so.
A semaphore is a variable which stores the wake-up calls that are transferred from producer to consumer. It is a variable on which the read, modify and update steps happen atomically, in kernel mode.
A semaphore cannot be implemented purely in user mode, because a race condition may always arise when two or more processes try to access the variable simultaneously. It needs support from the operating system to be implemented.
Depending on the demands of the situation, semaphores can be divided into two categories:
• Counting Semaphore
• Binary Semaphore or Mutex
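As a sketch of the idea, Python's threading module provides a counting semaphore: the producer "stores a wake-up call" by releasing the semaphore, and the consumer consumes one by acquiring it. The buffer, item count and thread names below are illustrative choices, not part of the original material.

import threading, queue

items = queue.Queue()                  # shared buffer of produced items
wakeups = threading.Semaphore(0)       # counts stored wake-up calls; starts at zero

def producer():
    for i in range(5):
        items.put(i)                   # produce an item
        wakeups.release()              # store a wake-up call (increment the count)

def consumer():
    for _ in range(5):
        wakeups.acquire()              # sleep until at least one wake-up is stored
        print("consumed", items.get())

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()

Because release and acquire are atomic, no wake-up call is ever lost, even if the consumer starts before the producer has produced anything.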
We now turn to the Round Robin CPU process scheduling algorithm. This algorithm is important because it addresses several of the flaws identified in the previous CPU process scheduling algorithms.
Much of Round Robin's popularity comes from the fact that it is inherently preemptive, which makes it well suited to time-sharing systems.
Round Robin is one of the most widely used CPU scheduling algorithms. It uses a time quantum (TQ): each time a process runs, up to one time quantum is subtracted from its remaining burst time, so the process is completed one chunk at a time.
Time sharing is the main emphasis of the algorithm. Each step of this algorithm is carried out cyclically. The system defines a specific time slice, known as a time quantum.
First, the processes which are eligible to enter the ready queue do so. The first process in the ready queue is then executed for one time quantum. After that execution, if the process still requires some time to complete, it is added back to the ready queue; otherwise it is removed.
The ready queue does not hold duplicate entries for a process that is already present in it; it is designed to hold only unique processes, since holding the same process twice would introduce redundancy. Once a process has completed its execution, it is not placed back in the ready queue.
Advantages
The advantages of Round Robin CPU scheduling are:
• Every process gets a fair share of the CPU.
• No process waits indefinitely for the CPU, so starvation does not occur.
• It gives good response time, which suits interactive and time-sharing systems.
Disadvantages
• Very small time slices result in decreased CPU output, because more time is spent switching contexts.
• The Round Robin approach spends longer swapping contexts than non-preemptive approaches.
• The time quantum has a significant impact on its performance.
• Priorities cannot be established for the processes.
Example
Ready queue (order of execution): P1, P2, P3, P4, P5, P6, P1, P3, P4, P5, P6, P3, P4, P5
[The Gantt chart and the average completion time calculation for this example are not reproduced here.]
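A minimal Round Robin simulation in Python follows. The original process table for this example is not recoverable here, so the burst times below are assumptions, chosen (with a time quantum of 4) so that the execution order reproduces the ready-queue sequence shown above; treat it as a sketch, not the original worked example.

from collections import deque

quantum = 4                            # assumed time quantum
burst = {"P1": 6, "P2": 3, "P3": 9,    # assumed burst times; all processes
         "P4": 10, "P5": 11, "P6": 8}  # are taken to be ready at t = 0

ready = deque(burst)                   # ready queue in arrival order
time = 0
while ready:
    p = ready.popleft()
    run = min(quantum, burst[p])       # run for one quantum or until done
    time += run
    burst[p] -= run
    if burst[p] > 0:
        ready.append(p)                # unfinished: back to the ready queue
    else:
        print(p, "completes at", time)

With these values the CPU serves P1, P2, P3, P4, P5, P6, P1, P3, P4, P5, P6, P3, P4, P5, matching the ready-queue order listed above.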
Shortest Remaining Time First (SRTF)
This algorithm is the preemptive version of SJF scheduling. In SRTF, the execution of a process can be stopped after a certain amount of time. At the arrival of every process, the short term scheduler schedules, from among the available processes and the running process, the one with the least remaining burst time.
Once all the processes are available in the ready queue, no further preemption is done and the algorithm works as SJF scheduling. The context of a process is saved in its Process Control Block when it is removed from execution and the next process is scheduled; this PCB is read again the next time the process runs.
Example
In this example, there are six jobs P1, P2, P3, P4, P5 and P6. Their arrival times and burst times are given in the table below.
Process ID  Arrival Time  Burst Time  Completion Time  Turn Around Time  Waiting Time  Response Time
P1          0             8           20               20                12            0
P2          1             4           10               9                 5             1
P3          2             2           4                2                 0             2
P4          3             1           5                2                 1             4
P5          4             3           13               9                 6             10
P6          5             2           7                2                 0             5
The Gantt chart is prepared according to the arrival and burst times given in the table.
1. At time 0, the only available process is P1, with CPU burst time 8. As it is the only available process in the list, it is scheduled.
2. The next process arrives at time unit 1. Since the algorithm we are using is SRTF, which is preemptive, the current execution is stopped and the scheduler checks for the process with the least remaining burst time.
So far there are two processes in the ready queue. The OS has executed P1 for one unit of time, so the remaining burst time of P1 is 7 units. The burst time of process P2 is 4 units. Hence process P2 is scheduled on the CPU, according to the algorithm.
3. The next process, P3, arrives at time unit 2. At this point the execution of process P2 is stopped and the process with the least remaining burst time is chosen. Since process P3 has a burst time of 2 units, the least, it is given priority over the others.
4. The next process, P4, arrives at time unit 3. At this arrival, the scheduler stops the execution of P3 and checks which of the available processes (P1, P2, P3 and P4) has the least remaining burst time. P1 and P2 have remaining burst times of 7 units and 3 units respectively.
5. P3 and P4 have a remaining burst time of 1 unit each. Since both are equal, the scheduling is done according to their arrival times. P3 arrived earlier than P4 and is therefore scheduled again. The next process, P5, arrives at time unit 4. By this time, process P3 has completed its execution and is no longer in the list. The scheduler compares the remaining burst times of all the available processes. Since the remaining burst time of process P4, 1 unit, is the least among all, it is scheduled next.
6. The next process, P6, arrives at time unit 5, by which time process P4 has completed its execution. We now have four available processes: P1 (7), P2 (3), P5 (3) and P6 (2). The burst time of P6 is the least among them, so P6 is scheduled. Since all the processes have now arrived, the algorithm works the same as SJF from this point on: P6 executes to completion, then the process with the least remaining time is scheduled, and so on.
Once all the processes have arrived, no preemption is done and the algorithm works as SJF.
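The walkthrough above can be reproduced mechanically. The following Python sketch simulates SRTF one time unit at a time, breaking remaining-time ties by earlier arrival, exactly as in step 5; it assumes, as in this example, that the CPU never sits idle between arrivals.

# SRTF: at every time unit, run the arrived process with least remaining time.
procs = {"P1": (0, 8), "P2": (1, 4), "P3": (2, 2),
         "P4": (3, 1), "P5": (4, 3), "P6": (5, 2)}   # (arrival, burst) from the table

remaining = {p: b for p, (a, b) in procs.items()}
time = 0
while remaining:
    ready = [p for p in remaining if procs[p][0] <= time]
    # least remaining time first; ties broken by earlier arrival time
    p = min(ready, key=lambda q: (remaining[q], procs[q][0]))
    remaining[p] -= 1
    time += 1
    if remaining[p] == 0:
        del remaining[p]
        print(p, "completes at", time)

Running it prints the completion times 4, 5, 7, 10, 13 and 20 for P3, P4, P6, P2, P5 and P1, matching the table.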
Highest Response Ratio Next (HRRN) is one of the most optimal scheduling algorithms. It is a non-preemptive algorithm in which scheduling is done on the basis of an extra parameter called the response ratio. A response ratio is calculated for each of the available jobs, and the job with the highest response ratio is given priority over the others.

Response Ratio = (W + S) / S

Where,
W → Waiting Time
S → Service Time or Burst Time

Looking at the formula, we notice that a job with a shorter burst time is given priority, but the formula also includes an extra factor, the waiting time. Since
Response Ratio ∝ W
Response Ratio ∝ (1 / S)
the ratio grows both as a job waits longer and as its service time gets shorter.
Hence, this algorithm not only favors shorter jobs but also takes the waiting time of longer jobs into account. Its mode is non-preemptive, so context switching is minimal in this algorithm.
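A sketch of the HRRN selection step in Python follows; the job names and the (W, S) pairs are illustrative values invented for this sketch.

# HRRN: among the arrived jobs, pick the one with the highest (W + S) / S.
def response_ratio(waiting, service):
    return (waiting + service) / service

jobs = {"J1": (4, 3), "J2": (2, 6), "J3": (6, 2)}    # illustrative (W, S) pairs

best = max(jobs, key=lambda j: response_ratio(*jobs[j]))
for j, (w, s) in jobs.items():
    print(j, "ratio:", round(response_ratio(w, s), 2))
print("scheduled next:", best)   # J3: (6 + 2) / 2 = 4.0, the highest ratio

Note how J3, a short job that has already waited a while, overtakes the longer jobs: this is exactly the balance between short burst time and long waiting time that the formula expresses.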
Chapter 5 Deadlocks in OS
Every process needs some resources to complete its execution. However, resources are granted in a sequential order.
A deadlock is a situation where each process in a set waits for a resource which is assigned to some other process in the set. In this situation, none of the processes gets executed, since the resource each needs is held by some other process which is itself waiting for yet another resource to be released.
Let us assume that there are three processes P1, P2 and P3, and three different resources R1, R2 and R3. R1 is assigned to P1, R2 is assigned to P2 and R3 is assigned to P3.
After some time, P1 demands R2, which is being used by P2. P1 halts its execution, since it can't complete without R2. P2 in turn demands R3, which is being used by P3. P2 also stops its execution because it can't continue without R3. P3 then demands R1, which is being used by P1, therefore P3 also stops its execution.
In this scenario, a cycle is formed among the three processes. None of the processes is progressing; they are all waiting. The computer becomes unresponsive since all the processes are blocked.
Difference between Starvation and Deadlock
Deadlock: happens when mutual exclusion, hold and wait, no preemption and circular wait occur simultaneously.
Starvation: occurs due to uncontrolled priorities and resource management.
1. Mutual Exclusion: A resource can only be used in a mutually exclusive manner; that is, two processes cannot use the same resource at the same time.
2. Hold and Wait: A process waits for some resources while holding another resource at the same time.
3. No preemption: A resource, once allocated to a process, cannot be forcibly taken away from it; the process releases it only when it has finished with it.
4. Circular Wait: All the processes wait for resources in a cyclic manner, so that the last process is waiting for a resource which is held by the first process.
1. Deadlock Ignorance
Deadlock ignorance is the most widely used approach among all the mechanisms. It is used by many operating systems, mainly those intended for end users. In this approach, the operating system assumes that deadlock never occurs; it simply ignores deadlock. This approach is best suited to single end-user systems, where the user uses the system only for browsing and other routine tasks.
There is always a tradeoff between correctness and performance, and operating systems like Windows and Linux mainly focus on performance. The performance of the system decreases if it runs a deadlock handling mechanism all the time; if deadlock happens, say, 1 time out of 100, then it is unnecessary to pay that cost continuously.
In these types of systems, the user simply has to restart the computer in the case of deadlock. Windows and Linux mainly use this approach.
2. Deadlock prevention
Deadlock happens only when mutual exclusion, hold and wait, no preemption and circular wait hold simultaneously. If it is possible to negate one of the four conditions at all times, then deadlock can never occur in the system.
The idea behind the approach is very simple: defeat one of the four conditions. However, there can be a big argument about its physical implementation in the system.
3. Deadlock avoidance
In deadlock avoidance, the operating system checks whether the system is in a safe state or an unsafe state at every step it performs. The process continues as long as the system remains in a safe state. If the system would move to an unsafe state, the OS backtracks one step.
In simple words, the OS reviews each allocation so that the allocation never causes a deadlock in the system.
4. Deadlock detection and recovery
This approach lets the processes fall into deadlock and then periodically checks whether a deadlock has occurred in the system. If one has, it applies some recovery method to the system to get rid of the deadlock.
We will discuss deadlock detection and recovery later in more detail.
Deadlock avoidance
In deadlock avoidance, a request for a resource is granted only if the resulting state of the system does not cause a deadlock. The state of the system is continuously checked for safe and unsafe states.
In order to avoid deadlocks, each process must tell the OS the maximum number of resources it may request to complete its execution.
The simplest and most useful approach states that each process should declare the maximum number of resources of each type it may ever need. The deadlock avoidance algorithm examines resource allocations so that a circular wait condition can never arise.
The resource allocation state of a system is defined by the instances of available and allocated resources and the maximum instances of the resources demanded by the processes.
Table 1: Resources Assigned

Process  R1  R2  R3  R4
A        3   0   2   2
B        0   0   1   1
C        1   1   1   0
D        2   1   4   0

Table 2: Resources Still Needed

Process  R1  R2  R3  R4
A        1   1   0   0
B        0   1   1   2
C        1   2   1   0
D        2   1   1   2

E = (7 6 8 4)
P = (6 2 8 3)
A = (1 4 0 1)
The above tables and the vectors E, P and A describe the resource allocation state of a system. There are 4 processes and 4 types of resources in the system. Table 1 shows the instances of each resource assigned to each process.
Table 2 shows the instances of each resource that each process still needs. Vector E represents the total instances of each resource in the system.
Vector P represents the instances of resources that have been assigned to processes. Vector A represents the number of resource instances that are not in use.
A state of the system is called safe if the system can allocate all the resources requested by all the processes without entering a deadlock.
If the system cannot fulfill the requests of all processes, then the state of the system is called unsafe.
The key to the deadlock avoidance approach is that when a request is made for resources, it must only be approved if the resulting state is also a safe state.
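The safety check itself can be sketched as follows: repeatedly look for a process whose remaining need (Table 2) can be met from the available vector A, assume it runs to completion and returns everything it holds (Table 1), and continue until every process can finish (a safe state) or no remaining process can (unsafe). A Python sketch using the example tables above:

# Safety check using Table 1 (assigned), Table 2 (still needed) and vector A.
assigned = {"A": [3,0,2,2], "B": [0,0,1,1], "C": [1,1,1,0], "D": [2,1,4,0]}
needed   = {"A": [1,1,0,0], "B": [0,1,1,2], "C": [1,2,1,0], "D": [2,1,1,2]}
avail    = [1,4,0,1]                       # vector A: instances not in use

finished, order = set(), []
while len(finished) < len(assigned):
    for p in assigned:
        if p not in finished and all(n <= a for n, a in zip(needed[p], avail)):
            # p can run to completion and release everything it holds
            avail = [a + h for a, h in zip(avail, assigned[p])]
            finished.add(p); order.append(p)
            break
    else:
        print("unsafe state"); break       # no process can finish
else:
    print("safe state, completion order:", order)

For the values above the state is safe: A can finish first, releasing enough for B, then C, then D.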
The resource allocation graph is a pictorial representation of the state of a system. As its name suggests, the resource allocation graph contains complete information about all the processes which are holding some resources or waiting for some resources.
It also contains information about all the instances of all the resources, whether they are available or being used by the processes.
In a resource allocation graph, a process is represented by a circle, while a resource is represented by a rectangle. Let's look at the types of vertices and edges in detail.
Vertices are of two types, resource and process, and each is represented by a different shape: a circle represents a process, while a rectangle represents a resource.
A resource can have more than one instance. Each instance is represented by a dot inside the rectangle.
Edges in a RAG are also of two types: one represents assignment and the other represents the wait of a process for a resource.
A resource is shown as assigned to a process if the tail of the arrow is attached to an instance of the resource and the head is attached to the process.
A process is shown as waiting for a resource if the tail of the arrow is attached to the process while the head points towards the resource.
Example
Let's consider three processes P1, P2 and P3, and two types of resources R1 and R2, each with one instance.
According to the graph, R1 is being used by P1, P2 is holding R2 while waiting for R1, and P3 is waiting for both R1 and R2.
The graph is deadlock-free, since no cycle is formed in it.
If a cycle is formed in a resource allocation graph in which every resource has a single instance, then the system is deadlocked.
The following example contains three processes P1, P2, P3 and three resources R1, R2, R3, each with a single instance.
If we analyze the graph, we find that a cycle is formed, hence the system satisfies all four conditions of deadlock.
Allocation Matrix
An allocation matrix can be formed from the resource allocation graph of a system. In the allocation matrix, an entry is made for each assigned resource. For example, in the following matrix, an entry is made in the row of P1 and the column of R3, since R3 is assigned to P1.
Process R1 R2 R3
P1 0 0 1
P2 1 0 0
P3 0 1 0
Request Matrix
In the request matrix, an entry is made for each requested resource. For example, P1 needs R1, therefore an entry is made in the row of P1 and the column of R1.
Process R1 R2 R3
P1 1 0 0
P2 0 1 0
P3 0 0 1
Avail = (0, 0, 0)
No resource is available in the system, and no process is going to release one. Each process needs at least one resource to complete, therefore each will keep holding what it has.
We cannot fulfill the demand of even one process using the available resources, therefore the system is deadlocked, as determined earlier when we detected a cycle in the graph.
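This reasoning can be checked mechanically: with Avail = (0, 0, 0), no row of the request matrix can be satisfied, so no process can run to completion. A short Python sketch of the detection loop, using the allocation and request matrices above:

# Deadlock detection from the Allocation and Request matrices above.
allocation = {"P1": [0,0,1], "P2": [1,0,0], "P3": [0,1,0]}
request    = {"P1": [1,0,0], "P2": [0,1,0], "P3": [0,0,1]}
avail      = [0, 0, 0]

done, progress = set(), True
while progress:
    progress = False
    for p in allocation:
        if p not in done and all(r <= a for r, a in zip(request[p], avail)):
            # p's request can be granted: it finishes and releases its holdings
            avail = [a + h for a, h in zip(avail, allocation[p])]
            done.add(p)
            progress = True
deadlocked = [p for p in allocation if p not in done]
print("deadlocked:", deadlocked)   # all three processes: the cycle cannot be broken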
In this approach, the OS doesn't apply any mechanism to avoid or prevent deadlocks; it accepts that deadlock can occur. Instead, the OS periodically checks the system for deadlock. If it finds one, the OS recovers the system using some recovery technique.
The main task of the OS here is detecting deadlocks, which it can do with the help of the resource allocation graph.
With single-instance resource types, if a cycle is formed in the graph, there is definitely a deadlock. With multiple-instance resource types, on the other hand, detecting a cycle is not enough; we have to apply the safety algorithm to the system by converting the resource allocation graph into the allocation matrix and the request matrix.
In order to recover the system from deadlock, the OS acts either on resources or on processes.
For Resources
Preempt a resource: We can take one of the resources away from its owner (a process) and give it to another process, expecting that the latter will complete its execution and release the resource soon. Choosing which resource to preempt can be difficult.
Roll back to a safe state: The system passes through various states before reaching the deadlock state. The operating system can roll the system back to a previous safe state; for this purpose, the OS needs to implement checkpointing at every state.
The moment we get into deadlock, we roll back all the allocations to reach the previous safe state.
For Process
Kill a process
Killing a process can solve our problem, but the bigger concern is deciding which process to kill. Generally, the operating system kills the process which has done the least amount of work so far.
Kill all processes
This is not an advisable approach, but it can be used if the problem becomes very serious. Killing all processes leads to inefficiency in the system, because all the processes will have to execute again from the start.
Chapter 6 Commonly used OS
Windows:
Components and Features: Windows consists of several core components, including the
kernel, device drivers, user interface components, and system libraries. The kernel is the
heart of the operating system, responsible for managing hardware resources, memory, and
process scheduling. The user interface components include the Windows Shell, which
provides the desktop environment and graphical elements like icons and windows.
File System: Windows supports the New Technology File System (NTFS) as its primary
file system. NTFS offers features like file and folder permissions, encryption, and
journaling for improved reliability. This file system is known for its robustness and support
for large storage volumes.
Benefits and Limitations: Windows offers a wide range of software compatibility due to
its popularity. It provides a familiar user interface, extensive driver support for hardware,
and compatibility with a vast array of applications. However, it has faced criticism for its
susceptibility to malware and security vulnerabilities, as well as concerns about user
privacy.
Mac:
History: The macOS operating system, previously known as Mac OS X, was developed
by Apple Inc. Originally, it was built on a Unix-based foundation called NeXTSTEP,
which Apple acquired from NeXT Computer in 1996. This formed the basis for macOS,
which was first released in 2001.
Components and Features: macOS includes a graphical user interface known for its
elegance and simplicity. It has a core based on the Darwin operating system, which is a
Unix-like foundation. The Aqua user interface is known for its distinctive look and feel,
characterized by its use of translucent elements and visual effects.
Memory Management: Similar to other modern operating systems, macOS utilizes virtual
memory management to allow multiple processes to run concurrently while efficiently
using physical memory.
File System: macOS originally used the Hierarchical File System (HFS) and then HFS+,
but it transitioned to the Apple File System (APFS) in recent versions. APFS offers features
like snapshot support, enhanced data integrity, and improved performance.
Benefits and Limitations: macOS is praised for its sleek design, ease of use, and strong
integration with Apple's hardware ecosystem. It is known for its focus on security and
privacy. However, it is limited by its hardware exclusivity to Apple devices and a narrower
range of software compared to Windows.
Countermeasures: To mitigate these security risks, both Microsoft and Apple regularly
release updates and patches to address vulnerabilities. They also incorporate security
features like firewall protection, antivirus software integration, encryption mechanisms,
and user account controls. Users are encouraged to keep their systems updated and follow
best practices to minimize risks.
In summary, Windows and macOS are two major proprietary operating systems with
distinct histories, components, features, memory management techniques, and file systems.
While they offer various benefits, they also have limitations and face security challenges
that require ongoing vigilance and countermeasures to ensure the safety and functionality
of the systems.
Linux:
Components and Features: Linux consists of the kernel, which manages hardware
resources, process scheduling, and memory management. On top of the kernel, various
components are combined to create complete operating systems, often referred to as Linux
distributions or distros. These distributions include user interfaces, system libraries,
utilities, and application software.
Devices and Drivers: Linux has strong support for a wide range of devices and hardware
architectures. The kernel includes drivers for many common hardware components, and
the open-source nature of Linux encourages the community to develop drivers for a diverse
array of devices.
Memory Management: Linux uses a virtual memory system similar to other modern
operating systems. It employs techniques like demand paging and memory mapping to
optimize memory usage and allow efficient multitasking.
File System: Linux supports a variety of file systems, including ext4, XFS, and Btrfs.
These file systems offer features such as journaling for data integrity, efficient storage
allocation, and support for large storage volumes.
Benefits and Limitations: Linux's open-source nature brings numerous benefits. The
community-driven development model encourages collaboration, rapid innovation, and
widespread adoption. Linux is highly customizable, making it suitable for a wide range of
applications from embedded systems to servers. It is known for its stability, security, and
efficient resource utilization.
However, Linux also has its limitations. Its diverse ecosystem can lead to fragmentation,
with various distributions having different approaches and package management systems.
Additionally, some users may find the learning curve steep due to its command-line
interface and configuration options.
Countermeasures: Linux's security is bolstered by its strong community involvement,
rapid patching, and security-focused distributions. Tools like SELinux (Security-Enhanced
Linux) provide additional layers of access control, and the open-source nature allows
security experts to review the code for potential vulnerabilities.
In conclusion, Linux is a powerful open-source operating system known for its flexibility,
security, and customization. Its components, features, and diverse ecosystem have led to
its widespread adoption across various computing domains. While it presents benefits,
users should be aware of its potential limitations and practice security measures to ensure
a safe and efficient computing experience.
Database Management Systems
Unit VII: Database Management System
A database is a computer-based record-keeping system whose overall purpose is to record and maintain information.
Data Base
1. Data
The data stored in the system is partitioned into one or more databases. For tutorial
purposes it is usually convenient to assume that there is just one database, containing the
totality of all stored data in the system.
A database then is a repository for stored data, in general it is both integrated and shared.
Integrated
By "integrated" we mean that the database may be thought of as a unification of several otherwise distinct data files, with any redundancy among those files partly or wholly eliminated.
Shared
By “shared” we mean that individual pieces of data in the database may be shared among
several different users, in the sense that each of those users may have access to the same
piece of data and may use it for different purposes.
2. Hardware:
The hardware consists of the secondary storage volumes (disks, drums, etc.) on which the database resides, together with the associated devices, control units, channels, and so forth. (We assume that the database is too large to be held in its entirety within the computer's primary storage.)
3. Software:
Between the physical database itself (i.e. the data as actually stored) and the users of the system is a layer of software, usually called the database management system or DBMS. All requests from users for access to the database, such as queries, are handled by the DBMS.
4. Users:
Three broad classes of user are considered here:
(1) Application programmer
(2) End-user
(3) Database administrator
(1) Application programmer:
The application programmer is responsible for writing application programs that use the database, typically in a language such as COBOL or PL/I. These application programs operate on the data in all the usual ways: retrieving information, creating new information, and deleting or changing existing information. (All these functions are performed by issuing the appropriate requests to the DBMS.)
(2) End-user:
The second class of user, the end-user, may employ a query language provided as an integral part of the system.
(3) Database administrator:
The third class of user is the database administrator or DBA: a person or group of persons responsible for the overall performance of the system.
Conventional purpose of DBMS:
Conventional files are also called permanent files (COBOL files, non-database files). Consider the part of a savings bank enterprise that keeps information about all customers and savings accounts in permanent system files at the bank.
The system has a number of application programs that allow the user to manipulate these files.
New application programs are added to the system as the need arises, and as time goes by, more files and more application programs are added. The typical file processing system described above is supported by a conventional file system (non-database files).
To obtain some new piece of information in such a system, one must:
1. either get the list of customers and extract the needed information manually; or
2. ask the data processing department to have a system programmer write the necessary application program.
Since the files and application programs are created by different programmers over a long period, the files are likely to have different formats and the programs may be written in several programming languages. Moreover, the same piece of information may be duplicated in several places. Such a conventional file-processing environment does not allow needed data to be retrieved in a convenient and efficient manner, because a query language such as SQL is not available as it is in a database; this also leads to higher storage and access costs.
5. Data isolation:
Since the data is scattered across various files, and those files may be in different formats, it is difficult to write new application programs to retrieve the data.
6. Integrity problem:
The data values stored in the database must satisfy certain constraints; for example, the balance of a bank account may never fall below a prescribed amount, say $25. In a file system, such constraints are enforced by adding code to the various application programs.
7. Security problem:
Not every user of the database system should be able to access all the data. For example, in a banking system, payroll personnel need to see only that part of the database that has information about bank employees; they do not need access to information about customer accounts. It is difficult to enforce such security constraints in a conventional file system.
8. Lack of flexibility.
9. Data dependencies
Unit VII DBMS Design
Relational DBMS:
Data, and the relationships among the data, are represented by a collection of tables, each of which has a number of columns with unique names.
1. Supplier Table S

S#  SNAME  STATUS  CITY
S1  Smith  20      London
S2  John   10      Paris
S3  Black  30      Paris
3. Supply & Part Combine to make Table Sp
S# P# QTY
S1 P1 300
S1 P2 200
S1 P3 400
S2 P1 300
S2 P2 400
S3 P2 200
• Rows of a table are generally referred to as tuples, and columns are referred to as attributes.
• Domain: A domain is a pool of values from which the actual values appearing in a given column are drawn. The set of possible values that a given attribute can have is called its domain. It is possible for different attributes to share the same domain, for example GUEST.Soc_Sec_No and EMPLOYEE.Soc_Sec_No.
Several Operations:
1. INSERT:
Given information concerning a new supplier (S4) in a work area W, insert the tuple from W into relation S.
Keys:
1. Primary key:
In a relation there is an attribute whose value is unique within the relation and can thus be used to identify the tuples of that relation; that attribute is called the primary key, and it is used to tell one tuple from another. For example, in the relation Part the primary key is P#. A primary key can also be a combination of two attributes, e.g. [S#, P#] in the SP table.
2. Super key:
Any set of attributes whose values, taken together, uniquely identify a tuple of the relation is called a super key.
3. Candidate key:
When a relation has more than one attribute (or attribute combination) possessing the unique-identification property, there is more than one attribute that can be used for identification, and hence more than one candidate key. For example, in the Supplier table the attributes S# and SNAME are both candidate keys.
4. Alternate Key:
A candidate key which is used for unique identification purposes, other than the primary key, is called an alternate key.
5. Secondary Key:
An attribute (or set of attributes) used to retrieve records for some purpose, which need not identify them uniquely, is called a secondary key.
6. Foreign key:
If an attribute M of one table takes its values from the same domain as the primary key K of another table, so that each value of M also occurs as a value of K, then M is called a foreign key.
For example:
Teacher Table
T# T NAME Qualification C#
T1 Pinky B.Sc. BI
Class Table
BII 12 12340 20
BI 11 1462 30
Integrity Rules:
1. No component of a primary key may be null (unknown). This is known as entity integrity.
2. Referential Integrity: Let D be a primary domain, and let R1 be a relation with an attribute A that is defined on D. Then, at any given time, each value of A in R1 must be either (a) null, or (b) equal to v, say, where v is the primary key value of some tuple in some relation R2 (R1 and R2 not necessarily distinct) with primary key defined on D.
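The referential integrity rule reduces to a simple membership test: each foreign-key value must be null or equal to a primary-key value of the referenced relation. A Python sketch using the Teacher and Class tables above, treating C# in Teacher as the foreign key; the second teacher row is an assumed extra example, not from the tables above.

# Referential integrity: every foreign-key value is null or a primary key of R2.
class_keys = {"BI", "BII"}                      # primary-key values (C#) in Class
teachers = [("T1", "Pinky", "B.Sc.", "BI"),     # (T#, name, qualification, C#)
            ("T2", "Raju", "M.Sc.", None)]      # assumed row; null C# is allowed

for t in teachers:
    fk = t[3]                                   # the foreign-key value C#
    ok = fk is None or fk in class_keys
    print(t[0], "referential integrity:", "ok" if ok else "violated")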
Hierarchical model:
Advantages:
1. Relative Simplicity.
2. Easy to use.
Disadvantage:
Insert:
It is not possible, without introducing a special dummy part, to insert data concerning a new supplier, S4 say, until that supplier supplies some part.
Delete:
The only way to delete a shipment is to delete the corresponding supplier occurrence. It follows that if we delete the only shipment for a given supplier, we lose all information on that supplier. (The insert and delete anomalies are really two sides of the same coin.) For example, deleting the shipment connecting P2 and S3 is handled by deleting the occurrence for S3 under part P2, which, since it is the only occurrence for S3, causes all information on S3 to be lost.
Update:
If we need to change the description of a supplier, e.g. to change the city for supplier S1 to Delhi, we are faced either with the problem of searching the entire view to find every occurrence of supplier S1, or with the possibility of introducing an inconsistency (supplier S1 might be shown as being in Delhi at one point and in London at another).
Network model:
Disadvantage:
1. It is a more complex structure because of the number of pointers involved; even to represent a small database we need a large number of pointers.
Advantages:
Insert:
To insert data concerning a new supplier, we simply create a new supplier record occurrence. Initially there will be no connector records for the new supplier; its chain will consist of a single pointer from the supplier to itself.
Delete:
To delete the shipment connecting P2 and S3, we delete the connector record occurrence linking this supplier and this part. The two chains concerned will need to be adjusted appropriately.
Update:
We can change the city for supplier S1 to Delhi without search problems and without the possibility of inconsistency, because the city for S1 appears at precisely one place in the structure.
The network data model was formalised in the late 1960s by the Database Task Group of the Conference on Data Systems Languages (DBTG/CODASYL). Their first report, which has been revised a number of times, contained detailed specifications for the network data model.
DBTG Set
The DBTG model uses two different data structures to represent the database entities and the relationships between them, namely the record type and the set type. A RECORD TYPE is used to represent an entity type. It is made up of a number of data items that represent the attributes of the entity.
A SET TYPE is used to represent a directed relationship between two record types: the so-called owner record type and the member record type. The set type, like the record type, is named, and it specifies that there is a one-to-many (1:M) relationship between the owner and member record types. A set type can have more than one record type as its member, but only one record type is allowed to be the owner in a given set type. A database can have one or more occurrences of each of its record and set types. An occurrence of a set type consists of an occurrence of the owner record type and any number of occurrences of each of its member record types. A record occurrence cannot be a member of two distinct occurrences of the same set type.
Bachman introduced a graphical notation called a data structure diagram to denote the logical relationship implied by the set. Here a labeled rectangle represents the corresponding entity or record type, and an arrow connecting two labeled rectangles represents a set type.
[Figure: a DBTG set, drawn as a rectangle labeled Department (the owner record type) connected by an arrow labeled Dept_Emp to a rectangle labeled Employee (the member record type).]
A DBTG Set
The record is the basic unit for representing data in the DBTG network database model. The one-to-many relationship of a set is implemented by linking the members of a given occurrence of a set to the owner record occurrence. Figure 22 shows the implementation of the set occurrence DEPT-EMP, where the owner record is Comp. Sc. and the member records are, for instance, Jancy and Santosh. Note that for simplicity we have shown only one of the record fields of each record. This method of implementation assigns one pointer (link) in each record for each set type in which the record participates, and therefore allows a record occurrence to participate in only one occurrence of a given set type.
[Figure 22: implementation of the set occurrence DEPT-EMP; the owner record Comp. Sc. is linked in a chain through its member record occurrences, each record carrying a pointer to the next member and back to the owner.]
The external level is the level closest to the users, that is, the one concerned with the way in which the data is viewed by individual users.
The internal level is the level closest to physical storage, that is, the one concerned with the way in which the data is actually stored.
[Figure: the three-level architecture. At the top are the external views of User 1 and User 2, each a logical record with fields such as Employee Name, Employee Address, Employee annual salary and Employee Soc_Sec_No. Mappings supplied by the DBMS connect these to the conceptual view, whose conceptual record contains Employee name (string), Employee address, Employee social security number (key) and Employee annual salary. A further mapping connects the conceptual view to the internal record of the stored database, administered by the DBA.]
The highest level, seen by the application programmer or user, is called the external view, user view, or simply the view.
The next level of abstraction, the total sum of the users' views, is called the global or conceptual view.
The lowest level, a description of the actual method of storing the data, is the internal view.
A scheme is an outline or plan that describes the records and relationships existing in a view. The view at each of these levels is described by a scheme.
Each external view is described by a scheme called an external scheme. The external scheme consists of definitions of the logical records and the relationships in the external view. It also contains the method of deriving the objects in the external view from the objects in the conceptual view.
The conceptual view is defined by means of the conceptual scheme. It describes all the records and relationships included in the conceptual view; there is therefore only one conceptual scheme per database. Conceptual schemes are also intended to include a great many additional features, such as authorization checks and validation procedures. The conceptual scheme also contains the method of deriving the objects in the conceptual view from the objects in the internal view.
The internal view is expressed by the internal scheme. The internal view is a very low-level representation of the database. The internal scheme not only defines the various types of stored records but also specifies what indexes exist, how stored fields are represented, what physical sequence the stored records are in, and so on.
Two mappings are required in a database system with three different views. The conceptual/internal mapping defines the correspondence between the conceptual view and the stored database; it specifies how conceptual records and fields map onto their stored counterparts.
The external/conceptual mapping defines the correspondence between an external view and the conceptual view, because differences may exist between these two levels; for example, fields may have different formats.
Language
(1) Each user has a language at his or her disposal: for the application programmer it will be a conventional programming language such as COBOL or PL/I; for the terminal user it will be either a query language or a special-purpose language.
(2) An important point about a user's language is that it includes a data sublanguage (DSL), that is, a subset of the total language concerned with database objects and operations.
(3) Any data sublanguage is in fact a combination of two languages:
(a) a data definition language (DDL), which provides the definition or description of database objects, and
(b) a data manipulation language (DML), which is used for the manipulation or processing of the objects defined using the DDL.
DDL
(1) Create table supplier (SNo char(5) not null, Name char(20), Status int, City char(15));
(2) A DDL is the means of declaring to the DBMS what data structures will be used.
(3) A DDL gives a logical data description and performs the following functions:
(a) It should identify the types of data records and the database files.
(b) It should give a unique name to each data-item type, record type and file type.
(c) It should specify the data-item types.
(d) It may define the length of a data item or the range of values that a data item can accept.
DML
The application program must give instructions to the DBMS and have a means of interpreting the status messages with which it replies, saying whether the request has been satisfactorily accomplished.
Generally, the interface between the application program and the DBMS, referred to as the DML, is embedded in a host language such as COBOL. Typical commands in the DML used in CODASYL are the following:
OPEN: A file or set of records is opened, i.e. made available for an application program to use.
CLOSE: A file or set of records is closed, i.e. made unavailable to the application program.
FIND: The DBMS locates a specified occurrence of a named record type. It may have to search through several records to find it.
GET: The contents of specified data items of a specified record occurrence are transferred to the program work area.
INSERT: The record in the program work area is inserted into one or more named groups of records.
STORE: A new record occurrence is stored in the database, and all the necessary relationship linkages and addressing facilities are created.
ORDER: All or specified member records in a named group of records are logically reordered in ascending or descending sequence of specified keys.
Chapter 8 DBMS DESIGN
RELATIONAL ALGEBRA
1) UNION:
The union of two union-compatible relations A and B is the set of all tuples t belonging to either A or B (or both).
For example, let A be the set of supplier tuples for suppliers located in London and B the set of supplier tuples for suppliers who supply part P1. Then A UNION B is the set of supplier tuples for suppliers who are either located in London or supply part P1 (or both).
2) INTERSECTION:
The intersection of two union-compatible relations A and B is the set of all tuples t belonging to both A and B. For example:
A INTERSECT B is the set of supplier tuples for suppliers who are located in London and supply part P1.
3) DIFFERENCE:
The difference between two relations, A - B in that order, is the set of all tuples t belonging to A and not to B. For example:
A - B in our example contains the set of tuples for the suppliers who are located in London and who do not supply part P1.
4) CARTESIAN PRODUCT:
The Cartesian product is simply the concatenation of every tuple of one relation with every tuple of the other: S = A x B, where, for instance, A is the set of supplier numbers and B the set of part numbers.
As a second example, let R and S be the relations below.

R:
A  B  C
a  b  c
d  a  f
c  b  d

S:
D  E  F
b  g  a
d  a  f

R x S:
A  B  C  D  E  F
a  b  c  b  g  a
a  b  c  d  a  f
d  a  f  b  g  a
d  a  f  d  a  f
c  b  d  b  g  a
c  b  d  d  a  f
1) SELECTION:
The algebraic selection operator yields a horizontal subset of a given relation, i.e. the subset of tuples within the given relation for which a specified predicate is satisfied.

S WHERE CITY = 'London'

S#  SNAME  STATUS  CITY
S1  Smith  20      London
S4  Clark  20      London

SP WHERE S# = 'S1' AND P# = 'P1'

S#  P#  QTY
S1  P1  300

In general, selection is written as: relation-name WHERE condition. If X denotes the relation on which the selection is performed, then the result of the selection has exactly the same attributes as X.
2) PROJECTION:
The projection operator yields a vertical subset of a given relation: the subset obtained by selecting specified attributes, in a specified left-to-right order, and then eliminating duplicate tuples within the attributes selected.
TABLE :
S# SNAME STATUS CITY
S1 Smith 20 London
S2 Jones 10 Paris
S3 Blake 30 Paris
S4 Clark 20 London
S5 Adams 30 Athens
Projection of S over CITY:
CITY
London
Paris
Athens
3) JOIN :
Join is equivalent to taking first cartesian product of two relation and then taking the
selection of the product based on some logical condition for example we have two relation
R $ S.
R:
A  B  C
1  2  3
4  5  6
7  8  9

S:
D  E
3  1
6  2

The theta-join of R and S selects from R x S the tuples satisfying a logical expression on the attributes of R and S.

R JOIN S where B <= D:
A  B  C  D  E
1  2  3  3  1
1  2  3  6  2
4  5  6  6  2

R JOIN S where C = D (an equijoin):
A  B  C  D  E
1  2  3  3  1
4  5  6  6  2
Natural Join:

R:
A  B  C
a  b  c
d  b  c
b  d  f
c  a  d

S:
B  C  D
b  c  d
b  c  e
a  d  b

R x S:
A  B  C  B  C  D
a  b  c  b  c  d
a  b  c  b  c  e
a  b  c  a  d  b
d  b  c  b  c  d
d  b  c  b  c  e
d  b  c  a  d  b
b  d  f  b  c  d
b  d  f  b  c  e
b  d  f  a  d  b
c  a  d  b  c  d
c  a  d  b  c  e
c  a  d  a  d  b
From these 12 tuples of R x S, we take only those in which the values of the common attributes B and C agree.
R JOIN S (natural join):
A  B  C  D
a  b  c  d
a  b  c  e
d  b  c  d
d  b  c  e
c  a  d  b
The natural join R JOIN S is applicable only when both R and S have columns that are named by common attributes.
1) First we compute R x S.
2) For each attribute X that names both a column in R and a column in S, we select from R x S those tuples whose values agree in the columns R.X and S.X.
The join operator, as the name suggests, allows the combining of two relations to form a single new relation. The tuples from the operand relations that participate in the operation and contribute to the result are related. The join operation therefore allows the processing of relationships existing between the operand relations.
4) DIVISION:
The division operator divides a dividend relation A of degree m + n by a divisor relation B of degree n and produces a result relation of degree m.

Dividend A:
A  B  C  D
a  b  c  d
a  b  e  f
b  c  e  f
e  d  c  d
e  d  e  f
a  b  d  e

Divisor B:
C  D
e  f

Result A DIVIDEBY B:
A  B
a  b
e  d
b  c
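Treating a relation as a pair (attribute list, set of tuples), the operators above can be sketched in a few lines of Python. This is a toy illustration of the algebra, not how a real DBMS implements it; the relation values are the R and S used in the natural-join example.

# Relations as (attribute list, set of tuples).
R = (["A", "B", "C"], {("a","b","c"), ("d","b","c"), ("b","d","f"), ("c","a","d")})
S = (["B", "C", "D"], {("b","c","d"), ("b","c","e"), ("a","d","b")})

def select(rel, pred):                       # SELECTION: horizontal subset
    attrs, rows = rel
    return attrs, {r for r in rows if pred(dict(zip(attrs, r)))}

def project(rel, keep):                      # PROJECTION: vertical subset, no duplicates
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    return keep, {tuple(r[i] for i in idx) for r in rows}

def natural_join(r, s):                      # NATURAL JOIN: match on common attributes
    (ra, rr), (sa, sr) = r, s
    common = [a for a in ra if a in sa]
    extra = [a for a in sa if a not in common]
    rows = set()
    for x in rr:
        for y in sr:
            dx, dy = dict(zip(ra, x)), dict(zip(sa, y))
            if all(dx[a] == dy[a] for a in common):
                rows.add(tuple(list(x) + [dy[a] for a in extra]))
    return ra + extra, rows

# Reproduces the five tuples derived in the natural-join example above
# (sets are unordered, so the rows may print in any order).
print(natural_join(R, S))
print(project(R, ["B", "C"]))
print(select(R, lambda t: t["B"] == "b"))

Union, intersection and difference of union-compatible relations follow directly from Python's set operators (|, & and -) applied to the tuple sets.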
DATA DICTIONARY:
A data dictionary, defining all the data items in use, is built up in cooperation with the users of the system. The data dictionary is one of the most important DBA tools. It is effectively a database in its own right: a database that contains "data about data" (that is, descriptions of the other objects in the system, rather than simply raw data). In particular, all the various schemas (external, conceptual, internal) are physically stored, in both source and object form, in the dictionary. A comprehensive dictionary will also include cross-reference information showing, for instance, which programs use which pieces of data, which departments require which reports, and so on.
In fact, the dictionary may even be integrated into the database it describes, and thus may include its own description. It should be possible to query the dictionary just like any other database, so that, for example, the DBA can easily discover which programs are likely to be affected by some proposed change to the system.
This may be defined as a boundary in the system below which everything is invisible to
the user.
Distributed Databases
In a distributed system, the data is stored on several computers. The computers in a distributed database system communicate with one another by means of various media such as high-speed buses or telephone lines, and the processors in a distributed database system may vary in size; these processors are referred to by a number of different names, such as nodes or sites.
In simple words, in a distributed database system the data is stored in several locations. The distribution of the database is relevant only at the internal level, not at the conceptual or external level.
A distributed database is typically a database that is not stored in its entirety at a single physical location, but rather is spread across a network of computers that are geographically dispersed and connected by communication links. The key objective of a distributed database system is that it looks like a centralized system to its users.
For example, consider a banking system consisting of four branches located in four different cities. Each branch has its own computer, with a database consisting of all the accounts maintained at that branch. Each such installation is thus a site. There also exists one single site which maintains information about all the branches of the bank.
A local transaction is a transaction that accesses accounts at the single site at which the transaction was initiated. A global transaction, on the other hand, is one which either accesses accounts at a site different from the one at which the transaction was initiated, or accesses accounts at several different sites. For example, consider a transaction to add Rs.200 to account number 177, located at the Delhi branch. If the transaction was initiated at the Delhi branch, then it is considered local; otherwise it is considered global. A transaction to transfer Rs.200 from account 177 to account 305, which is located at the Bombay branch, is a global transaction, since accounts at two different sites are accessed as a result of its execution.
Advantages :
1) The primary advantage of a DDB system is that data can be accessed and shared in a reliable and efficient manner.
2) If one node of a DDB system fails, the remaining nodes are able to continue operating; thus the failure of one node does not cause a system shutdown, and the failure of a node can be detected.
3) Since a DDB system has several nodes, each node is able to retain a degree of control over the data stored locally.
Disadvantages :
1) Software implementation cost is very high; it is very difficult to implement a DDB system.
VIEWS
A base table is a primary data structure; it has independent existence. A view is a table that does not have existence in its own right but is instead derived from one or more base tables.
For example, if the database includes a base table S with fields S#, SNAME, STATUS and CITY, then we may define a view called LONDON_SUPPLIER, say, with fields S#, SNAME, STATUS and CITY, derived from S by selecting those records having CITY = 'London'.
A view which restricts the user to certain rows is called a horizontal view; a vertical view restricts the user to certain columns.
If the list of column names is omitted, the columns in the view take the same names as in the underlying tables.
Security : users can be given access to only the rows/columns of a table that concern them. The table may contain data for the whole firm, but they see only the data for their department.
Data Integrity : if the view shows data for a particular office, the user can enter data only for that office.
Simplicity : even if the view is a multiple-table query, querying the view still looks like a single-table query.
Performance : if the view is complex, then even simple queries against it can take a long time.
Update restriction : updating data through a view may or may not be possible. If the view is complex, the DBMS may decide it cannot perform updates and will make the view read-only.
The general syntax is: CREATE VIEW view_name [ (column list) ] AS ( SELECT ……. )
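As a concrete sketch of the LONDON_SUPPLIER view described above (S# written as SNO to keep the identifier legal in SQL):

    CREATE VIEW LONDON_SUPPLIER AS
        SELECT SNO, SNAME, STATUS, CITY
        FROM S
        WHERE CITY = 'London';    -- a horizontal view: restricts rows, not columns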
In the most basic of definitions, a DBMS can be regarded as relational only if it obeys the following three rules:
. All data must be presented in the form of tables.
. Retrieval of the data must be possible using the following types of operations: SELECT, JOIN and PROJECT.
. All relationships between data must be represented explicitly in the data itself.
All of the 12 rules stated below must be demonstrable, within a single product, for it to be termed relational.
Rule 0 : Any truly relational database must be manageable entirely through its own relational capabilities.
A relational database must be relational, wholly relational and nothing but relational.
Rule 2 : The guaranteed access rule
It must be true that any item can be retrieved by supplying the table name, the primary key value of the row holding the item, and the column name in which it is to be found.
Rule 3 : The systematic treatment of null values (a null value means an unavailable or inapplicable value)
It is fundamental to the DBMS that null values are supported in the representation of missing and inapplicable information.
Rule 4 : The active on-line catalog
A description of the database is held and maintained using the same logical structures used to define the data.
Rule 4 means that there must be a data dictionary within the RDBMS that is constructed of tables and/or views that can be examined using SQL.
Rule 5 : The comprehensive sub-language rule
There must be at least one language whose statements can be expressed as character strings
conforming to some well defined syntax, that is comprehensive in supporting the
following:
• Data definition
• View definition
• Data manipulation
• Integrity constraints
• Authorisation
• Transaction boundaries
Rule 6 : The view updating rule
All views that can be updated in theory can also be updated by the system. For example, if you define a virtual column in a view as A*B, where A and B are columns in a base table, then how can you perform an update on that virtual column directly?
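A sketch of that A*B case, using a hypothetical base table T:

    CREATE VIEW V AS
        SELECT A, B, A * B AS PRODUCT    -- PRODUCT is a virtual, derived column
        FROM T;
    -- an UPDATE of PRODUCT cannot be decomposed into unique updates of A and B,
    -- so the system must treat the column (or the whole view) as read-only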
Rule 7 : High-level insert, update and delete
An RDBMS must do more than just be able to retrieve relational data sets. It has to be capable of inserting and deleting data as a relational set.
Rule 8 : Physical data independence
User access to the database, via monitors or application programs, must remain logically consistent whenever the storage representation or the access methods to the data are changed.
Rule 9 : Logical data independence
This rule allows many types of database design changes to be made dynamically, without users being aware of them. It should be possible to split a table vertically into more than one fragment, as long as such splitting preserves all the original data (is non-loss) and maintains the primary key in each and every fragment.

TAB
1  A  C  E
4  A  C  F
6  B  D  G
2  B  D  H

FRAG 1          FRAG 2
1  A            1  C  E
4  A            4  C  F
6  B            6  D  G
2  B            2  D  H

Secondly, it should be possible to combine base tables into one, by way of a non-loss join.
Rule 10 : Integrity rule
Integrity rule 1
Integrity rule 1 is concerned with primary key values. Before we formally state the rule, let us look at the effect of null values in prime attributes. A null value for an attribute is a value that is either not known at the time or does not apply to a given instance of the object. It may also be that a particular tuple does not have a value for an attribute; this fact could be represented by a null value.
If any attribute of a primary key (prime attribute) were permitted to have null values, then, because the attributes in the key must be non-redundant, the key could not be used for the unique identification of tuples.
Integrity rule 2 is concerned with foreign keys, i.e. with attributes of a relation having domains that are those of the primary key of another relation. For example, in the relation P below the MANAGER attribute represents the employee number of the manager. MANAGER is the foreign key; note that here it refers to the primary key of the same relation. The chief executive officer (CEO) of the company can have himself or herself as the manager, or may take a null value.
P
EMP#   NAME    MANAGER
101    Jones   @          (@ denotes a null: the CEO has no manager)
102    Smith   110
103    Evan    112
104    Smith   112
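Both rules map directly onto SQL DDL. A hedged sketch with a hypothetical EMP table: the PRIMARY KEY enforces integrity rule 1 (no null prime attribute), and the self-referencing FOREIGN KEY enforces rule 2 while leaving MANAGER nullable for the CEO:

    CREATE TABLE EMP (
        EMPNO   INTEGER PRIMARY KEY,             -- rule 1: unique, never null
        NAME    VARCHAR(30) NOT NULL,
        MANAGER INTEGER REFERENCES EMP (EMPNO)   -- rule 2: must match an EMPNO, or be null
    );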
Rule 11 : Distribution rule
This is one of the more attractive aspects of RDBMSs. Database systems built on the relational framework are well suited to today's client/server database designs.
NORMALISATION
How do we decide the structure for a given database? In other words, how do we decide what relations are needed and what their attributes should be? This is the database design problem.
NORMAL FORM
Normalization theory is built around the concept of normal forms. A relation is said to be in a particular normal form if it satisfies a certain specified set of constraints. For example, a relation is said to be in first normal form (abbreviated 1NF) if and only if it satisfies the constraint that it contains atomic values only. (Thus every normalized relation is in 1NF, which accounts for the "first".)
[Diagram: the universe of relations (normalized and unnormalized) contains, as successively smaller nested sets, 1NF relations (normalized relations), 2NF relations, 3NF relations, BCNF relations, 4NF relations and PJ/NF (5NF) relations.]
NORMAL FORMS
FUNCTIONAL DEPENDENCY :-
Let X and Y be two attributes of a relation. Given the value of X, if there is only one value of Y corresponding to it, then Y is said to be functionally dependent on X. This is indicated by the notation:
X → Y
Y is functionally dependent on X if there exists exactly one value of Y for each value of X.
In the suppliers-and-parts database, for example, attributes SNAME, STATUS and CITY of relation S are each functionally dependent on attribute S#, because, given a particular value for S#, there exists precisely one corresponding value for each of SNAME, STATUS and CITY (provided that the S# value occurs in relation S, of course).
In symbols:
S.S# → S.SNAME
S.S# → S.STATUS
S.S# → S.CITY
or,
S.S# → S.(SNAME, STATUS, CITY)
S
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens
[FD diagrams for the suppliers-and-parts database: in S, S# → SNAME, STATUS, CITY; in P, P# → PNAME, COLOR, WEIGHT, CITY; in SP, (S#, P#) → QTY.]
(X, Z) → Y
means that there is only one value of Y corresponding to given values of X and Z. In other words, Y is functionally dependent on the composite (X, Z).
For example:
[Diagram: QTY depends on the composite (S#, P#) as a whole, so it is fully functionally dependent on it; CITY and STATUS depend on S# alone, so they are not fully functionally dependent on the composite (S#, P#).]
SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S2   P1   300
S2   P2   400
S3   P2   200
(QTY is an example of a fully functionally dependent attribute.)
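Whether an FD holds in stored data can be tested mechanically: X → Y fails exactly when some X value (or composite (X, Z) value) is paired with more than one Y value. A sketch against the SP table above (identifiers adapted to SNO, PNO, QTY):

    -- violations of (SNO, PNO) → QTY; an empty result means the FD holds
    SELECT SNO, PNO
    FROM SP
    GROUP BY SNO, PNO
    HAVING COUNT(DISTINCT QTY) > 1;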
Sub-levels, which are allowed in COBOL, are not allowed here: values must be atomic, i.e. not decomposable.
Let us suppose that information concerning suppliers and shipments, rather than being split into two separate relations (S and SP), is lumped together into a single relation FIRST (S#, STATUS, CITY, P#, QTY). We ignore the attribute SNAME for simplicity. The primary key of FIRST is the combination (S#, P#).
We see from the figure that STATUS and CITY are not fully functionally dependent on the primary key. Also, STATUS and CITY are not mutually independent.
[FD diagram for FIRST: the primary key (S#, P#) → QTY; S# → STATUS; S# → CITY.]
In relation FIRST, a primary key value consists of a supplier number and a part number.
FIRST
S# STATUS CITY P# QTY.
S1 20 LONDON P1 300
S1 20 LONDON P2 200
S1 20 LONDON P3 400
S1 20 LONDON P4 200
S1 20 LONDON P5 100
S1 20 LONDON P6 100
S2 10 PARIS P1 300
S2 10 PARIS P2 400
S3 10 PARIS P2 200
S4 20 LONDON P2 200
S4 20 LONDON P4 300
S4 20 LONDON P5 400
SIMPLE TABULATION OF FIRST
DELETING :-
If we delete the only FIRST tuple for a particular supplier, we destroy not only the shipment connecting that supplier to some part but also the information that the supplier is located in a particular city. For example, if we delete the FIRST tuple with S# value S3 and P# value P2, we lose the information that S3 is located in Paris.
UPDATING :-
The city value for a given supplier appears in FIRST many times, in general. This redundancy causes update problems. For example, if supplier S1 moves from London to Amsterdam, we are faced with either the problem of searching the FIRST relation to find every tuple connecting S1 and London (and changing each of them), or the possibility of producing an inconsistent result (the city for S1 may be given as Amsterdam in one place and London in another).
[FD diagrams after decomposition: in SECOND, S# → STATUS and S# → CITY; in SP, (S#, P#) → QTY.]
A relation R is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is fully dependent on the primary key.
A relation that is in first normal form and not in second can always be reduced to an equivalent collection of 2NF relations.
The reduction consists of replacing the relation by suitable projections; the collection of these projections is equivalent to the original relation, which can always be recovered by taking their natural join, so no information is lost in the process. In other words, the process is reversible. In our example, SECOND and SP are projections of FIRST, and FIRST is the natural join of SECOND and SP over S#.
Second
S# STATUS CITY
S1 20 London
S2 10 Paris
S3 10 Paris
S4 20 London
S5 30 Athens
SP
S# P# QTY.
S1 P1 300
S1 P2 200
S1 P3 400
S1 P4 200
S1 P5 100
S1 P6 100
S2 P1 300
S2 P2 400
S3 P2 200
S4 P2 200
S4 P4 300
S4 P5 400
In this way the problems that appear in first normal form are solved. The solution to these problems is to replace the relation FIRST by the two relations SECOND (S#, STATUS, CITY) and SP (S#, P#, QTY).
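The replacement can be sketched in SQL (CREATE TABLE ... AS SELECT is widely but not universally supported; SELECT DISTINCT performs the duplicate elimination that projection requires; S#, P# written as SNO, PNO):

    CREATE TABLE SECOND AS
        SELECT DISTINCT SNO, STATUS, CITY FROM FIRST;   -- projection over S#, STATUS, CITY
    CREATE TABLE SP AS
        SELECT DISTINCT SNO, PNO, QTY FROM FIRST;       -- projection over S#, P#, QTY
    -- reversible: SECOND NATURAL JOIN SP (the join over SNO) recovers FIRST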
INSERTING :-
We can enter the information that S5 is located in Athens, even though S5 does not currently supply any parts, by simply inserting the appropriate tuple into SECOND.
DELETING :-
We can delete the shipment connecting S3 and P2 by deleting the appropriate tuple from SP; we do not lose the information that S3 is located in Paris.
UPDATING :-
In the revised structure, the city for a given supplier appears once, not many times (the
redundancy has been eliminated). Thus we can change the city for S1 from London to
Amsterdam by changing it once and for all in the relevant SECOND tuple.
INSERTING :-
We cannot enter the fact that a particular city has a particular status value. For example, we cannot state that any supplier in Rome must have a status of 50 until we have some supplier located in that city. The reason is, again, that until such a supplier exists we have no appropriate primary key value.
DELETING :-
If we delete the only SECOND tuple for a particular city, we destroy not only the information for the supplier concerned but also the information that that city has that particular status value. For example, if we delete the SECOND tuple for S5, we lose the information that the status for Athens is 30. (Once again, the insertion and deletion problems are two sides of the same coin.)
UPDATING :-
The status value for a given city appears in SECOND many times, in general. Thus, if the status for London changes, we are faced either with the problem of searching the SECOND relation to find every tuple for London (and changing it), or with the possibility of producing an inconsistent result (the status for London may be given as 20 in one place and 30 in another).
TRANSITIVE :-
The dependency of STATUS on S#, though it is functional, is transitive (via CITY): each S# value determines a CITY value, and this in turn determines the STATUS value. The transitivity leads, once again, to difficulties over update operations.
A relation R is in third normal form (3NF) if and only if it is in 2NF and every nonkey attribute is non-transitively dependent on the primary key.
Again the solution to the problem is to replace the original relation (SECOND) by two projections, in this case SC (S#, CITY) and CS (CITY, STATUS). The figure shows the tabulation corresponding to the data values of the original SECOND. The process is reversible, once again, since SECOND is the join of SC and CS over CITY.
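That this step is non-loss can be checked mechanically: the natural join of the projections must return exactly the original relation. A sketch (EXCEPT is the SQL-standard set difference; some products call it MINUS):

    SELECT SNO, STATUS, CITY FROM SECOND
    EXCEPT
    SELECT SNO, STATUS, CITY FROM SC NATURAL JOIN CS;
    -- running the EXCEPT in both directions and getting no rows
    -- confirms SECOND = SC ⋈ CS, i.e. the decomposition is non-loss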
[FD diagrams: in SC, S# → CITY; in CS, CITY → STATUS.]
SC
S# CITY
S1 London
S2 Paris
S3 Paris
S4 London
S5 Athens
CS
CITY STATUS
Athens 30
London 20
Paris 10
Relations SC and CS are both in 3NF; relation SECOND is not. (The primary keys for SC and CS are S# and CITY, respectively.) A relation that is in second normal form and not in third can always be reduced to an equivalent collection of 3NF relations.
In this normal form we have overcome the problems that remain in 2NF.
INSERTION :-
If we want to enter the city and status of a supplier for which we do not yet have a supplier number, we can do so in 3NF; this was not possible in 2NF.
DELETION :-
If we delete a particular supplier, say S5, then in 2NF the city and status of that supplier are deleted as well; but in 3NF we have broken S#, CITY and STATUS into two parts, so the city and its status remain in spite of the deletion of the supplier.
UPDATION :-
In 2NF, if we change the status of supplier S1, i.e. the status value for London, from 20 to 30, we face either the problem of searching the SECOND relation to find every tuple for London or the possibility of producing an inconsistent result. But in 3NF we have broken (S#, CITY) and (CITY, STATUS) into two parts, so if we want to change the status for a particular city we take the table CS and do it there. There is no searching problem and no inconsistency, because in the table CS each city and its status appear a single time.
The motivation for introducing BCNF is that the original 3NF definition does not satisfactorily handle the case of a relation possessing two or more composite and overlapping candidate keys. A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key.
EXAMPLE :->
Relation FIRST contains three determinants: S#, CITY, and the combination (S#, P#); of these, only (S#, P#) is a candidate key, hence FIRST is not in BCNF. Similarly, SECOND is not in BCNF, because the determinant CITY is not a candidate key. Relations SP, SC and CS, on the other hand, are each in BCNF, because in each case the primary key is the only determinant in the relation.
Now we present some examples in which the candidate keys overlap. Two candidate keys overlap if they involve two or more attributes each and have an attribute in common.
FOR EXAMPLE :->
For the first example, suppose again that supplier names are unique, and consider the relation SSP (S#, SNAME, P#, QTY). The candidate keys are (S#, P#) and (SNAME, P#).
IS THIS RELATION IN BCNF ?
The answer is no, because we have two determinants, S# and SNAME, which are not candidate keys for the relation (S# determines SNAME, and conversely).
The relationship diagram for the above relation is given in the figure below, and the table gives the relation's attributes. The two possible composite keys are (Professor Code, Department) and (Professor Code, Head of Dept.). Observe that Department and Head of Dept. are not non-key attributes; they are part of a composite key.
[Relationship diagram: the composite key (Professor Code, Department), or equivalently (Professor Code, Head of Dept.), determines Percent Time.]
Professor Code   Department    Head of Dept.   Percent Time
P1               Physics       Ghosh           50
P1               Mathematics   Krishnan        50
P2               Chemistry     Rao             25
P2               Physics       Ghosh           75
P3               Mathematics   Krishnan        100
The relation given in the above table is in 3NF. Observe, however, that the names of Department and Head of Dept. are duplicated. Further, if professor P2 resigns, rows 3 and 4 are deleted, and we lose the information that Rao is the Head of Department of Chemistry.
The normalization of the relation is done by creating a new relation for (Department, Head of Dept.) and deleting Head of Dept. from the Professor relation. The normalized relations are shown in the following tables:
[(a) Professor relation (Professor Code, Department, Percent Time); (b) Department relation (Department, Head of Dept.).]
The dependency diagrams for these new relations are shown next; the dependency diagram gives the important clue to this normalization step, as is clear from the figures.
[Dependency diagrams: (Professor Code, Department) → Percent Time; Department → Head of Dept.]
DEPENDENCY DIAGRAM OF PROFESSOR RELATION
MULTI-VALUED DEPENDENCIES (MVD)
STUDENT
A (STUDENT NO)   B (PROJECT)     C (HOBBY)
10               C               DANCE
10               PROLOG          SINGING
10               PASCAL          MOVIE
20               D-BASE          READING
20               BASIC           PAINTING
30               ORACLE          STAMP COLLECTION
30               VISUAL BASIC    CRICKET MATCH
[Diagram: STUDENT NO stands in a 1:M relationship to PROJECT and a 1:N relationship to HOBBY.]
R.A →→ R.B
Given a relation R with attributes A, B and C, the multi-valued dependency R.A →→ R.B holds in R if and only if the set of B values matching a given (A value, C value) pair in R depends only on the A value and is independent of the C value.
Student.No →→ Student.Project
Student.No →→ Student.Hobby
A relation R is in 4NF if and only if, whenever there exists an MVD in R, say A →→ B, all attributes of R are also functionally dependent on A.
Relation STUDENT is not in 4NF, since it involves only MVDs, not FDs. So we break this relation into two tables by using projection.
SMP
A (STUDENT NO)   B (PROJECT)
10               C
10               PROLOG
10               PASCAL
20               D-BASE
20               BASIC
30               ORACLE
30               VISUAL BASIC

A (STUDENT NO)   C (HOBBY)
10               DANCE
10               SINGING
10               MOVIE
20               READING
20               PAINTING
30               STAMP COLLECTION
30               CRICKET MATCH
CTX
It is apparent that relation CTX contains a good deal of redundancy, leading as usual to problems over update operations. For example, to add the information that the physics course uses a new text called Advanced Mechanics, it is necessary to create three new tuples, one for each of the three teachers. Nevertheless, CTX is in BCNF, since it is "all key" and there are no other functional determinants.
Now let CTX be replaced by its two projections CT (COURSE, TEACHER) and CX (COURSE, TEXT). CT and CX are both "all key" and are thus both in BCNF.
CT                          CX
COURSE    TEACHER           COURSE    TEXT
Physics   Prof. Green       Physics   Basic Mechanics
Physics   Prof. Brown       Physics   Principles of Optics
Physics   Prof. Black       Physics   Modern Algebra
Math      Prof. White       Math      Projective Geometry
We can now see that relation CTX is not in 4NF, since it involves an MVD that is not an FD at all, let alone an FD in which the determinant is a candidate key. The two projections CT and CX are in 4NF, however. Thus 4NF is an improvement over BCNF.
SPJ
S#   P#   J#
S1   P1   J2
S1   P2   J1
S2   P1   J1
S1   P1   J1
SP            PJ            SJ
S#  P#        P#  J#        S#  J#
S1  P1        P1  J2        S1  J2
S1  P2        P2  J1        S1  J1
S2  P1        P1  J1        S2  J1

Join of SP and PJ over P# :
S#  P#  J#
S1  P1  J2
S1  P1  J1
S1  P2  J1
S2  P1  J2    <- spurious row
S2  P1  J1

A further join over (J#, S#) with SJ eliminates the spurious row and yields the original SPJ.
SPJ is the join of three of its projections but not of any two
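This claim can be checked with set differences, assuming stored tables SPJ(SNO, PNO, JNO) and its projections SP, PJ and SJ as above:

    SELECT SNO, PNO, JNO FROM SP NATURAL JOIN PJ
    EXCEPT
    SELECT SNO, PNO, JNO FROM SPJ;
    -- returns the spurious tuple (S2, P1, J2): two projections are not enough

    SELECT SNO, PNO, JNO FROM SP NATURAL JOIN PJ NATURAL JOIN SJ
    EXCEPT
    SELECT SNO, PNO, JNO FROM SPJ;
    -- returns no rows: the three-way join is non-loss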
A relation R is in fifth normal form (5NF) – also called projection-join normal form (PJ/NF) – if and only if every join dependency in R is implied by the candidate keys of R. (We amplify the notion of a JD being "implied by candidate keys" below.)
Relation SPJ is not in 5NF; its single candidate key, the combination (S#, P#, J#), certainly does not imply that the relation can be nonloss-decomposed into its projections SP, PJ and SJ. The projections SP, PJ and SJ are, however, in 5NF.
This JD is implied by the fact that S# and SNAME are both candidate keys. Thus, given a relation R, we can tell whether R is in 5NF, provided we know the candidate keys and all the JDs in R.
DATABASE ADMINISTRATOR
One of the main reasons for having a database management system is to have central control of both the data and the programs accessing that data. The person having such control over the system is called the database administrator (DBA).
The DBA’s responsibilities include the following
DATA ITEM –
A data item is the smallest unit of named data. It may consist of any number of bits or bytes. A data item is often referred to as a field or data element; in COBOL it is called an elementary item. In a DBMS a data item is atomic: we cannot decompose it into sub-items.
DATA AGGREGATE –
A data aggregate is a collection of data items within a record, which is given a name and
referred to as a whole.
A data aggregate is called a group or group item in COBOL .
RECORD –
A Record is a named collection of data items or data aggregates.
SEGMENT –
It is the view of some authorities that there is no need to differentiate between a data aggregate and a record, because each is a collection of data items.
FILE –
A file is a named collection of all occurrences of a given type of (logical) record.
DATA BASE –
A database is a collection of the occurrences of multiple record types , containing the
relationships between records, data aggregates and data items.
ENTITIES –
An entity is a thing that exists and is distinguishable; that is, we can tell one entity from another. For example: each chair, each person, each automobile. A group of all similar types of entities is called an entity set; examples of entity sets are the set of all persons and the set of all chairs.
ATTRIBUTES –
Entities have properties, called attributes; the qualities of an entity that we store as information are its attributes.
(1) Rectangle represents an entity set.
(2) Circle represents an attribute; attributes are linked to their entity set by undirected edges.
(3) Diamond represents a relationship; relationships are linked to their constituent entity sets by undirected edges.
[E-R diagram: entity sets VENDOR and ITEM connected by the relationship SUPPLY; ITEM carries the attributes item-code, item-name and price/unit.]
There are two main abstraction mechanisms used to model information: generalisation and aggregation. Generalisation is the abstracting process of viewing sets of objects as a single general class, by concentrating on the general characteristics of the constituent sets while suppressing or ignoring their differences. It is the union of a number of lower-level entity types for the purpose of producing a higher-level entity type. For instance, student is a generalisation of graduate or undergraduate, full-time or part-time students.
Generalisation is an IS_A relationship; therefore, manager IS_An employee, cook IS_An employee, waiter IS_An employee, and so forth. Specialisation is the abstracting process of introducing new characteristics to an existing class of objects to create one or more new classes of objects. This involves taking a higher-level entity and, using additional characteristics, generating lower-level entities. The lower-level entities also inherit the characteristics of the higher-level entity. In applying the characteristic size to car we can create a full-size, mid-size, compact or subcompact car. Specialisation may be seen as the reverse process of generalisation.
[Diagram: EMPLOYEE as a generalisation, with IS_A links from manager, cook and waiter.]
AGGREGATION –
Aggregation is the process of compiling information on an object, thereby abstracting a higher-level object. In this manner, the entity person is derived by aggregating the characteristics name, address and Social Security number. Another form of aggregation is abstracting a relationship between objects and viewing the relationship as an object. As such, the ENROLLMENT relationship between the entities student and course could be viewed as the entity REGISTRATION.
[Diagrams: the entity PERSON as an aggregation of Name, Address and Social Security Number; the ENROLLMENT relationship between STUDENT and COURSE viewed as the entity REGISTRATION.]
EXAMPLES OF AGGREGATION
Important Things
• Attributes should be facts about an entity, and facts about that entity only (the employee's name, the employee's age, and so on).
• An attribute is mono-atomic, i.e. a single-valued attribute.
• We first define the mandatory attributes (not null), marked *, then the optional ones, marked o.
• A table has a fixed number of columns and an unbounded number of rows.
• Clearly differentiate between entities and attributes (is phone an entity or an attribute? Contact No. may expand to Phone1_off, Phone2_res, …).
• A primary key, marked #, is unique and not null.
Chapter 9 DBMS Transactions
Transaction Concept
A transaction is a unit of program execution that accesses and possibly updates various data items. To preserve the integrity of the data, the database system must ensure the following properties.
ACID Properties
• Atomicity. Either all operations of the transaction are properly reflected in the
database or none are.
• Consistency. Execution of a transaction in isolation preserves the consistency of
the database.
• Isolation. Although multiple transactions may execute concurrently, each
transaction must be unaware of other concurrently executing transactions.
o That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.
• Durability. After a transaction completes successfully, the changes it has made to
the database persist, even if there are system failures.
Example: a transaction Ti that transfers $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
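The same transfer as a hedged SQL sketch, assuming a hypothetical table ACCOUNT(ACCTNO, BALANCE) and using the Commit work / Rollback work vocabulary introduced later in this chapter:

    UPDATE ACCOUNT SET BALANCE = BALANCE - 50 WHERE ACCTNO = 'A';
    UPDATE ACCOUNT SET BALANCE = BALANCE + 50 WHERE ACCTNO = 'B';
    COMMIT WORK;
    -- atomicity: both updates persist, or, after ROLLBACK WORK or a failure
    -- before the commit, neither does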
Transaction State
• Active, the initial state; the transaction stays in this state while it is executing
• Partially committed, after the final statement has been executed.
• Failed, after the discovery that normal execution can no longer proceed.
• Aborted, after the transaction has been rolled back and the database restored to its
state prior to the start of the transaction. Two options after it has been aborted:
o restart the transaction – only if no internal logical error
o kill the transaction
• Committed, after successful completion.
Implementation of Atomicity and Durability
The recovery-management component of a database system implements the support for atomicity and durability.
Concurrent Executions
• Multiple transactions are allowed to run concurrently in the system. Advantages
are:
o increased processor and disk utilization, leading to better transaction
throughput: one transaction can be using the CPU while another is reading
from or writing to the disk
o reduced average response time for transactions: short transactions need
not wait behind long ones.
Schedules
A schedule is a sequence of instructions that specifies the chronological order in which the instructions of concurrent transactions are executed.
Example Schedules
• Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
The following is a serial schedule (Schedule 1 in the text), in which T1 is
followed by T2.
• Let T1 and T2 be the transactions defined previously. The following schedule (Schedule 3 in the text) is not a serial schedule, but it is equivalent to Schedule 1.
• The following concurrent schedule (Schedule 4 in the text) does not preserve the value of the sum A + B.
Serializability
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
Conflict Serializability
Instructions li and lj of transactions Ti and Tj conflict if and only if they access the same data item Q and at least one of them writes Q. A schedule is conflict serializable if it can be transformed into a serial schedule by a series of swaps of non-conflicting instructions. The following schedule is not conflict serializable:
T3           T4
read(Q)
             write(Q)
write(Q)
We are unable to swap instructions in the above schedule to obtain either the serial
schedule < T3, T4 >, or the serial schedule < T4, T3 >.
View Serializability
• Let S and S´ be two schedules with the same set of transactions. S and S´ are view
equivalent if the following three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then
transaction Ti must, in schedule S´, also read the initial value of Q.
2. For each data item Q if transaction Ti executes read(Q) in schedule S, and that value
was produced by transaction Tj (if any), then transaction Ti must in schedule S´ also read
the value of Q that was produced by transaction Tj .
3. For each data item Q, the transaction (if any) that performs the final write(Q)
operation in schedule S must perform the final write(Q) operation in schedule S´.
As can be seen, view equivalence is based purely on reads and writes.
Recoverability
• Recoverable schedule — one where, for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the commit operation of Tj.
• If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence the database must ensure that schedules are recoverable.
• Cascadeless schedules — cascading rollbacks cannot occur; for each pair of
transactions Ti and Tj such that Tj reads a data item previously written by Ti, the
commit operation of Ti appears before the read operation of Tj.
• Every cascadeless schedule is also recoverable
• It is desirable to restrict the schedules to those that are cascadeless
Implementation of Isolation
• Schedules must be conflict or view serializable, and recoverable, for the sake of
database consistency, and preferably cascadeless.
• A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency.
• Concurrency-control schemes trade off the amount of concurrency they allow against the amount of overhead they incur.
• Some schemes allow only conflict-serializable schedules to be generated, while
others allow view-serializable schedules that are not conflict-serializable.
• Data manipulation language must include a construct for specifying the set of
actions that comprise a transaction.
• In SQL, a transaction begins implicitly.
• A transaction in SQL ends by:
o Commit work commits current transaction and begins a new one.
o Rollback work causes current transaction to abort.
• Levels of consistency specified by SQL-92:
o Serializable — default
o Repeatable read
o Read committed
o Read uncommitted
• Serializable — default
• Repeatable read — only committed records to be read, repeated reads of same
record must return same value. However, a transaction may not be serializable – it
may find some records inserted by a transaction but not find others.
• Read committed — only committed records can be read, but successive reads of
record may return different (but committed) values.
• Read uncommitted — even uncommitted records may be read.
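Choosing a level for the next transaction, as a sketch in SQL-92 syntax against the same hypothetical ACCOUNT table:

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
    SELECT BALANCE FROM ACCOUNT WHERE ACCTNO = 'A';   -- sees only committed data
    SELECT BALANCE FROM ACCOUNT WHERE ACCTNO = 'A';   -- may see a different committed value
    COMMIT WORK;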
Chapter 10 DBMS Advancements
9. XML and XQuery: XQuery supports complex queries that involve navigating through XML hierarchies.
10. DBMS Security: DBMS security involves identifying vulnerabilities and threats
that could compromise the integrity, confidentiality, and availability of data.
Security controls include authentication mechanisms, authorization policies,
encryption, auditing, and access controls to mitigate these risks.
11. DBMS Digital Rights Management (DRM): DBMS DRM focuses on protecting
digital content stored in databases. It involves mechanisms to control who can
access, view, modify, and distribute digital assets. DRM systems often use
encryption, access controls, watermarks, and usage tracking to enforce content
usage policies.