
Parallel Algorithm

Unit 1
What is Parallel Computing
It is the simultaneous use of multiple computing resources to solve a
computational problem.

To be run using multiple CPUs

Problem is broken into discrete parts that can be solved concurrently

Each part is further broken down to a series of instructions.

Instructions from each part execute simultaneously on different CPUs.

The multiple computing resources can include:

Single computer with multiple processors;

An arbitrary number of computers connected by a network;

A combination of both.

The computational problem usually demonstrates characteristics such as
the ability to be:

Broken apart into discrete pieces of work that can be solved simultaneously;

Executed as multiple program instructions at any moment in time;

Solved in less time with multiple compute resources than with a single
compute resource.

The real world is parallel

Complex, inter-related events happen simultaneously

E.g. galaxies, planetary movements, functioning of the brain, weather, traffic.

Parallel computing is better suited for modeling and simulation of these processes.

Why Parallel Computing?

Saves time and money: many resources working together will reduce the
time and cut potential costs.

Solves larger problems, e.g. web search engines/databases processing millions of transactions per second

Use of non-local resources

Provide concurrency (do multiple things at the same time)

Many problems that are interesting to scientists and engineers don't fit on
a PC - they have huge memory requirements

Processors are not getting faster

Due to heat dissipation and power consumption issues

Analyzing Parallel Algorithms


1. Speedup

Definition: Speedup is the ratio of the execution time of the sequential
algorithm to the execution time of the parallel algorithm using p processors.

Formula: S(p) = T_s / T_p, where T_s is the sequential execution time and T_p is the execution time on p processors.
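For instance (illustrative numbers, not taken from the notes): if T_s = 120 s and the parallel version on p = 4 processors takes T_p = 36 s, then S(4) = 120 / 36 ≈ 3.3, which is below the ideal linear speedup of 4.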

Models of Computation
Parallel programming models exist as an abstraction above hardware and memory
architectures.

1. Shared memory model

multiple processes sharing common memory space

2. Message passing model (or Distributed Memory Model)

the user makes calls to libraries to explicitly share information between processors.

3. Threads model

a single process having multiple (concurrent) execution paths

4. Data parallel model

data partitioning determines parallelism

1. Shared Memory Model (Without Threads)

In this programming model, processes/tasks share a common address space, which they read and write to asynchronously.

Various mechanisms such as locks / semaphores are used to control access
to the shared memory, resolve contentions and to prevent race conditions and
deadlocks.

Data may be cached on the processor that works on it

Compiler translates user variables into "global" memory addresses

Simplest parallel programming model.

Example

The programmer views his program as a collection of processes which use common or shared variables.

The processors may not have a private program or data memory. A common program and data are stored in the main memory. This is accessed by all processors.

Each processor is assigned a different part of the program and the data. The main program creates a separate process for each processor.

The process is allocated to the processor along with the required data. These processes are executed independently on different processors.

After the execution, all the processors have to rejoin the main program.
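The notes give no code for this model, but a minimal sketch is possible with Python's multiprocessing module (the library and names below are assumptions chosen for illustration): several processes increment one counter that lives in shared memory, and a lock prevents race conditions on it.

```python
from multiprocessing import Process, Value, Lock

def worker(counter, lock, increments):
    for _ in range(increments):
        with lock:              # lock/semaphore controls access to shared memory
            counter.value += 1  # read-modify-write on the shared variable

if __name__ == "__main__":
    counter = Value("i", 0)     # an integer stored in shared memory
    lock = Lock()
    procs = [Process(target=worker, args=(counter, lock, 1000)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()                # all processes rejoin the main program
    print(counter.value)        # 4000 with the lock; unpredictable without it
```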

Implementations

Native compilers translate user program variables into global memory addresses

Advantages

Program development becomes simple, as all processes see and have equal access to shared memory.

Data "ownership" is lacking, so there is no need to explicitly specify the communication of data between processes.

Disadvantages

It becomes more difficult to understand and manage data locality: users may not know where their variables are stored.

2. Message Passing Model

The message passing model (or Distributed Memory model) is defined as:

A set of processes/tasks use their own local memory

Processes/tasks communicate by sending and receiving messages

data transfer requires cooperative operations to be performed by each process: send() matched by receive()

Programming with message passing is done by linking with and making calls
to libraries which manage the data exchange between processors.

Implementation:

The MPI (Message Passing Interface) is a message passing library standard.
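The notes name MPI as the standard; the sketch below uses mpi4py (a Python binding for MPI, assumed to be installed together with an MPI runtime) to show one matched send/receive pair. The file name in the run command is only illustrative.

```python
# Minimal message-passing sketch with mpi4py (assumed available).
# Run with, e.g.:  mpiexec -n 2 python send_recv.py
from mpi4py import MPI

comm = MPI.COMM_WORLD              # communicator containing all processes
rank = comm.Get_rank()             # this process's id within the communicator

if rank == 0:
    data = {"payload": [1, 2, 3]}
    comm.send(data, dest=1, tag=0)     # every send() ...
elif rank == 1:
    data = comm.recv(source=0, tag=0)  # ... must be matched by a receive()
    print("rank 1 received:", data)
```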


Advantages

The data can be stored anywhere

Communication between processes is simple

Many Message Passing Interface (MPI) implementations are available

Disadvantage

Programmers should take care that there is a receive function for every send function.

If any of the processes quits, the others may also stop working, as they are dependent on each other.

3. Threads Model

This programming model is a type of shared memory programming.

In the threads model of parallel programming, a single process can have multiple concurrent execution paths (called threads)

Each thread has local data, but they also share common data

The threads communicate through global memory. This requires synchronization constructs to ensure that more than one thread is not updating the same global address at any time.

Example: A subroutine within the main program (a.out). Any thread can
execute any subroutine at the same time as other threads.

Different threads can be executed on the same processor or on different processors.

If the threads are executed on the same processor, then the processor switches between the threads in a random fashion.

If the threads are executed on different processors, they are executed simultaneously.

Implementations:

POSIX Threads

OpenMP
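The notes reference POSIX Threads and OpenMP; purely as a hedged sketch, the snippet below uses Python's threading module instead. Four threads sum disjoint slices of shared data, and a lock guards the shared total.

```python
import threading

total = 0                      # data shared by all threads (global memory)
lock = threading.Lock()

def add_partial(values):
    global total
    s = sum(values)            # thread-local work on this thread's slice
    with lock:                 # synchronization: only one updater at a time
        total += s

data = list(range(1, 101))
threads = [threading.Thread(target=add_partial, args=(data[i::4],)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)                   # 5050
```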

Advantages

The programmer need not worry about the details of parallel processing.

Disadvantages

Care must be taken that no two threads update the shared resources
simultaneously.

4. Data Parallel Model

Focuses on parallel operations on a set (array) of data items, i.e. a data set.

The data set is typically organized into a common structure, such as an array
or cube

A set of tasks work collectively on the same data structure, however, each
task works on a different partition of the same data structure.

Tasks perform the same operation on their partition of work, for example, "add
6 to every array element".

Example
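The example here appears to have been a figure in the original notes; as a stand-in, the following is a minimal sketch (assuming Python's multiprocessing.Pool, which the notes do not prescribe) of the "add 6 to every array element" idea: each task applies the same operation to its own partition of one array.

```python
from multiprocessing import Pool

def add_six(partition):
    return [x + 6 for x in partition]      # same operation on every partition

if __name__ == "__main__":
    data = list(range(12))                          # the common data structure
    partitions = [data[0:4], data[4:8], data[8:12]] # split into partitions
    with Pool(processes=3) as pool:
        results = pool.map(add_six, partitions)     # tasks work collectively
    print([x for part in results for x in part])    # [6, 7, ..., 17]
```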

Discuss the challenges and techniques involved in achieving synchronization
and communication among parallel processes in distributed-memory parallel
algorithms. Compare and contrast message passing and shared-memory
paradigms for handling synchronization and communication in parallel
computing environments.

1. Challenges in Achieving Synchronization and Communication

Distributed-Memory Parallel Algorithms:

Data Sharing: In distributed-memory systems, each processor has its own local memory, making data sharing a challenge. Processes must explicitly communicate to share data.

Latency: Communication between nodes can introduce latency, making it crucial to minimize communication overhead.

Synchronization Issues: Coordinating processes in distributed systems can be more complex due to the lack of a global clock and the possibility of message delays.

Fault Tolerance: Handling failures in one node requires robust mechanisms to maintain overall system coherence and continuity.

2. Techniques for Synchronization and Communication


Techniques Used:

Message Passing: A common technique where processes communicate by sending and receiving messages. This can be implemented using protocols like MPI (Message Passing Interface).

Pros: Scalable and suitable for large systems.

Cons: Higher overhead due to explicit message management.

Barrier Synchronization: A method to ensure that all processes reach a certain point before any can proceed, helping to maintain consistency.

Collective Communication: Operations such as broadcast, scatter, and gather
that allow multiple processes to communicate efficiently.
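A minimal sketch of barrier synchronization and the collective operations above, again assuming mpi4py is available; the data values are made up for the illustration.

```python
# Scatter / barrier / gather sketch with mpi4py (assumed available).
# Run with, e.g.:  mpiexec -n 4 python collectives.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

chunks = [[i, i * 10] for i in range(size)] if rank == 0 else None
local = comm.scatter(chunks, root=0)   # scatter: each process gets one chunk
local = [x + 1 for x in local]         # every process works on its own chunk
comm.Barrier()                         # barrier: all processes reach this point
result = comm.gather(local, root=0)    # gather: collect the chunks at the root
if rank == 0:
    print(result)
```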

3. Compare and Contrast Message Passing vs. Shared-Memory Paradigms

Memory Model
Message Passing: Each process has its own memory; communication is through messages.
Shared-Memory: All processes share a common memory space.

Synchronization
Message Passing: More explicit; requires mechanisms (like barriers or locks).
Shared-Memory: Can use implicit synchronization (e.g., locks).

Performance
Message Passing: May incur overhead due to message management and latency.
Shared-Memory: Often faster for local communications but can suffer from contention.

Scalability
Message Passing: Highly scalable for large systems (e.g., supercomputers).
Shared-Memory: Limited scalability; contention issues grow with the number of processes.

Ease of Use
Message Passing: May require more complex programming models.
Shared-Memory: Typically easier to program due to shared state.

UNIT 3
Query Processing in DBMS
A query processor is one of the major components of a relational database or
an electronic
database in which data is stored in tables of rows and columns.

It complements the storage engine, which writes and reads data to and from
storage media

Parsing and Translation

Query processing includes certain activities for data retrieval.

Initially, the given user queries are expressed in a high-level database language such as SQL.

They get translated into expressions that can be further used at the physical level of the file system.

After this, the actual evaluation of the queries, along with a variety of query-optimizing transformations, takes place.

Query Evaluation Plan

In order to fully evaluate a query, the system needs to construct a query evaluation plan.

The annotations in the evaluation plan may refer to the algorithms to be used
for the particular index or the specific operations.

Such relational algebra with annotations is referred to as Evaluation Primitives.


The evaluation primitives carry the instructions needed for the evaluation of
the operation

Thus, a query evaluation plan defines a sequence of primitive operations used for evaluating a query. The query evaluation plan is also referred to as the query execution plan.

A query execution engine is responsible for generating the output of the given
query.

It takes the query execution plan, executes it, and finally makes the output for
the user query.

Optimization

The cost of the query evaluation can vary for different types of queries.
Although the system is responsible for constructing the evaluation plan, the
user does not need to write their query efficiently.

Usually, a database system generates an efficient query evaluation plan, which minimizes its cost. This type of task is performed by the database system and is known as Query Optimization.

For optimizing a query, the query optimizer should have an estimated cost
analysis of each operation. This is because the overall operation cost depends on
the memory allocations to several operations, execution costs, and so on.

Parallelism in Query in DBMS


Parallelism in a query allows the parallel execution of multiple queries by decomposing them into parts that work in parallel.

This can be achieved by a shared-nothing architecture.

It also speeds up query execution, as more and more resources like processors and disks are provided.

1. I/O parallelism :

It is a form of parallelism in which the relations are partitioned on multiple disks, with the motive of reducing the retrieval time of relations from the disk.

Within it, the input data is partitioned and then processing is done in parallel on each partition.

Results are merged after processing all the partitioned data.

It is also known as data-partitioning

We have four types of partitioning in I/O parallelism (a small sketch of the first three schemes follows this list):

a. Hash partitioning –

Each row of the original relation is hashed on the partitioning attributes.

Example – let's assume that there are 4 disks disk1, disk2, disk3, and disk4 across which the data is to be partitioned. Now if the hash function returns 3, then the row is placed on disk3.

b. Range partitioning –

It issues continuous attribute value ranges to each disk.

Example – with 3 disks numbered 0, 1, and 2, range partitioning may assign tuples with a value less than 5 to disk0, values between 5 and 40 to disk1, and values greater than 40 to disk2.

c. Round-robin partitioning –

In round-robin partitioning, the relation is read in any order.

The ith tuple is sent to disk number (i % n), so the disks take turns receiving new rows of data.

This technique ensures the even distribution of tuples across disks and is ideally suited for applications that wish to read the entire relation sequentially for each query.

d. Schema partitioning –

In schema partitioning, different tables within a database are placed on different disks.
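A minimal sketch (an assumed layout, not taken from the notes) of the three value-based schemes above, each mapping a tuple to one of n disks:

```python
def hash_partition(key, n_disks):
    return hash(key) % n_disks               # hash partitioning

def range_partition(value, boundaries):
    # range partitioning, e.g. boundaries = [5, 40] gives disks 0, 1, 2
    for disk, upper in enumerate(boundaries):
        if value < upper:
            return disk
    return len(boundaries)

def round_robin_partition(i, n_disks):
    return i % n_disks                        # i-th tuple goes to disk (i mod n)

print(range_partition(3, [5, 40]))    # 0  (value < 5)
print(range_partition(17, [5, 40]))   # 1  (5 <= value < 40)
print(round_robin_partition(7, 4))    # 3
```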

2. Intra-query parallelism :

Refers to the execution of a single query in a parallel process on different CPUs, using a shared-nothing parallel architecture technique.

This uses two types of approaches:

a. First approach – In this approach, each CPU can execute the duplicate task
against some data portion.

b. Second approach – In this approach, the task can be divided into different sectors, with each CPU executing a distinct subtask.

3. Inter-query parallelism :

There is an execution of multiple transactions by each CPU.

It is called parallel transaction processing.

DBMS uses transaction dispatching to carry out inter-query parallelism.

We can also use some different methods, like efficient lock management.

4. Intra-operation parallelism :

It is a sort of parallelism in which we parallelize the execution of each individual operation of a task, like sorting, joins, projections, and so on.

The level of parallelism is very high in intra-operation parallelism.

This type of parallelism is natural in database systems

5. Inter-operation parallelism :
When different operations in a query expression are executed in parallel, then it is
called inter-operation parallelism. They are of two types –

Pipelined parallelism – In pipelined parallelism, the output rows of one operation are consumed by a second operation even before the first operation has produced its entire set of output rows. It is also possible to run these two operations simultaneously on different CPUs, so that one operation consumes tuples in parallel with the other producing them. It is useful for a small number of CPUs and avoids writing intermediate results to disk. (A small generator-based sketch follows this list.)

Independent parallelism – In this parallelism, the operations in query expressions that are not dependent on each other can be executed in parallel. This parallelism is very useful in the case of the lower degree of parallelism.
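The generator-based sketch referenced above: a hedged, single-process illustration of the pipelining idea, where the second operation starts consuming rows before the first has produced its whole output and no intermediate result is written to disk. In a real DBMS the two stages would run on different CPUs.

```python
def scan(rows):
    for r in rows:
        yield r * 2            # first operation emits rows one at a time

def keep_large(rows, threshold):
    for r in rows:
        if r > threshold:      # second operation consumes rows as they arrive
            yield r

pipeline = keep_large(scan(range(10)), threshold=10)
print(list(pipeline))          # [12, 14, 16, 18]
```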

Parallel Discrete Event Simulation (PDES)


It is a simulation technique where multiple discrete events are simulated in
parallel to accelerate the computation, especially for large, complex systems.

System is broken down into multiple parts (or logical processes), each of
which is responsible for handling events in parallel.

Events can occur at different times and affect various parts of the system.

Goal of PDES is to exploit parallelism to speed up the simulation process.

Key Concepts in PDES:

1. Discrete Events: These are events that happen at a specific point in time, such
as the arrival of a message, a machine breaking down, or a train reaching a
station.

2. Event List: This is a queue that contains all the scheduled events, sorted by
time.

3. Logical Processes (LPs): These are the independent entities responsible for
processing events. Each LP may simulate different parts of the system, such
as different subsystems or components.

4. Synchronization: One of the critical challenges of PDES is synchronizing
events across the different logical processes to ensure a consistent
simulation.

5. Time Management: In PDES, time must be managed in a way that ensures that
events occur in the correct order. This can be challenging when different LPs
may process events at different times, which could cause inconsistencies.

Steps in Parallel Discrete Event Simulation:

1. Event Scheduling: Each LP schedules events that it will process in the future.
These events are stored in the event list (or priority queue).

2. Event Processing: LPs process the events according to their timestamps, and
then generate new events as required.

3. Synchronization of Events: After processing an event, the simulation checks if there are any dependencies between LPs that must be synchronized. If necessary, events from other LPs are brought into the simulation, ensuring consistency.
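A minimal, single-machine sketch of the building blocks named above (a time-ordered event list, logical processes, timestamp-ordered processing). The LP names and timestamps are invented for the illustration; a real PDES would run the LPs on separate processors.

```python
import heapq

event_list = []                                    # the time-ordered event list
heapq.heappush(event_list, (2.0, "M1", "finish"))  # (timestamp, LP, event type)
heapq.heappush(event_list, (1.0, "M2", "finish"))
heapq.heappush(event_list, (3.0, "M3", "finish"))

while event_list:
    time, lp, kind = heapq.heappop(event_list)     # always the earliest event
    print(f"t={time}: LP {lp} processes '{kind}' event")
    # an LP may schedule new events here, e.g.:
    # heapq.heappush(event_list, (time + service_time, lp, "finish"))
```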

Example of Parallel Discrete Event Simulation:


Let's take the example of simulating a simple manufacturing system where
multiple machines are involved, each processing different products. We want to
simulate this system in parallel, where each machine can process products
independently, but we need to synchronize certain events like when a product
finishes processing.
Consider the following:

Machines (LPs): Machine 1 (M1), Machine 2 (M2), and Machine 3 (M3).

Events: Each machine processes a product, and when a machine finishes, it generates an event that a product is ready for the next stage.

Synchronization: When the last product is processed, we need to know that all machines have finished processing before ending the simulation.

Steps to Simulate:

1. Initialization: We create the event list, which contains the start time of each
machine's first event.

2. Event Scheduling: Each machine schedules an event for when it will finish
processing a product.

3. Parallel Processing: Machines process their events in parallel. If the machines are synchronized, they must check each other's event list to ensure the events are processed in the correct order.

4. Synchronization Point: At the end of the simulation, we synchronize all the machines to check if they have completed their tasks.

Image Dithering

It is a technique used to create the illusion of color depth in images with a limited color palette.

It is especially useful in reducing the number of colors used in an image while trying to maintain the visual quality and texture of the original image.

What is Dithering?

When an image is displayed on a device with a limited color palette (such as old computer monitors, printers, or systems with limited color depth), dithering helps to simulate the appearance of a broader range of colors by strategically placing pixels of the available colors next to each other.

This creates the perception of intermediate colors or shades, which would otherwise be unavailable due to the color limitations.

The most common dithering methods are:

Ordered Dithering

Floyd-Steinberg Dithering

Atkinson Dithering

How Does Dithering Work?

Imagine you have an image with many colors, but the display device can only
show a limited number of colors.

Dithering simulates the missing colors by placing pixels of available colors in a grid pattern that tricks the human eye into perceiving colors that aren't directly present.

Example - If you have a pixel that should be light gray, but your device can
only show either black or white, dithering might place black and white pixels
near each other to simulate gray.

Common Dithering Techniques:

1. Ordered Dithering:

Ordered dithering uses a fixed matrix to decide where to place the dithered colors.
It is a simple and fast method but may not produce the best results in terms of
visual quality.
Steps to Apply Ordered Dithering:

1. Convert the image to grayscale: If the image is in color, we first convert it to grayscale because dithering is typically applied to grayscale images.

2. Normalize the pixel values: We scale the pixel values to a range of 0 to 15,
assuming we are using a 4-level grayscale. For a more complex image, we
could scale the pixel values accordingly to fit the threshold matrix size.

3. Apply the Bayer matrix: The Bayer matrix is tiled over the image. Each pixel in
the image is compared with the corresponding value in the matrix. If the pixel
value is greater than the matrix value at that position, the pixel is set to light
(white); if it's smaller, it is set to dark (black).

4. Tile the matrix - The Bayer matrix is tiled across the image, so you start again
with the first value of the matrix once you reach the end.

Example: You might use a matrix like the Bayer matrix, which breaks the image
into smaller blocks and applies dithering based on the intensity of each pixel.
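A hedged sketch of the steps above, assuming a 4x4 Bayer threshold matrix and pixel values already normalized to the 0–15 range; the image data is made up for the example.

```python
BAYER_4X4 = [
    [ 0,  8,  2, 10],
    [12,  4, 14,  6],
    [ 3, 11,  1,  9],
    [15,  7, 13,  5],
]

def ordered_dither(image):
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            threshold = BAYER_4X4[y % 4][x % 4]              # tile the matrix
            row.append(1 if image[y][x] > threshold else 0)  # 1 = light, 0 = dark
        out.append(row)
    return out

image = [[8] * 8 for _ in range(8)]    # a flat mid-gray block (value 8 of 15)
for row in ordered_dither(image):
    print("".join("#" if p else "." for p in row))
```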

2. Floyd-Steinberg Dithering:

This is one of the most widely used dithering techniques. It is an error diffusion
method, where the error (difference between the actual color and the closest
available color) is distributed to neighboring pixels. This gives a more natural
appearance and works well for photographs or images with continuous tones.
Steps:

Start with the top-left pixel and calculate the color difference (error) between
the pixel’s color and the closest available color.

Distribute this error to the neighboring pixels (to the right, bottom-left, bottom, and bottom-right pixels).

Continue the process across the entire image.
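A hedged sketch of the error-diffusion steps above, assuming 0–255 grayscale input quantized to black (0) or white (255) and the standard Floyd-Steinberg weights (7/16 right, 3/16 lower-left, 5/16 below, 1/16 lower-right); the sample pixel values are invented.

```python
def floyd_steinberg(image):
    h, w = len(image), len(image[0])
    img = [list(map(float, row)) for row in image]   # work on a copy
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            new = 255.0 if old >= 128 else 0.0       # closest available color
            img[y][x] = new
            err = old - new                          # quantization error
            if x + 1 < w:
                img[y][x + 1] += err * 7 / 16        # right
            if y + 1 < h and x > 0:
                img[y + 1][x - 1] += err * 3 / 16    # bottom-left
            if y + 1 < h:
                img[y + 1][x] += err * 5 / 16        # bottom
            if y + 1 < h and x + 1 < w:
                img[y + 1][x + 1] += err * 1 / 16    # bottom-right
    return [[int(v) for v in row] for row in img]

print(floyd_steinberg([[100, 150], [200, 50]]))
```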

3. Atkinson Dithering:

Atkinson Dithering is a simplified version of error diffusion that uses a smaller, simpler matrix. It is often faster than Floyd-Steinberg but produces a different visual effect. It diffuses the error to fewer neighboring pixels, which results in a less detailed but still effective dither pattern.

Applications of Image Dithering:

Display on Low-Color Devices: Dithering is particularly useful when displaying images on devices that have a limited color palette (e.g., old monitors, mobile screens).

Image Compression: Dithering can be used to compress images by reducing the number of colors, while still preserving visual detail.

Printed Graphics: When printing images with limited ink colors, dithering can
help simulate continuous tones and shades.
