Multicore-Architecture-And-Programming Notes
SEM:08
YEAR:04
BATCH 2015-2019
Vision of Institution
To build Jeppiaar Engineering College as an Institution of Academic Excellence in
Technical education and Management education and to become a World Class
University.
Mission of Institution
M3 To equip students with values, ethics and life skills needed to enrich their lives and enable them to meaningfully contribute to the progress of society.
M4 To prepare students for higher studies and lifelong learning, enrich them with the practical and entrepreneurial skills necessary to excel as future professionals and contribute to the Nation’s economy.
PO8 Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
Vision of Department
To emerge as a globally prominent department, developing ethical computer
professionals, innovators and entrepreneurs with academic excellence through quality
education and research.
Mission of Department
M1 To create computer professionals with an ability to identify and formulate engineering problems and to provide innovative solutions through an effective teaching-learning process.
M3 To produce engineers with good professional skills, ethical values and life
skills for the betterment of the society.
PEO1 To address the real time complex engineering problems using innovative approach
with strong core computing skills.
PEO3 Apply ethical knowledge for professional excellence and leadership for the
betterment of the society.
PEO4 Develop life-long learning skills needed for better employment and
entrepreneurship
PSO1 An ability to understand the core concepts of computer science and engineering and to enrich problem solving skills to analyze, design and implement software and hardware based systems of varying complexity.
PSO2 To interpret real-time problems with analytical skills and to arrive at cost-effective and optimal solutions using advanced tools and techniques.
BTL1: Remembering
BTL2: Understanding
BTL3: Applying
BTL4: Analyzing
BTL5: Evaluating
BTL6: Creating
OBJECTIVES:
TEXT BOOKS:
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-Kauffman/Elsevier,
2011.
2. Darryl Gove, “Multicore Application Programming for Windows, Linux, and Oracle Solaris”,
Pearson, 2011 (unit 2)
REFERENCES:
1. Michael J Quinn, “Parallel programming in C with MPI and OpenMP”, Tata McGraw Hill
COURSE OUTCOME
C409.1 Illustrate the challenges in parallel and multithreaded programming.
C409.2 Explain the various parallel programming paradigms and solutions.
C409.5 Compare and contrast programming for serial processors and parallel processors.
UNIT –I
MULTI-CORE PROCESSORS
Single core to Multi-core architectures – SIMD and MIMD systems – Interconnection networks -
Symmetric and Distributed Shared Memory Architectures – Cache coherence - Performance
Issues –Parallel program design.
PART A
Q. No. Questions CO Bloom’s Level
Differentiate between single-core and multi-core processors.
Gain: A single-core processor speeds up every program or software being executed; a multi-core processor speeds up the programs designed for multi-core processors.
Performance: A single-core processor's performance depends on the clock frequency of the core; a multi-core processor's performance depends on the frequency, the number of cores and the program being executed.
Examples: Single-core processors launched before 2005 include the 80386, 486, AMD 29000, AMD K6 and Pentium I, II, III; multi-core processors launched after 2005 include the Core 2 Duo, Athlon 64 X2, i3, i5 and i7.
19 What is the difference between message passing and DSM? C409.1 BTL1
Compared with message passing, distributed shared memory (DSM) has the following advantages:
The system is scalable.
DSM hides the message passing and shields the programmer from send/receive primitives.
It can handle complex and large databases without replication or sending the data to processes.
DSM is usually cheaper than using a multiprocessor system.
There is no memory-access bottleneck, as there is no single bus.
DSM provides a large virtual memory space.
DSM programs are portable, as they use a common DSM programming interface.
DSM can (possibly) improve performance by speeding up data access.
29 What are multiprocessor systems and give their advantages? (Nov/Dec 2017) C409.1 BTL1
A multiprocessor system is a computer system with two or more processors that share memory and peripheral devices under a single operating system. Advantages:
Increased throughput: more work is completed in less time.
Economy of scale: the processors share memory, storage and peripherals.
Increased reliability: the failure of one processor degrades, but need not halt, the system.
30 Define symmetric shared memory. (Apr/May 2018, Nov/Dec 2018) C409.1 BTL1
A symmetric shared-memory architecture consists of several processors with a single physical memory shared by all processors through a shared bus.
33 Define the latency and bandwidth of an interconnect. C409.1 BTL1
The latency is the time that elapses between the source's beginning to transmit the data and the destination's starting to receive the first byte.
The bandwidth is the rate at which the destination receives data after it has started to receive the first byte. So if the latency of an interconnect is l seconds and the bandwidth is b bytes per second, then the time it takes to transmit a message of n bytes is
message transmission time = l + n/b
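As a quick illustration, a minimal C helper (names hypothetical) that evaluates this formula:
#include <stdio.h>

/* Time to send n bytes over a link with latency l (seconds)
   and bandwidth b (bytes per second): t = l + n/b. */
double message_time(double l, double b, double n)
{
    return l + n / b;
}

int main(void)
{
    /* Example: 2 microseconds latency, 1 GB/s bandwidth, 1 MB message. */
    printf("%g s\n", message_time(2e-6, 1e9, 1e6));
    return 0;
}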
34 List out the approaches to cache coherence. C409.1 BTL1
Snooping cache coherence
Directory-based cache coherence
35 List out the steps in parallel program design (Foster's methodology). C409.1 BTL1
1. Partitioning
2. Communication
3. Agglomeration (aggregation)
4. Mapping
40 Define linear speedup. C409.1 BTL1
If the serial run-time is Tserial and our parallel run-time is Tparallel, then the best we can hope for is Tparallel = Tserial/p. When this happens, we say the parallel program has linear speedup. If we define the speedup of a parallel program to be S = Tserial/Tparallel, linear speedup means S = p, which is unusual. Furthermore, as p increases, we expect S to become a smaller and smaller fraction of the ideal linear speedup p.
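A small sketch (the timings are made-up placeholders) that computes the speedup S = Tserial/Tparallel and the efficiency E = S/p from measured run-times:
#include <stdio.h>

int main(void)
{
    double t_serial   = 12.0; /* measured serial run-time in seconds (example) */
    double t_parallel = 2.0;  /* measured parallel run-time on p cores (example) */
    int    p          = 8;

    double speedup    = t_serial / t_parallel; /* S = Tserial/Tparallel */
    double efficiency = speedup / p;           /* E = S/p */

    printf("S = %.2f (linear speedup would be %d), E = %.2f\n",
           speedup, p, efficiency);
    return 0;
}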
41 How are tasks and communications identified when parallelizing the histogram program? C409.1 BTL1
For the first step we might identify two types of tasks: finding the bin to which an element of data belongs, and incrementing the appropriate entry in bin_counts.
For the second step, there must be communication between the computation of the appropriate bin and the incrementing of an element of bin_counts.
42 List out the different distributed-memory interconnects. C409.1 BTL1
Distributed-memory interconnects are often divided into two groups: direct interconnects and indirect interconnects.
PART B
Q. No. Questions CO Bloom’s Level
4 Explain with a neat diagram symmetric shared memory architectures. C409.1 BTL5
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-
Kauffman/Elsevier, 2011 Page No:35-42
5 Explain with a neat diagram distributed shared memory architectures. (Nov/Dec 2018) C409.1 BTL5
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-
Kauffman/Elsevier, 2011 Page No:43-45
6 Explain cache coherence in symmetric shared and distributed shared memory architectures. C409.1 BTL5
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-
Kauffman/Elsevier, 2011 Page No: 50-55
7 Explain the performance issues of multicore processors. (Nov/Dec 2017) C409.1 BTL5
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-
Kauffman/Elsevier, 2011 Page No:58-64
8 Define the cache coherence problem. What are the two main approaches to cache coherence? Describe the working of snooping cache coherence and directory-based cache coherence. (Nov/Dec 2017) C409.1 BTL5
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-
Kauffman/Elsevier, 2011 Page No:43-45
10 Explain the steps involved in building a parallel program. Give an example. (Apr/May 2018)
UNIT II
PARALLEL PROGRAM CHALLENGES
PART A
Q. No. Questions CO Bloom’s Level
Protecting a shared counter with a mutex lock (Pthreads):
#include <pthread.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

void Increment(void)
{
    pthread_mutex_lock(&mutex);   /* acquire( &mutex ); */
    counter++;
    pthread_mutex_unlock(&mutex); /* release( &mutex ); */
}

void Decrement(void)
{
    pthread_mutex_lock(&mutex);
    counter--;
    pthread_mutex_unlock(&mutex);
}
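A possible driver for the routines above (assuming Increment(), Decrement() and counter are defined in the same file; the iteration counts are illustrative). Two threads perform equal numbers of increments and decrements, so the final count is 0:
#include <stdio.h>

void *inc_worker(void *arg) { for (int i = 0; i < 100000; i++) Increment(); return NULL; }
void *dec_worker(void *arg) { for (int i = 0; i < 100000; i++) Decrement(); return NULL; }

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, inc_worker, NULL);
    pthread_create(&t2, NULL, dec_worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", counter); /* 0: the mutex made every update atomic */
    return 0;
}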
7 What is meant by a contended mutex? C409.2 BTL1
If multiple threads attempt to acquire the same mutex at the same time, only one thread will succeed, and the other threads will have to wait. This situation is known as a contended mutex.
8 What is a critical region? C409.2 BTL1
The region of code between the acquisition and release of a mutex lock is called a critical section, or critical region. Code in this region will be executed by only one thread at a time.
Develop the code for using a barrier to order computation. C409.2
Compute_values_held_in_matrix();
Barrier();
total = Calculate_value_from_matrix();
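A runnable sketch of the same idea using a POSIX barrier (not available on every platform; the compute calls above are stand-ins, replaced here by comments):
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
static pthread_barrier_t barrier;

static void *worker(void *arg)
{
    /* Phase 1: each thread computes its share of the matrix values. */
    pthread_barrier_wait(&barrier); /* no thread proceeds until all have finished */
    /* Phase 2: now it is safe to calculate the total from the matrix. */
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (int i = 0; i < NTHREADS; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++) pthread_join(t[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}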
21 What is data sharing? (Apr/May 2017) C409.2 BTL1
Sharing data between multiple threads is called data sharing.
22 What is the difference between deadlock and livelock? (Apr/May 2017)(Nov/Dec 2018) C409.2 BTL1
A deadlock occurs when two or more threads cannot make progress because the resources that they need are held by the other threads. A livelock is similar, but the threads are not blocked: they keep changing state in response to one another without making any useful progress.
Example: Suppose two threads need to acquire mutex locks A and B to complete some task. If thread 1 has already acquired lock A and thread 2 has already acquired lock B, then thread 1 cannot make forward progress because it is waiting for lock B, and thread 2 cannot make progress because it is waiting for lock A. The two threads are deadlocked.
23 What are the conditions under which a deadlock situation may arise? (Nov/Dec 2017) C409.2 BTL1
Mutual exclusion: At least one resource is held in a non-sharable mode, that is, only one process at a time can use the resource. If another process requests that resource, the requesting process must be delayed until the resource has been released.
Hold and wait: There must exist a process that is holding at least one resource and is waiting to acquire additional resources that are currently being held by other processes.
No preemption: Resources cannot be preempted; that is, a resource can only be released voluntarily by the process holding it, after the process has completed its task.
Circular wait: There must exist a set {P0, P1, ..., Pn} of waiting processes such that P0 is waiting for a resource held by P1, P1 is waiting for a resource held by P2, ..., Pn-1 is waiting for a resource held by Pn, and Pn is waiting for a resource held by P0. (A sketch of lock ordering that breaks this condition follows.)
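A minimal sketch (lock names hypothetical) of how acquiring locks in one fixed global order breaks the circular-wait condition:
#include <pthread.h>

static pthread_mutex_t lockA = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lockB = PTHREAD_MUTEX_INITIALIZER;

/* Both threads take the locks in the same order (A before B), so the
   circular wait described above can never form. */
void thread1_work(void)
{
    pthread_mutex_lock(&lockA);
    pthread_mutex_lock(&lockB);
    /* ... use both resources ... */
    pthread_mutex_unlock(&lockB);
    pthread_mutex_unlock(&lockA);
}

void thread2_work(void)
{
    pthread_mutex_lock(&lockA); /* not B first: same global order as thread 1 */
    pthread_mutex_lock(&lockB);
    /* ... use both resources ... */
    pthread_mutex_unlock(&lockB);
    pthread_mutex_unlock(&lockA);
}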
43 List out the three critical areas that determine how well an application scales. C409.2 BTL1
The amount of bandwidth to cache and to memory, which will be divided among the active threads on the system.
The design of the caches, which will determine how much time is lost because of capacity- and conflict-induced cache misses.
The way that the processor core pipelines are shared between active software threads, which will determine how instruction issue rates change as the number of active threads increases.
44 What is the role of the default malloc()? C409.2 BTL1
The default malloc() typically provides better single-threaded performance than alternative implementations that scale better with multiple threads. An algorithm that provides improved scaling also adds a cost to the single-threaded situation; it can be hard to produce an algorithm that is fast for the single-threaded case and also scales well with multiple threads.
45 Define an idea to choose the appropriate data structures. C409.2 BTL1
Choosing the best structure to hold data, such as choosing an algorithm of the appropriate complexity, can have a major impact on overall performance. Some structures will be efficient when data is accessed in one pattern, while other structures will be more efficient if the access pattern is changed.
46 Define column-major order. C409.2 BTL1
In column-major order, the opposite ordering to C is followed: adjacent elements of the first index are adjacent in memory. Fortran stores arrays this way, whereas in C adjacent elements in a row are adjacent in memory (row-major order). Accessing elements with a large stride is therefore a common error in codes translated from Fortran into C.
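A short C sketch of the point: with C's row-major layout, putting the rightmost index in the inner loop walks memory sequentially, while swapping the loops strides N elements per access (the typical Fortran-to-C translation mistake):
#include <stdio.h>

#define N 1000
static double a[N][N];

int main(void)
{
    double sum = 0.0;

    /* Row-major friendly: a[i][j] and a[i][j+1] are adjacent in memory. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    printf("%f\n", sum);
    return 0;
}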
47 How to select the appropriate array access pattern? C409.2 BTL1
One common data access pattern is striding through the elements of an array. The performance of the application will be better if the array can be arranged so that the selected elements are contiguous.
48 List out the techniques to reduce latency. C409.2 BTL1
Out-of-order execution
Hardware prefetching
Software prefetching
49 List out the non-technical reasons why functionality gets placed in libraries. C409.2 BTL1
Libraries often represent a convenient product for an organizational unit. One group of developers might be responsible for a particular library of code, but that does not automatically imply that a single library represents the best way for that code to be delivered to the end users.
Libraries are also used to group related functionality. For example, an application might contain a library of string-handling functions. Such a library might be appropriate if it contains a large body of code. On the other hand, if it contains only a few small routines, it might be more appropriate to combine it with another library.
PART B
Q. No. Questions CO Bloom’s Level
3.
detail.
6. Explain communication between threads using message queues and pipes. C409.2 BTL5
2. Darryl Gove, “Multicore Application Programming for Windows, Linux,
and Oracle Solaris”, Pearson, 2011 Pg.No:138-139
7. Explain data races and scalability in parallel programs. (Apr/May 2017) C409.2 BTL5
2. Darryl Gove, “Multicore Application Programming for Windows, Linux,
and Oracle Solaris”, Pearson, 2011 Pg.No:121-126
8. What is a data race? What are the tools used for detecting data races? (Apr/May 2017) C409.2 BTL5
Write short notes on deadlocks, livelocks and named pipes. C409.2 BTL5
12 2. Darryl Gove, “Multicore Application Programming for Windows, Linux,
and Oracle Solaris”, Pearson, 2011
Outline how the choice of data structure is reflected in performance. C409.2 BTL4
14 2. Darryl Gove, “Multicore Application Programming for Windows, Linux,
and Oracle Solaris”, Pearson, 2011
Write in detail about, and summarize, the hardware constraints that apply when improving scaling. C409.2 BTL5
15
2. Darryl Gove, “Multicore Application Programming for Windows, Linux,
and Oracle Solaris”, Pearson, 2011
UNIT III
SHARED MEMORY PROGRAMMING WITH OpenMP
PART A
Q. No. Questions CO Bloom’s Level
9. Develop a “hello world” program that uses OpenMP. (Apr/May 2017) C409.3 BTL1
Hello world program in C using OpenMP:
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    printf("Hello world from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
    return 0;
}
10 Define message queue. (Nov/Dec 2017) C409.3 BTL1
Message queues provide an asynchronous communication protocol, meaning that the sender and receiver of a message do not need to interact with the message queue at the same time. Messages placed onto the queue are stored until the recipient retrieves them.
What is an initial task region? C409.3 BTL1
11. An initial thread executes sequentially, as if enclosed in an implicit
task region, called an initial task region, that is defined by the implicit
parallel region surrounding the whole program.
12 Discuss the structure of the OpenMP memory model. C409.3 BTL1
The OpenMP API provides a relaxed-consistency, shared-memory model. All OpenMP threads have access to a place to store and to retrieve variables, called the memory. In addition, each thread is allowed to have its own temporary view of the memory.
13 What is threadprivate memory? C409.3 BTL1
Each thread also has access to another type of memory that must not be accessed by other threads, called threadprivate memory.
14 Show the format of a directive in OpenMP. C409.3 BTL2
#pragma omp directive-name [clause[ [,] clause]...] new-line
Each directive starts with #pragma omp. The remainder of the directive follows the conventions of the C and C++ standards for compiler directives. In particular, white space can be used before and after the #, and sometimes white space must be used to separate the words in a directive.
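For example, a directive with the name parallel for followed by two clauses (the loop body is arbitrary):
#include <stdio.h>
#include <omp.h>

int main(void)
{
    int n = 8;

    #pragma omp parallel for num_threads(4) schedule(static)
    for (int i = 0; i < n; i++)
        printf("iteration %d on thread %d\n", i, omp_get_thread_num());

    return 0;
}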
15 What is meant by stand-alone directives? C409.3 BTL1
Stand-alone directives do not have any associated executable user code. Instead, they represent executable statements that typically do not have succinct equivalent statements in the base languages. There are some restrictions on the placement of a stand-alone directive within a program: a stand-alone directive may be placed only at a point where a base-language executable statement is allowed.
A loop region binds to the innermost enclosing parallel region. Only the threads of the team executing the binding parallel region participate in the execution of the loop iterations and in the implied barrier of the loop region, if that barrier is not eliminated by a nowait clause.
Clauses accepted by the single construct include:
copyprivate(list)
nowait
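A small sketch of the single construct with these clauses: copyprivate broadcasts the value produced by the one executing thread to every other thread's private copy.
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        int x; /* private to each thread inside the parallel region */

        #pragma omp single copyprivate(x)
        x = 42; /* one thread sets x; copyprivate broadcasts it to the rest */

        printf("thread %d sees x = %d\n", omp_get_thread_num(), x);
    }
    return 0;
}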
29 Show the syntax of the workshare construct. C409.3 BTL1
!$omp workshare
structured-block
!$omp end workshare [nowait]
The enclosed structured block must consist of only the following:
• array assignments
• scalar assignments
• FORALL statements
• FORALL constructs
• WHERE statements
• WHERE constructs
• atomic constructs
30 List the restrictions that apply to the workshare construct. C409.3 BTL4
• All array assignments, scalar assignments, and masked array assignments must be intrinsic assignments.
• The construct must not contain any user-defined function calls unless the function is ELEMENTAL.
31 What is the use of the schedule clause? C409.3 BTL1
If more than one loop is associated with the loop construct, the iterations of all associated loops are collapsed into one larger iteration space that is then divided according to the schedule clause. The sequential execution of the iterations in all associated loops determines the order of the iterations in the collapsed iteration space.
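A minimal example: collapse(2) merges the two loops below into one 16-iteration space, which schedule(static, 2) then deals out in chunks of two iterations.
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel for collapse(2) schedule(static, 2)
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            printf("(%d,%d) on thread %d\n", i, j, omp_get_thread_num());

    return 0;
}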
PART B
Q. No. Questions CO Bloom’s Level
Explain in detail how to handle loops in OpenMP. (Nov/Dec 2017) C409.3 BTL5
7. How are data and functional parallelism handled in shared memory programming with OpenMP? (Apr/May 2017) C409.3 BTL5
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-
Kauffman/Elsevier, 2011 Page No:212-215
8. Explain in detail the handling of loops in parallel operations. (Nov/Dec 2017) C409.3 BTL5
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-
Kauffman/Elsevier, 2011 Page No:236-240
9 Write an example program for shared memory programming with Pthreads. (Apr/May 2018) C409.3 BTL5
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-
Kauffman/Elsevier, 2011 Page No:236-240
UNIT IV
DISTRIBUTED MEMORY PROGRAMMING WITH MPI
PART A
Q. No. Questions CO Bloom’s Level
1. What is MPI? C409.4 BTL1
One process calls a send function and the other calls a receive function. The implementation of message passing that we will be using is called MPI (Message-Passing Interface). MPI is not a new programming language; it defines a library of functions that can be called from C, C++, and Fortran programs.
6. What is SPMD? C409.4 BTL1
A single program is written so that different processes carry out different actions, and this is achieved by simply having the processes branch on the basis of their process rank. This approach to parallel programming is called single program, multiple data, or SPMD.
Variables whose contents are significant only to an individual process are called local variables; an example is local_n.
14 What are global variables? Give examples. C409.4 BTL1
Variables whose contents are significant to all the processes are sometimes called global variables. Some examples are a, b and n.
15 What are point-to-point communications in MPI? C409.4 BTL1
MPI_Send and MPI_Recv are often called point-to-point communications.
16 List any two points stating how collective communication differs from point-to-point communication. C409.4 BTL4
All the processes in the communicator must call the same collective function. For example, a program that attempts to match a call to MPI_Reduce on one process with a call to MPI_Recv on another process is erroneous, and in all likelihood the program will hang or crash.
17 What is a butterfly-structured global sum? C409.4 BTL1
The processes exchange partial results instead of using one-way communications. Such a communication pattern is sometimes called a butterfly-structured global sum.
18 What is broadcast? C409.4 BTL1
A collective communication in which data belonging to a single process is sent to all of the processes in the communicator is called a broadcast.
19 Show a broadcast function. C409.4 BTL1
int MPI_Bcast(
    void*        data_p      /* in/out */,
    int          count       /* in     */,
    MPI_Datatype datatype    /* in     */,
    int          source_proc /* in     */,
    MPI_Comm     comm        /* in     */);
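A small usage sketch (values are arbitrary): process 0 owns n, and after the call every process in the communicator has it.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, n = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) n = 100;                       /* only the source has the value */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD); /* now every process does */

    printf("process %d: n = %d\n", rank, n);
    MPI_Finalize();
    return 0;
}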
20 Define block partition of the vector. C409.4 BTL1
The work consists of adding the individual components of the vectors, so we might specify that the tasks are just the additions of corresponding components. Then there is no communication between the tasks, and the problem of parallelizing vector addition boils down to aggregating the tasks and assigning them to the cores. If the number of components is n and we have comm_sz cores or processes, let's assume that comm_sz evenly divides n and define local_n = n/comm_sz. Then we can simply assign blocks of local_n consecutive components to each process. This is often called a block partition of the vector.
21 Show the MPI_Scatter function. C409.4 BTL1
int MPI_Scatter(
    void*        send_buf_p /* in  */,
    int          send_count /* in  */,
    MPI_Datatype send_type  /* in  */,
    void*        recv_buf_p /* out */,
    int          recv_count /* in  */,
    MPI_Datatype recv_type  /* in  */,
    int          src_proc   /* in  */,
    MPI_Comm     comm       /* in  */);
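A usage sketch tying this to the block partition above (assumes comm_sz evenly divides n): process 0 builds the full vector and each process receives a block of local_n components.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, comm_sz, n = 16;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    int local_n = n / comm_sz; /* block size; assumes comm_sz divides n */
    double *a = NULL;
    double *local_a = malloc(local_n * sizeof(double));

    if (rank == 0) { /* the root builds the full vector */
        a = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++) a[i] = i;
    }

    /* Deal out one block of local_n consecutive components per process. */
    MPI_Scatter(a, local_n, MPI_DOUBLE, local_a, local_n, MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    printf("process %d got a block starting at %g\n", rank, local_a[0]);

    free(local_a);
    if (rank == 0) free(a);
    MPI_Finalize();
    return 0;
}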
24 Show the MPI_Type_create_struct function. C409.4 BTL1
int MPI_Type_create_struct(
    int           count                    /* in  */,
    int           array_of_blocklengths[]  /* in  */,
    MPI_Aint      array_of_displacements[] /* in  */,
    MPI_Datatype  array_of_types[]         /* in  */,
    MPI_Datatype* new_type_p               /* out */);
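A usage sketch in the style of the textbook's Build_mpi_type (variable names assumed): each process builds a type matching two doubles and an int, and one MPI_Bcast then ships all three variables in a single message.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    double a = 0.0, b = 0.0;
    int    n = 0, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) { a = 0.0; b = 3.0; n = 1024; } /* data to distribute */

    int          blocklengths[3] = {1, 1, 1};
    MPI_Datatype types[3] = {MPI_DOUBLE, MPI_DOUBLE, MPI_INT};
    MPI_Aint     displacements[3], a_addr, b_addr, n_addr;

    MPI_Get_address(&a, &a_addr); /* displacements are relative to &a */
    MPI_Get_address(&b, &b_addr);
    MPI_Get_address(&n, &n_addr);
    displacements[0] = 0;
    displacements[1] = b_addr - a_addr;
    displacements[2] = n_addr - a_addr;

    MPI_Datatype input_t;
    MPI_Type_create_struct(3, blocklengths, displacements, types, &input_t);
    MPI_Type_commit(&input_t);

    MPI_Bcast(&a, 1, input_t, 0, MPI_COMM_WORLD); /* a, b and n together */
    printf("process %d: a=%g b=%g n=%d\n", rank, a, b, n);

    MPI_Type_free(&input_t);
    MPI_Finalize();
    return 0;
}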
25 Show the ratio of the serial and parallel run-times. C409.4 BTL1
The most widely used measure of the relation between the serial and the parallel run-times is the speedup. It is just the ratio of the serial run-time to the parallel run-time:
S(n, p) = Tserial(n) / Tparallel(n, p)
26 Show the “per process” speedup. C409.4 BTL1
The per-process speedup, usually called the efficiency, is
E(n, p) = S(n, p) / p = Tserial(n) / (p * Tparallel(n, p))
31 What was MPI designed for, and who was involved in its design? C409.4 BTL1
MPI is designed to allow users to create programs that can run efficiently on most parallel architectures. The design process included vendors (such as IBM, Intel, TMC, Cray, Convex, etc.), parallel library authors (involved in the development of PVM, Linda, etc.), and applications specialists.
32 List the restrictions on work-sharing constructs. C409.4 BTL1
The sequence of work-sharing constructs and barrier directives encountered must be the same for every thread in a team.
33 Write the evaluation methods in distributed memory programming. C409.4 BTL1
47 List out the two possibilities when a message is assembled. C409.4 BTL1
The sending process can buffer the message, or
it can block.
48 What are the members of the MPI_Status type? C409.4 BTL1
The MPI type MPI_Status is a struct with at least the three members MPI_SOURCE, MPI_TAG, and MPI_ERROR. Suppose our program contains the definition MPI_Status status; then, after a call to MPI_Recv in which &status is passed as the last argument, we can determine the sender and tag by examining the two members:
status.MPI_SOURCE
status.MPI_TAG
49 What is the status_p argument? C409.4 BTL1
The status_p argument is a pointer to an MPI_Status struct passed as the last argument to MPI_Recv; by examining it (and calling MPI_Get_count) a program can determine the sender, the tag, and the amount of data in the message.
PART B
Q. No. Questions CO Bloom’s Level
Discuss MPI derived datatypes with example programs. (10)
UNIT V
PARALLEL PROGRAM DEVELOPMENT
PART A
Q. No. Questions CO Bloom’s Level
Testing the condition before acquiring the lock can cause a problem: the lock condition (e.g., the cost of the best tour) can change between the time of the first test and the time that the lock is acquired. Thus, the threads also need to check the lock condition after they acquire the lock.
17 Show the pseudocode for updating the best tour. C409.5 BTL1
The pseudocode for updating the best tour should look something like this:
if (new_tour_cost < best_tour_cost) {
    Acquire lock protecting best tour;
    if (new_tour_cost < best_tour_cost)
        Update best tour;
    Relinquish lock;
}
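A Pthreads rendering of this pseudocode (variable names hypothetical): the cheap unsynchronized test filters out most updates, and the second test repeats the check because the best cost may have changed while the thread waited for the lock.
#include <pthread.h>

static int best_tour_cost = 1 << 30; /* shared current best (starts "infinite") */
static pthread_mutex_t best_tour_mutex = PTHREAD_MUTEX_INITIALIZER;

void update_best_tour(int new_tour_cost)
{
    if (new_tour_cost < best_tour_cost) {      /* first, unlocked test */
        pthread_mutex_lock(&best_tour_mutex);
        if (new_tour_cost < best_tour_cost)    /* re-test under the lock */
            best_tour_cost = new_tour_cost;    /* update best tour */
        pthread_mutex_unlock(&best_tour_mutex);
    }
}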
21 What is distributed memory? C409.5 BTL1
Distributed memory refers to a multiprocessor computer system in which each processor has its own private memory. Computational tasks can only operate on local data; if remote data is required, the computational task must communicate with one or more remote processors.
Show the MPI_Gatherv function. C409.5 BTL1
int MPI_Gatherv(
    void*        sendbuf       /* in  */,
    int          sendcount     /* in  */,
    MPI_Datatype sendtype      /* in  */,
    void*        recvbuf       /* out */,
    int*         recvcounts    /* in  */,
    int*         displacements /* in  */,
    MPI_Datatype recvtype      /* in  */,
    int          root          /* in  */,
    MPI_Comm     comm          /* in  */);
28 What are the three modes provided by MPI? C409.5 BTL1
MPI provides three modes for sending: synchronous, standard, and ready.
29 List the use of the functions MPI_Pack and MPI_Unpack. C409.5 BTL1
MPI provides the function MPI_Pack for storing a data structure in a single, contiguous buffer before sending. Similarly, the function MPI_Unpack can be used to take data that has been received into a single contiguous buffer and unpack it into a local data structure.
List the use of ready mode. C409.5 BTL1
30 Ready sends (MPI_Rsend) are erroneous unless the matching receive
has already been started when MPI_Rsend is called. The ordinary send
MPI_Send is called the standard mode send.
List the use of Synchronous mode. C409.5 BTL1
31 Synchronous sends won’t buffer the data; a call to the synchronous send
function MPI_Ssend won’t return until the receiver has begun receiving the
data.
How to compute n-body forces? C409.5 BTL1
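A minimal serial sketch of the all-pairs force computation used by the basic solver (2-D particles, Newtonian gravitation; the array names are assumptions):
#include <math.h>

/* For each pair (q, k), add the gravitational force of particle k
   on particle q. Compile with -lm for sqrt(). */
void compute_forces(int n, const double masses[], double pos[][2],
                    double forces[][2])
{
    const double G = 6.673e-11; /* gravitational constant */

    for (int q = 0; q < n; q++) {
        forces[q][0] = forces[q][1] = 0.0;
        for (int k = 0; k < n; k++) {
            if (k == q) continue;
            double dx   = pos[q][0] - pos[k][0];
            double dy   = pos[q][1] - pos[k][1];
            double dist = sqrt(dx * dx + dy * dy);
            double f    = -G * masses[q] * masses[k] / (dist * dist * dist);
            forces[q][0] += f * dx; /* attractive: points from q toward k */
            forces[q][1] += f * dy;
        }
    }
}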
33 List the data structures used for the serial implementation. C409.5 BTL1
The data structures are the tour, the digraph and, in the iterative implementations, the stack. The tour and the stack are essentially list structures. A tour is represented with an array structure having three members: the array storing the cities, the number of cities, and the cost of the partial tour. The digraph is represented using an adjacency matrix.
34 Differentiate between parallelizing the two n-body solvers using Pthreads and OpenMP. C409.5 BTL1
The main difference between the Pthreads and the OpenMP implementations is in the parallelization of the inner loops. Since Pthreads has nothing analogous to a parallel for directive, we must explicitly determine which values of the loop variables correspond to each thread's calculations. To facilitate this, a function Loop_schedule is used which computes:
. the initial value of the loop variable,
. the final value of the loop variable, and
. the increment for the loop variable.
35 How is the performance of the reduced solver much superior to the performance of the basic solver? C409.5 BTL1
The efficiency of the basic solver on 16 nodes is about 0.95, while the efficiency of the reduced solver on 16 nodes is only about 0.70. A point to stress here is that the reduced MPI solver makes much more efficient use of memory than the basic MPI solver: the basic solver must provide storage for all n positions on each process, while the reduced solver only needs extra storage for n/comm_sz positions and n/comm_sz forces.
How can we use OpenMP to map tasks/particles to cores in the basic version of the n-body solver? C409.5 BTL1
Write the pseudocode for the MPI version of the basic n-body solver. C409.5 BTL1
39 What are the two phases for computation of forces? C409.5 BTL1
The following choices are made with respect to the data structures:
Each process stores the entire global array of particle masses.
Each process only uses a single n-element array for the positions.
Each process uses a pointer loc_pos that refers to the start of its block of positions.
Write the pseudocode for an implementation of a depth-first solution to TSP without using recursion. C409.5 BTL1
It is necessary to create a copy of the tour before actually pushing it onto the stack, using the function Push_copy. Extra memory is required for allocating storage for a new tour, and copying the existing tour is time-consuming. We can reduce these costs by saving freed tours in our own data structure; when a freed tour is available, we can use it in the Push_copy function instead of calling malloc.
42 What are the algorithms for identifying which subtrees we assign to the processes or threads? C409.5 BTL1
depth-first search
breadth-first search
43 Define the term POSIX or Pthreads. C409.5 BTL1
Pthreads is a library of type definitions, functions, and macros that can be used in C programs. POSIX is a standard for Unix-like operating systems, for example, Linux and Mac OS X. It specifies a variety of facilities that should be available in such systems. In particular, it specifies an application programming interface (API) for multithreaded programming. Pthreads is not a programming language (such as C or Java). Rather, like MPI, Pthreads specifies a library that can be linked with C programs. Unlike MPI, the Pthreads API is only available on POSIX systems: Linux, Mac OS X, Solaris, HPUX, and so on.
44 What is the reason for the parameter threads_in_cond_wait used in tree search? C409.5 BTL1
There are two cases to consider:
o If threads_in_cond_wait < thread_count, it tells us how many threads are waiting for work.
o If threads_in_cond_wait == thread_count, all the threads are out of work, and it is time to terminate.
45 What are the global variables for recursive depth-first search? C409.5 BTL1
n: the total number of cities in the problem.
digraph: a data structure representing the input digraph.
hometown: a data structure representing vertex or city 0, the salesperson's hometown.
best_tour: a data structure representing the best tour so far.
46 Mention the performance of the MPI solvers. C409.5 BTL1
The performance of the reduced solver is much superior to the performance of the basic solver, although the basic solver achieves higher efficiencies. A point to stress here is that the reduced MPI solver makes much more efficient use of memory than the basic MPI solver; the basic solver must provide storage for all n positions on each process, while the reduced solver only needs extra storage for n/comm_sz positions and n/comm_sz forces.
47 Mention the principal data structures in the Pthreads n-body solver. C409.5 BTL1
The vectors are two-dimensional arrays of doubles, and the mass, position, and
velocity of a single particle are stored in a struct. The forces are stored in an array of
vectors.
48 Name any two OpenMP environment variables. (Nov/Dec 2018) C409.5 BTL1
OMP_NUM_THREADS
OMP_SCHEDULE
OMP_DYNAMIC
OMP_NESTED
49 List any two scoping clauses in OpenMP. (Nov/Dec 2018) C409.5 BTL1
shared(list)
private(list)
PART B
Q. No. Questions CO Bloom’s Level
7. i. How to parallelize the basic solver using MPI? (Apr/May 2017) C409.5 BTL5
ii. Explain non-recursive depth-first search. (Apr/May 2017)
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-Kauffman/Elsevier, 2011 Page No: 316-317 & 303-304
8. Explain the implementation of tree search using MPI and dynamic partitioning. (Apr/May 2017) C409.5 BTL5
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-
Kauffman/Elsevier, 2011 Page No:229-318
What does the n-body problem do? Give the pseudocode for the serial n-body solver and for computing n-body forces. (Nov/Dec 2017) C409.5 BTL1
9 1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-
Kauffman/Elsevier, 2011 Page No: 271-297
10 How will you parallelize the reduced solver using OpenMP? (Nov/Dec 2017) C409.5 BTL1
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-
Kauffman/Elsevier, 2011 Page No: 271-297
Describe the recursive and non-recursive DFS. C409.5 BTL2
13
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-
Kauffman/Elsevier, 2011
Examine the Data structures for the serial implementation C409.5 BTL3
14
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-
Kauffman/Elsevier, 2011
15 Explain in detail the static parallelization of tree search using Pthreads. C409.5 BTL1
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan-
Kauffman/Elsevier, 2011