PJ 1
PJ 1
PJ 1
https://www.tutorialspoint.com/cprogramming/c_command_line_arguments.htm
https://www.cprogramming.com/tutorial/c-tutorial.html
Programming Language: by Brian W. Kerninghan, Dennis M.Ritchie
Problem Introduction and Description: Generate a text file and populate it with L (>=10,000)
positive integers, randomly generated. Identify the MAXIMUM, the AVERAGE, and a number of
“H”<20 GIVEN HIDDEN KEY INTEGERs out of the 50 hidden keys randomly placed in the
file. Use Inter Process Communication (IPC) of pipes (or shared memory – if you are familiar
with it) to partition the file/array into chunks and distribute work to more than one processes.
You must “hide” your secret keys into your list/file of > = 10,000 integers. The hidden keys will
be 50 negative integers (with value -1) inserted at random positions into your file. But then,
after creation of the file, you “forget” where the hidden keys are located. You only know that you
must look for “H” secret keys.
Arguments/Parameters: Arguments: L, H, and Process Number (PN) are provided at run time
and are handled by main (int argc, char* argv[]).You can also request this information
from the keyboard and have your program read/scanf the information entered from the user.
(how to use argc-argv[] -- read the related section on tutorialspoint site provided above).
Experiments Set Up: Use multiple processes from 1 to “PN” (where PN: Process Number
argument) in the following layout strategies:
1)) A DFS tree (a chain of processes) – can experiment using 1 up to the max. no of
processes you can generate before you get a core dump message.
2)) A BFS tree with tree choices for children: For example a) 2, b) 3, and c) 4 children. If
however in your system you can fork many more processes before you get a core dump, you
may wish to experiment with trees of a larger size of offspring (For example: a)2, b)4, c) 6 or 8).
Can experiment using as PN the max. no of processes you can generate before you get a core
dump message.
Every time a process forks a child or children process(es), the parent must pass to the offspring
the part of the partitioned file that corresponds to each offspring. This information can be passed
through pipes. Then each offspring processes its own part/segment of the array/file to identify
the local metrics of interest and also forks (if needed) its own offspring and passes on to their
offspring their corresponding array/file segments (the array/segment for the corresponding sub-
tree).
Any process that has forked child/children must invoke waitpid for all its direct offspring and
analyze how they have been interrupted/terminated. Once a child identifies its local
maximum and average values it must send this data to its parent. The parent must collect
all metrics of interest from all children, process these accordingly, and when ready, send its own
calculation results to its own parent, and so on and so forth until the root is reached, and the
final metrics that show the total maximum and total average are computed.
The internal process awaits of the termination of all its children processes. Every process prints
a corresponding message every time it transitions to another phase (for example: start, loop to
wait for children termination, computation of its local max and avg., allowing its own
termination), so that the validation of the correct program operation is feasible.
A = 2, B = 4, C = 6, D = 10
Here the return codes are distributed arbitrarily, the tree nodes are very few and hence the
return codes do not class. But what happens when we have a tree with many more nodes?
You must identify the trade-offs in computation versus communication overhead when you fork
multiple process from a single process. The results of this experiment will help you understand if
multiple process that partition the large array help you calculate the requested values faster.
Remark: Record the time it takes for each of the programs to run and comment on your
observations. Try it on lists of size L: 10k, 100k, 1M, 5M.
Hint: You should first load the file into an array then start working on the data.
Input Format: Input will be in a text file.
Output Format: You should print out the results in a text file. Every process that is created should
print out their own process id and their parent’s process id. Then once you have computed the final max,
avg, and hidden keys, you will print those out. Please follow this format:
Hi I’m process 2103 with return arg 2. I found the hidden key in position A[65].
Hi I am process 2108 with return arg 7 and I found the hidden key in position A[123].
Hi I am process 2014 with return arg 13 and I found the hidden key in position A[204].
Hi I am process …. (all the above are fictitious numbers).
Find which process layout completes faster and how many processes are involved. Modify the
number of elements (the size of the array) and check if and how this may affect your results.
Provide your insight on your findings. Is this what you expected, yes, no, and why yes/no?
Summary:
1) Write a program to solve the problem using only one process.
2) Write a program to solve the problem using multiple processes where each process spawns at
most one process. (Like DFS). The maximum number of processes spawned should be NP.
3) Write a program to solve the problem using multiple processes where the first process spawns a
fixed number of processes, (2 or 3 or 4 or different numbers if you wish), and the children spawn
their own (2 or 3 or 4 or different) multiple processes each, and so on and so forth. The maximum
number of processes spawned should be NP. Your tree structure may be pre-defined and hence
avoid using loops or recursion or counters to create a new tree. You also have the option to ask
the user to provide the size of chidren (1, 2, 3, 4, or different). At the end, you must produce the
structures, address all questions, and compare the results for each structure.
4) Your ultimate goal should be to find the process tree that produces the final result faster. Can you
fine-tune the size of variable NP to do that? Consider lists of increasing sizes. There is a trade-off
w.r.t. to the number of processes but there is also a trade-off w.r.t. the number of integers in the
file (and increasing file sizes). What are the trade-offs you observe here?
5) Can you create an arbitrary number of processes or are there any limitations? If you find
limitations in your version of Linux OS, please show (e.g. via a printscreen or a file) what are your
exact limitations in increasing the number of processes you will be generating. Why is this so?
6) What is the maximum number of processes you can create before your system crashes?
Once a child process has calculated its metrics of interest and has sent these to its parent, then
it will “pause”, waiting for its parent to decide on its fate.
Pause should be implemented by having the process send signal SIGTSTP to itself, using the
following system call:
- raise(SIGTSTP);
Pausing the child as above, will give time to the parent to decide what to do with the given child.
The child must separately notify the parent how many hidden elements it has. The parent will
make decisions about the fate of the child based on the number of hidden nodes.
Every parent updates now its own hidden nodes by adding to its existing hidden nodes only the
first hidden node of every child (if there is any).
For example if child pid gives 5 hidden nodes to the parent ppid, parent ppid makes is decision
for child pid based on the 5 hidden nodes, but for its own subsequent computations adds only
+1 (hidden nodes) coming from this child, to the number of hidden nodes it already has. So, if
the parent has 4 children, and all children are polluted with 20 hidden nodes in total, the parent
will add to itself exactly 4 hidden nodes and no more.
Decision rules for a child with pid and parent with ppid:
2) ELSE IF If the above do not hold and If child with has hidden numbers >=H then this
child
will be terminated, by having the parent send a SIGKILL signal to this child, invoking the
following system call:
signal(child,9); or signal(child,SIGKILL)
You must check what the case is with the offspring of a terminated child. Are they also
terminated at this point (cascaded termination) or are they adopted by init? One way to
look into this is call ps –au from another terminal and observe the status of processes:
running, schedule, blocked, zombie, terminated, …. (please research into this).
3) If the above two rules do not hold, then the parent will unblock the child by invoking
signal(pid, SIGCONT);
and allow them to run their remaining code and exit naturally, by having the children
invoke exit(child_arg); at the end of their code. Before that, the child should invoke
system call sleep(20); so that it gives the system the opportunity to display the subtree of
interest calling pstree from the root.
Reminder: The arg that must be used with the exit system call has been passed by the parent
through the pipe, and must be unique for every child in the process.
Objective: You must show how you have implemented all the above design requirements and
you must prove that the system behaves as prescribed. The processes that you generate must
remain active for some considerable window of time in order for the user to be able to observe
the tree. Either the leaf process or all processes execute a call to: sleep(). The internal process
awaits of the termination of all its children processes and analyze the termination circumstances
with the proper printouts, using the unique return ids of the processes and using system calls
such as pstree to display the tree.
Additional Questions:
1. What happens if root process A is terminated prematurely, by issuing: kill -SIGKILL
<pid>?
2. What happens if you display the process tree with root pstree(getpid()) instead of
pstree(pid()) in main()? Which other processes appear in the tree and why?
Solution:
What to turn in:
C files for each problem
A makefile in order to run your programs.
Input text file (your test case)
Output text file (for your test case)
Report: Explain design decisions (fewer vs. more processes, process structure, etc.).
Elaborate on what you have learned from each problem. Answer the question(s) below each
part/subproblem. Also, please consider providing a very detailed report, as along with your
C file deliverables, it corresponds to a substantial portion of your grade.
Logistics:
You will work within your defined group. Do not collaborate with other groups. Groups
that have copied from each other will BOTH get zero points for this project.
You are expected to work on this project using LINUX OS
Make ONE submission per group. In this submission provide a table of contribution for
each member that worked on this project.
Useful Links:
https://www.gnu.org/software/libc/manual/html_node/Generating-Signals.html#Generating-Signals
https://www.gnu.org/software/libc/manual/html_node/Pipes-and-FIFOs.html#Pipes-and-FIFOs
https://www.gnu.org/software/libc/manual/html_node/Creating-a-Process.html#Creating-a-Process
https://www.gnu.org/software/libc/manual/html_node/Process-Completion.html#Process-Completion
https://en.wikipedia.org/wiki/Depth-first_search