CS 350 – Fall 2003

Project 2 -9/24/03, sample output updated

“Hot Potato” Distributed Algorithm

Report Due: Friday 10/3/03 after class

Objective:

The objective of this project is to create an application using “targeted message passing” with UNIX message queues. Targeted message passing was introduced in the second part of project 1 (called “directed messages” in that project). In part 1 of project 1 the messages were not targeted at any particular process and any process (including the sender), with access to the message queue, could attempt to receive a message. This is was what may be considered a “Hungry Dog” approach in which any process is eligible to receive any message and unpredictable race conditions may result. Part 2 of project 1 employed “targeted message passing” , in which a message sent would be directed at a particular process and only this process will be able to receive it.

You will now apply targeted message passing by modeling a network of n processors as n UNIX processes running concurrently. It is assumed the n processes are working cooperatively and that we must guarantee the integrity of the “network” of processes by continuously passing a “token” message from one process to another in a loop in round robin style. This message will be referred to as a “Hot Potato” for obvious reasons The successful passage of the token would guarantee the connectivity of the processes (or the processor nodes they are modeling). A master node called the Control Node (CN) will control this process on the n-1 Slave nodes (SN) which are forked as child processes of the CN. Besides initiating and controlling the Hot Potato process, the CN will initially take a “census” of all the SN’s by bouncing an “echo” message off each SN (like a “ping”). In addition, at the end of the hot potato loops, the CN will send a special “quit potato” message to be passed in the loop, but this time (on detecting the “quit potato”) each node will quit after passing the “quit potato” on. After all the SN’s quit, the CN will return the message queue and also quit.

You may use any UNIX system for this exercise - The UNIX systems at the west pod should primarily be used and alternately bingsuns if there is a problem using the pod systems. You must check all system return values for error conditions and take appropriate action. Penalties will be given for not doing these checks. NOTE: You MUST run ipcsfree after each every test run (successful or not)– message queues can be depleted and cause the server to get corrupted.

How to do “targeted” message passing

Refer to the setup material in project 1 for the basic header files, required system structure and variables, formats for the system calls, etc. This project will use blocking receives as in project 1. As in project 1, targeted or directed messages were achieved with the following system calls and pseudo-code:

struct q_type {

long mtype;

char mtext[81];

} q_entry; /* a place to receive and launch messages from */

int id; /* id is some positive integer associated with a process */

put a message string in q_entry.mtext

q_entry.mtype = id; /* must be a positive integer identifying the receiver */

rc = msgsnd(msgqid, &q_entry, 80 ,0); /* msg is first placed in q_entry.mtext */

/* below is the receiver code in the process associated with id */

rc = msgrcv(msgqid, &q_entry, 80, id ,0); /* msg will be in q_entry.mtext */

Assume that each concurrent process is identified by some positive integer: id. This could be the pid, but in this project, we will assign consecutive integers 1, 2, 3, 4, … to each process, where the CN has id 1, and the remainder are the SN’s each assigned an integer in the order they are created, for example 2,3,4, …, n, where n is the total number of processes including the CN. Think of these id’s as “logical pids” as opposed to the true pid which is returned by a getpid() call. The “id” parameter placed in the second to the last parameter in msgrcv(…) must match the value assigned to q_entry.mtype by the sender. This scheme will guarantee complete synchronism with no race conditions.

Program description:

The following is a general description of the task: In the discussion which follows, we will refer to the original parent as a “Control Node” (CN’s) and the children processes as “Slave Nodes” (SN’s). Also “pid” means “process id”. The original parent (CN) will prompt the user for the number of children (SN’s) to create via the fork process, and the number of cycles that the “Hot Potato” must circulate. Place upper bounds on these inputs (say no more than 8 for each and enforce them). As each SN is created, its pid is stored in an array list of pids (pidlist[]) of all the processes and will always accessible to the CN. The CN and all the SN’s each displays is presences by displaying its pid.

After the SN’s are created, the CN will take a “census” in order to check to ,see that each SN is operating and is “who it thinks it is” by sending an “echo request” to each node. The message sent will be the pid of the target SN from the pidlist. On receiving the echo request, thee SN will check to see if the received pid is the same as its own pid., if not it sends a null message back to the CN, it matches, it sends its own pid back to the SN. When the SN receives the echo back, it will also verify that the received pid matches what it sent. If it is a null message or does not match, it will abort. The echo process is similar to a ping in a network.

If the “census” process was successful, the CN will now initiate the “Hot Potato” cycles. The potato is initialized as a null string. In general, when any node receives the potato, it will “sign” the potato with its logical pid as the last character in the message string representing the potato. For example, if you are process 5, and received the potato, then replace the terminating null character in the potato message by the character 5, and add a new null character before passing it to the next node. Before sending the first pass of the potato, the SN will replace the null character in the empty string with its logical pid (which is 1) and add a null character to index 1 of the potato. The CN will control the flow ofa the potato. All the SN does when it receives the potato, is to sign it and pass it on. The Cn will detect when the number of cycles of potato passes is used up. When this occurs, the CN will now pass on a “quit” potato, which is a null message. when an SN in potato mode, receive a null potato, it will pass it on the the next node and quit. When the CN receive the null potato, it will go into a loop and wait for each child to quit for each pass of the loop. When all SN hav quit, the CN will return the message queue to the system, and wave bye-bye and also quit.

In general you must trace the action of this process by doing printf’s which describe essential events. Whenever the CN receives the potato, it should display it to show how it is growing as each process signs it on each pass. Proper documentation of the events will be essential in debugging the program.

The following “program outline” summarizes this process:

Program outline:

1.  The user will input the number (n) of slave nodes (child processes) in addition to the original control node (parent). There is a total of n+1 nodes.

2.  The user also inputs the number of cycles or passes the potato must make.

3.  CN creates n number of SN’s. CN keeps track of the pids of each SN in an array of ints called pidlist. pidlist[0] is the pid of the CN.

4.  The CN now goes into “census” mode and sends the pid of each SN to the corresponding SN (echo request).

5.  Each SN blocks on the message queue waiting for a census message from CN. On successfully receiving the echo request, the SN sends it pid back to the CN. If the SN finds out that the pid does not match, it send a null message back top the CN.

6.  now CN goes into “Hot Potato” mode. CN sends a signed potato (it’s logical pid in the zero index of the potato, followed by the null character) to node 2 to start the process off.

7.  On receiving the potato message, the SN signs it and passes it on to the next logical pid (its own logical pid + 1, with the last SN sending to logical pid 1, not 0).

8.  If the number of cycles are not exhausted, the CN also signs and passes on the potato.

9.  If the number of cycles are over, the SN passes on a “quit potato” (null message)

10.  When an SN receives a quit potato, it passes it on and also quits.

11.  When the quit potato gets back to the CN, it will wait for each SN to quit, then return the message queue to the system, and finally quit.

The following is a sample output from a run with 4 SN’s and 2 cycles:

bingsun2% p2

Control node (CN) has pid: 5895

Enter the number of child processes:(slave nodes): 4

Enter the number of cycles: 2

Message queue 19501 is here

child with id 2 and pid 5904 is here

child with id 3 and pid 5905 is here

child with id 4 and pid 5906 is here

child with id 5 and pid 5907 is here

starting do_parent with pid 5895

***DB: CN sending echo req to child id 2

***DB: CN rcved echo from child id 2 and pid 5904

***DB: CN sending echo req to child id 3

***DB: CN rcved echo from child id 3 and pid 5905

***DB: CN sending echo req to child id 4

***DB: CN rcved echo from child id 4 and pid 5906

***DB: CN sending echo req to child id 5

***DB: CN rcved echo from child id 5 and pid 5907

***DB CN just finished the echo request phase

***DB CN ready to start pass potato phase

***SN: child 2 just recved a potato with signature 1

***DB CN sent pot to SN id 2 on currcyc-1 0, signature 1

***SN:Normal potato was passed on to node 3

***SN: child 3 just recved a potato with signature 12

***SN:Normal potato was passed on to node 4

***SN: child 4 just recved a potato with signature 123

***SN:Normal potato was passed on to node 5

***SN: child 5 just recved a potato with signature 1234

***SN:Normal potato was passed on to node 1

***DB CN rcvd pot on currcyc-1 0, with signature 12345

***SN: child 2 just recved a potato with signature 123451

***DB CN sent pot to SN id 2 on currcyc-1 1, signature 123451

***SN:Normal potato was passed on to node 3

***SN: child 3 just recved a potato with signature 1234512

***SN:Normal potato was passed on to node 4

***SN: child 4 just recved a potato with signature 12345123

***SN:Normal potato was passed on to node 5

***SN: child 5 just recved a potato with signature 123451234

***SN:Normal potato was passed on to node 1

***DB CN rcvd pot on currcyc-1 1, with signature 1234512345

***SN: child 2 just recved a potato with signature

***DB SN: child 2 received a quit potato

child 2 is quitting

***SN: child 3 just recved a potato with signiture

***DB SN: child 3 received a quit potato

child 3 is quitting

***SN: child 4 just recved a potato with signiture

***DB SN: child 4 received a quit potato

child 4 is quitting

***SN: child 5 just recved a potato with signiture

***DB SN: child 5 received a quit potato

child 5 is quitting

CN: Quit potato returned to CN - we're going to quit

***DB CN waiting for a child quit

Child with pid 5907 is gone

***DB CN waiting for a child quit

Child with pid 5906 is gone

***DB CN waiting for a child quit

Child with pid 5905 is gone

***DB CN waiting for a child quit

Child with pid 5904 is gone

Message queue 19501 is gone-its all over but the laughing

bingsun2% ipcsfree

0 semaphores were returned for userid cs350.

0 message queues were returned for userid cs350.

0 shared memory segments were returned for userid cs350.

Some tips:

To check the length of a string, use the strlen(…) call. This usage is:

int strlen(cs);

where cs is the string and the return value is the length not counting the terminating null character (‘\0’)..

To convert a string representation if an integer (a sequence of characters representing digits), use:

int atoi(cs);

where cs is the string and the return value is the corresponding integer.

There is no standard library function converting an integer to a string. The following function may be added to your program to accomplish this function (include a prototype if you sue this function):

void itoa (unsigned n, char s[] ) { /* converts integer to a string s */

/* n is a given integer, result is put in s */

/* s is passed uninitialized, and gets filled in*/

unsigned left, right;

char temp;

unsigned i = 0; /* ye olde counter variable */

do {

s[i++] = (char)(n%10 + '0');

}

while( (n /= 10 ) > 0 );

s[i] = '\0';

for (left = 0, right = strlen(s) -1; left < right; left++, right--) {

temp = s[left];

s[left] = s[right];

s[right] = temp;

}

} /* end of itoa function*/

If you run into any other C language problems, consult a good book like “The ANSI C Programming Language” by Kernighan and Ritchie.

Questions:

1.  In this project we are using a blocking receive (the last parameter of msgrcv is a 0). If the purpose of this program is to detect broken links between nodes, what would happen if a node-to-node link failed based on the the way it has been specified.