AMS530/Project 1/problem 1.1

1) Project Description:

The purpose of this code is to measure the execution time needed for one processor (in our case processor 0, the master) to send a message of floats to 4 other processors (in our case processors 1, 2, 3, and 4, the slaves). The size of the message is set in advance by the user. In this project I use the sizes 1MB, 20MB, 30MB, and 40MB. Running the code for each of these message sizes, we should get 16 execution times in total (4 execution times per processor, one for each message size).

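The code assumes that every float occupies 16 bytes. For example, to get a message of size 1MB we define the message as float message[62500], since

1000000/16 = 62500.

Note that a C float normally occupies 4 bytes, so an array of 62500 floats actually holds 250,000 bytes; the message sizes quoted in this report are the nominal sizes under the 16-byte assumption.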

We use the output of this code for 2 purposes:

First: we plot message size N vs. the execution times, which are 16 in total. The graph for this case is sent as a separate attachment.

Second: we calculate the latency and bandwidth of sending messages of the above sizes by using the model equation t(N) = t0 + alpha*N, where t(N) is the total execution time needed to send a message of size N, t0 is the latency, and 1/alpha is the bandwidth. In this project I obtain the latency and bandwidth from the line through two points (only 2 values of N are chosen) on a plot of N vs. total time, where the total time is taken to be the AVERAGE of the execution times of the 4 processors for a given message size.
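Because only two points are used, the model equation reduces to the line through them. As a worked form, with (N_1, t_1) and (N_2, t_2) denoting the two chosen points (symbols introduced here for illustration only):

```latex
\alpha = \frac{t_2 - t_1}{N_2 - N_1}, \qquad
t_0 = t_1 - \alpha N_1, \qquad
\text{bandwidth} = \frac{1}{\alpha} = \frac{N_2 - N_1}{t_2 - t_1}
```

With N in MB and t in seconds, t0 comes out in seconds and the bandwidth in MB/sec.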

2) Results:

Please refer to the two attached graphs for this project: one for the blocking message mode and the other for the non-blocking mode.

I have also attached a Word file that contains the timing results for each processor and message size.

3) Analysis and comments:

The following observations can be made regarding the above results.

1) The execution time is lower in the non-blocking message mode. This agrees with our intuition and expectations, because of the asynchronous nature of the communication between the processors.

2) If the code is run at different times, different execution times are obtained even for the same message size and number of processors. This reflects the fact that the availability of the processors depends on how heavily they are being used by other users of Galaxy.

3) Although the code I wrote uses a minimum of I/O statements, the execution times obtained for each processor are not necessarily “correct” answers, in the sense that they might not be completely reliable for calculating the latency and bandwidth: we are not accounting exactly for load imbalance (which is possible) or for communication overhead (which is always present). To get results that are as accurate as possible, the code should be run at different times of the day and many experiments should be done; only then do we have enough data points to make good results likely.

4) In my calculation of the latency and bandwidth, I used the average time of the 4 processors involved. This is only one sensible choice, and the right choice is usually problem dependent; conventionally, one takes the maximum of all the times.

5) The code did not run for messages of sizes 50MB and 100MB, and not even for a message of size 40MB. In each of these cases the code failed with the error message “segmentation fault”. My reasoning is that if the message size is increased to 40MB or more, then more memory must be allocated to communicate the message. On Galaxy, many users share the available nodes and a lot of memory is already in use, which may explain why such a “huge” message could not be communicated. What supports this explanation is that the same error occurred with both the blocking and the non-blocking code, so it does not seem to depend on the MPI commands or on the mode of communication (blocking or non-blocking); it appears to be Galaxy dependent.

4) Calculation of latency and bandwidth:

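1) Blocking case:

The equation of the line passing through (10MB, 0.314 sec) and (20MB, 0.335 sec) is

t(N) = 0.293 + 0.0021*N, with slope (0.335 - 0.314)/(20 - 10) = 0.0021 sec/MB and intercept 0.314 - 0.0021*10 = 0.293 sec. Hence, by using the model equation mentioned earlier, we get:

latency = 0.293 sec, bandwidth = 1/0.0021 = 476.19 MB/sec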

2) Non-blocking case: we use the same model equation but with different data points, as follows:

(10MB, 0.312 sec), (30MB, 0.318 sec)

Equation of the line passing through these points:

t(N) = 0.309 + 0.0003*N, with slope (0.318 - 0.312)/(30 - 10) = 0.0003 sec/MB and intercept 0.312 - 0.0003*10 = 0.309 sec.

Hence latency = 0.309 sec, bandwidth = 1/0.0003 = 3333.33 MB/sec

Comment: By definition, bandwidth is a measure of communication or data-carrying capacity, typically measured in either "megabits per second" (Mb/s) or "megabytes per second" (MB/s, where 1 MB/s = 8 Mb/s). In the above calculation we see that in the case of non-blocking message passing the bandwidth is much larger, which is as expected. As for the latency, the difference is not big, since latency is the time to start up an operation and in our case the start-up is almost the same for both modes of communication.