Lab 4: Find the Parallel Sum Using Divide and Conquer

1. Assume there are 8 processing nodes and 256 numbers (1, 2, ..., 256) to be added. The original list is initialized on processor 0. At each level, every processor that holds a list keeps one half and sends the other half to another processor. This continues level by level until the leaves are reached, as shown below, so that each processor ends up with 256/8 = 32 numbers.

After each processor receives its final sublist (32 numbers), it calculates its partial sum. The partial sums are then reduced level by level, in the reverse order of the divide phase, as shown below. The final sum is obtained and printed on processor 0.

(1) Log in to andy.csi.cuny.edu using your account and password. If you are off campus, connect to chizen.csi.cuny.edu first, and then run ssh andy.csi.cuny.edu.

(2) Create a new directory named lab4 and change into it.

(3) Write a program named DCParallelSum.c.

(4) Compile your program using the following command:

mpicc DCParallelSum.c -o DCParallelSum

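(5) Create a job script named myjob that uses 8 nodes to run your program. You can copy the Lab 1 job script below and modify it accordingly: for example, change the job name after -N to myjob so the output files are named as in step (9), and replace ./hello with ./DCParallelSum.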

#!/bin/bash

#PBS -N job3

#PBS -q production

#PBS -l select=8:ncpus=1

#PBS -l place=free

#PBS -V

cd $PBS_O_WORKDIR

mpirun -np 8 -machinefile $PBS_NODEFILE ./hello

(6) If necessary, make the script executable:

chmod a+x myjob

(7) Then submit the job:

qsub myjob

(8) Run the following command to check the status of your program:

qstat

(9) When your job has finished, list the files in the current directory and check whether you have files named like myjob.exxxxx and myjob.oxxxxx. Then open myjob.oxxxxx and check its contents. You should see output like the following (lines printed by different processors may appear in a different order from run to run):

Processor 0 sends sublist to processor 4 at level 1

Processor 0 sends sublist to processor 2 at level 2

Processor 0 sends sublist to processor 1 at level 3

The partial sum on processor 0 is 528

Processor 1 receives sublist from processor 0 at level 3

The partial sum on processor 1 is 1552

Processor 1 sends sublist to processor 0 at level 0

Processor 2 receives sublist from processor 0 at level 2

Processor 2 sends sublist to processor 3 at level 3

The partial sum on processor 2 is 2576

Processor 4 receives sublist from processor 0 at level 1

Processor 4 sends sublist to processor 6 at level 2

Processor 4 sends sublist to processor 5 at level 3

The partial sum on processor 4 is 4624

Processor 6 receives sublist from processor 4 at level 2

Processor 6 sends sublist to processor 7 at level 3

The partial sum on processor 6 is 6672

Processor 4 receives sublist from processor 5 at level 3

Processor 0 receives sublist from processor 1 at level 3

Processor 5 receives sublist from processor 4 at level 3

The partial sum on processor 5 is 5648

Processor 5 sends sublist to processor 4 at level 0

Processor 4 receives sublist from processor 6 at level 2

Processor 4 sends sublist to processor 0 at level 2

Processor 6 receives sublist from processor 7 at level 3

Processor 6 sends sublist to processor 4 at level 1

Processor 7 receives sublist from processor 6 at level 3

The partial sum on processor 7 is 7696

Processor 7 sends sublist to processor 6 at level 0

Processor 3 receives sublist from processor 2 at level 3

The partial sum on processor 3 is 3600

Processor 3 sends sublist to processor 2 at level 0

Processor 2 receives sublist from processor 3 at level 3

Processor 2 sends sublist to processor 0 at level 1

Processor 0 receives sublist from processor 2 at level 2

Processor 0 receives sublist from processor 4 at level 1

The final sum is 32896

2. Revise your program and job script so that your program works on 16 nodes (in the script, both select=8 and -np 8 must change), and then run your program.