Name: Vi Thong

I. Project description:

This project is to find the locations of N particles confined in a 10x10x10 cube. Each particle has charge in the range [-1, 1] Coulomb and initial locations. The particles move under the influence of the Coulomb’s force and the following assumptions:

a) The mass of each particle is normalized to 1.

b) The components of the initial velocities of these particles follow the Gaussain distribution centered at (0, 0, 0) with variance 4.

c) Periodic boundary conditions are constrained in every dimension of the cube.

d) The particles move under the Newton’s second law.

The time step to find the final location is 100 and the step size is 10 –6 units. The speed up curve is plot for P =2, 4, 8, 16 processors and the number of particles N = 10000, 20000, 40000.

II. Algorithm description:

First, the N particles are distributed evenly among the processors to be initialized. The initialization is done on the charge, locations, and velocity. The charge of each particle is in the range [-1, 1]. The initial location of each particle is in the range [0, 10] for the x, y, and z directions. The initial velocities are in the range [0, 1]. All of these can be done by calling the random () functions provided in the C math library. Since this function returns value in the range [0, 1], appropriate parameters needed to be included to get the desired ranges of values above.

Second, after the initialization step above, each processor needs to get the values from the other processors. There are cases to consider when P =1, 2, 4, 8 where P is the number of processors.

If P =1 then only one processor to perform the entire initialization and updating steps.

If P =2 then each processor needs to exchange values with each other. That is to say each processor sends out it parts of values and receive the missing part from the other processor. Then each processor finds out the acceleration components in the x, y, z direction for each particle, update the locations and velocities of the particle. Then each processor exchange values again for 100 times.

In order to update the locations and velocities of the particles, each processor use the following formulas:

V (t) = V (t-Dt) + A (t-Dt) *Dt

X (t) = X (t-Dt) + V (t-Dt) *Dt + 0.5 * A (t-Dt) *Dt*Dt

Where V is the velocity and X is the location and A the acceleration. Value of Dt is 0.000001.

If P =4, then there are two parallel steps when processors are exchanging values. First 1 « 2, 3«4. Then 1«3, and 2«4, where « means exchange of values between the two processors.

If P =8, then there are three parallel steps when processors are exchanging values. First step is K«(K+1). Second step is: 1«3, 2«4, 5«7, 6«8. Third step is 1«5, 2«6, 3«7, 4«8.

To sum up, the flowchart is the following:

(a) ® (b) ®(c) ®(d)

where a = set particles charges, locations, and velocities

b = calculate force on each particle

c = update the locations

d = if reach step 100, stop, otherwise go back to b.

III. Programs with comments:

#include <math.h>

#include <time.h>

#include <stdlib.h>

#include <stdio.h>

#include "mpi.h"

#define N 1000

// Gaussian distributions

double drandGaussian(double);

double drandUniform(double, double);

int main(int argc, char *argv[])

{

int num_processor; // number of processor

int my_rank; // the rank of the processor

int numworkers; // number of slaves

int chunksize; // number of particles for

int start, end; // starting and ending location of the processor

int i; // running index

float k; // running index

int j; // running index

int indexmsg =1; // flag

int arraymsg =2; // flag

int index; // index of array

int dest; // destination processor

int me; // my_rank

int source; // source processor

float a,b,c; // distances

float dist; // distance

float res;

float starttime, endtime, interval; // starting and ending time

float delta=0.000001;

MPI_Status status;

float tempx, tempy, tempz; // temporary values

float px, py, pz; // previous value of x, y, z location

float pvx, pvy, pvz; // previous value of the velocity

float temx, temy, temz; // temporary values

// particles information

float x[N], y[N], z[N]; // x, y, z locations

float charge[N]; // charge of particle i

float vx[N], vy[N], vz[N]; // velocity in x, y, z direction

float ax[N], ay[N], az[N]; // acceleration in x, y, z direction

MPI_Init(&argc,&argv);

MPI_Comm_size(MPI_COMM_WORLD,&num_processor);

MPI_Comm_rank(MPI_COMM_WORLD,&my_rank);

numworkers = num_processor -1; // exclude the master

chunksize = N/numworkers;

srandom((unsigned)time(NULL)); // random number seed

if(my_rank>0){ // exclude the master processor

// only 1 processor

if(numworkers==1){

starttime= MPI_Wtime();

for(i=0; i<N; i++){ // initialization

charge[i] = 2*(random() / (float) 0x7fffffff)-1;

x[i] = 10*(random() / (float) 0x7fffffff);

y[i] = 10*(random() / (float) 0x7fffffff);

z[i] = 10*(random() / (float) 0x7fffffff);

vx[i]= .4;// drandGaussian(2);

vy[i]= .5;// drandGaussian(2);

vz[i]= .2;// drandGaussian(2);

}

// forces in x, y, z direction for particle i

for(i= 0; i<N; i++){

temx= 0;

temy =0;

temz =0;

for(j=0; j<N; j++){

a=b=res=dist =0;

if(j!=i) {

a= x[i]-x[j];

b= y[i]-y[j];

c= z[i]-z[j];

res= a*a+b*b+c*c;

dist = sqrt(res);

temx = temx +(float)((float)charge[i]*charge[j]*a/(float)(dist*dist*dist));

temy = temy +(float)((float)charge[i]*charge[j]*b/(float)(dist*dist*dist));

temz = temz +(float)((float)charge[i]*charge[j]*c/(float)(dist*dist*dist));

}

ax[i]= temx; // since mass =1, ax[i] is equal to fx[i]

ay[i]= temy;

az[i]= temz;

}

delta = 0.000001; // time step size

for(k=0; k<100; k++){ // do for a hundred time step

for(j=0; j<N; j++){

pvx = vx[j]; // save the velocities, location

pvy = vy[j];

pvz = vz[j];

px = x[j];

py = y[j];

pz = z[j];

// update the velocities

vx[j] = vx[j] +ax[j] *delta;

vy[j] = vy[j] +ay[j]*delta;

vz[j] = vz[j] +az[j]* delta;

// update the locations

tempx = x[j] + delta*pvx + .5*ax[j]*delta*delta;

// periodic boundary conditions

if(tempx>10)

tempx = tempx -10;

x[j]= tempx;

tempy = y[j]+ delta*pvy+ .5*ay[j]*delta*delta;

if(tempy>10)

tempy = tempy -10;

y[j] = tempy;

tempz = z[j] + delta*pvz + .5*az[j]* delta*delta;

if(tempz>10)

tempz = tempz -10;

z[j] = tempz;

}

// marks the ending time

endtime =MPI_Wtime();

interval = endtime - starttime;

printf("Only 1 processor in this case.\n");

printf("I took %f time to finish.\n", interval);

}//end numworkers==1

// there are 2 slave processors

else if(numworkers==2){

start = (my_rank -1)*chunksize;

end = start+chunksize;

if(my_rank==1){

starttime = MPI_Wtime();

// initialization

for(i=start; i<end; i++){

charge[i] = 2*(random() / (float) 0x7fffffff)-1;

x[i] = 10*(random() / (float) 0x7fffffff);

y[i] = 10*(random() / (float) 0x7fffffff);

z[i] = 10*(random() / (float) 0x7fffffff);

vx[i]=.5;//abs(drandGaussian(2));

vy[i]=.5;//abs( drandGaussian(2));

vz[i]=.5;//abs( drandGaussian(2));

}//end initialization

// do for a 100 steps

for(k=0; k<100; k++){

dest =2;

MPI_Send(&start, 1, MPI_INT, dest, indexmsg, MPI_COMM_WORLD);

MPI_Send(&charge[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);

MPI_Send(&x[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);

MPI_Send(&y[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);

MPI_Send(&z[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);

MPI_Send(&vx[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);

MPI_Send(&vy[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);

MPI_Send(&vz[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);

source =2;

MPI_Recv(&index, 1, MPI_INT, source, indexmsg, MPI_COMM_WORLD, &status);

MPI_Recv(&charge[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);

MPI_Recv(&x[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);

MPI_Recv(&y[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);

MPI_Recv(&z[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);

MPI_Recv(&vx[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);

MPI_Recv(&vy[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);

MPI_Recv(&vz[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);

// find accelerations, which equals to force components since m =1

for(i= start; i<end; i++){

temx= 0;

temy =0;

temz =0;

for(j=0; j<N; j++){

a=b=res=dist =0;

if(j!=i) {

a= x[i]-x[j];

b= y[i]-y[j];

c= z[i]-z[j];

res= a*a+b*b+c*c;

dist = sqrt(res);

temx = temx +(float)((float)charge[i]*charge[j]*a/(float)(dist*dist*dist));

temy = temy +(float)((float)charge[i]*charge[j]*b/(float)(dist*dist*dist));

temz = temz +(float)((float)charge[i]*charge[j]*c/(float)(dist*dist*dist));

}

ax[i]= temx;

ay[i]= temy;

az[i]= temz;

}//end acceleration

delta = 0.000001;

for(j=start; j<end; j++){

pvx = vx[j];

pvy = vy[j];

pvz = vz[j];

px = x[j];

py = y[j];

pz = z[j];

vx[j] = vx[j] +ax[j] *delta;

vy[j] = vy[j] +ay[j]*delta;

vz[j] = vz[j] +az[j]* delta;

tempx = x[j] + delta*pvx + .5*ax[j]*delta*delta;

if(tempx>10)

tempx = tempx -10;

x[j]= tempx;

tempy = y[j]+ delta*pvy+ .5*ay[j]*delta*delta;

if(tempy>10)

tempy = tempy -10;

y[j] = tempy;

tempz = z[j] + delta*pvz + .5*az[j]* delta*delta;

if(tempz>10)

tempz = tempz -10;

z[j] = tempz;

}//end locations update

}//end 100 loops

endtime = MPI_Wtime();

interval = endtime- starttime;

printf("I am %d and I took %f time to finish.\n", my_rank, interval);

}//end my_rank==1

else{ // start my_rank==2

start = (my_rank -1)*chunksize;

end = start+chunksize;

starttime= MPI_Wtime();

// initialization

for(i=start; i<end; i++){

charge[i] = 2*(random() / (float) 0x7fffffff)-1;

x[i] = 10*(random() / (float) 0x7fffffff);

y[i] = 10*(random() / (float) 0x7fffffff);

z[i] = 10*(random() / (float) 0x7fffffff);

vx[i]= .5;//abs(drandGaussian(2));

vy[i]=.5;//abs( drandGaussian(2));

vz[i]=.5;//abs( drandGaussian(2));

}//end initialization

// do for a 100 step

for(k=0; k<100; k++){

source =1;