Name: Vi Thong
I. Project description:
This project is to find the locations of N particles confined in a 10x10x10 cube. Each particle has charge in the range [-1, 1] Coulomb and initial locations. The particles move under the influence of the Coulomb’s force and the following assumptions:
a) The mass of each particle is normalized to 1.
b) The components of the initial velocities of these particles follow the Gaussain distribution centered at (0, 0, 0) with variance 4.
c) Periodic boundary conditions are constrained in every dimension of the cube.
d) The particles move under the Newton’s second law.
The time step to find the final location is 100 and the step size is 10 –6 units. The speed up curve is plot for P =2, 4, 8, 16 processors and the number of particles N = 10000, 20000, 40000.
II. Algorithm description:
First, the N particles are distributed evenly among the processors to be initialized. The initialization is done on the charge, locations, and velocity. The charge of each particle is in the range [-1, 1]. The initial location of each particle is in the range [0, 10] for the x, y, and z directions. The initial velocities are in the range [0, 1]. All of these can be done by calling the random () functions provided in the C math library. Since this function returns value in the range [0, 1], appropriate parameters needed to be included to get the desired ranges of values above.
Second, after the initialization step above, each processor needs to get the values from the other processors. There are cases to consider when P =1, 2, 4, 8 where P is the number of processors.
If P =1 then only one processor to perform the entire initialization and updating steps.
If P =2 then each processor needs to exchange values with each other. That is to say each processor sends out it parts of values and receive the missing part from the other processor. Then each processor finds out the acceleration components in the x, y, z direction for each particle, update the locations and velocities of the particle. Then each processor exchange values again for 100 times.
In order to update the locations and velocities of the particles, each processor use the following formulas:
V (t) = V (t-Dt) + A (t-Dt) *Dt
X (t) = X (t-Dt) + V (t-Dt) *Dt + 0.5 * A (t-Dt) *Dt*Dt
Where V is the velocity and X is the location and A the acceleration. Value of Dt is 0.000001.
If P =4, then there are two parallel steps when processors are exchanging values. First 1 « 2, 3«4. Then 1«3, and 2«4, where « means exchange of values between the two processors.
If P =8, then there are three parallel steps when processors are exchanging values. First step is K«(K+1). Second step is: 1«3, 2«4, 5«7, 6«8. Third step is 1«5, 2«6, 3«7, 4«8.
To sum up, the flowchart is the following:
(a) ® (b) ®(c) ®(d)
where a = set particles charges, locations, and velocities
b = calculate force on each particle
c = update the locations
d = if reach step 100, stop, otherwise go back to b.
III. Programs with comments:
#include <math.h>
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#include "mpi.h"
#define N 1000
// Gaussian distributions
double drandGaussian(double);
double drandUniform(double, double);
int main(int argc, char *argv[])
{
int num_processor; // number of processor
int my_rank; // the rank of the processor
int numworkers; // number of slaves
int chunksize; // number of particles for
int start, end; // starting and ending location of the processor
int i; // running index
float k; // running index
int j; // running index
int indexmsg =1; // flag
int arraymsg =2; // flag
int index; // index of array
int dest; // destination processor
int me; // my_rank
int source; // source processor
float a,b,c; // distances
float dist; // distance
float res;
float starttime, endtime, interval; // starting and ending time
float delta=0.000001;
MPI_Status status;
float tempx, tempy, tempz; // temporary values
float px, py, pz; // previous value of x, y, z location
float pvx, pvy, pvz; // previous value of the velocity
float temx, temy, temz; // temporary values
// particles information
float x[N], y[N], z[N]; // x, y, z locations
float charge[N]; // charge of particle i
float vx[N], vy[N], vz[N]; // velocity in x, y, z direction
float ax[N], ay[N], az[N]; // acceleration in x, y, z direction
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&num_processor);
MPI_Comm_rank(MPI_COMM_WORLD,&my_rank);
numworkers = num_processor -1; // exclude the master
chunksize = N/numworkers;
srandom((unsigned)time(NULL)); // random number seed
if(my_rank>0){ // exclude the master processor
// only 1 processor
if(numworkers==1){
starttime= MPI_Wtime();
for(i=0; i<N; i++){ // initialization
charge[i] = 2*(random() / (float) 0x7fffffff)-1;
x[i] = 10*(random() / (float) 0x7fffffff);
y[i] = 10*(random() / (float) 0x7fffffff);
z[i] = 10*(random() / (float) 0x7fffffff);
vx[i]= .4;// drandGaussian(2);
vy[i]= .5;// drandGaussian(2);
vz[i]= .2;// drandGaussian(2);
}
// forces in x, y, z direction for particle i
for(i= 0; i<N; i++){
temx= 0;
temy =0;
temz =0;
for(j=0; j<N; j++){
a=b=res=dist =0;
if(j!=i) {
a= x[i]-x[j];
b= y[i]-y[j];
c= z[i]-z[j];
res= a*a+b*b+c*c;
dist = sqrt(res);
temx = temx +(float)((float)charge[i]*charge[j]*a/(float)(dist*dist*dist));
temy = temy +(float)((float)charge[i]*charge[j]*b/(float)(dist*dist*dist));
temz = temz +(float)((float)charge[i]*charge[j]*c/(float)(dist*dist*dist));
}
}
ax[i]= temx; // since mass =1, ax[i] is equal to fx[i]
ay[i]= temy;
az[i]= temz;
}
delta = 0.000001; // time step size
for(k=0; k<100; k++){ // do for a hundred time step
for(j=0; j<N; j++){
pvx = vx[j]; // save the velocities, location
pvy = vy[j];
pvz = vz[j];
px = x[j];
py = y[j];
pz = z[j];
// update the velocities
vx[j] = vx[j] +ax[j] *delta;
vy[j] = vy[j] +ay[j]*delta;
vz[j] = vz[j] +az[j]* delta;
// update the locations
tempx = x[j] + delta*pvx + .5*ax[j]*delta*delta;
// periodic boundary conditions
if(tempx>10)
tempx = tempx -10;
x[j]= tempx;
tempy = y[j]+ delta*pvy+ .5*ay[j]*delta*delta;
if(tempy>10)
tempy = tempy -10;
y[j] = tempy;
tempz = z[j] + delta*pvz + .5*az[j]* delta*delta;
if(tempz>10)
tempz = tempz -10;
z[j] = tempz;
}
}
// marks the ending time
endtime =MPI_Wtime();
interval = endtime - starttime;
printf("Only 1 processor in this case.\n");
printf("I took %f time to finish.\n", interval);
}//end numworkers==1
// there are 2 slave processors
else if(numworkers==2){
start = (my_rank -1)*chunksize;
end = start+chunksize;
if(my_rank==1){
starttime = MPI_Wtime();
// initialization
for(i=start; i<end; i++){
charge[i] = 2*(random() / (float) 0x7fffffff)-1;
x[i] = 10*(random() / (float) 0x7fffffff);
y[i] = 10*(random() / (float) 0x7fffffff);
z[i] = 10*(random() / (float) 0x7fffffff);
vx[i]=.5;//abs(drandGaussian(2));
vy[i]=.5;//abs( drandGaussian(2));
vz[i]=.5;//abs( drandGaussian(2));
}//end initialization
// do for a 100 steps
for(k=0; k<100; k++){
dest =2;
MPI_Send(&start, 1, MPI_INT, dest, indexmsg, MPI_COMM_WORLD);
MPI_Send(&charge[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&x[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&y[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&z[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vx[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vy[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vz[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
source =2;
MPI_Recv(&index, 1, MPI_INT, source, indexmsg, MPI_COMM_WORLD, &status);
MPI_Recv(&charge[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&x[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&y[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&z[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vx[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vy[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vz[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
// find accelerations, which equals to force components since m =1
for(i= start; i<end; i++){
temx= 0;
temy =0;
temz =0;
for(j=0; j<N; j++){
a=b=res=dist =0;
if(j!=i) {
a= x[i]-x[j];
b= y[i]-y[j];
c= z[i]-z[j];
res= a*a+b*b+c*c;
dist = sqrt(res);
temx = temx +(float)((float)charge[i]*charge[j]*a/(float)(dist*dist*dist));
temy = temy +(float)((float)charge[i]*charge[j]*b/(float)(dist*dist*dist));
temz = temz +(float)((float)charge[i]*charge[j]*c/(float)(dist*dist*dist));
}
}
ax[i]= temx;
ay[i]= temy;
az[i]= temz;
}//end acceleration
delta = 0.000001;
for(j=start; j<end; j++){
pvx = vx[j];
pvy = vy[j];
pvz = vz[j];
px = x[j];
py = y[j];
pz = z[j];
vx[j] = vx[j] +ax[j] *delta;
vy[j] = vy[j] +ay[j]*delta;
vz[j] = vz[j] +az[j]* delta;
tempx = x[j] + delta*pvx + .5*ax[j]*delta*delta;
if(tempx>10)
tempx = tempx -10;
x[j]= tempx;
tempy = y[j]+ delta*pvy+ .5*ay[j]*delta*delta;
if(tempy>10)
tempy = tempy -10;
y[j] = tempy;
tempz = z[j] + delta*pvz + .5*az[j]* delta*delta;
if(tempz>10)
tempz = tempz -10;
z[j] = tempz;
}//end locations update
}//end 100 loops
endtime = MPI_Wtime();
interval = endtime- starttime;
printf("I am %d and I took %f time to finish.\n", my_rank, interval);
}//end my_rank==1
else{ // start my_rank==2
start = (my_rank -1)*chunksize;
end = start+chunksize;
starttime= MPI_Wtime();
// initialization
for(i=start; i<end; i++){
charge[i] = 2*(random() / (float) 0x7fffffff)-1;
x[i] = 10*(random() / (float) 0x7fffffff);
y[i] = 10*(random() / (float) 0x7fffffff);
z[i] = 10*(random() / (float) 0x7fffffff);
vx[i]= .5;//abs(drandGaussian(2));
vy[i]=.5;//abs( drandGaussian(2));
vz[i]=.5;//abs( drandGaussian(2));
}//end initialization
// do for a 100 step
for(k=0; k<100; k++){
source =1;
MPI_Recv(&index, 1, MPI_INT, source, indexmsg, MPI_COMM_WORLD, &status);
MPI_Recv(&charge[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&x[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&y[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&z[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vx[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vy[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vz[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
dest =1;
MPI_Send(&start, 1, MPI_INT, dest, indexmsg, MPI_COMM_WORLD);
MPI_Send(&charge[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&x[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&y[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&z[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vx[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vy[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vz[start],chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
// force or acclerations components
for(i= start; i<end; i++){
temx= 0;
temy =0;
temz =0;
for(j=0; j<N; j++){
a=b=res=dist =0;
if(j!=i) {
a= x[i]-x[j];
b= y[i]-y[j];
c= z[i]-z[j];
res= a*a+b*b+c*c;
dist = sqrt(res);
temx = temx +(float)((float)charge[i]*charge[j]*a/(float)(dist*dist*dist));
temy = temy +(float)((float)charge[i]*charge[j]*b/(float)(dist*dist*dist));
temz = temz +(float)((float)charge[i]*charge[j]*c/(float)(dist*dist*dist));
}
}
ax[i]= temx;
ay[i]= temy;
az[i]= temz;
}//end acceleration
delta = 0.000001;
for(j=start; j<end; j++){
pvx = vx[j];
pvy = vy[j];
pvz = vz[j];
px = x[j];
py = y[j];
pz = z[j];
vx[j] = vx[j] +ax[j] *delta;
vy[j] = vy[j] +ay[j]*delta;
vz[j] = vz[j] +az[j]* delta;
tempx = x[j] + delta*pvx + .5*ax[j]*delta*delta;
if(tempx>10)
tempx = tempx -10;
x[j]= tempx;
tempy = y[j]+ delta*pvy+ .5*ay[j]*delta*delta;
if(tempy>10)
tempy = tempy -10;
y[j] = tempy;
tempz = z[j] + delta*pvz + .5*az[j]* delta*delta;
if(tempz>10)
tempz = tempz -10;
z[j] = tempz;
}//end coord update
}//end 100 loops
endtime = MPI_Wtime();
interval = endtime- starttime;
printf("I am %d and I took %f time to finsish.\n", my_rank, interval);
}//end my_rank==2
}//end numworkers==2
// there are 4 slave processors
else if(numworkers==4){
starttime = MPI_Wtime();
start = (my_rank -1)*chunksize;
end = start+chunksize;
// initialization
for(i=start ; i<end; i++){
charge[i] = 2*(random() / (float) 0x7fffffff)-1;
x[i] = 10*(random() / (float) 0x7fffffff);
y[i] = 10*(random() / (float) 0x7fffffff);
z[i] = 10*(random() / (float) 0x7fffffff);
vx[i]=.5;//abs(drandGaussian(2));
vy[i]=.5;//abs( drandGaussian(2));
vz[i]=.5;//abs( drandGaussian(2));
}
for(j=0; j<100; j++){ // starts the 100 loops
me = my_rank;
// 1st step : exchange neighboring processors like this 1<->2, 3<->4, 5 <->6
if((me%2)==1){ // odd guy
dest = me+1;
MPI_Send(&start, 1, MPI_INT, dest, indexmsg, MPI_COMM_WORLD);
MPI_Send(&charge[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&x[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&y[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&z[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vx[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vy[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vz[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
source =me+1;
MPI_Recv(&index, 1, MPI_INT, source, indexmsg, MPI_COMM_WORLD, &status);
MPI_Recv(&charge[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&x[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&y[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&z[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vx[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vy[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vz[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
}
else{ // even guy 1<->2, 3<->4, ...etc
source =me-1;
MPI_Recv(&index, 1, MPI_INT, source, indexmsg, MPI_COMM_WORLD, &status);
MPI_Recv(&charge[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&x[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&y[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&z[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vx[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vy[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vz[index], chunksize, MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
dest = me-1;
MPI_Send(&start, 1, MPI_INT, dest, indexmsg, MPI_COMM_WORLD);
MPI_Send(&charge[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&x[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&y[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&z[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vx[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vy[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vz[start], chunksize, MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
}
start = (my_rank -1)*chunksize;
// 2nd step: exchange with processor two steps away
if(me==1){
dest = me+2;
MPI_Send(&start, 1, MPI_INT, dest, indexmsg, MPI_COMM_WORLD);
MPI_Send(&charge[start],(2*chunksize), MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&x[start],(2*chunksize), MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&y[start],(2*chunksize), MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&z[start],(2*chunksize), MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vx[start],(2*chunksize), MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vy[start],(2*chunksize), MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vz[start],(2*chunksize), MPI_INT, dest, arraymsg, MPI_COMM_WORLD);
source =me+2;
MPI_Recv(&index, 1, MPI_INT, source, indexmsg, MPI_COMM_WORLD, &status);
MPI_Recv(&charge[index], (2*chunksize), MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&x[index], (2*chunksize), MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&y[index], (2*chunksize), MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&z[index], (2*chunksize), MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vx[index], (2*chunksize), MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vy[index], (2*chunksize), MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vz[index], (2*chunksize), MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
}
if(me==3){
source =me-2;
MPI_Recv(&index, 1, MPI_INT, source, indexmsg, MPI_COMM_WORLD, &status);
MPI_Recv(&charge[index], (2*chunksize), MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&x[index], (2*chunksize), MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&y[index], (2*chunksize), MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&z[index], (2*chunksize), MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vx[index], (2*chunksize), MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vy[index], (2*chunksize), MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
MPI_Recv(&vz[index], (2*chunksize), MPI_INT, source, arraymsg, MPI_COMM_WORLD, &status);
dest = me-2;
MPI_Send(&start, 1, MPI_INT, me-2, indexmsg, MPI_COMM_WORLD);
MPI_Send(&charge[start], (2*chunksize), MPI_INT, me-2, arraymsg, MPI_COMM_WORLD);
MPI_Send(&x[start], (2*chunksize), MPI_INT, me-2, arraymsg, MPI_COMM_WORLD);
MPI_Send(&y[start], (2*chunksize), MPI_INT, me-2, arraymsg, MPI_COMM_WORLD);
MPI_Send(&z[start], (2*chunksize), MPI_INT, me-2, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vx[start], (2*chunksize), MPI_INT, me-2, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vy[start], (2*chunksize), MPI_INT, me-2, arraymsg, MPI_COMM_WORLD);
MPI_Send(&vz[start], (2*chunksize), MPI_INT, me-2, arraymsg, MPI_COMM_WORLD);
}
if(me==2){