PROJECT 2

Submitted by

Priyank Tiwari

Project Description:

Problem
In 3D, N particles are initially arranged in a 10×10×10 cube, i.e., any triplet of random numbers in [0, 10] can be the coordinates of a particle. These particles interact under the so-called Lennard-Jones potential

Vij = (1/rij)^12 - 2 (1/rij)^6

where Vij is the pair-wise potential between particles i and j at distance rij. Write a parallel program to minimize the total (Lennard-Jones) potential energy of the particle system by adjusting particle positions. Please do the following three parts:

(1) Compute the minimal energy for N=16 particles by any method of your choice on one node. Invest your best practical effort in this part of the calculation (e.g., run your code for a couple of hours on one node).

(2) Now, use P=2, 4, 8 nodes to minimize the system energy, individually. Terminate the minimizations when you reach a minimal energy similar to the minimal energy obtained with P=1 (as in (1) above). Results within 5 percent relative error are considered similar.

(3) Plot the speedup curve.

ALGORITHM

  1. Set the temperature to an initial value (10,000).
  2. Assign initial positions to the particles.
  3. Calculate the potential at these positions and store it as OldCost.
  4. While temperature > the lower threshold (0.1):
  5. Randomize the coordinates of the particles locally on each processor.
  6. Calculate the potential at the new positions and store it as NewCost.
  7. Calculate the difference DeltaCost = NewCost - OldCost.
  8. If DeltaCost < 0, copy NewCost into OldCost (indicating a good move).
  9. Otherwise, check the acceptance condition of the simulated-annealing module.
  10. Keep cooling the temperature while the condition in step 4 allows.

The cost function (before and after randomizing) is computed as follows.

/* The forces between the n = N/p local particles on each processor are computed locally */

  1. Calculate n = N/p.
  2. Set process = 0.
  3. Copy the coordinates from each processor P to processor P+1 in the ring topology (sends and receives are interleaved to avoid deadlock: even ranks send first, odd ranks receive first).
  4. Increment process.
  5. If process < p-1, go to step 3.
  6. Calculate the local potential on each processor after receiving the rotated coordinates.
  7. Each processor sends its calculated local potential to processor 0.

/* The above procedure is repeated for as long as we let the annealing run, to obtain an optimum set of particle positions */

The pseudo-code is given below:

PSEUDO-CODE

TEMPERATURE = INIT_TEMP;

X = INIT_POSITION

OldCost = COST(X)

While (TEMPERATURE > 0.1)

{

While (INNER_LOOP_CRITERION == TRUE)

{

New_position = GENERATE(X);

NewCost = COST(New_position);

DeltaCost = NewCost - OldCost;

If (DeltaCost < 0)

{

X = New_position ;

OldCost = NewCost;

}

else
{

r = RANDOM(0,1);

y = exp( -DeltaCost / TEMPERATURE);

if(r < y)

{

X = New_position ;

OldCost = NewCost;

}

else

REJECT(New_position);

}

}

TEMPERATURE = ALPHA * TEMPERATURE;// ( 0 < ALPHA < 1 )

}

The computation of the cost function is done as follows:

The forces between the n = N/p local particles on each processor are computed; this computation can proceed fully in parallel.

The local particles are first copied to the rotating particles on each processor:

for (i = 0; i < particlesPerProcess; i++)

for (k = 0; k < 3; k++)

Rotating[i][k] = r[i][k];

The following steps are repeated p - 1 times:

The rotating particles are shifted to the right neighbor processor in the ring Topology. The movement of the p sets of traveling particles can proceed in parallel.

for (process = 0; process < numberOfProcesses - 1; process++)

{

// Copy traveling particles to send buffer

for (i = 0; i < particlesPerProcess; i++)

for (k = 0; k < 3; k++)

sendBuffer[i][k] = Rotating[i][k];

// interleave send/receive to prevent deadlock

if (my_rank % 2 == 0)

{

// Even processes send/receive

// Send traveling particles to next process

MPI_Send(sendBuffer[0], 3 * particlesPerProcess, MPI_DOUBLE,

nextProcess, 0, MPI_COMM_WORLD);

// Receive traveling particles from previous process

MPI_Recv(Rotating[0], 3 * particlesPerProcess, MPI_DOUBLE,

previousProcess, 0, MPI_COMM_WORLD, &status);

}

else

{

// Odd processes receive/send

MPI_Recv(Rotating[0], 3 * particlesPerProcess, MPI_DOUBLE,

previousProcess, 0, MPI_COMM_WORLD, &status);

MPI_Send(sendBuffer[0], 3 * particlesPerProcess, MPI_DOUBLE,

nextProcess, 0, MPI_COMM_WORLD);

}

}

The forces between the n local particles and the n traveling particles on each processor are computed. This computation can proceed fully in parallel.

- The GENERATE(X) function: changes the positions of the particles by moving them and assigning new coordinates randomly.

- The Simulated Annealing algorithm parameters:

Initial temperature = 10000

Freezing temperature = 0.1

Cooling schedule parameter "alpha" = 0.8

Inner loop count = 100

PROGRAM CODE:

#include<stdio.h>

#include<math.h>

#include<stdlib.h>

#include<mpi.h>

double compute_potentials();// To compute the potential on each processor

#define Num_particles 16// Specify number of particles over here.

#define alpha 0.8// Specify value of cooling schedule parameter alpha.

// Global Declarations

int my_rank,p;

int part_per_proc, proc, min,max, Rotate, nextprocess, previousprocess, target_processor, target_particle;

int z,i,j,k,x,change,move_partnum, INNER_LOOP,offset, Recv_processor,change_particle;

double potential, communicationtime, temp, r;// starttime is declared once, below

double Rotating[Num_particles][3],sendbuffer[Num_particles][3];

double particle[Num_particles][3],temp_particle[Num_particles][3], newcod_particle[Num_particles][3], temp_potential[Num_particles];

double Temperature,oldcost,newcost,delta_cost,m,n,y,lowestcost=0.0;

double starttime=0.0,endtime=0.0,totaltime=0.0,timeelapsed=0.0,time_taken=0.0;

MPI_Status status;

int main(int argc, char *argv[])

{

MPI_Init(&argc, &argv);

MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

MPI_Comm_size(MPI_COMM_WORLD, &p);

Temperature = 10000;// Initial temperature value for SA.

printf("\n I am %d", my_rank);

part_per_proc = Num_particles/p;// Particles per processor.

printf("\n Particles per Process = %d", part_per_proc);

if(my_rank==0)

{

nextprocess = my_rank+1;

previousprocess = p-1;

printf("\n Nextprocess = %d and Previous Process =%d", nextprocess, previousprocess);

}

else

{

if(my_rank==(p-1))

{

nextprocess = 0;

previousprocess = my_rank-1;

printf("\n Nextprocess = %d and Previous Process =%d", nextprocess, previousprocess);

}

else

{

nextprocess = my_rank+1;

previousprocess = my_rank-1;

printf("\n Nextprocess = %d and Previous Process =%d", nextprocess, previousprocess);

}

}

if(my_rank==0)

{

offset = 0;

//CALCULATING THE INITIAL POSITIONS OF PARTICLES

potential = 0.0;

//srand();

for(i=0;i<Num_particles;i++)

{

for(x=0;x<3;x++)

particle[i][x]=rand()%11;

}

// Processor 0 sending the co-ordinates of n = N/p particles to every other processor

for(proc=1;proc<p;proc++)

{

offset = offset+part_per_proc;

printf("\nSending %d particles to processor %d offset=%d ",part_per_proc,proc,offset);

MPI_Send(&offset, 1, MPI_INT, proc, 1, MPI_COMM_WORLD);

MPI_Send(&particle[offset][0], 3*part_per_proc, MPI_DOUBLE, proc, 1, MPI_COMM_WORLD);

}

for(i=0;i<part_per_proc;i++)

{

//printf("\n******************");

for(k=0;k<3;k++)

{

temp_particle[i][k] = particle[i][k];

//printf(" %f\t",temp_particle[i][k]);

}

}

}

else

{

MPI_Recv(&offset, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);

MPI_Recv(&particle[0][0], 3*part_per_proc, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, &status);

for(i=0;i<part_per_proc;i++)

for(k=0;k<3;k++)

temp_particle[i][k] = particle[i][k];

}

//COMPUTE THE POTENTIAL FOR ALL THE PARTICLES

oldcost = compute_potentials();

// START THE TIMER

starttime = MPI_Wtime(); //starting timer

// OUTER LOOP FOR SA

while(Temperature>0.1)// FREEZING TEMPERATURE = 0.1

{

INNER_LOOP = 100;// No. of iterations for INNER_LOOP = 100

while(INNER_LOOP!=0)

{

for(z=0;z<Num_particles;z++)

{

Recv_processor = z/part_per_proc;

change_particle = z % part_per_proc;

if(my_rank==0)

{

srand(rand());

for(k=0;k<3;k++)

{

temp_particle[z][k] = rand() % 11;// Move one particle at a time randomly

// printf(" %f\t",temp_particle[z][k]);

}

if(p>1 && Recv_processor!=0)

MPI_Send(&temp_particle[z][0],3, MPI_DOUBLE, Recv_processor, 1, MPI_COMM_WORLD);

}

if(p>1 && Recv_processor!=0 && my_rank==Recv_processor)

{

MPI_Recv(&temp_particle[change_particle][0],3, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, &status);

}

// COMPUTE THE POTENTIAL AS THE NEW COST FUNCTION.

temp_potential[z] = compute_potentials();

if(my_rank==0)

{

for(k=0;k<3;k++)

{

newcod_particle[z][k] = temp_particle[z][k];

temp_particle[z][k] = particle[z][k];

}

}

if(p>1 && Recv_processor!=0 && my_rank==Recv_processor)

{

{

for(k=0;k<3;k++)

{

newcod_particle[change_particle][k] = temp_particle[change_particle][k];

temp_particle[change_particle][k] = particle[change_particle][k];

}

}

}

// COMPARISON OF THE COST FUNCTION AND ACCEPTANCE OF THE NEW STATE

if(my_rank==0)

{

change = 0;

move_partnum = 0;

newcost = temp_potential[0];

for(i=0;i<Num_particles;i++)

{

if(temp_potential[i] < newcost)

{

newcost = temp_potential[i];

move_partnum = i;

}

}

//CALCULATE THE DIFFERENCE BETWEEN THE COST FUNCTIONS

delta_cost = newcost - oldcost;

if(delta_cost < 0)

{

oldcost = newcost;

change = 1;

}

else

{

m = (float)(rand())/RAND_MAX;

// printf("\nvalue of m is %f",m);

n = delta_cost/Temperature;

y = 1/exp(n);// ACCEPTANCE PROBABILITY FOR THE UPCLIMB MOVES.

if(m<y)// = EXP( - DELTACOST / TEMPERATURE)

{

oldcost = newcost;// MAKE THE MOVE PERMANENT

change = 1;

}

else change = 0;

}

if(change == 1)

{

for(k=0;k<3;k++)

particle[move_partnum][k] = newcod_particle[move_partnum][k];

if(oldcost < lowestcost)

lowestcost = oldcost;// TRACK THE BEST COST SEEN SO FAR

}

target_processor = move_partnum/part_per_proc;

target_particle = move_partnum % part_per_proc;

}

// ALL RANKS TAKE PART IN THE COLLECTIVE BROADCASTS

MPI_Bcast(&change,1, MPI_INT,0, MPI_COMM_WORLD);

MPI_Bcast(&target_processor,1, MPI_INT,0, MPI_COMM_WORLD);

MPI_Bcast(&target_particle,1, MPI_INT,0, MPI_COMM_WORLD);

// APPLY THE MOVE ON THE OWNING RANK ONLY WHEN IT WAS ACCEPTED

if(change==1 && my_rank==target_processor)

{

for(k=0;k<3;k++)

particle[target_particle][k] = newcod_particle[target_particle][k];

}

INNER_LOOP--;

}// END of INNER_LOOP

Temperature = alpha*Temperature;// COOLING SCHEDULE

}// END OF OUTER LOOP

if(my_rank==0)

printf("\n Lowest Cost = %f ", lowestcost);

endtime = MPI_Wtime();//stopping the time

time_taken=endtime-starttime;//computing time taken

// TIMING CALCULATIONS

if(my_rank!=0)// RANK 0 KEEPS ITS OWN TIMING; A SELF-SEND WOULD HAVE NO MATCHING RECEIVE

MPI_Send(&time_taken,1,MPI_DOUBLE,0,my_rank*3,MPI_COMM_WORLD);

if(my_rank==0)

{

totaltime=time_taken;

for(j=1;j<p;j++)

{

MPI_Recv(&timeelapsed,1,MPI_DOUBLE,j,j*3,MPI_COMM_WORLD,&status);

totaltime+=timeelapsed;

}

printf("\n\n Time taken for execution of the program is %f seconds\n",totaltime/p);

}

MPI_Finalize();

}//End of Main

/*************************************************************************************************************/

// COMPUTING THE POTENTIALS ON EACH PROCESSOR PARALLELY

double compute_potentials()

{

potential = 0.0;

// CALCULATING THE LOCAL POTENTIALS

for(i=0;i<part_per_proc;i++)

{

for(j=0;j<part_per_proc;j++)

{

r = pow(temp_particle[i][0] - temp_particle[j][0],2)+ pow(temp_particle[i][1] - temp_particle[j][1],2) + pow(temp_particle[i][2] - temp_particle[j][2],2);

if(r!=0)

potential+= (1.0/pow(r,6)) - (2.0/pow(r,3));

}

}

if(p>1)

{

//COPY THE POSITIONS OF THE LOCAL PARTICLES TO ROTATING PARTICLES

for(i=0;i<part_per_proc;i++)

for(k=0;k<3;k++)

Rotating[i][k] = temp_particle[i][k];

//The following steps are repeated p - 1 times:

for(Rotate=0;Rotate<p-1;Rotate++)

{

// Copy traveling particles to send buffer

for(i=0;i<part_per_proc;i++)

for(k=0;k<3;k++)

sendbuffer[i][k] = Rotating[i][k];

// Interleave send/receive to prevent deadlock

if(my_rank%2 == 0)

{

// Even processes send/receive

// Send traveling particles to next process

MPI_Send(&sendbuffer[0], 3*part_per_proc, MPI_DOUBLE, nextprocess,0, MPI_COMM_WORLD);

MPI_Recv(&Rotating[0], 3*part_per_proc, MPI_DOUBLE, previousprocess, 0, MPI_COMM_WORLD, &status);

}

else

{

MPI_Recv(&Rotating[0], 3*part_per_proc, MPI_DOUBLE, previousprocess, 0, MPI_COMM_WORLD, &status);

MPI_Send(&sendbuffer[0], 3*part_per_proc, MPI_DOUBLE, nextprocess, 0, MPI_COMM_WORLD);

}

//CALCULATING THE CONTRIBUTIONS TO LOCAL FORCES FROM ROTATING PARTICLES

for(i=0;i<part_per_proc;i++)

{

for(j=0;j<part_per_proc;j++)

{

r = pow(temp_particle[i][0] - Rotating[j][0],2)+ pow(temp_particle[i][1] - Rotating[j][1],2) + pow(temp_particle[i][2] - Rotating[j][2],2);

if(r!=0)

potential+= (1.0/pow(r,6)) - (2.0/pow(r,3));

}

}

}

// SENDING THE POTENTIAL VALUES FOR EACH PROCESSOR TO NODE 0

if(my_rank!=0)

MPI_Send(&potential, 1, MPI_DOUBLE, 0, my_rank*10, MPI_COMM_WORLD);

if(my_rank==0)

{

for(j=1;j<p;j++)

{

MPI_Recv(&temp, 1, MPI_DOUBLE, j, j*10, MPI_COMM_WORLD, &status);

potential+=temp;

}

}

}

return potential;

}

ANALYSIS :

The results have been plotted for 16 and 32 particles (for 1, 2, 4, and 8 processors).

For 16 particles we see speedup only for 2 processors; thereafter the execution time increases with the number of processors. The reason may be attributed to the large overhead of parallel processing, i.e., the communication time exceeds the computation time.

For 32 particles we see speedup for 2, 4, and 8 processors as well, because there the computation time exceeds the communication time.

16 particles are not enough to observe the speedup, so we also took readings for 32 particles, where a decent speedup can be observed.

RESULTS:

Timing Results:

Speed up curve:

MONTE CARLO

  • We also tried the Monte Carlo method; the code is attached below.

With the Monte Carlo method no speedup is observed for the 16-particle case, but a very feeble speedup is obtained for 32 particles.

PROGRAM ::

#include<math.h>

#include<mpi.h>

#include<stdlib.h>

#include<stdio.h>

//definition of maximum no of particles and the number of processors

#define MAX_PARTICLE 16 //change this value to change the maximum number of particles

#define NUM_PROCS 8

void change_coord(int pos);

void reset_potential();

void calc_potential();

void calc_energy();

double calc_kt();

struct particle

{

int x,y,z;

double potential;

};

//defining a number of particles

struct particle par[MAX_PARTICLE];

double r;

double trial,current,energy,minimum_eng;

double kt,total1,total2,avg1;

double eng[MAX_PARTICLE],diff[MAX_PARTICLE];

double each_ans[NUM_PROCS], temp_result[NUM_PROCS]; //declaration of variables for holding values

int p,my_rank,min,max,each;

//MPI code

MPI_Status status;

//Main function

int main(int argc, char *argv[])

{

//rank of processor

int i,j,k,particle_no,count; //declaration of variables required for calculations

double c,a;//distance between 2 particles

double starttime=0, endtime=0; //variables used for calculating execution time

double time_taken=0,timeelapsed=0,totaltime=0; //variables used for calculating execution time

MPI_Init(&argc, &argv); //Initiate MPI

MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); //Set COMM and my_rank

MPI_Comm_size(MPI_COMM_WORLD, &p); //find COMM's size

//assigning number of particles to each processor

each = MAX_PARTICLE/p;

/* Assigning the charges and co-ordinates to the particles */

starttime = MPI_Wtime(); //starting timer

if(my_rank==0)

{

for(i=0; i<MAX_PARTICLE; i++)

{

par[i].x=((int)((rand()%10)+1));

par[i].y=((int)((rand()%10)+1));

par[i].z=((int)((rand()%10)+1));

par[i].potential = 0;

for(j=1;j<p;j++)// rank 0 keeps its own copy; a self-send would have no matching receive

{

MPI_Send(&par[i].x,1,MPI_INT,j,j*10,MPI_COMM_WORLD);

MPI_Send(&par[i].y,1,MPI_INT,j,j*10,MPI_COMM_WORLD);

MPI_Send(&par[i].z,1,MPI_INT,j,j*10,MPI_COMM_WORLD);

MPI_Send(&par[i].potential,1,MPI_DOUBLE,j,j*10,MPI_COMM_WORLD);

}

}

}

else

{ for(j=0;j<p;j++)

if((my_rank==j)&&(my_rank!=0))

for(i=0;i<MAX_PARTICLE;i++)

{

MPI_Recv(&par[i].x,1,MPI_INT,0,j*10,MPI_COMM_WORLD,&status);

MPI_Recv(&par[i].y,1,MPI_INT,0,j*10,MPI_COMM_WORLD,&status);

MPI_Recv(&par[i].z,1,MPI_INT,0,j*10,MPI_COMM_WORLD,&status);

MPI_Recv(&par[i].potential,1,MPI_DOUBLE,0,j*10,MPI_COMM_WORLD,&status);

}

}

calc_potential();

energy=0;

if(my_rank==0)

{

for(count=0;count<MAX_PARTICLE;count++)

{

energy+=par[count].potential;

}

printf("\nEnergy of the system before particles are moved is %lf",energy);

}

current=energy;

kt=calc_kt();

if(my_rank==0) printf("\n\nWait patiently while program is being executed");

if(p>1) minimum_eng=-4.074829;

else minimum_eng=current;

while(trial>minimum_eng)

for(count=0;count<10;count++)

{

//reset_potential();

if(my_rank==0)

{

particle_no=rand()%MAX_PARTICLE;// index must stay within 0..MAX_PARTICLE-1

change_coord(particle_no);

reset_potential();

for(i=0;i<NUM_PROCS;i++)

{

each_ans[i]=0;

temp_result[i]=0;

}

}

calc_potential();

if(my_rank==0)calc_energy();

if(my_rank==0)

{

if(p>1)

{

if(trial>minimum_eng)

{

if(trial<current) current=trial;

else if(trial>=current)

{

c=((double)rand())/RAND_MAX;// uniform random in [0,1]; the original expression could divide by zero

a=(-(trial-current))/10;

if(c<exp(a))

current=trial;

else current=current;

}

}

}

else

{

if(trial<current) current=trial;

else if(trial>=current)

{

c=((double)rand())/RAND_MAX;// uniform random in [0,1]; the original expression could divide by zero

a=(-(trial-current))/10;

if(c<exp(a))

current=trial;

else current=current;

}

}

}

}

endtime = MPI_Wtime();//stopping the timer

time_taken=endtime-starttime;//computing time taken

if(my_rank==0)

printf("\n\nminimum energy of the system is %lf",current);

//Calculating the time taken by each processor and then sending it to processor with rank 0

if(my_rank!=0)// rank 0 keeps its own timing; a self-send would have no matching receive

MPI_Send(&time_taken,1,MPI_DOUBLE,0,my_rank*3,MPI_COMM_WORLD);

if(my_rank==0)

{

totaltime=time_taken;

for(j=1;j<p;j++)

{

MPI_Recv(&timeelapsed,1,MPI_DOUBLE,j,j*3,MPI_COMM_WORLD,&status);

totaltime+=timeelapsed;

}

printf("\n\n Time taken for execution of the program is %f seconds\n",totaltime/p);

}

MPI_Finalize();

}

//END OF MAIN FUNCTION

double calc_kt()

{

int count,particle_no,j;

double res;

for(count=0;count<3;count++)

{

particle_no=(rand()%3)+1;

change_coord(particle_no);

reset_potential();

calc_potential();

for(j=0;j<MAX_PARTICLE;j++)

eng[count]+=par[j].potential;

}

for(count=0;count<3;count++)

total1+=eng[count];

avg1=total1/3;

for(count=0;count<3;count++)

{diff[count]=eng[count]-avg1;

if(diff[count]<0) diff[count]=-(diff[count]);

total2+=diff[count];

}

res=total2/3;

return(res);

}

void change_coord(int pos)

{

par[pos].x=((int)((rand()%10)+1));

par[pos].y=((int)((rand()%10)+1));

par[pos].z=((int)((rand()%10)+1));

par[pos].potential = 0;

}

void reset_potential()

{

int i;

for(i=0;i<MAX_PARTICLE;i++)

{

par[i].potential=0;

}

}

void calc_potential()

{

int i,j=0,k,count;

for(i=0; i<MAX_PARTICLE; i++)

{

for(j=0;j<p;j++)

{

each_ans[j]=0;

if(my_rank==j)

{

min = (my_rank*each); //calculating which particles are being given to processor with rank j

max = (each*(my_rank+1));

for(k=min; k<max; k++)

{

r = sqrt(pow(par[i].x - par[k].x,2)+ pow(par[i].y - par[k].y,2)+

pow(par[i].z - par[k].z,2)); //compute distance between 2 particles

if(r!=0)

each_ans[j]+= (1/pow(r,12))-(2/pow(r,6)); //calculation of potential by each processor

else each_ans[j]+= 0;

}

if(my_rank!=0)

MPI_Send(&each_ans[j],1,MPI_DOUBLE,0,j*10,MPI_COMM_WORLD);//Send answer to processor with rank 0

}

}

if(my_rank==0)

{

par[i].potential = each_ans[0];

for(j=1;j<p;j++)

{

MPI_Recv(&temp_result[j],1,MPI_DOUBLE,j,j*10,MPI_COMM_WORLD,&status);//Receive data from other processors

par[i].potential+=temp_result[j]; //Total potential on particle i

}

//printf("\npotential of particle %d=%lf",i,par[i].potential);

}

}

}

void calc_energy()

{

int j;

trial=0;

for(j=0;j<MAX_PARTICLE;j++)

{

trial+=par[j].potential;

}

}