AMS 530: PRINICIPLES OF PARALLEL COMPUTING
PROJECT 2: MINIMIZATION OF POTENTIAL ENERGY
Submitted by
PRACHI UGHADE
PROJECT DESCRIPTION:
Problem Description:
N particles are arranged, in 3D initially, in a 10´10´10 cube, with random coordinates for each particle. These particles interact under the Lennard-Jones potential given by
where Vij is the pair-wise potential between particles i and j with distance rij. The aim of this project is to formulate a parallel program, which attempts to minimize the total (Lennard-Jones potential) energy of the particle system by adjusting particle positions.
Solution:
The approach adopted is to use the Monte Carlo Simulation which uses random moves to explore the search space to find out some information about the space. In all, random moves are accepted such that a different region of search space is sampled at each step.
Another sampling procedure incorporates temperature of the system in the calculations This is done by calculating the Boltzmann average of a property of the system. This modified Monte Carlo method is known as a Metropolis Monte Carlo simulation.
At any given time the current state of the system (i.e. its position in the search space) is denoted State-0. It has an energy of . After a random move it will be in State-1 with an energy of .
The system is currently in some State-0 and a temperature T, and the simulation proceeds as follows.
1. Randomly change the system so that it is now in State-1, making sure that the configurations of both State-0 and State-1 are saved.
2. Compare the energies of the two states. The possible choices are given below. (R is a random number between 0 and 1 representing probability.)
3. The energy of State-1 is less than or equal to State-0. This means that the new state is accepted and becomes the new State-0.
OR
4. Now, there are 2 possible cases
a) The energy of State-1 is greater than State-0, but the energy difference is small enough that it is probabilistically accepted. This means that the new state is accepted and becomes the new State-0. This means that even though the energy of State -1 is strictly lesser than that of State-0, it is accepted to avoid the problem of getting not being able to move beyond a local optimum.
b) The energy of State-1 is greater than State-0, and the energy difference is large enough that it is probabilistically rejected. This means that the system stays in State-0.
There are two factors that control how well the simulation samples the available search space of the system; the size of each Monte Carlo step and the number of Monte Carlo steps attempted - nloop. If the size of a given Monte Carlo step is too large, the change in the energy between State-0 and State-1 may become very large and virtually all of the attempts will be probabilistically rejected. This causes the simulation to get "stuck" at a particular point in search space. Conversely, if the Monte Carlo step size is too small, the system may have a hard time sampling all of the available search space. In either case, the total number of attempts may have to become exceedingly large.
In this project,
(1) The minimal energy for N=16 particles is computed using Monte Carlo simulation algorithm by 1 node
(2) Then, using P=2, 4, 8 nodes the minimal system energy, is computed again individually. The minimizations are terminated on reaching a minimal energy similar to the minimal energy obtained by P=1 (as in (1) above)
(3) Lastly the speed-up curve is plotted.
In addition, a separate program to calculate an appropriate value of KT to be used
in the main program has also been included.
The value of KT is calculated as follows. The first particle is moved and system energy is calculated. This value is stored in E[1]. Then the first particle is restored to its original position and the second particle is moved. The system energy is recalculated and stored in E[2]. In this way system energy is calculated for all the 16 particles. The average energy of the system is found out, as well as the deviation, of each individual system energy from the average. Now the average of these deviations gives the value of KT.
ALGORITHM DESCRIPTION
1. Start
2. Run the program
3. Define the appropriate value of KT to be used
4. Initialize the MPI routines and indicate the rank and size
5. Initialize the co-ordinate positions of each of the 16 particles in the 10 x 10 x 10 cube
6. Calculate the initial system energy
7. Initialize nloop=1
8. Record the time tstart when the energy calculation begins
9. Parallelize the work between the processors (i.e each processors calculates the energy
acting on N/P particles)
10. Move each of the 16 particles randomly, one at a time (keeping the co-ordinates of
the other particles unchanged) and calculate the energy each time.
11. Choose the minimum of these energies as the new system energy and change the
original co-ordinates of the corresponding particle only, to it’s new calculated
co-ordinates
12. Repeat steps 10 and 11 and now use Monte Carlo Alogorithm to decide whether the
newly calculated energy should be adopted as the new system energy.
13. Increment nloop and repeat step 12 till nloop=2000000 or till a pre-defined energy
level is reached.
14. Record the time totaltime when the minimum energy calculation ends
15. Find the time taken to perform energy calculation (totaltime – tstart)
16. Calculate the average time required by each processor
17. Pass this average time to a central processor and calculate the average time taken by
all the processors
18. Display the minimum energy and the final co-ordinates of each particle
18. End
PROGRAMS:
/*This program minimizes the Lennard-Jones potential of a system, using Monte Carlo simulation algorithm*/
#include<stdio.h>
#include<math.h>
#include<stdlib.h>
#include"mpi.h"
#define ABS(x) x>0 ? x:-x
#define KT 14.598821
#define nloop 1000000
#define proc1engy -6.695888
#define not_accept_move 1000
struct particle /*structure to hold co-ordinates of each particle*/
{
float x;
float y;
float z;
};
int var = 1,my_rank,n=16,p,flag=0,sent=0;
struct particle temp[16];
float newcost;
float tstart,totaltime,finaltime;
main(int argc,char **argv)
{
struct particle a[16];
struct particle bestsol[16],minsol,tempminsol;
int i,j,k,l,m,pos,temppos,counter = 0;
float oldcost,bestcost,diff,mincost,tempmincost;
MPI_Status status;
void calculate_system_energy();
float rand1();
/*MPI Initialization*/
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD,&my_rank);
MPI_Comm_size(MPI_COMM_WORLD,&p);
// initial configuration
if (my_rank==(p-1))
{
for(i=0;i<n;i++)
{
a[i].x = rand1()*10;
a[i].y = rand1()*10;
a[i].z = rand1()*10;
}
}
/*processor with rank (p-1) sends initial co-ordinates to all other processors*/
if (p!=1)
{
if (my_rank == (p-1))
{
for (i=0;i<(p-1);i++)
{
for (j=0;j<n;j++)
{
MPI_Send(&a[j].x,1,MPI_DOUBLE,i,i+j+10,MPI_COMM_WORLD);
MPI_Send(&a[j].y,1,MPI_DOUBLE,i,i+j+11,MPI_COMM_WORLD);
MPI_Send(&a[j].z,1,MPI_DOUBLE,i,i+j+12,MPI_COMM_WORLD);
}//end of second for loop
}//end of first loop
}//end of if my_rank loop
else
for (j=0;j<n;j++)
{
MPI_Recv(&a[j].x,1,MPI_DOUBLE,p-1,my_rank+j+10,MPI_COMM_WORLD,&status);
MPI_Recv(&a[j].y,1,MPI_DOUBLE,p-1,my_rank+j+11,MPI_COMM_WORLD,&status);
MPI_Recv(&a[j].z,1,MPI_DOUBLE,p-1,my_rank+j+12,MPI_COMM_WORLD,&status);
}
}//end of p!= loop
for (i=0;i<n;i++) /*Initializations*/
{
bestsol[i].x = a[i].x;
bestsol[i].y = a[i].y;
bestsol[i].z = a[i].z;
temp[i].x = a[i].x;
temp[i].y = a[i].y;
temp[i].z = a[i].z;
}
calculate_system_energy(); /*This will calculate initial system energy*/
if (my_rank == p-1)
{
bestcost = newcost;
oldcost = newcost;
}
tstart=MPI_Wtime(); /*Start timer to calculate time taken for computation*/
for(i=0;i<nloop;i++)
{
if (p==1)
if (counter >= not_accept_move)
break;
/*Calculation of configuration that yields minimum energy , done parallelly by each processor*/
for(j=((n/p)*my_rank);j<=(((n/p)*(my_rank+1))-1);j++)
{
temp[j].x = rand1()*10;
temp[j].y = rand1()*10;
temp[j].z = rand1()*10;
var++;
calculate_system_energy();
if (j==((n/p)*my_rank))
{
pos=j;
mincost = newcost;
minsol.x = temp[j].x;
minsol.y = temp[j].y;
minsol.z = temp[j].z;
}
if (flag == 1)
continue;
if( mincost > newcost )
{
mincost=newcost;
minsol.x = temp[j].x;
minsol.y = temp[j].y;
minsol.z = temp[j].z;
pos=j;
}
temp[j].x=a[j].x;
temp[j].y=a[j].y;
temp[j].z=a[j].z;
}//end of j loop
/*Each processor sends it’s minimum energy configuration to the processor with rank (p-1)*/
if(p!=1)
{
if(my_rank!=(p-1))
{
MPI_Send(&mincost,1,MPI_DOUBLE,p-1,(my_rank*5)+8,MPI_COMM_WORLD);
MPI_Send(&pos,1,MPI_INT,p-1,(my_rank*5)+9,MPI_COMM_WORLD);
MPI_Send(&minsol.x,1,MPI_DOUBLE,p-1,(my_rank*5)+10,MPI_COMM_WORLD);
MPI_Send(&minsol.y,1,MPI_DOUBLE,p-1,(my_rank*5)+11,MPI_COMM_WORLD);
MPI_Send(&minsol.z,1,MPI_DOUBLE,p-1,(my_rank*5)+12,MPI_COMM_WORLD);
} // my_rank!= p-1
else
{
for(k=0;k<(p-1);k++)
{
MPI_Recv(&tempmincost,1,MPI_DOUBLE,k,(k*5)+8,MPI_COMM_WORLD,&status);
MPI_Recv(&temppos,1,MPI_INT,k,(k*5)+9,MPI_COMM_WORLD,&status);
MPI_Recv(&tempminsol.x,1,MPI_DOUBLE,k,(k*5)+10,MPI_COMM_WORLD,&status);
MPI_Recv(&tempminsol.y,1,MPI_DOUBLE,k,(k*5)+11,MPI_COMM_WORLD,&status);
MPI_Recv(&tempminsol.z,1,MPI_DOUBLE,k,(k*5)+12,MPI_COMM_WORLD,&status);
/*processor with rank (p-1)compares and accepts the minimum energy configuration amongst all it receives*/
if(tempmincost<mincost)
{
mincost=tempmincost;
pos=temppos;
minsol.x=tempminsol.x;
minsol.y=tempminsol.y;
minsol.z=tempminsol.z;
}
}
}
}
/*processor with rank (p-1) uses Monte Carlo simulation algorithm to decide whether to accept the new energy state*/
if (my_rank == (p-1))
{
if (mincost <= oldcost)
{
counter = 0;
if( bestcost >= mincost )
{
bestcost=mincost;
bestsol[pos].x = minsol.x;
bestsol[pos].y = minsol.y;
bestsol[pos].z = minsol.z;
}
a[pos].x = minsol.x;
a[pos].y = minsol.y;
a[pos].z = minsol.z;
temp[pos].x = a[pos].x;
temp[pos].y = a[pos].y;
temp[pos].z = a[pos].z;
oldcost=mincost;
if(p!=1)
{
for(m=0;m<(p-1);m++)
{
sent=1;
MPI_Send(&sent,1,MPI_INT,m,m+13,MPI_COMM_WORLD);
MPI_Send(&pos,1,MPI_DOUBLE,m,m+14,MPI_COMM_WORLD);
MPI_Send(&a[pos].x,1,MPI_DOUBLE,m,m+15,MPI_COMM_WORLD);
MPI_Send(&a[pos].y,1,MPI_DOUBLE,m,m+16,MPI_COMM_WORLD);
MPI_Send(&a[pos].z,1,MPI_DOUBLE,m,m+17,MPI_COMM_WORLD);
}
}
} //end of accepting good moves
if (mincost > oldcost)
{
diff=mincost-oldcost;
if(rand1() < exp (-diff/KT))
{
counter =0;
a[pos].x = minsol.x;
a[pos].y = minsol.y;
a[pos].z = minsol.z;
temp[pos].x = a[pos].x;
temp[pos].y = a[pos].y;
temp[pos].z = a[pos].z;
oldcost=mincost;
if(p!=1)
{
for(m=0;m<(p-1);m++)
{
sent=1;
MPI_Send(&sent,1,MPI_INT,m,m+13,MPI_COMM_WORLD);
MPI_Send(&pos,1,MPI_DOUBLE,m,m+14,MPI_COMM_WORLD);
MPI_Send(&a[pos].x,1,MPI_DOUBLE,m,m+15,MPI_COMM_WORLD);
MPI_Send(&a[pos].y,1,MPI_DOUBLE,m,m+16,MPI_COMM_WORLD);
MPI_Send(&a[pos].z,1,MPI_DOUBLE,m,m+17,MPI_COMM_WORLD);
}
}
}
else
{
++counter;
if (p!=1)
{
for(m=0;m<p-1;m++)
{
sent=0;
MPI_Send(&sent,1,MPI_INT,m,m+13,MPI_COMM_WORLD);
}
}
}
}
} // end of if my_rank = p-1
if (p!=1)
{
for(k=0;k<p-1;k++)
{
if(my_rank == k)
{
MPI_Recv(&sent,1,MPI_INT,p-1,k+13,MPI_COMM_WORLD,&status);
if(sent)
{
sent=0;
MPI_Recv(&pos,1,MPI_DOUBLE,p-1,k+14,MPI_COMM_WORLD,&status);
MPI_Recv(&a[pos].x,1,MPI_DOUBLE,p-1,k+15,MPI_COMM_WORLD,&status);
MPI_Recv(&a[pos].y,1,MPI_DOUBLE,p-1,k+16,MPI_COMM_WORLD,&status);
MPI_Recv(&a[pos].z,1,MPI_DOUBLE,p-1,k+17,MPI_COMM_WORLD,&status);
temp[pos].x = a[pos].x;
temp[pos].y = a[pos].y;
temp[pos].z = a[pos].z;
}
}
}
}
// breaking from the loop in case energy obtained equal to that obtained by 1 node*/
if(p!=1)
{
if(my_rank == (p-1))
{
for(m=0;m<(p-1);m++)
{
MPI_Send(&bestcost,1,MPI_DOUBLE,m,m+13,MPI_COMM_WORLD);
}
}
else
MPI_Recv(&bestcost,1,MPI_DOUBLE,p-1,my_rank+13,MPI_COMM_WORLD,&status);
if(bestcost <= proc1engy)
break;
}
} //end of nloop
totaltime=MPI_Wtime()-tstart;
if(my_rank==(p-1))
{
printf("\nThe final minimized potential energy of the system is %f\n",bestcost);
printf("\nThe final co-ordinate positions are as follows:-\n");
for(i=0;i<n ; i++)
printf( "\nParticle #%d : %f, %f, %f\n",i,bestsol[i].x,bestsol[i].y,bestsol[i].z);
finaltime=totaltime;
if(p!=1)
{
for(m=0;m<(p-1);m++)
{
MPI_Recv(&totaltime,1,MPI_DOUBLE,m,m+15,MPI_COMM_WORLD,&status);
finaltime+=totaltime;
}
finaltime=(finaltime/p);
}
printf("\nThe total time taken is %f seconds\n",finaltime);
}
else
if (p!=1)
{
MPI_Send(&totaltime,1,MPI_DOUBLE,p-1,my_rank+15,MPI_COMM_WORLD);
}
MPI_Finalize();
}
/*this function calculates the Lennard-Jones potential energy given the co-ordinates of the particles*/
void calculate_system_energy()
{
float sumv,v,r,r1,r2,totalsum,t;
int i,j,k;
MPI_Status status;
newcost = 0.0;
flag = 0;
sumv=0;
/*Calculation of Lennard-jones potential*/
for(i=0,j=0;i<n;i++,j++)
{
for(k=i+1;k<n;k++)
{
r=((temp[i].x-temp[k].x)*(temp[i].x-temp[k].x))+((temp[i].y-temp[k].y)*(temp[i].y- temp[k].y))+((temp[i].z-temp[k].z)*(temp[i].z-temp[k].z));
if(r>0.0000000)
{
r1 = r*r*r;
r2 = r1 * r1;
v= (1/r2) - (2/r1);
if (v > 0) flag = 1;
sumv = sumv + v;
}
}
} // i ends here
newcost = sumv;
} // end of function for calculation of energy
float rand1() //This function returns a random value between 0 and 1
{
float s;
++var;
srand(var);
s =(float) rand()/RAND_MAX;
return(s);
}
RESULTS & ANALYSIS
Results:
The timing and energy results yielded were as follows:
P =1 / P = 2 / P = 4 / P = 8Final system energy / -6.695888 / -6.695888 / -6.695888 / -6.695888
Time required / 1880.706421 / 1881.377441 / 1480.636719 / 1480.638184
The speed up is calculated using the formula:
Speed up = Time required by one processor
Time required by n processors
The Speed up is as follows
P / Speed up1 / 1
2 / 0.99
4 / 1.27
8 / 1.27
x axis: # of nodes
y axis: Speed up
Analysis:
From the results section it can be seen that two nodes take almost the same time as one node while four nodes and eight nodes take marginally less time than one node.
In case of two nodes each processor handles eight particles. The message passing in this case involves passing the co-ordinates of one particle between the processors in each iteration. The computation effort in this case is reduced significantly. Now each processor handles 8 particles rather than all 16. Even though the computation time decreases due to the use of two nodes the message passing overhead nullifies this effect. Hence two nodes and one node take almost the same time.
In case of four nodes, each processor handles four particles. Hence the computation time reduces very significantly and this is reflected in the fact that four nodes require approximately 1480 sec inspite of increased message passing while a single processor took about 1880sec to do the job. A similar explanation holds true for eight nodes.
We observe one more fact that four nodes and eight nodes take the same amount of time to do the job about 1480 sec. One possible explanation for this is as follows. In case of four nodes each processor is handling four particles while in case of eight processors each is handling two particles. For four nodes the computation time is more but the message passing time is less while for eight nodes the computation time is less while the message passing time is more. Hence both the cases require the same amount of time.
ANALYSIS OF PROGRAM PERFORMANCE:
The performance of a program running on parallel processors basically depends on the method used to parallelize the work between the processors. The more the parallelizing, the better is the performance of the system. In our program the method of Particle Decomposition has been used to parallelize the work.
The process begins with calculating the system energy for the initial co-ordinate system.
This value of system energy is stored in a variable called oldcost.
Each processor has the information of the entire co-ordinate system of particles. Now a group of particles has been assigned to each of the p processors. Each processor moves the first particle in its group and calculates the system energy. It also keeps a record of this new value of system energy along with the new co-ordinates of the particle. Next it restores the moved particle back to its position in the original configuration and moves the second particle. The system energy is again calculated. This new system energy is compared with the previous one. If the new energy is lower than the previous one then the processor records it along with the new co-ordinates of the particle. As in the previous case the second particle is moved back to its initial position. This process is carried out for all the particles in the group by each processor. At the end of the process each processor knows the movement of which particle in its group will reduce the system energy the most. All the processors convey this information to the master processor. Now the master processor finds the minimum amongst these particles and records the result in a variable called mincost.