13th Annual SIAM Conference on Parallel Processing for Scientific Computing

A Parallel Algorithm for Optimization-Based Smoothing of Unstructured 3-D Meshes

Vincent Charles Betro

University of Tennessee at Chattanooga

SimCenter: National Center for Computational Engineering

Abstract

Serial optimization-based smoothing algorithms are computationally expensive. Using Metis (or ParMetis) to partition the mesh, the parallel algorithm decides whether to move each processor's internal nodes based on a cost function derived from the Jacobians and condition numbers of the surrounding elements. Ghost nodes are used to communicate new positions, the owning processor uses this new information to move partition-boundary nodes, and the process repeats. The result is a ready-to-use decomposed mesh.

Introduction to Mesh Smoothing

Sliver cells, or cells with very high aspect ratios and very small volumes, are often generated through the general cutting process used to create hybrid hierarchical unstructured Cartesian meshes.[1] These cells are not optimal for use with most flow solvers, so an optimization-based smoother is helpful for improving their quality, giving them more reasonable volumes and lower aspect ratios.[2]

Some optimization-based smoothing algorithms use the solution to only one function to choose an optimal movement direction for each node (such as maximizing the minimum dihedral angle), and as such are not computationally expensive and are easy to implement in parallel.[3] These methods, like the one used here, use the tetrahedron as the basis element, and these tetrahedra can be derived by breaking down the four basic element types. The algorithm here differs from the one referenced above in that, beyond using the Jacobian or condition number of the element to build a cost function that is minimized at each node, the actual movements are based on a pre-determined epsilon that is a fraction of the minimum spacing in the mesh. If this cost function is not lowered by one type of movement, two others are available to assist in lowering the cost function for the node. This prevents the algorithm from moving one point so far that it becomes a detriment to the surrounding nodes. While this approach is equally simple to parallelize and the extra search directions are not significantly more computationally expensive, the ability to work on arbitrarily large meshes has necessitated a parallel implementation.

Serial Smoothing Algorithm

The smoothing process need not work on all of the nodes in the mesh. Therefore, the algorithm determines whether or not to move a node based on a cost function for the node derived from the elements that surround it. Additionally, it may be determined from inspection of the initial mesh that only boundary nodes, only interior nodes, or both need to be moved, and this option has been built into P_OPT. The cost function for each individual element is determined using the following equations.
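As a concrete illustration of this selection step, the sketch below loops over a rank's nodes and keeps only those whose node cost exceeds a user-supplied threshold (the 0.1 and 0.9 values discussed later) and that match the requested boundary/interior option. The names (selectNodesToMove, SmoothTarget, nodeCost) are illustrative and not taken from P_OPT, and the direction of the threshold comparison is an assumption.

// Illustrative sketch (not P_OPT source): decide which nodes to smooth based on
// a per-node cost threshold and on whether boundary nodes, interior nodes, or
// both were requested. All names here are hypothetical.
#include <vector>

enum class SmoothTarget { BoundaryOnly, InteriorOnly, Both };

std::vector<int> selectNodesToMove(const std::vector<double>& nodeCost,
                                   const std::vector<bool>& isBoundary,
                                   double threshold,          // e.g. 0.1 or 0.9
                                   SmoothTarget target)
{
    std::vector<int> active;
    for (int n = 0; n < static_cast<int>(nodeCost.size()); ++n) {
        if (target == SmoothTarget::BoundaryOnly && !isBoundary[n]) continue;
        if (target == SmoothTarget::InteriorOnly &&  isBoundary[n]) continue;
        // Assumed convention: only nodes whose cost exceeds the threshold are
        // perturbed, so a higher threshold (e.g. 0.9) leaves more of the mesh untouched.
        if (nodeCost[n] > threshold) active.push_back(n);
    }
    return active;
}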

(1)

In the above equations, J stands for the Jacobian at a corner of an element, which is negative if the element is inverted. A separate cost function is used for negative Jacobians because these elements must have priority in movement, since a final product with inverted elements is highly undesirable. In the second cost function, the tetrahedral condition number, represented by CN, is used.
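For reference, the corner Jacobian of a tetrahedron can be evaluated as the scalar triple product of the three edge vectors leaving that corner, and a negative value flags an inverted corner. The sketch below shows only this building block under assumed names (Vec3, cornerJacobian); the normalization and the full element cost functions above are not reproduced.

// Illustrative sketch: corner Jacobian of a tetrahedron as the scalar triple
// product of the three edge vectors leaving that corner. A negative value
// indicates an inverted corner, which receives the priority cost function.
struct Vec3 { double x, y, z; };

inline Vec3 operator-(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

double cornerJacobian(const Vec3& p, const Vec3& a, const Vec3& b, const Vec3& c)
{
    const Vec3 e1 = a - p, e2 = b - p, e3 = c - p;   // edges leaving corner p
    // det[e1 e2 e3] = e1 . (e2 x e3)
    return e1.x * (e2.y * e3.z - e2.z * e3.y)
         - e1.y * (e2.x * e3.z - e2.z * e3.x)
         + e1.z * (e2.x * e3.y - e2.y * e3.x);
}

bool isInverted(double J) { return J < 0.0; }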

Once the cost functions for all elements surrounding the node have been computed, the cost function for the node is computed using the following piecewise formula, which combines the maximum and average cost functions of the elements surrounding the node.

(5)

where c_max is the maximum cost, c_min is the minimum cost, and

(6)

Each node is perturbed in an effort to improve the value of its cost function. If the movement of the node has improved (lowered) the cost function from the last iteration, the perturbation is accepted; if not, it is rejected. Three types of perturbations around a node are utilized, and the cost is re-evaluated between each type of movement. The first is to move the node in the direction of the unit normal of its opposite face, the second is to move the node along each of the edges in which it is included, and the third is to move the node by a prescribed perturbation in the x, y, and z directions if neither of the previous methods improved the cost function.[4]
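The accept/reject cycle just described can be sketched as follows, assuming the step size eps has already been set to a fraction of the minimum spacing in the mesh and the candidate directions have been gathered into three ordered groups (opposite-face normals, incident edges, coordinate axes). The node cost function itself, Eqs. (5) and (6), is passed in as a callback and not reproduced; all identifiers are illustrative rather than P_OPT's.

// Illustrative sketch of the accept/reject perturbation cycle described above.
#include <vector>
#include <functional>

struct Vec3d { double x, y, z; };

Vec3d smoothNode(const Vec3d& x0,
                 double eps,                                   // fraction of minimum mesh spacing
                 const std::vector<std::vector<Vec3d>>& directionGroups,
                 const std::function<double(const Vec3d&)>& nodeCostAt)
{
    Vec3d best = x0;
    double bestCost = nodeCostAt(x0);

    for (const auto& group : directionGroups) {          // face normals, then edges, then x/y/z
        bool improved = false;
        for (const Vec3d& d : group) {
            Vec3d trial{ best.x + eps * d.x, best.y + eps * d.y, best.z + eps * d.z };
            double c = nodeCostAt(trial);
            if (c < bestCost) {                           // accept only if the node cost is lowered
                best = trial;
                bestCost = c;
                improved = true;
            }
        }
        if (improved) break;                              // later groups are fallbacks only
    }
    return best;
}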

Discussion of Parallelization

While the basic construct of the parallel implementation in Open MPI is the same as the serial one, using Metis (or ParMetis for the pre-partitioned files that will be generated by P_HUGG) to partition a preset mesh allows for a great deal of speed-up, since each part of the mesh receives equal work, which is not necessarily the case for geometries with large far-field areas upon initial mesh generation. Thus, P_OPT can take full advantage of parallel partitioning and only requires communication between optimization passes of the new locations of partition-boundary nodes. The bias introduced by these nodes being a step behind is lessened further by perturbing them first, since even in serial the first segment of perturbed nodes has no new information and is perturbed based on current nodal positions anyway.
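A minimal sketch of this between-pass communication is given below, assuming each rank already knows its neighbor ranks, which of its owned partition-boundary nodes each neighbor holds as ghosts, and which of its own ghost nodes each neighbor owns. The data structures (sendNodes, recvNodes, a flat xyz coordinate array) are assumptions made for illustration; only standard MPI point-to-point calls are used.

// Illustrative sketch (not the P_OPT source): after a smoothing pass, each rank
// sends the new coordinates of the partition-boundary nodes it owns to the ranks
// that hold them as ghosts, and receives updated positions for its own ghosts.
#include <mpi.h>
#include <vector>

void exchangeGhostPositions(const std::vector<int>& neighborRanks,
                            const std::vector<std::vector<int>>& sendNodes, // owned boundary nodes per neighbor
                            const std::vector<std::vector<int>>& recvNodes, // ghost nodes per neighbor
                            std::vector<double>& xyz)                       // 3 doubles per node
{
    const int nNbr = static_cast<int>(neighborRanks.size());
    std::vector<std::vector<double>> sendBuf(nNbr), recvBuf(nNbr);
    std::vector<MPI_Request> reqs;

    for (int i = 0; i < nNbr; ++i) {
        for (int n : sendNodes[i])
            for (int k = 0; k < 3; ++k) sendBuf[i].push_back(xyz[3 * n + k]);
        recvBuf[i].resize(3 * recvNodes[i].size());

        MPI_Request rq;
        MPI_Irecv(recvBuf[i].data(), static_cast<int>(recvBuf[i].size()), MPI_DOUBLE,
                  neighborRanks[i], 0, MPI_COMM_WORLD, &rq);
        reqs.push_back(rq);
        MPI_Isend(sendBuf[i].data(), static_cast<int>(sendBuf[i].size()), MPI_DOUBLE,
                  neighborRanks[i], 0, MPI_COMM_WORLD, &rq);
        reqs.push_back(rq);
    }
    MPI_Waitall(static_cast<int>(reqs.size()), reqs.data(), MPI_STATUSES_IGNORE);

    for (int i = 0; i < nNbr; ++i)                        // overwrite ghost coordinates
        for (size_t j = 0; j < recvNodes[i].size(); ++j)
            for (int k = 0; k < 3; ++k)
                xyz[3 * recvNodes[i][j] + k] = recvBuf[i][3 * j + k];
}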

Figure 1 shows the speed-up (computed as the serial run time divided by the parallel run time on a given number of processors) for a surface ship, designated DTMB 5415.[5] The final mesh contained 382,481 nodes and 904,452 total elements, with 391,828 tetrahedra, 325,350 pyramids, and 187,274 hexahedra. The run with the closest fit to the ideal speed-up curve was that with the cost function threshold set to 0.1, running 100 smoothing iterations with no nodal weighting. This is sensible since this process can take up to 45 minutes in serial; however, it still shows a decrease in speed-up from 32 to 64 processors as the communication costs begin to outweigh the computational costs.

Figure 2 shows the speed-up for the Drag Prediction Workshop III transport aircraft geometry.[6] The final mesh contained 1,739,455 points and 4,382,046 total elements, with 2,326,329 tetrahedra, 1,155,931 pyramids, and 899,786 hexahedra. The run with the closest fit to the ideal speed-up curve was that with the cost function threshold set to 0.1, running 200 smoothing iterations with no nodal weighting; this is sensible since this process can take days in serial.

It is noticeable in both Figures 1 and 2 that, when the threshold is raised to 0.9 and the number of smoothing iterations is decreased by an order of magnitude, not only does the speed-up suffer by a factor of two, but also, after 8 processors, the speed-up tends to level off and even decreases after 16 processors. While the larger DPW III mesh still exhibits some speed-up, its gains become quite modest; in the case of the DTMB 5415, the speed-up can actually drop below 1.0 as the communication costs begin to far outweigh the computational costs. This suggests that while the algorithm is scalable for larger meshes and higher workloads, there is a limit on processor count beyond which the partitioning scheme cannot find enough work for each processor.

Moreover, when the threshold is higher than 0.1, not all of the nodes in the mesh are being worked on, and the partitioning computed by Metis becomes less relevant. Unfortunately, there is no a priori way to determine when a node will cease to be worked on, as the cost function changes based on the movement of the surrounding nodes after each iteration. Thus, there is no way to weight these nodes specially so that Metis could take them into extra consideration.


Figure 1: DTMB 5415 optimization smoothing speed-up with various weighting schemes and thresholds.

Figure 2: DPW III optimization smoothing speed-up with various weighting schemes and thresholds.

The aforementioned speed-up issues are also evident in Figures 3 and 4, which show the timings for the high-threshold runs of the DTMB 5415 and DPW III with three different weighting schemes. It is noticeable that the times decrease greatly at first, then simply level off before an abrupt increase around 32 processors. More importantly, weighting either the boundary nodes doubly (due to the extra work required in reprojection) or the non-boundary nodes doubly (due to the extra perturbation directions) does not attain better load balancing, as shown by slower run times and loss of speed-up. In fact, the weighting seems to cause a slight degradation in overall timings and small eccentricities in the speed-up when more processors are used than absolutely necessary. This appears to be because the extra work required for each type of node is nearly the same, except where nodes have high connectivity, and accounting for that would require more than simply weighting by the number of edges; it would require knowledge of the type of opposite face on which normals are computed.
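For illustration, the weighting experiments above could be expressed through Metis vertex weights roughly as sketched below. The sketch assumes the Metis 4.x C interface (METIS_PartGraphKway with an explicit wgtflag argument; Metis 5 changed this signature), and the helper name and surrounding data structures are hypothetical.

// Illustrative sketch: give boundary (or interior) nodes a vertex weight of 2
// and everything else a weight of 1, then partition with vertex weights only.
extern "C" {
#include <metis.h>   // Metis 4.x assumed: idxtype is an int typedef
}
#include <vector>

std::vector<idxtype> partitionWithNodeWeights(std::vector<idxtype>& xadj,
                                              std::vector<idxtype>& adjncy,
                                              const std::vector<bool>& isBoundary,
                                              bool weightBoundaryDoubly,
                                              int nparts)
{
    int nvtxs = static_cast<int>(isBoundary.size());
    std::vector<idxtype> vwgt(nvtxs), part(nvtxs);
    for (int n = 0; n < nvtxs; ++n)
        // weight 2 on boundary nodes if weightBoundaryDoubly, otherwise on interior nodes
        vwgt[n] = (isBoundary[n] == weightBoundaryDoubly) ? 2 : 1;

    int wgtflag = 2;      // vertex weights only
    int numflag = 0;      // C-style numbering
    int options[5] = {0}; // default options
    int edgecut = 0;

    METIS_PartGraphKway(&nvtxs, xadj.data(), adjncy.data(), vwgt.data(),
                        /*adjwgt=*/nullptr, &wgtflag, &numflag, &nparts,
                        options, &edgecut, part.data());
    return part;
}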


Figure 3: DTMB 5415 optimization smoothing run times with various weighting schemes.

Figure 4: DPW III optimization smoothing run times with various weighting schemes.

Results and Future Work

Figures 5 and 6 show a boundary surface mesh from the trailing edge of an airfoil from the DPW III before and after smoothing, and Figures 7 and 8 show a volume mesh from the side of the same DPW III airfoil before and after smoothing. Finally, Figure 9 shows the same volume mesh from Figure 8 after smoothing for an extra 50 iterations at an order of magnitude lower threshold (0.01), with the smoother forced to work on the internal points only. These meshes demonstrate the ability of the smoother to take a grid that might not otherwise yield a truly accurate fluid solution and make it usable by nearly any computational fluid dynamics flow solver.

It should be noted that the above results were generated on a Dell Linux cluster containing 325 compute nodes, each with two dual-core Intel EM64T 3.0 GHz Xeon processors and 4 GB of RAM, connected by a GigE interconnect (576-port Force10 E1200 switch). The results were output in the CGNS[7] file structure and visualized using Fieldview.[8] Plots were created with XMGR.[9]

The next major development planned for P_OPT is to implement ParMetis so that pre-partitioned CGNS files can be taken as input without having to decompose the files first and then re-partition based on the needs of the optimizer versus the grid generator. Another goal is to have the optimizer work on general polyhedra instead of only the four basic element types, by utilizing a cost function that works on the corners of any shape.


Figure 5: DPW boundary surface mesh before optimization-based smoothing is applied to the trailing edge.
Figure 6: DPW boundary surface mesh after optimization-based smoothing is applied to the trailing edge.

Figure 7: DPW volume mesh before optimization-based smoothing is applied to the airfoil.
Figure 8: DPW volume mesh after optimization-based smoothing is applied to the airfoil.

Figure 9: DPW volume mesh after extra, internal-only optimization-based smoothing is applied to the airfoil.


[1] Karman, S. L., Jr. and Betro, V. C., "Parallel Hierarchical Unstructured Mesh Generation with General Cutting," AIAA-2008-0918, 46th Aerospace Sciences Meeting and Exhibit, Reno, NV, January 7-11, 2008.

[2] Karman, S. L., Jr., "Hierarchical Unstructured Mesh Generation," AIAA-2004-0613, 42nd Aerospace Sciences Meeting and Exhibit, Reno, NV, January 4-7, 2004.

[3] Freitag, L. and Ollivier-Gooch, C., "Tetrahedral Mesh Improvement Using Swapping and Smoothing," International Journal for Numerical Methods in Engineering, Vol. 40, 1997.

[4] Sahasrabudhe, M. S., Karman, S. L., Jr., and Anderson, W. K., "Grid Control of Viscous Unstructured Meshes Using Optimization," AIAA-2006-0532, 44th Aerospace Sciences Meeting and Exhibit, Reno, NV, January 9-12, 2006.

[5] Longo, J., Yoon, H. S., Toda, Y., and Stern, F., "Phase-Averaged 3D PIV/Wave Elevations and Force/Moment Measurements for the Surface Combatant in PMM Maneuvers," 26th Symposium on Naval Hydrodynamics, Rome, Italy, September 2006.

[6] AIAA Applied Aerodynamics Drag Prediction Workshop website.

[7] CGNS Documentation, "The CFD General Notation System Standard Interface Data Structures."

[8] FIELDVIEW, Intelligent Light, Inc.

[9] XMGR, ACE/gr Development Team.