Technical Track:6.3 Computer aided Design Software

Preferred Format:Either Poster or Lecture

Using the SIMARC Tools for Developing a Modularly Configured Attached Processor for the Conjugate Gradient Method

Hugo Garcia1, Sergio Cabrera1, Glenn A. Gibson1 and Alejandro E. Brito1

1 Electrical and Computer Engineering Department, University of Texas at El Paso

Abstract: The SIMARC package was developed for designing and analyzing Modularly Configured Attached Processor (MCAP) architectures. A MCAP is an attached processor that is constructed of a standard set of two asynchronous connections and ten types of components that provide quickness and high utilization. The SIMARC tools were developed for designing, programming and simulating MCAP architectures. SIMARC has an Architecture Editor, a special Assembly Language, an Assembler-programming tool, and a simulator capable of producing performance statistics. In addition, we present a new architecture for performing the Conjugate Gradient algorithm; the SIMARC package is aiding in the design and analysis process.

Technical Track:6.3 Computer aided Design Software

Preferred Format:Either Poster or Lecture

CAD Tools for Developing Modularly Configured Attached Processors

Hugo Garcia1, Teresa Guerena1,Oscar Mondragon2, Glenn A. Gibson1 and Sergio Cabrera1

1 Electrical and Computer Engineering Department, University of Texas at El Paso, USA

2 Engineering Department. Instituto Tecnologico y de Estudios Superiores de Monterrey, Mexico City Campus, Mexico

Abstract: A computer-aided design tool, called the SIMARC package, is presented for the fast prototyping of Modularly Configured Attached Processor (MCAP) architectures. This tool consists of a graphics-user-interface (GUI) architecture editor, a GUI assembler tool for programming architectures, and a GUI simulator tool. This tool is capable of animating the behavioral data-flow in an MCAP architecture for a programmed algorithm while also producing performance statistics. A standard set of connections and component types form the basic structure of an MCAP architecture and these are also described here. A new MCAP architecture for performing the conjugate gradient algorithm is introduced. This also helps to illustrate the use of the SIMARC package in the design and analysis process.

Contact Author:Hugo GarciaPhone:(915) 747 5254

University of Texas at El PasoFAX:(915) 747 7871

301 Engineering Bldg.e-mail:

El Paso, TX 79968-0523, USA

Equipment Required: VGA Projector, overhead projector

Paper Summary:

MCAP Architecture

An attached or back-end processor is a processing system that is connected to host computer for the purpose of very quickly executing the system’s most computationally intensive tasks. The specific purpose is to execute parts of a set of algorithms at very high rates. The Modularly Configured Attached Processor (MCAP) considered here [1-3] attains its quickness and high utilization through:

Closely matching the architecture to the algorithm it is to execute

Overlapping processing and memory accessing by using memory prefetching

Minimizing the movement of data

Using a high-speed technology (MCM or Wafer-Scale implementation)

A MCAP is constructed from a standard set of connections and components. This standard set consist of:

Two types of asynchronous connections

Ten types of components

The list of components and their classifications is shown in Table 1.

Conjugate Gradient Method

A MCAP for performing the serial algorithm of the conjugate gradient method is an improvement on a previous attempt [8], and it is used to illustrate the use of the SIMARC package.

The Conjugate Gradient (CG) method is one of the most powerful tools in solving systems of linear equations that arise in problems solved by the finite element method [6,7]. The sparse nature of the systems of equations and the rate of convergence make this method one of the fastest. By examining the CG algorithm routing and memory access patterns, one can construct a highly efficient MCAP for executing this algorithm. The CG algorithm to be implemented is shown in Figure 1.

SIMARC package

SIMARC is a set of tools developed for the specific purpose of designing and analyzing MCAP architectures. It consists of a:

  • Graphical-user-interface (GUI) architecture editor,
  • Special assembly language,
  • GUI assembler tool for programming the architectures and
  • GUI simulator for animating the behavioral flow of data of a programmed architecture, that is also capable of producing performance statistics.

These tools give the capability of designing, testing and optimizing the performance of an MCAP before physical implementation, thereby avoiding extra costs. The editor and simulator are available for both Windows 95 and UNIX. The other tools are currently available only for PC platforms under DOS.

Architecture Editor

The design process of a MCAP architecture starts with the study of the flow of data necessary to perform the algorithm. These flows describe the basic operations involved, the data dependencies and the inherent parallelism.

The Architecture Editor [12] is a GUI tool that allows us to create an architecture layout using the basic components listed in Table 1, and to define the connections between the specified components.

Figure 2 shows the main screen of the Architecture Editor. The rectangles in the main drawing area correspond to the components used for implementing the CG algorithm. The arrows coming out of the rectangle labeled FPO1 correspond to the connections between the component FP01 and the components that will receive the data. The architecture editor has two basic modes of operation: the component drawing mode and the connection drawing mode.

Each component has a set of attributes that will determine its execution behavior during a simulation[1]. These attributes can be set via dialog boxes provided by the architecture editor, that are specific to each kind of component.

Figure 3 shows the dialog box provided for setting up the attributes for a memory component R that is used for storing the values of the x vector given in Figure 1. The execution time is 48, indicating that the time needed for retrieving or storing each value is 48 nanoseconds.

Figure 4 shows the components that are part of the conjugate gradient architecture in greater detail. The rectangles on the leftmost part are S memory controllers containing the values of the A matrix[2]. The first and third columns in the center are sets of multiplier pipelines. The second and fourth columns are sets of adder pipelines, with the fourth being accumulative adders. In the upper part of Figure 4 are memories used for storing the r, p, x, and Ap vectors. All are R memory components, except for the one used for the p vector. The rest of the components are used to distribute the data across the components mentioned above.

Once an architecture has been created, the architecture editor creates a file that contains all the necessary information to recreate it.

Architecture Programming Tool (APP)

The next step in the design process is to create a file with assembler code that will describe the internal behavior of each component and the interaction between them. The assembler code of this file can be written using any text editor or by using the APP tool [10]. The APP program provides a GUI used to easily create the assembler code necessary for programming the architecture. APP has the four main screens shown in Figure 5.

There is a set of instructions specific to each type of component. These instructions have certain limits and constraints that the APP takes into account when generating the code. Figure 5(c) shows a screen where the instructions corresponding to the S component can be programmed; it includes the Operation Mode (input, output), the number of operands to be processed, and others. For a list of available modes and instructions particular to each component please refer to [9].

Each time a component is programmed, a 'segment' of code is generated. These segments can be seen in the screen in Figure 5(b). The code can be also edited manually using the screen in Figure 5(d). Once all the necessary segments are programmed, APP generates a file containing the assembler code. Figure 6 shows a segment of code generated for an S component.

The Simulate Architecture tool

The simulate architecture tool [11] is used to animate the data-flow of the algorithm described by the assembler code, taking into account the constraints and execution attributes provided by the architecture. It also has the capability of producing performance statistics and giving information about the status of the simulation and the components at a specific step in the simulation process. Figure 7 shows the main screen of the simulator tool.

It is necessary to have the assembler code and the architecture file in order to generate a simulation. The simulator reads the architecture and assembler code files and then executes concurrently the instructions provided by the assembler code. The simulation can be animated (and interactive) or it can be run without any graphics.

Each step in the simulation represents an increase by a time unit. At every step, the simulator determines the state of every component in the architecture. According to their current state (busy, idle, wait, distribute and free), the components are shown with different colors. The simulation can be interrupted at any time to see the details of execution of each one of the components.

The simulator allows us to assure that the data flow defined for the architecture is correct, as well as optimize the parameters and attributes for each component. Some of the performance statistics generated are busy time and idle time. Using these statistics, the architecture can be optimized easily, avoiding real simulation costs.

Programming the CG Architecture

The CG architecture is programmed in such a way that it describes the flow of data necessary for the algorithm to perform. For example, consider equation (3) in Figure 1. The vector rk+1 is to be generated. The vector Apk is stored in the memories labeled RAPx in the upper right corner of Figure 4. These memory components send the data of the rk vector to the multiplier pipeline in the first column of the architecture (TMxx) via the LMVx link components. Also the value of k is distributed to the multipliers using the LMAx link components.

At the end the value of kApk is obtained. These values are then feed into the adder pipelines in the second column (TAxx). The values of the rk vector are waiting there also. Finally, the values of rk+1 can be obtained at the end of this column.

CONCLUSIONS

The use of the GUI tools developed for MCAP architectures allow us to easily design and analyze new special-purpose architectures. The set of tools available cover the whole building process:

-creation of the architecture layout

-programming the architecture

-simulation of the architecture

-generation of performance statistics

Thus, the use of the MCAP GUI tools allows us to obtain optimized architectures, avoiding the costs of physical simulation and implementation. The transparent portability of the design and simulation process to different computer platforms is possible due to the availability of tools on different platforms. In particular, the UNIX tools allow the designers to work even from a remote system. The simulator tool is capable of running various simulations independently, because it is implemented in a multithreaded fashion.

The conjugate gradient method architecture obtained was found to be a highly efficient architecture, with a predicted processor efficiency about 80%.

REFERENCES

[1] Y. Chang, G. Gibson and C. Ayala, (1994). Simulation and Performance Evaluation of a Modularly Configurable Attached Processor. IEEE International Conference on Parallel and Distributed Systems, Taiwan, Dec. 1994.

[2] D. Doran and G. Gibson, (1995). A Modularly Configured Attached Processor for Performing Matrix Operations. International Conference on Parallel and Distributed Programming: Techniques and Applications, Athens, GA. Nov 1995.

[3] G. Gibson G, A. Brito, Y. Chang and E. Castro, (1995). Simulation and Fast Prototyping of Modularly Configurable Attached Processors. High Computer Symposium, Phoenix, Az., April, 1995.

[6] J. Shewchuk. (1994): An Introduction to the Conjugate Gradient Method with out the agonizing pain. CMU-CS_94_125

[7] R. Burden, J. Faires (1995). Numerical Analysis. PWS Publishing Co.

[8] G. Gibson, A. Brito, S. Cabrera and H. Garcia, (1997). A Modularly Configured Attached Processor for performing the Conjugate Gradient Method. International Conferences on Computational Engineering Science, San Jose, Costa Rica, 1997.

[9] A. Brito, (1994). A graphics Editor for the MCAP simulator. Master's Thesis, UTEP.

[10] J. M. Fuller (1998). Graphical method for creating MCAP assembler programs. Master's Thesis, UTEP.

[11] T. Guerena (1998). Comparative Analysis of Porting the SIMARC package to Windows 95 and XWINDOW. Master's Thesis, UTEP.

[12] O. Mondragon and J. Guerrero (1999). MCAP Editor Version 1.0, Users Manual. Mexico City Mexico.

[1] Execution time, size of data queues, size of instruction queues, etc.

[2] The matrices, vectors and variables mentioned are part of the algorithm in Figure 1