Performance Analysis of Parallel Processing

Dr. Subra Ganesan

Department of Computer Science and Engineering

School of Engineering and Computer Science

Oakland University

CSE664

Project report

Winter 2000

4/29/2000

BY

Ahmad Milhim

Contents

1.  Introduction

2.  Definitions

2.1  Performance analysis

2.2  Performance analysis techniques

2.3  Performance analysis metrics

3.  Performance means

3.1  Arithmetic mean performance

3.2  Geometric mean performance

3.3  Harmonic mean performance

4.  Speedup performance laws

4.1  Asymptotic speedup

4.2  Harmonic speedup

4.3  Amdahl’s speedup law

5.  Workloads

5.1  Types of test workloads

5.2  HINT

6.  References

1. Introduction

The goal of all computer system users and designers is to get the highest performance at the lowest cost. A basic knowledge of performance evaluation terminology and techniques is a must for computer professionals who evaluate their systems. Performance analysis is required at every stage of the life cycle of a computer system, from the design stage through manufacturing, sales, and upgrades.

Designers use performance analysis to compare alternative designs and pick the best one; system administrators use it to choose a system from a set of candidates for a certain application; and users need to know how well their systems are performing and whether an upgrade is needed.

It is said that performance analysis is an art: every analysis requires an intimate knowledge of the system being evaluated. Computer applications are so numerous and varied that it is not possible to have one standard measure of performance for all of them. Three techniques are used for performance analysis: analytical modeling, simulation, and measurement. Several considerations help analysts choose among these techniques. The key consideration is the stage the system is in; if the system is a new concept, analytical modeling and simulation are the only possible techniques, although both can be based on previous measurements of similar systems. Other considerations of lesser importance are the time available for the analysis, the tools at hand, the required accuracy, and the cost. Users of computer systems seek ways to increase the productivity of their systems, in terms of both hardware and programmers, and thereby reduce the cost of computing. Computer throughput was increased, for example, by making the operating system handle resource sharing.

There has always been a desire to evaluate how well computer systems are performing and to find ways of improving their performance.

Measurement and analysis of parallel processing is a relatively new field. The analysis of parallel processing helps users and researchers answer key questions about the configuration of parallel systems, such as: What is the best way to configure the processing elements to solve a particular problem? How much overhead is generated by parallelism? What is the effect of varying the number of processors on performance? Is it better to use shared memory or local memory? And so on.

2. Definitions. This section introduces the most common terms and concepts relating to the performance analysis of systems in general and of parallel processing in particular, and shows how these terms apply to computer systems and parallel processing. First the definition of performance analysis is given; then performance analysis techniques and metrics are discussed in more detail.

2.1 Performance analysis. The ideal performance of a computer system is achieved when a perfect match between hardware capability and software behavior is reached.

2.2 Performance analysis techniques. Three techniques are used for performance analysis: analytical modeling, simulation, and measurement. The choice of technique for analyzing a given system depends on certain considerations. The most important is the stage at which the analysis is to be performed: measurement cannot be performed unless the system already exists, or at least a similar system does; if the proposed design is a new idea, then only analytical modeling or simulation can be used.

2.2.1 Analytical modeling. Used when the system is in its early design stages and results are needed soon. It provides insight into the underlying system, but it may not be precise because of the simplifications made in the models and mathematical equations.

2.2.2 Simulation. Simulation is a useful analysis technique. Simulation models provide an easy way to predict the performance of computer systems before they exist, and simulation is also used to validate the results of analytical modeling and measurement. Simulation provides snapshots of system behavior.

2.2.3 Measurement. Measurement can be used only if the system is available to be measured (post-prototype), or at least systems similar to the one under design exist. Its cost is high compared with the other two techniques because hardware and software instrumentation is needed to perform the measurements, and it takes a long time to monitor the system properly. The most important considerations used to choose among these three techniques are illustrated in Figure 1 below.

Figure 1. Considerations in order of importance

It is said that the results of one technique should not be trusted until they have been validated by the other two techniques. This emphasizes that relying on a single technique may be misleading, or at least may not give accurate results.

2.3 Performance metrics. Each performance study has its own metrics, defined and used to evaluate the specific system under study. Although these metrics vary from one system to another, common metrics exist and are used in most general computer system evaluations. The most common of these metrics are introduced here.

2.3.1 Response time. The response time is the interval between a user's request and the system's response. In general, the response time increases as the load increases. Two related terms are turnaround time and reaction time: turnaround time is the time elapsed between submitting a job and the completion of its output, while reaction time is the time elapsed between submitting a request and the beginning of its execution by the system. The stretch factor is the ratio of the response time at a particular load to the response time at minimum load.

2.3.2 Throughput. Throughput is defined as the rate (requests per unit time) at which requests can be serviced by the system. For CPUs the rate is measured in millions of instructions per second (MIPS) or millions of floating-point operations per second (Mflops); for networks it is measured in bits per second (bps) or packets per second; and for transaction systems it is measured in transactions per second (TPS). In general, the throughput of a system increases as the load initially increases; beyond a certain load the throughput stops increasing and in some cases starts decreasing.

The maximum achievable throughput under ideal workload conditions is called the nominal capacity of the system; for computer networks it is the bandwidth, measured in bits per second. The response time at the nominal capacity is often too high, which leads to a further term: the usable capacity, defined as the maximum achievable throughput without exceeding a prespecified response time. The optimal operating point is the point beyond which the response time increases rapidly as a function of load while the gain in throughput is small; it is the knee capacity in Figure 2 below.
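To make these capacity definitions concrete, the following Python sketch computes the nominal capacity and the usable capacity for a prespecified response-time bound. The load, throughput, and response-time figures are invented for illustration only, not taken from any real measurement.

    # Hypothetical sketch: nominal vs. usable capacity from
    # (load, throughput, response-time) samples. All numbers invented.
    loads       = [1, 2, 4, 8, 16, 32, 64]                    # offered load
    throughputs = [10, 19, 35, 58, 70, 74, 73]                # requests/second
    resp_times  = [0.10, 0.11, 0.12, 0.15, 0.25, 0.60, 1.50]  # seconds

    RESPONSE_LIMIT = 0.5  # prespecified response-time bound (seconds)

    # Nominal capacity: maximum achievable throughput under ideal conditions.
    nominal_capacity = max(throughputs)

    # Usable capacity: maximum throughput whose response time stays in bound.
    usable_capacity = max(t for t, r in zip(throughputs, resp_times)
                          if r <= RESPONSE_LIMIT)

    print("nominal capacity:", nominal_capacity)  # 74
    print("usable capacity: ", usable_capacity)   # 70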

2.3.3 System efficiency E(n). Efficiency is defined as the ratio of the maximum achievable throughput (the usable capacity) to the maximum achievable capacity under ideal workload conditions. For parallel processing it indicates the actual degree of speedup achieved relative to the maximum possible value: the speedup of an n-processor system over a single-processor system, divided by the number of processors n. The minimum efficiency, 1/n, corresponds to the entire program code being executed sequentially on a single processor; the maximum efficiency, 1, corresponds to all processors being fully utilized throughout the execution period, as illustrated in Figure 3.

E(n) = S(n)/n

Figure 3. Efficiency

2.3.4 Speedup S(n). Speedup is an indication of the degree of speed gain from parallel computation; it is discussed in more detail later in this paper. There are three different speedup measures: asymptotic speedup, harmonic speedup, and Amdahl's speedup law. Simply put, the speedup is the ratio of the time taken by one processor to execute a program to the time taken by n processors to execute the same program.

S(n) = T(1)/T(n)
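A minimal Python sketch of these two metrics, assuming measured execution times T(1) and T(n) are available (the numbers below are hypothetical):

    def speedup(t1, tn):
        """S(n) = T(1)/T(n): one-processor time over n-processor time."""
        return t1 / tn

    def efficiency(t1, tn, n):
        """E(n) = S(n)/n: fraction of the ideal n-fold speedup achieved."""
        return speedup(t1, tn) / n

    # Hypothetical measurement: 100 s on one processor, 16 s on 8 processors.
    print(speedup(100.0, 16.0))        # 6.25
    print(efficiency(100.0, 16.0, 8))  # 0.78125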

2.3.5 Redundancy R(n). Redundancy is the ratio of the total number of unit operations performed by an n-processor system, O(n), to the total number of unit operations performed by a single-processor system, O(1).

R(n) = O(n)/O(1)

2.3.6 Utilization U(n). Utilization is the fraction of time a resource is busy, i.e., the ratio of busy time to total time over an observation period; idle time is the time during which a resource is not used. Balancing the load between resources produces more efficient systems when a parallel processing system is being designed. In terms of system redundancy and efficiency, the utilization is the product of the two, and it indicates the percentage of resources that were kept busy during the execution of a parallel program.

U(n) = R(n)*E(n)

2.3.7 Quality Q(n). This metric combines the effects of speedup, efficiency, and redundancy to assess the relative merit of a parallel computation. It is directly proportional to the speedup and efficiency and inversely proportional to the redundancy.

Q(n) = S(n)*E(n)/R(n)

Since the efficiency is always a fraction and the redundancy is at least one, the quality of a parallel computation is upper bounded by the speedup.
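A small Python sketch of these three metrics, continuing the hypothetical 8-processor example above and assuming 20% extra unit operations due to parallel overhead, also confirms that the quality never exceeds the speedup:

    def redundancy(o_n, o_1):
        """R(n) = O(n)/O(1): extra unit operations from parallel execution."""
        return o_n / o_1

    def utilization(r_n, e_n):
        """U(n) = R(n)*E(n): fraction of the n processors kept busy."""
        return r_n * e_n

    def quality(s_n, e_n, r_n):
        """Q(n) = S(n)*E(n)/R(n): combined merit of the computation."""
        return s_n * e_n / r_n

    n, s = 8, 6.25                # hypothetical speedup on 8 processors
    e = s / n                     # efficiency, 0.78125
    r = redundancy(120.0, 100.0)  # 20% extra operations (hypothetical)
    print(utilization(r, e))      # 0.9375
    print(quality(s, e, r))       # ~4.07
    print(quality(s, e, r) <= s)  # True: quality is bounded by speedup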

2.3.8 Reliability. It is measured by the probability of errors or by the mean time between errors.

2.3.9 Availability. The availability of a resource is the fraction of time the resource is available to service users' requests. Two other terms used in system analysis are downtime and uptime: downtime is the period during which the system is not available, while uptime is the time during which it is available.

2.3.10 Cost/performance ratio. This metric is used to compare two or more systems in terms of their cost and performance. The cost should include the cost of hardware, software, and so on; the performance is measured in terms of throughput for a prespecified response time.

3.  Performance means.

To understand the concepts of performance analysis and the significance of the results produced by the performance techniques, one should understand the general mathematical performance laws and performance means. In this section performance means are discussed briefly.

3.1 Arithmetic mean performance. The arithmetic mean is defined as the ratio of the sum of the execution rates of all programs to the total number of programs. Symbolically, let Rj be the execution rate of program j, where j = 1, 2, ..., m; then the arithmetic mean performance is

Ra = (Σ Rj) / m, for j = 1 to m.

This is true only if all programs have equal weight. If the programs are weighted, the weighted arithmetic mean is defined as:

Ra* = Σ fj Rj, for j = 1 to m

where fj is the weight of program j and the weights sum to 1.
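A brief Python sketch of both forms, using hypothetical execution rates for three benchmark programs:

    def arithmetic_mean(rates):
        """Ra = (sum of Rj) / m, all programs weighted equally."""
        return sum(rates) / len(rates)

    def weighted_arithmetic_mean(rates, weights):
        """Ra* = sum of fj*Rj, where the weights fj sum to 1."""
        return sum(f * r for f, r in zip(weights, rates))

    rates = [10.0, 40.0, 160.0]  # hypothetical execution rates (MIPS)
    print(arithmetic_mean(rates))                            # 70.0
    print(weighted_arithmetic_mean(rates, [0.5, 0.3, 0.2]))  # 49.0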

3.2 Geometric mean performance. The geometric mean of m values is obtained by multiplying the values together and taking the m-th root of the product. It is used only if the product of the values is a quantity of interest. The geometric mean of the execution rates Rj is:

Rg = (Π Rj)^(1/m), j = 1, 2, ..., m

where m is the number of programs. Again, this value is correct only if all programs have equal weights; a short sketch follows.
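A one-function Python sketch of the geometric mean, using the same hypothetical rates as above (math.prod requires Python 3.8 or later):

    import math

    def geometric_mean(rates):
        """Rg = (product of Rj)^(1/m), all programs weighted equally."""
        return math.prod(rates) ** (1.0 / len(rates))

    print(geometric_mean([10.0, 40.0, 160.0]))  # 40.0

Neither the arithmetic mean nor the geometric mean represents the real performance of the execution rates of benchmark programs, which leads to the definition of a third kind of performance mean.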

3.3 Harmonic mean performance. The harmonic mean of m values is defined as the ratio of the number of values m to the sum of the fractions 1/xj, where xj is the j-th value in the set. In the context of performance analysis, it represents the average performance across a large number of programs running in various execution modes; these modes correspond to scalar, vector, sequential, or parallel processing of different program parts. Symbolically, the harmonic mean performance is:

Rh = m / Σ (1 / Rj)

where Rj is the execution rate of program j, m is the total number of programs, and Rh is the harmonic mean assuming equal weighting of all programs. Of the three means, the harmonic mean is the closest to the real performance.
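A final sketch of the harmonic mean on the same hypothetical rates; note that for positive rates the three means always order as Rh <= Rg <= Ra:

    def harmonic_mean(rates):
        """Rh = m / (sum of 1/Rj), all programs weighted equally."""
        return len(rates) / sum(1.0 / r for r in rates)

    # About 22.86, versus Rg = 40.0 and Ra = 70.0 for the same rates.
    print(harmonic_mean([10.0, 40.0, 160.0]))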

4.  Speedup performance laws.

4.1 Asymptotic speedup law. Consider a parallel system of n processors, and denote by wi the amount of work done with a degree of parallelism (DOP) equal to i. The execution time of wi on a single processor is ti(1); similarly, the execution time of wi on k processors is ti(k). T(1) is the response time of a single-processor system executing the workload, and T(∞) is the response time of executing the same workload when an infinite number of processors is available. The asymptotic speedup S∞ is then defined as the ratio of T(1) to T(∞):

S∞ = T(1) / T(∞)
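As a worked Python sketch of this definition, consider a hypothetical DOP profile in which work[i] is the amount of work wi executed with DOP i; a unit execution rate per processor is assumed, so time equals work divided by the number of active processors:

    # Hypothetical degree-of-parallelism (DOP) profile: work[i] is the
    # amount of work wi executed with DOP i.
    work = {1: 20.0, 2: 30.0, 4: 40.0, 8: 10.0}

    # T(1): a single processor executes all of the work serially.
    t1 = sum(work.values())                      # 100.0

    # T(infinity): with unlimited processors, work at DOP i runs i ways in parallel.
    t_inf = sum(w / i for i, w in work.items())  # 46.25

    print(t1 / t_inf)  # asymptotic speedup, about 2.16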