High-Level Area Estimation for

Customizable Array of Processors

By

Adam Douglas Harbour

B.S.E. (University of Michigan, Ann Arbor) 2005

THESIS

Submitted in partial satisfaction of the requirements for the degree of

MASTER OF SCIENCE

in

Electrical and Computer Engineering

in the

OFFICE OF GRADUATE STUDIES

of the

UNIVERSITY OF CALIFORNIA

DAVIS

Approved:

______

______

______

Committee in Charge

2007

Acknowledgements

I hereby acknowledge that Xilinx has permitted me to use the following figures, which were originally published by Xilinx, Inc.:

  • Figure 2.2
  • Figure 2.3
  • Figure 2.4
  • Figure 2.5
  • Figure 3.2
  • Figure 3.4
  • Figure 3.5

Table of Contents

1 Introduction …………………………………………………………………………... / 1
1.1 Objectives ……………………………………………………………………… / 1
1.2 Method …………………………………………………………………………. / 2
1.3 Related Works ……………………………………………………………….. / 2
1.3.1 General Chip Multiprocessor Research ………………………………... / 2
1.3.2 Soft Core Processor Research ………………………………………….. / 3
1.3.3 Area Estimation Research ……………………………………………… / 3
1.3.4 FPGA Synthesis and Mapping Research ………………………………. / 4
1.3.5 CAP Software Research ………………………………………………... / 4
1.4 Thesis Outline ………………………………………………………………….. / 4
2 Background …………………………………………………………………………... / 6
2.1 Customizable Array of Processors ……………………………………………... / 6
2.1.1 Motivation ……………………………………………………………… / 6
2.1.2 Processor Customization ……………………………………………….. / 7
2.1.3 Interconnect Architecture ………………………………………………. / 9
2.2 Xilinx MicroBlaze ……………………………………………………………... / 11
2.2.1 Processor Core …………………………………………………………. / 11
2.2.2 Processor Memory ……………………………………………………... / 12
2.2.3 Processor Peripherals …………………………………………………... / 12
2.2.4 Inter-Processor Communication ……………………………………….. / 13
2.3 Xilinx Virtex-II Pro …………………………………………………………….. / 14
2.3.1 FPGA Features …………………………………………………………. / 14
2.3.2 XUP Virtex-II Pro Board ………………………………………………. / 16
2.4 FPGA Design Flow …………………………………………………………….. / 18
2.4.1 Introduction …………………………………………………………….. / 18
2.4.2 Design Entry …………………………………………………………… / 18
2.4.3 High-Level Synthesis …………………………………………………... / 19
2.4.4 Technology Independent Logic Optimizations ………………………… / 19
2.4.5 Technology Mapping and Further Optimizations ……………………… / 20
2.4.6 Place and Route ………………………………………………………… / 21
2.4.7 Xilinx Flow Differences ……………………………………………….. / 21
3 Hardware Profiles ……………………………………………………………………. / 23
3.1 Profiling Methodology …………………………………………………………. / 23
3.2 MicroBlaze Element Profiles …………………………………………………... / 25
3.2.1 Base System ……………………………………………………………. / 25
3.2.2 Memory ………………………………………………………………… / 27
3.2.3 Additional Processor Cores …………………………………………….. / 28
3.2.4 Low Impact Processor Customization …………………………………. / 28
3.2.5 Barrel Shifter …………………………………………………………… / 29
3.2.6 Integer Divider …………………………………………………………. / 30
3.2.7 Integer Multiplier ………………………………………………………. / 31
3.2.8 Floating Point Unit ……………………………………………………... / 32
3.2.9 Area Optimization ……………………………………………………… / 33
3.2.10 Fast Simplex Links …………………………………………………… / 34
3.2.11 Timers ………………………………………………………………… / 37
3.2.12 On-Chip Peripheral Bus and Arbitration ……………………………... / 38
3.2.13 Processor Frequency Concerns ……………………………………….. / 40
3.2.14 Profile Summary ……………………………………………………… / 40
4 Area Estimation Results ……………………………………………………………… / 43
4.1 Estimation Automation ………………………………………………………… / 43
4.2 Accuracy Satisfaction ………………………………………………………….. / 44
4.3 Processor Chain Accuracy ……………………………………………………... / 46
4.3.1 IA Overview ……………………………………………………………. / 46
4.3.2 Accuracy ……………………………………………………………….. / 46
4.4 Star Architecture Accuracy …………………………………………………….. / 48
4.4.1 IA Overview ……………………………………………………………. / 48
4.4.2 Accuracy ……………………………………………………………….. / 48
4.5 Mesh Architecture Accuracy …………………………………………………... / 50
4.5.1 IA Overview ……………………………………………………………. / 50
4.5.2 Accuracy ……………………………………………………………….. / 51
4.6 Overall Accuracy ………………………………………………………………. / 52
4.7 Accuracy Observations ………………………………………………………… / 52
4.8 Other FPGA Families ………………………………………………………….. / 53
5 The CAP Tool Chain ………………………………………………………………… / 56
5.1 Overview ……………………………………………………………………….. / 56
5.2 CAP Simulation ………………………………………………………………... / 58
5.2.1 Integer Application …………………………………………………….. / 58
5.2.2 Floating Point Application ……………………………………………... / 60
5.2.3 Workload Imbalance …………………………………………………… / 63
6 Conclusion …………………………………………………………………………… / 65
6.1 Discussion ……………………………………………………………………… / 65
6.2 Future Work ……………………………………………………………………. / 66
Bibliography ……………………………………………………………………………. / 68
Appendix A: List of Acronyms and Abbreviations …………………………………….. / 73

List of Figures

2.1 Sample Interconnect Architectures ………………………………………………… / 10
2.2 The Xilinx MicroBlaze Soft Core Architecture ……………………………………. / 11
2.3 General Representation of Virtex-II Pro Architecture ……………………………... / 15
2.4 Diagram for the Top Half of a Virtex-II Pro Slice …………………………………. / 16
2.5 Virtex-II Pro Board Block Diagram ……………………………………………….. / 17
3.1 Base Allowable MicroBlaze System ………………………………………………. / 26
3.2 8-bit Barrel Shifter ………………………………………………………………… / 30
3.3 32-bit Multiplication with Hard 18x18 Multipliers ………………………………... / 32
3.4 Single Ported 16x1 LUT RAM and Dual-Ported Alternative ……………………… / 36
3.5 General OPB Master and Slave Interface ………………………………………….. / 39
4.1 A Six Processor, Unidirectional Chain …………………………………………….. / 46
4.2 A Six Processor Star Architecture …………………………………………………. / 49
4.3 A Six Processor, Unidirectional Mesh ……………………………………………... / 51
5.1 The Customizable Array of Processors Tool Chain ……………………………….. / 57

List of Tables

3.1 Processor Core and Peripheral Logic Utilization …………………………………... / 41
3.2 FSL Buffer Depth Logic Utilization ……………………………………………….. / 41
3.3 FSL Port Logic Utilization …………………………………………………………. / 41
3.4 Additional OPB Master Connection Logic Utilization …………………………….. / 42
4.1 Accuracy Results for a 6 Processor Chain …………………………………………. / 47
4.2 Accuracy Results for a 6 Processor Star Architecture ……………………………... / 50
4.3 Accuracy Results for a 6 Processor Mesh Architecture ……………………………. / 51
4.4 Base System Utilization Across Different FPGAs ………………………………… / 54
5.1 Bitonic Sort Throughput …………………………………………………………… / 59
5.2 Post-map Slice Estimation for Bitonic Sort Architectures ………………………… / 59
5.3 Time Delay Equalization Throughput ……………………………………………... / 61
5.4 Post-map Slice Estimation for TDE Architectures ………………………………… / 61
5.5 FM Radio Throughput ……………………………………………………………... / 63
5.6 Post-map Slice Estimation for FM Architectures ………………………………….. / 63

1