[1] Accelerated Strategic Computing (ASCI) Initiative. A report by US Department of Energy, Lawrence Livermore, Los Alamos, Sandia National Laboratory, 1996

[2] Interconnection Networks, J. Duato, S. Yalamanchili, L. Ni, Morgan Kaufmann, 2002

[3] Boden, N. J. et al., "Myrinet: A Gigabit-per-Second Local Area Network", IEEE Micro, Feb. 1995

[4] Stenstrom, P., Joe, T., and Gupta, A. Comparative performance evaluation of cache-coherent NUMA and COMA architectures. In Proceedings of the 19th International Symposium on Computer Architecture (1992), IEEE Computer Society, IEEE Press, pp. 80-91

[5] Adve S, Hill M, Vernon M. Comparison of Hardware and Software Cache Coherence Schemes. Proc. of the 18th Annual International Symposium on Computer Architecture, 1991, (Jun.): 298-308

[8] Hwang K. Advanced computer architecture: parallelism, scalability, and programmability. McGraw-Hill, 1993

[9] Silicon Graphics, Origin 200 and Origin 2000, Technical Report, 1996

[10] Stephen R. Wheat, Timothy G. Mattson, David Scott. A TeraFLOPS Supercomputer in 1996: The ASCI TFLOP System. In Proceedings of the 1996 International Parallel Processing Symposium, 1996

[11] Tom Anderson, David Culler, Dave Patterson, and the NOW Team. A Case for NOW (Networks of Workstations). IEEE Micro 15, 1, February 1995, pp. 54-64

[12] Brent R P. The Parallel Evaluation of General Arithmetic Expressions. Journal of the ACM, 1974, 21(2):201-206

[13] Amdahl G. Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities. AFIPS Conf. Proc. 30, April, Thompson Books, Washington D.C., 1967, 483-486

[14] Gustafson J L. Reevaluating Amdahl's Law. Comm. of the ACM, 31(5):532-533, 1988

[15] Sun X H, Ni L M. Another View on Parallel Speedup. Proc. Supercomputing '90, 324-333, 1990

[16] Kumar V, Rao V N. Parallel Depth-First Search, Part II: Analysis. Int'l J. of Parallel Programming, 16(6):501-519, 1987

[17] Sun X H, Rover D T. Scalability of Parallel Algorithm-Machine Combinations. IEEE Trans. on Parallel and Distributed Systems, 5(6):599-613, 1994

[18] Zhang X D, Yan Y, He K Q. Latency Metric: An Experimental Method for Measuring and Evaluating Parallel Program and Architecture Scalability. J. of Parallel and Distributed Computing, 22:392-410, 1994

[19] S. Fortune and J. Wyllie. Parallelism in random access machines. Proc. 10th Annual ACM Symp. on Theory of Computing, San Diego, California, 1978, 114-118

[20] Chen G L. Scalability Analysis of Parallel Algorithms. Mini-Micro Systems, Vol. 16, No. 2, pp. 10-16, 1995 (in Chinese)

[21] Ben H. H. Juurlink, Harry A. G. Wijshoff: A Quantitative Comparison of Parallel Computation Models. ACM Trans. Comput. Syst. 16(3): 271-318 (1998)

[22] Mark Goudreau, Kevin Lang, Satish Rao, Torsten Suel, Thanasis Tsantilas: Towards Efficiency and Portability: Programming with the BSP Model. SPAA 1996: 1-12

[23] T. Cheatham, A. Fahmy, D. C. Stefanescu, and L. G. Valiant. Bulk synchronous parallel computing - A paradigm for transportable software. In Proc. of the 28th Hawaii International Conference on System Sciences. Vol. 2: Software Technology, pages 268-275, 1995.

[24] Chlebus B, Vrto I. Parallel Quick Sort. Journal of Parallel and Distributed Computing, 1991, 11:332-337

[25] Dekel E, Nassimi D, Sahni S. Parallel Matrix and Graph Algorithms. SIAM J. on Computing, 1981, 10:657-673

[26] Galil Z. Optimal Parallel Algorithms for String Matching. Info. and Control, 1985, 67(1-3):144-157

[27] Hoare C A R. Quicksort. Computer Journal, 1962, 5:10-15

[28] JaJa J. An Introduction to Parallel Algorithms. Addison-Wesley Pub. Company, 1992

[29] Knuth D E, Morris J H, Pratt V R. Fast Pattern Matching in Strings. SIAM J. Computing, 1977, 6(2):323-350

[30] Sedgewick R. Implementing Quicksort Programs. Communications of the ACM, 1978, 21(10):847-857

[31] Singh V, Kumar V, Agha G et al. Efficient Algorithms for Parallel Sorting on Mesh Multi-computers. International Journal of Parallel Programming, 1991, 20(2):95-131

[32] Vishkin U. Optimal Parallel Matching in Strings. Info. and Control, 1985, 67(1-3):91-113

[33] Wagar B A. Hyperquicksort: A Fast Sorting Algorithm for Hypercubes. Proc. of the Second Conference on Hypercube Multiprocessors, 1987, 292-299

[34] E. Horowitz and A. Zorat, "Divide-and-conquer for parallel processing," IEEE Trans. Comput., vol. 32, pp. 582-585, June 1983.

[35] Daniel S. Hirschberg: Parallel Algorithms for the Transitive Closure and the Connected Component Problems. STOC 1976: 55-57.

[36] H. T. Kung, "Why systolic architectures?", IEEE Computer 15, 1 (1982), 37-46.

[37] Richard Cole and Uzi Vishkin. Deterministic coin tossing with applications to optimal parallel list ranking. Information and Control, 70(1):32-53, July 1986.

[38] A. V. Goldberg, S. A. Plotkin, and G. E. Shannon. Parallel symmetry-breaking in sparse graphs. SIAM J. Disc. Math., 1:434-446, 1989.

[39] JaJa J. An Introduction to Parallel Algorithms. Addison-Wesley Pub. Company, 1992

[40] Benjamin W. Wah, Guo-Jie Li, Chee Fen Yu: Multiprocessing of Combinatorial Search Problems. IEEE Computer 18(6):93-108 (1985)

[41] D. L. Parnas and Paul C. Clements. A rational design process: how and why to fake it. IEEE Transactions on Software Engineering, SE-12(2), pp. 251-257, Feb. 1986.

[42] G. Fox et al. Solving Problems on Concurrent Processors. Prentice Hall, 1988.

[43] G. C. Fox, R. D. Williams, and P. C. Messina. Parallel Computing Works! Morgan Kaufmann Publishers, Inc., 1994.

[44] Nicol, Saltz. "An Analysis of Scatter Decomposition", IEEE Transactions on Computers, November 1990, pages 1153-1161.

[45] Foster I. Designing and building parallel programs: concepts and tools for parallel software engineering, Addison-Wesley, 1995

[46] Feng T Y. A Survey of Interconnection Networks. IEEE Computer, 1981, 14(12):12-27

[47] Hwang K. Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, Inc., 1993

[48] Kumar V, Gupta A, Gupta A et al. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin/Cummings Publishing Company, Inc., 1994

[49] Berntsen J. Communication Efficient Matrix Multiplication on Hypercubes. Parallel Computing, 1989, 12:335-342

[50] Bertsekas D P and Tsitsiklis J N. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, 1989

[51] Cannon L E. A Cellular Computer to Implement the Kalman Filter Algorithm. Ph.D. thesis, Montana State Univ., 1969

[52] Fox G C, Otto S W, Hey A J G. Matrix Algorithms on Hypercube I: Matrix Multiplication. Parallel Computing, 1987, 4:17-31

[53] Golub G H, Van Loan C F. Matrix Computations (2nd Ed). The Johns Hopkins Univ. Press, 1989

[54] Gupta A and Kumar V. The Scalability of Matrix Multiplication Algorithms on Parallel Computers. Proc. Int'l '93 Conference on Parallel Processing, 1993, III-115 to III-119

[55] Ho C T, Johnsson S L, Edelman A. Matrix Multiplication on Hypercubes using Full Bandwidth and Constant Storage. Proc. Int'l '91 Conference on Parallel Processing, 1991, 447-451

[56] Kumar V, Gupta A, Rao V. Scalable Load Balancing Techniques for Parallel Computers. J. Parallel & Distributed Computing, 1994, 22(1):60-79

[57] Kumar V, Gupta A, Gupta A et al. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin/Cummings Publishing Company, Inc., 1994

[58] Don Heller. A survey of parallel algorithms in numerical linear algebra. SIAM Rev. 20 (1978), pp. 740-777

[59] J. M. Ortega, Introduction to Parallel and Vector Solution of Linear Systems, Plenum Press, New York, 1989.

[60] K. A. Gallivan, R. J. Plemmons, and A. H. Sameh, Parallel algorithms for dense linear algebra computations, SIAM Rev. 32 (1990), no. 1, 54-135.

[61] M. T. Heath, E. Ng and B. W. Peyton, Parallel algorithms for sparse linear systems, SIAM Review, Vol. 33, 1991, pp. 420-460

[62] J. M. Ortega and R. G. Voigt. Solution of partial differential equations on vector and parallel computers. SIAM Review, 27:149-240, 1985.

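[63] Parallel computing methods: Zhang Baolin et al. Principles and Methods of Numerical Parallel Computing. National Defense Industry Press, 1999 (in Chinese)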

[64] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comp., vol. 19, pp. 297-301, April 1965.

[65] Nussbaumer, H. J. Fast Fourier Transform and Convolution Algorithms, 2nd ed. New York: Springer-Verlag, 1982.

[66] Paul N. Swarztrauber. Multiprocessor FFTs. Parallel Computing, 5:197-210, 1987.

[67] A. Averbuch, E. Gabber, B. Gordissky and Y. Medan, "A Parallel FFT on a MIMD Machine," Parallel Computing, vol. 15, 1990, pp. 61-74

[68] Blumrich M A, Dubnicki C, Felten E W et al. Protected User-Level DMA for the SHRIMP Network Interface. Proc. 2nd Int'l Symp. on High-Performance Computer Architecture, 1996

[69] Comer D E. Internetworking with TCP/IP. 3rd Ed. Prentice-Hall, 1995

[70] Lauria M, Chien A. MPI-FM: High Performance MPI on Workstation Clusters. J. of Parallel and Distributed Computing, 1997, 40(1):4-18

[71] Mellor-Crummey J M, Scott M L. Algorithms for Scalable Synchronization on Shared Memory Multiprocessors. ACM Trans. Computer Systems, 1991, 9(1):21-65

[72] Messina P, Sterling T (Eds). System Software and Tools for High Performance Computing Environment. SIAM, 1993

[73] Pancake C M. Software Support for Parallel Computing: Where are We Headed? Comm. of the ACM, 1991, 34(11):53-64

[74] Pfister G F. In Search of Clusters. Prentice-Hall PTR, 1995

[75] IEEE, POSIX P1003.4a: Threads Extension for Portable Operating Systems, IEEE, 1994

[76] Snir M et al. The Communication Software and Parallel Environment of the IBM SP2. IBM Systems Journal, 1995, 34(2):205-221

[77] Stallings W. Operating Systems (2nd Ed). Prentice-Hall, 1995

[78] Agha G. Concurrent Object-Oriented Programming. Comm. of the ACM, 1990, 33(9):125-141

[79] Allan S J, Oldehoeft R. HEP SISAL: Parallel Functional Programming. In: Kowalik (Ed). Parallel MIMD Computation: HEP Supercomputers and Applications. MIT Press, 1985

[80] ANSI Technical Committee X3H5. Parallel Processing Model for High-level Programming Languages, 1993

[81] Bal H E, Steiner J G, Tanenbaum A S. Programming Languages for Distributed Computing Systems. ACM Computing Surveys, 1989, 21(3):261-322

[82] OpenMP Standards Board. OpenMP: A Proposed Industry Standard API for Shared Memory Programming, Oct. 1997

[83] OpenMP Standards Board. OpenMP Fortran Application Program Interface Version 1.0, Oct. 1997

[84] IEEE, POSIX P1003.4a: Threads Extension for Portable Operating Systems, IEEE, 1994

[85] Silicon Graphics, IRIS Power C User's Guide. Silicon Graphics Computer Systems, 1989

[86] Wilson G V, Lu P (Eds). Parallel Programming Using C++. MIT Press, 1996

[87] Xu Z, Hwang K. Coherent Parallel Programming in C//. Proc. of Int'l Conf. on Advances in Parallel and Distributed Computing, IEEE Computer Society Press, Mar. 1997, 116-122

[88] Adams J et al. The Fortran 90 Handbook. McGraw-Hill, 1992

[89] Adams J et al. The Fortran 95 Handbook. MIT Press, 1997

[90] Chapman B et al. Extending HPF for Advanced Data-Parallel Applications. IEEE Parallel & Distributed Technology, 1994, 2(3):15-27

[91] Fox G et al. FORTRAN D Language Specification. Rice Univ., 1992.

[92] Geist A et al. PVM: Parallel Virtual Machine - A User's Guide and Tutorial for Networked Parallel Computing. MIT Press, 1994

[93] Hillis W D, Steele G L. Data Parallel Algorithms. Comm. of the ACM, 1986, 29(12):1170-1183

[94] Hwang K, Xu Z. Scalable Parallel Computing: Technology, Architecture, Programming. WCB/McGraw-Hill Companies, 1998

[95] Koelbel C et al. The High Performance Fortran Handbook. MIT Press, 1994

[96] Mehrotra P et al. High Performance Fortran: History, Status and Future. Parallel Computing, 1998, 24:325-354

[97] MPI Forum. MPI: A Message Passing Interface. Proceedings of Supercomputing '93. IEEE Computer Society, 1993, 878-883

[98] Zima H et al. Vienna FORTRAN - A Language Specification. ICASE, 1992. Version 1.1

[99] Alliant. Alliant Product Summary. Alliant Computer Systems Corporation, 1989

[100] Babaoglu O et al. Paralex: An Environment for Parallel Programming in Distributed Systems. Proc. of ACM Int'l Conf. on Supercomputing, 1992

[101] Banerjee U. Dependence Analysis for Supercomputing. Boston: Kluwer Academic Press, 1988

[102] Beguelin A et al. Visualization and Debugging in a Heterogeneous Environment. IEEE Computer, 1993, 26(6)

[103] Boudier G et al. An Overview of PCTE+. SIGPLAN, 1982, 2(24):248-257

[104] Brown J S. Debuggers for High Performance Computers. Proc. of the Supercomputing '93, 1993

[105] Cheng Y. A Survey of Parallel Programming Languages and Tools. Technical Report RND-93-

[106] Gosling J. Unix Emacs. Carnegie-Mellon Computer Science Dept., 1982

[107] Gupta A and Kumar V. The Scalability of Matrix Multiplication Algorithms on Parallel Computers. Proc. Int'l '93 Conference on Parallel Processing, 1993, III-115 to III-119

[108] Hwang K. Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, Inc., 1993

[109] Kacsuk P et al. A Graphical Development and Debugging Environment for Parallel Programs. Parallel Computing, 1997, 22:1747-1770

[110] Luque E et al. Overview and New Trend on PSEE. IEEE Software, 1992

[111] Newton P, Browne J C. The CODE 2.0 Graphical Parallel Programming Language. Proc. of ACM Int'l Conf. on Supercomputing, 1992

[112] Reiss S P. Software Tools and Environments. ACM Computing Surveys, 1996, 28(1):281-284

[113] Ries B. The Paragon Performance Monitoring Environment. Proc. of the Supercomputing '93, 1993

[114] Ross D T. Applications and Extensions of SADT. IEEE Computer, 1985, 18(4):25-35

[115] Rumbaugh J et al. Object-Oriented Modeling and Design. Prentice-Hall, 1991

[116] Scheidler C, Schafers L. TRAPPER: A Graphical Parallel Programming Environment for Industrial High Performance Applications. Proc. of PARLE '93: Parallel Architectures and Languages, 1993

[117] Wolfe M. High-Performance Compilers for Parallel Computing. Addison-Wesley Pub. Company, 1996

[118] NASA Ames Research Center, 1993

[119] Cheng D, Hood R. A Portable Debugger for Parallel and Distributed Programs. Proc. of the Supercomputing '94, 1994

[120] Banerjee U. Dependence Analysis. Boston: Kluwer Academic Publishers, 1996

[121] Blume W, Eigenmann R. Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks Programs. IEEE Trans. on Parallel and Distributed Systems, 1992, 3(6):643-656

[122] Blume W et al. Automatic Detection of Parallelism: A Grand Challenge for High-Performance Computing. IEEE Parallel and Distributed Technology, 1994, 2(3):37-47

[123] Blume W et al. Parallel Programming with Polaris. IEEE Computer, 1996, 29(12):78-82