CSCE614 Computer Architecture (Fall 2013)

Assignment #2

Due: 10/7, 11:20 AM (the report must be submitted in class)

Objective

This project will help you understand how a pseudo-associative (column-associative) cache works. You will first analyze the sensitivity of L1 caches to changes in their parameters. You will then implement a pseudo-associative L1 data cache in SimpleScalar and compare its performance with that of a conventional direct-mapped L1 data cache.

System Requirements

A Linux operating system is required to use the pre-compiled little-endian Alpha ISA SPEC2000 binaries. Do not use Cygwin. If you do not have a Linux machine, use linux.cs.tamu.edu with your CS account. If you do not have a CS account, contact the HelpDesk on the first floor.

Setting up the environment and installing SimpleScalar

1. Download and Install SimpleScalar 3.0.

(1) Download simplesim-3v0e.tgz from http://www.simplescalar.com/.

(2) Untar the downloaded file.

$ tar xzvf simplesim-3v0e.tgz

(3) Read the README file in the simplesim-3.0 directory you have just untarred.

(4) Compile the simulator according to the instructions.

$ make config-alpha

$ make

Note: Some versions of GCC may produce compilation errors. If that happens, use the version of GCC already installed on the department Linux machine, linux.cs.tamu.edu.

(5) After building the simulator, execute 'sim-outorder' to list all the configurable parameters of the out-of-order simulator and their default values.

2. Get the benchmark.

(1) Download the Alpha binaries of the SPEC CPU2000 benchmarks from the following link.

http://students.cse.tamu.edu/rahulboyapati/spec2000binary.tgz

(2) Untar the downloaded file.

$ tar xzvf spec2000binary.tgz

3. Get run scripts and argument files.

(1) Download files from the following links.

http://students.cse.tamu.edu/rahulboyapati/spec2000args.tgz

http://students.cse.tamu.edu/rahulboyapati/runscripts.tgz

(2) Untar the files using the tar command.

(3) Each run script is an executable script that runs one benchmark.

(4) Each benchmark needs its own arguments, which are stored in the argument files.

(5) Select two integer and two floating-point benchmarks according to the last digit of your UIN.

Last digit of UIN / Integer / Floating Point
0 / bzip2, crafty / ammp, applu
1 / crafty, gap / applu, apsi
2 / gap, gcc / apsi, art
3 / gcc, gzip / art, equake
4 / gzip, mcf / equake, fma3d
5 / mcf, parser / fma3d, galgel
6 / parser, twolf / galgel, lucas
7 / twolf, vortex / lucas, mesa
8 / vortex, vpr / mesa, mgrid
9 / vpr, bzip2 / mgrid, swim

4. Run the benchmarks using the compiled SimpleScalar binary.

(1) Copy the script to the directory where the argument files are stored.

Note: The script file and argument files must be in the same directory.

$ cp (script dir)/RUN(benchmark) (spec2000args dir)/(benchmark)

Ex) Assuming tar files are extracted in the current directory

$ cp runscripts/RUNequake spec2000args/equake

(2) Run the script

$ cd (spec2000args dir)/(benchmark)

$ ./RUN(benchmark) (simplescalar dir)/sim-outorder (spec2000bin dir)/(benchmark)00.peak.ev6 (simplescalar options)

Ex) Assuming tar files are extracted in the current directory

$ cd spec2000args/equake

$ ./RUNequake ../../simplesim-3.0/sim-outorder ../../spec2000binaries/equake00.peak.ev6 -max:inst 50000000 -fastfwd 20000000 -redir:sim output1.txt -bpred bimod -bpred:bimod 256 -bpred:ras 8 -bpred:btb 64 2

Procedure

Implement a pseudo-associative L1 data cache in SimpleScalar. Run sim-outorder on the SPEC2000 benchmarks to compare its performance with that of the conventional direct-mapped L1 data cache. Use the integer and floating-point benchmarks assigned to the last digit of your UIN.

When running sim-outorder, use the following options as default.

-max:inst 50000000 -fastfwd 20000000 -redir:sim sim_output_file

-bpred 2lev -bpred:2lev 1 256 4 0 -bpred:ras 8 -bpred:btb 64 2

Because the assignment requires you to modify only the L1 cache configurations, you can use a unified 64 KB L2 cache with 64 B cache blocks and 2-way associativity.

If you are running SimpleScalar on linux.cse.tamu.edu, make sure you are not monopolizing the machine's computational resources. Do not run more than one instance at a time on linux.cse.tamu.edu. Doing so is a violation of section 3.3 of the Appropriate Use of Computer Science Computing Resources Policy, located here: http://www.cse.tamu.edu/department/policies/resources

Do not run more than one instance of any benchmark simultaneously on the same machine; it may cause errors. Run one instance at a time per benchmark.

Assignment

Part A.

In the first part of the assignment you will evaluate the sensitivity of L1 caches to changes in cache size, block size, associativity, and replacement policy. Run the simulations on all of the configurations below and analyze how changing each cache parameter affects performance.

Configuration / Size / Associativity / Cache block size / Replacement policy
1 (baseline) / 4 KB / Direct mapped / 32 B
2 / 4 KB / {4, 8, fully} / 32 B / {LRU, random}
3 / 4 KB / Direct mapped / 64 B
4 / 16 KB / Direct mapped / 32 B
Report the relevant cache performance results for each configuration, and explain why you think each configuration shows the behavior you observe. Also explain how changes in L1 cache performance affect the performance of the L2 cache.

You will need to look up the options required to simulate these cache configurations. They are listed among the configurable parameters printed when you execute sim-outorder, as in step 1.(5) of the setup instructions.
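As a worked illustration of the arithmetic behind these configurations, the sketch below derives the number of sets from the cache size, block size, and associativity, and formats the result in the name:nsets:bsize:assoc:repl form that the dl1 option strings in Part B use. The helper names cache_nsets and format_cache_cfg are hypothetical and not part of SimpleScalar; check the option syntax and the replacement-policy letters against the sim-outorder help output before relying on them.

```c
#include <assert.h>
#include <stdio.h>

/* Number of sets = cache size / (block size * associativity). */
unsigned cache_nsets(unsigned size_bytes, unsigned bsize, unsigned assoc)
{
    assert(bsize > 0 && assoc > 0);
    assert(size_bytes % (bsize * assoc) == 0);   /* geometry must divide evenly */
    return size_bytes / (bsize * assoc);
}

/* Writes a string such as "dl1:128:32:1:l" into buf and returns its length.
 * The replacement-policy letter is passed through unchanged; which letters
 * the simulator accepts is an assumption to verify in the help output. */
int format_cache_cfg(char *buf, size_t buflen, const char *name,
                     unsigned size_bytes, unsigned bsize, unsigned assoc,
                     char repl)
{
    return snprintf(buf, buflen, "%s:%u:%u:%u:%c", name,
                    cache_nsets(size_bytes, bsize, assoc), bsize, assoc, repl);
}
```

For the baseline, cache_nsets(4096, 32, 1) gives 128 sets. A fully associative 4 KB cache with 32 B blocks corresponds to 1 set with associativity 128, and the 64 KB, 64 B, 2-way L2 cache corresponds to 512 sets.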

Part B.

0. Reading

(1) Anant Agarwal and Steven D. Pudar, “Column-Associative Caches: A Technique for Reducing the Miss Rate of Direct-Mapped Caches,” ISCA 1993

1. Guideline

Direct-mapped caches are simple, easy to design, and have a short hit access time. Their biggest drawback, however, is the large number of conflict misses. Pseudo-associative caches resolve conflicts by allowing alternate hashing functions, and they achieve a much higher hit rate than conventional direct-mapped caches while keeping almost the same hit access time.

A pseudo-associative cache is organized like a direct-mapped cache. The fundamental idea is to resolve conflicts by dynamically choosing alternate locations, which are reached through different hashing functions. When a conflict miss occurs, the pseudo-associative cache tries to avoid it by relocating the cache block using a rehashing function. The simplest rehashing function is bit selection with the highest-order index bit inverted, which is called bit flipping.

To avoid the secondary thrashing effect, which is explained in detail in the reference paper, each cache block carries one extra bit of information, called the rehash bit, that indicates whether the block resides in a rehashed location.
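Bit selection and bit flipping can be sketched as two small functions. This is a minimal standalone illustration that assumes the block size and the number of sets are powers of two; the names bitsel_index and flip_index are hypothetical and do not appear in SimpleScalar.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional bit-selection index: the low-order bits of the block address. */
unsigned bitsel_index(uint64_t addr, unsigned bsize, unsigned nsets)
{
    assert(bsize > 0 && (bsize & (bsize - 1)) == 0);   /* power of two */
    assert(nsets > 1 && (nsets & (nsets - 1)) == 0);   /* power of two */
    return (unsigned)((addr / bsize) & (nsets - 1));
}

/* Rehash index: the same index with its highest-order bit inverted. */
unsigned flip_index(unsigned index, unsigned nsets)
{
    return index ^ (nsets >> 1);
}
```

With 128 sets, index 0 rehashes to index 64 and index 64 rehashes back to index 0, so every line has exactly one partner line.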

2. Design

Add a new CACHE_TAG_PSEUDOASSOC macro in cache.c to get a tag value with the high-order bit of the index appended at the end.

#define CACHE_TAG_PSEUDOASSOC(cp, addr) …
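The reason for extending the tag can be shown with a standalone sketch: two blocks whose indices differ only in the high-order bit may occupy the same pair of lines, so a conventional tag alone cannot tell them apart. The functions plain_tag and extended_tag below are hypothetical illustrations that work on raw addresses; they are not the body of the macro, which would be written in terms of the fields of the cache structure in cache.c.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional tag: the address bits above the block offset and the index. */
uint64_t plain_tag(uint64_t addr, unsigned bsize, unsigned nsets)
{
    assert(bsize > 0 && nsets > 0);
    return (addr / bsize) / nsets;
}

/* Extended tag: the conventional tag followed by the high-order index bit,
 * i.e. the block address with all but the top index bit dropped.
 * Assumes nsets is a power of two greater than 1. */
uint64_t extended_tag(uint64_t addr, unsigned bsize, unsigned nsets)
{
    assert(bsize > 0);
    assert(nsets > 1 && (nsets & (nsets - 1)) == 0);
    return (addr / bsize) / (nsets >> 1);
}
```

For example, with 128 sets of 32 B, addresses 0x0000 (index 0) and 0x0800 (index 64) have the same conventional tag but different extended tags.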

Add one more variable to struct cache_blk_t for the rehash bit, as follows. The rehash bit must be initialized to 1 when the pseudo-associative cache is first created in the cache_create() function in cache.c.

int rehash_bit;

You must modify the cache_access() function in cache.c to implement the pseudo-associative cache for L1D. Because cache_access() is a general function used by all caches in the system and the pseudo-associative cache applies only to L1D, you need to write new pseudo-associative code that takes effect only for L1D.
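The lookup logic described in the reference paper can be tried out in isolation before touching cache_access(). The following is a toy model under stated assumptions, not SimpleScalar code: every name in it (toy_cache_t, toy_init, toy_access) is hypothetical, the geometry is fixed at 128 sets of 32 B, and the full block address is stored in place of a tag to keep the comparison simple.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSETS 128u   /* assumed geometry: 128 sets */
#define TOY_BSIZE 32u    /* assumed geometry: 32-byte blocks */

typedef struct {
    int valid;
    int rehash_bit;      /* 1 = line holds a block in its rehashed location */
    uint64_t blk;        /* full block address, used instead of a tag */
} toy_line_t;

typedef struct {
    toy_line_t line[TOY_NSETS];
    unsigned long hits, misses;
} toy_cache_t;

void toy_init(toy_cache_t *c)
{
    assert(c != 0);
    for (unsigned i = 0; i < TOY_NSETS; i++) {
        c->line[i].valid = 0;
        c->line[i].rehash_bit = 1;   /* rehash bits start at 1 */
        c->line[i].blk = 0;
    }
    c->hits = c->misses = 0;
}

/* Returns 1 on a hit (first-time or rehash hit), 0 on a miss. */
int toy_access(toy_cache_t *c, uint64_t addr)
{
    uint64_t blk = addr / TOY_BSIZE;
    unsigned i = (unsigned)(blk & (TOY_NSETS - 1));   /* bit selection */
    unsigned j = i ^ (TOY_NSETS >> 1);                /* bit flipping */
    toy_line_t *p = &c->line[i], *q = &c->line[j];

    if (p->valid && p->blk == blk) {                  /* first-time hit */
        c->hits++;
        return 1;
    }
    if (p->rehash_bit) {
        /* Line i is empty or holds a rehashed block: replace it in place
         * without probing line j. This avoids secondary thrashing. */
        p->valid = 1;
        p->blk = blk;
        p->rehash_bit = 0;
        c->misses++;
        return 0;
    }
    int hit = q->valid && q->blk == blk;              /* probe rehashed line */
    /* Hit or miss, the old occupant of line i moves to line j (evicting
     * whatever else was there) and the requested block ends up in line i. */
    *q = *p;
    q->rehash_bit = 1;
    p->valid = 1;
    p->blk = blk;
    p->rehash_bit = 0;
    if (hit) c->hits++; else c->misses++;
    return hit;
}
```

As a check, take addresses 0x0000 and 0x1000, which map to the same line in a 128-set, 32 B direct-mapped cache. Accessing them alternately misses every time in the direct-mapped cache, whereas in this model only the first access to each address misses.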

3. Implementation

Add the following option for the pseudo-associative cache.

-pseudoassoc <true/false> # false # use pseudo-associative cache in L1D

4. Comparison

Compare the performance of the two L1D cache configurations, assuming the same size (128 sets * 32-byte block size = 4 KB).

(1) Normal direct-mapped L1D : -cache:dl1 dl1:128:32:1:l -pseudoassoc false

(2) Pseudo-associative L1D : -cache:dl1 dl1:128:32:1:l -pseudoassoc true

You do not need to model the different hit access times of the pseudo-associative cache. Focus only on hit and miss rates (dl1.hits/misses/miss_rate in the SimpleScalar results).
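When tabulating the comparison, the miss rate and the relative improvement can be computed from the hit and miss counts as sketched below. The function names are hypothetical, and the sketch assumes the miss rate is defined as misses divided by total accesses (hits plus misses); confirm that this matches the dl1.miss_rate line in your own output.

```c
#include <assert.h>

/* Miss rate = misses / (hits + misses); 0 if there were no accesses. */
double miss_rate(unsigned long hits, unsigned long misses)
{
    unsigned long accesses = hits + misses;
    return accesses ? (double)misses / (double)accesses : 0.0;
}

/* Relative miss-rate reduction of the pseudo-associative cache (pa_rate)
 * over the direct-mapped baseline (dm_rate). */
double miss_rate_reduction(double dm_rate, double pa_rate)
{
    assert(dm_rate > 0.0);
    return (dm_rate - pa_rate) / dm_rate;
}
```

For instance, going from a miss rate of 0.5 to 0.25 is a relative reduction of 0.5, that is, half of the baseline misses are removed. These numbers are purely illustrative and are not measured results.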

Turn-in Instructions

1. Put all your files, including the modified source code, the simulation results, and the report, into one zip file. We accept zip files only. If you send a different file format, you may receive 0 points for the assignment.

Your report must contain the simulation results and an analysis of them. (Include the SimpleScalar log files in the zip file, but do not put whole logs in your report.) You may use any result you consider important. Only Microsoft Word (DOC or DOCX) or PDF is acceptable for the report.

2. Send the zipped file to with the following in the email’s subject line:

Assignment2 (Your Full Name)

3. IMPORTANT: Be sure to turn in a hard copy of your report, including the simulation results, in class. It must be the same as the one submitted via email.

4. Late submission penalty: 5% deduction per day