Question 1: (5 points)
HP v.4 question 2.1.
Question 2: (5 points)
HP v.4 question 2.2.
Question 3: (10 points)
HP v.4 question 2.3.
Question 4: (10 points)
HP v.4 question 2.4.
Question 5: (10 points)
HP v.4 question 2.5.
Question 6: (5 points)
HP v.4 question 2.6.
Question 7: (10 points)
HP v.4 question 2.7.
Question 8: (15 points)
HP v.4 question 2.8.
Question 9: (40 points)
Read the paper "Instruction Sets and Beyond: Computers, Complexity, and
Controversy," by Colwell et al. and answer the questions below:
Question A: Before describing a "RISC Manifesto," the authors discuss the
"drive toward CISC machines." Discuss/explain 3 reasons why CISC type
architectures evolved to become the predominant ISA form prior to the
RISC/CISC debate put forth in the paper.
Question B: What is microcode? How has it enabled CISC scaling and
additional ISA complexity?
Question C: Describe the ideas/design principles behind the 801 machine
(i.e., how were the designers trying to improve performance over the state
of the art?).
Question D: Referencing the Colwell paper, list and explain 2 fallacies in
the pro-RISC arguments of the day.
Question E: Think about the different approaches to benchmarking performance
chosen by the RISC and CISC design communities. In your opinion, was the
pro-CISC approach better or was the pro-RISC approach better? Justify your
answer.
Question 10: (40 points)
This question covers pipelining and hazards with SimpleScalar. To answer this
question, start with the sim-safe simulator. The main loop of the simulator,
sim_main(), executes each instruction and increments the cycle counter by one.
Note that sim-safe does NOT model the timing of the execution; it only models
the functional effects of each instruction. To model timing, you'll have to
modify sim-safe.c to count how many cycles have elapsed during each iteration
of sim_main(). Run all experiments with the three working benchmarks.
Question A: Assume your processor is a 3-wide superscalar (i.e., can execute
a maximum of 3 instructions per cycle). Assuming no hazards of any kind, what
is its performance (i.e., how many cycles does it take to run)?
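As a sanity check for Question A, the ideal cycle count can be sketched as a ceiling division of the dynamic instruction count by the issue width. The helper below is hypothetical (it is not part of SimpleScalar) and assumes every cycle issues a full group of instructions, ignoring pipeline fill and drain:

```c
#include <stdint.h>

/* Hypothetical helper, not part of SimpleScalar.
 * Ideal cycles for a machine that issues `width` instructions per cycle
 * with no hazards: ceil(num_insts / width). `width` must be nonzero. */
uint64_t ideal_cycles(uint64_t num_insts, unsigned width)
{
    return (num_insts + width - 1) / width;
}
```

For example, ideal_cycles(10, 3) evaluates to 4, since the last cycle issues only one instruction.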
Question B (Structural Hazards):
Now assume that the L1 data cache has only one port and thus the processor can only
execute at most one memory operation (load or store) per cycle. How does this
affect its performance?
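One possible way to think about the single-port constraint is as a rule for closing an issue group. The sketch below is only an illustration over a hypothetical array of flags (is_mem), not SimpleScalar code. It assumes in-order issue, so an instruction that cannot issue in the current cycle also holds back every younger instruction:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not part of SimpleScalar.
 * is_mem[i] is nonzero if dynamic instruction i is a load or store.
 * Assumes in-order issue: a stalled instruction starts the next cycle. */
uint64_t cycles_one_mem_port(const unsigned char *is_mem, size_t n,
                             unsigned width)
{
    uint64_t cycles = 0;
    unsigned slots_used = 0; /* instructions already in the current cycle */
    int mem_used = 0;        /* has the single cache port been claimed? */

    for (size_t i = 0; i < n; i++) {
        /* Close the current group if it is full or the port is taken. */
        if (slots_used == width || (is_mem[i] && mem_used)) {
            cycles++;
            slots_used = 0;
            mem_used = 0;
        }
        slots_used++;
        if (is_mem[i])
            mem_used = 1;
    }
    if (slots_used > 0)
        cycles++; /* count the final, possibly partial, group */
    return cycles;
}
```

In sim-safe.c the same bookkeeping would live inside the main loop, with the flag derived from the decoded instruction rather than from an array.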
Question C (Data Hazards):
Now further assume that the processor cannot execute data dependent
instructions in the same cycle. For example, if an instruction writes to
register 2, then no subsequent instruction (in program order) that reads
register 2 can execute in the same cycle (it must wait until the next cycle).
How does this affect performance? Note that this question is independent of the
pipeline length.
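The read-after-write rule described above can be sketched by remembering which registers have been written in the current cycle. Everything in this sketch is an assumption made for illustration: the struct inst layout, the NO_REG marker, and the limit of 64 register numbers are hypothetical and do not come from SimpleScalar. It models only the data hazard, with in-order issue:

```c
#include <stddef.h>
#include <stdint.h>

#define NO_REG (-1) /* hypothetical marker: operand slot not used */

/* Hypothetical instruction record; register numbers must be in 0..63. */
struct inst {
    int dest;
    int src1;
    int src2;
};

/* True if `reg` was written by an earlier instruction in this cycle. */
static int reads_pending(uint64_t written, int reg)
{
    return reg != NO_REG && ((written >> reg) & 1u);
}

uint64_t cycles_with_raw(const struct inst *prog, size_t n, unsigned width)
{
    uint64_t cycles = 0;
    uint64_t written = 0;    /* bitmask of registers written this cycle */
    unsigned slots_used = 0;

    for (size_t i = 0; i < n; i++) {
        int raw = reads_pending(written, prog[i].src1) ||
                  reads_pending(written, prog[i].src2);
        /* A dependent instruction (or a full group) starts a new cycle. */
        if (slots_used == width || raw) {
            cycles++;
            slots_used = 0;
            written = 0;
        }
        slots_used++;
        if (prog[i].dest != NO_REG)
            written |= (uint64_t)1 << prog[i].dest;
    }
    if (slots_used > 0)
        cycles++;
    return cycles;
}
```

If the target ISA has a hardwired zero register, a caller would presumably pass NO_REG for it so that it never creates a dependence.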
Question D (Control Hazards):
Now further assume that the processor has a 10-stage pipeline. The result of a
conditional branch (i.e., taken or not-taken) is computed in stage 7. The
processor statically predicts all conditional branches as taken and continues
fetching from the branch destination. If the branch is indeed taken, then there
is no penalty. If the branch is not taken, then all instructions after it are
squashed and fetching resumes from the instruction immediately after the branch
in program order. How does this affect performance?
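A first-order way to account for the control hazard is to add a fixed penalty for every conditional branch that turns out to be not taken. The sketch below assumes one common convention, namely that a branch resolved in stage s costs s - 1 bubble cycles (6 cycles for stage 7); other conventions differ by a cycle, so state whichever one you use. The function name and parameters are hypothetical, and the sketch does not model the fact that a mispredicted branch also ends its issue group:

```c
#include <stdint.h>

/* Hypothetical helper, not part of SimpleScalar.
 * base_cycles:        cycle count before control hazards are considered
 * not_taken_branches: dynamic conditional branches that were not taken,
 *                     i.e., mispredicted under predict-taken
 * resolve_stage:      pipeline stage (>= 1) where the outcome is known
 * Assumes a penalty of (resolve_stage - 1) cycles per misprediction. */
uint64_t cycles_with_branch_penalty(uint64_t base_cycles,
                                    uint64_t not_taken_branches,
                                    unsigned resolve_stage)
{
    uint64_t penalty = (uint64_t)resolve_stage - 1;
    return base_cycles + not_taken_branches * penalty;
}
```

Under this assumption, 1000 base cycles with 10 not-taken branches resolved in stage 7 give 1060 cycles.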