Upload
rafe-jefferson
View
217
Download
0
Tags:
Embed Size (px)
Citation preview
Computer Architecture CSE 3322Lecture 8
TEST 1 – Tuesday March 3
Lectures 1 - 8, Ch 1,2
Web Sitecrystal.uta.edu/~cse3322
TEST 1 – Tuesday March 3Lectures 1 - 8, Ch 1,2
• HW Due Feb 24– 1.4.1 p.60– 1.4.4 p.60– 1.4.6 p.60– 1.5.2 p.60-61– 1.5.4 p.61– 1.5.5 p.61
CPI
Invest Resources where time is Spent!
CPI = Clock Cycles / Instruction Count = (CPU Time * Clock Rate) / Instruction Count
“Average clock cycles per instruction”
CPU Time = Instruction Count x CPI / Clock Rate = Instruction Count x CPI x Clock Cycle Time
Average CPI = SUM of CPI (i) * I(i) for i=1, n Instruction Count
Average CPI = SUM of CPI(i) * F(i) for i = 1, nF(i) is the Instruction Frequency
Example (RISC processor)
Typical Mix
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time ALU 50% 1
Load 20% 5
Store 10% 3
Branch 20% 2
Example (RISC processor)
Typical Mix
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5
Load 20% 5 1.0
Store 10% 3 .3
Branch 20% 2 .4
2.2 = CPI ave
Example (RISC processor)
Typical Mix
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5
Load 20% 5 1.0
Store 10% 3 .3
Branch 20% 2 .4
2.2 = CPI ave
CPU Time(i) = Instr Cnt(i) * CPI(i) * Clk Cycle TimeCPU Time Inst Cnt * CPI ave * Clk Cycle Time
% Time = F(i) * CPI(i) / CPI ave
Example (RISC processor)
Typical Mix
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5 23%
Load 20% 5 1.0 45%
Store 10% 3 .3 14%
Branch 20% 2 .4 18%
2.2 = CPI ave
CPU Time(i) = Instr Cnt(i) * CPI(i) * Clk Cycle TimeCPU Time Inst Cnt * CPI ave * Clk Cycle Time
% Time = F(i) * CPI(i) / CPI ave
Example (RISC processor)
Typical Mix
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5 23%
Load 20% 5 1.0 45%
Store 10% 3 .3 14%
Branch 20% 2 .4 18%
2.2 = CPI ave
How much faster would the machine be if a better data cachereduced the average load time to 2 cycles?
Example (RISC processor)
Typical Mix
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5 23%
Load 20% 5 (2) 1.0 (.4) 45%
Store 10% 3 .3 14%
Branch 20% 2 .4 18%
2.2 (1.6)
How much faster would the machine be if a better data cachereduced the average load time to 2 cycles?
2.2/1.6 = 1.375
CPU Time = Inst Cnt * CPI ave * Clk Cycle Time
Example (RISC processor)
Typical Mix
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5 23%
Load 20% 5 1.0 45%
Store 10% 3 .3 14%
Branch 20% 2 .4 18%
2.2
How much faster would the machine be if a better data cachereduced the average load time to 2 cycles? CPI = 1.6
How does this compare with using branch prediction to shave a cycle off the branch time?
Example (RISC processor)
Typical Mix
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5 23%
Load 20% 5 1.0 45%
Store 10% 3 .3 14%
Branch 20% 2 (1) .4 (.2) 18%
2.2 (2.0)
How much faster would the machine be if a better data cachereduced the average load time to 2 cycles? CPI = 1.6
How does this compare with using branch prediction to shave a cycle off the branch time? CPI = 2.0
Example (RISC processor)
Typical Mix
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5 23%
Load 20% 5 1.0 45%
Store 10% 3 .3 14%
Branch 20% 2 .4 18%
2.2
How much faster would the machine be if a better data cachereduced the average load time to 2 cycles? CPI = 1.6
How does this compare with using branch prediction to shave a cycle off the branch time? CPI = 2.0
What if two ALU instructions could be executed at once?
Example (RISC processor)
Typical Mix
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 (.5) .5 (.25) 23%
Load 20% 5 1.0 45%
Store 10% 3 .3 14%
Branch 20% 2 .4 18%
2.2 (1.95)
How much faster would the machine be if a better data cachereduced the average load time to 2 cycles? CPI = 1.6
How does this compare with using branch prediction to shave a cycle off the branch time? CPI = 2.0
What if two ALU instructions could be executed at once? CPI=1.95
A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions:
Class A has 1 cycle Class B has 2 cycles Class C has 3 cycles
The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C
The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.
Which sequence will be faster? How much?What is the CPI for each sequence?
A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions:
Class A has 1 cycle Class B has 2 cycles Class C has 3 cycles
The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C 2*1+1*2+2*3 = 10 The second
sequence has 6 instructions: 4 of A, 1 of B, and 1 of C. 4*1+1*2+1*3 = 9 Which sequence will be faster? How much?
What is the CPI for each sequence?
A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions:
Class A has 1 cycle Class B has 2 cycles Class C has 3 cycles
The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C 2*1+1*2+2*3 = 10 The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C. 4*1+1*2+1*3 = 9 Which sequence will be
faster? How much? 10 / 9 = 1.11What is the CPI for each sequence?
A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions:
Class A has 1 cycle Class B has 2 cycles Class C has 3 cycles
The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C 2*1+1*2+2*3 = 10 The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C. 4*1+1*2+1*3 = 9 Which sequence will be faster?
How much? 10 / 9 = 1.11What is the CPI for each sequence? 10/5 = 2
9/6 = 1.5
MIPS = Instruction Count
Execution time x 10 6
A popular performance metric is MIPS, the numberof millions of instructions per second.
For a given program,
MIPS = Instruction Count
Execution time x 10 6
A popular performance metric is MIPS, the numberof millions of instructions per second.
For a given program,
1. Cannot compare if instruction set is different
MIPS = Instruction Count
Execution time x 10 6
A popular performance metric is MIPS, the numberof millions of instructions per second.
For a given program,
1. Cannot compare if instruction set is different2. Highly dependent on the program
MIPS = Instruction Count
Execution time x 10 6
A popular performance metric is MIPS, the numberof millions of instructions per second.
For a given program,
1. Cannot compare if instruction set is different2. Highly dependent on the program3. Can be inversely proportional to performance
Two different compilers are being tested for a 100 MHz. machine with three different classes of instructions: Class A has 1 cycle,Class B has 2 cycles, Class C has 3 cycles
Instruction counts ( billions)Code from A B CCompiler 1 5 1 1Compiler 2 10 1 1
• Which sequence will be faster according to MIPS?• Which sequence will be faster according to execution
time?
MIPS example
Two different compilers are being tested for a 100 MHz. machine with three different classes of instructions:
Class A Class B Class C
CPI 1 2 3 Instruction counts ( billions)
Code from A B C TotalCompiler 1 5 1 1 7Compiler 2 10 1 1 12 CPU cycles Exec Time MIPSCompiler 1 5+1x2+1x3=10 billionCompiler 2
MIPS example
Two different compilers are being tested for a 100 MHz. machine with three different classes of instructions:
Class A Class B Class C
CPI 1 2 3 Instruction counts ( billions)
Code from A B C TotalCompiler 1 5 1 1 7Compiler 2 10 1 1 12 CPU cycles Exec Time MIPSCompiler 1 10 billionCompiler 2 15 billion
MIPS example
Two different compilers are being tested for a 100 MHz. machine with three different classes of instructions:
Class A Class B Class C
CPI 1 2 3 Instruction counts ( billions)
Code from A B C TotalCompiler 1 5 1 1 7Compiler 2 10 1 1 12 CPU cycles Exec Time MIPSCompiler 1 10 billion 1010x10-8=100Compiler 2 15 billion
MIPS example
Two different compilers are being tested for a 100 MHz. machine with three different classes of instructions:
Class A Class B Class C
CPI 1 2 3 Instruction counts ( billions)
Code from A B C TotalCompiler 1 5 1 1 7Compiler 2 10 1 1 12 CPU cycles Exec Time MIPSCompiler 1 10 billion 100 secCompiler 2 15 billion 150 sec
MIPS example
Two different compilers are being tested for a 100 MHz. machine with three different classes of instructions:
Class A Class B Class C
CPI 1 2 3 Instruction counts ( billions)
Code from A B C TotalCompiler 1 5 1 1 7Compiler 2 10 1 1 12 CPU cycles Exec Time MIPSCompiler 1 10 billion 100 sec 7x103/100Compiler 2 15 billion 150 sec
MIPS example
Two different compilers are being tested for a 100 MHz. machine with three different classes of instructions:
Class A Class B Class CCPI 1 2 3
Instruction counts ( billions)Code from A B C TotalCompiler 1 5 1 1 7Compiler 2 10 1 1 12 CPU cycles Exec Time MIPSCompiler 1 10 billion 100 sec 70Compiler 2 15 billion 150 sec 12x103/150
MIPS example
Two different compilers are being tested for a 100 MHz. machine with three different classes of instructions:
Class A Class B Class C
CPI 1 2 3 Instruction counts ( billions)
Code from A B C TotalCompiler 1 5 1 1 7Compiler 2 10 1 1 12 CPU cycles Exec Time MIPSCompiler 1 10 billion 100 sec 70Compiler 2 15 billion 150 sec 80
MIPS example
• Performance best determined by running a real application– Use programs typical of expected workload– Or, typical of expected class of applications
e.g., compilers/editors, scientific applications, graphics, etc.
Benchmarks
• Performance best determined by running a real application– Use programs typical of expected workload– Or, typical of expected class of applications
e.g., compilers/editors, scientific applications, graphics, etc.
• Small benchmarks– nice for architects and designers– easy to standardize– can be abused
Benchmarks
SPEC CPU 2006
• SPEC - System Performance Evaluation Cooperative• 12 Integer Benchmarks• 17 Floating Point Benchmarks• Fig 1.20 P.49
Execution Time After Improvement =
Execution Time Unaffected +
( Execution Time Affected / Amount of Improvement )
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected + ( Execution Time Affected / Amount of Improvement )
• Example:Suppose a program runs in 100 seconds on a
machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected + ( Execution Time Affected / Amount of Improvement )
• Example:Suppose a program runs in 100 seconds on a machine,
with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster? Improved Time = (100 – 80) + 80/n = 100/4
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected + ( Execution Time Affected / Amount of Improvement )
• Example:Suppose a program runs in 100 seconds on a machine, with
multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster? Improved Time = (100 – 80) + 80/n = 100/4
20 + 80/n = 25 80 = 5n ; n = 16
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected + ( Execution Time Affected / Amount of Improvement )
• Example:Suppose a program runs in 100 seconds on a
machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?
How about making it 5 times faster?
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected + ( Execution Time Affected / Amount of Improvement )
• Example:Suppose a program runs in 100 seconds on a machine,
with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?
How about making it 5 times faster?Improved Time = (100 –80) + 80/n = 100/5
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected + ( Execution Time Affected / Amount of Improvement )
• Example:Suppose a program runs in 100 seconds on a machine, with multiply
responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?
How about making it 5 times faster?Improved Time = (100 –80) + 80/n = 100/5 20 + 80/n = 20
80/n = 0 Impossible!
Amdahl's Law
• Performance is specific to a particular program/s
– Total execution time is a consistent summary of performance
Remember
• Performance is specific to a particular program/s– Total execution time is a consistent summary of
performance
• For a given architecture performance increases come
from:– increases in clock rate (without adverse CPI affects)
Remember
• Performance is specific to a particular program/s– Total execution time is a consistent summary of
performance
• For a given architecture performance increases come from:– increases in clock rate (without adverse CPI affects)– improvements in processor organization that lower CPI
Remember
• Performance is specific to a particular program/s– Total execution time is a consistent summary of performance
• For a given architecture performance increases come from:– increases in clock rate (without adverse CPI affects)– improvements in processor organization that lower CPI– compiler enhancements that lower CPI and/or instruction
count
Remember