CISC 662 Graduate Computer Architecture
Lecture 1 - Introduction Michela Taufer
http://www.cis.udel.edu/~taufer/courses
PowerPoint lecture notes from John Hennessy and David Patterson's Computer Architecture: A Quantitative Approach, 4th edition.
Additional teaching material from Jelena Mirkovic (U Del) and John Kubiatowicz (UC Berkeley).
2
Course Overview
3
CISC662: Information
Instructor: Michela Taufer
Office: 406 Smith Hall
Office Hours: TR 10:45AM – 11:45AM or by appointment
TA: Richard Burns (Office Hours: TBD)
Class: TR 3:30PM – 4:45PM
Text: Computer Architecture: A Quantitative Approach, Fourth Edition (2006)
Web page: http://www.cis.udel.edu/~taufer/courses
Lectures available on the course webpage 24 hours before class
Mailing list: http://gcl.cis.udel.edu/mailman/listinfo/cisc662010_fa09
Email: Instructor: [email protected] or [email protected]
4
CISC 662 Course Focus: Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st century
[Diagram: computer architecture at the intersection of Technology, Programming Languages, Operating Systems, History, Applications, Interface Design (ISA), Measurement & Evaluation, Parallelism, and Compilers]
Computer Architecture:
• Instruction Set Design
• Organization
• Hardware/Software Boundary
5
Tentative Topics Coverage
Textbook: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th Ed., 2006
Tentative Schedule:
• 2.5 weeks: Fundamentals of Computer Architecture, Instruction Set Architecture
• 1.5 weeks: Pipelining
• 3.0 weeks: Instruction-Level Parallelism
• 1.5 weeks: Multiprocessors and Thread-Level Parallelism
• 3.0 weeks: Memory and Memory Hierarchy
6
Grading • Grade based on:
• Homework assignments
• Midterm exam
• Final exam
• Reading assignments and participation in class (questions and question discussion)
7
Participation • Complete reading and homework assignments on time • Print and review slides before coming to class
– Slides will be available 24 hours before the class starts
8
Getting Help • Course webpage at http://cis.udel.edu/~taufer/courses
– Copies of lectures and project assignments
– Clarifications to assignments, deadlines
– Syllabus and class schedule including deadlines
– User: cisc662student – Password: Study4Fun!
• Discussions through the mailing list
– Clarifications to assignments, general discussion
– Send e-mail to all with: [email protected]
– You MUST register for the list at: http://gcl.cis.udel.edu/mailman/listinfo/cisc662010_fa09
• Personal help
– Benefit from office hours
9
Cheating • What is cheating?
– Sharing code: either by copying, retyping, looking at, or supplying a copy of a file.
• What is NOT cheating? – Helping others use systems or tools. – Helping others with high-level design issues. – Helping others debug their code.
• Penalty for cheating: – Removal from course with failing grade.
10
Concepts in Architecture
11
What’s Inside a Computer?
[Diagram: CPU (instruction decoder, ALU, clock), memory hierarchy (cache, main memory, disk), and input/output units with an I/O controller]
12
What Does Each Unit Do?
[Diagram: the CPU (instruction decoder, ALU, clock) executes a+b=c; operands a and b are fetched from memory (cache, main memory, disk); the result c is sent through the I/O controller to the I/O units to print c]
13
What is “Computer Architecture”?
[Diagram: levels of abstraction, from Applications down through Compiler, Operating System, Firmware, Instruction Set Architecture, Instr. Set Proc. / I/O system, Datapath & Control, Digital Design, Circuit Design, Layout & fab, to Semiconductor Materials]
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation
14
Technology constantly on the move!
• All major manufacturers have announced and/or are shipping multi-core processor chips
• Intel is talking about 80 cores in the not-too-distant future
• 3-dimensional chip technology
– Sandwiches of silicon
– “Through-vias” for communication
• Number of transistors/die keeps increasing
– Intel Core 2: 65nm, 291 million transistors!
– Intel Pentium D 900: 65nm, 376 million transistors!
[Image: Intel Core Duo]
15
Moore’s Law
• “Cramming More Components onto Integrated Circuits” – Gordon Moore, Electronics, 1965
• The number of transistors on a cost-effective integrated circuit doubles every 18 months
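The doubling claim is easy to turn into arithmetic. A minimal sketch; the 1971 Intel 4004's roughly 2,300 transistors and the 30-year horizon are illustrative inputs, not from the slide:

```python
def transistors(start_count, years, doubling_period_years=1.5):
    """Project transistor count assuming doubling every 18 months."""
    return start_count * 2 ** (years / doubling_period_years)

# Starting from ~2,300 transistors (Intel 4004, 1971), project 30 years out:
projected = transistors(2_300, 30)
print(f"{projected:,.0f}")  # on the order of a few billion transistors
```

Thirty years is twenty doubling periods, so the projection is start_count × 2^20, which is why the curve outruns every other technology trend on these slides.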
16
Computer Architecture’s Changing Definition
• 1950s to 1960s: Computer Architecture Course: Computer Arithmetic
• 1970s to mid 1980s: Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers
• 1990s: Computer Architecture Course: Design of CPU, memory system, I/O system, Multiprocessors, Networks
• 2000s: Multi-core design, on-chip networking, parallel programming paradigms, power reduction
• 2010s: Computer Architecture Course: Self adapting systems? Self organizing structures? DNA Systems/Quantum Computing?
17
The Instruction Set: a Critical Interface
[Diagram: the instruction set as the interface between software and hardware]
• Properties of a good abstraction – Lasts through many generations (portability) – Used in many different ways (generality) – Provides convenient functionality to higher levels – Permits an efficient implementation at lower levels
18
Instruction Set Architecture ... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.
– Amdahl, Blaauw, and Brooks, 1964
-- Organization of Programmable Storage
-- Data Types & Data Structures: Encodings & Representations
-- Instruction Formats
-- Instruction (or Operation Code) Set
-- Modes of Addressing and Accessing Data Items and Instructions
-- Exceptional Conditions
19
Computer Architecture is an Integrated Approach
• What really matters is the functioning of the complete system
– hardware, runtime system, compiler, operating system, and application
– In networking, this is called the “End to End argument”
• Computer architecture is not just about transistors, individual instructions, or particular implementations
– E.g., Original RISC projects replaced complex instructions with a compiler + simple instructions
• It is very important to think across all hardware/software boundaries
– New technology ⇒ New Capabilities ⇒ New Architectures ⇒ New Tradeoffs
– Delicate balance between backward compatibility and efficiency
20
Elements of an ISA • Set of machine-recognized data types
– bytes, words, integers, floating point, strings, . . .
• Operations performed on those data types – Add, sub, mul, div, xor, move, ….
• Programmable storage – regs, PC, memory
• Methods of identifying and obtaining data referenced by instructions (addressing modes)
– Literal, reg., absolute, relative, reg + offset, …
• Format (encoding) of the instructions – Op code, operand fields, …
21
Example: MIPS R3000
Programmable storage: 2^32 bytes of memory; 31 x 32-bit GPRs r1–r31 (r0 = 0); 32 x 32-bit FP regs (paired DP); HI, LO, PC
Data types? Format? Addressing modes?
Arithmetic/logical: Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU, AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI, SLL, SRL, SRA, SLLV, SRLV, SRAV
Memory access: LB, LBU, LH, LHU, LW, LWL, LWR, SB, SH, SW, SWL, SWR
Control: J, JAL, JR, JALR, BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL
32-bit instructions on word boundary
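The “Format?” question above can be made concrete with the standard MIPS R-type layout: op (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6). A minimal sketch that packs the fields into one 32-bit word:

```python
def encode_rtype(op, rs, rt, rd, shamt, funct):
    """Pack MIPS R-type fields into a 32-bit word: op|rs|rt|rd|shamt|funct."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# ADD $3, $1, $2: R-type, opcode 0, funct 0x20
word = encode_rtype(0, 1, 2, 3, 0, 0x20)
print(f"0x{word:08X}")  # 0x00221820
```

Every instruction being exactly 32 bits on a word boundary is what makes fetch and decode so simple on this machine.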
22
ISA vs. Computer Architecture • Old definition of computer architecture
= instruction set design – Other aspects of computer design called implementation – Insinuates implementation is uninteresting or less challenging
• Our view is computer architecture >> ISA • Architect’s job much more than instruction set
design; technical hurdles today more challenging than those in instruction set design
• Since instruction set design not where action is, some conclude computer architecture (using old definition) is not where action is
– We disagree on conclusion – Agree that ISA not where action is (ISA in CA:AQA 4/e appendix)
23
Computer Architecture Topics
[Diagram of topic areas:]
• Instruction Set Architecture
• Pipelining and Instruction Level Parallelism: pipelining, hazard resolution, superscalar, reordering, prediction, speculation, vector, dynamic compilation
• Memory Hierarchy (L1 cache, L2 cache, DRAM): addressing, protection, exception handling; coherence, bandwidth, latency; interleaving, bus protocols, emerging technologies
• Input/Output and Storage (disks, WORM, tape): RAID
• VLSI
• Network Communication with other processors
24
Computer Architecture Topics
[Diagram: processors (P) with memories (M) connected by an interconnection network with switches (S)]
• Topologies, routing, bandwidth, latency, reliability
• Network interfaces
• Shared memory, message passing, data parallelism
• Processor-Memory-Switch
Multiprocessors, Networks and Interconnections
25
Fundamental Execution Cycle
• Instruction Fetch: obtain instruction from program storage
• Instruction Decode: determine required actions and instruction size
• Operand Fetch: locate and obtain operand data
• Execute: compute result value or status
• Result Store: deposit results in storage for later use
• Next Instruction: determine successor instruction
[Diagram: processor (regs, F.U.s) connected to memory (program, data) across the von Neumann bottleneck]
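The cycle above can be sketched as an interpreter loop. The three-instruction toy ISA here is hypothetical, chosen only to make the fetch, decode, operand fetch, execute, result store, and next-instruction steps visible:

```python
# Hypothetical toy ISA: each instruction is a tuple (opcode, operands...).
def run(program, memory):
    regs = [0] * 4
    pc = 0
    while pc < len(program):
        op, *args = program[pc]           # instruction fetch + decode
        if op == "load":
            regs[args[0]] = memory[args[1]]   # operand fetch from memory
        elif op == "add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]  # execute
        elif op == "store":
            memory[args[1]] = regs[args[0]]   # result store
        pc += 1                           # next instruction
    return memory

mem = {0: 2, 1: 3, 2: 0}
run([("load", 0, 0), ("load", 1, 1), ("add", 2, 0, 1), ("store", 2, 2)], mem)
print(mem[2])  # 5
```

Note that both the program and the data live in the same memory dictionary here, which is exactly the shared path the von Neumann bottleneck refers to.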
26
Pipelined Instruction Execution
[Diagram: four instructions in program order overlapped over clock cycles 1–7; each instruction passes through Ifetch, Reg, ALU, DMem, and Reg stages, advancing one stage per cycle]
27
Limits to Pipelining
• Maintain the von Neumann “illusion” of one instruction at a time execution
• Hazards prevent next instruction from executing during its designated clock cycle
– Structural hazards: attempt to use the same hardware to do two different things at once
– Data hazards: Instruction depends on result of prior instruction still in the pipeline
– Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).
• Power: Too many things happening at once ⇒ Melt your chip!
– Must disable parts of the system that are not being used – Clock Gating, Asynchronous Design, Low Voltage Swings, …
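A rough way to see the cost of data hazards is to count cycles in a toy model. The sketch below assumes a 5-stage pipeline with no forwarding and charges a fixed 2-cycle stall for a RAW dependence on the immediately preceding instruction; real penalties depend on pipeline depth and forwarding paths:

```python
# Toy model: 5-stage pipeline, no forwarding. A RAW dependence on the
# immediately preceding instruction costs a fixed number of stall cycles.
def pipeline_cycles(instrs, depth=5, stall_per_hazard=2):
    """instrs: list of (dest_reg, src_regs). Returns total cycles."""
    cycles = depth + len(instrs) - 1      # ideal: fill time + 1 instr/cycle
    for prev, cur in zip(instrs, instrs[1:]):
        if prev[0] in cur[1]:             # RAW hazard on previous result
            cycles += stall_per_hazard
    return cycles

# add r1,...; add r2,r1,... (dependent); add r3,... (independent)
print(pipeline_cycles([(1, ()), (2, (1,)), (3, ())]))  # 7 ideal + 2 stall = 9
```

Structural and control hazards would add further terms; the point is only that each hazard turns an ideal one-instruction-per-cycle schedule into something slower.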
28
Progression of ILP • 1st generation RISC - pipelined
– Full 32-bit processor fit on a chip => issue almost 1 IPC » Need to access memory 1+x times per cycle
– Floating-Point unit on another chip – Cache controller a third, off-chip cache – 1 board per processor multiprocessor systems
• 2nd generation: superscalar – Processor and floating point unit on chip (and some cache) – Issuing only one instruction per cycle uses at most half – Fetch multiple instructions, issue couple
» Grows from 2 to 4 to 8 … – How to manage dependencies among all these instructions? – Where does the parallelism come from?
• VLIW (Very Long Instruction Word) – Expose some of the ILP to compiler, allow it to schedule
instructions to reduce dependences
29
Modern ILP • Dynamically scheduled, out-of-order execution
– Current microprocessors fetch 10s of instructions per cycle – Pipelines are 10s of cycles deep ⇒ many 10s of instructions in execution at once
• What happens: – Grab a bunch of instructions, determine all their dependences,
eliminate dep’s wherever possible, throw them all into the execution unit, let each one move forward as its dependences are resolved
– Appears as if executed sequentially – On a trap or interrupt, capture the state of the machine
between instructions perfectly
• Huge complexity – Complexity of many components scales as n² (issue width) – Power consumption big problem
30
Have we reached the end of ILP? • Multiple processors easily fit on a chip • Every major microprocessor vendor
has gone to multithreading – Thread: locus of control, execution context – Fetch instructions from multiple threads at once,
throw them all into the execution unit – Intel: hyperthreading – Concept has existed in high performance computing
for 20 years (or is it 40? CDC6600)
• Vector processing – Each instruction processes many distinct data – Ex: MMX
• Raise the level of architecture – many processors per chip
[Image: Tensilica configurable processor]
31
The Memory Abstraction
• Association of <name, value> pairs – typically named as byte addresses – often values aligned on multiples of size
• Sequence of Reads and Writes • Write binds a value to an address • Read of addr returns most recently written
value bound to that address
[Diagram: memory interface signals — address (name), command (R/W), write data, read data, done]
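The <name, value> abstraction maps naturally onto a sparse dictionary; a minimal sketch:

```python
# The <name, value> abstraction: a write binds a value to an address,
# a read returns the most recently written value for that address.
class Memory:
    def __init__(self):
        self._cells = {}           # sparse map: address -> value
    def write(self, addr, value):
        self._cells[addr] = value  # bind value to name
    def read(self, addr):
        return self._cells[addr]   # most recent binding wins

m = Memory()
m.write(0x1000, 42)
m.write(0x1000, 99)     # rebinding overwrites the old value
print(m.read(0x1000))   # 99
```

This "read returns the last write" contract is trivial for one processor; the multiprocessor slides later show why it becomes hard to maintain.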
32
Processor-DRAM Memory Gap (latency)
[Chart: performance (log scale, 1 to 1000) vs. year, 1980–2000. µProc performance improves 60%/yr (2X/1.5yr); DRAM improves 9%/yr (2X/10 yrs). The processor-memory performance gap grows 50% per year.]
33
Levels of the Memory Hierarchy
Capacity / access time / cost, and staging transfer unit per level (circa 1995 numbers):
• CPU registers: 100s of bytes, << 1 ns; instr. operands, 1-8 bytes, managed by program/compiler
• Cache: 10s-100s of KBytes, ~1 ns, $1s/MByte; blocks, 8-128 bytes, managed by cache controller
• Main memory: MBytes, 100-300 ns, < $1/MByte; pages, 512-4K bytes, managed by OS
• Disk: 10s of GBytes, 10 ms (10,000,000 ns), $0.001/MByte; files, MBytes, managed by user/operator
• Tape: infinite capacity, sec-min access, $0.0014/MByte
Upper levels are faster; lower levels are larger.
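A standard way to reason about this hierarchy is average memory access time, AMAT = hit time + miss rate × miss penalty. The numbers below are illustrative assumptions, not the slide's:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Illustrative (assumed) numbers: 1 ns cache hit, 5% miss rate,
# 100 ns main-memory miss penalty.
print(amat(1.0, 0.05, 100.0))  # 6.0 ns on average
```

Even a 5% miss rate makes the average access 6x the hit time, which is why the hierarchy's miss rates matter far more than raw capacities.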
34
The Principle of Locality • The Principle of Locality:
– Programs access a relatively small portion of the address space at any instant of time.
• Two Different Types of Locality: – Temporal Locality (Locality in Time): If an item is referenced, it will tend to
be referenced again soon (e.g., loops, reuse) – Spatial Locality (Locality in Space): If an item is referenced, items whose
addresses are close by tend to be referenced soon (e.g., straightline code, array access)
• Last 30 years, HW relied on locality for speed
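A tiny direct-mapped cache model makes both kinds of locality measurable; the cache geometry below (64 lines, 16-byte blocks) is an assumption for illustration:

```python
# Minimal sketch: count misses in a tiny direct-mapped cache to show why
# sequential (spatially local) access beats large-stride access.
def misses(addresses, num_lines=64, block=16):
    tags = [None] * num_lines
    miss_count = 0
    for addr in addresses:
        block_no = addr // block          # which memory block this byte is in
        line = block_no % num_lines       # direct-mapped placement
        if tags[line] != block_no:        # miss: fetch the whole block
            tags[line] = block_no
            miss_count += 1
    return miss_count

seq = misses(range(0, 4096))               # stride-1: one miss per 16-byte block
strided = misses(range(0, 4096 * 16, 16))  # stride = block size: every access misses
print(seq, strided)  # 256 4096
```

The stride-1 walk pays one miss per block and then gets 15 free hits from spatial locality; the strided walk touches a new block every time and gets none.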
35
Memory Abstraction and Parallelism • Maintaining the illusion of sequential access to memory
across a distributed system • What happens when multiple processors access the same
memory at once? – Do they see a consistent picture?
• Processing and processors embedded in the memory?
[Diagram: processors P1 … Pn, each with a cache ($), connected to memories (Mem) through an interconnection network, shown in two organizations]
Deadlines and News
News:
• 08-20-09: Syllabus is now available
• 08-20-09: Password for slide and lab web pages was given in class
Lecture Topics:
• 09-03-09: Lec02 - Performance and ISAs
• 09-01-09: Lec01 - Course presentation and assessment test (today)
Reading Assignments:
• 09-01-09: Chapter 1 and Appendix B of the book
Deadlines:
• 09-07-09 (11:59PM): Questions 1 on Chapter 1
36