CONTEMPORARY DRAM ARCHITECTURES AND BEYOND

Bruce Jacob
Electrical & Computer Engineering
University of Maryland, College Park
http://www.ece.umd.edu/~blj/

Slides: blj/talks/Compaq.pdf · 1999. 9. 27.

OUTLINE:
• Motivation & Background
• Experiments
• Results
• More Recent Results


Sources

“A Performance Study of Contemporary DRAM Architectures,” Proc. ISCA ’99. V. Cuppu, B. Jacob, B. Davis, and T. Mudge

Recent experiments by Vinodh Cuppu, Ph.D. student at University of Maryland

Recent experiments by Brian Davis, Ph.D. student at University of Michigan

Dilemma: THIS ...

STATUS QUO in MEMORY-SYSTEM RESEARCH:

    ...
    if (memory_instruction(INSTR)) {
        if (L1_cache_miss(data_addr(INSTR))) {
            if (L2_cache_miss(data_addr(INSTR))) {
                /* every L2 miss is charged one fixed latency */
                cycles += DRAM_LATENCY;
            }
        }
    }
    ...

... or THIS

Motivation

HERE’S WHAT YOU MISS:

[Diagram: CPU, bus, memory controller (MC), bus, DRAMs]

DRAM LATENCY:
• BUS TRANSMISSION
• ROW ACCESS
• COLUMN ACCESS
• DATA TRANSFER
• OVERLAP

Goal

PRELIMINARY DRAM STUDY:
• Bus Transmission
• Row Access
• Column Access
• Data Transfer
• Bus Wait/Synch Time
• Stalls Due to Refresh
• The OVERLAP of These Components (with each other) (with CPU execution)

MODEL EXISTING TECHNOLOGY
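The difference between charging a flat DRAM_LATENCY and modeling the components listed above can be sketched in C. This is only an illustration under stated assumptions: the struct, the function names, and any nanosecond values a caller supplies are hypothetical and are not figures from the study. The point it shows is that overlapped time has to be subtracted once rather than counted in every component.

```c
#include <assert.h>

/* Hypothetical per-access latency components, in nanoseconds.
 * Field names follow the slide's list; values supplied by callers are
 * illustrative assumptions, not measurements from the study. */
struct dram_access {
    int bus_transmission_ns;
    int row_access_ns;
    int column_access_ns;
    int data_transfer_ns;
    int bus_wait_ns;
    int refresh_stall_ns;
    int overlap_ns;   /* time hidden by overlap among the components above */
};

/* Sum of all components, as if nothing overlapped. */
int serialized_latency_ns(const struct dram_access *a)
{
    return a->bus_transmission_ns + a->row_access_ns + a->column_access_ns
         + a->data_transfer_ns + a->bus_wait_ns + a->refresh_stall_ns;
}

/* Latency left exposed once the overlapped time is removed. */
int exposed_latency_ns(const struct dram_access *a)
{
    int total = serialized_latency_ns(a);
    assert(a->overlap_ns >= 0 && a->overlap_ns <= total);
    return total - a->overlap_ns;
}
```

A simulator that adds a single constant per miss corresponds to using a fixed value in place of `exposed_latency_ns`, which hides how the individual components and their overlap change from one DRAM architecture to another.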

DRAM Primer

[Diagram, repeated on five slides with one stage highlighted each time: CPU and MEMORY CONTROLLER connected by a BUS to a DRAM containing Row Decoder, Memory Array, Bit Lines, Sense Amps, Column Decoder, and Data In/Out Buffers]

Stages highlighted, in order:
• BUS TRANSMISSION
• ROW ACCESS
• COLUMN ACCESS
• DATA TRANSFER (note: page mode enables overlap with COL)
• BUS TRANSMISSION (note: overlapped component not shown)

DRAM Primer

Read Timing for Conventional DRAM

[Timing diagram. Signals: RAS, CAS, Address (Row Address, Column Address), DQ (Valid Dataout). Legend: Row Access, Column Access, Transfer Overlap, Data Transfer]

DRAM Primer

Read Timing for Fast Page Mode DRAM

[Timing diagram. Signals: RAS, CAS, Address (one Row Address followed by three Column Addresses), DQ (three Valid Dataout bursts). Legend: Row Access, Column Access, Transfer Overlap, Data Transfer]
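As a rough illustration of why page mode helps, the C sketch below compares a conventional DRAM, which pays for a row access on every read, with fast page mode, where one row access is followed by several column accesses to the same row. The struct, the function names, and any timing values are hypothetical; the sketch assumes every read hits the same row and ignores the transfer overlap shown in the diagram.

```c
#include <assert.h>

/* Hypothetical timing parameters in nanoseconds (illustrative only). */
struct dram_timing {
    int row_access_ns;
    int column_access_ns;
    int data_transfer_ns;
};

/* Conventional DRAM: every read pays row access, column access, transfer. */
int conventional_read_ns(const struct dram_timing *t, int reads)
{
    assert(reads >= 0);
    return reads * (t->row_access_ns + t->column_access_ns
                    + t->data_transfer_ns);
}

/* Fast page mode: the row is opened once, then each read to that row
 * pays only column access and data transfer. */
int fast_page_mode_read_ns(const struct dram_timing *t, int reads)
{
    assert(reads >= 0);
    if (reads == 0)
        return 0;
    return t->row_access_ns
         + reads * (t->column_access_ns + t->data_transfer_ns);
}
```

With these definitions the saving is one row access per read after the first, so the benefit grows with the number of consecutive reads that fall in the open row.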

DRAM Primer

Read Timing for Extended Data Out DRAM

[Timing diagram. Signals: RAS, CAS, Address (one Row Address followed by three Column Addresses), DQ (three Valid Dataout bursts). Legend: Row Access, Column Access, Transfer Overlap, Data Transfer]

DRAM Primer

Read Timing for Synchronous DRAM

[Timing diagram. Signals: Clock, RAS, CAS, Address (Row Address, Column Address), DQ (four Valid Dataout bursts). Legend: Row Access, Column Access, Transfer Overlap, Data Transfer]

DRAM Primer

Read Timing for Rambus DRAM

[Timing diagram. Signals: Command (ACTV/READ, Read Strobe, Read Term), Address (Bank/Row, then three Col Addr), DQ (Valid Dataout bursts); one interval is labeled "4 cycles". Legend: Row Access, Column Access, Transfer Overlap, Data Transfer]

Simulator Overview

CPU: SimpleScalar v3.0a
• 8-way out-of-order
• L1 cache: split 64K/64K, lockup free x32
• L2 cache: unified 1MB, lockup free x1
• L2 blocksize: 128 bytes

Main Memory: 8 64Mb DRAMs
• 100MHz/128-bit memory bus
• Optimistic open-page policy (close-immediately can be calculated)

Represents a “typical” workstation
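The configuration above implies a few numbers that are easy to check by hand. The C sketch below does the arithmetic; the function names are hypothetical, and it assumes one transfer per bus clock and 1 MB = 10^6 bytes for bandwidth, neither of which is stated on the slide.

```c
#include <assert.h>

/* Peak bandwidth of a bus in MB/s (1 MB = 10^6 bytes here),
 * assuming one transfer per bus clock. */
long peak_bandwidth_mb_per_s(long bus_mhz, long bus_bits)
{
    assert(bus_mhz > 0 && bus_bits > 0 && bus_bits % 8 == 0);
    return bus_mhz * (bus_bits / 8);
}

/* Bus cycles needed to move one cache block, rounding up. */
long cycles_per_block(long block_bytes, long bus_bits)
{
    long bus_bytes = bus_bits / 8;
    assert(block_bytes > 0 && bus_bytes > 0);
    return (block_bytes + bus_bytes - 1) / bus_bytes;
}

/* Total capacity in megabytes of `chips` DRAMs of `megabits_each` Mb. */
long capacity_mbytes(long chips, long megabits_each)
{
    assert(chips > 0 && megabits_each > 0);
    return chips * megabits_each / 8;
}
```

Under those assumptions, a 100MHz/128-bit bus peaks at 1600 MB/s, a 128-byte L2 block takes 8 bus cycles (80 ns at 100MHz) to transfer, and eight 64Mb DRAMs hold 64 MB.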

DRAM Configurations

FPM, EDO, SDRAM, ESDRAM:
[Diagram: CPU and caches, 128-bit 100MHz bus, Memory Controller, DIMM of eight x16 DRAMs]

Rambus, Direct Rambus, SLDRAM:
[Diagram: CPU and caches, 128-bit 100MHz bus, Memory Controller, Fast, Narrow Channel, DRAMs]

Note: TRANSFER WIDTH of Direct Rambus Channel
• equals that of ganged FPM, EDO, etc.
• is 2x that of Rambus & SLDRAM

DRAM Configurations

Strawman: Rambus, etc.

[Diagram: CPU and caches, 128-bit 100MHz bus, Memory Controller, Multiple Parallel Channels, each with its own DRAMs]

Overhead: Memory vs. CPU

Variable: speed of processor & caches

[Stacked bar chart. X-axis: BENCHMARK (Compress, Go, Ijpeg, Li, Perl, Vortex), with bars for Yesterday’s CPU, Today’s CPU, and Tomorrow’s CPU. Y-axis: Clocks Per Instruction (CPI), 0 to 3. Legend: Processor Execution (includes caches), Overlap between Execution & Memory, Stalls due to Memory Access Time]

Definitions (var. on Burger, et al)

• tPROC: processor with perfect memory
• tREAL: realistic configuration
• tBW: CPU with wide memory paths
• tDRAM: time seen by DRAM system

[Diagram: total time tREAL, with markers at tBW, tPROC, and tDRAM, divided into four parts]

• Stalls Due to BANDWIDTH = tREAL - tBW
• Stalls Due to LATENCY = tBW - tPROC
• CPU+L1+L2 Execution = tREAL - tDRAM
• CPU-Memory OVERLAP = tPROC - (tREAL - tDRAM)
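The four expressions above can be written as a small C function. This is a sketch: the struct and function names are hypothetical, and the ordering check tPROC ≤ tBW ≤ tREAL is an assumption added for sanity checking, not something the slide states. The four parts add up to tREAL, which is a useful check on any measured set of times.

```c
#include <assert.h>

/* The four simulated times defined on the slide (any consistent unit). */
struct sim_times {
    long t_proc;  /* processor with perfect memory */
    long t_real;  /* realistic configuration */
    long t_bw;    /* CPU with wide memory paths */
    long t_dram;  /* time seen by DRAM system */
};

struct breakdown {
    long bandwidth_stalls;  /* tREAL - tBW */
    long latency_stalls;    /* tBW - tPROC */
    long execution;         /* tREAL - tDRAM */
    long overlap;           /* tPROC - (tREAL - tDRAM) */
};

struct breakdown decompose(struct sim_times t)
{
    struct breakdown b;
    /* Assumed ordering of the inputs; violated inputs indicate bad data. */
    assert(t.t_proc <= t.t_bw && t.t_bw <= t.t_real && t.t_dram <= t.t_real);
    b.bandwidth_stalls = t.t_real - t.t_bw;
    b.latency_stalls   = t.t_bw - t.t_proc;
    b.execution        = t.t_real - t.t_dram;
    b.overlap          = t.t_proc - b.execution;
    return b;
}
```

For example, with the made-up values tPROC = 100, tREAL = 150, tBW = 130, and tDRAM = 70, the parts are 20 (bandwidth), 30 (latency), 80 (execution), and 20 (overlap), which sum to 150.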

Memory & CPU: PERL

[Stacked bar chart. X-axis: DRAM Configuration (FPM, EDO, SDRAM, ESDRAM, DRDRAM), with bars for Yesterday’s CPU, Today’s CPU, and Tomorrow’s CPU. Y-axis: Cycles Per Instruction (CPI), 0 to 5. Legend: Processor Execution, Overlap between Execution & Memory, Stalls due to Memory Latency, Stalls due to Memory Bandwidth]

Average Latency of DRAMs

note: SLDRAM & RDRAM 2x data transfers

[Stacked bar chart. X-axis: DRAM Configurations (FPM, EDO, SDRAM, ESDRAM, SLDRAM, RDRAM, DRDRAM). Y-axis: Time per Access (ns), 0 to 500. Legend: Bus Transmission Time, Row Access Time, Column Access Time, Data Transfer Time Overlap, Data Transfer Time, Refresh Time, Bus Wait Time]


Cost-Performance

FPM, EDO, SDRAM, ESDRAM:
• Lower Latency => Wide/Fast Bus
• Increase Capacity => Decrease Latency
• Low System Cost

Rambus, Direct Rambus, SLDRAM:
• Lower Latency => Multiple Channels
• Increase Capacity => Increase Capacity
• High System Cost

1 DRDRAM = Multiple SDRAM

Conclusions

100MHz/128-bit Bus is Current Bottleneck
• Solution: Fast Bus/es & MC on CPU (e.g. Compaq Alpha, Sony Emotion, ...)

Current DRAMs Solving Bandwidth Problem (but not Latency Problem)

There is Locality in DRAM Accesses (but how important is this?)

SPECint ’95 Fits in 1MB Cache

Recent (Unfinished) Work

Investigation of Organization-Level Parameters:
• Channel widths & speeds, turnaround
• Independent vs. ganged channels
• Banks per channel, burst widths

Detailed Study of DRDRAM vs. SDRAM in Highly Concurrent Environment

Embedded DRAM+DSP Architectures

Detailed Study of Multiprocessor Buses

Channel/Bank Model

[Diagrams of controllers (C) attached to DRAM banks (D) in three organizations]

• One independent channel, Banking degrees of 1, 2, 4, ...
• Two independent channels, Banking degrees of 1, 2, 4, ...
• Four independent channels, Banking degrees of 1, 2, 4, ...

Read/Write Request Model

[Timing diagrams starting at t0, each showing occupancy of the ADDRESS BUS, the DRAM BANK, and the DATA BUS. Three diagrams under READ REQUESTS and three under WRITE REQUESTS; labeled durations include 10ns, 20ns, 40ns, 70ns, 90ns, and 100ns]

Concurrency Model

[Timing diagrams of overlapped request pairs (R: read, W: write), with labeled durations of 10ns, 20ns, 40ns, 70ns, and 90ns]

• Legal if no turnaround and R/W to different banks
• Legal if turnaround ≤ 10ns and R/W to different banks
• Legal if R/R to different banks
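The three legality rules on this slide can be read as a single check, sketched below in C. This is one possible interpretation, not the model's actual code: the types, the function name, and the `data_bus_gap_ns` parameter are hypothetical. The sketch assumes that the two R/W cases differ only in how much idle time the schedule leaves on the data bus between the two bursts (0 ns in one case, 10 ns in the other), and it treats write/write pairs conservatively because the slide does not cover them.

```c
#include <assert.h>
#include <stdbool.h>

enum req_kind { REQ_READ, REQ_WRITE };

struct request {
    enum req_kind kind;
    int bank;
};

/* Returns true if `second` may be overlapped with `first`.
 * data_bus_gap_ns is the idle time between the two data-bus bursts in the
 * candidate schedule; it is an assumed parameter, not taken from the slide. */
bool overlap_is_legal(struct request first, struct request second,
                      int turnaround_ns, int data_bus_gap_ns)
{
    assert(turnaround_ns >= 0 && data_bus_gap_ns >= 0);
    if (first.bank == second.bank)
        return false;   /* every rule requires different banks */
    if (first.kind == REQ_READ && second.kind == REQ_READ)
        return true;    /* R/R to different banks */
    if (first.kind != second.kind)
        return turnaround_ns <= data_bus_gap_ns;  /* bus must turn around within the gap */
    return false;       /* W/W is not covered by the slide */
}
```

Under this reading, a schedule with no gap on the data bus is legal for a read/write pair only when the turnaround time is zero, and a schedule with a 10 ns gap tolerates a turnaround of up to 10 ns.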

Bandwidth vs. Burst Width

[Chart. PERL: 1 channel, 4 banks, 2GHz CPU. X-axis: System Bandwidth (GB/s = Channels * Width * Speed), at 0.4, 0.8, 1.6, 3.2, 6.4. Y-axis: Cycles per Instruction, 0 to 1.25. Series: 128-Byte, 64-Byte, 32-Byte, 16-Byte, and 8-Byte Burst Width]
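The x-axis formula on this chart, bandwidth = Channels * Width * Speed, is a plain product, sketched below in C. The function name is hypothetical, and the result is in MB/s with 1 MB = 10^6 bytes. As a plausibility check only, and assuming a 400 MHz channel (the slide itself does not give the speed for this chart), one channel with widths of 8 to 128 bits gives 0.4 to 6.4 GB/s, which matches the axis ticks.

```c
#include <assert.h>

/* System bandwidth in MB/s = channels * width (bytes) * speed (MHz),
 * assuming one transfer per clock on each channel. */
long system_bandwidth_mb_per_s(long channels, long width_bits, long speed_mhz)
{
    assert(channels > 0 && speed_mhz > 0);
    assert(width_bits > 0 && width_bits % 8 == 0);
    return channels * (width_bits / 8) * speed_mhz;
}
```

The same total can be reached in several ways, for example four 16-bit channels or one 64-bit channel at the same speed, which is why the chart separates bandwidth from burst width.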

Exploiting Concurrency

[Chart. PERL: 2 banks, 16-byte burst, 2GHz CPU. X-axis: Total Datapath Bitwidth (bits = Channels * BusWidth), at 8, 16, 32, 64, 128, 256. Y-axis: Cycles per Instruction, 0 to 1.25. Series: 64-Bit, 32-Bit, 16-Bit, and 8-Bit Data Bus, grouped as 400MHz x 1 channel, 400MHz x 2 channels, and 400MHz x 4 channels]

Conclusions

None yet ... preliminary data

CONTACT INFO:

Prof. Bruce Jacob
Electrical & Computer Engineering
University of Maryland, College Park
http://www.ece.umd.edu/~blj/
[email protected]