Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access

Niladrish Chatterjee, Manjunath Shevgoor, Rajeev Balasubramonian, Al Davis, Zhen Fang‡†, Ramesh Illikkal*, Ravi Iyer*

University of Utah, NVidia‡ and Intel Labs*. †Work done while at Intel.


Page 1: Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access

Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access

Niladrish Chatterjee, Manjunath Shevgoor, Rajeev Balasubramonian, Al Davis, Zhen Fang‡†, Ramesh Illikkal*, Ravi Iyer*

University of Utah, NVidia‡ and Intel Labs*

†Work done while at Intel

Page 2

Memory Bottleneck

• DRAM power as high as 25% of total datacenter power

• Low-power DRAM in place of DDR3
– BOOM from HP Labs
– Energy Proportional Memory from Stanford

[Figure: BASELINE, a CPU with DDR3 memory, next to Low Power Memory, a CPU with LPDDR memory]

Page 3

Latency Wall

• Memory latency wall not going away
– Emerging scale-out workloads, e.g. CloudSuite
– Move towards energy-efficient in-order cores

• Reduced Latency DRAM (RLDRAM) offers very low latency
– Row-cycle time (tRC) of 8-12ns (DDR3 tRC = 48.75ns, LPDDR2 tRC = 60ns)
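
To put the row-cycle times above side by side, the short sketch below computes how many times longer the DDR3 and LPDDR2 tRC values are than the RLDRAM range. The numbers are the ones quoted on the slide; the dictionary keys and the helper name `trc_ratio` are illustrative assumptions, not part of the original deck.

```python
# Row-cycle times (ns) as quoted on the slide. RLDRAM is given as a range,
# so both ends of the range are listed separately (labels are illustrative).
T_RC_NS = {
    "RLDRAM (low end)": 8.0,
    "RLDRAM (high end)": 12.0,
    "DDR3": 48.75,
    "LPDDR2": 60.0,
}


def trc_ratio(slow: str, fast: str) -> float:
    """Return how many times longer the `slow` part's tRC is than the `fast` part's."""
    return T_RC_NS[slow] / T_RC_NS[fast]


if __name__ == "__main__":
    for fast in ("RLDRAM (low end)", "RLDRAM (high end)"):
        for slow in ("DDR3", "LPDDR2"):
            print(f"{slow} tRC is {trc_ratio(slow, fast):.1f}x that of {fast}")
```

With these figures, the DDR3 row cycle is roughly 4 to 6 times longer than RLDRAM's, and the LPDDR2 row cycle is 5 to 7.5 times longer.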

Page 4

No one memory works best

[Figure: two bar charts comparing RLDRAM3, DDR3, and LPDDR2. One plots Power (mW) on an axis from 0 to 700; the other plots Latency (ns) on an axis from 0 to 60.]

Page 5

Heterogeneous Memory

[Figure: a CPU connected to both PERFORMANCE OPTIMIZED DRAM and POWER OPTIMIZED DRAM]

• Combine high-performance and low-power DRAM to outperform DDR3 at a lower energy cost

• Large number of possible designs
– Different DRAM device combinations
– Channel organization
– Data placement granularity

Page 6

Critical Word Regularity

• Most DRAM requests are for word-0 of the cache-line

[Figure: Frequency of accesses to individual words of a cache-line]
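
A distribution like the one in this figure could be measured from a trace of miss addresses along the following lines. This is a minimal sketch, not the authors' methodology: the 64-byte line and 8-byte word sizes are assumptions (chosen to give the eight words per line that the deck refers to), and the function names and sample addresses are hypothetical.

```python
from collections import Counter

LINE_BYTES = 64  # assumed cache-line size
WORD_BYTES = 8   # assumed word size, giving 8 words per line
WORDS_PER_LINE = LINE_BYTES // WORD_BYTES


def critical_word_index(addr: int) -> int:
    """Index (0-7) of the word inside its cache line that a byte address falls in."""
    return (addr % LINE_BYTES) // WORD_BYTES


def critical_word_histogram(miss_addrs):
    """Fraction of requests whose critical word is word 0, word 1, ..., word 7."""
    counts = Counter(critical_word_index(a) for a in miss_addrs)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * WORDS_PER_LINE
    return [counts.get(i, 0) / total for i in range(WORDS_PER_LINE)]


if __name__ == "__main__":
    # Hypothetical miss addresses: three fall on word 0, one on word 1.
    sample = [0x1000, 0x1040, 0x1088, 0x2000]
    print(critical_word_histogram(sample))
```

A histogram heavily skewed toward index 0 is the regularity this slide describes.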

Page 7

Critical Word Acceleration

[Figure: a CPU connected to RLDRAM, which holds Word 0, and to LPDDR, which holds Words 1 - 7]

• Critical word fetched from RLDRAM to boost performance

• Rest of the cache-line placed in and retrieved from LPDRAM for energy efficiency
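
The placement described on this slide can be illustrated with a small functional model: word 0 of each line goes to one store and words 1 to 7 go to another. This is only a sketch under assumptions. The class name `SplitLineMemory`, the 8-byte word size, and the dictionary-backed stores are hypothetical; the model ignores timing, channels, and the memory controller, and it is not the authors' implementation.

```python
from dataclasses import dataclass, field

WORDS_PER_LINE = 8  # word 0 plus words 1-7, as on the slide
WORD_BYTES = 8      # assumed word size
LINE_BYTES = WORDS_PER_LINE * WORD_BYTES


@dataclass
class SplitLineMemory:
    """Toy model: word 0 lives in 'RLDRAM', words 1-7 live in 'LPDRAM'."""
    rldram: dict = field(default_factory=dict)  # line address -> word 0
    lpdram: dict = field(default_factory=dict)  # line address -> words 1-7

    def write_line(self, line_addr: int, line: bytes) -> None:
        if len(line) != LINE_BYTES:
            raise ValueError(f"expected {LINE_BYTES} bytes, got {len(line)}")
        self.rldram[line_addr] = line[:WORD_BYTES]
        self.lpdram[line_addr] = line[WORD_BYTES:]

    def read_critical_word(self, line_addr: int) -> bytes:
        """Word 0 comes from the low-latency store alone."""
        return self.rldram[line_addr]

    def read_line(self, line_addr: int) -> bytes:
        """The full line is reassembled from both stores."""
        return self.rldram[line_addr] + self.lpdram[line_addr]


if __name__ == "__main__":
    mem = SplitLineMemory()
    mem.write_line(0x40, bytes(range(LINE_BYTES)))
    print(mem.read_critical_word(0x40).hex())
    print(len(mem.read_line(0x40)))
```

In a real system the benefit comes from the critical word arriving before the rest of the line, which this functional model does not capture.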

Page 8

Results

• Throughput improved by 12.9%

• System energy improved by 6%