AMD POWER MANAGEMENT SRILATHA MANNE 11/14/2013

Page 1

AMD POWER MANAGEMENT SRILATHA MANNE

11/14/2013

Page 2

| DOE WEBINAR| NOV 14, 2013 2

OUTLINE

APUs and HSA

Richland APU

APU Power Management

Monitoring and Management Features

Page 3

APU: ACCELERATED PROCESSING UNIT

The APU has arrived and it is a great advance over previous platforms

Combines scalar processing on CPU with parallel processing on the GPU and high-bandwidth access to memory

Advantages over other forms of compute offload:
• Easier to program
• Easier to optimize
• Easier to load-balance
• Higher performance
• Lower power

© Copyright 2012 HSA Foundation. All Rights Reserved.

Page 4

| AMD OPTERON™ PROCESSOR ROADMAP | NOVEMBER 13, 2013 4

WHAT IS HSA?

Optimized compute density

Radical performance improvement for HPC, big data, and multimedia workloads

Low power to maximize performance per watt

Simplified programming

Single, standard computing environments

Support for mainstream languages: C, C++, Fortran, Java, .NET

Lower development costs

Figure: HSA block diagram. Labels: Memory; CPU, ACC, GPU; HSA Platform; HSA Runtime Infrastructure; OS; Legacy Applications; HSA Accelerated Applications; Access to Broad Set of Programming Languages.

Joins CPUs, GPUs, and accelerators into a unified computing framework

Single address space accessible to avoid the overhead of data copying

User-space queuing to minimize communication overhead

Pre-emptive context switching for better quality of service

Page 5

A NEW ERA OF PROCESSOR PERFORMANCE

Figure: three performance-versus-time curves, one per era, each marked "we are here".

Single-core Era
• Y-axis: single-thread performance; x-axis: time
• Enabled by: Moore’s Law, voltage scaling
• Constrained by: power, complexity
• Programming: Assembly, C/C++, Java …

Multi-core Era
• Y-axis: throughput performance; x-axis: time (# of processors)
• Enabled by: Moore’s Law, SMP architecture
• Constrained by: power, parallel SW, scalability
• Programming: pthreads, OpenMP / TBB …

Heterogeneous Systems Era
• Y-axis: modern application performance; x-axis: time (data-parallel exploitation)
• Enabled by: abundant data parallelism, power-efficient GPUs
• Temporarily constrained by: programming models, comm. overhead
• Programming: Shader, CUDA, OpenCL, C++ and Java

Page 6

CASCADE DEPTH ANALYSIS

Figure: cascade depth, on a scale of 0 to 25, shown in bands of 0-5, 5-10, 10-15, 15-20, and 20-25.

Page 7

PROCESSING TIME/STAGE

Figure: processing time per cascade stage on the AMD “Trinity” A10-4600M (6 CU @ 497 MHz, 4 cores @ 2700 MHz). X-axis: cascade stage (1 through 8, then 9-22 grouped); y-axis: time (ms), 0 to 100; series: GPU and CPU.

AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685 MHz; 4 GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)

Page 8

PERFORMANCE: CPU vs. GPU

Figure: throughput on the AMD “Trinity” A10-4600M (6 CU @ 497 MHz, 4 cores @ 2700 MHz). X-axis: number of cascade stages on GPU (0 through 8, then 22); y-axis: images/sec, 0 to 12; series: CPU, HSA, and GPU.

AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685 MHz; 4 GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)

Page 9

Richland APU

Page 10

APU HARDWARE


AMD "Richland" incorporates:

• Two “Piledriver” high-performance x86 modules (core-pairs)

• 2 MB shared L2 cache per x86 module

• AMD Radeon™ HD 8000 series DirectX® 11-capable GPU with six compute units

• Next-generation media acceleration technology

• Dual 64-bit memory channel supporting up to DDR3-2133

• Integrated DisplayPort 1.2 interfaces

• PCI Express® I/O Generation 2 interfaces

"Richland" is implemented in a 32-nm SOI node2+ high-K metal gate process technology

Page 11

APU HARDWARE


Advantages

• Scalar and parallel capabilities

• Close physical proximity

• Shared system memory

• Reduced communication overhead

• Fine-grained interaction

• Fine-grained power management

• HSA framework

• Flexible and powerful programming interface

• GPU as first-class compute

• Unified, coherent address space

• Potential for other accelerators

Page 12

APU Power Management

Page 13

AMD TURBO CORE TECHNOLOGY MOTIVATION

Figure: APU power (APU Pwr) and APU die temperature (Die Temp) for three workloads (Workload1, Workload2, Workload3).

Without AMD Turbo CORE technology, power and temperature headroom may be left untapped in many workload scenarios.

AMD Turbo CORE technology uses estimated and measured metrics to optimize performance under power and thermal constraints.

Page 14

APU POWER MANAGEMENT


• 3 main thermal entities (TE)

• TE1: 1st x86 module + L2

• TE2: 2nd x86 module + L2

• TE3: Graphics + Northbridge + Multimedia

• On each TE

• Power and Temperature tracked

• Frequency and Voltage controlled

• Also account for I/O power influence on each of the other TEs

Page 15

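BUILDING BLOCKS OF AMD TURBO CORE TECHNOLOGY: POWER CALCULATION

Inputs: activity from Cac monitors, Freq, V, Tcalc

Output: P calculated

Power calculation:

$P_{\mathrm{calc}} = C_{ac} \cdot V^{2} \cdot F + \mathrm{Static}$

$C_{ac} = \sum_{i} \mathrm{weight}_{i} \cdot \mathrm{Activity}_{i}$

$\mathrm{Static} = \mathrm{Fn}(V, T)$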
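The power calculation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not AMD's implementation: the function names, the weights and activity values, and the `example_static` leakage model are all hypothetical, since the slide only states that static power is some function of V and T.

```python
from typing import Callable, Sequence


def effective_capacitance(weights: Sequence[float],
                          activities: Sequence[float]) -> float:
    """Cac = sum over monitors of weight_i * Activity_i."""
    if len(weights) != len(activities):
        raise ValueError("weights and activities must have the same length")
    return sum(w * a for w, a in zip(weights, activities))


def calculated_power(cac: float, voltage: float, frequency: float,
                     static_fn: Callable[[float, float], float],
                     temperature: float) -> float:
    """P_calc = Cac * V^2 * F + Static, where Static = Fn(V, T)."""
    return cac * voltage ** 2 * frequency + static_fn(voltage, temperature)


def example_static(voltage: float, temperature: float) -> float:
    # Hypothetical leakage model (assumption): grows with V and with T.
    return 0.5 * voltage * (1.0 + 0.01 * (temperature - 25.0))


# Two hypothetical activity monitors with made-up weights (farads).
cac = effective_capacitance([2.0e-9, 1.0e-9], [0.5, 0.2])
power = calculated_power(cac, voltage=1.0, frequency=2.0e9,
                         static_fn=example_static, temperature=25.0)
```

Passing the static model in as a function keeps the dynamic term, which the slide defines exactly, separate from the leakage term, which it leaves unspecified.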

Page 16

BUILDING BLOCKS OF AMD TURBO CORE TECHNOLOGY: TEMPERATURE CALCULATION

RC MODEL represents
• Cooling solution (APU heat sink, fan, etc.)
• Temperature influence of other TEs

Temp. calculation (RC MODEL)
• Inputs: P calculated, temp. from other TEs
• Output: T calculated = Pcalc temp + worst-case ambient temp + additional offsets (to account for I/O power, etc.)

Calculated temperature enables AMD to deliver robust, dependable, and repeatable performance
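As a rough illustration of the temperature calculation, the sketch below uses a single-pole (first-order) RC thermal model. The slide does not give the structure or constants of the real RC model, so the class name, the single-pole form, and every parameter value here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RCThermalModel:
    r_thermal: float      # thermal resistance in K/W (hypothetical value)
    c_thermal: float      # thermal capacitance in J/K (hypothetical value)
    ambient_worst: float  # worst-case ambient temperature in deg C
    offset: float = 0.0   # additional offsets, e.g. for I/O power, in deg C
    rise: float = 0.0     # state: temperature rise caused by P_calc

    def step(self, p_calc: float, dt: float,
             other_te_rise: float = 0.0) -> float:
        """Advance the model by dt seconds and return T_calc in deg C."""
        tau = self.r_thermal * self.c_thermal   # RC time constant
        target = p_calc * self.r_thermal        # steady-state rise for p_calc
        # Forward-Euler update; dt must be small relative to tau.
        self.rise += (dt / tau) * (target - self.rise)
        return self.rise + other_te_rise + self.ambient_worst + self.offset


model = RCThermalModel(r_thermal=2.0, c_thermal=0.5,
                       ambient_worst=45.0, offset=3.0)
t_calc = model.step(p_calc=10.0, dt=0.01)
```

Because the result depends only on calculated power and fixed worst-case terms, the same workload yields the same temperature trace on every part, which is the repeatability the slide refers to.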

Page 17

BUILDING BLOCKS OF AMD TURBO CORE TECHNOLOGY: P-STATE SELECTION

P-states: discrete frequency and voltage operating points (Pboost0, Pboost1, P0, P1, P2, P3, ..., Pmin)

T calculated is compared against the max temp. limit:
• If Tcalc < Limit: step up a P-state (i.e., raise F,V)
• If Tcalc > Limit: step down a P-state (i.e., lower F,V)
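The step-up and step-down rule can be written as a small function. This is a sketch only: the list below names the P-states from the slide and omits whatever states the slide elides between P3 and Pmin, and `next_p_state` is a hypothetical helper name.

```python
# P-states ordered from highest to lowest frequency and voltage.
# The slide elides the states between P3 and Pmin; they are omitted here.
P_STATES = ["Pboost0", "Pboost1", "P0", "P1", "P2", "P3", "Pmin"]


def next_p_state(index: int, t_calc: float, limit: float) -> int:
    """Return the index of the next P-state given the calculated temperature."""
    if t_calc < limit and index > 0:
        return index - 1          # headroom available: raise F and V
    if t_calc > limit and index < len(P_STATES) - 1:
        return index + 1          # over the limit: lower F and V
    return index                  # at the limit, or already at an end state


state = P_STATES.index("P0")
state = next_p_state(state, t_calc=80.0, limit=95.0)  # steps up to Pboost1
```

Moving one state per evaluation keeps each frequency and voltage change small; the slide does not say whether the hardware ever skips states.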

Page 18

AMD TURBO CORE CONTROL LOOPS: PUTTING THE PIECES TOGETHER

Figure: three control loops. Two are labeled "Control Loop (similar to above)"; the first is drawn in detail:
• Power calculation: Cac, Freq, V, and Tcalc give P calculated
• Temp. calculation: P calculated and temp. from other TEs give T calculated
• P-state selection (Pboost0, Pboost1, P0, P1, P2, P3, ..., Pmin): Tcalc is compared against the max temp. limit
• Check against current (EDC/TDC) and other limits; if limits encroached, throttle F and V
• New operating F, V

Loop executed at a specific interval
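The pieces of one control loop can be combined into a toy simulation. Everything numeric here is hypothetical: the P-state table, the static power term, the thermal constants, and the crude `p_calc / v` current estimate that stands in for the EDC/TDC check. It shows the shape of the loop, not its real behavior.

```python
from typing import List, Tuple

# (frequency in GHz, voltage in V), highest first. Hypothetical values.
P_STATES = [(3.2, 1.30), (2.7, 1.20), (2.3, 1.10), (1.8, 1.00), (1.4, 0.90)]


def run_loop(steps: int, cac: float, t_limit: float, current_limit: float,
             r_th: float = 2.0, tau: float = 1.0, dt: float = 0.1,
             ambient: float = 45.0) -> List[Tuple[int, float]]:
    """Run the loop and return (P-state index, T_calc) per interval."""
    idx = len(P_STATES) - 1   # start at the lowest P-state
    rise = 0.0                # temperature rise tracked by the RC model
    history = []
    for _ in range(steps):
        f, v = P_STATES[idx]
        p_calc = cac * v * v * f + 0.5 * v         # dynamic + assumed static
        rise += (dt / tau) * (p_calc * r_th - rise)
        t_calc = rise + ambient
        current = p_calc / v                       # stand-in for EDC/TDC
        if t_calc > t_limit or current > current_limit:
            idx = min(idx + 1, len(P_STATES) - 1)  # throttle F and V
        elif t_calc < t_limit:
            idx = max(idx - 1, 0)                  # use the headroom
        history.append((idx, t_calc))
    return history


history = run_loop(steps=50, cac=5.0, t_limit=90.0, current_limit=40.0)
```

With a limit the part cannot sustain at its top state, a loop like this tends to hover around the limit rather than settle, which is why the interval at which it runs matters.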

Page 19

"RICHLAND" ENHANCEMENTS TO AMD TURBO CORE TECHNOLOGY

Figure: the AMD Turbo CORE control loop with the "Richland" additions.
• Base loop: activity from Cac monitors, Freq, V, and Tcalc feed the power calculation; P calc and the temperature of other TEs feed the temperature calculation; Tcalc is compared against the max temp. limit (if Tcalc < Limit raise F,V; if Tcalc > Limit lower F,V) to give the new operating F, V
• CONFIGURABLE TDP (cTDP): cTDP scalar
• Intelligent Boost (IB): IB widget, P-state limit
• ADDITIONAL BOOST STATE: boost states Pb0, Pb1, Pb2 alongside P0, P1, ...
• Temperature-smart AMD Turbo CORE: on-die temperature sensor readings
• Boost scalar

Page 20

INTELLIGENT BOOST: EFFICIENT ALLOCATION OF POWER

Figure: two die diagrams (GPU, CU0, CU1, DDR, Northbridge) comparing power allocations.

Traditional Power Allocation
• CU0: 3W, 99°C, 1.6GHz
• CU1: 3W, 99°C, 1.6GHz
• GPU: 2W, 96°C, 0.3GHz

Efficient Power Allocation
• CU0: 1W, 96°C, 0.8GHz
• CU1: 1W, 96°C, 0.8GHz
• GPU: 6W, 96°C, 0.75GHz

• CPU power budget at minimum level to keep GPU fully utilized

• Reduced CPU temperature designed to allow GPU to sustain higher power level

• Total system performance increases

Figure: overall performance (CPU + GPU) versus CPU power allocation, with three regions labeled "GPU-limited, CPU using too much power", "Balanced", and "CPU-limited, GPU starved".
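The three regions of the performance curve can be reproduced with a toy model in which the CPU feeds work to the GPU and both share one power budget. The linear performance-per-watt figures are invented so that the balance point lands on the slide's 2 W CPU and 6 W GPU split; nothing here reflects the actual Intelligent Boost algorithm.

```python
import math

CPU_PERF_PER_WATT = 3.0  # hypothetical: work the CPU can feed per watt
GPU_PERF_PER_WATT = 1.0  # hypothetical: work the GPU can consume per watt


def overall_performance(cpu_watts: float, total_watts: float) -> float:
    """Throughput is bounded by the slower of the two sides."""
    gpu_watts = total_watts - cpu_watts
    return min(CPU_PERF_PER_WATT * cpu_watts, GPU_PERF_PER_WATT * gpu_watts)


def classify(cpu_watts: float, total_watts: float) -> str:
    """Name the region of the curve that an allocation falls in."""
    cpu_rate = CPU_PERF_PER_WATT * cpu_watts
    gpu_rate = GPU_PERF_PER_WATT * (total_watts - cpu_watts)
    if math.isclose(cpu_rate, gpu_rate):
        return "Balanced"
    if cpu_rate < gpu_rate:
        return "CPU-limited, GPU starved"
    return "GPU-limited, CPU using too much power"


def best_cpu_allocation(total_watts: float, step: float = 0.5) -> float:
    """Sweep CPU power in fixed steps and keep the best-performing split."""
    candidates = [i * step for i in range(int(total_watts / step) + 1)]
    return max(candidates, key=lambda w: overall_performance(w, total_watts))


traditional = classify(6.0, 8.0)   # 3 W per CU, 2 W GPU
efficient = classify(2.0, 8.0)     # 1 W per CU, 6 W GPU
```

The model ignores the thermal coupling the slide mentions (a cooler CPU letting the GPU sustain more power), so it captures only the allocation trade-off.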

Page 21

SUMMARY

HSA framework for effective utilization of available compute resources

‒ Unified, coherent address space

‒ Flexible and powerful programming interface

‒ GPU as first-class compute engine

Sophisticated power management solutions for effective operation under physical constraints

‒ Extensive monitoring of internal events and activity

‒ Intelligent algorithms for translating thermal headroom into performance

‒ Management of power limits (cTDP)

‒ Deterministic results using estimated values

‒ Better margining using measured temperature values

OPTIMAL UTILIZATION UNDER GIVEN PHYSICAL CONSTRAINTS

Page 22

NORTHBRIDGE AND MEMORY POWER CONTROLS

Memory power-limiting:

‒ Bandwidth capping by throttling commands
  ‒ Rolling window of 100 clock cycles
  ‒ Throttle by 30% to 95%

‒ Controlled by
  ‒ External feedback on thermals (DIMM temperature sensors)
  ‒ Software register control independent of thermal management

Memory P-states

‒ 4 Northbridge P-states based on core and GPU activity levels (2 active at any time)

‒ 2 memory P-states
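Bandwidth capping over a rolling window can be sketched as follows. The slide gives only the window length and the throttle range, so this assumes at most one command per clock and reads "throttle by N%" as removing N% of the command slots in any 100-clock window; the `BandwidthCap` class is hypothetical.

```python
from collections import deque


class BandwidthCap:
    """Limit commands issued within any rolling window of clock cycles."""

    def __init__(self, throttle_percent: int, window: int = 100) -> None:
        if not 30 <= throttle_percent <= 95:
            raise ValueError("throttle_percent must be between 30 and 95")
        # Command slots that remain available in each window.
        self.max_commands = window * (100 - throttle_percent) // 100
        # One entry per clock in the window: 1 if a command was issued.
        self.history = deque([0] * window, maxlen=window)
        self.issued_in_window = 0

    def tick(self, wants_command: bool) -> bool:
        """Advance one clock; return True if the command may issue."""
        oldest = self.history[0]  # this clock is about to leave the window
        allow = (wants_command
                 and self.issued_in_window - oldest < self.max_commands)
        self.history.append(1 if allow else 0)  # maxlen drops the oldest
        self.issued_in_window += (1 if allow else 0) - oldest
        return allow


cap = BandwidthCap(throttle_percent=30)
issued = sum(cap.tick(True) for _ in range(1000))
```

Under a saturating request stream, a 30% throttle lets 70 commands through in every 100 clocks, and a 95% throttle lets 5 through.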

Page 23

AMD SERVER ROADMAP: JUNE 2013

Page 24

Monitoring and Management

Page 25

Measurements: Nodes

(Info) A node-level measurement shall consist of a combined measurement of all components that make up a node in the architecture. For example, these components may include the CPU, memory, and the network interface. If the node contains other components such as spinning or solid state disks, they shall also be included in this combined measurement. The utility of the node-level measurement is to facilitate measurement of the power or energy profile of a single application. The node may be part of the network or storage equipment, such as network switches, disk shelves, and disk controllers.

(important) The ability to measure the current and voltage of any and all nodes shall be provided.

The current and voltage measurements shall provide a readout capability of:
• (mandatory) > 1 per second
• (important) > 50 per second
• (enhancing) > 250 per second

(mandatory) The current and voltage data must be real electrical measurements, not based on heuristic models.
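The readout-rate tiers can be checked mechanically. The helper below is a hypothetical illustration that maps a measured readout rate to the requirement levels it satisfies, using the strict "greater than" thresholds as written.

```python
from typing import List

# (requirement level, threshold in readouts per second), as listed above.
READOUT_TIERS = [("mandatory", 1.0), ("important", 50.0), ("enhancing", 250.0)]


def satisfied_tiers(readouts_per_second: float) -> List[str]:
    """Return every requirement level whose threshold the rate exceeds."""
    return [name for name, threshold in READOUT_TIERS
            if readouts_per_second > threshold]


levels = satisfied_tiers(100.0)  # meets mandatory and important
```

A rate of exactly 1 readout per second satisfies none of the tiers, because each threshold is exclusive.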

Vendor Forum Slide

Page 26

APML: ADVANCED POWER MANAGEMENT LINK

Page 27

APML FEATURES

Examples:

‒ Processor power-capping

‒ Processor asset identification (including CPUID)

‒ Machine check register access with optional alerts to management subsystems

Specifics

‒ 100 kHz standard, and 400 kHz fast mode

‒ SB-TSI: Sideband temperature sensor interface
  ‒ Access internal temperature sensor and specify temperature thresholds

‒ SB-RMI: Sideband remote management interface
  ‒ Monitor the current P-state
  ‒ Set and read current maximum P-state

Page 28

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.

Page 29

Backup