HOT CHIPS 2014 NVIDIA’S DENVER PROCESSOR Boggs, CPU Architecture Co-authors: Gary Brown, Bill...

Darrell Boggs, CPU Architecture

Co-authors: Gary Brown, Bill Rozas,

Nathan Tuck, K S Venkatraman

HOT CHIPS 2014

NVIDIA’S DENVER PROCESSOR

The First 64-bit Android Kepler-Class

with Dual Denver CPUs TEGRA K1

TEGRA K1 192-core Kepler-Class Chip

One Chip — Two Versions

Pin Compatible

Quad A15 CPUs

32-bit

3-way Superscalar

Up to 2.3GHz

32K+32K L1$

Dual Denver CPUs

64-bit

7-way Superscalar

Up to 2.5GHz

128K+64K L1$

DENVER VALUE PROPOSITION

Highest performance and very energy-efficient ARMv8 processor

Greater dynamic sharing with GPU

Extended battery life

Low latency power-state transition

Best web browsing experience

Designed to bring PC-class performance to the ARM ecosystem

Content creation

Gaming

Enterprise applications

DENVER CPU Highest Perf ARMv8 CPU

7-wide superscalar

Aggressive HW prefetcher

Dynamic Code Optimization Optimize once, use many times

OOO execution without the power

Denver Core

IFUBPU

L1 Inst Cache

128 K – 4 Way

ARM HW Decoder

SchedulerL1 Data Cache

64 K – 4 Way

JSR IEU0 IEU1 FP0 FP1 LS0 LS1

Prefetch

Branch

Integer Integer

Integer

Scheduler

Decoder

(predecode)

8 wide

Scheduler

Decoder

3 wide

Branch Integer Integer Mul

NEON Load Store

TEGRA K1 SUPERSCALAR ARCHITECTURE

Branch: 1

Integer: 2

Multiply: 1

Floating Point/Neon: 2 x 64-bit

LD/ST: 1 LD and 1 ST

Peak IPC 3

Branch: 1

Integer: 2 (+ Mul) + 2

Floating Point/Neon: 2 x 128-bit

LD/ST: 2 LD and/or ST

Peak IPC 7+

Cortex-A15 Tegra K1-32 Denver Tegra K1-64

Denver

RF wrJCC CMPBypass

I$ Rd Way Sel PickDec PB

EL4 EE5 ES6 EW7

Correct Target

ITLB RF wrBypass ALUBypassSch

EB1 EA2 ED3 EL4 EE5 ES6 EW7SB2

Misprediction

Signal

13 Cycle Penalty

IC2 IW3 IN4 IN5 SB1IP1

Pipeline Microarchitecture – Mispredict Penalty

Tegra K1-32

15 cycle mispredict

Tegra K1-64

13 cycle mispredict

Higher ILP and efficiency Lower is better

CORE CLUSTER RETENTION STATE

New power management state: CC4

Allows cache and architectural state retention

Allows voltage to be reduced below Vmin to a retention voltage

Fast entry and exit latencies enable wider range of use

DENVER IDLE POWER IMPROVES WITH RETENTION

Overhead from energy

required to flush L2 Leakage

Efficiency

crossover

Power penalty if

entered for short

durations

DYNAMIC CODE OPTIMIZATION OPTIMIZE ONCE, USE MANY TIMES

Instructions

Denver Hardware

Hardware

Decoder

Execution

Optimizer

Optimization

m-interrupt

Instructions

Denver Hardware

Hardware

Decoder

Execution

Optimizer

Optimization Cache

Optimized

mcode Dynamic

Profile

Information

Unrolls Loops Renames registers Reorders Loads and Stores Improves control flow Removes unused computation Hoists redundant computation Sinks uncommonly executed computation Improves scheduling

Instructions

Denver Hardware

Hardware

Decoder

Execution

Units Optimization Cache

Optimized

mcode Optimization

Lookup Optimized code

execution from

optimization cache

Instructions

Denver Hardware

Hardware

Decoder

Execution

Optimization

Lookup

Optimizer

Optimization Cache

Optimized

Dynamic

Profile

Information

Chaining

DENVER PERFORMANCE

DMIPS SPECInt 2K SPECFP 2K AnTuTu 4 Geekbench 3Single-Core

GoogleOctane v2.0

16MBMemcpy(GB/s)

16MBMemset(GB/s)

16MBMemread

(GB/s)

ance R

tive t

Baytrail (Celeron N2910)

Krait-400 (8974-AA)

iPhone 5s (A7 Cyclone)

Haswell (Celeron 2955U)

Tegra K1 64

DCO: AN EXAMPLE SPECINT – CRAFTY EXECUTION Full benchmark run

Profile of execution

Optimized ucode execution HW Decoder

execution

Dynamic Code Optimizer

Profile of new ARM instructions

Profile of Instructions Per Cycle

DCO: AN EXAMPLE SPECINT – CRAFTY EXECUTION 3% of benchmark run

CONCLUSION

Dynamic Code Optimization is the architecture of the future

Breaks the out-of-order window physical limitation

Opens synergy between HW and SW that current architectures lack

Improves efficiency by optimizing once and using many times

Delivering PC-class performance to mobile form factors

Enables PC-class gaming experience

Enables true enterprise applications

Enables content creation

ACKNOWLEDGMENT

We would like to thank the CPU team in NVIDIA for all the creativity, hard work, and dedication to bring this vision to a reality.

HOT CHIPS 2014 NVIDIA’S DENVER PROCESSOR Boggs, CPU Architecture Co-authors: Gary Brown, Bill...

Documents

DENVER PUBLISHING INSTITUTE - University of Denver

Denver Chiropractic Center - Denver, Colorado Chiropractor

ProteOn Sensor Chips - Bio-Rad Laboratories · ProteOn™ Sensor Chips Tips and Techniques ProteOn Sensor Chips. Contents ProteOn Sensor Chips 1 Storing Sensor Chips 2 Opening a Sensor

Fried potato chips vs baked potato chips

Clockless Chips

Chips with everything “chips glorious chips”

AN OVERVIEW OF NVIDIA’S AUTONOMOUS VEHICLES PLATFORM · AN OVERVIEW OF NVIDIA’S AUTONOMOUS VEHICLES PLATFORM. 2 AGENDA ... Environment 52,000 2% ±1.3% ... Gigabit Ethernet 10

New GPU Features of NVIDIA’s Maxwell Architecturedeveloper.download.nvidia.com/assets/events/GDC15/... · Holger Gruen, New GPU Features of NVIDIA’s Maxwell Architecture 11:00

my name is Lars ishop, and I’m an engineer in NVIDIA’s

New GPU Features of NVIDIA’s Maxwell Architecture

DENVER, CO - Denver Public Schools

Kids Menu - ISCW · 2019-09-03 · frog, cream & sprinkles Chicken Nuggets & Chips Fish & Chips Kids Pizza & Chips (Margherita or Hawaiian) Calamari & Chips Chicken Schnitzel/Parmigiana

NVIDIA’S TEGRA K1 SYSTEM-ON-CHIP€¦ · NVIDIA’S TEGRA K1 SYSTEM-ON-CHIP . Tegra K1 Battery Saver Core 2x ISP ARM7 2160p30 VIDEO ENCODER 2160p30 VIDEO DECODER AUDIO USB 3.0 SECURITY

Denver Denver Cd70

New STARTERS FISH N’ CHIPS · 2020. 4. 4. · FISH N’ CHIPS House Fish & Chips $11.50 Barramundi Fish & Chip $16.50 Snapper Fish & Chips $16.50 2x Calamari, 2x House Fish & Chips

Progress Report on the Analysis of Red/Gray Chips in WTC …911myths.com/9119ProgressReport022912_rev1_030112web.pdf · Denver, CO Respectfully ... according to the procedures described

Denver Water 2007 Budget · March 7, 2007 Board of Water Commissioners City and County of Denver Denver, Colorado 80204-3412 Re: Denver Water 2007 Budget Chips Barry, Manager

University of Colorado Denver | CU Denver

UI ComposerTM: NVIDIA’s New 3D UI Suite · © 2008 NVIDIA Corporation. Stephen Jones, NVIDIA UI ComposerTM: NVIDIA’s New 3D UI Suite

NVIDIA’S XAVIER SOC - Hot Chips · 2018. 8. 19. · ©2018 NVIDIA CORPORATION 4 Volta Tensor Core GPU FP32 / FP16 / INT8 Multi-Precision 512 CUDA Tensor Cores 2.8 CUDA TFLOPS (FP16)