Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor

Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, and Rebecca L. Stamm

Presented by Kim Ki Young @ DCSLab

Introduction

Simultaneous Multithreading (SMT): a technique that permits multiple independent threads to issue multiple instructions each cycle to a superscalar processor's functional units. SMT attacks two major impediments to processor utilization:
- long latencies
- limited per-thread parallelism
(A toy model of the shared issue stage is sketched below.)
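Since the slide compresses the mechanism into one line, here is a minimal toy model in Python of what issuing from multiple threads into shared issue slots means each cycle; the thread and instruction structures are hypothetical illustrations, not anything specified by the paper.

ISSUE_WIDTH = 8

def issue_cycle(threads):
    """threads: one list of ready instructions per hardware thread."""
    issued = []
    progress = True
    # Draw ready instructions round-robin from ALL threads, so a stalled
    # thread's empty list simply cedes its issue slots to the others.
    while len(issued) < ISSUE_WIDTH and progress:
        progress = False
        for ready in threads:
            if ready and len(issued) < ISSUE_WIDTH:
                issued.append(ready.pop(0))
                progress = True
    return issued

# Thread 0 is stalled on a long-latency miss; threads 1-3 still fill all 8 slots.
threads = [[], ["t1.a", "t1.b", "t1.c"],
           ["t2.a", "t2.b", "t2.c"], ["t3.a", "t3.b", "t3.c"]]
print(issue_cycle(threads))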


Goals of the paper:
1. Demonstrate that the throughput gains of SMT are possible without extensive changes to a conventional, wide-issue superscalar processor.
2. Show that SMT need not compromise single-thread performance.
3. Use a detailed architecture model to analyze and relieve bottlenecks that did not exist in the more idealized earlier model.
4. Show how simultaneous multithreading creates an advantage previously unexploitable in other architectures: the ability to choose, each cycle, which threads to fetch and issue from.


The base architecture is a projection of current superscalar design trends 3-5 years into the future. Changes necessary to support simultaneous multithreading:
- multiple program counters
- a separate return stack for each thread
- per-thread instruction retirement, instruction queue flush, and trap mechanisms
- a thread id with each branch target buffer entry
- a larger register file
(The split between replicated and shared state is sketched below.)
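As a rough illustration of which state the list above replicates per thread and which stays shared, here is a Python sketch; the class names, field choices, and the renaming-register count are illustrative assumptions, not the paper's exact configuration.

from dataclasses import dataclass, field

@dataclass
class ThreadContext:                       # replicated once per hardware context
    pc: int = 0                            # its own program counter
    return_stack: list = field(default_factory=list)  # its own return stack

@dataclass
class BTBEntry:                            # branch target buffer entry
    tag: int
    target: int
    thread_id: int                         # thread id stored with each entry

class SMTCore:
    def __init__(self, n_threads=8, n_rename=100):
        self.contexts = [ThreadContext() for _ in range(n_threads)]
        # One shared, larger register file: per-thread architectural
        # registers plus a pool of renaming registers.
        self.registers = [0] * (n_threads * 32 + n_rename)

core = SMTCore()
print(len(core.contexts), len(core.registers))  # 8 contexts, 356 shared registers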


Methodology:
- Simulator: derived from MIPSI (a MIPS-based simulator); executes unmodified Alpha object code.
- Workload: the SPEC92 benchmark suite (five floating-point programs, two integer programs, and TeX).
- Compiler: the Multiflow trace scheduling compiler.


Results for the base design:
- With only a single thread running, throughput is less than 2% below a superscalar without SMT support.
- Peak throughput is 84% higher than the superscalar's.
Three problems remain:
- IQ size
- fetch throughput
- lack of parallelism


Next: improve fetch throughput without increasing the fetch bandwidth. Fetch schemes are named alg.num1.num2, where:
- alg: the fetch selection method
- num1: the number of threads that can fetch in one cycle
- num2: the maximum number of instructions fetched per thread in one cycle

Partitioning the fetch unit:
- RR.1.8 (the baseline round-robin scheme)
- RR.2.4, RR.4.2
With some hardware addition:
- RR.2.8 (additional logic is required)
(A sketch of the round-robin schemes follows.)
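A minimal sketch of the RR.num1.num2 schemes in Python, assuming hypothetical per-thread instruction streams in place of real I-cache fetch blocks:

def rr_fetch(thread_streams, num1, num2, fetch_width=8, start=0):
    """Round-robin fetch: up to num2 instructions from each of num1 threads,
    never exceeding the total fetch width."""
    n = len(thread_streams)
    fetched, used = [], 0
    for i in range(n):
        if used == num1 or len(fetched) == fetch_width:
            break
        t = (start + i) % n                  # round-robin priority order
        take = min(num2, fetch_width - len(fetched))
        block = thread_streams[t][:take]     # real hardware would also stop at a
        if block:                            # cache-line boundary or taken branch
            fetched.extend((t, insn) for insn in block)
            used += 1
    return fetched

streams = [["a1", "a2", "a3", "a4", "a5"], ["b1", "b2"], ["c1", "c2", "c3"]]
print(rr_fetch(streams, 1, 8))  # RR.1.8: all eight slots offered to one thread
print(rr_fetch(streams, 2, 4))  # RR.2.4: two threads fetch up to four each

RR.2.8 differs only in letting both selected threads compete for all eight slots, which is where the extra logic comes in.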


Fetch policies: give fetch priority to the threads...
- BRCOUNT: ...that are least likely to be on a wrong path (fewest unresolved branches)
- MISSCOUNT: ...that have the fewest outstanding D-cache misses
- ICOUNT: ...with the fewest instructions in decode, rename, and the instruction queues
- IQPOSN: ...with instructions farthest from the head of the IQ
Each policy is sketched as a priority function below.
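Each policy reduces to a priority function over per-thread counters; fetch goes to the threads with the best (lowest) value. A Python sketch, with hypothetical field names standing in for counters the hardware would keep:

def brcount(t):   return t["unresolved_branches"]        # fewest unresolved branches first
def misscount(t): return t["outstanding_dcache_misses"]  # fewest outstanding D-cache misses first
def icount(t):    return t["insns_pre_issue"]            # fewest insns in decode/rename/IQ first
def iqposn(t):    return -min(t["iq_positions"], default=10**9)  # penalize insns near the IQ head

def pick_fetch_threads(threads, policy, num1):
    """Fetch priority goes to the num1 threads with the lowest policy value."""
    return sorted(threads, key=policy)[:num1]

threads = [
    {"id": 0, "unresolved_branches": 3, "outstanding_dcache_misses": 0,
     "insns_pre_issue": 12, "iq_positions": [0, 1, 5]},
    {"id": 1, "unresolved_branches": 0, "outstanding_dcache_misses": 2,
     "insns_pre_issue": 4, "iq_positions": [20, 25]},
]
print([t["id"] for t in pick_fetch_threads(threads, icount, 1)])   # -> [1]
print([t["id"] for t in pick_fetch_threads(threads, iqposn, 1)])   # -> [1]

ICOUNT turns out to be the strongest of these policies in the paper's results: a low pre-issue count identifies a thread that is moving instructions through the machine quickly.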


Unblocking the fetch unit:
- BIGQ: increase the IQ's size without increasing the search space (double the size, but search only the first 32 entries for issue)
- ITAG: do the I-cache tag lookup a cycle early, so threads that are about to miss are not selected for fetch
Both mechanisms are sketched below.
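A sketch of both mechanisms in Python, with hypothetical queue and tag-array objects standing in for the hardware:

IQ_SEARCH_WINDOW = 32

def issue_candidates(iq):
    # BIGQ: the queue may hold e.g. 64 entries, but issue selection still
    # searches only the first 32, keeping the selection logic fast.
    return iq[:IQ_SEARCH_WINDOW]

class ICacheTags:
    def __init__(self, resident_lines, line_size=64):
        self.resident = set(resident_lines)
        self.line_size = line_size
    def lookup(self, pc):                  # the tag check, done a cycle early
        return (pc // self.line_size) in self.resident

def fetchable_threads(threads, tags):
    # ITAG: threads whose next fetch would miss are simply not selected,
    # so a missing thread never blocks the fetch unit.
    return [t for t in threads if tags.lookup(t["next_pc"])]

threads = [{"id": 0, "next_pc": 0x1000}, {"id": 1, "next_pc": 0x2000}]
tags = ICacheTags(resident_lines={0x1000 // 64})
print([t["id"] for t in fetchable_threads(threads, tags)])  # -> [0]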


Two sources of issue slot waste:
- wrong-path instructions, which result from mispredicted branches
- optimistically issued instructions, which result from a cache miss or bank conflict
Issue algorithms examined (each is a sort order over ready instructions, sketched below):
- OPT_LAST: issue optimistic instructions as late as possible
- SPEC_LAST: issue speculative instructions as late as possible
- BRANCH_FIRST: issue branches as early as possible, to expose mispredictions sooner
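Each algorithm amounts to a different sort key over the ready instructions. A Python sketch, where the instruction records and the oldest-first tiebreak are illustrative assumptions:

def opt_last(i):     return (i["optimistic"], i["age"])     # optimistic insns sort last
def spec_last(i):    return (i["speculative"], i["age"])    # speculative insns sort last
def branch_first(i): return (not i["is_branch"], i["age"])  # branches sort first

def issue(ready, policy, width=8):
    """Fill the issue slots in policy order (lower age = older within a class)."""
    return sorted(ready, key=policy)[:width]

ready = [
    {"op": "add",  "age": 3, "optimistic": False, "speculative": True,  "is_branch": False},
    {"op": "beq",  "age": 2, "optimistic": False, "speculative": False, "is_branch": True},
    {"op": "load", "age": 1, "optimistic": True,  "speculative": False, "is_branch": False},
]
print([i["op"] for i in issue(ready, branch_first)])  # -> ['beq', 'load', 'add']

In the paper's results these algorithms make little difference, which is consistent with issue bandwidth not being a bottleneck (next slide).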


Checking other potential bottlenecks:
- Issue bandwidth: not a bottleneck.
- Instruction queue size: not a bottleneck; an experiment with larger queues increased throughput by less than 1%.
- Fetch bandwidth: the prime candidate for bottleneck status; increasing the IQ size and the excess registers increased performance by another 7%.
- Branch prediction: SMT is less sensitive to branch prediction quality than a single-threaded processor.


- Speculative execution: not a bottleneck, though eliminating it would be an issue.
- Memory throughput: even infinite-bandwidth caches would increase throughput by only 3%.
- Register file size: no sharp drop-off point.
- Fetch throughput: still a bottleneck.


Conclusions: the proposed SMT architecture
- borrows heavily from conventional superscalar design, requiring little additional hardware support;
- minimizes the impact on single-thread performance, running only 2% slower in that scenario;
- achieves significant throughput improvements over the superscalar when many threads are running.


SMT in later commercial processors:
- Intel Pentium 4, 2002: Hyper-Threading Technology (HTT), a 30% speed improvement
- MIPS MT
- IBM POWER5, 2004: two-thread SMT engine
- Sun UltraSPARC T1, 2005: CMT (SMT + CMP, chip-level multiprocessing)

