Cores vs. Caches
CS 838 Project
Matt Ramsay & Chris Feucht

Motivation

• As feature sizes shrink, more hardware can be placed on each chip

• Various trade-offs result

• Among these, for a CMP (chip multiprocessor), is how many cores and how much cache to place on each chip

• Our project results suggest an optimal configuration for a 16-processor system running web-based applications

Outline

• Motivation
• Experiments Performed
• Simulator Environment
• Results
• Project Shortcomings
• Future Work
• Conclusions & Summary

Experiments

• The intended experiments could not be performed because of simulator limitations

• Intended experiments: each core counts as equivalent to 0.5 MB of L2 cache

• Workloads run: apache_8, oltp_2, zeus_8

                 Intended                 Simulated
Cores Per Chip   L2 Size    L2 Assoc.     L2 Size    L2 Assoc.
16               4 MB       2, 4, 8       2 MB       2, 4, 8
8                8 MB       2, 4, 8       4 MB       2, 4, 8
4                10 MB      2, 4, 8       8 MB       2, 4, 8
2                11 MB      2, 4, 8       12 MB      3, 6, 12
1                11.5 MB    2, 4, 8       16 MB      4, 8, 16
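The Intended column appears to follow a simple area budget. Assume each chip holds 12 MB of L2-equivalent area and each core costs 0.5 MB of it. The L2 left over on each chip then reproduces the Intended sizes. The 12 MB budget is our inference from the table and is not stated on the slides. The sketch below works through that arithmetic. It also multiplies by the number of chips a 16-processor system needs. The constant and function names are hypothetical.

```python
# Sketch of the intended core-vs-cache trade-off.
# ASSUMPTION: a fixed per-chip budget of 12 MB of L2-equivalent area,
# inferred from the Intended column (not stated on the slides).
CHIP_BUDGET_MB = 12.0   # hypothetical per-chip area, in MB of L2
CORE_COST_MB = 0.5      # each core counts as 0.5 MB of L2 (from the slides)
TOTAL_PROCESSORS = 16   # system size studied in the project


def intended_l2_mb(cores_per_chip: int) -> float:
    """L2 capacity left on one chip after placing its cores."""
    return CHIP_BUDGET_MB - CORE_COST_MB * cores_per_chip


def chips_needed(cores_per_chip: int) -> int:
    """Chips required to build the 16-processor system."""
    return TOTAL_PROCESSORS // cores_per_chip


if __name__ == "__main__":
    for cores in (16, 8, 4, 2, 1):
        per_chip = intended_l2_mb(cores)
        chips = chips_needed(cores)
        print(f"{cores:2d} cores/chip: {per_chip:4.1f} MB L2 per chip, "
              f"{chips:2d} chips, {per_chip * chips:5.1f} MB L2 system-wide")
```

Under these assumptions, a single 16-core chip carries 4 MB of L2. The 1-core-per-chip design spreads 184 MB across 16 chips. This shows how quickly system-wide L2 grows as chips are added.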

Simulator Environment

• All nodes include 32 KB, 2-way L1 instruction and data caches

• Each node has its own L2 bank, regardless of L2 size or associativity

• All other Ruby and Opal settings were left at their defaults
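Cache geometry may help explain the associativities in the Simulated column. This sketch computes sets per L2 bank as capacity / (ways × line size). It rests on two assumptions, neither stated on the slides. First, cache lines are 64 bytes. Second, each simulated L2 size is a system-wide total split evenly across the 16 per-node banks.

```python
# Cache-geometry sketch. ASSUMPTIONS (not stated on the slides):
# 64-byte cache lines, and each simulated system-wide L2 size split
# evenly across 16 per-node banks.
LINE_BYTES = 64   # assumed block size
NUM_BANKS = 16    # one L2 bank per node, 16 nodes
KB = 1024
MB = 1024 * KB


def num_sets(capacity_bytes: int, assoc: int, line_bytes: int = LINE_BYTES) -> int:
    """Sets = capacity / (ways * line size); must divide evenly."""
    lines = capacity_bytes // line_bytes
    if capacity_bytes % line_bytes or lines % assoc:
        raise ValueError("capacity does not divide into whole sets")
    return lines // assoc


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


# Simulated L2 configurations: system-wide size in MB -> associativities
SIMULATED_L2 = {2: (2, 4, 8), 4: (2, 4, 8), 8: (2, 4, 8),
                12: (3, 6, 12), 16: (4, 8, 16)}

if __name__ == "__main__":
    print("L1 sets (32 KB, 2-way):", num_sets(32 * KB, 2))
    for total_mb, assocs in SIMULATED_L2.items():
        bank_bytes = total_mb * MB // NUM_BANKS
        for ways in assocs:
            sets = num_sets(bank_bytes, ways)
            print(f"{total_mb:2d} MB total, {ways:2d}-way: {sets:5d} sets/bank, "
                  f"power of two: {is_power_of_two(sets)}")
```

Under these assumptions, the 12 MB configuration gives a power-of-two set count only with 3-, 6-, or 12-way associativity. Those yield 4096, 2048, and 1024 sets. A 2-way cache would instead have 6144 sets. That may be why the 12 MB row differs, but the slides do not say. The same check does not explain the 16 MB row, where 2-way would also give a power-of-two set count.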

Results - Apache

[Chart: CPI and MissesPerThousand; y-axis 0 to 20]
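Each results chart plots CPI and a misses-per-thousand metric. Both can be derived from raw simulator counters, as sketched below. We assume "MissesPerThousand" means misses per thousand instructions. The counter values are hypothetical placeholders, not project data.

```python
# Sketch of the two plotted metrics.
# ASSUMPTION: "MissesPerThousand" = misses per thousand instructions.
# The counter values below are hypothetical, not project data.


def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction; IPC is its reciprocal."""
    return cycles / instructions


def misses_per_thousand(misses: int, instructions: int) -> float:
    """Misses normalized to every 1000 instructions executed."""
    return 1000 * misses / instructions


if __name__ == "__main__":
    cycles, instructions, l2_misses = 3_000_000, 1_000_000, 4_500  # hypothetical
    print(f"CPI = {cpi(cycles, instructions):.2f}, "
          f"IPC = {1 / cpi(cycles, instructions):.3f}, "
          f"misses/1000 = {misses_per_thousand(l2_misses, instructions):.1f}")
```

Normalizing by instruction count keeps both metrics comparable across runs of different length.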

Results - OLTP

[Chart: CPI and MissesPerThousand; y-axis 0 to 20]

Results - Zeus

[Chart: CPI and MissesPerThousand; y-axis 0 to 20]

Project Shortcomings & Future Work

• Longer runs are needed for convincing data

• Test different numbers of processors per system

• Add an L3 cache to the memory hierarchy

Conclusions

• CPI (and hence IPC) changes little in a 16-processor system as the number of cores per chip varies

• This holds despite the rapid growth of system-wide L2 cache as chips are added

• Best performance per cost is with all 16 processors on one chip
  – Even with only 2 MB of total L2
  – Would be helped by an off-chip L3

Project Summary

[Summary figure, annotated "We look here!"]