Cores vs. Caches
CS 838 Project
Matt Ramsay & Chris Feucht
Motivation
• As feature sizes push smaller, additional hardware can be placed on chip
• Various trade-offs result
• Among these for a CMP is how many cores and how much cache on each chip
• Our project results suggest an optimal configuration for a 16-processor system running web-based applications
Outline
• Motivation
• Experiments Performed
• Simulator Environment
• Results
• Project Shortcomings
• Future Work
• Conclusions & Summary
Experiments
• Intended experiments not performed due to simulator limitations
• Intended experiments: Each core equivalent to 0.5 MB L2 cache
• Ran apache_8, oltp_2, zeus_8

Cores per chip   Intended L2 size   Intended L2 assoc.   Simulated L2 size   Simulated L2 assoc.
16               4 MB               2, 4, 8              2 MB                2, 4, 8
8                8 MB               2, 4, 8              4 MB                2, 4, 8
4                10 MB              2, 4, 8              8 MB                2, 4, 8
2                11 MB              2, 4, 8              12 MB               3, 6, 12
1                11.5 MB            2, 4, 8              16 MB               4, 8, 16
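
The intended L2 sizes follow from the stated rule that each core is equivalent to 0.5 MB of L2 cache. A minimal sketch of that arithmetic is below. The fixed per-chip budget of 12 MB-equivalents is an assumption inferred from the table rows (for example, 16 cores x 0.5 MB + 4 MB = 12 MB); the slides do not state it, and the names `CORE_AREA_MB`, `AREA_BUDGET_MB`, and `intended_l2_mb` are hypothetical.

```python
# Assumption: every chip has the same area budget, expressed in MB of L2.
# The value 12 is inferred from the table rows, not stated in the slides.
CORE_AREA_MB = 0.5      # from the slides: one core ~ 0.5 MB of L2 cache
AREA_BUDGET_MB = 12.0   # inferred: cores * 0.5 MB + intended L2 = 12 MB


def intended_l2_mb(cores_per_chip):
    """L2 capacity left on a chip after the cores take their share of area."""
    return AREA_BUDGET_MB - CORE_AREA_MB * cores_per_chip


intended = {cores: intended_l2_mb(cores) for cores in (16, 8, 4, 2, 1)}

for cores, l2_mb in intended.items():
    print(f"{cores:>2} cores/chip -> {l2_mb:g} MB L2")
```

Under this assumption the function gives the same values as the "Intended" columns. The simulated sizes do not follow the same rule, since they were constrained by the simulator.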
Simulator Environment
• All nodes include 32 KB, 2-way L1 I & D caches
• Each node has its own L2 bank, regardless of L2 size or assoc.
• All other Ruby and Opal settings left at default
Results - Apache
[Chart: CPI and MissesPerThousand; vertical axis from 0 to 20]
Results - OLTP
[Chart: CPI and MissesPerThousand; vertical axis from 0 to 20]
Results - Zeus
[Chart: CPI and MissesPerThousand; vertical axis from 0 to 20]
Project Shortcomings & Future Work
• Longer runs needed for convincing data
• Test different numbers of processors per system
• Add L3 cache to memory hierarchy
Conclusions
• CPI (IPC) changes little in a 16-processor system as number of cores/chip varies
• This happens despite rapid system-wide L2 cache growth with added chips
• Best performance per cost is with all 16 processors on one chip
  – Even with 2 MB total L2
  – Would be helped by off-chip L3
Project Summary
We look here!
50 miles