Multimedia Workloads versus SPEC Benchmarks Christopher Martinez, Mythri Pinnamaneni, and Eugene John University of Texas – San Antonio




Outline

Motivation
Multimedia Workloads
Cycles Per Instruction
Branch Prediction
Cache Performance
Conclusion


Motivation

The common workloads for the home user now focus on entertainment

For the home user, entertainment performance is the selling point

There are many media benchmarks, but can SPEC benchmarks give some insight into entertainment applications?


Objective

Understand the performance characteristics of multimedia workloads

Compare them against SPEC CPU 2000


Multimedia Workloads

Codecs used include: MP3, AAC, MPEG2 (DVD), Windows Media (DVD, HD), and MPEG4

Examine multimedia playback and creation (decoding/encoding)


Multimedia Workloads

Decoding
  MP3/AAC – iTunes, Winamp, RealPlayer
  Video – Windows Media Player

Encoding
  MP3 – iTunes, Windows Media Player, RealPlayer
  AAC – iTunes, RealPlayer
  Video – Windows Encoder


Multimedia Workloads

MP3 files used a bitrate of 128 kbps

AAC files used a bitrate of 128 kbps

Video files used presets from the applications

Video was a TV capture of a football game

Audio encoding was done on Beethoven's Symphonie Pastorale

Audio playback was done on "Boulevard of Broken Dreams" by Green Day


Performance

Performance is based on common measurements: cycles per instruction (CPI), uops per instruction, branch prediction, and cache hit rate

Use the on-chip performance counters of the Pentium 4 processor

Use VTune to capture the on-chip counters
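As a rough illustration of how the measurements listed above follow from raw event counts, the sketch below derives each one from a set of counter totals. The field names and the sample numbers are hypothetical; they are not VTune event names or values from the study.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw event totals for one run (hypothetical field names)."""
    cycles: int
    instructions: int
    uops: int
    branches: int
    branch_mispredicts: int
    l1_accesses: int
    l1_misses: int


def derive_metrics(s: CounterSample) -> dict:
    """Turn raw totals into the ratios used throughout the slides."""
    return {
        "cpi": s.cycles / s.instructions,
        "uops_per_instr": s.uops / s.instructions,
        "branch_fraction": s.branches / s.instructions,
        "prediction_rate": 1 - s.branch_mispredicts / s.branches,
        "l1_hit_rate": 1 - s.l1_misses / s.l1_accesses,
    }


# Made-up totals, chosen only so the ratios are easy to read.
sample = CounterSample(
    cycles=2_000_000,
    instructions=1_000_000,
    uops=1_400_000,
    branches=100_000,
    branch_mispredicts=5_000,
    l1_accesses=400_000,
    l1_misses=20_000,
)
metrics = derive_metrics(sample)
for name, value in metrics.items():
    print(f"{name:16s} {value:.3f}")
```

Every metric is a ratio of two totals, so each run only needs the counters to be read once at the start and once at the end.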


CPI

Our tests were performed on a Pentium 4, which is capable of executing 6 micro-operations (uops) per cycle

Audio decoding CPI --- 1.85 - 3.55

Audio encoding CPI --- 1.40 - 2.11

Video decoding CPI --- 1.96 - 2.56

Video encoding CPI --- 1.82 and 2.08

Integer SPEC 2000 CPI --- 1.16 - 8.54

Floating SPEC 2000 CPI --- 4.72 - 8.31

CPI

[Figure: CPI for each workload; vertical axis from 0 to 4]


uops

Audio decoding uops --- 1.38 – 1.71

Audio encoding uops --- 1.30 – 1.41

Video decoding uops --- 1.28 – 1.43

Video encoding uops --- 1.29 – 1.31

SPEC 2000 integer uops --- 1.29 – 2.11

SPEC 2000 float uops --- 1.32 – 2.48


Branch Prediction

SPEC benchmarks have a larger percentage of branch instructions than media applications

Audio decoding -- 12% branch instructions

Audio encoding -- 7% branch instructions

Video decoding & encoding -- 8% branch instructions

SPEC -- 13% - 20% branch instructions


Branch Prediction

Media and SPEC benchmarks both exhibit a high branch prediction rate

Prediction rates of 94% and higher in most cases

With media applications there is a high correlation between misprediction and CPI
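One way to put a number on this correlation is a Pearson coefficient over the audio decoding workloads, pairing the mispredicts-per-instruction and CPI values from the backup slides of this deck. The slides do not say how the correlation was assessed, so using Pearson's r here is an assumption, and six points are too few for more than a rough indication.

```python
from math import sqrt

# (workload, mispredicts per instruction, CPI) for audio decoding,
# taken from the backup tables in this deck.
AUDIO_DECODE = [
    ("Winamp MP3", 0.0065, 3.11),
    ("Real MP3", 0.0080, 3.55),
    ("iTunes MP3", 0.0025, 1.85),
    ("Winamp AAC", 0.0053, 2.43),
    ("Real AAC", 0.0060, 2.82),
    ("iTunes AAC", 0.0024, 1.98),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


mispredicts = [row[1] for row in AUDIO_DECODE]
cpis = [row[2] for row in AUDIO_DECODE]
r = pearson(mispredicts, cpis)
print(f"Pearson r (mispredicts/instr vs CPI) = {r:.2f}")
```

A value close to 1 means the decoders with more mispredictions per instruction are also the ones with higher CPI, which is the pattern the slide describes.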

Branch Prediction

[Figure: CPI and branch misses (br miss) plotted for workloads 1 to 31; vertical axis from 0 to 14]


Cache Performance

The Pentium 4 processor has a two-level cache: a 16 KB 1st level and a 1 MB 2nd level

Multimedia deals with data in a linear fashion

Audio/video must be played in order

This sequential data should allow for high hit rates

Since the SPEC benchmarks cover a wide range of applications, not all of them will resemble the media hit rates
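The effect of sequential access on hit rate can be sketched with a toy cache model. The model below is a simplification, not the Pentium 4's actual cache organization: it assumes a direct-mapped 16 KB cache with 64-byte lines and 4-byte reads, and it compares a linear stream against addresses scattered over 64 MB.

```python
import random


def hit_rate(addresses, cache_bytes=16 * 1024, line_bytes=64):
    """Hit rate of a direct-mapped cache over a sequence of byte addresses."""
    num_lines = cache_bytes // line_bytes
    resident = [None] * num_lines  # line number currently held in each slot
    hits = 0
    total = 0
    for addr in addresses:
        line = addr // line_bytes
        slot = line % num_lines
        if resident[slot] == line:
            hits += 1
        else:
            resident[slot] = line  # miss: fetch the line into its slot
        total += 1
    return hits / total


n = 64 * 1024
# Linear stream: consecutive 4-byte reads, as in in-order media playback.
sequential = [4 * i for i in range(n)]
# Scattered stream: 4-byte reads at random offsets within 64 MB.
rng = random.Random(0)
scattered = [4 * rng.randrange(16 * 1024 * 1024) for _ in range(n)]

seq_rate = hit_rate(sequential)
rand_rate = hit_rate(scattered)
print(f"sequential hit rate: {seq_rate:.4f}")
print(f"scattered hit rate:  {rand_rate:.4f}")
```

In the linear stream only the first read of each 64-byte line misses, so 15 of every 16 reads hit even though no data is ever reused. The scattered stream rarely finds its line already resident.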


1st Level Cache Performance

For the 1st level cache, the multimedia workloads had hit rates of 93% and higher

Half of the SPEC benchmarks had similar 1st level hit rates

The remainder of the SPEC benchmarks had considerably worse hit rates

1st Level Cache Performance

[Figure: CPI and L1 misses (L1 miss) plotted for workloads 1 to 31; vertical axis from 0 to 50]


2nd Level Cache Performance

For all multimedia applications, the 2nd level cache had a hit rate of 99.8% or greater

Only 5 of the 14 SPEC benchmarks had similar 2nd level hit rates

Most of the remaining SPEC benchmarks had 98% or higher, but 2 SPEC benchmarks had 86%

2nd Level Cache Performance

[Figure: CPI and L2 misses (L2 miss) plotted for workloads 1 to 31; vertical axis from 0 to 16]


Conclusion

Audio and video have a similar range in CPI, uops per instruction, and uops per cycle

SPEC programs exhibit performance characteristics over a much larger range than media, i.e., the SPEC suites are very diverse


Conclusion

Both audio and video are comparable to SPEC in 2nd level cache performance

Half of the SPEC benchmarks resemble audio and video in 1st level cache performance

SPEC benchmarks can give some insight into the performance of media applications

CPI

Workload                        CPI
iTunes MP3 / AAC Decode         1.85 / 1.98
WMV DVD / HD Decode             1.96 / 2.14
Video Encode Pass1 / Pass2      2.02 / 1.82
RealPlayer MP3 Encode           2.02
iTunes MP3 Encode               2.07
gcc / crafty / parser           1.81 / 1.81 / 1.86
bzip2                           2.06
Encode WMP MP3 / Real AAC       1.66 / 1.71
Encode iTunes AAC               1.40
gzip / vortex / gap             1.52 / 1.32 / 1.40

CPI

Workload              CPI
Winamp MP3 Decode     3.11
Real MP3 Decode       3.55
vpr                   3.17
twolf                 3.36
Winamp AAC Decode     2.43
MPEG2                 2.38
MPEG4                 2.59
Real AAC Decode       2.82
eon                   2.53

uops

Workload                           uops/instr
MP3 Decode RealPlayer & iTunes     1.54
AAC Decode RealPlayer / iTunes     1.57 / 1.61
vortex                             1.60
parser                             1.52
gap                                1.53
twolf                              1.56

uops

Workload                           uops/instr
Encode MP3 WMP / Real / iTunes     1.49 / 1.38 / 1.41
Encode AAC Real / iTunes           1.38 / 1.30
Winamp AAC Decode                  1.38
MPEG2 / MPEG4 / WMV DVD            1.43 / 1.37 / 1.28
WMV HD / Pass1 / Pass2             1.31 / 1.31 / 1.29
gzip / mcf / vpr                   1.35 / 1.29 / 1.46
art / crafty / perlbmk             1.32 / 1.31 / 1.48
bzip2                              1.42

uops

Besides a similar number of uops per instruction, one can also compare the cycles needed to complete each uop

Workload              Cycle/uop   CPI
iTunes AAC Encode     1.08        1.40
gcc                   1.05        1.81
gzip                  1.13        1.52
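The columns on this slide are tied to the uops-per-instruction figures by a simple identity: CPI equals uops per instruction times cycles per uop. The sketch below checks that identity for the two rows whose uops-per-instruction values appear elsewhere in this deck (1.30 for iTunes AAC encode, 1.35 for gzip) and backs out the value implied for gcc, which the deck does not list. The function names are illustrative.

```python
# (workload, uops per instruction, cycles per uop, reported CPI)
ROWS = [
    ("iTunes AAC Encode", 1.30, 1.08, 1.40),
    ("gzip", 1.35, 1.13, 1.52),
]


def cpi_from_uops(uops_per_instr, cycles_per_uop):
    """CPI = (uops / instruction) * (cycles / uop)."""
    return uops_per_instr * cycles_per_uop


def implied_uops_per_instr(cpi, cycles_per_uop):
    """Rearranged identity, for rows where uops/instr is not listed."""
    return cpi / cycles_per_uop


for name, upi, cyc, reported in ROWS:
    estimate = cpi_from_uops(upi, cyc)
    print(f"{name:18s} estimated CPI {estimate:.2f}, reported {reported:.2f}")

# gcc: only cycles/uop (1.05) and CPI (1.81) are given on this slide.
gcc_upi = implied_uops_per_instr(1.81, 1.05)
print(f"gcc implied uops/instr: {gcc_upi:.2f}")
```

Read this way, a workload can reach a similar CPI either with few uops that each take longer or with more uops that complete quickly, which is why the slide compares both columns.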

uops

Workload                     Cycle/uop
Decode Real MP3 / AAC        2.30 / 1.80
Decode Winamp MP3 / AAC      1.80 / 1.76
vpr / twolf                  2.17 / 2.15
Decode iTunes AAC / MP3      1.23 / 1.20
parser / eon                 1.22 / 1.20
Pass1 / Pass2                1.59 / 1.42
bzip2                        1.44
Encode MP3 Real / iTunes     1.47 / 1.47

Branch Prediction

Audio Decoding

Workload       % of branches   Prediction Rate   Mispredict/Instr
Winamp MP3     12.8            94.92             0.0065
Real MP3       9.41            91.50             0.0080
iTunes MP3     11.76           97.84             0.0025
Winamp AAC     16.85           96.88             0.0053
Real AAC       13.02           95.26             0.0060
iTunes AAC     12.81           98.16             0.0024
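The three columns of the audio decoding table are related: assuming the prediction rate is measured over all branches, mispredictions per instruction equal the branch fraction times the misprediction rate. The sketch below recomputes the last column from the first two for the audio decoding rows. The function name and the tolerance are choices made for this illustration.

```python
# (workload, % of branches, prediction rate %, reported mispredicts/instr)
AUDIO_DECODE = [
    ("Winamp MP3", 12.80, 94.92, 0.0065),
    ("Real MP3", 9.41, 91.50, 0.0080),
    ("iTunes MP3", 11.76, 97.84, 0.0025),
    ("Winamp AAC", 16.85, 96.88, 0.0053),
    ("Real AAC", 13.02, 95.26, 0.0060),
    ("iTunes AAC", 12.81, 98.16, 0.0024),
]


def mispredicts_per_instruction(branch_pct, prediction_pct):
    """(branches / instruction) * (fraction of branches mispredicted)."""
    return (branch_pct / 100.0) * (1.0 - prediction_pct / 100.0)


for name, branch_pct, prediction_pct, reported in AUDIO_DECODE:
    estimate = mispredicts_per_instruction(branch_pct, prediction_pct)
    print(f"{name:11s} estimated {estimate:.4f}, reported {reported:.4f}")
```

The identity shows why a workload with few branches can tolerate a lower prediction rate: the cost per instruction depends on the product of the two, not on the prediction rate alone.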

Branch Prediction

Audio Encoding

Workload       % of branches   Prediction Rate   Mispredict/Instr
WMP MP3        9.08            96.96             0.0028
Real MP3       10.42           95.86             0.0043
iTunes MP3     0.53            94.87             0.0055
Real AAC       7.74            94.52             0.0043
iTunes AAC     7.68            95.36             0.0035

Branch Prediction

Video

Workload           % of branches   Prediction Rate   Mispredict/Instr
MPEG2 (DVD)        8.91            92.93             0.0063
MPEG4              8.28            96.76             0.0027
WMV DVD            5.12            95.86             0.0021
WMV HD             9.89            96.30             0.0018
WMV HD - Pass1     6.31            94.69             0.0033
WMV HD - Pass2     9.28            95.46             0.0042

Branch Prediction

Workload     % of branches   Prediction Rate   Mispredict/Instr
gcc          21.84           96.91             0.0067
gzip         19.10           94.89             0.0097
mcf          24.25           95.78             0.0102
vortex       21.22           99.75             0.0005
vpr          16.57           92.86             0.0118
art          14.21           99.21             0.0011
equake       11.00           98.21             0.0020
parser       20.80           96.65             0.0074
crafty       15.76           94.20             0.0091
eon          13.45           97.12             0.0039
gap          17.51           98.57             0.0025
perlbmk      21.18           98.56             0.0031
bzip2        14.83           94.35             0.0084
twolf        16.48           88.39             0.0019


Branch Prediction

The high correlation between branch prediction and CPI can give insight into where improvements will help

When new CPU enhancements show improvement in SPEC, a similar or higher gain will be observed in multimedia applications