
Multimedia Signal Processing on CPUs and GPUs …speed.cis.nctu.edu.tw/~ydlin/course/cn/seminar/cpu.pdf · 2018-06-07


Multimedia Signal Processing on CPUs and GPUs with Many Cores

Yen-Kuang Chen
Intel Corporation
y.k.chen@ieee.org

Summary
Multi-Core is Becoming Mainstream
• Architecture: Number of cores per chip will grow quickly
  – Power consumption is a key driver
• Algorithm: Future processors demand special designs
  – Must consider parallelism
    • Algorithms often need changes
• Algorithm-Architecture Co-Exploration
  – Architecture matters
  – Exploit data-level and functional-level parallelism
  – Algorithm with minimal overhead/data dependencies
  – Balance loads for better performance scaling
  – Take advantage of sharing cache


Outline
• Motivation & current trends
  – Basic principles in multi-core architecture
• Theory and principles in parallelization
  – CPU and GPU have same principles
• Advanced optimization techniques in multi-core architecture
  – CPU and GPU have different treatments
• A specific design example
• Summary

More Transistors & Better Performance
[Figure: number of transistors (log scale, 1E+3 to 1E+10) versus year (1970 to 2005) for the 4004, 8008, 8080, 8086, i286, i386, i486, Pentium, Pentium II, Pentium III, Pentium 4, Itanium, Itanium 2, Itanium 2 (w/ 9MB cache), and Dual Core Itanium]
[Figure: relative performance since 1980 (log scale, 1 to 1000) versus year (1980 to 2000); microprocessor MIPS grows ~50% annually]
Source:
1. Moore's Law 40th Anniversary (http://www.intel.com/pressroom/kits/events/moores_law_40th/)
2. D. A. Patterson, "Latency Lags Bandwidth," Communications of the ACM, Vol. 47, no. 10, pp. 71-75, Oct. 2004.
Faster computers enable novel applications


Thermal Design Points
[Figure: thermal design power in watts (0 to 140) versus year (1992 to 2008) for Pentium, Pentium MMX, Pentium II, Pentium III, Pentium 4, Pentium D, Core 2 Duo, and Core 2 Quad]
More Power, too!
Source: Desktop CPU Comparison Guide Rev. 3.3 (http://www.techarp.com/showarticle.aspx?artno=337)
Transistors are “free,” but power is expensive

Basic Computer Architecture
• Performance is roughly proportional to $c \cdot f$
• Higher frequency → more power
  – Power is proportional to $c V^2 f$
  – Voltage must scale with frequency
  – 2x frequency → 2x performance, but 8x power!
• Alternatively, we can explore parallelism
  – Keep frequency the same
  – Use more area (double the capacitance “c”)
  – 2x area → 2x performance, with only 2x power!
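The scaling argument on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes power proportional to c·V²·f, performance proportional to c·f, and a supply voltage that scales linearly with frequency, as the bullets state. The function names are hypothetical.

```c
/* Relative dynamic power, P ~ c * V^2 * f, normalized so that
 * c = V = f = 1 gives 1. Assumption taken from the slide: the supply
 * voltage scales linearly with frequency, so the V scale equals f_scale. */
double relative_power(double c_scale, double f_scale)
{
    double v_scale = f_scale;
    return c_scale * v_scale * v_scale * f_scale;
}

/* Relative performance, roughly proportional to c * f. */
double relative_performance(double c_scale, double f_scale)
{
    return c_scale * f_scale;
}
```

Doubling the frequency (`c_scale = 1`, `f_scale = 2`) gives 2x performance at 8x power, while doubling the area (`c_scale = 2`, `f_scale = 1`) gives 2x performance at 2x power.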


Commercial Examples of Multi-Core for General Purpose Computing
• Intel®
  – Pentium D
  – Core™ Duo
  – Core™ 2 Duo
  – Core™ 2 Quad
  – Core™ i7
• IBM
  – CELL Broadband Engine
• Sun
  – UltraSPARC T1 (code name: Niagara)
[Figure: die photo of Intel 45nm dual core processors]
Source: http://www.intel.com/pressroom/archive/releases/20070328fact.htm
Source: http://en.wikipedia.org/wiki/Cell_microprocessor
Source: http://en.wikipedia.org/wiki/UltraSPARC_T1

Programmable GPU
• Early graphics processors
  – Offloaded graphics from the CPU
  – Simple, fixed function pipeline
• GPU performance ramp
  – 1.7x FLOPS increase per year *
  – High computational density gained by increased parallelism
  – Also by larger die & more power
• Changing graphics pipeline
  – Programmable stages added
[Figure: Pre 2001, fixed function pipeline with stages Input Data, Transformation and Lighting, Primitive Setup, Rasterization, Pixel Processing, Frame Buffer Blend, Frame Buffer]
[Figure: Post 2006, programmable pipeline with stages Input Data, Vertex Shading, Geometry Shading, Primitive Setup, Rasterization, Pixel Shading, Frame Buffer Blend, Frame Buffer]
* Source: John Owens, “Experiences with GPU Computing”, 8th Annual IEEE/NATEA Conference, 2007


Block Diagram of Graphic Pipeline
[Figure: 3D Application or Game, 3D API (OpenGL or Direct3D), CPU-GPU Boundary (AGP/PCIe), GPU Front End, Programmable Vertex Processor, Primitive Assembly, Rasterization and Interpolation, Programmable Fragment Processor, Raster Operations, Frame Buffer. Labeled data flows: 3D API Commands, GPU Command & Data Stream, Vertex Index Stream, Pre-transformed Vertices, Transformed Vertices, Assembled Primitives, Pixel Location Stream, Pre-transformed Fragments, Transformed Fragments, Pixel Updates. Annotation: Fixed-function pipeline]
Source: UPENN's CIS 665 tutorials: http://www.seas.upenn.edu/~cis665/

G80 Thread Computing Pipeline
Alternative operating mode specifically for computing
[Figure: G80 block diagram. Blocks shown: Host, Input Assembler, Vtx Thread Issue, Geom Thread Issue, Pixel Thread Issue, Setup / Rstr / ZCull, Thread Execution Manager, Thread Processor, SP, L1, TF, Texture, Parallel Data Cache, Load/store, L2, FB, Global Memory]
Source: UPENN's CIS 665 tutorials: http://www.seas.upenn.edu/~cis665/


Software Tool on GPU

CPU C program

void add_matrix_cpu(float *a, float *b, float *c, int N)
{
    int i, j, index;
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            index = i + j * N;
            c[index] = a[index] + b[index];
        }
    }
}

int main()
{
    ...
    add_matrix_cpu(a, b, c, N);
}

CUDA C program

__global__ void add_matrix_gpu(float *a, float *b, float *c, int N)
{
    /* one thread per matrix element */
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    int index = i + j * N;
    c[index] = a[index] + b[index];
}

int main()
{
    ...
    dim3 dimBlock(blocksize, blocksize);
    /* assumes N is a multiple of blocksize */
    dim3 dimGrid(N / dimBlock.x, N / dimBlock.y);
    add_matrix_gpu<<<dimGrid, dimBlock>>>(a, b, c, N);
}

• High-level programming languages
  – HLSL, Cg, GLSL, CTM, BrookGPU
  – CUDA (Compute Unified Device Architecture)
  – OpenCL
• Many non-graphics applications have been built
  – http://www.gpgpu.org

Multi-core is Prevalent
• Graphics processing unit (GPU)
  – NVIDIA 8800 GTX has 128 streaming processors
  – NVIDIA GTX 280 has 240 streaming processors
• Intel upcoming Larrabee processors
• Multi-processor system-on-a-chip (MPSoC)
• Embedded
  – 3DLabs's DMS-02
  – Analog Devices' ADSP-BF561
  – Cradle Technology's CT3616
  – Freescale's MSC8156
  – Sandbridge Technologies' SB3500
  – Tensilica's 388VDO
  – TI's DaVinci and OMAP processors
  – Tilera's TILE64 processor


Future of Multi-cores
[Figure: the number of cores (log scale, 1 to 1000) versus year (2006 to 2018), with trend lines "2x every two years" and "2x every 1.5 years"]
• Moore's Law → # of cores will double every 18–24 months
• 2007 – 8 cores
• 2009 – 16
• 2013 – 64
• 2015 – 128
• 2021 – 1024
Future architecture will have many cores
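The core counts listed on this slide follow from repeated doubling. The sketch below is a small illustration, not part of the original deck: it assumes the slide's starting point of 8 cores in 2007 and takes the doubling period as a whole number of years; `projected_cores` is a hypothetical name.

```c
/* Projected core count: start from 8 cores in 2007 (the slide's first
 * data point) and double once per years_per_doubling years.
 * Partial periods are ignored; years_per_doubling must be positive. */
long projected_cores(int year, int years_per_doubling)
{
    long cores = 8;
    for (int y = 2007 + years_per_doubling; y <= year; y += years_per_doubling)
        cores *= 2;
    return cores;
}
```

With a two-year period this reproduces the listed points (16 in 2009, 64 in 2013, 128 in 2015, 1024 in 2021), which suggests the list uses the 24-month end of the 18–24 month range.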

Multi-core is Becoming Mainstream
• “The sequential processor era is now over.” “That is bad news for software companies.”
  – W. Gibbs, "A Split at the Core," Scientific American, Nov 2004
• During the 80s, NSF tried to persuade the computer industry, but “found little interest.” “Now the machines are here” … to get around “power wall.” “Newer chips with multiple processors require dauntingly complex software.”
  – J. Markoff, "Faster Chips are Leaving Programmers in Their Dust," The New York Times, Dec 2007


Performance vs. Scalability
[Figure: Graph Mining. Relative throughput (0 to 5) versus number of cores (1, 2, 4, 8, 16, 32) for the gSpan algorithm and the Gaston algorithm]
Right algorithm for many cores is important

Multi-Core is Becoming Mainstream
• Architecture: Number of cores per chip will grow quickly
  – Power consumption is a key driver
• Algorithm: Future processors demand special designs
  – Algorithms often need changes
• Although parallel computing has been studied for decades
  – In the past, scientific applications → Few people
  – Today, multi-core processors are different
    • Smaller last-level on-die cache
    • Faster inter-core on-die communication


Outline
• Motivation & current trends
• Theory and principles in parallelization (CPU and GPU have same principles)
  – Avoid sequential dependencies
  – Maximize load balance
  – Reduce overhead
• Advanced optimization techniques in multi-core architecture
  – CPU and GPU have different treatments
• A specific design example
• Summary

Multi-thread: One way to Program Multi-core

for (i=0; i<iterations; i++)
{
    part1(i, buff);  // data are passed via buff
    part2(i, buff);
}

• Single-threaded execution model
[Figure: Part 1 (1), Part 2 (1), Part 1 (2), Part 2 (2), Part 1 (3), Part 2 (3), ... run back to back on one processor]
• Multi-threaded execution model
[Figure: Processor 1 runs Part 1 (1) to Part 1 (4) while Processor 2 runs Part 2 (1) to Part 2 (4)]


A Real Threading Example

void thread1()
{
    int i;
    int b_idx = 0;              // index to 2 buffers
    part1(0, buf[b_idx]);       // first iteration fills buffer 0
    b_idx = 1 - b_idx;
    for (i = 1; i < iterations; i++)
    {
        Signal(&signal, 1);     // previous buffer is ready for thread2
        part1(i, buf[b_idx]);
        WaitForSignal(end, 1);  // thread2 is done with the previous buffer
        b_idx = 1 - b_idx;
    }
    Signal(&signal, 1);         // last buffer is ready
}

void thread2()
{
    int i;
    int b_idx = 0;              // index to 2 buffers
    for (i = 0; i < iterations; i++)
    {
        WaitForSignal(signal, 1);
        part2(i, buf[b_idx]);
        Signal(&end, 1);
        b_idx = 1 - b_idx;
    }
}
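The two-thread, two-buffer pipeline on this slide can be made runnable with POSIX threads. The following is a minimal sketch under stated assumptions, not the author's implementation: `Signal`/`WaitForSignal` are replaced by a hypothetical counting signal built from a mutex and a condition variable, `part1` and `part2` are placeholder stages, and `ITERATIONS`, `run_pipeline`, and `results` are illustrative names.

```c
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 8

/* Counting signal built from a mutex and a condition variable
 * (a stand-in for the slide's Signal/WaitForSignal). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             count;
} counting_signal;

static counting_signal filled = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 };
static counting_signal vacant = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 2 };

static int buf[2];            /* the two buffers shared by the stages */
int results[ITERATIONS];      /* output of the second stage */

static void post(counting_signal *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

static void wait_for(counting_signal *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->cond, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Placeholder stages: part1 produces a value, part2 consumes it. */
static void part1(int i, int *b)       { *b = i * i; }
static void part2(int i, const int *b) { results[i] = *b + 1; }

static void *thread1(void *arg)
{
    int b_idx = 0;                      /* index to 2 buffers */
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        wait_for(&vacant);              /* buffer b_idx is free to overwrite */
        part1(i, &buf[b_idx]);
        post(&filled);                  /* hand the buffer to thread2 */
        b_idx = 1 - b_idx;
    }
    return NULL;
}

static void *thread2(void *arg)
{
    int b_idx = 0;
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        wait_for(&filled);              /* buffer b_idx holds iteration i */
        part2(i, &buf[b_idx]);
        post(&vacant);                  /* buffer may be reused */
        b_idx = 1 - b_idx;
    }
    return NULL;
}

/* Runs both stages to completion; returns 0 on success.
 * Sketch only: no cleanup if thread creation fails. */
int run_pipeline(void)
{
    pthread_t t1, t2;
    if (pthread_create(&t1, NULL, thread1, NULL) != 0) return -1;
    if (pthread_create(&t2, NULL, thread2, NULL) != 0) return -1;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Starting `vacant` at 2 lets `thread1` run up to two iterations ahead, which is the overlap the two buffers are meant to provide. On some toolchains the program must be built with `-pthread`.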

Parallelization Schemes
• Data-domain: frames, slices, and macroblocks
• Functional-domain: ME/MC, DCT/IDCT, VLC/VLD
• Hybrid
[Figure: data-domain decomposition of slices, where Thread 1 and Thread 2 each run VLD, IDCT, and MC]
[Figure: functional-domain decomposition of a picture, where VLD, IDCT, and MC are split across Thread 1, Thread 2, and Thread 3]


Parallel Speedup
• Ideally: from one processor to 2 processors
• Speedup $= T_{sequential} / T_{parallel} = (t_s + t_p) / (t_s + t_p/2)$
• Amdahl's Law → when $N \to \infty$: $\frac{t_s + t_p}{t_s + t_p/N} \approx \frac{t_s + t_p}{t_s}$
[Figure: $T_{sequential}$ is $t_s$ followed by $t_p$; $T_{parallel}$ is $t_s$ followed by two concurrent pieces of $t_p/2$]

More Realistic Speedup Rule
• Speedup $= \frac{t_s + t_p}{t_s + \max_{i=1}^{N} t_p[i] + t_O}$
• To maximize parallel speedup
  – Avoid sequential dependencies
  – Maximize load balance
  – Reduce overheads
  – Utilize cache efficiently
[Figure: ideal $T_{parallel}$ is $t_s$ followed by two pieces of $t_p/2$; realistic $T_{parallel}$ is $t_s$ followed by pieces of $t_p/1.9$ and $t_p/2.1$ plus overhead]
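To see how imbalance and overhead enter the rule above, it can be evaluated directly. This is a sketch under assumptions: the per-processor parallel times are passed as an array, the total parallel work is taken to be their sum, and the name `realistic_speedup` and all input values are illustrative.

```c
/* Speedup = (t_s + t_p) / (t_s + max_i t_p[i] + t_O), where t_p is the
 * sum of the per-processor parallel times t_part[0..n-1], t_s is the
 * serial time, and t_o is the parallelization overhead. */
double realistic_speedup(double t_s, const double *t_part, int n, double t_o)
{
    double total = 0.0, slowest = 0.0;
    for (int i = 0; i < n; i++) {
        total += t_part[i];
        if (t_part[i] > slowest)
            slowest = t_part[i];    /* the slowest processor sets the pace */
    }
    return (t_s + total) / (t_s + slowest + t_o);
}
```

As an illustrative case with no serial part and no overhead, splitting one unit of work 0.5/0.5 across two processors gives a speedup of 2, while a 0.6/0.4 split gives about 1.67.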


Avoid Sequential Dependencies
Speedup $= \frac{t_s + t_p}{t_s + \max_{i=1}^{N} t_p[i] + t_O}$

Why Are We So Paranoid?
• Use basic Amdahl's Law: Speedup $= (t_s + t_p) / (t_s + t_p/N)$
• Example 1: Assume we get 1.9x speedup with two threads
• Example 2: Assume 1% serial code
[Figure, Example 1: speedup versus number of threads (axes 0 to 128); 2.5% serial code; 77% below linear performance at 128 threads]
[Figure, Example 2: speedup versus number of threads (axes 0 to 128); 56% below linear performance at 128 threads]
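The second example can be reproduced from the basic form of Amdahl's Law. The sketch below normalizes the single-threaded run time to 1, so the serial part is `serial` and the parallel part is `1 - serial`; the function names are hypothetical.

```c
/* Amdahl's Law: speedup on n threads when a fraction `serial` of the
 * single-threaded run time cannot be parallelized. */
double amdahl_speedup(double serial, int n)
{
    return 1.0 / (serial + (1.0 - serial) / n);
}

/* Fraction by which that speedup falls short of linear scaling (n). */
double below_linear(double serial, int n)
{
    return 1.0 - amdahl_speedup(serial, n) / n;
}
```

With 1% serial code and 128 threads, `amdahl_speedup(0.01, 128)` is about 56.4, and `below_linear(0.01, 128)` is about 0.56, in line with the 56% figure on the slide.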


Avoid Sequential Dependencies
• Parallelize every module
  – From 4.2x to 19.9x on edge detection
• Re-order constraints to remove dependencies
  – From 4x to 32x on physics constraint solver
• Find different algorithm

Example: Canny Edge Detector
• Four steps:
  – (a) image gradient and edge orientation
  – (b) non-maximum suppression and high gradient pixel identification
  – (c) hysteresis (tracing from high gradient pixels)
  – (d) final assignment of the binary decision
• Steps a, b, & d are easy to parallelize in image domain
[Figure: panels (a), (b), (c)]


Hysteresis Testing
• 2 thresholds, a high T_h and a low T_w
• Edge
  – Pixel whose value is greater than T_h
  – Pixel whose value is greater than T_w and that is connected to an edge pixel
[Figure: thresholds T_h and T_w]
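The two-threshold rule can be written as a serial trace that starts from the pixels above T_h and grows into neighbors above T_w. The following is a minimal sketch, not the implementation behind the reported results: it assumes a row-major gradient-magnitude image with positive dimensions and 8-connectivity, and the name `hysteresis` and its parameters are hypothetical.

```c
#include <stdlib.h>

/* Serial hysteresis thresholding on a w x h gradient-magnitude image.
 * edge[i] becomes 1 for pixels above t_high, and for pixels above t_low
 * that are 8-connected (an assumption) to an edge pixel; otherwise 0.
 * Returns 0 on success, -1 if the work stack cannot be allocated. */
int hysteresis(const float *grad, unsigned char *edge, int w, int h,
               float t_high, float t_low)
{
    int n = w * h, top = 0;
    int *stack = malloc((size_t)n * sizeof *stack);
    if (stack == NULL)
        return -1;

    /* Seed: every pixel above the high threshold is an edge. */
    for (int i = 0; i < n; i++) {
        edge[i] = grad[i] > t_high;
        if (edge[i])
            stack[top++] = i;
    }

    /* Trace: grow edges into neighbors above the low threshold.
     * Each pixel is pushed at most once, so the stack never overflows. */
    while (top > 0) {
        int p = stack[--top], x = p % w, y = p / w;
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                int nx = x + dx, ny = y + dy;
                if (nx < 0 || nx >= w || ny < 0 || ny >= h)
                    continue;
                int q = ny * w + nx;
                if (!edge[q] && grad[q] > t_low) {
                    edge[q] = 1;
                    stack[top++] = q;
                }
            }
        }
    }
    free(stack);
    return 0;
}
```

Because a trace can run from any seed to any connected pixel, the result for one pixel may depend on pixels far away in the image, which is the sequential dependency this step introduces.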

Parallelization of Hysteresis
• Cannot simply divide the image into regions
• Duplications may reduce performance
• Conventionally, we use serial code
• Bad for many cores
Speedup $= \frac{t_s + t_p}{t_s + \max_{i=1}^{N} t_p[i] + t_O}$


Speedup After Careful Parallelization
[Figure: speed-up versus number of processors (axes 0 to 32) for Original, With Parallel Hysteresis, and Linear Scaling; annotations: 19.9, 4.2, ~5X]

Results before and after Parallelization
• Same results
  – Nice-to-have
  – Not always required
  – Must-have: semantic correctness
  – For example: random number generator
• Have more freedom when the parallel algorithm is allowed to produce a different result
  – Some algorithms have no single correct answer
Don't be afraid of changing results for better parallelism


Maximize Load Balance
Speedup $= \frac{t_s + t_p}{t_s + \max_{i=1}^{N} t_p[i] + t_O}$

Maximize Performance
• Minimize the slowest execution time
• Maximize load balance
  – Almost all processors have same amount of load
  – No processor is idle
[Figure: loads of 0.5 and 0.5 (slowest is 0.5) vs. loads of 0.6 and 0.4 (slowest is 0.6)]
$\min \{ \max_{i=1}^{N} t_p[i] \}$, where $t_p = \sum_{i=1}^{N} t_p[i]$


33
Load Balance Approaches

• Dynamic task assignment
  - 4% on Hyper-Threading for MPEG decoding
• Make tasks smaller
  - From 2.8x to 5.7x on 8 processors for graph mining
• Fuse multiple loops
  - From 2.9x to 5.5x on 8 processors for AdaBoost
• Use data-domain decomposition instead of functional-domain decomposition
• Find different algorithm
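As a small illustration of the "fuse multiple loops" item, the sketch below contrasts two back-to-back parallel loops with a single fused loop. It is a generic example rather than the AdaBoost code behind the quoted numbers; the function names and the scale/offset operation are assumptions chosen for brevity. Without OpenMP enabled, the pragmas are ignored and the code runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Unfused: two parallel loops, so two fork/join points and two passes
 * over the data. */
void scale_then_offset(float *x, size_t n, float scale, float offset)
{
    assert(x != NULL || n == 0);
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        x[i] *= scale;
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        x[i] += offset;
}

/* Fused: one parallel loop performs both steps per element, so the
 * work is divided among threads once. */
void scale_and_offset_fused(float *x, size_t n, float scale, float offset)
{
    assert(x != NULL || n == 0);
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        x[i] = x[i] * scale + offset;
}
```

Both versions produce the same values for inputs that are exactly representable; the fused one has a single synchronization point at the end of the loop.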

34
Parallelize MPEG Codec

[Figure: slices of a picture assigned to Thread 1 and Thread 2 under (a) static dispatching and (b) dynamic dispatching]
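The difference between the two dispatching styles can be estimated with a small model. The sketch below is an assumption-laden simplification: static dispatching gives each thread a contiguous block of slices, dynamic dispatching hands the next slice to whichever thread is free first, and dispatch cost and cache effects are ignored. Function names and the `MAX_THREADS` limit are hypothetical.

```c
#include <assert.h>

#define MAX_THREADS 64

/* Finishing time of the busiest thread. */
static double slowest(const double *busy, int n_threads)
{
    double worst = 0.0;
    for (int t = 0; t < n_threads; t++)
        if (busy[t] > worst)
            worst = busy[t];
    return worst;
}

/* Static dispatching: contiguous blocks of slices, decided up front. */
double static_makespan(const double *cost, int n_slices, int n_threads)
{
    assert(n_threads > 0 && n_threads <= MAX_THREADS);
    double busy[MAX_THREADS] = {0};
    int per_thread = (n_slices + n_threads - 1) / n_threads;
    for (int k = 0; k < n_slices; k++)
        busy[k / per_thread] += cost[k];
    return slowest(busy, n_threads);
}

/* Dynamic dispatching: the next slice goes to the thread that becomes
 * free first (lowest index wins a tie). */
double dynamic_makespan(const double *cost, int n_slices, int n_threads)
{
    assert(n_threads > 0 && n_threads <= MAX_THREADS);
    double busy[MAX_THREADS] = {0};
    for (int k = 0; k < n_slices; k++) {
        int next = 0;
        for (int t = 1; t < n_threads; t++)
            if (busy[t] < busy[next])
                next = t;
        busy[next] += cost[k];
    }
    return slowest(busy, n_threads);
}
```

For slice costs 4, 4, 1, 1, 1, 1 on two threads, the static split finishes at time 9 and the dynamic one at time 6.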


35
Coarse-Grained and Fine-Grained Parallelization of PrefixSpan

[Figure: PrefixSpan search tree rooted at (), with first-level nodes a, b, c, d, e; deeper nodes include aa, ab, (ab), aaa, aab, a(ab), cc, cd, (cd), (ce), ccc, ccd, c(cd); the coarse and fine task granularities are marked]

36
Time Variation via Coarse-Grained Parallelism

[Figure: Task Time Variation; vertical axis from 0 to 1.4]

Variation → Load imbalance


37
Speedup vs. Partitioning Levels

Smaller Variation → Better load balance

Levels of partitions   Speed-up on 8 processors
1                      2.7
2                      3.3
3                      4.6
4                      4.7
5                      5.1
6                      5.5
7                      5.8

38
Prefer Data-Domain over Functional-Domain Parallelization

• Data-domain: frames, slices, and macroblocks
• Functional-domain: ME/MC, DCT/IDCT, VLC/VLD

[Figure: data-domain: slices go to Thread 1 and Thread 2, each running VLD, IDCT, MC; functional-domain: the picture passes through VLD, IDCT, MC on Thread 1, Thread 2, Thread 3]


Reduce Overhead

$$t_P = t_s + \max_{1 \le i \le N_P}[t_i] + t_O$$

40
Reduce Overhead

• Reduce communication
  - Use more computation instead
• Reduce repeated computation
• Reduce locking overhead
  - Fine-grained lock instead of coarse-grained lock
  - Private buffer instead of shared memory unless necessary
• Reduce barriers or synchronization
  - Parallelize the outer loop


41
Overhead: Communication

• Gaston
  - More communication between tasks
  - Faster when single-threaded
• gSpan
  - Less communication between tasks
  - Slower when single-threaded

42
Performance vs. Scalability

[Figure: Graph Mining; relative throughput (0 to 5) vs. number of cores (1, 2, 4, 8, 16, 32) for the gSpan algorithm and the Gaston algorithm]


43
Overhead: Repeated Computation

[Figure: 64 partitions vs. 4 partitions; replicated data/work marked]

44
Fast Marching Method

[Figure: initial fluid-air interface; advancing the front]


45
Different Algorithm: Fast sweeping method

Sweep along the grid nodes and update the distance

46
Parallel Performance

[Figure: speed-up (0 to 64) vs. # of cores (0 to 64) for the fast marching method and the fast sweeping method; annotation: ~2X]


47
Overhead: Guaranteeing Correctness

• Two checks can be cashed at the same time
• Only one of the “transactions” should go through
• This often requires “locks” → Overhead

[Figure: $100, $100, $100; Bank 1, Bank 2]
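The bank analogy can be written as a short sketch: testing the balance and debiting it must happen as one indivisible step, otherwise two threads could both pass the test. The function name and the use of an OpenMP named critical section are assumptions for illustration; the slide does not prescribe a locking mechanism. Without OpenMP enabled, the pragma is ignored and the code runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Cash a check against a shared balance.
 * Returns 1 if the check was paid, 0 if it was refused. */
int cash_check(double *balance, double amount)
{
    assert(balance != NULL && amount >= 0.0);
    int paid = 0;
    /* The critical section is the "lock": only one thread at a time
     * may test and update the balance. */
    #pragma omp critical(account)
    {
        if (*balance >= amount) {
            *balance -= amount;
            paid = 1;
        }
    }
    return paid;
}
```

With a balance of $100 and two $100 checks, only the first call succeeds. The price is that every call now serializes on the lock, which is the overhead the slide refers to.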

48
Hough Transform

• Widely used in computer vision and digital image processing
• For example, line detection


49
Hough Transform Code

for( j = 0; j < height; j++ )
  for( i = 0; i < width; i++ )
  {
    if( BinImage[j][i] != 0 )
      for( theta = 0; theta < numangle; theta++ )
      {
        rho = i * Cos[theta] + j * Sin[theta];
        accum[theta][rho]++;
      }
  }

• To guarantee correctness
  - Use lock to protect the data
  - Private buffer
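The "private buffer" option can be sketched as follows: each thread votes into its own accumulator and merges it into the shared one once at the end, so the lock is taken once per thread rather than once per vote. This is a minimal sketch under stated assumptions, not the author's implementation: the image is a flat byte array, `cos_tab` and `sin_tab` stand in for the slide's `Cos[]` and `Sin[]`, and `rho_offset` and the rounding helper are hypothetical additions to keep the index in range.

```c
#include <assert.h>
#include <stdlib.h>

/* Rounds to the nearest int without needing the math library. */
static int round_to_int(double v)
{
    return v >= 0.0 ? (int)(v + 0.5) : -(int)(-v + 0.5);
}

/* Hough voting with one private accumulator per thread.
 * image:  height * width bytes, non-zero marks an edge pixel
 * accum:  numangle * numrho counters, zeroed by the caller */
void hough_private(const unsigned char *image, int width, int height,
                   const double *cos_tab, const double *sin_tab,
                   int numangle, int numrho, int rho_offset, int *accum)
{
    size_t cells = (size_t)numangle * (size_t)numrho;
    #pragma omp parallel
    {
        int *local = calloc(cells, sizeof *local);  /* private buffer */
        assert(local != NULL);
        #pragma omp for nowait
        for (int j = 0; j < height; j++) {
            for (int i = 0; i < width; i++) {
                if (image[j * width + i] == 0)
                    continue;
                for (int theta = 0; theta < numangle; theta++) {
                    double r = i * cos_tab[theta] + j * sin_tab[theta];
                    int rho = round_to_int(r) + rho_offset;
                    if (rho >= 0 && rho < numrho)
                        local[theta * numrho + rho]++;
                }
            }
        }
        /* One locked merge per thread instead of one lock per vote. */
        #pragma omp critical(hough_merge)
        {
            for (size_t c = 0; c < cells; c++)
                accum[c] += local[c];
        }
        free(local);
    }
}
```

The trade-off is memory: every thread holds a full copy of the accumulator while it runs.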

50
Re-ordering Loops in Hough Transform

int EdgePixelSize = 0;
for( j = 0; j < height; j++ )
  for( i = 0; i < width; i++ )
    if( BinImage[j][i] != 0 ) {
      EdgePixels[EdgePixelSize++] = j;
      EdgePixels[EdgePixelSize++] = i;
    }
for( theta = 0; theta < numangle; theta++ )
  for( m = 0; m < EdgePixelSize; m += 2 ) {
    j = EdgePixels[m];
    i = EdgePixels[m+1];
    rho = i*Cos[theta] + j*Sin[theta];
    accum[theta][rho]++;
  }
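One way to read the re-ordered loops is that, with theta outermost, each thread can own whole rows of the accumulator, so no two threads write the same counter and no lock is needed. The sketch below assumes that reading. The flat array layout, `rho_offset`, and the rounding helper are hypothetical additions, and the edge-pixel list is assumed to have been built already as on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds to the nearest int without needing the math library. */
static int nearest_int(double v)
{
    return v >= 0.0 ? (int)(v + 0.5) : -(int)(-v + 0.5);
}

/* Voting with theta as the parallel outer loop.
 * edge_pixels holds (j, i) pairs back to back, as on the slide.
 * accum: numangle * numrho counters, zeroed by the caller. */
void hough_gather(const int *edge_pixels, int edge_pixel_size,
                  const double *cos_tab, const double *sin_tab,
                  int numangle, int numrho, int rho_offset, int *accum)
{
    assert(edge_pixel_size % 2 == 0);
    #pragma omp parallel for
    for (int theta = 0; theta < numangle; theta++) {
        /* This row is touched only by the thread that owns theta. */
        int *row = accum + (size_t)theta * (size_t)numrho;
        for (int m = 0; m < edge_pixel_size; m += 2) {
            int j = edge_pixels[m];
            int i = edge_pixels[m + 1];
            double r = i * cos_tab[theta] + j * sin_tab[theta];
            int rho = nearest_int(r) + rho_offset;
            if (rho >= 0 && rho < numrho)
                row[rho]++;
        }
    }
}
```

Every thread reads the whole edge-pixel list, but the writes are disjoint, which is what removes the lock.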


51
Old vs. New Hough Transform

• Old (scatter)
• New (gather)

[Figure label: Lock]

52
Speedup After Careful Parallelization

[Figure: parallel speedup (0 to 64) vs. number of threads (0 to 64) for the old Hough Transform and the new Hough Transform; data labels 62 and 24; annotation: ~2.5X]


53
Overhead: Synchronization & Barriers

• Task variation can be averaged out in the long term
• Explicit synchronization will expose the variation

54
Parallelize the Outer Loop

[Figure: outer loop "For each frame" enclosing inner loops "For each pixel"]
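A minimal sketch of the frame/pixel picture, assuming a simple per-pixel operation (inversion) chosen only for illustration: the parallel directive sits on the outer frame loop, so threads are forked and joined once for the whole job instead of once per frame. The function name and the flat frame layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Inverts every pixel of every frame.
 * frames: n_frames * pixels_per_frame bytes, frame after frame. */
void invert_frames(unsigned char *frames, int n_frames,
                   size_t pixels_per_frame)
{
    assert(frames != NULL || n_frames == 0);
    #pragma omp parallel for
    for (int f = 0; f < n_frames; f++) {               /* outer loop */
        unsigned char *frame = frames + (size_t)f * pixels_per_frame;
        for (size_t p = 0; p < pixels_per_frame; p++)  /* inner loop */
            frame[p] = (unsigned char)(255 - frame[p]);
    }
}
```

Placing the directive on the inner loop instead would be correct too, but it would add a barrier at the end of every frame.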


55
GCUT Image Segmentation on SMP

[Figure: parallel speedup (0 to 16) vs. number of cores (0 to 16); series: Coarse-Grained, Fine-Grained]

56
Quick Summary of the Principles

• Avoid serial code
• Maximize load balance
• Reduce overhead

$$t_P = t_s + \max_{1 \le i \le N_P}[t_i] + t_O$$


57
Outline

• Motivation & current trends
• Theory and principles in parallelization
• Advanced optimization techniques in multi-core architecture
  - CPU and GPU have different treatments
  - Increase cache efficiency
  - Increase SIMD parallelism
  - Avoid synchronization
• A specific design example
• Summary

58
G80 Thread Computing Pipeline

Alternative operating mode specifically for computing

[Figure: block diagram with Host, Input Assembler, Thread Execution Manager, Texture units, Parallel Data Caches, Load/store units, Global Memory]

Source: UPENN's CIS 665 tutorials: http://www.seas.upenn.edu/~cis665/


59
High-Level View

• Threads in a “thread block” executed in lock-step
  - “Streaming multi-processor” → “processor”
  - “Streaming processor” → “SIMD” lane
• Threads in a “thread block” can communicate through shared memory
  - “Shared memory” → “cache”
• Threads from two different blocks hard to cooperate
  - Limited hardware cache coherence between caches

[Figure: Streaming Multiprocessors 1, 2, ..., 16; each with an Instruction Unit, Streaming Processors 1 to 8 with Registers, Shared Memory (16KB), Constant Cache, Texture Cache; Device memory holds global, constant, texture memories]

Source: UPENN's CIS 665 tutorials: http://www.seas.upenn.edu/~cis665/

60
Another View of GPU Architecture

• 8800 GTX: 128 physical streaming processors
  - Equal to 512 logical streaming processors
  - 32 threads in a warp must be executed in lock-step
  - Equal to 16 streaming multiprocessors
• 16 processors
  - Each processor has 32-wide SIMD
    • Flexible: each SIMD lane can process its own data
    • Best performance when
      - Whole SIMD warp works together (no branch divergence)
      - Contiguous piece of data (coalescing memory accesses)
  - Within each processor, data in cache can be shared
  - Limited hardware cache coherence between caches


61
Difference in Programming GPU

• Memory hierarchy optimized for throughput, not for latency
  - Smaller cache and higher memory bandwidth
  - Needs 1000s of threads for full efficiency
    • 192 threads to hide read-after-write register latency*
• Hard to synchronize among all threads
  - Threads in a “thread block” can cooperate with each other via (1) the lock-step execution and (2) communication through shared memory
  - Threads from different blocks hard to cooperate

GPU has small, local, shared storage. It is hard to synchronize among all threads.

* Source: NVIDIA CUDA Compute Device Architecture Programming Guide version 1.1, 11/29/2007.

62
Optimization Principles

• General principles still applicable (16 processors)
  - Avoid sequential dependencies
  - Maximize load balance
  - Reduce overhead
• Increase cache efficiency
  - 16KB shared memory in GPU
• Increase SIMD parallelism
  - Reduce branch divergence (32-wide SIMD in a warp)
• Avoid synchronization
  - Avoid communication with other threads
  - Operations should be independent
• Reduce global memory access (400-600 cycles)


Increase Cache Efficiency

Take advantage of shared cache
Choose the “right” loop for parallelization
Synchronization overhead vs. memory pressure

64
Memory Bandwidth is Critical Resource

[Figure: relative performance since 1980 (log scale, 1 to 1000) vs. year (1980 to 2000)]

Microprocessor MIPS    ~50% annually
DRAM Bandwidth         ~27% annually
DRAM Latency           ~7% annually

Source: D. A. Patterson, "Latency Lags Bandwidth," Communications of the ACM, Vol. 47, no. 10, pp. 71-75, Oct. 2004.

Taking advantage of cache is important
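To get a feel for how quickly these rates diverge, the annual figures can be compounded. The following back-of-the-envelope calculation assumes each rate holds constant over the 20 years from 1980 to 2000; that constant-rate assumption is made here for illustration and is not stated on the slide.

```latex
\begin{align*}
\text{Microprocessor MIPS:} \quad & 1.50^{20} \approx 3.3 \times 10^{3} \\
\text{DRAM bandwidth:} \quad & 1.27^{20} \approx 1.2 \times 10^{2} \\
\text{DRAM latency:} \quad & 1.07^{20} \approx 3.9 \\
\text{Processor vs. bandwidth gap:} \quad & 1.50^{20} / 1.27^{20} \approx 28 \\
\text{Processor vs. latency gap:} \quad & 1.50^{20} / 1.07^{20} \approx 8.6 \times 10^{2}
\end{align*}
```

Under these assumptions the processor outgrows memory bandwidth by more than an order of magnitude and memory latency by nearly three, which is why data kept in cache matters so much.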


65
[Figure: two system organizations.
Symmetric Multiprocessor Systems (SMP): Xeon E7340 packages built from core / L2 cache / core groups, connected across the chip boundary by an off-chip bus interconnect to a bridge (with 64MB snoop filter cache).
Chip Multiprocessor (CMP): many cores with L2 cache on a bi-directional ring interconnect inside one chip boundary.]

66
Optimization for Chip Multiprocessor

• Chip multiprocessor
  - Smaller last-level on-die cache
  - Faster inter-core on-die communication
• Utilize cache & inter-core communication efficiently
  - Take advantage of shared cache
  - Choose the “right” loop for parallelization
  - Balance between the synchronization overhead and memory pressure


67
Cache Sharing

• Either disruptive or constructive
• Disruptive:
  - Multiple threads have their own distinct working sets
  - They compete for the limited cache resource
• Constructive:
  - Multiple threads share the data brought in by one or another

68
Parallelizing MPEG Decoder

[Figure: slices of a picture assigned to Thread 1 and Thread 2 under (a) static dispatching and (b) dynamic dispatching]


69
Motion Compensation in MPEG

[Figure: (a) Static dispatching: Frame t and Frame t+1, all local cache hits. (b) Dynamic dispatching: Frame t and Frame t+1, some local cache misses.]

70
Speedups vs Cache Localities

Speedup          Dual-processor   Hyper-threading
Single-thread    1                1
Static           1.48             1.04
Dynamic          1.39             1.08

• Dynamic schedule incurs more cache misses on dual-processor
  - More bus traffic
  - Speedup is limited
• Sharing cache on Hyper-Threading enforces the cache localities
  - Better speedup because of better load balance


71
Support Vector Machines (SVM)

• A popular machine learning technique
• Evaluation of trained SVMs is very structured
  - Can be multithreaded at multiple levels
    • The dimensionality K of the input data can be very large
    • The evaluation of each expression in the sum is independent of each other
    • Several samples are tested and each evaluation can be done in parallel.

$$F(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i=1}^{n} \alpha_i y_i \Phi(\mathbf{x}_i, \mathbf{x}) + b\Big)$$

72
Multi-Threading SVM

// LINEAR KERNEL
float linear_kernel(const Ipp32f* pSrc1, int len, int index)
{
  Ipp32f result;
  ippsDotProd_32f(pSrc1, supportVector[index], len, &result);
  return result * coeffs[index];
}

int main() {
#pragma omp parallel for
  for (int j=0; j<NUM_SAMPLES; j++) {
    float sum=0;
#pragma omp parallel for reduction(+:sum)
    for (int i=0; i<NUM_SUPP_VEC; i++) {   // 1000
      float tmp = linear_kernel(&samples[j], NUM_VEC_DIM, i);   // 24*24
      sum += tmp;
    }
    result[j] = sum;
  }
}
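For readers without the IPP library, the same evaluation can be sketched in plain C. This is a simplified stand-in, not the slide's code: the `dot` helper replaces `ippsDotProd_32f`, the arrays are passed as parameters instead of globals, a bias term is added, and only the outer loop over samples is parallelized. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C stand-in for the dot product used on the slide. */
static float dot(const float *a, const float *b, int len)
{
    float acc = 0.0f;
    for (int k = 0; k < len; k++)
        acc += a[k] * b[k];
    return acc;
}

/* Linear-kernel decision values for n_samples inputs.
 * samples: n_samples x dim, support: n_supp x dim (row-major)
 * coeffs:  one weight per support vector, bias: the constant term */
void svm_linear_scores(const float *samples, int n_samples,
                       const float *support, const float *coeffs,
                       int n_supp, int dim, float bias, float *score)
{
    assert(dim > 0);
    #pragma omp parallel for
    for (int j = 0; j < n_samples; j++) {
        const float *x = samples + (size_t)j * (size_t)dim;
        float sum = 0.0f;
        for (int i = 0; i < n_supp; i++)
            sum += coeffs[i] * dot(x, support + (size_t)i * (size_t)dim, dim);
        score[j] = sum + bias;
    }
}
```

Each sample's score is written by exactly one thread, so no reduction or lock is needed at this level; the sign of each score gives the predicted class.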


73
Mutual Prefetching

• Cache locality is key
• A thread scheduling to enforce localities
• Two threads
  - Require the same data sets
  - Prefetch data for each other
  - Mutual prefetching
• Excellent speedups on Hyper-Threading

[Figure: first outer-loop parallelism vs. second-loop parallelism over the products X1*V1, X1*V2, X1*V3, X2*V1, X2*V2, X2*V3]

74

SVM on Hyper-Threading Technology

Performance on different architecture

Speed-up   SVM-l-n   SVM-r-n
SP         1         1
SP + HT    1.6       1.58
DP         1.64      1.73
DP + HT    3.02      2.66

Better cache utilization can improve performance significantly


75
Earlier Parallelization Scheme

• Cache-unaware
• Cache-aware

[Figure: schedules of Part 1 (1) to Part 1 (4) and Part 2 (1) to Part 2 (4) on Processor 1 and Processor 2 for the two schemes]

76
Coarse-Grained and Fine-Grained Parallelized GCUT on SMP & CMP

Speedup on SMP (p. 55)
[Figure: parallel speedup (0 to 16) vs. number of cores (0 to 16); series: Coarse-Grained, Fine-Grained]

Speedup on CMP
[Figure: parallel speedup (0 to 64) vs. number of cores (0 to 64); series: Coarse-Grained, Fine-Grained]

Sharing cache improves performance
• 23% accesses touch shared data in fine-grained scheme


Choose the “Right” Loop for Parallelization

78
Coarse-Grained and Fine-Grained Parallelized Hough Transform on CMP

Speedup
[Figure: parallel speedup (0 to 64) vs. number of cores (0 to 64); series: Coarse-Grained, Fine-Grained]

Coarse-grained speed-up vs. size of on-die cache
[Figure: parallel speedup (0 to 64) vs. number of cores (0 to 64); series: 16MB, 64MB, 256MB]

Outer-loop parallelization reduces synchronization overhead, but increases memory demand


79
Coarse-Grained and Fine-Grained Parallelized AdaBoost on CMP

Speedup
[Figure: parallel speedup (0 to 64) vs. number of cores (0 to 64); series: Video Frames, Frame Partition]

Cache miss rates vs. size of on-die cache
[Figure: misses per kilo instructions (MPKI, 0 to 12) vs. cache size (512K, 1M, 2M, 4M, 8M, 16M, 32M, 64M, 128M); series: Video Frames, Frame Partition]

Higher memory demand → Lower performance

Balance Between Synchronization Overhead and Memory Pressure


81

Hough Transform Parallelized via Lock vs. Thread Privatization

Thread privatization increases memory pressure

[Figures: Speedup on SMP; Speedup on CMP. Each plots parallel speedup vs. number of cores (one panel 0 to 64, the other 0 to 16); series: Fine-Grained Lock, Thread Privatization]
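The trade-off between a lock and thread privatization can be sketched on a toy Hough accumulator. This is only an illustration under stated assumptions, not the implementation measured on the slide: the 15-degree angle grid, the helper names (`votes`, `run_threads`, `hough_locked`, `hough_privatized`) and the sample points are all made up, and a single lock stands in for the finer-grained locks named in the figure legend.

```python
import math
import threading
from collections import Counter

# Hypothetical coarse angle grid: 12 angles, 15 degrees apart.
THETAS = [math.radians(d) for d in range(0, 180, 15)]

def votes(point):
    """Yield the (theta index, rho) accumulator cells that one edge point votes for."""
    x, y = point
    for t, theta in enumerate(THETAS):
        yield t, round(x * math.cos(theta) + y * math.sin(theta))

def run_threads(worker, points, n_threads):
    """Give thread `tid` every n_threads-th point and wait for all threads."""
    threads = [threading.Thread(target=worker, args=(tid, points[tid::n_threads]))
               for tid in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def hough_locked(points, n_threads=4):
    """One shared accumulator; every vote is a lock-protected update."""
    acc = Counter()
    lock = threading.Lock()

    def worker(tid, chunk):
        for p in chunk:
            for cell in votes(p):
                with lock:              # synchronization cost paid on every vote
                    acc[cell] += 1

    run_threads(worker, points, n_threads)
    return acc

def hough_privatized(points, n_threads=4):
    """One private accumulator per thread, merged after the threads finish."""
    private = [Counter() for _ in range(n_threads)]

    def worker(tid, chunk):
        for p in chunk:
            private[tid].update(votes(p))   # no lock: only this thread writes here

    run_threads(worker, points, n_threads)
    merged = Counter()
    for acc in private:                     # reduction over n_threads accumulators
        merged.update(acc)
    return merged

points = [(i, 2 * i + 1) for i in range(50)]   # made-up edge points on one line
assert hough_locked(points) == hough_privatized(points)
```

Both variants produce the same accumulator. The privatized one avoids the lock on every vote but keeps one accumulator per thread in memory, which is the extra memory pressure the slide refers to.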

82

Outline

• Motivation & current trends
• Theory and principles in parallelization
• Advanced optimization techniques in multi-core architecture (CPU and GPU have different treatments)
• A specific design example
  – Increase SIMD parallelism
  – Avoid synchronization
• Summary


83

Continuous Speech Recognition

[Figure: WFST Recognition Network. Diagram labels:
  – Bigram Language Model (words HOP, ON, POP, CAT, HAT, IN, THE, ...)
  – Pronunciation Model (HOP: hh aa p; ON: aa n; POP: p aa p; ...)
  – HMM Acoustic Phone Model (phones aa, hh, n, ...)
  – Gaussian Mixture Model for One Phone State (Mixture Components)
  – Features from one frame
  – Computing distance to each mixture component
  – Computing weighted sum of all components]

84

Continuous Speech Recognition


85

Inference Engine

• Hierarchical structure
  – Iterative outer loop over time steps
  – Pipeline of operations in each time step
  – Set of alternative hypotheses to advance

[Figure: One iteration per time step (~60M instructions), a sequential operation with iteration-to-iteration dependencies. Each iteration runs Phase 1 (Obs prob compute), Phase 2 (Non-eps traversal), and Phase 3 (Epsilon traversal); labels: Compute Intensive, Communication Intensive. Multiple steps in a phase, each has 1000s to 10,000s of concurrent tasks (10 to 500 instructions): extensive fine-grained parallelism at the innermost level]

86

Recognition Process

• Phase 1:
  – Observation probability computation only required for out-going arcs of active states
  – Highly compute intensive step
• Phase 2:
  – Traverse out-going non-epsilon arcs from active states
  – Write contention must be resolved at the destination states
  – Destination state is updated with the most-likely in-coming arc
• Phase 3:
  – Traverse out-going epsilon arcs to complete the iteration

Recognition is a process of graph traversal

[Figure: Phase 1 (Obs prob compute), Phase 2 (Non-eps traversal), Phase 3 (Epsilon traversal), each shown on the WFST Recognition Network]


87

Graph Traversal

Arc evaluation: Source state cost + Observation Prob. + Arc weight

[Figure: example graph with states numbered 1 to 12, shown twice]
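The three phases and the arc-evaluation rule can be put together in a small sequential sketch of one time step. It rests on several assumptions that the slides do not state: costs are additive and lower is better (for example negative log probabilities), epsilon arcs are followed from the states reached in the same time step, arc weights are non-negative, and the data layout and names (`time_step`, `obs_cost`, the arc tuples) are hypothetical.

```python
from collections import defaultdict

def time_step(active, arcs, obs_cost):
    """Advance the active-state set by one frame.

    active   : dict state -> best cost so far (lower is better)
    arcs     : list of (src, dst, label, weight); label None marks an epsilon arc
    obs_cost : function label -> observation cost for the current frame
    """
    out = defaultdict(list)
    for arc in arcs:
        out[arc[0]].append(arc)

    # Phase 1: observation costs, only for labels on out-going arcs of active states.
    needed = {label for s in active for (_, _, label, _) in out[s] if label is not None}
    obs = {label: obs_cost(label) for label in needed}

    # Phase 2: non-epsilon traversal; each destination keeps its cheapest in-coming arc.
    nxt = {}
    for s, cost in active.items():
        for (_, dst, label, weight) in out[s]:
            if label is not None:
                cand = cost + obs[label] + weight      # arc evaluation
                if cand < nxt.get(dst, float("inf")):
                    nxt[dst] = cand

    # Phase 3: epsilon traversal, repeated until no destination improves.
    work = list(nxt)
    while work:
        s = work.pop()
        for (_, dst, label, weight) in out[s]:
            if label is None:
                cand = nxt[s] + weight
                if cand < nxt.get(dst, float("inf")):
                    nxt[dst] = cand
                    work.append(dst)
    return nxt

# Made-up four-state network and observation costs.
arcs = [(0, 1, "a", 0.5), (0, 2, "b", 0.25), (1, 2, "a", 0.0), (2, 3, None, 0.125)]
result = time_step({0: 0.0, 1: 1.0}, arcs, {"a": 1.0, "b": 2.0}.get)
```

In this example state 2 is reached over two arcs (costs 2.25 and 2.0) and keeps the cheaper one, which is the write conflict that a parallel version has to resolve.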

88

Inference Engine Challenges

• Application:
  – Parallel graph traversal through irregular network
  – Continuously changing working set at runtime
• Challenges:
  – Efficiently synchronize between concurrent tasks
  – Effectively utilize all levels of parallel resources, including SIMD

[Figure: cores with caches ($), labeled Synchronization and SIMD Efficiency, next to the WFST Recognition Network]


89

Write Conflict in Arc Traversal

• Update of destination state
  – Minimum of incoming cost should be adopted
• Possible approaches
  – Make update atomic
  – Privatization
  – Lock based implementation

[Figure: graph with states 1 to 6, traversed by Thread 0 and Thread 1]
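A lock-based resolution of this write conflict might look like the sketch below. It is an assumption-laden illustration, not code from the talk: the class `BestCost`, the function `traverse`, the one-lock-per-state layout and the sample arc costs are all hypothetical.

```python
import threading

class BestCost:
    """Destination-state costs with a lock-protected 'keep the minimum' update."""

    def __init__(self, n_states):
        self.cost = [float("inf")] * n_states
        self.locks = [threading.Lock() for _ in range(n_states)]  # one lock per state

    def update_min(self, state, cand):
        with self.locks[state]:          # read-compare-write must not interleave
            if cand < self.cost[state]:
                self.cost[state] = cand

def traverse(arc_costs, n_states, n_threads=2):
    """arc_costs: list of (destination state, candidate cost), split across threads."""
    best = BestCost(n_states)

    def worker(chunk):
        for dst, cand in chunk:
            best.update_min(dst, cand)

    threads = [threading.Thread(target=worker, args=(arc_costs[t::n_threads],))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return best.cost

# Two threads both write to states 4 and 5; the minimum must survive.
costs = traverse([(4, 3.0), (4, 1.5), (5, 2.0), (4, 2.5), (5, 0.5)], n_states=6)
```

An atomic update would replace the lock with a hardware min operation on the destination cost where the platform offers one, and privatization would give each thread its own cost array to be merged afterwards.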

90

Design Trade-offs for Synchronization

• Challenge:
  – The cost for write-conflict resolution
• Experiment:
  – Allow traversal to either propagate from source or aggregate at destination for write conflict resolution

Traversal by Propagation
  Advantages: Easy to program, HW handles write conflicts transparently
  Disadvantages: One atomic operation for every arc, large number of atomic operations makes it sensitive to atomic operation latency

Traversal by Aggregation
  Advantages: Explicit resolution of write conflicts, not sensitive to HW efficiency of atomic operations
  Disadvantages: Involves significant overhead in building lists of to-be-updated destination states

[Figure: Current States to Next States, one diagram per traversal style]
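The two traversal styles can be contrasted in a short sequential sketch. The function names and the sample data are hypothetical, and the comments only mark where a parallel version would pay the cost named in the comparison.

```python
from collections import defaultdict

def by_propagation(arc_costs):
    """Walk the arcs; each arc pushes its cost to its destination (a min-update)."""
    best = {}
    for dst, cand in arc_costs:             # in parallel, this update must be atomic
        if cand < best.get(dst, float("inf")):
            best[dst] = cand
    return best

def by_aggregation(arc_costs):
    """First list the in-coming costs per destination, then reduce each list."""
    incoming = defaultdict(list)
    for dst, cand in arc_costs:             # overhead: building the to-be-updated lists
        incoming[dst].append(cand)
    return {dst: min(cands) for dst, cands in incoming.items()}  # one writer per state

# Made-up (destination state, candidate cost) pairs for one traversal step.
arc_costs = [(4, 3.0), (4, 1.5), (5, 2.0), (4, 2.5), (5, 0.5)]
assert by_propagation(arc_costs) == by_aggregation(arc_costs)
```

Both return the same best costs. The difference is where the work goes: propagation needs one conflict-safe update per arc, while aggregation spends time grouping arcs so that each destination is written once.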


91

Synchronization Cost

• The fixed cost (overhead) of the aggregation technique is significant
• The relative gradient of the propagation and aggregation techniques depends on the efficiency of the platform in resolving write conflicts

[Figure: Measured synchronization cost in GTX280. Total synchronization cost [sec] (0 to 3.5) vs. number of arcs synchronized [millions of arcs] (0 to 100)]

92

SIMD Utilization Efficiency

• Challenge:
  – Vector unit efficiency can quickly drop off with increased vector width
• Experiment:
  – Traverse the recognition network based on active states or active arcs

Active States
  Advantages: Easy to program, all active arcs emit from active states
  Disadvantages: Out-degree of active states varies widely, difficult to fully utilize vector units

Active Arcs
  Advantages: Parallelization at finer granularity, can achieve better load balancing
  Disadvantages: More information to maintain from iteration to iteration, there are always more active arcs than active states

[Figure: Current States to Next States, one diagram per traversal style]


93

SIMD Utilization Efficiency

[Figure: Active States Mapped onto SIMD. SIMD lanes over Time, divided into SIMD Utilization and Extra work]

[Figure: Speedup and SIMD Efficiency in State Based Traversal. Speedup over sequential case (0 to 10) and SIMD utilization (0% to 100%) vs. SIMD width (1, 2, 4, 8, 16, 32); traversal by states]
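Why utilization falls when active states are mapped onto SIMD lanes can be seen in a simple counting model. The model is an assumption made for illustration, not the measurement behind the plot: each lane processes one arc per step, state-based traversal assigns one state per lane and a vector stays busy until its highest-out-degree state is finished, and arc-based traversal packs arcs densely. The function names and the out-degrees are hypothetical, and at least one arc is assumed.

```python
import math

def state_based_utilization(out_degrees, width):
    """One active state per SIMD lane; a group of `width` states runs until its
    highest-out-degree state is done, so the other lanes sit idle meanwhile."""
    useful = sum(out_degrees)
    issued = 0
    for i in range(0, len(out_degrees), width):
        issued += width * max(out_degrees[i:i + width])
    return useful / issued

def arc_based_utilization(out_degrees, width):
    """One active arc per SIMD lane; only the last vector can be partly empty."""
    arcs = sum(out_degrees)
    return arcs / (width * math.ceil(arcs / width))

degrees = [1, 4, 2, 1]                      # made-up out-degrees of four active states
print(state_based_utilization(degrees, 4))  # 8 useful lane-steps out of 16 issued
print(arc_based_utilization(degrees, 4))    # 8 arcs fill two full vectors
```

Under this model the widely varying out-degrees leave half of the lane-steps idle in the state-based mapping, while the arc-based mapping keeps the lanes full at the price of tracking individual arcs.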

94

Design Space

Active States, Traversal by Propagation: Maintain active source states, propagate out-arc computation results to destination state

Active States, Traversal by Aggregation: Maintain active destination states, determine all potential destination states and aggregate incoming arcs

Active Arcs, Traversal by Propagation: Maintain active arcs, propagate active arc computation results to destination state

Active Arcs, Traversal by Aggregation: Maintain active arcs, group arcs with same destination states and aggregate active arcs locally to resolve write conflicts

[Figure: Current States to Next States, one diagram per combination]


95

Performance on CPU

Configuration             RTF     Speedup   Obs. Prob. Comp.   Non-eps Traversal   Eps Traversal   Seq. Overhead
Sequential                3.17    1x        2.623              0.474               0.073           -
State-based Aggregation   2.593   1.2x      0.754              1.356               0.482           0.001
Arc-based Propagation     1.006   3.2x      0.737              0.242               0.026           0.001
State-based Propagation   0.925   3.4x      0.732              0.157               0.035           0.001

RTF: Real Time Factor
3.4x: Speedup vs Seq

96

Performance on GPU

Configuration             RTF     Speedup   Obs. Prob. Comp.   Non-eps Traversal   Eps Traversal   Seq. Overhead
Sequential                3.17    1x        2.623              0.474               0.073           -
State-based Aggregation   1.203   2.6x      0.147              0.77                0.272           0.014
Arc-based Aggregation     0.912   3.5x      0.148              0.469               0.281           0.014
State-based Propagation   0.776   4.1x      0.148              0.512               0.108           0.008
Arc-based Propagation     0.302   10.5x     0.148              0.103               0.043           0.008

RTF: Real Time Factor
10.5x: Speedup vs Seq

Fastest algorithm style differs between platforms
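The speedup figures quoted with each RTF follow from dividing the sequential RTF by the RTF of the parallel configuration. The snippet below redoes that arithmetic with the values reported on the CPU and GPU performance slides; the dictionary layout and variable names are only for illustration.

```python
SEQ_RTF = 3.17   # sequential real time factor reported on both slides

# RTF values as reported on the CPU and GPU performance slides.
rtf = {
    ("CPU", "State-based Aggregation"): 2.593,
    ("CPU", "Arc-based Propagation"): 1.006,
    ("CPU", "State-based Propagation"): 0.925,
    ("GPU", "State-based Aggregation"): 1.203,
    ("GPU", "Arc-based Aggregation"): 0.912,
    ("GPU", "State-based Propagation"): 0.776,
    ("GPU", "Arc-based Propagation"): 0.302,
}

# Speedup over the sequential case, rounded to one decimal as on the slides.
speedup = {key: round(SEQ_RTF / value, 1) for key, value in rtf.items()}

# Lowest RTF per platform.
fastest = {
    platform: min((value, style) for (p, style), value in rtf.items() if p == platform)[1]
    for platform in ("CPU", "GPU")
}
```

The lowest RTF on the CPU belongs to state-based propagation and on the GPU to arc-based propagation, which is the point of the closing remark that the fastest algorithm style differs between platforms.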


97

Outline

• Motivation & current trends
• Theory and principles in parallelization
• Advanced optimization techniques in multi-core architecture (CPU and GPU have different treatments)
• A specific design example
• Summary

98

Key Learnings

• Avoid serial dependencies
  – Parallelize as much as possible
  – Re-order constraints
• Increase load balance
  – Dynamic task assignment
  – Make tasks smaller
  – Fuse multiple loops
  – Data-domain decomposition instead of functional-domain decomposition
• Reduce overhead
  – Parallelize the outer loop
  – Reduce communication
  – Reduce repeated computation
  – Reduce locking overhead
• Utilize cache efficiently
  – Take advantage of sharing cache
  – Choose the “right” loop for parallelization
  – Balance between synchronization overhead & memory pressure

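[Equation: execution-time expression containing a max[...] term and subscripts i, N, P, s, and O; the exact form is not recoverable from the extraction]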
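The load-balance advice (dynamic task assignment, smaller tasks) can be illustrated with a small makespan comparison. This is a toy model under stated assumptions: task durations are known numbers, scheduling itself costs nothing, and the function names and the task list are hypothetical.

```python
import heapq

def static_makespan(tasks, n_workers):
    """Split the task list into contiguous blocks, one block per worker."""
    block = -(-len(tasks) // n_workers)            # ceiling division
    return max(sum(tasks[i:i + block]) for i in range(0, len(tasks), block))

def dynamic_makespan(tasks, n_workers):
    """Each task goes to whichever worker becomes free first (shared work queue)."""
    free_at = [0] * n_workers                      # min-heap of worker finish times
    for t in tasks:
        heapq.heappush(free_at, heapq.heappop(free_at) + t)
    return max(free_at)

tasks = [5, 5, 1, 1, 1, 1]                         # made-up task durations
print(static_makespan(tasks, 2))                   # one worker gets 5 + 5 + 1
print(dynamic_makespan(tasks, 2))                  # work is spread as workers free up
```

With two workers the static split finishes when the slower worker does, at 11 time units, while dynamic assignment finishes at 7, which is the ideal for 14 units of total work.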


99

Summary

Multi-Core is Going Mainstream

• Architecture: Number of cores per chip will grow quickly
  – Power consumption is a key driver
• Algorithm: Future processors demand special designs
  – Must consider parallelism
    • Algorithms often need changes
• Algorithm-Architecture Co-Exploration
  – Architectures with different characteristics need different implementation and optimization strategies

100

New Paradigm

• Challenge traditional ways of doing/thinking
  – Numerous emerging applications are enabled by multi-core computing
  – Develop parallel software and algorithms for future systems with many cores
    • Learn parallel programming, if you haven’t
    • “The most important new things we must teach students in computing is how to think in parallel.” --- John Owens, UC Davis


101

Footnote

• Many ways to write multi-threaded codes, e.g.,
  – CUDA (cf., page 111)
  – Win32 threads (cf., page 19)
  – POSIX Threads, a.k.a., pthread
  – OpenMP (cf., page 72)
• Intel also provides tools to help develop threaded software, e.g., (http://software.intel.com/en-us/intel-parallel-studio-home/)
  – Intel® Parallel Composer
  – Intel® Parallel Inspector
  – Intel® Parallel Amplifier

Re

fere

nc

es

Re

fere

nc

es

Page 52: Multimedia Signal Processing on CPUs CPUs and and GPUs GPUs …speed.cis.nctu.edu.tw/~ydlin/course/cn/seminar/cpu.pdf · 2018-06-07 · Multimedia Signal Processing on CPUs CPUs and

103

Jo

urn

al P

ap

ers

Jo

urn

al P

ap

ers

��"I

mag

e P

rocessin

g o

n M

ult

i"I

mag

e P

rocessin

g o

n M

ult

i--C

ore

x86 A

rch

itectu

res

Co

re x

86 A

rch

itectu

res --

Op

tim

izati

on

Tech

niq

ues a

nd

O

pti

miz

ati

on

Tech

niq

ues a

nd

E

xam

ple

s,"

D. K

im, V

. L

ee, an

d Y

.E

xam

ple

s,"

D. K

im, V

. L

ee, an

d Y

.--K

. C

hen

, to

ap

pear

in IE

EE

Sig

nal P

rocessin

g

K. C

hen

, to

ap

pear

in IE

EE

Sig

nal P

rocessin

g

Mag

azin

e, M

arc

h 2

010.

Mag

azin

e, M

arc

h 2

010.

��"A

lgo

rith

m/A

rch

itectu

re C

o"A

lgo

rith

m/A

rch

itectu

re C

o--E

xp

lora

tio

n o

f V

isu

al C

om

pu

tin

g o

n E

merg

ing

E

xp

lora

tio

n o

f V

isu

al C

om

pu

tin

g o

n E

merg

ing

P

latf

orm

s,"

G. G

. C

. L

ee, Y

.P

latf

orm

s,"

G. G

. C

. L

ee, Y

.--K

. C

hen

, M

. M

att

avelli, a

nd

E. S

. Jan

g, IE

EE

K

. C

hen

, M

. M

att

avelli, a

nd

E. S

. Jan

g, IE

EE

T

ran

sacti

on

s o

n C

ircu

its a

nd

Syste

ms f

or

Vid

eo

Tech

no

log

y, vo

l. 1

9, n

o. 11, p

p.

Tra

nsacti

on

s o

n C

ircu

its a

nd

Syste

ms f

or

Vid

eo

Tech

no

log

y, vo

l. 1

9, n

o. 11, p

p.

1576

1576--1

587,

No

v 2

009.

1587,

No

v 2

009.

��"P

ara

llel S

cala

bilit

y i

n S

peech

Reco

gn

itio

n

"Para

llel S

cala

bilit

y i

n S

peech

Reco

gn

itio

n --

Infe

ren

ce E

ng

ines in

Larg

e V

ocab

ula

ry

Infe

ren

ce E

ng

ines in

Larg

e V

ocab

ula

ry

Co

nti

nu

ou

s S

peech

Reco

gn

itio

n,"

K. Y

ou

, J.

Ch

on

g, Y

. Y

i, E

. G

on

ina, C

. H

ug

hes,

Co

nti

nu

ou

s S

peech

Reco

gn

itio

n,"

K. Y

ou

, J.

Ch

on

g, Y

. Y

i, E

. G

on

ina, C

. H

ug

hes,

Y.

Y.--

K. C

hen

, W

. S

un

g a

nd

K.

K. C

hen

, W

. S

un

g a

nd

K. K

eu

tzer

Keu

tzer,

IE

EE

Sig

nal P

rocessin

g M

ag

azin

e, vo

l. 2

6,

no

. , IE

EE

Sig

nal P

rocessin

g M

ag

azin

e, vo

l. 2

6,

no

. 6, p

p. 124

6, p

p. 124--1

35,

No

v 2

009.

135,

No

v 2

009.

��"P

ara

llelizati

on

Str

ate

gie

s a

nd

Perf

orm

an

ce A

naly

sis

of

Med

ia M

inin

g A

pp

licati

on

s

"Para

llelizati

on

Str

ate

gie

s a

nd

Perf

orm

an

ce A

naly

sis

of

Med

ia M

inin

g A

pp

licati

on

s

on

Mu

lti

on

Mu

lti--

Co

re P

rocesso

rs,"

W. L

i, X

. T

on

g, T

. W

an

g, Y

. Z

han

g, an

d Y

.C

ore

Pro

cesso

rs,"

W. L

i, X

. T

on

g, T

. W

an

g, Y

. Z

han

g, an

d Y

.--K

. C

hen

, K

. C

hen

, Jo

urn

al o

f S

ign

al P

rocessin

g S

yste

ms,

vo

l. 5

7,

no

. 2, p

p. 213

Jo

urn

al o

f S

ign

al P

rocessin

g S

yste

ms,

vo

l. 5

7,

no

. 2, p

p. 213--2

28,

No

v 2

009.

228,

No

v 2

009.

��"A

ccele

rati

ng

Vid

eo

"Accele

rati

ng

Vid

eo

--Min

ing

Ap

plicati

on

s U

sin

g M

an

y S

mall, G

en

era

lM

inin

g A

pp

licati

on

s U

sin

g M

an

y S

mall, G

en

era

l--P

urp

ose

Pu

rpo

se

Co

res,"

E. L

i, W

. L

i, X

. T

on

g, J. L

i, Y

. C

hen

, T

. W

an

g, P

. W

an

g, W

. H

u, Y

. D

u, Y

. C

ore

s,"

E. L

i, W

. L

i, X

. T

on

g, J. L

i, Y

. C

hen

, T

. W

an

g, P

. W

an

g, W

. H

u, Y

. D

u, Y

. Z

han

g, an

d Y

.Z

han

g, an

d Y

.--K

. C

hen

, IE

EE

Mic

ro, vo

l. 2

8, n

o. 5, p

p. 8

K. C

hen

, IE

EE

Mic

ro, vo

l. 2

8, n

o. 5, p

p. 8--2

1, S

ep

t 2008.

21, S

ep

t 2008.

��"H

igh

"Hig

h--P

erf

orm

an

ce P

hysic

al

Sim

ula

tio

ns o

n N

ext

Perf

orm

an

ce P

hysic

al

Sim

ula

tio

ns o

n N

ext--

Gen

era

tio

n A

rch

itectu

re w

ith

G

en

era

tio

n A

rch

itectu

re w

ith

M

an

y C

ore

s,"

Y.

Man

y C

ore

s,"

Y.--

K. C

hen

, J. C

hh

ug

an

i, C

. H

ug

hes, D

. K

im, S

. K

um

ar,

V. L

ee,

A. L

in,

K. C

hen

, J. C

hh

ug

an

i, C

. H

ug

hes, D

. K

im, S

. K

um

ar,

V. L

ee,

A. L

in,

A. N

gu

yen

, E

. A

. N

gu

yen

, E

. S

ifakis

Sif

akis

, an

d M

. S

mely

an

skiy

, In

tel T

ech

no

log

y J

ou

rnal, A

ug

. 2007.

, an

d M

. S

mely

an

skiy

, In

tel T

ech

no

log

y J

ou

rnal, A

ug

. 2007.

��"M

ed

ia M

inin

g"M

ed

ia M

inin

g——

Em

erg

ing

E

merg

ing

Tera

Tera

--sca

le C

om

pu

tin

g A

pp

licati

on

s,"

Y. C

hen

, E

. L

i, W

. sca

le C

om

pu

tin

g A

pp

licati

on

s,"

Y. C

hen

, E

. L

i, W

. L

i, T

. W

an

g, J. L

i, X

. T

on

g, P

. W

an

g, W

. H

u, Y

. Z

han

g, an

d Y

.L

i, T

. W

an

g, J. L

i, X

. T

on

g, P

. W

an

g, W

. H

u, Y

. Z

han

g, an

d Y

.--K

. C

hen

, In

tel

K. C

hen

, In

tel

Tech

no

log

y J

ou

rnal, A

ug

. 2007.

Tech

no

log

y J

ou

rnal, A

ug

. 2007.

��“M

ed

ia A

pp

licati

on

s o

n H

yp

er

“M

ed

ia A

pp

licati

on

s o

n H

yp

er--

Th

read

ing

Tech

no

log

y,”

Y.

Th

read

ing

Tech

no

log

y,”

Y.--

K. C

hen

, M

. H

ollim

an

, E

. K

. C

hen

, M

. H

ollim

an

, E

. D

eb

es

Deb

es, S

. , S

. Z

helt

ov

Zh

elt

ov,

A.

, A

. K

nyazev

Kn

yazev,

S. B

rata

no

v,

R.

, S

. B

rata

no

v,

R. B

ele

no

vB

ele

no

v,

an

d I. S

an

tos, In

tel

, an

d I. S

an

tos, In

tel

Tech

no

log

y J

ou

rnal, F

eb

. 2002.

Tech

no

log

y J

ou

rnal, F

eb

. 2002.

��“Im

ple

men

tati

on

of

H.2

64 E

nco

der

an

d D

eco

der

on

Pers

on

al

Co

mp

ute

rs,”

Y.

“Im

ple

men

tati

on

of

H.2

64 E

nco

der

an

d D

eco

der

on

Pers

on

al

Co

mp

ute

rs,”

Y.--

K.

K.

Ch

en

, E

. Q

. L

i, X

. Z

ho

u, an

d S

. L

. C

hen

, E

. Q

. L

i, X

. Z

ho

u, an

d S

. L

. G

eG

e, Jo

urn

al o

f V

isu

al C

om

mu

nic

ati

on

s a

nd

Im

ag

e

, Jo

urn

al o

f V

isu

al C

om

mu

nic

ati

on

s a

nd

Im

ag

e

Rep

resen

tati

on

s, vo

l. 1

7, n

o. 2 , p

p 5

09

Rep

resen

tati

on

s, vo

l. 1

7, n

o. 2 , p

p 5

09--5

32,

Ap

r. 2

006.

532,

Ap

r. 2

006.

��"A

Co

mp

iler

for

Exp

loit

ing

Neste

d"A

Co

mp

iler

for

Exp

loit

ing

Neste

d--P

ara

llelism

in

P

ara

llelism

in

Op

en

MP

Op

en

MP

Pro

gra

ms,"

X.

Pro

gra

ms,"

X. T

ian

Tia

n, J.

, J.

Ho

efl

ing

er

Ho

efl

ing

er,

G.

, G

. H

aab

Haab

, Y

., Y

.--K

. C

hen

, M

. G

irk

ar,

S. S

hah

, P

ara

llel C

om

pu

tin

g J

ou

rnal,

K. C

hen

, M

. G

irk

ar,

S. S

hah

, P

ara

llel C

om

pu

tin

g J

ou

rnal,

vo

l. 3

1,

no

. 10

vo

l. 3

1,

no

. 10--1

2,

pp

. 960

12,

pp

. 960--9

83,

Oct.

2005.

983,

Oct.

2005.

104

Co

nfe

ren

ce

Pa

pe

rs

Co

nfe

ren

ce

Pa

pe

rs

��"C

hallen

ges a

nd

Op

po

rtu

nit

ies o

f O

bta

inin

g P

erf

orm

an

ce f

rom

Mu

lti

"Ch

allen

ges a

nd

Op

po

rtu

nit

ies o

f O

bta

inin

g P

erf

orm

an

ce f

rom

Mu

lti--

Co

re C

PU

s

Co

re C

PU

s

an

d M

an

yan

d M

an

y--C

ore

GP

Us,"

T. C

hen

an

d Y

.C

ore

GP

Us,"

T. C

hen

an

d Y

.--K

. C

hen

, in

IE

EE

In

tern

ati

on

al C

on

fere

nce

K. C

hen

, in

IE

EE

In

tern

ati

on

al C

on

fere

nce

on

Aco

usti

cs,

Sp

eech

, an

d S

ign

al P

rocessin

g (

ICA

SS

P),

Ap

r. 2

009.

on

Aco

usti

cs,

Sp

eech

, an

d S

ign

al P

rocessin

g (

ICA

SS

P),

Ap

r. 2

009.

��"P

ara

llelizati

on

Of

"Para

llelizati

on

Of

Ad

aB

oo

st

Ad

aB

oo

st

Alg

ori

thm

On

Mu

lti

Alg

ori

thm

On

Mu

lti--

Co

re P

rocesso

rs,"

Y.

Co

re P

rocesso

rs,"

Y.--

K. C

hen

, W

. K

. C

hen

, W

. L

i, a

nd

X. T

on

g, IE

EE

Wo

rksh

op

on

Sig

nal P

rocessin

g S

yste

ms,

Oct.

2008.

Li, a

nd

X. T

on

g, IE

EE

Wo

rksh

op

on

Sig

nal P

rocessin

g S

yste

ms,

Oct.

2008.

��""N

ovel P

ara

llel H

ou

gh

Tra

nsfo

rm o

n M

ult

iN

ovel P

ara

llel H

ou

gh

Tra

nsfo

rm o

n M

ult

i--C

ore

Pro

cesso

rs,"

Y.

Co

re P

rocesso

rs,"

Y.--

K. C

hen

, W

. L

i, J

. K

. C

hen

, W

. L

i, J

. L

i, a

nd

T. W

an

g, in

In

t’l C

on

f. o

n A

co

usti

cs,

Sp

eech

, an

d S

ign

al P

rocessin

g,

Ap

r.

Li, a

nd

T. W

an

g, in

In

t’l C

on

f. o

n A

co

usti

cs,

Sp

eech

, an

d S

ign

al P

rocessin

g,

Ap

r.

2008.

2008.

��“P

ara

llelizati

on

, P

erf

orm

an

ce A

naly

sis

, an

d A

lgo

rith

m C

on

sid

era

tio

n o

f H

ou

gh

“P

ara

llelizati

on

, P

erf

orm

an

ce A

naly

sis

, an

d A

lgo

rith

m C

on

sid

era

tio

n o

f H

ou

gh

T

ran

sfo

rm o

n C

hip

Mu

ltip

rocesso

rs,”

W. L

i, a

nd

Y.

Tra

nsfo

rm o

n C

hip

Mu

ltip

rocesso

rs,”

W. L

i, a

nd

Y.--

K. C

hen

, in

Wo

rksh

op

on

K

. C

hen

, in

Wo

rksh

op

on

D

esig

n, A

rch

itectu

re a

nd

Sim

ula

tio

n o

f C

hip

Mu

lti

Desig

n, A

rch

itectu

re a

nd

Sim

ula

tio

n o

f C

hip

Mu

lti--

Pro

cesso

rs,

Dec. 2007.

Pro

cesso

rs,

Dec. 2007.

��“C

om

pu

ter

Vis

ion

on

Mu

lti

“C

om

pu

ter

Vis

ion

on

Mu

lti--

Co

re P

roc

esso

rs:

Art

icu

late

d B

od

y T

rackin

g,”

T. C

hen

, C

ore

Pro

cesso

rs:

Art

icu

late

d B

od

y T

rackin

g,”

T. C

hen

, D

. D

. B

ud

nik

ov

Bu

dn

iko

v, C

. H

ug

hes, an

d Y

., C

. H

ug

hes, an

d Y

.--K

. C

hen

, in

In

t’l C

on

f. o

n M

ult

imed

ia a

nd

Exp

o, Ju

ly

K. C

hen

, in

In

t’l C

on

f. o

n M

ult

imed

ia a

nd

Exp

o, Ju

ly

2007.

2007.

��"A

dap

tive P

ara

llel

Gra

ph

Min

ing

fo

r C

MP

Arc

hit

ectu

res,"

G.

"Ad

ap

tive P

ara

llel

Gra

ph

Min

ing

fo

r C

MP

Arc

hit

ectu

res,"

G. B

ueh

rer

Bu

eh

rer,

S.

, S

. P

art

hasara

thy

Part

hasara

thy, an

d Y

., an

d Y

.--K

. C

hen

, in

In

t’l C

on

f. o

n D

ata

Min

ing

, p

p. 97

K. C

hen

, in

In

t’l C

on

f. o

n D

ata

Min

ing

, p

p. 97--1

06,

Dec. 2006.

106,

Dec. 2006.

��“W

ork

load

Ch

ara

cte

rizati

on

of

a P

ara

llel

Vid

eo

Min

ing

Ap

plicati

on

on

a 1

6“W

ork

load

Ch

ara

cte

rizati

on

of

a P

ara

llel

Vid

eo

Min

ing

Ap

plicati

on

on

a 1

6--W

ay

Way

Sh

are

dS

hare

d--M

em

ory

Mu

ltip

rocesso

r S

yste

m,”

W. L

i, E

. L

i, C

. M

em

ory

Mu

ltip

rocesso

r S

yste

m,”

W. L

i, E

. L

i, C

. D

ulo

ng

Du

lon

g, Y

., Y

.--K

. C

hen

, T

. K

. C

hen

, T

. W

an

g, Y

. Z

han

g, in

In

t’l

Wan

g, Y

. Z

han

g, in

In

t’l S

ym

pS

ym

p. o

n W

ork

load

Ch

ara

cte

rizati

on

, p

p. 7

. o

n W

ork

load

Ch

ara

cte

rizati

on

, p

p. 7--1

6, O

ct.

2006.

16, O

ct.

2006.

��"E

ffic

ien

t F

req

uen

t P

att

ern

Min

ing

on

Sh

are

d M

em

ory

Syste

ms:

Imp

licati

on

s f

or

"Eff

icie

nt

Fre

qu

en

t P

att

ern

Min

ing

on

Sh

are

d M

em

ory

Syste

ms:

Imp

licati

on

s f

or

Ch

ip M

ult

ipro

cesso

r A

rch

itectu

res,"

G.

Ch

ip M

ult

ipro

cesso

r A

rch

itectu

res,"

G. B

ueh

rer

Bu

eh

rer,

S.

, S

. P

art

hasara

thy

Part

hasara

thy,

A.

, A

. G

ho

tin

gG

ho

tin

g, Y

., Y

.--K

. K

. C

hen

, D

. K

im, an

d A

. N

gu

yen

, in

Me

mo

ry S

yste

ms P

erf

orm

an

ce a

nd

Co

rrectn

ess

Ch

en

, D

. K

im, an

d A

. N

gu

yen

, in

Me

mo

ry S

yste

ms P

erf

orm

an

ce a

nd

Co

rrectn

ess

Wo

rksh

op

, O

ct.

2006.

Wo

rksh

op

, O

ct.

2006.

Page 53: Multimedia Signal Processing on CPUs CPUs and and GPUs GPUs …speed.cis.nctu.edu.tw/~ydlin/course/cn/seminar/cpu.pdf · 2018-06-07 · Multimedia Signal Processing on CPUs CPUs and

105

Co

nfe

ren

ce

Pa

pe

rs (

Vid

eo

Co

de

c R

ela

ted

)C

on

fere

nce

Pa

pe

rs (

Vid

eo

Co

de

c R

ela

ted

)��

“Im

ple

men

tati

on

of

H.2

64 E

nc

od

er

on

Gen

era

l“Im

ple

men

tati

on

of

H.2

64 E

nc

od

er

on

Gen

era

l--P

urp

ose P

rocesso

rs

Pu

rpo

se P

rocesso

rs

wit

h H

yp

er

wit

h H

yp

er--

Th

read

ing

Tech

no

log

y,”

E. Q

. L

i an

d Y

.T

hre

ad

ing

Tech

no

log

y,”

E. Q

. L

i an

d Y

.--K

. C

hen

, in

Pro

c.

K. C

hen

, in

Pro

c.

of

SP

IE V

isu

al C

om

mu

nic

ati

on

s a

nd

Im

ag

e P

rocessin

g, v

ol. 5

308, p

p.

of

SP

IE V

isu

al C

om

mu

nic

ati

on

s a

nd

Im

ag

e P

rocessin

g, v

ol. 5

308, p

p.

384

384——

395, Jan

. 2004.

395, Jan

. 2004.

��“T

ow

ard

s E

ffic

ien

t M

ult

i“T

ow

ard

s E

ffic

ien

t M

ult

i--L

evel T

hre

ad

ing

of

H.2

64 E

nco

der

on

In

tel

Level T

hre

ad

ing

of

H.2

64 E

nco

der

on

In

tel

Hyp

er

Hyp

er--

Th

read

ing

Arc

hit

ectu

res

,” Y

.T

hre

ad

ing

Arc

hit

ectu

res

,” Y

.--K

. C

hen

, X

. T

ian

, S

. G

e, M

. G

irkar,

K

. C

hen

, X

. T

ian

, S

. G

e, M

. G

irkar,

in

Pro

c. o

f In

t’l P

ara

llel an

d D

istr

ibu

ted

Pro

cessin

g S

ym

p., A

pr.

2004.

in P

roc. o

f In

t’l P

ara

llel an

d D

istr

ibu

ted

Pro

cessin

g S

ym

p., A

pr.

2004.

• “Efficient Multithreading Implementation of H.264 Encoder on Intel Hyper-Threading Architectures,” S. Ge, X. Tian, and Y.-K. Chen, in Pacific-Rim Conf. on Multimedia, Dec. 2003.

• “Exploring the Use of Hyper-Threading Technology for Multimedia Applications with Intel OpenMP Compiler,” X. Tian, Y.-K. Chen, M. Girkar, S. Ge, R. Lienhart, and S. Shah, in Int’l Parallel and Distributed Processing Symp., pp. 36–43, Apr. 2003.

• “The Impact of SMT/SMP Designs on Multimedia Software Engineering - A Workload Analysis Study,” Y.-K. Chen, R. Lienhart, E. Debes, M. Holliman, and M. Yeung, in Proc. of Int’l Symp. on Multimedia Software Engineering, Dec. 2002.

• “Video Applications on Hyper-Threading Technology,” Y.-K. Chen, M. Holliman, and E. Debes, in Int’l Conf. on Multimedia and Expo, vol. 2, pp. 193–196, Aug. 2002.

106

Special Issues

• “Multi-Core Enabled Multimedia Applications & Architectures,” Journal of Signal Processing Systems (Nov. 2009)

• “Signal Processing on Platforms with Multiple Cores,” IEEE Signal Processing Magazine
  – Part 1 - Overview and Methodology (Nov. 2009)
  – Part 2 - Design and Applications (March 2010)

• “Algorithm/Architecture Co-Exploration of Visual Computing,” IEEE Transactions on Circuit and System for Video Technology (Nov. 2009)

107

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit www.intel.com/performance/ or call (U.S.) 1-800-628-8686 or 1-916-356-3104.