Multimedia Signal Processing on CPUs and GPUs with Many Cores
Yen-Kuang Chen
Intel Corporation
y.k.chen@ieee.org

Summary
Multi-Core is Becoming Mainstream
● Architecture: Number of cores per chip will grow quickly
– Power consumption is a key driver
● Algorithm: Future processors demand special designs
– Must consider parallelism
• Algorithms often need changes
● Algorithm-Architecture Co-Exploration
– Architecture matters
– Exploit data-level and functional-level parallelism
– Algorithm with minimal overhead/data dependencies
– Balance loads for better performance scaling
– Take advantage of the shared cache

Outline
● Motivation & current trends
– Basic principles in multi-core architecture
● Theory and principles in parallelization
– CPU and GPU have the same principles
● Advanced optimization techniques in multi-core architecture
– CPU and GPU have different treatments
● A specific design example
● Summary

More Transistors & Better Performance
[Figure: Number of transistors per chip vs. year, 1970 to 2005 (log scale, 1E+3 to 1E+10): 4004, 8008, 8080, 8086, i286, i386, i486, Pentium, Pentium II, Pentium III, Pentium 4, Itanium, Itanium 2, Itanium 2 (w/ 9MB cache), Dual Core Itanium]
[Figure: Relative performance since 1980 (log scale, 1 to 1000), 1980 to 2000: microprocessor MIPS grew ~50% annually]
Source:
1. Moore’s Law 40th Anniversary (http://www.intel.com/pressroom/kits/events/moores_law_40th/)
2. D. A. Patterson, "Latency Lags Bandwidth," Communications of the ACM, Vol. 47, no. 10, pp. 71-75, Oct. 2004.
Faster computers enable novel applications

Thermal Design Points
[Figure: Thermal design power (watts, 0 to 140) vs. year, 1992 to 2008: Pentium, Pentium MMX, Pentium II, Pentium III, Pentium 4, Pentium D, Core 2 Duo, Core 2 Quad]
More Power, too!
Source: Desktop CPU Comparison Guide Rev. 3.3 (http://www.techarp.com/showarticle.aspx?artno=337)
Transistors are “free,” but power is expensive

Basic Computer Architecture
● Performance is roughly proportional to $c \cdot f$
● Higher frequency → more power
– Power is proportional to $cV^2f$
– Voltage must scale with frequency
– 2x frequency → 2x performance, but 8x power!
● Alternatively, we can explore parallelism
– Keep frequency the same
– Use more area (double the capacitance “c”)
– 2x area → 2x performance, with only 2x power!
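A worked version of the slide's arithmetic, assuming (as the slide does) that supply voltage V scales linearly with frequency f, and that doubling the area doubles the switched capacitance c while V and f stay fixed:

```latex
P \propto c\,V^2 f,\quad V \propto f \;\Rightarrow\; P \propto c\,f^3, \qquad \text{Perf} \propto c\,f
f \to 2f:\quad \frac{\text{Perf}'}{\text{Perf}} = \frac{c\,(2f)}{c\,f} = 2, \qquad \frac{P'}{P} = \frac{c\,(2f)^3}{c\,f^3} = 8
c \to 2c:\quad \frac{\text{Perf}'}{\text{Perf}} = \frac{(2c)\,f}{c\,f} = 2, \qquad \frac{P'}{P} = \frac{(2c)\,V^2 f}{c\,V^2 f} = 2
```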

Commercial Examples of Multi-Core for General Purpose Computing
● Intel®
– Pentium® D
– Core™ Duo
– Core™ 2 Duo
– Core™ 2 Quad
– Core™ i7
● IBM
– CELL Broadband Engine
● Sun
– UltraSPARC T1 (code name: Niagara)
Die photo of Intel 45nm dual core processors
Source: http://www.intel.com/pressroom/archive/releases/20070328fact.htm
Source: http://en.wikipedia.org/wiki/Cell_microprocessor
Source: http://en.wikipedia.org/wiki/UltraSPARC_T1

Programmable GPU
• Early graphics processors
– Offloaded graphics from the CPU
– Simple, fixed function pipeline
• GPU performance ramp
– 1.7x FLOPS increase per year *
– High computational density gained by increased parallelism
– Also by larger die & more power
• Changing graphics pipeline
– Programmable stages added
[Figure: Pre-2001, fixed function: Input Data → Transformation and Lighting → Primitive Setup → Rasterization → Pixel Processing → Frame Buffer Blend → Frame Buffer. Post-2006, programmable: Input Data → Vertex Shading → Geometry Shading → Primitive Setup → Rasterization → Pixel Shading → Frame Buffer Blend → Frame Buffer]
* Source: John Owens, “Experiences with GPU Computing”, 8th Annual IEEE/NATEA Conference, 2007

Block Diagram of Graphic Pipeline
[Figure: Fixed-function pipeline. 3D Application or Game → 3D API: OpenGL or Direct3D → CPU-GPU boundary (AGP/PCIe) → GPU command & data stream → GPU Front End → pre-transformed vertices → Programmable Vertex Processor → transformed vertices → Primitive Assembly → assembled primitives → Rasterization and Interpolation → pre-transformed fragments → Programmable Fragment Processor → transformed fragments → Raster Operations → pixel updates → Frame Buffer. Side streams: vertex index stream, pixel location stream]
Source: UPENN's CIS 665 tutorials: http://www.seas.upenn.edu/~cis665/

G80 Thread Computing Pipeline
Alternative operating mode specifically for computing
[Figure: G80 in graphics mode (Host, Input Assembler, Vtx/Geom/Pixel Thread Issue, Setup / Rstr / ZCull, Thread Processor, eight clusters of SP pairs with L1 and TF, L2 and FB partitions) and in computing mode (Host, Input Assembler, Thread Execution Manager, eight processor groups each with a Parallel Data Cache and Texture unit, Load/store, Global Memory)]
Source: UPENN's CIS 665 tutorials: http://www.seas.upenn.edu/~cis665/

Software Tool on GPU
CPU C program

void add_matrix_cpu(float *a, float *b, float *c, int N)
{
    int i, j, index;
    for (j = 0; j < N; j++) {
        for (i = 0; i < N; i++) {       // inner loop walks contiguous memory
            index = i + j * N;
            c[index] = a[index] + b[index];
        }
    }
}

int main(void)
{
    ...
    add_matrix_cpu(a, b, c, N);
    return 0;
}

CUDA C program

__global__ void add_matrix_gpu(float *a, float *b, float *c, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < N && j < N) {               // threads past the matrix edge do nothing
        int index = i + j * N;
        c[index] = a[index] + b[index];
    }
}

int main(void)
{
    ...
    dim3 dimBlock(blocksize, blocksize);
    dim3 dimGrid((N + blocksize - 1) / dimBlock.x,   // round up so every
                 (N + blocksize - 1) / dimBlock.y);  // element is covered
    add_matrix_gpu<<<dimGrid, dimBlock>>>(a, b, c, N);
    return 0;
}

● High-level programming languages
– HLSL, Cg, GLSL, CTM, BrookGPU
– CUDA (Compute Unified Device Architecture)
– OpenCL
● Many non-graphics applications have been built
– http://www.gpgpu.org
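To make the index arithmetic of the CUDA kernel concrete, the sketch below emulates the launch on the CPU: the two outer loops stand in for the grid (blockIdx) and the two inner loops for the threads of one block (threadIdx). The function name and the loop-based emulation are illustrative assumptions, not how CUDA actually schedules threads; the sketch only shows that every (i, j) pair maps to exactly one element, and why the bounds check is needed when N is not a multiple of the block size.

```c
#include <assert.h>

/* Hypothetical CPU emulation of add_matrix_gpu<<<dimGrid, dimBlock>>>:
 * the outer two loops play blockIdx.{y,x}, the inner two threadIdx.{y,x},
 * and each iteration computes i, j, and index exactly as the kernel does. */
static void add_matrix_emulated(const float *a, const float *b, float *c,
                                int N, int blocksize)
{
    assert(N > 0 && blocksize > 0);
    int grid = (N + blocksize - 1) / blocksize;   /* blocks per dimension, rounded up */
    for (int by = 0; by < grid; by++)
        for (int bx = 0; bx < grid; bx++)
            for (int ty = 0; ty < blocksize; ty++)
                for (int tx = 0; tx < blocksize; tx++) {
                    int i = bx * blocksize + tx;  /* blockIdx.x*blockDim.x+threadIdx.x */
                    int j = by * blocksize + ty;  /* blockIdx.y*blockDim.y+threadIdx.y */
                    if (i < N && j < N) {         /* surplus threads do nothing */
                        int index = i + j * N;
                        c[index] = a[index] + b[index];
                    }
                }
}
```

With N = 5 and blocksize = 2 the grid is 3 x 3 blocks, so 36 emulated threads cover 25 elements; the bounds check is what keeps the 11 surplus threads from writing past the matrix.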

Multi-core is Prevalent
● Graphics processing unit (GPU)
– NVIDIA 8800 GTX has 128 streaming processors
– NVIDIA GTX 280 has 240 streaming processors
● Intel’s upcoming Larrabee processors
● Multi-processor system-on-a-chip (MPSoC)
● Embedded
– 3Dlabs’ DMS-02
– Analog Devices’ ADSP-BF561
– Cradle Technology’s CT3616
– Freescale’s MSC8156
– Sandbridge Technologies’ SB3500
– Tensilica’s 388VDO
– TI’s DaVinci and OMAP processors
– Tilera’s TILE64 processor

Future of Multi-cores
● Moore’s Law → # of cores will double every 18-24 months
● 2007 – 8 cores
● 2009 – 16
● 2013 – 64
● 2015 – 128
● 2021 – 1024
[Figure: The number of cores vs. year, 2006 to 2018 (log scale, 1 to 1000), with trend lines "2x every two years" and "2x every 1.5 years"]
Future architecture will have many cores
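The doubling rule on this slide can be checked with a few lines of arithmetic. The sketch assumes the slide's 2007 baseline of 8 cores and the 24-month end of the 18-24 month range; the function name is illustrative.

```c
#include <assert.h>

/* Projected core count, assuming 8 cores in 2007 and one doubling
 * every 24 months (odd years in between keep the previous value). */
static long projected_cores(int year)
{
    assert(year >= 2007);
    long cores = 8;
    for (int y = 2007; y + 2 <= year; y += 2)
        cores *= 2;
    return cores;
}
```

projected_cores(2021) returns 1024, and the intermediate bullets (16 in 2009, 64 in 2013, 128 in 2015) fall on the same 24-month curve.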

Multi-core is Becoming Mainstream
● “The sequential processor era is now over.” “That is bad news for software companies.”
– W. Gibbs, "A Split at the Core," Scientific American, Nov 2004
● During the 80s, NSF tried to persuade the computer industry, but “found little interest.” “Now the machines are here” … to get around “power wall.” “Newer chips with multiple processors require dauntingly complex software.”
– J. Markoff, "Faster Chips are Leaving Programmers in Their Dust," The New York Times, Dec 2007

Performance vs. Scalability
[Figure: Graph mining. Relative throughput (0 to 5) vs. number of cores (1, 2, 4, 8, 16, 32) for the gSpan and Gaston algorithms]
Right algorithm for many cores is important

Multi-Core is Becoming Mainstream
● Architecture: Number of cores per chip will grow quickly
– Power consumption is a key driver
● Algorithm: Future processors demand special designs
– Algorithms often need changes
● Although parallel computing has been studied for decades
– In the past, scientific applications → Few people
– Today, multi-core processors are different
• Smaller last-level on-die cache
• Faster inter-core on-die communication

Outline
● Motivation & current trends
● Theory and principles in parallelization (CPU and GPU have the same principles)
– Avoid sequential dependencies
– Maximize load balance
– Reduce overhead
● Advanced optimization techniques in multi-core architecture
– CPU and GPU have different treatments
● A specific design example
● Summary

Multi-thread: One way to Program Multi-core

for (i = 0; i < iterations; i++)
{
    part1(i, buff);   // data are passed via buff
    part2(i, buff);
}

● Single-threaded execution model
● Multi-threaded execution model
[Figure: Single-threaded: Part 1(1), Part 2(1), Part 1(2), Part 2(2), ... run one after another. Multi-threaded: Part 1(1) to Part 1(4) run on Processor 1 and Part 2(1) to Part 2(4) run on Processor 2]

A Real Threading Example

void thread1()
{
    int i;
    int b_idx = 0;                 // index to 2 buffers
    part1(0, buf[b_idx]);
    b_idx = 1 - b_idx;
    for (i = 1; i < iterations; i++)
    {
        Signal(&signal, 1);        // the other buffer is ready for part2
        part1(i, buf[b_idx]);
        WaitForSignal(&end, 1);    // part2 has finished with the other buffer
        b_idx = 1 - b_idx;
    }
    Signal(&signal, 1);            // hand over the last buffer
}

void thread2()
{
    int i;
    int b_idx = 0;                 // index to 2 buffers
    for (i = 0; i < iterations; i++)
    {
        WaitForSignal(&signal, 1);
        part2(i, buf[b_idx]);
        Signal(&end, 1);
        b_idx = 1 - b_idx;
    }
}
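The pseudocode leaves Signal and WaitForSignal abstract. Below is one way to make it runnable with POSIX threads, assuming those primitives behave like counting semaphores (built here from a mutex and a condition variable). part1, part2, ITERATIONS, and BUF_LEN are hypothetical stand-ins; the point is the double-buffer hand-off, not the work done in each stage.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 8   /* hypothetical iteration count */
#define BUF_LEN 4      /* hypothetical buffer size */

/* Counting semaphore standing in for the slide's Signal/WaitForSignal. */
typedef struct { pthread_mutex_t m; pthread_cond_t cv; int count; } counting_sem;

static void sem_setup(counting_sem *s)
{
    pthread_mutex_init(&s->m, NULL);
    pthread_cond_init(&s->cv, NULL);
    s->count = 0;
}

static void Signal(counting_sem *s)
{
    pthread_mutex_lock(&s->m);
    s->count++;
    pthread_cond_signal(&s->cv);
    pthread_mutex_unlock(&s->m);
}

static void WaitForSignal(counting_sem *s)
{
    pthread_mutex_lock(&s->m);
    while (s->count == 0)
        pthread_cond_wait(&s->cv, &s->m);
    s->count--;
    pthread_mutex_unlock(&s->m);
}

static int buf[2][BUF_LEN];        /* the two buffers */
static long results[ITERATIONS];   /* output of part2 for each iteration */
static counting_sem ready, done;   /* "signal" and "end" on the slide */

/* Hypothetical stages: part1 fills a buffer, part2 reduces it. */
static void part1(int i, int *b)
{
    for (int k = 0; k < BUF_LEN; k++)
        b[k] = i * 10 + k;
}

static void part2(int i, const int *b)
{
    long sum = 0;
    for (int k = 0; k < BUF_LEN; k++)
        sum += b[k];
    results[i] = sum;
}

static void *thread1(void *arg)
{
    (void)arg;
    int b_idx = 0;
    part1(0, buf[b_idx]);
    b_idx = 1 - b_idx;
    for (int i = 1; i < ITERATIONS; i++) {
        Signal(&ready);            /* buffer from iteration i-1 is ready */
        part1(i, buf[b_idx]);      /* runs while thread2 does part2(i-1) */
        WaitForSignal(&done);      /* part2(i-1) has released its buffer */
        b_idx = 1 - b_idx;
    }
    Signal(&ready);                /* hand over the last buffer */
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    int b_idx = 0;
    for (int i = 0; i < ITERATIONS; i++) {
        WaitForSignal(&ready);
        part2(i, buf[b_idx]);
        Signal(&done);
        b_idx = 1 - b_idx;
    }
    return NULL;
}

static void run_pipeline(void)
{
    pthread_t t1, t2;
    sem_setup(&ready);
    sem_setup(&done);
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
}
```

Because the semaphores count, a Signal that arrives before the matching wait is not lost; that is what lets thread1 run one iteration ahead of thread2 without either thread touching the buffer the other is using.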

Parallelization Schemes
● Data-domain: frames, slices, and macroblocks
● Functional-domain: ME/MC, DCT/IDCT, VLC/VLD
● Hybrid
[Figure: Data-domain: picture slices split between Thread 1 and Thread 2, each running VLD, IDCT, and MC. Functional-domain: VLD, IDCT, and MC of the picture assigned to Thread 1, Thread 2, and Thread 3]

Parallel Speedup
● Ideally: from one processor to 2 processors
● $\text{Speedup} = T_{\text{sequential}} / T_{\text{parallel}} = \dfrac{t_s + t_p}{t_s + t_p/2}$
● Amdahl’s Law → when $N \to \infty$: $\text{Speedup} = \dfrac{t_s + t_p}{t_s + t_p/N} \approx \dfrac{t_s + t_p}{t_s}$
[Figure: T_sequential = t_s + t_p; T_parallel = t_s followed by two t_p/2 halves running side by side]
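Plugging in illustrative numbers (assumed, not taken from the slides): a program that spends 1 time unit in serial code and 9 in parallelizable code.

```latex
\text{Speedup}_N = \frac{t_s + t_p}{t_s + t_p/N}, \qquad t_s = 1,\; t_p = 9
\text{Speedup}_2 = \frac{1 + 9}{1 + 9/2} = \frac{10}{5.5} \approx 1.82, \qquad
\lim_{N \to \infty} \text{Speedup}_N = \frac{1 + 9}{1} = 10
```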

More Realistic Speedup Rule
● $\text{Speedup} = \dfrac{t_s + t_p}{t_s + \max_{1 \le i \le N} [\, t_{P_i} \,] + t_O}$, where $t_{P_i}$ is the parallel work assigned to processor $i$ and $t_O$ is the parallelization overhead
● To maximize parallel speedup
– Avoid sequential dependencies
– Maximize load balance
– Reduce overheads
– Utilize cache efficiently
[Figure: Ideal T_parallel = t_s + t_p/2 on each of two processors; realistic T_parallel = t_s + uneven shares t_p/1.9 and t_p/2.1 + overhead]
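The realistic rule translates directly into code. In this sketch (names assumed), tp_i[i] is the parallel work that actually lands on processor i, so load imbalance shows up through the max and overhead through t_o.

```c
#include <assert.h>

/* Speedup = (t_s + t_p) / (t_s + max_i t_{P_i} + t_O), with t_p the sum of
 * the per-processor shares tp_i[0..n-1] and t_o the parallel overhead. */
static double realistic_speedup(double t_s, const double *tp_i, int n, double t_o)
{
    assert(n > 0);
    double t_p = 0.0, slowest = 0.0;
    for (int i = 0; i < n; i++) {
        t_p += tp_i[i];
        if (tp_i[i] > slowest)
            slowest = tp_i[i];     /* the slowest processor sets the pace */
    }
    return (t_s + t_p) / (t_s + slowest + t_o);
}
```

With t_s = 2 and t_p = 6 on two processors, an even 3/3 split gives 1.6x, while a 4/2 split, or an even split plus one unit of overhead, both drop to about 1.33x.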

Avoid Sequential Dependencies
$\text{Speedup} = \dfrac{t_s + t_p}{t_s + \max_{1 \le i \le N} [\, t_{P_i} \,] + t_O}$

Why Are We So Paranoid?
● Use basic Amdahl’s Law: $\text{Speedup} = \dfrac{t_s + t_p}{t_s + t_p/N}$
● Example 1: Assume we get 1.9x speedup with two threads
● Example 2: Assume 1% serial code
[Figure: Speedup vs. number of threads (0 to 128). 2.5% serial code: 77% below linear performance at 128 threads. 1% serial code: 56% below linear performance at 128 threads]
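Both examples can be reproduced with basic Amdahl's law written in terms of the serial fraction s = t_s / (t_s + t_p). Inverting the law to recover s from a measured two-thread speedup (Example 1) is an added step, not something stated on the slide; the function names are illustrative.

```c
#include <assert.h>

/* Basic Amdahl's law with serial fraction s (0 <= s <= 1) on n threads. */
static double amdahl(double s, int n)
{
    assert(s >= 0.0 && s <= 1.0 && n > 0);
    return 1.0 / (s + (1.0 - s) / n);
}

/* Serial fraction implied by a measured two-thread speedup S2:
 * S2 = 1 / (s + (1 - s) / 2)  =>  s = 2 / S2 - 1. */
static double serial_fraction_from_2x(double s2)
{
    assert(s2 > 1.0 && s2 <= 2.0);
    return 2.0 / s2 - 1.0;
}
```

A 1.9x two-thread speedup implies s of about 5.3%, which would cap 128 threads at roughly 16.7x. For Example 2, amdahl(0.01, 128) is about 56.4, i.e. about 44% of linear, which is the 56% shortfall shown in the figure.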

Avoid Sequential Dependencies
● Parallelize every module
– From 4.2x to 19.9x on edge detection
● Re-order constraints to remove dependencies
– From 4x to 32x on physics constraint solver
● Find different algorithm

Example: Canny Edge Detector
● Four steps:
– (a) image gradient and edge orientation
– (b) non-maximum suppression and high gradient pixel identification
– (c) hysteresis (tracing from high gradient pixels)
– (d) final assignment of the binary decision
● Steps a, b, & d are easy to parallelize in image domain
[Figure: panels (a), (b), (c)]

Hysteresis Testing
● 2 thresholds, a high T_h and a low T_w
● Edge
– Pixel whose value is greater than T_h
– Pixel whose value is greater than T_w and that is connected to an edge pixel
[Figure: example pixel values relative to T_h and T_w]
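A serial hysteresis pass, as it is conventionally written, can be sketched as a flood fill seeded by the pixels above T_h. The sketch assumes 8-connectivity, a row-major int magnitude image (after non-maximum suppression), and an explicit stack; those are choices made for illustration. Note that a weak pixel's outcome can depend on a chain of neighbours stretching far across the image, which is exactly what makes region-by-region parallelization hard.

```c
#include <assert.h>
#include <stdlib.h>

/* mag: width*height gradient magnitudes, row-major. edge: output, 1 = edge.
 * Pixels above t_h seed the trace; the trace spreads to 8-connected
 * neighbours above t_w. Returns 0 on success, -1 if allocation fails. */
static int hysteresis(const int *mag, unsigned char *edge, int width, int height,
                      int t_h, int t_w)
{
    assert(t_w <= t_h && width > 0 && height > 0);
    int n = width * height;
    int *stack = malloc(sizeof(int) * (size_t)n);  /* each pixel pushed at most once */
    if (!stack)
        return -1;
    int top = 0;
    for (int p = 0; p < n; p++) {
        edge[p] = 0;
        if (mag[p] > t_h) {           /* strong pixel: edge by itself */
            edge[p] = 1;
            stack[top++] = p;
        }
    }
    while (top > 0) {                 /* trace outward from known edges */
        int p = stack[--top];
        int x = p % width, y = p / width;
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                int nx = x + dx, ny = y + dy;
                if (nx < 0 || ny < 0 || nx >= width || ny >= height)
                    continue;
                int q = ny * width + nx;
                if (!edge[q] && mag[q] > t_w) {   /* weak pixel touching an edge */
                    edge[q] = 1;
                    stack[top++] = q;
                }
            }
    }
    free(stack);
    return 0;
}
```

On a 5 x 1 row with values {10, 5, 5, 1, 5}, T_h = 8 and T_w = 3, the first three pixels become edges; the last pixel is above T_w but is cut off from the chain by the 1, so it stays a non-edge.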

Parallelization of Hysteresis
● Cannot simply divide the image into regions
● Duplications may reduce performance
● Conventionally, we use serial code
● Bad for many cores
$\text{Speedup} = \dfrac{t_s + t_p}{t_s + \max_{1 \le i \le N} [\, t_{P_i} \,] + t_O}$

Speedup After Careful Parallelization
[Figure: Speed-up vs. processors (0 to 32) for Original, With Parallel Hysteresis, and Linear Scaling: 4.2x originally vs. 19.9x with parallel hysteresis (~5X improvement)]

Results before and after Parallelization
● Same results
– Nice-to-have
– Not always required
– Must-have: semantic correctness
– For example: random number generator
● Have more freedom when the parallel algorithm is allowed to produce a different result
– Some algorithms have no single correct answer
Don’t be afraid of changing results for better parallelism

Maximize Load Balance
$\text{Speedup} = \dfrac{t_s + t_p}{t_s + \max_{1 \le i \le N} [\, t_{P_i} \,] + t_O}$

Maximize Performance
● Minimize the slowest execution time
● Maximize load balance
– Almost all processors have the same amount of load
– No processor is idle
[Figure: two processors with loads 0.5 and 0.5 vs. 0.6 and 0.4; the balanced split finishes at 0.5, the unbalanced one at 0.6]
$\min \left\{ \max_{1 \le i \le N} [\, t_{P_i} \,] \right\}$, where $\sum_{i=1}^{N} t_{P_i} = t_p$

Load Balance Approaches
● Dynamic task assignment
– 4% on Hyper-Threading for MPEG decoding
● Make tasks smaller
– From 2.8x to 5.7x on 8 processors for graph mining
● Fuse multiple loops
– From 2.9x to 5.5x on 8 processors for AdaBoost
● Use data-domain decomposition instead of functional-domain decomposition
● Find different algorithm

Parallelize MPEG Codec
[Figure: (a) Static dispatching and (b) Dynamic dispatching of picture slices (assigned slices) to Thread 1 and Thread 2]
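The difference between static and dynamic dispatching can be simulated by tracking when each thread becomes free. In this sketch the static scheme is assumed to be round-robin (slice k goes to thread k mod T), and the slice costs used in the example are made up; MAX_THREADS and the function names are illustrative.

```c
#include <assert.h>

#define MAX_THREADS 16

/* (a) Static: slice k is bound to thread k % nthreads before decoding starts.
 * Returns the finish time of the most loaded thread. */
static double static_makespan(const double *slice_time, int nslices, int nthreads)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    double load[MAX_THREADS] = {0};
    for (int k = 0; k < nslices; k++)
        load[k % nthreads] += slice_time[k];
    double worst = 0.0;
    for (int t = 0; t < nthreads; t++)
        if (load[t] > worst) worst = load[t];
    return worst;
}

/* (b) Dynamic: each slice goes to whichever thread becomes free first. */
static double dynamic_makespan(const double *slice_time, int nslices, int nthreads)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    double busy_until[MAX_THREADS] = {0};
    for (int k = 0; k < nslices; k++) {
        int first_free = 0;
        for (int t = 1; t < nthreads; t++)
            if (busy_until[t] < busy_until[first_free]) first_free = t;
        busy_until[first_free] += slice_time[k];
    }
    double worst = 0.0;
    for (int t = 0; t < nthreads; t++)
        if (busy_until[t] > worst) worst = busy_until[t];
    return worst;
}
```

For slice costs {4, 1, 4, 1} on two threads, round-robin gives one thread 8 units and the other 2 (finish time 8), while dynamic dispatch ends with both threads at 5.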

Coarse-Grained and Fine-Grained Parallelization of PrefixSpan
[Figure: PrefixSpan search tree rooted at (), with children a, b, c, d, e; a expands to aa, ab, (ab) and aa to aaa, aab, a(ab); c expands to cc, cd, (cd), (ce) and cc to ccc, ccd, c(cd). The tree is shown twice, labelled Coarse and Fine]

Time Variation via Coarse-Grained Parallelism
[Figure: Task time variation (0 to 1.4) across tasks]
Variation → Load imbalance

Speedup vs. Partitioning Levels
Smaller Variation → Better load balance
[Figure: Speed-up on 8 processors vs. levels of partitions 1 to 7: 2.7, 3.3, 4.6, 4.7, 5.1, 5.5, 5.8]

Prefer Data-Domain over Functional-Domain Parallelization
● Data-domain: frames, slices, and macroblocks
● Functional-domain: ME/MC, DCT/IDCT, VLC/VLD
[Figure: Data-domain: picture slices split between Thread 1 and Thread 2, each running VLD, IDCT, and MC. Functional-domain: VLD, IDCT, and MC assigned to Thread 1, Thread 2, and Thread 3]

Reduce Overhead
$\text{Speedup} = \dfrac{t_s + t_p}{t_s + \max_{1 \le i \le N} [\, t_{P_i} \,] + t_O}$

Reduce Overhead
● Reduce communication
– Use more computation instead
● Reduce repeated computation
● Reduce locking overhead
– Fine-grained lock instead of coarse-grained lock
– Private buffer instead of shared memory unless necessary
● Reduce barriers or synchronization
– Parallelize the outer loop

Overhead: Communication
● Gaston
– More communication between tasks
– Faster when single-threaded
● gSpan
– Less communication between tasks
– Slower when single-threaded

Performance vs. Scalability
[Figure: Graph mining. Relative throughput (0 to 5) vs. number of cores (1, 2, 4, 8, 16, 32) for the gSpan and Gaston algorithms]

Overhead: Repeated Computation
[Figure: 64 partitions vs. 4 partitions, with replicated data/work]

Fast Marching Method
[Figure: Advancing the front from the initial fluid-air interface]

Different Algorithm: Fast sweeping method
Sweep along the grid nodes and update the distance

Parallel Performance
[Figure: Speed-up vs. # of cores (0 to 64) for the fast marching method and the fast sweeping method; ~2X difference]

Overhead: Guaranteeing Correctness
● Two checks can be cashed at the same time
● Only one of the “transactions” should go through
● This often requires “locks” → Overhead
[Figure: two $100 checks presented at Bank 1 and Bank 2 against a $100 balance]
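A minimal sketch of the two-bank scenario with a POSIX mutex. The account layout and function names are hypothetical; the essential point is that the balance test and the debit sit inside one critical section, so at most one $100 check clears against a $100 balance. Acquiring and releasing the lock on every transaction is the overhead the slide refers to.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    int balance;
} account_t;

typedef struct {
    account_t *acct;
    int amount;
    int ok;            /* 1 if this check was honoured */
} check_t;

/* Check-then-withdraw as one atomic step: without the lock, both threads
 * could see balance >= amount before either one debits it. */
static void *cash_check(void *arg)
{
    check_t *c = arg;
    pthread_mutex_lock(&c->acct->lock);
    if (c->acct->balance >= c->amount) {
        c->acct->balance -= c->amount;
        c->ok = 1;
    } else {
        c->ok = 0;
    }
    pthread_mutex_unlock(&c->acct->lock);
    return NULL;
}

/* Two banks cash a check of `amount` concurrently; returns how many cleared. */
static int cash_two_checks(int balance, int amount, int *final_balance)
{
    account_t acct;
    pthread_mutex_init(&acct.lock, NULL);
    acct.balance = balance;
    check_t c1 = { &acct, amount, 0 }, c2 = { &acct, amount, 0 };
    pthread_t bank1, bank2;
    pthread_create(&bank1, NULL, cash_check, &c1);
    pthread_create(&bank2, NULL, cash_check, &c2);
    pthread_join(bank1, NULL);
    pthread_join(bank2, NULL);
    *final_balance = acct.balance;
    pthread_mutex_destroy(&acct.lock);
    return c1.ok + c2.ok;
}
```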

Hough Transform
● Widely used in computer vision and digital image processing
● For example, line detection

Hough Transform Code

for (j = 0; j < height; j++)
    for (i = 0; i < width; i++)
    {
        if (BinImage[j][i] != 0)
            for (theta = 0; theta < numangle; theta++)
            {
                // rho can be negative for some angles; in practice it is
                // offset into the accumulator's index range
                rho = i * Cos[theta] + j * Sin[theta];
                accum[theta][rho]++;   // any pixel may update any entry (scatter)
            }
    }

● To guarantee correctness
– Use a lock to protect the data
– Private buffer

Re-ordering Loops in Hough Transform

int EdgePixelSize = 0;
for (j = 0; j < height; j++)
    for (i = 0; i < width; i++)
        if (BinImage[j][i] != 0) {
            EdgePixels[EdgePixelSize++] = j;   // store (j, i) pairs of edge pixels
            EdgePixels[EdgePixelSize++] = i;
        }
for (theta = 0; theta < numangle; theta++)     // each theta owns one row of accum
    for (m = 0; m < EdgePixelSize; m += 2) {
        j = EdgePixels[m];
        i = EdgePixels[m + 1];
        rho = i * Cos[theta] + j * Sin[theta];
        accum[theta][rho]++;
    }
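A runnable sketch of the re-ordered (gather) Hough loops. Assumptions: the accumulator is a flat numangle-by-nrho array, negative rho is shifted by rho_offset, rho is rounded to the nearest integer, and the cosine/sine tables are passed in. Because vote_gather writes only rows theta_begin to theta_end - 1, threads given disjoint theta ranges could run it without a lock; vote_scatter keeps the original loop order for comparison.

```c
#include <assert.h>

/* Round to nearest int without needing libm. */
static int round_to_int(double x) { return (int)(x >= 0.0 ? x + 0.5 : x - 0.5); }

/* Step 1: compact edge pixels into (j, i) pairs; returns ints written. */
static int collect_edges(const unsigned char *bin, int width, int height, int *edge_pixels)
{
    int size = 0;
    for (int j = 0; j < height; j++)
        for (int i = 0; i < width; i++)
            if (bin[j * width + i] != 0) {
                edge_pixels[size++] = j;
                edge_pixels[size++] = i;
            }
    return size;
}

/* Step 2 (gather): only rows theta_begin..theta_end-1 of accum are written. */
static void vote_gather(const int *edge_pixels, int size,
                        const double *cos_t, const double *sin_t,
                        int theta_begin, int theta_end,
                        int *accum, int nrho, int rho_offset)
{
    for (int theta = theta_begin; theta < theta_end; theta++)
        for (int m = 0; m < size; m += 2) {
            int j = edge_pixels[m], i = edge_pixels[m + 1];
            int rho = round_to_int(i * cos_t[theta] + j * sin_t[theta]) + rho_offset;
            assert(rho >= 0 && rho < nrho);
            accum[theta * nrho + rho]++;
        }
}

/* Original order (scatter): every edge pixel touches every theta row. */
static void vote_scatter(const unsigned char *bin, int width, int height,
                         const double *cos_t, const double *sin_t, int numangle,
                         int *accum, int nrho, int rho_offset)
{
    for (int j = 0; j < height; j++)
        for (int i = 0; i < width; i++)
            if (bin[j * width + i] != 0)
                for (int theta = 0; theta < numangle; theta++) {
                    int rho = round_to_int(i * cos_t[theta] + j * sin_t[theta]) + rho_offset;
                    assert(rho >= 0 && rho < nrho);
                    accum[theta * nrho + rho]++;
                }
}
```

Calling vote_gather once per disjoint theta range produces the same accumulator as one vote_scatter pass over the whole image; only the ownership of the writes changes.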

Old vs. New Hough Transform
● Old (scatter)
● New (gather)
[Figure: scatter vs. gather updates of the accumulator, with a Lock]

Speedup After Careful Parallelization
[Figure: Parallel speedup vs. number of threads (0 to 64) for the old and the new Hough transform: 24 vs. 62 (~2.5X)]

Overhead: Synchronization & Barriers
● Task variation can be averaged out in the long term
● Explicit synchronization will expose the variation

Parallelize the Outer Loop
[Figure: For each frame (outer loop) → for each pixel (inner loop)]

GCUT Image Segmentation on SMP
[Figure: Parallel speedup vs. number of cores (0 to 16) for coarse-grained and fine-grained parallelization]

Quick Summary of the Principles
● Avoid serial code
● Maximize load balance
● Reduce overhead
$\text{Speedup} = \dfrac{t_s + t_p}{t_s + \max_{1 \le i \le N} [\, t_{P_i} \,] + t_O}$

Outline
● Motivation & current trends
● Theory and principles in parallelization
● Advanced optimization techniques in multi-core architecture
– CPU and GPU have different treatments
– Increase cache efficiency
– Increase SIMD parallelism
– Avoid synchronization
● A specific design example
● Summary

G80 Thread Computing Pipeline
Alternative operating mode specifically for computing
[Figure: Host → Input Assembler → Thread Execution Manager → eight processor groups, each with a Parallel Data Cache and Texture unit → Load/store → Global Memory]
Source: UPENN's CIS 665 tutorials: http://www.seas.upenn.edu/~cis665/

High-Level View
● Threads in a “thread block” are executed in lock-step
– “Streaming multiprocessor” → “processor”
– “Streaming processor” → “SIMD” lane
● Threads in a “thread block” can communicate through shared memory
– “Shared memory” → “cache”
● Threads from two different blocks are hard to make cooperate
– Limited hardware cache coherence between caches
Source: UPENN's CIS 665 tutorials: http://www.seas.upenn.edu/~cis665/
[Figure: Device memory (global, constant, texture memories) shared by Streaming Multiprocessors 1 to 16; each multiprocessor has an Instruction Unit, Shared Memory (16KB), Streaming Processors 1 to 8 with their own Registers, a Constant Cache, and a Texture Cache]
60
Another View of GPU Architecture
• 8800 GTX: 128 physical streaming processors
  – Equals 512 logical streaming processors
  – 32 threads in a warp must be executed in lock-step
  – Equals 16 streaming multiprocessors
• 16 processors
  – Each processor has 32-wide SIMD
    · Flexible: each SIMD lane can process its own data
    · Best performance when
      – the whole SIMD warp works together (no branch divergence)
      – it processes a contiguous piece of data (coalesced memory accesses)
  – Within each processor, data in cache can be shared
  – Limited hardware cache coherence between caches
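The "contiguous piece of data" condition can be made concrete by counting how many aligned memory segments the 32 addresses of one warp fall into: when consecutive lanes read consecutive words, one segment serves the whole warp, while strided or misaligned patterns touch several. The C sketch below is a simplified model, not the exact coalescing rules of any particular GPU generation (those differ between generations); the 128-byte segment size and the helper names `count_segments` and `warp_addresses` are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define WARP_SIZE     32   /* threads that execute in lock-step */
#define SEGMENT_BYTES 128  /* assumed size of one aligned memory segment */

/* Count the distinct SEGMENT_BYTES-aligned segments touched by one warp,
   given the byte address loaded by each lane. Fewer segments means fewer
   memory transactions, i.e. better coalescing. */
size_t count_segments(const uintptr_t addr[WARP_SIZE])
{
    uintptr_t seen[WARP_SIZE];
    size_t n = 0;
    for (int lane = 0; lane < WARP_SIZE; lane++) {
        uintptr_t seg = addr[lane] / SEGMENT_BYTES;
        int found = 0;
        for (size_t k = 0; k < n; k++) {
            if (seen[k] == seg) {
                found = 1;
                break;
            }
        }
        if (!found)
            seen[n++] = seg;
    }
    return n;
}

/* Addresses for "lane i reads float element i * stride", starting at base. */
void warp_addresses(uintptr_t base, size_t stride, uintptr_t addr[WARP_SIZE])
{
    for (int lane = 0; lane < WARP_SIZE; lane++)
        addr[lane] = base + (uintptr_t)lane * stride * sizeof(float);
}
```

For example, stride 1 from an aligned base touches a single segment, a stride of 2 floats touches two, and a stride of 32 floats spreads the warp over 32 segments; shifting an otherwise contiguous access by 64 bytes also splits it across two segments.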
61
Difference in Programming GPU
• Memory hierarchy optimized for throughput, not for latency
  – Smaller cache and higher memory bandwidth
  – Needs 1000s of threads for full efficiency
    · 192 active threads per multiprocessor to hide read-after-write register latency*
• Hard to synchronize among all threads
  – Threads in a "thread block" can cooperate with each other via (1) lock-step execution and (2) communication through shared memory
  – Threads from different blocks cannot easily cooperate
GPU has small, local, shared storage. It is hard to synchronize among all threads.
* Source: NVIDIA CUDA Compute Unified Device Architecture Programming Guide, version 1.1, 11/29/2007.
62
Optimization Principles
• General principles still applicable (16 processors)
  – Avoid sequential dependencies
  – Maximize load balance
  – Reduce overhead
• Increase cache efficiency
  – 16KB of shared memory per multiprocessor in the GPU
• Increase SIMD parallelism
  – Reduce branch divergence (32-wide SIMD in a warp)
• Avoid synchronization
  – Avoid communication with other threads
  – Operations should be independent
• Reduce global memory access (400-600 cycles)
Increase Cache Efficiency
• Take advantage of shared cache
• Choose the "right" loop for parallelization
• Synchronization overhead vs. memory pressure
64
Memory Bandwidth is a Critical Resource
[Figure: relative performance since 1980 (log scale, 1 to 10,000) vs. year (1980 to 2000). Microprocessor MIPS: ~50% annually; DRAM bandwidth: ~27% annually; DRAM latency: ~7% annually.]
Source: D. A. Patterson, "Latency Lags Bandwidth," Communications of the ACM, vol. 47, no. 10, pp. 71-75, Oct. 2004.
Taking advantage of cache is important.
[Figure: Symmetric Multiprocessor Systems (SMP) vs. Chip Multiprocessor (CMP). In the SMP, Xeon E7340 packages each hold pairs of cores sharing an L2 cache, connected over an off-chip bus interconnect to a bridge with a 64MB snoop filter cache. In the CMP, many cores and their L2 caches sit within one chip boundary, connected by a bi-directional ring interconnect.]
65
66
Optimization for Chip Multiprocessors
• Chip multiprocessor
  – Smaller last-level on-die cache
  – Faster inter-core on-die communication
• Utilize cache & inter-core communication efficiently
  – Take advantage of shared cache
  – Choose the "right" loop for parallelization
  – Balance between the synchronization overhead and memory pressure
67
Cache Sharing
• Either disruptive or constructive
• Disruptive:
  – Multiple threads have their own distinct working sets
  – They compete for the limited cache resource
• Constructive:
  – Multiple threads share the data brought into the cache by any one of them
68
Parallelizing MPEG Decoder
[Figure: picture slices assigned to Thread 1 and Thread 2 under (a) static dispatching and (b) dynamic dispatching.]
69
Motion Compensation in MPEG
[Figure: references from Frame t+1 into Frame t. (a) Static dispatching: all local cache hits. (b) Dynamic dispatching: some local cache misses.]
Speedups vs. Cache Localities
[Figure: speedup (0 to 1.6) of single-thread, static, and dynamic scheduling on a dual-processor system and with Hyper-Threading.]
• Dynamic scheduling incurs more cache misses on the dual-processor system
  – More bus traffic
  – Speedup is limited
• Sharing the cache on Hyper-Threading enforces the cache localities
  – Better speedup because of better load balance
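The two dispatching policies map directly onto OpenMP loop schedules. In this sketch, `schedule(static)` fixes the slice-to-thread assignment for every frame, so a thread keeps revisiting the same picture region (the cache-friendly case in the motion-compensation figure), while `schedule(dynamic, 1)` hands slices to whichever thread is idle, improving load balance at the cost of locality. `decode_slice` is a hypothetical placeholder with an uneven, slice-dependent amount of work, not a real MPEG routine. Without OpenMP enabled the pragmas are ignored and both versions run serially with the same results.

```c
/* Hypothetical stand-in for decoding one slice: the amount of work varies
   from slice to slice, which is what makes load balancing matter. */
static long decode_slice(int slice, int frame)
{
    long acc = 0;
    int work = 1000 * (1 + (slice * 7 + frame) % 5);
    for (int k = 0; k < work; k++)
        acc += (k ^ slice) & 1;
    return acc;
}

/* (a) Static dispatching: the slice-to-thread mapping is identical for every
   frame, so each thread reuses the picture region it touched last frame. */
long decode_frame_static(int num_slices, int frame, long out[])
{
    long total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int s = 0; s < num_slices; s++) {
        out[s] = decode_slice(s, frame);
        total += out[s];
    }
    return total;
}

/* (b) Dynamic dispatching: an idle thread grabs the next undecoded slice,
   balancing load but letting a slice move to a different core each frame. */
long decode_frame_dynamic(int num_slices, int frame, long out[])
{
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 1) reduction(+:total)
    for (int s = 0; s < num_slices; s++) {
        out[s] = decode_slice(s, frame);
        total += out[s];
    }
    return total;
}
```

Only the schedule clause differs; which one wins depends on whether the threads share a cache, which is the effect the dual-processor vs. Hyper-Threading comparison shows.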
70
71
Support Vector Machines (SVM)
• A popular machine learning technique
• Evaluation of trained SVMs is very structured
  – Can be multithreaded at multiple levels
    · The dimensionality K of the input data can be very large
    · The evaluation of each expression in the sum is independent of the others
    · Several samples are tested, and each evaluation can be done in parallel
$F(\mathbf{x}) = \operatorname{sign}\left( \sum_{i=1}^{n} \alpha_i y_i \, \Phi(\mathbf{x}_i, \mathbf{x}) + b \right)$
72
Multi-Threading SVM

#include <ipp.h>

// samples, supportVector, coeffs, result and the NUM_* constants are
// defined elsewhere in the program (not shown on the slide);
// samples is assumed to be a 2-D array [NUM_SAMPLES][NUM_VEC_DIM].

// LINEAR KERNEL: dot product of one sample with support vector `index`,
// scaled by that support vector's coefficient.
float linear_kernel(const Ipp32f* pSrc1, int len, int index)
{
    Ipp32f result;
    ippsDotProd_32f(pSrc1, supportVector[index], len, &result);
    return result * coeffs[index];
}

int main()
{
    // Outer level: the samples are evaluated in parallel.
    #pragma omp parallel for
    for (int j = 0; j < NUM_SAMPLES; j++) {
        float sum = 0;
        // Inner level: the terms of the sum are independent and are combined
        // with a reduction. Nested parallel regions typically run with one
        // thread unless nested parallelism is enabled in the OpenMP runtime.
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < NUM_SUPP_VEC; i++) {                   // 1000
            float tmp = linear_kernel(samples[j], NUM_VEC_DIM, i);  // 24*24
            sum += tmp;
        }
        result[j] = sum;
    }
}
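Without IPP, the same evaluation can be written with a plain dot product. The sketch below computes $F(\mathbf{x}) = \operatorname{sign}(\sum_i \alpha_i y_i \, \mathbf{x}_i \cdot \mathbf{x} + b)$ for a linear kernel and parallelizes only the outer loop over samples, which sidesteps nested parallel regions. It assumes row-major arrays, that `coeffs[i]` already holds $\alpha_i y_i$, and that a zero sum maps to +1; `svm_decide`, `dot`, and `bias` are hypothetical names, not part of the original program.

```c
#include <stddef.h>

/* Linear kernel: dot product of two dim-length vectors. */
static float dot(const float *a, const float *b, int dim)
{
    float s = 0.0f;
    for (int k = 0; k < dim; k++)
        s += a[k] * b[k];
    return s;
}

/* For each sample j, label[j] = sign(sum_i coeffs[i] * <sv_i, x_j> + bias).
   samples: num_samples x dim, sv: num_sv x dim, both row-major.
   coeffs[i] is assumed to hold alpha_i * y_i. Writes +1 or -1. */
void svm_decide(const float *samples, int num_samples,
                const float *sv, const float *coeffs, int num_sv,
                int dim, float bias, int *label)
{
    /* Samples are independent, so only the outer loop is parallel;
       each thread accumulates its own sum with no synchronization. */
    #pragma omp parallel for schedule(static)
    for (int j = 0; j < num_samples; j++) {
        const float *x = &samples[(size_t)j * dim];
        float sum = bias;
        for (int i = 0; i < num_sv; i++)
            sum += coeffs[i] * dot(&sv[(size_t)i * dim], x, dim);
        label[j] = (sum >= 0.0f) ? 1 : -1;
    }
}
```

With support vectors (1, 0) and (0, 1), coefficients 1 and -1, and b = 0, the sample (2, 1) is labeled +1 and (1, 3) is labeled -1.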
73
Mutual Prefetching
• Cache locality is key
• Schedule threads to enforce locality
• Two threads
  – Require the same data sets
  – Prefetch data for each other
  – Mutual prefetching
• Excellent speedups on Hyper-Threading
[Figure: first outer-loop parallelism vs. second-loop parallelism; the products X1*V1, X1*V2, X1*V3 and X2*V1, X2*V2, X2*V3 divided between two threads.]
74
SVM on Hyper-Threading Technology
[Figure: performance on different architectures. Speed-up (0 to 3.5) of SVM-l-n and SVM-r-n on SP, SP + HT, DP, and DP + HT configurations.]
• Better cache utilization can improve performance significantly
75
Earlier Parallelization Scheme
[Figure: cache-unaware vs. cache-aware assignment of Part 1 (1) to (4) and Part 2 (1) to (4) to Processor 1 and Processor 2.]
76
Coarse-Grained and Fine-Grained Parallelized GCUT on SMP & CMP
[Figure: parallel speedup vs. number of cores for the coarse-grained and fine-grained schemes. Left: speedup on SMP (p. 55), 0 to 16 cores. Right: speedup on CMP, 0 to 64 cores.]
• Sharing cache improves performance
  – 23% of accesses touch shared data in the fine-grained scheme
Choose the "Right" Loop for Parallelization
78
Coarse-Grained and Fine-Grained Parallelized Hough Transform on CMP
[Figure, left: speedup; parallel speedup vs. number of cores (0 to 64) for the coarse-grained and fine-grained schemes. Right: coarse-grained speedup vs. size of on-die cache; parallel speedup vs. number of cores (0 to 64) with 16MB, 64MB, and 256MB caches.]
• Outer-loop parallelization reduces synchronization overhead, but increases memory demand
79
Coarse-Grained and Fine-Grained Parallelized AdaBoost on CMP
[Figure, left: speedup; parallel speedup vs. number of cores (0 to 64) when parallelizing over video frames vs. within a frame (frame partition). Right: cache miss rates vs. size of on-die cache; misses per kilo instructions (MPKI, 0 to 12) vs. cache size (512K to 128M) for the same two schemes.]
• Higher memory demand → lower performance
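The coarse-grained versus fine-grained choice comes down to which loop carries the `parallel for`. In this sketch, parallelizing over frames (the outer loop) needs only one implicit barrier, but each thread streams through a different frame, so the combined working set grows with the thread count; parallelizing over rows within a frame (the inner loop) keeps all threads on one frame's data but pays a reduction and a barrier per frame. `process_row` is a hypothetical per-row kernel (a pixel sum) standing in for the real work; it is not the AdaBoost or Hough code.

```c
#include <stddef.h>

/* Hypothetical per-row kernel: sum of the pixel values in one row. */
static long process_row(const unsigned char *row, int width)
{
    long s = 0;
    for (int x = 0; x < width; x++)
        s += row[x];
    return s;
}

/* Coarse-grained: parallelize the outer loop over frames.
   Few synchronizations, but one frame-sized working set per thread. */
void frames_coarse(const unsigned char *video, int num_frames, int height,
                   int width, long *frame_sum)
{
    #pragma omp parallel for schedule(static)
    for (int f = 0; f < num_frames; f++) {
        const unsigned char *frame = video + (size_t)f * height * width;
        long s = 0;
        for (int y = 0; y < height; y++)
            s += process_row(frame + (size_t)y * width, width);
        frame_sum[f] = s;
    }
}

/* Fine-grained: frames are processed in order and the rows of each frame are
   split among threads. All threads share one frame, but every frame ends with
   a reduction and an implicit barrier. */
void frames_fine(const unsigned char *video, int num_frames, int height,
                 int width, long *frame_sum)
{
    for (int f = 0; f < num_frames; f++) {
        const unsigned char *frame = video + (size_t)f * height * width;
        long s = 0;
        #pragma omp parallel for schedule(static) reduction(+:s)
        for (int y = 0; y < height; y++)
            s += process_row(frame + (size_t)y * width, width);
        frame_sum[f] = s;
    }
}
```

Both functions compute the same per-frame results; they differ only in synchronization frequency and in how much data is live in the cache at once.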
Balance Between Synchronization Overhead and Memory Pressure
81
Hough Transform Parallelized via Lock vs. Thread Privatization
[Figure: parallel speedup vs. number of cores for fine-grained lock vs. thread privatization; panels show speedup on SMP and speedup on CMP (up to 16 and 64 cores).]
• Thread privatization increases memory pressure
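The two Hough variants differ only in how votes reach the accumulator. In the sketch below, `hough_shared` keeps one accumulator and makes each vote an `omp atomic` increment, standing in for the fine-grained lock; `hough_private` uses an OpenMP 4.5 array-section reduction, which gives every thread its own zero-initialised copy of the accumulator and sums the copies when the loop ends. That removes per-vote synchronization but allocates one full accumulator per thread, which is where the extra memory pressure comes from. The caller supplies the cos/sin table; the bin layout (`acc[t * num_rho + r]`, shifted by `rho_offset`) and all names are assumptions for illustration.

```c
#include <string.h>

/* Round to the nearest int without needing libm. */
static int round_to_int(float v)
{
    return v >= 0.0f ? (int)(v + 0.5f) : -(int)(-v + 0.5f);
}

/* rho bin of point (x, y) for one angle, or -1 if it falls outside. */
static int rho_bin(int x, int y, float c, float s, int num_rho, int rho_offset)
{
    int r = round_to_int(x * c + y * s) + rho_offset;
    return (r >= 0 && r < num_rho) ? r : -1;
}

/* Shared accumulator: every vote is an atomic increment. */
void hough_shared(const int *xs, const int *ys, int n,
                  const float *cos_t, const float *sin_t, int num_theta,
                  int num_rho, int rho_offset, int *acc)
{
    memset(acc, 0, sizeof(int) * (size_t)num_theta * (size_t)num_rho);
    #pragma omp parallel for
    for (int p = 0; p < n; p++) {
        for (int t = 0; t < num_theta; t++) {
            int r = rho_bin(xs[p], ys[p], cos_t[t], sin_t[t], num_rho, rho_offset);
            if (r < 0)
                continue;
            #pragma omp atomic
            acc[t * num_rho + r]++;
        }
    }
}

/* Thread privatization: private per-thread accumulators, merged at the end
   by the array-section reduction. No synchronization per vote. */
void hough_private(const int *xs, const int *ys, int n,
                   const float *cos_t, const float *sin_t, int num_theta,
                   int num_rho, int rho_offset, int *acc)
{
    size_t len = (size_t)num_theta * (size_t)num_rho;
    memset(acc, 0, sizeof(int) * len);
    #pragma omp parallel for reduction(+ : acc[:len])
    for (int p = 0; p < n; p++) {
        for (int t = 0; t < num_theta; t++) {
            int r = rho_bin(xs[p], ys[p], cos_t[t], sin_t[t], num_rho, rho_offset);
            if (r < 0)
                continue;
            acc[t * num_rho + r]++;
        }
    }
}
```

Both produce identical accumulators; with T threads the privatized version touches roughly T times as much accumulator memory, the trade-off the speedup curves above compare.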
82
Outline
• Motivation & current trends
• Theory and principles in parallelization
• Advanced optimization techniques in multi-core architecture (CPU and GPU have different treatments)
• A specific design example
  – Increase SIMD parallelism
  – Avoid synchronization
• Summary
83
Continuous Speech Recognition
[Figure: recognition models. A WFST recognition network (word nodes such as HOP, ON, POP, CAT, HAT, IN, THE) built from a bigram language model; a pronunciation model (HOP: hh aa p; ON: aa n; POP: p aa p); an HMM acoustic phone model; and a Gaussian mixture model for one phone state, scored against the features from one frame. Evaluating the mixture model means computing the distance to each mixture component, then computing the weighted sum of all components.]
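The two Gaussian-mixture steps can be sketched as follows, assuming diagonal covariances and log-domain scores. The per-component constant `gconst[m]` (folding in the mixture weight and the Gaussian normalisation) and the inverse variances `ivar` are assumed to be precomputed. One substitution to note plainly: step 2 takes the maximum over components, the common max (Viterbi) approximation of the weighted sum; an exact version would combine the component scores with log-sum-exp. Function and parameter names are hypothetical.

```c
#include <float.h>
#include <stddef.h>

/* Log score of one frame x (length dim) for one phone state's GMM.
   mean and ivar are num_mix x dim, row-major; ivar holds 1/variance.
   gconst[m] is assumed precomputed: log(weight_m) plus the Gaussian
   normalisation term. */
float gmm_log_score_max(const float *x, int dim,
                        const float *mean, const float *ivar,
                        const float *gconst, int num_mix)
{
    float best = -FLT_MAX;
    for (int m = 0; m < num_mix; m++) {
        const float *mu = mean + (size_t)m * dim;
        const float *iv = ivar + (size_t)m * dim;

        /* Step 1: variance-weighted squared distance to component m. */
        float dist = 0.0f;
        for (int d = 0; d < dim; d++) {
            float diff = x[d] - mu[d];
            dist += diff * diff * iv[d];
        }

        /* Step 2: combine components (max approximation of the weighted sum). */
        float score = gconst[m] - 0.5f * dist;
        if (score > best)
            best = score;
    }
    return best;
}
```

Each active phone state is scored independently, so a loop over active states calling this function is the natural place for a `parallel for` in the observation-probability phase.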
84
Inference Engine
• Hierarchical structure
  – Iterative outer loop over time steps
  – Pipeline of operations in each time step
  – Set of alternative hypotheses to advance
[Figure: one iteration per time step (~60M instructions), made of Phase 1 (observation probability computation, compute intensive), Phase 2 (non-epsilon traversal), and Phase 3 (epsilon traversal), the traversal phases being communication intensive. There are multiple steps in a phase, each with 1000s to 10,000s of concurrent tasks (10 to 500 instructions each): extensive fine-grained parallelism at the innermost level, but sequential operation with iteration-to-iteration dependencies.]
85
Recognition Process
• Phase 1:
  – Observation probability computation is only required for out-going arcs of active states
  – Highly compute-intensive step
• Phase 2:
  – Traverse out-going non-epsilon arcs from active states
  – Write contention must be resolved at the destination states
  – Destination state is updated with the most-likely in-coming arc
• Phase 3:
  – Traverse out-going epsilon arcs to complete the iteration
86
Recognition is a process of graph traversal
[Figure: Phase 1 (Obs. prob. compute), Phase 2 (Non-eps traversal), and Phase 3 (Epsilon traversal), each operating on the WFST recognition network.]
87
Graph Traversal
[Figure: a recognition network with states 1 to 12, shown before and after one traversal step.]
• Arc evaluation: Source state cost + Observation prob. + Arc weight
88
Inference Engine Challenges
[Figure: multiple cores with caches ($) traversing the WFST recognition network, annotated with the two challenges: synchronization and SIMD efficiency.]
• Application:
  – Parallel graph traversal through an irregular network
  – Continuously changing working set at runtime
• Challenges:
  – Efficiently synchronize between concurrent tasks
  – Effectively utilize all levels of parallel resources, including SIMD
89
Write Conflict in Arc Traversal
• Update of destination state
  – The minimum incoming cost should be adopted
• Possible approaches
  – Make the update atomic
  – Privatization
  – Lock-based implementation
[Figure: Thread 0 and Thread 1 traversing arcs among states 1 to 6.]
90
Design Trade-offs for Synchronization
• Challenge:
  – The cost of write-conflict resolution
• Experiment:
  – Allow traversal to either propagate from the source or aggregate at the destination for write-conflict resolution
• Traversal by propagation
  – Advantages: easy to program; hardware handles write conflicts transparently
  – Disadvantages: one atomic operation for every arc; the large number of atomic operations makes it sensitive to atomic-operation latency
• Traversal by aggregation
  – Advantages: explicit resolution of write conflicts; not sensitive to the hardware efficiency of atomic operations
  – Disadvantages: involves significant overhead in building lists of to-be-updated destination states
[Figure: current states and next states under each traversal scheme.]
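The two write-conflict strategies can be contrasted in a small C sketch. `traverse_propagation` runs one task per arc and protects the min-update of the destination with an OpenMP critical section (the lock-based option); `traverse_aggregation` assumes the incoming arcs have already been grouped by destination in a CSR-style list (`in_start`, `in_arc`), so each destination is owned by one task and needs no synchronization, and building that grouping is exactly the list-building overhead listed as aggregation's disadvantage. Costs are minimized, following "source state cost + observation prob. + arc weight" in the cost domain; `obs_cost[a]` is assumed to be the observation cost for arc a's label. The `Arc` layout and all names are hypothetical.

```c
#include <float.h>

typedef struct {
    int src, dst;   /* source and destination state indices */
    float weight;   /* arc weight (cost) */
} Arc;

/* Traversal by propagation: one task per arc; concurrent updates of the
   same destination are serialized by a critical section. */
void traverse_propagation(const Arc *arcs, int num_arcs,
                          const float *src_cost, const float *obs_cost,
                          float *dst_cost, int num_states)
{
    for (int s = 0; s < num_states; s++)
        dst_cost[s] = FLT_MAX;
    #pragma omp parallel for
    for (int a = 0; a < num_arcs; a++) {
        float c = src_cost[arcs[a].src] + obs_cost[a] + arcs[a].weight;
        #pragma omp critical
        {
            if (c < dst_cost[arcs[a].dst])
                dst_cost[arcs[a].dst] = c;
        }
    }
}

/* Traversal by aggregation: incoming arcs of state s are
   in_arc[in_start[s]] .. in_arc[in_start[s+1]-1]. One task per destination,
   so the min is computed locally and written once, without locks. */
void traverse_aggregation(const Arc *arcs, const int *in_start, const int *in_arc,
                          const float *src_cost, const float *obs_cost,
                          float *dst_cost, int num_states)
{
    #pragma omp parallel for
    for (int s = 0; s < num_states; s++) {
        float best = FLT_MAX;
        for (int k = in_start[s]; k < in_start[s + 1]; k++) {
            int a = in_arc[k];
            float c = src_cost[arcs[a].src] + obs_cost[a] + arcs[a].weight;
            if (c < best)
                best = c;
        }
        dst_cost[s] = best;
    }
}
```

Both produce the same destination costs; which is faster depends on how cheaply the platform resolves write conflicts versus how costly the grouping step is.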
91
Synchronization Cost
• The fixed cost (overhead) of the aggregation technique is significant
• The relative gradient of the propagation and aggregation techniques depends on the efficiency of the platform in resolving write conflicts
[Figure: measured synchronization cost in GTX280. Total synchronization cost [sec] (0 to 3.5) vs. number of arcs synchronized [millions of arcs] (0 to 100).]
92
SIMD Utilization Efficiency
• Challenge:
  – Vector unit efficiency can quickly drop off with increased vector width
• Experiment:
  – Traverse the recognition network based on active states or on active arcs
• Active states
  – Advantages: easy to program; all active arcs emit from active states
  – Disadvantages: the out-degree of active states varies widely, so it is difficult to fully utilize vector units
• Active arcs
  – Advantages: parallelization at finer granularity can achieve better load balancing
  – Disadvantages: more information to maintain from iteration to iteration; there are always more active arcs than active states
[Figure: current states and next states for each scheme.]
93
SIMD Utilization Efficiency
Speedup and SIMD Efficiency in State-Based Traversal
[Figure: active states mapped onto SIMD. SIMD utilization (0 to 100%, with extra work shown) and speedup over the sequential case (0 to 10) vs. SIMD width (1, 2, 4, 8, 16, 32), for traversal by states.]
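A simple model shows why the state-based mapping loses SIMD efficiency as the width grows: if each active state gets one SIMD group and its out-arcs fill the lanes, a state with out-degree d occupies ceil(d/W) steps of width W, while an arc-based mapping packs all active arcs densely. The functions below compute lanes used divided by lanes issued under that idealized model; it ignores gather costs and the extra bookkeeping that active arcs require, and the function names are hypothetical.

```c
/* State-based mapping: each state's out-arcs are padded to a multiple of
   the SIMD width. Returns used lanes / issued lanes. */
double simd_util_states(const int *out_degree, int num_states, int width)
{
    long used = 0, issued = 0;
    for (int s = 0; s < num_states; s++) {
        int d = out_degree[s];
        long steps = (d + width - 1) / width;   /* ceil(d / width) */
        used += d;
        issued += steps * width;
    }
    return issued ? (double)used / (double)issued : 1.0;
}

/* Arc-based mapping: all active arcs are packed into full SIMD groups,
   with padding only in the last group. */
double simd_util_arcs(const int *out_degree, int num_states, int width)
{
    long used = 0;
    for (int s = 0; s < num_states; s++)
        used += out_degree[s];
    long steps = (used + width - 1) / width;
    return steps ? (double)used / (double)(steps * width) : 1.0;
}
```

With out-degrees {1, 2, 3, 10} and W = 4, the state-based mapping uses 16 of 24 issued lanes (about 67%), while the arc-based mapping fills all 16; at W = 1 both are fully utilized.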
94
Design Space
• Active states, traversal by propagation: maintain active source states; propagate out-arc computation results to destination states
• Active states, traversal by aggregation: maintain active destination states; determine all potential destination states and aggregate incoming arcs
• Active arcs, traversal by propagation: maintain active arcs; propagate active-arc computation results to destination states
• Active arcs, traversal by aggregation: maintain active arcs; group arcs with the same destination states and aggregate active arcs locally to resolve write conflicts
[Figure: current states and next states for each of the four combinations.]
95
Performance on CPU
RTF: real-time factor. Speedup is relative to the sequential implementation; per-phase columns are in RTF.

Implementation           | RTF   | Speedup | Obs. prob. comp. | Non-eps traversal | Eps traversal | Seq. overhead
Sequential               | 3.17  | 1x      | 2.623            | 0.474             | 0.073         | -
State-based aggregation  | 2.593 | 1.2x    | 0.754            | 1.356             | 0.482         | 0.001
Arc-based propagation    | 1.006 | 3.2x    | 0.737            | 0.242             | 0.026         | 0.001
State-based propagation  | 0.925 | 3.4x    | 0.732            | 0.157             | 0.035         | 0.001
96
Performance on GPU
RTF: real-time factor. Speedup is relative to the sequential implementation; per-phase columns are in RTF.

Implementation           | RTF   | Speedup | Obs. prob. comp. | Non-eps traversal | Eps traversal | Seq. overhead
Sequential               | 3.17  | 1x      | 2.623            | 0.474             | 0.073         | -
State-based aggregation  | 1.203 | 2.6x    | 0.147            | 0.770             | 0.272         | 0.014
Arc-based aggregation    | 0.912 | 3.5x    | 0.148            | 0.469             | 0.281         | 0.014
State-based propagation  | 0.776 | 4.1x    | 0.148            | 0.512             | 0.108         | 0.008
Arc-based propagation    | 0.302 | 10.5x   | 0.148            | 0.103             | 0.043         | 0.008

Fastest algorithm style differs between platforms.
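The speedup figures follow from the real-time factor: the total RTF of a run is the sum of its per-phase times, and the speedup is the sequential RTF (3.17) divided by that total. A minimal sketch of that arithmetic, using the per-phase numbers reported for the GPU arc-based propagation run; the struct and function names are hypothetical.

```c
/* Per-phase real-time factors of one implementation. */
typedef struct {
    const char *name;
    double obs_prob;      /* observation probability computation */
    double non_eps;       /* non-epsilon arc traversal */
    double eps;           /* epsilon arc traversal */
    double seq_overhead;  /* sequential overhead */
} RtfBreakdown;

/* Total RTF is the sum of the phase times. */
double rtf_total(const RtfBreakdown *b)
{
    return b->obs_prob + b->non_eps + b->eps + b->seq_overhead;
}

/* Speedup relative to the sequential implementation's RTF. */
double speedup_vs_seq(const RtfBreakdown *b, double seq_rtf)
{
    return seq_rtf / rtf_total(b);
}
```

For the GPU arc-based propagation run, 0.148 + 0.103 + 0.043 + 0.008 = 0.302, and 3.17 / 0.302 ≈ 10.5.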
97
Outline
• Motivation & current trends
• Theory and principles in parallelization
• Advanced optimization techniques in multi-core architecture (CPU and GPU have different treatments)
• A specific design example
• Summary
98
Key Learnings
• Avoid serial dependencies
  – Parallelize as much as possible
  – Re-order constraints
• Increase load balance
  – Dynamic task assignment
  – Make tasks smaller
  – Fuse multiple loops
  – Data-domain decomposition instead of functional-domain decomposition
• Reduce overhead
  – Parallelize the outer loop
  – Reduce communication
  – Reduce repeated computation
  – Reduce locking overhead
• Utilize cache efficiently
  – Take advantage of shared cache
  – Choose the "right" loop for parallelization
  – Balance between synchronization overhead & memory pressure
$t_P = t_s + \max_{1 \le i \le N}\left[\, t_{P_i} \,\right] + t_O$
99
Summary
Multi-Core is Going Mainstream
• Architecture: Number of cores per chip will grow quickly
  – Power consumption is a key driver
• Algorithm: Future processors demand special designs
  – Must consider parallelism
    · Algorithms often need changes
• Algorithm-Architecture Co-Exploration
  – Architectures with different characteristics need different implementation and optimization strategies
100
New Paradigm
• Challenge traditional ways of doing/thinking
  – Numerous emerging applications are enabled by multi-core computing
  – Develop parallel software and algorithms for future systems with many cores
    · Learn parallel programming, if you haven't
    · "The most important new things we must teach students in computing is how to think in parallel." - John Owens, UC Davis
101
Footnote
• Many ways to write multi-threaded code, e.g.,
  – CUDA (cf. page 111)
  – Win32 threads (cf. page 19)
  – POSIX Threads, a.k.a. pthreads
  – OpenMP (cf. page 72)
• Intel also provides tools to help develop threaded software, e.g., (http://software.intel.com/en-us/intel-parallel-studio-home/)
  – Intel® Parallel Composer
  – Intel® Parallel Inspector
  – Intel® Parallel Amplifier
References
103
Journal Papers
• "Image Processing on Multi-Core x86 Architectures - Optimization Techniques and Examples," D. Kim, V. Lee, and Y.-K. Chen, to appear in IEEE Signal Processing Magazine, March 2010.
• "Algorithm/Architecture Co-Exploration of Visual Computing on Emerging Platforms," G. G. C. Lee, Y.-K. Chen, M. Mattavelli, and E. S. Jang, IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 11, pp. 1576-1587, Nov. 2009.
• "Parallel Scalability in Speech Recognition - Inference Engines in Large Vocabulary Continuous Speech Recognition," K. You, J. Chong, Y. Yi, E. Gonina, C. Hughes, Y.-K. Chen, W. Sung, and K. Keutzer, IEEE Signal Processing Magazine, vol. 26, no. 6, pp. 124-135, Nov. 2009.
• "Parallelization Strategies and Performance Analysis of Media Mining Applications on Multi-Core Processors," W. Li, X. Tong, T. Wang, Y. Zhang, and Y.-K. Chen, Journal of Signal Processing Systems, vol. 57, no. 2, pp. 213-228, Nov. 2009.
• "Accelerating Video-Mining Applications Using Many Small, General-Purpose Cores," E. Li, W. Li, X. Tong, J. Li, Y. Chen, T. Wang, P. Wang, W. Hu, Y. Du, Y. Zhang, and Y.-K. Chen, IEEE Micro, vol. 28, no. 5, pp. 8-21, Sept. 2008.
• "High-Performance Physical Simulations on Next-Generation Architecture with Many Cores," Y.-K. Chen, J. Chhugani, C. Hughes, D. Kim, S. Kumar, V. Lee, A. Lin, A. Nguyen, E. Sifakis, and M. Smelyanskiy, Intel Technology Journal, Aug. 2007.
• "Media Mining - Emerging Tera-scale Computing Applications," Y. Chen, E. Li, W. Li, T. Wang, J. Li, X. Tong, P. Wang, W. Hu, Y. Zhang, and Y.-K. Chen, Intel Technology Journal, Aug. 2007.
• "Media Applications on Hyper-Threading Technology," Y.-K. Chen, M. Holliman, E. Debes, S. Zheltov, A. Knyazev, S. Bratanov, R. Belenov, and I. Santos, Intel Technology Journal, Feb. 2002.
• "Implementation of H.264 Encoder and Decoder on Personal Computers," Y.-K. Chen, E. Q. Li, X. Zhou, and S. L. Ge, Journal of Visual Communication and Image Representation, vol. 17, no. 2, pp. 509-532, Apr. 2006.
• "A Compiler for Exploiting Nested Parallelism in OpenMP Programs," X. Tian, J. Hoeflinger, G. Haab, Y.-K. Chen, M. Girkar, and S. Shah, Parallel Computing Journal, vol. 31, no. 10-12, pp. 960-983, Oct. 2005.
104
Co
nfe
ren
ce
Pa
pe
rs
Co
nfe
ren
ce
Pa
pe
rs
��"C
hallen
ges a
nd
Op
po
rtu
nit
ies o
f O
bta
inin
g P
erf
orm
an
ce f
rom
Mu
lti
"Ch
allen
ges a
nd
Op
po
rtu
nit
ies o
f O
bta
inin
g P
erf
orm
an
ce f
rom
Mu
lti--
Co
re C
PU
s
Co
re C
PU
s
an
d M
an
yan
d M
an
y--C
ore
GP
Us,"
T. C
hen
an
d Y
.C
ore
GP
Us,"
T. C
hen
an
d Y
.--K
. C
hen
, in
IE
EE
In
tern
ati
on
al C
on
fere
nce
K. C
hen
, in
IE
EE
In
tern
ati
on
al C
on
fere
nce
on
Aco
usti
cs,
Sp
eech
, an
d S
ign
al P
rocessin
g (
ICA
SS
P),
Ap
r. 2
009.
on
Aco
usti
cs,
Sp
eech
, an
d S
ign
al P
rocessin
g (
ICA
SS
P),
Ap
r. 2
009.
��"P
ara
llelizati
on
Of
"Para
llelizati
on
Of
Ad
aB
oo
st
Ad
aB
oo
st
Alg
ori
thm
On
Mu
lti
Alg
ori
thm
On
Mu
lti--
Co
re P
rocesso
rs,"
Y.
Co
re P
rocesso
rs,"
Y.--
K. C
hen
, W
. K
. C
hen
, W
. L
i, a
nd
X. T
on
g, IE
EE
Wo
rksh
op
on
Sig
nal P
rocessin
g S
yste
ms,
Oct.
2008.
Li, a
nd
X. T
on
g, IE
EE
Wo
rksh
op
on
Sig
nal P
rocessin
g S
yste
ms,
Oct.
2008.
• “Novel Parallel Hough Transform on Multi-Core Processors,” Y.-K. Chen, W. Li, J. Li, and T. Wang, in Int’l Conf. on Acoustics, Speech, and Signal Processing, Apr. 2008.
• “Parallelization, Performance Analysis, and Algorithm Consideration of Hough Transform on Chip Multiprocessors,” W. Li and Y.-K. Chen, in Workshop on Design, Architecture and Simulation of Chip Multi-Processors, Dec. 2007.
• “Computer Vision on Multi-Core Processors: Articulated Body Tracking,” T. Chen, D. Budnikov, C. Hughes, and Y.-K. Chen, in Int’l Conf. on Multimedia and Expo, July 2007.
• “Adaptive Parallel Graph Mining for CMP Architectures,” G. Buehrer, S. Parthasarathy, and Y.-K. Chen, in Int’l Conf. on Data Mining, pp. 97-106, Dec. 2006.
• “Workload Characterization of a Parallel Video Mining Application on a 16-Way Shared-Memory Multiprocessor System,” W. Li, E. Li, C. Dulong, Y.-K. Chen, T. Wang, and Y. Zhang, in Int’l Symp. on Workload Characterization, pp. 7-16, Oct. 2006.
• “Efficient Frequent Pattern Mining on Shared Memory Systems: Implications for Chip Multiprocessor Architectures,” G. Buehrer, S. Parthasarathy, A. Ghoting, Y.-K. Chen, D. Kim, and A. Nguyen, in Memory Systems Performance and Correctness Workshop, Oct. 2006.
Conference Papers (Video Codec Related)
• “Implementation of H.264 Encoder on General-Purpose Processors with Hyper-Threading Technology,” E. Q. Li and Y.-K. Chen, in Proc. of SPIE Visual Communications and Image Processing, vol. 5308, pp. 384-395, Jan. 2004.
• “Towards Efficient Multi-Level Threading of H.264 Encoder on Intel Hyper-Threading Architectures,” Y.-K. Chen, X. Tian, S. Ge, and M. Girkar, in Proc. of Int’l Parallel and Distributed Processing Symp., Apr. 2004.
• “Efficient Multithreading Implementation of H.264 Encoder on Intel Hyper-Threading Architectures,” S. Ge, X. Tian, and Y.-K. Chen, in Pacific-Rim Conf. on Multimedia, Dec. 2003.
• “Exploring the Use of Hyper-Threading Technology for Multimedia Applications with Intel OpenMP Compiler,” X. Tian, Y.-K. Chen, M. Girkar, S. Ge, R. Lienhart, and S. Shah, in Int’l Parallel and Distributed Processing Symp., pp. 36-43, Apr. 2003.
• “The Impact of SMT/SMP Designs on Multimedia Software Engineering: A Workload Analysis Study,” Y.-K. Chen, R. Lienhart, E. Debes, M. Holliman, and M. Yeung, in Proc. of Int’l Symp. on Multimedia Software Engineering, Dec. 2002.
• “Video Applications on Hyper-Threading Technology,” Y.-K. Chen, M. Holliman, and E. Debes, in Int’l Conf. on Multimedia and Expo, vol. 2, pp. 193-196, Aug. 2002.
Special Issues
• “Multi-Core Enabled Multimedia Applications & Architectures,” Journal of Signal Processing Systems (Nov. 2009)
• “Signal Processing on Platforms with Multiple Cores,” IEEE Signal Processing Magazine
  – Part 1: Overview and Methodology (Nov. 2009)
  – Part 2: Design and Applications (March 2010)
• “Algorithm/Architecture Co-Exploration of Visual Computing,” IEEE Transactions on Circuits and Systems for Video Technology (Nov. 2009)
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit www.intel.com/performance/ or call (U.S.) 1-800-628-8686 or 1-916-356-3104.