Miodrag Bolic
ARCHITECTURES FOR EFFICIENT IMPLEMENTATION OF PARTICLE FILTERS
Department of Electrical and Computer Engineering, Stony Brook University
Advisor: Prof. Petar M. Djuric
Dissertation Defense

Outline
PART I: Introduction
  Motivation and goals
  Challenges
PART II: Theory of PFs
  Dynamic model
  Monte Carlo sampling
  Importance sampling
  Resampling
  Bearings-only tracking example
  Steps and complexity
PART III: Implementation of PFs
  VLSI signal processing architectures
  Methodology
  Non-parallel implementation: algorithm characteristics, modifications of the PF, new resampling algorithms, architecture, implementation results
  Parallel implementation: propagation of particles, parallel resampling, architectures for parallel resampling, space exploration
  Gaussian PFs
Conclusions and future work

Introduction – Motivations and Goals
Goal: increase the speed of particle filters
[Figure: a sensor supplies the observed signal to a particle filter chip, which outputs the estimate.]

Introduction - Challenges
Challenges
• Reducing computational complexity
• Randomness: difficult to exploit regular structures in VLSI
• Exploiting temporal and spatial concurrency
Contributions
• First hardware implementation of particle filters (a 50-fold improvement in speed in comparison with a DSP)
• New resampling algorithms suitable for hardware implementation
• Fast particle filtering algorithms that do not use memories
• First distributed algorithms and architectures for particle filters
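
Theory of PFs – Dynamic model
General dynamic model
  State equation: x_k = f_x(x_{k-1}, u_k), where f_x is the state transition function and u_k is the process noise
  Observation equation: z_k = f_z(x_k, v_k), where f_z is the measurement function and v_k is the observation noise
Example: Bearings-only tracking
  States (position and velocity): x_k = [x_k, Vx_k, y_k, Vy_k]^T
  Observations: angle z_k
  State equation: x_k = F x_{k-1} + G u_k
  Observation equation: z_k = atan(y_k / x_k) + v_k
[Figure: target trajectory in the x-y plane, showing positions (x_k, y_k) and (x_{k+1}, y_{k+1}) with the corresponding bearings z_k and z_{k+1}.]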
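
The bearings-only model can be simulated with a short sketch. The slides do not give F, G, the sampling interval or the noise levels, so the constant-velocity matrices written out inside `propagate`, `T = 1.0` and the default standard deviations are assumptions made for illustration. The sketch also uses `atan2` instead of `atan(y/x)` so that the bearing is defined in all four quadrants.

```python
import math
import random

T = 1.0  # sampling interval (assumed, not given in the slides)

def propagate(state, sigma_u, rng):
    """One step of x_k = F x_{k-1} + G u_k for the state [x, Vx, y, Vy].

    F and G are those of a constant-velocity model (an assumption).
    """
    x, vx, y, vy = state
    ux = rng.gauss(0.0, sigma_u)  # process noise in the x direction
    uy = rng.gauss(0.0, sigma_u)  # process noise in the y direction
    return [
        x + T * vx + 0.5 * T * T * ux,
        vx + T * ux,
        y + T * vy + 0.5 * T * T * uy,
        vy + T * uy,
    ]

def observe(state, sigma_v, rng):
    """Bearing z_k = atan2(y_k, x_k) + v_k."""
    x, _, y, _ = state
    return math.atan2(y, x) + rng.gauss(0.0, sigma_v)

def simulate(x0, steps, sigma_u=0.01, sigma_v=0.005, seed=0):
    """Return the list of states and the list of noisy bearings."""
    rng = random.Random(seed)
    state = list(x0)
    states, bearings = [], []
    for _ in range(steps):
        state = propagate(state, sigma_u, rng)
        states.append(state)
        bearings.append(observe(state, sigma_v, rng))
    return states, bearings

# Example: target starting at (1, 1) and moving along x
states, bearings = simulate([1.0, 0.1, 1.0, 0.0], steps=10)
```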

Theory of PFs – Bayesian approach
Objective in Bayesian approach
  Unknown state: x_k
  Posterior distribution: p(x_{0:k} | z_{1:k})
Use of knowing the posterior
  All kinds of estimates can be calculated
Choice of filter
  Gaussian processes and linear model: Kalman filter
  Non-Gaussian processes and/or non-linear model: particle filter
Problem and solution
  State space model: estimate posterior
  Problem: integrals are not tractable. Solution: Monte Carlo sampling
  Problem: difficult to draw samples. Solution: importance sampling

Theory of PFs – Monte Carlo Sampling
Densities can be approximated by discrete random measures made up of particles x^(m) and weights w^(m):
  χ = {x^(m), w^(m)}, m = 1, ..., M
• χ approximates the density p(x)
• Integrals simplify to summations
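
As a small illustration of how integrals simplify to summations, the sketch below approximates E[g(x)] by a weighted sum over the particles. The Gaussian density N(2, 1), the particle count and the helper name `mc_expectation` are assumptions chosen for the example.

```python
import random

def mc_expectation(g, particles, weights):
    """Approximate the integral of g(x) p(x) dx by sum_m w^(m) g(x^(m))."""
    total = sum(weights)
    return sum(w * g(x) for x, w in zip(particles, weights)) / total

rng = random.Random(1)
M = 20000
particles = [rng.gauss(2.0, 1.0) for _ in range(M)]  # samples from p(x) = N(2, 1)
weights = [1.0 / M] * M                               # equal weights

mean_est = mc_expectation(lambda x: x, particles, weights)               # true value: 2
second_moment_est = mc_expectation(lambda x: x * x, particles, weights)  # true value: 5
```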

Theory of PFs - Importance Sampling
Objective: approximate a density p(x) by a discrete random measure
Steps:
1. Generation of particles (from a proposal density)
2. Updating of the weights (using Bayes' theorem)
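
The two steps can be sketched for a generic one-dimensional case. The target N(1, 0.5^2), the proposal N(0, 2^2) and the helper names are assumptions for illustration. In a particle filter the target would be the posterior and the weight update would follow from Bayes' theorem; here the weight is simply the ratio of target to proposal density.

```python
import math
import random

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def importance_sample(target_pdf, proposal_draw, proposal_pdf, M, rng):
    # Step 1: generate particles from the proposal density
    particles = [proposal_draw(rng) for _ in range(M)]
    # Step 2: weight each particle by target / proposal, then normalize
    raw = [target_pdf(x) / proposal_pdf(x) for x in particles]
    total = sum(raw)
    weights = [w / total for w in raw]
    return particles, weights

rng = random.Random(7)
particles, weights = importance_sample(
    target_pdf=lambda x: normal_pdf(x, 1.0, 0.5),
    proposal_draw=lambda r: r.gauss(0.0, 2.0),
    proposal_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    M=20000,
    rng=rng,
)
# Weighted mean of the particles; should be close to the target mean of 1
mean_est = sum(w * x for x, w in zip(particles, weights))
```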

Theory of PFs - Resampling
Problems:
• Weight degeneration
• Wastage of computational resources
Solution: resampling
• Replicate particles in proportion to their weights
[Figure: particles after resampling at successive time instants.]
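
Replication in proportion to the weights can be sketched with systematic resampling, one common scheme. The slides do not say which scheme is used at this point, so this choice and the function name are assumptions.

```python
import random

def systematic_resample(weights, rng):
    """Return replication counts: how many copies of each particle to keep."""
    M = len(weights)
    total = sum(weights)
    counts = [0] * M
    u = rng.random() / M      # one uniform draw in [0, 1/M)
    cumulative = 0.0
    m = 0                     # index of the next comb point u + m/M
    for i, w in enumerate(weights):
        cumulative += w / total
        # Count the comb points that fall inside this particle's weight interval
        while m < M and u + m / M < cumulative:
            counts[i] += 1
            m += 1
    counts[-1] += M - m       # guard against floating-point round-off
    return counts

counts = systematic_resample([0.1, 0.2, 0.3, 0.4], random.Random(0))
```

Each particle with weight w ends up with either floor(M·w) or ceil(M·w) copies, and the total number of particles stays at M.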
Theory of PFs – Bearings-Only Tracking Example
Theory of PFs - Bearings-Only Tracking Example (Cont.)
• Blue – True trajectory
• Red – Estimates

Theory of PFs – Steps and Complexity
Steps
1. Initialize particles
2. New observation
3. Particle generation (particles 1, 2, ..., M)
4. Weight computation (particles 1, 2, ..., M)
5. Normalize weights
6. Output estimates
7. Resampling
8. More observations? If yes, return to step 2; if no, exit
Complexity (bearings-only tracking problem, number of particles M = 1000)
• 4M random number generations
• M exponential and arctangent functions
• Propagation of the particles
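
The loop of steps can be sketched for a scalar model that is simpler than bearings-only tracking. The model x_k = x_{k-1} + u_k, z_k = x_k + v_k, the prior N(0, 1), the noise levels and the multinomial resampling via `random.choices` are all assumptions made to keep the example short.

```python
import math
import random

def sir_filter(observations, M=500, sigma_u=0.1, sigma_v=0.5, seed=0):
    rng = random.Random(seed)
    # Initialize particles (assumed prior: N(0, 1))
    particles = [rng.gauss(0.0, 1.0) for _ in range(M)]
    estimates = []
    for z in observations:  # new observation
        # Particle generation: draw from the state transition density
        particles = [x + rng.gauss(0.0, sigma_u) for x in particles]
        # Weight computation: Gaussian likelihood of z for each particle
        weights = [math.exp(-0.5 * ((z - x) / sigma_v) ** 2) for x in particles]
        # Normalize weights (fall back to uniform if all weights underflow)
        total = sum(weights)
        if total == 0.0:
            weights = [1.0 / M] * M
        else:
            weights = [w / total for w in weights]
        # Output estimate: weighted mean of the particles
        estimates.append(sum(w * x for x, w in zip(particles, weights)))
        # Resampling: multinomial draw in proportion to the weights
        particles = rng.choices(particles, weights=weights, k=M)
    return estimates

# Synthetic data from the same assumed model
rng = random.Random(42)
truth, obs = [], []
x = 0.0
for _ in range(50):
    x += rng.gauss(0.0, 0.1)
    truth.append(x)
    obs.append(x + rng.gauss(0.0, 0.5))

estimates = sir_filter(obs)
rmse = math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth))
```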

Implementation of PFs – VLSI Signal Processing Architectures
Types of architectures
• Programmable digital signal processors
• Application-domain specific processors
• Application specific processors
Approach: application specific processors
• Temporal and spatial concurrency
• One-to-one mapping between operations and hardware blocks
• FPGA implementation
• Speed is the main goal
• Functionality of the system does not change

Implementation of PFs – Methodology
Joint algorithmic and architectural design
• To increase performance, algorithms must be matched to architectures
[Figure: design abstraction levels (system, algorithmic, architecture, RT, gate), showing how the impact of a design decision and the complexity vary across the levels.]

Implementation of PFs – Algorithm Characteristics
Particle generation and weight computation
• High computational complexity
• No data dependencies among particles
• Complexity depends on the state space model
• Suitable for parallel and pipelined implementation
Resampling
• Data-dependent algorithm
• Low-complexity operations
• Propagation of particles: random
• Algorithm does not depend on the state space model
[Figure: flowchart. Start, new observation, particle generation (1, 2, ..., M), weight computation (1, 2, ..., M), resampling, propagation of particles, exit.]

Implementation of PFs – Modifications of the PF
Modifications (algorithm and architecture)
• Fine-grain pipelining
• Avoiding normalization
• Loop transformations
• Finite precision arithmetic
• Spatial concurrency
• Dedicated hardware
• Addressing schemes

Parameter       Current      Limits
Sample period   ~2M·Tclk     ~M·Tclk
Memories        (2N+1)M      (N+1)M
[Figure: timing diagrams of the SIRF stages (generation of particles, weight computation, resampling, output calculation) over sample periods T and T+1.]

Implementation of PFs – New Resampling Algorithms

Parameter       Algorithm 1                Algorithm 2
Sample period   ~2M·Tclk                   ~M·Tclk
Memories        Particle memory: (N+1)M    Particle memory: (N+1)M
                Index memory: 2M           Index memory: 4M
Performance     Same                       Worse (deterministic algorithm)
[Figure: timing diagrams of the SIRF stages (generation of particles, weight computation, resampling, output calculation) over sample periods T and T+1 for the two algorithms.]

Implementation of PFs – Architecture
[Figure: block diagram of the architecture. Blocks: particle generation, weight computation, resampling, particle memory (PMEM), PMEM control, index memory, replication factor memory. Signals: addr, data.]

Implementation of PFs – Implementation results
Setup
• Hardware platform is Xilinx Virtex-II Pro
• Clock period is 10 ns
• The PF is applied to the bearings-only tracking problem
• 1000 particles are used
Percentage of utilization of the PF blocks
               Particle generation   Weight computation   Resampling
Logic blocks   16%                   75%                  9%
Block RAMs     67%                   11%                  22%
Resources
• Logic blocks: 4%
• Memories: 3%
Sampling frequency
• DSP: ~1 kHz
• FPGA: ~50 kHz
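
The ~50 kHz FPGA figure appears consistent with the sample period of about 2M·Tclk quoted on the modifications slide. A quick check under that reading, with M = 1000 and Tclk = 10 ns taken from this slide (the helper name is hypothetical):

```python
def sample_period_s(num_particles, clock_period_s, cycles_per_particle):
    """Sample period of roughly cycles_per_particle * M * Tclk, in seconds."""
    return cycles_per_particle * num_particles * clock_period_s

M = 1000        # number of particles
T_CLK = 10e-9   # clock period in seconds (10 ns)

period_current = sample_period_s(M, T_CLK, 2)  # ~2M*Tclk, about 20 microseconds
period_limit = sample_period_s(M, T_CLK, 1)    # ~M*Tclk, about 10 microseconds

freq_current_khz = 1e-3 / period_current       # about 50 kHz
freq_limit_khz = 1e-3 / period_limit           # about 100 kHz
```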

Implementation of PFs – Parallelism
• Universal architecture with a central unit
Processing elements (PEs)
• Particle generation
• Weight computation
Central unit
• Algorithm for particle propagation
• Resampling
[Figure: four processing elements connected to a central unit, next to the PF flowchart (start, new observation, particle generation, weight computation, resampling, propagation of particles, exit).]

Implementation of PFs – Propagation of Particles
Disadvantages of the particle propagation step
• Random communication pattern
• Decision about connections is not known before the run time
• Requires a dynamic type of network
• Speed-up is significantly affected
[Figure: PE 1 to PE 4 connected through the central unit; particles after resampling move among the PEs over time.]

Implementation of PFs – Parallel Resampling
Solution
• The way in which Monte Carlo sampling is performed is modified
Advantages
• Propagation is only local
• Propagation is controlled in advance by a designer
• Performance is the same as in the sequential applications
Result
• Speed-up is almost equal to the number of PEs (up to 8 PEs)
[Figure: example with four PEs in which the particle counts N after resampling, (13, 0, 0, 3), are redistributed in steps to (8, 0, 0, 8) and then to (4, 4, 4, 4).]
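
The particle counts in the figure can be reproduced with a fixed schedule of pairwise exchanges, sketched below. This is only an illustration of a communication pattern decided in advance; the pairing schedule, the function name and the rounding rule are assumptions and are not taken from the dissertation.

```python
def balance_stage(counts, pairs):
    """Even out particle counts within each pair of PEs.

    Returns the new counts and a list of transfers
    (source PE, destination PE, number of particles).
    """
    new_counts = list(counts)
    transfers = []
    for a, b in pairs:
        total = counts[a] + counts[b]
        target_a = (total + 1) // 2   # PE a keeps the extra particle if total is odd
        target_b = total - target_a
        moved = counts[a] - target_a  # positive: a sends particles to b
        if moved > 0:
            transfers.append((a, b, moved))
        elif moved < 0:
            transfers.append((b, a, -moved))
        new_counts[a], new_counts[b] = target_a, target_b
    return new_counts, transfers

# PEs are indexed 0..3 here (PE 1..PE 4 in the figure)
counts = [13, 0, 0, 3]
stage1, moves1 = balance_stage(counts, [(0, 3), (1, 2)])   # [8, 0, 0, 8]
stage2, moves2 = balance_stage(stage1, [(0, 1), (3, 2)])   # [4, 4, 4, 4]
```

Because the pairs are fixed before run time, only the number of particles sent over each link depends on the data.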

Implementation of PFs – Architectures for Parallel Resampling
• Controlled particle propagation after resampling
• Architecture that allows adaptive connection among the processing elements
[Figure: PE1 to PE4 connected to a central unit.]

Implementation of PFs – Space exploration
• Hardware platform is Xilinx Virtex-II Pro
• Clock period is 10 ns
• PFs are applied to the bearings-only tracking problem
[Figure: sample period (us) versus number of PEs, both on logarithmic axes, with one curve per number of particles M (500, 1000, 5000, 10000, 50000). The Virtex-II Pro design space is bounded by two limits: available memory and logic blocks. Annotation: K=14.]

Implementation of PFs – Gaussian PFs
Functionality
• Approximates densities by Gaussians
• Propagates only first two moments
• No need for resampling
Advantages
• Sampling period is minimal: ~M·Tclk
• No need for memories for storing particles
• Simple communication in parallel implementation
Disadvantages
• Higher computational complexity
• Limited scope of applications
[Figure: flowchart. Start, particle generation (1, 2, ..., M), weight computation (1, 2, ..., M), computing the mean and the covariance matrix, drawing conditioning particles (1, 2, ..., M), new observation? (yes: repeat; no: exit).]
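
A scalar sketch of one Gaussian PF recursion is given below. It keeps only a mean and a variance between time steps, so no particles are stored and no resampling takes place. The random-walk model, the noise levels and the function name are assumptions for illustration, and the order of operations inside the step is one possible arrangement of the blocks in the flowchart.

```python
import math
import random

def gpf_step(mean, var, z, M, sigma_u, sigma_v, rng):
    """One update of a scalar Gaussian particle filter (illustrative model)."""
    # Drawing conditioning particles from the current Gaussian approximation
    conditioning = [rng.gauss(mean, math.sqrt(var)) for _ in range(M)]
    # Particle generation: propagate through x_k = x_{k-1} + u_k (assumed model)
    particles = [x + rng.gauss(0.0, sigma_u) for x in conditioning]
    # Weight computation: likelihood for z_k = x_k + v_k (assumed model)
    weights = [math.exp(-0.5 * ((z - x) / sigma_v) ** 2) for x in particles]
    total = sum(weights)
    # Computing the mean and the variance (first two moments only)
    new_mean = sum(w * x for x, w in zip(particles, weights)) / total
    new_var = sum(w * (x - new_mean) ** 2 for x, w in zip(particles, weights)) / total
    return new_mean, new_var

rng = random.Random(3)
mean, var = 0.0, 1.0  # assumed Gaussian prior
for z in [0.9, 1.1, 1.0, 0.95, 1.05]:
    mean, var = gpf_step(mean, var, z, M=2000, sigma_u=0.1, sigma_v=0.5, rng=rng)
```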

Implementation of PFs – Gaussian PFs (cont.)
[Figure: minimum sampling period (us, logarithmic axis) versus number of processing elements for parallel GPFs and SIRFs, with curves for SIRF and GPF at M=500, M=5000 and M=50000.]

Conclusions and Future Work
Summary
• Modification of the algorithms to be suitable for hardware implementation
• Development of parallel algorithms and architectures
• Implementation of the particle filter in FPGA
• Analysis of the other types of particle filtering algorithms
Future work
• Simplifying floating-point to fixed-point conversion
• Developing an application-domain specific processor for PFs
• Developing reconfigurable architectures for PFs