ELECTRONICS AND ELECTRICAL COMMUNICATION ENGINEERING, INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR-721302
Project Presentation on
VLSI ARCHITECTURE FOR BLOCK MATCHING MOTION ESTIMATION USING ARPS ALGORITHM
Under the guidance of
Dr. Indrajit Chakrabarti
Presented By:
HARSH KAUSHIK (12EC62R10)
OUTLINE
INTRODUCTION
MOTIVATION
OBJECTIVE
ARPS METHOD
PROPOSED ARCHITECTURE
SIMULATION RESULTS
CONCLUSIONS
FUTURE SCOPE
REFERENCES
INTRODUCTION
'Raw' video contains a huge amount of data. [1]
Video compression exploits the redundancies present in video to reduce this amount of data. [1]
Applications such as digital video storage, video conferencing over wired or wireless media, online gaming and broadcast services over satellite channels all rely on digital video.
International video coding standards such as H.264 and MPEG-4 use modern methods of video compression.
The compression efficiency of video encoders still needs to be improved.
INTRODUCTION (contd.)
Video compression involves two main blocks: the encoder and the decoder.
The encoder converts the data coming from the source into a compressed bit stream.
Motion estimation is one of the most important blocks in a video encoder.
It finds the displacement between each block in the current frame and its closest match in the reference frame.
Types of Motion Estimation:
1. Pixel Based Motion Estimation.
2. Block Based Motion Estimation.
3. Mesh Based Motion Estimation.
MOTIVATION
Motion estimation is one of the most important, and also the most computationally intensive, blocks of a video encoder.
Block-matching algorithms (BMAs) are used for motion estimation (ME) in various video coding standards.
Full search (FS) is computationally very expensive, so fast BMA techniques are used instead.
Among the fast BMAs compared in this work, Adaptive Rood Pattern Search (ARPS) is the most efficient in terms of computational speed while still achieving good PSNR.
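To make the cost of FS concrete, the Python sketch below performs SAD-based full-search block matching for one macro block. It is an illustration only: the search range `p = 7`, the `(dx, dy)` ordering of the motion vector and the helper names `sad` and `full_search` are assumptions, not the settings used in this project. With p = 7, FS evaluates up to (2p + 1)² = 225 candidate blocks, each costing 256 absolute differences for a 16 × 16 block.

```python
def sad(cur, ref, cy, cx, ry, rx, n):
    """SAD between the n x n block of cur at (cy, cx) and of ref at (ry, rx)."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))


def full_search(cur, ref, top, left, n=16, p=7):
    """Exhaustive search over every displacement with |dx|, |dy| <= p.

    Returns (mv, min_sad, points_checked); mv is (dx, dy). Candidate blocks
    that would fall outside the reference frame are skipped.
    """
    h, w = len(ref), len(ref[0])
    best_mv, best_sad, checked = None, None, 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > h or x + n > w:
                continue  # candidate block lies outside the frame
            cost = sad(cur, ref, top, left, y, x, n)
            checked += 1
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad, checked
```

For an interior block every one of the 225 candidates is evaluated, which is what the fast BMAs (TSS, DS, ARPS) try to avoid.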
OBJECTIVE
To design an efficient VLSI architecture for block-matching motion estimation using the ARPS algorithm, in order to enhance the performance of a video encoder.
Adaptive Rood Pattern Search Method
The SIZE of the search pattern and the MAGNITUDE of the target MV determine the speed and accuracy of pattern-based search algorithms.
Two main issues are:
1) How can the motion behavior of the current block be predicted in advance?
2) What should the size and shape of the search pattern be?
For the first issue, the current block's motion behavior can be predicted from the MVs of its neighboring blocks.
For the second issue, two types of search patterns are used:
1. Adaptive rood pattern (ARP)
2. Small search pattern, i.e. the unit rood pattern (URP)
The target motion vector is predicted with the help of the ROS (region of support).
Figure 1: Region of support (ROS) for the current macro block. The neighboring macro block, whose motion vector has already been computed, acts as the ROS of the current macro block.
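Figure 2: Algorithmic flow chart of ARPS
START
ARP: calculate the SAD of each search point and find the block with the minimum SAD.
Set the minimum matching error (MME) point of the ARP as the center point for the URP.
URP: is the center the MME point?
NO: move the center to the new MME point and repeat the URP.
YES: output the MOTION VECTOR.
STOP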
Adaptive Rood Pattern
The ARP's arm length is L = max{ |MV_predicted(x)|, |MV_predicted(y)| }.
The leftmost blocks in each frame use a fixed arm length of 2 pixels.
The predicted MV, which is expected to be close to the target MV, is added to the four-armed rood pattern as a search point.
In most cases it overlaps with one of the rood's search points.
Therefore, the ARP has 5 search points when the predicted motion vector is non-zero; otherwise only one search point (the center) is involved.
The ARP is applied only once per macro block.
Figure 3: Adaptive Rood Pattern [2][3].
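A minimal Python sketch of how the ARP search points could be formed. It assumes that the ROS is the left neighboring block, which is consistent with the fixed arm length used for the leftmost blocks, and it keeps the predicted MV as a sixth point when it does not fall on an arm. Both choices, and the helper names `predicted_mv` and `arp_points`, are assumptions for illustration.

```python
def predicted_mv(mv_field, row, col):
    """Hypothetical ROS lookup: the MV of the left neighbor (row, col - 1).

    mv_field maps (row, col) of already processed macro blocks to (dx, dy).
    Returns None for a leftmost block, which has no left neighbor.
    """
    if col == 0:
        return None
    return mv_field[(row, col - 1)]


def arp_points(pred):
    """Search points of the adaptive rood pattern as (dx, dy) offsets.

    pred is the predicted MV (dx, dy), or None for a leftmost block.
    """
    # Arm length L = max(|MVpred(x)|, |MVpred(y)|); fixed to 2 for leftmost blocks.
    arm = 2 if pred is None else max(abs(pred[0]), abs(pred[1]))
    points = [(0, 0)]  # center of the rood
    if arm > 0:
        points += [(arm, 0), (0, arm), (-arm, 0), (0, -arm)]
    # Keep the predicted MV as an extra point only if it is not already a rood point.
    if pred is not None and pred not in points:
        points.append(pred)
    return points
```

A zero predicted MV collapses the pattern to the center alone, and a predicted MV lying on an axis coincides with one arm, giving the 5-point case described above.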
Figure 4: Initial search using ARP (candidate blocks shifted right, down, left and up by 2 pixels).
Fixed Pattern for Refined Search
The initial search leads to a new search center.
The small search pattern (URP) is then used repeatedly, without restriction, for a local refined search.
The URP search continues until the center becomes the MME point.
When the center is the minimum point, the process stops, the corresponding motion vector is obtained and the search starts for the next macro block.
Figure 5: Unit Rood Pattern [2][3].
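The complete per-block search can be modeled in software as below: one ARP pass followed by URP steps until the center is the MME point. This is a Python sketch under assumptions, not the proposed hardware: frames are nested lists, MVs are `(dx, dy)`, candidates outside the frame are skipped, the ROS prediction is passed in as `pred`, and ties keep the current center so that the loop always terminates. The names `arps_block` and `URP_STEPS` are hypothetical.

```python
URP_STEPS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # unit rood around the center


def sad(cur, ref, cy, cx, ry, rx, n):
    """SAD between the n x n block of cur at (cy, cx) and of ref at (ry, rx)."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))


def arps_block(cur, ref, top, left, pred=None, n=16):
    """ARPS for the n x n block at (top, left).

    pred is the predicted MV (dx, dy) from the ROS, or None for a leftmost
    block. Returns (mv, min_sad, number_of_distinct_points_evaluated).
    """
    h, w = len(ref), len(ref[0])
    cache = {}  # each candidate's SAD is computed at most once

    def cost(mv):
        if mv not in cache:
            x, y = left + mv[0], top + mv[1]
            inside = 0 <= x and 0 <= y and x + n <= w and y + n <= h
            cache[mv] = sad(cur, ref, top, left, y, x, n) if inside else None
        c = cache[mv]
        return float("inf") if c is None else c

    # Step 1: a single ARP pass (center, four arms of length L, predicted MV).
    arm = 2 if pred is None else max(abs(pred[0]), abs(pred[1]))
    points = [(0, 0)]
    if arm:
        points += [(arm, 0), (0, arm), (-arm, 0), (0, -arm)]
    if pred is not None and pred not in points:
        points.append(pred)
    center = min(points, key=cost)  # first minimum wins, so ties favor (0, 0)

    # Step 2: repeat the URP until the center itself is the MME point.
    while True:
        best = center
        for dx, dy in URP_STEPS:
            cand = (center[0] + dx, center[1] + dy)
            if cost(cand) < cost(best):  # strict: ties keep the current center
                best = cand
        if best == center:
            evaluated = sum(1 for c in cache.values() if c is not None)
            return center, cost(center), evaluated
        center = best
```

Because every SAD is cached, the returned count is the number of distinct search points, the same quantity reported in the average-search-points table.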
Figure 6: Refined search using Unit Rood Pattern (candidate blocks shifted right, down, left and up by 1 pixel).
PROPOSED ARCHITECTURE
Figure 7: Block Diagram of the Proposed Architecture
Blocks: Address and Control Unit; Memory Unit (Block RAM for Current Frame, Block RAM for Reference Frame); ARP Module; URP Module; Comparator for Deciding the URP Center Point; Motion Vector Unit.
Control flags: ARP FLAG, SSP FLAG.
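MEMORY UNIT
Figure: CIF frame (352 × 288) divided into (16 × 16) macro blocks, with memory addresses 0 and 511 at the two ends of the first row and 512 at the start of the next row.
For proper addressing of the pixel intensity values, zero padding is used.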
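The memory layout (addresses 0 to 511 on one row, 512 starting the next) suggests that each frame row is given 512 locations, so moving down one row adds 512 to the address. A minimal Python sketch of that mapping, assuming the 160 locations after the 352 pixel columns are the zero padding; `ROW_STRIDE`, `pixel_address` and `padded_frame` are hypothetical names.

```python
WIDTH, HEIGHT = 352, 288  # CIF frame
ROW_STRIDE = 512          # assumed: each row occupies 512 memory locations


def pixel_address(row, col):
    """Linear frame-memory address of pixel (row, col)."""
    if not (0 <= row < HEIGHT and 0 <= col < WIDTH):
        raise ValueError("pixel outside the CIF frame")
    return (row << 9) | col  # row * 512 + col, because col < 512


def padded_frame(frame):
    """Flatten a HEIGHT x WIDTH frame into memory words, zero padding each row."""
    words = []
    for pixel_row in frame:
        words.extend(pixel_row)
        words.extend([0] * (ROW_STRIDE - WIDTH))  # 160 padding zeros per row
    return words
```

With a power-of-two stride the row part of the address is just a 9-bit left shift, which matches the ±512 and ±1024 vertical offsets in the reference address tables.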
PROPOSED ARCHITECTURE
Figure 8: Internal Constitution of ARP and URP Modules (Current Frame Memory, Reference Frame Memory, Address Generation Block, SAD Block, Comparison Block, Motion Vector).
The ARP and URP modules are each made up of three main sub-blocks: the address generation block, the comparison block and the SAD block.
Address Generation Block
Figure 9: Address Generation Block (19-bit base address, reference address and block pattern address buses).
The current frame and the reference frame are stored in two block RAMs of (2K × 9).
The address generation block generates the addresses used to read the pixel intensity values stored in the memory.
It has three main sub-blocks:
1. Base address generation block.
2. Reference address generation block.
3. Block pattern address generation block.
Base Address Generation Block
Figure 10: Base Address Generation Block (a MOD-18 counter and a MOD-22 counter with terminal count (TC) and reset signals, clocked by the global clock).
This block generates the address of the current macro block.
It is built from two counters, each made up of 5 D flip-flops.
CIF video test sequences have a resolution of 352 × 288. Dividing the frame into 16 × 16 blocks gives 288/16 = 18 rows and 352/16 = 22 columns of macro blocks.
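The two counters can be modeled as nested loops that step through the 18 × 22 = 396 macro blocks. In the Python sketch below, the raster order (MOD-22 column counter inside the MOD-18 row counter), the 512-location row stride and the name `base_addresses` are assumptions for illustration.

```python
ROW_STRIDE = 512  # assumed padded row length of the frame memory
MB_SIZE = 16      # macro block size in pixels


def base_addresses(mb_rows=18, mb_cols=22):
    """Top-left pixel address of every macro block, in raster order.

    Models a MOD-22 column counter nested inside a MOD-18 row counter:
    when the column counter reaches its terminal count it resets and the
    row counter advances.
    """
    addresses = []
    for mb_row in range(mb_rows):      # MOD-18 counter
        for mb_col in range(mb_cols):  # MOD-22 counter
            addresses.append(mb_row * MB_SIZE * ROW_STRIDE + mb_col * MB_SIZE)
    return addresses
```

Both 17 and 21 (the largest counts) fit in 5 bits, which is why 5 flip-flops per counter suffice.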
Reference Address Generation Block
Figure 11: Reference Address Generation Block (a MOD-5 counter, clocked by CLK, drives a decoder with a 19-bit output).
This block generates the addresses of the blocks to be searched in the reference frame.
Five search points, including the center point, are searched.
A MOD-5 counter built from 3 D flip-flops generates the five positions.
The counter output is passed to a decoder, which produces the 19-bit address offset of each search point.
For the ARP module (for the leftmost blocks, the ARP arm length is two pixels):
MOD-5 counter output | Address offset (decimal) | Address offset (19-bit 2's complement)
000 (0) | 0 | 0000000000000000000
001 (1) | 2 | 0000000000000000010
010 (2) | 1024 | 0000000010000000000
011 (3) | -2 | 1111111111111111110
100 (4) | -1024 | 1111111110000000000
For the URP module:
MOD-5 counter output | Address offset (decimal) | Address offset (19-bit 2's complement)
000 (0) | 0 | 0000000000000000000
001 (1) | 1 | 0000000000000000001
010 (2) | 512 | 0000000001000000000
011 (3) | -1 | 1111111111111111111
100 (4) | -512 | 1111111111000000000
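Both tables follow one rule: a horizontal step adds ±step to the address and a vertical step adds ±step × 512, encoded in 19-bit two's complement so that the offset can be added to the base address with an ordinary binary adder. The Python sketch below reproduces the table entries; the helper names are hypothetical, and applying the same rule with step = L for the ARP of non-leftmost blocks is an assumption.

```python
ADDR_BITS = 19
ROW_STRIDE = 512  # one row down = +512 in the frame memory


def decoder_offset(count, step):
    """Address offset selected by MOD-5 counter output `count` (0 to 4).

    step is the arm length in pixels: 2 for the leftmost-block ARP table,
    1 for the URP table. Order as in the tables: center, right, down, left, up.
    """
    offsets = [0, step, step * ROW_STRIDE, -step, -step * ROW_STRIDE]
    return offsets[count]


def to_twos_complement(value, bits=ADDR_BITS):
    """Bit string of `value` in `bits`-bit two's complement."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")
```

For example, `to_twos_complement(decoder_offset(4, 2))` gives the `-1024` row of the ARP table.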
Block Pattern Address Generation Block
Figure 12: Block Pattern Address Generation Block (two MOD-16 counters with terminal count (TC) and reset signals, clocked by the global clock).
This block generates the address of each pixel in the current (16 × 16) macro block.
It is built from two counters of 4 D flip-flops each.
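Combining the three sub-blocks, each pixel address is the base address plus the decoder offset plus the position produced by the two MOD-16 counters. The Python sketch below models a 19-bit adder by masking, which shows why a two's-complement offset such as `-2` simply subtracts. The counter nesting order, the 512-location row stride and the name `block_pixel_addresses` are assumptions.

```python
ADDR_BITS = 19
ADDR_MASK = (1 << ADDR_BITS) - 1  # 19-bit address buses
ROW_STRIDE = 512                  # assumed padded row length


def block_pixel_addresses(base, ref_offset_bits=0, n=16):
    """Addresses of the n*n pixels of a candidate block, in counter order.

    base is the macro block base address and ref_offset_bits the 19-bit
    two's-complement offset from the reference address decoder.
    Row counter outer, column counter inner (an assumption).
    """
    addresses = []
    for i in range(n):      # MOD-16 row counter
        for j in range(n):  # MOD-16 column counter
            # 19-bit adder: masking makes a two's-complement offset subtract.
            addresses.append((base + ref_offset_bits + i * ROW_STRIDE + j) & ADDR_MASK)
    return addresses
```

For instance, with base 8224 (macro block row 1, column 2) and the 19-bit pattern for -2, the first address is 8222.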
Sum of Absolute Difference Block
Figure 13: Sum of Absolute Difference Block (bus widths shown: 8 bits for the current and reference pixel paths, 14 bits for the SAD paths and SAD register).
It computes the absolute difference between the pixel intensity values of the current macro block and the reference macro block.
Each absolute difference is added to the sum of the previous differences until the differences of all the pixels in the macro block have been accumulated.
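The accumulation can be modeled one pixel pair per clock cycle, as in the Python sketch below (the names `sad_accumulate` and `WORST_CASE_SAD` are hypothetical). The worst case it computes, 256 × 255 = 65,280, needs 16 bits, whereas a 14-bit register holds at most 16,383; the sketch uses unbounded Python integers and does not model saturation or overflow.

```python
def sad_accumulate(cur_pixels, ref_pixels):
    """Cycle-level model of the SAD block for one candidate block.

    One pixel pair is processed per clock: |cur - ref| is added to the
    SAD register, which starts at zero for each block.
    """
    if len(cur_pixels) != len(ref_pixels):
        raise ValueError("pixel streams must have the same length")
    sad_register = 0
    for cur, ref in zip(cur_pixels, ref_pixels):
        sad_register += abs(cur - ref)
    return sad_register


# Worst case for a 16 x 16 block of 8-bit pixels.
WORST_CASE_SAD = sad_accumulate([255] * 256, [0] * 256)
```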
Comparison Block
Figure 14: Comparison Block (the 14-bit SAD value from the SAD block and the 14-bit register holding the minimum SAD value feed a subtractor; its sign bit drives a MUX).
This block finds the minimum SAD value.
The SAD value obtained from the SAD block and the value of the register storing the minimum SAD are fed as inputs to the subtractor.
At the start of the comparison, the register value is preset.
The borrow output of the subtractor acts as the select line of the multiplexer, so that the smaller of the two values is written back to the register.
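Modeled at the bit level, the comparison could look like the Python sketch below. The subtractor is written as a + ~b + 1 so that the borrow is simply the inverted carry-out. Presetting the register to all ones, the subtraction order (SAD minus register), the 14-bit width and the helper names are assumptions, not details taken from the design.

```python
SAD_BITS = 14  # assumed width of the SAD paths, read from Figure 14


def subtract_with_borrow(a, b, width=SAD_BITS):
    """a - b on a width-bit subtractor built as a + ~b + 1.

    Returns (difference, borrow); borrow is 1 exactly when a < b.
    """
    mask = (1 << width) - 1
    total = a + ((~b) & mask) + 1
    carry = total >> width
    return total & mask, 1 - carry


def find_minimum(sad_values, width=SAD_BITS):
    """Track the minimum SAD the way the comparison block does.

    The register is preset to all ones; whenever the borrow is 1 (new SAD
    smaller than the stored one) the MUX loads the new SAD, and the index of
    that search point is latched as well. Ties keep the earlier point.
    """
    min_register = (1 << width) - 1
    best_index = None
    for index, sad in enumerate(sad_values):
        _, borrow = subtract_with_borrow(sad, min_register, width)
        if borrow:
            min_register = sad
            best_index = index
    return min_register, best_index
```

Keeping the earlier point on a tie means the center (counter value 0) wins ties, which is what lets the URP loop stop.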
Motion Vector Computing Unit
The borrow output from the subtractor of the comparison block is also used to store the address of the reference block with the minimum matching error.
The motion vector is then obtained from this stored reference address.
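One way to turn the stored address into a motion vector, given the 512-location row stride assumed for the memory unit, is to split the address difference into whole rows and a remaining column offset. The Python sketch below illustrates this under that assumption (positive dy meaning downward); the name `motion_vector` is hypothetical, and the design may instead track the displacement directly.

```python
ROW_STRIDE = 512  # assumed padded row length of the frame memory


def motion_vector(cur_base, best_ref_addr):
    """Recover the MV (dx, dy) in pixels from two frame-memory addresses.

    The address difference equals dy * ROW_STRIDE + dx; this assumes
    |dx| < ROW_STRIDE // 2, which holds for any realistic search range.
    """
    offset = best_ref_addr - cur_base
    # Bias by half a row so that negative dx values are split off correctly.
    dy, biased_dx = divmod(offset + ROW_STRIDE // 2, ROW_STRIDE)
    return biased_dx - ROW_STRIDE // 2, dy
```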
SIMULATION RESULTS
MATLAB SIMULATION RESULTS
AVERAGE PSNR (dB) PERFORMANCE OF FS, TSS, DS, ARPS AND FIXED MESH ARPS (30 fps)
(FS: full search, TSS: three-step search, DS: diamond search)
Video (bit rate, kbps) | FS | TSS | DS | ARPS | FIXED MESH ARPS
AKIYO (1024) | 43.7737 | 43.6324 | 43.7575 | 43.7228 | 40.6678
BRIDGE-CLOSE (1024) | 35.0512 | 35.0512 | 35.0512 | 35.0512 | 34.2305
CONTAINER (1024) | 38.3828 | 38.3823 | 38.3822 | 38.3820 | 37.6384
ELEPHANT DREAMS (1024) | 38.4459 | 38.0282 | 38.2063 | 38.0708 | 38.1541
FOREMAN (1024) | 33.8942 | 33.1140 | 33.6211 | 33.4685 | 28.4741
FOOTBALL (1024) | 23.5653 | 22.8817 | 22.7465 | 22.8567 | 22.4860
HALL (1024) | 35.5239 | 35.4433 | 35.4129 | 35.3432 | 30.6391
MOTHER AND DAUGHTER (1024) | 42.2745 | 42.1989 | 42.2362 | 42.2024 | 39.2444
NEWS (1024) | 38.4927 | 38.4050 | 38.4446 | 38.4142 | 34.5488
PARIS (1024) | 31.3799 | 31.0338 | 31.2952 | 31.2478 | 28.2278
SILENT (1024) | 37.1517 | 36.9080 | 36.8728 | 36.8263 | 36.9954
STEFAN (1024) | 25.9698 | 24.7376 | 24.3586 | 25.5327 | 22.4696
TABLE (1024) | 31.3935 | 30.7814 | 30.6616 | 30.4916 | 27.0253
TEMPETE (1024) | 26.7330 | 26.5292 | 26.5532 | 26.5184 | 25.1976
WATERFALL (1024) | 35.3052 | 35.3051 | 35.3053 | 35.3047 | 32.5124
AVERAGE NUMBER OF SEARCH POINTS PER MV GENERATION
Video (bit rate, kbps) | FS | TSS | DS | ARPS
AKIYO (1024) | 262.1717 | 23.2121 | 12.2596 | 5.0378
BRIDGE-CLOSE (1024) | 262.1717 | 23.2432 | 14.2559 | 6.0570
CONTAINER (1024) | 262.1717 | 23.2256 | 12.3772 | 5.1253
ELEPHANT DREAMS (1024) | 262.1717 | 23.2935 | 16.343 | 8.1380
FOREMAN (1024) | 262.1717 | 23.2571 | 16.043 | 8.4591
FOOTBALL (1024) | 262.1717 | 23.3654 | 20.805 | 11.7219
HALL (1024) | 262.1717 | 23.2510 | 12.9231 | 5.8659
MOTHER AND DAUGHTER (1024) | 262.1717 | 23.2874 | 13.375 | 6.2562
NEWS (1024) | 262.1717 | 23.2128 | 12.5235 | 5.3822
PARIS (1024) | 262.1717 | 23.2217 | 12.827 | 5.7681
SILENT (1024) | 262.1717 | 23.2143 | 12.9878 | 5.8658
STEFAN (1024) | 262.1717 | 23.3177 | 17.2021 | 8.1941
TABLE (1024) | 262.1717 | 23.2853 | 13.7468 | 6.5654
TEMPETE (1024) | 262.1717 | 23.2356 | 12.867 | 5.9502
WATERFALL (1024) | 262.1717 | 23.2121 | 12.2942 | 5.2849
Figure: Frame-by-frame PSNR values of FS, DS, TSS and ARPS for the Football and Table Tennis sequences (panels: FOOTBALL SEQUENCE, TABLE TENNIS SEQUENCE; x-axis: frame number; y-axis: PSNR (dB)).
BLOCK-MATCHING ALGORITHM USING ARPS
Figure: Reference frame, current frame and reconstructed frame for the AKIYO sequence.
Figure: Reference frame, current frame and reconstructed frame for the MOTHER AND DAUGHTER sequence.
Figure: Reference frame, current frame and reconstructed frame for the FOOTBALL sequence.
XILINX TIMING DIAGRAMS
Figure 15: Timing Simulation of Combined Base Address and Reference Address Generation Block.
Figure 16: Timing Simulation of Block Pattern Address Generation Block.
Figure 17: Timing Simulation of Address Generation Block.
Figure 18: Timing Simulation of SAD Block.
Figure 19: Timing Simulation of Comparison Block.
Figure 20: Timing Simulation of ARP Module.
Figure 21: Timing Simulation of URP Module.
Figure 22: Timing Simulation of Final Architecture Block.
Figure 23: Schematic of Final Architecture by Synopsys Design Vision.
The overall device utilization summary generated using Xilinx ISE 14.2 (Virtex-4, device: XC4VFX140) is given in the table below.
Logic Utilization | Used | Available | Utilization
Number of Slices | 1702 | 63168 | 2%
Number of Slice Flip Flops | 374 | 126336 | 1%
Number of 4 input LUTs | 3077 | 126336 | 2%
Number of bonded IOBs | 252 | 768 | 32%
Number of FIFO16/RAMB16s | 512 | 552 | 92%
Number of GCLKs | 5 | 32 | 15%
The overall timing summary generated using Xilinx ISE 14.2 (speed grade: -11) is given below.
Minimum period: 9.702 ns
Maximum frequency: 103.027 MHz
Minimum input arrival time before clock: 2.805 ns
Maximum output required time after clock: 13.580 ns
Figure 24: Xilinx Power Report.
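Area and power analysis using Synopsys Design Vision (0.18 µm), obtained at a clock period of 10 ns (frequency = 100 MHz). SS, TT and FF denote the slow, typical and fast process corners.
Sr. No. | Module Name | Operating Conditions | Area (µm²) | Power
1 | Address Generation Block | SS | 12117.9744 | 281.9040 µW
1 | Address Generation Block | TT | 11727.3744 | 352.3345 µW
1 | Address Generation Block | FF | 11727.3744 | 437.9998 µW
2 | SAD Block | SS | 6771.4416 | 323.18.18 µW
2 | SAD Block | TT | 6496.4592 | 408.7913 µW
2 | SAD Block | FF | 6496.4592 | 505.7709 µW
3 | Comparison Block | SS | 10927.4257 | 545.6620 µW
3 | Comparison Block | TT | 10952.4241 | 675.6204 µW
3 | Comparison Block | FF | 10977.4225 | 847.0907 µW
4 | ARP Module | SS | 34150.9394 | 495.8019 µW
4 | ARP Module | TT | 32654.1602 | 603.2296 µW
4 | ARP Module | FF | 32647.9106 | 745.2020 µW
5 | URP Module | SS | 38488.1618 | 522.8998 µW
5 | URP Module | TT | 37022.6305 | 634.4330 µW
5 | URP Module | FF | 37016.3809 | 793.7040 µW
6 | Final Architecture | SS | 73126.5700 | 974.1342 µW
6 | Final Architecture | TT | 70151.7602 | 1.1877 mW
6 | Final Architecture | FF | 70139.2611 | 1.4743 mW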
CONCLUSIONS
The main focus of this project has been on improving the performance of the motion estimation block of the video encoder.
The Adaptive Rood Pattern Search algorithm has been used for this purpose. Various BMA techniques, namely FS, TSS, DS and ARPS, have been compared with each other to determine the most efficient search algorithm. ARPS was found to have the lowest computational complexity, requiring the fewest search points per motion vector.
A hardware architecture was designed to implement the ARPS algorithm.
Individual blocks such as the address generator, the comparator and the SAD block, each made up of several smaller blocks, were designed and tested. These blocks were then integrated to build the ARP module and the URP module. Finally, the ARP and URP modules were combined to develop the complete ARPS architecture.
FUTURE SCOPE
Optimization of the proposed architecture in terms of speed.
Enhancement of the architecture for mesh-based motion estimation using ARPS, to remove the disadvantages of block-based motion estimation techniques.
This would help to handle spatial deformations such as zoom, shear and rotation that are present in video.
REFERENCES
[1] I.E.G. Richardson, “H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia”, 2003.
[2] Yao Nie and Kai-Kuang Ma, “Adaptive Rood Pattern Search for Fast Block-Matching
Motion Estimation”, IEEE Transactions on Image Processing, Vol. 11, No. 12, pp. 1442-1448,
December 2002.
[3] Kai-Kuang Ma and Gang Qiu, “An Adaptive Rood Pattern Search for Fast Block-Matching
Motion Estimation in JVT/H.26L”, IEEE Conference on Circuits and Systems, Vol. 2, pp. II-708-II-711, 2003.
[4] A. Barjatya, “Block Matching Algorithms for Motion Estimation”, IEEE Conference on
Digital Image Processing, April 2004.
[5] D. Vijendra Babu, P. Subramanian, C. Karthikeyan, “Performance Analysis of Block
Matching Algorithms for Highly Scalable Video Compression”, International Symposium on Ad
Hoc and Ubiquitous Computing, December 2006.
[6] Renxiang Li, Bing Zeng and Ming L. Liou, “A new Three-Step Search Algorithm for Block
Motion Estimation”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 4,
No. 4, pp. 438-442, August 1994.
[7] Shan Zhu and Kai-Kuang Ma, “A new Diamond Search Algorithm for Fast Block-Matching
Motion Estimation”, IEEE Transactions on Image Processing, Vol. 9, No. 2, pp. 287-290, February
2000.
[8] Yifeng Qiu and W. Badawy, “The Hardware Architecture of a Novel Motion Estimator with
Adaptive Crossed Quarter Polar Search Patterns for H.264 Encoding”, IEEE Conference, pp. 819-822,
2009.
[9] B.G. Kim, S.T. Kim, S.K. Song and P.S. Mah, “Fast-adaptive rood pattern search for block motion
estimation”, Electronics Letters, Vol. 41, No. 16, August 2005.
[10] Chunchun Chen, Junming Shan and Eryan Yang, “An improved Adaptive Rood Pattern Search
Algorithm Based on Temporal and Spatial Correlation”, International Conference on Audio,
Language and Image Processing, pp. 1088-1092, 2012.
[11] D. Dia, M. Atri and R. Tourki, “A Improved Fast Motion Block Matching for Wavelet Video
Coding”, International Symposium on Signal Processing and Information Technology, pp. 801-806,
2007.
[12] U.Y. Oktiawati and V.V. Yap, “Evaluating the Effects of the Dual Tree Complex Wavelet
Transform and the Adaptive Rood Pattern Search Algorithm on a Video Codec”, IEEE Conference
on Industrial Electronics and Applications, pp. 2544-2547, 2008.
[13] M. Sayed and W. Badawy, “An Affine-Based Algorithm and SIMD Architecture for Video
Compression with Low Bit-Rate Applications”, IEEE Transactions on Circuits and Systems for
Video Technology, Vol.16, No. 4, April 2006.
[14] W. Badawy, G. Zhang and M. Bayoumi, “VLSI Architecture for Hierarchical Mesh Based
Motion Estimation”, IEEE Conference on Signal Processing Systems, pp. 110-119, October 1999.
[15] S. Palnitkar, “Verilog HDL: A Guide to Digital Design and Synthesis”, second edition,
Prentice Hall Inc., February 2003.
[16] Video Lectures on Multimedia Processing by Prof. Somnath Sengupta,
“http://nptel.iitm.ac.in/courses/Web course-contents/IIT Kharagpur/Multimedia
Processing/New_index1.html”.
[17] MATLAB User Manual Version R2012b.
[18] XILINX - ISE User Manual Version 14.2.
THANK YOU