15
Spiteri, T., Vafiadis, G., & Nunez-Yanez, JL. (2009). A toolset for the analysis and optimization of motion estimation algorithms and processors. In International Conference on Field Programmable Logic and Applications, 2009 (FPL 2009), Prague (pp. 423 - 428). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/FPL.2009.5272247 Peer reviewed version Link to published version (if available): 10.1109/FPL.2009.5272247 Link to publication record in Explore Bristol Research PDF-document University of Bristol - Explore Bristol Research General rights This document is made available in accordance with publisher policies. Please cite only the published version using the reference above. Full terms of use are available: http://www.bristol.ac.uk/red/research-policy/pure/user-guides/ebr-terms/

processors. In International Conference on Field ... · Inte rpola tor Q ua rte r-pe l Inte rpola tor 6 4 -bit hp inte rpola te d pix e ls 1 6 -bit be s t s a d 1 6 -bit c urre nt

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

  • Spiteri, T., Vafiadis, G., & Nunez-Yanez, JL. (2009). A toolset for theanalysis and optimization of motion estimation algorithms andprocessors. In International Conference on Field Programmable Logicand Applications, 2009 (FPL 2009), Prague (pp. 423 - 428). Instituteof Electrical and Electronics Engineers (IEEE).https://doi.org/10.1109/FPL.2009.5272247

    Peer reviewed version

    Link to published version (if available):10.1109/FPL.2009.5272247

    Link to publication record in Explore Bristol ResearchPDF-document

    University of Bristol - Explore Bristol ResearchGeneral rights

    This document is made available in accordance with publisher policies. Please cite only thepublished version using the reference above. Full terms of use are available:http://www.bristol.ac.uk/red/research-policy/pure/user-guides/ebr-terms/

    https://doi.org/10.1109/FPL.2009.5272247https://doi.org/10.1109/FPL.2009.5272247https://research-information.bris.ac.uk/en/publications/e61060fe-a4c7-490e-a95b-6206c69ddb03https://research-information.bris.ac.uk/en/publications/e61060fe-a4c7-490e-a95b-6206c69ddb03

  • Department of Electrical and Electronic Engineering

    Trevor Spiteri, George Vafiadis, Jose Luis Nunez-Yanez2009-09-02

    A Toolset for the Analysis and Optimization of Motion Estimation

    Algorithms and Processors

  • Department of Electrical and Electronic Engineering

    Overview

    • Motion estimation takes time,

    full search expensive for HD.

    • Flexible reconfigurable

    processor.

    • IDE to design and test

    algorithms.

    • Toolset to configure the

    processor.

    2

  • Department of Electrical and Electronic Engineering

    • Motion estimation

    takes processor time.

    • The design space to

    explore is large.

    • Configuring the ME

    processor takes

    developer times.

    Saving time

    0 20 40 60 80 100 120 140 160 180 200

    60%

    10%

    9%

    6%

    4%

    11%

    Execution Time Distribution

    720p pedestrian, baseline, noasm

    Inter (ME)

    Intra

    DCT

    Quantisation

    CABAC

    Other

    Execution Time (s)

    3

  • Department of Electrical and Electronic Engineering

    The reconfigurable processor

    • Advanced features such as rate distortion

    optimization using Lagrangian techniques.

    • Multiple motion vector candidates allowed.

    • Multiple sub-partition sizes allowed.

    • Multiple reference frames allowed.

    • Can do fractional pel motion estimation that can

    be used for the H.264 standard.

    4

  • Department of Electrical and Electronic Engineering

    Simple and complex configurations5

    Fetch, Decode,

    Issue

    Reference

    Memory

    Program

    Memory

    SAD

    Selector

    128-bit Reference Vector

    64-bit Current

    Vector

    16-bit current

    sad 8-bit eu id

    16-bit best sad

    8-bit best eu id

    8-bit Fetch

    addresses

    20-bit

    instructions

    8-bit pattern

    addresses

    12-bit reference

    addresses

    Point Memory

    8-bit point

    addresses

    Address

    Calculator

    16-bit current

    motion vector

    64-bit

    Reference Vector

    Integer-pel

    Execution Unit

    Macroblock

    Memory

    Vector

    Alignment

    Unit

    Fetch, Decode,

    Issue

    Reference

    Memory

    Program

    Memory

    Macroblock

    Memory

    Integer-pel

    Execution

    Unit

    Integer-pel

    Execution

    Unit

    Integer-pel

    Execution

    Unit

    Integer-pel

    Execution

    Unit

    SAD Selector

    Reference

    Memory

    Reference

    Memory

    Reference

    Memory

    128-bit Reference Vector 128-bit Reference Vector 128-bit Reference Vector 128-bit Reference Vector

    64-bit Current

    Vector

    64-bit Current

    Vector64-bit Current

    Vector64-bit Current

    Vector

    16-bit current

    sad

    8-bit eu id 8-bit eu id

    16-bit current

    sad

    16-bit current

    sad

    8-bit eu id

    16-bit current

    sad

    8-bit eu id

    16-bit best sad

    8-bit best eu id

    8-bit Fetch

    addresses

    20-bit

    instructions

    8-bit pattern

    addresses

    12-bit reference

    addresses

    Point Memory Point Memory Point Memory Point Memory

    16-bit point

    addresses

    Address

    Calculator

    Address

    Calculator

    Address

    Calculator

    Address

    Calculator

    12-bit reference

    addresses

    12-bit reference

    addresses

    12-bit reference

    addresses

    16-bit point

    addresses

    16-bit point

    addresses

    16-bit point

    addresses

    16-bit current

    motion vector

    Half-pel

    Reference

    Memory

    Fractional

    -pel

    Execution

    Unit

    SAD

    Selector

    Half-pel

    Interpolator

    Quarter-pel

    Interpolator

    64-bit hp interpolated

    pixels

    16-bit best sad

    16-bit current

    sad

    Vector

    Alignment

    Unit

    Vector

    Alignment

    Unit

    Vector

    Alignment

    Unit

    Vector

    Alignment

    Unit

    Macroblock

    Memory

    Vector

    Alignment

    Unit

    128-bit hp interpolated

    pixels

    128-bit qp interpolated

    pixels

    64-bit interpolated

    vector

    64-bit

    Current

    Vector

    8-bit eu id

    Lagrangian

    Optimizer

    MV

    cost

    Simple (1 integer pel unit)

    Complex (4 int. pel units, 1 frac. pel unit, Lagrangian optimizer)

  • Department of Electrical and Electronic Engineering

    ProcessorConfiguration

    Speed(cycles/MB,frames/second)

    FPGA size(LUTs,slices)

    Memory(BRAMS)

    Base configuration(1 integer-pelexecution unit)

    625 cycles/MB,39 fps

    2289 LUTs,1300 slices

    21 BRAMS(2 ref. areas, 112×128 pixels)

    Complex configuration (4 integer-pelexecution units)

    234 cycles/MB,104 fps

    7074 LUTs,3703 slices

    72 BRAMS(2 ref. areas,112×128 pixels)

    6 Processor performance and complexity evaluation

    Video sequence: 1080p crowdrun from SVT HD multi format test set

    FPGA part: Virtex-4 SX35, 200 MHz clock frequency

    Algorithm: 6-point hexagonal search (up to 8 steps), then 8-point square

  • Department of Electrical and Electronic Engineering

    Designing block-matching algorithms

    • Estimo C: high-level

    language for search

    algorithms.

    • Compiler targets the

    reconfigurable processor.

    • No need to know how

    hardware works.

    • Compiled program works

    across all configurations.

    s = 8; // initial step sizecheck(0, 0);check(0, s);check(0, -s);check(s, 0);check(-s, 0);update;do {

    s = s / 2;for (i = 1 to 5 step 1) {

    check(0, s);check(0, -s);check(s, 0);check(-s, 0);update;#if (WINID == 0)

    #break;}

    } while (s > 1);for (x = -0.5 to 0.5 step 0.25)

    for (y = -0.5 to 0.5 step 0.25)check(x, y);

    update;

    7

  • Department of Electrical and Electronic Engineering

    Cycle-accurate simulator

    • Analysing processor

    configurations on

    hardware takes time.

    • Using simulator, no need

    for synthesizing hardware

    and configuring board.

    • No hardware required for

    evaluation of processor.

    8

  • Department of Electrical and Electronic Engineering

    The IDE9

  • Department of Electrical and Electronic Engineering

    The IDE10

  • Department of Electrical and Electronic Engineering

    The IDE11

  • Department of Electrical and Electronic Engineering

    The IDE12

  • Department of Electrical and Electronic Engineering

    Prototype implementation13

    FPGAPCI Core

    AHB/APB

    Bridge

    Memory

    Controller

    UART

    Interface

    Timer

    Unit

    Interrupt

    Controller

    AHB Master

    AHB Slave

    APB Slave

    AHB

    APB

    PCI

    On-board memory

    JTAG Interface

    me_

    wrapper

  • Department of Electrical and Electronic Engineering

    Summary

    • Programmable and configurable processor supports HD

    motion estimation

    (supports H.264, MPEG-4, MPEG-2, VC-1, AVS).

    • Motion Estimation Processor: http://www.opencores.org/

    • Estimo C compiler for easy development of proprietary

    block-matching algorithms.

    • FPGA-based PCI demonstration board available.

    • Cycle-accurate simulator for quick evaluation and design

    space exploration.

    • SharpEye IDE: http://sharpeye.borelspace.com/

    14