46
10 - DSP Firmware Andreas Ehliar October 9, 2014 Andreas Ehliar 10 - DSP Firmware

10 - DSP Firmware · 2015. 9. 3. · 10 - DSP Firmware Andreas Ehliar October 9, 2014 ... Todays lecture I Mea Culpa I DSP Firmware I Application modeling I Hardware Veri cation Andreas

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

  • 10 - DSP Firmware

    Andreas Ehliar

    October 9, 2014

    Andreas Ehliar 10 - DSP Firmware

  • Todays lecture

    I Mea Culpa

    I DSP Firmware

    I Application modeling

    I Hardware Verification

    Andreas Ehliar 10 - DSP Firmware

  • Mea culpa

    I Step 1: Come up with an interesting idea during a lecture

    I Step 2: Tell students I will look into this until next lecture

    I Step 3: Write test program for original version and run it

    I Step 4: Modify test program so it uses the new idea and run it

    I Step 5: Rejoice! The modified test program produces thecorrect output!

    I Step 6: Tell students during next lecture that the idea worksgreat!

    I Step 7: Profit!

    Andreas Ehliar 10 - DSP Firmware

  • Mea culpa

    I Step 1: Come up with an interesting idea during a lecture

    I Step 2: Tell students I will look into this until next lecture

    I Step 3: Write test program for original version and run it

    I Step 4: Modify test program so it uses the new idea and andrun it WHILE FORGETTING TO RECOMPILE IT. . .

    I Step 5: No profit :(

    Andreas Ehliar 10 - DSP Firmware

  • Mea culpa

    I Conclusion 1: The schematic I draw last lecture does nothandle arbitrary memory offsets

    I It does however handle a situation where the bit-reversedbuffer (of size N is located at an address which is evenlydivisible by N)

    I Conclusion 2: When creating a testbench for hardware it isvery important to actually test the testbench itself. (E.g., byintroducing errors and making sure that the testbench willcatch those errors.)

    Andreas Ehliar 10 - DSP Firmware

  • Conclusions: Application modelling

    I We need something to testI Instrument reference applicationI Write new application using matlab, C, etc

    I We need some way to evaluate our resultsI RMS, SNR, Absolute error

    I Based on the standard or other requirements

    I Subjective tests (ABX and other double blind tests)

    Andreas Ehliar 10 - DSP Firmware

  • Conclusions: Application modelling

    I We need to use reasonable datatypesI Fixed point with appropriate bit widthsI Floating point with appropriate bit widths

    I We need to use reasonable algorithmsI FFT, Fast DCT, Newton-Raphson, CORDIC, lookup tables,

    etc...I Algorithms need to be adaptable to our HW

    Andreas Ehliar 10 - DSP Firmware

  • Conclusions: Application modelling

    I Finally, our algorithms must use reasonable sized program,data and constant memory

    I We don’t want megabytes of lookup-tables

    Andreas Ehliar 10 - DSP Firmware

  • Other issues

    I On-chip/off-chip memory usage?I DMA?I Cache?I Memory organization? (e.g. tile-based or linear? Multibank?)

    I Interrupt latenciesI Reserve registers for interrupts? (In software or hardware?)

    Andreas Ehliar 10 - DSP Firmware

  • DSP Firmware

    I Challenges compared to normal desktop applicationsI Real-time requirementsI Low memory requirementsI Specialized processors with limited compiler support (e.g.I Often cumbersome to fix bugs using software updates

    Andreas Ehliar 10 - DSP Firmware

  • Writing large assembly programs

    I Avoid this.I But if you don’t have a compiler you don’t have a choice

    I You need a reference code in a high level languageI You get to play C to assembly compiler yourselfI At every step you should be able to compare the intermediate

    output from your C code with your assembler output

    Andreas Ehliar 10 - DSP Firmware

  • C-code and other high level languages

    I The closer the C-code to HW, the better can be the resultfrom the C-compiler

    I Understand the compiler in detailI gcc -S

    I Annotate enough ”Compiler known” functionsI Functional verification of compiled code

    I Don’t forget the regression suite for SW!

    Andreas Ehliar 10 - DSP Firmware

  • C-code and other high level languages

    I Inline assemblerI Use C for everything but the most critical loopsI Use inline assembler for these

    I Don’t optimize before you benchmark!

    Andreas Ehliar 10 - DSP Firmware

  • C-code and other high level languages

    I Try to save memoryI Know your C compiler outputI It may be a good idea to allocate memory staticallyI Don’t use dynamic memory allocation if you can avoid it (new,

    malloc)I Don’t use huge library functions out of convenience (printf vs

    puts)I Don’t use floating point math if your DSP processor doesn’t

    have HW support for it...

    Andreas Ehliar 10 - DSP Firmware

  • Low cycle cost assembly kernels

    I Gain much by saving cycles in an inner loop!

    I Use REPEAT instead of conditional jump

    I Loop unrolling

    I The code cost of inner loops is not so important!

    I Use as many vector instruction as possible

    I Keep useful data in RF as long as possibleI Use conditional execution if needed

    I Exception: Modern OoO processors with good branchpredictors

    Andreas Ehliar 10 - DSP Firmware

  • When to benchmark/profile?

    I When the project has reached:I Pen and paper

    I Can this be done for lab 4? Try it!

    I Application modellingI Crude MIPS count available here based on instrumentation for

    example

    I ASIP instruction selectionI Firmware development

    Andreas Ehliar 10 - DSP Firmware

  • Tools of the trade

    I For calculating sin/cos/tan, x4/3, etc there are manypossibilities

    I CORDICI Lookup tablesI Newton-RaphsonI TaylorI ChebyshevI etc

    I Combinations are also possibleI First lookup-table, then a few Newton-Raphson iterations

    Andreas Ehliar 10 - DSP Firmware

  • Tools of the trade

    I Algorithmic strength reductionI FFT/DCT/etcI Fast Fir Algorithms (FFA)I Fast Matrix x Vector multiplications (Strassen)I (And others. )

    Andreas Ehliar 10 - DSP Firmware

  • Tools of the trade

    I Fast Matrix multiplicationI Winograd’s inner product

    I (Can be used for FIR filters as well to halve the number ofmultiplications (ISCAS2013))

    I Strassen (O(22.8))http://en.wikipedia.org/wiki/Strassen_algorithm

    I (Later work brings down the complexity to less than O(22.4),but only for very large matrices.)

    Andreas Ehliar 10 - DSP Firmware

    http://en.wikipedia.org/wiki/Strassen_algorithm

  • Tools of the trade

    I Strength reduction algorithms often show better performanceon paper than in real benchmarks

    I Reason: Caches, Branch prediction, addressing irregularities,etc

    I However, we are free to design a processor in whatever waynecessary => such algorithms may make more sense forASIPs than general purpose processors.

    Andreas Ehliar 10 - DSP Firmware

  • Tools of the trade (Esoteric)

    I Esoteric way of calculating 1/sqrt(x) where x is a floatingpoint value

    I Google 0x5f3759d5

    I Fun reading: Hacker’s delightI Tips for various bit-level manipulations, etcI Example: Why would you calculate x & (x-1)?I Or x & (-x)?

    I See also http://aggregate.org/MAGIC/ and http://graphics.stanford.edu/~seander/bithacks.html

    Andreas Ehliar 10 - DSP Firmware

    http://aggregate.org/MAGIC/http://graphics.stanford.edu/~seander/bithacks.htmlhttp://graphics.stanford.edu/~seander/bithacks.html

  • Memory efficiency woes

    I 1. Minimize memory costsI Low program memory costsI Low data memory costs

    I 2. Minimize memory transaction costsI Minimize on off chip swappingI Minimize data transfer between tasksI Minimize load and store

    I It is usually hard to minimize both at the same time

    Andreas Ehliar 10 - DSP Firmware

  • Memory efficient

    I Find algorithms use less on chip data memories. For example,some algorithms require fewer coefficients.

    I Trade computing complexity for memory efficiency

    I Select algorithms with full memory access predictability ifpossible. Data can thus be stored in off-chip memory andpre-fetched efficiently when needed.

    Andreas Ehliar 10 - DSP Firmware

  • Example: Iterate over unit circle

    for(i=0; i < 512; i++) {

    foo[i] *= cos((float)i*M_PI/256;

    }

    Andreas Ehliar 10 - DSP Firmware

  • With lookup table for cos and sin

    tmp = 0;

    for(r = 0.0; r < M_PI*2; r+=M_PI/256){

    LUT[tmp++] = cos(r);

    }

    // ... at some other point in the program:

    for(i=0; i < 512; i++){

    foo[i] *= LUT[i];

    }

    Andreas Ehliar 10 - DSP Firmware

  • Vector rotation to calculate sequence of cos/sin values

    // Calculate cosine function using vector rotation

    A=0;

    B=1;

    for(i=0; i < 512; i++){

    foo[i] *= B;

    C = A*cos(M_PI/256)-B*sin(M_PI/256);

    B = A*sin(M_PI/256)+B*cos(M_PI/256);

    A=C;

    }

    Andreas Ehliar 10 - DSP Firmware

  • Vector rotation to calculate sequence of cos/sin values

    I Not perfectI Precision is not that goodI Calculation cost of vector rotation is higher than table lookup

    I Multiplication is fairly power hungryI (But we don’t need power hungry memory accesss)I No need for fairly large lookup table

    I Only works for regular sin/cos function calls

    Andreas Ehliar 10 - DSP Firmware

  • Real time considerations

    I In a real-time system it is important to know about worst caseexecution time (WCET)

    I Different algorithms have different sensitivities to input dataI Program path analysis

    I Dynamic run time analysisI Static run time analysis

    Andreas Ehliar 10 - DSP Firmware

  • Profiling example of MP3 decoding

    I Note how some parts take the same amount of timeindependent ot eh bit-rate whereas the bit-rate have a hugeeffect on the run time of other parts

    I Conclusion: You can’t trust your average execution timeAndreas Ehliar 10 - DSP Firmware

  • Coding quality checklist

    I Try to use double precision instructions and keep computinginside the MAC

    I Insert and optimize data measurement and scaling subroutines

    I Use guard and shift together to avoid overflow

    I Perform truncation and rounding at the right time

    Andreas Ehliar 10 - DSP Firmware

  • Important techniques for DSP firmware developer

    I Using metrics like SNR and RMS error to determine thequality of the implementation

    I Know how to calculate functions like pow(), sin(), cos(),tan(), etc efficiently in a given scenario

    I Know how to minimize memory usage or trade off memoryusage vs computing complexity

    I Trade off latency vs throughput (c.f. lab 1)

    Andreas Ehliar 10 - DSP Firmware

  • HW verification

    I Verification cost is huge (e.g., more than double the cost ofRTL development)

    Andreas Ehliar 10 - DSP Firmware

  • Verification - Simple nonchecking testbench

    initial begin

    rst = 1;

    #10; // Wait for 10 ns

    rst = 0;

    opa = 32;

    opb = 45;

    ctrl=1;

    #10;

    opa = 45;

    opb = 11;

    ctrl=2;

    #10;

    // And so on...

    Andreas Ehliar 10 - DSP Firmware

  • HW verification

    I Verification cost is huge (e.g., more than double the cost ofRTL development)

    Andreas Ehliar 10 - DSP Firmware

  • Verification - Simple nonchecking testbench

    initial begin

    rst = 1;

    #10; // Wait for 10 ns

    rst = 0;

    opa = 32;

    opb = 45;

    ctrl=1;

    #10;

    opa = 45;

    opb = 11;

    ctrl=2;

    #10;

    // And so on...

    Andreas Ehliar 10 - DSP Firmware

  • HW verification

    I Verification cost is huge (e.g., more than double the cost ofRTL development)

    Andreas Ehliar 10 - DSP Firmware

  • Verification - Simple nonchecking testbench

    initial begin

    rst = 1;

    #10; // Wait for 10 ns

    rst = 0;

    opa = 32;

    opb = 45;

    ctrl=1;

    #10;

    opa = 45;

    opb = 11;

    ctrl=2;

    #10;

    // And so on...

    Andreas Ehliar 10 - DSP Firmware

  • Verification - Simple nonchecking testbench

    initial begin

    rst = 1;

    #10; // Wait for 10 ns

    rst = 0;

    opa = 32;

    opb = 45;

    ctrl=1;

    #10;

    opa = 45;

    opb = 11;

    ctrl=2;

    #10;

    // And so on...

    Not a good idea!

    Andreas Ehliar 10 - DSP Firmware

  • Verification using files

    initial begin

    fd = $fopen("indata","r");

    while(!finished) begin

    @(posedge clk);

    $fgets(buf,fd);

    numdata= $sscanf(line,"%08x %08x %08x",x,y,z);

    if(numdata == 3) begin

    opa

  • Verification using files

    I Essentially what we do in the labsI Very nice for processor development

    I You need a simulator anyway for processor development

    I Can be cumbersome for many other systems where it is notnatural to write a simulator

    Andreas Ehliar 10 - DSP Firmware

  • (System)Verilog and VHDL are programming languages!

    I Write testbenches in a structured mannerI Divide and conquer!I Use tasks/procedures to make the testbenches easier to

    understandI You can, in many cases, make the testbench selfchecking

    without any need for external filesI Model the system using behavioral code

    Andreas Ehliar 10 - DSP Firmware

  • Example: Verifying a divider

    task testdivs; // (Would be a procedures in VHDL)

    input [BITS1:0] dividend;

    input [BITS1:0] divisor;

    begin

    @(posedge clk);

    divop

  • Example: Verifying a divider

    task simpletest;

    begin

    $display("Testing simple values");

    test_divu(1,1);

    test_divu(2,1);

    test_divu(2,2);

    $display("Testing corner values (large)");

    test_divu(32’h80000000,1);

    test_divu(32’h80000000,32’h80000000);

    test_divu(0,32’h80000000);

    test_divu(1,32’h80000000);

    test_divu(2,32’h80000000);

    // And so on

    Andreas Ehliar 10 - DSP Firmware

  • Example: Verifying a divider

    initial begin

    startclock();

    releasereset();

    simpletest();

    simpletest_signed();

    test_small_values();

    test_corners();

    test_random_values();

    stopclock();

    $display("All tests finished!");

    end

    Andreas Ehliar 10 - DSP Firmware

  • Design for verification

    I Remove unneeded complexity => Reduced verification cost

    Andreas Ehliar 10 - DSP Firmware

  • Other things to look into

    I SystemVerilog has a lot of nice features for testbenchesI Fifos, classes, interfaces, assertions and many othersI Look for Verification Methodology Manual for inspiration

    Andreas Ehliar 10 - DSP Firmware