
Numerical Error Minimizing Floating-Point to Fixed-Point ANSI C Compilation

Tor Aamodt and Paul Chow

University of Toronto

Presentation Outline

Background / Motivation

Floating-to-Fixed-Point Conversion

Architectural Support

Experimental Results

Summary / Future Directions

Background: University of Toronto DSP Project

Motivation: DSP Compiler/Architecture Co-design

First Generation Silicon (Sean Peng’s M.A.Sc. Thesis), taped out Sept. 30, 1999:

108-pin PGA / 0.35 µm CMOS / 63 MHz
16-bit Fixed-Point VLIW with Two-Level Instruction Fetching
Harvard Memory Architecture
5-stage pipeline: IF1 IF2 ID EX WB
7 function units:
    2 integer units: 16.0 multiply & 1.15 multiply operations
    2 address units: modulo addressing
    2 memory units: each tied to one data memory bank
    1 control unit

Background: Fixed-Point versus Floating-Point

32-bit Floating-Point (IEEE): sign bit | 8-bit exponent (excess 127) | 23+1 bit normalized mantissa

Fixed-Point: sign bit | integer part (IWL bits) | fractional part

Background: Fixed-Point versus Floating-Point

Property                      WL-bit Fixed-Point                32-bit Floating-Point
Dynamic range of |x|          $[0, 2^{IWL})$                    $(2^{-126}, 2^{127})$
Precision of x, |Δx / x|      $|x|^{-1} \cdot 2^{1+IWL-WL}$     $2^{-23}$
Function unit cost            significantly less

The function unit cost motivates us to find ways of coping with the shortcomings of fixed-point representations.
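
As a rough illustration of the fixed-point column, the sketch below quantizes a value to a WL-bit word with a given IWL and reports the step size $2^{1+IWL-WL}$. The helper names (`pow2i`, `lsb`, `to_fixed`, `from_fixed`) and the truncating conversion are assumptions made for this example; they do not come from the slides.

```c
#include <assert.h>

/* Exact power of two, 2^k, for small |k| (hypothetical helper). */
static double pow2i(int k)
{
    double p = 1.0;
    for (; k > 0; k--) p *= 2.0;
    for (; k < 0; k++) p /= 2.0;
    return p;
}

/* Weight of the least significant bit: 2^(1 + IWL - WL). */
static double lsb(int wl, int iwl)
{
    return pow2i(1 + iwl - wl);
}

/* Quantize x to a WL-bit fixed-point integer (truncation toward zero).
   Assumes |x| < 2^IWL so that the value fits in the word. */
static long to_fixed(double x, int wl, int iwl)
{
    double mag = x < 0 ? -x : x;
    assert(mag < pow2i(iwl));
    return (long)(x / lsb(wl, iwl));
}

/* Convert the fixed-point integer back to a real value. */
static double from_fixed(long q, int wl, int iwl)
{
    return (double)q * lsb(wl, iwl);
}
```

With WL = 16 and IWL = 3 the step is $2^{-12}$, so 3.14159 is stored as 12867 and reads back as roughly 3.14136.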

Motivation

Why convert floating-point code to fixed-point code?
    Saves area and power.

Why automate the process?
    Manual conversion is time-consuming and error-prone.

What qualities are we looking for in an automated conversion system?
    Good signal quality*.
    Fast code.

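Background: Fixed-point Numerical Representations in Signal Processing

Consider a program P with associated inputs $x(k) \in S_P$.
Example: P is an IIR filter, and $S_P$ is the set of all human speech samples x(k).

Signal Scaling: Integer Word Length (IWL)

Definition:

$$\mathrm{IWL}_{\alpha} \;\overset{\mathrm{def}}{=}\; \left\lceil \log_2 \left( \max_{x \in S_P} |\alpha| \right)^{+} \right\rceil$$

where:
$\alpha$: an input, program variable, intermediate result, or output
max: taken over all definitions of $\alpha$ and all inputs x
+: plus an infinitesimally small number. Why? E.g. $\log_2 2 = 1$, yet the value 2 lies outside the range $[0, 2^{1})$ and needs IWL = 2.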
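
One way to read this definition in code is to pick the smallest IWL whose range $[0, 2^{IWL})$ still contains the largest magnitude seen during profiling. The function name `iwl_from_max` and the loop-based search are assumptions for illustration only.

```c
#include <assert.h>

/* Smallest IWL such that max_abs < 2^IWL, i.e. the value fits the
   fixed-point range [0, 2^IWL). max_abs is the largest magnitude
   observed while profiling (assumed > 0). */
static int iwl_from_max(double max_abs)
{
    int iwl = 0;
    double bound = 1.0;              /* invariant: bound == 2^iwl */

    assert(max_abs > 0.0);
    while (max_abs >= bound) {       /* grow until the value fits */
        bound *= 2.0;
        iwl++;
    }
    while (max_abs < bound / 2.0) {  /* shrink while a smaller IWL still fits */
        bound /= 2.0;
        iwl--;
    }
    return iwl;
}
```

A magnitude of exactly 2.0 yields IWL = 2, which is the case the "infinitesimally small number" is there to handle; magnitudes below 0.5 yield negative IWLs.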

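Background: Fixed-Point Arithmetic Operations

Addition / Subtraction
[Figure: operand B is shifted right by n bits (>> n, binary point alignment) to line up with operand A; a further >> 1 provides an overflow guard bit (+ 1).]

Multiplication
[Figure: operands A and B have integer word lengths IWL_A and IWL_B; the product A*B has integer word length IWL_A + IWL_B. The remaining product bits are marked “???” on the slide.]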

Conversion Process: Previous Work

‘Worst-Case Evaluation’: Markus Willems et al. FRIDGE: An Interactive Code Generation Environment for HW/SW CoDesign. ICASSP, April 1997.

A ‘Statistical’ Approach: Ki-Il Kum, Jiyang Kang, and Wonyong Sung. A Floating-Point to Fixed-Point C Converter for Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler Workshop, August 1997.

Conversion Process: Overview

Input C File
  → SUIF Front End
  → Math Library Replacement (“sin(x)” → “utdsp_sin(x)”)
  → Alias Analysis & ID Assignment
  → Instrument Code / Profile to obtain Dynamic Ranges
  → Generate Scaling Operations
  → Code Generation / Detect & Generate FMLS operations
  → UofT DSP Simulator

Code fragments shown beside the flow:

    float *p, x, y, A[N], B[N];

    for( int i=0; i < N; i++ ){
        p = (condition) ? A : B;
        y += x*p[i];
    }

    float fubar( float *p ){
        float sum = 0.0;
        for( int i=0; i < N; i++)
            sum += p[i];
        return sum;
    }

Conversion Process: Collecting Dynamic Range Information

Consider the ANSI C code:

    float a, b, x[N];
    y = a*x[i] + b*x[i+1];

Equivalent Expression Tree:
[Figure: y is assigned the root “+”, whose two children are “*” nodes with operands (a, x[i]) and (b, x[i+1]).]

ID Assignment:

    “0” : y
    “1” : tmp_1
    “2” : tmp_2

Code Instrumentation:

    tmp_1 = a*x[i];
    profile(tmp_1,1);
    tmp_2 = b*x[i+1];
    profile(tmp_2,2);
    y = tmp_1 + tmp_2;
    profile(y,0);
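
The slides do not show what `profile` does internally. A minimal sketch, assuming it only tracks the largest magnitude seen for each ID, might look like this; the `max_abs` table, `NUM_IDS`, and the wrapper `instrumented` are hypothetical names introduced here.

```c
#include <assert.h>

#define NUM_IDS 3                 /* IDs 0..2, as in the example above */

static double max_abs[NUM_IDS];   /* largest |value| seen per ID */

/* Record the magnitude of one profiled value (hypothetical body). */
static void profile(double value, int id)
{
    double mag = value < 0 ? -value : value;
    assert(id >= 0 && id < NUM_IDS);
    if (mag > max_abs[id])
        max_abs[id] = mag;
}

/* Instrumented form of: y = a*x[i] + b*x[i+1]; */
static double instrumented(double a, double b, const double *x, int i)
{
    double tmp_1 = a * x[i];
    profile(tmp_1, 1);
    double tmp_2 = b * x[i + 1];
    profile(tmp_2, 2);
    double y = tmp_1 + tmp_2;
    profile(y, 0);
    return y;
}
```

After a profiling run over representative inputs, each entry of `max_abs` gives the dynamic range from which an integer word length can be derived for that ID.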

Conversion Process: Desired Result

Continuation of Previous Example:

    float a, b, x[N];
    y = a*x[i] + b*x[i+1];

becomes

    int a, b, x[N];
    y = (a•x[i] >> 2) + b•x[i+1];

1. Type Conversion (float → int)
2. Scaling Operations (>> 2)
3. Fractional Fixed-Point Operations (•)

Conversion Process: Type Conversion / Scaling Operation Generation

Type conversion: {float, double} → int

Scaling operations are added to expression trees using a post-order traversal.

Two previous algorithms from the literature generate scaling operations.

Neither uses intermediate result profile data; instead, they combine range information from leaf nodes in a bottom-up fashion.

Is Useful Information Lost?

Conversion Process: IRP: Using Intermediate Result Profile Data

‘Worst-Case Evaluation’: Markus Willems et al. FRIDGE: An Interactive Code Generation Environment for HW/SW CoDesign. ICASSP, April 1997.

A ‘Statistical’ Approach: Ki-Il Kum, Jiyang Kang, and Wonyong Sung. A Floating-Point to Fixed-Point C Converter for Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler Workshop, August 1997.

UTDSP Algorithms: IRP, IRP-SA
Each node has a measured IWL and a current IWL.
Measured: IWL as determined by profiling
Current: IWL due to scaling operations within

Scaling Operation Generation

Example: “A op B”

[Figure: expression tree with root “op” and the converted sub-expressions A and B as children. A and B each carry a measured and a current IWL. The root carries IWL_{A op B} measured, and IWL_{A op B} current, marked “?”.]

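IRP: Additive Operations

“A ± B” → “(A << $n_A$) ± (B >> [$n - n_B$])”

where:
$n_A = \mathrm{IWL}_{A}^{\text{current}} - \mathrm{IWL}_{A}^{\text{measured}}$
$n_B = \mathrm{IWL}_{B}^{\text{current}} - \mathrm{IWL}_{B}^{\text{measured}}$
$n = \mathrm{IWL}_{A}^{\text{measured}} - \mathrm{IWL}_{B}^{\text{measured}}$

$\mathrm{IWL}_{A \pm B}^{\text{current}} = \mathrm{IWL}_{A}^{\text{measured}}$

For example, assume |A| > |B|, and $\mathrm{IWL}_{A \pm B}^{\text{measured}} \le \mathrm{IWL}_{A}^{\text{measured}}$.

[Figure: B is shifted right by n bits (>> n) so that its binary point lines up with A.]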

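IRP: Multiplication

“A • B” → “(A << $n_A$) • (B << $n_B$)”

where:
$n_A = \mathrm{IWL}_{A}^{\text{current}} - \mathrm{IWL}_{A}^{\text{measured}}$
$n_B = \mathrm{IWL}_{B}^{\text{current}} - \mathrm{IWL}_{B}^{\text{measured}}$

$\mathrm{IWL}_{A \bullet B}^{\text{current}} = \mathrm{IWL}_{A}^{\text{measured}} + \mathrm{IWL}_{B}^{\text{measured}}$

Note: Typo in Notes! $\mathrm{IWL}_{A \bullet B}^{\text{current}} = n_A + n_B$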
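
The additive and multiplicative IRP rules above can be sketched as two small functions that compute the shift amounts and the resulting current IWL. This is only an illustration: the struct and function names are invented here, and $n_B$ is taken relative to B's own current IWL, which is an assumption about the intended formula.

```c
#include <assert.h>

/* Per-node bookkeeping (illustrative names, not from the slides). */
struct iwl_info {
    int measured;   /* IWL determined by profiling */
    int current;    /* IWL implied by scaling already applied */
};

struct scaling {
    int a_left;          /* left shift applied to A */
    int b_shift;         /* additive: right shift of B; multiply: left shift of B */
    int result_current;  /* current IWL of the combined node */
};

/* "A +/- B" -> "(A << nA) +/- (B >> (n - nB))".
   Assumes A has the larger (or equal) measured IWL. */
static struct scaling irp_additive(struct iwl_info a, struct iwl_info b)
{
    struct scaling s;
    int n_a = a.current - a.measured;
    int n_b = b.current - b.measured;   /* assumption: B's own current IWL */
    int n   = a.measured - b.measured;

    assert(n >= 0);
    s.a_left = n_a;
    s.b_shift = n - n_b;                /* a negative value would mean a left shift */
    s.result_current = a.measured;
    return s;
}

/* "A * B" -> "(A << nA) * (B << nB)". */
static struct scaling irp_multiply(struct iwl_info a, struct iwl_info b)
{
    struct scaling s;
    s.a_left = a.current - a.measured;
    s.b_shift = b.current - b.measured;
    s.result_current = a.measured + b.measured;
    return s;
}
```

For A with measured/current IWLs of 2/3 and B with -1/1, the additive rule gives (A << 1) ± (B >> 1) with a current IWL of 2, and the multiplicative rule gives (A << 1) • (B << 2) with a current IWL of 1.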

IRP-SA: Using ‘Shift Absorption’

Problem:

    y = (a*x[i] + (b*x[i+1] >> 1)) << 1

Question: Is information discarded unnecessarily here?

Answer: Yes! Consider the following alternative:

    y = (a*x[i] << 1) + b*x[i+1]

Assuming 2’s-complement arithmetic, this expression results in a more precise answer.
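
The effect can be checked with a small 16-bit emulation. In this sketch A and B stand for the two products, the operand values are made up, and `wrap16` is a hypothetical helper that models two's-complement wraparound without relying on signed overflow in C.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce a value to 16 bits with two's-complement wraparound,
   emulating a 16-bit datapath. */
static int16_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v;
    return (u & 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* Shift after the add: (A + (B >> 1)) << 1. The right shift drops
   B's low bit. B is assumed non-negative so that >> is portable. */
static int16_t shift_after_add(int16_t a, int16_t b)
{
    assert(b >= 0);
    int16_t sum = wrap16((int32_t)a + (b >> 1));
    return wrap16((int32_t)sum * 2);
}

/* Shift absorbed into the operand: (A << 1) + B. A << 1 may wrap,
   but the final sum is still exact whenever the true result fits
   in 16 bits. */
static int16_t shift_absorbed(int16_t a, int16_t b)
{
    int16_t a2 = wrap16((int32_t)a * 2);
    return wrap16((int32_t)a2 + b);
}
```

With A = -20000 and B = 17001 the exact value of 2A + B is -22999. The first form returns -23000 because B's low bit was discarded, while the second returns -22999 even though the intermediate A << 1 wrapped.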

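Architectural Support

Common occurrence (using IRP-SA):

    A•B << n

Fractional Multiplication with integrated Left Shift (FMLS):
[Figure: operands A and B, the product A*B, and the left shift applied when selecting the result bits from the product.]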
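
A minimal sketch of why integrating the shift helps, assuming 16-bit 1.15 operands and a 32-bit product: shifting after the multiply fills the low bits with zeros, whereas selecting a lower window of the product keeps real product bits. The function names and operand values are illustrative, and operands are restricted to non-negative values so the C shifts stay portable.

```c
#include <assert.h>
#include <stdint.h>

/* 1.15 x 1.15 fractional multiply: keep bits [30:15] of the product. */
static int16_t frac_mul(int16_t a, int16_t b)
{
    assert(a >= 0 && b >= 0);
    return (int16_t)(((int32_t)a * b) >> 15);
}

/* Separate operations: multiply, then shift left by n.
   The n low bits of the result are always zero. */
static int16_t mul_then_shift(int16_t a, int16_t b, int n)
{
    return (int16_t)(frac_mul(a, b) << n);
}

/* Integrated left shift: take the 16-bit window n bits lower in the
   32-bit product, so the low result bits come from the product itself.
   Assumes the scaled result still fits in 16 bits. */
static int16_t fmls(int16_t a, int16_t b, int n)
{
    assert(a >= 0 && b >= 0 && n >= 0 && n <= 15);
    return (int16_t)(((int32_t)a * b) >> (15 - n));
}
```

For a = 291, b = 1110 and n = 2, the product is 323010; multiplying and then shifting gives 36, while the integrated form gives 39, closer to the unrounded value of about 39.4.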

Experimental Results

Four test cases presented in the paper:

(1) 4th Order IIR Filter
(2) 1024-Point Radix-2 Decimation-in-Time FFT
(3) Nonlinear Feedback Control System
(4) 16th Order Lattice Filter

Look at (1) in detail, summarize results for others.

Explore some interesting properties exhibited in (4) that are indicative of possible future improvements.

Experimental Results: 4th Order IIR Filter

4th Order Chebyshev Type II Low-Pass Filter
Designed using MATLAB’s cheby2 command

Transfer Function:
[Figure: magnitude (dB) and phase (degrees) versus normalized frequency (×π rad/sample).]

Experimental Results: 4th Order IIR Filter (cont’d)

Filter Realization:
MATLAB’s tf2sos command (pole-zero pairing)
2 Cascaded Direct-Form IIR filters

Algorithm    14 Bit w/o FMLS    14 Bit w/ FMLS    16 Bit w/o FMLS    16 Bit w/ FMLS
SNU-4        44.7 dB            44.7 dB           56.4 dB            56.4 dB
WC           45.6 dB            45.6 dB           57.1 dB            57.1 dB
IRP          49.2 dB            49.3 dB           60.9 dB            62.0 dB
IRP-SA       48.8 dB            53.5 dB           61.0 dB            66.9 dB

Experimental Results: 4th Order IIR Filter (cont’d)

IRP:

    (A2[0]*t2 - A2[1]*D2[0] << 1) + (A2[2]*D2[1] << 1 ) << 2

IRP-SA:

    (A2[0]*t2 << 3) - (A2[1]*D2[0] << 3) + (A2[2]*D2[1] << 3)

Experimental Results: 1024-Point Radix-2 FFT

Algorithm    14 Bit w/o FMLS    14 Bit w/ FMLS    16 Bit w/o FMLS    16 Bit w/ FMLS
SNU-4        28.7 dB            28.7 dB           36.7 dB            36.7 dB
WC           28.7 dB            28.7 dB           36.7 dB            36.7 dB
IRP          28.7 dB            34.9 dB           36.7 dB            44.6 dB
IRP-SA       28.7 dB            34.9 dB           36.7 dB            44.6 dB

Experimental Results: Rotational Inverted Pendulum

U of T System Control Group
Non-linear Testbench

Experimental Results: Rotational Inverted Pendulum

Algorithm    14 Bit w/o FMLS    14 Bit w/ FMLS    16 Bit w/o FMLS    16 Bit w/ FMLS
SNU-4        42.7 dB            4.0 dB            30.7 dB            54.9 dB
WC           47.3 dB            54.3 dB           66.1 dB            59.2 dB
IRP          53.1 dB            58.4 dB           65.8 dB            71.8 dB
IRP-SA       52.8 dB            59.4 dB           64.4 dB            72.0 dB

Experimental Results: Rotational Inverted Pendulum - 12-bit Controller Comparison

WC: 32.8 dB
IRP-SA: 41.1 dB
IRP-SA w/ FMLS: 48.0 dB

Experimental Results: 16th Order Lattice Filter

16th Order Elliptic Bandpass Filter Transfer Function
[Figure: magnitude (dB) and phase (degrees) versus normalized frequency (×π rad/sample).]

Experimental Results: Lattice Filter

             32 Bit w/o Loop Unrolling        16 Bit w/ Loop Unrolling
Algorithm    w/o FMLS       w/ FMLS           w/o FMLS       w/ FMLS
SNU-4        22.8 dB        22.8 dB           47.1 dB        47.0 dB
WC           28.1 dB        28.1 dB           48.3 dB        48.3 dB
IRP          36.1 dB        36.2 dB           51.3 dB        51.3 dB
IRP-SA       36.1 dB        36.2 dB           51.3 dB        50.9 dB

Experimental Results: Lattice Filter

    #define N 16
    double state[N+1], K[N], V[N+1];

    double lattice( double x )
    {
        double y = 0.0;
        for( int i=0; i < N; i++ ) {
            x = x - K[N-i-1] * state[N-i-1];
            state[N-i] = state[N-i-1] + K[N-i-1]*x;
            y = y + V[N-i]*state[N-i];
        }
        state[0] = x;
        return y + V[0]*state[0];
    }

Experimental Results: Lattice Filter

Observation: The wide dynamic ranges of “state”, “V”, “x”, and “y” are due to ‘name dependencies’ of array elements and accumulators when assigning integer word lengths.

Loop unrolling plus renaming can break these dependencies and achieve far better results. (Iteration-dependent analysis is mentioned in the FRIDGE paper; however, no experimental results are reported there.)
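
To make the idea concrete, here is a sketch of the lattice routine for a deliberately small order (N = 2), once as a loop and once unrolled with every intermediate value renamed. The coefficients and the function names are hypothetical. The point is only that after unrolling, x0, x1, y0, y1 and each state element are distinct names, so each could be profiled and assigned its own integer word length.

```c
#include <assert.h>

#define N 2   /* small order, chosen only to keep the example short */

static const double K[N]     = { 0.5, -0.25 };       /* hypothetical coefficients */
static const double V[N + 1] = { 1.0, 0.5, 0.25 };

/* Loop form, following the lattice() routine shown earlier. */
static double lattice_rolled(double state[N + 1], double x)
{
    double y = 0.0;
    for (int i = 0; i < N; i++) {
        x = x - K[N - i - 1] * state[N - i - 1];
        state[N - i] = state[N - i - 1] + K[N - i - 1] * x;
        y = y + V[N - i] * state[N - i];
    }
    state[0] = x;
    return y + V[0] * state[0];
}

/* Unrolled and renamed: no name is reused across iterations. */
static double lattice_unrolled(double *s0, double *s1, double *s2, double x)
{
    /* iteration i = 0 */
    double x0 = x - K[1] * *s1;
    *s2 = *s1 + K[1] * x0;
    double y0 = V[2] * *s2;

    /* iteration i = 1 */
    double x1 = x0 - K[0] * *s0;
    *s1 = *s0 + K[0] * x1;
    double y1 = y0 + V[1] * *s1;

    *s0 = x1;
    return y1 + V[0] * *s0;
}
```

Both versions compute the same outputs; the difference lies only in how many separately named values the conversion tool gets to scale.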

Summary

Intermediate result profile data can be used to reduce the numerical error of fixed-point code.

A fractional multiply with integrated left shift operation can improve the results, especially when combined with the IRP-SA algorithm.

Improvements between 3.0 dB and 12.8 dB have been observed so far.

Future Directions

Structural Transformations

Extended Precision Arithmetic

Overflows due to accumulated rounding error: use two profiling phases to estimate the effect of ‘second-order’ interactions.