[IEEE 2006 IEEE Workshop on Signal Processing Systems Design and Implementation - Banff, AB, Canada (2006.10.2-2006.10.4)] 2006 IEEE Workshop on Signal Processing Systems Design and

Spatial Optical Distortion Correction in an FPGA

Lin Qiang and Nigel M Allinson

Abstract: Due to the complexities of the image processing algorithms, correcting spatial distortion of optical images quickly and efficiently is a major challenge. This paper describes an efficient pipelined parallel architecture for optical distortion correction in imaging systems using a low-cost FPGA device. The proposed architecture produces a fast, almost real-time solution for the correction of image distortion, implemented in VHDL on a single Xilinx XC3S1000-4 FPGA device. The experimental results show that barrel and pincushion distortion can be corrected with a very low residual error. The system architecture can be applied to other image processing algorithms in optical systems.

I. INTRODUCTION

ONE picture is worth ten thousand words. Provided that it is a true representation of the subject, the story that it tells will be properly understood. Unfortunately, optical imaging systems are prone to distortion, and the challenge is to correct for this as quickly and efficiently as possible.

The introduction of FPGA processors has provided a powerful platform with which to undertake these tasks, subject to overcoming the computational complexities of the algorithms. Spatial distortion is commonly seen in optical systems [1], especially when utilizing complex lens systems. It is manifested by changes in the shape of an image, and the two most prevalent types of spatial distortion are pincushion and barrel. Of particular interest to us is the use of coherent fibre-optic tapers, which are commonly used in scientific and medical x-ray imaging systems. These tapers are optically more efficient than lens systems for moderate demagnification ratios. However, they do suffer from significant optical distortions. The methods presented here for near real-time correction are equally applicable to other optical systems.

Several image processing algorithms have been developed for spatial distortion tasks. Sawhney et al. [2] calibrated a lens with a third-order correction algorithm for the creation of video mosaics. Tsai [3] proposed a second-order radial correction for 3D camera calibration using an off-the-shelf TV camera and lenses. An on-line camera calibration technique based on least-squares estimation was presented by Asari et al. [4]. Most published work on image distortion correction has been demonstrated in software due to the ease of implementation of the complex computational processes on microprocessor-based systems. However, software implementation has inherent problems due to its speed limitations.

Manuscript received May 27, 2006. This work was supported in part by the RC-UK Basic Technology Programme - Multi-dimensional Intelligent Integrated Imaging (EPSRC Ref GR/S85733/01).

L. Qiang is with the Department of Electronic and Electrical Engineering, Laboratory of Image and Vision Engineering, University of Sheffield, Mappin Street, Sheffield, S1 3JD, UK (telephone: +44 (0)114 222 5832; fax: +44 (0)114 222 5146; e-mail: l.qiang@sheffield.ac.uk).

N. Allinson is with the Department of Electronic and Electrical Engineering, Laboratory of Image and Vision Engineering, University of Sheffield, Mappin Street, Sheffield, S1 3JD, UK (e-mail: n.allinson@sheffield.ac.uk).

Since image processing algorithms are, in general, computationally expensive and become very time-consuming for high frame rates or large images, hardware implementation provides not only an optimal parallel environment, but also a potentially faster, real-time solution. The hardware implementation of image processing algorithms can provide superior performance [5]-[7] by using special-purpose hardware rather than general-purpose microprocessors. This is because application-specific hardware is configured to meet the precise computational requirements of the algorithms used. In particular, the advent of multi-million gate FPGAs (e.g. Spartan-3), with low cost and high gate density for highly efficient DSP system designs, provides the opportunity to implement the image processing for a sophisticated system in a single programmable device. The Spartan-3 FPGA combines a high capacity of flexible programmable logic, embedded hardware multipliers, advanced clocking circuitry, versatile embedded memory and fast interconnect structures. It can therefore provide a fully compliant platform for an image distortion correction processor.

In this paper, an FPGA implementation of a pipelined parallel architecture for optical image distortion correction is described. The distorted image is obtained from a CMOS sensor in real time and the corrected image is stored in a memory device. The entire distortion correction process is implemented in VHDL and then simulated and synthesized into the FPGA.

II. IMAGE PROCESSING

In an imaging system, it can be assumed that the distortion is circularly symmetric and can be corrected using a polynomial equation that provides a correction in the radial direction. Under this assumption, the required processing for distortion correction is described below.

In general, the distorted image obtained from a sensor is represented in $(x, y)$ Cartesian coordinates. The correction polynomial uses polar coordinates, i.e. the radius and angle $(\rho, \theta)$. These can be obtained by performing the forward coordinate transformation:

$$\rho = \sqrt{x^2 + y^2}, \qquad \theta = \arctan(y/x) \tag{1}$$

1-4244-0383-9/06/$20.00 ©2006 IEEE

If the image system is distortion-free, the object should be imaged in linear proportion to its shape. However, both barrel and pincushion distortion deform the image by a spatial distortion that is nonlinear. When viewing a rectangular image, barrel distortion causes the image to be compressed at the extreme regions, and pincushion distortion expands it in the outermost regions, especially at the corners.

In order to deal with this problem, a distorted image of a standard pattern, such as a checkerboard, is employed, and the model parameters are then estimated as a basis for this square distorted pattern. In mathematical terms, the radius $\rho$ is corrected by a back-mapping n-order polynomial function as expressed by

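$$\rho' = c_1\rho + c_2\rho^2 + c_3\rho^3 + c_4\rho^4 + \cdots + c_n\rho^n \tag{2}$$

where $\rho'$ is the corrected radius. The coefficients $(c_1, c_2, c_3, c_4, \ldots, c_n)$ can be estimated by a polynomial regression

$$[\,c_1\ \ c_2\ \ c_3\ \ c_4\ \ \cdots\ \ c_n\,]^T = X^{-1}\,[\,\rho_1\ \ \rho_2\ \ \rho_3\ \ \cdots\ \ \rho_m\,]^T \tag{3}$$

where $\rho_1, \rho_2, \rho_3, \rho_4, \ldots, \rho_m$ are the radii of the distorted image and the array $X$ is represented by

$$X = \begin{bmatrix}
\rho_1' & (\rho_1')^2 & (\rho_1')^3 & (\rho_1')^4 & \cdots & (\rho_1')^n \\
\rho_2' & (\rho_2')^2 & (\rho_2')^3 & (\rho_2')^4 & \cdots & (\rho_2')^n \\
\rho_3' & (\rho_3')^2 & (\rho_3')^3 & (\rho_3')^4 & \cdots & (\rho_3')^n \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
\rho_m' & (\rho_m')^2 & (\rho_m')^3 & (\rho_m')^4 & \cdots & (\rho_m')^n
\end{bmatrix}$$

where $(\rho_1', \rho_2', \rho_3', \ldots, \rho_m')$ are the radii of the corrected image.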
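The regression step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' software: the names `solve_linear_system` and `fit_radial_polynomial` are hypothetical, the polynomial has no constant term (as in (2)), and because the design matrix has more rows than columns whenever there are more samples than coefficients, the system is solved in the least-squares sense through the normal equations, a choice the paper does not specify. The arguments `r_in` and `r_out` stand for the radius fed into the polynomial and the radius it should produce.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations act on both sides at once.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        acc = m[row][n] - sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = acc / m[row][row]
    return x


def fit_radial_polynomial(r_in, r_out, order=4):
    """Least-squares fit of r_out = c1*r_in + ... + cn*r_in**n."""
    # Design matrix: one row per sample, columns hold powers 1..order.
    design = [[r ** k for k in range(1, order + 1)] for r in r_in]
    # Normal equations: (X^T X) c = X^T r_out.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(order)]
           for i in range(order)]
    xty = [sum(row[i] * y for row, y in zip(design, r_out))
           for i in range(order)]
    return solve_linear_system(xtx, xty)
```

In practice it may help to normalise the radii (for example to the range 0 to 1) before fitting, since high powers of radii measured in pixels make the normal equations poorly conditioned.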

Haneishi's experimental results [8] show that errors are minimal for a 4th order polynomial or greater. Since the model is circularly symmetric, the corrected angle variable $\theta'$ should not change in the corrected image space from the corresponding value of $\theta$, i.e. $\theta' = \theta$. Using the corrected polar coordinates $(\rho', \theta')$, the corresponding Cartesian coordinates $(x', y')$ for the corrected image can be obtained by performing the backward coordinate transformation:

$$x' = \rho'\cos\theta', \qquad y' = \rho'\sin\theta' \tag{4}$$
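As a floating-point reference for the chain of steps so far (forward transformation, radial polynomial, backward transformation), a minimal sketch is given here. It assumes coordinates measured from the optical centre, uses `math.atan2` rather than a plain arctangent so that all four quadrants are handled, and takes the angle from the x axis as in (1), so that x' = ρ' cos θ'. The function name `map_point` is hypothetical; the coefficients are the barrel values listed in Table 1.

```python
import math

# Barrel coefficients c1..c4 as listed in Table 1.
BARREL_COEFFS = (
    1.0003870170656408,
    -0.0000000284392336662648,
    -0.0000039512833394741542,
    0.0000000068708496341944,
)


def map_point(x, y, coeffs=BARREL_COEFFS):
    """Map one coordinate pair through equations (1), (2) and (4)."""
    rho = math.hypot(x, y)        # equation (1): radius
    theta = math.atan2(y, x)      # equation (1): angle, quadrant-aware
    # Equation (2): polynomial in the radius, no constant term.
    rho_c = sum(c * rho ** k for k, c in enumerate(coeffs, start=1))
    # Equation (4) with theta' = theta.
    return rho_c * math.cos(theta), rho_c * math.sin(theta)
```

A model of this kind is useful mainly as a golden reference against which a fixed-point hardware implementation can be compared.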

Unfortunately, when the grid-coordinate pixels in the distorted image are mapped to the corrected image, their counterparts in the corrected image need further correction because the mapping is nonlinear. The corrected pixel has to be referred to the original grid-coordinate system, and the new pixel value $T(x', y')$ is derived by linear interpolation from the neighboring pixels, as shown in Fig. 1. This produces the interpolated point that is a neighbor of the points $(a_i, b_i)$, $(a_i, b_i + 1)$, $(a_i + 1, b_i)$ and $(a_i + 1, b_i + 1)$. In order to map each pixel onto the grid-coordinate in the corrected image, the corresponding point is obtained by regression analysis. The new pixel is derived by a linear interpolation from the neighboring pixels or the nearest neighboring pixels. This is a form of polynomial regression. The linear interpolation can be expressed by

$$T(a,b) = T(a_i,b_i)(1-a_f)(1-b_f) + T(a_i+1,b_i)\,a_f(1-b_f) + T(a_i,b_i+1)(1-a_f)\,b_f + T(a_i+1,b_i+1)\,a_f b_f \tag{5}$$

where $a = a_i + a_f$ and $b = b_i + b_f$.

In the interpolation process, the result is a function of the coordinates (integer and fractional parts) and the intensities of the neighboring pixels.

[Figure: the interpolated point (a, b), where a = a_i + a_f is the sum of an integer part and a fractional part, surrounded by the neighbouring grid points (a_i, b_i), (a_i, b_i + 1), (a_i + 1, b_i) and (a_i + 1, b_i + 1).]

Fig. 1 Interpolated point with neighboring pixels
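The linear interpolation over the four neighbouring grid points can be illustrated with a short floating-point sketch. The function name `bilinear` and the indexing convention `image[a][b]` are assumptions made for the example, and the point is assumed to lie far enough inside the image that all four neighbours exist.

```python
import math


def bilinear(image, a, b):
    """Interpolate image at the real-valued point (a, b)."""
    a_i, b_i = math.floor(a), math.floor(b)   # integer parts
    a_f, b_f = a - a_i, b - b_i               # fractional parts
    return (image[a_i][b_i] * (1 - a_f) * (1 - b_f)
            + image[a_i + 1][b_i] * a_f * (1 - b_f)
            + image[a_i][b_i + 1] * (1 - a_f) * b_f
            + image[a_i + 1][b_i + 1] * a_f * b_f)
```

For the 2 x 2 image `[[0, 10], [20, 30]]`, the centre point (0.5, 0.5) weights all four pixels equally and gives 15.0.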

III. FPGA PROCESSING

The aim of the hardware design is to implement the proposed algorithms as quickly and efficiently as possible in a single low-cost Xilinx FPGA device, using the minimum device resources. The pipelined parallel architectures and hardware implementations of the image distortion correction algorithm are discussed below.

A. Co-ordinate Transformation Implementation

The proposed algorithm used to correct the image distortion, as outlined in Section II, uses coordinate transformations between Cartesian and polar coordinates twice. From equations (1) and (4), it can be seen that these transformation algorithms involve computing both square roots and trigonometric functions.

Unfortunately, implementing these algorithms in a microprocessor-based system is demanding, as they do not usually map easily onto all hardware systems. Effective implementation in FPGA hardware uses the CORDIC algorithms by Volder [9]. The main advantage of the CORDIC algorithms is that they use only addition and shift operations and avoid evaluating trigonometric functions directly. Thus they are easily targeted onto an FPGA architecture.

Using CORDIC algorithms [9], [10], the coordinate transformations (1) and (4) are implemented with the following iterative equations:

$$x_{n+1} = x_n - \delta_n 2^{-n} y_n \tag{6}$$

$$y_{n+1} = y_n + \delta_n 2^{-n} x_n \tag{7}$$

$$z_{n+1} = z_n - \delta_n \arctan(2^{-n}) \tag{8}$$

where the subscript n is the iteration number, the angle $\alpha_n$ is replaced by $z_n$, and $\delta_n$ is a decision constant whose sign determines whether the process performs an addition or a subtraction at each iteration.

The sign of the variable y or z is used to determine the direction of rotation $\delta_n$. The decision function depends on the direction of the transformation, i.e. from Cartesian to polar coordinates or from polar to Cartesian. When the decision constant $\delta_n$ is determined by the variable y, it can be represented by

$$\delta_n = \begin{cases} +1, & y_n < 0 \\ -1, & y_n > 0 \end{cases} \tag{9}$$

The iteration process will seek to minimise the y value of the residual vector at each rotation and eventually drive it to zero; this is called the vector mode. In this mode, the angle $z_0$ is initialized to zero, and the iteration results can be used to calculate the radius and the angle from the initial values $(x_0, y_0)$ after N iterations.

An n-iteration architecture of the CORDIC algorithm in the vector mode, implementing the transformation from Cartesian to polar coordinates in an FPGA, is shown in Fig. 2. Once the initial values $(x_0, y_0, z_0)$ are loaded into the registers, and after subsequent clock pulses, the register values are moved using an n-bit right shift operation. They are then crossed over to perform either addition or subtraction with the existing register values; the sign of $\delta_n$ determines which of the two is performed. The inverse tangent constants $(\alpha_0, \alpha_1, \ldots, \alpha_n)$ are stored in ROM. During the iteration, a fixed-point value of the angle constant is obtained for each iteration from the corresponding memory address.

In Fig. 2, the three columns correspond to the three iteration variables x, y and z. These are processed synchronously and their operations are executed in parallel. The results are stored in registers for synchronization with the image during every cycle. The iterative results are then output from the registers, which correspond to the Cartesian coordinates. Since Xilinx FPGAs have a large number of Configurable Logic Blocks (CLBs), they already contain the required registers, and hence there is no additional cost in using a pipelined operation. The transformation from Cartesian to polar coordinates is thus performed by shift and addition operations without the need to compute complex functions in the FPGA.

The transformation from polar to Cartesian coordinates is also performed using the CORDIC algorithms [10]. The same iterative equations (6) to (8) are used to implement equation (4). However, the decision constant $\delta_n$ is determined by the z variable, i.e. $\delta_n = +1$ when $z_n > 0$ and $\delta_n = -1$ when $z_n < 0$. The iteration process will eventually drive the initial angle $z_0$ to zero; this is called the rotation mode.

In the rotation mode, the results iterated from the initial values $(x_0 = \rho', y_0 = 0, z_0 = \theta')$ can be used to calculate the Cartesian coordinates $(x', y')$ after N iterations.

Therefore, the transformation from polar to Cartesian coordinates can use a pipelined parallel architecture similar to that shown in Fig. 2; the only difference is that $\delta_n$ is derived from $z_n$ rather than $y_n$.

[Figure: pipeline stages 1, 2, ..., n-1, n for the x, y and z columns, each stage with an adder/subtracter, an inverse tangent constant and a decision function.]

Fig. 2 Pipeline architecture of n-iterations
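The two CORDIC modes can be modelled in software with integer shifts and additions. The sketch below is a behavioural illustration, not the authors' VHDL: the names `cordic`, `to_polar` and `to_cartesian`, the iteration count `N_ITER = 24` and the treatment of a zero y or z as the "-1" case are all assumptions. The 23 fractional bits follow the coordinate word length reported in Section IV. The accumulated CORDIC gain is removed by a final division, standing in for the scaling multiplication that follows the iterations in hardware.

```python
import math

FRAC_BITS = 23                      # fractional bits of the fixed-point word
ONE = 1 << FRAC_BITS
N_ITER = 24                         # assumed number of iterations
# Inverse tangent constants arctan(2^-n), the table that would sit in ROM.
ATAN_ROM = [round(math.atan(2.0 ** -n) * ONE) for n in range(N_ITER)]
# Accumulated gain of the iterations, removed by the final scaling.
GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -n) for n in range(N_ITER))


def cordic(x, y, z, vector_mode):
    """Iterate the shift-and-add recurrences on fixed-point integers."""
    for n in range(N_ITER):
        if vector_mode:
            delta = 1 if y < 0 else -1   # decision taken from the sign of y
        else:
            delta = 1 if z > 0 else -1   # decision taken from the sign of z
        # Tuple assignment uses the old x and y on the right-hand side.
        x, y, z = (x - delta * (y >> n),
                   y + delta * (x >> n),
                   z - delta * ATAN_ROM[n])
    return x, y, z


def to_polar(x, y):
    """Cartesian to polar in vector mode; assumes x > 0."""
    xf, _, zf = cordic(round(x * ONE), round(y * ONE), 0, True)
    return xf / ONE / GAIN, zf / ONE


def to_cartesian(rho, theta):
    """Polar to Cartesian in rotation mode; assumes |theta| < 1.74 rad."""
    xf, yf, _ = cordic(round(rho * ONE), 0, round(theta * ONE), False)
    return xf / ONE / GAIN, yf / ONE / GAIN
```

For example, `to_polar(3.0, 4.0)` returns a radius close to 5 and an angle close to `math.atan2(4.0, 3.0)`. Inputs outside the stated ranges would need a quadrant pre-rotation, which is not discussed in the paper and is left out here.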

B. Radius Correction Implementation

Equation (2) is used to obtain the corrected radii for the corrected image. The corrected radius $\rho'$ is obtained from the polynomial approximation of the distortion radius $\rho$. The polynomial coefficients $(c_1, c_2, c_3, c_4, \ldots, c_n)$ can be estimated by polynomial regression in software and are then used as constants in the hardware implementation.


A pipelined architecture of the 4th order radius polynomial function can be implemented as shown in Fig. 3. For the pipeline implementation, only two multipliers are employed. One is used for the successive powers of the distortion radius, such as $\beta_2 = \rho\rho$, $\beta_3 = \rho^2\rho$ and $\beta_4 = \rho^3\rho$ (with $\beta_1 = \rho$). The other is used for the multiplication of the polynomial coefficients by these powers, such as $c_1\beta_1$, $c_2\beta_2$, $c_3\beta_3$ and $c_4\beta_4$. One adder is employed to implement the sum of the polynomial terms. The term $\Delta T$ represents a delay of the signal. Every output of a multiplication or addition must be loaded into registers for synchronization with the clock cycle. The coefficients can be hardwired instead of being held in program memory.

In the polynomial implementation, it is worth noting that every coefficient should be scaled differently. This is because, as the power of the radius increases, the value of the corresponding coefficient of the polynomial function decreases. This could be implemented by increasing the coefficients and decreasing the corresponding multiplier factors of the radius by the same factor, such as $c_n\rho^n = (c_n m)(\rho^n / m)$, where $m$ is the common factor. In this way, high precision can be achieved. Without this process, the accumulated terms will cause significant error in equation (2). Truncation of the outputs is also performed after each multiplication, although this has not been shown in the block diagram.

[Figure: pipeline stages for the radius polynomial with output ρ'; legend: Register, Delay, Adder, Hardwiring.]

Fig. 4 Architecture for the radius correction
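The two-multiplier schedule for the radius polynomial can be mimicked in software as follows. This is a floating-point sketch of the data flow only; the function name `radius_polynomial` is hypothetical, and the registers, delays and truncation of the hardware pipeline are not modelled.

```python
def radius_polynomial(rho, coeffs):
    """Evaluate c1*rho + c2*rho**2 + ... using one power path and one product path."""
    power = 1.0
    total = 0.0
    for c in coeffs:
        # First multiplier: next power of the radius.
        # (The first pass only loads rho; no hardware multiply is needed for it.)
        power = power * rho
        # Second multiplier forms the term; the single adder accumulates it.
        total = total + c * power
    return total
```

With `rho = 2.0` and coefficients `[1.0, 0.5, 0.25, 0.125]`, each of the four terms equals 2, so the result is 8.0.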

C. Interpolation Implementation

1) Obtaining the Values for Neighboring Pixels

Since the back mapping of coordinates from polar to Cartesian is nonlinear, the corrected coordinate $(x', y')$, obtained from the iterative result of the CORDIC algorithm in the rotation mode, consists of two parts: the integer part and the fractional part. This is shown in Fig. 1. The integer parts can be used as addresses to the pixel memory (RAM) for obtaining the values of the neighboring pixel intensities, i.e. $T(a_i, b_i)$, $T(a_i+1, b_i)$, $T(a_i, b_i+1)$ and $T(a_i+1, b_i+1)$. The fractional parts, i.e. $a_f$, $b_f$, $1-a_f$ and $1-b_f$, measure the distances to the neighboring pixels and are used in the later interpolation.

In the imaging system, the input values of the distorted image pixels are stored in memory, and each is read in one clock cycle from the corresponding memory address.

[Figure: the inputs x' = a_i + a_f and y' = b_i + b_f are split into integer and fractional parts; the integer parts address the RAM to read the four neighbouring pixel values and the fractional parts give a_f, b_f, 1 - a_f and 1 - b_f; legend: Register, Subtractor, Adder, Memory (RAM), Delay.]

Fig. 5 Architecture for obtaining neighboring pixel parameters

To obtain the values for neighboring pixels, a pipelined architecture is designed to get the distance parameters and pixel values from the memory. This is shown in Fig. 5. Once the corrected coordinates $(x', y')$ are loaded into registers in a clock cycle, they are separated into the integer part and the fractional part in the following clock cycle. In each subsequent cycle, the RAM is read once to get a pixel value at the corresponding address. One adder is used for getting the addresses of the neighboring pixels. The following four consecutive cycles can therefore obtain the four neighboring pixel values. Within six cycles, the values of all parameters for the interpolation of neighboring pixels are obtained.
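The separation into integer and fractional parts, and the generation of the four read addresses, can be sketched with integer operations. The 23-bit fraction follows the coordinate word length reported in Section IV; the row-major memory layout, the restriction to non-negative coordinates and the names `split_fixed` and `neighbour_addresses` are assumptions made for this illustration.

```python
FRAC_BITS = 23                       # fractional bits of the coordinate word
ONE = 1 << FRAC_BITS
FRAC_MASK = ONE - 1
WIDTH = 250                          # image width used in the experiments


def split_fixed(value):
    """Split a non-negative fixed-point word into (integer, fraction) fields."""
    return value >> FRAC_BITS, value & FRAC_MASK


def neighbour_addresses(x_fixed, y_fixed, width=WIDTH):
    """Return the four RAM addresses and the four distance parameters."""
    a_i, a_f = split_fixed(x_fixed)
    b_i, b_f = split_fixed(y_fixed)
    base = b_i * width + a_i         # assumed row-major address of (a_i, b_i)
    # Order: (a_i, b_i), (a_i+1, b_i), (a_i, b_i+1), (a_i+1, b_i+1).
    addresses = [base, base + 1, base + width, base + width + 1]
    # One subtraction per axis gives the complementary distances.
    weights = (a_f, b_f, ONE - a_f, ONE - b_f)
    return addresses, weights
```

For x' = 10.25 and y' = 3.5 the base address is 3 * 250 + 10 = 760, so the four reads are at 760, 761, 1010 and 1011.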

2) Linear Interpolation Implementation

Equation (5) is used to obtain the corrected image values from the distorted pixels and the corrected coordinates. There are eight multiplications to calculate a corrected value of the image. Using 32-bit x 32-bit operands and a parallel implementation of equation (5), 24 hardware multipliers would be required for calculating one set of interpolation values. As there is always a limited number of dedicated multipliers on an FPGA device, this would immediately exhaust the device resources, because these multipliers are also needed for the polynomial approximation of the distortion radius and for scaling the magnitudes after the CORDIC iterations. In the Xilinx Spartan 3 1000-4 FPGA device there are only 24 hardware multipliers. The efficient implementation of a pipelined operation is therefore crucial.


Fig. 6 shows a pipelined architecture for the implementation of equation (5) in FPGA hardware. Two dedicated multipliers are employed for the linear interpolation in each cycle. M1 is used to multiply the coordinate fractions, such as $a_f \times b_f$, and M2 is used to multiply the pixel intensity by that product of coordinate fractions.

Since the results of the dedicated multipliers are stored in registers at every clock cycle, no additional registers are required after the multiplications. One adder is used for successively calculating the sum in equation (5). The RAM provides the pixel values from memory, as shown in Fig. 6. A delay is used in the pipeline so that the operands appear at the correct time in the process. After every clock cycle, the results are stored in registers for synchronous operation. Once the parameters for the neighboring pixels have been obtained, they are available for the interpolation.
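The sharing of two multipliers and one adder across the four terms of the interpolation can be pictured with the sequential sketch below. It models only the order of operations, in floating point; the function name `interpolate_sequential` and the ordering of the `pixels` tuple are assumptions, and the overlap between M1 and M2 that a real pipeline would exploit is not modelled.

```python
def interpolate_sequential(pixels, a_f, b_f):
    """Accumulate the four interpolation terms one per step.

    pixels = (T(ai, bi), T(ai+1, bi), T(ai, bi+1), T(ai+1, bi+1))
    """
    fraction_pairs = (
        (1.0 - a_f, 1.0 - b_f),
        (a_f, 1.0 - b_f),
        (1.0 - a_f, b_f),
        (a_f, b_f),
    )
    total = 0.0
    for pixel, (u, v) in zip(pixels, fraction_pairs):
        weight = u * v           # multiplier M1: product of coordinate fractions
        term = pixel * weight    # multiplier M2: pixel intensity times weight
        total += term            # single adder accumulates the sum
    return total
```

Four steps with two multiplications each account for the eight multiplications of the interpolation while occupying only two multipliers.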

IV. EXPERIMENTAL RESULTS

To verify the effectiveness of the architectures for the proposed algorithm for correcting the distorted image, a checkerboard pattern is used for an orthoscopic image with pixel dimensions of 250 x 250, as shown in Fig. 7(a). A typical barrel distortion of the checkerboard is shown in Fig. 7(b) and a pincushion distortion in Fig. 7(c). The distortion rates of the two distorted images are given in Fig. 8, indicating that the maximum value of the barrel distortion is 11.1% and that of the pincushion distortion is 6.0%. The estimated coefficients for the corrected image experiments are given in Table 1.

[Figure: pipelined datapath with RAM input and output T; legend: Multiplier, Adder, Register, Memory, Delay.]

Fig. 6 Pipelined linear interpolation implementation

[Figure panels: (a) Orthoscopic checkerboard, (b) Barrel distortion checkerboard, (c) Pincushion distortion checkerboard, (d) Corrected checkerboard.]

Fig. 7 Processed checkerboards

The proposed architectures described above have been implemented in VHDL on a Xilinx Spartan 3 1000-4 FPGA device, and a ModelSim simulator was used to test the design. The pixel values of both the input distorted and the output corrected images use a 10-bit integer word. The coordinates use a 32-bit word, which consists of a 1-bit sign, an 8-bit integer part of the pixel coordinate and a 23-bit fractional part. The inverse tangent constants and the coefficients of the 4th order polynomial approximation of the radius are represented by a 1-bit sign plus a 24-bit word.

The transformations between Cartesian and polar coordinates used 9% of the available slices, and 2% were used for the radius polynomial calculation. In the design, four multipliers are needed for a 32-bit x 32-bit multiplication. The scaling of magnitudes for the coordinate transformations between Cartesian and polar used eight dedicated multipliers. The polynomial calculation of the radius correction used another eight multipliers. Two dedicated multipliers are also used for the pipelined interpolation computation, which requires two 16-bit x 16-bit multiplications. After synthesis, the whole implementation of the pipelined architectures for the image system uses 29% of the slices, 20% of the chip memory (RAM) and 75% of the hardware multipliers. It can be run at clock frequencies up to 64 MHz. The corrected checkerboard for the barrel distortion is shown in Fig. 7(d); the corrected image for the pincushion distortion is similar and has therefore been omitted. Experimental results show that the errors of the corrected checkerboard were 1.53% for the barrel distorted image and 1.10% for the pincushion distorted image.
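The multiplier figures quoted above can be cross-checked with a few lines of arithmetic. The variable names are illustrative; all numbers are taken from the text.

```python
TOTAL_MULTIPLIERS = 24          # dedicated multipliers on the device
MULTIPLIERS_PER_32X32 = 4       # multipliers needed for one 32-bit x 32-bit product

used = {
    "CORDIC magnitude scaling": 2 * MULTIPLIERS_PER_32X32,   # 8
    "radius polynomial": 2 * MULTIPLIERS_PER_32X32,          # 8
    "pipelined interpolation": 2,                            # two 16 x 16 products
}
total_used = sum(used.values())
utilisation = 100.0 * total_used / TOTAL_MULTIPLIERS
```

The total is 18 of the 24 multipliers, which matches the reported 75% utilisation.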

[Figure: distortion plotted against radius for the pincushion and barrel images; vertical axis from -15 to 15, horizontal axis "Radius" from 0 to 160.]

Fig. 8 Distortion error

TABLE 1 RADIUS POLYNOMIAL COEFFICIENTS

Image   Barrel coefficient           Pincushion coefficient
C1       1.0003870170656408000000    0.9984815596059590800000
C2      -0.0000000284392336662648    0.0000784818073125427130
C3      -0.0000039512833394741542    0.0000006559020987804277
C4       0.0000000068708496341944    0.0000000073350279261776

V. CONCLUSION

The proposed algorithm for correcting spatial distortion in optical image systems has been presented in this paper. The full pipelined parallel architectures of the proposed model are implemented in VHDL, and simulated and synthesized into an FPGA. The coordinate transformations between Cartesian and polar coordinates employ CORDIC algorithms, which have been shown to be suitable for an FPGA structure without the need to compute complex square roots or trigonometric functions. An efficient pipelined architecture for pixel interpolation provides a good solution for the heavy multiplication required. The experimental results for the correction of barrel distorted and pincushion distorted images have shown that the proposed algorithm is feasible and that the pipelined parallel architecture uses fewer device resources than are available on a Xilinx Spartan 3-1000 FPGA. This therefore provides a low-cost solution. The corrected images have very low residual errors compared to an orthoscopic checkerboard. The experimental results are encouraging, and it has been shown that efficient pipelined architectures for image processing algorithms can be implemented in a single FPGA device, providing both low cost and high speed. The general approach can be applied to other forms of image processing for optical systems.

REFERENCES

[1] S. F. Ray, Applied Photographic Optics, 2nd ed., Focal Press, 1997.

[2] H. S. Sawhney and R. Kumar, "True multi-image alignment and its application to mosaicing and lens distortion correction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 3, pp. 235-243, March 1999.

[3] R. Y. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," IEEE Journal of Robotics and Automation, vol. 3, no. 4, pp. 323-344, August 1987.

[4] K. Vijayan Asari, S. Kumar, and D. Radhakrishnan, "A new approach for nonlinear distortion correction in endoscopic images based on least squares estimation," IEEE Transactions on Medical Imaging, vol. 18, no. 4, pp. 345-354, April 1999.

[5] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers Inc., San Mateo, USA, 1990.

[6] S. O. Memik, A. K. Katsaggelos, and M. Sarrafzadeh, "Analysis and FPGA implementation of image restoration under resource constraints," IEEE Transactions on Computers, vol. 52, no. 3, March 2003.

[7] P. Kitsos, N. Sklavos, and O. Koufopavlou, "An efficient implementation of the digital signature algorithm," Proc. of the 9th IEEE International Conference on Electronics, Circuits and Systems, vol. 3, San Jose, USA, 2002.

[8] H. Haneishi, Y. Yagihashi, and Y. Miyake, "A new method for distortion correction of electronic endoscope images," IEEE Transactions on Medical Imaging, vol. 14, pp. 548-555, September 1995.

[9] J. E. Volder, "The CORDIC trigonometric computing technique," IRE Transactions on Electronic Computers, vol. EC-8, no. 3, pp. 330-334, 1959.

[10] R. Andraka, "A survey of CORDIC algorithms for FPGA based computers," Proc. of the 1998 ACM/SIGDA Sixth International Symposium on FPGAs, February 22-28, 1998, Monterey, CA, pp. 191-200.
