Variational Methods in Machine Vision · Variational Methods in Machine Vision JÓZSEF MOLNÁR Supervisor: Prof. Dmitry Chetverikov Eötvös Loránd University Doctoral School on

PhD theses

Variational Methods in Machine Vision

JÓZSEF MOLNÁR

Supervisor: Prof. Dmitry Chetverikov

Eötvös Loránd University

Doctoral School on Informatics

The Basics and Methodology of Informatics

Program Director: Prof. János Demetrovics

Budapest 2011


I. Introduction

This short introduction summarizes the role of the variational method in the sciences in general and in machine vision specifically. The fundamental notions are set in italics.

A discipline that models the nontrivial interactions of its objects with mathematical categories usually condenses them into its fundamental equations. These are often ordinary or partial differential equations that follow directly or indirectly from the discipline's axioms. Mathematically, the equations derived indirectly from variational principles are considered the best-established basis for the fundamentals, because such equations express more than local interactions: they also guarantee certain global principles, usually conservation or minimization principles. The method that ensures this is finding the extremals of functionals. Examples are the principle of least action in physics, used to derive the equations of motion and the field equations, and geodesics (the generalization of straight lines) in curved spaces.

In machine vision, the energy-minimization analogy leads to variational principles. One example is image segmentation with an active contour, where we minimize the “energy” of the segmenting curve: the energy consists of an external part that depends on the image content and an internal part that depends on the shape. Similarly, in optical flow (where the pixel-wise motion field between neighboring images of a sequence is sought), the purpose is to determine an energy-minimizing field: this field has to satisfy the optical constraint (e.g. intensity conservation) while also minimizing an internal trait of the sought field (e.g. the field divergence). The conservation of a quantity is equivalent to the minimization of its variability; therefore the idea of total energy minimization remains valid in this case too. Thus, the energy-minimization analogy is often used in machine vision.¹ The external-internal dualism illustrated above is also typical: the total energy consists of a data term responsible for the external influences and a smoothness term (or regularizer) responsible for preserving some internal characteristics.

¹ This is why the expression energy functional is used in the literature.

The standard way to compute the extremals of functionals is to derive the Euler-Lagrange equations. The type of these equations depends on the dimension of the problem, the number of unknown functions, and the order of their derivatives: they range from second-order ordinary differential equations to systems of higher-order partial differential equations. Partial differential equations are solved with iterative methods. The solution is the fixed point of the iteration, reached in practice when the difference between successive approximations falls below a certain threshold. Iterative methods applied to embedded manifolds are known as evolution. One of the most successful evolution methods is the Level Set method.
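The fixed-point iteration described above can be sketched with the classical Horn-Schunck optical flow scheme, which the dissertation uses as its reference example. This is only a minimal illustration under stated assumptions, not the author's implementation: the four-neighbor average, the periodic image boundary, the parameter values (`alpha`, `tol`, `max_iter`) and all function names are choices made here for brevity.

```python
import numpy as np

def neighbour_mean(f):
    """Mean of the four nearest neighbours (periodic boundary, assumed for brevity)."""
    return 0.25 * (np.roll(f, 1, 0) + np.roll(f, -1, 0)
                   + np.roll(f, 1, 1) + np.roll(f, -1, 1))

def central_diff(f, axis):
    """Central difference along one axis (periodic boundary)."""
    return 0.5 * (np.roll(f, -1, axis) - np.roll(f, 1, axis))

def horn_schunck(img1, img2, alpha=0.1, tol=1e-6, max_iter=1000):
    """Jacobi-type iteration for the Horn-Schunck Euler-Lagrange equations.

    Stops when the largest change between successive approximations
    drops below `tol` (the fixed-point criterion) or after `max_iter` steps.
    """
    ix = 0.5 * (central_diff(img1, 1) + central_diff(img2, 1))  # dI/dx
    iy = 0.5 * (central_diff(img1, 0) + central_diff(img2, 0))  # dI/dy
    it = img2 - img1                                            # dI/dt
    u = np.zeros_like(img1)
    v = np.zeros_like(img1)
    denom = alpha**2 + ix**2 + iy**2
    for n_iter in range(1, max_iter + 1):
        u_bar, v_bar = neighbour_mean(u), neighbour_mean(v)
        # Residual of the linearized intensity-conservation constraint.
        residual = (ix * u_bar + iy * v_bar + it) / denom
        u_new, v_new = u_bar - ix * residual, v_bar - iy * residual
        change = max(np.abs(u_new - u).max(), np.abs(v_new - v).max())
        u, v = u_new, v_new
        if change < tol:
            break
    return u, v, n_iter

# Synthetic test pair: a smooth pattern shifted by half a pixel in x.
n = 32
y, x = np.mgrid[0:n, 0:n].astype(float)
k = 2 * np.pi / n
shift = 0.5
frame1 = np.sin(k * x) + np.sin(k * y)
frame2 = np.sin(k * (x - shift)) + np.sin(k * y)
u, v, n_iter = horn_schunck(frame1, frame2)
print(u.mean(), v.mean(), n_iter)
```

The smoothness weight `alpha` trades the data term against the regularizer: a larger value gives a smoother field but slows the propagation of the flow from textured pixels.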

We now summarize the typical fields and tasks of variational methods in machine vision. The active contour (snake) and active surface methods are widely used for image segmentation and for 3D object and scene reconstruction. The total variation (TV) method is the variational approach to restoring noisy, blurred images. Combining total variation with active contours is one possible approach to image interpolation, where the problem is to substitute the missing information from the neighborhood of the defect while keeping important image features such as edges and textures continuous. Variational optical flow is an essential method for analyzing the motion between successive images of a video. Its applications are wide: key-frame-based video compression, robotics, vehicle assistance systems, human-machine interaction, etc. A wide range of image registration methods is also based on optical flow; there the problem is to fit object representations captured by different sensors (multispectral, multimodal registration). These problems occur frequently in aerial photo processing and in medical diagnostics.


II. The organization of the dissertation

After a short introduction, chapter 2 (Variational principles, their appearances in machine vision) reviews, with literature references, the fields of machine vision where variational methods are prevalent. We analyze the meaning and structure of typical functionals, invoking representative examples to illustrate the use of data and smoothness terms. These examples are used as references later in the document. The Level Set formalism and the methods for deriving differential equations are also discussed here. Finally, we illustrate the Euler-Lagrange equations with a specific example (Horn-Schunck optical flow), which is referred to in chapter 3.

In the introduction of chapter 3 (Optical flow) we present the applications of the method, its properties, the motivation that led us to this research area (illumination-robust application), and related research. In the second part we discuss Optical flow based on cross-correlation specifically, including the (non-central) normalized cross-correlation data term for grayscale and color images, the approximate Euler-Lagrange equations, and the principles of linearization and discretization. The numerical formula is then compared with the Horn-Schunck formula of the introduction. The derivation of the Euler-Lagrange equation, an integral part of this subchapter, can be found in appendix A. The next subchapter, Tests of cross-correlation optical flow, describes the test conditions and the test results, grouped by input source: synthetic grayscale, outdoor, and synthetic color data. In the conclusion we compare the quality of the method with state-of-the-art methods and discuss possible future improvements.

In the introduction of chapter 4 (Active contour) we present the development and types of segmentation techniques based on active contours. The subchapter Segmentation using local regions presents the motivation (layer segmentation of Optical Coherence Tomography images) and the expected properties of the proposed method. In the next subchapter we introduce the Basic model, the simplest local region based model. This includes the definition of the local regions near the segmentation curve, the associated energy functional, the normal component of the derived Euler-Lagrange equation (the derivation can be found in appendix B), the Level Set equation of the normal flow, and a simple statistical separator function. The subchapter ends with a critique of the basic model, which justifies the need for improvement. In the subchapter Model’s refinements we discuss two improvements. First, with a second-order curve approximation, the size of the local regions can be extended in the tangential direction (for the sake of more robust statistics). Second, an optimally shaped integration region, obtained with an optimal integration boundary in the normal direction, can enhance the method’s ability to separate regions whose average intensity difference is low. We prove that the latter is a local variational problem in its own right. In the subchapter Applying the model, results we describe the test conditions and test results, and a possible two-step algorithm that can further improve the speed. We close the chapter with possible future improvements, including 3D.

In the introduction of chapter 5 (3D reconstruction) we briefly summarize the 3D reconstruction methods based on the functional minimization principle, and discuss the most frequently used pinhole camera model and the projective and affine transformations based on that model. At the end, we review the limitations that stem from the pinhole camera model and set our objective: the deduction of a quadratic transformation that is compatible with the Level Set method. The subchapter Linear transformation specifies the steps for deducing a Level Set compatible linear transformation. Partly similar steps are used for the deduction of the quadratic transformation, and the formula of the linear transformation is also a constituent part of the quadratic transformation. In the subchapter Quadratic transformation we deduce the invariant equations of the quadratic mapping between two projections (images). Both the cameras’ projection functions and the observed surface are approximated with their (first- and) second-order differential quantities. Integral parts of the deduction can be found in appendices C and D. The subchapter closes with instructions for computing the quantities on a fixed spatial grid (for Level Set). We also present an alternative computation in appendix E. In the subchapter The result of the quadratic transformation we discuss the meaning of the different constituent terms, compare the result with the linear mapping (which is the affine homography in the case of the pinhole camera model), and illustrate the accuracy of the quadratic mapping against the projective and affine homographies. In the closing subchapter, An application of the quadratic transformation, we present the multiview 3D reconstruction proposed by Faugeras and Keriven, which was used for validation, together with the test conditions and test results.

Chapter 6 (Theses) sets out the theses of the dissertation. The notation used can be found at the beginning of the document under the heading Notation; the references are collected at the end of the document under the heading Bibliography.


III. New scientific results discussed in the dissertation

I did research in three different areas of machine vision: optical flow, active contours, and 3D reconstruction. I paid particular attention to the mathematical clarification of all topics.

Figure 1: Two frames of an outdoor video sequence (upper row). Applying the displacement field to the pixels of the first frame yields a reconstruction of the second frame. Parts of the images reconstructed with the Horn-Schunck and cross-correlation methods (lower row).

In the case of optical flow, the objective was to elaborate a fast, illumination-robust method that can be applied to the processing of outdoor video sequences (figure 1) and can even handle changes in color illumination. The new results were attained by using the normalized cross-correlation as the data term in the energy functional. The special structure of the Lagrangian (a compound of local integrals) led to Euler-Lagrange equations in the form of an infinite series of integro-differential terms. I simplified the analytical formula to a readily applicable numerical form by multi-step linearization. I also developed a software component for testing. The tests were carefully performed on standard data sets, according to current requirements. I tested the method on synthetic grayscale and color data as well as on real outdoor data. According to the accuracy tests, the method is also comparable with state-of-the-art methods (even though high accuracy was not pursued). The method and the results are published in [S1,S2,S3,S5,S6].
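Why a normalized cross-correlation data term tolerates illumination change can be illustrated on a single pair of local windows. The sketch below uses the non-central form (no mean subtraction) and assumes a purely multiplicative illumination change; the helper names, the 7x7 window size and the random test patches are hypothetical and not taken from the dissertation.

```python
import numpy as np

def ncc_noncentral(a, b, eps=1e-12):
    """Non-central normalized cross-correlation of two equally sized patches."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    # eps only guards against division by zero for all-black patches.
    return float(a @ b / (np.sqrt((a @ a) * (b @ b)) + eps))

def ssd(a, b):
    """Sum of squared intensity differences, shown for comparison."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float((d * d).sum())

rng = np.random.default_rng(0)
patch = rng.uniform(0.1, 1.0, size=(7, 7))   # hypothetical 7x7 local window
darker = 0.6 * patch                         # same content, uniform gain change (assumed model)
other = rng.uniform(0.1, 1.0, size=(7, 7))   # unrelated content

print("same content, darker:", ncc_noncentral(patch, darker), ssd(patch, darker))
print("different content:   ", ncc_noncentral(patch, other))
```

The correlation of the darkened patch with the original stays at its maximum while the squared difference does not vanish, which is the behavior an illumination-robust data term needs. A data term built on this measure would typically penalize one minus the correlation.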

Figure 1 panel labels: frame 1 (with artificial shadow), frame 2; Horn-Schunck, Horn-Schunck, cross-correlation, cross-correlation.


Figure 2: Several phases of segmentation using the elaborated local region model. The images were acquired by OCT technology from rodent retina.

In the case of active contours, the objective was to elaborate a fast, reliable method for segmenting retinal layer images captured by Optical Coherence Tomography (OCT). The images have no real edges; image features can be interpreted as statistical quantities (figure 2). The use of local regions alongside the segmentation curve combines the local, feature-driven methods with the fully global, region-based methods. The method allows a statistical interpretation of the image data without processing full image regions, and it is applicable to both open and closed curves. I proposed a Lagrangian that separates image regions by mean intensity, and derived the corresponding Normal Flow and Level Set equations. I recommended two improvements of the basic model that enhance its robustness and separation capability. I developed a software component with which the method was tested. The method and the results are published in [S7,S10].
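How a normal flow is turned into a Level Set update can be sketched generically. In the example below the curve is the zero level of a function phi that is negative inside, and every point of the curve moves along its normal with a given speed. The constant speed is an assumption made purely for illustration; in the method summarized above the speed would come from the local region statistics, which are not reproduced here. The explicit Euler step with central differences is also a simplification that is adequate only for a few small steps.

```python
import numpy as np

def normal_flow_step(phi, speed, dt):
    """One explicit Euler step of phi_t + speed * |grad phi| = 0."""
    gy, gx = np.gradient(phi)                  # derivatives along rows (y) and columns (x)
    return phi - dt * speed * np.sqrt(gx**2 + gy**2)

# Initial curve: a circle of radius 10, stored as a signed distance function.
n = 64
y, x = np.mgrid[0:n, 0:n].astype(float)
phi = np.sqrt((x - n / 2) ** 2 + (y - n / 2) ** 2) - 10.0

area_before = int((phi < 0).sum())
for _ in range(4):                             # total evolution time 4 * 0.5 = 2
    phi = normal_flow_step(phi, speed=1.0, dt=0.5)
area_after = int((phi < 0).sum())
radius_after = np.sqrt(area_after / np.pi)     # radius of a disc with the same area

print(area_before, area_after, radius_after)
```

With a positive speed and phi negative inside, the enclosed region grows; a spatially varying `speed` array of the same shape as `phi` can be passed in the same way.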

Figure 2 panel labels: Segmentation of Internal Limiting Membrane (ILM); Segmentation of Retinal Pigment Epithelium (RPE).


Figure 3: Correspondences established by different mappings (the leftmost image is the first projection). The observed object is given as an implicit surface.

In the case of 3D reconstruction, the objective was to improve the quality and extend the domain of usability of a variational method proposed by Faugeras and Keriven. The method is based on active surface evolution driven by correspondences between local regions of images taken from different views. I deduced a Level Set compatible quadratic mapping between the corresponding image regions, which approximates both the projections and the observed surface. The equations do not assume the pinhole camera model. I analyzed the result: I discussed the contribution of the different terms and clarified the relationship to the projective and affine homographies (figure 3). We tested the equations on synthetic data. The tests confirm that the quadratic mapping provides more reliable results for objects with high curvature. It is important to note that the equations can be used more generally, wherever a method is based on correspondences between local image regions. The method and the results are published in [S4,S8]; submitted: [S9*].

Figure 3 panel labels: 1st projection; affine homography; projective homography; quadratic mapping.


IV. Theses

This dissertation presents examples of the use of variational methods in machine vision. The theses are related to these topics.

Thesis 1: Equations of cross-correlation based variational optical flow and their application

1.1 I introduced the normalized cross-correlation data term for grayscale and color images into the field of variational optical flow. I derived the Euler-Lagrange equation of the local integral and applied the result to functionals that use normalized cross-correlation.

1.2 For practical applications, I elaborated the approximate linearized numerical formula: first, I determined the approximate analytical equations for a small local integration window; second, I performed numerical linearization on the analytical formula obtained in the first step.

1.3 I developed a software component for the practical use and testing of the method. I performed tolerance tests for intensity change and accuracy tests according to the requirements of the literature.

Thesis 2: Usage of local region based active contour, proposal for Lagrangian, recommendation for further improvements

2.1 I introduced local regions alongside curves for segmentation purposes, allowing the statistical interpretation of image features for open as well as closed curves. I proposed a Lagrangian for region separation in the statistical sense.

2.2 I improved the basic model in two ways. First, a higher-order curve approximation allows the integration area alongside the separation curve to be enlarged. Second, I defined the optimal size (shape) of the integration domain, which maximizes the degree of separation and increases the method’s precision. I showed that this latter improvement is a local variational problem in its own right. I recommended a Lagrangian for the improved model.

2.3 I derived the Euler-Lagrange and Level Set equations of the models. I developed a software component for the practical use and testing of the method. The software was tested on practical examples. According to expert analysis, the method was able to improve presegmentation results.


Thesis 3: Deduction of quadratic transformation for planar mapping of implicit surfaces with invariant (intrinsic) quantities

3.1 I deduced a quadratic mapping between local image regions for correspondence purposes. First, I formulated the linear mapping with invariant quantities: the gradients of the projections and the unit normal of the observed surface. Second, I deduced the equations of the quadratic mapping in parametric form as well as in invariant form.

3.2 I gave methods and formulas for practical use: a construction that allows the formulas to be used in any environment (e.g. for finite element methods), and a specific, Level Set compatible formula defined on fixed spatial grids. We tested the applicability of the formula in a multiview variational 3D reconstruction.

3.3 I discussed the relationship between the quadratic mapping and the affine as well as the projective homography. The test results confirmed the usefulness of the quadratic mapping wherever the input data do not favor the homographies (surfaces with high curvature and/or sparsely textured models). The quadratic mapping enhances the accuracy of the correspondence, and therefore the robustness of any method based on correspondences between local image regions.


The author’s publications

[S1] Molnár József, Csetverikov Dmitrij: "Kereszt-korrelációs optikai áramlás variációs sémája: megvilágítás-változásra invariáns egyenletek", Proc. KÉPAF 2009: 7th Conference of Hungarian Association for Image Processing and Pattern Recognition, CD, Budapest, 2009.

[S2] J. Molnar and D. Chetverikov: "Illumination-robust variational optical flow based on cross-correlation", Proc. 33rd Workshop of the Austrian Association For Pattern Recognition, Stainz, Austria, 2009, pp.119-128.

[S3] S. Fazekas, D. Chetverikov, and J. Molnar: "An implicit non-linear numerical scheme for illumination-robust variational optical flow", Proc. British Machine Vision Conference 2009.

[S4] J. Molnar, D. Csetverikov: "Másodfokú közelítés implicit felületek síkbeli leképezésére", Proc. Fifth Hungarian Conference on Computer Graphics and Geometry, Budapest, pp. 118-124, 2010.

[S5] D. Chetverikov, J. Molnar: "An experimental study of image components and data metrics for illumination-robust variational optical flow", Proc. International Conference on Pattern Recognition, Istanbul, pp. 1694-1697, 2010.

[S6] J. Molnar, D. Chetverikov, and S. Fazekas: "Illumination-robust variational optical flow using cross-correlation", Computer Vision and Image Understanding, vol.114, pp.1104-1114, 2010.

[S7] J. Molnár, D. Chetverikov, D. Cabrera DeBuc, Wei Gao, and G.M. Somfai: "Segmentation of rodent retinal OCT images", Proc. KÉPAF 2011: 8th Conference of Hungarian Association for Image Processing and Pattern Recognition, Szeged, 2011, pp.140-154.

[S8] J. Molnár and D. Chetverikov: "Multiview Reconstruction Using Refined Planar Mapping of Implicit Surfaces", Proc. KÉPAF 2011: 8th Conference of Hungarian Association for Image Processing and Pattern Recognition, Szeged, 2011, pp.221-232.

[S10] J. Molnár, D. Chetverikov, D. Cabrera DeBuc, Wei Gao, and G.M. Somfai: "Layer extraction in rodent retinal images acquired by Optical Coherence Tomography", Machine Vision and Applications. Accepted for publication. DOI: 10.1007/s00138-011-0343-y. 2011.

Under review:

[S9*] J. Molnár, D. Chetverikov: "Quadratic Transformation for Planar Mapping of Implicit Surfaces", Journal of Mathematical Imaging and Vision
