

Project no. FP6-507752

MUSCLE

Network of Excellence
Multimedia Understanding through Semantics, Computation and LEarning

Intermediate Report
Task 5: Applications of state-of-the-art techniques to current problems in multimedia understanding

Due date of deliverable: 28.02.2007
Actual submission date: 13.04.2007

Start date of project: 1 March 2004 Duration: 48 Months

Workpackage: 6
Deliverable: D6.1

Editors:

Simon Wilson, Department of Statistics, Trinity College Dublin, Dublin 2, Ireland
Pádraig Cunningham, School of Computer Science, University College Dublin, Dublin 4, Ireland

Revision 1.0

Project co-funded by the European Commission within the Sixth Framework Programme (2002-2006)

Dissemination Level
PU Public (x)
PP Restricted to other programme participants (including the Commission Services)
RE Restricted to a group specified by the consortium (including the Commission Services)
CO Confidential, only for members of the consortium (including the Commission Services)

Keyword List: Classification, feature selection, kernel, machine learning, statistical learning

© MUSCLE Network of Excellence, www.muscle-noe.org


Contents


1 Focus Area Extraction by Blind Deconvolution for Defining Regions of Interest, Levente Kovacs and Tamas Sziranyi


Focus Area Extraction by Blind Deconvolution for Defining Regions of Interest

Levente Kovacs, Member, IEEE, and Tamas Sziranyi, Senior Member, IEEE

Abstract—We present an automatic focus area estimation method, working with a single image without a priori information about the image, the camera, or the scene. It produces relative focus maps by localized blind deconvolution and a new residual error-based classification. Evaluation and comparison are performed, and applicability is shown through image indexing.

Index Terms—Transform methods, feature representation, indexing methods, sharpening and deblurring, video retrieval.


1 INTRODUCTION

Image and video classification based on the automatically extracted location of the main objects, for indexing and retrieval purposes, is a problem with no definitive solution yet. We present a possible solution for this problem: a method for automatically classifying image areas relative to each other based on the local blur/focus in the image. The main novelty lies in the use of localized blind deconvolution for the automatic estimation of focused areas in ordinary images, without a priori knowledge about the image or the shooting conditions. We use a new error measure for area classification, demonstrate the method's practical applicability through an image indexing proof of concept, and propose areas of application.

Generally, deconvolution techniques [1], [2], [3], [4], [5], [6], [7], [8] are used for the reconstruction of images degraded by optics and channel noise. The blurring function which represents the distortion is the so-called point spread function (PSF). Blind deconvolution [3], [7] is the variant used when there is no, or only estimated, knowledge of the distortion. Application areas include tomography [8], microscopy, medical and astronomical imaging, and aerial imagery [6].

Among blind deconvolution methods, we use a Richardson-Lucy-based implementation [1], [2], which has good convergence properties and whose iterative process can be controlled for our purposes. We exploit the capabilities within the deconvolution to estimate the local image blur and use this information for image area classification. Thus, we can obtain a so-called region-of-relevance-based segmentation and show that it can be used for indexing, search, and retrieval on image databases. This segmentation goal is not uncommon: other depth map generation techniques [10], [11], [12] use image series shot with different focus settings, which, in our case, are not available. Depth-from-defocus techniques are also used to generate a three-dimensional layout of the scene from a series of defocused images [13]. Binary foreground segmentation methods [14] also exist, where a wavelet-based approach is used for the binary segmentation of low depth-of-field images. That method uses the block mean and the wavelet coefficient variation for region segmentation, then refines with lower wavelet scales; texture sharpness is implicitly considered. Applications for detecting out-of-focus images also exist [9], where frequency ratios and color information are used.

We extract the classification feature by localized blind deconvolution, which is a simple and elegant solution. Texture and local image variations are only implicitly considered, through a localized contrast-weighted error classification. Our method is intended as an extraction method for relevant regions, with multiple applications such as image indexing [15] or surveillance tasks. This approach could also help in foreground/background separation, which is usually done by segmentation based on generalized texture parameters and morphology [16]. A common technique for filming closeups and portraits in movies is to capture a focused sequence of an actor against a blurred background, which the presented method could also automatically identify and use as an additional feature for indexing.

In what follows, we describe the localized blind deconvolution scheme, then the region segmentation process itself, and finally its application to image indexing.

2 LOCALIZED DECONVOLUTION OVERVIEW

The deconvolution we use is based on a Bayesian approach with a maximum-likelihood minimization scheme [17], building on the works of Richardson [1], Lucy [2], and Ayers and Dainty [7]. We use a localized scheme, which results in position-dependent PSF estimates for local image areas. The obtained PSF estimates vary over the set of local regions (i.e., blocks); thus, every region has its own PSF. The possible use of blind deconvolution for focus map estimation was introduced in [18].

Let $g(x) = h(x) * f(x)$ be the observed image (i.e., region or block), formed by the convolution of the unknown original image $f$ with the unknown point spread function $h$. Given the observed image $g$, we search for the original image $f$ which maximizes the Bayesian probability of observing $g$ given the estimated $f$, of the form $P(f_i \mid g_l) = [P(g_l \mid f_i) P(f_i)] / \sum_j [P(g_l \mid f_j) P(f_j)]$ (indices refer to image pixels). Using the definition of conditional probability and the above equation, we can write

$$P(f_i) = \sum_l P(f_i, g_l) = \sum_l P(f_i \mid g_l) P(g_l) = \sum_l \frac{P(g_l \mid f_i)\, P(f_i)\, P(g_l)}{\sum_j P(g_l \mid f_j) P(f_j)}. \qquad (1)$$

From this form, the following iteration scheme, first introduced by Richardson [1], can be derived:

$$P_{k+1}(f_i) = P_k(f_i) \sum_l \frac{P(g_l \mid f_i)\, P(g_l)}{\sum_j P(g_l \mid f_j)\, P_k(f_j)}, \qquad (2)$$

where $k$ is the iteration index, $P(f_i) = f_i / |f|$, $P(g_l) = g_l / |g|$, and $P(g_l \mid f_i) = P(h_{i,l}) = h_{i,l} / |h|$ [1], based on the constancy of the light energy distribution. From an optical/physical point of view, the above form can be read as a normalized energy distribution [19], and by writing (2) in the form $f_{i,k+1} = f_{i,k} \sum_l [h_{i,l}\, g_l / (\sum_j h_{j,l} f_{j,k})]$ and substituting $g_k = f_k * h_k$, we arrive at the iteration form

$$f_{k+1} = f_k \left[ h_k * \frac{g}{f_k * h_k} \right] = f_k \left[ h_k * \frac{g}{g_k} \right]. \qquad (3)$$

A similar iteration scheme for obtaining the point spread function can also be constructed [19], [20]. The main steps of the double iteration scheme we use are shown in Fig. 1. We extended this iteration scheme [18] for localized PSF extraction. Thus, the localized deconvolution gives an estimate of the PSF and of the original image at every step $k$, with $r$ denoting the location vector. Convolutions are performed locally around position $r$, with a region of support $T$:


L. Kovacs is with the Department of Image Processing and Neurocomputing (KNT), University of Pannonia, Egyetem u. 10, Veszprem, H-8200 Hungary. E-mail: [email protected].

T. Sziranyi is with the Hungarian Academy of Sciences, Computer and Automation Research Institute (MTA SZTAKI), Kende u. 13-17, Budapest, H-1111 Hungary. E-mail: [email protected].

Manuscript received 23 June 2006; revised 4 Oct. 2006; accepted 27 Nov. 2006; published online 18 Jan. 2007. Recommended for acceptance by B.S. Manjunath. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPAMI-0470-0606. Digital Object Identifier no. 10.1109/TPAMI.2007.1079.


$$f_{k+1}(r) = f_k(r)\left[h_k(r) * \frac{g}{g_k}(r)\right], \qquad h_{k+1}(r) = \frac{h_k(r)}{\theta}\left[f_k(r) * \frac{g}{g_k}(r)\right]. \qquad (4)$$

The region of support of the PSF we use is proportional to the size of the local region (block) on which the iterations are performed, in practice around 1/2 of the block radius. The block widths (a power of 2) are hand-picked, in practice $32 \times 32$. We use the $\theta$ weighting in the PSF iteration for filter normalization, with the value $\theta = \sum_T g_k(r)$. The double iteration's initial values are: $f_0(T_r)$ is a gray image at the DC level of the observed image $g$, and $h_0$ starts as a circular constant unity. Most of the calculations are done in Fourier space to decrease the computational cost of the large convolutions.

The constraints imposed during the iteration steps are only the most necessary ones: on the size of the PSF and the pixel amplitudes, nonnegativity in the space domain, and zero phase in the frequency domain [18]. Full-image deconvolution processes generally use many hundreds or thousands of iterations and a large set of complex constraints on image features, but here we run only a few iterations on image areas, with simple constraints. Thus, we needed an error measure which is stable under these circumstances and whose values can serve as the basis for classification.
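To make the double iteration of (4) concrete, the following minimal sketch runs a few blind Richardson-Lucy steps on a single block with FFT-based circular convolution. It is an illustration under our own simplifying assumptions: the function names, the 8-bit amplitude range, and the constraint handling are ours, not the paper's, and the zero-phase frequency constraint of [18] is omitted for brevity.

```python
import numpy as np

def rl_blind_block(g, iters=15, psf_radius=8):
    """A few blind RL double-iteration steps, as in (4), on one block g."""
    f = np.full_like(g, g.mean(), dtype=float)   # f_0: gray image at the DC level of g
    yy, xx = np.indices(g.shape)
    cy, cx = (g.shape[0] - 1) / 2.0, (g.shape[1] - 1) / 2.0
    support = (yy - cy) ** 2 + (xx - cx) ** 2 <= psf_radius ** 2
    h = support / support.sum()                  # h_0: circular constant unity, normalized

    conv = lambda a, b: np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)))
    for _ in range(iters):
        gk = conv(f, h)
        ratio = g / np.maximum(gk, 1e-8)
        f_next = f * conv(h, ratio)              # image update of (4)
        theta = gk.sum()                         # theta = sum_T g_k(r)
        h = (h / theta) * conv(f, ratio)         # PSF update of (4), using f_k
        h = np.where(support, np.clip(h, 0.0, None), 0.0)  # PSF size + nonnegativity
        h /= max(h.sum(), 1e-12)
        f = np.clip(f_next, 0.0, 255.0)          # pixel amplitude constraint (assumed 8-bit)
    return f, h, conv(f, h)
```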

2.1 Error Bound

The localized deconvolution runs on small blocks; thus, the ill-posed [20] iteration process tends to become noisy as the number of iterations grows. For the classification, we therefore stop at a low iteration count and calculate the region's local reconstruction error measure. Starting from $f_0$, the more in focus a spot is, the higher the distortion to $g$ at an early iteration. We need an error measure which consistently gives different values for differently focused areas and which is not much affected by the noisy nature of the process. In [18], we used the simple mean square error (MSE, $|g - g_k|$) for this purpose, but later experiments have shown that MSE is sensitive to the noise coming from the ill-posedness of the iteration process, which often causes fluctuations in the classification.

Thus, we constructed a more stable error measure based on the orthogonality principle [21], exploiting the independence of the noise and the estimated signal. This measure theoretically converges to zero and, instead of simple block differences, gives the angle deviation error (ADE) between the measurement and the estimation residual. The main reason ADE has proven better suited to our classification is that, while MSE is a simple difference measure, which can vary greatly and cannot provide a consistent scale, ADE gives the normalized angle of the reconstruction error, which even in such situations provides a stable scale with a minimum of zero, so the ADE-based classification remains consistent. The ADE measure has the following form:

$$E(g, g_k) = \left| \arcsin \frac{\langle g - g_k,\, g \rangle}{|g - g_k| \cdot |g|} \right|. \qquad (5)$$
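As an illustration, a minimal sketch of (5) on a block, with MSE alongside for comparison; the names and the mean-squared MSE variant shown are ours (the paper writes MSE simply as $|g - g_k|$):

```python
import numpy as np

def ade(g, gk):
    """Angle deviation error (5): |arcsin(<g - gk, g> / (|g - gk| |g|))|."""
    r, gv = (g - gk).ravel(), g.ravel()
    denom = np.linalg.norm(r) * np.linalg.norm(gv)
    if denom == 0.0:
        return 0.0                                   # perfect reconstruction
    s = np.clip(np.dot(r, gv) / denom, -1.0, 1.0)    # guard the arcsin domain
    return float(abs(np.arcsin(s)))

def mse(g, gk):
    """Mean squared reconstruction error, for comparison with ADE."""
    return float(np.mean((g - gk) ** 2))
```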

This measure is new in that it provides a normalized, local, reconstruction-based difference which is more suitable for our classification purposes. When using $g$ and $g_k$ in the error measure in practice, the differences in the convergence of ADE and MSE are not always evident, although the classification results show ADE's superiority. The convergence of ADE is clear, however, on ground-truth data, when we know the original $f$ and can measure the direct error by comparing $f$ to $f_k$. This case shows that ADE almost always converges to zero, while MSE is usually disturbed by the error coming from the weakly constrained iterations. Fig. 2 shows an example in which a textured image was blurred uniformly and eight neighboring areas were iterated. Ideally, all curves of the same measure should remain close to each other (with minor variations due to local texture differences), but the example shows MSE's instability, which causes fluctuations in the classification. The better performance of ADE on real images is evident from the experimental results presented later.

Thus, the superiority of ADE over MSE comes from the reduction of the fluctuations in the relative error values that caused similar areas to be classified differently. Simulation and segmentation results are shown in Figs. 2, 3, 6, 7, 8, 9, and 10. A sample image comparing extraction with MSE and with ADE is shown in Fig. 3, for 15 iterations. Extraction with ADE produces a finer and more detailed map.

2.2 Contrast Weighting

Equation (5) works well for multitextured images in depth with similar contrast. To further improve the robustness of the error deviation-based extraction (5) in the general case, we add local contrast weighting. For the local contrast measurement, we use the conventional contrast definition $C_r(g_r) = (g_{\max\{x \in T_r\}} - g_{\min\{x \in T_r\}})/(g_{\max\{x \in T_r\}} + g_{\min\{x \in T_r\}})$, where $g_{\max\{x \in T_r\}}$ and $g_{\min\{x \in T_r\}}$ are the maximum and minimum local image intensities in region $T_r$ at location $r$. Thus, the local error later used in the classification becomes

$$E_r(g, g_k) = \left|\arcsin \frac{\langle g - g_k,\, g\rangle}{|g - g_k| \cdot |g|}\right| \cdot \frac{C_r(g_r)}{\max_r\{C_r(g_r)\}}. \qquad (6)$$

Fig. 1. Steps of the iteration. Constraints on the estimated regions and blur are imposed in the frequency and image domains during each iteration, to retain convergence during the deconvolution process.

Fig. 2. Error curves for eight neighboring blocks (each curve stands for one block) on a blurred texture sample (top), for the same blur, with ADE (left) and MSE (right). Ideally, curves of the same measure should remain close to each other.

Fig. 3. Comparison of (a) the input image, (b) MSE distance-based extraction, (c) PSF-based classification, and (d) the new ADE measure.

This measure depends on both the local reconstruction error and the local contrast; therefore, it can give usable results both for low depth-of-field images and for high depth-of-field images with textured content. Also, since local contrast can serve as a low-level depth cue approximation, its inclusion makes ADE more robust. A sample image showing the extraction differences when processing with and without contrast weighting is in Fig. 4, showing that higher-contrast areas get higher weight.
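A sketch of the contrast weighting of (6), reusing the ade helper above; the function names are ours, and the maximum in the denominator is taken over all blocks of one image, as in (6):

```python
import numpy as np

def local_contrast(block):
    """Conventional contrast (max - min) / (max + min) over a region T_r."""
    gmax, gmin = float(block.max()), float(block.min())
    return (gmax - gmin) / (gmax + gmin) if (gmax + gmin) > 0 else 0.0

def contrast_weighted_errors(blocks, recon_blocks):
    """Contrast-weighted ADE (6) for paired observed/reconstructed blocks."""
    contrasts = [local_contrast(g) for g in blocks]
    cmax = max(contrasts) or 1.0          # max_r C_r(g_r), guarding flat images
    return [ade(g, gk) * (c / cmax)
            for g, gk, c in zip(blocks, recon_blocks, contrasts)]
```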

2.3 Multiresolution and Performance

The deconvolution and focus extraction process, a quasi-multiresolution technique, can be performed from the pixel level through overlapping or nonoverlapping blocks (Fig. 5 shows an example).

The motivation for also investigating block-based approaches was to reduce the computational cost of the focus map extraction, since pixel-based map extraction can take almost a minute on a $512 \times 512$ image, compared to around 19 s for the overlapping and 5 s for the block-based calculations (on a 2 GHz PC). The level of detail and the computation time depend on the desired map resolution.

Depending on the resolution of the focus map, the estimated focused region's area can show some increase over the real focus area observable in the input images. This typically occurs when running a block-based estimation at the perimeters of focused areas. Higher accuracy at the boundaries (when using a block-based approach) could be obtained either by local higher-resolution maps or by a block subdivision scheme.
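A driver for the block-based variants might look as follows; this is our own sketch built on the helpers above, and the default block size, the half-block overlap step, and the final max-normalization into a relative map are assumptions:

```python
import numpy as np

def focus_map(image, block=32, overlap=True, iters=15):
    """Tile the image, deconvolve each block, and build a relative focus map."""
    step = block // 2 if overlap else block      # half-block step when overlapping
    ys = range(0, image.shape[0] - block + 1, step)
    xs = range(0, image.shape[1] - block + 1, step)
    fmap = np.zeros((len(ys), len(xs)))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            g = image[y:y + block, x:x + block].astype(float)
            f, h, gk = rl_blind_block(g, iters=iters)
            fmap[i, j] = ade(g, gk) * local_contrast(g)   # per (6), before normalization
    return fmap / max(fmap.max(), 1e-12)         # relative map on a 0-1 scale
```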

3 EXTRACTION OF FOCUS MAPS

The goal of the image classification is to extract the focus maps, the so-called regions of relevance, from images. To achieve this goal, we use the presented localized blind deconvolution. During the classification, the locally obtained relative error values (6) are used to separate the areas which are more in focus, using linear classification. The example images presented later were generated using overlapping image regions in the focus map estimation process.

We compared our method to gradient- and autocorrelation-based methods. In the present comparisons, we consider only the basic image information content and do not exploit any specific a priori knowledge about texture or shapes. Higher-order optimizations like wavelet coefficients, Markov random fields, and texture features could also be added, but we consider only the basic capabilities of the method here.

Edge content and/or gradient-based sharpness measures [22] exploit detected changes in edges for local and global sharpness measurement, while autocorrelation methods can also provide a local sharpness measure by checking how well neighboring pixels correlate; in practice, in-focus images contain many small correlated areas with high central peaks. For a quick visual comparison, see Fig. 6, where Fig. 6b is the deconvolution-based map.

The proposed blind deconvolution-based extraction and classification does not require any a priori information about the image contents, and it gives refined and well-scaled relative focus estimates. Methods depending on edge measurements can give false results, e.g., when there is a low blur difference between image areas, and autocorrelation usually cannot provide enough area discrimination for images with textured content.

Figs. 7 and 8 contain samples of focus extraction with our method and with the methods mentioned above, for visual comparison. The first (Fig. 7) is an example using the same texture with areas that are progressively more blurred (numbers show increasing Gaussian blur). The deconvolution-based method provides good segmentation and visually distinguishable relative scales of the blurred areas (the higher the blur, the lower the map intensity). Fig. 8 shows an example where the image is constructed from four different textures and the same blur is applied over different areas. Our method can both reliably segment the blurred areas from the rest of the image, independently of the texture, and also provide a relative scaling between the different textures, thanks to the contrast weighting.

Fig. 4. Example of focus map extraction with contrast weighting: (a) input, (b) focus map without contrast weighting, (c) focus map with contrast weighting, and (d) the extracted relative focus surface.

Fig. 5. Example of extraction on (a) the input image, with maps at (b) the pixel level, (c) with overlapping blocks, and (d) with nonoverlapping blocks.

Fig. 6. Visual comparison of (a) the input image with (b) deconvolution-, (c) autocorrelation-, and (d) edge content-based extractions.

Fig. 7. Example images showing blurred area classification on different textures: (a) input (numbers show increasingly blurred subareas), (b) map with our approach, (c) map with autocorrelation, and (d) edge content-based approaches (on the maps, decreasing intensity should equal increasing blur).

Fig. 8. Example images showing blurred area classification on different textures: (a) input (contains equally blurred areas, shown with black borders), (b) map with our approach, (c) with autocorrelation, and (d) edge content-based approaches (maps should have equally dark intensities on the blurred areas).

In Fig. 9, a numerical evaluation of the comparisons is presented. We used texture sets of histogram-equalized Brodatz tiles [23] for the first two examples. Central areas (with black rectangles) are blurred with varying strength, and the segmentation capability of the methods is checked through the masking error. A ground-truth mask is generated from the hand-blurred regions, and the ratio between the blurred area and the whole image is calculated. The methods are then compared by generating a mask containing the most blurred area with the same ratio as in the ground truth. The error metric used was the ratio between the extracted blurred areas and the ground truth (i.e., the real hand-selected and blurred areas), $\mathrm{error}(\%) = 100 \cdot \|A_{\mathrm{extracted}} - A_{\mathrm{real}}\| / A_{\mathrm{real}}$, where lower values mean better extraction. The horizontal axes represent the increasing radius of the applied blur. As the figures show, deconvolution-based focus extraction with the ADE measure gives good results from low blur to high blur, while the other methods achieve similarly good extraction only at high levels of blur, when probably every technique would be able to differentiate the blurred areas. Moreover, our method achieves this consistently, proportionally differentiating blurred areas and identifying areas with the same blur. Fig. 10 shows practical examples of this capability on real images.
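The masking-error evaluation described above could be sketched as follows; interpreting $\|A_{\mathrm{extracted}} - A_{\mathrm{real}}\|$ as the mismatched area between the two binary masks, and thresholding the error map at the ground-truth blur ratio, are our assumptions:

```python
import numpy as np

def masking_error(error_map, truth_mask):
    """Masking error: extract the most-blurred area at the ground-truth ratio,
    then measure its mismatch against the ground-truth mask, in percent."""
    ratio = truth_mask.mean()                    # blurred fraction of the image
    thresh = np.quantile(error_map, ratio)       # lowest errors = most blurred
    extracted = error_map <= thresh
    mismatch = np.logical_xor(extracted, truth_mask).sum()
    return 100.0 * mismatch / truth_mask.sum()   # can exceed 100% on false detections
```

Under this reading, values above 100 percent correspond to blurred detections falling outside the ground-truth area, matching the caption of Fig. 9.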

Fig. 11 shows examples of using the focus extraction method to select focused targets in video. It can also track focus changes across a video scene (Fig. 12).

Fig. 9. Evaluation (lower is better) of the performance of the deconvolution/autocorrelation/edge content extraction methods. For deconvolution, both the new error measure (deconv) and the mean square error approach (deconv-mse) are included. Blurred areas are shown with black borders. Errors larger than 100 percent mean that areas outside the ground-truth area were also falsely detected as blurred, increasing the error.

Fig. 10. Examples of focus extraction on images with various textures (top row: input; bottom row: respective focus maps).

Fig. 11. Two examples of focus extraction on video frames: the top row shows the video frames; the bottom row shows their respective focus maps (higher intensity means higher focus).

Fig. 12. Following the movement of focus across a scene (focus shifted from the left of the image to the stairs in the background). The upper row shows the input frames; the bottom row shows their respective focus maps, with higher intensity meaning higher focus.


4 APPLICATION—CBIR

In the following, we describe a possible application of the presented method to image indexing (i.e., content-based indexing and retrieval, or CBIR), extending our earlier investigations [15]. The goal of the indexing is to automatically extract relative focus maps of database images so that later queries can be formulated for objects in focus in a specific area. In our tests, we used focus-extracted model images as queries and searched for images with similarly structured focus maps over a collection of images.

Our sample image database consists of 280 hand-selected and annotated images, grouped by the location of the focused areas (bottom: 16 images, left: 35 images, right: 31 images, center: 118 images, and 80 with no particular focus/blur). The images were gathered from Web search engines and from the authors' own images, and were grouped by hand before any processing. Then, the region extraction was run on all the images, and the obtained classifications, i.e., the focus maps, were stored along with the images. During the tests, we stored the maps as gray-scale bitmaps, with the intensity level proportional to the level of focus (markup languages or 3D height maps could also be used).

We ran 15 queries against the image set with different focus locations (two bottom, three left, three right, and seven center). When comparing the query's map with the images' maps, we used the MSE distance between the two maps. If the compared maps do not have the same resolution, the map from the database is scaled to match the size and shape of the query image, and then the MSE distance is calculated. Search is currently performed without internal optimizations, and the images are a collection of files. After the automatic focus estimation, the evaluation of the retrievals is done manually, by counting those images among the respective results which have their focus on the same area as the query image. All other responses (focus on another area, or images with no particular focused area) are counted as false responses.

Fig. 13 shows a query and some responses over the test database. When there is something in focus in the left part of the query image, the search should ideally return images which contain a similarly focused area at the given location.

The search was performed in two ways. First, the images of the database were placed in decreasing order of their similarity to the query image, and the first 10 were retrieved (called first10). The distance $D$ was measured as the MSE between the query's map $Q$ and a database image's map $I$: $D = \mathrm{MSE}(Q, I)$. Second, those database images were retrieved whose distance from the query image's map was smaller than a fraction $n$ of the average MSE over the entire database (called errorbound, with $n = 0.3$ to $0.6$). In this case, the number of retrieved images can be influenced by changing this threshold (see Fig. 15).
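The two retrieval modes could be sketched as below; the map_mse helper, the nearest-neighbor rescaling, and taking the database average as the mean distance to the query are our assumptions for illustration:

```python
import numpy as np

def map_mse(q, m):
    """MSE distance between two focus maps, rescaling m to q's shape."""
    if m.shape != q.shape:
        yi = np.arange(q.shape[0]) * m.shape[0] // q.shape[0]
        xi = np.arange(q.shape[1]) * m.shape[1] // q.shape[1]
        m = m[np.ix_(yi, xi)]                    # nearest-neighbor rescale
    return float(np.mean((q - m) ** 2))

def first10(query_map, db_maps):
    """Return the indices of the 10 most similar database maps."""
    order = sorted(range(len(db_maps)), key=lambda i: map_mse(query_map, db_maps[i]))
    return order[:10]

def errorbound(query_map, db_maps, n=0.4):
    """Return every map closer than a fraction n of the average MSE."""
    dists = [map_mse(query_map, m) for m in db_maps]
    bound = n * float(np.mean(dists))
    return [i for i, d in enumerate(dists) if d < bound]
```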

As stated above, we used 15 query images, combined from each category, and the resulting precision-recall plots are shown in Figs. 14 and 15 for the two retrieval schemes (first10 and errorbound), compared to the edge content and autocorrelation approaches (each operating on the database maps generated by the respective method). In Fig. 14a, the P/R points of the results are plotted when displaying the best 10 matches for each query; here, each point is the result of one query (some are close or overlapping). In Fig. 14b, the F(0.5) measure (combined P/R) plot is also shown for the first10 approach over all the queries. The plots show that, in most cases, the deconvolution-based retrievals have better precision/recall values when searching for the best 10 matches. In Fig. 15, the responses are shown when the number of returned results is controlled by the above-mentioned error bound approach. The curves in Fig. 15a are averaged P/R points for queries with different thresholds, and Fig. 15b shows the respective combined F(0.5) measure values (higher is better).

Fig. 13. Query-response example. Left: the query and its map below it. Others: response images and their maps.

Fig. 14. Precision-recall graph for 15 queries: (a) the closest 10 matches returned and (b) F(0.5) measure values (higher is better).

Fig. 15. (a) Averaged precision-recall curves for returning every match above a threshold (errorbound approach) for the 15 query images and (b) the respective F(0.5) measure values for the different thresholds (higher is better).

A search and retrieval process over the 280-image database took an average of 18 s using the mentioned brute-force approach (file operations and map rescalings included), and good responses averaged 74 percent and 62 percent for the two retrieval schemes, respectively, over all the queries.

5 CONCLUSIONS

We presented new results on using localized blind deconvolution for automatic relative focus map extraction from ordinary images, with no explicit knowledge about the image or the exposure conditions. We introduced a new robust error measure based on the orthogonality principle. Multiscale relative maps can be extracted from images and/or video frames, which can be used for focus-based feature extraction. Proposed applications include image indexing and retrieval, focus tracking in videos, main actor selection, news anchor detection, closeup detection in scenes, and object extraction or tracking. In general, focus maps could serve well as a complementary indexing feature in image databases.

ACKNOWLEDGMENTS

The author’s work was supported by the OTKA T-049001 researchgrant and the MUSCLE NoE of the EU.

REFERENCES

[1] W.H. Richardson, "Bayesian-Based Iterative Method of Image Restoration," J. Optical Soc. Am., vol. 62, pp. 55-59, 1972.

[2] L.B. Lucy, "An Iterative Technique for Rectification of Observed Distributions," The Astronomical J., vol. 79, pp. 745-765, 1974.

[3] D. Kundur and D. Hatzinakos, "Blind Image Deconvolution," IEEE Signal Processing Magazine, pp. 43-64, May 1996.

[4] J.R. Hopgood and J.W. Rayner, "Bayesian Single Channel Blind Deconvolution Using Parametric Signal and Channel Models," Proc. 1999 IEEE Workshop Applications of Signal Processing to Audio and Acoustics, 1999.

[5] T. Sziranyi, "Robustness of Cellular Neural Networks in Image Deblurring and Texture Segmentation," Int'l J. Circuit Theory and Applications, vol. 24, pp. 381-396, 1996.

[6] A. Jalobeanu, L. Blanc-Feraud, and J. Zerubia, "An Adaptive Gaussian Model for Satellite Image Deblurring," IEEE Trans. Image Processing, vol. 13, no. 4, 2004.

[7] G.R. Ayers and J.C. Dainty, "Iterative Blind Deconvolution Method and Its Applications," Optics Letters, vol. 13, pp. 547-549, 1988.

[8] S.M. Jefferies, K. Schulze, C.L. Matson, K. Stoltenberg, and E.K. Hege, "Blind Deconvolution in Optical Diffusion Tomography," Optics Express, vol. 10, pp. 46-53, 2002.

[9] S.H. Lim, J. Yen, and P. Wu, "Detection of Out-of-Focus Digital Photographs," Technical Report HPL-2005-14, HP Labs, 2005.

[10] J. Ens and P. Lawrence, "An Investigation of Methods for Determining Depth from Focus," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 2, pp. 97-108, Feb. 1993.

[11] A. Pentland, "A New Sense for Depth of Field," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 9, no. 4, pp. 523-531, 1987.

[12] L. Czuni and D. Csordas, "Depth-Based Indexing and Retrieval of Photographic Images," Proc. Eighth Int'l Workshop Visual Content Processing and Representation, pp. 76-83, 2003.

[13] P. Favaro and S. Soatto, "A Geometric Approach to Shape from Defocus," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 406-417, Mar. 2005.

[14] J.Z. Wang, J. Li, R.M. Gray, and G. Wiederhold, "Unsupervised Multiresolution Segmentation for Images with Low Depth of Field," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 1, pp. 85-90, Jan. 2001.

[15] L. Kovacs and T. Sziranyi, "Image Indexing by Focus Map," Proc. Seventh Int'l Conf. Advanced Concepts for Intelligent Vision Systems, pp. 300-307, 2005.

[16] A. Hanbury, U. Kandaswamy, and D.A. Adjeroh, "Illumination-Invariant Morphological Texture Classification," Proc. Seventh Int'l Symp. Mathematical Morphology, pp. 377-386, 2005.

[17] C.-Y. Chi and W.-T. Chen, "Maximum-Likelihood Blind Deconvolution: Non-White Bernoulli-Gaussian Case," IEEE Trans. Geoscience and Remote Sensing, vol. 29, no. 5, 1991.

[18] L. Kovacs and T. Sziranyi, "Relative Focus Map Estimation Using Blind Deconvolution," Optics Letters, vol. 30, pp. 3021-3023, 2005.

[19] D.A. Fish, A.M. Brinicombe, and E.R. Pike, "Blind Deconvolution by Means of the Richardson-Lucy Algorithm," J. Optical Soc. Am. A, vol. 12, no. 1, pp. 58-65, 1995.

[20] M. Jiang and G. Wang, "Development of Blind Deconvolution and Its Applications," J. X-Ray Science and Technology, vol. 11, pp. 13-19, 2003.

[21] A. Papoulis, Probability, Random Variables and Stochastic Processes. McGraw-Hill, 1984.

[22] J. Dijk, M. van Ginkel, R.J. van Asselt, L.J. van Vliet, and P.W. Verbeek, "A New Sharpness Measure Based on Gaussian Lines and Edges," Proc. Advanced School for Computing and Imaging, pp. 39-73, 2002.

[23] P. Brodatz, Textures: A Photographic Album for Artists and Designers. Dover Publications, 1999.
