AN INFORMATION-THEORETIC APPROACH TO MULTI-EXPOSURE FUSION VIA STATISTICAL FILTERING USING LOCAL ENTROPY

Johannes Herwig and Josef Pauli
Intelligent Systems Group

University of Duisburg-Essen
Duisburg, Germany

email: {firstname.lastname}@uni-due.de

ABSTRACT
An adaptive and parameter-free image fusion method for multiple exposures of a static scene captured by a stationary camera is described. The notion of a statistical convolution operator is discussed and convolution by entropy is introduced. Images are fused by weighting pixels with the amount of information present in their local surroundings. The proposed fusion approach is based solely on non-structural histogram statistics. Its purely information-theoretic view contrasts with the physically-based photometric calibration method of high dynamic range (HDR) imaging.

1 Introduction

The dynamic range of light in real world scenes by far exceeds what can be captured with a single exposure of a digital camera device. A CCD (charge-coupled device) sensor measures a contrast of 1:10^4, but up to approximately 1:10^7 is common in natural scenes [1]. Digital images usually have an 8-bit range of gray values with even lower contrast than the original measurements, and therefore a post-processing step built into the camera device applies non-linear dynamic range compression, so that contrast in darker and lighter areas of a scene is lost. The properties of this mapping are physically described by the camera response curve that relates irradiating light to digital gray values. The lack of contrast - due to both the sensor properties and the subsequent compression - results in under- and overexposed parts within an image, so that a single exposure is not enough to capture all the details of a scene. Therefore a series of exposures needs to be taken and fused via image processing, so that one image contains every detail. The discussion in this paper is restricted to multiple exposures of the same static scene captured with a stationary camera. Otherwise the exposure sequence would need to be registered beforehand, e.g. using the efficient method of [2], and moving objects would need to be specially treated, known as ghost removal, which is often done semi-automatically.

1.1 The Physical Approach: Photometric Calibration

The literature discusses two different approaches to the fusion of a bracketed exposure sequence of a static scene. The physical approach calibrates an imaging device with respect to its response to different amounts of irradiating light [3, 4, 5, 6]. Thereby the response curve that maps light to digital gray values is recovered. Since the dynamic range of light in real world scenes by far exceeds the usual 8-bit range of gray levels in digital images, the response is likely an S-shaped curve compressing the lower and upper bounds of the dynamic range. After recovery its inverse is applied and the exposures are fused into a single 32-bit floating point radiance map. Because usual display and reproduction devices cannot cope with such images, radiances need to be downscaled again using a tonemapping operator whose compression is adaptive to scene content. Therefore the tonemapped result contains more visual information than is achievable with any single exposure. Although physically sound, the approach has drawbacks. A natural scene used for calibration needs to be carefully chosen, so that it reveals much of the shape of the response curve, and for each image its exposure time must be exactly known. During and after calibration, properties of the imaging device, like values for color balancing or sensitivity, are not supposed to change, which is only possible if the device is manually controllable.
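For illustration, the following is a minimal sketch of the fusion step of this calibration approach, assuming the log response curve g (256 values) and a mid-range weighting function w (256 values) have already been recovered, e.g. with the method of [4]; the function name and border-case handling are illustrative assumptions, not taken from any cited implementation.

```python
import numpy as np

def fuse_radiance_map(images, exposure_times, g, w):
    """Fuse 8-bit exposures into a floating point radiance map, given a
    recovered log response curve g and a weighting function w that favours
    mid-range gray values (both indexed by gray value, 0..255)."""
    num = np.zeros(images[0].shape, dtype=np.float64)
    den = np.zeros_like(num)
    for img, dt in zip(images, exposure_times):
        z = img.astype(np.intp)
        num += w[z] * (g[z] - np.log(dt))   # per-exposure log radiance estimate
        den += w[z]
    return np.exp(num / np.maximum(den, 1e-9))  # floating point radiance map
```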

This is the draft version of Johannes Herwig and Josef Pauli, "An Information-Theoretic Approach to Multi-Exposure Fusion via Statistical Filtering using Local Entropy", Proceedings of the Seventh IASTED International Conference on Signal Processing, Pattern Recognition and Applications, ACTA Press, pp. 50-57, 2010.


1.2 The Ad-hoc Approach: Exposure Blending

To circumvent these inconveniences another fusion approach has emerged, which abandons the calibration step, therefore allowing camera parameters to change and even flash images to be incorporated into the image sequence to be fused. Contrary to calibration no dynamic range increase is possible, because pixels are fused by weighted averaging of gray levels. With methods of this type desirable image qualities are envisioned and measured locally in every exposure image. The fusion result is made up of image patches copied from the exposures where the quality measure is maximal. Then exposures with locally non-maximal quality measures are blended in to smooth sharp edges between differently exposed neighbouring patches in the fusion result. The fusion process is a weighted averaging of the pixels of a stack of input images guided by quality measures.

1.3 The Proposed Information-Theoretic Approach

Grounded on these ideas, the purpose of this paper is to develop a purely information-theoretic view of multi-exposure fusion that stands in contrast to the physically-based calibration method. Here the ad-hoc approach is generalized into entropy-based pixelwise weighting with adaptive scale that is not biased by any spatial gray value pattern that may be preferred by a local quality measure; it is not based on a Gaussian scale-space that inherently spatially contextualizes a local neighbourhood, and neither is a distance-based weighting function required for smooth blending. Entropy is the least biased statistical measure [7] with respect to the observed data, and in image processing measuring entropy amounts to a non-structural histogram analysis. Although the maximum entropy method (MEM), which optimizes for the best result achievable using the available information only [7], has previously been applied to e.g. multi-spectral image fusion [8] for resolution enhancement, the method presented here is not an entropy optimization method but rather a direct convolution approach with entropy as a filter kernel, which results in an acceptable but not necessarily optimal outcome in the sense of least biased inference and a priori knowledge.

2 Previous Work

Examples of the second class of fusion algorithms introduced above are briefly reviewed, on which the method presented here is loosely based. It is emphasized that every one of the quality measures or blending methods used incorporates some form of structural information and therefore is not purely based on image statistics.

In two of the earlier works images are fused by analyzing the feature space of Laplacian pyramids [9, 10]. For each pyramid level a pattern selective fusion process computes a feature saliency map, e.g. measuring local gradients. Then a pyramid of coefficients is obtained by a selection process on the saliency maps that favours images maximizing a composite feature saliency. The fused result is reconstructed from the original pyramids subject to the generated coefficients.

Desirable image qualities defined in [11] are contrast, saturation and well-exposedness. These pixel-wise measures are transformed into single scalar values that compose a weight map corresponding to each exposure. The fusion result is obtained by blending the input images with their weight maps. With this naive per-pixel blending disturbing seams appeared in the output where weight maps had sharp transitions due to the different absolute intensities caused by different exposures. Smoothing the weight maps through Gaussian or bilateral filtering produced other artefacts. To avoid introducing false edges in originally homogeneous areas a multiresolution approach using Laplacian pyramid decomposition is applied. Each input image is decomposed and has weight maps computed at each scale. Blending is then carried out for each level separately, and the Laplacian pyramid is collapsed to obtain the fusion result.
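As a sketch, the per-pixel weight map of one exposure in the spirit of [11] might be computed as below; the three measures and their multiplicative combination follow the paper's description, but the Gaussian width and the combination exponents are simplified assumptions, and the pyramid blending that consumes these maps is omitted.

```python
import numpy as np
from scipy.ndimage import laplace

def quality_weights(rgb, sigma=0.2):
    """Weight map for one exposure combining contrast, saturation and
    well-exposedness; rgb is a float array in [0, 1] of shape (h, w, 3)."""
    gray = rgb.mean(axis=2)
    contrast = np.abs(laplace(gray))                         # local contrast
    saturation = rgb.std(axis=2)                             # channel spread
    well_exposed = np.exp(-(rgb - 0.5) ** 2 / (2 * sigma ** 2)).prod(axis=2)
    return contrast * saturation * well_exposed + 1e-12      # avoid zeros
```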

In the work of [12] images are divided into rectangular blocks. For each block and exposure its corresponding entropy is computed. Then the exposure with maximal entropy is selected. After every block has been linked to its best input image, blending functions with weights decreasing from unity at the center of each block propagate over the whole output image to perform a weighted averaging of the selected input images. The block size and the width of the blending filter are free parameters, which are optimally estimated using gradient descent iteration. A similar region-based method with spatial Gaussian exposure blending is described in [13]. Whereas previously block and filter sizes for blending were globally equal, there they are locally adaptive to scene content, using two-dimensional quad-trees to iteratively derive blocks wherever finer resolution is needed. Besides entropy, intensity deviation and level of detail, that is gradient frequency, are additionally considered as quality measures of a region. Another similar work [14] uses only the level of detail measure with fixed region sizes.

3 Proposed Method

Most of the previous works use a local quality measure of properties that their authors think describe a well-exposed image. These are generally local gradient frequencies or intensity variances, which are supposed to be large in correctly exposed regions because they reveal structure that is not present in the relatively homogeneous over- or underexposed parts of an image. In fact one does not really know which features are worth preserving in an exposure, except that one wants to maximize the information content of an image or make it look interesting. Also, the blending functions used in previous algorithms do not have a direct connection with image content. They are either explicitly modeled as continuously decreasing Gaussian weight distributions or are inherently present in the Laplacian pyramid approach. But there is no justification other than being smooth and therefore avoiding the creation of artificial edges due to local intensity variations between different exposures within otherwise homogeneous regions.

3.1 Exposure Blending based on Local Entropy

To overcome these issues the method presented here uses entropy as a measure of information, only. Entropy has already been used in [12], but here it is proposed to use entropy for blending, too, which ultimately means that the fusion result is the pixel-wise average of all input images weighted by their ambient information content. It therefore becomes necessary to compute the entropy measure for a local neighbourhood of every pixel per exposure, whereas the previous method analyzed entropy over entire rectangular blocks. The following outlines the proposed fusion algorithm; a compact code sketch follows the three steps. Most operations are performed on images, which are two-dimensional matrices and are denoted by capitalization, e.g. E, whereby computations are local, using element-wise assignments with the matrix element denoted by e(x, y) correspondingly.

1. Iterate through all stacked exposures E^n, n = 1, ..., N. If the E^n are in color, convert them into their single-channel luminance representation L^n, otherwise set L^n = E^n. Because in real world images the color channels are expected to be highly correlated [15], it is justified to measure the entropy of the luminance image only.

2. Define the probability p for a specific gray value g to occur in image n of the stack within a particular square region that depends on its location (x, y) and is bounded by its width b(x, y) as

$$p^n_g(x, y) = \frac{\sum_{i,j=-\frac{b(x,y)-1}{2}}^{\frac{b(x,y)-1}{2}} \delta_g\bigl(l^n(x+i, y+j)\bigr)}{b(x, y)^2} \qquad (1)$$

whereby the delta function $\delta_g$ counts the occurrences

$$\delta_g(l) = \begin{cases} 1 & l = g \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

Then for each L^n compute its corresponding entropy image S^n using Shannon's definition, whereby a pixel

$$s^n(x, y) = -\sum_{g=0}^{255} p^n_g(x, y) \cdot \log_2 p^n_g(x, y) \qquad (3)$$

measures the information content of L^n within a square region of width b(x, y) centered at location (x, y).

3. Fuse all input images E^n into the resulting image R, which is the sum of all E^n weighted and normalized by their corresponding entropy images S^n,

$$r_\bullet(x, y) = \frac{\sum_{n=1}^{N} s^n(x, y) \cdot e^n_\bullet(x, y)}{\sum_{n=1}^{N} s^n(x, y)}. \qquad (4)$$

If the E^n are multi-channel images, then the same weights from the single-plane weight maps S^n are applied to each channel separately, i.e. in the above formula iteratively replace • with the R, G, B planes for color images, or set • = I to fuse scalar gray value intensity images E^n.
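The following is a minimal, unoptimized Python sketch of steps 1-3 for 8-bit grayscale exposures with a fixed window width b; the adaptive choice of b(x, y) discussed in section 3.4 would replace the constant. The reflective border padding and the guard against all-zero weights are practical assumptions not specified above.

```python
import numpy as np

def local_entropy(lum, b):
    """Entropy image S^n (eq. 3) of an 8-bit luminance image, computed from
    the local gray value histograms of eqs. (1)-(2) in a b-by-b window."""
    h, w = lum.shape
    r = (b - 1) // 2
    padded = np.pad(lum, r, mode='reflect')  # border handling (an assumption)
    s = np.empty((h, w), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            window = padded[y:y + b, x:x + b]
            p = np.bincount(window.ravel(), minlength=256) / float(b * b)
            p = p[p > 0]                     # drop 0 * log(0) terms
            s[y, x] = -np.sum(p * np.log2(p))
    return s

def entropy_fusion(exposures, b=17):
    """Entropy-weighted pixelwise average of the exposure stack (eq. 4),
    here for single-channel 8-bit images; color images would reuse the
    single-plane weights per channel."""
    weights = [local_entropy(e, b) for e in exposures]
    num = sum(w * e.astype(np.float64) for w, e in zip(weights, exposures))
    den = sum(weights)
    return num / np.maximum(den, 1e-9)       # guard against all-zero weights
```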

3.2 Concept of Non-Structural Statistical Convolution

The usage of entropy in this paper is much like that of a convolution operator: at every pixel location entropy is measured over its surroundings and the result is stored for that pixel. Mostly, filter kernels are a pattern of weights, and the filter result is a weighted sum of pixel values that is proportional to the features one wants to detect or enhance. Thus the filter result carries structural information about the surroundings due to the pattern of weights.

On the other hand there are filters that do not base their output on where a specific gray value is found, but on the histogram statistics of the distribution of gray values itself, without weighting pixels by their distance from the center of the filter kernel. Hence, these filters do not have a pattern. Known filter kernels in this sense are the mean and the median operator. These compute their result from a local histogram analysis, and may therefore be classified as non-structural statistical convolution operators.


Figure 1. The fusion result obtained by applying fixed size entropy convolution shows disturbing halo artifacts.

Here entropy is used as a non-structural statistical convolution operator: a measure of information is computed from a local histogram that describes the level of uncertainty about its distribution. With increasing uncertainty the filter response increases, too. The entropy of a histogram has its maximum when the probability of occurrence of every gray value is equal, and its minimum when only a single gray value is possible. This is intuitive, because when every gray value of an 8-bit image is equally possible, one has to ask 255 questions in the worst case to finally know the value of a certain pixel, but if it is known that only a single value is possible, then the one and only question is which one, and therefore certainty is high. Note that the filter response does not make any proposition about the actual spatial gray value pattern the histogram stems from.

The purpose of the entropy filter here is to detect the amount of activity at a pixel within a certain exposure. In previous works on exposure blending the activity measure has been defined by the extent and frequency of gray value gradients, which imposes a preferred spatial structure on image content. By using entropy no features like gray value edges are preferred; the only feature is the interest in a certain pixel, which becomes greater with increasing uncertainty about its surroundings when spatial correlation is assumed from a priori knowledge. For example, an image region that is made up of two different gray values and either shows two homogeneous parts separated by a single bar or otherwise a speckle pattern would give different results using gradient-based activity detection, where the speckle pattern would be preferred because there are more gradients, although this might be regarded as noise by most humans. For entropy both spatial patterns are equally interesting, which makes sense, because it cannot distinguish valuable information from noise.
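This difference can be made concrete with a small, hypothetical example: two windows with identical histograms but different spatial layouts receive the same entropy, while a simple gradient sum separates them.

```python
import numpy as np

def hist_entropy(window):
    """Shannon entropy of a window's gray value histogram (eq. 3 locally)."""
    p = np.bincount(window.ravel(), minlength=256) / float(window.size)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Two 8x8 windows over the same two gray values: one bar edge vs. speckle.
bar = np.zeros((8, 8), dtype=np.uint8)
bar[:, 4:] = 255                                  # single vertical edge
speckle = np.random.default_rng(0).permutation(bar.ravel()).reshape(8, 8)

grad_sum = lambda a: np.abs(np.diff(a.astype(int), axis=1)).sum()
print(hist_entropy(bar), hist_entropy(speckle))   # equal: 1.0 bit each
print(grad_sum(bar), grad_sum(speckle))           # gradient measure differs
```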

3.3 Convolution by Entropy with Fixed Filter Size

Exposure blending based on local entropy has been applied to the exposure series of thirteen images obtained from [16] and shown partly in figure 5. The size of the square filter region is set to b(x, y) = 17 pixels for all (x, y) independently. This value was chosen because the filter then covers 17 × 17 = 289 pixels, so that in an 8-bit image every gray value has a chance to appear, and the filter can nevertheless be centered. The fusion result obtained is shown in figure 1. There are a lot of halo artifacts visible at the borders of objects, which is very common with exposure blending algorithms [11]. These artifacts are thought to exist because of sharp variation in gray value intensities through the exposure series due to bright light sources in the scene. This is confirmed by this work, because scenes like figure 6 with smoother intensity distributions do not produce halos using the same filter as above. During test runs with increasing filter widths it was found that halos disappear from brighter regions, but at the same time dimly lit regions became blurry in the fused result. Hence, it has been concluded that the size of the entropy filter needs to locally adapt itself to large intensity variations.

3.4 Convolution by Entropy with Adaptive Filter Size

In order to prevent halo artifacts in the fusion result the pixel weighting process needs to integrate entropy over a larger scale window where the brightness variation is large. This finding is reasonable if spatial correlation of brightness is assumed. Then from a large brightness variation at a pixel it can be concluded that at least its local neighbourhood is nearly saturated in the longer exposures of the scene. The entropy of a saturated neighbourhood is small, because it has a homogeneous appearance and therefore certainty about the measurement is high. In turn the certainty of measurements in shorter exposures is low (and entropy is high) due to sensor noise. Therefore noise is more prominently weighted in these areas, as is noticeable in the fused image shown in figure 1 (especially at the window frame and the wooden plate of the desktop). In order to absorb this effect the filter window needs to be larger over those areas, so that statistics from hopefully non-saturated surroundings can be integrated to obtain more meaningful weights. On the other hand, if non-saturated measurements are locally available over the whole image set, the filter window should be small: if it were larger, uncertainty would not decrease as fast as possible with longer exposures, since uncertainty is obviously more likely if the integration area is larger, and hence details in the fusion result may be blurred because slightly overexposed pixels would still receive high weights.

In order to define an adaptive integration scale of the entropy filter that depends on the absolute brightness variation at the pixel of the filter location, the following strategy is proposed; a code sketch of the three steps follows below.

1. An image L_dif that defines the absolute brightness variation of the scene, obtained by iterating through the stacked exposures L^n, is given by

$$l_{dif}(x, y) = \max_{1 \le n \le N} l^n(x, y) - \min_{1 \le m \le N} l^m(x, y). \qquad (5)$$

The result L_dif is qualitatively similar to the image of the longest exposure, but e.g. pixels that are continuously saturated are black here, which makes sense, since independent of the filter size a normalized weighting of intensity values always gives the same saturated result. On the other hand, pixels that are saturated in the longest exposure become non-saturated here if they are still not underexposed in the shortest exposure. This definition of brightness variation also guarantees that the fusion algorithm does not depend on the order of the images in the stack.

2. Because artifacts result from sharp variations in the brightness differences present within the scene, the deviation of a pixel's brightness difference from the mean brightness difference of the scene is of interest. The variance image L_var of the absolute brightness differences L_dif is

$$l_{var}(x, y) = \sqrt{\left(\overline{L_{dif}} - l_{dif}(x, y)\right)^2} \qquad (6)$$

where $\overline{L_{dif}}$ denotes the mean of L_dif. An example of a variance image of the absolute brightness variation throughout the scene from figure 5 is shown in figure 2. Please note that the resulting image has large gray values at pixel locations where brightness over all exposures remains either relatively dark, so that pixels are underexposed and suffer from thermal noise, or relatively light, so that pixels are saturated and do not contain valid information about the scene. Both cases benefit from larger integration scales, because under the assumption that brightness values are spatially correlated their near neighbourhood does not contain valuable information for fusion by weighted averaging.

3. The filter size b(x, y) should be some function of l_var(x, y) to be adaptive to scene content as discussed, and has been chosen to be simply

$$b(x, y) := l_{var}(x, y). \qquad (7)$$

Filter results show that this relation is appropriate, although other (non-linear) choices might perform better. E.g. one could additionally cut off the maximum filter width for faster computation, possibly risking recognizable artifacts in the fusion result.
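A sketch of the three steps follows; rounding to odd integer widths and enforcing a minimum window size are practical details added here, not prescribed by equations (5)-(7).

```python
import numpy as np

def adaptive_window_widths(luminances):
    """Per-pixel entropy window widths b(x, y) from eqs. (5)-(7)."""
    stack = np.stack([l.astype(np.float64) for l in luminances])
    l_dif = stack.max(axis=0) - stack.min(axis=0)   # eq. (5), order-independent
    l_var = np.abs(l_dif.mean() - l_dif)            # eq. (6): |mean - l_dif|
    b = np.rint(l_var).astype(int) | 1              # eq. (7), forced to odd
    return np.maximum(b, 3)                         # minimum window size
```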

4 Results and Evaluation

The proposed information-theoretic fusion approach with adaptive filter size is compared to the physically-based calibration method for high dynamic range imaging, using the algorithm from [4] with adaptive logarithmic tonemapping [17] as implemented by the picturenaut software [18], and to the ad-hoc Gaussian scale-space approach described in [11] that is implemented by the enfuse software [19]. The three approaches are qualitatively evaluated on four high dynamic range scenes.

4.1 Qualitative Analysis of Fusion Results of Sample Scenes Comparing Different Methods

Sample images of an exposure series and the results of the tonemapped HDR image, the enfuse software, and the proposed method are shown for each of the four scenes in figures 5, 6, 7, and 8, respectively. It can be concluded from visual inspection that the proposed approach produces results that are perceptually natural without recognizable artifacts. Also it performs equally well on all example scenes, whereas the tonemapped HDR image in figure 6 has reddish colors and the Macbeth color chart in figure 7 has a foggy appearance. The ad-hoc method implemented by the enfuse software produces visible halo artifacts around the backdrop of the bright light source in figure 7 and produces non-white colors at the window frame in figure 5. With the proposed approach the colors of the Macbeth chart in figure 7 are not as vivid as with the enfuse software, but due to the very bright light source this gives a natural appearance to a human viewer. The shortcomings of the tonemapped HDR images are due to the specific tonemapper used. The problem with unnatural colors in the enfuse algorithm can be explained by its fusion method, which prefers mid-range gray value intensities.


Figure 2. Variance image of pixelwise maximum absolute brightness differences measured throughout the time domain of an unordered exposure series.

Image series      Fusion results               Scene
Min      Max      HDRI     Enfuse   Prop.
0.458    7.177    7.465    7.334    7.190      desktop
2.014    7.716    7.439    7.065    6.856      memorial
1.640    5.771    7.585    7.360    7.459      macbeth
4.855    7.247    6.855    6.985    6.583      sunset

Table 1. Entropy values for entire images are given here for comparison of the fusion approaches. Higher values mean that there is more uncertainty in the image, so it contains more information, which is better. The first two columns show the minimum and maximum entropies over the images of the original exposure series. Then the entropies of the fusion results are shown, measured from the tonemapped HDR image, the result produced by the enfuse software, and the result of the proposed method.


For a quantitative analysis entropy has been computed once for each fusion result over the whole image. From the overall discussion in this paper it can be concluded that a higher entropy, which corresponds to increased uncertainty in an image, is preferable, because the image then contains greater activity and hence more detail. Quantitative comparison of fusion results is difficult due to the lack of an appropriate metric; one has to keep in mind that entropy is also higher if an image contains artifacts introduced by the fusion algorithm itself. The results are given in table 1, where entropy measurements from the extremal images of the original exposure series have been included for orientation. If a fusion algorithm is successful its result should contain more information than any of the original images within the series, which is not even true for the memorial and sunset scenes, maybe due to their vast range of radiances of more than five orders of magnitude. From the quantitative results it is apparent that no single algorithm performs best for all scenes, although the tonemapped HDR image seems to be preferable from that point of view, since the sunset scene is the only one where the enfuse algorithm performs better. As already noted during the qualitative analysis, the Macbeth scene is well fused by the proposed method, but for all other scenes it performs worst, although within reach of the ad-hoc approach implemented by the enfuse software.
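The whole-image entropy reported in table 1 is the global analogue of equation (3); a minimal sketch for an 8-bit image:

```python
import numpy as np

def global_entropy(img):
    """Whole-image Shannon entropy of an 8-bit image, as used in table 1."""
    p = np.bincount(img.ravel(), minlength=256) / float(img.size)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))
```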

It is noted as a remark that there has been a class project [20] (the scene used in figure 8 has been made available there) where the HDR method and other ad-hoc approaches are qualitatively compared. The project concludes that ad-hoc methods produce perceptually better results and are more robust.

4.2 Physically-based High Dynamic Range Recovery vs. Information-Theoretic Weighting by Entropy

Because the radiance image with increased dynamic range is not directly displayable, a false-color image of the 32-bit radiances corresponding to the scene shown in figure 5 is given in figure 3. In the spirit of statistical mechanics the aim is to compare the result obtained through physically-based considerations - like calibrating the response curve of a physical system such as the camera device - with the information-theoretic result obtained only through analysis of gray value measurements, without modelling knowledge of the physical system generating those measurements [7].


Figure 3. A false-color image showing relative radiance values of the luminance version of the radiance map corresponding to the desktop scene. Radiances have been logarithmically scaled and span four orders of magnitude.

Figure 4. A false-color image showing accumulated entropies that have been summed pixelwise over all filtered entropy images corresponding to every luminance image of the exposure series of the desktop scene. Entropy values are linearly scaled and represent the amount of information present within a local neighbourhood of every pixel, where the size of the neighbourhood varies spatially but is constant over time, represented by the exposure stack.

Therefore for comparison a similar false-color image is shown in figure 4, which is the accumulation of the entropy values obtained by the proposed entropy filter for every pixel throughout the exposure series. Accumulated entropy is expected to approximately correspond to radiance values: at regions where radiance is high, scene details are measured in most of the shorter exposures, and therefore accumulated uncertainty is high; at image regions with lower radiance, details are visible only in the longer exposures, and with shorter exposures those regions become more homogeneous due to being underexposed, thus receiving lower entropy values. Also the integration scale for higher radiances is larger, so there is a higher probability for uncertainty, and hence entropy is higher. This expectation can be roughly verified by comparing figures 3 and 4. Since entropy is measured over a neighbourhood, the accumulated entropy image shows much less detail than the radiance map, but even some tree leaves can be recognized. The overall energy distribution is valid too, although less detail is revealed at regions of higher energy due to the more aggressive smoothing caused by the larger filter scale.

5 Conclusion

In this paper previously developed ad-hoc algorithms for multiple exposure fusion by different authors have been discussed. It has been shown that their fusion approaches are biased by the way certain image features are preferred when using specific cost functions for exposure selection and blending. Here the ad-hoc approach has been refined into an information-theoretic framework using local entropy for pixelwise averaging that weights pixels by their ambient information content. Because entropy is based on histogram analysis, no specific spatial pixel pattern is unjustifiably preferred. The only bias that remains is the integration scale of the entropy filter, which has been shown by example to depend locally on scene brightness. Therefore a non-structural statistical convolution filter based on local entropy has been newly developed. A method to determine the filter size solely by analysing gray value statistics has been introduced, coupling the mean global brightness variation of the scene with the local brightness variance at a single pixel, whereby the filter size differs per pixel and depends linearly on the brightness variance. It is interesting to note that in terms of statistical mechanics macroscopic and microscopic behavior is linked here. Although a priori knowledge has been applied, assuming spatially correlated brightness and an existing relation between integration scale and brightness variance, the filter size is still derived by data-driven gray value histogram analysis. Hence, the whole fusion process is based on information found through histogram analysis only. The proposed method has been compared to previously developed methods that are representative of physically-based HDR imaging and ad-hoc exposure fusion. Although a qualitative analysis of the proposed method is encouraging, a simple quantitative analysis does not favour any one algorithm under consideration. The presented method is theoretically interesting, but a disadvantage is its huge computational cost: depending on the filter sizes, processing times are up to twenty minutes on a Pentium IV 2.2 GHz using a single-threaded implementation. Nevertheless it has applications in information retrieval and visualization, remote sensing, and the automatic unsupervised blending of exposure-bracketed photographs by artists.

References

[1] Bernd Hoefflinger, editor. High-Dynamic-Range (HDR) Vision. Springer Series in Advanced Microelectronics. Springer, 2007.

[2] Greg Ward. Fast, robust image registration for compositing high dynamic range photographs from hand-held exposures. Journal of Graphics Tools, 8(2):17–30, 2003.

[3] S. Mann and R. W. Picard. Being 'undigital' with digital cameras: extending dynamic range by combining differently exposed pictures. In IS&T's 48th Annual Conference, Cambridge, Massachusetts, pages 422–428. IS&T, May 1995.

[4] Paul E. Debevec and Jitendra Malik. Recovering high dynamic range radiance maps from photographs. In SIGGRAPH '97: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, pages 369–378, New York, NY, USA, 1997. ACM Press/Addison-Wesley Publishing Co.

[5] T. Mitsunaga and S.K. Nayar. Radiometric Self Calibration. In IEEE Conference on Computer Vision and PatternRecognition (CVPR), volume 1, pages 374–380, Jun 1999.

[6] M. A. Robertson, S. Borman, and R. L. Stevenson. Estimation-theoretic approach to dynamic range enhancement usingmultiple exposures. In Journal of Electronic Imaging, volume 12, pages 219–228. SPIE and IS&T, April 2003.

[7] E. T. Jaynes. Information Theory and Statistical Mechanics. The Physical Review, 106(4):620–630, May 1957.

[8] F. J. Tapiador and J. L. Casanova. An algorithm for the fusion of images based on Jaynes’ maximum entropy method.International Journal of Remote Sensing, 23(4):777–785, February 2002.

[9] L. Bogoni. Extending dynamic range of monochrome and color images through fusion. In Proc. 15th International Conference on Pattern Recognition, volume 3, pages 7–12, 3–7 Sept. 2000.

[10] Ron Rubinstein and Alexander Brook. Fusion of differently exposed images. Final project report, Israel Institute of Technology, October 2004.

[11] Tom Mertens, Jan Kautz, and Frank Van Reeth. Exposure fusion. In Pacific Graphics, 2007.

[12] A. Goshtasby. Fusion of multi-exposure images. Image and Vision Computing, 23:611–618, 2005.

[13] A. Vavilin and Kang-Hyun Jo. Recursive HDR image generation from differently exposed images based on local image properties. In Proc. International Conference on Control, Automation and Systems ICCAS 2008, pages 2791–2796, 14–17 Oct. 2008.

[14] Annamaria R. Varkonyi-Koczy, Andras Rovid, Szilveszter Balogh, Takeshi Hashimoto, and Yoshifumi Shimodaira. High dynamic range image based on multiple exposure time synthetization. Acta Polytechnica Hungarica, 4(1):5–15, 2007.

[15] B. K. Gunturk, Y. Altunbasak, and R. M. Mersereau. Color plane interpolation using alternating projections. IEEE Transactions on Image Processing, 11(9):997–1013, Sept. 2002.

[16] Grzegorz Krawczyk. PFScalibration - photometric calibration of HDR and LDR cameras [Computer files]. Example images retrieved September 2009. Available from http://www.mpi-inf.mpg.de/resources/hdr/calibration/pfs.html.

[17] F. Drago, K. Myszkowski, T. Annen, and N. Chiba. Adaptive logarithmic mapping for displaying high contrast scenes. InP. Brunet and D. Fellner, editors, EUROGRAPHICS 2003, volume 22. Blackwell, 2003.


[18] Marc Mehl. Picturenaut [Computer software], 2005 - 2007. Available from http://www.hdrlabs.com/picturenaut/.

[19] Andrew Mihal, Max Lyons, Pablo d’Angelo, Joe Beda, Erik Krause, Konstantin Rotkvich, and Christoph Spiel. Enblendand Enfuse [Computer software], 2004-2008. Available from http://enblend.sf.net.

[20] Tina Dong, Sufeng Li, and Michael Lin. High dynamic range imaging for display on low dynamic range devices, March2006. Class project Psych 221. Retrieved from http://scien.stanford.edu/class/psych221/projects/06/ in September 2009.


Figure 5. On the left are samples of 13 exposures. Then fusion results of the HDR, enfuse, and the proposed approach follow.

Figure 6. On the left are samples of 16 exposures. Then fusion results of the HDR, enfuse, and the proposed approach follow.

Figure 7. On the left are samples of 12 exposures. Then fusion results of the HDR, enfuse, and the proposed approach follow.

Figure 8. On the left are samples of 5 exposures. Then fusion results of the HDR, enfuse, and the proposed approach follow.