
Personalized Image Enhancement

Gregory Luppescu
Electrical Engineering
Stanford University
[email protected]

Raj Shah
Electrical Engineering
Stanford University
[email protected]

Abstract—We implement an auto-enhancement framework that can learn user preferences to enhance images in a personalized way. Our method finds a maximally representative training subset (20 images) out of a large dataset, allowing for efficient training. The parameters chosen in the training phase can then be applied accordingly to other images in the dataset, automatically creating an entire library of personally customized images. To test this method, we performed eight user studies, and the results suggest that the personalization of image enhancement parameters is sometimes better than conventional enhancement methods.

I. INTRODUCTION

Image enhancement is almost always a necessary step after taking a picture with a digital camera. Many different photo software packages, like Google Photos or Photoshop, attempt to automate this process with various auto enhancement techniques. None of these methods, however, take user preference into account. Social media outlets like Instagram supply their users with many different types of filters to add some level of customization, but the user must manually choose a filter every time he or she uploads a photo. While manually customizing every image in a library is not a problem for a small number of photos, the process can quickly become tedious for larger photo libraries. In this paper, we implement a system that can learn a user's preferences and apply those preferences to the rest of his or her photo library.

II. RELATED WORK

Image enhancement has been an active field of research, especially automatic image enhancement. Kaufman et al. [1] propose an automatic photo enhancement pipeline that is content-aware. Unlike other automatic photo enhancement methods, they take the local semantics of the image into account and specifically attempt to enhance faces, blue skies, and underexposed regions in the image. Celik and Tjahjadi [2] use Gaussian mixture modeling to model the gray-level distribution in the image and then, using this distribution, automatically enhance the contrast of the image. Dale et al. [3] develop a visual search technique to find the closest images to the input image in a large web database. The image enhancement pipeline uses these images to define the input image's visual context, and then uses this visual context for a number of image enhancement operations like white balance, auto-exposure, and contrast correction. While these methods try to make the auto enhancement pipeline more content-aware, they do not address the problem of incorporating user preferences. The work of this project is a modified version of the method developed in [4].

Fig. 1: Example results after applying two different S-curves. The left image results from applying an S-curve with parameters a = 0.5 and λ = 0.5. The right image results from applying an S-curve with parameters a = 0.5 and λ = 1.5.

III. ENHANCEMENT PARAMETERS

We decided to focus on two types of image enhancements: contrast manipulation and color correction. Contrast manipulation was achieved by applying an S-curve to the images, and color correction was performed by modifying the color temperature and the tint of the images.

A. S-Curve

The equation for the S-curve is given as


y =
\begin{cases}
a - a\left(1 - \dfrac{x}{a}\right)^{\lambda} & \text{if } x \le a \\[6pt]
a + (1 - a)\left(\dfrac{x - a}{1 - a}\right)^{\lambda} & \text{otherwise}
\end{cases}
\qquad (1)

where a determines the inflection point of the S-curve, λ determines the shape of the S-curve, x is a normalized input pixel value, and y is a normalized output pixel value. For λ ≥ 1, applying an S-curve maps pixel values that are greater than a to lower values, and maps pixel values that are less than a to higher values. For λ ≤ 1, the opposite is achieved. Examples of applying various S-curves to an image are shown in figure 1.
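As a concrete illustration, here is a minimal NumPy sketch of equation (1); the function name and the clipping of the intermediate bases are our own additions, not part of the original implementation.

```python
import numpy as np

def apply_s_curve(x, a=0.5, lam=1.5):
    """Apply the S-curve of equation (1) to normalized pixel values.

    x   : array of pixel values in [0, 1]
    a   : inflection point of the S-curve
    lam : shape parameter (lambda in equation (1))
    """
    x = np.asarray(x, dtype=np.float64)
    # Clip the bases to [0, inf) so the branch not selected by
    # np.where does not produce NaNs for non-integer lam.
    low = a - a * np.clip(1.0 - x / a, 0.0, None) ** lam
    high = a + (1.0 - a) * np.clip((x - a) / (1.0 - a), 0.0, None) ** lam
    return np.where(x <= a, low, high)
```

For example, apply_s_curve(img, a=0.5, lam=1.5) uses the parameters of the right-hand image in figure 1.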

B. Color Correction

Color correction can be achieved by modifying the color temperature T and the tint h. The color temperature is defined as the wavelength of light emitted by an ideal blackbody radiator heated to some temperature, where a hotter blackbody emits bluer light and a cooler one emits redder light. In essence, modifying the color temperature of an image simply changes how warm or cool (blue or red) an image looks. Orthogonal to color temperature is the tint, which determines the amount of green in an image. To apply color correction, we simply calculate

\begin{aligned}
R_{out} &= R_{in} - \Delta T \\
G_{out} &= G_{in} + \Delta h \\
B_{out} &= B_{in} + \Delta T
\end{aligned}
\qquad (2)

where R_in, G_in, and B_in are the input RGB values for a given pixel, ∆T is the change in color temperature, ∆h is the change in tint, and R_out, G_out, and B_out are the output RGB values for that pixel. Examples of changing color temperature and tint are shown in figure 2.

Fig. 2: Example results after modifying T and h
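In code, equation (2) amounts to three per-channel offsets. A minimal sketch follows, assuming 8-bit RGB values so that the ∆T and ∆h magnitudes of table I (±7.5) act as small offsets; the function name is ours.

```python
import numpy as np

def adjust_temp_tint(img, delta_t=0.0, delta_h=0.0):
    """Apply the color correction of equation (2).

    img     : H x W x 3 RGB image (assumed 8-bit here)
    delta_t : change in color temperature
    delta_h : change in tint
    """
    out = img.astype(np.float64)
    out[..., 0] -= delta_t   # R_out = R_in - dT
    out[..., 1] += delta_h   # G_out = G_in + dh
    out[..., 2] += delta_t   # B_out = B_in + dT
    return np.clip(out, 0, 255).astype(np.uint8)
```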

IV. METHODOLOGY

A. Dataset

The dataset used for this project consisted of 500 photos. The images were selected to represent a typical user photo library, consisting of a wide variety of contexts. The categories of photos included rural landscapes, urban areas, faces and people, bodies of water, etc. The images were taken from [5].

B. Pre-processing

To deal with low quality images in the dataset, all images were auto enhanced before applying personalized enhancement. The auto enhancement procedure consists of two steps:

1) Auto-white balancing: The images are white-balanced using the gray-world assumption for the brightest 5% of pixels. This step uses 3 scaling factors to transform each color channel.

2) Auto-contrast stretch: Here, we first convert the image to grayscale and then find intensities I_L and I_H, where I_L is greater than at most 0.4% of the intensities and I_H is less than at most 1% of the intensities. Finally, the pixel values are linearly transformed such that I_L is mapped to 0 and I_H is mapped to 1.
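A sketch of how these two steps might be combined is given below; the percentile thresholds and the channel-mean grayscale conversion are our assumptions where the report leaves details unspecified.

```python
import numpy as np

def auto_enhance(img):
    """Auto white balance followed by auto contrast stretch (section IV-B).

    img: H x W x 3 float RGB image with values in [0, 1].
    Returns the enhanced image and the 5 auto enhancement parameters
    (3 channel gains, I_L, I_H) reused later by the distance metric.
    """
    # Auto-white balancing: gray-world on the brightest 5% of pixels.
    gray = img.mean(axis=2)                    # grayscale proxy (assumption)
    bright = img[gray >= np.percentile(gray, 95)]
    channel_means = bright.mean(axis=0)
    gains = channel_means.mean() / channel_means
    wb = np.clip(img * gains, 0.0, 1.0)

    # Auto-contrast stretch: map [I_L, I_H] linearly to [0, 1].
    gray = wb.mean(axis=2)
    i_lo = np.percentile(gray, 0.4)            # I_L
    i_hi = np.percentile(gray, 99.0)           # I_H
    out = np.clip((wb - i_lo) / (i_hi - i_lo), 0.0, 1.0)

    return out, np.concatenate([gains, [i_lo, i_hi]])
```

The 5 returned parameters (3 white balance gains plus I_L and I_H) are exactly the auto enhancement parameters the distance metric of section IV-C compares.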

C. Distance Metric

One of the critical steps in this project is finding an effective metric to measure the similarity between a pair of images. The distance metric is formulated on the basic assumption that if two images are similar, then their auto enhancement parameters are also similar. Thus, we define the sum of the absolute differences between the 5 auto enhancement parameters (found in section IV-B) for a pair of images as

D_{params}(i, j) = \sum_{k=1}^{5} |p_{ik} - p_{jk}| \qquad (3)

where p_{ik} is the kth auto enhancement parameter for image i. We define our distance metric as

D^{images}_{\alpha}(i, j) = \sum_{n=1}^{25} \alpha_n D_n(i, j) \qquad (4)

where

\alpha^{*} = \arg\min_{\alpha} \sum_{i,j} \left\| D^{images}_{\alpha}(i, j) - D_{params}(i, j) \right\|_2^2 \qquad (5)

24 of the distance metrics we used were the KL divergence and the L1, L2, and L∞ norms of the differences between:
• Two color images
• The R, G, and B histograms of two color images
• The grayscale values of two color images after being converted to grayscale
• The grayscale histograms of two color images after being converted to grayscale

The last distance metric is the entropy of the difference between two color images. To find the value of α*, we used the BFGS algorithm to minimize the objective function given in equation 5. In order for equation 5 to be well conditioned, it was necessary to normalize the individual distance functions such that they had comparable values. The L1 norm and KL divergence of the differences of the color images were divided by 3 × the number of pixels; those of the grayscale images were divided by the number of pixels. The L2 norm of the differences of the color images was divided by the square root of 3 × the number of pixels; that of the grayscale images was divided by the square root of the number of pixels. The L1 norms of the differences of histograms were normalized by 256 × the number of pixels, as the number of bins in the histograms was 256. Finally, the L2 norms of the differences of histograms were normalized by the square root of 256 × the number of pixels.
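To make the fit concrete, here is a minimal sketch using SciPy's BFGS implementation; the data layout, with every normalized distance function precomputed for all image pairs, is our assumption.

```python
import numpy as np
from scipy.optimize import minimize

def learn_alpha(D_images, D_params):
    """Fit the weights of equation (4) by minimizing equation (5).

    D_images : (25, P) array; row n holds the normalized distance
               function D_n evaluated on all P image pairs.
    D_params : (P,) array of D_params values for the same pairs.
    """
    def objective(alpha):
        residual = alpha @ D_images - D_params
        return np.sum(residual ** 2)

    # Start from uniform weights; BFGS approximates gradients numerically.
    alpha0 = np.full(D_images.shape[0], 1.0 / D_images.shape[0])
    return minimize(objective, alpha0, method="BFGS").x
```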

D. Training Set Selection

Since the purpose of this pipeline is to make image personalization as efficient as possible, it is imperative to have a small training set to ensure a seamless user experience. Ideally, each training image would represent a specific category in the dataset. For instance, if our dataset consisted of images of faces, cats, and trees, our training set should at least contain one face, one cat, and one tree. Then, the parameters chosen for each of these training images would be applied to all other images within their respective categories. Ideally, the distance metric learned in IV-C will determine similarity between images such that images in the same category are closest to each other. Practically, however, with a large unlabeled image set containing images not belonging to specific categories, we must select a training set that is maximally representative of the dataset. This problem can be interpreted as a sensor placement problem [6], where the top n sensors (training images), when combined, share the most amount of information with the rest of the dataset. To achieve this objective, we employ a greedy selection procedure found in [6], which considers a Gaussian process with covariance matrix

k_{i,j} = \exp\left(-N^2 \, \frac{D^{images}_{\alpha}(i, j)}{\sum_{k,l} D^{images}_{\alpha}(k, l)}\right) \qquad (6)

where D^{images}_{\alpha}(i, j) is the distance described in IV-C between images i and j, and N is the total number of images in the dataset. The covariance matrix K has an intuitive interpretation: each entry (i, j) encodes the similarity between image i and image j, where a value of 0 signifies the two images are infinitely far apart, and a value of 1 signifies the images are at the same location in the image space.

As we are employing a greedy algorithm, each selection step chooses the image that maximizes the gain in mutual information among the unselected images:

I^{*} = \arg\max_{I_i \in U} f(i) \qquad (7)

where

f(i) = MI(U - i;\, S \cup i) - MI(U - i;\, S) = \frac{1 - k_{S,i}^{T} K_{S,S}^{-1} k_{S,i}}{1 - k_{U-i,i}^{T} K_{U-i,U-i}^{-1} k_{U-i,i}}

In equation (7), MI(x; y) is the mutual information between x and y [7]. S is the set of images already selected to be a part of the training set, and U is the set of unselected images. K_{S,S} is the similarity matrix among images in S, whose values are calculated according to (6). K_{U−i,U−i} is the similarity matrix among images in U excluding image i. Similarly, k_{S,i} and k_{U−i,i} are the similarity vectors between image i and the images in S and in U − i, respectively. It can be intuitively seen that if the similarity of image i with U − i is high, then the denominator is small. Conversely, if the similarity of image i with S is low, then the numerator is large. Thus, finding the i that maximizes f(i) implies that image i is very similar to the unselected images, but at the same time, dissimilar to the selected images, which is what we are trying to achieve. Greedily finding an image i at each iteration gives us a ranking of images that maximally represent the dataset. For this project, we choose the top 20 images as our training set. The rationale behind choosing 20 images is to make the training convenient and time-efficient for the user.

Fig. 3: Training GUI implemented in MATLAB
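A direct, unoptimized transcription of equations (6) and (7) might look as follows; the function names are ours, and a practical version would reuse matrix factorizations across iterations instead of solving a fresh linear system per candidate.

```python
import numpy as np

def similarity_matrix(D):
    """Build the covariance of equation (6) from a pairwise distance matrix D."""
    N = D.shape[0]
    return np.exp(-(N ** 2) * D / D.sum())

def greedy_select(K, n=20):
    """Greedily pick n maximally representative images (equation (7)).

    K: N x N similarity matrix from similarity_matrix().
    Returns the selected indices, in selection order.
    """
    N = K.shape[0]
    S, U = [], list(range(N))
    for _ in range(n):
        best_i, best_f = None, -np.inf
        for i in U:
            rest = [j for j in U if j != i]
            # Numerator: 1 - k_{S,i}^T K_{S,S}^{-1} k_{S,i}
            if S:
                k_si = K[np.ix_(S, [i])]
                num = 1.0 - float(k_si.T @ np.linalg.solve(K[np.ix_(S, S)], k_si))
            else:
                num = 1.0   # nothing selected yet
            # Denominator: 1 - k_{U-i,i}^T K_{U-i,U-i}^{-1} k_{U-i,i}
            k_ui = K[np.ix_(rest, [i])]
            den = 1.0 - float(k_ui.T @ np.linalg.solve(K[np.ix_(rest, rest)], k_ui))
            if num / den > best_f:
                best_f, best_i = num / den, i
        S.append(best_i)
        U.remove(best_i)
    return S
```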

E. Parameter Set Selection

As mentioned in section III, we learn 4 enhancement parameters in this project. For each of the enhancement parameters, we decided to use 3 different values, which gives us a total of 3^4 = 81 different parameter combinations for each training image. The values used for each parameter are listed in table I. To decide the 3 values per parameter, we applied various changes to see how the resulting images were affected. As expected, the more aggressive the change, the more unnatural the image would look. We ended up choosing parameters that noticeably changed the images while keeping the images looking reasonably natural.

Asking the user to choose the best parameter combination out of 81 different choices is simply impractical. Hence, we decided to select an optimal subset of parameter combinations to make the training process less tedious. To achieve this, per image, we apply all 81 different parameter combinations and then use the same sensor placement optimization procedure from section IV-D to choose the top 8 parameter combinations. As a result, the 8 parameter combinations found maximally represent the parameter space for that image. Now, instead of choosing among 81 versions of the same image, the user only has to choose between 9 versions: the original auto enhanced image, and the 8 images enhanced using the selected parameters.
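For illustration, enumerating the 81 combinations of table I and pruning them with the same greedy machinery could look like this; enhance and pairwise_distances are hypothetical helpers, while greedy_select and similarity_matrix are the sketches from section IV-D.

```python
from itertools import product

# The 3^4 = 81 parameter combinations of table I.
lambdas = [0.75, 1.0, 1.5]
a_vals  = [0.3, 0.5, 0.7]
delta_t = [-7.5, 0.0, 7.5]
delta_h = [-7.5, 0.0, 7.5]

combos = list(product(lambdas, a_vals, delta_t, delta_h))
assert len(combos) == 81

# For one training image: render all 81 variants, compute their pairwise
# distances, and keep the 8 combinations that maximally represent the
# parameter space (hypothetical helpers shown commented out):
#   variants = [enhance(img, *c) for c in combos]
#   D = pairwise_distances(variants)
#   top8 = greedy_select(similarity_matrix(D), n=8)
```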

F. Training

Once the training images and parameter combinations per training image are selected, the system is ready to learn a user's preferences.


Fig. 4: Image Processing Pipeline to Personally Enhance Newly Added Images

Parameter   Values
λ           0.75, 1, 1.5
a           0.3, 0.5, 0.7
∆T          -7.5, 0, 7.5
∆h          -7.5, 0, 7.5

TABLE I: Values used for each parameter

For training, we implemented a GUI in MATLAB, which can be seen in figure 3. For each training image, a user chooses which enhanced photo he or she prefers the most. The GUI signifies the image chosen by highlighting it with a green border. The user can traverse the training set using forward and backward buttons, and can press the "Check Selections" button to check which training images still have unspecified parameter combinations. We tried to make the GUI very easy to use so that the training phase would be as smooth as possible. After a user has selected all preferences for each training image, he or she can click "Finish Training" to save the chosen preferences.

V. PROCESSING PIPELINE

After the user preferences are learned, all other images in the dataset can be personally enhanced. The enhancement pipeline (figure 4) is as follows:

1) Transform the image into the perceptually linear domain (we used γ = 2.2)
2) Auto enhance the image using the procedure described in IV-B
3) Find the closest training image to the auto enhanced input image using the distance metric learned in IV-C
4) Get the enhancement parameters that the user chose for the closest training image during the training phase
5) Apply these enhancement parameters to the test image
6) Perform the inverse operation from step 1) to produce the final image
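Tying the earlier sketches together, one plausible rendering of this pipeline is shown below; auto_enhance, apply_s_curve, and adjust_temp_tint are the sketches from previous sections, distance is a hypothetical implementation of equation (4), and the direction of the gamma transform is our assumption about the input encoding.

```python
import numpy as np

GAMMA = 2.2

def personalize(img, train_imgs, train_params, alpha):
    """Personally enhance one new image, following figure 4.

    img          : H x W x 3 float RGB image in [0, 1]
    train_imgs   : auto enhanced training images
    train_params : (lam, a, dT, dh) chosen by the user per training image
    alpha        : learned distance metric weights
    """
    lin = np.power(img, GAMMA)                   # 1) gamma transform (assumed direction)
    enhanced, _ = auto_enhance(lin)              # 2) auto enhance
    nearest = min(range(len(train_imgs)),        # 3) closest training image
                  key=lambda j: distance(enhanced, train_imgs[j], alpha))
    lam, a, dt, dh = train_params[nearest]       # 4) its user-chosen parameters
    out = apply_s_curve(enhanced, a=a, lam=lam)  # 5) apply them
    out = adjust_temp_tint(np.round(out * 255).astype(np.uint8), dt, dh) / 255.0
    return np.power(out, 1.0 / GAMMA)            # 6) invert step 1
```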

VI. RESULTS

A. Testing Procedure

To test the efficacy of our method, we took 10 test images from the dataset, each with varying contexts (faces, landscapes, etc.). We enhanced these images three different ways: using our personalized enhancement method, the Google Photos auto enhancement feature, and Photoshop's auto contrast and color correction tools. We then performed one-on-one comparisons between the personalized images and the original images, the personalized images and the Google enhanced images, and the personalized images and the Photoshop enhanced images. To perform the comparisons, we created another GUI in MATLAB, which can be seen in figure 5. Note that there is an option for "No Preference," as some pairs of images may be indiscernible from one another.

Fig. 5: Testing GUI implemented in MATLAB

B. User Study

The results of the 8 user studies can be seen in table II. A little more than 50% of the time, users preferred the personalized images to the original images; they had no preference about 20% of the time, and preferred the original images about 25% of the time. These results are very reasonable, as more often than not, some type of enhancement is better than no enhancement at all. When compared to Google Photos and Photoshop, our method also yielded reasonable results, as users preferred personalized images about 30% of the time, had no preference about 25% of the time, and preferred the professional enhancement tools about 45% of the time. Since we are only modifying four parameters, our method is limited in how effectively it can enhance an image. We expect that the auto enhancement features of Google Photos and Photoshop both modify many other parameters, which could be one reason why those photos were preferred a majority of the time. Another source of error could come from the fact that the learned distance metric is imperfect. For example, if, according to our distance metric, a test image of a face is closer to a training image of a rural landscape than to a training image of a face, the parameters associated with the training image of the rural landscape would be applied to the test image of the face, leading to undesirable enhancement results. We also noticed that people who preferred more artistic looking photos (like, for instance, a photo put through an Instagram filter) liked the personalized results better than the professionally auto enhanced results. This observation also makes sense, as the professional auto enhancement algorithms are most likely trying to keep images looking as natural as possible. Overall, we were pleased with the results.

Personalization vs. Original
  Preferred Personalization   54.3%
  No Preference               18.6%
  Preferred Original          27.1%

Personalization vs. Google Photos
  Preferred Personalization   28.6%
  No Preference               21.4%
  Preferred Google Photos     50.0%

Personalization vs. Photoshop
  Preferred Personalization   30.0%
  No Preference               30.0%
  Preferred Photoshop         40.0%

TABLE II: User Study Results

VII. CONCLUSIONS & FUTURE WORK

In this project, we implement an end-to-end pipeline to learn user preferences and enhance images in a personalized way. The five major components of this project are: computing a distance metric, finding a training set that maximally represents the dataset, finding an optimal parameter set for each training image, training, and finally, enhancing the images. The efficiency of this approach lies in the fact that almost all of the processing can be done offline, so the user is involved only in a short training phase. To test the validity of this method, we carried out user studies, and the fact that our method was preferred over professional software for some images shows the potential of this approach.

Due to limited time and computational resources, we decided to work with a smaller dataset of images and to learn fewer parameters. However, we feel that a larger dataset of images could yield a more accurate distance metric, which would in turn yield a training set that is more representative of the dataset. Also, applying more enhancements, and thus learning more parameters, could help achieve better personalization. Lastly, exploring different ways of learning distance metrics and finding training images that hold maximum information could go a long way in making this pipeline more effective.

ACKNOWLEDGMENT

We would like to thank Prof. Gordon Wetzstein and the TAs of EE 368 for guiding us through the course of this project and for providing the necessary resources.

REFERENCES

[1] Liad Kaufman, Dani Lischinski, and Michael Werman. Content-aware automatic photo enhancement. In Computer Graphics Forum, volume 31, pages 2528–2540. Wiley Online Library, 2012.

[2] Turgay Celik and Tardi Tjahjadi. Automatic image equalization and contrast enhancement using Gaussian mixture modeling. IEEE Transactions on Image Processing, 21(1):145–156, 2012.

[3] Kevin Dale, Micah K Johnson, Kalyan Sunkavalli, Wojciech Matusik, and Hanspeter Pfister. Image restoration using online photo collections. In 2009 IEEE 12th International Conference on Computer Vision, pages 2217–2224. IEEE, 2009.

[4] Sing Bing Kang, Ashish Kapoor, and Dani Lischinski. Personalization of image enhancement. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 1799–1806. IEEE, 2010.

[5] LabelMe, the open annotation tool. http://labelme.csail.mit.edu/Release3.0/.

[6] Andreas Krause, Ajit Singh, and Carlos Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 9(Feb):235–284, 2008.

[7] Thomas Cover and Joy Thomas. Elements of Information Theory. Wiley, New York, 2006.