Blur perception: An evaluation of focus measures

Blur perception:An evaluation of focus measures

Robert Thomas Shilston

Department of Computer Science, UCL

September 2011

I, Robert Thomas Shilston confirm that the work presented in this thesis is my own.

Where information has been derived from other sources, I confirm that this has been

indicated in the thesis.

1

Abstract

Since the middle of the 20th century the technological development of conven-

tional photographic cameras has taken advantage of the advances in electronics and

signal processing. One specific area that has benefited from these developments is

that of auto-focus, the ability for a cameras optical arrangement to be altered so

as to ensure the subject of the scene is in focus. However, whilst the precise focus

point can be known for a single point in a scene, the method for selecting a best

focus for the entire scene is an unsolved problem. Many focus algorithms have been

proposed and compared, though no overall comparison between all algorithms has

been made, nor have the results been compared with human observers.

This work describes a methodology that was developed to benchmark focus algo-

rithms against human results. Experiments that capture quantitative metrics about

human observers were developed and conducted with a large set of observers on

a diverse range of equipment. From these experiments, it was found that humans

were highly consensual in their experimental responses. The human results were

then used as a benchmark, against which equivalent experiments were performed by

each of the candidate focus algorithms.

A second set of experiments, conducted in a controlled environment, captured the

underlying human psychophysical blur discrimination thresholds in natural scenes.

The resultant thresholds were then characterised and compared against equivalent

discrimination thresholds obtained by using the candidate focus algorithms as au-

tomated observers. The results of this comparison and how this should guide the

selection of an auto-focus algorithm are discussed, with comment being passed on

how focus algorithms may need to change to cope with future imaging techniques.

2

Forward

As part of my undergraduate studies, I invented a new focus measure. This

was refined and implemented in hardware such that it was used to focus a video

camera, and appeared to outperform the autofocus function built into the camera.

My novel auto focusing algorithm was duly patented [1] and licensed to British

Telecom PLC. However, my curiosity was not sated, and the question remained in

my mind as to whether this new method was ‘better’ than existing methods, or

even what constituted ‘better’ when considering focus. This thesis is the result of

my exploration into focus.

I’d like to extend sincere thanks to Fred Stentiford and Steven Dakin for their

advice, and also to my friends, family, and especially my wife for their support and

patience.

3

Contents

1 Introduction 15

1.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.2 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.3 Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2 Review of literature 19

2.1 Human vision mechanisms . . . . . . . . . . . . . . . . . . . . . . . . 19

2.1.1 Physical mechanism for focusing the eye . . . . . . . . . . . . 20

2.1.2 Source of the control signals . . . . . . . . . . . . . . . . . . . 21

2.1.3 Neurological pathways . . . . . . . . . . . . . . . . . . . . . . 26

2.1.4 The 2Hz oscillations . . . . . . . . . . . . . . . . . . . . . . . 26

2.1.5 Models of accommodation . . . . . . . . . . . . . . . . . . . . 27

2.2 Visual psychophysics . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.2.1 Early psychometric experiments . . . . . . . . . . . . . . . . . 28

2.2.2 Preliminary investigations of blur . . . . . . . . . . . . . . . . 29

2.2.3 Quantifying blur discrimination . . . . . . . . . . . . . . . . . 32

2.2.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.2.5 Natural scenes and the human visual system . . . . . . . . . . 35

2.2.6 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2.3 Focus measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2.3.1 Timing and performance . . . . . . . . . . . . . . . . . . . . . 46

2.3.2 Getting the images . . . . . . . . . . . . . . . . . . . . . . . . 46

2.3.3 The nature of blur . . . . . . . . . . . . . . . . . . . . . . . . 47

2.3.4 Establishing the ground truth . . . . . . . . . . . . . . . . . . 48

2.3.5 Quantitative comparisons . . . . . . . . . . . . . . . . . . . . 48

2.4 Cameras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

2.4.1 Auto focus in conventional cameras . . . . . . . . . . . . . . . 55

2.4.2 Other devices . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

2.4.3 Recent developments . . . . . . . . . . . . . . . . . . . . . . . 56

4

2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

2.7 Thesis statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

2.8 Key research questions . . . . . . . . . . . . . . . . . . . . . . . . . . 64

3 Methodology 65

3.1 Image selection and acquisition . . . . . . . . . . . . . . . . . . . . . 65

3.1.1 Capture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

3.1.2 Scene . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

3.1.3 Other sources . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

3.2 Mathematical focus measures . . . . . . . . . . . . . . . . . . . . . . 70

3.3 Colour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

3.4 Psychophysical assessment . . . . . . . . . . . . . . . . . . . . . . . . 74

3.4.1 Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

3.4.2 Stimuli presentation . . . . . . . . . . . . . . . . . . . . . . . 74

3.4.3 Presentation duration . . . . . . . . . . . . . . . . . . . . . . . 75

3.4.4 Screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

3.4.5 Screen position . . . . . . . . . . . . . . . . . . . . . . . . . . 76

3.4.6 Viewing distance . . . . . . . . . . . . . . . . . . . . . . . . . 76

3.4.7 Viewing method . . . . . . . . . . . . . . . . . . . . . . . . . . 76

3.4.8 Observers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

3.4.9 Number of trials and analysis of responses . . . . . . . . . . . 77

3.4.10 Preparing the stimuli . . . . . . . . . . . . . . . . . . . . . . . 78

3.4.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

3.5 Mathematical tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

3.5.1 QUEST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

3.5.2 Computation and manipulation of the slope parameter . . . . 83

3.5.3 Scoring focus measures . . . . . . . . . . . . . . . . . . . . . . 83

3.6 Data protection and ethics . . . . . . . . . . . . . . . . . . . . . . . . 84

4 Subjective experiments 85

4.1 Ground truth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

4.1.1 Desktop application . . . . . . . . . . . . . . . . . . . . . . . . 87

4.1.2 Web application . . . . . . . . . . . . . . . . . . . . . . . . . . 88

4.1.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

4.1.4 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

4.1.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

4.2 Building a focus curve . . . . . . . . . . . . . . . . . . . . . . . . . . 99

5

4.2.1 Time to re-order . . . . . . . . . . . . . . . . . . . . . . . . . 100

4.2.2 Image viewing frequency . . . . . . . . . . . . . . . . . . . . . 101

4.2.3 Time spent on each image . . . . . . . . . . . . . . . . . . . . 102

4.2.4 Blur equivalence . . . . . . . . . . . . . . . . . . . . . . . . . 102

4.2.5 Half-way focused . . . . . . . . . . . . . . . . . . . . . . . . . 105

4.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

5 Performance of focus measures 109

5.1 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

5.2 Quantitative scoring . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

5.3 Shape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

5.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

6 Psychophysical experiments 125

6.1 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

6.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

6.3 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

6.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

7 Model observers 139

7.1 Stimuli and presentation . . . . . . . . . . . . . . . . . . . . . . . . . 139

7.2 Preliminary results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

7.3 Decision noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

7.4 The impact of noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

7.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

7.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

8 Conclusions and future work 150

8.1 Review of key research questions . . . . . . . . . . . . . . . . . . . . 153

8.2 Significant findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

8.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

Bibliography 180

Appendices 182

A Example photographs 182

B Preliminary experiments 185

6

C New focus measures 188

D Hardware and software 190

D.1 Digita and the Kodak DC290 . . . . . . . . . . . . . . . . . . . . . . 190

D.2 Sony EVI-D30 camera . . . . . . . . . . . . . . . . . . . . . . . . . . 193

E Scoring the focus measures 194

F Full ANOVA results for ‘best’ experiment 201

G Full results for model observers 208

H Software implementation 217

H.1 absolutegradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

H.2 absolutevariation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219

H.3 alphaadult . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220

H.4 alphaEquals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

H.5 alphaimageensemble . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

H.6 alpharedonion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

H.7 autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

H.8 brennergradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225

H.9 chernfft . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

H.10 cpbd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

H.11 cranepeak . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

H.12 cranesum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

H.13 dopowerplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239

H.14 energylaplace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

H.15 energylaplace5a . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242

H.16 energylaplace5b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

H.17 energylaplace5c . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244

H.18 entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245

H.19 groenvariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

H.20 histogramentropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247

H.21 hlv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248

H.22 imagepower . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249

H.23 jnbm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251

H.24 laplace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

H.25 masgrn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

H.26 menmay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260

7

H.27 normalizedgroenvariance . . . . . . . . . . . . . . . . . . . . . . . . . 261

H.28 phasecongruence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262

H.29 phasecongruence2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263

H.30 randomnumber . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264

H.31 range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265

H.32 rawlaplace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266

H.33 rmscontrast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267

H.34 scorefocusmeasure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

H.35 setalpha . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271

H.36 smd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273

H.37 sml . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274

H.38 squaredgradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276

H.39 standarddeviationbasedautocorrelation . . . . . . . . . . . . . . . . . 277

H.40 tenengrad . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278

H.41 thresholdedabsolutegradient . . . . . . . . . . . . . . . . . . . . . . . 280

H.42 thresholdedcontent . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281

H.43 thresholdedpixelcount . . . . . . . . . . . . . . . . . . . . . . . . . . . 283

H.44 triakis11s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285

H.45 va . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

H.46 voll4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289

H.47 voll5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290

H.48 waveletw1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291

H.49 waveletw2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292

H.50 waveletw3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294

8

List of Figures

2.1 Accommodation in the eye . . . . . . . . . . . . . . . . . . . . . . . . 20

2.2 Diagram of chromatic aberration . . . . . . . . . . . . . . . . . . . . 22

2.3 Chromatic aberration in a photo of a block of flats . . . . . . . . . . 23

2.4 Ten responses with five initial errors from a typical experiment . . . . 25

2.5 Toates’ negative feedback control system . . . . . . . . . . . . . . . . 28

2.6 Just-noticeable frequency thresholds . . . . . . . . . . . . . . . . . . . 29

2.7 Blur discrimination thresholds . . . . . . . . . . . . . . . . . . . . . . 30

2.8 Zero-crossing used to explain basic illusions . . . . . . . . . . . . . . . 31

2.9 Just-noticeable amplitude of focus oscillations . . . . . . . . . . . . . 33

2.10 Blur discrimination with motion . . . . . . . . . . . . . . . . . . . . . 34

2.11 Contrast sensitivity varies with spatial frequency . . . . . . . . . . . . 36

2.12 Spatial frequency spectra for natural images . . . . . . . . . . . . . . 37

2.13 Comparison of image frequency spectra . . . . . . . . . . . . . . . . . 38

2.14 Impact of edge density . . . . . . . . . . . . . . . . . . . . . . . . . . 39

2.15 Simultaneous blur contrast . . . . . . . . . . . . . . . . . . . . . . . . 41

2.16 Attributes used for quantitative comparison . . . . . . . . . . . . . . 50

2.17 Compressive imaging example . . . . . . . . . . . . . . . . . . . . . . 57

2.18 Fujifilm’s 4th generation Super CCD HR . . . . . . . . . . . . . . . . 57

2.19 Sample photograph from a plenoptic camera . . . . . . . . . . . . . . 58

2.20 Synthetic aperture photograph . . . . . . . . . . . . . . . . . . . . . . 59

2.21 Depth from focus example . . . . . . . . . . . . . . . . . . . . . . . . 59

3.1 Cameras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

3.2 Fresh flowers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

3.3 Povray bolt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

3.4 Van Hateren images . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

3.5 Convergence of a QUEST trial . . . . . . . . . . . . . . . . . . . . . . 82

4.1 Simple lens arrangement . . . . . . . . . . . . . . . . . . . . . . . . . 86

4.2 Griffin PowerMate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

9

4.3 ‘Best’ experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

4.4 ‘Best’ scenes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

4.5 Observer’s journey . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

4.6 Age of participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

4.7 Questionnaire results . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

4.8 Selection of ‘best’ image . . . . . . . . . . . . . . . . . . . . . . . . . 95

4.9 Gender comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

4.10 Popularity of images . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

4.11 Distribution of viewing times . . . . . . . . . . . . . . . . . . . . . . 103

4.12 Distribution of viewing times . . . . . . . . . . . . . . . . . . . . . . 103

4.13 ‘Equivalent’ experiment . . . . . . . . . . . . . . . . . . . . . . . . . 104

4.14 Equivalent blur . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

4.15 Halfway positions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

4.16 ‘Halfway’ experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

4.17 Equivalent blur . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5.1 Selected focus measures . . . . . . . . . . . . . . . . . . . . . . . . . 110

5.2 Selection of ‘best’ image by focus measures . . . . . . . . . . . . . . . 111

5.3 Focus measures struggle with bolt scene . . . . . . . . . . . . . . . . 113

5.4 Bolt at different distances . . . . . . . . . . . . . . . . . . . . . . . . 115

5.5 Comparison of focus measures (1) . . . . . . . . . . . . . . . . . . . . 116



6.1 Previous results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

6.2 Sample stimuli (00005) . . . . . . . . . . . . . . . . . . . . . . . . . . 130

6.3 Sample stimuli (01342) . . . . . . . . . . . . . . . . . . . . . . . . . . 131

6.4 Raw psychophysical results (01342) . . . . . . . . . . . . . . . . . . . 133

6.5 Raw psychophysical results (00005) . . . . . . . . . . . . . . . . . . . 134

6.6 Overall psychophysical results . . . . . . . . . . . . . . . . . . . . . . 137

7.1 Discrimination curves for measures without noise . . . . . . . . . . . 141

7.2 Effect of decision noise . . . . . . . . . . . . . . . . . . . . . . . . . . 144

7.3 Discrimination varies with noise . . . . . . . . . . . . . . . . . . . . . 147

A.1 Land and Water . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

A.2 Village wall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

B.1 Comparison of monocular and binocular discrimination . . . . . . . . 186

B.2 Comparison of monocular with average binocular discrimination . . . 186

10

G.1 Most similar discrimination curves for all measures (1-1) . . . . . . . 209








11

List of Acronyms and

Abbreviations

AE auto-exposure

AF auto-focus

2AFC two-alternative forced choice

ASIC application-specific integrated circuit

AWGN additive white gaussian noise

CCD charge-coupled device – An image sensor

CIE International Commission on Illumination (Commission internationale de

l’clairage)

CIEDE CIE difference equation

CSV comma separated values

D Dioptres

DAC digital to analogue converter

DTD document type definition

EW Edinger-Westphal

FFT fast Fourier transform

GPU graphical processing unit

GUI graphical user interface

HDR high dynamic range

12

HSV hue, saturation, value

HVS human visual system

MPEG Motion Pictures Experts Group

MSE Mean square error

NTSC National Television System Committee, and name of the analogue

television system used predominantly in North America

PAL phase alternate line, an analogue television system used in the UK and many

other countries

PDF probability distribution function

PSF point spread function

PTP picture transfer protocol

RGB red, green, blue

RMS root mean square

SLR single-lens reflex

VA Visual Attention

VEP visual evoked potential

XML extensible markup language

13

Glossary

Amblyopia is a developmental anomaly of spatial vision, which is characterised by

reduced visual acuity and reduced contrast sensitivity, colloquially known as

a “lazy eye”.

Emmetrope is a perfectly sighted individual

Hypermetropia long-sighted

Myopia short-sighted

Pedestal condition is the magnitude of the stimulus for which a psychophysi-

cal threshold is being determined. Typically the threshold is determined for

multiple pedestal conditions.

Plenoptic camera is one which uses a microlens array to capture 3D light field

information about a scene. The microlens array sits between the lens of the

camera and the image sensor. It refocuses light onto the image sensor to

create many small images taken from slightly different viewpoints, which are

manipulated by software to extract depth information.

Presbyopia is the condition where the eye exhibits a progressively diminished abil-

ity to focus with age

14

Chapter 1

Introduction

Every day, countless millions of photographs are taken around the world. The pro-

cess of taking photos has changed over the past decade from using the relatively

expensive silver-halide based approach to digital capture, with cameras being em-

bedded into ever more devices. This change to digital capturing, where there is

no per-image cost and the ability to instantly review images, is enabling and en-

couraging people to take more and more photos. Increased processing power means

advanced features can be incorporated into cameras, such as cameras that wait for

people to smile before automatically taking a photo. Beyond domestic photography,

improvements in storage, communication and processing technologies have enabled

growth in automated imaging applications, ranging from security to quality control

in manufacturing processes.

Despite this rapid growth and development of additional features, there remains

a fundamental aspect of imaging that merits investigation. To capture a photo, light

emanating from the scene goes through one or more lenses to form an image. The

lenses need to be arranged such that the image formed on the camera’s sensor is in

focus, a requirement that is just as important in human vision as it is in a camera.

Numerous focus measures have been proposed, each of which assigns a score to an

image such that the image with the highest score is the image that is best focussed.

This information is used to help cameras adjust their optics to ensure that their

image is in focus. However, a comprehensive review of these measures comparing

their performance with the judgements made by humans has not been conducted.

It is possible to establish the precise configuration of lenses required to ensure

light rays at a particular point in a scene are focussed. However, the method for

selecting a best overall focus for an entire scene is an unsolved problem. This work

starts from the assumption that a camera (or other imaging device) can produce

images of a given scene at a range of focal distances, and explores how these images

15

can be compared to determine which is the ‘best’ for the entire scene. Such an ap-

proach is independent of the optics of the camera, and other device-specific artifacts

arising from image acquisition.

1.1 Objectives

This thesis proposes that there is an algorithm that can reproduce the results ob-

tained from human perceptual and subjective experiments. To explore this state-

ment, several areas are examined. Firstly, existing comparisons of focus measures do

not have a robust approach to establishing the ground truth as to which candidate

image is best focussed. Secondly, previous comparisons have not been comprehensive

– no overall comparison between all algorithms has been made, nor have the results

been compared with human observers. Given the complexities of real-life scenes,

an approach for identifying the truth needs to be established, and used to explore

whether human observers are consensual in their opinion. Beyond identifying the

‘best’ image, other ways of characterising human blur opinions and perception need

to be explored and performed.

The various focus measures found in the literature and developed during the

course of this work then need to be compared with the results obtained from hu-

man experiments, to establish which measure is best able to reproduce the human

results. To permit such a comparison, it might be necessary to consider how the

methodologies need adapting to suit the differences in behaviour between human

and model observers.

To narrow the scope of this work, sample images used will be normal scenes –

scenes that the man-in-the-street might want to photograph. Skilled photographers

who make clever use of depth-of-field or bokeh1 are well able to configure their

camera to achieve the artistic effect they desire, and so are unlikely to need help with

focus. However, most camera owners do not have such a level of skill, and for them it

is important that their camera tries to help them take the photo they want. Given

this attention on normal photographs, the observers used for establishing human

results should be normal people who are neither highly skilled photographers nor

with experience of psychophysics.

1The result of the camera’s optics on out-of-focus points of light

16

1.2 Contribution

A comprehensive account of the state of existing focus measures is provided in the

literature review, and software implementations for all focus measures encountered

during this review are included in the appendix for future research to build upon. For

the first time, a full comparison of all focus measures in the literature is performed

both with a single scene and assessing the overall performance across a number of

scenes. To assist with future focus measure assessment (and related studies), the

image library developed during this work has also been made available.

The results from the experiments described in this thesis increase our understand-

ing of human opinions of image blur. Two key results from human experiments are

reported: Firstly, it is shown that when humans are asked to select the best focused

image of a scene, they are highly consensual, and there appears to be no subset of

the population with a significant difference of opinion.

Secondly, blur discrimination thresholds have been measured in natural scenes,

and these are found to be of a ‘dipper’ shape of similar magnitude to the discrimi-

nation thresholds measured in synthetic images.

Finally, a methodology for assessing focus measures is proposed. Using this

methodology shows that few focus measures are able to reproduce human blur per-

ception characteristics. This suggests that none of the focus measures reviewed in

this work are the sole method used by the human vision system.

1.3 Structure

The structure of the thesis is as follows: Chapter 2 presents a review of literature

relating to the human visual system, models of focus and perception, and recent de-

velopments in image acquisition technology. Chapter 3 provides more details about

the selection and preparation of sample images, and the experimental methodologies

deployed.

Chapter 4 describes and discusses experiments that capture quantitative metrics

about human observers that were conducted with a large population of observers on

a diverse range of equipment. These human results were then used as a benchmark,

against which equivalent experiments were performed by each of the candidate focus

algorithms.

A second set of experiments are described in Chapter 6, which were conducted

in a controlled environment to capture the underlying human psychophysical blur

discrimination thresholds in natural scenes. The resultant thresholds were then

characterised and compared in Chapter 7 against equivalent discrimination thresh-

17

olds obtained by using the candidate focus algorithms as automated observers. The

results of this comparison and how this should guide the selection of an auto-focus

algorithm are discussed, with comment being passed on how focus algorithms may

need to change to cope with future imaging techniques.

In Chapter 8, a summary of all the results is presented and compared with

the thesis objectives. Significant findings are presented, and an indication of the

direction of future work is provided.

Finally, the appendix includes software implementations of the focus measures

used throughout this work, providing a solid base from which this work can be

continued.

18

Chapter 2

Review of literature

This section introduces material that sets the context to, and underpins the theory

and history of the work that has been carried out in a variety of areas that are used

through this thesis, as well as providing an overview of the state-of-the-art of related

fields. It leads on to a discussion, conclusions and direction for the research that

follows.

2.1 Human vision mechanisms

Many living species have the ability to see, and their various vision structures first

evolved during the Cambrian period. The human vision system is very good for our

needs, but is not a perfect seeing-device. Other species can do far better – owls,

for example, have greater visual acuity, and bees can see ultra-violet light which is

invisible to ourselves.

Debate about how the eye worked started with Plato, in the fourth century BC

who wrote that light emanated from the eye, though Aristotle advocated the oppo-

site, that the eye receives rays of light. In the second century AD, Galen identified

many parts of the eye, including the retina, cornea and iris, though concluded that

the crystalline lens was the principal instrument of vision, based on the evidence of

cataracts – where clouding of the lens reduces image clarity. However, real progress

into the understanding of the eye only started during the 16th century, leading to

Kepler proposing the theory of a retinal image in 1604. More recently, most no-

table were the works by Young, Helmholtz and Maxwell exploring how the retina

functions, and which culminated in the trichromatic theory of colour vision [2, 3].

19

2.1.1 Physical mechanism for focusing the eye

The biological term for the way the eye focuses is accommodation, but how the eye

actually does this was only resolved in the mid 19th century. It was the suggestion

by Helmholtz in 1856 that the lens is elastic and held under tension by various

muscles (the ciliary muscles) which became the accepted theory of accommodation

[4]. This is shown in Figure 2.1 which demonstrates how the lens changes shape to

focus the image on the retina. Table 2.1 lists key individuals and the theories of

accommodation they proposed.

Proponent Focusing mechanismHome (1795) Cornea curvature changesMagendie (1816) Eye has universal focusBurow (1841) Lens moves back and forthDonders (1846) Pupil changes in sizeListing (1853) Eye ball elongatesHelmholtz (1856) Lens changes shape

Table 2.1: Proposed accommodation methods (adapted from [4])

Accommodation is present in a wide range of animals, each of which has a differ-

ent accommodative range, as required by their various habitats and behaviour. This

is summarised in Table 2.2. Interestingly, the cat has proven most challenging to

assess, “with studies reporting amplitudes between less than 2 Dioptres (D), and as

much as 12D ... some have doubted that cats accommodate at all. ... One obvious

possibility is that there is a genetic variation, with in-bred house cats having poorer

accommodation than animals that have to catch mice to stay alive!” [6].

Figure 2.1: This diagram shows how the lens changes shape to ensure that the image isfocussed on the retina (adapted from [5])

20

Animal Accommodative powerPigeon 9DChicken 17DMacaques (young) 18DMarmosets 20DHumans (young) 14DCat <2D to 12D

Table 2.2: Accommodative power varies between species (adapted from [6]), which alsonotes that the accommodative range is very large in animals that need to see in both airand water, such as the sea otter and the cormorant.

2.1.2 Source of the control signals

The mechanisms above solely concern the mechanics of how to focus; that is, how

the eye’s structure adapts to ensure that the image formed on the retina is in focus.

The actual signals that control the ciliary muscles were not considered for nearly

a century. Pertinent work exploring these signals is now introduced, considering in

turn the fundamental sources of information that the eye has available:

Mechanism 1: Convergence

The eye uses several cues for driving the ciliary muscles, the simplest of which

is to use the information available from the convergence mechanisms in binocular

vision [7]. The closer the object is to the eye, the greater the eyes must be oriented

towards the nose, and hence by determining the amount of convergence the brain

can send an appropriate signal to the ciliary muscles. However, this is clearly not

the sole means by which the eye focuses, as it is entirely possible for a person to

focus on an object even if they cover one eye.

A widely cited study showed that the human eyes are consensual (ie the left

eye does not focus differently to the right) when presented with unequal accom-

modation demands to each eye, by the use of targets at different distances [8].

Along these lines, Flitcroft (1992) presented subjects with a series of dynamic aniso-

accommodative stimuli and observed that the response was equal in both eyes, and

was a compromise between the inputs to the two eyes, with no evidence of a random

alternation of eye dominance [9]. However, work in 1998 showed that in extreme

circumstances, the eyes can focus differently – Marran measured “an average 0.75D

aniso-accommodative response [for a] 3.0D aniso-accommodative stimulus” [10].

21

Figure 2.2: Chromatic aberration occurs when light of different wavelengths is refractedby a lens

Mechanism 2: Chromatic aberration

Another cue used by the visual system is chromatic aberration [11]. Chromatic

aberration is the result of light of different wavelengths refracting by slightly differ-

ent extents (see Figure 2.2). In photographs this exhibits itself as coloured fringes

at the edges of objects that are not in perfect focus. For example Figure 2.3 ex-

hibits noticeable chromatic aberration where blue and red fringes are evident. It is,

of course, undesirable for photographic images to exhibit such aberration, and its

impact can be greatly reduced by the use of an achromatic doublet lens invented by

Hall and Dolland early in the 18th century [12].

The eye, however, could make use of the chromatic aberration. If the eye is

focused beyond the object (hypermetropia), then the object will have a red edge.

Whereas, if the object is behind the eye’s focus position (myopia), then the object

will have blue fringes. Recent work has indeed shown that this chromatic aberration

can be used to provide an odd function signal1 for ciliary control [13]. The use

importance of an odd-functioned signal is discussed in Section 2.1.5.

As with convergence, it is possible to establish whether this is used as a focusing

cue by investigating whether focusing can be achieved when the cue is absent. If

a scene is illuminated with monochromatic light, then no chromatic aberration will

occur, as there is only a single frequency of light. Experiments by Fincham with 55

human subjects under such conditions found that 19 of these were unable to focus,

and were conscious of a blurred image, 22 focused as well as when the object was

illuminated with white light, and the remaining group accommodated in the same

direction regardless of whether a positive or negative lens was inserted [7,15]. More

1An odd function is one where f(−x) = −f(x), whilst an even function behaves such thatf(x) = f(−x)

22

(a) Photograph of a block of flats (b) Detail shows chromatic aberration

Figure 2.3: A photo of a block of flats [14] which shows significant chromatic aberrationaround the window frames

recently, Chen showed that there is no significance between subjects who are short

sighted, and those with correct vision (emmetropes) when focusing in cue-absent

monochromatic conditions [16].

These results suggested that, at least for some subjects, broad spectrum illumi-

nation (and thus chromatic abberration) is critical in providing directional informa-

tion regarding the level of defocus. This is reinforced by Campbell’s work which

explored the minimum quantity of light required for humans to focus, concluding

“the mechanism of accommodation was found to take up a relatively fixed position

approximately 0.6D greater than the minimum refractive power of the eye when

the luminance of the test object was below the cone threshold for visibility. It is

concluded that the receptors involved in the accommodation reflex are the foveal

cones and that in the absence of a foveal stimulus the mechanism of accommodation

takes up a relatively fixed focus greater than the minimum refractive power of the

eye.” [17].

Mechanism 3: Minimising blur

Whilst it is relatively easy for a person to manually adjust a camera, telescope or

microscope etc to bring the object into focus, actually specifying what constitutes

in-focus is difficult. The dictionary defines focus as “that point or position at which

an object must be situated, in order that the image produced by the lens may be

clear and well-defined” [18], but this then leads to the challenge of the defining ‘clear’

or ‘well-defined’. One view is that in-focus means that the image has the least pos-

sible blur, but yet again, this poses the difficulty of defining the readily understood

concept of blur in a mathematically meaningful way. Pentland stated that ‘exact

23

focus’ means the point spread function has minimum variance [19], but this simply

anchors the definition to a mathematical function without any investigation to show

this is correct.

Just as a human will tend to hunt backwards and forwards whilst attempting

to focus a camera, gradually refining the focus position, the eye appears to perform

a similar hunting motion. In some fascinating work by Fender in the 1960s to

investigate the control system of the eye’s tracking mechanism, he observed that

a ‘hunting’ motion is superimposed on the accommodative mechanism, and that

this continually lengthening and shortening of the focal length will improve (or

worsen) the image and thereby provide information to control the accommodation

system [20].

Fender’s work was cited by Crane, who concluded that it was important to

determine exactly what measure of the image is fed back to the accommodation

system [15, p17]. Crane proceeds to propose that, as “the effect of blurring and

defocus are to reduce the high-frequency spatial components in an image”, an ap-

proach which involves the two-dimensional spatial derivative can reproduce many

experimental results. He looks at models based on both the summation of, and

the peak amplitude of the spatial derivative. Whilst the former does have good

performance at focusing, it is only the latter that, he claims, can provide fine-focus

control [15, p29].

Two decades later, and in marked contrast to the broadly gradient-based ap-

proaches that had been proposed to date, Morrone et al observed that ‘lines’ and

‘edges’ are the points in a waveform where the Fourier components are in phase

with each other [21]. Such a phase coherence approach is explored further: Kovesi

proposed a mechanism for determining when a maxima in phase coherence actu-

ally constitutes a feature [22], whilst Wang showed how blurring disrupts phase

coherence [23]2.

Section 2.3 looks at mathematical focus measures in more detail, whilst the

hunting motion is considered further in Section 2.1.4.

Troelstra and Stark both describe experiments wherein a target is randomly

moved in the absence of cues (using monocular vision, with no change in perceived

size or illumination, and no lateral movement). Both observed that the eye starts

to refocus in the wrong direction 50% of the time. Figure 2.4 shows that, when the

target is moved from its rest position to either the far or near position, the eye makes

an erroneous initial accommodative response half of the time [24, 25]. Stark argues

2An attempt was made to contact Wang et al to obtain details of their algorithm. Unfortunatelyno reply was received.

24

Figure 2.4: Ten responses with five initial errors from a typical experiment. The arrowsannotate where the initial accommodation was erroneous [24].

that this result shows there is not an odd-error component in the accommodative

signal, and that as blur is an even signal, it is likely that blur is the metric that is

used by the eye.

Philips and Stark compare the accommodative response when the target is

blurred (that is, the eye is looking at an out-of-focus image projected onto a screen)

with that when the eye is defocused, and observes the responses are equivalent. The

paper concludes that: “without minimizing the role of vergence-accommodation,

the rich optical and other clues to target distance, and the higher level ‘volitional’

and predictive control of accommodation in normal viewing, we can point to clear

experimental evidence for blur as the sufficient neurological stimulus to accommo-

dation” [26].

Finally, both Marshall and Mather show that blur plays a role in depth percep-

tion: Experiments with an ambiguous figure containing a blurred and sharp region

divided by a wavy line showed that the relative distance of the two regions was per-

ceived differently depending on whether the boundary was blurred or not [27,28].

Other mechanisms

Other work has shown that a change in apparent distance (that is, by varying angular

size of an object) can also cause trigger a change in accommodation [29], and that

subjects can use audio biofeedback to exert voluntary control of accommodation

to reduce myopia [30]. Both results demonstrate how versatile the accommodation

system is at using any available information.

25

2.1.3 Neurological pathways

A thorough overview of the neurological pathways involved in accommodation is

provided by Gamlin [31] who, in summary, describes how observations in the late

19th century led to the identification of the Edinger-Westphal (EW) nucleus, which

has been shown to be responsible for the accommodation in both mammals and

birds. This group of cells – containing 800-1200 neurons in primates – when sub-

jected to electrical microstimulation evokes ocular accommodation in primates with

a latency of around 75ms. Despite relatively few studies examining the inbound

accommodation-related signals into the EW, there are confirmed links from the cere-

bellum (eg [32]). This could be the neurological means by which audio-biofeedback

control of the accommodation system might work.

Hung et al assert that “the summated blur signals are transmitted through the

magnocellular layer of the lateral geniculate nucleus to arrive at area 17 of the visual

cortex. The summated cortical cell responses form a sensory blur signal”, though

provides no indication as to how that original blur signal is computed [33, p288].

Fylan presented preliminary results measuring visual evoked potentials when blur

is applied to image stimuli, suggesting that it might be possible to measure this

blur signal and understand image properties that cause perceptual blur, though no

followup work has been published since these results from 1998 [34].

2.1.4 The 2Hz oscillations

Many papers discuss the fluctuation in lens position that was first observed by Fender

and which has been measured to be around 0.1D at 2Hz. It has been proposed that

this might be used either to perform hunting, to ensure that the subject remains

in focus [15], or that it adds odd information to the otherwise even signal from

blur [25]. Crane also shows, in an appendix, that these vibrations might play a role

in increasing the perceived depth-of-field. However, none of these suggestions are

conclusive.

Analysis of the stability and root-locii3 of various proposed vision models predict

an instability at 0.45Hz, which is observable in human subjects, but do not predict

one at 2Hz. Gray showed that whilst the 0.5Hz fluctuation varied with pupil diam-

eter (and presumably thus posits that it is associated with the vision mechanism),

the high frequency did not. “Thus [Gray] concluded that the lower accommoda-

tive frequency peak which was predicted by the root locus analysis is most likely

3In addition to determining the stability of a system, the root locus can be used to design forthe natural frequency of a feedback system (see, for example, [35])

26

associated with neurologically-controlled feedback instability oscillations, whereas

the high frequency peak is an epiphenomenon due to the effect of arterial pulse on

lens motion that is detected by the recording optometer” [33, p321]. It had been

earlier shown that the 2Hz fluctuations are “significantly correlated with arterial

pulse frequency” [36], though Charman suggests that thay may arise because of the

mechanical and elastic characteristics of the lens, zonule and ciliary body [37].

However, these fluctuations remain an area of intrigue: Judge and Flitcroft,

whilst considering this observed correlation with pulse note “If this is so, it is puz-

zling that macaques, which have considerable higher pulse rates than humans, have

high frequency fluctuations of a similar frequency to humans. Indeed, fluctuations

in humans at 2Hz would imply a heart rate within the definition of clinical tachy-

cardia.” [6].

That this source of the 2Hz signal is not certain does not, of course, mean that

the visual system is not making use of such fluctuations in lens position for some

purpose, whether its cause be deliberate or some artefact of other biomechanical

activities.

2.1.5 Models of accommodation

Despite the wide range in apparent inputs to the accommodation mechanism, the

identification of the neurons which control the ciliary muscles and the pyschophysical

response to stimuli, the control system in the brain is not well understood.

Fender’s exploration of the eye from the point of view of systems analysis, con-

strained itself to the systems associated with the control system that the eye uses

when tracking a target, and whilst touching on accommodation did not itself suggest

how it should be modelled [20]. The systems analysis approach is further pursued by

Toates who begins “the nervous control of the ciliary muscle, a subject of secondary

interest in previous papers, is central to [the proposed model]” [38]. He proposed

a model in the form of a classic negative feedback proportional control system (see

Fig 2.5). This model, as with other negative feedback control systems, relies upon

an odd function to assess blur, and Toates uses Stark’s 1965 function [39]. Stark’s

function is a linear response, but with limits which impose a constant output once

the input exceeds some threshold.

Further work by Stark (1975) used a computer simulation of Toates’ proportional

control model, and demonstrated that it exhibited unstable rather than smooth

responses to step stimuli, and proposed that a model based on a leaky integrator,

rather than a proportional controller, would improve it [33, p300].

A useful review of the progress made in model development over the last quarter

27

Figure 2.5: Toates’ negative feedback control system [38] uses Stark’s measure of blur

of the 20th century, discussing their respective strengths and weaknesses is provided

by Eadie and Carlin [40] and Khosroyani and Hung [41]. Khosroyani and Hung intro-

duce a dual-mode model which connects the vergence and accommodation systems

and can be used to explain the difference in behaviour for slow and fast ramping

inputs. However, no reference is made to the source of the ‘blur’ signal, beyond

acknowledging that it is important: “the vergence system acts to produce foveal

registration and the accommodation system acts to reduce retinal blur” [41].

2.2 Visual psychophysics

Independent of the low-level neural and physical analysis of accommodation and

its control, a separate body of work has looked at the psychophysics of blur; the

study of the relationship between the perception of blur and actual blur present in

the stimuli. A number of intertwining research themes have been pursued over the

past decades, and are introduced approximately chronologically. These consider the

nature of blur, sensitivity to contrast and frequency, as well as examining whether

the human visual system is optimised for the statistical structure present in natural

images.

Psychophysical experiments exploring the sensitivity to a wide range of physi-

cal stimuli, not just within vision science, tend to produce results that indicate a

common perceptual response – that is, at low stimulus levels, the just-noticeable-

difference decreases as the stimulus increases, then rises once beyond a certain stim-

ulus magnitude.

2.2.1 Early psychometric experiments

The earliest literature relevant to the study of blur describes work done in the

late 1960s. Campbell quantitatively investigated the psychometric responses of the

human vision system to known stimulii, showing that in some experiments just-

28

CAMPBELL, NACHMIAS, AND JUKICES

0

.0

0O

Ea

Frequency ratio1.0 102 .04 I O 1.0. 110 .12

2.0 I I I I I I

1.5-

1.0

0*5 1*, e X

0 IC

0.0 -1I0 5 I - I i I

0.0 00.1 0.02 003 004 005

log Frequency ratio

0.95

09

008 0

0.7 g

06

05

04

FIG. 2. Psychometric function for frequency recognitionwith a lower frequency of 6.5 c/deg. Subject JJ.

the display, the subject's viewing distance was halvedto 1.5 m. To ensure that this change of viewing distanceand angular field size had no effect on the standarddeviation (SD) for matches obtained at higher fre-quencies, some spatial frequencies were tested at bothviewing distances, with subject FWC. In the region ofthe overlap, there is no significant difference in theresults and we can conclude that this change of viewingdistance and field size is not important for these higherspatial frequencies.

Figure 1 displays the SD of the frequency matches,expressed as a percentage of the standard frequency.Each point is based on twelve matches, except for thepoints between 1.3 and 3.1 c/deg for subject FWC,which are based on 24 matches. Visual inspection ofthese results does not show any systematic correlationbetween the SD obtained and the standard frequencyused. These results were subjected to the Bartlett testof the homogeneity of variance. The three sets of resultsshown in Fig. 1 were tested separately, as well as thepooled results from the two runs on subject JJ. In all

FIG. 3. The just-noticeable frequency ratio as a functionof the lower frequency. Subject JJ.

cases, the hypothesis of homogeneity could not berejected at the 0.05 level of confidence.

One major objection can be raised to this method ofusing the variability of matches as a measure of dis-crimination. The subject is permitted to take as muchtime as he wishes to make each match. Thus it is con-ceivable that he may take more time and care in makingmatches where discrimination is more difficult. In thismanner, any real variations of discriminability may becompensated by variations of the care taken by theobservers. Indeed, in another type of visual-thresholdmeasurement, just such a problem did arise."0 To over-come this potential artifact, we resorted to a single-trial procedure in which the duration of observation wasstandardized.

Part II: Frequency Recognition

In all of these experiments, a single oscilloscope wasused, upon which a grating of determined spatial fre-quency and contrast could be flashed for 0.63 sec every2.63 sec. In the interval between presentations, thescreen reverted to a uniform luminance equal to thespace-average luminance of the sinusoidal grating. Inany block of trials, a random sequence of just two dif-ferent gratings was presented. Great care was taken toensure that these gratings differed only in spatialfrequency and that there was no contamination by otherfactors, such as differences in contrast, mean luminance,or waveform. The task of the subject was to reportafter each presentation whether he perceived a high- ora low-frequency grating. If his response was incorrect, abell was sounded. A Lab. 8 computer (Digital Equip-ment Corp.) was used to generate the random sequence,sound the bell, record the responses, and calculate theresults. Before each block (100 or 200 trials), the sub-ject familiarized himself with the two gratings betweenwhich he had to discriminate.

Figure 2 shows typical results obtained by thispsychophysical method. Each point represents the per-formance obtained in a block of 100 trials. For all of thepoints, the lower frequency was fixed at 6.5 c/deg and

ratio. Figure 3 is a plot of this measure for eight lower

Vol. 60

(a) Original figure (b) Replotted

Figure 2.6: Results from Campbell showing the original reported just-noticeable frequencyratio as a function of the lower frequency [42], and the same data re-plotted using differentaxis to enable comparison with more recent work.

noticeable-difference of spatial frequency4 is constant in terms of the ratio of the

two frequencies being compared, unlike a dipper [42]. However, he commented that

this might have been an artefact of the experimental design – giving subjects an

unlimited period of time to adjust one frequency to match the stimulii might mean

that they achieved the same degree of accuracy in the end, but that it was harder

(and thus took longer) for certain frequencies.

To address this factor, a second experiment is described, which used a fixed time

interval for the stimulii presentation. It showed that discrimination accuracy im-

proved with the frequency ratio. Whilst not exhibiting a “dip”, this result conforms

with Weber’s law:

just noticeable difference

stimulus intensity= constant (2.1)

Whilst there is no evidence to suggest this is the case, it could be that a dipper

was present, but that by virtue of the experimental design or data processing, it

might be that the relevant pedestal points were not measured, or the results were

considered to be anomalous, and discarded.

2.2.2 Preliminary investigations of blur

During this time, vision-centric theoretical work was also progressing, built upon

quantitative results obtained in the late 1970s. In 1980, the first theory of edge

detection was proposed by Marr [43]. This suggested that edges are defined as

4A spatial frequency is one which is expressed as the number of cycles per degree of visual angle.Experiments examining performance at different spatial frequencies typically use a stimuli basedon a sinusoidal grating.

29

Fig. 3. Blur extent difference thresholds for three blurring functions as a function orcriterio~ blur extenl. The error bars represent 2 standard errors. The spaee constants employed to express blur extent are: standard deviation for the Gaussian; ramp extent for the rectangular; and half-wavelength for the cosinusoidal blur. The data for two subjects are shown (R.J.W. 0; C.C. x ). In each case the rising

portion of the functions corresponds to a power law with an exponent of 1.5.

and have theoretical implications that we shall consider below.

Thresholds for difference in blur extent, for two observers as a function of edge contrast are shown in Fig. 4. A Gaussian blurring function with an SD. of 2.5min. arc was employed for these measurements. The basic result is that threshold varies with contrast in a power law with an exponent of -0.5 (R.J.W., - 0.59; CC.. - 0.43).

LXsctcssion

We now attempt to identify those features of the stimulus that most closely govern performance.

(i) One potential cue that subjects may use when discriminating edge blur is the maximum rate of luminance change. Our data rule this out since it would predict that blur extent difference thresholds for criterion Gaussian blurred edges of SD 2.5 and 5.Omin arc with contrasts of 40 and 80%. respectively should be the same. Likewise, edges with SD of

?ISE

<

‘\, x G 5 01 1 ii! L I I I

IO 20 40 80

% Controst

Fig. 4. 7‘hrtzshok.i blur extent diKerence for a Gaussian hi&cd cdgc (SD = 1.5 min arc), as a function of contrast. The data Callow a power law with an exponent of -0.5. The error bar rcprcscnts two standard errors. The data for two

subject5 are shown (R.J.W. 0: C.C. x ).

2.5 and lO.Omin arc at contrasts of 20 and 80% should provide the same thresholds. In neither case is this true. More generally, the power law exponents for the effects of contrast and criterion blur extent should be equal and opposite, which they are not.

The point of maximum luminance rate of change is the zero-crossing in the second derivative, and the value of the maximum luminance rate of change is proportional to the gradient at the zero-crossing. It therefore follows that the gradient of zero-crossing is not the cue either.

{ii) A similar argument rules out any simple Fourier transform model of blur disc~mination. The data of Campbell er ul. (1970) provide a useful reference set. They found that spatial frequency difference thresholds were a fixed proportion of the criterion frequency (6%). Their data further suggest that contrast had little effect provided that the stimuli \tc‘rc ahosc contrast dctcction threshold. If this set of &IK~ provide a sound basis for al1 spatial dilation thresholds, then blur extent difference thresholds should also be 6% of criterion blur, and a power law with an exponent of 1.0 would be expected. Indeed if the cues available in different frequency bands could co-operate. by probability summation for example, then performance should be better than 6%. Our data generally show blur difference thresholds greater than 6%. and we obtain a power law with an exponent of 1 S. We are thus able to rule out a simple Fourier transform model.

(iii) Another contender model for blur perception is the range of filters reporting a zero-crossing (Marr and Hildreth. 1980). Without making strong asser- tions about the filters involved. it is difficult to test this hypothesis. however the following points are clear. In general all filters will report a zero-crossing for the sharp edges. and as the edge is blurred the amplitude of the response will be attenuated by a larger amount for the high frequency filters than for

the low frequency ones. The range of filters is not

Figure 2.7: Blur extent difference thresholds for three blurring functions as a function ofcriterion blur extent. The error bars represent 2 standard errors. The space constantsemployed to express blur extent are: standard deviation for the Gaussian; ramp extent forthe rectangular; and half-wavelength for the cosinusoidal blur. The data for two subjectsare shown (R.J.W. o; C.C. x ). In each case the rising portion of the functions correspondsto a power law with an exponent of 1.5. From [45]

zero-crossing points of the second derivative of intensity, and showed that the most

sensible means of finding these is by searching for the zero values of the convolution

∇2G∗I, where ∇ is the Laplacian, and G the Gaussian operators, and I is intensity.

Hamerly measured the blur detection threshold, and showed that this was lower

when both stimuli were blurred, than when one was unblurred [44]. However, the

amount of blurring used was not sufficient to reach the other side of the dip, and

encounter a Weber’s law type relationship – the pedestal blur was not increased

beyond 80 arc-seconds.

A few years later in 1983, a simple experiment using a step change in luminance

was conducted by Watt [45]. A 1D band of light with a variety of blur functions

applied was shown, and the observer indicated which of two stimulii had a broader

blur extent. Three blur functions were used: gaussian, rectangular and half-wave

cosinusoidal profiles were each applied to the the step change in luminance.

The primary finding from this work is that “blur comparison is most precise at

some non-zero criterion blur for each blurring function. In each case the data shows

a decrease in threshold as the criterion blur is increased from zero to an optimum

level, beyond which threshold rises rapidly” – see Figure 2.7, which shows that what

Hamerly perceived to be an increase in sensitivity, was simply a dip which ends

after approximately 3 arc-minutes of pedestal blur. Thus, a decade after a dipper

response was observed by Nachmias and Sansbury investigating contrast, the same

shape of response was found in blur discrimination.

In exploring the possible cues that the subjects might use, Watt ruled out δI

30

Figure 2.8: (a) A hypothetical double step stimulus and the second derivative (b) of theresultant retinal image. The visual system is able to resolve the stationary point at I andthereby perceive two separate steps. (c) A narrow double step stimulus and the secondderivative of the resultant retinal illumination (d). The visual system is not able to resolvethe stationary point which would be at I, and the Chevreul illusion results. (e) A rampedge stimulus and the resultant second derivative profile (f). The zero-valued stationarypoint at I causes a misrepresentation of the stimulus and Mach bands result. (Adaptedfrom [45]).

(maximum change in intensity), simple Fourier transform model and zero-crossing

of the second derivative. Instead, contradicting Marr’s theory, he showed that more

suitable primitives were stationary points (rather than zeros) in the second deriva-

tive corresponding to edges. As ever, being able to explain observed phenomena

with a proposed theory is a useful validation technique. Watt showed that his prim-

itives could explain the Mach band and Chevreul illusions. The Mach band illusion

comprises a linear gradient between a light and dark uniform areas. There is the

perception of a lighter stip on the light side of the gradient (and the converse on the

dark side). The Chevreul illusion appears when a light to dark stepped sequence

of bars are viewed – the bars tend not to look as if they are of a single colour, but

instead appear graduated from light to dark in the opposite direction to the main

steps. These illusions and explanation are shown in Figure 2.8.

31

2.2.3 Quantifying blur discrimination

After Watt’s work, various papers were published which used similar experiments

to extract more information about exactly how the eye (and human visual system)

might work. For example, Levi investigated the effect of blur on line detection,

spatial interval discrimination, the 2-line resolution, and developed models which

represent the behaviour in terms of an equivalent intrinsic blur [46]. The experiments

were then repeated in subjects with amblyopia, whereupon the altered behaviour was

shown to be represented by a minor change to the models that had been developed

[47].

The impact of exposure time upon performance was investigated [48], and shown

that discrimination improves with duration, but plateaus after approximately 130ms.

Mather and Smith considered how image blur might be used as a depth cue in

human vision [49,50], and showed that blur discrimination should be more effective

than convergence at larger distances.

Jacobs [51], whilst investigating sensitivity to defocus, considered the two ways

in which it can be produced: “Either the source of the visual image, such as a

photographic print or projected slide can be defocused or the observer can be defo-

cused using positive lenses placed in the spectacle plane. These two methods have

been called source and observer methods respectively...”. In 1989, when this work

was being done, blur thresholds had only been established with observer methods

of defocusing. Jacobs showed that results of both source and observer methods

correlated with a high coefficient – 0.994, and thus, giving “greater validity to the

source method, which has a number of advantages over the observer method, espe-

cially because it is within the capability of current technology to present simulated

defocused images generated by computer processing”.

Whether the results achieved thus far could be reproduced in natural scenes (as

all experiments described to date used relatively simple stimulii; typically a simple

edge or pair of edges shown on an oscilloscope) was explored [52]. Walsh used 18

subjects whose accommodation was temporarily paralysed by the use of anaesthetic.

The subjects were presented with a stimulus oscillating at 2Hz, and used a staircase

procedure to find the just noticeable change in contrast with defocus. The results

showed a dipper function that was symmetrical with both induced hyperopia and

myopia (see Figure 2.9), and was similar for both synthetic images (sinusoidal grat-

ings) and natural scenes (of a street). The same shaped responses were observed in

monochromatic illumination of various frequencies, though the centre of the sym-

metrical dipper was moved “in a progressively more hypermetropic direction as the

wavelength increased”.

32

Figure 2.9: Typical plot of the minimum detectable amplitude of oscillation of defocus asa function of mean position of focus. Vertical bars represent the standard deviations in thethresholds. Subject GW (age 29); 2.55 c/deg grating; 3 mm diameter pupil; green light;2Hz target oscillation frequency [52]. NB. In monochromatic illumination, the centre ofthe symmetrical dipper moved to the right with increased wavelength.

Without reference to Walsh’s work, Flitcroft explored the effect of temporal

modulations in luminance contrast on accommodation [53], and showed that fluc-

tuations in the 1-4Hz region effected the greatest detriment on accommodation, a

result which is “compatible with the ... hypothesis that flicker impairs the ability of

the accommodation system to utilize temporal cues such as those derived from the

higher frequency component (1-2Hz) of accommodative oscillations” (see Section

2.1.3).

2.2.4 Applications

The 1990s saw researchers trying to apply the well-established basic observations

and explanations of blur discrimination to other areas and applications. In one

experiment, subjects were required to judge the amount of blur in moving stimulii

[54], which showed movement made blur discrimination harder. The actual impact

of motion reduced as the pedestal blur increased – see Figure 2.10. Paakkonen

showed that motion produces equivalent spatial blur, and suggested a mechanism

by which it might arise. However, these results are reviewed by Hammett who

showed that “whilst discrimination performance for physically constant blur widths

increases monotonically with speed, subjects’ performance for constant perceived

blur widths is virtually constant for speeds up to 6.3 deg/sec”, and that this might

be as a result of perceptual sharpening, rather than motion blur [55]. There was

consensus that the blur width of a moving edge needed to be larger than for a static

edge to achieve the same perceptual width, and that the extra width required was

33

994 J. Opt. Soc. Am. A/Vol. 11, No. 3/March 1994

2

1.61-

1.2

0.8

0.4

0 -2

2

1.6

1.2

0.8

0.4

0 2--2

2

c 1.6 -

,a 1.2

-06 0.8.r .

c 0.4

0.2-2

MSreferenceblur:

-E 0 arcminC3 10 26 4

0 2 4 6 8 10

RO

0 2 4 6 8 10

0 2 4 6 8 10

Velocity (deg/s)

Fig. 2. Blur-discrimination thresholds as a function of velocityat four different reference blurs (space constants 0, 1, 2, and4 arcmin) for observers MS, RO, and AP. The reference-blurspace constant is specified by the standard deviation of the Gauss-ian. The error bars for the zero-arcmin reference blur are pre-sented as an example; they represent ±1 standard error. Typi-cally 1 standard error was -10% of the threshold.

extend far from the central fovea. Therefore the maxi-mum velocity used in this experiment was limited to8 deg/s.

One of the bands in each trial always had an edge withthe reference-blur width. The reference-blur value wasjittered from trial to trial in the range of the nominal ref-erence blur ±10%. Over a series of 64 trials we used anadaptive probit estimation algorithm's to select the cue,i.e., to select the difference between the reference and thetest blur randomly from a number of preset magnitudes.The absolute value of this difference was always added tothe reference-blur value to produce the test-blur width.The sign of the difference was used to specify whether theband with the reference blur or with a test blur was pre-sented first. We varied the location of the edge withinthe band randomly in a region of uncertainty 2 arcminwide to make it impossible for the subject to use distancecues in the measurement of blur. Two series of 64 trialscorresponding to situations in which the bands moved ei-ther in the same direction or in the opposite directionswere randomly interleaved. The analysis of the resultanttwo psychometric functions was done separately.

The observer's task was to decide whether the edge in thefirst or in the second band was more blurred. Thresholdwas defined as the standard deviation of the resultant psy-chometric function (83%-correct point), and we estimatedit by fitting a cumulative normal curve to the psychometricfunction, using probit analysis." Probit analysis alsoprovides the standard error of the estimate for the stan-dard deviation and a chi-square value that can be used inassessing the goodness of fit. At least four thresholds weredetermined under each condition. Each final value re-ported represents the root mean square of these estimates.Thresholds for all possible combinations of four differentreference blurs (0, 1, 2, and 4 arcmin) and six differentvelocities (0, 1, 2, 4, 6, and 8 deg/s) were measured.

RESULTS

Figure 2 shows the thresholds as a function of velocitywith four different reference blurs for all the observers.Although the data of observer AP differ in some aspectsfrom the data of the others, the main features are similar.For each reference blur the discrimination thresholds in-crease with velocity approximately linearly, and the slopeof this increase is inversely related to the reference blur.The smaller the reference blur, the larger the effect of ve-locity on the thresholds. Blur comparison is at its bestnot at zero blur but at some higher reference-blur valuefor all velocities. This is illustrated in Fig. 3, where thethresholds for observer RO are plotted as a function ofreference blur. This finding confirms the finding of Wattand Morgan9 for stationary Gaussian blur. This opti-mum blur also seems to shift to higher blur values withvelocity. Observer AP's performance is better than thatof the other observers at a reference blur of 4 arcmin,whereas the others perform better than he does at smallerreference blurs.

MODEL OF BLUR DISCRIMINATION OFMOVING TARGETS

The results show that image motion shifts the discrimina-tion thresholds, indicating that motion produces equivalentspatial blur. To estimate the amount of this equivalentblur, we need a model to separate the effects of motionblur and static spatial blur. The use of a mathematicalmodel has a prerequisite: we must assume that the blur-discrimination system is linear near threshold. Anotherfact of signal analysis helps us in building the model: in

Fig. 3. Blur-discrimination data for observer RO from Fig. 2replotted as a function of reference blur for six different ve-locities. The optimum blur is not at zero but at some higherreference-blur value for all velocities.

SiE0._

-

i'

S._E

u

00

C,)

A. K. Pddkk6nen and M. J. Morgan

........... . I . . . . . . . I . . . . . . . I . ... .......... .

Figure 2.10: Blur-discrimination data for an observer, plotted as a function of referenceblur for six different velocities. The optimum blur is not at zero but at some higherreference-blur value for all velocities. [54]

proportional to speed.

Burr and Morgan investigated motion and blur, to explore why “moving objects

look more blurred in brief than in long exposures”. They showed that motion does

not improve an observer’s ability to discriminate, but that moving objects appear

sharp because “the visual system is unable to perform the discrimination necessary

to decide whether the moving object is really sharp or not” [56]. All three of these

papers used gaussian blurred step functions as opposed to natural scenes or other

blur methods.

Approaching this area from a different direction, Kayargadde and Martens [57,

58,59], proposed a strategy for determining the quantity of blur present in an image.

They achieved a high degree of correlation between their algorithm’s response and

the mean-opinion-score of a number of subjects across a range of images, and thereby

argue that their “blur index” can be considered a psychometric measure of sharpness.

The blur index is determined by measuring the blur spread (an estimate of the

kernel size that caused the blur) across an image, then producing a global estimate

by combining an average of the blur spread with a weighting based on the edge

strength and length.

However, Peli observes, when considering Kayargadde’s edge detection perfor-

mance, that “It is clear that the algorithm has high sensitivity, but also a high false

alarm rate. The falsely detected edges are not only single noise pixels, but they

also created false connections between real edges” [60] – a frequent problem in edge

detection algorithms.

Similar blur estimators have been developed by others (especially Ferzli, Karam

34

and associates [61, 62, 63] and Marziliano [64, 65]), with the aim of being able to

quantitatively assess blur scenes without a reference. That is, to to permit compar-

ison of two images of completely different scenes and determine which of the two is

most in focus.

2.2.5 Natural scenes and the human visual system

Early research by Campbell showed that the human visual system appears to contain

multiple ‘channels’ for contrast detection that are tuned to different spatial frequen-

cies, and that the sensitivity of these channels is not equal – see Figure 2.11 [66]

(more can be found in a review by Klein [67]). Commenting on this, Billock sug-

gests at low spatial frequencies, contrast discrimination performance is consistent

with a Laplacian-like filter, whilst at higher frequencies there is evidence for multi-

ple, overlapping, wavelet-like spatial mechanisms tuned to narrow bands of spatial

frequencies [68]. The obvious question to draw from this is “Is this optimal for the

environments in which the human visual system is used?”.

Field observes (1987), from the opposite angle, that “there seems to be a belief

that images from the natural environment vary so widely from scene to scene that a

general description would be impossible” [69]. For any arbitrary non-random signal,

there is a coding strategy which permits it to be transmitted more efficiently than

simply transmitting the raw signal. An examination of a number of scenes that

would be typical for the mammalian visual system to encounter, shows that there

are generalities, so optimisation is possible. Looking at six different images, it is

clear that the frequency spectra follows the relationship g(f) = k/f 2, as shown in

Figure 2.12, a relationship which had been found earlier by Carlson in 1978 [70].

Such a relationship is expected if an image’s energy were scale invariant – there is

equal energy in each frequency octave.

Knill, Field and Kersten [71] explored whether the exponent of the frequency

drop-off is always 2, or does is there a variation in power spectrum between images,

and thus of the form:

PI(fr) ∝1

fβr(2.2)

This can also be rewritten in terms of amplitude, where α = β/2, and is termed

the ‘slope parameter’:

amplitude(f) ∝ 1

fα(2.3)

By rearranging, alpha can be defined as:

35

on April 5, 2010rstb.royalsocietypublishing.orgDownloaded from

Figure 2.11: The thick curve represents the contrast sensitivity (defined as reciprocalthreshold contrast) of the human visual system to a sinusoidal grating, plotted againstspatial frequency. The shaded area must always remain invisible to us unless the spatialfrequency content of the image is shifted into the visible domain by optical means, suchas the microscope. The lighter curves represent channels sensitive to a narrow range ofspatial frequencies (from [66]). It should be noted that “humans vary widely in the shapeand overall sensitivity of the contrast sensitivity function (normal observers may vary byas large as a factor of 3) [68].

α ∝ −log (amplitude(f))

log(f)(2.4)

Knill et al asked subjects to discriminate between random noise textures based

on their spectral drop-off, and compared this with an ideal observer. This ideal

observer is one which takes the place of the subject in the same experimental setup,

but can use all available information. So, rather than displaying a stimuli, then

using a human observer to provide feedback as to their observation, the task can be

put into a closed loop using an ideal observer. Using this approach, they showed

that the ideal observer was uniformly able to discriminate changes in slope, but the

human observers were best in the range 2.8 < β < 3.6, within the range of 2 < β < 4

exhibited by natural images. They conclude that their results are consistent with a

visual system tuned to an ensemble of images with a β of approximately 3.

Alongside investigations into blur discrimination, insight was being gained into

the nature of real-world images. However, there was obvious conflict between the

observation of images showing α = 1 and the tuning of the vision system. Tadmore

and Tolhurst attempt to resolve this discrepancy, and also consider the similarity

36

Vol. 4, No. 12/December 1987/J. Opt. Soc. Am. A 2385

E

Fig. 7. Two-dimensional amplitude spectra for two images from Fig. 6 (A and D). The center of such a plot represents 0 spatial frequency.Frequency increases as a function of the distance from the center, and orientation is represented by the angle from the horizontal. For the sakeof the clarity, each 256 X 256 amplitude spectrum has been reduced to 32 X 32. Thus each point in this plot represents an average of an 8 X 8 re-gion of the spectrum. Such plots show that amplitude decreases s

amplitude falls off quickly by a factor of roughly 1/f (i.e., thpower falls at 1/f2). Figure 8 shows the amplitude spectraaveraged across all orientations and plotted on log-log coordinates.

Although the description is by no means perfect, thesamplitude spectra are all roughly described by a slope of -1.This is not to say that all scenes from the natural worlwould be expected to show a 1/f falloff; there are certainlyscenes that do not show this property (i.e., a field of grassthe night sky, etc.). However, there are several reasons whthis 1/f falloff in amplitude should be expected to be a rougaverage.

A 1/f falloff in the amplitude spectrum is what we woulexpect if the relative contrast energy of the image were scalinvariant (i.e., independent of viewing distance). For example, consider an image of a surface with an amount of energyE between frequency f and frequency nf when viewed at distance d. Increasing the distance by a factor a will shifthe energy to the frequency range of af and anf. If we let thenergy at any frequency equal

E(f) = g(f) * (2xf),

Fig. 8. Amplitude spectra for the six images A-F, averaged acrossall orientations. The spectra have been shifted up for clarity. Onthese log-log coordinates the spectra fall off by a factor of roughly1/f (a slope of -1). Therefore the power spectra fall off as 1/f2.

(7

then to keep the energy constant in the range of all f to n/ foall f requires

nf.J g(f) * (27rf)df = K,Jfo

and it follows that

g(f) = k/f2.

(8

(9

In other words, if the power falls off as 1//2, there will bequal energy in equal octaves. For example, the total energbetween 2 and 4 cycles/deg will equal the energy between and 8 cycles/deg. (On a two-dimensional plot the area cov-ered by an octave band is proportional to f 2.) This falloff inpower can also be related to the fractal nature of the lumi-

David J. Field

Figure 2.12: Amplitude spectra for the six [natural scene] images A-F, averaged across allorientations. The spectra have been shifted up for clarity. On these log-log coordinatesthe spectra fall off by a factor of roughly 1/f (a slope of -1). Therefore the power spectrafall off as 1/f2. [69]

37

Figure 2.13: The image on the left has a flat spectrum, whilst the right has a 1/f spectrum.[72]

between slope discrimination and blur discrimination [73]. These are similar tasks,

as a change in slope changes the amount of high frequency signal present, and thus

impedes (or potentially assists) in discriminating blur. They suggest that “even if

the visual system were ‘tuned’ for processing images with natural statistics, it is not

clear why one would expect to be particularly good at discriminating small changes

from natural statistics” and that instead, subjects would be good at discriminating

large changes. That is, only when an image was outside the normal operating region

would the vision system start to flag it as being especially unusual.

Further, Tolhurst and Tadmor [74] discuss that increasing the α value of an

image causes it to look blurred. They show that their observed slope parameter

discrimination threshold approximately matches those blur width discrimination

thresholds, across a number of the papers described above. Continuing the tuning

argument, and contradicting earlier thinking, they conclude:

...a higher threshold represents a greater tolerance of change in α. An

optimised visual system should be most tolerant of image-distortions

when the image is in focus... Perception would not then be disturbed

by, for instance, any small accommodative errors or changes in pupil

diameter that might change α. On the other hand, a high sensitivity

to changes in α is required when the image is defocused, so that the

appropriate accommodative responses can be evoked. In this sense, the

present experiments have shown that the human visual system may be

optimised for the processing of natural images.

38

(a) High edge density (α = −1.133) [75] (b) Low edge density (α = −1.296) [76]

Figure 2.14: As the number of edges increases, the slope parameter changes, despite thefact that both images are in focus.

Field proposed a model, “RCS”, that was intended to overcome two main prob-

lems: Firstly, that α varies between images, and secondly that an image of white-

noise with a uniform frequency distribution appears to have too much high frequency

information (Figure 2.13). Field’s experiment involved changing α for a series of im-

ages that were captured in-focus, then asking subjects to browse through values of

α until they determined that the image was “just blurred”. The results suggested

that “the amplitude spectrum of natural scenes is not sufficient to predict when an

image is in focus or when human observers will judge an image to be blurred. The

slope of the amplitude spectrum is a result of both the amplitude of the structure

at different frequencies (eg contrast of edges) as well as the density of the structure

(number of edges). Blur, however, appears to depend entirely on the amplitude of

the structure” [72]. For example, see Figure 2.14.

Parraga and Tolhurst added random contrast variation to images whilst asking

the observer to discriminate changes in slope [77]. This was done in such a way

to tease apart whether the observer was actually discriminating the slope change,

or whether they were performing a single-frequency-band contrast discrimination.

They showed that it was the latter that was happening, and that the change in slope

could not be directly detected. Thomson and Foster, acknowledging this controversy,

show that phase information clearly plays a big part in encoding the structural

information of a natural scene [78]. They demonstrate that disrupting the phase

information completely removes any the tuning that had previously been reported.

Billock reviews eleven sets of natural images that have been published, and shows

that their average slope parameter is 1.08 across all 1176 images [68], observering

that there is no significant difference between the slope, regardless of photographic

technique, calibration or computation used in the underlying studies. He says that

39

adults are tuned to a slope parameter around 1.09-1.20, and that this is not present

in infants suggests that either a genetic or adaptive influence affects the development

of the visual system.

Later work by other researchers has shown that perception of natural images

varies with crowding (ie the presence of distractors around the target) and whether

the stimulus is viewed in the fovea or periphery of the retina (eg [79,80,81,82].

2.2.6 Related work

Westheimer investigated the impact on blur discrimination on simple edges as con-

trast varied, and showed that thresholds rose significantly as constrast is reduced

[83]. Wuerger examined blur discrimination when a monochromatic stimulus was

presented in different colour channels [84]. She showed that her four subjects had

approximately uniform blur discrimination threshold when the stimuli was yellow-

blue, but that the usual dipper was seen when using the red-green stimuli. No

comparison was made with Fincham’s work (see Section 2.1.2).

More recently, a very interesting result whereby the vision system adapts to blur

was demonstrated [85]. When asked to indicate, by means of a staircase procedure,

whether each image was too sharp or too blurred, the results were significantly

different if the adaptation field between stimuli was of a blurred or sharpened image.

This result was demonstrated both for temporally and spatially separated adaptation

fields (Figure 2.15), as well as temporal edges [86,87,88], but little effect was observed

following adaptation to luminance and chromatic patterns [89].

The way different types of distortion affect the perception of images has also

been measured [90]. The most interesting result is that if an image has experienced

some structural distortion (eg manipulating a wavelet frequency), then the apparent

distortion is reduced if additive white gaussian noise (AWGN) is added. Similar

work used 37 subjects to explore the performance of 7 denoising filters, concluding:

“If a trade-off needs to be made then the blurring is more a problem than the

remaining noise or artefacts”. That is, people are more bothered by blur than by

noise [91].

2.3 Focus measures

In the mid 1970s, work started being published about computational autofocus,

such as Erteza [92] and Muller and Buffingham [93]. The principle of computational

autofocus is maximise a function which produces a ‘figure-of-merit’ based on its input

(a two-dimensional array of pixel brightness values). Chern terms this approach

40

Figure 2.15: The two centre faces are identical and physically focused, but the right imageappears blurry in the sharpened surround while the left image appears sharp in the blurrysurround [86]

“pixel based focusing” [94]. Ultimately, a focus function should produce a small

value for blurred images, and a big value for images which are in focus.

There is minimal discussion to explain how the focus meaures are constructed.

Most appear to be the results of supposition or trial an error, though there are

exceptions. Vollath performs analysis of some simple functions to extract the noise

and scene-dependent terms, then synthesises new measures without these terms [95].

An early comparison of focus measures by Groen categorised those being eval-

uated by determining the underlying format of the equation, and proposed three

equation types [96]:

F 1n,m,θ =

∑∑E

{∣∣∣∣∂ng(x, y)

∂xn

∣∣∣∣− θ}m (2.5)

where θ is an arbitrary threshold, g(x, y) the grey level at coordinates (x, y) and

E(z) = z if z > 0; E(z) = 0 if z < 0. That is, measures of the form F 1 are

derivatives of some sort.

F 2f,θ =

∑∑f (g(x, y)− θ) (2.6)

In F 2, the function f(z) is one which analyses some statistical property of the image,

such as measuring the depths or sizes of peaks or valleys in the image.

F 3m,c =

1

c

∑∑|g(x, y)− g|m (2.7)

41

Functions of the class F 3 are the 2D summation of difference between pixel value

and mean, raised to an arbitrary power, and subject to arbitrary scaling (typically

to normalise the measure).

An alternative set of categories was used in Sun’s review [97]. However, neither

approach is sufficiently comprehensive to cover all the measures discovered in the

literature.

In this work, measures have been grouped based on their underlying core func-

tion. Measures which operate on gradient (ie the simple difference between two

pixels, perhaps subject to an arbitrary sampling interval) could be represented as a

convolution with [1, 0,−1] or similar. However, for clarity, they are listed as gradient

measures, not convolutions. All the measures discovered in the literature are listed

in Table 2.3 which also summarises a few of their key properties. Several new

measures have been created during the course of this work, and have been listed in

this table for completeness. They are denoted by a double-asterisk, and described

in Appendix C.

42

Method Year Category[97]

Eq[96]

Core func-tion

Parameters Summation Thresholded

cranepeak [15] 1966 Derivative Gradient 2D No No

cranesum [15] 1966 Derivative Gradient 2D Yes No

tenengrad [98] 1970 Derivative Convolution 3x3 Sobel Of square Yes

brennergradient [99] 1971 Derivative Gradient 1D, step=2 Of square Yes

menmay [100,101] 1972 Statistical* F2 Histogram Yes No

thresholdedcontent [100,96] 1973 Intuitive F2 None Yes No

thresholdedpixelcount [96,102] 1973 Intuitive F2 None Of number of pixels exceeding inten-sity

Yes

energylaplace [93,96] 1974 Derivative F1 Convolution 3x3 Laplace Of square No

squaredgradient [93, 96] 1974 Derivative F1 Gradient 1D Of square Yes

masgrn [103,101] 1975 Statistical* Histogram Of bins Yes

absolutegradient [104,96] 1976 Derivative* F1 Gradient 1D Of absolute value No

thresholdedabsolute-gradient [96]

1985 Derivative* F1 Gradient 1D Of absolute value Yes

imagepower [96,99] 1985 Intuitive F2 None Of square Yes

groenvariance [96] 1985 Statistical F3 None Of squared distance from mean No

normalizedgroenvariance [96] 1985 Statistical F3 None Of squared distance from mean No

absolutevariation [96] 1985 Statistical* F3 None Of distance from mean No

Table 2.3: Comparison of focus measures (Part 1). (* denotes my categorisation along the same lines as [97])

43


Eq[96]

Core func-tion


autocorrelation [105,97] 1987 Statistical Autocorrelation Yes No

standarddeviationbased-autocorrelation [105,97]

1987 Statistical Autocorrelation Of distance from mean No

voll4 [105,99] 1987 Statistical* Autocorrelation [0,1,-1] Yes No

voll5 [95,99] 1988 Statistical* Autocorrelation [0,1] Yes (less mean) No

sml [106] 1989 Derivative Convolution 3x3 Laplace Of absolute value Yes

rmscontrast [107] 1990 Statistical None Of squared distanct from mean No

triakis7d [101] 1991 Voxel statis-tics

Of matching voxels Yes

triakis11s [101] 1991 Voxel statis-tics

Of matching voxels Yes

entropy [101] 1991 Histogram Information Yes No

range [101] 1991 Histogram Max-Min None No

chernfft [94] 2001 FFT None No

histogramentropy [94] 2001 Statistical* Histogram Of entropy No

hlv [94] 2001 Statistical* Histogram None No

laplace [94] 2001 Derivative* Convolution 3x3 Laplace Yes Yes

smd [94] 2001 Derivative* Gradient None No

nrbm [64] 2002 Derivative* Convolution Of edge widths No

waveletw1 [108] 2003 Derivative* Wavelet Of absolute value of HL, LH and HHsections

No

waveletw2 [108] 2003 Derivative* Wavelet Of squared distance of absolute valuefrom mean absolute, in each of HL,LH and HH sections

No

Table 2.4: Comparison of focus measures (Part 2). (* denotes my categorisation along the same lines as [97])

44


Eq[96]

Core func-tion


waveletw3 [108] 2003 Derivative* Wavelet Of squared distance to mean of eachHL, LH and HH sections

No

kurtosis [109] 2004 Kurtosis Of DCT

JNBM [61] 2006 Statistics

va [110] 2006 Attention Yes No

CPBD [63] 2010 Statistics

energylaplace5a, based on [111] ** Derivative* Convolution 5x5 Laplace Of square No

energylaplace5b, based on [111] ** Derivative* Convolution 5x5 Laplace Of square No

energylaplace5c, based on [111] ** Derivative* Convolution 5x5 Laplace Of square No

rawlaplace ** Convolution 3x3 Laplace Yes No

phasecongruence based on [22] ** Phase congru-ence

Of features Yes

phasecongruence2 based on [22] ** Phase congru-ence

Of features No

randomnumber ** Random None No

alphaAdult ** Statistical Spectrum None No

alphaRedOnion ** Statistical Spectrum None No

alphaImageEnsemble ** Statistical Spectrum None No

Table 2.5: Comparison of focus measures (Part 3). (* denotes my categorisation along the same lines as [97], ** are new measures createdduring this work, details of which can be found in Appendix C.)

45

2.3.1 Timing and performance

Early work comparing focus measures paid particular attention to the computa-

tional requirements of the various algorithms – Santos reports that their focusing

computations took 60% of the total time spent focusing [99]. But, this was done

in 1997 when the computing resources they had available ran at 16MHz and had

12MB memory. Seven years later, Sun et al said: “Computation time does not

exceed 30ms for almost all the focus algorithms tested. Therefore, the speed of the

focus algorithms is not used as a criterion for comparing and ranking [the focus

algorithms]” [97].

Even a low end computer is now 100 times faster than the 1997 computer,

and modern computation strategies such as off-loading matrix calculations to the

graphical processing unit (GPU) can boost performance by another order of magni-

tude [112]. Interestingly, Santos did conclude that if the execution time had been ig-

nored from their evaluation, then the rankings are almost identical – just tenengrad

moves places and rises up the ranking [99].

Santos et al excluded certain algorithms (those in the frequency domain) from

their survey because “their complexity makes it difficult to produce fast algorithms”

[99]. Again, with ever faster computations, such restrictions do not need to be

imposed.

Sampling within captured images will also affect performance. As early work

used images that were just 64x64 pixels (eg [101]), and modern cameras take signifi-

cantly larger photos (sizes in excess of 4000x3000 pixels are common), then consider-

ing the full information content of the image will take longer. One possible solution

is to use subsampling, though such a strategy may require tuning of thresholds and

algorithm parameters.

2.3.2 Getting the images

In Groen’s 1985 comparison of focus measures, he explains that the images acquired

for a real-life image were taken by placing a photograph in front of the camera -

“the in-focus distance between the lens and photograph was 520mm. Focussing

took place by moving the camera with respect to the object in 20-mm steps. In this

image sequence a relatively large change in image content is present related to the

relatively large depth of field of the macro setup. This poses a separate problem from

the variation in fixed-content images.” [96]. Clearly such an approach, whilst it may

have merits, is of limited value given both the enormous change in image content

and that it is not a normal mechanism for focusing a camera – the photographer

46

will normally stay still, and change the optics rather than the other way around.

Most of the literature about mathematical focus measures surrounds the autofo-

cusing of microscopes. Typically, a set of images are captured by changing the focus

step of the optical system by a few µm, whilst maintaining constant illumination

(eg [96,97]). For example, Santos explains that their images were captured at steps

of 0.025µm, had a depth of field of 0.16µm, and had an exposure time of 0.3s which

permitted 185 grey levels to be captured [99]. In summary, by whatever method

was appropriate to the object being imaged, a set of photos was captured.

2.3.3 The nature of blur

The exact nature of focal blur is of importance. Ultimately this is a result of the

point spread function (PSF) of the optical system between the object and imaging

sensor. A number of different functions are discussed in the literature, including

cosine or gaussian operators, and manipulating the amplitude spectrum [92, 113,

19, 87, 114]. Murray and Bex suggested that the blur (and sharpening) caused by

manipulating the amplitude spectrum does not simulate perceptual blur. However,

both sinc (that is, sin(x)/x) and gaussian blurred images do – each produced dipper-

shaped blur discrimination thresholds. Further, they showed that blur-equivalence

between gaussian and sinc blurred images as determined by human observers was

better reproduced by models based on luminance slope than those based on spatial

frequency. Thus, they suggest that phase components of images are important when

measuring perceived blur, indicating that a sinc operator might be more appropriate

than a gaussian [115].

Indeed, in terms of physics, the optical blur at a given wavelength is a sinc

shaped PSF5, but the aggregate effect of summing this across multiple wavelengths

is approximately gaussian [19, Appendix], [96, Fig 2]. Whilst Murray and Bex sug-

gested that phase is important, they did not apply the sinc function at multiple

wavelengths, thus their results are not directly applicable to simulating optical blur

in non-monochromatic scenes. Accordingly, almost all work has used a gaussian

kernel as the mathematical means of simulating blur, and equate the standard de-

viation of the gaussian kernel with the arc-minutes of blur extent. No description

has been found in the literature regarding this relationship, nor how the blur extent

could be compared to known physical changes in focus distance or lens strength

when capturing real objects.

Whilst the behaviour of a man-made optical system can be calculated, the PSF

5Primarily due to diffraction effects which produce wave cancellation and reinforcement (see[19])

47

of the human eye has to be determined by measurement. It is not a trivial mathe-

matical function, and a significant body of work has been done investigating it, such

as that by Roorda et al and Navarro et al who scanned a laser point across the eye

and used a camera to record the resultant image on the retina [116,117].

Synthetic images have been used in several experiments; typically square waves,

sinusoidal gratings and white random noise [84, 101] which have then had gaussian

blur added. Bex went further and created a random noise image, then low-pass

filtered and thresholded it to generate a binary monochrome image, which results

in a random image with an amplitude spectrum equivalent to a natural scene [114].

2.3.4 Establishing the ground truth

An important property of a set of images captured for the purpose of analysing

the performance of focus measures is to know which is the most in focus; that is,

to establish the ground truth. The literature does not indicate the existence of an

established means of doing this across a wide range of applications. Muller, when

considering how to compensate for atmospheric perturbations of astronomical ob-

jects, mathematically proved that certain “sharpness functions reach their maxima

only for a properly restored image” [93]. However, this application can use point

light sources and zero depth of field to perform modelling.

Whilst it would technically be possible (if looking at a flat object perpendicular

to the camera’s axis) to compute the precise ground truth, there is no evidence

in the literature that this has been done. Such an approach could only be used

for three dimensional scenes, where there will be a multitude of distances, if the

depth of field were also taken into account. Again, no discussion of such a strategy

has been found. Instead, the ground truth appears to routinely be established

by human decisions, and rarely discussed. For example, one experiment took 28

focus images, and simply stated that the middle of the sequence was the “visual

in-focus image” [96]. The best image in other papers was “manually determined by

proficient microscope technicians” [97] and “obtained by a trained operator” [99].

Rather than determine a ground truth, Chern et al reviewed the image of best focus

for each measure, and noted that “visual inspection reveals virtually no difference

between the frames” [94].

2.3.5 Quantitative comparisons

Once a set of source images has been captured and processed by a candidate focus

measure, it is desirable to quantitatively describe the result. An early quantitative

48

comparison was made by Groen, who normalised the resulting focus scores to have

a maximum of 1 (though no scaling to anchor the minima to a particular value was

performed), then compared the width of the focus measure at 50% and at 80% of

the maxima [96]. Building on this, a standard approach to characterise and rank

candidate measures is used by a number of different researchers (eg [97, 101]). It

uses the following parameters:

� Accuracy (A): Distance between maxima of the focus curve and the ground

truth of ‘best’ image, measured in number of image frames of distance.

� Range (R): The distance (in number of images) between the first minima on

either side of the global maxima. This should be large, as there should not be

any local maxima on the focus curve.

� Number of false maxima (F): The number of maxima appearing in a focus

cuve, excluding the global maximum.

� Width (W): The width of the curve (in number of images) at 50% of the

maxima’s height. Ideally this should be small.

� Noise level (N): This describes the speed of the direction of change between

two false maxima of a focus curve. It is computed by taking the sum of squares

of the second derivative obtained by convolving the curve (ommitting the peak

value) with the kernel (−1, 2,−1).

Figure 2.16 shows these parameters annotated on a typical focus curve.

An additional parameter was suggested by Ligthard and Groen in an early paper

– that the algorithm should be able to use the same video signal as that used for

the ultimate image capture so as to avoid any systemic errors that might arise from

a hardware implementation using a different video source [118]. However, given the

advances in digital image processing and acquisition, such a precaution is likely not

to be necessary.

So as to measure the robustness of the algorithm under test, Sun et al also mea-

sure these parameters on three additional versions of the input – after subsampling,

adding random noise and low-pass filtering. The overall score for each focus measure

is then computed as the Euclidean distance from the perfect score, where the perfect

score is where all parameters are zero, except for the range which is the number of

images under test. Sun et al normalised each parameter before computing the score,

giving all distances equal weights. Once normalised, the score is thus:

49

0 20 40 60 80 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Nor

mal

ised

sco

re

Figure 2.16: An example focus curve annotated with the attributes used to quantitativelycompare this algorithm against others. Accuracy is 2 (the distance between peak of thecurve and ground truth ‘best’ image denoted by a circle at (50, 1). The range is indicatedby the upper horizontal line, and is the distance between the first minima on either sideof the peak (18). There are six false maxima, indicated by downwards arrows. The lowerhorizontal line shows the width of the curve at 50%, which is 30.

score =√A2 + (numimages−R)2 +W 2 +N2 + F 2 (2.8)

None of the existing comparison studies in the literature have covered all focus

measures, and indeed one study failed to include an overall table showing how the

measures ranked across a number of different assessments [97]. Their results have

been consolidated to produce an overall rank to show which measure performs best,

as can be seen in Table 2.6. Sun concluded: Across a wide range of images, conditions

and pre-processing steps, the normalisedvariance measure was determined to be

the best. The only exception is when the images are subsampled [E7 in Table 2.6]

where tenengrad is best.

50

Name E1 E2 E3 E4 E5 E6 E7 E8 E9 Total Rankthresholdedabsolutegradient 7 8 7 7 6 10 3 8 4 60 6squaredgradient 8 10 8 8 8 8 5 9 7 71 7brennergradient 5 6 5 5 5 7 2 6 6 47 5tenengrad 4 5 4 4 4 6 1 5 5 38 4sml 14 14 14 13 14 13 7 14 12 115 14energylaplace 11 11 10 9 10 14 16 12 9 102 11waveletw1 16 15 15 14 15 12 12 17 14 130 16waveletw2 15 12 16 12 13 16 11 16 16 127 15waveletw3 12 13 12 10 12 15 10 15 15 114 13groenvariance 3 3 3 3 3 3 13 2 2 35 3normalisedgroenvariance 1 1 1 1 1 1 6 1 1 14 1autocorrelation 6 9 6 6 7 9 17 7 8 75 9stddevcorr 2 2 2 2 2 2 14 3 3 32 2range 13 16 9 11 16 11 4 11 11 102 11entropy 18 18 18 18 18 18 15 13 18 154 17thresholdedcontent 10 7 13 16 11 5 9 10 13 94 10thresholdedpixelcount 17 17 17 17 17 17 18 18 17 155 18imagepower 9 4 11 15 9 4 8 4 10 74 8

Table 2.6: Score summary across the different tests performed by Sun [97]. E1 uses no magnification; E2 uses 100x magnification; E3 uses 400xmagnification; E4 used bright field observations; E5 used phase contrast observations; E6 was observed with DIC; E7 used 5% subsampling;E8 added random noise; E9 applied low pass filtering. Each of these were performed with at least 18 image series.

51

When Sun et al aggregated their results for a particular experiment acrosss a

number of different image series, they simply averaged the score. Santos proposed a

different approach [99], though this appears not to have been used by any subseqent

work. It is calculated as follows:

1. Compute the mean and standard deviation of each of the five metrics (intro-

duced above) for each image series.

2. Normalise each metric by subtracting the mean and divide by the standard

deviation.

3. From these normalised metrics, compute the score (the Euclidean distance

from the ideal, as in Equation (2.8)

4. The global score for the function is now the mean of the scores of normalised

metrics.

Table 2.7 summarises the ranks of all focus measures evaluated in the most widely

cited review papers. It shows that a variance based measure typically performs best.

52

Focus measure Sun’s rank Santos’ rank Firestone’s rankthresholdedabsolutegradient 6 7squaredgradient 7 11brennergradient 5 10 1tenengrad 4 8sml 14energylaplace 11waveletw1 16waveletw2 15waveletw3 13groenvariance 3 2normalizedgroenvariance 1 1autocorrelation 9stddevcorr 2range 11 12 7entropy 17 9 9thresholdedcontent 10 5thresholdedpixelcount 18 6imagepower 8 4voll4 1voll5 3triakis11s 3spectral 4triakis7d 5menmay 6masgrn 8

Table 2.7: Score summary across three review papers [97,99,101]

53

2.4 Cameras

Whilst the eye took many millions of years to evolve, the evolution of photography

has been far quicker. The earliest camera is the camera obscura (a pin-hole camera),

whose principle was known by Aristotle in 300BC, although the first recorded use

much later was in a drawing by Leonardo da Vinci in 1519. The first actual photo

was taken in the summer of 1827, by Nipce, and required an eight hour exposure.

By the 1850s, exposure times had been reduced to just two or three seconds, but

the process required that the plates were freshly made, and still wet, when the

photographs were taken. However, this too was resolved in the coming decades,

with first gelatin and then celluloid being used to replace the glass plates. Finally, in

1888, George Eastman released the box camera, and photography became available

to the masses [119].

Since then, cameras have improved considerably. Some of the most important

developments are listed below [120]:

Year Event

1914 Leitz introduces the sprocketed 35mm film.

1932 Technicolor for movies is introduced. Three black and white

films in the same camera capture the scene under different

filters.

1936 The first multi-layered colour film (Kodachrome) is developed,

as is the first 35mm single-lens reflex (SLR) camera.

1955 Minsky develops confocal microscope [121]

1963 The first instant colour film is developed by Polaroid.

1975 Kodak build the first CCD-based still camera

1985 Minolta markets the first autofocus SLR

1991 Kodak release first digital SLR, the DCS-100, which is a mod-

ified Nikon F3

1999 Nikon D1 SLR 2.74 mega pixel camera, the first ground-up

digital SLR

2000 The first camera phone is introduced in Japan by Sharp and

J-Phone

2004 Kodak stop producing film cameras

2005 Plenoptic (light field) camera realised

2009 FujiFilm consumer 3D camera launched

54

The following sections describe in more detail the evolution of certain features,

such as auto-focus, as well as introducing some of the latest research into how

cameras could be further improved.

2.4.1 Auto focus in conventional cameras

Despite the large number of focus measures described earlier, literature surrounding

the actual techniques used in cameras is scarce, probably because of reasons of com-

mercial confidentiality. However, there are fundamentally two approaches. Firstly,

the camera can actively measure the distance to the subject, using a variety of

mechanisms, and then use a look-up table to determine the lens position for a given

subject distance. Alternatively, the camera can perform some image analysis and

change the lens position until the image is determined to be in focus (eg [122]). As

most scenes contain vertical lines, this is typically done by varying the lens position

to maximise the contrast between two horizontally adjacent pixels. An improvement

upon this, used by modern cameras, is to consider multiple pairs of pixels, arranged

in a grid around the centre of the image, and even to allow the photographer to

move the focus point to the subject of their photo [123].

More recently, manufacturers have released cameras with significantly more ad-

vanced focusing strategies, such as:

AiAF “Canon’s 9-point AiAF (Artificial Intelligence Auto Focus) automatically

scans and selects subjects from a set of nine focusing areas across the scene.

This ensures accurately focused images even when subjects are not in the

centre of the frame.” [123]

FlexiZone AF/AE “FlexiZone AF/AE lets users manually select the focus point

from almost any point in the frame by moving the auto focus window in the

viewfinder. Exposure can be linked to the focus point to ensure that the chosen

subject is both accurately focused and exposed.” [123]

Face Detection AF/AE “Face Detection AF/AE ensures superb people shots by

automatically detecting the subjects in the frame and setting the optimum

focus and exposure” [124], though it is noted that faces may not be detected

if they “appear small, large, dark or bright relative to the overall composition,

[or] if the subjects are looking sideways, lying down, or their faces are partially

obscured” [125, p45].

Face-priority Auto Focus “A special digital detection program ... [scans] for

facial details and then controls autofocus operation based on the location of

55

the detected face in the scene.” [126]

Auto depth-of-field Canon’s digital SLR cameras incorporate A-DEP, where

“aperture is determined to maximize depth of field so that all objects in the

9 focus points are sharply focused” [127].

Other autofocus strategies include using additional information sources, such

as triangulating the location of the speaker in a video conference by analysing the

time-delay-of-arrival of the speaker’s voice to an array of microphones [128].

Future cameras may well abandon solid lenses for liquid ones, which are stimu-

lated by electric fields to change shape (and hence focus), just as with the human

eye. The advantages include the elimination of moving parts, lower power consump-

tion and smaller size. Jung et al explain how a liquid lens can be improved such

that it could be used in a portable device [129].

2.4.2 Other devices

Certain devices have been invented over the years which are immune to blur. The

confocal camera, patented in 1955, uses a pinhole to eliminate out-of-focus light, in

conjunction with point-wise illumination [121]. However, the trade-off is that image

acquisition is accordingly slower, as the illumination has to be scanned over the

sample.

A novel approach for image acquisition called Compressing Imaging was proposed

by Wakin [130]. Its main benefit is that just a single-pixel sensor is required, allowing

imaging to be performed in spectral regions where a CCD or other matrix-sensor

is financially prohibitive or technically impossible to construct. The sensor is used

in combination with a micromirror array showing pseudo-random binary patterns.

The mirror is set to a particular pattern, and the sensor is instructed to acquire a

data point. The pattern is then changed repeatedly and new data points acquired.

This approach enables a rough image to be displayed after just a few samples, and

for each subsequent sample to simply improve the image quality. This is shown in

Figure 2.17.

2.4.3 Recent developments

In addition to changing the camera’s parameters (such as shutter speed or aperture)

to take an individual image, it is possible to perform post-processing on a range of

photos to improve the output. Such techniques include high dynamic range (HDR)

56

(a) Ideal image (b) 819 measurements (c) 1600 measurements

Figure 2.17: Compressive imaging example: (a) shows an ideal image of 64x64 pixels (4096pixels). This was reconstructed using compressive sensing using (b) 819 measurements and(c) 1600 measurements. (Adapted from [130]).

imagery, whereby the results of conventional exposure bracketing are then combined

to produce a single image with high dynamic range [131].

Some of Fujifilm’s digital cameras incorporate technology in their sensor to im-

prove its performance in terms of dynamic range [132]. This is done by having two

sensing elements per pixel, one with lower sensitivity than the other (see Figure

2.18), which attempts to recreate the variety of grain sizes found in conventional

film photography (and is similar to the difference in sensitivity between the human

eye’s rod and code receptors).

Figure 2.18: Fujifilm’s 4th generation Super CCD HR [133]

HP has added image processing, rather than specific sensor hardware, to attempt

to improve the perceived dynamic range of the camera’s images [134, 135]. Whilst

they acknowledge that “a sophisticated user may be able achieve some of the benefits

of HP Adaptive Lighting Technology in image editing packages”, they go on to say

that “while these techniques are possible, they are very difficult, time consuming

and beyond the ability of most users.”. HP’s technology operates automatically,

and is also able to make use of the greater dynamic range available in the camera.

This means that their algorithm can process raw images before they are reduced to

57

a 24bit JPEG image, so can outperform computer based image manipulation.

Other work has looked at focus, and strategies for improving the depth-of-field

beyond that of simply having a very small aperture. Most notably, the implemen-

tation by Ng et al’s of a plenoptic camera [136] which yields superb results (see

Figure 2.19). By placing an array of microlenses directly on top of the CCD sensor,

plenoptic cameras can capture more of the light field inside the camera itself. The

result of this is that, for a single exposure, it is possible to computationally refocus

the image as well as being able to computationally move the observer both laterally

with respect to the subject, and also in terms of distance. These manipulations are

performed on the light field photograph after acquisition – that is, a scene can be

refocussed years after the image was taken.

(a) Conventional photo, focused on theclasped fingers

(b) Extended depth of field computed from astack of photographs focused at different

depths.

Figure 2.19: Sample photograph from a plenoptic camera [136]

Arbitrarily large synthetic apertures have also been created, meaning that images

can be captured with a very narrow depth of field. This can be used to “see through”

objects, such as in Figure 2.20 [137].

Focus (or defocus) information can also be used to build a three-dimensional

model of the scene [138,139], a technique known as “shape from focus”. In essence, if

a large number of photos are taken of the scene, they can be processed to determine

the optimal focus position for each pixel, and hence produce a three-dimensional

model of the scene, much as could be achieved with stereo cameras or laser range

finders. Figure 2.21 shows an example of such processing.

In summary, camera manufacturers are keen to demonstrate differences between

58

(a) Conventional photograph (b) Large aperture allows for narrow depth offield, eliminating the tree from the

foreground.

Figure 2.20: Synthetic aperture photograph [137]

(a) Original image (near-focused) (b) Final generated shape model

Figure 2.21: Depth from focus example [139]

59

their cameras and those of their competitors. They have done this by making

changes both at the software level, and by improving and evolving the hardware.

Some of the software techniques now being added to consumer cameras simply make

it easier (or even make it practical at all) to achieve some desired end result, despite

the fact that the end result has been technically achievable for some time, if the

photographer has the time to spend working on his photos. Other changes will take

a while to reach the consumer market, and when they do arrive could create a big

change in the way photographs are taken and used.

2.5 Summary

Since Helmhotz’s proposal that the eye accommodates by changing shape over 150

years ago, much investigative work has been done to understand how accommodation

is controlled. The EW group of cells (see Section 2.1.3), mentioned above, has been

found to be responsible for accommodation, but their exact behaviour has not yet

been characterised. Many stimulii have been identified, yet accommodation is still

possible in their absence. Around 1/3rd of the population can focus when stimulii

are removed, such as viewing an unknown subject, monocularly, in monochromatic

light. This leads to the conclusion that another stimulus is used; that the vision

system tries to reduce blur.

Crane was the first to question what mathematical property of the image is used

to calculate blur, and proposed the first focus measure. Since then, almost fifty years

have passed, seeing the publication of a multitude of models of accommodation, each

depending on quantifying ‘blur’ but no such quantitative measure being proven.

The psychophysical behaviour of a number of visual properties has been in-

vestigated and typically shows that discrimination exhibits a dipper response with

increased base magnitude. The seminal work on blur, Watt and Morgan (1983),

showed that the perceptual width of a blurred edge is related to stationary points

in the second derivative. Watt also showed how such a model could explain various

optical illusions, and was the first to publish a dipper shape when measuring blur

discrimination.

Watt’s experiments, and those of most other researchers since then, have been

conducted with simple 1-dimensional stimulii. Just two experiments stand apart

from this trend: Walsh found a dipper-function for blur discrimination using real

photographs, though without using accommodation (the subjects were anaesthetised

and moved the photographs which were blurred by the insertion of lenses) – this

is not a natural viewing experience. Secondly, Kayargadde developed a means of

60

quantifying blur in real images, and showed this could produce similar mean opinion

scores as humans, though did not try to use this to reproduce psychophysical results.

Similarly, Ferzli and Karam developed metrics for quantifying blur such that images

of different scenes could be compared to establish which was most in focus.

Statistical patterns in natural scenes were found by Carlson, which led to the

observation that amplitude ≈ 1/fα – that is, across a wide range of natural scenes

there was found to be a relationship between the amount of energy at each frequency,

and the frequency itself, with the relationship characterised by the slope parameter,

α. Many papers tried to understand α, and to show that human vision is optimised

for the alpha values found in the natural world, though there is debate about what

“optimised” would mean and what would be exhibited by such optimisation. Whilst

changing α does change the degree of blurriness in the image, there is no α which

universally corresponds to the sharpest image, nor does changing α produce the sort

of defocus produced by optical systems.

Tadmore and Tolhurst suggest α drives accommodation: “a high sensitivity to

changes in α is required when the image is defocused so that the appropriate ac-

comodation response can be evoked”. Though, Field subsequently shows “α is not

sufficient to predict when an image in in focus”, and thus it seems unlikely that it

could be the measure used to focus the eye.

Approaching the subject of blur from the engineering perspective are the auto-

focus algorithms in cameras and microscopes. Since the 1970s a growing assortment

of focus measures have been developed. Based on numerous different underlying ap-

proaches, these measures have been compared with a widely used evaluation meth-

odology. Published results show that a normalised variance method performs well.

Modern cameras have moved beyond simple autofocus, adding exciting features

that, one assumes, the manufacturers believe will drive sales. On the research front,

image processing in cameras is showing impressive results. The plenoptic camera, by

capturing the entire light field, means the image can be focussed after acquisition.

Other fields depend on autofocus and have very specific requirements – automated

screening can require time-sensitive reagents to be added to microscope slides, and

for images to be captured automatically within a particular time frame. However,

no general purpose algorithm has been proposed in the literature.

2.6 Discussion

Despite clear progress being made in the understanding of the human vision system,

there remains an uncertainty about the method by which blur is quantified. The

61

exploration of the neurological pathways involved in accommodation has clearly

progressed over the past century, though the difficulty of determining brain function

means that some exploratory procedures appear very crude – for example, making

lesions in regions of the brain and observing their impact (eg [31]). Preliminary

work by Fylan measured the VEPs when looking at blurred stimulii and might shed

more light on the neural interpretation in a less invasive manner. Regardless of the

techniques employed, it is clear that whilst the pathways are known, there is no

explanation from this line of research as to how the image formed on the retina is

interpreted and appropriate signals formed for the ciliary muscles.

Various psychophysical behaviours, responses and thresholds have been estab-

lished, but to-date a blur discrimination assessment has not been performed with

real images - stimulii tend to be simple edges. Whether similar results are obtained

with natural scenes when subjected to a natural point spread function has not been

shown.

Many theories have been proposed to explain perceived blur. Bex summarises

these theories saying: “For example, the perceived blur of the edges in an image

could depend on the gradient at the zero crossings [43], on the separation between

peaks in either the second derivative of luminance [45] or in the summed outputs

of a bank of band-pass filters [113], the scale of the filter producing peak response

to a blurred edge [140], the slope of the amplitude spectrum of the image [9] or

the relative contrast at high spatial frequencies [74]”. Each can explain different

portions of the response, but no comprehensive or universal model has been derived

or tested.

Another thread of model development has examined the human visual system’s

response times to quantified inputs - the same approach used when characterising

physical systems. However, the latency, settling time, initial error, overshoot, damp-

ing etc whilst of value, are not necessarily applicable when looking at a stationary

target. Philips showed that the same responses are elicited when showing a blurred

stimulus as showing an image of a blurred stimulus [26], greatly increasing the prac-

ticality of conducting studies into blur. Thus, whilst there might be possible model

improvements, the lack of a measure of perceptual blur (eg see Figure 2.5) is an area

ripe for further investigation.

Measures of blur are widely used in cameras and microscopy though there appears

to have been no attempt to use these measures to reproduce any psychophysical re-

sults. Instead, they have been compared by rating their accuracy and characteristics

of the focus curve, with variance-based measures being found to be best. No one

has proposed a methodology (or results) for establishing such a curve for humans

62

observing a scene, nor have real world scenes been examined, nor proper ground

truth established.

Modern cameras have created a plethora of additional focus related features to

help differentiate them in the consumer market place. However, there will always

remain scenes whose optimal focus distance can only be found by knowing the

interests of the observer; the top-down objective. Perhaps the point of interest in

the scene (at a later date) differs from the original subject, or simply that different

viewers of the same scene have different interests; a geologist and botanist looking at

an alpine photo will perhaps be drawn to the rock or flora respectively. Advances in

light field photography mean such a dichotomy of requirements can now be met by

post-processing, but do mean that all need their focus distance establishing - optical

focus is eliminated. Thus, despite the change to focusing after capture (rather than

before) these advances still have a need to focus.

Establishing the consensual ‘best’ focus across a range of subjects is essential to

better compare focus measures and understand population diversity, but appears

not to have been done before.

The performance (in terms of computational time) is discussed in many papers,

though with continuous improvements in general purpose computing and the ever

present opportunity to design ASIC solutions mean these should not be considered

a priority when comparing focus measures. The number of frames required to be

captured, and any hill climbing strategies for finding a global maxima that reduces

this figure are similarly areas for design and production optimisation, rather than

of any immediate value when assessing the performance of focusing strategies.

2.7 Thesis statement

As Wang says: “Human observers are bothered by blur, and our visual systems are

quite good at reporting whether an image appears blurred (or sharpened). However,

the mechanism by which this is accomplished is not well understood” [23]. Mech-

anisms for autofocus are well developed, but have not been compared with human

opinion or perception. The literature, discussions and conclusions lead to the thesis

that will be addressed in this research, which can be stated as:

“It is hypothesised that it is possible to construct an algorithm that

accurately replicates human perceptual and subjective experimental re-

sults”

63

2.8 Key research questions

The thesis statement, and the discussions from a review of the literature lead to key

research questions which need to be addressed. Throughout this work, a range of

real-world images should be used.

1. Can ground truth data be obtained that is suitable for testing focus measures?

2. How well do focus measures perform when compared against the ground truth?

3. How can human blur opinions and perception be measured?

4. Can human results be compared with those from focus measures?

The following chapter introduces the methodology, and more details of the vari-

ous algorithms and processes that will be used in answering these questions. Chap-

ters 4 onwards move on to provide the results of experiments that have been per-

formed, and then Chapter 8 discusses the results and draws conclusions to answer

the thesis, before discussing future work that could be done.

64

Chapter 3

Methodology

In order to answer the key research questions, different experiments need to be

undertaken and analysis performed. Each step of the process requires careful prepa-

ration of both input data and experimental procedure as well as the selection of

appropriate analytical techniques for understanding and interpreting the results.

This chapter describes experimental procedures that have been used in previous

work, compares and contrasts their approaches, and describes practical considera-

tions about how the activities are to be performed, then moves on to discuss the

necessary data analysis.

3.1 Image selection and acquisition

To explore the estimation of defocus of humans and algorithms, a library of images

is required. For each scene multiple images, each corresponding to different focus

depths are required. Despite an extensive review of prior work, there appears to be

no publicly available library of images captured at different focus positions. Such

image sets have been captured for related work (eg [141,142] which attempt to merge

multiple images of the same scene to achieve an infinite depth of field), but neither

the source of these images nor a description of their acquisition is provided. In the

future, it is likely that light field photography (eg [136]), will be able to produce the

required set of images in a single exposure. Unfortunately, the present state of the

art of light field photography is of relatively low resolution.

Accordingly, it was necessary to capture scenes for use in this work. As multiple

images need to be captured for each scene, it was found to be impractical to acquire

these manually. Despite great care, adjusting the camera’s lens frequently resulted

in slight movement of the tripod, and thus changed the view of the scene. Instead,

it was necessary to capture images automatically.

65

Some vision research areas, typically those examining the visual pathway and

processing of natural scenes, warrant careful camera calibration. Specifically, there

is a danger that observed behaviour in humans could be a result of artifacts arising

within the camera’s image forming process, and not actually due to the intended

natural scene stimulus. Brady and Legge provide an extensive camera calibration

methodology [143]. They compare a number of lenses and camera bodies, and show

(of the ten characterisation parameters they examine) some parameters vary as the

camera’s lens, zoom, exposure, aperture, ISO and object distance vary. However,

they report that two variables, object distance and lens focus, cannot be charac-

terised, thus rendering their procedure of doubtful use to this work. On this basis,

camera calibration has not been performed, a decision which is especially mitigated

by the fact that there is no requirement to vary camera parameters other than focal

distance during image acquisition, thereby not needing the benefit of Brady and

Legge’s approach to handling different luminances, zoom positions, or other camera

settings.

That is, whilst it is possible to establish the precise configuration of lenses re-

quired to ensure light rays at a particular point in a scene are focussed, characterising

a given camera lens for its entire field of view is an arduous task, and not central

to this work. Instead, it is assumed that a camera (or other imaging device) can

produce images of a given scene at a range of focal distances. This work then ex-

plores how these images can be compared to determine which is the ‘best’ for the

entire scene. Such an approach is independent of the optics of the camera, and other

device-specific artifacts arising from image acquisition.

3.1.1 Capture

There are several approaches for automatically capturing images from a camera, as

follows:

1. On-camera script on the DC-290: Certain cameras, such as Kodak’s DC-

290, allow for programs to be written on a computer and downloaded to the

camera, using the Digita language [144]. Whilst the scripting language allows

for a large number of possible focus positions to be specified, testing on the

camera itself revealed that only seven different positions are achievable. See

Appendix D.1 for further information.

2. Programmable video cameras: Sony have manufactured a range of video

cameras, including the EVI-D30, designed for use in video conferencing appli-

cations. To facilitate their installation and use, they can be controlled via a

66

(a) Olympus C-2040 Zoom (b) Olympus 500-UZ

Figure 3.1: Cameras

serial communications protocol which (in addition to controlling pan, tilt and

zoom) allows the in-built auto focus mechanism to be disabled and the camera

focussed at an arbitrary position. See Appendix D.2 for further information.

3. PTP camera control of still digital cameras: Many cameras built since

2003 support the picture transfer protocol (PTP), which allows computers and

printers to communicate with cameras to download and print photos. This

protocol is not solely one-way, but does allow the host to control the camera

to some extent. Whilst most cameras allow for the computer to request a

photo be taken, and some allow for the shutter and aperture to be specified,

only Olympus cameras allow their focus to be specified.

4. Bespoke scientific cameras: Some papers describe highly specialised equip-

ment for automatically capturing images (eg [145]). However, because of the

high cost associated with such equipment, and that calibration is not con-

sidered in this work (which is centred around image selection from a set of

images with varying de-focus, rather than precise focus points), this route has

not been pursued.

The Sony EVI-D30 responds rapidly to focus commands, which together with

a PC-based video capture card permits images to be captured very rapidly. Un-

fortunately, image quality obtained from a video capture device connected to PAL

cameras is of significantly lower quality than using digital still cameras, as a result

of interlacing and ultimately of the sensor being a lower resolution. As such, this

device was not used for any detailed studies described in the subsequent chapters.

67

Figure 3.2: Fresh flowers moved during the course of a photography session. Note howthe top petal of the central blue flower moves in relation to the centre of the white flower

Two cameras were evaluated using the PTP protocol; Olympus’s 2 mega-pixel

C-2040Z and 6 mega-pixel 500UZ. Both of these were assessed using the open source

ptpcam/libptp software libraries running under Linux and using Pine Tree Comput-

ing’s Camera Controller on Windows. Both cameras allow their focus to be con-

trolled to one of 240 positions, though only the 500UZ model has sufficient memory

and battery life to actually capture that many images in a single session and was

the camera used in this work. (A set of 60 images took approximately 12 minutes

to acquire using Camera Controller).

3.1.2 Scene

Independent of the technique used for capturing images is the preparation of the

scene that is being photographed. Ideal sample images should include a clear subject,

which occupies and prominent and substantial portion of the scene, and feature

relatively obvious edges.

As the acquisition takes a non-zero length of time, natural fluctuations in day-

light (such as from the movement of clouds) cause brightness differences between

images. To minimise the impact of this, it is necessary to ensure the scene is illumi-

nated by artificial light and away from any air movement. Despite such precautions,

certain objects did move. A bunch of fresh flowers was considered as a possible

scene. Unfortunately, the flowers opened and changed shape during the course of

the capturing, rendering them an unsuitable subject (see Figure 3.2).

Scenes were chosen from two categories. Firstly, scenes with minimal depth of

field, such as a flat layer of coins or piece of material. Secondly, objects with a clear

difference in depth, such that each image of the scene clearly had a portion of the

object out of focus. To achieve this, small objects were used with the camera at

68

(a) Original [146] (b) Simplified

Figure 3.3: Rendering of a bolt with Povray

a distance typically between 25 and 50cm. All images were captured with a large

aperture (thus small depth of field) using the camera configured to use its best

quality.

As no technical information about the lens optics was available from the manufac-

turers, it is not possible to quantify the focal distance for each photograph. Instead,

it is solely possible to sequence the photographs in order of software-requested focus

position. Each scene was photographed from nearest to farthest focus to minimise

any hysterisis, mechanical lag, or other confounding behaviour that the camera

might exhibit if captured non-sequentially.

3.1.3 Other sources

Two further sources of images were used: First, the open source ray tracing software

Povray was used to generate synthetic scenes. Ray tracing is a physically accurate

means of modelling how light interacts with virtual objects within a scene. An

image [146] was selected from the Povray Hall of Fame [147], and is shown in Figure

3.3(a). This was then simplied to produce an image for further experiments, as can

be seen in Figure 3.3(b).

Povray’s default behaviour is to render the scene with a pinhole aperture. This

means that all points are in focus, and minimises the computation time required to

render each scene, though this is clearly undesirable when trying to generate a stack

of varying focused images. However, it is possible to build a stack of images focused

at different distances, by specifying the focal_point parameter (this specifies the

coordinates of the point at which the virtual camera should focus) and using a

combination of blur_samples, confidence and variance to control the quality of

the rendering.

69

Secondly, to assist with comparing experimental results with published literature,

existing calibrated images have also been considered. A calibrated monitor is used in

experiments to view images so as to eliminate possible distortions that may confound

the results. On such a device, a proportional increase in photon emission rate will

result from a fixed increase in image brightness – ie the computer monitor will

emit light directly proportional to the pixel value. Thus, the manner in which

the mathematical measures will ‘see’ the image is the same as for humans, and

so be a fairer method for comparing the human visual system with computational

techniques.

Given this level of careful control, and the fact that work exploring the sensitivity

of the human visual system to image properties within natural scenes (eg [148])

makes use of existing libraries of calibrated images, it is logical to consider them

for comparison. Related work investigating human sensitivity to contrast use such

libraries [149], whilst others explicitly choose to use non calibrated images from

other sources (such as still frames from commercial DVDs [150]).

Two image libraries were considered: McGill [151] and Van Hateren [152]. Of

these, Van Hateren was selected because its images are greyscale, and thus eliminate

any confounding factors arising from the challenges of extracting perceptual bright-

ness from colour images. The Van Hateren library is made available in two data

formats, IML and IMC. The IML images are “slightly blurred by the point-spread

function of the camera (in particular due to the optics of the lens)”. This is reversed

in the IMC images “by deconvolving the images with the point-spread function cor-

responding to the used lens aperture ... therefore this image set is best suited for

projects where well-defined edges are of more importance than strict linearity” [152].

Accordingly, the IMC images were used in the experiments described in this thesis.

Of the hundreds of photos within the Van Hateren set, images which appeared to

be in focus were selected – those with a blurred foreground or background were

rejected, as were those that appeared foggy. Figure 3.4 shows some accepted and

rejected images.

3.2 Mathematical focus measures

One of the key research questions is to investigate whether autofocus measures

can be used to reproduce psychophysical results. It is therefore essential that all

measures are implemented in a consistent software interface to simplify their use as

model observers in these experiments. The original literature was reviewed for each

measure, using review papers from 1985 [96], 1991 [101], 1997 [99], 2001 [94] and 2004

70

(a) Field (#2238, rejected as it looks foggy) (b) Windmill on far bank of lake (#3223,rejected as the foreground is out of focus)

(c) Tree bark (#1342, accepted) (d) Office buildings (#5, accepted)

Figure 3.4: Selected images from the Van Hateren image library [152]

71

[97] as the primary starting points. Frequently this contained insufficient information

to reproduce the algorithm, whereupon subsequent work citing the original approach

was reviewed. This helped in the selection of parameter values, and in one instance

to resolve ambiguities in the equations, most likely caused by poor typesetting in the

equations (compare equation F8 in [97] with equation 5 in [108]). The net result is

that over four dozen measures have been implemented in Matlab with a consistent

type signature and output. (The full source code of each measure is included in

Appendix H).

Several measures were not implemented due to unresolvable ambiguities in their

description, and so are excluded from further analysis. Several new measures have

been created during this work and are documented in Appendex C.

Certain measures (as described in literature) operate in the opposite direction,

producing a lower score when in focus (eg the entropy measure). As part of the

implementation, the polarity of measures was normalised such that a high score

corresponded to most in focus for all measures.

The emphasis of these implementations has been on clarity and simplicity, so as

to ensure they are an accurate representation of the intended mathematical func-

tions. To maximise clarity, and reduce the chance of any potential errors in imple-

mentation, no effort has been made to optimise performance (see Section 2.3.1).

For example, Sun describes the Brenner Gradient algorithm as one which “com-

putes the first difference between a pixel and its neighbor with a horizontal/vertical

distance of 2”. However, the mathematical equation only suggests a horizontal off-

set, as does the original paper [97, 153]. Similarly, Crane describes his algorithm as

a “measure of derivative”. However, care must be taken not to compute the 1D

derivative in one direction, then the other, as this can render a simple step input

invisible. Instead, the score must be the result of the ∇I. Such subtleties are not

described in the original literature, but are critical for successful use and evaluation

of the various methods.

Chern notes that the focusing window (ie “the region of the scene that is to be

focused”) must be considered [94]. In this work, the focus measures operate over

the entire input image.

3.3 Colour

The ability to perceive different colour is something that humans rely upon in their

everyday lives; from being able to appreciate scenery through to telling whether a

traffic light is red or green. However, the perception of colours varies from person to

72

person. Computers, on the other hand, need to be able to store colours numerically,

such that they can be reproduced at a later date. This is done by representing a

colour as a set of co-ordinates in a colour space. There are a number of different

colour spaces (such as specified in [154, 155, 156, 157]), each of which has its own

particular benefits, applications and disadvantages. Computers typically use an

RGB representation.

Once a colour has been represented numerically, it is then frequently desirable to

establish the difference between two colours. This is trivially done by computing the

Euclidean distance through the colour space. However, two equal colour distances

at different points in the RGB colour space are not necessarily perceptually similar –

the establishment of a perceptually uniform colour space is an area of active research

(eg [158, 159]). When such a colour space is used, experimental results in a wide

range of areas of research are better than when using simpler colour spaces such as

RGB (see [160,161,162]).

Most of the focus measures described in the literature rely solely upon the grey

level intensity of the constituent pixels. So, whilst Crane’s proposed measure is to

compute the derivative of the image, and this could be extended to be computed

independently on each colour channel, there is then the challenge of determining how

to combine the per-channel measure to give an aggregate score for the entire image.

This could be found by calculating a per-pixel Euclidean distance, then summing the

distances. Or simply the sum of the overall per-channel score, or one of many other

possibilities. If a consistent approach is to be taken with all focus measures, then

many options become ruled out because of the nature of the different algorithms

– summing the outputs from multiple wavelet transforms has little meaning. In

support of this decision, Chern observed no significant difference in global maxima

if different colour channels were used (ie red, green, blue or grey), and conluded that

for most situations “greyscale focussing should be adequate”.

Furthermore the visual system’s response to colour and blur is more complex:

Webster observed that “blurring only the light-dark variations in an image produced

obvious changes in perceived image blur, yet when the same blur was applied only

to the colour variations the image remained perceptually well-focused.” [89, p113].

A similar discrepancy between monochromatic images along different colour axes

was observed by Wuerger [84].

Thus, rather than using colour images, all images are reduced to a single channel

representing luminance from RGB using the NTSC formula:

luminance = 0.2989 ∗R + 0.5870 ∗G+ 0.1140 ∗B (3.1)

73

3.4 Psychophysical assessment

Psychophysical measurements establish the perceptual behaviour of human senses.

In this work, the perception of blur is being assessed. This is typically done by

establishing the blur detection threshold (that is, the amount of additional blur

that must be added to an unblurred image before it can be perceived), and the blur

discrimination threshold (how much more blur must be added to an already blurred

image before the extra can be perceived). In both these cases, the threshold is the

extra amount of blur that is required to cause the subject to make the correct deci-

sion on a predetermined (ie criterion) proportion of trials. Typically the experiment

starts with a large additional amount, and then reduces the additional blur until

the subject is only able to make the right decision 82% of the time.

The experimental procedure for establishing these thresholds varies between the

previous work in this field, though the different approaches are broadly similar. The

key variations are:

3.4.1 Task

Establishing the blur threshold requires the subject to make a decision between stim-

uli, typically choosing between two or three stimuli. In a two-stimuli arrangement

(called 2AFC - two alternative forced choice), the subject indicates which is least

blurred. With three stimuli, two images are the same and the subject must indicate

which of the three stimuli is the odd-one-out. In this work, 2AFC presentation was

used.

3.4.2 Stimuli presentation

Independent of the number of stimuli is the choice of presentation used, either

temporally or spatially separated. The spatially-separated 2AFC method shows all

the stimuli at the same time, but spatially separated. For example, two images

might be adjacent to one another, and the subject indicates whether it is the left or

right image that is least blurred. The alternative option is to separate the stimuli

temporally - that is, show one after the other, and the subject indicates whether the

first or second image is least blurred. This latter method has the advantage that

there is no need for the eye to saccade between the stimuli, though does prevent

the subject from making repeated comparisons between the stimuli which could be

done with a single-interval approach, albeit at the cost of longer experiments. These

experiments used two images, temporally separated.

74

3.4.3 Presentation duration

Campbell showed that, when the subject had an unconstrained viewing time, dis-

crimination thresholds were unaffected as pedestal magnitude changed. Only when

the viewing time was constrained did the discrimination thresholds change with

pedestal [42]. Thus, it is clear that presentation time must be limited. Investi-

gating the opposite, the minimum presentation duration, Westheimer showed that

performance plateaus when stimuli are presented for longer than 130ms [48]. So, a

presentation duration longer than 130ms, but still of constrained duration, should

be used. Typically, stimuli are displayed for 200-500ms (eg [163,48,164]), and a du-

ration of 300ms (with a 500ms inter-stimulus interval) was found to be satisfactory

in preliminary trials for these experiments, and was used in these experiments.

3.4.4 Screen

Over the years, the available technology for performing psychometric evaluations has

changed, from 12-bit DEC minicomputers connected to an oscilloscope of known

phosphor type (1970, [40]), through 35 mm film transparencies on a balsa-wood

carrier driven by a servo motor [50]. Most recent trials use a conventional CRT

screen connected to a general purpose computer, operating at a refresh rate of at

least 75Hz.

To achieve linearity, the transfer function of the monitor is typically determined

using a photometric sensor to measure the brightness whilst the computer is config-

ured to display a particular intensity, and repeated at various intensities. From this,

a look-up table is created to convert desired intensity into the pixel values required

to achieve that intensity.

To maximise the number of distinctly achievable intensities, a video attenua-

tor is connected between the computer and monitor which combines the individual

red, green and blue signals in a predetermined ratio, and use the resultant signal

to power each channel on the monitor [165]. That is, an arbitrary (R, G, B) pixel

value will result in a precise grey-level being displayed on the monitor. The exact

ratios within the attenuator are not of great importance as the overall transfer func-

tion also incorporates the output impedance of the computer’s graphics card, slight

discrepancies in the digital to analogue converters (DACs), and other confounding

factors. Accordingly, the lookup table is constructed by treating the entire system

as a black box with the input being the specified (R, G, B) value sent to the operat-

ing system, and the output being the brightness observed on the screen, to achieve

linearity. This is the same approach as was used by Parraga et al [77].

75

It is also important to ensure that the stimulus is sufficiently bright for the cones

within the retina to fire, as these are the neurons that dominate the fovea and

provide the high acuity vision [17]. They start to fire when the illumination is above

1 cd/m2 [166], and this is well beneath the range of a standard CRT monitor (up to

approx 100 cd/m2), meaning no special measures need taking.

Recent work, such as that by Karatzas has looked at methods for screen calibra-

tion that do not require specific hardware, such as a photometer or colorimeter [167].

It has interesting possible applications, such as for displaying museum artefacts over

the internet in a manner that ensures visitors can see details of the artefacts in their

true colours. However, psychometric experiments have yet to be conducted with

this calibration strategy.

In summary, a CRT monitor connected to a computer via a video attenuator

was used in this work.

3.4.5 Screen position

The stimulus can either be viewed with the fovea (that is, arranged such that the

viewer looks directly at the stimulus, and for the stimulus to not extend more than

the few degrees of angle that the fovea subtends), or shown in the periphery, requiring

the subject to fixate on a target, and assess the stimulus with the periphery of their

retina. As the natural way of looking at an object is to simply look at it, this

foveal approach is what is used by the majority of previous work, and is used by the

experiments in this work.

3.4.6 Viewing distance

Most papers use a viewing distance of between 1 and 2m, and ensure that the

stimulus is confined in angular extent to be entirely visible in the fovea. For example,

Weurger uses stimuli presented at a distance of approximately 1.16m on a 19” CRT

monitor (with a visible screen area of 445mm diagonally). In this work, each stimulus

was 256x256 pixels in size, and the screen was operating at a resolution of 1024x768.

Thus, the stimulus was 89mm square, and at 1.16m this corresponds to a viewing

angle of 4.4 degrees [84].

3.4.7 Viewing method

When viewing the stimulus, the observers might use both eyes or just one (having

the other covered with an eye patch), and might be kept in a fixed location, such

as by using a chin rest, or might be free to move around. Some previous work does

76

not describe the precise viewing method (eg [84]), and it is assumed in these cases

that this unconstrained binocular vision method was used.

A preliminary trial showed no significant difference in psychophysical thresholds

between monocular and binocular observations (see Appendix B). As such, and be-

cause each subject was required to complete a large number of trials, unconstrained

binocular vision was used, as this is the natural way of using ones eyes, and likely to

be the least tiring. A comfortable chair was used, at a fixed position, so as to help

ensure the subject remained at a distance of 1.16m – the distance was reconfirmed

at the start of each trial run.

3.4.8 Observers

Once the experimental arrangements are known, it is necessary to select subjects

for participating in the trials. Frequently a small number of observers are used - for

example, just two were used in [56,77], six in [84]. Other experiments have used a few

subjects for all experiments and then validated the results on additional subjects,

such as [79]. It is also common (eg [168]) to use both subjects who are aware of

the purpose of the experiment, as well as those who are not (‘naıve’ observers). All

observers should have good (or corrected) vision, and different methods have been

used to assess this. For simplicity, subjects for these experiments were required to

have had an eye test within the past 12 months, and to be wearing any prescribed

correction. Subjects who said they were colour-blind were not used. Four observers

were used in this work.

3.4.9 Number of trials and analysis of responses

The exact methodology for presenting stimuli, and for analysing the results, also

varies between previous papers. There are two approaches typically employed -

either using a simple staircase (analysed in [169]), or using a more intelligent stimulus

selection strategy, such as QUEST [53]. A staircase approach means that a large

additional-blur is displayed, such that the observer makes the correct decision as to

which image is more blurred. Then, for as long as the observer makes the correct

decision, the blur is reduced. Once the observer makes a mistake, the staircase

‘reverses’ and blur is increased until they once again make the correct decision. The

final threshold is then taken as the average of the last few reversals after a fixed

number of reversals have been recorded.

To reduce the number of decisions that the observer needs to make (and thus

the time taken to perform the experiment), an estimate of the current most-likely

77

threshold can be made at each iteration, and used as the next stimulus. One such

approach has been implemented as a reusable software library: QUEST. The de-

tails of implementing the QUEST model, and how its results are interpretted are

described later (see Section 3.5.1).

3.4.10 Preparing the stimuli

Section 3.1 describes the sources of images used for these experiments. The psy-

chophysical experiments apply varying amounts of mathematically added blur to

a single, in-focus image, of each scene. To do this, a mathematical model of blur

must be applied. Work by Bex [114] compares the perceptual response of applying

blur via three routes: Varying α (the slope parameter, see Section 2.2.5), convolv-

ing with a 2D sinc function, and convolving with a 2D gaussian. It supports the

results of Field and Brady [72], showing that varying α is not a satisfactory model

of blur. Instead, blur must be applied by convolving the image with a model of the

camera’s optical system’s point spread function. Pentland showed that the PSF of

a camera’s optical system can be approximated by a Gaussian kernel [19], and this

is the method used in the majority of previous research.

Several of the previous experiments using a gaussian kernel do not explain how

the additional blur is quantified. For example, Paakkonen says “the blur width of an

edge was specified by the standard deviation of the gaussian.”, but plots his results

using a scale of arc minutes [54]. This implies an equivalence whose justification

or proof is not mentioned, but is assumed to be present in this, and other papers

(eg [45, 56]). An alternative approach for applying image distortion is to measure

the result in terms of root mean square (RMS) contrast distortion. This was done

by Chandler et al [90] when assessing how observers respond to different types of

distortion. In their experiment, a range of different distortions were applied to

the original images such that each resultant image had a specific RMS contrast

distortion.

As image processing affects the images appearance, there might be other artefacts

that arise beyond the desired blurring which could be used by observers as a cue to

the extent of the blur. The most significant of these confounding factors is contrast:

As blur increases, the amount of contrast present in the image decreases. There

are several strategies which could be employed to ensure that contrast cannot be

depended upon when asessing blur – either randomising the contrast, or normalising

it between images. Tolhurst et al’s experiments used stimuli that “were constrained

to have the same overall power and the same mean luminance; hence, they also had

the same RMS contrast” [74]. Others have normalised the image post-distortion to

78

have a full dynamic range, or to preserve the mean brightness.

In this work, performance comparisons between humans and algorithms are being

conducted. Providing that the same stimuli are used for both sets of observers,

then the experiments are fair, regardless of the confounding factors that might be

used to help observers discriminate blur. Secondly, images that are subjected to

contrast randomisation or normalisation do not look as ‘normal’, and as this work

is motivated by real-world applications, no contrast manipulation is performed on

the stimuli.

For the experiments described in this work, convolution with gaussian is used,

numerically quantified by the standard deviation of the function, with no post-

processing or normalisation. The size of the gaussian kernel (in pixels, σ), is then

mapped into an amount of blur (in arc minutes) using Equation 3.2:

blurarc minutes = 60× arctan

(σ × pixel widthviewing distance

)(3.2)

As filtering cannot work at the image boundaries, images are typically extended

by means of reflection, replication, or periodic repetition. However, as in this work

only a portion of the image is used following blurring, no image extension is used

but instead the blurred image is cropped by the filter radius.

3.4.11 Summary

A variety of methods have been used in previous experiments, with a broadly similar

approach. The methodology being followed here is a hybrid of those used by Wuerger

[84], Morgan [163], Burr [56] and Kayargadde [59], and resulted in preliminary trials

giving expected outcomes.

In summary, images from the Van Hateren image library were used. Blur was

applied synthetically using a Gaussian kernel, and images were presented using

temporally separated 2AFC. Images were presented for 300ms, with a inter-stimuli

interval of 500ms, on a CRT monitor connected to a computer via a video attenuator

to achieve better linearity. Four observers were used in these experiments, each

looking directly at the stimulus with binocular vision from a distance of 1.16m,

corresponding to a viewing angle of 4.4 degrees. QUEST was used to select candidate

stimuli for presentation.

79

3.5 Mathematical tools

The operation of the experiments, and subsequent data analysis rely on a number of

mathematical techniques. Some, such as ANOVA, are well known techniques applied

in standard ways, but others deserve discussion, as they are either less familiar, or

have been used in particular ways for this work:

3.5.1 QUEST

QUEST, introduced above, is “an adaptive psychometric procedure that places each

trial at the current most probable Bayesian estimate of the threshold”. That is, when

trying to establish a threshold stimulus intensity, such that the observer can only

make the correct decision between stimuli a certain percentage of the time, QUEST

can recommend the optimal next intensity to measure, and thereby reduce the num-

ber of stimuli required to establish the threshold. It was proposed by Watson and

Pelli in 1983 [170], and made more readily usable by being included in Pyschtool-

box [171,172], a library of useful functions for performing psychometric experiments

in Matlab. It has been used in many previous experiments (eg [163,168,173]).

The QUEST model has several important parameters (additional explanation

can be found in [174]):

Prior estimate This is the starting point for QUEST’s estimates. This could be

an accurate estimate (and thus potentially reduce the number of trials), or

significantly away from the anticipated result, so as to ensure the observer

has an unambiguous first trial. In preliminary experiments, it was found that

observers required less reassurance when the first few trials were unambiguous,

thus a large estimate of gaussian kernel’s σ = 0.5 was used.

Standard deviation of prior estimate A large relative value was used (3, cf σ =

0.5), to reflect the fact that the actual anticipated result was considerably

different to the prior estimate supplied to the model.

Threshold The threshold being measured, 82%.

Beta The steepness of the psychometric function. Beta of 3.5 was used.

Delta The proportion of trials where the observer makes a key-press error. Typi-

cally this is 0.01, though a value of 0.1 was used.

Gamma The fraction of trials that will generate an erroneous response with lowest

intensity. Gamma of 0.5 was used.

80

Separate to the model’s parameters are considerations as to how the model itself

should be used, and results selected. As the task, especially for observers who

have never previously participated in psychometric experiments, is not one which

is regularly performed in day-to-day life, a ‘dummy’ phase was employed. The

decisions made by the subject in response to the first few stimuli were not fed

back into the QUEST model, thereby helping to ensure that the observers were

comfortable both with their physical environment, and the task at hand, and to do

so without affecting the probability distribution function (PDF) within QUEST.

In addition to considering how the experiment starts, it is necessary to consider

the termination criteria, and to select the results. Morgan et al [163] used a fixed

number of trials (50); once these had been completed, they averaged the results of

multiple experiments to determine the threshold (± a confidence interval). Prelim-

inary experiments for this work showed that multiple (interleaved) repeats of the

same pedestal blur by the same observer did not yield the same threshold after 50

trials. And, after 100 trials, there is no significant improvement in inter-trial agree-

ment. Accordingly, the results of multiple trials shorter trials were aggregated, as

explained below.

To help minimise habituation and similar confounding factors that might be in-

troduced if the same scene were viewed on every single stimulus, several experiments

were interleaved during each session. To achieve this, a list of all experiments to be

performed was prepared in software, and then the software ensured that five experi-

ments were running concurrently. That is, when a particular condition was finished,

the software pseudorandomly selected the next experiment to perform, and add that

to the list of active experiments. At each iteration, the software selected an active

experiment, and asked QUEST to suggest the threshold to assess. The order of

presentation (that is, b + δb then b or vice-versa) was determined pseudorandomly,

and then the stimuli were presented.

Once the QUEST process has been followed, it is then necessary to select the

results. Previous research has alluded to only some results being considered (for

example, saying “Each estimate of threshold was based on at least three separate

determinations (QUESTs) per measure” [56], whilst Simmers et al used “at least four

separate determinations” [173]), though the methodology by which determinations

are selected for inclusion in presented data is not discussed.

Preliminary trials showed that, in most cases, the QUEST determination does

neatly converge, but that occasionally this does not happen. The reasons for a lack

of convergence have not been explored, as it occurs on a minority of trials (less

than 10% of trials). Figure 3.5 shows two trials for the same observer and the same

81

0 5 10 15 20 25 300

1

2

3

4

5

6

7

8

Add

ition

al b

lur

Presentation index

(a) Converging

0 5 10 15 20 25 300

1

2

3

4

5

6

7

8

Add

ition

al b

lur

Presentation index

(b) Not converging

Figure 3.5: In most trials, QUEST’s recommendations gradually converged, as can be seenin (a). However, in a small minority of trials, such convergence did not occur, and resultssimilar to (b) were obtained. These examples were from the same observer and underthe same conditions. A bootsrapping procedure was used to fit the data to psychometricfunctions, ensuring the observations made during non-convergent QUEST determinationswere not discarded.

experimental conditions and stimuli, though only one of the trials converges.

In this work, QUEST is used to improve the speed with which data is collected,

but the final threshold is not taken as the result of QUEST’s convergence. That is,

QUEST is used to direct the experiment to the next optimal measurement, rather

than requiring the observer to respond to pairs of images at every point in the sample

space. Once several determinations of QUEST have been performed, the results for

each condition are then aggregated and tabulated, recording whether the subject

responded correctly or not at each stimuli under test. Thus, for a given pedestal

condition, there are a minimum of 120 data points, each comprising the additional

blur being discriminated (δb), and whether or not that additional blur could be

discriminated. These results were then subjected to a boot-strapping procedure to

estimate the 82% threshold, and its 95% confidence interval. This was achieved

by fitting the data to psychometric functions using psignifit, a software package

which implements the maximum-likelihood method described by Wichmann and

Hill [175,176].

In summary, the method of using QUEST in this work is most similar to Burr et

al’s approach [56], but using bootstrapping to establish the psychometric thresholds

rather than taking the result of each QUEST determination.

82

3.5.2 Computation and manipulation of the slope parame-

ter

The statistics of natural scenes, and discovery that the amount of energy present at

each frequency drops off as frequency increases, is introduced in Section 2.2.5. The

method used in this work to calculate the value of α for a given image is based on

Bex and Dakin’s approach for computing the power present in each octave [150].

From these results, a linear best fit is performed to determine α.

To adjust α, each coefficient of the shifted fast Fourier transform (FFT) of the

image is multiplied by a specific scaling factor. The matrix of scaling factors was

determined experimentally to be a matrix of Euclidean distances from the centre,

with each distance raised to a predetermined power, P:

P = (αdesired − αcurrent) /0.995 (3.3)

The DC component of the FFT was unchanged by setting its scaling factor to

1. The full code for changing an image’s α, as a Matlab script, can be found in

Appendix H.35.

3.5.3 Scoring focus measures

The method for scoring focus measures is introduced in Section 2.3.5, which explains

the techniques used by other researchers. However, when focus measures were being

scored, several issues arose.

Firstly, the ‘accuracy’ property is not defined if the focus measure gives multiple

candidate images the same (highest) score. In this work, the best image is found by

finding the mean image with the highest score, rounding down where necessary, and

then the accuracy metric found by determining the difference between the measure-

best and ground-truth.

Secondly, to assist in comparing scores between different scenes, the range value

was computed as a percentage of the number of images, rather than an absolute

number.

Thirdly, a score of zero was considered to be a minimum, even if there was only

an increase in score on one side of the zero. Without this interpretation, the range

of several focus measures is undefined (any focus measure without a false maximum

will lack a minimum on either side of the primary peak, unless zero is considered to

be a minimum).

The full implementation, as a Matlab script, can be found in Appendix H.34.

83

3.6 Data protection and ethics

Advice was sought from the university’s Records Office who advised that the record-

ing and analysis of anonymous data did not require adherence to the Data Protection

Act 1998. Separately, the experiments in this work did not require approval by the

university’s Research Ethics Committee, as they comprise solely the use of behaviour

observations and educational tests for which the participants cannot be identified

nor would there be any consequences of disclosing their responses.

84

Chapter 4

Subjective experiments

The literature reviewed in Chapter 2 makes little reference to establishing the ground

truth when comparing focus measures. Of the papers that do mention how it was es-

tablished, all have used an ‘experienced observer’, or words to that effect – and there

was no evidence of averaging or obtaining the consensus from multiple observers.

This chapter describes a series of experiments that were conducted to find hu-

man opinions about focus in a number of different scenes using a large number of

observers. Two objectives were planned to establish peoples’ opinions about focus.

The first was to establishing the ground truth – which image was most in focus.

The second was to establish whether a focus curve could be created from human

opinions.

Whilst collecting the data for the first objective (Section 4.1), a trial was con-

ducted to establish which of two software approaches was most appropriate for

obtaining the necessary experimental results. It was found that a web-based ex-

periment worked well, and this then used to collect all the data reported in this

chapter.

The focus curve neded for the second objective is not a concept with which

observers are familiar. As such, a series of tasks were given to the observers to

try to indirectly capture the data necessary to plot a focus curve. These tasks are

described in Section 4.2.

4.1 Ground truth

Mathematically, the ground truth can be precisely determined. For a simple optical

system, this is calculated with the thin lens formula, where S1 is the distance from

the object to the lens, S2 is the distance from lens to image, and f is the focal length

of the lens, as shown in Figure 4.1.

85

(4.1)

Figure 4.1: Principle of the imaging provided by a convex lens [177]

However, given S2 and a fixed optical system, there is only one solution for a

single S1. In real world scenes, depth is present, and so it is not possible to focus the

entire scene at the same time, hence subjective opinions of a population of observers

are required to esablish the ‘best’ focus distance.

It is clear, however, that for some scenes the best focus position may differ

between viewers of the scene (eg Example Photograph A.1), or indeed may vary

over time as an observer returns to the photo (eg Example Photograph A.2). But,

whether there are inter-observer differences on less complex scenes has not been

addressed by previous work. To assess this, multiple observers need to be shown a

scene, and be permitted to change the focus until they have identified their preferred

position – the ‘best’ focus distance.

For this experiment, simple scenes of domestic objects were photographed under

artificial light using a tripod mounted Olympus 500-UZ. The camera was controlled

with Pine Tree Computing’s Camera Controller, and configured to take between 20

and 100 photographs at high resolution (for further details, see Section 3.1). Whilst

the software controlling the camera was instructed to request equal changes in focus

position from the camera, it is unclear whether the camera incorporates any feedback

mechanism to confirm it is in the requested position. Nor is it known whether the

camera’s focus positions are equally spaced, and indeed what the mapping function

is between requested position and resultant focal length of the lens. As such, the

camera is used solely for acquisition of images with increasing focus distance, and

no calibration or implied equivalent change between adjacent positions is present.

This does not, however, affect the ability of humans (or focus measures) to make

judgements on these scenes – decisions can be made between images without knowing

their exact source.

86

The acquired photographs were then manually reviewed to ensure no object

motion or other undesirable features were present. Once checked, they were down-

sampled to 640x480 pixels for display, using the resize function in the open source

ImageMagick software.

Using an existing library of calibrated images is discussed in Section 3.1. How-

ever, it is not possible to use these images for this experiment, as the are not available

at a variety of focus distances. That is, whilst gaussian blur (shown to be a good

approximation of optical blur) can be applied to these images, this can only be

done globally. It is not possible to reconstruct the depth information from the pho-

tographed scenes, reduce the depth of field, and then produce multiple images from

the scene with different focus distances, so these libraries cannot be used.

The next consideration is the graphical user interface (GUI) and presentation of

images. Two approaches were evaluated and tested in pilot studies:

4.1.1 Desktop application

An application was written in Microsoft Visual Studio Express. This software plat-

form was selected as it is relatively easy to develop powerful user interfaces that

can run on other computers without requiring a complex configuration process. The

initial user experience objective was to make the task feel like focussing a camera:

That is, trying to minimise the digital feel by making the experience fluid, responsive

and continuous. To assist this experience, a number of input devices were consid-

ered and the Griffin PowerMate selected. This is a USB-connected rotatable dial

with a smooth motion. By virtue of its operation, just a single parameter can be

adjusted, and thus is well suited to the task of ‘focussing’ through a pre-acquired

set of images.

The most challenging (and important) part of the software development was

to pre-load all the images into memory, so that the appropriate image could be

displayed on screen as soon as the dial’s motion determined that it was necessary.

In addition, the transition between images needed to be performed in such a way that

there was no flickering on the screen. By achieving these two aims, the experience of

using the dial to select the most focused image from a series of images feels entirely

natural, and not at all as if there is a computer ‘in the way’.

A pilot trial involving two observers was performed. The observers were unaware

of the purpose of the project, though were briefed on the task to perform. Both

observers encountered difficulty with the task, finding it too time consuming and

also appearing to forget the objective during the task, seeking clarification of the

objective mid-task.

87

After analysis, the results from these two observers during the pilot study were

considerably different to those obtained when testing and developing the experiment.

On this basis, it was decided that a large number of observers should be used for

the experiment.

So, despite the excellent experience available using Visual Studio in conjunction

with the PowerMate experimental setup, the difficulty of recruiting a large body of

observers to participate in the experiment at a fixed location meant that a different

approach was pursued. The original intention to distribute the software to multiple

computers for observers was rendered impractical given the dependence on the Pow-

erMate – a piece of hardware not available on all computers – and so it was decided

to develop a web-based implementation of the experiment.

4.1.2 Web application

Deploying an application over the internet is common practice for business, but less

so in the field of subjective image assessment. By being available to potentially the

entire planet, a diverse range of web browsers, computers and screens could access

the experiment. Rather than try to recruit observers with particular equipment, it

was decided to allow any observer and equipment combination, but to require them

to complete a short questionnaire. This asked for self-declared answers about:

Age Free-text entry

Gender Choice of ‘male’ or ‘female’

Uncorrected vision Choice of ‘short-sighted’, ‘slight-short-sighted’, ‘normal’,

‘slight-long-sighted’ or ‘long-sighted’

Vision correction Choice of ‘none’, ‘glasses’ or ‘contact lenses’

Colour blindness Choice of ‘don’t know’, ‘none’, ‘red-green’, ‘blue-yellow’, ‘other’

Figure 4.2: Griffin PowerMate: A USB device that can send different events to the com-puter when it is rotated or depressed

88

Screen Choice of ‘don’t know’, ‘CRT’, or ‘LCD’

Mother tongue Free-text entry

These answers were then available for analysing with the results, so as to in-

vestigate whether any particular answer caused a significant difference in the image

selected. By being conducted in multiple, unsupervised, remote locations meant

that the question answers could not be validated. Instead, it is assumed that people

answered honestly.

To ensure the user experience was as responsive as possible, all the images (ie

the images taken at each focus distance) of the scene being presented were preloaded

before presentation started. This was achieved by using Secord’s Image Preloader

library [178]. Once all images were loaded, the progress bar was removed from the

screen, and the relevant scene displayed. The observer could then browse through

the focus distance by pressing the arrow keys on their keyboard. The JavaScript

EventListener hooks for keydown were used to detect the key presses, then the

src attribute of the displayed image was changed to be the new image to display.

Because all images were preloaded, this transition occurred very rapidly, and without

any perceived flicker, on all leading web browsers.

To reduce the impact of on-screen clutter acting as distractors from the task,

the experiment was positioned centrally on a black background, occupying 640x480

pixels (approximately 50% of the screen area of a small monitor). The initial image

displayed was clearly out of focus, so observers knew from the start of the task that

focussing was required – a pilot trial placing the starting point at a random location

left some observers confused when the initial image had been almost in focus, and

they were unsure as to what they needed to do.

The experiment was advertised via various email mailing lists to students and

friends. In total, 80 people started the experiment, though a few did not complete

the entire experiment. Various screenshots from the web-implementation are shown

in Figure 4.3, and the scenes used are shown in Figure 4.4.

The precise instructions given to the observers was: “In a moment, you will be

shown a series of images. They will be slightly out of focus. Use the left and right

arrow keys on your keyboard to change the focus of the image. When you have

found the best image, press the enter key. Click here if you’re unsure which keys

these are.” If the observer was unsure of the keys involved, then the link took them

to a picture of a standard keyboard, with the relevant keys highlighted.

Performing experiments with unsupervised observers on equipment that has not

been calibrated could cause misleading results. However, the author feels that the

89

participants who were selected for these experiments were likely to be cooperative,

and unlikely to have motives for reporting inaccurate information or deliberately

responding in detrimental ways. Furthermore, their demographic (being educated

young professionals) is likely to have their computer well configured with reasonable

colour reproduction and have a good quality monitor running at its native reso-

lution. Such assumptions would need to be reviewed, and potentially additional

investigation conducted, should a wider audience be used for future web-based ex-

periments.

The web-server maintained a full log of the exact sequence of images shown

to each observer, and the precise time at which it was displayed. No reference

information or other feedback was provided to the observer, as these might have

provided additional, non-visual information from which a decision could be made.

Instead, the only source of information available to an observer was the desired

image. From these logs, it is possible to see how one observer browsed through one

of the scenes to find the image they thought was most in focus, as can be seen in

Figure 4.5.

4.1.3 Results

The questionnaire results shown in Figure 4.7 provide some information about the

participants in the experiment. Figure 4.8 contains histograms showing the fre-

quency with which each candidate image was selected as the ‘best’ version for that

scene. There were anomalous results which have been identified and removed man-

ually. These are shown on the graphs as grey bars, and most likely arose as people

accidentally clicked through the experiment without actually performing it. Table

4.1 summarises the final results.

Scene Participants Mode Mean Std dev % selecting modeChillis 74 20 19.85 0.86 46%Coins 74 28 28.34 1.74 23%Bolt 75 72 72.13 2.77 28%Red onion 72 19 19.24 1.71 24%Strawberries 75 13.5 13.44 1.49 25%*Towel 68 33 32.13 2.31 19%

Table 4.1: Results for the ‘best’ experiment showing which images was chosen by theobservers of the different scenes in the experiment. Note: The strawberries image produceda bimodal result – both images 13 and 14 were selected equally frequently by the observersas being most in focus. The reported percentage is for one of the modes.

90

(a) Introduction (b) Questionnaire

(c) Instructions (d) Keyboard

Figure 4.3: Selected screenshots from the web implementation of the ‘best’ experiment.

91

(a) Chillis (b) Coins

(c) Bolt (d) Red onion

(e) Strawberries (f) Towel

Figure 4.4: Scenes shown to observers during the ‘best’ experiment.

92

0 2000 4000 6000 8000 10000 12000 140000

5

10

15

20

25

30

35

40

Imag

e in

dex

Time (milliseconds)

Figure 4.5: The journey one randomly selected observer made through the candidateimages when selecting the most in-focus picture of coins.

0 5 10 15 20 25 30 35 40 45 50 55 60 65 700

5

10

15

20

25

30

35

40

45

Age

Fre

quen

cy

Figure 4.6: Histogram showing age of participants in the ‘best’ experiment. 16 participantsdid not specify their age, and are recorded in the 0-5 bin – there were no observers under15 years of age.

93

Unspecified

Female

Male

(a) Gender

Unknown

Long sighted

Slightly long sighted

Perfect Slightly short sighted

Short sighted

(b) Uncorrected vision

Unspecified

Contacts

Glasses

None

(c) Correction

Don’t know

Other

Red/Green

None

(d) Colour blindness

Unknown

CRT

LCD

(e) Screen

Unspecified

Other

English

(f) Mother tongue

Figure 4.7: Pie charts show the overall questionnaire results for participants of the web-based experiments.

94

0 5 10 15 20 25 30 35 400

5

10

15

20

25

30

35

Image index

Fre

quen

cy

(a) Chillis

0 20 40 60 80 1000

2

4

6

8

10

12

14

16

18

Image index

Fre

quen

cy

(b) Coins

0 20 40 60 80 1000

5

10

15

20

Image index

Fre

quen

cy

(c) Bolt

0 10 20 30 40 50 60 700

2

4

6

8

10

12

14

16

18

Image index

Fre

quen

cy

(d) Red Onion

0 5 10 15 200

2

4

6

8

10

12

14

16

18

20

Image index

Fre

quen

cy

(e) Strawberries

0 20 40 60 80 1000

5

10

15

Image index

Fre

quen

cy

(f) Towel

Figure 4.8: The graphs show the frequency with which each candidate image was selectedas being ‘best’ for the given scene. Bars in grey were identified as being anomalousand have been excluded from further analysis. They most likely resulted from observersproceeding to the next task in the sequence without selecting an optimal image.

95

4.1.4 Analysis

It was anticipated that observers would, to a certain extent, agree with one another,

though the exact extent of agreement, and whether it would vary by environment

or individual was unknown. The results in Figure 4.8 confirm that humans mostly

agree with one another. Each scene resulted in a single group of selected images,

which would be expected given that none of the scenes had multiple objects at

different depths within them.

However, the observers were not unanimous in their opinions. To explore the

reasons for the diverse answers, the results were processed using ANOVA as a linear

model evaluated with the R statistical processing environment [179]. A model of

all the individual parameters, as well as all combinations of pairs of parameters

was constructed. The questionnaire results were collated slightly – as there were

no responses to the ‘What is your mother tongue?’ question that occurred more

than once (other than English), the model was built based on the answer to the

question ‘Is your mother tongue English?’. Similarly, ‘age’ was grouped both by

decade, and separately to ‘under 30’ and ‘over 30’. Thus, the full model used the

following parameters, plus all combinations of two parameters:

� Over or under 30 (1 for under, 0 for over)

� Age (in decades, rounded down)

� Gender (1 for male, 2 for female)

� Screen type (1 for LCD, 2 for CRT)

� Colour blindness (1 for none, 2 for red/green, 3 for other)

� Correction (1 for none, 2 for glasses, 3 for contacts)

� Uncorrected vision (1 for short sighted through 5 for long sighted)

� English is mother tongue (1 for yes, 0 for no)

The full ANOVA tabulations, computed independently for each scene, are in-

cluded in Appendix F. From these full tables, the most significant relationships

were different between scenes:

Chillis Pr(> F ) for Gender was 0.096, and 0.065 for the compound variable

DecadeAge and Correction, both of which are very weak effects.

96

Coins This had stronger correlations – Pr(> F ) was< 0.3 for both Young30:Correction

and DecadeAge:Gender, whilst the value for Correction:Uncorrected was

stronger still: Pr(> F ) = 0.0157. There were no strong effects for single

variables.

Bolt For this scene, the strongest effect was with Uncorrected vision (0.013), then

Colourblind (0.019), followed by Young30:English (0.02856) and DecadeAge:Un-

corrected (0.045)

Red onion Gender was the only strong effect, with Pr(> F ) = 0.0162

Strawberries This scene had the strongest effect of all scenes, with Pr(> F ) for

the compound variable Gender:Screen being 0.0001. There were weaker effects

for DecadeAge:Gender, DecadeAge:Uncorrected and Gender:Correction.

Towel There were weak effects with Gender and DecadeAge:Correction

In summary, three of the scenes had no strong effects (Pr(> F ) < 0.05) with any

single variable. The single variables that did have strong effects were Uncorrected

vision, Colourblindness and Gender, though these relationships were not present

across multiple scenes. This suggests that whilst there were some effects, these were

not widespread across the experiment. No correction for multiple comparisons was

performed. Had such an approach been taken, it is anticipated that the individual

effects appearing in the different analyses would have been suppressed. That is, there

is not a subset of the observers who across all images made a particular preference

towards focussing nearer or father away.

By reanalysing the data from one of the single-variable correlation (gender, whilst

looking at the Red Onion scene), the impact of this variable can be seen. Figure

4.9 is of two histograms (normalised to the same area), for males and females,

showing how the genders selected different images as their preferred ‘best’ image.

The modal image indexes were 19 (male) and 20 (female), though the mean index

was the other way around – 19.5 for male (σ = 1.59) and 18.5 for female (σ = 1.84).

If the distribution were modelled as a normal distribution, then all of these figures

are well within the 95% confidence interval (2σ).

No statistical analysis of the union of all scenes in this experiment was performed

as doing so would require their results to be comparable. This is not possible, as

there was no normalised focal position around which to equate the results – for this

was the position being sought by the experiment, and because the differences in

focus between adjacent images of the same scene are not necessarily equal.

97

10 15 20 25 30 40

30

20

10

0

10

20

30

40

Image index

Fre

quen

cy (

perc

enta

ge, s

plit

by g

ende

r)

MaleFemale

Figure 4.9: Comparison of male and female responses when asked to select the best imageof the red onion scene

Instead, the aggregate human results could be considered to be a focus measure.

A focus measure produces a score for each candidate image of a scene. The histogram

showing the frequency with which humans selected each particular image could be

considered a score. So, if three people thought image 10 was best, then image 10

receives a score of three, etc. This human focus measure can then be assessed using

the methodology proposed in Section 2.3.5, to assess the width, range, and other

properties of the ground truth’s distribution. By definition, the average human

response is the ground truth, so the accuracy is perfect. The other criteria are

summarised in Table 4.2, with the score calculated using equation (2.8).

Scene Accuracy Range* False maxima Width Noise level ScoreChillis 0 86% 0 1 1.460 0.321Coins 0 97% 2 5 6.311 1.994Bolt 0 93% 2 1 0.871 1.416Red onion 0 86% 0 3 0.450 0.781Strawberries 0 65% 1 3 1.324 0.723Towel 0 92% 0 3 1.373 1.067

Table 4.2: Ranking human responses shows that humans were best (using the methodologyintroduced in Section 2.3.5) when viewing the chillis scene. Note: For clarity, the rangefigure has been expressed as a percentage of the available images outside the first minimaon either side of the peak, as discussed in Section 3.5.3. Accuracy is zero, by definition,as these are human results being displayed.

98

These results, especially the very narrow width of the histograms at 50% ampli-

tude, show that untrained humans are very much in agreement with each other when

making focus decisions. There were false maxima present, but the largest of these

was less than 40% of the peak amplitude, and none were wider than two images.

4.1.5 Conclusions

This experiment has shown that humans do agree with one another when focusing a

scene on a computer screen. There are differences between subsets of the population

on a per-experiment basis, but these have been shown to be small in comparison

with a 95% confidence interval. That there are no common factors between scenes

reinforces the interpretation that the per-experiment differences are not indicative

of an underlying factor that does affect focus decisions.

The best quantitative comparison score was achieved when focussing a scene

featuring some chillis. There were no significant differences between the observers’

ability to focus on synthetic or photographed scenes, nor a difference between fo-

cusing on 2D or 3D objects. However, even in the scene with greatest agreement

(chillis), only 46% of the observers selected the most popular image as their ‘best’,

and significantly lower for the other scenes. Thus, for untrained observers, the opin-

ion of just one observer is unlikely to be the consensual ground truth, and multiple

observers opinions should be sought.

No justification of for the high level of agreement in the chillis scene can be made

with certainty from these results. It is the author’s opinion that there is, perhaps, a

particularly attentive point in the scene which observers endeavoured to focus, and

that there is not such a popular point in other scenes.

Having now established the ground truth for a number of scenes, Chapter 5

compares these results with existing focus measures.

4.2 Building a focus curve

The focus measures introduced in Section 2.3 each produce a single score for a

single image. If multiple images of a scene are processed by the same measure, then

a focus curve can be plotted, and the peak of that curve is the best focus position,

as judged by that measure. The previous experiment asked human observers to

manually select the best focus position, and then created a focus measure graph

based on the frequency with which humans selected each particular image. These

histograms are tall and thin in shape, which is expected as observers would not

99

select a clearly defocussed image as being ‘best’. Thus, for many candidate images,

the score was zero.

However, humans can discriminate between two out of focus images to determine

which is sharpest, and thus there must be a measure that the brain is using. Previous

work to determine this measure has not been found, though Fylan’s proposal in

1998 to record visual evoked potentials (VEPs) might yield quantitative information

about the measure used by the brain [34].

To determine a focus curve, one could present blurred images to the observer, and

specify that their task be simply to score them (eg [58]). This opinion score would

equate to the quality of the scene at each focus distance, and thus form a focus curve.

However, observers are likely to simply line up the images in order of defocus and

assign scores linearly on that basis, thereby simply producing a triangular function,

and so is not a direct measure of the human focus opinion. So, several approaches

were pursued to try to indirectly extract a human focus curve, without requiring

the observer to retain a memory of multiple images:

4.2.1 Time to re-order

The first method attempted was to ask subjects to re-order photos of the same

scene so that they were in the order of increasing focus. The hypothesis was that

it would take subjects longer to reorder photos that had similar perceived focus

quality (ie focus score), and so the overall focus curve could be plotted by using the

response times as the gradient at each point. Specifically, the subject was shown

on a computer screen three images of the same scene from random focus positions.

Using the mouse, they could drag and drop the images to change their order. Once

satisfied, the subject clicked a button to indicate they had finished. The subject

was then shown another three photos from the same scene, but from different focus

positions.

It was hoped that the reciprocal of the average response time (given the correct

answer was received) for a triplet of source images centred at some focus point, p,

could be used as the gradient of the focus curve at that point. Photos that were

easy to order would have a low average response time, and so a steeper gradient,

as there must be a large difference in perceived focus if observers could sort them

quickly.

However, no meaningful conclusion could be drawn from the results, and this is

likely to be for two reasons. Firstly, for the gradient to be known with confidence

at any given point, then multiple subjects will have had to order a triplet of photos

around that point, and such a large number of trials was not conducted. Secondly,

100

0 10 20 30 40 500

50

100

150

200

250

300

350

Focus position

Tim

es im

age

disp

laye

d

Figure 4.10: Popularity of images in the ‘chillis’ scene, given that each observer started atposition 0.

and perhaps most significant, was that the response times did not vary significantly

when the photos were “easy” or “hard” to order – most of the time was spent using

the mouse to interact with the computer to record the subject’s decision, rather than

actually working out the correct order, so the experiment mainly recorded mouse

dexterity rather than focus opinion.

4.2.2 Image viewing frequency

It was decided next to try to capture more instinctive information. The Griffen

PowerMate was used to enable the subject to rapidly navigate through the images

of the scene captured at different focus positions. The hypothesis was that subjects

would more frequently look at images that were sharper in focus than those that

were less focused.

The results, as reproduced in Figure 4.10, show that there is a peak in image

popularity when the image is in focus. However, it is perhaps not justifiable to say

that this represents the human focus curve, as the image on either side of the peak

will always have to be seen frequently, as the subject will need to go past these to

get to the peak.

101

4.2.3 Time spent on each image

To reduce the impact of the nearest-neighbour phenomenon, the same data was

processed differently. The revised hypothesis was that the less in-focus an image,

the less time a subject would spend looking at it. The results are shown in Figure

4.11. This shows that 99.8% of the images displayed were viewed for less than 125

milliseconds, and that those looked at for longer were significantly longer (typically a

few seconds) – no image was looked at for more than 125 milliseconds excluding those

exceeding 300 milliseconds. Therefore, the average time spent looking at each image,

having excluded those times greater than 300 milliseconds, was plotted. This can be

seen in Figure 4.12. The results are quite noisy, but do show a curve which matches

with the expected observation near the peak. However, there is very little drop-off

in viewing time beyond image 40, despite there being a continued deterioration in

sharpness in this region which is obviously visible to human observers.

4.2.4 Blur equivalence

Rather than attempting to construct an entire focus curve, a new curve could be

generated by anchoring one half of the curve to a known function, and asking ob-

servers to find images that are equivalently blurred. Having established the ground

truth (as described above), the images were partitioned into two groups. The first

group containing only images focussed further away than the best image, whilst the

second group containing the images whose focal distance is nearer than the best

image. The images in the first set were manually assigned a linear score, between 0

(for the least focussed image) to 1 (the most focussed).

Observers were presented with a screen divided in two. The left side of the

screen showed a static image selected from the first set of images. The right side

of the screen could be controlled by the observer to vary the image displayed, but

restricted to only show images from the second set. The observers were then asked

to select the image on the right that they felt was equivalently blurred to the image

on the left. Figure 4.13 shows the user-interface for this experiment.

Whilst this is a straight forward task for flat scenes, such as the coins or towels

scenes, it is harder for an observer looking at a scene with depth. This is because

some points of the target scene might be in focus, yet the overall feel for the scene’s

focus needs to be assessed and reproduced on the matching scene. 84 observers

participated in this experiment, all using the red onion scene. Each observer had

one attempt on each of four stimuli images. Their results are shown in Figure 4.14.

The results, as with the ‘best’ experiment, show a tight grouping of observer

102

0 50 100 1500

250

500

750

1000

1250

1500

Duration (milliseconds)

Fre

quen

cy

0 50 100 1500

50

100

Cum

ulat

ive

freq

uenc

y (%

)

Figure 4.11: Distribution of viewing times when browsing the ‘chillis’ scene using theGriffen Powermate. 99.8% of the images viewed were shown for less than 125 milliseconds.The grey bars show the distribution of viewing times, and the black line shows cumulativepercentage.

0 10 20 30 40 50 60 70 8020

30

40

50

60

70

80

90

100

110

Image index

Ave

rage

tim

e sp

ent a

t eac

h im

age

(ms)

Figure 4.12: Average time spent looking at each position, given that the time was lessthan 300 milliseconds. A trend line has been added using the Lowess method, with awindow size of 20 points.

103

Figure 4.13: User-interface for the ‘equivalent’ experiment. The left side of the screenshowed a target image, and observers were required to use the arrow keys on their keyboardto control the image on the other side and to select the equivalently blurred image.

0 10 20 30 40 50 60 70 800

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

TargetsMean95% (2 σ)68% (1 σ)Ground truth

Figure 4.14: Results for ‘equivalent’ experiment: 84 observers were shown target imagesfrom the red onion scene, whose score had been linearly assigned, and asked to select theimage from the otherside of the ‘best’ position such that it was blurred to an equivalentamount as the target image. The mean responses, together with confidence intervals, canbe seen on the left side of this graph. For each score the horizontal width of the shadedarea shows the range of opinions for the image at that score, .

104

opinion. With a normal distribution, 95% of the results lie within two standard-

deviations of the mean. This region, shown on the graph, is relatively narrow for

the majority of the curve, encompassing between four and eleven images when the

target is most blurred. The results show a similarly steep drop-off rate of score with

focus position as has been seen in the earlier approaches to producing a focus curve

for human opinions. However, within the 95% confidence region, the same image

(number 10) could be the equivalently blurred version of all of the five different

targets. Therefore, whilst the expected trend is present, and the sample size is

relatively large (84 observers), the data is also consistent with no trend being present.

Observers reported that it was frustrating (and thus hard) to try to find the

equivalently blurred image when part of the scene was in focus. By the experiment’s

design, the same portion of image would never be in focus in the candidate set of

images, and this was the underlying issue. This difficulty with images that were

partially in focus is the most likely explanation of the larger standard deviation

when closer to the ground truth of the ‘best’ image.

4.2.5 Half-way focused

A final approach was pursued wherein human results were used to plot the entire

focus curve, and was designed following observer feedback on the equivalence exper-

iment. Rather than selecting an equivalently blurred image, observers were asked to

select an image that was equally defocussed with respect to two stimuli (ie halfway

defocussed between two target images). This is also known as ‘subjective bisection’.

Using this approach, each trial was restrained to being on just one side of the ground

truth, and so the difficulty reported by observers in Section 4.2.4 eliminated.

By manually assigning the score 0 to the least focussed image at either extreme

(positions A and B in Figure 4.15), and 1 to the in-focus image (position C), the

hypothesis was that the middle of the graph could be populated by asking observers

to identify the image whose score was precisely 0.5, from either side of the in-focus

position (positions D and E). Then, by repeating the process, the quartile values

(positions g, h, i and j) could be determined.

A g D h C i E j B

Figure 4.15: Establishing the half-way defocused images: Images A, B and C were pre-determined, and from those, observers selected positions D and E, then g, h, i and j.

105

Figure 4.16: User-interface for the ‘halfway’ experiment. The left and right sides of thescreen showed target images, and observers were required to use the arrow keys on theirkeyboard to control the image in the centre to select the image that was half way betweenthe target images.

To help reduce any confounding effects that might arise, each position on the

curve was presented to the observers twice, once with either arrangement of targets.

So, the observer had two attempts at identifying image D; once with A on the left

and C on the right, and vice versa. Figure 4.16 shows the user interface for this

experiment.

72 observers participated in this experiment, whose results are shown in Figure

4.17. As with the previous experiments, it shows that observers are in agreement

with each other. There were a few outlying data points, which were not manually

removed, but by virtue of the large number of observers, these are outside the

confidence intervals calculated from the standard deviation. Most importantly, the

results show that this experimental methodology can be used to produce a focus

curve.

4.3 Conclusions

The experiments conducted have provided answers to the first few key questions

that arose from the thesis statement. Across a number of different scenes, com-

prising both natural photographs and synthetically generated images, it has been

shown that there is agreement between observers as to which is the most in-focus

image. Most informatively, the agreement is not unanimous, and the ground-truth

image was never selected by the absolute majority of observers. This suggests that

future work investigating focus should not rely upon a single observer when looking

106

0 10 20 30 40 50 60 70 800

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

TargetsMean95% (2 σ)68% (1 σ)Overall focus curve

Figure 4.17: Results for ‘halfway’ experiment: 72 observers were shown two target imagesfrom the red onion scene, and asked to select the mid-point in terms of focus. This wasrepeated to find the scores at 0.25, 0.5 and 0.75. Their mean responses, together withconfidence intervals are plotted.

at general purpose images, but instead must gather the opinions of a number of

observers.

Reassuringly, for the scenes used in these experiments, there was no distinguish-

able subsets of the observer population which selected different ground-truths. There

were some factors which had weak correlations with the selected images, but these

factors did not persist across scenes. This suggests that, regardless of gender, age,

vision or screen type, observers make equally accurate decisions when selecting the

ground truth. That said, the population of observers was skewed towards those

people known by the author, and thus are more likely to have undertaken higher

education and be computer literate. Whether these variables cause an influence on

the results is unknown, but in the author’s opinion, unlikely.

Several approaches were pursued to try to construct a human focus curve. Data

analysis and observer feedback led to the gradual refinement of the methodology,

until a working approach was found. The methodology for establishing a focus

curve minimises the amount of specified data, and is completely observer-generated.

No time limits are imposed, and observers are free to browse through the candidate

images as many times as they desire, thereby improving the accuracy of their opinion.

The next step is to compare these ground truths established from a large body of

observers with the behaviour of the various focus measures proposed in the literature,

107

and this is investigated in the following chapter.

108

Chapter 5

Performance of focus measures

The results generated in the previous chapter provide the ground truth for a number

of scenes. The most in-focus image of each scene has been identified, and a focus

curve has been generated. This chapter shows the results of processing the same

scenes with the focus measures discovered in the literature and created during the

course of this research.

A focus measure is a mathematical function which processes a number of images

and determines which image is most in focus. All the measures reviewed in Sections

2.3 and 3.2 process one image at a time, giving each image a score. The image of

the scene with the highest score is the most in-focus image.

Section 2.3.5 describes how previous researchers have compared focus measures,

by giving them a quantitative score according to their performance. The score is de-

rived from a number of individual properties measured from the focus curve. Whilst

this score is of merit, some of the underlying properties measured have more impor-

tance than others – for example, a high-scoring measure might still be inaccurate.

A focus measure should be able to identify the same image as being maximally in-

focus as that identified by human observers, and this accuracy should be of greater

importance than, say, the width of the curve at 50% height. As such, consideration

of the focus measures’ accuracy is given special attention:

5.1 Accuracy

For each scene shown to the human observers, the same images were prepared for

mathematical assessment by a range of focus measures; the same resolution (640x480

pixels) was used, though they were reduced to grayscale from RGB colour, as the

focus measures process a single luminance channel, rather than full colour informa-

tion. This colour reduction was performed by extracting the luminance information

109

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(a) thresholdedcontent

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(b) menmay

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(c) imagepower

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(d) phasecongruence

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(e) cranepeak

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(f) tenengrad

Figure 5.1: Score plotted against focus distance for a range of focus measures using the‘chillis1’ sample scene. Four measures fail to form a global peak near the human observers,who selected image 20 as most in focus.

from the colour image, using the NTSC luminance formula (Equation 3.1). Once

the scores were computed, they were normalised such that, for each measure, the

range of scores given to each scene was 0–1.

The focus curves for all measures were plotted, then these graphs examined by

eye, allowing several initial qualitative observations to be made: Firstly, certain

focus measures fail to produce a peak in score anywhere in the middle of the set

of images. Examples of this behaviour include the thresholdedcontent, menmay and

imagepower measures. Secondly, the phasecongruence measure, whilst it does have

a maximum in the right place, this is not the global peak for the measure. Figure

5.1 shows these four focus curves, together with those for cranepeak and tenengrad,

two measures which performed well. The full set of focus curves for the ‘chillis’ scene

are included in Figures 5.5 through 5.7 (pages 116 – 118).

Results from Section 4.1.3 show that the modal response of human observers

viewing the chillis scene was to select image 20 as the most in focus, with a 95%

confidence interval of ±1.72, but only 80% of the measures selected images 18-22

as their peak. This is illustrated by Figure 5.2, a series of histograms showing the

distribution of the best image selected by each measure across the selection of scenes

as used by the human observers.

Indeed, by considering the accuracy of the measures when assessing all six scenes

under test, it is possible to aggregate the results and establish which measures are

110

0 5 10 15 20 25 30 35 400

5

10

15

20

25

30

35

Image index

Fre

quen

cy

(a) Chillis

0 10 20 30 40 50 60 700

5

10

15

20

25

Image index

Fre

quen

cy

(b) Red Onion

0 20 40 60 80 1000

2

4

6

8

10

12

14

16

18

20

Image index

Fre

quen

cy

(c) Coins

0 20 40 60 80 1000

5

10

15

20

Image index

Fre

quen

cy

(d) Bolt

0 5 10 15 200

2

4

6

8

10

12

14

16

18

20

Image index

Fre

quen

cy

(e) Strawberries

0 20 40 60 80 1000

2

4

6

8

10

12

14

16

18

20

Image index

Fre

quen

cy

(f) Towel

Figure 5.2: The graphs show the frequency with which each candidate image was selectedas being ‘best’ for the given scene by individual focus measures. Bars in grey show thehuman results for the same scene, as Figure 4.8. It is not possible to determine which is thebest measure by examination of these frequency charts. What is shown is that (for mostscenes) the image most selected as ‘best’ by observers is also the modal image selectedby focus measures. An assessment of the individual performance of the focus measures isdescribed in the following section.

111

reliable. For a given scene, each measure was classified as either perfectly agreeing

with the modal human response, falling within the 95% confidence intervals of the

human result, or failing to match human observers. Starting with the scene with

the fewest number of measures that failed to match human observers, the measures

that succeeded (either because they agreed, or were within the 95% intervals) in

a given scene were then assessed against the next scene, and the process repeated.

These results are shown in Table 5.1:

Scene Perfect agreement Within 95% Failed Not consideredChillis 28 8 9 n/aRed Onion 27 7 2 9Coin 23 8 3 11Towel 0 27 4 14Strawberries 0 20 7 18Bolt 0 2 18 25

Table 5.1: Whilst most focus measures do well for some scenes, the subset that performwell across a number of scenes is far lower. The scenes in this table are sorted such thatthe fewest measures are eliminated at each stage.

This shows that no single focus measure selected the same best image as the

human consensus every time. And, over all the scenes, only two measures came

close to matching the ground truth. They were cranepeak and tenengrad, two of

the earliest proposed focus measures (1966 and 1970 respectively). However, it is

worth investigating the final scene in detail. Prior to the measures being tested

against the bolt scene, there were still 20 measures that had performed satisfac-

torily in the first five scenes (autocorrelation, brennergradient, cranepeak, crane-

sum, energylaplace, energylaplace5a, energylaplace5b, energylaplace5c, laplace, smd,

sml, squaredgradient, tenengrad, thresholdedabsolutegradient, triakis11s, va, voll4,

waveletw1, waveletw2 and waveletw3 ). That almost all of these successful measures

were eliminated in the bolt scene suggests that there must be something particu-

larly difficult for the focus measures within it. By looking at a few of the measures

that failed on the bolt scene, it is apparent that they are scoring some candidate

images very highly, and that these scenes are a long way from the ground truth.

Four particular measures highlight the difficulties encountered whilst assessing this

scene, and they are shown in Figure 5.3.

Firstly, energylaplace completely fails – it indicates that the frame contains least

energy (and thus lowest score) when the scene is actually in focus. Secondly, both

sml and waveletw1 report a minima around image 50–60, before climbing again.

Both show a local maxima for the image most frequently selected by the observers,

but this is small in relation to the measures’ overall scores. Finally, triakis11s does

112

0 24 48 72 96 1200

5

10

15

20

25

0 24 48 72 96 1200

0.2

0.4

0.6

0.8

1

Image index

(a) energylaplace

0 24 48 72 96 1200

5

10

15

20

25

0 24 48 72 96 1200

0.2

0.4

0.6

0.8

1

Image index

(b) sml

0 24 48 72 96 1200

5

10

15

20

25

0 24 48 72 96 1200

0.2

0.4

0.6

0.8

1

Image index

(c) triakis11s

0 24 48 72 96 1200

5

10

15

20

25

0 24 48 72 96 1200

0.2

0.4

0.6

0.8

1

Image index

(d) waveletw1

Figure 5.3: Score plotted against focus distance for a few of the focus measures using the‘bolt’ sample scene. The black line indicates the score given by the relevant focus measure,whilst the frequency with which each image was selected by human observers as being thebest is shown as a grey histogram.

113

significantly better, but its peak is well outside the human responses, and so is

insufficiently accurate.

Unlike all the other scenes, which were captured photographically, the bolt scene

was artificially rendered using ray tracing by Povray (see Section 3.1.3). This did

not prevent humans from selecting their preferred best image, but did prevent the

mathematical focus measures from making the correct decision.

The nature of ray tracing is to render the image by considering one pixel at

a time, and trace it through the three-dimensional model of the scene until its

colour and brightness can be characterised. Because the PSF of an optical system is

approximately gaussian in shape, and the gaussian distribution has a wide extent,

accurate ray tracing with optical blur takes a very long time. For the bolt scene,

Povray was instructed to continue rendering until 250 rays had been processed per

pixel, or the pixel’s value was over 95% certain, whichever was sooner (Povray

took approximately 90 minutes to render each frame when using these parameters).

However, it is clear from Figure 5.4 that, for the images identified from the local

minima and maximas in Figure 5.3, there is speckling and dithering present in these

rendered scenes.

Thus, it appears that most measures are susceptible to dithering, a technique

typically used to produce visually appealing images even when the absolute number

of colours present in the image is low. Significant work has been done to improve the

way dithering is performed so that it looks best under a model of human perception

(eg [180]), but this can only be done once the full image is known. With ray tracing,

dithering is an artefact of the uncertainty in the image that has been produced so

far, rather than a mechanism for coping with fewer colours. Indeed, the dithering

apparent in these images is very like that shown in Figure 2.17 (page 57), an example

of compressive imaging, where the image quality slowly improves as ever more data

points are acquired. So, whilst humans can apply top-down knowledge, and know

that they are looking at a bolt (or letter R in the compressive imaging example),

algorithms that attempt to determine the scene’s focus on a per pixel basis are

greatly hindered by the high contrast between adjacent pixels, and struggle to obtain

a valid measure in such conditions.

114

(a) Image 20 (b) Image 40

(c) Image 60 (d) Image 70

Figure 5.4: The bolt scene was rendered by Povray with a number of different cameradistances. However, rendering a scene exhibiting focal blur with a ray tracing algorithmtakes a very long time. Even after 90 minutes rendering, there are artefacts present whichconfused many of the focus measures.

115

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(a) absolutegradient

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(b) absolutevariation

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(c) alphaAdult

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(d) alphaImageEnsemble

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(e) alphaRedOnion

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(f) autocorrelation

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(g) brennergradient

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(h) chernfft

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(i) cranepeak

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(j) cranesum

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(k) CPBD

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(l) energylaplace

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(m) energylaplace5a

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(n) energylaplace5b

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(o) energylaplace5c

Figure 5.5: Score plotted against focus distance for a range of focus measures using the‘chillis1’ sample scene

116

16 18 20 22 24 26 28 300

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(a) entropy

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(b) groenvariance

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(c) histogramentropy

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(d) hlv

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(e) imagepower

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(f) JNBM

20 25 30 35 40 450

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(g) laplace

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(h) masgrn

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(i) menmay

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(j) normalizedgroenvariance

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(k) phasecongruence

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(l) phasecongruence2

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(m) randomnumber

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(n) range

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(o) rawlaplace


117

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(a) smd

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(b) sml

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(c) squaredgradient

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(d) standarddeviationbased-autocorrelation

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(e) tenengrad

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(f)thresholdedabsolutegradient

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(g) triakis11s

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(h) thresholdedcontent

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(i) thresholdedpixelcount

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(j) va

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(k) voll4

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(l) voll5

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(m) waveletw1

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(n) waveletw2

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Image index

Sco

re

(o) waveletw3


118

5.2 Quantitative scoring

Beyond the accuracy of the individual measures, discussed in the previous section,

there are other properties of the focus curve that can be quantified and scored using

the methodology introduced in Section 2.3.5, and specifically using Equation (2.8).

To recap, the score is the Euclidean distance between the characteristics of the focus

curve (including the location of the peak, sharpness, and number of false maxima)

and those of an ideal response – a low score is better. All the focus measures were

scored for each scene using this method, using the modal human response as the

ground truth for assessing the accuracy. The full results for all scenes are included

in Appendix E. These have been summarised across the scenes in Table 5.2.

Overall, the measures had a considerable number of false maxima, and thus a

low range. This means that any control system using these focusing measures might

need to average the values from adjacent points, or do a more wide-ranging search

for the maxima to ensure that it did not settle upon an undesirable local maximum

within the curve.

The measures that ranked highly in this quantitative comparison (cranesum and

autocorrelation) are not those which performed best in the earlier assessment of

accuracy. Thus, they have attributes upon which the existing comparison method-

ology places importance, such as a wide range and few false maxima, but do not

have the highest overall accuracy; both perform very poorly on the bolt scene.

If the bolt scene is excluded from this quantitative ranking, then the best measure

is waveletw3, followed by waveletw2, energylaplace5c, autocorrelation down to laplace

ranked fifth. The two measures identified as being most accurate in the previous

section are then ranked 21st and 23rd (cranepeak and tenengrad), rather than 32nd

and 27th respectively if all scenes are considered.

Previous surveys of focus measures show a certain degree of similarity to one-

another, with Santos, Sun and Groen’s surveys all indicating that the normalised-

groenvariance measure was amongst the best few measures. However, in these ex-

periments, normalisedgroenvariance measure has not performed as well. It was a

long way from the ground truth position in the strawberries scene, and suffered from

having a wide peak, and thus scoring badly on the ‘width’ criteria.

In addition to not ranking normalisedgroenvariance highly the results here have

other differences with previous surveys. To a certain extent, this is to be expected:

As previous surveys have not examined as many measures, the lack of a match

between the best algorithms found by them, and the results here, are understandable.

For example, the best measure found by Santos was voll4 and voll5. However, none

of the measures performing better in the results presented here were considered by

119

Measure Chillis

Coi

ns

Bol

t

Red

onio

n

Str

awber

ries

Tow

el

Tot

al

Ran

k

absolutegradient 42 31 38 36 43 35 225 42absolutevariation 25 12 11 35 30 1 114 19alphaAdult 15 18 16 6 11 30 96 12alphaImageEnsemble 17 11 20 6 12 27 93 9alphaRedOnion 15 22 16 6 38 37 134 28autocorrelation 4 8 36 5 10 9 72 3brennergradient 7 17 23 30 25 16 118 22chernfft 24 5 13 34 37 8 121 24cranepeak 30 33 24 20 24 28 159 32cranesum 3 4 19 16 18 7 67 1CPBD 28 35 22 18 14 32 149 30energylaplace 27 27 31 13 1 20 119 23energylaplace5a 26 26 34 14 2 19 121 24energylaplace5b 23 25 32 15 3 18 116 20energylaplace5c 19 23 27 10 7 17 103 18entropy 43 44 44 44 44 42 261 44groenvariance 21 10 10 24 29 2 96 12histogramentropy 44 42 26 41 39 36 228 43hlv 35 32 1 42 22 43 175 33imagepower 39 36 8 32 41 33 189 34JNBM 18 30 39 19 20 31 157 31laplace 14 7 33 11 4 26 95 11masgrn 12 20 28 17 26 23 126 26menmay 37 40 12 39 40 44 212 41normalizedgroenvariance 31 16 7 31 28 3 116 20phasecongruence 41 43 25 25 21 40 195 36phasecongruence2 33 39 14 38 33 39 196 37randomnumber 45 45 45 45 45 45 270 45range 29 34 43 37 34 29 206 39rawlaplace 38 38 18 43 36 38 211 40smd 2 3 37 12 8 6 68 2sml 6 28 21 2 15 22 94 10squaredgradient 9 1 29 22 16 13 90 7standarddeviationbasedautocorrelation 20 15 4 27 31 5 102 16tenengrad 34 29 2 21 19 25 130 27thresholdedabsolutegradient 9 1 29 22 16 13 90 7triakis11s 11 13 6 28 23 21 102 16thresholdedcontent 39 36 8 32 41 33 189 34thresholdedpixelcount 36 41 15 40 27 41 200 38va 32 19 5 29 35 24 144 29voll4 1 6 35 9 13 11 75 4voll5 22 14 3 26 32 4 101 15waveletw1 13 9 40 1 9 12 84 5waveletw2 8 24 42 4 6 15 99 14waveletw3 5 21 41 3 5 10 85 6

Table 5.2: Overall quantitative scores for all measures, together with an overall rank oftheir aggregate performance. Note that the measures that best matched human results(cranepeak and tenengrad) and ranked relatively low by this quantitative assessment.

120

Santos, and thus their conclusion matches these results.

At the other extreme, Sun reported that the wavelet measures performed poorly,

and that the best was normalisedgroenvariance followed by standarddeviationbased-

autocorrelation, results which are quite contradictory to these presented here. One

possible explanation is the difference in subject matter – Sun was considering images

derived from microscopy rather than general purpose scenes.

Discrimination between focus measures by Firestone et al was mainly a com-

mentary on the width at 50%, as their results showed few false positives and high

accuracy. That these parameters of the focus curve differed in the results presented

here is most likely due to the higher resolution of images used here than by Firestone.

In conclusion, quantitative results have been obtained, but as with previous sur-

veys, they do not point to a single measure as being a clear winner. Cranesum and

smd have achieved the highest rank in these scenes, but they are not the most accu-

rate measures, nor have they been identified by previous surveys as being especially

highly performing.

5.3 Shape

One further analytical technique that can be applied to the focus curves is to see

how they compare with the human-derived focus curves. By mapping the near and

far sides of the focus curves to each run from 0 to 1, the curve can be compared with

the human results determined in Section 4.2 and shown in Figure 4.17 in the red

onion scene. Using only those measures whose peak was within the 95% confidence

interval of the human opinion, Table 5.3 shows the defocussed image identified by

each measure as being the appropriate image for each position. The columns labelled

g, D, h, i, E and j, the same references as in Figure 4.15, represent the positions in

the focus curve where the score is 25%, 50% and 75% on the ascending side of the

curve, and the converse on the curve’s descent.

Eleven of the measures fit within the human results. Of those that do not fully

match, most match the descent (far side) of the curve. There is not a measure which

matches the ascent but not the descent. Only a few measures are fail to match at all:

Thresholdedpixelcount and phasecongruence2 both wildly mismatch, and the energy-

laplace series of measures have a narrower peak than the human results. However,

the progression from the 5a to 5c variants of this family of measures does cause the

curve to widen and then start matching human results.

121

Measure G (25%) D (50%) H (75%) I (75%) E (50%) J (25%) Comment

Humans 1.1-7.0 4.1-11.4 8.3-14.9 22.4-33.3 25.2-43.9 29.4-59.9 95% confidence intervals

absolutevariation 3 6 11 26 32 43 Fits

alphaAdult 13 15 17 25 27 33 Fits right side

alphaImageEnsemble 13 15 17 25 27 33 Fits right side

alphaRedOnion 13 15 17 25 27 33 Fits right side

autocorrelation 11 14 17 25 28 35 Fits right side

brennergradient 2 4 13 24 26 33 Fits

chernfft 3 6 11 25 33 43 Fits

cranesum 7 12 16 25 29 38 Fits right side

CPBD 14 16 17 24 26 29 Fits right side

energylaplace 16 17 19 23 24 27 Too narrow

energylaplace5a 15 17 18 23 25 27 Wider, but still too narrow

energylaplace5b 14 16 18 23 25 31 Wider, but still too narrow

energylaplace5c 13 16 18 24 26 32 Fits right side

groenvariance 3 6 11 26 32 47 Fits

JNBM 11 15 17 24 27 33 Fits right side

laplace 14 16 18 23 25 30 Fits right side

masgrn 3 13 17 24 26 31 Fits right side

normalizedgroenvariance 3 6 11 26 31 43 Fits

phasecongruence2 14 16 18 24 25 77 Not a fit

smd 8 14 17 24 26 34 Fits right side

sml 13 14 17 24 28 34 Fits right side

squaredgradient 5 8 15 24 27 39 Fits

standarddeviationbasedautocorrelation 3 6 10 27 33 47 Fits

thresholdedabsolutegradient 5 8 15 24 27 39 Fits

triakis11s 4 8 14 24 29 40 Fits

thresholdedpixelcount 4 7 11 62 77 77 Not a fit

va 2 4 12 24 27 36 Fits

voll4 10 14 17 25 29 37 Fits right side

voll5 3 6 11 27 33 48 Fits

waveletw1 12 15 17 24 27 33 Fits right side

waveletw2 14 16 18 23 25 29 Not a fit

waveletw3 14 16 18 23 25 29 Not a fit

Table 5.3: Focus measures scores were mapped such that they went to zero at each extreme. Their images for the 25, 50 and 75% score valueswere then determined and compared with the human results (Figure 4.17). Most measures match the human results.

122

5.4 Conclusions

The methodology proposed by earlier work has been used to evaluate all focus mea-

sures on the same scenes as the human observers. The best measures, by this quan-

titative assessment, are cranesum and smd. Even the best measure was relatively

poor on some scenes, getting ranked as low as 19th in the bolt scene, suggesting

that there is an opportunity for a better focus measure to be developed.

If accuracy is solely considered, then no measures perfectly agree with humans,

and only two (cranepeak and tenengrad) even come close to matching human results

in all scenes. One particular scene was especially troublesome for the focus measures

which appear unable to handle images acquired via iterative methods such as ray

tracing or compressive imaging, due to the dithered nature of the image as it is

being constructed.

Beyond quantitative comparisons, the shape of the focus curves have also been

compared with human results. This has shown that eleven of the measures being

evaluated fit within the 95% confidence intervals of the human results.

The most promising focus measures are as tabulated below in Table 5.4. No

single measure has performed well in all the tests. Indeed, it can also be seen that

the highest scoring measures have been amongst the worst in other tests.

Overall, the results from this chapter show that a significant subset of the mea-

sures under test fall within the range of likely human opinion, though none per-

fectly agree in all scenes or with all tests. To refine the evaluation further, addi-

tional human-produced information is required, and this is described in the following

chapter.

123

Measure Accuracy Score rank Shapetenengrad Best 27 -cranepeak Best 32 -smd Good 2 Half-fitautocorrelation Good 3 Half-fitvoll4 Good 4 Half-fitwaveletw1 Good 5 Half-fitwaveletw3 Good 6 No fitsquaredgradient Good 7 Fitthresholdedabsolutegradient Good 7 Fitsml Good 10 Half-fitlaplace Good 11 Half-fitwaveletw2 Good 14 No fittriakis11s Good 16 Fitenergylaplace5c Good 18 Half-fitenergylaplace5b Good 20 No fitbrennergradient Good 22 Fitenergylaplace Good 23 No fitenergylaplace5a Good 24 No fitva Good 29 Fitcranesum Poor 1 -alphaImageEnsemble Poor 9 Half-fitgroenvariance Poor 12 Fitabsolutevariation Poor 19 Fit

Table 5.4: Summary of top performing focus measures by the different assessment criteriaused described in this chapter. Only measures that were found to be accurate, or scoredin the top ten, or were a reasonable match to the focus curve shape have been included;many measures have not performed sufficiently well to be listed. The measures are sortedfirst by accuracy, then score, then shape. Note that no measure scores well and has highaccuracy and a good fit to the human focus curve.

124

Chapter 6

Psychophysical experiments

The results achieved from the experiments of the previous chapters have shown that

some focus measures described in the literature are close to the results seen from

a large body of human observers, and certainly within a 95% confidence interval.

Others have performed highly in the quantitative assessment but do not match

human results. Continuing the theme of previous chapters, further human results,

collected via a different approach, need to be obtained in order to narrow down the

focus measure that most closely matches humans.

The experiments in previous chapters allowed the subjects to spend an uncon-

strained amount of time considering their answer. This enabled them to think about

the content of the images, scrutinising whichever aspects attracted their attention.

The results show that people agree with each other, and no populations have been

found which consistently provide different answers. However, there is no strong cor-

relation between the human opinion, and any particular focus metric. This chapter

describes a different set of experiments which approach the concept of focus from

the other side – rather than asking peoples’ considered opinions, they attempt to

capture normal peoples’ underlying perceptual response.

Experiments described in the literature show how human sensitivity to blur, both

in terms of detection and discrimination, can be measured using carefully controlled

experimental methodologies and more constrained viewing conditions than used thus

far in this research. The results published by other researchers, however, have been

from experiments using relatively simple stimuli, such as a simple step edge [45]

in white light, and under different monochromatic luminances [84]. Experiments

involving real scenes have been limited to Walsh, using observers with anaesthetised

eyes and the observer method, showing peak discrimination ability with a pedestal

of approximately 2D [52].

This chapter describes experiments conducted with multiple subjects following

125

nance edges, however, confirmed the validity of Weber’slaw for external blurs larger than 3 arc min.16 Their We-ber fractions ranging from 0.1 to 0.4 for foveal stimuli areconsistent with our estimates.

2. Contrast DependenceSeveral researchers have investigated the contrast depen-dence of blur discrimination,16,25 two-line resolution,27

and positional acuity.26 The luminance contrast range inour experiment varied from 3% to 10%; in Watt andMorgan’s25 experiment the contrast levels ranged from10% to 80%. Blur difference thresholds were measured

as a function of contrast for an external blur of 2.5 arcmin, whereas we assessed contrast dependence for 0-arc-min external blur. We find a much weaker contrast de-pendence (slope 5 20.15) than reported by Watt andMorgan25 (slope 5 20.5). A direct comparison is diffi-cult since the two experiments used different and almostnonoverlapping contrast ranges. Hess et al.16 assessedblur discrimination thresholds for external blurs of 0.1and 2.5 deg of visual angle for various eccentricities andat various contrast levels (2%, 5%, 10%, and 30%). Forlow contrasts below 10%, the effect of contrast on blur dis-criminability is uniform at both external blur levels, and

Fig. 1. Blur discrimination thresholds (means and standard errors) are plotted as a function of reference blur for three different colordirections for all four observers. (a) Luminance, (b) red–green, (c) yellow–blue. The reference blur refers to the blur of the standardstimulus. The stars indicate the blur thresholds averaged over all observers. The data reveal that for red–green and black–whitestimuli, the smallest blur thresholds occur at a slightly blurred standard grating, not at zero reference blur. Blur thresholds for yellow–blue are constant for all external blur levels.

1234 J. Opt. Soc. Am. A/Vol. 18, No. 6 /June 2001 Wuerger et al.

(a)

Visual sensitivity to temporal change in focus 1213

0.3 l(a) 2.55 c/deg 1 Hz b) 51 cideg 2 mm

Iduced yperopia

induced myopia

I I I I

0.2

D.1

0.0 6 4 2 0 2 4 6

Focus setting relative to optima (dioptres)

Fig. 5. Influence of the temporal frequency of target oscillation or the pupil diameter on tbe defocus thresholds of a single subject in green illumination. (a) Effect of changing the temporal frequency of target oscillation. There is little systematic change in the form of the threshold curve for temporal frequencies between 1 and 4 Hz. Pupil dia. 3 mm. (b) Effect of changing the diameter of the artificial pupil for a fixed temporal frequency of 2 Hz. With larger pupils, minimum threshold is recorded closer to the optimal mean

focus.

were present in the street scene to detect small focus changes in the badly-focu~d image. Once more, it made little difference if green (Fig. 7a) or white (Fig. 7b) illumination was used: in both cases the minimum threshold occurred at much the same level of defocus as was found by Campbell and Westheimer (1959).

420246 Focus setting relative to optiirun

(dioptres)

Fig. 6. Changing the illumination from green to white light has only minor effects on the form of the median threshold curve for a 5.1 c/deg grating. The full curves are white light data for 6 subjects: the broken curves are green light data for 11 different subjects: in each case “optimum focus” is defined as that appropriate to the coiour of the target

induced hyperopia induced myopia 0.0 , ’ * + 1 1 * ’ 1 ’ r - ) 8 r * . ’ ’

10 8 6 4 2 0 2 4 6 8 10 Focus setting relative to optimum (dioptres)

Fig. 7. When the position relative to optimal focus of the street scene is varied, the threshold for detection of focus change (full curves) is minimum at a mean position of about 1.5D of defocus, there being little difference between (bottom) q~si-mon~hromatic green and (top) white ilium- ination. 3 mm pupil, 2 Hz oscitlation frequency, median data for 6 subjects. The dashed curves show the corresponding

illumination. results for a 5.1 c/deg grating, taken from Figs 3 and 6.

(b)

Figure 6.1: Graphs showing the results of similar experiments by other researchers. (a)Blur discrimination, with stars showing blur thresholds averaged over all observers, on asimple grating stimuli [181]. (b) The solid line shows the threshold for detection of focuscurve on a street scene; the dashed line shows the corresponding results from a simple 5.1c/deg grating. The lower figure shows the results under monochromatic green light, andthe upper figure under white illumination [52]

.

a methodology similar to that used by Wuerger [84] and Burr [56], and measures

the blur detection and discrimination thresholds with various amounts of baseline

(pedestal) blur, using natural scenes from the Van Hateren image library. It was

anticipated that the results from such experiments would be qualitatively the same

as published results from simpler stimuli, though unlikely to show exactly the same

thresholds, due to the complexity of the stimuli. Such a quantitative difference

has been reported before – Walsh’s experiments showed that for a complex stimuli,

the discrimination threshold at the point of optimum discriminability was greater

than for simple sinusoidal gratings. Figure 6.1 shows the results from these earlier

experiments.

Walsh showed that with grating stimuli (see Figure 2.9), discrimination was

best at approximately 2D of defocus, where changes of approximately 0.15D could

be discriminated. However, for his complex stimuli (the street scene), the best

discrimination performance was at 1.5D, where the threshold (0.1D) was smaller

than that at which observers could discriminate blur in the grating stimulii. That

is, blur discrimination was better in the more complex scene.

126

6.1 Method

In summary, the experiment involves showing observers a pair of images and asking

them which is most in focus. The images are changed, and the process repeated.

By refining the pairs of images displayed, the experiment determines the threshold

at which the subject can only correctly discriminate the most in-focus image 82% of

the time. Similar experiments conducted by previous researchers have shown that

discrimination improves as a small amount of blur is added, but this trend reverses

as more blur is added. This dipper response is described in more detail in Section

2.2.

All experiments were conducted by the author. Additional subjects naive to the

purpose of the experiment and unfamiliar with the concept of the ‘dipper’ response

and psychophysics. Subjects were not colour blind, and required to self-declare

themselves as having had an eye test within the past 12 months (wearing any cor-

rection as deemed necessary by their optician).

The stimuli were presented centrally on a Dell P992 CRT monitor connected

to an ATI Radeon 9800 graphics card, via a video attenuator (as described by

Pelli [165]) which enables pseudo 12-bit control of pixel luminance, and so permitting

the compensation of nonlinearites in the presentation whilst retaining the ability to

display a full precise range of 256 grey levels. The monitor was switched on at least

two minutes before any experiments or calibration were performed, to minimise

any artefacts whilst stabilising. The equipment was calibrated linearly through its

luminance range by using a Minolta CS100 photometer, and applying compensation

by the use of a look up table.

The screen operated at resolution of 1024x768 pixels, with a refresh rate of

100Hz. Its visible area was 36.5cm by 27.4cm, and stimuli were 256x256 pixels in

size (8.7cm by 8.7cm). The part of the screen not displaying the stimuli had a mean

luminance of 44 cd/m2. Peak luminance was 112 cd/m2, and the darkest part of

the screen 0.5 cd/m2. The stimuli did not have the same mean intensity, and so the

mean luminance of the stimuli could differ from the peripherary of the screen. Low

ambient illumination was used, such that the screen was the brightest object in the

room.

Subjects were seated in a comfortable office chair in a fixed location so that they

were 116cm from the screen, equating to the stimuli subtending 4.3°. No chin rests

or other guides or restraints were used. The experiments were divided into intervals

of approximately ten minutes duration, between which subjects were encouraged to

relax, stretch their legs, and have refreshments if they desired. When no stimulus

was displayed on the screen, a small cross was shown at the centre of the screen

127

to provide a fixation target, helping the observers remain comfortable and ready to

observe the stimulus.

Images from the scenes used in the previous experiments (such as the chillis or

strawberries) are not suitable for these experiments. This is because they are not

uniformly in focus, and so observers might make decisions based on the underlying

blur present in the image (as a result of the 3D nature of the scene), rather than the

additional blur added as part of this experiment. Secondly, whilst the individual

images were acquired in sequence, no calibration to establish the precise change in

either focal distance or optical properties of the lens assembly was conducted. This

means that it is not possible to say that the additional blur between image 1 and

2 is equal to that between image 50 and 51. Instead, two scenes, both from the

Van Hateren image set (numbers 0005 and 1342, see Figure 3.4) were used for the

experiments in this chapter. Similarly, the Van Hateren images could not be have

been used in previous experiments, as they are uniformly in focus, and any blur that

is applied will appear to defocus them in the same direction. That is, there is no

depth information available with which to artificially reduce the depth of field and

then display the scene with a variety of focal positions, each with a narrow depth of

field.

The experiment required observers to discriminate between two images tempo-

rally separated. The original Van Hateren image was blurred to the baseline condi-

tion, σ = b, to create the first stimuli. The second stimuli was created by blurring

the original image to σ = b+ δb, where b represents the pedestal blur extent, and δb

the amount of additional blur being discriminated. Blur was added by convolving

the image with a gaussian kernel whose standard deviation, σ, was set to the desired

blur. From these blurred images, a randomly selected square subset of 256x256 pix-

els was extracted. The computer determined, randomly, which order to display the

two images. The QUEST [170] procedure was used to find the thresholds. It builds

a model of responses received, and uses them to place the next value of δb at the

mode of its current posteria probability distribution function – that is, at its current

threshold estimate. Both stimuli were displayed centrally on the screen, in the same

position, for 300ms separated by a 500ms interval during which the fixation target

was displayed.

Abrupt and artefact-free stimuli change, as well as accurate presentation dura-

tion, was possible by using the timing and graphics functions present in the Psy-

chophysics toolbox for Matlab [171]. This software library provides easy access to

OpenGL functions, which include the ability to prepare the stimuli in such a way

that it can be applied to the screen during a vertical retrace (the flip command).

128

Such a technique means that no ripping or tearing of the image is visible on the

screen; artefacts which would be present if the stimuli were directly copied into the

active framebuffer. Separately, because the onset happens at an exact time (coinci-

dent with a particular screen refresh cycle), the precise onset and duration of stimuli

presentation can be controlled.

The observers were instructed to record their opinion as to whether the first (or

second) image was most in focus after the pair of images had been displayed, by

pressing one of two buttons, and no time restriction was placed on their responses.

Audio feedback was provided to indicate whether the choice was correct. To ensure

that the observer was familiar with the process, and to help them settle into the

experiment, the first five responses for each condition were not used to refine the

QUEST model, and instead were discarded. These initial images also had a large

δb to help observers. Figures 6.2 and 6.3 shows a few typical stimuli that would be

encountered by an observer during a session.

Once the initial practice phase was completed, the observers’ response was com-

pared with the truth known by the software (for which there was no ambiguity as

blur was applied mathmatically). The QUEST model was then updated with the

knowledge of whether the observer correctly discriminated the additional blur, δb.

Pilot trials were conducted to determine indicative responses, which were used to

select the pedestal values, as well as to establish the typical rate at which observers

completed the experiment. Based on these trials, eleven pedestal values were selected

to be equally spaced (in log-space) between a gaussian kernel size of σ = 10−1 and

σ = 102.1667. This equates to a blur of between 0.1 and 151 arc minutes at the

specified viewing distance. An experimental condition comprised the pedestal value

and scene. Each observer was measured multiple times for a given condition.

To minimise habituation and help reduce observer fatigue, up to five different

conditions were interleaved and each experimental session lasted approximately ten

minutes. That is, when the experiment started, five conditions were selected, and

presented randomly to the observer. At the completion of the QUEST procedure

for a given condition, the software selected a new condition to start, providing no

more than eight minutes had elapsed since the start of the session. This meant

that observers completed each session after approximately 8-12 minutes, and that

at the end of a session there were no incomplete QUESTs. After a short break,

the observer started a new session, until all conditions were completed. For each

condition, an absolute minimum of 105 observations were recorded (three repetitions

per condition, with a minimum of 35 samples per run). Typically, around 160

observations per condition were actually recorded, as the QUEST procedure only

129

(a) Image 00005, 1.03 arcmin blur (b) Image 00005, 1.36 arcmin blur

(c) Image 00005, 4.78 arcmin blur (d) Image 00005, 5.53 arcmin blur

Figure 6.2: Typical stimuli shown to observers during the experiment described in Section6.1. This is image 00005 from the Van Hateren set. These particular stimuli are shown asthey demonstrate the discrimination thresholds determined for observer RAU. The imageson the left are the pedestal images, and the images on the right show the amount of extrablur that needed to be added before the observer could discriminate 82% of the timeunder the experimental conditions. The results are introduced in Section 6.2. The slightdarkening at the peripherary of each image is an artefact of the application of blur, asdescribed in Section 3.4.10. These edges were never shown to the observer, as in each triala random sub-portion of the image was selected, avoiding the edges. These images appearsignificantly darker in print – they are intended to be viewed on a linearly calibratedmonitor.

130

(a) Image 01342, 1.03 arcmin blur (b) Image 01342, 1.36 arcmin blur

(c) Image 01342, 4.78 arcmin blur (d) Image 01342, 6.00 arcmin blur

Figure 6.3: Typical stimuli shown to observers during the experiment described in Section6.1. This is image 01342 from the Van Hateren set. These particular stimuli are shown asthey demonstrate the discrimination thresholds determined for observer RAU. The imageson the left are the pedestal images, and the images on the right show the amount of extrablur that needed to be added before the observer could discriminate 82% of the timeunder the experimental conditions. The results are introduced in Section 6.2. The slightdarkening at the peripherary of each image is an artefact of the application of blur, asdescribed in Section 3.4.10. These edges were never shown to the observer, as in each triala random sub-portion of the image was selected, avoiding the edges. These images appearsignificantly darker in print – they are intended to be viewed on a linearly calibratedmonitor.

131

terminated after the earlier of 20 reversals or 75 stimulii responses being recorded.

6.2 Results

Four people, aged approximately 25-30, participated in the experiment – the author

and three observers. Each subject performed multiple QUEST convergences for each

pedestal condition – that is, during the experiment QUEST selected the candidate

δb, refining its estimate with each response. For the purposes of analysis, the indi-

vidual responses (a combination of δb, and whether the observer could discriminate

at that point) were collated and analysed with a bootstrapping procedure to es-

tablish the 82% threshold and the 95% confidence intervals of that discrimination

threshold. This was achieved by fitting a psychometric function using psignifit, a

software package which implements the maximum-likelihood method described by

Wichmann and Hill [175,176], to the data.

The discrimination threshold, and its confidence intervals, are shown in Figures

6.4 and 6.5. Overlaid on the raw thresholds are the results of fitting the data to

a contrast response function [182, p222] of the form in Equation 6.1, a hyperbolic

function, using the the rvc and fitit functions from the Matteobox Matlab toolbox

[183].

R(c) = kcm+n

σm + cm+ ac (6.1)

Representative images for one particular observer, RAU, and their blur discrim-

ination thresholds are shown in Figures 6.2 and 6.3. These figures show the images

from the pedestal position, and the additional blur that needed to be added for the

observer to correctly discriminate 82% of the time (under experimental conditions).

As was expected, and previously observed in simpler stimuli, dip shaped re-

sponses were obtained for many observer/image combinations, though this was not

universally observed in these experiments.

All but one observer/image combination show that discrimination is better when

b = 1 arcmin than at b = 0.1 arcmin. By inspection, discrimination improves by

approximately δb = 0.5 arcmin. Further, all observers show a continued reduction

in discrimination performance beyond a pedestal condition of b = 1 arcmin of blur.

One observer, NCA, shows the least dipper-like response – part of this is a result of

a poor fit by the contrast function to the underlying data (see Figure 6.4(d)), but

this does not explain the pattern of underlying results in Figure 6.5(d), for which

no explanation can be found.

As the experimental conditions were the same for all observers, the presence of

132

10−2

10−1

100

101

102

10−2

10−1

100

101

102

103

Pedestal blur (arc minutes)

Thr

esho

ld b

lur

(arc

min

utes

)

(a) Subject RTS

10−2

10−1

100

101

102

10−2

10−1

100

101

102

103

Pedestal blur (arc minutes)T

hres

hold

blu

r (a

rc m

inut

es)

(b) Subject RAU

10−2

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(c) Subject CAS

10−2

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(d) Subject NCA

Figure 6.4: Blur discrimination thresholds as a function of pedestal blur, for observersviewing image 01342 from the Van Hateren library. Each estimate of threshold was basedon at least three separate determinations (QUESTs) per measure, followed by a bootstrap-ping procedure to establish the 82% threshold and confidence intervals. This is overlaidwith the best fit to a contrast response function (Equation 6.1) computed with a leastsquares fitting.

133

10−2

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(a) Subject RTS

10−2

10−1

100

101

102

10−2

10−1

100

101

102

103


hres

hold

blu

r (a

rc m

inut

es)

(b) Subject RAU

10−2

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(c) Subject CAS

10−2

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(d) Subject NCA

Figure 6.5: Blur discrimination thresholds as a function of pedestal blur, for observersviewing image 00005 from the Van Hateren library. Each estimate of threshold was basedon at least three separate determinations (QUESTs) per measure, followed by a bootstrap-ping procedure to establish the 82% threshold and confidence intervals. This is overlaidwith the best fit to a contrast response function (Equation 6.1) computed with a leastsquares fitting.

134

these different results suggests that some confounding effect beyond the immediate

environment was likely to have been present. Alternative hypotheses, such as the

‘dip’ response not being present with certain subjects, whilst possible, are unlikely

given the absence of such results in previous research. Even discounting these un-

usual responses, a greater variability was found in these results overall than has

been published by previous researchers. This may be a result of the differences in

result selection methodology, discussed in Section 3.5.1 – in these experiments, all

data from all conditions presented to the observers is included; no data has been

excluded from this analysis.

Certain data points do not have an upper or lower confidence interval plotted

on the graph. This is an artifact of the bootstrap fitting routine which has been

unable to determine the confidence interval at that point. It is anticipated that

this could be addressed by discarding these results and re-measuring at the relevant

conditions.

6.3 Analysis

Qualitatively, the results presented here agree with Watt and Morgan’s findings

for simple edge blur: “blur comparison is most precise at some non-zero criterion

[pedestal] blur ... showing a decrease in threshold as criterion blur is increased from

zero to an optimum level, beyond which the threshold rises rapidly”.

By inspection, the measured discrimination thresholds do appear to vary more

between pedestals (as expected), than to vary with other parameters. To confirm

this, an analysis of variance (ANOVA) was carried out of the blur discrimination

using pedestal, image and observer, and their combinations, as factors. This is shown

in Table 6.1, from which it is confirmed that the blur discrimination is not correlated

with observers, but highly dependent (p < 0.001) on the pedestal condition.

Df Sum Sq Mean Sq F value Pr(>F)Participant 1 205.22 205.22 0.30 0.5840Image 1 2073.96 2073.96 3.05 0.0843Pedestal 1 26667.26 26667.26 39.28 0.0000Participant:Image 1 18.13 18.13 0.03 0.8706Participant:Pedestal 1 121.15 121.15 0.18 0.6738Image:Pedestal 1 5580.33 5580.33 8.22 0.0053Residuals 81 54991.97 678.91

Table 6.1: Anova for blur discrimination

135

Subject Image Detection Opt. pedestal (b) Opt. discrim. (δb)NCA 01342 0.96 0.70 0.36NCA 00005 0.84 0.70 0.81CAS 01342 0.49 0.48 0.08CAS 00005 0.49 0.48 0.11RAU 01342 0.61 1.50 0.23RAU 00005 0.92 1.03 0.34RTS 01342 0.48 0.48 0.23RTS 00005 0.51 0.48 0.19Mean 01342 0.65 0.81 0.23Mean 00005 0.71 0.69 0.37Watt [45] – 0.3 1-2.5 0.15-0.20Wuerger [181] – 1 0.5-1 0.3-0.5Walsh [52] Street 0.22D 1.5D 0.1DWalsh [52] Grating 0.16D 2D 0.15D

Table 6.2: Various key properties of the psychophysical responses were calculated fromthe bootstrapped results: Detection threshold, the point of best discrimination (in termsof pedestal value and actual discriminable blur at the optimum pedestal blur). All figureswithout units are in arc minutes. Equivalent results from previous papers (Watt andMorgan [45] and Wuerger [181]) are included either using figures from their respectivetexts, or reading from graphs. Walsh et al’s results, whilst not using the same units, areincluded here for comparison. Results from Burr et al are not included as they relate tomoving edges rather than static stimuli.

There is a smaller, second-order correlation with image and pedestal. This sec-

ondary relationship is similar to the observation made by Walsh that discrimination

thresholds vary with scene complexity. In this experiment the building scene (00005)

is more recognisable as a real-world scene by observers than the tree bark in the sec-

ond experiment, which when cropped and blurred as in the experimental conditions,

tends to look a little abstract.

Overall results are presented in Figure 6.6, which shows the discrimination

threshold determined for each observer and condition, together with a best-fit to

these data points. This shows a clear dip, followed by a monotonic increase in dis-

crimination with pedestal condition. Table 6.2 shows how these figures compare

with the individual responses, and that they are in agreement with observations by

Wuerger and similar to Watt and Morgan’s seminal results. The differences between

these results and those of other researchers is probably related to the extent of the

methodological (or stimuli) differences.

136

10−2

10−1

100

101

102

10−2

10−1

100

101

102

103

Pedestal blur

Thr

esho

ld b

lur

(a) Image 01342

10−2

10−1

100

101

102

10−2

10−1

100

101

102

103

Pedestal blur

Thr

esho

ld b

lur

(b) Image 00005

Figure 6.6: Overall psychophysical results: The points on these figures are the 82% dis-crimination thresholds determined from each observer at each pedestal condition. Overlaidis the best fit to the contrast response function in Equation (6.1).

137

6.4 Conclusions

The results show that blur discrimination sensitivity when viewing natural scenes

is optimal when a sharp image has been slightly blurred, rather than viewing the

original sharp image. These findings are in agreement with previously reported blur

discrimination thresholds for luminance edges, and extend the results to natural

scenes.

The peak discrimination ability occurs in natural scenes at around b = 0.75

arcmins, in the middle of the range of values reported by Wuerger. Discrimination

performance with natural scenes slightly exceed those reported in simpler stimuli

reported by Wuerger (δb was measured to be between 0.1 and 0.4 arc minutes in

all but one observer/scene combination, a little better than the range of 0.3-0.5

previously reported).

In conclusion, previously reported results in blur perception have been shown to

be present in natural scenes. If the focus measures discussed in previous chapters

are good models of human blur perception, then they should reproduce the results

achieved in this chapter. This is investigated in the following chapter.

138

Chapter 7

Model observers

In the preceeding chapters, various sets of data have been acquired from human ob-

servers, such that human ground truth can be established. Chapter 5 showed that

over all the scenes, only two focus measures are within the 95% confidence interval

of human opinions when identifying which image of a scene is best in focus, but

many more agreed when just one scene was excluded from the analysis. Separately,

ten measures were close to the focus curve determined from human observations.

Chapter 6 describes further experiments which constrain the task and viewing condi-

tions such that aspects of the underlying psychophysical response can be measured.

These techniques established the threshold for blur discrimination at a number of

different baseline conditions.

This chapter describes how focus measures are used as observers to replicate the

experiments described in Chapter 6 which were performed with human observers.

This will provide further criteria for selecting a focus measure which is a good

match to human observations, for it is hypothesised that the focus measure which

most accurately reproduces human results should reproduce the ‘dipper’ responses

found in the previous chapter.

To use focus measures as observers, a computer is used in place of the human

observer in the experiments described in the previous chapter. However, there are

several possible ways of doing this.

7.1 Stimuli and presentation

Firstly, as the human observers viewed the candidate images on a computer screen,

then perhaps the computer should perform the same task – viewing the stimuli via

an analog presentation and subsequent acquisition (eg via a camera). However, such

an arrangement would require significant calibration to parameterise the luminance

139

transfer function and PSF of the camera and its sensor.

The alternative is to perform the observations entirely within the computer, and

have no visual presentation but simply a digital transfer of the stimuli from the

software preparing it across to the program used to make a decision as to which of

the stimuli is most in focus.

Given that the human observers were using a linearly calibrated monitor to

ensure they were viewing the stimuli in the way in which it was intended by the

software, the closest parallel for the model observer is the second of the options

above – direct digital transfer of the image.

Beyond the image transfer, there are several other aspects of stimuli presentation

which apply to human observers and which are not modelled in this work. Firstly, to

minimise the confounding effect of higher level discrimination, presentation duration

is constrained with human observers. Secondly, stimuli were presented to humans

sequentially (ie temporally rather than spatially separated), such that no concurrent

comparison could be performed. Neither of these restrictions are necessary for the

computer models as each focus measure has knowledge of only one stimulus at a

time, and makes no attempt to understand the content.

7.2 Preliminary results

The focus measures were implemented into the software described in Section 6.1 by

replacing the request for user-input with a simple magnitude comparison between

the scores obtained by processing each candidate image pair with the measure under

test. Using this approach, psychophysical-type discrimination curves were obtained

for each measure, examples of which are shown in Figure 7.1.

Broadly, the results were found to fall into four groups. Firstly, for some mea-

sures, the observed discrimination was far from the expected dip shape. For exam-

ple, menmay showed far better discrimination with very large amounts of blur than

with sharp images, the opposite of the human observations. This indicates that

these measures are not good at modeling human blur discrimination.

A second group showed an improvement in discrimination as pedestal blur in-

creased. This improvement plateaued at a floor, showing that discrimination is never

better than a threshold of approximately δb = 0.02 arc minutes, beyond which dis-

crimination deteriorated as pedestal blur continued to increase. This low ability to

discriminate between highly blurred stimuli was far worse than human observers.

CPBD is one such example of a measure exhibiting this behaviour.

The third group (comprising the majority of measures) showed a dip response,

140

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(a) menmay

10−1

100

101

102

10−2

10−1

100

101

102

103


hres

hold

blu

r (a

rc m

inut

es)

(b) CPBD

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(c) groenvariance

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(d) thresholdedpixelcount

Figure 7.1: Discrimination curves generated using various focus measures as observers.The results in black are from five QUEST convergences per pedestal blur for the modelobservers considering Van Hateren image 01342. Grey lines highlight the range betweenmaximum and minimum convergence points. Human results for the same image are shownin red. The curve is the best fit to the bootstrapped human results, the range of which isshown as red vertical lines. These human results are those reported earlier in Figure 6.6.

141

with peak discrimination found in approximately the same conditions as for human

observers, but with discrimination in all conditions being equal to, or better than

humans. Again, many of the measures in this group exhibit a discrimination floor

at δb = 0.02 arc minutes. Finally, some measures, including thresholdedpixelcount

and randomnumber failed to produce any dip – discrimination performance in all

conditions was equally poor.

Overall, these responses indicate discrimination thresholds considerably different

to those determined from human observations with few exhibiting dips, and most

showing a significantly greater ability to discriminate blur than humans – the best

human discrimination is approximately δb = 0.1 arc minutes.

Reviewing the underlying scores upon which the focus measures were compared

in such minima showed that the absolute difference between the scores was typically

very very small in comparison to the scores themselves. The overly precise ability

of the computer algorithm to make such decisions is a result of the direct digital

transfer of the image from the generation to assessment functions of the software.

Combined with Quest’s ability to refine the stimuli to determine the additional

stimuli required to achieve 82% meant that the determined threshold was more a

result of the finite numerical precision in the calculations, than an underlying feature

of the measure under test.

That the floor was present at approximately δb = 0.02 is an artefact caused by

the manner in which blur is being applied to the candidate images. The images are

blurred by convolving the image with a specific kernel. As described in Section 6.1,

this is a gaussian kernel whose standard deviation is set to the desired blur. For large

blur extents this convolution takes a few seconds. So, to maximise the performance

of the software, thus minimising inter-stimuli delay for human observers (and elim-

inate any cues that might be inferred by such a delay), stimuli were pre-computed

with blur extents quantised at approximately one tenth of the peak discrimination

observed in preliminary trials.

Such a step size is far too large to fully explore the ultimate performance of

the focus measures to make discrimination. Indeed, whilst the group mean peak

discrimination for human observers was found to be δb = 0.23 arc minutes, fo-

cus measures can discriminate minute changes in blur. Many of the measures can

correctly discriminate when δb is 1000 times smaller than the peak human ability.

Overall, this means that focus measures can significantly out-perform human ob-

servers. Thus, to use them to model humans and obtain dipper responses matching

human results, a certain amount of performance degradation must be added.

142

7.3 Decision noise

By adding noise to the decision previously made simply on the basis of a magnitude

comparison of the raw scores, a degree of filtering is applied to the decision. The

effect of this is that only large differences in score will cause the same decision

to be made reliably for the same pair stimuli, thus reducing the performance of

the focus measures. The concept of decision noise was proposed by Mueller and

Weidemann [184] who observe that: “decision noise has frequently been ignored

because it often cannot be separated from perceptual noise ... underestimating the

level of perceptual sensitivity”.

There are several plausible explanations for decision noise in human observers

in the experiments in previous chapters. Human observers (by definition) need to

view the stimuli with their eyes, whose state is not permanently fixed. Thus, small

changes in its physical condition (such as accommodation or fixation), as well as

microscopic fluctuations within the eye, would be likely to effect slight changes to

the resultant image projected onto the retina. Such small changes might propagate

through the judgement process to affect the decision. That is, whilst the differences

occur earlier than the actual decision, they might be modelled as decision noise given

fixed stimuli.

A second explanation centres around the nature of the stimuli presentation.

Consecutive stimuli presentation depends upon the observer retaining a quantitative

impression of the first scene such that it can be compared to the second. It is

plausible that this degrades with time, and thus the decision accuracy might be

related to the magnitude of the difference, and inversely related to the time between

presentation and decision being made.

Beyond physical differences, human observers may also be influenced by other

higher level factors. For example, prior experience and personal interests could affect

scene interpretation. Similarly distraction is likely to affect an observer’s decision, be

that distraction in the form of regions of high visual attention, or others peripheral

to the stimuli such as fatigue.

Nevertheless, regardless of the biological justification for adding decision noise,

it solves an implementation problem where the focus measures are extremely sen-

sitive to minute changes in input, thus producing discrimination thresholds that

are a function of the numerical accuracy of Matlab and the measures’ implementa-

tion, rather than their actual performance. To simulate decision noise, the scores

produced by the focus measure under test were scaled by an arbitrary amount of

normally-distributed random noise (σ = 1; using Matlab’s randn function) such

that:

143

10−1

100

101

102

10−2

10−1

100

101

102

103

Thr

esho

ld b

lur

(arc

min

utes

)


0.0%0.2%0.6%2.0%5.0%9.0%13%18%36%100%

Noise factor

(a) groenvariance, 01342

10−1

100

101

102

10−2

10−1

100

101

102

103

Thr

esho

ld b

lur

(arc

min

utes

)


0.0%0.2%0.6%2.0%5.0%9.0%13%18%36%100%

Noise factor

(b) sml, 01342

Figure 7.2: Discrimination curves generated using the groenvariance and sml measureswith a selected range of different noise factors. The lines on this figure are straight linesbetween the data points to help the observer see the data sets, and are not meant to suggesta linear relationship inter-data point. The noise factor is the amount of multiplicative noiseadded.

score = score ∗ (1 + noisefactor ∗ randn) (7.1)

7.4 The impact of noise

To evaluate the impact of this noise, two focus measures were considered, groen-

variance and energylaplace5b, both measures which have performed well over the

tests described in earlier chapters. Whilst uing these measures as observers, a variety

of noise factors were added, between 10−4 and 1, to examine how the discrimination

curve changes with the addition of noise. The impact of increasing this decision

noise is shown in Figure 7.2.

Taking groenvariance, shown in Figure 7.2(a), the lowest line corresponds to the

situation when no noise is present, where the measured discrimination performance

is limited by the implementation. This plateaued minimum disappears once even a

small amount of noise is added, whereupon, the expected dip is present. However,

with the lowest added noise, the discrimination performance with high blur extents is

approximately the same as with no blur, which is not the case with human observers.

Only with additional decision noise does this measure start to exhibit a dipper shape

with similar attributes to human observers. As noise rises about 5%, the measure’s

performance reduces in the conditions with low blur. Beyond 15% noise, the dipper

shape itself starts to disappear.

In contrast, sml (Figure 7.2(b)), shows more clear defined dipper shapes over

144

a wider range of noise factors, but has equal performance when the pedestal blur

extent exceeds approximately 5 arc minutes. This is in contrast with human ob-

servers whose discrimination performance continues to deteriorate as blur increases,

suggesting that sml is not a good model of human blur discrimination.

To quantify the impact of noise, several metrics have been determined numeri-

cally from the groenvariance data shown in Figure 7.2, and are tabulated in Table

7.1, which also include equivalent values determined from human observations as

reported in the previous chapter. Specifically, the detection threshold is defined as

the discrimination when the smallest amount of blur is present. The optimal pedestal

is the point with the best discrimination at which point the discriminable blur is

the optimal discrimination. The exponent is the value of α for a line of the form

y = mxα between the trough of the dip and the discrimination threshold for the

largest pedestal measured. (N.B. If the trough was a plateaued minimum, the con-

dition with greatest pedestal blur was used.) The resultant equation for determining

α from these two points, (x1, y1) and (x2, y2) is:

α =log (y1/y2)

log(x1/x2)(7.2)

These four characteristics can be plotted against noise factor to see more clearly

how each varies with noise. Figure 7.3 shows that there is no amount of noise which

causes sml to coincide with the human mean – at no point are all the data points

close to the human results shown as horizontal lines. However, with approximately

15% noise (nf ≈ 10−0.8), groenvariance does coincide.

Using a Euclidean distance metric, the noise factor closest to human opinions

for each focus measure was computed. That is, for each measure and at each noise

factor, a similarity metric was computed by comparing the measures’ attributes

against human results. So, for the Van Hateren image 01342, the similarity measure

is:

similarity =√

(D − 0.50)2 + (OP − 0.63)2 + (OD − 0.37)2 + (E − 0.98)2 (7.3)

where equal weight was given to D the detectable blur, OP the pedestal blur

at the point of best performance at which δb = OD blur can be discriminated,

and E the exponent of the curve beyond point OP . The numeric constants are the

values obtained from human observations by inspection of Figure 6.6. An equivalent

equation was used for assessing measures’ similarity to human results with the other

scene, Van Hateren image 00005.

145

Noise Detection Opt. pedestal Opt. discrim. Exponent(b) (δb) (α)

0.00% 0.464 4.64 0.0104 1.690.01% 0.44 4.64 0.0104 1.690.02% 0.456 2.15 0.0104 1.270.03% 0.448 0.68 0.0104 0.9210.05% 0.45 0.68 0.0104 0.9210.08% 0.466 0.68 0.011 0.9050.13% 0.472 0.68 0.0183 0.7870.22% 0.449 0.68 0.0225 0.7380.36% 0.463 0.68 0.0327 0.650.60% 0.456 0.464 0.0513 0.531.00% 0.457 0.464 0.0577 0.6021.67% 0.459 0.464 0.0921 0.5822.78% 0.458 0.464 0.0516 0.7984.64% 0.487 0.464 0.0979 0.8567.74% 0.672 0.464 0.254 0.7512.9% 1.2 0.464 0.742 0.57221.5% 2.79 0.464 1.68 0.64735.9% 6.97 0.215 4.85 0.59759.9% 15.4 0.215 10.6 0.52100% 28.3 0.215 25.6 0.419Human mean 0.50 0.63 0.37 0.98

Table 7.1: Discrimination curves were generated for one focus measure, groenvariance,and their key attributes determined. The exponent is the value of α for a straight line ofthe form y = mxα between the trough of the dip and the discrimination threshold for thelargest pedestal measured. These results are for Van Hateren image 01342.

146

10−4

10−3

10−2

10−1

100

10−2

10−1

100

101

102

Noise factor

DetectionOpt PedOpt DiscrimExponent

(a) groenvariance, 01342

10−4

10−3

10−2

10−1

100

10−2

10−1

100

101

Noise factor

DetectionOpt PedOpt DiscrimExponent

(b) sml, 01342

Figure 7.3: Parameters which characterise the discrimination curves were determinedby analysis of the various curves generated using the groenvariance and energylaplace5bmeasures with a number of different noise factors. The horizontal lines on this figureindicate the results for human observers from Figure 6.6. The vertical scale is δb for thedetection and discrimination datasets, b for the optimal pedestal, and the value of α forthe exponent of the curve. These graphs show that there is no amount of noise whichcauses all measured parameters to coincide with human observations for sml, but with15% noise (nf ≈ 10−0.8), groenvariance does coincide. Results for Van Hateren 00005were very similar.

An alternative method for assessing similarity between human observers and

machine results, the Mean square error (MSE), was also computed. As with the

analysis described earlier in this chapter, the average human results from Figure 6.6

were taken and compared with each focus measure and noise factor combination.

7.5 Results

Using both the proposed similarity metric, and computing the MSE, each measure

was scored with each different noise factor. Thus, for each scene, a total of 910

results were ranked by the two metrics. These scores are summarised in Table 7.2

which shows the top combinations of measure and noise factor in terms of similarity

to human observations under these metrics.

Several interesting observations can be made from this data: Firstly, some mea-

sures appear multiple times in this list, which indicates that they remain very similar

to human opinions across a range of decision noise, and thus perhaps are more ro-

bust. Secondly, no measure scores highly across all ranks. With 4.6% noise, range

has the lowest average rank, though cranesum is an excellent fit to all but the sim-

ilarity metric for Van Hateren 00005. Overall, cranesum, range and smd all appear

four times in the list of top performing measures.

In summary, using focus measures as observers does produce the expected dip-

147

shaped discrimination curves, but decision noise needs to be added for these curves

to be close to human results. No single value of decision noise produces the best

performance from all measures. Comprehensive results are included in Appendix G

on page 208 which shows the model observer results for all focus measures operating

with the noise factor causing it to be most similar to human results.

7.6 Conclusions

All the focus measures from previous chapters were used as observers within the

blur discrimination experiment described in Chapter 6. Using this approach, psy-

chophysical discrimination curves were generated, which showed that not all mea-

sures yielded a dip-like response, the response that was desired given the human

results found in this work and work by others.

When decision noise was introduced, the discrimination curve changed shape.

With certain amounts of noise, discrimination curves for far more measures were of

Measure Noise factor R1 R2 R3 R4 Overallcranesum 7.7% 4 1 8 184 1cranesum 12.9% 1 3 2 140 1range 4.6% 3 30 1 10 1range 7.7% 9 56 92 1 1smd 12.9% 2 2 95 229 2voll4 12.9% 36 20 66 2 2chernfft 7.7% 13 53 4 3 3range 2.8% 8 35 3 82 3groenvariance 7.7% 11 37 15 4 4smd 7.7% 5 4 20 249 4cranesum 21.5% 7 32 87 5 5smd 2.8% 93 57 5 17 5triakis11s 1.7% 28 5 424 367 5absolutevariation 2.8% 16 27 14 6 6cranesum 4.6% 24 6 17 166 6smd 21.5% 6 28 156 205 6voll4 21.5% 95 69 6 87 6normalizedgroenvariance 7.7% 10 36 27 7 7range 0.4% 209 163 7 18 7triakis11s 2.8% 15 7 457 423 7

Table 7.2: The combinations of focus measure and noise factor in terms of similarity tohuman observations. Ranks 1 and 2 use MSE, whilst the similarity measure as proposed inEq (7.3) is used for Ranks 3 and 4. Ranks 1 and 3 are for Van Hateren image 01342, withimage 00005’s scores being shown in Ranks 2 and 4. The data is sorted such that all toprated measures are shown first, then second rated etc. Note that many measures appearmultiple times, demonstrating their high performance across a range of noise factors.

148

the expected dip shape. Several plausible explanations were offered to justify the

addition of noise.

To rank the ability of focus measures to match human results, several metrics

were evaluated, including a new similarity measure proposed in this work compares

four parameters of each measure’s discrimination curve with the mean human results

from the previous chapter, and produces a single similarity score.

Using this similarity metric, and by computing the MSE against human results,

the performance of each focus measure in conjunction with a range of noise factors

was assessed. This showed that there was not a single universal amount of noise that,

when added to all measures, yielded the most similar results to humans. Instead, a

specific noise-factor needed to be selected for each individual measure, which in this

work was made from a finite set of noise-factors. That is, whilst a good noise factor

was selected, no optimisation was performed.

Overall, the measures found to be most similar to human results in these ex-

periments are cranesum, smd and range. The first two match human results when

between 5% and 20% decision noise is added, whilst range matches results when

between 0.4% and 8% noise is added. That each performs well over a range of noise

factors indicates that they are not especially sensitive to this parameter.

Optimisation of the noise factor for each measure could be pursued, but given

the wide range of human results in the previous chapter, and the robustness of the

highly ranked focus, the precision with which the noise factor should be determined

is unclear. Secondly, the optimisation would be very computationally expensive.

It takes between one minute and ten hours to process each noise factor / measure

combination, depending on the measure. But, it must be re-stated that at no point

has computation time been used to judge measures, especially as one of the slowest

(triakis11s) was explicitly designed for dedicated hardware that is not present on

the general purpose computers used for processing in these experiments.

149

Chapter 8

Conclusions and future work

In the preceding chapters, various sets of data have been acquired from human

observers. This has captured the ground truth across a number of scenes, an indica-

tion of the shape of the human focus curve, and psychophysical blur discrimination

thresholds. These have been compared with a large number of focus measures dis-

covered in the literature and created during the course of this work.

Before this data was collected and analysed, existing methodologies were re-

viewed. The primary approaches taken in this work are broadly similar to those

existing methodologies described in the literature: Focus measures were scored used

the quantitative metric described by Groen, blur was computationally applied by

convolution with a gaussian kernel, and psychophysical blur-discrimination thresh-

olds were measured with a 2AFC experimental procedure with QUEST providing

estimates for each trial. However, some of the methodologies made assumptions

that warranted exploration, and those were the starting points of this work.

Existing quantitative comparisons of focus measures used a ground truth which

was established by using a ‘trained observer’, but did not comment on the reliability

or repeatability of this observation. The first experiment in this work used a large

number of observers to explore the nature of the ground truth. The results of

this experiment showed that humans are broadly consensual in their selection of the

‘best’ image, though there is a small spread of opinions. No subset of the population

could be found that made a different decision.

This ground truth was then used to score a large number of focus measures.

Existing measures were collated from an extensive body of published work and

implemented to a common software interface, alongside new measures created during

this work. This permitted one of the largest comparisons between focus measures

to be performed – over double the number of measures than compared in previous

review papers. Chapter 5 showed that over all the scenes, only two focus measures

150

are within the 95% confidence interval of human opinions when identifying which

image of a scene is best in-focus. Many more were within 95% for just five of the

six scenes considered.

Beyond establishing the ground truth, various methods (such as finding blur-

equivalence for candidate images) were employed to establish a focus curve from

human opinions. The measures that were accurate in their agreement with the

human-selected ‘best’ were not the same as those which fitted within this focus

curve measured from human opinions.

Psychometric blur discrimination thresholds were measured in naıve observers in

two calibrated natural images. Dipper-shaped responses, as had been reported by

other researchers looking at simple stimuli, were shown to be present in these natural

scenes. However, the confidence intervals were wider than reported in the literature.

It is postulated that this may be due to the more complex nature of the stimuli than

those used by other researchers, and also due to top-down influences that are more

likely to be present when viewing natural scenes than viewing synthetic images.

Focus measures were then used as model observers to see if any measure could

reproduce the blur discrimination thresholds obtained from human observers. None

of the measures evaluated could do this until an amount of decision noise was added.

No single amount of noise could be added which universally made focus measures

agree with humans. Using a new similarity metric, these discrimination thresh-

olds established with these observers were compared with human results, so as to

establish which measures were closest to human results.

Considering the actual results of the various experiments, Table 8.1 aggregates

the data from throughout the thesis, and shows the focus measures that performed

well in each assessment. Many measures (such as menmay and thesholdedpixelcount)

failed to perform well under any of the experiments. These measures rely on global

image properties that are not sufficiently prominent in the best image, either because

of pre-determined thresholds, or their basis on statistics that are not necessary to

be present for an image to be in-focus. Those that performed best are measures

whose score depends on small features; where the underlying mathematical function

computes a score for each pixel in the image based upon comparisons with nearby

pixels, and then produces an overall score by some form of aggregation. So, whilst

there was no single focus measure that performed especially well in all scenarios,

those that did perform well were those which process pixels locally in the spatial

domain.

The measures that best matched the modal human opinion of most in-focus im-

age when considering a range of scenes were tenengrad and cranepeak. Accordingly,

151

Measure Core function Accuracy R1 R2 Shapetenengrad Convolution Best 27 15 -cranepeak 2D gradient Best 32 16 -smd 2D gradient Good 2 4 Half-fitautocorrelation Autocorrelation Good 3 13 Half-fitvoll4 Autocorrelation Good 4 3 Half-fitwaveletw1 Wavelet transform Good 5 20 Half-fitwaveletw3 Wavelet transform Good 6 27 No fitsquaredgradient 1D gradient Good 7 29 Fitthresholdedabsolute-gradient

1D gradient Good 7 28 Fit

sml Convolution Good 10 36 Half-fitlaplace Convolution Good 11 18 Half-fitwaveletw2 Wavelet transform Good 14 24 No fittriakis11s Voxel statistic Good 16 7 Fitenergylaplace5c Convolution Good 18 21 Half-fitenergylaplace5b Convolution Good 20 22 No fitbrennergradient 1D gradient Good 22 23 Fitenergylaplace Convolution Good 23 26 No fitenergylaplace5a Convolution Good 24 25 No fitva Attention Good 29 17 Fitcranesum 2D gradient Poor 1 1 Half-fitalphaImageEnsemble Spectrum statistic Poor 9 37 Half-fitgroenvariance Statistical Poor 12 6 Fitabsolutevariation Statistical Poor 19 8 Fitvol5 Autocorrelation Poor 15 12 Fithlv Histogram Poor 33 33 -standarddeviation-basedautocorrelation

Autocorrelation Poor 16 14 Fits

range Statistical Poor 39 2 -chernfft Spectrum statistic Poor 24 5 -normalizedgroenvariance Statistical Poor 20 9 -

Table 8.1: Summary of top performing focus measures by the different assessment criteria.R1 is the rank of the measure’s focus curve (see Figure 5.2 on page 120). R2 is the rank ofthe measure’s ability to match human psychophysical results, as discussed in Section 7.5.The measures are sorted first by accuracy, then R1 – low numbers are better. Overall, nosingle measure performed well in all experiments.

152

these are likely to be the most appropriate measures of those reviewed to use in

a general purpose auto-focus application. Many other measures are accurate when

considering conventional photographs, but are sensitive to the artefacts present in

new imaging techniques, be that images created by ray tracing, or acquired with

compressive imaging.

These two best measures scored very poorly when assessed with a frequently used

focus scoring metric, suggesting that this existing methodology is not necessarily the

best way of identifying good focus measures. The poor score was due to nearby false

maxima on their overall focus curve. Such a deficit could be resolved by using these

measures for providing fine focus control having first found the approximate global

maxima using another measure that has fewer erroneous maxima, such as smd or

cranesum. An alternative solution would be to ensure the peak-finding logic was

robust when false maxima are present.

Blur discrimination measurements for naıve observers viewing calibrated natu-

ral scenes were within the range of results found by other researchers. The peak

discrimination ability was found at b = 0.37 arc minutes, where discrimination of

δb = 0.63 arc minutes was possible. The closest measures to reproducing these re-

sults were cranesum, range and smd. However, for this to happen, a degree of noise

needed to be added to the model.

8.1 Review of key research questions

This thesis suggested that it is possible to construct an algorithm that can accu-

rately replicate the results of perceptual and subjective experiments, and several

key research questions were proposed to help support this thesis. These are now

considered in turn to establish the extent to which the results described answers

them.

1. Can ground truth data be obtained that is suitable for testing focus

measures?

With a large population of observers, it was found that humans are broadly

consensual when asked to identify the ‘best’ image. No subset of the observers

was found to make a particular preference, across all scenes, that was different

to the modal result. It was possible to establish quantitative data against

which focus measures could be compared.

2. How well do focus measures perform when compared against the

ground truth?

153

A large number of focus measures were implemented from the literature,

and others devised during the course of this research (as listed on page 45).

These were all assessed with the existing methodology used in previous re-

views [97,101,96]. However, the lack of weight placed on the accuracy aspect

of this scoring metric perhaps reduces its value, and it is suggested that other

approaches might be better suited for comparing focus measures depending on

the intended application – some measures were found to be good at matching

the ground truth, whilst having a poor score under the existing methodology.

A class of images was identified which hindered the ability of focus measures

to identify the ‘best’ image, whilst not hindering human observers.

3. How can human blur opinions and perception be measured?

Various approaches were attempted to gather information from observers

which could be used to construct a focus curve. Based on feedback from

the experiments’ participants, the most comfortable task involved finding the

image that represented the mid-point (in terms of blur) between two target

images. By exploring successive mid-points, a focus curve was constructed.

In addition, a standard 2AFC blur discrimination experiment was conducted

to measure blur discrimination in natural scenes under natural PSF. Blur

discrimination thresholds were measured in four human observers, and found

to be of the expected shape. Confidence intervals were larger than reported

in previous work, and it is suggested that this is due to the complexity of the

stimuli.

4. Can human results be compared with those from focus measures?

Throughout this work, the results obtained from human observers were com-

pared with those produced by the individual focus measures. The ground-

truth focus results and human-derived focus curve were compared with the

measures to identify those that best reproduced the human results. The blur-

discrimination thresholds of the focus measures was established by using each

measure as an observer. When a degree of noise was added, many of the

measures produced a dip-shaped response typical of human psychophysical

discrimination. Noise was varied to match the results of these observers with

those of human observers. No single measure was able to reproduce human

results in all experiments.

The thesis statement is “It is hypothesised that it is possible to construct an

algorithm that accurately replicates human perceptual and subjective experimental

results”. Having evaluated 46 measures, the measure identified as best meeting this

154

hypothesis is smd. No measure to-date exactly reproduces psychophysical results

and at the same time performs accurately. Instead, different algorithms have been

shown to excel at particular tasks, with the best measures being those making

judgements based on small features in the spatial domain.

8.2 Significant findings

The work described in this thesis leads to several significant findings:

� Given a set of images taken at different focal distances, the ‘best’ in terms of

human opinion is not unanimous, and should be measured per scene. Specif-

ically, the opinion of a single observer should not be used when establishing

ground truth, but that modal response from a number of observers is prefer-

able.

� Overall, focus measures do work for most scenes, but not for scenes gener-

ated using PovRay. These are artificial, but likely to be representative of the

challenges that might be encountered in the future when using compressive

imaging. It seems necessary that a new approach to focus will be necessary to

process images generated with these latest image acquisition techniques. This

reinforces Vollath’s statement that “no proof has so far been provided that

any autofocus technique can operate reliably for every type of image” [105].

� The existing methodology for comparing focus measures arbitrarily puts equal

importance on a number of criteria, meaning some highly accurate measures

perform poorly. A different framework for comparing focus measures is intro-

duced, which discards focus measures which fall outside the 95% confidence

intervals of human observations.

� Blur discrimination thresholds have been measured in natural scenes subjected

to computationally blurring using a gaussian kernel, and are shown to be in

line with those found in simple stimuli. The confidence intervals on these

thresholds measured in natural scenes are wider than those in simple stimuli,

and it is suggested that this difference is due to top-down influences and merits

further study.

� Focus measures had not previously been used as observers to establish their

blur discrimination thresholds. The results reported here show that only some

are able to reproduce human discrimination behaviour, and that to do this

decision noise must be introduced.

155

8.3 Future work

The conclusions drawn from this work point towards a number of future research

questions which could be investigated. Firstly, it has been shown that dithering

affects the performance of focus measures. Are there other image artefacts which do

not affect human observers’ ability, but which cause difficulties for focus measures?

Are there focus measures which are immune to the artefacts likely to be present in

future novel imaging devices, and are these a better match to human results?

Further investigations could be conducted to understand what features of scenes

affect peoples’ performance when assessing blur. The use of eye-tracking equipment

would expose fixation points that might help identify the salient features. This

could be compared with a ‘bubbles’ approach to identify the regions of importance

when discriminating blur [185], which could be performed by both human observers

and candidate focus measures. Inter-observer behaviour could also be examined

to establish whether blur-perception is similar amongst all observers, or whether

top-down influences can affect blur discrimination thresholds – do people perform

differently looking at a lion than when they are looking at a tree?

Preliminary results measuring VEPs were reported in 1998. Since then the tech-

nologies available to monitor activity within the human body have become ever

more powerful. It might be possible to make measurements of the blur perception

and track how that influences the ciliary muscles as the subject is shown a variety

of stimuli. Such measurements could also be made in conjunction with eye tracking

devices to further understand what aspects of images affect accommodation, from

which an improved model of accommodation could be devised.

Finally, a valuable contribution of this work is the implementation of a large

number of focus measures. To assist others in future work, these have been shared

online as Matlab files, together with the images that were acquired during this

work [186].

156

Bibliography

[1] R. T. Shilston and F. W. M. Stentiford, “Method for focus control,” US Patent

Application 20090310011, June 2007.

[2] B. T. Wagner and D. Kline, “The bases of colour vision,” http://www.psych.

ucalgary.ca/pace/va-lab/Brian/history.htm, 2002.

[3] P. Findlen and R. Bence, “A history of the eye,” http://www.stanford.edu/

class/history13/earlysciencelab/body/eyespages/eye.html, 1999.

[4] E. G. Boring, Sensation and perception in the history of experimental psychol-

ogy, New York : Appleton-Century-Crofts, 1942.

[5] eduMedia, “Focusing via visual accommodation,” http://www.

edumedia-sciences.com/en/a306-focusing-via-visual-accommodation,

Nov 2010.

[6] S. J. Judge and D. I. Flitcroft, Nervous Control of the Eye, chapter Control

of Accommodation, pp. 93–115, Harwood Academic Publishers, 2000.

[7] E. F. Fincham, “The accommodation reflex and its stimulus,” British Journal

of Opthalmology, vol. 35, pp. 381–393, 1951.

[8] E. A. Walkton Ball, “A study in consensual accommodation,” American

Journal of Optometry and Archives of American Academy of Optometry, vol.

29, no. 11, pp. 561–574, 1952.

[9] D. I. Flitcroft, S. J. Judge, and J. W. Morley, “Binocular interactions in

accommodation control: effects of anisometropic stimuli,” Journal of Neuro-

science, vol. 12, no. 1, pp. 188–203, 1992.

[10] Marran L. and Schor C.M., “Lens induced aniso-accommodation,” Vision

Research, vol. 38, pp. 3601–3619(19), November 1998.

157

http://www.psych.ucalgary.ca/pace/va-lab/Brian/history.htm

http://www.psych.ucalgary.ca/pace/va-lab/Brian/history.htm

http://www.stanford.edu/class/history13/earlysciencelab/body/eyespages/eye.html

http://www.stanford.edu/class/history13/earlysciencelab/body/eyespages/eye.html

http://www.edumedia-sciences.com/en/a306-focusing-via-visual-accommodation

http://www.edumedia-sciences.com/en/a306-focusing-via-visual-accommodation

[11] P. B. Kruger and J. Pola, “Stimuli for accommodation: blur, chromatic aber-

ration and size,” Vision Research, vol. 26, pp. 957–971, 1986.

[12] W. Taylor and H. W. Lee, “The development of the photographic lens,”

Proceedings of the Physical Society, vol. 47, no. 3, pp. 502–518, 1935.

[13] P. B. Kruger, S. Mathews, M. Katz, K. R. Addarwala, and S. Nowbotsing,

“Accommodation without feedback suggests directional signals specify ocular

focus,” Vision Research, vol. 37, no. 18, pp. 2511–2526, 1997.

[14] T. Niemann, “Ca tca00.jpg,” http://wiki.panotools.org/Chromatic_

aberration, July 2006, [accessed 13-October-2008].

[15] H. D. Crane, “A theoretical analysis of the visual accommodation system in

humans,” Tech. Rep. NASA CR-606, NASA - Ames Research Centre, Sep

1966.

[16] A. Hong-Chen, “Is there any difference in using blur as a stimulus for accom-

modation between emmetropes and myopes?,” Journal Documenta Ophthal-

mologica, vol. 105, no. 1, pp. 33–39, July 2002.

[17] F. W. Campbell, “The minimum quantity of light required to elicit the ac-

commodation reflex in man,” J Physiol, vol. 123, no. 2, pp. 357–366, 1954.

[18] Oxford University Press, “focus, n.,” The Oxford English Dictionary, 1989.

[19] A. P. Pentland, “A new sense for depth of field,” IEEE Transactions on

Pattern Analysis and Machine Intelligence, vol. 9, no. 4, pp. 523–531, 1987.

[20] D. H. Fender, “Control mechanisms of the eye,” Scientific American, vol. 211,

pp. 24–33, July 1964.

[21] M. Concetta Morrone and D. C. Burr, “Feature detection in human vision: A

phase-dependent energy model,” Proceedings of the Royal Society of London.

Series B, Biological Sciences, vol. 235, no. 1280, pp. 221–245, 1988.

[22] P. Kovesi, “Image features from phase congruency,” Videre: Journal of Com-

puter Vision Research, vol. 1, no. 3, pp. 2–26, 1999.

[23] Z. Wang, E. P. Simoncelli, and H. Hughes, “Local phase coherence and

the perception of blur,” in Advanced Neural Information Processing Systems

(NIPS03. 2004, pp. 786–792, MIT Press.

158

http://wiki.panotools.org/Chromatic_aberration

http://wiki.panotools.org/Chromatic_aberration

[24] A. Troelstra, B. L. Zuber, D. Miller, and L. W. Stark, “Accommodative

tracking: A trial-and-error function,” Vision Research, vol. 4, pp. 585–594,

1964.

[25] L. Stark and Y. Takahashi, “Absence of an odd-error signal mechanism in

human accommodation,” IEE Transactions on Bio-medical Engineering, vol.

12, no. 3/4, pp. 138–146, 1965.

[26] S. Philips and L. Stark, “Blur: A sufficient accommodative stimulus,” Docu-

menta Ophthahnologica, vol. 1, no. 43, pp. 65–89, 1977.

[27] J.A. Marshall, C.A. Burbeck, D. Ariely, J.P. Rolland, and K.E. Martin, “Oc-

clusion edge blur: A cue to relative visual depth,” April 1996.

[28] G. Mather, “Image Blur as a Pictorial Depth Cue,” Royal Society of London

Proceedings Series B, vol. 263, pp. 169–172, Feb. 1996.

[29] W. H Ittelson and A. Ames Jr, “Accommodation, convergence, and their

relation to apparent distance,” The Journal of Psychology, vol. 30, pp. 43–62,

1950.

[30] J. N. Trachtman, V. Giambalvo, and J. Feldman, “Biofeedback of accommo-

dation to reduce functional myopia.,” Biofeedback and self-regulation, vol. 6,

no. 4, pp. 547–562, 1981.

[31] P. D. R. Gamlin, Nervous Control of the Eye, chapter Functions of the

Edinger-Westphal Nucleus, pp. 117–154, Harwood Academic Publishers, 2000.

[32] T. Sugimoto, K. Itoh, and N. Mizuno, “Direct projections from the ediger-

westphal nucleus to the cerebellum and spinal cord in the cat: An hrp study,”

Neuroscience Letters, vol. 9, no. 1, pp. 17 – 22, 1978.

[33] G. K. Hung, K.J. Ciuffreda, and B.C. Jiang, Models of the visual system,

chapter Models of accommodation, pp. 287–340, Kluwer Academic Publish-

ers/Plenum Publishers, 2002.

[34] F. Fylan and M. G. A. Thomson, “Visual evoked responses to blur in natural

scenes,” in Perception 27 ECVP Abstract Supplement, 1998.

[35] S. M. Shinners, Modern control system theory and design, J. Wiley, 1998.

[36] B. Winn, J. R. Pugh, B. Gilmartin, and H. Owens, “Arterial pulse modulates

steady-state ocular accommodation,” Current Eye Research, vol. 9, pp. 971–

975, October 1990.

159

[37] W. N. Charman and G. Heron, “Fluctuations in accommodation: a review,”

Ophthalmic and Physiological Optics, vol. 8, no. 2, pp. 153–164, 1988.

[38] F. M. Toates, “A model of accommodation,” Vision Research, vol. 10, pp.

1069–1076, 1970.

[39] L. Stark, Y. Takahashi, and G. Zames, “Nonlinear servoanalysis of human lens

accommodation,” IEEE Transactions on Systems Science and Cybernetics,

vol. 1, no. 1, pp. 75–83, 1965.

[40] M. Khosroyani and G. K. Hung, “A dual-mode dynamic model of the human

accommodation system,” Bulletin of Mathematical Biology, vol. 64, pp. 285–

299, 2002.

[41] A. S. Eadie and P. J. Carlin, “Evolution of control system models of ocu-

lar accommodation, vergence and their interaction,” Medical and Biological

Engineering and Computing, vol. 33, pp. 517–524, 1995.

[42] F. W. Campbell and J. J. Nachmias, “Spatial-frequency discrimination in

human vision,” Journal of the Optical Society of America (1917-1983), vol.

60, no. 4, pp. 555–559, Apr. 1970.

[43] D. Marr and E. Hildreth, “Theory of edge detection,” Proceedings of the

Royal Society of London: Series B, Biological Sciences, vol. 207, no. 1167, pp.

187–217, Feb. 1980.

[44] J. R. Hamerly and C. A. Dvorak, “Detection and discrimination of blur in

edges and lines,” J. Opt. Soc. Am., vol. 71, no. 4, pp. 448–452, 1981.

[45] R. J. Watt and M. J. Morgan, “The recognition and representation of edge

blur: Evidence for spatial primitives in human vision,” Vision Research, vol.

23, no. 12, pp. 1465–1477, 1983.

[46] D. M. Levi and S. A. Klein, “Equivalent intrinsic blur in spatial vision,”

Vision Research, vol. 30, no. 12, pp. 1971–1993, 1990.

[47] D. M. Levi and S. A. Klein, “Equivalent intrinsic blur in amblyopia,” Vision

Research, vol. 30, no. 12, pp. 1995–2022, 1990.

[48] G. Westheimer, “Sharpness discrimination for foveal targets,” Journal of the

Optical Society of America A, vol. 8, no. 4, pp. 681–685, 1991.

160

[49] G. Mather, “The use of image blur as a depth cue,” Perception, vol. 26, pp.

1147–1158, 1997.

[50] G. Mather and D. R. Smith, “Depth cue integration: stereopsis and image

blur,” Vision Res., vol. 40, pp. 3501–3506, 2000.

[51] R. J. Jacobs, G. Smith, and C. D. Chan, “Effect of defocus on blur thresholds

and on thresholds of perceived change in blur – comparison of source and

observer methods,” Optometry and Vision Science, vol. 66, pp. 545–553, 1989.

[52] G. Walsh and W. N. Charman, “Visual sensitivity to temporal change in focus

and its relevance to the accommodation response,” Vision Research, vol. 28,

no. 11, pp. 1207–1221, 1988.

[53] D. I. Flitcroft, “Accommodation and flicker: evidence of a role for temporal

cues in accommodation control?,” Ophthalmic and Physiological Optics, vol.

11, no. 1, pp. 81–90, 1991.

[54] A. K. Paakkonen and M. J. Morgan, “Effects of motion on blur discrimina-

tion,” Journal of the Optical Society of America A, vol. 11, pp. 992–1002,

Mar. 1994.

[55] S. T. Hammett, “Motion blur and motion sharpening in the human visual

system,” Vision Research, vol. 37, no. 18, pp. 2505–2510, 1997.

[56] D. C. Burr and M. J. Morgan, “Motion deblurring in human vision,” Royal

Society of London Proceedings Series B, vol. 264, pp. 431–436, Mar. 1997.

[57] V. Kayargadde and J. B. Martens, “Estimation of edge parameters and image

blur using polynomial transforms,” CVGIP: Graphical Models and Image

Processing, vol. 56, no. 6, pp. 442–461, 1994.

[58] V. Kayargadde and J. B. Martens, “Perceptual characterization of images

degraded by blur and noise: model,” J. Opt. Soc. Am. A, vol. 13, no. 6, pp.

1178, 1996.

[59] V. Kayargadde and J. B. Martens, “Estimation of perceived image blur using

edge features,” International Journal of Imaging Systems and Technology, vol.

7, no. 2, pp. 102–109, 1996.

[60] E. Peli, “Feature detection algorithm based on a visual system model,” Pro-

ceedings of the IEEE, vol. 90, no. 1, pp. 78–93, January 2002.

161

[61] R. Ferzli and L. J. Karam, “A human visual system-based model for

blur/sharpness perception,” in International Workshop on Video Processing

and Quality Metrics for Consumer Electronics (VPQM), 2006.

[62] R. Ferzli and L. J. Karam, “A no-reference objective image sharpness metric

based on the notion of Just Noticeable Blur (JNB),” IEEE Transactions on

Image Processing, vol. 18, no. 4, pp. 717–728, April 2009.

[63] N. Narvekar and L. Karam, “An improved no-reference sharpness metric based

on the probability of blur detection,” in International Workshop on Video

Processing and Quality Metrics for Consumer Electronics (VPQM), January

2010.

[64] P. Marziliano, F. Dufaux, S. Winkler, and T. Ebrahimi, “A no-reference

perceptual blur metric,” in Proc. IEEE International Conference on Image

Processing, 2002, vol. 3, pp. 57–60.

[65] P. Marziliano, F. Dufaux, S. Winkler, and T. Ebrahimi, “Perceptual blur

and ringing metrics: application to JPEG2000,” Signal Processing: Image

Communication, vol. 19, pp. 163–172, 2004.

[66] F. W. Campbell, “The physics of visual perception,” Royal Society of London

Philosophical Transactions Series B, vol. 290, pp. 5–9, July 1980.

[67] S. A. Klein, “Channels in the visual nervous system: Neurophysiology, psy-

chophysics and models,” in Channels: Bandwidth, channel independence, de-

tection vs discrimination, B. Blum, Ed. 1992, pp. 11–27, London: Freund.

[68] V. A. Billock, “Neural acclimation to 1/f spatial frequency spectra in natu-

ral images transduced by the human visual system,” Physica D: Nonlinear

Phenomena, vol. 137, no. 3-4, pp. 379 – 391, 2000.

[69] D. J. Field, “Relations between the statistics of natural images and the re-

sponse properties of cortical cells,” J. Opt. Soc. Am. A, vol. 4, no. 12, pp.

2379–2394, 1987.

[70] C. R. Carlson, “Thresholds for perceived image sharpness,” Photographic

science and engineering, 1978.

[71] D. C. Knill, D. Field, and D. Kersten, “Human discrimination of fractal

images,” Journal of the Optical Society of America A, vol. 7, no. 6, pp. 1113–

1123, 1990.

162

[72] D. J. Field and N. Brady, “Visual sensitivity, blur and the sources of variability

in the amplitude spectra of natural scenes,” Vision Research, vol. 37, no. 23,

pp. 3367–3383, 1997.

[73] Y. Tadmor and D. J. Tolhurst, “Discrimination of changes in the second-order

statistics of natural and synthetic images,” Vision Research, vol. 34, no. 4,

pp. 541–554, 1994.

[74] D. J. Tolhurst and Y. Tadmor, “Discrimination of changes in the slopes of the

amplitude spectra of natural images: band-limited contrast and psychometric

functions,” Perception, vol. 26, no. 8, pp. 1011–1025, 1997.

[75] Ian BC North on flickr, “Blinds,” http://www.flickr.com/photos/

91997797@N00/4057584742/, October 2009.

[76] larryvincent on flickr, “Large blinds,” http://www.flickr.com/photos/

larryvincent/3122806255/sizes/l/, December 2008.

[77] C. A. Parraga and D. J. Tolhurst, “The effect of contrast randomisation on

the discrimination of changes in the slopes of the amplitude spectra of natural

scenes,” Perception, vol. 29, pp. 1101–1116, 2000.

[78] M. G. A. Thomson and D. H. Foster, “Role of second- and third-order statistics

in the discriminability of natural images,” J. Opt. Soc. Am. A, vol. 14, no. 9,

pp. 2081–2090, 1997.

[79] C. A. Parraga, T. Troscianko, and D. J. Tolhurst, “The effects of amplitude-

spectrum statistics on foveal and peripheral discrimination of changes in natu-

ral images, and a multi-resolution model,” Vision Research, vol. 45, no. 25-26,

pp. 3145 – 3168, 2005.

[80] M. P. S. To, I. D. Gilchrist, T. Troscianko, J. S. B. Kho, and D. J. Tolhurst,

“Perception of differences in natural-image stimuli: Why is peripheral viewing

poorer than foveal?,” ACM Trans. Appl. Percept., vol. 6, no. 4, pp. 26, 2009.

[81] B. Wang and K. J. Ciuffreda, “Foveal blur discrimination of the human eye,”

Opthal. Physiol. Opt, vol. 25, pp. 45–51, 2005.

[82] B. Wang and K. J. Ciuffreda, “Blur discrimination of the human eye in the

near retinal periphery,” Optometry and Vision Science, vol. 82, no. 1, pp.

52–58, January 2005.

163

http://www.flickr.com/photos/91997797@N00/4057584742/


http://www.flickr.com/photos/larryvincent/3122806255/sizes/l/

http://www.flickr.com/photos/larryvincent/3122806255/sizes/l/

[83] G. Westheimer, S. Brincat, and C. Wehrhahn, “Contrast dependency of foveal

spatial functions: orientation, vernier, separation, blur and displacement dis-

crimination and the tilt and Poggendorff illusions,” Vision Research, vol. 39,

pp. 1631–1639, 1999.

[84] S. M. Wuerger, H. Owens, and S. Westland, “Blur tolerance in different

colour directions,” in International Conference on Color in Graphics and

Image Processing – CGIP2000, 2000.

[85] M. A. Webster, S. M. Webster, J. MacDonald, and S. R. Bahradwadj, “Adap-

tation to blur,” Human Vision and Electronic Imaging VI, vol. 4299, no. 1,

pp. 69–78, 2001.

[86] S. M. Webster, M. A. Webster, and J. Taylor, “Simultaneous blur contrast,”

Human Vision and Electronic Imaging VI, vol. 4299, no. 1, pp. 414–422, 2001.

[87] M. A. Webster, M. A. Georgeson, and S. M. Webster, “Neural adjustments

to image blur,” Nature Neuroscience, vol. 5, no. 9, pp. 839–840, 2002.

[88] A. C. Bilson, Y. Mizokami, and M. A. Webster, “Visual adjustments to tem-

poral blur,” J. Opt. Soc. Am. A, vol. 22, no. 10, pp. 2281–2288, 2005.

[89] M. A. Webster, Y. Mizokami, L. A. Svec, and S. L. Elliott, “Neural adjust-

ments to chromatic blur,” Spatial Vision, vol. 19, no. 2-4, pp. 111–132, 2006.

[90] D. M. Chandler, K. H. Lim, and S. S. Hemami, “Effects of spatial correlations

and global precedence on the visual fidelity of distorted images,” in Human

Vision and Electronic Imaging XI. Edited by Rogowitz, Bernice E.; Pappas,

Thrasyvoulos N.; Daly, Scott J. Proceedings of the SPIE, Volume 6057, pp.

131-145 (2006)., B. E. Rogowitz, T. N. Pappas, and S. J. Daly, Eds., Feb.

2006, vol. 6057 of Presented at the Society of Photo-Optical Instrumentation

Engineers (SPIE) Conference, pp. 131–145.

[91] E. Vansteenkiste, D. Van der Weken, W. Philips, and E. Kerre, Perceived

Image Quality Measurement of State-of-the-Art Noise Reduction Schemes, pp.

114–126, Springer Berlin / Heidelberg, 2006.

[92] A. Erteza, “Sharpness index and its application to focus control,” Appl. Opt.,

vol. 15, no. 4, pp. 877–881, 1976.

[93] R. A. Muller and A. Buffington, “Real-time correction of atmospherically

degraded telescope images through image sharpening,” J. Opt. Soc. Am., vol.

64, no. 9, pp. 1200–1210, 1974.

164

[94] N. K. Chern, P. A. Neow, and Jr. Ang, M.H., “Practical issues in pixel-based

autofocusing for machine vision,” Robotics and Automation, 2001. Proceedings

2001 ICRA. IEEE International Conference on, vol. 3, pp. 2791 – 2796 vol.3,

2001.

[95] D. Vollath, “The influence of the scene and of noise on automatic focussing,”

Journal of Microscopy, vol. 151, no. S, pp. 133–146, 1988.

[96] F. C. Groen, I. T. Young, and G. Ligthart, “A comparison of different focus

functions for use in autofocus algorithms,” Cytometry, vol. 6, no. 2, pp. 81–91,

1985.

[97] Y. Sun, S. Duthaler, and B. Nelson, “Autofocusing in computer microscopy:

Selecting the optimal focus algorithm,” Microscopy Research and Technique,

vol. 65, no. 3, pp. 139–149, 2004.

[98] E. Krotkov, “Focusing,” Intl. Journal of Computer Vision, vol. 1, no. 3,

pp. 223–237, October 1987, Reprinted in Physics-Based Vision, Jones and

Bartlett, 1992.

[99] A. Santos, C. Ortiz de Solorzano, J. J. Vaquero, J. M. Pena, N. Malpica,

and F. del Pozo, “Evaluation of autofocus functions in molecular cytogenetic

analysis,” Journal of Microscopy, vol. 188, no. 3, pp. 264–272, 1997.

[100] M. L. Mendelsohn and B. H. Mayall, “Computer-oriented analysis of human

chromosomes–iii. focus,” Computers in Biology and Medicine, vol. 2, no. 2,

pp. 137 – 138, IN9–IN10, 139–145, 146–150, 1972, Chromosome Analysis.

[101] L. Firestone, K. Cook, K. Culp, N. Talsania, and K. Preston Jr, “Comparison

of autofocus methods for automated microscopy,” Cytometry, vol. 12, no. 3,

pp. 195–206, 1991.

[102] Carl Zeiss, “Method of and device for the automatic focusing of microscopes,”

US Patent 3721759, March 1973.

[103] D. C. Mason and D. K. Green, “Automatic focusing of a computer-controlled

microscope,” Biomedical Engineering, IEEE Transactions on, vol. BME-22,

no. 4, pp. 312 –317, July 1975.

[104] R. A. Jarvis, “Focus optimization criteria for computer image processing,”

Microscope, vol. 24, no. 2, pp. 163–180, 1976.

165

[105] D. Vollath, “Automatic focusing by correlative methods,” Journal of Mi-

croscopy, vol. 147, no. 3, pp. 279–288, 1987.

[106] S. K. Nayar, “Shape from focus,” M.S. thesis, Carnegie Mellon University,

1989.

[107] E. Peli, “Contrast in complex images,” J. Opt. Soc. Am. A, vol. 7, no. 10, pp.

2032–2040, Oct 1990.

[108] G. Yang and B. J. Nelson, “Wavelet-based autofocusing and unsupervised

segmentation of microscopic images,” Intelligent Robots and Systems, 2003.

(IROS 2003). Proceedings. 2003 IEEE/RSJ International Conference on, vol.

3, pp. 2143 – 2148 vol.3, oct. 2003.

[109] J Caviedes and F. Oberti, “A new sharpness metric based on local kurtosis,

edge and energy information,” Signal Processing: Image Communication, vol.

19, pp. 147–161, 2004.

[110] R. Shilston and F. W. M. Stentiford, “An attention based focus control sys-

tem,” in Image Processing, 2006 IEEE International Conference on, Vol.,

Iss., October 2006, pp. 425–428.

[111] M. Subbarao, T. Choi, and A. Nikzad, “Focusing techniques,” Journal of

Optical Engineering, vol. 32, pp. 2824–2836, 1993.

[112] AccelerEyes, “Success stories for jacket,” http://www.accelereyes.com/

successstories, March 2010.

[113] R. J. Watt and M. J. Morgan, “A theory of the primitive spatial code in

human vision,” Vision Research, vol. 25, pp. 1661–1674, 1985.

[114] P. J. Bex and S. Murray, “Blur perception and discrimination in naturally-

contoured images,” in Association for Research in Vision and Opthalmology

2010 conference: For sight – The future of eye vision and research., 2010.

[115] S. Murray and P. J. Bex, “Perceived blur in naturally-contoured images de-

pends on phase,” Frontiers in Psychology, vol. 1, no. 0, Jun 2010.

[116] A. Roorda, F. Romero-Borja, III W. Donnelly, H. Queener, T. Hebert, and

M. Campbell, “Adaptive optics scanning laser ophthalmoscopy,” Opt. Express,

vol. 10, no. 9, pp. 405–412, May 2002.

166

http://www.accelereyes.com/successstories

http://www.accelereyes.com/successstories

[117] R. Navarro, E. Moreno, and C. Dorronsoro, “Monochromatic aberrations and

point-spread functions of the human eye across the visual field,” J. Opt. Soc.

Am. A, vol. 15, no. 9, pp. 2522–2529, 1998.

[118] G. L. Ligthart and Groen F. C. A., “A comparison of different autofocus

algorithms,” in International Conference on Pattern Recognition, 1982, pp.

597–600.

[119] R. Leggat, “A history of photography from its beginnings till the 1920s,”

http://www.rleggat.com/photohistory/, 1995, [accessed 02-May-2007].

[120] P. Greenspun, “History of photography timeline,” http://photo.net/

history/timeline, January 2007, [accessed 02-May-2007].

[121] M. Minsky, “Memoir on inventing the confocal scanning microscope,” Scan-

ning, vol. 10, pp. 128–138, 1988.

[122] L. Byoung-kwon and K. Hyon-soo, “Focusing method for digital photograph-

ing apparatus,” US Patent Application 2005185082, August 2005.

[123] Canon, “Canon launches first powershot a-series camera with optical image

stabilization,” Press Release, August 2006, [accessed 06-January-2007].

[124] Canon, “Digital ixus i7 zoom,” http://www.canon-europe.com/For_Home/

Product_Finder/Cameras/Digital_Camera/IXUS/Digital_IXUS_I7_zoom/

index.asp?ComponentID=393526\&SourcePageID=231640, 2006, [accessed

06-January-2007].

[125] Canon, “Digital ixus i7 advanced camera user guide,” 2006.

[126] Nikon Corporation, “Nikon face-priority AF,” http://www.nikonimaging.

com/global/news/2005/pdf/FacepriorityAFnr.pdf, February 2005.

[127] Steve’s Digicam Online Inc, “Canon EOS Digital Rebel XSi,” http://www.

steves-digicams.com/2008_reviews/canon_rebel_xsi.html, May 2008.

[128] P. Porntrakoon, D. Kesrarat, and J. Daengdej, “A model for auto-focus us-

ing location detection based on sound localization,” Applied Simulation and

Modelling, Proceedings of, June 2004.

[129] H. Jung, J. Bae, S. Suh, and H. Yi, “Use of TRIZ to develop a novel auto-focus

camera module,” Triz Journal, August 2006.

167

http://www.rleggat.com/photohistory/

http://photo.net/history/timeline

http://photo.net/history/timeline

http://www.canon-europe.com/For_Home/Product_Finder/Cameras/Digital_Camera/IXUS/Digital_IXUS_I7_zoom/index.asp?ComponentID=393526\&SourcePageID=231640



http://www.nikonimaging.com/global/news/2005/pdf/FacepriorityAFnr.pdf

http://www.nikonimaging.com/global/news/2005/pdf/FacepriorityAFnr.pdf

http://www.steves-digicams.com/2008_reviews/canon_rebel_xsi.html

http://www.steves-digicams.com/2008_reviews/canon_rebel_xsi.html

[130] M. B. Wakin, J. N. Laska, M. F. Duarte, D. Baron, S. Sarvotham, D. Takhar,

K. F. Kelly, and R. G. Baraniuk, “An architecture for compressive imaging,”

in Image Processing, 2006 IEEE International Conference on, Oct. 2006, pp.

1273–1276.

[131] P. E. Debevec and J. Malik, “Recovering high dynamic range radiance maps

from photographs,” Computer Graphics, vol. 31, no. Annual Conference Series,

pp. 369–378, 1997.

[132] Fujifilm, “Further progress of the Super CCD,” ttp://www.

fujifilmholdings.com/en/pdf/investors/annual_report/ff_ar_2003_

part_004.pdf, March 2003.

[133] P. Askey, “Fujifilm FinePix S3 Pro review,” http://www.dpreview.com/

reviews/fujifilms3pro, March 2005.

[134] Hewlett Packard, “HP Adaptive Lighting technology,” Tech. Rep., Hewlett-

Packard, 2004.

[135] Hewlett Packard, “HP Photosmart digital cameras HP Real Life technologies,”

Tech. Rep., Hewlett-Packard, 2004.

[136] R. Ng, M. Levoy, M. Brdif, G. Duval, M. Horowitz, and P. Hanrahan, “Light

field photography with a hand-held plenoptic camera,” Tech. Rep. CTSR

2005-02, Stanford, 2005.

[137] A. Isaksen, L. McMillan, and S. J. Gortler, “Dynamically reparameterized

light fields,” in SIGGRAPH ’00: Proceedings of the 27th annual conference

on Computer graphics and interactive techniques, New York, NY, USA, 2000,

pp. 297–306, ACM Press/Addison-Wesley Publishing Co.

[138] Y. Xiong and S. Shafer, “Depth from focusing and defocusing,” Tech. Rep.

CMU-RI-TR-93-07, Robotics Institute, Carnegie Mellon University, Pitts-

burgh, PA, March 1993.

[139] P. Favaro, H. Jin, A. Yezzi, and S. Soatto, “A variational approach to shape

from defocus,” Proc. of the European Conference on Computer Vision, May

2002.

[140] M. A. Georgeson, K. A. May, T. C. A. Freeman, and G. S. Hesse, “From filters

to features: Scalespace analysis of edge and blur coding in human vision,”

Journal of Vision, vol. 7, no. 13, pp. –, 2007.

168

ttp://www.fujifilmholdings.com/en/pdf/investors/annual_report/ff_ar_2003_part_004.pdf



http://www.dpreview.com/reviews/fujifilms3pro

http://www.dpreview.com/reviews/fujifilms3pro

[141] E. Renrich and R. Caruana, “Infinite depth of field methodology,” http:

//www.cs.cornell.edu/~erenrich/dof/, 2003.

[142] S. Ball, “Deep focus,” http://www.dipteristsforum.org.uk/deepfocus/

DeepFocus.pdf, 2004.

[143] M. Brady and G. E. Legge, “Camera calibration for natural image studies and

vision research,” J. Opt. Soc. Am. A, vol. 26, no. 1, pp. 30–42, 2009.

[144] FlashPoint Technology, “Scripting manual for DigitaScript 1.5,” Tech. Rep.

6-2120-0001-01 v1.5.2, FlashPoint Technology, 2000.

[145] R. Willson and S. Shafer, “Precision imaging and control for machine vision re-

search at carnegie mellon university,” Tech. Rep. CMU-CS-92-118, Computer

Science Department, Carnegie Mellon University, Pittsburgh, PA, March 1992.

[146] J. V. Piqueres, “Still life with bolts,” http://www.ignorancia.org/en/

index.php?page=Bolts_still, 2002.

[147] POV-Ray, “Pov-ray hall of fame,” http://hof.povray.org/, March 2010.

[148] D. J. Tolhurst and Y. Tadmor, “Band-limited contrast in natural images ex-

plains the detectability of changes in the amplitude spectra,” Vision Research,

vol. 37, no. 23, pp. 3203 – 3215, 1997.

[149] P. J. Bex, I. Mareschal, and S. C. Dakin, “Contrast gain control in natural

scenes,” Journal of Vision, vol. 7, no. 11, pp. 1–12, 8 2007.

[150] P. J. Bex, S. G. Solomon, and S. C. Dakin, “Contrast sensitivity in natural

scenes depends on edge as well as spatial frequency structure,” J. Vis., vol. 9,

no. 10, pp. 1–19, 9 2009.

[151] A. Olmos and F. Kingdom, “McGill calibrated colour image database,” http:

//tabby.vision.mcgill.ca, 2004.

[152] J. H. van Hateren and A. van der Schaaf, “Independent component filters of

natural images compared with simple cells in primary visual cortex,” Proceed-

ings: Biological Sciences, vol. 265, no. 1394, pp. 359–366, Mar 1998.

[153] J. F. Brenner, B. S. Dew, J. B. Horton, T. King, P. W. Neurath, and W. D.

Selles, “An automated microscope for cytologic research a preliminary evalu-

ation,” J. Histochem. Cytochem., vol. 24, no. 1, pp. 100–111, 1976.

169

http://www.cs.cornell.edu/~erenrich/dof/

http://www.cs.cornell.edu/~erenrich/dof/

http://www.dipteristsforum.org.uk/deepfocus/DeepFocus.pdf

http://www.dipteristsforum.org.uk/deepfocus/DeepFocus.pdf

http://www.ignorancia.org/en/index.php?page=Bolts_still

http://www.ignorancia.org/en/index.php?page=Bolts_still

http://hof.povray.org/

http://tabby.vision.mcgill.ca

http://tabby.vision.mcgill.ca

[154] Wikipedia, “List of color spaces and their uses — Wikipedia, the free ency-

clopedia,” 2006, [accessed 20-November-2006].

[155] C. Poynton, “Frequently asked questions about colour,” http://www.

poynton.com/PDFs/ColorFAQ.pdf, 1997.

[156] M. Stokes, M. Anderson, S. Chandrasekar, and R. Motta, “A standard default

color space for the internet - sRGB,” Tech. Rep., HP and Microsoft, November

1996.

[157] Adobe Systems Inc, “Adobe RGB (1998) color image encoding,” Tech. Rep.,

Adobe, May 2005.

[158] Y. K. Lee, “Comparison of CIELAB dE* and CIEDE2000 color-differences af-

ter polymerization and thermocycling of resin composites,” Dental Materials,

vol. 21, no. 7, pp. 678–682, July 2005.

[159] C. H. Chou, K. C. Liu, and C. S. Lin, “Perceptually optimized JPEG2000

coder based on CIEDE2000 color difference equation,” Image Processing,

2005. ICIP 2005. IEEE International Conference on, vol. 3, pp. 1184–1187,

September 2005.

[160] M. W. Schwarz, W. B. Cowan, and J. C. Beatty, “An experimental comparison

of RGB, YIQ, LAB, HSV, and opponent color models,” ACM Trans. Graph.,

vol. 6, no. 2, pp. 123–158, 1987.

[161] G. Paschos, “Perceptually uniform color spaces for color texture analysis: an

empirical evaluation.,” IEEE Transactions on Image Processing, vol. 10, no.

6, pp. 932–937, 2001.

[162] J. McNames, “An effective color scale for simultaneous color and gray-scale

publications,” IEEE Signal Processing Magazine, vol. 23, no. 1, pp. 82–96,

Jan 2006.

[163] M. Morgan, C. Chubb, and J. A. Solomon, “A dipper function for texture

discrimination based on orientation variance,” Journal of Vision, vol. 8, no.

11, pp. –, 2008.

[164] G. Mather and D. R. R. Smith, “Blur discrimination and its relation to blur-

mediated depth perception,” Perception, vol. 31, no. 10, pp. 1211 – 1219,

2002.

170

http://www.poynton.com/PDFs/ColorFAQ.pdf

http://www.poynton.com/PDFs/ColorFAQ.pdf

[165] D. G. Pelli and L. Zhang, “Accurate control of contrast on microcomputer

displays,” Vision Research, vol. 31, no. 7-8, pp. 1337 – 1350, 1991.

[166] Wikipedia, “Scotopic vision — wikipedia, the free encyclopedia,” 2010, [On-

line; accessed 22-June-2010].

[167] D. Karatzas and S. M. Wuerger, “A hardware-independent colour calibration

technique,” Annals of the British Machine Vision Association, vol. 2007, no.

3, pp. 1–11, Jan. 2007.

[168] J. A. Solomon, “Contrast discrimination: Second responses reveal the rela-

tionship between the mean and variance of visual signals,” Vision Research,

vol. 47, no. 26, pp. 3247 – 3258, 2007.

[169] M. A. Garcia-Perez, “Forced-choice staircases with fixed step sizes: asymptotic

and small-sample properties,” Vision Research, vol. 38, no. 12, pp. 1861 – 1881,

1998.

[170] A. B. Watson and D. G. Pelli, “QUEST: a Bayesian adaptive psychometric

method.,” Perception and Psychophysics, vol. 33, no. 2, pp. 113–120, February

1983.

[171] D. H. Brainard, “The psychophysics toolbox,” Spatial Vision, vol. 10, pp.

433–436(4), 1997.

[172] D. G. Pelli, “The videotoolbox software for visual psychophysics: transforming

numbers into movies,” Spatial Vision, vol. 10, pp. 437–442(6), 1997.

[173] A. J. Simmers, P. J. Bex, and R. F. Hess, “Perceived Blur in Amblyopia,”

Investigative Ophthalmology and Visual Science, vol. 44, no. 3, pp. 1395–1400,

2003.

[174] I. Fine and A. Rokem, “Psychtoolbox tutorial,” Tech. Rep., Silver lab, Uni-

versity of California, Berkeley, 2007.

[175] F. A. Wichmann and N. J Hill, “The psychometric function: I. Fitting, sam-

pling and goodness-of-fit,” Perception and Psychophysics, vol. 63, no. 8, pp.

1293–1313, 2001.

[176] F. A. Wichmann and N. J Hill, “The psychometric function: II. Bootstrap-

based confidence intervals and sampling,” Perception and Psychophysics, vol.

63, no. 8, pp. 1314–1329, 2001.

171

[177] DrBob, “Principle of the imaging provided by the convex lens,” http://

commons.wikimedia.org/wiki/File:Lens3.svg, March 2006, [accessed 01-

June-2010].

[178] R. J. Secord, “Image preloader with progress bar,” On-

line tutorial, http://www.knowledgesutra.com/forums/topic/

21316-image-preloader-with-progress-bar-status/, 2005.

[179] R Development Core Team, R: A Language and Environment for Statistical

Computing, R Foundation for Statistical Computing, Vienna, Austria, 2008,

ISBN 3-900051-07-0.

[180] J. M. Buhmann, D. W. Fellner, M. Held, J. Ketterer, and J. Puzicha,

“Dithered color quantization,” Computer Graphics Forum, vol. 17, no. 3,

pp. 219–231, 1998.

[181] S. M. Wuerger, H. Owens, and S. Westland, “Blur tolerance for luminance

and chromatic stimuli,” Journal of the Optical Socity of America A, vol. 18,

no. 6, pp. 1231–1239, June 2000.

[182] D. G. Albrecht and D. B. Hamilton, “Striate cortex of monkey and cat:

contrast response function,” J Neurophysiol, vol. 48, no. 1, pp. 217–237, 1982.

[183] M. Carandini, “Matteobox toolbox,” 2001.

[184] S. T. Mueller and C. T. Weidemann, “Decision noise: An explanation for

observed violations of signal detection theory,” Psychonomic Bulletin and

Review, vol. 15, no. 3, pp. 465–494, 2008.

[185] F. Gosselin and P. G. Schyns, “Bubbles: a technique to reveal the use of

information in recognition tasks,” Vision Research, vol. 41, no. 17, pp. 2261

– 2271, 2001.

[186] R. T. Shilston, “Matlab implementation of focus measures,” http://

shilston.com/focusmeasures, September 2011.

[187] R. Ferzli, L. J. Karam, and J. Caviedes, “A robust image sharpness met-

ric based on kurtosis measurement of wavelet coefficients,” in International

Workshop on Video Processing and Quality Metrics for Consumer Electronics

(VPQM), 2005.

172

http://commons.wikimedia.org/wiki/File:Lens3.svg

http://commons.wikimedia.org/wiki/File:Lens3.svg

http://www.knowledgesutra.com/forums/topic/21316-image-preloader-with-progress-bar-status/

http://www.knowledgesutra.com/forums/topic/21316-image-preloader-with-progress-bar-status/

http://shilston.com/focusmeasures

http://shilston.com/focusmeasures

[188] R. Ferzli and L. J. Karam, “No-reference objective wavelet based noise im-

mune image sharpness metric,” in Image Processing, 2005. ICIP 2005. IEEE

International Conference on, September 2005, vol. 1, pp. I – 405–8.

[189] Sony, “EVI D30/31 command list,” http://bssc.sel.sony.com/

Professional/docs/manuals/evid30commandlist1-21.pdf, 1999.

[190] University of Southern California iLab, “iLab Neuromorphic Vision C++

Toolkit,” http://ilab.usc.edu/toolkit/, Jan 2006.

[191] BBC, “Mobile phone sales start to slow,” http://news.bbc.co.uk/1/hi/uk_

politics/5403588.stm, Oct 2006.

[192] K. Behr, “‘Same-as-Difference’: Narrative transformations and intersecting

cultures in Harry Potter,” Journal of Narrative Theory, vol. 35, no. 1, pp.

112–132, 2005.

[193] M. Bollmann, R. Hoischen, and B. Mertsching, “Integration of static and

dynamic scene features guiding visual attention,” in Mustererkennung 1997,

19. DAGM-Symposium, London, UK, 1997, pp. 483–490, Springer-Verlag.

[194] A. P. Bradley and F. W. M. Stentiford, “JPEG2000 and region of interest

coding,” Digital Image Computing Techniques and Applications, Australia,

Proceedings of, Jan 2002.

[195] C. Bundesen, “A theory of visual attention,” Psychological Review, vol. 97,

no. 4, pp. 523–547, 1990.

[196] W. H. Cheng, W. T. Chu, and J. L. Wu, “A visual attention based region-of-

interest determination framework for video sequences,” IEICE Trans Inf Syst,

vol. E88-D, no. 7, pp. 1578–1586, 2005.

[197] S. C. Cheung, “Visualizing marriage in Hong Kong,” Visual Anthropology,

vol. 19, no. 1, pp. 21–37, 2006.

[198] T. N. Cornsweet and H. D. Crane, “Experimental study of visual accommo-

dation,” Tech. Rep. NASA CR-2007, NASA - Ames Research Centre, March

1972.

[199] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A.

Harshman, “Indexing by latent semantic analysis,” Journal of the American

Society of Information Science, vol. 41, no. 6, pp. 391–407, 1990.

173

http://bssc.sel.sony.com/Professional/docs/manuals/evid30commandlist1-21.pdf

http://bssc.sel.sony.com/Professional/docs/manuals/evid30commandlist1-21.pdf

http://ilab.usc.edu/toolkit/

http://news.bbc.co.uk/1/hi/uk_politics/5403588.stm

http://news.bbc.co.uk/1/hi/uk_politics/5403588.stm

[200] A. D. Doulamis, N. Doulamis, G. Akrivas, and S. Kollias, “Non-sequential

video content representation using temporal variation of feature vectors,”

IEEE Transactions on Consumer Electronics, vol. 46, no. 3, pp. 758–768,

2000.

[201] S. M. Ebenholtz, Oculomotor systems and perception, Cambridge University

Press, 2001.

[202] M. Fairchild, “Refinement of the RLAB color space,” Color Res. Appl, vol.

21, pp. 338–346, 1996.

[203] A. Mfit Ferman and A. Murat Tekalp, “Two-stage hierarchical video summary

extraction to match low-level user browsing preferences,” Multimedia, IEEE

Transactions on, vol. 5, pp. 244–256, June 2003.

[204] D. M. Frohlich, Audiophotography: Bringing photos to life with sounds,

Kluwer, 2004.

[205] P. Getreuer, “Matlab CIE functions,” http://www.mathworks.com/

matlabcentral/fileexchange/loadFile.do?objectId=7744, 2005.

[206] J. Geusebroek, F. Cornelissen, A. Smeulders, and H. Geerts, “Robust autofo-

cusing in microscopy,” Cytometry, vol. 39, no. 1, pp. 1–9, 2000.

[207] gi@como on flickr, “A good friend,” http://www.flickr.com/photos/

49784886@N00/441230864/, March 2007.

[208] Y. Gong and X. Liu, “Video summarization with minimal visual content

redundancies.,” in ICIP (3), 2001, pp. 362–365.

[209] Y. Gong and X. Liu, “Summarizing video by minimizing visual content re-

dundancies.,” in ICME, 2001.

[210] Y. Gong and X. Liu, “Generating optimal video summaries,” in IEEE Inter-

national Conference on Multimedia and Expo (III), 2000, pp. 1559–1562.

[211] Y. Gong and X. Liu, “Video summarization using singular value decomposi-

tion,” in Proc. IEEE Intl. Conf. on Computer Vision and Pattern Recoginition,

June 2000, pp. 174–180.

[212] P. J. Green, “Colorimetry and colour difference,” in Colour Engineering, P. J.

Green and L. W. MacDonald, Eds. Wiley, 2002.

174

http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=7744

http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=7744



[213] A. Hanjalic and H. J. Zhang, “An integrated scheme for automated video

abstraction based on unsupervised cluster-validity analysis,” IEEE Trans.

Circuits Syst. Video Techn., vol. 9, no. 8, pp. 1280–1289, 1999.

[214] D. Harper, “Talking about pictures: a case for photo elicitation,” Visual

Studies, vol. 17, no. 1, pp. 13–26, April 2002.

[215] L. He, E. Sanocki, A. Gupta, and J. Grudin, “Auto-summarization of audio-

video presentations,” in MULTIMEDIA ’99: Proceedings of the seventh ACM

international conference on Multimedia (Part 1), New York, NY, USA, 1999,

pp. 489–498, ACM Press.

[216] L. Itti and C. Koch, “Computational modeling of visual attention,” Nature

Reviews: Neuroscience, vol. 2, pp. 2–11, March 2001.

[217] L. Itti and C. Koch, “Feature combination strategies for saliency-based visual

attention systems,” Journal of Electronic Imaging, vol. 10, no. 1, pp. 161–169,

January 2001.

[218] L. Itti and C. Koch, “A saliency-based search mechanism for overt and covert

shifts of visual attention,” Vision Research, vol. 40, no. 10-12, pp. 1489–1506,

May 2000.

[219] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for

rapid scene analysis,” IEEE Transactions on Pattern Analysis and Machine

Intelligence, vol. 20, no. 11, pp. 1254–1259, 1998.

[220] G. M. Johnson and M. D. Fairchild, “A top down description of S-CIELAB

and CIEDE2000,” Color Research and Application, vol. 28, pp. 425–435, 2003.

[221] M. M. Jose, “MPEG-7 overview,” Tech. Rep. ISO/IEC JTCI/SC/29/WG11

N6828, ISO, 2004.

[222] jpstanley on flickr, “Moon and elnath,” http://www.flickr.com/photos/

jpstanley/138038291/, April 2006.

[223] G. King, Say ‘cheese’! : the snapshot as art and social history, Collins, 1986.

[224] C. Koch and S. Ullman, “Shifts in selective visual attention: towards the

underlying neural circuitry.,” Hum Neurobiol, vol. 4, no. 4, pp. 219–227, 1985.

[225] J. C. Kotulak and C. M. Schor, “The accommodative response to subthreshold

blur and to perceptual fading during the troxler phenomenon,” Perception,

vol. 15, pp. 7–15, 1986.

175

http://www.flickr.com/photos/jpstanley/138038291/

http://www.flickr.com/photos/jpstanley/138038291/

[226] R. G. Kuehni, “Color space and its divisions,” Color Research & Application,

vol. 26, no. 3, pp. 209–222, 2001.

[227] T. Kurita and T. Kato, “Learning of personal visual impression for image

database systems,” in in Proceedings of the Second International Conference

on Document Analysis and Recognition, 1993, pp. 547–552.

[228] S. Lee, S. Lee, and D. Chen, “Automatic video summary and description,” in

VISUAL ’00: Proceedings of the 4th International Conference on Advances in

Visual Information Systems, London, UK, 2000, pp. 37–48, Springer-Verlag.

[229] J. E. Lewis and L. Maler, “Blurring of the senses: common cues for distance

perception in diverse sensory systems,” Neuroscience, vol. 114, no. 1, pp.

19–22, September 2002.

[230] J. Li, “Autofocus searching algorithm considering human visual system limi-

tations,” Optical Engineering, vol. 44, no. 11, pp. 113201, 2005.

[231] Y. Li, T. Zhang, and D. Tretter, “An overview of video abstraction tech-

niques,” Tech. Rep. HPL-2001-191, HP Laboratories, Palo Alto, July 2001.

[232] R. Lienhart, S. Pfeiffer, and W. Effelsberg, “Video abstracting,” Communi-

cations of the ACM, vol. 40, no. 12, pp. 54–62, 1997.

[233] M. R. Luo, C. Mincheq, P. Kenyon, and G. Cui, “Verification of CIEDE2000

using industrial data,” in AIC 2004 Color and Paints, Interim Meeting of the

International Color Association, 2004, pp. 97–102.

[234] Y. Ma, L. Lu, H. Zhang, and M. Li, “A user attention model for video summa-

rization,” in MULTIMEDIA ’02: Proceedings of the tenth ACM international

conference on Multimedia, New York, NY, USA, 2002, pp. 533–542, ACM

Press.

[235] Y. F. Ma, L. Lu, H. J. Zhang, and M. Li, “A user attention model for

video summarization,” in MULTIMEDIA ’02: Proceedings of the tenth ACM

international conference on Multimedia, New York, NY, USA, 2002, pp. 533–

542, ACM Press.

[236] Y. F. Ma and H. J. Zhang, “A model of motion attention for video skimming,”

in Proceedings of the International Conference on Image Processing, 2002, pp.

129–132.

176

[237] D. L. MacAdam, “Visual sensitivities to color differences in daylight,” Journal

of the Optical Society of America, vol. 42, no. 5, pp. 247–274, May 1942.

[238] C. McCreery, “Analysis of variance,” Tech. Rep. 2007-3, Oxford Forum, 2007.

[239] J. Meyer, “The future of digital imaging - high dynamic range photography

(HDR),” http://www.cybergrain.com/tech/hdr, 2004.

[240] A. Misawa, “Focusing apparatus for adjusting focus of an optical instrument,”

US Patent 6,864,474, Feb 2002.

[241] J. Nachmias and R. V. Sansbury, “Grating contrast: Discrimination may be

better than detection,” Vision Research, vol. 14, no. 10, pp. 1039 – 1042, 1974.

[242] J. Nam and A. H. Tewfik, “Dynamic video summarization and visualization,”

in MULTIMEDIA ’99: Proceedings of the seventh ACM international confer-

ence on Multimedia (Part 2), New York, NY, USA, 1999, pp. 53–56, ACM

Press.

[243] J. Nam and A. H. Tewfik, “Video abstract of video,” in Multimedia Signal

Processing, 1999 IEEE 3rd Workshop on, 1999, pp. 117–122.

[244] Netscape, “Colors,” http://wp.netscape.com/home/bg/colorindex.html,

1995.

[245] H. Nishiyama, S. Kin, T. Yokoyama, and Y. Matsushita, “An image retrieval

system considering subjective perception,” in CHI ’94: Proceedings of the

SIGCHI conference on Human factors in computing systems, New York, NY,

USA, 1994, pp. 30–36, ACM.

[246] N. Ohta and A. R. Robertson, Colorimetry: Fundamentals and Applications,

Wiley, 2005.

[247] W. M. Osberger, “Visual attention model,” US Patent 6,670,963, December

2003.

[248] O. K. Oyekoya and F. W. Stentiford, “Eye tracking – a new interface for

visual exploration,” BT Technology Journal, vol. 24, no. 3, pp. 57–66, 2006.

[249] D. Parkhurst, K. Law, and E. Niebur, “Modeling the role of salience in the

allocation of overt visual attention,” Vision Research, vol. 42, pp. 107–123,

2002.

177

http://www.cybergrain.com/tech/hdr

http://wp.netscape.com/home/bg/colorindex.html

[250] M. R. Pointer and G. G. Attridge, “Some aspects of the visual scaling of

large colour differences,” Color Research and Application, vol. 22, no. 5, pp.

298–307, 1997.

[251] S. V. Porter, M. Mirmehdi, and B. T. Thomas, “A shortest path represen-

tation for video summarisation,” in Proceedings of the 12th International

Conference on Image Analysis and Processing. September 2003, pp. 460–465,

IEEE Computer Society.

[252] Oxford University Press, “picture, n.,” The Oxford English Dictionary, 1989.

[253] M. Ronnier-Luo, “Colour difference formulae: Past, present and future,” in

ISCC, Ottowa, May 2006.

[254] C. Ronse, “The phase congruence model for edge detection in multi-

dimensional pictures,” Tech. Rep. 97/16, Universit Louis Pasteur, Strasbourg,

France, 1997.

[255] J. K. Rowling, Harry Potter and the Philosopher’s Stone, Bloomsbury, Lon-

don, 1997.

[256] D. M. Russell, “A design pattern-based video summarization technique: Mov-

ing from low-level signals to high-level structure.,” in HICSS, 2000.

[257] M. Russell, “Using eye-tracking data to understand first impressions of a

website,” Usability News, vol. 7, no. 1, pp. 1–14, February 2005.

[258] I. A. Rybak, V. I. Gusakova, A. V. Golovan, L. N. Podladchikova, and N. A.

Shevtsova, “A model of attention-guided visual perception and recognition,”

Vision Research, vol. 38, pp. 2387–2400, 1998.

[259] R. Scruton, “Photography and representation,” Critical Inquiry, vol. 7, no. 3,

pp. 577–603, 1981.

[260] G. Sharma, W. Wu, and E. N. Dalal, “The CIEDE2000 color-difference for-

mula: Implementation notes, supplementary test data, and mathematical ob-

servations,” Color Research and Application, vol. 30, no. 1, pp. 21–30, Febru-

ary 2005.

[261] R. T. Shilston, “Visual attention: An identification of new applications, and

an implementation on the TMS320DM642,” M.S. thesis, University of Bristol,

2005.

178

[262] M. Smith and T. Kanade, “Video skimming and characterization through

the combination of image and language understanding,” in Proceedings of the

1998 IEEE International Workshop on Content-Based Access of Image and

Video Databases, January 1998, pp. 61 – 70.

[263] J. A. Solomon, “The history of dipper functions,” Attention, Perception and

Psychophysics, vol. 71, no. 3, pp. 435–443, 2009.

[264] E. Spiekermann and E. M. Gringer, Stop stealing sheep & find out how type

works, Adobe Press, 1993.

[265] S. Srinivasan, D. B. Ponceleon, A. Amir, and D. Petkovic, ““what is in that

video anyway?” in search of better browsing.,” in ICMCS, Vol. 1, 1999, pp.

388–393.

[266] F. W. M. Stentiford, “Attention based auto image cropping,” in Workshop

on Computational Attention and Applications, Bielefeld, March 2007.

[267] F. W. M. Stentiford, “An estimator for visual attention through competitive

novelty with application to image compression,” in Picture Coding Symposium,

April 2001, pp. 101–104.

[268] F. W. M. Stentiford, “An evolutionary programming approach to the simula-

tion of visual attention,” in Proceedings of the 2001 Congress on Evolution-

ary Computation CEC2001, COEX, World Trade Center, 159 Samseong-dong,

Gangnam-gu, Seoul, Korea, May 2001, pp. 851–858, IEEE Press.

[269] B. Suh, H. Ling, B. B. Bederson, and D. W. Jacobs, “Automatic thumbnail

cropping and its effectiveness,” in UIST ’03: Proceedings of the 16th annual

ACM symposium on User interface software and technology, New York, NY,

USA, 2003, pp. 95–104, ACM Press.

[270] J. M. Thorne, “Supporting informal communication and closeness through

video snapshots,” in London Communications Symposium 2005, September

2005, London, England, 2005.

[271] C. Toklu, S. P. Liou, and M. Das, “Video abstract: A hybrid approach to

generate semantically meaningful video summaries.,” in IEEE International

Conference on Multimedia and Expo (III), 2000, pp. 1333–1336.

[272] B. L. Tseng, C. Lin, and J. R. Smith, “Using MPEG-7 and MPEG-21 for

personalizing video,” IEEE Multimedia, vol. 11, no. 1, pp. 42–52, 2004.

179

[273] B. L. Tseng, C. Y. Lin, and J. R. Smith, “Video summarization and person-

alization for pervasive mobile devices,” in Proc. SPIE Vol. 4676, p. 359-370,

Storage and Retrieval for Media Databases 2002, Minerva M. Yeung; Chung-

Sheng Li; Rainer W. Lienhart; Eds., M. M. Yeung, C.-S. Li, and R. W. Lien-

hart, Eds., Dec. 2001, pp. 359–370.

[274] J. K. Tsotsos, “On attention, saliency, recognition and feature binding,” in

CCNCV keynote presentation, Bielefeld, Germany, 2007.

[275] J. K. Tsotsos, S. M. Culhane, W. Y. K. Wai, Y. Lai, N. Davis, and F. Nuflo,

“Modeling visual attention via selective tuning,” Artificial Intelligence, vol.

78, no. 1-2, pp. 507–545, 1995.

[276] S. Uchihashi, J. Foote, A. Girgensohn, and J. Boreczky, “Video manga: gen-

erating semantically meaningful video summaries,” in MULTIMEDIA ’99:

Proceedings of the seventh ACM international conference on Multimedia (Part

1), New York, NY, USA, 1999, pp. 383–392, ACM Press.

[277] K. N. Walker, T. F. Cootes, and C. J. Taylor, “Locating salient object fea-

tures,” British Machine Vision Conference, vol. 2, pp. 557–566, 1998.

[278] Y. Wang, P. Zhao, D. Zhang, M. Li, and H. Zhang, “Myvideos: a system for

home video management,” in MULTIMEDIA ’02: Proceedings of the tenth

ACM international conference on Multimedia, New York, NY, USA, 2002, pp.

412–413, ACM Press.

[279] Wikipedia, “Helvetica — wikipedia, the free encyclopedia,” 2008, [accessed

2-September-2008].

[280] W. D. Wright, “The sensitivity of the eye to small colour differences,” The

Proceedings of the Physical Society, vol. 53, no. 296, pp. 93–112, March 1941.

[281] Y. Zhuang, Y. Rui, T. S. Huang, and S. Mehrotra, “Adaptive key frame

extraction using unsupervised clustering.,” in ICIP (1), 1998, pp. 866–870.

[282] T. T. Yeo, S. H. Ong, Jayasooriah, and R. Sinniah, “Autofocusing for tissue

microscopy,” Image and Vision Computing, vol. 11, no. 10, pp. 629 – 639,

1993.

180

Appendices

181

Appendix A

Example photographs

Photographs are taken for many purposes, but there are some specific situations and

stories reported to the author that demonstrate the challenges affecting any auto-

matic processes used to take them. This appendix includes a such few photographs,

together with a description of their meaning and issues they provoke. These help

to explain why photography is so subjective, and that there are many issues that

remain present and ripe for further investigation.

182

Figure A.1: Land and water - Lasalle This photo, from the McGill library [151], is anexcellent photo with which to demonstrate how different people might want to focus atdifferent distances, but retaining the same composition. Such top-down influence is likelyto be present in many photos that are taken – in this scene, a geologist, botanist andornithologist would be likely to focus differently.

183

Figure A.2: An old picture of Boyton School This photo, from the Boyton historial archive,shows children sitting on a wall in front of the school house. Recently, this photo wasexamined to try to judge when certain portions of the wall on which the children are satwas rebuilt. This demonstrates that photographs are sometimes used in ways that thephotographer would not, or could not, anticipate.A framework for examining such varying usage of a photos, discussing in more detailthe relationship between photographer, subject, audience and photograph, and how thatevolves over time, is described by Frohlich [204, Figure 3.5].

184

Appendix B

Preliminary experiments

Section 3.4 (page 74) describes other researchers’ methodologies for exploring human

blur discrimination using controlled viewing conditions. These previous experiments

used a variety of approaches, with experimental differences in aspects of stimuli pre-

sentation, observation and analysis. Several aspects, such as presentation duration,

were assessed in preliminary trials and existing published approaches found to be

satisfactory. Contrast normalisation was considered (see page 79), but for the ex-

periments in this work was not employed, to ensure that stimuli looked like natural

images, rather than being artificially distorted.

One aspect of the methodology, however, that was less clear in the existing lit-

erature is the viewing method. Whilst it was assumed that unconstrained binocular

viewing was used in previous experiments in the literature, it was felt necessary

to conduct preliminary experiments to measure the discrimination response in an

observer when viewing the stimuli monocularly as well as binocularly to confirm

whether blur discrimination was affected by viewing conditions.

To assess any difference in discrimination when viewing monocularly, one ob-

server performed the experiments described in Chapter 6 both with binocular vision,

and also with an eye patch covering one eye. Apart from this patch, the experimental

procedure was identical. The results of this trial are shown in Figure B.1.

What is apparent from these results is that the same underlying dipper response

is present with both viewing methods. Quantitatively, the results are similar, though

there is slightly better discrimination performance with binocular vision than monoc-

ular viewing across all pedestal conditions. The cause of this slight change in dis-

crimination was not investigated, though from observer feedback it is possible that

monocular viewing is more tiring, and this reduced the ability to discriminate be-

tween stimuli.

On the basis of these results, and similar ones for the second Van Hateren scene,

185

10−2

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(a) Image 00005, binocular

10−2

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(b) Image 00005, monocular

Figure B.1: Blur discrimination thresholds as a function of pedestal blur, for one observer(RTS) viewing Van Hateren image 00005 both binocularly and monocularly. Each estimateof threshold was based on at least three separate determinations (QUESTs) per measure,followed by a bootstrapping procedure to establish the 82% threshold and confidenceintervals. This is overlaid with the best fit to a contrast response function (Equation 6.1)computed with a least squares fitting.

10−2

10−1

100

101

102

10−2

10−1

100

101

102

103

Pedestal blur

Thr

esho

ld b

lur

(a) Image 01342

10−2

10−1

100

101

102

10−2

10−1

100

101

102

103

Pedestal blur

Thr

esho

ld b

lur

(b) Image 00005

Figure B.2: Blur discrimination thresholds as a function of pedestal blur, for one ob-server (RTS) viewing Van Hateren image 00005 and 01342 monocularly overlaid on theresults of multiple binocular observers (from Figure 6.6). Monocular results, in red, showthe estimate of thresholds based on at least three separate determinations (QUESTs) permeasure, followed by a bootstrapping procedure to establish the 82% threshold and confi-dence intervals. This is overlaid with the best fit to a contrast response function (Equation6.1) computed with a least squares fitting. Binocular results, in black, show the mean ofeach individual observers’ results after applying a bootstrapping procedure.

186

it was decided that no marked difference was present between viewing techniques.

Thus, just as when considering contrast where emphasis was placed on the similarity

with normal viewing behaviour, it was decided to perform all experiments with

binocular viewing.

Once the full experimental data had been collected, it was possible to validate

this response by demonstrating that the preliminary results for binocular viewing fit

within the range of binocular results found in the reported experiments. Figure B.2

overlays the monocular results on the binocular human results reported in Figure

6.6. The black line (the best fit to the mean binocular observations), falls within the

red error bars (the confidence intervals of the boostrapping procedure applied to the

monocular observations) for all but the two pedestal conditions with largest blur.

The confidence intervals for the conditions with largest blur, whilst not covering the

black best fit line, do encompass some of the individual observers.

Overall, this reconfirms the preliminary finding that blur discrimination can be

performed both monocularly and binocularly, and that there is no marked difference

in discrimination between these two viewing methods – those differences that are

present are similar in magnitude to the differences in results between observers.

187

Appendix C

New focus measures

A list of focus measures discovered in the literature is presented on page 43. Addi-

tional focus measures were conceived by the author during the course of the research

described in this thesis, each of which was fully assessed using all the experiments

in Sections 5 and 7. This appendix provides a brief description of each of these new

measures:

randomnumber This measure does not assess the input image, but instead out-

puts a random number. This was used to confirm that all focus measures

performed better than chance, which Table 5.2 (page 120) demonstrates to be

the case – randomnumber scored lowest of all measures.

rawlaplace The laplace measure is a thresholded summation of the convolution of

the input image with a 3x3 Laplace kernel [94]. As other measures were found

in the literature with in both a thresholded and non-thresholded variant, a

non-thresholded variant of laplace was created and named rawlaplace.

phasecongruence and phasecongruence2 Kovessi observed that many image

features “give rise to points where the Fourier components of the image are

maximally in phase” [22]. By searching for the congruency of Fourier compo-

nents, Kovessi was able to produce a feature map. To turn this feature map

into a focus score, it was summed to give an overall score for the image. As

a slight variation to phasecongruence, the summation of the raw feature map

prior to Kovessi’s thresholding was used as a further focus measure, phasecon-

gruence2.

alphaAdult, Red Onion and Image Ensemble Based on Billock’s observation

that human adults’ spatial frequency contrast sensitivity was similar to that

of the 1/fα power spectra observed in natural scenes [68], and the suggestion

188

by Tadmor and Tolhurst that α drives accommodation, a focus measure based

on α was created. This family of measures outputs a score that is the ratio

of the α determined from the image to that of a predetermined value, such

that the peak output occurs when they are equal. For this predetermined

value, alphaAdult uses the mean value determined by Billock’s observations

of humans. alphaImageEnsemble uses the weighted average of a number of

sets of images, also as reported by Billock. Finally, alphaRedOnion uses the

amplitude spectra slope of the modal red onion image as described in Section

4.

energylaplace5a, 5b and 5c During preliminary experiments, it was found that

energylaplace, a measure which is the summation of the squared convolu-

tion of the original image with a 3x3 Laplacian kernel, performed reasonably.

However, when used as a model observer to discriminate blur, discrimination

plateaued in high blur conditions. It was found that a larger kernel more

closely matched human discrimination performance in such conditions. The

5a, 5b and 5c variants of the energylaplace measure each use a slightly different

5x5 kernel, but otherwise are identical to the original measure.

189

Appendix D

Hardware and software

This appendix describes some of the hardware used in the experiments described in

this report, and highlights key software techniques used to control them.

D.1 Digita and the Kodak DC290

The Kodak DC290 is a 2.1 megapixel consumer digital camera that supports script-

ing with the Digita language. The process of installing a script onto the camera is

simply to save it as a .CSM file in the SYSTEM folder of the camera’s memory card.

Whilst Digita does provide the ability to control all the camera’s parameters, the

camera itself does not fully support them. For example, whilst it is possible to set

the focus distance to 53cm with the command SetCameraState("fdst", 53), the

camera does not support focusing at any distance other than Infinity, 20m, 10m,

5m, 3m, 2m, 1m, 70cm and 50cm.

Likewise, it is possible to specify the shutter speed, using SetCameraState("shut",

s), where s is the time for which shutter should be open (in microseconds), and

aperture using SetCameraState("aper", a), where a is the required F stop mul-

tiplied by 100. However, neither of these appeared to alter the brightness of the

photos that were taken, so it is likely that they are not supported by the camera,

though no documentation could be found to confirm this.

The script that was used to automatically capture a range of images was the

following:

menu "Add-On Scripts"

name "Focus"

label "Focus"

mode 0 # Means the script is only available when the camera is in

# Capture mode.

declare i:status

190

declare u:IPIP

declare u:uImageTaken, uImageAvail, uStorage

declare b:bSystem, bCapture, bVendor

declare i:shutter, speed, focus

declare u:min_shut, max_shut, def_shut

declare t:ShutterList, CompleteShutterList, FocusList

declare s:Head,HeadF

SetCaptureMode (still)

GetCapabilitiesRange("shut", min_shut, max_shut, def_shut)

uStorage = 1

IPIP = 0x10000000

CompleteShutterList = " 90, 0"

ShutterList = CompleteShutterList

FocusList = "050,055,060,065,070,075,080,085,090,095,100,120,150,200, 0"

status = 0

waitloop1:

GetCameraStatus (bSystem, bCapture, bVendor)

if bSystem & IPIP

Wait (300)

goto waitloop1

end

# SetCameraState("aper", 800) # F8.0

# SetCameraState("fdst", 65535) # Focus at infinity

SetCameraState("xmod", 1) # Programmed exposure mode. NB. This value

# must be changed for different models of cameras

SetCameraState("fmod", 3) # Manual focus

SetCameraState("aper", 300) # F3.0 - wide aperture => small depth of field

SetCameraState("wmod", 11) # White balance off

SetCameraState("mcap", 1) # Still (as opposed to burst or timelapse)

SetCameraState("scpn", 7) # Photo compression, 7=lossless

SetCameraState("ssiz", 3) # Photo size, 1=High (1792x1200),

# 2=Medium (1440x960), 3=Standard (720x480),

# 4=Ultra (2240x1500)

SetCameraState("smod", 3) # Disable flash

SetCameraState("xcmp", 0) # No exposure compensation

SetCameraState("zpos", 100) # Zoom position - the DC290 can do 100-300

# (=1x to 3x); the default is 130

SetCameraState("irev", 0) # Instant review, units are 0.01 secs

Wait (1000)

shutloop:

SubString(ShutterList, 0, 3, Head)

SubString(FocusList, 0, 3, HeadF)

StringToNumber(Head, speed)

StringToNumber(HeadF, focus)

if speed == 0

if focus == 0

goto done

end

191

if focus != 0

SubString(FocusList, 4, 999, FocusList)

ShutterList = CompleteShutterList

goto shutloop

end

end

shutter = 1000000 / speed

DisplayClear()

DisplayLine("Shutter: ",Head)

DisplayLine("Focus: ",HeadF,"cm")

Wait (1000)

SetCameraState("fdst", focus)

SetCameraState("shut", shutter)

StartCapture()

waitloop2:

GetCameraStatus (bSystem, bCapture, b‘Vendor)

if bSystem & IPIP

Wait (250)

goto waitloop2

end

SubString(ShutterList, 4, 999, ShutterList)

goto shutloop

done:

exitscript

end

192

D.2 Sony EVI-D30 camera

The Sony EVI-D30 can be controlled using an RS232 connection at 9600 baud, 8

data bits, no parity, and 1 stop bit. Whilst the camera’s control protocol, Visca, is bi-

directional, it is possible to simply send commands using a fire-and-forget strategy.

Full documentation of the protocol is provided by Sony [189], but for this work, the

most pertinent commands are:

Command Description0x8101043803FF Manual focus0x8101043802FF Auto focus0x810104480Z0Z0Z0ZFF Direct focus, where ZZZZ is the focus data

(infinity = 1000, close = 9FFF)

Frequently the EVI-D30 cameras are used in conjunction with an Axis 2401

webcam. Using such a setup, the above commands can be sent directly to the

camera by crafting a special URL such as:

http://host/axis-cgi/com/serial.cgi?port=1\&write=8101043803FF

On a Linux computer, the libVISCA software makes control and interrogation

of the camera far easier. It can be downloaded from:

http://damien.douxchamps.net/libvisca

193

Appendix E

Scoring the focus measures

This appendix contains results from the experiments described in Section 5. The

following tables contain full tabulated results that have been summarised in Table

5.2. They were produced using the methodology outlined in Section 5, and computed

with Equation (2.8).

194

Measure Acc

ura

cy

Ran

ge

Fal

sem

axim

a

Wid

th

Noi

se

Sco

re

Ran

k

absolutegradient 19 95 13 16 3.74 1.11 42absolutevariation 1 40 6 26 0.47 0.37 25alphaAdult 0 49 5 6 0.13 0.26 15alphaImageEnsemble 0 49 5 6 0.13 0.26 17alphaRedOnion 0 49 5 6 0.13 0.26 15autocorrelation 0 33 3 6 0.11 0.17 4brennergradient 0 33 3 10 0.46 0.21 7chernfft 0 33 4 25 0.56 0.32 24cranepeak 1 79 6 8 0.18 0.39 30cranesum 0 30 3 7 0.13 0.17 3CPBD 0 56 9 5 0.15 0.37 28energylaplace 0 53 9 3 0.35 0.37 27energylaplace5a 0 53 9 4 0.30 0.37 26energylaplace5b 0 49 7 4 0.21 0.30 23energylaplace5c 0 44 6 4 0.17 0.27 19entropy 0 84 15 13 6.09 1.58 43groenvariance 1 42 6 13 0.39 0.29 21histogramentropy 14 93 12 32 6.22 1.63 44hlv 1 70 10 28 1.12 0.57 35imagepower 21 93 14 13 2.51 0.90 39JNBM 0 42 6 6 0.20 0.26 18laplace 0 47 5 4 0.20 0.26 14masgrn 0 30 6 11 0.16 0.25 12menmay 21 93 11 26 2.31 0.85 37normalizedgroenvariance 1 44 7 26 0.52 0.40 31phasecongruence 22 84 7 42 3.16 1.00 41phasecongruence2 0 77 7 3 1.05 0.46 33randomnumber 20 93 14 42 26.53 6.45 45range 0 63 6 8 0.73 0.37 29rawlaplace 21 93 14 13 2.48 0.89 38smd 0 30 3 6 0.15 0.17 2sml 0 33 4 5 0.10 0.19 6squaredgradient 0 33 5 7 0.18 0.22 9standarddeviationbasedautocorrelation 1 40 6 13 0.44 0.29 20tenengrad 0 79 12 6 0.57 0.52 34thresholdedabsolutegradient 0 33 5 7 0.18 0.22 9triakis11s 1 33 4 12 0.28 0.22 11thresholdedcontent 21 93 14 13 2.51 0.90 39thresholdedpixelcount 18 88 15 28 1.83 0.83 36va 0 33 4 28 1.22 0.43 32voll4 0 30 3 6 0.10 0.16 1voll5 1 42 6 13 0.44 0.30 22waveletw1 0 47 5 6 0.11 0.25 13waveletw2 0 33 5 4 0.19 0.21 8waveletw3 0 30 4 4 0.19 0.18 5

Table E.1: Scores for Chillis

195

Measure Acc

ura

cy

Ran

ge

Fal

sem

axim

a

Wid

th

Noi

se

Sco

re

Ran

k


Table E.2: Scores for Coins

196

Measure Acc

ura

cy

Ran

ge

Fal

sem

axim

a

Wid

th

Noi

se

Sco

re

Ran

k


Table E.3: Scores for Bolt

197

Measure Acc

ura

cy

Ran

ge

Fal

sem

axim

a

Wid

th

Noi

se

Sco

re

Ran

k


Table E.4: Scores for Red onion

198

Measure Acc

ura

cy

Ran

ge

Fal

sem

axim

a

Wid

th

Noi

se

Sco

re

Ran

k


Table E.5: Scores for Strawberries

199

Measure Acc

ura

cy

Ran

ge

Fal

sem

axim

a

Wid

th

Noi

se

Sco

re

Ran

k


Table E.6: Scores for Towel

200

Appendix F

Full ANOVA results for ‘best’

experiment

The ANOVA models were built and processed in R as follows:

library(xtable)library(car)

bestdat<-read.table(``chillis1-withoutanomolies.txt'', header=T, sep=``\t'')

bestaov<-aov(Result ¬ Young30 + DecadeAge + Gender + Screen +Colourblind + Correction + Uncorrected + English + DecadeAge*Young30 + Gender*Young30 + Screen*Young30 + Colourblind*Young30 +Correction*Young30 + Uncorrected*Young30 + English*Young30 + Gender

*DecadeAge + Screen*DecadeAge + Colourblind*DecadeAge + Correction*DecadeAge + Uncorrected*DecadeAge + English*DecadeAge + Screen*Gender + Colourblind*Gender + Correction*Gender + Uncorrected*Gender + English*Gender + Colourblind*Screen + Correction*Screen +Uncorrected*Screen + English*Screen + Correction*Colourblind +Uncorrected*Colourblind + English*Colourblind + Uncorrected*Correction + English*Correction + English*Uncorrected,data=bestdat)

summary(bestaov)print(xtable(anova(bestaov),caption=``Anova for chillis1''), type=``

latex'', file=``best-aov-chillis1.tex'')

Listing F.1: R code to perform analysis

201

Df Sum Sq Mean Sq F value Pr(>F)Young30 1 0.40 0.40 0.58 0.4518DecadeAge 1 0.19 0.19 0.27 0.6042Gender 1 0.05 0.05 0.08 0.7799Screen 1 3.84 3.84 5.57 0.0225Colourblind 1 3.38 3.38 4.91 0.0318Correction 1 2.24 2.24 3.26 0.0777Uncorrected 1 0.01 0.01 0.01 0.9231English 1 0.00 0.00 0.00 0.9604Young30:DecadeAge 1 0.14 0.14 0.20 0.6537Young30:Gender 1 2.96 2.96 4.29 0.0440Young30:Colourblind 1 2.05 2.05 2.97 0.0914Young30:Correction 1 0.79 0.79 1.14 0.2912Young30:Uncorrected 1 1.58 1.58 2.29 0.1372Young30:English 1 0.12 0.12 0.17 0.6821DecadeAge:Gender 1 0.08 0.08 0.12 0.7358DecadeAge:Correction 1 0.62 0.62 0.89 0.3495DecadeAge:Uncorrected 1 0.03 0.03 0.04 0.8413DecadeAge:English 1 0.17 0.17 0.25 0.6218Gender:Screen 1 0.04 0.04 0.05 0.8186Gender:Correction 1 0.40 0.40 0.58 0.4509Gender:Uncorrected 1 0.03 0.03 0.04 0.8448Gender:English 1 0.93 0.93 1.34 0.2522Screen:Correction 1 0.12 0.12 0.17 0.6788Colourblind:Correction 1 0.99 0.99 1.44 0.2369Correction:Uncorrected 1 0.00 0.00 0.00 0.9548Correction:English 1 0.00 0.00 0.00 0.9751Uncorrected:English 1 0.56 0.56 0.81 0.3737Residuals 46 31.69 0.69

Table F.1: Anova for chillis1

202

Df Sum Sq Mean Sq F value Pr(>F)Young30 1 0.83 0.83 0.32 0.5738DecadeAge 1 1.83 1.83 0.71 0.4046Gender 1 5.61 5.61 2.18 0.1468Screen 1 0.42 0.42 0.16 0.6895Colourblind 1 0.71 0.71 0.28 0.6022Correction 1 0.18 0.18 0.07 0.7906Uncorrected 1 1.63 1.63 0.63 0.4300English 1 4.14 4.14 1.61 0.2113Young30:Gender 1 1.03 1.03 0.40 0.5312Young30:Colourblind 1 7.62 7.62 2.95 0.0923Young30:Correction 1 13.46 13.46 5.22 0.0269Young30:Uncorrected 1 0.31 0.31 0.12 0.7322Young30:English 1 1.18 1.18 0.46 0.5030DecadeAge:Gender 1 13.13 13.13 5.09 0.0288DecadeAge:Correction 1 0.96 0.96 0.37 0.5445DecadeAge:Uncorrected 1 0.62 0.62 0.24 0.6272DecadeAge:English 1 1.04 1.04 0.40 0.5280Gender:Screen 1 1.09 1.09 0.42 0.5194Gender:Correction 1 2.10 2.10 0.82 0.3710Gender:Uncorrected 1 4.27 4.27 1.65 0.2046Gender:English 1 6.03 6.03 2.34 0.1332Screen:Correction 1 1.36 1.36 0.53 0.4718Colourblind:Correction 1 9.15 9.15 3.55 0.0659Correction:Uncorrected 1 16.22 16.22 6.28 0.0157Correction:English 1 4.17 4.17 1.62 0.2100Uncorrected:English 1 0.20 0.20 0.08 0.7804Residuals 47 121.27 2.58

Table F.2: Anova for coin02

203

Df Sum Sq Mean Sq F value Pr(>F)Young30 1 9.84 9.84 1.72 0.1964DecadeAge 1 13.89 13.89 2.42 0.1261Gender 1 0.00 0.00 0.00 0.9850Screen 1 11.80 11.80 2.06 0.1579Colourblind 1 33.83 33.83 5.90 0.0189Correction 1 2.14 2.14 0.37 0.5445Uncorrected 1 38.14 38.14 6.65 0.0130English 1 2.79 2.79 0.49 0.4890Young30:Gender 1 1.00 1.00 0.18 0.6773Young30:Colourblind 1 20.07 20.07 3.50 0.0674Young30:Correction 1 9.07 9.07 1.58 0.2144Young30:Uncorrected 1 6.18 6.18 1.08 0.3045Young30:English 1 29.21 29.21 5.10 0.0286DecadeAge:Gender 1 5.27 5.27 0.92 0.3423DecadeAge:Correction 1 2.71 2.71 0.47 0.4949DecadeAge:Uncorrected 1 24.28 24.28 4.24 0.0450DecadeAge:English 1 4.18 4.18 0.73 0.3973Gender:Screen 1 1.71 1.71 0.30 0.5874Gender:Correction 1 13.53 13.53 2.36 0.1310Gender:Uncorrected 1 1.76 1.76 0.31 0.5824Gender:English 1 0.72 0.72 0.13 0.7238Screen:Correction 1 21.47 21.47 3.75 0.0588Colourblind:Correction 1 6.63 6.63 1.16 0.2874Correction:Uncorrected 1 13.73 13.73 2.40 0.1283Correction:English 1 3.80 3.80 0.66 0.4193Uncorrected:English 1 15.76 15.76 2.75 0.1039Residuals 48 275.14 5.73

Table F.3: Anova for povraybolt

204

Df Sum Sq Mean Sq F value Pr(>F)Young30 1 0.87 0.87 0.33 0.5684DecadeAge 1 4.95 4.95 1.88 0.1772Gender 1 16.41 16.41 6.24 0.0162Screen 1 1.67 1.67 0.64 0.4297Colourblind 1 3.98 3.98 1.51 0.2254Correction 1 2.26 2.26 0.86 0.3590Uncorrected 1 4.97 4.97 1.89 0.1759English 1 0.65 0.65 0.25 0.6219Young30:DecadeAge 1 0.09 0.09 0.04 0.8522Young30:Gender 1 3.62 3.62 1.37 0.2471Young30:Colourblind 1 1.81 1.81 0.69 0.4108Young30:Correction 1 2.30 2.30 0.87 0.3548Young30:Uncorrected 1 1.34 1.34 0.51 0.4790Young30:English 1 2.04 2.04 0.77 0.3834DecadeAge:Gender 1 9.49 9.49 3.61 0.0640DecadeAge:Correction 1 6.01 6.01 2.29 0.1376DecadeAge:Uncorrected 1 6.09 6.09 2.31 0.1352DecadeAge:English 1 0.92 0.92 0.35 0.5578Gender:Correction 1 1.46 1.46 0.56 0.4600Gender:Uncorrected 1 0.92 0.92 0.35 0.5568Gender:English 1 6.12 6.12 2.32 0.1343Screen:Correction 1 4.02 4.02 1.53 0.2226Colourblind:Correction 1 2.05 2.05 0.78 0.3823Correction:Uncorrected 1 0.10 0.10 0.04 0.8448Correction:English 1 2.55 2.55 0.97 0.3303Uncorrected:English 1 1.89 1.89 0.72 0.4017Residuals 45 118.41 2.63

Table F.4: Anova for redonion1

205

Df Sum Sq Mean Sq F value Pr(>F)Young30 1 1.52 1.52 1.02 0.3173DecadeAge 1 0.01 0.01 0.01 0.9336Gender 1 4.18 4.18 2.81 0.1002Screen 1 0.00 0.00 0.00 0.9891Colourblind 1 2.84 2.84 1.91 0.1733Correction 1 4.88 4.88 3.29 0.0762Uncorrected 1 0.00 0.00 0.00 0.9729English 1 3.38 3.38 2.27 0.1383Young30:DecadeAge 1 0.22 0.22 0.15 0.7016Young30:Gender 1 1.16 1.16 0.78 0.3821Young30:Colourblind 1 1.72 1.72 1.16 0.2872Young30:Correction 1 2.76 2.76 1.86 0.1795Young30:Uncorrected 1 1.92 1.92 1.30 0.2607Young30:English 1 0.22 0.22 0.15 0.6989DecadeAge:Gender 1 6.83 6.83 4.60 0.0373DecadeAge:Correction 1 3.57 3.57 2.40 0.1279DecadeAge:Uncorrected 1 8.06 8.06 5.43 0.0242DecadeAge:English 1 0.96 0.96 0.65 0.4254Gender:Screen 1 28.14 28.14 18.95 0.0001Gender:Correction 1 8.59 8.59 5.78 0.0202Gender:Uncorrected 1 4.69 4.69 3.16 0.0820Gender:English 1 1.21 1.21 0.82 0.3706Screen:Correction 1 2.77 2.77 1.86 0.1789Colourblind:Correction 1 0.07 0.07 0.05 0.8265Correction:Uncorrected 1 1.70 1.70 1.15 0.2900Correction:English 1 2.04 2.04 1.37 0.2469Uncorrected:English 1 1.23 1.23 0.83 0.3681Residuals 47 69.81 1.49

Table F.5: Anova for strawberries1

206

Df Sum Sq Mean Sq F value Pr(>F)Young30 1 1.36 1.36 0.25 0.6199DecadeAge 1 2.63 2.63 0.48 0.4910Gender 1 15.80 15.80 2.90 0.0956Screen 1 1.89 1.89 0.35 0.5586Colourblind 1 13.72 13.72 2.52 0.1197Correction 1 0.45 0.45 0.08 0.7752Uncorrected 1 1.70 1.70 0.31 0.5793English 1 2.42 2.42 0.44 0.5089Young30:Gender 1 0.74 0.74 0.14 0.7140Young30:Colourblind 1 5.88 5.88 1.08 0.3044Young30:Correction 1 8.10 8.10 1.49 0.2292Young30:Uncorrected 1 8.64 8.64 1.59 0.2145Young30:English 1 13.20 13.20 2.43 0.1267DecadeAge:Gender 1 3.80 3.80 0.70 0.4082DecadeAge:Correction 1 19.49 19.49 3.58 0.0652DecadeAge:Uncorrected 1 0.44 0.44 0.08 0.7787Gender:Correction 1 8.43 8.43 1.55 0.2200Gender:Uncorrected 1 0.08 0.08 0.01 0.9031Gender:English 1 5.04 5.04 0.93 0.3413Screen:Correction 1 0.20 0.20 0.04 0.8499Colourblind:Correction 1 0.19 0.19 0.04 0.8520Correction:Uncorrected 1 6.33 6.33 1.16 0.2870Correction:English 1 0.47 0.47 0.09 0.7710Uncorrected:English 1 2.76 2.76 0.51 0.4799Residuals 43 234.06 5.44

Table F.6: Anova for towel01

207

Appendix G

Full results for model observers

208

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(a) absolutegradientnf = 0.02%,

similarity = 0.456

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(b) absolutevariationnf = 2.78%,

similarity = 0.348

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(c) alphaAdult nf = 4.64%,similarity = 1.13

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(d) alphaImageEnsemblenf = 7.74%,

similarity = 0.755

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(e) alphaRedOnionnf = 7.74%, similarity = 859

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(f) autocorrelationnf = 35.9%,

similarity = 0.688

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(g) brennergradientnf = 4.64%, similarity = 0.84

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(h) chernfft nf = 7.74%,similarity = 0.317

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(i) cranepeak nf = 21.5%,similarity = 0.688

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(j) cranesum nf = 12.9%,similarity = 0.281

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(k) CPBD nf = 12.9%,similarity = 1.7

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(l) energylaplace nf = 0.22%,similarity = 0.389

Figure G.1: Discrimination plotted against pedestal blur for each focus measure, wherethe noise factor is that which resulted in the greatest similarity between discriminationshape and results from human observers when considering Van Hateren 01342.

209

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(a) energylaplace5anf = 0.36%, similarity = 0.44

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(b) energylaplace5bnf = 0.22%, similarity = 0.4

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(c) energylaplace5cnf = 0.22%,

similarity = 0.392

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(d) entropy nf = 0.05%,similarity = 843

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(e) groenvariance nf = 4.64%,similarity = 0.342

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(f) histogramentropynf = 100%, similarity = 949

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(g) hlv nf = 0.01%,similarity = 1.83

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(h) imagepower nf = 35.9%,similarity = 0.732

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(i) JNBM nf = 7.74%,similarity = 1.08

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(j) laplace nf = 0.22%,similarity = 0.375

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(k) masgrn nf = 0.05%,similarity = 23.3

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(l) menmay nf = 0.01%,similarity = 9.05


210

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(a) normalizedgroenvariancenf = 7.74%,

similarity = 0.367

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(b) phasecongruencenf = 0.60%, similarity = 26.6

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(c) phasecongruence2nf = 0.02%, similarity = 229

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(d) randomnumbernf = 0.05%, similarity = 881

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(e) range nf = 4.64%,similarity = 0.254

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(f) rawlaplace nf = 0.01%,similarity = 67.6

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(g) rmscontrast nf = 2.78%,similarity = 0.364

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(h) smd nf = 2.78%,similarity = 0.329

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(i) sml nf = 21.5%,similarity = 1.65

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(j) squaredgradientnf = 59.9%, similarity = 4.82

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(k) standarddeviationbased-autocorrelation nf = 0.03%,

similarity = 0.369

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(l) tenengrad nf = 35.9%,similarity = 0.345


211

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(a)thresholdedabsolutegradientnf = 0.13%, similarity = 2.37

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(b) triakis11s nf = 0.03%,similarity = 0.511

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(c) thresholdedcontentnf = 59.9%,

similarity = 0.802

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(d) thresholdedpixelcountnf = 35.9%, similarity = 767

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(e) va nf = 1.67%,similarity = 0.54

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(f) voll4 nf = 21.5%,similarity = 0.336

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(g) voll5 nf = 0.05%,similarity = 0.371

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(h) waveletw1 nf = 0.22%,similarity = 0.379

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(i) waveletw2 nf = 0.22%,similarity = 0.394

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(j) waveletw3 nf = 0.22%,similarity = 0.398


212

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(a) absolutegradientnf = 0.22%,

similarity = 0.394

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(b) absolutevariationnf = 2.78%,

similarity = 0.385

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(c) alphaAdult nf = 7.74%,similarity = 0.861

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(d) alphaImageEnsemblenf = 12.9%,

similarity = 0.833

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(e) alphaRedOnionnf = 12.9%, similarity = 829

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(f) autocorrelationnf = 12.9%,

similarity = 0.412

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(g) brennergradientnf = 7.74%,

similarity = 0.749

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(h) chernfft nf = 7.74%,similarity = 0.361

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(i) cranepeak nf = 35.9%,similarity = 0.75

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(j) cranesum nf = 21.5%,similarity = 0.381

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(k) CPBD nf = 21.5%,similarity = 1.22

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(l) energylaplace nf = 0.03%,similarity = 0.498


213

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(a) energylaplace5anf = 0.05%, similarity = 0.5

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(b) energylaplace5bnf = 0.22%,

similarity = 0.472

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(c) energylaplace5cnf = 0.08%,

similarity = 0.505

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(d) entropy nf = 2.78%,similarity = 818

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(e) groenvariance nf = 7.74%,similarity = 0.365

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(f) histogramentropynf = 100%, similarity = 923

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(g) hlv nf = 0.00%,similarity = 4.31

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(h) imagepower nf = 59.9%,similarity = 0.993

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(i) JNBM nf = 4.64%,similarity = 0.596

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(j) laplace nf = 0.13%,similarity = 0.451

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(k) masgrn nf = 1.67%,similarity = 2.51

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(l) menmay nf = 0.60%,similarity = 0.438


214

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(a) normalizedgroenvariancenf = 7.74%, similarity = 0.39

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(b) phasecongruencenf = 4.64%, similarity = 109

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(c) phasecongruence2nf = 4.64%, similarity = 50.1

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(d) randomnumbernf = 0.00%, similarity = 703

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(e) range nf = 7.74%,similarity = 0.35

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(f) rawlaplace nf = 0.02%,similarity = 137

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(g) rmscontrast nf = 2.78%,similarity = 0.426

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(h) smd nf = 2.78%,similarity = 0.421

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(i) sml nf = 0.22%,similarity = 1.71

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(j) squaredgradientnf = 21.5%,

similarity = 0.789

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(k) standarddeviationbased-autocorrelation nf = 4.64%,

similarity = 0.406

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(l) tenengrad nf = 0.22%,similarity = 0.433


215

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(a)thresholdedabsolutegradient

nf = 21.5%,similarity = 0.798

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(b) triakis11s nf = 0.02%,similarity = 0.844

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(c) thresholdedcontentnf = 59.9%,

similarity = 0.991

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(d) thresholdedpixelcountnf = 7.74%, similarity = 859

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(e) va nf = 0.08%,similarity = 0.434

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(f) voll4 nf = 12.9%,similarity = 0.359

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(g) voll5 nf = 4.64%,similarity = 0.399

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(h) waveletw1 nf = 0.05%,similarity = 0.5

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(i) waveletw2 nf = 0.13%,similarity = 0.443

10−1

100

101

102

10−2

10−1

100

101

102

103


Thr

esho

ld b

lur

(arc

min

utes

)

(j) waveletw3 nf = 0.22%,similarity = 0.463

Figure G.8: Discrimination plotted against pedestal blur for each focus measure, wherethe noise factor is that which resulted in the greatest similarity between discriminationshape and results from human observerss when considering Van Hateren 000005.

216

Appendix H

Software implementation

This appendix provides implementation details for all the focus measures used

throughout this work, together with their dependencies. The mathematical the-

ory and justification behind each focus measure is not considered, except where this

has a direct impact on the implementation. Further details about the respective

measures can be found in the references from Table 2.3 (page 43).

Note: It was noted that two algorithms (autocorrelation and voll4) are essential

identical. They differ solely in boundary conditions, and thus produce a different

(though very similar) output when assessing the same input.

217

H.1 absolutegradient

function score = absolutegradient(I)

% "This function was proposed by Jarvis [2] and is obtained by

setting

% n = 1, m = 1, theta = 0 in F1" [1]

%

% References:

% [1] "A Comparison of Different Focus Functions for Use in

% Autofocus Algorithms" by Groen et al in Cytometry 6:81-91 (1985)

% [2] "Focus optimisation criteria for computer image processing."

by

% Jarvis in Microscope 24:163, 1976

%

% Ensure the image is 8-bit (0-255) grey intensity.

if max(I(:))<1, I = I * 255; end;

z = gradient(I);

E = find(z>0);

score = sum(sum(E));

end

218

H.2 absolutevariation

function score = absolutevariation(I)

% "Absolute variation. As the computation of the variance is rather

% complicated, a comparable result could be expected from the much

% easier to calculate absolute difference. Thus m = 1 and c = A (

the

% image area) [using equation F3]." [1]

%

% References:

% [1] "A Comparison of Different Focus Functions for Use in

% Autofocus Algorithms" by Groen et al in Cytometry 6:81-91 (1985)

%


if max(I(:))<1, I = I * 255; end;

% Calculate image area:

imagearea = prod(size(I));

% Calculate difference from mean:

d = I - mean(I(:));

d = abs(d);

% Finally, the score:

score = 1/imagearea * sum(sum(d));

end

219

H.3 alphaadult

function score = alphaAdult(I)

% Use Alpha from Billock's results showing adults tuned to alpha=-

1.15

score = alphaEquals(I,-1.15);

end

220

H.4 alphaEquals

function score = alphaEquals(I, desiredalpha)

% Score is based on difference between the image's alpha, and the

% desired alpha. This function is used by several wrappers which

set

% pre-determined desired alpha values.


if max(I(:))<1, I = I * 255; end;

% DoPowerPlot, which calculates alpha, requires a greyscale

% square image.

[d1 d2]=size(I);

if d2>d1,

offset = floor((d2-d1)/2);

Igc = I(1:d1, (1+offset):(offset+d1));

else


Igc = I((1+offset):(offset+d2), 1:d2);

end;

% First: Determine current image alpha

[currentfreq,currentscale hist,currentorient,currentorient hist,

...

currentalpha,currentalphaintercept,currentdummy] = ...

DoPowerPlot(Igc,32);

% Second: Return the score, such that the peak value (1) is

returned

% when image alpha = desired alpha (and a negative value is

returned

% should it be necessary).

if currentalpha ≤ desiredalpha,

score = desiredalpha / currentalpha;

else

score = currentalpha / desiredalpha;

end;

end

221

H.5 alphaimageensemble

function score = alphaImageEnsemble(I)

% Use Alpha from Billock's results showing the average alpha over

an

% ensemble of natural scenes is 1.08


end

222

H.6 alpharedonion

function score = alphaRedOnion(I)

% Red Onion modal ground truth was image 19 (64134421-640x480.jpg).

% Its alpha is 1.392 calculated as follows:

% I = imread('64134421-640x480-modal-best.jpg');

% Ig = rgb2gray(I);

% size(Ig) % 640x480

% Igc = Ig(1:480,80:80+479); % Crop out 480x480 from the centre

% doPowerPlot(Igc,32)


end

223

H.7 autocorrelation

function score = autocorrelation(I)

% "AutoCorrelation (Vollath, 1987, 1988)." [1]

%

% References:

% [1] "Autofocusing in Computer Microscopy: Selecting the Optimal

Focus

% Algorithm" by Sun et al in MICROSCOPY RESEARCH AND TECHNIQUE

% 65:139 149 (2004)

%

% score = sum over all image of i(x, y) . i(x + 1, y)

% - sum over all image of i(x, y) . i(x + 2, y)

%

[w h] = size(I);

score = 0;

for i=1:w-2,

for j=1:h,

score = score + (I(i,j)*I(i+1,j) - I(i,j)*I(i+2,j));

end;

end;

end

224

H.8 brennergradient

function score = brennergradient(I, T)

% "Evaluation of autofocus functions in molecular cytogenetic

analysis"

% by Santos et al in Journal of Microscopy Vol.188 Issue 03, pp 264

-272

% http://dx.doi.org/10.1046/j.1365-2818.1997.2630819.x

%

% score = sum over height

% sum over width

% abs(i(x+2,y)-i(x,y))ˆ2

% where abs(i(x+2,y)-i(x,y))ˆ2 > T

%

% Table 1 shows that a threshold of 5 is used.

step = 2;

T1 = 5;

[width height] = size(I);

ML = zeros(width,height);

for x=(1+step):(width-step),

for y=(1+step):(height-step),

ML(x,y) = abs(I(x+step,y)-I(x,y))ˆ2;

end;

end;

score = sum(sum(find(ML≥T1)));

end

225

H.9 chernfft

function score = chernfft(theimage)

% Chern et al (2001) "Practical issues in pixel-based auto-

% focusing for machine vision", Proceedings of the 2001 IEEE

% International Conference of Robotics and Automation, Seoul 2001

%

% "The image's grey levels are placed row by row into a 1D

% array, and its FFT evaluated."

%

theimage = double(theimage);

[w h] = size(theimage);

% Matlab's reshape takes elements columnwise, so rotate

% the image before reshaping:

theimage = theimage';

oned = reshape(theimage,w*h,1);

% Perform an FFT. Chern stated that zero padding was used

% but does not comment on the length of the FFT. As such

% no padding is performed in this implementation.

f = fft(oned);

re = real(f);

im = imag(f);

score = 1/(w*h) * sum(abs((re.ˆ2+im.ˆ2).*atan(im./re)));

end

226

H.10 cpbd

%=====================================================================

% File: CPBD compute.m

% Original code written by Niranjan D. Narvekar

% IVU Lab (http://ivulab.asu.edu)

% Last Revised: October 2009 by Niranjan D. Narvekar

%=====================================================================

% Copyright Notice:

% Copyright (c) 2009-2010 Arizona Board of Regents.

% All Rights Reserved.

% Contact: Lina Karam ([email protected]) and Niranjan D. Narvekar (

[email protected])

% Image, Video, and Usabilty (IVU) Lab, ivulab.asu.edu

% Arizona State University

% This copyright statement may not be removed from this file or from

% modifications to this file.

% This copyright notice must also be included in any file or product

% that is derived from this source file.

%

% Redistribution and use of this code in source and binary forms,

% with or without modification, are permitted provided that the

% following conditions are met:

% - Redistribution's of source code must retain the above copyright

% notice, this list of conditions and the following disclaimer.

% - Redistribution's in binary form must reproduce the above copyright

% notice, this list of conditions and the following disclaimer in the

% documentation and/or other materials provided with the distribution.

% - The Image, Video, and Usability Laboratory (IVU Lab,

% http://ivulab.asu.edu) is acknowledged in any publication that

% reports research results using this code, copies of this code, or

% modifications of this code.

% The code and our papers are to be cited in the bibliography as:

%


%

% N.D. Narvekar and L. J. Karam, "CPBD Sharpness Metric Software",

% http://ivulab.asu.edu/Quality/CPBD

%

% N.D. Narvekar and L. J. Karam, "A No-Reference Perceptual Image

Sharpness

% Metric Based on a Cumulative Probability of Blur Detection,"

% International Workshop on Quality of Multimedia Experience (QoMEX

227

2009),

% pp. 87-91, July 2009.

%

% N. D. Narvekar and L. J. Karam, "An Improved No-Reference Sharpness

Metric Based on the

% Probability of Blur Detection," International Workshop on Video

Processing and Quality Metrics

% for Consumer Electronics (VPQM), http://www.vpqm.org, January 2010.

%

% DISCLAIMER:

% This software is provided by the copyright holders and contributors

% "as is" and any express or implied warranties, including, but not

% limited to, the implied warranties of merchantability and fitness for

% a particular purpose are disclaimed. In no event shall the Arizona

% Board of Regents, Arizona State University, IVU Lab members, or

% contributors be liable for any direct, indirect, incidental, special,

% exemplary, or consequential damages (including, but not limited to,

% procurement of substitute goods or services; loss of use, data, or

% profits; or business interruption) however caused and on any theory

% of liability, whether in contract, strict liability, or tort

% (including negligence or otherwise) arising in any way out of the use

% of this software, even if advised of the possibility of such damage.

%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%

% function : CPBD compute

% description : This function computes the CPBD metric which

determines

% the amount of sharpness of the image. Larger the

metric

% value, sharper the image.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%

function [sharpness metric] = CPBD compute(input image)

%%%%%%%%%%%% pre-processing %%%%%%%%%%%%

% convert to gray scale if color image

[x y z] = size(input image);

if z > 1

input image = rgb2gray(input image);

end

% Robert Shilston's fix - ensure 8bit images are integers:

228

input image = round(input image);

% convert the image to double for further processing

input image = double(input image);

% get the size of image

[m,n] = size(input image);

%%%%%%%%%%%% parameters %%%%%%%%%%%%

% threshold to characterize blocks as edge/non-edge blocks

threshold = 0.002;

% fitting parameter

beta = 3.6;

% block size

rb = 64;

rc = 64;

% maximum block indices

max blk row idx = floor(m/rb);

max blk col idx = floor(n/rc);

% just noticeable widths based on the perceptual experiments

%widthjnb = [5*ones(1,57) 3*ones(1,200)];

widthjnb = [5*ones(1,51) 3*ones(1,205)];

%%%%%%%%%%%% initialization %%%%%%%%%%%%

% arrays and variables used during the calculations

total num edges = 0;

hist pblur = zeros(1,101);

cum hist = zeros(1,101);

%%%%%%%%%%%% edge detection %%%%%%%%%%%%

% edge detection using canny and sobel canny edge detection is done to

% classify the blocks as edge or non-edge blocks and sobel edge

% detection is done for the purpose of edge width measurement.

input image canny edge = edge(input image,'canny');

input image sobel edge = edge(input image,'Sobel',[2],'vertical');

%%%%%%%%%%%% edge width calculation %%%%%%%%%%%%

[width] = marziliano method(input image sobel edge, input image);

%%%%%%%%%%%% sharpness metric calculation %%%%%%%%%%%%

% loop over the blocks

for i=1:max blk row idx

for j=1:max blk col idx

% get the row and col indices for the block pixel positions

229

rows = (rb*(i-1)+1):(rb*i);

cols = (rc*(j-1)+1):(rc*j);

% decide whether the block is an edge block or not

decision = get edge blk decision(input image canny edge(rows,

cols), threshold);

% process the edge blocks

if (decision==1)

% get the edge widths of the detected edges for the block

local width = width(rows,cols);

local width = local width(local width ¬= 0);

% find the contrast for the block

blk contrast = blkproc(double(input image(rows,cols)),[rb

rc],@get contrast block)+1;

% get the block Wjnb based on block contrast

blk jnb = widthjnb(blk contrast);

% calculate the probability of blur detection at the edges

% detected in the block

prob blur detection = 1 - exp(-abs(local width./blk jnb).ˆ

beta);

% update the statistics using the block information

for k = 1:numel(local width)

% update the histogram

temp index = round(prob blur detection(k)* 100) + 1;

hist pblur(temp index) = hist pblur(temp index) + 1;

% update the total number of edges detected

total num edges = total num edges + 1;

end

end

end

end

% normalize the pdf

if(total num edges ¬=0)hist pblur = hist pblur / total num edges;

else

hist pblur = zeros(size(hist pblur));

end

% calculate the sharpness metric

sharpness metric = sum(hist pblur(1:64));

230

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%

% function : marziliano method

% description : This function calculates the edge-widths of the

detected

% edges and returns an matrix as big as the image with

0's

% at non-edge locations and edge-widths at the edge

% locations.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%

function [edge width map] = marziliano method(E, A)

% edge width map consists of zero and non-zero values. A zero value

% indicates that there is no edge at that position and a non-zero value

% indicates that there is an edge at that position and the value itself

% gives the edge width

edge width map = zeros(size(A));

% converting the image to type double

A = double(A);

% find the gradient for the image

[Gx Gy] = gradient(A);

% dimensions of the image

[M N] = size(A);

% initializing the matrix to empty which holds the angle information of

the

% edges

angle A = [];

% calculate the angle of the edges

for m=1:M

for n=1:N

if (Gx(m,n)¬=0)angle A(m,n) = atan2(Gy(m,n),Gx(m,n))*(180/pi); % in

degrees

end

231

if (Gx(m,n)==0 && Gy(m,n)==0)

angle A(m,n) = 0;

end

if (Gx(m,n)==0 && Gy(m,n)==pi/2)

angle A(m,n) = 90;

end

end

end

if(numel(angle A) ¬= 0)

% quantize the angle

angle Arnd = 45*round(angle A./45);

count = 0;

for m=2:M-1

for n=2:N-1

if (E(m,n)==1)

%%%%%%%%%%%%%%%%%%%% If gradient angle = 180 or -180 %%

%%%%%%%%%%%%%%%%%%%%

if (angle Arnd(m,n) ==180 | | angle Arnd(m,n) ==-180)

count = count + 1;

for k=0:100

posy1 = n-1 -k;

posy2 = n-2 -k;

if ( posy2≤0)

break;

end

if ((A(m,posy2) - A(m,posy1))≤0)

break;

end

end

width count side1 = k + 1 ;

for k=0:100

negy1 = n+1 + k;

negy2 = n+2 + k;

if (negy2>N)

break;

end

if ((A(m,negy2) - A(m,negy1))≥0)

232

break;

end

end


edge width map(m,n) = width count side1+width count

side2;

end

%%%%%%%%%%%%%%%%%%%% If gradient angle = 0 %%%%%%%%%%%%

%%%%%%%%%%

if (angle Arnd(m,n) ==0)

count = count + 1;

for k=0:100

posy1 = n+1 +k;

posy2 = n+2 +k;

if ( posy2>N)

break;

end


break;

end

end


for k=0:100

negy1 = n-1 - k;

negy2 = n-2 - k;

if (negy2≤0)

break;

end


break;

end

end


edge width map(m,n) = width count side1+width count

side2;

end

233

end

end

end

end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%

% function : get edge blk decision

% description : Gives a decision whether the block is edge block or

not.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%

function [im out] = get edge blk decision(im in,T)

[m,n] = size(im in);

L = m*n;

im edge pixels = sum(sum(im in));

im out = im edge pixels > (L*T) ;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%

% function : get contrast block

% description : Returns the contrast of the block.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%

function contrast = get contrast block(A)

A = double(A);

[m,n] =size(A);

% get constrast locally

contrast = max(max(A)) - min(min(A));

234

H.11 cranepeak

function score = cranepeak(theimage)

% H. D. Crane, "A theoretical analysis of the visual accommodation

% system in humans" Tech. Rep. NASA CR-606, NASA - Ames Research,

% Sep 1966. http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/

% 19660027855 1966027855.pdf on PDF page 41 or paper page 29:

% To be definite, we assume a first spatial derivative over the

% receptive field. The relevant measure is the integral of the

absolute

% magnitude of the derivative over the entire field. Suppose, for

% example, that a defocused edge falls across the field. Then the

% derivative will everywhere have the same polarity, and the

integral

% of the derivative over the field will (except for an arbitrary

% constant) simply be equal to the total light intensity. If

instead

% of a single edge, however, we focused a bar across the field,

then

% the derivative would have opposite polarity at the two edges, and

% integrating the absolute magnitude would result in a "measure of

% derivative over the field" that would be twice the value obtained

% with only a single edge.

%

% This implies an equation:

% score = sum(sum(abs(diff(diff(theimage)'))));

%

% However, this does not work for the image:

% 1 1 1 1

% 1 1 1 1

% 0 0 0 0

% 0 0 0 0

%

% >> theimage = [1 1 1 1; 1 1 1 1; 0 0 0 0; 0 0 0 0];

% >> score = sum(sum(abs(diff(diff(theimage)'))))

%

% score =

%

% 0

%

% >>

%

235

% As the first diff() function determines that there's no

horizontal

% difference, and so outputs an array of zeros. The second diff()

then

% sees that the input is all zeros, so outputs zero. ie score = 0.

% When, in actual fact, we'd expect the score to be non-zero This

can

% be found by instead computing the two-dimensional gradient, then

% evaluating the euclidean distance at each point:

%

% >> theimage = [1 1 1 1; 1 1 1 1; 0 0 0 0; 0 0 0 0];

% >> [px,py] = gradient(double(theimage))

%

% px =

%

% 0 0 0 0

% 0 0 0 0

% 0 0 0 0

% 0 0 0 0

%

%

% py =

%

% 0 0 0 0

% -0.5000 -0.5000 -0.5000 -0.5000

% -0.5000 -0.5000 -0.5000 -0.5000

% 0 0 0 0

%

% >> score = sum(sum(sqrt(px.ˆ2+py.ˆ2)))

%

% score =

%

% 4

%

% >>

%

[px,py] = gradient(double(theimage));

score = max(max(sqrt(px.ˆ2+py.ˆ2)));

end

236

H.12 cranesum

function score = cranesum(theimage)

% H. D. Crane, "A theoretical analysis of the visual accommodation

% system in humans" Tech. Rep. NASA CR-606, NASA - Ames Research,

% Sep 1966. http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/

% 19660027855 1966027855.pdf on PDF page 41 or paper page 29:

% To be definite, we assume a first spatial derivative over the

% receptive field. The relevant measure is the integral of the

absolute

% magnitude of the derivative over the entire field. Suppose, for

% example, that a defocused edge falls across the field. Then the

% derivative will everywhere have the same polarity, and the

integral

% of the derivative over the field will (except for an arbitrary

% constant) simply be equal to the total light intensity. If

instead

% of a single edge, however, we focused a bar across the field,

then

% the derivative would have opposite polarity at the two edges, and

% integrating the absolute magnitude would result in a "measure of

% derivative over the field" that would be twice the value obtained

% with only a single edge.

%

% This implies an equation:

% score = sum(sum(abs(diff(diff(theimage)'))));

%

% However, this does not work for the image:

% 1 1 1 1

% 1 1 1 1

% 0 0 0 0

% 0 0 0 0

%

% >> theimage = [1 1 1 1; 1 1 1 1; 0 0 0 0; 0 0 0 0];

% >> score = sum(sum(abs(diff(diff(theimage)'))))

%

% score =

%

% 0

%

% >>

237

%

% As the first diff() function determines that there's no

horizontal

% difference, and so outputs an array of zeros. The second diff()

then

% sees that the input is all zeros, so outputs zero. ie score = 0.

% When, in actual fact, we'd expect the score to be non-zero This

can

% be found by instead computing the two-dimensional gradient, then

% evaluating the euclidean distance at each point:

%

% >> theimage = [1 1 1 1; 1 1 1 1; 0 0 0 0; 0 0 0 0];

% >> [px,py] = gradient(double(theimage))

%

% px =

%

% 0 0 0 0

% 0 0 0 0

% 0 0 0 0

% 0 0 0 0

%

%

% py =

%

% 0 0 0 0

% -0.5000 -0.5000 -0.5000 -0.5000

% -0.5000 -0.5000 -0.5000 -0.5000

% 0 0 0 0

%

% >> score = sum(sum(sqrt(px.ˆ2+py.ˆ2)))

%

% score =

%

% 4

%

% >>

%

[px,py] = gradient(double(theimage));

score = sum(sum(sqrt(px.ˆ2+py.ˆ2)));

end

238

H.13 dopowerplot

function [freq,scale hist,orient,orient hist,alpha,alphaintercept,dummy

]...

=DoPowerPlot(image,nbins)

% Source code provided by Steven Dakin.

dummy = 0;

a = size(image);

scale hist = zeros(1,nbins);

freq = zeros(1,nbins);

orient = zeros(1,nbins);

orient hist = zeros(1,nbins);

f1=fftshift(abs(fft2(double(image))));

[X,Y]=meshgrid(-a(1)/2:a(1)/2-1,-a(2)/2:a(2)/2-1);

A=angle(X+Y.*i);

B=abs(X+Y.*i);

crit angle range=(2*pi)/nbins;

crit scale range=(sqrt(2)*a(1)/2.0)/nbins;

noctaves=log2(sqrt(2)*(a(1)/2));

for j=1:nbins

low scale=2ˆ(((j-1.0)/nbins) *noctaves);

high scale=2ˆ(((j)/nbins) *noctaves);

low angle=(j-0.5)*crit angle range-pi;

high angle=low angle+crit angle range;

s1=((B ≤ high scale)&(B>low scale));

s2=((A ≤ high angle)&(A>low angle));

scale sum=sum(sum(s1));

scale energy=sum(sum(s1.*f1));

if (scale sum>0)

scale hist(j)=scale energy/scale sum;

end

freq(j)=2ˆ(((j)/nbins) *noctaves );

orient sum=sum(sum(s2));

orient energy=sum(sum(s2.*f1));

if (orient sum>0)

orient hist(j)=orient energy/orient sum;

end

239

orient(j)=pi/2+(low angle+high angle)/2;

end

iScaleHist=interp1(scale hist,8);

iFreq=2.ˆ(([1:(8*nbins)]./(8*nbins)) .*noctaves );

[maxVal maxLoc]=max(iScaleHist);

maxFreq=iFreq(maxLoc);

aboveT=find((iScaleHist./max(iScaleHist))>0.5);

fprintf('peak sf. %3.3f FWHH %3.3f\n',...maxFreq,iFreq(aboveT(end))./iFreq(aboveT(1)));

goodVals=find(scale hist>0);

% Find alpha:

p1=polyfit(log(freq(goodVals)),log(scale hist(goodVals)),1);

alpha = p1(1);

alphaintercept = p1(2);

fprintf('Slope parameter, alpha = %3.3f\n',alpha);pred=polyval(p1,log(freq));

fweight=scale hist(2)/(1/(freq(2).ˆ1.5));

if nargout < 7,

subplot(3,1,1);

ishow(image);

subplot(3,1,2);

loglog(freq,scale hist,'o');

hold on;

loglog(freq,exp(pred),'b-');

hold off;

subplot(3,1,3);

plot(orient,orient hist,'o-');

end;

240

H.14 energylaplace

function score = energylaplace(theimage)

% Subbarao et al (1993) "Focusing Techniques"

% Journal of Optical Engineering


L = [-1 -4 -1; -4 20 -4; -1 -4 -1];

S = conv2(theimage,L).ˆ2;

score = sum(sum(S));

end

241

H.15 energylaplace5a

function score = energylaplace5a(theimage)


% http://www.ph.tn.tudelft.nl/¬imap/library/wouter/laplace5.htmlL =[ 0 0 -1 0 0

0 -1 -2 -1 0

-1 -2 +16 -2 -1

0 -1 -2 -1 0

0 0 -1 0 0];



end

242

H.16 energylaplace5b

function score = energylaplace5b(theimage)

% Energy laplace, plus a new 5x5 kernel.


% http://rsb.info.nih.gov/nih-image/more-docs/macros.html

L =[-1 -1 -1 -1 -1

-1 -1 -1 -1 -1

-1 -1 24 -1 -1

-1 -1 -1 -1 -1

-1 -1 -1 -1 -1];



end

243

H.17 energylaplace5c

function score = energylaplace5c(theimage)

% Energy laplace, plus a new 5x5 kernel.


% http://rsb.info.nih.gov/nih-image/more-docs/macros.html

L =[-1 -3 -4 -3 -1

-3 0 6 0 -3

-4 6 20 6 -4

-3 0 6 0 -3

-1 -3 -4 -3 -1];



end

244

H.18 entropy

function score = entropy(I)

% "Entropy Algorithm (Firestone et al., 1991). This algorithm

assumes

% that focused images contain more information than defocused

images."

% [1]

%

% References:


Focus


% 65:139 149 (2004)

%

% score = - sum over all i [p(i) log2(p(i))]

%

% where p(i) = h(i) / (width*height)

% and h(i) = probability of a pixel with intensity i

score = 0;


for i=min(min(I)):max(max(I)),

h = length(find(I==i));

p = h / (width*height);

if (p¬=0) score = score + (p * log2(p));

end;

score = - score;

end

245

H.19 groenvariance

function score = groenvariance(I)

% "Variance (Groen et al., 1985; Yeo et al., 1993). This algorithm

% computes variations in gray level among image pixels. It uses the

% power function to amplify larger differences from the mean

intensity

% mu instead of simply amplifying high-intensity values." [1]

%

% References:


Focus


% 65:139 149 (2004)

%

% score = 1/(HW) * 2d sum of [I(x,y) - mean(I)]ˆ2

%

[w h] = size(I);

score = 1/(h*w) * sum(sum( (I - mean(mean(I))).ˆ2 ));

end

246

H.20 histogramentropy

function score = histogramentropy(I)




% "If the intensity histogram is, h(i), where h(i) is the frequency

% of pixels of intensity i, then the histogram entropy is defined

% as -sum(h(i)*ln(h(i))) if h(i)!=0"


if max(I(:))<1, I = I * 255; end;

% First: Determine number of pixels with each intensity, and store

% in a histogram H

h = hist(I(:),0:255);

nonzerobins = find(h¬=0);

score = sum(h(nonzerobins).*log(h(nonzerobins)));

end

247

H.21 hlv

function score = hlv(I)




% "In the histogram of local variations, the intensity

% histogram is evaluated with pixel intensities compressed

% logarithmically and the gradient of the line of best fit through

% the points, m, is evaluated. The quantity, m, is at a minimum

% for the sharpest image. Since there are 256 gray levels, i = 0

% to 255, sum[ln(i+l)] and sum[ln(i+l)]ˆ2 is known and m may be

% evaluated.


if max(I(:))<1, I = I * 255; end;


% in a histogram H

h = hist(I(:),0:255);

% Construct the first term

term1 = 0;

for i = 1:256,

term1 = term1 + log(i+1)*h(i);

end;

% Compute m

m = (256 * term1 - 1167.26*sum(h)) / 60354.1;

% Then, as m is a minimum for the sharpest image, invert it:

score = 1/m;

% Finally, as some scores were negative, shift the score up by 1.

% All scores are normalised before comparison, so this has low

% impact.

score = score + 1;

end

248

H.22 imagepower

function score = imagepower(I)

% "Image Power (Santos et al., 1997). This algorithm sums the

square of

% image intensities above a given threshold."

% [1]

%

% "A threshold of 150 was set for focus algorithms (F-15 F -18)

because

% these algorithms exhibit satisfactory behavior with this

threshold

% value." [1] I presume this is with 8 bit pixel values (ie 0-255)

.

%

% References:


Focus


% 65:139 149 (2004)

%

% score = sum over entire image where I(x,y)>threshold

%

% First, make the image into a 1-D array:

[w h] = size(I);

I = reshape(I,w*h,1);

% Determine the thesold:

if (max(max(I))≤1),

% Assume this image is between 0-1, and so use a threshold of

% 150/255:

threshold = 150/255;

else

% Assume this is an 8 bit image

threshold = 150;

end;

% Find the matching values:

[r c v] = find(I>threshold);

% Square the intensities:

249

v = v.ˆ2;

% And sum the values:

score = sum(v);

end

250

H.23 jnbm

%=====================================================================

% File: JNBM compute.m

% Original code written by Rony Ferzli, IVU Lab (http://ivulab.asu.edu)

% Code modified by Lina Karam

% Last Revised: September 2009 by Lina Karam

%=====================================================================

% Copyright Notice:

% Copyright (c) 2007-2009 Arizona Board of Regents.

% All Rights Reserved.

% Contact: Lina Karam ([email protected]) and Rony Ferzli (rony.ferzli@asu.

edu)

% Image, Video, and Usabilty (IVU) Lab, http://ivulab.asu.edu

% Arizona State University

% This copyright statement may not be removed from this file or from

% modifications to this file.

% This copyright notice must also be included in any file or product

% that is derived from this source file.

%

% Redistribution and use of this code in source and binary forms,

% with or without modification, are permitted provided that the

% following conditions are met:

% - Redistribution's of source code must retain the above copyright

% notice, this list of conditions and the following disclaimer.

% - Redistribution's in binary form must reproduce the above copyright

% notice, this list of conditions and the following disclaimer in the

% documentation and/or other materials provided with the distribution.

% - The Image, Video, and Usability Laboratory (IVU Lab,

% http://ivulab.asu.edu) is acknowledged in any publication that

% reports research results using this code, copies of this code, or

% modifications of this code.


% R. Ferzli and L. J. Karam, "JNB Sharpness Metric Software",

% http://ivulab.asu.edu

%

% R. Ferzli and L. J. Karam, "A No-Reference Objective Image Sharpness

% Metric Based on the Notion of Just Noticeable Blur (JNB)," IEEE

% Transactions on Image Processing, vol. 18, no. 4, pp. 717-728, April

% 2009.

%

% DISCLAIMER:

% This software is provided by the copyright holders and contributors

251

% "as is" and any express or implied warranties, including, but not

% limited to, the implied warranties of merchantability and fitness for

% a particular purpose are disclaimed. In no event shall the Arizona

% Board of Regents, Arizona State University, IVU Lab members, or

% contributors be liable for any direct, indirect, incidental, special,

% exemplary, or consequential damages (including, but not limited to,

% procurement of substitute goods or services; loss of use, data, or

% profits; or business interruption) however caused and on any theory

% of liability, whether in contract, strict liability, or tort

% (including negligence or otherwise) arising in any way out of the use

% of this software, even if advised of the possibility of such damage.

%

function [metric] = JNBM compute(A)

% Robert Shilston's fix - ensure 8bit images are integers:

A = round(A);

%A = imfilter(A,1/9*ones(3,3));

beta = 3.6;

T = 0.002;

A = double(A);

[m,n] = size(A);

rb = 64;

rc = 64;

count = 1;

C=1:255;

widthjnb = [5*ones(1,60) 3*ones(1,30) 3*ones(1,180)];

for i=1:floor(m/rb)

for j=1:floor(n/rc)

row = rb*(i-1)+1:rb*i;

col = rc*(j-1)+1:rc*j;

A temp = A(row,col);

% check if block to be processed

decision = get edgeblocks mod(A temp,T);

if (decision==1)

local width = edge width(A temp);

Ac meas = blkproc(A temp,[rb rc],@get contrast block);

Ajnb = widthjnb(Ac meas+1);

temp(count) = sum(abs(local width./Ajnb).ˆbeta).ˆ(1/beta);

count = count + 1;

end

252

end

end

blockrow = floor(m/rb);

blockcol = floor(n/rc);

L = blockrow*blockcol;

metric = (L/(sum(temp.ˆbeta).ˆ(1/beta)));

%

function [local] = edge width(A)

% Compute edge width based on following paper:

% P. Marziliano, F. Dufaux, S. Winkler, and T. Ebrahimi,

% Perceptual blur and ringing metrics: Applications to JPEG2000,

% Signal Proc.: Image Comm., vol. 19, pp. 163 172 , 2004.

A = double(A);

E = edge(A,'Sobel',[],'vertical');

%E = edge(A,'Sobel');

[Gx Gy] = gradient(A);

% Magnitude

graA = abs(Gx) + abs(Gy);

[M N] = size(A);

for m=1:M

for n=1:N

if (Gx(m,n)¬=0)angle A(m,n) = atan2(Gy(m,n),Gx(m,n))*(180/pi); % in

degrees

end

if (Gx(m,n)==0 && Gy(m,n)==0)

angle A(m,n) = 0;

end

if (Gx(m,n)==0 && Gy(m,n)==pi/2)

angle A(m,n) = 90;

end

end

end

% quantize the angle

angle Arnd = 45*round(angle A./45);

width loc = [];

count = 0;

for m=2:M-1

for n=2:N-1

if (E(m,n)==1)

253

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%

%%%%%%%%%%%%%%%%%%%% If gradient = 0 %%%%%%%%%%%%%%%%%%%%%%

%%%%

%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%

if (angle Arnd(m,n) ==180 | | angle A(m,n) ==-180)

count = count + 1;

for k=0:100

posy1 = n-1 -k;

posy2 = n-2 -k;

if ( posy2≤0)

break;

end


break;

end

end


for k=0:100

negy1 = n+1 + k;

negy2 = n+2 + k;

if (negy2>N)

break;

end


break;

end

end


width loc = [width loc width count side1+width count

side2];

end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%

%%%%%%%%%%%%%%%%%%%% If gradient = 0 %%%%%%%%%%%%%%%%%%%%%%

%%%%

%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%

if (angle Arnd(m,n) ==0)

254

count = count + 1;

for k=0:100

posy1 = n+1 +k;

posy2 = n+2 +k;

if ( posy2>N)

break;

end


break;

end

end


for k=0:100

negy1 = n-1 - k;

negy2 = n-2 - k;

if (negy2≤0)

break;

end


break;

end

end


width loc = [width loc width count side1+width count

side2];

end

end

end

end

local = width loc;

%

function im out = get edgeblocks mod(im in,T)

im in = double(im in);

[im in edge,th] = edge(im in,'canny');

255

[m,n] = size(im in edge);

L = m*n;

im edge pixels = sum(sum(im in edge));

im out = im edge pixels > (L*T) ;

%

function contrast = get contrast block(A)

A = double(A);

[m,n] =size(A);

contrast = max(max(A)) - min(min(A));

256

H.24 laplace

function score = laplace(theimage)




%

% Whilst the paper doesn't explicitly state that this is

% measure is a convolution, its context implies that this

% is the case.

%



T = 0;

L = 1/6 * [1 4 1; 4 -20 4; 1 4 1];

S = conv2(theimage,L);

score = 1/(w*h) * sum(sum(S(find(S>T))));

end

257

H.25 masgrn

function score = masgrn(I)

% "Mason and Green's histogram method differs from the Mendelsohn

and

% Mayall histogram method in the way the threshold is selected.

They

% weighed the importance of picturepoints by estimates of the

gradient

% at that point." [1]

%

% References:

% [1] "Comparison of autofocus methods for automated microscopy" by

% Firestone et al in CYTOMETRY 12:195-206 (1991)

% [2] "Automatic Focusing of a Computer-Controlled Microscope" by

% Mason and Green IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING,

% VOL. BME-22, NO. 4, JULY 1975.

%

% This implementation uses equations 10-12 from [1].


if max(I(:))<1, I = I * 255; end;

% First: Compute the 'importance' of each pixel, except for the

edge

% pixels where importance is not defined.

[width height]=size(I);

importance = zeros(size(I));

for i=1+1:width-1,

for j=1+1:height-1,

term1 = 2 * (I(i,j-1)-I(i,j+1))ˆ2;

term2 = 2 * (I(i-1,j)-I(i+1,j))ˆ2;

term3 = (I(i-1,j-1)-I(i+1,j+1))ˆ2;

term4 = (I(i-1,j+1)-I(i+1,j-1))ˆ2;

importance(i,j) = term1 + term2 + term3 + term4;

end;

end;

% Then determine the threshold:

T = sum(sum(importance.*I))/sum(sum(importance));

% Build the histogram:

258

H = hist(I(:),0:255);

% Finally, the score:

score = 0;

for k=ceil(T):255,

score = score + H(k) * (k-T);

end;

end

259

H.26 menmay

function score = menmay(I)

% "Mendelsohn and Mayall's histogram method. The focus function is

% found by computing the weighted sum of picturepoints in histogram

% bins that are above a given threshold." [1]

%

% References:



% [2] "Computer-oriented analysis of human chromosomes - III Focus"

% by Mendelsohn and Mayall in Comput. Biol. Med. 2:137-150.

%

% This implementation uses equation 9 from [1].


if max(I(:))<1, I = I * 255; end;


% in a histogram H

H = hist(I(:),0:255);

% Second: Compute mean intensity of the image, rounding up to the

% nearest integer, and store this as T.

T = ceil(mean(I(:)));

% score = sum of multiplying the bin index with the number of

% pixels in that bin, for all bins > T.

score = 0;

for k=(T+1):255,

score = score + k * H(k);

end;

end

260

H.27 normalizedgroenvariance

function score = normalizedgroenvariance(I)

% "Normalized Variance (Groen et al., 1985; Yeo et al., 1993). By

% normalizing the final output with the mean intensity mu, this

% algorithm compensates for the differences in average image

intensity

% among different images." [1]

%

% References:


Focus


% 65:139 149 (2004)

%

% score = 1/(H*W*mean(I)) * 2d sum of [I(x,y) - mean(I)]ˆ2

%

[w h] = size(I);

score = 1/(h*w*mean(mean(I))) * sum(sum( (I - mean(mean(I))).ˆ2 ));

end

261

H.28 phasecongruence

function score = phasecongruence(I)

% 2d summation of phasecongruency metric proposed by Kovesi, using

% original parameters [1, section 9.1].

% Depends on functions from

% http://mitpress.mit.edu/e-journals/Videre/001/articles/Kovesi/

[pc orient ft] = phasecong(I);

nonmax = nonmaxsup(pc, orient, 1.5);

features = hysthresh(nonmax, 0.3, 0.15);

score = sum(sum(features));

end

262

H.29 phasecongruence2

function score = phasecongruence2(I)

% 2d summation of phasecongruency metric proposed by Kovesi,

ommitting

% the original thresholding parameters parameters [1, section 9.1].

% Depends on functions from

% http://mitpress.mit.edu/e-journals/Videre/001/articles/Kovesi/

[pc orient ft] = phasecong(I);

nonmax = nonmaxsup(pc, orient, 1.5);

score = sum(sum(nonmax));

end

263

H.30 randomnumber

function score = randomnumber(theimage)

% This simply returns a random number.

score = rand;

end

264

H.31 range

function score = range(I)

% "Range Algorithm (Firestone et al., 1991). This algorithm

computes

% the difference between the highest and the lowest intensity

levels."

% [1]

%

% References:


Focus


% 65:139 149 (2004)

%

% [2] "Comparison of autofocus methods for automated microscopy",

by

% Firestone et al in Cytometry, vol 12, 3:195-206, 1991

%

% score = max(I) - min(I)

%

score = max(max(I)) - min(min(I));

end

265

H.32 rawlaplace

function score = rawlaplace(theimage)


L = [-1 -4 -1; -4 21 -4; -1 -4 -1];

S = conv2(theimage,L);

score = sum(S(:));

end

266

H.33 rmscontrast

function score = rmscontrast(theimage)

% RMS Contrast

%

% Equations 4a and 4b from E. Peli (Oct. 1990). "Contrast in

Complex

% Images". Journal of the Optical Society of America A 7 (10):

% pp 2032 2040. doi:10.1364/JOSAA.7.002032.


% Mean (eq 4b)

xmean = mean(theimage(:));

% RMS contrast (eq 4a: rms = ((1/(n-1)) * sum((x - xmean)ˆ2) )ˆ2)

% First, compute the centre of the summation:

sumvals = (theimage - xmean).ˆ2;

sumresult = sum(sumvals(:));

score = ((1 / w*h) * sumresult)ˆ0.5;

end

267

H.34 scorefocusmeasure

function [A R Rmax Rscoring F W N] = ...

scorefocusmeasure(score, groundtruthindex),

% Prune the input data, in case we've a dual peak.

groundtruthindex = floor(mean(groundtruthindex));

peakindex = floor(mean(find(score==max(score(:)))));

% Ensure that the score is scaled 0-1

score(find(score==0))=nan;

activerange = length(score) - length(find(isnan(score)));

scalefactor = (max(score)-min(score));

if scalefactor==0, scalefactor=1; end;

score = (score-min(score))/scalefactor;

score(find(isnan(score)))=0;

% * Accuracy (A): Distance between maxima of the focus curve and

the

% ground truth of `best' image, measured in number of image frames

of

% distance.

A = abs(peakindex-groundtruthindex);

% * Range (R): The distance (in number of images) between the first

% minima on either side of the global maxima. This should be large,

as

% there should not be any local maxima on the focus curve.

% First, find the near side range:

nearrange=inf;

for i = peakindex:-1:2,

if score(i-1) > score(i),

nearrange = i;

break;

end;

if score(i) == 0,

nearrange = i;

break;

end;

end;

if nearrange==inf,

nearrange = peakindex;

268

else,

nearrange = peakindex - nearrange;

end;

% Then the far side range:

farrange=inf;

for i = (peakindex+1):(length(score)-1),

if score(i+1) > score(i),

farrange = i;

break;

end;

if score(i) == 0,

farrange = i;

break;

end;

end;

if farrange==inf,

farrange = length(score) - peakindex;

else,

farrange = farrange - peakindex;

end;

R = farrange + nearrange;

Rmax = activerange;

Rscoring = Rmax - R;

% * Number of false maxima (F): The number of maxima appearing in a

% focus cuve, excluding the global maximum.

F = 0;

for i = 2:length(score)-1,

if i¬=peakindex,if score(i)>score(i-1) & score(i)>score(i+1),

F = F + 1;

end;

end

end

% * Width (W): The width of the curve (in number of images) at 50%

of

% the maxima's height. Ideally this should be small.

maximaheight = score(peakindex);

I = find(score > (maximaheight/2));

if length(I),

W = max(I) - min(I);

else

269

W = 1;

end;

% * Noise level (N): This describes the speed of the direction of

% change between two false maxima of a focus curve. It is computed

by

% taking the sum of squares of the second derivatives obtained by

% convolving the curve (ommitting the peak value) with the kernel

% (-1; 2;-1).

trimmedscore = [score(1:peakindex-1) score(peakindex+1:end)];

derivatives = conv(trimmedscore, [-1 2 -1]);

N = sum(derivatives.ˆ2);

% NB. The score cannot be computed here, as we need to normalise

across

% all the images being assessed within this group.

% S = sqrt(Aˆ2 + (length(score) - R)ˆ2 + Wˆ2 + Nˆ2 + Fˆ2);

end

270

H.35 setalpha

function [outputim,newalpha]=SetAlpha(im,alpha)

% This was the script that resulted after writing ExploreAlpha. The

% matrix Z was discovered by trial and error, with a little bit of

% thought.

%

% If this starts failing to set alpha to a particular desired value

,

% then further analysis with ExploreAlpha will be required.

% Specifically it is like that the equation y=mx+c might not hold

for

% the parameter variations, or m might be different and so rrr is

% being erroneously calculated.

if (size(im)¬=[512 512]),

error('This function is only implemented for 512x512 images');

end;

% Get the image in, into the frequency domain

A = im;

[dummy, dummy, dummy, dummy, oldalpha, dummy, dummy] ...

= DoPowerPlot(A,32);

Afft = fft2(double(A));

Affts = fftshift(Afft);

Affts abs = abs(Affts);

Affts angle = angle(Affts);

% Work out the parameter by which to adjust the image. This

equation

% was determined by measuring how alpha varied with rrr for two

images

% ('farm' and Van Hateren's 'imk00005').

% Farm was approx 0.995, and VH5 was around 0.987, but these were

% determined by reading from graphs, so it's likely that they're

the

% same.

rrr = (alpha - oldalpha)/0.995;

% Build a matrix to manipulate it:

[X,Y]=meshgrid(linspace(-1,1,512));

271

Z = sqrt(X.ˆ2+Y.ˆ2);

Z = Z.ˆrrr;

% Ensure that the DC component doesn't change, and normalise the

matrix

% magnitude:

Z(257,257) = 0;

Z = Z./max(Z(:));

Z(257,257) = 1;

% Now apply the scaling:

NA = Z.*Affts abs;

Nffts = NA .* (cos(Affts angle)+i*sin(Affts angle));

Nfft = ifftshift(Nffts);

N = ifft2(Nfft);

N = real(N); % remove any residuals.

N = uint8(N);

[dummy, dummy, dummy, dummy, newalpha, dummy, dummy] ...

= DoPowerPlot(N,32);

outputim = N;

end

272

H.36 smd

function score = smd(theimage)




ydiff = diff(theimage);

xdiff = diff(theimage');

score = sum(abs(ydiff(:))) + sum(abs(xdiff(:)));

end

273

H.37 sml

function score = sml(I)

% [1] eq F-5, defines this as:

% score = sum over height ( sum over width ( abs(Lx(x,y))+abs(Ly(x,

y) )

%

% but the original paper [2, equations 7-11] define SML as:

%

% ML(x,y) = abs(2I(x,y) - I(x-step,y) - I(x+step,y)) +

% abs(2I(x,y) - I(x,y-step) - I(x,y+step))

%

% then the focus at a point is defined by the sum of the modified

% laplacian around some small window:

%

% F(i,j) = Sum from x=i-N to x=i+N of

% Sum from y=j-N to y=j+N of

% ML(x,y) where ML(x,y)≥T1

% where N = 1 or 2.

%

% Nayar [2, p14] shows that SML better than Tenengrad in textured

% images, and that Tenengrad computed with T=1, SML with T1=7, step

=1.

% However, the paper doesn't explicitly say which N to use. This

is

% not a significant problem for the use in this application, as we'

re

% looking for a global score, not a score in a particular location.

% As such, the equation that's implemented here is:

%

% score = sum over all points of

% ML(x,y) where ML(x,y)≥T1

%

% References:


Focus


% 65:139 149 (2004)

%

% [2] "Shape from Focus", Shree Nayar, 1989,

% http://www1.cs.columbia.edu/CAVE/publications/pdfs/Nayar TR89.pdf

%

274

step = 1;

T1 = 7;





term1 = abs(2*I(x,y)-I(x-step,y)-I(x+step,y));

term2 = abs(2*I(x,y)-I(x,y-step)-I(x,y+step));

ML(x,y) = term1 + term2;

end;

end;


end

275

H.38 squaredgradient

function score = squaredgradient(I, T)


analysis"


-272

% http://dx.doi.org/10.1046/j.1365-2818.1997.2630819.x

%


% sum over width

% abs(i(x+1,y)-i(x,y))ˆ2

% where abs(i(x+1,y)-i(x,y))ˆ2 > T

%


step = 1;

T1 = 25;





ML(x,y) = abs(I(x+step,y)-I(x,y))ˆ2;

end;

end;


end

276

H.39 standarddeviationbasedautocorrelation

function score = standarddeviationbasedautocorrelation(I)

% "Standard Deviation-Based Correlation (Vollath, 1987, 1988)." [1]

%

% References:


Focus


% 65:139 149 (2004)

%

% score = sum over all image of i(x, y) . i(x + 1, y)

% - H * W * mean(i)ˆ2

%

[w h] = size(I);

score = 0;

for i=1:w-2,

for j=1:h,

score = score + (I(i,j)*I(i+1,j));

end;

end;

score = score - h*w*mean(mean(I))ˆ2;

end

277

H.40 tenengrad

function score = tenengrad(theimage)

% The method is to estimate the gradient VI(x,y)

% at each image point (x,y), and simply to sum all

% the magnitudes greater than a threshold value.

% The gradient magnitude is

%

% | ∆ I(x,y) | = sqrt(Ixˆ2+Iyˆ2).

%

% The partials can be estimated by many discrete

% operators. We employ the Sobel operator with the

% convolution kernels

%

% ix = 0.25 * [-1 0 1; -2 0 2; -1 0 1];

% iy = 0.25 * [1 2 1; 0 0 0; -1 -2 -1];

%

% We compute the gradient magnitude as

%

% S(x,y) = sqrt([ix I(x,y)]ˆ2 + [iy I(x,y)]ˆ2)

%

% and state the criterion function as

%

% max of 2d summation of S(x,y)ˆ2, for S(x,y)>T

%

% where T is a threshold [1]

%

%

% It is noted [2, p265] that "Although this function made use of a

% threshold in its initial form, following Krotkov (1987) no

threshold

% is proposed".

%

% [1] "Focusing", Krotkov in Journal of Computer Vision, Vol 1,

% pp 223-237, 1987

%

% [2] "Evaluation of autofocus functions in molecular cytogenetic

% analysis" by Santos et al in Journal of Microscopy Vol.188 Issue

03,

% pp 264-272 http://dx.doi.org/10.1046/j.1365-2818.1997.2630819.x

%

278

thethreshold=0; % several papers don't use a threshold


ix = 0.25 * [-1 0 1; -2 0 2; -1 0 1];

iy = 0.25 * [1 2 1; 0 0 0; -1 -2 -1];

% We should do:

% S = sqrt(conv2(theimage(:,:),ix).ˆ2+conv2(theimage(:,:),iy).ˆ2);

% score = sum(sum(find(S>thethreshold).ˆ2));

% But, it's more efficient not to sqrt, but instead compare against

a

% squared threshold:

thethreshold = thethresholdˆ2;

S = conv2(theimage(:,:),ix).ˆ2+conv2(theimage(:,:),iy).ˆ2;

if thethreshold==0,

score = sum(sum(S.ˆ2));

else

error('score = sum(sum(find(S>thethreshold).ˆ2));');

end;

end

279

H.41 thresholdedabsolutegradient

function score = thresholdedabsolutegradient(I, T)


analysis"


-272

% http://dx.doi.org/10.1046/j.1365-2818.1997.2630819.x

%


% sum over width

% abs(i(x+1,y)-i(x,y))

% where abs(i(x+1,y)-i(x,y)) > T

%


step = 1;

T1 = 5;





ML(x,y) = abs(I(x+step,y)-I(x,y));

end;

end;


end

280

H.42 thresholdedcontent

function score = thresholdedcontent(I)

% "Thresholded Content (Groen et al., 1985; Mendelsohn and Mayall,

% 1972). This algorithm sums the pixel intensities above a

threshold."

% [1]

%


because


threshold


.

%

% References:


Focus


% 65:139 149 (2004)

%

% score = sum over entire image where I(x,y)>threshold

%


[w h] = size(I);





% 150/255:


else


threshold = 150;

end;


[r c v] = find(I>threshold);

% And sum the values:

281

score = sum(v);

end

282

H.43 thresholdedpixelcount

function score = thresholdedpixelcount(I)

% "Thresholded Pixel Count (Groen et al., 1985). This algorithm

counts

% the number of pixels having intensity below a given threshold."

% [1]

%


because


threshold


.

%

% References:


Focus


% 65:139 149 (2004)

%

% score = sum over entire image of s(x,y)

%

% where s(x,y) = 1 when i(x,y)<threshold; 0 otherwise


[w h] = size(I);





% 150/255:


else


threshold = 150;

end;


matchingindexes = find(I<threshold);

283

% And count the matching pixels

score = length(matchingindexes);

end

284

H.44 triakis11s

function score = triakis11s(I)

% References:




if max(I(:))<1, I = I * 255; end;

% Resize to 64x64 (as used by [1]). First, square the image:

[d1 d2]=size(I);

if d2>d1,


I = I(1:d1, (1+offset):(offset+d1));

else


I = I((1+offset):(offset+d2), 1:d2);

end;

% Then sample the image down to 64x64

I = imresize(I,[64 64]);

% Determine boundaries:


% Create and populate a 3D binary space:

v = zeros(width,height,255,'uint8');

filterop = logical(v);

v = logical(v);

% Fill the space:

for i = 1:width,

for j = 1:height,

if I(i,j),

v(i,j,1:ceil(I(i,j))) = true;

end;

end;

end;

% For triakis11, we're looking to pass neighbourhoods which have

285

% 11 voxels set.

% However, [1] used a face-centred cubic close pack,

% meaning that each voxel had 12 neighbouring pixels (a

% ring of six around it in one plane, then three sitting

% above and beneath). No explanation was made as to how

% the cartesian-image was transformed into a hexagonal

% packed arrangement for searching.

%

% This implementation pushes the cartesian voxel

% arrangement slightly, so that it becomes hexagonal.

% Then, for an arbitrary point (i,j,k), the neighbours are:

%

% Beneath (k-1)

% NB. o denotes the site of the original pixel in the adjacent

plane.

%

% (i,j,k-1)

% o

% (i,j-1,k-1) (i+1,j-1,k-1)

%

%

%

% Same level (k):

% NB. x denotes site of neighbours in adjacent planes

%

% (i-1,j+1,k) (i,j+1,k)

% x2

% (i-1,j,k) ORIGINAL (i+1,j,k)

% x1 x3

% (i,j-1,k) (i+1,j-1,k)

%

%

% Above (k+1)

% NB. o denotes the site of the original pixel in the adjacent

plane.

%

% (i,j,k+1)

% o

% (i,j-1,k+1) (i+1,j-1,k+1)

%

% Create a mask space:

neighbourmask = zeros(3,3,3);

286

% Define centre of mask:

i=2; j=2; k=2;

% Set mask based on the above voxel selection:

neighbourmask(i,j,k-1)=1;

neighbourmask(i,j-1,k-1)=1;

neighbourmask(i+1,j-1,k-1)=1;

neighbourmask(i-1,j+1,k)=1;

neighbourmask(i,j+1,k)=1;

neighbourmask(i-1,j,k)=1;

neighbourmask(i+1,j,k)=1;

neighbourmask(i,j-1,k)=1;

neighbourmask(i+1,j-1,k)=1;

neighbourmask(i,j,k+1)=1;

neighbourmask(i,j-1,k+1)=1;

neighbourmask(i+1,j-1,k+1)=1;

% Search the space for matching criteria:

for i=2:width-1,

for j=2:height-1,

for k=2:254,

% Extract region:

extract = v(i-1:i+1,j-1:j+1,k-1:k+1);

% Reduce from 26 neighbour voxels to 12:

extract = extract .* neighbourmask;

% See if this is a matching neighbourhood:

if sum(extract(:))==11,

% Merge this neighbour into the filter output:

term1 = filterop(i-1:i+1,j-1:j+1,k-1:k+1);

term2 = v(i-1:i+1,j-1:j+1,k-1:k+1);

filterop(i-1:i+1,j-1:j+1,k-1:k+1) = term1 | term2 ;

end;

end;

end;

end;

% Finally, determine the score:

score = sum(filterop(:));

end

287

H.45 va

function score = va(theimage)

% Convert the input into 24 bit colour, but still greyscale:

if max(theimage(:))>1, theimage = theimage ./ 256; end;

theimage24(:,:,1) = theimage;



% First: write the image to disk as a BMP:

vafn = ['va' num2str(floor(rand*100000)) '.bmp'];

imwrite(theimage24, vafn, 'BMP');

% And compute the score:

theexe = 'C:\Docume¬1\RobShi¬1\Desktop\Phd\TransferReport\focusmeasures\va.exe ';

theoptions = ' --tests=2 --colourcomparison=eucrgb --showprogress=

no';

thecmd = [ theexe theoptions ' ' vafn];

[stat result]=system(thecmd);

score = str2num(result);

delete(vafn)

end

288

H.46 voll4

function score = voll4(I)

% References:


analysis"

% by Santos et al in Journal of Microscopy, Vol 188(3), pp264-272,

% December 1997.


if max(I(:))<1, I = I * 255; end;



% Initialise variables:

score = 0;

% Part one of the equation:

for i = 1:(width-1),

for j = 1:height,

score = score + I(i,j)*I(i+1,j);

end;

end;

% Part two of the equation:


for j = 1:height,

score = score - I(i,j)*I(i+2,j);

end;

end;

end

289

H.47 voll5

function score = voll5(I)

% References:


analysis"

% by Santos et al in Journal of Microscopy, Vol 188(3), pp264-272,

% December 1997.


if max(I(:))<1, I = I * 255; end;



% Initialise variables:

score = 0;


for j = 1:height,

score = score + I(i,j)*I(i+1,j);

end;

end;

score = score - width*height*(mean(I(:)).ˆ2);

end

290

H.48 waveletw1

function score = waveletw1(I)

% "This algorithm uses the Daubechies D6 wavelet filter, applying

both

% high-pass (H) and lowpass (L) filtering to an image. The

resultant

% image is divided into four subimages: LL, HL, LH, and HH. The

% algorithm sums the absolute values in the HL, LH, and HH regions.

"

%

% "Autofocusing in Computer Microscopy: Selecting the Optimal Focus


% 65:139 149 (2004)

% First, install wavelab

I = double(I);

I = makedyadic(I);

d6 = MakeONFilter('Daubechies', 6);

wlI = FWT2 PO(I,5,d6);

[w h] = size(wlI);

ll = wlI(1:w/2, 1:h/2);

lh = wlI(1:w/2, h/2:h);

hl = wlI(w/2:w, 1:h/2);

hh = wlI(w/2:w, h/2:h);

score = sum(sum(abs(hl))) + sum(sum(abs(lh))) + sum(sum(abs(hh)));

end

291

H.49 waveletw2


% "This algorithm sums the variances in the HL, LH, and HH regions.

The

% mean values, mu, in each region are computed from absolute values

."

%

% "Autofocusing in Computer Microscopy: Selecting the Optimal Focus


% 65:139 149 (2004)

%

% "Wavelet-Based Autofocusing and Unsupervised Segmentation of

% Microscopic Images" by Yang and Nelson in Proceedings of the 2003

% IEEE/RSJ lntl Conference on Intelligent Robots and Systems, Las

Vegas

% Nevada October 2003, p2144 -- 2148.

%

% NB. Equation F8 of Sun et al appears to contain a typographical

error

% when compared with Equation 5 of Yang and Nelson.

%

% score = 1/(HW) * 2d sum of

% [abs(HL) - mean(abs(HL))]ˆ2 +

% [abs(LH) - mean(abs(LH))]ˆ2 +

% [abs(HH) - mean(abs(HH))]ˆ2

%

% Furthermore, Sun indicates the summation is over the height and

width

% of the image, whilst Yang sums over the set of i,j present in the

% arrays HL, LH and HH respectively. It's this latter approach

which

% is implemented. It should also be noted that [1]'s equation F8

is

% simply not present in [2] - it looks to be a later version of Eq

5.

%

% This requires Wavelab (http://www-stat.stanford.edu/¬wavelab/)

I = double(I);

I = makedyadic(I);


292


[w h] = size(wlI);

ll = wlI(1:w/2, 1:h/2);

lh = wlI(1:w/2, h/2:h);

hl = wlI(w/2:w, 1:h/2);

hh = wlI(w/2:w, h/2:h);

hlsum = sum(sum((abs(hl) - mean(mean(abs(hl)))).ˆ2));

lhsum = sum(sum((abs(lh) - mean(mean(abs(lh)))).ˆ2));

hhsum = sum(sum((abs(hh) - mean(mean(abs(hh)))).ˆ2));

score = 1/(h*w) * (hlsum + lhsum + hhsum);

end

293

H.50 waveletw3


% "This algorithm differs from [waveletw2] in that the mean values,

mu,

% in the HL, LH, and HH regions are computed without using absolute

% values." [1]

%

% References:


Focus


% 65:139 149 (2004)

%

% [2] "Wavelet-Based Autofocusing and Unsupervised Segmentation of

% Microscopic Images" by Yang and Nelson in Proceedings of the 2003

% IEEE/RSJ lntl Conference on Intelligent Robots and Systems, Las

Vegas

% Nevada October 2003, p2144 -- 2148.

%

% NB. Equation F8 of Sun et al appears to contain a typographical

error

% when compared with Equation 5 of Yang and Nelson. This uses the

Yang

% formula.

%

% score = 1/(HW) * 2d sum of

% [HL - mean(HL)]ˆ2 +

% [LH - mean(LH)]ˆ2 +

% [HH - mean(HH)]ˆ2

%

% Furthermore, Sun indicates the summation is over the height and

width

% of the image, whilst Yang sums over the set of i,j present in the

% arrays HL, LH and HH respectively. There's also discrepancy as

to

% the definition of the mean that's used, as well as whether or not

to

% take absolutes. For the avoidance of doubt, the original paper

[2]'s

% equations are used.

%

294

% This requires Wavelab (http://www-stat.stanford.edu/¬wavelab/)

I = double(I);

I = makedyadic(I);



[w h] = size(wlI);

ll = wlI(1:w/2, 1:h/2);

lh = wlI(1:w/2, h/2:h);

hl = wlI(w/2:w, 1:h/2);

hh = wlI(w/2:w, h/2:h);

hlsum = sum(sum((hl - mean(mean(hl))).ˆ2));

lhsum = sum(sum((lh - mean(mean(lh))).ˆ2));

hhsum = sum(sum((hh - mean(mean(hh))).ˆ2));

score = 1/(h*w) * (hlsum + lhsum + hhsum);

end

295

Documents

Blur perception: An evaluation of focus measures