
Objective image quality assessment for X-ray imaging · Jop Daalmans · May 30, 2008



Master Thesis

Objective image quality assessment

for X-ray imaging

Jop Daalmans

May 30, 2008

Supervisors:

Dr. Ronald Westra, University of Maastricht

Dr. Ir. Paul Withagen, Philips Healthcare

University of Maastricht

Faculty of Humanities and Sciences

Master Operations Research


Abstract

This thesis describes research in the area of objective image quality assessment for X-ray imaging. This work was performed at Philips Healthcare in Best and partly at the University of Maastricht, as a final project of the Master study Knowledge Engineering / Operations Research.

The objective of this thesis is to provide a method for objective evaluation of image quality (IQ) on cardiovascular X-ray images. This method can be used to find optimal parameter settings for noise reduction algorithms.

Several image quality assessment models are reviewed, leading to further research into the "full-reference model" approach. The full-reference model was originally designed to measure the strength of image compression algorithms: a compressed image is compared with the original image, and the similarity between these two images determines the quality of the compressed image. We have changed the setup of the full-reference model such that it can quantify the strength of image enhancement algorithms. The major problem encountered is that no "perfect" quality reference image is available when enhancing images. By averaging a set of 100 images we obtained an almost noiseless image that we use as a reference.

Using an imperfect reference image, however, gives rise to an interpretation problem. Full-reference metrics calculate the similarity between two images by adding up all the differences between them. Since our reference image is not perfect, a difference can also be an improvement. As the full-reference methods cannot judge whether a difference constitutes an improvement or not, any change will be graded negatively. We address this issue as the "Judging Problem". To solve it, the noiseless reference image is first processed into a less sharp test image. Changes to the test image that make it more similar to the original reference image can now be identified as improvements, which reduces the Judging Problem.

This model has been tested for two full-reference metrics, namely the Structural SIMilarity index (SSIM) and the Visual Information Fidelity (VIF). We have shown that our model is able to find the optimal parameter settings for an image enhancement algorithm as measured by SSIM and VIF.

Furthermore, an information theoretic metric that calculates the Information Capacity (IC), created by Papalazarou, has also been evaluated. This metric is based on Shannon's formulations, which describe the maximum information rate that can be transmitted through a communication channel, in this case the imaging system. This rate is connected to the amount of information present in the received image, and thus reflects its quality. We have designed a model to test this metric and show that it is able to find the optimal parameter settings of an image enhancement algorithm as measured by IC.

Keywords: Image quality assessment, X-ray imaging, full-reference


Contents

1 Introduction

2 Problem description and objectives

3 Concepts in image processing and objective image quality
  3.1 The human visual system
  3.2 Distinguishing properties of images
    3.2.1 Noise
    3.2.2 Sharpness
    3.2.3 Contrast
    3.2.4 Artifacts
  3.3 Subjective and objective quality
  3.4 Human subjective perception in X-ray image quality
  3.5 Different approaches to objective image quality
    3.5.1 No-reference OIQ for X-ray image enhancement
    3.5.2 Reduced-reference OIQ for X-ray image enhancement
    3.5.3 Full-reference OIQ for X-ray image enhancement
  3.6 Image processing enhancement algorithms
    3.6.1 Adaptive Temporal Recursive filter
    3.6.2 Cosmix
  3.7 Distortions
    3.7.1 Adding artificial noise
    3.7.2 Blur filtering

4 Image quality metrics
  4.1 Metrics for full-reference image quality assessment
    4.1.1 Structural Similarity Index
    4.1.2 Visual Information Fidelity
  4.2 Information capacity by Shannon

5 Experimental design
  5.1 Adjusting the full-reference model
  5.2 Constructing reference and test image
  5.3 Design of experiments for full-reference metrics
  5.4 Model for Shannon's information capacity metric
  5.5 Design of experiments for Shannon's information capacity metric

6 Numerical experiments
  6.1 Experiments testing full-reference metrics
    6.1.1 Noise and blur addition tests
    6.1.2 ATR Cut parameter test
    6.1.3 Cosmix on both test and reference image
    6.1.4 Noise reduction by blur experiment (IP on both images)
    6.1.5 Cosmix just on test image
    6.1.6 Noise reduction by blur test (IP on just the test image)
  6.2 Experiments testing Shannon's information capacity
    6.2.1 Noise and blur addition tests
    6.2.2 Cosmix Alpha_Mean_Fact parameter test

7 Discussions and Conclusions
  7.1 Conclusions full-reference metrics
  7.2 Conclusions Shannon's Information Capacity metric
  7.3 Future work

Bibliography


Chapter 1

Introduction

Cardiovascular disease and other pathologies related to the blood vessels of the heart are among the most frequently occurring pathological conditions, and among the leading causes of death in developed countries. The effort to diagnose and treat these conditions is greatly facilitated by imaging techniques such as X-ray radiography. The goal of X-ray imaging is to depict anatomical structures as accurately as possible, allowing differentiation of different types of tissue, recognition of abnormalities and guidance through diagnostic or interventional procedures. Image quality lies at the heart of medical imaging, and image processing is aimed at improving it. The dose of X-ray radiation used to make X-ray images determines the quality of the image: the more dose is used, the better the quality of the image will be. When Image Processing (IP) is applied to enhance the image quality, less dose is needed to obtain an image of the same quality. As X-ray radiation is harmful to patients, every improvement to IP results in less harmful radiation being used on the patient to obtain the same quality image. This shows the importance of IP in X-ray imaging.

The task of assessing image quality in an objective way is a difficult one; the importance of image quality and the impracticality of subjective testing, however, have made it the topic of extended research. Image quality assessment methods can be used, for example, to compare IP algorithms or to tune the parameters of an IP algorithm.

The objective of this thesis is to find existing image quality metrics and test whether they are suitable for assessing the strength of X-ray image enhancement algorithms. Three categories of metrics have been studied: no-reference, reduced-reference and full-reference. Full-reference metrics have been selected because they have a clear goal of what the image should look like, and this goal can easily be adjusted. Some promising full-reference metrics are available in literature [2, 13, 16], but they assume that an original "perfect" version of the image is at hand. This is not available when enhancing an image, so the model has to be adjusted before these full-reference image quality metrics can be used. Once this is achieved, the metrics are tested to verify that they still function correctly in the new model. Moreover, their compatibility with X-ray images is tested, as these contain considerably more noise than normal images.

The aforementioned objective of this thesis can be phrased as the following research question: "How can full-reference methods be used for objectively evaluating X-ray image quality when using image enhancement algorithms?" This project was performed at Philips Healthcare in Best (Netherlands) in cooperation with the University of Maastricht, as part of the Master study Operations Research / Knowledge Engineering.

This thesis is structured as follows. In Chapter 2, the problem and goals are described in more detail. Chapter 3 explains some basic concepts in image processing (IP) and Objective Image Quality (OIQ). Chapter 4 explains the OIQ metrics that will be evaluated. Chapter 5 describes how the full-reference model is adjusted and which experiments have been designed. Chapter 6 describes the experiments performed in detail and presents their results. In Chapter 7, conclusions are drawn discussing the results of the experiments and the goals we set out to achieve.


Chapter 2

Problem description and objectives

This research is performed in the field of Objective Image Quality (OIQ) and aims at selecting a promising way to measure quality and evaluating its suitability for X-ray images. Objective image quality is a broad problem, and some restrictions are made to narrow the focus. First of all, we are not going to test the quality of images by themselves, but the strength of IP algorithms performed on these images. Also, we will not perform any subjective testing, since this is too time consuming. A literature study identified three different classes of OIQ measures: full-reference, reduced-reference and no-reference. These classes will be further explained in Section 3.5, along with the choice made to further explore the full-reference model.

Problem description

The research question addressed in our research is: "How do we use the full-reference model for objectively evaluating X-ray image quality when using image enhancement algorithms?" Moreover, we studied which existing full-reference metrics are promising for this purpose.

The setting at Philips Healthcare

The X-ray images used in this research are made using a Philips Allura Xper FDXD 4700 system. The images are processed using different IP algorithms to enhance their quality. These algorithms perform noise reduction, edge enhancement and contrast increase. The methods currently used to measure image quality are the Modulation Transfer Function (MTF), the Noise Power Spectrum (NPS) and the Detective Quantum Efficiency (DQE) [19]. These methods are very useful when testing a detector or a linear algorithm. When testing non-linear IP algorithms, which the most recent and important algorithms are, they are not usable. The reason is that MTF and NPS use specific input images for their measurements, while the behaviour of a non-linear IP algorithm depends on the input image. So when the quality of an algorithm is measured using the specific MTF and NPS input images, the result will not be applicable to clinical images, which have entirely different properties. Consequently, other quality metrics are needed to assess non-linear IP algorithms.
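To make the linearity issue concrete, the NPS is typically estimated from a flat-field (uniform-exposure) image. The following is an illustrative numpy sketch under simplified assumptions (white Gaussian noise, no detector model, function names of our own choosing), not the Philips measurement procedure:

```python
import numpy as np

def noise_power_spectrum(flat_image):
    """Estimate a 2-D noise power spectrum from a flat-field image:
    subtract the mean signal, Fourier-transform the residual noise,
    and normalize by the number of pixels."""
    noise = flat_image - flat_image.mean()
    spectrum = np.fft.fftshift(np.fft.fft2(noise))
    return (np.abs(spectrum) ** 2) / noise.size

# Simulated flat-field exposure: uniform signal plus white noise.
rng = np.random.default_rng(0)
flat = 100.0 + rng.normal(0.0, 2.0, size=(256, 256))

nps = noise_power_spectrum(flat)
# For white noise the NPS is flat on average; by Parseval's theorem
# its mean equals the noise variance (here close to 2.0**2 = 4.0).
print(round(float(nps.mean()), 1))
```

Because the measurement relies on a uniform input, it characterizes the detector noise well, but it says nothing about how a non-linear algorithm behaves on a clinical image.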

Research objectives

The three main objectives of this thesis are:

1. Design a setup that enables the use of full-reference methods for evaluating X-ray images and IP algorithms.

2. Design a setup that enables us to measure the strength of different IP algorithms, or of the same IP algorithm with different parameter settings (parameter tuning).

3. Test several full-reference metrics for evaluating X-ray image enhancement IP.


Chapter 3

Concepts in image processing and

objective image quality

This chapter explains the basic concepts of Image Processing and Image Quality. Readers acquainted with IP methods can skip Sections 3.1 and 3.2. Readers with Image Quality experience can also skip Sections 3.3, 3.4 and 3.5.

3.1 The human visual system

The Human Visual System (HVS) is the part of the nervous system that is responsible for processing and interpreting visual information and deciding on adequate actions. It interprets the information from visible light to build a representation of the surrounding world. The visual system has the complex task of (re)constructing a three-dimensional world from a two-dimensional projection of that world. The psychological manifestation of visual information is known as visual perception. Our visual perception is an extremely complex system. For example, we know that our perception of intensity is a nonlinear function of actual intensity. The human visual system also has varying sensitivity to different spatial frequencies. These properties can be modeled to a limited extent. HVS models are the set of human visual system models used by image processing experts to deal with biological and psychological processes that are not yet completely understood. As our knowledge of the true HVS improves, new models are created and old models are updated.

One example of a model of a psychophysical HVS feature is the contrast sensitivity function (CSF), which models the sensitivity of the HVS as a function of the spatial frequency content in a signal. At different spatial frequencies the HVS observes different levels of contrast. This is demonstrated in Figure 3.1, which was first produced by Campbell and Robson [1]. There are numerous models like the CSF that each model a part of the HVS. These models as a whole are what we refer to when we say the Human Visual System models.

3.2 Distinguishing properties of images

From a quality assessment point of view, the quality of an image can be divided into three major categories: noise, sharpness and contrast. Artifacts also play a role in determining the quality of an image and will be further discussed in Section 3.2.4. These four properties are shown in Figure 3.2 for a one dimensional signal. Noise can be seen as the amplitude of the signal in a part where there are no objects. Contrast is the distance between an object and the background. Sharpness is the steepness of the transition between an object and the background. An artifact is an unwanted intensity difference.
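These properties can be measured directly on a one-dimensional signal like the one in Figure 3.2. The sketch below uses illustrative values of our own choosing, not data from the thesis experiments:

```python
import numpy as np

rng = np.random.default_rng(1)

# A 1-D signal: flat background, a smooth ramp up to an object
# plateau, plus additive noise (all values are illustrative).
background, obj_level = 10.0, 30.0
ramp = np.linspace(background, obj_level, 8)   # transition region
signal = np.concatenate([np.full(40, background), ramp, np.full(40, obj_level)])
noisy = signal + rng.normal(0.0, 0.5, signal.size)

# Noise: standard deviation in a flat, object-free part of the signal.
noise = noisy[:40].std()
# Contrast: distance between the object plateau and the background.
contrast = noisy[-40:].mean() - noisy[:40].mean()
# Sharpness: steepness of the transition (maximum gradient,
# measured on the clean signal for clarity).
sharpness = np.diff(signal).max()

print(round(float(noise), 2), round(float(contrast), 1), round(float(sharpness), 2))
```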


Figure 3.1: The luminance of pixels is modulated sinusoidally along the horizontal dimension. The frequency of modulation (spatial frequency) increases logarithmically, i.e., with exponential increase in frequency from left to right. The contrast also varies logarithmically, from 100% to about 0.5%.
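A chart with the properties described in this caption can be generated directly; the code below is a hypothetical reconstruction (the frequency range and image size are our own choices, not those of Campbell and Robson):

```python
import numpy as np

def campbell_robson(width=512, height=256):
    """Generate a Campbell-Robson style chart: spatial frequency grows
    exponentially from left to right, while contrast falls
    logarithmically from 100% in the first row to 0.5% in the last."""
    x = np.linspace(0.0, 1.0, width)
    y = np.linspace(0.0, 1.0, height)
    # Exponentially increasing frequency: the phase is the integral
    # of the instantaneous frequency f(x) = f0 * exp(k * x).
    f0, k = 2.0, 4.0
    phase = 2.0 * np.pi * (f0 / k) * (np.exp(k * x) - 1.0)
    grating = np.sin(phase)                  # one row of the grating
    contrast = 0.005 ** y                    # 1.0 down to 0.005
    return 0.5 + 0.5 * contrast[:, None] * grating[None, :]

chart = campbell_robson()
print(chart.shape)  # (256, 512)
# The full-contrast row spans almost the whole [0, 1] grey range,
# while the 0.5% row is nearly uniform grey.
```

Viewing such a chart, the boundary where the grating fades into uniform grey traces the observer's own contrast sensitivity function.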

Figure 3.2: Example of quality properties in a signal

3.2.1 Noise

Photon noise is the main source of noise in X-ray images. It is a type of noise that occurs when the finite number of light particles that carry energy, the photons, is small enough to give rise to detectable statistical fluctuations in a measurement. Noise is present in every pixel, and its intensity is often expressed as the standard deviation of the noise. Photon noise accounts for 70-80% of all the noise in an X-ray image. The other 20-30% is noise created by the measuring tools, such as the X-ray detector. This means that only 20-30% of all the noise can be reduced by improving these measuring tools. The remaining noise can only be reduced by increasing the dose or by Image Processing. So when we keep the quality of the images constant, every improvement in IP results in requiring less X-ray radiation used on the patient. This reveals the significance of IP in the development of X-ray systems.
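The dose-quality relation follows from the Poisson statistics of photon counting: a count with mean N has standard deviation sqrt(N), so the signal-to-noise ratio grows with the square root of the dose. A small simulation (illustrative counts, not calibrated to a real detector) confirms this:

```python
import numpy as np

rng = np.random.default_rng(2)

snrs = []
for mean_photons in (100, 400, 1600):
    # Poisson-distributed photon counts for 100 000 pixels at this dose.
    counts = rng.poisson(mean_photons, size=100_000)
    # SNR = mean / std; for Poisson noise this equals sqrt(mean),
    # so quadrupling the dose doubles the SNR.
    snrs.append(counts.mean() / counts.std())
    print(mean_photons, round(float(snrs[-1]), 1))
```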

Signal Dependent and Signal independent noise

Noise can be divided into two categories: signal dependent and signal independent noise. Signal independent noise is uniformly spread over the image, while signal dependent noise depends on the strength of the signal. Noise in X-ray imaging is signal dependent. Figure 3.3B shows the signal dependent noise belonging to Figure 3.3A. One can clearly see that the darker places in A contain more noise than the brighter areas. Theoretically more noise is expected when the signal is stronger (brighter), but image A has already been transformed by a Flat Detector Correction Look Up Table (FDC-LUT). Without going into the details of the FDC-LUT, we just mention that it is a logarithmic LUT. Figure 3.4 shows an example of how it affects the noise-to-signal-intensity relation. This transformation is applied within the X-ray device and is the only IP that cannot be turned off.
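The effect of a logarithmic LUT on signal dependent noise can be illustrated with a small simulation (a plain natural logarithm stands in for the real FDC-LUT, whose details are not given here):

```python
import numpy as np

rng = np.random.default_rng(3)

raw_noise, lut_noise = [], []
for mean_photons in (100, 1600):
    counts = rng.poisson(mean_photons, size=100_000).astype(float)
    # Before the LUT: Poisson noise grows with the count (~sqrt(N)).
    raw_noise.append(counts.std())
    # After a logarithmic transform the noise std is roughly
    # 1/sqrt(N), so it shrinks as the intensity rises, which matches
    # the darker areas looking noisier in Figure 3.3.
    lut_noise.append(np.log(counts).std())
    print(mean_photons, round(raw_noise[-1], 1), round(lut_noise[-1], 3))
```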


Figure 3.3: Image A is the input image, image B shows signal dependent noise, image C shows signal independent noise

Figure 3.4: FDC-LUT signal transformation

3.2.2 Sharpness

Sharpness describes the clarity of detail in an image and emphasizes textures in the observed object. It can also be seen as the steepness of the transition between object and background, as shown in Figure 3.2. When the transition width is small, high sharpness is perceived. When the transition width becomes larger, the transition will look more blurred. This difference is shown in Figure 3.5. Sharpness is affected by blurring: when we blur an image, the sharpness is decreased.
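The relation between blurring and sharpness can be made concrete by blurring an ideal step edge and measuring the maximum gradient of the transition; this is a minimal numpy sketch, not the blur filter of Section 3.7.2:

```python
import numpy as np

def gaussian_blur_1d(signal, sigma):
    """Blur a 1-D signal with a sampled, normalized Gaussian kernel."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    return np.convolve(signal, kernel, mode="same")

# An ideal step edge: the sharpest possible transition.
edge = np.concatenate([np.zeros(50), np.ones(50)])

steepness = {}
for sigma in (1.0, 2.0, 4.0):
    blurred = gaussian_blur_1d(edge, sigma)
    # Sharpness as the steepness of the transition (maximum gradient):
    # it drops as the blur widens the edge.
    steepness[sigma] = np.diff(blurred).max()
    print(sigma, round(float(steepness[sigma]), 3))
```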

Figure 3.5: Sharpness

3.2.3 Contrast

Contrast is the difference in visual properties that makes an object (or its representation in an image) distinguishable from other objects and the background. Figure 3.2 shows how to determine the contrast of an object in a one dimensional signal.

3.2.4 Artifacts

Artifacts are "stains" on the image that can be misinterpreted and mistaken for tissue structures in medical imaging, and are sometimes hard to detect. They can also obscure useful clinical information. The flat detector can produce artifacts like bad pixels, bad lines or blinking pixels.


However, artifacts can also be caused by IP algorithms when they are not properly tuned. Since we choose our own test images for our experiments, we will not use any images containing flat detector artifacts. This leaves the artifacts created by IP to be dealt with. Figures 3.6 and 3.7 show some examples of artifacts that can be created by IP.

Figure 3.6: White artifacts, clearly visible around the edges of the blood vessels

Figure 3.7: Worm shaped artifacts that are seen throughout the image

As long as our reference image does not contain any artifacts, full-reference metrics will have no problems coping with artifacts in the test image. This will become clear when we further explain how the full-reference metrics will be used in Chapter 5.

3.3 Subjective and objective quality

Since human beings are the ultimate receivers in most image-processing applications, the most reliable way of assessing the quality of an image is by subjective evaluation. The Mean Opinion Score (MOS), a subjective quality measure requiring the service of a number of human observers, has long been regarded as the best method of image quality measurement. However, the MOS method is expensive, and it is usually too slow to be useful in real-world applications. The goal of objective image quality assessment research is to design computational models that can predict perceived image quality accurately and automatically. These numerical measurements of quality should predict the quality of an image that an average human observer would report. Eventually a subjective test is always needed to verify whether an objective metric works correctly. In the field of X-ray imaging, however, this is difficult and unrealistic. An additional problem is that the end users, doctors, hardly have time to participate in time-consuming experiments. Moreover, not every doctor has the same visual preferences (taste). How we can define taste will be made clear in Section 3.4.


3.4 Human subjective perception in X-ray image quality

Before addressing Human Subjective Perception (taste) we have to define the difference between simple and advanced IP algorithms. Advanced IP refers to non-linear algorithms; these will be applied to the point where they start creating artifacts. We refer to simple IP when using linear algorithms and, as we will see next, linear IP is the part where taste plays a role. Linear IP is performed by linear operations and can always be reversed.

The three basic properties of image quality that have just been explained are contrast, sharpness and noise. In general we want the noise level to be as low as possible and the contrast and sharpness levels as high as possible. We will now explain how "taste" can be viewed in image quality using the following example. When we can enhance the contrast of an image without any negative side effects, no doctor's taste would object to this. Up to a certain point this can be achieved by non-linear algorithms. We can then use linear IP algorithms to further enhance the contrast, but as a side effect the noise will increase as well. This raises the question: to what level do we want to increase the contrast, accepting the extra noise that is created? While one doctor may prefer higher contrast and not care about the extra noise, another doctor might find a low-noise image more pleasant to look at and happily accept less contrast to keep the noise very limited. Somewhere in this reasoning there is a turning point where a doctor says: "I have added the maximum amount of contrast. If I would add more, the noise would become too visible and distort my image too much." This point defines that doctor's taste for the balance between noise and contrast.

In this example we discussed the balance between contrast and noise, but sharpness also plays a role. In fact the entire picture is a triangle between sharpness, contrast and noise, shown in Figure 3.8. Using simple image processing we are able to easily change this balance.

Figure 3.8: Image quality triangle

When we add contrast, we also get more noise. The same applies to sharpness. Reducing noise has the side effect that it also reduces contrast and sharpness. So the ideal situation would be to maximize contrast and sharpness and at the same time minimize the noise. We could say that two images with a different contrast/sharpness/noise balance that are easily transformed into one another possess the same quality potential.
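The reversibility argument can be verified numerically: a linear contrast stretch scales contrast and noise by exactly the same factor, so it only moves an image along the triangle without changing its quality potential. A small illustrative example (all intensity values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

# A noisy image of an object (right half) on a background (left half).
background, obj_level, gain = 100.0, 120.0, 2.0
image = np.full((64, 64), background)
image[:, 32:] = obj_level
noisy = image + rng.normal(0.0, 2.0, image.shape)

# Linear, reversible contrast stretch around the mean intensity.
stretched = gain * (noisy - noisy.mean()) + noisy.mean()

contrast_before = noisy[:, 32:].mean() - noisy[:, :32].mean()
contrast_after = stretched[:, 32:].mean() - stretched[:, :32].mean()
noise_before = noisy[:, :32].std()
noise_after = stretched[:, :32].std()

# Contrast and noise both grow by exactly the gain factor of 2.
print(round(contrast_after / contrast_before, 2),
      round(noise_after / noise_before, 2))  # prints: 2.0 2.0
```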

3.5 Different approaches to objective image quality

Objective image quality measures can be divided into three categories: no-reference, reduced-reference and full-reference measures. The main criterion for classifying objective image quality measures into these classes is the availability of a reference (perfect quality) image. When this image exists, objective quality measures use such a perfect image as a reference for quality assessment: the more an image looks like its reference image, the higher its quality will be valued. These measures are known as full-reference and will be further discussed in Section 3.5.3.

In many practical applications such a reference image is not available. For these instances there exist quality measures that can evaluate image quality "blindly", without a reference. These are known as no-reference quality measures and will be discussed in Section 3.5.1. Although human observers can efficiently assess the quality of images using no reference image at all, this turns out to be a very difficult task to perform objectively. A probable reason for this is that the human brain holds a lot of knowledge about what images should, or should not, look like. This resembles the way the third category works, namely reduced-reference, in which the reference image is partially available in the form of a set of extracted features. This is referred to as reduced-reference quality assessment and will be discussed in Section 3.5.2.

3.5.1 No-reference OIQ for X-ray image enhancement

No-reference image quality assessment is one of the most difficult tasks in the field of image analysis. A metric has to assign a quality score to an image without any knowledge of its content or of what the image should look like. Objectively assigning a quality score to an image has been done for several properties of an image. Some interesting papers on this topic are [20, 5, 15]. Within our context, research has also been done on this topic [7, 9]. In these papers, properties like contrast, sharpness and noise are measured, but also histogram features like the kurtosis, energy and entropy. These metrics can only measure one property of an image and are therefore not able to give a score for the total quality. They also work best when performed on a region of interest and not on entire images. It is for example possible to accurately measure noise in regions of the image where no objects are located. However, exactly in these areas it is easy to remove noise. Removing and measuring noise in and along edges is far more difficult than in empty areas of an X-ray image, and this has not been done so far in a no-reference metric. To achieve a total quality measure, these NR metrics would have to be combined into one metric that covers every aspect of quality. The difficulty with this approach is deciding how important every property is compared to the others in the total quality score. This will also differ for every taste in quality. These reasons make it a very challenging approach, and we therefore continue looking for an alternative.

3.5.2 Reduced-reference OIQ for X-ray image enhancement

Quoting Zhou Wang and Alan C. Bovik [18] we can note that: "Reduced-Reference methods were first proposed as a means to track the degree of visual quality degradation of video data transmitted through complex communication networks. The framework for deployment of RR image quality assessment systems is shown in Figure 3.9. It includes a feature extraction process at the sender side and a feature extraction/comparison process at the receiver side. The extracted features describing the reference image are transmitted to the receiver as side information through an ancillary channel. The feature extraction method at the receiver side may be adjusted according to the extracted features at the sender side (shown as the dashed arrow). An important parameter in an RR system is the bandwidth available for transmitting the side information. The RR system must select the most effective and efficient features to optimize image quality prediction accuracy under the constraint of the available bandwidth."

The wavelet transform provides a convenient framework for localized representation of signals simultaneously in space and frequency. In recent years a number of Natural Scene Statistics (NSS) models have been developed in the wavelet transform domain. These models also use a reduced reference, but in a different way. They are based on the assumption that all "good" images contain certain similar features in their wavelet decomposition. These features are extracted from a set of "good" natural images and then compared with any image. The more these features overlap, the better the quality will be graded.


Figure 3.9: Deployment of reduced-reference image quality assessment system.

In [17] an RR image quality assessment method based on an NSS model in the wavelet transform domain is proposed. The Matlab code has also been made available.

An idea could be to adjust this model to work on X-ray images. Research would be needed to determine whether high quality X-ray images contain similar characteristic features. The major problem with this idea, though, is obtaining such a set of high quality X-ray images. The best ones available are the ones we are processing, and these cannot be used because they happen to be the images whose quality we want to determine. This is why, at this stage, we will not look further into RR based methods. There is, however, one full-reference metric that will be tested, VIF, which is partly based on an NSS model.

3.5.3 Full-reference OIQ for X-ray image enhancement

Full-reference IQ measures available in the literature were first proposed to measure the image quality loss when compressing images, for example when using JPEG compression. After the compression the processed image is compared with the original image. See Figure 3.10. The quality is then graded by how similar the compressed image has remained to the original image. However, an image of "perfect" quality is required for full-reference metrics and is obviously not available. In order to use full-reference IQ metrics for assessing the quality of X-ray images we have to change the context of the full-reference model. How this is achieved will be discussed in Chapter 6. Two metrics, SSIM and VIF, are chosen to be evaluated. These metrics have performed better than most other available full-reference metrics in previous studies [2, 13, 16]. Unlike the no-reference model, the full-reference model can always be performed on the entire image. It also has a clear objective of how an image should look, which can easily be adjusted for different tastes. This is a great advantage over the no-reference and reduced-reference models.

A research project that has just been performed in our context has created a model that uses Shannon's theory as a quality metric [4]. The Information Capacity (IC) of an image is calculated, and the more information an image contains, the better the quality is assumed to be. The way this is calculated resembles the full-reference model. A noiseless version of an image is created. Instead of using this noiseless version of the image as a reference, it is used to separate the noise from the signal by subtracting it from the original image. The signal and noise are then separately processed and the Information Capacity is calculated.

This approach is also tested, though it cannot be used in all the same tests as the full-reference metrics. More details about this metric are found in Section 4.2, and how we are going to use this metric is explained in Section 5.4.


Figure 3.10: Deployment of full-reference image quality assessment system

3.6 Image processing enhancement algorithms

In this section, two of the IP algorithms designed and used by Philips are explained. This will be done briefly because the details are restricted to Philips employees only.

3.6.1 Adaptive Temporal Recursive filter

The main idea behind the Adaptive Temporal Recursive filter (ATR) is to recursively integrate pixels (gray levels) along the time axis, with an integration factor depending on the temporal discontinuity observed between consecutive samples. This makes the filter adaptive to motion and guarantees preservation of moving details. The threshold above which ATR assumes that an observed discontinuity is no longer caused by noise but by movement of the patient or camera is controlled by the Cut variable. The higher the Cut parameter is set, the larger a pixel value change must be to be seen as movement.
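The recursion can be sketched as follows. This is an illustrative toy version, not Philips' actual ATR: the hard threshold, the parameter names `cut`, `alpha_static` and `alpha_motion`, and the two fixed integration factors are all assumptions made for the example.

```python
import numpy as np

def atr_filter(frames, cut=20.0, alpha_static=0.1, alpha_motion=1.0):
    """Toy adaptive temporal recursive filter (illustrative only).

    Small frame-to-frame differences are treated as noise and averaged
    away; differences above the Cut threshold are treated as motion, so
    the integration factor jumps and moving details are preserved.
    """
    out = frames[0].astype(float)
    for frame in frames[1:]:
        diff = np.abs(frame - out)
        alpha = np.where(diff > cut, alpha_motion, alpha_static)
        out = (1.0 - alpha) * out + alpha * frame
    return out
```

For a static scene every discontinuity stays below Cut, so the filter behaves like plain temporal averaging and simply reduces noise.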

3.6.2 Cosmix

Cosmix is a steerable spatial filter that blurs in uniform noisy regions and that enhances parts of the image that contain contrast. So Cosmix is a combination of two filters. One isotropic filter is applied in uniform noisy regions; this filter basically removes noise by blurring. The other filter is applied in regions that contain contrast and performs directed (anisotropic) filtering. This filter is used when a large change in contrast is located. The direction of this contrast change is determined and the filter is directed perpendicular to the contour (edge). This blurs only in one direction and at the same time enhances any transition between objects. The parameter we used for our experiments is the Alpha_Mean_Fact parameter. This parameter controls the balance between isotropic and anisotropic filtering. In other words, it decides from what point a contrast change is visible strongly enough in the signal to switch to directed filtering. More details can be found in [8].
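A very rough sketch of the blending idea, with heavy caveats: Cosmix itself is proprietary, its anisotropic (edge-directed) branch is omitted here, and the way `alpha_mean_fact` enters is an assumption for illustration only. The sketch merely shows how a gradient-based weight can steer between blurring flat regions and preserving contrast.

```python
import numpy as np

def box_blur3(img):
    """3x3 box blur with edge padding (the isotropic smoothing part)."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def cosmix_like(img, alpha_mean_fact=0.5):
    """Gradient-steered blend between blurring and preserving (sketch).

    Flat, noisy regions (low gradient) are blurred; regions with strong
    contrast keep their original content. The real Cosmix instead applies
    an anisotropic filter along edges; that branch is not modeled here.
    """
    img = img.astype(float)
    gy, gx = np.gradient(img)
    g = np.hypot(gx, gy)
    g = g / (g.max() + 1e-9)                      # normalized local contrast
    w = np.clip(g / max(alpha_mean_fact, 1e-9), 0.0, 1.0)
    return w * img + (1.0 - w) * box_blur3(img)   # w=0 -> blur, w=1 -> keep
```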

3.7 Distortions

During the experiments in Chapter 6 we need to be able to add artificial distortions to images. The distortions must resemble real distortions. The following section explains how the two main distortions, X-ray noise and blurring, are created.


3.7.1 Adding artificial noise

Generating noise that resembles real X-ray noise is easier said than done. New images made by an X-ray machine are automatically processed with an FDC LUT and a variance stabilization transform. Obtaining images before these operations is not possible, so they will have to be reversed in order to see the original X-ray noise. Claire Levrier (Philips Research) recently made a tool to add intensity-dependent noise. This tool works according to the following steps:

• Invert the FDC LUT

• Apply the variance stabilization transform

• Add a Gaussian colored noise (intensity independent sigma)

• Invert the variance stabilization transform

• Re-apply the FDC LUT

The way this noise is constructed is based on [6]. Using this tool it is possible to create noisy test images with different standard deviations of the noise.
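The pipeline above can be sketched as follows. Since the FDC LUT and the variance stabilization transform are Philips internals, this example treats the LUT as identity and substitutes a square-root (Anscombe-like) transform as a stand-in stabilizer, which still produces the intended intensity-dependent noise; the noise here is white rather than colored.

```python
import numpy as np

def add_xray_like_noise(img, sigma, rng=None):
    """Intensity-dependent noise injection following the listed steps.

    Stand-ins: FDC LUT = identity, variance stabilization = square root.
    Adding constant-sigma Gaussian noise in the stabilized domain and
    inverting yields noise whose strength grows with local intensity.
    """
    if rng is None:
        rng = np.random.default_rng()
    stab = np.sqrt(np.clip(img, 0, None))            # forward stabilization
    stab = stab + rng.normal(0, sigma, img.shape)    # intensity-independent sigma
    return np.clip(stab, 0, None) ** 2               # invert stabilization
```

After inversion the local noise standard deviation is roughly proportional to the square root of the pixel value, as expected for quantum-limited X-ray noise.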

3.7.2 Blur filtering

Blurring is achieved by use of a spatial filter with a Gaussian filter kernel. For every pixel it averages that pixel with its neighbors. How many pixels it uses is decided by the filter window size. The distribution of the weights over the pixels is decided by the sigma. In our experiments the sigma is always half of the blur filter window size. An example is given in Figure 3.11.

Figure 3.11: Blurring filter with a window size of 10 and a sigma of 5
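Such a kernel can be constructed directly; the function below is a minimal sketch of the construction described above (weights sampled from a 2-D Gaussian, normalized so the window sums to one).

```python
import numpy as np

def gaussian_kernel(window, sigma):
    """window x window grid of Gaussian weights, normalized to sum to 1."""
    ax = np.arange(window) - (window - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

# As in the experiments: sigma is always half the window size.
kernel = gaussian_kernel(10, 5.0)
```

Convolving an image with this kernel performs the blurring; the larger the window (and hence sigma), the stronger the blur.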


Chapter 4

Image quality metrics

After studying the literature, two promising full-reference metrics were chosen to be tested: SSIM and VIF. They will be further elaborated in Sections 4.1.1 and 4.1.2.

An interesting study has recently been done here at Philips looking into Shannon's theory in order to do quality assessment on X-ray images [4]. This method does not follow the full-reference model but can be tested in the same situations as the full-reference methods. This metric will be explained in Section 4.2.

4.1 Metrics for full-reference image quality assessment

The following metrics have been chosen because they have been found promising in the literature: SSIM [2, 13, 16] and VIF [13]. Full-reference metrics are designed to measure the distance between the two images they receive as input. The output is a score that expresses how much these two images look alike. When the two input images are exactly the same, the output will have the value 1. The more the input images differ from each other, the lower the score will be within the domain [0, 1]. So these full-reference methods can actually be seen as image comparison methods. How they calculate this distance is different for every metric.

4.1.1 Structural Similarity Index

This method, presented in [2], is based on comparing the structures of the reference and the test image and is called the Structural Similarity Index (SSIM).

SSIM is divided into three parts that together give a quality score of a test image compared to its reference image. The three different measures involved are the luminance, contrast and structure comparison measures. They are calculated as follows:

Let x and y be two non-negative signals corresponding to the reference and distorted images, and let $\mu_x$, $\mu_y$, $\sigma_x^2$, $\sigma_y^2$ and $\sigma_{xy}$ be the mean of x, the mean of y, the variance of x, the variance of y, and the covariance of x and y, respectively. Here the mean and the standard deviation (square root of the variance) of a signal are roughly considered as estimates of the luminance and the contrast of the signal.

$$l(x,y) = \frac{2\mu_x\mu_y}{\mu_x^2+\mu_y^2}, \qquad c(x,y) = \frac{2\sigma_x\sigma_y}{\sigma_x^2+\sigma_y^2}, \qquad s(x,y) = \frac{\sigma_{xy}}{\sigma_x\sigma_y}$$

Two constants, $C_1$ and $C_2$, are added to prevent unstable measurements when $\mu_x^2+\mu_y^2$ or $\sigma_x^2+\sigma_y^2$ is close to zero.

When these three measures are combined with equal weights we obtain the following formula [2]:


$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)}$$

The constants are given by $C_1 = (K_1 L)^2$ and $C_2 = (K_2 L)^2$, where $L$ is the dynamic range of the pixel values ($L = 2^{\#\mathrm{bits}} - 1$) and $K_1$ and $K_2$ are the same as in [3]: $K_1 = 0.01$ and $K_2 = 0.03$.

The SSIM indexing algorithm is applied for quality assessment of still images using a sliding window approach. The window size of the implementation we used in our experiments is 8x8. The SSIM indices are calculated within the sliding window, which moves pixel by pixel from the top-left to the bottom-right corner of the image. This results in an SSIM index map of the image, which is also considered as the quality map of the distorted image being evaluated. The overall quality value is defined as the average of the quality map: the mean SSIM (MSSIM) index.
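A minimal sketch of the sliding-window computation, assuming a uniform 8x8 window; the published SSIM implementation uses an 11x11 Gaussian-weighted window instead, so the scores will differ slightly from the reference code.

```python
import numpy as np

def mssim(x, y, K1=0.01, K2=0.03, L=255, win=8):
    """Mean SSIM over a uniform win x win sliding window (sketch).

    Moves pixel by pixel, evaluates the SSIM formula per window, and
    averages the resulting quality map into a single MSSIM score.
    """
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    x = x.astype(float); y = y.astype(float)
    h, w = x.shape
    scores = []
    for i in range(h - win + 1):
        for j in range(w - win + 1):
            a = x[i:i + win, j:j + win]
            b = y[i:i + win, j:j + win]
            mu_a, mu_b = a.mean(), b.mean()
            va = ((a - mu_a) ** 2).mean()
            vb = ((b - mu_b) ** 2).mean()
            cov = ((a - mu_a) * (b - mu_b)).mean()
            scores.append((2 * mu_a * mu_b + C1) * (2 * cov + C2) /
                          ((mu_a ** 2 + mu_b ** 2 + C1) * (va + vb + C2)))
    return float(np.mean(scores))
```

Identical images score 1; distorted images score lower, which is the behavior exploited in the experiments of Chapter 6.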

4.1.2 Visual Information Fidelity

This metric is based on an information-theoretic criterion for image fidelity using Natural Scene Statistics (NSS) and is called Visual Information Fidelity (VIF).

Images and videos of the three-dimensional visual environment come from a common class: the class of natural scenes. Natural scenes form a tiny subspace in the space of all possible signals, and researchers have developed sophisticated models to characterize these statistics. Most real-world distortion processes disturb these statistics and make the image or video signals unnatural.

[12]

In [14], Hamid R. Sheikh and Alan C. Bovik proposed using natural scene models in conjunction with distortion models to quantify the Shannon information shared between the test and the reference images. They also showed that this shared image information is an aspect of fidelity that correlates well with visual quality. In their follow-up paper [12] they also quantify the information content of the reference image, since perceptual quality is likely to vary with the relative information loss. They propose a unified information fidelity criterion based on NSS, distortion and HVS modeling, called VIF.

More details about this metric can be found in [12].

4.2 Information capacity by Shannon

Figure 4.1: Scheme of the Information Capacity metric

The work of Shannon [10, 11] has provided a framework for the assessment of digital communication. The Shannon entropy, as a description of the amount of uncertainty or surprise contained in a signal, can be expressed in terms of the underlying distribution $p(x_i)$ of the random variable $X$ by:

$$H(X) = -\sum_{i=1}^{n} p(x_i) \cdot \log(p(x_i)) \qquad (4.1)$$

where $n$ is the number of states that the (discrete) signal can assume. In the context of our full-reference scheme, an important concept is the Mutual Information (MI) between an input distribution $X$ and an output $Y$, which describes the amount of information $Y$ conveys about $X$:

$$MI(X;Y) = H(X) + H(Y) - H(X,Y) = H(Y) - H(Y|X) = \sum_{X,Y} p(x,y)\,\log_2\frac{p(x,y)}{p(x)\cdot p(y)} \qquad (4.2)$$
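Equations 4.1 and 4.2 can be approximated in practice from image histograms (a common approximation, with the limitations discussed below). A sketch, with the bin counts and ranges chosen arbitrarily for the example:

```python
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy (Eq. 4.1) from the image histogram, in bits."""
    p, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = p / p.sum()
    p = p[p > 0]                      # 0 * log 0 terms are dropped
    return float(-np.sum(p * np.log2(p)))

def mutual_information(x, y, bins=64):
    """MI (Eq. 4.2) approximated via the joint histogram of two images."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))
```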

In terms of imaging, this variable can be regarded as the extent to which a degraded image reflects the information present in a reference image. Its calculation for a given pair of images requires knowledge of the underlying distribution of input X and output Y, as well as their joint distribution, which in practice is not possible as only realizations of the distributions (images) are available. Approximating the distribution through the image histogram is commonly applied and provides a good solution, e.g. when the goal is to maximize MI for the purpose of registration. However, it leads to a poor approximation of the true value of entropy, since the pixel values in an image have a strong local dependence and conditioning a pixel on its neighborhood drastically reduces the entropy. Ideally, one should calculate the conditional entropy of each pixel given an infinite neighborhood, as shown by Ignatenko et al. [3]. In practice, a small neighborhood is sufficient for a reasonable approximation, if a Gaussian model is assumed for the dependence of a pixel on its neighbors. Unfortunately, even approximations using a small neighborhood lead to problems of very high dimensionality and thus computational cost. It is possible to bypass this limitation if we consider another aspect of Shannon's theory. In the context of coding, MI gives the rate of information transfer through a communication channel. The maximum of this rate is equal to the channel capacity, for which Shannon gave an elegant definition. For the case of a one-dimensional, zero-mean signal of bandwidth B, which is corrupted by independent additive white Gaussian noise, the maximum information capacity of the channel $C_{\mathrm{Shannon}}$ can be written in the simple form:

$$C_{\mathrm{Shannon}} = \max\left\{ H(Y) - H(Y|X) \right\} = B \cdot \log_2\left(1 + \left(\frac{S}{N}\right)^2\right) \qquad (4.3)$$

where $\left(\frac{S}{N}\right)^2$ is the signal-to-noise ratio, and the maximum is over all power-constrained input densities. Here the channel bandwidth is measured in Hz and $C_{\mathrm{Shannon}}$ is expressed in bits per second (bps). Shannon's noisy channel coding theorem can be generalized for two-dimensional signals where both signal and noise are zero-mean and bandwidth-limited and the noise is additive, Gaussian and possibly colored. In this case, the information capacity ($C_0$) per unit area can be calculated as:

$$C_0 = \int_{-f_{Nx}}^{f_{Nx}} \int_{-f_{Ny}}^{f_{Ny}} \log_2\left(1 + \frac{P(f_x, f_y)}{N(f_x, f_y)}\right) df_x\, df_y \qquad (4.4)$$

where $P(f_x, f_y)$ and $N(f_x, f_y)$ are the signal and noise power spectra, respectively, and the integration takes place up to the Nyquist frequency $f_N = (f_{Nx}, f_{Ny})$. In this calculation, the zero-frequency components of the spectra are discarded since the information is contained in the difference between the signal and the background, corresponding with the zero-mean demand in Equation 4.3. The unit of the result is bits per pixel (bpp) or bits per square millimeter (b/mm2), depending on the spatial frequency units. This information capacity expresses the maximum amount of information per unit area that can be transmitted for a given signal and noise power, and it is upper-bounded by the entropy of the source producing the signal. In this sense, it can be considered as a measure of the information content of a specific image at a given noise level. The contrast of the image is reflected in the signal power spectrum, where the signal is defined as a modulation over the background. Loss of sharpness is also incorporated in the signal power spectrum, since it follows the transfer characteristics of the imaging system. In this way, sharpness, contrast and noise are all reflected in this expression, which generalizes our further experiments. To calculate $C_0$ as defined in Equation 4.4, it is first necessary to separate noise from signal. In our full-reference scheme, the signal term is represented by the noiseless reference, which is in practice obtained by averaging a large sequence of still images in order to decrease the noise to a negligible level. This is different from assuming a perfect (uncorrupted) image, which is also perfectly sharp. In the case of independent additive noise, subtracting this reference from the noisy (test) image provides the noise term in the denominator. In radiography, however, the assumption of independent noise does not hold, as will be elaborated in the next section.
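A discrete sketch of this computation on a stack of frames of a static scene, using the frame average as the noiseless signal and averaged FFT periodograms as spectrum estimates; the actual implementation by Papalazarou [4] may differ in spectral estimation and normalization, so the absolute numbers here are only illustrative.

```python
import numpy as np

def information_capacity(frames):
    """Sketch of the IC metric (Eq. 4.4) on a stack of same-scene frames.

    The frame average serves as the (almost) noiseless signal; subtracting
    it from each frame isolates the noise. The DC term is discarded and
    log2(1 + P/N) is summed over the discrete frequency grid.
    """
    frames = frames.astype(float)
    signal = frames.mean(axis=0)                    # noiseless estimate
    noise = frames - signal                         # per-frame noise realizations
    P = np.abs(np.fft.fft2(signal - signal.mean())) ** 2
    N = np.mean(np.abs(np.fft.fft2(noise, axes=(1, 2))) ** 2, axis=0)
    mask = np.ones_like(P, dtype=bool)
    mask[0, 0] = False                              # drop zero-frequency component
    return float(np.sum(np.log2(1.0 + P[mask] / (N[mask] + 1e-12))))
```

As expected from the formula, increasing the noise level lowers the capacity for the same underlying scene.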

This text and the implementation of the metric were made by Chrysi Papalazarou [4].


Chapter 5

Experimental design

This chapter describes the experimental setting and the processing of the experimental results.

5.1 Adjusting the full-reference model

Our objective is to use full-reference metrics to evaluate the performance of image enhancement algorithms. This means we want to be able to compare the effectiveness of IP algorithms with each other to see which performs best. This can be either different IP algorithms or the same algorithm with different parameter settings (parameter tuning).

The full-reference methods (SSIM, VIF) are designed to measure quality degradation, as explained in Section 3.5.3. They use the original image as a reference and compare how similar the processed image has remained to its original. But since we are enhancing images and not reducing image quality by compression, no original image is available to be used as reference. We might still be able to use these metrics when we adjust the full-reference model in the following way. We can take any image and degrade it such that we have two images of the same situation that are of different quality: the test image (low quality) and the reference image (high quality). How to obtain these images is described in Section 5.2. We then perform different IP on the test image. These processed test images are compared with the reference image using the full-reference metrics. This process is displayed in Figure 5.1. The output values tell us which processed image looks most like the reference image (the image we started out with). We can now say that the IP for which the full-reference methods give the highest similarity value has improved the test image the most. So we can use this similarity value as our quality grading. However, the fact that the reference image is not perfect does have consequences.

Figure 5.1: Deployment of the full-reference image quality assessment system for comparing different IP algorithms. The different IPs can be different algorithms or the same algorithm with different parameter settings.


Consequences of using an imperfect reference image

When we are improving our test image using IP, the distance to the reference image will become smaller. When we have improved our test image so much that it has reached the same quality as the reference image, the similarity measured by the full-reference methods will approach the value 1. But when the test image is improved even further, such that it becomes of higher quality than the reference image, the distance between the images will start to grow again. The full-reference methods still assume that the reference image is the "perfect" image and will therefore grade any further improvement negatively. When this happens we are making a serious assessment error. This reasoning leads to the following conclusion:

When we use a reference image that is not perfect, we must be 100% sure that our IP will never be able to improve the test image to a higher quality than the reference image.

Another way to look at this problem of using an imperfect reference image is the following: full-reference metrics calculate the similarity between two images. They simply measure and add up the differences between two images, but cannot judge whether these differences are good or bad. When we want to use this similarity measure as a quality grading, we must make sure that every difference between the two images is either good or bad. When this is the case, all differences can be added up to give the distance between the images without making errors. We further refer to this problem, that full-reference metrics cannot judge whether a difference is good or bad, as the "Judging problem".

As is well known, the quality of an image can be divided into three main properties: contrast, sharpness and noise. When we improve just one of these properties such that it becomes better than in the reference image, we are already increasing the distance and making an error. So the Judging problem must hold for every property. We can now extend the theorem by stating that:

When we use a reference image that is not perfect, we must be 100% sure that our IP will never be able to improve any of the properties contrast, sharpness or noise of the test image to a higher quality than the reference image.

5.2 Constructing reference and test image

As explained before, we need two images, a test and a reference image, of the same situation that are different in quality. With these images we calculate the similarity and use this number as a quality grading. The test image is created by adding distortions to the reference image. So what we need is a reference image of the highest possible quality.

The first idea was to use a high-exposure shot as a reference image. These images are made with a high dose of X-rays and contain low amounts of noise. Another way to create a high-quality image is used by C. Papalazarou in her research [4]. She made sets of 100 X-ray images of a particular phantom and did this for a number of different phantoms. Phantoms are objects made to function as optical tissue equivalents of real human tissue, used to test X-ray machines and perform experiments with them without having to subject someone to harmful X-ray radiation. The phantoms have another advantage: real patient bodies are always moving, which makes it impossible to make a set of 100 images of exactly the same situation. Papalazarou averages a set of 100 images to create a noiseless image. By averaging a set of 100 images of exactly the same situation, the standard deviation of the noise is reduced by a factor of 10. This makes a very interesting reference image for our experiments as well. These sets of images have been made available, and a set of 100 images of a chest phantom is used to create our reference image. See Figure 5.2.
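The factor 10 follows from averaging N independent noise realizations, which divides the noise standard deviation by sqrt(N). A quick numerical check (the scene value and noise level below are arbitrary):

```python
import numpy as np

# Averaging N frames of a static scene divides the noise standard
# deviation by sqrt(N); for the N = 100 phantom images that is a factor 10.
rng = np.random.default_rng(0)
scene = np.full((128, 128), 500.0)
frames = scene + rng.normal(0, 8.0, (100, 128, 128))
reference = frames.mean(axis=0)
single_std = np.std(frames[0] - scene)    # close to 8
averaged_std = np.std(reference - scene)  # close to 0.8
```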

The disadvantage of using this reference image is that it is only of high quality in noise level. The other image quality aspects, contrast and sharpness, are of normal quality. As explained in


Figure 5.2: Creating a reference image by averaging

the previous section, we can only use this reference image to measure differences in noise. It is not suitable to measure changes in contrast or sharpness because of the Judging problem mentioned above.

Now that we have an almost noiseless reference image, we want to know whether it is suitable for noise reduction experiments, so we can continue with the creation of matching test images. We can use one of the 100 images used to calculate the reference as a test image, but these all have a fixed amount of noise. For some experiments we need test images with varying kinds of noise, so we have to be able to generate artificial noise. How this can be done is explained in Section 3.7.1.

We now have an almost noiseless reference image and we can make test images with varying strengths of noise. So we can start experimenting with these images.

5.3 Design of experiments for full-reference metrics

The first experiment that is performed tests how the full-reference metrics respond to basic distortions. Noise and blurring are added to an image to measure what happens to the quality scores. This experiment can be found in Section 6.1.1. From this test we conclude that both SSIM and VIF respond well to noise and blur distortions. In both cases the quality drops when the distortions are added with increasing strength.

Next we test an image enhancement algorithm in our full-reference model. Since our test and reference images differ only in noise, we must consider the Judging problem. Any change to the image except noise reduction will be seen as a negative change. So we are looking for an algorithm to experiment with that only performs noise reduction. For a static scene the ATR algorithm does this, which makes it ideal for testing just noise reduction. More details about this algorithm can be found in Section 3.6.1. We tested ATR on several test images containing different amounts of noise. We chose one parameter in ATR, the Cut parameter, and tried to tune this parameter using our full-reference metrics. This is done by processing the same test images with different settings for this parameter. Then the quality of the processed images is graded using the full-reference metrics to see which parameter setting results in the highest quality score. This scheme is shown in Figure 5.1. More details about this test can be found in Section 6.1.2. The result of this experiment is that the optimal value found for the Cut parameter is always 5, which is the parameter's maximum. The conclusion of this test is that the ATR algorithm is not a suitable algorithm to experiment with. ATR is specifically designed to do noise reduction in a sequence of images that contain movement within the sequence. But we used the 100 images of a stationary phantom to experiment with. Experimenting with movement is not possible in the full-reference model since we use a static reference image. It turns out that the Cut parameter within ATR becomes superfluous when there is no movement in the sequence processed by ATR. So another algorithm is needed to test whether the reference and test images, which differ in the amounts of noise they contain, are usable to measure noise reduction.
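The tuning procedure of Figure 5.1 amounts to a parameter sweep graded by a full-reference score. A generic sketch, where `process` and `metric` are placeholders for an IP algorithm and a full-reference metric (the toy `process` and negative-MSE `metric` in the usage below are purely illustrative):

```python
import numpy as np

def tune_parameter(test_img, reference, process, metric, settings):
    """Grade each parameter setting by full-reference similarity.

    process(img, s): any IP algorithm run with setting s.
    metric(ref, img): any full-reference score, higher = more similar.
    Returns the best setting and the full score table.
    """
    scores = {s: metric(reference, process(test_img, s)) for s in settings}
    return max(scores, key=scores.get), scores
```

Usage with stand-ins: `tune_parameter(noisy, clean, lambda img, s: img * (1 - s), lambda a, b: -np.mean((a - b) ** 2), [0.0, 0.5, 1.0])` sweeps a toy "noise reduction" strength and returns the setting with the highest score.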

Another noise reduction algorithm that is used is Cosmix. The problem with Cosmix, however, is that it does more than just noise reduction, and these functions cannot be separated or turned off. The filter that is used to reduce the noise automatically performs edge enhancement when it encounters an edge. Edge enhancement improves sharpness, and since the test and reference images that we want to compare differ only in noise, we encounter the Judging problem. This means that when the sharpness is improved it will be graded negatively, because it is seen as a greater distance to the reference image. To counteract this problem we perform Cosmix on both the test and the reference image. This way any change besides noise reduction is done on both images and the full-reference methods will not see these changes. The only measurable changes are the ones affecting noise: the noise in the test image will be reduced and almost nothing will change in the reference image because it contains hardly any noise. This test is described in Section 6.1.3. This experiment resulted in finding optimal values at the maximum of the Alpha_Mean_Fact parameter setting. So again the parameter becomes superfluous, but this time for a different reason, namely that we process both the test and the reference image with the same IP. This goes wrong because the Alpha_Mean_Fact indirectly influences the balance between noise and sharpness. A parameter always decides a balance between two effects, where the optimal setting is different for different inputs. When we measure just the noise reduction of a parameter that is designed to control the balance between noise reduction and sharpness loss, the sharpness loss is neglected. And when this happens the parameter setting will always be at the point where the maximum amount of noise and sharpness is removed. To verify these statements we repeated this experiment but replaced Cosmix with blurring. Since blurring is a very simple operation and the most basic noise reduction algorithm, it is suitable to verify the results found with Cosmix.
This is done to make sure that no other parts of Cosmix could have caused the results we found. This experiment is found in Section 6.1.4. The results are the same as when using Cosmix: the optimal values are always located at the maximum for the parameter, in this case the amount of blurring. So this verifies the Cosmix results and we can conclude that processing both images with the same IP is not going to provide useful information when tuning parameters.

We now go back to our old setup where we process just the test image. We have to find another way to make sure that any sharpness enhancement is seen as a positive change. This is done by degrading the sharpness of the test image. When we now improve the sharpness of the test image, it will resemble the reference image more, as long as it does not exceed it. Degrading the sharpness is achieved by blurring. We again try to tune the Alpha_Mean_Fact parameter in Cosmix in the experiment in Section 6.1.5. This time we use a test image that has more noise and less sharpness than the reference image, and we process just the test image with IP (see the scheme in Figure 6.11). Now we do find optimal values for the Alpha_Mean_Fact parameter for both SSIM and VIF. So the new setup enables us to find optimal values for parameters using full-reference metrics like SSIM and VIF. We repeat the noise-reduction-by-blur experiment, again processing only the test image, to confirm the results. This experiment is described in Section 6.1.6 and the results agree with our conclusions about processing just the test image.

5.4 Model for Shannon's information capacity metric

Shannon's Information Capacity (IC) metric [4] works differently from the full-reference metrics. The IC metric needs a series of images of the same situation, with no object movement, to function. Papalazarou made a set of one hundred images of a chest phantom, which the full-reference model used to create a reference image, as seen in Figure 5.2. IC differs in that it does not calculate similarity between a test and a reference image. It uses the average of the set of 100 images not as a reference but as a noiseless version of the images. IC subtracts the noiseless image from each of the 100 images to obtain only the noise in each image. This way the noise is separated from the signal. The noiseless image is now seen as the signal in the original images. A Fourier transform is then applied to the signal and the noise images, which is used to calculate a signal-to-noise ratio. IC is based on the idea that the greater the signal-to-noise ratio, the more information is in the image and the higher the quality of the images.


In order to experiment with IC we process all of the 100 phantom images with IP and calculate the new quality score. We repeat this several times with different parameter settings of the IP. The quality for each of the sets of processed images can be calculated and compared to measure which parameter setting performed best.

5.5 Design of experiments for Shannon's information capacity metric

First some basic experiments are done with IC; see Section 6.2.1. We add blur and noise like we did with the full-reference metrics to see how IC responds to basic distortions. Adding noise, as expected, results in a quality loss. The images used here contain a normal amount of noise, unlike the reference image used in the full-reference model. So the expected result when adding blur would be that blurring in small amounts works as a noise reduction algorithm and slightly improves the image, while large amounts of blurring decrease the quality of an image, as in Figure 6.16. Adding blur, however, improves the quality according to IC; see Figure 6.20. This might be explained by the fact that Shannon's IC calculates a signal-to-noise ratio (Formulas 4.3 and 4.4). When the image is blurred, both the signal and the noise strength decrease. Apparently the noise strength is reduced more, or weighted more strongly, than the signal strength.

In the next experiment, 6.2.2, the Alpha_Mean_Fact parameter in Cosmix is tested with IC. Optimal values are found, located very close to some of the standard values in the Cosmix model. These standard values represent the preference of the makers of the algorithm. This is a good result for IC and shows its potential for finding optimal parameter values.


Chapter 6

Numerical experiments

In this chapter, a description is given of all the experiments that were performed. Every experiment has the same layout: distinct sections describe the objective, setup, results and conclusions of that particular experiment. Section 6.1 describes the experiments with the full-reference methods; Section 6.2 describes the experiments with Shannon's Information Capacity.

6.1 Experiments testing full-reference metrics

The experiments in this section are performed to test the full-reference methods.

6.1.1 Noise and blur addition tests

The first and simplest test is to add distortions to an image and then compare the newly obtained processed image with the original.

Objective

We have added noise and blur to the reference image to test if the different full-reference metrics respond correctly to these distortions. We expect the quality grading to decrease when we add noise or blur to the image.

Setup

The scheme in Figure 6.1 shows how the experiments are performed. We start with the average of

Figure 6.1: Full-Reference scheme for distortion test

the set of 100 images that we will use as the reference image. The test image is created by adding noise or blur to the reference image. Noise and blur are added as described in Section 3.7. Both images are provided to the metrics as input. This is done numerous times with increasing strength of the distortions. For the noise test, 100 different noise standard deviations are tested, ranging from 6 to 106. Blurring is performed as described in Section 3.7.2. For the window size parameter, the odd


numbers between 1 and 39 are used. Only the odd numbers are used because even and odd blur filter window sizes produce differently shaped filters; see Figure 6.2. You can see that the even numbers make up a filter where the pixel that is calculated is central and has the most weight. With the odd numbers, the pixel that is calculated has less weight, and therefore we choose to use just the odd or just the even numbers.

Figure 6.2: Odd and even blur filter window size shapes
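The blurring operation can be sketched as a separable Gaussian filter with an odd, centred window. This is an illustrative reconstruction of the filter described in Section 3.7.2, not the exact implementation; the kernel normalisation and border handling below are assumptions.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """1-D Gaussian kernel of odd length `size`, normalised to sum to 1,
    so the computed pixel sits at the centre of the window."""
    assert size % 2 == 1, "use odd sizes so the window has a centre pixel"
    x = np.arange(size) - size // 2
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(image, size, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    k = gaussian_kernel(size, sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)
    return out
```

Because the kernel sums to one, flat image regions keep their grey value; only the variations (edges and noise) are smoothed.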

Results noise addition test

The results of adding noise are given in Figure 6.3. They verify our expectations and show

Figure 6.3: Basic noise test results of SSIM and VIF

that the quality scores drop when the standard deviation of the noise is increased. This is the case for both SSIM and VIF. However, the rates at which the values drop are slightly different. As can be observed from this figure, VIF appears to penalize the added noise more strongly than SSIM.

Results blur addition test

The results of the basic full-reference blur test are given in Figure 6.4. As we can see, both SSIM and VIF rate the blurring (sharpness distortion) negatively: when we increase the blur filter window size and sigma, the quality of the image decreases. The VIF method shows some bumps when the filter window size becomes larger than 21. More knowledge about the exact functioning of VIF is needed to explain this; all we know is that VIF uses Natural Scene Statistics. A plausible reason could be that a blurred image shares a certain wavelet statistic with images of natural scenes, and that this overlap becomes stronger when the image is heavily blurred. Fortunately, these bumps occur only at high amounts of blurring. Such heavy blurring is not natural and will not be needed in further experiments.

Conclusions

Both metrics correctly measure the differences in noise and blurring. The quality score plots are also very smooth, which implies that the metrics can accurately resolve the step sizes used to increase the distortions.


Figure 6.4: Basic blur addition test results of SSIM and VIF

6.1.2 ATR Cut parameter test

The ATR algorithm is very suitable for a noise reduction test because it does not affect sharpness or contrast for stationary scenes. More detail about the ATR algorithm can be found in Section 3.6.1.

Objective

Find an optimal value of the Cut parameter in the ATR algorithm.

Setup

This test is performed according to the scheme in Figure 6.5. The reference image used is the

Figure 6.5: ATR Cut parameter test scheme

average of the series of 100 images. Since ATR is a temporal filter, it needs a set of frames to work. Normally 20 frames should be sufficient, but since we are using abnormal settings for the Cut parameter, 40 frames are used. The test images are the first 40 frames of the series of images used to calculate the reference. The noise in these 40 frames is increased until it has a standard deviation of 50. These 40 images are then given as input to the ATR algorithm, which returns the same set with processed images as output. The last image of the output is used as input for our full-reference metrics. This setup was repeated for different settings of the "Cut" parameter in ATR: 10 different values from 0.5 to 5, in steps of 0.5.

Results

The results are shown in Figure 6.6. As we can see, the quality of the images goes up when the Cut parameter is increased. The shapes of both metric curves are also almost identical, so they agree with each


Figure 6.6: Results of ATR experiment

other in this experiment. Both metrics grade a Cut setting of 5 with the highest quality score. When there is no movement, averaging over as many frames as possible gives the most noise reduction. When the Cut value of ATR is high, the new frame receives a low weight in the newly calculated average. This way, more frames influence the output, and the averaging effectively spans more frames. It is therefore no surprise that the optimal Cut value is the maximum possible value.
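The weighting effect just described can be illustrated with a toy recursive filter. The real ATR algorithm is proprietary and far more advanced; the mapping from the Cut parameter to a frame weight below is purely hypothetical and serves only to show why a higher Cut averages over more frames.

```python
import numpy as np

def temporal_average(frames, cut):
    """Toy recursive temporal filter. `alpha` is a hypothetical mapping
    from the Cut parameter to the weight of the incoming frame: a higher
    `cut` gives the new frame a smaller weight, so the running average
    effectively spans more frames."""
    alpha = 1.0 / (1.0 + cut)        # assumed mapping, for illustration only
    out = np.asarray(frames[0], dtype=float)
    for f in frames[1:]:
        out = (1.0 - alpha) * out + alpha * np.asarray(f, dtype=float)
    return out
```

For a static scene with additive noise, the residual noise of this filter shrinks as `cut` grows, mirroring the behaviour seen in the experiment.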

Conclusions

Both SSIM and VIF measure a quality improvement when the Cut parameter is increased. Both metrics also find the optimal value of 5, which is the maximum value at which this parameter can be set.

From this we can conclude that leaving out a factor such as movement, which the ATR algorithm is designed for, can cause certain parameters to become superfluous. This probably happened to the Cut parameter in this experiment. Trying to find an optimal value for such a parameter will always result in the maximum or minimum value to which it can be tuned.

6.1.3 Cosmix on both test and reference image

Both the test and the reference image are processed by the same IP to make sure that changes other than those to the noise are applied to both images. This is necessary to test Cosmix, as it does more than noise reduction alone: Cosmix is also programmed to do edge enhancement, which affects the sharpness of the image. To make sure we do not grade an improvement in sharpness incorrectly, as explained in Section 5.1, we process both images with the same IP. This way we know that any changes in sharpness or contrast do not influence the quality score; only changes in noise level will be measured. When we process the test image with Cosmix, a lot of noise is removed, along with some changes to the sharpness and contrast. When we process the reference image with Cosmix, no noise is removed, but similar changes to the sharpness and contrast are made. This way we measure only the difference in noise and can see which parameter setting for Cosmix reduces the noise the most.

Objective

The objective of this experiment is to test whether the setup of processing both the test and the reference image can be used to measure the strength of noise reduction in an algorithm. The test and reference image will differ only in noise.


Setup

This experiment is done according to the scheme in Figure 6.7. The scheme is repeated for different values of the Cosmix parameter Alpha_Mean_Fact, ranging from 0 to 5 in steps of 0.5. As you

Figure 6.7: Full-reference scheme for Cosmix experiment, processing both test and reference image

can see, both images are processed with Cosmix. Cosmix is given the same image twice as input. This is done because, in the real system, Cosmix processes sequences of images: on the first frame, measurements are made that set certain parameters of Cosmix for processing the next frame. When Cosmix starts to process the second frame, it can use the values calculated on the first frame and does not have to use the standard values that are hard-coded into the program. The second frame is therefore processed better, so it is used to test the quality. The Alpha_Mean_Fact variable is explained in Section 3.6.2.

Results

Figure 6.8: Cosmix on test and reference image

As we can see in Figure 6.8, both FR metrics give the highest score for the highest Alpha_Mean_Fact value. These tests have been repeated for different settings (tastes) of the other Cosmix parameters and have given the same results. Performing the same IP on both the test and the reference image finds an optimum at the maximum of the parameter.

Conclusions

When the best setting for a parameter is always at its maximum, the parameter has become superfluous. The question now is: why? Our answer is that parameters exist to balance a setting between its positive and negative effects. In this case we want to balance isotropic (high Alpha_Mean_Fact) and anisotropic (low Alpha_Mean_Fact) filtering.


The positive effect is noise reduction and the negative effect is loss of sharpness. The problem is that when we perform IP on both the test and the reference image, which differ only in noise, we are no longer able to measure the loss of sharpness. We are then measuring only the positive side of the Alpha_Mean_Fact parameter, namely noise reduction. Isotropic filtering reduces more noise but also reduces sharpness. Since we only measure the noise reduction, the maximum value for this parameter is always found.

We can conclude that this test setup (processing both images with the same IP) is not usable for tuning parameters. Using the test set of Chrysi in combination with processing both images with the same IP does enable us to measure noise accurately, but to find optimal parameter settings we must also be able to measure changes to other properties of the image (sharpness and contrast). In this case, tuning the Alpha_Mean_Fact parameter means finding the correct balance between noise reduction and sharpness loss. It is not possible to find this balance when we measure only noise reduction.

6.1.4 Noise reduction by blur experiment (IP on both images)

Objective

This experiment is designed to confirm the conclusion of the previous test (Cosmix with different values of Alpha_Mean_Fact on both images). The same test is performed, but the Cosmix algorithm is replaced by the simplest noise reduction algorithm, namely blurring. When we blur an image, noise is removed, but the sharpness also decreases rapidly. When we blur both the test and the reference image, which differ only in noise, we expect the quality to increase as we increase the amount of blurring.

Setup

Figure 6.9 shows the scheme for this experiment. The same amount of blurring is used on the test

Figure 6.9: Full-reference scheme for Blur experiment, processing both test and reference image

and reference image. Blurring is performed as described in Section 3.7.2.

Results

Figure 6.10 shows the test results of the noise reduction by blur experiment.

Conclusions

The results show that, in this experiment, the quality improves when both images are blurred: the more blurring is done, the higher the quality becomes. This makes sense when we consider full-reference methods as image comparison methods; the more both images are blurred, the more they will resemble each other. We can conclude that the results are consistent with the previous experiment, 6.1.3, and again show that this setup is not able to grade the quality correctly. This setup only measures noise reduction and can therefore not be used to tune parameters.


Figure 6.10: Full-reference Blur experiment results, processing both test and reference image

6.1.5 Cosmix just on test image

Since applying IP to both images did not work out, we go back to the model where we process just the test image. In order to measure changes in sharpness and contrast correctly, we degrade the test image such that improvements in these properties are graded correctly. This allows for a subsequent application of the Cosmix algorithm.

Objective

Find an optimal value for the Alpha_Mean_Fact parameter in Cosmix.

Setup

Figure 6.11 shows the scheme for this experiment. The reference image is the average of the set

Figure 6.11: Full-reference scheme for Cosmix experiment, processing only test image

of 100 phantom images. As the test image, one of the set of 100 images is taken, in this case the 40th. Some blurring is then added to frame 40 to make sure any improvements to sharpness will be graded correctly. This is done with three different values for the blur filter window size, namely 3, 5 and 7. After the blurring, extra noise is added to better see the changes in noise reduction. Two different amounts of noise are also tested with the same blurring, to see what effect this has on the optimal value. This is done for both SSIM and VIF.


Results

Figure 6.12: SSIM: Results Cosmix just on test image & zoomed in at optimal values

Noise  Blur  Alpha_Mean_Fact
20     3     2.375
20     5     2.0
20     7     1.75

Noise  Blur  Alpha_Mean_Fact
20     7     1.75
40     7     2.75

Table 6.1: SSIM: Optimal Alpha_Mean_Fact values

Figure 6.12 shows the results for SSIM and Figure 6.13 those for VIF. Because the differences between the lines are so much bigger than the variations within the lines, we have zoomed in on the optimal value for every line. The dotted lines show which optimum plot belongs to which line. In the main plots you can see the effect of adding noise and blur on the quality score. In the smaller optimum plots you can see which Alpha_Mean_Fact value is located at the optimum. These optima are also shown in Table 6.1 for SSIM and Table 6.2 for VIF.

As we can see, SSIM and VIF both find optimal values for the Alpha_Mean_Fact parameter. These optimal values shift when we add more noise or blur to the test image. When we add more blur to the test image, the optimal Alpha_Mean_Fact value decreases. When we add more noise to the test image, the optimal Alpha_Mean_Fact value increases.

As explained in Section 3.6.2, the Alpha_Mean_Fact value determines the balance between isotropic and anisotropic filtering. To simplify things, we can compare isotropic filtering with blurring, and anisotropic filtering with directed blurring. Directed blurring removes less noise than normal blurring but maintains the edges (sharpness) in the image. When we increase Alpha_Mean_Fact, Cosmix does more isotropic filtering and less anisotropic filtering. This means more noise is removed


Figure 6.13: VIF: Results Cosmix just on test image & zoomed in at optima

Figure 6.14: VIF: extra points to decide the difference in optima

and less sharpness is maintained. When we decrease Alpha_Mean_Fact, less noise is removed and more sharpness is preserved.

When we increase the blurring on the test image, we actually remove noise and decrease the sharpness. So when we then apply Cosmix, less noise has to be removed and more sharpness should be preserved to obtain the same result as with less blurring on the test image. The left tables in Tables 6.1 and 6.2 confirm that the optimal Alpha_Mean_Fact value decreases when we add more blurring.

The right tables in Tables 6.1 and 6.2 show that when we increase the noise in the test image, SSIM and VIF respond by assigning a higher optimal Alpha_Mean_Fact value. This results in more isotropic filtering by Cosmix, so more noise is removed. This is a logical response when the test image contains more noise, so this is a pleasing result.

Comparing SSIM with VIF shows that SSIM has higher optimal values for Alpha_Mean_Fact. This means that SSIM's parameter settings for Cosmix remove more noise, along with more sharpness, than VIF's. So the preferred balance between noise and sharpness is different for SSIM and VIF: SSIM accepts more sharpness loss in exchange for less noise than VIF does. We should note that the visual differences between these peaks are so small that they are hardly visible.


Noise  Blur  Alpha_Mean_Fact
20     3     0.875
20     5     0.625
20     7     0.5

Noise  Blur  Alpha_Mean_Fact
20     7     0.5
40     7     0.55

Table 6.2: VIF: Optimal Alpha_Mean_Fact values

Conclusions

We have succeeded in finding optimal values for the Alpha_Mean_Fact parameter in Cosmix. This means this setup is able to determine an optimal parameter set according to a metric. This test should be repeated for the other parameters in Cosmix to obtain such a parameter set.

The fact that the optimal Alpha_Mean_Fact value responds to changes in the test image means that we cannot simply add distortions to create a greater distance between the test and reference image. Each set of images with the same levels of distortion has its own optimal parameter setting, so the distortion levels of the test image should resemble those of the images that are actually used in the system. The settings found are optimal only for this specific test and reference image. This leaves no choice but to improve the reference image.

6.1.6 Noise reduction by blur test (IP on just the test image)

Objective

The experiment in 6.1.4 used blur for noise reduction and found no optimal values. We concluded that this was because both the test and the reference image were processed with IP. This experiment is done to show that when only the test image is processed, we do find optimal values.

Setup

Images with three different noise levels are used as test images. The first, with a noise standard deviation of 9, is one of the 100 images from the sequence. The test images with noise SDs of 20 and 40 are created by adding noise to this original image with noise standard deviation 9. The reference image is the average of the sequence of 100 images. The blur filter is applied with varying window size and sigma. See Figure 6.15 for the scheme of this experiment.

Figure 6.15: Full-reference scheme for blur on test image

Results

Figure 6.16 shows the results for SSIM. As we can see, the more noise the test image contains, the lower the quality is when no blurring is done (blur filter window size of 1). We also see a peak in all three curves: the optimal point, according to SSIM, where the blurring removes the most noise while the loss of sharpness is still acceptable. When the noise is increased, this optimal amount of blurring also increases. When the blurring is increased, we also see the quality score curves moving closer together. This means that when a lot of blurring is done, the images


that differed in noise start to look more like each other. This is logical, since the added noise differs only in variance, so the average pixel values stay the same. When we blur, we average over pixels and make the variations smaller again.
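This averaging argument is easy to verify numerically: a k-by-k mean filter averages k·k roughly independent noise samples, so the noise standard deviation drops by about a factor of k. A minimal sketch (the mean filter here is a stand-in, not the Gaussian filter used in the experiments):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# iid Gaussian noise with standard deviation 10
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 10.0, (512, 512))

# k-by-k mean filter over the valid region: each output pixel averages
# k*k independent samples, so the standard deviation shrinks by roughly k
k = 5
blurred = sliding_window_view(noise, (k, k)).mean(axis=(2, 3))

print(round(float(noise.std()), 2))    # close to 10
print(round(float(blurred.std()), 2))  # close to 10 / 5 = 2
```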

Figure 6.16: SSIM: results noise reduction by blur

Figure 6.17 shows the results for VIF. Here we also see the lines coming closer together, but not as close as with SSIM. This means that the blurred noise remains more visible to VIF than to SSIM. More importantly, we see that the curve with noise SD 40 has only a very weak local optimum where the others have a global optimum; its global optimum is at the maximum amount of blurring.

Figure 6.17: VIF: results noise reduction by blur

Conclusions

Blurring removes noise but also decreases the sharpness. SSIM finds clear optimal values for the point up to which blurring is still acceptable. As also seen in the previous experiment with Cosmix, this setup enables SSIM to give an optimal balance between noise and sharpness.

VIF also finds optimal values, except at high noise levels. There VIF grades more blurring as better, which is not a result we expect. As we will see in the next section, this is also the case with Shannon's Information Capacity (IC) metric. We know that VIF has some form of IC incorporated in the metric, so we assume this is the reason why the quality score goes up when more blurring is done. However, this amount of noise and blurring is not realistic, so we will not have to


deal with this in the real system. In the case with the realistic amount of noise (SD = 9) and blurring, VIF finds peaks as expected, so we do not worry too much about this result, but it is good to keep it in mind.

6.2 Experiments testing Shannon's information capacity

6.2.1 Noise and blur addition tests

Objective

This is a basic test to see how IC responds to different kinds of distortions. We add noise to the sequence and test how IC responds, and do the same with blurring the sequence.

Setup

To test how IC responds to the same distortions as we used on the full-reference metrics in 6.1.1, the scheme in Figure 6.18 is used. For this metric, the distortions are applied to the whole set of 100 images. The distortions are added with the same strength as in the full-reference tests, to make the results comparable.

Figure 6.18: Basic Information Capacity scheme

Results noise addition test

Figure 6.19: Basic noise addition test results of Shannon's IC

As we can see in Figure 6.19, the quality of an image drops when more noise is added.

Results blur addition test

The results of the basic Information Capacity (IC) blur test are given in Figure 6.20. These results are very unexpected: according to IC, the quality improves when the images are blurred. There is a certain amount of noise present in the image, so blurring the image


Figure 6.20: Basic blur addition test results of Shannon's IC

does remove some noise; a small peak at the beginning of the plot might be explained by this fact. But the quality values just keep on rising, even when a blur size of 50 is used.

Figure 6.21: Image A shows the normal image; image B is blurred with a filter window size of 50 and a sigma of 25

Conclusions

The noise addition test shows that the quality drops when more noise is added. The blur addition test, however, shows a quality increase when the image is blurred. This might be explained by the fact that Shannon's IC incorporates a signal-to-noise ratio, formulas 4.3 and 4.4. When the image is blurred, both the signal and the noise strength decrease. Apparently the noise strength is reduced more, or its decrease is weighted more strongly, than the signal strength. A possible explanation is that the signal and noise strengths are averaged over the frequency bins that the Fourier transform calculates. The noise reduction is felt across the entire spectrum, so the decrease shows up in every frequency bin. The decrease in signal strength, in this case loss of sharpness, only has a significant effect on the high frequencies. Therefore, when we average over the frequency bins, the noise decrease is weighted more than the signal decrease. If this is the case, it can be fixed by applying different weights to different frequency components of IC: frequencies that have a stronger effect on visual quality should get a higher weight. This will not be an easy task, since it is connected to viewer perception, but when done properly it will improve the functioning of IC and possibly solve this problem.
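The proposed fix can be sketched as follows. The structure (weighting Fourier bins by radial frequency before summing the powers) reflects the idea above, but the weight function itself is purely hypothetical; a real one would have to come from viewer-perception experiments, as noted.

```python
import numpy as np

def weighted_snr(signal_img, noise_img, weight_fn):
    """Hypothetical frequency-weighted SNR: weight each Fourier bin by a
    perceptual importance function `weight_fn(radial_frequency)` before
    summing signal and noise power."""
    h, w = signal_img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)      # radial frequency of each bin
    wgt = weight_fn(radius)
    s = wgt * np.abs(np.fft.fft2(signal_img)) ** 2
    n = wgt * np.abs(np.fft.fft2(noise_img)) ** 2
    return s.sum() / n.sum()
```

With a uniform weight function this reduces to the plain ratio; a weight that emphasizes the high frequencies would penalize the sharpness loss caused by blurring more strongly.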


6.2.2 Cosmix Alpha_Mean_Fact parameter test

Objective

Find an optimal value for the Alpha_Mean_Fact parameter in the Cosmix Algorithm.

Setup

The same sequence of 100 phantom images that has been used in all other tests is used. Because of computation time restrictions, a quarter of each image in the sequence is used; the part used is shown in Figure 6.23. This part of the sequence has a noise SD of 15, measured with the Xres3 noise estimator. No extra noise is added. The sequence is then given to Cosmix and processed for different values of the Alpha_Mean_Fact parameter. The parameter domain ranges from 0 to 10; tests are done in this domain with a step size of 0.1. The processed sequences are then graded by Shannon's IC.

Figure 6.22: Scheme for testing IC on Cosmix with different settings

Figure 6.23: The quarter that is cut out to perform Cosmix on

Results

In Figure 6.24 it can be seen that IC finds an optimal Alpha_Mean_Fact value of 1.3.


Figure 6.24: Results Cosmix with di�erent Alpha_Mean_Fact values

Conclusions

This test shows promising results for using Shannon's IC to tune parameter settings. The value found, 1.3, is very close to the value used by Philips in certain cases (1.5). In a previous test, we saw Shannon's IC handle blurring incorrectly, but this test shows that this is not a problem in realistic situations: the blur that was added there was unrealistic, and its results are less meaningful than the results of the Cosmix Alpha_Mean_Fact parameter test.


Chapter 7

Discussions and Conclusions

The objective of this research is: "How do we use the full-reference model for objectively evaluating X-ray image quality when using image enhancement algorithms?" After a literature study, we have chosen to evaluate three image quality metrics: SSIM, VIF and IC.

The metrics SSIM and VIF are full-reference metrics. They are designed to calculate the similarity between a test image and its reference image, with the reference being a perfect-quality image. The smaller the difference between the images, the better the quality; this way the similarity can be used as a quality grading. When enhancing images, we do not have a perfect-quality reference image, and this gives rise to the Judging Problem: a difference between the test and the reference image can now also be an improvement in quality. The "Judging Problem" has been specified in Section 5.1: "When we use a reference image that is not perfect, we must be 100% sure that our IP will never be able to improve any of the properties contrast, sharpness or noise of the test image to a higher quality than the reference image." A number of solutions to the Judging Problem have been experimented with. Maintaining a large enough quality distance between the reference and the test image has proved to be a working solution; how this is achieved is explained in Section 6.1.5.

The IC metric is based on Shannon's theory [10, 11] and was formulated into an image quality metric by Papalazarou [4]. It determines the average quality of a set of images of a stationary situation. The Judging Problem does not apply here, as no reference image is required.

The evaluated metrics are clearly able to find optimal parameter values for IP algorithms. This has been achieved in a number of experiments. From this, we conclude that we have succeeded in our objective. These optimal parameter values, however, remain objective and are different for each metric. The next step is to match these objective values to subjective values.

Comparing different IP algorithms can also be done using the evaluated metrics. For the full-reference metrics, the test image has to be processed by each of the algorithms. When using the IC metric, the whole series of images has to be processed by each algorithm. The processed images are then evaluated by the metrics, which give a quality score for each processed image or sequence of images; these scores can be compared to determine which algorithm performed best.

7.1 Conclusions full-reference metrics

We have seen that the full-reference model used in experiment 6.1.5 enables the evaluated metrics to find optimal values for parameters in IP algorithms. The tested full-reference metrics, SSIM and VIF, respond well to the distortions blur and noise. These are promising results, but some obstacles remain with respect to our objective: using full-reference metrics to evaluate quality when applying image enhancement algorithms. These obstacles are discussed next.

Creating reference and test image

The full-reference metrics are not able to judge whether a difference between the test and reference image is good or bad (the Judging Problem); they judge every difference as bad. So when performing an experiment, we


need a reference image that is known to always be of better quality than the processed test image. So either the test image must be distorted or the reference image must be improved, such that the processed test image cannot, for any property, become of higher quality than the reference image. We have seen that the amount of distortion (noise and blur) added to the test image influences the optimal values for the Alpha_Mean_Fact parameter. Since we are looking for the optimal parameter settings to tune the real system, the test image must have the same quality properties as the images in the real system. Thus, distorting the test image such that it no longer resembles a real image used in the system is not without risk, because it tunes the system on the wrong input. The magnitude of this error should be examined further in order to identify whether it will create a problem.

That leaves improving the reference image as the better alternative. For creating the reference image we have to use IP algorithms that are stronger than the ones we wish to evaluate. This is possible because there are no real-time constraints when making the reference image.
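An almost noiseless reference can also be obtained, as done in this work, by averaging a long static sequence. A minimal numpy sketch (the frame count of 100 matches the thesis; the noise level is illustrative):

```python
import numpy as np

def make_reference(frames):
    """Average a stack of frames of a static scene. For N frames carrying
    independent zero-mean noise, the noise standard deviation drops by a
    factor of sqrt(N), giving an almost noiseless reference image."""
    return np.mean(np.asarray(frames, dtype=np.float64), axis=0)

rng = np.random.default_rng(0)
truth = rng.uniform(0.0, 1.0, size=(64, 64))  # the unknown clean image
frames = [truth + rng.normal(0.0, 0.05, truth.shape) for _ in range(100)]

reference = make_reference(frames)
residual_single = float(np.std(frames[0] - truth))  # roughly 0.05
residual_mean = float(np.std(reference - truth))    # roughly 0.005, ~10x lower
```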

Di�erent noise reduction algorithms

Two noise reduction algorithms were used for the evaluation in this research: Cosmix and ATR. These belong to two different classes of algorithms.

ATR is a temporal noise reduction algorithm. It exploits the fact that the system runs continuously and therefore processes long sequences of images, using an advanced technique to, in essence, average between images or parts of images that have not moved. Such an algorithm has to find a balance between reducing noise and accurately displaying movement. In the case of stationary scenes, sharpness and contrast are not affected by the noise reduction algorithm. In the full-reference model it is not possible to work with moving sequences, because we need many images without object movement to create our reference image. So this type of noise reduction algorithm cannot be evaluated using the full-reference metrics.
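The temporal principle can be illustrated with a toy motion-gated recursive filter. This is an assumption for illustration only, not the actual ATR algorithm; the blending factor and motion threshold are made-up values.

```python
import numpy as np

def temporal_step(prev_est, frame, alpha=0.9, motion_thresh=0.25):
    """One step of a toy motion-gated recursive filter. Where the new frame
    differs little from the running estimate, the pixel is treated as static
    and blended (noise reduction); elsewhere the new frame is passed through
    so that movement stays displayed accurately."""
    static = np.abs(frame - prev_est) < motion_thresh
    return np.where(static, alpha * prev_est + (1.0 - alpha) * frame, frame)

rng = np.random.default_rng(2)
truth = np.full((32, 32), 0.5)                            # stationary scene
estimate = truth + rng.normal(0.0, 0.05, truth.shape)     # first noisy frame
for _ in range(50):
    noisy_frame = truth + rng.normal(0.0, 0.05, truth.shape)
    estimate = temporal_step(estimate, noisy_frame)
```

On a stationary scene the residual noise converges well below the per-frame noise level, which is why long static sequences are so valuable.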

Cosmix is a spatial noise reduction algorithm. It can be seen as an advanced blur filter. Such an algorithm has to find a balance between reducing noise and maintaining sharpness to achieve the best possible quality.

Using full-reference metrics when evaluating noise reduction algorithms

Noise reduction algorithms that are advanced blur filters find a balance between removing noise and maintaining sharpness. When the noise reduction IP incorporates sharpness improvement in the algorithm, like Cosmix, we need a distance in sharpness between the test and reference image to avoid the Judging Problem. Until we are able to enhance the reference image, we have to reduce the sharpness of the test image to create this distance. We do this by blurring the test image. Doing so will shift the optimal values found for the algorithm, since the image in the real system has different properties (higher sharpness). So the values found will not be the exact optima, but they still give a good indication for the optimal tuning. Comparing different algorithms also still yields valuable information: when one algorithm performs better than another on an image with slightly reduced sharpness, the chances are high that the same algorithm will also perform better on the image with the original properties. Only when one of the two algorithms is highly adaptive to sharpness could this difference in sharpness lead to different conclusions when comparing algorithms.
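The blur-the-test-image step can be sketched with a simple box blur together with a gradient-energy sharpness proxy. Both are illustrative choices, not the actual tooling used in this work.

```python
import numpy as np

def box_blur(img, k=3):
    """Separable box blur of odd size k; lowers the sharpness of the test
    image so that the reference always dominates it in that property."""
    out = np.asarray(img, dtype=np.float64)
    for axis in (0, 1):
        acc = np.zeros_like(out)
        for s in range(-(k // 2), k // 2 + 1):
            acc += np.roll(out, s, axis=axis)
        out = acc / k
    return out

def gradient_energy(img):
    """Crude sharpness proxy: mean squared gradient magnitude."""
    gy, gx = np.gradient(np.asarray(img, dtype=np.float64))
    return float(np.mean(gx ** 2 + gy ** 2))

rng = np.random.default_rng(4)
test = rng.uniform(0.0, 1.0, size=(32, 32))  # stand-in for a sharp test image
blurred_test = box_blur(test, k=3)
```

After blurring, the sharpness proxy drops, so any sharpness difference the metric sees between test and reference is guaranteed to be in the reference's favour.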

The full-reference metrics SSIM and VIF find different optima for the balance between removing noise and maintaining sharpness, as concluded from the experiment in Section 6.1.5. Other full-reference metrics will probably also find different optimal values. So we cannot use these values directly, but we can conclude that this setup enables full-reference metrics to express their preference (taste) when balancing a parameter.
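The parameter sweep behind such experiments can be sketched as follows. The single-window SSIM below is a simplification (the published metric averages this quantity over local windows), and the box filter with its size parameter stands in for a real noise reduction algorithm and its tuning parameter:

```python
import numpy as np

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """SSIM computed once over the whole image (simplified sketch)."""
    mx, my = float(x.mean()), float(y.mean())
    vx, vy = float(x.var()), float(y.var())
    cov = float(np.mean((x - mx) * (y - my)))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def box_denoise(img, k):
    """Stand-in 'noise reduction' whose strength parameter is the box size k."""
    out = np.asarray(img, dtype=np.float64)
    if k <= 1:
        return out
    for axis in (0, 1):
        acc = np.zeros_like(out)
        for s in range(-(k // 2), k // 2 + 1):
            acc += np.roll(out, s, axis=axis)
        out = acc / k
    return out

def sweep(reference, test, candidates):
    """Return the parameter value the metric prefers."""
    return max(candidates,
               key=lambda k: global_ssim(reference, box_denoise(test, k)))

# Periodic texture plus noise: too little smoothing leaves the noise,
# too much destroys the texture, so the metric has to pick a balance.
i, j = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
reference = 0.5 + 0.2 * np.sin(2 * np.pi * 4 * i / 32) * np.sin(2 * np.pi * 4 * j / 32)
rng = np.random.default_rng(3)
test = reference + rng.normal(0.0, 0.15, reference.shape)

best = sweep(reference, test, [1, 3, 5, 7])
```

The metric rejects both extremes and settles on an intermediate filter size; a different metric plugged into `sweep` can settle on a different one, which is exactly the "taste" effect observed with SSIM and VIF.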


7.2 Conclusions Shannon's Information Capacity metric

The IC metric grades every blurring operation as an improvement to image quality when performed on an image containing noise; see Experiment 6.2.1. In spite of this, it still proved able to find optimal values for parameters in a noise reduction algorithm (Experiment 6.2.2). The optimum found for the Alpha_Mean_Fact parameter lies near the standard value used in practice; of all three metrics tested, it is in fact the nearest to the standard value. This metric does not need specific test and reference images the way full-reference metrics do for specific experiments. So as long as a sequence of images without object movement can be made, IC can be used to determine the average image quality of such a sequence. This gives the metric considerably more freedom than the full-reference metrics, and the Judging Problem is avoided as well.

Full-reference versus Shannon's IC

The full-reference metrics and Shannon's IC metric both have their advantages and disadvantages. The advantage of IC is that it does not need a reference image as input, just a series of images containing no movement. This way the Judging Problem is avoided. The disadvantage is that it requires a set of 100 images as input, and these are not always easy to acquire.

The advantage of the full-reference metrics is that, once we find better ways to create a reference image, for example by using stronger IP algorithms, we can apply them to any image, including clinical images. Another advantage is that full-reference metrics are easier to adjust to accommodate differences in taste: to adjust the taste preferences, we only have to adjust the reference image to suit a doctor's taste, and this can be done visually since the doctor can see the reference image. If we wanted to tune IC according to a doctor's taste, there is no visual objective we can adjust to meet the doctor's demands. The disadvantage of the full-reference metrics is that the creation of the reference image is complex and gives rise to the Judging Problem.

Both metrics have shown their potential; which is the best choice depends on the situation.

7.3 Future work

In further research, more parameters of IP algorithms should be evaluated to see how the optimal values compare to the subjective optimal values. The experimental setup created here can be used to calculate optimal values for other parameters: for the full-reference metrics this is the setup in Figure 6.11, and for the IC metric the setup in Figure 6.22. This gives us more points at which to compare our subjective optima (the standard values) with the ones found by a specific metric. If we perceive a pattern relating a metric's optimal values to our preferred standard values, one that can be mapped from one to the other, we have found a metric that can be adjusted to agree with our subjective preference. We can then employ this metric to determine new parameters or to decide between algorithms.

For the IC metric, shifting the optimal values so that they resemble our subjective preferences might be done by applying different weights to the different spatial frequency components in the IC metric. Frequencies that have a stronger effect on visual quality should receive a higher weight.
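Such a frequency weighting could look as follows. This is a sketch under assumptions: the per-band signal and noise powers are taken as given inputs, and the weights are illustrative; it is not the actual IC implementation.

```python
import numpy as np

def weighted_capacity(signal_power, noise_power, weights):
    """Shannon-style capacity summed over spatial-frequency bands, with a
    perceptual weight per band: sum_f w(f) * log2(1 + S(f)/N(f))."""
    snr = np.asarray(signal_power, float) / np.asarray(noise_power, float)
    return float(np.sum(np.asarray(weights, float) * np.log2(1.0 + snr)))

# Three bands: low, mid and high spatial frequency (illustrative powers).
s = np.array([8.0, 4.0, 1.0])
n = np.array([1.0, 1.0, 1.0])

uniform = weighted_capacity(s, n, [1.0, 1.0, 1.0])      # plain capacity sum
perceptual = weighted_capacity(s, n, [0.5, 2.0, 0.5])   # emphasise mid band
```

Re-weighting the bands changes the score, and hence shifts where the metric places its optimum, without changing the underlying capacity formula.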

Quality-potential

A problem that will occur when mapping a metric onto a subjective preference is taste. While one doctor may prefer higher contrast and not care about the extra noise, another doctor might find a low-noise image more pleasant to look at and happily accept less contrast to keep the noise very limited. Two images that differ in their contrast/sharpness/noise balance but are easily transformed into one another possess the same quality-potential.

We developed a way to measure quality-potential using full-reference metrics. This is done as follows. Multiple reference images are created using only linear transformations, such that they all contain the same quality-potential. Then the similarity of the test image to each of the reference images is calculated using a full-reference metric. The minimum distance found to any of the reference images (i.e., the best similarity score) is taken as the quality-potential of the test image.
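A minimal sketch of this procedure, where the negative-MSE similarity and the linear-transform parameters are illustrative stand-ins for a real full-reference metric and the actual reference variants; with a similarity metric, the minimum distance corresponds to the maximum score:

```python
import numpy as np

def similarity(ref, img):
    """Toy full-reference similarity: negative mean squared error
    (higher, i.e. closer to zero, means more alike)."""
    return -float(np.mean((np.asarray(ref, float) - np.asarray(img, float)) ** 2))

def quality_potential(test, references, metric=similarity):
    """Score the test image against every reference variant and keep the
    best match; minimising distance equals maximising similarity."""
    return max(metric(ref, test) for ref in references)

# Reference variants differing only by an (invertible) linear transform,
# so by construction they all share the same quality-potential.
base = np.linspace(0.0, 1.0, 100).reshape(10, 10)
references = [a * base + b for a, b in [(1.0, 0.0), (1.2, -0.1), (0.8, 0.1)]]

test = 1.19 * base - 0.1  # nearly matches the second variant
qp = quality_potential(test, references)
```

The test image scores well against whichever variant best matches its contrast and brightness, so a doctor's particular taste in that balance no longer penalises the image.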

Improving the reference image

Improving the reference image will improve the effectiveness of the full-reference metrics. It also reduces the chance that the Judging Problem occurs. This can be achieved in the following two ways.

• Use stronger image enhancement algorithms (that do not necessarily run in real-time) to improve the reference image to a quality beyond the reach of the algorithm(s) to be evaluated.

• Manually enhance the reference image using specialized image enhancement tools.

