Ch 4. Image Quality Assessment and Statistical Evaluation

www.Remote-Sensing.info


Many remote sensing datasets contain high-quality, accurate data. Unfortunately, error (or noise) is sometimes introduced into the remote sensor data by:

• the environment (e.g., atmospheric scattering),

• random or systematic malfunction of the remote sensing system (e.g., an uncalibrated detector creates striping), or

• improper airborne or ground processing of the remote sensor data prior to actual data analysis (e.g., inaccurate analog-to-digital conversion).

Therefore, the person responsible for analyzing the digital remote sensor data should first assess its quality and statistical characteristics. This is normally accomplished by:

• looking at the frequency of occurrence of individual brightness values in the image, displayed in a histogram,

• viewing on a computer monitor individual pixel brightness values at specific locations or within a geographic area,

• computing univariate descriptive statistics to determine if there are unusual anomalies in the image data, and

• computing multivariate statistics to determine the amount of between-band correlation (e.g., to identify redundancy).

Remote Sensing Sampling Theory

A population is an infinite or finite set of elements. An infinite population could be all possible images that might be acquired of the Earth in 2004. All Landsat 7 ETM+ images of Charleston, S.C., acquired in 2004 form a finite population.

A sample is a subset of the elements taken from a population and used to make inferences about certain characteristics of the population. For example, we might decide to analyze a June 1, 2004, Landsat image of Charleston. If observations with certain characteristics are systematically excluded from the sample, either deliberately or inadvertently (such as by selecting only images obtained in the spring of the year), it is a biased sample. Sampling error is the difference between the true value of a population characteristic and the value of that characteristic inferred from a sample.
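The difference between a random and a biased sample can be illustrated with a small sketch. The population below is invented for illustration (one brightness value per day of a year with a steady drift), and the names `population`, `sampling_error`, `random_sample`, and `biased_sample` are hypothetical, not taken from the source.

```python
import random
from statistics import mean

# Hypothetical finite population: one brightness value per day of a year.
# The values drift upward through the year and are invented for illustration.
population = [100 + day // 4 for day in range(365)]

def sampling_error(sample, population):
    """Population mean minus the mean inferred from the sample."""
    return mean(population) - mean(sample)

rng = random.Random(0)                       # fixed seed so the run is repeatable
random_sample = rng.sample(population, 30)   # every element is equally likely
biased_sample = population[60:150]           # "spring only": the rest is excluded

print("random sample error:", round(sampling_error(random_sample, population), 2))
print("biased sample error:", round(sampling_error(biased_sample, population), 2))
```

Under these assumptions the spring-only sample misses the later, brighter part of the year, so its error does not shrink no matter how many spring days are included.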


• Large samples drawn randomly from natural populations usually produce a symmetrical frequency distribution. Most values are clustered around some central value, and the frequency of occurrence declines away from this central point. A graph of the distribution appears bell shaped and is called a normal distribution.

• Many statistical tests used in the analysis of remotely sensed data assume that the brightness values recorded in a scene are normally distributed. Unfortunately, remotely sensed data may not be normally distributed and the analyst must be careful to identify such conditions. In such instances, nonparametric statistical theory may be preferred.


[Figure: Common Symmetric and Skewed Distributions in Remotely Sensed Data]


• The histogram is a useful graphic representation of the information content of a remotely sensed image.

• It is instructive to review how a histogram is constructed for a single band of imagery, k, composed of i rows and j columns with a brightness value BV_ijk at each pixel location.
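As a minimal sketch of that construction, the histogram is simply a count of how many pixels hold each possible brightness value. The tiny 3 x 4 band and the function name `histogram` below are hypothetical, and 8-bit data (256 levels) is assumed.

```python
from collections import Counter

# Hypothetical 3 x 4 single-band image (rows i, columns j); values are invented.
band_k = [
    [12, 12, 13, 40],
    [12, 13, 13, 41],
    [11, 12, 40, 40],
]

def histogram(band, levels=256):
    """Count how often each brightness value 0..levels-1 occurs in the band."""
    counts = Counter(bv for row in band for bv in row)
    return [counts.get(bv, 0) for bv in range(levels)]

hist = histogram(band_k)
for bv, freq in enumerate(hist):
    if freq:  # print only the occupied bins
        print(f"BV {bv:3d}: {'#' * freq} ({freq})")
```

The printed bars form two separate clusters, which is the kind of multimodal shape the histogram is meant to reveal.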

[Figure: Histogram of a Single Band of Landsat Thematic Mapper Data of Charleston, SC]

[Figure: Histogram of Thermal Infrared Imagery of a Thermal Plume in the Savannah River]

Cursor and Raster Display of Brightness Values

[Figure: Two- and Three-Dimensional Evaluation of Pixel Brightness Values within a Geographic Area]

Univariate Descriptive Image Statistics

Measures of Central Tendency in Remote Sensor Data

• The mode is the value that occurs most frequently in a distribution and is usually the highest point on the curve (histogram). It is common, however, to encounter more than one mode in a remote sensing dataset. The histograms of the Landsat TM image of Charleston, SC and the predawn thermal infrared image of the Savannah River have multiple modes. They are nonsymmetrical (skewed) distributions.

• The median is the value midway in the frequency distribution. One-half of the area below the distribution curve is to the right of the median, and one-half is to the left.

The mean is the arithmetic average and is defined as the sum of all brightness value observations divided by the number of observations. It is the most commonly used measure of central tendency. The mean (m_k) of a single band of imagery composed of n brightness values (BV_ik) is computed using the formula:

$$ m_k = \frac{\sum_{i=1}^{n} BV_{ik}}{n} $$

The sample mean, m_k, is an unbiased estimate of the population mean. For symmetrical distributions, the sample mean tends to be closer to the population mean than any other unbiased estimate (such as the median or mode).
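The three measures of central tendency can be compared on a small example. The brightness values below are invented so that the distribution has two modes; Python's standard `statistics` module is used, and `multimode` returns every value tied for the highest frequency.

```python
from statistics import mean, median, multimode

# Hypothetical brightness values from one band, invented to show two modes.
bv = [20, 21, 21, 21, 22, 60, 61, 61, 61, 62]

print("mean  :", mean(bv))       # sum of the values divided by their count
print("median:", median(bv))     # midpoint of the sorted values
print("modes :", multimode(bv))  # all values tied for most frequent
```

In this sketch the mean and median both fall between the two clusters, at a brightness value that no pixel actually has, which is why the modes are worth inspecting for multimodal data.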

Remote Sensing Univariate Statistics - Variance

Measures of Dispersion

Measures of the dispersion about the mean of a distribution provide valuable information about the image. For example, the range of a band of imagery (range_k) is computed as the difference between the maximum (max_k) and minimum (min_k) values; that is,

$$ \mathrm{range}_k = \mathrm{max}_k - \mathrm{min}_k $$

Unfortunately, when the minimum or maximum values are extreme or unusual observations (i.e., possibly data blunders), the range could be a misleading measure of dispersion. Such extreme values are not uncommon because the remote sensor data are often collected by detector systems with delicate electronics that can experience spikes in voltage and other unfortunate malfunctions. When unusual values are not encountered, the range is a very important statistic often used in image enhancement functions such as min–max contrast stretching.
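The sensitivity of the range to a single extreme value can be sketched with a simple linear min–max stretch. The function `min_max_stretch`, the two sample bands, and the 0 to 255 output scale are assumptions made for illustration; the source does not give a specific stretch formula.

```python
def min_max_stretch(values, out_max=255):
    """Linearly rescale so the band minimum maps to 0 and the maximum to out_max."""
    lo, hi = min(values), max(values)
    if hi == lo:  # a constant band has zero range, so there is nothing to stretch
        return [0 for _ in values]
    return [round((v - lo) / (hi - lo) * out_max) for v in values]

clean = [100, 110, 120, 130, 140]  # hypothetical band, range = 40
spiked = clean + [255]             # the same band with one spurious spike

print("range:", max(clean) - min(clean), "stretched:", min_max_stretch(clean))
print("range:", max(spiked) - min(spiked), "stretched:", min_max_stretch(spiked))
```

With the spike present, the five genuine values are squeezed into the lowest quarter of the output scale, which is the misleading behavior the text warns about.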



The variance of a sample is the average squared deviation of all possible observations from the sample mean. The variance of a band of imagery, var_k, is computed using the equation:

$$ \mathrm{var}_k = \frac{\sum_{i=1}^{n} (BV_{ik} - m_k)^2}{n} $$

The numerator of the expression is the corrected sum of squares (SS). If the sample mean (m_k) were actually the population mean, this would be an accurate measurement of the variance.

Remote Sensing Univariate Statistics

Unfortunately, there is some underestimation because the sample mean was calculated in a manner that minimized the squared deviations about it. Therefore, the denominator of the variance equation is reduced to n – 1, producing a somewhat larger, unbiased estimate of the variance:

$$ \mathrm{var}_k = \frac{SS}{n - 1} $$
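The effect of the two denominators can be checked on the five Band 1 values of the hypothetical dataset used in this chapter. This is only a sketch; `pvariance` and `variance` from Python's standard `statistics` module divide by n and n – 1 respectively.

```python
from statistics import mean, pvariance, variance

bv = [130, 165, 100, 135, 145]      # Band 1 of the hypothetical dataset
m = mean(bv)
ss = sum((x - m) ** 2 for x in bv)  # corrected sum of squares, SS

print("SS           =", ss)
print("SS / n       =", ss / len(bv))        # same value as pvariance(bv)
print("SS / (n - 1) =", ss / (len(bv) - 1))  # same value as variance(bv)
print("pvariance    =", pvariance(bv))
print("variance     =", variance(bv))
```

With only five observations the two estimates differ by a quarter of the smaller one; for a full image with millions of pixels the difference becomes negligible.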


The standard deviation is the positive square root of the variance. The standard deviation of the pixel brightness values in a band of imagery, s_k, is computed as

$$ s_k = \sqrt{\mathrm{var}_k} $$

Pixel    Band 1     Band 2    Band 3             Band 4
         (green)    (red)     (near-infrared)    (near-infrared)
(1,1)    130        57        180                205
(1,2)    165        35        215                255
(1,3)    100        25        135                195
(1,4)    135        50        200                220
(1,5)    145        65        205                235

Hypothetical Dataset of Brightness Values

Jensen, 2004

                           Band 1     Band 2    Band 3             Band 4
                           (green)    (red)     (near-infrared)    (near-infrared)
Mean (m_k)                 135        46.40     187                222
Variance (var_k)           562.50     264.80    1007.50            570
Standard deviation (s_k)   23.71      16.27     31.74              23.87
Minimum (min_k)            100        25        135                195
Maximum (max_k)            165        65        215                255
Range (BV_r)               65         40        80                 60

Univariate Statistics for the Hypothetical Example Dataset
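The table can be reproduced from the five-pixel hypothetical dataset with a few lines of standard-library Python. This is a sketch only; the dictionary name `bands` is illustrative, and printed values may differ from the table in the last digit because of rounding.

```python
from statistics import mean, stdev, variance

# Hypothetical dataset from the chapter: five pixels, four bands.
bands = {
    "Band 1 (green)":         [130, 165, 100, 135, 145],
    "Band 2 (red)":           [57, 35, 25, 50, 65],
    "Band 3 (near-infrared)": [180, 215, 135, 200, 205],
    "Band 4 (near-infrared)": [205, 255, 195, 220, 235],
}

for name, bv in bands.items():
    # variance() and stdev() use the n - 1 denominator, as in the text.
    print(f"{name:24s} mean={mean(bv):7.2f} var={variance(bv):8.2f} "
          f"sd={stdev(bv):6.2f} min={min(bv):3d} max={max(bv):3d} "
          f"range={max(bv) - min(bv):3d}")
```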

Measures of Distribution (Histogram) Asymmetry and Peak Sharpness


Skewness is a measure of the asymmetry of a histogram and is computed using the formula:

$$ \mathrm{skewness}_k = \frac{\sum_{i=1}^{n} \left( \frac{BV_{ik} - m_k}{s_k} \right)^3}{n} $$

A perfectly symmetric histogram has a skewness value of zero.

A histogram may be symmetric but have a peak that is very sharp or one that is subdued when compared with a perfectly normal distribution. A perfectly normal distribution (histogram) has zero kurtosis. The greater the positive kurtosis value, the sharper the peak in the distribution when compared with a normal histogram. Conversely, a negative kurtosis value suggests that the peak in the histogram is less sharp than that of a normal distribution.

$$ \mathrm{kurtosis}_k = \left[ \frac{1}{n} \sum_{i=1}^{n} \left( \frac{BV_{ik} - m_k}{s_k} \right)^4 \right] - 3 $$
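Both shape measures can be sketched directly from these formulas. The two sample lists are invented, the function names are illustrative, and s_k is taken to be the n – 1 standard deviation defined earlier in the chapter (an assumption; other texts use the n denominator here).

```python
from statistics import mean, stdev

def skewness(bv):
    """Mean cubed standardized deviation; zero for a symmetric histogram."""
    m, s, n = mean(bv), stdev(bv), len(bv)
    return sum(((x - m) / s) ** 3 for x in bv) / n

def kurtosis(bv):
    """Mean fourth-power standardized deviation minus 3."""
    m, s, n = mean(bv), stdev(bv), len(bv)
    return sum(((x - m) / s) ** 4 for x in bv) / n - 3

symmetric = [10, 20, 30, 40, 50]   # hypothetical, evenly spread values
right_tail = [10, 11, 12, 13, 60]  # hypothetical, one value far to the right

print("symmetric :", round(skewness(symmetric), 3), round(kurtosis(symmetric), 3))
print("right tail:", round(skewness(right_tail), 3), round(kurtosis(right_tail), 3))
```

The evenly spread list gives a skewness of essentially zero and a negative kurtosis (a flat histogram), while the list with a long right tail gives a positive skewness.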



Remote Sensing Multivariate Statistics

Remote sensing research is often concerned with the measurement of how much radiant flux is reflected or emitted from an object in more than one band (e.g., in red and near-infrared bands). It is useful to compute multivariate statistical measures such as covariance and correlation among the several bands to determine how the measurements covary. Later it will be shown that variance–covariance and correlation matrices are used in remote sensing principal components analysis (PCA), feature selection, classification and accuracy assessment.



To calculate covariance, we first compute the corrected sum of products (SP) defined by the equation:

$$ SP_{kl} = \sum_{i=1}^{n} (BV_{ik} - m_k)(BV_{il} - m_l) $$

Just as simple variance was calculated by dividing the corrected sum of squares (SS) by (n – 1), covariance is calculated by dividing SP by (n – 1). Therefore, the covariance between brightness values in bands k and l, cov_kl, is equal to:

$$ \mathrm{cov}_{kl} = \frac{SP_{kl}}{n - 1} $$


          Band 1     Band 2     Band 3             Band 4
          (green)    (red)      (near-infrared)    (near-infrared)
Band 1    var_1      cov_1,2    cov_1,3            cov_1,4
Band 2    cov_2,1    var_2      cov_2,3            cov_2,4
Band 3    cov_3,1    cov_3,2    var_3              cov_3,4
Band 4    cov_4,1    cov_4,2    cov_4,3            var_4

Format of a Variance-Covariance Matrix (each diagonal entry is the band variance, SS_k divided by n – 1)

          Band 1     Band 2     Band 3             Band 4
          (green)    (red)      (near-infrared)    (near-infrared)
Band 1    562.50     -          -                  -
Band 2    135        264.80     -                  -
Band 3    718.75     275.25     1007.50            -
Band 4    537.50     64         663.75             570

Variance-Covariance Matrix of the Sample Data
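The matrix can be recomputed from the hypothetical five-pixel dataset by applying the SP and covariance definitions to every pair of bands. This is a sketch in standard-library Python; the names `bands`, `covariance`, and `cov_matrix` are illustrative.

```python
from statistics import mean

bands = [
    [130, 165, 100, 135, 145],  # Band 1 (green)
    [57, 35, 25, 50, 65],       # Band 2 (red)
    [180, 215, 135, 200, 205],  # Band 3 (near-infrared)
    [205, 255, 195, 220, 235],  # Band 4 (near-infrared)
]

def covariance(bv_k, bv_l):
    """Corrected sum of products SP_kl divided by n - 1."""
    m_k, m_l = mean(bv_k), mean(bv_l)
    sp = sum((a - m_k) * (b - m_l) for a, b in zip(bv_k, bv_l))
    return sp / (len(bv_k) - 1)

# The covariance of a band with itself is its variance (the diagonal).
cov_matrix = [[covariance(k, l) for l in bands] for k in bands]
for row in cov_matrix:
    print("  ".join(f"{v:8.2f}" for v in row))
```

The full symmetric matrix is printed; the table shows only its lower triangle because cov_kl equals cov_lk.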

Correlation between Multiple Bands of Remotely Sensed Data


To estimate the degree of interrelation between variables in a manner not influenced by measurement units, the correlation coefficient, r, is commonly used. The correlation between two bands of remotely sensed data, r_kl, is the ratio of their covariance (cov_kl) to the product of their standard deviations (s_k s_l); thus:

$$ r_{kl} = \frac{\mathrm{cov}_{kl}}{s_k s_l} $$



If we square the correlation coefficient (r_kl), we obtain the sample coefficient of determination (r²), which expresses the proportion of the total variation in the values of “band l” that can be accounted for or explained by a linear relationship with the values of the random variable “band k.” Thus a correlation coefficient (r_kl) of 0.70 results in an r² value of 0.49, meaning that 49% of the total variation of the values of “band l” in the sample is accounted for by a linear relationship with values of “band k.”

Correlation Matrix of the Sample Data

          Band 1     Band 2     Band 3             Band 4
          (green)    (red)      (near-infrared)    (near-infrared)
Band 1    -          -          -                  -
Band 2    0.35       -          -                  -
Band 3    0.95       0.53       -                  -
Band 4    0.94       0.16       0.87               -
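The correlation coefficients and their squares can be recomputed from the hypothetical dataset as a check. This is a sketch using the n – 1 standard deviation from Python's `statistics` module; the function name `correlation` is illustrative, and the computed values may differ from the table in the last digit because of rounding.

```python
from statistics import mean, stdev

bands = [
    [130, 165, 100, 135, 145],  # Band 1 (green)
    [57, 35, 25, 50, 65],       # Band 2 (red)
    [180, 215, 135, 200, 205],  # Band 3 (near-infrared)
    [205, 255, 195, 220, 235],  # Band 4 (near-infrared)
]

def correlation(bv_k, bv_l):
    """r_kl = cov_kl / (s_k * s_l), with cov_kl = SP_kl / (n - 1)."""
    m_k, m_l = mean(bv_k), mean(bv_l)
    sp = sum((a - m_k) * (b - m_l) for a, b in zip(bv_k, bv_l))
    cov = sp / (len(bv_k) - 1)
    return cov / (stdev(bv_k) * stdev(bv_l))

# Lower triangle only, matching the layout of the table.
for k in range(len(bands)):
    for l in range(k):
        r = correlation(bands[k], bands[l])
        print(f"r({k + 1},{l + 1}) = {r:.3f}   r^2 = {r * r:.3f}")
```

Squaring each coefficient shows how quickly explained variation falls off: the weak red versus near-infrared correlation of about 0.16 explains under 3% of the variation.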

Band    Min    Max    Mean          Standard Deviation
1       51     242    65.163137     10.231356
2       17     115    25.797593     5.956048
3       14     131    23.958016     8.469890
4       5      105    26.550666     15.690054
5       0      193    32.014001     24.296417
6       0      128    15.103553     12.738188
7       102    124    110.734372    4.305065

Covariance Matrix

Band    Band 1        Band 2       Band 3        Band 4        Band 5        Band 6        Band 7
1       104.680654    58.797907    82.602381     69.603136     142.947000    94.488082     24.464596
2       58.797907     35.474507    48.644220     45.539546     90.661412     57.877406     14.812886
3       82.602381     48.644220    71.739034     76.954037     149.566052    91.234270     23.827418
4       69.603136     45.539546    76.954037     246.177785    342.523400    157.655947    46.815767
5       142.947000    90.661412    149.566052    342.523400    590.315858    294.019002    82.994241
6       94.488082     57.877406    91.234270     157.655947    294.019002    162.261439    44.674247
7       24.464596     14.812886    23.827418     46.815767     82.994241     44.674247     18.533586

Correlation Matrix

Band    Band 1      Band 2      Band 3      Band 4      Band 5      Band 6      Band 7
1       1.000000    0.964874    0.953195    0.433582    0.575042    0.724997    0.555425
2       0.964874    1.000000    0.964263    0.487311    0.626501    0.762857    0.577699
3       0.953195    0.964263    1.000000    0.579068    0.726797    0.845615    0.653461
4       0.433582    0.487311    0.579068    1.000000    0.898511    0.788821    0.693087
5       0.575042    0.626501    0.726797    0.898511    1.000000    0.950004    0.793462
6       0.724997    0.762857    0.845615    0.788821    0.950004    1.000000    0.814648
7       0.555425    0.577699    0.653461    0.693087    0.793462    0.814648    1.000000

Univariate and Multivariate Statistics of Landsat TM Data of Charleston, SC
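One way to use such a correlation matrix to look for between-band redundancy is to list the band pairs whose correlation exceeds a chosen cut-off. The sketch below transcribes the Charleston correlation matrix; the 0.9 threshold and the names `r`, `THRESHOLD`, and `pairs` are assumptions made for illustration, not values from the source.

```python
# Correlation matrix for TM bands 1-7, transcribed from the Charleston statistics.
r = [
    [1.000000, 0.964874, 0.953195, 0.433582, 0.575042, 0.724997, 0.555425],
    [0.964874, 1.000000, 0.964263, 0.487311, 0.626501, 0.762857, 0.577699],
    [0.953195, 0.964263, 1.000000, 0.579068, 0.726797, 0.845615, 0.653461],
    [0.433582, 0.487311, 0.579068, 1.000000, 0.898511, 0.788821, 0.693087],
    [0.575042, 0.626501, 0.726797, 0.898511, 1.000000, 0.950004, 0.793462],
    [0.724997, 0.762857, 0.845615, 0.788821, 0.950004, 1.000000, 0.814648],
    [0.555425, 0.577699, 0.653461, 0.693087, 0.793462, 0.814648, 1.000000],
]

THRESHOLD = 0.9  # hypothetical cut-off for calling two bands largely redundant

# Scan the upper triangle so that each pair of bands is reported once.
pairs = [(k + 1, l + 1)
         for k in range(len(r))
         for l in range(k + 1, len(r))
         if r[k][l] > THRESHOLD]

for k, l in pairs:
    value = r[k - 1][l - 1]
    print(f"bands {k} and {l}: r = {value:.3f}, r^2 = {value ** 2:.3f}")
```

Under this cut-off the first three bands are mutually redundant, as are the matrix's bands 5 and 6, while band 4 is not flagged against any other band.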

Feature Space Plots

The univariate and multivariate statistics discussed provide accurate, fundamental information about the individual band statistics, including how the bands covary and correlate. Sometimes, however, it is useful to examine statistical relationships graphically.

Individual bands of remotely sensed data are often referred to as features in the pattern recognition literature. To truly appreciate how two bands (features) in a remote sensing dataset covary and whether they are correlated, it is often useful to produce a two-band feature space plot.

A two-dimensional feature space plot extracts the brightness value for every pixel in the scene in two bands and plots the frequency of occurrence in a 256 by 256 feature space (assuming 8-bit data, whose values run from 0 to 255). The greater the frequency of occurrence of unique pairs of values, the brighter the feature space pixel.
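A feature space plot is in effect a two-dimensional histogram, and a minimal sketch of the counting step looks like this. The function name `feature_space` and the eight-pixel red and near-infrared samples are hypothetical; 8-bit data are assumed, and rendering the counts as display brightness is left out.

```python
def feature_space(band_x, band_y, levels=256):
    """Return a levels x levels grid of counts; grid[y][x] is the number of
    pixels with brightness x in the first band and y in the second."""
    grid = [[0] * levels for _ in range(levels)]
    for x, y in zip(band_x, band_y):
        grid[y][x] += 1
    return grid

# Hypothetical red and near-infrared values for the same eight pixels.
red = [20, 20, 21, 60, 60, 60, 22, 20]
nir = [90, 90, 91, 30, 30, 30, 92, 90]

grid = feature_space(red, nir)
peak = max(max(row) for row in grid)
print("highest cell count:", peak)
print("pixels at (red=20, nir=90):", grid[90][20])
```

Cells with higher counts would be drawn brighter, so the two clusters in this toy example would appear as two bright spots in opposite corners of the plot.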

[Figure: Two-dimensional Feature Space Plot of Landsat Thematic Mapper Band 3 and 4 Data of Charleston, SC obtained on November 11, 1982]
