
Remote Sensing and Image Analysis

1.1 Remote Sensing

Remote sensing is a technology used for obtaining information about a target through the analysis of data acquired from the target at a distance. It is composed of three parts, the targets - objects or phenomena in an area; the data acquisition - through certain instruments; and the data analysis - again by some devices. This definition is so broad that the vision system of human eyes, sonar sounding of the sea floor, ultrasound and x-rays used in medical sciences, laser probing of atmospheric particles, are all included. The target can be as big as the earth, the moon and other planets, or as small as biological cells that can only be seen through microscopes. A diagrammatic illustration of the remote sensing process is shown in Figure 1.1.

An essential component in geomatics, natural resource and environmental studies is the measurement and mapping of the earth surface - land and water bodies. We are interested in knowing the types of objects, the quality and quantity of various objects, their distribution in space and time, their spatial and temporal relationships, etc. In this book, we introduce some of the major remote sensing systems used for mapping the earth. We concentrate on examining how satellite and airborne images about the earth are collected, processed and analyzed. We illustrate various remote sensing techniques for information extraction about the identity, quantity, spatial and temporal distribution of various targets of interest.

Remote sensing data acquisition can be conducted on such platforms as aircraft, satellites, balloons, rockets, space shuttles, etc. Inside or on-board these platforms, we use sensors to collect data. Sensors include aerial photographic cameras and non-photographic instruments, such as radiometers, electro-optical scanners, radar systems, etc. The platform and sensors will be discussed in detail later.

Electro-magnetic energy is reflected, transmitted or emitted by the target and recorded by the sensor. Because energy travels through the medium of the earth's atmosphere, it is modified such that the signal between the target and the sensor will differ. The effects of the atmosphere on remote sensing will be examined later. Methods will be introduced to reduce such atmospheric effects.

Once image data are acquired, we need methods for interpreting and analyzing images. By knowing "what" information we expect to derive from remote sensing, we will examine methods that can be used to obtain the desirable information. We are interested in "how" various methods of remote sensing data analysis can be used.


In summary, we want to know how electromagnetic energy is recorded as remotely sensed data, and how such data are transformed into valuable information about the earth surface.

Figure 1.1 The Flows of Energy and Information in Remote Sensing

1.2 Milestones in the History of Remote Sensing

The following is a brief list of times when innovative developments in remote sensing were documented. More details may be found in Lillesand and Kiefer (1987) and Campbell (1987).

1839 Photography was invented.

1858 Parisian photographer Gaspard Felix Tournachon used a balloon to ascend to a height of 80 m and obtain a photograph over Bievre, France.

1882 Kites were used for photography.

1909 Airplanes were used as a platform for photography.

1910-20 World War I. Aerial reconnaissance: the beginning of photo interpretation.

1920-50 Aerial photogrammetry was developed.

1934 The American Society of Photogrammetry was established. Radar development for military use started.

1940's Color photography was invented.

1940's Non-visible portions of the electromagnetic spectrum, mainly near-infrared, came into use; training of photo-interpretation.

1950-70 Further development of non-visible photography, multi-camera photography, color-infrared photography, and non-photographic sensors. Satellite sensor development - Very High Resolution Radiometer (VHRR). Launch of weather satellites such as Nimbus and TIROS.

1962 The term "Remote Sensing" first appeared.

1972 The launch of Landsat-1, originally ERTS-1. Remote sensing has been extensively investigated and applied since then.

1982 Second generation of Landsat sensor: the Thematic Mapper.

1986 French SPOT-1 with the High Resolution Visible (HRV) sensors. MSS, TM and HRV have been the major sensors for data collection over large areas all over the world. Such data have been widely used in natural resources inventory and mapping. Major areas include agriculture, forest, wetland, mineral exploration, mining, etc.

1980-90 Earth-resources satellites from other countries such as India, Japan, and the USSR. Japan's Marine Observing Satellite (MOS-1).

1986- A new type of sensor, called an imaging spectrometer, has been developed.
• Developers: JPL, Moniteq, ITRES and CCRS.
• Products: AIS, AVIRIS, FLI, CASI, SFSI, etc. A more detailed description of this subject can be found in Staenz (1992).

1990- Proposed EOS aiming at providing data for global change monitoring. Various sensors have been proposed:
• European ERS Remote Sensing Satellite SAR
• Japan's JERS-1 SAR
• Canada's Radarsat
Radar and imaging spectrometer data will be the major theme of this decade and probably the next decade as well.

1.3 Resolution and Sampling in Remotely Sensed Data

We begin to ask: what are the factors that make remotely sensed images taken of the same target different? Remotely sensed data record the dynamics of the earth surface. The three-dimensional earth surface is changing as time goes on. Two images taken at the same place with the same imaging conditions will not be the same if they are obtained at different times. Among many other factors that will be introduced in later chapters, sensor and platform design affect the quality of remotely sensed data. Remote sensing data can be considered as models of the earth surface at a very low level of generalization. Among the various factors that affect the quality and information content of remotely sensed data, two concepts are extremely important for us to understand. They determine the level of detail of the modeling process. These are resolution and sampling frequency.

Resolution is the maximum separating or discriminating power of a measurement. It can be divided into four types: spectral, radiometric, spatial and temporal.

Sampling frequency determines how frequently data are collected. There are three types of sampling important to remote sensing: spectral, spatial and temporal.

Combinations of resolutions and sampling frequencies have made it possible for us to have different types of remote sensing data.

For example, assume that the level of solar energy coming from the sun and passing through the atmosphere in the spectral region between 0.4 µm and 1.1 µm is distributed as in Fig. 1.2. This is a continuous curve.

Fig. 1.2 Solar Energy Reaching the Earth Surface

After the solar energy interacts with a target such as a forest on the earth, the energy is partly absorbed, transmitted, or scattered and reflected. Assume that the level of the scattered and reflected energy collected by a sensor behaves in the manner illustrated in Fig. 1.3.


Fig. 1.3 Reflected Solar Energy by Trees

The process that changes the shape of the energy curve from Fig. 1.2 to Fig. 1.3 will be discussed later. Let us use Fig. 1.3 to discuss the concepts of spectral resolution and spectral sampling.

Fig. 1.4 Example of Differences in Spectral Resolution and Spectral Sampling

In Figure 1.4, the three shaded bars A, B, and C represent three spectral bands. The width of each bar covers a spectral range within which no signal variation can be resolved. The width of each spectral band represents its spectral resolution. The resolution of A is coarser than the resolution of B. This is because spectral details within band A that cannot be discriminated may be partly discriminated with a spectral resolution as narrow as band B. The resolution relationships among the three bands are:

Resolution of A < Resolution of C < Resolution of B

Sampling determines the various ways we use to record a spectral curve. If data storage is not an issue, we may choose to sample the entire spectral curve with many narrow spectral bands. Sometimes, we choose to make a discrete sampling over a spectral curve (Figure 1.4). The questions are: which way of sampling is more appropriate and what resolution is better? It is obvious that if we use a low resolution, we are going to blur the curve. The finer the resolution, the more precisely we can restore a curve, provided that a sufficient spectral sampling frequency is used. The difference between imaging spectrometers and earlier generation sensors lies in the spectral sampling frequency. Sensors of earlier generations use selective spectral sampling. Imaging spectrometers have a complete systematic sampling scheme over the entire spectral range. An imaging spectrometer such as CASI has 288 spectral bands in the 0.43 - 0.92 µm spectral region, while earlier generation sensors only have 3 - 7 spectral bands.
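To make the idea of spectral sampling concrete, the sketch below averages a finely sampled spectral curve into a few broad bands (selective sampling, as in earlier sensors) and into many narrow bands (systematic sampling, as in an imaging spectrometer). The band limits and reflectance values are invented for illustration, not real sensor specifications.

# Minimal sketch of spectral sampling: averaging a finely sampled spectral
# curve into broad bands vs. narrow bands. All numbers are illustrative only.

def band_average(wavelengths, values, band_edges):
    """Average 'values' over each [lo, hi) wavelength interval in band_edges."""
    bands = []
    for lo, hi in band_edges:
        samples = [v for w, v in zip(wavelengths, values) if lo <= w < hi]
        bands.append(sum(samples) / len(samples) if samples else None)
    return bands

# A "continuous" curve sampled every 0.01 um between 0.4 and 1.1 um.
wl = [0.40 + 0.01 * i for i in range(71)]
refl = [0.05 + 0.4 * max(0.0, w - 0.7) for w in wl]   # toy vegetation-like rise in the NIR

broad  = band_average(wl, refl, [(0.5, 0.6), (0.6, 0.7), (0.7, 0.8), (0.8, 1.1)])
narrow = band_average(wl, refl, [(0.40 + 0.02 * i, 0.42 + 0.02 * i) for i in range(35)])

print("4 broad bands:", [round(b, 3) for b in broad])
print("number of narrow bands preserving the curve's shape:", len(narrow))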

Spatial resolution and sampling

Similar to the spectral case, the surface has to be sampled with a certain spatial resolution. The difference is that spatial sampling is mostly systematic, i.e., a complete sampling over an area of interest. The difference in spatial resolution can be seen in Figure 1.5.

Figure 1.5. Sampling the same target with different spatial resolutions.

A scene including a house with a garage and driveway is imaged with two different spatial resolutions. For each cell in Figure 1.5a, no object occupies an entire cell. Each cell will contain energy from different cover types. Such cells are called mixed pixels, also known as mixels. In Chapter 7, we will introduce some methods that can be used to decompose mixed pixels. Mixed pixels are very difficult to discriminate from each other. Obviously a house cannot be easily recognized at the level of resolution in Figure 1.5a, but it may be possible in Figure 1.5b. As spatial resolution becomes finer, more details about objects in a scene become available. In general, with finer spatial resolutions objects can be better discriminated by human eyes. With computers, however, it may be harder to recognize objects imaged with finer spatial resolutions. This is because finer spatial resolutions increase the image size a computer has to handle. More importantly, for many computer analysis algorithms, they cause the effect of "seeing the tree but not the forest." Computer techniques are far poorer than the human brain at generalizing from fine details.

Temporal sampling can be regarded as similar to spectral sampling: it refers to how frequently we image an area of interest. Are we going to use contiguous systematic sampling, as in movie making, or selective sampling, as in most photographic work? To decide the temporal sampling scheme, the dynamic characteristics of the target under study have to be considered. For instance, if the study subject is to discriminate crop species, the phenological calendar of each crop type should be considered when deciding when to collect remotely sensed data in order to best characterize each crop species. The data could be selected from the entire growing season, between late April and early October for mid and high latitudes in the northern hemisphere. If the subject is flood monitoring, the temporal sampling frequency should be high during the flood period because floods usually last only a few hours to a few days.

Radiometric resolution can be understood in a similar manner as spatial resolution. This is a concept well illustrated in a number of digital image processing books (e.g., Gonzalez and Wintz, 1987; Pratt, 1991). It is associated with the level of quantization of an image, which is in turn related to using the minimum amount of data storage to represent the maximum amount of information. This is often a concern in data compression. Although we will explain the concept of radiometric resolution in Chapter 5, we will only touch on the topic of data compression in Chapter 7, from an information extraction point of view.
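As a small illustration of radiometric resolution (the radiance values below are invented), the same measurements quantized to 6 bits and to 8 bits:

# Quantizing the same (invented) radiance values with 6-bit and 8-bit
# radiometric resolution. More bits give finer gray-level steps but need
# more storage per pixel.
def quantize(value, max_value, bits):
    levels = 2 ** bits
    return min(int(value / max_value * levels), levels - 1)

max_radiance = 100.0
for radiance in [10.2, 10.7, 55.0, 99.9]:
    q6 = quantize(radiance, max_radiance, 6)
    q8 = quantize(radiance, max_radiance, 8)
    print("radiance %5.1f  ->  6-bit level %3d, 8-bit level %3d" % (radiance, q6, q8))
# 10.2 and 10.7 fall into the same 6-bit level but into different 8-bit levels.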

1.4 Use of Remote Sensing

A fundamental use of remote sensing is to extend our visual capability. In addition, remote sensing can enhance our memory, because our brains tend not to remember every fine detail of what we see. With remote sensing images, we can do a lot more than refresh our memories, which is a primary goal of conventional photography. We want to measure and map the spatial dimensions of objects from remote sensing images. Furthermore, we use remotely sensed data to monitor the dynamics of phenomena on the earth surface. These include monitoring the vigor and stress of vegetation and environmental quality, measuring the temperature of various objects, detecting and identifying catastrophic sites caused by fire, flood, volcano, earthquakes, etc., and estimating the mass of various components, such as the biogeochemical constituents of a forest, the volume of fish schools in water, crop production of agricultural systems, water storage and runoff of watersheds, population in rural and urbanized areas, and the quantity and living conditions of wildlife species.

The remaining chapters of this book are organized to help you take more advantage of remote sensing in the applications mentioned above. In Chapter two, we first introduce the basic physics required to understand the imaging mechanisms in remote sensing. In Chapter three, we introduce the development of sensing systems in historical order. In Chapter four, we introduce imaging geometry and illustrate the geometric calibration methods that are required to achieve precise measurement of the spatial dimensions of objects. In Chapter five, we explain various methods for recovering image radiometry affected by sensor malfunctioning, atmospheric interference and terrain relief. In Chapter six, we illustrate some of the most commonly used image processing methods for image enhancement. In Chapter seven, we focus on various strategies for information extraction from remotely sensed data. In Chapter eight, following a brief introduction to map making, we introduce some methods that are used to combine maps and other spatial data with remotely sensed data for analysis and extraction of information on various targets.

Chapter 1

References

Campbell, J.B., 1987. Introduction to Remote Sensing, The Guilford Press.

Gonzalez, R.C. and Wintz, P., 1987. Digital Image Processing, 2nd Ed., Addison-Wesley: Reading, MA.

Lillesand, T.M. and Kiefer, R.W., 1987. Remote Sensing and Image Interpretation, 2nd Ed., John Wiley and Sons, Inc.: Toronto.

Emphasis on aerial photography, photogrammetry, photo interpretation, non-photographic sensing systems and their image interpretation, and introduction to digital image processing.

Staenz, K., 1992. A decade of imaging spectrometry in Canada. Canadian Journal of Remote Sensing. 18(4):187-197.

Lists most of the imaging spectrometers developed worldwide. Sensor calibration and various applications.

Pratt, W., 1991. Digital Image Processing. John Wiley and Sons, Inc.: Toronto.

Further Readings


Asrar G., ed. 1989, Theory and Applications of Optical Remote Sensing, John Wiley and Sons, Toronto.

A selection of the most important fields of optical remote sensing, ranging from the physical basis of energy-matter interaction, vegetation canopy modelling and atmospheric effects reduction to applications in forestry, agriculture, coastal wetlands, geology, snow and ice, climatology and meteorology, and ecosystems. Its emphasis is on the application of remote sensing to understanding land-surface processes globally.

Jensen, J.R., 1986, Digital Image Processing, an Introductory Perspective. Prentice-Hall: Englewood Cliffs, N.J.

A good introductory book on digital image analysis concepts and procedures. A "show-how" type of book, easy for beginners. Typical topics covered include image statistics, image enhancement in the spatial domain, geometric correction, classification, and change detection. Completely set in a remote sensing context.

Richards, J.A., 1986, Digital Image Processing, Springer-Verlag: New York.

A good introductory book, more mathematical than Jensen's. Additional material in comparison with Jensen's book includes an entire chapter on the Fourier transform and a treatment of the relationships among some basic image enhancement and image classification algorithms.


2.1 Electromagnetic Energy

Energy is a group of particles travelling through a certain medium. Electromagnetic energy is a group of particles with different frequencies travelling at the same velocity. These particles have a dual-mode nature: they are particles, but they travel in a wave form.

Electromagnetic waves obey the following rule:

c = νλ

where c is the speed of light, ν the frequency and λ the wavelength. This equation shows that shorter wavelengths correspond to higher frequencies.

Electromagnetic energy is a mixture of waves with different frequencies. It may be viewed as the superposition of many individual waves.

Each wave represents a group of particles with the same frequency. All together they have different frequencies and magnitudes.


With each wave, there is an electric (E) component and a magnetic (M) component. The amplitude (A) reflects the level of the electromagnetic energy. It may also be considered as intensity or spectral irradiance. If we plot A against the wavelength λ, we get an electromagnetic curve, or spectrum (Figure 2.1).

Figure 2.1. An electromagnetic spectrum

Any matter with a body temperature greater than 0 K emits electromagnetic energy. Therefore, it has a spectrum. Furthermore, different chemical elements have different spectra; they absorb and reflect spectral energy differently. Different elements are combined to form compounds, and each compound has a unique spectrum due to its unique molecular structure. This is the basis for the application of spectroscopy to identify chemical materials. It is also the basis for remote sensing in discriminating one material from another. The spectrum of a material is like the fingerprint of a human being.

2.2 Major Divisions of Spectral Wavelength Regions

The wavelength of electromagnetic energy has such a wide range that no instrument can measure it completely. Different devices, however, can measure most of the major spectral regions.

The division of spectral wavelengths is based on the devices that can be used to observe particular types of energy, such as thermal, shortwave infrared and microwave energy. In reality, there are no abrupt changes in the magnitude of the spectral energy. The spectrum is conventionally divided into various parts as shown below:


The optical region covers 0.3 - 15 µm, where energy can be collected through lenses. The reflective region, 0.4 - 3.0 µm, is a subdivision of the optical region. In this spectral region, we collect solar energy reflected by the earth surface. Another subdivision of the optical region is the thermal spectral range, between 3 µm and 15 µm, where energy comes primarily from surface emittance. Table 2.1 lists major uses of some spectral wavelength regions.

Table 2.1. Major uses of some spectral wavelength regions

Wavelength          Use                       Wavelength        Use
Gamma ray           Mineral                   1.55 - 1.75 µm    Water content in plant or soil
X ray               Medical                   2.04 - 2.34 µm    Mineral, rock types
Ultraviolet (UV)    Detecting oil spill       10.5 - 12.5 µm    Surface temperature
0.4 - 0.45 µm       Water depth, turbidity    3 cm - 15 cm      Surface relief, soil moisture
0.7 - 1.1 µm        Vegetation vigor          20 cm - 1 m       Canopy penetration, woody biomass

2.3 Radiation Laws

In the reflective spectral region, we are mostly concerned with the reflective properties of an object. In the thermal spectral region, however, we have to rely on the emittance of an object. This is because most matter at ordinary temperatures (the temperature of our environment) emits energy that can be measured. Therefore, we introduce some basics of radiation theory.


The first theory treats electromagnetic radiation as many discrete particles called photons or quanta (terms in Physics). The energy of a quantum is given by

E = hν

where

E = energy of a quantum (joules)
h = Planck's constant, 6.626 x 10^-34 J s
ν = frequency

Since ν = c/λ, where c is the speed of light and λ the wavelength, we have

E = hc/λ

The energy (or radiation) of a quantum is inversely proportional to its wavelength: the longer the wavelength, the smaller the energy, and the shorter the wavelength, the stronger the energy. Thus, energy at very short wavelengths (UV and shorter) is dangerous to human health. If we want to sense emittance from objects at longer wavelengths, we will have to either use very sensitive devices or use a less sensitive device that views a larger area to gather a sufficient amount of energy.

This has implications for remote sensing sensor design. To use the sensing technology available at hand, we have to balance wavelength against spatial resolution. If we wish our sensor to have a higher spatial resolution, we may have to use shorter wavelength regions.
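As a quick numerical illustration of E = hc/λ (a minimal sketch; the wavelength choices are ours), the energy of a single quantum can be compared across the spectrum:

# Energy of a quantum, E = h * c / wavelength, for a few example wavelengths.
h = 6.626e-34   # Planck's constant, J s
c = 3.0e8       # speed of light, m/s

for name, wavelength_um in [("ultraviolet", 0.3), ("red", 0.65),
                            ("thermal IR", 10.0), ("microwave (3 cm)", 3.0e4)]:
    wavelength_m = wavelength_um * 1e-6
    energy = h * c / wavelength_m
    print("%-18s %8.1e m  ->  %.2e J per quantum" % (name, wavelength_m, energy))
# The microwave quantum carries about 10^5 times less energy than the UV quantum,
# which is why long-wavelength sensing needs very sensitive detectors or a larger
# instantaneous field of view.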

The second radiation theory is the Stefan-Boltzmann law:

M = σT^4

where

M = total radiant exitance from the surface of a material (W/m²)
σ = Stefan-Boltzmann constant, 5.6697 x 10^-8 W m^-2 K^-4
T = absolute temperature (K)

This means that any material with a temperature greater than 0 K will emit energy, and the total energy emitted from a surface is proportional to T^4.


This law is expressed for an energy source that behaves as a blackbody - a hypothetical, ideal radiator that absorbs and re-emits all energy incident upon it. Actual matter is not a perfect blackbody. For any matter, we can measure its emitted energy (M) and compare it with the energy emitted from a blackbody at the same temperature (Mb) by:

ε = M / Mb

ε is the emissivity of the matter. A perfect reflector will have nothing to emit; therefore, its ε will be 0. A true blackbody has an ε of 1. Most other matter falls between these two extremes.

The third theory is Wien's displacement law which specifies the relationship between the peak wavelength of emittance and the temperature of a matter.

λmax = 2897.8/T

where λmax is the wavelength of maximum emittance (in µm) and T is the absolute temperature (in K).

As the temperature of a blackbody gets higher, the wavelength at which the blackbody emits its maximum energy becomes shorter.

Figure 2.2. Blackbody radiation.

Figure 2.2 shows blackbody radiation curves for temperatures representative of the Sun, an incandescent lamp and the Earth. During the daytime, the energy from the sun is overwhelming. During the night, however, we can use the spectral region between 3 µm and 16 µm to observe the emittance properties of the earth surface.
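The sketch below applies the Stefan-Boltzmann and Wien laws to rounded temperatures for the Sun and the Earth surface (the temperatures are approximate assumptions) to show why solar emission peaks in the visible while the Earth's emission peaks in the thermal infrared.

# Applying the Stefan-Boltzmann law (M = sigma * T**4) and Wien's displacement
# law (lambda_max = 2897.8 / T, in micrometres) to two approximate temperatures.
sigma = 5.6697e-8   # Stefan-Boltzmann constant, W m^-2 K^-4

for body, temp_k in [("Sun (approx.)", 6000.0), ("Earth surface (approx.)", 300.0)]:
    exitance = sigma * temp_k ** 4          # total radiant exitance, W/m^2
    peak_um = 2897.8 / temp_k               # wavelength of maximum emission, um
    print("%-24s M = %.3e W/m^2, peak at %.2f um" % (body, exitance, peak_um))
# The Sun peaks near 0.48 um (visible); the Earth peaks near 9.7 um (thermal IR).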


At wavelengths longer than the thermal infrared region, i.e. in the microwave region, the energy (radiation) level is very low. Therefore, we often use a human-made energy source to illuminate the target (as in radar) and collect the backscatter from the target. A remote sensing system relying on a human-made energy source is called an "active" remote sensing system. Remote sensing relying on energy sources that are not human-made is called "passive" remote sensing.

2.4 Energy Interactions in the Atmosphere

The atmosphere affects the transfer of EM energy differently at different wavelengths. In this section, we introduce the fact that the atmosphere can have a profound effect on the intensity and spectral composition of the radiation that reaches a remote sensing system. These effects are caused primarily by atmospheric scattering and absorption.

Scattering: The redirection of EM energy by the suspended particles in the air.

Different particle sizes will have different effects on the EM energy propagation.

dp << λ   Rayleigh scattering (Sr)
dp ≈ λ    Mie scattering (Sm)
dp >> λ   Non-selective scattering (Sn)

where dp is the particle diameter and λ the wavelength.
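A small sketch that labels the scattering regime expected for a given particle diameter and wavelength; the factor-of-ten boundaries are illustrative rules of thumb, not physical constants.

# Rough classification of atmospheric scattering regimes by comparing the
# particle diameter dp with the wavelength lam. The factor-of-10 boundaries
# are only illustrative.
def scattering_regime(dp_um, lam_um):
    if dp_um < 0.1 * lam_um:
        return "Rayleigh scattering (dp << lambda)"
    if dp_um > 10.0 * lam_um:
        return "non-selective scattering (dp >> lambda)"
    return "Mie scattering (dp ~ lambda)"

# Gas molecules, aerosols and water droplets against blue light (0.45 um).
for particle, dp in [("gas molecule", 0.0004), ("aerosol", 0.5), ("water droplet", 20.0)]:
    print("%-14s %s" % (particle, scattering_regime(dp, 0.45)))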

The atmosphere can be divided into a number of well marked horizontal layers on the basis of temperature.

Troposphere:


It is the zone where weather phenomena and atmospheric turbulence are most marked. It contains 75% of the total molecular and gaseous mass of the atmosphere and virtually all the water vapour and aerosols.

height 8 - 16 km (pole to equator)

Stratosphere: 50 km (ozone layer)
Mesosphere: 80 km
Thermosphere: 250 km
Exosphere: 500 km ~ 750 km

The atmosphere is a mixture of gases in constant proportions up to 80 km or more above the ground. The exceptions are ozone, which is concentrated in the lower stratosphere, and water vapour in the lower troposphere. Carbon dioxide is another important atmospheric gas whose concentration varies with time; it has been increasing since the beginning of this century due to the burning of fossil fuels. Air is highly compressible: half of its mass occurs in the lowest 5 km, and pressure decreases logarithmically with height from an average sea-level value of 1013 mb.

Figure 2.3 Horizontal layers that divide the atmosphere (Barry and Chorley, 1982)
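As a rough numerical sketch of the pressure statement above, assuming a simple exponential profile calibrated so that half the mass lies below 5 km (the resulting scale height is our assumption, not a value from the text):

import math

# Exponential approximation of pressure with height, p(h) = p0 * exp(-h / H).
# H is chosen so that p(5 km) = p0 / 2, matching "half of the atmosphere's
# mass occurs in the lowest 5 km" (an idealization for illustration only).
p0 = 1013.0                 # average sea-level pressure, mb
H = 5.0 / math.log(2.0)     # scale height, about 7.2 km

for h_km in [0, 5, 10, 16, 50, 80]:
    p = p0 * math.exp(-h_km / H)
    print("h = %3d km  p = %8.2f mb" % (h_km, p))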

Scattering causes degradation of image quality for earth observation. At higher altitudes, images acquired in shorter wavelengths (ultraviolet, blue) contain a large amount of scattered noise which reduces the contrast of an image.


Absorption: the atmosphere selectively absorbs energy at different wavelengths with different intensities.

The atmosphere is composed of N2 (78%), O2 (21%), CO2, H2O, CO, SO2, etc. Since each chemical constituent has its own spectral properties, each gas absorbs energy in particular spectral regions with different intensities. As a result, the atmosphere shows the combined absorption features of its various gases. Figure 2.4 shows the major absorption wavelengths of CO2, H2O, O2 and O3 in the atmosphere.

Figure 2.4 Major absorption wavelengths by CO2, H2O, O2, O3 in the atmosphere


(Source: Lillesand and Kiefer, 1994)

Transmission: The remaining amount of energy after being absorbed and scattered by the atmosphere is transmitted.

H2O is the most variable constituent of the atmosphere, and CO2 varies seasonally. Therefore, the absorption of EM energy by H2O and CO2 is the most difficult to characterize.

Atmospheric Window: It refers to the relatively transparent wavelength regions of the atmosphere.

Atmospheric absorption reduces the number of spectral regions in which we can work when observing the Earth. It affects our decisions in selecting and designing sensors. We have to consider

1) the spectral sensitivity of sensors available;

2) the presence and absence of atmospheric windows;

3) the source, magnitude, and spectral composition of the energy available in these ranges.

For the third point, we have to base our decision of choosing sensors and spectral regions on the manner in which the energy interacts with the target under investigation.

On the other hand, although certain spectral regions may not be as transparent as others, they may be important spectral ranges in the remote sensing of the atmosphere.

2.5 Energy Interactions with the Earth Surface


What happens when EM energy reaches the Earth surface? The total energy is split into three parts: reflected, absorbed, and/or transmitted. Expressed as fractions of the incident energy,

r + a + t = 1

where r is the reflectance, a the absorptance and t the transmittance. r, a and t change from one material to another.

Transmittance (t) has been mentioned earlier: solar energy has to be transmitted through the atmosphere in order to reach the Earth surface. Transmitted energy can also be measured from under water.

In the thermal spectral region, energy is primarily absorbed, and the reflected energy is significantly smaller in magnitude than the emission of a target. Since what is absorbed will be emitted, the absorptance a, or equivalently the emissivity ε, is the parameter of concern in the thermal region.

r is the easiest of the three to measure using remote sensing devices. Therefore, it is the most important parameter for remote sensing observations in the 0.3 - 2.5 µm region. r is called spectral reflectance, or simply reflectance, or the spectral signature.

Our second question is: how is energy reflected by a target? Reflection can be classified into three cases: specular reflector, irregular reflector, and perfect diffuser.

Specular reflection is caused by the surface geometry of a material. It is of little use in remote sensing because the incoming energy is completely reflected in another direction. Still water, ice and many minerals with crystal surfaces have this property.

A perfect diffuse reflector is a material that reflects energy uniformly in all directions. This type of reflector is desirable because it is possible to observe the material from any direction and obtain the same reflectance.

Unfortunately, most targets behave somewhere between the ideal specular reflector and the perfect diffuse reflector. This makes quantitative remote sensing and target identification purely from reflectance data difficult; otherwise, it would be easy to discriminate objects using spectral reflectances from a spectral library. Due to this variability of spectral signatures, one current research direction is to investigate the bidirectional reflectance properties of various targets.

Plotting reflectance against wavelength, we get a spectral reflectance curve. Examples of spectral curves of typical materials such as vegetation, soil and water are shown in Figure 2.5. Clear water has a low spectral reflectance (< 10%) in the visible region. At wavelengths longer than 0.75 µm, water absorbs almost all the incoming energy. Vegetation generally has three reflectance valleys. The one at the red spectral wavelength region (0.65 µm) is caused by high absorptance of energy by chlorophyll a and b in the leaves. The other two, at 1.45-1.55 µm and 1.90-1.95 µm, are caused by high absorptance of energy by water in the leaves. Dry soil has a relatively flat reflectance curve. When it is wet, its spectral reflectance drops due to water absorption.
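A minimal sketch of how the reflectance behaviour just described can separate the three cover types; the reflectance values and thresholds below are invented for illustration and are not from any spectral library.

# Toy separation of water, vegetation and soil using reflectance at a red
# (0.65 um) and a near-infrared (0.85 um) wavelength. Values and thresholds
# are illustrative only.
samples = {
    "clear water": {"red": 0.04, "nir": 0.01},
    "vegetation":  {"red": 0.05, "nir": 0.45},
    "dry soil":    {"red": 0.25, "nir": 0.30},
}

def label(red, nir):
    if nir < 0.05:            # water absorbs almost all energy beyond 0.75 um
        return "water"
    if nir - red > 0.2:       # strong red absorption by chlorophyll, high NIR
        return "vegetation"
    return "soil (relatively flat curve)"

for name, r in samples.items():
    print("%-12s -> %s" % (name, label(r["red"], r["nir"])))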


Figure 2.5 Typical Spectral Reflectance Curves for Soil, Vegetation and Water

(Lillesand and Kiefer, 1994)

Questions

1. Using the scattering properties of the atmosphere, explain why the sky is blue under clear-sky conditions. Why does the sun look red at sunset or sunrise?

2. Why is X-ray used for medical examination? Using radiation law No. 3, explain why, as a piece of iron is heated, its color begins as dark red, then changes to red, to yellow and to white.

3. Describe how one may use absorptance and transmittance of a matter in remote sensing.

4. Using Figures 2.4 and 2.5 as references, answer the following questions:

5. Can we use 6-7 µm to observe the atmosphere?

6. Can we use 0.8-1.0 µm to observe underwater materials such as plankton?

7. Which spectral regions should be used to observe water content in the atmosphere? What about water content in vegetation?

Chapter 2

References

Barry, R.G. and Chorley, R.J., 1982. Atmosphere, Weather and Climate, Longman: London.

Elachi, C., 1987. Introduction to the Physics and Techniques of Remote Sensing, John Wiley and Sons, Inc.: Toronto

Lillesand, T.M. and Kiefer, R.W., 1994, Remote Sensing and Image Interpretation, 3rd Ed., John Wiley and Sons, Inc.: Toronto.


3.1 Camera Systems

A camera system is composed of the camera body, lens, diaphragm, shutter and a film (Figure 3.1):

Figure 3.1. Components of a camera system

The lens collects the energy; it is characterized by its focal length.

The diaphragm controls the amount of energy reaching the film through an adjustable diameter.

The shutter opens and closes; the time between opening and closing controls the amount of energy entering the camera.

The principle of imaging is described by the lens equation:

1/f = 1/do + 1/di

where

f = focal length
do = distance from the lens to the object
di = distance from the lens to the image (Figure 3.2)

For aerial photography,

do >> di

Therefore, di is essentially equal to the focal length f, and the film can be kept at a fixed distance from the lens.


Figure 3.2. The imaging optics
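A quick numerical check of the lens equation for an aerial camera, using an assumed 152 mm focal length and two example object distances; it shows why the film can sit at a fixed distance (essentially the focal length) from the lens.

# Lens equation 1/f = 1/do + 1/di, solved for the image distance di.
def image_distance(f_m, do_m):
    return 1.0 / (1.0 / f_m - 1.0 / do_m)

f = 0.152                       # focal length, m (an assumed example value)
for do in [10.0, 3000.0]:       # object distance, m: close-up vs. flying height
    di = image_distance(f, do)
    print("do = %7.0f m  ->  di = %.4f m (f = %.3f m)" % (do, di, f))
# For do >> di the image distance is essentially the focal length.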

The diameter of the diaphragm controls the depth of field. The smaller the diameter of the opened diaphragm, the wider the distance range over which the scene forms a clearly focused image. The diaphragm diameter can be adjusted to a particular aperture. What we normally see on a camera's aperture setting is F 2.8, 4, 5.6, 8, 11, 16, 22.

These F-numbers are obtained as F = f/d, where d is the diaphragm diameter. When the diameter becomes smaller, the F-number becomes larger and more energy is stopped. The actual amount of energy reaching the film is determined by

E = i t / (4 F²)

where

i = energy intensity (J/(m² s))
t = exposure time (s)
F = the F-number mentioned above
E = exposure (J/m²)
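A small numerical sketch using the exposure relation above (the intensity value is arbitrary): halving the exposure time while opening the diaphragm by one stop leaves the exposure essentially unchanged.

# Film exposure E = i * t / (4 * F**2). Halving t while moving the aperture one
# stop wider (F divided by sqrt(2)) keeps E roughly constant.
def exposure(i, t, F):
    return i * t / (4.0 * F ** 2)

i = 100.0                                                # J/(m^2 s), illustrative value
settings = [(1/125, 8.0), (1/250, 5.6), (1/500, 4.0)]    # (shutter time s, F-number)
for t, F in settings:
    print("t = 1/%d s, F%.1f  ->  E = %.5f J/m^2" % (round(1/t), F, exposure(i, t, F)))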

Films

A film is primarily composed of one or more emulsion layers and a base (Figure 3.3).


Figure 3.3. Layers in black and white films and colour films


The most important part of the film is the emulsion layer. An emulsion layer contains light-sensitive chemicals. When it is exposed to light, a chemical reaction occurs and a latent image is formed. After the film is developed, the emulsion layer shows the image.

Films can be divided into negative and positive, or divided in terms of their ranges of spectral sensitivities: black and white (B/W), B/W Infrared, Color, Color Infrared.

B/W negative films are those on which the brightest parts of the scene appear darkest and the darker parts of the scene appear brighter on the developed film.

Color negative films are those on which a color from the scene is recorded as its complementary color.

There are two important aspects of a film: its spectral sensitivity and its characteristic curve.

Spectral sensitivity specifies the spectral region to which a film is sensitive (Figure 3.4).

Figure 3.4. The sensitivity of films and the transmittance of a filter

Since infrared film is also sensitive to visible light, the visible light should be intercepted by some material. This is done by optical filtering (Figure 3.4). In this case, a dark red filter can be used to intercept the visible light.

Similarly, other filters can be used to stop light at certain spectral ranges from reaching the film.

Characteristic curve indicates the radiative response of a film to the energy level.



Figure 3.5. Film characteristic curves.

If the density of a film develops quickly when the film is exposed to light, we say that the film is fast (Fig. 3.5a); otherwise, the film is slow (Fig. 3.5b). Film speed is indicated by labels such as ASA 100, ASA 200, ..., ASA 1000. The greater the ASA number, the faster the film. High-speed films give good contrast on the image, but low-speed films provide better detail.

Color Films

There are two sets of color primaries: additive primaries and subtractive primaries:

• The three additive primaries are red, green, and blue.
• The three subtractive primaries are cyan, magenta, and yellow.

All colors can be produced by mixing the primary colors in different proportions.

Figure 3.6. Additive and subtractive colors


Additive colors apply to the mixing of light (Fig. 3.6a), while subtractive colors are used for the mixing of paints in printing (Fig. 3.6b). In order to represent colors on a medium such as film or colour photographic paper, subtractive colors are needed.

Color Negative Films

Figure 3.7 shows the structure of the three emulsion layers of a color negative film. Figure 3.8 shows the spectral sensitivities of each emulsion layer of the film. It can be seen from Figure 3.8 that the green and red emulsion layers are also sensitive to blue. Therefore, film producers add a yellow filter to stop blue light from reaching the green and red emulsion layers (Figure 3.7).

Figure 3.7. Layers in a standard colour film

Figure 3.8. The approximate spectral sensitivities of the three layers.

The development procedure for the colour negative film is shown in Figure 3.9.


Color Infrared Films

The sensitivity curve of a colour infrared film is shown in Figure 3.10. Figure 3.11 shows the structure of this type of film.

Figure 3.9 summarizes the development of a colour negative film: as light of the three primary colours passes through the film, the blue-, green- and red-sensitive layers form yellow, magenta and cyan dyes respectively after development; printing the developed film with white light then reproduces the original blue, green and red on the print.

Figure 3.9. The development procedure of a colour film


Figure 3.10. The sensitivity curves of a colour infrared film

Film (layer sensitivity)    Dye
NIR + B                     cyan dye forming layer
G + B                       yellow dye forming layer
R + B                       magenta dye forming layer

Figure 3.11. Three layers of a colour-infrared film

The CIR film development process (Figure 3.12) works through the three dye-forming layers listed above, followed by exposure of colour photographic paper with white light passed through the developed film. The net result is that blue in the original scene is rendered black, while green, red and near-infrared are displayed as blue, green and red respectively.

Figure 3.12. The procedure used to develop a colour-infrared film.

3.2 Aerial Photography

In texts on aerial photogrammetry or photo-interpretation, various types of aerial photographic cameras are discussed in detail (e.g. Lillesand and Kiefer, 1994; Paine, 1981). The implications of flight height, photographical orientation, and view angle on aerial photographic products are briefly discussed here.

Flight Height

For a given focal length of an aerial camera, the higher the camera is, the larger the area each aerial photo can cover. Obviously, the scale of aerial photographs taken at higher altitudes will be smaller than those taken at lower altitudes.

However, photographs taken at higher altitudes will be severely affected by the atmosphere. This is particularly true when films sensitive to shorter wavelengths are used. Thus, ultraviolet and blue should be avoided at higher altitudes. Instead, CIR or BWIR is more suitable.
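The scale statement above can be made concrete with the standard photogrammetric relation for a vertical photograph over flat terrain, scale ≈ f / H', where f is the focal length and H' the flying height above the ground; this relation and the numbers below are illustrative assumptions rather than values given in the text.

# Approximate scale of a vertical aerial photograph: scale = f / H'.
f_mm = 152.0                     # example focal length, mm

for height_m in [1500.0, 3000.0, 6000.0]:
    scale = (f_mm / 1000.0) / height_m        # dimensionless ratio
    print("H' = %5.0f m  ->  scale 1:%.0f" % (height_m, 1.0 / scale))
# Doubling the flying height halves the photo scale.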

Camera Orientation

Two types of camera orientation may be used: vertical and oblique (slant) (Figure 3.13). Oblique photography allows one to cover a larger area, while vertical photography gives less distortion in photo scale.


Figure 3.13. Vertical and slant aerial photography

View Angle:

The view angle is normally determined by the focal length and the frame size of the film. For a given camera, the frame size is fixed; therefore the ground coverage is determined by the altitude and the camera viewing angle (Figure 3.14).

f1 > f2 > f3

a1 < a2 < a3

Figure 3.14. Viewing angle determined by the focal length

Typical focal lengths:

Lens type       Normal camera    Aerial camera
Normal lens     50 mm            300 mm
Wide angle      28 mm            150 mm
Fisheye lens    7 mm             88 mm

Obviously, wide angles allow a larger area to be photographed.

Photographic Resolution


Spatial resolution of aerial photographs is largely dependent on the following factors:

• Lens resolution - optical quality
• Film resolution
• Film flatness - normally not a problem
• Atmospheric conditions - change all the time
• Aircraft vibration and motion - random

Film resolution depends mainly on granularity.

The standard definition of photographic resolution is the maximum number of line pairs per mm that can be distinguished on a film when imaging a resolution target (Figure 3.15).

If the scale of an aerial photograph is known, we can convert the photographic resolution (rs) to ground resolution.

Figure 3.15. Resolving power test chart (from Lillesand and Kiefer, 1994).
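A sketch of that conversion, assuming the usual interpretation that one resolvable line pair occupies 1/rs mm on the film and that film distances multiply by the photo scale denominator on the ground; the rs and scale values below are examples only.

# Converting film resolution rs (line pairs per mm) to a ground resolution
# distance, given the photo scale 1:S.
def ground_resolution_m(rs_lp_per_mm, scale_denominator):
    film_mm_per_line_pair = 1.0 / rs_lp_per_mm
    ground_mm = film_mm_per_line_pair * scale_denominator
    return ground_mm / 1000.0                     # metres per resolvable line pair

for rs, S in [(40.0, 20000), (100.0, 20000), (40.0, 50000)]:
    print("rs = %5.1f lp/mm at 1:%d  ->  %.2f m on the ground"
          % (rs, S, ground_resolution_m(rs, S)))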

Ground Coverage

A photograph may have a small coverage if it is taken either at a low flight height or with a narrower viewing angle.

The advantages of photographs with small coverage are that they provide more detail and less distortion and displacement. It is easier to analyze a photograph with a small coverage because similar targets will have less distortion from the center to the edge of the photograph, and from one photograph to another.

The disadvantage of photographs with small coverage is that more flight time is needed to cover an area, and thus the cost is higher. Moreover, mosaicking may introduce more distortion.

A large coverage can be obtained by taking the photograph from a higher altitude or using a wider angle. Photographs with a large coverage are likely to have poorer photographic resolution due to the larger viewing angle and the stronger atmospheric effects. The advantages are that a large area is imaged simultaneously, less geometric mosaicking is required, and the cost is lower.

The disadvantages are that it is difficult to analyze targets in detail and that targets are more severely distorted.

Essentially, the size of photo coverage is related to the scale of the raw aerial photographs. Choosing photographs with a large coverage or a small one should be based on the following:

• budget at hand
• task
• equipment available

The following are some of the advantages/disadvantages of aerial photography in comparison with other types of data acquisition systems:

Advantages:

• High resolution (ground)
• Flexibility
• High geometric reliability
• Relatively inexpensive

Disadvantages:

• Daylight exposure (10:00 am - 2:00 pm) required
• Poorer contrast at shorter wavelengths
• Film non-reusable
• Inconvenient
• Inefficient for digital analysis


3.3 Satellite-Borne Multispectral Systems

What are the differences between a camera system and a scanning system? The following are some of the major differences:

• A rotating mirror is added in front of the lens of the camera.
• In a scanning system, film is replaced by photosensitive detectors and magnetic tapes, which are used to store the collected spectral energy (Figure 3.16).

Figure 3.16. A multispectral scanning system

Landsat Multispectral Scanner System

The first of the Landsat series was launched in 1972. The satellite was called the Earth Resources Technology Satellite (ERTS-1); it was later renamed Landsat-1. On board Landsat-1 were two sensing systems: the multispectral scanner system (MSS) and the return beam vidicon (RBV). The RBV was discontinued after Landsat-3. The MSS is briefly introduced here because it is still being used. The MSS sensor has 6 detectors per band (Figure 3.17). The scanned radiance is measured in four image bands (Figure 3.18).


Figure 3.17. Each scan will collect six image lines.

Figure 3.18. Four image bands with six detectors in each band.

MSSs have been used on Landsat - 1, 2, 3, 4, 5. They are reliable systems. The spectral region of each band is listed below:

Landsat 1, 2    Spectral region    Landsat 4, 5
B4              0.5 - 0.6 µm       B1
B5              0.6 - 0.7 µm       B2
B6              0.7 - 0.8 µm       B3
B7              0.8 - 1.1 µm       B4

Landsat 3 had a short life. The MSS systems on Landsat 3 were modified as compared to Landsat 1 and 2. Landsat-6 was launched unsuccessfully in 1993.

Each MSS scene covers an area of 185 km x 185 km, with a spatial resolution of 79 m x 57 m. An advantage of MSS data is that they are less expensive. Sometimes one detector returns no signal, or a signal much different from the others, creating banding or striping. We will discuss methods for correcting these problems in Chapter 5.

Landsat Thematic Mapper System

Since the launch of Landsat 4 in 1982, a new type of scanner, called the Thematic Mapper (TM), has been introduced. The TM:


• increased the number of spectral bands;
• improved spatial and spectral resolution;
• increased the angle of view from 11.56° to 14.92°.

Band    Spectral range       Spatial resolution
TM1     0.45 - 0.52 µm       30 m
TM2     0.52 - 0.60 µm       30 m
TM3     0.63 - 0.69 µm       30 m
TM4     0.76 - 0.90 µm       30 m
TM5     1.55 - 1.75 µm       30 m
TM7     2.08 - 2.35 µm       30 m
TM6     10.4 - 12.5 µm       120 m

MSS data are collected in only one scanning direction; TM data are collected in both scanning directions (Figure 3.19).

Figure 3.19. Major changes of the TM system as compared to the MSS system.


High Resolution Visible (HRV) Sensors

A French satellite called 'Le Système Pour l'observation de la Terre' (SPOT) (Earth Observation System) was launched in 1986. On board this satellite, a different type of sensors called High Resolution Visible (HRV) were used. The HRV sensors have two modes: the panchromatic (PAN) mode and the multispectral (XS) mode.

The HRV panchromatic sensor has a relatively wide spectral range, 0.51 - 0.73 µm, with a higher spatial resolution of 10 x 10 m².

HRV multispectral (XS) mode:

B1    0.50 - 0.59 µm
B2    0.61 - 0.68 µm
B3    0.79 - 0.89 µm

The spatial resolution of the multispectral (XS) mode is 20 x 20 m².

Besides the difference of spectral and spatial resolution design from the Landsat sensor systems, major differences between MSS/TM and HRV are the use of linear array (also called pushbroom) detectors and the off-nadir observation capabilities with the HRV sensors (Figure 3.20). Instead of mirror rotation in the MSS or the TM sensors which collect data using only a few detectors, the SPOT HRV sensors use thousands of detectors arranged in arrays called "charge-coupled devices" (CCDs). This has significantly reduced the weight of the sensing system and power requirement.


Figure 3.20. The SPOT HRV systems

A mirror with the view angle of 4.13° is used to allow ±27° off nadir observation. An advantage of the off-nadir viewing capability is that it allows more frequent observations of certain targeted area on the earth and acquisitions of stereo-pair images. A disadvantage of the HRV sensors is the difficulties involved in calibrating thousands of detectors. The radiometric resolution of MSS is 6 to 7 bits, while both TM and HRVs have an 8 bit radiometric resolution.

The orbital cycle is 18 days for Landsats 1 - 3, 16 days for Landsats 4 and 5, and 26 days for SPOT-1 (the SPOT HRV sensors can revisit the same target every 3 to 5 days due to their off-nadir observing capability).

AVHRR - Advanced Very High Resolution Radiometer

Among the many meteorological satellites, the Advanced Very High Resolution Radiometers (AVHRR) on board the NOAA series (NOAA-6 through 12) have been widely used.


The NOAA series is named after the National Oceanic and Atmospheric Administration of the United States.

The AVHRR sensor has 5 spectral channels:

B1    0.58 - 0.68 µm
B2    0.72 - 1.10 µm
B3    3.55 - 3.95 µm
B4    10.3 - 11.3 µm
B5    11.5 - 12.5 µm

Swath width: 2400 km

The orbit repeating cycle is twice daily. This is an important feature for frequent monitoring. NOAA AVHRRs have been used for large scale vegetation and sea ice studies at continental and global scales.

Earth Observing System (EOS)

To document and understand global change, NASA initiated Mission to Planet Earth. This is a program involving international efforts to measure the Earth from space and ground. Earth Observing System is a primary component of the Mission to Planet Earth. EOS includes the launch of a series of satellites with advanced sensor systems by the end of this century. Those sensors will be used to measure most of the measurable aspects of the land, ocean and atmosphere, such as cloud, snow, ice, temperature, land productivity, ocean productivity, ocean circulation, atmospheric chemistry, etc.

Among the various sensors on board the first six satellites to be launched, there is a sensor called the Moderate Resolution Imaging Spectrometer (MODIS). It has 36 narrow spectral bands between 0.4 and 14.4 µm. The spatial resolution changes as the spectral band changes.


Two bands have 250 m, 5 have 500 m while the rest have 1000 m resolution. The sensor is planned to provide data covering the entire Earth daily.

Other Satellite Sensors

GOES - Geostationary Operational Environmental Satellite (Visible to NIR, Thermal)

DMSP - Defense Meteorological Satellite Program, 600 m resolution (visible to NIR, thermal); used, for example, for urban heat island studies

Nimbus - CZCS - coastal zone color scanner, 825 m spatial resolution

Channels (6 total)    Spectral resolution
1 - 4                 0.02 µm, for chlorophyll absorption studies
5 - 6                 NIR - thermal

Two private companies, Lockheed, Inc. and Worldview, Inc., are planning to launch their own commercial satellites in 2-3 years' time, with spatial resolutions ranging from 1 m to 3 m. In Japan, NASDA (the National Space Development Agency) has developed the Marine Observation System (MOS). On board this system, there is a sensor called the Multispectral Electronic Self-scanning Radiometer (MESSR) with spectral bands similar to those of the Landsat MSS systems. However, the spatial resolution of the MESSR system is 50 x 50 m².

Other countries such as India and the former USSR have also launched Earth resources satellites with different optical sensors.

3.4 Airborne Multispectral Systems

Multispectral scanners

The mechanism of airborne multispectral sensors is similar to that of the Landsat MSS and TM. Airborne sensor systems usually have more spectral bands, ranging from the ultraviolet through the visible and near infrared to the thermal region. For example, the Daedalus MSS is a widely used system that has 11 channels, with the first 10 channels ranging from 0.38 to 1.06 µm and the 11th being a thermal channel (9.75 - 12.25 µm).


Another airborne multispectral scanner used for experimental purposes is the TIMS - Thermal Infrared Multispectral Scanner. It has 6 channels: 8.2 - 8.6; 8.6 - 9.0; 9.0 - 9.4; 9.4 - 10.2; 10.2 - 11.2; and 11.2 - 12.2 µm.

MEIS-II

The Canada Centre for Remote Sensing developed the Multispectral Electro-optical Imaging Scanner (MEIS-II). It uses 1728-element linear CCD arrays that acquire data in eight spectral bands ranging from 0.39 to 1.1 µm. The spatial resolution of MEIS-II can reach up to 0.3 m.

Advantages of multispectral systems over photographic systems are

• Spectral range: photographic systems operate between 0.3 - 1.2 µm, while multispectral systems operate between 0.3 - 14 µm.

• Multiband photography (photographic system) uses different optical systems to acquire photos. This leads to problems in data incomparability among different cameras. MSS, on the other hand, uses the same optical system, eliminating the data incomparability problems.

• The electronic process used in MSS is easier to calibrate than the photochemical process used in photographic systems.

• Data transmission is easier for MSS than for photographic systems, which require an onboard supply of film.

• Visual interpretation for photographic systems vs. digital analysis for MSS systems: visual analysis is difficult to carry out in more than three dimensions (bands).

False Color Composite

Only three colours (red, green and blue) can be used at a time to display data on a colour monitor. The colours used to display an image may not be the actual colours of the spectral bands used to acquire the image. Images displayed with such colour combinations are called false colour composites. We can make many 3-band combinations out of a multispectral image:

Nc = nb! / (3!(nb - 3)!)

where Nc is the total number of 3-band combinations and nb is the number of spectral bands in a multispectral image. For each of these 3-band combinations, we can use red, green, and blue to represent each band and obtain a false-colour image.
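As a quick check of how many false-colour composites are possible, assuming Nc counts unordered band triplets (each triplet can then be assigned to red, green and blue for display):

from math import comb

# Number of unordered 3-band combinations that can be chosen from nb bands.
for sensor, nb in [("Landsat MSS", 4), ("Landsat TM", 7), ("CASI spectral mode", 288)]:
    print("%-20s nb = %3d  ->  Nc = %d" % (sensor, nb, comb(nb, 3)))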

Digital Photography with CCD Arrays

Videographic imaging includes the use of video cameras and digital CCD cameras. Video images can be frame-grabbed, or quantized and stored as digital images; however, the image resolution is relatively low (up to about 550 lines per image). Digital CCD cameras use two-dimensional silicon-based charge-coupled devices that produce a digital image in standard raster format. CCD detectors arranged in imaging chips of approximately 1024 x 1024 or more photosites produce an 8-bit image (King, 1992).

Digital CCD photography compares favorably with other technologies such as traditional photography, videography, and line scanning. Compared with photography, digital CCD cameras have a linear response, greater radiometric sensitivity, wider spectral response, greater geometric stability, and no need for a film supply (Lenz and Fritsch, 1990; King, 1992). Together with the fast development of softcopy photogrammetry, they have the potential to replace aerial photography and photogrammetry for surveying and mapping.

Imaging Spectrometry

Imaging spectrometry refers to the acquisition of images in many, very narrow, continuous spectral bands.

The spectral region can range from visible, near-IR to mid-IR.

• The first imaging spectrometer was developed in 1983 by JPL. The system called Airborne Imaging Spectrometer (AIS) collects data in 128 channels from 1.2 µm to 2.4 mm. Each image acquired has only 32 pixels in a line.

• The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) is an immediate follow-up of the AIS (1987). It collects 224 bands from 0.40 to 2.45 µm with 512 pixels in each line.

• In Canada, the first system was the FLI - Fluorescence Line Imager - manufactured by Moniteq, a company that used to be located in Toronto, Ontario.


In Calgary, ITRES Research produces another imaging spectrometer called the Compact Airborne Spectrographic Imager (CASI) (Figure 3.21).

Figure 3.21. The two dimensional linear array of the CASI.

For each line of ground targets, nb × ns data values are collected at a 2-byte (16-bit) radiometric resolution, where nb is the number of spectral bands and ns is the number of pixels in a line.

Due to data transmission rate constraints, these nb × ns values cannot all be transferred. CASI therefore has two operation modes: a spectral mode and a spatial mode.

In spectral mode, all 288 spectral bands are used, but only up to 39 spatial pixels (look directions) can be transferred.

In the spatial mode, all 512 spatial pixels are used, but only up to 16 spectral bands can be selected.
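To see why the two modes are needed, the following sketch compares the per-line data volumes implied by the figures above (2 bytes per sample). The numbers are those quoted in the text; the "full frame" case is shown only for comparison.

    BYTES_PER_SAMPLE = 2   # 16-bit radiometric resolution

    def line_bytes(n_bands, n_pixels):
        # data volume for one image line: nb x ns samples at 2 bytes each
        return n_bands * n_pixels * BYTES_PER_SAMPLE

    full_frame    = line_bytes(288, 512)   # all bands and all pixels (too much to transfer)
    spectral_mode = line_bytes(288, 39)    # all 288 bands, up to 39 look directions
    spatial_mode  = line_bytes(16, 512)    # up to 16 bands, all 512 pixels
    print(full_frame, spectral_mode, spatial_mode)   # 294912, 22464, 16384 bytes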


Where to obtain remote sensing data?

See the Appendix in Lillesand and Kiefer (1994).

3.5 Microwave Remote Sensing

Radar represents "radio detection and ranging". As we mentioned before it is an active sensing system. It uses its own energy source - microwave energy. A radar system transmits pulses in the direction of interest and records the strength and origin of "echos" or reflection received from objects within the system's field of view.

Radar systems may or may not produce images.

Non-imaging (ground based):
• Doppler radar - used to measure vehicle speeds.
• Plan Position Indicator (PPI) - used to observe weather systems or air traffic.

Imaging (airborne):
• Side-looking airborne radar (SLAR)

SLAR systems produce continuous strips of imagery depicting very large ground areas located adjacent to the aircraft flight line. Since clouds are transparent in the microwave region, SLAR has been used to map tropical areas such as the Amazon River Basin. The project RADAM (Radar of the Amazon), which started in 1971 and ended in 1976, was the largest radar mapping project ever undertaken; in this project the Amazon area was mapped for the first time. In such remote and cloud-covered areas of the world, a radar system is a prime source of information for mineral exploration, forest and range inventory, water supply and transportation management, and site suitability assessment.

Radar imagery is currently neither as available nor as well understood as other image products. An increasing amount of research is being conducted on the interaction mechanisms between energy and surface targets, such as forest canopies, and on the combination of radar images with other image products.

SLAR system organization and operation are shown in Figure 3.22.


Figure 3.22. Radar system components and organization

Spatial Resolution of SLAR systems

The ground resolution of SLAR system is determined by two independent sensing parameters: pulse length and antenna beam width (Figure 3.23).

The time period for a transmitted signal to travel through the air, reach the target and be scattered back to the antenna can be measured. We can then determine the distance, or 'slant range', between the antenna and the target:

Sr = c · t / 2

where

• Sr: the slant range.
• c: the speed of light.
• t: the time period for a returned transmitted pulse.


Figure 3.23

From Figure 3.23, it can be seen that SLAR depends on the time it takes for a transmitted pulse being scattered back to the antenna to determine the position of a target.

In the across-track direction, the spatial resolution is determined by the duration of the pulse and the depression angle (Figure 3.24). This resolution is called the ground range resolution, rg.

Figure 3.24. The across track spatial resolution


The along track distinguishing ability of a SLAR system is called azimuth resolution ra:

Figure 3.25. The sidelobes of a RADAR signal

It is obvious that in order to minimize rg, one needs to reduce τ. For ra, the optimal situation is determined by the beam width β, which is a function of the wavelength λ and the antenna length α (β ≈ λ/α). The antenna length α can be the actual physical length of an antenna or a synthetic one.

Those systems whose beam width is controlled by the physical antenna length are called brute force or real aperture radar.
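The standard SLAR relations discussed above can be sketched as follows; the pulse length, depression angle, wavelength and antenna length used here are illustrative values only, not the specifications of any particular system.

    import math

    C = 3.0e8   # speed of light, m/s

    def slant_range(t):
        # Sr = c * t / 2 : half the round-trip travel time of the pulse
        return C * t / 2.0

    def ground_range_resolution(tau, depression_deg):
        # rg = c * tau / (2 * cos(depression angle))
        return C * tau / (2.0 * math.cos(math.radians(depression_deg)))

    def azimuth_resolution(ground_range, wavelength, antenna_length):
        # beam width beta ~ wavelength / antenna length; ra = ground range * beta
        return ground_range * wavelength / antenna_length

    print(slant_range(60e-6))                      # ~9000 m for a 60 microsecond delay
    print(ground_range_resolution(0.1e-6, 45.0))   # ~21 m for a 0.1 microsecond pulse
    print(azimuth_resolution(9000.0, 0.23, 4.0))   # ~518 m for L band with a 4 m antenna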


For a real aperture radar, the physical antenna length must be considerably longer than the wavelength in order to achieve higher azimuth resolution. Obviously, it has a limit at which the dimension of the antenna is not realistic to be put onboard an aircraft or a satellite.

This limitation is overcome in synthetic aperture radar (SAR) systems. Such systems use a short physical antenna but, through modified data recording and processing techniques, synthesize the effect of a very long antenna. This is achieved by making use of the Doppler effect (Figure 3.26).

Figure 3.26. The use of Doppler Effect

Synthetic aperture radar records the frequency differences of the backscattered signal at different aircraft positions during the time period when the target is illuminated by the transmitted energy.

SAR records both amplitude and frequency of backscattering signals of objects throughout the time period in which they are within the beam of moving antenna. These signals are recorded on tapes or on films. This leads to two types of data processing.

One of the problems associated with processing radar signals from tape is that the signal is contaminated by random noise. When displayed on a video monitor, the radar image tends to have a noisy or speckled appearance. Later, in the digital analysis section, we will discuss speckle reduction strategies.

Radar Equation Derivation


What we actually measure is the backscattered power Pr, in watts.

The antenna transmits Pt watts.

At a distance R the power is

Power received at Antenna:

If the same antenna is used for both transmitting and receiving, then
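The steps above lead to the standard monostatic form of the radar equation. The sketch below shows that standard form; the symbol sigma stands for the target term (written δ in the text) and all numerical values are purely illustrative.

    import math

    def received_power(pt, gain, wavelength, sigma, r):
        # Pr = Pt * G^2 * lambda^2 * sigma / ((4*pi)^3 * R^4)
        return pt * gain**2 * wavelength**2 * sigma / ((4.0 * math.pi)**3 * r**4)

    # illustrative values: 1 kW peak power, antenna gain 1000, C band, 10 km range
    print(received_power(pt=1e3, gain=1e3, wavelength=0.057, sigma=1.0, r=1e4))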


All parameters in this formula except δ are determined by the system. Only δ is related to the ground target. Unfortunately, δ is a poorly understood parameter, which largely limits its use in remote sensing.

We know δ is related not only to system variables including wavelength, polarization, azimuth, landscape orientation, and depression angle, but also to landscape parameters including surface roughness, soil moisture, vegetation cover, and micro topography.

• Moisture influences the dielectric constant of the target, which in turn can significantly change the backscattering pattern of the signal. Moisture also stops microwaves from penetrating the target.

• Roughness - the standard deviation S(h) of the heights of individual facets.

In the field, we use an array of sticks arranged parallel to each other at a constant distance interval to measure the surface roughness.

A common definition of a rough surface is one whose S(h) exceeds one eighth of the wavelength divided by the cosine of the incidence angle

As illustrated in the spectral reflectance section, a smooth surface tends to reflect all the incoming energy at an angle equal to the incidence angle, while a rough surface tends to scatter the incoming energy in more or less all directions.
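The roughness criterion above can be written as a simple test, as sketched below; the height deviation and angles are illustrative values.

    import math

    def is_rough(s_h, wavelength, incidence_deg):
        # rough if S(h) > wavelength / (8 * cos(incidence angle))
        return s_h > wavelength / (8.0 * math.cos(math.radians(incidence_deg)))

    # a 1 cm height variation at 30 degrees: smooth for L band, rough for X band
    print(is_rough(0.01, wavelength=0.23, incidence_deg=30))   # False
    print(is_rough(0.01, wavelength=0.03, incidence_deg=30))   # True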

• Polarization


Microwave energy can be transmitted and received by the antenna at a selected orientation of the electromagnetic field. The orientation, or polarization, of the EM field is labelled as horizontal (H) or vertical (V). The antenna can transmit using either polarization. This makes it possible for a radar system to operate in any of four modes: transmit H and receive H, transmit H and receive V, transmit V and receive H, and transmit V and receive V. By operating in different modes, the polarizing characteristics of ground targets can be obtained.

• Corner reflector

It tends to collect the reflected signal at its foreground and return the signal to the antenna.

Microwave Bands

Band   Wavelength       Frequency
Ka     0.75 - 1.1 cm    40 - 26.5 GHz
K      1.1 - 1.67 cm    26.5 - 18 GHz
Ku     1.67 - 2.4 cm    18 - 12.5 GHz
X      2.4 - 3.75 cm    12.5 - 8 GHz
C      3.75 - 7.5 cm    8 - 4 GHz
S      7.5 - 15 cm      4 - 2 GHz
L      15 - 30 cm       2 - 1 GHz
P      30 - 100 cm      1 GHz - 300 MHz

Geometric Aspects

Radar uses two types of image recording systems, a slant-range image recording system and a ground-range image recording system.

In slant-range recording system, the spacing of targets is proportional to the time interval between returning signals from adjacent targets.


In ground-range image recording system, the spacing is corrected to be approximately proportional to the horizontal ground distance between ground targets.

If the terrain is flat, we can convert the slant-range spacing SR to Ground range GR
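A minimal sketch of one common flat-terrain conversion is given below, assuming the flying height H is known; the SR and H values are illustrative.

    import math

    def ground_range(slant_range, altitude):
        # flat terrain: GR = sqrt(SR^2 - H^2)
        return math.sqrt(slant_range**2 - altitude**2)

    print(ground_range(slant_range=9000.0, altitude=6000.0))   # ~6708 m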

• Relief distortion

Relief displacement on SLAR images differs from that on photographs.


Space-borne radars

• Seasat - launched in 1978, duration 98 days

• Frequency: L band
• Swath width: 100 km, centred at 20° from nadir
• Polarization: HH
• Ground resolution: 25 m × 25 m

• Shuttle Imaging Radar, SIR-A, SIR-B, SIR-C

• The European Space Agency launched a satellite in 1991: ERS-1, with a C-band SAR sensor.

• In 1992, the Japanese JERS-1 satellite was launched with an L-band radar on board. The L-band radar has a higher penetration capability than the C-band SAR.

• Radarsat

• Scheduled to be launched in mid-1995, Radarsat will carry a SAR system that is very flexible in terms of incidence angle, resolution, number of looks and swath width.


Frequency: C band, 5.3 GHz
Altitude: 792 km
Repeat cycle: 16 days; subcycle: 3 days
Period: 100.7 min (about 14 orbits per day)
Equatorial crossing: 6:00 a.m.

Radarsat platform and satellite orbits: see Campbell's book, pp. 118-129.

Chapter 3

References

Ahmed, S. and H.R. Warren, 1989. The Radarsat System. IGARSS'89/12th Canadian Symposium on Remote Sensing. Vol. 1. pp.213-217.

Anger, C.D., S. K. Babey, and R. J. Adamson, 1990, A New Approach to Imaging Spectroscopy, SPIE Proceedings, Imaging Spectroscopy of the Terrestrial Environment, 1298: 72 - 86. - specifically, CASI

Curlander, J.C., and McDonough R. N., 1991. Synthetic Aperture Radar, Systems & Signal Processing. John Wiley and Sons: New York.

Elachi, C., 1987. Introduction to the Physics and Techniques of Remote Sensing. John Wiley and Sons, New York.

King, D., 1992. Development and application of an airborne multispectral digital frame camera sensor. XVIIth Congress of ISPRS, International Archives of Photogrammetry and Remote Sensing. B1:190-192.

Lenz, R. and D. Fritsch, 1990. Accuracy of videometry with CCD sensors. ISPRS Journal of Photogrammetry and Remote Sensing, 90-110.

Lillesand, T.M. and Kiefer, R.W., 1994, Remote Sensing and Image Interpretation, 3rd. Ed., John Wiley and Sons, Inc.: Toronto.

Luscombe, A.P., 1989. The Radarsat Synthetic Aperture Radar System. IGARSS'89/12th Canadian Symposium on Remote Sensing. Vol. 1. pp.218-221.

Staenz, K., 1992. A decade of imaging spectrometry in Canada. Canadian Journal of Remote Sensing. 18(4):187-197.


4.1 Digital Imagery

Unlike the Cartesian coordinate system, the origin and axes of an image coordinate system take the following form for printing and processing purposes:

Figure 4.1. An image coordinate system

Each picture element in an image, called a pixel, has coordinates (x, y) in a discrete space that samples the continuous earth surface. Image pixel values represent samples of the surface radiance. The pixel value is also called the image intensity, image brightness or grey level. In a multispectral image, a pixel has more than one grey level, each corresponding to a spectral band. These grey levels can be treated as grey-level vectors.

From the continuous physical space to the discrete image space, a quantization process is needed. The details of quantization is determined by how we do sampling and what kind of resolution we use. General concepts on sampling and resolution have been introduced in Chapter 1.

Two concepts are of particular importance: image space and feature space. Image space refers to the spatial coordinates of an image (or images), denoted I, with m × n elements, where m and n are the number of rows and columns of the image. The elements of image space, I(i,j) (i = 1, 2, ..., m; j = 1, 2, ..., n), are image pixels. They represent spatial sampling units from which electromagnetic energy or other phenomena are recorded. All possible image pixel values constitute the feature space V. One band of an image constitutes a one-dimensional feature space. The k bands of an image, denoted Ik, construct a k-dimensional feature space Vk. Each element of Vk is a unit hypercube whose coordinate is a k-dimensional vector v = (v1, v2, ..., vk)T. When k = 1, 2, and 3, the hypercube becomes a unit line, a unit area, and a unit cube, respectively. Each pixel in image space has one and only one vector in feature space; different pixels may have the same vector in feature space.


Multispectral images construct a special feature space, a multispectral space Sk. In Sk, each unit is a grey-level vector g = (g1, g2, ..., gk)T; in multispectral images, each pixel has a grey-level vector. Other types of images add additional dimensions to the feature space. Various operations can be performed in the feature space. One of these operations is to classify the feature space into groups with similar grey-level vectors and to give each group the same label with a specific meaning. The classification decision for each image pixel is made in feature space and the classification result is represented in image space. Such an image is a thematic image, which can also be used as an additional dimension in feature space for further analysis.

4.1.1 Pixel Window

A pixel window is defined in image space as a group of neighbouring pixels. For computational simplicity, a square pixel neighbourhood wl(i,j) centred at pixel I(i,j) with a window lateral length of l is preferred. Without further explanation, we refer to a pixel window as wl(i,j). In order to ensure that I(i,j) is located at the centre of the pixel window, l must be an odd number. The size of a pixel window wl(i,j) is obviously l × l. The following condition holds for a pixel window: 1 ≤ l ≤ min(m, n), with l odd.

This means that the minimum pixel window is the centre pixel itself, and the maximum pixel window could be the entire image space, provided that the image space is a square with an odd number of rows and columns. When the image space has more than one image, a pixel window can be used to refer to a window located in any one image or any combinations of those images.
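A pixel window wl(i,j) can be extracted from an image array as sketched below; numpy is assumed, and handling of windows that fall off the image edge is left to the caller.

    import numpy as np

    def pixel_window(image, i, j, l):
        # square window of odd lateral length l centred at pixel (i, j)
        assert l % 2 == 1, "window length must be odd"
        half = l // 2
        return image[i - half:i + half + 1, j - half:j + half + 1]

    I = np.arange(49).reshape(7, 7)    # a small 7 x 7 test image
    print(pixel_window(I, 3, 3, 3))    # the 3 x 3 neighbourhood of pixel (3, 3)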

4.1.2 Image Histogram

A histogram has two meanings: a table of the occurrence frequencies of all vectors in feature space, or a graph plotting these frequencies against the grey-level vectors. The occurrence frequency in the histogram is the number of pixels in the image segment having the same vector. When the entire image space is used as the image segment, the histogram is referred to as h(I). When a histogram is generated from a specific pixel window, it is identified as hl(i,j), where l, i, and j are the same as above. In practice, the one-dimensional feature space is mainly used. In this case, a histogram is a graphical representation of a table with each grey level as an entry. Corresponding to each grey level is its occurrence frequency f(vi), i = 0, 1, 2, ..., Nv-1, where Nv is the number of grey levels of an image (e.g., Nv = 8 in Figure 4.2).


Figure 4.2. An example histogram

From a histogram h(I) we can derive the cumulative histogram hc(I) = {fc(vi), i = 0, 1, 2, ..., Nv-1}. This is obtained, for each grey level, by summing all the frequencies whose grey levels are not higher than the grey level under consideration (Figure 4.3).

Figure 4.3. A cumulative histogram

In numerical form: fc(vi) = f(v0) + f(v1) + ... + f(vi), i = 0, 1, ..., Nv-1.
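For a one-dimensional feature space, the histogram and cumulative histogram can be computed as sketched below (numpy assumed; Nv = 8 as in Figure 4.2, and the test image is random).

    import numpy as np

    def histogram(image, n_levels):
        # f(v_i): number of pixels at each grey level 0 .. n_levels - 1
        return np.bincount(image.ravel(), minlength=n_levels)

    def cumulative_histogram(image, n_levels):
        # fc(v_i) = sum of f(v_j) for all j <= i
        return np.cumsum(histogram(image, n_levels))

    I = np.random.randint(0, 8, size=(10, 10))
    print(histogram(I, 8))
    print(cumulative_histogram(I, 8))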

4.1.3 Quality of a Digital Image

Two parameters of a sensor system at a specific height determine the quality of a digital remote sensing image for a given spectral range: the spatial resolution rs and the radiometric resolution rr. As discussed in Chapter 1, the spatial resolution determines how finely an image can record the spatial detail of the real world (i.e., how small the spatial sampling unit is), and therefore the number of pixels in the image space. The radiometric resolution determines how finely a spectral signal can be quantized, and therefore the number of grey levels that is produced. The finer these resolutions are, the closer the information recorded in the image is to the real world, and the larger the sizes of the image space and the grey-level vector space. The size (or the number of pixels) of the image space, S(I), has an exponential relation with the spatial resolution, and so does the size (or the number of vectors) of the feature space, S(V), with the radiometric resolution. Their relations take the following forms:

S(I) ∝ (1/rs)^2 and S(V) = (2^rr)^k

where k, as defined above, is the number of images in the image space. While S(I) has a fixed exponential order of 2 with rs, S(V) depends not only on rr, but also on k. The number of vectors in Vk becomes extremely large when k grows while rr is unchanged. For example, each band of a Landsat TM or SPOT HRV image is quantized into 8 bits (i.e., an image has 256 possible grey levels). Thus, when k = 1, S(V) = 256 and when k = 3, S(V) = 16,777,216. If a histogram is built in such a three-dimensional multispectral space, it would require at least 64 Megabytes of random access memory (RAM) or disk storage to process it. Therefore, the feature space has to be somehow reduced for certain analyses.
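The growth of the feature space with k can be verified numerically; the sketch below assumes a 4-byte counter per histogram cell, which reproduces the 64 MB figure quoted above.

    def feature_space_size(bits_per_band, n_bands):
        # S(V) = (2 ** rr) ** k possible grey-level vectors
        return (2 ** bits_per_band) ** n_bands

    for k in (1, 2, 3):
        cells = feature_space_size(8, k)
        print(k, cells, cells * 4 / 2**20, "MB")
    # k = 1 ->        256 cells, ~0.001 MB
    # k = 3 -> 16777216 cells, 64 MB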

4.1.4 Image Formats

A single image can be represented as a 2-dimensional array. A multispectral image can be represented in a 3-dimensional array (Figure 4.4)

Figure 4.4. A multispectral image

In a computer, image data can be stored in a number of ways:


The most popular ones are Band Sequential (BSQ), Pixel Interleaved, Line Interleaved (BIL), and separate files. These formats can be illustrated using the following example of a three-band multispectral image with two lines of three pixels per band:

Band 1: AAA    Band 2: BBB    Band 3: CCC
        AAA            BBB            CCC

BIL, typically used by the Landsat Ground Station Operators' Working Group (LGSOWG), takes the form:
AAA BBB CCC, AAA BBB CCC

Band Sequential (BSQ) takes the form:
AAA AAA BBB BBB CCC CCC

Pixel Interleaved format, used by PCI, takes the form:
ABC ABC ABC ABC ABC ABC

These are the general formats in use. BIL is suitable for data transfer from the sensor to the ground: it does not need a huge buffer for data storage on the satellite if the ground station is within the transmission coverage of the satellite.

Pixel interleaved is suitable for pixel-based operation or multispectral analysis.

Band sequential and separate file formats are the proper forms to use when we are more interested in single-band image processing, such as image matching, correlation, geometric correction, and when we are more concerned with spatial information processing and extraction. For example, we use these files when linear features or image texture are of our concern.
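If a multispectral image is held in memory as a (bands, rows, columns) array, the three interleaving orders are simply different axis arrangements, as sketched below with numpy.

    import numpy as np

    bands, rows, cols = 3, 2, 3
    bsq = np.arange(bands * rows * cols).reshape(bands, rows, cols)   # band sequential

    bil = bsq.transpose(1, 0, 2)   # line interleaved: (rows, bands, columns)
    bip = bsq.transpose(1, 2, 0)   # pixel interleaved: (rows, columns, bands)

    # flattened storage order on disk for each format
    print(bsq.ravel())   # AAA AAA BBB BBB CCC CCC
    print(bil.ravel())   # AAA BBB CCC AAA BBB CCC
    print(bip.ravel())   # ABC ABC ABC ABC ABC ABC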

4.2 Factors Affecting Image Geometry

In remote sensing there are three major forms of imaging geometry as shown in Figure 4.5:


Figure 4.5 The major types of imaging geometry

The first is the central perspective. It is the simplest because the entire image frame is defined by the same set of geometric parameters. In the second imaging geometry, each pixel has its own central perspective. This is the most complicated because each pixel has to be corrected separately if geometric distortion exists. In the third, each line of the image has a central perspective.

The platform status, which can be represented by six parameters - three for position and three for attitude (roll, pitch and yaw) - affects the image geometry:

(X, Y, Z, ψ, λ, ρ)

In addition, the following factors affect the image geometry.

• airborne platform motion
• earth rotation - affects satellite images
• continental drift

Most remote sensing satellites for earth resources studies, such as the Landsat series and SPOT, use a Sun-synchronous polar orbit around the earth (Figure 4.6) so that they pass over the same area on the earth at approximately the same local time. Most of the earth's surface can be covered by these satellites.


Figure 4.6. Sun synchronous polar orbit for Earth resources satellites

The effects of roll, pitch and yaw along the direction of satellite orbit or the airplane flight track can be illustrated by using Figure 4.7.

Figure 4.7. The effects of roll, pitch and yaw on image geometry

4.3. Flattening the Earth Surface through Map Projection

Although the Earth's surface is spherical, we use flat maps to represent the phenomena on the surface. We transform the coordinates on the spherical surface to a flat sheet of paper using map projection. The most widely used map projection is Universal Transverse Mercator (UTM) projection.

4.4 Georeferencing (Geometric Correction)

The purpose of georeferencing is to transform the image coordinate system (u,v), which may be distorted due to the factors discussed above, to a specific map projection (x,y), as shown in Figure 4.8. The imaging process involves the transformation of a real 3-D scene geometry into a 2-D image.


Figure 4.8. Georeferencing is a transformation from the image space to the geographical coordinate space

Terms such as geometric rectification or image rectification, image-to-image registration, image-to-map registration have the following meanings:

1) Geometric rectification and image rectification recover the imaging geometry.

2) Image-to-image registration refers to transforming one image coordinate system into another image coordinate system.

3) Image-to-map registration refers to the transformation of an image coordinate system to a map coordinate system resulting from a particular map projection.

Georeferencing generally covers 1) and 3). It requires a transformation T from (u, v) to (x, y):

Forward Transformation is composed of the following transformations:

In order to achieve:


Every step involved in the imaging process has to be known, i.e., we need to know the inverse process of geometric transformation.

This is a complex and time consuming process. However, there is a simpler and widely-used alternative: polynomial approximation.

Coefficients a's and b's are determined by using Ground Control Points (GCPs).

For example, we can use very low order polynomials such as the affine transformation

u = ax + by + c

v = dx + ey + f

A minimum of 3 GCPs will enable us to determine the coefficients in the above equations.

In this way, we do not need the transformation matrix T. However, in order to make the coefficients representative of the whole image being transformed, we have to make sure that the GCPs are well distributed over the entire image.
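With three or more GCPs, the affine coefficients can be estimated by least squares. The sketch below uses numpy; the GCP coordinates are hypothetical.

    import numpy as np

    # hypothetical GCPs: map coordinates (x, y) and image coordinates (u, v)
    xy = np.array([[1000.0, 2000.0], [1500.0, 2100.0], [1200.0, 2600.0], [1700.0, 2500.0]])
    uv = np.array([[101.2, 202.5], [151.0, 208.9], [119.8, 261.3], [169.5, 254.0]])

    # design matrix for u = a*x + b*y + c and v = d*x + e*y + f
    A = np.column_stack([xy[:, 0], xy[:, 1], np.ones(len(xy))])
    coef_u, *_ = np.linalg.lstsq(A, uv[:, 0], rcond=None)
    coef_v, *_ = np.linalg.lstsq(A, uv[:, 1], rcond=None)
    print(coef_u)   # a, b, c
    print(coef_v)   # d, e, f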

A third choice is to combine the T-1 method with the polynomial technique in order to reduce the transformation errors involved in the direct transformation T-1 (Figure 4.9).


Figure 4.9. Larger magnitude of errors may be introduced if direct transformation is used.

This may be achieved through the following four steps:

(1) To refine imaging geometry parameters.

In T-1, polynomials are used to correct for the inaccuracies of satellite or aircraft positioning. For the platform position, the following formula can be used:

We can use GCPs to refine the coefficients. Global Positioning System (GPS) and/or Inertial Navigation System (INS) techniques can also be used. The integration of GPS and INS with remote sensing sensors is being investigated (Schwarz et al., 1993).

(2) Divide output grid into blocks (Figure 4.10):

Figure 4.10. In the x-y space

(3) Map the grid points using

(4) Use a low order polynomial inside each block for detailed mapping (Figure 4.11)


Figure 4.11. Further transformation from u-v space to Dx-Dy space using lower order polynomials

The choices are:

(i) Affine

(ii) Bilinear

Why is (ii) called bilinear? Because each coordinate is the product of two linear functions, one of x and one of y:

u = (a + bx)(c + dy)

which expands to the form u = a0 + a1x + a2y + a3xy.

Since there are four knowns and four unknowns, we can solve (i) using least squares and (ii) using the regular solution of a system of equations. We will only show how to obtain a0, a1, a2, a3 in (ii).


For point

Similarly, we can obtain bo, b1, b2, b3.

Why do we use bilinear instead of affine? It is because the bilinear transformation guarantees the continuity from block to block in the detailed mapping. The geometric interpolation of bilinear transformation is illustrated in Figure 4.12.

Figure 4.12. Linear and bilinear transformation

Method for Determining the coefficients of a polynomial geometric transformation

We can use the least square solution for bilinear polynomials.

This is done with more than 4, say n, GCPs,


(u1, v1) ↔ (x1, y1)
(u2, v2) ↔ (x2, y2)
...
(un, vn) ↔ (xn, yn)

by substituting the n GCPs coordinates into (1) and (2) we will obtain two groups of over-determined equations

The least squares solution in matrix form:

For x


By multiplying both sides by the transpose of the coefficient matrix and inverting, we obtain the least squares (normal equation) solution A, i.e. the coefficients a0, a1, a2, a3. Similarly, we can solve for b0, b1, b2, b3.

This can be applied to affine transformation and higher order polynomial transformation.
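A least squares fit of the bilinear polynomials u = a0 + a1x + a2y + a3xy (and similarly for v) with n > 4 GCPs can be sketched as follows; the GCP values are hypothetical.

    import numpy as np

    def fit_bilinear(xy, uv):
        # design matrix columns: 1, x, y, xy
        x, y = xy[:, 0], xy[:, 1]
        X = np.column_stack([np.ones_like(x), x, y, x * y])
        a, *_ = np.linalg.lstsq(X, uv[:, 0], rcond=None)   # a0, a1, a2, a3
        b, *_ = np.linalg.lstsq(X, uv[:, 1], rcond=None)   # b0, b1, b2, b3
        return a, b

    xy = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 5], [2, 7]], dtype=float)
    uv = np.array([[1, 2], [11, 2.2], [0.8, 12], [11.3, 12.5], [6, 7.1], [2.9, 9.2]])
    a, b = fit_bilinear(xy, uv)
    print(a)
    print(b)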

4.5 Image Resampling

Once the geometric transformation (T) is obtained as we have discussed above, theoretically we can use the following relation to transform each pixel (i,j) from image space (u-v) to a desirable space (x-y).

The results will appear as in Figure 4.13. Pixel position (1, 1) may be transformed to (4850.672, 625.341).

Figure 4.13. Forward transformation results in fractional coordinates in x-y space.

In order to have grid coordinates in x-y space, the data have to be resampled in the u-v image space for given coordinates in x-y space. This is shown below:


For a pixel location in x-y space, the corresponding coordinates * in u-v space are found through T-1. To determine the grey level at the * location in u-v space, interpolation strategies are used. These include:

• Nearest neighbour interpolation
• Bilinear (linear in one dimension)
• Cubic (a special case of the spline)

There are other interpolation methods, such as the sinc function, spline functions, etc. The most commonly used methods in remote sensing are, however, the three listed above. Nearest neighbour interpolation simply assigns to the output pixel the value of the input pixel closest to *, as shown below:

In one dimensional case, it can be illustrated as:

This can be achieved using the following convolution operation, where w(u) is the weight function. One-dimensional convolution: the convolution of two functions f1 and f2 is defined as

(f1 * f2)(u) = ∫ f1(t) f2(u - t) dt


Convolution is equivalent to flipping the filter and then performing correlation. In image enhancement, convolution gives the same result as correlation because the filters are symmetrical. In image resampling, Id(i) is the discrete image and w(i) is the interpolation weight function:

I'(u) = Id(u) * w(u)

where * denotes the convolution operator.

Since most weight functions are limited to a local neighbourhood, only a limited number of i's need to be used.

For instance, in nearest neighbour (NN) interpolation, i takes the value closest to u. In linear interpolation, l is the nearest integer less than or equal to u and h is the nearest integer greater than u. For cubic interpolation, the two nearest integers less than or equal to u and the two nearest integers greater than u are used. For the sinc function,

the number of terms can in principle be infinite, but we usually use a limited number of terms, up to about 20. According to the above introduction of convolution, the weight function for the nearest neighbour case is


For the linear case, it is

For the cubic convolution:
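The three weight functions are commonly written as the kernels sketched below. The cubic kernel here uses the widely quoted parameter a = -0.5, which is an assumption rather than something stated in the text.

    def w_nearest(u):
        # 1 inside the half-pixel neighbourhood, 0 elsewhere
        return 1.0 if -0.5 <= u < 0.5 else 0.0

    def w_linear(u):
        # triangle function over [-1, 1]
        u = abs(u)
        return 1.0 - u if u < 1.0 else 0.0

    def w_cubic(u, a=-0.5):
        # cubic convolution kernel over [-2, 2]
        u = abs(u)
        if u < 1.0:
            return (a + 2.0) * u**3 - (a + 3.0) * u**2 + 1.0
        if u < 2.0:
            return a * u**3 - 5.0 * a * u**2 + 8.0 * a * u - 4.0 * a
        return 0.0

    print(w_nearest(0.3), w_linear(0.3), w_cubic(0.3))   # 1.0, 0.7, ~0.82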

For the case of two dimension, a sequential process is used. Z(u) values are first calculated along each row as shown below

Then Z(u, v) is obtained by applying convolution along the dashed line. The convolution process for all the three interpolation cases can be shown by

For the NN, l = m, the closest point to u.


For linear: l = the nearest integer less than or equal to u; m = the nearest integer greater than u.
For cubic: l = the two nearest integers less than or equal to u; m = the two nearest integers greater than u.

Chapter 4

References

Jensen, J.R., 1986. Digital Image Processing, a Remote Sensing Perspective.

Schwarz, K-P., Chapman, M.A., Cannon, M.E. and Gong, P., 1993. An integrated INS/GPS approach to the georeferencing of remotely sensed data. Photogrammetric Engineering and Remote Sensing, 59(11): 1667-1673.

Shlien, S., 1979. Geometric correction, registration, and resampling of Landsat Imagery. Canadian Journal of Remote Sensing. 5(1):74-87.


5. Radiometric Correction

In addition to distortions in image geometry, image radiometry is affected by factors, such as system noise, sensor malfunction and atmospheric interference. The purpose of radiometric calibration is to remove or reduce the sensor (detector) inconsistencies, sensor malfunction, viewing geometry and atmospheric effects. We will first introduce the calibration of detector responses.

5.1 Detector Response Calibration

As discussed before, the Landsat MSS has 6 detectors for each band, the TM has 16, and the SPOT HRVs have 3000 or 6000 detectors. The difference between the SPOT sensors and the Landsat sensors is that each SPOT detector collects one column of an image, while each detector of a Landsat sensor corresponds to many lines of an image (Figure 5.1).

Figure 5.1. Images acquired using detectors in linear array sensors and in scanners

The problem is that no detector functions the same way as others. If the problem becomes serious, we will observe banding or striping on the image.

There are two types of approach to overcome the detector response problems: absolute calibration and relative calibration.

5.1.1 Absolute calibration

In this mode, we attempt to establish a relationship between the image grey level and the actual incoming reflectance or the radiation. A reference source is needed for this mode and this source ranges from laboratory light, to on-board light, to the actual ground reflectance or radiation.


For CASI, each detector is calibrated by the manufacturer in the laboratory. For the Landsat MSS, a calibration wedge with 6 different grey levels is used. For the Landsat TM, three lamps, which have 8 brightness combinations, are used.

In any case, a linear response is assumed for each detector

vo = a · vi + b

where vo is the observed reading and vi is the known source reading; e.g., for an 8-bit image, 0 ≤ vo ≤ 255.

Least squares method is used to derive a and b (Figure 5.2).

Figure 5.2. Responses of the six Landsat MSS detectors. A least squares linear fitting is applied to these detector responses.

Once each detector is calibrated, the calibrated image data (digital numbers) can be converted into radiances or spectral reflectances. For the case of converting the digital numbers of an 8-bit image into radiances, we have

L = Lmin + (Lmax - Lmin) · DN / 255
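A minimal sketch of this conversion for an 8-bit band is given below; the Lmin and Lmax values are illustrative, not those of any particular sensor.

    import numpy as np

    def dn_to_radiance(dn, l_min, l_max, dn_range=255):
        # L = Lmin + (Lmax - Lmin) * DN / DNrange
        return l_min + (l_max - l_min) * dn.astype(float) / dn_range

    dn = np.array([[0, 64], [128, 255]], dtype=np.uint8)
    print(dn_to_radiance(dn, l_min=0.5, l_max=24.0))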

5.1.2 Relative calibration


Even though data may have been absolutely calibrated, an image may still have problems caused by sensor malfunction. For example, some of the early Landsat-1, -2 and -3 images contain dropped lines, where no response from a particular detector was recorded. In other cases, striping problems remain. This happens to both MSS and TM images. The striping problem is most obvious when an image is acquired over a water body, where the actual spectral reflectances from one part to another are similar (Figure 5.3).

Figure 5.3. When the six detectors of the Landsat MSS see the same water target, their responses should be the same.

There are two additional methods to balance the detector response: (1) Balance the mean and standard deviation

(2) Balance the histogram

(1) Balance the mean and standard deviation (m and σ)

The aim of this method is to make m and σ the same for each detector. For each detector i, we need a transfer function that maps the measured mi and σi to a standard m and σ.

For each detector n, assume:

measured mean = mn
measured standard deviation = σn
desired mean = M
desired standard deviation = S


The transfer function is I'n = anIn + bn where I'n is the calibrated intensity and In is the original intensity

an and bn are the gain and bias to be determined.

The solution is:

an = S / σn,    bn = M - an · mn

For an 8-bit image, you may try to use M = 128 and S = 50 or may use the mean and standard deviation calculated from the entire sample.

This may not always work. The assumption behind this strategy is that detector responses are linear.
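A sketch of the gain/bias balancing for a single detector follows; the target M and S are those suggested above, and the input readings are synthetic.

    import numpy as np

    def balance_detector(readings, target_mean=128.0, target_std=50.0):
        # I' = a*I + b with a = S / sigma_n and b = M - a * m_n
        m_n, s_n = readings.mean(), readings.std()
        a = target_std / s_n
        b = target_mean - a * m_n
        return a * readings + b

    readings = np.random.normal(90.0, 20.0, size=1000)   # one detector's raw values
    balanced = balance_detector(readings)
    print(balanced.mean(), balanced.std())               # ~128, ~50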

(2) Balance histogram


The assumption behind histogram balancing is that each detector has the same probability of seeing the scene and, therefore, the grey-level distribution functions should be the same. Thus, if two detectors have different histograms (a histogram is a discrete version of the grey-level distribution function), they should be corrected to have the same histogram.

This is usually done by comparing their cumulative histograms as shown in Figure 5.4.

Figure 5.4. Balancing the histogram F2 to the reference histogram F1.

This is done for each grey level g2: find its cumulative frequency fc2(g2) in F2, then find in F1 the grey level g1 whose cumulative frequency satisfies fc1(g1) = fc2(g2), and assign g1 to g2 in the histogram being adjusted.
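The cumulative-histogram matching described above can be sketched with numpy as follows; both 8-bit inputs here are synthetic.

    import numpy as np

    def match_histogram(values, reference, n_levels=256):
        # for each grey level g2, find g1 such that fc1(g1) = fc2(g2)
        f2 = np.bincount(values, minlength=n_levels).cumsum() / values.size
        f1 = np.bincount(reference, minlength=n_levels).cumsum() / reference.size
        lut = np.searchsorted(f1, f2)          # g2 -> g1 lookup table
        return lut[values].astype(np.uint8)

    ref = np.random.randint(0, 256, 10000).astype(np.uint8)
    bad = (np.random.randint(0, 256, 10000) // 2 + 40).astype(np.uint8)
    print(match_histogram(bad, ref)[:10])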

5.2 Atmospheric Correction of Remotely Sensed Data

Atmospheric correction is a major issue in visible or near-infrared remote sensing because the presence of the atmosphere always influences the radiation from the ground to the sensor.

The radiance that reaches a sensor can be determined from the image digital number by

Ls = Lmin + (Lmax - Lmin) · DN / DNrange

Normally Lmax, Lmin and DNrange are known from the sensor manufacturer or operator.

However, Ls is composed of contributions from the target, background and the atmosphere (Figure 5.5):


Figure 5.5 Target, background and scattered radiation received by the sensor.

As introduced before, the atmosphere has severe effects on visible and near-infrared radiance. First, it modifies the spectral and spatial distribution of the radiation incident on the surface. Second, the reflected radiance is attenuated. Third, atmospherically scattered radiance, called path radiance, is added to the transmitted radiance.

Assuming that Ls is the radiance received by a sensor, it can be divided into LT and LP

LS = LT + LP (1)

LT is the transmitted radiance.

LP is atmospheric path radiance.

Obviously, our interest is to determine LT.

For a given spectral interval, the solar irradiance reaching the earth's surface is

EG = ES · Ti · cos(θi) + Ed

where

ES is the solar irradiance outside the atmosphere,
Ti is the atmospheric transmittance along the incident direction,
θi is the incidence angle, and
Ed is the diffuse sky irradiance.

Surfaces can be either specular or diffuse. Most surfaces can be considered approximately diffuse reflectors at high solar elevations, i.e. when θi is small.

If the surface is assumed to be a perfect diffuse reflector, i.e. the Lambertian case, the ratio of the radiance reflected in the viewing direction to the total radiation reflected into the whole upper hemisphere is given by 1/π.

Based on the Lambertian assumption,

LT = ρ · Te · EG / π

where ρ is the target reflectance and Te is the transmittance along the viewing direction. Therefore, in order to quantitatively analyze remotely sensed data, i.e. to find ρ, the atmospheric transmittance T and the path radiance Lp have to be known.

5.2.1 Single scattering atmospheric correction


In practice, (2) and (3) can be written as

Path radiance Lp

Lp is determined by at least two parameters: the single scattering albedo and the single scattering phase function. The single scattering albedo equals 1 when no attenuation occurs. The single scattering phase function gives the fraction of radiation scattered from its initial forward direction into some other direction.

For Rayleigh atmosphere

For Mie's atmosphere


From the above diagram, it can be seen that forward scattering is dominated by aerosols while back scattering is mainly due to Rayleigh scattering.

A number of path radiance determination algorithms exist. For a nadir view, such as is normally used by Landsat MSS, TM and SPOT HRV, Lp can be determined by:

P is a combination of the Mie and Rayleigh atmospheres.

For aerosol scattering, the phase function Pp(µi) does not change much with wavelength, so the function for λ = 0.7 µm can be used for all wavelengths. This function is usually given in diagram or table form; see, for example, the function given in Forster (1984).


The average background B is usually determined by collecting ground-truth information for a region. A 3 km × 3 km square centred on the pixel to be corrected can be used.

Sky Irradiance and Ground Irradiance

In this section, we have only tried to introduce some basic concepts of this complex topic. This is only a single-scattering correction algorithm for the nadir viewing condition. More sophisticated algorithms that account for multiple scattering do exist. Some examples are LOWTRAN 7, 5S (Simulation of the Satellite Signal in the Solar Spectrum) and 6S (Second Simulation of the Satellite Signal in the Solar Spectrum, which also handles aircraft altitudes and target elevation). FORTRAN codes are available for these algorithms. The 5S and 6S codes were proposed by Tanre and his colleagues (e.g. Tanre et al., 1990, IGARSS'90, p. 187).

One has to be careful when conducting atmospheric correction, since there are many factors to be accounted for and estimated. If these estimations are not properly made, the atmospheric correction might add more bias than the atmosphere itself.

5.2.2 Dark-target atmospheric correction

This method is most suitable for clear-sky conditions, when the Rayleigh atmosphere dominates, since Rayleigh scattering affects short wavelengths (particularly the visible) and clear, deep water has a very low spectral reflectance in the short wavelength region. If a relatively large water body, say 1-2 km in diameter, can be found on an image, we can use the water radiance derived from the image, Lw, and the real water radiance, L, to estimate Lp:

Lw = K · DNwater + Lmin

Lp = Lw - L


Lp can then be subtracted from other radiances in an image for the visible channels.

For the infrared channels, the Rayleigh atmosphere has little effect and Lp is assumed to be 0. It can be seen that this method applies only to the Rayleigh atmosphere.
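A minimal sketch of the dark-target approach for one visible band follows; K, Lmin and the assumed "true" water radiance are illustrative values.

    import numpy as np

    def dark_target_path_radiance(dn_water, k, l_min, l_water_true=0.0):
        # Lw = K * DNwater + Lmin ;  Lp = Lw - L
        return (k * dn_water + l_min) - l_water_true

    def correct_band(dn_band, k, l_min, l_path):
        # convert DN to radiance, then subtract the estimated path radiance
        return k * dn_band.astype(float) + l_min - l_path

    lp = dark_target_path_radiance(dn_water=12.0, k=0.09, l_min=0.5)
    band = np.array([[12, 40], [90, 200]], dtype=np.uint8)
    print(lp)
    print(correct_band(band, k=0.09, l_min=0.5, l_path=lp))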

5.2.3 Direct digital number to reflectance transformation

This can be done by

R = a · DN + b

By tying the ground reflectance measured during the flight overpass to the corresponding pixel values on the image, we can solve the equation to obtain a and b. This is an empirical method. In fact, both the dark-target and the direct digital number conversion methods are among the most widely used in remote sensing.

5.3 Topographic Correction

In the previous sections, we attempted to correct the atmospheric effects, i.e. to convert image digital numbers (DNs) to image radiances (Ls). After atmospheric correction, we expect to have the spectral reflectivity ρ.

Assuming that the atmospheric effects can be completely removed from the image, the spectral reflectivity obtained contains the real target reflectance r and the topographic modification introduced during image acquisition, G:

ρ = r · G

G contains information about the viewing and energy incidence geometry.


Moon can be considered approximately as a surface that reflects equal amount of light in all directions.

5.3.1 The role of relief

What effects does the relief have on the image radiometry? To answer this question, a different coordinate system will be used and Figure 5.6 shows this image coordinate system. In this coordinate system, z is the viewing direction and x-y plane is the image plane.

The relief of a small surface facet is defined by its normal, i.e. by the slope factors p = ∂z/∂x and q = ∂z/∂y, and the light source is defined by the direction (ps, qs). In the discrete case, these are the differences between the elevations of neighbouring cells and the grid cell under consideration.

5.3.2 Gradient Space

For perfectly white surface, when r = 1

For a grey surface, whose reflectance is proportional to that of a perfectly white surface, ρ can be considered a function of the slope factors (p, q) of the surface.

If r is the same over the whole study area, we can use two measurements, ρ1(p, q) and ρ2(p, q), to recover (p, q). Similarly, we can use three measurements, ρ1(p, q), ρ2(p, q) and ρ3(p, q), to recover both ρ and (p, q).

Using (p, q), we can generate a shaded map based on a DEM of an area.

Instead of calculating (p, q) for each grid on a DEM, we can calculate a two dimensional lookup table

            q = -0.2    -0.1     0     0.1     0.2
p = -0.2
p = -0.1
p =  0
p =  0.1
p =  0.2

(each cell holds the shading value for that (p, q) pair)

The entire DEM {p, q} can be mapped using the above table.

The surface normal in vector form is (-p, -q, 1).
The illumination direction in vector form is (-ps, -qs, 1).
The look direction is (0, 0, 1).


For sensors that look in the nadir direction, the image coordinate system is only a shift from the local Cartesian coordinates. Thus, the above formula can be used to correct satellite (Landsat) imagery.
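Using these vectors, a shaded image can be generated from a DEM by taking the cosine of the angle between the surface normal (-p, -q, 1) and the illumination direction (-ps, -qs, 1). The numpy sketch below assumes a nadir-looking sensor; the DEM and sun direction are illustrative.

    import numpy as np

    def shaded_relief(dem, ps, qs, cell_size=1.0):
        # slope factors: p = dz/dx (columns), q = dz/dy (rows)
        q, p = np.gradient(dem, cell_size)
        normal = np.stack([-p, -q, np.ones_like(dem)])       # (-p, -q, 1)
        sun = np.array([-ps, -qs, 1.0])                      # (-ps, -qs, 1)
        cos_i = (normal * sun[:, None, None]).sum(axis=0)
        cos_i /= np.linalg.norm(normal, axis=0) * np.linalg.norm(sun)
        return np.clip(cos_i, 0.0, 1.0)                      # shading in [0, 1]

    dem = np.outer(np.arange(5, dtype=float), np.ones(5))    # a simple tilted plane
    print(shaded_relief(dem, ps=0.3, qs=0.2))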

These relationships can be seen from the following diagram.

Chapter 5

References

Forster, B.C., 1984. Derivation of atmospheric correction procedures for Landsat MSS with particular reference to urban data. Int. J. of Remote Sensing, 5(5):799-817.

Horn, B.K.P., 1986. Robot Vision. The MIT Press:Toronto.

Horn, B.K.P., and Woodham, R.J., 1979. Destriping Landsat MSS images by histogram modification. Computer Graphics and Image Processing. 10:69-83.

Richards, J.A., 1986. Digital Image Processing. Springer-Verlag: Berlin.


Tanre, D., Deuze, J.L., Herman, M., Santer, R., Vermote, E., 1990. Second simulation of the satellite signal in the solar spectrum - 6S code. IGARSS'90, Washington D.C., p. 187.

Further Readings:

Woodham, R.J., and Gray, M.H., 1987. An analytic method for radiometric correction of satellite multispectral scanner data. IEEE Transactions on Geosciences and Remote Sensing. 25(3):258-271.


6. Image Enhancement

6.1 Histogram-Based Operation

A histogram of an image describes the data distribution with respect to image grey levels. The purpose of a histogram-based operation is that, when a grey-level transformation is made, pixels in the image having a specific range of grey levels can be enhanced or suppressed. This is also called contrast adjustment. It can be done using:

1. histogram stretching

2. histogram compression (Figure 6.1)

Figure 6.1. Histogram stretching and compression

Both histogram stretching and histogram compression can be done either linearly or nonlinearly.

a) Linear adjustment (Figure 6.2)

DN' = a · DN


Figure 6.2.

b) Piece-wise linear adjustment (Figure 6.3)

Figure 6.3.

From Figure 6.3 we can derive the piecewise linear mapping. The idea of contrast adjustment is to map the range of digital numbers in the original image to a new range. For example, the image displayed using the histogram in Figure 6.4a appears dark on the screen because the majority of pixel grey-level values are lower than 150. We can linearly stretch the histogram to transform the grey-level range (0-150) in Figure 6.4a to the new grey-level range (0-255) in Figure 6.4b.


a b

Figure 6.4. (a) original histogram of an image. (b) the histogram after adjustment.

The following transformation can be used:

DN' = (255/150) · DN

c) Non-linear adjustment (Figure 6.5)

Figure 6.5. An exponential adjustment

We can try a = 16 and b = 1/2, i.e. DN' = 16 · DN^(1/2).

Other non-linear functions include logarithmic, and even sinusoidal.

d) Non-linear adjustment - histogram equalization

The task of histogram equalization is to transform a histogram of any shape to a histogram which has the same frequency along the whole range of digital number (Figure 6.6).


Figure 6.6. In the continuous case reshape the histogram

This is realized by equally partitioning the cumulative histogram fc of the original image into 255 pieces. Each piece corresponds to one digital number in the equalized image (Figure 6.7). On the cumulative curve, find the nth dividing point; the original grey level x at that point is assigned the new value DN' = n.

Figure 6.7. For the discrete case, modify the grey level value according to the principle of equal frequency.

The equalization process can also be considered as a histogram matching method used in image destriping as discussed in Section 5.1. Here we attempt to match the original cumulative histogram Fc1 to the new cumulative histogram Fc2 (Figure 6.8).


Figure 6.8.

The following example shows how equalization can be carried out in discrete digital form. It starts with the generation of the image histogram (first two columns in Table 6.1). Then the probability Pi is calculated from the frequency f(vi) (third column). A cumulative histogram Fc can be calculated from the frequencies. Similarly, the cumulative distribution function (CDF) can be derived from the probabilities. Based on the cumulative distribution function, we can convert the original grey levels into the grey levels of the equalized image (Table 6.2).

Table 6.1 Histogram, cumulative histogram and cumulative distribution function (CDF)

Grey Level (DN)   Frequency f(vi)   Probability Pi   Cumulative histogram Fc   CDF
0                 4                 0.04             4                         0.04
1                 17                0.17             21                        0.21
2                 15                0.15             36                        0.36
3                 18                0.18             54                        0.54
4                 24                0.24             78                        0.78
5                 12                0.12             90                        0.90
6                 0                 0.00             90                        0.90
7                 10                0.10             100                       1.00
Total             100               1.00

Table 6.2 Conversion from the grey levels of the original image to the output image

Input Level   (2^3 - 1) * CDF   Output
0             0.28              0
1             1.47              1
2             2.52              3
3             3.78              4
4             5.46              5
5             6.30              6
6             6.30              6
7             7.00              7
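The equalization in Tables 6.1 and 6.2 can be reproduced as sketched below; numpy is assumed and the frequencies are those of Table 6.1.

    import numpy as np

    freq = np.array([4, 17, 15, 18, 24, 12, 0, 10])   # Table 6.1 frequencies
    cdf = freq.cumsum() / freq.sum()                  # cumulative distribution function

    lut = np.rint((2**3 - 1) * cdf).astype(int)       # (2^3 - 1) * CDF, rounded
    print(lut)                                        # [0 1 3 4 5 6 6 7]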

6.2 Density Slicing/Color Density Slicing and Pseudo Coloring

Density slicing represents a group of contiguous digital numbers by a single value. Although some detail of the image is lost, the effect of noise can be reduced by density slicing. As a result of density slicing, an image may be segmented, or sometimes contoured, into sections of similar grey level. Each of these segments is represented by a user-specified brightness.

Similarly, we can represent sections of grey levels using different colours, i.e. pseudocolouring. This has been used for colouring classification maps in most image analysis software systems. For example, five classes can be represented by red, green, blue, yellow, and grey. This can be realized by assigning the red, green, and blue colour guns the following values:

Class No   Red Gun   Green Gun   Blue Gun   Colour
1          255       0           0          red
2          0         255         0          green
3          0         0           255        blue
4          255       255         0          yellow
5          100       100         100        grey

6.3 Image Operation Based on Spatial Neighbourhoods

6.3.1 Window-based image smoothing, Low - pass filters

1. Averaging with equal weights


We can also use 5x5, 7x7, etc. This filter is also called a box-car filter.

2. Averaging with different weights

The last filter can be used to remove drop-out lines in Landsat images. This is done by applying the filter only along the drop-out lines in those images.

3. Median filter

This filter is more useful than a simple average filter for removing outliers, random noise, and speckle on radar imagery. It has the desirable property of preserving edges to some extent. This filter can also be applied to drop-out line removal in some Landsat images.

If we denote an image window in the following form:

The average filter in 1 can be written as

By moving (i, j) all over an image, the original image, I, can be filtered and the new image, I', can be created.
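The box-car average and the median filter can be sketched as below; scipy.ndimage is assumed to be available and the test image is random.

    import numpy as np
    from scipy import ndimage

    I = np.random.randint(0, 256, size=(100, 100)).astype(float)

    smoothed   = ndimage.uniform_filter(I, size=3)   # 3 x 3 averaging with equal weights
    despeckled = ndimage.median_filter(I, size=3)    # 3 x 3 median: removes outliers, keeps edges

    print(smoothed[50, 50], despeckled[50, 50])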


For 2,

6.3.2 Window-based edge enhancement - High-pass filters

In order to enhance edges, differences between neighbourhood digital numbers are taken. We will start from one dimensional example:

I:   1 1 1 1 2 2 2 2      (an edge lies between the 1s and the 2s)

By taking I'(i) = I(i+1) - I(i), we get

I':  0 0 0 1 0 0 0

We have suppressed the unchanging part and kept only the edge, and thus an enhancement is achieved. We can apply the differencing technique again, to I', to get I'':

I'': 0 0 1 -1 0 0

I''(i) = I'(i+1) - I'(i) = I(i+2) - I(i+1) - I(i+1) + I(i) = I(i+2) - 2I(i+1) + I(i)

so 1, -2, 1 are the weights. The advantage of using second-order differencing is that we can locate the exact position of the edge at the zero-crossing point.

We call the first differencing taking a gradient, and the second differencing taking a Laplacian. We can use the matrix


1 -2 1 as a Laplacian filter, an edge enhancement filter. In two-dimensional form, a Laplacian filter is:

 0  1  0
 1 -4  1
 0  1  0

Another form can be:

 1  1  1
 1 -8  1
 1  1  1

Sobel filter - a spatial derivative filter:

-1  0  1        -1 -2 -1
-2  0  2         0  0  0
-1  0  1         1  2  1

6.3.3 Contrast stretching through high-frequency enhancement

This is also called edge-enhancement by subtractive smoothing


Why don't we use the alternative form? Its contrast will not be as good as DN - kDN''. The question is: can we write DN - kDN'' in filter form? The answer is yes.
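One way to write DN - kDN'' as a single filter is to subtract k times a Laplacian kernel from the identity kernel. The sketch below uses k = 1 and the 4-neighbour Laplacian; both choices are assumptions for illustration.

    import numpy as np
    from scipy import ndimage

    k = 1.0
    laplacian = np.array([[0,  1, 0],
                          [1, -4, 1],
                          [0,  1, 0]], dtype=float)
    identity = np.zeros((3, 3))
    identity[1, 1] = 1.0

    sharpen = identity - k * laplacian        # single kernel implementing DN - k * DN''

    I = np.random.randint(0, 256, size=(100, 100)).astype(float)
    enhanced = ndimage.convolve(I, sharpen, mode="nearest")
    print(sharpen)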

6.3.4 Linear Line Detection Templates

With 5 × 5 filters we can have more directions, e.g.:


6.4 Morphological Filtering

Morphological filtering is one type of processing in which the spatial form or structure of objects within an image is modified. Dilation, erosion and skeletonization are three fundamental morphological operations.

In this section, we first introduce binary image morphological filtering. Two types of connectivity (4-connectivity and 8-connectivity) are defined as follows.

6.4.1 Binary image hit or miss transformations

Basic morphological operations, dilation, erosion and many variants can be defined and implemented by "hit or miss" transformations. A small odd-sized mask is scanned over a binary image. If the binary-pattern of a mask matches the state of the pixels under the mask, an output pixel in spatial correspondence to the center pixel of the mask is set to some desired binary state. Otherwise, the output is set to the opposite binary state.

For example, to perform simple binary noise cleaning, if an isolated 3 × 3 pixel window (a centre pixel of 1 surrounded entirely by 0s) is encountered, the "1" in the centre is replaced by a "0". Otherwise, the centre pixel value is not changed.


It is often possible to use simple neighbourhood logical relationships to define the conditions for a hit. For the simple noise removal case,

where ∧ denotes the logical AND (intersection) operation and ∨ denotes the OR (union) operation.

For simplicity, we use local coordinates to represent a pixel window:

Additive operators convert the centre pixel of a 3 × 3 pixel window from the zero state to the one state when a hit is obtained. The basic operators include:

Interior Fill - create a one if all four-connected neighbour pixels are one.

Diagonal Fill - create one if this process will eliminate eight-connecting of the background.

where

Bridge - Create one if this will result in connectivity of previously unconnected neighbouring ones.


where

and

There are 119 patterns which satisfy the above condition. For example,

Eight-Neighbour Dilate create one if at least one eight-connected neighbour pixel is one.

This is a special case of dilation.

Subtractive operators convert the centre pixel from one to zero when a hit is obtained. The basic operators include:

Isolated One Removal - erase an isolated one (a one with no eight-connected neighbours that are one).

Spur removal - Erase one with a single eight-connected neighbour

where


H - break - Erase one if it is H-connected

Interior Pixel Removal - Erase one if all 4-connected neighbours are ones

Eight-Neighbour Erode - Erase one if at least one eight-connected neighbour pixel is zero.

6.4.2 Binary image generalized dilation and erosion

Examples of image set algebraic operations


Generalized Dilation

It is expressed as

where I(i,j), 1 ≤ i, j ≤ N, is a binary-valued image and H(m,n), 1 ≤ m, n ≤ L (L an odd integer), is called a structuring element. Minkowski addition is defined as


In order to compare I(i,j) with I'(i,j), I(i,j) should be translated to TQ(I(i,j)), where Q = ((L-1)/2, (L-1)/2).

Generalized erosion is defined as

where H(m,n) is an odd size LxL structuring element. One formula is


Another formula using the reflection of H as a structuring element:

According to the rules defined above, you can observe what it looks like. Some properties of Dilation and Erosion

I(i,j) = I

Dilation is commutative: I ⊕ J = J ⊕ I. In general, erosion is not commutative: I ⊖ J ≠ J ⊖ I. Dilation and erosion are opposite in effect: dilation of the background of an object behaves like erosion of the object.

The following chain rules hold for dilation and erosion:

A ⊕ (B ⊕ C) = (A ⊕ B) ⊕ C
A ⊖ (B ⊕ C) = (A ⊖ B) ⊖ C

6.4.3. Binary image close and open operations

Dilation and erosion are often applied to an image in concatenation. A dilation followed by an erosion is called a close operation,

I'(i,j) = I(i,j) • H(m,n) = [I(i,j) ⊕ H(m,n)] ⊖ H(m,n)

The close operation, also called closing, fills gaps and preserves isolated pixels that have a binary value of 1.

An erosion followed by a dilation is called an open operation.

I'(i,j) = I(i,j) ∘ H(m,n) = [I(i,j) ⊖ H(m,n)] ⊕ H(m,n)


The open operation, also called opening, breaks thin connections and removes isolated pixels with a binary value of 1.

6.4.4 Grey scale image morphological filtering

Applying mathematical morphology to grey scale images is equivalent to finding the maximum or the minimum of a neighborhood defined by the structuring element. If a 3X3 neighborhood is taken as a structuring element, then dilation is defined as

I'(i,j) = max (I,I0,I1,I2,I3,I4,I5,I6,I7)

and erosion is defined as

I'(i,j) = min (I,I0,I1,I2,I3,I4,I5,I6,I7).

Similarly, closing refers to a dilation followed by an erosion, while opening means an erosion followed by a dilation. The effect of closing on grey scale images is that small objects brighter than the background are preserved, and bright objects with small gaps between them may become connected. Opening, on the other hand, removes bright objects that are small in size and breaks narrow connections between two bright objects.
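With a 3 × 3 structuring element, grey-scale dilation, erosion, opening and closing reduce to moving-window maxima and minima, as sketched below with scipy.ndimage (the test image is random).

    import numpy as np
    from scipy import ndimage

    I = np.random.randint(0, 256, size=(50, 50)).astype(float)

    dilated = ndimage.maximum_filter(I, size=3)   # max of the 3 x 3 neighbourhood
    eroded  = ndimage.minimum_filter(I, size=3)   # min of the 3 x 3 neighbourhood

    closed = ndimage.minimum_filter(ndimage.maximum_filter(I, size=3), size=3)  # dilation then erosion
    opened = ndimage.maximum_filter(ndimage.minimum_filter(I, size=3), size=3)  # erosion then dilation

    print(dilated.max(), eroded.min(), closed.mean(), opened.mean())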

6.5 Image Enhancement in Multispectral Space - Multispectral Transformation

The multispectral or vector nature of most remote sensing data makes it possible for spectral transformations to generate new sets of image components or bands. The transformed image may make evident features that are not discernible in the original data; alternatively, it may preserve the essential information content of the image with a reduced number of transformed dimensions. The last point is significant for the display of data in three dimensions on a colour monitor or in colour hardcopy, and for the transmission and storage of data.

6.5.1 Image arithmetic, band ratios and vegetation indices

Addition, subtraction, multiplication and division of the pixel brightnesses from two bands of image data form a new image. Multiplication is generally not as useful as the other operations.


We can plot the pixel values in a two-dimensional space (Figure 6.10.) This two-dimensional diagram is called a scatter plot.

Figure 6.10. A scatterplot

A multispectral space is a coordinate system in which each axis represents the grey-level values of a specific image band.

Ratio

A general band ratio can be written as

R(i,j) = [a1 B1(i,j) + a2 B2(i,j) + ... + anb Bnb(i,j)] / [b1 B1(i,j) + b2 B2(i,j) + ... + bnb Bnb(i,j)]

where R(i,j) is the ratio for pixel coordinate (i,j); the ak and bk are constants, with at least one a and one b non-zero; and nb is the number of bands.

Commonly used ratios are:

Rv = DN_NIR / DN_R - a ratio that tends to enhance vegetation. It is also called a vegetation index.


Ratioing also allows for shade effect suppression.

Figure 6.11. Reflectances of two types of vegetation

Figure 6.11 shows that for healthy vegetation the spectral reflectance difference between the NIR band and the R band is quite large. As the vegetation suffers from stress, the difference becomes smaller.

To generate two ratios between SR_NIR and SR_R, one for the normal vegetation and one for the vegetation under stress, we use

RVN = SR_NIR(normal) / SR_R(normal) and RVS = SR_NIR(stressed) / SR_R(stressed)

where SR represents spectral reflectance.

From these ratios, RVN > RVS, so we can observe the difference in the conditions of the two types of vegetation.

Vegetation Indices


Normalized Difference Vegetation Index (NDVI)

NDVI = (DN_NIR - DN_R) / (DN_NIR + DN_R)

This is calculated from the raw remote sensing data. We can also calculate the NDVI from processed remote sensing data, after converting digital numbers to spectral reflectances.

To suppress the effect of different soil backgrounds on the NDVI, Huete (1989) recommended a soil-adjusted vegetation index, SAVI = [(SR_NIR - SR_R) / (SR_NIR + SR_R + L)] (1 + L), where L is a soil adjustment factor (a value of about 0.5 is typical).

The NDVI is mathematically equivalent to the simple ratio: NDVI = (Rv - 1)/(Rv + 1), where Rv = DN_NIR / DN_R.

Transformed Vegetation Index

TVI = {(DN_NIR - DN_R)/(DN_NIR + DN_R)}^(1/2)
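A short sketch of the indices above, assuming 'nir' and 'red' are numpy arrays of digital numbers or reflectances; the band names and the two-pixel example values are illustrative only.

import numpy as np

def ratio_vi(nir, red):
    return nir / np.maximum(red, 1e-6)              # simple ratio Rv = NIR / R

def ndvi(nir, red):
    return (nir - red) / np.maximum(nir + red, 1e-6)

def tvi(nir, red):
    n = ndvi(nir, red)
    return np.sqrt(np.clip(n, 0, None))             # TVI = NDVI ** (1/2), as given above

# Hypothetical example: one healthy and one stressed vegetation pixel
nir = np.array([0.50, 0.35]); red = np.array([0.08, 0.15])
print(ndvi(nir, red))                               # the healthy pixel gives the larger NDVI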

Perpendicular Vegetation Index

6.5.2 Principal component transformation

The dimension of the multispectral space constructed by a remotely sensed image is the number of spectral bands. For example, a Landsat MSS image constructs a four-dimensional multispectral space, while for a Landsat TM image the multispectral space has seven dimensions.


For simplicity, two-dimensional data will be used as examples to illustrate the procedure of the principal component transformation. Without loss of generality, the procedure can be applied to data in a multispectral space of any dimension.

The Covariance Matrix and Correlation Matrix

Two examples will be used to illustrate the usefulness of the covariance matrix.

Example 1

Pixel   B1    B2      Xi - M
X1      1     2       (-2, -0.33)
X2      2     1       (-1, -1.33)
X3      4     1       ( 1, -1.33)
X4      5     2       ( 2, -0.33)
X5      4     4       ( 1,  1.67)
X6      2     4       (-1,  1.67)
M       3     2.33

Scatter plot for Example 1

Example 2


Pixel   B1    B2      Xi - M
X1      2     2       (-1.5, -1.5)
X2      4     3       ( 0.5, -0.5)
X3      5     4       ( 1.5,  0.5)
X4      5     5       ( 1.5,  1.5)
X5      3     4       (-0.5,  0.5)
X6      2     3       (-1.5, -0.5)
M       3.5   3.5

Scatter plot for Example 2

To calculate the mean in vector form,

M = (1/N) Σ Xi

where N is the number of pixels.

The variance-covariance matrix V is

V = [1/(N - 1)] Σ (Xi - M)(Xi - M)T

Since N is normally very large, we can approximate and write

V ≈ (1/N) Σ (Xi - M)(Xi - M)T


V is an nb x nb symmetric matrix.

The mean vectors and (Xi - M) are as listed in the two example tables.

The covariance matrix for Example 1 is V1 = [2.4, 0; 0, 1.87] and that for Example 2 is V2 = [1.9, 1.1; 1.1, 1.1] (computed with N - 1 = 5 in the denominator).

What are the differences between V1 and V2? We can answer this question by further examining their corresponding correlation matrices R1 and R2.


From R1, we can see that the correlation between Band 1 and 2 is 0. This means that Band 1 and Band 2 contain independent information about our target. We cannot use B1 to replace B2.

For R2, the correlation between Band 1 and Band 2 is 0.761, which is quite high. Using either channel, we can obtain, to a large extent, information about the other channel.

The Principal Component Transformation (PCT)

The purpose of the PCT, also called the Hotelling transformation, is to find a new coordinate system in which the data can be represented without correlation, as in the case of V1. In other words, can we find a coordinate system such that V2 is transformed into a diagonal matrix? The answer can be found in matrix algebra.

X --(transformation)--> Y

It is recommended that a rotation matrix be used to complete this process. Call the rotation matrix G, so that Y = GX. G can be found by deriving the eigenvalues and eigenvectors of the covariance matrix Vx. To find the eigenvalues we need to solve

| Vx - λI | = 0    (1)

where I is an identity matrix and λ = (λ1, λ2, ..., λnb)T is the vector of eigenvalues.

For each non-zero eigenvalue λi, we can find its corresponding eigenvector gi = (gi1, gi2, ..., ginb)T. This is obtained from

[ Vx - λiI ] · gi = 0    (2)

The rotation matrix G can then be determined by stacking the normalized eigenvectors as its rows.

As an example, we will find the eigenvalues and eigenvectors for

V2 = | 1.9  1.1 |
     | 1.1  1.1 |

To find the eigenvalues, we use (1); solving | V2 - λI | = 0 gives λ1 = 2.67 and λ2 = 0.33.


Once the transformation is done, the covariance matrix in the new coordinate system is diagonal, with the eigenvalues 2.67 and 0.33 on the diagonal.


Now the results can be interpreted using the data in example 2 (Figure 6.12).

Figure 6.12. The new axes derived from the PCT in the original coordinate system.

B'1 and B'2 are the new axes. In this coordinate system, the data variance along B'1 is 2.67 while the variance along B'2 is only 0.33. In the rotated space, the data variance along each axis equals the corresponding eigenvalue.

From 2.67 + 0.33 = 1.90 + 1.10 = 3.00, we can see that the rotation does not affect the total variance of the original data. Using 1.90/3.00 and 1.10/3.00 we can determine the percentages of the total variance that B1 and B2 represent.

B1 represents 1.90/3.00 = 63.3% of the total variance of the original data

B2 represents 1.10/3.00 = 36.7% of the total variance of the original data

These percentages are called the loadings of the bands.

For B'1, it represents 2.67/3.00 = 89% of the total variance while B'2 contains only 11% of the total variance.

From the loadings of B'1 and B'2, we can see that after the rotation more of the total variance is loaded onto one band while the loading of the other band is reduced. For a multispectral space with nb dimensions, after the principal component transformation we will have high loadings for the first few bands and very low loadings for the rest. We call the components containing relatively high loadings the principal components. We can, therefore, make use of these principal components in our data analysis while ignoring the relatively minor components. By doing so, we do not lose much of the original data variability. This serves the purpose of reducing data dimensionality. Its application in


classification (keeping the components with maximum variance) and in change detection (examining the components with minimal variance) is usually promising.

The PCT is a linear transformation technique which helps to enhance remotely sensed imagery. Although the principal components are most often used, the minor components may also be useful in highlighting information of low variability in the remote sensing data. For example, some researchers have applied the PCT to multi-temporal change detection and found that the change information in a scene is concentrated in the minor components.
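A short numpy sketch, offered as a check rather than as part of the text, that reproduces the covariance, correlation, eigenvalue and variance figures quoted for Examples 1 and 2 above.

import numpy as np

X1 = np.array([[1, 2], [2, 1], [4, 1], [5, 2], [4, 4], [2, 4]], float)  # Example 1
X2 = np.array([[2, 2], [4, 3], [5, 4], [5, 5], [3, 4], [2, 3]], float)  # Example 2

V1 = np.cov(X1, rowvar=False)            # ~[[2.4, 0.0], [0.0, 1.87]] -> correlation 0
V2 = np.cov(X2, rowvar=False)            # ~[[1.9, 1.1], [1.1, 1.1]]
print(np.corrcoef(X2, rowvar=False)[0, 1])   # 0.761, as stated for R2

vals, vecs = np.linalg.eigh(V2)          # eigenvalues 0.33 and 2.67
G = vecs.T                               # rows of G are the eigenvectors (rotation matrix)
Y = (X2 - X2.mean(axis=0)) @ G.T         # rotated data
print(np.cov(Y, rowvar=False))           # diagonal covariance: variances 0.33 and 2.67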

6.5.3 Tasselled Cap Transform (K-T transform)

Different from the PCT, which is based on the data covariance matrix, Kauth and Thomas (1976) developed a linear transformation that is physically based on crop growth.

Figure 6.13. A 3-D data scatterplot of the multispectral space constructed by the green, red and near-infrared bands (Which looks like a tasselled cap.)

The growing cycle of a crop starts from bare soil, proceeds to green vegetation, and ends with crop maturation as the crop turns yellow. These different stages of vegetation growth make the data distribution in the three-dimensional multispectral space (Figure 6.13) appear in the shape of a tasselled cap.

Kauth and Thomas defined a linear transformation to enhance the data according to this data structure. They defined four components, called redness (soil), greenness (vegetation), yellowness and noise, using the following transformation matrix for Landsat MSS data:


Later, Crist, Cicone and Kauth developed a new transformation for Landsat TM data (Crist and Cicone, 1984; Crist and Kauth, 1986).

Their new redness or brightness and greenness are defined as:

Redness = 0.3037 TM1 + 0.2793 TM2 + 0.4743 TM3 + 0.5586 TM4 + 0.5082 TM5 + 0.1863 TM7

Greenness = -0.2848 TM1 - 0.2435 TM2 - 0.5436 TM3 + 0.7243 TM4 + 0.0840 TM5 - 0.1800 TM7
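A minimal sketch of applying the two linear combinations above; it assumes 'tm' is a numpy array of shape (6, rows, cols) holding TM bands 1, 2, 3, 4, 5 and 7 in that order (the thermal band 6 is excluded), which is an assumption about data layout, not part of the text.

import numpy as np

redness_coef   = np.array([0.3037, 0.2793, 0.4743, 0.5586, 0.5082, 0.1863])
greenness_coef = np.array([-0.2848, -0.2435, -0.5436, 0.7243, 0.0840, -0.1800])

def tasseled_cap(tm):
    redness   = np.tensordot(redness_coef, tm, axes=1)    # weighted sum over the band axis
    greenness = np.tensordot(greenness_coef, tm, axes=1)
    return redness, greenness

tm = np.random.randint(0, 256, size=(6, 4, 4)).astype(float)   # hypothetical DN cube
r, g = tasseled_cap(tm)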

Chapter 6

References

Crist, E.P., and Cicone, R.C., 1984. A physically-based transformation of the Thematic Mapper data - the Tasseled Cap. IEEE Transactions on Geoscience and Remote Sensing, GE-23:256-263.

Crist, E.P., and Kauth, R.J., 1986. The Tasseled Cap de-mystified. Photogrammetric Engineering and Remote Sensing, 52(1):81-86.

Huete, A.R., 1989. Soil influences in remotely sensed vegetation canopy spectra. In Theory and Applications of Optical Remote Sensing, edited by G. Asrar, John Wiley and Sons: New York.

Kauth, R.J., and Thomas, G.S., 1976. The tasselled cap - a graphic description of the spectral-temporal development of agricultural crops as seen by Landsat. Proceedings of the Symposium on Machine Processing of Remotely Sensed Data, Purdue University, West Lafayette, Indiana, pp. 4B41-51.

Pratt, W., 1991. Digital Image Processing. John Wiley and Sons: Toronto.

Richards, J.A., 1987. Digital Image Processing. Springer-Verlag, Berlin.


7. Information Extraction

7.1 Image Interpretation

To derive useful spatial information from images is the task of image interpretation. It includes

• detection: e.g. searching for hot spots in mechanical and electrical facilities or white spots in x-ray images. This procedure is often used as the first step of image interpretation.

• identification: recognition of certain targets. A simple example is to identify vegetation types, soil types, rock types and water bodies. The higher the spatial/spectral resolution of an image, the more detail we can derive from it.

• delineation: to outline the recognized target for mapping purposes. Identification and delineation combined are used to map certain subjects. If the whole image is processed by these two procedures, we call it image classification.

• enumeration: to count certain phenomena from the image, based on detection and identification. For example, in order to estimate the household income of the population, we can count the numbers of various types of residential units.

• mensuration: to measure the area, volume, amount and length of certain targets from an image. This often involves all the procedures mentioned above. Simple examples include measuring the length of a river and the acreage of a specific land-cover class. More complicated examples include the estimation of timber volume, river discharge, crop productivity, river basin radiation and evapotranspiration.

In order to do a good job in the image interpretation, and in later digital image analysis, one has to be familiar with the subject under investigation, the study area and the remote sensing system available to him. Usually, a combined team consisting of the subject specialists and the remote sensing image analysis specialists is required for a relatively large image interpretation task.

Depending on the facilities that an image interpreter has, he might interpret images in raw form, corrected form or enhanced form. Correction and enhancement are usually done digitally.

Elements on which image interpretation is based

• Image tone, grey level, or multispectral grey-level vector


Human eyes can differentiate over 1000 colors but only about 16 grey levels. Therefore, colour images are preferred in image interpretation. One difficulty is the use of a multispectral image with a dimensionality of more than three; in order to make use of all the information available in each image band, one has to somehow reduce the image dimensionality.

• Image texture

Spatial variation of image tones. Texture is used as an important clue in image interpretation, and it is very easy for human interpreters to include it in their mental process. Most texture patterns appear irregular on an image.

• Pattern

Regular arrangement of ground objects. Examples are residential areas on an aerial photograph and mountains in regular arrangement on a satellite image.

• Association

A specific object co-occurring with another object. Examples of association are an outdoor swimming pool associated with a recreation center and a playground associated with a school.

• Shadow

Object shadows are very useful when the phenomena under study have vertical variation. Examples include trees, high buildings, mountains, etc.

• Shape

Agricultural fields and human-built structures have regular shapes. These can be used to identify various targets.

• Size

The relative size of buildings can tell us about the type of land use, while the relative sizes of tree crowns can tell us about the approximate age of trees.

• Site

Broad-leaf trees are distributed in lower and warmer valleys, while coniferous trees tend to be distributed at higher elevations, up towards the tundra. Location is therefore used in image interpretation.


Image interpretation strategies

- Direct recognition: identification of targets directly observable in the image, e.g. land-cover classification.

(Land cover is the physical cover of the earth's surface.)

- Indirect interpretation

To map something that is not directly observable in the image. This is used, for example, to classify land-use types (Gong and Howarth, 1992b). Land use refers to the human activities on a piece of land. It is closely related to land-cover types; for example, a residential land-use type is composed of roof cover, lawn, trees and paved surfaces.

- From known to unknown

To interpret first the areas with which the interpreter is familiar, and then the areas with which the interpreter is not familiar (Chen et al., 1989). This can be assisted by field observation.

- From direct to indirect

In order to obtain forest volume, one might have to determine what is observable from the image, such as tree canopies, shadows, etc., and then derive the volume from these observations. We can also estimate the depth of permafrost using surface-cover information (Peddle, 1991).

- Use of collateral information

Census data, topographic maps and other thematic maps may all be useful during image interpretation.

More details on image interpretation can be found in Lillesand and Kiefer (1994) or Campbell (1987).

7.2 Image Segmentation

Dividing an image into relatively homogeneous regions or blocks.

1. Thresholding - Global operation


Multilevel thresholding assigns a segment code to each pixel according to a set of thresholds:

I'(i,j) = n, if Tn ≤ I(i,j) < Tn+1

where n is the code of the segment and the thresholds satisfy 0 ≤ T0 < T1 < ... ≤ N, N being the maximum grey-level value.

e.g.

I(i,j) =

2 3 5 7
1 4 6 1
2 2 3 2
1 3 3 2

when threshold T = 4, the resultant thresholded image is:

0 0 1 1
0 1 1 0
0 0 0 0
0 0 0 0


Normally T is determined from the histogram of an image as shown in the following example.
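A one-line numpy sketch reproducing the thresholding example above (a pixel becomes 1 when its grey level is greater than or equal to T).

import numpy as np

I = np.array([[2, 3, 5, 7],
              [1, 4, 6, 1],
              [2, 2, 3, 2],
              [1, 3, 3, 2]])
T = 4
print((I >= T).astype(int))    # matches the 0/1 result shown above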

2. Region-growing - local operation

I0 I1 I2

I7 I I3

I6 I5 I4

(1) Suppose the seed I (starting point) is given label K; then Ii will also belong to K if |Ii - I| < ε,

where ε is a small tolerance number and i = 0, 1, 2, ..., 7.

Create a mean m1 from the seed and the second point that is assigned to K.

(2) If no second point is found in the local neighbourhood, then remove the label K from the seed point I.

(3) If the second point is found, then apply (1) to the second point using m1. If a third point Ij is found, a new mean m2 is generated from m1 and Ij.

(4) Gradually grow the local area using the criterion in (1). If an nth point is found, m(n-1) is adjusted to the group mean mn.


(5) Repeat (1) to (4) with different seeds and tolerances ε.

Thresholding is faster; however, it is not adaptive to local properties. For example, if a neighbourhood is as follows:

5 7 6
4 2 5
7 6 5

then for thresholding with a threshold of 4 the result will be

1 1 1
1 0 1
1 1 1

while with the region-growing technique, if the seed is I = 2 and ε = 1, the 2 will not be assigned a segment label because no neighbouring pixel meets the criterion in (1).
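A minimal sketch of region growing from one seed, in the spirit of steps (1)-(4) above but not the text's exact procedure: eight-connected neighbours join the region while they are within ε of the running mean, which is updated as pixels are added.

import numpy as np
from collections import deque

def grow_region(I, seed, eps):
    rows, cols = I.shape
    label = np.zeros_like(I, dtype=int)
    members = [I[seed]]
    label[seed] = 1
    queue = deque([seed])
    while queue:
        i, j = queue.popleft()
        mean = np.mean(members)                       # current group mean
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols and not label[ni, nj]:
                    if abs(I[ni, nj] - mean) < eps:   # criterion |Ii - m| < eps
                        label[ni, nj] = 1
                        members.append(I[ni, nj])
                        queue.append((ni, nj))
    return label if len(members) > 1 else np.zeros_like(label)   # drop single-pixel seeds

I = np.array([[5, 7, 6], [4, 2, 5], [7, 6, 5]], float)
print(grow_region(I, (1, 1), eps=1))    # the centre 2 stays unlabelled, as discussed above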

Image segmentation can also be done using clustering algorithms. Segmentation is usually used as the first step in image analysis. Once an image is properly segmented, the following operation can be performed: classification, morphological operation, and image understanding through knowledge-based or more advanced computation.

7.3 Conventional Multispectral Classification Methods

7.3.1 General procedures in image classification

Classification is the most popularly used information extraction technique in digital remote sensing. In image space I, a classification unit is defined as the image segment on which a classification decision is based. A classification unit can be a pixel, a group of neighbouring pixels or the whole image. Conventional multispectral classification techniques perform class assignments based only on the spectral signatures of a classification unit. Contextual classification refers to the use of spatial, temporal and other related information, in addition to the spectral information of a classification unit, in the classification of an image. Usually, it is the pixel that is used as the classification unit.

General image classification procedures include (Gong and Howarth 1990b):


(1) Design the image classification scheme: the classes are usually information classes such as urban, agricultural and forest areas. Conduct field studies and collect ground information and other ancillary data for the study area.

(2) Preprocess the image, including radiometric, atmospheric, geometric and topographic corrections, image enhancement, and initial image clustering.

(3) Select representative areas on the image and analyze the initial clustering results or generate training signatures.

(4) Image classification

Supervised mode: using training signatures

Unsupervised mode: image clustering and cluster grouping

(5) Post-processing: complete geometric correction, filtering and classification map decoration.

(6) Accuracy assessment: compare classification results with field studies.

The following diagram shows the major steps in two types of image classification:

Supervised:

Unsupervised

In order to illustrate the differences between the supervised and unsupervised classification, we will introduce two concepts: information class and spectral class:

Information class: a class specified by an image analyst. It refers to the information to be extracted.

Spectral class: a class which includes similar grey-level vectors in the multispectral space.


In an ideal information extraction task, we can directly associate a spectral class in the multispectral space with an information class. For example, we have in a two dimensional space three classes: water, vegetation, and concrete surface.

By defining boundaries among the three groups of grey-level vectors in the two-dimensional space, we can separate the three classes.

One of the differences between a supervised classification and an unsupervised one is the way each spectral class is associated with an information class. For supervised classification, we first specify an information class on the image. An algorithm is then used to summarize the multispectral information from the specified areas on the image to form class signatures. This process is called supervised training. For the unsupervised case, however, an algorithm is first applied to the image and some spectral classes (also called clusters) are formed. The image analyst then tries to assign each spectral class to a desired information class.

7.3.2 Supervised classification

Conventional Pixel-Labelling Algorithms in Supervised Classification

A pixel-labelling algorithm is used to assign a pixel to an information class. We can use the previous diagram to discuss ways of doing this.


From the above diagram, there are two obvious ways of classifying this pixel.

(1) Multidimensional thresholding

As in the above diagram, we define two threshold values along each axis for each class. A grey-level vector is classified into a class only if it falls between the thresholds of that class along each axis.

The advantage of this algorithm is its simplicity. The drawback is the difficulty of including all possible grey-level vectors into the specified class thresholds. It is also difficult to properly adjust the class thresholds.

(2) Minimum-Distance Classification

Fig. 1 shows spectral curves of two types of ground target: vegetation and soil. If we sample the spectral reflectance values of the two targets (bold curves) at three spectral bands (green, red and near-infrared), as shown in Fig. 1, we can plot the sampled values in the three-dimensional multispectral space (Fig. 2). The sampled spectral values become two points in the multispectral space. Similar curves in Fig. 1 are represented by closer points in Fig. 2 (the two dashed curves in Fig. 1 are shown as empty dots in Fig. 2). From


Fig. 2, we can easily see that distance can be used as a similarity measure for classification. The closer the two points, the more likely they are in the same class.

We can use various types of distance as similarity measures to develop a classifier, i.e. minimum-distance classifier.

In a minimum-distance classifier, suppose we have nc known class centers C = {C1, C2, ..., Cnc}, Ci, i = 1, 2, ..., nc is the grey-level vector for class i.

As an example, we show a special case in Fig. 3 where we have 3 classes (nc = 3) and two spectral bands (nb = 2)

If we have a pixel with a grey-level vector located in the B1-B2 space shown as A (an empty dot), we are asked to determine to which class it should belong. We can calculate the distances between A and each of the centers. A is assigned to the class whose center has the shortest distance to A.

In a general form, an arbitrary pixel with a grey-level vector g = (g1, g2, ..., gnb)T,

is classified as Ci if

d(Ci, g) = min { d(C1, g), d(C2, g), ..., d(Cnc, g) }

Now, what form should the distance d take? The most popularly used form is the Euclidean distance

dE(Ci, g) = [ Σj (cij - gj)² ]^(1/2), j = 1, 2, ..., nb


The second most popularly used distance is the Mahalanobis distance

dm(Ci, g) = [ (g - Ci)T V^-1 (g - Ci) ]^(1/2)

where V^-1 is the inverse of the covariance matrix of the data.

If the Mahalanobis distance is used, we call the classifier as a Mahalanobis Classifier.

The simplest distance measure is the city-block distance

dc(Ci, g) = Σj |cij - gj|, j = 1, 2, ..., nb

For de and dm, taking their squares does not change the relative magnitudes among distances, so in minimum-distance classifiers we usually use de² and dm² as the distance measures to save some computation.

Class centers C and the data covariance matrix V are usually determined from training samples if a supervised classification procedure is used. They can also be obtained from clustering.

For example, suppose there are ns pixels selected as training samples for class Ci. The jth component of the class centre is obtained by averaging over the training pixels,

cij = (1/ns) Σk xj(k)

where j = 1, 2, ..., nb and k = 1, 2, ..., ns.

If there are a total of nt pixels selected as training samples for all the classes


The overall mean vector M = (m1, m2, ..., mnb) is obtained as

mi = (1/nt) Σk xi(k)

where i = 1, 2, ..., nb and k = 1, 2, ..., nt.

The covariance matrix is then obtained in the following vector form:

V = [1/(nt - 1)] Σk (Xk - M)(Xk - M)T
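A sketch pulling the above together: class centres are the means of the training pixels, and each pixel is assigned to the class with the nearest centre (squared Euclidean distance, as suggested above). The array shapes and the small numeric example are assumptions for illustration.

import numpy as np

def class_centres(train_pixels, train_labels, nc):
    # train_pixels: (nt, nb) grey-level vectors; train_labels: (nt,) values in 0..nc-1
    return np.array([train_pixels[train_labels == c].mean(axis=0) for c in range(nc)])

def min_distance_classify(pixels, centres):
    # pixels: (n, nb); centres: (nc, nb); returns the index of the nearest centre
    d2 = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Hypothetical two-band example with three classes
train = np.array([[10, 40], [12, 38], [60, 20], [58, 25], [30, 30], [33, 28]], float)
labels = np.array([0, 0, 1, 1, 2, 2])
C = class_centres(train, labels, 3)
print(min_distance_classify(np.array([[11, 39], [59, 22]], float), C))   # -> [0 1]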

(3) Maximum Likelihood Classification (MLC)

MLC is the most commonly used classification method for remotely sensed data. MLC is based on Bayes' rule.

Let C = (C1, C2, ..., Cnc) denote a set of classes, where nc is the total number of classes. For a given pixel with a grey-level vector x, the probability that x belongs to class ci is P(Ci|x), i = 1, 2, ..., nc. If P(Ci|x) is known for every class, we can determine into which class x should be classified. This can be done by comparing P(Ci|x)'s, i = 1, 2, ..., nc.

x ∈ Ci, if P(Ci|x) > P(Cj|x) for all j ≠ i.    (1)

However, P(Ci|x) is not known directly. Thus, we use Bayes' theorem:

P(Ci|x) = p(x|Ci) · P(Ci) / P(x)

where

P(Ci) is the probability that Ci occurs in the image. It is called a priori probability.

P(x) is the probability of x occurring in the image, regardless of class.


However, P(x) is not needed for classification because when we compare P(C1|x) with P(C2|x), P(x) cancels from each side. Therefore, the conditional probabilities p(x|Ci), i = 1, 2, ..., nc, are what have to be determined. One solution is statistical modelling, by assuming that the conditional probability density function (PDF) is normal (the Gaussian distribution). If we can find the PDF for each class and the a priori probabilities, the classification problem is solved. The p(x|Ci) are estimated from training samples.

For the one-dimensional case, we can see from the above figure that by generating training statistics for two classes, we obtain their probability distributions. Using these statistics directly would be difficult because it requires a large amount of computer memory; the Gaussian (normal) distribution model can be used to save memory. The one-dimensional Gaussian distribution is

p(x|Ci) = [1/((2π)^(1/2) σi)] exp[ -(x - µi)² / (2σi²) ]

where we only need two parameters for each class, µi and σi, i = 1, 2, ..., nc:

µi - the mean for Ci

σi - the standard deviation of Ci

µi and σi can be easily generated from the training samples.

For higher dimensions,

p(x|Ci) = (2π)^(-nb/2) |Vi|^(-1/2) exp[ -(1/2)(x - µi)T Vi^-1 (x - µi) ]

where nb is the dimension (number of bands)

µi is the mean vector of Ci


Vi is the covariance matrix of Ci

P(Ci) can also be determined with knowledge about an area. If they are not known, we can assume that each class has an equal chance of occurrence.

i.e. P(C1) = P(C2) = ... = P(Cnc)

With knowledge of p(x|Ci) and P(Ci), we can conduct maximum likelihood classification: p(x|Ci) · P(Ci), i = 1, 2, ..., nc, can be compared instead of the P(Ci|x) in (1).

The interpretation of the maximum likelihood classifier is illustrated in the above figure. An x is classified according to the maximum p(x|Ci) · P(Ci): x1 is classified into C1, x2 is classified into C2. The class boundary is determined by the point of equal probability.

In two-dimensional space, the class boundary cannot be easily determined. Therefore we don't use boundaries in maximum likelihood classification and, instead, we compare probabilities.

Actual implementation of MLC

In order to simplify the computation, we usually take the logarithm of p(x|Ci) · P(Ci):

ln[ p(x|Ci) P(Ci) ] = ln P(Ci) - (nb/2) ln 2π - (1/2) ln |Vi| - (1/2)(x - µi)T Vi^-1 (x - µi)

Since -(nb/2) ln 2π is a constant, the RHS can be simplified to

gi(x) = ln P(Ci) - (1/2) ln |Vi| - (1/2)(x - µi)T Vi^-1 (x - µi)    (2)

Often, we assume P(Ci) is the same for each class. Therefore (2) can be further simplified to

gi(x) = - ln |Vi| - (x - µi)T Vi^-1 (x - µi)    (3)

gi(x) is referred to as the discriminant function.

By comparing the gi(x)'s, we can assign x to the class with the largest gi(x).

With the maximum likelihood classifier, it is guaranteed that the error of misclassification is minimal if p(x|Ci) is normally distributed.

Unfortunately, the normal distribution cannot always be achieved. In order to make the best use of the MLC method, one has to make sure that the training samples generate distributions as close to normal as possible.

How large should one's training sample be? Usually, one needs 10 x nb, preferably 100 x nb, pixels in each class (Swain and Davis, 1978).

MLC is relatively robust, but it has limitations in handling data at nominal or ordinal scales, and its computational cost increases considerably as the image dimensionality increases.
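A compact sketch of a Gaussian maximum likelihood classifier built around the discriminant function in (2); it assumes enough training pixels per class (at least the 10 x nb suggested above) so that the class covariance matrices are invertible, and the function names are illustrative.

import numpy as np

def train_mlc(train_pixels, train_labels, nc):
    stats = []
    for c in range(nc):
        X = train_pixels[train_labels == c]
        mu = X.mean(axis=0)
        V = np.cov(X, rowvar=False)              # class covariance matrix Vi
        stats.append((mu, np.linalg.inv(V), np.linalg.slogdet(V)[1]))
    return stats

def mlc_classify(pixels, stats, priors=None):
    nc = len(stats)
    priors = np.full(nc, 1.0 / nc) if priors is None else np.asarray(priors)
    g = np.empty((pixels.shape[0], nc))
    for c, (mu, Vinv, logdet) in enumerate(stats):
        d = pixels - mu
        maha = np.einsum('ij,jk,ik->i', d, Vinv, d)   # (x - mu)T Vi^-1 (x - mu)
        g[:, c] = np.log(priors[c]) - 0.5 * logdet - 0.5 * maha
    return g.argmax(axis=1)                           # class with the largest g_i(x)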

7.3.3 Clustering algorithms

For images for which the user has little knowledge of the number and spectral properties of the spectral classes, clustering is a useful tool for determining the inherent data structure. Clustering in remote sensing is the process of automatic grouping of pixels with similar spectral characteristics.

• Clustering measures - measure how similar two pixels are. The similarity can be based on:


(1) Euclidean distance dE(x1, x2)

(2) City-block distance dc(x1, x2)

• Clustering criteria - determine how good the clustering results are

• Sum of Squared Error (SSE)

Clustering algorithm 1: Moving cluster means

K-means clustering (also called c-means clustering)

1. Select K points in the multispectral space as candidate clustering centres

Let these points be m1, m2, ..., mK.

Although the initial centres can be arbitrarily selected, it is suggested that they be spread evenly in the multispectral space; for example, they can be selected along the diagonal axis going through the origin of the multispectral space.

2. Assign each pixel x in the image to the closest cluster centre mi.

3. Generate a new set of cluster centres by taking the mean of the pixels assigned to each cluster in step 2, where n denotes the number of iterations of step 2.

4. If the change in every cluster centre is smaller than a small tolerance, the procedure is terminated. Otherwise let the newly computed centres replace the previous ones and return to step 2 to continue.
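A sketch of the moving-means procedure above. One deviation from the text: the initial centres are drawn at random from the pixels rather than placed along the diagonal of the multispectral space.

import numpy as np

def kmeans(pixels, K, max_iter=100, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    centres = pixels[rng.choice(len(pixels), K, replace=False)]   # initial m_1 .. m_K
    for _ in range(max_iter):
        d2 = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)                                # step 2: nearest centre
        new_centres = np.array([pixels[assign == k].mean(axis=0) if np.any(assign == k)
                                else centres[k] for k in range(K)])   # step 3
        if np.abs(new_centres - centres).max() < tol:             # step 4: convergence test
            centres = new_centres
            break
        centres = new_centres
    return assign, centres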

Clustering Algorithm 2


ISODATA - Iterative Self Organizing Data Analysis Technique A

Based on the K-means algorithm, ISODATA adds two additional steps to optimize the clustering process.

1. Merging and deletion of clusters

At a suitable stage, e.g. after a number of iterations of steps 2 - 4 in the K-means algorithm, all the clusters i = 1, 2, ..., nc are examined.

If the number of pixels in a particular cluster is too small, then that particular cluster is deleted.

If two clusters are too close, then they are merged into one cluster.

2. Splitting a cluster

If the variance of a cluster is too large, that cluster can be divided into two clusters.

These two steps increase the adaptivity of the algorithm but also increase the complexity of computation. Compared to K-means, ISODATA requires more specification of parameters for deletion and merging and a variance limit for splitting. Variance has to be calculated for each cluster.

In the K-means algorithm, convergence of the clustering is not guaranteed. Therefore, we might have to specify a maximum number of iterations at which to terminate the clustering process.

Clustering Algorithm 3: Hierarchical clustering

This algorithm does not require an image analyst to specify the number of classes beforehand. It assumes that all pixels are individual clusters and systematically merges clusters by checking distances between means. This process is continued until all pixels are in one cluster. The history of merging (fusion) is recorded and they are displayed on a dendrogram, which is a diagram that shows at what distances the centers of particular clusters are merged. The following figure shows an example of this algorithm.


This procedure is rarely used in remote sensing because the relatively large number of initial clusters (one per pixel) requires a huge amount of storage in order to keep track of cluster distances at the various levels. However, the algorithm can be used when a smaller number of clusters has first been obtained by some other method.

Clustering Algorithm 4: Histogram-based clustering.

The histogram in a high-dimensional space, H(V), gives the occurrence frequency of each grey-level vector V. The algorithm finds peaks in the multi-dimensional histogram:

(1) Construct a multi-dimensional histogram

(2) Search for peaks in the multispectral space using an eight-neighbour comparison strategy, i.e. check whether the centre frequency is the highest in a 3 x 3 grey-level vector neighbourhood. For a three-dimensional space, search for the peak in a 3 x 3 x 3 neighbourhood.

(3) If a local highest frequency grey-level vector is found, it is recorded as a cluster center.


(4) After all centers are found, they are examined according to the distance between each pair of clusters. Certain clusters can be merged if they are close together. If a cluster center has a low frequency it can be deleted.

The disadvantage of this algorithm is that it requires a large amount of memory (RAM). For an 8-bit single-band image, we require 256 x 4 bytes to store the frequencies (each frequency being a 4-byte integer). As the dimensionality increases, we need 256^nb x 4 bytes of memory; when nb = 3, this amounts to 64 MB. Nevertheless, this limit can partly be overcome by a grey-level vector reduction algorithm (Gong and Howarth, 1992a).

7.3.4 Accuracy assessment

Accuracy assessment of remote sensing products

The process from remote sensing data to cartographic product can be summarized as follows:

The reference against which remote sensing products are compared is created by human generalization. Depending on the scale of the reference map product, linear features and object boundaries are allowed a buffer zone; as long as the boundaries fall within their respective buffer zones, they are considered correct.

However, this has not been the case in assessing remote sensing products. In the evaluation of remote sensing products, we have traditionally adopted a hit-or-miss approach, i.e., overlaying the reference map on top of the map product obtained from remote sensing, instead of giving the remote sensing products such tolerance buffers.


Some of the classification accuracy assessment algorithms can be found in Rosenfield and Fitzpatrick-Lins (1986) and Story and Congalton (1986).

In the evaluation of classification errors, a classification error matrix is typically formed. This matrix is sometimes called a confusion matrix or contingency table. In this table, the classification is given as rows and the verification (ground truth) is given as columns for each sample point.

The above table is an example confusion matrix. The diagonal elements indicate the numbers of samples for which the classification results agree with the reference data.

The matrix contains the complete information on categorical accuracy. The off-diagonal elements in each row are the numbers of samples that have been misclassified by the classifier, i.e., the classifier has committed a label to samples that actually belong to other classes. This misclassification error is called commission error.

The off-diagonal elements in each column are the samples that have been omitted by the classifier. This misclassification error is therefore also called omission error.


In order to summarize the classification results, the most commonly used accuracy measure is the overall accuracy, i.e. the sum of the diagonal elements divided by the total number of samples.

From the example confusion matrix, we obtain an overall accuracy of (28 + 15 + 20)/100 = 63%.

More specific measures are needed because the overall accuracy does not indicate how the accuracy is distributed across the individual categories. The categories can, and frequently do, exhibit drastically different accuracies, yet the overall accuracy treats them as if they all had equivalent or similar accuracies.

By examining the confusion matrix, it can be seen that at least two methods can be used to determine individual category accuracies.

(1) The ratio between the number of correctly classified and the row total

(2) The ratio between the number of correctly classified and the column total

(1) is called the user's accuracy, because a user is concerned with what percentage of the samples assigned to a class has been correctly classified.

(2) is called the producer's accuracy.

The producer is more interested in (2) because it tells how correctly the reference samples are classified.

However, there is a more appropriate way of presenting the individual classification accuracies. This is through the use of commission error and omission error.

Commission error = 1 - user's accuracy

Omission error = 1 - producer's accuracy

Kappa coefficient

The Kappa coefficient (κ) is the ratio of the beyond-chance agreement to the expected chance disagreement. This measure uses all elements in the matrix, not just the diagonal ones. The estimate of Kappa is the proportion of agreement after chance agreement is removed from consideration:

κ = (po - pc) / (1 - pc)

where

po = proportion of units which agree = Σ pii = overall accuracy

pc = proportion of units expected to agree by chance = Σ pi+ p+i

pij = eij / NT

pi+ = row subtotal of pij for row i

p+i = column subtotal of pij for column i

For the example confusion matrix, po = 0.63.
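A sketch computing overall accuracy, user's and producer's accuracies and kappa from a confusion matrix (rows = classification, columns = reference). The off-diagonal counts below are hypothetical; only the diagonal (28, 15, 20) and the total of 100 match the example in the text.

import numpy as np

cm = np.array([[28,  5,  2],
               [10, 15,  8],
               [ 4,  8, 20]], float)
NT = cm.sum()
p = cm / NT                                    # p_ij = e_ij / NT
po = np.trace(p)                               # overall accuracy (0.63 here)
pc = (p.sum(axis=1) * p.sum(axis=0)).sum()     # expected chance agreement
kappa = (po - pc) / (1 - pc)
users_acc = np.diag(cm) / cm.sum(axis=1)       # correct / row total
producers_acc = np.diag(cm) / cm.sum(axis=0)   # correct / column total
print(po, kappa, users_acc, producers_acc)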

One of the advantages of using this measure is that we can statistically compare two classification products. For example, two classification maps can be made using different algorithms, and we can use the same reference data to verify them. Two estimates can be derived, κ1 and κ2, and for each κ the variance can also be calculated.

It has been suggested that a z-score be calculated as

z = (κ1 - κ2) / [var(κ1) + var(κ2)]^(1/2)

A normal distribution table can then be used to determine whether the two κ's are significantly different; e.g., if |z| > 1.96, the difference is said to be significant at the 0.95 probability level.


The variance of κ can be estimated by using the following equation:

Given the above procedures, we need to know how many samples need to be collected and where they should be placed.

Sample size

(1) The larger the sample size, the more representative the estimate and the more confidence can be placed in it.

(2) In order to give each class a proper evaluation, a minimum sample size should be applied to every class.

(3) Researchers have proposed a number of pixel sampling schemes (e.g., Jensen, 1983). These are:

• Random

• Stratified Random

• Systematic

• Stratified Systematic Unaligned Sampling

7.4 Non-Conventional Classification Algorithms

1. By conventional classification, we refer to algorithms that make use of only multi-spectral information in the classification process.

2.


3. The problem with multi-spectral classification is that no spatial information in the image is utilized. In fact, this is the main difference between human interpretation and computer-assisted image classification. Human interpretation always involves the use of spatial information such as texture, shape, shade, size, site, association, etc. While the strength of computer techniques lies in handling the grey-level values in the image, in making use of spatial information computer techniques lag far behind. Making use of spatial patterns in an image is therefore an active field in image understanding (a subfield of pattern recognition and artificial intelligence).

We can summarize three general types of non-conventional classification:

Preprocessing approach,

Post processing approach, and

Use of contextual classifier.

Diagram 1 shows the procedures involved in a preprocessing method. The indispensable part of a preprocessing classification method is the involvement of spatial-feature extraction procedures.

Thanks to the development in the image understanding field, we are able to use part of the spatial information in image classification. Overall, there are two types of approaches to make use of spatial information.

- Region-based classification (object-based)

- Pixel window-based classification

Object-based classification

In order to classify objects, one has to somehow partition the original imagery. This can be done with image segmentation techniques that have been introduced previously, such as thresholding, region-growing and clustering.

The resultant segmented image can then be passed on to the region extraction procedure, where segments are treated as a whole object for the successive processing.

For instance, we can generate a table for each object as an entity table. From the entity table, we can proceed with various algorithms to complete classification, or prior to classification, we may do some preprocessing, such as filtering out some small objects.


We may have to base our classification decision on some neighbourhood information. Gong and Howarth (1990) have developed a knowledge-based system to conduct a region-based (object-based) classification.

4. Pixel-window based classification

In a pixel-window based classification, a labelling decision is made for one pixel according to the multi-spectral data of a window centred on it. These data contain information not only on the pixel itself but also on its neighbourhood.

A pixel window can be of any size, as long as it does not exceed the size of an image. For computational simplicity, however, odd-sized squares are used.

The grey-level variability within a pixel window can be measured and used in a classification algorithm. This grey-level variability is referred to as texture (Haralick, 1979). The following are some commonly used texture measures:

(1) Simple statistics transformation

For each pixel window, we can calculate parameters as in Table 7.4 (Hsu, 1978; Gong and Howarth, 1993).

TABLE 7.4. STATISTICAL MEASURES USED FOR SPATIAL FEATURE EXTRACTION

Feature Code   Full Name
AVE            Average
STD            Standard Deviation
SKW            Skewness
KRT            Kurtosis
ADA            Absolute Deviation from the Average
CCN            Contrast Between the Center Pixel and its Neighbors
ACN            Average Difference Between the Center Pixel and its Neighbors
CAN            Contrast Between Adjacent Neighbors
CAS            Sum of the Squared CAN
CSN            Contrast Between the Second Neighbors
CSS            Sum of the Squared CSN
RXN            Range
MED            Median

The mathematical descriptions of these measures are expressed in terms of: the pixel value at a location in the window, the value of the center pixel, the values of a pair of adjacent pixels, the values of a pair of every second neighbors, the number of pixels in the window, the number of pairs of adjacent neighbors, and the number of pairs of every second neighbors.
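A sketch of a few of the window statistics in Table 7.4 (AVE, STD, SKW, KRT and ADA) computed for every pixel of a single-band image; the exact formulas in the original table are not reproduced here, so the standard moment definitions are used as an assumption.

import numpy as np

def window_features(I, w=3):
    r = w // 2
    P = np.pad(I.astype(float), r, mode='edge')
    wins = np.stack([P[i:i + I.shape[0], j:j + I.shape[1]]
                     for i in range(w) for j in range(w)])   # (w*w, rows, cols)
    ave = wins.mean(axis=0)
    std = wins.std(axis=0)
    dev = wins - ave
    s = np.where(std == 0, 1, std)                           # avoid division by zero
    skw = (dev ** 3).mean(axis=0) / s ** 3                   # skewness
    krt = (dev ** 4).mean(axis=0) / s ** 4                   # kurtosis
    ada = np.abs(dev).mean(axis=0)                           # absolute deviation from average
    return ave, std, skw, krt, ada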


(2) Grey-level co-occurrence matrix method (used to characterize textures)

The matrix is determined by enumerating all possible combinations of the grey levels of pairs of pixels in a pixel window. The pixel pairs are defined by their separation distance (D) and angle (a).

From the grey-level co-occurrence matrix, one can generate a number of texture parameters (Haralick et al., 1973); a computational sketch is given after the following list. These include:

Homogeneity

Contrast

Entropy, etc.
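A sketch of a grey-level co-occurrence matrix for a single (distance, angle) offset and the three measures just listed; the window is assumed to contain integer grey levels in the range 0..levels-1, and the specific formulas follow the commonly used forms rather than the original paper verbatim.

import numpy as np

def glcm(window, levels, offset=(0, 1)):         # offset (0, 1): D = 1, a = 0 degrees
    di, dj = offset
    M = np.zeros((levels, levels))
    rows, cols = window.shape
    for i in range(rows):
        for j in range(cols):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols:
                M[window[i, j], window[ni, nj]] += 1
    return M / max(M.sum(), 1)                    # normalize to co-occurrence probabilities

def texture_measures(P):
    i, j = np.indices(P.shape)
    homogeneity = (P / (1.0 + np.abs(i - j))).sum()
    contrast = (P * (i - j) ** 2).sum()
    entropy = -(P[P > 0] * np.log(P[P > 0])).sum()
    return homogeneity, contrast, entropy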

Although these methods have been used in many remote sensing applications, they require a large amount of computation and disk space. Many parameters need to be determined, such as the pixel-window size, the distance, the angle and which statistics to use.

Most of these spatial features can be categorized into two groups. The first group of spatial features is similar to an average filtered image. The second group is similar to an edge-enhanced image.

The simplest example for the post-processing contextual classification is through filtering such as majority filtering.

(3) Majority filter
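A minimal sketch, as one possible illustration of majority filtering: each pixel of a classified (label) image is replaced by the most frequent label in the surrounding window.

import numpy as np

def majority_filter(labels, w=3):
    r = w // 2
    rows, cols = labels.shape
    P = np.pad(labels, r, mode='edge')
    out = np.empty_like(labels)
    for i in range(rows):
        for j in range(cols):
            win = P[i:i + w, j:j + w].ravel()
            vals, counts = np.unique(win, return_counts=True)
            out[i, j] = vals[counts.argmax()]    # most frequent label in the window
    return out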


(4) Grey-level vector reduction and frequency-based classification.

After testing a number of pixel-window based contextual classification algorithms, Gong and Howarth (1992a) found that most of these algorithms either required too much computation, or did not significantly improve classification accuracies when they were applied to the classification of SPOT HRV XS data acquired over an urban area. They developed a procedure called a grey-level vector reduction and a frequency-based classification which was tested using the same SPOT data set and some other data sets, such as TM data and CASI (7.5 m x 7.5 m spatial resolution) data. The results proved that the frequency-based classification method could save a significant amount of computation while achieving high classification accuracies.

Chapter 7

References

Chen, Q., and others, 1989. Remote Sensing and Image Interpretation. Higher Education Press, Beijing, China, (In Chinese).

Gong P. and P.J. Howarth, 1990a. Land cover to land use conversion: a knowledge-based approach, Technical Papers, Annual Conference of American Society of Photogrammetry and Remote Sensing, Denver, Colorado, Vol. 4, pp.447-456.

_____, 1990b. An assessment of some factors influencing multispectral land-cover classification, Photogrammetric Engineering and Remote Sensing, 56(5):597-603.

_____, 1990c. Impreciseness in land-cover classification: its determination, representation and application. The International Geoscience and Remote Sensing Symposium, IGARSS '90, pp. 929-932.

_____, 1992a. Frequency-based contextual classification and grey-level vector reduction for land-use identification. Photogrammetric Engineering and Remote Sensing, 58(4):421-437.

_____, 1992b. Land-use classification of SPOT HRV data using a cover-frequency method. International Journal of Remote Sensing, .

_____, 1993. An assessment of some small window-based spatial features for use in land-cover classification, IGARSS'93, Tokyo, August 18-22, 1993.

Gonzalez, R. C., and P. Wintz, 1987. Digital Image Processing, 2nd. Ed., Addison-Wesley Publishing Company, Reading, Mass.

Haralick, R. M., 1979. Statistical and structural approaches to texture. Proceedings of the IEEE, 67(5):786-804.

Haralick, R. M., Shanmugan, K. and Dinstein, I., 1973. Texture features for image classification. IEEE Transactions on Systems, Man and Cybernetics, SMC-3(6):610-621.


Hsu, S., 1978. Texture-tone analysis for automated landuse mapping. Photogrammetric Engineering and Remote Sensing, 44(11):1393-1404.

Jensen, J.R., 1983. Urban/Suburban Land Use Analysis. In R.N. Colwell (editor-in-chief), Manual of Remote Sensing, Second Edition, American Society of Photogrammetry, Falls Church, USA, pp. 1571-1666.

Lillesand, T. M., and R. W. Kiefer, 1994. Remote Sensing and Image Interpretation. 3rd Edition, John Wiley and Sons, New York.

Peddle, D., 1991. Unpublished Masters Thesis, Department of Geography, The University of Calgary.

Richards, J. A., 1986. Remote Sensing Digital Image Analysis: An Introduction. Springer-Verlag, Berlin.

Rosenfield, G. H., and K. Fitzpatrick-Lins, 1986. A coefficient of agreement as a measure of thematic classification accuracy. Photogrammetric Engineering and Remote Sensing, 52(2):223-227.

Story, M. and R. G. Congalton, 1986. Accuracy assessment, a user's perspective. Photogrammetric Engineering and Remote Sensing, 52(3):397-399.

Swain, P. H., and S. M. Davis (editors.), 1978. Remote Sensing: The Quantitative Approach. McGraw-Hill, New York.

Yen, J., 1989. Gertis: a Dempster-Shafer approach to diagnosing hierarchical hypotheses. Communications of the ACM, 32(5):573-585.

Further Readings

Ball, G. H., and J. D. Hall, 1967. A clustering technique for summarizing multivariate data. Behavioral Science, 12:153-155.

Bezdek, J.C., R. Ehrlich & W. Fall, 1984, FCM: the fuzzy c-means clustering algorithm, Computers and Geoscience, 10:191-203.

Bishop, Y. M. M., S. E. Feinberg, and P. W. Holland, 1975. Discrete Multivariate Analysis - Theory and Practice. The MIT Press, Cambridge, Mass.

Chittineni, C. B., 1981. Utilization of spectral-spatial information in the classification of imagery data. Computer Graphics and Image Processing, 16:305-340.

Cibula, W. G., M. O. Nyquist, 1987, Use of topographic and climatological models in geographical data base to improve Landsat MSS classification for Olympic national park. Photogrammetric Engineering and Remote Sensing, 53(1):67-76.

Cohen, J., 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, Vol. 20, No. 1, pp. 37-46.

Congalton, R. G., and R. A. Mead, 1983. A quantitative method to test for consistency and correctness in photointerpretation. Photogrammetric Engineering and Remote Sensing, 49(1):69-74.


Conners, R. W., and C. A. Harlow, 1980. A theoretical comparison of texture algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-2(3): 204-222.

Fleiss, J. L., J. Cohen, and B. S. Everitt, 1969. Large sample standard errors of Kappa and weighted Kappa. Psychological Bulletin, Vol. 72, No. 5, pp. 323-327.

Fu, K. S and Yu, T. S., 1980. Spatial Pattern Classification Using Contextual Information, Research Studies Press, Chichester, England.

Fung, T., and E. F. LeDrew, 1987. Land cover change detection with Thematic Mapper spectral/textural data at the rural-urban fringe. Proceedings of 21st Symposium on Remote Sensing of Environment, Ann Arbor, Mi., Vol. 2, pp.783-789.

_____, 1988. The determination of optimal threshold levels for change detection using various accuracy indices. Photogrammetric Engineering and Remote Sensing, 54(10):1449-1454.

Gong, P., D. Marceau, and P. J. Howarth, 1992. A comparison of spatial feature extraction algorithms for land-use mapping with SPOT HRV data. Remote Sensing of Environment. 40:137-151.

Gong, P., J. R. Miller, J. Freemantle, and B. Chen, 1991. Spectral decomposition of Landsat TM data for urban land-cover mapping, 14th Canadian Symposium on Remote Sensing, pp.458-461.

Ketting, R. J., and Landgrebe, D. A., 1976. Classification of multispectral image data by extraction and classification of homogeneous objects. IEEE Transactions on Geoscience and Electronics, GE-14(1):19-26.

Landgrebe, D. A. and E. Malaret, 1986. Noise in remote sensing systems: the effects on classification error. IEEE Transactions on Geoscience and Remote Sensing, GE-24(2):


8. Integrated Analysis of Multisource Data

8.1 Introduction to Multi-Source Spatial Data

1. Spatial Data

Any data with an associated locational aspect are spatial data. In real life, we often ask the question "where". Where is the bus stop? Where is the post office? Knowing where is a major part of human life. In our computerized information society, most questions of where can be answered in a computer system. However, we are not satisfied with only knowing where something is; we may also need to know how things at a specific location are related, and we want to use what is known to infer unknown aspects and unknown locations.

From highly urbanized areas to sparsely populated areas, spatial data play an important role in our modern society. In this chapter, we will focus on the natural environment, where different types of natural resources, land covers and uses, and accessibility often concern us. To find out what exists at a particular location, one would read a map. As surveyors or cartographers, it is our job to make such a map. Traditionally, one has to go to the field (a particular place) to measure the location and record what exists there; this is the traditional survey and mapping approach. A second approach is to use aerial photography and remote sensing, techniques developed since World War I. As the technology advances, we observe a revolutionary leap in instruments and associated data processing techniques, and satellite-based technology now occupies an important position in the geomatics field. We begin by asking the following questions:

• What are spatial data?

• What are the general approaches to spatial data collection?

• What is the current status of spatial data acquisition technology?

• What will be achieved in the near future in spatial data acquisition?

_________

An image is a medium for communication; high-resolution TV is a tool for communication.

The computer provides us the processing power.

Telecommunication is the transmission tool.

Think about fax machines and modems.

2. Spatial Data Collection

• How are spatial data collected?


First hand

Second hand - digitizing from maps.

Knowing how spatial data are collected helps us to appreciate the possible level of errors or uncertainties involved in the data collection process.

• In what forms are spatial data collected? How is spatial sampling done?

· Random collection

· Systematic sampling or complete coverage

· Other hybrids of the first two

One needs to determine the density of sampling; obviously, the more densely one collects data, the more likely one is to represent reality.

The density of sampling is a function of a number of factors,

(1) the complexity of the phenomena,

(2) the capability of the measuring tools

(3) the available accuracy requirement

(4) economic considerations

Most of the time, we tend to use second-hand spatial data, i.e., currently available data, which are often in map form.

• How are maps made?

For thematic maps,

(1) Manually · Base map preparation

· Thematic data transfer (from survey, aerial photographs remote sensing images) on to the base map

· Interpolation or extrapolation may be needed

· Classification, generalization, symbolization and decoration

· Layer separation and printing


(2) Computer-assisted · Base map preparation

· Geometric transformation (include interpolation)

· Data conversion, classification, generalization

· Legend design and decoration

· Printing

For base maps,

· Select proper map projection

- preserving area

- preserving length

- preserving direction

· Data transfer

· Interpolation or extrapolation

· Generalization, symbolization and decoration

· Printing

Reference: Robinson, Elements of Cartography

• What is a thematic map?

• What is a base map?

• What is a reference map?

• What is a topographic map?

3. Types of Spatial Data

According to geometrical properties

• Positional


• Linear

• Areal

• Volumetric

According to thematic entities

• Natural resources, forest, geological, lithological, agricultural, climatic

• Man-made

• Municipal

• Cadastral, etc.

4. Scale of Measurement In Spatial Data

• Nominal

• Ordinal

• Interval

• Ratio

5. Multi-Source Data Analysis

• Map overlay, for a particular location, collects all the necessary data so as to derive useful information.

• Similar to decision making in ordinary life, where one needs to accumulate evidence in order to arrive at a decision, in multi-source data analysis each piece of evidence recorded in the data is evaluated to validate a certain hypothesis.

• It is the objective of this chapter to examine a number of schemes for integrated analysis of spatial data. Algorithms developed in pattern recognition and artificial intelligence can be used.

8.3 Integrated Analysis of Multi-Source Data

In daily life, we use our sensing organs and brain to recognize things and then make decisions and take actions. Our sensing organs include the eyes, ears, nose, tongue and skin; the first three are our remote sensors. Our sensors pass scenes, sounds, smells, tastes and feelings to our brains, which


process and analyze the evidence collected by the different sensors and compare it with things in our memory that have been recognized before, to see whether, based on the data collected, we can recognize (label) the newly detected thing as one of the things recognized before. If the recognized thing is a tree in our way, our brain may decide to go around it. In an increasingly competitive society, in order to make optimized decisions we have to make the best use of all the evidence available to arrive at an accurate recognition. In our daily life we experience thousands of processes like this: evidence collection - evidence analysis - decision making - action taking. For example, our eyes cannot resolve details that are too far away or too small; this has been made possible with the help of the telescope and the microscope.

We cannot see in spectral ranges outside the visible wavelength region, but various detectors sensitive to different non-visible regions can record images for us to see as if our eyes were sensitive to those spectral regions. In spatial data handling, our brains cannot memorize exactly the location and spatial extent that a certain phenomenon occupies, but electro-magnetic media can be used to do so. The volume of evidence is so large that our brain can process only a very small amount of it; therefore, we need computers to assist us. In this chapter, we examine some of the techniques that can be used in computer-assisted handling of various spatial evidence, especially the integrated analysis of spatial evidence from multiple sources, such as field survey, remote sensing and/or existing map sources.

Data integration: integrate spatial data from different sources for a single application. What types of application are we referring to?

One problem in data integration is:

incompatibility between spatial data sets, in the following aspects:

• data structures

• data types

• spatial resolutions

• levels of generalization

- Data structures: Raster vs. Vector

Discrepancies in the concepts of spatial representation (cell ←→ object):

                                   Raster                   Vector
  Location                         (i, j)                   {(xi, yi)}
  Entity/attribute represented     Incomplete/broken        Complete
  Ease of representation           Continuous phenomena     Discrete phenomena; more flexible
  Level of generalization          Low                      High
  Communication                    Hard                     Easy
  Storage                          Large amount             Less

_________

Is overlay of digital files a data integration method?

Yes, a very preliminary one. Given two data sets A = {(x, y) : z}

B = {(x, y) : u}     A ∪ B = {(x, y) : z, u}

It is more or less a data accumulation.
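As a minimal sketch of overlay as data accumulation, the dictionary-based merge below collects the attributes recorded for the same (x, y) location in two layers into a single record; the layer contents and attribute names are hypothetical.

```python
# A minimal sketch of overlay as data accumulation: two layers keyed by the
# same (x, y) locations are merged so that every attribute is retained.
# The layer contents below are hypothetical illustrations.

def overlay(layer_a, layer_b):
    """Union-style overlay of {(x, y): {attribute: value}} dictionaries."""
    merged = {}
    for xy in set(layer_a) | set(layer_b):
        record = {}
        record.update(layer_a.get(xy, {}))
        record.update(layer_b.get(xy, {}))
        merged[xy] = record
    return merged

A = {(10, 20): {"z": "forest"}, (10, 21): {"z": "water"}}
B = {(10, 20): {"u": "loam"},   (10, 22): {"u": "clay"}}

print(overlay(A, B))
# e.g. {(10, 20): {'z': 'forest', 'u': 'loam'}, (10, 21): {'z': 'water'}, (10, 22): {'u': 'clay'}}
```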

Five types of models,

PM Point Model : (x, y, z, ...)

GM Grid Model : (i, j, z, ...), Δi, Δj

LM Line Model : ({x, y}, z, ...)

AM Area Model : ({x, y}, z, ...)

CM Cartographic Model: Traditional meaning {PM, LM, AM}

Now {PM, GM, LM, AM}

_________

* An important extension: the 3rd spatial dimension and the temporal dimension.

* Discussion:


Do PM, LM, AM involve scale as their individual components?

No, data acquisition error and processing error are involved.

Only CM involves scale.

* Scale, generalization, error and uncertainty are so interrelated that they deserve some conceptual clarification.

Methods for converting between different data models

• Aggregation

(1) Point → Surface: interpolation

(2) Grid → Larger grid: majority rule; composite rule based on statistics (see the sketch after this list)

From a low generalization level to higher levels:

(3) Line → Simplified line

(4) Area → Simplified area, or point
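As a concrete illustration of (2), the sketch below aggregates a categorical raster to a coarser grid by the majority rule; the grid values, block size and helper name are illustrative assumptions.

```python
from collections import Counter

def aggregate_majority(grid, block):
    """Aggregate a categorical raster to a coarser grid by majority rule.

    grid  : list of equal-length rows of class labels
    block : integer block size (block x block fine cells become one coarse cell)
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, block):
        out_row = []
        for c in range(0, cols, block):
            cells = [grid[i][j]
                     for i in range(r, min(r + block, rows))
                     for j in range(c, min(c + block, cols))]
            out_row.append(Counter(cells).most_common(1)[0][0])  # majority class
        out.append(out_row)
    return out

fine = [["A", "A", "B", "B"],
        ["A", "C", "B", "B"],
        ["C", "C", "A", "B"],
        ["C", "C", "A", "A"]]
print(aggregate_majority(fine, 2))   # [['A', 'B'], ['C', 'A']]
```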

_________

Comment on why we need (3).

• Disaggregation

Boundaries → Probability surfaces

_________

(1) Mark and Csillag's model (1989)

Homogeneity is broken only at the boundaries.

(2) Goodchild et al. (1992): spatial autoregressive model

ε = {εi}, the εi being random numbers obeying a (0, σ²) distribution

X' = ρ W X + ε

X' = {x'i}, x'i ∈ R

X = {xi}, xi ∈ {0, 1} or {A, B, ...}

ρ is a spatial dependence factor,

W is an N × N weight matrix of interactions between pixels
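The sketch below illustrates the general idea of a spatially autocorrelated error surface of the form X' = ρWX + ε with a rook-neighbour weight matrix; the grid size, ρ and σ values are arbitrary assumptions, and the simultaneous form is applied directly rather than reproducing Goodchild et al.'s full error-model procedure.

```python
import numpy as np

# Illustrative sketch of X' = rho * W * X + eps on a small grid; all
# parameter values are arbitrary choices for demonstration.
n = 8                                   # grid is n x n pixels
rho, sigma = 0.6, 1.0                   # spatial dependence factor, noise sd
N = n * n

# Row-standardized rook-neighbour weight matrix W (N x N).
W = np.zeros((N, N))
for r in range(n):
    for c in range(n):
        i = r * n + c
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n and 0 <= cc < n:
                W[i, rr * n + cc] = 1.0
W = W / W.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=N).astype(float)   # binary class indicator {0, 1}
eps = rng.normal(0.0, sigma, size=N)           # random numbers from (0, sigma^2)

X_prime = rho * W @ X + eps                    # real-valued, spatially dependent surface
print(X_prime.reshape(n, n).round(2))
```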

_________

The question is: do we need to disaggregate our data? What are the uncertainties involved in the disaggregation process?

8.4 A Review of Probability Theory

Let Ω denote a finite collection of mutually exclusive statements about the world. By ε = 2^Ω we denote the set of all events. The empty set ∅, a subset of every set by definition, is called the impossible event, since the outcome of a random selection can never be an element of ∅. On the other hand, the set Ω itself always contains the actual outcome; therefore it is called the certain event. If A and B are events, then so are the union of A and B, A ∪ B, and the complements of A (Ā) and B (B̄), respectively. For example, the event A ∪ B occurs if and only if A occurs or B occurs. We call the pair (Ω, ε) the sample space. A function P : ε → [0, 1] is a probability if it satisfies the following conditions, well known as the Kolmogorov axioms:

(1) P(A) ≥ 0 for all A ⊆ Ω

(2) P(Ω) = 1

(3) For A, B ⊆ Ω with A ∩ B = ∅,

P(A ∪ B) = P(A) + P(B)

P(A) or P(B) is known as the prior probability of A or B occurring. The prior probability of an event is not conditioned on the occurrence of any other event. Suppose it is noted in an experiment that for 1 ≤ i ≤ n, the event Ai occurred ki times. Then under the conventional evaluation, called the maximum likelihood evaluation:

Pm(Ai) = ki / Σj kj

but under an alternative evaluation, called the Bayesian evaluation:

Pb(Ai) = (ki + 1) / (Σj kj + n)

Under this evaluation, we implicitly assume that each event has already occurred once even before the experiment commenced. When Σi ki → ∞,

Pm(Ai) = Pb(Ai)

Nevertheless, for any finite number of trials,

0 < Pb(Ai) < 1 .
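A small sketch of the two evaluations computed from observed counts; the class names and counts are invented for illustration.

```python
# Maximum likelihood vs. Bayesian (add-one) evaluation of event probabilities
# from observed counts k_i; the counts below are made up for illustration.
counts = {"water": 12, "forest": 30, "urban": 8, "bare soil": 0}

total = sum(counts.values())
n = len(counts)

p_ml = {a: k / total for a, k in counts.items()}                  # Pm(Ai) = ki / sum(kj)
p_bayes = {a: (k + 1) / (total + n) for a, k in counts.items()}   # Pb(Ai) = (ki + 1)/(sum(kj) + n)

for a in counts:
    print(f"{a:10s}  Pm = {p_ml[a]:.3f}   Pb = {p_bayes[a]:.3f}")
# Note that Pb never reaches 0 or 1, even for the unobserved class.
```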

Let P(A|B) denote the probability of event A occurring conditioned on event B having already occurred. P(A|B) is known as the posterior probability of A subject to B, or the conditional probability of A given B.

For a single event A, A ⊆ Ω, the following hold:

P(A) ≤ 1

P(Ā) = 1 − P(A)

For A ⊆ B and A, B ⊆ Ω, P(A) ≤ P(B) (monotonicity)

P(A ∪ B) ≤ P(A) + P(B) (subadditivity)

For A ⊆ B, P(B ∩ Ā) = P(B) − P(A) (subtractivity)

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Finally, for a number of events {Ai | i = 1, ..., n},

P(A1 ∪ A2 ∪ ... ∪ An) = S1 − S2 + S3 − S4 + ... + (−1)^(n−1) Sn

where S1 = Σi P(Ai)

S2 = Σi<j P(Ai ∩ Aj)

S3 = Σi<j<k P(Ai ∩ Aj ∩ Ak), and so on, up to

Sn = P(A1 ∩ A2 ∩ ... ∩ An)

For the conditional probability P(A|B), with A, B ⊆ Ω and P(B) > 0, define

P(A|B) = P(A ∩ B) / P(B) .

We then have

P(A1 ∩ A2 ∩ ... ∩ An)

= P(A1) · P(A2|A1) · P(A3|A1 ∩ A2) · ... · P(An|A1 ∩ ... ∩ An−1)

where Ai ⊆ Ω.

If Ai ∈ ε for i = 1, ..., n, and

Ai ∩ Aj = ∅ for i ≠ j and ∪i Ai = Ω,

and P(Ai) > 0, then for any given B ∈ ε

P(B) = Σi P(Ai) · P(B|Ai)

_________

This is the complete (total) probability of event B. Therefore the conditional probability can be written as

P(Ai|B) = P(Ai) · P(B|Ai) / Σj P(Aj) · P(B|Aj)

This is the Bayes formula. A number of different versions of this formula will be discussed.

As P(Ā) = 1 − P(A) and P(A|B) = P(A ∩ B) / P(B),

it can be derived that

P(Ā|B) = 1 − P(A|B)

Definition: The prior odds on event A are

O(A) = P(A) / P(Ā) .

Since P(Ā) = 1 − P(A), O(A) = P(A) / (1 − P(A)).

Therefore P(A) can be represented by its prior odds:

P(A) = O(A) / (1 + O(A))

Definition: The posterior odds on event A conditioned on event B are

O(A|B) = P(A|B) / P(Ā|B) .

Similarly,

O(A|B) = P(A|B) / (1 − P(A|B)), and thus

P(A|B) = O(A|B) / (1 + O(A|B))

Assume event A is a hypothesis h and event B is a piece of evidence e. With the definition of conditional probability, the following hold:

P(h|ē) = P(h ∩ ē) / P(ē) ,

P(h̄|ē) = P(h̄ ∩ ē) / P(ē) ,

P(h|e) = P(h ∩ e) / P(e) , and

P(h̄|e) = P(h̄ ∩ e) / P(e)

The odds on h conditioned on e being absent are obtained by:

O(h|ē) = P(h|ē) / P(h̄|ē) = [P(ē|h) / P(ē|h̄)] · O(h)

This is called an odds-likelihood formulation of the Bayes theorem. Depending on the context, the following expressions can be used synonymously: e does not occur, e is absent, e does not exist, and e is false.

Similarly,

O(h|e) = [P(e|h) / P(e|h̄)] · O(h) .

This is also an odds-likelihood formulation of the Bayes theorem. The following expressions can be used synonymously: e occurs, e is present, e exists, and e is true.

For a hypothesis supported by multiple pieces of evidence, generalizing the above gives

O(h | ē1 ∩ ē2 ∩ ... ∩ ēk ∩ ek+1 ∩ ... ∩ em) = [P(ē1 ∩ ... ∩ ēk ∩ ek+1 ∩ ... ∩ em | h) / P(ē1 ∩ ... ∩ ēk ∩ ek+1 ∩ ... ∩ em | h̄)] · O(h)

When all pieces of evidence are conditionally independent given h and given h̄,

O(h | ē1 ∩ ē2 ∩ ... ∩ ēk ∩ ek+1 ∩ ... ∩ em) = [∏i=1..k P(ēi|h)/P(ēi|h̄)] · [∏i=k+1..m P(ei|h)/P(ei|h̄)] · O(h)
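A minimal sketch of this odds-likelihood update for mutually independent pieces of evidence: each observed piece contributes the ratio P(e|h)/P(e|h̄), each absent piece the ratio P(ē|h)/P(ē|h̄). All probability values are invented for illustration.

```python
# Odds-likelihood update of a hypothesis h from independent pieces of evidence.
# Each item gives (P(e|h), P(e|not h), observed?); all numbers are invented.
evidence = [
    (0.80, 0.30, True),    # e1 present
    (0.60, 0.20, True),    # e2 present
    (0.50, 0.70, False),   # e3 absent
]

def posterior_odds(prior_prob_h, evidence):
    odds = prior_prob_h / (1.0 - prior_prob_h)           # O(h)
    for p_e_h, p_e_not_h, present in evidence:
        if present:
            odds *= p_e_h / p_e_not_h                     # P(e|h) / P(e|not h)
        else:
            odds *= (1.0 - p_e_h) / (1.0 - p_e_not_h)     # P(not e|h) / P(not e|not h)
    return odds

odds = posterior_odds(prior_prob_h=0.4, evidence=evidence)
print("posterior odds       :", round(odds, 3))
print("posterior probability:", round(odds / (1.0 + odds), 3))   # P(h|evidence)
```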

8.5 Application Of Probability Theory

In logical terms, e implies h, that is, e → h, can be alternatively read as 'e is sufficient for h' or as 'h is necessary for e'. There is no ambiguity in the relation between e and h, i.e., the reliability is 100%. In reality, however, the reliability of e in support of h is lower than that of a logical implication.

8.5.1. Necessity and Sufficiency Measures

A piece of evidence e can usually be in two states: absent or present. When P(e) = 0 or P(e) = 1, it is of no practical interest, since either way there is nothing to observe. The same applies to h. Therefore, we shall assume 0 < P(e) < 1 and 0 < P(h) < 1.

To study the necessity and sufficiency measures of e for h, we need to explore the influence that a state of e has on h. If the state of e makes h more plausible, we say that the state of e encourages h. If it makes h less plausible, we say that the state of e discourages h. If it neither encourages nor discourages h, then the state of e has no influence on h, or e and h are independent of each other.

For the necessity measure, we first explore how the absence of e influences h. From O(h|ē) = [P(ē|h) / P(ē|h̄)] · O(h)

we define N = P(ē|h) / P(ē|h̄) ,

0 ≤ N ≤ ∞

Similarly, from O(h|e) = [P(e|h) / P(e|h̄)] · O(h) we define the sufficiency measure S = P(e|h) / P(e|h̄), and we have

S = ∞ : P(h|e) = 1 , e → h , ∴ e is sufficient for h

1 < S < ∞ : P(h|e) > P(h) , e encourages h

S = 1 : no influence

0 < S < 1 : P(h|e) < P(h) , e discourages h

S = 0 : P(h|e) = 0 , e → h̄ , h → ē , ∴ e is sufficient for h̄ .

From the above analysis, it is clear that N and S are measures of necessity and sufficiency, respectively. N, S and O(h), which are needed to evaluate O(h|ē) and O(h|e), are provided by domain experts. Quite often, instead of directly supplying N and S, domain experts may supply values of P(e|h) and P(e|h̄), i.e., the probabilities of observing the evidence under a certain hypothesis h or h̄. In that case,

N = [1 − P(e|h)] / [1 − P(e|h̄)] = P(ē|h) / P(ē|h̄)

S = P(e|h) / P(e|h̄) .
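The short sketch below derives N and S from expert-supplied P(e|h) and P(e|h̄) and applies them to the prior odds; the numerical values are again invented.

```python
# Deriving the necessity (N) and sufficiency (S) measures from expert-supplied
# conditional probabilities; all values are invented for illustration.
p_e_given_h = 0.80       # P(e|h)
p_e_given_not_h = 0.30   # P(e|not h)

S = p_e_given_h / p_e_given_not_h                        # sufficiency measure
N = (1.0 - p_e_given_h) / (1.0 - p_e_given_not_h)        # necessity measure

prior_odds = 0.4 / 0.6                                   # O(h) for P(h) = 0.4
print("S =", round(S, 3), "  O(h|e)     =", round(S * prior_odds, 3))
print("N =", round(N, 3), "  O(h|not e) =", round(N * prior_odds, 3))
```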

8.5.2. Posterior Probability Estimation

In the above section, it was explained that the necessity and sufficiency measures N and S can be determined from conditional probabilities such as P(e|h) and P(e|h̄) provided by domain experts. Sometimes the system engineer may have to participate in the process of determining P(e|h) and P(e|h̄), as will be explained later in this chapter (e.g., classification of land-use/cover types from remotely sensed images).

In spatial data handling, domain experts may provide the spatial data required, or we may be requested to collect further data from sources such as remote sensing images. Domain experts may also provide their knowledge on where a specific hypothesis has been validated. It may be our responsibility to transform this type of knowledge into a computer system. The processes of collecting and encoding expert knowledge are called knowledge acquisition and knowledge representation, respectively. While various complex computer structures for knowledge representation may be used, relatively simple procedures such as parametric statistical models or non-parametric look-up tables are often employed. For the parametric method, further reading is Richards (1986); for the non-parametric approach, refer to Duda and Hart (1973). Remote sensing image classification can be considered a process of hypothesis testing in which remotely sensed data are treated as evidence and a number of classes represent a list of hypotheses. In remote sensing image classification, the equivalent of the knowledge acquisition and representation processes is supervised training (Gong and Howarth, 1990; Gong and Dunlop, 1991).

8.5.3. Maximum Likelihood Decision Rule Based on Penalty Functions

In a classification problem, we are given a data set X = {xi | i = 1, 2, ..., N}, where each vector xi is considered a piece of evidence. It may support a number of classes (hypotheses) H = {hj | j = 1, 2, ..., M}. To develop the general method for maximum likelihood classification, the penalty function, or loss function, is introduced:

λ(j|k) , j, k = 1, ... , M .

This is a measure of the loss or penalty incurred when a piece of evidence is taken to support class hj when in fact it should support class hk. It is reasonable to assume that λ(j|j) = 0 for all j. This implies that there is no loss for evidence supporting the correct class. For a particular piece of evidence xi, the penalty incurred when xi erroneously supports hj while hk is the correct class is:

λ(j|k) · p(hk|xi)

where p(hk|xi) is, as before, the posterior probability that hk is the correct class for evidence xi. Averaging the penalty over all possible hypotheses, we have the average penalty, called the conditional average loss, associated with evidence xi supporting class hj. That is:

L(hj) = Σk=1..M λ(j|k) · p(hk|xi)

L is a measure of the accumulated penalty incurred given the evidence could have supported any of the available classes and the penalty functions relating all these classes to class hj.

Thus a useful decision rule for evaluating a piece of evidence for support of a class is to choose that class for which the average loss is the smallest, i.e.,

xi encourages hj, if L(hj) < L(hk) for all k ≠ j

This is the algorithm that implements Bayes' rule. Because p(hk|xi) is usually not available, it is evaluated from p(xi|hk), p(hk) and p(xi):

p(hk|xi) = p(xi|hk) · p(hk) / p(xi)

Thus

L(hj) = [1 / p(xi)] · Σk=1..M λ(j|k) · p(xi|hk) · p(hk)

The λ(j|k)'s can be defined by domain experts.

A special case of the λ(j|k)'s is given as follows:

Suppose λ(j|k) = 1 − Fjk with Fjj = 1 and the other Fjk to be defined. Then from the above formula we have

L(hj) = [1 / p(xi)] · Σk p(xi|hk) p(hk) − [1 / p(xi)] · Σk Fjk p(xi|hk) p(hk)

= 1 − [1 / p(xi)] · Σk Fjk p(xi|hk) p(hk)

The minimum penalty decision rule then becomes a search for the maximum of g(hj), where

g(hj) = Σk Fjk p(xi|hk) p(hk) .

Thus the decision rule is

xi encourages hj, if g(hj) > g(hk) for all k ≠ j

If Fjk = δjk, the Kronecker delta function, i.e.,

δjk = 1 for j = k and 0 for j ≠ k,

g(hj) is further simplified to

g(hj) = p(xi|hj) · p(hj)

and thus the decision rule becomes

xi encourages hj if p(xi|hj) p(hj) > p(xi|hk) p(hk) for all k ≠ j.

This is the commonly-used maximum likelihood decision rule.
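A small sketch of the decision rule: with a general penalty matrix it picks the class of minimum conditional average loss, and with the 0-1 (Kronecker delta) penalty it reduces to maximizing p(xi|hj) p(hj). The class-conditional densities are invented one-dimensional Gaussians.

```python
import numpy as np

# Minimum-average-loss (Bayes) decision vs. the maximum likelihood rule.
# Class-conditional densities are 1-D Gaussians with invented parameters.
means, sigmas = np.array([2.0, 5.0, 9.0]), np.array([1.0, 1.5, 2.0])
priors = np.array([0.5, 0.3, 0.2])                      # p(h_k)

def likelihoods(x):
    return np.exp(-0.5 * ((x - means) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))

def decide(x, loss):
    """loss[j, k] = lambda(j|k); return index of the class with minimum average loss."""
    post = likelihoods(x) * priors                      # proportional to p(h_k|x)
    avg_loss = loss @ post                              # L(h_j) up to the 1/p(x) factor
    return int(np.argmin(avg_loss))

M = 3
zero_one_loss = 1.0 - np.eye(M)                         # lambda(j|k) = 1 - delta_jk
x = 4.2
print("0-1 loss decision   :", decide(x, zero_one_loss))
print("max p(x|h)p(h) class:", int(np.argmax(likelihoods(x) * priors)))  # same class
```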

8.6 Introduction To Fuzzy Set Theory

A fuzzy set is a "class" with a continuum of grades of membership (Zadeh, 1965). More often than not, the classes of objects encountered in the real physical world do not have precisely defined criteria of membership. For example, the "class of all real numbers which are much greater than 1", or the "class of beautiful cats", do not constitute classes or sets in the usual mathematical sense of these terms. However, the fact remains that such imprecisely defined "classes" play an important role in human thinking, particularly in the domains of pattern recognition and abstraction.

8.6.1. Ordinary Set

Let Ω, a non-empty set, be the formal basis of our further discussion. The set Ω is often called the universe of discourse or frame of discernment. Our focus is primarily on finite sets. In such cases, the number of elements in Ω, its cardinality, is abbreviated by |Ω|. An element of Ω is denoted by ω.

For a specific ω ∈ Ω and a set A, either ω ∈ A or ω ∉ A holds. This is the basic requirement in ordinary set theory.

Set A is denoted by A = {ω1, ω2, ..., ωn}, where ωi is the ith element of set A. When the elements of A cannot be explicitly listed, A is denoted by {ω | ...}, where the latter part in the brackets is a description of the elements that are included. In general,

A = {ω | A(ω) is true} ,

where A(ω) is a statement about ω.

Given A, B defined on Ω, if for any ω ∈ Ω we have ω ∈ A ⇒ ω ∈ B, then

A ⊆ B

If A ⊆ B and B ⊆ A, then A = B

Any A defined on Ω is called a subset of Ω:

A ⊆ Ω .

An empty set is one that does not contain any element of Ω. The empty set is denoted by ∅.

For any A on Ω, ∅ ⊆ A ⊆ Ω .

The sets discussed so far have single elements of Ω as their members. When any A ⊆ Ω becomes an element of another set U, U is also a set; it is sometimes called a set class. All the set classes for Ω together form 2^Ω. For instance, if Ω = {black, white} then 2^Ω = {{black, white}, {black}, {white}, ∅}. In fact, a set defined on Ω could itself be a set class. Therefore, a set A defined on Ω is sometimes denoted as A ∈ 2^Ω.

8.6.2. Logical Operations of Ordinary Sets

Definition 1. Given A, B ∈ 2^Ω,

A ∪ B = {ω | ω ∈ A or ω ∈ B},

A ∩ B = {ω | ω ∈ A and ω ∈ B},

Ā = {ω | ω ∉ A},

are called the union of A and B, the intersection of A and B, and the complement of A, respectively. When the "∪", "∩" and complement operators are used in combination, the complement has higher priority than "∪" and "∩".

It can be proven that for any Ω and A, B ∈ 2^Ω, the following relationships hold:

the complement of (A ∪ B) equals Ā ∩ B̄ ,

the complement of (A ∩ B) equals Ā ∪ B̄ .

These are called De Morgan's laws.

The following are some properties of set arithmetic:

A ∪ A = A , A ∩ A = A

A ∪ B = B ∪ A , A ∩ B = B ∩ A

(A ∪ B) ∪ C = A ∪ (B ∪ C)

(A ∩ B) ∩ C = A ∩ (B ∩ C)

(A ∩ B) ∪ B = B ,

(A ∪ B) ∩ B = B

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) ,

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

A ∪ Ω = Ω , A ∩ Ω = A

A ∪ ∅ = A , A ∩ ∅ = ∅

the complement of Ā equals A

A ∪ Ā = Ω

A ∩ Ā = ∅ .

Definition 2. The two denotations

∪i∈I Ai = {ω | ω ∈ Ω, ∃ i ∈ I such that ω ∈ Ai}

∩i∈I Ai = {ω | ω ∈ Ω, ω ∈ Ai for all i ∈ I}

are called the union and intersection of the set class {Ai | i ∈ I}.

I = {1, 2, ..., n, ...} is called the index set.

When I = {1, 2}, Definition 2 is equivalent to Definition 1.

Definition 3.

A − B = {ω | ω ∈ A and ω ∉ B} is called the difference of A and B.

A − B = A ∩ B̄

Ā = Ω − A

A projection from Ω to Φ is defined by:

f : Ω → Φ .

Projection is an extension of the concept of a function. For any ω ∈ Ω, there exists an element φ = f(ω); ω is the original (pre-image) and f(ω) is called the image of ω.

Ω is the definition range (domain) of f, and

f(Ω) = {φ | ∃ ω ∈ Ω such that φ = f(ω)} .

f(Ω) is called the value range.

If f(Ω) = Φ, then f is a full projection from Ω to Φ.

If for any given ω1, ω2 ∈ Ω with ω1 ≠ ω2, we have

f(ω1) ≠ f(ω2) ,

then f is a one-to-one projection.

Definition 4. Given A ∈ 2^Ω, define a projection from Ω to {0, 1},

XA : Ω → {0, 1} such that

XA(ω) = 1 if ω ∈ A, and XA(ω) = 0 if ω ∉ A.

XA is the characteristic function of set A.

The value of the characteristic function of A at ω is XA(ω), which is called the degree of membership of ω in A.

Obviously, when ω ∈ A, the degree of membership of ω in A is 1, indicating that ω is absolutely an element of A. When ω ∉ A, the degree of membership becomes 0, indicating that ω does not belong to A at all.

8.6.3. Fuzzy Set, Its Definition and Arithmetic Operations

Definition 5. Given a universe of discourse Ω, a fuzzy set Ã is defined as follows:

for any ω ∈ Ω, there is a number μÃ(ω) ∈ [0, 1] which is the degree of membership of ω in Ã.

The projection μÃ : Ω → [0, 1] is called the membership function of Ã.

Example: given Ω = {a, b, c, d},

if μÃ(a) = 1, μÃ(b) = 0.8, μÃ(c) = 0.4 and μÃ(d) = 0, then Ã is a fuzzy set. If Ã is used to represent the concept of "circular shape", then μÃ indicates the degrees of circularity of all elements in Ω.

When Ω is composed of a finite number of elements, Ω is called a finite universe of discourse.

A fuzzy set defined on a finite Ω can be represented by a vector. For instance, the "circular shape" defined on Ω constitutes a fuzzy set which can be written as

Ã = (1, 0.8, 0.4, 0) .

When there may be confusion about which value belongs to which element, a fuzzy set may be represented as

Ã = 1/a + 0.8/b + 0.4/c + 0/d ,

where the denominators correspond to elements of Ω and the numerators represent the degrees of membership; "+" is only a separation mark. When the degree of membership is 0, that element can be omitted, such as

Ã = 1/a + 0.8/b + 0.4/c .

We may also see it written in the following form:

Ã = {(1, a), (0.8, b), (0.4, c)} .

Example: if age is the universe of discourse, such as Ω = {0, 1, 2, ..., 200}, the fuzzy sets for "old" and "young" may be defined as

m =

m = .

Although Ω is a finite set, we can treat it as a continuous range between 0 and 200 to generate the curves for the fuzzy sets "old" and "young".

Definition 6. Given Ã, B̃ ∈ F(Ω), where F(Ω) is the set of all fuzzy sets defined on Ω, the membership functions for the union Ã ∪ B̃, the intersection Ã ∩ B̃ and the complement Ãᶜ are:

μÃ∪B̃(ω) = max(μÃ(ω), μB̃(ω)) ,

μÃ∩B̃(ω) = min(μÃ(ω), μB̃(ω)) , and

μÃᶜ(ω) = 1 − μÃ(ω) , respectively.

If for Ω = {a, b, c, d}, two fuzzy sets are defined as:

Ã = (1, 0.8, 0.4, 0) for circular shape

B̃ = (0.3, 0.4, 0.2, 0) for square shape,

then for circular or square we have

Ã ∪ B̃ = (1, 0.8, 0.4, 0) ;

for circular and square we have

Ã ∩ B̃ = (0.3, 0.4, 0.2, 0) ;

for not circular, we have

Ãᶜ = (0, 0.2, 0.6, 1) .
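The max/min/complement operations above can be computed elementwise; the short sketch below reproduces the circular/square example with numpy.

```python
import numpy as np

# Fuzzy union, intersection and complement by max, min and 1 - membership,
# reproducing the "circular shape" / "square shape" example above.
circular = np.array([1.0, 0.8, 0.4, 0.0])   # membership over {a, b, c, d}
square   = np.array([0.3, 0.4, 0.2, 0.0])

print("circular or square :", np.maximum(circular, square))   # union
print("circular and square:", np.minimum(circular, square))   # intersection
print("not circular       :", 1.0 - circular)                 # complement
```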

8.6.4. Transformations Between Fuzzy Sets and Ordinary Sets


Definition 7. Given Ã ∈ F(Ω), for any λ ∈ [0, 1],

Aλ = {ω | μÃ(ω) ≥ λ} is called the λ-level cut of Ã.

If any ω whose membership value is at least λ is considered a member of Ã, then the fuzzy set Ã becomes the ordinary set Aλ.

For instance, from the "circular shape" fuzzy set C̃ = (1, 0.8, 0.4, 0) we have

C1 = {a} , C0.5 = C0.8 = {a, b} .

If Ã ∈ F(Ω), it can be proven that

μÃ(ω) = sup { λ · XAλ(ω) | λ ∈ [0, 1] } ,

where XAλ is the characteristic function of Aλ. This theorem and the level-cut concept are the linkage for conversions between fuzzy sets and ordinary sets.
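A small sketch of the λ-level cut and of recovering the membership values from a few cuts, using the same "circular shape" example; the set of sampled levels is an illustrative choice.

```python
# Lambda-level cuts of the fuzzy set "circular shape" over {a, b, c, d},
# and recovery of the membership values from the cuts (decomposition idea).
membership = {"a": 1.0, "b": 0.8, "c": 0.4, "d": 0.0}

def level_cut(m, lam):
    return {w for w, v in m.items() if v >= lam}

print(level_cut(membership, 1.0))   # {'a'}
print(level_cut(membership, 0.5))   # {'a', 'b'}  (same as the 0.8-cut)

levels = [0.0, 0.4, 0.8, 1.0]       # sampled lambda values
recovered = {w: max(lam for lam in levels if w in level_cut(membership, lam))
             for w in membership}
print(recovered)                    # matches the original membership values
```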

8.6.5. Fuzzy Statistics

Fuzzy set theory and probability theory are used to handle two different types of uncertainty. We use probability to study random phenomena. Each event itself has a distinct meaning and is not uncertain; however, due to the lack of sufficient conditions, whether a certain event will occur during a process cannot be determined beforehand.

In fuzzy set theory, the concept or event itself does not have a clear definition. For example, for "tall men", how tall is not defined. Here, whether a certain phenomenon belongs to the concept is difficult to determine. We call fuzziness the uncertainty involved in a classification due to an imprecise concept definition. The root of fuzziness is that there exist transitions between two phenomena. Such transitions make it difficult for us to label phenomena definitely into one class or the other. Fuzzy set theory is the basis for studying membership relationships arising from the fuzziness of phenomena.

Fuzzy statistics is used to estimate the degree of membership or the membership function. In order to do so, we need to design a fuzzy statistical experiment. In such an experiment, similar to a probability experiment, there are four elements:

1. The universe of discourse Ω ;

2. An element ω in Ω ;

3. An ordinary set A which varies over Ω. A is related to a fuzzy set Ã which corresponds to a fuzzy concept. Each time A is fixed, it represents a deterministic definition of the fuzzy concept as an approximation.

4. A condition S which contains all the objective and subjective factors that are related to the definition of the fuzzy concept and therefore constrains the variation of A.

The purpose of fuzzy statistics is to use a deterministic approach to study the uncertainties. The requirement for a fuzzy statistical experiment is that in each trial a deterministic decision is made on whether ω belongs to A. Therefore, in each trial, A is a definite ordinary set. In fuzzy statistical experiments, ω is fixed while A changes.

In n trials, calculate the membership frequency of ω belonging to the fuzzy set Ã, denoted by f:

f = (number of trials in which ω ∈ A) / n

As n increases, f may stabilize. The stabilized membership frequency is the degree of membership of ω in Ã. We call fuzzy statistics involving more than one fuzzy concept multi-phase fuzzy statistics.

Definition 8. Given Pm = {Ã1, ..., Ãm}, Ãi ∈ F(Ω), i = 1, ..., m, this type of experiment is an m-phase fuzzy statistical experiment, provided that in each trial we can determine a projection

e : Ω → Pm .

Each fuzzy set in Pm is one phase of Pm.

The results of multi-phase fuzzy statistics enable us to obtain a fuzzy membership function for each phase on Ω. They have the following properties:

μÃ1(ω) + μÃ2(ω) + ... + μÃm(ω) = 1

If Ω = {ω1, ω2, ..., ωn} is a finite universe of discourse, we have

Σj=1..n [ μÃ1(ωj) + ... + μÃm(ωj) ] = n .

8.6.6. Fuzzy Relation

An important concept needed in fuzzy set theory is that of a fuzzy relation, which generalizes the conventional set-theoretic notion of a relation. Let Ω1 and Ω2 be two universes. A fuzzy relation R̃ has the membership function μR̃ : Ω1 × Ω2 → [0, 1]. The projection of R̃ on Ω1 is the marginal fuzzy set with membership function

μ(ω1) = sup { μR̃(ω1, ω2) | ω2 ∈ Ω2 }

for all ω1 ∈ Ω1. If Ã1 is a fuzzy set on Ω1, then μÃ1 can be extended to Ω1 × Ω2 by

μ(ω1, ω2) = μÃ1(ω1)

for all (ω1, ω2) ∈ Ω1 × Ω2 .

Based on the above, it can be seen that a fuzzy relation in R, the real number space, is a fuzzy set in the product space R × R. For example, the relation denoted by x >> y, x, y ∈ R, may be regarded as a fuzzy set in R² with a membership function f having values such as:

f(10, 5) = 0 ;

f(100, 10) = 0.7 ;

f(1000, 1) = 1 ; etc.

8.6.7. Possibility Distribution

Let ω0 be an unknown value ranging over a set Ω, and let a piece of imprecise information be given as a set E, i.e., ω0 ∈ E is known for sure and |E| ≥ 2. If we ask whether another set A contains ω0, there can be two possible answers:

if A ∩ E = ∅ then it is impossible that ω0 ∈ A ;

if A ∩ E ≠ ∅ then it is possible.

Formally, we obtain a mapping

PossE : 2^Ω → [0, 1] , with PossE(A) = 1 if A ∩ E ≠ ∅ and PossE(A) = 0 if A ∩ E = ∅ ,

where 1 indicates "possible" and 0 "impossible".

When E becomes a fuzzy set Ẽ, we define

PossẼ : 2^Ω → [0, 1]

PossẼ(A) = sup { α | A ∩ Eα ≠ ∅ , α ∈ [0, 1] }

= sup { μẼ(ω) | ω ∈ A }

Hence, given the fuzzy set "small positive integer" defined over the integers {1, 2, ..., 6},

Ẽ = (1, 1, 0.8, 0.6, 0.4, 0.2) ,

and given A = {3}, the possibility is 0.8 ; for

A = {x | x ≤ 3} , PossẼ(A) = 1 .

The possibility of "not A" tells us about the necessity of the occurrence of A:

NecẼ(A) = 1 − PossẼ(Ā) .
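A short sketch of Poss and Nec for the "small positive integer" example above.

```python
# Possibility and necessity of crisp sets given the fuzzy set
# "small positive integer" defined over the integers 1..6.
m_small = {1: 1.0, 2: 1.0, 3: 0.8, 4: 0.6, 5: 0.4, 6: 0.2}

def poss(A):
    return max((m_small[w] for w in A if w in m_small), default=0.0)

def nec(A):
    complement = [w for w in m_small if w not in A]
    return 1.0 - poss(complement)

print(poss({3}))            # 0.8
print(poss({1, 2, 3}))      # 1.0   (A = {x | x <= 3})
print(nec({1, 2, 3}))       # 1 - Poss(complement) = 1 - 0.6 = 0.4
```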

8.6.8. Algebraic Operations on Fuzzy Sets

In addition to the operations of union and intersection, one can define a number of other ways of forming combinations of fuzzy sets and relating them to one another.

Algebraic product: given Ã and B̃, the algebraic product of Ã and B̃, denoted by ÃB̃, is defined in terms of the membership functions of Ã and B̃ by

fÃB̃(ω) = fÃ(ω) · fB̃(ω) .

This indicates that ÃB̃ ⊂ Ã ∩ B̃ .

The algebraic sum of Ã and B̃, denoted by Ã + B̃, is defined by

fÃ+B̃(ω) = fÃ(ω) + fB̃(ω) ,

provided that 0 ≤ fÃ(ω) + fB̃(ω) ≤ 1 .

The convex combination of Ã and B̃ with an arbitrary fuzzy set Λ̃, denoted by (Ã, B̃; Λ̃), is defined by

(Ã, B̃; Λ̃) = Λ̃Ã + Λ̃ᶜB̃ ,

or, written out in terms of membership functions,

f(Ã, B̃; Λ̃)(ω) = fΛ̃(ω) · fÃ(ω) + (1 − fΛ̃(ω)) · fB̃(ω) .

A basic property of the convex combination of Ã, B̃ and Λ̃ is expressed by

Ã ∩ B̃ ⊂ (Ã, B̃; Λ̃) ⊂ Ã ∪ B̃ .

Given any fuzzy set C̃ satisfying Ã ∩ B̃ ⊂ C̃ ⊂ Ã ∪ B̃, one can always find a fuzzy set Λ̃ such that

C̃ = (Ã, B̃; Λ̃) .

In fact,

fΛ̃(ω) = (fC̃(ω) − fB̃(ω)) / (fÃ(ω) − fB̃(ω)) for ω ∈ Ω .

8.6.9. A Proposed Procedure for Use of Fuzzy Set Theory in Integrated Analysis of Spatial Data


The problem,

Given spatial data E = {e1, e2, ..., em} from m different sources S1, S2, ..., Sm, one wishes to decide which hypothesis among the n hypotheses H = {H1, H2, ..., Hn} is most likely to be true. Or, in a classification problem, one wishes to decide which class among the n classes {C1, C2, ..., Cn} is the most appropriate one into which E should be classified. Formally stated, one wishes to find a projection F such that

F : S1 × S2 × ... × Sm → H

which satisfies

(1) 0 ≤ FHj(E) ≤ 1 for j = 1, 2, ..., n

(2) Σj=1..n FHj(E) = 1 .

It requires relatively deep mathematical knowledge to determine a projection from the Cartesian product space S1 × S2 × ... × Sm to H; interested readers may find Kruse et al. (1991) a starting point. The requirement may be relaxed by finding a projection from each source Si to H.

Therefore, one may follow the steps listed below to solve the problem posed.

Step 1. Consider each element of H a fuzzy set H̃j, j = 1, 2, ..., n. Determine the fuzzy membership function on each source Si, i = 1, 2, ..., m, for each Hj, j = 1, 2, ..., n. Thus a total of m × n membership functions need to be found, usually from expert knowledge or fuzzy statistics.

Step 2. Combine the evidence from different sources to validate the hypotheses or to conduct the classification. Fuzzy set operations, including union, intersection, complement and the algebraic operations, can be used for this purpose.

Step 3. Compare the combined degrees of membership for all hypotheses (classes) and confirm the hypothesis with the highest degree of membership.

Gong (1993) and a fuzzy classifier used in forest ecological classification research (Crain et al., 1993) both follow this procedure. It needs to be further validated. The assumption here is obviously that each hypothesis is independent of the others.
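A minimal sketch of Steps 1 to 3 for one location, two sources and three hypothetical classes: the per-source membership values stand in for expert-supplied membership functions, the combination uses the min (intersection) operator, and the class with the highest combined membership is selected. All names and values are invented for illustration.

```python
import numpy as np

# Steps 1-3 of the fuzzy integration procedure for a single location.
# Rows: sources (e.g. a spectral image and a terrain layer); columns: classes.
# The membership values are invented stand-ins for expert-supplied functions.
classes = ["conifer", "deciduous", "wetland"]
membership = np.array([
    [0.7, 0.5, 0.2],   # memberships derived from source S1
    [0.6, 0.8, 0.3],   # memberships derived from source S2
])

combined = membership.min(axis=0)          # Step 2: combine by fuzzy intersection
best = int(np.argmax(combined))            # Step 3: highest combined membership
print(dict(zip(classes, combined.round(2))))
print("label:", classes[best])             # 'conifer' (0.6 vs. 0.5 and 0.2)
```

Other combination operators (product, max, or a convex combination weighted by source reliability) can be substituted for the min without changing the overall procedure.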

8.7 Introduction To Neural Networks


Similar to the earlier part of this course, our interest is still focused on the problem that, given a piece of evidence e ∈ E, we test the hypothesis that e validates h ∈ H. Transformed into classification or pattern recognition terms, we would like to have an algorithm or a system that is capable of classifying or recognizing a given set of observations and labelling it as a class or a pattern. We would like the system or algorithm to learn from observations of patterns that are labelled by class and then be able to recognize unknown patterns and properly label them, outputting class membership values.

One of the most exciting developments during the early days of pattern recognition was the perceptron, the idea that a network of elemental processors arrayed in a manner reminiscent of biological neural nets might be able to learn how to recognize or classify patterns in an autonomous manner. However, it was realized that simple linear networks were inadequate for that purpose and that non-linear networks based on threshold-logic units lacked effective learning algorithms. This problem was solved by Rumelhart, Hinton and Williams (1986) with the generalized delta rule (GDR) for learning. In the following section, a neural network model based on the generalized delta rule is introduced.

8.7.1. The Generalized Delta Rule For the Semilinear Feed Forward Net With Back Propagation of Error

The architecture of a layered net with feed forward capability is shown below:

In this system architecture, the basic elements are nodes and directed links. Each input node accepts a single value. Each node generates an output value. Depending on the layer in which a node is located, its output may be used as the input for all nodes in the next layer.

The links between nodes in successive layers carry weight coefficients. For example, wji is the weight of the link from a node in layer i to a node in layer j. Each node is an arithmetic unit. Nodes in the same layer are independent of each other; therefore they can be implemented in parallel. Except for the nodes of the input layer, all nodes take their inputs from all the nodes of the previous layer and use a weighted linear combination of those input values as their net input. For a node in layer j, the net input is

uj = Σi wji Oi .

The output of the node in layer j is

Oj = f(uj)

where f is the activation function. It often takes the form of a sigmoidal function,

Oj = 1 / (1 + e^−(uj + θj)/θo) .

θj serves as a threshold or bias. The effect of a positive θj is to shift the activation function to the left along the horizontal axis. The effect of θo is to modify the shape of the sigmoid. These effects are illustrated in the following diagram.

This function allows each node to react to a given input differently: some nodes may be easily activated, or fired, to generate a high output value when θo is low and θj is small, whereas when θo is high and θj is large a node will respond more slowly to the input uj. This is considered analogous to the human neural system, where neurons are activated by different levels of stimuli.

Such a feed-forward network requires a single set of weights and biases that will satisfy all the (input, output) pairs presented to it. The process of obtaining the weights and biases is called network learning or training. In the training task, an input pattern Ip = {Ipi}, i = 1, 2, ..., ni, is presented, where ni is the number of nodes in the input layer and p is the pattern index.

For the given input pattern Ip, we require the network to adjust the set of weights in all the connecting links and also all the thresholds in the nodes such that the desired outputs Tp = {tpk}, k = 1, 2, ..., nk (nk is the number of output nodes), are obtained at the output nodes. Once this adjustment has been accomplished by the network, another pair of input and output patterns is presented, and the network is asked to learn that association also.

In general, the output Op = {Opk} from the network will not be the same as the target or desired values Tp. For each pattern, the squared error is

Ep = (1/2) Σk (tpk − Opk)²

and the average system error over the P training patterns is

E = (1/P) Σp (1/2) Σk (tpk − Opk)² ,

where the factor of one half is used purely for mathematical convenience at a later stage.

The generalized delta rule (GDR) is used to determine the weights and biases. The correct set of weights is obtained by varying the weights in a manner calculated to reduce the error Ep as rapidly as possible. In general, different results will be obtained depending on whether one carries out the gradient search in weight space based on Ep or on E.

In the GDR, the determination of weights and thresholds is carried out by minimizing Ep.

Convergence of Ep towards improved values of the weights and thresholds is achieved by taking incremental changes Δwkj proportional to −∂Ep/∂wkj. The subscript p will be omitted subsequently; thus

Δwkj = − η ∂E/∂wkj     (1)

where η is the learning rate and E is expressed in terms of the outputs Ok, each of which is the non-linear output of node k,

Ok = f(uk)

where uk is the input to the kth node:

uk = Σj wkj Oj     (2)

Therefore, by the chain rule, ∂E/∂wkj is evaluated as

∂E/∂wkj = (∂E/∂uk) · (∂uk/∂wkj)     (3)

From (2), we obtain

∂uk/∂wkj = Oj     (4)

Define

δk = − ∂E/∂uk     (5)

and thus

Δwkj = η δk Oj     (6)

δk = − ∂E/∂uk = − (∂E/∂Ok) · (∂Ok/∂uk)     (7)

where

∂E/∂Ok = − (tk − Ok)     (8)

and

∂Ok/∂uk = f′k(uk)     (9)

Thus

δk = (tk − Ok) f′k(uk)     (10)

∴ Δwkj = η (tk − Ok) · f′k(uk) · Oj     (11)

For weights that do not directly affect output nodes,

Δwji = − η ∂E/∂wji

= − η (∂E/∂uj) · (∂uj/∂wji)

= − η (∂E/∂uj) · Oi

= η (− ∂E/∂Oj) · (∂Oj/∂uj) · Oi

= η (− ∂E/∂Oj) · f′j(uj) · Oi

= η δj Oi     (12)

However, ∂E/∂Oj is not directly available. Thus it has to be evaluated indirectly in terms of quantities that are known and other quantities that can be evaluated:

− ∂E/∂Oj = − Σk (∂E/∂uk) · (∂uk/∂Oj)

= Σk (− ∂E/∂uk) · (∂/∂Oj) Σm wkm Om

= Σk (− ∂E/∂uk) · wkj = Σk δk · wkj     (13)

∴ δj = f′j(uj) · Σk δk · wkj     (14)

That is, the deltas at internal nodes can be evaluated in terms of the deltas at a later layer. Thus, starting at the last layer, the output layer, we can evaluate δk using equation (10), and then propagate the "error" backward to earlier layers. This is the process of error back-propagation.

In summary, with the subscript p denoting the pattern number, we have

Δp wji = η δpj Opi     (15)

If node j is in the output layer,

δpj = (tpj − Opj) · f′j(upj)     (16)

If node j is in an internal (hidden) layer, then

δpj = f′j(upj) · Σk δpk · wkj     (17)

In particular, if

Oj = 1 / (1 + e^−(uj + θj))     (18)

then

∂Oj/∂uj = f′j(uj) = Oj (1 − Oj)     (19)

and the deltas are

δpk = (tpk − Opk) Opk (1 − Opk)

δpj = Opj (1 − Opj) · Σk δpk wkj     (20)

for the output layer and the hidden layer nodes, respectively.

The thresholds θj are learned in the same manner as the other weights.

Note that the number of hidden layers can be greater than 1. Although a three-layer network can form arbitrarily complex decision regions, difficult learning tasks can sometimes be simplified by increasing the number of internal layers. A preliminary assessment of this algorithm was made for ecological land systems classification of a selected study site in Manitoba (Gong et al., 1994).
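A compact sketch of a semilinear feed-forward net trained with the generalized delta rule of equations (15) to (20), using the logistic activation of (18) and (19) with the bias folded into the weight matrices; the XOR training set, layer sizes, learning rate and epoch count are illustrative choices only.

```python
import numpy as np

# One-hidden-layer feed-forward net trained with the generalized delta rule
# (error back-propagation), per-pattern updates, logistic activation.
rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # input patterns
T = np.array([[0], [1], [1], [0]], dtype=float)               # target outputs

n_in, n_hid, n_out, eta = 2, 4, 1, 0.5
W1 = rng.normal(0, 0.5, (n_in + 1, n_hid))    # extra row: threshold (bias) weights
W2 = rng.normal(0, 0.5, (n_hid + 1, n_out))

sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))  # eq. (18); derivative O(1 - O), eq. (19)

for epoch in range(10000):
    for x, t in zip(X, T):
        # forward pass
        x1 = np.append(x, 1.0)                               # append bias input
        Oh = sigmoid(x1 @ W1)                                 # hidden-layer outputs
        Oh1 = np.append(Oh, 1.0)
        Ok = sigmoid(Oh1 @ W2)                                # output-layer outputs
        # back-propagation of error
        delta_k = (t - Ok) * Ok * (1.0 - Ok)                  # eqs. (16), (20)
        delta_j = Oh * (1.0 - Oh) * (W2[:-1] @ delta_k)       # eqs. (17), (20)
        # weight changes, eq. (15)
        W2 += eta * np.outer(Oh1, delta_k)
        W1 += eta * np.outer(x1, delta_j)

H = sigmoid(np.hstack([X, np.ones((4, 1))]) @ W1)
Y = sigmoid(np.hstack([H, np.ones((4, 1))]) @ W2)
print(np.round(Y, 2))    # should approach [[0], [1], [1], [0]] after training
```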

References

Crain, I.K., Gong, P., Chapman, M.A., 1993. Implementation considerations for uncertainty management in an ecologically oriented GIS. GIS'93, Vancouver, B.C., pp.167-172.

Duda, R. O. and P. E. Hart, 1973. Pattern Classification and Scene Analysis. Wiley and Sons, New York, 482p.

Freeman, J.A., and D.M. Skapura, 1991. Neural Networks: Algorithms, Applications, and Programming Techniques. Addison-Wesley: New York.

Gong, P., 1993. Change detection using principal component analysis and fuzzy set theory. Canadian Journal of Remote Sensing. 19(1): 22-9.


Gong, P., and D.J. Dunlop, 1991. Comments on Skidmore and Turner's supervised non-parametric classifier. PE&RS. 57(1):1311-1313.

Gong, P. and P. J. Howarth, 1990. Land cover to land use conversion: a knowledge-based approach, Technical Papers, Annual Conference of American Society of Photogrammetry and Remote Sensing, Denver, Colorado, Vol. 4, pp.447-456.

Gong, P., A. Zhang, J. Chen, R. Hall, and I. Corns, 1994. Ecological land systems classification using multisource data and neural networks. Accepted by GIS'94, Vancouver, B.C., February 1994.

Goodchild, M.F., G. Sun, S. Yang, 1992. Development and test of an error model for categorical data. International Journal of Geographical Information Systems. 6(2): 87-104.

Kosko, B., 1992. Neural Networks and Fuzzy Systems. Prentice-Hall; Englewood Cliffs, New Jersey.

Kruse, R., E. Schwecke, and J. Heinsohn, 1991. Uncertainty and Vagueness in Knowledge Based Systems: Numerical Methods. Springer-Verlag: New York.

Mark, D., and F. Csillag, 1989. The nature of boundaries on area-class maps. Cartographica, pp. 65-77.

Pao Y., 1989. Adaptive Pattern Recognition and Neural Networks. Addison-Wesley: Reading, MA.

Richards, J. A., 1986. Remote Sensing Digital Image Analysis: An Introduction. Springer-Verlag, Berlin.

Shinghal R., 1992. Formal Concepts in Artificial Intelligence, Fundamentals. Chapman & Hall: New York.