

  • Classification of remote sensing imagery with high spatial resolution

    Mathieu Fauvel (a,b), Jon Aevar Palmason (a), Jon Atli Benediktsson (a), Jocelyn Chanussot (b), and Johannes R. Sveinsson (a)

    (a) Department of Electrical and Computer Engineering, University of Iceland, Hjardarhaga 2-6, 107 Reykjavik, Iceland;

    (b) Signals and Images Laboratory – LIS, Grenoble, LIS / ENSIEG – Domaine Universitaire – BP 46 – 38402 Saint-Martin-d’Hères Cedex, France.

    ABSTRACT

    Classification of high resolution remote sensing data from urban areas is investigated. The main challenge in classifying high resolution remote sensing image data is to involve local spatial information in the classification process. Here, a method based on mathematical morphology is used to preprocess the image data using spatial operators. The approach is based on building a morphological profile by a composition of geodesic opening and closing operations of different sizes. In the paper, the classification is performed on two data sets from urban areas: one panchromatic and one hyperspectral. These data sets have different characteristics and need different treatments by the morphological approach. The approach can be applied directly to the panchromatic data. However, some feature extraction needs to be done on the hyperspectral data before the approach can be applied. Both principal and independent components are considered here for such feature extraction. A neural network approach is used for the classification of the morphological profiles, and its performance in terms of accuracy is compared to that of a fuzzy possibilistic approach in the case of the panchromatic data and to the conventional maximum likelihood method based on the Gaussian assumption in the case of the hyperspectral data. Also, different types of feature extraction methods are considered in the classification process.

    Keywords: Mathematical Morphology, Urban Areas, Classification, High Resolution Remote Sensing Data, Hyperspectral Data, Panchromatic Data.

    1. INTRODUCTION

    The classification of high resolution urban remote sensing imagery is a challenging research problem. Here, we consider the classification of both panchromatic imagery (single data channel) and hyperspectral imagery (multiple data channels).

    Panchromatic images are characterized by a very high spatial resolution, which makes it possible to identify small structures in a dense urban area. However, analyzing the scene using only the pixel values produces a very poor classification relative to the fine resolution; for example, it cannot distinguish between a pixel belonging to the roof of a small house and one belonging to the roof of a large building if both roofs have the same reflectance. To solve this problem, some local spatial information is needed. An interesting approach to providing such information is based on the theory of Mathematical Morphology, which provides tools to analyze spatial relationships between pixels. Recently, advanced mathematical morphology has been successfully applied to geoscience and remote sensing [1, 2] and has proved its usefulness in the analysis of remotely sensed images.

    Hyperspectral urban data contain not only a lot of spectral information, spread across the data channels, but also spatial information, carried by each individual band. A single image drawn from the data set therefore does not carry much spectral information, and the spectral properties of the data set alone cannot bring forth spatial information. Consequently, a joint spectral/spatial classifier is needed for the classification of urban hyperspectral data.

    Further author information: (Send correspondence to J.A. Benediktsson) J.A. Benediktsson: E-mail: [email protected], Telephone: +354 525 4670

    Invited Paper

    Image and Signal Processing for Remote Sensing XI, edited by Lorenzo Bruzzone, Proceedings of SPIE Vol. 5982 (SPIE, Bellingham, WA, 2005)

    0277-786X/05/$15 · doi: 10.1117/12.637224

    Proc. of SPIE Vol. 5982 598201-1

    Benediktsson et al. [8] have proposed the use of extended morphological profiles for hyperspectral urban data, i.e., to build morphological profiles from more than one base image (rather than from a single image, as in the panchromatic case) and to use several principal components for that purpose. Here, both Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are used to create base images for morphological profiles.

    The paper is organized as follows. In Section 2, basic definitions of mathematical morphology are reviewed, followed by a discussion of the composition of morphological operators used to define the morphological profile, which characterizes the structures present in the image. Experimental results on an IKONOS panchromatic image are given. In Section 3, the extended morphological profile for hyperspectral data is discussed along with a review of PCA and ICA for feature extraction. Classification results are given for DAIS hyperspectral data. Finally, in Section 4, conclusions are drawn.

    2. MATHEMATICAL MORPHOLOGY

    Mathematical Morphology (MM) is a theory aiming to analyze spatial relationships between pixels. MM was introduced by Matheron and Serra in the 1960s to study porous media. Nowadays, several morphological operators are available for extracting structural information in spatial data [3]. In the following subsection, some basic notions of MM are reviewed. Then, the concepts of the Morphological Profile and of the Derivative of the Morphological Profile are detailed.

    2.1. Theoretical Notions

    In image analysis, data are represented in the discrete space $\mathbb{Z}^n$, and an image $f$ is a mapping of a subset $D_f$ of $\mathbb{Z}^n$,

    $$f : D_f \subset \mathbb{Z}^n \to \{0, \ldots, f_{\max}\} \tag{1}$$

    where $f_{\max}$ is the maximum value of the image. With MM, objects of interest are viewed as subsets of the image. Several sets of known size and shape (such as a disk, square or line) can then be used to characterize their morphology. These sets are called Structuring Elements (SEs). An SE always has an origin, which generally is its center of symmetry. The origin allows the SE to be positioned at a given pixel $x$ of $f$, i.e., the origin coincides with $x$. For binary images (i.e., $f_{\max} = 1$), MM is mainly based on set operators such as union, intersection, complementation and translation: the SE is positioned on each pixel $x$, and a set operator is applied between the SE and the set to which $x$ belongs.

    For grey level images, the intersection $\cap$ of two sets becomes the infimum $\wedge$ and the union $\cup$ becomes the supremum $\vee$. For two images $f$ and $g$ and a given pixel $x$: $(f \wedge g)(x) = \min[f(x), g(x)]$ and $(f \vee g)(x) = \max[f(x), g(x)]$. We now give the definitions of the two fundamental morphological operators, erosion and dilation. Further developments on MM can be found in the literature [3, 4].

    Definition 2.1 (Erosion). The erosion $\varepsilon_B(f)$ of an image $f$ by a structuring element $B$ is defined as

    $$\varepsilon_B(f) = \bigwedge_{b \in B} f_{-b}, \tag{2}$$

    where $f_b$ is the translation of $f$ by the vector $b$, i.e., $f_b(x) = f(x - b)$. The eroded value at a given pixel $x$ is the minimum value of the image in the window defined by the SE when its origin is at $x$. The eroded value shows where the SE fits the objects in the input image.

    Definition 2.2 (Dilation). The dilation $\delta_B(f)$ of an image $f$ by a structuring element $B$ is defined as

    $$\delta_B(f) = \bigvee_{b \in B} f_{-b}. \tag{3}$$

    The dilated value at a given pixel $x$ is the maximum value of the image in the window defined by the SE when its origin is at $x$. The dilated value shows where the SE hits the objects in the input image.
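The window interpretation of Definitions 2.1 and 2.2 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: the helper names (`erode`, `dilate`, `square_se`), the toy image, and the choice to ignore window positions that fall outside the image are all assumptions made here.

```python
# Flat grey-level erosion and dilation on a small 2-D image (list of rows).
# Assumption: SE positions that fall outside the image are simply ignored.

def _window(img, r, c, offsets):
    """Image values at (r + dr, c + dc) for the SE offsets that lie inside the image."""
    rows, cols = len(img), len(img[0])
    return [img[r + dr][c + dc]
            for dr, dc in offsets
            if 0 <= r + dr < rows and 0 <= c + dc < cols]

def erode(img, offsets):
    """Erosion: minimum of the image over the SE window placed at each pixel."""
    return [[min(_window(img, r, c, offsets)) for c in range(len(img[0]))]
            for r in range(len(img))]

def dilate(img, offsets):
    """Dilation: maximum of the image over the SE window placed at each pixel."""
    return [[max(_window(img, r, c, offsets)) for c in range(len(img[0]))]
            for r in range(len(img))]

def square_se(half):
    """Offsets of a (2*half+1) x (2*half+1) square SE whose origin is its centre."""
    return [(dr, dc)
            for dr in range(-half, half + 1)
            for dc in range(-half, half + 1)]

# Hypothetical test image: a bright 3x3 block on a dark background.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 0, 0, 0, 0],
]
se = square_se(1)            # 3x3 square SE
eroded = erode(image, se)    # only the block centre survives
dilated = dilate(image, se)  # the block grows by one pixel in every direction
```

With a 3x3 SE, the erosion keeps only the pixel where the SE fits entirely inside the bright block, while the dilation marks every pixel whose window hits the block.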


  • Erosion and dilation are dual transformations with respect to complementation:

    $$\varepsilon_B(f) = [\delta_B([f]^c)]^c, \tag{4}$$

    where $[\,\cdot\,]^c$ is the complementation operator: $[f]^c(x) = f_{\max} - f(x)$. This property shows the dual effect of erosion and dilation. Where erosion expands dark objects, dilation shrinks them (and vice versa for bright objects). Moreover, bright structures that cannot contain the SE are removed by erosion (similar effects are seen for dark objects using dilation). Hence, both erosion and dilation are non-invertible transformations. Figure 1 shows examples of erosion and dilation. These two operators are the basic tools of MM. The operators discussed next, opening and closing, are combinations of erosion and dilation.
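As a short check of (4), the duality can be verified pointwise. The sketch below uses only the definitions above, written in their window form with $f_{-b}(x) = f(x + b)$:

```latex
\begin{align*}
[\delta_B([f]^c)]^c(x)
  &= f_{\max} - \max_{b \in B}\bigl(f_{\max} - f(x+b)\bigr) \\
  &= f_{\max} - \Bigl(f_{\max} - \min_{b \in B} f(x+b)\Bigr) \\
  &= \min_{b \in B} f(x+b) = [\varepsilon_B(f)](x).
\end{align*}
```

The middle step uses the fact that subtracting from a constant turns a maximum into a minimum.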

    Definition 2.3 (Opening). The opening $\gamma_B(f)$ of an image $f$ by an SE $B$ is defined as the erosion of $f$ by $B$ followed by the dilation with the SE $B$∗:

    $$\gamma_B(f) = \delta_B[\varepsilon_B(f)]. \tag{5}$$

    The idea of dilating the eroded image is to recover most structures of the original image, i.e., the structures that were not removed by the erosion.

    Definition 2.4 (Closing). The closing $\phi_B(f)$ of an image $f$ by an SE $B$ is defined as the dilation of $f$ by $B$ followed by the erosion with the SE $B$:

    $$\phi_B(f) = \varepsilon_B[\delta_B(f)]. \tag{6}$$

    Figure 2 shows the result of closing and opening an IKONOS image with a 5x5 square SE. It can be seen that structures smaller than the SE are totally removed.
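The removal of structures smaller than the SE is easy to see on a one-dimensional grey-level profile. The following is a minimal sketch under stated assumptions: a 1-D signal stands in for an image row, the SE is a centred segment of length `2 * half + 1`, windows are clipped at the signal ends, and the function names and sample values are hypothetical.

```python
# Opening and closing of a 1-D grey-level profile with a flat, centred segment SE.
# Assumption: the window is clipped at both ends of the signal.

def erode(signal, half):
    """Minimum over the window [i - half, i + half] at every sample i."""
    n = len(signal)
    return [min(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]

def dilate(signal, half):
    """Maximum over the window [i - half, i + half] at every sample i."""
    n = len(signal)
    return [max(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]

def opening(signal, half):
    """Erosion followed by dilation, as in Eq. (5)."""
    return dilate(erode(signal, half), half)

def closing(signal, half):
    """Dilation followed by erosion, as in Eq. (6)."""
    return erode(dilate(signal, half), half)

# Hypothetical profile: a 1-sample bright spike, then two 3-sample bright runs
# separated by a 1-sample dark gap.
profile = [0, 0, 5, 0, 0, 0, 9, 9, 9, 0, 9, 9, 9, 0, 0]
opened = opening(profile, 1)   # SE of length 3 removes the narrow spike
closed = closing(profile, 1)   # SE of length 3 fills the narrow gap
```

The opening deletes the spike, which is narrower than the SE, and leaves the two runs untouched; the closing keeps the spike but fills the one-sample gap, so the two runs become a single structure.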

    Even though opening and closing are powerful operators, their major drawback is that they are not connected filters. It can be seen at the bottom of Figure 2 that the two small houses have merged into one after the closing operation. The merged structure can now be seen as a building. To avoid this problem, geodesic morphology and reconstruction can be used. Reconstruction filters are connected filters, and they have been proven not to introduce discontinuities [2]. Reconstruction filters are based on geodesic morphology.

    Definition 2.5 (Geodesic dilation). The geodesic dilation $\delta_g^{(1)}(f)$ of size 1 consists in dilating a marker $f$ with respect to a mask $g$,

    $$\delta_g^{(1)}(f) = \delta^{(1)}(f) \wedge g. \tag{7}$$

    The geodesic dilation of size $n$ is obtained by performing $n$ successive geodesic dilations of size 1.

    Definition 2.6 (Geodesic erosion). The geodesic erosion is the dual transformation of the geodesic dilation,

    $$\varepsilon_g^{(1)}(f) = \varepsilon^{(1)}(f) \vee g. \tag{8}$$

    Definition 2.7 (Reconstruction). The reconstruction by dilation (erosion) of a marker $f$ with respect to a mask $g$ consists of repeating a geodesic dilation (erosion) of size one until stability is achieved, i.e., until $\delta_g^{(n+1)}(f) = \delta_g^{(n)}(f)$ (respectively $\varepsilon_g^{(n+1)}(f) = \varepsilon_g^{(n)}(f)$):

    $$\mathrm{Rec}_g(f) = \delta_g^{(n)}(f), \tag{9}$$

    $$\mathrm{Rec}^*_g(f) = \varepsilon_g^{(n)}(f). \tag{10}$$

    With Definition 2.7, it is possible to define connected transformations that satisfy the following assertion: if a structure of the image cannot contain the SE, then it is totally removed; otherwise it is totally preserved. These operators are called opening/closing by reconstruction.

    ∗The true definition is with the transposed SE B̌. The transposition of B corresponds to its symmetric set with respect to its origin. For simplicity, we only consider SEs whose origin is also the center of symmetry, so B̌ = B.


  • Definition 2.8 (Opening-Closing by reconstruction). The opening by reconstruction of an image $f$ is defined as the reconstruction by dilation of $f$ from the erosion of size $n$ of $f$. Closing by reconstruction is defined by duality:

    $$\gamma_R^{(n)} = \mathrm{Rec}_f(\varepsilon^{(n)}(f)), \tag{11}$$

    $$\phi_R^{(n)} = \mathrm{Rec}^*_f(\delta^{(n)}(f)). \tag{12}$$

    Figure 2 shows results of opening and closing by reconstruction. It can be clearly seen that these transformations introduce less noise than the classical opening and closing. Shapes are preserved, and the structures still present after the transformation are of a size greater than or equal to the SE. Therefore, opening and closing by reconstruction allow us to characterize the morphology of the structures present in an image. In addition, to determine the size or shape of all the objects present in an image, it is necessary to use a range of different SE sizes. This concept is called granulometry [3].
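The difference between a classical opening and an opening by reconstruction can be illustrated on a toy image. This is a sketch under assumptions made here, not the authors' code: the SE is a square of half-width `half`, the elementary geodesic dilation uses a 3x3 square (8-connectivity), windows are clipped at the image border, and all names and pixel values are hypothetical.

```python
# Opening by reconstruction: erode, then rebuild by iterated geodesic dilation
# of the eroded marker under the original image (the mask), as in Eqs. (7), (9), (11).

def _filter(img, half, pick):
    """Apply pick (min or max) over a clipped square window of half-width half."""
    rows, cols = len(img), len(img[0])
    return [[pick(img[i][j]
                  for i in range(max(0, r - half), min(rows, r + half + 1))
                  for j in range(max(0, c - half), min(cols, c + half + 1)))
             for c in range(cols)]
            for r in range(rows)]

def erode(img, half):
    return _filter(img, half, min)

def dilate(img, half):
    return _filter(img, half, max)

def reconstruct_by_dilation(marker, mask):
    """Repeat size-1 geodesic dilations (3x3 dilation, then pointwise min with
    the mask) until the result no longer changes. Assumes marker <= mask."""
    current = marker
    while True:
        grown = dilate(current, 1)
        clipped = [[min(g, m) for g, m in zip(g_row, m_row)]
                   for g_row, m_row in zip(grown, mask)]
        if clipped == current:
            return current
        current = clipped

def opening_by_reconstruction(img, half):
    return reconstruct_by_dilation(erode(img, half), img)

# Hypothetical image: a 3x3 block with a thin 2-pixel tail, plus an isolated spot.
image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 9, 0, 0, 0, 0, 0],
    [0, 9, 9, 9, 9, 9, 0, 6, 0],
    [0, 9, 9, 9, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
classical = dilate(erode(image, 1), 1)              # cuts the thin tail off
reconstructed = opening_by_reconstruction(image, 1)  # keeps the whole structure
```

Both filters remove the isolated spot, which cannot contain the SE. The classical opening also removes the thin tail and so alters the shape of the block, whereas the reconstruction restores the tail because it is connected to a part of the structure that survived the erosion.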

    Definition 2.9 (Granulometry). A granulometry $\Phi_\lambda$ is defined by a transformation having a size parameter $\lambda$ and satisfying the following three axioms:

    • Anti-extensivity: The transformed image is less than or equal to the original image.

    • Increasingness: The ordering relation between images is preserved.

    • Absorption: The composition of two transformations $\Phi$ of different sizes $\lambda$ and $\nu$ always gives the result of the transformation with the largest size:

    $$\Phi_\lambda \Phi_\nu = \Phi_\nu \Phi_\lambda = \Phi_{\max(\lambda, \nu)}. \tag{13}$$

    Granulometries are typically used for the analysis of the size distribution of structures in images. A classical granulometry by opening is built from successive opening operations of increasing size. By doing so, an image is progressively simplified. Using connected operators, such as opening by reconstruction, no shape noise is introduced. Anti-granulometry is defined with the same axioms as granulometry, with the anti-extensivity axiom replaced by extensivity. The granulometry concept can be used to create a feature vector from a single image. The next subsection describes the concepts of the Morphological Profile and of the Derivative of the Morphological Profile.

    2.2. Morphological Profile and Derivative of the Morphological Profile

    Here, we review the concepts of the Morphological Profile and of the Derivative of the Morphological Profile, both of which are based on granulometry and anti-granulometry. The opening profile (OP) at the pixel $x$ of the image $I$ is defined as an $n$-dimensional vector:

    $$OP_i(x) = \gamma_R^{(i)}(x), \quad \forall i \in [0, n]. \tag{14}$$

    Figure 1. Erosion and dilation of a grey level image by a 5x5 square SE. Panels, left to right: erosion, original image, dilation.


  • Figure 2. Opening, closing, opening by reconstruction, and closing by reconstruction of a grey level image by a 5x5 square SE. Panels, left to right: opening, closing, opening by reconstruction, closing by reconstruction.

    Also, the closing profile (CP) at the pixel $x$ of the image $I$ is defined as an $n$-dimensional vector:

    $$CP_i(x) = \phi_R^{(i)}(x), \quad \forall i \in [0, n]. \tag{15}$$

    Clearly, $CP_0(x) = OP_0(x) = I(x)$, and $n$ is the total number of openings or closings. The OP built with opening by reconstruction satisfies the three axioms of granulometry; the same holds for the CP with anti-granulometry. Therefore, the OP (CP) can be defined as a granulometry (anti-granulometry) made with opening (closing) by reconstruction. By collating the OP and the CP, the Morphological Profile (MP) of the image $I$ is defined as a $(2n + 1)$-dimensional vector:

    $$MP(x) = \{CP_n(x), CP_{n-1}(x), \ldots, CP_1(x), I(x), OP_1(x), \ldots, OP_{n-1}(x), OP_n(x)\}, \tag{16}$$

    where

    $$MP_i(x) = \begin{cases} CP_{n-i}(x) & \text{if } 0 \le i < n, \\ I(x) & \text{if } i = n, \\ OP_{i-n}(x) & \text{if } n < i \le 2n. \end{cases} \tag{17}$$

    Finally, the Derivative of the Morphological Profile (DMP) is defined as a $2n$-dimensional vector equal to the discrete derivative of the MP:

    $$DMP_i(x) = MP_{i-1}(x) - MP_i(x). \tag{18}$$

    The information provided by the DMP is both spatial and radiometric. For a given pixel, a DMP that is balanced (around a center point) should signify that the pixel belongs to a structure that is small compared to the SEs used to build the DMP. On the other hand, an unbalanced DMP (to the left or to the right) should indicate that the pixel belongs to a large structure. The direction of the imbalance then indicates whether the pixel belongs to a structure that is darker (left side) or brighter (right side) than the surrounding pixels. Finally, the amplitude of the DMP gives information about the local contrast of the structure.
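The assembly of the MP and the DMP at a single pixel, following (16) and (18), can be sketched as below. The function names and the grey values are hypothetical; the openings and closings by reconstruction are assumed to have been computed already.

```python
# Build the morphological profile (MP) and its derivative (DMP) at one pixel
# from precomputed opening/closing-by-reconstruction values.

def morphological_profile(cp, pixel_value, op):
    """cp = [CP_1, ..., CP_n], op = [OP_1, ..., OP_n].
    Returns [CP_n, ..., CP_1, I, OP_1, ..., OP_n], a vector of length 2n + 1."""
    return list(reversed(cp)) + [pixel_value] + list(op)

def derivative_profile(mp):
    """DMP_i = MP_{i-1} - MP_i for i = 1, ..., 2n."""
    return [mp[i - 1] - mp[i] for i in range(1, len(mp))]

# Hypothetical pixel on a bright structure that disappears at the second opening.
pixel_value = 200
closings = [200, 200, 200]   # CP_1..CP_3: closing leaves a bright pixel unchanged
openings = [200, 80, 80]     # OP_1..OP_3: the structure is removed at size 2

mp = morphological_profile(closings, pixel_value, openings)
dmp = derivative_profile(mp)
```

For these values the only non-zero DMP entry lies on the opening (right) side of the profile, which is the signature of a structure brighter than its surroundings; its amplitude, 120, is the local contrast.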

    Figures 3 and 4 present the MP and the DMP obtained for the IKONOS image. In this case, the SEs used were discs of increasing radius (3, 6 and 9 pixels). From this example, it is clear that the DMP should help in discrimination. However, different approaches can be used. In the next section we discuss the classification of the DMP using neural networks and fuzzy logic.

    2.3. Classification of Panchromatic Imagery

    As said previously, the DMP provides information to discriminate classes. Figure 5 gives examples of DMP “spectral responses” for three different classes. For classification, we assume that each class has a typical DMP. Based on this assumption, two classification methods were considered. They are based on two interpretations of the DMP. In the first [5], the DMP is considered as a multispectral image, and the classification is performed with classical pattern recognition algorithms. In the second [6], the DMP is a fuzzy measurement of the characteristic size and contrast of each structure. Both approaches are briefly presented and tested in the next subsections. For both approaches, the test image is an IKONOS panchromatic image of Reykjavik, Iceland. The image is 975 × 639 pixels with 1-m spatial resolution. The test area is in the center of Reykjavik. It comprises residential, commercial and open areas. Six classes of interest are defined: large buildings (1), small buildings (2), residential lawns (3), streets (4), open areas (5) and shadows (6). The number of training and test samples for each information class is listed in Table 1. A 17-dimensional MP was created (8 openings and 8 closings) using a disc SE. Therefore, a 16-dimensional DMP was used as input data.

    2.4. DMP Viewed as a Multispectral Image

    Here, each $DMP_i(x)$ is considered as a channel of a multispectral image. This way, classification methods developed for multispectral images can be applied. Due to the possibly high dimensionality of the DMP, feature extraction and feature selection methods could also be used. For this experiment, a conjugate gradient neural network was used for classification. The number of hidden neurons was selected to be twice the number of input features. Two different feature extraction methods [15], Discriminant Analysis Feature Extraction (DAFE) and Decision Boundary Feature Extraction (DBFE), were also applied to the DMP. DAFE is a method intended to enhance class separability. DBFE is a method intended to extract discriminant information from the decision boundary. The Neural Network (NN) was trained with a reference map, and 25% of the labelled samples were used for training. The other samples were used for testing. Table 2 shows the accuracies achieved by the different approaches. In the table, AVE refers to the average classification accuracy, i.e., the mean of the accuracies for the individual classes, and OA denotes the overall classification accuracy, i.e., the classification accuracy for all pixels in the training or test sets. It is evident that the NN performed better using the DMP than using the single panchromatic band. However, using the DBFE gives quantitative results that are slightly worse than those obtained without feature extraction. DAFE suffered from singularity problems and performed less well. More details on these data and experiments are given in Benediktsson et al. [5].
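The two accuracy measures can differ noticeably when class sizes are unbalanced. The sketch below computes both from a confusion matrix; the matrix values are hypothetical and are not taken from the paper, and the convention that rows are reference classes and columns are predicted classes is an assumption made here.

```python
# Overall accuracy (OA) and average accuracy (AVE) from a confusion matrix.
# Assumed convention: confusion[i][j] = number of samples of reference class i
# that were assigned to class j.

def overall_accuracy(confusion):
    """Fraction of all samples that were classified correctly."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

def average_accuracy(confusion):
    """Mean of the per-class accuracies, each class weighted equally."""
    per_class = [confusion[i][i] / sum(confusion[i])
                 for i in range(len(confusion))]
    return sum(per_class) / len(per_class)

# Hypothetical two-class example: a small class and a large class.
confusion = [
    [8, 2],     # class 1: 10 samples, 8 correct
    [10, 80],   # class 2: 90 samples, 80 correct
]
oa = overall_accuracy(confusion)    # 88 correct out of 100 samples
ave = average_accuracy(confusion)   # mean of 8/10 and 80/90
```

OA is dominated by the large class, whereas AVE gives each class the same weight, which is why both are reported.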

    Table 1. IKONOS Reykjavik image: Information classes and samples.

    Class                          Samples
    No.   Name                 Train     Test
    1     Large Buildings       7729    30916
    2     Small Buildings       8539    34155
    3     Residential Lawns     8788    35147
    4     Streets               9800    39202
    5     Open Areas           10967    43867
    6     Shadows               6451    25806
          Total                52274   209093

    2.5. DMP Viewed as a Possibility Distribution

    In this approach, the classification is based on a fuzzy interpretation of the DMP and on a possibilistic definition of the classes. The DMP is interpreted as a fuzzy measure of the size of a structure. The contrast of the local structure is provided by the maximum value of the DMP. From Figure 5 it is clear that buildings have a larger size and contrast than roads or shadows. For classification, a pre-defined DMP $\Pi(n)$ is needed for each class $n$. These can be built from simple statements such as: “A large building is a bright object with a large size and a high contrast” or “A shadow is a dark object with a high contrast but an unknown size”.

    Figure 3. MP built with 3 closings and 3 openings by reconstruction. The SEs were discs of increasing radius (3, 6 and 9 pixels).

    Figure 4. DMP derived from the MP in Figure 3. The profile was normalized for visual inspection.

    Table 2. Test accuracies in percentage for the original IKONOS image, the entire DMP, the DAFE, the DBFE and the FPM.

    Data       Original gray value   Entire DMP   DAFE   DBFE   Entire DMP
    Features   1                     16           10     10     16
    Method     NN                    NN           NN     NN     FPM
    Class 1    63.6                  73.2         54.4   70.7   47.6
    Class 2     7.3                  60.6         45.3   57.4   67.8
    Class 3     0.0                  40.2         45.9   46.5   58.8
    Class 4     2.8                  61.6         61.6   68.7    9.8
    Class 5    90.9                  52.7         38.2   43.3   52.2
    Class 6    95.5                  92.5         91.8   89.5   83.3
    Ave        43.3                  63.8         56.2   62.9   53.3
    OA         41.6                  62.5         54.0   60.6   52.1

    Dark and bright can be defined as lying on the left side and on the right side of the DMP, respectively, while the fuzzy definition of size (large ...) depends on the DMP size. For ease of definition, the pre-defined DMP has a trapezoidal shape. Then, for each pixel $x$, a degree of membership $C_n(x)$ is defined for each class $n$ as

    $$C_n(x) = \frac{\sum_i DMP_i(x) \times \Pi_i(n)}{\sum_i DMP_i(x) \times \sum_i \Pi_i(n)} \tag{19}$$

    where $\Pi_i(n)$ is a possibility distribution. Then another membership degree $\alpha_n(x)$ is defined by comparing the local contrast with the corresponding model (low or high, depending on the considered class; for example, high for the building class). The final decision for the pixel $x$ is taken by selecting the class $n$ as follows:

    $$n_{\mathrm{selected}}(x) = \arg\max_n \{\alpha_n(x) \times C_n(x)\}. \tag{20}$$
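A minimal sketch of the decision rule in (19) and (20) is given below. The class templates, the contrast degrees $\alpha_n$ and the DMP values are hypothetical; in particular, how $\alpha_n(x)$ is computed is not specified here, so it is passed in as a given number. Returning zero for an all-zero DMP or template is also an assumption of this sketch.

```python
# Fuzzy possibilistic decision for one pixel.

def membership(dmp, template):
    """C_n(x): sum_i DMP_i * Pi_i(n), divided by (sum_i DMP_i) * (sum_i Pi_i(n)).
    Assumption: returns 0.0 when the denominator is zero (flat DMP or empty template)."""
    denominator = sum(dmp) * sum(template)
    if denominator == 0:
        return 0.0
    return sum(d * p for d, p in zip(dmp, template)) / denominator

def select_class(dmp, templates, contrast_degree):
    """Pick the class n maximising alpha_n(x) * C_n(x); also return all scores."""
    scores = {name: contrast_degree[name] * membership(dmp, template)
              for name, template in templates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical 6-dimensional DMP with its peak on the right (bright) side.
dmp = [0, 0, 0, 0, 120, 0]

# Hypothetical trapezoid-like templates: "dark" classes load the left half of
# the DMP, "bright" classes load the right half.
templates = {
    "shadow":   [1.0, 1.0, 0.5, 0.0, 0.0, 0.0],
    "building": [0.0, 0.0, 0.0, 0.5, 1.0, 1.0],
}
contrast_degree = {"shadow": 1.0, "building": 1.0}   # alpha_n(x), assumed given

label, scores = select_class(dmp, templates, contrast_degree)
```

Because the DMP peak falls where only the bright template is non-zero, the pixel is assigned to the building class in this example.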

    Table 2 shows the accuracies achieved by the Fuzzy Possibilistic Model (FPM) approach. Here, the results are evaluated with all the labelled samples, which may explain why the visual results are not consistent with the quantitative results. More details on these experiments are given in Ref. [6].

    It should be underlined that the numerical and visual results should be considered in relative terms. Some pre- and post-filtering could increase these accuracies.

    3. EXTENDED MORPHOLOGICAL PROFILE

    The morphological profile approach was applied to panchromatic data in the previous section. The method has been extended to hyperspectral data applications. A characteristic image, or several such images, needs to be extracted from the data. It was suggested to use the first principal component (PC) of the hyperspectral data for such a purpose [9]. Although that approach seems reasonable because PCA is optimal for data representation in the mean square sense, it should not be forgotten that with only one PC, the hyperspectral data are reduced from potentially several hundred data channels to a single data channel. Some important information may be contained in the other PCs. Therefore, we apply an extension to the approach in [10] and build an extended morphological profile from several different PCs. This extension was proposed in Benediktsson et al. [8].

    Figure 5. Examples of typical DMP “spectral responses” obtained for the MP in Figure 4. Panels, left to right: shadow class, road class, building class (horizontal axis from 1 to 6, vertical axis from 0 to 1).

    [Figure 6 panels: profile from PC 1 and profile from PC 2, each ordered as closings, original, openings.]

    For example, we could decide to use the PCs that account for a certain percentage of the total variation in the image, e.g., 95% or 99%. If two PCs reach the variation threshold, a morphological profile is constructed for each of these PCs. An example of the extended morphological profile is shown in Figure 6. Each profile is a multidimensional feature vector, and the profiles are stacked into a single vector to be used for classification. In general mathematical notation, the extended morphological profile is represented by

    $$MP_{\mathrm{ext}}(x) = \{MP_{PC_1}(x), MP_{PC_2}(x), \ldots, MP_{PC_n}(x)\}, \tag{21}$$

    where $MP_{PC_i}(x)$, $i = 1, \ldots, n$, are the morphological profiles constructed from the principal components according to (14). Obviously, the computations will be more intensive for this approach. On the other hand, better information should be extracted from the hyperspectral data than with the simple approach proposed in [9]. Also, some redundancies should be expected in the extended morphological profile. Principal component analysis (PCA) and independent component analysis (ICA) as feature extraction methods are discussed in the next two subsections.
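The stacking in (21) amounts to concatenating the per-component profiles at each pixel. A minimal sketch follows, with hypothetical profile values for two PCs, each with 2 closings and 2 openings (5 entries per profile).

```python
# Extended morphological profile at one pixel: concatenate the MPs built
# on each retained component.

def extended_profile(profiles):
    """Stack a list of per-component MPs into one feature vector."""
    return [value for profile in profiles for value in profile]

# Hypothetical profiles, ordered [CP_2, CP_1, I, OP_1, OP_2].
mp_pc1 = [12, 10, 9, 7, 5]
mp_pc2 = [30, 30, 28, 20, 20]

mp_ext = extended_profile([mp_pc1, mp_pc2])
```

The resulting vector has one block per component, so its length grows linearly with the number of retained PCs.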

    3.1. PCA Feature Extraction

    The aim of PCA is to transform the data into a lower dimensional subspace that is optimal in the sense of sum-of-squares error. PCA decorrelates the original data set, so that the transformed features are uncorrelated with each other.

    The eigenvectors $e_i$, $i = 1, 2, \ldots, d$, of the covariance matrix, with corresponding eigenvalues $\lambda_i$, form an orthogonal basis for the new feature space, onto which the original data set is projected. The eigenvalues measure the contribution of each eigenvector to the original feature space, and the eigenvectors are sorted in decreasing order of their eigenvalues. Usually, only the first $k$ eigenvectors are used and the remaining $d - k$ dimensions are skipped. The projection onto the lower dimensional subspace contains noise. The value of $k$ may be chosen so that the transformed data include a particular percentage of the original variance, e.g., 95% or 99%.

    The transformation matrix $A_x$ of size $d \times k$ contains the $k$ eigenvectors, and the projection is given by

    $$y = A_x^T (x - \mu_x), \tag{22}$$

    where $x$ is the feature vector and $\mu_x$ is the mean.

    After the transformation, the correlation between different features has been removed, and the covariance matrix of $y$, $\Sigma_y$, is a diagonal matrix with entries $\lambda_i$.
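Choosing $k$ from a variance threshold can be sketched as follows. The function name and the eigenvalues are hypothetical; the eigenvalues are assumed to come from an eigendecomposition of the covariance matrix computed elsewhere.

```python
# Smallest number of principal components whose eigenvalues account for at
# least a given fraction of the total variance.

def components_for_variance(eigenvalues, threshold):
    """Return the smallest k such that the k largest eigenvalues sum to at
    least threshold * (sum of all eigenvalues)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, eigenvalue in enumerate(ordered, start=1):
        running += eigenvalue
        if running / total >= threshold:
            return k
    return len(ordered)

# Hypothetical eigenvalue spectrum of a 5-band data set.
eigenvalues = [60, 30, 7, 2, 1]
k_95 = components_for_variance(eigenvalues, 0.95)   # 60 + 30 + 7 = 97% of the total
k_98 = components_for_variance(eigenvalues, 0.98)   # needs one more component
```

With these values, three components are enough for the 95% criterion, and a fourth is needed to pass 98%.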

    Figure 6. Extended morphological profile of two images. Each of the original profiles has 2 openings and 2 closings. A circular structuring element with a radius increment of 4 was used (r = 4, 8).


  • 3.2. ICA Feature Extraction

    ICA was introduced in the 1980s as an effective method for Blind Source Separation (BSS), i.e., for separating data into underlying information components. The method has also been used for the purpose of feature extraction.

    To demonstrate a BSS problem, let us consider sensed signals $x_i(t)$, $i = 1, 2, 3$, as linear mixtures of the sources $s_i(t)$, $i = 1, 2, 3$:

    $$\begin{aligned} x_1(t) &= a_{11}s_1(t) + a_{12}s_2(t) + a_{13}s_3(t), \\ x_2(t) &= a_{21}s_1(t) + a_{22}s_2(t) + a_{23}s_3(t), \\ x_3(t) &= a_{31}s_1(t) + a_{32}s_2(t) + a_{33}s_3(t). \end{aligned} \tag{23}$$

    It is of interest to find the unknown sources $s_i(t)$ without prior knowledge of the mixing coefficients $a_{ij}$. The set of equations can be written in vector form as

    $$x = As, \tag{24}$$

    where $A$ is referred to as the mixing matrix, with size $n \times m$. The vector $x$ represents the $n$ sensed signals, and the $m$ assumed sources are in $s$. ICA can recover up to $m = n$ independent sources, but in practice $m < n$ should be expected.

    By definition, the sources are assumed to be statistically independent, which is a stronger requirement than being uncorrelated [12]. Then, the probability density can be expressed as

    $$p(s) = \prod_{i=1}^{m} p(s_i), \tag{25}$$

    where $p(s_i)$ are the probability density functions of the individual source signals.

    During the unmixing process, we seek an $m \times n$ transformation matrix $W$ such that

    $$u = Wx, \tag{26}$$

    where $u$ contains the unmixed signals, i.e., the sources. Bell and Sejnowski published their approach to blind signal deconvolution based on ICA by minimizing the mutual information [13]

    $$I(u_1, \ldots, u_m) = E\left\{ \log \frac{p(u)}{\prod_{i=1}^{m} p(u_i)} \right\} \tag{27}$$

    $$= \sum_{i=1}^{m} H(u_i) - H(u), \tag{28}$$

    where $H(u_i) = -E\{\log(p(u_i))\}$ is the entropy of the random variable $u_i$ and $H(u) = -E\{\log(p(u))\}$ is the joint entropy of $u = [u_1\ u_2\ \cdots\ u_m]^T$. At its minimum, (27) becomes zero. The unmixing matrix $W$ is optimized by a natural gradient algorithm [14], where the learning rate can control the convergence speed.
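The step from (27) to (28) follows directly from the entropy definitions just given. As a short worked derivation:

```latex
\begin{align*}
I(u_1, \ldots, u_m)
  &= E\Bigl\{ \log p(u) - \sum_{i=1}^{m} \log p(u_i) \Bigr\} \\
  &= E\{\log p(u)\} - \sum_{i=1}^{m} E\{\log p(u_i)\} \\
  &= -H(u) + \sum_{i=1}^{m} H(u_i).
\end{align*}
```

The mutual information vanishes when $p(u) = \prod_{i=1}^{m} p(u_i)$, i.e., when the unmixed signals are statistically independent as in (25), since the logarithm in (27) is then zero everywhere.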

    Varshney and Arora propose two ICA feature extraction algorithms.¹⁴ The algorithm applied here starts by decorrelating the data set using the whitening transform or PCA. Independent components are extracted from the $m$ most important principal components, which account for 99% of the cumulative variance. The remaining $n - m$ PCs are not used. The algorithm stops when the mutual information has reached its minimum, as estimated by (28).
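    The two stages described above can be sketched as follows. This is a minimal sketch under several assumptions and not the implementation used in the experiments: the function names are hypothetical, the data are synthetic Laplacian sources mixed into six bands, the nonlinearity is tanh (a common choice for super-Gaussian sources), and a fixed number of iterations replaces the stopping rule based on (28).

```python
import numpy as np

def pca_whiten(x, variance_kept=0.99):
    """Whiten x (bands, samples); keep the m leading PCs reaching variance_kept."""
    xc = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(xc))
    order = np.argsort(eigval)[::-1]                 # sort PCs by variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    cumulative = np.cumsum(eigval) / eigval.sum()
    m = int(np.searchsorted(cumulative, variance_kept) + 1)
    V = (eigvec[:, :m] / np.sqrt(eigval[:m])).T      # (m, bands) whitening matrix
    return V @ xc, V

def infomax_natural_gradient(z, learning_rate=0.1, n_iter=1000):
    """Batch natural-gradient update W <- W + lr * (I - E{tanh(u) u^T}) W."""
    m, n_samples = z.shape
    W = np.eye(m)
    for _ in range(n_iter):
        u = W @ z
        W += learning_rate * (np.eye(m) - np.tanh(u) @ u.T / n_samples) @ W
    return W

# Synthetic example: three sources observed in six noisy bands.
rng = np.random.default_rng(0)
s = rng.laplace(scale=1 / np.sqrt(2), size=(3, 10_000))
A = np.array([[1.0, 0.2, 0.0],      # hypothetical mixing matrix
              [0.3, 1.0, 0.1],
              [0.0, 0.4, 1.0],
              [0.8, -0.5, 0.2],
              [-0.2, 0.7, 0.6],
              [0.5, 0.1, -0.9]])
x = A @ s + 0.01 * rng.standard_normal((6, 10_000))

z, V = pca_whiten(x)                # m = 3 whitened principal components
W = infomax_natural_gradient(z)
u = W @ z                           # estimated independent components
```

    With this synthetic setup, the product of `W`, `V` and `A` should be close to a scaled permutation matrix, which reflects the fact that ICA recovers the sources only up to their order and scale.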

    3.3. Experimental Results for DAIS Data

    Experiments were done on Digital Airborne Imaging Spectrometer (DAIS) 7915 data. The DAIS 7915 imaging spectrometer was designed by DLR and has 79 channels. Seventy-two of the spectral channels are in the visible light and near infrared regions of the spectrum, i.e., correspond to the wavelengths 0.4–2.4 µm. The seven thermal infrared bands were not used. Some channels were skipped due to noise. Therefore, a total of 62 bands were used.

    The flight altitude was chosen as the lowest available for the airplane, which resulted in a spatial resolution of 2.6 m per pixel. The test site is around the Engineering School at the University of Pavia, and the image is 243 by 243 pixels. Eight information classes were defined for the University area data set: asphalt, tree, meadow, gravel, bitumen, soil, parking lot and metallic roof. The number of training and test samples for each information class is listed in Table 3.

    Table 3. DAIS University area: Information classes and samples.

    Class                    Samples
    No.  Name             Train   Test
    1    Asphalt            137    129
    2    Tree               131    135
    3    Meadow             136    137
    4    Gravel              98    117
    5    Bitumen            110     96
    6    Soil               133     80
    7    Parking lot        130    135
    8    Metallic roof       92     90
         Total              967    919

    3.3.1. Statistical Classification

    To obtain baseline results for the University area data set, the experiments began with the Gaussian Maximum Likelihood (ML) classifier. Feature extraction, i.e., DAFE, DBFE and Non-Parametric Weighted Feature Extraction (NWFE),¹⁵ was also used in the experiments in order to reduce the data sets. For DBFE and NWFE, the reduction followed the 99% variance criterion, whereas for DAFE a 100% criterion was used.¹⁵ Results for the ML classification of the raw and reduced data sets are listed in Table 4.

    Table 4. DAIS University area: Classification accuracies (%) obtained from maximum likelihood classification of raw data, with and without Feature Extraction.

    Data        Raw bands       Raw bands       Raw bands       Raw bands
    FE              -              DAFE            DBFE            NWFE
    Features        62              7               23              19
    Class       Train    Test   Train    Test   Train    Test   Train    Test
    1           100.0    69.8    81.0    59.7    97.8    63.6    90.5    61.2
    2           100.0    82.2    90.8    84.4    99.2    77.8    93.1    83.0
    3           100.0    75.2    89.7    93.4    98.5    86.9    97.1    97.1
    4           100.0    21.4    92.9    66.7   100.0    36.8    99.0    62.4
    5           100.0    28.1    92.7    47.9   100.0    44.8    98.2    40.6
    6           100.0    75.0    92.5    80.0   100.0    82.5    98.5    67.5
    7           100.0    59.3    80.0    70.4   100.0    70.4    93.8    74.1
    8           100.0    74.4   100.0    73.3   100.0    80.0   100.0    81.1
    Ave         100.0    60.7    90.0    72.0    99.4    64.9    96.3    70.9
    OA          100.0    61.3    89.3    72.7    99.4    68.0    96.0    72.1

    Using the 62 raw data bands gave the lowest overall test accuracy, i.e., 61.3% of the test samples were assigned to the correct class. The highest test accuracy was obtained after reduction by the DAFE method (72.7% overall accuracy). For DAFE, seven features were used, based on 100% cumulative variance in this eight-class problem. For DBFE and NWFE, the feature reduction followed the 99% variance criterion, and the original data set was transformed into 23 and 19 features, respectively.
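    The Ave and OA rows of the accuracy tables can be reproduced from the per-class figures. The sketch below assumes that Ave is the unweighted mean of the class accuracies and that OA is the fraction of all test samples assigned to the correct class. It uses the test-sample counts from Table 3 and the raw-band test accuracies from Table 4, and the function names are chosen for this example only.

```python
# Test-sample counts per class (Table 3) and per-class test accuracies in
# percent for the 62 raw bands (Table 4).
test_counts = [129, 135, 137, 117, 96, 80, 135, 90]
raw_test_acc = [69.8, 82.2, 75.2, 21.4, 28.1, 75.0, 59.3, 74.4]

def average_accuracy(class_acc):
    """Unweighted mean of the per-class accuracies (assumed meaning of Ave)."""
    return sum(class_acc) / len(class_acc)

def overall_accuracy(class_acc, counts):
    """Share of all samples classified correctly (assumed meaning of OA)."""
    correct = sum(a / 100.0 * n for a, n in zip(class_acc, counts))
    return 100.0 * correct / sum(counts)

ave = average_accuracy(raw_test_acc)
oa = overall_accuracy(raw_test_acc, test_counts)
```

    Rounded to one decimal, `ave` and `oa` give 60.7 and 61.3, which match the test column for the raw bands in Table 4. The two measures differ because the classes do not contain the same number of test samples.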

    3.3.2. Principal Components

    The three most important PCs account for 99% of the total data set variance. Principal components one through four are displayed in Figure 7, and classification accuracies using these bands as input features are given in Table 5. PCs four and above are noisy and make a negligible contribution to the total data set variance. The best accuracies for the principal components were obtained using the three most important PCs, where 48.7% of the test samples were assigned to the correct class (46.6% average class accuracy).

    Figure 7. DAIS University area, most important principal components, 1st (left) through 4th (right).

    Table 5. DAIS University area: Classification accuracies (%) obtained from Neural Network classification of most important Principal Components.

    Data set       PC 1          PCs 1, 2      PCs 1, 2, 3
    Features        1               2               3
    Class       Train    Test   Train    Test   Train    Test
    1            71.5    24.0    81.8    58.1    83.2    55.0
    2            74.0    87.4    80.9    84.4    86.3    86.7
    3             4.4     2.2    69.1    48.9    52.9    83.9
    4            12.2    11.1    84.7    67.5    85.7    56.4
    5             0.0     0.0    90.9    40.6    82.7    38.5
    6            68.4    27.5    43.6    16.3    94.0    52.5
    7            49.2    40.7     0.0     0.0     0.0     0.0
    8            96.7    52.2     0.0     0.0     0.0     0.0
    Ave          47.1    30.7    56.4    39.5    60.6    46.6
    OA           47.3    31.4    57.2    42.1    61.9    48.7

    3.3.3. Independent Components

    The PCs of the University area data set, normalized to unit variance, are transformed according to the independent component analysis procedure. The ICs are displayed in Figure 8, and classification accuracies using the ICs as input to the neural network are given in Table 6.

    For the three ICs, 61.9% of the test samples were correctly classified. This is a marked improvement in overall accuracy over what was obtained using the three PCs in Table 5.

    3.3.4. Morphological Profiles of Principal Components

    Simple and extended morphological profiles were constructed from each of the principal components by adding four openings and four closings to the original image, which gives nine features per PC. A disk-shaped structuring element with a radius increment of 2 pixels was used. The profiles have 9, 18 and 27 input features, and the classification accuracies are shown in Table 7.
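    The construction of such a profile can be sketched with plain NumPy. This is an illustrative sketch under stated assumptions and not the code used in the experiments: the operators are openings and closings by reconstruction with a disk of increasing radius, the ordering of the features inside the profile is a choice made here, all function names are hypothetical, and the test image is synthetic.

```python
import numpy as np

def disk(radius):
    """Boolean disk-shaped structuring element of the given pixel radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def _slide(img, footprint, combine, fill):
    """Combine img with its shifts over the footprint (min or max filter)."""
    img = np.asarray(img, dtype=float)
    r = footprint.shape[0] // 2
    h, w = img.shape
    padded = np.pad(img, r, mode="constant", constant_values=fill)
    out = np.full((h, w), fill, dtype=float)
    for dy, dx in zip(*np.nonzero(footprint)):
        out = combine(out, padded[dy:dy + h, dx:dx + w])
    return out

def erode(img, footprint):
    return _slide(img, footprint, np.minimum, np.inf)

def dilate(img, footprint):
    return _slide(img, footprint, np.maximum, -np.inf)

def opening_by_reconstruction(img, radius):
    """Erode with a disk, then grow the result back under the original image."""
    img = np.asarray(img, dtype=float)
    marker = erode(img, disk(radius))
    unit = np.ones((3, 3), dtype=bool)
    while True:
        grown = np.minimum(dilate(marker, unit), img)
        if np.array_equal(grown, marker):
            return marker
        marker = grown

def closing_by_reconstruction(img, radius):
    """Dual of the opening, obtained by negating the image."""
    return -opening_by_reconstruction(-np.asarray(img, dtype=float), radius)

def morphological_profile(img, n_steps=4, radius_step=2):
    """Stack closings, the image and openings: 2 * n_steps + 1 features."""
    radii = [radius_step * k for k in range(1, n_steps + 1)]
    closings = [closing_by_reconstruction(img, r) for r in reversed(radii)]
    openings = [opening_by_reconstruction(img, r) for r in radii]
    return np.stack(closings + [np.asarray(img, dtype=float)] + openings)

def extended_profile(components, **kwargs):
    """Concatenate the profiles of several component images."""
    return np.concatenate([morphological_profile(c, **kwargs) for c in components])

# Synthetic scene: a small (3 x 3) and a large (11 x 11) bright square.
img = np.zeros((32, 32))
img[4:7, 4:7] = 1.0
img[15:26, 15:26] = 1.0
profile = morphological_profile(img)   # shape (9, 32, 32)
```

    In this synthetic scene the opening with radius 2 already removes the small square, while the large square survives the openings with radii 2 and 4 and disappears for radii 6 and 8. Two objects with the same gray level therefore receive different profiles according to their size, which is the spatial information the classifier exploits.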

    In all three instances, accuracies were improved compared to the classification of the PCs shown in Table 5. The highest accuracy was obtained for the 3-PC morphological profile, where the classifier was correct for 79.0% of the test samples.

    Figure 8. DAIS University area, independent components, 1st (left) through 3rd (right).

    Table 6. DAIS University area: Classification accuracies (%) obtained from Neural Network classification of Independent Components.

    Data set       IC 1            IC 2            IC 3        ICs 1, 2, 3
    Features        1               1               1               3
    Class       Train    Test   Train    Test   Train    Test   Train    Test
    1            70.8    76.7    77.4    66.7    75.9    79.1    61.3    46.5
    2            35.9    37.0     0.0     0.0    44.3    15.6    84.7    88.9
    3             0.0     0.0    58.8    43.1    55.9    72.3    80.9    94.2
    4             6.1     8.5    14.3     9.4    73.5    63.2    63.3    39.3
    5             0.0     0.0    22.7     0.0     0.0     0.0    91.8    37.5
    6            67.7    68.8    90.2    57.5    58.6    15.0    91.0    48.8
    7            45.4    37.0    56.2    42.2     0.0     0.0    77.7    69.6
    8            72.8    21.1    66.3    40.0    95.7    68.9   100.0    50.0
    Ave          37.3    31.2    48.2    32.4    50.5    39.3    81.3    59.3
    OA           37.8    30.8    49.5    32.1    49.2    40.3    80.9    61.9

    Table 7. DAIS University area: Classification accuracies (%) obtained from neural network classification of morphological profiles of most important Principal Components.

    Data set       PC 1          PCs 1, 2      PCs 1, 2, 3
    # op/cl         4               4               4
    SE step         2               2               2
    Features        9               18              27
    Class       Train    Test   Train    Test   Train    Test
    1            94.2    76.0    94.9    72.1    90.5    79.8
    2             0.0     0.0   100.0    74.1    92.4    80.7
    3            57.4    35.8    98.5    66.4    99.3    99.3
    4             0.0     0.0    89.8    63.2    79.6    59.0
    5            92.7    74.0   100.0    67.7   100.0    71.9
    6             0.0     0.0   100.0    55.0    99.2    88.8
    7            93.8    89.6    97.7    69.6   100.0    91.9
    8             0.0     0.0    98.9    56.7   100.0    50.0
    Ave          42.3    34.4    97.5    65.6    95.1    77.7
    OA           44.6    36.9    97.6    66.6    95.3    79.0

    3.3.5. Morphological Profiles of Independent Components

    Morphological profiles of the three independent components in Figure 8 were constructed and used as input to the neural network classifier. Classification accuracies are shown in Table 8.

    Table 8. DAIS University area: Classification accuracies (%) obtained from neural network classification of morphological profiles of Independent Components.

    Data set       IC 1            IC 2            IC 3        ICs 1, 2, 3
    # op/cl         4               4               4               4
    SE step         2               2               2               2
    Features        9               9               9               27
    Class       Train    Test   Train    Test   Train    Test   Train    Test
    1            69.3    45.7    86.9    62.8     0.0     0.0     0.0     0.0
    2            50.4    35.6    67.2    48.9    74.0    54.1    98.5    74.8
    3            45.6     0.7    77.9    76.6    89.7    75.9   100.0   100.0
    4             0.0     0.0    29.6    30.8     0.0     0.0   100.0    87.2
    5            97.3    59.4    97.3    68.8     0.0     0.0   100.0    86.5
    6            64.7    60.0    93.2    75.0     0.0     6.3     0.0     0.0
    7            80.0    65.2    60.8    39.3    97.7    89.6   100.0    90.4
    8            94.6    36.7    92.4    46.7   100.0    81.1   100.0    61.1
    Ave          62.7    37.9    75.7    56.1    45.2    38.4    74.8    62.5
    OA           62.8    36.3    76.2    55.4    45.3    40.9    71.9    65.3

    By constructing an extended morphological profile of the three ICs, the classification accuracies were improved only slightly over the experiments listed in Table 6 (65.3% versus 61.9% overall test accuracy).

    3.3.6. Morphological Profiles of PCs with Feature Extraction

    The three feature extraction methods were applied to the morphological profiles of the two and three original PCs, respectively. The feature reduction for DAFE was based on 100% variance, resulting in seven features. Fewer features were used in the case of reduction by the DBFE and NWFE methods. For those approaches, the features were reduced based on 87 to 92% of the total variance. The classification accuracies are given in Table 9.

    Table 9. DAIS University area: Classification accuracies (%) obtained from neural network classification of morphological profiles of most important Principal Components with Feature Extraction.

    Data set    PCs 1, 2    PCs 1, 2, 3     PCs 1, 2    PCs 1, 2, 3     PCs 1, 2    PCs 1, 2, 3
    # op/cl        4             4             4             4             4             4
    SE step        2             2             2             2             2             2
    FE            DAFE          DAFE          DBFE          DBFE          NWFE          NWFE
    Features       7             7             6             6             7             6
    Class      Train   Test  Train   Test  Train   Test  Train   Test  Train   Test  Train   Test
    1           63.5   50.4   45.3   16.3   62.8   61.2    0.0    0.0   86.1   79.8   89.8   78.3
    2           83.2   60.7   80.2   51.9   77.1   63.0   77.9   63.7   87.0   82.2   98.5   88.1
    3           84.6   30.7   83.1   76.6   80.9   35.8   79.4   85.4   83.8   34.3  100.0   89.8
    4           83.7   64.1   74.5   53.8    0.0    0.0   45.9   41.0   78.6   68.4   80.6   62.4
    5          100.0   72.9   98.2   71.9   99.1   63.5   97.3   67.7  100.0   66.7   99.1   49.0
    6           88.7   72.5   85.0   57.5   92.5   32.5   97.7   73.8  100.0   71.3   97.7   66.3
    7           87.7   71.9   80.8   80.7   70.0   60.7   95.4   75.6   90.0   80.0   90.0   76.3
    8           98.9   58.9   75.0   35.6   92.4   46.7   98.9   51.1  100.0   52.2  100.0   48.9
    Ave         86.3   60.3   77.7   55.5   71.8   45.4   74.1   57.3   90.7   66.9   94.5   69.9
    OA          85.4   59.0   77.4   56.0   72.9   46.1   73.1   56.9   90.5   67.1   94.6   72.1

    Classification accuracies were not improved over the results in Table 7 in any of the experiments listed in Table 9, except when the 2-PC morphological profile was reduced by the NWFE method.

    3.3.7. Morphological Profiles of ICs with Feature Extraction

    Finally, the same feature extraction methods were applied to the extended morphological profile of the three independent components, and the accuracies are given in Table 10.

    Table 10. DAIS University area: Classification accuracies (%) obtained from neural network classification of morphological profiles of Independent Components with Feature Extraction.

    Data set   ICs 1, 2, 3     ICs 1, 2, 3     ICs 1, 2, 3
    # op/cl         4               4               4
    SE step         2               2               2
    FE             DAFE            DBFE            NWFE
    Features        6               5               5
    Class       Train    Test   Train    Test   Train    Test
    1             0.0     0.0    80.3    62.8    85.4    56.6
    2            79.4    71.9    31.3    40.7    90.8    85.9
    3            89.7    47.4    73.5    70.8    63.2    86.1
    4            96.9    71.8    64.3    69.2     0.0     0.0
    5             0.0     0.0    92.7    60.4    99.1    70.8
    6             0.0     0.0     0.0     0.0    96.2    57.5
    7            89.2    74.8    90.8    84.4    93.1    91.9
    8            89.1    50.0    97.8    77.8   100.0    67.8
    Ave          55.5    39.5    66.3    58.3    78.5    64.6
    OA           53.7    42.7    64.5    60.5    79.8    65.9

    When compared to the results in Table 8, the test accuracies were not improved using the DAFE and DBFE methods. However, a small improvement in overall accuracy was obtained for reduction by NWFE, where 65.9% of the test samples were assigned to the correct class according to Table 10.

    3.3.8. Summary of DAIS University Area Experiments

    Classification accuracies obtained by the neural network classifier are compared to the statistical classification accuracies in the bar chart in Figure 9.

    Reduction of the extended morphological profiles always led to lower accuracies for the DAFE and DBFE methods. Reduction by NWFE increased the overall accuracy for two morphological profiles out of three, as well as in the case where NWFE was applied to the original raw data prior to ML classification.

    Figure 9. DAIS University area, comparison of feature extraction methods.

    4. CONCLUSIONS

    Classification has been performed on high resolution imagery from urban areas using morphological profiles. Two different data sets were used in the analysis, i.e., a panchromatic data set and a hyperspectral data set. In the panchromatic case, acceptable accuracies were achieved regardless of the classification method used. However, the obtained accuracies are still far below 100%, meaning that improvements are still required. In particular, we defined the DMP with only one shape of SE. It could be useful to explore different SE shapes (disks and lines of different orientations) for the same image in order to increase the accuracies. Recently, in Ref. 7, we fused NN results and FPM results to improve the classification. The obtained results motivate us to continue in that direction.

    For the hyperspectral data, the best overall accuracies on the test data were obtained by using extended morphological profiles based on principal components. However, the classification of extended morphological profiles was in most cases not seen to have a significant advantage, in terms of classification accuracies, over statistical classification of the original raw data set. On the other hand, we have noted in our experiments that morphological approaches improve the visual interpretation of classes that represent structured objects, such as roads, houses and their shadows. Furthermore, the classification maps obtained by the neural network classification of extended morphological profiles seem to be less noisy than those obtained when maximum likelihood classification is applied to the raw data.

    ACKNOWLEDGMENTS

    This research was partially supported by the Icelandic Science Fund, the Research Fund of the University of Iceland, and the Jules Verne Program of the French and Icelandic governments (PAI EGIDE). The authors would like to thank Professor Paolo Gamba of the University of Pavia, Italy, for providing the reference data for the DAIS data.

    REFERENCES

    1. P. Soille and M. Pesaresi, “Advances in mathematical morphology applied to geoscience and remote sensing,” IEEE Transactions on Geoscience and Remote Sensing 40, pp. 2042–2055, September 2002.
    2. M. Pesaresi and J. A. Benediktsson, “A new approach for the morphological segmentation of high-resolution satellite imagery,” IEEE Transactions on Geoscience and Remote Sensing 39, pp. 309–320, February 2001.
    3. P. Soille, Morphological Image Analysis: Principles and Applications, 2nd ed., Springer, 2003.
    4. J. Serra, Image Analysis and Mathematical Morphology, Volume 2: Theoretical Advances, U.K. Academic, 1988.
    5. J. A. Benediktsson, M. Pesaresi, and K. Arnason, “Classification and feature extraction for remote sensing images from urban areas based on morphological transformations,” IEEE Transactions on Geoscience and Remote Sensing 41, pp. 1940–1949, September 2003.
    6. J. Chanussot, J. A. Benediktsson, and M. Fauvel, “Classification of remote sensing images from urban areas using a fuzzy possibilistic model,” IEEE Transactions on Geoscience and Remote Sensing Letters, accepted for publication.
    7. M. Fauvel, J. Chanussot, and J. A. Benediktsson, “Fusion of methods for the classification of remote sensing images from urban areas,” in IEEE Geoscience and Remote Sensing Symposium, 2005.
    8. J. A. Benediktsson, J. A. Palmason, and J. R. Sveinsson, “Classification of hyperspectral data from urban areas based on extended morphological profiles,” IEEE Transactions on Geoscience and Remote Sensing, vol. 43, no. 3, pp. 480–491, 2005.
    9. J. A. Palmason, J. A. Benediktsson, and K. Arnason, “Morphological transformations and feature extraction for urban data with high spectral and spatial resolution,” Proceedings of IGARSS 2003, Toulouse, France, CD-ROM, IEEE Publications, 2003.
    10. F. Dell’Acqua, P. Gamba, A. Ferrari, J. A. Palmason, J. A. Benediktsson, and K. Arnason, “Exploiting spectral and spatial information in hyperspectral urban data with high resolution,” IEEE Geoscience and Remote Sensing Letters, vol. 1, no. 3, pp. 322–326, 2004.
    11. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed., Academic Press, New York, 1990.
    12. A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley and Sons, New York, 2001.
    13. A. J. Bell and T. J. Sejnowski, “Blind separation and blind deconvolution: an information-theoretic approach,” IEEE ICASSP-95, vol. 5, pp. 3415–3418, 1995.
    14. P. K. Varshney and M. K. Arora, Advanced Image Processing Techniques for Remotely Sensed Hyperspectral Data, Springer Verlag, Berlin, 2003.
    15. D. A. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing, John Wiley and Sons, Hoboken, New Jersey, 2003.
