
Event Data Mining and Classification from Multiple Streaming Sources

Ashit Talukder Jet Propulsion Laboratory, California Institute of Technology

Pasadena, CA, USA E-mail: [email protected]

Abstract— A novel solution to the mining and classification of deformable events from multiple streaming image data sources is discussed. Observations of natural or manmade phenomena using sensor networks or remote satellites are often acquired from various sensory measurement mechanisms placed at different locations. Furthermore, each source measures a different parameter (or a different aspect of the phenomenon), resulting in strong and weak classifiers for different data sources. Previous solutions for multisource learning and mining assume simultaneous, co-registered data measurements and may not work in many practical applications. We discuss a new multisource classification solution using a generative model that reduces the multiple measurement spaces into a common feature space while maintaining a unique feature space for each measurement source. A temporal classifier is used for temporal knowledge transfer by tracking the correspondence between consecutive measurements from different sources in the common feature space. In addition, an auxiliary source-specific classifier is used for each data source. A knowledge transfer solution based on a Bayesian approach is then used to fuse the knowledge transferred between consecutive measurements from two sources (applied to the common feature spaces) with a source-specific classifier for the current observation (applied to the unique feature space), to ensure robust classification labeling even during instances when only measurements from a weak data source are available. Experimental results on a practical cyclone detection and tracking problem from multiple streaming remote satellite sources demonstrate the usefulness of the proposed approach.

Keywords- classification; data mining; multisource processing; streaming data; knowledge transfer; transfer learning; tracking

I. INTRODUCTION
Multisource learning and classification has direct applicability in today's information age, where information is gathered and digitized from various sources. Several domains involve data from multiple sources: defense applications such as battlefield monitoring use visible and infrared imaging sources, chemical and biological sensors, and seismic sensors for vehicle identification; environmental monitoring for science uses distributed pressure, temperature, humidity, radar, and image measurements over large areas; newscasts include audio and visual information; and remote monitoring involves several satellites with specific orbits measuring sea surface height, wind speed, precipitation, etc. Most such multisource measurements involve data sources with unique data capture cycles, resulting in un-registered data sets.

Many prior multisource processing solutions, including data fusion algorithms using wavelets [1], assume the existence of simultaneous (at the same time) or co-registered (from the same location) measurement sources. Solberg et al. [2] proposed a Bayesian approach for land-use classification of a fixed region captured at different times by multiple satellite sources, in particular Landsat and SAR images. Classification of land cover using remote datasets has also been addressed using co-registered sensors [3]. However, these approaches assume simultaneous observations from multiple sources and would not apply in many real-world situations.

In this paper, we discuss a solution to classify and track events across multiple heterogeneous source measurements where finding direct correspondences between source measurements is non-trivial and some sources are more accurate for detecting events than others (i.e. weak versus strong sources). As a unique aspect of our approach, we pose solutions to multisource event mining where:

Figure 1 Evolution of a cyclone in the North Atlantic captured by different satellite sources: (a) QuikSCAT satellite wind measurements, (b) TRMM precipitation measurements, (c) GOES infrared measurements, and (d) QuikSCAT wind measurements 12 hours after the first observation. Note the rapid change in appearance of the cyclone over time and from one satellite measurement source to another.


1. the measurements from different sources are taken at different time instances (non-coincidental observations);
2. the source sensors are not co-located (though we do assume knowledge of the spatial coordinates of each measurement);
3. events are deformable and change in appearance over time; and

4. events look visually dissimilar in each source (Fig. 1).

One of our main contributions is a novel mechanism that automatically learns commonalities between different unregistered data sources and transfers knowledge between measurements from different data sources (Fig. 1) to enable reliable event classification and tracking using weaker data sources.

The practical utility of our multisource information mining solution is obvious for detecting and tracking numerous geospatial events, such as cyclones, plumes, wildfires, storms, and oil spills, from remote satellite measurements that give very high resolution global coverage compared to in-situ sensors with very limited spatial extent. In this paper, we apply our solution to the problem of cyclone detection and tracking from remote satellite data. Wind speed and direction measurements from the QuikSCAT satellite are powerful indicators of cyclones, but provide very coarse temporal coverage (once every 12 hrs). On the other hand, the merged TRMM precipitation data is available every 3 hours, but is susceptible to false alarms. A knowledge transfer mechanism is discussed to transfer knowledge on-line between the strong (QuikSCAT) and weaker (TRMM) source measurements, and also between two sequential weak source observations, to improve the overall classification accuracy. Our solution offers cyclone tracking at ½-3 hr resolution with high spatial accuracy, compared to current NHC estimates every 6-12 hrs derived from sparse in-situ sensors. The general solution discussed here applies to problems involving more than two source measurements, and can be used in a variety of different problem domains, including remote defense monitoring systems, wide-area sensor networks for homeland security, and event detection across audio-visual sources.

II. PRIOR RELATED WORK
Several novel solutions have been posed to the problem of learning and classification from multiple heterogeneous sources. Elidan, Heitz, and Koller developed a statistical landmark-based shape model to learn the shape of complex objects from hand-drawings, which is then transferred [4] to actual images. A probabilistic generative framework for analyzing two sets of data is suggested in [5], where the structure of each data set is represented in terms of a shared and a private latent space; this solution is applied to synthetic image datasets with specific similar and disparate features. Some theoretical results on learning from multiple sources with some amount of "disparity" have been obtained by Crammer et al. [6]. A technique to embed observations of different types, such as images and text, into a single common Euclidean space using their co-occurrence statistics is discussed in [7]. Canonical correlation analysis and correspondence analysis invoke a similar minimization of Euclidean distances between different variables that are normalized (unit covariance) and zero-mean. A nonlinear version of probabilistic canonical correlation analysis [5] for analyzing two sets of synthetic image data has been suggested, where the structure of each data set is represented in terms of a shared and a private latent space. Discriminative modeling that attempts to extract relevant structure from multiple variables has been discussed, minimizing the joint information about other, irrelevant variables using mutual information [8] and information bottleneck [9] approaches. A discriminative model that characterizes the discrepancy between target and auxiliary data for transfer learning from measurements under different conditions is discussed in [10]. These existing solutions using co-occurrence measures and discriminative solutions such as canonical correlations, mutual information, and generative models use simultaneous observations from multiple sources, such as text and images on the same web page or concurrent video and audio from the same stream.

III. MULTI-SOURCE INFORMATION DATA MINING AND CLASSIFICATION THEORY

A. Multisource Problem Description
We first discuss the most general formulation for our learning and classification problem. An event of interest $I_t(s)$ exhibits complex evolution structure over space $s$ and time $t$. Our goal is to detect and classify the event over a length of time $t$ to $t+\Delta t$, which is represented by the set $I_{t,t+\Delta t}(s)$. Without loss of generality, we assume two sensors $X$ and $Y$ observing the event, with corresponding measurements denoted by $X_{t,t+\Delta t}(s)$ and $Y_{t,t+\Delta t}(s)$. We consider measurements at discrete (specific) times $\{t_0, t_1, \ldots\}$ within $[t, t+\Delta t]$.

Many prior machine learning and transfer learning solutions for learning and classification from multiple sources assume that simultaneous measurements (at the same time) from all sensors (measurement sources) are available at all times, i.e. pairs of observations exist for $\{X_{t_0}(s), Y_{t_0}(s)\}$, $\{X_{t_1}(s), Y_{t_1}(s)\}$, etc.

In reality, many sensing and measurement problems do not have co-registered observations at the same time $t$ or even the same location $s$ in space. Without any loss in generality, we assume that the observed sequence has alternating measurement observations from the two sensors as follows:

$X_{t_0}(s), Y_{t_1}(s), X_{t_2}(s), Y_{t_3}(s), \ldots, X_{t_N}(s), Y_{t_{N+1}}(s)$.

Note that in practice, as in our applications, some source measurements could have higher temporal resolutions and other sources could have coarser temporal resolutions. For example, this occurs in remote satellite observations, resulting in a sequence with more frequent consecutive measurements from (say) the weak source $Y$, and sparse observations from the strong source $X$:


$X_{t_0}(s), Y_{t_1}(s), Y_{t_2}(s), Y_{t_3}(s), \ldots, X_{t_N}(s), Y_{t_{N+1}}(s), Y_{t_{N+2}}(s), \ldots$

This problem scenario implies that prior multisource machine learning and classification techniques that employ covariance, statistical, canonical correlation, or discriminative solutions [8, 9] on simultaneous multisource data measurements of an instance of an event for feature selection and classification cannot be directly applied to this set of problems.

Our event classification goal from multiple sources is to assign a class label $L$ at time $t$ given the sequential multisource observations $X_{t-1}(s), Y_t(s)$ using a Bayesian framework. Our objective is to estimate the optimum class label $L^i_t$ at time $t$ such that the a-posteriori probability of the class given the two sequential observations is maximized:

$L^*_t = \arg\max_i \{P(L^i_t \mid X_{t-1}(s), Y_t(s))\}, \quad i = 1, 2, \ldots, K$.   (1.1)

From Bayes' rule, the MAP estimate simplifies to maximizing

$P(L^i_t)\; P(X_{t-1}(s) \mid L^i_t)\; P(Y_t(s) \mid X_{t-1}(s), L^i_t)$.   (1.2)

In this MAP estimation problem, $P(Y_t(s) \mid X_{t-1}(s), L^i_t)$ is the conditional of the observation at time $t$ given the prior observation at $t-1$; it is obtained from a temporal classifier $V\{\ldots\}$ using the common subspaces discussed in Section III.B. $P(X_{t-1}(s) \mid L^i_t)$ is the likelihood of a label given a single-source observation and corresponds to the confidence of the single-source classifier $W\{\ldots\}$ for that measurement.
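As a minimal illustration of how Eq. (1.2) could be evaluated in practice (the function and variable names below are ours, not part of the published system), the three factors can be multiplied per candidate label and the maximizing label selected:

```python
import numpy as np

def map_label(prior, single_source_conf, temporal_conf):
    """Pick the MAP class label by multiplying the three factors of Eq. (1.2).

    prior[i]              ~ P(L_i)
    single_source_conf[i] ~ P(X_{t-1}(s) | L_i), confidence of classifier W
    temporal_conf[i]      ~ P(Y_t(s) | X_{t-1}(s), L_i), confidence of classifier V
    Each argument holds one value per candidate class label.
    """
    posterior = np.asarray(prior) * np.asarray(single_source_conf) * np.asarray(temporal_conf)
    posterior = posterior / posterior.sum()   # normalize for readability only
    return int(np.argmax(posterior)), posterior

# Illustrative numbers for K = 2 labels ("cyclone", "no cyclone").
best_label, post = map_label([0.5, 0.5], [0.7, 0.3], [0.8, 0.2])
```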

B. Multisource Learning and Classification Model
For purposes of describing our approach, we assume sequential alternating observations from each source $X$ and $Y$. This can easily be generalized to any sequential series of observations or to more sources. Our solution is to model the multisource learning and classification problem as a generative process where each observation is assumed to be comprised of two components: (a) components ($C_t(s)$) that are common to all (both) data sources, and (b) components ($U_t(s)$) unique to each data stream. The observation $X$ (or $Y$) at any time $t_n$ can be completely and uniquely described by its common and unique components,

$X_{t_n}(s) = C_{t_n}(s) + U^X_{t_n}(s)$ and $Y_{t_n}(s) = C_{t_n}(s) + U^Y_{t_n}(s)$.   (1.3)

We use an invertible data representation $F(\cdot)$, i.e. the components are orthogonal to each other and the transformation is linear in nature. For the alternating observation sequence, this gives

$X_{t_0}(s) = C_{t_0}(s) + U^X_{t_0}(s)$; $\quad Y_{t_1}(s) = C_{t_1}(s) + U^Y_{t_1}(s)$; where $C_{t_0}(s) \perp U^X_{t_0}(s)$.   (1.4)

After this data decomposition, a two-phased classification solution is used for multisource event detection and tracking. Source-specific classifiers $W$ are used to label each data stream using the unique components $U_t(s)$. The data sources differ in their discriminative capabilities for event classification: some sources (say $X$) have strong discriminative power and therefore strong classifiers, while others (say $Y$) have limited discriminative capabilities and weak classifiers. Therefore, at each time step, we transfer knowledge from the previous observation using a Bayesian MAP approach (Eq. (1.2)). A temporal classifier $V$ uses the common data components $C_t(s)$ from sequential measurements from different data sources over time, as follows: $V\{C_X(s_{t-1}), C_Y(s_t)\}$.

C. Learning of Common Feature Subspaces
We use a mutual information metric to estimate the transformations for the common and unique feature spaces. This measure was devised by Shannon [11] to characterize and quantify the amount of information flowing through a communication channel. We compute the optimal transformation $F$ for each source measurement $X$, $Y$ such that the mutual information between the random variables $X$ and $Y$ after transformation is maximized,

$M(F(X), F(Y)) = H(F(X)) + H(F(Y)) - H(F(X), F(Y))$,   (1.5)

i.e. $F$ is the transform that maximizes the common or shared information between $X$ and $Y$. Here $H(X)$ is also referred to as the uncertainty in $X$, so $H(F(X))$ measures the uncertainty in the transformed variable $F(X)$, and $H(F(X), F(Y))$ is the joint entropy of the two transformed variables.
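For concreteness, a simple histogram-based estimate of the mutual information in Eq. (1.5) between two transformed images might look as follows (the binning choice and function names are our own illustrative assumptions):

```python
import numpy as np

def mutual_information(fx, fy, bins=64):
    """Estimate M(F(X), F(Y)) = H(F(X)) + H(F(Y)) - H(F(X), F(Y)) from the
    joint histogram of two transformed images with the same number of pixels."""
    joint, _, _ = np.histogram2d(fx.ravel(), fy.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    return entropy(px) + entropy(py) - entropy(pxy.ravel())
```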

The next step is to determine the invertible or orthogonal transform to use for Equations (1.1) and (1.2).

Figure 2 Computation of common feature subspaces for knowledge transfer between multiple sources. The strong source $X$ (QuikSCAT) at time $t-1$ and the weak source $Y$ (TRMM) at time $t$ are each transformed ($F(X_{t-1})$, $F(Y_t)$) into a common feature component ($C_X(s_{t-1})$, $C_Y(s_t)$) and a source-specific unique feature component ($U_X(s_{t-1})$, $U_Y(s_t)$). The unique components drive the strong and weak single-source classifiers $W\{U_X(s_{t-1})\}$ and $W\{U_Y(s_t)\}$, the common components drive the temporal classifier $V\{C_X(s_{t-1}), C_Y(s_t)\}$ for knowledge transfer, and the outputs are combined by Bayesian MAP fusion to yield the classified event.


A rich family of orthogonal techniques exists for non-redundant signal transforms, ranging from Fourier transforms (DFT) and cosine transforms (DCT) to principal component analysis (PCA), the Hadamard transform, and wavelets. Any of these transforms can be used to express the generative model described in Eq. (1.6) and Eq. (1.7). For this application, we choose a Fourier transform to represent the common and unique feature spaces for multisource classification. An image measurement $I_X(s_t)$ of the object located at $s$ at time $t$ from source $X$ can be represented by a sum of two orthogonal groups of Fourier coefficients as follows:

$I_X(s_t) = \frac{1}{MN}\sum_{v}\sum_{u} \mathcal{F}_X(u,v)\, e^{-j2\pi(mu/M + vn/N)} = C_X(s_t) + U_X(s_t)$,
for $m = 0, 1, \ldots, N-1$; $n = 0, 1, \ldots, M-1$; $D = M \times N$,   (1.6)

$C_X(s_t) = \frac{1}{MN}\sum_{v=Bv_1}^{Bv_2}\sum_{u=Bu_1}^{Bu_2} \mathcal{F}_X(u,v)\, e^{-j2\pi(mu/M + vn/N)}$,   (1.7)

$U_X(s_t) = \frac{1}{MN}\sum_{v \notin (Bv_1, Bv_2)}\;\sum_{u \notin (Bu_1, Bu_2)} \mathcal{F}_X(u,v)\, e^{-j2\pi(mu/M + vn/N)}$,   (1.8)

where $\mathcal{F}_X(u,v)$ denotes the 2-D Fourier coefficients of the source-$X$ image.

This implies that, for a Fourier-transform representation, bandpass-filtered data with passband frequencies $\{Bu_1, Bu_2\}, \{Bv_1, Bv_2\}$ can be used to compute the optimal common and unique feature spaces for a specific source. The choice of the optimal band-pass frequencies $\{Bu_1, Bu_2\}, \{Bv_1, Bv_2\}$ is problem-specific and needs to be learnt for each source pair. We suggest a training step later in which the optimal parameters can be learnt using sample training images of objects. Note that the selection of the Fourier transform to compute the source-invariant space is just one of many possible choices and is certainly not the optimal one for all applications. To avoid ringing in the spatial domain arising from the flat-top frequency response in Eq. (1.8), a 2-D FIR bandpass filter could be used instead.
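A minimal numpy sketch of the band-pass split in Eqs. (1.6)-(1.8) is shown below; the symmetric normalized-frequency cutoffs b1, b2 and the rectangular mask construction are our illustrative assumptions rather than the exact filter used in the paper:

```python
import numpy as np

def common_unique_split(image, b1, b2):
    """Split an image into a band-pass 'common' component C and its complement
    'unique' component U so that image = C + U (cf. Eqs. (1.6)-(1.8)).
    b1 < b2 are normalized frequency cutoffs in [0, 0.5]."""
    F = np.fft.fft2(image)
    fu = np.abs(np.fft.fftfreq(image.shape[0]))[:, None]   # vertical frequencies
    fv = np.abs(np.fft.fftfreq(image.shape[1]))[None, :]   # horizontal frequencies
    passband = (fu >= b1) & (fu <= b2) & (fv >= b1) & (fv <= b2)
    common = np.real(np.fft.ifft2(F * passband))            # C_X(s_t)
    unique = np.real(np.fft.ifft2(F * ~passband))           # U_X(s_t)
    return common, unique
```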

For a selected orthogonal transform $\Psi$, the choice of the parameters $P^\Psi_X$, $P^\Psi_Y$ to use for the best common feature subspace description depends on the image modalities used and the statistical characteristics of the object to be tracked. The optimal parameters for a specific application domain can be determined using sample training images of the object/event. Estimation of the optimal parameter set for the common and unique spaces is an optimization problem described by Equations (1.5) and (1.4)-(1.8). A number of optimization schemes can be used, ranging from greedy search to conjugate gradient (if a gradient can be expressed) to nonlinear techniques such as evolutionary strategies, genetic algorithms, and simulated annealing. For the current model, involving only two parameters (the optimal band-pass cutoff frequencies), an optimal solution can be obtained quickly using brute-force search. For more complex models involving a larger number of parameters, a simulated annealing optimization solution can be used for optimal parameter selection.
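Reusing the two illustrative helpers sketched earlier (mutual_information and common_unique_split, both our own constructions), a brute-force search over the two symmetric cutoffs could be written as below; for brevity a single shared passband is searched, whereas in practice the search would be repeated per source:

```python
import numpy as np

def search_cutoffs(pairs_xy, candidates=np.linspace(0.02, 0.48, 24)):
    """Grid-search the cutoff pair (b1, b2) that maximizes the average mutual
    information between the common components of co-registered training image
    pairs (x, y) from the two sources (assumed resampled to a common grid)."""
    best_params, best_score = None, -np.inf
    for i, b1 in enumerate(candidates):
        for b2 in candidates[i + 1:]:
            score = np.mean([
                mutual_information(common_unique_split(x, b1, b2)[0],
                                   common_unique_split(y, b1, b2)[0])
                for x, y in pairs_xy])
            if score > best_score:
                best_params, best_score = (b1, b2), score
    return best_params, best_score
```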

IV. KNOWLEDGE TRANSFER FROM MULTIPLE SOURCES
We discuss a methodology for transferring local spatial-temporal knowledge for an event between different data sources to facilitate event classification and detection using weaker features (data source $Y$). After computing the transformation to derive the common feature space, a temporal classifier $V\{C_X(s_{t-1}), C_Y(s_t)\}$ is used to transfer knowledge between the measurements at $t-1$ and $t$. We employ multi-scale Lucas-Kanade tracking [12] on the transformed common feature subspace image of each source to track local features between source measurements taken at different times. For ease of notation, we denote the common feature transformed images for each source as $M_{t-1} = C_X(s_{t-1})$ and $M_t = C_Y(s_t)$.

The LK tracker provides a set of $P$ sparse features $f^1_{t-1}, f^2_{t-1}, f^3_{t-1}, \ldots, f^P_{t-1}$ on the image from source $X$ at time $t-1$ and their corresponding locations $f^1_t, f^2_t, f^3_t, \ldots, f^P_t$ on the image from source $Y$ at time $t$. We derive a statistical measure of the target event based on the tracked features. The event location center at time $t-1$ is known beforehand, due to the sequential nature of the observation stream and the event location reported by the strong-source classifier; it is denoted by a specific point $\mu^s_{t-1}$. Since the target event is a region, it is described by a Gaussian distribution with standard deviation $\sigma^s_{t-1}$ determined by the bounding box of the region extent at time $t-1$. The size of the region changes dynamically over time. We estimate the Gaussian distribution describing the event at the next time instance $t$, with mean $\mu^s_t$ and standard deviation $\sigma^s_t$, from the relation between the feature locations $f^1_{t-1}, f^2_{t-1}, \ldots, f^P_{t-1}$ and their corresponding locations $f^1_t, f^2_t, \ldots, f^P_t$ in the next image at time $t$. For a Gaussian distribution, this corresponds to a linear estimation problem that can be solved in closed form. The Gaussian distribution at time $t$, described by $N(\mu^s_t, \sigma^s_t)$, denotes the posterior distribution $P(M_t \mid M_{t-1}, L^i_{t-1})$ of the data $M_t$ at time $t$, conditional on the data measurement $M_{t-1}$ at time $t-1$ and the target event label $L^i_{t-1}$ at $t-1$. This is used for multisource event classification as discussed next.
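A possible realization of this temporal tracking step, using OpenCV's pyramidal Lucas-Kanade tracker on the two common-feature images, is sketched below; the feature-detector settings and the simple median-shift/spread-ratio update of the Gaussian are our own assumptions, not the closed-form estimator of the paper:

```python
import cv2
import numpy as np

def track_event(common_prev, common_curr, mu_prev, sigma_prev):
    """Track sparse features from the common-feature image at t-1 to the image
    at t (multi-scale Lucas-Kanade) and propagate the Gaussian event location."""
    prev8 = cv2.normalize(common_prev, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    curr8 = cv2.normalize(common_curr, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # Detect salient features in the previous common-feature image.
    pts_prev = cv2.goodFeaturesToTrack(prev8, maxCorners=200, qualityLevel=0.01,
                                       minDistance=5)
    pts_curr, status, _ = cv2.calcOpticalFlowPyrLK(prev8, curr8, pts_prev, None,
                                                   winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    good_prev = pts_prev[ok].reshape(-1, 2)
    good_curr = pts_curr[ok].reshape(-1, 2)

    # Shift the Gaussian mean by the median feature displacement and rescale its
    # spread by the ratio of point spreads between the two frames.
    shift = np.median(good_curr - good_prev, axis=0)
    scale = float(np.mean(good_curr.std(axis=0) / (good_prev.std(axis=0) + 1e-6)))
    return np.asarray(mu_prev) + shift, np.asarray(sigma_prev) * scale
```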

At the initial time $t_0$, the event is located using classifier $W$ for data source $X$ with its strong discriminating unique features $U_X(s_{t_0})$.


The common component features at the current time $t_0$ from $X$ and the next time $t_1$ from the weak source $Y$ are computed, and an estimate of the event location at $t_1$ (using measurements from source $Y$) is computed with the temporal classifier $V\{C_X(s_{t_0}), C_Y(s_{t_1})\}$ to yield a classification confidence $P(Y_{t_1}(s) \mid X_{t_0}(s), L^i)$. This transferred knowledge between sequential observations is then combined with the single-source weak classifier $W\{U_Y(s_{t_1})\}$, which uses the unique feature component, in a Bayesian MAP classification at time $t_1$ as discussed in Equation (1.2); this minimizes the incidence of false alarms when using the weak source $Y$. The single-source weak classifier applied to the unique feature spaces is application specific and could range from neural nets to Gaussian classifiers. The single-source classifier employed for cyclone tracking is discussed in Section V.

This knowledge transfer process continues as one tracks the object with sequential data sources, one with weak measurements and one with strong measurements. Weak sources cannot be reliably used by themselves for event classification due to high false alarms. Therefore, the Bayesian MAP approach (Eq. (1.2)) combines the knowledge transferred from the sequential observations (using the temporal classifier) with the single source weak or strong measurement to improve overall event classification accuracy. This knowledge transfer process can be further generalized to multiple data sources and any order of observations in the sequence.

V. RESULTS
Our application domain to demonstrate multisource learning and classification is cyclone detection and tracking using multiple streaming remote orbiting satellite measurements. We have applied our system to real-time streaming satellite data and historical stored datasets. QuikSCAT wind speed satellite data, containing strong cyclone-discriminating features, has very low temporal resolution, while TRMM satellite data measuring precipitation contains weak cyclone features and lacks discriminative power for definitive cyclone identification.

Use of TRMM improves the temporal resolution from 12 hrs to 3 hrs. However, a mechanism to transfer and share knowledge between the strong and weak sources is needed. Therefore, this is an ideal case in which learning and classification from multiple sources will improve the accuracy of a real-time functioning system, enabling autonomous science for an advance weather warning and forecasting system. We demonstrate the feasibility of our approach on three tropical cyclone sequences: Hurricane Maria from 2005, Hurricane Nargis from 2008, and Hurricane Isabel from 2003. Hurricane Nargis caused widespread damage in Burma, and Hurricane Isabel was the costliest and deadliest hurricane of the 2003 Atlantic hurricane season. We will show that classification from multiple sources dramatically improves the detection rate for cyclones from remote satellite data and results in more accurate spatial location estimates of cyclones with improved temporal resolution, while reducing false alarms. The three hurricanes represent varying degrees of difficulty for this multisource classification problem. Hurricane Maria is the least difficult sequence, with minimal occlusions; Hurricane Isabel is moderately difficult due to the presence of a few precipitation regions in the vicinity of the cyclone; and Hurricane Nargis (Figure 3) is a very difficult case, with numerous surrounding high-precipitation regions that could increase false alarm rates and reduce classification accuracy.

A. Single Source Classifier: Summary and Results
The (strong) single-source classifier $W$ used for QuikSCAT data is an ensemble classifier using several features, including vorticity, wind speed, and wind-direction histograms, with support vector machine classifiers in the ensemble; it is discussed in our prior publication [13].
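The ensemble is described in detail in [13]; purely as a hedged illustration of the idea (the feature grouping and majority vote here are our own, not the published design), one SVM per feature group can be combined as follows:

```python
import numpy as np
from sklearn.svm import SVC

class StrongEnsemble:
    """Illustrative ensemble: one SVM per feature group (e.g. vorticity,
    wind speed, wind-direction histogram), combined by majority vote."""

    def __init__(self, n_groups):
        self.models = [SVC() for _ in range(n_groups)]

    def fit(self, feature_groups, labels):
        # feature_groups: list of (n_samples, n_features_i) arrays, one per group.
        for model, X in zip(self.models, feature_groups):
            model.fit(X, labels)
        return self

    def predict(self, feature_groups):
        votes = np.stack([m.predict(X) for m, X in zip(self.models, feature_groups)])
        return (votes.mean(axis=0) >= 0.5).astype(int)   # majority vote, binary labels
```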

TABLE I. INDIVIDUAL CLASSIFICATION OF CYCLONES FROM SINGLE STRONG (QUIKSCAT) AND WEAK (TRMM) SOURCES.

Hurricane | Single-source classifier | TPR (%) | FPR (%) | Cyclone location error (km)
Maria     | QuikSCAT                 | 100     | 0       | 62
Maria     | TRMM                     | 100     | 32      | 320
Isabel    | QuikSCAT                 | 100     | 0       | 84
Isabel    | TRMM                     | 100     | 76      | 770
Nargis    | QuikSCAT                 | 100     | 0       | 90
Nargis    | TRMM                     | 100     | 240     | 1060

Figure 3 Multisource measurements for Hurricane Nargis. Approximate cyclone locations are shown with a red box. Note the multiple regions with high rainfall that could result in false alarms from the TRMM source data.


TRMM provides excellent temporal measurements of precipitation every 3 hours and can indicate the presence of cyclones from high-precipitation measurements with a good true-positive rate. However, TRMM data are susceptible to false alarms, since most high-rainfall regions are not unique to cyclones. The (weak) classifier $W$ for TRMM data uses a multi-dimensional Gaussian model of the rainfall precipitation values. The Gaussian takes into account the joint rainfall over several neighboring pixels (a 9x9 region) with a diagonal covariance matrix (to avoid parameter estimation errors and for computational speed). A segmentation step groups neighboring pixels classified as cyclone into a single region, and the centroid of each segmented region (Figure 5) is labeled as the center of a potential cyclone. The output of the weak classifier is then expressed as a multimodal Gaussian mixture with a mean at each detected event center and a variance proportional to the size of each segmented blob. TRMM classification results on Hurricane Nargis are shown in Figure 5; note the multiple false alarm regions in every image. The cyclone classification results from each single source using the unique feature transformations are shown in Table I. False alarms are reported as the percentage of false regions relative to the number of cyclones, i.e. 100 × (number of false regions)/(number of cyclones); the FPR can therefore exceed 100% when multiple false regions per image are reported.
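A minimal sketch of such a weak per-pixel Gaussian classifier with blob segmentation is given below; the constant per-pixel mean and variance, the likelihood threshold, and the scipy calls are our own illustrative simplifications of the 9x9 diagonal-covariance model described above:

```python
import numpy as np
from scipy.ndimage import label, center_of_mass, uniform_filter

def weak_rain_classifier(rain, mean_rain, var_rain, log_lik_threshold):
    """Flag pixels whose 9x9 rainfall neighborhood scores highly under a
    diagonal-covariance Gaussian cyclone model, then report blob centroids/sizes."""
    # With a diagonal covariance the joint log-likelihood of the 9x9 neighborhood
    # is a sum of per-pixel terms (constants are folded into the threshold).
    per_pixel_ll = -0.5 * (rain - mean_rain) ** 2 / var_rain
    neighborhood_ll = uniform_filter(per_pixel_ll, size=9) * 81.0

    mask = neighborhood_ll > log_lik_threshold
    blobs, n_blobs = label(mask)
    centers = center_of_mass(mask, blobs, range(1, n_blobs + 1))  # candidate cyclone centers
    sizes = np.bincount(blobs.ravel())[1:]                        # blob sizes -> mixture variances
    return centers, sizes
```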

B. Classification and Mining from Multiple Sources
In our earlier work [13, 14], we employed a predictive motion model using a Kalman filter to constrain the search for cyclones in a temporal sequence. The predictive motion model learns the motion parameters of the target event and provides a search region of interest for the next measurement in the sequence, to minimize the incidence of false alarms and provide a more accurate estimate of the cyclone location using this constrained search. However, since the time lapse between measurements is large, the motion estimates may not be correct. Furthermore, the movement of natural events is affected by a variety of external factors such as other weather fronts, ocean currents, changes in wind direction, and changes in sea surface temperature. As a result, the motion of the target event cannot be well characterized until these other external variables are taken into account.

Figure 4 (a) QuikSCAT strong source measurement and (b) TRMM weak source measurement, with (c), (d) the corresponding probability density functions for the cyclone; (e) QuikSCAT and (f) TRMM common feature transforms (high-pass filtered data), with (g), (h) the corresponding PDFs after transformation.

Figure 5 Classification using a single information source (TRMM) with high false alarms: (a) weak source (TRMM) measurement for Hurricane Nargis, (b) classified regions using the weak-source Gaussian classifier, and (c) located cyclone centers (green) from the weak-source Gaussian classifier.


This is obvious from the large classification errors using that solution, as compared to the multisource classification discussed below. The first step in this multisource classification process is estimation of the multisource common ($C_t(s)$) and unique ($U_t(s)$) orthogonal component spaces. Our solution, as detailed in Section III, involves taking the Fourier-domain representation of all training data sources for cyclone examples and computing the frequencies (pass and stop bands) that result in the maximally mutually common and independent feature subspaces (Eq. (1.5)). We used 80 manually co-registered QuikSCAT and TRMM cyclone images for training, and a brute-force optimization search to select the optimal passband yielding the maximum mutual information, as discussed in Section III. We assume a symmetric frequency response for horizontal and vertical orientations, i.e. $Bu_1 = Bv_1$, $Bu_2 = Bv_2$. The optimal solution was found to be a high-frequency bandpass for QuikSCAT ([0.07-0.43] normalized frequency) and TRMM ([0.16-0.32] normalized frequency), i.e. high frequencies were found to contain components (features) common to cyclones in both QuikSCAT and TRMM. This result is intuitively appealing, since the mathematical formulation in the multisource transform automatically determines that shape information, corresponding to the higher frequencies, is common to both data sources after high-pass filtering. Figure 4 shows the similarities in the gray-level histograms of the two sources after computing the common space transform using the band-pass filter. This shape (high-frequency) information about the events of interest, such as cyclones, is retained across the different source measurements, as seen in Figures 4g and 4h. The image data containing the mid-to-low frequency information is unique to each data type, for which separate classifiers were designed.

The common component features are therefore a sequence of high-bandpass-filtered images for QuikSCAT and TRMM, and they capture the shape evolution of the cyclone over time that is common to both data sources. A robust temporal classifier is needed to label the deformable cyclone structures in this time sequence in the common feature space. As discussed in Section IV, we employ a feature-based Lucas-Kanade correspondence technique to locate and track salient features on the high-pass cyclone images from the QuikSCAT and TRMM data streams (see the red marked points in Figure 6); this tracked information is combined with single-source data classification in a Bayesian framework for multisource classification.

The results of multisource classification for the three cyclones using our common and unique feature space solution are given in Table II. The average absolute error in locating the cyclone center, compared to ground-truth interpolated NHC estimates over a 36-hour period (comprising approximately 3 QS images and 12 TRMM images in the sequence), shows a 3X-9X improvement for our multisource classification over both single-source classification and our previously published predictive fusion solution. Results for sequential observations from QS and TRMM are shown in Figure 7. The ground-truth cyclone location as reported by the National Hurricane Center is shown as a green circle. The estimated cyclone location using KF predictive classification [13] is shown as a yellow circle. The cyclone located using our multisource classification approach is shown as a purple triangle. The error between the actual and estimated cyclone locations is noted in Figure 7. The predictive Kalman-filter [13] classification error is abbreviated as EPKF, while EMF denotes the error using the multisource classification model. The multisource technique locates the cyclone most accurately, even when several visually similar false alarm regions exist near the cyclone (Figure 7).

VI. CONCLUSIONS AND FUTURE WORK
Our solution presents an advancement in the theory of multisource classification and learning for disparate datasets in the general case where co-registered data is not available from the multiple streams.

Figure 6 Hurricane Nargis QuikSCAT and TRMM source measurements (top) and temporally tracked points (red) between QuikSCAT and TRMM in the common feature space (bottom), shown only near the cyclone center for ease of visualization. Note that after transformation to the common feature space, tracking is robust across the two sources even though the raw images are visually dissimilar (top).

TABLE II. CYCLONE CLASSIFICATION LOCATION AVERAGE ERRORS USING SINGLE SOURCE CLASSIFICATION, PREDICTIVE CLASSIFICATION [13], AND THE MULTISOURCE TECHNIQUE.

Cyclone location average error (km):

Cyclone name | QS, TRMM individual classification | Predictive classification (using KF) | Multisource classification (QS-TRMM & TRMM-TRMM sequences)
Maria        | 280                                | 189                                  | 66
Isabel       | 552                                | 286                                  | 78
Nargis       | 964                                | 344                                  | 95


An efficient practical solution such as ours, which taps the information from such disparate sources, greatly improves data understanding in various domains, including environmental and space science, as we have demonstrated in this work. Results using our technique on remote satellite data streams with different spatial and temporal resolutions show the practical viability of our method. While we have laid the groundwork for a general solution to classification from multiple sources, a number of open problems and issues remain. The common and unique feature spaces could be modeled with more complex orthogonal functions, such as wavelets, which could result in improved classification and knowledge transfer. Scale-invariant features could be used for enhanced temporal tracking and classification. Future multisource classification work will also include the use of GOES data and other satellites to improve detection and tracking accuracy.

ACKNOWLEDGMENT
This work was carried out at the Jet Propulsion Laboratory, California Institute of Technology, with funding from the NASA Applied Information Systems Research Program. The author thanks S. Ho, A. Bingham, and A. Panangadan for their help in acquiring satellite data.

REFERENCES

1. Li, H., B.S. Manjunath, and S.K. Mitra, "Multisensor Image Fusion Using the Wavelet Transform," Graphical Models and Image Processing, 1995, 57(3): pp. 235-245.
2. Solberg, A.H.S., A.K. Jain, and T. Taxt, "Multisource classification of remotely sensed data: fusion of Landsat TM and SAR images," IEEE Transactions on Geoscience and Remote Sensing, 1994, 32(4): pp. 768-778.
3. He, L.-M., F.-S. Kong, and Z.-Q. Shen, "Multiclass SVM based land cover classification with multisource data," in Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, 2005.
4. Elidan, G., G. Heitz, and D. Koller, "Learning Object Shape: From Drawings to Images," in Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2006.
5. Leen, G., "Learning Shared and Separate Features of Two Related Data Sets using GPLVMs," in NIPS '08 Workshop: Learning from Multiple Sources, Whistler, Canada, 2008.
6. Crammer, K., M.J. Kearns, and J. Wortman, "Learning from Multiple Sources," in NIPS 2006, 2006.
7. Globerson, A., et al., "Euclidean Embedding of Co-occurrence Data," Journal of Machine Learning Research, 2007, 8: pp. 2265-2295.
8. Chechik, G. and N. Tishby, "Extracting relevant structures with side information," in Advances in Neural Information Processing Systems, 2003, pp. 857-864.
9. Tishby, N., F.C. Pereira, and W. Bialek, "The information bottleneck method," in Proc. of the 37th Allerton Conference on Communication and Computation, 1999.
10. Scheffer, T., "Learning from Multiple Sources by Matching Their Distributions," in NIPS '08 Workshop: Learning from Multiple Sources, Whistler, Canada, 2008.
11. Shannon, C.E., "A mathematical theory of communication," Bell Sys. Tech. J., 1948, 27: pp. 379-423, 623-656.
12. Lucas, B. and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the International Joint Conference on Artificial Intelligence, 1981, pp. 121-130.
13. Ho, S.-S. and A. Talukder, "Automated cyclone discovery and tracking using knowledge sharing in multiple heterogeneous satellite data," in ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-2008), Las Vegas, 2008.
14. Ho, S.-S. and A. Talukder, "Cyclone Tracking Using Multiple Satellite Data Sources via Spatial-Temporal Knowledge Transfer," in AAAI-08 Workshop: Transfer Learning for Complex Tasks, Chicago, IL, 2008.

Figure 7 Cyclone classification results from multiple sources using a predictive knowledge transfer mechanism [13] (yellow circle) and our multisource classification using a common-unique feature space transformation (white square). The ground-truth cyclone location as reported by the National Hurricane Center is shown as a green circle. The measurements from the sources QS and TRMM are shown over time t from t=0 to t=18 hours, with a typical time lapse of 3 hours between sequential TRMM observations and 12 hours between QS observations. Panel annotations, top row: QS Nargis t=0; TRMM t=3 (EPKF=319 km, EMS=49 km); TRMM t=6 (EPKF=387 km, EMS=97 km); TRMM t=9 (EPKF=486 km, EMS=57 km). Bottom row: TRMM t=11 (EPKF=434 km, EMF=146 km); QS Nargis image t=12; TRMM t=15 (EPKF=225 km, EMS=164 km); TRMM t=18 (EPKF=196 km, EMS=19 km).
