10
An Application of Data Driven Anomaly Identification to Spacecraft Telemetry Data Gautam Biswas 1 , Hamed Khorasgani 2 , Gerald Stanje 3 , Abhishek Dubey 4 , Somnath Deb 5 , and Sudipto Ghoshal 6 1,2,3,4 Inst. of Software-integrated Systems, Vanderbilt Univ., USA, [email protected] [email protected] [email protected] [email protected] 5,6 Qualtech Systems, Inc. (QSI), USA, [email protected] [email protected] ABSTRACT In this paper, we propose a mixed method for analyzing telemetry data from a robotic space mission. The idea is to first apply unsupervised learning methods to the teleme- try data divided into temporal segments. The large clusters that ensue typically represent the nominal operations of the spacecraft and are not of interest from an anomaly detection viewpoint. However, the smaller clusters and outliers that re- sult from this analysis may represent specialized modes of operation, e.g., conduct of a specialized experiment on board the spacecraft, or they may represent true anomalous or unex- pected behaviors. To differentiate between specialized modes and anomalies, we employ a supervised method of consult- ing human mission experts in the approach presented in this paper. Our longer term goal is to develop more automated methods for detecting anomalies in time series data, and once anomalies are identified, use feature selection methods to build online detectors that can be used in future missions, thus contributing to making operations more effective and improv- ing overall safety of the mission. 1. I NTRODUCTION As engineered systems have become more complex, and in- clude a range of operations that vary widely, self-monitoring, self-diagnosis, and adaptability to maintain operability and safety have become focus areas for research and develop- ment. Typical goals of such self-diagnosis approaches are the detection and isolation of faults and anomalies, identify- ing and analyzing the effects of degradation and wear, and Gautam Biswas et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, pro- vided the original author and source are credited. providing fault-tolerant and fault-adaptive control (Blanke & Schr¨ oder, 2006; Chen & Patton, 2012; Isermann, 2005; Ji, Zhang, Biswas, & Sarkar, 2003; Noura, Theilliol, Ponsart, & Chamseddine, 2009). The majority of projects dealing with monitoring and diagnosis applications rely on models created using physical principles or by human experts. How- ever, these models are not always available, and are often in- complete, and sometimes erroneous. Moreover, it is hard to maintain the effectiveness of these models during a systems life-cycle. More recently, promising data-driven alternatives that exploit the large amounts of operational data collected from these systems are being employed to better understand system be- haviors and anomalies during system operations (Qin, 2012; Yin, Ding, Xie, & Luo, 2014). In data-driven approaches, monitoring and diagnosis knowledge can be learned by ob- serving and analyzing system behavior (Mack, Biswas, Kout- soukos, & Mylaraswamy, 2016, in press). This large amount of data collected using new, more robust sensors and sen- sor networks, can be exploited in a reliable manner for the purpose of detecting and analyzing anomalous situations and faults in these large and complex systems. The vision is de- veloping Cyber Physical Systems (CPSs) (Lee, 2008; Mar- wedel, 2010; Niggemann et al., 2015) that can observe their own behavior, recognize unusual situations during opera- tions, and inform operators, who use this this information to modify system operations, or plan for repair and mainte- nance. Furthermore, system’s experts and engineers can use the information gleaned from this data to update operations procedures and even redesign the system. In this paper, we take on the challenges of developing an anomaly detection scheme for analyzing telemetry data gen- 1

An Application of Data Driven Anomaly Identification to … · 2016. 10. 18. · system, (2) the Propulsion system, (3) the Attitude Control system (ACS)), and the Electrical Power

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An Application of Data Driven Anomaly Identification to … · 2016. 10. 18. · system, (2) the Propulsion system, (3) the Attitude Control system (ACS)), and the Electrical Power

An Application of Data Driven Anomaly Identification to SpacecraftTelemetry Data

Gautam Biswas1, Hamed Khorasgani 2, Gerald Stanje3, Abhishek Dubey4, Somnath Deb5, and Sudipto Ghoshal6

1,2,3,4 Inst. of Software-integrated Systems, Vanderbilt Univ., USA,[email protected]

[email protected]@vanderbilt.edu

[email protected]

5,6 Qualtech Systems, Inc. (QSI), USA,[email protected]

[email protected]

ABSTRACT

In this paper, we propose a mixed method for analyzingtelemetry data from a robotic space mission. The idea isto first apply unsupervised learning methods to the teleme-try data divided into temporal segments. The large clustersthat ensue typically represent the nominal operations of thespacecraft and are not of interest from an anomaly detectionviewpoint. However, the smaller clusters and outliers that re-sult from this analysis may represent specialized modes ofoperation, e.g., conduct of a specialized experiment on boardthe spacecraft, or they may represent true anomalous or unex-pected behaviors. To differentiate between specialized modesand anomalies, we employ a supervised method of consult-ing human mission experts in the approach presented in thispaper. Our longer term goal is to develop more automatedmethods for detecting anomalies in time series data, and onceanomalies are identified, use feature selection methods tobuild online detectors that can be used in future missions, thuscontributing to making operations more effective and improv-ing overall safety of the mission.

1. INTRODUCTION

As engineered systems have become more complex, and in-clude a range of operations that vary widely, self-monitoring,self-diagnosis, and adaptability to maintain operability andsafety have become focus areas for research and develop-ment. Typical goals of such self-diagnosis approaches arethe detection and isolation of faults and anomalies, identify-ing and analyzing the effects of degradation and wear, and

Gautam Biswas et al. This is an open-access article distributed under theterms of the Creative Commons Attribution 3.0 United States License, whichpermits unrestricted use, distribution, and reproduction in any medium, pro-vided the original author and source are credited.

providing fault-tolerant and fault-adaptive control (Blanke &Schroder, 2006; Chen & Patton, 2012; Isermann, 2005; Ji,Zhang, Biswas, & Sarkar, 2003; Noura, Theilliol, Ponsart,& Chamseddine, 2009). The majority of projects dealingwith monitoring and diagnosis applications rely on modelscreated using physical principles or by human experts. How-ever, these models are not always available, and are often in-complete, and sometimes erroneous. Moreover, it is hard tomaintain the effectiveness of these models during a systemslife-cycle.

More recently, promising data-driven alternatives that exploitthe large amounts of operational data collected from thesesystems are being employed to better understand system be-haviors and anomalies during system operations (Qin, 2012;Yin, Ding, Xie, & Luo, 2014). In data-driven approaches,monitoring and diagnosis knowledge can be learned by ob-serving and analyzing system behavior (Mack, Biswas, Kout-soukos, & Mylaraswamy, 2016, in press). This large amountof data collected using new, more robust sensors and sen-sor networks, can be exploited in a reliable manner for thepurpose of detecting and analyzing anomalous situations andfaults in these large and complex systems. The vision is de-veloping Cyber Physical Systems (CPSs) (Lee, 2008; Mar-wedel, 2010; Niggemann et al., 2015) that can observe theirown behavior, recognize unusual situations during opera-tions, and inform operators, who use this this informationto modify system operations, or plan for repair and mainte-nance. Furthermore, system’s experts and engineers can usethe information gleaned from this data to update operationsprocedures and even redesign the system.

In this paper, we take on the challenges of developing ananomaly detection scheme for analyzing telemetry data gen-

1

Page 2: An Application of Data Driven Anomaly Identification to … · 2016. 10. 18. · system, (2) the Propulsion system, (3) the Attitude Control system (ACS)), and the Electrical Power

ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2016

ITAR RESTRICTED MATERIAL

WARNING

Simplified Block Diagram

May 17-20, 2011 51

Payload - 4 Instruments

VDUModule

CPU Board(RAD 750)

PAPI #1Heaters/Arm&Fire

IO Board(AMOAB)

S-band

TX

Li Ion

Battery

SACISolar Array and Charge Interface

Temp. Sensors

MLI Blankets

Heaters

IMU

`

X

Fuel

Tanks

Oxidizer

Tanks

He Pressurant Tanks

Propulsion/RCS Subsystem

RCS

Thrusters

OCS

Thruster

UVS

NMS

S-band

RX

Reaction Wheels (4)

CSS (12)

Solar

Panel

(6)

Interface/Power

Controller

Structures &

MechanismsBus Module

Payload Module

Extension Module

PAPI #2Propulsion/Arm&Fire

Spare Slot

SATORI #2Power SC Operations

SATORI #1Power Payload Operations

Avionics Chassis

Solar

Panel

(8)

1

8

7

6

5

4

3

2

LADEE Bus Architecture Functional Block Diagram

S-Band

Antennas

(3)

Electrical Power Subsystem

Command & Data Handling

Propulsion Module

LEGEND

Lightband

Telecommunications

Thermal Control Subsystem

Power

Low speed data

Solar

Panel

(8)

Radiator

(Top Side)Diplexer

Splitter/

Coupler

High speed data

Mass Memory Board, Digital I/O

& High Speed LVDS (DMOAB)

LDEX

LLCD

Single Stage

Extension Module

Star

Tracker

Guidance , Navigation and Control

TT/R

T/R

3

Solar

Panel

(8)

31.6 in.

cP

CI 3

3M

Hz 3

2b

it B

ac

kp

lan

e

Analog

RF

SwBP

FilterCoupler

Coupler

Heaters

Temp

Sensors

LADEE CDR ITAR RESTRICTED MATERIAL WARNINGFigure 1. Functional block diagram of the LADEE spacecraft (image credit: NASA/ARC)

https://directory.eoportal.org/web/eoportal/satellite-missions/l/ladee

erated by NASA’s Lunar Atmosphere and Dust EnvironmentExplorer (LADEE) spacecraft1. LADEE was a robotic mis-sion that orbited the moon to gather detailed informationabout the structure and composition of the thin lunar atmo-sphere, and determine whether dust is lofted into the lunarsky. The LADEE spacecraft’s modular common spacecraftbus, or body, designed, developed, and operated by NASA’sAmes Research Center, innovated away from custom designsand transitioned toward multi-use designs and assembly-lineproduction, which could drastically reduce the cost of space-craft development (Hine, Spremo, Turner, & Caffrey, 2010).The LADEE system block diagram, shown in Figure 1, con-sists of five primary subsystems: (1) the Integrated Avionicssystem, (2) the Propulsion system, (3) the Attitude Controlsystem (ACS)), and the Electrical Power Subsystem (EPS).Our focus is to develop a general data-driven monitoring ap-proach for telemetry (i.e., streaming time series) data for pur-poses of health monitoring, which includes fault and anomalydetection, prognosis, and performance analysis of the moni-tored system.

Our specific focus in this paper is on developing a generalunsupervised method for data-driven anomaly detection incomplex systems. The rest of this paper is organized as fol-

1see https://www.nasa.gov/mission pages/ladee/main/index.html

lows. Section 2 defines anomaly detection problem for space-craft longterm missions. Section 3 formally describes ourapproach to anomaly detection, and lays out the descriptionof our methodology. Section 4 shows an application of ourmethodology to telemetry data from the EPS of the LADEEspacecraft. Using examples, we illustrate the intertwining ofthe mode and anomaly detection problem. Finally, Section 5presents a discussion and conclusion based on the results orour case study, and briefly discusses how we will extend thisapproach in future work.

2. PROBLEM FORMULATION

A spacecraft is a mixed discrete-continuous system (hybridsystem). For example, a reaction wheel which is a contin-uous system with an electric motor to rotate the spacecraftaround its center of mass is controlled by a discrete proces-sor. The discrete states of the spacecraft are called modes andthe switches or external events define the discrete dynamicsof the system. The behavior of the spacecraft depends onthe system mode: each mode change can be: (1) controlled,i.e., they are initiated by a set of discrete switches by the sys-tem controller or a human operator; and (2) autonomous, i.e.,they are induced by external events (e.g., when the spacecraftmoves from region where it receives sunlight to one where it

2

Page 3: An Application of Data Driven Anomaly Identification to … · 2016. 10. 18. · system, (2) the Propulsion system, (3) the Attitude Control system (ACS)), and the Electrical Power

ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2016

is dark) or they may be caused by a change internal to the sys-tem (e.g., a battery may be completely drained, or a fuel tankoverflows). In both cases, the mode change changes the sys-tem model, and, therefore, the system dynamics. In this work,we assume the spacecraft is a hybrid system that is modeledas a hybrid automaton (Henzinger, 2000) with the followingdefinition.

Definition 1 (Hybrid Automaton) A hybrid automaton His defined by a 5-tuple, H = {Q,Rn, f, ϕ, ρ}, where Qrepresents a set of discrete states; Rn represents the spaceof continuous behaviors; f : Q × Rn → Rn representsthe vector field that defines continuous behaviors in a mode;ϕ : Q× Rn → Q represents the discrete transition function,and ρ : Q×Rn → Rn represents the reset map.

Generally, before a spacecraft launch, scientists and systemdevelopers plan the entire mission. This is called space mis-sion planning. Some components of the plan may be up-loaded to controllers on the spacecraft, others are manuallycommanded or uploaded as a mission progresses by the mis-sion controllers. Considering the spacecraft as a hybrid au-tomata, we can define the expected mission trace as a se-quences of time transitions that the spacecraft is designed tofollow during the mission.

Definition 2 (Expected Mission Trace) The expected mis-sion trace, MT consists of an initial mode qstart ∈ Q, afinite set of mode transitions, Tr : Q → Q, which may becontrolled or autonomous; and a sequence of intermediatemodes Qm = {qi, qj , · · · qk}, where each intermediate modeqi ∈ Qm has a start time tsi and an end time tei. The finalmode that ends a mission is qend.

A mode trace is considered normal if and only if every modethe system enters is expected according the hybrid automatamodel, H .

Definition 3 (System normal operating mode) A mode(qi, tsi, tei) is a normal operating mode if and only if(qi, tsi, tei) ∈MT .

Because of unpredicted events and possible faults and degra-dation in the system, it is possible that the system starts be-having in an unexpected way for an interval of time during themission. Since this behavior cannot be explained or justifiedusing the mission plan or by a mode transition in the defini-tion of H , then the system is considered to be in an abnormalmode.

Definition 4 (System abnormal operating mode) A mode(qi, tsi, tse) is an abnormal operating mode if and only if(qi, tsi, tse) /∈ MT , or there is no defined transition froma mode qk ∈ Q to mode qi.

The objective here is to develop a method to detect the ab-normal behavior modes during spacecraft operation. Thishelps the system developers to study the abnormal modes

and analyzes their root causes to prevent them from occur-ring in future missions. A common approach to detect nor-mal and abnormal operating modes in hybrid systems is usingthe state estimation approaches (Hanlon & Maybeck, 2000;Blom & Bar-Shalom, 1988). Estimation approaches typi-cally use multiple-model-estimation schemes to track stateestimates over time, and therefore, require at least as manyfilters as there are modes in the system. In the LADEE space-craft, we have at least 67 switches in the power system, whichmeans we have at least 267 modes. A model-based approachto designing 267 filters is unrealistic and impractical – mostof these modes are not likely to occur in any spacecraft mis-sion. A more realistic approach would be to build detectorscorresponding to the system Mission Trace, but that wouldmiss unanticipated anomalous behaviors in the system.

3. DATA DRIVEN ANOMALY IDENTIFICATION

The objective here is to develop a method to detect abnormalbehaviors that may have occurred during the spacecraft oper-ations for a long-term mission. Since the possible discrepan-cies, faults, or errors that may occur are unknown before themission takes place, we have to develop approaches that candiscover them by analyzing mission telemetry data. If thiscan be accomplished, system designers and mission special-ists can perform detailed studies in the time intervals whenthe abnormal behaviors occur. This will help them identifythe root causes, which in turn will influence the design offuture spacecraft to avoid such anomalies. Alternately, mon-itors that can detect such anomalies in an automated fashioncan be designed that allows mission specialists to come upwith corrective actions or change the mission plan and avoidadverse incidents.

In our work, we have applied unsupervised learning tech-niques to find groupings in a large database of time seriesdata. Our approach is to divide the time series representingthe entire mission trajectory into segments, and each segmentrepresents an object of interest on the mission time line. Theselected size of each time interval (time window) is a trade-offbetween the number of objects created from the time series,and having enough data per window to detect and character-ize abnormal versus nominal behavior. Typically, most of theobjects will represent nominal operations of the spacecraftduring the mission, but a small subset of the time segments(objects) may represent anomalous or faulty behaviors. Sincethe anomalous behaviors or faults are not known beforehand,we apply a clustering algorithm to group the objects. Ourhypotheses is that the larger groups of clusters will repre-sent nominal operations, whereas outliers and smaller groupsmay represent anomalous situations. In general, researchershave developed classifier or supervised methods for charac-terizing known faults and semi-supervised and unsupervisedmethods for discovering and characterizing unknown faults

3

Page 4: An Application of Data Driven Anomaly Identification to … · 2016. 10. 18. · system, (2) the Propulsion system, (3) the Attitude Control system (ACS)), and the Electrical Power

ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2016

Mission Time Line

𝑂1 𝑂2 𝑂𝑛 𝑂𝑖

𝑣1, 𝑣2,…, 𝑣𝑗 , …, 𝑣𝑚 Variables

Time Series time

𝑣𝑗

𝑓1, 𝑓2,…, 𝑓𝑙 𝑚 ∗ 𝑙 features for each

object 𝑂𝑖

Figure 2. Data preprocessing and feature extraction.

and anomalies. In this work, we propose a mixed method foranomaly detection in a robotic space mission.

In more detail, we have developed a multi-step unsupervisedlearning method to distinguish the outliers which include spe-cialized modes of operation and abnormal behaviors from thenormal operation. Then, we employ a supervised method ofconsulting human mission experts to differentiate betweenspecialized modes and anomalies. First, we select a set ofobjects from a curated data set, where each object is repre-sented by a set of features, typically segments of a time seriessignal. The next step is to convert each feature signal intoa set of feature values that make the data amenable to tra-ditional numeric clustering methods. To do this we apply awavelet transform to each time series segment, and representthat waveform in terms of coefficients that define the wavelettransform. In addition to generating numeric feature values,this approach also serves as an approach for compressing andsmoothing a signal. Then a hierarchical clustering approachis used to generate clusters from the extracted features. Theoutputs of the clustering algorithm are preliminary groups oftime intervals for further consideration.

The input to the process is operational data, which is thetelemetry data transferred to the earth from the spacecraft dur-ing the mission. This dataset contains many variables fromdifferent subsystems of the spacecraft which are measured orcomputed with different sample rates. In general, this datasetis extremely large. To reduce computational complexity, weuse the following approaches; 1) data reduction, 2) featurereduction. The data reduction here means selection of rele-vant time series waveforms for object definition to describeanomalies. V = {v1, v2, ..., vm} represents the selected vari-ables. Figure 2 illustrates the data preprocessing procedure.

As discussed earlier, we divide the mission into segments,and each segment represents an object of interest on the mis-sion time line. k is the number of samples in each time in-terval (time window). The time windows are the objects,O = {O1, O2, ..., On}. In fact, each object Oi ∈ O is atime series which contains k samples of each select variable.

In this work, we use the discrete Haar wavelet transform toextract the scaling coefficients which correlate to the low fre-quency of the time series signals. We consider the first l co-efficients of the wavelet transform for each variable (See Fig-ure 2). The set of coefficients for object Oi, is presented withfi ∈ Rm∗l in Figure 3, where m ∗ l represents the numberof wavelet coefficients derived from the set of time series sig-nal segments defining the object. The algorithm then uses thegenerated features and a weighted Euclidean distance mea-sure to build the dissimilarity matrix between every pair ofobjects, i.e., Dnn = dist(Oi, Oj), 1 ≤ i, j ≤ n. We then runa UPGMA (Unweighted Pair Group Method with ArithmeticMean), agglomerative (bottom-up) hierarchical clustering al-gorithm to generate a dendrogram that represents the orderin which the objects group into clusters. The advantage ofthis approach is that the number of clusters does not have tobe pre-determined. We then use heuristic methods to cut thedendrogram at a level (i.e., a distance measure) that providesa distinct grouping of clusters. In other words moving thedistance level at which the dendrogram is cut by small levelswill not change the number of clusters that are generated.

The next step is the interpretation process and the identifica-tion of anomalous groups. We assume the results of the clus-tering produce one or more large clusters where most of timeintervals reside, with the implication that these large clustersrepresent mostly nominal behaviors because overall the mis-

4

Page 5: An Application of Data Driven Anomaly Identification to … · 2016. 10. 18. · system, (2) the Propulsion system, (3) the Attitude Control system (ACS)), and the Electrical Power

ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2016

Operational

Data

𝑂1

𝑂2

𝑂𝑛

.

.

.

Preprocessing

𝑓1 =

𝑜1⋮

𝑜𝑚∗𝑙

𝑓2 =

𝑜1⋮

𝑜𝑚∗𝑙

. . . 𝑓𝑛 =

𝑜1⋮

𝑜𝑚∗𝑙

Clustering . .

𝐶1

. 𝐶2

𝐶𝑝

Nominal

Outliers:

Special

Operation

Modes

Abnormal

Behaviors

Anomaly Detection Process

Figure 3. Our proposed approach for anomaly detection.

sion was a success. We label these clusters as correspondingto nominal operations of the spacecraft. Since a spacecraft ison a complex mission that involves multiple maneuvers andalso a number of scientific experiments, we then time focuson the smaller clusters, and start with the hypotheses that eachone of the cluster represent: (i) special operating modes , or(ii) anomalous or faulty behavior. We identify special op-erating modes by looking at the discrete waveforms that areavailable, such as switches that indicate a particular load rep-resenting a science experiment or a spacecraft maneuver wasturned on. This helps us assign non anomalous labels to anumber of the smaller clusters. For the rest, we seek expertinput.

A typical approach here would be to compare the features ofthe possible anomalous groups against the labeled nominalgroups. We can do this systematically by running a classi-fier algorithm and determining the features that best separatethese groups. These features then define the anomalous pat-terns that do not confirm to expected behavior. In this work,to identify anomalies we perform the following analysis onthe small clusters (outliers): 1) we study the features that suf-ficiently distinguish the outliers from the nominal data, 2) westudy the switches that went on or off during the outlier in-tervals, 3) we talk with the experts to confirm our hypothe-sis. With possible anomalous groups and a nominal base ofclusters, a feature selection algorithm is applied to identifythe relevant features that differentiate each anomalous group.A switch can explain special operating conditions due to themode transitions in the system. An expert will use the distin-guishing features and binary transitions during the outliers tocharacterize the anomalies and their level of failure. Coupledwith the nominal sets, these groups can be used to producenew models of anomaly detection.

4. CASE STUDY: LADEE EPS

The data set was a collection of time series data recordedover a 1 year long lunar mission. We extracted only voltageand current features (total 34) from this dataset. We believethis provided sufficient information for a mode and anomalyidentification. This paper explores a data-driven mode andanomaly detection approach where the algorithm has accessto the electric power subsystem data of a robotic satellite. Thedataset has the following characteristics:

1. There are 265 time series variables in the electrical sub-system dataset.

2. There are 17 time series binary variables in zero or oneformat among the variables.

3. There are 50 time series binary variables in on or off for-mat among the variables.

4. There are 7 time series variables with voltage indexamong the variables. These variables include:• Battery voltage• Solar array voltages• Load voltages

5. There are 27 time series variables with current indexamong the variables. These variables include:• Battery current• Solar array currents• Load currents

6. The mission is 223 days long and there are 574687 sam-ples for each time series variable.

7. The sampling rate is not constant and the time betweentwo samples can vary from 0.4s to 10195s in the dataset.

4.1. Data preprocessing and feature extraction

We have decided to break each time series into 1512 timewindows where each time window includes 380 samples. The

5

Page 6: An Application of Data Driven Anomaly Identification to … · 2016. 10. 18. · system, (2) the Propulsion system, (3) the Attitude Control system (ACS)), and the Electrical Power

ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2016

sampling rate of the recorded data was not constant, thereforea time window may represent from 5 minutes to 10 hours ofoperation. In average each time window represents 3 hoursand 31 minutes. The selected size of the time window was atradeoff between detection accuracy and having enough dataper window to detect/identify an abnormal behavior.

The task of the data processing is to break the recorded timeseries data into distinct features, which were used as inputsof the clustering algorithm. Wavelet transform is an impor-tant mathematical tool to analyze time series data because itcontains both time and frequency information of a signal. Awavelet is an oscillation function which increases from zero,and then returns to zero. The wavelet transform of a signalis a convolution of a wavelet function with the signal. Thosewavelet functions have different shapes and sizes and are im-plemented as band-pass filters. The output of the wavelettransform is a set of coefficients which capture the time (po-sition in time) and frequency characteristics of the signal.

In signal processing wavelets are typically used in peak de-tection, noise reduction, and data compression. The discretewavelet transform was applied to reduce the computationalcomplexity and to filter the noise of all variables by extractingthe scaling coefficients which correlate to the low frequencysubbands. We used the package of functions for computingwavelet filters, wavelet transforms and multiresolution analy-ses (Aldrich, 2010) in R software environment2 to extract thewavelet coefficients. The wavelet transform was decomposedup to the 8th coefficient for each variable. This resulted in272 features per time window.

4.2. Clustering

In the next step the euclidean distance was employed to com-pute the pairwise distance between the temporally segmentedobjects. The time series waveforms were converted to a set offeatures represented by the wavelet transforms as discussedearlier. We used the R function hclust to generate the de-drogram shown in Figure 4. We selected UPGMA as the ag-glomerative clustering in this case study. The dendrogramrepresents the order in which the objects group into clusters.We used a heuristic, which suggested cutting the dendrogramat a height, where small changes in the location of the cutwould not cause changes in the number of groups extractedfrom the dendrogram. The small clusters generated were po-tential anomalies. We will investigate them in more detailusing the binary variables (switches) as context to understandthe spacecraft’s mode of operation, and experts recommen-dations to further characterize the condition. Cutting the treeat lower heights will generate more clusters and anomalies.Therefore, the level at which the dendrogram is cut representsa trade-off between increasing the precision of the anomaly

2see http://www.R-project.org/

Figure 5. Anomalies and special modes during the mission

definitions, while also reducing the number of anomalies anexpert would have to analyze.

4.3. Anomaly Detection

Figure 5 shows the most relevant groups plotted over time(mission days) detected by the clustering algorithm. Eachgroup represents a mode or an anomaly. Table 1 shows an ab-stract description of the most important modes and anomaliesthat were discovered from the groups generated.

4.3.1. Group 1: Normal operation

As we expected, the clustering algorithm generates a largecluster where significant number of time intervals reside.This cluster represents the normal operation of the spacecraft.

4.3.2. Group 2: The reaction wheels control problem

This time interval belongs to the beginning of the missionand is distinguished from the rest of the data set because ofrelatively high current in the SATORI #2 subsystem. As it isshown in Figure 6, the reaction wheels go off during this timeinterval. Note that Figure 6 shows that the reaction wheelsgo off two times during the mission. However, the expertsin NASA confirmed that the reaction wheels only went offonce during the mission and the second zero in the figure isa result of bad data. Considering the reaction wheels and thehigh current in the SATORI #2 subsystem, we found it verylikely that this incident related to the guidance navigation andcontrol unit. The experts from NASA confirmed that this timeinterval represents an anomaly and the incident is as follows.

In the first few orbits around the earth, the spacecraft began tospin at a faster rate than was expected, and the reaction wheelswere turned off by the control software to avoid a high currentload on the battery by the guidance, navigation and control(GNC) system. When the reaction wheels were turned off, the

6

Page 7: An Application of Data Driven Anomaly Identification to … · 2016. 10. 18. · system, (2) the Propulsion system, (3) the Attitude Control system (ACS)), and the Electrical Power

ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2016

1367

443

347

392

1499

346

616

1425

4415

003 20

305

395 1

102

131

373

481

548

549

734

740

483

543

544

604

617

531

609

534

729

733

742

550

530

728

739

653

655

730

496

532

546

606

533

636

654

732

741

268

391

1366

1917 18

164

1439

4314

121 2

1483

681

682

683

990

1142

920

1255

749

216

241

333

294

340

1368

1385

278

281

460

34 132

1261

1389

279

1497

558

411

871

634

536

488

494

1415

647

652

501

495

727

552

1377

949

1234

1354

842

1294

1020

891

1041

778

1071

1172

2739 12

444

144

261

474

881

582

8 1116

492

1009

46 5625

639

034

434

526

430

255

262

260

261

263

130

129

52 5425

725

957 59

53 301

255

258 45

58 6014

6823

123

216

216

363

814

3614

1364

949

353

514

2360

860

361

061

564

260

161

314

5914

3071

171

316

142

273

743

942

443

542

042

932

141

842

890

438

427

434

123

322

8843

243

143

041

542

142

643

643

343

745

646

5 115

711

811

612

0 9111

111

395

109

9311

412

232

541

942

511

911

289

417

324

735

363

423

92 9636

212

111

011

732

332

611

594 36

473

641

441

614

2071

615

0713

4814

28 1170

1191

1333

1343

1390

1410

1417

409

587

562

928

1060

582

1069 78

710

6580

523

340

814

2692

480

479

593

864

876

810

4694

310

5114

3261

156

578

381

715

0314

7214

3714

4915

1215

1015

1114

4510

9656

356

651

775

256

769

812

9256

466

466

611

3013

9214

1167

031

740

140

740

640

040

548

950

2 500

487

490

482

484

540

538

542

1435

1100

1493

626

1111

1427

1265

1274

717

1414

1454

1121

1279

539

410

725

1193

503

719

758

1056

1042

791

934

519

762 52

491

054

555

189

258

810

6110

4058

364

360

263

774

740

441

241

340

240

312

2515

0912

0113

7291

890

159

350

992

957

011

0786

510

0499

910

2511

9273

810

3210

9212

1647

575

790

291

469

710

9514

4470

114

71 146

748

086

915

0412

2710

7910

8410

4513

5175

110

7313

5952

212

0275

351

215

0293

713

3413

3913

5310

3710

5912

3712

2678

112

7294

596

1 140

180

395

312

6712

4212

8112

5814

5612

5914

0496

211

1511

0377

113

56 1072

779

1355 12

3512

56 650

796

925

1397

1382

935

1081

665

964

811

974

561

621

829

955

1440

800

812

820

1365

640

1388

1408

948

969 9

5980

663

995

014

6946

146

447

344

746

345

344

547

098

910

4777

413

4912

2913

7013

7493

012

4564

613

69 801

1248

965

684

951

1217

1266 1

137

1236

1446

660

576

580

1070

788

1066

1484

661

672

1197

457

896

1194

1209

1344

750

523

557

1375

1075

1357

939

956

1160

1476

1126

1288

1464

1147

1077

1136

1131

1140

1314

1300

1489

1481

1177

578

1161

1187

1329 13

1911

6613

05 1148

1182

1309

1323

1152

1477

1269

1419

574

595

572

585

597

628

630

632

766

1421

1284

1293

843

1295

904

972 13

3213

6213

3713

4212

5311

5014

0670

612

7883

611

5885

368

969

412

9014

7512

6461

964

565

150

855

552

152

7 118

068

871

573

154

748

650

711

3581

995

882

565

896

013

8413

9511

3482

210

8311

7811

7979

786

4 693

662

667

765

926

976

663

724

1024

1104

1247

1238

1246

1462

1491

1505 88

688

913

3611

9912

1176

479

8 835

1174

834

1145

1155

1347

946

1396

1017

1189

873

863

936

894

1327 1

232

802

1085

1277

1496

977

1405

993

793

1016

912

1443

1453

685

818

957

1008 99

811

2014

7412

0513

1511

1213

3814

0914

1613

4011

9510

2813

1612

9613

01 1146

966

1123

1088

1271 9

4112

41 1099

1089

1098

1206

1257

944

807

808

931

1326

1457

1434

1479

1154

1094

673

770

963

1282 8

8275

679

414

0011

6414

4810

5890

991

310

1211

0991

710

02 932

839

916

686

1273

840

856

814

1068

1298

1118

1214

1188

1306

1487

789

1220

695

723

879

940

705

1113

1243

1064

1322

968

790

1358 10

0711

59 103

910

4979

994

211

2590

093

380

912

5213

7113

9412

5114

5111

6714

9484

671

082

388

3 718

1303

1031

1308

1363

556

947

1101

1156

1506

767

1124

967

1262

1110

1466

1461

987

1287

1480

982

1149

971

1276

994

1128

997

1452 8

4914

8614

4211

1413

8111

0614

3314

4198

176

013

30 1078

1200

1221

780

1215

895

1034 70

088

713

1299

114

5898

611

19 952

813

985

782

1139 86

810

5410

55 1053

1080

1076

810

1090

1038

792

1207

1196

1297 8

4510

1189

978

611

6311

8590

899

283

112

3088

892

267

783

011

33 1218

1399

1231

1350

1268

1345

1093

1376

980

1105 9

2312

5014

4710

6312

6314

2914

3814

5013

5210

7413

9313

3111

7511

9012

9113

8011

2914

2210

1387

410

1813

4112

1011

8313

4613

1714

9510

2214

8585

513

3513

1113

2147

950

551

867

166

950

655

974

649

977

2 599

537

528

905

1213

1087

1240

1108

1403

592

954

1431

1122

1254

449

674 5

41 668

560

854

678

1460

699

1270

1289

1455

600

633

515

520

635

1391

1386

1086

1091

573

584

590 71

412

2213

6013

7859

457

513

7357

158

612

0312

0811

9812

2813

8313

9812

3310

8212

4913

8714

0760

712

6014

2462

062

411

1757

757

958

155

459

1 1244

641

612

622

596

598

1239

1402 6

2963

160

561

8 569 62

3 526

553

776

1302

1307

1033

1184

761

785

775

850

1285

1325

1097

1361

1379

1067

1219 87

587

814

1812

8613

20 708 82

710

1582

685

971

211

6512

8313

2813

1311

7611

8614

8813

0412

9911

8113

1869

691

9 504

1473 33

1052

1498

1492

872

1364

644

625

627

1044

769

784

1501

1508

921

927

451

687 86

011

3811

4311

6910

0314

9044

481

688

010

1084

849

813

2488

158

912

1214

6570

248

569

087

7 116

211

5385

813

1010

0114

7886

710

2799

697

098

411

2712

8082

184

411

3283

897

510

5012

24 100

612

0491

191

510

4875

510

3611

4411

68 102

372

090

7 893

759

1057

1029

1043

903

898

1062

1171

866

862

1173

1141

833

1482

675

1223

656

1275

454

979

763

841

691

721

861

870

497

679

399

1005

857

1000

529

1035 4

5910

26 773 77

751

390

674

356

885

189

770

770

398

898

310

14 472

467

474

1030

516

525

462

744

726

709

754

455

511

446

452

847

1470

469

885

837

876

471

704

1021

458

676

657

692

450

659

468

978 68

072

2 832

995 1

019

491

890

448

476

824

514

852

510

745

884

973

1151

1463

398

440 62 165 37

628

261 28

0 318

270

271

349

276

196

277

371

177

218

290

245

285

248

296

240

217

289 2

14 295

249

275 2

84 288

224

49 267

3032

782

219

229

283

184

244

266

3513

936

732

107

172

477

357

478

21 137 8

737

532

937

438

240 34

8 246 25

0 269

359

169

272

361

355

354

394

254

243

252

67 198

70 204 22

520

673 15

385

368

187

306 29

118

822

3 152 19

518

920

128

620

919

138

421

021

5 230

386

154

146

194

8064 25

386

149

150

222

293

66 227

208

297

7620

522

079 14

281

7719

368 69

7814

721

221

338

114

371 29

236 38 4

1 5032

837

9 331

380

237

197

211

48 393

265

274

31 160 1

7631

623 32

024

218

118

2 105

101

106 35

023

631

012

551

128

273

356

358

179

303

319

353

134

396

389

360

300

343

298

337

159

304

352

25 2817

131

138

831

331

533

038

714

115

610

216

8 126 30

919

018

537

024

717

419

230

726

170

312

235

175

178 2

3928

736

634

115

720

233

229

942 38

597 31

4 238

335

383

377

339

342

336

338

207

221

151

145

155

7437 37

222

675

65 7218

622

814

420

315

847

63 8314

884 18

320

018

019

936

9 98 99 140

167

104

108

127

103

2224 29

133

136

234

135

138

397

100

166 3

7835

133

436

517

325

130

813

466

1611 14

1510 12

57 8

64 9

010

2030

4050

6070

Cluster Dendrogram

hclust (*, "average")d

Hei

ght

Figure 4. Hierarchical clustering

Figure 6. Reaction wheels (OFF=0, ON=1)

spacecraft stopped rotating, and, therefore, one side becametoo hot, and the other side too cold. To keep the temperaturebalanced several heaters went on which led to high currentin the SATORI #2 subsystem. Figure 7 shows normalizedSATORI # 2 current during group 2.

4.3.3. Group 3: The lunar orbit insertion

As it is shown in Figure 5 group 3 consists of three time in-tervals which occur in three different days. A high variationin the PAPI # 2 (propulsion, see Figure 1 ) subsystem’s cur-rent distinguishes this group from the rest of the dataset. Fig-ure 8 shows normalized PAPI # 2 current during this group.

0 100 200 300

−5

05

1015

2025

30First window

Samples

Nor

mal

ized

SAT

RO

I # 2

Cur

rent

0 100 200 300

05

1015

Second window

Samples

Figure 7. Normalized SATORI # 2 current during group 2

Moreover, the valve driver unit which controls the propulsionsubsystem, and the pressurant tanks heaters which also be-longs to the propulsion subsystem (see Figure 1) goes ON inall the three time intervals. This group represents a uniquebehavior in the dataset, however, the experts confirmed thatthis does not represent an anomaly behavior and we shouldclassify this group as a special operation mode. The expertsinformed us that this group represents the lunar orbit insertionprocess. In fact, there were three firing process in the propul-sion subsystem to get into lunar orbit and our algorithm wasable to classify them in a single group.

7

Page 8: An Application of Data Driven Anomaly Identification to … · 2016. 10. 18. · system, (2) the Propulsion system, (3) the Attitude Control system (ACS)), and the Electrical Power

ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2016

0 100 200 300

05

1015

First window

Samples

Nor

mal

ized

PA

PI #

2 C

urre

nt

0 100 200 300

02

46

810

Second window

Samples

0 100 200 3002

46

810

12

Third window

Samples

Figure 8. Normalized PAPI # 2 current during group 3

4.3.4. Group 4: The laser communication test

This mode of operation consists of two time windows whereeach of them is almost 20 minutes long. To explain this mode,we should notice that a laser communication test occurs dur-ing each time window. These tests are part of the missionplan and occur multiple times during the mission. However,because of the new moon, the solar array current is almostzero during this mode. The high current demand due to thelaser communication test in the absence of solar energy putstoo much pressure on the battery and leads to a battery voltagedrop. Figure 9 shows the normalized battery voltage duringeach time window.

0 100 200 300−3.

0−

2.6

−2.

2−

1.8

First window

Samples

Nor

mal

ized

bat

tery

vol

tage

0 100 200 300−4.

0−

3.5

−3.

0−

2.5

−2.

0

Second window

Samples

Figure 9. Normalized battery voltage during group 4

This group represents an abnormal behavior because the lasertest in the absence of solar energy leads to unintended conse-quences.

4.3.5. Group 5: The eclipse

This group is related to the eclipse. The time interval is al-most 5 hours long, because the sampling rate gets very low

Figure 10. Normalized battery voltage during the mission

at the end of the mission. The solar array current is zero dur-ing the first 2 hours. Then it starts turning on, and switchesbetween zero and none zero after that. Several heaters wenton to keep different subsystems temperatures in normal rangein the absence of sun light. This increases the load current.A simultaneous increase in the load current and decrease inthe solar array current put an unprecedented pressure on thebattery, which leads to battery voltage drop. We can classifythe eclipse as an abnormal behavior because the battery is notdesigned to operate in this situation. Figure 10 shows the nor-malized battery voltage during the mission. We can see theeclipse at the end of the mission (day 219).

4.3.6. Group 6: The safe mode

The system goes to the safe mode right after the eclipse. Dur-ing this mode, several loads were switched off to limit en-ergy consumption and provide enough current for the bat-tery recharge. This group represents a unique behavior in thedataset, however, it does not represent an anomaly behaviorand we should classify this group as a special operation mode.

5. CONCLUSIONS, AND FUTURE WORK

In this paper, we presented a data-driven anomaly detectionmethod. We defined the problem and presented a new ap-proach for anomaly detection. Our approach uses an unsuper-vised learning method to detect the outliers that may representspecial modes of operation, or they may be anomalies. To dif-ferentiate between special modes and anomalies, the methoduses a supervised approach of consulting human mission ex-perts. We applied our approach to detect anomalies during theLADEE mission. To detect anomalies, we looked for the fea-tures that were significant actors in differentiating the outliersfrom the nominal set. We also considered the binary switchesduring each time interval as the indicators of mode changes.Finally, we used domain experts to validate the anomalies.

8

Page 9: An Application of Data Driven Anomaly Identification to … · 2016. 10. 18. · system, (2) the Propulsion system, (3) the Attitude Control system (ACS)), and the Electrical Power

ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2016

Table 1. Abstract overview of detected modes and anomalies

Group Detected Mode orAnomaly

Voltage or Current Switches

1 Normal operation mode2 Reaction wheels anomaly

• SATORI # 2 current had ahigher variance

• Propulsion heater turned on• Star tracker went off

3 Lunar orbit insertion mode• PAPI # 2 current had large

variations• SATORI # 2 current had

large variations

• Pressurant tank heater wenton

• Valve driver unit went on

4 Laser communication testanomaly • Solar array current was

zero because of new moon• Battery voltage dropped be-

cause of high current de-mand of the laser commu-nication test

• Laser communicationsswitch went on

5 Eclipse mode• Solar array current was

zero during the first 2hours, and had high fluctu-ation afterward.

• Several heaters went on(e.g. Propulsion heater)

6 Safe mode• Battery current shows a

high fluctuation• Several loads (e.g. star

tracker) turned off

We found that a number of detected anomalies were quite in-teresting to the experts including a fault in reaction wheelscontrol system, and a laser test that caused a drop in the bat-tery voltage. In both cases the experts confirmed the anoma-lies and flagged them for further investigation. The proposedapproach performs well when the number of outliers is smallenough for examining them one by one. In future work, wewill investigate different algorithms to further automate theprocess.

ACKNOWLEDGMENT

This work was partially supported by funding from NASASTTR grant # NNX15CA11C. The authors gratefully ac-knowledge the support provided by Scott Poll, Mark Shirley,Peter Berg, and other members of the LADEE Mission Opsteam from NASA Ames in acquiring, analyzing, and inter-preting the LADEE Telemetry data.

REFERENCES

Aldrich, E. (2010). Wavelets: a package of functions forcomputing wavelet filters, wavelet transforms and mul-

tiresolution analyses. R package version 0.2-60..Blanke, M., & Schroder, J. (2006). Diagnosis and fault-

tolerant control (Vol. 691). Springer.Blom, H., & Bar-Shalom, Y. (1988). The interacting multiple

model algorithm for systems with markovian switch-ing coefficients. Automat. Contr., vol. 33, pp. 780-783,Aug. 1988.

Chen, J., & Patton, R. J. (2012). Robust model-based fault di-agnosis for dynamic systems (Vol. 3). Springer Science& Business Media.

Hanlon, P., & Maybeck, P. (2000). Multiple-model adap-tive estimation using a residual correlation kalman fil-ter bank. IEEE Trans. Aerosp. Electron. Syst., vol. 36,pp. 393-406, Apr. 2000.

Henzinger, T. A. (Ed.). (2000). The theory of hybrid au-tomata. Springer Berlin Heidelberg.

Hine, B., Spremo, S., Turner, M., & Caffrey, R. (2010).The lunar atmosphere and dust environment explorer(ladee) mission. In Ieee aerospace conference.

Isermann, R. (2005). Model-based fault-detection anddiagnosis–status and applications. Annual Reviews inControl, 29(1), 71–85.

Ji, M., Zhang, Z., Biswas, G., & Sarkar, N. (2003).

9

Page 10: An Application of Data Driven Anomaly Identification to … · 2016. 10. 18. · system, (2) the Propulsion system, (3) the Attitude Control system (ACS)), and the Electrical Power

ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2016

Hybrid fault adaptive control of a wheeled mobilerobot. Mechatronics, IEEE/ASME Transactions on,8(2), 226–233.

Lee, E. A. (2008). Cyber physical systems: Design chal-lenges. In Object oriented real-time distributed com-puting (isorc), 2008 11th ieee international symposiumon (pp. 363–369).

Mack, D. L., Biswas, G., Koutsoukos, X. D., & My-laraswamy, D. (2016, in press). Learning bayesiannetwork structures to augment aircraft diagnostic ref-erence models. IEEE Transactions on Automation Sci-ence and Engineering.

Marwedel, P. (2010). Embedded system design: Em-bedded systems foundations of cyber-physical systems.Springer Science & Business Media.

Niggemann, O., Biswas, G., Kinnebrew, J. S., Khorasgani,

H., Volgmann, S., & Bunte, A. (2015). Data-drivenmonitoring of cyber-physical systems leveraging on bigdata and the internet-of-things for diagnosis and con-trol. 26th International Workshop on Principles of Di-agnosis, Paris, France.

Noura, H., Theilliol, D., Ponsart, J.-C., & Chamseddine, A.(2009). Fault-tolerant control systems: Design andpractical applications. Springer Science & BusinessMedia.

Qin, S. J. (2012). Survey on data-driven industrial processmonitoring and diagnosis. Annual Reviews in Control,36(2), 220–234.

Yin, S., Ding, S. X., Xie, X., & Luo, H. (2014). A reviewon basic data-driven approaches for industrial processmonitoring. Industrial Electronics, IEEE Transactions

on, 61(11), 6418–6428.

10