An Application of Data Driven Anomaly Identification to Spacecraft Telemetry Data
Gautam Biswas1, Hamed Khorasgani2, Gerald Stanje3, Abhishek Dubey4, Somnath Deb5, and Sudipto Ghoshal6
1,2,3,4 Inst. of Software-integrated Systems, Vanderbilt Univ., USA
gautam.biswas@vanderbilt.edu
hamed.g.khorasgani@vanderbilt.edu
gerald.stanje@vanderbilt.edu
abhishek.dubey@Vanderbilt.edu
5,6 Qualtech Systems, Inc. (QSI), USA
deb@teamqsi.com
sudipto@teamqsi.com
ABSTRACT
In this paper, we propose a mixed method for analyzing telemetry data from a robotic space mission. The idea is to first apply unsupervised learning methods to the telemetry data divided into temporal segments. The large clusters that ensue typically represent the nominal operations of the spacecraft and are not of interest from an anomaly detection viewpoint. However, the smaller clusters and outliers that result from this analysis may represent specialized modes of operation, e.g., conduct of a specialized experiment on board the spacecraft, or they may represent true anomalous or unexpected behaviors. To differentiate between specialized modes and anomalies, we employ a supervised method of consulting human mission experts in the approach presented in this paper. Our longer term goal is to develop more automated methods for detecting anomalies in time series data, and once anomalies are identified, use feature selection methods to build online detectors that can be used in future missions, thus contributing to making operations more effective and improving overall safety of the mission.
1. INTRODUCTION
As engineered systems have become more complex and include a range of operations that vary widely, self-monitoring, self-diagnosis, and adaptability to maintain operability and safety have become focus areas for research and development. Typical goals of such self-diagnosis approaches are the detection and isolation of faults and anomalies, identifying and analyzing the effects of degradation and wear, and providing fault-tolerant and fault-adaptive control (Blanke & Schroder, 2006; Chen & Patton, 2012; Isermann, 2005; Ji, Zhang, Biswas, & Sarkar, 2003; Noura, Theilliol, Ponsart, & Chamseddine, 2009). The majority of projects dealing with monitoring and diagnosis applications rely on models created using physical principles or by human experts. However, these models are not always available, are often incomplete, and are sometimes erroneous. Moreover, it is hard to maintain the effectiveness of these models during a system's life-cycle.
Gautam Biswas et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
More recently, promising data-driven alternatives that exploit the large amounts of operational data collected from these systems are being employed to better understand system behaviors and anomalies during system operations (Qin, 2012; Yin, Ding, Xie, & Luo, 2014). In data-driven approaches, monitoring and diagnosis knowledge can be learned by observing and analyzing system behavior (Mack, Biswas, Koutsoukos, & Mylaraswamy, 2016, in press). The large amount of data collected using new, more robust sensors and sensor networks can be exploited in a reliable manner for the purpose of detecting and analyzing anomalous situations and faults in these large and complex systems. The vision is to develop Cyber Physical Systems (CPSs) (Lee, 2008; Marwedel, 2010; Niggemann et al., 2015) that can observe their own behavior, recognize unusual situations during operations, and inform operators, who use this information to modify system operations or plan for repair and maintenance. Furthermore, system experts and engineers can use the information gleaned from this data to update operations procedures and even redesign the system.
In this paper, we take on the challenges of developing an anomaly detection scheme for analyzing telemetry data generated by NASA's Lunar Atmosphere and Dust Environment Explorer (LADEE) spacecraft (see footnote 1). LADEE was a robotic mission that orbited the moon to gather detailed information about the structure and composition of the thin lunar atmosphere, and determine whether dust is lofted into the lunar sky. The LADEE spacecraft's modular common spacecraft bus, or body, designed, developed, and operated by NASA's Ames Research Center, innovated away from custom designs and transitioned toward multi-use designs and assembly-line production, which could drastically reduce the cost of spacecraft development (Hine, Spremo, Turner, & Caffrey, 2010). The LADEE system, whose block diagram is shown in Figure 1, consists of five primary subsystems, which include (1) the Integrated Avionics system, (2) the Propulsion system, (3) the Attitude Control system (ACS), and (4) the Electrical Power Subsystem (EPS). Our focus is to develop a general data-driven monitoring approach for telemetry (i.e., streaming time series) data for purposes of health monitoring, which includes fault and anomaly detection, prognosis, and performance analysis of the monitored system.
ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2016
Figure 1. Functional block diagram of the LADEE spacecraft (image credit: NASA/ARC)
https://directory.eoportal.org/web/eoportal/satellite-missions/l/ladee
Our specific focus in this paper is on developing a general unsupervised method for data-driven anomaly detection in complex systems. The rest of this paper is organized as follows. Section 2 defines the anomaly detection problem for long-term spacecraft missions. Section 3 formally describes our approach to anomaly detection and lays out our methodology. Section 4 shows an application of our methodology to telemetry data from the EPS of the LADEE spacecraft. Using examples, we illustrate the intertwining of the mode and anomaly detection problems. Finally, Section 5 presents a discussion and conclusion based on the results of our case study, and briefly discusses how we will extend this approach in future work.
1 See https://www.nasa.gov/mission_pages/ladee/main/index.html
2. PROBLEM FORMULATION
A spacecraft is a mixed discrete-continuous system (hybrid system). For example, a reaction wheel, which is a continuous system with an electric motor that rotates the spacecraft around its center of mass, is controlled by a discrete processor. The discrete states of the spacecraft are called modes, and the switches or external events define the discrete dynamics of the system. The behavior of the spacecraft depends on the system mode. A mode change can be (1) controlled, i.e., initiated through a set of discrete switches by the system controller or a human operator; or (2) autonomous, i.e., induced by external events (e.g., when the spacecraft moves from a region where it receives sunlight to one where it is dark) or caused by a change internal to the system (e.g., a battery may be completely drained, or a fuel tank overflows). In both cases, the mode change changes the system model and, therefore, the system dynamics. In this work, we assume the spacecraft is a hybrid system that is modeled as a hybrid automaton (Henzinger, 2000) with the following definition.
Definition 1 (Hybrid Automaton) A hybrid automaton $H$ is defined by a 5-tuple, $H = \{Q, \mathbb{R}^n, f, \phi, \rho\}$, where $Q$ represents a set of discrete states; $\mathbb{R}^n$ represents the space of continuous behaviors; $f : Q \times \mathbb{R}^n \to \mathbb{R}^n$ represents the vector field that defines continuous behaviors in a mode; $\phi : Q \times \mathbb{R}^n \to Q$ represents the discrete transition function; and $\rho : Q \times \mathbb{R}^n \to \mathbb{R}^n$ represents the reset map.
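As a minimal sketch of Definition 1, the 5-tuple can be written as a small data structure with one simulation step. The class name `HybridAutomaton`, the helper `step`, and the two-mode battery example (its modes, rates, and thresholds) are hypothetical illustrations and are not taken from the LADEE models.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

State = Tuple[float, ...]  # a point in R^n


@dataclass(frozen=True)
class HybridAutomaton:
    modes: FrozenSet[str]                      # Q
    dim: int                                   # n, the dimension of R^n
    flow: Callable[[str, State], State]        # f: Q x R^n -> R^n
    transition: Callable[[str, State], str]    # phi: Q x R^n -> Q
    reset: Callable[[str, State], State]       # rho: Q x R^n -> R^n


def step(h: HybridAutomaton, q: str, x: State, dt: float):
    """Advance by one forward-Euler step, then apply the discrete update."""
    dx = h.flow(q, x)
    x_next = tuple(xi + dt * dxi for xi, dxi in zip(x, dx))
    q_next = h.transition(q, x_next)
    if q_next != q:
        # The reset map is applied only when the mode changes.
        x_next = h.reset(q, x_next)
    return q_next, x_next


def toy_battery() -> HybridAutomaton:
    """Hypothetical example: x[0] is a battery state of charge in [0, 1]."""
    def flow(q, x):
        return (0.25,) if q == "charging" else (-0.25,)

    def transition(q, x):
        if q == "charging" and x[0] >= 1.0:
            return "discharging"   # autonomous switch: battery is full
        if q == "discharging" and x[0] <= 0.25:
            return "charging"      # autonomous switch: battery is low
        return q

    def reset(q, x):
        return (min(max(x[0], 0.0), 1.0),)  # clamp the charge on a mode switch

    return HybridAutomaton(frozenset({"charging", "discharging"}), 1,
                           flow, transition, reset)
```

Calling `step(toy_battery(), "charging", (0.75,), 1.0)` moves the toy system across the charging threshold and switches its mode, which is the kind of autonomous transition described above.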
Generally, before a spacecraft launch, scientists and system developers plan the entire mission. This is called space mission planning. Some components of the plan may be uploaded to controllers on the spacecraft; others are manually commanded or uploaded by the mission controllers as the mission progresses. Considering the spacecraft as a hybrid automaton, we can define the expected mission trace as the sequence of timed mode transitions that the spacecraft is designed to follow during the mission.
Definition 2 (Expected Mission Trace) The expected mission trace, $MT$, consists of an initial mode $q_{start} \in Q$; a finite set of mode transitions, $Tr : Q \to Q$, which may be controlled or autonomous; and a sequence of intermediate modes $Q_m = \{q_i, q_j, \ldots, q_k\}$, where each intermediate mode $q_i \in Q_m$ has a start time $t_{si}$ and an end time $t_{ei}$. The final mode that ends a mission is $q_{end}$.
A mode trace is considered normal if and only if every mode the system enters is expected according to the hybrid automaton model, H.
Definition 3 (System normal operating mode) A mode $(q_i, t_{si}, t_{ei})$ is a normal operating mode if and only if $(q_i, t_{si}, t_{ei}) \in MT$.
Because of unpredicted events and possible faults and degradation in the system, the system may start behaving in an unexpected way for an interval of time during the mission. When this behavior cannot be explained or justified using the mission plan or by a mode transition in the definition of H, the system is considered to be in an abnormal mode.
Definition 4 (System abnormal operating mode) A mode $(q_i, t_{si}, t_{ei})$ is an abnormal operating mode if and only if $(q_i, t_{si}, t_{ei}) \notin MT$, or there is no defined transition from a mode $q_k \in Q$ to mode $q_i$.
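Definitions 3 and 4 amount to a membership test against the expected mission trace. The sketch below illustrates this under two simplifying assumptions that are not in the definitions: timed modes must match the trace exactly, and the initial mode is exempt from the incoming-transition check. The names `ModeInterval`, `is_normal`, and `is_abnormal`, as well as the mode names in the usage note, are hypothetical.

```python
from typing import NamedTuple, Set, Tuple


class ModeInterval(NamedTuple):
    mode: str       # q_i
    t_start: float  # t_si
    t_end: float    # t_ei


def is_normal(observed: ModeInterval, mission_trace: Set[ModeInterval]) -> bool:
    """Definition 3: the timed mode appears in the expected mission trace MT."""
    return observed in mission_trace


def is_abnormal(observed: ModeInterval,
                mission_trace: Set[ModeInterval],
                transitions: Set[Tuple[str, str]],
                start_mode: str) -> bool:
    """Definition 4: not in MT, or no defined transition leads into the mode."""
    # Assumption: the initial mode needs no incoming transition.
    reachable = observed.mode == start_mode or any(
        dst == observed.mode for _, dst in transitions
    )
    return (observed not in mission_trace) or not reachable
```

For example, with a trace containing `ModeInterval("cruise", 0, 10)` and `ModeInterval("science", 10, 20)` and the single transition `("cruise", "science")`, an observed `ModeInterval("safe", 12, 15)` is flagged as abnormal. In practice the exact-match requirement on times would likely need a tolerance.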
The objective here is to develop a method to detect the abnormal behavior modes during spacecraft operation. This helps the system developers study the abnormal modes and analyze their root causes to prevent them from occurring in future missions. A common approach to detecting normal and abnormal operating modes in hybrid systems is to use state estimation approaches (Hanlon & Maybeck, 2000; Blom & Bar-Shalom, 1988). Estimation approaches typically use multiple-model-estimation schemes to track state estimates over time and, therefore, require at least as many filters as there are modes in the system. In the LADEE spacecraft, we have at least 67 switches in the power system, which means we have at least $2^{67}$ modes. A model-based approach to designing $2^{67}$ filters is unrealistic and impractical, and most of these modes are not likely to occur in any spacecraft mission. A more realistic approach would be to build detectors corresponding to the system Mission Trace, but that would miss unanticipated anomalous behaviors in the system.
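To make the size of this mode space concrete, the count can be computed directly. The sketch assumes each of the 67 switches is binary and can be set independently of the others; `mode_count` is a hypothetical helper name.

```python
def mode_count(n_binary_switches: int) -> int:
    """Number of on/off combinations of independent binary switches."""
    return 2 ** n_binary_switches


n_modes = mode_count(67)
print(n_modes)  # 147573952589676412928, i.e. about 1.5e20
```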
3. DATA DRIVEN ANOMALY IDENTIFICATION
The objective here is to develop a method to detect abnormal behaviors that may have occurred during the spacecraft operations for a long-term mission. Since the possible discrepancies, faults, or errors that may occur are unknown before the mission takes place, we have to develop approaches that can discover them by analyzing mission telemetry data. If this can be accomplished, system designers and mission specialists can perform detailed studies of the time intervals in which the abnormal behaviors occur. This will help them identify the root causes, which in turn will influence the design of future spacecraft to avoid such anomalies. Alternatively, monitors that detect such anomalies in an automated fashion can be designed, allowing mission specialists to come up with corrective actions or change the mission plan and avoid adverse incidents.
In our work, we have applied unsupervised learning techniques to find groupings in a large database of time series data. Our approach is to divide the time series representing the entire mission trajectory into segments, and each segment represents an object of interest on the mission time line. The selected size of each time interval (time window) is a trade-off between the number of objects created from the time series and having enough data per window to detect and characterize abnormal versus nominal behavior. Typically, most of the objects will represent nominal operations of the spacecraft during the mission, but a small subset of the time segments (objects) may represent anomalous or faulty behaviors. Since the anomalous behaviors or faults are not known beforehand, we apply a clustering algorithm to group the objects. Our hypothesis is that the larger clusters will represent nominal operations, whereas outliers and smaller groups may represent anomalous situations. In general, researchers have developed classifiers or supervised methods for characterizing known faults, and semi-supervised and unsupervised methods for discovering and characterizing unknown faults and anomalies. In this work, we propose a mixed method for anomaly detection in a robotic space mission.
Figure 2. Data preprocessing and feature extraction. (The mission time line is divided into objects $O_1, \ldots, O_n$; each object holds the time series of the variables $v_1, \ldots, v_m$, and features $f_1, \ldots, f_l$ are extracted per variable, giving $m \cdot l$ features for each object $O_i$.)
In more detail, we have developed a multi-step unsupervised learning method to distinguish the outliers, which include specialized modes of operation and abnormal behaviors, from normal operation. Then, we employ a supervised method of consulting human mission experts to differentiate between specialized modes and anomalies. First, we select a set of objects from a curated data set, where each object is represented by a set of features, typically segments of a time series signal. The next step is to convert each feature signal into a set of feature values that make the data amenable to traditional numeric clustering methods. To do this, we apply a wavelet transform to each time series segment and represent that waveform in terms of the coefficients that define the wavelet transform. In addition to generating numeric feature values, this approach also compresses and smooths the signal. Then a hierarchical clustering approach is used to generate clusters from the extracted features. The outputs of the clustering algorithm are preliminary groups of time intervals for further consideration.
The input to the process is operational data, which is the telemetry data transferred to the earth from the spacecraft during the mission. This dataset contains many variables from different subsystems of the spacecraft, which are measured or computed with different sample rates. In general, this dataset is extremely large. To reduce computational complexity, we use two approaches: (1) data reduction and (2) feature reduction. Data reduction here means selecting the time series waveforms that are relevant for object definition and for describing anomalies. $V = \{v_1, v_2, \ldots, v_m\}$ represents the selected variables. Figure 2 illustrates the data preprocessing procedure.
As discussed earlier, we divide the mission into segments, and each segment represents an object of interest on the mission time line. $k$ is the number of samples in each time interval (time window). The time windows are the objects, $O = \{O_1, O_2, \ldots, O_n\}$. Each object $O_i \in O$ is a time series that contains $k$ samples of each selected variable.
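The segmentation step can be sketched as follows. The function name `split_into_objects` is hypothetical, and dropping a trailing partial window is an assumption; the text does not say how leftover samples are handled.

```python
def split_into_objects(series_by_variable, k):
    """Split every variable's series into consecutive windows of k samples.

    Returns a list of objects O_1..O_n; each object maps a variable name
    to its k samples in that window. A trailing partial window is dropped.
    """
    lengths = {len(series) for series in series_by_variable.values()}
    if len(lengths) != 1:
        raise ValueError("all variables must have the same number of samples")
    n_objects = lengths.pop() // k
    return [
        {name: series[i * k:(i + 1) * k]
         for name, series in series_by_variable.items()}
        for i in range(n_objects)
    ]
```

With two variables of 10 samples each and `k = 4`, this yields two objects and discards the last two samples of each variable.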
In this work, we use the discrete Haar wavelet transform to extract the scaling coefficients, which correlate to the low frequency content of the time series signals. We consider the first $l$ coefficients of the wavelet transform for each variable (see Figure 2). The set of coefficients for object $O_i$ is denoted by $f_i \in \mathbb{R}^{m \cdot l}$ in Figure 3, where $m \cdot l$ represents the number of wavelet coefficients derived from the set of time series signal segments defining the object. The algorithm then uses the generated features and a weighted Euclidean distance measure to build the dissimilarity matrix between every pair of objects, i.e., $D_{n \times n} = [\mathrm{dist}(O_i, O_j)]$, $1 \le i, j \le n$. We then run UPGMA (Unweighted Pair Group Method with Arithmetic Mean), an agglomerative (bottom-up) hierarchical clustering algorithm, to generate a dendrogram that represents the order in which the objects group into clusters. The advantage of this approach is that the number of clusters does not have to be pre-determined. We then use heuristic methods to cut the dendrogram at a level (i.e., a distance measure) that provides a distinct grouping of clusters. In other words, moving the cut level by a small amount does not change the number of clusters that are generated.
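The feature extraction and dissimilarity steps can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used in the study: it uses the orthonormal Haar scaling filter, requires an even length at every level, and takes the feature weights as an input because the text does not give their values. All function names are hypothetical.

```python
import math


def haar_scaling(signal, levels):
    """Haar scaling (low-pass) coefficients after the given number of levels."""
    coeffs = list(signal)
    for _ in range(levels):
        if len(coeffs) % 2:
            raise ValueError("length must be even at every level")
        coeffs = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
                  for i in range(0, len(coeffs), 2)]
    return coeffs


def object_features(obj, levels, l):
    """Concatenate the first l scaling coefficients of each variable (m * l values)."""
    features = []
    for name in sorted(obj):  # fixed variable order so vectors are comparable
        features.extend(haar_scaling(obj[name], levels)[:l])
    return features


def weighted_euclidean(f_a, f_b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (a - b) ** 2
                         for a, b, w in zip(f_a, f_b, weights)))


def dissimilarity_matrix(features, weights):
    """D[i][j] = dist(O_i, O_j) for every pair of objects."""
    n = len(features)
    return [[weighted_euclidean(features[i], features[j], weights)
             for j in range(n)] for i in range(n)]
```

Each Haar level halves the number of coefficients, so the number of levels controls how strongly a window is compressed and smoothed before clustering.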
The next step is the interpretation process and the identification of anomalous groups. We assume the clustering produces one or more large clusters where most of the time intervals reside, with the implication that these large clusters represent mostly nominal behaviors because overall the mission was a success. We label these clusters as corresponding to nominal operations of the spacecraft. Since a spacecraft is on a complex mission that involves multiple maneuvers and a number of scientific experiments, we then focus on the smaller clusters and start with the hypothesis that each of these clusters represents either (i) a special operating mode or (ii) anomalous or faulty behavior. We identify special operating modes by looking at the discrete waveforms that are available, such as switches that indicate that a particular load representing a science experiment or a spacecraft maneuver was turned on. This helps us assign non-anomalous labels to a number of the smaller clusters. For the rest, we seek expert input.
Figure 3. Our proposed approach for anomaly detection. (Operational data is preprocessed into objects $O_1, \ldots, O_n$ with feature vectors $f_1, \ldots, f_n$ of $m \cdot l$ entries each; clustering yields clusters $C_1, \ldots, C_p$, which the anomaly detection process labels as nominal or as outliers, i.e., special operation modes or abnormal behaviors.)
A typical approach here would be to compare the features of the possible anomalous groups against the labeled nominal groups. We can do this systematically by running a classifier algorithm and determining the features that best separate these groups. These features then define the anomalous patterns that do not conform to expected behavior. In this work, to identify anomalies we perform the following analysis on the small clusters (outliers): (1) we study the features that sufficiently distinguish the outliers from the nominal data, (2) we study the switches that went on or off during the outlier intervals, and (3) we talk with the experts to confirm our hypothesis. With possible anomalous groups and a nominal base of clusters, a feature selection algorithm is applied to identify the relevant features that differentiate each anomalous group. A switch can explain special operating conditions due to the mode transitions in the system. An expert uses the distinguishing features and binary transitions during the outliers to characterize the anomalies and their level of failure. Coupled with the nominal sets, these groups can be used to produce new models of anomaly detection.
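The text does not name the feature selection algorithm, so the sketch below substitutes a simple stand-in: it ranks features by the absolute difference between the outlier-group mean and the nominal mean, scaled by the nominal standard deviation. The function name `rank_features` and the scoring rule are assumptions made for illustration.

```python
import statistics


def rank_features(nominal, outlier):
    """Rank feature indices by how strongly they separate outlier from nominal.

    nominal and outlier are lists of feature vectors (one per time window).
    The score is |mean(outlier) - mean(nominal)| / std(nominal).
    """
    n_features = len(nominal[0])
    scores = []
    for j in range(n_features):
        nom = [row[j] for row in nominal]
        out = [row[j] for row in outlier]
        spread = statistics.pstdev(nom) or 1.0  # guard against constant features
        gap = abs(statistics.mean(out) - statistics.mean(nom))
        scores.append((gap / spread, j))
    return [j for _, j in sorted(scores, reverse=True)]
```

The top-ranked indices map back to a variable and a wavelet coefficient, which is the information an expert would inspect alongside the switch transitions.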
4. CASE STUDY: LADEE EPS
The data set is a collection of time series data recorded over a 1 year long lunar mission. This paper explores a data-driven mode and anomaly detection approach in which the algorithm has access to the electric power subsystem data of a robotic satellite. We extracted only the voltage and current features (34 in total) from this dataset; we believe they provide sufficient information for mode and anomaly identification. The dataset has the following characteristics:
1. There are 265 time series variables in the electrical sub-system dataset.
2. There are 17 time series binary variables in zero or oneformat among the variables.
3. There are 50 time series binary variables in on or off for-mat among the variables.
4. There are 7 time series variables with voltage index among the variables. These variables include:
   • Battery voltage
   • Solar array voltages
   • Load voltages
5. There are 27 time series variables with current index among the variables. These variables include:
   • Battery current
   • Solar array currents
   • Load currents
6. The mission is 223 days long and there are 574687 sam-ples for each time series variable.
7. The sampling rate is not constant and the time betweentwo samples can vary from 0.4s to 10195s in the dataset.
4.1. Data preprocessing and feature extraction
We break each time series into 1512 time windows, where each time window includes 380 samples. The sampling rate of the recorded data was not constant; therefore, a time window may represent from 5 minutes to 10 hours of operation. On average, each time window represents 3 hours and 31 minutes. The selected size of the time window was a tradeoff between detection accuracy and having enough data per window to detect or identify an abnormal behavior.
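These numbers can be checked with a few lines of arithmetic. The sketch assumes the mission lasted exactly 223 days and that leftover samples are discarded; `window_stats` is a hypothetical helper name.

```python
def window_stats(n_samples, window, mission_days):
    """Number of full windows, leftover samples, and mean window length in minutes."""
    n_windows, leftover = divmod(n_samples, window)
    mean_minutes = mission_days * 24 * 60 / n_windows
    return n_windows, leftover, mean_minutes


n_windows, leftover, mean_minutes = window_stats(574_687, 380, 223)
print(n_windows, leftover, round(mean_minutes))  # 1512 127 212
```

A mean of about 212 minutes (roughly 3 hours and 32 minutes) is close to the reported 3 hours and 31 minutes; the small difference presumably comes from the mission duration not being exactly 223 days.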
The task of the data processing is to break the recorded time series data into distinct features, which are used as inputs to the clustering algorithm. The wavelet transform is an important mathematical tool for analyzing time series data because it captures both time and frequency information of a signal. A wavelet is an oscillating function whose amplitude starts at zero, increases, and then returns to zero. The wavelet transform of a signal is a convolution of a wavelet function with the signal. The wavelet functions have different shapes and sizes and are implemented as band-pass filters. The output of the wavelet transform is a set of coefficients that capture the time (position in time) and frequency characteristics of the signal.
In signal processing, wavelets are typically used in peak detection, noise reduction, and data compression. The discrete wavelet transform was applied to reduce the computational complexity and to filter the noise of all variables by extracting the scaling coefficients, which correlate to the low frequency subbands. We used the package of functions for computing wavelet filters, wavelet transforms and multiresolution analyses (Aldrich, 2010) in the R software environment (see footnote 2) to extract the wavelet coefficients. For each variable, we retained the wavelet coefficients up to the 8th coefficient. This resulted in 272 features per time window (8 coefficients for each of the 34 voltage and current variables).
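For reference, the Haar scaling coefficients and the resulting feature count can be written out as follows. The recursion is the standard orthonormal Haar form for a window $x_1, \ldots, x_k$; the normalization used by the R package may differ, so this is a sketch rather than a description of the package. The symbols $m$ and $l$ are those of Section 3.

```latex
s^{(1)}_i = \frac{x_{2i-1} + x_{2i}}{\sqrt{2}}, \qquad
s^{(j+1)}_i = \frac{s^{(j)}_{2i-1} + s^{(j)}_{2i}}{\sqrt{2}}, \qquad i = 1, 2, \ldots
\\[4pt]
\dim(f_i) = m \cdot l = 34 \times 8 = 272
```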
4.2. Clustering
In the next step, the Euclidean distance was employed to compute the pairwise distance between the temporally segmented objects. The time series waveforms were converted to a set of features represented by the wavelet transforms as discussed earlier. We used the R function hclust to generate the dendrogram shown in Figure 4. We selected UPGMA as the agglomerative clustering method in this case study. The dendrogram represents the order in which the objects group into clusters. We used a heuristic that suggested cutting the dendrogram at a height where small changes in the location of the cut would not cause changes in the number of groups extracted from the dendrogram. The small clusters generated were potential anomalies. We investigate them in more detail using the binary variables (switches) as context to understand the spacecraft's mode of operation, and expert recommendations to further characterize the condition. Cutting the tree at lower heights will generate more clusters and anomalies. Therefore, the level at which the dendrogram is cut represents a trade-off between increasing the precision of the anomaly definitions and reducing the number of anomalies an expert would have to analyze.
2 See http://www.R-project.org/
Figure 5. Anomalies and special modes during the mission
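The clustering and cut selection can be illustrated with a small pure-Python stand-in for the R function hclust with average linkage. The cut rule shown here, which places the cut in the middle of the largest gap between successive merge heights, is only one possible reading of the stability heuristic described above; the function names are hypothetical.

```python
def upgma(dist):
    """Average-linkage agglomerative clustering on a full distance matrix.

    Returns the merges in order as (left_members, right_members, height).
    """
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                # UPGMA distance: mean of all pairwise member distances.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


def stable_cut(merges):
    """Cut height in the middle of the largest gap between merge heights."""
    heights = [h for _, _, h in merges]
    if len(heights) < 2:
        raise ValueError("need at least three objects to choose a cut")
    _, i = max((heights[i + 1] - heights[i], i) for i in range(len(heights) - 1))
    return (heights[i] + heights[i + 1]) / 2


def clusters_at(n, merges, cut):
    """Clusters obtained by applying only the merges at or below the cut."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for left, right, height in merges:
        if height <= cut:
            parent[find(right[0])] = find(left[0])
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)
```

For five objects whose one-dimensional features are 0, 1, 2, 3, and 100, this produces one large cluster of four objects and a singleton outlier, mirroring the large-nominal-cluster and small-outlier pattern the method relies on.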
4.3. Anomaly Detection
Figure 5 shows the most relevant groups detected by the clustering algorithm, plotted over time (mission days). Each group represents a mode or an anomaly. Table 1 shows an abstract description of the most important modes and anomalies that were discovered from the groups generated.
4.3.1. Group 1: Normal operation
As we expected, the clustering algorithm generates a large cluster where a significant number of the time intervals reside. This cluster represents the normal operation of the spacecraft.
4.3.2. Group 2: The reaction wheels control problem
This time interval belongs to the beginning of the mission and is distinguished from the rest of the data set by a relatively high current in the SATORI #2 subsystem. As shown in Figure 6, the reaction wheels go off during this time interval. Note that Figure 6 shows the reaction wheels going off twice during the mission. However, the experts at NASA confirmed that the reaction wheels only went off once during the mission and that the second zero in the figure is a result of bad data. Considering the reaction wheels and the high current in the SATORI #2 subsystem, we found it very likely that this incident was related to the guidance, navigation and control unit. The experts from NASA confirmed that this time interval represents an anomaly and described the incident as follows.
In the first few orbits around the earth, the spacecraft began tospin at a faster rate than was expected, and the reaction wheelswere turned off by the control software to avoid a high currentload on the battery by the guidance, navigation and control(GNC) system. When the reaction wheels were turned off, the
6
ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2016
1367
443
347
392
1499
346
616
1425
4415
003 20
305
395 1
102
131
373
481
548
549
734
740
483
543
544
604
617
531
609
534
729
733
742
550
530
728
739
653
655
730
496
532
546
606
533
636
654
732
741
268
391
1366
1917 18
164
1439
4314
121 2
1483
681
682
683
990
1142
920
1255
749
216
241
333
294
340
1368
1385
278
281
460
34 132
1261
1389
279
1497
558
411
871
634
536
488
494
1415
647
652
501
495
727
552
1377
949
1234
1354
842
1294
1020
891
1041
778
1071
1172
2739 12
444
144
261
474
881
582
8 1116
492
1009
46 5625
639
034
434
526
430
255
262
260
261
263
130
129
52 5425
725
957 59
53 301
255
258 45
58 6014
6823
123
216
216
363
814
3614
1364
949
353
514
2360
860
361
061
564
260
161
314
5914
3071
171
316
142
273
743
942
443
542
042
932
141
842
890
438
427
434
123
322
8843
243
143
041
542
142
643
643
343
745
646
5 115
711
811
612
0 9111
111
395
109
9311
412
232
541
942
511
911
289
417
324
735
363
423
92 9636
212
111
011
732
332
611
594 36
473
641
441
614
2071
615
0713
4814
28 1170
1191
1333
1343
1390
1410
1417
409
587
562
928
1060
582
1069 78
710
6580
523
340
814
2692
480
479
593
864
876
810
4694
310
5114
3261
156
578
381
715
0314
7214
3714
4915
1215
1015
1114
4510
9656
356
651
775
256
769
812
9256
466
466
611
3013
9214
1167
031
740
140
740
640
040
548
950
2 500
487
490
482
484
540
538
542
1435
1100
1493
626
1111
1427
1265
1274
717
1414
1454
1121
1279
539
410
725
1193
503
719
758
1056
1042
791
934
519
762 52
491
054
555
189
258
810
6110
4058
364
360
263
774
740
441
241
340
240
312
2515
0912
0113
7291
890
159
350
992
957
011
0786
510
0499
910
2511
9273
810
3210
9212
1647
575
790
291
469
710
9514
4470
114
71 146
748
086
915
0412
2710
7910
8410
4513
5175
110
7313
5952
212
0275
351
215
0293
713
3413
3913
5310
3710
5912
3712
2678
112
7294
596
1 140
180
395
312
6712
4212
8112
5814
5612
5914
0496
211
1511
0377
113
56 1072
779
1355 12
3512
56 650
796
925
1397
1382
935
1081
665
964
811
974
561
621
829
955
1440
800
812
820
1365
640
1388
1408
948
969 9
5980
663
995
014
6946
146
447
344
746
345
344
547
098
910
4777
413
4912
2913
7013
7493
012
4564
613
69 801
1248
965
684
951
1217
1266 1
137
1236
1446
660
576
580
1070
788
1066
1484
661
672
1197
457
896
1194
1209
1344
750
523
557
1375
1075
1357
939
956
1160
1476
1126
1288
1464
1147
1077
1136
1131
1140
1314
1300
1489
1481
1177
578
1161
1187
1329 13
1911
6613
05 1148
1182
1309
1323
1152
1477
1269
1419
574
595
572
585
597
628
630
632
766
1421
1284
1293
843
1295
904
972 13
3213
6213
3713
4212
5311
5014
0670
612
7883
611
5885
368
969
412
9014
7512
6461
964
565
150
855
552
152
7 118
068
871
573
154
748
650
711
3581
995
882
565
896
013
8413
9511
3482
210
8311
7811
7979
786
4 693
662
667
765
926
976
663
724
1024
1104
1247
1238
1246
1462
1491
1505 88
688
913
3611
9912
1176
479
8 835
1174
834
1145
1155
1347
946
1396
1017
1189
873
863
936
894
1327 1
232
802
1085
1277
1496
977
1405
993
793
1016
912
1443
1453
685
818
957
1008 99
811
2014
7412
0513
1511
1213
3814
0914
1613
4011
9510
2813
1612
9613
01 1146
966
1123
1088
1271 9
4112
41 1099
1089
1098
1206
1257
944
807
808
931
1326
1457
1434
1479
1154
1094
673
770
963
1282 8
8275
679
414
0011
6414
4810
5890
991
310
1211
0991
710
02 932
839
916
686
1273
840
856
814
1068
1298
1118
1214
1188
1306
1487
789
1220
695
723
879
940
705
1113
1243
1064
1322
968
790
1358 10
0711
59 103
910
4979
994
211
2590
093
380
912
5213
7113
9412
5114
5111
6714
9484
671
082
388
3 718
1303
1031
1308
1363
556
947
1101
1156
1506
767
1124
967
1262
1110
1466
1461
987
1287
1480
982
1149
971
1276
994
1128
997
1452 8
4914
8614
4211
1413
8111
0614
Figure 4. Hierarchical clustering (cluster dendrogram from hclust with average linkage; vertical axis: height)
Figure 6. Reaction wheels (OFF=0, ON=1)
spacecraft stopped rotating and, as a result, one side became too hot and the other side too cold. To keep the temperature balanced, several heaters went on, which led to a high current in the SATORI # 2 subsystem. Figure 7 shows the normalized SATORI # 2 current during group 2.
4.3.3. Group 3: The lunar orbit insertion
As shown in Figure 5, group 3 consists of three time intervals that occur on three different days. A high variation in the current of the PAPI # 2 subsystem (propulsion, see Figure 1) distinguishes this group from the rest of the dataset. Figure 8 shows the normalized PAPI # 2 current during this group.
Figure 7. Normalized SATORI # 2 current during group 2 (two panels: first window and second window; x-axis: samples; y-axis: normalized SATORI # 2 current)
Moreover, the valve driver unit, which controls the propulsion subsystem, and the pressurant tank heaters, which also belong to the propulsion subsystem (see Figure 1), go ON in all three time intervals. This group represents a unique behavior in the dataset; however, the experts confirmed that it does not represent anomalous behavior and should be classified as a special operation mode. The experts informed us that this group represents the lunar orbit insertion process. In fact, there were three firings of the propulsion subsystem to get into lunar orbit, and our algorithm was able to classify them in a single group.
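The grouping of non-contiguous time windows into a single cluster can be illustrated with a minimal sketch of average-linkage agglomerative clustering, the linkage named in the Figure 4 dendrogram. This is a pure-Python illustration under stated assumptions, not the implementation used in this work: the function `average_linkage`, the two-element feature vectors, and all numeric values are hypothetical.

```python
from itertools import combinations
from math import dist


def average_linkage(points, n_clusters):
    """Merge clusters bottom-up until n_clusters remain (average linkage)."""
    clusters = [[i] for i in range(len(points))]  # each window starts alone

    def avg_dist(a, b):
        # Mean pairwise Euclidean distance between members of two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y
    return [sorted(c) for c in clusters]


# Hypothetical per-window features: (mean current, current variance).
# Windows 2, 5, and 8 mimic three similar events on different days.
windows = [
    (1.0, 0.1), (1.1, 0.1), (6.0, 4.0),
    (0.9, 0.2), (1.0, 0.2), (6.2, 4.1),
    (1.2, 0.1), (0.95, 0.15), (5.9, 3.9),
]
groups = average_linkage(windows, n_clusters=2)
# Small clusters are the candidates handed to the mission experts.
small = [g for g in groups if len(g) <= 3]
print(groups, small)
```

In this toy data the three event windows end up in one small cluster even though they are separated in time, because clustering operates on feature similarity rather than on window order.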
ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2016
Figure 8. Normalized PAPI # 2 current during group 3 (three panels: first window, second window, and third window; x-axis: samples; y-axis: normalized PAPI # 2 current)
4.3.4. Group 4: The laser communication test
This mode of operation consists of two time windows, each almost 20 minutes long. A laser communication test occurs during each time window. These tests are part of the mission plan and occur multiple times during the mission. However, because of the new moon, the solar array current is almost zero during this mode. The high current demand of the laser communication test in the absence of solar energy puts a heavy load on the battery and leads to a battery voltage drop. Figure 9 shows the normalized battery voltage during each time window.
Figure 9. Normalized battery voltage during group 4 (two panels: first window and second window; x-axis: samples; y-axis: normalized battery voltage)
This group represents abnormal behavior because running the laser test in the absence of solar energy leads to unintended consequences.
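A battery voltage drop of this kind can be made concrete with a small sketch. The normalization used for the figures is not restated in this section, so the code below assumes a z-score against the mission-wide mean and standard deviation. The helper names `zscore` and `low_windows`, the window width, the threshold, and the voltage samples are all hypothetical.

```python
from statistics import mean, pstdev


def zscore(signal):
    """Scale a signal to zero mean and unit variance (assumed normalization)."""
    mu, sigma = mean(signal), pstdev(signal)
    return [(v - mu) / sigma for v in signal]


def low_windows(normalized, width, threshold):
    """Return start indices of fixed-width windows whose mean is below threshold."""
    starts = range(0, len(normalized) - width + 1, width)
    return [s for s in starts if mean(normalized[s:s + width]) < threshold]


# Hypothetical battery voltage samples: nominal values with one sag.
nominal = [32.0, 32.1, 31.9, 32.0]
sag = [29.0, 28.8, 28.9, 29.1]
voltage = nominal * 5 + sag + nominal * 5

flagged = low_windows(zscore(voltage), width=4, threshold=-2.0)
print(flagged)
```

Under these assumptions only the window containing the sag falls more than two standard deviations below the mission-wide mean, which is the kind of window an expert would then be asked to interpret.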
4.3.5. Group 5: The eclipse
This group is related to the eclipse. The time interval is almost 5 hours long because the sampling rate becomes very low at the end of the mission. The solar array current is zero during the first 2 hours. Then it starts turning on and switches between zero and nonzero values after that. Several heaters went on to keep the temperatures of different subsystems in the normal range in the absence of sunlight. This increases the load current. A simultaneous increase in the load current and decrease in the solar array current put unprecedented pressure on the battery, which leads to a battery voltage drop. We can classify the eclipse as abnormal behavior because the battery is not designed to operate in this situation. Figure 10 shows the normalized battery voltage during the mission. We can see the eclipse at the end of the mission (day 219).
Figure 10. Normalized battery voltage during the mission
4.3.6. Group 6: The safe mode
The system goes into safe mode right after the eclipse. During this mode, several loads were switched off to limit energy consumption and provide enough current to recharge the battery. This group represents a unique behavior in the dataset; however, it does not represent anomalous behavior, and we classify it as a special operation mode.
5. CONCLUSIONS AND FUTURE WORK
In this paper, we presented a data-driven anomaly detection method. We defined the problem and presented a new approach for anomaly detection. Our approach uses an unsupervised learning method to detect outliers, which may represent either special modes of operation or anomalies. To differentiate between special modes and anomalies, the method uses a supervised approach of consulting human mission experts. We applied our approach to detect anomalies during the LADEE mission. To detect anomalies, we looked for the features that were significant factors in differentiating the outliers from the nominal set. We also considered the binary switches during each time interval as indicators of mode changes. Finally, we used domain experts to validate the anomalies.
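The two analysis steps summarized above, finding the features that separate an outlier from the nominal set and checking binary switches for mode changes, can be sketched as follows. The scoring rule here is a simple standardized difference chosen for illustration and is not claimed to be the feature selection method used in this work. The functions `rank_features` and `changed_switches`, the feature and switch names, and all values are hypothetical.

```python
from statistics import mean, pstdev


def rank_features(nominal, outlier):
    """Order features by the outlier's distance from the nominal mean,
    measured in nominal standard deviations."""
    scores = {}
    for name, values in nominal.items():
        sigma = pstdev(values) or 1.0  # guard against constant features
        scores[name] = abs(outlier[name] - mean(values)) / sigma
    return sorted(scores, key=scores.get, reverse=True)


def changed_switches(nominal_state, window_state):
    """List binary switches whose state differs from the nominal state."""
    return sorted(s for s in nominal_state if window_state[s] != nominal_state[s])


# Hypothetical per-window mean values for three telemetry features.
nominal = {
    "battery_voltage": [32.0, 32.1, 31.9, 32.0],
    "solar_array_current": [5.0, 5.2, 4.9, 5.1],
    "load_current": [3.0, 3.1, 2.9, 3.0],
}
outlier = {"battery_voltage": 30.0, "solar_array_current": 0.0, "load_current": 3.2}
ranking = rank_features(nominal, outlier)

# Hypothetical switch states (OFF=0, ON=1) in nominal operation and in the window.
switches_nominal = {"laser_comm": 0, "star_tracker": 1, "propulsion_heater": 0}
switches_window = {"laser_comm": 1, "star_tracker": 1, "propulsion_heater": 0}
changed = changed_switches(switches_nominal, switches_window)
print(ranking, changed)
```

The ranked feature list and the changed switches together form the kind of compact summary that can be presented to domain experts, who then decide whether the window is a special mode or an anomaly.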
Table 1. Abstract overview of detected modes and anomalies
Group | Detected Mode or Anomaly | Voltage or Current | Switches
1 | Normal operation mode | - | -
2 | Reaction wheels anomaly | SATORI # 2 current had a higher variance | Propulsion heater turned on; star tracker went off
3 | Lunar orbit insertion mode | PAPI # 2 current had large variations; SATORI # 2 current had large variations | Pressurant tank heater went on; valve driver unit went on
4 | Laser communication test anomaly | Solar array current was zero because of new moon; battery voltage dropped because of high current demand of the laser communication test | Laser communications switch went on
5 | Eclipse mode | Solar array current was zero during the first 2 hours, and had high fluctuation afterward | Several heaters went on (e.g. propulsion heater)
6 | Safe mode | Battery current shows a high fluctuation | Several loads (e.g. star tracker) turned off
We found that a number of the detected anomalies were quite interesting to the experts, including a fault in the reaction wheels control system and a laser test that caused a drop in the battery voltage. In both cases the experts confirmed the anomalies and flagged them for further investigation. The proposed approach performs well when the number of outliers is small enough to examine them one by one. In future work, we will investigate different algorithms to further automate the process.
ACKNOWLEDGMENT
This work was partially supported by funding from NASA STTR grant # NNX15CA11C. The authors gratefully acknowledge the support provided by Scott Poll, Mark Shirley, Peter Berg, and other members of the LADEE Mission Ops team from NASA Ames in acquiring, analyzing, and interpreting the LADEE Telemetry data.
REFERENCES
Aldrich, E. (2010). Wavelets: a package of functions for computing wavelet filters, wavelet transforms and multiresolution analyses. R package version 0.2-60.
Blanke, M., & Schroder, J. (2006). Diagnosis and fault-tolerant control (Vol. 691). Springer.
Blom, H., & Bar-Shalom, Y. (1988). The interacting multiple model algorithm for systems with Markovian switching coefficients. Automat. Contr., vol. 33, pp. 780-783, Aug. 1988.
Chen, J., & Patton, R. J. (2012). Robust model-based fault diagnosis for dynamic systems (Vol. 3). Springer Science & Business Media.
Hanlon, P., & Maybeck, P. (2000). Multiple-model adaptive estimation using a residual correlation Kalman filter bank. IEEE Trans. Aerosp. Electron. Syst., vol. 36, pp. 393-406, Apr. 2000.
Henzinger, T. A. (Ed.). (2000). The theory of hybrid automata. Springer Berlin Heidelberg.
Hine, B., Spremo, S., Turner, M., & Caffrey, R. (2010). The lunar atmosphere and dust environment explorer (LADEE) mission. In IEEE aerospace conference.
Isermann, R. (2005). Model-based fault-detection and diagnosis–status and applications. Annual Reviews in Control, 29(1), 71–85.
Ji, M., Zhang, Z., Biswas, G., & Sarkar, N. (2003). Hybrid fault adaptive control of a wheeled mobile robot. Mechatronics, IEEE/ASME Transactions on, 8(2), 226–233.
Lee, E. A. (2008). Cyber physical systems: Design challenges. In Object oriented real-time distributed computing (ISORC), 2008 11th IEEE international symposium on (pp. 363–369).
Mack, D. L., Biswas, G., Koutsoukos, X. D., & Mylaraswamy, D. (2016, in press). Learning Bayesian network structures to augment aircraft diagnostic reference models. IEEE Transactions on Automation Science and Engineering.
Marwedel, P. (2010). Embedded system design: Embedded systems foundations of cyber-physical systems. Springer Science & Business Media.
Niggemann, O., Biswas, G., Kinnebrew, J. S., Khorasgani, H., Volgmann, S., & Bunte, A. (2015). Data-driven monitoring of cyber-physical systems leveraging on big data and the internet-of-things for diagnosis and control. 26th International Workshop on Principles of Diagnosis, Paris, France.
Noura, H., Theilliol, D., Ponsart, J.-C., & Chamseddine, A. (2009). Fault-tolerant control systems: Design and practical applications. Springer Science & Business Media.
Qin, S. J. (2012). Survey on data-driven industrial process monitoring and diagnosis. Annual Reviews in Control, 36(2), 220–234.
Yin, S., Ding, S. X., Xie, X., & Luo, H. (2014). A review on basic data-driven approaches for industrial process monitoring. Industrial Electronics, IEEE Transactions on, 61(11), 6418–6428.