NASA Ames Data Sciences Group
www.nasa.gov
Nikunj C. Oza, Ph.D., Leader, Data Sciences Group
nikunj.c.oza@nasa.gov
The Data Sciences Group at NASA Ames
Group members: Ilya Avrekh; Kamalika Das, Ph.D.; Dave Iverson; Vijay Janakiraman, Ph.D.; Rodney Martin, Ph.D.; Bryan Matthews; David Nielsen; Nikunj Oza, Ph.D.; Veronica Phillips; John Stutz; Hamed Valizadegan, Ph.D.; + summer students
Team members are NASA employees, contractors, and students.
Funding Sources
• Science Mission Directorate: AIST and CMAC programs
• NASA Aeronautics Research Mission Directorate: ATD, SMART-NAS, SASO Project
• NASA Engineering and Safety Center
• Exploration Systems Mission Directorate, Exploration Technology Development Program
• Non-NASA: DARPA, DoD
Data mining research and development (R&D) for application to NASA problems (Aeronautics, Earth Science, Space Exploration, Space Science)
Example Data Mining Problems
• Aeronautics: anomaly detection, precursor identification, text mining (classification, topic identification)
• Earth Science: filling in missing measurements, anomaly detection, teleconnections, climate understanding
• Space Science: Kepler planet candidates
• Space Exploration: system health management, vascular structure identification
Four V's of Big, Tough, Sleep-Depriving Data
[Figure: Amazing Algorithm -> Intuitive Reports]
• Volume
  – Radar tracks: 47 facilities (1 year), ~423 GB (compressed), ~3.2 TB (CSV)
  – Weather and forecast (entire NAS): CIWS ~2.8 TB
• Velocity
  – Radar tracks: 47 facilities, ~35 GB/month (compressed), ~268 GB/month (uncompressed)
  – Weather and forecast (entire NAS): CIWS ~233 GB/month
• Veracity
  – Data dropouts
  – Duplicate tracks
  – Tracks ending in midair
  – Reused flight identifiers
• Variety
  – Numerical (continuous/binary)
  – Weather (forecast/actual)
  – Radar/airport metadata
  – ATC voice
  – ASRS text reports (pilot/controller)
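Three of the veracity problems listed above (duplicate tracks, tracks ending in midair, reused flight identifiers) can be screened with simple rules before any modeling. The sketch below is illustrative only: the track schema (`flight_id`, `start_s`, `end_s`, `end_alt_ft`, `points`), the function name `veracity_report`, and both thresholds are hypothetical assumptions, not the group's actual pipeline. A real midair-ending check would also need to separate genuine data loss from a track that legitimately leaves one facility's coverage.

```python
from collections import defaultdict
from typing import Dict, List


def veracity_report(tracks: List[dict],
                    max_end_alt_ft: float = 500.0,
                    reuse_gap_s: float = 6 * 3600.0) -> Dict[str, List[int]]:
    """Return indices of tracks with suspected veracity problems.

    Each track is a dict with HYPOTHETICAL keys: flight_id, start_s, end_s
    (epoch seconds), end_alt_ft, points (list of (lat, lon, alt) tuples).
    Thresholds are illustrative assumptions, not operational values.
    """
    issues: Dict[str, List[int]] = defaultdict(list)

    # Duplicate tracks: same identifier and identical point sequence.
    seen = {}
    for idx, trk in enumerate(tracks):
        key = (trk["flight_id"], tuple(trk["points"]))
        if key in seen:
            issues["duplicate"].append(idx)
        else:
            seen[key] = idx

    # Track ending in midair: last reported altitude is still above a floor.
    for idx, trk in enumerate(tracks):
        if trk["end_alt_ft"] > max_end_alt_ft:
            issues["ends_in_midair"].append(idx)

    # Reused flight identifier: the same ID starts again long after its
    # previous track ended.
    by_id = defaultdict(list)
    for idx, trk in enumerate(tracks):
        by_id[trk["flight_id"]].append(idx)
    for idxs in by_id.values():
        idxs.sort(key=lambda i: tracks[i]["start_s"])
        for prev, cur in zip(idxs, idxs[1:]):
            if tracks[cur]["start_s"] - tracks[prev]["end_s"] > reuse_gap_s:
                issues["reused_id"].append(cur)

    return dict(issues)
```

Each check is independent, so a track can appear under several keys; duplicates are not counted as reuse because their start time precedes the earlier copy's end time.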
Aeronautics Data Mining Problems
• Anomaly detection
  – Anomaly discovery over a large set of variables
  – A particular variable of interest, for example, fuel burn:
    • Determine the expected instantaneous fuel burn given the current state of the aircraft
    • Compare it with the actual instantaneous fuel burn
    • Where the difference is high, a problem may be occurring
• Precursor identification
  – Given an undesirable effect (e.g., go-around), identify precursors (e.g., overtake situation, high-speed approach)
• Text mining
  – Text classification, topic identification
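The three fuel-burn steps above amount to a residual-based detector: learn expected burn from the aircraft state, then flag samples whose actual burn deviates strongly. The slides do not say which model is used, so this sketch assumes an ordinary least-squares linear model trained on nominal data and a z-sigma residual threshold; the function names and the state variables are hypothetical.

```python
from statistics import pstdev
from typing import Callable, List, Sequence


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, solved via the normal equations."""
    rows = [[1.0, *x] for x in X]
    n = len(rows[0])
    A = [[sum(r[p] * r[q] for r in rows) for q in range(n)] for p in range(n)]
    b = [sum(r[p] * t for r, t in zip(rows, y)) for p in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, n))) / A[r][r]
    return coef


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def fuel_burn_detector(X_nominal, burn_nominal,
                       z: float = 3.0) -> Callable[[Sequence[float], float], bool]:
    """Learn expected fuel burn from nominal data; flag residuals above z sigma."""
    coef = fit_linear(X_nominal, burn_nominal)
    sigma = pstdev([t - predict(coef, x) for x, t in zip(X_nominal, burn_nominal)])

    def is_anomalous(state: Sequence[float], actual_burn: float) -> bool:
        # Step 2 and 3 of the slide: compare with actual burn, flag large gaps.
        return abs(actual_burn - predict(coef, state)) > z * sigma

    return is_anomalous
```

Training only on nominal flights keeps the anomalies themselves from pulling the expected-burn model toward them; a nonlinear regressor could replace `fit_linear` without changing the detector logic.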
Topic Extraction Example
TOPIC 1: autoplt acft spd capture mode rate level engaged level off vert ctl disconnected selected fpm light clb pitch manually warning pwr
TOPIC 2: time day leg contributing factors hrs crew factor fatigue night trip rest duty flying long late previous incident lack alerter
TOPIC 3: apch rwy visual ils twr lndg loc arpt final missed clred msl intercept vectored sight gar terrain field uneventful ctl
Other examples of ‘fatigue’
• Altitude deviation
• Spatial deviation
• Ramp excursion
• Landing without clearance
• Runway incursion
• Unstable approach
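The slides do not name the topic model behind these word lists. Latent Dirichlet allocation (LDA) is a common choice for ASRS-style reports, so the sketch below assumes it and uses collapsed Gibbs sampling in plain Python. The function names, the priors `alpha` and `beta`, and the toy documents are illustrative assumptions; a production system would more likely use an established library.

```python
import random
from typing import List, Sequence, Tuple


def lda_gibbs(docs: Sequence[Sequence[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iters: int = 200,
              seed: int = 0) -> Tuple[List[str], List[List[int]]]:
    """Collapsed Gibbs sampling for LDA; returns vocabulary and topic-word counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(n_topics)]  # times word w is assigned to topic k
    nk = [0] * n_topics                       # tokens assigned to topic k
    z = []                                    # current topic of every token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], z[d][i]
                # Remove the token, then resample its topic from the full conditional.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return vocab, nkw


def top_words(vocab: List[str], nkw: List[List[int]], n: int = 5) -> List[List[str]]:
    """Highest-count words per topic, like the TOPIC 1-3 lists above."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in nkw]
```

Usage: tokenize each report into abbreviations such as `autoplt`, `fatigue`, `apch`, call `lda_gibbs(docs, n_topics=3)`, and read each topic's `top_words`.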
Aeronautics Anomaly Detection: Current Methods
Exceedance-Based Methods
• Known anomalies
• Conditions over 2-3 variables (e.g., speed > 250 knots, altitude = 1000 ft, landing)
• Cannot identify unknown anomalies
• Low false positive rate, high false negative (missed detection) rate
Data-Driven Methods
• DISCOVER anomalies by
  – learning statistical properties of the data
  – finding which data points do not fit (e.g., far away, low probability)
  – no background knowledge on anomalies needed
• Complementary to existing methods
  – Low false negative (missed detection) rate
  – Higher false positive rate (identified points/flights are unusual, but not always operationally significant)
• Data-driven methods -> insights -> modification of exceedance detection
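The contrast between the two families can be shown on the slide's own example condition. Below, `exceedance` encodes a known anomaly as a fixed rule, while `zscore_outliers` is a deliberately simple data-driven stand-in (per-variable z-score, not the group's actual algorithm) that flags whatever does not fit the data's statistics. Field names are hypothetical, and reading "altitude = 1000 ft" as "at or below 1000 ft" is an assumption.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def exceedance(rec: dict, speed_kt: float = 250.0, altitude_ft: float = 1000.0) -> bool:
    """Hand-written rule for a KNOWN anomaly: too fast near the ground while landing.

    Interpreting the slide's 'altitude = 1000 ft' as 'at or below' is an assumption.
    """
    return (rec["phase"] == "landing"
            and rec["altitude_ft"] <= altitude_ft
            and rec["speed_kt"] > speed_kt)


def zscore_outliers(records: Sequence[dict], keys: Sequence[str],
                    z: float = 3.0) -> List[int]:
    """Data-driven stand-in: flag records far (in std devs) from the data's own mean."""
    stats = {}
    for k in keys:
        values = [r[k] for r in records]
        stats[k] = (mean(values), pstdev(values))
    flagged = []
    for idx, r in enumerate(records):
        # Largest per-variable deviation; constant variables contribute 0.
        score = max(abs(r[k] - m) / s if s > 0 else 0.0 for k, (m, s) in stats.items())
        if score > z:
            flagged.append(idx)
    return flagged
```

A landing at 230 kt among approaches near 140 kt passes the 250 kt rule, so the exceedance check misses it, but the z-score detector flags it. Such a finding is exactly the kind of insight that could prompt a modified exceedance threshold, though the detector would also flag unusual-but-harmless flights, which is the higher false positive rate noted above.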
Example: High-Speed Go-Around
• Overshoots the extended runway centerline (ERC) by over 1 SM
• Over 250 kts @ 2500 ft
• Angle of intercept > 40°
• Overshoots 2nd approach
Bryan Matthews, David Nielsen, John Schade, Kennis Chan, and Mike Kiniry, Automated Discovery of Flight Track Anomalies, 33rd Digital Avionics Systems Conference, 2014
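The characteristics listed for this flight are geometric quantities that can be computed from a track. The sketch assumes a flat-earth local frame with positions in statute miles (east, north) relative to the runway threshold and a known runway course; all function names are hypothetical, and the thresholds simply restate the slide's numbers.

```python
import math
from typing import Sequence, Tuple


def intercept_angle(track_deg: float, runway_course_deg: float) -> float:
    """Smallest angle in degrees (0-180) between ground track and runway course."""
    diff = abs(track_deg - runway_course_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


def cross_track_sm(point_xy: Tuple[float, float], course_deg: float) -> float:
    """Signed lateral offset (SM) from the ERC; positive means right of course.

    point_xy is (east, north) in statute miles from the threshold, an assumed
    flat local frame that is only reasonable close to the airport.
    """
    theta = math.radians(course_deg)
    x, y = point_xy
    return x * math.cos(theta) - y * math.sin(theta)


def erc_overshoot_sm(points: Sequence[Tuple[float, float]], course_deg: float) -> float:
    """How far the track goes past the centerline, on the side opposite its approach."""
    offsets = [cross_track_sm(p, course_deg) for p in points]
    side = math.copysign(1.0, offsets[0])  # side the aircraft approached from
    return max(0.0, max(-side * o for o in offsets))


def looks_like_high_speed_intercept(speed_kt_at_2500ft: float, track_deg: float,
                                    course_deg: float,
                                    points: Sequence[Tuple[float, float]]) -> bool:
    """Combine three of the slide's criteria (thresholds copied from the slide)."""
    return (erc_overshoot_sm(points, course_deg) > 1.0
            and speed_kt_at_2500ft > 250.0
            and intercept_angle(track_deg, course_deg) > 40.0)
```

In the cited work the flight was found by a data-driven method rather than by a rule like this; a check of this form is the kind of exceedance refinement such a discovery can motivate.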
Providing Domain Expert Feedback
[Figure: input features -> MKAD -> anomalies and nominals; an SME separates the anomalies into operationally significant anomalies and uninteresting anomalies; an active learning strategy selects what the SME reviews]
[Figure: active learning with rationales framework. Input features -> MKAD -> anomalies and nominals (training, testing); SME rationale features; 2-class classification/ranking algorithm]
Manali Sharma, Kamalika Das, Mustafa Bilgic, Bryan Matthews, David Nielsen, and Nikunj Oza, Active Learning with Rationales for Identifying Operationally Significant Anomalies in Aviation, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2016
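One round of the feedback loop can be sketched as: train a 2-class model on SME-labeled anomalies, query the MKAD-flagged anomaly the model is least certain about, and fold the SME's label and rationale back in. This is a hedged illustration, not the method of the cited paper: logistic regression stands in for the "2-class classification/ranking algorithm", and encoding a rationale by multiplying the features the SME did not cite by a small constant (0.01 here) is an assumed scheme. All names are hypothetical.

```python
import math
from typing import List, Sequence, Set


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob(w: Sequence[float], x: Sequence[float]) -> float:
    """P(operationally significant | x); w[0] is the bias."""
    return sigmoid(w[0] + sum(wi * v for wi, v in zip(w[1:], x)))


def apply_rationale(x: Sequence[float], rationale: Set[int],
                    other_weight: float = 0.01) -> List[float]:
    """Down-weight features the SME did NOT cite as the reason (assumed scheme)."""
    return [v if k in rationale else v * other_weight for k, v in enumerate(x)]


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 epochs: int = 500, lr: float = 0.5) -> List[float]:
    """Batch gradient descent on the logistic loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = prob(w, x) - t
            grad[0] += err
            for k, v in enumerate(x):
                grad[k + 1] += err * v
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def query_most_uncertain(w: Sequence[float], pool: Sequence[Sequence[float]]) -> int:
    """Uncertainty sampling: index of the anomaly whose probability is closest to 0.5."""
    return min(range(len(pool)), key=lambda i: abs(prob(w, pool[i]) - 0.5))
```

Usage: show `pool[query_most_uncertain(w, pool)]` to the SME, record the label (1 = operationally significant, 0 = uninteresting) and the indices of the features they cite, append `apply_rationale(x, rationale)` to the training set, and call `train_logreg` again.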
Earth Science Example
• Understand relationships between ecosystem dynamics and climatic factors
• Model as a regression analysis problem
• 3 science questions
  – Magnitude and extent of ecosystem exposure, sensitivity, and resilience to the 2005 and 2010 Amazon droughts
  – Attribution of vegetation anomalies to human-induced and other causes
  – How the learned dependency model varies across eco-climatic zones and geographical regions on a global scale
NASA ESTO AIST-14 project, Uncovering Effects of Climate Variables on Global Vegetation (PI: Kamalika Das, Ph.D.)
Problem Formulation
• Point-to-point regression analysis (genetic programming based symbolic regression)
• Estimate the spatio-temporal dependency of forest ecosystems on climate variables
$$V_{ij}^{t} = f\left(LC_{ij},\; CV_{ij}^{t},\; CV_{nb}^{t},\; CV_{ij}^{t-1},\; CV_{nb}^{t-1},\; \ldots,\; CV_{ij}^{t-k},\; CV_{nb}^{t-k}\right)$$
where V: vegetation; i, j: pixel location indices; LC: land cover type; t: time index; CV: climate variable(s); nb: spatial neighborhood of index i, j; k: temporal dependency.
Open challenges:
1. Estimating the function f
2. Estimating the best choices for k and nb
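Before any regressor can estimate f, each pixel needs the input vector on the right-hand side of the equation. This sketch builds it for one climate variable, assuming the neighborhood nb is summarized by the mean of the surrounding square of pixels (excluding the center, clipped at the grid edge) and that land cover is a numeric code; both choices, the `radius` parameter, and the function names are hypothetical. Varying `k` and `radius` is one way to explore open challenge 2.

```python
from typing import List, Sequence

Grid = List[List[float]]


def neighborhood_mean(grid: Grid, i: int, j: int, radius: int) -> float:
    """Mean of the square neighborhood around (i, j), excluding (i, j), clipped at edges."""
    if radius < 1:
        raise ValueError("radius must be at least 1")
    values = []
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            if di == 0 and dj == 0:
                continue
            a, b = i + di, j + dj
            if 0 <= a < len(grid) and 0 <= b < len(grid[0]):
                values.append(grid[a][b])
    return sum(values) / len(values)


def build_features(land_cover: Grid, cv_series: Sequence[Grid],
                   i: int, j: int, t: int, k: int, radius: int = 1) -> List[float]:
    """Return [LC_ij, CV_ij^t, CV_nb^t, ..., CV_ij^(t-k), CV_nb^(t-k)] for one pixel."""
    if t - k < 0:
        raise ValueError("need k earlier time steps before t")
    features = [float(land_cover[i][j])]
    for lag in range(k + 1):
        grid = cv_series[t - lag]
        features.append(grid[i][j])                             # CV_ij^(t-lag)
        features.append(neighborhood_mean(grid, i, j, radius))  # CV_nb^(t-lag)
    return features
```

Stacking these vectors over all pixels and time steps, with V_ij^t as the target, gives the training set for ridge regression, LASSO, SVR, or symbolic regression.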
Data Pipeline
• Inputs
  – NDVI: resolution 250 m, sinusoidal projection
  – LST: resolution 1 km, sinusoidal projection
  – TRMM (Ver 6): resolution 25 km, WGS84 projection
• Reproject and resample data
  – NDVI, TRMM, LST: resolution 1 km, WGS84 projection
• Filter data based on land cover
  – 2000 – 2010 monthly data
• Time series: change to seasonal
  – Monthly -> seasonal windowing
  – Smoothing over a 25 x 25 window
  – 4 seasons/year: 2000 – 2010 seasonal data
  – Season 1: March – May; Season 2: June – Sep; Season 3: Oct; Season 4: Nov – Feb
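The monthly-to-seasonal step and the 25 x 25 smoothing can be sketched as two small functions. Assumptions, none stated on the slide: Season 4 (Nov – Feb) is assigned to the year in which it starts, so January and February count toward the previous year's Season 4; seasonal values are plain means of the monthly values; and the 25 x 25 smoothing is a spatial mean filter whose window shrinks at the grid edges. Function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Season definitions from the slide (month numbers 1-12).
SEASON_OF_MONTH = {3: 1, 4: 1, 5: 1,
                   6: 2, 7: 2, 8: 2, 9: 2,
                   10: 3,
                   11: 4, 12: 4, 1: 4, 2: 4}


def season_key(year: int, month: int) -> Tuple[int, int]:
    """(season_year, season); assumes Jan/Feb belong to the Season 4 begun in November."""
    season_year = year - 1 if month in (1, 2) else year
    return season_year, SEASON_OF_MONTH[month]


def monthly_to_seasonal(monthly: Dict[Tuple[int, int], float]) -> Dict[Tuple[int, int], float]:
    """Average a {(year, month): value} series into {(season_year, season): mean}."""
    groups: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for (year, month), value in monthly.items():
        groups[season_key(year, month)].append(value)
    return {key: mean(values) for key, values in sorted(groups.items())}


def box_smooth(grid: List[List[float]], size: int = 25) -> List[List[float]]:
    """Mean filter with a size x size window (clipped at edges), via an integral image."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    rows, cols, half = len(grid), len(grid[0]), size // 2
    # integ[r][c] = sum of grid[0..r-1][0..c-1]
    integ = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0.0
        for c in range(cols):
            row_sum += grid[r][c]
            integ[r + 1][c + 1] = integ[r][c + 1] + row_sum
    out = []
    for r in range(rows):
        r0, r1 = max(0, r - half), min(rows, r + half + 1)
        row = []
        for c in range(cols):
            c0, c1 = max(0, c - half), min(cols, c + half + 1)
            total = integ[r1][c1] - integ[r0][c1] - integ[r1][c0] + integ[r0][c0]
            row.append(total / ((r1 - r0) * (c1 - c0)))
        out.append(row)
    return out
```

The integral image makes each window sum four lookups, so cost does not grow with the 25 x 25 window size.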
Results for 2004-2010 (Mean Squared Error)
Year   Ridge Regression   LASSO   SVR     Symbolic Regression
2004   0.284              0.284   0.280   0.262
2005   0.289              0.289   0.288   0.278
2006   0.426              0.426   0.430   0.321
2007   0.374              0.374   0.370   0.318
2008   0.308              0.308   0.310   0.336
2009   0.353              0.353   0.360   0.328
2010   0.546              0.547   0.540   0.479
Marcin Szubert, Anuradha Kodali, Sangram Ganguly, Kamalika Das, and Josh C. Bongard, Reducing Antagonism between Behavioral Diversity and Fitness in Semantic Genetic Programming, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 797-804, 2016.
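The table reports mean squared error, the average of the squared differences between observed and predicted vegetation values, so lower is better. The sketch copies the table's numbers and picks the lowest-error method for each year; it shows symbolic regression lowest in six of the seven years, with ridge regression and LASSO tied lowest in 2008. The `mse` helper and dictionary layout are illustrative, not taken from the cited work.

```python
from typing import Dict, Sequence

# MSE values copied from the table above (rows: year, columns: method).
MSE: Dict[int, Dict[str, float]] = {
    2004: {"Ridge": 0.284, "LASSO": 0.284, "SVR": 0.280, "Symbolic": 0.262},
    2005: {"Ridge": 0.289, "LASSO": 0.289, "SVR": 0.288, "Symbolic": 0.278},
    2006: {"Ridge": 0.426, "LASSO": 0.426, "SVR": 0.430, "Symbolic": 0.321},
    2007: {"Ridge": 0.374, "LASSO": 0.374, "SVR": 0.370, "Symbolic": 0.318},
    2008: {"Ridge": 0.308, "LASSO": 0.308, "SVR": 0.310, "Symbolic": 0.336},
    2009: {"Ridge": 0.353, "LASSO": 0.353, "SVR": 0.360, "Symbolic": 0.328},
    2010: {"Ridge": 0.546, "LASSO": 0.547, "SVR": 0.540, "Symbolic": 0.479},
}


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error: average of squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def best_method_per_year(table: Dict[int, Dict[str, float]]) -> Dict[int, str]:
    """Lowest-MSE method per year (ties go to the method listed first)."""
    return {year: min(row, key=row.get) for year, row in table.items()}
```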
Ongoing and Future Work
• Experiment with different combinations of temporal lookback and/or spatial effects
• Introduce additional regressors (radiation, forest fire maps, deforestation maps)
• Study the effect of different regressors on different Amazon tiles
• Derive nonlinear GP models on Amazon tiles
• Given appropriate historical data, have the ability to predict: “Under what conditions does vegetation not recover within a certain time frame?”
• Do global scale analysis in parallel
VESsel GENeration (VESGEN) Analysis
Patricia Parsons-Wingerter, Ph.D., NASA Chief Innovator/POC; NASA Ames 2016 Innovation Fund Award, Chief Technologist’s Office
• VESGEN 2D maps and quantifies vascular remodeling for a wide variety of quasi-2D vascularized biomedical tissue applications.
• Work is underway to extend it to VESGEN 3D, in line with most vascularized organs and tissues in humans and vertebrates.
• Vascular-dependent diseases include cancer, diabetes, coronary vessel disease, and major astronaut health challenges in the space microgravity and radiation environments, especially for long-duration missions.
• One key component is binarization: conversion of grayscale images to black/white vascular branching patterns.
  – Takes 10-25 hours of human effort.
  – Exploring pattern recognition, matched filtering, vessel tracking/tracing, mathematical morphology, multiscale approaches, and model-based algorithms.
Otsu Thresholding
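Otsu's method picks a single global threshold t for the whole image: it splits the gray-level histogram into pixels at or below t and pixels above t, and chooses the t that maximizes the between-class variance w0 * w1 * (mu0 - mu1)^2 (up to a constant factor). The sketch assumes 8-bit integer intensities and that vessels are the brighter class; invert the image if they appear dark. Function names are hypothetical.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return t maximizing between-class variance for classes {<= t} and {> t}."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0
    best_t, best_between = 0, -1.0
    for t in range(levels):
        w0 += hist[t]          # pixel count at or below t
        sum0 += t * hist[t]    # intensity sum at or below t
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue           # one class is empty, split is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(img: List[List[int]], t: int) -> List[List[int]]:
    """1 = vessel (brighter than t), 0 = background."""
    return [[1 if p > t else 0 for p in row] for row in img]
```

Usage: `binarize(img, otsu_threshold([p for row in img for p in row]))`. Because one threshold serves the whole image, uneven illumination can push faint vessels into the background class.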
Otsu vs. Adaptive Thresholding
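Adaptive thresholding replaces the single global threshold with one computed per pixel from its neighborhood, which helps when illumination varies across the image. This sketch uses a local-mean rule, marking a pixel as vessel when it exceeds the mean of its `block` x `block` window (clipped at the edges) by more than `c`; the rule, the default `block` and `c` values, and the assumption that vessels are brighter than their surroundings are all illustrative choices, not the slide's settings.

```python
from typing import List


def adaptive_mean_threshold(img: List[List[float]], block: int = 15,
                            c: float = 2.0) -> List[List[int]]:
    """1 where a pixel exceeds its local block x block mean by more than c, else 0."""
    if block % 2 == 0:
        raise ValueError("block must be odd")
    half = block // 2
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(rows):
        row_out = []
        for col in range(cols):
            window = [img[a][b]
                      for a in range(max(0, r - half), min(rows, r + half + 1))
                      for b in range(max(0, col - half), min(cols, col + half + 1))]
            local_mean = sum(window) / len(window)
            row_out.append(1 if img[r][col] > local_mean + c else 0)
        out.append(row_out)
    return out
```

Both `block` and `c` are exactly the kind of sensitive input parameters the future work below aims to select automatically or semi-automatically.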
Future Work
• Work in progress: exploring more preprocessing and post-processing techniques
• Each preprocessing and post-processing step has some input parameters
  – The result is sensitive to these parameters
  – We aim to make parameter selection either automated (machine learning) or semi-automated (the user can choose the right parameter)
• Machine learning to learn the binarization
  – Given the manual labels, perform supervised or semi-supervised learning
  – Each pixel and its class label (foreground or background) is a training example
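Treating each pixel and its manual label as a training example can be sketched with a very small supervised learner. The per-pixel features (intensity, local mean, and their difference) and the nearest-centroid classifier are illustrative assumptions chosen to keep the example self-contained; any supervised or semi-supervised classifier could take their place. Function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]


def pixel_features(img: Image, r: int, c: int, radius: int = 1) -> Tuple[float, float, float]:
    """(intensity, local mean, intensity - local mean); window clipped at edges."""
    rows, cols = len(img), len(img[0])
    window = [img[a][b]
              for a in range(max(0, r - radius), min(rows, r + radius + 1))
              for b in range(max(0, c - radius), min(cols, c + radius + 1))]
    local_mean = sum(window) / len(window)
    value = float(img[r][c])
    return (value, local_mean, value - local_mean)


def fit_centroids(img: Image, labels: List[List[int]],
                  radius: int = 1) -> Dict[int, List[float]]:
    """Each (pixel features, manual label) pair is one example; average per class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for r in range(len(img)):
        for c in range(len(img[0])):
            feats = pixel_features(img, r, c, radius)
            cls = labels[r][c]
            acc = sums.setdefault(cls, [0.0, 0.0, 0.0])
            for n, v in enumerate(feats):
                acc[n] += v
            counts[cls] = counts.get(cls, 0) + 1
    return {cls: [v / counts[cls] for v in acc] for cls, acc in sums.items()}


def binarize(img: Image, centroids: Dict[int, List[float]],
             radius: int = 1) -> List[List[int]]:
    """Label every pixel with the class of the nearest centroid (squared distance)."""
    def nearest(feats: Sequence[float]) -> int:
        return min(centroids,
                   key=lambda cls: sum((a - b) ** 2 for a, b in zip(feats, centroids[cls])))
    return [[nearest(pixel_features(img, r, c, radius)) for c in range(len(img[0]))]
            for r in range(len(img))]
```

Usage: fit on one manually binarized image (1 = foreground, 0 = background), then call `binarize` on new images from the same acquisition setup, which is where the 10-25 hours of manual effort would be saved.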
DASHlink: disseminate. collaborate. innovate. https://dashlink.ndc.nasa.gov/
DASHlink is a collaborative website designed to promote:
• Sustainability
• Reproducibility
• Dissemination
• Community building
Users can:
• Create profiles
• Share papers, upload and download open source algorithms
• Find NASA datasets
How do we get the word out?