NASA Ames Data Sciences Group

Nikunj C. Oza, Ph.D.
Leader, Data Sciences Group
[email protected]
www.nasa.gov


The Data Sciences Group at NASA Ames

Group Members
• Ilya Avrekh
• Kamalika Das, Ph.D.
• Dave Iverson
• Vijay Janakiraman, Ph.D.
• Rodney Martin, Ph.D.
• Bryan Matthews
• David Nielsen
• Nikunj Oza, Ph.D.
• Veronica Phillips
• John Stutz
• Hamed Valizadegan, Ph.D.
• + summer students

Team members are NASA employees, contractors, and students.

Funding Sources
• Science Mission Directorate: AIST and CMAC programs
• NASA Aeronautics Research Mission Directorate: ATD, SMART-NAS, SASO Project
• NASA Engineering and Safety Center
• Exploration Systems Mission Directorate, Exploration Technology Development Program
• Non-NASA: DARPA, DoD

Data Mining Research and Development (R&D) for application to NASA problems (Aeronautics, Earth Science, Space Exploration, Space Science)


Example Data Mining Problems

• Aeronautics: anomaly detection, precursor identification, text mining (classification, topic identification)
• Earth Science: filling in missing measurements, anomaly detection, teleconnections, climate understanding
• Space Science: Kepler planet candidates
• Space Exploration: system health management, vascular structure identification


Four V’s of Big Tough, Sleep Depriving Data

• Volume
  – Radar tracks: 47 facilities (1 year), ~423 GB (compressed), ~3.2 TB (CSV)
  – Weather and forecast (entire NAS): CIWS ~2.8 TB
• Velocity
  – Radar tracks: 47 facilities
    • ~35 GB/month (compressed)
    • ~268 GB/month (uncompressed)
  – Weather and forecast (entire NAS): CIWS ~233 GB/month
• Veracity
  – Data dropouts
  – Duplicate tracks
  – Track ending in midair
  – Reused flight identifiers
• Variety
  – Numerical (continuous/binary)
  – Weather (forecast/actual)
  – Radar/airport metadata
  – ATC voice
  – ASRS text reports (pilot/controller)

[Figure labels: Amazing Algorithm, Intuitive Reports]


Aeronautics Data Mining Problems

• Anomaly detection
  – Anomaly discovery over a large set of variables
  – Particular variable of interest, for example, fuel burn
    • Determine expected instantaneous fuel burn given current state of aircraft
    • Compare with actual instantaneous fuel burn
    • Where difference is high, a problem may be occurring
• Precursor identification
  – Given an undesirable effect (e.g., go-around), identify precursors (e.g., overtake situation, high-speed approach)
• Text mining
  – Text classification, topic identification
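
The fuel-burn item above amounts to a residual check: model the expected fuel burn from the aircraft state, compare it with the measured value, and flag samples where the gap is large. The following is a minimal sketch of that idea, assuming a single state variable, an ordinary least-squares line, and a fixed threshold. The slides do not say which regression model or threshold the group uses; the variable names and all numbers here are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one state variable, for illustration)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def flag_anomalies(xs, ys, model, threshold):
    """Return indices where |actual - expected| exceeds the threshold."""
    a, b = model
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if abs(y - (a + b * x)) > threshold]


# Hypothetical nominal data: thrust setting (%) vs. fuel flow (kg/h).
train_thrust = [40, 50, 60, 70, 80, 90]
train_fuel = [1000, 1250, 1500, 1750, 2000, 2250]
model = fit_line(train_thrust, train_fuel)

# Hypothetical new flight; the third sample burns far more than expected.
new_thrust = [45, 55, 65, 75]
new_fuel = [1125, 1380, 1900, 1870]
flagged = flag_anomalies(new_thrust, new_fuel, model, threshold=100.0)
print(flagged)
```

In practice the expected-value model would take many state variables, and the threshold would be tuned against the false-alarm rate analysts can tolerate.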


Topic Extraction Example

TOPIC 1: autoplt, acft, spd, capture, moderate, level, engaged, leveloff, vert, ctl, disconnected, selected, fpm, light, clb, pitch, manually, warning, pwr

TOPIC 2: time, day, leg, contributing, factors, hrs, crew, factor, fatigue, night, trip, rest, duty, flying, long, late, previous, incident, lack, alerter

TOPIC 3: apch, rwy, visual, ils, twr, lndg, loc, arpt, final, missed, clred, msl, intercept, vectored, sight, gar, terrain, field, uneventful, ctl

Other examples of ‘fatigue’
• Altitude Deviation
• Spatial Deviation
• Ramp Excursion
• Landing without clearance
• Runway Incursion
• Unstable Approach


Aeronautics Anomaly Detection: Current Methods

Exceedance-Based Methods
• Known anomalies
• Conditions over 2-3 variables (e.g., speed > 250 knots, altitude = 1000 ft, landing)
• Cannot identify unknown anomalies
• Low false positive rate, high false negative (missed detection) rate
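
An exceedance rule of the kind listed above is a hand-written condition over a few variables. The sketch below encodes the slide's example as one such rule. The field names, the reading of the altitude condition as "at or below 1000 ft", and the sample values are assumptions for illustration, not an operational rule set.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    speed_kts: float
    altitude_ft: float
    phase: str  # hypothetical flight-phase label, e.g. "descent" or "landing"


def high_speed_low_altitude(s: Sample) -> bool:
    """Known-anomaly rule: fast and low while landing (thresholds from the slide's example)."""
    return s.phase == "landing" and s.speed_kts > 250 and s.altitude_ft <= 1000


# Hypothetical samples from one flight.
flight = [
    Sample(280, 5000, "descent"),   # fast, but not landing: no exceedance
    Sample(260, 1000, "landing"),   # matches the rule
    Sample(140, 500, "landing"),    # normal landing speed
]
exceedances = [i for i, s in enumerate(flight) if high_speed_low_altitude(s)]
print(exceedances)
```

The rule fires only on the pattern it encodes, which is why such methods have few false alarms but miss anomalies nobody thought to write a rule for.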


Data-Driven Methods

• DISCOVER anomalies by
  – learning statistical properties of the data
  – finding which data points do not fit (e.g., far away, low probability)
  – no background knowledge on anomalies needed
• Complementary to existing methods
  – Low false negative (missed detection) rate
  – Higher false positive rate (identified points/flights unusual, but not always operationally significant)
• Data-driven methods -> insights -> modification of exceedance detection
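
One simple way to find points that are "far away" is to score each point by its distance to its nearest neighbors, with no rule about what an anomaly looks like. The sketch below uses that k-nearest-neighbor score as a stand-in. It is not MKAD, the method named later in this deck, and the feature scaling and sample values are hypothetical.

```python
import math


def knn_scores(points, k=2):
    """Anomaly score per point: mean distance to its k nearest other points."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical samples: (speed in knots / 100, altitude in ft / 1000).
points = [(1.4, 1.0), (1.5, 1.1), (1.45, 0.9), (1.55, 1.0), (2.6, 2.5)]
scores = knn_scores(points)

# The point with the highest score is the least like the rest of the data.
most_unusual = max(range(len(points)), key=scores.__getitem__)
print(most_unusual, [round(s, 2) for s in scores])
```

A high score only says a point is unusual. Whether it is operationally significant still needs a domain expert, which is the false-positive trade-off noted above.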


Example: High Speed Go-Around

• Overshoots Extended Runway Centerline (ERC) by over 1 SM
• Over 250 kts @ 2500 ft
• Angle of intercept > 40°
• Overshoots 2nd approach

Bryan Matthews, David Nielsen, John Schade, Kennis Chan, and Mike Kiniry, Automated Discovery of Flight Track Anomalies, 33rd Digital Avionics Systems Conference, 2014


Providing Domain Expert Feedback

[Figure: active learning strategy. Labels: Input Features, MKAD, Anomalies, Nominals, SME, Operationally significant anomalies, Uninteresting anomalies]

[Figure: active learning with rationales framework. Labels: Input Features, MKAD, Training, Anomalies, Nominals, Testing, Rationale features, 2-class classification/ranking algorithm]

Manali Sharma, Kamalika Das, Mustafa Bilgic, Bryan Matthews, David Nielsen, and Nikunj Oza, Active Learning with Rationales for Identifying Operationally Significant Anomalies in Aviation, European Conference on Machine Learning and Principles and Practices of Knowledge Discovery (ECML-PKDD), 2016


Earth Science Example

• Understand relationships between ecosystem dynamics and climatic factors
• Model as a regression analysis problem
• 3 science questions
  – Magnitude and extent of ecosystem exposure, sensitivity, and resilience to the 2005 and 2010 Amazon droughts
  – Understand human-induced and other attribution as causes of vegetation anomalies
  – How the learned dependency model varies across eco-climatic zones and geographical regions on a global scale

NASA ESTO AIST-14 project, Uncovering Effects of Climate Variables on Global Vegetation (PI: Kamalika Das, Ph.D.)


Problem Formulation

• Point-to-point regression analysis (Genetic Programming based Symbolic Regression)
• Estimate spatio-temporal dependency of forest ecosystems on climate variables

V_{ij}^{t} = f(LC_{ij}, CV_{ij}^{t}, CV_{nb}^{t}, CV_{ij}^{t-1}, CV_{nb}^{t-1}, \ldots, CV_{ij}^{t-k}, CV_{nb}^{t-k})

V: vegetation
i, j: pixel location indices
LC: land cover type
t: time index
CV: climate variable(s)
nb: spatial neighborhood of index i, j
k: temporal dependency

Open challenges:
1. Estimating function f
2. Estimating best choices for k, nb
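
To make the inputs of f concrete, the sketch below assembles the argument list for one pixel and one time step: the land cover type, then the climate variable at the pixel and over its neighborhood for lags 0 through k. Summarizing the neighborhood by its mean, the square neighborhood of a given radius, and the toy grids are all assumptions. The slides leave the choice of nb and k open.

```python
def neighborhood_mean(grid, i, j, radius):
    """Mean of grid values around (i, j), excluding the pixel itself, clipped at edges."""
    rows, cols = len(grid), len(grid[0])
    vals = [grid[a][b]
            for a in range(max(0, i - radius), min(rows, i + radius + 1))
            for b in range(max(0, j - radius), min(cols, j + radius + 1))
            if (a, b) != (i, j)]
    return sum(vals) / len(vals)


def feature_vector(lc, cv, i, j, t, k, radius=1):
    """Build [LC_ij, CV_ij^t, CV_nb^t, ..., CV_ij^(t-k), CV_nb^(t-k)]."""
    if t - k < 0:
        raise ValueError("need at least k earlier time steps")
    feats = [lc[i][j]]
    for lag in range(k + 1):
        grid = cv[t - lag]
        feats.append(grid[i][j])                             # CV at the pixel
        feats.append(neighborhood_mean(grid, i, j, radius))  # CV over nb
    return feats


# Toy data: a 3x3 land cover map and three time steps of one climate variable.
lc = [[1, 1, 2], [1, 2, 2], [3, 3, 2]]
cv = [[[float(10 * t + 3 * i + j) for j in range(3)] for i in range(3)]
      for t in range(3)]

x = feature_vector(lc, cv, i=1, j=1, t=2, k=1)
print(x)
```

A regressor (symbolic or otherwise) would then be trained on pairs of such vectors and the observed vegetation value at the same pixel and time.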


Data Pipeline

Inputs
• NDVI: resolution 250 m, projection Sinusoidal
• LST: resolution 1 km, projection Sinusoidal
• TRMM (Ver 6): resolution 25 km, projection WGS84

Pipeline stages (as labeled in the diagram)
• Reproject and resample data: NDVI, TRMM, LST at resolution 1 km, projection WGS84
• Filter data based on land cover: 2000 – 2010 monthly data
• Time series: change to seasonal (monthly -> seasonal): 4 seasons/year, 2000 – 2010 seasonal data
• Windowing: smoothing over 25x25 size window

Seasons
• Season 1: March – May
• Season 2: June – Sep
• Season 3: Oct
• Season 4: Nov – Feb
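
The monthly-to-seasonal step can be sketched as a month-to-season lookup followed by a per-season aggregate. The season boundaries below are the four listed above. Using the mean as the aggregate, and assigning January and February to the season that began the previous November, are assumptions; the slides do not specify either.

```python
from collections import defaultdict

# Season definitions from the slide: 1 = Mar-May, 2 = Jun-Sep, 3 = Oct, 4 = Nov-Feb.
SEASON_OF_MONTH = {3: 1, 4: 1, 5: 1,
                   6: 2, 7: 2, 8: 2, 9: 2,
                   10: 3,
                   11: 4, 12: 4, 1: 4, 2: 4}


def season_key(year, month):
    """Return (season_year, season); Jan/Feb are grouped with the prior November (assumed)."""
    season = SEASON_OF_MONTH[month]
    if month in (1, 2):
        year -= 1
    return (year, season)


def monthly_to_seasonal(monthly):
    """Map {(year, month): value} to {(season_year, season): mean value}."""
    groups = defaultdict(list)
    for (year, month), value in monthly.items():
        groups[season_key(year, month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Toy series for one pixel: the value equals the month number.
monthly = {(2005, m): float(m) for m in range(3, 13)}
monthly.update({(2006, 1): 1.0, (2006, 2): 2.0})

seasonal = monthly_to_seasonal(monthly)
print(sorted(seasonal.items()))
```

Because Season 4 crosses the calendar year, keying it by the year in which it starts keeps each winter in one group.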


Results for 2004-2010

Mean Squared Error

Year   Ridge Regression   LASSO   SVR     Symbolic Regression
2004   0.284              0.284   0.280   0.262
2005   0.289              0.289   0.288   0.278
2006   0.426              0.426   0.430   0.321
2007   0.374              0.374   0.370   0.318
2008   0.308              0.308   0.310   0.336
2009   0.353              0.353   0.360   0.328
2010   0.546              0.547   0.540   0.479

Marcin Szubert, Anuradha Kodali, Sangram Ganguly, Kamalika Das, and Josh C. Bongard, Reducing Antagonism between Behavioral Diversity and Fitness in Semantic Genetic Programming, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 797-804, 2016.


Ongoing and Future Work

• Experiment with different combinations of temporal lookback and/or spatial effects
• Introduce additional regressors (radiation, forest fire maps, deforestation maps)
• Study the effect of different regressors on different Amazon tiles
• Derive nonlinear GP models on Amazon tiles
• Given appropriate historical data, have the ability to predict: “Under what conditions does vegetation not recover within a certain time frame.”
• Do global scale analysis in parallel


VESsel GENeration (VESGEN) Analysis
Patricia Parsons-Wingerter, PhD, NASA Chief Innovator/POC
NASA Ames 2016 Innovation Fund Award, Chief Technologist’s Office

• VESGEN 2D maps and quantifies vascular remodeling for a wide variety of quasi-2D vascularized biomedical tissue applications.
• Working on transforming to VESGEN 3D, in line with most vascularized organs and tissues in humans and vertebrates.
• Vascular-dependent diseases include cancer, diabetes, coronary vessel disease, and major astronaut health challenges in the space microgravity and radiation environments, especially for long-duration missions.
• One key component is binarization: conversion of grayscale images to black/white vascular branching patterns.
  – Takes 10-25 hours of human effort.
  – Exploring pattern recognition, matched filtering, vessel tracking/tracing, mathematical morphology, multiscale approaches, and model-based algorithms.


Otsu Thresholding


Otsu vs. Adaptive Thresholding
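
The difference between the two approaches can be illustrated on a toy image. Otsu's method picks one global threshold by maximizing the between-class variance of the intensity histogram. Adaptive thresholding compares each pixel with a statistic of its local neighborhood. The sketch below uses the local mean for the adaptive variant, which is one common choice and an assumption here; the slides do not say which adaptive method or parameters were used. The image is hypothetical: two thin "vessels" on a background that brightens from left to right.

```python
def otsu_threshold(pixels):
    """Return the threshold t in 0..255 that maximizes between-class variance.

    Pixels with value > t are treated as foreground.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(v * c for v, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def adaptive_mean_threshold(img, radius=1, offset=0):
    """Mark a pixel as foreground if it exceeds its local mean minus offset."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [img[a][b]
                    for a in range(max(0, i - radius), min(h, i + radius + 1))
                    for b in range(max(0, j - radius), min(w, j + radius + 1))]
            row.append(1 if img[i][j] > sum(vals) / len(vals) - offset else 0)
        out.append(row)
    return out


# Background rises 10, 30, ..., 150; vessels add 50 at columns 1 and 6.
row = [10, 80, 50, 70, 90, 110, 180, 150]
img = [row[:] for _ in range(3)]

t = otsu_threshold([p for r in img for p in r])
global_mask = [[1 if p > t else 0 for p in r] for r in img]
local_mask = adaptive_mean_threshold(img, radius=1)

print("Otsu threshold:", t)
print("global:", global_mask[0])
print("local: ", local_mask[0])
```

On this toy image the single global threshold misses the vessel in the dark region and marks bright background as foreground, while the local-mean rule picks out both vessel columns. Real images would need a larger window and a tuned offset.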


Future Work

• Work in progress: exploring more preprocessing and post-processing techniques
• Each step of preprocessing and post-processing has some input parameters
  – The result is sensitive to these parameters
  – We aim to make the parameter selection either automated (machine learning) or semi-automated (user can choose the right parameter)
• Machine learning to learn the binarization
  – Given the manual labels, perform supervised or semi-supervised learning
  – Each pixel and its class label (foreground or background) is a training example
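
The per-pixel training set described above can be sketched as follows: each pixel of a grayscale image becomes one example, paired with its label from the manual binarization. Using the surrounding 3x3 patch as the feature vector and clamping coordinates at the image border are assumptions for illustration; the slides do not specify the features.

```python
def pixel_examples(gray, mask, radius=1):
    """Return one (features, label) pair per pixel.

    features: flattened (2*radius+1)^2 patch around the pixel, with
              out-of-bounds coordinates clamped to the nearest edge pixel.
    label:    1 for foreground (vessel), 0 for background, from the manual mask.
    """
    h, w = len(gray), len(gray[0])
    examples = []
    for i in range(h):
        for j in range(w):
            patch = [gray[min(max(a, 0), h - 1)][min(max(b, 0), w - 1)]
                     for a in range(i - radius, i + radius + 1)
                     for b in range(j - radius, j + radius + 1)]
            examples.append((patch, mask[i][j]))
    return examples


# Hypothetical 2x3 image with a bright vertical vessel and its manual labels.
gray = [[10, 200, 10],
        [10, 200, 10]]
mask = [[0, 1, 0],
        [0, 1, 0]]

examples = pixel_examples(gray, mask)
print(len(examples), examples[1])
```

Any supervised classifier could then be trained on these pairs. For the semi-supervised case, pixels without manual labels would contribute features only.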


DASHlink
disseminate. collaborate. innovate.
https://dashlink.ndc.nasa.gov/

DASHlink is a collaborative website designed to promote:
• Sustainability
• Reproducibility
• Dissemination
• Community building

Users can create profiles
• Share papers, upload and download open source algorithms
• Find NASA datasets.

How do we get the Word Out?
