Software and data as scaffolds for integrative science


David LeBauer, Ph.D., University of Illinois at Urbana-Champaign

Department of Agricultural and Biological Engineering; Carl R. Woese Institute for Genomic Biology

National Center for Supercomputing Applications

Outline

• Overview: Problems and Approach
• Combining Information
• Models: integration across domains
• PEcAn: integration of models and data
• TERRAREF: automated data collection and analysis
• Future Directions

Challenges we face

• Agricultural production:
  • Feeding 9 billion people by 2050
  • Climate is changing
  • Resources are becoming scarce
• Scientific problems:
  • How do genes control traits?
  • How can we leverage data and computing?

[Figure: yield, fertilizer, pesticides (Tilman et al., Nature 2002)]

Technical Solutions for Science and Agriculture

• Knowledge is spread across many scales and formats:
  • Expert knowledge
  • Data
  • Mechanistic models
• Integrating these will enable:
  • Stronger inference and prediction
  • More science and engineering

Opportunities Across Scales

Marshall-Colon et al. 2017, Frontiers in Plant Science

Spatial scales shown: 10⁻³ m, 10² m, 10³ m, 10⁴ m, 10⁵ m

Questions, from large to small scales:
• Which crops are viable, … and where? What fraction of global energy/food demand?
• County-level mean yields; supply chain optimization
• Local topography: soil, hydrology; sub-field management
• Crop architecture, row spacing/orientation, harvesting equipment, shading response

Outline

• Conceptual Overview
• Computational Solutions
• Crop Models
• PEcAn
• TERRAREF
• Future Directions

Zhu, Lynch, LeBauer, Millar, Stitt, Long, 2015, Plant Cell & Environment

Evan DeLucia

Starting Point: Conceptual Models

BioCro: Combining Biology, Physics, Chemistry

Humphries and Long 2005; Miguez et al. 2009, 2012; Jaiswal, DeSouza, Larsen, LeBauer, … et al. 2017; Wang, Jaiswal, LeBauer, … et al. 2015

Inputs:
• Meteorology (energy, water)
• Soil (physics, carbon, nutrients)
• Parameters (e.g., plant traits)

Outputs:
• Yield, biomass
• Energy balance
• Water use
• Nutrient use

Scaling Photosynthesis from Leaf to Canopy

[Figure: responses of photosynthesis to light and temperature, at leaf and canopy scale]
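To make leaf-to-canopy scaling concrete, here is a minimal sketch that assumes a rectangular-hyperbola leaf light response and Beer-Lambert attenuation of light through the canopy, summed over layers. The parameter values (`alpha`, `a_max`, `k`, `n_layers`) are hypothetical, and this is a generic multilayer illustration, not BioCro's canopy algorithm, which is more detailed.

```python
import math


def leaf_assimilation(par, alpha=0.05, a_max=30.0):
    """Rectangular-hyperbola light response (umol CO2 m^-2 s^-1).

    par: photosynthetically active radiation at the leaf (umol m^-2 s^-1).
    alpha, a_max: hypothetical quantum efficiency and light-saturated rate.
    """
    return alpha * par * a_max / (alpha * par + a_max)


def canopy_assimilation(par_top, lai, k=0.5, n_layers=50):
    """Sum leaf-level assimilation over canopy layers (per m^2 of ground).

    Light at cumulative leaf area index L follows Beer-Lambert attenuation,
    I(L) = I0 * exp(-k * L). Each layer is evaluated at the irradiance at its
    midpoint; treating that flux as the leaf irradiance is a simplification.
    """
    if lai <= 0:
        return 0.0
    d_lai = lai / n_layers
    total = 0.0
    for i in range(n_layers):
        depth = (i + 0.5) * d_lai  # cumulative LAI above the layer midpoint
        par = par_top * math.exp(-k * depth)
        total += leaf_assimilation(par) * d_lai
    return total
```

Because shaded lower layers sit on the steep part of the light-response curve, canopy assimilation grows less than proportionally with leaf area; that nonlinearity is why leaf-level responses cannot simply be multiplied by LAI.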

Scaling Up & Predicting the Future

IPCC AR5; Warszawski et al., PNAS

[Figure panels: Temperature, Precipitation]

Climate Forecasts (2040-2050), CMIP5: 5 climate models × 4 CO2 emissions scenarios
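Five climate models crossed with four emissions scenarios gives 5 × 4 = 20 driver sets, each of which drives one crop-model run. The sketch below enumerates that factorial ensemble and summarizes it. The model and scenario labels are hypothetical placeholders (the slide does not name them), and `simulate` stands in for whatever crop model is being driven.

```python
from itertools import product
from statistics import mean


def run_ensemble(climate_models, scenarios, simulate):
    """Run one simulation per (climate model, scenario) pair."""
    return {
        (gcm, scen): simulate(gcm, scen)
        for gcm, scen in product(climate_models, scenarios)
    }


def summarize(results):
    """Ensemble size, mean, and range of the simulated outputs."""
    values = list(results.values())
    return {
        "n_runs": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical labels: the slide gives only the counts (5 models, 4 scenarios).
gcms = [f"gcm_{i}" for i in range(1, 6)]
scenarios = [f"scenario_{j}" for j in range(1, 5)]
```

Reporting the range alongside the mean keeps the spread across climate models and scenarios visible rather than collapsing it into a single forecast.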

Effects of Climate on Sugarcane Yield in Brazil, 2040-2050

[Figure: climate impact on yield (metric tons/ha)]

Jaiswal, DeSouza, Larsen, LeBauer, Miguez, Sparovek, Bollero, Buckeridge, Long, 2017

Scaling leaf-level CO2 × T × H2O response

Outline

• Conceptual Overview
• Computational Solutions
• Crop Models
• PEcAn: Linking Models and Data
• TERRAREF
• Future Directions

PEcAn

LeBauer et al., 2013

Ecological Model-Data Synthesis

LeBauer and Treseder, 2008

Thomas et al. 2013

Combining Data and Models Is Hard, Mostly a Technical Challenge

[Diagram stages: Wild Data (publications, primary data, repositories) → Relevant Information (traits, system states, soil, meteorology) → Configuration (parameters, boundary conditions, drivers) → Run → Model Outputs → Analyses (sensitivity, calibration, validation, prediction)]

Just Running a Model Is Hard

Most of this work is model independent, so solutions can be shared.

The Standard Approach: Redundant, Labor Intensive, Error Prone

For meteorology, one converter is needed for every driver (m) × model (n) combination.

[Diagram: data sources → converters → ecosystem models → analyses]
• Met data sources: NARR, NOAA, Fluxnet, CMIP5, met station, … (m = 10)
• Ecosystem models: BioCro, ED2, CLM, SIPNET, … (n = 12)
• Analyses: prediction, calibration, sensitivity, validation, visualization

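PEcAn Common Formats: Many Users Use, Reuse, Test, and Improve Components

Only n + m (not n × m) converters are needed: less work, and more robust and valid results.

[Diagram: converters map each met data source into a common format, and the common format into each ecosystem model, which feeds the analyses]
• Diverse met data: NARR, NOAA, Fluxnet, CMIP5, met station, … (m = 10)
• Ecosystem models: BioCro, ED2, CLM, SIPNET, … (n = 12)
• Analyses: prediction, calibration, sensitivity, validation, visualization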
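The n + m argument can be sketched as a hub-and-spoke converter registry: each data source needs one reader into the common format, and each model needs one writer out of it. With n = 12 models and m = 10 sources, that means 22 converters instead of 120. The source and model names and the field names below are hypothetical. Kelvin is used for the shared variable because the CF standard name `air_temperature` is expressed in K; whether PEcAn's own format uses exactly these fields is not claimed here.

```python
from typing import Callable, Dict, Tuple

Record = Dict[str, float]

# Hypothetical readers: each maps one source's native field/unit into the
# shared representation (air temperature in kelvin).
TO_COMMON: Dict[str, Callable[[Record], Record]] = {
    "station_fahrenheit": lambda r: {
        "air_temperature": (r["temp_f"] - 32.0) * 5.0 / 9.0 + 273.15
    },
    "gridded_celsius": lambda r: {"air_temperature": r["tair_c"] + 273.15},
}

# Hypothetical writers: each maps the common representation into one model's inputs.
FROM_COMMON: Dict[str, Callable[[Record], Record]] = {
    "model_a": lambda c: {"Temp": c["air_temperature"] - 273.15},  # expects deg C
    "model_b": lambda c: {"TA": c["air_temperature"]},  # expects K
}


def convert(source: str, model: str, record: Record) -> Record:
    """Any source reaches any model by composing reader and writer."""
    return FROM_COMMON[model](TO_COMMON[source](record))


def converter_counts(n_models: int, m_sources: int) -> Tuple[int, int]:
    """(pairwise converters, converters via a common format)."""
    return n_models * m_sources, n_models + m_sources
```

Adding an eleventh data source then takes one new entry in `TO_COMMON`, and every existing model can use it immediately.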

Parameter Estimation: Combining Literature and Field Data

LeBauer et al., 2013
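One way to see how literature values and field data combine: treat the literature-based estimate as a prior and update it with new field observations. The sketch below is a much simpler stand-in than a full hierarchical meta-analysis. It is a conjugate normal-normal update that assumes a known observation variance and no site or treatment effects. The function name and arguments are hypothetical.

```python
def normal_posterior(prior_mean, prior_var, observations, obs_var):
    """Conjugate normal update for a trait mean with known observation variance.

    Precisions (1/variance) add, and the posterior mean is the
    precision-weighted average of the prior mean and the data.
    """
    n = len(observations)
    post_precision = 1.0 / prior_var + n / obs_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / obs_var)
    return post_mean, post_var
```

With few observations the prior dominates. As field data accumulate, the posterior variance shrinks and the estimate moves toward the data, and that shrinking variance feeds into the uncertainty analyses that follow.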

LeBauer et al., 2013

Given current data, what drives uncertainty? (3 years, 1 crop, 1 location)

PEcAn Variance Decomposition

Bars: parameter contribution to uncertainty in yield prediction (grey = prior, black = posterior)

Used to inform optimal data collection

LeBauer et al., 2013
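The idea behind the bars can be sketched with a first-order (delta-method) approximation: a parameter's contribution to output variance is roughly the squared sensitivity of the output to that parameter times the parameter's variance. This substitutes a simple Taylor-series approximation with finite-difference slopes and is not PEcAn's implementation. `model`, `means`, and `sds` are hypothetical inputs.

```python
def variance_decomposition(model, means, sds, rel_step=1e-4):
    """First-order variance decomposition, returned as fractions of the total.

    Contribution_i = (d model / d theta_i)^2 * var(theta_i), with the slope
    estimated by a central finite difference at the parameter means.
    model: callable taking a dict of parameter values and returning a float.
    """
    contributions = {}
    for name, mu in means.items():
        h = rel_step * (abs(mu) if mu != 0 else 1.0)
        upper = dict(means, **{name: mu + h})
        lower = dict(means, **{name: mu - h})
        slope = (model(upper) - model(lower)) / (2 * h)
        contributions[name] = slope ** 2 * sds[name] ** 2
    total = sum(contributions.values())
    return {name: c / total for name, c in contributions.items()}
```

Running it once with prior standard deviations and again with posterior ones mirrors the grey and black bars. The parameter with the largest remaining share is the natural target for the next round of measurements.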

Automation & Reuse: Uncertainty Analysis

Bars/color = parameter contribution to predictive uncertainty

3 years, 1 crop, 1 location

Dietze et al., 2014: ~1 year, 8 scientists, 17 PFTs, 6 biomes

Targeted Field Study: Willow Water Use

Wertin, LeBauer, Volk, Leakey, in prep.

[Figure: predictions before and after data collection]

Making Crop & Ecosystem Models Accessible

Workflow: Add Data → Configure → Run → Analyze

LeBauer et al. 2013; Kooper et al. 2013; Dietze et al. 2013

PEcAn is a community project

42 contributors, >50 citations, a textbook, 100s of students trained

PEcAn Radiative Transfer Model Inversion

Ely, Serbin, Shiklomanov, Dietze, and others

PEcAn now provides a place for shared models, data access, and tools

• Tools: web front end; PostGIS database*; met scaling and gap filling; data ingest; meta-analysis*; sensitivity & uncertainty analysis*; ensemble prediction; parameter data assimilation; state data assimilation; benchmarking; visualization*
• Data modeling: radiative transfer; photosynthesis; tree rings
• Models: BioCro*, CABLE, CLM, DALEC, ED*, FATES, G’Day, JULES, Linkages, LPJguess, MAAT, MAESPA, PRELES, SIPNET
• Data: literature*, field measurements, expert priors*, meteorology, soils, PalEON, Fluxnet, ORNL, NEON, TERRA REF*, LTER, …

github.com/pecanproject/pecan
pecanproject.org

Outline

• Conceptual Overview
• Computational Solutions
• Crop Models
• PEcAn: Linking Models and Data
• TERRAREF
• Future Directions

High Throughput Phenotyping

• High throughput phenotyping:
  • Replaces manual measurements with sensor-based measurements
  • Measures more traits at higher frequency
• But… sensors are expensive and data are difficult to interpret
• The TERRA program is a major investment to push this forward

http://bulletin.ipm.illinois.edu/print.php?id=513

TERRAREF

• Motivation:
  • Automated measurements → stronger inference
  • Software & data → framework for interdisciplinary collaboration
• Solutions:
  • Reference datasets
  • Modular and interoperable
  • Open data, software, computing

A Phenomics Pipeline for Crop Improvement

[Diagram: Sensors → Traits → Genotypes → Selection, with Genomics; automated measurements (component & aggregate traits), pan genome, genomic prediction]

Goals: higher yield, yield stability, nutrition, stress tolerance, and more …

Diverse Scientific Disciplines

[Diagram: Sensors, Traits, Genotypes, Selection, Genomics, annotated with contributing disciplines]

Engineering, robotics, computer vision; (eco)physiology, agronomy; biology; breeding; statistics & machine learning

ARPA-E TERRA

Open Dataset for Six Projects + Public Release

TERRA Reference Data Sources

• LemnaTec Scanalyzer: Danforth, St. Louis
• LemnaTec Field Scanner: USDA ALARC, Maricopa, AZ
• Tractor and UAV: AZ and Kansas State

Field Scanner Sensors

terraref.org/articles/lemnatec-scanalyzer-field-sensors/

• VNIR imaging spectrometer, 380-1000 nm
• SWIR imaging spectrometer, 900-2500 nm
• IR temperature sensor
• NDVI (1 down, 1 up), 650, 800 nm
• PRI sensor, 531, 570 nm
• PAR sensor, 410-655 nm
• Color sensor, 410-655 nm
• 3D scanners: 2 side view, 1 down
• RGB: 2 side view, 1 down (1)
• Active reflectance, 670, 730, 780 nm
• PS II fluorescence
• Environmental: wind, temperature, humidity, light, rain, CO2
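The NDVI and PRI sensors listed above use the wavebands from which the standard normalized-difference indices are computed: NDVI = (R800 − R650) / (R800 + R650) and PRI = (R531 − R570) / (R531 + R570). The sketch assumes the inputs are already reflectances. Converting raw down-looking signals to reflectance, for example with the upward-looking sensor, is outside the sketch, and the function names are hypothetical.

```python
def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), the form shared by NDVI and PRI."""
    if a + b == 0:
        raise ValueError("sum of reflectances is zero")
    return (a - b) / (a + b)


def ndvi(r_800: float, r_650: float) -> float:
    """NDVI from near-infrared (800 nm) and red (650 nm) reflectance."""
    return normalized_difference(r_800, r_650)


def pri(r_531: float, r_570: float) -> float:
    """Photochemical Reflectance Index from 531 nm and 570 nm reflectance."""
    return normalized_difference(r_531, r_570)
```

Both indices are bounded in [−1, 1] and insensitive to overall brightness, which is why ratio-style indices are preferred for comparing measurements taken under changing illumination.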

Approach: Integrate Software and Databases

• What do people currently use?
• What domain-specific software and databases exist?
• How can we connect these?
• What standards and conventions should we adopt?

General Framework for Cross-Domain Links

[Diagram: Sensors, Traits, Genotypes, Selection, Genomics, linked by shared keys]

Linking keys: location, time, genotype
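Location, time, and genotype act as join keys across domains. In the sketch below, sensor-derived traits are keyed by plot (location) and date (time), a plot-to-genotype table supplies the genotype, and marker data are attached by genotype. All record layouts and field names (`plot`, `date`, `height_cm`) are hypothetical.

```python
from collections import defaultdict


def link_records(sensor_traits, plot_genotypes, genotype_markers):
    """Attach genotype and marker data to sensor-derived trait records.

    sensor_traits: list of dicts with hypothetical keys 'plot', 'date', and trait values.
    plot_genotypes: dict mapping plot -> genotype planted there.
    genotype_markers: dict mapping genotype -> marker calls.
    """
    linked = []
    for rec in sensor_traits:
        genotype = plot_genotypes.get(rec["plot"])
        if genotype is None:
            continue  # no location -> genotype link; cannot join
        linked.append(
            {**rec, "genotype": genotype, "markers": genotype_markers.get(genotype, {})}
        )
    return linked


def mean_by_genotype_and_date(linked, trait):
    """Average a trait across plots sharing a genotype on the same date."""
    groups = defaultdict(list)
    for rec in linked:
        groups[(rec["genotype"], rec["date"])].append(rec[trait])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}
```

Keeping the keys explicit, rather than embedding genotype in file names, is what lets sensor, trait, and genomic databases be maintained separately and still be joined.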

Data Formats, Standards & Conventions

• Sensors: CF Conventions, OGC; GeoTIFF, NetCDF-CF, LAS
• Traits: PEcAn, Crop Ontology, AgMIP/ICASA, BrAPI
• Genomics: BAM, FASTQ, VCF, BED, FASTA, GFF
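As an example of the genomics formats, a VCF data line is tab-delimited. Its columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, and then one column per sample, with colon-separated values laid out as described by FORMAT. This sketch extracts the `GT` (genotype) field and converts it to an alternate-allele count. It assumes sample names have already been read from the `#CHROM` header line. It is a minimal illustration with hypothetical function names, not a full VCF parser (no header metadata, multi-line records, or validation).

```python
def parse_vcf_genotypes(line, sample_names):
    """Return (chrom, pos, {sample: GT string}) for one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    fmt_keys = fields[8].split(":")  # FORMAT column, e.g. "GT:DP"
    gt_index = fmt_keys.index("GT")
    calls = {}
    for name, sample_field in zip(sample_names, fields[9:]):
        values = sample_field.split(":")
        calls[name] = values[gt_index] if gt_index < len(values) else "."
    return chrom, pos, calls


def alt_allele_dosage(gt):
    """Count non-reference alleles in a GT such as '0/1' or '1|1'; None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    return sum(1 for a in alleles if a != "0")
```

Dosages coded 0/1/2 are a common input for the gene-trait association analyses that link genotypes back to sensor-derived traits.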

TERRAREF Databases

[Diagram: databases spanning Sensors, Traits, Genotypes, Selection, Genomics]

Modular Software

github.com/terraref

TERRAREF Pipeline

[Diagram: field measurements, metadata, sensor data, trait data, and genomics connected through pipeline orchestration to analysis & development; annotations: 1 TB/d, <48 h]
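A quick back-of-the-envelope check shows what these annotations imply. Assume 1 TB/d is the raw ingest rate (decimal units) and <48 h is the target time from capture to processed output; the slide gives only the numbers. Under those assumptions, the pipeline must sustain roughly 11.6 MB/s around the clock and hold up to about 2 TB in flight.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_per_s(tb_per_day):
    """Average rate (decimal MB/s) needed to keep up with a daily data volume."""
    return tb_per_day * 1e12 / SECONDS_PER_DAY / 1e6


def max_backlog_tb(tb_per_day, latency_hours):
    """Data in flight if each day's data is processed within latency_hours."""
    return tb_per_day * latency_hours / 24
```

Peak rates during scanning will be higher than this daily average, so the average is a lower bound on the required I/O capacity.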

Data Analysis Environments: any Linux configuration + large filesystem + databases + compute

Workflows: Analyze! Share! Publish! Develop! Deploy!

workbench.terraref.org

[Screenshot: ~/data, ~/tutorials]

Web Application Developed with NDS Workbench

traitvis.workbench.terraref.org

3D Laser Scanner

[Figure: 3D laser scan; scale 218 mm]

Robert Pless, Zongyang Li, Solmaz Hajmohammadi

Hyperspectral Image at 543 nm

[Figure: % reflectance at 543 nm; 10 cm scale bar; north arrow and scan direction; x and y axes]

Thermal

Automated Detection Algorithms (Time Series of Panicle Counts)

Zongyang Li and Robert Pless
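Turning per-image detections into a time series needs an aggregation step, because the same plot can appear in several overlapping images. This sketch assumes each detection result is a hypothetical `(plot, date, count)` tuple. It takes the median count per plot and day, which damps double counting in overlaps. The detection algorithm itself is not shown, and this aggregation rule is an illustrative choice rather than the method behind the figures.

```python
from collections import defaultdict
from statistics import median


def panicle_time_series(detections):
    """Aggregate per-image panicle counts into per-plot daily series.

    detections: iterable of (plot, date, count) tuples, with ISO-format date
    strings (so sorting by string sorts chronologically).
    Returns {plot: [(date, median count), ...]} sorted by date.
    """
    by_plot_date = defaultdict(list)
    for plot, date, count in detections:
        by_plot_date[(plot, date)].append(count)
    series = defaultdict(list)
    for (plot, date), counts in by_plot_date.items():
        series[plot].append((date, median(counts)))
    return {plot: sorted(points) for plot, points in series.items()}
```

Per-plot series like these are the kind of time-resolved phenotype from which growth rates can be derived and then mapped to genotypes.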


Genes That Control Growth Rate

Geoff Morris & Zhenbin Hu, KSU

[Figure: LOD (logarithm of odds) scores; genes linked to trait]
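A LOD score is the base-10 log of the likelihood ratio between a model in which a marker affects the trait and a model in which it does not. For the simplest single-marker case, with normal errors and maximum-likelihood variances, LOD = (n/2) · log10(RSS0 / RSS1). The slide does not say which association model produced the figure, so the sketch below is only this textbook single-marker version, with hypothetical function and argument names.

```python
import math


def single_marker_lod(genotypes, phenotypes):
    """Single-marker LOD score comparing two normal linear models.

    H0: phenotype = overall mean + error
    H1: phenotype = mean of its genotype class + error
    LOD = (n / 2) * log10(RSS0 / RSS1).
    genotypes: marker codes per individual (e.g. 0, 1, 2); phenotypes: trait values.
    """
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)

    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    rss1 = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        rss1 += sum((y - m) ** 2 for y in ys)

    if rss1 == 0:
        raise ValueError("genotype classes fit perfectly; LOD is unbounded")
    return (n / 2) * math.log10(rss0 / rss1)
```

Scanning this score along the genome and looking for peaks is the basic logic behind plots of LOD versus position. Production analyses typically add covariates and population-structure corrections that this sketch omits.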

Get involved

• Sign up for the beta release of software and data: terraref.org/data
• Use and provide feedback on software and data formats
• github.com/terraref
• Collaborate:
  • Field measurements
  • Software
  • Algorithms
  • Colocated sensors

Outline

• Conceptual Overview
• Computational Solutions
• Crop Models
• PEcAn: Linking Models and Data
• TERRAREF
• Future Directions

Barone et al. 2017, bioRxiv: “Unmet Needs for Analyzing Biological Big Data: A Survey of 704 NSF Principal Investigators”

Software Carpentry, XSEDE.org, shared clusters

Training is the bottleneck

Hackathons and Training

Introduction to data science, with examples and projects from TERRAREF

Arkansas State University, Iowa State University, Purdue University, University of Arizona, University of Illinois, University of Nebraska, University of Arkansas

Topp et al., unpublished

Sensor Modeling and Model Coupling

Topp et al., unpublished

Modular Model Components

Zhu, Lynch, LeBauer, Millar, Stitt, Long, 2015, Plant Cell & Environment; Marshall-Colon et al. 2017, Frontiers in Plant Science; cropsinsilico.org

• Each component represents ≥1 hypothesis.
• Each parameter or output can be treated as a phenotype.
• Environmental drivers can be integrated over to address G×E.

Purdue Phenomics & IoT Platforms

• Develop cyberinfrastructure
• Make data usable
• Facilitate interdisciplinary research
• Assess existing capabilities, current roadblocks, future needs
• Work with the Library, RCAC, and faculty to facilitate data publishing
• QA/QC
• Community standards and common interfaces

Funding:

• NSF Advances in Biological Infrastructure
• USDA NIFA Food and Agriculture Cyberinformatics and Tools

Agricultural Technology

Once we understand how these systems work, we can engineer for ecosystem services rather than solely for yield:
• Climate control
• Soil improvement, carbon storage
• Roots, mycorrhizae, microbiome
• Pharmaceuticals
• Petrochemical substitutes
• … anything plants can do

NASA Ames Research Center

Team

Todd Mockler, Project Lead
Nadia Shakoor, Project Director
Noah Fahlgren, Phenotyping & Bioinformatics
Erica Fishel, Technology Transfer
Solmaz Hajmohammadi, Sensor Fusion
Stephen Kresovich, Breeding
Jeremy Schmutz, Sequencing
Geoff Morris, Gene-Trait Associations
William Rooney, Breeding
Pedro Andrade-Sanchez, Agronomy & Phenomics
Michael Ottman, Physiology
Maria Newcomb, Field Measurements
Jeff White, Agronomy
David LeBauer, Informatics & Computing
Robert Pless, Image Analysis
Roman Garnett, Prediction Algorithms
Wasit Walamu, Sensing & Physiology
Max Burnette, Craig Willis, Rob Kooper, Jeff Terstreip, Zongyang Li, Zhenbin Hu, Nick Heyek, Charlie Zender, Henry Butowsky

• Mike Dietze, Boston University
• David LeBauer, University of Illinois
• Shawn Serbin, Brookhaven National Lab
• Ankur Desai, University of Wisconsin
• Kenton McHenry, National Center for Supercomputing Applications
• and many other users/contributors

David LeBauer
dlebauer@illinois.edu

TERRAREF
terraref.org
github.com/terraref
@terra_ref

PEcAn Project
pecanproject.org
github.com/pecanproject
@pecanproject
