
Javier Sanchez Pat Langley Computational Learning Laboratory Center for the Study of Language and Information Stanford University, Stanford, California





http://cll.stanford.edu/

An Interactive Environment for Scientific Model Construction

Thanks to N. Asgharbeygi, K. Arrigo, S. Bay, J. Fitzgerald, D. George, S. Klooster, C. Potter, K. Saito, and T. Shinar.


Lessons about Scientific Knowledge Discovery

Our research collaborations in Earth science and microbiology have revealed some important lessons:

1. Scientists are more comfortable with their own notations than with ones from machine learning and data mining.

2. Scientific data are often scarce and difficult to obtain, indicating a need for additional constraints.

3. Scientists often have initial models and knowledge that should influence the discovery process.

4. Scientists typically want computational assistance rather than automated discovery systems.

These observations suggest a need for alternative computational approaches to scientific model construction.


Data, Knowledge, and the Scientist

[Figure: diagram connecting the Scientist, an Initial model, Observations, Model Revision, and a Revised model]

process exponential_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ∞] * P

process logistic_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ∞] * P * (1 - P / [0, 1, ∞])

process constant_inflow
  variables: I {inorganic_nutrient}
  equations: d[I,t] = [0, 1, ∞]

process consumption
  variables: P1 {population}, P2 {population}, nutrient_P2
  equations: d[P1,t] = [0, 1, ∞] * P1 * nutrient_P2,
             d[P2,t] = -[0, 1, ∞] * P1 * nutrient_P2

process no_saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P

process saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P / (P + [0, 1, ∞])

model AquaticEcosystem
  variables: nitro, phyto, zoo, nutrient_nitro, nutrient_phyto
  observables: nitro, phyto, zoo

process phyto_exponential_growth
  equations: d[phyto,t] = 0.1 * phyto

process zoo_logistic_growth
  equations: d[zoo,t] = 0.1 * zoo * (1 - zoo / 1.5)

process phyto_nitro_consumption
  equations: d[nitro,t] = -1 * phyto * nutrient_nitro,
             d[phyto,t] = 1 * phyto * nutrient_nitro

process phyto_nitro_no_saturation
  equations: nutrient_nitro = nitro

process zoo_phyto_consumption
  equations: d[phyto,t] = -1 * zoo * nutrient_phyto,
             d[zoo,t] = 1 * zoo * nutrient_phyto

process zoo_phyto_saturation
  equations: nutrient_phyto = phyto / (phyto + 0.5)



The PROMETHEUS Modeling Environment

PROMETHEUS is an interactive environment that lets its users:

specify process models of static and dynamic systems;

display and edit a model’s structure and details graphically;

utilize a model to simulate a system’s behavior over time;

incorporate background knowledge cast as generic processes;

indicate which processes to consider during model revision;

invoke a revision module that improves a model’s fit to data.

Our initial results focused on static models, but in this talk we illustrate the system’s use on dynamic models.


A Process Model for an Aquatic Ecosystem

model AquaticEcosystem;
variables phyto, zoo, nitro, residue;
observables phyto, nitro;

process zoo_exponential_decay;
  equations d[zoo,t,1] = -0.251 * zoo;
            d[residue,t,1] = 0.251 * zoo;

process zoo_phyto_predation;
  equations d[zoo,t,1] = 0.615 * 0.495 * zoo;
            d[residue,t,1] = 0.385 * 0.495 * zoo;
            d[phyto,t,1] = -0.495 * zoo;

process nitro_uptake;
  conditions nitro > 1.25;
  equations d[phyto,t,1] = 0.411 * phyto;
            d[nitro,t,1] = -0.098 * 0.411 * phyto;

process nitro_remineralization;
  equations d[nitro,t,1] = 0.005 * residue;
            d[residue,t,1] = -0.005 * residue;


Advantages of Quantitative Process Models

Process models are a good target for modeling systems because:

they embed quantitative relations within qualitative structure;

they refer to notations and mechanisms familiar to scientists;

they support both algebraic and dynamical relationships;

they offer causal and explanatory accounts of phenomena; and

they retain the modularity needed to support induction.

Quantitative process models provide an important alternative to the formalisms currently used in scientific modeling.


Viewing a Process Model Graphically


Simulation and Prediction in PROMETHEUS

To utilize a given process model, PROMETHEUS simulates its behavior over time (for dynamic models) or across samples (for static models) by:

accepting initial values for input variables and a time step size;

on each time step, determining which processes are active;

solving active static and differential equations with known values;

propagating values and solving other active equations; and

when multiple processes influence the same variable, assuming their effects are additive.

This module makes specific predictions that the user can compare to observations.
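The simulation steps above can be illustrated with a minimal Python sketch. It assumes forward Euler integration with a fixed step size (the slides do not name the integrator PROMETHEUS uses) and encodes the four processes of the AquaticEcosystem model from the earlier slide. `Process`, `simulate`, `AQUATIC`, and the initial values are hypothetical names chosen for illustration. Because that model has no algebraic equations, the value-propagation step is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Process:
    """Hypothetical encoding of one process: an activation test plus its rates."""
    name: str
    condition: Callable[[State], bool]
    effects: Callable[[State], Dict[str, float]]  # variable -> d[var,t,1]


def always(_: State) -> bool:
    return True


# Parameter values copied from the AquaticEcosystem model listing.
AQUATIC: List[Process] = [
    Process("zoo_exponential_decay", always,
            lambda s: {"zoo": -0.251 * s["zoo"], "residue": 0.251 * s["zoo"]}),
    Process("zoo_phyto_predation", always,
            lambda s: {"zoo": 0.615 * 0.495 * s["zoo"],
                       "residue": 0.385 * 0.495 * s["zoo"],
                       "phyto": -0.495 * s["zoo"]}),
    Process("nitro_uptake", lambda s: s["nitro"] > 1.25,
            lambda s: {"phyto": 0.411 * s["phyto"],
                       "nitro": -0.098 * 0.411 * s["phyto"]}),
    Process("nitro_remineralization", always,
            lambda s: {"nitro": 0.005 * s["residue"],
                       "residue": -0.005 * s["residue"]}),
]


def simulate(processes: List[Process], init: State, dt: float, steps: int) -> List[State]:
    """Forward-Euler simulation (an assumption); rates from active processes add."""
    state = dict(init)
    trajectory = [dict(state)]
    for _ in range(steps):
        total = {v: 0.0 for v in state}
        for p in processes:
            if p.condition(state):              # only active processes contribute
                for var, rate in p.effects(state).items():
                    total[var] += rate          # additive combination of effects
        state = {v: state[v] + dt * total[v] for v in state}
        trajectory.append(dict(state))
    return trajectory
```

For example, `simulate(AQUATIC, {"phyto": 1.0, "zoo": 0.5, "nitro": 5.0, "residue": 0.0}, dt=0.1, steps=100)` returns 101 states; whenever `nitro` is at or below 1.25, `nitro_uptake` stops contributing.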


Predictions from Aquatic Ecosystem Model


A User-Guided Method for Model Revision

The PROMETHEUS system revises a process model in five stages:

1. Specify all ways to alter the initial model in terms of revising parameters in, removing, and adding processes;

2. Find all ways to instantiate candidate additions with specific variables, subject to type constraints;

3. Generate candidate model structures by removing and adding indicated processes, with limits on the total number of processes;

4. For each model structure, search for parameter values that provide a good fit to the data;

5. Return a list of the best N parameterized models, ranked by their mean squared error.

The user can inspect these revisions and select the one they find most plausible.
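The five stages can be sketched in Python for the simpler case of a static model whose processes each scale one input feature by a single bounded parameter. This is an illustration under stated assumptions, not the PROMETHEUS implementation: stage 2 (type-constrained instantiation) is assumed already done, parameter fitting uses plain random search as a stand-in for whatever optimizer the system uses, and `revise`, `fit`, and the toy processes are hypothetical.

```python
import itertools
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical static process: (name, feature function, parameter range).
Process = Tuple[str, Callable[[float], float], Tuple[float, float]]


def mse(pred: Sequence[float], obs: Sequence[float]) -> float:
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def predict(structure: List[Process], params: List[float], xs: Sequence[float]) -> List[float]:
    # Effects of processes on the output are assumed to be additive.
    return [sum(theta * f(x) for (_, f, _), theta in zip(structure, params)) for x in xs]


def fit(structure, xs, ys, samples, rng):
    """Stage 4 stand-in: random search within each parameter's range."""
    best_params, best_err = None, float("inf")
    for _ in range(samples):
        params = [rng.uniform(lo, hi) for (_, _, (lo, hi)) in structure]
        err = mse(predict(structure, params, xs), ys)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


def revise(fixed, removable, additions, xs, ys, max_processes, top_n, samples=2000, seed=0):
    """Stages 1, 3, 4, 5: the user marks removable processes and candidate additions."""
    rng = random.Random(seed)
    optional = list(removable) + list(additions)
    results = []
    # Stage 3: keep the fixed processes plus every subset of optional ones,
    # subject to the limit on the total number of processes.
    for k in range(len(optional) + 1):
        for chosen in itertools.combinations(optional, k):
            structure = list(fixed) + list(chosen)
            if not structure or len(structure) > max_processes:
                continue
            params, err = fit(structure, xs, ys, samples, rng)
            results.append((err, [name for name, _, _ in structure], params))
    # Stage 5: rank by mean squared error and return the best N.
    results.sort(key=lambda r: r[0])
    return results[:top_n]
```

A toy call such as `revise([], [("constant", lambda x: 1.0, (0.0, 1.0))], [("linear", lambda x: x, (0.0, 3.0))], xs, ys, max_processes=2, top_n=3)` returns `(mse, process names, parameters)` triples. Enumerating every subset grows exponentially with the number of optional processes, which is why the limit on the total number of processes matters.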


Marking Processes to Revise or Remove


Indicating Processes to Consider Adding


Generic Processes as Background Knowledge

PROMETHEUS casts background knowledge about a domain as generic processes that specify:

the variables involved in a process and their types;

the parameters appearing in a process and their ranges;

the forms of conditions on the process; and

the forms of associated equations and their parameters.

Generic processes are the building blocks that PROMETHEUS uses in its revision of specific process models.


Generic Processes for Aquatic Ecosystems

generic process exponential_decay;
  variables S{species}, D{detritus};
  parameters α [0, 1];
  equations d[S,t,1] = -1 * α * S;
            d[D,t,1] = α * S;

generic process remineralization;
  variables N{nutrient}, D{detritus};
  parameters α [0, 1];
  equations d[N,t,1] = α * D;
            d[D,t,1] = -1 * α * D;

generic process predation;
  variables S1{species}, S2{species}, D{detritus};
  parameters α [0, 1], γ [0, 1];
  equations d[S1,t,1] = γ * α * S1;
            d[D,t,1] = (1 - γ) * α * S1;
            d[S2,t,1] = -1 * α * S1;

generic process constant_inflow;
  variables N{nutrient};
  parameters α [0, 1];
  equations d[N,t,1] = α;

generic process nutrient_uptake;
  variables S{species}, N{nutrient};
  parameters β [0, ∞], α [0, 1], μ [0, 1];
  conditions N > β;
  equations d[S,t,1] = α * S;
            d[N,t,1] = -1 * μ * α * S;
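Stage 2 of the revision method (instantiating candidate additions subject to type constraints) can be sketched using these generic processes and the variable types implied by the AquaticEcosystem model. The dictionary encoding, the names `GENERIC`, `VARIABLE_TYPES`, and `instantiate`, and the rule that each slot binds a distinct variable are assumptions made for this illustration.

```python
import itertools
from typing import Dict, List, Tuple

Slots = List[Tuple[str, str]]  # (slot name, required type)

# Hypothetical encoding of the generic processes' typed variable slots.
GENERIC: Dict[str, Slots] = {
    "exponential_decay": [("S", "species"), ("D", "detritus")],
    "remineralization": [("N", "nutrient"), ("D", "detritus")],
    "predation": [("S1", "species"), ("S2", "species"), ("D", "detritus")],
    "constant_inflow": [("N", "nutrient")],
    "nutrient_uptake": [("S", "species"), ("N", "nutrient")],
}

# Types inferred from how the AquaticEcosystem model binds its variables.
VARIABLE_TYPES: Dict[str, str] = {
    "phyto": "species", "zoo": "species", "nitro": "nutrient", "residue": "detritus",
}


def instantiate(slots: Slots, var_types: Dict[str, str]) -> List[Dict[str, str]]:
    """Return every binding of slots to variables of the required type."""
    candidates = [[v for v, t in var_types.items() if t == slot_type]
                  for _, slot_type in slots]
    bindings = []
    for combo in itertools.product(*candidates):
        if len(set(combo)) == len(combo):  # assumption: slots bind distinct variables
            bindings.append({name: v for (name, _), v in zip(slots, combo)})
    return bindings
```

Under these assumptions, predation yields two bindings, one of which (S1 = zoo, S2 = phyto, D = residue) corresponds to the zoo_phyto_predation process in the AquaticEcosystem model.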


Specifying Data and Search Parameters


Inspecting Revised Process Models


Best Fit to Nitrate Data from Ross Sea


Best Fit to Phytoplankton Data from Ross Sea


Intellectual Influences

Our approach to scientific model construction incorporates ideas from many traditions:

computational scientific discovery (e.g., Langley et al., 1983);

theory revision in machine learning (e.g., Towell, 1991);

qualitative physics and simulation (e.g., Forbus, 1984);

languages for scientific simulation (e.g., STELLA, MATLAB);

interactive tools for data analysis (e.g., Shneiderman, 2001).

The PROMETHEUS environment combines insights from machine learning, AI, programming languages, and HCI in novel ways.


Specific Precursors

A few earlier systems support the interactive discovery of scientific models, including:

Valdes-Perez’s (1995) MECHEM [chemistry]

Rickel and Porter’s (1997) TRIPEL [biology]

Sleeman et al.’s (1997) DAVICCAND [metallurgy]

Mahidadia and Compton’s (2001) JUSTAID [endocrinology]

PROMETHEUS adapts their ideas to scientific domains that involve quantitative explanatory models, such as Earth science.


Directions for Future Research

Despite our progress to date, we need further work in order to:

produce additional results on other scientific data sets;

develop more robust methods for fitting model parameters;

implement interactive methods for searching the model space;

introduce models with subsystems to handle complexity; and

carry out usability studies with the modeling environment.

Interactive environments for model construction and revision have great potential to speed progress in science and engineering.


Contributions of the Research

In summary, our work on the PROMETHEUS system has led to:

a new formalism for representing scientific process models;

a graphical interface for displaying and editing these models;

a computational method for simulating these models’ behavior;

an encoding for background knowledge as generic processes;

an interactive method for revising process models given data.

We have demonstrated this approach to model construction and revision on Earth science problems with encouraging results.


Hierarchical Model of a Power Grid


End of Presentation


Steps in Applying Computational Scientific Discovery

problem formulation

representation engineering

data collection/manipulation

algorithm manipulation

filtering and interpretation

algorithm invocation


A Process Model for Carbon Production

model npp;

variables NPPc, E, IPAR, T1, T2, W, Topt, tempc, eet, PET, PETTWM, ahi, A, FPARFAS, monthlySolar, SolConver, MONFASNDVI, umd_veg;

observable ahi,eet,tempc,Topt,MONFASNDVI,monthlySolar,PETTWM,umd_veg;

process CarbonProd; equations NPPc = E * IPAR;

process PhotoEfficiency; equations E = (0.389 * (T1 * (T2 * W)));

process TempStress1; equations T1 = (0.8 + ((0.02 * Topt) - (0.0005 * (Topt ^ 2))));

process TempStress2; equations T2 = ((1.1814 / (1 + (2.718281828 ^ (0.2 * (Topt - 10 - tempc))))) / (1 + (2.718281828 ^ (0.3 * (tempc - 10 - Topt)))));

process WaterStress; conditions PET!=0; equations W = (0.5 + (0.5 * (eet / PET)));

process WSNoEvapoTrans; conditions PET==0; equations W = 0.5;

process EvapoTrans; conditions tempc>0; equations PET = 1.6 * (10 * tempc / ahi) ^ A * PETTWM;

• • •
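As a rough check on how these equations combine, the sketch below transcribes the listed NPP equations into Python. It assumes IPAR, Topt, tempc, eet, ahi, A, and PETTWM are supplied as inputs, since the processes that compute some of them are elided from the listing. The function names are hypothetical, and PET for tempc <= 0 is left undefined because the process covering that case is not shown.

```python
def temp_stress1(topt: float) -> float:
    # TempStress1: T1 = 0.8 + 0.02 * Topt - 0.0005 * Topt^2
    return 0.8 + (0.02 * topt) - (0.0005 * topt ** 2)


def temp_stress2(topt: float, tempc: float) -> float:
    # TempStress2, keeping the listing's literal 2.718281828 for e
    e = 2.718281828
    return (1.1814 / (1 + e ** (0.2 * (topt - 10 - tempc)))) / (1 + e ** (0.3 * (tempc - 10 - topt)))


def evapotranspiration(tempc: float, ahi: float, a: float, pettwm: float) -> float:
    # EvapoTrans applies only when tempc > 0; the other case is elided in the listing.
    if tempc <= 0:
        raise ValueError("PET for tempc <= 0 comes from a process not shown")
    return 1.6 * (10 * tempc / ahi) ** a * pettwm


def water_stress(eet: float, pet: float) -> float:
    # WaterStress when PET != 0, WSNoEvapoTrans when PET == 0
    return 0.5 if pet == 0 else 0.5 + 0.5 * (eet / pet)


def carbon_production(ipar: float, topt: float, tempc: float, eet: float, pet: float) -> float:
    # PhotoEfficiency (E) followed by CarbonProd (NPPc = E * IPAR)
    efficiency = 0.389 * (temp_stress1(topt) * (temp_stress2(topt, tempc) * water_stress(eet, pet)))
    return efficiency * ipar
```

At Topt = tempc = 20, for instance, T1 is 1.0 and T2 is about 0.991, so the temperature terms barely reduce the photosynthetic efficiency.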


Viewing a Process Model Graphically


Results of Revising the NPP Model

Initial model:

$E = 0.56 \cdot T1 \cdot T2 \cdot W$

$T2 = 1.18 / [(1 + e^{0.2 \cdot (Topt - Tempc - 10)}) \cdot (1 + e^{0.3 \cdot (Tempc - Topt - 10)})]$

$PET = 1.6 \cdot (10 \cdot Tempc / AHI)^{A} \cdot \text{PET-TW-M}$

$SR \in \{3.06, 4.35, 4.35, 4.05, 5.09, 3.06, 4.05, 4.05, 4.05, 5.09, 4.05\}$

RMSE on training data = 465.212 and $r^2 = 0.799$

Revised model:

$E = 0.353 \cdot T1^{0.00} \cdot T2^{0.08} \cdot W^{0.00}$

$T2 = 0.83 / [(1 + e^{1.0 \cdot (Topt - Tempc - 6.34)}) \cdot (1 + e^{1.0 \cdot (Tempc - Topt - 11.52)})]$

$PET = 1.6 \cdot (10 \cdot Tempc / AHI)^{A} \cdot \text{PET-TW-M}$

$SR \in \{0.61, 3.99, 2.44, 10.0, 2.21, 2.13, 2.04, 0.43, 1.35, 1.85, 1.61\}$

Cross-validated RMSE = 397.306 and $r^2 = 0.853$ [15% reduction]
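The reported 15% reduction follows from the two RMSE values: (465.212 - 397.306) / 465.212 ≈ 0.146. A small sketch of the error measures is below; it assumes r² means the coefficient of determination, 1 - SS_res / SS_tot, since the slide does not say whether it is that or the squared correlation. The function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean squared error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def r_squared(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def percent_reduction(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


print(round(percent_reduction(465.212, 397.306), 1))  # 14.6
```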



The Challenge of Systems Science

Disciplines like Earth science and computational biology differ from traditional fields in that they:

focus on synthesis rather than analysis in their operation;

rely on computer modeling as one of their central methods;

develop system-level models with many variables and relations;

evaluate their models on observational, not experimental, data.

Developing and testing such models are complex tasks that would benefit from computational aids.

Our research goal is to design, construct, evaluate, and understand such computational tools for systems science.


Fields Contributing to the Proposed Research

computational scientific discovery

qualitative reasoning

simulation languages, numerical analysis

human-computer interaction

biology, physiology, Earth science


Issues in Process Model Induction

Inductive process modeling (IPM) raises a number of issues that have clear analogues in other paradigms:

identifying conditions on processes (parameter optimization)

inferring initial values of unobservables (parameter optimization)

keeping the search space tractable (typing on variables)

reducing variance to mitigate overfitting (minimum description length)

We have demonstrated promising responses to these four problems within the IPM framework.


Best Model Fit to Data from Ross Sea


Best Model Fit to Data on Protozoan Predation


Collecting Data on Photosynthetic Processes

[Figure: Continuous Culture (Chemostat) under external stimuli (e.g., light), with an Adaptation Period and an Equilibrium Period; Sampling mRNA/cDNA; Microarray Trace; plot of Health of Culture over Time. Image sources: /wwwscience.murdoch.edu.au/teach, www.affymetrix.com/]


Gene Expressions for Cyanobacteria


Generic Processes for Photosynthesis Regulation

generic process translation
  variables: P{protein}, M{mRNA}
  parameters: α [0, 1]
  equations: d[P,t,1] = α * M

generic process transcription
  variables: M{mRNA}, R{rate}
  parameters:
  equations: d[M,t,1] = R

generic process regulate_one
  variables: R{rate}, S{signal}
  parameters: α [-1, 1]
  equations: R = α * S

generic process regulate_two
  variables: R{rate}, S{signal}
  parameters: α [-1, 1], β [0, 1]
  equations: R = α * S
             d[S,t,1] = -1 * β * S

generic process automatic_degradation
  variables: C{concentration}
  conditions: C > 0
  parameters: α [0, 1]
  equations: d[C,t,1] = -1 * α * C

generic process controlled_degradation
  variables: D{concentration}, E{concentration}
  conditions: D > 0, E > 0
  parameters: α [0, 1]
  equations: d[D,t,1] = -1 * α * E
             d[E,t,1] = -1 * α * E

generic process photosynthesis
  variables: L{light}, P{protein}, R{redox}, S{ROS}
  parameters: α [0, 1], β [0, 1]
  equations: d[R,t,1] = α * L * P
             d[S,t,1] = β * L * P
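To make the generic-process notation concrete, here is a minimal Python sketch of how one generic process might be represented and instantiated by binding its typed variable slots to specific model variables. The `GenericProcess` class, the `instantiations` and `contributions` helpers, and encoding equations as Python lambdas are hypothetical choices for illustration, not the data structures used by IPM.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, Dict, List, Tuple

Values = Dict[str, float]

@dataclass(frozen=True)
class GenericProcess:
    """A generic process: typed variable slots, parameter ranges, and derivative terms."""
    name: str
    slots: Tuple[Tuple[str, str], ...]                # (slot name, required type)
    parameters: Tuple[Tuple[str, float, float], ...]  # (name, low, high), e.g. to bound a parameter search
    derivatives: Tuple[Tuple[str, Callable[[Values, Values], float]], ...]  # (slot, f(bound values, params))

# Hypothetical encoding of "generic process translation":
# variables P{protein}, M{mRNA}; parameter alpha in [0, 1]; d[P,t,1] = alpha * M.
translation = GenericProcess(
    name="translation",
    slots=(("P", "protein"), ("M", "mRNA")),
    parameters=(("alpha", 0.0, 1.0),),
    derivatives=(("P", lambda v, p: p["alpha"] * v["M"]),),
)

def instantiations(gp: GenericProcess, model_vars: Dict[str, str]) -> List[Dict[str, str]]:
    """Every binding of the process's slots to distinct model variables of matching type."""
    bindings = []
    for combo in permutations(model_vars, len(gp.slots)):
        if all(model_vars[var] == typ for var, (_, typ) in zip(combo, gp.slots)):
            bindings.append({slot: var for (slot, _), var in zip(gp.slots, combo)})
    return bindings

def contributions(gp: GenericProcess, binding: Dict[str, str],
                  params: Values, state: Values) -> Values:
    """Evaluate the derivative terms of one instantiated process on the current state."""
    bound = {slot: state[var] for slot, var in binding.items()}
    return {binding[slot]: f(bound, params) for slot, f in gp.derivatives}

if __name__ == "__main__":
    model_vars = {"protein": "protein", "mRNA": "mRNA", "light": "light"}  # variable -> type
    binding = instantiations(translation, model_vars)[0]
    print(binding)  # {'P': 'protein', 'M': 'mRNA'}
    print(contributions(translation, binding, {"alpha": 0.5},
                        {"protein": 1.0, "mRNA": 2.0, "light": 3.0}))  # {'protein': 1.0}
```

Type checking does the pruning here: `light` can never fill the `P{protein}` or `M{mRNA}` slot, so only one binding survives.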


A Process Model for Photosynthetic Regulation

model photo_regulation
  variables: light, mRNA, protein, ROS, redox, transcription_rate
  observables: light, mRNA

process photosynthesis
  equations: d[redox,t,1] = 0.0155 * light * protein
             d[ROS,t,1] = 0.019 * light * protein

process protein_translation
  equations: d[protein,t,1] = 7.54 * mRNA

process mRNA_transcription
  equations: d[mRNA,t,1] = transcription_rate

process regulate_one_1
  equations: transcription_rate = 0.99 * light

process regulate_two_2
  equations: transcription_rate = 1.203 * redox
             d[redox,t,1] = -0.0002 * redox

process automatic_degradation_1
  conditions: protein > 0
  equations: d[protein,t,1] = -1.91 * protein

process controlled_degradation_1
  conditions: redox > 0, ROS > 0
  equations: d[redox,t,1] = -0.0003 * ROS
             d[ROS,t,1] = -0.0003 * ROS
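The sketch below shows how the photo_regulation model could be simulated with forward-Euler integration. It assumes that the degradation and redox-loss terms carry negative signs, that the two algebraic definitions of transcription_rate (from regulate_one_1 and regulate_two_2) add together, and that light is an exogenous input supplied as a function of time. The helper names and any initial values are hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, float]

def derivatives(s: State, light: float) -> State:
    """Sum the contributions of every active process to each derivative."""
    d = {"mRNA": 0.0, "protein": 0.0, "redox": 0.0, "ROS": 0.0}
    # regulate_one_1 and regulate_two_2 both define transcription_rate; assumed additive.
    transcription_rate = 0.99 * light + 1.203 * s["redox"]
    # photosynthesis
    d["redox"] += 0.0155 * light * s["protein"]
    d["ROS"] += 0.019 * light * s["protein"]
    # protein_translation and mRNA_transcription
    d["protein"] += 7.54 * s["mRNA"]
    d["mRNA"] += transcription_rate
    # regulate_two_2: first-order loss of redox (negative sign assumed)
    d["redox"] += -0.0002 * s["redox"]
    # automatic_degradation_1, active only while protein > 0
    if s["protein"] > 0:
        d["protein"] += -1.91 * s["protein"]
    # controlled_degradation_1, active only while redox > 0 and ROS > 0
    if s["redox"] > 0 and s["ROS"] > 0:
        d["redox"] += -0.0003 * s["ROS"]
        d["ROS"] += -0.0003 * s["ROS"]
    return d

def simulate(s0: State, light_at: Callable[[float], float],
             dt: float, steps: int) -> List[State]:
    """Forward-Euler integration; returns the trajectory including the initial state."""
    trajectory = [dict(s0)]
    s = dict(s0)
    for k in range(steps):
        d = derivatives(s, light_at(k * dt))
        s = {v: s[v] + dt * d[v] for v in s}
        trajectory.append(s)
    return trajectory
```

For example, `simulate({"mRNA": 1.0, "protein": 0.5, "redox": 0.1, "ROS": 0.1}, lambda t: 1.0, dt=0.01, steps=100)` runs 100 steps under constant light; the initial values there are made up for illustration. Only light and mRNA would be compared with observations, since the other variables are unobserved.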


Predictions from Best Parameterized Model


Electric Power on the International Space Station


Telemetry Data from Space Station Batteries


Induced Process Model for Battery Behavior

model Battery
  variables: Rs, Vcb, soc, Vt, i, temperature
  observables: soc, Vt, i, temperature

process voltage_charge
  conditions: i >= 0
  equations: Vt = Vcb + 6.105 * Rs * i

process voltage_discharge
  conditions: i < 0
  equations: Vt = Vcb * 1.0 / (Rs + 1.0)

process charge_transfer
  equations: d[soc,t,1] = i * Vcb / 179.38

process quadratic_influence_Vcb_soc
  equations: Vcb = 41.32 * soc * soc

process linear_influence_Vcb_temp
  equations: Vcb = 0.2592 * temperature

process linear_influence_Rs_soc
  equations: Rs = 0.03894 * soc
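The induced battery model can be evaluated directly. This sketch assumes the charge condition is i >= 0 (the comparison symbol is missing from the extracted slide) and that the two processes defining Vcb contribute additively. The function names are hypothetical.

```python
from typing import Tuple

def algebraic_values(soc: float, temperature: float) -> Tuple[float, float]:
    """Return (Vcb, Rs) from the model's algebraic processes."""
    # quadratic_influence_Vcb_soc and linear_influence_Vcb_temp both define Vcb;
    # we assume their contributions add.
    vcb = 41.32 * soc * soc + 0.2592 * temperature
    rs = 0.03894 * soc  # linear_influence_Rs_soc
    return vcb, rs

def terminal_voltage(soc: float, temperature: float, i: float) -> float:
    """Vt from whichever voltage process is active for the sign of the current i."""
    vcb, rs = algebraic_values(soc, temperature)
    if i >= 0:  # voltage_charge (condition assumed to be i >= 0)
        return vcb + 6.105 * rs * i
    return vcb * 1.0 / (rs + 1.0)  # voltage_discharge, active when i < 0

def soc_step(soc: float, temperature: float, i: float, dt: float) -> float:
    """One forward-Euler step of charge_transfer: d[soc,t,1] = i * Vcb / 179.38."""
    vcb, _ = algebraic_values(soc, temperature)
    return soc + dt * i * vcb / 179.38
```

Because Vt depends on soc only through Vcb and Rs, predicting the observable Vt over time reduces to integrating soc and re-evaluating the algebraic equations at each step.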


Results on Battery Test Data


Hierarchical Model of a Power Grid


Challenge 5: Interfacing with Scientists

Because scientists do not want to be replaced, we are developing an interactive environment that lets users:

specify a quantitative process model of the target system;
display and edit the model’s structure and details graphically;
simulate the model’s behavior over time and situations;
compare the model’s predicted behavior to observations;
invoke a revision module in response to detected anomalies.

The environment offers computational assistance in forming and evaluating models but lets the user retain control.


In Memoriam

Two years ago, computational scientific discovery lost two of its founding fathers:

Herbert A. Simon (1916 – 2001)
Jan M. Zytkow (1945 – 2001)

Both contributed to the field in many ways: posing new problems, inventing methods, training students, and organizing meetings.

Moreover, both were interdisciplinary researchers who contributed to computer science, psychology, philosophy, and statistics.

Herb Simon and Jan Zytkow were excellent role models whom we should all aim to emulate.


Data Mining vs. Scientific Discovery

There are two computational paradigms for discovering explicit knowledge from data:

Data mining generates knowledge cast as decision trees, logical rules, or other notations invented by AI researchers;
Computational scientific discovery instead uses equations, structural models, reaction pathways, or other formalisms invented by scientists and engineers.

Both approaches draw on heuristic search to find regularities in data, but they differ considerably in their emphases.


Time Line for Research on Computational Scientific Discovery

[Figure: time line from 1979 to 2000 of computational discovery systems, including Dendral, AM, Bacon.1–Bacon.5, Glauber, NGlauber, Stahl, Stahlp, Dalton, Revolver, Abacus, Coper, Fahrenheit, E*, Hume, ARC, IDSN, IDSQ, Live, IE, Coast, Phineas, AbE, Kekada, Gell-Mann, BR-3, BR-4, Mendel, Pauli, Mechem, CDP, Astra, GPM, GPN, DST, Tetrad, LaGrange, LaGramge, SDS, SSF, RF5, RL, Progol, and HR. Legend: numeric laws, qualitative laws, structural models, process models.]


Why Are Process Models Interesting?

Process models are an important alternative to the formalisms currently used in machine learning.

Process models are a crucial target for machine learning because:

they incorporate scientific formalisms rather than AI notations,
  which are easily communicable to scientists and engineers;
they move beyond descriptive generalization to explanation,
  while retaining the modularity needed to support induction.

These reasons point to process models as an ideal representation for scientific and engineering knowledge.


Challenges of Inductive Process Modeling

Process model induction differs from typical learning tasks in that:

process models characterize the behavior of dynamical systems;
variables are mainly continuous and data are unsupervised;
observations are not independently and identically distributed;
process models contain unobservable processes and variables;
multiple processes can interact to produce complex behavior.

Compensating factors include a focus on deterministic systems and the availability of background knowledge.


Making Predictions with Process Models

Specify initial values for input variables and the size for time steps
On each time step, check conditions to decide which processes are active
Solve algebraic and differential equations with known values
Propagate values and recurse to solve other equations
Add the effects of different processes on each variable
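The prediction procedure above can be sketched as a small general-purpose simulator. This is a minimal illustration assuming forward-Euler integration, processes encoded as Python callables, and algebraic equations that declare their input variables so values can be propagated in dependency order; the `Process` class and its fields are hypothetical, not the representation used by the authors' system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Fn = Callable[[State], float]

@dataclass
class Process:
    name: str
    condition: Callable[[State], bool] = lambda s: True
    # algebraic target -> (names of inputs it needs, function computing its contribution)
    algebraic: Dict[str, Tuple[Tuple[str, ...], Fn]] = field(default_factory=dict)
    # differential target -> function computing this process's term in d[target,t,1]
    derivatives: Dict[str, Fn] = field(default_factory=dict)

def step(processes: List[Process], state: State, dt: float) -> State:
    # Check conditions to decide which processes are active on this time step.
    active = [p for p in processes if p.condition(state)]
    # Gather algebraic equations; several processes may define the same target.
    pending: Dict[str, List[Tuple[Tuple[str, ...], Fn]]] = {}
    for p in active:
        for target, eq in p.algebraic.items():
            pending.setdefault(target, []).append(eq)
    known = {v: x for v, x in state.items() if v not in pending}
    # Solve equations whose inputs are known, then propagate and repeat.
    while pending:
        ready = [t for t, eqs in pending.items()
                 if all(v in known for inputs, _ in eqs for v in inputs)]
        if not ready:
            raise ValueError("algebraic equations are cyclic or depend on unknown values")
        for t in ready:
            known[t] = sum(fn(known) for _, fn in pending.pop(t))  # add process effects
    # Add the effects of the active processes on each derivative, then take an Euler step.
    rates: State = {}
    for p in active:
        for v, fn in p.derivatives.items():
            rates[v] = rates.get(v, 0.0) + fn(known)
    return {v: x + dt * rates.get(v, 0.0) for v, x in known.items()}

def simulate(processes: List[Process], state: State, dt: float, steps: int) -> List[State]:
    """Run step() repeatedly from the initial state; returns the whole trajectory."""
    trajectory = [dict(state)]
    for _ in range(steps):
        state = step(processes, state, dt)
        trajectory.append(state)
    return trajectory
```

A toy usage: `Process("gain", algebraic={"y": (("x",), lambda s: 2.0 * s["x"])})`, `Process("decay", derivatives={"x": lambda s: -0.5 * s["y"]})`, and a conditional `Process("growth", condition=lambda s: s["x"] > 0.5, derivatives={"x": lambda s: 0.1 * s["x"]})` show an algebraic value feeding a differential equation and two processes summing their effects on the same variable.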


Predictions from IPM’s Induced Model


Inductive Process Modeling

training data: Observed values for a set of continuous variables as they vary over time or situations
background knowledge: Generic processes that characterize causal relationships among variables in terms of conditional equations

Induction transforms these two inputs into:

learned model: A specific process model that explains the observed values and predicts future data accurately


Inductive Process Modeling as Search

To construct a quantitative process model, we need an algorithm that searches the space of candidate models. Such a search requires:

an initial state from which to start search;
some operators that generate new states;
an evaluation function that selects among states;
an overall control regime for the search; and
a halting criterion for ending the search.

We have implemented a four-stage method that takes positions on these design decisions.
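The five design decisions listed above can be seen as the parameters of a generic search procedure. The sketch below makes one illustrative set of choices (a fixed-width beam as the control regime, and a depth limit plus "no improvement" as the halting criterion); it is not the control regime IPM uses, and the function name is hypothetical.

```python
import heapq
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")

def beam_search(initial: S,
                operators: Callable[[S], Iterable[S]],
                score: Callable[[S], float],
                beam_width: int,
                max_depth: int) -> S:
    """Return the best state found; lower scores are better."""
    beam = [initial]            # initial state
    best = initial
    for _ in range(max_depth):  # halting criterion 1: depth limit
        # operators generate successor states from every state in the beam
        children = [child for state in beam for child in operators(state)]
        if not children:
            break
        # evaluation function + control regime: keep the beam_width best children
        beam = heapq.nsmallest(beam_width, children, key=score)
        # halting criterion 2: stop when the best child does not improve on the best so far
        if score(beam[0]) >= score(best):
            break
        best = beam[0]
    return best
```

As a toy check, `beam_search(0, lambda x: [x + 1, x + 2], lambda x: abs(x - 7), beam_width=2, max_depth=10)` returns 7. For model induction, states would be candidate model structures, operators would add instantiated processes, and the score would be the error of the best-fitting parameterization.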


The IPM Method for Process Model Induction

Find all ways to instantiate known generic processes with specific variables
Combine subsets of instantiated processes into generic models
Remove candidates that are too complex or whose process graphs are not connected
For each generic model, search for good parameter values
Return the parameterized model with the smallest error
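The stages above can be illustrated end to end on a toy problem with a single variable x. This sketch is a simplification under stated assumptions: the three generic processes (`decay`, `inflow`, `growth`) are hypothetical, instantiation is trivial because there is only one variable, the connectivity check is omitted, and a coarse grid search stands in for IPM's parameter search. The training data are synthetic, generated by the sketch itself.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple

# Hypothetical generic processes for one variable x: name -> contribution to d[x,t,1].
GENERIC = {
    "decay": lambda x, p: -p * x,
    "inflow": lambda x, p: p,
    "growth": lambda x, p: p * x,
}
GRID = [0.25, 0.5, 0.75, 1.0]  # coarse parameter grid (an assumption)

def simulate(procs: Sequence[str], params: Sequence[float],
             x0: float = 1.0, dt: float = 0.1, steps: int = 50) -> List[float]:
    """Forward-Euler trajectory of x, summing the effects of all processes."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        x += dt * sum(GENERIC[g](x, p) for g, p in zip(procs, params))
        xs.append(x)
    return xs

def sse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))

def induce(data: Sequence[float], max_processes: int = 2) -> Tuple[float, tuple, tuple]:
    """Return (error, process names, parameter values) of the best model found."""
    best = (float("inf"), (), ())
    names = sorted(GENERIC)  # stage 1 is trivial: each process binds to x
    for k in range(1, max_processes + 1):  # stages 2-3: combine, limit complexity
        for procs in combinations(names, k):
            for params in product(GRID, repeat=k):  # stage 4: parameter search
                err = sse(simulate(procs, params), data)
                if err < best[0]:
                    best = (err, procs, params)
    return best  # stage 5: smallest error
```

On data produced by `simulate(("decay", "inflow"), (0.5, 1.0))`, `induce` recovers that structure and those parameters with zero error. Exhaustive enumeration is feasible here only because the space is tiny; the complexity limit in stage 3 is what keeps it bounded in general.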


A Biologist’s Depiction of Photosynthesis

[Figure source: http://www.bio.ic.ac.uk/research/barber/photosystemII.html]


Predictions from Best Parameterized Model


The NPPc Portion of CASA

$$
\begin{aligned}
\text{NPPc} &= \sum_{\text{month}} \max(E \cdot \text{IPAR},\ 0) \\
E &= 0.56 \cdot T_1 \cdot T_2 \cdot W \\
T_1 &= 0.8 + 0.02 \cdot \text{Topt} - 0.0005 \cdot \text{Topt}^2 \\
T_2 &= \frac{1.18}{\left(1 + e^{0.2 \cdot (\text{Topt} - \text{Tempc} - 10)}\right) \cdot \left(1 + e^{0.3 \cdot (\text{Tempc} - \text{Topt} - 10)}\right)} \\
W &= 0.5 + 0.5 \cdot \text{EET} / \text{PET} \\
\text{PET} &= \begin{cases} 1.6 \cdot (10 \cdot \text{Tempc} / \text{AHI})^{A} \cdot \text{PET-TW-M} & \text{if Tempc} > 0 \\ 0 & \text{if Tempc} < 0 \end{cases} \\
A &= 0.00000068 \cdot \text{AHI}^3 - 0.000077 \cdot \text{AHI}^2 + 0.018 \cdot \text{AHI} + 0.49 \\
\text{IPAR} &= 0.5 \cdot \text{FPAR-FAS} \cdot \text{Monthly-Solar} \cdot \text{Sol-Conver} \\
\text{FPAR-FAS} &= \min\left[\frac{\text{SR-FAS} - 1.08}{\text{SR}(\text{UMD-VEG})},\ 0.95\right] \\
\text{SR-FAS} &= \frac{\text{Mon-FAS-NDVI} + 1000}{\text{Mon-FAS-NDVI} - 1000}
\end{aligned}
$$
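The NPPc equations translate directly into small functions, which makes the nesting of the temperature, water-stress, and light terms easier to follow. In this sketch FPAR-FAS is passed in rather than computed from NDVI, W assumes PET > 0 (the equations do not say how PET = 0 is handled), and all function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

def t1(topt: float) -> float:
    return 0.8 + 0.02 * topt - 0.0005 * topt ** 2

def t2(topt: float, tempc: float) -> float:
    return 1.18 / ((1 + math.exp(0.2 * (topt - tempc - 10)))
                   * (1 + math.exp(0.3 * (tempc - topt - 10))))

def exponent_a(ahi: float) -> float:
    return 0.00000068 * ahi ** 3 - 0.000077 * ahi ** 2 + 0.018 * ahi + 0.49

def pet(tempc: float, ahi: float, pet_tw_m: float) -> float:
    if tempc <= 0:  # PET = 0 below freezing; at exactly 0 the formula also gives 0
        return 0.0
    return 1.6 * (10 * tempc / ahi) ** exponent_a(ahi) * pet_tw_m

def water_stress(eet: float, pet_value: float) -> float:
    return 0.5 + 0.5 * eet / pet_value  # assumes pet_value > 0

def efficiency(topt: float, tempc: float, eet: float, pet_value: float) -> float:
    return 0.56 * t1(topt) * t2(topt, tempc) * water_stress(eet, pet_value)

def ipar(fpar_fas: float, monthly_solar: float, sol_conver: float) -> float:
    return 0.5 * fpar_fas * monthly_solar * sol_conver

def nppc(monthly: Iterable[Tuple[float, float]]) -> float:
    """Sum max(E * IPAR, 0) over (E, IPAR) pairs, one pair per month."""
    return sum(max(e * i, 0.0) for e, i in monthly)
```

Writing the model this way also exposes its structure as a set of interacting processes: each function corresponds to one equation that a revision method could target individually.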


A Process Model for an Aquatic Ecosystem

model AquaticEcosystem;
  variables phyto, zoo, nitro, residue;
  observables phyto, nitro;

process phyto_exponential_decay;
  equations d[phyto,t,1] = -0.307 * phyto;
            d[residue,t,1] = 0.307 * phyto;

process zoo_exponential_decay;
  equations d[zoo,t,1] = -0.251 * zoo;
            d[residue,t,1] = 0.251;

process zoo_phyto_predation;
  equations d[zoo,t,1] = 0.615 * 0.495 * zoo;
            d[residue,t,1] = 0.385 * 0.495 * zoo;
            d[phyto,t,1] = -0.495 * zoo;

process nitro_uptake;
  conditions nitro > 1.25;
  equations d[phyto,t,1] = 0.411 * phyto;
            d[nitro,t,1] = -0.098 * 0.411 * phyto;

process nitro_remineralization;
  equations d[nitro,t,1] = 0.005 * residue;
            d[residue,t,1] = -0.005 * residue;
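The aquatic ecosystem model can be simulated the same way, with each process adding its terms to the derivatives of the variables it touches. This sketch assumes negative signs on the loss terms (phyto and zoo decay, predation on phyto, nitrogen uptake, and residue remineralization) and copies the zoo-decay residue term exactly as printed, as a constant 0.251 with no zoo factor. Function names and the forward-Euler integrator are illustrative choices.

```python
from typing import Dict, List

State = Dict[str, float]

def derivatives(s: State) -> State:
    """Sum each active process's contribution to every derivative."""
    phyto, zoo, nitro, residue = s["phyto"], s["zoo"], s["nitro"], s["residue"]
    d = {"phyto": 0.0, "zoo": 0.0, "nitro": 0.0, "residue": 0.0}
    # phyto_exponential_decay
    d["phyto"] += -0.307 * phyto
    d["residue"] += 0.307 * phyto
    # zoo_exponential_decay (residue term copied as printed: a constant)
    d["zoo"] += -0.251 * zoo
    d["residue"] += 0.251
    # zoo_phyto_predation
    d["zoo"] += 0.615 * 0.495 * zoo
    d["residue"] += 0.385 * 0.495 * zoo
    d["phyto"] += -0.495 * zoo
    # nitro_uptake, active only while nitro > 1.25
    if nitro > 1.25:
        d["phyto"] += 0.411 * phyto
        d["nitro"] += -0.098 * 0.411 * phyto
    # nitro_remineralization
    d["nitro"] += 0.005 * residue
    d["residue"] += -0.005 * residue
    return d

def simulate(s0: State, dt: float, steps: int) -> List[State]:
    """Forward-Euler trajectory, including the initial state."""
    trajectory = [dict(s0)]
    s = dict(s0)
    for _ in range(steps):
        d = derivatives(s)
        s = {v: s[v] + dt * d[v] for v in d}
        trajectory.append(s)
    return trajectory
```

Only phyto and nitro are observable, so fitting this model means comparing those two columns of the simulated trajectory with data while zoo and residue remain hidden.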