
Introduction to Scientific Modeling CS 365 Fall 2012

Review for Midterm

Topics
•  What is a model?
   –  Styles of modeling
      •  Aggregate vs. individual models
      •  State-based vs. process-based
      •  Deterministic vs. probabilistic
   –  How do we evaluate models?
   –  Case studies
•  Cellular automata
•  Power laws and data analysis
   –  Statistical distributions
   –  Testing for power laws and other distributions
   –  Maximum likelihood estimates
   –  Power laws in nature
   –  Mechanisms that generate power laws
•  Simulation and modeling
   –  Discrete-time vs. discrete-event
   –  Monte Carlo methods
   –  Pseudo-random number generators

Modeling

•  How do we use models?
•  How do we evaluate models?
•  Different approaches to modeling
•  Examples of different kinds of models and what they are used for
•  Pros and cons of different modeling methods
•  Limitations of modeling

Examples

•  Blueprint of a bridge
•  2-dimensional projection of a 3-dimensional image
•  Crash dummies (model humans)
•  Lotka-Volterra equations
•  Forest fire simulation
•  Prisoner’s Dilemma
•  Case studies

How do we evaluate a model?
•  Parsimony and simplicity
   –  Occam’s Razor (select the competing hypothesis that makes the fewest new assumptions, when the hypotheses are equal in other respects)
•  Accuracy of predictions
   –  R² and other statistical tests
•  Dynamical model works as claimed:
   –  Run it. E.g., patented devices.
•  Cogency and relevance of ideas that they produce.
•  Falsifiability.
•  Consistency: formalize notion of model as a homomorphic map.
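For the "accuracy of predictions" criterion, here is a minimal sketch of R², assuming the usual definition R² = 1 − SS_res / SS_tot (residual sum of squares over total sum of squares). The helper name `r_squared` and the sample numbers are hypothetical, chosen only for illustration.

```python
def r_squared(observed, predicted):
    """Hypothetical helper: coefficient of determination 1 - SS_res / SS_tot."""
    n = len(observed)
    mean = sum(observed) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))  # model error
    ss_tot = sum((y - mean) ** 2 for y in observed)                  # variance around the mean
    return 1.0 - ss_res / ss_tot

obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(r_squared(obs, pred))   # about 0.98
```

A value near 1 means the model explains most of the variation in the observations; it says nothing about parsimony or falsifiability, which is why R² is only one of the criteria listed.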

Common Modeling Assumptions

•  Homogeneity (all agents are identical / stateless)
•  Equilibrium (no or very simple dynamics)
•  Random mixing
•  No feedback (learning)
•  Deterministic
•  No connection between micro and macro phenomena
•  Models with these assumptions can produce some interesting features, e.g., tipping points (R0).
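To illustrate a tipping point in a model that makes all of these assumptions, the sketch below uses the classic SIR epidemic model (an assumption: the slide does not name a specific model). Everyone is identical, mixing is random (mass action), and dynamics are deterministic; the basic reproduction number is R0 = beta / gamma. The function name and parameter values are hypothetical.

```python
def sir_final_size(beta, gamma=1.0, i0=1e-4, dt=0.01, t_max=500.0):
    """Hypothetical helper: Euler-integrate a homogeneous, randomly mixed,
    deterministic SIR model (population fractions) and return the
    fraction of the population ever infected."""
    s, i, r = 1.0 - i0, i0, 0.0
    for _ in range(int(t_max / dt)):
        new_infections = beta * s * i * dt   # mass action encodes random mixing
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return r + i

for r0 in (0.5, 0.9, 1.1, 1.5, 2.0):
    # With gamma = 1, R0 = beta / gamma is simply beta.
    print(f"R0 = {r0}: final size = {sir_final_size(beta=r0):.4f}")
```

Below R0 = 1 the infection dies out after touching a tiny fraction of the population; just above 1 it reaches a sizable fraction, which is the threshold behavior the slide calls a tipping point.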

Features of Complex Systems

•  Heterogeneous agents
•  Non-equilibrium (non-linear dynamics)
•  Contact structure (networks, nonrandom mixing)
•  Learning / Feedback (agents can change behavior)
•  Stochastic behavior (interesting behavior in the tails)
•  Emergence (multi-scale phenomena)

Modeling complex systems is challenging

•  Closed form solutions rarely exist:
   –  Features from previous slide
•  Detailed simulations are problematic:
   –  Can never hope to get all the details correct.
   –  Because systems are nonlinear, small errors can have large consequences.
•  Evolution is key:
   –  Basic components change over time.
   –  Individual variants matter (hard to do theory).
•  Discreteness (e.g., time, state spaces, and internal variable values).
   –  Techniques developed to study nonlinear systems are not always directly applicable.
•  Spatial heterogeneity.
•  Classical ODE (ordinary differential equation) assumptions:
   –  “Well stirred” (each particle-particle interaction is equally likely).
   –  Infinite-sized populations.
   –  Spatial homogeneity.
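The point that small errors can have large consequences in nonlinear systems can be seen in a few lines. This sketch uses the logistic map x → r·x·(1 − x) at r = 4, a standard chaotic example chosen here as an assumption (it is not from the slide); the helper name is hypothetical.

```python
def logistic_trajectory(x0, r=4.0, steps=50):
    """Hypothetical helper: iterate the logistic map x -> r * x * (1 - x)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)   # an initial "measurement error" of 1e-10
for t in (0, 10, 20, 30, 40, 50):
    print(t, abs(a[t] - b[t]))
```

At r = 4 the gap roughly doubles each step on average, so an error of 1e-10 becomes order 1 after a few dozen steps; a detailed simulation with slightly wrong inputs can therefore end up anywhere on the attractor.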

Classes of Scientific Models

•  Continuous vs. Discrete
   –  E.g., Differential equation vs. Cellular automaton
•  Deterministic vs. Probabilistic
   –  Dynamical system vs. Markov chain
   –  Cellular automaton vs. genetic algorithm
•  Spatial vs. nonspatial
•  Data-driven vs. theory-driven
   –  Bayesian networks vs. expert system

Aggregate Models: Differential Equations

•  Represent how a process changes through time as a differential (difference) equation:
   –  Time is continuous (discrete)
   –  Model components are continuous (density)
   –  Deterministic
   –  Nonspatial (in simplest case)
•  Describes the global behavior of a system
•  Averages out individual differences (stateless)
•  Assumes infinite-sized populations of model components
   –  E.g., assume all possible genotypes always present in population.
•  Easier to do theory and make quantitative predictions.
•  Examples:
   –  Maxwell’s equations
   –  Mackey-Glass systems
   –  Lotka-Volterra systems
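As an example of an aggregate model, the Lotka-Volterra predator-prey equations dx/dt = αx − βxy and dy/dt = δxy − γy track population densities, not individuals. The sketch below integrates them with classical 4th-order Runge-Kutta (a numerical method chosen here, not prescribed by the slide); the parameter values and function names are hypothetical. It also checks the quantity V = δx − γ ln x + βy − α ln y, which is constant along exact orbits.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
ALPHA, BETA, DELTA, GAMMA = 1.0, 0.1, 0.075, 1.5

def derivatives(x, y):
    """dx/dt = alpha*x - beta*x*y (prey); dy/dt = delta*x*y - gamma*y (predators)."""
    return ALPHA * x - BETA * x * y, DELTA * x * y - GAMMA * y

def simulate(x, y, dt=0.01, steps=2000):
    """Integrate with the classical 4th-order Runge-Kutta method."""
    trajectory = [(x, y)]
    for _ in range(steps):
        k1 = derivatives(x, y)
        k2 = derivatives(x + dt / 2 * k1[0], y + dt / 2 * k1[1])
        k3 = derivatives(x + dt / 2 * k2[0], y + dt / 2 * k2[1])
        k4 = derivatives(x + dt * k3[0], y + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        trajectory.append((x, y))
    return trajectory

def conserved_quantity(x, y):
    """V = delta*x - gamma*ln x + beta*y - alpha*ln y; constant on exact orbits."""
    return DELTA * x - GAMMA * math.log(x) + BETA * y - ALPHA * math.log(y)

traj = simulate(10.0, 5.0)
print(traj[-1], conserved_quantity(*traj[0]), conserved_quantity(*traj[-1]))
```

The populations cycle around the equilibrium (γ/δ, α/β) = (20, 10) instead of settling down; near-constant V is a quick check that the integrator is not drifting.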

Agent-based Models (ABM): Computational / Individual-based / Particle

•  A computational artifact that captures essential components and interactions (i.e., a computer program).
•  Encodes a theory about relevant mechanisms:
   –  Want relevant behavior to arise spontaneously as a consequence of the mechanisms. The mechanisms give rise to macro-properties without being built in from the beginning.
   –  This is a very different kind of explanation than simply predicting what will happen next.
   –  Example: Cooperation emerges from Iterated Prisoner’s Dilemma model.
   –  Simulation as a basic tool.
   –  Observe distribution of outcomes.
•  Study the behavior of the artifact, using theory and simulation:
   –  To understand its intrinsic properties, and with respect to the modeled system.
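The Iterated Prisoner’s Dilemma example can be sketched as an "ecological" tournament in the style of Axelrod: strategies play repeated games, and each strategy's population share grows in proportion to its average score. The payoffs (T=5, R=3, P=1, S=0), the three strategies, and all function names are assumptions for illustration, not the course's specific model.

```python
# Assumed payoffs (T=5, R=3, P=1, S=0); entry is (payoff to row, payoff to column).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def all_c(mine, theirs):
    return "C"

def all_d(mine, theirs):
    return "D"

def tit_for_tat(mine, theirs):
    # Cooperate first, then copy the opponent's previous move.
    return theirs[-1] if theirs else "C"

def match_score(s1, s2, rounds=200):
    """Total payoff earned by strategy s1 in an iterated game against s2."""
    h1, h2, total = [], [], 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        total += PAYOFF[(m1, m2)][0]
        h1.append(m1)
        h2.append(m2)
    return total

def ecological_run(strategies, generations=100):
    """Each generation, a strategy's share grows in proportion to its
    average score against the current population mix."""
    n = len(strategies)
    scores = [[match_score(a, b) for b in strategies] for a in strategies]
    shares = [1.0 / n] * n
    for _ in range(generations):
        fitness = [sum(scores[i][j] * shares[j] for j in range(n)) for i in range(n)]
        mean = sum(f * s for f, s in zip(fitness, shares))
        shares = [s * f / mean for s, f in zip(shares, fitness)]
    return shares

final = ecological_run([all_c, all_d, tit_for_tat])
for name, share in zip(["ALL-C", "ALL-D", "TFT"], final):
    print(f"{name}: {share:.3f}")
```

Defectors first profit from unconditional cooperators, but once those are depleted they cannot exploit reciprocators, and tit-for-tat comes to dominate. This outcome depends on the starting mix and payoffs, which is why the slide stresses observing the distribution of outcomes rather than a single run.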

Examples

•  Cellular automata
•  Genetic algorithms
•  Digital immune systems
•  Sugarscape
•  Prisoner’s Dilemma Tournament

Agent-based models

Strengths
•  Can address problems that are fundamental to many disciplines:
   –  Path dependency
   –  Effects of adaptive versus rational behavior
   –  Effects of network structure
   –  Cooperation among egoists
   –  Diffusion of innovation
   –  Tradeoff between exploitation and exploration
   –  Generalism vs. specialism
•  Facilitate interdisciplinary collaboration:
   –  “A prosthesis for interaction”
•  Useful tool when closed form mathematical analyses are intractable.
   –  E.g., the evolution of sex
•  Can reveal unity across disciplines.
•  Can be a “hard sell”:
   –  Realism vs. clarity

Limitations
•  Correspondence problem:
   –  What does each primitive component in the model correspond to in the real system?
•  What questions can they answer? (qualitative predictions, critical regions of parameter space)
•  How to interpret results?
   –  Can’t look at a single run.
   –  Many contingent behaviors; macro-statistics don’t tell the entire story.
•  Scaling issues (e.g., time, error rates, population sizes).
•  The mechanistic theory encoded by the model cannot always be stated cleanly.

Models as Homomorphic Maps: Commutativity of the Diagram

[Diagram: World at time t → (laws t) → World at time t + 1; Model at time t → (algorithm t1) → Model at time t + 1; M maps world states to model states.]

M is an equivalence relation.

Model M is valid if this is a homomorphic map:

$$M(t(x)) = t_1(M(x))$$
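The commutativity condition M(t(x)) = t1(M(x)) can be checked mechanically on a toy system. In this sketch, which is entirely hypothetical, the "world" is an integer that grows by 3 each step (laws t), the model M keeps only its parity, and the model algorithm t1 flips parity. A second helper tries to construct t1 from the world dynamics and fails when world states that M lumps into the same class evolve into different classes.

```python
def laws(x):
    """Hypothetical world dynamics t: the world state advances by 3."""
    return x + 3

def parity(x):
    """Hypothetical model M: keep only whether the state is even (0) or odd (1)."""
    return x % 2

def flip(m):
    """Hypothetical model algorithm t1: adding 3 always changes parity."""
    return 1 - m

def commutes(M, t, t1, states):
    """True if M(t(x)) == t1(M(x)) for every world state x tested."""
    return all(M(t(x)) == t1(M(x)) for x in states)

def induced_algorithm(M, t, states):
    """Build the only t1 that could make the diagram commute, or return None
    if two world states with the same model image evolve to different model states."""
    table = {}
    for x in states:
        m, m_next = M(x), M(t(x))
        if table.setdefault(m, m_next) != m_next:
            return None
    return table

print(commutes(parity, laws, flip, range(100)))                # True
print(induced_algorithm(parity, laws, range(100)))             # {0: 1, 1: 0}
print(induced_algorithm(lambda x: x // 10, laws, range(100)))  # None
```

The coarse-graining x // 10 fails because 0 and 7 land in the same class but step to 3 and 10, which land in different classes; no model algorithm can make that diagram commute.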

Cellular Automata

•  1-D and 2-D
•  Space-time plots
•  Neighborhood, update rules
•  Wolfram’s classification and dynamical regimes
•  Forest fire model and the game of Life
•  Spatial Prisoner’s Dilemma
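A 1-D elementary cellular automaton fits in a few lines. This sketch assumes a radius-1 neighborhood (left, self, right), Wolfram's rule numbering (the new state is bit 4·left + 2·center + right of the rule number), and periodic boundaries; the function names are hypothetical. Printing successive rows gives a space-time plot.

```python
def step(cells, rule):
    """One synchronous update of an elementary CA with periodic boundaries."""
    n = len(cells)
    out = []
    for i in range(n):
        # cells[i - 1] wraps to the last cell when i == 0.
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right
        out.append((rule >> index) & 1)
    return out

def run(rule, width=31, steps=15):
    """Start from a single live cell and return every row of the space-time plot."""
    cells = [0] * width
    cells[width // 2] = 1
    rows = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        rows.append(cells)
    return rows

for row in run(90):
    print("".join("#" if c else "." for c in row))
```

Rule 90 (each cell becomes the XOR of its neighbors) draws a Sierpinski-triangle pattern; swapping in rule 30 gives the irregular, random-looking behavior often used to illustrate Wolfram's classes.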

Power Laws and Scaling

•  What is a power law?
•  Why is it important?
•  How do power laws arise?
•  How are power laws related to scaling?
•  How do I know if my data shows a power law?
•  Fitting curves to data and testing for significance.

Power Law Distribution
•  Polynomial: $p(x) = a x^{b}$
•  Scale invariant: $p(cx) = a(cx)^{b} = c^{b} p(x) \propto p(x)$
•  Distribution can range over many orders of magnitude
   –  Ratio of largest to smallest sample
•  Plotted on log-log axes: $\log p(x) = \log(a x^{b}) = b \log x + \log a$
   –  Slope of line gives scaling exponent
   –  Y-intercept gives the constant
•  Heavy tailed (right skewed)
•  Universality
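Both properties above can be checked numerically for p(x) = a·x^b. In this sketch the constants a = 2 and b = −2.5 are hypothetical: the ratio p(cx)/p(x) comes out as c^b no matter which x is used, and the slope and intercept of the log-log line recover b and log a.

```python
import math

A, B = 2.0, -2.5   # hypothetical constants for p(x) = a * x**b

def p(x):
    return A * x ** B

# Scale invariance: p(c*x) / p(x) = c**b, whatever x is.
c = 3.0
ratios = [p(c * x) / p(x) for x in (1.0, 10.0, 100.0)]
print(ratios, c ** B)

# On log-log axes the curve is a straight line: slope b, intercept log a.
x1, x2 = 2.0, 50.0
slope = (math.log(p(x2)) - math.log(p(x1))) / (math.log(x2) - math.log(x1))
intercept = math.log(p(x1)) - slope * math.log(x1)
print(slope, math.exp(intercept))   # approximately -2.5 and 2.0
```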

Statistical Distributions: Why are normal, exponential, and power-law distributions important?

•  Normal distributions:
   –  Additive
•  Exponential distributions:
   –  aka single-scale
   –  Have form $P(x) \propto e^{-ax}$
   –  Use Gaussian to approximate exponential because differentiable at 0.
   –  Plot on log-linear scale to see straight line.
•  Power-law distributions:
   –  aka scale-free or polynomial
   –  Have form $P(x) \propto x^{-a}$
   –  Fat tail is associated with power law because it decays more slowly than an exponential.
   –  Plot on log-log scale to see straight line.

Normal Distribution

•  Unimodal: bell curve
•  Central Limit Theorem: the mean of a large number of random variables independently drawn from the same distribution is distributed approximately normally, regardless of the form of the original distribution (provided it has finite variance).
•  Widely applicable when phenomena are additively related
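A quick simulation illustrates the Central Limit Theorem. The sketch draws from an exponential distribution (chosen here as an assumption because it is strongly skewed) and shows that means of 50 draws cluster around the true mean with spread close to σ/√n, with about 95% of them within 1.96 standard errors, as a normal distribution would predict. The helper name and seed are hypothetical.

```python
import random
import statistics

def sample_means(n_vars, n_samples, rng):
    """Hypothetical helper: n_samples means, each over n_vars Exponential(1) draws."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n_vars))
            for _ in range(n_samples)]

rng = random.Random(365)   # fixed seed for reproducibility
means = sample_means(n_vars=50, n_samples=5000, rng=rng)

# Exponential(1) has mean 1 and standard deviation 1, so the CLT predicts
# means that are approximately Normal(1, 1/sqrt(50)).
mu, sd = statistics.fmean(means), statistics.stdev(means)
within = sum(abs(m - 1.0) < 1.96 / 50 ** 0.5 for m in means) / len(means)
print(mu, sd, within)   # within should be close to 0.95
```

The finite-variance condition matters for this section: a power law with exponent b ≤ 3 has infinite variance, and its sample means do not converge to a normal distribution in this way.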

Measuring Power Laws
•  Plot histogram of samples on log-log axes (a):
   –  Test for linear form of data on plot
   –  Measure slope of best-fit line to determine scaling exponent
   –  Maximum likelihood estimate
•  Problem: noise in right-hand side of distribution (b)
   –  Each bin on the right-hand side of plot has few samples
   –  Correct with logarithmic binning (c)
   –  Divide number of samples in each bin by width of bin (count per unit interval of x)
•  Cumulative distribution function (d)
   –  CCDF: P(x) is the probability that the variable has a value greater than or equal to x (1 − CDF)
   –  Also follows a power law, but with the exponent b − 1
   –  No need to use logarithmic binning
   –  Sometimes called rank/frequency plots

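•  For power laws, with $p(y) = a\,y^{-b}$ and $b > 1$:

$$P(x) = \int_x^\infty p(y)\,dy = a\int_x^\infty y^{-b}\,dy = \left[\frac{-a}{b-1}\,y^{-(b-1)}\right]_x^\infty = 0 - \frac{a\,x^{-(b-1)}}{-(b-1)} = \frac{a}{b-1}\,x^{-(b-1)}$$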

[Figure, panels (a) to (d): 1 million random numbers, with b = 2.5. Source: M. Newman, Power laws, Pareto distributions and Zipf’s Law (2006).]
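The measurement steps above can be sketched in code. Assuming a continuous power law p(x) ∝ x^(−b) for x ≥ x_min, the CCDF is P(X ≥ x) = (x / x_min)^(−(b−1)), so samples can be drawn by inverse-transform sampling, and the maximum likelihood estimate is b̂ = 1 + n / Σ ln(x_i / x_min). The function names, seed, and sample size (100,000 rather than 1 million, for speed) are hypothetical choices.

```python
import math
import random

def sample_power_law(n, b, rng, x_min=1.0):
    """Inverse-transform sampling from P(X >= x) = (x / x_min) ** -(b - 1)."""
    # 1 - rng.random() lies in (0, 1], so the power is always finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (b - 1.0)) for _ in range(n)]

def mle_exponent(samples, x_min=1.0):
    """Maximum likelihood estimate of b for continuous data with x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

def empirical_ccdf(samples):
    """(x, fraction of samples >= x) pairs; no binning needed."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (n - i) / n) for i, x in enumerate(xs)]

rng = random.Random(2012)
data = sample_power_law(100_000, b=2.5, rng=rng)
print(mle_exponent(data))   # close to 2.5
```

Plotting `empirical_ccdf(data)` on log-log axes gives a straight line with slope close to −(b − 1) = −1.5, matching the derivation above, and avoids the noisy right-hand bins of the raw histogram.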

Linear Empirical Models
•  An empirical model is a function that captures the trend of observed data:
   –  It predicts but does not explain the system that produced the data.
•  A common technique is to fit a line through the data: $y = mx + b$
•  Assume Gaussian distributed errors.
   –  Note: For logged data, we assume that the errors are log-normally distributed.

Image downloaded from Wikipedia Sept. 11, 2007
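Fitting the line y = mx + b under the Gaussian-error assumption amounts to ordinary least squares, which has a closed form: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m·x̄. This sketch implements that formula; the helper name and data points are hypothetical.

```python
def fit_line(xs, ys):
    """Hypothetical helper: ordinary least-squares fit of y = m*x + b,
    i.e. the line minimizing the sum of squared vertical errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
print(fit_line(xs, ys))   # approximately (1.96, 1.1)
```

Minimizing squared error is the maximum likelihood choice when errors are Gaussian; applying the same fit to logged data implicitly assumes log-normal errors, as the note above says.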

Mechanisms for Producing Power Laws

•  Preferential attachment
•  Combinations of exponentials: $p(y) \approx e^{ay}$, $x \approx e^{by}$
•  Random walks
•  Phase transitions and critical phenomena
   –  Percolation, self-organized criticality, HOT
•  Centralized space-filling networks + invariant terminal units + optimal design
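The "combinations of exponentials" mechanism can be checked by simulation. If y is exponentially distributed, p(y) ∝ e^(−λy) (so a = −λ), and x = e^(by), then P(X ≥ x) = x^(−λ/b), a power law whose density exponent is 1 + λ/b = 1 − a/b. The sketch below draws y, transforms it, and re-estimates the exponent with the standard maximum likelihood formula (x_min = 1). Parameter values and the helper name are hypothetical.

```python
import math
import random

def combined_exponentials(n, lam, b, rng):
    """Hypothetical helper: y ~ Exponential(rate lam), x = e^(b*y).
    Then P(X >= x) = x ** (-lam / b), a power law with x_min = 1."""
    return [math.exp(b * rng.expovariate(lam)) for _ in range(n)]

rng = random.Random(7)
xs = combined_exponentials(100_000, lam=1.5, b=1.0, rng=rng)
alpha_hat = 1.0 + len(xs) / sum(math.log(x) for x in xs)   # MLE with x_min = 1
print(alpha_hat)   # theory: 1 + lam / b = 2.5
```

Neither ingredient is heavy-tailed on its own; the power law appears because an exponentially rare event is paired with an exponentially large outcome.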

Complex Networks

•  Degree distributions
•  Examples of scale-free networks
•  Properties of scale-free networks
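One way to see a scale-free degree distribution arise is to grow a network by preferential attachment. This sketch follows the Barabási-Albert scheme (a standard variant assumed here, not necessarily the one used in class): each new node links to m distinct existing nodes chosen with probability proportional to their degree, a model known to give P(k) ∝ k^(−3). The function name, seed, and sizes are hypothetical.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, m, rng):
    """Hypothetical helper: grow a graph where each new node links to m
    distinct existing nodes chosen with probability proportional to degree."""
    # Start from a small complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `targets` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw of a node.
    targets = [v for e in edges for v in e]
    for new in range(m + 1, n_nodes):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for old in chosen:
            edges.append((new, old))
            targets.extend((new, old))
    return edges

rng = random.Random(365)
edges = preferential_attachment(20_000, m=2, rng=rng)
degree = Counter(v for e in edges for v in e)
histogram = Counter(degree.values())   # k -> number of nodes with degree k
print(min(degree.values()), max(degree.values()))
```

Every node has degree at least m, yet a few early nodes collect hundreds of links; in a random graph with the same mean degree (4), no node would come close. Plotting `histogram` on log-log axes shows the roughly straight line expected of a power law.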
