Introduction to Scientific Modeling CS 365 Fall 2012
Review for Midterm
Topics
• What is a model?
  – Styles of modeling
    • Aggregate vs. individual models
    • State-based vs. process-based
    • Deterministic vs. probabilistic
  – How do we evaluate models?
  – Case studies
• Cellular automata
• Power laws and data analysis
  – Statistical distributions
  – Testing for power laws and other distributions
  – Maximum likelihood estimates
  – Power laws in nature
  – Mechanisms that generate power laws
• Simulation and modeling
  – Discrete-time vs. discrete-event
  – Monte Carlo methods
  – Pseudo-random number generators
Modeling
• How do we use models?
• How do we evaluate models?
• Different approaches to modeling
• Examples of different kinds of models and what they are used for
• Pros and cons of different modeling methods
• Limitations of modeling
Examples
• Blueprint of a bridge
• 2-dimensional projection of a 3-dimensional image
• Crash dummies (model humans)
• Lotka-Volterra equations
• Forest fire simulation
• Prisoner’s Dilemma
• Case studies
How do we evaluate a model?
• Parsimony and simplicity
  – Occam’s Razor (select the competing hypothesis that makes the fewest new assumptions, when the hypotheses are equal in other respects)
• Accuracy of predictions
  – $R^2$ and other statistical tests
• Dynamical model works as claimed:
  – Run it. E.g., patented devices.
• Cogency and relevance of the ideas that they produce.
• Falsifiability.
• Consistency: formalize the notion of a model as a homomorphic map.
Common Modeling Assumptions
• Homogeneity (all agents are identical / stateless)
• Equilibrium (no or very simple dynamics)
• Random mixing
• No feedback (learning)
• Deterministic
• No connection between micro and macro phenomena
• Models with these assumptions can produce some interesting features, e.g., tipping points ($R_0$).
Features of Complex Systems
• Heterogeneous agents
• Non-equilibrium (non-linear dynamics)
• Contact structure (networks, nonrandom mixing)
• Learning / Feedback (agents can change behavior)
• Stochastic behavior (interesting behavior in the tails)
• Emergence (multi-scale phenomena)
Modeling complex systems is challenging
• Closed form solutions rarely exist:
  – Features from previous slide
• Detailed simulations are problematic:
  – Can never hope to get all the details correct.
  – Because systems are nonlinear, small errors can have large consequences.
• Evolution is key:
  – Basic components change over time.
  – Individual variants matter (hard to do theory).
• Discreteness (e.g., time, state spaces, and internal variable values).
  – Techniques developed to study nonlinear systems are not always directly applicable.
• Spatial heterogeneity.
• Classical ODE (ordinary differential equation) assumptions:
  – “Well stirred” (each particle-particle interaction is equally likely).
  – Infinite-sized populations.
  – Spatial homogeneity.
Classes of Scientific Models
• Continuous vs. Discrete
  – E.g., Differential equation vs. Cellular automaton
• Deterministic vs. Probabilistic
  – Dynamical system vs. Markov chain
  – Cellular automaton vs. genetic algorithm
• Spatial vs. nonspatial
• Data-driven vs. theory-driven
  – Bayesian networks vs. expert system
Aggregate Models: Differential Equations
• Represent how a process changes through time as a differential (difference) equation:
  – Time is continuous (discrete)
  – Model components are continuous (density)
  – Deterministic
  – Nonspatial (in simplest case)
• Describes the global behavior of a system
• Averages out individual differences (stateless)
• Assumes infinite-sized populations of model components
  – E.g., assume all possible genotypes always present in population.
• Easier to do theory and make quantitative predictions.
• Examples:
  – Maxwell’s equations
  – Mackey-Glass systems
  – Lotka-Volterra systems
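As a rough illustration of an aggregate model, the sketch below integrates the Lotka-Volterra predator-prey equations with a fourth-order Runge-Kutta step. The parameter names (`alpha`, `beta`, `delta`, `gamma`), their values, the initial populations, and the choice of integrator are all assumptions made for this example; they do not come from the slides.

```python
import math

def derivs(prey, pred, alpha, beta, delta, gamma):
    """Lotka-Volterra right-hand side.
    d(prey)/dt = alpha*prey - beta*prey*pred
    d(pred)/dt = delta*prey*pred - gamma*pred
    """
    return alpha * prey - beta * prey * pred, delta * prey * pred - gamma * pred

def rk4_step(prey, pred, dt, params):
    """Advance the two densities by one classical Runge-Kutta step."""
    k1x, k1y = derivs(prey, pred, *params)
    k2x, k2y = derivs(prey + 0.5 * dt * k1x, pred + 0.5 * dt * k1y, *params)
    k3x, k3y = derivs(prey + 0.5 * dt * k2x, pred + 0.5 * dt * k2y, *params)
    k4x, k4y = derivs(prey + dt * k3x, pred + dt * k3y, *params)
    return (prey + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0,
            pred + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0)

def simulate(prey0, pred0, params, dt=0.01, steps=5000):
    """Return the list of (prey, predator) densities over time."""
    trajectory = [(prey0, pred0)]
    prey, pred = prey0, pred0
    for _ in range(steps):
        prey, pred = rk4_step(prey, pred, dt, params)
        trajectory.append((prey, pred))
    return trajectory

def invariant(prey, pred, alpha, beta, delta, gamma):
    """Quantity conserved along exact solutions; useful as an accuracy check."""
    return delta * prey - gamma * math.log(prey) + beta * pred - alpha * math.log(pred)

# Illustrative (assumed) parameters: alpha, beta, delta, gamma.
PARAMS = (1.0, 0.1, 0.075, 1.5)
trajectory = simulate(10.0, 5.0, PARAMS)
print("final densities:", trajectory[-1])
```

The state is a pair of continuous densities and the update is deterministic, which matches the assumptions listed above: no individuals, no space, no randomness.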
Agent-based Models (ABM): Computational / Individual-based / Particle
• A computational artifact that captures essential components and interactions (i.e., a computer program).
• Encodes a theory about relevant mechanisms:
  – Want relevant behavior to arise spontaneously as a consequence of the mechanisms. The mechanisms give rise to macro-properties without being built in from the beginning.
  – This is a very different kind of explanation than simply predicting what will happen next.
  – Example: Cooperation emerges from Iterated Prisoner’s Dilemma model.
  – Simulation as a basic tool.
  – Observe distribution of outcomes.
• Study the behavior of the artifact, using theory and simulation:
  – To understand its intrinsic properties, and with respect to the modeled system.
Examples
• Cellular automata
• Genetic algorithms
• Digital immune systems
• Sugarscape
• Prisoner’s Dilemma Tournament
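A minimal sketch of an Iterated Prisoner’s Dilemma match may make the agent-based style concrete. The payoff values (5, 3, 1, 0), the two strategies, and the function names are assumptions chosen for illustration; the slides do not specify them.

```python
# Payoffs as (row player, column player); the values are assumed for illustration.
PAYOFF = {
    ("C", "C"): (3, 3),   # mutual cooperation
    ("C", "D"): (0, 5),   # sucker's payoff vs. temptation
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),   # mutual defection
}

def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    """Defect unconditionally."""
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    """Play a fixed number of rounds and return both total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        payoff_a, payoff_b = PAYOFF[(move_a, move_b)]
        score_a += payoff_a
        score_b += payoff_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

print("TFT vs TFT:  ", play(tit_for_tat, tit_for_tat))
print("TFT vs ALLD: ", play(tit_for_tat, always_defect))
print("ALLD vs ALLD:", play(always_defect, always_defect))
```

Each agent carries state (its history) and a rule; the scores are an outcome of the interaction and are not written into either rule. A tournament would repeat `play` over every pair of strategies.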
Agent-based models
Strengths
• Can address problems that are fundamental to many disciplines:
  – Path dependency
  – Effects of adaptive versus rational behavior
  – Effects of network structure
  – Cooperation among egoists
  – Diffusion of innovation
  – Tradeoff between exploitation and exploration
  – Generalism vs. specialism
• Facilitate interdisciplinary collaboration:
  – “A prosthesis for interaction”
• Useful tool when closed form mathematical analyses are intractable.
  – E.g., the evolution of sex
• Can reveal unity across disciplines.
• Can be a “hard sell”:
  – Realism vs. clarity
Limitations
• Correspondence problem:
  – What does each primitive component in the model correspond to in the real system?
• What questions can they answer? (qualitative predictions, critical regions of parameter space)
• How to interpret results?
  – Can’t look at a single run.
  – Many contingent behaviors; macro-statistics don’t tell the entire story.
• Scaling issues (e.g., time, error rates, population sizes).
• The mechanistic theory encoded by the model cannot always be stated cleanly.
Models as Homomorphic Maps: Commutativity of the Diagram
[Diagram: World at time t → (laws t) → World at time t + 1; Model at time t → (algorithm $t_1$) → Model at time t + 1; M maps each world state to its model state.]
M is an equivalence relation.
Model M is valid if this is a homomorphic map:
$M(t(x)) = t_1(M(x))$
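The commutativity condition can be checked numerically on a toy system. In the sketch below the "world" is an integer counter, the law increments it, and the model keeps only the parity; all of these functions and names are hypothetical and serve only to show what the check looks like.

```python
def commutes(M, t, t1, states):
    """Return True if M(t(x)) == t1(M(x)) for every sampled world state x."""
    return all(M(t(x)) == t1(M(x)) for x in states)

def world_law(x):
    """Toy law t: the world counter advances by one."""
    return x + 1

def to_model(x):
    """Toy map M: the model keeps only the parity of the counter."""
    return x % 2

def model_rule(p):
    """Toy algorithm t1: parity flips at each step."""
    return 1 - p

def bad_rule(p):
    """A wrong algorithm: claims parity never changes."""
    return p

valid = commutes(to_model, world_law, model_rule, range(100))
invalid = commutes(to_model, world_law, bad_rule, range(100))
print("parity-flip model commutes:", valid)
print("constant-parity model commutes:", invalid)
```

Checking a finite sample of states can only refute validity, not prove it; here the parity argument happens to hold for all integers.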
Cellular Automata
• 1-D and 2-D
• Space-time plots
• Neighborhood, Update rules
• Wolfram’s classification and dynamical regimes
• Forest fire model and the game of Life
• Spatial Prisoner’s Dilemma
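The sketch below shows one way to implement a 1-D, two-state, nearest-neighbor cellular automaton and print its space-time plot. The periodic boundary condition, the lattice width, and the choice of rule 90 are assumptions made for the example.

```python
def step(cells, rule):
    """Apply an elementary (radius-1, two-state) rule once, with periodic boundaries.

    `rule` is the Wolfram rule number (0-255): bit k of the number gives the
    new state for the neighborhood whose three cells, read as binary, equal k.
    """
    n = len(cells)
    new_cells = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right
        new_cells.append((rule >> index) & 1)
    return new_cells

def space_time(initial, rule, steps):
    """Return the list of lattice configurations, one row per time step."""
    rows = [list(initial)]
    for _ in range(steps):
        rows.append(step(rows[-1], rule))
    return rows

# Single live cell in the middle of a 31-cell ring (assumed setup).
width = 31
initial = [0] * width
initial[width // 2] = 1

for row in space_time(initial, rule=90, steps=15):
    print("".join("#" if cell else "." for cell in row))
```

Time runs down the page and space runs across, which is the usual layout of a space-time plot. Changing `rule` to other values (for example 30 or 110) gives a quick view of different dynamical regimes.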
Power Laws and Scaling
• What is a power law?
• Why is it important?
• How do power laws arise?
• How are power laws related to scaling?
• How do I know if my data shows a power law?
• Fitting curves to data and testing for significance.
Power Law Distribution
• Polynomial: $p(x) = a x^{b}$
• Scale invariant: $p(cx) = a (cx)^{b} = c^{b} p(x) \propto p(x)$
• Distribution can range over many orders of magnitude
  – Ratio of largest to smallest sample
• Plotted on log-log axes: $\log p(x) = \log(a x^{b}) = b \log x + \log a$
  – Slope of line gives scaling exponent
  – Y-intercept gives the constant
• Heavy tailed (right skewed)
• Universality
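A quick numerical check of the scale-invariance and log-log relations on this slide, assuming illustrative values for a, b, and c that are not from the slides:

```python
import math

def power_law(x, a, b):
    """p(x) = a * x**b, using the slide's notation."""
    return a * x ** b

# Assumed values for illustration only.
A, B, C = 2.0, -2.5, 10.0

# Scale invariance: rescaling x by C multiplies p by C**B at every x.
ratios = [power_law(C * x, A, B) / power_law(x, A, B) for x in (1.0, 7.0, 300.0)]

def loglog_slope(x1, x2, a, b):
    """Slope of the straight line through two points of the log-log plot."""
    rise = math.log(power_law(x2, a, b)) - math.log(power_law(x1, a, b))
    run = math.log(x2) - math.log(x1)
    return rise / run

slope = loglog_slope(2.0, 50.0, A, B)
intercept = math.log(power_law(1.0, A, B))  # log p at log x = 0

print("p(cx)/p(x) at three x values:", ratios)
print("log-log slope:", slope, "intercept:", intercept)
```

The ratio does not depend on x, the slope recovers b, and the intercept recovers log a.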
Statistical Distributions
Why are normal, exponential, and power-law distributions important?
• Normal Distributions:
  – Additive
• Exponential Distributions:
  – aka single-scale
  – Have form $P(x) = e^{-ax}$
  – Use Gaussian to approximate exponential because differentiable at 0.
  – Plot on log-linear scale to see straight line.
• Power-law Distributions:
  – aka scale-free or polynomial
  – Have form $P(x) = x^{-a}$
  – Fat tail is associated with power law because it decays more slowly.
  – Plot on log-log scale to see straight line.
Normal Distribution
• Unimodal: Bell curve
• Central Limit Theorem: The mean of a large number of random variables independently drawn from the same distribution (with finite variance) is distributed approximately normally, regardless of the form of the original distribution.
• Widely applicable when phenomena are additively related
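The Central Limit Theorem can be illustrated with a small simulation. The sketch below averages draws from an exponential distribution, which is strongly skewed, and checks that the averages behave like a normal variable. The sample sizes, the seed, and the choice of the exponential distribution are assumptions for the example.

```python
import random
import statistics

def sample_means(n_per_mean, n_means, seed=0):
    """Return n_means averages, each over n_per_mean Exponential(1) draws."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n_per_mean)) / n_per_mean
            for _ in range(n_means)]

N_PER_MEAN = 50
means = sample_means(N_PER_MEAN, 5000)

grand_mean = statistics.mean(means)   # theory: 1.0
spread = statistics.pstdev(means)     # theory: sqrt(1 / N_PER_MEAN)

# For a normal distribution about 68% of values lie within one standard deviation.
within_one_sd = sum(abs(m - grand_mean) <= spread for m in means) / len(means)

print("mean of means:", grand_mean)
print("std of means:", spread, "theory:", (1.0 / N_PER_MEAN) ** 0.5)
print("fraction within one std:", within_one_sd)
```

The individual draws are far from bell-shaped, but their averages cluster symmetrically around the true mean with the spread the theorem predicts.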
Measuring Power Laws
• Plot histogram of samples on log-log axes (a):
  – Test for linear form of data on plot
  – Measure slope of best-fit line to determine scaling exponent
  – Maximum Likelihood Estimate
• Problem: Noise in right-hand side of distribution (b)
  – Each bin on the right-hand side of plot has few samples
  – Correct with logarithmic binning (c)
  – Divide #samples in each bin by width of bin (count per unit interval of x)
• Cumulative distribution function (d)
  – CCDF: Probability P(x) that a sample has a value greater than x (1 - CDF): $P(x) = \int_x^\infty p(y)\,dy$
  – Also follows power law but with the exponent b-1
  – No need to use logarithmic binning
  – Sometimes called rank/frequency plots
• For power laws, with $p(y) = a y^{-b}$ and $b > 1$:
$$P(x) = \int_x^\infty p(y)\,dy = a \int_x^\infty y^{-b}\,dy = \left[ \frac{-a}{b-1}\, y^{-(b-1)} \right]_x^\infty = 0 - \frac{-a}{b-1}\, x^{-(b-1)} = \frac{a}{b-1}\, x^{-(b-1)}$$
[Figure, panels (a)-(d): 1 million random numbers, with b = 2.5. Source: M. Newman, “Power laws, Pareto distributions and Zipf’s law” (2006).]
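A hedged sketch of how such a data set could be generated and measured: draw samples by inverse-transform sampling, estimate the exponent by maximum likelihood, and compare the empirical CCDF with the b-1 power law. The lower cutoff `x_min = 1`, the seed, and the reduced sample size (100,000 instead of 1 million, to keep the run short) are assumptions of this example. The estimator used is the standard maximum likelihood estimator for a continuous power law with a known lower cutoff.

```python
import math
import random

def sample_power_law(n, b, x_min=1.0, seed=0):
    """Draw n samples with density proportional to x**(-b) for x >= x_min (b > 1).

    Inverse-transform sampling: if r is uniform on (0, 1], then
    x_min * r**(-1/(b-1)) has CCDF (x / x_min)**(-(b-1)).
    """
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (b - 1.0)) for _ in range(n)]

def mle_exponent(samples, x_min=1.0):
    """Maximum likelihood estimate: b_hat = 1 + n / sum(ln(x_i / x_min))."""
    return 1.0 + len(samples) / sum(math.log(x / x_min) for x in samples)

def ccdf(samples, x):
    """Empirical fraction of samples strictly greater than x."""
    return sum(1 for s in samples if s > x) / len(samples)

TRUE_B = 2.5
samples = sample_power_law(100_000, TRUE_B)
b_hat = mle_exponent(samples)

print("estimated exponent:", b_hat)
for x in (2.0, 10.0, 50.0):
    print(f"x={x:5.1f}  empirical CCDF={ccdf(samples, x):.5f}  "
          f"theory={x ** -(TRUE_B - 1.0):.5f}")
```

The CCDF needs no binning at all, which is why it avoids the noisy right-hand tail of the raw histogram.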
Linear Empirical Models
• An empirical model is a function that captures the trend of observed data:
  – It predicts but does not explain the system that produced the data.
• A common technique is to fit a line through the data: $y = mx + b$
• Assume Gaussian distributed errors.
  – Note: For logged data, we assume that the errors are log-normally distributed.
[Image downloaded from Wikipedia, Sept. 11, 2007]
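A minimal least-squares sketch for fitting the line y = mx + b, also applied to logged data to recover a power-law exponent. The function names and the synthetic, noise-free data are assumptions for illustration; real data would carry the error terms discussed above.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x

def r_squared(xs, ys, m, b):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Synthetic data following y = 3 * x**-2 exactly (assumed, noise-free).
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [3.0 * x ** -2.0 for x in xs]

# Fit a line to the logged data: log y = slope * log x + intercept.
log_xs = [math.log(x) for x in xs]
log_ys = [math.log(y) for y in ys]
slope, intercept = fit_line(log_xs, log_ys)

print("exponent (slope):", slope)
print("constant (exp of intercept):", math.exp(intercept))
print("R^2 on log-log data:", r_squared(log_xs, log_ys, slope, intercept))
```

The slope of the log-log fit is the scaling exponent and the exponential of the intercept is the constant.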
Mechanisms for Producing Power Laws
• Preferential attachment
• Combinations of exponentials: $p(y) \sim e^{ay}$, $x \sim e^{by}$
• Random walks
• Phase transitions and critical phenomena
  – Percolation, self-organized criticality, HOT (highly optimized tolerance)
• Centralized space-filling networks + invariant terminal units + optimal design
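The two exponentials above combine into a power law through a change of variables. The short derivation below is a sketch that treats both relations as exact up to constant factors:

```latex
% Given: p(y) \sim e^{ay} and x \sim e^{by}, so y = (\ln x)/b and dy/dx = 1/(bx).
\begin{align*}
p(x) &= p(y)\,\frac{dy}{dx}
      \sim \frac{e^{ay}}{b\,e^{by}}
      = \frac{x^{a/b}}{b\,x}
      = \frac{1}{b}\,x^{-1 + a/b}
\end{align*}
% The result is a power law p(x) \propto x^{-\alpha} with \alpha = 1 - a/b.
```

For example, if y is exponentially distributed with a < 0 and x grows exponentially in y with b > 0, the exponent 1 - a/b is greater than 1.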
Complex Networks
• Degree distributions
• Examples of scale-free networks
• Properties of scale-free networks