Ockham’s Razor: What it is, What it isn’t, How it works, and How it doesn’t Kevin T. Kelly Department of Philosophy Carnegie Mellon University

Ockham’s Razor: Ockham’s Razor: What it is, What it is,

What it isn’t, What it isn’t, How it works, and How it works, and

How it doesn’tHow it doesn’t

Kevin T. KellyKevin T. KellyDepartment of PhilosophyDepartment of Philosophy

Carnegie Mellon UniversityCarnegie Mellon Universitywww.cmu.eduwww.cmu.edu

““Efficient Convergence Implies Ockham's Efficient Convergence Implies Ockham's Razor”,Razor”, Proceedings of Proceedings of the 2002 International the 2002 International Workshop on Computational Models of Workshop on Computational Models of Scientific Scientific Reasoning and ApplicationsReasoning and Applications, Las Vegas, , Las Vegas, USA, June USA, June 24-24- 27, 2002.27, 2002.(with C. Glymour) (with C. Glymour) “Why Probability Does Not “Why Probability Does Not Capture the Capture the Logic of Scientific Logic of Scientific Justification”,Justification”, C. Hitchcock, ed., C. Hitchcock, ed., Contemporary Contemporary Debates in the Philosophy of ScienceDebates in the Philosophy of Science, Oxford: , Oxford: Blackwell, 2004.Blackwell, 2004.““Justification as Truth-finding Efficiency: How Justification as Truth-finding Efficiency: How Ockham's Ockham's Razor Works”,Razor Works”, Minds and MachinesMinds and Machines 14: 14: 2004, pp. 485-505.2004, pp. 485-505.““Learning, Simplicity, Truth, and Learning, Simplicity, Truth, and Misinformation”,Misinformation”, The The Philosophy of Philosophy of InformationInformation, under review., under review.““Ockham's Razor, Efficiency, and the Infinite Ockham's Razor, Efficiency, and the Infinite Game ofGame of

Science”,Science”, proceedings, proceedings, Foundations of the Foundations of the Formal Sciences 2004: Formal Sciences 2004: Infinite Game TheoryInfinite Game Theory, , Springer, under review.Springer, under review.

Further ReadingFurther Reading

Which Theory to Choose?Which Theory to Choose?

T1 T2 T3 T4 T5

???

Compatible with data

Use Ockham’s RazorUse Ockham’s Razor

SimpleSimple ComplexComplexT1 T2 T3 T4 T5

DilemmaDilemma

If you If you knowknow the truth is simple, the truth is simple,

then you then you don’t needdon’t need Ockham. Ockham.

T1 T2 T3 T4 T5SimpleSimple ComplexComplex

DilemmaDilemma

If you If you don’t knowdon’t know the truth is simple, the truth is simple,

then how couldthen how could a a fixedfixed simplicity biassimplicity bias help you if the truth is complex?help you if the truth is complex?

T1 T2 T3 T4 T5SimpleSimple ComplexComplex

PuzzlePuzzleA A fixedfixed bias is like a bias is like a brokenbroken thermometer. thermometer.

How could it possibly help you find How could it possibly help you find unknownunknown truth? truth?

Cold!

I. Ockham I. Ockham Apologists Apologists

Wishful ThinkingWishful Thinking

Simple theories are nice if true:Simple theories are nice if true: TestabilityTestability UnityUnity Best explanationBest explanation Aesthetic appealAesthetic appeal Compress dataCompress data

So is believing that you are the So is believing that you are the emperor.emperor.

OverfittingOverfitting Maximum likelihood estimates based on Maximum likelihood estimates based on

overly complex theories can have greater overly complex theories can have greater predictive errorpredictive error (AIC, Cross-validation, etc.). (AIC, Cross-validation, etc.). Same is true even if you Same is true even if you know know the the true true model is model is

complexcomplex.. Doesn’t Doesn’t convergeconverge to true model. to true model. Depends on Depends on random datarandom data..

Thanks, but a simpler Thanks, but a simpler model still has lower model still has lower predictive error.predictive error.

The truth is The truth is complex.complex.-God--God- .

..

.

Messy worlds are Messy worlds are legionlegion

Tidy worlds are Tidy worlds are fewfew..

That is why the tidy worldsThat is why the tidy worlds

Are those most likely Are those most likely truetrue. (Carnap). (Carnap)

unity

Ignorance = KnowledgeIgnorance = Knowledge


1/3 1/3 1/3






2/6

1/6

2/6

1/6





Depends on NotationDepends on Notation

But mess depends onBut mess depends on coding coding,,

which Goodman noticed, too.which Goodman noticed, too.

The picture is The picture is invertedinverted if if

we we translatetranslate green to grue. green to grue.2/6

1/6

2/6

1/6 NotationIndicates truth?

Same for Algorithmic Same for Algorithmic ComplexityComplexity

Goodman’s problemGoodman’s problem works against every works against every fixedfixed simplicity ranking (independent of the processes by simplicity ranking (independent of the processes by which data are generated and coded prior to learning).which data are generated and coded prior to learning).

Extra problem:Extra problem: any pair-wise ranking of theories can any pair-wise ranking of theories can be be reversedreversed by choosing an by choosing an alternative computer alternative computer languagelanguage..

So how could simplicity help us find the true theory? So how could simplicity help us find the true theory?

NotationIndicates truth?

Just Beg the QuestionJust Beg the Question AssignAssign high prior probability to high prior probability to

simple theories.simple theories. Why Why shouldshould you? you? Preference for Preference for complexitycomplexity has the same has the same

“explanation”.“explanation”.

You presume simplicityTherefore you should presume simplicity!

Miracle ArgumentMiracle Argument

Simple data would be a Simple data would be a miraclemiracle if a if a complex theory were true (Bayes, complex theory were true (Bayes, BIC, Putnam). BIC, Putnam).

Begs the QuestionBegs the Question

S C

““Fairness” between theories Fairness” between theories

bias against complex worlds. bias against complex worlds.

Two Can Play That GameTwo Can Play That Game

S C

““Fairness” between Fairness” between worlds worlds

bias against simple bias against simple theory.theory.

ConvergenceConvergence

At least a simplicity bias At least a simplicity bias doesn’t doesn’t preventprevent convergenceconvergence to the truth to the truth (MDL, BIC, Bayes, SGS, etc.).(MDL, BIC, Bayes, SGS, etc.). Neither do Neither do otherother biases. biases. May as well recommend May as well recommend flat tiresflat tires since since

they can be fixed.they can be fixed.

OSI

MP L E

OCO

M P L EX

Does Ockham Have No Does Ockham Have No Frock?Frock?

Philosopher’s stone,Philosopher’s stone,Perpetual motion,Perpetual motion,Free lunchFree lunch

Ockham’s Razor???Ockham’s Razor???

Ash Heap of Ash Heap of HistoryHistory

. . .

II. How Ockham II. How Ockham Helps You Find Helps You Find

the Truth the Truth

What is Guidance?What is Guidance? Indication or trackingIndication or tracking

Too Too strongstrong FixedFixed bias can’t bias can’t indicateindicate anything anything

Convergence Convergence Too Too weakweak True of other biasesTrue of other biases

““Straightest” convergenceStraightest” convergence Just right?Just right?

C

C

S

S

CS

A True StoryA True StoryNiagara FallsNiagara Falls

PittsburghPittsburgh

ClarionClarion



ClarionClarion



ClarionClarion



ClarionClarion

!!

ClarionClarion



A True StoryA True Story

??



Ask Ask directions!directions!


Where’s …Where’s …

What Does She Say?What Does She Say?Turn around. Turn around. The freeway The freeway ramp is on ramp is on the left.the left.

You Have Better IdeasYou Have Better Ideas

Phooey!Phooey!The Sun was on the The Sun was on the right!right!


!!!!




Stay the Course!Stay the Course!

AhemAhem……



Don’t Flip-flop!Don’t Flip-flop!



Then Again…Then Again…



@@$##!@@$##!

One Good Flip One Good Flip Can Save a Lot of FlopCan Save a Lot of Flop


The U-TurnThe U-Turn














Told ya!Told ya!











Your RouteYour Route


Needless U-turnNeedless U-turn

The Best RouteThe Best Route


Told ya!Told ya!

The Best Route Anywhere The Best Route Anywhere from Therefrom There

PittsburghPittsburgh NY, DCNY, DC

Told ya!Told ya!

The Freeway to the TruthThe Freeway to the Truth

ComplexComplex SimpleSimple

Fixed adviceFixed advice for all destinations for all destinations Disregarding it entails an Disregarding it entails an extra course extra course

reversalreversal……

Told ya!Told ya!

The Freeway to the TruthThe Freeway to the Truth

ComplexComplex SimpleSimple

……even if the advice points even if the advice points away fromaway from the goal!the goal!

Told ya!Told ya!

Counting MarblesCounting Marbles



May come at any time…May come at any time…













Ockham’s RazorOckham’s Razor

3 ?

If you answerIf you answer, answer with the , answer with the current countcurrent count..

AnalogyAnalogy MarblesMarbles = detectable “effects”. = detectable “effects”. Late appearanceLate appearance = difficulty of = difficulty of

detection.detection. CountCount = model (e.g., causal graph). = model (e.g., causal graph). Appearance timesAppearance times = free = free

parameters.parameters.

AnalogyAnalogy

U-turnU-turn = model = model revisionrevision (with (with content loss)content loss)

HighwayHighway = revision-efficient = revision-efficient truth-finding method.truth-finding method.

TTTT

The U-turn ArgumentThe U-turn Argument

Suppose you Suppose you converge to the truthconverge to the truth but but

violate Ockham’s razor violate Ockham’s razor along the way.along the way.

33


Where is that extra marble, anyway?Where is that extra marble, anyway?

33


It’s not coming, is it?It’s not coming, is it?

33


If you never say 2 you’ll never If you never say 2 you’ll never converge to the truth….converge to the truth….

33


That’s it. You should have listened to That’s it. You should have listened to Ockham.Ockham.

33222222


Oops! Well, no method is infallible!Oops! Well, no method is infallible!

33222222


If you never say 3, you’ll never If you never say 3, you’ll never converge to the truth….converge to the truth….

33222222


Embarrassing to be back at that old Embarrassing to be back at that old theory, eh?theory, eh?

33222222 33


And so forth…And so forth…

33222222 33 44



33222222 33 44 55



33222222 33 44 55 66



33222222 33 44 55 6677

The ScoreThe Score

You: You:

33222222 33 44 55 6677

Subproblem

The ScoreThe Score

Ockham:Ockham:

222222 33 44 55 6677

Subproblem

Ockham is NecessaryOckham is Necessary

If you If you converge to the truthconverge to the truth, ,

and and you you violate Ockham’s razorviolate Ockham’s razor

then then some convergent method some convergent method beats beats

your worst-case revision boundyour worst-case revision bound in in each answer in the subproblem each answer in the subproblem entered at the time of the violation.entered at the time of the violation.

Ockham is SufficientOckham is Sufficient

If you If you converge to the truthconverge to the truth, ,

and and you you never violate Ockham’s razornever violate Ockham’s razor

then then You You achieve the worst-case revision achieve the worst-case revision

boundbound of each convergent solution of each convergent solution in each answer in each in each answer in each subproblem.subproblem.

EfficiencyEfficiency

EfficiencyEfficiency = achievement of the = achievement of the best worst-case revision bound in best worst-case revision bound in each answer in each subproblem. each answer in each subproblem.

Ockham Efficiency Ockham Efficiency TheoremTheorem

Among the convergent methods…Among the convergent methods…

Ockham = Efficient!Ockham = Efficient!

Efficient

Inefficient

““Mixed” StrategiesMixed” Strategies

mixed strategy mixed strategy = chance of output = chance of output depends only on actual experience.depends only on actual experience.

convergence in probabilityconvergence in probability = chance = chance of producing true answer of producing true answer approaches 1 in the limit.approaches 1 in the limit.

efficiencyefficiency = achievement of best = achievement of best worst-case worst-case expectedexpected revision bound revision bound in each answer in each subproblem. in each answer in each subproblem.

Ockham Efficiency Ockham Efficiency TheoremTheorem Among the Among the mixedmixed methods that methods that

converge in probability…converge in probability…

Ockham = Efficient!Ockham = Efficient!

Efficient

Inefficient

Dominance and Dominance and “Support”“Support”

EveryEvery convergent method is convergent method is weakly weakly dominateddominated in revisions by a clone in revisions by a clone who says “?” until stage who says “?” until stage nn..

1.1. Convergence Convergence Must leap Must leap eventually.eventually.

2.2. EfficiencyEfficiency Only leap to Only leap to simplest.simplest.

3.3. Dominance Dominance Could always Could always wait longer.wait longer. Can’t wait forever!

III. Ockham on III. Ockham on Steroids Steroids

Ockham Wish ListOckham Wish List General definitionGeneral definition of Ockham’s razor. of Ockham’s razor. Compare revisions even when Compare revisions even when not not

bounded within answersbounded within answers.. Prove theorem for Prove theorem for arbitrary empirical arbitrary empirical

problemsproblems..

Empirical ProblemsEmpirical Problems Problem = Problem = partition of a topological partition of a topological

space.space. Potential answersPotential answers = partition cells. = partition cells. EvidenceEvidence = open (verifiable) = open (verifiable)

propositions.propositions.Example: SymmetryExample: Symmetry

Example: Parameter Example: Parameter FreeingFreeing

Euclidean topology.Euclidean topology. Say which parameters are zero.Say which parameters are zero. Evidence = open neighborhood. Evidence = open neighborhood.

•Curve fitting

a2

a1

0

a1 = 0a2 = 0

a1 > 0a2 = 0

a1 > 0a2 > 0

a1 = 0a2 > 0

The PlayersThe Players

Scientist:Scientist: Produces an answer in response to Produces an answer in response to

current evidence.current evidence. Demon:Demon:

Chooses evidence in response to Chooses evidence in response to scientist’s choicesscientist’s choices

WinningWinning

Scientist wins…Scientist wins… by defaultby default if demon doesn’t present an if demon doesn’t present an

infinite nested sequence of basic open infinite nested sequence of basic open sets whose intersection is a singleton.sets whose intersection is a singleton.

else by meritelse by merit if scientist eventually if scientist eventually always produces the true answer for always produces the true answer for world selected by demon’s choices.world selected by demon’s choices.

Comparing RevisionsComparing Revisions

One answer sequence One answer sequence maps into maps into another iff another iff there is an order and answer-preserving there is an order and answer-preserving

map from the first to the second (? is map from the first to the second (? is wild).wild).

Then the revisions of first are Then the revisions of first are as as good asgood as those of the second. those of the second.? ? ? ? ? . . .

. . .

Comparing RevisionsComparing Revisions

The revisions of the first are The revisions of the first are strictly strictly betterbetter if, in addition, the latter if, in addition, the latter doesn’t map back into the former. doesn’t map back into the former.

? . . .

? ? ? ? ? . . .

Comparing MethodsComparing Methods

F F is is as good asas good as G G iffiff

each output sequence of each output sequence of FF is as good is as good as some output sequence of as some output sequence of G.G.

F Gas good as


FF is is better thanbetter than GG iff iff

FF is as good as is as good as GG and and

GG is is not not as good as as good as FF

F Gnot as good as


FF is is strongly better thanstrongly better than GG iff each iff each output sequence of output sequence of FF is strictly is strictly better than an output sequence of better than an output sequence of GG but …but …

strictly better than


…… no output sequence of no output sequence of GG is as is as good as any of good as any of FF..

not as good as

TerminologyTerminology

Efficient solution: Efficient solution: as good as any as good as any solution in any subproblem.solution in any subproblem.

What Simplicity Isn’tWhat Simplicity Isn’t Syntactic length.Syntactic length. Data-compression (MDL).Data-compression (MDL). Computational ease.Computational ease. Social “entrenchment” (Goodman).Social “entrenchment” (Goodman). Number of free parameters (BIC, Number of free parameters (BIC,

AIC).AIC). Euclidean dimensionalityEuclidean dimensionality

Only by accident!!

Symptoms…

What Simplicity IsWhat Simplicity Is SimplerSimpler theories are compatible theories are compatible

with with deeper problems of inductiondeeper problems of induction..

Worst demon

Smaller demon

Problem of InductionProblem of Induction

No true information entails the true No true information entails the true answer.answer.

Happens in Happens in answer boundariesanswer boundaries..

Demonic PathsDemonic PathsA A demonic pathdemonic path from from ww is a is a

sequence of alternating answers sequence of alternating answers that a demon can force an that a demon can force an arbitrary convergent method arbitrary convergent method through starting from through starting from ww. .

0…1…2…3…4

Simplicity Simplicity DefinedDefinedThe The AA-sequences-sequences are the demonic are the demonic

sequences beginning with answer sequences beginning with answer AA..

AA is is as simpleas simple as as BB iff each iff each BB--sequence is as good as some sequence is as good as some AA--sequence. sequence. 2, 3

2, 3, 42, 3, 4, 5

<<<

33, 4

3, 4, 5.

. .

So 2 is simpler than 3!

Ockham AnswerOckham Answer An answer as simple as any other An answer as simple as any other

answer.answer. = number of observed particles.= number of observed particles.

. .

.

So 2 is Ockham!

2, …, n2, …, n, n+12, …, n, n+1, n+2

<<<

nn, n+1

n, n+1, n+2

Ockham LemmaOckham LemmaAA is Ockham iff is Ockham iff

for all demonic for all demonic p, p, ((AA**pp) ≤ some demonic sequence. ) ≤ some demonic sequence.

3

I can force you through

2but not

through 3,2.

So 3 isn’t Ockham

Ockham AnswerOckham AnswerE.g.: Only simplest curve E.g.: Only simplest curve

compatible with data is Ockham.compatible with data is Ockham.

a2

a1

0

Demonic sequence:

Non-demonic sequences:

General Efficiency General Efficiency TheoremTheorem

If the topology is metrizable and If the topology is metrizable and separable and the question is separable and the question is countable then: countable then:

Ockham = Efficient.Ockham = Efficient.

Proof: uses Martin’s Borel Proof: uses Martin’s Borel Determinacy theorem.Determinacy theorem.

Stacked ProblemsStacked ProblemsThere is an Ockham answer at There is an Ockham answer at

every stage.every stage.

0

1

2

3

. . .

1

Non-Ockham Non-Ockham Strongly Strongly WorseWorse

If the problem is a If the problem is a stackedstacked countable partition over a countable partition over a restricted Polish space:restricted Polish space:

Each Ockham solution is Each Ockham solution is stronglystrongly better than each non- better than each non-Ockham solution in the Ockham solution in the subproblem entered at the subproblem entered at the time of the violation.time of the violation.

Simplicity Simplicity Low Low DimensionDimension

Suppose God says the true Suppose God says the true parameter value is rational.parameter value is rational.


Topological dimension and Topological dimension and integration theory dissolve.integration theory dissolve.

Does Ockham?Does Ockham?


The proposed account survives in The proposed account survives in the preserved limit point the preserved limit point structure.structure.

IV. Ockham and IV. Ockham and Symmetry Symmetry

Respect for SymmetryRespect for Symmetry If severalIf several simplest simplest alternatives alternatives

are available, are available, don’t break the don’t break the symmetry.symmetry.

•Count the marbles of each color.•You hear the first marble but don’t see it.•Why red rather than green?

?

Respect for SymmetryRespect for Symmetry Before the noise, (Before the noise, (00, , 00) is ) is

Ockham.Ockham. After the noise, After the noise, no answer is no answer is

OckhamOckham::

((00, , 11))((11, , 00))

DemonicDemonic Non-demonicNon-demonic

((00, , 11) () (11, , 00))((11, , 00) () (00, , 11))

?

Right!

((00, , 00))

Goodman’s RiddleGoodman’s Riddle Count Count oneiclesoneicles--- a oneicle is a particle at --- a oneicle is a particle at

any stage but one, when it is a non-any stage but one, when it is a non-particle. particle.

Oneicle tranlation is auto-Oneicle tranlation is auto-homeomorphism that homeomorphism that does not preserve does not preserve the problemthe problem. .

Unique Unique Ockham answer is current Ockham answer is current oneicle count.oneicle count.

Contradicts Contradicts unique Ockham answer in unique Ockham answer in particle counting. particle counting.

SupersymmetrySupersymmetry Say Say whenwhen each particle appears. each particle appears. Refines Refines counting problem.counting problem. EveryEvery auto-homeomorphism auto-homeomorphism

preserves problem.preserves problem.

No No answer is Ockham.answer is Ockham. No No solution is Ockhamsolution is Ockham.. No No method is efficient.method is efficient.

Dual SupersymmetryDual Supersymmetry Say only whether particle count is Say only whether particle count is even even

or oddor odd.. CoarsensCoarsens counting problem. counting problem. Particle/OneicleParticle/Oneicle auto-homeomorphism auto-homeomorphism

preserves problem.preserves problem.

Every Every answer is Ockham.answer is Ockham. Every Every solution is Ockhamsolution is Ockham.. Every Every solution is efficient.solution is efficient.

Broken SymmetryBroken Symmetry CountCount the even or just report the even or just report oddodd.. CoarsensCoarsens counting problem. counting problem. RefinesRefines the even/odd problem. the even/odd problem. Unique Unique Ockham answer at each Ockham answer at each

stagestage.. Exactly Exactly Ockham solutions are Ockham solutions are

efficient.efficient.

Simplicity Under Simplicity Under RefinementRefinement

Particle countingOneicle counting Twoicle counting

Particle countingor odd particles

Oneicle countingor odd oneicles

Twoicle countingor odd twoicles

Time of particle appearance

Even/odd

Broken symmetryUnique Ockham answer

SupersymmetryNo answer is Ockham

Dual supersymmetryBoth answers are Ockham

Proposed Theory is RightProposed Theory is Right ObjectiveObjective efficiency is grounded in efficiency is grounded in

problems.problems. Symmetries in the preceding problems Symmetries in the preceding problems

would would wash outwash out stronger simplicity stronger simplicity distinctions. distinctions.

Hence, such distinctions would amount Hence, such distinctions would amount to mere to mere conventionsconventions (like coordinate (like coordinate axes) that couldn’t have anything to do axes) that couldn’t have anything to do with objective efficiency. with objective efficiency.

Furthermore…Furthermore… If Ockham’s razor is forced to choose If Ockham’s razor is forced to choose

in the supersymmetrical problems in the supersymmetrical problems then either:then either:

following Ockham’s razor following Ockham’s razor increases increases revisions revisions in some counting problems in some counting problems OrOr

Ockham’s razor leads to Ockham’s razor leads to contradictionscontradictions as a problem is coarsened or refined. as a problem is coarsened or refined.

V. Conclusion V. Conclusion

What Ockham’s Razor IsWhat Ockham’s Razor Is

““Only output Ockham answers”Only output Ockham answers” Ockham answerOckham answer = a topological = a topological

invariant of the empirical problem invariant of the empirical problem addressed.addressed.

What it Isn’tWhat it Isn’t

preference for preference for brevity, brevity, computational ease, computational ease, entrenchment, entrenchment, past success, past success, Kolmogorov complexity, Kolmogorov complexity, dimensionality, etc….dimensionality, etc….

How it WorksHow it Works

Ockham’s razor is necessary for Ockham’s razor is necessary for mininizing revisionsmininizing revisions prior to prior to convergence to the truth.convergence to the truth.

How it Doesn’tHow it Doesn’t

No possible methodNo possible method could: could: Point at the truth;Point at the truth; Indicate the truth;Indicate the truth; Bound the probability of error;Bound the probability of error; Bound the number of future revisions.Bound the number of future revisions.

Spooky OckhamSpooky Ockham

Science Science without support or safety without support or safety netsnets..







VI. Stochastic VI. Stochastic OckhamOckham

““Mixed” StrategiesMixed” Strategies

mixed strategy mixed strategy = chance of = chance of output depends only on actual output depends only on actual experience.experience.

Pe(M = H at n) = Pe|n(M = H at n).

e

Stochastic CaseStochastic Case

Ockham = Ockham =

at each stage, you produce a non-at each stage, you produce a non-Ockham answer with Ockham answer with prob = 0prob = 0. .

EfficiencyEfficiency = =

achievement of the best worst-case achievement of the best worst-case expected expected revision bound in each revision bound in each answer in each subproblem over all answer in each subproblem over all methods that converge to the truth methods that converge to the truth in probabilityin probability. .

Stochastic Efficiency Stochastic Efficiency TheoremTheorem

Among the stochastic methods Among the stochastic methods that converge in probability, that converge in probability, Ockham = Efficient!Ockham = Efficient!

Efficient

Inefficient

Stochastic MethodsStochastic Methods

Your chance of producing an answer is Your chance of producing an answer is a a function of observations made so farfunction of observations made so far. .

22

Urn selected in light of observations.

p

Stochastic U-turn Stochastic U-turn ArgumentArgument

Suppose you Suppose you converge converge in probabilityin probability to the truthto the truth but produce a non-Ockham but produce a non-Ockham answer with prob > 0.answer with prob > 0.

33r >

Choose small Choose small > 0. Consider answer > 0. Consider answer 4.4.

33


r >


By By convergenceconvergence in probability in probability to the to the truth:truth:

3322p > 1 - /3

r >


Etc. Etc.

3322p> 1-/3

r >

33p > 1-/344p > 1-/3

Stochastic U-turn Stochastic U-turn ArgumentArgument Since Since can be chosen arbitrarily small, can be chosen arbitrarily small,

sup prob of ≥ 3 revisions ≥ sup prob of ≥ 3 revisions ≥ rr.. sup prob of ≥ 2 revisions =1sup prob of ≥ 2 revisions =1

33r >

22p> 1-/333p > 1-/344p > 1-/3

Stochastic U-turn Stochastic U-turn ArgumentArgument So sup Exp revisions is ≥ So sup Exp revisions is ≥ 2 + 32 + 3rr..

But for Ockham = But for Ockham = 22. .

33r >

Subproblem

22p> 1-/333p > 1-/344p > 1-/3

VII. Statistical VII. Statistical InferenceInference

(Beta Version)

The Statistical Puzzle of The Statistical Puzzle of SimplicitySimplicity

Assume:Assume: Normal distribution, Normal distribution, , , ≥ 0. ≥ 0. Question:Question: = 0 or = 0 or > 0 ?> 0 ? Intuition: Intuition: = 0 is simpler than = 0 is simpler than > 0 .> 0 .

= 0 mean

AnalogyAnalogy

Marbles: Marbles: potentially small “effects”potentially small “effects” Time:Time: sample sizesample size Simplicity:Simplicity: fewer free parameters tied to fewer free parameters tied to

potential “effects”potential “effects” Counting:Counting: freeing parameters in a modelfreeing parameters in a model

U-turn “in Probability”U-turn “in Probability” Convergence in probability:Convergence in probability: chance chance

of producing true model goes to unityof producing true model goes to unity

Retraction in probability:Retraction in probability: chance of producing chance of producing a model drops from above a model drops from above to below 1 – to below 1 –

Sample size

1

0

Chance of producingtrue model

Chance of producingalternative model

= 0

sample mean

Suppose You (Probably) Suppose You (Probably) Choose a Model More Choose a Model More

Complex than the TruthComplex than the Truth

mean

zone for choosing > 0 > 0

Revision Counter: 0

> 0

= 0

sample mean

Eventually You Retract to Eventually You Retract to the Truth (In Probability)the Truth (In Probability)

mean

zone for choosing = 0 = 0

Revision Counter: 1

> 0

sample mean

So You (Probably) Output So You (Probably) Output an Overly Simple Model an Overly Simple Model

NearbyNearby

mean


Revision Counter: 1

> 0

sample mean

mean


Eventually You Retract to Eventually You Retract to the Truth (In Probability)the Truth (In Probability)

Revision Counter: 2

But Standard (Ockham) But Standard (Ockham) Testing Practice Requires Testing Practice Requires

Just One Retraction!Just One Retraction! = 0

sample mean

mean


Revision Counter: 0

In The Simplest World, No In The Simplest World, No RetractionsRetractions

= 0

sample mean

mean


Revision Counter: 0

= 0

sample mean

mean


Revision Counter: 0

In The Simplest World, No In The Simplest World, No RetractionsRetractions

In Remaining Worlds, In Remaining Worlds, at Most One Retractionat Most One Retraction

> 0

mean


Revision Counter: 0

> 0 mean

Revision Counter: 0



> 0 mean

Revision Counter: 1



So Ockham Beats All ViolatorsSo Ockham Beats All Violators Ockham:Ockham: at most one revision. at most one revision. Violator:Violator: at least two revisions in at least two revisions in

worst caseworst case

SummarySummary Standard practice is to “test” the point Standard practice is to “test” the point

hypothesis rather than the composite hypothesis rather than the composite alternative.alternative.

This amounts to favoring the “simple” This amounts to favoring the “simple” hypothesis a priori.hypothesis a priori.

It also minimizes revisions in probability!It also minimizes revisions in probability!

Two Dimensional Two Dimensional ExampleExample

Assume:Assume: independent bivariate normal independent bivariate normal distribution of unit variance.distribution of unit variance.

Question:Question: how manyhow many components of the components of the joint mean are zero?joint mean are zero?

Intuition: Intuition: more nonzeros = more more nonzeros = more complexcomplex

Puzzle:Puzzle: How does it help to favor How does it help to favor simplicity in less-than-simplest worlds?simplicity in less-than-simplest worlds?

A Real Model Selection A Real Model Selection MethodMethod

Bayes Information Criterion (BIC)Bayes Information Criterion (BIC) BIC(BIC(MM, sample) = , sample) =

- log(max prob that - log(max prob that MM can assign to can assign to sample) + sample) +

+ log(sample size) + log(sample size) model complexity model complexity ½. ½.

BIC method: choose BIC method: choose MM with least with least BIC score.BIC score.

Official BIC PropertyOfficial BIC Property

In the limit, minimizing BIC finds a In the limit, minimizing BIC finds a model with maximal conditional model with maximal conditional probability when the prior probability probability when the prior probability is flat over models and fairly flat over is flat over models and fairly flat over parameters within a model. parameters within a model.

But it is also revision-efficient.But it is also revision-efficient.

AIC in Simplest WorldAIC in Simplest World

nn = 2 = 2 = (0, 0).= (0, 0).

Retractions Retractions = 0= 0

ComplexComplex

SimpleSimple

-2 -1 0 1 2 3

-2

-1

0

1

2

3


nn = 100 = 100 = (0, 0).= (0, 0).


ComplexComplex

SimpleSimple

-0.4 -0.2 0 0.2 0.4

-0.4

-0.2

0

0.2

0.4


nn = 4,000,000 = 4,000,000 = (0, 0).= (0, 0).


ComplexComplex

SimpleSimple

-0.004 -0.002 0 0.002 0.004

-0.004

-0.002

0

0.002

0.004

BIC in Simplest WorldBIC in Simplest World

nn = 2 = 2 = (0, 0).= (0, 0).


ComplexComplex

SimpleSimple

-2 -1 0 1 2 3

-2

-1

0

1

2

3


nn = 100 = 100 = (0, 0).= (0, 0).


ComplexComplex

SimpleSimple

-0.75 -0.5 -0.25 0 0.25 0.5 0.75 1

-0.75

-0.5

-0.25

0

0.25

0.5

0.75

1


nn = 4,000,000 = 4,000,000 = (0, 0).= (0, 0).


ComplexComplex

SimpleSimple

-0.0075-0.005-0.0025 0 0.0025 0.005 0.0075

-0.0075

-0.005

-0.0025

0

0.0025

0.005

0.0075


nn = 20,000,000 = 20,000,000 = (0, 0).= (0, 0).


ComplexComplex

SimpleSimple

-0.006 -0.004 -0.002 0 0.002 0.004 0.006

-0.006

-0.004

-0.002

0

0.002

0.004

0.006

Performance in Complex Performance in Complex WorldWorld

nn = 2 = 2 = (.05, .005).= (.05, .005).


-2 -1 0 1 2 3

-2

-1

0

1

2

3

ComplexComplex

SimpleSimple

95%


nn = 100 = 100 = (.05, .005).= (.05, .005).

-0.75 -0.5 -0.25 0 0.25 0.5 0.75 1

-0.75

-0.5

-0.25

0

0.25

0.5

0.75

1


ComplexComplex

SimpleSimple


nn = 30,000 = 30,000 = (.05, .005).= (.05, .005).

-0.1 -0.05 0 0.05 0.1

-0.1

-0.05

0

0.05

0.1 Retractions Retractions

= 1= 1

ComplexComplex

SimpleSimple


nn = 4,000,000 (!) = 4,000,000 (!) = (.05, .005).= (.05, .005).

-0.04 -0.02 0 0.02 0.04

-0.04

-0.02

0

0.02

0.04 Retractions Retractions = 2= 2

ComplexComplex

SimpleSimple

QuestionQuestion Does the statistical retraction minimization Does the statistical retraction minimization

story extend to violations in less-than-simplest story extend to violations in less-than-simplest worlds?worlds?

Recall that the deterministic argument for Recall that the deterministic argument for higher retractions required the concept of higher retractions required the concept of minimizing retractions in each “subproblem”.minimizing retractions in each “subproblem”.

A “subproblem” is a proposition verified at a A “subproblem” is a proposition verified at a given time in a given world. given time in a given world.

Some analogue “in probability” is required. Some analogue “in probability” is required.

Subproblem.Subproblem. HH is an is an -subroblem in -subroblem in ww at at nn: :

There is a likelihood ratio test of {There is a likelihood ratio test of {ww} at } at significance < significance < such that this test has power such that this test has power < 1 - < 1 - at each world in at each world in HH. .

w

sample size = n

H

reject

worlds

accept

reject

Significance SchedulesSignificance Schedules A A significance schedulesignificance schedule ((..) is a monotone ) is a monotone

decreasing sequence of significance levels decreasing sequence of significance levels converging to zero that drop so slowly that converging to zero that drop so slowly that power can be increased monotonically with power can be increased monotonically with sample size.sample size.

aa((nn))aa((nn+1)+1)

nn+1+1

nn

Ockham Violation Ockham Violation InefficientInefficient

(X, Y)Subproblem At sample size n


(X, Y)

Ockham violation:Probably say blue hypothesisat white world(p > )

Subproblem At sample size n


(X, Y)

Probably say blue

Probably say white

Subproblem at time of violation


(X, Y)

Probably say blue

Probably say white



(X, Y)

Probably say blue

Probably say white

Probably say blue


Oops! Ockham Oops! Ockham InefficientInefficient

(X, Y) Subproblem


(X, Y) Subproblem


(X, Y) Subproblem


(X, Y) Subproblem


(X, Y)

Two retractions

Subproblem

LocalLocal Retraction Retraction EfficiencyEfficiency

(X, Y) Subproblem

Ockham does as well as best subproblem Ockham does as well as best subproblem performance in performance in some neighborhoodsome neighborhood of of ww..

At most one retraction

Two retractions


(X, Y)Subproblem at time of violation

Note: Note: nono neighborhood around neighborhood around ww avoids extra retractions.avoids extra retractions.

w

Gonzo Ockham Gonzo Ockham InefficientInefficient

(X, Y) Subproblem

Gonzo = probably saying simplest Gonzo = probably saying simplest answer in entire subproblem answer in entire subproblem entered in simplest world.entered in simplest world.

BalanceBalance

Be Ockham (avoid complexity) Be Ockham (avoid complexity) Don’t be Gonzo Ockham (avoid Don’t be Gonzo Ockham (avoid

bad fit).bad fit).

Truth-directed:Truth-directed: sole aim is to find sole aim is to find true modeltrue model with minimal revisions! with minimal revisions!

No circles:No circles: totally worst-case; no totally worst-case; no prior bias toward simple worlds.prior bias toward simple worlds.

THE ENDTHE END

Documents

Ockham’s Razor: What it is, What it isn’t, How it works, and How it doesn’t Kevin T. Kelly Department of Philosophy Carnegie Mellon University