35
From Purple Prose to Machine-Checkable Proofs: Levels of rigor in family history tools Dr. Luther A. Tychonievich, Ph.D. Dept. Computer Science, University of Virginia TSC Coordinator, Family History Information Standards Organisation These slides: http://www.cs.virginia.edu/luther/RT2015.pdf Me: [email protected] or [email protected] Luther Tychonievich 1 of 35

From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Embed Size (px)

Citation preview

Page 1: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

From Purple Prose toMachine-Checkable Proofs:Levels of rigor in family history tools

Dr. Luther A. Tychonievich, Ph.D.Dept. Computer Science, University of Virginia

TSC Coordinator, Family History Information Standards Organisation

These slides: http://www.cs.virginia.edu/luther/RT2015.pdfMe: [email protected] or [email protected]

Luther Tychonievich 1 of 35

Page 2: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Outline• Two definitions of “proof ”• Can genealogy be rigorous?• Can genealogy tools be rigorous?• An upper bound on possible rigor• Achieving rigor, and beyond

Luther Tychonievich 2 of 35

Page 3: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Two Definitions of Proof• Persuasive writing intended toconvince someone that an idea isbeyond rational doubt

• A set of derivations that reduce anon-obvious claim to a set of otherclaims and derivation rules

Luther Tychonievich 3 of 35

Page 4: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Not Proof• Purple prose is language thatconceals a lack of substance behinda superfluity of description

• Structure can be that detail…• Try a genealogy search for Odin

• Potential for misuse ≠ bad

Luther Tychonievich 4 of 35

Page 5: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Outline• Two definitions of “proof ”• Can genealogy be rigorous?• Can genealogy tools be rigorous?• An upper bound on possible rigor• Achieving rigor, and beyond

Luther Tychonievich 5 of 35

Page 6: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Reasons not to trustgenealogy

• Amateurs and fee-for-service• …the two least-trusted sources

• Individuals don’t yield to statistics• Linked generations compound error• Genes and probability offer littlehope

Luther Tychonievich 6 of 35

Page 7: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Compound Error• Suppose each parent-child linkcorrect with 95% confidence• Great-grandparents = 14 links• 0.9514 < 49% chance all correct

• Doubling per-link confidence• Doubles links we can trust• Gives one extra generation

Luther Tychonievich 7 of 35

Page 8: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Will genetics provide rigor?• Makes statistical claims aboutoverlap of ancestry of living people

• Illegitimacy and inbreeding• Biological vs Social parent• (see philogenetic tree research)

Luther Tychonievich 8 of 35

Page 9: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Can probability provide rigor?• Probabilistic reasoning requiressome subset of• Priors• Experiments or Ground truth• Well-defined populations• Independent distributions

• Not a silver bullet…Luther Tychonievich 9 of 35

Page 10: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Outline• Two definitions of “proof ”• Can genealogy be rigorous?• Can genealogy tools be rigorous?• An upper bound on possible rigor• Achieving rigor, and beyond

Luther Tychonievich 10 of 35

Page 11: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Pop Quiz• Which of the following is hardest?(1) Finding the right source(2) Citing the source right(3)Drawing the right conclusions(4) Communicating your reasoning

• Using tool , which is hardest?

Luther Tychonievich 11 of 35

Page 12: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Analogy: Proof-Carrying Code• To prove code works is provably hard

• …or impossible (Rice’s Theorem)• To communicate a proof is easy

• …if you use the right tool

• Can we make the right tool forfamily history?

Luther Tychonievich 12 of 35

Page 13: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Outline• Two definitions of “proof ”• Can genealogy be rigorous?• Can genealogy tools be rigorous?• An upper bound on possible rigor• Achieving rigor, and beyond

Luther Tychonievich 13 of 35

Page 14: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

On the next slide we’ll see…• A recorder’s perspective of thelogical flow of genealogy

• Not a suggested research work-flow

Luther Tychonievich 14 of 35

Page 15: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Luther Tychonievich 15 of 35

Page 16: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

• History happened, creating artifacts(documents, monuments, memories…)

• Not part of research; should notappear in tools

Luther Tychonievich 16 of 35

Page 17: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

• Step 1: discover artifacts• Digitization helps us all• Citations are getting better…• Logs (negative evidence) need work

Luther Tychonievich 17 of 35

Page 18: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

• Many tiers: shapes → glyphs →words → meaning → claims

• Single version = data pollution• Transcription community muchbetter at this than us

Luther Tychonievich 18 of 35

Page 19: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

• We don’t consciously consider everyclaim we see

• How do you detect, let alone store,subconscious selection?

• Computer could notice for us…

Luther Tychonievich 19 of 35

Page 20: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

• “Is that my John?”• Turns claims into people• The crux of genealogy; most errorsI’ve seen happen here

• Deserves a second slide…Luther Tychonievich 20 of 35

Page 21: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

• We need to1. record matches (and non-matches)2. allow changes3. record reason for each match4. embrace ambiguity

• More tools gaining 1, a few have 2; 3is unusual, 4 yet to appear

• Deserves a third slide…Luther Tychonievich 21 of 35

Page 22: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

• Match algebra:• A = B• A ≠ B• {A, B, C, D, …} ≥ 2 individuals

• Matches of any noun (person, place,event, etc.)

• Often evidence for A = B and A ≠ B

Luther Tychonievich 22 of 35

Page 23: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

• The “hidden” step today: from“name: Jane” we infer “gender:female” but don’t record that wemade an inference.

• Data simple, missing in our tools

Luther Tychonievich 23 of 35

Page 24: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

• The final belief is logicallyuninteresting, but socially vital

• Just one is a problem• Many tools store only beliefs andartefacts found

Luther Tychonievich 24 of 35

Page 25: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Luther Tychonievich 25 of 35

Page 26: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Outline• Two definitions of “proof ”• Can genealogy be rigorous?• Can genealogy tools be rigorous?• An upper bound on possible rigor• Achieving rigor, and beyond

Luther Tychonievich 26 of 35

Page 27: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Do we want rigorous tools?• Depends on the audience• Option of rigor seems good• Can we make a “proof-carryingGEDCOM”?

Luther Tychonievich 27 of 35

Page 28: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

To achieve rigor:• Embrace ambiguity

• 2+ conflicting versions of each step• Adopt best-practice transcription• Match algebra• Support reasons for matches• Add inferences explicitly

Luther Tychonievich 28 of 35

Page 29: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

…and beyond• Track the “select” step• Inference rules could allow

• translation-free rationales• statistically-sound probability

• Mid-research belief: A = B = C ≠ A• Time-varying rigor could berevealing data

Luther Tychonievich 29 of 35

Page 30: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Outline• Two definitions of “proof ”• Can genealogy be rigorous?• Can genealogy tools be rigorous?• An upper bound on possible rigor• Achieving rigor, and beyond• Extra Slides

Luther Tychonievich 30 of 35

Page 31: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

More on Inferences• Inference step:rule + antecedents = consequent

• Antecedents, consequent: claims• source of consequent: the inference• support of conseq.: the antecedents• rationale of consequent: the rule• Support may be enough…

Luther Tychonievich 31 of 35

Page 32: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Inference “rule”s• A predicate over antecedents• A claim built from antecedents• Could be probabilistic (“trend” not“rule”)

• Consequent could be claim, match,non-match, etc.

• Encodes “why” as pure dataLuther Tychonievich 32 of 35

Page 33: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

More on Probability• Probability a claim is true: 0 or 1• Probability a rule’s consequent istrue: validated

validated + invalidated(approximate)

• Is “83% chance Jane is John’smother” useful?

• What about “99.7% chance eitherJane, Sue, or Anna is John’s mother”?

Luther Tychonievich 33 of 35

Page 34: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Outline• Two definitions of “proof ”• Can genealogy be rigorous?• Can genealogy tools be rigorous?• An upper bound on possible rigor• Achieving rigor, and beyond• Extra Slides

Luther Tychonievich 34 of 35

Page 35: From Purple Prose to Machine-Checkable Proofs by Luther Tychonievich

Questions?

These slides: http://www.cs.virginia.edu/luther/RT2015.pdfMe: [email protected] or [email protected]

Luther Tychonievich 35 of 35