The Certainty of Citations


A proposal for an objective method of measuring certainty

Genealogy Background

Notice the light at the top of the picture.

The FM Bobo Story

Grandmother

Grandfather of grandmother

1860 Census

1870 Census

Marriage Record

Carroll County Arkansas Marriage Records, Eastern District Grooms Index, 1869-1930

Book | Page | Groom           | Age | Bride            | Age | Date
A    | 63   | BOBO FRANCES M. | 19  | LITTRELL MATILDA | 16  | 6/02/1872

http://www.rootsweb.com/~arcchs/MARB.html

Note 3 year gap in age.

1880 Census

Remember Jarrett for later

1920 Census

Jarrett’s Funeral Book

Record Summary

Record Date | Record Type | Birth Reported | Age Reported | Implied Birth | Death Reported | Census Age Date
8/23/1860   | CEN         |                | 8            | 1852          |                | 1-Jun
7/14/1870   | CEN         |                | 18           | 1852          |                | 1-Jun
6/2/1872    | MAR         |                | 19           | 1853          |                |
6/17/1880   | CEN         |                | 25           | 1855          |                | 1-Jun
1/22/1920   | CEN         |                | 71           | 1849          |                | 1-Jan
2/12/1951   | FUN         |                |              |               | 11/17/1932     |
1/1/1955    | GRAV        | 10/1/1845      |              | 1845          | 11/10/1931     |
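The Implied Birth column can be reproduced with simple arithmetic. The sketch below assumes the implied birth year is just the record year minus the reported age; the function name `implied_birth_year` and the tuple layout are illustrative and do not come from the slides.

```python
from datetime import date

# (record date, record type, reported age) for the Record Summary rows
# that report an age. The tuple layout is an illustrative assumption.
RECORDS = [
    (date(1860, 8, 23), "CEN", 8),
    (date(1870, 7, 14), "CEN", 18),
    (date(1872, 6, 2), "MAR", 19),
    (date(1880, 6, 17), "CEN", 25),
    (date(1920, 1, 22), "CEN", 71),
]


def implied_birth_year(record_date: date, reported_age: int) -> int:
    """Assumed rule: implied birth year = record year - reported age."""
    return record_date.year - reported_age


implied = [implied_birth_year(d, age) for d, _, age in RECORDS]
spread = max(implied) - min(implied)

for (d, kind, age), year in zip(RECORDS, implied):
    print(f"{d.isoformat()} {kind} age {age:>2} -> born about {year}")
print(f"Spread among age-based estimates: {spread} years")
```

The age-based records alone disagree by six years (1849 to 1855), and the gravestone's 1845 widens the range further, which is the problem the rest of the talk tries to measure.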

Let’s talk about that …

Note person partially in picture.

The Information Flow Diagram

• Event – an association of an action, place, time, and person(s)

Diagram: EVENT

Dick Eastman at GENTECH2, January 1994

The Information Flow Diagram

• Reporter – a person who creates a record about an event.

• We can measure confidence or bias.

Diagram: EVENT → REPORTER

John Wylie, president of GENTECH for 5 years

The Information Flow Diagram

• Record – a report about an event, which may not be complete or accurate

• Measure granularity.

Diagram: EVENT → REPORTER → RECORD

What’s Granularity?

      | Small                   | Medium        | Large
NAME  | James Powell Sharbrough | J Sharbrough  | Sharbrough
DATE  | June 2, 1872            | June, 1872    | 1872
PLACE | 123 Elm St              | Harris County | Texas

Granularity Examples

      | Case 1                     | Case 2              | Case 3
Name  | FM Bobo (2)                | Francis M Bobo (3)  | Bobo (1)
Date  | 1953 (1)                   | June 1853 (2)       | 2 Jun 1872 (3)
Place | 153 Elm St, Tulsa, OK (3)  | Carroll Co, Ark (2) | Ark (1)
Total | 6                          | 7                   | 5
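The totals in the granularity examples are the sum of three per-field levels. A minimal sketch of that tally follows, assuming each field is rated 1 (coarse) to 3 (fine) by the reviewer; the class name `Granularity` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Granularity:
    """Per-field granularity levels: 1 = coarse, 2 = medium, 3 = fine."""
    name: int
    date: int
    place: int

    def __post_init__(self) -> None:
        # Reject any level outside the 1..3 scale used in the examples.
        for field_name in ("name", "date", "place"):
            level = getattr(self, field_name)
            if level not in (1, 2, 3):
                raise ValueError(f"{field_name} level must be 1, 2, or 3, got {level}")

    def total(self) -> int:
        """Sum of the three levels, so the result lies between 3 and 9."""
        return self.name + self.date + self.place


case_1 = Granularity(name=2, date=1, place=3)  # FM Bobo / 1953 / 153 Elm St, Tulsa, OK
case_2 = Granularity(name=3, date=2, place=2)  # Francis M Bobo / June 1853 / Carroll Co, Ark
case_3 = Granularity(name=1, date=3, place=1)  # Bobo / 2 Jun 1872 / Ark
print(case_1.total(), case_2.total(), case_3.total())  # prints "6 7 5"
```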

The Information Flow Diagram

• Reviewer – a person who reviews records and draws conclusions.

• Evaluate ER Gap, evaluate Reporter.

Diagram: EVENT → REPORTER → RECORD → REVIEWER

Tony Burroughs, NGS 2001, Portland OR

“ER Gap”

The Information Flow Diagram

• Conclusion – a statement by a reviewer about a collection of records related to an event

• Report – a collection of conclusions.

Diagram: EVENT → REPORTER → RECORD → REVIEWER → REPORT

ER Gap

Diagram (axis runs from Near to Far), with ER Gap values:

“Primary” record: 2

“Secondary” record: 1

All records about my family: 0

Features of EVIDENCE: The Record

• Granularity

• “Mind the Gap” - ER Gap

• Reporter

CONCLUSION – Rate It

• 1 - Believe

• 2 - Know

• 3 - Can Prove

• 0 – No claim

• Negative numbers -1, -2, -3

TRUST: The Report

• Do this like eBay

So many formulas …

• … so few examples.

• Record granularity measurement – 3 to 9

• ER Gap – 0, 1, or 2

• Reviewer evaluation of reporter – 1 to 10

• Reviewer confidence – -3 to 3

• Trust number, positive feedback ratio

• [Granularity / 5] + [ER Gap] + [Report Eval / 5] + [Reviewer Confidence] + [Trust ratio / 0.5]
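The formula above can be tried out on a single record. The sketch below is one reading of it, not a definitive implementation: it treats the square brackets as plain grouping, reads the reporter evaluation range as 1 to 10, and assumes the trust ratio is a fraction between 0 and 1. The function name `certainty_score` and its parameter names are hypothetical.

```python
def certainty_score(granularity: int, er_gap: int, reporter_eval: float,
                    reviewer_confidence: int, trust_ratio: float) -> float:
    """Combine the five measurements using the formula from the slide.

    Assumptions (not stated on the slide): brackets are plain grouping,
    reporter_eval runs from 1 to 10, and trust_ratio lies in [0, 1].
    """
    if not 3 <= granularity <= 9:
        raise ValueError("granularity must be between 3 and 9")
    if er_gap not in (0, 1, 2):
        raise ValueError("er_gap must be 0, 1, or 2")
    if not 1 <= reporter_eval <= 10:
        raise ValueError("reporter_eval must be between 1 and 10")
    if not -3 <= reviewer_confidence <= 3:
        raise ValueError("reviewer_confidence must be between -3 and 3")
    if not 0.0 <= trust_ratio <= 1.0:
        raise ValueError("trust_ratio must be between 0 and 1")

    return (granularity / 5
            + er_gap
            + reporter_eval / 5
            + reviewer_confidence
            + trust_ratio / 0.5)


# Hypothetical inputs: granularity 7, a "primary" record (ER Gap 2),
# reporter rated 8, reviewer "knows" (2), 90% positive feedback.
score = certainty_score(7, 2, 8, 2, 0.9)
print(round(score, 2))
```

With these inputs the terms are 1.4 + 2 + 1.6 + 2 + 1.8, for a total of 8.8. Each term contributes at most about 2 or 3 points, so no single measurement dominates the total.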

Demographic Info

Medical Info

The Death Certificate

It’s “What-if” Time

What if we could make the future however we like?

Mechanical Certainty

Finding Needles in Really Big Haystacks

Record Linking

• Building Indices

• Finding larger patterns

\[
\text{FREQUENCY RATIO} = \frac{\text{frequency of outcome } (x, y) \text{ among LINKED pairs}}{\text{frequency of outcome } (x, y) \text{ among UNLINKABLE pairs}}
\]

Where:

• x indicates the identifier and its value on the record from the file initiating the search (record A);

• y indicates the identifier and its value on the record from the file being searched (record B);

• LINKED pairs may refer either to all linked pairs, or to a defined subset of these; and

• UNLINKABLE pairs may refer either to all unlinkable pairs, or to a defined subset, provided the linked and the unlinkable sets (or subsets) are otherwise strictly comparable with each other.
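To make the frequency ratio concrete, the sketch below computes it from two small lists of comparison outcomes. The data is a toy example invented for illustration, and the function name `frequency_ratio` is hypothetical; only the ratio itself comes from the slide.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

Outcome = Tuple[Hashable, Hashable]  # (x, y): value on record A, value on record B


def frequency_ratio(outcome: Outcome,
                    linked: Sequence[Outcome],
                    unlinkable: Sequence[Outcome]) -> float:
    """Relative frequency of `outcome` among linked pairs divided by its
    relative frequency among unlinkable pairs."""
    if not linked or not unlinkable:
        raise ValueError("both pair sets must be non-empty")
    linked_count = Counter(linked)[outcome]
    unlinkable_count = Counter(unlinkable)[outcome]
    if unlinkable_count == 0:
        raise ValueError("outcome never occurs among unlinkable pairs; ratio undefined")
    # (linked_count / len(linked)) / (unlinkable_count / len(unlinkable)),
    # rearranged to use a single division.
    return (linked_count * len(unlinkable)) / (unlinkable_count * len(linked))


# Toy data: first initials compared on ten linked and ten unlinkable pairs.
linked_pairs = [("J", "J")] * 9 + [("J", "T")]
unlinkable_pairs = [("J", "J")] + [("J", "T")] * 9

print(frequency_ratio(("J", "J"), linked_pairs, unlinkable_pairs))
```

A ratio above 1 means the outcome is more typical of true links; a ratio below 1 means it is more typical of chance pairings.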

Examples

– FIRST INITIALS: AGREEMENT; DISAGREEMENT; LETTER “Q”

– YEAR OF BIRTH: SIMILARITY (difference = 1 year); DISSIMILARITY (difference = 11+ years)

– GIVEN NAMES: SIMILARITY (first 3 letters agree, none disagree, e.g. Sam vs Samuel); SIMILARITY + DISSIMILARITY (first 3 letters agree, 4th disagrees, e.g. Samuel vs Sampson)

– DIFFERENT BUT LOGICALLY RELATED IDENTIFIERS: PLACE of WORK vs PLACE of DEATH (Provo vs Salt Lake City)
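The given-name and year-of-birth outcomes above can be expressed as small classifier functions. This is a sketch under stated assumptions: the slide gives only the example outcomes, so the labels returned for cases it does not list (exact agreement, outright disagreement, intermediate year differences) and the function names are illustrative choices. The code also generalizes "4th letter disagrees" to "any letter after the third disagrees".

```python
def compare_given_names(a: str, b: str) -> str:
    """Classify a pair of given names by how their letters agree."""
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return "AGREEMENT"
    if len(a) < 3 or len(b) < 3 or a[:3] != b[:3]:
        return "DISAGREEMENT"
    shorter, longer = sorted((a, b), key=len)
    if longer.startswith(shorter):
        # First 3 letters agree and none disagree, e.g. Sam vs Samuel.
        return "SIMILARITY"
    # First 3 letters agree but a later letter disagrees, e.g. Samuel vs Sampson.
    return "SIMILARITY + DISSIMILARITY"


def compare_birth_years(a: int, b: int) -> str:
    """Classify a pair of birth years by the size of their difference."""
    diff = abs(a - b)
    if diff == 0:
        return "AGREEMENT"
    if diff == 1:
        return "SIMILARITY"
    if diff >= 11:
        return "DISSIMILARITY"
    # The slide does not name the outcomes for differences of 2 to 10 years.
    return f"DIFFERENCE OF {diff} YEARS"


print(compare_given_names("Sam", "Samuel"))
print(compare_given_names("Samuel", "Sampson"))
print(compare_birth_years(1852, 1853))
print(compare_birth_years(1845, 1856))
```

Each distinct outcome label would then get its own frequency ratio, so a near miss such as Sam vs Samuel can count in favor of a link instead of being scored as a plain disagreement.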

Some more examples

Identifiers compared   | Comparison outcome | Percentage frequency, Links | Percentage frequency, Non-Links | Global frequency ratio (links/non-links)
SURNAME                | Agree              | 96.5                        | 0.1                             | 965/1
SURNAME                | Disagree           | 3.5                         | 99.9                            | 1/29
FIRST NAME             | Agree              | 79.0                        | 0.9                             | 88/1
FIRST NAME             | Disagree           | 21.0                        | 99.1                            | 1/5
MIDDLE INITIAL         | Agree              | 88.8                        | 7.5                             | 12/1
MIDDLE INITIAL         | Disagree           | 11.2                        | 92.5                            | 1/8
YEAR OF BIRTH          | Agree              | 77.3                        | 1.1                             | 70/1
YEAR OF BIRTH          | Disagree           | 22.7                        | 98.9                            | 1/4
MONTH OF BIRTH         | Agree              | 93.3                        | 8.3                             | 11/1
MONTH OF BIRTH         | Disagree           | 6.7                         | 91.7                            | 1/14
DAY OF BIRTH           | Agree              | 85.1                        | 3.3                             | 26/1
DAY OF BIRTH           | Disagree           | 14.9                        | 96.7                            | 1/6
STATE/COUNTRY OF BIRTH | Agree              | 98.1                        | 11.7                            | 8/1
STATE/COUNTRY OF BIRTH | Disagree           | 1.9                         | 88.3                            | 1/46
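The global frequency ratios in this table follow directly from the two percentage columns. The sketch below recomputes them and then multiplies the ratios for several identifiers into one overall figure for a candidate pair. The multiplication step is an assumption that is not on the slide: it treats the identifiers as independent. The names `TABLE`, `global_ratio`, and `combined_odds` are hypothetical.

```python
from math import prod
from typing import Dict, Tuple

# identifier -> outcome -> (percent among links, percent among non-links),
# copied from the table above.
TABLE: Dict[str, Dict[str, Tuple[float, float]]] = {
    "SURNAME": {"agree": (96.5, 0.1), "disagree": (3.5, 99.9)},
    "FIRST NAME": {"agree": (79.0, 0.9), "disagree": (21.0, 99.1)},
    "MIDDLE INITIAL": {"agree": (88.8, 7.5), "disagree": (11.2, 92.5)},
    "YEAR OF BIRTH": {"agree": (77.3, 1.1), "disagree": (22.7, 98.9)},
    "MONTH OF BIRTH": {"agree": (93.3, 8.3), "disagree": (6.7, 91.7)},
    "DAY OF BIRTH": {"agree": (85.1, 3.3), "disagree": (14.9, 96.7)},
    "STATE/COUNTRY OF BIRTH": {"agree": (98.1, 11.7), "disagree": (1.9, 88.3)},
}


def global_ratio(identifier: str, outcome: str) -> float:
    """Links percentage divided by non-links percentage for one outcome."""
    links, non_links = TABLE[identifier][outcome]
    return links / non_links


def combined_odds(outcomes: Dict[str, str]) -> float:
    """Multiply the ratios of several identifiers (assumes independence)."""
    return prod(global_ratio(ident, outcome) for ident, outcome in outcomes.items())


print(round(global_ratio("SURNAME", "agree")))         # the table's 965/1
print(round(1 / global_ratio("SURNAME", "disagree")))  # the table's 1/29

# A pair that agrees on surname and first name but disagrees on year of birth.
pair = {"SURNAME": "agree", "FIRST NAME": "agree", "YEAR OF BIRTH": "disagree"}
print(combined_odds(pair) > 1)
```

In this example the two name agreements outweigh the year-of-birth disagreement, so the combined figure still favors a link. That is the situation the FM Bobo records present, with consistent names and inconsistent ages.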

Discrimination

• A lookup table containing the frequencies of values for identifiers, as they appear in the file being searched.

• SURNAMES Brown (0.39), Aube (0.014), and Skuda (0.00004).

• FIRST NAMES John (5.30), Axel (0.020), and Ulder (0.0045).
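A lookup table like this can be held in a plain dictionary. The sketch below assumes the listed frequencies are percentages of records in the file being searched, which the slide does not state, and converts each to a "one record in N" figure to show how much more discriminating a rare value is. The names `SURNAME_FREQ`, `FIRST_NAME_FREQ`, and `one_in_n` are hypothetical.

```python
from typing import Dict

# Value frequencies from the slide, assumed here to be percentages.
SURNAME_FREQ: Dict[str, float] = {"Brown": 0.39, "Aube": 0.014, "Skuda": 0.00004}
FIRST_NAME_FREQ: Dict[str, float] = {"John": 5.30, "Axel": 0.020, "Ulder": 0.0045}


def one_in_n(value: str, table: Dict[str, float]) -> float:
    """Return N such that roughly one record in N carries `value`."""
    percent = table[value]
    if percent <= 0:
        raise ValueError("frequency must be positive")
    return 100.0 / percent


for name in ("John", "Axel", "Ulder"):
    print(f"{name}: about 1 record in {one_in_n(name, FIRST_NAME_FREQ):,.0f}")
```

Under this reading, two records that both say "Ulder" are far less likely to match by chance than two that both say "John", so agreement on the rarer value deserves more weight.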

Competing Hypotheses

Record Date | Record Type | Birth Reported | Age Reported | Implied Birth | Death Reported | Census Age Date | Rate
8/23/1860   | CEN         |                | 8            | 1852          |                | 1-Jun           | 60
7/14/1870   | CEN         |                | 18           | 1852          |                | 1-Jun           | 60
6/2/1872    | MAR         |                | 19           | 1853          |                |                 | 40
1/22/1920   | CEN         |                | 71           | 1849          |                | 1-Jan           | 40
6/17/1880   | CEN         |                | 25           | 1855          |                | 1-Jun           | 25
2/12/1951   | FUN         |                |              |               | 11/17/1932     |                 | 10
1/1/1955    | GRAV        | 10/1/1845      |              | 1845          | 11/10/1931     |                 | 5
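One way to compare the competing hypotheses is to add up the Rate values for each implied birth year. The slide shows only the per-record rates, so summing them is an assumption made here for illustration; the names `ROWS` and `tally_by_birth_year` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (record date, record type, implied birth year or None, rate) from the table.
ROWS: List[Tuple[str, str, Optional[int], int]] = [
    ("8/23/1860", "CEN", 1852, 60),
    ("7/14/1870", "CEN", 1852, 60),
    ("6/2/1872", "MAR", 1853, 40),
    ("1/22/1920", "CEN", 1849, 40),
    ("6/17/1880", "CEN", 1855, 25),
    ("2/12/1951", "FUN", None, 10),  # reports a death date only
    ("1/1/1955", "GRAV", 1845, 5),
]


def tally_by_birth_year(rows: List[Tuple[str, str, Optional[int], int]]) -> Dict[int, int]:
    """Sum the rates of all records that imply the same birth year."""
    totals: Dict[int, int] = defaultdict(int)
    for _date, _kind, year, rate in rows:
        if year is not None:
            totals[year] += rate
    return dict(totals)


totals = tally_by_birth_year(ROWS)
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
for year, total in ranked:
    print(year, total)
```

Under this tally the two early censuses reinforce each other, and 1852 comes out well ahead of the gravestone's 1845.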

The Digital Research Assistant

• Search for records on internet

• Evaluate their relevance to assignment

• Evaluate their granularity, confidence, etc

• Evaluate patterns, such as families

• Report matches

• Let me set the knobs for the parameters

The DRA will have ...

• A hierarchy of useful comparison algorithms

• A method of searching across the Internet - and paying for it

• A method of documenting the source of that search that satisfies the rules of preserving intellectual property and academic research

Who knows what the formula will be?

• We are asking which dragons must be slain, but we aren’t saying how it must happen.

• We are talking about possible ways to accomplish our goal.

• That goal is connecting to new information, with confidence.

Summary

• Any type of review

– Measurements of Records

– Measurement of conclusions

– Rating of publishers

• Mechanical searches

– Record Linking

– Smart Searches

– Groupwork and Rights

Never forget to have fun
