Authorities Futures
T. Hickey, OCLC
Why authorities?
Searching
Browsing
Variations on Tchaikovsky
• NACO: Tchaikovsky, Peter Ilich,‡1840-1893
• German: Cajkovskij, Petr I.‡1840-1893
• French: Cajkovskij‡Piotr Ilʹic‡1840-1893
• Cyrillic: Чайкoвский, Пётр Ильич (1840-1893)
More ways to say Chajkowskii
Ciaikovsky, Piotr Ilic 1840-1893
Tschaikowsky, Peter Iljitch 1840-1893
Tchaikowsky, Peter Iljitch 1840-1893
Ciaikovsky, Pjotr Iljc 1840-1893
Cajkovskij, P. I. 1840-1893
Tsjaikovsky, Peter Iljitsj 1840-1893
Czajkowski, Piotr 1840-1893
Chaikovsky, P. I. 1840-1893
Csajkovszkij, Pjotr Iljics 1840-1893
Tsjaikovskiej, Pjotr Iljietsj 1840-1893
Tjajkovskij, Pjotr Ilitj 1840-1893
Caikovskis, P. 1840-1893
Chaikovskii, Petr Ilʹich 1840-1893
Tchaikovski, P. 1840-1893
Tchaikovski, Piotr Ilyitch 1840-1893
Chaikovskii, P. 1840-1893
Tchaikovsky, P. 1840-1893
Tchaikovsky, Piotr Ilitch 1840-1893
Tschaikowsky, Pjotr Iljitsch 1840-1893
Tschajkowskij, Pjotr Iljitsch 1840-1893
Tchaikovski, P. I. 1840-1893
Ciaikovskij, Piotr 1840-1893
Ciaikovskji, Piotr Ilijich 1840-1893
Tschaikowski, P. I. 1840-1893
Tschaikowski, Peter Illic 1840-1893
Tjajkovskij, Peter 1840-1893
Chaikovski, Pʹotr Ilich 1840-1893
Tschaikousky 1840-1893
Tschaijkowskij, P. I. 1840-1893
Tschaikowsky, P. I. 1840-1893
Chaikovski, P. I. 1840-1893
Tchaikovski, Petr Ilitch 1840-1893
Ciaikovski, Peter Ilic 1840-1893
Tschaikowski, Pjotr 1840-1893
Tchaikowsky, Pyotr 1840-1893
Sinopov, P. 1840-1893
Tchaikovskij, Piotr Ilic 1840-1893
柴可夫斯基
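Simple string normalization cannot collapse these forms into one. The following minimal Python sketch folds case and diacritics on five of the variants above. It is not OCLC's method, and the helpers `fold`, `surname_key` and `date_key` are hypothetical. After folding, the surnames still give five different keys, while every date span is identical.

```python
import unicodedata

# Five of the variant headings listed above.
VARIANTS = [
    "Tchaikovsky, P. 1840-1893",
    "Cajkovskij, P. I. 1840-1893",
    "Chaikovskii, Petr Ilʹich 1840-1893",
    "Tschaikowsky, Peter Iljitch 1840-1893",
    "Czajkowski, Piotr 1840-1893",
]

def fold(text: str) -> str:
    """Lowercase and drop combining diacritics (e.g. 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()

def surname_key(heading: str) -> str:
    """Folded text before the first comma (hypothetical comparison key)."""
    return fold(heading.split(",", 1)[0].strip())

def date_key(heading: str) -> str:
    """Last space-separated token, here the '1840-1893' date span."""
    return heading.rsplit(" ", 1)[-1]

print(sorted({surname_key(v) for v in VARIANTS}))  # five different surname keys
print({date_key(v) for v in VARIANTS})             # a single shared date span
```

Because the name strings disagree even after folding, evidence such as dates has to carry much of the weight when records are matched.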
Wider coverage
• Published, unpublished, objects, licensed, archival
Multiple sources
• Machine generated
• Information professionals, scholars, researchers, enthusiasts
Broader use of APIs
• Multiple views
• Better context
• Better navigation
• More mashups
Authorities touch everything
33 nodes
132 CPUs
528 gigabytes memory
33 terabytes disk
100-fold speedup
1 hour → <1 minute
1 day → 15 minutes
1 month → 8 hours
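The cluster and speedup figures can be checked with a few lines of Python. This sketch assumes three things: resources are split evenly across nodes, the speedup is a uniform 100-fold, and a month has 30 days.

```python
# Cluster figures from the slide.
NODES, CPUS, MEMORY_GB, DISK_TB = 33, 132, 528, 33
SPEEDUP = 100  # "100-fold speedup"

def per_node():
    """CPUs, GB of memory and TB of disk per node, assuming an even split."""
    return CPUS / NODES, MEMORY_GB / NODES, DISK_TB / NODES

def parallel_minutes(serial_minutes: float) -> float:
    """Ideal wall-clock minutes after a uniform SPEEDUP-fold speedup."""
    return serial_minutes / SPEEDUP

HOUR, DAY, MONTH = 60, 24 * 60, 30 * 24 * 60  # in minutes; 30-day month assumed

print(per_node())
for label, minutes in [("1 hour", HOUR), ("1 day", DAY), ("1 month", MONTH)]:
    print(f"{label:8s} -> {parallel_minutes(minutes):6.1f} minutes")
```

The results are 4 CPUs, 16 GB and 1 TB per node, and speedup times of 0.6 minutes, 14.4 minutes and 432 minutes (7.2 hours). The slide's "8 hours" is therefore a rounded-up figure; a 31-day month gives about 7.4 hours.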
Controlling WorldCat
Virtual International Authority File
WorldCat Identities
Controlling names in WorldCat
• Has been done semi-manually
  – Encourages review of all links
• For Identities we did this automatically
  – Research copy of WorldCat
  – Very aggressive matching
• How to move links to WorldCat?
Pretend you are a Connexion Client
• Program to:
  – Log in
  – Search for record
  – Verify heading hasn’t changed
  – Insert authorized form
  – Add link
  – Do replace
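The steps above amount to a small per-record routine. Here is a minimal Python sketch of it under stated assumptions. `FakeConnexionClient`, `Record` and `control_heading` are hypothetical stand-ins written for illustration; they are not the real Connexion client interface, which the slides do not describe.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Record:
    """Hypothetical bib record: an OCLC number and one name heading."""
    oclc_number: str
    heading: str
    authority_link: Optional[str] = None

class FakeConnexionClient:
    """Hypothetical in-memory stand-in for a Connexion session (not the real API)."""

    def __init__(self, records: Dict[str, Record]):
        self._records = records
        self.logged_in = False

    def log_in(self, user: str, password: str) -> None:
        self.logged_in = True

    def search(self, oclc_number: str) -> Optional[Record]:
        assert self.logged_in, "must log in first"
        rec = self._records.get(oclc_number)
        # Return a copy so edits stay local until replace() is called.
        return None if rec is None else Record(rec.oclc_number, rec.heading, rec.authority_link)

    def replace(self, record: Record) -> None:
        assert self.logged_in, "must log in first"
        self._records[record.oclc_number] = record

def control_heading(client, oclc_number, expected_heading, authorized_form, authority_id):
    """Apply one precomputed match; skip the record if its heading changed meanwhile."""
    record = client.search(oclc_number)              # transaction 1: retrieve
    if record is None or record.heading != expected_heading:
        return False                                  # heading changed: leave it alone
    record.heading = authorized_form                  # insert authorized form
    record.authority_link = authority_id              # add link
    client.replace(record)                            # transaction 2: replace
    return True

# Usage with made-up data; "auth-0001" is a placeholder authority id.
db = {"123": Record("123", "Tchaikovsky, Peter Ilich")}
client = FakeConnexionClient(db)
client.log_in("user", "secret")
ok = control_heading(client, "123", "Tchaikovsky, Peter Ilich",
                     "Tchaikovsky, Peter Ilich, 1840-1893", "auth-0001")
print(ok, db["123"])
```

The verify step matters because matching ran against a copy of the data. If a cataloger edited the heading in the meantime, the update is skipped rather than overwriting that work.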
Then just replace 26 million records
• Each update takes two transactions
  – Retrieve the record
  – Replace the record
• If it takes 2 seconds/update
  – 52,000,000 seconds
  – ~2 years
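The serial estimate is plain arithmetic. Here is a quick Python check of the figures, assuming a 365-day year:

```python
RECORDS = 26_000_000
SECONDS_PER_UPDATE = 2  # retrieve + replace, as assumed on the slide

total_seconds = RECORDS * SECONDS_PER_UPDATE  # 52,000,000 seconds
total_days = total_seconds / 86_400           # about 602 days
total_years = total_days / 365                # about 1.65 years

print(f"{total_seconds:,} s = {total_days:.0f} days = {total_years:.2f} years")
```

The result is about 1.65 years, which the slide rounds to "~2 years".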
But we can run multiple clients
• Connexion can handle 40+ of these clients
  – ~20 records/second
• Offline processing has limited capacity
  – Run 32 clients for 12 hours at 16 updates/second
  – ~700,000 overnight
  – Up to a million/day
• 3 million/week
• 2-3 months elapsed time
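The parallel figures follow from the same arithmetic. This sketch assumes, as the slide implies, two things: each client still spends 2 seconds per update, and the clients do not slow one another down.

```python
RECORDS = 26_000_000
SECONDS_PER_UPDATE = 2

def throughput(clients: int) -> float:
    """Updates per second, assuming independent clients with no contention."""
    return clients / SECONDS_PER_UPDATE

peak = throughput(40)                     # 20 records/second
overnight = throughput(32) * 12 * 3600    # updates in a 12-hour window
weeks = RECORDS / 3_000_000               # elapsed weeks at 3 million/week

print(peak, int(overnight), round(weeks, 1))
```

This gives 20 records/second at peak, 691,200 updates overnight (the slide's "~700,000") and about 8.7 weeks in total, which is roughly two months.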
Virtual International Authority File
VIAF
DNB Bib & Authority
BnF Bib & Authority
LC Bib & Authority
VIAF
• ~7.5 million personal name authority records
• ~25 million bibliographic records
• ~1.2 million links between files
Match on
• Names and dates in headings
• Standard numbers
• Titles
• Coauthors
• Publishers
• Personal name as subject
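Below is a minimal Python sketch of evidence-based matching between two files' records for the same name. It is an illustration, not VIAF's actual algorithm. `NameCluster`, `match_evidence` and `is_match` are hypothetical. The acceptance rule, "no conflicting dates and at least one shared piece of evidence", is an assumption; the real system also weighed partial dates and partial titles.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

@dataclass
class NameCluster:
    """Hypothetical summary of one file's name authority plus its linked bib records."""
    birth: Optional[int] = None
    death: Optional[int] = None
    standard_numbers: FrozenSet[str] = frozenset()
    titles: FrozenSet[str] = frozenset()
    coauthors: FrozenSet[str] = frozenset()
    publishers: FrozenSet[str] = frozenset()

def dates_conflict(a: NameCluster, b: NameCluster) -> bool:
    """True if both sides give a birth (or death) year and the years differ."""
    for x, y in ((a.birth, b.birth), (a.death, b.death)):
        if x is not None and y is not None and x != y:
            return True
    return False

def match_evidence(a: NameCluster, b: NameCluster) -> List[str]:
    """Kinds of shared evidence, in a fixed order."""
    evidence = []
    if None not in (a.birth, a.death) and (a.birth, a.death) == (b.birth, b.death):
        evidence.append("double date")
    if a.standard_numbers & b.standard_numbers:
        evidence.append("standard number")
    if a.titles & b.titles:
        evidence.append("title")
    if a.coauthors & b.coauthors:
        evidence.append("joint author")
    if a.publishers & b.publishers:
        evidence.append("publisher")
    return evidence

def is_match(a: NameCluster, b: NameCluster) -> bool:
    """Hypothetical rule: no date conflict and at least one shared piece of evidence."""
    return not dates_conflict(a, b) and bool(match_evidence(a, b))

# Fournier, Marcel, 1946- vs. Fournier, Marcel, 1945-: shared title, conflicting dates.
f1946 = NameCluster(birth=1946, titles=frozenset({"Title A"}))
f1945 = NameCluster(birth=1945, titles=frozenset({"Title A"}))
print(is_match(f1946, f1945))  # False
```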
Matching situations
Hickey, Thomas Butler, ‡d 1947-
Dempsey, Lorcan
Tchaikovsky, Peter Ilich
Cajkovskij, Petr I.
Cajkovskij, Petr I. / Tchaikovsky, Peter Ilich / Чайкoвский, Пётр Ильич
Чайкoвский, Пётр Ильич
Fournier, Marcel
Fournier, Marcel,‡1946-
Fournier, Marcel,‡1945-
What makes a match?
Count       Evidence
1,338,606   Title
526,234     Double date
67,749      Joint author
47,499      LCCN
15,867      Partial date and partial title
6,454       Partial date and publisher
4,673       Partial title and publisher
4,116       Name as subject
2,158       Standard number
Next steps for VIAF
• Merged display
• Better documentation
• More participants
• Geographics
Australian Identities (in WorldCat)
51,399  Keneally, Thomas
42,679  Fox, Mem
30,301  Travers, P. L.
28,998  Lindsay, Jack
19,179  Marsden, John
16,688  Stead, Christina
15,041  Malouf, David
14,717  Jennings, Paul
13,769  Lawson, Henry
12,612  Winton, Tim
Editing
Merged result
• Immediately visible in Identities
• Persistent in Identities
• Information fed into established channels
Implementation
• SRU/SRW server (Z39.50 for the Web)
• XML returned
• XSLT style sheets transform it to HTML
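Here is a minimal Python sketch of the client side of that flow. It builds an SRU 1.1 `searchRetrieve` request URL and reads the hit count from an XML response. The base URL `http://example.org/sru` and the sample response are made up for illustration, and the query is a bare CQL term, so no server-specific index names are assumed. The standard library has no XSLT processor, so the sketch stops at parsing; the third-party `lxml.etree.XSLT` class is one way to apply a style sheet to the returned XML.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}  # SRU/SRW 1.x response namespace

def sru_search_url(base_url: str, cql_query: str, max_records: int = 10) -> str:
    """Build an SRU 1.1 searchRetrieve request URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    return f"{base_url}?{urlencode(params)}"

def number_of_records(response_xml: str) -> int:
    """Read the hit count from a searchRetrieveResponse document."""
    root = ET.fromstring(response_xml)
    node = root.find("srw:numberOfRecords", SRW_NS)
    return int(node.text) if node is not None else 0

# Made-up sample response, trimmed to the elements used here.
SAMPLE_RESPONSE = (
    '<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">'
    "<version>1.1</version><numberOfRecords>2</numberOfRecords>"
    "</searchRetrieveResponse>"
)

url = sru_search_url("http://example.org/sru", "Tchaikovsky")
print(url)
print(number_of_records(SAMPLE_RESPONSE))  # 2
```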
Syndication
• Searchable via SRU, OpenURL
• Sitemaps for harvesters
• HTML for harvesters and mobile devices
• Links in Wikipedia
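Sitemaps let harvesters find every Identities page without crawling. The following minimal Python sketch writes a sitemaps.org `urlset`. The helper names are hypothetical, and the example URL is the Identities page cited at the end of this deck. The sitemaps.org protocol caps each file at 50,000 URLs, so a large set of pages has to be split across several sitemap files.

```python
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol

def build_sitemap(urls: List[str]) -> str:
    """Serialize one <urlset> with a <url><loc> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = page
    return ET.tostring(urlset, encoding="unicode")

def split_for_sitemaps(urls: List[str], size: int = MAX_URLS_PER_SITEMAP) -> List[List[str]]:
    """Chunk a large URL list so each sitemap file stays within the limit."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

doc = build_sitemap(["http://worldcat.org/identities/lccn-n82-54463"])
print(doc)
```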
More Identities
Thomas Hickey
Chief Scientist
OCLC
http://worldcat.org/identities/lccn-n82-54463
http://orlabs.oclc.org/viaf/LC|n82054463
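The Identities and VIAF URLs above carry the same LC record number in two forms: `lccn-n82-54463` and `LC|n82054463`. Below is a hedged Python sketch of the mapping between them. The rule is inferred from this single example and is an assumption: a letter prefix, a 2-digit year and a 6-digit serial whose leading zeros the Identities form drops. Both helper names are hypothetical.

```python
import re

def identities_url(normalized_lccn: str) -> str:
    """Map e.g. 'n82054463' to the Identities URL form shown above.

    Assumption (inferred from one example): 1-3 letter prefix, 2-digit year,
    6-digit serial; the URL drops the serial's leading zeros and adds hyphens.
    """
    m = re.fullmatch(r"([a-z]{1,3})(\d{2})(\d{6})", normalized_lccn)
    if m is None:
        raise ValueError(f"unsupported LCCN form: {normalized_lccn!r}")
    prefix, year, serial = m.groups()
    return f"http://worldcat.org/identities/lccn-{prefix}{year}-{int(serial)}"

def viaf_url(source: str, record_id: str) -> str:
    """Prototype VIAF URL pattern shown above: source code, '|', record id."""
    return f"http://orlabs.oclc.org/viaf/{source}|{record_id}"

print(identities_url("n82054463"))
print(viaf_url("LC", "n82054463"))
```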