© Copyright 2013 LucidWorks
Solr Powered Libraries: A survey of the world's knowledge bases
May 2, 2013
Presented by Erik Hatcher
Thursday, May 2, 13
Abstract
With Apache Lucene and Solr search technologies, information and knowledge have become vastly more searchable, findable, and accessible. Because scholars and researchers are among the most demanding users of search systems, the problems implementers face are complex. For example, many applications built on these technologies also depend on deliberately designed-in serendipitous discovery, surfacing previously unknown, yet related and potentially interesting, content.
Libraries and other public knowledge-sharing environments, such as Wikipedia, generally embrace open source and community contributions as core principles, a natural synergy with the power, features, and community-driven ecosystem of Lucene and Solr.
This talk introduces several Solr-powered library systems, details how they work, and leaves you with lessons learned that you can apply to your own applications.
Real Solar-Powered Library!
• http://www.ktsm.com/news/texas-library-runs-sunshine
Card-carrying library geek
• Applied Research in Patacriticism (ARP)
  - Rossetti Archive: http://www.rossettiarchive.org
  - NINES: http://www.nines.org/
  - Collex: http://www.collex.org
• Blacklight
  - originated as an implementation of Solr Flare
• Presentations
  - http://code4lib.org/conference: 2007, 2009, 2010, 2011, 2013
  - Library of Congress: "Solr Powered Libraries" (2007)
    » http://www.loc.gov/today/cyberlc/feature_wdesc.php?rec=4113
  - EBTI/CBETA Conference 2008
  - Publication: “Library 2.0 Initiatives in Academic Libraries”
• Windsor Lucene Summit
• eIFL-FOSS
Rossetti Archive
NINES/Collex
Card catalog
• the original inverted index
http://commons.wikimedia.org/wiki/File:Copyright_Card_Catalog_Files.jpg
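A card catalog files each heading (author, title, subject) with pointers to the matching books; an inverted index does the same for every term. The Python sketch below is a minimal illustration only, assuming naive lowercase whitespace tokenization (Lucene's analyzers do much more) and made-up sample records: it builds term → document-ID postings and answers an AND query by intersecting them.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercased term to the set of document IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive tokenization (assumption)
            index[term].add(doc_id)
    return dict(index)


def search_all(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the IDs of documents containing every query term (boolean AND)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)


if __name__ == "__main__":
    # Made-up "catalog cards" for illustration.
    cards = {
        1: "Dante Gabriel Rossetti poems",
        2: "Rossetti paintings and drawings",
        3: "Victorian poems anthology",
    }
    index = build_inverted_index(cards)
    print(sorted(index["rossetti"]))                    # [1, 2]
    print(sorted(search_all(index, "Rossetti poems")))  # [1]
```

Looking up a term is a single dictionary access, just as finding a subject card is a single drawer lookup; the work of scanning every book happened once, at filing time.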
• http://openlibrary.org/
  - project of the Internet Archive
• Goal: "A (community editable) web page for every book"
dp.la - Digital Public Library of America
Lucene/ElasticSearch Powered
Wikimedia/Wikipedia/MediaWiki
• Solr-powered: translation memory service, GeoData extension, etc.
• "heavily modified Lucene" currently powers the main site search
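Location-aware search such as the GeoData extension typically rests on a Solr spatial filter. The slide doesn't say how GeoData queries Solr, so this is only a sketch of standard Solr spatial request parameters (`{!geofilt}`, `sfield`, `pt`, `d` in kilometers, and `geodist()` sorting), assuming a hypothetical location field named `coordinates` and a hypothetical core name.

```python
from urllib.parse import urlencode


def geofilt_params(lat: float, lon: float, radius_km: float,
                   field: str = "coordinates") -> dict[str, str]:
    """Solr params for: all docs within radius_km of (lat, lon), nearest first.

    `field` is a hypothetical location-typed field in the schema.
    """
    return {
        "q": "*:*",
        "fq": "{!geofilt}",       # reads sfield, pt and d from the params below
        "sfield": field,          # field holding each document's point
        "pt": f"{lat},{lon}",     # center point as "lat,lon"
        "d": str(radius_km),      # radius in kilometers
        "sort": "geodist() asc",  # geodist() also uses sfield and pt
    }


if __name__ == "__main__":
    params = geofilt_params(51.5074, -0.1278, 5)
    # Hypothetical core name; prefix with your own Solr base URL.
    print("/solr/articles/select?" + urlencode(params))
```

Putting `sfield` and `pt` at the top level lets the filter and the distance sort share them instead of repeating them in local params.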
HathiTrust
• "partnership of major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future."
• 10.5M books, 12TB OCR+metadata, hundreds of languages
  - "Books are different"
  - http://code4lib.org/conference/2013/burton-west
• http://www.hathitrust.org/blogs/large-scale-search
  - http://www.hathitrust.org/blogs/large-scale-search/too-many-words
  - "org.apache.solr.common.SolrException: Impossible Exception"
  - CommonGrams
  - word segmentation: autoGeneratePhraseQueries="false"
• HathiTrust Research Center
  - The infrastructure includes an entrance portal, search and collection-building tools (using Blacklight), ... analysis algorithms that can be run against the HathiTrust public domain corpus (more than 3 million volumes). In addition to the production services, the HTRC offers a development “sandbox”. The sandbox runs against non-Google scanned content (about 260,000 volumes) and provides a test-bed for interested researchers to experiment with writing their own algorithms for use in the HTRC infrastructure.
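CommonGrams is a standard Lucene/Solr response to the "too many words" problem: in a huge OCR corpus, words like "the" or "of" have enormous position lists, so phrase queries containing them are slow. The filter additionally indexes each common word joined to its neighbor as a single token. A minimal Python imitation of the index-time filter's output (every word kept, plus `_`-joined bigrams for each adjacent pair where either word is in the common-word list); the word list and tokenization are assumptions, not HathiTrust's configuration.

```python
def common_grams(tokens: list[str], common_words: set[str],
                 sep: str = "_") -> list[str]:
    """Imitate the token stream of Lucene's index-time CommonGrams filter.

    Every original token is kept; in addition, each adjacent pair in which
    at least one word is "common" is emitted as a single joined token.
    """
    out: list[str] = []
    for i, token in enumerate(tokens):
        out.append(token)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if token in common_words or nxt in common_words:
                out.append(f"{token}{sep}{nxt}")
    return out


if __name__ == "__main__":
    common = {"the", "of", "to", "be", "or", "not"}  # illustrative list only
    print(common_grams("the quick brown fox".split(), common))
    # ['the', 'the_quick', 'quick', 'brown', 'fox']
```

At query time, Solr's query-side counterpart (CommonGramsQueryFilterFactory) prefers those bigrams over the individual common words, so a phrase query no longer has to walk the huge position lists of words like "the".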
Smithsonian Institution
• http://collections.si.edu
• Many disparate data sources:
  - 19 museums, 20 libraries, 14 archives, 1 National Zoo, 1 Astrophysical Observatory, research centers in Panama, Boston, New York, Maryland, and Virginia
• "Documents" of all varieties:
  - Photographs, paintings, manuscripts, letters, postage stamps, scientific specimens, rockets, airplanes, postcards, sound recordings, posters, decorative arts, ceramics, maps, sculptures, publication papers, books, trade catalogs, etc.
• User tagging, negative/exclude filtering, DIH SolrEntityProcessor
• http://bit.ly/13P41YJ
  - http://www.basistech.com/pdf/events/open-source-search-conference/oss-2011-wang-steps-toward-open-government.pdf
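Negative (exclude) filtering can be expressed in Solr as a filter query with a leading `-`. A minimal sketch of that Solr syntax, not the Smithsonian's code: the `data_source` field and its values are hypothetical, and values are quoted (with `\` and `"` escaped) so a multi-word facet value is matched as one value.

```python
from urllib.parse import urlencode


def quote_value(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes for Solr."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def filter_params(field: str, include: list[str],
                  exclude: list[str]) -> list[tuple[str, str]]:
    """Build repeated fq params: keep `include` values, drop `exclude` values."""
    params = [("q", "*:*")]
    for value in include:
        params.append(("fq", f"{field}:{quote_value(value)}"))
    for value in exclude:
        # A leading '-' makes this a purely negative (exclusion) filter.
        params.append(("fq", f"-{field}:{quote_value(value)}"))
    return params


if __name__ == "__main__":
    # `data_source` and its values are hypothetical.
    params = filter_params("data_source", ["Smithsonian Libraries"], ["National Zoo"])
    print(urlencode(params))  # repeated fq keys are preserved
```

Returning a list of pairs rather than a dict keeps each `fq` separate, so Solr can cache every filter independently.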
• Serials Solutions Summon
• http://www.serialssolutions.com/en/services/summon
• SaaS, single unified index, match & merge
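The slide doesn't describe how Summon's match & merge works. As a rough illustration of the idea (folding duplicate records from many providers into one entry of a unified index), this sketch assumes records can be matched on a normalized DOI or ISBN; the field names, the match key, and the first-value-wins merge rule are all hypothetical, and it does not, for example, treat an ISBN-10 and its ISBN-13 as equal.

```python
import re
from typing import Optional


def match_key(record: dict) -> Optional[str]:
    """Hypothetical match key: normalized DOI, else ISBN with punctuation removed."""
    if record.get("doi"):
        return "doi:" + record["doi"].strip().lower()
    if record.get("isbn"):
        return "isbn:" + re.sub(r"[^0-9X]", "", record["isbn"].upper())
    return None


def match_and_merge(records: list[dict]) -> list[dict]:
    """Fold records sharing a match key into one entry, keeping every source."""
    merged: dict[str, dict] = {}
    for i, rec in enumerate(records):
        key = match_key(rec) or f"unmatched:{i}"  # no key: stays on its own
        target = merged.setdefault(key, {"sources": []})
        for field, value in rec.items():
            if field == "source":
                target["sources"].append(value)
            else:
                target.setdefault(field, value)  # first record's value wins
    return list(merged.values())  # dicts keep insertion order


if __name__ == "__main__":
    # Made-up records from hypothetical providers.
    records = [
        {"title": "Solr in Libraries", "doi": "10.9999/ABC", "source": "provider A"},
        {"title": "Solr in libraries", "doi": "10.9999/abc", "source": "provider B", "year": 2013},
        {"title": "Another paper", "source": "provider C"},
    ]
    for entry in match_and_merge(records):
        print(entry)
```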
Astrophysics Data System Labs
• Smithsonian, NASA, Harvard
• http://adslabs.org
http://code4lib.org/conference/2013/luker
• vufind.org
• Powers main HathiTrust UI (currently) and many more
  - see http://vufind.org/wiki/installation_status
• "Blacklight is an open source Ruby on Rails gem that provides a discovery interface for any Solr index. Blacklight provides a default user interface which is customizable via the standard Rails (templating) mechanisms. Blacklight accommodates heterogeneous data, allowing different information displays for different types of objects."
  - http://projectblacklight.org
• Founded at the University of Virginia (2007): search.lib.virginia.edu
  - UV-A solar radiation == blacklight
• Initial contributors: UVa, Stanford, JHU, WGBH
• University of Hull, United States Holocaust Memorial Museum, University of Wisconsin-Madison, Tufts, Australian gov't (Natural Resource Management), Penn State's ScholarSphere, Northwestern, New York Public Library, NCSU, Columbia University, Agriculture Network Information Center (USDA), alicelaw.org (American Legislative and Issue Campaign Exchange, a one-stop web-based public library of progressive state and local laws), and many more
• http://projecthydra.org/ uses Blacklight as its UI component
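A discovery interface such as Blacklight leans heavily on Solr's facet counts. By default, Solr's JSON response returns each facet field as a flat list alternating values and counts, so a UI has to pair them up before rendering. Blacklight itself is Ruby; this Python sketch only illustrates the request parameters and response handling, using a made-up sample response and a hypothetical `format` field.

```python
import json


def facet_request(query: str, fields: list[str]) -> list[tuple[str, str]]:
    """Request params asking Solr for facet counts on the given fields."""
    params = [("q", query), ("facet", "true"), ("facet.mincount", "1")]
    params += [("facet.field", f) for f in fields]
    return params


def facet_pairs(solr_response: str, field: str) -> list[tuple[str, int]]:
    """Pair up Solr's flat facet list ["v1", n1, "v2", n2, ...] as (value, count)."""
    flat = json.loads(solr_response)["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    # Made-up, trimmed response for a hypothetical `format` field.
    sample = '{"facet_counts": {"facet_fields": {"format": ["Book", 120, "Video", 15]}}}'
    for value, count in facet_pairs(sample, "format"):
        print(f"{value} ({count})")
```

Clicking a facet value in such a UI typically adds an `fq` for that value to the next request, narrowing the results while keeping the original query.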
searchworks at Stanford
Advanced search at Stanford's searchworks
searchworks: Mapping Text Boxes to Solr query pieces
•http://code4lib.org/conference/2010/dushay_keck
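One way to map several advanced-search text boxes onto a single Solr query is to turn each non-empty box into a nested query with its own field list. This sketch assumes Solr's `_query_` nested-query syntax, `{!edismax}` local params, and `$` parameter dereferencing; the box names and the `*_qf` parameters (which would be defined as request-handler defaults) are hypothetical, and the talk linked above may describe a different approach.

```python
def advanced_search_params(boxes: dict[str, str], op: str = "AND") -> dict[str, str]:
    """Combine advanced-search text boxes into one Solr query.

    Each non-empty box becomes a nested edismax query whose field list comes
    from a `<box>_qf` parameter and whose text is passed separately via
    `v=$<box>_q`, so the user's input needs no escaping inside the query.
    """
    if op not in ("AND", "OR"):
        raise ValueError("op must be 'AND' or 'OR'")
    params: dict[str, str] = {}
    clauses: list[str] = []
    for name, text in boxes.items():
        if not text.strip():
            continue  # ignore empty boxes
        params[f"{name}_q"] = text
        clauses.append(f'_query_:"{{!edismax qf=${name}_qf v=${name}_q}}"')
    params["q"] = f" {op} ".join(clauses) if clauses else "*:*"
    return params


if __name__ == "__main__":
    # Hypothetical box names; title_qf etc. would be handler defaults.
    form = {"title": "house of the seven gables", "author": "", "subject": "New England"}
    for key, value in advanced_search_params(form).items():
        print(f"{key} = {value}")
```

Passing each box's text through its own parameter keeps quotes or operators typed by the user from breaking the surrounding query string.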
• https://catalyst.library.jhu.edu/
Rock and Roll!
• \m/
Community and Resources
• code4lib
  - http://www.code4lib.org/
• HathiTrust folks
  - http://www.hathitrust.org/blogs/large-scale-search
  - http://robotlibrarian.billdueber.com/
• http://bighumanities.net/
  - The Workshop on Big Humanities will be held in conjunction with the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), which will take place 6-9 October 2013 in Silicon Valley, California, USA, and which provides a leading international forum for disseminating the latest research in the growing field of “big data.”
http://heatherbrewer.com/blog/2013/04/15/libraries-rock/