5/14/12, http://dar.bibalex.org: Accessing Your Library Book Collections Using Solr. By: Engy Morsy, Software Project Manager, Bibliotheca Alexandrina, engy.morsy@bibalex.org

How to Access Your Library Book Collections Using Solr

DESCRIPTION

Presented by Engy Ali | The Library of Alexandria. See the conference video: http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012

Do you have a large collection of text content that you want to search? Are you facing challenges with faceting after a full-text search across metadata and content? Do you want to use Solr for personalization? Bibliotheca Alexandrina provides public access to digitized book collections of more than 220,000 books through a web-based search and browsing facility. The facility is built entirely on Solr and supports five languages. The website provides full-text morphological search within the books’ metadata and content, with result highlighting. Personalization features such as annotation tools and tagging are also implemented using Solr. This presentation covers how Bibliotheca Alexandrina uses Solr to implement full-text indexing and searching across the entire collection, faceting, searching within the content of a book with result highlighting, and the techniques used for personalization.

Page 1: How to Access Your Library Book Collections Using Solr

5/14/12, http://dar.bibalex.org

Accessing Your Library Book Collections Using Solr

By: Engy Morsy, Software Project Manager, Bibliotheca Alexandrina

engy.morsy@bibalex.org

BA & Solr

http://bibalex.org

http://wamcp.bibalex.org

http://ssc.bibalex.org

http://dar.bibalex.org

Introductory Video

Agenda

• Brief introduction to the DAR architecture
• Indexing the book collection
• Searching across metadata and content
• Faceting
• Searching book content
• Solr with personalization
• Future
• Q&A

About 1.5 million books

Digital Assets Repository

Book site

• Approximately 260,000 books
• Nearly 220,000 books published online
• About 1.5 TB of content
• Average book size: 6 MB
• Daily indexing rate: about 150 books
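The storage figure on this slide is consistent with the book count and average size. As a rough check, assuming decimal units (1 TB = 10^6 MB) and treating 6 MB as a true average:

```latex
260{,}000 \times 6\ \text{MB} = 1{,}560{,}000\ \text{MB} \approx 1.56\ \text{TB} \quad (\text{stated: about } 1.5\ \text{TB})
```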

What do we want…?

• Allow simple and advanced search across metadata and content in 5 languages

Simple Search

What do we want…?

• Allow simple and advanced search across metadata and content in 5 languages
• Faceting

What do we want…?

• Allow simple and advanced search across metadata and content in 5 languages
• Faceting
• Annotations

Text Underlining

Text Highlighting

Adding Sticky Notes

What do we want…?

• Allow simple and advanced search across metadata and content in 5 languages
• Faceting
• Annotations
• Personalization

Arranging Books in Bookshelves

Submitting Comments

Rating

Embedding

Sharing the book link on other social networks

What lies beneath!!

Book site indices

[Diagram: Query → AR, EN, FR, IT and SP indices]

Indexing Book Collection

• Index per language
• A document in the content index corresponds to a page in a book
• Maintain a field to distinguish between a metadata record and a content record (e.g. SolrType)
• Use static fields for the whole content index (e.g. PageID, etc.)
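The record layout above can be sketched in Python as a minimal illustration, assuming one metadata record per book and one content record per page inside each language's index. SolrType (with the values Meta and Content) and PageID come from the slides; id, BookID, Title, Text and the language codes are hypothetical, and the records are plain dicts rather than documents posted to a running Solr core.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    book_id: str
    language: str                      # selects the per-language index, e.g. "en"
    title: str
    pages: list[str] = field(default_factory=list)


def build_documents(book: Book) -> list[dict]:
    """Return one metadata record plus one content record per page.

    SolrType tells the two kinds of record apart inside the same index;
    the other field names are hypothetical.
    """
    docs = [{
        "id": f"{book.book_id}_meta",
        "SolrType": "Meta",
        "BookID": book.book_id,
        "Title": book.title,
    }]
    for number, text in enumerate(book.pages, start=1):
        docs.append({
            "id": f"{book.book_id}_p{number}",
            "SolrType": "Content",
            "BookID": book.book_id,    # link back to the parent book
            "PageID": number,          # static field shared by every content record
            "Text": text,
        })
    return docs


def route_by_language(books: list[Book]) -> dict[str, list[dict]]:
    """Group the documents by language: one index (core) per language."""
    indices: dict[str, list[dict]] = {}
    for book in books:
        indices.setdefault(book.language, []).extend(build_documents(book))
    return indices


if __name__ == "__main__":
    book = Book("b1", "en", "Mobile Technology", ["page one", "page two"])
    indices = route_by_language([book])
    print(len(indices["en"]))  # 3: one metadata record + two page records
```

Because every page record is a separate document, a content hit identifies a page, not a book, which is what makes faceting by book metadata awkward in the following slides.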

What is the problem with this solution?

Problem for content search

Example: Advanced search for Title: Mobile Technology AND Content: “cloud computing”

Proposed solution

[Diagram: Title: Mobile Technology → SolrType Meta records; Content: “cloud computing” → SolrType Content records → Parent Book IDs; Get intersection → Result IDs → Facet result → Final result]
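A small in-memory simulation of this flow, assuming made-up records and substring matching in place of real Solr queries: the title query yields book IDs, the content query yields pages that are mapped to their parent book IDs, the two sets are intersected, and the facets are then counted from the metadata records. All data, field names and function names here are hypothetical.

```python
from collections import Counter

# Hypothetical stand-ins for the metadata records and the page records.
META = [
    {"BookID": "b1", "Title": "Mobile Technology Basics", "Category": "Computing"},
    {"BookID": "b2", "Title": "Mobile Technology in Africa", "Category": "Economics"},
    {"BookID": "b3", "Title": "Gardening", "Category": "Hobbies"},
]
CONTENT = [
    {"BookID": "b1", "PageID": 4, "Text": "an introduction to cloud computing"},
    {"BookID": "b1", "PageID": 9, "Text": "cloud computing and mobile devices"},
    {"BookID": "b3", "PageID": 2, "Text": "cloud computing in the garden?"},
]


def search_meta(title_terms: str) -> set[str]:
    """Book IDs whose title contains every term (stand-in for a Solr query)."""
    terms = title_terms.lower().split()
    return {d["BookID"] for d in META if all(t in d["Title"].lower() for t in terms)}


def search_content(phrase: str) -> set[str]:
    """Matching pages, mapped to their parent book IDs."""
    return {d["BookID"] for d in CONTENT if phrase.lower() in d["Text"].lower()}


def facet(book_ids: set[str], field: str) -> Counter:
    """Facets have to come from the metadata records: a second lookup."""
    return Counter(d[field] for d in META if d["BookID"] in book_ids)


result_ids = search_meta("Mobile Technology") & search_content("cloud computing")
print(sorted(result_ids))             # ['b1']
print(facet(result_ids, "Category"))  # Counter({'Computing': 1})
```

The separate facet() lookup over the metadata records is the extra round trip that adds processing time.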

The problem is…

• The faceting result can’t be obtained directly from the content index
• The metadata index must be queried to get the final facet result

Processing time!!!

Solution…!

• Metadata denormalization
  – Denormalize metadata into the content index
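Denormalization can be sketched by copying a book’s metadata fields into each of its page records, again assuming hypothetical field names and in-memory data. A single pass over the content records then answers both conditions and yields the facets directly, but changing one metadata value forces every page record of that book to be rebuilt and re-indexed.

```python
from collections import Counter

META_FIELDS = ("Title", "Category")  # hypothetical metadata fields to copy


def denormalize(meta: dict, pages: list[dict]) -> list[dict]:
    """Copy the book's metadata fields into every page (content) record."""
    return [{**page, **{f: meta[f] for f in META_FIELDS}} for page in pages]


meta = {"BookID": "b1", "Title": "Mobile Technology Basics", "Category": "Computing"}
pages = [
    {"BookID": "b1", "PageID": 1, "Text": "cloud computing overview"},
    {"BookID": "b1", "PageID": 2, "Text": "mobile networks"},
]
content_index = denormalize(meta, pages)

# One pass over the content records now answers both conditions...
hits = [d for d in content_index
        if "mobile technology" in d["Title"].lower()
        and "cloud computing" in d["Text"].lower()]

# ...and facets come straight from the hits (counting books, not pages).
unique_books = {d["BookID"]: d for d in hits}
facets = Counter(d["Category"] for d in unique_books.values())

# The catch: correcting one metadata value means rewriting every page record.
meta["Category"] = "Telecommunications"
reindexed = denormalize(meta, pages)
print(len(reindexed))  # 2 page records re-indexed for a single metadata edit
```

For a real book the re-indexing cost scales with its page count, which is the drawback listed on the next slide.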

Proposed solution

[Diagram: Title: Mobile Technology → SolrType Meta records; Content: “cloud computing” → SolrType Content records; Get intersection → Result IDs → Facet result → Final result]

Problem for content search

• Metadata denormalization… worst choice!
  – Metadata changes require re-indexing
  – Data processing is required

New Solution

Indexing Metadata

• Index per language
• Separate content and metadata indices
• A Text field in the metadata index holds the whole book content
  – maxFieldLength has been set to the maximum, e.g. 2147483647
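To see why the limit matters when a whole book goes into one field, the sketch below mimics maxFieldLength, which caps the number of tokens indexed per field. It assumes whitespace tokenization instead of a real analyzer chain and a made-up 300-page book. The 10,000 figure is Lucene’s historical default and the value found in older example solrconfig.xml files; 2147483647 is 2^31 - 1, Java’s Integer.MAX_VALUE.

```python
DEFAULT_MAX_FIELD_LENGTH = 10_000   # historical Lucene/Solr default
MAX_FIELD_LENGTH = 2_147_483_647    # 2**31 - 1, Java's Integer.MAX_VALUE


def book_text(pages: list[str]) -> str:
    """Concatenate all pages into the single Text field of the metadata record."""
    return "\n".join(pages)


def indexed_tokens(text: str, max_field_length: int) -> list[str]:
    """Mimic maxFieldLength: only the first N tokens of a field are indexed.

    Whitespace splitting stands in for the real analyzer chain.
    """
    return text.split()[:max_field_length]


# A hypothetical 300-page book with 400 words per page.
pages = [" ".join(f"w{p}_{i}" for i in range(400)) for p in range(300)]
text = book_text(pages)

truncated = indexed_tokens(text, DEFAULT_MAX_FIELD_LENGTH)
full = indexed_tokens(text, MAX_FIELD_LENGTH)
print(len(truncated), len(full))  # 10000 120000
print("w299_0" in truncated)      # False: the last page is invisible to search
```

With the default limit, only the first 25 pages of this example book would be searchable through the Text field.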

Back to the example

Example: Advanced search for Title: Mobile Technology AND Content: “cloud computing”

Solution

[Diagram: Title: Mobile Technology and Content: “cloud computing” → Meta index → Facet result]
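With the book text stored in the metadata index, the example search can be issued as a single request that also returns facets. A sketch, assuming a hypothetical core URL (meta_en) and hypothetical field names (Title, Text, BookID, Category, Language); q, fl, rows, facet, facet.field and wt are standard Solr request parameters, and facet.field may be repeated.

```python
from urllib.parse import urlencode

# Hypothetical per-language metadata core.
META_CORE = "http://localhost:8983/solr/meta_en/select"


def advanced_search_url(title: str, content_phrase: str,
                        facet_fields: list[str], rows: int = 10) -> str:
    """One request to the metadata index: both conditions plus facets.

    This works because the metadata record's Text field holds the whole book.
    """
    query = f'Title:({title}) AND Text:"{content_phrase}"'
    params = [("q", query), ("fl", "BookID,Title"), ("rows", rows),
              ("facet", "true"), ("wt", "json")]
    params += [("facet.field", f) for f in facet_fields]  # repeatable parameter
    return f"{META_CORE}?{urlencode(params)}"


url = advanced_search_url("Mobile Technology", "cloud computing",
                          ["Category", "Language"])
print(url)
```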

Solution

[Diagram: Title: Mobile Technology and Content: “cloud computing” → Meta index and Content index → Get intersection → Meta index → Facet result]

Separate indexes vs. all in one

• Separate indexes
  + Indexing time
  + Index size
  – Processing results (facets, etc.)
  – Scoring
• One index
  – Index size
  – Indexing time
  + Scoring
  + Processing time

Book content index

[Diagram: AR, EN, FR, IT and SP indices]

Searching

• Simple and advanced search
  – Cache only the resulting IDs
• Highlighting search results
  – Get the full search result, then highlight results one page at a time
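A sketch of the ID-only cache combined with per-page highlighting, assuming a hypothetical ResultCache class and hypothetical id and Text fields. Caching just the ordered IDs keeps the cache small; the documents and highlight snippets for the page being viewed are fetched afterwards with the standard Solr parameters fq, rows, hl and hl.fl.

```python
from urllib.parse import urlencode


class ResultCache:
    """Hypothetical cache that keeps only the ordered result IDs per query."""

    def __init__(self) -> None:
        self._ids: dict[str, list[str]] = {}

    def store(self, query: str, ids: list[str]) -> None:
        self._ids[query] = list(ids)

    def page(self, query: str, page_number: int, page_size: int) -> list[str]:
        """Return the IDs shown on one results page (page_number starts at 1)."""
        ids = self._ids.get(query, [])
        start = (page_number - 1) * page_size
        return ids[start:start + page_size]


def highlight_params(query: str, ids: list[str]) -> str:
    """Parameters that fetch and highlight only the documents on the current page."""
    if not ids:
        raise ValueError("no IDs on this page")
    id_filter = "id:(" + " OR ".join(ids) + ")"
    return urlencode([("q", query), ("fq", id_filter), ("rows", len(ids)),
                      ("hl", "true"), ("hl.fl", "Text"), ("wt", "json")])


cache = ResultCache()
cache.store("cloud computing", [f"b{i}" for i in range(1, 26)])
current = cache.page("cloud computing", page_number=2, page_size=10)
print(current[0], current[-1])  # b11 b20
print(highlight_params("cloud computing", current))
```

Highlighting only ten documents per request avoids asking Solr to build snippets for the whole result set.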

Book Content Search

• Search using
  – Search query
  – Book ID
  – List of pages’ IDs
• Highlights
• Annotations
  – Currently saved in the DB
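A sketch of the request parameters for searching inside one book, assuming the content index stores a BookID field on every page record (hypothetical name) alongside PageID, and that the page text lives in a field called Text (also hypothetical). The fq filters restrict matches to the book and, optionally, to a list of pages; hl and hl.fl enable highlighting.

```python
from typing import Optional
from urllib.parse import urlencode


def book_content_query(query: str, book_id: str,
                       page_ids: Optional[list[int]] = None) -> str:
    """Search the page records of one book in the content index."""
    params = [("q", f"Text:({query})"),
              ("fq", f"BookID:{book_id}"),       # only this book's pages
              ("fl", "PageID"),
              ("sort", "PageID asc"),            # report hits in reading order
              ("hl", "true"),
              ("hl.fl", "Text")]
    if page_ids:
        # Optionally narrow the search to specific pages.
        params.append(("fq", "PageID:(" + " OR ".join(map(str, page_ids)) + ")"))
    return urlencode(params)


print(book_content_query("cloud computing", "b1", [4, 9]))
```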

Faceting

• Fixed facet fields
  – Category, sub-category, language, etc.
  – Stored, indexed, exact fields
• Process facets from different indices
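Processing facets from several per-language indices amounts to summing the counts each index returns. A sketch, assuming the facet values are shared across indices (e.g. language-independent category codes) and using trimmed, hypothetical responses. Solr’s JSON writer returns facet_fields as a flat [value, count, value, count, …] list by default (json.nl=flat). Because each index returns only its top facet.limit values, summed counts can miss values that fall below that limit in some index.

```python
from collections import Counter


def merge_facets(responses: list[dict], field: str) -> list[tuple[str, int]]:
    """Sum one facet field across responses from several per-language indices."""
    totals: Counter = Counter()
    for response in responses:
        flat = response["facet_counts"]["facet_fields"].get(field, [])
        # Pair up the flat list: even positions are values, odd are counts.
        for value, count in zip(flat[::2], flat[1::2]):
            totals[value] += count
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical trimmed responses from the EN and AR metadata indices.
en = {"facet_counts": {"facet_fields": {"Category": ["History", 12, "Science", 5]}}}
ar = {"facet_counts": {"facet_fields": {"Category": ["Science", 9, "Arts", 3]}}}
print(merge_facets([en, ar], "Category"))
# [('Science', 14), ('History', 12), ('Arts', 3)]
```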

Personalization

• Using a separate personalization index
  – Different Solr fields for different languages
  – Search across all fields
• Saving in both Solr and the DB
• Indexing tags, ratings and comments using a type field
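A sketch of personalization records, assuming hypothetical field names: a Type field separates tags, comments and ratings, tags and comments go into a per-language text field (Text_ar, Text_en, …, with suffixes following the AR/EN/FR/IT/SP labels on the slides), and a search across all languages ORs those fields together. Writing the same records to the database is not shown.

```python
# Hypothetical language suffixes matching the AR/EN/FR/IT/SP index labels.
LANGUAGES = ("ar", "en", "fr", "it", "sp")


def personal_doc(doc_id: str, user_id: str, book_id: str,
                 kind: str, value: str, lang: str = "") -> dict:
    """Build one personalization record; Type tells tags, comments and ratings apart."""
    doc = {"id": doc_id, "UserID": user_id, "BookID": book_id, "Type": kind}
    if kind == "rating":
        doc["Rating"] = int(value)                 # numeric field for ratings
    elif kind in ("tag", "comment"):
        if lang not in LANGUAGES:
            raise ValueError(f"unsupported language: {lang!r}")
        doc[f"Text_{lang}"] = value                # one text field per language
    else:
        raise ValueError(f"unknown type: {kind!r}")
    return doc


def search_all_languages(terms: str, kind: str) -> str:
    """Query string matching the terms in every language field for one type."""
    fields = " OR ".join(f"Text_{lang}:({terms})" for lang in LANGUAGES)
    return f"Type:{kind} AND ({fields})"


tag = personal_doc("t1", "u42", "b1", "tag", "cloud", "en")
rating = personal_doc("r1", "u42", "b1", "rating", "4")
print(tag)
print(search_all_languages("cloud", "tag"))
```

An edismax query listing all language fields in its qf parameter would be an alternative to building the OR clause by hand.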

Future

• Book mobile application using Solr
• Using Hadoop
• Indexing other digital media (maps, audio, video)

Contact

engy.morsy@bibalex.org
Library website: http://bibalex.org
Digital Asset Repository: http://dar.bibalex.org

Thank you…