Webinar: Solr's example/files: From bin/post to /browse and Beyond

Solr’s example/files: from bin/post to /browse and beyond

$ bin/post -c your_collection your_data/

https://lucidworks.com/blog/2015/08/04/solr-5-new-binpost-utility/

https://lucidworks.com/blog/2015/12/08/browse-new-improved-solr-5/

http://localhost:8983/solr/<collection>/browse

example/files

• Distilled, simple, document type navigation

• Multi-lingual, localizable interface

• Language detection and faceting

• Phrase/shingle indexing and "tag cloud" faceting

• E-mail address and URL index-time extraction

• "instant search" (as you type results)

quick start

$ bin/solr start

$ bin/solr create -c files -d example/files

$ bin/post -c files ~/Documents

$ open http://localhost:8983/solr/files/browse

The UI is The App

• document type filtering

• language detection and faceting

• instant search

• localized interface

• HTML safe highlighting

URLs are UI too!

• /browse is stateless

• all parameters for the view must be passed on the URL

• and/or in config:

• request handler definition (core reload API)

• paramsets / params.json (real-time API, no reload)

browsing by document type: /browse?type=pdf

localizing the interface: /browse?locale=de_DE
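Because /browse is stateless, a client can treat the URL as the whole of the view state. A minimal Python sketch of that idea follows; the `browse_url` helper and its `base` default are illustrative assumptions, and only the `type` and `locale` parameters come from the slides.

```python
from urllib.parse import urlencode

def browse_url(collection, base="http://localhost:8983/solr", **params):
    """Build a /browse URL; every piece of view state travels in the query string."""
    url = f"{base}/{collection}/browse"
    # Drop unset parameters so the URL carries only the state that matters.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{url}?{query}" if query else url

print(browse_url("files", type="pdf"))
print(browse_url("files", type="html", locale="de_DE"))
```

Anything not passed on the URL would have to come from the request handler definition or from params.json.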

URL comparison

• /browse?type=html&locale=de_DE

• /select?v.locale=de_DE&fq={!field%20f=doc_type%20v=html}&wt=velocity&v.template=browse&v.layout=layout&q=*:*&facet.query={!ex=type%20key=all_types}*:*&facet=on&facet.field={!ex=type}doc_type…

URL comparison: /browse

• /browse stands in for /select plus wt=velocity&v.template=browse&v.layout=layout&q=*:* and the facet parameters

URL comparison: type

• type=html stands in for fq={!field%20f=doc_type%20v=html}

URL comparison: locale

• locale=de_DE stands in for v.locale=de_DE
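To make the comparison concrete, here is a hedged Python sketch that rebuilds the visible part of the long /select URL from the two short /browse parameters. The `select_equivalent` function is hypothetical, and the parameters hidden behind the trailing "…" on the slide are deliberately left out.

```python
from urllib.parse import urlencode, quote

def select_equivalent(doc_type, locale):
    """Spell out the /select parameters that /browse?type=...&locale=... implies.

    Only the parameters visible on the slide are listed; the slide's URL is
    truncated, so the real request carries more than this.
    """
    params = [
        ("v.locale", locale),
        ("fq", f"{{!field f=doc_type v={doc_type}}}"),
        ("wt", "velocity"),
        ("v.template", "browse"),
        ("v.layout", "layout"),
        ("q", "*:*"),
        ("facet.query", "{!ex=type key=all_types}*:*"),
        ("facet", "on"),
        ("facet.field", "{!ex=type}doc_type"),
    ]
    # quote (not quote_plus) encodes spaces as %20; the local-params
    # punctuation is kept readable, as on the slide.
    return "/select?" + urlencode(params, quote_via=quote, safe="{}!=*:")

print(select_equivalent("html", "de_DE"))
```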

Solr Tips and Tricks within

• Indexing “pipeline”

    • language detection

    • document type identification

    • e-mail address and URL extraction

    • Top Phrases

• Query “pipeline”

    • document type faceting and filtering

    • UI localisation/localization

implementation: language detection

&facet.field=language (via params.json)

conf/solrconfig.xml:
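This is not the solrconfig.xml content. It is a small Python sketch of how a client might read the counts returned by &facet.field=language, assuming Solr's default flat JSON facet layout (value, count, value, count, ...). The sample response fragment is invented for illustration.

```python
def facet_counts(flat):
    """Pair up Solr's flat [value, count, value, count, ...] facet list."""
    return list(zip(flat[0::2], flat[1::2]))

# Hypothetical fragment of a /select?facet=on&facet.field=language&wt=json response.
response = {
    "facet_counts": {
        "facet_fields": {"language": ["en", 120, "de", 14, "fr", 3]}
    }
}

for lang, count in facet_counts(response["facet_counts"]["facet_fields"]["language"]):
    print(f"{lang}: {count}")
```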

implementation: document type identification

conf/update-script.js:
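This is not the actual update-script.js. It is a minimal Python sketch of the general idea: collapse a detailed content type into a coarse doc_type. The mapping table and the `doc_type_for` name are assumptions; the real script may use different names and rules.

```python
# Hypothetical mapping from MIME subtype to a coarse doc_type.
SUBTYPE_TO_DOC_TYPE = {
    "pdf": "pdf",
    "html": "html",
    "msword": "doc",
    "plain": "text",
}

def doc_type_for(content_type):
    """Return a coarse doc_type, or None when the type is not recognised."""
    if not content_type:
        return None
    # Strip parameters such as "; charset=UTF-8" before splitting type/subtype.
    mime = content_type.split(";")[0].strip().lower()
    major, _, minor = mime.partition("/")
    if major == "image":
        return "image"
    return SUBTYPE_TO_DOC_TYPE.get(minor)

print(doc_type_for("application/pdf; charset=UTF-8"))
print(doc_type_for("image/png"))
```

Leaving doc_type unset for unrecognised types is what would let such documents be found later as "unknown", that is, documents with no doc_type value.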

implementation: E-mail address and URL extraction

conf/email_url_types.txt

<URL>

<EMAIL>

/select?fl=id,email_ss,url_ss&wt=csv

conf/managed-schema:

conf/update-script.js:
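This is not the content of update-script.js. As a rough illustration of index-time extraction, the Python sketch below pulls e-mail addresses and URLs out of a document's text with two simple regular expressions. The patterns are assumptions chosen for brevity; the example itself appears to rely on Solr analysis restricted to the <URL> and <EMAIL> token types listed in conf/email_url_types.txt.

```python
import re

# Deliberately simple patterns; real-world addresses and URLs are messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"']+")

def extract_emails_and_urls(content):
    """Return de-duplicated, sorted values for the email_ss and url_ss fields."""
    content = content or ""  # tolerate documents without a content field
    emails = sorted(set(EMAIL_RE.findall(content)))
    # Trim sentence punctuation that the URL pattern tends to swallow.
    urls = sorted({u.rstrip(".,;:)") for u in URL_RE.findall(content)})
    return {"email_ss": emails, "url_ss": urls}

text = "Contact dev@lucene.apache.org or see https://issues.apache.org/jira/browse/SOLR-8590."
print(extract_emails_and_urls(text))
```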

implementation: Top Phrases

&facet.field=text_shingles

conf/managed-schema:
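This is not the managed-schema definition. The Python sketch below illustrates what shingle indexing plus faceting amounts to: each document is broken into overlapping word n-grams, and the "top phrases" are the shingles found in the most documents. The shingle sizes (2 to 3 words) and the whitespace tokenisation are assumptions.

```python
from collections import Counter

def shingles(text, min_size=2, max_size=3):
    """Yield overlapping word n-grams (shingles) from lower-cased text."""
    words = text.lower().split()
    for size in range(min_size, max_size + 1):
        for i in range(len(words) - size + 1):
            yield " ".join(words[i:i + size])

def top_phrases(docs, n=10):
    """Count, per shingle, how many documents contain it, like a field facet."""
    counts = Counter()
    for doc in docs:
        counts.update(set(shingles(doc)))  # each document counts once per phrase
    return counts.most_common(n)

docs = ["apache solr search", "apache solr rocks"]
print(top_phrases(docs, 1))
```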

implementation: document type faceting and filtering

&type=[doc|pdf|image|…|all|unknown]

fq={!switch v=$type tag=type
    case='*:*'
    case.all='*:*'
    case.unknown='-doc_type:[* TO *]'
    default=$type_fq}

type_fq={!field f=doc_type v=$type}

facet.field={!ex=type}doc_type
f.doc_type.facet.mincount=0
f.doc_type.facet.missing=true
facet.query={!ex=type key=all_types}*:*
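The switch logic above can be read as a small lookup. The Python sketch below mimics how the filter query resolves for a given &type= value; it is a model for reasoning about the parameters, not Solr code, and the `resolve_type_fq` name is made up.

```python
def resolve_type_fq(type_param):
    """Return the filter query the {!switch} expression selects for &type=..."""
    value = (type_param or "").strip()
    if value == "":
        return "*:*"                  # case: type missing or blank, match everything
    if value == "all":
        return "*:*"                  # case.all: explicit "all", match everything
    if value == "unknown":
        return "-doc_type:[* TO *]"   # case.unknown: documents with no doc_type
    # default=$type_fq: exact match on the doc_type field
    return f"{{!field f=doc_type v={value}}}"

for t in (None, "all", "unknown", "pdf"):
    print(t, "->", resolve_type_fq(t))
```

Since the filter is tagged `type` and the facets exclude that tag with `{!ex=type}`, the doc_type counts stay those of the unfiltered result set, so every type remains selectable after one has been chosen.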

implementation: UI

conf/params.json:

conf/solrconfig.xml:

example/files: what’s next?

• Fix e-mail and URL field names (currently <email>_ss and <url>_ss, with angle brackets in the field names); also display these fields in /browse results rendering

• Improve quality of extracted phrases

• Extract, facet, and display acronyms

• Add sorting controls, possibly all or some of these: last modified date, created date, relevancy, and title

• Add grouping by doc_type perhaps

• Fix debug mode: it currently does not update the parsed query debug output (this is probably a bug in data driven /browse as well)

• Harden update-script: it currently errors if documents do not have a "content" field

• Filter out bogus e-mail addresses

https://issues.apache.org/jira/browse/SOLR-8590

And beyond…

• Leveraging https://github.com/LucidWorks/fusion-solr-plugins

• Analytics

• Relevancy Tuning: signals feedback, parameter adjustments

• Landing pages, scripting, etc

Analytics

Landing Pages