Twin Cities Drupal Users Group - October 22, 2008 EthicShare: Solr + Drupal Under the Hood Tour

EthicShare.org (Mostly Solr)


Page 1: EthicShare.org (Mostly Solr)

Twin Cities Drupal Users Group - October 22, 2008

EthicShare: Solr + Drupal

Under the Hood Tour

Page 2: EthicShare.org (Mostly Solr)

EthicShare?

• Who: University of Minnesota's Center for Bioethics, the University of Minnesota Libraries, and the University of Minnesota Department of Computer Science and Engineering

• EthicShare’s pilot implementation builds on a recent planning phase that was a collaboration with the University of Virginia, Georgetown University, Indiana University-Bloomington, and Indiana University-Purdue University, Indianapolis.

• What: A sustainable aggregation of bioethics research and a platform for scholarship

• When: Pilot Phase runs from January 2008 - June 2009

• How: Funded by the Andrew W. Mellon Foundation

Page 3: EthicShare.org (Mostly Solr)

The Platform

• Drupal
  – Community Development Framework

• Solr
  – Faceted Search Appliance

Page 4: EthicShare.org (Mostly Solr)

The Process

Page 5: EthicShare.org (Mostly Solr)
Page 6: EthicShare.org (Mostly Solr)

• Origin: Created by CNET and released January 2006

• Became an Apache Software Foundation project shortly thereafter

• Builds on the Lucene Search Engine Library

• Comes with Lucene’s search syntax and features

• Provides simple HTTP/XML API

• Strongly typed field definitions

• Noteworthy Implementations
  – Netflix, CNET Reviews, GameSpot, Digg
  – More: http://wiki.apache.org/solr/PublicServers

Page 7: EthicShare.org (Mostly Solr)

Behind the Scenes - Indexing

• HTTP/XML API
  – http://localhost:8983/solr/update
  – http://localhost:8983/solr/select

• Indexing = POSTing XML Records to /update

• Commands: <add> <delete> <commit/> <optimize/>

<add>
  <doc>
    <field name="nid">101</field>
    <field name="vid">2</field>
    <field name="title">Solr Search is Simply Great</field>
    <field name="body">Solr and Drupal are like PB And J</field>
    <field name="changed">1224707462</field>
    <field name="tid">4</field>
    <field name="name">libsys</field>
    <field name="uid">10297</field>
  </doc>
</add>
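The indexing step above can be sketched in a few lines of Python. This is a minimal illustration, not the ApacheSolr module's own code: the function names build_add_xml, post_update, and index_record are hypothetical, and the endpoint is the localhost URL from the slide, assumed to be a running Solr instance.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed local Solr endpoint, copied from the slide's example URL.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(doc):
    """Serialize one document (dict of field name -> value) as an <add> message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="utf-8")


def post_update(xml_bytes, url=UPDATE_URL):
    """POST an XML message to /update and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_record(record):
    """Index one record, then commit so it becomes visible to searches."""
    post_update(build_add_xml(record))
    post_update(b"<commit/>")
```

Calling index_record({"nid": 101, "vid": 2, "title": "Solr Search is Simply Great"}) would send the <add> message followed by a <commit/>, which is the sequence the Commands bullet describes.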

Page 8: EthicShare.org (Mostly Solr)

Behind the Scenes - Searching

• Get Contents of …/select URL: cURL, file_get_contents($url)…

• ApacheSolr makes use of a Solr PHP Client Abstraction Layer
  – http://wiki.apache.org/solr/SolPHP
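Searching is just an HTTP GET against /select, so the idea can be shown without any client library. The sketch below is in Python rather than PHP and is not the SolPHP client; the helper names are hypothetical, the endpoint is the localhost URL used earlier in the deck, and it asks Solr for JSON output (wt=json) instead of the default XML to keep parsing short.

```python
import json
import urllib.parse
import urllib.request

# Assumed local Solr endpoint, copied from the indexing slide.
SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(query, rows=10, base=SELECT_URL):
    """Return a /select URL that requests a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return base + "?" + urllib.parse.urlencode(params)


def extract_titles(response_text):
    """Pull the title of each hit out of a Solr JSON response body."""
    docs = json.loads(response_text)["response"]["docs"]
    return [doc.get("title") for doc in docs]


def search(query):
    """Fetch the URL (a network call) and return the matching titles."""
    with urllib.request.urlopen(build_select_url(query)) as response:
        return extract_titles(response.read().decode("utf-8"))
```

A query such as search("title:solr") uses the Lucene field:term syntax mentioned earlier; the title field is only returned if the schema marks it as stored.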

Page 9: EthicShare.org (Mostly Solr)

Setup - Solr Directory Layout

Tomcat Files:

…/tomcat/webapps/solr_ethicshare.war (cp solr.war from example dir)

…/tomcat/conf/Catalina/localhost/solr_ethicshare.xml

solr_ethicshare.xml - Tell Tomcat About Solr

<Context docBase="solr_ethicshare.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/usr/local/solr_home/ethicshare" override="true"/>
</Context>

Page 10: EthicShare.org (Mostly Solr)

Solr Schema - Fields and Types

• Starter schema:
  – ../drupaldir/sites/all/modules/apachesolr/schema.xml

• <types> ex:
  – string=solr.StrField
  – boolean=solr.BoolField

• <fields>
  – <field name="title" type="string" indexed="true" stored="true"/>

Page 11: EthicShare.org (Mostly Solr)

Solr Schema - <type> Analyzers

• Tokenize on whitespace, then remove any common words (StopFilterFactory)

• Remove any duplicates (RemoveDuplicatesTokenFilterFactory)
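The analyzer chain above can be imitated with a toy pipeline. This is only a sketch of the idea, not Solr's implementation: the stop list is made up for illustration (Solr reads its own from a stopwords file), stop words are compared case-insensitively as if ignoreCase were enabled, and a duplicate is taken to mean a token with the same text at the same position as an earlier one.

```python
# Hypothetical stop list for illustration only.
STOP_WORDS = {"a", "and", "are", "is", "like", "the"}


def whitespace_tokenize(text):
    """Split on whitespace, recording each token's position."""
    return [(position, token) for position, token in enumerate(text.split())]


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop common words, comparing case-insensitively."""
    return [(pos, tok) for pos, tok in tokens if tok.lower() not in stop_words]


def remove_duplicates(tokens):
    """Drop a token that repeats an earlier (position, text) pair."""
    seen = set()
    unique = []
    for token in tokens:
        if token not in seen:
            seen.add(token)
            unique.append(token)
    return unique


def analyze(text):
    """Run the three steps in the order the slide lists them."""
    return remove_duplicates(remove_stop_words(whitespace_tokenize(text)))
```

With only these two filters no same-position duplicates arise; the last step matters when other filters in a chain, such as stemming or synonym expansion, emit several tokens at one position.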

Page 12: EthicShare.org (Mostly Solr)

Solr Schema - Dynamic Fields

<dynamicField name="smfield*" type="string" indexed="true" stored="true" multiValued="true"/>

<dynamicField name="tmfield*" type="text" indexed="true" stored="true" multiValued="true"/>
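A dynamic field is a wildcard rule: any field name matching the pattern gets that type without being declared individually. The sketch below mimics the lookup with Python's fnmatch; the field names used to exercise it (smfield_author, tmfield_abstract) are hypothetical, and taking the first matching pattern is a simplification that is sufficient for these two non-overlapping rules.

```python
from fnmatch import fnmatchcase

# Patterns and types copied from the two dynamicField declarations above.
DYNAMIC_FIELDS = [
    ("smfield*", "string"),
    ("tmfield*", "text"),
]


def resolve_type(field_name, dynamic_fields=DYNAMIC_FIELDS):
    """Return the type of the first pattern matching field_name, or None."""
    for pattern, field_type in dynamic_fields:
        if fnmatchcase(field_name, pattern):  # case-sensitive glob match
            return field_type
    return None
```

Under these rules a document can carry a field like smfield_author and have it indexed as a multi-valued string, while a name that matches no pattern and is not declared explicitly resolves to nothing.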

Page 13: EthicShare.org (Mostly Solr)

Solr Schema - Some Example Options

• uniqueKey
  <!-- Field to use to determine and enforce document uniqueness.
       Unless this field is marked with required="false", it will be a required field
  -->
  <uniqueKey>nid</uniqueKey>

• defaultSearchField
  <!-- field for the QueryParser to use when an explicit fieldname is absent -->
  <defaultSearchField>text</defaultSearchField>

• solrQueryParser
  <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
  <solrQueryParser defaultOperator="AND"/>
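To make the effect of these three options concrete, here is a toy in-memory index in Python. It is an illustration under simplifying assumptions, not how Solr works internally: ToyIndex is a hypothetical class, matching is plain whitespace-split word lookup, and the defaults mirror the values on the slide (nid, text, AND).

```python
class ToyIndex:
    """In-memory stand-in for an index keyed on a uniqueKey field."""

    def __init__(self, unique_key="nid", default_field="text", default_operator="AND"):
        self.unique_key = unique_key
        self.default_field = default_field
        self.default_operator = default_operator
        self.docs = {}

    def add(self, doc):
        # uniqueKey: a document with an existing key value replaces the old one.
        self.docs[doc[self.unique_key]] = doc

    def search(self, query):
        """Match whitespace-separated terms against the default search field."""
        terms = query.lower().split()
        # defaultOperator: AND requires every term, OR requires at least one.
        combine = all if self.default_operator == "AND" else any
        hits = []
        for doc in self.docs.values():
            words = set(doc.get(self.default_field, "").lower().split())
            if combine(term in words for term in terms):
                hits.append(doc[self.unique_key])
        return hits
```

Re-adding a document with nid 101 overwrites the earlier copy rather than creating a second one, and switching default_operator from "AND" to "OR" widens a two-word query from documents containing both words to documents containing either.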

Page 14: EthicShare.org (Mostly Solr)

ApacheSolr Search Integration Module

• Core Search Integrated

• Blocks for facet configuration

• Schedules Indexing (via core search)

• Theme Hooks for overriding look and feel

• CCK Integration
  – hook_apachesolr_cck_field_mappings()
    – Which Fields to Index
    – How to Index them
    – Callback to pre-process fields
    – Whether or Not to Provide a Facet Block

• Help! We need testers for alpha3!

• http://drupal.org/project/apachesolr

Page 15: EthicShare.org (Mostly Solr)

Links

• Installing Solr + Tomcat
  – http://mikejoconnor.net/content/solr-ubercartorg

• Google Book Search API
  – http://code.google.com/apis/books/

• unAPI
  – http://unapi.info/

• ApacheSolr Search Integration
  – http://drupal.org/project/apachesolr

• IBM Developer Works - Solr
  – http://www.ibm.com/developerworks/java/library/j-solr1/

• SolPHP
  – http://wiki.apache.org/solr/SolPHP