Page 1: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

APACHE SOLR CMS INTEGRATION

Ingo Renner, Software Engineer

Page 2: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

we build smart.

ID INFIELD DESIGN

MAY.01.2013 LUCENE/SOLR REVOLUTION

TYPO3 CMS and Solr. How we did it.

APACHE SOLR CMS INTEGRATION

Page 3: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

ABOUT ID

What we do and who we do it for

• Strategy Planning

• Design

• UX

• Development & Integration

Page 4: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

WHO IS THIS GUY?

• Committer TYPO3 CMS

• Committer and PMC member Apache Tika

• Release Manager TYPO3 CMS 4.2

• New San Franciscan

• Snowboarding, mountain biking

• Software Engineer, Architect at Infield Design

- Caution - TYPO3 Evangelist

Page 5: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

TYPO3 CMS

Page 6: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

TYPO3 CMS

• Free and Open Source Enterprise CMS

• Estimated 500,000+ installations worldwide

• 6,000+ public extensions

• 6,000,000+ downloads

• Content Management Framework

• Multi-Site, Multi-Language, Versioning, Workflows, ...

• Stable, Secure, Scalable

Page 7: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

TYPO3 COMMUNITY

• Community driven development

• Conferences in North America, Europe, Asia

• Barcamps, Developer Days, Snowboard Tour

• 4 times Google Summer of Code participant

• Backed by TYPO3 Association

• Several other projects under the TYPO3 brand

Page 8: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

SOLR & CMS INTEGRATION

Page 9: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

Integration Challenges & Solutions

PAGE RENDERING

• Different template engines

• (too) flexible page rendering engine

• Identify relevant content on websites

• Exclude navigation and common page elements

• Content generated by plugins
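
One way to approach the challenges above is to have templates wrap the relevant content in marker comments, so that an indexer can skip navigation and other common page elements. The following is a minimal sketch of that idea only. The marker strings `<!--SEARCH_begin-->` and `<!--SEARCH_end-->` and the function name `extract_relevant` are assumptions made for illustration; the slides do not name them.

```python
import re

# Hypothetical marker comments a template places around indexable content.
BEGIN = "<!--SEARCH_begin-->"
END = "<!--SEARCH_end-->"


def extract_relevant(html: str) -> str:
    """Return the plain text of all marked regions, ignoring everything else."""
    pattern = re.escape(BEGIN) + r"(.*?)" + re.escape(END)
    regions = re.findall(pattern, html, flags=re.DOTALL)
    text = " ".join(regions)
    text = re.sub(r"<[^>]+>", " ", text)  # drop remaining HTML tags
    return " ".join(text.split())         # collapse whitespace


page = (
    "<nav>Home | About</nav>"
    "<!--SEARCH_begin--><h1>Solr</h1><p>Search for TYPO3.</p><!--SEARCH_end-->"
    "<footer>Imprint</footer>"
)
print(extract_relevant(page))  # prints "Solr Search for TYPO3."
```

Regex-based tag stripping is enough for a sketch; a real indexer would more likely use an HTML parser.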

Page 10: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

Integration Challenges & Solutions

INDEX QUEUE

• Index Queue to track and index content

• Record Monitor to update Index Queue

• Crawl pages, index unstructured content marked relevant

• Exclude pages with plugin-generated content

• Index structured plugin data directly from DB
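
The interplay between the Record Monitor and the Index Queue can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not the extension's implementation: the class and method names (`IndexQueue`, `RecordMonitor`, `mark_changed`, `next_batch`, `on_record_saved`) and the table names are hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueItem:
    table: str  # e.g. "pages" or a plugin table (illustrative names)
    uid: int


class IndexQueue:
    """Tracks which records still need to be (re)indexed."""

    def __init__(self):
        self._items = OrderedDict()  # QueueItem -> last change timestamp

    def mark_changed(self, table, uid, changed_at):
        item = QueueItem(table, uid)
        self._items.pop(item, None)      # avoid duplicates, move to the end
        self._items[item] = changed_at

    def next_batch(self, limit):
        """Remove and return up to `limit` items, oldest change first."""
        batch = list(self._items)[:limit]
        for item in batch:
            del self._items[item]
        return batch

    def __len__(self):
        return len(self._items)


class RecordMonitor:
    """Called by the CMS whenever a record is saved (assumed hook)."""

    def __init__(self, queue):
        self.queue = queue

    def on_record_saved(self, table, uid, changed_at):
        self.queue.mark_changed(table, uid, changed_at)


queue = IndexQueue()
monitor = RecordMonitor(queue)
monitor.on_record_saved("pages", 1, changed_at=100)
monitor.on_record_saved("news", 7, changed_at=101)
monitor.on_record_saved("pages", 1, changed_at=102)  # same record edited again
batch = queue.next_batch(10)
```

A record edited twice appears in the queue only once, so an indexing worker processes each changed record a single time per run.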

Page 11: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

Integration Challenges & Solutions

ACCESS RIGHTS

• Intranet, Extranet, ...

• Not everybody may see everything

• Flexible user groups and permissions

• Permissions extended to sub-pages

Page 12: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

Integration Challenges & Solutions

SOLR ACCESS FILTER PLUGIN

• Custom Solr access filter plugin

• Query Parser and Filter

• User group IDs stored in documents

• Current user’s groups submitted with query

• Plugin matches document groups with user’s groups
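
The matching step can be illustrated with a small sketch. The actual plugin runs inside Solr as a query parser and filter; the code below only shows the group-matching idea. The field name `access`, the comma-separated encoding, and the rule that a document without groups is public are assumptions for illustration, and inheritance of permissions to sub-pages is not modeled.

```python
from typing import Dict, List, Set


def parse_groups(value: str) -> Set[int]:
    """Parse a comma-separated list of group IDs, e.g. "2,5"."""
    return {int(g) for g in value.split(",") if g.strip()}


def is_visible(doc_groups: Set[int], user_groups: Set[int]) -> bool:
    # Assumption: a document that lists no groups is public.
    return not doc_groups or bool(doc_groups & user_groups)


def filter_documents(docs: List[Dict], user_groups: Set[int]) -> List[Dict]:
    """Keep only the documents the current user may see."""
    return [d for d in docs if is_visible(parse_groups(d["access"]), user_groups)]


docs = [
    {"id": "public-page", "access": ""},
    {"id": "intranet-page", "access": "2,5"},
    {"id": "board-only", "access": "9"},
]
visible = [d["id"] for d in filter_documents(docs, user_groups={5})]
print(visible)  # prints ['public-page', 'intranet-page']
```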

Page 13: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

Integration Challenges & Solutions

FILE INDEXING

• Finding file links in page content

• Core file links vs. plugin file links

• Track files for indexing

• Reading file content

• Separate tools for different file formats

Page 14: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

Integration Challenges & Solutions

FILE INDEXING

• File Detectors & File Index Queue

• File system abstraction layer

• Apache Tika

• Knows 1,200+ file formats, reads about half of them

• Content & metadata extraction

• Language detection
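
A file detector in the sense above could look roughly like this: it scans rendered page content for links to files with indexable extensions, which would then go into the file index queue and on to Apache Tika for extraction. This is a sketch using only the Python standard library; the class name `FileLinkDetector`, the extension list, and the example paths are assumptions, and the Tika step itself is not shown.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse

# Illustrative subset of extensions worth sending to a content extractor.
INDEXABLE_EXTENSIONS = {".pdf", ".doc", ".docx", ".odt"}


class FileLinkDetector(HTMLParser):
    """Collects hrefs of <a> tags that point at indexable files."""

    def __init__(self):
        super().__init__()
        self.files = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        extension = posixpath.splitext(urlparse(href).path)[1].lower()
        if extension in INDEXABLE_EXTENSIONS and href not in self.files:
            self.files.append(href)  # each file is tracked once


def detect_file_links(html):
    detector = FileLinkDetector()
    detector.feed(html)
    return detector.files


content = (
    '<p><a href="/fileadmin/report.PDF">Report</a> '
    '<a href="/about.html">About</a> '
    '<a href="/fileadmin/report.PDF">Report again</a></p>'
)
print(detect_file_links(content))  # prints ['/fileadmin/report.PDF']
```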

Page 15: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

Integration Challenges & Solutions

THE REST

• PHP people vs. Java technology

• Talking to Solr

• Learning from mistakes

Page 16: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

Integration Challenges & Solutions

THE REST

• Fully automated bash install script

• SolrPhpClient

• Separate your languages
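
"Talking to Solr" comes down to HTTP requests against a Solr core. The talk uses SolrPhpClient (a PHP library) for this; the sketch below is not that library. It only shows, in Python, how a select request URL is assembled from the standard Solr parameters `q`, `fq`, `rows`, and `wt`. The host, port, core name `core_en`, and the function name `build_select_url` are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, filters=(), rows=10):
    """Build the URL for a Solr /select request returning JSON."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    params += [("fq", f) for f in filters]  # one fq parameter per filter
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr/core_en",  # hypothetical core URL
    "typo3 solr",
    filters=["type:pages"],
)
print(url)
# prints http://localhost:8983/solr/core_en/select?q=typo3+solr&rows=10&wt=json&fq=type%3Apages
```

Keeping request building in one small function on the CMS side fits the "separate your languages" point: the PHP (here Python) side only speaks HTTP, and everything Java stays inside Solr.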

Page 17: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

EXT:solr - Apache Solr for TYPO3

FEATURES

• Faceted Search

• File Indexing

• Multi-Language & Multi-Site Support

• Did you mean, More Like This

• Search Word Highlighting

• Auto Complete

• Access Rights Support

• Many More ...

Page 23: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

QUESTIONS?

Page 24: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

THANKS.

Page 25: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

T3CON North America

San Francisco, May 30-31

20% off regular ticket price, use: LUCENETYPO3

INFIELD DESIGN is hiring!

Page 26: Apache Solr CMS Integration @ Lucene/Solr Revolution San Diego 2013

CONFERENCE PARTY

The Tipsy Crow: 770 5th Ave

Starts after Stump The Chump

Your conference badge gets you in the door

TOMORROW

Breakfast starts at 7:30

Keynotes start at 8:30
