lucenerevolution
See conference video - http://www.lucidimagination.com/devzone/events/conferences/revolution/2011 Earlier this year, Sensis launched its Business Search API, which allows publishers to develop local search propositions powered by the two million business listings contained in the Australian Yellow Pages® and White Pages® directories. This case study will explore Sensis’ strategic direction for search and explain how the framework and metrics by which search is managed at Sensis were used to define our search roadmap. Key architectural decisions including our use of Solr and MongoDB will be discussed as well as our approach to real-time search tuning and quality management.
Search, APIs, Capability Management
and the Sensis Journey
Craig Rees
• Project background
• Platform selection
• Search capability
• Relevance
• Architecture
• Quality management
• Hurdles
• What’s next
Today’s menu
• Sensis helps Australians find, buy and sell
• From print directories to a cross-platform lead generator
• Sensis publishes over 1.8 million business listings
• Two of the top 10 visited online sites in Australia (WhitePages.com.au and YellowPages.com.au)
Sensis
Business objectives
• Drive presence in the local search market place
• Open up the largest database of business listings in Australia
• Reduce the effort required from local search developers
• Free to use; we are after the reporting
Technology objectives
• Develop a total search platform
• Relevancy testing as part of the development lifecycle
• A framework to identify problem spaces
• Manageable platform
• Continuous deployments
Project background
Developer portal
Platform selection
• Support for the search capability team
• Structured vs non structured data
• Deterministic vs black box
• Non-proprietary code base
• Community backing
The Sensis Search capability maturity model*
Lvl 1: Unmanaged
• No resources
• No reporting
• Out of the box features
Lvl 2: Ad hoc
• Ad hoc processes
• Part time team
• Static dictionaries
• Individual led innovation
Lvl 3: Monitored
• Defined team
• Regular monitoring
• Static autosuggest
• Basic linguistics
Lvl 4: Managed
• Online dashboards
• Test environments
• Dynamic search refinements
• Targets and metrics
Lvl 5: Optimized
• A/B testing
• Machine learning
• External collaboration
• Multiple contexts
*Courtesy of Pete Crawford & Craig Lonsdale
Context is key
Intent
• Name
• Type
• Product
• Spatial
Location
Chronology
Social Graph
Individual
Device
Historical search Data
MongoDB
Business Data
Geo Service
Index
Name Query Handler
Type Query Handler
Business Data
Search Service
Reporting Service
Reporting Events
Publisher
Solr
API
Ontologies
Mashery
Our architecture
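The architecture names separate Name and Type query handlers alongside Ontologies. As a rough illustration only, the sketch below shows one way a search service could pick between the two. The routing rule, the `BUSINESS_TYPES` vocabulary and the `choose_handler` function are all assumptions made for this example; the slides do not describe how Sensis actually routes queries.

```python
from typing import Set

# Hypothetical vocabulary of business types, standing in for an ontology lookup.
BUSINESS_TYPES: Set[str] = {"plumber", "florist", "restaurant"}


def choose_handler(query: str) -> str:
    """Return "type" if any query term is a known business type, else "name".

    This rule is an assumption for illustration, not the Sensis implementation.
    """
    terms = query.lower().split()
    if any(term in BUSINESS_TYPES for term in terms):
        return "type"
    return "name"


if __name__ == "__main__":
    for q in ("Plumber Richmond", "Acme Pty Ltd"):
        print(q, "->", choose_handler(q))
```

A rule this simple misroutes business names that contain a type word, which is one face of the intent differentiation problem listed among the hurdles.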
Data staging
Search
API
API proxy
• Moved from a black box solution to a manageable platform
• Deliver search improvements without major code changes
• Understand how results were calculated
• Identify problems scientifically
• Continuously tune and test relevance
Evolution of search management
Yesterday Today Tomorrow
Problem spaces, quality management & tuning
Path analysis used to identify problem spaces
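One simple form of path analysis counts short event sequences across search sessions and looks for paths that end badly, such as repeated refinements followed by an exit. The sketch below assumes sessions are already available as ordered lists of event names; the event names, the sample sessions and the `path_counts` function are hypothetical and not taken from the Sensis reporting service.

```python
from collections import Counter
from typing import Iterable, List


def path_counts(sessions: Iterable[List[str]], length: int = 3) -> Counter:
    """Count every run of `length` consecutive events across all sessions."""
    counts: Counter = Counter()
    for events in sessions:
        for i in range(len(events) - length + 1):
            counts[tuple(events[i:i + length])] += 1
    return counts


# Hypothetical session logs: each list is one user's ordered events.
sessions = [
    ["search", "refine", "click"],
    ["search", "refine", "refine", "exit"],
    ["search", "click"],
    ["search", "refine", "refine", "exit"],
]

counts = path_counts(sessions)

if __name__ == "__main__":
    for path, n in counts.most_common():
        print(" > ".join(path), n)
```

Frequent paths ending in "exit" point at queries worth grouping into a problem space for closer inspection.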
“Gold Sets” used to define overall quality score (TREC)
Features signed off only when they have a positive impact on the quality score
Specific gold sets for each problem space:
• Intent
• Spelling & stemming
• Location
• Phrase parsing
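A minimal sketch of how a gold-set quality score and the sign-off rule above could be computed. The slides mention TREC but do not say which metric was used, so mean average precision is an assumption here, and the gold set, listing IDs and function names are hypothetical.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list against judged-relevant IDs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def quality_score(results: Dict[str, List[str]], gold: Dict[str, Set[str]]) -> float:
    """Mean average precision over every query in a gold set."""
    scores = [average_precision(results.get(q, []), rel) for q, rel in gold.items()]
    return sum(scores) / len(scores)


def sign_off(baseline: float, candidate: float) -> bool:
    """A feature passes only if it raises the quality score."""
    return candidate > baseline


# Hypothetical gold set for the "spelling & stemming" problem space.
gold = {"plumer melbourne": {"b1", "b2"}, "florists": {"b7"}}
baseline_results = {"plumer melbourne": ["b9", "b1", "b2"], "florists": ["b7"]}
candidate_results = {"plumer melbourne": ["b1", "b2", "b9"], "florists": ["b7"]}

if __name__ == "__main__":
    before = quality_score(baseline_results, gold)
    after = quality_score(candidate_results, gold)
    print(before, after, sign_off(before, after))
```

Keeping one gold set per problem space lets a change aimed at spelling be checked against the other spaces too, so a gain in one area is not bought with a loss in another.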
Search quality analysis and testing
Results examiner
Score analysis
Tuning
Lather, rinse, repeat
Hurdles along the way
• Data redundancy and homogeneity
• Solr ranking of rare terms
• Intent differentiation
• Contextual synonyms
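The rare-terms hurdle can be illustrated with the inverse document frequency used by Lucene's classic TF-IDF similarity, idf = 1 + ln(numDocs / (docFreq + 1)). The sketch below only shows why a term found in a handful of listings can outweigh a common category word; the document frequencies are made up, and the slides do not describe how Sensis addressed the issue.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency as in Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_LISTINGS = 1_800_000  # listing count quoted earlier in the deck

# Hypothetical document frequencies, for illustration only.
common = classic_idf(50_000, NUM_LISTINGS)  # e.g. a frequent category word
rare = classic_idf(3, NUM_LISTINGS)         # e.g. an unusual or misspelled name

if __name__ == "__main__":
    print(f"common term idf: {common:.2f}")
    print(f"rare term idf:   {rare:.2f}")
```

Because the rare term's weight is several times larger, a listing that matches only the rare term can outrank listings that match the terms carrying the user's actual intent.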
Where next?
• Query engine
• Facets / autosuggest
• Real time tuning
• Machine learning
• Multi term queries
• Scoring thresholds
• Content Value