Using Solr in Online Travel to Improve User Experience
Sudhakar Karegowdra, Esteban Donato
Travelocity, May 25th, 2011
{ sudhakar.karegowdra, esteban.donato}@travelocity.com
What We Will Cover
Travelocity
Speakers Background
Merchandising & Solr
• Challenges
• Solution
• Sizing and performance data
• Take Away
Location Resolution & Solr
• Challenges
• Solution
• Sizing and performance data
• Take Away
Q&A
Travelocity
First online travel agency (OTA), launched in 1996
Grown to 3,000 employees and is one of the largest
travel agencies worldwide
Headquartered in Dallas/Fort Worth with satellite
offices in San Francisco, New York, London,
Singapore, Bangalore, and Buenos Aires, to name a few
In 2004, the Roaming Gnome became the
centerpiece of marketing efforts and has become an
international pop icon
Owned by Sabre Holdings - sister companies include
Travelocity Business, IgoUgo.com, lastminute.com,
Zuji among others
Speakers Background
Esteban Donato
• Lead Architect
Travelocity.com
My experience
– 10+ years
– Solr: 2 years
– Analyzing Mahout and
Carrot2 for a document
clustering engine
Topic:
Location Resolution
Sudhakar Karegowdra
• Principal Architect
Travelocity.com
My experience
– 13+ years
– Solr/Lucene: 3 years
– Implementing Hadoop,
Pig and Hive for a data
warehouse
Topic:
Merchandising
Merchandising By Sudhakar Karegowdra
The Challenge
Market Drivers
• Build Landing Pages with Faceted Navigation
• Enable Content Segmentation and delivery
• Support Roll out of Promotions
• Roll up data to a higher level
E.g., "all 5-star hotels in California" should bring together the
5-star hotels from SFO, LAX, SAN, etc.
• Faster time to market for new ideas
• Rapidly scale to accommodate global brands
with disparate data sources
The Challenge
Traditional Database approach
• Longer time to market
• Specialized skill set needed to design and optimize
database structures and queries
• Aggregating data and changing structures is
quite complex
• Building faceted navigation capabilities needs
complex logic, leading to high maintenance cost
Solution - Overview
Data from various sources aggregated and
ingested into Solr
• Core per Locale and Product Type
Wrapper service to combine some data across
product cores and manage configuration rules
Solr’s built-in search and faceting power the
navigation
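As an illustration of how a landing page could ask a product core for filtered results plus facet counts, here is a minimal Python sketch that only builds the request URL. The host, the core name `hotel_en_US`, and the field names (`city`, `star_rating`, `amenity`, `theme`) are assumptions made for the example and do not come from the Travelocity schema; the query parameters (`q`, `fq`, `facet`, `facet.field`, `facet.mincount`) are standard Solr parameters.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, core, filters, facet_fields, rows=10):
    """Build a Solr select URL with filter queries and facet counts."""
    params = [
        ("q", "*:*"),             # match everything, narrow with fq
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", "1"),  # hide facet values with no hits
    ]
    for field, value in filters.items():
        params.append(("fq", f'{field}:"{value}"'))
    for field in facet_fields:
        params.append(("facet.field", field))
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical core and fields: one core per locale and product type.
url = facet_query_url(
    "http://localhost:8983/solr",
    "hotel_en_US",
    {"city": "Las Vegas", "star_rating": "5"},
    ["amenity", "theme"],
)
print(url)
```

Each selected facet value becomes one more `fq` clause, so the navigation logic stays in the query rather than in hand-written SQL.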
Solution – Architecture View
[Architecture diagram. Components shown: Oracle, Offer Management Tool, ETL, Solr Master (Multi Core), Solr Slaves (Multi Core), Services/Business Logic, and consumers: UI Widgets, Mobile, Deals, Products, ……]
Solution - Achievements
Millions of unique long-tail landing pages
E.g., http://www.travelocity.com/hotel-d4980-nevada-las-vegas-hotels_5-star_business-center_green
Faster search across products
E.g., Beach Deals under $500
Segmented content delivery through tagging
Scaled well to distribute the content to different
brands, partners and advertisers
Opened up for other innovative applications
E.g., Deals on Map, Deals on Mobile, Wizards, etc.
Solution – Road Ahead
Migration to Solr 3.1
• Geospatial search
• CSV output format
Query boosting by search pattern
Near-real-time updates
Deal and user behavior mining in Hadoop
(MapReduce), with Solr serving the content
Move slaves to the cloud
Sizing & Performance
Index Stats
• Number of cores: 25
• Number of documents: ~1 million records
Response
• Requests: 70 tps
• Average response time: 0.005 seconds (5 ms)
Software Versions
• Solr 1.4.0
– filterCache size: 30000
• Tomcat 5.5.9
• JDK 1.6
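To put the two response figures side by side, Little's law (average requests in flight = arrival rate × average time in system) gives a rough sense of concurrency. The sketch below is only that arithmetic; it assumes the 70 tps and 5 ms figures describe the same set of servers, which the slide does not state.

```python
def average_in_flight(requests_per_second, avg_response_seconds):
    """Little's law: mean number of requests being served at once."""
    return requests_per_second * avg_response_seconds


# Figures from the slide: 70 tps at 0.005 s average response time.
in_flight = average_in_flight(70, 0.005)
print(f"{in_flight:.2f} requests in flight on average")
```

A value well below 1 suggests that, at this load, requests rarely overlap.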
Take Away
Semi-structured storage in Solr helps
aggregate disparate sources easily
• Remember dynamic fields
Multiple cores to manage multiple locale data
Solr is a great enabler of “Innovations”
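One way dynamic fields can absorb disparate sources is to map each incoming attribute to a field name whose suffix encodes its type, so new attributes need no schema change. The Python sketch below assumes the schema declares dynamic fields such as `*_s`, `*_i`, `*_f` and `*_b` (the patterns used in Solr's example schema); the source name and record fields are hypothetical.

```python
# Suffix per Python type; assumes matching dynamicField patterns exist.
SUFFIX_BY_TYPE = {bool: "_b", int: "_i", float: "_f", str: "_s"}


def to_solr_doc(source, record, id_field):
    """Flatten one source record into a Solr document dictionary."""
    doc = {"id": f"{source}-{record[id_field]}", "source_s": source}
    for key, value in record.items():
        if key == id_field:
            continue
        suffix = SUFFIX_BY_TYPE.get(type(value))
        if suffix is None:
            continue  # skip values with no matching dynamic field
        doc[f"{key}{suffix}"] = value
    return doc


# Hypothetical record from a hypothetical feed.
doc = to_solr_doc(
    "feedA",
    {"hotel_id": 42, "stars": 5, "city": "Las Vegas", "price": 199.0, "green": True},
    "hotel_id",
)
print(doc)
```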
Location Resolution By Esteban Donato
The Challenge
How to develop a global location resolution
service?
Flexible to change
General enough to cover everyone's needs
Multi-language
Performance and scalability
Configurable by site
Architecture of the solution
[Architecture diagram. Components shown: Location DB, Batch Job, Management Tool, Solr Master, Solr Slave, Auto-complete, Resolution]
Remote streaming indexing, CSV format
Master/Slave architecture
Multi-core: each core
represents a language
SolrJ client binary format
Solr response cache
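For readers unfamiliar with remote streaming, the batch job can point Solr's CSV update handler at a file that is readable from the Solr server instead of posting the data in the request body. The Python sketch below only builds such a request URL. The host, the core name `locations_en` and the file path are hypothetical; `stream.file`, `stream.contentType`, `separator` and `commit` are parameters of Solr's CSV handler, and remote streaming must be enabled in `solrconfig.xml` for `stream.file` to be honored.

```python
from urllib.parse import urlencode


def csv_update_url(base_url, core, csv_path, separator=","):
    """Build a URL asking Solr to index a CSV file local to the server."""
    params = {
        "stream.file": csv_path,  # path as seen by the Solr server
        "stream.contentType": "text/plain;charset=utf-8",
        "separator": separator,
        "commit": "true",         # make the documents searchable afterwards
    }
    return f"{base_url}/{core}/update/csv?{urlencode(params)}"


# Hypothetical: one core per language, one export file per core.
update_url = csv_update_url(
    "http://localhost:8983/solr", "locations_en", "/data/export/locations_en.csv"
)
print(update_url)
```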
Auto-complete
System has to suggest options as users
type their desired location
Examples: "san" => San Francisco, "veg" =>
Las Vegas
Relevancy: not all locations are equally
important. "par" => "Paris, France"; "Parana,
Argentina"
Users can search by various fields: location
code, location name, city code, city name,
state/province code, state/province name,
country code, country name.
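The behavior described here, prefix matching on any searchable term followed by ordering on importance, can be sketched in memory with a few lines of Python. This is an illustration only, not the Solr implementation: the sample data, the `terms` lists and the convention that a lower `rank` means a more important location are all assumptions.

```python
# Hypothetical sample data; rank 1 is assumed to be the most important.
LOCATIONS = [
    {"name": "Parana, Argentina", "rank": 2, "terms": ["parana", "argentina", "ar"]},
    {"name": "Paris, France", "rank": 1, "terms": ["paris", "par", "france", "fr"]},
    {"name": "San Francisco", "rank": 1, "terms": ["san", "francisco", "sfo"]},
    {"name": "Las Vegas", "rank": 1, "terms": ["las", "vegas", "nevada", "nv"]},
]


def suggest(prefix, locations, limit=5):
    """Return location names with any term starting with the prefix."""
    needle = prefix.strip().lower()
    matches = [
        loc for loc in locations
        if any(term.startswith(needle) for term in loc["terms"])
    ]
    # Most important first, then alphabetical for a stable order.
    matches.sort(key=lambda loc: (loc["rank"], loc["name"]))
    return [loc["name"] for loc in matches[:limit]]


print(suggest("par", LOCATIONS))
print(suggest("veg", LOCATIONS))
```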
Solr schema
<dynamicField name="RANK*" type="int" required="false" indexed="true" stored="true" />
<field name="GLS_FULL_SEARCH" type="glsSearchField" required="false" indexed="true"
       stored="false" multiValued="true" />
<fieldType name="glsSearchField" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[/\-\t ]+" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement=""
            replace="all"/>
  </analyzer>
</fieldType>
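To see what the glsSearchField analyzer does to a value, the chain can be approximated in Python: split on slashes, hyphens, tabs and spaces, lowercase, trim, fold accents, and strip commas and periods. This is a rough emulation, not Solr's code. Accent folding via Unicode decomposition only approximates ISOLatin1AccentFilterFactory, the duplicate-removal filter is left out because it only affects identical tokens at the same position, and dropping empty tokens is a simplification.

```python
import re
import unicodedata

SPLIT = re.compile(r"[/\-\t ]+")  # same pattern as the tokenizer
PUNCT = re.compile(r"[,.]")       # same pattern as the replace filter


def fold_accents(text):
    """Strip combining marks, e.g. 'são' -> 'sao' (approximation)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(value):
    """Approximate the token stream produced for one field value."""
    tokens = []
    for raw in SPLIT.split(value):
        token = fold_accents(raw.lower().strip())
        token = PUNCT.sub("", token)
        if token:  # simplification: drop empty tokens
            tokens.append(token)
    return tokens


print(analyze("São Paulo, Brazil"))
print(analyze("Saint-Tropez/FR"))
```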
Resolution
System has to resolve the location requested
by the users.
Handles aliases: Big Apple => New York
Handles ambiguities.
Handles misspellings: Lomdon => London
• NGramDistance algorithm
• How to combine distance with relevancy?
• Errors when suggesting the correct location for a prefix,
e.g. Lond => London
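The slide leaves open how string distance and relevancy should be combined. The Python sketch below shows one possible approach under stated assumptions: keep candidates whose similarity to the input clears a threshold, then order by similarity and break ties by rank. It uses a bigram Dice coefficient as a stand-in, which is not the same computation as Lucene's NGramDistance; the candidate list, the ranks and the 0.6 threshold are hypothetical.

```python
def bigrams(text):
    """Character bigrams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def dice_similarity(a, b):
    """Bigram Dice coefficient in [0, 1]; 1.0 means identical bigrams."""
    grams_a, remaining = bigrams(a), bigrams(b)
    total = len(grams_a) + len(remaining)
    overlap = 0
    for gram in grams_a:
        if gram in remaining:
            remaining.remove(gram)  # count each bigram at most once
            overlap += 1
    return 2 * overlap / total


def resolve(query, candidates, min_similarity=0.6):
    """Names of close-enough candidates, best match first."""
    scored = [(dice_similarity(query, c["name"]), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s >= min_similarity]
    kept.sort(key=lambda pair: (-pair[0], pair[1]["rank"]))
    return [c["name"] for _, c in kept]


# Hypothetical candidates; lower rank is assumed to mean more important.
CANDIDATES = [
    {"name": "Lisbon", "rank": 2},
    {"name": "London", "rank": 1},
    {"name": "Londonderry", "rank": 5},
]
print(resolve("Lomdon", CANDIDATES))
```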
Spellchecker configuration
<fieldType name="spellcheckType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement=""
            replace="all"/>
  </analyzer>
</fieldType>
Sizing & Performance
4 cores with ~500,000 documents indexed
each
Response times
• Auto-complete: 15 ms, 20 TPS
• Resolution: 10 ms, 2 TPS
Cache configuration
• queryResultCache: maxSize=1024
• documentCache: maxSize=1024
• fieldValueCache & filterCache disabled
Wrap Up
Performance is always the top priority
Develop simple but robust services
Provide a simple API
Q&A
Contact
Esteban Donato
• Twitter: @eddonato
Sudhakar Karegowdra
• Twitter: @skaregowdra
https://www.facebook.com/travelocity
Twitter: @travelocity and @RoamingGnome