Magento Expert Consulting Group Webinar | July 31, 2013
Thinking Beyond Search with Solr Understanding How Solr Can Help Your Business Scale
The presenters Magento Expert Consulting Group
Udi Shamay, Head, Expert Consulting Group [email protected]
Steve Kukla, Business Solution Architect, Expert Consulting Group [email protected]
Kirill Morozov, Application Architect, Expert Consulting Group [email protected]
Today’s agenda
• What is Apache Solr?
• Business Use Cases for Scale
  • Supporting Initial Catalog Growth
  • Supporting Growing Traffic
  • Supporting Substantial Catalog Growth
  • Supporting A Real-Time Catalog
• Key Points to Remember
• Q&A
What is Apache Solr?
What is Apache Solr? General Solr Overview
• Solr is a separate application, installed on its own server or on an existing server in the environment, depending on business needs
• Solr uses schema configuration files, which can be found in Magento/lib/Apache
• Magento communicates with Solr via HTTP/XML
• Search options are configured via the Magento admin panel
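The HTTP/XML exchange mentioned above can be sketched roughly as follows. This is a minimal illustration, not Magento's actual client code: the base URL, the field names, and the sample response are assumptions, and only single-valued fields are handled.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_select_url(base_url, query, rows=10):
    """Build a Solr select URL that asks for an XML response (wt=xml)."""
    params = urlencode({"q": query, "rows": rows, "wt": "xml"})
    return f"{base_url}/select?{params}"


def parse_docs(xml_text):
    """Turn each <doc> of a Solr XML response into a plain dict.

    Only single-valued fields (<str>, <int>, ...) are handled in this sketch.
    """
    root = ET.fromstring(xml_text)
    docs = []
    for doc in root.findall("./result/doc"):
        docs.append({field.get("name"): field.text for field in doc})
    return docs


# Hypothetical response body, shaped like Solr's XML response writer output.
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">123</str>
      <str name="name">Blue Shirt</str>
    </doc>
  </result>
</response>"""

if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr", "shirt", rows=5))
    print(parse_docs(SAMPLE_RESPONSE))
```

The point of the sketch is that the application never touches the index directly: every search is an HTTP request, and every answer is a document list that can be parsed without a database connection.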
What is Apache Solr? Solr the Search Platform
Better text-based searching provides a better customer experience
• More relevant “fuzzy” searching*
• Faceted searches
• Search corrections
• Out-of-the-box type-ahead*
• Response caching for better performance
*Requires customization to leverage at 100%
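As a rough illustration of two of the features listed above, the sketch below builds the query string for a fuzzy, faceted search. The `~` fuzzy operator and the `facet` / `facet.field` parameters are standard Solr query syntax; the search term and facet field names are made-up examples, and this is not the query Magento itself generates.

```python
from urllib.parse import urlencode


def build_search_params(term, facet_fields, fuzzy=True):
    """Return a URL-encoded Solr query string with optional fuzzy matching and facets."""
    # A trailing "~" asks Solr/Lucene for fuzzy (edit-distance) matching on the term.
    q = f"{term}~" if fuzzy else term
    params = [("q", q), ("facet", "true")]
    # One facet.field entry per attribute we want counts for (hypothetical names).
    params += [("facet.field", field) for field in facet_fields]
    return urlencode(params)


if __name__ == "__main__":
    # "shrit" is a deliberate misspelling that fuzzy matching could still resolve.
    print(build_search_params("shrit", ["color", "size"]))
```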
What is Apache Solr? What Makes Solr Powerful
Solr is more than a search engine because…
• Most data customers see is handled by Solr instead of MySQL
• Solr uses a simpler data structure
• Solr supports replication, which allows it to truly scale for growth
[Diagram: MySQL (EAV) compared with Solr (No EAV); column labels shown are product_id, attribute_id, attribute_name, and attribute_value]
[Diagram: Magento and five Solr nodes]
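To make the "simpler data structure" point concrete, here is a minimal sketch of how EAV rows could be flattened into one document per product. The table shapes and sample values are assumptions for illustration only and do not reflect Magento's actual schema or indexer.

```python
from collections import defaultdict


def flatten_eav(attributes, values):
    """Flatten EAV rows into one flat dict per product.

    attributes: {attribute_id: attribute_name}
    values: iterable of (product_id, attribute_id, attribute_value) rows
    """
    docs = defaultdict(dict)
    for product_id, attribute_id, attribute_value in values:
        docs[product_id]["product_id"] = product_id
        # Resolve the attribute name once at index time, so reads need no joins.
        docs[product_id][attributes[attribute_id]] = attribute_value
    return list(docs.values())


if __name__ == "__main__":
    attributes = {1: "name", 2: "color"}
    values = [(10, 1, "Shirt"), (10, 2, "Blue"), (11, 1, "Hat")]
    print(flatten_eav(attributes, values))
```

In the EAV layout every attribute lookup is a join; in the flat document the join has already been paid for when the index was built.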
Supporting Initial Catalog Growth
Business Use Case Supporting Initial Catalog Growth
Business Background
• Growing catalog – from 10K to 100K SKUs
• From 1 to 2 stores
• From 1 to 2+ web nodes / 1 database node
• Using native Solr Search
Problems
• Increased indexing time
• Outdated information on the front-end
Supporting Initial Catalog Growth Problem – Increasing Index Footprint
Expected indexing time
SKUs | Control (1 website, 1 store view) | Year 2 (2 websites, 2 store views)
10,000 | 1.75 min | 3.5 min
50,000 | 10 min | 17.5 min
100,000 | 17.5 min | 35 min
Slow Indexing
Supporting Initial Catalog Growth Solution – Custom Data Import Handler
Concept
• Connects to the database using JDBC
• Extra data transformations must be written in Java/JavaScript
• Uses a prepared XML configuration
Supporting Initial Catalog Growth Data Import Handler – Results
Results
• 10 times faster indexing
• Supports delta-indexing
Things to keep in mind
• Solr knows about its data source
• May require extra development efforts
• Extra data transformations must be written in Java/JavaScript
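The delta-indexing idea can be sketched as "select only rows changed since the last index run". The Data Import Handler itself runs inside Solr and is configured in XML; the Python below only illustrates the selection logic. An in-memory SQLite database stands in for MySQL, and the `products` table, its columns, and the timestamps are hypothetical.

```python
import sqlite3


def fetch_delta(conn, last_index_time):
    """Return only the rows modified after the previous indexing run."""
    cursor = conn.execute(
        "SELECT product_id, name FROM products "
        "WHERE updated_at > ? ORDER BY product_id",
        (last_index_time,),
    )
    return [{"product_id": pid, "name": name} for pid, name in cursor]


def demo():
    """Build a tiny sample catalog and pick the documents a delta run would reindex."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products "
        "(product_id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [
            (1, "Shirt", "2013-07-01 10:00:00"),
            (2, "Hat", "2013-07-30 09:00:00"),
            (3, "Shoes", "2013-07-31 08:00:00"),
        ],
    )
    # ISO-formatted timestamps compare correctly as plain strings.
    return fetch_delta(conn, "2013-07-15 00:00:00")


if __name__ == "__main__":
    print(demo())
```

A full import touches every row on every run; a delta import touches only what changed, which is why its cost tracks the update rate rather than the catalog size.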
Supporting Growing Traffic
Business Use Case Supporting Growing Traffic
Business Background
• Growing catalog – 1,000,000 SKUs
• Growing traffic: up to 100 requests / second
• 3 stores
• 3+ web nodes / 1 database node
• Using Data Import Handler
Problem
• Solr can’t handle increasing user concurrency
Supporting Growing Traffic Increasing Index Footprint – OK
Expected indexing time
SKUs | Control (2 websites, 2 store views) | Year 3 (3 websites, 3 store views)
100,000 | 3.5 min | 4.75 min
500,000 | 17.5 min | 23.75 min
1,000,000 | 35 min | 47.5 min
< 1000 updates/sec: indexing delta data handles updates
Supporting Growing Traffic Problem – Increased Response Time
Expected average response time
Catalog size and load | Control (2 websites, 2 store views) | Year 3 (3 websites, 3 store views)
100,000 SKUs, 30 RPS | 75 msec | 80 msec
500,000 SKUs, 60 RPS | 95 msec | 100 msec
1,000,000 SKUs, 100 RPS | 105 msec | 120 msec
Solr CPU is maxed out
Supporting Growing Traffic Solution – Solr Replication
Concept
• Separate read requests
• Replicate index across multiple nodes
• Read from multiple servers
Supporting Growing Traffic Solr Replication – Results
Results
• Allows Solr to handle read traffic
• Introduces fail-over
Things to keep in mind
• Requires middleware or Magento customization
• Possible heavy data duplication
• Extra changes in infrastructure
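The middleware mentioned under "Things to keep in mind" could, in its simplest form, look like the sketch below: index updates go to one node and searches rotate across the replicated nodes. The class name, method names, and URLs are hypothetical, and a production setup would also need health checks for the fail-over case.

```python
from itertools import cycle


class SolrRouter:
    """Send writes to the master node and spread reads across replicas."""

    def __init__(self, master_url, replica_urls):
        if not replica_urls:
            raise ValueError("at least one replica URL is required")
        self.master_url = master_url
        # cycle() yields the replicas in round-robin order indefinitely.
        self._replicas = cycle(replica_urls)

    def url_for_read(self):
        """Pick the next replica for a search request."""
        return next(self._replicas)

    def url_for_write(self):
        """Index updates always go to the master, which replicates them out."""
        return self.master_url


if __name__ == "__main__":
    router = SolrRouter(
        "http://solr-master:8983/solr",
        ["http://solr-r1:8983/solr", "http://solr-r2:8983/solr"],
    )
    print([router.url_for_read() for _ in range(3)])
    print(router.url_for_write())
```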
Supporting Substantial Catalog Growth
Business Use Case Supporting Substantial Catalog Growth
Business Background
• Growing catalog – 5,000,000 SKUs
• 4 stores
• 4+ web nodes / 1 database node
• Using Data Import Handler
• Using Solr replication
Problems
• Delta-indexing delays
• Slow response time
Supporting Substantial Catalog Growth Problem – Increasing Index Footprint
Expected indexing time
SKUs | Control (3 websites, 3 store views) | Year 4 (4 websites, 4 store views)
1,000,000 | 47.5 min | 63.5 min
2,500,000 | 118.75 min | 158.75 min
5,000,000 | 237.5 min | 317.5 min
> 1000 updates/sec: delta indexing delays
Supporting Substantial Catalog Growth Problem – Increased Response Time
Expected average response time
Catalog size and load | Control (3 websites, 3 store views) | Year 4 (4 websites, 4 store views)
1,000,000 SKUs, 100 RPS | 120 msec | 150 msec
2,500,000 SKUs, 200 RPS | 230 msec | 270 msec
5,000,000 SKUs, 400 RPS | 300 msec | 400 msec
Slow response time
Supporting Substantial Catalog Growth Solution – Index Sharding
Concept
• Distributed search
• Distributed + Replication (SolrCloud)
Supporting Substantial Catalog Growth Index Sharding – Results
Results
• Distributed search for faster response time
• 50 times faster indexing with 5 shards
[Diagram: Magento, MySQL, and Solr Shards, with items labeled A through I split across the shards]
Things to keep in mind…
• Custom solution
• Requires Magento customization or middleware introduction
• Extra changes in infrastructure
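A hand-rolled version of index sharding needs two pieces: a rule that decides which shard owns a document, and a query that fans out to all shards. The sketch below shows one possible form of each. The CRC32 routing rule and the host names are assumptions for illustration (SolrCloud applies its own routing); the comma-separated `shards` request parameter is the one Solr's distributed search expects.

```python
import zlib


def shard_for(product_id, shard_count):
    """Pick a shard index for a document using a stable hash of its id."""
    # crc32 is deterministic across processes, unlike Python's built-in hash().
    return zlib.crc32(str(product_id).encode("utf-8")) % shard_count


def partition(product_ids, shard_count):
    """Group product ids by the shard that should index them."""
    shards = {index: [] for index in range(shard_count)}
    for product_id in product_ids:
        shards[shard_for(product_id, shard_count)].append(product_id)
    return shards


def shards_param(shard_hosts):
    """Build the value of the 'shards' parameter for a distributed search request."""
    return ",".join(shard_hosts)


if __name__ == "__main__":
    print(partition(list("ABCDEFGHI"), 3))
    print(shards_param(["solr1:8983/solr", "solr2:8983/solr", "solr3:8983/solr"]))
```

Because each shard indexes only its own slice of the catalog, indexing runs in parallel, and each search node scans a smaller index.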
Supporting A Real-Time Catalog
Business Use Case Supporting A Real-Time Catalog
Business Background
• Growing catalog – 10,000,000 SKUs
• 5 stores
• 5+ web nodes / 1 database node
• Data Import Handler
• SolrCloud and distributed search
Business Requirement
• Always up-to-date index
Supporting A Real-Time Catalog Solution – Listen To The MySQL Bin Log
Concept
• Connect via MySQL replication protocol
• Listen to data-related events
• Extract information from events
• Manipulate documents in the Lucene index
[Diagram: MySQL, MySQL Slave, Replication, Binlog]
[Diagram: MySQL, Binlog, Replication Listener, Log Parser, Solr]
Supporting A Real-Time Catalog Listening To The MySQL Bin Log – Results
Results
• Replication-like connection
• Indexes are always up-to-date
Things to keep in mind
• Relatively complex implementation
[Diagram: Magento, MySQL, Bin log, and Solr Shards, with items labeled A through I]
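The last two concept steps, extracting information from events and updating documents, can be sketched as a small event handler. Everything here is an assumption made for illustration: the event dictionaries are a hypothetical shape that a binlog parser might emit, the `products` table name is made up, and a plain dict stands in for the Lucene index.

```python
def apply_event(index, event):
    """Apply one parsed row event to an in-memory stand-in for the search index.

    event is a dict with hypothetical keys:
      'type'  : 'insert', 'update' or 'delete'
      'table' : source table name
      'row'   : dict of column values, including 'product_id'
    """
    if event["table"] != "products":
        return index  # ignore events for tables we do not index
    row = event["row"]
    if event["type"] in ("insert", "update"):
        # Add or overwrite the document for this product.
        index[row["product_id"]] = dict(row)
    elif event["type"] == "delete":
        index.pop(row["product_id"], None)
    return index


if __name__ == "__main__":
    index = {}
    events = [
        {"type": "insert", "table": "products", "row": {"product_id": 1, "name": "Shirt"}},
        {"type": "insert", "table": "products", "row": {"product_id": 2, "name": "Hat"}},
        {"type": "update", "table": "products", "row": {"product_id": 2, "name": "Red Hat"}},
        {"type": "delete", "table": "products", "row": {"product_id": 1}},
    ]
    for event in events:
        apply_event(index, event)
    print(index)
```

The index changes as each event arrives rather than on a schedule, which is what keeps it up to date without periodic delta imports.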
Key Points to Remember
Key Points to Remember Solr helps businesses scale
• Solr’s search capabilities provide a better site experience than MySQL LIKE or Full-text
• Solr is more than a search platform – it is a key for scalability and growth
• Solr’s data import handler keeps Solr performing well as your catalog grows
• Solr replication helps accommodate growing traffic
• Solr shards help keep indexing execution time and search response times low for very large catalogs
• Listening to the MySQL bin log can help facilitate a continuously updating catalog
References
Scaling Solr
• Solr Wiki: http://wiki.apache.org/solr/
• Type-Ahead: http://wiki.apache.org/solr/Suggester
• Data Import Handler (DIH): http://wiki.apache.org/solr/DataImportHandler
• Replication: http://wiki.apache.org/solr/SolrReplication
• Shard: http://wiki.apache.org/solr/SolrCloud
• Distributed Search: http://wiki.apache.org/solr/DistributedSearch
MySQL Replication listening
• Change Data Capture: http://www.slideshare.net/mkindahl/binary-log-api-presentation-oscon-2011
• Replication Listener (C): https://launchpad.net/mysql-replication-listener
• Open-Replicator (Java): http://code.google.com/p/open-replicator/
Q&A