DESCRIPTION
See conference video: http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011 In this case study I discuss architectural lessons learned from refactoring an existing REST API backed by Apache Solr. The initial goal of the refactoring was to speed up data access while scaling from 5m documents to 20-50m documents stored in Solr. Under consideration were the hosting infrastructure, the REST API Java code, and the Solr documents and configuration. In this talk I give a brief review of the results. "Pimping" the Solr configuration, the client access and the document structure achieved better results. But the elementary lesson learned was that a significant increase in data access speed can only be realized through a functional redesign and a simplification of the REST API. I'll explain how this led us directly to distinct Solr cores and why we dropped the introduction of Solr shards or a breathing cloud infrastructure.
Architectural lessons learned from refactoring a Solr based API application.
Torsten Bøgh Köster (Shopping24) Apache Lucene Eurocon, 19.10.2011
Contents
Shopping24 and its API
Technical scaling solutions
Sharding, Caching, Solr Cores, "Elastic" infrastructure
business requirements as key factor
@tboeghk
Software and systems architect
2 years experience with Solr
3 years experience with Lucene
Team of 7 Java developers currently at Shopping24
shopping24 internet group
1 portal became n portals
30 partner shops became 700
500k to 7m documents
index fact time
• 16 GB of data
• Single-core layout
• Up to 17s response time
• Machine size limited
• Stalled at Solr version 1.4
• API designed for small tools
scaling goal: 15-50m documents
ask the nerds
"Shard!" That'll be fun!
„Use spare compute cores at Amazon?“
breathe load into the cloud
„Reduce that index size“
„Get rid of those long running queries!“
data sharding ...
... is highly effective.
[Chart: response time (125ms-500ms) vs. concurrent requests (1-20), for 1, 2, 3, 4, 6 and 8 shards]
Sharding: size matters
The bigger your index gets, the more complex your queries are, and the more concurrent requests you serve, the more sharding you need.
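As a sketch of how sharding worked in the Solr versions under discussion here (pre-SolrCloud), a query is fanned out manually via the `shards` request parameter; the host names below are hypothetical:

```
http://solr1:8983/solr/select?q=dress&shards=solr1:8983/solr,solr2:8983/solr
```

Each listed shard executes the query and the receiving node merges the results, which is where the per-request coordination overhead shown in the chart comes from.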
but wait ...
Why do we have such a big index?
7m documents vs. 2m active products
fashion product lifecycle meets SEO
Bastografie / photocase.com
Separation of duties! Remove unsearchable data from your index.
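Pruning inactive documents can be done with a Solr delete-by-query update message; this is a minimal sketch, and the `active` field name is a hypothetical stand-in for whatever flags a product as no longer searchable:

```xml
<!-- posted to /solr/update; removes all documents matching the query -->
<delete>
  <query>active:false</query>
</delete>
```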
Why do we have complex queries?
A Solr index designed for 1 portal
Grown into a multi-portal index
Let “sharding“ follow your data ...
... and build separate cores for every client.
Duplicate data as long as access is fast.
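One core per client can be declared in the legacy multi-core `solr.xml` of the Solr 1.4/3.x era; a minimal sketch, with hypothetical portal names:

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- one core (own index + config) per client portal -->
    <core name="portal-a" instanceDir="portal-a" />
    <core name="portal-b" instanceDir="portal-b" />
  </cores>
</solr>
```

Each core gets its own index directory, so per-client queries stay small even though product data is duplicated across cores.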
andybahn / photocase.com
Streamline your index provisioning process.
A thousand splendid cores at your fingertips.
Throwing hardware at problems. Automated.
evil traps: latency, $$
mirror your complete system – solve load balancer problems
froodmat / photocase.com
I said faster!
use a cache layer like Varnish.
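A minimal Varnish sketch (VCL 2.x/3.x syntax, current at the time of the talk), caching Solr select responses; the URL pattern and the 5-minute TTL are assumptions, not the speaker's actual configuration:

```
# cache Solr search responses for 5 minutes
sub vcl_recv {
    if (req.url ~ "^/solr/.*/select") {
        return (lookup);        # serve from cache if present
    }
}

sub vcl_fetch {
    if (req.url ~ "^/solr/.*/select") {
        set beresp.ttl = 5m;    # hypothetical TTL
    }
}
```

This trades a bounded staleness window for skipping Solr entirely on repeated queries.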
What about those complex queries? Why do we have them? And how do we get rid of them?
Lost in encapsulation: Solr API exposed to the world.
What's the key factor?
look at your business requirements
decrease complexity
Questions? Comments? Ideas?
Twitter: @tboeghk
Github: @tboeghk
Email: torsten.koester@s24.com
Web: http://www.s24.com
Images: sxc.hu (unless noted otherwise)