Nitin Sharma - Data Infrastructure Engineer
Jorge Rodriguez - Data Infrastructure Engineer
Data Infrastructure Scaling at BloomReach
Abstract
Scaling data platforms for serving hundreds of millions of documents with low latency and high throughput workloads at an optimized cost is an extremely hard problem.
At BloomReach, we have implemented BC2, an elastic infrastructure for big data applications that
1. Supports heterogeneous workloads while hosted in the cloud.
2. Dynamically grows/shrinks search servers to provide application and pipeline level isolation, NRT search and indexing.
3. Offers latency guarantees and application-specific performance tuning.
4. Provides high-availability features like cluster replacement, cross-data center support, disaster recovery etc.
Agenda
• Data Infrastructure V1
• Scaling Challenges
  – Cassandra
  – Solr
• Elastic Data Infrastructure
  – Cassandra
  – Solr
• Questions?
Nitin works on search platform scaling for BloomReach’s big data. His relevant experience and background includes scaling real-time services for latency sensitive applications and building performance and search-quality metrics infrastructure for personalization platforms.
BloomReach has developed a personalized discovery platform that features applications that analyze big data to make our customers’ digital content more discoverable, relevant and profitable.
Jorge works on Cassandra database platform scaling for BloomReach’s big data. Previously he worked on our organic search applications and customer integration infrastructure. Prior to BloomReach, Jorge also worked on an eCommerce platform.
About Us
The BloomReach Personalized
Discovery Platform
BloomReach’s Applications
Content Understanding

Organic Search
What it does: Content optimization, management and measurement
Benefit: Enhanced discoverability and customer acquisition in organic search

SNAP
What it does: Personalized onsite search and navigation across devices
Benefit: Relevant and consistent onsite experiences for new and known users

Compass
What it does: Merchandising tool that understands products and identifies opportunities
Benefit: Prioritize and optimize online merchandising
Data Infrastructure
• Cassandra Database
• SOLR Reverse Index
• Write Heavy MapReduce Jobs
• Read/Scan Heavy MapReduce Jobs (Analytics and ETL)
• Large Scale Indexers
• RT APIs
Data Infrastructure V1
SOLR
C* Frontend DC
C* Backend DC
Write Pipelines
Read Pipelines
APIs
Cassandra
Cassandra: How We Started
Frontend Applications
Cassandra Cluster
Frontend DC
Backend DC
EMR Jobs
Fixed Resource Issue
Cassandra Cluster
Backend DC
EMR Jobs (×6)
Frontend DC
Spillover reads
Starvation Issue
Backend DC
Large EMR jobs with relaxed SLA
Small EMR job with tighter SLA
Frontend Latencies vs Replication Load
Frontend Applications
Cassandra Cluster
Frontend DC
Backend DC
EMR Jobs
Stabilizing Cassandra: Rate Limiter
Frontend Applications
Cassandra Cluster
Frontend DC
Backend DC
EMR Jobs
Token Server (Redis)
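The token server gates backend EMR write traffic so frontend reads are not starved. A minimal in-memory sketch of the idea, assuming a token-bucket policy — the deck does not specify the exact algorithm, and in production the counters live in the shared Redis token server so all EMR workers draw from one global budget:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter sketch. In the real system this state
    would be kept in a shared Redis 'token server' (names here are
    hypothetical), so every EMR worker shares one write budget."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec        # tokens refilled per second
        self.capacity = capacity        # burst ceiling
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def try_acquire(self, n=1):
        """Refill based on elapsed time, then take n tokens if available."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

# An EMR writer asks for a token per write; excess writes are rejected
# (or retried later) instead of starving frontend reads.
bucket = TokenBucket(rate_per_sec=100, capacity=100)
granted = sum(bucket.try_acquire() for _ in range(150))
```

A burst of 150 requests against a 100-token bucket gets roughly 100 grants; the rest must wait for refill, which is exactly the backpressure the frontend DC needs.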
Cost of Rate Limiter
• We converted EMR from an elastic resource to a fixed resource
• To scale EMR we have to scale Cassandra
• Adding capacity to a Cassandra cluster is not trivial
• Adding capacity under heavy load is harder
• Auto scaling and reducing under heavy load is even harder
SOLR
BloomReach Search Architecture
Solr Cluster
Zookeeper Ensemble
Map Reduce Pipelines (Reads)
Indexing Pipelines
Pipeline 1
Pipeline 2
Pipeline n
Indexing 1
Indexing 2
Indexing n
Heavy Load
Moderate Load
Light Load
Legend
Public API
Search Traffic
Throughput Issues…
Solr Cluster
Zookeeper Ensemble
Pipeline 1
Pipeline 2
Pipeline n
Indexing 1
Indexing 2
Indexing n
Public API
Search Traffic
● Heterogeneous read workload
● Same collection - different pipelines, different query patterns
● Cache tuning is virtually impossible
● Larger pipelines starve the smaller ones
● Machine utilization determines throughput and stability of a pipeline at any point
● No isolation among jobs
Stability and Uptime Issues…
Solr Cluster
Zookeeper Ensemble
Pipeline 1
Pipeline 2
Pipeline n
Indexing 1
Indexing 2
Indexing n
Public API
Search Traffic
● Bad clients – bring down the cluster/degrade performance
● Bad queries (with heavy load) – render nodes unresponsive
● Garbage collection issues
● ZK stability issues (as we scale collections)
● Higher number of concurrent pipelines, higher number of issues
Indexing Issues…
Solr Cluster
Zookeeper Ensemble
Pipeline 1
Pipeline 2
Pipeline n
Indexing 1
Indexing 2
Indexing n
Public API
Search Traffic
● Commit frequencies vary with indexer types
● Indexers running while another pipeline reads degrade performance
● Indexer client leaks
● Too many stored fields
● Non-batch updates
Rethinking…
• A shared cluster for pipelines does not scale.
• Every job runs great in isolation. When you put them together, they choke.
• Running index-heavy and read-heavy loads simultaneously causes cluster performance issues.
• Any direct access to the production cluster threatens cluster stability (client leaks, bad queries etc.).
• We need a dynamic way of scaling collections (SOLR) – increase/decrease replicas on the fly to help pipelines finish faster.
• What if every pipeline had its own cluster?
• Technologies behind BC2 (built in-house)
• Cluster Management - Dynamic cluster provisioning and resource allocation.
• Solr HAFT – High availability and data management library for SolrCloud.
• Cassandra Replication Service – Replicating Cassandra Data to elastic clusters on demand.
• Isolation - Pipelines get their own cluster. One cannot disrupt another.
• Dynamic Scaling – Every pipeline can state its own replication requirements.
• Production Safeguard - No direct access. Safeguards from bad clients/access patterns.
• Cost Saving – Provision for the average; withstand peak with elastic growth.
BloomStore Compute Cloud (BC2)
SOLR Scaling with BC2
SOLR on BC2
Solr Cluster
Zookeeper Ensemble
Pipeline 1
BC2 API
Solr Cluster Collection A Replicas: 6
1. Read pipeline requests a collection and desired replicas from the BC2 API.
2. The BC2 API provisions a cluster dynamically with the needed setup (and streams Solr data).
3. BC2 calls the HAFT service to replicate data from production to the provisioned cluster.
4. Pipeline uses this cluster to run its job.
Request: {Collection: A, Replica: 6}
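The steps above amount to a create-use-terminate lifecycle. A sketch under stated assumptions — `BC2Client`, `provision`, and `terminate` are hypothetical stand-ins for the real BC2 API, which provisions actual Solr clusters and streams index data via the HAFT service:

```python
from contextlib import contextmanager

class BC2Client:
    """Minimal in-memory stand-in for the BC2 API. All names here are
    illustrative; the real service launches cloud-backed Solr clusters."""

    def __init__(self):
        self._next_id = 0
        self.live = set()   # clusters currently provisioned

    def provision(self, collection, replicas):
        # Steps 1-3: provision a cluster and (in reality) have the HAFT
        # service replicate the collection over from production.
        self._next_id += 1
        cluster_id = f"solr-{collection}-{self._next_id}"
        self.live.add(cluster_id)
        return cluster_id

    def terminate(self, cluster_id):
        self.live.discard(cluster_id)

@contextmanager
def elastic_cluster(api, collection, replicas):
    """Create-Use-Terminate: the cluster exists only for the job's lifetime."""
    cluster = api.provision(collection, replicas)
    try:
        yield cluster            # step 4: pipeline runs its job here
    finally:
        api.terminate(cluster)   # throw the cluster away afterwards

api = BC2Client()
with elastic_cluster(api, collection="A", replicas=6) as cluster:
    in_use = cluster in api.live    # job runs against its private cluster
```

The context-manager shape guarantees termination even if the job fails, which is what keeps an elastic fleet from leaking clusters.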
SOLR on BC2 …
Solr Cluster
Zookeeper Ensemble
Pipeline 1
BC2 API
Solr Cluster Collection A Replicas: 6
1. Pipeline finishes running the job.
2. Pipeline calls the BC2 API to terminate the cluster.
3. BC2 terminates the cluster.
Terminate: {Cluster}
SOLR on BC2 – Read View
Zookeeper Ensemble
Pipeline 1
BC2 API
Solr Cluster Collection A Replicas: 6
Request: {Collection: A, Replica: 6}
Pipeline 2: Solr Cluster, Collection B, Replicas: 2
Request: {Collection: B, Replica: 2}
Pipeline n: Solr Cluster, Collection C, Replicas: 1
Request: {Collection: C, Replica: 1}
Solr HAFT
Service
Production Solr Cluster
SOLR on BC2 – Indexing
Production Solr Cluster
Zookeeper Ensemble
Indexing
BC2 API
Solr Cluster Collection A Replicas: 6
1. Indexing pipeline requests a collection and desired replicas from the BC2 API.
2. The BC2 API provisions a cluster dynamically with the needed setup (and streams Solr data).
3. Indexer uses this cluster to index the data.
4. Indexer calls the HAFT service to replicate the index from the dynamic cluster to production.
5. HAFT service reads data from the dynamic cluster and replicates it to production Solr.
Request: {Collection: A, Replica: 2}
SOLR on BC2 – Global View
Zookeeper Ensemble
BC2 API
Solr HAFT Service
Production Solr Cluster
Indexing Pipelines 1
Elastic Clusters
Read Pipelines 1
Read Pipelines n
Indexing Pipelines n
Provision: {Cluster}
Terminate: {Cluster}
Replicate Index
Replicate Index
Run Job
Cassandra Scaling with BC2
Cassandra BC2 Diagram
Source Cluster
BC2 API
On-demand clusters
EMR Jobs
How Cassandra Replication Works
Source Cluster
Destination Cluster
SSTable file copy
SSTable split computation
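"SSTable split computation" implies deciding how to partition the source cluster's SSTable files across parallel copy streams. A plausible sketch, assuming a greedy balance-by-size policy — the deck does not describe the actual split algorithm, and the file names below are made up:

```python
import heapq

def split_sstables(files, n_streams):
    """Greedily assign SSTable files to n parallel copy streams so the
    total bytes per stream stay balanced. A sketch of what the slide
    calls 'SSTable split computation'; the real algorithm may differ.

    files: list of (filename, size_bytes) pairs.
    n_streams: number of parallel file-copy streams.
    """
    # Min-heap of (bytes_assigned, stream_index): always feed the
    # lightest stream, taking the largest remaining file first.
    heap = [(0, i) for i in range(n_streams)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_streams)]
    for name, size in sorted(files, key=lambda f: -f[1]):
        load, idx = heapq.heappop(heap)
        assignment[idx].append(name)
        heapq.heappush(heap, (load + size, idx))
    return assignment

files = [("a-1-Data.db", 900), ("a-2-Data.db", 500),
         ("a-3-Data.db", 400), ("a-4-Data.db", 200)]
streams = split_sstables(files, 2)
```

Balancing by bytes rather than file count is what lets the raw copy saturate the network (the "10x" throughput gain on the next slide) without one straggler stream dominating total copy time.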
Cassandra Gains from BC2
• Very high throughput in moving raw data from source to destination cluster (10x increase in network usage compared to normal)
• Little CPU/memory load on the source cluster
• Time to scale varies between 10 and 40 minutes
• API driven, so it automatically scales up and down with demand
• Application agnostic
• Allows use of AWS spot instances and optimizing instance choice around current spot pricing
• Removes scan/read load from the backend cluster
Write Throughput
• Write capacity is still defined by frontend latencies
  – Compute delta changes, as most of our data does not change
  – Add more frontend nodes
  – Experimental changes:
    • Prioritize reads over writes in the frontend DC
    • Column level replication – filter mutations to the frontend DC by removing columns not needed in the frontend view
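The "compute delta changes" idea can be sketched as follows: since most rows do not change between pipeline runs, hashing row content and writing only the changed rows cuts the write load reaching the frontend DC. Function and field names here are illustrative, not BloomReach's actual pipeline code:

```python
import hashlib
import json

def row_digest(row):
    """Stable content hash of a row (sorted keys so dict order is irrelevant)."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()

def compute_delta(previous_digests, new_rows):
    """Return only the rows that are new or changed since the last run.

    previous_digests: {row_key: digest} saved from the prior pipeline run.
    new_rows: {row_key: row_dict} produced by this run.
    Only the returned delta needs to be written to the frontend DC.
    """
    delta = {}
    for key, row in new_rows.items():
        if previous_digests.get(key) != row_digest(row):
            delta[key] = row
    return delta

prev = {"p1": row_digest({"title": "shoe", "price": 10})}
new = {"p1": {"title": "shoe", "price": 10},   # unchanged -> skipped
       "p2": {"title": "boot", "price": 30}}   # new -> written
delta = compute_delta(prev, new)
```

If, say, 95% of rows are unchanged run-over-run, the frontend DC sees only 5% of the write volume, which is what keeps frontend read latencies flat during replication.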
BC2 vs Non-BC2

Property                                      | Non-BC2 | BC2
Linear Scalability for Heterogeneous Workload | No      | Yes
Pipeline Level Isolation                      | No      | Yes
Dynamic Collection Scaling                    | No      | Yes
Prevention from Bad Clients                   | No      | Yes
Pipeline Specific Performance                 | No      | Yes
No Direct Access to Production Cluster        | No      | Yes
Can Sleep at Night?                           | No      | Yes
SOLR HAFT Service
1. High availability and fault tolerance
2. Home-grown technology
3. Features
  • One push disaster recovery
  • High availability operations
    – Replace node
    – Add replicas
    – Repair collection
    – Collection versioning
  • Cluster backup operations
    – Dynamic replica creation
    – Cluster clone
    – Cluster swap
    – Cluster state reconstruction
Solr HAFT Service
Clone Alias
Clone Collections
Custom Commit Node Replacement
Node Repair
Clone Cluster
Collection Versioning
Black Box Recording
Lucene Segment Optimize
Index Management Actions
High Availability Actions
Cluster Backup Operations
Solr Metadata
Zookeeper Metadata
Verification Monitoring
Solr HAFT Service – Functional View
Dynamic Replica Creation
Cluster Clone
Cluster Swap
Cluster State Reconstruction
Solr Disaster Recovery in New Architecture
Old Production Solr Cluster
Zookeeper Ensemble
New Solr Cluster
Zookeeper Ensemble
Solr HAFT Service
Push Button Recovery
Brave Soul on Pager Duty
1. The engineer on pager duty clicks the recovery button.
2. Solr HAFT Service triggers:
   • Cluster Setup
   • State Reconstruction
   • Cluster Clone
   • Cluster Swap
3. Production DNS is pointed at the new cluster.
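The recovery flow above is a strict sequence of HAFT operations followed by a DNS flip. A sketch with a recording stub in place of the real service — the deck does not show HAFT's actual API, so every method name here is a hypothetical stand-in:

```python
class StubHAFT:
    """Recording stand-in for the Solr HAFT service (real API may differ)."""
    def __init__(self):
        self.calls = []
    def __getattr__(self, name):
        # Any operation invoked on the stub is recorded, in order.
        return lambda *args: self.calls.append(name)

def push_button_recovery(haft, dns, old_cluster, new_cluster):
    """One-push disaster recovery as a fixed sequence of operations."""
    haft.setup_cluster(new_cluster)                    # bring up new nodes
    haft.reconstruct_state(old_cluster, new_cluster)   # rebuild cluster state
    haft.clone_cluster(old_cluster, new_cluster)       # copy collections over
    haft.swap_clusters(old_cluster, new_cluster)       # promote the new cluster
    dns["production"] = new_cluster                    # flip production DNS

haft = StubHAFT()
dns = {}
push_button_recovery(haft, dns, "solr-old", "solr-new")
```

Encoding the sequence in code is the point of "push button": the brave soul on pager duty triggers one call instead of improvising five manual operations at 3 a.m.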
BC2 vs Non-BC2 (Availability Features)
Property                         | Non-BC2 | BC2
Cross Data-Center Support        | No      | Yes
Cluster Cloning                  | No      | Yes
Collection Versioning            | No      | Yes
One-Push Disaster Recovery       | No      | Yes
Repair API for Nodes/Collections | No      | Yes
Solr Node Replacements           | No      | Yes
V2 Architecture
SOLR
C* Frontend DC
C* Backend DC
Write Pipelines
Read Pipeline
APIs
On-demand clusters
HAFT SERVICE
Write-Back
Replication
Rate Limiter
BC2 API
Questions?
Thank You!

Nitin Sharma
[email protected]
https://www.linkedin.com/in/knitinsharma

Jorge Rodriguez
[email protected]
https://www.linkedin.com/pub/jorge-rodriguez/5/559/12b