OCTOBER 11-14, 2016 • BOSTON, MA
Building a Vibrant Search Ecosystem @ Bloomberg
Steven Bower & Ken LaPorte
Copyright 2016 Bloomberg Finance L.P. All rights reserved.
01 Bloomberg • Largest provider of financial news and information • Our strength is quickly and accurately delivering data, news and analytics • Creating high performance and accurate information retrieval systems is core to our strength
02 Why are we giving this talk?
01 What came before…
• Search has been around for a long time at Bloomberg - Rapid delivery of product to clients - Proprietary, commercial and open-source search technologies
• Fragmented solutions - Disparate search technologies - Custom code - Deployment patterns - Lack of standards
• Costly to maintain & evolve
01 How We Got Started • Created a team to specialize in search • Reviewed existing applications reliant upon search • Selected a set of representative applications
- Various scales - Data types - Distinct requirements
01 Why Solr? • Evaluated other open source search engines
- Already used at Bloomberg • Large community & widely used • Established & growing feature set • Scalable • Committed to open source
- Ability to contribute to core engine - Ability to fix bugs ourselves - Contributions in almost every Solr release since 4.5.0 - 3 Solr committers at the company
01 Search as a service • Designed platform with application teams • Middleware service to wrap Solr
- Familiar & lightweight interface - Simplified APIs - Insulate clients from changes in Solr
• Pass-thru capability • Basic monitoring/metrics
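As a rough illustration of the middleware idea above, the sketch below translates a simplified request into standard Solr query parameters (`q`, `fq`, `start`, `rows`, `wt`) and forwards any raw Solr parameters untouched. The function names, the request shape, and the example URL are assumptions made for illustration. They are not the platform's actual API.

```python
from urllib.parse import urlencode


def build_solr_params(text, filters=None, page=0, page_size=10, passthru=None):
    """Translate a simplified search request into Solr query parameters.

    Hypothetical request shape: free text, exact-match filters, paging,
    plus an optional dict of raw Solr parameters (the pass-thru path).
    """
    params = [
        ("q", text),
        ("start", str(page * page_size)),  # Solr pages by offset, not page number
        ("rows", str(page_size)),
        ("wt", "json"),
    ]
    # Each filter becomes its own fq clause. Values are quoted naively here;
    # a real service would also need to escape special characters.
    for field, value in sorted((filters or {}).items()):
        params.append(("fq", f'{field}:"{value}"'))
    # Pass-thru: raw Solr parameters are forwarded without interpretation.
    for name, value in (passthru or {}).items():
        params.append((name, str(value)))
    return params


def build_select_url(base_url, collection, params):
    """Build the URL for a collection's /select handler."""
    return f"{base_url}/{collection}/select?{urlencode(params)}"
```

Because clients only see the simplified request, a change in how Solr expects a parameter can be absorbed inside `build_solr_params` without touching client code.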
01 Open for business!
• Hundreds of search applications - Diverse use cases and scale - Displaced other technologies
• >10 billion documents • >10 million new documents daily • >4,000 Solr instances • 100s of servers • >2,000 queries per second • Mission critical to Bloomberg and the financial markets
[Chart: Number of Collections over Time. x-axis: Time (2012 to 2016); y-axis: Number of Collections (0 to 300)]
01 What have we done?! • Human scaling • Ineffective Alarming • Manual build process
- Limited automated testing • Configuration Management • Lots of known unknowns
01 Challenge: EcoSystem
• Ownership - Where’s the line?
• Planning for scale • Education
- Search != Database - Data types (text parsing) - Relevance - Features
01 Solution: Ecosystem • Survey
- Understand business requirements - Identify scale and complexity - Assist with schema and query design - Concerns
• Develop & Test - Best practices - Documentation & code samples - Office hours & support chat - Community development
01 Solution: Ecosystem • Validate & Deploy
- Hardware provisioning - Automated deployments - Hot & cold collections - Load testing
• Maintain and Grow - Applications change & grow - Solr & platform upgrades - Monitoring
01 Challenge: Monitoring Solr • Very large monitoring footprint • What should we monitor?
- Ping - Cluster state - Process state - Server health
• False alarms - Flutter - Solr can lie to you! (SOLR-8599)
• Many different ways to view system health - Different people care about different things - Active vs Forensic
01 Solution: Monitoring Solr • Monitor via multiple mechanisms • Aggregate events
- Alarm on multiple signals - Delay alarms
• Niteowl - Solr / ZooKeeper / Generic - Distributed / Scalable - Events indexed into Solr
• Led to massive stability improvements
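The "alarm on multiple signals" and "delay alarms" ideas can be sketched as a small aggregator: an alarm fires only when several independent checks have each been failing continuously for some minimum time, which filters out flutter. This is a minimal sketch under assumed names and thresholds. It does not describe how Niteowl is implemented.

```python
class AlarmAggregator:
    """Raise an alarm only when enough signals have failed for long enough.

    Hypothetical policy: at least `min_signals` distinct checks must each
    have been failing continuously for `delay_seconds`.
    """

    def __init__(self, min_signals=2, delay_seconds=60.0):
        self.min_signals = min_signals
        self.delay_seconds = delay_seconds
        # signal name -> timestamp of the first failure in the current streak
        self._failing_since = {}

    def record(self, signal, healthy, now):
        """Record one check result; a healthy result resets that signal's streak."""
        if healthy:
            self._failing_since.pop(signal, None)
        else:
            # Keep the earliest failure time of the streak.
            self._failing_since.setdefault(signal, now)

    def should_alarm(self, now):
        """True when enough signals have been failing for the full delay."""
        sustained = [
            name
            for name, since in self._failing_since.items()
            if now - since >= self.delay_seconds
        ]
        return len(sustained) >= self.min_signals
```

A single flapping check never alarms on its own here, and a brief failure that recovers before the delay elapses is dropped silently.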
01 What We Found • Long Garbage Collections
- Profiler interactions with Mmap - Young generation pressure during ingest - Use G1GC / Keep heap small
• Long Recovery Times - Transaction logs don’t hold enough - Always doing full replications when under ingest load
• Solr Bugs • Out of Memory Exceptions
- One off OOMs are not uncommon - Use DocValues! - OOM Killer
SOLR-9310, SOLR-9207, SOLR-9506: Long recovery times
SOLR-6931: Random connection reset issues
SOLR-8085: Replicas get out of sync
SOLR-8599: ZooKeeper client in inconsistent state
01 Challenge: Configuration Management
• Deployment process • Requires versioning / rollback
- Some changes cannot be rolled back • Template driven configuration
- Good for simple things - Doesn’t scale for complex collections
• Lack of provenance
01 Solution: Configuration Management • Convert to SDLC process
- Configurations live in Git repository - Solr extensions linked as dependencies - Built with Maven / Jenkins - Published to artifact repository
• Validation of configurations during build - Static analysis (allowed schema changes, access control of Solr configuration) - Integration testing
• Deployed to ZooKeeper / Solr
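One way to picture a build-time check for "allowed schema changes" is a comparison of the deployed field definitions against the proposed ones, failing the build when an existing field is removed or altered. The policy below (new fields are fine; removing or changing an existing field is flagged) and the dict-based schema representation are assumptions for illustration only. The talk does not state the actual rules.

```python
def find_disallowed_changes(old_fields, new_fields):
    """Compare two schema versions and list changes to existing fields.

    Both arguments map field name -> dict of attributes, for example
    {"title": {"type": "text_general", "stored": True}}.
    Hypothetical policy: adding a field is allowed; removing a field or
    changing any attribute of an existing field is reported.
    """
    problems = []
    for name, old_attrs in sorted(old_fields.items()):
        if name not in new_fields:
            problems.append(f"field '{name}' was removed")
            continue
        new_attrs = new_fields[name]
        # Check every attribute present in either version of the field.
        for attr in sorted(set(old_attrs) | set(new_attrs)):
            before, after = old_attrs.get(attr), new_attrs.get(attr)
            if before != after:
                problems.append(
                    f"field '{name}': {attr} changed from {before!r} to {after!r}"
                )
    return problems
```

A build step could call this with the schema currently in the Git repository's released tag and the candidate schema, and fail when the returned list is non-empty.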
01 Challenge: Infrastructure • Substantial demand • Large lead times • Differing requirements
- Security - Scale - Control
• Too many pets!
01 Solution: Infrastructure • Streamlined process • Shared and dedicated resources • Built from the ground up
- Well defined layers of abstraction - Cattle not pets - Infrastructure-as-code - SDLC / provenance
• Better hardware == better experience - SSDs - More RAM - Faster network
[Diagram labels: Hardware / OS, Control Plane, Applications, APIs]
01 What’s next? • Containerization
- Simplify / decentralize operational procedures - Local testing and development - Security / Metrics / QoS
• Delegation of control - Mute / Direct alarms to tenants - Tenant managed
• Detect failures before they happen - Heuristics / ML models
• Solr - More work on streaming - Analytics
• distributed analytics • pivot faceting
QUESTIONS?
Steven Bower [email protected]
Ken LaPorte [email protected]