lucenerevolution
DESCRIPTION
Presented by Ben Brown, Software Architect, Cerner Corporation. Our team made its first foray into Solr building out Chart Search, an offering on top of Cerner's primary EMR that makes search over a patient's chart smarter and easier. After bringing on over 100 client hospitals and indexing many tens of billions of clinical documents and discrete results, we've (thankfully) learned a couple of things. Routing documents across many shards by a hashed document ID, with no easily accessible source of truth, doesn't make for a flexible index. Learn the finer points of the strategy in which we shifted our source of truth to HBase: how we deploy new indexes with the click of a button, take an existing index and expand its number of shards on the fly, and several other fancy features we enabled.
Brahe - Flexible Indexing At Scale
Ben Brown Software Architect, Cerner Corporation
Who I Am
• Ben Brown
Software Architect
• Cerner
Healthcare IT Company
• Semantic Solutions
Team of 10
Search Services
Fun Stuff
NLP, Medical Ontologies, ML
Chart Search
Taking This
Photo: http://bit.ly/Y7kTJt
Chart Search
Turning it into this
Chart Search Does
• Faceting
• NLP
• Semantic Concept Markup
Makes for a heavy record
(Especially on Solr 1.4)
Where We Started
Started Major Engineering in 2009
IBM developerWorks: http://ibm.co/14ZrtqX
Scale
• Clusters partitioned by client
• Raw and processed data in HDFS
• All processing & indexing done through MapReduce
Shard Size
Limiting Factor: ~26 Million Discrete Results Per Shard
Average of 35 Shards Per Client
Range 5 to 140
Query Touch Points
One User Action ~ 4 Queries
35 Shards - 432 Touch Points
140 Shards - 1692 Touch Points
• Works, but not efficient
• Chance for variance killing performance
• Failure is a massive config headache
Growth
• Hashed ID does not play well with resizing
• Deploy Again
• Reindex Everything
Document Hash modulo Shard Count
Doc One: Hash(abc123) = 15
Doc Two: Hash(efg456) = 8
Doc Three: Hash(hij789) = 7
3 Shards
Doc One -> Shard 0
Doc Two -> Shard 2
Doc Three -> Shard 1
4 Shards
Doc One -> Shard 3
Doc Two -> Shard 0
Doc Three -> Shard 3
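The modulo routing above can be reproduced in a few lines of Python. This is a minimal sketch that reuses the slide's example hash values as plain integers (a real index would compute a stable hash of each document ID; the hard-coded values are an assumption for illustration). It also measures how much of a uniformly spread keyspace changes shards when going from 3 to 4 shards.

```python
from typing import Dict, List

# Hash values copied from the slide; a real index would compute a stable hash
# of each document ID instead of hard-coding it.
DOC_HASHES: Dict[str, int] = {
    "Doc One": 15,   # Hash(abc123)
    "Doc Two": 8,    # Hash(efg456)
    "Doc Three": 7,  # Hash(hij789)
}


def assign_shards(hashes: Dict[str, int], shard_count: int) -> Dict[str, int]:
    """Route each document to shard = hash mod shard_count."""
    return {doc: h % shard_count for doc, h in hashes.items()}


def moved_documents(hashes: Dict[str, int], old_count: int, new_count: int) -> List[str]:
    """Documents whose shard changes when the shard count changes."""
    before = assign_shards(hashes, old_count)
    after = assign_shards(hashes, new_count)
    return [doc for doc in hashes if before[doc] != after[doc]]


def fraction_moved(old_count: int, new_count: int, sample: int = 12_000) -> float:
    """Share of uniformly spread hash values 0..sample-1 that land on a new shard."""
    moved = sum(1 for h in range(sample) if h % old_count != h % new_count)
    return moved / sample


if __name__ == "__main__":
    print(assign_shards(DOC_HASHES, 3))       # {'Doc One': 0, 'Doc Two': 2, 'Doc Three': 1}
    print(assign_shards(DOC_HASHES, 4))       # {'Doc One': 3, 'Doc Two': 0, 'Doc Three': 3}
    print(moved_documents(DOC_HASHES, 3, 4))  # ['Doc One', 'Doc Two', 'Doc Three']
    print(fraction_moved(3, 4))               # 0.75
```

With uniformly spread hashes, `fraction_moved(3, 4)` is 0.75: only hash values that are 0, 1, or 2 mod 12 keep their shard, so roughly three quarters of the index has to move, which is why growing a hashed index meant redeploying and reindexing everything.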
We Have a Problem
Painful Growth
Lots of Deploys
Variance Risk
Image: http://bit.ly/Y7oBD6
What Would Be Better?
Load Balance at the Client
Automated Failover
Easy Deployments
Simplified Splitting
Minimized Touch Points
Disconnected Stages
Solution
Shift the Source of Truth to HBase
Image: http://bit.ly/ZXO2na
Why HBase?
Lexically organized keys
Efficient key range scans
Efficient time-based scans
We're pretty good at operating it
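Lexically ordered keys suggest a different way to partition: give each shard a contiguous key range (presumably what the "Shard Boundary Info" entry in the ZooKeeper layout records) instead of a hash bucket. The following is a hypothetical sketch, not Cerner's code; the `KeyRange` type and the `patient-NNNN` row keys are assumptions for illustration. It shows that splitting one shard only redraws that shard's boundary, while every other shard's range stays unchanged.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KeyRange:
    """A shard's slice of the row-key space: start <= key < end ("" end = unbounded)."""
    start: str
    end: str

    def contains(self, key: str) -> bool:
        return key >= self.start and (self.end == "" or key < self.end)


def shard_for(ranges: List[KeyRange], key: str) -> int:
    """Index of the shard owning key; ranges must be sorted, contiguous, and start at ""."""
    starts = [r.start for r in ranges]
    return bisect.bisect_right(starts, key) - 1


def split(ranges: List[KeyRange], index: int, split_key: str) -> List[KeyRange]:
    """Split one shard's range at split_key; all other ranges are left untouched."""
    old = ranges[index]
    if split_key == old.start or not old.contains(split_key):
        raise ValueError("split key must fall strictly inside the shard's range")
    halves = [KeyRange(old.start, split_key), KeyRange(split_key, old.end)]
    return ranges[:index] + halves + ranges[index + 1:]


if __name__ == "__main__":
    ranges = [KeyRange("", "patient-5000"), KeyRange("patient-5000", "")]
    print(shard_for(ranges, "patient-1234"))  # 0
    grown = split(ranges, 1, "patient-7500")
    print(shard_for(grown, "patient-7000"), shard_for(grown, "patient-8000"))  # 1 2
```

Because each new shard covers one contiguous key range, it could presumably be populated with a single HBase range scan over `[start, end)`, which is where efficient key range scans pay off.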
Coordinate With ZooKeeper
|-- Index name
|-- Version
|-- Solr Schema/Config
|-- Table Name + Connection Info
|-- Shard Number
|-- Shard Boundary Info
|-- Replica Number
|-- Ephemeral Claim
|-- Solr Connection Info
|-- Ephemeral Online
Custom Core Admin
Works with ZooKeeper for the claim process
Creates the Solr core after a successful claim
Controls pulling data from HBase
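The claim process itself is only shown in images, so the following is a guess at the mechanism based on the "Ephemeral Claim" and "Ephemeral Online" entries in the ZooKeeper layout: each Solr node races to create an ephemeral claim node for a (shard, replica) slot, and ZooKeeper's create-if-absent guarantee means exactly one node wins. `FakeZooKeeper`, `slot_path`, `claim_slot`, and `bring_online` are hypothetical names; the in-memory fake stands in for a real ZooKeeper client, uses the host name as its session ID, and skips parent-node creation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


class NodeExistsError(Exception):
    """Raised when a path already exists, mirroring ZooKeeper's create semantics."""


@dataclass
class FakeZooKeeper:
    """In-memory stand-in: path -> (data, owning session, or None if persistent)."""
    nodes: Dict[str, Tuple[str, Optional[str]]] = field(default_factory=dict)

    def create(self, path: str, data: str, session: Optional[str] = None) -> None:
        # Create-if-absent: exactly one caller can win a given path.
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = (data, session)

    def expire_session(self, session: str) -> None:
        # Ephemeral nodes disappear when the session that created them ends.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}


def slot_path(index: str, version: int, shard: int, replica: int) -> str:
    # Hypothetical path layout loosely following the slide's tree.
    return f"/{index}/v{version}/shard-{shard}/replica-{replica}"


def claim_slot(zk: FakeZooKeeper, index: str, version: int,
               shards: int, replicas: int, host: str) -> Optional[Tuple[int, int]]:
    """Walk the (shard, replica) slots in order; return the first one this host wins."""
    for shard in range(shards):
        for replica in range(replicas):
            try:
                zk.create(slot_path(index, version, shard, replica) + "/claim",
                          data=host, session=host)
            except NodeExistsError:
                continue  # another Solr node already holds this slot
            return shard, replica
    return None  # every slot is taken


def bring_online(zk: FakeZooKeeper, index: str, version: int,
                 slot: Tuple[int, int], host: str) -> None:
    shard, replica = slot
    # Placeholder: the real core admin would create the Solr core here and load
    # the shard's key range from HBase before advertising the replica as online.
    zk.create(slot_path(index, version, shard, replica) + "/online",
              data=host, session=host)
```

When a node's session expires, its ephemeral claim disappears and the next node to run `claim_slot` picks up the freed slot, which is one way to get the automated failover the deck asks for.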
Claim Process
Image: http://bit.ly/Or317R
Queries
• Client inspects ZooKeeper
• Finds online nodes
  o Only for the keyspace it cares about
  o Issues distributed queries if necessary
• Balances in the Client
• Retries if queries fail
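A client-side routing loop along these lines might look like the sketch below. It assumes the client has already read a snapshot of each shard's key range and online replicas out of ZooKeeper; the `OnlineView` shape, the `send` callback, and the function names are hypothetical. It queries only the shards overlapping the key range it needs, shuffles replicas to spread load, and falls back to another replica when one fails.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical snapshot of the ZooKeeper "online" entries: shard -> (start, end,
# hosts serving a replica), covering keys start <= key < end ("" end = unbounded).
OnlineView = Dict[int, Tuple[str, str, List[str]]]


def shards_for_range(view: OnlineView, lo: str, hi: str) -> List[int]:
    """Shards whose key range overlaps [lo, hi)."""
    return [
        shard
        for shard, (start, end, _) in sorted(view.items())
        if start < hi and (end == "" or lo < end)
    ]


def query_shard(view: OnlineView, shard: int,
                send: Callable[[str], str], rng=random) -> str:
    """Send to a random online replica; on failure, retry the remaining ones."""
    hosts = list(view[shard][2])
    rng.shuffle(hosts)  # balances load across replicas in the client
    last_error = None
    for host in hosts:
        try:
            return send(host)
        except ConnectionError as err:
            last_error = err
    raise RuntimeError(f"no replica of shard {shard} answered") from last_error


def query(view: OnlineView, lo: str, hi: str, send: Callable[[str], str]) -> List[str]:
    """Query only the shards covering [lo, hi); a single shard means no fan-out."""
    return [query_shard(view, shard, send) for shard in shards_for_range(view, lo, hi)]
```

If row keys are grouped by patient (an assumption), a query scoped to one patient's key range maps to a single shard, so each query touches one node instead of every shard in the cluster.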
End Thoughts
• Keep things simple
• Disconnect your stages
• Keep your touchpoints at a minimum
• Organize your data around your queries
• Use what you’re good at
CONTACT
Ben Brown
http://linkd.in/ZZIBK4
@b_brown
ENGINEERING BLOG
https://engineering.cerner.com/
WE’RE HIRING!
http://www.cerner.com/About_Cerner/Careers/
Bonus Slides!