Brahe - Flexible Indexing At Scale
Ben Brown, Software Architect, Cerner Corporation

DESCRIPTION

Presented by Ben Brown, Software Architect, Cerner Corporation. Our team made its first foray into Solr by building Chart Search, an offering on top of Cerner's primary EMR that makes searching a patient's chart smarter and easier. After bringing on over 100 client hospitals and indexing many tens of billions of clinical documents and discrete results, we've (thankfully) learned a couple of things. The traditional approach of hashing document IDs over many shards, with no easily accessible source of truth, doesn't make for a flexible index. Learn the finer points of the strategy in which we shifted our source of truth to HBase: how we deploy new indexes with the click of a button, expand the number of shards in an existing index on the fly, and several other fancy features we enabled.

Page 1: Brahe   mass scale flexible indexing

Brahe - Flexible Indexing At Scale

Ben Brown Software Architect, Cerner Corporation

Page 2: Brahe   mass scale flexible indexing

Who I Am

• Ben Brown

Software Architect

• Cerner

Healthcare IT Company

• Semantic Solutions

Team of 10

Search Services

Fun Stuff

NLP, Medical Ontologies, ML

Page 3: Brahe   mass scale flexible indexing

Chart Search

Taking This

Photo: http://bit.ly/Y7kTJt

Page 4: Brahe   mass scale flexible indexing

Chart Search

Turning it into this

Page 5: Brahe   mass scale flexible indexing

Chart Search Does

• Faceting

• NLP

• Semantic Concept Markup

Makes for a heavy record

(Especially on Solr 1.4)

Page 6: Brahe   mass scale flexible indexing

Where We Started

Started Major Engineering in 2009

IBM Dev Works: http://ibm.co/14ZrtqX

Page 8: Brahe   mass scale flexible indexing

Scale

• Clusters partitioned by client

• Raw and processed data in HDFS

• All processing & indexing done through MapReduce

Page 9: Brahe   mass scale flexible indexing

Shard Size

Limiting Factor ~26 Million Discrete Results Per Shard

Average of 35 Shards Per Client

Range 5 to 140

Page 11: Brahe   mass scale flexible indexing

Query Touch Points

One User Action ~ 4 Queries

35 Shards - 432 Touch Points

140 Shards - 1692 Touch Points

• Works, but not efficient

• Chance for variance killing performance

• Failure is a massive config headache
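
The slide does not say how the touch point figures are derived. Both of them happen to match 4 queries × 3 × (shards + 1), so the sketch below uses that formula. What the factor of 3 and the extra 1 stand for (for example, three hops per node and one coordinating node) is a guess, and the function and parameter names are hypothetical.

```python
def touch_points(shards: int, queries_per_action: int = 4, hops_per_node: int = 3) -> int:
    """Touch points for one user action under the assumed formula.

    Assumption (not stated on the slide): every query contacts each shard
    plus one extra node, and each contact costs `hops_per_node` touch points.
    """
    return queries_per_action * hops_per_node * (shards + 1)


for shards in (35, 140):
    print(f"{shards} shards -> {touch_points(shards)} touch points")
```

Under this reading the count grows linearly with shard count, which is why the largest clients are the ones most exposed to variance and failure.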

Page 12: Brahe   mass scale flexible indexing

Growth

• Hashed ID does not play well with resizing

• Deploy Again

• Reindex Everything

Document Hash modulo Shard Count

Doc One: Hash(abc123) = 15

Doc Two: Hash(efg456) = 8

Doc Three: Hash(hij789) = 7

3 Shards

Doc One -> Shard 0

Doc Two -> Shard 2

Doc Three -> Shard 1

4 Shards

Doc One -> Shard 3

Doc Two -> Shard 0

Doc Three -> Shard 3
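
The example above can be reproduced in a few lines. This is a minimal sketch that reuses the hash values from the slide as plain integers; the `assign` helper is hypothetical and only illustrates the modulo rule.

```python
# Hash values taken from the slide; the real hash function is not shown there.
doc_hashes = {"Doc One": 15, "Doc Two": 8, "Doc Three": 7}


def assign(hashes: dict, shard_count: int) -> dict:
    """Place each document on shard `hash % shard_count`."""
    return {doc: h % shard_count for doc, h in hashes.items()}


three_shards = assign(doc_hashes, 3)
four_shards = assign(doc_hashes, 4)

# Documents whose shard changes when going from 3 to 4 shards.
moved = [doc for doc in doc_hashes if three_shards[doc] != four_shards[doc]]

print(three_shards)
print(four_shards)
print("moved:", moved)
```

In this example every document lands on a different shard after the resize, which is why growing the index means reindexing everything.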

Page 13: Brahe   mass scale flexible indexing

We Have a Problem

Painful Growth

Lots of Deploys

Variance Risk

Image: http://bit.ly/Y7oBD6

Page 14: Brahe   mass scale flexible indexing

What Would Be Better?

Load Balance at the Client

Automated Failover

Easy Deployments

Simplified Splitting

Minimized Touch Points

Disconnected Stages

Page 15: Brahe   mass scale flexible indexing

Solution

Shift Master to HBase

Image: http://bit.ly/ZXO2na

Page 16: Brahe   mass scale flexible indexing

Why HBase?

Lexically organized keys

Efficient key range scans

Efficient time based scans

We're pretty good at operating it
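
To illustrate why lexically ordered keys help with resizing, here is a small sketch of range-based shard assignment. The key format, the boundary values, and the `shard_for` helper are all hypothetical; the slides do not show Brahe's actual row keys or boundary scheme.

```python
from bisect import bisect_right


def shard_for(key: str, boundaries: list) -> int:
    """Return the shard index for `key`.

    `boundaries` is a sorted list of split keys. Shard 0 holds keys below the
    first boundary, and shard i holds keys from boundaries[i - 1] (inclusive)
    up to boundaries[i] (exclusive).
    """
    return bisect_right(boundaries, key)


keys = ["patient-0003", "patient-0150", "patient-0420", "patient-0777", "patient-0901"]

before = ["patient-0300", "patient-0600"]                  # 3 shards
after = ["patient-0300", "patient-0600", "patient-0800"]   # last shard split in two

moved = [k for k in keys if shard_for(k, before) != shard_for(k, after)]
print("moved:", moved)
```

Only keys inside the range that was split change shards, and those keys can be read back with a single key range scan rather than a full reindex.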

Page 17: Brahe   mass scale flexible indexing

Coordinate With ZooKeeper

|-- Index name
    |-- Version
        |-- Solr Schema/Config
        |-- Table Name + Connection Info
        |-- Shard Number
            |-- Shard Boundary Info
            |-- Replica Number
                |-- Ephemeral Claim
                |-- Solr Connection Info
                |-- Ephemeral Online
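
As a rough sketch of how such a layout could be addressed, the function below builds znode paths, assuming the entries nest as index, version, shard, replica. The root path, node names, and the `znode_paths` helper are hypothetical and are not taken from Brahe itself.

```python
def znode_paths(index: str, version: str, replicas_per_shard: dict, root: str = "/indexes") -> list:
    """List the znode paths for one index version.

    `replicas_per_shard` maps a shard number to its replica count.
    All node names below are illustrative assumptions.
    """
    base = f"{root}/{index}/{version}"
    paths = [f"{base}/config", f"{base}/table"]
    for shard, replica_count in sorted(replicas_per_shard.items()):
        shard_base = f"{base}/shard-{shard}"
        paths.append(f"{shard_base}/boundaries")
        for replica in range(replica_count):
            replica_base = f"{shard_base}/replica-{replica}"
            paths.append(f"{replica_base}/claim")   # ephemeral: held by the claiming host
            paths.append(f"{replica_base}/solr")    # Solr connection info
            paths.append(f"{replica_base}/online")  # ephemeral: present while serving
    return paths


paths = znode_paths("chart-search", "v2", {0: 2, 1: 1})
for p in paths:
    print(p)
```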

Page 18: Brahe   mass scale flexible indexing

Custom Core Admin

Work with ZooKeeper for claim process

Creates solr core after claims

Controls pulling data from HBase
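
The claim step can be sketched with an in-memory stand-in for ZooKeeper. The `FakeZooKeeper` class, the paths, and the host names are hypothetical, and the ordering of steps is an assumption based on the bullets above; a real deployment would use ephemeral znodes, which disappear when the owning session ends.

```python
class FakeZooKeeper:
    """In-memory stand-in that mimics create-if-absent ephemeral nodes."""

    def __init__(self):
        self.nodes = {}  # path -> owning host

    def create_ephemeral(self, path: str, owner: str) -> bool:
        """Create the node unless it already exists; report success."""
        if path in self.nodes:
            return False
        self.nodes[path] = owner
        return True

    def expire_session(self, owner: str) -> None:
        """Drop every node owned by `owner`, as a session expiry would."""
        self.nodes = {p: o for p, o in self.nodes.items() if o != owner}


def claim_first_free(zk: FakeZooKeeper, replica_paths: list, host: str):
    """Claim the first unclaimed replica slot, or return None if all are taken."""
    for path in replica_paths:
        if zk.create_ephemeral(f"{path}/claim", host):
            # Assumed next steps: create the Solr core, pull the shard's key
            # range from HBase, then create the "online" node.
            return path
    return None


zk = FakeZooKeeper()
slots = ["/idx/v1/shard-0/replica-0", "/idx/v1/shard-1/replica-0"]

first = claim_first_free(zk, slots, "solr-a")
second = claim_first_free(zk, slots, "solr-b")
third = claim_first_free(zk, slots, "solr-c")   # nothing left to claim

zk.expire_session("solr-a")                     # solr-a dies, its claim vanishes
retry = claim_first_free(zk, slots, "solr-c")   # a spare host picks up the slot
```

Because a claim disappears with its owner, a spare host can take over a failed replica without any manual configuration change.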

Page 19: Brahe   mass scale flexible indexing

Claim Process

Image: http://bit.ly/Or317R

Page 51: Brahe   mass scale flexible indexing

Queries

• Client inspects ZooKeeper

• Finds online nodes

o Only for the keyspace it cares about

o Issues distributed queries if necessary

• Balances in the Client

• Retries if queries fail
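
The balancing and retry behaviour can be sketched as follows. The replica URLs, the `send` callback, and the `query_shard` helper are hypothetical; the list of online replicas stands in for what a client would read from ZooKeeper.

```python
import random


def query_shard(send, replicas, rng=random):
    """Try a shard's online replicas in random order until one answers."""
    order = list(replicas)
    rng.shuffle(order)  # client-side load balancing
    errors = []
    for url in order:
        try:
            return send(url)
        except ConnectionError as exc:  # retry on the next replica
            errors.append((url, str(exc)))
    raise RuntimeError(f"all replicas failed: {errors}")


# Hypothetical "online" replicas for one shard.
online = ["http://solr-a:8983/solr/s0", "http://solr-b:8983/solr/s0"]


def fake_send(url):
    """Stand-in for an HTTP query; solr-a is simulated as down."""
    if "solr-a" in url:
        raise ConnectionError("solr-a is down")
    return f"results from {url}"


result = query_shard(fake_send, online)
print(result)
```

Whichever replica is tried first, the query ends up answered by the healthy one, so a single failed node costs a retry rather than a configuration change.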

Page 52: Brahe   mass scale flexible indexing

End Thoughts

• Keep things simple

• Disconnect your stages

• Keep your touch points to a minimum

• Organize your data around your queries

• Use what you’re good at

Page 53: Brahe   mass scale flexible indexing

CONTACT

Ben Brown

http://linkd.in/ZZIBK4

@b_brown

ENGINEERING BLOG

https://engineering.cerner.com/

WE’RE HIRING!

http://www.cerner.com/About_Cerner/Careers/

Page 54: Brahe   mass scale flexible indexing

Bonus Slides!
