Transcript
Page 1: Brahe   mass scale flexible indexing

Brahe - Flexible Indexing At Scale

Ben Brown, Software Architect, Cerner Corporation

Page 2: Brahe   mass scale flexible indexing

Who I Am

• Ben Brown

Software Architect

• Cerner

Healthcare IT Company

• Semantic Solutions

Team of 10

Search Services

Fun Stuff

NLP, Medical Ontologies, ML

Page 3: Brahe   mass scale flexible indexing

Chart Search

Taking This

Photo: http://bit.ly/Y7kTJt

Page 4: Brahe   mass scale flexible indexing

Chart Search

Turning it into this

Page 5: Brahe   mass scale flexible indexing

Chart Search Does

• Faceting

• NLP

• Semantic Concept Markup

Makes for a heavy record

(Especially on Solr 1.4)

Page 6: Brahe   mass scale flexible indexing

Where We Started

Started Major Engineering in 2009

IBM Dev Works: http://ibm.co/14ZrtqX

Page 8: Brahe   mass scale flexible indexing

Scale

• Clusters partitioned by client

• Raw and processed data in HDFS

• All processing & indexing done through MapReduce

Page 9: Brahe   mass scale flexible indexing

Shard Size

Limiting Factor ~26 Million Discrete Results Per Shard

Average of 35 Shards Per Client

Range 5 to 140

Page 11: Brahe   mass scale flexible indexing

Query Touch Points

One User Action ~ 4 Queries

35 Shards - 432 Touch Points

140 Shards - 1692 Touch Points

• Works, but not efficient

• Variance at any one touch point can kill performance

• Failure is a massive config headache
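
The slide does not say how the touch point counts are derived. Both figures happen to match 4 x 3 x (shards + 1), so the sketch below reproduces them under that reading. The factor of 3 per node and the extra "+ 1" node are inferences from the numbers, not something stated in the talk, and the function and parameter names are hypothetical.

```python
def touch_points(shards, queries_per_action=4, hops_per_node=3):
    """Estimate touch points for one user action.

    Assumption (inferred from the slide's numbers only): each query
    involves every shard plus one extra node, and each of those nodes
    accounts for three touch points.
    """
    nodes_per_query = shards + 1
    return queries_per_action * hops_per_node * nodes_per_query


if __name__ == "__main__":
    for shard_count in (5, 35, 140):
        print(shard_count, "shards ->", touch_points(shard_count), "touch points")
```

Whatever the exact breakdown, the count grows linearly with the shard count, which is why the largest clients are the ones most exposed to a single slow or failed node.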

Page 12: Brahe   mass scale flexible indexing

Growth

• Hashed ID does not play well with resizing

• Deploy Again

• Reindex Everything

Document Hash modulo Shard Count

Doc One: Hash(abc123) = 15

Doc Two: Hash(efg456) = 8

Doc Three: Hash(hij789) = 7

3 Shards

Doc One -> Shard 0

Doc Two -> Shard 2

Doc Three -> Shard 1

4 Shards

Doc One -> Shard 3

Doc Two -> Shard 0

Doc Three -> Shard 3
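
The resizing problem can be replayed in a few lines. This is a minimal sketch that uses the three hash values from the slide; the function names are hypothetical and the extra range of hash values at the end is only there to show the general trend.

```python
# Hash values taken from the slide.
HASHES = {"Doc One": 15, "Doc Two": 8, "Doc Three": 7}


def placement(shard_count, hashes=HASHES):
    """Map each document to a shard using hash modulo shard count."""
    return {doc: h % shard_count for doc, h in hashes.items()}


def moved(old_count, new_count, hashes=HASHES):
    """Return the documents whose shard changes when the cluster is resized."""
    old = placement(old_count, hashes)
    new = placement(new_count, hashes)
    return sorted(doc for doc in hashes if old[doc] != new[doc])


def fraction_moved(old_count, new_count, hash_values):
    """Fraction of hash values that land on a different shard after resizing."""
    changed = sum(1 for h in hash_values if h % old_count != h % new_count)
    return changed / len(hash_values)


if __name__ == "__main__":
    print(placement(3))
    print(placement(4))
    print(moved(3, 4))
    print(fraction_moved(3, 4, range(1200)))
```

Going from 3 to 4 shards relocates all three example documents, and three quarters of an evenly spread hash range, which is what forces the full reindex.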

Page 13: Brahe   mass scale flexible indexing

We Have a Problem

Painful Growth

Lots of Deploys

Variance Risk

Image: http://bit.ly/Y7oBD6

Page 14: Brahe   mass scale flexible indexing

What Would Be Better?

Load Balance at the Client

Automated Failover

Easy Deployments

Simplified Splitting

Minimized Touch Points

Disconnected Stages

Page 15: Brahe   mass scale flexible indexing

Solution

Shift Master to HBase

Image: http://bit.ly/ZXO2na

Page 16: Brahe   mass scale flexible indexing

Why HBase?

Lexically organized keys

Efficient key range scans

Efficient time based scans

We're pretty good at operating it
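
Lexically ordered keys make it possible to define a shard as a key range rather than a hash bucket. The sketch below is an illustration of that idea, not the Brahe implementation: the boundary values, the document keys, and the function names are all hypothetical.

```python
import bisect


def key_range(key, boundaries):
    """Return the (start, end) key range that owns `key`.

    `boundaries` is a sorted list of split points. The first range starts
    at the empty string and the last range has no upper bound (None).
    """
    i = bisect.bisect_right(boundaries, key)
    start = boundaries[i - 1] if i > 0 else ""
    end = boundaries[i] if i < len(boundaries) else None
    return (start, end)


def split(boundaries, new_boundary):
    """Add one split point; only the range containing it is divided."""
    return sorted(set(boundaries) | {new_boundary})


if __name__ == "__main__":
    before = ["h", "p"]
    after = split(before, "c")
    for key in ("abc123", "efg456", "hij789"):
        print(key, key_range(key, before), "->", key_range(key, after))
```

In this sketch, splitting the first range at "c" changes the owner of keys below "h" only. "hij789" stays in the range ("h", "p"), so the shards outside the split are left alone, in contrast to the modulo scheme where a resize touches nearly every document.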

Page 17: Brahe   mass scale flexible indexing

Coordinate With ZooKeeper

|-- Index name

|-- Version

|-- Solr Schema/Config

|-- Table Name + Connection Info

|-- Shard Number

|-- Shard Boundary Info

|-- Replica Number

|-- Ephemeral Claim

|-- Solr Connection Info

|-- Ephemeral Online

Page 18: Brahe   mass scale flexible indexing

Custom Core Admin

Work with ZooKeeper for claim process

Creates solr core after claims

Controls pulling data from HBase
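
The claim diagrams did not survive in this transcript, so the following is only a sketch of how an ephemeral claim could work. It relies on two real ZooKeeper behaviours: creating a node that already exists fails, and ephemeral nodes disappear when the owning session ends. Everything else is an assumption: `FakeZooKeeper` is an in-memory stand-in rather than a real client, and the path layout and names are hypothetical.

```python
class FakeZooKeeper:
    """In-memory stand-in that models only create-if-absent and session expiry."""

    def __init__(self):
        self.nodes = {}  # path -> id of the session that owns the ephemeral node

    def create_ephemeral(self, path, session):
        """Create the node and return True, or return False if it already exists."""
        if path in self.nodes:
            return False
        self.nodes[path] = session
        return True

    def expire_session(self, session):
        """Drop every ephemeral node owned by a session that has gone away."""
        self.nodes = {p: s for p, s in self.nodes.items() if s != session}


def claim_replica(zk, session, index, version, shards, replicas):
    """Claim the first unclaimed (shard, replica) slot, or return None.

    The path layout below is hypothetical, loosely following the tree
    on the "Coordinate With ZooKeeper" slide.
    """
    for shard in range(shards):
        for replica in range(replicas):
            path = f"/{index}/{version}/shard-{shard}/replica-{replica}/claim"
            if zk.create_ephemeral(path, session):
                return (shard, replica)
    return None


if __name__ == "__main__":
    zk = FakeZooKeeper()
    print(claim_replica(zk, "node-a", "chart", "v1", shards=2, replicas=1))
    print(claim_replica(zk, "node-b", "chart", "v1", shards=2, replicas=1))
    print(claim_replica(zk, "node-c", "chart", "v1", shards=2, replicas=1))
    zk.expire_session("node-a")  # node-a dies, its claim vanishes
    print(claim_replica(zk, "node-c", "chart", "v1", shards=2, replicas=1))
```

Because the claim is tied to the session, a failed node frees its slot without any configuration change, and a spare node can pick it up on its next attempt.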

Page 19: Brahe   mass scale flexible indexing

Claim Process

Image: http://bit.ly/Or317R

Page 50: Brahe   mass scale flexible indexing

Coordinate With ZooKeeper

|-- Index name

|-- Version

|-- Solr Schema/Config

|-- Table Name + Connection Info

|-- Shard Number

|-- Shard Boundary Info

|-- Replica Number

|-- Ephemeral Claim

|-- Solr Connection Info

|-- Ephemeral Online

Page 51: Brahe   mass scale flexible indexing

Queries

• Client inspects ZooKeeper

• Finds online nodes

o Only for the keyspace it cares about

o Issues distributed queries if necessary

• Balances in the Client

• Retries if queries fail
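
The query steps above can be sketched as client-side code. This is a minimal illustration under stated assumptions: the list of online replicas is assumed to have been read from ZooKeeper already, `send` stands in for whatever issues the Solr request, and all names and URLs are hypothetical.

```python
import random


def query_with_retry(online_replicas, send, rng=random, max_attempts=3):
    """Pick replicas of one shard in random order and retry on failure.

    online_replicas: URLs of replicas currently marked online (assumed to
                     come from ZooKeeper; not fetched here).
    send:            callable taking a URL; returns a result or raises
                     ConnectionError.
    """
    candidates = list(online_replicas)
    rng.shuffle(candidates)  # simple client-side load balancing
    last_error = None
    for url in candidates[:max_attempts]:
        try:
            return send(url)
        except ConnectionError as err:
            last_error = err  # try the next online replica
    raise RuntimeError("no online replica answered") from last_error


if __name__ == "__main__":
    def fake_send(url):
        # Hypothetical transport: one replica is down, the other answers.
        if url == "http://solr-1.example/shard0":
            raise ConnectionError("replica down")
        return {"served_by": url, "hits": 42}

    replicas = ["http://solr-1.example/shard0", "http://solr-2.example/shard0"]
    print(query_with_retry(replicas, fake_send))
```

Shuffling per request spreads load without a separate load balancer, and the retry loop covers the window between a node failing and its ephemeral online marker disappearing.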

Page 52: Brahe   mass scale flexible indexing

End Thoughts

• Keep things simple

• Disconnect your stages

• Keep your touch points at a minimum

• Organize your data around your queries

• Use what you’re good at

Page 53: Brahe   mass scale flexible indexing

CONTACT

Ben Brown

http://linkd.in/ZZIBK4

@b_brown

ENGINEERING BLOG

https://engineering.cerner.com/

WE’RE HIRING!

http://www.cerner.com/About_Cerner/Careers/

Page 54: Brahe   mass scale flexible indexing

Bonus Slides!
