Brahe mass scale flexible indexing

Brahe - Flexible Indexing At Scale

Ben Brown, Software Architect, Cerner Corporation

Who I Am

• Ben Brown

Software Architect

• Cerner

Healthcare IT Company

• Semantic Solutions

Team of 10

Search Services

Fun Stuff

NLP, Medical Ontologies, ML

Chart Search

Taking This

Photo: http://bit.ly/Y7kTJt

Chart Search

Turning it into this

Chart Search Does

• Faceting

• NLP

• Semantic Concept Markup

Makes for a heavy record

(Especially on Solr 1.4)

Where We Started

Started Major Engineering in 2009

IBM Dev Works: http://ibm.co/14ZrtqX

Scale

• Clusters partitioned by client

• Raw and processed data in HDFS

• All processing & indexing done through MapReduce

Shard Size

Limiting Factor ~26 Million Discrete Results Per Shard

Average of 35 Shards Per Client

Range 5 to 140

Query Touch Points

One User Action ~ 4 Queries

35 Shards - 432 Touch Points

140 Shards - 1692 Touch Points

• Works, but not efficient

• Chance for variance killing performance

• Failure is a massive config headache
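The slides do not say how the touch point counts are derived. As a rough check only, both figures happen to fit 4 queries x 3 x (shards + 1); the factor of 3 and the "+ 1" are fitted to the two numbers above and are an assumption, not something stated in the talk.

```python
def touch_points(shards, queries_per_action=4, factor=3):
    """Fitted estimate (an assumption): queries * factor * (shards + 1)."""
    return queries_per_action * factor * (shards + 1)

# The two data points given on the slide.
for shards in (35, 140):
    print(shards, "shards ->", touch_points(shards), "touch points")
```

Either way, the count grows linearly with shard count, so every added shard adds touch points to every user action.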

Growth

• Hashed ID does not play well with resizing

• Deploy Again

• Reindex Everything

Document Hash modulo Shard Count

Doc One: Hash(abc123) = 15

Doc Two: Hash(efg456) = 8

Doc Three: Hash(hij789) = 7

3 Shards

Doc One -> Shard 0

Doc Two -> Shard 2

Doc Three -> Shard 1

4 Shards

Doc One -> Shard 3

Doc Two -> Shard 0

Doc Three -> Shard 3
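The example above can be reproduced in a few lines. This is a minimal sketch using the hash values from the slide; the function names are made up for illustration.

```python
# Hash values taken from the slide.
DOC_HASHES = {"Doc One": 15, "Doc Two": 8, "Doc Three": 7}

def shard_for(doc_hash, shard_count):
    """Hash-modulo placement: the shard is the remainder of the hash."""
    return doc_hash % shard_count

def placements(shard_count):
    """Map every document to its shard for a given shard count."""
    return {doc: shard_for(h, shard_count) for doc, h in DOC_HASHES.items()}

def moved(old_count, new_count):
    """Documents whose shard changes when the shard count changes."""
    before, after = placements(old_count), placements(new_count)
    return [doc for doc in DOC_HASHES if before[doc] != after[doc]]

print(placements(3))
print(placements(4))
print(moved(3, 4))
```

Going from 3 to 4 shards relocates all three documents, which is why adding a shard meant reindexing everything.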

We Have a Problem

Painful Growth

Lots of Deploys

Variance Risk

Image: http://bit.ly/Y7oBD6

What Would Be Better?

Load Balance at the Client

Automated Failover

Easy Deployments

Simplified Splitting

Minimized Touch Points

Disconnected Stages

Solution

Shift Master to HBase

Image: http://bit.ly/ZXO2na

Why HBase?

Lexically organized keys

Efficient key range scans

Efficient time based scans

We're pretty good at operating it
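Sorted keys are what make splitting simpler than hash-modulo placement. The sketch below is plain Python, not the HBase API: a sorted list of range start keys stands in for shard boundaries, and the row keys and boundary values are hypothetical.

```python
from bisect import bisect_right

def owning_range(key, starts):
    """Return the start key of the range that owns `key`.

    `starts` is a sorted list of range start keys; the first entry is ""
    so that the first range covers the lowest keys.
    """
    return starts[bisect_right(starts, key) - 1]

def split(starts, new_start):
    """Split one range by adding a new start key inside it."""
    return sorted(starts + [new_start])

keys = ["patient-0007", "patient-0420", "patient-0700", "patient-0999"]
old_starts = ["", "patient-0300", "patient-0600"]
new_starts = split(old_starts, "patient-0800")

# Only keys at or above the new boundary change owner.
relocated = [k for k in keys
             if owning_range(k, old_starts) != owning_range(k, new_starts)]
print(relocated)
```

Only keys inside the split range move; every other range is untouched, so growth does not force a full reindex.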

Coordinate With ZooKeeper

|-- Index name

|-- Version

|-- Solr Schema/Config

|-- Table Name + Connection Info

|-- Shard Number

|-- Shard Boundary Info

|-- Replica Number

|-- Ephemeral Claim

|-- Solr Connection Info

|-- Ephemeral Online
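One way to read the tree above is index > version > shard > replica, with the ephemeral nodes living under each replica. That nesting is an assumption (the indentation was lost), and every path segment name below is hypothetical.

```python
def znode_paths(index, version, shard, replica):
    """Build hypothetical znode paths for one replica of one shard."""
    version_path = f"/{index}/{version}"
    shard_path = f"{version_path}/shard-{shard}"
    replica_path = f"{shard_path}/replica-{replica}"
    return {
        "solr_config": f"{version_path}/solr-config",   # Solr schema/config
        "table": f"{version_path}/table",               # table name + connection info
        "boundaries": f"{shard_path}/boundaries",       # shard boundary info
        "claim": f"{replica_path}/claim",               # ephemeral claim
        "connection": f"{replica_path}/connection",     # Solr connection info
        "online": f"{replica_path}/online",             # ephemeral online marker
    }

for name, path in znode_paths("chart", "v2", 3, 1).items():
    print(f"{name:12} {path}")
```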

Custom Core Admin

Works with ZooKeeper for the claim process

Creates the Solr core after a claim

Controls pulling data from HBase
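The claim steps themselves are shown only as images, so the following is a guess at the general pattern rather than the actual implementation: each server tries to create an ephemeral claim node, the first creator wins, and the claim disappears when that server's session ends. `FakeZooKeeper` is an in-memory stand-in written for this sketch, not a real client library.

```python
class FakeZooKeeper:
    """In-memory stand-in for create-if-absent ephemeral nodes."""

    def __init__(self):
        self.nodes = {}  # path -> owning session

    def create_ephemeral(self, path, session):
        """Create the node unless it exists; return True on success."""
        if path in self.nodes:
            return False
        self.nodes[path] = session
        return True

    def close_session(self, session):
        """Ephemeral nodes vanish when their owning session ends."""
        self.nodes = {p: s for p, s in self.nodes.items() if s != session}

def claim_first_free(zk, session, claim_paths):
    """Claim the first unclaimed replica slot, or return None if all are taken."""
    for path in claim_paths:
        if zk.create_ephemeral(path, session):
            return path
    return None

zk = FakeZooKeeper()
slots = ["/shard-0/replica-0/claim", "/shard-0/replica-1/claim"]
print(claim_first_free(zk, "server-a", slots))
print(claim_first_free(zk, "server-b", slots))
print(claim_first_free(zk, "server-c", slots))  # nothing left to claim
zk.close_session("server-a")                    # server-a fails
print(claim_first_free(zk, "server-c", slots))  # its slot is free again
```

Under this assumption, failover needs no config change: a spare server claims whatever slot a dead server released.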

Claim Process

Image: http://bit.ly/Or317R

Queries

• Client inspects ZooKeeper

• Finds online nodes

o Only for the keyspace it cares about

o Issues distributed queries if necessary

• Balances in the Client

• Retries if queries fail
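The query flow above can be sketched as follows. This is an illustration under assumptions: the shard records stand in for what a client would read from ZooKeeper, `send` is a hypothetical callable that issues the Solr request, and the field names are invented.

```python
import random

def shards_for_range(shards, start, end):
    """Shards whose [start, end) key range overlaps the requested range."""
    return [s for s in shards if s["start"] < end and start < s["end"]]

def query_shard(shard, send, rng=random):
    """Try the shard's online replicas in random order until one answers."""
    replicas = list(shard["online"])
    rng.shuffle(replicas)  # client-side load balancing
    for replica in replicas:
        try:
            return send(replica)
        except ConnectionError:
            continue  # retry on the next online replica
    raise RuntimeError(f"no online replica answered for shard {shard['name']}")

def search(shards, start, end, send):
    """Query only the shards that cover the requested keyspace."""
    return [query_shard(s, send) for s in shards_for_range(shards, start, end)]

shards = [
    {"name": "s0", "start": "a", "end": "m", "online": ["solr1", "solr2"]},
    {"name": "s1", "start": "m", "end": "z", "online": ["solr3"]},
]

def send(replica):
    """Hypothetical transport: solr1 is down, the others echo their name."""
    if replica == "solr1":
        raise ConnectionError(replica)
    return replica

print(search(shards, "b", "c", send))
```

A query for keys between "b" and "c" touches a single shard, which is how organizing keys around queries keeps touch points low.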

End Thoughts

• Keep things simple

• Disconnect your stages

• Keep your touch points to a minimum

• Organize your data around your queries

• Use what you’re good at

CONTACT

Ben Brown

http://linkd.in/ZZIBK4

@b_brown

ENGINEERING BLOG

https://engineering.cerner.com/

WE’RE HIRING!

http://www.cerner.com/About_Cerner/Careers/

Bonus Slides!
