Tubular Labs - Using Elastic to Search Over 2.5B Videos

Using Elastic to search over 2.5B videos

Talk structure

● 4 steps to make user experience great again

● 4 patterns to simplify architecture and reduce costs

© 2016 Tubular Labs


Data size

● 2.5B documents

● Average doc size 2 KB, 4 TB total size

● 200M daily updates (~8% of the index)

● Constant indexing rate of 3k/s with spikes

● Querying rate 1-3 r/s (low concurrency)
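The figures above can be cross-checked with a little arithmetic. This is only a sanity-check sketch: the decimal units (1 KB = 1000 bytes) and the variable names are assumptions, not something stated on the slide.

```python
# Numbers copied from the slide; decimal units are an assumption.
DOCS = 2_500_000_000
AVG_DOC_BYTES = 2_000
TOTAL_BYTES = 4 * 10**12
DAILY_UPDATES = 200_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Size implied by the rounded average, and average implied by the stated total.
raw_size_tb = DOCS * AVG_DOC_BYTES / 1e12
implied_avg_bytes = TOTAL_BYTES / DOCS

# Share of the index touched per day and the mean update rate it implies.
updated_share = DAILY_UPDATES / DOCS
avg_updates_per_sec = DAILY_UPDATES / SECONDS_PER_DAY

print(f"size from 2 KB average: {raw_size_tb:.1f} TB")
print(f"average implied by 4 TB: {implied_avg_bytes:.0f} bytes")
print(f"daily updated share: {updated_share:.0%}")
print(f"mean update rate: {avg_updates_per_sec:.0f} docs/s")
```

The 2 KB average is presumably rounded, since 4 TB over 2.5B documents works out to about 1.6 KB. The mean update rate of roughly 2.3k docs/s is in the same range as the stated 3k/s indexing rate.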



Hardware

Before

● 52 x c3.4xlarge

● 128 shards

● 16 cores per node

● ~3 shards per node

● 832 cores, 16 TB SSD, 1.5 TB RAM

After (25% bigger)

● 26 x c3.8xlarge

● 416 shards

● 32 cores per node

● 16 shards per node

● 832 cores, 16 TB SSD, 1.5 TB RAM

Indexing

Optimize indexing

● Using bulk API

• 1 MB per batch (500 docs), should be 5k docs/s

• Recommended 5-15 MB

● Increasing refresh interval

• From 1 to 30 seconds

● Monitoring bulk.rejected

• Increased bulk.queueSize from 50 to 2000
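A minimal sketch of the batching idea, assuming documents arrive as dicts with an "id" key. The function name bulk_bodies, the index name "videos", and the default limits are illustrative assumptions. Only the newline-delimited action/source layout of the bulk API and the refresh_interval index setting come from Elasticsearch itself; older releases also expect a document type in the action line or the URL, which is omitted here.

```python
import json
from typing import Dict, Iterable, Iterator


def bulk_bodies(docs: Iterable[Dict], index: str,
                max_docs: int = 500, max_bytes: int = 1_000_000) -> Iterator[str]:
    """Yield bulk request bodies bounded by document count and byte size."""
    lines, size, count = [], 0, 0
    for doc in docs:
        action = json.dumps({"index": {"_index": index, "_id": str(doc["id"])}})
        source = json.dumps(doc)
        # Each line is followed by one newline in the final body.
        pair_size = len(action.encode()) + len(source.encode()) + 2
        if count and (count >= max_docs or size + pair_size > max_bytes):
            yield "\n".join(lines) + "\n"
            lines, size, count = [], 0, 0
        lines += [action, source]
        size += pair_size
        count += 1
    if lines:
        yield "\n".join(lines) + "\n"


# Body for PUT /<index>/_settings: refresh every 30 s instead of the 1 s default.
REFRESH_SETTINGS = {"index": {"refresh_interval": "30s"}}

# Hypothetical documents, only to show the batching behaviour.
sample_docs = [{"id": i, "title": "video"} for i in range(1200)]
batches = list(bulk_bodies(sample_docs, "videos"))
print(len(batches), "bulk requests")
```

Raising max_bytes toward the recommended 5-15 MB range means fewer, larger requests. Whether that helps depends on the cluster, so the rejected counter is still worth watching.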



Searching

Product view

● Summary

● Search results

● Term aggregations

Before optimization



Goal


Problem

• Slow queries

Goal

• From 15 to 5 seconds for the 95th percentile

• Seeking a 3x improvement

Understand hardware utilization



• Run the heaviest query

• No bottlenecks (CPU, disk IO, network)

• Thread pool search.size 25

• Max search.active is 3
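The two thread pool numbers above can be put side by side. This sketch assumes the cluster ran with the default sizing of the fixed search pool in Elasticsearch releases of that era, int(cores * 3 / 2) + 1, which gives 25 on a 16-core node. The function names are illustrative.

```python
def default_search_pool_size(cores: int) -> int:
    """Assumed default size of the search thread pool for a given core count."""
    return (cores * 3) // 2 + 1


def pool_utilization(active: int, size: int) -> float:
    """Share of search threads that are busy."""
    return active / size


pool_size = default_search_pool_size(16)   # c3.4xlarge: 16 cores per node
peak_utilization = pool_utilization(3, pool_size)

print(f"search pool size: {pool_size}")
print(f"peak utilization: {peak_utilization:.0%}")
```

With at most 3 of 25 search threads active, the heaviest query leaves most of each node idle, which is consistent with seeing no CPU, disk, or network bottleneck.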

CPU utilization



• Know your concurrency

Benchmarking # of shards



On a single 32-core node
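Why the shard count matters at low concurrency can be sketched with a simple model. It assumes each query is executed by one search thread per local shard and that nothing else is running. That is a simplification, not a measurement from the talk, and busy_core_share is a hypothetical helper.

```python
def busy_core_share(shards_per_node: int, cores_per_node: int) -> float:
    """Upper bound on the share of a node's cores a single query can occupy,
    assuming one search thread per local shard."""
    return min(shards_per_node, cores_per_node) / cores_per_node


before = busy_core_share(shards_per_node=3, cores_per_node=16)
after = busy_core_share(shards_per_node=16, cores_per_node=32)

print(f"before: {before:.0%} of cores per query")
print(f"after:  {after:.0%} of cores per query")
```

Under this model a single query could use about a fifth of a node before the change and about half of it afterwards. Actual gains depend on the query and data, which is why benchmarking on a real node is still needed.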

More CPU per request results



15s to 7.5s

Search & Aggregations



• Searching and sorting is fast

• 8 term aggregations are slow

Aggregation impact



Check facet usage



● Talk to your product manager

● Low product usage

● Remove networks and claims aggregations

● Replace facets with filters
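Replacing a facet with a filter might look roughly like this in the query DSL. The field names "title" and "network" and both function names are hypothetical. The sketch only contrasts a request that computes a terms aggregation with one that lets the user pick a value up front and applies it as a filter.

```python
from typing import Dict, Optional


def search_with_facet(text: str) -> Dict:
    """Request that computes a terms aggregation alongside the hits."""
    return {
        "query": {"match": {"title": text}},
        "aggs": {"networks": {"terms": {"field": "network"}}},
    }


def search_with_filter(text: str, network: Optional[str] = None) -> Dict:
    """Request with no aggregation; an optional value narrows the hits instead."""
    query: Dict = {"bool": {"must": [{"match": {"title": text}}]}}
    if network is not None:
        query["bool"]["filter"] = [{"term": {"network": network}}]
    return {"query": query}


faceted = search_with_facet("cats")
filtered = search_with_filter("cats", network="example-network")
print(filtered)
```

The filtered request skips the per-term counting entirely, so the rarely used facet no longer adds to every query.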

Removing two aggregations results



15s to 5.3s

Cardinality



● Reduce cardinality

● Going from 200M to 5M (channels to creators)

● Reducing # of topics from 5M to 500
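One way to reduce cardinality is to index a coarser field and aggregate on that instead. The lookup table, the field names, and with_creator below are all hypothetical. The sketch only shows several channels rolling up to one creator at indexing time.

```python
from typing import Dict, List

# Hypothetical lookup: several channels belong to the same creator.
CHANNEL_TO_CREATOR: Dict[str, str] = {
    "ch-1": "creator-a",
    "ch-2": "creator-a",
    "ch-3": "creator-b",
}


def with_creator(doc: Dict) -> Dict:
    """Return a copy of the document with the lower-cardinality field added."""
    enriched = dict(doc)
    enriched["creator_id"] = CHANNEL_TO_CREATOR[doc["channel_id"]]
    return enriched


videos: List[Dict] = [
    {"id": 1, "channel_id": "ch-1"},
    {"id": 2, "channel_id": "ch-2"},
    {"id": 3, "channel_id": "ch-3"},
]
docs = [with_creator(v) for v in videos]

distinct_channels = len({d["channel_id"] for d in docs})
distinct_creators = len({d["creator_id"] for d in docs})
print(distinct_channels, "channels ->", distinct_creators, "creators")
```

A terms aggregation on creator_id then has fewer distinct values to count than one on channel_id.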

Reducing cardinality results



15s to 4.4s

Split query and aggregations



● Searching and aggregating separately

● Using shard-level query cache

● Showing results in UI asynchronously
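The split can be sketched as two request bodies sent in parallel. The send callable stands in for whatever client performs the search and is an assumption, as are the function names and the page size. The aggregation request uses size 0 because, in releases of that era, the shard-level cache only stored results of requests that return no hits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def split_request(query: Dict, aggs: Dict, page_size: int = 20) -> Tuple[Dict, Dict]:
    """Build one body for hits only and one for aggregations only."""
    hits_body = {"query": query, "size": page_size}
    aggs_body = {"query": query, "aggs": aggs, "size": 0}
    return hits_body, aggs_body


def run_split(send: Callable[[Dict], Dict], query: Dict, aggs: Dict) -> Tuple[Dict, Dict]:
    """Send both requests concurrently and return (hits response, aggs response)."""
    hits_body, aggs_body = split_request(query, aggs)
    with ThreadPoolExecutor(max_workers=2) as pool:
        hits_future = pool.submit(send, hits_body)
        aggs_future = pool.submit(send, aggs_body)
        return hits_future.result(), aggs_future.result()


def fake_send(body: Dict) -> Dict:
    """Stand-in for a real search call; echoes what it was asked for."""
    return {"size": body["size"], "has_aggs": "aggs" in body}


hits, facets = run_split(fake_send, {"match_all": {}},
                         {"topics": {"terms": {"field": "topic"}}})
print(hits, facets)
```

In a UI the hits response can be rendered as soon as it arrives, with the aggregations filled in when the second response lands.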

Split query and aggregations results



15s to 4.0s

Performance gain



● From 15 to 4 seconds (<5 seconds)

● Overall improvement 3.7x

● What about costs?
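The cumulative effect of the four steps can be recomputed from the latencies quoted on the result slides. The step labels are paraphrased from the slide titles; only the numbers come from the talk.

```python
BASELINE_S = 15.0

# (step, latency in seconds after that step), as reported on the result slides.
STEPS = [
    ("more CPU per request", 7.5),
    ("remove two aggregations", 5.3),
    ("reduce cardinality", 4.4),
    ("split query and aggregations", 4.0),
]

speedups = {name: BASELINE_S / latency for name, latency in STEPS}

for name, latency in STEPS:
    print(f"{name}: {latency} s, {speedups[name]:.2f}x vs baseline")
```

The final step works out to 15 / 4 = 3.75x, which the slide reports as 3.7x.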

Architecture patterns

Part 2. Goals



● Reduce costs

● Improve reliability

● Simplify architecture

● Reduce variability in latency

Current flow



● Too many dependencies

● Expensive intermediate storage

Denormalization



● 90% of data is shared

● No extra calls from frontend
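Denormalization here means copying the shared data into every document that needs it. A minimal sketch, assuming hypothetical video and creator records. The field names and the denormalize helper are illustrative only.

```python
from typing import Dict


def denormalize(video: Dict, creator: Dict) -> Dict:
    """Embed the shared creator data in the video document."""
    doc = dict(video)
    doc["creator"] = dict(creator)
    return doc


# Hypothetical records, only to show the shape of the result.
creator = {"id": "creator-a", "name": "Example Creator", "country": "US"}
video = {"id": "v-1", "title": "Example video", "creator_id": "creator-a"}

doc = denormalize(video, creator)
print(doc)
```

The frontend can render a search hit from doc alone, at the cost of rewriting every affected document when the shared data changes.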

Partial updates with Update API (experimental)
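A partial update sends only the changed fields wrapped in a "doc" object. The sketch below builds the two-line bulk form of such an update. The index name, document id, and field are hypothetical, and older releases also expect a document type in the action line or the URL.

```python
import json
from typing import Dict


def partial_update_lines(index: str, doc_id: str, changed_fields: Dict) -> str:
    """Build the action line and partial document for one bulk update."""
    action = {"update": {"_index": index, "_id": doc_id}}
    body = {"doc": changed_fields}
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"


update_payload = partial_update_lines("videos", "v-1", {"views": 1200})
print(update_payload, end="")
```

Elasticsearch still fetches, merges, and reindexes the whole document internally on an update, so this mainly simplifies the indexing pipeline rather than removing indexing work.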



“Partial” updates with parent-child relations (experimental)
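With parent-child relations, frequently changing data lives in a small child document so the large parent does not have to be rewritten. The sketch uses the _parent mapping syntax of Elasticsearch releases of that era (later versions replaced it with a join field). The type names "video" and "video_stats" and both helper names are hypothetical.

```python
from typing import Dict


def child_mapping(child_type: str, parent_type: str) -> Dict:
    """Index-creation body declaring child_type as a child of parent_type."""
    return {"mappings": {child_type: {"_parent": {"type": parent_type}}}}


def has_child_query(child_type: str, child_query: Dict) -> Dict:
    """Search body that selects parents by a condition on their children."""
    return {"query": {"has_child": {"type": child_type, "query": child_query}}}


mapping = child_mapping("video_stats", "video")
popular = has_child_query("video_stats", {"range": {"views": {"gte": 1000}}})
print(mapping)
print(popular)
```

Updating the statistics then touches only the small child document, while queries pay the extra cost of the parent-child join.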



Split data by hot/full (idea for future)



● Cheaper hardware on full

● Shard allocation filtering
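Shard allocation filtering pins an index to nodes carrying a given attribute. A sketch of the index settings involved, assuming nodes are tagged with a custom attribute: the attribute name "box_type" and the values "hot" and "full" are assumptions, and how the attribute is declared in the node configuration differs between versions.

```python
from typing import Dict


def allocation_settings(tier: str, attribute: str = "box_type") -> Dict:
    """Body for PUT /<index>/_settings that requires nodes tagged with `tier`."""
    return {f"index.routing.allocation.require.{attribute}": tier}


hot_settings = allocation_settings("hot")
full_settings = allocation_settings("full")
print(hot_settings)
print(full_settings)
```

The hot index would then stay on the faster nodes while the full index is placed on the cheaper ones.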

Thank you
