DESCRIPTION
Presentation from the Elasticsearch Denver Meetup. Discusses scaling of Elasticsearch for Related Posts across WordPress.com and some of the big changes that were needed in order to scale for 23 million queries a day across 800 million documents.
Tuesday, February 25, 14
Greg Ichneumon Brown
http://gibrown.wordpress.com
@gregibrown
greg@automattic.com
Data Wrangler at Automattic
1 Billion Monthly Uniques
Elasticsearch Deployments
Internal Search - 216 Internal Blogs - 750k docs [3 GB]
Support Documents - KNN Link Prediction - 1.7m docs [14 GB]
Polldaddy - Word Clouds/Freq Response - 39m docs [9 GB]
WordPress.com VIP Search
- KFF.org - 18m docs [99 MB]
- NY Post - 600k docs [2.3 GB]
WordPress.com - ~800m docs [4 TB]
- Related Posts - 48 mil reqs/day
- search.wordpress.com - 3 mil reqs/day
Overview of Related Posts
Our “10X Improvements”
- Indexing
- Querying
Our Open Issues
Related Posts
Search within just the one blog
WordPress.com Total Elasticsearch Operations
Operation Ops/Day
Routed Queries 23 mil
Global Queries 2 mil
Docs Indexed 13 mil
Docs Updated 10 mil
Docs Deleted 2.5 mil
Delete By Query 250k
Global Cluster
DC1: 14 Data, 1 Master
DC2: 14 Data, 1 Master
DC3: 14 Data, 1 Master
Our Secret To Scaling
Routed Queries
All Posts for each Blog are on the same Shard
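A minimal sketch of why routing keeps a blog on one shard, assuming the blog ID is used as the routing key. The `crc32` hash is only a stand-in for Elasticsearch's internal routing hash, and the `blog-42` style keys are hypothetical.

```python
import zlib


def shard_for(routing_key: str, num_shards: int) -> int:
    """Map a routing key to a shard number in [0, num_shards)."""
    # crc32 is a stand-in; Elasticsearch uses its own hash internally.
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


NUM_SHARDS = 25  # shards per index, as on the Global Index slide

# Hypothetical (blog, post) pairs; the blog ID is the routing key.
posts = [("blog-42", "post-1"), ("blog-42", "post-2"), ("blog-7", "post-1")]
placement = {(blog, post): shard_for(blog, NUM_SHARDS) for blog, post in posts}
```

Because the shard depends only on the blog ID, a Related Posts query for one blog needs to touch a single shard instead of all of them.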
Global Index
7 Indices
10 mil Blogs per Index
25 Shards per Index
175 Shards Total
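The layout arithmetic can be sketched as follows. The slide does not say how blogs are assigned to indices; contiguous blog ID ranges are an assumption here, and the `global-N` index names are hypothetical.

```python
BLOGS_PER_INDEX = 10_000_000
NUM_INDICES = 7
SHARDS_PER_INDEX = 25
TOTAL_SHARDS = NUM_INDICES * SHARDS_PER_INDEX  # 7 * 25 = 175


def index_for_blog(blog_id: int) -> str:
    """Pick the index for a blog, assuming contiguous ID ranges per index."""
    index_number = blog_id // BLOGS_PER_INDEX
    if blog_id < 0 or index_number >= NUM_INDICES:
        raise ValueError(f"blog_id {blog_id} is outside the configured indices")
    return f"global-{index_number}"  # hypothetical naming scheme
```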
20% Improvements Don’t solve scaling problems
Entangling Elasticsearch with Existing Systems
Indexing
Bulk Indexing 1.0
44 Days to Index all Posts (estimated)
Bulk Indexing Problems
- Overhead: Spent too much time starting indexing jobs
WordPress.com has 500 mil MySQL tables.
- High DB Load: Corner Cases. Blogs with 1+ mil followers.
- High DB Load: Indexing sequentially doesn’t spread the load.
- High DB Load: Heavy load on archive DBs.
Bulk Indexing Today
12.0?
4 Days to Index all Posts (running right now)
Real Time Indexing
The Hardest Part!
Real Time Goals
1) Eventually Consistent
2) Minimize Bulk Re-indexing
3) Normally updated < 1 minute
Bulk reindexed 3 times in 5 months.
One intentional, two during system upgrades.
Stuff Fails
1) Humans
2) Hardware
3) Elasticsearch (steady improvements)
Combinations of the above.
Hardware Problems
1) Detect and Track Down Servers
2) Prioritize Queries over Indexing
3) Throttle Indexing Jobs
- any issues: block bulk changes to blogs
- >10 min: block doc updates
- >20 min: block all indexing
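The throttling tiers above can be sketched as a small policy function. This assumes the thresholds refer to how long the cluster has been showing issues; the job names are hypothetical, not taken from the talk.

```python
from enum import Enum


class Job(Enum):
    BULK_BLOG_CHANGE = "bulk_blog_change"  # hypothetical job names
    DOC_UPDATE = "doc_update"
    NEW_DOC = "new_doc"


def blocked_jobs(minutes_unhealthy: float) -> set:
    """Return the indexing job types to block for a given issue duration."""
    if minutes_unhealthy <= 0:
        return set()                                  # healthy: block nothing
    if minutes_unhealthy > 20:
        return set(Job)                               # >20 min: block all indexing
    if minutes_unhealthy > 10:
        return {Job.BULK_BLOG_CHANGE, Job.DOC_UPDATE}  # >10 min: also doc updates
    return {Job.BULK_BLOG_CHANGE}                     # any issues: bulk changes only
```

Escalating in tiers keeps queries responsive first and only gives up fresh indexing when the problem persists.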
Real Time Failures
1) Auto Retry Failed Indexing Jobs
2) Indexing Queue for Failures
3) Scrolling Queries to Find Bad Docs
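A minimal sketch of the first two points, assuming a job is retried a fixed number of times before it is parked in a failure queue for later reprocessing. The function names, the in-memory queue, and the retry count are all hypothetical.

```python
from collections import deque


def index_with_retry(doc, index_fn, failure_queue, max_attempts=3):
    """Try to index a doc; after max_attempts failures, queue it for later."""
    for _ in range(max_attempts):
        try:
            index_fn(doc)
            return True
        except Exception:
            continue  # a real job would log the error and back off here
    failure_queue.append(doc)
    return False


# Demo with stand-in indexing functions.
failed = deque()
calls = {"n": 0}


def flaky_index(doc):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")


def always_fails(doc):
    raise RuntimeError("permanent failure")


ok = index_with_retry({"id": 1}, flaky_index, failed)
bad = index_with_retry({"id": 2}, always_fails, failed)
```

Documents left in the queue are the ones a later pass (or a scrolling query over the index) has to reconcile.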
Cluster Restarts
Indexing across replicas is non-deterministic
Segments diverge
Slows Restart Time
Simplistic Example
[Diagram: docs indexed to a Primary and a Replica; segments with identical checksums are marked]
Shard 1 merges
Only first segment is identical
After Bulk Index
Every segment is out of sync!
Our Bulk Indexing Procedure
1) Bulk Index All Docs
2) Optimize the index
3) Rolling Restart (sync segments)
4) Future restarts will be much faster.
- Play with recovery settings
- SSDs? => use the noop Linux I/O scheduler
Indexing
It’s all about handling Failures
Querying
Test and Iterate
Related Posts Query
Started with MoreLikeThis API.
Did not scale well enough.
MLT API
1) Get Document
2) Analyze Document
3) Search for Similar Docs
MLT API vs MLT Query
MLT API | MLT Query
147 req/sec | 1062 req/sec
40% CPU | 30% CPU
306 ms median latency | 49.5 ms median latency
All processing by ES | Build query in PHP
Related Posts Relevancy
Great With Long Content
{
  "more_like_this": {
    "fields": ["mlt_content"],
    "like_text": "Scaling Elasticsearch Part 1: Overview ElasticSearch scaling Search We recently launched Related Posts across WordPress.com, so its time to pop the hood and take a look at what ended up in our engine... ",
    "percent_terms_to_match": 0.08,
    "boost_terms": 5,
    "analyzer": "en_analyzer"
  }
}
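Building the query on the client side, as the MLT Query column describes, might look like the sketch below. The talk builds it in PHP; Python is used here only for illustration. The parameter values are copied from the slide, while the function names and the `size` value are hypothetical.

```python
import json


def build_mlt_query(text: str, analyzer: str = "en_analyzer") -> dict:
    """Build a more_like_this query from text the client already has."""
    return {
        "more_like_this": {
            "fields": ["mlt_content"],
            "like_text": text,
            "percent_terms_to_match": 0.08,
            "boost_terms": 5,
            "analyzer": analyzer,
        }
    }


def build_search_body(text: str, size: int = 3) -> str:
    """Wrap the query in a search body; size is a hypothetical choice."""
    return json.dumps({"query": build_mlt_query(text), "size": size})


body = build_search_body("Scaling Elasticsearch Part 1: Overview")
```

Since the client supplies the text directly, Elasticsearch no longer has to fetch and analyze the source document before searching, which are the first two steps of the MLT API.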
MLT Query Relevancy
Use match or multi_match for short content.
[Chart: Average Related Posts CTR]
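One way to act on this advice is to switch query type by content length. This is a sketch only: the 100-word threshold is a made-up value, and the `title` and `content` field names are assumptions based on the post document shown elsewhere in the deck.

```python
SHORT_CONTENT_WORDS = 100  # hypothetical cutoff, not from the talk


def related_query(title: str, content: str) -> dict:
    """Use multi_match for short posts and more_like_this for long ones."""
    text = f"{title} {content}".strip()
    if len(text.split()) < SHORT_CONTENT_WORDS:
        return {
            "multi_match": {
                "query": text,
                "fields": ["title", "content"],  # assumed field names
            }
        }
    return {
        "more_like_this": {
            "fields": ["mlt_content"],
            "like_text": text,
            "percent_terms_to_match": 0.08,
            "boost_terms": 5,
        }
    }
```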
Language Analyzers
arabic, armenian, basque, brazilian, bulgarian, catalan, chinese, czech, danish, dutch, english, finnish, french, galician, german, greek, hindi, hungarian, indonesian, italian, japanese, korean, norwegian, persian, portuguese, romanian, russian, spanish, swedish, turkish, thai
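Picking an analyzer per post could be sketched as a lookup from language code to analyzer name with a fallback. The mapping below covers only a few of the analyzers listed, and the locale codes and the `standard` fallback are assumptions for illustration.

```python
# Partial, illustrative mapping from language/locale code to analyzer name.
ANALYZER_BY_LANG = {
    "ar": "arabic",
    "de": "german",
    "en": "english",
    "es": "spanish",
    "fr": "french",
    "pt": "portuguese",
    "pt-br": "brazilian",
    "ru": "russian",
}


def analyzer_for(lang_code: str, default: str = "standard") -> str:
    """Return the analyzer for a language code, falling back to a default."""
    code = lang_code.strip().lower().replace("_", "-")
    if code in ANALYZER_BY_LANG:
        return ANALYZER_BY_LANG[code]
    primary = code.split("-")[0]  # e.g. "pt-pt" -> "pt"
    return ANALYZER_BY_LANG.get(primary, default)
```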
Related Posts Relevancy
How Important is using the correct Language Analyzer?
Doubled Click Through Rate
Unfortunately
Increased Slow Queries (>1 second) by 10x
Still worth it.
Global Query Performance
search.wordpress.com
Parent-Child Filtering
Blog Doc (parent)
public: true|false
Post Doc (child)
title: “...”
content: “...”
has_parent Filter
With has_parent | Without has_parent
7.6 req/sec | 17.5 req/sec
75% CPU | 50% CPU
503 ms median latency | 207 ms median latency
Requires more Indexing
Querying Across All Shards
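For context, a global search restricted to public blogs might be expressed as below. This is a sketch that assumes the 1.x-era `filtered` query and `has_parent` filter syntax, which later Elasticsearch versions changed. The `blog` parent type name is hypothetical, and the `public` and `content` fields are taken from the parent-child slide.

```python
def public_posts_query(text: str) -> dict:
    """Match posts by content, keeping only those whose parent blog is public."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {"content": text}},
                "filter": {
                    "has_parent": {
                        "parent_type": "blog",  # hypothetical type name
                        "query": {"term": {"public": True}},
                    }
                },
            }
        }
    }


body = public_posts_query("scaling elasticsearch")
```

The alternative measured in the right-hand column is to drop the `has_parent` filter, which presumably means storing the visibility flag on every post document and reindexing posts whenever a blog's visibility changes.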
Indexing:
Optimize to Handle Failures
Querying:
Test and Iterate
Open Issues
Slow Queries (> 1 second)
Getting Better. Shards are too big.
Open Issues
What does it take to scale?
3x Data
5x Queries
Open Issues
Elasticsearch for Natural Language Processing?
At Scale.
On Live Data.
http://gibrown.wordpress.com
@gregibrown
Feeling Inspired?
http://automattic.com/work-with-us/data-wrangler/