1. Scaling Analytics with elasticsearch. Dan Noble, @dwnoble
2. Background: Technologist at The HumanGeo. We use elasticsearch
to build social media analysis tools (100MM documents indexed, 600GB+
index size). Author of rawes, a Python elasticsearch driver:
https://github.com/humangeo/rawes
3. Overview: What is elasticsearch? Scaling with elasticsearch.
How can I use elasticsearch to help with analytics? Use case:
social media analytics.
4. What is elasticsearch?
5. Search Engine: Open source. Distributed. Automatic failover.
Crazy fast.
6. Search Engine: Actively maintained. REST API. JSON messages.
Lucene based.
7. Search. [Diagram: an elasticsearch cluster with one host holding
the index "Articles".] Simple case: one host, one index containing a
set of articles.
8. Distributed Search. [Diagram: an elasticsearch cluster with two
hosts, one holding shard Articles (a) and the other Articles (b).]
Too much data? Add another host. Indices can be broken up into shards
that live on different machines.
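To make the idea of splitting an index into shards concrete, here is a minimal sketch of hash-based routing: a document id is hashed and taken modulo the number of shards. The function name `pick_shard`, the `article-N` ids, and the use of `zlib.crc32` are assumptions for illustration; elasticsearch uses its own hash function internally.

```python
import zlib


def pick_shard(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard number in [0, num_shards)."""
    # crc32 is a deterministic stand-in, not the hash elasticsearch uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Hypothetical document ids, spread over two shards: (a) and (b).
doc_ids = [f"article-{n}" for n in range(1000)]
counts = [0, 0]
for doc_id in doc_ids:
    counts[pick_shard(doc_id, 2)] += 1

print(counts)  # how many documents landed on each shard
```

Because the shard is derived from the id alone, any host can work out where a document lives without consulting a central lookup table.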
9. Redundancy. [Diagram: two hosts, each holding one copy of
Articles (a) and one copy of Articles (b).] Shards can be replicated
to improve availability.
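Shard and replica counts are set per index. The sketch below only builds the JSON body for an index-creation request and counts the resulting shard copies; nothing is sent to a cluster. The helper name `index_settings` is hypothetical, while `number_of_shards` and `number_of_replicas` are the standard index settings.

```python
import json


def index_settings(num_shards: int, num_replicas: int) -> dict:
    """Build the body for creating an index, e.g. PUT /articles."""
    return {
        "settings": {
            "number_of_shards": num_shards,      # primaries: (a) and (b)
            "number_of_replicas": num_replicas,  # extra copies of each primary
        }
    }


body = index_settings(num_shards=2, num_replicas=1)
settings = body["settings"]
# Every primary has `number_of_replicas` additional copies.
total_copies = settings["number_of_shards"] * (1 + settings["number_of_replicas"])

print(json.dumps(body, indent=2))
print(total_copies)  # 4 shard copies, as in the two-host diagram
```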
10. Node Auto Discovery. [Diagram: three hosts holding copies of
Articles (a) and Articles (b).] Say we add a third host: elasticsearch
will automatically start moving shards to this new host to distribute
load.
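The effect of adding a host can be illustrated with a toy placement rule. This is a sketch under simplified assumptions, not elasticsearch's actual allocation algorithm: copy `r` of shard number `s` is placed on host `(s + r) % number_of_hosts`. The host names and the `allocate` function are made up for the example.

```python
from collections import Counter

SHARDS = ["a", "b"]
COPIES_PER_SHARD = 2  # one primary plus one replica


def allocate(hosts):
    """Toy placement: copy r of shard number s goes to host (s + r) % len(hosts)."""
    placement = {}
    for s, shard in enumerate(SHARDS):
        for r in range(COPIES_PER_SHARD):
            placement[(shard, r)] = hosts[(s + r) % len(hosts)]
    return placement


before = allocate(["host1", "host2"])
after = allocate(["host1", "host2", "host3"])
# Shard copies whose host changed once host3 joined the cluster.
moved = sorted(copy for copy in before if before[copy] != after[copy])

print(Counter(before.values()))
print(Counter(after.values()))
print(moved)
```

In this toy run only one shard copy has to move to give the new host a share of the data; the rest stay where they were.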
11. Failover. [Diagram: three hosts holding copies of Articles (a)
and Articles (b).] Say a host goes down: the shards on that host are
no longer available for search. Elasticsearch automatically rebuilds
these two shards on other hosts.
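The failover step can be sketched as a small simulation. It assumes a simplified rule: each lost shard copy is rebuilt on the least-loaded surviving host that does not already hold a copy of the same shard. The `recover` function, host names, and starting placement are hypothetical, and the real allocator weighs more factors than this.

```python
def recover(placement, failed_host, hosts):
    """Rebuild shard copies lost with failed_host on the surviving hosts."""
    survivors = [h for h in hosts if h != failed_host]
    recovered = {copy: host for copy, host in placement.items() if host != failed_host}
    lost = sorted(copy for copy, host in placement.items() if host == failed_host)

    def load(host):
        # Number of shard copies currently placed on this host.
        return sum(1 for h in recovered.values() if h == host)

    for shard, copy_no in lost:
        # Never put two copies of the same shard on one host.
        holders = {host for (s, _), host in recovered.items() if s == shard}
        candidates = [h for h in survivors if h not in holders]
        if candidates:
            recovered[(shard, copy_no)] = min(candidates, key=load)
    return recovered


hosts = ["host1", "host2", "host3"]
placement = {
    ("a", 0): "host1", ("b", 1): "host1",  # host1 holds two shard copies
    ("a", 1): "host2",
    ("b", 0): "host3",
}
after = recover(placement, "host1", hosts)
print(after)
```

After `host1` fails, both of its shard copies reappear on the remaining hosts, and each shard still has its two copies on different machines.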
12. Querying. [Diagram: a client (web application) sends the query
"Barack Obama" to a cluster of three hosts holding copies of
Articles (a) and Articles (b).] Search for articles: the client can
query against any host, and that host sends the request to other
shards if needed.
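The query flow above amounts to scatter-gather: the host that receives the query asks each shard for its best hits and merges them. The sketch below imitates that with in-memory dictionaries standing in for shards. The scoring (a plain count of phrase occurrences), the function names, and the sample texts are all invented for illustration and are far simpler than Lucene's scoring.

```python
import heapq


def search_shard(shard_docs, phrase):
    """Return (score, doc_id) pairs for documents on one shard that contain the phrase."""
    phrase = phrase.lower()
    hits = []
    for doc_id, text in shard_docs.items():
        score = text.lower().count(phrase)  # toy relevance score
        if score:
            hits.append((score, doc_id))
    return hits


def search(shards, phrase, size=3):
    """Scatter the query to every shard, then gather and keep the top hits."""
    all_hits = []
    for shard_docs in shards:
        all_hits.extend(search_shard(shard_docs, phrase))
    return heapq.nlargest(size, all_hits)


# Made-up documents split across two shards.
shard_a = {"a1": "Barack Obama speaks in Ohio", "a2": "Weather report"}
shard_b = {"b1": "Barack Obama meets a Barack Obama impersonator", "b2": "Sports"}

top = search([shard_a, shard_b], "Barack Obama", size=2)
print(top)
```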
13. REST API: JSON query syntax. Developer friendly. Easy to get
started.
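As an illustration of the REST and JSON combination, the following sketch builds (but does not send) a search request using only the Python standard library. The base URL `http://localhost:9200`, the index name `articles`, and the helper `build_search_request` are assumptions; `_search` and the `query_string` query are standard parts of the elasticsearch API.

```python
import json
import urllib.request


def build_search_request(base_url, index, query_text):
    """Build a POST request for the index's _search endpoint without sending it."""
    body = {"query": {"query_string": {"query": query_text}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("http://localhost:9200", "articles", "Barack Obama")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# With a running cluster, urllib.request.urlopen(req) would execute the search.
```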
17. Analytics and elasticsearch: Date histograms. Statistical
facets. Geospatial queries. All with arbitrary search parameters.
Again: fast.
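A request combining these three features might look like the sketch below, which only builds the JSON body. It uses the facets and `filtered` query syntax from the era of these slides (later elasticsearch versions replaced facets with aggregations). The field names are taken from the sample document in this deck, and it is an assumption that `tags.geotags.location` is mapped as a `geo_point`. The function name, facet names, and default distance are hypothetical.

```python
import json


def analytics_request(search_text, lat, lon, distance="100km"):
    """Build a search body with a geo filter, a date histogram, and a statistical facet."""
    return {
        "size": 0,  # only facet results are needed, not the matching documents
        "query": {
            "filtered": {
                "query": {"query_string": {"query": search_text}},
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        "tags.geotags.location": {"lat": lat, "lon": lon},
                    }
                },
            }
        },
        "facets": {
            # Number of matching posts per day.
            "posts_per_day": {
                "date_histogram": {"field": "date", "interval": "day"}
            },
            # Count, min, max, mean, etc. of the positive sentiment score.
            "positive_sentiment": {
                "statistical": {"field": "tags.sentiment.positive"}
            },
        },
    }


body = analytics_request("Mohamed Morsi", lat=30.0566, lon=31.2262)
print(json.dumps(body, indent=2))
```

Because the facets are computed over whatever the query matches, changing the search text or the geo filter changes the histogram and statistics without any other code changes.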
18. Use Case: Social Media Analysis. Use social media APIs to
search for data on a topic of interest. 100MM documents indexed.
Sentiment analysis. Location extraction (geotagging).
19. Sample Document

    import rawes

    es = rawes.Elastic('localhost:9200')  # assumes a local elasticsearch node

    # Index one analyzed Facebook post into the "articles" index.
    es.post('articles/facebook', data={
        "date": "2012-09-01 08:37:55",
        "tags": {
            "sentiment": {"positive": 0.36, "negative": 0.10},
            # "location" is mapped as type geo_point
            "geotags": [{"term": "Cairo", "location": "30.0566,31.2262"}],
            "search_terms": ["Mohamed Morsi"]
        },
        "item": {
            "publisher": "Facebook",
            "source_domain": "www.facebook.com",
            "author": "James Smith",
            "source_url": "http://www.facebook.com/5551231234/posts/414141414141",
            "content_text": "Mohamed Morsi visits Iran for first time since 1979 ....",
            "title": "James Smith posted a note to Facebook",
            "author_url": "http://www.facebook.com/profile.php?id=5551231234"
        }
    })