
How we’re building a CRM on top of ElasticSearch

About me (quickly)

Director of Engineering @ EverTrue

Love distributed data stores, love them!

Using ElasticSearch for ~1 year

Mark Greene / @markjgreene

What does EverTrue do?

We help nonprofits raise more money

by allowing them to identify and build relationships with potential donors

How do we do that?

Obligatory database tube

Resolving identities across third party data sources

Cluster Setup

•3 master nodes, 2 data nodes, AZ (availability zone) aware

•~40m documents, ~25GB

•1 index, 7 types

•5 shards, 1 replica

•Peak work loads equate to 4-5k ops/s

•Using mostly default settings
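To make the numbers above concrete, here is a back-of-envelope sketch of the shard layout. The inputs (5 shards, 1 replica, 2 data nodes, ~25GB, ~40m documents) come from the slide. It assumes "~25GB" means primary data only and that documents spread evenly across shards; the function name is hypothetical.

```python
# Back-of-envelope shard math for the cluster described on the slide.
# Assumptions: "~25GB" is primary data size; data is spread evenly across shards.

def shard_layout(primaries: int, replicas: int, data_nodes: int,
                 data_gb: float, docs: int) -> dict:
    total_copies = primaries * (1 + replicas)  # primaries plus their replicas
    return {
        "total_shard_copies": total_copies,
        "copies_per_data_node": total_copies / data_nodes,
        "gb_per_primary": data_gb / primaries,
        "docs_per_primary": docs // primaries,
        "gb_per_data_node": data_gb * (1 + replicas) / data_nodes,
    }

# Index settings as they would be sent at index-creation time.
index_settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}

layout = shard_layout(primaries=5, replicas=1, data_nodes=2,
                      data_gb=25, docs=40_000_000)
for key, value in layout.items():
    print(f"{key}: {value}")
```

Because ES never places a replica on the same node as its primary, 1 replica on 2 data nodes means each data node ends up holding a full copy of the index (about 25GB under these assumptions).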

Data Model

•Mapping contains ~50 default fields.

•Most fields are stored as both analyzed and not analyzed

•Leverage dynamic templates for custom fields created by our customers

•Each custom field is stored as both analyzed and not analyzed
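A minimal sketch of what such a mapping might look like in ES 1.x syntax: each string field is analyzed for full-text search and also carries a `not_analyzed` sub-field for exact filters and aggregations, and a dynamic template applies the same treatment to customer-defined fields. The type name `contact`, the sub-field name `raw`, and the `custom_*` naming convention are hypothetical.

```python
import json

RAW = "raw"  # hypothetical name for the exact-match sub-field

def analyzed_and_raw() -> dict:
    """String field indexed twice: analyzed (default analyzer) for full-text
    search, plus a not_analyzed sub-field for exact filters and aggregations."""
    return {
        "type": "string",
        "fields": {RAW: {"type": "string", "index": "not_analyzed"}},
    }

def contact_mapping(default_fields: list) -> dict:
    return {
        "contact": {  # hypothetical type name
            "properties": {name: analyzed_and_raw() for name in default_fields},
            "dynamic_templates": [
                {
                    "custom_fields": {
                        # Hypothetical naming convention for customer-created fields.
                        "match": "custom_*",
                        "match_mapping_type": "string",
                        "mapping": analyzed_and_raw(),
                    }
                }
            ],
        }
    }

mapping = contact_mapping(["name", "email", "city"])
print(json.dumps(mapping, indent=2))
```

Newer Elasticsearch versions express the same idea with `text` and `keyword` types instead of `string` plus `index: not_analyzed`.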

Write Path

[Diagram: SQS queues feeding background jobs]
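A background job on this path will typically batch queued changes into a single bulk request. This sketch builds an Elasticsearch bulk body (newline-delimited JSON: an action line, then a source line, ending with a newline) from queue messages. The message shape and index name are hypothetical, and the actual SQS receive and HTTP send are left out.

```python
import json

def to_bulk_body(messages: list, index: str) -> str:
    """Turn queued change messages into an Elasticsearch bulk request body.

    Assumed (hypothetical) message shape: {"type": ..., "id": ..., "doc": {...}}.
    """
    lines = []
    for msg in messages:
        # ES 1.x still uses mapping types, so _type goes in the action line.
        action = {"index": {"_index": index, "_type": msg["type"], "_id": msg["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(msg["doc"]))
    return "\n".join(lines) + "\n"  # bulk API requires a trailing newline

batch = [
    {"type": "contact", "id": "42", "doc": {"name": "Ada"}},
    {"type": "contact", "id": "43", "doc": {"name": "Grace"}},
]
body = to_bulk_body(batch, index="crm")
print(body, end="")
```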

Read Path

1. Submit EverTrue DSL query

2. Translate to ES query; returns contact IDs

3. Load full contact objects w/ meta

[Diagram: Search API, Contacts API, offline streaming jobs; labels: arbitrary field filtering, aggregations, ES Hadoop plugin]
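Steps 1 and 2 can be sketched as a translator from a simple query format into an ES 1.x search body that only needs to return IDs. The DSL shape here (`filters` with `equals`/`contains` operators) is hypothetical and is not EverTrue's actual DSL; the `.raw` sub-field assumes the analyzed/not-analyzed mapping described earlier.

```python
def translate(dsl: dict, size: int = 100) -> dict:
    """Translate a hypothetical, simplified DSL into an ES 1.x search body
    whose hits carry contact IDs only."""
    queries, filters = [], []
    for clause in dsl["filters"]:
        field, op, value = clause["field"], clause["op"], clause["value"]
        if op == "equals":
            # Exact match on the not_analyzed copy (hypothetical ".raw" sub-field).
            filters.append({"term": {f"{field}.raw": value}})
        elif op == "contains":
            # Full-text match on the analyzed field.
            queries.append({"match": {field: value}})
        else:
            raise ValueError(f"unsupported operator: {op}")
    filtered = {"query": {"bool": {"must": queries}} if queries else {"match_all": {}}}
    if filters:
        filtered["filter"] = {"bool": {"must": filters}}
    return {
        "query": {"filtered": filtered},  # "filtered" is ES 1.x syntax (removed in 5.0)
        "_source": False,  # skip the document body; _id is still returned
        "size": size,
    }

def extract_ids(response: dict) -> list:
    """IDs from a search response, to be handed to the Contacts API in step 3."""
    return [hit["_id"] for hit in response["hits"]["hits"]]

body = translate({"filters": [
    {"field": "class_year", "op": "equals", "value": "2005"},
    {"field": "name", "op": "contains", "value": "smith"},
]})
```

Returning only IDs keeps the search response small; full contact objects with metadata are then loaded separately, as in step 3.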

Field Data Cache: Our first scaling issue

Turns out the field data cache is unbounded by default in ES 1.x...

First Solution

• We set indices.fielddata.cache.size to 50%

• No more OOME Crashes

• Then something else happened... really slow queries (problem sign #1)
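The slides don't spell out why queries slowed once the cache was capped. One plausible factor is eviction churn: when the cap is smaller than the working set of field data, an LRU-style cache can end up reloading on almost every access. This toy model illustrates that effect; it is a simplified stand-in, not Elasticsearch's actual cache implementation, and the field names are made up.

```python
from collections import OrderedDict

class BoundedCache:
    """Toy LRU cache standing in for a size-capped field data cache."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.loads = 0  # count of expensive "load field data" operations

    def get(self, field: str) -> str:
        if field in self.entries:
            self.entries.move_to_end(field)  # mark as most recently used
            return self.entries[field]
        self.loads += 1  # cache miss: the entry must be rebuilt
        self.entries[field] = f"fielddata({field})"
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return self.entries[field]

def simulate(capacity: int, working_set: list, rounds: int) -> int:
    cache = BoundedCache(capacity)
    for _ in range(rounds):
        for field in working_set:
            cache.get(field)
    return cache.loads

fields = ["name", "city", "class_year", "giving_total"]
print(simulate(capacity=4, working_set=fields, rounds=10))  # 4
print(simulate(capacity=3, working_set=fields, rounds=10))  # 40
```

With room for the whole working set, each field loads once; with one slot too few, cyclic access evicts exactly the entry needed next, so every lookup reloads.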

Slow queries? More hardware, right?!

Instance type          m1.xlarge       r3.2xlarge   r3.2xlarge
CPUs                   4               8            8
RAM                    15GB            60GB         60GB
Disk                   Spinning disk   SSD          SSD
ES version             v1.1.2          v1.1.2       v1.3.2
has_child query time   12-15s          6-8s         ~100ms
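For reference, this is roughly what a `has_child` query looks like in ES 1.x: it returns parent documents that have at least one child matching an inner query. The parent/child types (`contact`, `gift`) and the `amount` field are hypothetical; the slides don't say which types the benchmarked query used.

```python
import json

# Hypothetical parent/child layout: "gift" documents are children of "contact".
# In ES 1.x the child type declares its parent via the _parent mapping.
child_mapping = {"gift": {"_parent": {"type": "contact"}}}

def contacts_with_gifts_over(amount: float) -> dict:
    """Search body returning contacts (parents) that have at least one
    gift (child) with amount >= the given threshold."""
    return {
        "query": {
            "has_child": {
                "type": "gift",
                "query": {"range": {"amount": {"gte": amount}}},
            }
        }
    }

print(json.dumps(contacts_with_gifts_over(1000), indent=2))
```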

Lessons Learned

•Watch the release notes & GH issues like a hawk

•Don’t fall too far behind w/r/t versions

•We waited too long (6 months)

•Keep ES fed with plenty of memory

•Need monitoring to have any hope of understanding operational issues
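As one small example of the monitoring that would have surfaced the field data problem, this sketch scans a nodes stats response (for example from `GET /_nodes/stats/indices,jvm`) for field data evictions and heap usage. It assumes the response shape `nodes` → `indices.fielddata` / `jvm.mem`; verify the field names against your ES version. The sample numbers are made up.

```python
def fielddata_report(node_stats: dict, max_ratio: float = 0.5) -> list:
    """Flag nodes with field data evictions or field data above max_ratio of heap."""
    warnings = []
    for node in node_stats["nodes"].values():
        fd = node["indices"]["fielddata"]
        heap_max = node["jvm"]["mem"]["heap_max_in_bytes"]
        ratio = fd["memory_size_in_bytes"] / heap_max
        if fd["evictions"] > 0:
            warnings.append(f"{node['name']}: {fd['evictions']} fielddata evictions")
        if ratio > max_ratio:
            warnings.append(f"{node['name']}: fielddata at {ratio:.0%} of heap")
    return warnings

GiB = 2**30
sample = {"nodes": {  # made-up numbers for illustration
    "a1": {"name": "data-1",
           "indices": {"fielddata": {"memory_size_in_bytes": 12 * GiB, "evictions": 1500}},
           "jvm": {"mem": {"heap_max_in_bytes": 20 * GiB}}},
    "b2": {"name": "data-2",
           "indices": {"fielddata": {"memory_size_in_bytes": 4 * GiB, "evictions": 0}},
           "jvm": {"mem": {"heap_max_in_bytes": 20 * GiB}}},
}}
for line in fielddata_report(sample):
    print(line)
```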

Settings We Tweaked

• indices.store.throttle.max_bytes_per_sec

• Default 20mb -> 60mb (SSDs can handle it)

• indices.fielddata.cache.size

• Set to 70% of heap
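A sketch of how these two settings might be applied in ES 1.x. To our understanding, the store throttle is a dynamic cluster setting that can be sent with `PUT /_cluster/settings`, while the field data cache size is a node-level setting in `elasticsearch.yml`. The 30GB heap is a hypothetical figure used only to show what 70% works out to.

```python
import json

# ES 1.x: dynamic cluster setting, body for PUT /_cluster/settings.
throttle_update = {
    "persistent": {"indices.store.throttle.max_bytes_per_sec": "60mb"}
}

# ES 1.x: node-level setting, placed in elasticsearch.yml (node restart needed).
elasticsearch_yml = "indices.fielddata.cache.size: 70%\n"

def fielddata_cap_bytes(heap_bytes: int, percent: int) -> int:
    """Upper bound on the field data cache for a percentage-of-heap setting."""
    return heap_bytes * percent // 100

GiB = 2**30
heap = 30 * GiB  # hypothetical heap size, not stated in the slides
print(json.dumps(throttle_update))
print(elasticsearch_yml, end="")
print(fielddata_cap_bytes(heap, 70) / GiB)  # 21.0
```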

ES Hadoop Integration

•We use it for a lot of our offline jobs

•One map task per shard

•Deployments with few shards may underutilize your Hadoop cluster

•Mapper inputs do not contain meta fields like _version

•Forces another read in write-back scenarios
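Why the extra read: if a job wants to write results back with optimistic concurrency, it needs each document's current version, and the mapper input doesn't carry `_version`. This sketch models the pattern with an in-memory store; `ToyStore`, `write_back` and `VersionConflict` are hypothetical and are not the ES or es-hadoop API.

```python
from dataclasses import dataclass, field

class VersionConflict(Exception):
    pass

@dataclass
class ToyStore:
    """In-memory stand-in for an index with versioned writes (hypothetical)."""
    docs: dict = field(default_factory=dict)  # id -> (version, source)

    def get(self, doc_id):
        return self.docs[doc_id]

    def index(self, doc_id, source, expected_version):
        current_version, _ = self.docs[doc_id]
        if current_version != expected_version:
            raise VersionConflict(doc_id)  # someone else wrote in between
        self.docs[doc_id] = (current_version + 1, source)

def write_back(store, doc_id, update):
    # Extra read: the mapper input had the source but no _version.
    version, source = store.get(doc_id)
    store.index(doc_id, update(source), expected_version=version)

store = ToyStore({"42": (3, {"name": "Ada", "score": 1})})
write_back(store, "42", lambda s: {**s, "score": 2})
print(store.get("42"))
```

Applying the update to the freshly read source, rather than the possibly stale mapper input, keeps the write consistent with the version it is conditioned on.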

tail -f ~/questions