How we’re building a CRM on top of ElasticSearch
About me (quickly)
Director of Engineering @ EverTrue
Love distributed data stores, love them!
Using ElasticSearch for ~1 year
Mark Greene / @markjgreene
What does EverTrue do?
We help nonprofits raise more money
by allowing them to identify and build relationships with potential donors
How do we do that?
Obligatory database tube
Resolving identities across third party data sources
Cluster Setup
•3 Masters, 2 data nodes, AZ aware
•~40m documents, ~25GB
•1 index, 7 types
•5 shards, 1 replica
•Peak work loads equate to 4-5k ops/s
•Using mostly default settings
Data Model
•Mapping contains ~50 default fields.
•Most fields are stored as both analyzed and not analyzed
•Leverage dynamic templates for custom fields created by our customers
•Each custom field is stored as both analyzed and not analyzed
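A minimal sketch of what such a dynamic template could look like in an ES 1.x mapping, built as a Python dict. The type name `contact`, the `custom.` path prefix, the `raw` sub-field name, and the `name_last` field are assumptions for illustration; the slides do not show the real mapping.

```python
import json


def custom_field_template(path_prefix="custom."):
    """Dynamic template: any new string field under the (hypothetical)
    `custom.` path is indexed analyzed, plus a not_analyzed `raw` sub-field."""
    return {
        "custom_strings": {
            "path_match": path_prefix + "*",
            "match_mapping_type": "string",
            "mapping": {
                "type": "string",
                "index": "analyzed",
                "fields": {
                    # Exact-value copy for term filters, sorting and aggregations.
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            },
        }
    }


def contact_mapping():
    """Mapping body for a hypothetical `contact` type."""
    return {
        "contact": {
            "dynamic_templates": [custom_field_template()],
            "properties": {
                # One example of a default field, stored both ways as well.
                "name_last": {
                    "type": "string",
                    "index": "analyzed",
                    "fields": {
                        "raw": {"type": "string", "index": "not_analyzed"}
                    },
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(contact_mapping(), indent=2))
```

With a template like this, a customer-created field such as `custom.favorite_team` would be searchable as full text and filterable on `custom.favorite_team.raw` without a mapping change per customer.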
Write Path
•SQS
•Background Jobs
Read Path
1. Submit EverTrue DSL query (Search API)
2. Translate to ES query, returns contact IDs
3. Load full contact objects w/ meta (Contacts API)
•Arbitrary field filtering
•Aggregations
•Offline streaming jobs
•ES Hadoop Plugin
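The translate-then-load flow above can be sketched roughly as follows. The input DSL shape (`must`, `field`, `eq`, `gte`, `lte`) is invented for illustration, since the slides do not show the real EverTrue DSL, and the `.raw` sub-field name is an assumption. The output uses the ES 1.x `filtered` query and turns off `_source` so that only IDs come back.

```python
def translate_to_es(dsl, size=50):
    """Translate a (hypothetical) EverTrue-style DSL dict into an ES 1.x query body."""
    clauses = []
    for cond in dsl.get("must", []):
        if "eq" in cond:
            # Exact match against the not_analyzed copy of the field.
            clauses.append({"term": {cond["field"] + ".raw": cond["eq"]}})
        elif "gte" in cond or "lte" in cond:
            bounds = {k: cond[k] for k in ("gte", "lte") if k in cond}
            clauses.append({"range": {cond["field"]: bounds}})
        else:
            raise ValueError("unsupported condition: %r" % (cond,))

    es_filter = {"bool": {"must": clauses}} if clauses else {"match_all": {}}
    return {
        "query": {"filtered": {"query": {"match_all": {}}, "filter": es_filter}},
        "_source": False,  # step 2 only needs contact IDs
        "size": size,
    }


def contact_ids(es_response):
    """Pull document IDs out of a search response; step 3 loads the full
    contact objects for these IDs from the Contacts API."""
    return [hit["_id"] for hit in es_response["hits"]["hits"]]


if __name__ == "__main__":
    query = translate_to_es(
        {"must": [{"field": "city", "eq": "Boston"},
                  {"field": "giving_total", "gte": 1000}]}
    )
    print(query)
```

Keeping ES responsible only for IDs means the search index does not have to be the source of truth for the full contact object.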
Fielddata Cache: Our first scaling issue
Turns out the fielddata cache is unbounded by default...
First Solution
• We set indices.fielddata.cache.size to 50%
• No more OOME Crashes
• Then something else happened... really slow queries (Problem sign #1)
Slow Query?... More Hardware Right?!
Type                   m1.xlarge           r3.2xlarge   r3.2xlarge
Hardware: CPU          4 CPU               8 CPU        8 CPU
Hardware: RAM          15GB RAM            60GB RAM     60GB RAM
Hardware: Disk         Round disk thingy   SSD’s        SSD’s
ES Version             v1.1.2              v1.1.2       v1.3.2
has_child query time   12-15s              6-8s         ~100ms
Lessons Learned
•Watch the release notes & GH issues like a hawk
•Don’t fall too far behind w/r/t versions
•We waited too long (6 months)
•Keep ES fed with plenty of memory
•Need monitoring to have any hope of understanding operational issues
Settings We Tweaked
• indices.store.throttle.max_bytes_per_sec
• Default 20mb -> 60mb (SSD’s can handle it)
• indices.fielddata.cache.size
• Set to 70% of heap
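A rough sketch of how these two tweaks might be applied and sized. As far as I know, the store throttle can be changed at runtime through the cluster settings API (`PUT /_cluster/settings`) in ES 1.x, while `indices.fielddata.cache.size` is a node-level setting that goes in `elasticsearch.yml`. The 30 GiB heap below is an assumption, not a number from the slides.

```python
def throttle_settings_body(max_bytes_per_sec="60mb", persistent=True):
    """Body for PUT /_cluster/settings that raises the store throttle."""
    scope = "persistent" if persistent else "transient"
    return {scope: {"indices.store.throttle.max_bytes_per_sec": max_bytes_per_sec}}


def fielddata_cache_bytes(heap_bytes, percent):
    """Upper bound on fielddata cache size for a given heap and percentage."""
    return heap_bytes * percent // 100


if __name__ == "__main__":
    GIB = 1024 ** 3
    assumed_heap = 30 * GIB  # assumption: heap size is not stated in the slides

    print(throttle_settings_body())
    # Compare the first fix (50%) with the final setting (70%).
    for pct in (50, 70):
        print(pct, "% of heap =", fielddata_cache_bytes(assumed_heap, pct) / GIB, "GiB")
```

Whatever the real heap is, the remaining 30% has to cover everything else on the node (filter cache, indexing buffers, request handling), so the percentage is worth revisiting alongside monitoring data.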
ES Hadoop Integration
•We use it for a lot of our offline jobs
•One map task per shard
•Small shard deployments may underutilize your hadoop cluster
•Mapper inputs do not contain meta fields like _version
•Forces another read for write back scenarios
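To make the "one map task per shard" point concrete, here is a small back-of-the-envelope sketch. The 5 shards come from the cluster setup earlier in the deck; the 20 map slots are a made-up Hadoop cluster size used only for illustration.

```python
def map_tasks_for_index(primary_shards):
    """One map task (input split) per shard, as described above."""
    return primary_shards


def map_slot_utilization(primary_shards, available_map_slots):
    """Fraction of the Hadoop cluster's map slots a single job can keep busy."""
    if available_map_slots <= 0:
        raise ValueError("available_map_slots must be positive")
    busy = min(map_tasks_for_index(primary_shards), available_map_slots)
    return busy / available_map_slots


if __name__ == "__main__":
    shards = 5   # from the cluster setup slide
    slots = 20   # hypothetical Hadoop cluster capacity
    print("map tasks:", map_tasks_for_index(shards))
    print("utilization:", map_slot_utilization(shards, slots))
```

Under these assumptions a job reading the index keeps only a quarter of the map slots busy, which is the underutilization the slide warns about.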
tail -f ~/questions