Building a CRM on top of ElasticSearch

How we’re building a CRM on top of ElasticSearch

About me (quickly)

Director of Engineering @ EverTrue

Love distributed data stores, love them!

Using ElasticSearch for ~1 year

Mark Greene / @markjgreene

What does EverTrue do?

We help nonprofits raise more money

by allowing them to identify and build relationships with potential donors

How do we do that?

Obligatory database tube

Resolving identities across third party data sources

Cluster Setup

•3 masters, 2 data nodes, AZ aware

•~40m documents, ~25GB

•1 index, 7 types

•5 shards, 1 replica

•Peak work loads equate to 4-5k ops/s

•Using mostly default settings
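To make the shard layout above concrete, here is a minimal sketch of the index settings body it implies. The helper names are made up for illustration; only the numbers (5 shards, 1 replica, 2 data nodes) come from the slide.

```python
import json


def index_settings(shards=5, replicas=1):
    """Build the settings body sent when creating the index."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(shards, replicas):
    """Primaries plus replica copies that the data nodes must hold."""
    return shards * (1 + replicas)


if __name__ == "__main__":
    print(json.dumps(index_settings(), indent=2))
    # 5 primaries + 5 replicas spread over 2 data nodes
    print(total_shard_copies(5, 1))
```

With two data nodes, each node ends up holding one copy of every shard.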

Data Model

•Mapping contains ~50 default fields

•Most fields are stored as both analyzed and not analyzed

•Leverage dynamic templates for custom fields created by our customers

•Each custom field is stored as both analyzed and not analyzed
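A rough sketch of what such a dynamic template could look like in ES 1.x mapping syntax. The type name `contact`, the `custom.*` path, and the `raw` sub-field are assumptions for illustration, not the actual EverTrue mapping.

```python
import json


def custom_field_mapping(doc_type="contact", path="custom.*"):
    """Build a mapping whose dynamic template indexes every new string
    field under `path` twice: analyzed, plus a not_analyzed sub-field."""
    template = {
        "custom_strings": {
            "path_match": path,
            "match_mapping_type": "string",
            "mapping": {
                "type": "string",
                "index": "analyzed",
                "fields": {
                    # hypothetical sub-field name for exact-match filtering
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            },
        }
    }
    return {doc_type: {"dynamic_templates": [template]}}


if __name__ == "__main__":
    print(json.dumps(custom_field_mapping(), indent=2))
```

The analyzed copy serves full-text search, while the not analyzed copy serves exact filters and aggregations.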

Write Path

[Diagram: SQS queues, Background Jobs]

Read Path

[Diagram: Search API, Contacts API, offline streaming jobs]

1. Submit EverTrue DSL

2. Translate to ES query, returns contact IDs

3. Load full contact objects w/ meta

Arbitrary field filtering

Aggregations

ES Hadoop Plugin
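The three read-path steps can be sketched roughly as below. The EverTrue DSL is not shown in the deck, so the `filters`/`limit` shape here is a made-up stand-in, and the contact store is a plain dict standing in for the Contacts API. The query body uses the ES 1.x `filtered` query.

```python
def translate(dsl):
    """Step 2: turn a (hypothetical) filter DSL into an ES 1.x query
    that returns only document IDs, not full sources."""
    clauses = [{"term": {f["field"]: f["value"]}} for f in dsl["filters"]]
    return {
        "_source": False,
        "size": dsl.get("limit", 10),
        "query": {"filtered": {"filter": {"bool": {"must": clauses}}}},
    }


def extract_ids(response):
    """Pull the contact IDs out of a search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def load_contacts(ids, store):
    """Step 3: load full contact objects by ID (dict stands in for the API)."""
    return [store[i] for i in ids if i in store]


if __name__ == "__main__":
    dsl = {"filters": [{"field": "city.raw", "value": "Boston"}], "limit": 2}
    print(translate(dsl))
    canned = {"hits": {"hits": [{"_id": "1"}, {"_id": "7"}]}}
    store = {"1": {"name": "Ada"}, "7": {"name": "Grace"}}
    print(load_contacts(extract_ids(canned), store))
```

Splitting search from loading lets ES answer only "which contacts match", while the full objects come from the system of record.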

Field Data Cache: Our first scaling issue

Turns out the field data cache is unbounded by default...

First Solution

• We set indices.fielddata.cache.size to 50%

• No more OOME Crashes

• Then something else happened... really slow queries (problem sign #1)

Slow Query?... More Hardware Right?!

Type                   m1.xlarge           r3.2xlarge   r3.2xlarge
Hardware (CPU)         4 CPU               8 CPU        8 CPU
Hardware (RAM)         15GB RAM            60GB RAM     60GB RAM
Hardware (disk)        Round disk thingy   SSDs         SSDs
ES Version             v1.1.2              v1.1.2       v1.3.2
has_child query time   12-15s              6-8s         ~100ms
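For readers who have not seen one, a has_child query has roughly the shape below. The child type `gift` and the field `year` are invented for illustration; the deck does not say which parent/child types were queried.

```python
import json


def has_child_query(child_type, field, value):
    """Match parent documents that have at least one child of
    `child_type` whose `field` equals `value`."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "query": {"term": {field: value}},
            }
        }
    }


if __name__ == "__main__":
    # hypothetical: contacts that have a gift recorded for 2014
    print(json.dumps(has_child_query("gift", "year", 2014), indent=2))
```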

Lessons Learned

•Watch the release notes & GH issues like a hawk

•Don’t fall too far behind w/r/t versions

•We waited too long (6 months)

•Keep ES fed with plenty of memory

•Need monitoring to have any hope of understanding operational issues

Settings We Tweaked

• indices.store.throttle.max_bytes_per_sec

• Default 20mb -> 60mb (SSDs can handle it)

• indices.fielddata.cache.size

• Set to 70% of heap
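A small sketch of how these two tweaks might be expressed. It assumes the throttle is changed through the cluster settings API as a persistent setting and the field data cache size is set in elasticsearch.yml. The 30GB heap in the example is an assumption, since the deck does not state the heap size.

```python
def throttle_update(limit="60mb"):
    """Body for a cluster settings update raising the store throttle."""
    return {"persistent": {"indices.store.throttle.max_bytes_per_sec": limit}}


def fielddata_yaml_line(percent=70):
    """Line for elasticsearch.yml bounding the field data cache."""
    return "indices.fielddata.cache.size: {}%".format(percent)


def fielddata_budget_gb(heap_gb, percent=70):
    """How much heap the cache may use before entries are evicted."""
    return heap_gb * percent / 100.0


if __name__ == "__main__":
    print(throttle_update())
    print(fielddata_yaml_line())
    # hypothetical 30GB heap on a 60GB box
    print(fielddata_budget_gb(30))
```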

ES Hadoop Integration

•We use it for a lot of our offline jobs

•One map task per shard

•Small shard deployments may underutilize your Hadoop cluster

•Mapper inputs do not contain meta fields like _version

•Forces another read for write back scenarios
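The underutilization point is simple arithmetic, sketched here. The 20 map slots are a made-up cluster size; the 5 shards come from the cluster setup slide.

```python
def map_tasks(num_shards):
    """ES Hadoop creates one map task per shard."""
    return num_shards


def slot_utilization(num_shards, map_slots):
    """Fraction of the Hadoop cluster's map slots a job can keep busy."""
    return min(map_tasks(num_shards), map_slots) / float(map_slots)


if __name__ == "__main__":
    # 5 shards on a hypothetical 20-slot Hadoop cluster
    print(map_tasks(5))
    print(slot_utilization(5, 20))
```

With few shards, most of the map slots sit idle no matter how large the Hadoop cluster is.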

tail -f ~/questions
