Pivotal's effort on Apache Geode

Apache Geode (incubating), and Pivotal's leadership role in open sourcing Gemfire

Nitin Lamba

Agenda

• Pivotal’s Open Source strategy
• What is Apache Geode?
• History
• Differentiators
• Basic Concepts
• Resources
• Q & A


In 2015, Pivotal contributed the components of its Big Data Suite to open source

• 6 Million Lines of Code
• 4 new open source communities

May 2015, Sept 2015, Sept 2015, Oct 2015

From GEMFIRE to GEODE…

What is GEODE?

A distributed, memory-based data management platform for data-oriented apps that need:
• high performance, scalability, resiliency and continuous availability
• fast access to critical data sets
• location-aware distributed data processing
• event-driven data architecture

Incubating but ROCK solid…

• 1000+ systems in production (real customers)
• Cutting edge use cases

<2000 2004 2008 2012 2016

Early drivers
• Data Volumes
• Margins/ transactions
• IT maintenance costs
• Elasticity needs

Real-time needs
• Real-time response
• Time to market needs
• Flexible Data Models
• Persistent + In-memory

Global Data
• Visibility across DC
• Fast Ingest
• Device to enterprise
• Uptime (always on)

Open Source!
• Apache Incubation
• Gemfire > Geode
• Geode M1 release
• 1st Geode Summit

Financial Services, US DoD, Trade Clearing, Travel Portal, Online Gambling, Telcos, Manufacturing, Auto Insurance, Payroll processing, Rail systems

…with both SCALE and SPEED, …

• 40K Transactions per second
• 3TB Data in-memory
• 17B Records in-memory
• 120K Concurrent users

… and impacting a LOT of people!

• China Railway Corporation
• Indian Railways

17% + 19% = 36% of the world population

High-level Architecture

A Peer-2-Peer in-memory Distributed System

Powerful app development kit
• APIs: Java & REST
• Adapters: Redis, Lucene*, Spark*, …

Multiple persistence options
• Filesystem, RDBMS or HDFS*
• Sync: read-through, write-through
• Async: write-behind

Durable <K,V> cache/ store
• Data replicated or partitioned
• Redundant storage in-memory/ disk
• Flexible data retention policies

[Diagram: a Locator and four Servers, with REST access]

* Experimental and awaiting community feedback
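The sync and async persistence options above can be illustrated with a toy, single-process model. This is only a sketch under stated assumptions: the class `PersistenceModesSketch`, its methods and the in-memory `store` map (standing in for a filesystem or RDBMS) are hypothetical names for illustration and are not Geode's API.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy model of read-through, write-through and write-behind. Not Geode's API. */
final class PersistenceModesSketch {
    // Hypothetical backing store; stands in for a filesystem or RDBMS.
    final Map<String, String> store = new HashMap<>();
    private final Map<String, String> cache = new HashMap<>();
    private final Queue<String[]> pendingWrites = new ArrayDeque<>();

    /** Read-through: on a cache miss, load from the store and keep a copy. */
    String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = store.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    /** Write-through: the store is updated before the call returns. */
    void putWriteThrough(String key, String value) {
        store.put(key, value);
        cache.put(key, value);
    }

    /** Write-behind: the cache is updated now, the store write is queued. */
    void putWriteBehind(String key, String value) {
        cache.put(key, value);
        pendingWrites.add(new String[] {key, value});
    }

    /** Drains queued writes in order; returns how many were applied. */
    int flush() {
        int applied = 0;
        String[] entry;
        while ((entry = pendingWrites.poll()) != null) {
            store.put(entry[0], entry[1]);
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        PersistenceModesSketch s = new PersistenceModesSketch();
        s.putWriteThrough("k1", "v1");
        s.putWriteBehind("k2", "v2");
        System.out.println("before flush: " + s.store);
        s.flush();
        System.out.println("after flush:  " + s.store);
    }
}
```

In this sketch `flush()` is called by hand; a real write-behind queue would typically be drained by a background thread, which is what makes the write asynchronous.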

What makes it go FAST?

• Minimize copying
• Minimize contention points
• Run user code in-process
• Partitioning & parallelism
• Avoid disk seeks
• Automated benchmarks

Let’s talk about a few BASIC CONCEPTS…

• Cache
• Region
• Member
• Client Cache
• Persistence
• Functions

What is a CACHE?

• In-memory storage and management for your data
• Configurable through XML, Java API or CLI
• Collection of Regions

What is a REGION?

• Distributed java.util.Map on steroids (Key/Value)
• Consistent API regardless of where or how data is stored
• Observable (reactive)
• Highly available, redundant across cache Members
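The "map on steroids" and "observable" points can be pictured with a small, single-process stand-in. This is a minimal sketch, assuming a hypothetical class `ObservableRegionSketch` with a hypothetical `onPut` hook; it is not Geode's Region API and leaves out distribution and redundancy entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

/** Toy stand-in for a region: a key/value map that notifies listeners. Not Geode's API. */
final class ObservableRegionSketch<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final List<BiConsumer<K, V>> listeners = new CopyOnWriteArrayList<>();

    /** Registers a callback invoked after every put (hypothetical hook). */
    void onPut(BiConsumer<K, V> listener) {
        listeners.add(listener);
    }

    /** Stores the value, then notifies listeners; returns the previous value or null. */
    V put(K key, V value) {
        V previous = entries.put(key, value);
        for (BiConsumer<K, V> listener : listeners) {
            listener.accept(key, value);
        }
        return previous;
    }

    V get(K key) {
        return entries.get(key);
    }

    public static void main(String[] args) {
        ObservableRegionSketch<String, Integer> region = new ObservableRegionSketch<>();
        List<String> events = new ArrayList<>();
        region.onPut((k, v) -> events.add(k + "=" + v));
        region.put("k1", 1);
        region.put("k1", 2);
        System.out.println(region.get("k1") + " " + events);
    }
}
```

The caller only sees `put` and `get`, which is the point of the "consistent API" bullet: where and how the entries are stored stays behind that interface.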

Region: Types & Options

• Local, Replicated or Partitioned
• In-memory or persistent
• Redundant
• LRU
• Overflow

LOCAL, LOCAL_HEAP_LRU, LOCAL_OVERFLOW, LOCAL_PERSISTENT, LOCAL_PERSISTENT_OVERFLOW

PARTITION, PARTITION_HEAP_LRU, PARTITION_OVERFLOW, PARTITION_PERSISTENT, PARTITION_PERSISTENT_OVERFLOW, PARTITION_PROXY, PARTITION_PROXY_REDUNDANT, PARTITION_REDUNDANT, PARTITION_REDUNDANT_HEAP_LRU, PARTITION_REDUNDANT_OVERFLOW, PARTITION_REDUNDANT_PERSISTENT, PARTITION_REDUNDANT_PERSISTENT_OVERFLOW

REPLICATE, REPLICATE_HEAP_LRU, REPLICATE_OVERFLOW, REPLICATE_PERSISTENT, REPLICATE_PERSISTENT_OVERFLOW, REPLICATE_PROXY
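The LRU and Overflow options can be sketched with a toy model: when memory holds more than a fixed number of entries, the least recently used one moves to a second tier instead of being lost. The class `LruOverflowSketch`, the entry-count limit and the in-memory `overflow` map (standing in for disk) are assumptions made for illustration, not Geode's implementation.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of LRU eviction with overflow to a second tier. Not Geode's API. */
final class LruOverflowSketch<K, V> {
    // Hypothetical overflow tier; stands in for disk.
    private final Map<K, V> overflow = new HashMap<>();
    private final LinkedHashMap<K, V> memory;

    LruOverflowSketch(final int maxInMemory) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxInMemory) {
                    overflow.put(eldest.getKey(), eldest.getValue());
                    return true; // evict from memory; the value survives in overflow
                }
                return false;
            }
        };
    }

    void put(K key, V value) {
        overflow.remove(key); // drop any stale overflowed copy
        memory.put(key, value);
    }

    V get(K key) {
        V value = memory.get(key);
        if (value == null && overflow.containsKey(key)) {
            value = overflow.remove(key);
            memory.put(key, value); // fault the entry back into memory
        }
        return value;
    }

    boolean inMemory(K key) {
        return memory.containsKey(key);
    }

    public static void main(String[] args) {
        LruOverflowSketch<String, String> region = new LruOverflowSketch<>(2);
        region.put("k1", "v1");
        region.put("k2", "v2");
        region.get("k1");       // k2 is now the least recently used entry
        region.put("k4", "v4"); // memory is full, so k2 overflows
        System.out.println(region.inMemory("k2") + " " + region.get("k2"));
    }
}
```

Reading an overflowed key brings it back into memory, which may in turn push another entry out; the data set stays complete while the in-memory footprint stays bounded.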

Persistent Regions

• Durability
• WAL for efficient writing
• Consistent recovery
• Compaction

[Diagram: Server 1 … Server N]

What is a MEMBER?

• A process that has a connection to the system
• A process that has created a cache
• Embeddable within your application

[Diagram: Client, Locator, Server]

What is a CLIENT CACHE?

• A process connected to the Geode server(s)
• Can have a local copy of the data
• Run OQL queries on local data
• Can be notified about events on the servers

Persistence - Shared Nothing

[Diagram sequence: Server 1, Server 2 and Server 3 hold buckets B1, B2 and B3, each bucket with a Primary and a Secondary copy]

• Server 1 waits for others when it starts
• Fetches missed operations on restart
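The bucket layout in these diagrams can be approximated with a small placement sketch. The slides do not describe how Geode actually assigns buckets to servers, so the round-robin rule below is purely an assumption for illustration, as are the class and method names. Buckets and servers are numbered from 0 here, whereas the slides label them from 1.

```java
/** Toy placement of buckets with one primary and one secondary copy. Not Geode's algorithm. */
final class BucketPlacementSketch {
    private final int bucketCount;
    private final int serverCount;

    BucketPlacementSketch(int bucketCount, int serverCount) {
        if (bucketCount < 1 || serverCount < 2) {
            throw new IllegalArgumentException("need at least 1 bucket and 2 servers");
        }
        this.bucketCount = bucketCount;
        this.serverCount = serverCount;
    }

    /** Maps a key to a bucket by hash; floorMod keeps the result non-negative. */
    int bucketFor(Object key) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    /** Assumed rule: primaries are spread round-robin over the servers. */
    int primaryFor(int bucket) {
        return bucket % serverCount;
    }

    /** Assumed rule: the secondary lives on the next server, so it never shares a host. */
    int secondaryFor(int bucket) {
        return (bucket + 1) % serverCount;
    }

    /** Server that can serve the bucket while failedServer is down. */
    int ownerAfterFailure(int bucket, int failedServer) {
        int primary = primaryFor(bucket);
        return primary == failedServer ? secondaryFor(bucket) : primary;
    }

    public static void main(String[] args) {
        BucketPlacementSketch placement = new BucketPlacementSketch(3, 3);
        for (int b = 0; b < 3; b++) {
            System.out.println("bucket " + b
                    + " primary=" + placement.primaryFor(b)
                    + " secondary=" + placement.secondaryFor(b)
                    + " ownerIfServer0Fails=" + placement.ownerAfterFailure(b, 0));
        }
    }
}
```

Because each bucket's two copies sit on different servers, losing any single server still leaves one copy of every bucket available, which is the property the Primary/Secondary diagrams illustrate.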

Persistence - Operational Logs

Member 1: Put k6->v6 (Append to operation log)

Operation log (Oplog1.crf, Oplog2.crf):
• Create k1->v1
• Create k2->v2
• Modify k1->v3
• Create k4->v4
• Modify k1->v5
• Create k6->v6

Persistence - Operational Logs: Compaction

[Same operation log diagram as the previous slide, with one added step]

• Append to operation log
• Copy live data forward
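The append-then-compact idea can be sketched with an in-memory log that replays the same operations shown on the slides. This is a toy model under stated assumptions: `OplogSketch`, `LogRecord`, `replay` and `compact` are hypothetical names, and nothing here reflects the actual on-disk format of the `.crf` files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy append-only operation log with compaction. Not Geode's oplog format. */
final class OplogSketch {
    static final class LogRecord {
        final String op;
        final String key;
        final String value;

        LogRecord(String op, String key, String value) {
            this.op = op;
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return op + " " + key + "->" + value;
        }
    }

    private final List<LogRecord> log = new ArrayList<>();
    private final Set<String> seenKeys = new HashSet<>();

    /** Every put is appended; nothing already written is modified in place. */
    void put(String key, String value) {
        String op = seenKeys.add(key) ? "Create" : "Modify";
        log.add(new LogRecord(op, key, value));
    }

    List<LogRecord> log() {
        return Collections.unmodifiableList(log);
    }

    /** Recovery: replay records in order; later records for a key win. */
    static Map<String, String> replay(List<LogRecord> records) {
        Map<String, String> state = new LinkedHashMap<>();
        for (LogRecord r : records) {
            state.put(r.key, r.value);
        }
        return state;
    }

    /** Compaction: copy only the latest record per key forward into a new log. */
    static List<LogRecord> compact(List<LogRecord> records) {
        Map<String, LogRecord> latest = new LinkedHashMap<>();
        for (LogRecord r : records) {
            latest.remove(r.key); // re-insert so order follows the last write
            latest.put(r.key, r);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        OplogSketch oplog = new OplogSketch();
        oplog.put("k1", "v1");
        oplog.put("k2", "v2");
        oplog.put("k1", "v3");
        oplog.put("k4", "v4");
        oplog.put("k1", "v5");
        oplog.put("k6", "v6");
        System.out.println("full log:  " + oplog.log());
        System.out.println("compacted: " + compact(oplog.log()));
    }
}
```

Replaying the compacted log yields the same key/value state as replaying the full log, while the superseded records for k1 no longer take up space.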

Functions

• Used for distributed concurrent processing (Map/Reduce, stored procedure)
• Highly available
• Data oriented
• Member oriented
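The Map/Reduce style of function execution can be pictured as running the same function next to each partition of the data and then combining the partial results. The sketch below models partitions as plain maps in one process and uses a parallel stream for concurrency; `FunctionExecutionSketch`, `execute` and the key filter are hypothetical names for illustration and are not Geode's function execution API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Toy model of data-oriented function execution over partitions. Not Geode's API. */
final class FunctionExecutionSketch {
    /**
     * Runs fn once per partition, in parallel, and returns the partial results
     * in partition order. If filterKeys is non-empty, only partitions that hold
     * at least one of those keys run the function (the "data oriented" case).
     */
    static <K, V, R> List<R> execute(List<Map<K, V>> partitions, Set<K> filterKeys,
                                     Function<Map<K, V>, R> fn) {
        return partitions.parallelStream()
                .filter(p -> filterKeys.isEmpty()
                        || !Collections.disjoint(p.keySet(), filterKeys))
                .map(fn)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> p1 = new HashMap<>();
        p1.put("k1", 10);
        p1.put("k2", 20);
        Map<String, Integer> p2 = new HashMap<>();
        p2.put("k4", 40);
        Map<String, Integer> p3 = new HashMap<>();
        p3.put("k6", 60);
        List<Map<String, Integer>> partitions = Arrays.asList(p1, p2, p3);

        // "Map" step: each partition sums its own values locally.
        Function<Map<String, Integer>, Integer> sum =
                p -> p.values().stream().mapToInt(Integer::intValue).sum();

        List<Integer> partials = execute(partitions, Collections.<String>emptySet(), sum);
        // "Reduce" step: the caller combines the partial results.
        int total = partials.stream().mapToInt(Integer::intValue).sum();
        System.out.println(partials + " -> " + total);
    }
}
```

Only the small partial results travel back to the caller; the data itself stays where it is stored, which is the motivation behind running user code in-process.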


Join the Community!

• Check out: http://geode.incubator.apache.org
• Subscribe: [email protected]
• Download: http://geode.incubator.apache.org/releases/

Thank you!

Additional Slides


Built for PERFORMANCE…

[Chart: YCSB Workloads, Cassandra vs. Geode. Y-axis: Operations per second (0 to 1,000,000). X-axis: A Reads, A Updates, B Reads, B Updates, C Reads, D Inserts, D Reads, F Reads, F Updates]

…and horizontal, consistent SCALABILITY!

Horizontal scaling for reads, consistent latency and CPU

[Chart: speedup, latency (ms) and CPU % versus Server Hosts (2, 4, 6, 8, 10)]

• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1K data size

High Availability
