Introduction to Apache Geode (incubating)

Cork, September 2015

Anthony Baker (@metatype), William Markito (@william_markito)


Agenda

• Introduction to Geode

• Geode concepts and usage

• The Geode open source project

• StockPrediction demo


Introduction

"…an in-memory, distributed database with strong consistency built to support low latency transactional applications at extreme scale."

History

2004
Needs:
• Massive increase in data volumes
• Falling margins per transaction
• Increasing cost of IT maintenance
• Need for elasticity in systems
Customers:
• Financial Services Providers (every major Wall Street bank)
• Department of Defense

2008
Needs:
• Real Time response needs
• Time to market constraints
• Need for flexible data models across enterprise
• Distributed development
• Persistence + In-memory
Customers:
• Largest travel Portal
• Airlines
• Trade clearing
• Online gambling

2014
Needs:
• Global data visibility needs
• Fast Ingest needs for data
• Need to allow devices to hook into enterprise data
• Always on
Customers:
• Largest Telcos
• Large manufacturers
• Largest Payroll processor
• Auto insurance giants
• Largest rail systems on earth

• 1000+ customers in production
• Cutting edge use cases

Interesting use cases

China Railway Corporation
• 5,700 train stations
• 4.5 million tickets per day
• 20 million daily users
• 1.4 billion page views per day
• 40,000 visits per second

Indian Railways
• 7,000 stations
• 72,000 miles of track
• 23 million passengers daily
• 120,000 concurrent users
• 10,000 transactions per minute

* http://pivotal.io/big-data/pivotal-gemfire

Interesting use cases

Population:
• Indian Railways: 1,251,695,616
• China Railway Corporation: 1,401,586,609
• World: ~7,349,000,000

Together: ~36% of the world population


Application Patterns

• Caching for speed and scale ➡ Read-Through, Write-Through, Write-Behind

• As the OLTP system of record ➡ Data in-memory for low-latency, on disk for durability

• Parallel compute engine

• Real-time analytics
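The caching patterns named above (read-through, write-through, write-behind) can be sketched in plain Java. This is only an illustration of the ideas, not Geode API: the class `CachePatternsSketch`, its method names and the in-memory "backing store" map are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Plain-Java illustration of three caching patterns; hypothetical names, not Geode API. */
class CachePatternsSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> loader;      // consulted on a cache miss
    private final Map<String, String> backingStore;     // stand-in for a database
    private final List<String[]> writeBehindQueue = new ArrayList<>();

    CachePatternsSketch(Function<String, String> loader, Map<String, String> backingStore) {
        this.loader = loader;
        this.backingStore = backingStore;
    }

    /** Read-through: a miss is loaded from the backing source and then cached. */
    String get(String key) {
        return cache.computeIfAbsent(key, loader);
    }

    /** Write-through: the store is updated synchronously, together with the cache. */
    void putWriteThrough(String key, String value) {
        backingStore.put(key, value);
        cache.put(key, value);
    }

    /** Write-behind: the cache is updated now, the store only when the queue is flushed. */
    void putWriteBehind(String key, String value) {
        cache.put(key, value);
        writeBehindQueue.add(new String[] {key, value});
    }

    /** Applies the queued writes to the store and returns how many were applied. */
    int flush() {
        int applied = writeBehindQueue.size();
        for (String[] kv : writeBehindQueue) {
            backingStore.put(kv[0], kv[1]);
        }
        writeBehindQueue.clear();
        return applied;
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("K01", "May");
        CachePatternsSketch cache = new CachePatternsSketch(db::get, db);

        System.out.println(cache.get("K01"));        // loaded through on first access
        cache.putWriteBehind("K02", "Tim");
        System.out.println(db.containsKey("K02"));   // store not yet updated
        System.out.println(cache.flush());           // queued writes now reach the store
        System.out.println(db.get("K02"));
    }
}
```

The trade-off is the usual one: write-through keeps the store current at the cost of write latency, while write-behind returns quickly but leaves a window in which the store lags the cache.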


Concepts

Geode Concepts and Usage

• Cache

• Region

• Member

• Client Cache

• Functions

• Listeners

Concepts

• Cache

  • In-memory storage and management for your data

  • Configurable through XML, Spring, Java API or CLI

  • Collection of Regions

[Diagram: a JVM hosting a Cache that contains several Regions]

Concepts

• Region

  • Distributed java.util.Map on steroids (Key/Value)

  • Consistent API regardless of where or how data is stored

  • Observable (reactive)

  • Highly available, with redundant copies on cache member(s)

  • Querying

[Diagram: a Region inside a Cache in a JVM, shown as a java.util.Map with the entries K01 → May and K02 → Tim]

Concepts

• Region

  • Local, Replicated or Partitioned

  • In-memory or persistent

  • Redundant

  • LRU

  • Overflow

[Diagram: two JVMs, each hosting a Cache with a Region (java.util.Map) holding the entries K01 → May and K02 → Tim]

LOCAL, LOCAL_HEAP_LRU, LOCAL_OVERFLOW, LOCAL_PERSISTENT, LOCAL_PERSISTENT_OVERFLOW

PARTITION, PARTITION_HEAP_LRU, PARTITION_OVERFLOW, PARTITION_PERSISTENT, PARTITION_PERSISTENT_OVERFLOW, PARTITION_PROXY, PARTITION_PROXY_REDUNDANT, PARTITION_REDUNDANT, PARTITION_REDUNDANT_HEAP_LRU, PARTITION_REDUNDANT_OVERFLOW, PARTITION_REDUNDANT_PERSISTENT, PARTITION_REDUNDANT_PERSISTENT_OVERFLOW

REPLICATE, REPLICATE_HEAP_LRU, REPLICATE_OVERFLOW, REPLICATE_PERSISTENT, REPLICATE_PERSISTENT_OVERFLOW, REPLICATE_PROXY
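To make the "partitioned" and "redundant" region types a little more concrete, here is a minimal plain-Java sketch of one way keys could be spread over members. It is not Geode's placement algorithm: the class `PartitionSketch`, the modulo hashing and the round-robin choice of hosts are assumptions made for illustration. The bucket count of 113 mirrors Geode's default number of buckets for a partitioned region.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative key-to-bucket-to-member mapping; hypothetical, not Geode's real placement. */
class PartitionSketch {

    /** Maps a key to a bucket by hashing; floorMod keeps the result non-negative. */
    static int bucketFor(Object key, int totalBuckets) {
        return Math.floorMod(key.hashCode(), totalBuckets);
    }

    /**
     * Picks the members hosting a bucket: one primary plus up to `redundantCopies`
     * further distinct members, chosen by walking the member list (an assumption).
     */
    static List<String> hostsFor(int bucket, List<String> members, int redundantCopies) {
        int copies = Math.min(redundantCopies + 1, members.size());
        List<String> hosts = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            hosts.add(members.get((bucket + i) % members.size()));
        }
        return hosts;
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList("server1", "server2", "server3");
        for (String key : Arrays.asList("K01", "K02")) {
            int bucket = bucketFor(key, 113);
            // With one redundant copy, every bucket lives on two distinct members.
            System.out.println(key + " -> bucket " + bucket + " -> " + hostsFor(bucket, members, 1));
        }
    }
}
```

With a redundancy of 1, losing any single member still leaves one copy of every bucket, which is the property the REDUNDANT region types are after.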

Concepts

• Persistent Regions

  • Durability

  • WAL for efficient writing

  • Consistent recovery

  • Compaction

[Diagram: on Member 1, Put k4->v7 is appended to Oplog3.crf as Modify k4->v7; the older Oplog2.crf holds Modify k1->v5, Create k6->v6, Create k2->v2 and Create k4->v4. Server 1 to Server N each host the Region in a Cache inside a JVM.]
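The append-only log, recovery and compaction ideas listed for persistent regions can be sketched in a few lines of plain Java. This is not Geode's oplog format or API: `OplogSketch` and its methods are hypothetical, the log lives in memory rather than in .crf files, and the initial create of k1 is assumed (the slide only shows its later modification).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for an append-only operation log; hypothetical, not Geode's oplog. */
class OplogSketch {
    /** One appended record; a null value marks a destroy. */
    static final class Record {
        final String key;
        final String value;
        Record(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<Record> log = new ArrayList<>();

    /** Writes are appended only; existing records are never updated in place. */
    void append(String key, String value) {
        log.add(new Record(key, value));
    }

    /** Recovery: replaying the log in order leaves the latest value for each key. */
    Map<String, String> recover() {
        Map<String, String> state = new LinkedHashMap<>();
        for (Record r : log) {
            if (r.value == null) {
                state.remove(r.key);
            } else {
                state.put(r.key, r.value);
            }
        }
        return state;
    }

    /** Compaction: rewrite the log with live records only; returns how many were dropped. */
    int compact() {
        Map<String, String> live = recover();
        int before = log.size();
        log.clear();
        for (Map.Entry<String, String> e : live.entrySet()) {
            log.add(new Record(e.getKey(), e.getValue()));
        }
        return before - log.size();
    }

    int size() {
        return log.size();
    }

    public static void main(String[] args) {
        OplogSketch oplog = new OplogSketch();
        oplog.append("k1", "v1");   // assumed initial create
        oplog.append("k2", "v2");
        oplog.append("k4", "v4");
        oplog.append("k1", "v5");   // modify
        oplog.append("k6", "v6");
        oplog.append("k4", "v7");   // modify
        System.out.println(oplog.recover());
        System.out.println("dropped by compaction: " + oplog.compact());
    }
}
```

Appending keeps writes sequential, and compaction reclaims the space taken by records that a later write has superseded.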

Concepts

• Member

  • A process that has a connection to the system

  • A process that has created a cache

  • Embeddable within your application

[Diagram labels: Client, Locator, Server]

Concepts

• Client cache

  • A process connected to the Geode server(s)

  • Can have a local copy of the data

  • Can be notified about events on the servers

[Diagram: an Application with a Client Cache (one Region) connected to a GemFire Server hosting Regions]

Concepts

• Functions

  • Used for distributed concurrent processing (Map/Reduce, stored procedure)

  • Highly available

  • Data oriented

  • Member oriented

[Diagram: Submit (f1); Execute Functions f1, f2, … fn]

Concepts

• Functions

[Diagram: a client with a Client Region calls FunctionService.onRegion.withFilter.execute with filter = Keys X, Y. In the Server Distributed System, Partitioned Region Data Accessors forward execute to the Partitioned Region Data Stores for X and Y (a third store holds Z), and the results flow back to the client through ResultCollector.getResult (steps 1 to 6).]
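The flow of a data-oriented function with a key filter can be imitated in plain Java: send the function only to the stores that own filtered keys, run it there in parallel, and gather the results. This is a sketch of the idea, not the Geode FunctionService API; `FunctionExecutionSketch`, `executeWithFilter` and the use of one map per data store are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Imitates filtered, parallel function execution over data stores; not Geode API. */
class FunctionExecutionSketch {

    /** Runs fn once per store that owns at least one filtered key and collects the results. */
    static <R> List<R> executeWithFilter(List<Map<String, Integer>> stores,
                                         Set<String> filter,
                                         Function<Map<String, Integer>, R> fn) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, stores.size()));
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (Map<String, Integer> store : stores) {
                Map<String, Integer> local = new HashMap<>(store);
                local.keySet().retainAll(filter);      // keep only filtered keys held here
                if (local.isEmpty()) {
                    continue;                          // this store is not involved
                }
                Callable<R> task = () -> fn.apply(local);
                pending.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : pending) {
                results.add(f.get());                  // plays the role of a result collector
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("function execution failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> stores = Arrays.asList(
                Collections.singletonMap("X", 10),
                Collections.singletonMap("Y", 20),
                Collections.singletonMap("Z", 30));
        Set<String> filter = new HashSet<>(Arrays.asList("X", "Y"));

        // Each involved store sums its own values; the store holding Z is never called.
        List<Integer> partials = executeWithFilter(stores, filter,
                data -> data.values().stream().mapToInt(Integer::intValue).sum());
        System.out.println(partials);
    }
}
```

Moving the function to the data, rather than pulling the data to the caller, is what lets this pattern scale with the number of stores.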

Concepts

• Listeners

  • CacheWriter / CacheListener

  • AsyncEventListener (queue / batch)

  • Parallel or Serial

  • Conflation
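Queueing, batching and conflation for an asynchronous listener can be illustrated with a small plain-Java queue. This is not the AsyncEventListener API: `ConflatingQueueSketch` and its methods are hypothetical, and keeping a conflated key at its original queue position is an assumption of this sketch.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Event queue that conflates by key and delivers in batches; hypothetical, not Geode API. */
class ConflatingQueueSketch {
    private final Map<String, String> pending = new LinkedHashMap<>();
    private final int batchSize;
    private final Consumer<List<Map.Entry<String, String>>> listener;

    ConflatingQueueSketch(int batchSize, Consumer<List<Map.Entry<String, String>>> listener) {
        this.batchSize = batchSize;
        this.listener = listener;
    }

    /** Conflation: a newer value for a queued key replaces the older, undelivered one. */
    void enqueue(String key, String value) {
        pending.put(key, value);
    }

    /** Delivers everything queued in batches of at most batchSize; returns the batch count. */
    int flush() {
        List<Map.Entry<String, String>> events = new ArrayList<>();
        for (Map.Entry<String, String> e : pending.entrySet()) {
            events.add(new AbstractMap.SimpleImmutableEntry<String, String>(e.getKey(), e.getValue()));
        }
        pending.clear();
        int batches = 0;
        for (int i = 0; i < events.size(); i += batchSize) {
            int end = Math.min(i + batchSize, events.size());
            listener.accept(new ArrayList<>(events.subList(i, end)));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        ConflatingQueueSketch queue =
                new ConflatingQueueSketch(2, batch -> System.out.println("batch: " + batch));
        queue.enqueue("k4", "v4");
        queue.enqueue("k1", "v1");
        queue.enqueue("k4", "v7");   // conflated with the earlier k4 event
        queue.enqueue("k6", "v6");
        System.out.println("batches delivered: " + queue.flush());
    }
}
```

Conflation trades completeness of the event history for less work downstream, which suits consumers that only care about the latest value per key.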


Core Beliefs

Performance

Consistency

Resiliency

Why Apache Geode?

Pivotal GemFire High Availability and Fault Tolerance in 6 acts

• Failing data copies are replaced transparently

• Data is replicated to other clusters and sites (WAN)

• Network segmentations (“split brain”) are identified and fixed automatically

• Client and cluster disconnections are handled gracefully

• Data is persisted on local disk for ultimate durability

• Failed function executions are restarted automatically

© Copyright 2014 Pivotal. All rights reserved.

What makes it fast?

• Minimize copying

• Minimize contention points

• Flexible consistency model

• Partitioning and parallelism

• Avoid disk seeks

• Automated benchmarks

Benchmarks and Testing

[Chart: speedup, latency (ms) and CPU % plotted against the number of server hosts (2 to 10)]


The Geode Project


Why Open Source? Why ASF?

• Open source is fundamentally changing software buying patterns

• Customers get transparency and co-development of features

• It’s the community that matters

• ASF provides a framework for open source


Geode Will Be a Significant Apache Project

• 1M+ LOC, over 1,000 person-years invested into cutting edge R&D

• Thousands of production customers in very demanding verticals

• Cutting edge use cases that have shaped product thinking

• A core technology team that has stayed together since founding

• Performance differentiators that are baked into every aspect of the product


Geode versus GemFire

• Geode is a project supported by the OSS community

• GemFire is a product from Pivotal, based on Geode source

• We donated everything but the kitchen sink*

• Development process follows “The Apache Way”

* Multi-site WAN replication, continuous queries, and native (C/C++/.NET) client

"Talk is cheap, show me the code"

• Clone & Build

git clone https://github.com/apache/incubator-geode
cd incubator-geode
./gradlew build -Dskip.tests=true

• Start a server

cd gemfire-assembly/build/install/apache-geode
./bin/gfsh
gfsh> start locator --name=locator
gfsh> start server --name=server
gfsh> create region --name=myRegion --type=REPLICATE


Stock Predictions with Apache Geode, Spark, and Spring XD

Quick intro to Apache Spark

• RDD

• Dataframe

• Driver

• Worker

"An RDD in Spark is simply an immutable distributed collection of objects. Each RDD is split into multiple partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes."

"A dataframe is a distributed collection of rows organized into named columns. An abstraction for selecting, filtering and plotting structured data (pandas), previously known as SchemaRDD."


1 - Live data is ingested into the grid

2 - Trained ML model compares new data to historical patterns

3 - Results are pushed immediately to deployed applications

4 - Re-training is triggered, updating the model with the latest historical data

[Diagram labels: Live Data, Spring XD, Apache Geode / GemFire, Machine Learning model; Data Temperature: Hot, Warm]

Machine Learning Concepts

[Diagram: a Machine Learning Model (e.g. Linear Regression) takes the features price(x), medium avg (x) and relative strength (x) and predicts the label medium avg (x+1)]
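To show what "features" and "label" mean for this model, the sketch below turns a price series into training rows of the form price(x), average(x), relative strength(x) → average(x+1). It is not the demo's code: `FeatureRowsSketch` is hypothetical, the "medium avg" is assumed to be a simple moving average over a fixed window, and relative strength is a simplified index (100 × gains / (gains + losses)) over the same window. Fitting the regression itself is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Builds (features, label) rows from prices; hypothetical names and simplified indicators. */
class FeatureRowsSketch {

    /** Mean of the `window` prices ending at index `end` (inclusive). */
    static double movingAverage(double[] prices, int end, int window) {
        double sum = 0;
        for (int i = end - window + 1; i <= end; i++) {
            sum += prices[i];
        }
        return sum / window;
    }

    /** Simplified relative strength: share of upward movement in the window, scaled to 0..100. */
    static double relativeStrength(double[] prices, int end, int window) {
        double gains = 0;
        double losses = 0;
        for (int i = end - window + 1; i <= end; i++) {
            double change = prices[i] - prices[i - 1];
            if (change > 0) {
                gains += change;
            } else {
                losses -= change;
            }
        }
        return gains + losses == 0 ? 50.0 : 100.0 * gains / (gains + losses);
    }

    /** Each row is {price(x), avg(x), relativeStrength(x), label = avg(x+1)}. */
    static List<double[]> trainingRows(double[] prices, int window) {
        List<double[]> rows = new ArrayList<>();
        for (int x = window; x < prices.length - 1; x++) {
            rows.add(new double[] {
                prices[x],
                movingAverage(prices, x, window),
                relativeStrength(prices, x, window),
                movingAverage(prices, x + 1, window)
            });
        }
        return rows;
    }

    public static void main(String[] args) {
        double[] prices = {10, 11, 12, 11, 13, 14};   // made-up sample prices
        for (double[] row : trainingRows(prices, 3)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

A regression model trained on such rows learns to map the three feature columns at time x to the label column, which is the next value of the average.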

Spring XD

• Extensible

• Open-Source

• Fault-Tolerant

• Horizontally Scalable

• Cloud-Native

[Diagram labels: Real data, Simulator, Transform, Enrich, Filter, Split, Sink, Machine Learning, Indicators, Predict, Dashboard, /Stocks, /TechIndicators, /Predictions; steps 1 to 3]

Roadmap

• Off-heap memory storage

• HDFS persistence

• Lucene indexes

• Spark connector

• Cloud Foundry service

• Distributed transactions

…and other ideas from the Geode community!


How to Get Involved

• http://geode.incubator.apache.org

• Join the mailing lists; ask a question, answer a question, learn

[email protected] [email protected]

• File a bug in JIRA

• Update the wiki, website, or documentation

• Create example applications

• Use it in your project! We need you!


Questions?

Thank you!