41
Introduction to Apache Geode (incubating) #ApacheConBigData

ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

  • Upload
    lammien

  • View
    222

  • Download
    5

Embed Size (px)

Citation preview

Page 1: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Introduction to Apache Geode (incubating)

#ApacheConBigData

Page 2: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

About us Anthony Baker (@metatype)

William Markito (@william_markito)

Page 3: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Agenda

• Introduction to Geode

• Geode concepts

• The Geode open source source project

• Demo = Geode + Docker

Page 4: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Introduction

Page 5: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Apache Geode is…

“…an in-memory, distributed database

with strong consistency

built to support low latency

transactional applications

at extreme scale.”

Page 6: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

2004 2008 2014

•  Massive increase in data volumes

•  Falling margins per transaction

•  Increasing cost of IT maintenance

•  Need for elasticity in systems

•  Financial Services Providers (every major Wall Street bank)

•  Department of Defense

•  Real Time response needs •  Time to market constraints •  Need for flexible data

models across enterprise •  Distributed development •  Persistence + In-memory

•  Global data visibility needs •  Fast Ingest needs for data •  Need to allow devices to

hook into enterprise data •  Always on

•  Largest travel Portal •  Airlines •  Trade clearing •  Online gambling

•  Largest Telcos •  Large mfrers •  Largest Payroll processor •  Auto insurance giants •  Largest rail systems on

earth

Page 7: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

China RailwayCorporation

5,700 train stations 4.5 million tickets per day

20 million daily users 1.4 billion page views per day

40,000 visits per second

*http://pivotal.io/big-data/pivotal-gemfire

Indian Railways

7,000 stations 72,000 miles of track

23 million passengers daily 120,000 concurrent users

10,000 transactions per minute

Page 8: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

World: ~7,349,000,000

~36% of the world population

Population: 1,251,695,6161,401,586,609

China RailwayCorporation

Indian Railways

Page 9: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Application patterns• Caching for speed and scale using read-through,

write-through, and write-behind

• OLTP system of record with in-memory for speed, on disk for durability

• Parallel compute grid

• Real-time analytics

Page 10: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Concepts

Page 11: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Geode concepts and usage

• Cache

• Region

• Member

• Client Cache

• Functions

• Listeners

Page 12: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Concepts

• Cache

• In-memory storage and management for your data

• Configurable through XML, Spring, Java API, or CLI

• Collection of Region

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Page 13: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Concepts• Region

• Distributed java.util.Map on steroids (key/value)

• Consistent API regardless of where or how data is stored

• Observable (reactive)

• Highly available, redundant on cache Member(s)

• Querying

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Page 14: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Concepts

• Region

• Local, replicated, or partitioned

• In-memory or persistent

• Redundant

• LRU

• Overflow

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

LOCAL  LOCAL_HEAP_LRU  LOCAL_OVERFLOW  LOCAL_PERSISTENT  

LOCAL_PERSISTENT_OVERFLOW  PARTITION  

PARTITION_HEAP_LRU  PARTITION_OVERFLOW  

PARTITION_PERSISTENT  PARTITION_PERSISTENT_OVERFLOW  

PARTITION_PROXY  PARTITION_PROXY_REDUNDANT  

PARTITION_REDUNDANT  PARTITION_REDUNDANT_HEAP_LRU  PARTITION_REDUNDANT_OVERFLOW  PARTITION_REDUNDANT_PERSISTENT  

PARTITION_REDUNDANT_PERSISTENT_OVERFLOW  REPLICATE  

REPLICATE_HEAP_LRU  REPLICATE_OVERFLOW  

REPLICATE_PERSISTENT  REPLICATE_PERSISTENT_OVERFLOW  

REPLICATE_PROXY

Page 15: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Concepts

• Persistent region

• Durability

• WAL for efficient writes

• Consistent recovery

• Compaction

Modify k1->v5

Create k6->v6

Create k2->v2

Create k4->v4 Oplog2.crf

Member 1

Modify k4->v7 Oplog3.crf

Put k4->v7

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Server 1 Server N

Page 16: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Concepts

• Member

• A process that has a connection to the cluster

• A process that has created a cache

• Embeddable within your application

Client

Locator

Server

Page 17: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Concepts• Client cache

• A process connected to the Geode server(s)

• Can have a local copy of the data

• Can be notified about events in the cluster

Application

GemFire Server

Region

Region

Region Client Cache

Page 18: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Concepts• Functions

• Used for distributed concurrent processing (Map/Reduce, stored procedure, data parallel, …)

• Highly available

• Data-oriented, member-oriented

Submit (f1)

f1 , f2 , … fn

Execute Functions

Page 19: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Concepts• Listeners

• CacheWriter / CacheListener

• AsyncEventListener (queue / batch)

• Parallel or Serial

• Conflation

Page 20: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

© Copyright 2014 Pivotal. All rights reserved.

Pivotal GemFire High Availability and Fault Tolerance in 6 acts

Failing data copies are replaced transparently

Data is replicated to other clusters and sites (WAN)

Network segmentations are identified and fixed automatically

Client and cluster disconnections are handled gracefully

Data is persisted on local disk for ultimate durability

“split brain”

Failed function executions are restarted automatically

restart

Page 21: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

What makes it go fast?

• Minimize copying

• Minimize contention points

• Flexible consistency model

• Partitioning and parallelism

• Avoid disk seeks

• Automated benchmarks

Page 22: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

0

2

4

6

8

10

12

14

16

18

0

1

2

3

4

5

6

2 4 6 8 10

Spee

dup

Server(Hosts

speedup

latency4(ms)

CPU4%

Horizontal scaling for reads, consistent latency and CPU

Page 23: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Fixed or flexible schema?

id name age pet_id

or

{      id      :  1,      name  :  “Fred”,      age    :  42,      pet    :  {          name  :  “Barney”,          type  :  “dino”      }  }

Page 24: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

But how to serialize data?

Page 25: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

C#, C++, Java, JSON

No IDL, no schemas, no hand-coding

* domain object classes not required

|                        header                        |              data              |  |  pdx  |  length  |  dsid  |  typeid  |  fields  |  offsets  |

Portable Data eXchange

Page 26: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Efficient for queries

{      id      :  1,      name  :  “Fred”,      age    :  42,      pet    :  {          name  :  “Barney”,          type  :  “dino”      }  }

SELECT  p.name  FROM  /Person  p  WHERE  p.pet.type  =  “dino”

single field deserialization

Page 27: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Distributed type registryPerson  p1  =  …  region.put(“Fred”,  p1); Member A Member B

Distributed Type Definitions

Page 28: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Distributed type registryPerson  p1  =  …  region.put(“Fred”,  p1); Member A Member B

Distributed Type Definitions

automatic definition

Page 29: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Distributed type registryPerson  p1  =  …  region.put(“Fred”,  p1); Member A Member B

Distributed Type Definitions

automatic definition

replicate serialized data containing typeId

Page 30: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Distributed type registryPerson  p1  =  …  region.put(“Fred”,  p1); Member A Member B

Distributed Type Definitions

automatic definition

replicate serialized data containing typeId

Page 31: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Schema evolutionMember A Member B

Distributed Type Definitions

v2v1

Application #1

Application #2

v2 objects preserve data from missing fields

v1 objects use default values to fill in new fields

PDX provides forwards and backwards compatibility, no code required

Page 32: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Show me the code!

git  clone  https://github.com/apache/incubator-­‐geode  cd  incubator-­‐geode./gradlew  build  -­‐Dskip.tests=true

cd  gemfire-­‐assembly/build/install/apache-­‐geode    ./bin/gfsh    gfsh>  start  locator  -­‐-­‐name=locator    gfsh>  start  server  -­‐-­‐name=server    gfsh>  create  region  -­‐-­‐name=myRegion  -­‐-­‐type=REPLICATE

Clone and build:

Start a cluster:

Page 33: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

The Geode Project

Page 34: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Why OSS? Why ASF?• Open source is fundamentally changing software

buying patterns

• Customers avoid vendor lock-in and get transparency, co-development of features

• “Open source is where ecosystems are built”

• It’s the community that matters

• ASF provides a framework for open source

Page 35: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Some context• 1M+ LOC, over a 1000 person-years invested into cutting

edge R&D

• Thousands of production customers in very demanding verticals

• Cutting edge use cases that have shaped product design

• A core technology team that has stayed together since founding

• Performance differentiators baked into every aspect of the product

Page 36: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Geode vs GemFire• Geode is project supported by the OSS community

• GemFire is a product from Pivotal, based on Geode source

• We donated everything but the kitchen sink*

• Development process follows “The Apache Way”

* Multi-site WAN replication, continuous queries, native (C++, .NET) client

Page 37: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

&

Page 38: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Active development• Off-heap memory storage

• HDFS persistence

• Lucene indexes

• Spark connector

• Transactions on distributed data

… and other ideas from the Geode community!

Page 39: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

How to get involved• Join the mailing lists; ask a question, answer a question,

learn

[email protected]@geode.incubator.apache.org

• File a bug in JIRA

• Update the wiki, website, or docs

• Create an example application

• Use it in your project! We need you!

Page 40: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Thank you!http://geode.incubator.apache.org

@ApacheGeode

Page 41: ApacheConBigData - Introducing Apache Geode - final · PDF file• GemFire is a product from Pivotal, based on Geode source

Building a Highly-Scalable Open-Source Real-time Streaming Analytics System Using Spark SQL, Apache Geode (incubating), SpringXD and Apache Zeppelin

(incubating)Room: Tas - 15:00, Sep 29

Fred Melo, Pivotal, William Markito, Pivotal

Implementing a Highly Scalable In-Memory Stock Prediction System with Apache Geode (incubating), R, SparkML and Spring XD

Room: Tohotom - 14:30, Sep 30Fred Melo, Pivotal, William Markito, Pivotal