
Make 2016 your year of SMACK talk

Page 1: Make 2016 your year of SMACK talk

SMACK: Spark, Mesos, Akka, Cassandra, Kafka

Who are we?

© 2015. All Rights Reserved.

Joe Stein (@allthingshadoop): CEO, Elodina

Jon Haddad (@rustyrazorblade): Technical Evangelist, DataStax

Patrick McFadin (@PatrickMcFadin): Chief Evangelist, DataStax

XML

The problem in a huge nutshell

• 75 data formats
• Process data in flight with a tight SLA / real-time analysis of data to determine pricing
• Scalable storage
• Deploy a lot of services reliably
• Batch analytics
• Multiple data centers (oh, and by the way, this has to work across multiple DCs on several continents)

Kafka decouples data pipelines

Topics & Partitions

A high-throughput distributed messaging system rethought as a distributed commit log.
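To make the "distributed commit log" idea and the topic/partition split concrete, here is a toy, single-process Python sketch. It is not Kafka's API: `Topic`, `Consumer`, and the CRC32 partitioner are hypothetical stand-ins (Kafka's default partitioner hashes keys with murmur2). It shows two properties the slides rely on: records with the same key land in the same partition in order, and each consumer tracks its own offsets, so adding a reader never affects producers or other readers.

```python
import zlib
from collections import defaultdict


class Topic:
    """Toy in-memory topic: a fixed number of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Same key -> same partition, which preserves per-key ordering.
        # CRC32 is a stand-in for the hash a real partitioner would use.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        """Append a record and return its (partition, offset)."""
        p = self.partition_for(key)
        log = self.partitions[p]
        log.append((key, value))
        return p, len(log) - 1


class Consumer:
    """Reads every partition from its own offsets; reading never removes data."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)  # partition -> next offset to read

    def poll(self):
        records = []
        for p, log in enumerate(self.topic.partitions):
            start = self.offsets[p]
            for offset, (key, value) in enumerate(log[start:], start):
                records.append((p, offset, key, value))
            self.offsets[p] = len(log)
        return records


# Usage: two independent consumers read the same six records.
prices = Topic("prices", num_partitions=3)
for i in range(6):
    prices.append(f"sku-{i % 2}", i)

pricing, archiver = Consumer(prices), Consumer(prices)
print(len(pricing.poll()), len(archiver.poll()))  # 6 6
print(pricing.poll())  # [] (already caught up)
```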

Spark Streaming - Micro Batching

DStream
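Spark Streaming discretizes a continuous stream into small batches, one per batch interval, and a DStream is the resulting sequence of per-interval batches (each one an RDD in Spark). The plain-Python sketch below mimics only that grouping step. `micro_batches` is a hypothetical helper, not a Spark API, and it assumes timestamps are in seconds, start at 0, and that an interval with no events still yields an (empty) batch.

```python
from collections import defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-width batches.

    Returns a list of (batch_start_time, [values]) covering every interval from
    time 0 up to the last event, including intervals that received nothing.
    """
    if not events:
        return []
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[int(ts // batch_interval)].append(value)
    last = max(buckets)
    return [(i * batch_interval, buckets.get(i, [])) for i in range(last + 1)]


events = [(0.2, "a"), (0.9, "b"), (1.5, "c"), (3.1, "d")]
print(micro_batches(events, batch_interval=1))
# [(0, ['a', 'b']), (1, ['c']), (2, []), (3, ['d'])]
```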

Sliding Windows
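A sliding window recomputes an aggregate over the last few batches every time the window slides; in Spark Streaming both the window length and the slide interval must be multiples of the batch interval. The sketch below counts keys per window in plain Python, roughly what `reduceByKeyAndWindow` produces for a word count. `windowed_counts` is a hypothetical helper and measures both parameters in batches rather than seconds.

```python
from collections import Counter


def windowed_counts(batches, window_len, slide):
    """Count keys over a sliding window of batches.

    batches: list of per-interval key lists (oldest first).
    window_len, slide: window length and slide interval, measured in batches.
    Returns (index_of_last_batch_in_window, Counter) for each window position.
    """
    results = []
    for end in range(slide - 1, len(batches), slide):
        start = max(0, end - window_len + 1)  # early windows are only partly filled
        counts = Counter()
        for batch in batches[start:end + 1]:
            counts.update(batch)
        results.append((end, counts))
    return results


batches = [["a"], ["a", "b"], ["b"], ["a"]]
for end, counts in windowed_counts(batches, window_len=3, slide=2):
    print(end, dict(counts))
# 1 {'a': 2, 'b': 1}
# 3 {'a': 2, 'b': 2}
```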

Cassandra - More than one server

• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server

Chart: throughput (ops/sec) of Cassandra, HBase, Redis and MySQL, from the VLDB benchmark (RWS)

Node: Server

Token

• Each partition key is hashed to a 64-bit token value
• Consistent hashing over the range $-2^{63}$ to $2^{63}-1$
• Each node owns a range of those values
• A node's token marks the beginning of its range, which runs up to the next node's token value
• Virtual nodes break these ranges down further

The cluster: one server

Token   Range
0       0-100

The cluster: two servers

Token   Range
0       0-50
51      51-100

The cluster: four servers

Token   Range
0       0-25
26      26-50
51      51-75
76      76-100
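The lookup the four-server table describes can be sketched in a few lines. This follows the slides' simplified 0-100 ring, where a server's token is the start of its range; real Cassandra hashes partition keys with Murmur3 onto the full 64-bit range, so `token_for` (an MD5-based stand-in) and the use of the 10.0.0.x addresses here are hypothetical. The last lines show why "more capacity? add a server" is cheap: only the keys in the range the new server splits off change owner.

```python
import bisect
import hashlib

RING_SIZE = 101  # tokens 0..100, as in the slides


def token_for(key):
    """Hash a partition key onto the toy ring (MD5 stand-in for Murmur3)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE


def owner(token, ring):
    """ring: sorted list of (start_token, server). Each server owns tokens from
    its start token up to, but not including, the next server's start token."""
    starts = [start for start, _ in ring]
    return ring[bisect.bisect_right(starts, token) - 1][1]


ring = [(0, "10.0.0.1"), (26, "10.0.0.2"), (51, "10.0.0.3"), (76, "10.0.0.4")]
print(owner(15, ring))  # 10.0.0.1
print(owner(token_for("user:42"), ring))

# A fifth node starting at token 88 takes over 88-100 and nothing else.
bigger = sorted(ring + [(88, "10.0.0.5")])
moved = [t for t in range(RING_SIZE) if owner(t, ring) != owner(t, bigger)]
print(moved[0], moved[-1])  # 88 100
```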

Replication

DC1: RF=1

Node        Primary
10.0.0.1    00-25
10.0.0.2    26-50
10.0.0.3    51-75
10.0.0.4    76-100

Replication

DC1: RF=2

Node        Primary   Replica
10.0.0.1    00-25     76-100
10.0.0.2    26-50     00-25
10.0.0.3    51-75     26-50
10.0.0.4    76-100    51-75

Replication

DC1: RF=3

Node        Primary   Replica   Replica
10.0.0.1    00-25     76-100    51-75
10.0.0.2    26-50     00-25     76-100
10.0.0.3    51-75     26-50     00-25
10.0.0.4    76-100    51-75     26-50
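The replication tables follow one rule: a range is stored on its primary node plus the next RF-1 nodes clockwise around the ring, which is how Cassandra's SimpleStrategy places replicas. A short sketch, assuming the four-node DC1 ring above, that regenerates the tables (`replicas` and `stored_on` are hypothetical helper names):

```python
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]  # ring order
RANGES = ["00-25", "26-50", "51-75", "76-100"]  # RANGES[i] is primary on NODES[i]


def replicas(range_index, rf):
    """Nodes holding a range: its primary, then the next rf-1 nodes clockwise."""
    if not 1 <= rf <= len(NODES):
        raise ValueError("rf must be between 1 and the number of nodes")
    return [NODES[(range_index + k) % len(NODES)] for k in range(rf)]


def stored_on(node, rf):
    """Ranges a node holds, in the tables' column order: primary, then replicas."""
    i = NODES.index(node)
    return [RANGES[(i - k) % len(RANGES)] for k in range(rf)]


for rf in (1, 2, 3):
    print(f"DC1: RF={rf}")
    for node in NODES:
        print(node, *stored_on(node, rf))
```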

Consistency

DC1: RF=3

Node        Primary   Replica   Replica
10.0.0.1    00-25     76-100    51-75
10.0.0.2    26-50     00-25     76-100
10.0.0.3    51-75     26-50     00-25
10.0.0.4    76-100    51-75     26-50

Client: write to partition 15 (range 00-25, held by 10.0.0.1, 10.0.0.2 and 10.0.0.3)
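What happens to the write for partition 15 depends on the consistency level the client asks for. A minimal sketch, assuming the RF=3 ring above and the standard single-DC levels ONE, QUORUM (RF // 2 + 1) and ALL: the write goes to all three replicas, and the coordinator reports success once enough of them acknowledge. The function names are hypothetical.

```python
import bisect

STARTS = [0, 26, 51, 76]
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]


def replicas_for(token, rf=3):
    """Primary owner of the token's range plus the next rf-1 nodes clockwise."""
    i = bisect.bisect_right(STARTS, token) - 1
    return [NODES[(i + k) % len(NODES)] for k in range(rf)]


def required_acks(level, rf):
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]


def write_succeeds(level, rf, live_replicas):
    """Success once at least `required_acks` replicas have acknowledged."""
    return live_replicas >= required_acks(level, rf)


print(replicas_for(15))  # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
# With one of those three replicas down:
for level in ("ONE", "QUORUM", "ALL"):
    print(level, write_succeeds(level, rf=3, live_replicas=2))
# ONE True / QUORUM True / ALL False
```

Because a QUORUM write and a QUORUM read each need 2 of the 3 replicas, they always share at least one node, which is why QUORUM for both reads and writes returns the latest acknowledged value.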

Batch Analytics

DataFrames

• Abstraction over RDDs
• Modeled after pandas & R
• Structured data
• Python passes commands only
• Commands are pushed down
• Goal: data never leaves the JVM
• You can still use the RDD if you want
• Operations are lazy

Diagram: RDD and DataFrame

SparkSQL

# Assumes `sql` is a SQLContext and `movies` / `ratings` are existing DataFrames
movies.registerTempTable("movie")
ratings.registerTempTable("rating")
sql.sql("""
    select title, avg(rating) as avg_rating
    from movie join rating on movie.movie_id = rating.movie_id
    group by title
    order by avg_rating DESC
    limit 3
""")
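The same query can be sanity-checked without a Spark cluster. This sketch runs it, essentially unchanged, against Python's built-in `sqlite3` with tiny made-up `movie` and `rating` tables (the rows and titles are hypothetical); it checks the SQL logic only, not Spark's execution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE rating (movie_id INTEGER, rating REAL);
""")
# Hypothetical sample rows.
conn.executemany("INSERT INTO movie VALUES (?, ?)",
                 [(1, "Movie A"), (2, "Movie B"), (3, "Movie C"), (4, "Movie D")])
conn.executemany("INSERT INTO rating VALUES (?, ?)",
                 [(1, 4.0), (1, 5.0), (2, 3.0), (3, 5.0), (4, 2.0), (4, 3.0)])

top3 = conn.execute("""
    select title, avg(rating) as avg_rating
    from movie join rating on movie.movie_id = rating.movie_id
    group by title
    order by avg_rating DESC
    limit 3
""").fetchall()
print(top3)  # [('Movie C', 5.0), ('Movie A', 4.5), ('Movie B', 3.0)]
```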

Notebooks

Visualizations

Apache Mesos

Static Partitioning

Better Option

Kernel For Your Datacenter

Mesos

Schedulers

Executors
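The Mesos slides describe two-level scheduling: the Mesos master offers each framework the spare resources on its agents, the framework's scheduler decides which offers to use and for which tasks, and executors on the agents run those tasks. The toy sketch below models only the scheduler's side of that decision. `Offer`, `Task`, and `ToyScheduler` are hypothetical names, not the Mesos API, and first-fit placement is just one possible policy.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem: int  # MB


@dataclass
class Task:
    name: str
    cpus: float
    mem: int  # MB


class ToyScheduler:
    """Framework-side scheduler: accepts offers that fit pending tasks, declines the rest."""

    def __init__(self, pending):
        self.pending = list(pending)
        self.launched = []  # (agent, task name); an executor would run these on the agent

    def resource_offers(self, offers):
        """First-fit: pack pending tasks into each offer; return agents whose offers were declined."""
        declined = []
        for offer in offers:
            cpus, mem = offer.cpus, offer.mem
            still_pending = []
            for task in self.pending:
                if task.cpus <= cpus and task.mem <= mem:
                    cpus -= task.cpus
                    mem -= task.mem
                    self.launched.append((offer.agent, task.name))
                else:
                    still_pending.append(task)
            if len(still_pending) == len(self.pending):
                declined.append(offer.agent)  # nothing fit; resources go back to the master
            self.pending = still_pending
        return declined


scheduler = ToyScheduler([Task("web", 0.5, 512), Task("batch", 0.5, 512), Task("big", 4.0, 8192)])
declined = scheduler.resource_offers([Offer("agent-1", 1.0, 1024), Offer("agent-2", 0.5, 256)])
print(scheduler.launched)  # [('agent-1', 'web'), ('agent-1', 'batch')]
print(declined)            # ['agent-2']
```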

Making Kafka Elastic with Mesos

Goals we set out with

• Smart broker.id assignment
• Preservation of broker placement (through constraints and/or new features)
• Ability to make configuration changes
• Rolling restarts (for things like configuration changes)
• Scaling the cluster up and down, with automatic, programmatic, and manual options
• Smart partition assignment via constraints vis-à-vis roles, resources, and attributes

Mesos/Kafka

https://github.com/mesos/kafka

Scheduler & Executor

Scheduler
• Provides the operational automation for a Kafka cluster
• Manages changes to the brokers' configuration
• Exposes a REST API for the CLI or any other client to use
• Runs on Marathon for high availability

Executor
• Interacts with the Kafka broker as an intermediary to the scheduler

CLI and REST API

• scheduler - starts the scheduler
• add - adds one or more brokers to the cluster
• update - changes the resources, constraints, or broker properties of one or more brokers
• remove - takes a broker out of the cluster
• start - starts a broker
• stop - either shuts a broker down gracefully or force-kills it (see ./kafka-mesos.sh help stop)
• rebalance - rebalances a cluster, selecting either the brokers or the topics to rebalance. Manual assignment is still possible using the Apache Kafka project tools. Rebalance can also change the replication factor of a topic
• help - ./kafka-mesos.sh help or ./kafka-mesos.sh help {command}

Launch 20 brokers in seconds

./kafka-mesos.sh add 1000..1019 --cpus 0.01 --heap 128 --mem 256 --options num.io.threads=1

./kafka-mesos.sh start 1000..1019
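A quick back-of-the-envelope check of what that `add` command asks Mesos for. This assumes `1000..1019` is an inclusive id range (20 brokers, matching the slide title) and that `--mem` and `--heap` are in megabytes; `expand_ids` and `footprint` are hypothetical helpers, not part of kafka-mesos.

```python
def expand_ids(expr):
    """Expand an inclusive range such as '1000..1019' (or a single id) into broker ids."""
    if ".." in expr:
        lo, hi = expr.split("..")
        return list(range(int(lo), int(hi) + 1))
    return [int(expr)]


def footprint(expr, cpus, mem_mb, heap_mb):
    """Total resources requested for every broker in `expr`."""
    if heap_mb > mem_mb:
        raise ValueError("the JVM heap must fit inside each broker's memory allocation")
    ids = expand_ids(expr)
    return {"brokers": len(ids), "cpus": round(len(ids) * cpus, 4), "mem_mb": len(ids) * mem_mb}


print(footprint("1000..1019", cpus=0.01, mem_mb=256, heap_mb=128))
# {'brokers': 20, 'cpus': 0.2, 'mem_mb': 5120}
```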

Zipkin http://zipkin.io/

Apache Mesos Framework https://github.com/elodina/sawfly/blob/master/tristan.md

LinkedIn Simoorg https://github.com/linkedin/simoorg

Apache Mesos Framework https://github.com/elodina/sawfly/blob/master/pisaura.md

Multiple Data Centers?

Multi-datacenter

DC1: RF=3

Node        Primary   Replica   Replica
10.0.0.1    00-25     76-100    51-75
10.0.0.2    26-50     00-25     76-100
10.0.0.3    51-75     26-50     00-25
10.0.0.4    76-100    51-75     26-50

DC2: RF=3

Node        Primary   Replica   Replica
10.1.0.1    00-25     76-100    51-75
10.1.0.2    26-50     00-25     76-100
10.1.0.3    51-75     26-50     00-25
10.1.0.4    76-100    51-75     26-50

Client: write to partition 15
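With RF=3 in each of two data centers, a write to partition 15 ends up on six replicas, three per DC, and the consistency level the client picks decides whether losing a DC blocks writes. A sketch under these assumptions, using Cassandra's QUORUM (majority of all replicas), LOCAL_QUORUM (majority in the coordinator's DC) and EACH_QUORUM (majority in every DC) semantics; the `write_ok` helper is hypothetical:

```python
RF_BY_DC = {"DC1": 3, "DC2": 3}


def quorum(n):
    return n // 2 + 1


def write_ok(level, acks_by_dc, rf_by_dc=RF_BY_DC, local_dc="DC1"):
    """Return True if the acknowledgements received satisfy the consistency level."""
    if level == "QUORUM":  # majority of all replicas, in any DC
        return sum(acks_by_dc.values()) >= quorum(sum(rf_by_dc.values()))
    if level == "LOCAL_QUORUM":  # majority within the coordinator's DC only
        return acks_by_dc.get(local_dc, 0) >= quorum(rf_by_dc[local_dc])
    if level == "EACH_QUORUM":  # majority within every DC
        return all(acks_by_dc.get(dc, 0) >= quorum(rf) for dc, rf in rf_by_dc.items())
    raise ValueError(f"unsupported level: {level}")


# DC2 is unreachable; all three DC1 replicas acknowledge.
acks = {"DC1": 3, "DC2": 0}
for level in ("LOCAL_QUORUM", "QUORUM", "EACH_QUORUM"):
    print(level, write_ok(level, acks))
# LOCAL_QUORUM True / QUORUM False / EACH_QUORUM False
```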

Data Protection

• No longer OK to ship EU data to the US under “Safe Harbour”

Diagram: Product_Catalog is replicated with RF=3 in every data center; EU_Customer_Data has RF=3 in the EU data centers and RF=0 in the US data center.
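In Cassandra, per-DC replication like this is set per keyspace with NetworkTopologyStrategy, and a data center left out of the replication map gets no replicas (the RF=0 case). The sketch below only builds the CQL strings; the data center names `EU` and `US` are hypothetical and would have to match the names your snitch reports.

```python
def create_keyspace_cql(name, rf_by_dc):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    Data centers with RF=0 are omitted from the map, so no replicas are placed there.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += [f"'{dc}': {rf}" for dc, rf in rf_by_dc.items() if rf > 0]
    return f"CREATE KEYSPACE {name} WITH replication = {{{', '.join(options)}}};"


print(create_keyspace_cql("product_catalog", {"EU": 3, "US": 3}))
# CREATE KEYSPACE product_catalog WITH replication = {'class': 'NetworkTopologyStrategy', 'EU': 3, 'US': 3};
print(create_keyspace_cql("eu_customer_data", {"EU": 3, "US": 0}))
# CREATE KEYSPACE eu_customer_data WITH replication = {'class': 'NetworkTopologyStrategy', 'EU': 3};
```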
