Zebus Building a P2P Service Bus Kévin Lovato - @alprema

Zebus - Pitfalls of a P2P service bus

Page 1: Zebus - Pitfalls of a P2P service bus

Zebus Building a P2P Service Bus

Kévin Lovato - @alprema

Page 2: Zebus - Pitfalls of a P2P service bus

What is an ESB?

Page 3: Zebus - Pitfalls of a P2P service bus

Enterprise Service Bus 101

[Diagram: Application 1, Application 2 and Application 3 are connected to the Bus; an InterestingMessage travels between the applications through the Bus.]

Page 4: Zebus - Pitfalls of a P2P service bus

What we had

Page 5: Zebus - Pitfalls of a P2P service bus

QPID (broker): single instance on a single server

Application 1

Application 2 Application 3

Page 6: Zebus - Pitfalls of a P2P service bus

Why change?

Page 7: Zebus - Pitfalls of a P2P service bus

New requirements

• More resilience

• Better latency

• Better throughput

Page 8: Zebus - Pitfalls of a P2P service bus

How it works

Page 9: Zebus - Pitfalls of a P2P service bus

Basics

• Communication: Zmq + ProtoBuf

• Discovery: All-knowing Directory service

• Storage: Cassandra (pluggable)

Page 10: Zebus - Pitfalls of a P2P service bus

Directory

Peer 2 Peer 3

Peer 1 needs to connect to the bus

Startup

Page 11: Zebus - Pitfalls of a P2P service bus

Peer 1 Peer 2 Peer 3

Register Peers list + Subscriptions

Startup

Directory

Page 12: Zebus - Pitfalls of a P2P service bus

Peer 1 Peer 2 Peer 3

New Peer information

Startup

Directory

Page 13: Zebus - Pitfalls of a P2P service bus

Peer 1 Peer 2 Peer 3

Direct communication

Startup

Directory
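
The startup sequence on the last four slides (register, receive the peers list and subscriptions, notify the other peers, then talk directly) can be sketched as follows. This is a minimal in-memory illustration, not Zebus code: `PeerDescriptor`, `InMemoryDirectory` and `PeerStarted` are hypothetical names, and the network transport is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of a peer: identity, endpoint and subscriptions.
public class PeerDescriptor
{
    public PeerDescriptor(string peerId, string endPoint, IReadOnlyList<string> subscriptions)
    {
        PeerId = peerId;
        EndPoint = endPoint;
        Subscriptions = subscriptions;
    }

    public string PeerId { get; }
    public string EndPoint { get; }
    public IReadOnlyList<string> Subscriptions { get; }
}

// Hypothetical in-memory stand-in for the Directory service.
public class InMemoryDirectory
{
    private readonly Dictionary<string, PeerDescriptor> _peers =
        new Dictionary<string, PeerDescriptor>();

    // Raised so that peers already on the bus learn about the newcomer.
    public event Action<PeerDescriptor> PeerStarted = delegate { };

    // Registers the new peer and returns the peers (with their subscriptions)
    // that were known before it joined.
    public IReadOnlyList<PeerDescriptor> Register(PeerDescriptor newPeer)
    {
        var knownPeers = _peers.Values
            .Where(peer => peer.PeerId != newPeer.PeerId)
            .ToList();

        _peers[newPeer.PeerId] = newPeer;
        PeerStarted(newPeer);
        return knownPeers;
    }
}
```

With the returned list, the new peer knows the endpoint and subscriptions of every other peer and can send to them directly, without going through the Directory.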

Page 14: Zebus - Pitfalls of a P2P service bus

Persistence

Peer 1 Peer 2 Message

Message persistence

Message copy

Store

Page 15: Zebus - Pitfalls of a P2P service bus

Persistence

Peer 1 Peer 2

Message persistence

Message-handling ack

"Delete"
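
The two persistence slides (store a copy, delete it on ack) can be sketched as a small bookkeeping class. This is an illustration only: `PendingMessageStore` is a hypothetical name and a dictionary stands in for the real storage (Cassandra in the deck).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the persistence store.
public class PendingMessageStore
{
    private readonly Dictionary<(string PeerId, Guid MessageId), byte[]> _pending =
        new Dictionary<(string PeerId, Guid MessageId), byte[]>();

    // Number of copies still waiting for a message-handling ack.
    public int PendingCount => _pending.Count;

    // Keeps the copy of a message that was sent to targetPeerId.
    public void Store(string targetPeerId, Guid messageId, byte[] payload)
    {
        _pending[(targetPeerId, messageId)] = payload;
    }

    // Deletes the copy once the target peer acknowledges it handled the message.
    // Returns false if no matching copy was stored.
    public bool Ack(string targetPeerId, Guid messageId)
    {
        return _pending.Remove((targetPeerId, messageId));
    }
}
```

Whatever is still in the store for a peer is, by construction, a message that peer has not acknowledged yet.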

Page 16: Zebus - Pitfalls of a P2P service bus

How we use it

Page 17: Zebus - Pitfalls of a P2P service bus

Sending a command

private void SendMessage()
{
    _bus.Send(new SendChatMessageCommand("Hello world", "me"));
}

Page 18: Zebus - Pitfalls of a P2P service bus

Receiving a command

public class ChatMessageHandler : IMessageHandler<SendChatMessageCommand>
{
    public void Handle(SendChatMessageCommand chatMessageCommand)
    {
        Console.WriteLine(chatMessageCommand.Message);
    }
}

Page 19: Zebus - Pitfalls of a P2P service bus

Dynamic subscription

private void MakeSubscriptionForMe()
{
    var constraint = Subscription.Matching<Msg>(msg => msg.Author == "me");
    _bus.Subscribe(new[] { constraint }, msg => DoTheThing(msg));
}

Page 20: Zebus - Pitfalls of a P2P service bus

Configuring the bus

private IBus ConfigureAndStartBus()
{
    return new BusFactory()
        .WithConfiguration("tcp://directory:129", "Prod")
        .CreateAndStartBus();
}

Page 21: Zebus - Pitfalls of a P2P service bus

See https://github.com/Abc-Arbitrage/Zebus.Samples for more examples

Page 22: Zebus - Pitfalls of a P2P service bus

Design pitfalls

Mistakes you should avoid when designing a distributed system

Page 23: Zebus - Pitfalls of a P2P service bus

Proper bus shutdown

Peer 1 Peer 2

Network Zmq layer Zmq layer

Normal operations

Page 24: Zebus - Pitfalls of a P2P service bus

Proper bus shutdown

Peer 1 Peer 2

Network Zmq layer Zmq layer

Shutdown phase

Page 25: Zebus - Pitfalls of a P2P service bus

Proper bus shutdown

Peer 1 Peer 2

Network Zmq layer Zmq layer

Shutdown phase

Messages in the Zmq buffers are not guaranteed to be processed

Page 26: Zebus - Pitfalls of a P2P service bus

Proper bus shutdown

Peer 1 Peer 2

Solution

EndOfStream message

Use an end-of-stream handshake before closing streams

Page 27: Zebus - Pitfalls of a P2P service bus

Proper bus shutdown

Peer 1 Peer 2

Solution

EndOfStreamAck

Use an end-of-stream handshake before closing streams
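
A minimal sketch of the handshake, assuming two in-memory queues in place of the Zmq sockets. The marker strings and the `EndOfStreamHandshake` class are hypothetical; only the ordering matters: the sender closes its stream after the receiver has confirmed that everything before the marker was handled.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class EndOfStreamHandshake
{
    private const string EndOfStream = "<EndOfStream>";       // hypothetical marker
    private const string EndOfStreamAck = "<EndOfStreamAck>"; // hypothetical marker

    // Sends the messages, performs the handshake, and returns what the receiver handled.
    public static List<string> Run(IEnumerable<string> messages)
    {
        using var outbound = new BlockingCollection<string>(); // Peer 1 -> Peer 2
        using var inbound = new BlockingCollection<string>();  // Peer 2 -> Peer 1
        var handled = new List<string>();

        // Peer 2: handles messages in order until it sees the marker, then acks.
        var receiver = Task.Run(() =>
        {
            foreach (var message in outbound.GetConsumingEnumerable())
            {
                if (message == EndOfStream)
                {
                    inbound.Add(EndOfStreamAck);
                    break;
                }
                handled.Add(message);
            }
        });

        // Peer 1: sends its messages, then the marker as the very last item.
        foreach (var message in messages)
            outbound.Add(message);
        outbound.Add(EndOfStream);

        // Peer 1 waits for the ack; only then does it close the stream.
        var reply = inbound.Take();
        if (reply != EndOfStreamAck)
            throw new InvalidOperationException("Unexpected reply: " + reply);

        outbound.CompleteAdding();
        receiver.Wait();
        return handled;
    }
}
```

Because the marker travels through the same ordered channel as the messages, receiving the ack implies that nothing is left in the buffers.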

Page 28: Zebus - Pitfalls of a P2P service bus

Directory fault tolerance

Register Peers list + Subscriptions

Naive implementation

Peer 1

Directory 1

Page 29: Zebus - Pitfalls of a P2P service bus

Directory fault tolerance

The directory became the new SPOF

Directory down

Peer 1

Page 30: Zebus - Pitfalls of a P2P service bus

Directory fault tolerance

Peer 1

Solution

Directory 1 Directory 2

Redundant directories, state synchronized using Cassandra

Page 31: Zebus - Pitfalls of a P2P service bus

Out of order subscriptions

Peer 1

Directory 1 Directory 2

Naive implementation

Subscriptions update A

Page 32: Zebus - Pitfalls of a P2P service bus

Out of order subscriptions

Peer 1

Directory 1 Directory 2

Naive implementation

Subscriptions update B

Page 33: Zebus - Pitfalls of a P2P service bus

Out of order subscriptions

Peer 1

Directory 1 Directory 2

Out of order processing

For unknown reasons, Directory 2 does the update first

Page 34: Zebus - Pitfalls of a P2P service bus

Out of order subscriptions

Peer 1

Directory 1 Directory 2

Directory 1 catches up and overwrites Directory 2’s update

Out of order processing

Page 35: Zebus - Pitfalls of a P2P service bus

Out of order subscriptions

Peer 1

Directory 1 Directory 2

The state is incorrect

Out of order processing

Page 36: Zebus - Pitfalls of a P2P service bus

Out of order subscriptions

Peer 1

Directory 1 Directory 2

Solution

Update A with client timestamp

Update B with client timestamp

Page 37: Zebus - Pitfalls of a P2P service bus

Out of order subscriptions

Peer 1

Directory 1 Directory 2

Solution

Use the client timestamp to let Cassandra resolve the conflict
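
Cassandra keeps, for a given cell, the write with the highest timestamp, so stamping each update on the client makes the outcome independent of which Directory processes it first. The sketch below models that rule in memory; `LastWriteWinsStore` is a hypothetical class, not the Cassandra driver, and it simplifies ties by keeping the first write.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of last-write-wins conflict resolution.
public class LastWriteWinsStore
{
    private readonly Dictionary<string, (long Timestamp, string Value)> _cells =
        new Dictionary<string, (long Timestamp, string Value)>();

    // Applies the write only if its client timestamp is newer than the stored one.
    // Returns false when the update is stale and has been ignored.
    public bool Write(string key, string value, long clientTimestamp)
    {
        if (_cells.TryGetValue(key, out var current) && current.Timestamp >= clientTimestamp)
            return false;

        _cells[key] = (clientTimestamp, value);
        return true;
    }

    public string Read(string key)
    {
        return _cells[key].Value;
    }
}
```

If update B (timestamp 2) is applied before update A (timestamp 1), the late arrival of A is ignored and the state stays at B.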

Page 38: Zebus - Pitfalls of a P2P service bus

Massive subscriptions

Peer 1

Directory 1

Naive implementation

Subscriptions update – 10 subscriptions

Page 39: Zebus - Pitfalls of a P2P service bus

Massive subscriptions

Peer 1

Directory 1

Directory DoS

Subscriptions update – 10 000 subscriptions (× 10 000)

Page 40: Zebus - Pitfalls of a P2P service bus

Massive subscriptions

Peer 1

Directory 1

Solution

Subscriptions update – few subscriptions (× 10 000)

Use incremental updates to mitigate load

Restructure the Directory storage
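
One way to picture an incremental update: instead of resending the full subscription list on every change, the peer sends only what was added and removed. The sketch below assumes subscriptions can be represented as strings; `SubscriptionDelta` is a hypothetical helper, not part of Zebus.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubscriptionDelta
{
    // Returns the subscriptions to add to and remove from the Directory's view of a peer.
    public static (List<string> Added, List<string> Removed) Compute(
        IEnumerable<string> previous, IEnumerable<string> current)
    {
        var previousSet = new HashSet<string>(previous);
        var currentSet = new HashSet<string>(current);

        // Sorted only to make the result deterministic.
        var added = currentSet.Except(previousSet)
            .OrderBy(s => s, StringComparer.Ordinal).ToList();
        var removed = previousSet.Except(currentSet)
            .OrderBy(s => s, StringComparer.Ordinal).ToList();

        return (added, removed);
    }
}
```

A peer holding 10 000 subscriptions that adds one more then sends a one-item update rather than a 10 001-item list.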

Page 41: Zebus - Pitfalls of a P2P service bus

And the list goes on…

• Fast and resilient persistent storage

Page 42: Zebus - Pitfalls of a P2P service bus

And the list goes on…

• Persistence failure handling

Page 43: Zebus - Pitfalls of a P2P service bus

And the list goes on…

• Time synchronization between peers

Page 44: Zebus - Pitfalls of a P2P service bus

And the list goes on…

• Java / .NET type conversions (Cassandra)

Page 45: Zebus - Pitfalls of a P2P service bus

Why did we do it anyway?

Page 46: Zebus - Pitfalls of a P2P service bus

• We needed something light and easy to use

Page 47: Zebus - Pitfalls of a P2P service bus

• We needed to match the existing bus API

Page 48: Zebus - Pitfalls of a P2P service bus

• Well… we probably also underestimated the madness that awaited us

Page 49: Zebus - Pitfalls of a P2P service bus

What can it do?

• Up to 140K messages per second

• ~200M messages transferred per day

• 2 years in production

• More than 60K subscriptions

• ~350 peers in production

Page 50: Zebus - Pitfalls of a P2P service bus

Want to give it a try?

https://github.com/Abc-Arbitrage/Zebus

Page 51: Zebus - Pitfalls of a P2P service bus

Or build cool stuff with us?

We’re hiring

Page 52: Zebus - Pitfalls of a P2P service bus

Questions