
1

Herald: Achieving a Global Event Notification Service

Luis Felipe Cabrera, Michael B. Jones, Marvin Theimer

Microsoft Research

2

Global Event Notification Services

• Communication via event notification (also called publish/subscribe) is well-suited for loosely-coupled eCommerce applications, as well as Internet-scale distributed applications (e.g. instant messaging and multi-player games).

• General event notification systems currently:
  – scale to tens of thousands of clients,
  – do not have global reach.

3

Internet-scale Issues

• Scaling requirements are in the millions and billions, perhaps more.

• There will (probably) not be a single organization that owns the entire event notification infrastructure. Hence a federated design is required.

• Global reach implies that failures and network partitions will be common-place.

4

Focus on the Basic Distributed Systems Primitives

• Focus on the scalability of basic message delivery and distributed state management capabilities.

• Employ a very simple message-oriented design and assume – until proven otherwise – that richer event notification semantics can be layered on top.

5

Herald Event Notification Model

[Diagram: a Creator creates a Rendezvous Point within the Herald Service (1: Create Rendezvous Point); a Subscriber subscribes to it (2: Subscribe); a Publisher publishes an event to it (3: Publish); the Rendezvous Point notifies the Subscriber (4: Notify).]
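Below is a minimal Python sketch of this four-step interaction. The class and method names (HeraldService, RendezvousPoint, create_rendezvous_point) are illustrative, not Herald's actual API.

# Sketch of the rendezvous-point model: create, subscribe, publish, notify.
class RendezvousPoint:
    def __init__(self, name):
        self.name = name
        self.subscribers = []               # callbacks registered via Subscribe

    def subscribe(self, callback):          # step 2: Subscribe
        self.subscribers.append(callback)

    def publish(self, event):               # step 3: Publish
        for notify in self.subscribers:
            notify(event)                   # step 4: Notify each subscriber

class HeraldService:
    def __init__(self):
        self.rendezvous_points = {}

    def create_rendezvous_point(self, name):   # step 1: Create Rendezvous Point
        rp = RendezvousPoint(name)
        self.rendezvous_points[name] = rp
        return rp

# Usage: a creator makes an RP, a subscriber registers, a publisher sends an event.
herald = HeraldService()
rp1 = herald.create_rendezvous_point("RP1")
rp1.subscribe(lambda event: print("notified:", event))
rp1.publish("hello")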

6

Design Criteria

• The “usual” criteria:
  – Scalability
  – Resilience
  – Self-administration
  – Timeliness

• Additional criteria:
  – Heterogeneous federation
  – Security
  – Support for disconnection
  – Partitioned operation

7

Scalability

• 10^11 Rendezvous Points (RPs)

• 10^11 publishers & subscribers in aggregate

• 10^10 publishers & subscribers per RP

• 10^10 federation members

• 10^2 events/sec/RP

8

Resilience

• “Fail last, fail least” semantics.

• Correct operation in the presence of malicious/corrupt participants.

9

Self-administration

• System decides where to place state and how to propagate information about state changes.

• System dynamically adapts to changing loads and the presence of faults and network partitions.

• No manual tuning.

10

Timeliness

• Event notification should normally take seconds, not hours.

11

Heterogeneous Federation

• Federation of machines within cooperating but mutually suspicious domains of trust.

• Federated parties may include both small and large domains.

12

Security

• Support restricted access to Herald facilities.

• Support concepts such as groups and roles.

13

Support for Disconnection

• Eventual delivery to disconnected subscribers.

• Event histories to allow a posteriori examination of the past.

14

Partitioned Operation

• Continued operation on both sides of a network partition.

• Eventual (out-of-order) delivery after partition healing.

15

Non-Goals

• What’s the “best” way to do:
  – Naming
  – Filtering
  – Complex subscription queries

• In-order delivery (except as layered on top)

16

Applying Lessons of the Internet and Web

• Assume things are broken:
  – Mutual suspicion and no dependence on correct behavior by others.

• Don’t try to fix everything:
  – All distributed state is maintained in a weakly-consistent, soft-state manner and is aged (sketched below).
  – All distributed state is incomplete and may be inaccurate.
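A small Python sketch of aged soft state, assuming a simple TTL-per-entry scheme; names (SoftStateTable, refresh, lookup) and the default TTL are illustrative, not Herald's.

import time

class SoftStateTable:
    def __init__(self, default_ttl=30.0):
        self.default_ttl = default_ttl
        self.entries = {}                   # key -> (value, expiration time)

    def refresh(self, key, value, ttl=None):
        # Owners must periodically refresh their entries to keep them alive.
        ttl = self.default_ttl if ttl is None else ttl
        self.entries[key] = (value, time.time() + ttl)

    def lookup(self, key):
        # May return None or stale data: the state is incomplete and possibly inaccurate.
        item = self.entries.get(key)
        if item is None:
            return None
        value, expires = item
        if time.time() > expires:
            del self.entries[key]           # aged out without a refresh
            return None
        return value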

17

Design Overview

• We think we only need these mechanisms:
  – Replication.
  – Overlay distribution networks.
  – Time contracts.
  – Event histories.
  – Administrative rendezvous points.

18

Replication

[Diagram: rendezvous point RP1 is replicated across Herald servers at locations L1, L2, and L3 (RP1@L1, RP1@L2, RP1@L3), while RP2 exists only at L1; publishers Pub1–Pub3 and subscribers Sub1–Sub5 each interact with a nearby replica.]
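A Python sketch of the replication idea, assuming each subscriber attaches to one replica and a publish is forwarded to every other replica of the same RP; class and method names are illustrative.

class RPReplica:
    def __init__(self, location):
        self.location = location            # e.g. "L1", "L2", "L3"
        self.local_subscribers = []
        self.peers = []                      # other replicas of the same RP

    def subscribe(self, callback):
        self.local_subscribers.append(callback)

    def publish(self, event):
        self.deliver(event)
        for peer in self.peers:              # fan out to the other replicas
            peer.deliver(event)

    def deliver(self, event):
        for notify in self.local_subscribers:
            notify(event)

# Wire up RP1 replicas at three locations and connect them pairwise.
replicas = [RPReplica(loc) for loc in ("L1", "L2", "L3")]
for r in replicas:
    r.peers = [p for p in replicas if p is not r]
replicas[1].subscribe(lambda e: print("Sub at L2 got:", e))
replicas[0].publish("event from Pub1 at L1")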

19

Overlay Distribution Networks

[Diagram: the replicas of RP1 at Herald servers L1, L2, and L3 are linked into an overlay distribution network; events published by Pub1 or Pub2 are forwarded along overlay links between replicas to reach subscribers Sub1, Sub2, and Sub4.]
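A Python sketch of event distribution over an overlay: each node forwards an event only to its overlay neighbors and suppresses duplicates by event id, so delivery does not require a full mesh among replicas. The topology and names are illustrative assumptions.

class OverlayNode:
    def __init__(self, location):
        self.location = location
        self.neighbors = []                  # overlay links, not a full mesh
        self.seen = set()                    # event ids already handled
        self.local_subscribers = []

    def receive(self, event_id, event):
        if event_id in self.seen:
            return                           # duplicate: already delivered/forwarded
        self.seen.add(event_id)
        for notify in self.local_subscribers:
            notify(event)
        for neighbor in self.neighbors:
            neighbor.receive(event_id, event)

# Chain topology L1 - L2 - L3: publishing at L1 still reaches a subscriber at L3.
l1, l2, l3 = OverlayNode("L1"), OverlayNode("L2"), OverlayNode("L3")
l1.neighbors, l2.neighbors, l3.neighbors = [l2], [l1, l3], [l2]
l3.local_subscribers.append(lambda e: print("Sub at L3 got:", e))
l1.receive("evt-1", "event from Pub1 at L1")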

20

Time Contracts

[Diagram: the Creator, Pub1, and Sub1 interact with RP1 in the Herald Service; RP1 records a time contract for each of them (e.g. Creator: 60, Pub1: 10, Sub1: 30).]
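A Python sketch of time contracts as per-party leases on RP state: each party registers for a bounded duration, renews while interested, and its state is reclaimed when the contract lapses. Names and durations are illustrative.

import time

class TimeContractedRP:
    def __init__(self):
        self.contracts = {}                  # party -> expiration time

    def register(self, party, duration):
        # e.g. register("Creator", 60); register("Pub1", 10); register("Sub1", 30)
        self.contracts[party] = time.time() + duration

    def renew(self, party, duration):
        self.contracts[party] = time.time() + duration

    def expire_stale(self):
        now = time.time()
        for party, expires in list(self.contracts.items()):
            if expires < now:
                del self.contracts[party]    # contract lapsed: state is reclaimed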

21

Event Histories

[Diagram: as before, RP1 holds time contracts for the Creator (60), Pub1 (10), and Sub1 (30); it also keeps an event history with its own time contract (History: 50) so past events remain available.]
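A Python sketch of an event history kept at an RP so a disconnected subscriber can fetch the events it missed after reconnecting; the sequence-number scheme and capacity-based retention are assumptions for illustration.

class EventHistory:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.events = []                     # list of (sequence number, event)
        self.next_seq = 0

    def record(self, event):
        self.events.append((self.next_seq, event))
        self.next_seq += 1
        if len(self.events) > self.capacity:
            self.events.pop(0)               # oldest events age out first

    def replay_since(self, last_seen_seq):
        # Returns the events a subscriber missed while disconnected.
        return [e for seq, e in self.events if seq > last_seen_seq]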

22

Administrative Rendezvous Points

[Diagram: a Name Service subscribes to the administrative rendezvous point for RP1 in the Herald Service (1: Subscribe) and is notified when RP1 changes (2: Notify(change)).]
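A Python sketch of an administrative rendezvous point: a meta-RP that publishes notifications about changes to RP1 itself, which a service such as a name service can subscribe to. Names and the example change record are illustrative.

class AdminRP:
    def __init__(self, target_name):
        self.target_name = target_name       # the RP being described, e.g. "RP1"
        self.watchers = []

    def subscribe(self, callback):           # 1: the Name Service subscribes
        self.watchers.append(callback)

    def report_change(self, change):         # 2: Notify(change) on a change to RP1
        for notify in self.watchers:
            notify(self.target_name, change)

admin_rp1 = AdminRP("RP1")
admin_rp1.subscribe(lambda rp, change: print("Name Service sees", rp, "change:", change))
admin_rp1.report_change({"replica_added": "L4"})   # hypothetical change event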

23

Engineering & Research Issues

• Baseline scalability numbers

• Dynamic system reconfiguration

• Federation and security

24

Baseline Scalability Numbers

• How scalable are single-node servers and server clusters?

• What are multicast-style delivery systems actually capable of, especially in aggregate?

25

Dynamic System Reconfiguration

• Reconfiguring distributed RP state in response to aggregate workloads and global state changes.

• Dealing with “flash crowd” loads.

• Placement of RP state to minimize the effects of network partitions and disconnection.

• Placement of RP state to enable efficient implementations of higher-level pub/sub semantics.

26

Federation and Security

• Can we define simple, open protocols?

• Will we need heavy-weight mechanisms to deal with malicious/corrupt servers?

• How should anonymity and privacy be dealt with/supported?

27

Related Work

• Non-global event notification systems (Gryphon, Ready, Siena, …)

• Netnews

• P2P systems such as Gnutella and Farsite

• Overlay & multicast networks

• CDNs

• OceanStore

28

Conclusion

• Global event notification is emerging as a key Internet technology.

• Herald is exploring the scalability of the basic message delivery and distributed state management aspects of an event notification system:
  – Gain engineering experience with scalable pub/sub systems.
  – Explore dynamic system reconfiguration.
  – Understand the implications of federation and security.