Messaging “Just pick something”

A little about myself

● Sean Kelly
o Also known as Stabby
● I went from .NET to Ruby to Go
o But my favorite language is SQL
● Core maintainer of Tapjoy's:
o Chore - https://github.com/Tapjoy/chore
o Dynamiq - https://github.com/Tapjoy/dynamiq

● I love IPAs

Speaking of Tapjoy...

We do…
● 1.8 Billion Requests per minute
o And almost as many messages per day
● ~170 Million Jobs per day
● All on ~750 EC2 instances and private servers
● A stocked double-kegerator
o Right now: Pinner IPA / Cisco Summer of Lager

What is “Messaging”?
Like, jobs and stuff?

Messaging is...

● A way to share important events, without needing to know who's listening

● A way to handle processing events and information at a larger scale

● Not all that unlike “background jobs”
o Jobs: “I’ll do this later”
o Messaging: “Other people will do this later”

How does this fit into my app?

It sure sounds cool

Messaging and You

Let’s say you’ve got a great new app, and for a while things are fine

Monolith 1.0

Messaging and You

Eventually, you need to push work out of band

Monolith 1.2

Jobs

Messaging and You

Now you have several services, and they all need to share info

Monolith 1.5

Jobs

One-off which becomes a core part of your business

Jobs

Failed attempt at Micro Service

Reporting System

Sure, but how can you actually use Messaging?

Those weren’t even very good drawings

They didn’t have lines or anything

What types of Messaging are there?
● 1:1, traditional “Queueing”
o Basic push / pull model of doing work
o Common with asynchronous job processing
o RabbitMQ, ActiveMQ, SQS, Disque, Dynamiq, NSQ
● Fanout
o Broadcast style publishing, all listeners get a copy
o Ex: A game pushing out notifications of an update
o Most technologies with 1:1 queues support this in some way
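The difference between the two models above can be sketched in a few lines. This is an in-memory toy, not how any of the listed brokers work internally; the class names `WorkQueue` and `Fanout` are made up for illustration.

```python
from collections import deque


class WorkQueue:
    """1:1 delivery: each message is handed to exactly one puller."""

    def __init__(self):
        self._messages = deque()

    def push(self, message):
        self._messages.append(message)

    def pull(self):
        # Whoever pulls first gets the message; nobody else will see it.
        return self._messages.popleft() if self._messages else None


class Fanout:
    """Broadcast delivery: every subscriber gets its own copy."""

    def __init__(self):
        self._inboxes = []

    def subscribe(self):
        inbox = deque()
        self._inboxes.append(inbox)
        return inbox

    def publish(self, message):
        for inbox in self._inboxes:
            inbox.append(message)


if __name__ == "__main__":
    jobs = WorkQueue()
    jobs.push("resize image 17")
    print(jobs.pull(), jobs.pull())  # second pull finds nothing left

    updates = Fanout()
    phone, tablet = updates.subscribe(), updates.subscribe()
    updates.publish("game update available")
    print(list(phone), list(tablet))  # both listeners hold a copy
```

With the queue, adding workers spreads the load. With fanout, adding listeners multiplies the deliveries.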

What types of Messaging are there?
● Routing
o Intelligent fanout, routes to listeners based on message metadata
o Newsgroups: Subscribe to food.charcuterie.*, get bresaola
o RabbitMQ does this pretty well
● Streaming
o Persistent connection, constant source of raw bytes
o Twitter's Firehose is one example
o Kafka is a current popular choice
o Really popular with the Scala / Spark crowd
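Routing by pattern, as in the food.charcuterie.* example, can be sketched as below. The matching rules follow the AMQP topic convention that RabbitMQ uses ('*' stands for exactly one dot-separated word, '#' for zero or more), but `topic_matches` and `TopicRouter` are illustrative names, not part of any client library.

```python
def topic_matches(pattern, routing_key):
    """Return True if a dot-separated routing key matches a topic pattern."""

    def match(p, k):
        if not p:
            return not k  # pattern exhausted: match only if the key is too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may swallow zero or more words of the key.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


class TopicRouter:
    """Delivers each message to every handler whose pattern matches."""

    def __init__(self):
        self._bindings = []  # (pattern, handler) pairs

    def subscribe(self, pattern, handler):
        self._bindings.append((pattern, handler))

    def publish(self, routing_key, body):
        delivered = 0
        for pattern, handler in self._bindings:
            if topic_matches(pattern, routing_key):
                handler(routing_key, body)
                delivered += 1
        return delivered


if __name__ == "__main__":
    router = TopicRouter()
    router.subscribe("food.charcuterie.*", lambda key, body: print(key, body))
    router.publish("food.charcuterie.bresaola", "now in stock")
    router.publish("food.cheese.brie", "nobody is listening for this")
```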

OK, so my Apps and Services need to talk

Can’t I just stick it all in a shared database and be done with it?

No
You certainly cannot

Some things, maybe

But not everything, it just doesn’t scale that way

Why not just stick it all in a DB?

● You can share some of your data this way
o Depends on the use case, type of information
o This is outside the scope of this talk
● Databases are not designed for delivering messages
o Any “queue” tables will be ridiculously contended
o No atomic “pull” options

So, what does Tapjoy do?
You guys must have solved this, right?

At Tapjoy, we use...
● RabbitMQ
o Moves analytics events to reporting endpoints by way of a complex filesystem / S3 approach
o Single node with sharded queues
o Rabbit HA cannot handle our scale
● SNS / SQS
o SNS in some newer projects, mostly for fanout
o SQS for all traditional background jobs
● Kinesis
o Pilot integration for a new analytics pipeline
o Being supplanted by Kafka
● Kafka
o New analytics pipeline
o Used to distribute metrics to both the new endpoint and the existing one, for verification
● Dynamiq
o In-house, open source SNS/SQS-alike built on top of Riak 2.0
o Currently used to circumvent a complicated and slow legacy messaging service

But I’m not really here to talk about Tapjoy

Not entirely

I’m more interested in you

So, what do I pick?
There are so many choices, and they all seem like they’d work

I’m not really here to tell you what to pick, either!

I’d rather talk to you about how to pick, and how you integrate your choices

Distributed Systems are all about tradeoffs

Ask: What are my actual needs?

● Planning for 2 years down the road is smart
o But solutions right now get shit done
o Include a cost projection with scale estimates
● Build a prototype (or two)
o Try to iterate quickly
o Understand how you’d use whatever you choose
o Don’t be afraid to move on
o Look at multiple client libraries
Look for: docs, active repos, idiomatic code

Ask: What is my latency tolerance?

● Publishing Messages
o How much time can your app tolerate for publishing?
o What does publish latency look like during an issue?
o Consider the worst-case scenario when planning
● Consuming Messages
o Can you run multiple consumers without impacting the service?
● End to End
o How fast is the whole experience, round trip?

Ask: What level of durability?

● Client
o Batched VS Unbatched / Streaming
o Acknowledged writes
● Server
o Messages held in memory VS disk
o Messages highly-available?
o Recover from network partitions safely?
o At-Most-Once VS At-Least-Once
Exactly-Once is something of a myth
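If you settle for At-Least-Once, the consumer has to cope with the occasional duplicate. A minimal sketch of one common approach, remembering recently seen message IDs, is shown below. It assumes every message carries a unique ID and that duplicates arrive within the remembered window; `IdempotentConsumer` and its parameters are hypothetical names.

```python
from collections import OrderedDict


class IdempotentConsumer:
    """Runs the handler at most once per message ID, within a bounded memory."""

    def __init__(self, handler, max_remembered=10_000):
        self._handler = handler
        self._seen = OrderedDict()  # message_id -> True, oldest first
        self._max_remembered = max_remembered

    def receive(self, message_id, body):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        # Record the ID only after the handler succeeds, so a message whose
        # handler raised is processed again when the broker redelivers it.
        self._handler(body)
        self._seen[message_id] = True
        if len(self._seen) > self._max_remembered:
            self._seen.popitem(last=False)  # forget the oldest ID
        return True


if __name__ == "__main__":
    consumer = IdempotentConsumer(print)
    consumer.receive("msg-1", "credit 50 coins")
    consumer.receive("msg-1", "credit 50 coins")  # redelivery, ignored
```

In production the seen-set usually lives in a shared store rather than process memory, so that several consumers (and restarts) agree on what has already been handled.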

Ask: What about throughput?

● How many producing clients do you have?
● How many messages per second will they submit?
o Does message size impact performance?
● What size should the cluster be?
o Super cluster VS specialized clusters
● How many consumers does it take to keep pace?
o With room to grow
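A back-of-the-envelope way to answer the consumer question: multiply the arrival rate by the time each message takes to process, then divide by how busy you are willing to let consumers get. The sketch below plugs in the ~170 million jobs per day figure from earlier; the 50 ms processing time and the 70% utilization target are assumptions picked purely for illustration.

```python
import math


def consumers_needed(messages_per_second, seconds_per_message, target_utilization=0.7):
    """Estimate how many single-threaded consumers keep pace, with headroom."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Average number of consumers that are busy at any instant.
    busy_consumers = messages_per_second * seconds_per_message
    # Leave room to grow by not planning for 100% busy consumers.
    return math.ceil(busy_consumers / target_utilization)


if __name__ == "__main__":
    jobs_per_second = 170_000_000 / 86_400  # ~170 million jobs per day
    print(consumers_needed(jobs_per_second, 0.050))
```

Under those assumptions the estimate comes out to roughly 140 consumers. It is an average, so peaks well above the daily mean need a higher rate plugged in.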

Ask: What does failure mean?

● What does a message publishing error mean?

● What does a delay in the processing pipeline mean?

● What does a “lost” or failed message mean?
● What does a total failure of the messaging system mean?

Ask: What behavior do I want?

Is it…
● CA?
o Not distributed, will be difficult to scale past 1 box
o Traditional RDBMS systems are typically CA
● CP?
o Good if you need strongly consistent data
o Partitions can cause data unavailability
● AP?
o Good if you need complete availability
o Eventual consistency can often be “good enough”

Okay, so I lied a little bit
I’ll give you one recommendation

Do you have...

● Relatively small (< 256 KB) message sizes?
● Not so strict (~50ms) latency requirements?
● Throughput on the order of 100 million messages or less per month?
● A tolerance or capability to handle the occasional duplicate message?
● No concern around being locked into a vendor-specific technology?

Go use SNS and SQS immediately

Leave here now and just do it
It’s easy, it’s cheap (at that scale), and you don’t need to maintain it

Ok, so I picked “something”

Anything else to know?

You don’t have to choose just 1

● It’s a falsehood that you need 1 perfect technology
o Each has strengths, weaknesses, and ideal use cases
● Don’t be afraid to use something else
o If you’re lucky, your app lives long enough to see many different infrastructure needs

Avoid direct implementations

● Wrap the notion of Publishing in an interface
o Most technologies look surprisingly similar when it comes to publishing
o You can wrap this in a simple interface, and switch implementations as needed
● Consuming is usually unique per technology
o Just write a new one
o Trying to interface this part is probably more trouble than it’s worth
o Play to the unique strengths of the technology
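A minimal sketch of what such a publishing interface could look like. This is not Chore's API; `Publisher`, `InMemoryPublisher`, `StdoutPublisher` and `record_signup` are hypothetical names, and a real implementation would wrap an SQS, RabbitMQ or Kafka client behind the same `publish` method.

```python
import json
from typing import Protocol


class Publisher(Protocol):
    """Anything that can push a payload towards a named destination."""

    def publish(self, destination: str, payload: str) -> None: ...


class InMemoryPublisher:
    """Keeps messages in a list; handy in tests."""

    def __init__(self):
        self.sent = []

    def publish(self, destination, payload):
        self.sent.append((destination, payload))


class StdoutPublisher:
    """Prints messages; stands in for a real broker client."""

    def publish(self, destination, payload):
        print(f"{destination}: {payload}")


def record_signup(publisher: Publisher, user_id: int) -> None:
    # Application code only knows the interface, never the technology behind it.
    publisher.publish("user_signups", json.dumps({"user_id": user_id}))


if __name__ == "__main__":
    record_signup(StdoutPublisher(), 42)
```

Swapping technologies then means writing one new class with a `publish` method, while every call site stays untouched.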

Interfacing your Messaging choices
● Sending messages is often as simple as a name and a chunk of data
o Define a simple interface for pushing arbitrary data towards a named endpoint
o A name and a string of JSON is usually enough to get going
o At Tapjoy, we use our Chore library to handle abstracting message publishing from messaging technologies
● Destinations are independent from messages
o You could need to switch sending messages to a new technology
o You could have 2 or more different systems depending on the information in a given message
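One way to keep destinations independent from messages is a small routing layer that maps each endpoint name to whichever backend currently serves it. The sketch below is illustrative only; `ListPublisher` stands in for real broker clients and `RoutingPublisher` is a hypothetical name.

```python
class ListPublisher:
    """Stand-in for a real backend; records what it was asked to send."""

    def __init__(self, name):
        self.name = name
        self.sent = []

    def publish(self, destination, payload):
        self.sent.append((destination, payload))


class RoutingPublisher:
    """Chooses a backend per destination name, with a fallback default."""

    def __init__(self, routes, default):
        self._routes = routes  # destination name -> backend publisher
        self._default = default

    def publish(self, destination, payload):
        backend = self._routes.get(destination, self._default)
        backend.publish(destination, payload)


if __name__ == "__main__":
    jobs_backend = ListPublisher("queue for background jobs")
    analytics_backend = ListPublisher("stream for analytics")
    publisher = RoutingPublisher({"analytics_events": analytics_backend}, jobs_backend)

    publisher.publish("analytics_events", '{"event": "install"}')
    publisher.publish("send_email", '{"user_id": 42}')
    print(len(analytics_backend.sent), len(jobs_backend.sent))
```

Moving a destination to a new technology becomes a one-line change to the routing table rather than a change to the code that publishes.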

How do I change messages safely?
● Wrap messages in a simple envelope
o Keep metadata about the message distinct from metadata about the event it describes
● Define schemas for message bodies
o Schemas give you a catalogue of message definitions, and the ability to version them
o At Tapjoy, we use our TOLL to build endpoint-agnostic clients based on schemas, and register them to use Chore publishers.
● Consumers need older schemas
o Lets them reason about how to handle older messages
o Keep a backlog of N older versions, drop support for > N
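The envelope and versioning ideas above might look like the sketch below. Every field name, the version numbers and the v2-to-v3 change (adding a currency) are invented for illustration; none of this reflects TOLL or Chore.

```python
import json
import time
import uuid

CURRENT_VERSION = 3
OLDEST_SUPPORTED = 2  # keep N older versions around, refuse anything older


def wrap(event_type, schema_version, body):
    """Put an event body inside an envelope that describes the message itself."""
    return json.dumps({
        # Metadata about the message
        "message_id": str(uuid.uuid4()),
        "published_at": time.time(),
        "type": event_type,
        "schema_version": schema_version,
        # The event, including any metadata of its own
        "body": body,
    })


def upgrade_v2_to_v3(body):
    # Hypothetical change: v3 added a currency field, so default it for v2.
    return {**body, "currency": "USD"}


UPGRADES = {2: upgrade_v2_to_v3}  # version -> function producing version + 1


def unwrap(raw):
    """Return the body in the current schema, upgrading older versions."""
    envelope = json.loads(raw)
    version = envelope["schema_version"]
    if not OLDEST_SUPPORTED <= version <= CURRENT_VERSION:
        raise ValueError(f"unsupported schema version {version}")
    body = envelope["body"]
    while version < CURRENT_VERSION:
        body = UPGRADES[version](body)
        version += 1
    return body


if __name__ == "__main__":
    old_message = wrap("purchase", 2, {"amount_cents": 500})
    print(unwrap(old_message))
```

Because consumers upgrade old bodies step by step, the rest of the consumer only ever deals with the current schema.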

In Conclusion

Keep in mind

● Distributed Systems - all about tradeoffs
o Never trade “P”
● Understand your needs
o Latency, Throughput, Availability, Durability
● Understand how it fits into your architecture
● Interfaces are your friend
o They can give you a lot of flexibility

Keep in mind

● Use schemas and versioning to support changes to messages themselves
● Just pick something
o Build a prototype, or two (or three)
o Your second try will probably go better
o SNS/SQS is a decent choice, if latency isn’t a concern
● Tapjoy is a great place to work on these kinds of problems at huge scale

Messaging
“Just pick something”

Sean Kelly
@StabbyCutyou