No bid left behind

My day to day handling a resilient real-time bidding platform in a JVM environment.

Marc de Palol, Trovit


Hey hi,

• Studied here (good to be back)

• Some research on supercomputing

• Moved to London, discovered Hadoop & data-intensive systems.

• Came back, still in the ‘Data Engineering’ stuff.

A classified search engine for property, jobs, cars, products and holiday rentals

• 180 Million ads,

• 170 TB in the cluster

• 65 Million uniques / 170 Million visits

• 10 apps (iOS, Android)

• Cool office in Barcelona.

have a look at http://www.trovit.es

Real Time Bidding

It’s about selling ads.

• Per impression basis.

• Programmatic instantaneous auction

We are using ‘DoubleClick Ad Exchange’ (Google)

• Response under 100 ms.

• If 15% of our responses are invalid or time out, the exchange progressively stops sending us bid requests.

Currently 10,000 QPS.
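The 100 ms limit can be enforced inside the bidder itself. The following is a minimal sketch, not the Trovit implementation: `BidDeadlineSketch`, `bidWithin`, the `NO_BID` marker and the 80 ms budget are all hypothetical names and values chosen for illustration, and it assumes Java 9+ for `completeOnTimeout`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class BidDeadlineSketch {
    // Hypothetical marker meaning "we decline to bid on this impression".
    static final String NO_BID = "NO_BID";

    // Runs the (hypothetical) bid computation asynchronously and falls back
    // to NO_BID if it has not finished within budgetMillis.
    static String bidWithin(Supplier<String> computeBid, long budgetMillis) {
        return CompletableFuture.supplyAsync(computeBid)
                .completeOnTimeout(NO_BID, budgetMillis, TimeUnit.MILLISECONDS)
                .exceptionally(error -> NO_BID) // a failed computation also becomes a no-bid
                .join();
    }

    public static void main(String[] args) {
        // Fast computation: finishes well inside the budget.
        System.out.println(bidWithin(() -> "bid:0.42", 80));

        // Slow computation: the deadline wins and we answer with a no-bid.
        System.out.println(bidWithin(() -> {
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "bid:too-late";
        }, 80));
    }
}
```

Answering with a valid no-bid before the deadline is usually cheaper than timing out, since timeouts count towards the error rate above. The budget is set below 100 ms on the assumption that some time is needed for the network round trip.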

This system, literally, spends money. So, it must be rock solid.

Our system is coded carefully, with love and tests.

Still, sh*t happens.

Resiliency

The ability to recover from unexpected errors. The ability to sleep at night.

Detect, Recover, Warn

• Detect: monitoring

• Recover: resiliency patterns

• Warn: notifications

Monitoring, in a sensible way

• Logging with ‘mailAppender’

log4j.appender.mail=org.apache.log4j.net.SMTPAppender
log4j.appender.mail.SMTPHost=localhost
log4j.appender.mail.From=Error <[email protected]>
log4j.appender.mail.To=[email protected], [email protected]
log4j.appender.mail.Subject=[ERROR] WE ARE GOING TO DIE
log4j.appender.mail.layout=org.apache.log4j.PatternLayout
log4j.appender.mail.threshold=ERROR

Probably, no e-mail when you’ve got an OOM.

Let’s talk about OOM for a minute.

ps ax | grep java

JVMOpts="-XX:OnOutOfMemoryError=/usr/local/bin/slack-msg.sh"

🚫

👍

Some cool ideas for improving memory usage

• byte[] serialization in objects ❗

• Varying Memory Conditions ❗

• Logging with ‘mailAppender’

• Bad when OOM.

• Heartbeat

• Doing some real work

• Supervision with actors

• If you’re using Akka

• control flow != data flow
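A heartbeat that does real work can be sketched as follows. This is an illustration under assumptions, not the talk's code: `WorkingHeartbeat`, `fakePipeline` and the intervals are hypothetical. The point is that the timestamp only advances when a synthetic request actually makes it through the code path, so a process that is up but stuck goes stale.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

final class WorkingHeartbeat {
    private final BooleanSupplier probe; // hypothetical end-to-end check
    private final AtomicLong lastSuccessMillis = new AtomicLong(0);

    WorkingHeartbeat(BooleanSupplier probe) {
        this.probe = probe;
    }

    // One beat: run the probe and record the time only if it really succeeded.
    void beat() {
        try {
            if (probe.getAsBoolean()) {
                lastSuccessMillis.set(System.currentTimeMillis());
            }
        } catch (RuntimeException e) {
            // A throwing probe counts as a missed beat; the timestamp goes stale.
        }
    }

    // True when the last successful beat is at most maxAgeMillis old.
    boolean isAlive(long maxAgeMillis) {
        return System.currentTimeMillis() - lastSuccessMillis.get() <= maxAgeMillis;
    }

    // Stand-in for pushing a synthetic request through the real code path.
    static String fakePipeline(String request) {
        return "ping".equals(request) ? "pong" : "error";
    }

    public static void main(String[] args) throws InterruptedException {
        WorkingHeartbeat heartbeat =
                new WorkingHeartbeat(() -> "pong".equals(fakePipeline("ping")));
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(heartbeat::beat, 0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(350);
        System.out.println("alive: " + heartbeat.isAlive(1_000));
        scheduler.shutdownNow();
    }
}
```

An external monitor would poll `isAlive` (or an endpoint exposing it) and alert when it turns false.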

Our Monitoring:

• Nagios.

• Logging (to Sentry)

• Heartbeats with real work.

• graphite comparison

Have graphs

Now we know that something is going wrong.

Recovery

Bad data in the system

or / and

Errors in the system

Data errors.

Roll back (when possible)

• Keeping different versions in the DB.

• Keep the old version around.

• Know how to do a rollback.
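The "keep the old version around" idea can be sketched in memory like this. `VersionedValue` is a hypothetical helper, not something from the talk; a real system would more likely keep the versions in the database, as the bullets above say.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class VersionedValue<T> {
    private final Deque<T> history = new ArrayDeque<>(); // newest version first
    private final int maxVersions;

    VersionedValue(T initial, int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be >= 1");
        }
        this.maxVersions = maxVersions;
        history.push(initial);
    }

    // The version currently being served.
    synchronized T current() {
        return history.peek();
    }

    // Publishes a new version and forgets the oldest one if we keep too many.
    synchronized void publish(T next) {
        history.push(next);
        if (history.size() > maxVersions) {
            history.removeLast();
        }
    }

    // Goes back one version; returns false when there is nothing older left.
    synchronized boolean rollback() {
        if (history.size() <= 1) {
            return false;
        }
        history.pop();
        return true;
    }

    public static void main(String[] args) {
        VersionedValue<String> campaigns = new VersionedValue<>("campaigns-v1", 3);
        campaigns.publish("campaigns-v2-with-bad-data");
        campaigns.rollback();
        System.out.println(campaigns.current());
    }
}
```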

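Checks & asserts with Google Guava.

import static com.google.common.base.Preconditions.checkArgument;
import static com.google.common.base.Preconditions.checkNotNull;
import static com.google.common.base.Preconditions.checkState;

checkArgument(i >= 0, "Argument was %s but expected nonnegative", i);
checkArgument(i < j, "Expected i < j, but %s >= %s", i, j);
checkNotNull(myList, "List should not be null");
checkState(object.isValid(), "Object is not valid");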

System errors

These happen mostly at the integration points between systems.

• Your code and the DB.

• Your code and the 3rd party library.

• Your code and the queue.

DBs, a necessary supervillain

• Lost connection.

• Timeouts

• Can give you corrupted data.

• Can give you 0 data.

• Can give you too much data.

Circuit Breaker and his friend, the Bulkhead Pattern.

Circuit Breaker

Our Beloved CircuitBreakers
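For readers who have not met the pattern, here is a minimal count-based circuit breaker. It is a sketch under assumptions, not the breaker used at Trovit: `SimpleCircuitBreaker`, its thresholds and the injectable clock are hypothetical, and it lets more than one trial call through while half-open, which a production breaker would normally prevent.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to stay open before a trial call
    private final LongSupplier clock;   // injectable so the sketch is testable
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    // Current state; an open breaker becomes half-open once openMillis has passed.
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    // Runs the action unless the breaker is open; failures return the fallback.
    <T> T call(Supplier<T> action, T fallback) {
        if (state() == State.OPEN) {
            return fallback; // fail fast, do not touch the broken dependency
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback;
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        // A failed trial call re-opens immediately; otherwise wait for the threshold.
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker =
                new SimpleCircuitBreaker(2, 5_000, System::currentTimeMillis);
        for (int i = 0; i < 3; i++) {
            String value = breaker.call(() -> {
                throw new IllegalStateException("db down");
            }, "cached-value");
            System.out.println(value + " / " + breaker.state());
        }
    }
}
```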

Bulkhead
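A bulkhead caps how many callers can be inside one dependency at the same time, so a slow database cannot eat every thread. A minimal sketch using a semaphore, assuming a hypothetical `Bulkhead` class and an arbitrary limit:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    // Runs the action if a slot is free; otherwise returns the fallback at once.
    <T> T call(Supplier<T> action, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback; // compartment is full: fail fast instead of queueing
        }
        try {
            return action.get();
        } finally {
            permits.release(); // always give the slot back, even on exceptions
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        Bulkhead dbBulkhead = new Bulkhead(1);
        // The inner call runs while the outer one still holds the only slot.
        String result = dbBulkhead.call(
                () -> dbBulkhead.call(() -> "inner ran", "inner rejected"),
                "outer rejected");
        System.out.println(result);
    }
}
```

One bulkhead per dependency (database, queue, third-party API) keeps a failure in one of them from starving the others.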

Once the circuit breaker is open,

• Notify

• Try again! Maybe.

• Try to avoid DoS-ing your own system.

• Exponential retry (back off between attempts).

• Failover

• Restart
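The "try again, but do not DoS yourself" advice can be sketched as a retry loop whose waiting time doubles after every failure. `ExponentialRetry` and its parameters are hypothetical; real systems often add random jitter on top, which is left out here to keep the sketch short.

```java
import java.util.concurrent.Callable;

final class ExponentialRetry {
    // Delay before the retry that follows attempt number `attempt` (0-based):
    // baseMillis * 2^attempt, capped at maxMillis.
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis << Math.min(attempt, 20); // bound the shift
        return Math.min(delay, maxMillis);
    }

    // Calls the action up to maxAttempts times, sleeping longer after each failure.
    // Rethrows the last exception when every attempt failed.
    static <T> T retry(Callable<T> action, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // do not retry when interrupted
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("connection lost");
            }
            return "connected";
        }, 5, 10, 1_000);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```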

Some other bits and pieces:

• Tight coupling leads to fast propagation of errors.

• Event driven stuff

• Complete parameter checking

• Avoid SPOFs (single points of failure). Pretty please.

• Stateless is better.

• Bounded queues!
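On the last bullet: an unbounded queue turns a slow consumer into an OOM, while a bounded one turns it into rejected work that you can count and alert on. A minimal sketch with a hypothetical `BoundedInbox` wrapper around `ArrayBlockingQueue`:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class BoundedInbox<T> {
    private final BlockingQueue<T> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedInbox(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking submit: returns false (and counts the drop) when the queue is full.
    boolean submit(T item) {
        boolean accepted = queue.offer(item);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    // Next item, or null when the queue is empty.
    T poll() {
        return queue.poll();
    }

    // Number of items rejected so far; a good metric to graph.
    long droppedCount() {
        return dropped.get();
    }

    public static void main(String[] args) {
        BoundedInbox<String> inbox = new BoundedInbox<>(2);
        inbox.submit("event-1");
        inbox.submit("event-2");
        boolean accepted = inbox.submit("event-3"); // queue is full at this point
        System.out.println("accepted=" + accepted + ", dropped=" + inbox.droppedCount());
    }
}
```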

Your turn.

[email protected] @lant

http://www.maxisciences.com/destruction/wallpaper