Insight without Interference: Monitoring with Scala, Swagger, MongoDB and Wordnik OSS, by Tony Tam (@fehguy)

System insight without Interference


DESCRIPTION

Talk at Wordnik HQ about how to monitor application performance and business goals without intrusive engineering work on your core product.


Page 1: System insight without Interference

Insight without Interference

Monitoring with Scala, Swagger, MongoDB and Wordnik OSS

Tony Tam

@fehguy

Page 2: System insight without Interference

Nagios Dashboard

Page 3: System insight without Interference

Monitoring?

IT Ops 101

Host Checks

System

Load

Disk Space

Network

Page 4: System insight without Interference

Host Checks

System

Load

Disk Space

Network

Monitoring?

Necessary (but insufficient)

Page 5: System insight without Interference

Why Insufficient?

•What about Services?

• Database running?

• HTTP traffic?

•Install Munin Node!

• Some (good) service-level insight

Page 6: System insight without Interference
Page 7: System insight without Interference

Your boss LOVES charts

“OH pretty colors!”

“up and to the right!”

“it MUST be important!”

Page 8: System insight without Interference

Good vs. Bad?

•Database calls avg 1ms?

• Great! DB working well

• But called 1M times per page load/user?

•Most tools are for system, not your app

•By the time you know, it’s too late

Need business metrics monitoring!

Page 9: System insight without Interference

Enter APM

•Application Performance Monitoring

•Many flavors, degrees of integration

• Heavy: transaction monitoring, code performance, heap, memory analysis

• Medium: home-grown profiling

• Light: digest your logs (failure forensics)

•What you need depends on architecture, business + technology stage

Page 10: System insight without Interference

APM @ Wordnik

•Micro Services make the System

Monolithic application

Page 11: System insight without Interference

APM @ Wordnik

•Micro Services make the System

Monolithic application

API Calls are the unit of work!

Page 12: System insight without Interference

Monitoring API Calls

•Every API must be profiled

•Other logic as needed

• Database calls

• Connection manager

• etc...

•Anything that might matter!

Page 13: System insight without Interference

How?

•Wordnik-OSS Profiler for Scala

• Apache 2.0 License, available in Maven Central

•Profiling an arbitrary code block:

    import com.wordnik.util.perf.Profile

    Profile("create a cat", {/* do something */})

•Profiling an API call:

    Profile("/store/purchase", {/* do something */})
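To make the idea of wrapping a block in a named timer concrete, here is a minimal sketch of what such a profiler could look like. It is not the Wordnik implementation: `MiniProfile`, `Counter`, `record` and `snapshot` are hypothetical names, and the two-parameter-list call style differs from the `Profile(name, {...})` form shown on the slide.

```scala
import scala.collection.mutable

// Hypothetical aggregate for one named block; not a Wordnik OSS type.
final case class Counter(name: String, count: Long, totalMillis: Long) {
  def avgMillis: Double = if (count == 0) 0.0 else totalMillis.toDouble / count
}

// Hypothetical stand-in for a profiler, for illustration only.
object MiniProfile {
  private val counters = mutable.Map.empty[String, Counter]

  // Run `block`, record how long it took under `name`, and return its value.
  def apply[T](name: String)(block: => T): T = {
    val start = System.nanoTime()
    try block
    finally record(name, (System.nanoTime() - start) / 1000000L)
  }

  // Add one observation to the running totals for `name`.
  def record(name: String, elapsedMs: Long): Unit = synchronized {
    val c = counters.getOrElse(name, Counter(name, 0L, 0L))
    counters(name) = c.copy(count = c.count + 1, totalMillis = c.totalMillis + elapsedMs)
  }

  // Immutable copy of the current counters, safe to print or serialize.
  def snapshot: Map[String, Counter] = synchronized { counters.toMap }
}
```

Usage would be along the lines of `MiniProfile("/store/purchase") { handlePurchase() }`, where `handlePurchase` is whatever the API call already does. The block's return value passes through untouched, which is what keeps the cost of adding a profile point low.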

Page 14: System insight without Interference

Profiler gives you…

•Nearly free*** tracking

•Simple aggregation

•Trigger mechanism

• Actions on time spent “doing things”:

    Profile.triggers += new Function1[ProfileCounter, Unit] {
      def apply(counter: ProfileCounter): Unit = {
        if (counter.name == "getDb" && counter.duration > 5000)
          wakeUpSysAdminAndEveryoneWhoCanFixShit(Urgency.NOW)
      }
    }

Page 15: System insight without Interference

Profiler gives you…

•Nearly free*** tracking

•Simple aggregation

•Trigger mechanism

• Actions on time spent “doing things”:

    Profile.triggers += new Function1[ProfileCounter, Unit] {
      def apply(counter: ProfileCounter): Unit = {
        if (counter.name == "getDb" && counter.duration > 5000)
          wakeUpSysAdminAndEveryoneWhoCanFixShit(Urgency.NOW)
      }
    }

This is intrusive on your codebase

Page 16: System insight without Interference

Accessing Profile Data

•Easy to get in code

    ProfileScreenPrinter.dump

•Output where you want

    logger.info(ProfileScreenPrinter.toString)

•Send to logs, email, etc.

Page 17: System insight without Interference

Accessing Profile Data

•Easier to get via API with Swagger-JAXRS

import com.wordnik.resource.util

    @Path("/activity.json")
    @Api("/activity")
    @Produces(Array("application/json"))
    class ProfileResource extends ProfileTrait

Page 18: System insight without Interference

Accessing Profile Data

Page 19: System insight without Interference

Accessing Profile Data

Inspect without bugging devs!

Page 20: System insight without Interference

Is Aggregate Data Enough?

•Probably not

•Not Actionable

• Have calls increased? Decreased?

• Faster response? Slower?

Page 21: System insight without Interference

Make it Actionable

•“In a 3 hour window, I expect 300,000 views per server”

• Poll & persist the counters

• Example: Log page views, every min

    {
      "_id" : "web1-word-page-view-20120625151812",
      "host" : "web1",
      "count" : 627172,
      "timestamp" : NumberLong("1340637492247")
    },
    {
      "_id" : "web1-word-page-view-20120625151912",
      "host" : "web1",
      "count" : 627372,
      "timestamp" : NumberLong("1340637552778")
    }
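Once the counters are persisted, a windowed figure falls out of simple subtraction. The sketch below assumes the stored `count` is cumulative (the two samples above differ by 200 over roughly one minute, which suggests so, but the slides do not state it). `Sample` and `Windowed.callsInWindow` are hypothetical names.

```scala
// One persisted poll of a counter; field names follow the documents above.
final case class Sample(host: String, count: Long, timestamp: Long)

object Windowed {
  // Calls observed between the oldest and newest sample inside the window
  // that ends at `now`. Assumes `count` only ever grows (a cumulative counter).
  def callsInWindow(samples: Seq[Sample], now: Long, windowMillis: Long): Long = {
    val inWindow = samples
      .filter(s => s.timestamp > now - windowMillis && s.timestamp <= now)
      .sortBy(_.timestamp)
    if (inWindow.size < 2) 0L else inWindow.last.count - inWindow.head.count
  }
}
```

With the two samples shown above and a three-hour window ending at the second timestamp, `callsInWindow` yields 200. That number can then be compared against an expectation such as "300,000 views per server".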

Page 22: System insight without Interference

Make it Actionable

Page 23: System insight without Interference

Make it Actionable

Your boss LOVES charts

Page 24: System insight without Interference

That’s not Actionable!

•But it’s pretty

What’s missing?

APIs to track?

Low + High Watermarks

Custom Time window

Too much custom Engineering

Page 25: System insight without Interference

That’s not Actionable!

APIs to track?

Low + High Watermarks

Custom Time window

Too much custom Engineering

Call to Action!

Page 26: System insight without Interference

Make it Actionable

•Swagger + a tiny bit of engineering

• Let your *product* people create monitors, set goals

•A Check: specific API call mapped to a service function

    {
      "name": "word-page-view",
      "path": "/word/*/wordView (post)",
      "checkInterval": 60,
      "healthSpan": 300,
      "minCount": 300,
      "maxCount": 100000
    }
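A check like this can be evaluated with a simple comparison against the low and high watermarks. The sketch below is an illustration rather than Wordnik's code. It assumes `checkInterval` and `healthSpan` are in seconds, and that `minCount` and `maxCount` bound the number of calls seen during one `healthSpan`. `Health`, `TooFew`, `TooMany` and `CheckEvaluator` are hypothetical names.

```scala
// Mirrors the JSON check definition above.
final case class Check(
  name: String,
  path: String,
  checkInterval: Int, // assumed: seconds between polls
  healthSpan: Int,    // assumed: seconds of history to evaluate
  minCount: Long,     // low watermark for calls within healthSpan
  maxCount: Long      // high watermark for calls within healthSpan
)

sealed trait Health
case object Healthy extends Health
final case class TooFew(observed: Long, min: Long) extends Health
final case class TooMany(observed: Long, max: Long) extends Health

object CheckEvaluator {
  // Compare the number of calls observed in the health span with the watermarks.
  def evaluate(check: Check, observed: Long): Health =
    if (observed < check.minCount) TooFew(observed, check.minCount)
    else if (observed > check.maxCount) TooMany(observed, check.maxCount)
    else Healthy
}
```

Because the definition is plain data, product people can change `minCount` or `maxCount` without touching the evaluator.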

Page 27: System insight without Interference

Make it Actionable

•A Service Type: a collection of checks which make a functional unit

    {
      "name": "www-api",
      "checks": [
        "word-of-the-day",
        "word-page-view",
        "word-definitions",
        "user-login",
        "api-account-signup",
        "api-account-activated"
      ]
    }

Page 28: System insight without Interference

Make it Actionable

•A Host: “directions” to get to the checks

    {
      "host": "ip-10-132-43-114",
      "path": "/v4/health.json/profile?api_key=XYZ",
      "serviceType": "www-api"
    },
    {
      "host": "ip-10-130-134-82",
      "path": "/v4/health.json/profile?api_key=XYZ",
      "serviceType": "www-api"
    }

Page 29: System insight without Interference

Make it Actionable

•And finally, a simple GUI

Page 30: System insight without Interference

Make it Actionable

•And finally, a simple GUI

Page 31: System insight without Interference

Make it Actionable

•Point Nagios at this!

    serviceHealth.json/status/www-api?explodeOnFailure=true

•Get a 500, get an alert

Metrics from Product

Based on YOUR app

Treat like system failure
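The "get a 500, get an alert" idea can be sketched as a small function that maps check results to an HTTP status. The slides do not say what the endpoint returns when `explodeOnFailure` is absent. This sketch assumes it then answers 200 and leaves the failures in the response body. `ServiceStatus` and its methods are hypothetical names.

```scala
object ServiceStatus {
  // checkResults maps a check name to whether it passed.
  // Assumption: without explodeOnFailure the endpoint still answers 200.
  def httpStatus(checkResults: Map[String, Boolean], explodeOnFailure: Boolean): Int = {
    val allPassed = checkResults.values.forall(identity)
    if (!allPassed && explodeOnFailure) 500 else 200
  }

  // Names of the failing checks, sorted for stable output.
  def failing(checkResults: Map[String, Boolean]): Seq[String] =
    checkResults.collect { case (name, false) => name }.toSeq.sorted
}
```

An off-the-shelf HTTP check in Nagios can then treat a product-level miss exactly like a host-level failure, with no Nagios-specific code in the application.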

Page 32: System insight without Interference

Make it Actionable

Page 33: System insight without Interference

Is this Enough?

System monitoring

Aggregate monitoring

Windowed monitoring

Object monitoring?

• Action on a specific event/object

Why!?

Page 34: System insight without Interference

Object-level Actions

•Any back-end engineer can build this

• But shouldn’t

•ETL to a cube?

•Run BI queries against production?

•Best way to “siphon” data from production w/o intrusive engineering?

Page 35: System insight without Interference

Avoiding Code Invasion

•We use MongoDB everywhere

•We use > 1 server wherever we use MongoDB

•We have an opLog record against everything we do

Page 36: System insight without Interference

What is the OpLog

•All participating members have one

•Capped collection of all write ops

[Diagram: a primary and two replicas on a time axis, with operations t1, t2 and t3]

Page 37: System insight without Interference

So What?

•It’s a “pseudo-durable global topic message bus” (PDGTMB)

• WTF?

•All DB transactions in there

•It’s persistent (cyclic collection)

•It’s fast (as fast as your writes)

•It’s non-blocking

•It’s easily accessible

Page 38: System insight without Interference

More about this

    {
      "ts" : { "t" : 1340948921000, "i" : 1 },
      "h" : NumberLong("5674919573577531409"),
      "op" : "i",
      "ns" : "test.animals",
      "o" : { "_id" : "fred", "type" : "cat" }
    },
    {
      "ts" : { "t" : 1340948935000, "i" : 1 },
      "h" : NumberLong("7701120461899338740"),
      "op" : "i",
      "ns" : "test.animals",
      "o" : { "_id" : "bill", "type" : "rat" }
    }

Page 39: System insight without Interference

Tapping into the Oplog

•Made easy for you!

    https://github.com/wordnik/wordnik-oss

Page 40: System insight without Interference

Tapping into the Oplog

•Made easy for you!

    https://github.com/wordnik/wordnik-oss

Snapshots

Replication

Incremental Backup

Same Technique!

Page 41: System insight without Interference

Tapping into the Oplog

•Create an OpLogProcessor

    class OpLogReader extends OplogRecordProcessor {
      val recordTriggers = new HashSet[Function1[BasicDBObject, Unit]]

      @throws(classOf[Exception])
      def processRecord(dbo: BasicDBObject) = {
        recordTriggers.foreach(t => t(dbo))
      }

      @throws(classOf[IOException])
      def close(string: String) = {}
    }

Page 42: System insight without Interference

Tapping into the Oplog

•Attach it to an OpLogTailThread

    val util = new OpLogReader
    val coll: DBCollection =
      (MongoDBConnectionManager.getOplog("oplog", "localhost", None, None)).get
    val tailThread = new OplogTailThread(util, coll)
    tailThread.start

Page 43: System insight without Interference

Tapping into the Oplog

•Add some observer functions

    util.recordTriggers += new Function1[BasicDBObject, Unit] {
      def apply(e: BasicDBObject): Unit = Profile("inspectObject", {
        totalExamined += 1
        /* do something here */
      })
    }

Page 44: System insight without Interference

/* do something here */

•Like?

•Convert to business objects and act!

• OpLog to domain object is EASY

• Just process the ns that you care about

    "ns" : "test.animals"

•How?
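One way to picture the namespace filter: match on the `op` and `ns` fields of each oplog entry and only then look at the `o` payload. The sketch below models an entry with a plain Scala `Map` so that it runs without the MongoDB driver. `OplogEntry`, `Animal` and `OplogFilter` are hypothetical names, and the mapping is written by hand instead of using Jackson.

```scala
// Simplified, driver-free model of an oplog entry; the real record is a BasicDBObject.
final case class OplogEntry(op: String, ns: String, o: Map[String, Any])

// Hypothetical domain object for documents in test.animals.
final case class Animal(id: String, kind: String)

object OplogFilter {
  // Keep only inserts ("i") into the namespace we care about,
  // and turn their payload into a domain object.
  def toAnimal(entry: OplogEntry): Option[Animal] = entry match {
    case OplogEntry("i", "test.animals", doc) =>
      for {
        id   <- doc.get("_id").collect { case s: String => s }
        kind <- doc.get("type").collect { case s: String => s }
      } yield Animal(id, kind)
    case _ => None
  }
}
```

Everything that is not an insert into `test.animals` yields `None`, so an observer function can skip it and move on to the next record.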

Page 45: System insight without Interference

Converting OpLog to Object

•Jackson makes this trivial

case class User(username: String, email: String, createdAt: Date)

val user = jacksonMapper.convertValue( dbo.get("o").asInstanceOf[DBObject], classOf[User])

•Reuse your DAOs? Bonus points!

•Got your objects!

Page 46: System insight without Interference

Converting OpLog to Object

•Jackson makes this trivial

case class User(username: String, email: String, createdAt: Date)

val user = jacksonMapper.convertValue( dbo.get("o").asInstanceOf[DBObject], classOf[User])

•Reuse your DAOs? Bonus points!

•Got your objects!

Now What?

“o” is for “Object”

Page 47: System insight without Interference

Use Case 1: Alert on Action

•New account!

    obj match {
      case newAccount: UserAccount => { /* ring the bell! */ }
      case _ => { /* ignore it */ }
    }

Page 48: System insight without Interference

Use case 2: What’s Trending?

•Real-time activity

    case o: VisitLog =>
      Profile("ActivityMonitor:processVisit", {
        wordTracker.add(o.word)
      })
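The slides do not show what `wordTracker` does internally. A minimal sketch of one possibility is a sliding-window counter that forgets visits older than a fixed window. The `WordTracker` class below is hypothetical, as are its `windowMillis` parameter and `top` method. It assumes visits arrive in roughly increasing time order, which holds when tailing an oplog.

```scala
import scala.collection.mutable

// Hypothetical sliding-window counter; not the tracker used at Wordnik.
final class WordTracker(windowMillis: Long) {
  private val events = mutable.Queue.empty[(Long, String)]
  private val counts = mutable.Map.empty[String, Int]

  // Record one visit to `word` at time `now` (milliseconds).
  def add(word: String, now: Long = System.currentTimeMillis()): Unit = synchronized {
    events.enqueue((now, word))
    counts(word) = counts.getOrElse(word, 0) + 1
    expire(now)
  }

  // The `n` most visited words within the window, ties broken alphabetically.
  def top(n: Int, now: Long = System.currentTimeMillis()): Seq[(String, Int)] = synchronized {
    expire(now)
    counts.toSeq.sortBy { case (w, c) => (-c, w) }.take(n)
  }

  // Drop visits that have fallen out of the window and decrement their counts.
  private def expire(now: Long): Unit =
    while (events.nonEmpty && events.head._1 <= now - windowMillis) {
      val (_, w) = events.dequeue()
      val remaining = counts.getOrElse(w, 0) - 1
      if (remaining > 0) counts(w) = remaining else counts.remove(w)
    }
}
```

Passing `now` explicitly keeps the class easy to test. In production the default clock argument would be used.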

Page 49: System insight without Interference

Use case 3: External Analytics

    case o: UserProfile => {
      getSqlDatabase().executeSql(
        "insert into user_profile values(?,?,?)",
        o.username, o.email, o.createdAt)
    }

Page 50: System insight without Interference

Use case 3: External Analytics

    case o: UserProfile => {
      getSqlDatabase().executeSql(
        "insert into user_profile values(?,?,?)",
        o.username, o.email, o.createdAt)
    }

Don’t mix runtime & OLAP!

Your Data pushes to Relational!

Page 51: System insight without Interference

Use case 4: Cloud analysis

    case o: NewUserAccount => {
      getSalesforceConnector().create(
        Lead(Account.ID, o.firstName, o.lastName,
          o.company, o.email, o.phone))
    }

Page 52: System insight without Interference

Use case 4: Cloud analysis

    case o: NewUserAccount => {
      getSalesforceConnector().create(
        Lead(Account.ID, o.firstName, o.lastName,
          o.company, o.email, o.phone))
    }

We didn’t interrupt core engineering!

Pushed directly to Salesforce!

Page 53: System insight without Interference

Examples

Polling profile APIs cross cluster

Page 54: System insight without Interference

Examples

Siphoning hashtags from opLog

Page 55: System insight without Interference

Examples

Page view activity from opLog

Page 56: System insight without Interference

Examples

Health check w/o engineering

Page 57: System insight without Interference

Summary

•Don’t mix up monitoring servers & your application

•Leave core engineering alone

•Make a tiny engineering investment now

•Let your product folks set metrics

•FOSS tools are available (and well tested!)

•The opLog is incredibly powerful

• Hack it!

Page 58: System insight without Interference

Find out more

•Wordnik: developer.wordnik.com

•Swagger: swagger.wordnik.com

•Wordnik OSS: github.com/wordnik/wordnik-oss

•Atmosphere: github.com/Atmosphere/atmosphere

•MongoDB: www.mongodb.org