Foursquare data @ Strata NY 2011

San Francisco

Australia

Brooklyn

repeat checkin %

The Adjustment Bureau

Explore

Engineering

Our Stack

Application Stack

• Scala/Liftweb: API Machines, WWW Machines, Batch Jobs

• Scala Application code

• Mongo/Postgres/Flat Files: Databases, Logs

Data Stack

• Amazon S3: Database Dumps, Log Files

• EMR Hadoop

• Hive/Ruby/Mahout: Analytics Dashboard, Map Reduce Jobs

Application Stack to Data Stack: mongoexport, postgres dump, Flume

Massive Intersection Queries

select * from checkins

where user in friends

and venue in nearby

select * from similarities

where venue1 in myHistory

and venue2 in nearby
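
The two queries above are both filters on a pair of membership tests. A minimal Python sketch of what they compute, assuming small in-memory collections: the sample data, the function names, and the dictionary layout are all hypothetical and are not taken from the talk. It only illustrates the logic and makes no attempt at the latency the slides call for.

```python
# Hypothetical sample data, for illustration only.
checkins = [
    {"user": "alice", "venue": "v1"},
    {"user": "bob", "venue": "v2"},
    {"user": "carol", "venue": "v3"},
    {"user": "alice", "venue": "v3"},
]

# (venue1, venue2) -> similarity score; the scores are made up.
similarities = {
    ("v1", "v2"): 0.9,
    ("v1", "v4"): 0.4,
    ("v5", "v3"): 0.7,
}


def friend_checkins_nearby(checkins, friends, nearby):
    """select * from checkins where user in friends and venue in nearby"""
    friends, nearby = set(friends), set(nearby)  # sets give O(1) membership tests
    return [c for c in checkins
            if c["user"] in friends and c["venue"] in nearby]


def similar_to_history_nearby(similarities, my_history, nearby):
    """select * from similarities where venue1 in myHistory and venue2 in nearby"""
    my_history, nearby = set(my_history), set(nearby)
    return {pair: score for pair, score in similarities.items()
            if pair[0] in my_history and pair[1] in nearby}


friend_hits = friend_checkins_nearby(checkins, ["alice", "bob"], ["v2", "v3"])
similar_hits = similar_to_history_nearby(similarities, ["v1"], ["v2", "v3"])
```

Scanning every row as done here is linear in the size of the table; a real system would more likely index by user or venue and intersect the resulting ID lists.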

< 100ms

Analytics

Goals

• Simple Interface

• Scales with our data

• Cheap (free)

• Supports 90% use cases

• Fast

Our Internal Dashboard

EMR

The Team

The Anatomy of a Data Team

• Analytics (Stats)

• Science (ML)

• Engineering (CS)

• Mix of all of these!

Building a Data Team

• It’s hard

• Good references:

– http://radar.oreilly.com/2011/09/building-data-science-teams.html

– http://mathbabe.org/2011/09/25/why-and-how-to-hire-a-data-scientist-for-your-business/

Join us!

foursquare is hiring

www.foursquare.com/jobs

Justin Moore @injust

justin@foursquare.com