CREATING ADDED VALUE WITH BIG DATA by KLAAS BOSTEELS @klbostee

Creating Added Value with Big Data


DESCRIPTION

This talk essentially tells the story of the data science team at Massive Media, the company behind Netlog.com and Twoo.com. After obtaining invaluable first-hand experience in working with big data as a member of the information retrieval team at the music discovery website Last.fm, I joined Massive Media to conceive, build and lead a brand new team around big data and data science for them. In doing so, I developed a pretty clear perspective on how to introduce big data within a company and create added value from it, which is precisely what I would like to share in this talk.


Page 1: Creating Added Value with Big Data

CREATING ADDED VALUE WITH BIG DATA

by KLAAS BOSTEELS @klbostee

Page 2: Creating Added Value with Big Data

MY CAREER PATH SO FAR

2007: Began working with big data as PhD student

2009: Embarked on a data science career at Last.fm

2011: Joined Massive Media as Lead Data Scientist

Last.fm: data company at heart; one of the earliest Hadoop adopters worldwide; inventors of Ketama; organised the first “NoSQL” meetup in SF.

Massive Media: huge audience and tremendous potential, but a data science newcomer at the time.

Page 3: Creating Added Value with Big Data

Twoo: the second big product of Massive Media, after Netlog

2011: Initial launch of Twoo.com

2012: Biggest dating site world-wide on comScore

2013: Massive Media acquired by InterActiveCorp

Page 4: Creating Added Value with Big Data

IT’S A BIG FAMILY

IAC’s main personals brands:

Some other well-known IAC brands:

Page 6: Creating Added Value with Big Data

BOOTSTRAP BY SAVING OR GAINING MONEY

You need to get some capital to get started

Saving money tends to be easier in practice

Real-world example:

• Analyzing CDN logs unveiled abuse

• Stopping the abuse greatly reduced the bills
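As a rough illustration of the CDN example above, here is a minimal sketch that totals the bytes served per referrer and flags any referrer responsible for an outsized share of traffic. The log format (`client_ip referrer bytes`, space-separated), the function names, the threshold and the sample data are all assumptions made for illustration; the talk does not describe the actual logs or tooling.

```python
from collections import Counter


def bytes_by_referrer(lines):
    """Sum bytes served per referrer from hypothetical 'ip referrer bytes' lines."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue  # skip malformed lines instead of failing the whole run
        _ip, referrer, size = parts
        totals[referrer] += int(size)
    return totals


def top_consumers(totals, share=0.5):
    """Return referrers that each account for more than `share` of all bytes."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [ref for ref, size in totals.most_common() if size / grand_total > share]


# Hypothetical sample data, not taken from the talk.
sample_log = [
    "10.0.0.1 example.org 1000",
    "10.0.0.2 hotlinker.example 9000",
    "10.0.0.3 example.org 500",
    "garbage line",
    "10.0.0.4 hotlinker.example 4500",
]

totals = bytes_by_referrer(sample_log)
suspects = top_consumers(totals)
```

On real logs the same aggregation would typically run as a batch job rather than in memory, but the idea is the same: group by a suspect dimension and look for outliers.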

Page 8: Creating Added Value with Big Data

HADOOP

Not the holy grail, but deserves a central role

It has a vibrant community and is proven to be:

ECONOMICAL: runs on commodity hardware

SCALABLE: smart distributed processing

MAINTAINABLE: very robust and fault-tolerant

FLEXIBLE: predefined schemas not required
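To make the "predefined schemas not required" point concrete, here is a small sketch of a MapReduce-style job simulated locally in plain Python. It counts events by type from JSON log lines whose fields differ from record to record, interpreting the structure only at read time. The field name `event`, the function names and the sample records are hypothetical; this is not the speaker's code and it does not use any Hadoop API.

```python
import json
from collections import defaultdict


def map_event(line):
    """Mapper: emit (event_type, 1) per JSON line; fields are read as found."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return []  # tolerate lines that are not valid JSON
    if not isinstance(record, dict):
        return []
    return [(record.get("event", "unknown"), 1)]


def reduce_counts(pairs):
    """Reducer: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(lines):
    """Run the map phase, then the reduce phase, in a single process."""
    pairs = [pair for line in lines for pair in map_event(line)]
    return reduce_counts(pairs)


# Hypothetical log lines with differing sets of fields.
logs = [
    '{"event": "login", "user": 1}',
    '{"event": "message", "user": 2, "length": 42}',
    '{"event": "login", "user": 3, "country": "BE"}',
    "not json",
    '{"user": 4}',
]

result = run_job(logs)
```

Because the mapper only looks up the fields it needs, new fields can appear in the logs without any migration step.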

Page 9: Creating Added Value with Big Data

STEP 3

BUILD DASHBOARDS

photo by Dawn Hopkins

Page 10: Creating Added Value with Big Data

STATS PIPELINE BASED ON HADOOP

[Diagram: Log collector, HDFS, MapReduce, HBase, Dashboards; flows labelled “continuous” and “in batches”]

Page 11: Creating Added Value with Big Data

STATS PIPELINE BASED ON HADOOP

[Diagram: Log collector, HDFS, MapReduce, HBase, Dashboards, plus Realtime processing; flows labelled “continuous” and “in batches”]

Cf. “lambda architecture”, coined by @nathanmarz
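The lambda architecture referenced on this slide combines a batch layer with a realtime layer and merges both at query time. The sketch below shows that merge for simple event counts, assuming a hypothetical cutoff timestamp that separates what the last batch run has already processed from what only the realtime layer has seen. All names and data are illustrative, and in-memory counters stand in for the real storage layers.

```python
from collections import Counter


def batch_view(events, cutoff):
    """Counts for events already covered by the last batch run (ts < cutoff)."""
    return Counter(name for ts, name in events if ts < cutoff)


def realtime_view(events, cutoff):
    """Counts for events that arrived after the last batch run (ts >= cutoff)."""
    return Counter(name for ts, name in events if ts >= cutoff)


def query(batch, realtime, name):
    """Serve a count by merging the batch view with the realtime view."""
    return batch[name] + realtime[name]


# Hypothetical (timestamp, event name) pairs.
events = [(1, "signup"), (2, "signup"), (3, "login"), (11, "signup"), (12, "login")]
CUTOFF = 10

batch = batch_view(events, CUTOFF)
recent = realtime_view(events, CUTOFF)
signups = query(batch, recent, "signup")
logins = query(batch, recent, "login")
```

When the next batch run completes, the cutoff moves forward and the realtime view for the older period can be discarded, so errors in the realtime path are eventually overwritten by the batch results.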

Page 12: Creating Added Value with Big Data

STATS PIPELINE BASED ON HADOOP

[Diagram: Log collector, HDFS, MapReduce, HBase, Dashboards, plus Realtime processing and Ad-hoc results; flows labelled “continuous” and “in batches”]

Cf. “lambda architecture”, coined by @nathanmarz

Page 13: Creating Added Value with Big Data

CUSTOM-TAILORED WEB INTERFACE

Annotation & exporting functionality

Supports A/B testing and cohort analysis

Various other nifty extras
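The slide does not say how A/B test results were evaluated. As one common possibility, the sketch below compares the conversion rates of two variants with a two-sided two-proportion z-test using only the standard library. The function name and the example figures are assumptions, not details from the talk.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    # For a standard normal, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 100/1000 conversions for A versus 130/1000 for B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
```

A small p-value suggests that the observed difference is unlikely under equal conversion rates. The test assumes independent users and reasonably large samples.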

Page 14: Creating Added Value with Big Data

STEP 4

ASSEMBLE A TEAM

photo by Jean-François Schmitz

Page 15: Creating Added Value with Big Data

THE SECRET IS IN THE MIX

Hadoop’s tricks also apply to data science teams

• Avoid specialisation to allow easy distribution and scaling

• Exploit data locality by hiring people with a wide skill set

Great Data Scientists have the right mix of skills

• Hackers with solid technical background

• Analytical mind that knows statistics and machine learning

• Clever and creative in everything they do

Page 17: Creating Added Value with Big Data

STEP 5

EXPLORE & INNOVATE

photo by NASAr

Page 18: Creating Added Value with Big Data

SOME TIPS AND TRICKS

Dare to fail and/or start from estimates

Introduce data exploration/innovation days

• Basically 20% time devoted to playing with data

• Incorporate collaborative brainstorming

• Goal is to find promising new projects to work on

Communicate findings to the rest of the company

• Fun and silliness are allowed

• Prototype early and often

Page 19: Creating Added Value with Big Data

PRODUCT INSIGHTS & EXTENSIONS

E.g. recommendations and activity patterns analysis
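The talk does not state which recommendation technique was used. Purely as an illustration of the idea, the following sketch builds item co-occurrence counts from user histories and recommends the items that most often appear together with what a user has already seen. The function names and sample histories are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(histories):
    """Count, for each item, how often every other item shares a history with it."""
    co = defaultdict(Counter)
    for items in histories.values():
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return co


def recommend(co, seen, k=3):
    """Rank unseen items by total co-occurrence with the items in `seen`."""
    scores = Counter()
    for item in seen:
        for other, count in co.get(item, {}).items():
            if other not in seen:
                scores[other] += count
    return [item for item, _ in scores.most_common(k)]


# Hypothetical interaction histories keyed by user.
histories = {
    "ann": ["a", "b", "c"],
    "bob": ["a", "b"],
    "cat": ["b", "d"],
}

co = build_cooccurrence(histories)
suggestions = recommend(co, {"a"}, k=2)
```

Co-occurrence counting maps naturally onto batch processing, since the pair counts can be computed per history and summed afterwards.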

Page 21: Creating Added Value with Big Data

FIVE SIMPLE STEPS IS ALL IT TAKES

1 FOLLOW THE MONEY

2 EMBRACE HADOOP

3 BUILD DASHBOARDS

4 ASSEMBLE A TEAM

5 EXPLORE & INNOVATE

Page 22: Creating Added Value with Big Data


Thanks! Questions?