Developing Data Products SF Data Science Meetup Pete Skomoroch @peteskomoroch September 19 2013 ©2012 LinkedIn Corporation. All Rights Reserved.

SF Data Science: Developing Data Products

Page 1: SF Data Science: Developing Data Products

Developing Data Products
SF Data Science Meetup
Pete Skomoroch @peteskomoroch
September 19 2013

Page 2: SF Data Science: Developing Data Products

Developing Data Products
Examples, Techniques, & Lessons Learned

Page 3: SF Data Science: Developing Data Products

Our Mission
Connect the world’s professionals to make them more productive and successful.

Our Vision
Create economic opportunity for every professional in the world.

Members First!

Page 4: SF Data Science: Developing Data Products

LinkedIn is the leading professional network site

Worldwide Workforce: 3,300M+
Worldwide Professionals: 640M+
LinkedIn Members: 238M+

Page 5: SF Data Science: Developing Data Products

LinkedIn profiles represent our professional identity

238M Members, 238M Member Profiles

Page 6: SF Data Science: Developing Data Products

We have a lot of data.

Page 7: SF Data Science: Developing Data Products

We have a lot of data.
And (like everyone else), we store it in Hadoop.

Page 8: SF Data Science: Developing Data Products

We have a lot of data.
And (like everyone else), we store it in Hadoop.
And people build awesome things with that data.

Page 9: SF Data Science: Developing Data Products

What do we mean by data products?

Page 10: SF Data Science: Developing Data Products

Building products from data at LinkedIn

A few examples:

People You May Know
Skills and Endorsements
Year in Review
Network Updates Digest
InMaps
Who’s viewed my profile
Collaborative Filtering
Groups You May Like
and more…

Page 11: SF Data Science: Developing Data Products

Collaborative Filtering: LinkedIn Skill Pages
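
The slide names collaborative filtering but does not say how the Skill Pages are computed. As a rough illustration only, one simple form of the idea is to rank skills by how often they appear together on the same profile. The toy `profiles` data and the `related_skills` helper below are hypothetical and are not taken from the talk.

```python
from collections import Counter

# Hypothetical toy data: each set holds the skills listed on one member profile.
profiles = [
    {"python", "hadoop", "machine learning"},
    {"python", "machine learning", "statistics"},
    {"hadoop", "pig", "python"},
    {"statistics", "machine learning"},
]

def related_skills(profiles, skill, top_n=3):
    """Rank other skills by how often they co-occur with `skill` on a profile."""
    co_counts = Counter()
    for skills in profiles:
        if skill in skills:
            co_counts.update(skills - {skill})
    # Sort by count (descending), then by name, so ties are deterministic.
    return sorted(co_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

print(related_skills(profiles, "python"))
# [('hadoop', 2), ('machine learning', 2), ('pig', 1)]
```

Raw co-occurrence counts favor popular skills, so a real system would probably normalize them in some way; that step is left out here to keep the sketch short.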

Page 12: SF Data Science: Developing Data Products

Classification: giving structure to unstructured data

Extract

Page 13: SF Data Science: Developing Data Products

Clustering & Disambiguation

Page 14: SF Data Science: Developing Data Products

De-duplication and Normalization
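
The slide gives no detail on how de-duplication and normalization are done. A minimal sketch of the general idea, assuming free-text skill names and a purely rule-based cleanup, might look like the following. The `normalize` and `dedupe` helpers and the sample names are hypothetical.

```python
import re
from collections import defaultdict

def normalize(name):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    name = name.lower()
    # Keep '+' and '#' so names such as "C++" and "C#" stay distinct.
    name = re.sub(r"[^a-z0-9+#]+", " ", name)
    return " ".join(name.split())

def dedupe(names):
    """Group raw names that share the same normalized form."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize(raw)].append(raw)
    return dict(groups)

# Hypothetical raw input with spelling and formatting variants.
raw_names = ["Machine Learning", "machine-learning", " MACHINE  LEARNING ",
             "C++", "c++", "Hadoop"]
groups = dedupe(raw_names)
for canonical, variants in groups.items():
    print(canonical, "<-", variants)
```

Rules like these only catch surface variants. Synonyms and misspellings would need something more, such as the clustering mentioned on the previous slide.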

Page 15: SF Data Science: Developing Data Products

Network Algorithms: Relevance & Ranking

Page 16: SF Data Science: Developing Data Products

Prediction: Personalized Skill Recommendations

Page 17: SF Data Science: Developing Data Products
Page 18: SF Data Science: Developing Data Products
Page 19: SF Data Science: Developing Data Products

Skill Endorsements: Over 2 Billion and Growing

Page 20: SF Data Science: Developing Data Products

Social Proof and the Skill Endorsement Graph

Page 21: SF Data Science: Developing Data Products

The Economic Graph: Skills, Jobs, People, Locations…

Time, Location

Page 22: SF Data Science: Developing Data Products

Lessons learned developing data products

Page 23: SF Data Science: Developing Data Products

Collect the right data at the right time

Page 24: SF Data Science: Developing Data Products

Large amounts of data can reveal new patterns

[Figure: probability of job title (y-axis) vs. time since graduation (x-axis)]
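
The chart on this slide plots the probability of a job title against time since graduation. The data and method behind it are not given, but a curve of that kind can be estimated from raw counts, as in this minimal sketch. The `records` list and the `title_probability` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: (years since graduation, job title).
records = [
    (1, "engineer"), (1, "engineer"), (1, "analyst"),
    (5, "engineer"), (5, "manager"),
    (10, "manager"), (10, "manager"), (10, "director"),
]

def title_probability(records):
    """Estimate P(title | years since graduation) from raw counts."""
    by_year = defaultdict(Counter)
    for years, title in records:
        by_year[years][title] += 1
    return {
        years: {title: n / sum(counts.values()) for title, n in counts.items()}
        for years, counts in by_year.items()
    }

probs = title_probability(records)
for years in sorted(probs):
    print(years, probs[years])
```

With only a handful of records per year the estimates are very noisy, which is the slide's point: the pattern only becomes visible with a large amount of data.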

Page 25: SF Data Science: Developing Data Products

Be wary of “black-box” approaches

Page 26: SF Data Science: Developing Data Products

Look at your data

Page 27: SF Data Science: Developing Data Products

Aggregate statistics can be misleading
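
One well-known way aggregates mislead is Simpson's paradox, where a variant wins in every segment yet loses overall because the segments differ in size. The talk does not say which example it used; the counts below are illustrative numbers chosen to show the effect, and the segment and variant names are hypothetical.

```python
# Illustrative (successes, trials) per segment for two variants.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, trials):
    """Success rate for one (successes, trials) pair."""
    return successes / trials

def overall(variant):
    """Success rate after pooling all segments of one variant."""
    successes = sum(s for s, _ in data[variant].values())
    trials = sum(t for _, t in data[variant].values())
    return rate(successes, trials)

for segment in ("small", "large"):
    a = rate(*data["A"][segment])
    b = rate(*data["B"][segment])
    print(f"{segment}: A={a:.3f} B={b:.3f}")
print(f"overall: A={overall('A'):.3f} B={overall('B'):.3f}")
```

Here A has the higher rate in both segments, while B has the higher pooled rate, so the per-segment view and the aggregate view point in opposite directions.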

Page 28: SF Data Science: Developing Data Products

Build a viewer app, “micro-listen”

Page 29: SF Data Science: Developing Data Products

Algorithmic intuition: include data geeks in design

Page 30: SF Data Science: Developing Data Products

OODA: Think like a jet fighter

Page 31: SF Data Science: Developing Data Products

OODA: Observe, Orient, Decide, Act

Page 32: SF Data Science: Developing Data Products

OODA: The speed you can move determines victory

Page 33: SF Data Science: Developing Data Products

Red teaming: what can go wrong likely will

Page 34: SF Data Science: Developing Data Products

Error data is valuable: analyze it and adapt

Page 35: SF Data Science: Developing Data Products

Conclusion: tips for developing data products

Collect the right data at the right time
Large amounts of data can reveal new patterns
Be wary of “black box” approaches
Look at your raw data
Aggregate statistics can be misleading
Build and use viewer apps
Include data geeks in design process
OODA: Think like a jet fighter
Red-teaming: anticipate edge cases
Find opportunity in your error data

Page 36: SF Data Science: Developing Data Products

Questions?

@peteskomoroch