Data Science @ LinkedIn : Insights and Innovation at Scale

Manu Sharma, Principal Research Scientist and Group Manager, Product Analytics, LinkedIn

Predictive Analytics World, New York, Oct 19, 2011

Agenda

•  LinkedIn

•  Data Science

•  Scale

•  Innovation

•  Insights

www.linkedin.com

Connect the world’s professionals to make them more productive and successful

LinkedIn at a glance

•  Founded in 2003

•  #31 site in the US (#34 Worldwide)

•  120MM+ Members

•  First million members ~500 days

•  Latest million ~ 6 days

•  2 new signups per second

•  ~2B people searches performed in 2010
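The growth figures above can be cross-checked with a little arithmetic. The sketch below assumes the quoted rate of 2 signups per second holds constant over a whole day (a simplification, since real signup rates vary), and the function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def days_per_million(signups_per_second):
    """Days needed to add one million members at a constant signup rate."""
    return 1_000_000 / (signups_per_second * SECONDS_PER_DAY)

latest = days_per_million(2)   # rate quoted on the slide
speedup = 500 / latest         # relative to the ~500 days for the first million
print(f"{latest:.1f} days per million, roughly {speedup:.0f}x faster than the first million")
```

At 2 signups per second this works out to a little under 6 days per million members, which agrees with the "~6 days" figure on the slide.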

Who is on LinkedIn?

•  Executives from all Fortune 500 companies

•  8.5M+ SMB professionals

•  9M+ recent college grads

•  Average Household income $107,000

•  42% are “decision makers”

We are truly International

•  More than 50% of members are outside the US

•  Over 200 countries & territories are represented

•  26M+ members in Europe

•  11M+ members in India

•  Available in 9 languages

Multiple revenue channels

•  Premium Subscriptions

•  Self Serve Ads

•  Hiring Solutions

•  Marketing Solutions

•  LinkedIn

•  Data Science

•  Scale

•  Innovation

•  Insights

What is Data?

•  Factual information (as measurements or statistics) used as a basis for reasoning, discussion, or calculation.

•  Information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful

•  Information in numerical form that can be digitally transmitted or processed

Source : http://www.merriam-webster.com

da·ta noun pl but singular or pl in constr, often attributive\ˈdā-tə, ˈda- also ˈdä-\

Or…

Data is the new black -Reid Hoffman

In that case...

Web Logs = Data

Normalized Data = Information

Parse, Normalize, Standardize

Information

Knowledge

Insights

Wisdom

What is Data Science?

Using (multiple) data elements in clever ways to solve iterative or auxiliary data problems that, when combined, solve a data problem that might otherwise be intractable.

Data Scientist = Curiosity + Intuition + Data gathering + Standardization + Statistics + Modeling + Visualization

What makes a Data Scientist?

What Technologies do we use?

Crowdsourcing

Demand for Data Scientists

What do we do with Data?

•  Build innovative data products

•  Draw insights

•  Drive the business

Before we can do that...

•  There are a few challenges that we have to overcome

•  LinkedIn

•  Data Science

•  Scale

•  Innovation

•  Insights

Big Data brings unique challenges

•  What do you do when out-of-the-box solutions don’t work?

•  You build your own solutions

What is the scale of LinkedIn Data?

•  75TB data crunched daily to produce recommendations

•  10B rows of data processed daily

•  Real time (up to the minute) data availability for key events

• Most webtrack events are updated every 15 minutes

Some of our Homegrown Solutions

Agenda

•  LinkedIn

•  Data Science

•  Scale

•  Innovation

•  Insights

Using data to build new products

What data do we start with?

Before we can leverage the data, we need to make sure it is clean...

Standardization - The challenges

We can standardize the title Software Engineer with 6000+ variations including:

•  s.w engineer

•  sw. enginner

•  sofware engineer

•  sw. engineer

•  software enginer - china

We can standardize the company IBM with 8000+ variations including:

•  ibm - ireland

•  ibm research

•  TJ Watson Labs

•  International Buss. Machines
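To make the standardization problem concrete, here is a minimal sketch of how the title variations listed above could be mapped to one canonical form. It is not LinkedIn's method, which the talk does not describe. The canonical list, the abbreviation table, and the 0.8 similarity cutoff are all assumptions chosen for illustration. It uses only Python's standard `difflib` module.

```python
import difflib

# Hypothetical canonical vocabulary and abbreviation table (illustrative only).
CANONICAL_TITLES = ["software engineer", "data scientist", "product manager"]
ABBREVIATIONS = {"sw": "software", "s.w": "software", "eng": "engineer"}

def standardize_title(raw, cutoff=0.8):
    """Return the closest canonical title, or None if nothing is similar enough."""
    # Lowercase and drop a trailing qualifier such as " - china".
    text = raw.lower().split(" - ")[0]
    tokens = []
    for tok in text.split():
        tok = tok.strip(".,")                       # "sw." becomes "sw"
        tokens.append(ABBREVIATIONS.get(tok, tok))  # expand known abbreviations
    cleaned = " ".join(tokens)
    # Fuzzy match absorbs misspellings such as "enginner" or "sofware".
    matches = difflib.get_close_matches(cleaned, CANONICAL_TITLES, n=1, cutoff=cutoff)
    return matches[0] if matches else None

for variant in ["s.w engineer", "sw. enginner", "sofware engineer",
                "sw. engineer", "software enginer - china"]:
    print(variant, "->", standardize_title(variant))
```

Company names such as "TJ Watson Labs" share almost no characters with "IBM", so string similarity alone would not resolve them. Cases like that would presumably need an alias table or other signals.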

The easiest data product to build

Collaborative filtering
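As a rough illustration of why collaborative filtering is easy to build, the sketch below counts how often two items are viewed by the same member and recommends the most frequent co-occurrences. This is a generic item-to-item approach and not a description of LinkedIn's implementation. The member IDs, profile names, and function names are invented for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_cooccurrence(views_by_member):
    """Count, for every ordered pair of items, how many members viewed both."""
    co = defaultdict(Counter)
    for items in views_by_member.values():
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return co

def also_viewed(co, item, k=3):
    """Items most often viewed together with `item`, most frequent first."""
    return [other for other, _ in co[item].most_common(k)]

# Hypothetical profile-view data: member -> profiles they viewed.
views = {
    "m1": ["alice", "bob", "carol"],
    "m2": ["alice", "bob"],
    "m3": ["alice", "dave"],
    "m4": ["bob", "carol"],
}
co = build_cooccurrence(views)
print(also_viewed(co, "alice", k=1))
```

Here "bob" is co-viewed with "alice" by two members, more than any other profile, so it is the top suggestion.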

The most important data product we built...

People You May Know
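The talk does not explain how People You May Know works. One common heuristic for this kind of feature is to rank non-connections by the number of shared connections, and the sketch below shows only that heuristic. The graph, names, and function name are made up for illustration.

```python
from collections import Counter

def suggest_connections(graph, member, k=3):
    """Rank members two hops away by how many connections they share with `member`."""
    direct = graph.get(member, set())
    counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the member themselves and people they already know.
            if candidate != member and candidate not in direct:
                counts[candidate] += 1
    return counts.most_common(k)

# Hypothetical undirected connection graph: member -> set of connections.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}
print(suggest_connections(graph, "ann"))
```

In this toy graph "dan" shares two connections with "ann" and "eve" shares one, so "dan" ranks first.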

Once you have clean data and engagement, begin to leverage the data to benefit the user...

Data Products on your LinkedIn homepage

Our Newest Data Product : Skills

How do we do it?

Extract

Standardized data is the key to building compelling products

What Skills do we have?

How are we connected?

Agenda

•  LinkedIn

•  Data Science

•  Scale

•  Innovation

•  Insights

Insights

•  Reporting

•  A/B Testing

•  Exposing demographic trends

•  Understanding user and usage

•  Drive the business
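For the A/B testing item in the list above, a typical analysis compares conversion rates between two variants. The sketch below uses a standard two-proportion z-test with a pooled standard error. The talk does not say which test LinkedIn uses, and the sample counts are invented for the example.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 100 of 1000 convert on A, 150 of 1000 convert on B.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be noise. In practice the sample size and the metric would be fixed before the experiment runs.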

Using data for insights

The 2008 Financial Collapse

How Often do people change jobs?

What are these jobs?

CEO Names

If your name is Chip, you are likely in sales!

Keep your profile up to date, but...

I am a highly motivated, innovative dynamic professional with extensive experience in analytics, who has a proven track record of being a problem solver and a team player in fast paced entrepreneurial environments

How you describe yourself matters

Wisdom: Using data to drive the business

Strategic Analyses

• What is the value of an action that a user takes on the site?

• What is the value of a user?

• What early behavior on the site is predictive of future engagement?

•  Does mobile usage impact site engagement?

Ultimately...

It is all about the people

We’re Hiring

msharma@linkedin.com
