Big Data EcoSystem and Analytics @ LinkedIn May 16, 2013 LinkedIn Confidential ©2013 All Rights Reserved

Big data arch_analytics



Srinu Adira

Manager, Data Services (Business Solutions)

LinkedIn Corporation

http://www.linkedin.com/in/srinuadira


Outline

• LinkedIn Overview
• Why Data is important for LinkedIn?
• Big Data Ecosystem
• Analytics at LinkedIn


Our Mission

Connect the world’s professionals to make them more productive and successful


The LinkedIn Opportunity

Connect talent with opportunity at massive scale

Fundamentally transforming the way the world works


The World’s Largest Professional Network

200M+ members

[Chart: LinkedIn Members (Millions), 2006 to 2012. Values shown: 8, 17, 32, 55, 90, 147.]

• 88% of Fortune 100 companies use LinkedIn to hire
• ~2 new members joining per second
• >2.9M Company Pages
• ~5.7B professional searches in 2012


“If you are not embarrassed by the first version of your product, you have launched it too late.”

Reid Hoffman, Founder & Chairman, LinkedIn Corp


“What gets measured gets fixed.”

David Henke, SVP Technology Operations, LinkedIn Corp


The Power of LinkedIn’s Network Effects

• Member growth and engagement
• Relevant and valuable products, solutions & services
• Critical mass of data

A Few Data-Driven Products

• People You May Know
• Groups You May Like
• Jobs You May Be Interested In
• Who's Viewed Your Profile
• Companies You May Want To Follow

Data Insights (Sample)


Data Solutions (Sample)

• Segmentation/Standardization
• Propensity Modeling
• Targeting
• Churn Analysis/LTV
• Business Forecasting

[Diagram labels: Java/MPP/Hadoop; ML/Statistical Packages; Hadoop, MPP; MPP]

Data Solutions Drivers

• Business analytics (e.g., data mining, enabling decision making)
• Sales analytics (e.g., customer segmentation, targeting)
• Marketing (e.g., campaigns)
• Data insights for customers (e.g., Career site analytics)
• Business operations (forecasting, business pulse)


Big Data at LinkedIn

* Chart from Philip Russom, Research Director, TDWI

Big Data at LinkedIn

Platform and solutions that:
• Scale at cost with data complexity
• Simplify the data continuum across online, near-line, and offline
• Enable business decisions

What does “big data” mean at LinkedIn?

[Chart: ERP data, Social data, CRM data, and Web data. Axes: Data Volume (0 to +∞); Analytical Challenge & Complexity (0 to +∞).]

3 major data dimensions at LinkedIn

• Identity Data
• Social Data
• Behavioral Data

Big Data at LinkedIn

High-level data environment

[Diagram components: Users, Application, Online Data Store, Near-Line Data Store, Offline Data Store, Web Logs]

• Challenges so complex that off-the-shelf products or a few technologies can’t address them
• Built our own combination of toolsets/technologies to meet specific requirements

LinkedIn’s Sample Data Stack

Let’s do a deep dive to understand how the capabilities of LinkedIn’s data stack meet our requirements


LinkedIn Data Stack – Online

Capabilities:
• Rich structures (e.g., indexes)
• Change capture capability


LinkedIn Data Stack – Nearline

Systems               Capabilities
Voldemort             Distributed key-value store
Zoie, Bobo, Sensei    Search platform
D-Graph               Distributed graph engine
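To make the "distributed key-value store" capability concrete, here is a minimal sketch of consistent hashing, one common way such stores spread keys across nodes. It is an illustration under stated assumptions, not the implementation of Voldemort or any other system named on the slide: the `HashRing` and `KeyValueStore` classes, the node names, and the virtual-node count are all hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring (hypothetical; illustration only)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node gets `vnodes` ring positions to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's ring position."""
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


class KeyValueStore:
    """In-memory stand-in for a partitioned store: one dict per node."""

    def __init__(self, nodes):
        self.ring = HashRing(nodes)
        self.shards = {node: {} for node in nodes}

    def put(self, key, value):
        self.shards[self.ring.node_for(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self.ring.node_for(key)].get(key, default)


# Hypothetical node names and key, for illustration only.
store = KeyValueStore(["node-a", "node-b", "node-c"])
store.put("member:42", {"name": "example"})
profile = store.get("member:42")
```

Because each key maps to a ring position rather than to `hash(key) % node_count`, adding or removing a node only moves the keys adjacent to that node's positions instead of reshuffling almost everything.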


LinkedIn Data Stack – Pipeline

Capabilities:
• Messaging for site events, monitoring
• Change data capture streams
• Reliable, consistent, low-latency pipe
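The idea behind a change data capture stream can be sketched as an append-only log of committed changes that downstream consumers replay in order, each tracking its own read offset. This is a simplified illustration, not the pipeline systems from the slide (their names did not survive in this transcript): `ChangeEvent`, `ChangeLog`, and `Replica` are hypothetical names, and everything runs in memory.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    sequence: int          # position in commit order
    key: str
    value: Optional[Any]   # None marks a delete


class ChangeLog:
    """Append-only log of changes captured from a source store."""

    def __init__(self) -> None:
        self._events: List[ChangeEvent] = []

    def append(self, key: str, value: Optional[Any]) -> ChangeEvent:
        event = ChangeEvent(len(self._events), key, value)
        self._events.append(event)
        return event

    def read_from(self, offset: int) -> List[ChangeEvent]:
        """Return every event at or after `offset`, in commit order."""
        return self._events[offset:]


class Replica:
    """Downstream consumer that remembers how far it has read."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.offset = 0

    def catch_up(self, log: ChangeLog) -> int:
        """Apply all unread events and return how many were applied."""
        events = log.read_from(self.offset)
        for event in events:
            if event.value is None:
                self.data.pop(event.key, None)
            else:
                self.data[event.key] = event.value
            self.offset = event.sequence + 1
        return len(events)


# Made-up changes, for illustration only.
log = ChangeLog()
log.append("member:1", {"title": "Engineer"})
log.append("member:2", {"title": "Analyst"})

replica = Replica()
applied_first = replica.catch_up(log)

log.append("member:1", {"title": "Manager"})
log.append("member:2", None)
applied_second = replica.catch_up(log)
```

Applying events strictly in commit order is what lets a replica such as a search index or cache converge to the same state as the source, which is the "reliable, consistent" property the slide asks of the pipe.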


LinkedIn Data Stack – Offline

Capabilities:
• Machine learning, ranking, relevance, solutions
• Warehouse and analytics

LinkedIn with Hadoop, Aster, and Teradata

Hadoop: Data transformation & batch processing
• Image processing
• Search indexes
• Graph (PYMK)
• MapReduce
Batch data transformations for engineering groups using HDFS + MapReduce

Aster: Analytic platform for data discovery
• nPath Pattern/Path
• Clickstream analysis
• A/B site testing
• Data Sciences discovery
• SQL-MapReduce
Interactive MapReduce analytics for the enterprise using MapReduce Analytics & SQL-MapReduce

Teradata: Integrated Data Warehouse
• Exec dashboards
• Ad hoc/OLAP
• Complex SQL
• SQL
Integration with structured data, operational intelligence, scalable distribution of analytics

Connectors: Aster/Teradata Bi-Directional Connector; Aster/Teradata Hadoop Connectors
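The slide lists Graph (PYMK) among the batch jobs run on MapReduce. As a rough sketch of how a "people you may know" style computation fits the map and reduce steps, the example below counts shared connections between pairs of members. It is plain single-process Python, not Hadoop code and not LinkedIn's actual algorithm: the function names and the toy graph are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(member, connections):
    """Emit every pair of `member`'s connections; each pair shares `member`."""
    for a, b in combinations(sorted(connections), 2):
        yield (a, b), 1


def reduce_phase(pairs):
    """Sum the counts per pair, as a reducer would do per key."""
    totals = defaultdict(int)
    for pair, count in pairs:
        totals[pair] += count
    return totals


def suggest(graph, member, top_n=3):
    """Rank non-connections of `member` by number of shared connections."""
    emitted = (kv for m, conns in graph.items() for kv in map_phase(m, conns))
    totals = reduce_phase(emitted)
    scores = {}
    for (a, b), shared in totals.items():
        if member not in (a, b):
            continue
        other = b if a == member else a
        if other not in graph[member]:
            scores[other] = shared
    # Highest shared-connection count first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Toy undirected graph (hypothetical data): member -> set of connections.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}
suggestions = suggest(graph, "ann")
```

Here `suggestions` ranks "dan" (two shared connections) ahead of "eve" (one). The map step only needs one member's connection list at a time, which is why this kind of job parallelizes well as a batch computation.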


Several examples of business analytics evolution at LinkedIn

• Products
• Marketing
• Sales

How we leverage data to support Marketing

• Identity Data
• Social Data
• Behavioral Data

Overall Audience → Target Audience

The closed-loop analytical framework

• Execution: Test
• Reporting & business intelligence: Measure
• Post-campaign analysis: Why?
• Model building and tuning: Predict
• Campaign planning & design: Design

An example of using data to improve sales

Step 1: Which account?
Step 2: Who?
Step 3: How?

Data inputs: Identity Data, Social Data, Behavioral Data

How to provide 500 to 1000X impact?

Insights portal for the sales org:

1. Easy: quickly find the right info
2. Fast: response time of a few seconds for most insights
3. Scalable: 2M+ accounts/prospects
4. Accurate: mimics an analyst/data scientist

Four stages of data analytics

• What will happen?
• What happened?
• Why did it happen?
• What is happening?

[Chart axes: Business Value (up to High); Analytical Challenge & Complexity (0 to High)]

Use data to solve product problems: a solution for answering A/B testing questions

Several thousand A/B tests are live. How do we measure their performance?

1. Let technology work for us
2. Results first, methodology later
3. Bypass the charts and reports
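One standard way to measure an A/B test on conversion rates is a two-proportion z-test, and "results first, methodology later" suggests reporting a plain verdict per test before the statistics. The sketch below shows that idea under assumptions: it is not the solution described in the talk, the function names and the 0.05 threshold are hypothetical choices, and the counts are made up.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (lift, z, two-sided p-value) for variant B versus control A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal, via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    lift = (p_b - p_a) / p_a
    return lift, z, p_value


def summarize(tests, alpha=0.05):
    """Label each test with a verdict so the result comes before the method."""
    report = {}
    for name, (conv_a, n_a, conv_b, n_b) in tests.items():
        lift, z, p_value = two_proportion_z_test(conv_a, n_a, conv_b, n_b)
        if p_value >= alpha:
            verdict = "no significant difference"
        elif lift > 0:
            verdict = "B wins"
        else:
            verdict = "A wins"
        report[name] = (verdict, round(lift, 4), round(p_value, 4))
    return report


# Made-up counts: (control conversions, control users, variant conversions, variant users).
tests = {
    "new_signup_button": (1000, 20000, 1150, 20000),
    "email_subject_line": (500, 10000, 505, 10000),
}
report = summarize(tests)
```

With thousands of tests live at once, a real system would also need to correct for multiple comparisons; that step is left out of this sketch.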


Next play: Web 3.0 – It’s all about data!!

We are hiring!

Thank you!

[email protected]