Big data arch_analytics

Big Data Ecosystem and Analytics @ LinkedIn

May 16, 2013

LinkedIn Confidential ©2013 All Rights Reserved

Srinu Adira

Manager, Data Services (Business Solutions)

LinkedIn Corporation

http://www.linkedin.com/in/srinuadira

Outline

• LinkedIn Overview
• Why Data is important for LinkedIn?
• Big Data Ecosystem
• Analytics at LinkedIn

Our Mission

Connect the world’s professionals to make them more productive and successful

The LinkedIn Opportunity

Connect talent with opportunity at massive scale

+

Fundamentally transforming the way the world works

200M+

The World’s Largest Professional Network

Chart: LinkedIn Members (Millions), 2006 to 2012. Values shown: 8, 17, 32, 55, 90, 147

• 88% of Fortune 100 companies use LinkedIn to hire
• ~2/sec: new members joining
• >2.9M company pages
• ~5.7B professional searches in 2012

Outline

• LinkedIn Overview
• Why Data is important for LinkedIn?
• Big Data Ecosystem
• Analytics at LinkedIn

“If you are not embarrassed by the first version of your product, you have launched it too late.”

Reid Hoffman, Founder & Chairman LinkedIn Corp

“What gets measured gets fixed.”

David Henke, SVP Technology Operations, LinkedIn Corp

The Power of LinkedIn’s Network Effects

Member growth and engagement

Relevant and valuable products, solutions & services

Critical mass of data

A Few Data-Driven Products
• People You May Know
• Groups You May Like
• Jobs You May Be Interested In
• Who's Viewed Your Profile
• Companies You May Want To Follow

Data Insights (Sample)

Data Solutions (Sample)

Solutions: Segmentation/Standardization, Propensity Modeling, Targeting, Churn Analysis/LTV, Business Forecasting

Technologies: Java, MPP, Hadoop, ML/Statistical Packages

Data Solutions Drivers
• Business analytics (e.g., data mining, enabling decision making)
• Sales analytics (e.g., customer segmentation, targeting)
• Marketing (e.g., campaigns)
• Data insights for customers (e.g., career site analytics)
• Business operations (forecasting, business pulse)

Outline

• LinkedIn Overview
• Why Data is important at LinkedIn?
• Big Data Ecosystem
• Analytics at LinkedIn

Big Data at LinkedIn

* Chart from Philip Russom, Research Director, TDWI

Big Data at LinkedIn

Platform and solutions that:
• Scale at cost with data complexity
• Simplify the data continuum across online, near-line, and offline
• Enable business decisions

What does “big data” mean at LinkedIn?

ERP data…

Social Data…

CRM data…

Web data…

Chart: Data Volume vs. Analytical Challenge & Complexity, each axis running from 0 to +∞

3 major data dimensions at LinkedIn

Identity Data

Social Data

Behavioral Data

Big Data at LinkedIn

High-level data environment

Diagram: Users, Application, Online Data Store, Near-Line Data Store, Offline Data Store, Web Logs

Challenges so complex that off-the-shelf or a few technologies can’t address them

Built our own combination of toolsets/technologies to meet specific requirements

LinkedIn’s Sample Data Stack

Let’s do a deep dive to understand how the capabilities of LinkedIn’s data stack meet our requirements

LinkedIn Data Stack – Online

Systems

Capabilities

Rich structures (e.g., indexes)

Change capture capability

LinkedIn Data Stack – Nearline

Capabilities and systems
• Distributed key-value store: Voldemort
• Search platform: Zoie, Bobo, Sensei
• Distributed graph engine: D-Graph
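The slide names a distributed key-value store as a nearline capability but does not describe how keys are spread across machines. The sketch below is one common approach, consistent hashing, shown purely as an illustration. The class names `HashRing` and `ToyKeyValueStore`, the node names, and the `replicas` parameter are all hypothetical and are not taken from the deck or from any LinkedIn system.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring that maps keys to storage nodes (illustrative only)."""

    def __init__(self, nodes, replicas=64):
        # Place each node on the ring at `replicas` virtual points so that
        # keys spread more evenly across nodes.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        # Stable hash, so the same key always lands on the same ring position.
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # Pick the first virtual point clockwise from the key, wrapping around.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


class ToyKeyValueStore:
    """In-memory stand-in for a partitioned key-value store (hypothetical)."""

    def __init__(self, nodes):
        self._ring = HashRing(nodes)
        self._data = {node: {} for node in nodes}

    def put(self, key, value):
        self._data[self._ring.node_for(key)][key] = value

    def get(self, key, default=None):
        return self._data[self._ring.node_for(key)].get(key, default)


store = ToyKeyValueStore(["node-a", "node-b", "node-c"])
store.put("member:42", {"headline": "Data engineer"})
profile = store.get("member:42")
```

The appeal of this design is that adding or removing a node only moves the keys adjacent to that node's ring positions, rather than reshuffling everything.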

LinkedIn Data Stack – Pipeline

Systems Capabilities

Messaging for site events, monitoring

Change data capture streams

Reliable, consistent, low latency pipe
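To make "change data capture streams" concrete, here is a minimal sketch assuming the simplest possible model: an append-only log of changes, with each downstream consumer tracking its own read offset. The names `ChangeEvent`, `ChangeLog`, and `Replica` are hypothetical, and nothing here reflects the actual pipeline systems on the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChangeEvent:
    sequence: int   # position in the log, assigned on append
    key: str
    value: object   # new value; None marks a delete


@dataclass
class ChangeLog:
    """Append-only, ordered log of changes made to a source store."""
    events: list = field(default_factory=list)

    def append(self, key, value):
        event = ChangeEvent(len(self.events), key, value)
        self.events.append(event)
        return event

    def read_from(self, offset):
        # Consumers pull every event at or after their own offset.
        return self.events[offset:]


class Replica:
    """Downstream copy that stays consistent by replaying the log in order."""

    def __init__(self):
        self.state = {}
        self.offset = 0

    def catch_up(self, log):
        for event in log.read_from(self.offset):
            if event.value is None:
                self.state.pop(event.key, None)
            else:
                self.state[event.key] = event.value
            self.offset = event.sequence + 1


log = ChangeLog()
log.append("member:1", "Engineer")
log.append("member:1", "Senior Engineer")
log.append("member:2", "Analyst")
log.append("member:2", None)  # delete

replica = Replica()
replica.catch_up(log)
```

Because each consumer keeps its own offset, a slow or restarted consumer can resume where it left off without affecting the others, which is one way to get the reliable and consistent behavior the slide asks for.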

LinkedIn Data Stack – Offline

Systems

Capabilities

Machine learning, ranking, relevance, solutions

Warehouse and analytics

LinkedIn with Hadoop, Aster, and Teradata

Aster/Teradata Bi-Directional Connector

Aster/Teradata Hadoop Connectors

Data transformation & batch processing
• Image processing
• Search indexes
• Graph (PYMK)
• MapReduce

Batch data transformations for engineering groups using HDFS + MapReduce

Analytic platform for data discovery
• nPath Pattern/Path
• Clickstream analysis
• A/B site testing
• Data Sciences discovery
• SQL-MapReduce

Interactive MapReduce analytics for the enterprise using MapReduce Analytics & SQL-MapReduce

Integrated Data Warehouse
• Exec Dashboards
• Adhoc/OLAP
• Complex SQL
• SQL

Integration with structured data, operational intelligence, scalable distribution of analytics

Outline

• LinkedIn Overview
• Why Data is important at LinkedIn?
• Big Data Ecosystem
• Analytics at LinkedIn

Several examples of business analytics evolution at LinkedIn

Products

Marketing

Sales

How we leverage data to support Marketing

Identity Data

Social Data

Behavioral Data

Overall Audience

Target Audience

The closed-loop analytical framework

Execution

Reporting & business intelligence

Post campaign analysis

Model building and tuning

Campaign planning & design

Test

Measure

Why?

Predict

Design

An example of using data to improve sales

Step 1: Which account?

Step 2: Who?

Step 3: How?

Identity Data

Social Data

Behavioral Data

How to provide 500 to 1000X impact?

Insights portal for sales org.

Easy: quickly find right info

Fast: few seconds response time for most insights

Scalable: 2M+ accounts/prospects

Accurate: mimic analyst/data scientist

Four stages of data analytics

What will happen?

What happened?

Why did it happen?

What is happening?

Chart: Business Value vs. Analytical Challenge & Complexity, each axis running from 0 to High

Use data to solve product problems: a solution for answering A/B testing questions

Let technology work for us

Results first, methodology later

Bypass the charts and reports

Several thousand A/B tests are live; how do we measure their performance?
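The deck does not say how test performance is measured. As one illustration of putting "results first", the sketch below applies a standard two-proportion z-test to the conversion counts of a control and a variant. The function name `two_proportion_z_test`, its parameters, and the sample counts are assumptions made for this example, not LinkedIn's method.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and variant (b).

    Returns (z, p_value), where p_value is two-sided under the
    normal approximation. Illustrative sketch only.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100 of 1000 convert in control, 150 of 1000 in variant.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
significant = p_value < 0.05
```

With thousands of tests running at once, a fixed 0.05 threshold will flag some differences by chance alone, so a production system would also need some form of multiple-comparison correction.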

Nextplay: Web 3.0 – It’s all about data!!

We are hiring!

Thank you!

sadira@linkedin.com
