Big Data EcoSystem @ LinkedIn October 20, 2012 LinkedIn Confidential ©2013 All Rights Reserved

Infovision sunil shirguppi _ large scale BI and analytics

Big Data EcoSystem @ LinkedIn Sunil Shirguppi

Page 1

Big Data EcoSystem @ LinkedIn
October 20, 2012

Page 2

Sunil Shirguppi
Head of Data Services - International
LinkedIn Corporation
http://www.linkedin.com/in/sunilshirguppi

Page 3

Outline

• LinkedIn Overview
• Data Science
• Big Data Eco-System
• Learnings

Page 4

Our Mission

Connect the world’s professionals to make them more productive and successful

Page 5

We are the professional profile of record

Googled yourself lately? Don’t feel bad, we all do it.

Page 6

Executives from all

Companies are LinkedIn members

Page 7

The LinkedIn Opportunity

Fundamentally transforming the way the world works

Connect talent with opportunity at massive scale

Page 8

The World’s Largest Professional Network

LinkedIn Members (Millions)

Year      2004  2005  2006  2007  2008  2009  2010
Members      2     4     8    17    32    55    90

• 175M+ members*
• 82% of Fortune 100 companies use LinkedIn to hire
• >2M Company Pages**
• ~2 new members joining per second
• ~4.2B professional searches in 2011

*as of Nov 4, 2011
**as of June 30, 2011

Page 9

Multiple revenue channels

• Premium Subscriptions
• Self Serve Ads
• Hiring Solutions
• Marketing Solutions

Page 10

Let’s talk Data…

Page 11

Business is recognizing the importance of analytics

Page 12

What makes a Data Scientist?

Data Scientist = Curiosity + Intuition + Data gathering + Standardization + Statistics + Modeling + Visualization + Communication

Page 13

Big Data at LinkedIn

* Chart from Philip Russom, Research Director, TDWI

Page 14

What do we do with Data?

• Data Standardization
• Build innovative data products to help professionals
• Draw insights
• Drive the business

Before we can do that, there are a few challenges that we have to overcome:

• Scale
• Standardization
• Infrastructure

Page 15

A Few Data-Driven Products

Pandora Search for People

Events You May Be Interested In

Groups browse maps

Page 16

How do we do it?

Page 17

Page 18

LinkedIn Sample Data Stack

Crowdsourcing

Page 19

Big Data at LinkedIn

High-level data environment

[Diagram: Users, Application, WebLogs, Online Data Store, Near-Line Data Store, Offline Data Store]

• Challenges are too complex for off-the-shelf products or a handful of technologies to address
• Built our own combination of toolsets/technologies to meet specific requirements

Page 20

LinkedIn Data Stack – Online

[Diagram: high-level data environment, as on Page 19]

Capabilities

• Rich structures (e.g. indexes)
• Change capture capability

Page 21

LinkedIn Data Stack – Nearline

[Diagram: high-level data environment, as on Page 19]

Systems               Capabilities
Voldemort             Key value access
Zoie, Bobo, Sensei    Search platform
D-Graph               Distributed graph engine

Page 22

LinkedIn Data Stack – Pipeline

[Diagram: high-level data environment, as on Page 19]

Capabilities

• Messaging for site events, monitoring
• Change data capture streams
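
As an illustration of what consuming a change data capture stream can look like, here is a minimal Python sketch. It is not LinkedIn's pipeline: the `ChangeEvent` record, its fields, and the `apply_changes` helper are all hypothetical names chosen for this example. The sketch assumes each change carries a monotonically increasing sequence number, which lets a consumer skip events it has already applied.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change, as a hypothetical capture stream might emit it."""
    sequence: int                          # assumed monotonically increasing commit order
    op: str                                # "upsert" or "delete"
    key: str
    value: Optional[Dict[str, Any]] = None


def apply_changes(replica: Dict[str, Dict[str, Any]],
                  events: Iterable[ChangeEvent],
                  last_applied: int = 0) -> int:
    """Apply events in sequence order and return the new high-water mark.

    Events at or below last_applied are skipped, so replaying the same
    stream after a restart does not apply a change twice.
    """
    for event in sorted(events, key=lambda e: e.sequence):
        if event.sequence <= last_applied:
            continue
        if event.op == "upsert":
            replica[event.key] = dict(event.value or {})
        elif event.op == "delete":
            replica.pop(event.key, None)
        else:
            raise ValueError(f"unknown op: {event.op!r}")
        last_applied = event.sequence
    return last_applied
```

A downstream store (a search index or a cache, for example) would keep the returned high-water mark and pass it back in on the next batch.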

Page 23

LinkedIn Data Stack – Offline

[Diagram: high-level data environment, as on Page 19]

Capabilities

• Machine learning, ranking, relevance
• Warehouse and analytics

Page 24

LinkedIn with Hadoop, Aster, and Teradata

Integrated Data Warehouse
• Exec Dashboards
• Adhoc/OLAP
• Complex SQL
• SQL
Integration with structured data, operational intelligence, scalable distribution of analytics

Data transformation & batch processing
• Image processing
• Search indexes
• Graph (PYMK)
• MapReduce
Batch data transformations for engineering groups using HDFS + MapReduce

Analytic Platform for data discovery
• nPath Pattern/Path
• Clickstream analysis
• A/B site testing
• Data Sciences discovery
• SQL-MapReduce
Interactive MapReduce analytics for the enterprise using MapReduce Analytics & SQL-MapReduce

Connectors
• Aster/Teradata Bi-Directional Connector
• Aster/Teradata Hadoop Connectors
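
To make the pattern/path and clickstream analysis items more concrete, here is a small Python sketch of the underlying idea: order each visitor's clicks by time, then look for a given sequence of pages. This is plain Python, not Aster's nPath function or SQL-MapReduce, and the `(visitor_id, timestamp, page)` tuple layout and function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical click record: (visitor_id, timestamp, page)
Click = Tuple[str, int, str]


def paths_by_visitor(clicks: Iterable[Click]) -> Dict[str, List[str]]:
    """Group clicks per visitor and order each group by timestamp."""
    grouped: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for visitor, timestamp, page in clicks:
        grouped[visitor].append((timestamp, page))
    return {v: [page for _, page in sorted(rows)] for v, rows in grouped.items()}


def contains_path(path: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if pattern occurs as consecutive steps somewhere in path."""
    n = len(pattern)
    return any(list(path[i:i + n]) == list(pattern)
               for i in range(len(path) - n + 1))


def visitors_matching(clicks: Iterable[Click],
                      pattern: Sequence[str]) -> List[str]:
    """Return the sorted ids of visitors whose click path contains pattern."""
    return sorted(v for v, path in paths_by_visitor(clicks).items()
                  if contains_path(path, pattern))
```

For example, `visitors_matching(clicks, ["search", "profile", "connect"])` lists the visitors who went straight from a search to a profile to a connection request.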

Page 25

Page 26

It’s a global economy

Country connectedness on LinkedIn

Page 27

Data deep dives

Job migration after financial collapse

Page 28

How often do people change jobs?

Page 29

Visualization is important

Page 30

If your name is Chip, you are likely in sales!

Page 31

Industry Growth

Page 32

Buzzwords

Page 33

What next?

• Self service analytics
• Metadata framework
• Integrate reporting solutions
• Go Mobile!
• Scalability and Data Quality

Page 34

Challenges

• Data volumes and availability
  – Billion+ rows every day
  – Users in global locations need data
• Multiple platforms
  – Agile development
  – Data integration
• Data Quality
  – User input data
  – Data standardization
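
One way to picture the standardization challenge for user input data is normalizing free-text job titles so that variants of the same title can be grouped and counted. The Python sketch below is only an illustration under that assumption: the abbreviation table and the `standardize_title` function are hypothetical, and a production mapping would be far larger and curated.

```python
import re
from typing import Dict

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS: Dict[str, str] = {
    "sr": "senior",
    "jr": "junior",
    "engr": "engineer",
    "eng": "engineer",
    "mgr": "manager",
    "vp": "vice president",
}


def standardize_title(raw: str) -> str:
    """Map a free-text title to a normalized form for grouping and counting."""
    # Lowercase, then replace every run of non-alphanumeric characters with a space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    # Expand known abbreviations word by word; unknown words pass through unchanged.
    words = [ABBREVIATIONS.get(word, word) for word in cleaned.split()]
    return " ".join(words)
```

With this table, "Sr. Software Engr" and "SENIOR software engineer" both normalize to "senior software engineer", so they land in the same bucket.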

Page 35

Key Learnings

• Self Service
  – Making data accessible to key stakeholders in a timely manner creates tremendous value.
  – Viz is more important than we think
• Measuring your future investments
  – Performance is not the only measure
  – Company fundamentals matter
• As a data team, be in control of your destiny
  – Identify what to measure and lead by metrics
  – Become the think tank

Page 36

Web 3.0 – It’s all about data!!

Page 37

ULTIMATELY…

Page 38

It is all about the people!

Page 39

Thank You!