Kaggle: Crowd Sourcing for Data Analytics


DESCRIPTION

These slides use concepts from my (Jeff Funk) course entitled Biz Models for Hi-Tech Products to analyze the business model for Kaggle’s Crowd Sourcing Service for Data Analytics. Kaggle connects data scientists with organizations who have problems related to data analysis. Kaggle helps organizations define their data analytic problems, present them to data scientists, and organize and evaluate competitions between data analytic solutions. Its data ensemble technique also evaluates the effectiveness of the various solutions. These slides describe the specific value proposition for organizations and data scientists and other aspects of the business model such as the method of value capture, scope of activities, and method of strategic control.


MT5016 – BUSINESS MODELS FOR HI-TECH PRODUCTS

A STUDY BY,

Jeffray Jayaraj Michael (A0119246E)

Niha Agarwalla (A0119230U)

Nivethan Santhan (A0121887X)

Sathishkumar Murugesan (A0133745E)

CONTENTS

1. Introduction

2. Scope of Activities

3. Value Proposition

4. Customer Selection & Market

5. Value Capture

6. Competitor Analysis

7. Strategic Control

INTRODUCTION

Kaggle is the medium where companies that have data and need someone to work on it connect with people who want to use their data-solving skills.

Crowdsourcing platform

What is Data Science?

An emerging field dedicated to analysing and manipulating structured and unstructured raw data to derive insights, build processes and products, and alter or develop new business models.

The necessary skill sets range from computer science to mathematics to knowledge of the relevant field.

INTRODUCTION

Data science

How does Kaggle address Data Science?

It is almost never the case that any single organization has access to the advanced machine learning and statistical techniques that would allow it to extract maximum value from its data.

Meanwhile, data scientists crave real-world data to develop and refine their techniques.

Kaggle corrects this mismatch by offering companies a cost-effective way to harness the ‘cognitive surplus’ of the world's best data scientists.

What does Kaggle use to correct the mismatch?

Crowdsourcing – Kaggle shares the real-time data with a specific group of users (data scientists), who come up with predictive models to solve the problems.

INTRODUCTION

WHY DATA SCIENCE AND ANALYTICS?

Organizations are spending an average of 21% of their marketing budget on analytics

http://blogs.osc-ib.com/2014/02/ib-student-blogs/data-is-the-new-oil/

DATA IS THE NEW OIL

http://blogs.osc-ib.com/2014/02/ib-student-blogs/data-is-the-new-oil/

HOW KAGGLE WORKS?

The competition host prepares the data and a description of the problem. The host announces the prize pool for a proper solution together with a deadline for the challenge.

Participants experiment with different techniques and compete against each other to find the best models. After the deadline passes, the competition host pays the prize money to the winner.

Kaggle Connect is the consulting part of the platform, which connects companies to the elite of the Kaggle community, who provide solutions for different data science problems.

HOW THE COMPETITIONS WORK?

1. Company (customer with problems)

2. Kaggle

3. Organize data (Kaggle)

4. Understand (Data Scientist & Kaggle)

5. Collect (Data Scientist & Kaggle)

6. Data exploration (Data Scientist & Kaggle)

7. Plausibility check (Data Scientist & Kaggle)

8. Model (Data Scientist)

9. Validate (Kaggle – Ensemble approach)

10. Communicating results

Data scientist registration

Deploy best solution

WHICH MODEL TO USE?

Countless possible approaches to any data prediction problem.

Which to choose?

HOW KAGGLE SELECTS THE BEST?

Competitions are judged on predictive accuracy and on objective criteria set by the competition host/company.

Kaggle compares techniques on a uniform dataset with a uniform evaluation algorithm that assigns points to each solution, and the results are categorized.

Kaggle uses an ensemble approach, which is held to be better for assessing predictive modelling solutions.

Ensemble approach
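The idea of scoring every solution on one dataset with one metric, and of blending models into an ensemble, can be sketched as follows. This is only an illustration under stated assumptions: the metric (log loss), the held-out outcomes, and the team names are hypothetical and are not taken from Kaggle's actual scoring software.

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes; lower is better."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def rank_submissions(y_true, submissions):
    """Score every submission with the same metric and sort best-first."""
    scored = [(name, log_loss(y_true, preds)) for name, preds in submissions.items()]
    return sorted(scored, key=lambda item: item[1])

def average_ensemble(prediction_lists):
    """Blend several models by averaging their predictions row by row."""
    return [sum(row) / len(row) for row in zip(*prediction_lists)]

# Hypothetical held-out outcomes and hypothetical submissions (assumptions).
y_true = [1, 0, 1, 1, 0]
submissions = {
    "team_a": [0.9, 0.2, 0.6, 0.7, 0.4],
    "team_b": [0.6, 0.1, 0.9, 0.5, 0.2],
    "team_c": [0.5, 0.5, 0.5, 0.5, 0.5],
}
# A simple ensemble: the row-wise average of two submissions.
submissions["ensemble_a_b"] = average_ensemble(
    [submissions["team_a"], submissions["team_b"]]
)

leaderboard = rank_submissions(y_true, submissions)
for position, (name, score) in enumerate(leaderboard, start=1):
    print(f"{position}. {name}: {score:.4f}")
```

Because every entry is scored by the same function on the same outcomes, the ranking is directly comparable across teams. Averaging is only one of many ways to build an ensemble.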

HOW COMPETITIONS ARE CATEGORISED?


Getting Started: Public competitions for beginners that involve no cash prize. Customers are mostly non-profit organizations.

Playground: Public competitions set up to be more fun, quirky and idea-driven, rather than to solve any business or research problem.

Kaggle Prospect: Public competitions that don’t use the leaderboard to determine the winner, and where the goal is not a predictive model. The goals of Prospect competitions include data exploration, analyses, and data visualizations.

Research: Public competitions where the competition goals are research/scientific in nature or serve a public good. These competitions tend to focus on ambitious machine learning problems at the forefront of technology, or problems with a significant social-good aspect.

Recruiting: Public competitions where the sponsors are looking to hire data scientists and use the competition to find and test potential talent. There are no teams, and each user must showcase their individual work.

Masters: Competitions open only to a select tier of elite Kagglers, or a subset of these by invitation-only or special eligibility criteria. These competitions have significant commercial value or sensitive data.

Featured: Public competitions with significant prize money meant to solve commercial problems. Prize winners grant the sponsor a non-exclusive license to their work, and will present their results via a detailed write-up.


SAMPLE COMPETITION

Intel gathered data on previous NCAA tournament results and fixture match-ups, player data, and home and away wins over a period of two decades.

The first stage is to generate a predictive model and compare it with the previous tournaments.

The target is to use the model to predict the winners of the 2014 NCAA tournament.

Prize money: $15,000

Predictions:

id         pred1              pred2              name.x         name.y
S_507_509  0.245309234288291  0.708999299530187  ALBANY NY      AMERICAN UNIV
S_507_511  0.015245408147597  0.083965574256572  ALBANY NY      ARIZONA
S_509_511  0.044761732923018  0.041779131840498  AMERICAN UNIV  ARIZONA
S_509_512  0.282281213282214  0.185690215492044  AMERICAN UNIV  ARIZONA ST
S_507_512  0.114997411223728  0.324048786686369  ALBANY NY      ARIZONA ST
S_511_521  0.846952788682282  0.835060080083856  ARIZONA        BAYLOR
S_507_521  0.077615865041407  0.28593300082739   ALBANY NY      BAYLOR
S_509_536  0.304576324006342  0.187324294026667  AMERICAN UNIV  BYU
S_507_536  0.126407140118714  0.326412371166609  ALBANY NY      BYU
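As a rough illustration of how such predictions could be compared, the sketch below scores the two prediction columns for three of the sample match-ups. It rests on assumptions not stated in the slides: that an id such as S_507_509 encodes a season letter and two team ids, that each prediction is the probability that the first-listed team wins, and that log loss is the metric. The game outcomes are invented purely for the example.

```python
import math

# Three match-ups from the sample predictions: (id, pred1, pred2).
rows = [
    ("S_507_509", 0.245309234288291, 0.708999299530187),
    ("S_507_511", 0.015245408147597, 0.083965574256572),
    ("S_511_521", 0.846952788682282, 0.835060080083856),
]

def parse_id(matchup_id):
    """Split 'S_507_509' into (season, first_team, second_team).

    The meaning of the three parts is an assumption for this sketch.
    """
    season, first, second = matchup_id.split("_")
    return season, int(first), int(second)

def log_loss(outcomes, predictions, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes; lower is better."""
    total = 0.0
    for y, p in zip(outcomes, predictions):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(outcomes)

# Hypothetical results (NOT real data): 1 means the first-listed team won.
hypothetical_outcomes = {"S_507_509": 0, "S_507_511": 0, "S_511_521": 1}

y = [hypothetical_outcomes[matchup_id] for matchup_id, _, _ in rows]
score_pred1 = log_loss(y, [p1 for _, p1, _ in rows])
score_pred2 = log_loss(y, [p2 for _, _, p2 in rows])

print(parse_id(rows[0][0]))
print(f"pred1 log loss: {score_pred1:.4f}")
print(f"pred2 log loss: {score_pred2:.4f}")
```

With a metric of this kind, a confident wrong prediction is penalised much more heavily than a cautious one, which is why submissions are probabilities rather than simple win/lose picks.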

TOP COMPANIES INVOLVED

Thousands of competitions are hosted on Kaggle.

Competition topics range from biology to finance.

Companies such as NASA and Microsoft, as well as medium-sized enterprises, host competitions.

Universities such as Stanford and Harvard also host competitions.

KAGGLE COMMUNITY

The Kaggle community is the place where data scientists and experts come together on a single platform to share thoughts.

Kaggle runs a blog, “No Free Hunch”, where activity on Kaggle, best practices, conferences and updates on recent developments are regularly posted.

The community also includes the top data scientists in the world, with whom companies can discuss the current model and the effects of the predictive models developed.

The Jobs Board is a new feature where a company/customer in need of data scientists can post an ad with its requirements.

SCOPE OF ACTIVITIES

Kaggle | Open source | Investors & support | Companies | Data Scientist

Competition hosts x

Data providers x

Content development x

Software x x

Algorithm x x x

Evaluation x

Data Storage x

Marketing x x

Licensing x x

Reading material x x x

Search x

Terms x x x

VALUE PROPOSITION – KAGGLE

KAGGLE has two types of customer:

1. Data scientists (who work on the problem)

2. Companies/organizations (who provide the problem)

VALUE PROPOSITION – COMPANIES

Participation by the world's leading data scientists

Many data scientists participate

Different minds give different solutions

Kaggle platform <<< data scientist

Ensemble approach

Signing of NDA, background check, exclusive sets of data scientists

VALUE PROPOSITION FOR DATA SCIENTISTS

To big companies such as NASA, Facebook, Microsoft

Highly paid jobs in big organizations.

Signature track: a place on the Kaggle leaderboard gives data scientists recognition in the field of predictive modelling.

CUSTOMER SELECTION - MARKET SEGMENTS

Companies & Research Organization

Data Scientists

END USERS

Corporations and Research Organizations

People

Kaggle

Trend Analytics on Stock Prices

Users Subscribe to services based on Kaggle Solutions

Direct

Indirect

TARGETED INDUSTRIES

Companies & Research Organization

Life Sciences, Energy, Financial Services, IT, Retail

COMPANIES OF FOCUS

TARGETED USERS

100,000 Data Scientists

Job Seekers

Freelancers

DATA SCIENTISTS

https://gigaom.com/2013/07/11/kaggle-now-has-100k-data-scientists-but-whats-a-data-scientist/

KAGGLE: NUMBER OF DATA SCIENTISTS

100,000 as of 2013

KAGGLE’S MARKET

Sales Forecasting

Stock Forecasting

Risk Modelling & Pricing

Logistic optimisation

Best Process Prediction

Inventory Management

Traffic Forecasting

Energy demand

Crime Prediction

Tax Social fraud detection

Hospital Casualty Demand

Private Sectors

Public Sectors

MARKET DRIVERS

IT offers a definitive source of competitive advantage across all industries and will offer significant future value.

Data is increasingly considered the commodity of the future.

Individuals create 70% of data; enterprises store 80% of the data.

MARKET OPPORTUNITY

http://marketing555.wordpress.com/2012/10/02/the-big-and-small-of-data/

Overall: $107 Billion

Outsourced: $43 Billion in 2017

CURRENT REVENUE STREAM - BUSINESS

Kaggle Competition (diagram labels): Company - Open Data; Data Scientists; Solution; Prize Money; Community Access; % from prize money

Kaggle Connect (diagram labels): Company - Sensitive Data; Top 0.5% Data Scientists; Top Data Scientist Access; Connect Fee; Money; Solution

CURRENT REVENUE STREAM - EDUCATION

Diagram labels: Kaggle corp; Top universities; Student enrolled in the university; Question & Data; Assignments; Data model; Results in order of marks obtained; % Revenue

PROPOSED REVENUE STREAM – EDUCATION

Contracts could be signed with online courseware websites such as Coursera and edX to provide data for students enrolled in specific courses.

The Singapore government has proposed introducing data science in high schools as part of the co-curriculum. Kaggle could enter this market by providing a tool for schools.

PROPOSED REVENUE STREAM – GOVERNMENT ALIASES

Diagram labels: Kaggle corp; Kaggle competition; Kaggle connect; Government/Customer; Local Data scientist; Data available online; Job offer; Prize money; % of Prize money; Job; Data model; Has knowledge about the local market; Data model + Trust/Privacy; Human Resource

Brand value gained as a government recognised platform/organisation for Analytics

PROPOSED REVENUE STREAM – KAGGLE CONSULTANCY

Diagram labels: Kaggle corp; Kaggle connect; Kaggle consultancy; Oil & Gas industries/Customer; Raw Data + Challenge; Fee for consultancy; Top 0.5% of Data Scientist in relevant field; Work; Job offer; Structured data; Ownership; Data model

With good brand value, trust and adequate human resource availability, Kaggle could enter the field of analytics as a consulting firm.

The major field of interest could be Oil & Gas, as the data is large, unstructured and sensitive.

VALUE CAPTURE - KAGGLE PRODUCTS

Kaggle Public Competitions

Competitions allow organizations to post their data and a specific prediction problem to be answered competitively by the world's best

Kaggle Masters Competitions

Kaggle provides the same platform as with its public competitions, except that access is limited only to an elite group of Kaggle players

Kaggle-in-Class

Kaggle-in-Class allows instructors to host data prediction competitions for their students.

KAGGLE IN LONG TAIL

                        Kaggle                                                 Innocentive
For users               Career choice with enough competitions                 Rewarding hobby
Platform                Crowdsourcing, open innovation, predictive modelling   Open innovation, research and development
Scope                   Problems involving data analytics                      R&D in various industries
Registered members      100,000 in 3 years                                     300,000 in 12 years
Max prize money         3 million                                              1 million
Number of competitions  311 (107/year)                                         1650 (138/year)

https://www.kaggle.com/competitions

KAGGLE VS INNOCENTIVE

Kaggle focuses on problems that are related to data analytics. Kaggle’s data scientists use machine learning as a methodology to solve these problems.

Problems posted on Innocentive are related to R&D, product development and generic issues. Usually coding stands as the major part of the development.

These two are different organizations with different value propositions.

NETWORK EFFECT

First mover advantages of internet platforms

Clients

Data Scientists

More data scientists attract more clients

More clients attract more data scientists

STRATEGIC PARTNERSHIP & COLLABORATION

Strong collaboration with big data companies and institutions – GE, Google, Facebook, Amazon, Walmart

Secure Platform

BARRIER FOR ENTRY

Strengthening and establishing exclusive relationships with big data companies and world-class institutions will create a barrier for other competitors to enter the business

The business model should be protected by patent or trade secret

IP MANAGEMENT

Kaggle has strong IP management

IP-protected ranking software is used to choose the best model

The ranking software is the key to appropriability

Between the parties, Kaggle is the owner of all Intellectual Property Rights in and to the Website

The winning entry will be governed by a separate contract between the winner and the Competition Host

All text, graphics, user interfaces, photographs, trademarks, logos and artwork, including the design, structure… licensed by or to Kaggle and is protected by applicable copyright, patent and trademark laws and various other intellectual property rights and unfair competition laws.

COMPLEMENTARY ASSETS

Job Opportunities

Data analysis courses and online support

Certificate/Credit System: Kaggle could establish a credit system, similar to the leaderboard, that a student can leverage to join a school

Complementary products like T-shirts for non-profit competitions

Transforming the inefficient market for technical talent into the world’s largest meritocracy.

“I keep saying the sexy job in the next ten years will be statisticians.”

Hal Varian, Google Chief Economist, 2009

“Aim to make Data Science a Sport.”

Anthony Goldbloom, Kaggle Founder, 2012

THANK YOU