Big Data: Beyond the "Bigness" and the Technology (webcast)

Big Data - Beyond the 'Bigness' and the Technology

April 26, 2012

Anant Jhingran @jhingran

http://blog.apigee.com

http://jhingran.typepad.com

groups.google.com/group/api-craft

youtube.com/apigee

IRC channel #api-craft on freenode


Three themes

Big Data dialog has focused on the wrong things – bigness and technology, which are both misplaced

Big Data needs to focus on the right new thing – focus on data stitching from disparate data sources

Data APIs need to be front and center of any Big Data dialog – too little discussion on that

Big Data discussion has focused on the wrong things

Wrong thing #1 – focus on technology

Business value

DATA: “THE GOLD”

TECHNOLOGY: “THE MEANS”

Cassandra

HBASE

EC2 . . .

.91

[Chart: depth of analysis (vertical axis) against size of the data (horizontal axis, 100 TB to 10 PB)]

Interesting problems

Hype

2 dimensions of complexity

Big data nerds $$$ VC invest Next cool tech

- webscale etc.

Wrong thing #2 – focus on bigness

Big Data needs to focus on the new right thing

Circa 2005 – Data controlled within enterprise

YourCompany

Web Page

Store

Data Warehouse

2012 – Control shifts to edge of enterprise

YourCompany

Web Page

Store

Data Warehouse

BusinessNetworks

SocialNetworks

Partners

Apps

API


Big Data needs to become Broad Data

Data volume

enterprise data sources

enterprise + complementary sources

old world new world

signal / noise

Most of the bigness comes from noise

The noise doesn’t matter

Only the signal matters

signal / noise

Increase signal/noise by stitching data sources

enterprise

syndicated

✖ Web 1.0 – Crawling . . .

✖ Web 2.0 – AJAX . . .

✔ Web 3.0 – APIs + control of data

enterprise

external

access ?

control ?

central or de-central process?

If we give up the wrong things and take up the right things, what is it that we need to do?

It’s about . . .

• Accessing data that others collect

• Variety

• Striking deals

• Respecting the APIs

• Data stitching and improving the S/N ratio

• Depth of analysis

It’s not about . . .

• Crawling

• BIGNESS from any one data source

Shifting from Big Data to Broad Data

Data APIs are the future

So what kind of Data APIs?


Monetizable apps produce & consume data

Data is the lifeblood at edge of enterprise

Need to focus on making data consumption easy

The yin and yang of transactions and data

X-APIs

• User management

• Send SMS

• Add movie

• Do trade

• Get credit info

Example APIs

D-APIs

• Browse catalog

• Get weather by Zip code

• Get demographics by region

Let’s create an information halo around APIs

http://blog.apigee.com/detail/api_strategy_talk_web_2.0/

See Amundsen’s Dogs, Information Halos and APIs: The epic story of your API Strategy »

Give Data . . .

What are your transactions, and what are your data?

Do you want to be crawled or do you want to control it?

Give Visibility . . .

Analytics and Data go hand in hand…

. . . to both your end developers and your colleagues

People are planting “flags” on various data domains by collecting and stitching disparate data together

Weather

Real-estate

Finance

Internet Traffic

Local

Business

Social

Demographic

Purchases

Price

To build out a single domain, many data sources have to be accessed and stitched

A natural stitching mechanism could be Linked Data

linkeddata.org

Once stitched, clean APIs can be provided
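As a rough illustration of "access, cleanse, stitch, then expose a clean API", here is a minimal Python sketch. Everything in it is an assumption made for illustration: the three sample sources, their field names and values, the choice of ZIP code as the join key, and the helper names `cleanse`, `stitch` and `get_by_zip` do not come from the talk.

```python
from collections import defaultdict

# Hypothetical sample records from three separate sources (values are made up).
weather = [{"zip": "94301", "temp_f": 68}, {"zip": "10001", "temp_f": 55}]
demographics = [{"zip": "94301", "population": 17000},
                {"zip": "10001", "population": 21000}]
real_estate = [{"zip": " 94301 ", "median_price": 2500000}]  # dirty join key


def cleanse(record):
    """Return a copy of the record with a normalized ZIP code key."""
    cleaned = dict(record)
    cleaned["zip"] = str(record["zip"]).strip()
    return cleaned


def stitch(*sources):
    """Merge records from all sources that share the same ZIP code."""
    stitched = defaultdict(dict)
    for source in sources:
        for record in map(cleanse, source):
            stitched[record["zip"]].update(record)
    return dict(stitched)


# The stitched domain is built once; the API below only reads from it.
DOMAIN = stitch(weather, demographics, real_estate)


def get_by_zip(zip_code):
    """Clean data API: one call returns the stitched view, or None if unknown."""
    return DOMAIN.get(zip_code)
```

A caller of `get_by_zip("94301")` sees a single record combining weather, demographic and real-estate fields, without needing to know which source each field came from.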

Data Source

Data Source

Data Source

Data Sources (crawled, bulk loaded, API accessed)

Data API and Analytics

Cleansed, Stitched

Typically Linked Data techniques not used here

Can Linked Data techniques be used here?

Linked Data as the Data API for these domains is not likely to be very common

Why? The interlinking of domains is not as important as the strength of any one domain (at least for now)

If not linked data APIs, what other Data APIs might become common?

Our guess: APIs patterned after relational access

Kinds of Data APIs we are observing

Data

Primary Key Lookup
http://weather.yahooapis.com/forecastrss?w=location

Imposed Hierarchy based traversal over collections
http://api.worldbank.org/incomeLevels/LIC/countries

“Rectangle” {rows, columns} through query parameters
http://api.worldbank.org/countries?per_page=10&incomeLevel=LIC
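The three access patterns above can be sketched over a small in-memory table. This is a toy model under stated assumptions, not the behavior of the Yahoo or World Bank services: the rows are invented placeholders, the host `example.invalid` is deliberately fake, and the function names `lookup`, `traverse` and `rectangle` are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Invented placeholder rows; not real country data.
COUNTRIES = [
    {"id": "AAA", "name": "Country A", "incomeLevel": "LIC"},
    {"id": "BBB", "name": "Country B", "incomeLevel": "UMC"},
    {"id": "CCC", "name": "Country C", "incomeLevel": "LIC"},
    {"id": "DDD", "name": "Country D", "incomeLevel": "HIC"},
]


def lookup(country_id):
    """Primary key lookup: return the single row with this id, or None."""
    return next((row for row in COUNTRIES if row["id"] == country_id), None)


def traverse(path):
    """Hierarchy traversal: /incomeLevels/<level>/countries."""
    parts = path.strip("/").split("/")
    if len(parts) == 3 and parts[0] == "incomeLevels" and parts[2] == "countries":
        return [row for row in COUNTRIES if row["incomeLevel"] == parts[1]]
    raise ValueError(f"unsupported path: {path}")


def rectangle(url):
    """'Rectangle' query: filter rows by column values, cap with per_page."""
    params = parse_qs(urlparse(url).query)
    per_page = int(params.pop("per_page", ["50"])[0])
    rows = COUNTRIES
    for column, values in params.items():
        # Each remaining parameter is treated as an equality filter on a column.
        rows = [row for row in rows if row.get(column) == values[0]]
    return rows[:per_page]
```

For example, `rectangle("http://example.invalid/countries?per_page=1&incomeLevel=LIC")` plays the role of a relational `SELECT ... WHERE incomeLevel = 'LIC' LIMIT 1`, which is the sense in which these APIs are patterned after relational access.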

There are many perspectives on data APIs coming from relational world

http://blog.apigee.com/detail/rest_api_design_for_sql_programmers

http://azgroups.nextslide.com/odata-begins

• Practical REST and OData are good starting points

• However, they cannot be available only as vendor-specific implementations

• The Linked Data model cannot be ignored completely

• Let us, as a community, get the best of Linked Data and OData thoughts together

• Let’s continue this dialog: groups.google.com/group/api-craft

What do we need for Data APIs to take off?

Big Data dialog has focused on the wrong things – bigness and technology, which are both misplaced

Big Data needs to focus on the right new thing – focus on data stitching from disparate data sources

Data APIs need to be front and center of any Big Data dialog – too little discussion on that

Wrapping up

THANK YOU

Questions and ideas to:

@jhingran
