Big Data - Beyond the 'Bigness' and the Technology
April 26, 2012
Anant Jhingran @jhingran
http://blog.apigee.com
http://jhingran.typepad.com
groups.google.com/group/api-craft
youtube.com/apigee
IRC channel: #api-craft on freenode
Three themes
The Big Data dialog has focused on the wrong things – bigness and technology
Big Data needs to focus on the right new thing – stitching data from disparate sources
Data APIs need to be front and center of any Big Data dialog – there is too little discussion of them
Big Data discussion has focused on the wrong things
Wrong thing #1 – focus on technology
Business value
DATA: “THE GOLD”
TECHNOLOGY: “THE MEANS”
Cassandra
HBASE
EC2 . . .
[Chart: depth of analysis (vertical axis) vs. size of the data (horizontal axis, 100 TB to 10 PB)]
Interesting problems
Hype
2 dimensions of complexity
Big data nerds; $$$ VC investment; next cool tech (webscale etc.)
Wrong thing #2 – focus on bigness
Big Data needs to focus on the new right thing
Circa 2005 – Data controlled within enterprise
Your Company
Web Page
Store
Data Warehouse
2012 – Control shifts to edge of enterprise
Your Company
Web Page
Store
Data Warehouse
Business Networks
Social Networks
Partners
Apps
API
Control shifts to edge of enterprise
Big Data needs to become Broad Data
[Chart: data volume for enterprise data sources (old world) vs. enterprise + complementary sources (new world)]
[Chart axis: signal / noise]
Most of the bigness comes from noise
The noise doesn’t matter
Only the signal matters
Increase signal/noise by stitching data sources
[Chart: signal / noise; labels: enterprise, syndicated]
✖ Web 1.0 – Crawling . . .
✖ Web 2.0 – AJAX . . .
✔ Web 3.0 – APIs + control of data
enterprise
external
access?
control?
centralized or decentralized process?
If we give up the wrong things and take up the right things, what is it that we need to do?
It’s about . . .
• Accessing data that others collect
• Variety
• Striking deals
• Respecting the APIs
• Data stitching and improving S/N ratio
• Depth of analysis
It’s not about . . .
• Crawling
• BIGNESS from any one data source
Shifting from Big Data to Broad Data
Data APIs are the future
So what kind of Data APIs?
Monetizable apps produce & consume data
Data is the lifeblood at edge of enterprise
Need to focus on making data consumption easy
The yin and yang of transactions and data
Example APIs
X-APIs (transactions)
• User management
• Send SMS
• Add movie
• Do trade
• Get credit info
D-APIs (data)
• Browse catalog
• Get weather by ZIP code
• Get demographics by region
Let’s create an information halo around APIs
See “Amundsen’s Dogs, Information Halos and APIs: The epic story of your API Strategy”:
http://blog.apigee.com/detail/api_strategy_talk_web_2.0/
Give Data . . .
What are your transactions, and what are your data?
Do you want to be crawled, or do you want to control access to your data?
Give Visibility . . .
. . . to both your end developers and your colleagues
Analytics and Data go hand in hand…
People are planting “flags” on various data domains by collecting and stitching disparate data together
Weather
Real-estate
Finance
Internet Traffic
Local
Business
Social
Demographic
Purchases
Price
To build out a single domain, many data sources have to be accessed and stitched
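As a rough illustration of what "stitching" can mean in practice, the sketch below joins records from two sources on a shared key. It is a minimal sketch, not a method from the talk: the two dictionaries, their field names, their made-up values, and the `stitch` helper are all assumptions chosen for illustration.

```python
# Hypothetical records from two independent sources, keyed by ZIP code.
# All values are made up for illustration.
weather_by_zip = {
    "94301": {"forecast": "sunny", "high_f": 72},
    "10001": {"forecast": "rain", "high_f": 58},
}
demographics_by_zip = {
    "94301": {"population": 17000, "median_age": 41},
    "60601": {"population": 15000, "median_age": 35},
}


def stitch(*sources):
    """Merge per-key records from several sources into one record per key.

    Only keys present in every source are kept, so each stitched record
    is complete; later sources overwrite earlier ones on field clashes.
    """
    shared_keys = set(sources[0]).intersection(*sources[1:])
    stitched = {}
    for key in sorted(shared_keys):
        record = {}
        for source in sources:
            record.update(source[key])
        stitched[key] = record
    return stitched


local_profile = stitch(weather_by_zip, demographics_by_zip)
print(local_profile)
```

Keeping only the keys that every source shares is one possible design choice; a real pipeline would also need cleansing and key reconciliation, which this sketch leaves out.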
A natural stitching technique could be Linked Data
linkeddata.org
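To make the Linked Data idea concrete, here is a minimal sketch in which each source publishes (subject, predicate, object) triples. Every URI below is a hypothetical placeholder under example.org, not a real vocabulary, and the `describe` helper is an assumption for illustration. The point it tries to show: when two sources name the same thing with the same URI, stitching reduces to a set union.

```python
# Each source publishes (subject, predicate, object) triples.
# All URIs are hypothetical placeholders, not real vocabularies.
PALO_ALTO = "http://example.org/place/palo-alto"
FRESNO = "http://example.org/place/fresno"
FORECAST = "http://example.org/vocab/forecast"
LISTING_BAND = "http://example.org/vocab/listingBand"

weather_source = {
    (PALO_ALTO, FORECAST, "sunny"),
}
real_estate_source = {
    (PALO_ALTO, LISTING_BAND, "high"),
    (FRESNO, LISTING_BAND, "mid"),
}


def describe(graph, subject):
    """Collect every predicate/object pair known about one subject."""
    return {pred: obj for subj, pred, obj in graph if subj == subject}


# Both sources use the same URI for the place, so stitching is a set union.
graph = weather_source | real_estate_source
print(describe(graph, PALO_ALTO))
```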
Once stitched, clean APIs can be provided
[Diagram: Data Sources (crawled, bulk loaded, API accessed); Cleansed, Stitched; Data API and Analytics]
[Diagram: Data Sources (crawled, bulk loaded, API accessed); Cleansed, Stitched; Data API and Analytics]
Typically Linked Data techniques not used here
[Diagram: Data Sources (crawled, bulk loaded, API accessed); Cleansed, Stitched; Data API and Analytics]
Can Linked Data techniques be used here?
Linked Data as the Data API for these domains is not likely to be very common
Why? The interlinking of domains is not as important as the strength of any one domain (at least for now)
If not linked data APIs, what other Data APIs might become common?
[Diagram: Data Sources (crawled, bulk loaded, API accessed); Cleansed, Stitched; Data API and Analytics]
Our guess: APIs patterned after relational access
Kinds of Data APIs we are observing
Data
Primary key lookup: http://weather.yahooapis.com/forecastrss?w=location
Imposed hierarchy-based traversal over collections: http://api.worldbank.org/incomeLevels/LIC/countries
“Rectangle” {rows, columns} through query parameters: http://api.worldbank.org/countries?per_page=10&incomeLevel=LIC
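The three access patterns above can be sketched against a small in-memory table. This is an illustrative sketch only, not how the Yahoo or World Bank services are implemented: the `COUNTRIES` rows are made up, the function names are assumptions, and the `fields` query parameter is a hypothetical addition used to show column selection.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory table standing in for a data provider's store.
COUNTRIES = [
    {"id": "AA", "name": "Country A", "incomeLevel": "LIC"},
    {"id": "BB", "name": "Country B", "incomeLevel": "HIC"},
    {"id": "CC", "name": "Country C", "incomeLevel": "LIC"},
]


def lookup(country_id):
    """Primary key lookup: one identifier in, at most one record out."""
    return next((row for row in COUNTRIES if row["id"] == country_id), None)


def traverse(path):
    """Hierarchy traversal over a path like /incomeLevels/LIC/countries."""
    parts = path.strip("/").split("/")
    if len(parts) == 3 and parts[0] == "incomeLevels" and parts[2] == "countries":
        return [row for row in COUNTRIES if row["incomeLevel"] == parts[1]]
    raise ValueError(f"unsupported path: {path}")


def rectangle(url):
    """Rectangle: query parameters choose the rows, columns and page size."""
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    per_page = int(params.pop("per_page", len(COUNTRIES)))
    columns = params.pop("fields", None)  # hypothetical column selector
    # Remaining parameters act like equality predicates in a WHERE clause.
    rows = [r for r in COUNTRIES if all(r.get(k) == v for k, v in params.items())]
    rows = rows[:per_page]
    if columns:
        wanted = columns.split(",")
        rows = [{c: r[c] for c in wanted if c in r} for r in rows]
    return rows


print(lookup("BB"))
print(traverse("/incomeLevels/LIC/countries"))
print(rectangle("http://example.org/countries?per_page=1&incomeLevel=LIC&fields=id,name"))
```

Read this way, the rectangle pattern is close to a relational query: the filter parameters play the role of a WHERE clause, the column selector a SELECT list, and `per_page` a LIMIT.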
There are many perspectives on data APIs coming from relational world
http://blog.apigee.com/detail/rest_api_design_for_sql_programmers
http://azgroups.nextslide.com/odata-begins
I gave a talk at Microsoft
If NoData is not an option, is OData the answer?
(http://bit.ly/I1P0I6)
• Practical REST and OData are good starting points
• However, they cannot be available only as vendor-specific implementations
• The Linked Data model cannot be ignored completely
• Let us, as a community, get the best of Linked Data and OData thoughts together
• Let’s continue this dialog: groups.google.com/group/api-craft
What do we need for Data APIs to take off?
The Big Data dialog has focused on the wrong things – bigness and technology
Big Data needs to focus on the right new thing – stitching data from disparate sources
Data APIs need to be front and center of any Big Data dialog – there is too little discussion of them
Wrapping up
THANK YOU
Questions and ideas to:
@jhingran