A presentation given by Erik Swan, CTO/Co-Founder of Splunk, and Michael Wilde, Splunk Ninja, at the SXSW Interactive 2012 Conference on March 11, 2012
Big Data for Everyman
Erik Swan, Michael Wilde
Hi... We work at Splunk.
We stare at data all day.
WTF is Big Data?!
larger than small data?
smaller than giant data?
some cool sauce for DBAs?
Aaaahhh, no.
a simple way to describe a massive problem
*or opportunity depending on your p.o.v.
Volume | Velocity | Variety | Variability
GPS, RFID,
Hypervisor, Web Servers,
Email, Messaging, Clickstreams, Mobile,
Telephony, IVR, Databases, Sensors, Telematics, Storage,
Servers, Security Devices, Desktops
Big data comes out of machines
Machine-generated data is one of the fastest growing, most complex
and most valuable segments of big data
no, not us
we’re just nice guys who want to show you cool stuff
you are a producer and consumer of data
building a service?
using an app?
Location-Based Messaging and Intelligence For Your App and Your Customers
Seth Rabinowitz, CEO
James Rodmell, CTO
2011-11-06 11:57:31,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.75496,-73.963853,60
2011-11-06 12:17:32,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.755001,-73.963886,70
2011-11-06 12:37:33,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754982,-73.963849,75
2011-11-06 12:57:34,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754984,-73.963883,85
2011-11-06 13:17:35,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754941,-73.9639,90
2011-11-06 13:37:36,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754948,-73.963874,90
2011-11-06 13:57:37,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754931,-73.963892,95
2011-11-06 14:17:38,50,00027d27-ae02-627d-a79a-fa0004d3a347,40.755232,-73.963522,100
2011-11-06 14:37:33,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754979,-73.9639,100
Data! Good!
DATE/TIME
DEVICE ID
LAT/LONG
BATTERY STRENGTH
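A minimal Python sketch of turning those comma-separated rows into named fields, using the four labels from the slide. The second column (65 or 50 in the sample) is not labeled on the slide, so it is carried through as `unlabeled` rather than guessed at; the function name and dictionary keys are illustrative choices, not anything from the talk.

```python
import csv
from datetime import datetime
from io import StringIO

# Two rows copied from the sample above.
SAMPLE = """\
2011-11-06 11:57:31,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.75496,-73.963853,60
2011-11-06 14:17:38,50,00027d27-ae02-627d-a79a-fa0004d3a347,40.755232,-73.963522,100
"""


def parse_events(text):
    """Parse device rows into dicts keyed by the labels shown on the slide."""
    events = []
    for row in csv.reader(StringIO(text)):
        if not row:
            continue  # skip blank lines
        timestamp, unlabeled, device_id, lat, lon, battery = row
        events.append({
            "time": datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S"),
            "unlabeled": unlabeled,  # meaning not given on the slide
            "device_id": device_id,
            "lat": float(lat),
            "lon": float(lon),
            "battery": int(battery),
        })
    return events


events = parse_events(SAMPLE)
for event in events:
    print(event["time"], event["device_id"], event["battery"])
```

Once the rows have names, questions like "how fast does the battery reading change?" become a simple loop instead of a squint at raw text.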
show them something cool already!
Oh, real quick. Did you check in
or tweet #splunk #sxsw
...please
All this data can be pretty cool and empowering
except one little
PROBLEM
a lot of it looks like this
13/Apr/2011 08:52:53,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.16,192.168.1.6,(empty),(empty),1099,135,epmap,(empty),0,1
13/Apr/2011 08:52:53,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.16,192.168.1.6,(empty),(empty),1100,43025,43025_tcp,(empty),0,1
13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1048,135,epmap,(empty),0,1
13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1049,43025,43025_tcp,(empty),0,1
13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1051,135,epmap,(empty),0,1
13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1052,43025,43025_tcp,(empty),0,1
13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.64,192.168.1.6,(empty),(empty),1694,135,epmap,(empty),0,1
and we’re expected to talk to it like this
select (select max(answer.answer) from answer where answer.member_id in (select member_id from team_members where project_id in ( select project_id from project where Business_stream='Upstream' and stage='Appraise' and project_id in (select project_id from projectextra where subteam<>1 ) ) ) and answer.page_id=page.page_id) as thinl, (select max(avgscore) from task_project where task_project.project_id not in (select project_id from projectextra where subteam=1 ) and task_project.project_id in (select project_id from project where stage='Appraise' and Business_stream = 'Upstream') and task_project.page_id=page.page_id) as bmax, (select max(answer) from answer where answer.page_id=page.page_id) as datamax, (select avg(avgscore) from task_project where project_id=1 and task_project.page_id=page.page_id) as projavg, (select avg(avgscore) from task_project where project_id not in (select project_id from projectextra where subteam=1) and task_project.page_id=page.page_id) as companyavg, (select avg(avgscore) from task_project where project_id not in (select project_id from projectextra where subteam=1) and project_id in (select project_id from project where Business_stream = 'Upstream') and task_project.page_id=page.page_id) as businessavg, page.* from page,riverorder where page.category_name='BusinessBoundaries' and stage_name='Appraise' and riverorder.category_name=page.category_name order by riverorder.riverorder,page.order_id
select (select max(answer.answer) from answer where answer.member_id in ( select member_id from team_members where project_id in ( select project_id from project where Business_stream='Upstream' and stage='Appraise' and project_id in (select project_id from projectextra where subteam<>1 ) ) ) and answer.page_id=page.page_id) as thinl, (select max(avgscore) from task_project where task_project.project_id not in (select project_id from projectextra where subteam=1 ) and task_project.project_id in (select project_id from project where stage='Appraise' and Business_stream = 'Upstream') and task_project.page_id=page.page_id) as bmax, (select max(answer) from answer where answer.page_id=page.page_id) as datamax, (select avg(avgscore) from task_project where project_id=1 and task_project.page_id=page.page_id) as projavg, (select avg(avgscore) from task_project where project_id not in (select project_id from projectextra where subteam=1) and task_project.page_id=page.page_id) as companyavg, (select avg(avgscore) from task_project where project_id not in (select project_id from projectextra where subteam=1) and project_id in (select project_id from project where Business_stream = 'Upstream') and task_project.page_id=page.page_id) as businessavg, page.* from page,riverorder where page.category_name='BusinessBoundaries' and stage_name='Appraise' and riverorder.category_name=page.category_name order by riverorder.riverorder,page.order_id
It could be better. yes? better is good!
{
  checkin : {
    badges : [],
    created : 1331454784,
    geolat : "30.2640941786",
    geolong : "-97.7414819408",
    mayor : {
      type : "nochange"
    },
    primarycategory : {
      fullpathname : "Food:American Restaurants",
      iconurl : "https://foursquare.com/img/categories/food/default.png",
      id : "4bf58dd8d48988d14e941735",
      nodename : "American Restaurants"
    },
    timezone : "America/Chicago",
    user : {
      gender : "male"
    },
    venue : {
      id : "4d752b1bba682d43e7563876",
      name : "CNN Grill @ SXSW (Max's Wine Dive)"
    }
  }
}
readable, ya think?
source=foursquare | timechart count by checkin.venue.name
The languages to talk to data are getting better for us humans
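To make that one-line search concrete, here is a rough Python sketch of the idea behind `timechart count by checkin.venue.name`: bucket events by time, then count per venue within each bucket. This is only an illustration of the concept, not how Splunk implements it. The one-hour span, the function name, and every sample event except the first (which reuses the `created` value and venue name from the check-in above) are assumptions.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def timechart_count(events, span_seconds=3600):
    """Count check-ins per venue name within fixed-width time buckets."""
    buckets = defaultdict(Counter)
    for event in events:
        checkin = event["checkin"]
        # Round the epoch timestamp down to the start of its bucket.
        bucket_start = checkin["created"] - checkin["created"] % span_seconds
        buckets[bucket_start][checkin["venue"]["name"]] += 1
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): dict(counts)
        for start, counts in sorted(buckets.items())
    }


CNN_GRILL = "CNN Grill @ SXSW (Max's Wine Dive)"
sample_events = [
    {"checkin": {"created": 1331454784, "venue": {"name": CNN_GRILL}}},
    # The two events below are made up for illustration.
    {"checkin": {"created": 1331454900, "venue": {"name": CNN_GRILL}}},
    {"checkin": {"created": 1331458500, "venue": {"name": "Hypothetical Venue"}}},
]

chart = timechart_count(sample_events)
for bucket, counts in chart.items():
    print(bucket.isoformat(), counts)
```

The point of the comparison is the ratio: a dozen lines of general-purpose code versus one short pipeline that reads almost like the question being asked.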
Guys.. come on! Go back to the data please.
a simple way to describe a massive problem
A friend in Boulder can help
Need data?
The Social Media API
Jud Valeski, Co-Founder, CEO
Just when you think you’re all done, wait. There is another
consumer you may have forgotten
Someone with a different
perspective sees your service as
input to theirs
DEMAND REALTIME DATA IN A STREAM OVER THE WEB
IN JSON FORMAT
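For a sense of what consuming such a stream can look like, here is a small Python sketch. It assumes the stream delivers one JSON object per line with occasional blank keep-alive lines, which is a common convention but not something the slides specify. The stream is simulated with an in-memory buffer, and the field names in the sample records are made up.

```python
import io
import json


def iter_json_stream(lines):
    """Yield one parsed object per non-empty line of a line-delimited JSON stream."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # treat blank lines as keep-alives
        yield json.loads(line)


# Stand-in for a long-lived HTTP response body; a real consumer would
# iterate over the lines of the network response instead.
fake_stream = io.StringIO(
    '{"source": "foursquare", "n": 1}\n'
    "\n"
    '{"source": "twitter", "n": 2}\n'
)

records = list(iter_json_stream(fake_stream))
for record in records:
    print(record["source"], record["n"])
```

Because the function is a generator, each record can be handled as soon as its line arrives rather than after the whole stream ends.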
Hey audience! We still have a few
minutes.
What questions might you have
been saving until this exact moment?
Thanks.
Erik Swan, CTO/Co-Founder, Splunk
Michael Wilde, Splunk Ninja
Who else sends you on your way with a cute dog photo?