Interop - Exploring Machine Data

Exploring Machine Data

@michaelwilde, Co-CTO, Splunk

Hi... I work at Splunk.

We stare at data all day.

WTF is Machine Data?!

is it logs?

is it netflow?

is it TWEETS?

Aaaahhh, well... kind of.

a simple way to describe the exhaust from technology

*or a big giant pain in the butt.

Volume | Velocity | Variety | Variability

GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops

Machine-generated data is one of the fastest growing, most complex and most valuable segments of big data

Machine data is the BIGgest DATA

no, not us. we’re just nice guys who want to show you cool stuff

you are a producer and consumer of data

building a service?

using an app?

Location-Based Messaging and Intelligence For Your App and Your Customers

Seth Rabinowitz, CEO

James Rodmell, CTO

2011-11-06 11:57:31,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.75496,-73.963853,60

2011-11-06 12:17:32,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.755001,-73.963886,70

2011-11-06 12:37:33,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754982,-73.963849,75

2011-11-06 12:57:34,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754984,-73.963883,85

2011-11-06 13:17:35,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754941,-73.9639,90

2011-11-06 13:37:36,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754948,-73.963874,90

2011-11-06 13:57:37,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754931,-73.963892,95

2011-11-06 14:17:38,50,00027d27-ae02-627d-a79a-fa0004d3a347,40.755232,-73.963522,100

2011-11-06 14:37:33,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754979,-73.9639,100

Data! Good!

DATE/TIME

DEVICE ID

LAT/LONG

BATTERY STRENGTH
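A minimal Python sketch of reading those records into named fields. The column mapping is an assumption taken from the slide labels: the first column is the date/time, the third is the device ID, the fourth and fifth are latitude and longitude, and the last is battery strength. The slide does not name the second column, so it is kept as `unlabeled`. `Reading` and `parse_readings` are hypothetical names used only for illustration.

```python
import csv
from dataclasses import dataclass
from datetime import datetime
from io import StringIO

# Three of the records shown above, copied verbatim.
SAMPLE = """\
2011-11-06 11:57:31,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.75496,-73.963853,60
2011-11-06 12:17:32,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.755001,-73.963886,70
2011-11-06 14:17:38,50,00027d27-ae02-627d-a79a-fa0004d3a347,40.755232,-73.963522,100
"""


@dataclass
class Reading:
    timestamp: datetime
    unlabeled: int   # second column; the slide does not say what it is
    device_id: str
    lat: float
    lon: float
    battery: int     # assumed to be the last column


def parse_readings(text):
    """Turn comma-separated records into a list of Reading objects."""
    readings = []
    for row in csv.reader(StringIO(text)):
        if not row:
            continue  # skip blank lines
        ts, unlabeled, device_id, lat, lon, battery = row
        readings.append(
            Reading(
                timestamp=datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
                unlabeled=int(unlabeled),
                device_id=device_id,
                lat=float(lat),
                lon=float(lon),
                battery=int(battery),
            )
        )
    return readings


readings = parse_readings(SAMPLE)
for r in readings:
    print(r.timestamp, r.device_id, r.lat, r.lon, r.battery)
```

Once the fields have names, questions like "where was this device when the battery hit 100?" become a one-line filter instead of a column-counting exercise.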

Oh, real quick. Did you check in or tweet #splunk #interop?

...please

All this data can be pretty cool and empowering

except one little

PROBLEM

a lot of it looks like this

13/Apr/2011 08:52:53,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.16,192.168.1.6,(empty),(empty),1099,135,epmap,(empty),0,1

13/Apr/2011 08:52:53,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.16,192.168.1.6,(empty),(empty),1100,43025,43025_tcp,(empty),0,1

13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1048,135,epmap,(empty),0,1

13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1049,43025,43025_tcp,(empty),0,1

13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1051,135,epmap,(empty),0,1

13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1052,43025,43025_tcp,(empty),0,1

13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.64,192.168.1.6,(empty),(empty),1694,135,epmap,(empty),0,1

and we’re expected to talk to it like this

select (select max(answer.answer) from answer where answer.member_id in (select member_id from team_members where project_id in ( select project_id from project where Business_stream='Upstream' and stage='Appraise' and project_id in (select project_id from projectextra where subteam<>1 ) ) ) and answer.page_id=page.page_id) as thinl, (select max(avgscore) from task_project where task_project.project_id not in (select project_id from projectextra where subteam=1 ) and task_project.project_id in (select project_id from project where stage='Appraise' and Business_stream = 'Upstream') and task_project.page_id=page.page_id) as bmax, (select max(answer) from answer where answer.page_id=page.page_id) as datamax, (select avg(avgscore) from task_project where project_id=1 and task_project.page_id=page.page_id) as projavg, (select avg(avgscore) from task_project where project_id not in (select project_id from projectextra where subteam=1) and task_project.page_id=page.page_id) as companyavg, (select avg(avgscore) from task_project where project_id not in (select project_id from projectextra where subteam=1) and project_id in (select project_id from project where Business_stream = 'Upstream') and task_project.page_id=page.page_id) as businessavg, page.* from page,riverorder where page.category_name='BusinessBoundaries' and stage_name='Appraise' and riverorder.category_name=page.category_name order by riverorder.riverorder,page.order_id

select (select max(answer.answer) from answer where answer.member_id in ( select member_id from team_members where project_id in ( select project_id from project where Business_stream='Upstream' and stage='Appraise' and project_id in (select project_id from projectextra where subteam<>1 ) ) ) and answer.page_id=page.page_id) as thinl, (select max(avgscore) from task_project where task_project.project_id not in (select project_id from projectextra where subteam=1 ) and task_project.project_id in (select project_id from project where stage='Appraise' and Business_stream = 'Upstream') and task_project.page_id=page.page_id) as bmax, (select max(answer) from answer where answer.page_id=page.page_id) as datamax, (select avg(avgscore) from task_project where project_id=1 and task_project.page_id=page.page_id) as projavg, (select avg(avgscore) from task_project where project_id not in (select project_id from projectextra where subteam=1) and task_project.page_id=page.page_id) as companyavg, (select avg(avgscore) from task_project where project_id not in (select project_id from projectextra where subteam=1) and project_id in (select project_id from project where Business_stream = 'Upstream') and task_project.page_id=page.page_id) as businessavg, page.* from page,riverorder where page.category_name='BusinessBoundaries' and stage_name='Appraise' and riverorder.category_name=page.category_name order by riverorder.riverorder,page.order_id

It could be better. yes? better is good!

{
    checkin : {
        badges : [],
        created : 1331454784,
        geolat : "30.2640941786",
        geolong : "-97.7414819408",
        mayor : {
            type : "nochange"
        },
        primarycategory : {
            fullpathname : "Food:American Restaurants",
            iconurl : "https://foursquare.com/img/categories/food/default.png",
            id : "4bf58dd8d48988d14e941735",
            nodename : "American Restaurants"
        },
        timezone : "America/Chicago",
        user : {
            gender : "male"
        },
        venue : {
            id : "4d752b1bba682d43e7563876",
            name : "CNN Grill @ SXSW (Max's Wine Dive)"
        }
    }
}

readable, ya think?
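One reason the check-in is easier on humans is that it is just as easy on code: every value has a name, so nothing depends on column position. The sketch below is a hedged illustration in Python. `RAW` is a trimmed copy of the check-in above, re-quoted as strict JSON (the quoting is my assumption about the underlying payload), and `summarize` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone

# A subset of the check-in shown above, written as strict JSON.
RAW = '''
{"checkin": {
    "created": 1331454784,
    "geolat": "30.2640941786",
    "geolong": "-97.7414819408",
    "primarycategory": {"nodename": "American Restaurants"},
    "timezone": "America/Chicago",
    "venue": {"name": "CNN Grill @ SXSW (Max's Wine Dive)"}
}}
'''


def summarize(raw):
    """Pull a few named fields out of a check-in document."""
    checkin = json.loads(raw)["checkin"]
    created = datetime.fromtimestamp(checkin["created"], tz=timezone.utc)
    return {
        "when_utc": created.isoformat(),
        # Coordinates arrive as strings in this payload, so convert them.
        "where": (float(checkin["geolat"]), float(checkin["geolong"])),
        "venue": checkin["venue"]["name"],
        "category": checkin["primarycategory"]["nodename"],
    }


summary = summarize(RAW)
print(summary)
```

Compare that with the firewall records earlier, where the only way to know what the tenth column means is to ask whoever wrote the logger.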

failed password | timechart count by client_ip

The languages to talk to data are getting better for us humans
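To make the one-line search concrete, here is a rough Python sketch of what it asks for, as I read it: keep the events that mention both "failed" and "password", then count them per time bucket, split by client IP. This is not how Splunk implements it. The log lines are invented for illustration, the hourly bucket is an arbitrary choice, and pulling the IP out with a regex stands in for a `client_ip` field that the search assumes already exists.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sshd-style events, made up for this example.
LOG_LINES = [
    "2011-04-13 08:52:53 sshd[411]: Failed password for root from 192.168.2.16 port 1099",
    "2011-04-13 08:52:55 sshd[411]: Failed password for admin from 192.168.2.75 port 1048",
    "2011-04-13 08:53:10 sshd[411]: Accepted password for mike from 192.168.2.64 port 1694",
    "2011-04-13 09:05:02 sshd[411]: Failed password for root from 192.168.2.16 port 1100",
    "2011-04-13 09:07:41 sshd[411]: Failed password for root from 192.168.2.16 port 1101",
]

# Stand-in for the client_ip field: the address after the word "from".
IP_PATTERN = re.compile(r"from (\d{1,3}(?:\.\d{1,3}){3})")


def timechart_count_by_ip(lines, terms=("failed", "password")):
    """Count matching events per hour, split by client IP."""
    buckets = defaultdict(Counter)
    for line in lines:
        lowered = line.lower()
        if not all(term in lowered for term in terms):
            continue  # event does not mention every search term
        match = IP_PATTERN.search(line)
        if match is None:
            continue  # no client IP to group by
        stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        hour = stamp.replace(minute=0, second=0)  # truncate to the hour
        buckets[hour][match.group(1)] += 1
    return {hour: dict(counts) for hour, counts in sorted(buckets.items())}


chart = timechart_count_by_ip(LOG_LINES)
for hour, counts in chart.items():
    print(hour, counts)
```

Roughly thirty lines of Python against eight words of search language is the whole point of the slide.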

Guys.. come on! Go back to the data please.

a simple way to describe a massive problem

A friend in Boulder can help

Need data?

Sometimes machine data is helpful to those OTHER than IT

Someone with a different perspective sees your exhaust as a source of fuel

please, please, please

CALL THE VP OF ENGINEERING

at all of your vendors.

DEMAND REALTIME DATA

IN A STREAM OVER THE WEB

IN JSON FORMAT

Hey audience! We still have a few minutes.

What questions might you have been saving until this exact moment?

Thanks.

@michaelwilde

Michael Wilde, Splunk Ninja

Co-CTO, Splunk

Who else sends you on your way with a cute dog photo?
