Copyright © 2015 Splunk Inc. Raanan Dagan, Rohit Pujari. Real World Big Data Architecture - Splunk, Hadoop, RDBMS


Page 1

Copyright © 2015 Splunk Inc.

Raanan Dagan, Rohit Pujari

Real World Big Data Architecture - Splunk, Hadoop, RDBMS

Page 2

Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make.

In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Page 3

Agenda

• Splunk Big Data Architecture
• Alternative Open Source Approach
• Real-World Customer Architecture
• End-to-end Demonstration

Page 4

Who are you?

• Raanan Dagan - Sr. SE, Big Data specialist
• Rohit Pujari - Sr. SE, Big Data SME

Page 5

Big Data Technologies

• Relational Database (Structured): SQL; Schema at Write; ETL; RDBMS such as Oracle, MySQL, IBM DB2, Teradata
• Splunk (Time-Series, Unstructured, Heterogeneous): Search; Schema at Read; Real-Time Indexing
• Hadoop (Semi-Structured): MapReduce; Schema at Read; HDFS Storage; Distributed File System
• NoSQL (Semi-Structured): MapReduce; Schema at Read; Key-Value, Column, Document & Other Stores; Cassandra, Accumulo, MongoDB

Page 6

Splunk Big Data Technologies

• RDBMS (Oracle, MySQL, IBM DB2, Teradata): SQL; Schema at Write; ETL
• Splunk (Time-Series, Unstructured, Heterogeneous): Search; Schema at Read; Real-Time Indexing
• HDFS Storage (Distributed File System): MapReduce; Schema at Read
• Key-Value, Column, Document & Other Stores (Cassandra, Accumulo, MongoDB): MapReduce; Schema at Read
• Splunk products added to the diagram: Hunk, DB Connect

Page 7

Splunk Scalability

Enterprise-class Availability and Scale

Send data from thousands of servers using any combination of Splunk forwarders

Auto load-balanced forwarding to Splunk Indexers

Offload search load to Splunk Search Heads

• Automatic load balancing linearly scales indexing
• Distributed search and MapReduce linearly scale search and reporting
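The auto load-balanced forwarding described on this slide can be pictured with a small simulation. This is only a sketch, not Splunk's forwarder implementation: the `Forwarder` class, the indexer names, and the rule "pick a random indexer every `switch_every` events" are assumptions made for illustration.

```python
import random
from collections import Counter


class Forwarder:
    """Hypothetical forwarder that re-picks its target indexer at fixed intervals."""

    def __init__(self, indexers, switch_every=100, seed=None):
        self.indexers = list(indexers)
        self.switch_every = switch_every
        self.rng = random.Random(seed)
        self.sent = 0
        self.target = self.rng.choice(self.indexers)

    def send(self, event):
        # Switch to a randomly chosen indexer after every `switch_every` events.
        if self.sent and self.sent % self.switch_every == 0:
            self.target = self.rng.choice(self.indexers)
        self.sent += 1
        return self.target  # the indexer that would receive this event


def simulate(num_forwarders=200, num_indexers=25, events_each=1000):
    """Count how many events each indexer receives across all forwarders."""
    indexers = [f"indexer-{i:02d}" for i in range(num_indexers)]
    load = Counter()
    for f in range(num_forwarders):
        fwd = Forwarder(indexers, seed=f)
        for n in range(events_each):
            load[fwd.send(f"event-{n}")] += 1
    return load


load = simulate()
print("min/max events per indexer:", min(load.values()), max(load.values()))
```

With many forwarders each switching independently, no single indexer receives a disproportionate share, which is the intuition behind adding indexers to scale ingest.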

Page 8

Splunk Real-Time Analytics

Diagram, in data-flow order:

• Data inputs: Monitor Input, TCP/UDP Input, Scripted Input
• Parsing Queue
• Parsing Pipeline: source, event typing; character set normalization; line breaking; timestamp identification; regex transforms
• Index Queue
• Indexing Pipeline
• Splunk Index: raw data, index files
• Real-time Buffer
• Real-time Search Process
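The parsing pipeline stages named on this slide can be sketched as a chain of small functions. This is an illustrative sketch only, not Splunk's code: the function names, the timestamp pattern, the IP-masking transform, and the sample events (loosely modeled on the web-log format shown later in the deck) are all assumptions.

```python
import re
from datetime import datetime

# Assumed timestamp layout for the sample events below.
TS_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
# Example regex transform: mask IPv4 addresses.
MASK_IP = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def normalize_charset(raw_bytes):
    """Character set normalization: decode everything to one text encoding."""
    return raw_bytes.decode("utf-8", errors="replace")


def break_lines(text):
    """Line breaking: one event per non-empty line."""
    return [line for line in text.splitlines() if line.strip()]


def identify_timestamp(line):
    """Timestamp identification: parse a leading timestamp if present."""
    match = TS_PATTERN.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")


def transform(line):
    """Regex transform: rewrite the raw text before it is indexed."""
    return MASK_IP.sub("x.x.x.x", line)


def parse(raw_bytes, source):
    """Run all stages and tag each event with its source."""
    events = []
    for line in break_lines(normalize_charset(raw_bytes)):
        events.append({
            "source": source,
            "time": identify_timestamp(line),
            "raw": transform(line),
        })
    return events


raw = (b"2013-03-01 19:18:50 10.2.1.34 GET /sync/addtolibrary 503\n\n"
       b"2013-03-01 19:18:51 10.2.1.35 GET /index 200\n")
events = parse(raw, source="media_server.log")
for event in events:
    print(event)
```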

Page 9

Hunk - Analytics Platform for Hadoop

• Full-featured, Integrated Product
• Insights for Everyone
• Works with What You Have Today

Explore, Analyze, Visualize, Dashboards, Share

Hadoop Clusters; NoSQL, EMR, S3 Buckets

Hadoop Client Libraries

Page 10

Hunk Unique Features

Virtual Index
• Enables seamless use of the Splunk technology stack on data wherever it rests
• Natively handles MapReduce

Schema-on-the-fly
• Structure applied at search time
• No brittle schema
• Automatically find patterns and trends

Flexibility and Fast Time to Value
• Interactive search
• Preview results while MapReduce jobs run
• Drag-and-drop analytics

Security: Access Control, Pass Through Authentication, Kerberos
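"Structure applied at search time" can be illustrated with a short sketch in which raw events are stored untouched and each query supplies its own field extraction. The events, the `search` helper, and the regular expressions below are hypothetical and are not Hunk's API.

```python
import re

# Raw events are kept as plain text; no schema is fixed when they are stored.
RAW_EVENTS = [
    "2013-03-01 19:18:50 GET /sync/addtolibrary status=503 bytes=825",
    "2013-03-01 19:18:51 GET /index status=200 bytes=1680",
    "2013-03-01 19:18:52 POST /login status=200 bytes=312",
]


def search(events, extraction, predicate):
    """Apply a per-query regex extraction, then filter on the extracted fields."""
    pattern = re.compile(extraction)
    for raw in events:
        match = pattern.search(raw)
        if match is None:
            continue  # this query's structure does not apply to the event
        fields = match.groupdict()
        if predicate(fields):
            yield fields


# Query 1: treat events as (method, uri, status) and keep server errors.
errors = list(search(
    RAW_EVENTS,
    r"(?P<method>GET|POST) (?P<uri>\S+) status=(?P<status>\d+)",
    lambda f: f["status"] == "503",
))

# Query 2: a different structure over the same raw data, with no re-ingestion.
total_bytes = sum(
    int(f["bytes"])
    for f in search(RAW_EVENTS, r"bytes=(?P<bytes>\d+)", lambda f: True)
)

print(errors)
print(total_bytes)
```

The same stored text serves both queries, which is the contrast with schema-at-write systems, where a new question can require a new table layout and a reload.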

Page 11

Hunk Provides Self-Service Analytics for Hadoop

Hadoop Storage

Explore, Analyze, Visualize, Dashboards, Share

Page 12

What About Structured Data?

• Customer profile
• Product attributes
• Employee details
• Pricing and rate plans
• Asset info

Page 13

Use cases for structured data in Splunk

• Index machine data from databases, such as logs or sales records
• Enrich machine data with high-level data, such as customer records
• Update structured databases with Splunk info, such as risk scores
• Interactively browse structured and unstructured data from Splunk reports

Page 14

Machine Data – Delivers Real-time Insights

Media server logs (machine data)

Mar 01 19:18:50:000 aaa2 radiusd[12548]:[ID 959576 local1.info] INFO RADOP(13) acct start for [email protected] 10.164.232.181 from 12.130.60.5 recorded OK.
2013-03-01 19:18:50:150 10.2.1.34 GET /sync/addtolibrary/01011207201000005652000000000053 - 80 - 10.164.232.181 "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A405 Safari/7534.48.3" 503 0 0 825 1680
Mar 01 19:18:50:163 aaa2 radiusd[12548]:[ID 959576 local1.info] INFO RADOP(13) acct stop for [email protected] 10.164.232.181 from 12.130.60.5 recorded OK.

Fields highlighted in the logs: Phone Number, IP Address, Track ID

Page 15

Structured Data – Contains Business Context

Media server logs (machine data)

Mar 01 19:18:50:000 aaa2 radiusd[12548]:[ID 959576 local1.info] INFO RADOP(13) acct start for [email protected] 10.164.232.181 from 12.130.60.5 recorded OK.
2013-03-01 19:18:50:150 10.2.1.34 GET /sync/addtolibrary/01011207201000005652000000000053 - 80 - 10.164.232.181 "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A405 Safari/7534.48.3" 503 0 0 825 1680
Mar 01 19:18:50:163 aaa2 radiusd[12548]:[ID 959576 local1.info] INFO RADOP(13) acct stop for [email protected] 10.164.232.181 from 12.130.60.5 recorded OK.

Fields highlighted in the logs: Phone number, IP address, Track ID

Customer, product databases

Track ID | Artist | Title | Format ID | Run time
01011207201000005652000000000053 | Maroon 5 | Moves like Jagger | MP3 | 4:30

Phone # | Subscriber ID
2172618992 | 53546

Subscriber ID | First Name | Last Name | Age | State | Customer Score
53546 | Jim | Morrison | 25 | CA | 93
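The enrichment this slide illustrates, joining fields found in the logs to rows in the customer and product databases, can be sketched as a pair of lookups. The table contents come from the slide; the dictionary layout, the `enrich` function, and the assumption that the phone number, IP address, and track ID have already been extracted from the raw event are illustrative only.

```python
# Reference tables, keyed the way the slide's three tables are keyed.
TRACKS = {
    "01011207201000005652000000000053": {
        "artist": "Maroon 5",
        "title": "Moves like Jagger",
        "format": "MP3",
        "run_time": "4:30",
    },
}
PHONE_TO_SUBSCRIBER = {"2172618992": "53546"}
SUBSCRIBERS = {
    "53546": {
        "first_name": "Jim",
        "last_name": "Morrison",
        "age": 25,
        "state": "CA",
        "customer_score": 93,
    },
}


def enrich(event):
    """Return a copy of the event with product and customer context added."""
    enriched = dict(event)
    enriched.update(TRACKS.get(event["track_id"], {}))
    subscriber_id = PHONE_TO_SUBSCRIBER.get(event["phone_number"])
    if subscriber_id is not None:
        enriched["subscriber_id"] = subscriber_id
        enriched.update(SUBSCRIBERS.get(subscriber_id, {}))
    return enriched


# Fields assumed to be already extracted from the machine data.
event = {
    "phone_number": "2172618992",
    "ip_address": "10.164.232.181",
    "track_id": "01011207201000005652000000000053",
    "status": 503,
}
enriched = enrich(event)
print(enriched)
```

After enrichment, a failed request (status 503) is tied to a named track and to a subscriber with a customer score, which is the business context the raw log lacks.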

Page 16

Splunk DB Connect

Reliable, scalable, real-time integration between Splunk and traditional relational databases

• Enrich search results with additional business context
• Easily import data into Splunk for deeper analysis
• Integrate multiple DBs concurrently
• Simple set-up, non-invasive and secure

Diagram labels: Database lookup, Database query, Connection pooling, Java Bridge Server, JDBC, Microsoft SQL Server, Oracle database, Other databases

Page 17

Customer Open Source Alternative

Page 18

Hadoop Ecosystem Options

Page 19

Hadoop Advantage / Disadvantage

Advantage
• Cheap Storage
• Batch Distributed Processing

Disadvantage
• Requires Coding for most Analytics
• No Visualization Tools
• No OOTB Apps / Solutions

Page 20

Real-World Customer Architecture

Page 21

Summary Architecture

• 2000 Forwarders
• Real Time Data - 25 Indexers
• 3 instances Splunk / Hunk / DB Connect Search Heads
• Historical data (VIX) - 60 Hortonworks nodes
• Enrichment data (lookup) - MySQL DB

Page 22

Splunk Deployment Architecture

Diagram labels: web server, forwarder, indexer, search head

• 2,000 forwarders
• 25 indexers
• 3 search heads
• ~2 TB per day
• ~250 users
• ~30 concurrent users
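A quick back-of-the-envelope calculation turns the deployment figures on this slide into per-node averages. The numbers are from the slide; the assumptions that load is spread evenly and that 1 TB is counted as 1,000 GB are mine, and real deployments will be less uniform.

```python
# Figures from the slide (approximate in the original).
FORWARDERS = 2000
INDEXERS = 25
SEARCH_HEADS = 3
DAILY_INGEST_GB = 2000      # ~2 TB per day, assuming 1 TB = 1,000 GB
CONCURRENT_USERS = 30

# Averages under the assumption of perfectly even distribution.
gb_per_indexer_per_day = DAILY_INGEST_GB / INDEXERS
forwarders_per_indexer = FORWARDERS / INDEXERS
concurrent_users_per_search_head = CONCURRENT_USERS / SEARCH_HEADS

print(f"{gb_per_indexer_per_day:.0f} GB/day per indexer")
print(f"{forwarders_per_indexer:.0f} forwarders per indexer")
print(f"{concurrent_users_per_search_head:.0f} concurrent users per search head")
```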

Page 23

Hadoop Architecture

Diagram labels: WebLogic app server, data node

• ~30 Flume agents
• ~60 data nodes
• ~1.2 PB of storage
• ~2 years data retention
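The Hadoop figures on this slide can be related to each other with simple arithmetic. The slide does not say whether the ~1.2 PB is raw disk or usable space, so the sketch below treats it as raw and, as a labeled assumption, applies the HDFS default replication factor of 3; it also assumes 1 PB = 1,000 TB and a 365-day year.

```python
# Figures from the slide (approximate in the original).
STORAGE_TB = 1200           # ~1.2 PB, assuming 1 PB = 1,000 TB
DATA_NODES = 60
RETENTION_DAYS = 2 * 365    # ~2 years

# Assumption: the slide's figure is raw capacity and HDFS keeps 3 replicas.
REPLICATION_FACTOR = 3

tb_per_data_node = STORAGE_TB / DATA_NODES
usable_tb = STORAGE_TB / REPLICATION_FACTOR
tb_per_day_at_full_retention = usable_tb / RETENTION_DAYS

print(f"{tb_per_data_node:.0f} TB per data node")
print(f"{usable_tb:.0f} TB usable after replication")
print(f"{tb_per_day_at_full_retention:.2f} TB/day fits within the retention window")
```

If the 1.2 PB figure is already net of replication, the last two numbers triple; the per-node figure is unaffected.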

Page 24

Splunk + Hunk = All the Data

Diagram labels: web server, app server, indexer, data node

Splunk
• Real Time
• Analytics
• Alerts
• Apps

Hunk
• Batch
• Complement Splunk Analytics
• Historical searches

Page 25

DB Connect Architecture

• Install DB Connect on a Search Head
• Use DB Connect for Lookup
• Several Lookups coming from two different MySQL Databases
• Lookups enrich log data with business insight

Diagram labels: Search Head, MySQL JDBC Driver, DB-1, DB-2

Page 26

DB - Architecture Performance Impact

Columns: Command | Connection | Architecture

Indexing
• Inputs - dbmon-tail (Recommended) | Medium number of connections (small amount of data, only the delta) | DB to Index (connection pooling)
• Inputs - dbmon-dump | Small number of connections (lots of data per connection) | DB to Index (connection pooling)
• Outputs | Lots of DB connections (small amount of data) | Search Head to DB (connection pooling)

Not Indexing
• Search - DBXQuery | Lots of DB connections | DB to Search Head
• Lookups (Selected this option) | Lots of DB connections | DB to Search Head
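The difference between the dump and tail input styles in this table (all rows every time versus only the delta) can be sketched with an in-memory SQLite database. This is not DB Connect's implementation: the `sales` table, the `dump` and `tail` functions, and the use of an auto-incrementing `id` as the column that marks new rows are assumptions for illustration.

```python
import sqlite3


def make_db():
    """Create a small in-memory table standing in for a production database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, item TEXT)")
    conn.executemany("INSERT INTO sales (item) VALUES (?)",
                     [("a",), ("b",), ("c",)])
    return conn


def dump(conn):
    """Dump style: re-read the whole table on every run."""
    return conn.execute("SELECT id, item FROM sales ORDER BY id").fetchall()


def tail(conn, checkpoint):
    """Tail style: read only rows whose id is greater than the last checkpoint."""
    rows = conn.execute(
        "SELECT id, item FROM sales WHERE id > ? ORDER BY id", (checkpoint,)
    ).fetchall()
    new_checkpoint = rows[-1][0] if rows else checkpoint
    return rows, new_checkpoint


conn = make_db()
first, ckpt = tail(conn, 0)            # first run: everything is new
conn.execute("INSERT INTO sales (item) VALUES ('d')")
delta, ckpt = tail(conn, ckpt)         # later run: only the newly added row
full = dump(conn)                      # dump: all rows again

print(len(first), delta, len(full), ckpt)
```

Tail-style reads move little data per run but need a checkpoint and run more often, which is consistent with the table's "medium number of connections, small amount of data" description.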

Page 27

Summary Architecture

• 2000 Forwarders
• Real Time Data - 25 Indexers
• 3 instances Splunk / Hunk / DB Connect Search Heads
• Historical data (VIX) - 60 Hortonworks nodes
• Enrichment data (lookup) - MySQL DB

Page 28

Customer Chosen Architecture Demo

Page 29

THANK YOU