Copyright © 2015 Splunk Inc.
Raanan Dagan, Rohit Pujari
Real World Big Data Architecture - Splunk, Hadoop, RDBMS
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make.
In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Agenda
• Splunk Big Data Architecture
• Alternative Open Source Approach
• Real-World Customer Architecture
• End-to-End Demonstration
Who are you?
• Raanan Dagan - Sr. SE, Big Data specialist
• Rohit Pujari - Sr. SE, Big Data SME
Big Data Technologies
• Relational Database (structured): SQL; schema at write; ETL; RDBMS such as Oracle, MySQL, IBM DB2, Teradata
• Splunk: search; schema at read; real-time indexing; time-series, unstructured, heterogeneous data
• Hadoop (semi-structured): MapReduce; schema at read; HDFS storage; distributed file system
• NoSQL (semi-structured): MapReduce; schema at read; key-value, column, document and other stores; Cassandra, Accumulo, MongoDB
Splunk Big Data Technologies
• Relational databases: SQL; schema at write; ETL; RDBMS such as Oracle, MySQL, IBM DB2, Teradata
• Splunk: search; schema at read; real-time indexing; time-series, unstructured, heterogeneous data
• Hadoop: MapReduce; schema at read; HDFS storage; distributed file system
• NoSQL: MapReduce; schema at read; key-value, column, document and other stores; Cassandra, Accumulo, MongoDB
• Splunk products shown alongside these stores: Hunk, DB Connect
Splunk Scalability
Enterprise-class Availability and Scale
• Send data from thousands of servers using any combination of Splunk forwarders
• Auto load-balanced forwarding to Splunk Indexers
• Offload search load to Splunk Search Heads
• Automatic load balancing linearly scales indexing
• Distributed search and MapReduce linearly scale search and reporting
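The two scaling claims above can be illustrated with a small sketch. It is not Splunk code: the round-robin spreading is a simplified stand-in for automatic load balancing, and the function names and the `status` field are assumptions made for the example. Each simulated indexer counts its own events (the map step) and a simulated search head merges the partial counts (the reduce step).

```python
from collections import Counter
from itertools import cycle


def forward_round_robin(events, indexer_count):
    """Spread events across indexers in turn (simplified stand-in for auto load balancing)."""
    indexers = [[] for _ in range(indexer_count)]
    targets = cycle(range(indexer_count))
    for event in events:
        indexers[next(targets)].append(event)
    return indexers


def map_count_by_status(indexer_events):
    """Map step: one indexer counts only the events it holds."""
    return Counter(event["status"] for event in indexer_events)


def reduce_counts(partials):
    """Reduce step: the search head merges the partial counts from every indexer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical workload: 1,000 events, every fourth one a 503.
events = [{"status": 200 if i % 4 else 503} for i in range(1000)]
indexers = forward_round_robin(events, indexer_count=25)
result = reduce_counts(map_count_by_status(shard) for shard in indexers)
print(len(indexers[0]), dict(result))
```

Because each indexer only scans its own share, adding indexers shrinks the per-node work for both indexing and search, which is the linear-scaling argument made on the slide.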
Splunk Real-Time Analytics
• Inputs: Monitor Input, TCP/UDP Input, Scripted Input
• Data enters the Parsing Queue, then the Parsing Pipeline:
  • Source, event typing
  • Character set normalization
  • Line breaking
  • Timestamp identification
  • Regex transforms
• Index Queue, then the Indexing Pipeline, which writes raw data and index files to the Splunk Index
• Real-time Buffer feeding the Real-time Search Process
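The parsing pipeline stages listed on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Splunk's implementation: the timestamp format, the card-masking transform, the sample log text, and every function name are hypothetical.

```python
import re
from datetime import datetime

# Assumed timestamp format for the sample data.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def break_lines(text):
    """Line breaking: a new event starts at each line that begins with a timestamp."""
    events = []
    for line in text.splitlines():
        if TIMESTAMP.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line  # continuation line, e.g. a stack trace
    return events


def identify_timestamp(event):
    """Timestamp identification: parse the first timestamp found in the event."""
    match = TIMESTAMP.search(event)
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S") if match else None


def mask_card_numbers(event):
    """Regex transform: keep only the last four digits of 16-digit numbers."""
    return re.sub(r"\b\d{12}(\d{4})\b", r"XXXXXXXXXXXX\1", event)


def parse(raw_bytes, source):
    text = raw_bytes.decode("utf-8", errors="replace")  # character set normalization
    for event in break_lines(text):
        yield {
            "source": source,
            "_time": identify_timestamp(event),
            "_raw": mask_card_numbers(event),
        }


raw = (
    b"2015-09-22 10:00:01 ERROR payment failed card=4111111111111111\n"
    b"  at pay.Service.charge\n"
    b"2015-09-22 10:00:02 INFO retry scheduled\n"
)
parsed = list(parse(raw, source="app.log"))
```

The indented stack-trace line stays attached to the first event, so the three input lines become two events.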
Hunk - Analytics Platform for Hadoop
• Full-featured, Integrated Product
• Insights for Everyone
• Works with What You Have Today
Explore, Analyze, Visualize, Dashboards, Share
Hadoop Clusters; NoSQL, EMR, S3 Buckets
Hadoop Client Libraries
Hunk Unique Features
Virtual Index
• Enables seamless use of the Splunk technology stack on data wherever it rests
• Natively handles MapReduce
Schema-on-the-fly
• Structure applied at search time
• No brittle schema
• Automatically find patterns and trends
Flexibility and Fast Time to Value
• Interactive search
• Preview results while MapReduce jobs run
• Drag-and-drop analytics
Security: Access Control, Pass-Through Authentication, Kerberos
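"Structure applied at search time" can be shown with a minimal sketch. Nothing here is Hunk's API: the sample events, the `search` helper, and the field names are assumptions for illustration. The stored data stays as raw text, and each search supplies its own extraction pattern.

```python
import re

# Raw events are stored as plain text; no schema is declared when they are written.
RAW_EVENTS = [
    "2013-03-01 19:18:50 GET /sync/addtolibrary/0101 status=503 bytes=825",
    "2013-03-01 19:18:51 GET /sync/play/0102 status=200 bytes=1680",
    "2013-03-01 19:18:52 POST /sync/addtolibrary/0103 status=200 bytes=90",
]


def search(raw_events, pattern, where=lambda fields: True):
    """Extract fields with a regex at search time and keep the events that pass the filter."""
    extractor = re.compile(pattern)
    for raw in raw_events:
        match = extractor.search(raw)
        if match and where(match.groupdict()):
            yield match.groupdict()


# Two different "schemas" over the same stored text, neither defined in advance.
by_status = list(search(RAW_EVENTS, r"status=(?P<status>\d+)", lambda f: f["status"] == "503"))
by_action = list(search(RAW_EVENTS, r"(?P<method>[A-Z]+) /sync/(?P<action>\w+)/"))
```

A new question only needs a new pattern, not a reload of the data, which is what makes the schema non-brittle.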
Hunk Provides Self-Service Analytics for Hadoop
Hadoop Storage
Explore, Analyze, Visualize, Dashboards, Share
What About Structured Data?
• Customer profile
• Product attributes
• Employee details
• Pricing and rate plans
• Asset info
Use cases for structured data in Splunk
• Index machine data from databases, such as logs or sales records
• Enrich machine data with high-level data, such as customer records
• Update structured databases with Splunk info, such as risk scores
• Interactively browse structured and unstructured data from Splunk reports
Machine Data - Delivers Real-Time Insights
Media server logs (machine data)
Mar 01 19:18:50:000 aaa2 radiusd[12548]:[ID 959576 local1.info] INFO RADOP(13) acct start for [email protected] 10.164.232.181 from 12.130.60.5 recorded OK.
2013-03-01 19:18:50:150 10.2.1.34 GET /sync/addtolibrary/01011207201000005652000000000053 - 80 - 10.164.232.181 "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A405 Safari/7534.48.3" 503 0 0 825 1680
Mar 01 19:18:50:163 aaa2 radiusd[12548]:[ID 959576 local1.info] INFO RADOP(13) acct stop for [email protected] 10.164.232.181 from 12.130.60.5 recorded OK.
Highlighted fields: Phone Number, IP Address, Track ID
Structured Data - Contains Business Context
Media server logs (machine data)
Mar 01 19:18:50:000 aaa2 radiusd[12548]:[ID 959576 local1.info] INFO RADOP(13) acct start for [email protected] 10.164.232.181 from 12.130.60.5 recorded OK.
2013-03-01 19:18:50:150 10.2.1.34 GET /sync/addtolibrary/01011207201000005652000000000053 - 80 - 10.164.232.181 "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A405 Safari/7534.48.3" 503 0 0 825 1680
Mar 01 19:18:50:163 aaa2 radiusd[12548]:[ID 959576 local1.info] INFO RADOP(13) acct stop for [email protected] 10.164.232.181 from 12.130.60.5 recorded OK.
Highlighted fields: Phone number, IP address, Track ID
Customer, product databases
Track ID | Artist | Title | Format ID | Run time
01011207201000005652000000000053 | Maroon 5 | Moves like Jagger | MP3 | 4:30
Phone # | Subscriber ID
2172618992 | 53546
Subscriber ID | First Name | Last Name | Age | State | Customer Score
53546 | Jim | Morrison | 25 | CA | 93
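The join between the log fields and the database tables on this slide can be sketched as a pair of lookups. The table contents are taken from the slide; the dictionary layout, the `enrich` function, and the regex are assumptions for illustration. The phone number is redacted in the extracted log text, so the sketch takes it as an explicit argument rather than parsing it.

```python
import re

# Lookup tables holding the rows shown on the slide.
TRACKS = {
    "01011207201000005652000000000053": {
        "artist": "Maroon 5",
        "title": "Moves like Jagger",
        "format_id": "MP3",
        "run_time": "4:30",
    },
}
PHONE_TO_SUBSCRIBER = {"2172618992": "53546"}
SUBSCRIBERS = {
    "53546": {
        "first_name": "Jim",
        "last_name": "Morrison",
        "age": 25,
        "state": "CA",
        "customer_score": 93,
    },
}

TRACK_ID = re.compile(r"GET /sync/addtolibrary/(?P<track_id>\d+)")


def enrich(raw_event, phone_number):
    """Add track and subscriber details to the fields found in one log event."""
    fields = {"phone_number": phone_number}
    match = TRACK_ID.search(raw_event)
    if match:
        fields["track_id"] = match.group("track_id")
        fields.update(TRACKS.get(fields["track_id"], {}))
    subscriber_id = PHONE_TO_SUBSCRIBER.get(phone_number)
    if subscriber_id:
        fields["subscriber_id"] = subscriber_id
        fields.update(SUBSCRIBERS.get(subscriber_id, {}))
    return fields


event = (
    "2013-03-01 19:18:50:150 10.2.1.34 GET "
    "/sync/addtolibrary/01011207201000005652000000000053 - 80 - 10.164.232.181 503"
)
enriched = enrich(event, phone_number="2172618992")
```

The failed request (status 503) now carries the track title and the customer score, which is the business context the raw log lacks.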
Splunk DB Connect
Reliable, scalable, real-time integration between Splunk and traditional relational databases
• Enrich search results with additional business context
• Easily import data into Splunk for deeper analysis
• Integrate multiple DBs concurrently
• Simple set-up, non-invasive and secure
Diagram: Java Bridge Server (database lookup, database query, connection pooling) connected over JDBC to Microsoft SQL Server, Oracle database, and other databases
Customer Open Source Alternative
Hadoop Ecosystem Options
Hadoop Advantage / Disadvantage
Advantage | Disadvantage
Cheap storage | Requires coding for most analytics
Batch distributed processing | No visualization tools
 | No OOTB apps / solutions
Real-World Customer Architecture
Summary Architecture
• 2,000 forwarders
• Real-time data: 25 indexers
• 3 search head instances running Splunk / Hunk / DB Connect
• Historical data (VIX): 60 Hortonworks nodes
• Enrichment data (lookup): MySQL DB
Splunk Deployment Architecture
• Web servers with forwarders: 2,000 forwarders
• 25 indexers
• 3 search heads
• ~2 TB per day
• ~250 users, ~30 concurrent users
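A quick back-of-the-envelope check of these figures, assuming load is spread evenly and 1 TB = 1,000 GB (neither is stated on the slide):

```python
# Figures from the slide.
DAILY_INGEST_TB = 2.0
INDEXERS = 25
FORWARDERS = 2000
SEARCH_HEADS = 3
CONCURRENT_USERS = 30

# Assumption: even distribution and decimal units (1 TB = 1,000 GB).
gb_per_indexer_per_day = DAILY_INGEST_TB * 1000 / INDEXERS
forwarders_per_indexer = FORWARDERS / INDEXERS
concurrent_users_per_search_head = CONCURRENT_USERS / SEARCH_HEADS

print(gb_per_indexer_per_day, forwarders_per_indexer, concurrent_users_per_search_head)
```

Under those assumptions each indexer takes roughly 80 GB per day from about 80 forwarders, and each search head serves about 10 concurrent users.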
Hadoop Architecture
• WebLogic app servers sending data to the data nodes
• ~30 Flume agents
• ~60 data nodes
• ~1.2 PB of storage
• ~2 years data retention
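The storage figures imply a rough daily budget. The sketch below assumes decimal units, 365-day years, and the HDFS default replication factor of 3; the slide does not say whether 1.2 PB is raw or usable capacity, so both readings are computed.

```python
# Figures from the slide.
STORAGE_PB = 1.2
DATA_NODES = 60
RETENTION_DAYS = 2 * 365

# Assumption: HDFS default replication factor, not stated on the slide.
REPLICATION_FACTOR = 3

tb_per_node = STORAGE_PB * 1000 / DATA_NODES
# If 1.2 PB is usable capacity: daily volume that fills it in two years.
tb_per_day_if_usable = STORAGE_PB * 1000 / RETENTION_DAYS
# If 1.2 PB is raw capacity: every block is stored three times.
tb_per_day_if_raw = tb_per_day_if_usable / REPLICATION_FACTOR

print(round(tb_per_node, 1), round(tb_per_day_if_usable, 2), round(tb_per_day_if_raw, 2))
```

Either way the cluster holds about 20 TB per data node, and the two-year retention corresponds to somewhere between roughly 0.5 and 1.6 TB of new data per day.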
Splunk + Hunk = All the Data
Splunk (web servers, indexers)
• Real time
• Analytics
• Alerts
• Apps
Hunk (app servers, data nodes)
• Batch
• Complement Splunk analytics
• Historical searches
DB Connect Architecture
• Install DB Connect on a Search Head
• Use DB Connect for lookups
• Several lookups coming from two different MySQL databases
• Lookups enrich log data with business insight
Diagram: Search Head connected through the MySQL JDBC driver to DB-1 and DB-2
DB - Architecture Performance Impact
Command | Connection | Architecture
Indexing
Inputs - dbmon-tail (recommended) | Medium number of connections (small amount of data, only the delta) | DB to Index (connection pooling)
Inputs - dbmon-dump | Small number of connections (lots of data per connection) | DB to Index (connection pooling)
Outputs | Lots of DB connections (small amount of data) | Search Head to DB (connection pooling)
Not Indexing
Search - DBXQuery | Lots of DB connections | DB to Search Head
Lookups (selected this option) | Lots of DB connections | DB to Search Head
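The difference between the two input modes in the table (only the delta versus the full table) can be sketched as follows. This is not DB Connect code: `sqlite3` stands in for MySQL, and the `sales` table, the `id` rising column, and the function names are assumptions for illustration.

```python
import sqlite3


def dump(conn):
    """dbmon-dump style: read the whole table on every run."""
    return conn.execute("SELECT id, amount FROM sales ORDER BY id").fetchall()


def tail(conn, checkpoint):
    """dbmon-tail style: read only rows past the last seen value of a rising column."""
    rows = conn.execute(
        "SELECT id, amount FROM sales WHERE id > ? ORDER BY id", (checkpoint,)
    ).fetchall()
    new_checkpoint = rows[-1][0] if rows else checkpoint
    return rows, new_checkpoint


# In-memory database as a stand-in for the customer's MySQL instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO sales (amount) VALUES (?)", [(10.0,), (20.0,), (30.0,)])

first_batch, checkpoint = tail(conn, checkpoint=0)   # initial run sees all three rows
conn.execute("INSERT INTO sales (amount) VALUES (?)", (40.0,))
second_batch, checkpoint = tail(conn, checkpoint)    # next run sees only the new row
full_copy = dump(conn)                               # dump re-reads everything
```

The tail run after the insert transfers one row while the dump transfers four, which matches the "small amount of data" versus "lots of data per connection" contrast in the table.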
Customer Chosen Architecture Demo
THANK YOU