Splunk and Socialize discuss Big Data and how to process it efficiently.
7/15/2019 Hadoop Summit Socialize & Splunk
http://slidepdf.com/reader/full/hadoop-summit-socialize-splunk 1/32
Big Data at the Speed of Business
Isaac Mosquera, Director of Mobile, ShareThis
Clint Sharp, Principal Big Data Product Manager, Splunk
What We’ll Talk About
• Our quest for visibility
• Analyzing at scale
• Splunk and Big Data
• Where do you start?
• Q&A
About Splunk
Company (NASDAQ: SPLK)
• Founded 2004, first software release […]
• HQ: San Francisco
Business Model / Products
• Industry-leading machine data platform
• On-premise, in the cloud and SaaS
5,600+ Customers
• 63 of the Fortune 100
• Largest license: 100 terabytes per day
#1 Big Data Innovator*
*Fast Company's Most Innovative Companies
About ShareThis and Socialize
• ShareThis makes the world more connected, trusted and valuable through sharing
• Powers the social web, touching the lives of 95 percent of U.S. […]
• Acquires Socialize, which makes mobile and social more engaging
• Socialize integrated into thousands of iOS and Android apps
• Installed on 80M+ devices
Evaluating 20 Billion Ad Impressions Monthly
Little Bit About Real-Time Bidding
[Diagram labels: Ad Request, RTB, Socialize Bidder, Bid Response, Winning Bidder's Ad, Ad Impression, Ad Click]
All this needs to happen in less than 100 milliseconds!
So What Are Some of the Problems
Operational
• Ingesting more than 10,000 queries per second
• Which bids are > 100 ms
• Quickly finding any errors within the system
Decision Making (Bid Algorithm)
• Campaign spend
• Campaign efficiency
• Dissect data by: apps, users, devices
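The question of which bids exceed 100 ms can be sketched in a few lines of Python. This is only an illustration, not the speakers' implementation: the `BidEvent` record and its field names are hypothetical, and only the 100 ms budget comes from the slides.

```python
from dataclasses import dataclass

BID_DEADLINE_MS = 100  # latency budget quoted on the slides


@dataclass
class BidEvent:
    # Hypothetical record shape; the real bidder's log fields are not shown in the deck.
    request_id: str
    app_id: str
    latency_ms: float


def slow_bids(events, deadline_ms=BID_DEADLINE_MS):
    """Return the events whose latency exceeds the deadline."""
    return [e for e in events if e.latency_ms > deadline_ms]


def slow_bid_rate_by_app(events, deadline_ms=BID_DEADLINE_MS):
    """Fraction of bids over the deadline, grouped by app."""
    totals, slow = {}, {}
    for e in events:
        totals[e.app_id] = totals.get(e.app_id, 0) + 1
        if e.latency_ms > deadline_ms:
            slow[e.app_id] = slow.get(e.app_id, 0) + 1
    return {app: slow.get(app, 0) / n for app, n in totals.items()}
```

Grouping by app is one of the dimensions the slide lists; the same pattern would apply to users or devices.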
Analyzing Big Data Efficiently
1. Collection
2. Storage
3. Analyzation/Aggregation
Some Options
RDBMS: SQL functions like count() present problems at scale.
RDBMS: Write operations too high for a single DB, which is also a single point of failure.
NoSQL: Would work well for high inserts and queries; however, we would need to build alerting, charting and reporting dashboards.
Hadoop: Easy to set up and query using Hive; however, we would have to set up new environments and learn new technology.
Splunk Fits the Bill
Operational Reporting: Easily identify problems and prevent erroneous spending. When an alert goes off we hit a script which shuts off the bidder.
Ad Hoc Queries: Allows us to find patterns in the data to improve our bid algorithms.
Application Reporting: Instantly know campaign metrics for us and our clients.
Scalability: Adding new RTB service providers means billions of new ad requests. Scaling horizontally is key.
Analysis/Aggregation
index=ad_events displayed_ad
| bin _time span=1m
| stats count(meta.displayed_ad) as displays
        sum(price/1000) as dollars_spent
        avg(price) as avg_cpm_price
        by campaign_id _time
| mysqloutput spec=ads-prod table=ads_analytics
        insert="campaign_id, stat_date, displays, dollars_spent, avg_cpm_price"
[Diagram: Search Head, three Indexers, RDBMS (Generated Reports)]
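For readers who do not know SPL, the per-minute aggregation in the search above can be approximated in plain Python. This is a sketch under stated assumptions: each event is taken to be a dict with `time` (epoch seconds), `campaign_id` and `price` keys, which is a hypothetical shape; the output column names follow the `insert=` list in the search.

```python
from collections import defaultdict


def aggregate_per_minute(events):
    """Group ad-display events by (campaign_id, minute) and compute
    displays, dollars_spent (sum of price/1000) and avg_cpm_price."""
    buckets = defaultdict(list)
    for e in events:
        minute = e["time"] - (e["time"] % 60)  # floor epoch seconds to the minute
        buckets[(e["campaign_id"], minute)].append(e["price"])

    rows = []
    for (campaign_id, minute), prices in sorted(buckets.items()):
        rows.append({
            "campaign_id": campaign_id,
            "stat_date": minute,
            "displays": len(prices),
            "dollars_spent": sum(p / 1000 for p in prices),
            "avg_cpm_price": sum(prices) / len(prices),
        })
    return rows
```

Each returned row corresponds to one line that the search would hand to its output step; writing the rows to a database is left out of the sketch.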
Using Splunk to Analyze Operations
Interactive analysis with Search Processing Language:
source="nginx-prod.log" | stats avg(ResponseTime) as avg_rtime, p95(ResponseTime) as p95_rtime,
stdev(ResponseTime) as stdev_rtime
Easily digest information through charts
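The three statistics in that search (average, 95th percentile, standard deviation) can be illustrated with Python's standard library. This is a minimal sketch, not a description of how Splunk computes them: it uses a nearest-rank percentile and the sample standard deviation, and Splunk's own percentile calculation may differ on large data sets. The function name is hypothetical.

```python
import statistics


def response_time_stats(times_ms):
    """Return average, nearest-rank p95 and sample stdev for a list of response times.

    Needs at least two samples, because the sample standard deviation
    is undefined for a single value.
    """
    ordered = sorted(times_ms)
    n = len(ordered)
    # Nearest-rank p95: ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * n + 99) // 100
    return {
        "avg_rtime": statistics.mean(ordered),
        "p95_rtime": ordered[rank - 1],
        "stdev_rtime": statistics.stdev(ordered),
    }
```

The p95 value is usually more useful than the average for latency budgets, since a handful of slow responses barely move the mean.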
Final Architecture
[Architecture diagram. Components: Socialize Bidder; Splunk (Search Head, three Indexers); RDBMS (Generated Reports); S3 Snapshots; Cache Cluster (Memcache nodes)]
So, What is Splunk?
Expanding Universe of Data Sources
Machine-generated Data, Business Application Data, Human […]
Highly Structured to Arbitrarily […]
2012-12-05 07:04:44 Id=00Q000000Rd910EAJ City=New York
Country=US CreatedDate="2012-12-05 07:06:44" [email protected]
Email_Opt_In_c Customer_Street_Address_c="123 Main St."
purchased_product_id= product_i BD-01 twitter_username john_t_doe
Industry Leading Platform for Machine Data
Any Machine Data
HA Indexes and Storage
Operational Intelligence
• Custom dashboards
• Monitor and alert
• Ad hoc search
• Report and analyze
Analyzing Heterogeneous Data
Universal Index
• No data normalization
• Automatically handles timestamps
• Parsers not required
• Index every term & pattern "blindly"
• No attempt to "understand" up front
Schema-on-the-fly
• Structure applied at search-time
• No brittle schema to work around
• Automatically find transactions, patterns and trends
Flex[…] Fast […]
• No rma[…] needed
• Faster […]
• Easy se[…]
• Multip[…] same d[…]
Gain Critical Insights … in Real-Time
[Diagram. Sources: Twitter, Care IVR, Middleware Error, Order Processing. Fields: Order ID, Customer ID, Twitter ID, Time Waiting On Hold, Company's Name]
Deep Visibility and Insight for IT and […]
• IT Operations Management
• Web Intelligence
• Business Analytics
• Application Management
• Security and Compliance
• Industrial Data / Internet of Things
Over 5,600 organizations using Splunk across IT and business
Driving Insights from Big Data
The ShareThis Insights Platform
[Diagram labels: Hadoop, API, ETL, Pre-aggregation, Analytics, ?]
On Father's Day:
"What were the most shared topics?"
"What type of beers do people drink?"
Finding the Optimal Approach
• Hadoop and MapReduce are great for complex data science […] data at rest; the previous architecture took 9 months with […] of engineers, data architects, etc.
• The Splunk platform delivers real-time, interactive analysis […] we can build many of the same insights within 1 hour
What should be the core focus or competency of your team?
Conclusion: find the optimal approach for the business
What About Ad Hoc Analysis?
PR Insights Example
• What was the situation? (e.g. fast-moving business, needed real-time insights)
• What was the PR team struggling with? Difficult to find useful data to build interesting use-cases
• What did they want? They wanted a flexible real-time reporting environment to extract insights useful for the market
• How my team helped? Delivered a single dashboard that co[…] real-time data into the sharing behaviors across our network
PR Insights Dashboard
Let’s not forget the low-hanging fruit
Operational Analytics for an Online […]
[Diagram: Notifications Systems. Labels: website, API, Notification, Feedback Processor, Google (GCM), Apple (APNS), Online Device Notifications]
Driving Superior Customer Experience
• How many 500 errors have I had over time?
• Look for anomalies and spikes!
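Counting 500 errors over time and looking for spikes can be sketched in Python. This is a rough illustration, not the method used in the talk: events are assumed to be `(epoch_seconds, status_code)` tuples, and a spike is defined here, by assumption, as a minute whose count exceeds the mean by more than three population standard deviations. Minutes with no errors are not represented, which slightly inflates the baseline.

```python
from collections import Counter
import statistics


def errors_per_minute(events, status=500):
    """Count events with the given HTTP status per one-minute bucket."""
    counts = Counter()
    for ts, code in events:
        if code == status:
            counts[ts - ts % 60] += 1  # floor the timestamp to the minute
    return counts


def spikes(counts, threshold=3.0):
    """Return the minutes whose count exceeds mean + threshold * population stdev."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return sorted(m for m, c in counts.items() if c > mean + threshold * sd)
```

A fixed multiple of the standard deviation is a simple starting point; a production alert would likely compare against a rolling baseline instead.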
One More Thing
Announcing Hunk: Splunk Analytics for Hadoop
New product from Splunk delivers interactive exploration, analysis and visualizations for Hadoop
Derive Actionable Insights from Raw […]
[Diagram: Hadoop Storage; Point Splunk at Hadoop Cluster; Explore, Analyze, Visualize, Dashboards, Share]
Learn More
splunk.com/bigd