
Copyright © 2016 Splunk Inc.

Dr. Adam Oliner, Director of Engineering, Data Science, Splunk

Using the Splunk Machine Learning Toolkit to Create Your Own Custom Models

Manish Sainani, Principal Product Manager, Splunk

Disclaimer


During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Who are we?


Dr. Adam Oliner
– Director of Engineering, Data Science & Machine Learning
– Splunker for 2 years
– Embarrassingly overeducated

Manish Sainani
– Principal Product Manager, Machine Learning
– Splunker for 2 years
– First ML hire at Splunk!

What are we doing here?


Overview of Machine Learning
The Assistants: Guided Machine Learning
– Prepare
– Fit
– Validate
– Deploy

Examples
– DIY Anomaly Detector
– Customer Applications

Overview of ML at Splunk

Core Platform Search
Packaged Premium Solutions
Custom ML

Platform for Operational Intelligence

Splunk Machine Learning Toolkit

Assistants: Guide model building, testing, & deploying for common objectives
Showcases: Interactive examples for typical IT, security, business, IoT use cases

Algorithms: 25+ standard algorithms available prepackaged with the toolkit
SPL ML Commands: New commands to fit, test and operationalize models
Python for Scientific Computing Library: 300+ open source algorithms available for use

Build custom analytics for any use case

Extends Splunk platform functions and provides a guided modeling environment

What’s New since our 0.9 Beta Release (last year’s .conf)?


• New name and abbreviation ;-)
• No event limits (removal of 50K limit on fitting models)

• Configurable resource caps via mlspl.conf

• Search head clustering support
• Distributed/streaming apply
• Scheduled fit
• New algorithms (next slide)

  – Feature engineering and selection
  – Stochastic gradient descent (e.g., SGDRegressor, SGDClassifier)
  – ARIMA

• Multi-algorithm support across Assistants

• Scatter plot matrix viz
• Alerting
• Tooltips
• In-app tours
• Cluster Numeric Events assistant
• Videos, videos, videos for each assistant across IT, Security, IoT and Business Analytics

• ML-SPL Cheat Sheet

Algorithms supported (v2.0, .conf2016)

The Assistants: Guided Machine Learning

Machine Learning


A process for generalizing from examples
Examples
– A, B, … → # (regression)
– A, B, … → a (classification)
– X_past → X_future (forecasting)
– like with like (clustering)
– |X_predicted − X_actual| >> 0 (anomaly detection)

Machine Learning Process


Collect Data

Explore/Visualize

Model

Evaluate

Clean/Transform

Publish/Deploy

Machine Learning Process with Splunk


Collect Data

Explore/Visualize

Model

Evaluate

Clean/Transform

Publish/Deploy

props.conf, transforms.conf, Data models, Add-ons from Splunkbase, etc.

Pivot, Table UI, SPL, ML Toolkit

Alerts, Dashboards, Reports

Domain Expertise (IT, Security, …)

Data Science Expertise

Splunk Expertise

Custom Machine Learning – Success Formula

Identify use cases

Drive decisions

Set business/ops priorities

SPL

Data prep

Statistics/math background

Algorithm selection

Model building

Splunk ML Toolkit facilitates and simplifies via examples & guidance

Operational success

Guided ML with the Assistants


Guides you through various analytics
– Prepare, fit, validate, and deploy

Automatically generates all the relevant SPL

Assistants: Fit


Assistants: Validate


Assistants: Deploy


The Assistants


1. Predict Numeric Fields
2. Predict Categorical Fields
3. Detect Numeric Outliers
4. Detect Categorical Outliers
5. Forecast Time Series
6. Cluster Numeric Events

Predict Numeric Fields


Algorithms
– LinearRegression
  • …including Lasso, Ridge, and ElasticNet
– KernelRidge
– DecisionTreeRegressor
– RandomForestRegressor
– SGDRegressor

Validation
– Four visualizations of prediction error
– R² and RMSE
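To make the two validation numbers concrete, here is a minimal Python sketch (standard library only) that fits a one-feature ordinary least squares line, the simplest case of LinearRegression, and scores it with R² and RMSE. The function names are hypothetical; inside the toolkit the Assistant computes these for you, and the real algorithms come from the Python for Scientific Computing library, not from this code.

```python
import math
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - y_bar) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean squared error, in the units of the target field."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))
```

R² close to 1 means the model explains most of the variance in the target field; RMSE is expressed in the target's own units, which makes it easy to compare against the size of errors you can tolerate.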

Predict Categorical Fields


Algorithms
– LogisticRegression
– DecisionTreeClassifier
– RandomForestClassifier
– SGDClassifier
– SVM
– Naïve Bayes
  • BernoulliNB and GaussianNB

Validation
– Precision, recall, accuracy, F1
– Confusion matrix
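All of the classification metrics listed above can be derived from counts of true positives, false positives, and false negatives. A minimal sketch, assuming a single positive class of interest; the function names and the churn/stay labels are illustrative only.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels, in `labels` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def classification_report(actual, predicted, positive):
    """Accuracy plus precision, recall, and F1 for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    accuracy = sum(1 for a, p in pairs if a == p) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Precision answers "when the model says churn, how often is it right?", while recall answers "of the real churners, how many did it catch?"; F1 is their harmonic mean.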

Detect Numeric Outliers


Methods
– Standard deviation
– Median absolute deviation
– Interquartile range
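Each of the three methods reduces to a threshold rule around a center value. Below is a minimal standard-library sketch. The multipliers k (2, 3, and 1.5) are common textbook choices assumed here, not necessarily the Assistant's defaults.

```python
from statistics import mean, median, pstdev, quantiles


def stdev_outliers(values, k=2.0):
    """Flag values more than k standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(v - mu) > k * sigma for v in values]


def mad_outliers(values, k=3.0):
    """Flag values more than k median absolute deviations from the median.
    Note: if over half the values are identical, the MAD is 0 and any deviation is flagged."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [abs(v - med) > k * mad for v in values]


def iqr_outliers(values, k=1.5):
    """Flag values more than k interquartile ranges outside [Q1, Q3]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return [v < q1 - k * iqr or v > q3 + k * iqr for v in values]
```

A single extreme value inflates both the mean and the standard deviation, so the standard deviation rule can miss outliers that the median-based and quartile-based rules still catch. That masking effect is one reason to prefer the robust methods on skewed data.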

Validation:

Detect Categorical Outliers


Statistical methods

Validation:
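The slide does not spell out which statistical methods are used. One simple approach, assumed here purely for illustration and not necessarily what the Assistant does, flags an event when any of its categorical field values is rare relative to the rest of the data. The function name and the `min_freq` threshold are hypothetical.

```python
from collections import Counter


def rare_value_outliers(events, fields, min_freq=0.05):
    """Return, for each event, the list of fields whose value is rare.

    events: list of dicts mapping field name -> categorical value.
    A value is "rare" if its share of all events is below min_freq.
    An empty list means the event looks normal.
    """
    n = len(events)
    # Count how often each value appears, separately for every field.
    freq = {f: Counter(e.get(f) for e in events) for f in fields}
    flags = []
    for e in events:
        rare = [f for f in fields if freq[f][e.get(f)] / n < min_freq]
        flags.append(rare)
    return flags
```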

Forecast Time Series


Algorithms
– State-space method using Kalman filter
– ARIMA
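To illustrate the state-space idea, the sketch below runs a one-dimensional Kalman filter over a local-level (random walk plus noise) model. The model choice, the variance parameters, and the function name are assumptions for this example; the toolkit's forecasting algorithm may use a richer state-space model.

```python
def local_level_forecast(series, process_var=1.0, obs_var=4.0, horizon=3):
    """1-D Kalman filter for a local-level model.

    State:       level_t = level_{t-1} + w_t,  w_t ~ N(0, process_var)
    Observation: y_t     = level_t + v_t,      v_t ~ N(0, obs_var)
    Returns (filtered levels, point forecasts for the next `horizon` steps).
    """
    level, var = series[0], obs_var  # initialize from the first observation
    filtered = [level]
    for y in series[1:]:
        # Predict step: the level carries over, its uncertainty grows.
        var += process_var
        # Update step: blend prediction and observation using the Kalman gain.
        gain = var / (var + obs_var)
        level += gain * (y - level)
        var *= 1 - gain
        filtered.append(level)
    # For a local-level model, the point forecast stays flat at the last filtered level.
    return filtered, [level] * horizon
```

The ratio `process_var / obs_var` controls how quickly the filter follows changes. A larger ratio tracks level shifts faster but also passes more noise through.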

Validation

Cluster Numeric Events


Algorithms
– KMeans
– DBSCAN
– Birch
– SpectralClustering

Validation
– Scatter Plot Matrix viz
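For intuition about what KMeans computes, here is a minimal sketch of Lloyd's algorithm in plain Python. It seeds the centroids with the first k points so that it stays deterministic. This is a simplification: scikit-learn's KMeans, for example, uses k-means++ seeding by default.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm. points: list of equal-length numeric tuples.

    Returns (cluster label per point, list of centroids).
    """
    centroids = [tuple(p) for p in points[:k]]  # naive seeding: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids
```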

Prepare

Data Gathering and Prep


Source: CrowdFlower

Splunk!


Leading platform for collecting, cleaning, and transforming data
• Interactive Field Extractor
• Data models
• Hundreds of add-ons from Splunkbase
• transforms.conf
• props.conf
• etc.

Feature Engineering
• TFIDF (term frequency × inverse document frequency)
  – Transform free-form text into numeric attributes

• StandardScaler (i.e., standardization to zero mean and unit variance)
• FieldSelector (i.e., choose the k best features for regression/classification)
• PCA and KernelPCA
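Two of these transforms are easy to state directly. The sketch below uses the textbook TF-IDF weighting (tf = term count / document length, idf = log(N / number of documents containing the term)) and standard scaling. The function names are hypothetical. scikit-learn's TfidfVectorizer applies idf smoothing and L2 normalization by default, so its numbers will differ from this version.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def tfidf(documents):
    """Turn free-form text into one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def standard_scale(values):
    """Standardize to zero mean and unit variance: (x - mean) / std.
    Raises ZeroDivisionError for a constant field (std == 0)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]
```

A term that appears in every document gets an idf of log(1) = 0, so words like "the" carry no weight. This is the point of the idf factor.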

Preprocessing in the Assistants


Fit

Fit: What’s New


• No event limits
• Configurable resource caps (mlspl.conf)
• Search head clustering support
• Scheduled fit
• New algorithms

Fit: What’s New (continued)


Validate

Validate/Apply: What’s New


• Configurable resource caps
• Search head clustering support
• Distributed/streaming apply
• Scatter plot matrix viz

Scatter Plot Matrix Viz


Deploy

Deploy anywhere in Splunk!


• Scheduled training
• Alerting
• Reports and dashboards
• Augmented search results
• etc.

Deploy: What’s New


• Distributed Apply
  – Apply models to indexed data
  – Streaming

• Scheduled training
• Alerting
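Why can apply be distributed and streamed? Once a model is fitted, scoring an event needs only that event and the fixed model parameters. The conceptual sketch below scores events lazily from a generator. It assumes a linear model and uses hypothetical function and field names. It illustrates this property and is not a description of how Splunk implements distributed apply.

```python
from typing import Dict, Iterable, Iterator


def apply_linear_model(
    events: Iterable[Dict[str, float]],
    coefficients: Dict[str, float],
    intercept: float,
    output_field: str = "predicted",
) -> Iterator[Dict[str, float]]:
    """Score events one at a time with already-fitted coefficients.

    Each output depends only on its own event plus the fixed model, so the
    same function could run independently on separate partitions of the data.
    """
    for event in events:
        score = intercept + sum(w * event[f] for f, w in coefficients.items())
        yield {**event, output_field: score}  # copy; the input event is not mutated
```

Because no state is carried from one event to the next, scoring two halves of the data separately and concatenating the results gives the same output as scoring everything in one pass.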

What’s New: Scheduled Fit


What’s New: Alerting


Example: DIY Anomaly Detector

Let’s Build an Anomaly Detector!


We’ll use two Assistants
– Predict Numeric Fields
– Detect Numeric Outliers

Show automatically generated intermediate SPL

Fit a Predictive Model


Set Up Scheduled Training


Open Residuals in Search


Open Detect Numeric Outliers Assistant


Detect Outliers (Large Prediction Errors)


Schedule an Alert


Manage Your New Anomaly Detector


The Assistant Generated the SPL for You


You Built an Anomaly Detector!


• You built a predictive model of AC power
• When the prediction error from this model is an outlier compared to past errors, you generate an alert
• This predictive model automatically retrains itself on a schedule you control
• You didn’t have to type any SPL

#winning
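The same two-step pipeline can be sketched outside Splunk to show the logic that the Assistants generated. First, fit a regression model. Second, learn the spread of its training residuals. Then raise an alert when a new residual lies more than k standard deviations from the mean residual. This is a minimal sketch that assumes a single numeric predictor and k = 3. For AC power, the predictor might be something like outside temperature, but that is a hypothetical choice. The sketch omits scheduled retraining, which the example handles with Splunk's scheduler.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def build_detector(train_x, train_y, k=3.0):
    """Step 1 (like Predict Numeric Fields): fit the predictive model.
    Step 2 (like Detect Numeric Outliers): learn the spread of its past errors."""
    slope, intercept = fit_line(train_x, train_y)
    residuals = [y - (slope * x + intercept) for x, y in zip(train_x, train_y)]
    mu, sigma = mean(residuals), pstdev(residuals)

    def is_anomaly(x, y):
        # Alert when the new prediction error is far outside the past errors.
        residual = y - (slope * x + intercept)
        return abs(residual - mu) > k * sigma

    return is_anomaly
```

Retraining on a schedule amounts to calling `build_detector` again on a fresh training window.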

Machine Learning Customer Success

Network Optimization
Detect & Prevent Equipment Failure
Security/Fraud Prevention

Prioritize Website Issues and Predict Root Cause

Predict Gaming Outages
Fraud Prevention

Machine Learning Consulting Services
Analytics App built on ML Toolkit

Optimizing operations and business results

Prevent Cell Tower Failure
Optimize Repair Operations

Entertainment Company


Machine Learning Toolkit Customer Use Cases


Speeding website problem resolution by automatically ranking actions for support engineers

Reducing customer service disruption with early identification of difficult-to-detect network incidents

Minimizing cell tower degradation and downtime with improved issue detection sensitivity

Improving uptime and lowering costs by predicting/preventing cell tower failures and optimizing repair truck rolls

Predicting and averting potential gaming outage conditions with finer-grained detection

Ensuring mobile device security by detecting anomalies in ID authentication

Preventing fraud by identifying malicious accounts and suspicious activities
Entertainment Company

Detect Network Outliers
Reduced downtime + increased service availability = better customer satisfaction


ML Use Case
Monitor noise rise for 20,000+ cell towers to increase service and device availability and reduce MTTR

Technical overview
• A customized solution deployed in production based on outlier detection
• Leverage previous month’s data and voting algorithms

“The ability to model complex systems and alert on deviations is where IT and security operations are headed… Splunk Machine Learning has given us a head start...”
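A "voting" outlier detector can be as simple as running several rules and requiring a majority to agree. The sketch below illustrates the idea; it is not the customer's actual solution. It learns thresholds for the three numeric-outlier methods from a baseline window, which stands in for the previous month's data, and flags a new value only when at least two rules fire. The function name and the multipliers are assumptions.

```python
from statistics import mean, median, pstdev, quantiles


def fit_voting_detector(baseline, min_votes=2):
    """Learn three outlier thresholds from a baseline window and combine
    them by majority vote. Returns a function value -> bool."""
    mu, sigma = mean(baseline), pstdev(baseline)
    med = median(baseline)
    mad = median(abs(v - med) for v in baseline)  # can be 0 for near-constant data
    q1, _, q3 = quantiles(baseline, n=4)
    iqr = q3 - q1

    def votes(value):
        return sum([
            abs(value - mu) > 3 * sigma,                        # standard deviation rule
            abs(value - med) > 3 * mad,                         # median absolute deviation rule
            value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr,   # interquartile range rule
        ])

    return lambda value: votes(value) >= min_votes
```

Requiring agreement trades a little sensitivity for fewer false alarms. A value that only one rule considers unusual does not trigger an alert.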

Reliable website updates
Proactive website monitoring leads to reduced downtime


“Splunk ML helps us rapidly improve end-user experience by ranking issue severity, which helps us determine root causes faster, thus reducing MTTR and improving SLA”

• Very frequent code and config updates (1000+ daily) can cause site issues
• Find errors in server pools, then prioritize actions and predict root cause

• Custom outlier detection built using ML Toolkit Outlier assistant
• Built by Splunk Architect with no Data Science background

ML Use Case

Technical overview

What Now?


• Get the Machine Learning Toolkit from Splunkbase
• Go watch Machine Learning Videos on the Splunk YouTube Channel: http://tiny.cc/splunkmlvideos
• Go to Machine Learning talks:
  – Advanced Machine Learning in SPL with the Machine Learning Toolkit by Jacob Leverich
  – Extending SPL with Custom Search Commands and the Splunk SDK for Python by Jacob Leverich

• Several Customer and Partner Talks
  – Cisco, Scianta Analytics, Asian Telco, etc.
• Early Adopter and Customer Advisory Program: [email protected]
• [email protected]
• [email protected]

http://tiny.cc/splunkmlapp

THANK YOU

