Big Data, Big Opportunity: A Primer for Understanding the Big Data Frontier
Session MFX01E, Mainframe
Sanjai Marimadaiah (@SanjaiM1), Michael Harer (@MikeHarer), Hiren Mandalia (@hiren0210)
CA Technologies Product Management, Office of the CTO, Big Data Management
#CAWorld



©2015 CA. ALL RIGHTS RESERVED. @CAWORLD #CAWORLD

Abstract

Big Data environments are now business-critical for any organization. Learn the basics of Big Data and some of the emerging technologies targeting the Big Data space.

Sanjai Marimadaiah, Michael Harer, Hiren Mandalia
CA Technologies Product Management, Office of the CTO, Big Data Management


Agenda

1. What Is Big Data?
2. Big Data Use Cases
3. Hadoop Basics
4. NoSQL Basics
5. Cassandra Basics
6. MongoDB Basics


How do I deliver a flawless experience every time an application touches the mainframe?

In the application economy it’s all about your customers. You need to think about your mainframe, reframed.

• Connect mobile-to-mainframe applications
• Create mainframe infrastructure flexibility for the future
• Unleash the power of data on the mainframe


What Is Big Data?

Data sets whose volume, velocity, variety, and complexity exceed the ability of commonly used software tools to capture, process, store, manage, and analyze them.

Information sources: mobile, transactional data ($ € ¥), search, texts, CRM, SCM, ERP, images, email, social media, IT ops, audio, video.

[Figure: Big Data defined by velocity, volume, variety, and complexity]


Evolution of Data Management Solutions: Relational Databases Are Not Suited for Big Data

[Figure: timeline from 1960 to 2010, moving from structured to unstructured data]
§ Hierarchical data models: IBM IMS
§ Relational data models: Sybase, Informix, Oracle, IBM
§ Document data models: Google, Hadoop


State of Database Workloads: Big Data Workloads Enable Broader OLAP Workloads

§ Database (RDBMS): online transaction processing (OLTP). Better analytics for higher-value transactions.
§ Data warehouse: online analytical processing (OLAP). Collect historical transactional data for analytics.
§ Big Data: Big Data workloads. Adding more complete data enhances analytics; enhanced insights from operational workloads & information access applications.

Big Data sources: multimedia, web logs, social data, sensor data (images), RFID, text data (emails)


What Is Driving Big Data Solutions: Cost Efficiency and a Standardized Platform Are Fostering Innovation

Scale-out architecture (hundreds of inexpensive servers) and open-source software (Hadoop, Cassandra):

• Protects investment: just add more servers to expand capacity
• Lower cost of infrastructure: less expensive commodity servers (x86-based)
• Standardization leads to innovation: a common programming interface is enabling innovation up the software stack
• Lower software cost: open-source software is lowering software cost


Adoption of Big Data Solutions

2X INCREASE in the number of organizations that have deployed/implemented data-driven projects since 2014

Key trends:
• Greater priority on structured data initiatives
• Top vendor criteria: integration with existing infrastructure, security, ease of use
• Necessary skill sets: business analysts, data architects, data analysts & data visualizers

40% of organizations are still planning to implement data projects
30% of organizations are still planning to implement data projects

Source: 2015 CA-Sponsored Research: Vanson Bourne Global Big Data User Survey


Overall Big Data Market

§ The Big Data market was $27.36B in 2014, up from $19.6B in 2013.
§ 89% of business leaders believe Big Data will revolutionize business operations the same way the Internet did.
§ 83% have pursued Big Data projects in order to seize a competitive edge.

Wikibon projects the Big Data market will top $84B in 2026, attaining a 17% compound annual growth rate (CAGR) for the forecast period 2011 to 2026.

Source: 2015 Wikibon Big Data Market Forecast
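The growth figures above follow the usual compound-growth formula, CAGR = (end / start)^(1 / years) - 1. The Python sketch below applies it to the slide's 2013 and 2014 market sizes and backs out the 2011 starting size implied by the Wikibon projection; that implied 2011 figure is derived here under the stated 17% rate, not a number the slide gives.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value after compounding start_value at `rate` per year for `years` years."""
    return start_value * (1.0 + rate) ** years


# Year-over-year growth implied by the slide's 2013 and 2014 figures ($B).
growth_2014 = cagr(19.6, 27.36, 1)

# 2011 base implied by "$84B in 2026 at a 17% CAGR" (15 compounding years).
# Derived for illustration; not a figure stated on the slide.
implied_2011 = 84 / (1.17 ** 15)

print(f"2013 to 2014 growth: {growth_2014:.1%}")
print(f"implied 2011 market: ${implied_2011:.1f}B")
```

Run as-is, it reports growth of about 39.6% from 2013 to 2014 and an implied 2011 market of roughly $8.0B.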


Database for Big Data: Overall Big Data Database Market Projected to Grow at 33% CAGR Until 2017

[Chart: Big Data Market Database Projection, 2011-2017 ($US billions). Source: © Wikibon Big Data Model 2011-2017]

• The Big Data database market will grow by approx. 60% over the six years from 2011 to 2017.
• The market for NoSQL databases was $0.2B in 2012, growing to $1.6B in 2017.
• Technology progression in data-in-DRAM memory and data-in-flash memory will improve the scalability of SQL databases.
• Applications are easier to program and require lower maintenance if SQL is used; NoSQL has greater scalability and lower technology costs for very large Big Data applications.

Source: 2015 Wikibon Big Data Model 2011-2017


Vendor Landscape: Broader Participants in the Big Data Market Segment

HARDWARE
Servers (chips): HP, Dell, Intel
Storage: EMC/Dell, NetApp, Fusion-io
Networking: Cisco, Arista Networks, Infineta Systems

SOFTWARE
Hadoop: Hortonworks, Cloudera, MapR, Zettaset, Hadapt, EMC Greenplum
NoSQL: Cassandra, MongoDB, Couchbase, DataStax, 10gen
NGDW*: HP Vertica, EMC Greenplum, Teradata Aster, IBM Netezza, SAP
Analytics & BI: Digital Reasoning, Revolution Analytics, Jaspersoft, Datameer, Pentaho
Management solutions: CA Big Data Control Center, Informatica, VMware, IBM BigInsights, HP HAVEn, BlueData EPIC, Syncsort, StackIQ, BMC Control-M

SERVICES
Cloud services: Amazon, Google, MapR, IBM, Microsoft
Technical services: Hortonworks, Cloudera, Cloudwick, EMC, IBM
Professional services: Think Big Analytics, IBM, EMC, Accenture, Deloitte

*NGDW = Next Generation Data Warehouse

Core infrastructure: Hadoop, Cassandra, MongoDB, Amazon Big Data, MapR, Elasticsearch


Big Data Use Case Studies


Media & Entertainment Use Case

PROBLEM
§ A company’s streaming business has expanded from thousands of members watching occasionally to millions of members watching over two billion hours every month.
§ A collection of events describing what is being viewed must be gathered. Given that viewing is what members spend most of their time doing, what’s needed is a robust and scalable architecture to manage and process this data.
§ Certain things will break the architecture that processes billions of viewing-related events per day.

SOLUTION
§ Focus on the minimum viable set of use cases.
§ Availability over consistency: our primary use cases can tolerate eventually consistent data, so design from the start favoring availability rather than strong consistency in the face of failures.

POTENTIAL BENEFITS
§ By focusing on the minimum viable set of use cases, rather than building a generic all-encompassing solution, we have been able to build a simple architecture that scales.
§ The company’s viewing data architecture is designed for a variety of use cases, ranging from user experiences to data analytics. The following are three key use cases, all of which affect the user experience:


Health Care Use Case

PROBLEM
§ Relapses in cardiac patients
§ "One size fits all" treatment
§ Medicare readmission penalties
§ Sensitive patient data on z Systems VSAM files
§ No efficient way to offload

SOLUTION
§ Identify risk factors by analyzing patient data*
§ Factors used to predict likely outcomes

POTENTIAL BENEFITS
§ Reduction in readmissions
§ Savings from avoided penalty fees
§ No manual intervention
§ No increase in staffing

*System z VSAM data requires special skills to access without vStorm Connect Data Streaming for Big Data.


Retail Use Case

PROBLEM
§ Streams of user data not correlated, e.g., store purchases, website usage patterns, card usage, historical customer data
§ Historical customer data is System z VSAM & DB2 based, with no efficient, secure offload

SOLUTION
§ HDFS securely populated with historical customer data, card usage, store purchases, website logs
§ Splunk scores customers based on the various data streams
§ High-scoring customers offered coupons and special deals on the website

POTENTIAL BENEFITS
§ Increase in online sales in the middle of a retail slowdown
§ Improved conversion rate of website-browsing customers (shopping cart to sales)
§ Elimination of data silos: since analytics now cover all data, there is no more reliance on multiple reports/formats
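The slide does not say how Splunk computes its customer score. As a purely hypothetical illustration (not Splunk's method), the Python sketch below combines per-customer signals from the separate streams into one weighted score and flags customers at or above a threshold for offers. The signal names, weights, and the 0.6 threshold are all assumptions.

```python
# Hypothetical weights per signal; each signal is assumed pre-normalized to [0, 1].
WEIGHTS = {
    "store_purchases": 0.4,
    "web_visits": 0.3,
    "card_usage": 0.2,
    "tenure_years": 0.1,
}


def score(signals: dict) -> float:
    """Weighted sum of the signals joined from the separate data streams.
    A signal missing for a customer contributes 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


def offer_targets(customers: dict, threshold: float = 0.6) -> list:
    """IDs of customers whose combined score meets the (assumed) threshold."""
    return sorted(cid for cid, signals in customers.items() if score(signals) >= threshold)


customers = {
    "c1": {"store_purchases": 1.0, "web_visits": 1.0, "card_usage": 1.0, "tenure_years": 1.0},
    "c2": {"web_visits": 1.0},
    "c3": {"store_purchases": 1.0, "web_visits": 1.0},
}
print(offer_targets(customers))
```

Correlating the streams per customer is the point: a customer who looks average in any single stream (c3 has no card data) can still cross the threshold once all streams are combined.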


Hadoop Basics


What Is Hadoop?

Hadoop is… open-source software designed for high scalability, fault tolerance, and highly distributed operation.

Key elements:
1. Distributed processing of Big Data (e.g., MapReduce)
2. Distributed storage (Hadoop Distributed File System, or HDFS)

[Figure: Hadoop 1.0 vs. Hadoop 2.0]
Hadoop 1.0: HDFS (distributed reliable storage); MapReduce (resource management & data processing)
Hadoop 2.0: HDFS (distributed reliable storage); YARN (resource management); MapReduce (distributed programming); Spark (in memory); HBase (NoSQL store); Hive (query); Pig (scripting); Oozie (workflow)


MapReduce: Core Hadoop
A distributed computing framework

§ Hadoop’s MapReduce framework involves two phases:
1. Map phase: distributes the data set among multiple servers and operates on the data locally.
2. Reduce phase: recombines the partial results.
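To make the two phases concrete, here is a minimal single-process Python simulation of the classic word-count job. It is not Hadoop's API (real jobs typically implement Java Mapper and Reducer classes); the function names are hypothetical, and the shuffle step, which the framework performs between the two phases, is written out explicitly.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: runs independently on each server's local split, emitting (word, 1)."""
    for line in split:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group intermediate values by key (handled by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: recombine the partial results for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    # Each split stands in for the block of input stored on one server.
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    return reduce_phase(shuffle(mapped))


print(word_count([["big data big"], ["data opportunity"]]))
```

Because each map call only sees its own split, the map work parallelizes across servers; only the (much smaller) intermediate pairs need to move before the reduce.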


MapReduce: Core Hadoop
A distributed computing framework

• JobTracker: one of the core Hadoop services; it manages the jobs and the resources in the cluster (the TaskTrackers). The JobTracker tries to schedule each "map" task as close as possible to the actual data being processed.
• TaskTracker: deployed on the data nodes; responsible for running the map and reduce tasks as instructed by the JobTracker.

MapReduce processes large jobs in parallel across many nodes and combines the results.

[Figure: the JobTracker on the master node dispatches Job-1 to Job-5 to TaskTrackers on the slave nodes (DataNodes); the five slave nodes store data blocks {2, 4, 5}, {1, 2, 5}, {1, 3, 4}, {2, 3, 5}, and {1, 3, 4} respectively]
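The locality preference can be sketched as follows. The block placement is read off the figure (each slave node holds three of blocks 1 to 5, so every block has three replicas); the scheduling rule, pick the least-loaded node that holds a local replica, is a simplified assumption and not the JobTracker's actual algorithm. Node names are hypothetical.

```python
from collections import defaultdict

# Block placement from the figure: node -> blocks stored on its local disk.
PLACEMENT = {
    "node1": {2, 4, 5},
    "node2": {1, 2, 5},
    "node3": {1, 3, 4},
    "node4": {2, 3, 5},
    "node5": {1, 3, 4},
}


def schedule_maps(blocks, placement):
    """Assign one map task per block, preferring a node with a local replica
    and, among those, the least-loaded one (ties broken by node name)."""
    load = defaultdict(int)
    assignment = {}
    for block in sorted(blocks):
        local = [node for node, held in placement.items() if block in held]
        candidates = local if local else list(placement)  # fall back to any node
        node = min(candidates, key=lambda n: (load[n], n))
        assignment[block] = node
        load[node] += 1
    return assignment


print(schedule_maps(range(1, 6), PLACEMENT))
```

Every map task here reads its block from local disk, and the load rule spreads the five tasks across all five nodes, which is the "move the computation to the data" idea the slide describes.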


Hadoop Distributed File System (HDFS): Self-Healing, High-Bandwidth Clustered Storage

• NameNode: one of the core Hadoop services; it maintains the namespace, knows where data is, and manages blocks on the DataNodes.
• DataNode: the servers that actually store the data on their local disks.
• Secondary NameNode: periodically checkpoints the primary NameNode’s metadata (merging the edit log into the namespace image). The checkpoint can aid recovery after a failure, but the Secondary NameNode is not a hot standby.

HDFS breaks incoming files into blocks and stores them redundantly across the cluster.

[Figure: the primary NameNode and the Secondary NameNode on the master node, linked by a periodic checkpoint; DataNodes/TaskTrackers on the slave nodes running Job-1 to Job-5, each node storing three of blocks 1 to 5]
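A minimal Python sketch of splitting a file into blocks, storing each block on several distinct nodes, and checking which node failures the data survives. The 128 MB block size and replication factor of 3 are the Hadoop 2.x defaults; the round-robin placement is a simplification (real HDFS placement is rack-aware), and the function and node names are hypothetical.

```python
from itertools import combinations

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the dfs.blocksize default in Hadoop 2.x
REPLICATION = 3                 # the dfs.replication default


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Sizes of the blocks a file occupies; only the last block may be short."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Toy placement: block b goes to `replication` consecutive, distinct nodes.
    (Hypothetical; real HDFS placement also considers racks.)"""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        b: {nodes[(b + i) % len(nodes)] for i in range(replication)}
        for b in range(num_blocks)
    }


def readable(placement, failed_nodes):
    """True if every block still has at least one replica on a live node."""
    failed = set(failed_nodes)
    return all(replicas - failed for replicas in placement.values())


nodes = ["node1", "node2", "node3", "node4", "node5"]
blocks = split_into_blocks(300 * 1024 * 1024)  # 128 MB + 128 MB + 44 MB
placement = place_replicas(len(blocks), nodes)

# With 3 replicas on distinct nodes, losing any two nodes loses no block.
print(all(readable(placement, pair) for pair in combinations(nodes, 2)))
```

Losing all three nodes that hold the same block does lose it, which is why HDFS re-replicates under-replicated blocks onto healthy DataNodes when one fails; that is the "self-healing" in the slide title.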


YARN: Yet Another Resource Negotiator
Resource management moves into YARN

YARN is…
§ Resource management
§ Next-generation MapReduce (MRv2)
§ Splits the JobTracker’s duties into:
– ResourceManager: cluster-wide resource management
– ApplicationMaster (one per application): job scheduling/monitoring

What does YARN do?
§ Provides a cluster-level resource manager for improved resource management & scaling
§ Forms the new system for managing applications in a distributed manner
§ Provides resources (containers) for jobs other than MapReduce
§ Improves resource utilization
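As a rough illustration of the shift from fixed map/reduce slots to shared containers, the sketch below models a ResourceManager granting memory-and-vcore containers to two different applications on the same cluster. The first-fit policy, the class names, and the node sizes are assumptions; YARN's real schedulers (e.g., Capacity and Fair) are considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class NodeCapacity:
    """Free resources a NodeManager reports for its node (hypothetical model)."""
    name: str
    free_mb: int
    free_vcores: int


class ToyResourceManager:
    """First-fit container allocation; a simplification of YARN scheduling."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, mem_mb, vcores):
        """Grant one container on the first node with enough memory and vcores."""
        for node in self.nodes:
            if node.free_mb >= mem_mb and node.free_vcores >= vcores:
                node.free_mb -= mem_mb
                node.free_vcores -= vcores
                return {"app": app_id, "node": node.name, "mem_mb": mem_mb, "vcores": vcores}
        return None  # not enough capacity right now: the request waits


rm = ToyResourceManager([NodeCapacity("nm1", 8192, 8), NodeCapacity("nm2", 4096, 4)])
# A MapReduce job and a non-MapReduce application (e.g., Spark) share one cluster.
grants = [
    rm.allocate("mapreduce-job", 4096, 2),
    rm.allocate("mapreduce-job", 4096, 2),
    rm.allocate("spark-app", 4096, 2),
    rm.allocate("spark-app", 1024, 1),
]
print([g["node"] if g else None for g in grants])
```

Containers are sized per request rather than fixed per node, so any framework that can ask the ResourceManager for resources can run alongside MapReduce.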


HBase: A Non-Relational (NoSQL) Database That Runs on Top of HDFS

What is it?
§ A Hadoop open-source (Java) NoSQL database
§ Provides real-time read/write access to large data sets
§ Distributed with automatic failover

Why use it?
§ Provides a natural data storage mechanism for all kinds of data (especially unstructured)
§ For random, real-time access to data in Hadoop
§ When the project goal is to host very large tables, i.e., billions of rows and millions of columns
§ Combines data sources that use a wide variety of different structures and schemas
§ Great for: storing semi-structured data like log data

[Figure: logical view of customer contact information in HBase]
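The logical view in the figure can be approximated in a few lines of Python: a table maps row key to column family to column qualifier to timestamped versions, and a read returns the newest version. This is an in-memory sketch, not the HBase client API; the table, column families, and row keys are hypothetical.

```python
import time
from collections import defaultdict


class ToyHTable:
    """In-memory model of HBase's logical view (hypothetical):
    row key -> column family -> column qualifier -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front...
        # ...but qualifiers are free-form and can differ from row to row.
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, family, qualifier, value, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = time.time_ns() if ts is None else ts
        self.rows[row][family][qualifier][ts] = value

    def get(self, row, family, qualifier):
        """Return the newest version of a cell, or None if the cell is absent."""
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier)
        if not versions:
            return None
        return versions[max(versions)]


contacts = ToyHTable(["info", "contact"])
contacts.put("cust-001", "info", "name", "Ann Smith", ts=1)
contacts.put("cust-001", "contact", "email", "ann@old.example", ts=1)
contacts.put("cust-001", "contact", "email", "ann@new.example", ts=2)
contacts.put("cust-002", "contact", "phone", "555-0100", ts=1)  # sparse: no email cell
print(contacts.get("cust-001", "contact", "email"))
```

Empty cells cost nothing in this model, which is why a wide, sparse table with "millions of columns" is practical: each row stores only the qualifiers it actually has.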


Hive: A Data Warehouse Infrastructure Built on Hadoop

What is it?
§ A query engine wrapper built on MapReduce
§ Treated as a data warehouse tool for the Hadoop ecosystem
§ Primarily for users with SQL skills
§ Provides HiveQL (similar to SQL)
§ Stores data in HDFS

Why use it?
§ Data analysis and reporting purposes
§ Hides Hadoop complexity from end users
§ Can be used within an ELT function, i.e., to convert SQL-like queries into MapReduce jobs that run on a Hadoop cluster
§ Good for: batch processing tasks (logs, text mining, document indexing, customer BI)
§ Not good for: online transaction processing, real-time queries


Cross-Industry Use Case: Apache Hadoop ELT

PROBLEM
§ Traditional data warehousing resources are EXPENSIVE (e.g., transactional mainframe systems)
§ Need to reduce costs associated with storage, CPU capacity, and third-party ETL tools
§ Current systems cannot scale (i.e., process …)
§ Lack efficient tools
§ Tools typically only handle structured data (RDBMS), but Big Data insight is derived from all types of data (structured, unstructured, semi-structured)

SOLUTION
§ Apache Hadoop tools to:
1. Perform ETL functions
2. Handle all of the specific data types
3. Shift away from traditional ETL to ELT (extract, load, and transform). This shift is mainly driven by Big Data, which follows the "store first, analyze later" model that is becoming the new standard.

BENEFITS
§ Compared to traditional transactional systems, Hadoop provides fast, low-cost processing
§ New value can be derived from the ability to handle structured and unstructured data
§ Greater flexibility & choice: e.g., the Transform function can use MapReduce, Hive, Pig, R, shell scripts, Java, etc.
§ Vast support model: open-source developer community

[Figure: Traditional ETL process: Web, CRM, and ERP sources are extracted, transformed, and loaded into a data warehouse (DWH) that feeds data mining, reporting, and OLAP analysis. Hadoop ELT: structured sources (Web, CRM, ERP) via Sqoop and unstructured sources (social media, sensor logs) via Flume are extracted and loaded into HDFS (Hadoop Distributed File System), then transformed with Pig, MapReduce, or Hive for data mining, reporting, and analytics.]
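The "store first, analyze later" idea behind ELT can be sketched in Python: records from every source are landed unchanged (the extract and load steps), and each analysis applies its own transform afterward. The list standing in for HDFS, the parser functions, and the sample records are all hypothetical.

```python
import json

# Stand-in for an HDFS landing directory: records are stored exactly as received.
raw_store = []


def extract_and_load(records):
    """E + L: land source records untouched; no schema is enforced at load time."""
    raw_store.extend(records)


def transform(store, parse):
    """T, run later and on demand: apply whichever parser an analysis needs,
    skipping records that this particular parser cannot interpret."""
    results = []
    for record in store:
        try:
            results.append(parse(record))
        except (ValueError, KeyError, TypeError):
            continue
    return results


def parse_crm_json(record):
    doc = json.loads(record)  # json.JSONDecodeError is a ValueError subclass
    return doc["customer"], doc["amount"]


def parse_web_csv(record):
    customer, amount = record.split(",")
    return customer, int(amount)


extract_and_load([
    '{"customer": "c1", "amount": 30}',  # structured CRM export (JSON)
    "c2,45",                             # web log line (CSV)
    "sensor-7 temp=21.5",                # unstructured sensor log, kept for later use
])
print(transform(raw_store, parse_crm_json), transform(raw_store, parse_web_csv))
```

Nothing is discarded at load time: the sensor record is unused by both transforms today, but a future parser can still read it, which a transform-before-load pipeline would have dropped.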


Commercial Distributions of Hadoop

Spectrum: pure open source / open core / compatible

Distributions shown: Apache Hadoop (e.g., HDFS, Oozie), Cloudera, Hortonworks, MapR


The Evolving Hadoop Ecosystem

Component: description
Mahout, R: data mining/machine learning tools used against Hadoop data to detect patterns and trends
Pig: scripting language for analyzing large data sets; compiles to MapReduce jobs
MapReduce, YARN: programming model for processing large data sets; YARN performs overall resource management
Oozie: a workflow scheduler tool to manage Hadoop MapReduce jobs
Sqoop, Hive, Drill: enable SQL for Hadoop data. Sqoop: data transfer between Hadoop and structured data stores. Hive: data warehouse for Hadoop. Drill: open-source, low-latency SQL query engine for Hadoop and NoSQL.
ZooKeeper: coordination of configuration data, naming, and synchronization for Hadoop projects
Bigtop: packaging services for Hadoop projects to ease testing and deployment
HBase: a non-relational, distributed database that runs on top of HDFS
Thrift, Avro: schema-based data serialization systems using RPC calls
Solr, Nutch, Elasticsearch: indexing and search tools for data stored in HDFS
Kafka, Flume: collect, aggregate, and move streaming data from multiple sources into Hadoop
Spark: app dev tool for Hadoop apps combining batch, streaming, and interactive analytics
Ambari, Chukwa: monitoring & management of Hadoop clusters and nodes


NoSQL Basics


NoSQL Databases Overview

§ Far better at handling semi-structured and unstructured data
§ Database consistency is compromised for availability and ease of partitioning
§ Supports object-oriented programming that is easy to use and flexible
§ Efficient, scale-out architecture instead of expensive, monolithic architecture


NoSQL Types

Type: database examples
Column data model: HBase, Cassandra, Accumulo
Document data model: MongoDB
Key-value data model: OpenTSDB, Redis
Graph data model: Neo4j, ArangoDB


Cassandra Basics


Cassandra: History

§ BigTable (Google), 2006
§ Dynamo (Amazon), 2007
§ Open source (Facebook), 2008
§ Cassandra DSE (DataStax), Dec 2011


Cassandra Is Ideal For…

§ Massive, linear scaling
§ Extremely heavy writes
§ High availability

[Logos: CERN, Barracuda, Cisco, Blue Mountain, Comcast, Netflix, SoundCloud]


Cassandra: Data Model

Benefits of the Cassandra data model:
§ Easily add new columns without downtime
§ Schema-free/schemaless database
§ Compression permits columnar operations (MIN, MAX, SUM, etc.) to run rapidly

[Figure: a column family (similar to an RDBMS table), shown as a grid and in JSON format]


Cassandra Architecture

§ All nodes are the same
§ Data is partitioned among all nodes in the cluster
§ Each node communicates with the other nodes using the gossip protocol
§ A commit log is used on each node to capture write activity for data durability

[Figure: a client connected to a ring of peer nodes. Storage: Cassandra File System; processing: Cassandra Query Language (CQL)]
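One common way to picture "data partitioned among all nodes" with no master is a consistent-hashing token ring, sketched below in the spirit of Cassandra's SimpleStrategy: a key's first replica is the first node token at or after the key's token (wrapping around), and further replicas follow clockwise. Assumptions: MD5 stands in for Cassandra's default Murmur3 partitioner only because it ships with Python, each node owns a single token rather than many virtual nodes, and the class and node names are hypothetical.

```python
import bisect
import hashlib


class TokenRing:
    """Toy consistent-hashing ring (hypothetical; one token per node)."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self.ring = sorted((self.token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    @staticmethod
    def token(value):
        # MD5 only because it is in Python's stdlib; Cassandra defaults to Murmur3.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def replicas(self, key):
        """First replica: first node token >= the key's token (wrapping);
        remaining replicas: the next nodes clockwise."""
        start = bisect.bisect_left(self.tokens, self.token(key)) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


nodes = ["node1", "node2", "node3", "node4", "node5"]
ring = TokenRing(nodes)
print(ring.replicas("member:42"))  # three distinct nodes, same answer every call
```

Any node can compute this mapping on its own, so a client can contact any peer. Adding a node only takes over keys in the token range just before its own token; every other key keeps its first replica.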


Cassandra: Key Features

§ No single point of failure
§ Multi-data center and zone support
§ Pure peer-to-peer cluster setup
§ Allows for "tunable consistency"
§ Cassandra Query Language (CQL)
§ Cassandra File System (CFS)
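Tunable consistency means each read and write chooses how many replicas must respond (for example ONE, QUORUM, or ALL). The standard rule of thumb is that a read is guaranteed to see the latest acknowledged write when R + W > RF, where RF is the replication factor. The Python sketch below checks that rule by brute force; the function names are hypothetical.

```python
from itertools import combinations


def quorum(rf):
    """Cassandra's QUORUM level: a majority of replicas, floor(rf / 2) + 1."""
    return rf // 2 + 1


def reads_see_latest_write(rf, w, r):
    """Brute force: does every possible set of r replicas read intersect every
    possible set of w replicas that acknowledged the write?"""
    replicas = range(rf)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


rf = 3
print(quorum(rf))                                          # 2
print(reads_see_latest_write(rf, 1, 1))                    # ONE / ONE: False
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))  # QUORUM / QUORUM: True
```

With RF = 3, QUORUM writes plus QUORUM reads (2 + 2 > 3) always overlap on at least one replica, while ONE/ONE (1 + 1 ≤ 3) trades that guarantee for lower latency and higher availability.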


Cassandra at Netflix

Use cases:
§ What titles have I watched?
§ What titles are recommended for me?
§ Where did I leave off last?
§ What else is being watched?
§ Measure member engagement
§ Inform product & content decisions

Solution:
§ Capture all ‘view’ events in scalable Cassandra clusters

Challenges:
§ Ability to scale to a billion write events/day
§ Provide a responsive title-browsing experience

Source: techblog.netflix.com


MongoDB Basics


MongoDB: History

§ 2007: founded (10gen)
§ 2009: MongoDB 1.0 open-sourced (10gen)
§ 2012: MongoDB 2.0 (10gen)
§ 2013: MongoDB Inc.
§ 2015: MongoDB 3.0 (MongoDB)


MongoDB Is Ideal For…

§ RDBMS replacement for web applications
§ Semi-structured content management
§ Real-time analytics and high-speed logging
§ Caching and high scalability

Industries: Web 2.0, media, SaaS, gaming; health care, finance, telecom, government

Not so great for: highly transactional databases

[Logos: Disney, Eventbrite, Intuit, IGN, Craigslist]


MongoDB: Data Model

RDBMS vs. document-oriented

Benefits of a document-oriented DBMS:
• Database schema is optional
• Flexible in dealing with change and optional values

Present:
{"streetnum": "123", "streetname": "Main St.", "unit": "456", "City": "Mountain View", "State": "California", "zip": "65432"}

Future:
{"streetnum": "123", "streetname": "Main St.", "unit": "456", "City": "Mountain View", "State": "California", "County": "Santa Clara", "zip": "65432"}
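The Present and Future documents can live side by side in one collection. The Python sketch below loads both and runs an equality query in the spirit of a MongoDB find() filter: documents without a queried field simply do not match, so adding County requires no schema migration. The find helper and the MISSING sentinel are hypothetical and do not use the MongoDB driver.

```python
import json

MISSING = object()  # sentinel: distinguishes "field absent" from a stored null

present = json.loads(
    '{"streetnum": "123", "streetname": "Main St.", "unit": "456", '
    '"City": "Mountain View", "State": "California", "zip": "65432"}'
)
future = json.loads(
    '{"streetnum": "123", "streetname": "Main St.", "unit": "456", '
    '"City": "Mountain View", "State": "California", "County": "Santa Clara", '
    '"zip": "65432"}'
)

# Both shapes sit side by side in one collection; no migration is required.
addresses = [present, future]


def find(collection, **criteria):
    """Equality match in the spirit of a MongoDB find() filter (hypothetical
    helper): a document that lacks a queried field simply does not match."""
    return [
        doc for doc in collection
        if all(doc.get(field, MISSING) == value for field, value in criteria.items())
    ]


print(len(find(addresses, City="Mountain View")))  # both documents match
print(find(addresses, County="Santa Clara"))       # only the newer shape matches
```

Application code reads the optional field with a default (for example `doc.get("County", "unknown")`), so old and new documents are handled by the same code path.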


MongoDB – Sharding


Sharded Production Cluster Setup

[Figure: sharded production cluster. Image source: mongodb.org]

§ Shards store the data. To provide high availability and data consistency in a production sharded cluster, each shard is a replica set.
§ Replica set: a cluster of MongoDB servers that implements primary/secondary (master-slave) replication and automated failover.
§ Query routers, or mongos instances, interface with client applications and direct operations to the appropriate shard or shards.
§ Config servers store the cluster’s metadata. This data contains a mapping of the cluster’s data set to the shards.
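How a mongos instance uses the config servers' chunk map can be sketched as a range lookup: with range-based sharding, a query that includes the shard key is sent to one shard, while a query without it must go to every shard (scatter-gather). The chunk boundaries, shard names, and the postingId field (echoing the posting-ID sharding key in the Craigslist example) are hypothetical, and the class is not part of any MongoDB API.

```python
import bisect


class ToyMongos:
    """Sketch of a query router over range-based chunks (hypothetical).
    chunk_bounds[i] is the inclusive lower bound of chunk i, and chunk_shards[i]
    is the shard that owns it: the mapping the config servers would hold."""

    def __init__(self, chunk_bounds, chunk_shards):
        if len(chunk_bounds) != len(chunk_shards):
            raise ValueError("one shard per chunk is required")
        self.bounds = chunk_bounds
        self.shards = chunk_shards

    def shard_for(self, key):
        i = bisect.bisect_right(self.bounds, key) - 1
        if i < 0:
            raise ValueError("key is below the first chunk's lower bound")
        return self.shards[i]

    def route(self, query, shard_key):
        """Targeted if the query includes the shard key, else scatter-gather."""
        if shard_key in query:
            return [self.shard_for(query[shard_key])]
        return sorted(set(self.shards))


# float("-inf") plays the role of MongoDB's MinKey for the first chunk.
router = ToyMongos([float("-inf"), 1_000_000, 2_000_000], ["shardA", "shardB", "shardC"])
print(router.route({"postingId": 1_500_000}, "postingId"))  # one shard
print(router.route({"category": "bikes"}, "postingId"))     # every shard
```

This is why the choice of shard key matters: queries that carry it touch one replica set, and the rest fan out to all of them.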


MongoDB: Key Features

§ Scalable, high-performance, open-source, document-oriented database
§ Built for speed
§ Rich document format allows for easy readability
§ Full index support for high performance
§ Replication and failover for high availability
§ Auto-sharding for easy scalability
§ Map/Reduce for aggregation


MongoDB at Craigslist

Use cases:
§ Create new posts
§ Browse all my posts
§ Allow for post classification
§ Search relevant posts

Solution:
§ Migrate from MySQL to MongoDB

Challenges:
§ Archive billions of records in multiple formats
§ Query/report on archives at runtime
§ Continuous availability mandated for regulatory compliance
§ Support 700 sites in 70 different countries

Craigslist environment:
• 5 billion documents
• Average size: 2 KB
• 3 replica sets, 3 servers each
• 2 data centers
• Sharding key: posting ID
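A back-of-the-envelope sizing of the Craigslist environment, as a Python sketch. Assumptions: 2 KB is read as 2,000 bytes, documents are spread evenly across the three replica sets (shards), and indexes, the oplog, and compression are ignored; the variable names are hypothetical.

```python
DOCUMENTS = 5_000_000_000  # "5 billion documents"
AVG_DOC_BYTES = 2_000      # "Average size: 2 KB", read as 2,000 bytes (assumption)
REPLICA_SETS = 3           # "3 replica sets"
SERVERS_PER_SET = 3        # "3 servers each"

TB = 10 ** 12

raw_bytes = DOCUMENTS * AVG_DOC_BYTES         # one logical copy of the data
per_shard_bytes = raw_bytes / REPLICA_SETS    # assumes an even spread across shards
per_server_bytes = per_shard_bytes            # each member holds its shard's full copy
stored_bytes = per_server_bytes * REPLICA_SETS * SERVERS_PER_SET

print(f"raw data:     {raw_bytes / TB:.1f} TB")
print(f"per server:   {per_server_bytes / TB:.2f} TB")
print(f"total stored: {stored_bytes / TB:.1f} TB")
```

Under these assumptions the archive is about 10 TB of raw documents, roughly 3.33 TB per server, and about 30 TB stored across all nine servers once replication is counted.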


Closing


CA Big Data Control Center: Vision

End-to-End Management of Big Data Environments for the Application Economy

§ Bring efficiency to root-cause analysis at all levels of the Big Data solution stack
§ Simplify management by abstracting the complexities of underlying Big Data technologies
§ Holistically meet the needs of DevOps by managing the lifecycle of applications, data, and services

Personas (primary and secondary): LOB/business analysts; app dev/data scientists; data engineers/data admins; IT ops/IT management; Big Data/system admins

[Figure: CA Big Data Control Center spanning applications, data services, data sources, and IT solutions built on Big Data technologies]


Manage Big Data with a Unified View

§ Job monitoring
§ Heterogeneous system management
§ Intelligent alert management
§ Resource reporting
§ Cluster/job/node management


Unified View: Details


Recommended Sessions

Session #: title (date/time; location)
MFT05S: Big Iron + Big Data = BIG DEAL! Unlock the Power of Your Mainframe Data (11/18/2015 at 2:00 pm; Mainframe Theater)
MFX15S: Predicting When Your Applications Will Go Off the Rails! Managing DB2 Application Performance Using Analytics (11/18/2015 at 4:30 pm; Breakers I)
MFT15T: New Mainframe IT Analytics: Actionable Insight into Root Cause Analysis of Performance Issues (11/18/2015 at 3:45 pm; Mainframe Area Tech Talk)
MFX06S: CA's Strategy and Vision for Mainframe Data Management and Analytics (11/18/2015 at 1:00 pm; Breakers I)
MFT01S: The Big Data, Big Picture: Can You See It? (11/19/2015 at 3:45 pm; Mainframe Theater)


Must-See Demos

See the Future of Big Data Management: CA Big Data Control Center (App Economy Area, Station APPECN001)
Unleash the Power of Mainframe Data: vStorm Connect Data Streaming for Big Data (Mainframe Area, Station MNFSE001)
Maximize Your Mainframe Database Value: CA IDMS/CA Datacom (Mainframe Area, Station MNFSE002)
Performance Analytics for DB2: DB2 Analytics (Mainframe Area, Station MNFSE004)


Follow-On Conversations At…

Smart Bar: DB2 Tools and Performance Analytics (Mainframe Area on Expo Floor)
Tech Talks: Five Steps to a Powerful Database Experience (Mainframe Area on Expo Floor)


Influencing Our Roadmap
Winning with CA

Take the opportunity to influence our product development. Help ensure that what we deliver is what you need and want.

CA Communities Ideation
§ Submit your ideas on communities.ca.com
§ Vote & comment on ideas that are important to you
§ CA Product Management reviews ideas and updates their status as they move through the lifecycle
§ "Currently Planned" idea status indicates inclusion in the Agile backlog or product roadmap

Agile Development: Customer Validation
§ Register to participate in:
– Live demos/end-of-sprint reviews
– Private, members-only online community
– Pre-release onsite testing and support (beta)
– Upgrade support from the SWAT team
§ How to register: https://validate.ca.com


Agile Development Transformation
Driving Significant Business Value for Our Customers!

Speed, quality, performance

UK customer Standard Life benefits from CA agile process:
§ 99.5% reduction in cost
§ 98% reduction in month-end cycle time

§ 251 unique customers participated in 56 product releases during a year
§ 45 products released against a zero-defect policy
§ 20% decrease in support issues


For Informational Purposes Only: Terms of This Presentation

©2015 CA. All rights reserved. All trademarks referenced herein belong to their respective companies. The presentation provided at CA World 2015 is intended for information purposes only and does not form any type of warranty. Some of the specific slides with customer references relate to customer's specific use and experience of CA products and solutions so actual results may vary.

Certain information in this presentation may outline CA’s general product direction. This presentation shall not serve to (i) affect the rights and/or obligations of CA or its licensees under any existing or future license agreement or services agreement relating to any CA software product; or (ii) amend any product documentation or specifications for any CA software product. This presentation is based on current information and resource allocations as of November 18, 2015, and is subject to change or withdrawal by CA at any time without notice. The development, release and timing of any features or functionality described in this presentation remain at CA’s sole discretion.

Notwithstanding anything in this presentation to the contrary, upon the general availability of any future CA product release referenced in this presentation, CA may make such release available to new licensees in the form of a regularly scheduled major product release. Such release may be made available to licensees of the product who are active subscribers to CA maintenance and support, on a when and if-available basis. The information in this presentation is not deemed to be incorporated into any contract.


Q&A