DELIVERING THE ENTERPRISE FABRIC FOR BIG DATA
Aiaz Kazi, SVP, Platform Strategy and Adoption, SAP (@aiazkazi)

Transcript

SAP & DATABRICKS
Delivering the Enterprise Fabric for Big Data
Aiaz Kazi, SVP, Platform Strategy and Adoption, SAP (@aiazkazi)

Good morning, everyone. Excited to be back here with you all again. There is a lot of talk about Big Data and how enterprises should harness it. The term is a misnomer, but it is also one that stuck, and all of us use it to mean many things. Let's take a look at what Big Data means in the enterprise.

ENTERPRISE: BIG DATA = ALL DATA
Deep | Broad | Simple | All-Speeds | Interactive. CONTEXT IS KING.

The enterprise view is simply that big data includes every piece of data that touches the enterprise. One way to look at this is through the key dimensions of data:

1. Deep: the highest level of granularity, giving the greatest flexibility in answering complex questions, from minuscule config/ini files distributed across the organization to large stores of data.
2. Broad: all types of data, structured, text, blobs, binary, and so on.
3. Simple: actual data only, with all derived data built on the fly. This greatly simplifies data management and alleviates the pain of managing snapshots of data.
4. All-speeds: data at rest and data in motion. This is counterintuitive to the V for Velocity, since that implies high speed, but it works if we think of V as all velocities.
5. Interactive: ask any question of your data and receive answers immediately, so you can ask the next question and iterate back and forth. Factor in IoT and machines asking questions or providing answers; this is a key dimension for Event-Condition-Action processes.

Data, big or small, broad or deep, at rest or in motion, is meaningless without context. Big Data's three Vs ignore context, though some talk about a fourth V, Value, that could possibly serve as context. Those that can deliver the best context and help drive business decisions will win, period. So let's take a look at how we can deliver such context across these five dimensions.

KEY TECHNOLOGY REQUIREMENTS
Massively Parallel Processing | Distributed | In-Memory | Linear Scaling

[FRANKLIN]
In-memory: lightning-fast in-memory computing for large-scale data processing; 100x performance thanks to the ability to perform computation in memory.
Distributed: world-changing internet technologies run on distributed environments. Distributed environments are the future for all data, the Big Data enterprise. Storage and queries are distributed across a sea of computation power (clusters) with robust fault tolerance (data is automatically rebuilt on failure). You work with distributed collections the way you work with local ones, which brings simplicity to managing distributed systems.
Massively parallel processing: massively parallel processing of data is the most promising execution stack for Big Data analytics, with programming models that support parallel methods.
Linear scaling: I believe Big Data analytics (or advanced analytics) should be the fourth requirement. I am not too fond of the term linear scaling; it belongs more to SAP lingo than to the Spark community.
Big Data analytics: this term is popular with the Spark community and SAP, and it is a priority for the cloud community. Bringing the processing and execution of analytics close to in-memory data is key. Spark is doing it; HANA is doing it. Spark initiatives like GraphX, MLlib, and Shark are trying to create the analytics platform on top to make this happen.

Derive business meaning out of traditional sources (OLAP, OLTP, RDBMS) and new sources (clickstream, geo and tracking, sensors, social media). Make sense of the highest levels of granularity: Deep. Data distributed across the organization, and of all types: Broad.
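The slide lists these requirements in the abstract. As a rough, hypothetical illustration of how the first three show up in Spark code, here is a minimal Scala sketch; the input path and record layout are invented for the example and are not taken from the deck. It loads a dataset partitioned across the cluster, pins it in memory, and runs a parallel aggregation over it.

import org.apache.spark.{SparkConf, SparkContext}

object ClickstreamStats {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ClickstreamStats"))

    // Distributed: the files are split into partitions spread across the cluster,
    // so every step below runs on all worker nodes at once.
    val events = sc.textFile("hdfs:///data/clickstream/*.log")

    // In-memory: keep the parsed records cached in cluster RAM so repeated,
    // interactive queries do not re-read HDFS.
    val parsed = events
      .map(_.split("\t"))                  // assumed layout: userId \t page \t durationMs
      .filter(_.length >= 3)
      .map(f => (f(1), f(2).toLong))
      .cache()

    // Massively parallel: per-partition partial sums are computed locally,
    // then combined into the average time spent per page.
    val avgDurationPerPage = parsed
      .mapValues(d => (d, 1L))
      .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
      .mapValues { case (total, n) => total.toDouble / n }

    avgDurationPerPage.take(20).foreach(println)
    sc.stop()
  }
}

Linear scaling is then the property that adding nodes lets the same job handle proportionally more data; it is not something a single code snippet can demonstrate.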
SAP HANA: A REIMAGINED PLATFORM
In-Memory | Distributed | Linear Scale | Massively Parallel | Columnar | Compressed | No Aggregates
Text | Geospatial | Analytics | Predictive | Planning

SAP HANA
A completely reimagined in-memory platform: database services, application platform, and function libraries. 1,500+ startups, $1B+ revenue, 3,300+ enterprise customers. In-memory, columnar data, predictive, text/NLP, geospatial, planning/rules.

SAP HANA is a completely reimagined platform that combines data, transactions, analytics, predictive, sentiment, and spatial processing so that businesses can operate in real time.

SAP HANA AND SPARK DELIVER THE ENTERPRISE FABRIC FOR BIG DATA
SAP HANA (cloud ready, in-memory) + Apache Spark (Spark SQL/Shark, Spark Streaming, MLlib for machine learning, GraphX for graphs).

The enterprise fabric addresses data in corporate applications and databases as well as data stored outside in stores such as Hadoop, ideally 100% of data across all five dimensions.

For the Spark/Hadoop developer:
- Bring your applications to the enterprise; use HANA to access and analyze corporate and operational data.
- Simplify interactive data analysis across corporate and context data.
- Minimize data movement and enable in-memory computation where the data resides.
- Handle transactional, predictive, geospatial, planning, and advanced analytic workloads in HANA that are not already addressed in Spark/Hadoop.
- Developers and startups using Scala and Python can now leverage data stored in HANA and the engine capabilities of HANA to augment or build their applications.
- In the future: leverage enterprise capabilities such as security models from HANA, and push queries down into HANA for execution.

For the SAP customer:
- Extend the enterprise fabric to manage data stored in Hadoop, and accelerate Hadoop.
- Enable the Scala and Python developers in the organization to use the power of HANA.
- Use Smart Data Access to bring relevant data from HDFS into memory in Spark and perform analytics across that data in HANA without moving the data.
- Use engines from Hadoop for specialized tasks such as machine learning on streaming data or on data derived from data in HANA and HDFS.
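The deck does not show what "Scala developers leveraging data stored in HANA" looks like in code, so the following is only a generic Scala sketch of the idea using Spark's standard JdbcRDD rather than the distribution announced on the next slide. The host, port, table, credentials, and file paths are placeholders invented for the example.

import java.sql.{DriverManager, ResultSet}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD

object HanaFromSpark {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HanaFromSpark"))

    // Placeholder connection details. A real setup needs SAP's HANA JDBC driver
    // (ngdbc.jar) on the classpath and its own host, user, and password.
    val url = "jdbc:sap://hana-host:30015/"
    def connect() = {
      Class.forName("com.sap.db.jdbc.Driver")
      DriverManager.getConnection(url, "SPARK_USER", "secret")
    }

    // Pull a slice of an assumed SALES table into a distributed collection.
    // The two '?' markers are filled with per-partition ID ranges, so the read
    // itself is spread across 8 parallel tasks.
    val sales = new JdbcRDD(
      sc, () => connect(),
      "SELECT REGION, REVENUE FROM SALES WHERE ID >= ? AND ID <= ?",
      1L, 1000000L, 8,
      (rs: ResultSet) => (rs.getString("REGION"), rs.getDouble("REVENUE")))

    // Combine the HANA-resident revenue with context data already sitting in HDFS
    // (an invented region_scores file), without staging copies anywhere else.
    val regionScores = sc.textFile("hdfs:///data/region_scores.csv")
      .map(_.split(","))
      .map(f => (f(0), f(1).toDouble))

    sales.reduceByKey(_ + _).join(regionScores).take(10).foreach(println)
    sc.stop()
  }
}

The Smart Data Access and push-down scenarios described above go the other direction, letting HANA reach into Spark/HDFS; those rely on the announced integration rather than plain JDBC.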
WHAT ARE WE ANNOUNCING?
A Spark 1.0 distribution, available for download now at spark.saphana.com, and a partnership with Databricks.

SAP HANA + SPARK: THE ENTERPRISE FABRIC FOR BIG DATA
Data sources: SCM, ERP, CRM, text, geospatial, sensor, social media, logs.
Distributed file persistence: HDFS or any Hadoop distribution (fault-tolerant DFS management).
In-memory persistence: Tachyon.
In-memory processing: Apache Spark (Spark SQL/Shark, Spark Streaming, MLlib, GraphX) connected to SAP HANA (in-memory, columnar data, predictive, text/NLP, geospatial, planning/rules) through SAP HANA smart data access.
Data access: SQL, Java, Scala, Python, and others on the Spark side; SQL, .NET, JavaScript, NodeJS, MDX, and others on the HANA side.
On top: real-time applications and interactive analysis.

[FRANKLIN]
Examples of use cases taking advantage of this integration:

Example 1: Track machine data from slot machines, which records how each machine is being used, and combine it with POS data in HANA about customers spending money in the casino to provide royalty tracking and to predict the performance of future games. Look at game-play metrics and their frequency distributions. What are the top-performing games from a revenue standpoint? Calculate average session length per game, total events per session, and pulls per session. Which are the most profitable games with the least number of plays or handle pulls? Assess revenue on games at the account-manager level.

Example 2: Use the predictive analytics libraries in HANA to build predictive models on large historical data in Spark, and once the models are fine-tuned, apply them to real-time operational data feeds in HANA and run the predictive algorithms on live data (a rough Spark-side sketch of the model-building step appears at the end of this transcript).

DEVELOPER TALKING POINTS
- Share the percentage of SAP HANA customers leveraging in-memory (HANA). Customers in industries like transportation and logistics, utilities, telecommunications, CPG, and mining are dealing with both traditional and new sources of data (geo, clickstream, sensors, etc.).
- Invite developers to apply the machine learning, data streaming, geo and tracking, and graph (GraphX) initiatives in real customer scenarios, and to solve today's problems for the all-data enterprise.
- Invite startups to extend their applications to the enterprise world by integrating with HANA.
- Invite foundation builders to make this first integration of SAP HANA and Spark more robust: better in-memory-to-in-memory adapters, a shared common language, and even a shared service UI.

HOW DO I GET STARTED?
- Available for immediate download: spark.saphana.com
- SAP HANA learning resources: www.saphana.com
- Other learning resources: academy.saphana.com

Delivering the Enterprise Fabric for Big Data

THANK YOU!
Aiaz Kazi, SVP, Products & Innovation, Platform Strategy & Adoption, SAP
@aiazkazi
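Addendum: the Spark-side sketch referenced in Example 2. This is a minimal, hypothetical MLlib example in Scala of the model-building half only: fit a classifier over a historical extract staged in HDFS. The file name, feature layout, and the choice of logistic regression are all assumptions for illustration; the second half, applying the tuned model to live operational feeds inside HANA, is not shown here.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object HistoricalModelSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HistoricalModelSketch"))

    // Assumed layout of a historical extract in HDFS: label,f1,f2,... per line.
    val training = sc.textFile("hdfs:///data/history/training.csv")
      .map(_.split(","))
      .map(f => LabeledPoint(f(0).toDouble, Vectors.dense(f.tail.map(_.toDouble))))
      .cache()

    // Fit a simple classifier across the full historical set in parallel.
    val model = LogisticRegressionWithSGD.train(training, 100)

    // The fine-tuned coefficients are what would then be handed over to score
    // live operational feeds on the HANA side.
    println(s"intercept=${model.intercept} weights=${model.weights}")
    sc.stop()
  }
}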
