Microsoft's Hadoop Story

Presentation at the Seattle Hadoop Meetup 1/23 about Microsoft's Hadoop Story.

Text of Microsoft's Hadoop Story

1. Hadoop and Microsoft. Michael Rys | Principal Program Manager | @SQLServerMike

2. Session Objectives: What is Big Data? How does it fit into the Windows and Windows Azure environments? How do I program against it in the Microsoft environment?

3. What is Big Data? Traditionally: physics experiments, sensor data, satellite data. Now: operational logs, customer behavior, social interactions online. From terabytes in the 1990s, through petabytes today, to zettabytes in the future.

4. Big Data.

5. Big Data: Volume (size), Variety (structure), Velocity (speed).

6. New Questions: What's the social sentiment of my product? How do I better predict future outcomes? How do I optimize my services based on patterns of weather, traffic, etc.?

7. Hadoop is for Big Data.

8. What is Hadoop (v1)? A processing platform for Big Data processing, using the Map-Reduce processing paradigm. Characteristics: highly scalable (scaled out); commodity HW-based; open source => very low acquisition and storage costs.

9. Hadoop Data Flow: Data, Hadoop, Analytics.

10. Hadoop Capabilities: Extract, Load, Transform; Distributed Compute; Predictive Analysis; Machine Learning; Graph Processing.

11. HDInsight Ecosystem: ODBC; Distributed Processing (Map Reduce); Distributed Storage (HDFS); World's Data (Azure Data Marketplace); Windows Azure Storage.

12. HDInsight: Data, Knowledge, Action.

13. HDFS on Azure: Tale of Two File Systems. [Diagram labels] HDFS API. DFS (1 Data Node per Worker Role) and Compute Cluster: NameNode, Data Node, Data Node. Azure Storage Vault (ASV): Containers on Azure Blob Storage; Front end (x3), Partition Layer, Stream Layer.

14. .NET Map/Reduce Support: Install NuGet. NuGet: Microsoft .Net MapReduce API for Hadoop. Provide an implementation of a HadoopJob. Execute the job via either MRLibMRRunner.exe -dll ConsoleAppHadoopJob.exe or HadoopJobExecutor.ExecuteJob();. Collect your result on HDFS.

15. JavaScript Map/Reduce Support: Provide a map and a reduce function variable in a JS file. Use the JavaScript console with runJS(/user/myself/MRjob.js, /path/to/inputfile, /path/to/output/dir). Collect your result on HDFS.

16. Invoking HiveQL Queries: Run queries in the Hadoop Command Shell after invoking hive. Through the web console. Programmatically through ODBC. Coming soon: LINQ to Hive!

17. Polybase: Enhancing the PDW query engine. [Diagram labels] Data Scientists, BI Users, DB Admins. Regular T-SQL; Results. Traditional schema-based DW applications. Social Apps, Sensor & RFID, Mobile Apps, Web Apps. Enhanced PDW query engine. Hadoop (unstructured data); PDW V2 (structured data).

18. Microsoft Hadoop Vision. Better on Windows and Azure: Active Directory; System Center; .NET Programmability. Microsoft Data Connectivity: SQL Server / SQL Parallel Data Warehouse; Azure Storage / Azure Data Market. Microsoft Business Intelligence (BI): Hive ODBC Connectivity; BI Tools for Big Data. Collaborate with and Contribute to OSS: Collaborate with Hortonworks; provide improvements and Windows support back to OSS.

19. Getting started. On prem: Single node cluster (onebox) install (C:\hadoop). Starts local services; can start/stop them with start-onebox.cmd / stop-onebox.cmd. Comes with: Hadoop command line (shell); Hadoop Status for name node and map-reduce cluster; HDInsight Dashboard. On Windows Azure: 3 node cluster running as a service in Azure. Can be used for 5 days. Provides samples and HDInsight Dashboard. TAP Program.

20. Related Content and Links

21. Thank you
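
Editorial sketch for slide 8 (Map-Reduce processing paradigm): the deck names the paradigm without showing it, so the following is a minimal single-process illustration in Python. It is not Hadoop code and not the .NET or JavaScript API from slides 14 and 15; the function names (map_words, shuffle, reduce_counts, run_job) and the sample input are hypothetical, chosen only to show the map, shuffle, and reduce steps on a word count.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    mapped = chain.from_iterable(map_words(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input standing in for files on HDFS.
    sample = ["big data on Hadoop", "Hadoop on Windows Azure"]
    print(run_job(sample))
```

In a real cluster the map and reduce functions run in parallel on many nodes and the shuffle moves data between them; here all three steps run in one process so the data flow is easy to follow.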