Developing Big Data Analytics Applications with JavaScript and .NET for Windows Azure and Windows
Matt Winkler (@mwinkle)
Principal Program Manager
3-038
What is big?
Image courtesy of CERN
The Large Hadron Collider produces 1 PB/sec
But, I don’t have a Large Hadron Collider
But you do have…
• Sensors
• Clicks
• Logs
• Transactional records
• Call centers
• Medical transcriptions
• Images
• Documents
• Signals from social media
• Simulations
Systems like Hadoop evolved to extract value from this data, shaped at the intersection of physics and economics
Redundant, distributed, scalable storage
Easily distribute the computation
Getting Started with HDInsight on Azure and Windows
Introduction to Map/Reduce
Map: f(k1, v1) → list(k2, v2)
Reduce: f(k2, list(v2)) → (k2, v3)
Functionally
In Practice, WordCount
The quick brown fox jumps over the lazy dog
Map: (the,1), (quick,1), (brown,1), (fox,1), (jumps,1), (over,1), (the,1), (lazy,1), (dog,1)
Shuffle: (the,(1,1)), (quick,1), (brown,1), (fox,1), (jumps,1), (over,1), (lazy,1), (dog,1)
Reduce: (the,2), (quick,1), (brown,1), (fox,1), (jumps,1), (over,1), (lazy,1), (dog,1)
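The three phases above can be traced in a few lines of plain JavaScript. This is only an in-memory sketch for one sentence, not how Hadoop distributes the work; the function names `mapLine`, `shuffle`, and `reduceCounts` are made up for illustration.

```javascript
// Map phase: emit a (word, 1) pair for every word in one line of input.
function mapLine(line) {
  return line
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 0)
    .map((word) => [word, 1]);
}

// Shuffle phase: group the emitted values by key.
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce phase: collapse each key's list of values into a single count.
function reduceCounts(groups) {
  const counts = new Map();
  for (const [key, values] of groups) {
    counts.set(key, values.reduce((sum, v) => sum + v, 0));
  }
  return counts;
}

const mapped = mapLine("The quick brown fox jumps over the lazy dog");
const grouped = shuffle(mapped);
const counts = reduceCounts(grouped);
console.log(Object.fromEntries(counts));
```

Because each phase only sees its own input, the map calls can run on different nodes and the shuffle is the only step that has to move data between them.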
In Code
Then, scale to TB/PB of data over 10s, 100s, or 1000s of nodes
Map/Reduce in JavaScript
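The slide's code is not in this transcript, so the following is a hedged sketch of what a WordCount job written in JavaScript could look like. The `map(key, value, context)` and `reduce(key, values, context)` signatures, `context.write`, and the `hasNext()`/`next()` iterator are assumptions about the job interface, and `runLocally` is a hypothetical stand-in for the cluster so the sketch can run on its own.

```javascript
// Mapper: called once per input line; emits (word, 1) through context.write.
// The (key, value, context) signature is an assumption, not a confirmed API.
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]+/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

// Reducer: called once per key with an iterator over that key's values.
var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical local driver standing in for the cluster: it runs the mapper
// over each line, groups values by key, then feeds each group to the reducer.
function runLocally(lines) {
  var grouped = Object.create(null);
  lines.forEach(function (line, offset) {
    map(offset, line, {
      write: function (k, v) {
        (grouped[k] = grouped[k] || []).push(v);
      },
    });
  });

  var output = Object.create(null);
  Object.keys(grouped).forEach(function (k) {
    var pending = grouped[k].slice();
    var iterator = {
      hasNext: function () { return pending.length > 0; },
      next: function () { return pending.shift(); },
    };
    reduce(k, iterator, {
      write: function (outKey, v) { output[outKey] = v; },
    });
  });
  return output;
}

var result = runLocally(["The quick brown fox", "jumps over the lazy dog"]);
console.log(result);
```

Only `map` and `reduce` would be submitted as the job; the driver exists purely to exercise them without a cluster.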
Map/Reduce in .NET
What’s After WordCount?
• Reverse indexing
• Distributed data cleansing
• Data transformation
• Machine learning algorithms
• Traditional analytics
• Predictive analytics
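As one illustration of the first item, reverse indexing fits the same map/reduce shape as WordCount: the mapper emits (word, document id) instead of (word, 1), and the reducer collects the ids. The sketch below runs in memory; the function names and the sample documents are invented for illustration.

```javascript
// Map: emit a (word, docId) pair for every word in a document.
function mapDocument(docId, text) {
  return text
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 0)
    .map((w) => [w, docId]);
}

// Reduce: turn the doc ids seen for one word into a sorted,
// de-duplicated postings list.
function reducePostings(docIds) {
  return Array.from(new Set(docIds)).sort();
}

// Local stand-in for the shuffle: group doc ids by word, then reduce.
function buildReverseIndex(docs) {
  const groups = new Map();
  for (const [docId, text] of Object.entries(docs)) {
    for (const [word, id] of mapDocument(docId, text)) {
      if (!groups.has(word)) groups.set(word, []);
      groups.get(word).push(id);
    }
  }
  const index = new Map();
  for (const [word, ids] of groups) {
    index.set(word, reducePostings(ids));
  }
  return index;
}

// Made-up sample documents, for illustration only.
const index = buildReverseIndex({
  doc1: "the quick brown fox",
  doc2: "the lazy dog",
  doc3: "quick quick dog",
});
console.log(index.get("quick"));
```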
Recommended Reading: Data-Intensive Text Processing with MapReduce
Hive, Like SQL, Just Bigger
SELECT airlinelocal.Origin, airlinelocal.Dest, airlinelocal.Carrier,
       AVG(averagearrivaldelay - airlinelocal.ArrDelayMinutes) AS AvgDiffFromAverage
FROM airlinelocal
JOIN reallybadroutes
  ON (airlinelocal.Origin = reallybadroutes.Origin
      AND airlinelocal.Dest = reallybadroutes.Dest)
GROUP BY airlinelocal.Origin, airlinelocal.Dest, airlinelocal.Carrier
ORDER BY AvgDiffFromAverage DESC
You write → Hive compiles → Hadoop executes
• Hive → M/R: Filter
• Hive → M/R: Join
• Hive → M/R: Aggregate
• Hive → M/R: Order
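To make the four stages concrete, here is a rough in-memory walk-through of the query above as filter, join, aggregate, and order steps. It is a sketch of the idea only, not the plan Hive actually generates: the rows, the carrier codes, and the filter predicate are all invented, and only the table and column names come from the query.

```javascript
// Hypothetical sample rows; table and column names follow the query above.
const airlinelocal = [
  { Origin: "SEA", Dest: "SFO", Carrier: "XX", ArrDelayMinutes: 10 },
  { Origin: "SEA", Dest: "SFO", Carrier: "XX", ArrDelayMinutes: 30 },
  { Origin: "SEA", Dest: "SFO", Carrier: "YY", ArrDelayMinutes: 50 },
  { Origin: "JFK", Dest: "LAX", Carrier: "XX", ArrDelayMinutes: 5 },
  { Origin: "SEA", Dest: "SFO", Carrier: "YY", ArrDelayMinutes: null },
];
const reallybadroutes = [
  { Origin: "SEA", Dest: "SFO", averagearrivaldelay: 40 },
];

// Stage 1, filter: drop rows with no recorded delay (an assumed predicate).
const filtered = airlinelocal.filter((r) => r.ArrDelayMinutes !== null);

// Stage 2, join: match flights to bad routes on (Origin, Dest).
const routeKey = (r) => `${r.Origin}|${r.Dest}`;
const routes = new Map(reallybadroutes.map((r) => [routeKey(r), r]));
const joined = filtered
  .filter((r) => routes.has(routeKey(r)))
  .map((r) => ({
    ...r,
    diff: routes.get(routeKey(r)).averagearrivaldelay - r.ArrDelayMinutes,
  }));

// Stage 3, aggregate: average the difference per (Origin, Dest, Carrier).
const groups = new Map();
for (const r of joined) {
  const key = `${r.Origin}|${r.Dest}|${r.Carrier}`;
  const g = groups.get(key) ||
    { Origin: r.Origin, Dest: r.Dest, Carrier: r.Carrier, sum: 0, n: 0 };
  g.sum += r.diff;
  g.n += 1;
  groups.set(key, g);
}
const aggregated = [...groups.values()].map((g) => ({
  Origin: g.Origin,
  Dest: g.Dest,
  Carrier: g.Carrier,
  AvgDiffFromAverage: g.sum / g.n,
}));

// Stage 4, order: sort by the computed average, descending.
const ordered = aggregated.sort(
  (a, b) => b.AvgDiffFromAverage - a.AvgDiffFromAverage
);
console.log(ordered);
```

Each stage consumes only the previous stage's output, which is what lets each one be expressed as its own map/reduce job.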
Hive
LINQ to Hive
Easy to get started
Write Hadoop jobs in the language of your choice
Use your tools to process big data
• Microsoft Big Data
• Azure HDInsight
• .NET SDK For Hadoop
Resources
Please submit session evals on the Build Windows 8 App or at http://aka.ms/BuildSessions
• Follow us on Twitter @WindowsAzure
• Get Started: www.windowsazure.com/build
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.