Big Data BriefingHow to implement Big Data on Windows Azure
Big Data; running cheaply and efficiently on Windows Azure to discover business value from data you already hold about your business.
Introduction
AgendaWindows Azure IntroHDInsight IntroArchitectureBI and VisualisationsCase Studies
WhoAndy Cross, Director of Microsoft Partner Elastacloud, Windows Azure MVP, Big Data specialist and international speaker.
Time until next coffee 55:00
30 minutes
30 minutes
30 minutes
15 minutes
15 minutes
Find out more at http://www.elastacloud.com
WINDOWS AZUREIntroduction to
Today’s Transformation – Cloud
Usage BasedElasticSelf-ServicePooled Resources
Economics ▪ Agility ▪ Focus
PaaS SaaS
IT Continuum
Evolution toward highly-virtual and beyond to cloud
Physical Virtual / Private IaaS
Microsoft On The Internet
One of 3rd world’s largest private networks
1.0
1.0
400m active accounts
More than 3 billion queries per month
1 billion+ authentications per day
750 million unique visitors at any given time
Microsoft account
Global Foundation Services
The Microsoft Cloud~Globally Distributed Data Centers
Quincy, WA Chicago, IL San Antonio, TX Dublin, Ireland Generation 4 DCs
Inside a Microsoft Data Center
Cloud Computing Patterns
tCom
pute
Inactivity
Period
On and Off – Start/End SemesterOn & off workloads (e.g. batch job)Over provisioned capacity is wasted Time to market can be cumbersome
t
Unpredictable Bursting – Web demandUnexpected/unplanned peak in demand Sudden spike impacts performance Can’t over provision for extreme cases C
om
pute
t
Predictable Burst – Registration
Services with micro seasonality trends Peaks due to periodic increased demandIT complexity and wasted capacity
Com
pute
t
Growing Fast – Research ProjectSuccessful services needs to grow/scale Keeping up w/ growth is big IT challenge Cannot provision hardware fast enoughC
om
pute
Applicationbuilding blocks
StorageBig data
Caching
CDN
Database
Identity
Media
Messaging
Networking
Traffic
Big Data Technologies
HDInsight Summary
GA, recent preview of Hadoop 2Managed Hadoop on Windows AzureFamiliar tools such as Hive, Pig, OozieAdditional BoB Microsoft ecosystem tooling with .net SDK
Powershell and .net for provisionExecution with .net and powershell for Hive
Paired with Hortonworks HDP for on-premises Hadoop; compatible with all major Hadoop implementationsCombined with Excel and traditional Microsoft BI stack for compelling solutions
PaaS
Position in Cloud
Centralised Resources
State – a Windows Azure SQL DB stores metadata to allow identical clusters to be spawned
Data – all your data resides on Windows Azure Blob Storage; resilient, secure and cost effective
Logs – machine state and cluster uptime is stored in Windows Azure Table Storage
Provision hundreds of machines against your data; gain understanding rapidly
Speed to understanding
500 servers.
20 minutes.
40,000,000,000 data points
Buy t
he L
ook
Drive Investment
Exploring Sentiment, Social and Metadata Graphs
Person Product
Person
Product
Owns
Owns
Alike
Alike?
Social Media
Microsoft Big Data Solution
Power View Excel with PowerPivot Embedded BIPredictive Analytics
APPsLOBCRMERP
Microsoft EDW
SSAS SSRS
Devices CrawlersSensors Bots
Hadoop On Windows Server
Hadoop On Windows Azure
Microsoft Offerings
Hive ODBC Driver & Hive Add-in for Excel
Integration with Microsoft PowerPivot
Hadoop based distribution for Windows Server & Azure
Strategic Partnership with Hortonworks
JavaScript framework for Hadoop
RTM of Hadoop connectors for SQL Server and PDW
Hadoop on WindowsInsights to all users by activating new types of data
Integrate with Microsoft Business Intelligence
Choice of deployment on Windows Server + Windows Azure
Integrate with Windows Components (AD, Systems Center)Easy installation and configuration of Hadoop on Windows
Simplified programming with . Net & Javascript integration
Integrate with SQL Server Data Warehousing
Diff
ere
nti
ati
on
Microsoft Big Data Roadmap
To accelerate the delivery of Microsoft’s Hadoop based solution for Windows Server and service for Windows Azure, Microsoft is announcing a partnership with HortonworksMicrosoft is committed to broadening accessibility and usage of Hadoop to end users, developers and IT professionals in organizations of all sizes
Microsoft is announcing an end-to-end roadmap for Big Data that embraces Apache HadoopTM by distributing enterprise class Hadoop based solutions on both Windows Server and Windows Azure
Microsoft is extending its leadership in business intelligence and data warehousing to provide insights to all users by activating new types of data of any size
Microsoft’s value addsWhy Elastacloud recommend Microsoft’s platform
Skill reuse
Express elegant solutions in C#Familiar Unit Testing patternsConcise programmatic terseness
Your existing development team can immediately
realise value
The frameworks
facilitate deterministic
testing for highly reliable
queries
Complex logic is best expressed in programmatic
form
Commoditised query
Provision
Execute
De-provision
Valu
e
Action Cost
Value of query
Time
Cost
HDInsight in Azure
Hybrid Compatibility
Hadoop On Premises
Name=Andy Pnid=123456
123456 4712
Microsoft are the only vendor with enterprise on premises and cloud big data offerings
Just as Big Data is more than Hadoop; Windows Azure is more than HDInsight
HDPSparkStorm
Elastacloud provisioned a bespoke 500 core virtual cluster on Windows Azure, allowing us to cut down a compute intensive task from 20 days over 8 cores to under 12 hours over 500 cores.
Jedidiah Francis, Data Scientist, ASOS
Architectural ConsiderationsSolving Big Data challenges cohesively
The first challenge you face with Big Data is somewhat the hardest
Elastacloud’s tooling aims to solve these challenges
Ingress, Compute, Egress automated; allowing focus on providing value
Workflows Big Data solutions are rarely the result of a single piece of code running; instead jobs transform data into intermediate domains before completing an overall piece of work
The overall architectural challenge of Big Data is that just as Data can vary, so must architectures.
Batch
Interactive Real TimeHadoop can operate with O(n) over Petabytes of data
Drill, Stinger and Tez bring must quicker querying
Storm allows enormous scalable throughput
Batc
hServ
ing
Sp
eed
Static Data
Data Stream
Batch Process
Real Time
Operational Data Store
Precompute
Increment
User Query
Technologies possible? Any.
Batch Interactive Real Time
Hadoop SQL / MySQL Storm
Spark Hadoop 2 / Tez / Drill Spark
Lucene Mongo
Cassandra
Windows Azure Table Storage
Spark
Batc
hServ
ing
Sp
eed
Static Data
Data Stream
HDInsight
Storm
SQL/NoSQL Database
Precompute
Increment
User Query
Windows Azure Service Bus
Blob Persist
Elastacloud Service Bus
Spout
Moving Mediatonic to Windows Azure
AWS didn’t work
• Mediatonic are a leading UK games company• They had deployed an analytics platform to AWS• They required quite a bit of manpower to maintain their system
operationally • Their biggest cost was an Amazon Redshift data warehouse which was
a bottleneck in their design• Their system was built so that if a server went down whilst collecting
gaming messages they could lose 15 minutes of data• They found the “Big Data” part so complex that only highly-paid
specialists could improve the system
How Windows Azure Helped
• A joint collaboration between Microsoft, Elastacloud and Mediatonic migrated their Game Fuel platform to Windows Azure
• With the Windows Azure Service Bus we were able to ensure that their messaging was continuous and not discrete so no messages were ever lost
• With HDInsight we were able to teach their staff how to build “Big Data” applications
• We were able to deliver a system with unique components of Windows Azure, removing the expensive data warehousing bottleneck but delivering the same functionality
“I can’t believe that I could get up and running with HDInsight doing real work in 30 minutes. It took me days to do the same thing with Elastic Map Reduce on Amazon”
Adam Fletcher – Gamefuel Project Lead
Buy t
he L
ook
Initial State
Traditional Analytics
Limited
Traditional Analytics allows for the tracing of single actions on a website; who added a product to a bag, where did they come from, where did they go?
Buy t
he L
ook
Advancement Cross Sell
Improvements possible
Beyond immediacy is the ongoing workflow of a system. Who added items to a basket after using the Buy the Look functionality. However, this lacks comprehension of customer behaviour.
Cutting edgeWho bought this?
Cutting edge – understand your customerPu
rch
ase
pro
pen
sity
Time
January February
Proving through Big Data technologies that engaged customers will return through engagement following life events such as pay-days to continue the engagement. Building business on this basis.