Wenming Ye
Sr. Research Program Manager
Microsoft Research Connections
Twitter: @wenmingye
http://www.windowsazure.com/en-us/develop/nodejs/how-to-guides/command-line-tools/
Gallery Images Available
Microsoft
Windows Server 2008 R2
SQL Server Eval 2012
Windows Server 2012
BizTalk Server 2013 Beta
Open Source
OpenSUSE 12.2
CentOS 6.3
Ubuntu 12.04/12.10
SUSE Linux Enterprise Server 11 SP2
VM with persistent drive
Server Rack 1 Server Rack 2
Blobs, Disks, Tables and Queues
8.5 trillion stored objects
900K request/sec on average (2.3+ trillion per month)
# Create container
from azure.storage import BlobService
blob_service = BlobService(account_name, account_key)
blob_service.create_container('taskcontainer')

# Upload
from azure.storage import BlobService
blob_service = BlobService(account_name, account_key)
# open(..., 'rb') replaces the Python 2-only file() builtin
blob_service.put_blob('taskcontainer', 'task1', open('task1-upload.txt', 'rb').read(), 'BlockBlob')

# Download
from azure.storage import BlobService
blob_service = BlobService(account_name, account_key)
blob = blob_service.get_blob('taskcontainer', 'task1')
Data centers
Account
Container > Blobs
Table > Entities
Queue > Messages
https://<account>.blob.core.windows.net/<container>
https://<account>.table.core.windows.net/<table>
https://<account>.queue.core.windows.net/<queue>
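The three URL patterns above can be sketched as a small helper. This is only an illustration: the function name storage_url, the SERVICES tuple, and the account name "myaccount" are hypothetical and not part of any SDK.

```python
# Hypothetical helper (not an SDK call): builds the endpoint URL patterns
# listed above for the blob, table, and queue services.
SERVICES = ("blob", "table", "queue")


def storage_url(account, service, resource=""):
    """Return https://<account>.<service>.core.windows.net/<resource>."""
    if service not in SERVICES:
        raise ValueError("unknown service: %r" % (service,))
    base = "https://{0}.{1}.core.windows.net/".format(account, service)
    return base + resource


# "myaccount" and the resource names are placeholder values.
print(storage_url("myaccount", "blob", "taskcontainer"))
print(storage_url("myaccount", "table", "tasktable"))
print(storage_url("myaccount", "queue", "taskqueue"))
```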
Design Goals
• “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011
[Figure: two Storage Stamps and the Storage Location Service. Each stamp: LB, Front-Ends, Partition Layer, DFS Layer (intra-stamp replication). Between stamps: inter-stamp (geo) replication. Client path: data access.]
Access blob storage via the URL: http://<account>.blob.core.windows.net/
Partition Layer: Index
• Does not move data around, only reassigns what part of the index a partition server is responsible for
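A minimal sketch of the idea in the bullet above, assuming the index is split into sorted key ranges and each range is owned by one partition server. The class PartitionMap, the split keys, and the server names PS1 to PS3 are hypothetical; the real system is described in the SOSP paper cited earlier.

```python
import bisect


class PartitionMap(object):
    """Illustrative map from index key ranges to partition servers."""

    def __init__(self, split_keys, servers):
        # split_keys are the sorted boundaries between ranges, so
        # N boundaries describe N + 1 ranges, each with one owner.
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per key range")
        self.split_keys = list(split_keys)
        self.owners = list(servers)

    def range_index(self, key):
        # Position of the range that contains `key`.
        return bisect.bisect_right(self.split_keys, key)

    def server_for(self, key):
        return self.owners[self.range_index(key)]

    def reassign(self, key, new_server):
        # Load balancing: only the ownership entry changes.
        # The stored data itself is not copied or moved.
        self.owners[self.range_index(key)] = new_server


# The data lives in a shared store (standing in for the DFS layer).
store = {"apple": b"a", "kiwi": b"k", "zebra": b"z"}
pmap = PartitionMap(["h", "p"], ["PS1", "PS2", "PS3"])

before = dict(store)
pmap.reassign("kiwi", "PS3")   # hand the middle range to PS3
print(pmap.server_for("kiwi"))
print(store == before)         # the stored data is untouched
```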
and Queues (NEW)
Geo-replication region pairs:
West Europe <-> North Europe
South Central US <-> North Central US
East Asia <-> South East Asia
West US <-> East US
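[Figure: geo failover for Windows Azure Storage. The client does a DNS lookup for http://account.blob.core.windows.net/ against Azure DNS, then accesses data. DNS table (Hostname, IP Address): account.blob.core.windows.net points to West US. Geo-replication runs from West US to East US. On failover, DNS is updated to East US.]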
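The failover flow in the figure can be sketched as a toy DNS table, assuming the hostname initially resolves to West US and that failover repoints it to the geo-replicated location (East US). The class DnsTable and the GEO_PAIRS dictionary are hypothetical names for illustration; no real DNS or Azure API is involved.

```python
# Hypothetical pairing table; only the West US / East US pair from the
# figure is modeled here.
GEO_PAIRS = {"West US": "East US", "East US": "West US"}


class DnsTable(object):
    """Toy DNS table mapping a storage hostname to a data center location."""

    def __init__(self):
        self.records = {}

    def register(self, hostname, location):
        self.records[hostname] = location

    def lookup(self, hostname):
        return self.records[hostname]

    def failover(self, hostname):
        # "Update DNS": repoint the hostname at the geo-replicated pair.
        # Clients keep using the same URL.
        primary = self.records[hostname]
        self.records[hostname] = GEO_PAIRS[primary]
        return self.records[hostname]


dns = DnsTable()
dns.register("account.blob.core.windows.net", "West US")
print(dns.lookup("account.blob.core.windows.net"))   # before failover
dns.failover("account.blob.core.windows.net")
print(dns.lookup("account.blob.core.windows.net"))   # after failover
```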
[Charts: Average of TransactionCount and Average of TPS, plotted for 6/24/2013 from 0:00 to 1:00 at 3-minute intervals.]
JSON
http://www.nuget.org/packages/WindowsAzure.Storage
XL VM Uploading 512, 256MB Blobs (Total upload size = 128GB)
• C=1, P=1 => Averaged ~ 13.2 MB/s
• C=1, P=30 => Averaged ~ 50.72 MB/s
• C=30, P=1 => Averaged ~ 96.64 MB/s
• Single TCP connection is bound by TCP rate control & RTT
• P=30 vs. C=30: Test completed almost twice as fast!
• Single Blob is bound by the limits of a single partition
• Accessing multiple blobs concurrently scales
[Chart: upload time in seconds (axis 0 to 10000) for P=1, C=1; P=30, C=1; P=1…]
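The upload figures above can be cross-checked with simple arithmetic. This sketch assumes 1 GB = 1024 MB (which matches 512 x 256 MB = 128 GB) and treats each averaged rate as constant over the whole run. The names TOTAL_MB, RATES_MB_PER_S, and elapsed_seconds are illustrative.

```python
BLOB_COUNT = 512
BLOB_SIZE_MB = 256
TOTAL_MB = BLOB_COUNT * BLOB_SIZE_MB   # 131072 MB, i.e. 128 GB

# Averaged throughput from the slide, in MB/s.
RATES_MB_PER_S = {
    "C=1, P=1": 13.2,
    "C=1, P=30": 50.72,
    "C=30, P=1": 96.64,
}


def elapsed_seconds(total_mb, rate_mb_per_s):
    """Time to move total_mb at a constant average rate."""
    return float(total_mb) / rate_mb_per_s


for label in sorted(RATES_MB_PER_S):
    seconds = elapsed_seconds(TOTAL_MB, RATES_MB_PER_S[label])
    print("%-10s %8.0f s" % (label, seconds))

# "Almost twice as fast": ratio of the C=30 rate to the P=30 rate.
speedup = RATES_MB_PER_S["C=30, P=1"] / RATES_MB_PER_S["C=1, P=30"]
print("C=30 vs. P=30 speedup: %.2fx" % speedup)
```

Under these assumptions the three runs take roughly 9,930 s, 2,584 s, and 1,356 s, which fits the 0 to 10000 second axis of the chart.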
XL VM Downloading 50, 256MB Blobs (Total download size = 12.5GB)
• C=1, P=1 => Averaged ~ 96 MB/s
• C=30, P=1 => Averaged ~ 130 MB/s
[Chart: download time in seconds (axis 0 to 140) for C=1, P=1 and C=30, P=1]
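The slides do not define C and P. Assuming C is the number of blobs transferred at the same time, the C=30 case can be sketched with a thread pool. The functions download_all and fake_fetch are hypothetical; fake_fetch stands in for a real download call (such as the get_blob call shown earlier) so the sketch runs without a storage account.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(blob_names, fetch, concurrency):
    """Fetch every blob with at most `concurrency` downloads in flight.

    Returns a dict mapping blob name to the downloaded bytes.
    """
    blob_names = list(blob_names)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map yields results in the same order as blob_names.
        results = pool.map(fetch, blob_names)
        return dict(zip(blob_names, results))


def fake_fetch(name):
    # Placeholder for a real network download; returns dummy bytes.
    return ("contents of " + name).encode("utf-8")


names = ["task%d" % i for i in range(50)]   # 50 blobs, as in the test above
blobs = download_all(names, fake_fetch, concurrency=30)
print(len(blobs))
```

Threads suit this workload because each download mostly waits on the network; every blob is an independent transfer, so concurrent transfers are not limited by a single TCP connection.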
Internet of things: Audio / Video, Log Files, Text/Image, Social Sentiment, Data Market Feeds, eGov Feeds, Weather, Wikis / Blogs, Click Stream, Sensors / RFID / Devices, Spatial & GPS Coordinates
WEB 2.0: Mobile, Advertising, Collaboration, eCommerce, Digital Marketing, Search Marketing, Web Logs, Recommendations
ERP / CRM: Sales Pipeline, Payables, Payroll, Inventory, Contacts, Deal Tracking
Volume (vertical axis): Gigabytes (10^9), Terabytes (10^12), Petabytes (10^15), Exabytes (10^18)
Velocity - Variety - Variability (horizontal axis): ERP / CRM, WEB 2.0, Internet of things
Storage/GB: 1980: $190,000; 1990: $9,000; 2000: $15; 2010: $0.07
Big Data, BIG OPPORTUNITY
Big Data is a top priority for institutions
49% of CEOs and CIOs are planning big data projects
Software Growth (billions $): 2012: 1.8; 2013: 2.5; 2014: 3.4; 2015: 4.6. 34% compound annual growth rate (ref. 2)
Services Growth (billions $): 2012: 2.7; 2013: 3.9; 2014: 5.1; 2015: 6.5. 39% compound annual growth rate (ref. 2)
1. McKinsey & Company, McKinsey Global Survey Results, Minding Your Digital Business, 2012
2. IDC Market Analysis, Worldwide Big Data Technology and Services 2012–2015 Forecast, 2012
How do I optimize my services based on patterns of weather and traffic?
How do I build a recommendation engine?
What’s the social sentiment of my product?
How do I better predict future outcomes?
Distributed Storage (HDFS)
Query (Hive)
Distributed Processing (MapReduce)
ODBC
Legend: Red = Core Hadoop; Blue = Data processing; Purple = Microsoft integration points and value adds; Orange = Data Movement; Green = Packages
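For readers new to the MapReduce model named in the stack above, here is a minimal in-memory word count. It is only a sketch of the map, shuffle, and reduce steps in plain Python. It does not use Hadoop or HDInsight, and the function names are illustrative.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big opportunity", "data in the cloud"])
print(counts["big"], counts["data"], counts["cloud"])
```

On a real cluster the map and reduce functions run on many nodes and the shuffle moves data between them; here everything runs in one process to keep the three steps visible.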
[Figure: HDFS API in front of two storage options. DFS (1 Data Node per Worker Role) and Compute Cluster: Name Node, Data Nodes, … Azure Storage (ASV): Azure Blob Storage with Front ends, Partition Layer, Stream Layer.]
Hive, Pig, Mahout, Cascading, Scalding, Scoobi, Pegasus…
C#, F# Map/Reduce, LINQ to Hive, .NET management clients
JavaScript Map/Reduce, Browser hosted console, Node.js management clients
PowerShell, Cross Platform CLI tools
Deploying and Interacting With HDInsight Service
demo
                    | Batch processing | Interactive analysis | Stream processing
Query runtime       | Minutes to hours | Milliseconds to minutes | Never-ending
Data volume         | TBs to PBs | GBs to PBs | Continuous stream
Programming model   | MapReduce | Queries | DAG
Users               | Developers | Analysts and developers | Developers
Originating project | Google MapReduce | Google Dremel | Twitter Storm
Open source project | Hadoop / Spark | Drill / Shark / Impala / HBase / Cassandra | Storm / Apache S4 / Kafka
http://www.windowsazure.com/en-us/develop/net/
http://blogs.msdn.com/b/windowsazurestorage/
http://blogs.msdn.com/b/windowsazurestorage/archive/2011/11/20/windows-azure-storage-a-highly-available-cloud-storage-service-with-strong-consistency.aspx
Windows Azure Python SDK
Windows Azure
How to use Service Management from Python
http://www.windowsazure.com/en-us/manage/linux/other-resources/command-line-tools/
http://research.microsoft.com/en-us/projects/azure/