Upload
ezra-doyle
View
214
Download
0
Tags:
Embed Size (px)
Citation preview
Using Windows Azure Storage with SharePoint for Document Management Joe GiardinoSenior Development Lead
SPC224
Introduction
• Introduction• Windows Azure Storage Overview• Concepts• Storage Abstractions• Scale• Sharing your data• Analytics• Azcopy• Putting it all together• Questions
Agenda
• Joe Giardino• Senior Development Lead – Azure Storage• [email protected]• @dajoeg (twitter)
Who I am
• (As of 11/12)• Majority
(~80%) are Developers and IT Pros looking to expand into the cloud via Azure.
Who you are
IT Professional
Developer
Business Decision Maker
Azure Storage Overview
• Windows in the Cloud
• Services for multiple application models
• Rock solid, always on
What is Windows Azure?
PaaS SaaSIaaS
Windows Azure Compliance CapabilitiesWindows Azure feature ISO 27001 SSAE 16
SOC 1 Type 2
EU Model Clauses
HIPAA BAA
Web Sites
Virtual Machines
Cloud Services
Storage (Tables, Blobs, Queues)
SQL Database
Caching
Content Delivery Network (CDN)
Networking (Connect, TM, VNet)
Windows Azure Active Directory
Service Bus
Media Services
• Hybrid storage scenarios• Offload large media files to the cloud• Document Archiving• Some 3rd party providers already offering some
of this today• Upload and process large amount of documents
in bulk• Reduce the size of your SharePoint DB / lower
cost
Windows Azure Storage for the SharePoint Developer
• Blobs: Store files and metadata associated with it• Tables: A strongly consistent NoSQL structured store
that auto scales• Queues: Durable asynchronous messaging system to
decouple components• Drives and Disks: Network mounted durable drives
available for applications in the cloud
• Easy to use and open REST APIs o Client libraries in Java, Node.js, PHP, .NET etc.
Windows Azure Storage Abstractions
• A “pay for what you use” cloud storage system
• Durable: Store multiple replicas of your data• Local replication: • Synchronous replication before returning success
• Geo replication: • Replicated to data center at least 400+ miles apart • Asynchronous replication after returning success to user.
• Available: Multiple replicas are placed to provide fault tolerance
• Scalable: Automatically partitions data across servers to meet traffic demands
• Strong consistency: Default behavior is consistent reads once data is committed
Windows Azure Storage Characteristics
• All abstractions backed by same store• Same feature set across all abstractions (geo, durability,
strong consistency, auto scale, monitoring, partitioning logic etc.)
• Reduce costs by blending different characteristics of each abstraction
• 880K requests/s at peak & 4+ Trillion objects• Great performance for low transaction costs!• Easy to use and open REST APIs • Client libraries in Java, Node.js, PHP, .NET etc.
Windows Azure Storage Characteristics
• $0.125 to $0.04 /GB per month depending on usage and redundancy • Locally redundant is ~27% cheaper• Pay for what you use• All abstractions share the same billing
• $0.01 per 100,000 transactions• Free Account available to MSDN subscribers• 90 day Trial Accounts available• Available as part of Enterprise Agreements• Develop for free with the Storage Emulator • Cost Calculator :
https://www.windowsazure.com/en-us/pricing/calculator/
Pricing
North America Region Europe Region Asia Pacific Region
S. Central – U.S. Sub-region
W. Europe Sub-region
N. Central – U.S. Sub-region N.
Europe Sub-region
S.E. AsiaSub-region
E. AsiaSub-region
Major datacenter
CDN PoPs
Windows Azure Storage
East – U.S. Sub-region
West – U.S. Sub-region
• Xbox: Uses Windows Azure Blobs, Tables & Queues for applications like Cloud Game Saves, Halo multiplayer, Music, Kinect data collection etc.
• SkyDrive: Uses Windows Azure Blobs to store pictures, documents etc.
• Bing: Uses Windows Azure Blobs, Tables and Queues to implement an ingestion engine that consumes Twitter and Facebook public status feeds and provides it to Bing search
• And many more…
Windows Azure Storage – How is it used?
Concepts
• User creates a globally unique storage account name• Choose the primary location to host storage account• “North Central US”, “South Central US”• “North Europe”, “Europe West”• “South East Asia”, “East Asia”
• Storage Account Keys• Receive two 512 bit secret key when creating account• Rotate keys
• Storage Account Capacity• Each storage account can store up to 200 TB
Windows Azure Storage Account
• Via the Portal @ https://manage.windowsazure.com/
Getting Started – Creating a Storage Account
• 3 Clicks in an existing Subscription
• Will be provisioned and online in minutes
Blobs
Windows Azure Blobs
BlobContainer
Account
Contoso
Images
PIC01.JPG
Video
VID1.AVI
http://<account>.blob.core.windows.net/<container>/<blobname>
Pages/Blocks
Block/Page
Block/Page
PIC02.JPG
BLOB Storage is the simplest way to store large amounts of unstructured text or binary data such as video, audio and images with the fastest read performance. Highly scalable, durable, available file system.
Blobs can be exposed publically over http.
Can securely lock down permissions to blobs.
Blob Containers• Number of Blob Containers• Can have has many Blob Containers that will fit within the
storage account limit
• Blob Container• A container holds a set of blobs• Set access policies at the container level • Private or Public accessible or Store Shared Access Identifiers• Associate Metadata with Container• Metadata are <name, value> pairs• Up to 8KB per container• List the blobs in a container
Emulate Hierarchical Directory Structure• Containers provide a single level of isolation and access control• Hierarchical directory structures can be
emulated via fixed delimiters in blob names• i.e. Media/Music/Album/Song.mp3
• Listing APIs provide for easy way to target listings to sub “directories”• Exposed in the Storage client libraries as
CloudBlobDirectories
Demo
Hello Blob
• Create, Delete, SetAcl Containers• Put/Get Blobs/Set Metadata• Parallel uploads & Single range gets
• Copy Blob• Asynchronous copy – copy between accounts
• Snapshots• Read only version of a blob• Promote a version as base blob
• Lease on a container or blob • 15-60s lease on blob• Useful for master election scenarios• Infinite leases - Locks
Windows Azure Blobs - Operations
• Block Blob • Targeted at streaming workloads and general purpose files• Each blob consists of a sequence of blocks• Each block is identified by a Block ID• Size limit 200,000 MB per blob
• Page Blob• Targeted at random write workloads• Each blob consists of an array of pages• Each page is identified by its offset from the start of the blob• Size limit 1TB per blob• Must be 512 byte aligned
Types of Blobs
• Uploading a large blob
Uploading a Block Blob
10 GB Movie
Windows Azure Storage
Blo
ck I
d 1
Blo
ck I
d 2
Blo
ck I
d 3
Blo
ck I
d N
blobName = “TheBlob.wmv”;PutBlock(blobName, blockId1, block1Bits);PutBlock(blobName, blockId2, block2Bits);…………PutBlock(blobName, blockIdN, blockNBits);PutBlockList(blobName,
blockId1,…,blockIdN);
TheBlob.wmvTheBlob.wmv
BenefitEfficient continuation and retryParallel and out of order upload of blocks
• Block can be up to 4MB each• Each block can be variable size• Each block has a 64 byte ID• Scoped by blob name and stored with the blob
• Block operation• PutBlock• Puts an uncommitted block defined by the block ID for the blob
• Block List Operations• PutBlockList• Provide the list of blocks to comprise the readable version of the blob• Can use blocks from uncommitted or committed list to update blob
• GetBlockList• Returns the list of blocks, committed or uncommitted for a blob• Block ID and Size of Block is returned for each block
Block Blob Details
• Create MyBlob• Specify Blob Size = 10 GBytes
• Fixed Page Size = 512 bytes
• Random Access Operations• PutPage[512, 2048)• PutPage[0, 1024)• ClearPage[512, 1536)• PutPage[2048,2560)
• GetPageRange[0, 4096) returns valid data ranges:• [0,512) , [1536,2560)
• GetBlob[1000, 2048) returns• All 0 for first 536 bytes• Next 512 bytes are data stored in [1536,2048)
Page Blob – Random Write0
10 GB
10 G
B A
dd
ress S
pace
512
1024
1536
2048
2560
• Page Blob is created with a Max Blob Size• Can change the max size of the blob at anytime.
• Address space is broken up into fixed sized 512 byte pages for updates
• Page update operations – must be page aligned• PutPage - limited to 4MB• Overwrite range of pages starting at the specified offset
• ClearPage - can specify up to max size of blob• Clear range of pages at the offset
• Reading a Page Blob• GetBlob – can read from any byte offset for any valid range
• What parts of the Page Blob have stored pages in them• GetPageRanges• Get valid page ranges in the blob
• Only charged for pages with data stored in them
Page Blob Details
• Move legacy applications easily to cloud
• IaaS• Disks: OS and multiple data disks associated with VM. Network
mounted durable drives
• PaaS• Drives: Dynamically mount network attached single volume VHDs
• Drives & Disks• Fixed formatted VHD• Stored using page blobs • All flushed and un-buffered writes are made durable• Transactions on blobs like PutPage, GetBlob (ranges) etc. are counted towards
storage account
Windows Azure Disks & Drives
• Block Blob• Targeted at streaming workloads• Update semantics• Upload smaller blobs in a single PUT (up to 64 MB)• Upload a bunch of blocks. Then commit change.
• Page Blob• Targeted at random write workloads• Update Semantics• Immediate update
Choosing Between Block and Page Blob
Tables
• http://<account>.table.core.windows.net/
Windows Azure Tables
EntityTableAccountName =…Email = …
Name =…Email = …
contoso
customers
photos
Photo ID =…Date =…
Photo ID =…Date =…
Table Storage a NoSQL key-value store technology used by applications requiring storing large amounts of data storage that need additional structure at no performance cost.
• Tables is a NoSQL DataStore• Designed to be scale to TBs and beyond• Large demand for economically accessing
structured data at Internet scale• Distributed store • Auto scale• Flexible schema
• Relational databases • Scale up your servers: But gets expensive very soon• Manually Shard: But now you lose certain relational capabilities across
shards• Maintenance requires DBA expertise
Windows Azure Tables
• Flexible schema• Tables – No schema associated• Store different entity types in same table
• Single Index• (AccountName, TableName, PartitionKey, RowKey) => Clustered Index• PartitionKey – Mandatory property which defines partitioning• RowKey – Mandatory property which defines uniqueness within a given partition
Windows Azure Table – Basics
• Optimistic concurrency• Timestamp – A read only property maintained by server
• Query• Single entity lookup • Scans: Range based query• Performance depends on selectivity
• Atomic Transactions• Entities with same partitioning key can be part of single
batch request• Exposed via RESTful APIs • OData protocol• Java, Node.js, PHP, .NET client libraries
Windows Azure Table – Basics
• All Entities define 3 reserved properties• PartitionKey• RowKey• Timestamp
• Entities can contain 255 properties• Properties can be up to 64 KB each• A single entity can be up to 1 MB• A single atomic batch operation can be up
to 4 MB
Window Azure Tables - Basics
Demo
Hello Table
Queues
• http://<account>.queue.core.windows.net/
Windows Azure Queues
MessagesQueueAccount
CustomerId=41OrderId=O21
CustomerId=12OrderId=O1
contoso
orders
imageprocessing
BlobUrl=http://contoso.blob…
BlobUrl=http://contoso.blob…
Queues offer a guaranteed message delivery mechanism that can be used to distribute work across roles. Utilizing a two-phase delete process ensures that even if a role which retrieves a message goes down the message will become visible again and subsequently processed by another worker.
• Create, Delete, SetAcl Queues• Enqueue Messages – 64KB messages• Lease mechanism for processing• Get Message(s) – provide lease time• Delete Message
• Metadata• Message count – Enables auto scaling• Dequeue count – Detect poison messages
• Sharing Scenarios• Private access or Shared Access Signatures (Signed Url)
Windows Azure Queues - Operations
Message Processing
Azure Queue
Work itemsWeb Role
Web Role
Worker Role
Worker Role
7:00AM7:04 AM Get Message with 5 minutes visibility timeoutExtend visibility timeout with another 5 minutes
7:07 AM7:09 AM
Periodically store progress information in message content
7:057:097:14
Current Time
Expires @ 7:09AMExpires @ 7:05AM
Retrieve progress from queue message and resume
Demo
Hello Queue
Scale
• Flat network storage design• “Quantum 10” network• Non-blocking 10Gbps based fully meshed network• Move to software based Load Balancer • Provides an aggregate backplane in excess of 50 Tbps
bandwidth per Datacenter• Enables high bandwidth scenarios such as Windows Azure
IaaS disks, HPC, Map Reduce etc.
• http://blogs.msdn.com/b/windowsazurestorage/archive/2012/11/04/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx
Windows Azure Flat Network
• Storage Account level targets by end of 2012 • Applies to accounts created after June 7th
2012• Capacity – Up to 200 TBs • Transactions – Up to 20,000 entities/messages/blobs per
second • Bandwidth for a Geo Redundant storage account• Ingress - up to 5 Gibps• Egress - up to 10 Gibps
• Bandwidth for a Locally Redundant storage account• Ingress - up to 10 Gibps • Egress - up to 15 Gibps
Scalability Targets -Storage Account
• Partition level Targets by end of 2012 • Applies to accounts created after June 7th 2012
• Single Queue – Account Name + Queue Name• Up to 2,000 messages per second
• Single Table Partition – Account Name + Table Name + PartitionKey value
• Up to 2,000 entities per second
• Single Blob – Account Name + Container Name + Blob Name
• Up to 60 Mibps
Scalability Targets – Partition
• 4 trillion objects stored• Average of 270,000 requests/sec• Peaks of 880,000 requests/sec• 4x increase in number of objects stored
over last 12 months
4 Trillion Objects and Counting (As of June)
Sharing
• Full public read access: Container and blob data can be read via anonymous request. Clients can enumerate blobs within the container via anonymous request, but cannot enumerate containers within the storage account.
• Public read access for blobs only: Blob data within this container can be read via anonymous request, but container data is not available. Clients cannot enumerate blobs within the container via anonymous request.
• No public read access: Container and blob data can be read by the account owner only.
Sharing your Blobs
• SAS allows you to provide granular access to third parties without exposing your private keys
• Can be tied to a Container/Table/Queue ACL policy to provide revocability
• A pre-authenticated token for a given resource expressed in the query of a given URI
• Available for Blobs, Tables, and Queues
Shared Access Signatures (SAS)
?sv=2012-02-12&st=2012-11-13T18%3A31%3A38Z&se=2012-11-14T18%3A36%3A38Z&sr=b&sp=r&sig=gAv5uhEsaae5i4Px%2F1qhkePmu7OFmF2EElIH8t3hqKU%3D
Anatomy of a SAS Token (no policy)
Version Start End Resource
Permissions
Signature
?sv=2012-02-12&sr=b&si=readPolicy&sig=OoRHdwlm5aiPukNhUtwQF3tb5dQmnEef43Sq6ZPQj3o%3D
Anatomy of a SAS Token (based on policy)
Version Resource
Signed Identifier (Policy name) Signature
• Use HTTPS • Securely use/transport SAS tokens
• Use minimum permissions needed and restrict time period for access
• Clock Skew - Clients should renew SAS token with sufficient leeway
• Revocable SAS - Use policy to store SAS permissions, expiry etc.• Only 5 policies can be associated with container• When removing and recreating policies, change policy IDs
Shared Access Signatures – Best Practice
SAS Read/Wri
teLogs
• Service scales as it is only responsible for handing SAS • Devices & Roles directly access Windows Azure Storage service
using SAS – no routing overhead • Use Analytics to monitor Storage Access & Usage • Client applications can make use of the storage client libs to offer best
practice / performance
SAS Provider Model
Devices &
Roles
Windows Azure Storage Service
Analytics Logs & Metrics
Public Service
Issue SAS Token
Demo
Sharing your Blobs
Analytics
• Metrics provides usage data per abstraction• Logging provides per request logging, including
client trace ids• Logs / Metrics are stored in your storage account
• Can specify a retention policy to auto delete after some period of time
• Enable via code or the Azure Portal• Provides the ability to monitor and diagnose
applications• Especially useful when utilizing SAS / SAS
provider model
Storage Analytics
• Log records for requests are stored in Windows Azure Blobs• The Log blobs are text files with one log entry per line• Each blob can contain one to many request records
• A request typically appears in the log within 15 minutes after it completes execution
• Configure the logging levels separately for• Blob, Table and Queues• read (GET), write (PUT/POST/MERGE), delete (DELETE) requests or any
combination
• Best effort logging
Storage Analytics Logs
• The following are some of the fields logged for each record:
Storage Analytics Data Fields Logged
Log VersionAccessing AccountOwner AccountService TypeRequest URLObject KeyRequest ID
Operation NumberRequest VersionOperation TypeStart TimeApplication End to End LatencyStorage Server LatencyAuthentication Type
Request StatusHTTP Status CodeClient IPUser AgentReferrerClient Request IDETagLMT
Request Packet SizeRequest Header SizeResponse Packet SizeResponse Header SizeRequest MD5Server MD5Conditions Used
• Each transaction type can be configured separately
• Each service is configured separately
• Use Retention Policy to auto garbage collect old logs
Enable Logging via the portal
• Provides ability to answer commonly asked questions:• How many transactions did my service issue per hour over the past week?
• How many anonymous Get Blob requests were issued to my storage account?
• My application is not performing as expected, what is the availability and performance of storage for a given time period?
• list goes on and on…
• Figure out who is accessing what…
Storage Analytics – Why turn on Metrics?
• Transaction metrics are provided for every 1 hour time interval stored into Windows Azure Tables• Example Metrics• Total Transactions• Availability• % Success, % Network Errors, % Timeout, % Throttled, etc.• Average Latency (Application E2E and Storage Server latency)• Total Ingress• Total Egress
• Blob, Table or Queue Summary and per REST API metrics
• Capacity metrics provided for only Blobs at this time• Updated once a day • Capacity and # of objects
Storage Analytics – Metrics
• Multiple Verbosity levels
• Each service is configured separately
• Use Retention Policy to auto Garbage Collect old logs
Enable Metrics (Monitoring) via the portal
• Separate Namespace• Logs• Stored as blobs in separate Blob Container in the storage account being monitored• http://account.blob.core.windows.net/$logs/
• Metrics• Stored as entities in a separate metrics Azure Tables in the storage account being monitored• http://account.table.core.windows.net/$Metrics*
• Isolation• $logs and $Metrics have separate resource limits and throttling from the rest of the
storage account traffic
• Cost• Capacity to keep the data• Transactions for generating & accessing analytics data• Can use retention policy on both logs and metrics in terms of days• Deleting data via retention policy does not incur transaction cost
Storage Analytics Summary
AzCopy
• Available at : https://github.com/downloads/WindowsAzure/azure-sdk-downloads/AzCopy.zip
• Directory copy to/from the cloud• Recursive directory supported• Can specify wild cards etc.• Fault tolerant and restartable
Azcopy – xcopy for the cloud
• azcopy c:\blob-data https://myaccount.blob.core.windows.net/mycontainer/ /destkey:key /S
• Copies all files recursively in c:\blob-data to a container name mycontainer
Azcopy example
[mycontainer]
car1.docxcar2.docxcar3.docxtrain1.docxsubfolder1/car_sub1.docxsubfolder1/car_sub2.docx
[c:\blob-data]
.\car1.docx
.\car2.docx
.\car3.docx
.\train1.docx
.\subfolder1\car_sub1.docx.\subfolder1\car_sub2.docx
Summary
• SharePoint service as SAS provider, distributes token based on authentication of SharePoint user
• Azure Blobs as bulk storage backend• Clients Read / Write directly to Azure
Storage• Service audits access Via Logs / Metrics• Azure Queues + Blobs for Bulk import /
processing• azCopy to simplify bulk import
Putting it all together…
One-Click Install Per Language Gives You Everything You Need to Start Developing on Azure
.NET Client Libraries
Visual Studio Tooling
.NET Tools & SDK
Core SDK
Node.js Client Libraries
Node.js SDK
Core SDK
PHP Client Libraries
PHP SDK
Core SDK
Core SDK = Emulator, Packaging Tools, etc.
Java Client Libraries
Java SDK
Core SDK
Eclipse Tooling
Get Started
http://WindowsAzure.com
• Storage team blogs @ http://blogs.msdn.com/b/windowsazurestorage/
• Getting Started @ https://www.windowsazure.com/en-us/develop/overview/
• Pricing information @ https://www.windowsazure.com/en-us/pricing/details/
• Advanced Storage Topics (Build 2012) http://channel9.msdn.com/Events/Build/2012/4-004
Resources
Evaluate this session now on MySPC using your laptop or mobile device: http://myspc.sharepointconference.com
MySPC
Questions?
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.