In this session we will analyze the characteristics of this service and give a brief introduction to the architecture that supports it. We will look at the considerations to take into account when creating and using this type of storage, analyzing the impact that design decisions have on performance and scalability targets. We will also show some usage examples in different scenarios, including some optimizations that can improve performance. Comunidade NetPonto, the .NET community in Portugal! http://netponto.org
NoSQL in Windows Azure Table Storage
Vítor Tomaz
http://netponto.org
37th In-Person Meeting @ Lisbon - 23/03/2013
Vítor Tomaz
ISEL – LEIC
SAFIRA
NetPonto, AzurePT, Revista Programar, Portugal@Programar, SQLPort, MSDN
Agenda
• Characteristics & Concepts
• Service Architecture
• Scalability Targets
• Non-Relational Data Modeling
• Best Practices
Windows Azure Storage Characteristics
• A “pay for what you use” cloud storage system
• Durable: stores multiple replicas of your data
  • Local replication: synchronous replication before returning success
  • Geo replication: replicated to a data center at least 400 miles away; asynchronous replication after returning success to the user
• Available: multiple replicas are placed to provide fault tolerance
• Scalable: automatically partitions data across servers to meet traffic demands
• Strong consistency: default behavior is consistent reads once data is committed
Windows Azure Storage Abstractions
• Tables: Structured storage. A table is a set of entities; an entity is a set of properties.
• Queues: Reliable storage and delivery of messages for an application.
• Blobs: Simple named files along with metadata for the file.
• Drives: Durable NTFS volumes for Windows Azure applications to use. Based on Blobs.
Storage Libraries in Many Languages
Windows Azure Storage Account
• User-specified, globally unique account name
• Can choose the geo-location to host the storage account:
  • US: North Central US, South Central US, West US, East US
  • Europe: Northern Europe, Western Europe
  • Asia: East Asia, South East Asia
Table Storage Concepts
Account, Table, Entity
• Account: contoso
  • Table: customers
    • Entity: Name = …, Email = …
    • Entity: Name = …, EMailAdd =
  • Table: photos
    • Entity: Photo ID = …, Date = …
    • Entity: Photo ID = …, Date = …
No Fixed Schema
FIRST LAST BIRTHDATE
Wade Wegner 2/2/1981
Nathan Totten 3/15/1965
Nick Harris May 1, 1976
FAV SPORT
Canoeing
Table Details
Not an RDBMS!
Table
• Create, Query, Delete
• Tables can have metadata
Entities
• Insert
• Update
  • Merge – Partial update
  • Replace – Update entire entity
• Upsert
• Delete
• Query
• Entity Group Transactions: multiple CUD (create, update, delete) operations in a single atomic transaction
Entity Properties
• An entity can have up to 255 properties
• Up to 1 MB per entity
• Mandatory properties for every entity
  • PartitionKey & RowKey (the only indexed properties)
    • Uniquely identify an entity
    • Define the sort order
  • Timestamp
    • Optimistic concurrency
    • Exposed as an HTTP ETag
• No fixed schema for other properties
  • Each property is stored as a <name, typed value> pair
  • No schema is stored for a table
  • Properties can be the standard .NET types: String, binary, bool, DateTime, GUID, int, int64, and double
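The property rules above can be illustrated with a small, purely local sketch in Python. It does not call any Azure library: `validate_entity`, `MAX_PROPERTIES`, and `ALLOWED_TYPES` are hypothetical names, the Python types are only rough stand-ins for the .NET types listed on the slide, and the sketch assumes that the 255-property limit counts every property, including the keys, and that both keys are strings.

```python
from datetime import datetime
from uuid import UUID

# Assumption: the limit of 255 counts every property, including the keys.
MAX_PROPERTIES = 255
# Rough Python stand-ins for String, binary, bool, DateTime, GUID, int/int64, double.
ALLOWED_TYPES = (str, bytes, bool, datetime, UUID, int, float)


def validate_entity(entity: dict) -> None:
    """Check one schema-less entity (a bag of named, typed values)."""
    for key in ("PartitionKey", "RowKey"):
        if not isinstance(entity.get(key), str):
            raise ValueError(f"{key} is mandatory and must be a string")
    if len(entity) > MAX_PROPERTIES:
        raise ValueError("an entity can have at most 255 properties")
    for name, value in entity.items():
        if not isinstance(value, ALLOWED_TYPES):
            raise TypeError(f"unsupported type for property {name!r}")


# Two entities in the same table may carry different properties.
wade = {"PartitionKey": "Wegner", "RowKey": "Wade",
        "Birthdate": datetime(1981, 2, 2)}
nick = {"PartitionKey": "Harris", "RowKey": "Nick",
        "Birthdate": datetime(1976, 5, 1), "FavSport": "Canoeing"}
validate_entity(wade)
validate_entity(nick)
```

Because no schema is stored for the table, any check of this kind has to live in the application.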
Scalability
• Partition: range of entities with the same partition key value
• Partitions are fanned out based on load
• They can be condensed when load decreases
• Reads are load balanced against three replicas
[Diagram: partitions P1, P2, …, Pn distributed across Server 1, Server 2, and Server 3]
Service Architecture
Storage Stamp Architecture
[Diagram: a storage stamp with three layers: Front End Layer (FE nodes), Partition Layer (Partition Servers, Partition Master, Lock Service), and Stream Layer (Extent Nodes, EN). Labels: Incoming Write Request, Ack.]
Windows Azure Storage - Architecture
• PartitionKey: unique identifier for the partition within a given table.
• RowKey: unique identifier for an entity within a given partition.
• Both keys matter!
  • They define the primary key
  • Together they form a single clustered index
Scalability
• Slowest: no PartitionKey, no RowKey
• Slower: only PartitionKey, no RowKey
• Very fast: PartitionKey + RowKey
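A toy in-memory model can make the three cases above concrete. This is only a sketch: `ToyTable` and its methods are hypothetical, and the number of entities examined is used as a stand-in for cost, which ignores network hops, server load, and everything else that affects real latency.

```python
class ToyTable:
    """Toy model: partitions keyed by PartitionKey, rows keyed by RowKey."""

    def __init__(self):
        self.partitions = {}

    def insert(self, pk, rk, props):
        self.partitions.setdefault(pk, {})[rk] = props

    def point_query(self, pk, rk):
        # PartitionKey + RowKey: direct lookup, one entity examined.
        entity = self.partitions.get(pk, {}).get(rk)
        return ([entity] if entity is not None else []), 1

    def partition_scan(self, pk, predicate):
        # Only PartitionKey: every entity in that one partition is examined.
        rows = self.partitions.get(pk, {})
        return [e for e in rows.values() if predicate(e)], len(rows)

    def table_scan(self, predicate):
        # No keys: every entity in every partition is examined.
        rows = [e for p in self.partitions.values() for e in p.values()]
        return [e for e in rows if predicate(e)], len(rows)


table = ToyTable()
for p in range(10):
    for r in range(100):
        table.insert(f"pk{p}", f"rk{r:03d}", {"value": p * 100 + r})

_, touched_point = table.point_query("pk3", "rk042")
_, touched_partition = table.partition_scan("pk3", lambda e: e["value"] == 342)
_, touched_table = table.table_scan(lambda e: e["value"] == 342)
print(touched_point, touched_partition, touched_table)
```

All three queries find the same entity; they differ only in how much data has to be examined to find it.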
Table Storage – Key Points
• 1000 entities: a query returns at most 1000 entities per response. Any query that does not specify both the PartitionKey and the RowKey (and only those) needs to handle continuation tokens. http://tinyurl.com/ContToken
• Continuation tokens
  • Next Table
  • Next PartitionKey
  • Next RowKey
• Transient fault handling
  • Network
  • Hardware
  • Data center
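The continuation-token pattern can be sketched as a loop that keeps requesting pages until no token comes back. The code below simulates the service locally: `query_page` and `query_all` are hypothetical helpers, and the token is a plain integer offset rather than the Next PartitionKey / Next RowKey pair the real service returns.

```python
PAGE_SIZE = 1000  # at most 1000 entities per response


def query_page(rows, token=None):
    """Simulated service call: returns one page plus a continuation token."""
    start = token or 0
    page = rows[start:start + PAGE_SIZE]
    next_start = start + PAGE_SIZE
    next_token = next_start if next_start < len(rows) else None
    return page, next_token


def query_all(rows):
    """Client loop: follow continuation tokens until the result is complete."""
    results, token, pages = [], None, 0
    while True:
        page, token = query_page(rows, token)
        results.extend(page)
        pages += 1
        if token is None:
            return results, pages


entities = list(range(2500))
everything, page_count = query_all(entities)
print(len(everything), page_count)
```

A client that stops after the first response would silently see only the first 1000 entities here, which is why the loop checks the token rather than the page size.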
Scalability Targets
Scalability Targets - Storage Account
Storage account level targets by end of 2012. Applies to accounts created after June 7th 2012.
• Capacity – up to 200 TB
• Transactions – up to 20,000 entities per second
• Bandwidth for a geo-redundant storage account
  • Ingress – up to 5 Gibps
  • Egress – up to 10 Gibps
• Bandwidth for a locally redundant storage account
  • Ingress – up to 10 Gibps
  • Egress – up to 15 Gibps
Scalability Targets – Partition
Partition level targets by end of 2012. Applies to accounts created after June 7th 2012.
• Single table partition – Account Name + Table Name + PartitionKey value
• Up to 2,000 entities per second
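As a rough worked example of how the two targets relate, the sketch below estimates the minimum number of evenly loaded partitions needed for a given request rate. `min_partitions` is a hypothetical helper, and the even spread of load across partitions is an assumption that real workloads rarely meet exactly.

```python
import math

PARTITION_TARGET = 2_000   # entities per second for a single partition
ACCOUNT_TARGET = 20_000    # entities per second for a storage account


def min_partitions(required_rate: int) -> int:
    """Fewest partitions that can carry the rate if load is spread evenly."""
    if required_rate > ACCOUNT_TARGET:
        raise ValueError("rate exceeds the target of a single storage account")
    return max(1, math.ceil(required_rate / PARTITION_TARGET))


print(min_partitions(4_500))   # 3
print(min_partitions(20_000))  # 10
```

Reaching the account-level target therefore requires the load to be spread over at least ten partitions.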
Non-Relational Data Modeling
Why Partition
Data Volume (too many bytes)
Work Load (too many transactions/second)
Cost (using different cost storage)
Elasticity (just in time partitioning for high load periods)
Choosing a Partition Key
• Natural keys
  • Country
  • First letter, last name
  • Date
• Mathematical
  • Hash functions
  • Modulo operator
• Lookup based
  • Lookup table to resolve value to partitions
Using Modulo
The remainder of a division
Nice properties for partitioning:
• Given two positive integers M and N
• M mod N will return a number between 0 and N-1
Want equi-sized partitions?
• Given an appropriate distribution of M we will get N ‘equally full’ buckets.
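A minimal sketch of modulo partitioning, assuming sequential integer IDs as the values of M (a hypothetical but convenient distribution); `partition_for` is an illustrative name.

```python
from collections import Counter


def partition_for(m: int, n: int) -> int:
    """Map a positive integer M to one of N buckets, numbered 0 to N-1."""
    if n <= 0:
        raise ValueError("N must be a positive integer")
    return m % n


# Sequential IDs are an 'appropriate distribution': the buckets fill evenly.
buckets = Counter(partition_for(m, 4) for m in range(1, 1001))
print(sorted(buckets.items()))  # [(0, 250), (1, 250), (2, 250), (3, 250)]
```

If the values of M were skewed, for example all multiples of 4, every entity would land in bucket 0, which is why the distribution of M matters as much as the choice of N.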
Using Hash Values
• Using a hash function projects one distribution into another
• Use a hash function that projects a random distribution
• Do NOT use a cryptographic hash function
• Be careful if using Object.GetHashCode()
  • Boxed types may return a different value from their un-boxed equivalent
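A sketch of hash-based partitioning in Python, using CRC-32 from the standard library as one example of a cheap, non-cryptographic hash that gives the same result in every process. The choice of CRC-32, the zero-padded key format, and the name `hash_partition` are assumptions made for illustration, not something the slides prescribe.

```python
import zlib


def hash_partition(key: str, n: int) -> str:
    """Derive a partition key from a stable, non-cryptographic hash."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    bucket = zlib.crc32(key.encode("utf-8")) % n
    return f"{bucket:04d}"


# The same input always maps to the same partition, on any machine.
print(hash_partition("user-42", 16) == hash_partition("user-42", 16))  # True
```

Python has its own version of the GetHashCode() caveat: the built-in `hash()` of a string is randomized per process by default, so it is unsuitable for keys that are persisted.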
Partition Stability Over Time
May need to change the partitioning scheme. Two options:
1. Re-partition all data
2. Version the partitioning scheme
e.g. <Version><PartitionKey>
<v1><A3E567D7D8C68789>
<v2><A8B978C8B6D77836>
where
v1 = GUID mod 4
v2 = GUID mod 10
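The versioned scheme can be sketched as follows, reusing the two versions from the slide (v1 = GUID mod 4, v2 = GUID mod 10). The `version-bucket` key format and the helper names are assumptions; the slide only shows that the version is prefixed to the partition key.

```python
import uuid

# Scheme version -> number of buckets, as on the slide.
SCHEMES = {"v1": 4, "v2": 10}
CURRENT_VERSION = "v2"


def versioned_partition_key(guid: uuid.UUID, version: str = CURRENT_VERSION) -> str:
    """Prefix the bucket with the scheme version that produced it."""
    return f"{version}-{guid.int % SCHEMES[version]}"


def candidate_keys(guid: uuid.UUID) -> list:
    """Keys to probe on read: old data may still live under an older scheme."""
    return [versioned_partition_key(guid, v) for v in SCHEMES]


g = uuid.UUID(int=13)
print(versioned_partition_key(g))  # v2-3
print(candidate_keys(g))           # ['v1-1', 'v2-3']
```

New writes use the current version, while reads may need to probe every version until old data has been migrated; that extra read cost is the price of not re-partitioning everything at once.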
E.g. Tweet Storage
Tweet: TweetID, UserID, DateTimeStamp, Message
With an RDBMS you’d probably start with something like this:
SELECT * FROM Tweet WHERE Message LIKE '%SearchTerm%'
E.g. Tweet Storage
You’d soon realize that LIKE isn’t so wonderful.
You’d do a little normalization.
[Diagram: normalized schema. Column labels as extracted: …, Message, TweetID; WordID; WordID, Word (IX); and …, Message, TweetID, Word (IX).]
E.g. Tweet Storage
With Tables we go the whole way
• Table 1: UserID (PK), TweetID (RK), DateTimeStamp, Message
• Table 2: Word (PK), TweetID (RK), UserID, DateTimeStamp, Message
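The two-table layout can be sketched with plain dictionaries standing in for tables, where the outer key plays the role of the PartitionKey and the inner key the RowKey. `index_tweet` is a hypothetical helper, and splitting the message on whitespace is a deliberately naive tokenizer.

```python
def index_tweet(by_user: dict, by_word: dict, tweet: dict) -> None:
    """Store the full tweet once per index: denormalized on purpose."""
    # Table 1: PartitionKey = UserID, RowKey = TweetID
    by_user.setdefault(tweet["UserID"], {})[tweet["TweetID"]] = dict(tweet)
    # Table 2: PartitionKey = Word, RowKey = TweetID (one copy per distinct word)
    for word in set(tweet["Message"].lower().split()):
        by_word.setdefault(word, {})[tweet["TweetID"]] = dict(tweet)


by_user, by_word = {}, {}
index_tweet(by_user, by_word, {"TweetID": "001", "UserID": "vitor",
                               "DateTimeStamp": "2013-03-23T10:00:00",
                               "Message": "Azure table storage rocks"})
index_tweet(by_user, by_word, {"TweetID": "002", "UserID": "ana",
                               "DateTimeStamp": "2013-03-23T10:05:00",
                               "Message": "Trying Azure queues"})

# A search for a word becomes a single-partition read instead of LIKE.
print(sorted(by_word["azure"]))  # ['001', '002']
```

The search no longer scans messages, at the cost of writing each tweet once per distinct word it contains.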
E.g. Tweet Storage
We may create multiple indexes
• Index 1: UserID (PK), TweetID (RK), DateTimeStamp, Message
• Index 2: UserID (PK), TweetID (RK), UserID, DateTimeStamp, Message
Entity Group Transactions
Modeling In Tables
• Currently no secondary indexes (coming)
  • Be careful to minimize cross-partition queries
• Build indexes yourself
  • Concentrate on useful partition keys
• If associated data is small enough
  • Save additional queries
  • Duplicate data with each index
Best Practices
Common Design & Scalability
• Common settings
  • Turn off Nagling & Expect 100 (.NET – ServicePointManager)
  • Set the connection limit (.NET – ServicePointManager.DefaultConnectionLimit)
  • Turn off proxy detection when running in the cloud (.NET – Config: autodetect setting in the proxy element)
• Design your application so that requests are distributed across your range of partition keys, to avoid hotspots
  • Avoid the Append/Prepend pattern: an access pattern lexically sorted by PartitionKey values
• Perform one-time operations at startup rather than on every request
  • Creating containers/tables/queues that should always exist
  • Setting required constant ACLs on a container/table/queue
Common Design & Scalability
• Turn on analytics and take control of your investigations – Logging and Metrics
  • Who deleted my container? – Look at the client IP for the delete container request
  • Why has my request latency increased? – Look at E2E vs. server latency
  • What are my user demographics? – Use the client request ID to trace requests and client IPs
  • How can I tune my service usage? – Use metrics to analyze API usage and peak traffic stats
  • And many more…
• Use an appropriate retry policy for intermittent errors
  • The storage client uses exponential retry by default
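An exponential retry policy can be sketched in a few lines. This is a generic illustration, not the storage client's actual policy: `TransientError`, `with_retries`, and the default attempt count and delays are all hypothetical.

```python
import time


class TransientError(Exception):
    """Stand-in for an intermittent network, hardware, or data center fault."""


def with_retries(operation, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on a transient error wait 1x, 2x, 4x... base_delay."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Usage: an operation that fails twice and then succeeds.
calls = {"count": 0}


def flaky_insert():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("simulated timeout")
    return "inserted"


delays = []
print(with_retries(flaky_insert, sleep=delays.append), delays)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; only errors known to be transient should be retried this way.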
Storage Accounts
• Collocate storage accounts with your compute roles, as egress is free within the same region
• Use multiple storage accounts to:
  • achieve targets that exceed those of a single storage account
  • achieve client proximity
• Map multiple clients to the same storage account
  • Use different containers/tables/queues instead of an account for each customer
Storage Accounts
• Design to add more accounts as needed
• Use a different account for Windows Azure Diagnostics
• Choose locally redundant storage if:
  • data can be restored after a major disaster
  • there are geographical boundary constraints on where data can be stored
WA Table Client - Service Layer
• Option 1 – WCF Data Services
  • Good for a fixed schema used like relational tables
  • Does not require control over serialization/deserialization
• Option 2 – Table Service Layer’s Dynamic Table Entity
  • Entity containing a Dictionary of key-value properties
  • Used when the schema is not known, for example: explorers
  • Performance!
• Option 3 – Table Service Layer’s POCO
  • POCO derives from ITableServiceEntity or TableServiceEntity
  • Control over serialization and deserialization – make your data dance to your tune!
  • ETag maintained with entities - easy to update!
  • Performance!
Performance - Storage Client Library 2.0
[Chart: Batch Stress Scenario, Per Entity Latencies. Series: Storage Client 1.7; Storage Client 2.0: DataServices; Storage Client 2.0: Reflection; Storage Client 2.0: No Reflection. Measures: Delete, Query, Insert (Time (ms)); Processor Time (s); Test Duration (s).]
Faster NoSQL table access
• Up to 72.06% reduction in execution time
• Up to 31.92% reduction in processor time
• Up to 69-90% reduction in latency
Performance - Storage Client Library 2.0
[Chart: Large Blob Scenario (256MB), Resource Utilization. Series: Storage Client 1.7; Storage Client 2.0. Measures: Total Test Time (s), Total Processor Time (s).]
[Chart: Large Blob Scenario (256MB), Latencies. Series: Storage Client 1.7; Storage Client 2.0. Measures: Upload, Download (Time (s)).]
Faster uploads and downloads
• 31.46% reduction in processor time
• Up to 22.07% reduction in latency
Take Away
Partitioning Data Key to Cloud Scale Apps
Horizontally Partition for Scale Out
Choose appropriate partition keys
Table storage requires different approach to data modeling
Don’t be afraid to aggressively de-normalize and duplicate data
Resources
Storage team blogs @ http://blogs.msdn.com/b/windowsazurestorage/
Getting Started @ https://www.windowsazure.com/en-us/develop/overview/
Pricing information @ https://www.windowsazure.com/en-us/pricing/details
Questions?
Upcoming in-person meetings
• 23/03/2013 – March (Lisbon)
• 20/04/2013 – April (Lisbon)
• 22/06/2013 – June (Lisbon)
• ??/??/2013 – ? (Porto)
• ??/??/2013 – ? (Coimbra)
Save these dates in your calendar! :)
“GOLD” Sponsor
Twitter: @PTMicrosoft http://www.microsoft.com/portugal
“Bronze” Sponsors
Thank you!
Vítor Tomaz
vitorbstomaz AT gmail.com
http://twitter.com/vitortomaz