
Deep Dive on Amazon S3


© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Lee Atkinson, Solutions Architect, AWS
Jey Jeyasingam, CTO, Y-Cam

7 July 2016

Amazon S3 Deep Dive

AWS storage services

File: Amazon EFS
Block: Amazon EBS, Amazon EC2 instance store
Object: Amazon S3, Amazon Glacier
Data transfer: AWS Direct Connect, Snowball, ISV connectors, Amazon Kinesis Firehose, Transfer Acceleration, AWS Storage Gateway

Innovation for Amazon S3 (1/2)

Cross-region replication
Amazon CloudWatch metrics for Amazon S3 & AWS CloudTrail support
VPC endpoint for Amazon S3
Read-after-write consistency in all regions
Event notifications
Amazon S3 bucket limit increase

Innovation for Amazon S3 (2/2)

Amazon S3 Standard-IA
Transfer Acceleration
Incomplete multipart upload expiration
Expired object delete marker

Choice of storage classes on Amazon S3

Standard: active data
Standard - Infrequent Access: active archive
Amazon Glacier: archive data

Some use cases have different requirements

File sync and share / consumer file storage
Backup and archive / disaster recovery
Long retained data

Standard-Infrequent Access storage

Durable: 11 9s of durability
Available: designed for 99.9% availability
High performance: same throughput as Amazon S3 Standard storage
Secure:
• Server-side encryption
• Use your own encryption keys
• KMS-managed encryption keys
Integrated:
• Lifecycle management
• Versioning
• Event notifications
• Metrics
Easy to use:
• No impact on user experience
• Simple REST API
• Single bucket

Management policies

Lifecycle policies

Automatic tiering and cost controls. Includes two possible actions:
• Transition: to Standard-IA or Glacier after a specified time
• Expiration: deletes objects after a specified time
Allows for actions to be combined
Set policies at the key prefix level

Standard-Infrequent Access storage

Example lifecycle policy: transition objects under documents/ from Standard storage to Standard-IA after 30 days, and from Standard-IA to Glacier after 365 days:

<LifecycleConfiguration>
  <Rule>
    <ID>sample-rule</ID>
    <Prefix>documents/</Prefix>
    <Status>Enabled</Status>
    <Transition>
      <Days>30</Days>
      <StorageClass>STANDARD_IA</StorageClass>
    </Transition>
    <Transition>
      <Days>365</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>
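A lifecycle policy like this can also be applied programmatically. A minimal sketch using boto3's put_bucket_lifecycle_configuration, which takes the same rules in JSON form; the bucket name is a placeholder and the call itself requires valid AWS credentials:

```python
# Lifecycle rules in the JSON shape expected by boto3's
# put_bucket_lifecycle_configuration: Standard -> Standard-IA at 30 days,
# Standard-IA -> Glacier at 365 days, for keys under documents/.
lifecycle_config = {
    "Rules": [
        {
            "ID": "sample-rule",
            "Prefix": "documents/",
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

def apply_lifecycle(bucket_name):
    """Apply the lifecycle configuration to a bucket (needs AWS credentials)."""
    import boto3
    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name, LifecycleConfiguration=lifecycle_config
    )
```
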

Versioning S3 buckets

Protects from accidental overwrites and deletes
New version with every upload
Easy retrieval of deleted objects and roll back
Three states of an Amazon S3 bucket:
• Unversioned (default)
• Versioning-enabled
• Versioning-suspended

Versioning + lifecycle policies

Expired object delete marker policy

Deleting a versioned object makes a delete marker the current version of the object
No storage charge for a delete marker
Removing delete markers can improve list performance
A lifecycle policy can automatically remove the current-version delete marker when previous versions of the object no longer exist

Example lifecycle policy to expire current and non-current versions. S3 Lifecycle will then automatically remove any expired object delete markers:

<LifecycleConfiguration>
  <Rule>
    ...
    <Expiration>
      <Days>60</Days>
    </Expiration>
    <NoncurrentVersionExpiration>
      <NoncurrentDays>30</NoncurrentDays>
    </NoncurrentVersionExpiration>
  </Rule>
</LifecycleConfiguration>

Example lifecycle policy for non-current version expiration. The NoncurrentVersionExpiration action removes all noncurrent versions; the ExpiredObjectDeleteMarker element removes expired object delete markers:

<LifecycleConfiguration>
  <Rule>
    ...
    <Expiration>
      <ExpiredObjectDeleteMarker>true</ExpiredObjectDeleteMarker>
    </Expiration>
    <NoncurrentVersionExpiration>
      <NoncurrentDays>30</NoncurrentDays>
    </NoncurrentVersionExpiration>
  </Rule>
</LifecycleConfiguration>

Restricting deletes with MFA

Bucket policies can restrict deletes
For additional security, enable MFA (multi-factor authentication) Delete, which requires additional authentication to:

• Change the versioning state of your bucket• Permanently delete an object version

MFA delete requires both your security credentials and a code from an approved authentication device
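As an illustration of restricting deletes with a bucket policy, a sketch that denies permanent version deletes unless the request was MFA-authenticated. The bucket name is a placeholder; note that MFA Delete itself is enabled through the bucket's versioning configuration, not a policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyVersionDeleteWithoutMFA",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:DeleteObjectVersion",
      "Resource": "arn:aws:s3:::my-example-bucket/*",
      "Condition": {"Null": {"aws:MultiFactorAuthAge": "true"}}
    }
  ]
}
```

The Null condition on aws:MultiFactorAuthAge matches requests made without MFA, so those deletes are denied.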

Performance optimization for S3

Parallel PUTs with Multipart Uploads

Increase throughput by parallelizing PUTs
Increase resiliency to network errors
Fewer large restarts on error-prone networks
A balance between part size and number of parts:

• Small parts increase connection overhead
• Large parts provide fewer of the benefits of multipart
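One way to see the part-size trade-off is to compute the byte range each part would cover before issuing the parallel UploadPart calls. A sketch with an illustrative helper (the function name and part size are not from the talk):

```python
def part_ranges(total_size, part_size):
    """Split an object of total_size bytes into inclusive (start, end)
    byte ranges, one per multipart upload part."""
    ranges = []
    start = 0
    while start < total_size:
        end = min(start + part_size, total_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

# Fewer, larger parts mean less connection overhead but less parallelism;
# more, smaller parts mean the opposite.
print(part_ranges(100, 30))  # [(0, 29), (30, 59), (60, 89), (90, 99)]
```

Each range maps to one part upload, and the parts can then be PUT concurrently from a thread pool.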

Incomplete multipart upload expiration policy

Multipart upload feature improves PUT performance
Partial uploads do not appear in the bucket list
Partial uploads do incur storage charges
Set a lifecycle policy to automatically expire incomplete multipart uploads after a predefined number of days

Example lifecycle policy: abort incomplete multipart uploads seven days after initiation:

<LifecycleConfiguration>
  <Rule>
    <ID>sample-rule</ID>
    <Prefix>SomeKeyPrefix/</Prefix>
    <Status>Enabled</Status>
    <AbortIncompleteMultipartUpload>
      <DaysAfterInitiation>7</DaysAfterInitiation>
    </AbortIncompleteMultipartUpload>
  </Rule>
</LifecycleConfiguration>

Parallel GETs

Use range-based GETs to get multithreaded performance when downloading objects
Compensates for unreliable networks
Benefits of multithreaded parallelism
Align your ranges with your parts!
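The "align your ranges with your parts" advice amounts to building one HTTP Range header per part and issuing each GET on its own thread. A sketch of the header construction (the helper name is illustrative):

```python
def range_headers(total_size, part_size):
    """Build the HTTP Range header for each parallel GET, aligned with
    the part boundaries used at upload time (inclusive byte ranges)."""
    headers = []
    start = 0
    while start < total_size:
        end = min(start + part_size, total_size) - 1
        headers.append({"Range": f"bytes={start}-{end}"})
        start = end + 1
    return headers

# Each header goes on its own GET request, issued from a thread pool;
# each response body is written at its range's offset in the output file.
print(range_headers(100, 40))
```
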

Parallel LISTs

Parallelize LIST when you need a sequential list of your keys

Build a secondary index for a faster alternative to LIST:

• Sorting by metadata
• Searchability
• Objects by timestamp

Distributing object keys

Most important if you regularly exceed 100 TPS on a bucket
Distribute keys uniformly across the keyspace
Use a key-naming scheme with randomness at the beginning

Distributing object keys

Don’t do this…
<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-164533126.jpg
<my_bucket>/2013_11_13-164533127.jpg
<my_bucket>/2013_11_13-164533128.jpg
<my_bucket>/2013_11_12-164533129.jpg
<my_bucket>/2013_11_12-164533130.jpg
<my_bucket>/2013_11_12-164533131.jpg
<my_bucket>/2013_11_12-164533132.jpg
<my_bucket>/2013_11_11-164533133.jpg
<my_bucket>/2013_11_11-164533134.jpg
<my_bucket>/2013_11_11-164533135.jpg
<my_bucket>/2013_11_11-164533136.jpg

Distributing object keys

…because this is going to happen

[Diagram: all requests concentrated on a single partition]

Distributing object keys

Add randomness to the beginning of the key name…
<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg

Distributing object keys

…so your transactions can be distributed across the partitions

[Diagram: requests spread evenly across the partitions]

Techniques for distributing keys

Store as a hash:
• 83d02a66a0fee41b5767e4f4dd377d29

Prepend with a short hash:
• 83d02013_11_13-164533125.jpg

Reverse:
• 521335461-31_11_3102.jpg
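The "prepend with a short hash" technique can be sketched as follows; the prefix length and function name are illustrative choices, not from the talk:

```python
import hashlib

def randomized_key(key, prefix_len=4):
    """Prepend a short, deterministic hash of the key so that keys spread
    uniformly across the keyspace (and thus across S3's partitions),
    while the original key remains recoverable by stripping the prefix."""
    prefix = hashlib.md5(key.encode("utf-8")).hexdigest()[:prefix_len]
    return prefix + key
```

Because the prefix is derived from the key itself, the same object always maps to the same randomized key, so no lookup table is needed.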

Data ingestion into Amazon S3

AWS Import/Export Snowball
• Accelerate PBs with AWS-provided appliances
• 80 TB and global availability

AWS Storage Gateway
• Up to 120 MB/s cloud upload rate (4x improvement)
• 10 Gb networking for VMware

Data ingestion into Amazon S3

Amazon Kinesis Firehose
• Ingest data streams directly into AWS data stores

AWS Direct Connect

ISV connectors

Transfer Acceleration
• Move data up to 300% faster using the AWS network

S3 Transfer Acceleration

Introducing Amazon S3 Transfer Acceleration

Up to 300% faster
Change your endpoint, not your code
56 global edge locations
No firewall exceptions
No client software required

[Diagram: uploader → AWS edge location → S3 bucket, for optimized throughput]

How fast is Transfer Acceleration?

[Chart: time in hours for a 500 GB upload to a bucket in Singapore from edge locations in Rio de Janeiro, Warsaw, New York, Atlanta, Madrid, Virginia, Melbourne, Paris, Los Angeles, Seattle, Tokyo, and Singapore, comparing the public Internet against Transfer Acceleration]

Getting Started

1. Enable S3 transfer acceleration on your S3 bucket

2. Update your application/destination URL to <bucket-name>.s3-accelerate.amazonaws.com

3. Done!
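Step 2 above amounts to swapping only the hostname in your object URLs. A sketch (the bucket and key are placeholders):

```python
def accelerate_url(bucket, key):
    """Build an S3 object URL that uses the Transfer Acceleration
    endpoint; only the hostname changes, the bucket and key do not."""
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{key}"

print(accelerate_url("my-bucket", "videos/clip.mp4"))
# https://my-bucket.s3-accelerate.amazonaws.com/videos/clip.mp4
```
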

How much will it help me?

Use the Amazon S3 Transfer Acceleration Speed Comparison page:

http://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html

By Jey Jeyasingam

Y-cam Solutions Ltd Confidential and proprietary

Who we are...

Initially used S3 just to store videos and thumbnails, 6 years ago
But now we also use S3 for so much more
120 million objects
2 million videos


Our Architecture


Challenges

Handling the expiration of videos

Legacy scripts

Reducing servers, cutting costs


Video Expiration

Create multiple buckets with different lifecycle policies

Improve code to decide which bucket to save the video in


Legacy Script

Moved thumbnail creation and the DynamoDB update from a script to a Lambda function

Lambda triggered by S3 event notification

Extra benefits of using Lambda
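A Lambda function triggered by an S3 event notification receives the bucket and key in the event payload. A minimal handler sketch; the thumbnail and DynamoDB logic Y-cam describes is elided, and the return value is purely illustrative:

```python
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    """Handle an S3 ObjectCreated event: extract the bucket and key for
    each record. (Thumbnail creation and the DynamoDB update would go here.)"""
    processed = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        processed.append((bucket, key))
    return processed
```
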


Future Plans

Reducing number of servers

Servers only serving web app JS code

Moved this to be hosted by S3

Reduced cost

Moving towards serverless architecture

Summary

Amazon S3 Standard-Infrequent Access
Amazon S3 management policies
Versioning for Amazon S3 + MFA Delete
Amazon S3 Transfer Acceleration

Please remember to rate this session under ‘My Agenda’ on https://awssummit.london