Digital Media Ingest and Storage Options on AWS. Guy Farber, Amazon Web Services. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

[AWS LA Media & Entertainment Event 2015]: Digital Media Ingest & Storage Options on AWS



Content has Gravity and is getting heavier …

…it’s easier to move processing to the content

4K/8K Content

Where is the problem?

• More bandwidth $$$$$
• More powerful compute $$$$$
• Way more storage $$$$$
• Some progress (ABR, HEVC, VP10)

Where is the sliding scale on my Infrastructure?

AWS Storage options for digital media

• File: Amazon EFS
• Block: Amazon EBS, Amazon EC2 instance storage
• Object: Amazon S3, Amazon Glacier

A Concept - the Content Lake
Inspired by the Data Lake (a term coined by James Dixon in 2010)

A single store for all of the digital content that you create and acquire, in any form or factor

• Don’t assume any resolutions/formats (for now or future)
• It is up to the consumer (the application consuming the content) to use the appropriate infrastructure for processing

Amazon S3: the Content Lake

• Durable, cost-effective and fast
• Highly scalable front-end
  – Multi-part uploads (parallel writes)
  – Range-gets (parallel reads)
• No need for capacity planning or provisioning
• Use Amazon S3 with on-premises storage in a hybrid model
• Secure
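The parallel writes and reads mentioned above both depend on cutting one large object into byte ranges that can be transferred independently. The following is a minimal sketch of that planning step only, not of the S3 API itself: the function names and the 64 MiB default part size are assumptions made for illustration, and a real transfer would hand each range to an S3 client as one multipart part or one ranged GET.

```python
from typing import List, Tuple

MIB = 1024 * 1024


def plan_byte_ranges(object_size: int, part_size: int = 64 * MIB) -> List[Tuple[int, int]]:
    """Split an object into inclusive (start, end) byte ranges of at most part_size bytes."""
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size must be > 0")
    ranges = []
    start = 0
    while start < object_size:
        # The last range may be shorter than part_size.
        end = min(start + part_size, object_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"
```

Each range can then be given to its own worker thread or process, which is where the parallelism comes from.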

S3 scalability: buckets and objects

Hydrating the Content Lake

[Diagram: content is uploaded over Direct Connect (N x 1G | 10G) to the massively scalable Amazon S3 front-end using multi-part upload]

Introducing AWS Import/Export Snowball

Scale and Speed

• Up to 50TB Capacity per device

• 10Gbps and 1Gbps connectivity

• Parallel data transfer enables PBs transferred in a week

Secure

• Tamper-resistant enclosure

• 256-bit encryption with KMS

• Secure data erasure

Simple

• Manage entire process through AWS Console

• Lightweight data transfer client

• Notifications

What is Snowball? Petabyte-scale data transport

• E-ink shipping label
• Ruggedized case (“8.5G impact”)
• All data encrypted end-to-end
• 50 TB
• 10G network
• Rain & dust resistant
• Tamper-resistant case & electronics

Can I drop it?

• No (please don’t)

• Snowball is its own box

• Has had many drop tests already

• Can handle 8.5G impacts

• Designed for shipping

How it works

What does it cost?

• $200 / job plus shipping

• Includes 10 days to fill the device at your site

• $15/day after the tenth day on site

• Standard Amazon S3 charges apply

• $0.03/GB to transfer data out

• $0.00/GB to transfer data in

How fast is that truck full of drives?

• Less than 1 day to transfer 250TB via 5x10G connections with 5 Snowballs, less than 1 week including shipping
• Number of days to transfer 250TB via the Internet at typical utilizations

Internet connection speed
Utilization   1Gbps   500Mbps   300Mbps   150Mbps
25%              95       190       316       632
50%              47        95       158       316
75%              32        63       105       211
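The day counts in the table above can be reproduced with a short calculation. This is a sketch under one assumption that is not stated on the slide: the figures appear to treat 1 TB as 1024 GB and 1 GB as 10^9 bytes. The function name is chosen for illustration.

```python
def transfer_days(terabytes: float, link_bps: float, utilization: float) -> float:
    """Days needed to move `terabytes` over a link of `link_bps` bits/s at a given utilization.

    Assumes 1 TB = 1024 GB and 1 GB = 10**9 bytes, which matches the slide's figures.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = terabytes * 1024 * 10**9 * 8
    seconds = bits / (link_bps * utilization)
    return seconds / 86400


# Example: 250 TB over a 1 Gbps link at 25% utilization.
days = round(transfer_days(250, 1e9, 0.25))
```

Rounding the result to the nearest whole day gives the values shown in the table.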

What does it cost?

Example 1:

• 250TB loaded on to 5 Snowballs

• 8 days at your site

• 5 * $200 = $1,000 plus shipping

Example 2:

• 30TB exported on to 1 Snowball

• 8 days at your site

• $200 + 30TB * $0.03/GB = $1,121.60 plus shipping
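The two examples above follow directly from the price list. The sketch below encodes that arithmetic; it assumes one job per Snowball (as in Example 1), assumes the $15/day fee applies per device, uses 1 TB = 1024 GB (as in Example 2), and leaves out shipping and standard Amazon S3 charges. The function and constant names are illustrative.

```python
GB_PER_TB = 1024            # matches the arithmetic in Example 2
JOB_FEE = 200.00            # per job; assumed to mean per Snowball
FREE_DAYS_ON_SITE = 10
EXTRA_DAY_FEE = 15.00       # per day after the tenth day; assumed per device
EXPORT_FEE_PER_GB = 0.03    # transfer out; transfer in is $0.00/GB


def snowball_cost(devices: int, days_on_site: int, export_tb: float = 0.0) -> float:
    """Estimated Snowball charges in dollars, excluding shipping and S3 charges."""
    extra_days = max(0, days_on_site - FREE_DAYS_ON_SITE)
    per_device = JOB_FEE + extra_days * EXTRA_DAY_FEE
    transfer_out = export_tb * GB_PER_TB * EXPORT_FEE_PER_GB
    return round(devices * per_device + transfer_out, 2)


example_1 = snowball_cost(devices=5, days_on_site=8)                 # 250TB import
example_2 = snowball_cost(devices=1, days_on_site=8, export_tb=30)   # 30TB export
```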

[Map: AWS Regions, Availability Zones, and Edge Locations]

Locations labeled on the map: Dallas (2), St. Louis, Miami, Jacksonville, Los Angeles (2), Seattle, Ashburn (3), Newark, New York (3), Dublin, London (2), Amsterdam (2), Stockholm, Frankfurt (2), Paris (2), Singapore (2), Hong Kong (2), Tokyo (2), Sao Paulo, South Bend, San Jose, Palo Alto, Hayward, Osaka, Milan, Sydney, Madrid, Seoul, Mumbai, Chennai

Regional Lakes …

S3 cross-region replication
Automated, fast, and reliable asynchronous replication of data across AWS regions

Source (Virginia) → Destination (Oregon)

• Only replicates new PUTs. Once S3 is configured, all new uploads into a source bucket will be replicated
• Entire bucket or prefix based
• 1:1 replication between any 2 regions

Use cases
• Compliance - store data hundreds of miles apart
• Lower latency - distribute data to remote customers/partners
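As an illustration of the "entire bucket or prefix based" choice, the sketch below builds a replication configuration as a plain Python dictionary. The field names follow the S3 replication configuration as commonly documented (a role plus a list of rules, each with a prefix, a status, and a destination bucket), but treat the exact shape as an assumption to check against the current API reference. The bucket name, role ARN, and rule ID are hypothetical, and versioning must be enabled on both buckets before replication can be turned on.

```python
from typing import Any, Dict


def build_replication_config(role_arn: str, destination_bucket: str, prefix: str = "") -> Dict[str, Any]:
    """Build a one-rule replication configuration.

    An empty prefix replicates the entire bucket; a non-empty prefix
    replicates only keys that start with it.
    """
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-" + (prefix.strip("/") or "all"),  # hypothetical naming scheme
                "Prefix": prefix,
                "Status": "Enabled",
                "Destination": {"Bucket": "arn:aws:s3:::" + destination_bucket},
            }
        ],
    }


# Hypothetical names, for illustration only.
config = build_replication_config(
    role_arn="arn:aws:iam::123456789012:role/example-replication-role",
    destination_bucket="example-content-lake-oregon",
    prefix="masters/",
)
```

A dictionary like this would typically be passed to an S3 client call that sets the replication configuration on the source bucket.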

Consuming the Content Lake

[Diagram: content is read from the massively scalable Amazon S3 front-end using range-gets, either by massively scalable compute on the AWS Cloud (with EBS and instance store) or by on-prem apps over Direct Connect (N x 1G | 10G)]

Object life cycle from hot to cold

S3 Standard
• Primary data
• 11 9’s of durability
• 2.75c – 3c per GB/month, $338 – $369 per TB/year

S3 – Infrequent Access
• Active archives
• Mezzanine files
• 11 9’s of durability
• 1.25c per GB/month, $154 per TB/year
• 1c per GB for retrievals

Glacier
• Deep/offline archives
• WORM-compliant data
• 11 9’s of durability
• 0.7c per GB/month, $86 per TB/year

Data tiering using Life Cycle Policies
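The per-TB/year figures quoted for each tier follow from the per-GB/month prices. A minimal sketch of the conversion, assuming 1 TB = 1024 GB (which reproduces the slide's numbers) and using the 2015 prices from the slide; the names are illustrative.

```python
# Prices in dollars per GB per month, as quoted on the slide (2015).
PRICE_PER_GB_MONTH = {
    "S3 Standard (low)": 0.0275,
    "S3 Standard (high)": 0.03,
    "S3 Infrequent Access": 0.0125,
    "Glacier": 0.007,
}


def per_tb_year(price_per_gb_month: float) -> float:
    """Convert a per-GB/month price to a per-TB/year price, assuming 1 TB = 1024 GB."""
    return price_per_gb_month * 1024 * 12


yearly = {tier: round(per_tb_year(price)) for tier, price in PRICE_PER_GB_MONTH.items()}
```

Retrieval charges, such as the 1c per GB for Infrequent Access, are not included in this conversion.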

Actual customer quote: “$0.0125 ?! OMG I will take all your storage!!!”

1 PB raw storage

800 TB usable storage

600 TB allocated storage

400 TB application data

S3 capacity pricing—pay only for what you use!

AWS Cloud

Storage

Securing your data on S3

• AWS alignment with the latest MPAA cloud based application guidelines for content security – August 2015

• VPC private endpoint for Amazon S3 – enables a true private workflow capability

• Encryption & key management capabilities

• Amazon Glacier Vault for high-value media/originals

S3 versioning

• Preserve, retrieve, and restore every version of every object stored in your bucket
• S3 automatically adds new versions and preserves deleted objects with delete markers
• Easily control the number of versions kept by using lifecycle expiration policies
• Easy to turn on in the AWS Management Console

[Diagram: with versioning enabled, a PUT of Key = photo.gif is stored as a separate version; the bucket holds Key = photo.gif with ID = 121212 and Key = photo.gif with ID = 111111]
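The behaviour described for versioning (every PUT adds a version, a delete adds a marker instead of erasing data) can be illustrated with a toy in-memory model. This is not the S3 API: the class, its methods, and the sequential version IDs are assumptions made for illustration, and real S3 version IDs are opaque strings.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class VersionedBucket:
    """Toy model of a versioned bucket: nothing is overwritten or erased."""

    def __init__(self) -> None:
        # key -> list of (version_id, body); a body of None is a delete marker.
        self._versions: Dict[str, List[Tuple[str, Optional[bytes]]]] = {}
        self._ids = itertools.count(111111)  # hypothetical sequential IDs

    def _append(self, key: str, body: Optional[bytes]) -> str:
        version_id = str(next(self._ids))
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def put(self, key: str, body: bytes) -> str:
        """Store a new version and return its ID; older versions are kept."""
        return self._append(key, body)

    def delete(self, key: str) -> str:
        """Add a delete marker; earlier versions stay retrievable by ID."""
        return self._append(key, None)

    def get(self, key: str, version_id: Optional[str] = None) -> Optional[bytes]:
        """Return the latest version, or a specific one; None if missing or deleted."""
        versions = self._versions.get(key, [])
        if version_id is None:
            return versions[-1][1] if versions else None
        for vid, body in versions:
            if vid == version_id:
                return body
        return None


bucket = VersionedBucket()
first_id = bucket.put("photo.gif", b"original")
second_id = bucket.put("photo.gif", b"retouched")
bucket.delete("photo.gif")
```

After the delete, a plain get returns nothing, while both earlier versions remain retrievable by ID.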

Amazon S3 event notifications

Delivers notifications to Amazon SNS, Amazon SQS, or AWS Lambda when events occur in Amazon S3

[Diagram: S3 events are delivered as notifications to an SNS topic, an SQS queue, or a Lambda function]

• Support for notification when objects are created via Put, Post, Copy, or Multipart Upload.
• Support for notification when objects are deleted, as well as filtering on prefixes and suffixes for all types of notifications.
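A Lambda function on the receiving end of these notifications mostly has to pull the bucket and key out of each event record. The sketch below shows that step only. It assumes the commonly documented S3 event layout (a `Records` list with `eventName`, `s3.bucket.name`, and a URL-encoded `s3.object.key`); the suffix list and function names are hypothetical, and the handler merely reports what it would process rather than starting a real job.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus

# Hypothetical set of media suffixes this pipeline cares about.
MEDIA_SUFFIXES = (".mov", ".mxf", ".mp4")


def new_media_objects(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for newly created media objects in an S3 event."""
    found = []
    for record in event.get("Records", []):
        # Covers Put, Post, Copy, and CompleteMultipartUpload creation events.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded, with spaces encoded as '+'.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(MEDIA_SUFFIXES):
            found.append((bucket, key))
    return found


def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[str]]:
    """Lambda-style entry point; a real pipeline would start a processing job here."""
    return {"queued": [f"s3://{bucket}/{key}" for bucket, key in new_media_objects(event)]}


sample_event = {
    "Records": [
        {"eventName": "ObjectCreated:Put",
         "s3": {"bucket": {"name": "example-ingest"}, "object": {"key": "ingest/my+clip.MOV"}}},
        {"eventName": "ObjectRemoved:Delete",
         "s3": {"bucket": {"name": "example-ingest"}, "object": {"key": "ingest/old.mov"}}},
    ]
}
result = handler(sample_event, None)
```

Suffix filtering can also be configured on the notification itself, in which case the check inside the function acts only as a safeguard.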

Reference Architecture – Content Processing Pipeline (Using Lambda)

• Ingest: upload content using the S3 multi-part API
• S3 as backend storage for content files, accessible to other processing tasks
• S3 notification: trigger a Lambda function to start a transcoding job (Amazon Elastic Transcoder)
• S3 notification: Lambda function to generate a signed URL to share the file
• Update CMS or metadata

Elastic File System - Rendering in the Cloud

• Designed to support petabyte scale file systems

• Throughput scales linearly with storage

• Same latency spec across each AZ

• Thousands of concurrent NFS connections

• Works great for large I/O sizes

• Pay for only what you use not what you provision

• Managed with multi-copy durability

Media Workloads (redefined)

[Diagram labels:
• Storage: Amazon S3
• Process: Amazon EBS/EFS/EC2 Instance Store (disposable infrastructure, auto-scaling, workload specific)
• Archive: Amazon Glacier (Life Cycle Policies)
• Content access and transfer over Direct Connect: on-prem apps, VFX/production, partner/affiliate/service provider, user delivery/consumption]

Q&A

Learn more at: http://aws.amazon.com/s3/

http://aws.amazon.com/glacier/


How is my data transported securely?

• Strong chain of custody
• Tamper-resistant case
• Tamper-resistant electronics (TPM)
• Each Snowball is erased according to NIST 800-88 media sanitization guidelines between every job

How fast is that truck full of drives?

• Less than 1 day to transfer 50TB via a 10G connection with Snowball, less than 1 week including shipping
• Number of days to transfer 50TB via the Internet at typical utilizations

Internet connection speed
Utilization   1Gbps   500Mbps   300Mbps   150Mbps
25%              19        38        63       126
50%               9        19        32        63
75%               6        13        21        42

What does it cost?

Example 1:
• 40TB loaded on to 1 Snowball
• 2 days at your site
• $200 plus shipping

Example 2:
• 30TB loaded on to 1 Snowball
• 12 days at your site
• $200 + 2*$15/day = $230 plus shipping

Media Storage Services

Amazon EBS
• Block storage for use with Amazon EC2
• Scalable performance
• Up to 16TB/volume
• Up to 20K IOPS
• SSD backed
• Encryption

Amazon S3
• Massively scalable storage & front-end
• 11 9’s of durability
• Internet scale storage via API

Amazon Glacier
• Storage for archiving and backup
• $0.01/GB/month
• 11 9’s of durability
• Multiple copies across different DCs

Amazon EFS
• Shared file storage for use with Amazon EC2
• Massively scalable
• Storage scales up & down