Aspera OnDemand: S3-direct
Michelle Munson
President, CEO and Co-founder of Aspera
Oct 2011
Cloud Computing – Why is it so compelling?
1. The potential of infinite computing resources, on demand
– Eliminates the need to plan ahead
– Meet demand - without the lead-time bottleneck
2. The elimination of an up-front commitment
– Reduce capital outlay and investment risk
– Start small & increase h/w resources to match need
– Auto-scale to meet demand
3. Pay-for-use resource model
– CPU’s by the hour
– Storage by the day
– Bandwidth by the GB
– No commitment: Release assets & remove costs as needed
Source: Above the Clouds: A Berkeley View of Cloud Computing, P1, 2009
AWS for Media Production & Distribution
• Content Creation
– Compute Intensive: EC2 (10’s, 100’s, 1000’s of CPUs)
• Transcoding, encoding, watermarking, video editing
• Rendering & HPC applications
• Mission Critical Storage and Distribution
– Long-term archive & backup
– Near-line storage for compute
– B2B/B2C media ingest & distribution
• Monetization & Play Out
– Release, project and event specific marketing & social media
– Brand awareness & franchise continuity
– CDN and Delivery
• AWS Cloud Front
M&E Big-Data: Big & Getting Bigger
A single digital cinema production can comprise 800K–1M 2K/4K frames
Large Data Freighting: Underpins Content Supply & Creation
Characterizing & Understanding
Big-Data Cloud Transfer Bottlenecks
Two Major Bottlenecks: WAN Transfer & Local HTTP I/O
1st Bottleneck – WAN
Transfers over the WAN are TCP-based (FTP, SCP, HTTP, etc.)
2nd Bottleneck – Data Center
“Last-foot” local transfers from EC2 to S3 can use multiple HTTP connections
(Diagram: Client → WAN → Server (EC2) → S3)
1st Bottleneck: WAN Transfers
#1 WAN Transfer: Local machine to EC2 – Effective throughput
• Single HTTP transfer, typical internet conditions (50–250 ms latency, 0.1–3% packet loss): 0.5 to 5 Mbps
• 15 parallel HTTP streams: 7 to 75 Mbps
• Aspera fasp transfer: up to 700 Mbps
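The single-stream numbers above follow from TCP's loss-based congestion control. A minimal sketch of the effect, using the Mathis et al. steady-state TCP throughput bound (an approximation, not an Aspera formula; a 1460-byte MSS is assumed):

```python
import math

def mathis_throughput_mbps(rtt_s, loss, mss_bytes=1460):
    """Mathis steady-state TCP bound: throughput <= MSS / (RTT * sqrt(loss))."""
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss)) / 1e6

# Typical internet conditions cited above
print(round(mathis_throughput_mbps(0.100, 0.01), 2))  # 100 ms RTT, 1% loss -> ~1.17 Mbps
print(round(mathis_throughput_mbps(0.250, 0.03), 2))  # 250 ms RTT, 3% loss -> ~0.27 Mbps
```

Single-digit-Mbps results at these latencies and loss rates are why parallel streams (or a non-TCP transport like fasp) are needed to fill the link.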
Aspera Solution – fasp Optimal Transfer Performance
• Optimal end-to-end throughput efficiency
– Full utilization of commodity Internet bandwidth
– Highly bandwidth efficient
• Low Overhead
– Less than 0.1% overhead on 30% packet loss
– Full utilization of storage throughput
– Equal performance with large files or large collections of small files
• Congestion Avoidance and Policy Control
– Real-time policy based bandwidth control
– Congestion avoidance (WAN, LAN, Disk)
• The Result:
– Transfers up to thousands of times faster than FTP
– Precise and predictable transfer times
– End-to-end policy control over transfer priority and
speed
fasp – Built-in Security & Reliability
• Secure user/endpoint authentication
– Standard secure shell (SSH)
– Standard system authentication and user access control (LDAP, AD)
• AES-128 encryption in transit and at rest
– Real-time in transit packet encryption
• GUI, API, Web & Mobile
– Encryption at rest per recipient (secured storage of transferred content)
• Data integrity verification
– For each transmitted data block
• Automatic resume of partial or failed transfers
– GUI/CLI/API: Stop/ Start/ Pause/ Resume
• Automatic HTTP fallback in restricted networks
fasp – Management & Control
• Extraordinary bandwidth control
– Automatic, full utilization of available bandwidth
– Protection of other network traffic
– On-the-fly, per flow, user and job prioritization
– Highly-concurrent transfer stacking
• Scalable, system-wide monitoring and reporting
– Real-time progress and performance analysis
– Real-time bandwidth utilization
– Detailed transfer history, logging and manifest
• Centralized, network-wide command and control
– Per transfer, user, group and node
– Manage and create global transfer policies
– Remotely initiate, schedule and automate transfer jobs
2nd Bottleneck: Data Center/ Local HTTP I/O
“Last-foot” local transfers from the server (EC2) to storage (S3) use one or many HTTP connections
AWS S3: 449 Billion objects and counting
Amazon’s S3 is the premier cloud storage system
• 449 Billion objects and counting
• 1,440 objects for every resident of the US
• 64 objects for each person on Planet Earth
• About as many objects as there are stars in the Milky Way
http://aws.typepad.com/aws/2011/07/amazon-s3-more-than-449-billion-objects.html
Some History: 2006-2009 Big-Data & AWS
March 2006:
– AWS Launches the S3 object storage system
June 2009:
– AWS announces physical import/export for AWS
September 2009:
– Aspera launches Aspera On-Demand for AWS
(Slide graphic: big-data types – Video & Graphics, Genetic Sequencing, Photos & Imaging, Computer Modeling, Music / Audio, PDFs)
Sept 2009: Aspera Launches On-Demand for AWS
• On-Demand: powered by Aspera’s patented fasp™ technology
– Next-generation transport protocol for digital media
– Eliminates the latency & packet loss bottlenecks of TCP
– Reliable & secure asset delivery system
– Replaces FTP, HTTP, NFS, CIFS, tape and disks
– Seamless integration w/ all Aspera Clients and Console Management
• Aspera On-Demand lowered the network barrier to AWS adoption:
– Solved Bottleneck #1
– But didn’t transfer data directly to/from S3 – Bottleneck #2
Aspera On-Demand Version 1 – No S3 Support…
(Diagram: Aspera Client, Aspera Connect Browser Plugin, and Aspera Mobile transfer over the WAN via fasp to the Aspera On-Demand Server on EC2, which writes to local HD and Elastic Block Store – EBS (NAS); S3 is reachable only over EC2 HTTP)
Aspera On-Demand v.1 did not read/write data to S3
2011: Big-Data REALLY Meets AWS…
Dec 2010: AWS Announces Major S3 Upgrade
– S3 object size increased from 5 GB to 5 TB
– AWS introduces a multipart HTTP uploader
– APIs available in Java, .NET, PHP & REST
Fantastic… but now what?
• Still HTTP over the WAN (SLOW)
• Still have to “glue” any fasp high-speed transfer to S3 I/O in custom s/w – big speed bump!
– Find an expert s/w team
– Build upon the multipart API
– Concurrently stream data to S3
– Integrate into operations
S3 Big-Data i/o: API or Application
Step | Action
1 | Initiate multipart upload by providing your AWS credentials
2 | Provide the required bucket name and key name
3 | Save the upload ID for each subsequent multipart upload operation
4 | Upload parts, providing part upload information (upload ID, bucket name, part number)
5 | Save the responses (ETag value and the part number)
6 | Repeat steps 4 and 5 for each part of your object
7 | Execute a final call to complete the multipart upload
(S3 multipart upload over HTTP: Multi-Part Uploader API from AWS, or commercial tools)
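The seven steps above can be sketched as code. The `StubS3` client below is an in-memory stand-in that only mimics the call shape of an S3 API client (real uploads would go through the AWS SDK); bucket and key names are hypothetical:

```python
import io

def multipart_upload(client, data, bucket, key, part_size=5 * 1024 * 1024):
    # Steps 1-3: initiate the upload and keep the returned upload ID
    upload_id = client.create_multipart_upload(bucket, key)
    parts = []
    stream, part_number = io.BytesIO(data), 1
    while True:
        chunk = stream.read(part_size)
        if not chunk:
            break
        # Steps 4-6: upload each part and record its ETag + part number
        etag = client.upload_part(bucket, key, upload_id, part_number, chunk)
        parts.append((part_number, etag))
        part_number += 1
    # Step 7: a final call stitches the parts into one object
    client.complete_multipart_upload(bucket, key, upload_id, parts)
    return parts

class StubS3:
    """In-memory stand-in for an S3 client, for local illustration only."""
    def __init__(self):
        self.objects, self.pending = {}, {}
    def create_multipart_upload(self, bucket, key):
        self.pending[(bucket, key)] = {}
        return "upload-1"
    def upload_part(self, bucket, key, upload_id, part_number, body):
        self.pending[(bucket, key)][part_number] = body
        return "etag-%d" % part_number
    def complete_multipart_upload(self, bucket, key, upload_id, parts):
        chunks = self.pending.pop((bucket, key))
        self.objects[(bucket, key)] = b"".join(chunks[n] for n, _ in parts)
```

Even in this toy form, the bookkeeping (upload IDs, per-part ETags, the final completion call) shows why the slide says the API "requires a team of experts" compared to a plain file write.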
Why These Challenges of Storing Big Files in the Cloud?
• Designed as scalable distributed object stores
– Target applications require simple read/write operations on binary “blobs”, indexed by a single primary key
– Should work well for storing large numbers of media files, compared to traditional file systems
• BUT
– “Blob” sizes are small (<64 MB) => large media files must be “chunked”
– Data I/O uses the standard HTTP protocol – VERY SLOW at distance
– API for managing data requires a team of experts
• M&E/ Big-Data services require high-speed software bridge over the WAN
– Large files to be moved at full bandwidth capacity w/ global access
– Must overcome the WAN and the I/O bottleneck
– Must allow for writing media files of any size
– Must be transparent to the end user uploading / downloading (GUI, command line, browser, etc.)
Solving the Big-Data i/o at Scale
With the help of AWS, Aspera did a full characterization of AWS S3 i/o:
• Upload/Download performance vs. thread count
• Upload/Download performance vs. chunk size
• 24hr upload stability w/ fixed thread size
• 24hr download stability w/ fixed chunk
• Upload/Download performance vs. duration
• DNS lookup performance
• Performance w/ concurrent access to single S3 bucket
• Performance w/ max connections per host
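A characterization like the one above can be sketched with a small sweep harness. This is not Aspera's tooling; `upload_chunk` is a simulated stand-in (a fixed sleep) for a real HTTP part upload to S3:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def upload_chunk(chunk):
    """Stand-in for an HTTP part upload to S3 (simulated fixed latency)."""
    time.sleep(0.01)

def measure_mbps(total_bytes, chunk_size, threads):
    """Upload total_bytes in chunk_size pieces on `threads` workers; return Mbps."""
    chunks = [b"\0" * chunk_size] * (total_bytes // chunk_size)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(upload_chunk, chunks))
    elapsed = time.perf_counter() - start
    return total_bytes * 8 / elapsed / 1e6

# Sweep thread count and chunk size, as in the characterization above
for threads in (1, 4, 16):
    for chunk_mb in (1, 8):
        measure_mbps(16 * 2**20, chunk_mb * 2**20, threads)
```

With a real S3 endpoint the interesting output is exactly the two-dimensional surface the slide lists: throughput as a function of thread count and chunk size, plus long-duration stability runs.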
The Result? Aspera On-Demand S3-direct
(Diagram: Aspera Client, Aspera Connect Browser Plugin, and Aspera Mobile transfer over the WAN via fasp to the Aspera On-Demand Server / fasp-S3 Gateway on EC2; the gateway streams through server RAM into S3 over optimized HTTP multipart (Parts API) i/o)
Aspera On-Demand S3-direct:
• Full client-side r/w of S3
• Synchronous transfer from Client to S3 (via EC2 Aspera On-Demand)
• Real-time optimization of HTTP threads
• Real-time optimization of chunk size
Overcoming Both Bottlenecks - Transferring Data to S3 over WAN
#1 - Transfer Data to EC2 over WAN – Effective throughput
• Single-stream HTTP over WAN, typical internet conditions (50–250 ms latency, 0.1–3% packet loss): 0.5 to 5 Mbps
• 15 parallel HTTP streams: 7.5 to 100 Mbps
• Aspera fasp transfer over WAN to EC2: up to 700 Mbps
#2 - Transfer Data from EC2 to S3 – Effective throughput
• Standard single-stream HTTP: 20 to 100 Mbps
• Aspera S3 Proxy with parallel HTTP I/O streams: up to 700 Mbps
fasp™ transfer times by bandwidth:
File size | 45 Mbps | 100 Mbps | 200 Mbps | 1 Gbps | 5 Gbps | 10 Gbps
1 GB | 3.2 min | 1.4 min | 42 sec | 8.4 sec | 1.6 sec | 0.8 sec
10 GB | 32 min | 14 min | 7 min | 1.4 min | 16 sec | 8.2 sec
100 GB | 5.3 hrs | 2.3 hrs | 1.2 hrs | 14 min | 2.7 min | 82 sec
1 TB | 2.1 days | 23 hrs | 11.7 hrs | 2.3 hrs | 28 min | 14 min
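The table above can be sanity-checked against the idealized bound time = size / bandwidth (ignoring protocol overhead; 1 GB taken as 8,000 Mb, 1 TB as 1,024 GB):

```python
def transfer_seconds(gigabytes, mbps):
    """Idealized transfer time for `gigabytes` over a link of `mbps` megabit/s."""
    return gigabytes * 8000.0 / mbps

print(transfer_seconds(10, 5000))           # 10 GB at 5 Gbps -> 16.0 s
print(transfer_seconds(1024, 45) / 86400)   # 1 TB at 45 Mbps -> ~2.1 days
```

The table entries sit slightly above these ideal figures, consistent with a transport running near, but not at, the raw link rate.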
Aspera On-Demand S3-direct
(Diagram: Aspera Client &/or Server, Aspera Connect Browser Plugin, and Aspera Mobile transfer over the WAN via fasp to the Aspera On-Demand Server on EC2, which reads/writes S3 directly (fasp-S3 over EC2 HTTP) in addition to local HD and Elastic Block Store – EBS (NAS))
Applications: 2K/4K Global Freighting (faspframes)
• Native 2K/4K frame transport software
• Designed for 10Gbps WANs
• Millions of frame files
• 60 min of footage (1 TB) transferred globally in under 20 minutes !
• 8 Gbps at 200 ms / 2%
Aspera faspframes Transfer Times
10 Gbps Global WANs
Distance | Speed | Transfer Time for 1 TB (~60 min of film)
LA–NY (100 ms / 1%) | 8.1 Gbps | 18.1 minutes
LA–London (200 ms / 2%) | 7.9 Gbps | 18.6 minutes
LA–Mumbai (300 ms / 5%) | 6.3 Gbps | 23.3 minutes
Compare to: HW appliance for 2K/4K transfers – highest-capacity model
Distance | Speed | Transfer Time for 1 TB (~60 min of film)
LA–NY (100 ms / 1%) | 3.6 Gbps | 42 minutes
LA–London (200 ms / 2%) | No data | –
LA–Mumbai (300 ms / 5%) | No data | –
faspframes – Ultra Simple, Ultra Fast s/w for 2K/4K Transfers
What is it?
• An ultra-simple software tool for ultra-fast (fully reliable) transfers of 2K/4K frame files
• Max speed in-order transfer of 2K/4K frame files over WAN (any distance, any bandwidth)
• Available for users of Aspera Point-to-Point and Server
Advantages?
• Software-only application – integrates easily with any workflow
• No clunky brute force hardware appliances to integrate
• Full 10 Gbps performance; 2X the best speeds published by appliances
• Comprehensive bandwidth management and congestion control
• Seamlessly integrates with Aspera transfer and management tools
Platforms?
• Linux 32/64-bit
• Other platforms coming
Big-Data: Accessed & Delivered by Aspera