Introduction to Manta
Rod Boothby, VP
415-819-9253
August 12, 2013

Intro to Joyent's Manta Object Storage Service


2. Object Stores are the Future

[Chart: IDC Worldwide Server Sales in $ Millions vs. Billions of Objects in AWS S3, Oct-06 to Aug-13]
The number of objects in Amazon S3 is growing fast.
Server sales are basically flat.

3. Manta is Joyent's new Object Storage Service

Put data into Manta and get data from Manta via a RESTful API.
An object is non-interpreted data of any size that you read from and write to the store.

4. Manta is Live and Available Today

http://www.joyent.com/products/manta

5. A file is an example of an object

The code below does the following:
1. Creates a file called hello.txt that contains the words "Hello, Manta"
2. Puts the file into Manta
3. Gets the file back from Manta and outputs its contents

    $ echo "Hello, Manta" > /tmp/hello.txt
    $ mput -f /tmp/hello.txt /$MANTA_USER/stor/hello-foo
    /$MANTA_USER/stor/hello-foo [====================>] 100% 13B
    $ mget /$MANTA_USER/stor/hello-foo
    Hello, Manta

6. Manta Partners support File Interfaces

Partners offer NAS file interfaces that run in existing data centers but back up to the Manta Object Store.
The Panzura solution is available today. The other solutions are due to be available by the end of Q4 2013.

7. Manta adds Big Data to Object Storage

Only 1 step: analyze or process data using Manta Jobs.
Send in the Big Data job.
Manta acts like a Platform as a Service (PaaS) for Big Data analytics.
Manta is the only Object Storage System that brings Compute directly to the Data.

8. Big Data is easy on Manta vs complex on AWS

With a cloud object store such as S3:
1 - Download data
2 - Analyze or process data
3 - Upload data again

Netflix has open-sourced their Genie management tools for running Hadoop jobs with S3.
To analyze data in S3, the Netflix system requires coordinating 9 pieces of software: Hadoop, Hive, Pig, Karyon, Servo, Ribbon, Archaius, Eureka, and Genie.
Big Data analytics on AWS/S3 requires 3 complex steps vs 1 simple step on Manta.

9. S3 + EC2 also requires new Sysadmins

Admins are needed because Genie is not an end-to-end resource management tool: it doesn't provision or launch clusters, and neither does it scale clusters up and down based on their utilization.
End users are the data scientists who want to analyze or process data stored in S3.

10. Big Data Made Simple

Single store of record for your data
Do analysis without the learning curve of server administration
Do big data analysis in any language

"There is no learning curve to run Manta for us, since it runs on Unix."
Konstantin Gredeskoul, CTO

11. Manta delivers Value

Requests
DELETE                       Free
POST, PUT, LIST (GET DIR)    $0.005 per 1,000 requests
GET, OPTIONS, HEAD           $0.004 per 10,000 requests

Bandwidth
All bandwidth in             $0.000 (free)
Bandwidth out after 1st TB   $0.120/GB to $0.050/GB

Storage
Storage tier          Per individual copy   Per 2 copies (default)
First 1 TB/month      $0.043 per GB         $0.086 per GB
Next 49 TB/month      $0.036 per GB         $0.072 per GB
Next 450 TB/month     $0.032 per GB         $0.064 per GB
Next 500 TB/month     $0.029 per GB         $0.058 per GB
Next 4000 TB/month    $0.027 per GB         $0.054 per GB
Next 5000 TB/month    $0.025 per GB         $0.050 per GB

Default is 2 copies. When submitting an object to the service, you can specify the number of copies stored, from one (1) to six (6).

Compute
$0.00004 per GB of DRAM per second
If you run 1000 parallel tasks on 1000 objects and they each take a second, then you've used 1000 seconds of time and the cost for this job would be $0.04 (the figure works out if each task uses 1 GB of DRAM).
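The pricing figures on the "Manta delivers Value" slide can be checked with a short calculation. The sketch below is only an illustration: it assumes the storage tiers are marginal (each rate applies only to the gigabytes inside its tier), that 1 TB means 1000 GB, and that the per-copy price is simply multiplied by the number of copies. None of these assumptions is stated on the slide, and the names `storage_cost` and `compute_cost` are made up for this example.

```python
# Prices transcribed from the pricing slide.
GB_PER_TB = 1000  # assumption: decimal terabytes; the slide does not say

# (tier size in TB, USD per GB per month for ONE copy)
STORAGE_TIERS = [
    (1, 0.043),
    (49, 0.036),
    (450, 0.032),
    (500, 0.029),
    (4000, 0.027),
    (5000, 0.025),
]

COMPUTE_RATE = 0.00004  # USD per GB of DRAM per second


def storage_cost(stored_gb: float, copies: int = 2) -> float:
    """Monthly storage cost in USD, treating the tiers as marginal (assumption)."""
    if stored_gb < 0:
        raise ValueError("stored_gb must not be negative")
    if not 1 <= copies <= 6:
        raise ValueError("copies must be between 1 and 6")
    remaining = stored_gb
    per_copy = 0.0
    for tier_tb, price_per_gb in STORAGE_TIERS:
        in_tier = min(remaining, tier_tb * GB_PER_TB)
        per_copy += in_tier * price_per_gb
        remaining -= in_tier
        if remaining <= 0:
            break
    if remaining > 0:
        # The slide lists no price beyond the last tier.
        raise ValueError("amount exceeds the tiers listed on the slide")
    return per_copy * copies


def compute_cost(tasks: int, seconds_per_task: float, dram_gb: float = 1.0) -> float:
    """Job cost in USD; dram_gb=1.0 is the assumption that reproduces the slide's $0.04."""
    return tasks * seconds_per_task * dram_gb * COMPUTE_RATE


# The slide's example: 1000 tasks of one second each.
print(f"{compute_cost(1000, 1.0):.2f}")          # prints 0.04
# 500 GB stored with the default 2 copies.
print(f"{storage_cost(500, copies=2):.2f}")      # prints 43.00
```

Under these assumptions, 500 GB at the default two copies comes to the "Per 2 copies" rate of $0.086 per GB, which matches the table's first row.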
12. Technical Appendix

13. Accessing Manta is Easy

Manta REST API
Manta CLI & Shell
Manta Node.js SDK
Manta Python SDK
Manta Ruby SDK
Manta Java SDK

14. Technical Description of Manta

Multi-datacenter Object Store
Granular datacenter and copy policies
No size limits
In-kernel (clustered ZFS DMU)
    More akin to a MetroCluster NetApp
    S3: JVM on ext3 on Linux
Strongly consistent and transactional data semantics
Close to UNIX file-system semantics

15. Analytics Capability: Codename Marlin

A facility for running compute jobs directly on Manta storage nodes
    Complete EC2-like batch compute environment
A framework for distributing work to the right physical servers, tracking which pieces are complete, capturing the output, and repeating the whole process to facilitate multi-phase computation on objects at rest
    Complete Unix environment without any ETL
A non-interactive Unix shell environment for doing "work" on Manta objects as local files

16. Why Marlin is Revolutionary

Customers are able to run queries, create data pipes, do transformations, and run map-reduce on objects very quickly, without data movement and without the additional costs of spinning up instances.

17. Big Data Use Case Examples - Part 1

Log processing: clickstream analysis, map-reduce on logs
Image processing: converting formats, generating thumbnails
Video processing: transcoding, extracting segments, resizing
"Hardcore" data analysis: NumPy, SciPy, R, machine learning, data mining

18. Big Data Use Case Examples - Part 2

SQL-like queries over structured data: similar to what Hive provides for Hadoop
Data pipelining: MySQL, Postgres, plus other clients
Text processing: e-discovery and internal search engines
Backup and disaster recovery: encrypt and verify integrity without moving or downloading the data
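The multi-phase model described for Marlin (run a task against each object, capture the outputs, then feed them to a following phase) can be illustrated with a toy local model of the log-processing use case. This is not the Manta or Marlin API: the object paths, log contents, and the helpers `run_map_phase` and `run_reduce_phase` are all invented for this sketch, and everything runs in one Python process rather than on storage nodes.

```python
from collections import Counter
from typing import Callable, Dict, List

# Toy in-memory stand-in for objects at rest; paths and contents are invented.
OBJECTS: Dict[str, str] = {
    "/example/stor/logs/a.log": "GET /index\nGET /about\nPOST /login\n",
    "/example/stor/logs/b.log": "GET /index\nPOST /login\nPOST /login\n",
}


def run_map_phase(objects: Dict[str, str],
                  task: Callable[[str], Counter]) -> List[Counter]:
    """Run `task` once per object and collect each task's output."""
    return [task(body) for _, body in sorted(objects.items())]


def run_reduce_phase(outputs: List[Counter],
                     task: Callable[[List[Counter]], Counter]) -> Counter:
    """Run `task` once over all outputs captured from the previous phase."""
    return task(outputs)


def count_methods(body: str) -> Counter:
    """Map task: count requests per HTTP method in one log object."""
    return Counter(line.split()[0] for line in body.splitlines() if line.strip())


def merge_counts(partials: List[Counter]) -> Counter:
    """Reduce task: sum the per-object counts."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


result = run_reduce_phase(run_map_phase(OBJECTS, count_methods), merge_counts)
print(sorted(result.items()))  # prints [('GET', 3), ('POST', 3)]
```

In this model the map task only ever sees one object's contents, which is the property that lets a system like the one described schedule each task on the server that already holds the object.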
19. Key Security & Sharing Example

With rich access controls in Manta, it is possible to run compute on other users' data that's been made available to you:
Without actually having access to it
Without having to ship it
Without being able to egress the dataset itself

20. Thank You