Alluxio: Unify Data at Memory Speed
Product Overview
September 26, 2017
Confidential © Alluxio, Inc. All Rights Reserved. 2
Agenda
1 Why we built Alluxio
2 Alluxio’s innovations
3 Use cases
Data Ecosystem Yesterday
• One Compute Framework
• Single Storage System
• Co-located
[Diagram labels: ETL]
Data Ecosystem Today
• Many Compute Frameworks
• Multiple Storage Systems
• Most not co-located
Data Ecosystem Issues
• Each application manages multiple data sources
• Adding or removing data sources requires application changes
• Storage optimizations require application changes
• Lower performance due to lack of locality
Data Ecosystem Challenges
1 Speed & Complexity
• Many storage & compute systems
• Integration and interoperability issues (on prem, hybrid, cloud)
• Many departments & groups
2 Data Freshness
• Real time data?
• Cross-network movement is slow
• Each ETL creates more lag
3 Cost
• Data and app explosion driving cost up
• Data duplication
4 Security & Governance
• Data security & governance is increasingly complex
Heavy integrations create painful organizational drag
This is why we built Alluxio
A unified data solution for the digital economy
Data Ecosystem with Alluxio
• Apps only talk to Alluxio
• Simple Add/Remove
• No App Changes
• Highest performance in Memory
• No lock-in
Application interfaces: Native File System, Hadoop Compatible File System, REST Web Service, Key-Value Interface
Storage interfaces: HDFS Interface, Amazon S3 Interface, Swift Interface, NFS Interface
Fastest Growing Big Data Open Source Project
[Chart: Open Source Contributors by Month (GitHub). Y-axis: number of contributors (0 to 500). X-axis: month (0 to 45). Series: Alluxio, Spark, Kafka, Redis, HDFS, Cassandra, Hive.]
Fastest Growing open-source project in the big data ecosystem
Running world’s largest production clusters
600+ Contributors from 100+ organizations
Selection of customers
Alluxio Design Principles
1 Big Data & Machine Learning
• Interoperability with leading projects
• Large scale data sets
• High IO
2 Data Sharing
• Don’t own the data
• Multiple apps sharing common data
• Data stored in multiple, hybrid systems
3 High Speed Data Access
• Remote data
• Hot/warm/cold data
• Temporary data
• Read/write support
4 Enterprise Class
• Distributed architecture
• Commodity hardware
• Service-oriented
• High availability
• Security
Alluxio Innovation: Unified Namespace
Enables effective data management across different Under Stores
Uses Mounting with Transparent Naming
Alluxio Innovation: Unified Namespace
Create a catalog of available data sources for Data Scientists
alluxio://
• /finance/customer-transactions/
• /finance/vendor-transactions/
• /operations/device-logs/
• /operations/phone-call-recordings/
• /operations/check-images/
• /research/us-economic-data/
• /research/intl-economic-data/
• /marketing/advertising-dataset/
• /marketing/marketing-funnel-dataset/
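The idea of mounting with transparent naming can be illustrated with a small sketch. This is a toy model written for this overview, not Alluxio's implementation or client API: the `MountTable` class, its method names, and the under-store URIs are all hypothetical. It shows how a logical path in one namespace could be resolved to whichever storage system backs it, using longest-prefix matching.

```python
class MountTable:
    """Toy unified namespace: maps logical path prefixes to under-store URIs.

    Hypothetical illustration only; not Alluxio's actual mount logic or API.
    """

    def __init__(self):
        self._mounts = {}

    def mount(self, logical_prefix, under_store_uri):
        # Normalize trailing slashes so "/finance/" and "/finance" are the same mount.
        self._mounts[logical_prefix.rstrip("/")] = under_store_uri.rstrip("/")

    def resolve(self, logical_path):
        """Return the under-store location backing a logical path."""
        path = logical_path.rstrip("/")
        # Longest prefix first, so the most specific mount point wins.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                return self._mounts[prefix] + path[len(prefix):]
        raise KeyError(f"no mount covers {logical_path}")


# Example with made-up under-store locations.
table = MountTable()
table.mount("/finance/customer-transactions/", "hdfs://example-namenode:8020/tx")
table.mount("/operations/device-logs/", "s3a://example-bucket/logs")
location = table.resolve("/operations/device-logs/2017/09/26.log")
```

An application only ever sees the logical path; adding or removing a data source changes the mount table, not the application.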
Alluxio Innovation: Server-side API Translation
Convert from Client-side Interface to Native Storage Interface
Client side: HDFS Interface
Storage side: HDFS Interface, S3A Interface, Swift Interface, Google Cloud Interface
Alluxio Innovation: Server-side API Translation
Convert between different versions of HDFS
Client side: HDFS 2.7 Interface
Storage side: HDP 2.4 Interface, CDH 5.6 Interface, MapR 5.2 Interface
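Server-side translation is essentially an adapter layer: the client speaks one interface, and the server converts each call into the native call of the storage system behind it. The sketch below is a hypothetical illustration of that pattern with in-memory stand-ins; the class names (`UnderStore`, `InMemoryObjectStore`, `InMemoryFileStore`, `TranslationLayer`) are invented here and do not correspond to Alluxio classes.

```python
from abc import ABC, abstractmethod


class UnderStore(ABC):
    """Hypothetical storage-side adapter: one subclass per native storage API."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        ...


class InMemoryObjectStore(UnderStore):
    """Stands in for an object store: flat keys without a leading slash."""

    def __init__(self, objects):
        self._objects = dict(objects)

    def read(self, path):
        # Translate a file-style path into a flat object key.
        return self._objects[path.lstrip("/")]


class InMemoryFileStore(UnderStore):
    """Stands in for a file system: directories modeled as nested dicts."""

    def __init__(self, tree):
        self._tree = tree

    def read(self, path):
        # Translate a file-style path into a directory-by-directory walk.
        node = self._tree
        for part in path.strip("/").split("/"):
            node = node[part]
        return node


class TranslationLayer:
    """The client uses one read call; the server picks the right adapter."""

    def __init__(self):
        self._stores = {}

    def register(self, name, store):
        self._stores[name] = store

    def read(self, store_name, path):
        return self._stores[store_name].read(path)


layer = TranslationLayer()
layer.register("objects", InMemoryObjectStore({"logs/a.txt": b"from object store"}))
layer.register("files", InMemoryFileStore({"logs": {"a.txt": b"from file system"}}))
```

Because the translation happens on the server side, the client code calling `layer.read(...)` stays the same whichever store holds the data.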
Alluxio Innovation: Intelligent Cache
Local performance from remote data using native multi-tier storage
Storage tiers: RAM (hot), SSD (warm), HDD (cold)
• Read & write buffering, transparent to app
• Policies for pinning, promotion/demotion, TTL
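The policies named on this slide (pinning, promotion/demotion, TTL) can be sketched with a toy multi-tier cache. This is an assumption-laden illustration rather than Alluxio's eviction code: the `TieredCache` class is hypothetical, tiers are sized by item count instead of bytes, and least-recently-used ordering is an assumed demotion policy.

```python
import time
from collections import OrderedDict


class TieredCache:
    """Toy multi-tier cache: tier 0 is hottest (RAM), the last tier is coldest (HDD).

    Hypothetical sketch; capacities count items, and demotion uses LRU order.
    """

    def __init__(self, capacities, clock=time.monotonic):
        # One LRU-ordered dict per tier; each value is (data, expires_at or None).
        self._tiers = [OrderedDict() for _ in capacities]
        self._capacities = list(capacities)
        self._pinned = set()
        self._clock = clock

    def put(self, key, data, ttl=None, pinned=False):
        self._remove(key)
        if pinned:
            self._pinned.add(key)
        expires_at = None if ttl is None else self._clock() + ttl
        self._insert(0, key, (data, expires_at))

    def get(self, key):
        for tier in self._tiers:
            if key in tier:
                data, expires_at = tier.pop(key)
                if expires_at is not None and self._clock() >= expires_at:
                    self._pinned.discard(key)
                    return None  # TTL elapsed: treat as a miss
                self._insert(0, key, (data, expires_at))  # promote on access
                return data
        return None

    def tier_of(self, key):
        for level, tier in enumerate(self._tiers):
            if key in tier:
                return level
        return None

    def _remove(self, key):
        for tier in self._tiers:
            tier.pop(key, None)
        self._pinned.discard(key)

    def _insert(self, level, key, entry):
        tier = self._tiers[level]
        tier[key] = entry
        while len(tier) > self._capacities[level]:
            # Pick the least recently used entry that is not pinned.
            victim = next((k for k in tier if k not in self._pinned), None)
            if victim is None:
                break  # every entry is pinned; let this tier overflow
            victim_entry = tier.pop(victim)
            if level + 1 < len(self._tiers):
                self._insert(level + 1, victim, victim_entry)  # demote
            # Otherwise the entry falls out of the coldest tier entirely.
```

A usage example: `TieredCache([2, 4, 8])` models a small RAM tier backed by larger SSD and HDD tiers; reading a cold entry with `get` moves it back to tier 0, and entries stored with `pinned=True` are never chosen for demotion.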
Alluxio Innovation: Intelligent Cache
Maintain read & write operations in the event of an outage
[Diagram: the same RAM/SSD/HDD tiers and policies as the previous slide, with an outage marked X]
Where to use Alluxio
Finding high-fit Alluxio use-cases
Alluxio is installed with or near compute to unify data stores, stage remote data, and improve system performance.
Compute Zone (standalone or managed with Mesos or YARN): Spark, TensorFlow, Presto
Storage in Different Availability Zone (either on-prem or cloud): HDFS
Guidelines
✓ Compute separated from storage
✓ Distributed compute
✓ I/O or network latency exists
✓ Unification of many storage systems
✓ Applications sharing long-lived data
More checks result in higher-fit applications
Example First Projects
✓ Big Data Hybrid Storage
✓ Common Data Catalog
✓ Data Center Containerization
✓ Cloud Migration
✓ ETL Alternative
Alluxio Offerings
[Chart axis: Capability/Value]
Alluxio Open Source (AOS)
Technology Validation
• Open Source
Alluxio Enterprise Edition (AEE)
Enterprise Deployment
• Kerberos Authentication
• LDAP Integration
• Encryption
• Data Replication
• Fast Durable Write
• Support
• Alluxio Manager
• Open Source
Use Cases
Next Gen Analytics Platform
Leading US Technology Company
HPC/Deep Learning Partnership -
Alluxio maximizes GPU investment:
• Self-serve data access for data scientists
• Rapid integration of new data sources
• Improved memory management & performance
Machine Learning Case Study –
Challenge – Slow training of a model for algorithmic trading at a $46B data-driven hedge fund
Data access was slow, which drove up compute cost and lowered modeler productivity
Solution – With Alluxio, data access is 10-30X faster
Impact – Increased efficiency in training the ML algorithm, lower compute cost, and higher modeler productivity, resulting in a 14-day ROI on Alluxio
[Diagram labels: Spark, HDFS, Mesos, Public Internet]
Consumer Intelligence Use Case – Top 3 Telco
Challenge – Desired a central view of consumer information in near real time for proactive support.
Many HDFS clusters, different distributions, many incompatible versions. On-prem & cloud. Integration through heavy ETL.
Solution – Alluxio integrates data into a central catalog for fast access to consumer interaction records.
Impact –
• Reduced integration time
• Faster data speed & freshness
[Diagram labels: ML, Hadoop, ETL, HDP, CDH, MapR, HDFS]
Big Data Case Study – Top 3 Retailer
Challenge – Bottleneck in trend analysis of mission-critical daily sales and inventory management
Queries were slow / not interactive, resulting in operational inefficiency
Solution – With Alluxio, data queries are 10X faster
Impact – Higher operational efficiency
[Diagram labels: Spark, HDFS]
Use case: http://bit.ly/2ook8Nh
Big Data Case Study –
Challenge – Gain an end-to-end view of the business with a large volume of data
Queries were slow / not interactive, resulting in operational inefficiency
Solution – ETL data from Teradata to Alluxio
Impact – Faster time to market – “Now we don’t have to work Sundays”
[Diagram labels: Spark, Teradata]
Use case: http://bit.ly/2oMx95W
Enabling Next Gen Big Data Analytics
1 Unified Storage Bridge
2 Unified Cache Management
3 Security & Governance
Website: www.alluxio.com
Social Media: Twitter.com/alluxio, Linkedin.com/alluxio
Thank You!