The Rise in Popularity of Apache Spark
With Ian Lumb, Product Marketing Manager
YouTube VoDcast: https://youtu.be/PimVUaQBMLM
Resilient Distributed Datasets (RDDs)
• Abstraction for in-memory computing
• Fault-tolerant, parallel data structures
• Cluster-ready
• Optionally persistent
• Can be partitioned for optimal placement
• Manipulated via operators
Zaharia et al., NSDI 2012: http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
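The RDD properties listed above can be illustrated with a toy, single-process sketch in Python. It is not Spark's API: the class `ToyRDD`, its method names, and the contiguous-chunk partitioning are hypothetical, chosen only to show the paper's core ideas: data split into partitions, lazy operators that record lineage, optional in-memory persistence, and recovery of a lost partition by recomputing it from lineage rather than restoring it from a replica.

```python
# A toy, single-process sketch of the RDD ideas in Zaharia et al. (NSDI 2012).
# Hypothetical: ToyRDD and all of its method names are illustrative, not Spark's API.
from typing import Callable, Dict, Iterable, List, Optional


class ToyRDD:
    def __init__(self, compute: Callable[[int], List], num_partitions: int, lineage: str):
        self._compute = compute                 # rebuilds one partition from its parent
        self.num_partitions = num_partitions
        self.lineage = lineage                  # readable record of how this dataset was derived
        self._cache: Optional[Dict[int, List]] = None  # None means "not persisted"

    @staticmethod
    def parallelize(data: Iterable, num_partitions: int) -> "ToyRDD":
        items = list(data)
        size = -(-len(items) // num_partitions)  # ceiling division
        # Split the input into contiguous partitions (a real cluster would spread them across nodes).
        chunks = [items[i * size:(i + 1) * size] for i in range(num_partitions)]
        return ToyRDD(lambda p: list(chunks[p]), num_partitions, "parallelize")

    # Transformations are lazy: they only record how to compute each partition.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(lambda p: [f(x) for x in self._partition(p)],
                      self.num_partitions, self.lineage + " -> map")

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda p: [x for x in self._partition(p) if pred(x)],
                      self.num_partitions, self.lineage + " -> filter")

    def persist(self) -> "ToyRDD":
        # Optional persistence: keep partitions in memory once they have been computed.
        if self._cache is None:
            self._cache = {}
        return self

    def _partition(self, p: int) -> List:
        if self._cache is None:
            return self._compute(p)
        if p not in self._cache:
            self._cache[p] = self._compute(p)  # (re)compute from lineage on a cache miss
        return self._cache[p]

    def lose_partition(self, p: int) -> None:
        # Simulate a failure by discarding one cached partition.
        if self._cache is not None:
            self._cache.pop(p, None)

    # An action forces evaluation of every partition.
    def collect(self) -> List:
        return [x for p in range(self.num_partitions) for x in self._partition(p)]


if __name__ == "__main__":
    nums = ToyRDD.parallelize(range(10), num_partitions=3)
    evens = nums.map(lambda x: x * x).filter(lambda x: x % 2 == 0).persist()
    print(evens.collect())      # [0, 4, 16, 36, 64]
    evens.lose_partition(1)     # simulated loss of the partition holding 16 and 36
    print(evens.collect())      # [0, 4, 16, 36, 64], rebuilt from lineage
    print(evens.lineage)        # parallelize -> map -> filter
```

In PySpark the comparable calls are `SparkContext.parallelize`, `RDD.map`, `RDD.filter`, `RDD.persist` and `RDD.collect`. Spark records lineage as a graph of dependencies between RDDs and recomputes lost partitions across the cluster, which this sketch only imitates inside one process.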
Well-managed Clusters
https://spark.apache.org/
http://aryannava.com/2014/02/19/apache-hadoop-ecosystem/hadoopecosystem/
Spark’s Converged Applications
http://www.informationweek.com/big-data/big-data-analytics/apache-spark-3-promising-use-cases/a/d-id/1319660
Spark’s converged application play
Big Data Analytics
• Combine SQL, streaming, machine learning and graph analytics
HPC
• Decouple from Hadoop to integrate easily with existing infrastructure
https://spark.apache.org/
www.brightcomputing.com/solutions-hadoop