The BDAS Open Source Community

Ion Stoica, UC Berkeley and Databricks

Growing Beyond AMPLab

As the software matures and becomes successful, more and more contributors come from outside AMPLab

New startups have anchored development
» Databricks (Spark stack)
» Mesosphere (Mesos)
» …

This lets AMPLab focus more resources on future systems instead of software maintenance

Apache Spark

Figure: the BDAS stack, showing Cancer Genomics, Energy Debugging, Smart Buildings; Sample Clean; BlinkDB; MLBase; SparkR; Velox Model Serving; Spark Streaming; Spark SQL; GraphX; MLlib; Apache Spark (core); Tachyon; Apache Mesos; YARN; HDFS, S3, …

Apache Spark

Open source: end of 2010
Apache project: 2013
Has grown over time to include key libraries
» Spark Streaming, Spark SQL, MLlib, GraphX

Becoming a platform for big data applications

Apache Spark Today

Figure: activity in the past 6 months, measured in commits and in lines of code changed, for MapReduce, YARN, HDFS, Storm, and Spark.

Spark has 2-3x more activity than Hadoop, Storm, MongoDB, NumPy, D3, Julia, …

Meetups Around the World

Monthly Contributors

Figure: monthly Spark contributors, 2011 to 2014, with the founding of Databricks marked.

370+ contributors over the last 12 months

Spark Stack (2013)

Figure: the 2013 stack, showing Cancer Genomics, Energy Debugging, Smart Buildings; Sample Clean; BlinkDB; MLBase; Shark; Spark Streaming; MLlib; Apache Spark (core); Tachyon; Apache Mesos; YARN; HDFS, S3, …

Last Year's Developments

Figure: the stack after the past year of development. Alongside Spark Streaming, MLlib, BlinkDB, MLBase, Sample Clean, Cancer Genomics, Energy Debugging, Smart Buildings, Apache Spark (core), Tachyon, Apache Mesos, YARN, and HDFS, S3, …, it now shows Shark and Spark SQL, GraphX, SparkR, and Velox Model Serving.

Wide Adoption

All major Hadoop distributions include Spark

Beyond Hadoop

Databricks spurred Spark's enterprise growth

Figure: Databricks partners.

Apache Mesos

Figure: the BDAS stack, including Apache Mesos alongside YARN.

Apache Mesos

Open source: 2010
Apache project: 2012
Used in production at Twitter for the past 2.5 years
» 10,000+ machines
» 500+ engineers using it

Most development has moved outside Berkeley since 2012

Monthly Contributors

Figure: monthly Mesos contributors, with the founding of Mesosphere marked.

65 contributors over the last 12 months

BDAS Stack

Figure: the BDAS stack, including Tachyon.

Release Growth

Tachyon 0.1 (Dec '12): 1 contributor
Tachyon 0.2 (Apr '13): 3 contributors
Tachyon 0.3 (Oct '13): 15 contributors
Tachyon 0.4 (Feb '14): 30 contributors
Tachyon 0.5 (July '14): 46 contributors

Fast-Growing Community

Figure: Tachyon contributors from Berkeley vs. outside Berkeley (20+ companies).

~80% of contributors already come from outside AMPLab

Reaching a Tipping Point

Research to Real-World Impact

Figure: BDAS projects (Apache Spark (core), MLlib, Spark Streaming, Spark SQL, Apache Mesos, GraphX, Tachyon, Velox, Succinct, ADAM, BlinkDB) placed on a scale from research to real-world impact, with committers/commits split between AMPLab/Berkeley and non-Berkeley.

Impact on AMPLab

Created a blueprint and ecosystem for other BDAS components to succeed
» MLlib, GraphX, Tachyon, …

Enabled AMPLab to increase its focus on new research projects
» Velox, ADAM, Succinct, …