Spark MLlib

Machine Learning with Spark MLlib

Uploaded by todd-mcgrath

Page 1: Machine Learning with Spark MLlib

Spark MLlib

Overview

• MLlib is Spark’s library of machine learning (ML) functions, designed to run in parallel on clusters. It contains a variety of learning algorithms.

• MLlib invokes various algorithms on RDDs

• Some classic ML algorithms are not included in Spark MLlib because they were not designed to run in parallel.

Overview

• Divided into two packages:

• spark.mllib contains the original API built on top of RDDs.

• spark.ml provides a higher-level API built on top of DataFrames.

• Using spark.ml is recommended because the DataFrame-based API is more versatile and flexible. The plan is to keep supporting spark.mllib alongside the development of spark.ml.

Machine Learning Recap

• Machine learning algorithms try to predict or make decisions based on training data.

• There are multiple types of learning problems, including classification, regression, and clustering, each with a different objective.

Spark MLlib Data Types

• MLlib contains a few specific data types, including Vector, LabeledPoint, Rating, Matrix (local and distributed), and various Model classes.
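
To make the shape of these data types concrete, here is a minimal plain-Python sketch. The classes below are hypothetical stand-ins written for illustration, not the MLlib classes themselves; they only mirror the fields the MLlib types carry (a sparse vector stores a size plus parallel index and value lists, a labeled point pairs a label with a feature vector, and a rating ties a user to a product).

```python
from typing import NamedTuple, Sequence


class SparseVector(NamedTuple):
    """Stand-in for a sparse vector: only the non-zero entries are stored."""
    size: int
    indices: Sequence[int]
    values: Sequence[float]

    def to_dense(self):
        # Expand to a dense list, filling unspecified positions with 0.0.
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense


class LabeledPoint(NamedTuple):
    """Stand-in for a supervised training example: a label plus features."""
    label: float
    features: Sequence[float]


class Rating(NamedTuple):
    """Stand-in for one (user, product, rating) observation."""
    user: int
    product: int
    rating: float


sparse = SparseVector(4, [0, 3], [1.0, 2.5])
point = LabeledPoint(label=1.0, features=sparse.to_dense())
rating = Rating(user=7, product=42, rating=4.0)
print(point)
print(rating)
```

The sparse form pays off when most features are zero, which is common for text and one-hot encoded data.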

MLlib Supported Supervised Algorithm Methods

• Binary Classification Problems

• linear SVMs, logistic regression, decision trees, random forests, gradient-boosted trees, naive Bayes

• Multiclass Classification Problems

• logistic regression, decision trees, random forests, naive Bayes

• Regression Problems

• linear least squares, Lasso, ridge regression, decision trees, random forests, gradient-boosted trees, isotonic regression

MLlib Supported Unsupervised Models

• K-means

• Gaussian mixture

• Power iteration clustering (PIC)

• Latent Dirichlet allocation (LDA)

• Bisecting k-means

• Streaming k-means

Recommender Systems

• Collaborative filtering is commonly used for recommender systems.

• spark.mllib currently supports model-based collaborative filtering, in which users and products are described by a small set of latent factors that can be used to predict missing entries.

• spark.mllib uses the alternating least squares (ALS) algorithm to learn these latent factors.
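
The idea behind alternating least squares can be sketched on a single machine in plain Python. This is an illustrative sketch under stated assumptions, not the spark.mllib implementation: the function names (`solve`, `update`, `als`, `predict`), the ridge-style regularization term `reg`, and the toy ratings are all hypothetical. Each pass holds one side's factors fixed and solves a small regularized least-squares problem for every row of the other side.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def update(fixed, observed_by_key, rank, reg):
    """Recompute one side's factors while the other side stays fixed."""
    new = {}
    for key, observed in observed_by_key.items():
        # Normal equations: (sum f f^T + reg * I) x = sum r * f
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in observed:
            f = fixed[other]
            for i in range(rank):
                b[i] += r * f[i]
                for j in range(rank):
                    a[i][j] += f[i] * f[j]
        new[key] = solve(a, b)
    return new


def als(ratings, rank=2, reg=0.1, iterations=50, seed=0):
    """Learn user and item latent factors from (user, item, rating) triples."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update(items, by_user, rank, reg)   # fix items, solve users
        items = update(users, by_item, rank, reg)   # fix users, solve items
    return users, items


def predict(users, items, u, i):
    """Predicted rating is the dot product of the two latent factor vectors."""
    return sum(a * b for a, b in zip(users[u], items[i]))


# Toy data: users 0-1 prefer items 0-1, users 2-3 prefer items 2-3.
# The pairs (0, 3) and (2, 1) are deliberately left unobserved.
ratings = [
    (0, 0, 5.0), (0, 1, 5.0), (0, 2, 1.0),
    (1, 0, 5.0), (1, 1, 5.0), (1, 2, 1.0), (1, 3, 1.0),
    (2, 0, 1.0), (2, 2, 5.0), (2, 3, 5.0),
    (3, 0, 1.0), (3, 1, 1.0), (3, 2, 5.0), (3, 3, 5.0),
]
users, items = als(ratings)
print("predicted rating for user 0, item 3:", predict(users, items, 0, 3))
```

Because each user's update depends only on the fixed item factors (and vice versa), the per-row solves are independent of one another, which is what makes the approach a natural fit for parallel execution.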

For more, visit https://supergloo.com
