Big Data and FrameWorks ; Perspectives to Applied Machine Learning Mehdi Habibzadeh PhD in Computer Science

Big Data and FrameWorks; Perspectives to Applied Machine ...up.persianscript.ir/...DeepLearning-MehdiHabibzadeh... · Deep Learning (Cont.) Earliest concepts of deep learning : Perceptron


Big Data and FrameWorks; Perspectives to Applied Machine Learning

Mehdi Habibzadeh

PhD in Computer Science


Outline (Oct 2016):

Big Data and Challenges

Review and Trends

Math and Probability Concepts

Data Structure and Retrieval Algorithms

Map-Reduce on Large Clusters

Hadoop Framework Programming

Apache Spark Framework

Big Data and Cloud Computing

Big Data and NoSQL

Machine Learning (Conventional and Deep Learning)

Big Data in the real world

2016 Big Data and Applied Machine Learning 2


Big Data and Challenges

Sources and Massive Information

Characteristics and Trends

The year 2015 was a big jump in the world of big data.

» Adoption of technologies associated with unstructured data

» Ref : http://www.tableau.com/top-8-trends-big-data-2016?



Big Data: Math Terms

Understanding and Visualization

Missing values, outlier values, ...

ML: Maximum Likelihood

EM: Expectation Maximization

The interquartile range (IQR)

Data Mining and Statistical Approaches

Data Dimensionality Reduction (PCA, SFS, BFS, ...)

Relevance and Redundancy (Kruskal–Wallis, Kolmogorov-Smirnov)

Regression modeling (Logistic Regression, ...)

Data compression (Singular value decomposition)

Variable Selection and Ranking (Eigen values/Vectors, HDMR)
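The interquartile range item above can be illustrated with a minimal sketch. It assumes the common 1.5 x IQR fence rule for flagging outliers; the slides do not say which rule was intended. The function name `iqr_outliers` and the sample data are made up for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (q1, q3, outliers) using the k * IQR fence rule (k is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Anything outside the fences is flagged as an outlier.
    outliers = [v for v in values if v < low or v > high]
    return q1, q3, outliers


data = [2, 3, 3, 4, 5, 5, 6, 7, 50]
q1, q3, outliers = iqr_outliers(data)
print(q1, q3, outliers)
```

Other quantile conventions give slightly different quartiles on small samples, so the flagged values can differ at the margins.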


Big Data: Math Terms (Cont.)

Feature selection :

Reasons and motivation

To trace the effectiveness of the aforementioned high-dimensional invariant descriptors on white blood cell classification performance.

To provide a smaller effective set compared to the starting data pool.

To avoid redundant or irrelevant features.

Two approaches (Wrapper - Filter) :

Wrapper: An iterative method that scores each candidate feature subset by its predictive performance with a given classifier (Pattern Recognition algorithm).

Filter: The objective function evaluates subsets independently of any classifier, using statistical dependency, regression, or interclass distance (Machine Learning).
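The two approaches can be contrasted in a small sketch. It makes several assumptions not in the slides: the filter score is the absolute Pearson correlation with the label, the wrapper is greedy forward selection, and the classifier is a nearest-centroid rule scored on training accuracy (a real wrapper would normally score on held-out data). All names and the toy data are hypothetical.

```python
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def filter_rank(X, y):
    """Filter: score each feature on its own, independent of any classifier."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier on the chosen features."""
    centroids = {}
    for label in set(y):
        rows = [[row[j] for j in features] for row, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    correct = 0
    for row, t in zip(X, y):
        point = [row[j] for j in features]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == t
    return correct / len(y)


def wrapper_forward_select(X, y, k):
    """Wrapper: greedily add the feature that most improves the classifier."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda j: centroid_accuracy(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Feature 0 separates the classes, feature 1 is noise, feature 2 is constant.
X = [[1.0, 5.0, 7.0], [1.2, 1.0, 7.0], [0.9, 3.0, 7.0],
     [3.0, 4.0, 7.0], [3.1, 2.0, 7.0], [2.9, 3.5, 7.0]]
y = [0, 0, 0, 1, 1, 1]
ranking = filter_rank(X, y)
chosen = wrapper_forward_select(X, y, k=1)
print(ranking, chosen)
```

The filter is cheap because it never trains a model; the wrapper retrains the classifier for every candidate subset, which costs more but accounts for how features interact with that classifier.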


Big Data: Math Terms (Cont.)

Machine Learning and Predicting

Reliability, Uncertainty and Global Sensitivity Analysis

Clustering and Classification

Validation Methods (Cross-Validation, Hold-out datasets, ...)

Graph Laplacian for clustering

Deterministic methods (NN, SVM, ...)

Probabilistic methods (Bayes classifier, PAM, ...)

Deep Learning (Hierarchical Classification)
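As an illustration of the cross-validation item above, here is a minimal sketch of how k-fold index splits can be produced. It assumes contiguous, unshuffled folds for simplicity; in practice the data is usually shuffled (and often stratified) first. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

A hold-out split is the special case of using a single one of these (train, test) pairs.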


Big Data Search Algorithms

Cache-aware and cache-oblivious models

Use the CPU cache efficiently without knowing the size of the cache (on any sort of machine)

Memory performance and improvement

Adapt to arbitrary memory hierarchies

Data clustering

Locality of memory references is increased.

Application : Matrix multiplication, Sorting, Matrix transposition
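Matrix transposition is a standard example of the cache-oblivious idea: split the problem recursively until the pieces are small, without ever referring to a cache size. The sketch below shows only the access pattern. Python lists of lists do not expose real cache behaviour, and the `base` cutoff is an arbitrary recursion stop, not a tuned cache parameter.

```python
def transpose_recursive(a, b, r0, r1, c0, c1, base=2):
    """Write the transpose of the block a[r0:r1][c0:c1] into b, halving the longer side."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small block: copy directly.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        mid = r0 + rows // 2
        transpose_recursive(a, b, r0, mid, c0, c1, base)
        transpose_recursive(a, b, mid, r1, c0, c1, base)
    else:
        mid = c0 + cols // 2
        transpose_recursive(a, b, r0, r1, c0, mid, base)
        transpose_recursive(a, b, r0, r1, mid, c1, base)


def transpose(a):
    n_rows, n_cols = len(a), len(a[0])
    b = [[None] * n_rows for _ in range(n_cols)]
    transpose_recursive(a, b, 0, n_rows, 0, n_cols)
    return b


matrix = [[r * 5 + c for c in range(5)] for r in range(3)]
result = transpose(matrix)
print(result)
```

Because every level of the recursion works on a smaller contiguous block, some level fits in each layer of the memory hierarchy, which is where the improved locality comes from.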


Big Data Retrieval Algorithms

Streaming

Online Data Management

Adapt to arbitrary and unstructured Input Data

Real-Time Analytical Processing (RTAP)
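A streaming algorithm sees each record once and keeps only a small summary. As one possible illustration (the slides name no specific algorithm), the sketch below maintains a running mean and variance with Welford's method; the class name `RunningStats` and the readings are made up.

```python
class RunningStats:
    """Single-pass mean and variance (Welford's method) using O(1) memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values have arrived."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)  # each value is seen once and then discarded
print(stats.mean, stats.variance)
```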


Map-Reduce on Large Clusters

Motivation and Demand:

MapReduce programs tend to be very short, code-wise

They represent a data flow



Map-Reduce (Cont.)

Each step has one Map phase and one Reduce phase

Convert any computation into the MapReduce pattern

Great solution for one-pass computations

Not very efficient for Multi-pass computations and algorithms
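The one-Map, one-Reduce structure can be sketched in a single process with the classic word-count example. This is a toy model: the function names are hypothetical, and a real framework would run the map and reduce calls on different nodes and do the shuffle over the network.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


documents = ["big data big clusters", "data flows through clusters"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Word count needs only one pass over the input, which is the case the pattern handles well. An algorithm that needs many passes has to chain several such jobs.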


Hadoop Framework

Features :

Open Source Framework for Processing Large Data

Works on cheap and unreliable clusters

Widely used by companies that deal with Big Data applications

Compatible with Java, Python and Scala


Hadoop Framework (Cont.)

MapReduce Framework

Assigns work to different nodes

Hadoop Distributed File System (HDFS)

Primary storage system used by Hadoop applications.

Copies each piece of data and distributes the copies to individual nodes

Name Node (metadata) and Data Nodes (file blocks)

Redundant copies (three replicas by default)

Machines in a given cluster are cheap and unreliable

Decreases the risk of catastrophic failure

» Even in the event that numerous nodes fail

Links together the file systems on different nodes to make one integrated big file system (Parallel Processing)
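Why three replicas reduce the risk of data loss can be shown with a toy model. The sketch assumes a simple round-robin placement on five hypothetical nodes; it is not HDFS's actual placement policy, which also takes racks into account.

```python
from itertools import combinations


def place_blocks(n_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    return {
        block: {nodes[(block + r) % len(nodes)] for r in range(replication)}
        for block in range(n_blocks)
    }


def survives(placement, failed):
    """True if every block still has at least one replica on a live node."""
    return all(replicas - failed for replicas in placement.values())


nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(n_blocks=8, nodes=nodes)

# With three replicas, no pair of simultaneous node failures loses a block.
safe_for_any_two = all(
    survives(placement, set(pair)) for pair in combinations(nodes, 2)
)
# Losing all three holders of some block is what it takes to lose data.
some_triple_loses_data = any(
    not survives(placement, set(triple)) for triple in combinations(nodes, 3)
)
print(safe_for_any_two, some_triple_loses_data)
```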


Hadoop Framework (Cont.)

Hadoop V.2 : Hadoop NextGen MapReduce (YARN)


Hadoop Framework (Cont.)

Hadoop Programming

Java

Full control of MapReduce; Cascading (open-source Java library)

Python, Scala, Ruby

Data Retrieval / Query Language

Hive

SQL-like language

Pig

Data-flow language (simple; programs are built out of small steps)

Scalding

Scala library built on top of Cascading (Elegant Model)


Big Data Programming

R, Java, Python and Scala (commonly used)

Three references (recommended reading):

https://www.linkit.nl/knowledge-base/177/4_most_used_languages_in_big_data_projects_Java

https://www.linkit.nl/knowledge-base/226/4_most_used_languages_in_big_data_projects_R

https://www.linkit.nl/eng/knowledge-base/196/4_most_used_languages_in_big_data_projects_Python



Apache Spark Framework

Spark Features (More than Distributed Processing)

Ease of use, and sophisticated analytics

In-memory data storage and near real-time processing

Holds intermediate results in memory

Stores as much data as possible in memory and spills to disk only when needed

Spark vs Hadoop

Runs on top of existing HDFS

Handles data sets that are diverse in nature (text, videos, ...)

Handles variety in the source of data (batch vs. real-time streaming data)

Up to 100 times faster than Hadoop MapReduce in memory, and up to 10 times faster when running on disk
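The benefit of holding intermediate results in memory shows up in multi-pass computations. The sketch below is a plain-Python toy, not Spark code: the `Dataset` class merely counts simulated disk reads, and the averaging update is an arbitrary stand-in for an iterative algorithm.

```python
class Dataset:
    """Stand-in for data on disk; counts how often it is read (hypothetical)."""

    def __init__(self, records):
        self._records = records
        self.disk_reads = 0

    def load(self):
        self.disk_reads += 1
        return list(self._records)


def iterate_without_cache(dataset, passes):
    """Each pass re-reads the input, as a chain of one-pass jobs would."""
    estimate = 0.0
    for _ in range(passes):
        data = dataset.load()
        estimate = (estimate + sum(data) / len(data)) / 2
    return estimate


def iterate_with_cache(dataset, passes):
    """Read once, then keep the working set in memory across passes."""
    data = dataset.load()
    estimate = 0.0
    for _ in range(passes):
        estimate = (estimate + sum(data) / len(data)) / 2
    return estimate


a, b = Dataset([1.0, 2.0, 3.0]), Dataset([1.0, 2.0, 3.0])
result_a = iterate_without_cache(a, passes=5)
result_b = iterate_with_cache(b, passes=5)
print(result_a, a.disk_reads, result_b, b.disk_reads)
```

Both versions compute the same result; only the number of reads differs, and that gap grows with the number of passes.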



Apache Spark Framework (Cont.)

Compatible with Java, Scala and Python

Perform Data Analytics and Machine Learning

SQL Queries, Streaming Data

Machine Learning and Graph Data Processing

Spark MLlib, Spark’s Machine Learning library

Spark can work with data stored in a Cassandra database


Big Data and Cloud

Cloud Computing Platform & Services

(Cloudera, Hortonworks, MapR, Azure)


Big Data and NoSQL

Key-Value Stores

Unique key and a pointer to a particular item of data.

Tokyo Cabinet/Tyrant, Redis, Voldemort, Oracle BDB, Amazon SimpleDB, Riak

Column Family Stores

Very large amounts of data distributed over many machines.

Cassandra, HBase


Big Data and NoSQL (Cont.)

Document Databases

Similar to key-value stores

Semi-structured documents are stored in formats like JSON

Nested values can be associated with each key

Document databases support more efficient querying than plain key-value stores

CouchDB, MongoDB


Big Data and NoSQL (Cont.)

Graph Database

Flexible graph model

Instead of tables of rows and columns and the rigid structure of SQL

Scale across multiple machines (Scale Out)

Neo4j, InfoGrid, InfiniteGraph, Titan


Big Data and NoSQL (Cont.)

JSON Format


RDBMS           NoSQL
-----------     ---------------------
Database        Database
Table, View     Collection
Row             Document (JSON, BSON)
Column          Field
Index           Index
Join            Embedded Document
Foreign Key     Reference
Partition       Shard

> db.user.findOne({age:39})
{
    "_id" : ObjectId("5114e0bd42…"),
    "first" : "John",
    "last" : "Doe",
    "age" : 39,
    "interests" : [
        "Reading",
        "Mountain Biking"
    ],
    "favorites" : {
        "color" : "Blue",
        "sport" : "Soccer"
    }
}
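To make the document model concrete, the sketch below loads a similar document with Python's `json` module and runs a toy query over it. The `matches` and `find_one` helpers are hypothetical stand-ins written for this example, not the MongoDB API, and the `_id` field is left out because its value is elided in the slide.

```python
import json

document = json.loads("""
{
  "first": "John",
  "last": "Doe",
  "age": 39,
  "interests": ["Reading", "Mountain Biking"],
  "favorites": {"color": "Blue", "sport": "Soccer"}
}
""")


def matches(doc, query):
    """True if every dotted-path key in `query` equals the value found in `doc`."""
    for path, expected in query.items():
        value = doc
        for part in path.split("."):
            if not isinstance(value, dict) or part not in value:
                return False
            value = value[part]
        if value != expected:
            return False
    return True


def find_one(collection, query):
    """Return the first matching document, or None (toy helper, not MongoDB)."""
    return next((doc for doc in collection if matches(doc, query)), None)


users = [{"first": "Ann", "age": 25, "favorites": {"color": "Red"}}, document]
hit = find_one(users, {"age": 39})
nested = find_one(users, {"favorites.color": "Blue"})
print(hit["first"], nested["favorites"]["sport"])
```

The nested `favorites` object is what the table calls an embedded document: related data lives inside the record instead of being reached through a join.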


Machine Learning

Conventional Methods

Feature extraction and selection provide the input; the proposed machine acts as the classifier

Sample ML Methods :

Support Vector Machines (SVM)

Naive Bayes Classifier

Artificial Neural Network (ANN)


Machine Learning (Cont.)

Support Vector Machine (SVM)

Kernel settings (linear, polynomial and Gaussian)

Suited to cases where the number of features is large compared to the number of training samples.

Less prone to overfitting than alternative choices.

Soft margin and hard margin.

Overfitting is controlled by the soft margin (slack variables εi)

One-versus-all for multi-class problems.

Works well in practice (the class with the highest response wins)

K-fold cross-validation (validation data)
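The role of the slack variables can be shown numerically. The sketch assumes the standard linear soft-margin objective, 0.5 * ||w||^2 + C * sum of slacks, with labels in {-1, +1}. The weights, data, and the value of C are made up, and the slack variables written εi in the slide appear as ξi in much of the literature.

```python
def soft_margin_objective(w, b, X, y, C=1.0):
    """Return (objective, slacks) for a linear SVM with labels in {-1, +1}."""
    slacks = []
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
        # Slack is zero when the margin constraint y_i (w . x_i + b) >= 1 holds.
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(w_j ** 2 for w_j in w) + C * sum(slacks)
    return objective, slacks


X = [[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, 1, -1, -1]
objective, slacks = soft_margin_objective(w=[1.0, 0.0], b=0.0, X=X, y=y, C=1.0)
print(objective, slacks)
```

A slack between 0 and 1 means the point is inside the margin but on the correct side; a slack above 1 means it is misclassified. A hard margin corresponds to forbidding any non-zero slack, and a larger C moves the soft margin closer to that behaviour.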


Machine Learning : Deep Learning

Supervised & Unsupervised approaches

Greedy layer-wise unsupervised pre-training.

Builds a hierarchy of features one level at a time

Learns a new transformation at each level, to be composed with the previously learned transformations.

Seeks regularities to extract a unique representation

Higher layers find representations that are more useful than the original input

Accurate hierarchical representation of complex data

Subsequent feature extraction,

Classification problems (types and classes)


Deep Learning (Cont.)

Earliest concepts of deep learning :

Perceptron neural network structures.

A neural network can technically have more than one hidden layer.

Increasing the number of hidden layers leads to:

» Vanishing gradients, overfitting.
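The perceptron named above is small enough to write out in full. This is a minimal sketch of the classic update rule on a single neuron with a step activation; the learning rate, the epoch count, and the choice of the AND function as training data are arbitrary choices for illustration.

```python
def train_perceptron(samples, labels, epochs=10, lr=1.0):
    """Perceptron rule: the weights change only when a sample is misclassified."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            activation = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            predicted = 1 if activation > 0 else 0
            error = target - predicted  # 0 when correct, +1 or -1 otherwise
            w = [w_j + lr * error * x_j for w_j, x_j in zip(w, x)]
            b += lr * error
    return w, b


def predict(w, b, x):
    return 1 if sum(w_j * x_j for w_j, x_j in zip(w, x)) + b > 0 else 0


inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
and_labels = [0, 0, 0, 1]
w, b = train_perceptron(inputs, and_labels)
predictions = [predict(w, b, x) for x in inputs]
print(w, b, predictions)
```

A single perceptron can only separate classes with a straight line (AND works, XOR does not), which is the motivation for adding hidden layers.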


Deep Learning (Cont.)

Auto-encoders, Stacked Auto-encoders, Restricted Boltzmann Machines, The spike and slab Restricted Boltzmann Machine (RBM), Deep Belief Networks, Convolutional Networks


Deep: Convolutional Neural Network

Extracts topologically invariant properties from the gray-scale image through spatially local connections (receptive fields)

Especially suited to inputs that are spatially or temporally distributed

A CNN is composed of two distinct parts:

The first part consists of several convolution layers, each followed by down-sampling (max pooling)

The second part categorizes the pattern into classes (such as RBF).

A CNN consists of three different kinds of layers:

Convolution layers (with different feature maps), sub-sampling (max-pooling) layers and an ensemble of fully connected layers
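The first two layer types can be sketched on a tiny image. This assumes a "valid" (unpadded) convolution with stride 1 and non-overlapping 2x2 max pooling; the 5x5 image and the hand-written edge kernel are made up, whereas a trained CNN learns its kernels from data.

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, the operation CNN layers apply."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h, out_w = len(feature_map) // size, len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[i * size + di][j * size + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
]
vertical_edge = [[-1, 1], [-1, 1]]  # responds to dark-to-bright steps, left to right
feature_map = convolve2d(image, vertical_edge)
pooled = max_pool(feature_map)
print(feature_map, pooled)
```

Each output value depends only on a 2x2 patch of the input, which is the receptive-field idea; pooling then keeps the strongest response per region and halves the resolution.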


Convolution Neural Network (Cont.)


CNN: recognition rate after 105 epochs; few samples (28 per class); similarity between Basophil and Lymphocyte


Deep learning In Codes!

Reference :

www.deeplearning.net/software_links

Programming Language :

Python – Matlab – Java – Lua

Machine Learning in Python

Scikit-learn, Keras, Caffe, ...

Pylearn2

Machine Learning in Lua (Matlab-like environment)

Torch7

Machine Learning in Java

Deeplearning4j



Big Data in the real world

Climate data, Large scale health care

Complex Image Processing

Personalization (Facebook, Telegram, ...)

Advertising, Mobile Telecommunication Networks (e.g., 5G)

E-commerce and E-Banking Applications


Big Data in the real world (Cont.)

Deep Learning Algorithm Transcribes House Numbers (Google)


Big Data in the real world (Cont.)

Car Classification using Deep Learning Approach


Big Data in the real world (Cont.)

Banking Systems; Big Data and Deep Learning

Banknote Authentication and Forgery Detection

Financial Fraud Detection

Bank Embezzlement & Money Laundering

Boost e-commerce Sales

Losses From Disgruntled Customers

Loan Approval Prediction


Contact Info

Mehdi (Nima) Habibzadeh Motlagh

PhD in Computer Science (Concordia University, Sept 2015)

Email : [email protected]

Cell phone : +98 912 326 7046

Telegram : +1 514 632 2838
