© 2015 The MathWorks, Inc.

Big Data Analytics with MATLAB

Dmitrij Martynenko, Application Engineer, The MathWorks Germany


Data Science with MATLAB

• Data Analysis
• Statistics
• Machine Learning
• Software Engineering
• Multivariable Calculus and Linear Algebra
• Big Data
• Data Cleaning
• Data Visualization and Communication


How do you define Big Data?

“Any collection of data sets so large and complex that it becomes difficult to process using … traditional data processing applications.”

(Wikipedia)

“Any collection of data sets so large that it becomes difficult to process using traditional MATLAB functions, which assume all of the data is in memory.” (MATLAB)


Big Data – Data Sources

File I/O
• Text
• Spreadsheet
• XML
• CDF/HDF
• Image
• Audio
• Video
• Geospatial
• Web content

Hardware Access
• Data acquisition
• Image capture
• GPU
• Lab instruments

Communication Protocols
• CAN (Controller Area Network)
• DDS (Data Distribution Service)
• OPC (OLE for Process Control)
• XCP (Universal Measurement and Calibration Protocol)

Database Access
• Financial data
• ODBC
• JDBC
• HDFS (Hadoop)


Three Dimensions of Scaling

Compute power
• Larger, more complex problems
• Cloud technologies

Data
• More data, arriving more quickly
• Complicated, incomplete, and variable formats
• Systems too complex to know the governing equations

People
• Share algorithms, protect IP
• Web and enterprise


Three Dimensions of Scaling - MathWorks’ Solutions

Compute power: MATLAB parallel computing solutions

Data: MATLAB Hadoop interface, distributed arrays

People: MATLAB deployment tools


Scale Your Data

Memory and Data Access
• 64-bit processors
• Memory-mapped variables
• Disk variables
• Databases
• Datastores
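As a minimal sketch of the "disk variables" idea, the script below uses `matfile` to write and then read a slice of a MAT-file variable without loading the whole variable into memory. The temporary file, the variable name `A`, and its size are illustrative assumptions; a real big-data workflow would fill such a variable block by block rather than creating it in memory first.

```matlab
% Sketch: keep a variable on disk and move only a slice of it into memory.
% File name, variable name, and size are illustrative assumptions.
fname = [tempname '.mat'];

% matfile creates a Version 7.3 MAT-file (needed for partial writes)
% when the file does not exist yet.
m = matfile(fname, 'Writable', true);
m.A = zeros(10000, 10);            % variable stored in the file

% Overwrite rows 1-100 on disk without loading the full matrix.
m.A(1:100, :) = ones(100, 10);

% Load only those rows back into memory.
block = m.A(1:100, :);
blockSum = sum(block(:));          % 100*10 ones -> 1000

% Query the on-disk size without reading the data.
[nRows, nCols] = size(m, 'A');

clear m
delete(fname);
```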

Programming Constructs
• Streaming
• Block processing
• Parallel-for loops
• GPU arrays
• SPMD and distributed arrays
• MapReduce
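The parallel-for construct can be sketched as follows, assuming a workload whose iterations are independent of each other. The loop body is a hypothetical stand-in for an expensive computation. With Parallel Computing Toolbox the iterations may run on pool workers; without it, MATLAB executes the same `parfor` loop serially.

```matlab
% Sketch: a parfor loop with independent iterations.
% The per-iteration work is a hypothetical placeholder computation.
n = 200;
results = zeros(1, n);             % sliced output, one entry per iteration

parfor k = 1:n
    % Each iteration depends only on its own loop index k,
    % so iterations can run in any order on any worker.
    x = (1:1000) * k;
    results(k) = mean(x);
end
```

Because `results` is indexed only by the loop variable, MATLAB can classify it as a sliced variable and distribute its entries across workers.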

Platforms
• Desktop (multicore, GPU)
• Clusters
• Cloud computing (MATLAB Distributed Computing Server for Amazon EC2)
• Hadoop


MATLAB: Access Data in HDFS

[Figure: a Hadoop cluster whose nodes each hold part of the data in HDFS, accessed from MATLAB through a datastore]

Use a datastore to access portions of data stored in HDFS from MATLAB:

ds = datastore('hdfs://localhost:9000/datasets/airline/airlinedata.csv');
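The same datastore interface can be exercised without a Hadoop installation. This sketch assumes a small generated local CSV in place of the HDFS file and uses hypothetical column names `Year` and `ArrDelay`. It reads only one column, 25 rows at a time, and accumulates a mean, so the full file never has to be in memory at once. Pointing `datastore` at an `hdfs://` location instead should leave the reading loop unchanged.

```matlab
% Sketch: chunked reading with a datastore on a local stand-in file.
% The file contents and column names are hypothetical.
fname = [tempname '.csv'];
T = table(repmat((2001:2005)', 20, 1), (1:100)', ...
          'VariableNames', {'Year', 'ArrDelay'});
writetable(T, fname);

ds = datastore(fname);
ds.SelectedVariableNames = {'ArrDelay'};   % read only the column we need
ds.ReadSize = 25;                          % rows returned per read call

total = 0;
count = 0;
while hasdata(ds)
    chunk = read(ds);                      % table with at most 25 rows
    total = total + sum(chunk.ArrDelay, 'omitnan');
    count = count + sum(~isnan(chunk.ArrDelay));
end
meanDelay = total / count;                 % mean of 1..100 -> 50.5

clear ds
delete(fname);
```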


MATLAB Distributed Computing Server: Hadoop

[Figure: MATLAB MapReduce code is run by MATLAB Distributed Computing Server as map and reduce tasks on each Hadoop node, reading that node's data from HDFS through a datastore]
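The map and reduce steps in this diagram can be illustrated with MATLAB's `mapreduce` function. The sketch below is a function file (save it as `meanDelayByYear.m`); the function name, sample data, and column names are all hypothetical. The map function emits a partial `[sum, count]` per year for each chunk, and the reduce function combines the partial results for one year into a mean. Run as written, it executes in the local MATLAB session; per the slides, the same kind of map and reduce code can run on Hadoop with MATLAB Distributed Computing Server.

```matlab
function result = meanDelayByYear()
% Sketch: mean ArrDelay per Year via mapreduce on a small local file.
% Returns an N-by-2 matrix [Year, MeanDelay] sorted by Year.

% Hypothetical sample data: 100 rows, years cycling through 2001-2005.
fname = [tempname '.csv'];
T = table(repmat((2001:2005)', 20, 1), (1:100)', ...
          'VariableNames', {'Year', 'ArrDelay'});
writetable(T, fname);

ds = datastore(fname);
ds.ReadSize = 30;          % several chunks, so partial results must be combined

outDir = tempname;
mkdir(outDir);
outds = mapreduce(ds, @mapByYear, @reduceMean, ...
                  'OutputFolder', outDir, 'Display', 'off');

% The output is a key-value datastore; normalize it to a numeric matrix.
tbl = readall(outds);
keys = tbl.Key;
vals = tbl.Value;
if iscell(keys), keys = cell2mat(keys); end
if iscell(vals), vals = cell2mat(vals); end
result = sortrows([keys(:), vals(:)], 1);

clear ds outds
delete(fname);
rmdir(outDir, 's');
end

function mapByYear(data, ~, intermKVStore)
% Map: emit a partial [sum, count] for every year present in this chunk.
years = unique(data.Year);
for k = 1:numel(years)
    d = data.ArrDelay(data.Year == years(k));
    d = d(~isnan(d));
    add(intermKVStore, years(k), [sum(d), numel(d)]);
end
end

function reduceMean(year, valIter, outKVStore)
% Reduce: combine all partial [sum, count] values for one year into a mean.
totals = [0, 0];
while hasnext(valIter)
    totals = totals + getnext(valIter);
end
add(outKVStore, year, totals(1) / totals(2));
end
```

Emitting sums and counts rather than per-chunk means keeps the reduction correct when chunks contain different numbers of rows for a year.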


Scalable Data Workflow: Easily migrate from the desktop to clusters and Hadoop

• Desktop (MATLAB, Parallel Computing): datastore/mapreduce; access HDFS

• Connected to clusters (MATLAB Distributed Computing Server): mapreduce on clusters, including Hadoop (HDFS)

• Production clusters (MATLAB Compiler): deploy mapreduce for use on production clusters
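Moving between these stages is largely a matter of choosing where `mapreduce` runs, which MATLAB controls through `mapreducer`. The sketch below selects the serial desktop option and shows the parallel-pool and Hadoop options as comments. The Hadoop install folder is a placeholder, and those two options require Parallel Computing Toolbox and MATLAB Distributed Computing Server respectively.

```matlab
% Sketch: choose the execution environment for subsequent mapreduce calls.
% Only option 1 runs as written; options 2 and 3 are commented out.

% 1) Desktop, serial: run mapreduce in the current MATLAB session.
mr = mapreducer(0);

% 2) Desktop, parallel pool (requires Parallel Computing Toolbox):
% mr = mapreducer(gcp);

% 3) Hadoop cluster (requires MATLAB Distributed Computing Server;
%    the install folder below is a placeholder):
% cluster = parallel.cluster.Hadoop('HadoopInstallFolder', '/usr/local/hadoop');
% mr = mapreducer(cluster);

% The map and reduce functions stay the same; only mr changes, e.g.:
% outds = mapreduce(ds, @mapFcn, @reduceFcn, mr);
```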


Key Takeaways

• Easy access to Big Data from your desktop with MATLAB

• Work on the desktop with MATLAB and scale to clusters

• Easy deployment into production, including support for Hadoop


Resources

• MATLAB MapReduce and Hadoop
  – http://www.mathworks.com/discovery/matlab-mapreduce-hadoop.html
  – Google “MATLAB Hadoop”

• Consulting Team: MATLAB for Business Critical Applications

• Reach out to your account team


Thank you!