27
1 © 2015 The MathWorks, Inc. Becoming a Data-Centric Engineering Team Emelie Andersson

Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

1© 2015 The MathWorks, Inc.

Becoming a Data-Centric Engineering Team

Emelie Andersson

Page 2: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

2

A path for how your team can

better work with and utilize

data.

Page 3: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

3

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Generally Useful

Tools for

Analysis

Common

Infrastructure;

Tested and

Documented

Overhead when asking

a new question

Ease of scaling

to more people

Page 4: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

4

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

Generally Useful

Tools for

Analysis

• Goal is to be fast: reduce time to insight

Page 5: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

5

Getting Started: Exploring a New Dataset

Page 6: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

6

Getting Started: Exploring a New Dataset

Page 7: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

7

Getting Started: Exploring a New Dataset

Page 8: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

8

Getting Started: Exploring a New Dataset Missing Dataismissing

rmmissing

fillmissing

Outliersisoutlier

rmoutliers

filloutliers

Change Pointsischange

Noisy Datasmoothdata

and more…

https://www.mathworks.com/help/

matlab/preprocessing-data.html

Page 9: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

9

Getting Started: Exploring a New Dataset

Page 10: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

10

Getting Started: Exploring a New Datasetgeoplot

geoscatter

geobubble

geodensityplot

https://www.mathworks.com/help/

matlab/geographic-plots.html

Page 11: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

11

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

• Explore and understand data

• Document analysis

• Tools will be re-used in next steps

Generally Useful

Tools for

Analysis

Page 12: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

12

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

• Apply to different datasets

• Functions/Scripts

• MATLAB Apps

• Trend: Work with BIG DATA

Generally Useful

Tools for

Analysis

Page 13: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

13

Overview of Flight Data

▪ 35 unique aircraft

▪ 180,000 unique flights

▪ 300 GB of data

▪ Source:

– NASA Dash Link: Sample Flight Data

– https://c3.nasa.gov/dashlink/projects/85/

Page 14: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

14

Big Data Creates Opportunities

Find rare events, then dive deeper

Build and validate test scenarios that match real-world conditions

Perform fleet-wide calculations

Page 15: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

15

Big Data Requires New Tools

Built-In Datastores

General datastore

spreadsheetDatastore

tabularTextDatastore

fileDatastore

Database databaseDatastore

Image imageDatastore

denoisingImageDatastore

randomPatchExtractionDatastore

pixelLabelDatastore

augmentedImageDatastore

Audio audioDatastore

Predictive

Maintenance

fileEnsembleDatastore

simulationEnsembleDatastore

Simulink SimulationDatastore

Automotive mdfDatastore

Page 16: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

16

Big Data Requires New Tools

▪ Customize a datastore to work with

your dataset

▪ Gives you control over how data is

loaded and formatted

▪ MATLAB subclass: “fill-in-the-blanks”

▪ Build a piece of infrastructure, then re-

use it in your analyses

function [data,info] = read(ds)

...

end

function tf = hasdata(ds)

...

end

function reset(ds)

...

end

function p = progress(ds)

...

end

function data = readall(ds)

...

end

Custom Datastore

Page 17: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

17

A Custom Datastore for Flight Data

Page 18: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

18

Find Rare Events, then Dive Deeper

Page 19: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

19

Perform Fleet-Wide Calculations

Page 20: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

20

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

• Make it easy to navigate the data

• Re-use each time you analyze the dataset

Generally Useful

Tools for

Analysis

Page 21: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

21

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

Generally Useful

Tools for

Analysis

• Collaborate: Work with others on a common code base

• Verify: Write well-tested software

• Share: Build tools for others

Page 22: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

22

MATLAB Projects

Page 23: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

23

Testing

Page 24: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

24

Creating a Toolbox

Page 25: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

25

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

• Scale-out to larger group of users

• Easier to maintain and share

Generally Useful

Tools for

Analysis

Page 26: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

26

What’s Next?

Advanced Analytics and Machine Learning

Build and Test Algorithms for

Embedded Systems

Deploy Apps and Analytics to

Enterprise IT Systems

Page 27: Becoming a Data-Centric Engineering Team...Ad-hoc Individual Analysis Common Infrastructure; Tested and Documented •Apply to different datasets •Functions/Scripts •MATLAB Apps

27

Takeaways

▪ MATLAB has many new tools to help you better work with and utilize

your data

▪ Create tools for you / your team / your organization to explore and

analyze data

▪ Increasing maturity with data science is a journey; we’re here to help