20
ORNL is managed by UT-Battelle for the US Department of Energy Adapting DL to New Data: An Evolutionary Algorithm for Optimizing Deep Networks Steven R. Young Research Scientist Oak Ridge National Laboratory

New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

ORNL is managed by UT-Battelle

for the US Department of Energy

Adapting DL to New Data: An Evolutionary Algorithm for Optimizing Deep Networks

Steven R. Young

Research Scientist

Oak Ridge National Laboratory

Page 2: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

2 Adapting DL to New Data

Overview

• Deep Learning in Science

• Challenges

• Tools

• Next Steps

Page 3: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

3 Adapting DL to New Data

Deep Learning for Science Applications

CommercialApplications

ScienceApplications

Object Recognition

Face Recognition

State of the Art Results

Characteristics

• Data is easy to collect• Inexpensive labels

Challenging New Domains

Material Science

High Energy Physics

Characteristics

• Data is difficult to collect• Few labels available

Page 4: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

4 Adapting DL to New Data

Problem: Adaptability Challenge

• Premise: For every data set, there exists a corresponding neural network that performs ideally with that data

• What’s the ideal neural network architecture (i.e., hyper-parameters) for a particular data set ?

• Widely-used approach: intuition

1. Pick some deep learning software (Caffe, Torch, Theano, etc)

2. Design a set of parameters that defines your deep learning network

3. Try it on your data

4. If it doesn’t work as well as you want, go back to step 2 and try again.

Page 5: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

5 Adapting DL to New Data

The Challenge

Pooling

Convolutional

Input

Output

Fully Connected

Pooling

Convolutional

Output

Fully Connected

Pooling

ConvolutionalLearning Rate Batch Size

Momentum Weight Decay

Deep Learning Toolbox

Page 6: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

6 Adapting DL to New Data

Deep Learning Toolbox

The Challenge

Convolutional

Input

Convolutional

Output

Learning Rate Batch Size

Momentum Weight Decay

Page 7: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

7 Adapting DL to New Data

Hyper-parameter Selection

• Manual search, guess and check

– Requires domain knowledge

• Grid search

– Exponential growth with high-dimensional hyper-parameter space

– Doesn’t exploit low effective dimension for discovery

• Random search

– By itself, not adaptive (no use of information from previous experiments)

Page 8: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

8 Adapting DL to New Data

What can we do with Titan?

18,688 GPUs

Page 9: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

9 Adapting DL to New Data

MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning• Evolutionary algorithm as a solution for searching hyper-

parameter space for deep learning

– Focus on Convolutional Neural Networks

– Evolve only the topology with EA; typical SGD training process

– Generally: Provide scalability and adaptability for many data sets and compute platforms

• Leverage more GPUs; ORNL’s Titan has 18k GPUs

– Next generation, Summit, will have increased GPU capability

• Provide the ability to apply DL to new datasets quickly

– Climate science, material science, physics, etc.

Page 10: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

10 Adapting DL to New Data

Designing the Genetic Code

• Goal: facilitate complete network definition exploration

• Each population member is a network which has a genome with sets of genes

– Fixed width set of genes corresponds to a layer

• Layers contain multiple distinct parameters

– Restrict layer types based on section

• Feature extraction and classification

• Minor guided design in network, otherwise we attempt to fully encompass all layer types

Population – Group of Networks

Individual - Network

Feature Layers

Parameters

… …

Classification Layers

… …

Page 11: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

11 Adapting DL to New Data

MENNDL: Communication

Genetic Algorithm

Master

Gene:

Population

Network

Parameters

Network 1

Parameters,

Model

Predictions

Performance

Metrics

Network 2

Parameters,

Model

Predictions

Performance

Metrics

Network N

Parameters,

Model

Predictions

Performance

Metrics

Fitness

Metrics:

Accuracy

Worker

(one per node)

MPI

Page 12: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

12 Adapting DL to New Data

Hyper-parameter Values vs Performance

• Currently T&E of latest code that changes all possible parameters (e.g., # of layers, layer types, etc)

• Using just 4 nodes

• From 27% to 65% Accuracy

Evolved

Page 13: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

13 Adapting DL to New Data

MINERvA

Page 14: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

14 Adapting DL to New Data

MINERvA Vertex Segment Classification

Goal: Classify which segment the vertex is

located in.

Challenge: Events can have

very different characteristics.

Page 15: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

15 Adapting DL to New Data

Benefit of Parallelization

MINERvA dataset

12 hours

6 hours

2 hours

Page 16: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

16 Adapting DL to New Data

Unusual layers

• Second convolution layer has a kernel size of 29• Followed by MAX Pooling layer with kernel size of 19

Page 17: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

17 Adapting DL to New Data

Unusual Layers (limited training examples)

Page 18: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

18 Adapting DL to New Data

Current Status

• Scaled to 15,000 nodes of Titan

• 460,000 Networks evaluated in 24 hours

• Expanding to more complex topologies

• Evaluating on a wide range of science datasets

Page 19: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

19 Adapting DL to New Data

Acknowledgements

• Gabriel Perdue (FNAL) and Sohini Upadhyay (University of Chicago)

• Adam Terwilliger (Grand Valley State University) and David Isele (University of Pennsylvania)

• Robert Patton, Seung-Hwan Lim, Thomas Karnowski, and Derek Rose (ORNL)

Page 20: New Adapting DL to New Data: An Evolutionary Algorithm for … · 2017. 5. 5. · 9 Adapting DL to New Data MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning •Evolutionary

20 Adapting DL to New Data

Questions