
Introduction to Apache Horn (Incubating)


Page 1: Introduction to apache horn (incubating)

Apache Horn (Incubating): a Large-scale Deep Learning Platform

Edward J. Yoon @eddieyoon
Oct 15, 2015 @ R3 Diva-Hall, Samsung Electronics

Page 2: Introduction to apache horn (incubating)

I am ..
● Member of Apache Software Foundation
● PMC member and committer, or Mentor of
  ○ Apache Incubator,
  ○ Apache Hama, Apache Horn, Apache MRQL,
  ○ and Apache Rya, Apache BigTop.
● Cloud Tech Lab, Software R&D Center
  ○ HPC Cloud (Network Analysis, ML & DNN)

Page 3: Introduction to apache horn (incubating)

What’s Apache Horn?

Horn [hɔ:n]: 얼(혼) 魂, Korean for “soul” = Mind

● Horn is a clone project of Google’s DistBelief that supports both data and model parallelism.
  ○ Apache Incubator Project (since Sep 2015)
  ○ 9 initial members from Samsung Electronics, Microsoft, Cldi Inc, LINE Plus, TUM, KAIST, …, etc.

Page 4: Introduction to apache horn (incubating)

Google’s DistBelief
● GPUs are expensive, both to buy and to rent.
● Most GPUs can only hold a relatively small amount of data in memory, and CPU-to-GPU data transfer is very slow.
  ○ Therefore, the training speed-up is small when the model does not fit in GPU memory.
● DistBelief is a framework for training deep neural networks that avoids a GPU-only approach (for the above reasons) and scales to problems with a large number of examples and dimensions (e.g., high-resolution images).

Page 5: Introduction to apache horn (incubating)

Google’s DistBelief

● It supports both Data and Model Parallelism
  ○ Data Parallelism: the training data is partitioned across several machines, each having its own replica of the model. Each replica trains on its partition of the data in parallel (see the toy sketch after this list).
  ○ Model Parallelism: the layers of each model replica are distributed across machines.
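
To make the data-parallel half concrete, here is a toy, self-contained Java sketch (not Horn or DistBelief code; the ModelReplica class and its update rule are invented for illustration): each worker thread owns a full copy of the model and trains it on its own partition of the data.

import java.util.Arrays;
import java.util.List;

public class DataParallelismSketch {

  // A stand-in "model": a single weight updated by toy training steps.
  static class ModelReplica {
    double weight = 0.0;
    void train(double[] partition) {
      for (double x : partition) {
        weight += 0.01 * (x - weight); // placeholder update rule
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // The training data, already split into partitions.
    List<double[]> partitions = Arrays.asList(
        new double[] {1, 2, 3}, new double[] {4, 5, 6}, new double[] {7, 8, 9});

    // One model replica and one worker thread per data partition.
    ModelReplica[] replicas = new ModelReplica[partitions.size()];
    Thread[] workers = new Thread[partitions.size()];
    for (int i = 0; i < partitions.size(); i++) {
      replicas[i] = new ModelReplica();
      final int id = i;
      workers[i] = new Thread(() -> replicas[id].train(partitions.get(id)));
      workers[i].start();
    }
    for (Thread w : workers) w.join();
    // The replicas would then be merged (e.g., averaged) into one model.
  }
}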

Page 6: Introduction to apache horn (incubating)

DistBelief: Basic Architecture
Each worker group performs a minibatch in the BSP paradigm and interacts with the Parameter Server asynchronously.

Page 7: Introduction to apache horn (incubating)

What’s BSP?
● Bulk Synchronous Parallel, a parallel computing model developed by Leslie Valiant of Harvard University during the 1980s.
● Iteratively (see the sketch after this list):
  a. Local Computation
  b. Communication (Message Passing)
  c. Global Barrier Synchronization
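
A minimal sketch of one such superstep, written against Apache Hama’s BSP API (the engine the later slides build on); the numeric payload and the “local work” are placeholders:

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hama.bsp.BSP;
import org.apache.hama.bsp.BSPPeer;
import org.apache.hama.bsp.sync.SyncException;

public class SuperstepExample extends
    BSP<NullWritable, NullWritable, NullWritable, NullWritable, DoubleWritable> {

  @Override
  public void bsp(
      BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, DoubleWritable> peer)
      throws IOException, SyncException, InterruptedException {
    // a. Local Computation: some value derived from this task's partition
    double localResult = 42.0; // placeholder for real work

    // b. Communication: message passing to all peers
    for (String other : peer.getAllPeerNames()) {
      peer.send(other, new DoubleWritable(localResult));
    }

    // c. Global Barrier Synchronization: every task blocks here
    peer.sync();

    // After the barrier, the messages the peers sent before it are readable.
    double sum = 0.0;
    DoubleWritable msg;
    while ((msg = peer.getCurrentMessage()) != null) {
      sum += msg.get(); // combine the peers' results
    }
  }
}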

Page 8: Introduction to apache horn (incubating)

DistBelief: Batch Optimization

The Coordinator 1) finds stragglers (slow tasks) for better load balancing and resource usage, similar to Google MapReduce’s “Backup Tasks”, and 2) reduces communication overheads between the central Parameter Server and the workers, acting something like Aggregators.

Page 9: Introduction to apache horn (incubating)

As a result:
● A CPU cluster can train deep networks significantly faster than a GPU, without a limit on the maximum model size.
  ○ The CPU cluster is 10x faster than a GPU.
● Trained a model with over 1 billion parameters to achieve better-than-state-of-the-art performance on the ImageNet challenge.

Nov 2012: IBM simulates 530 billion neurons and 100 trillion synapses, using 1,572,864 processor cores, 1.5 PB of memory, and 6,291,456 threads.

Page 10: Introduction to apache horn (incubating)

Wait, .. Why do we need this?
● Deep learning is likely to spur applications beyond speech and image recognition in the near term.
  ○ e.g., medicine, manufacturing, and transportation.

Page 11: Introduction to apache horn (incubating)

and, it’s Closed Source Software
● We need to solve problems of scale (the size of the training set and of the neural networks), but many OSS projects such as Caffe, DeepDist, Spark MLlib, Deeplearning4j, and NeuralGiraph are data-parallel or model-parallel only.
● So, we started to clone Google’s DistBelief, called Apache Horn (Incubating).

Page 12: Introduction to apache horn (incubating)

The key idea of the implementation

● .. is to use existing OSS distributed systems
  ○ Apache Hadoop: Distributed File System, Resource Manager.
  ○ Apache Hama: a general-purpose BSP computing engine on top of Hadoop, which can be used for both data-parallel and graph-parallel workloads in a flexible way.

Page 13: Introduction to apache horn (incubating)

Apache Hama: BSP framework

[Diagram: Task 1 … Task N running on the BSP framework (on Hama or YARN) over Hadoop HDFS]

Like MapReduce, the Apache Hama BSP framework schedules tasks according to the distance between a task’s input data and the requesting nodes.

BSP tasks are globally synchronized after performing computations on local data and communication actions.

Page 14: Introduction to apache horn (incubating)

Global Regional Synchronization

[Diagram: Tasks 1–6 split into groups on the BSP framework (on Hama or YARN) over Hadoop HDFS]

All tasks within the same group are synchronized with each other. Each group works asynchronously as an independent BSP job.

Page 15: Introduction to apache horn (incubating)

Async mini-batches using Regional Synchronization

[Diagram: task groups (Tasks 1–6) on the BSP framework (on Hama or YARN) over Hadoop HDFS, exchanging parameters with two Parameter Servers via Parameter Swapping]

Each group performs a minibatch in the BSP paradigm and interacts with the Parameter Server asynchronously (a hypothetical sketch of this loop follows below).
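
A hypothetical sketch of that loop in Java (invented class and method names, not the actual Horn API): each group pulls possibly-stale parameters, computes a gradient for its minibatch, and pushes the update back with no barrier across groups.

import java.util.Arrays;
import java.util.List;

public class AsyncMinibatchSketch {

  // Stand-in for the Parameter Server: one shared weight vector.
  static class ParameterServer {
    private volatile double[] params = new double[] {0.0};
    double[] fetch() { return params.clone(); }     // may be slightly stale
    synchronized void push(double[] grad) {          // apply an update
      double[] next = params.clone();
      for (int i = 0; i < next.length; i++) next[i] -= 0.01 * grad[i];
      params = next;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ParameterServer ps = new ParameterServer();
    List<double[]> minibatches =
        Arrays.asList(new double[] {1.0}, new double[] {2.0});

    // Two worker "groups", each cycling through its own minibatches.
    Runnable group = () -> {
      for (double[] batch : minibatches) {
        double[] w = ps.fetch();                        // pull parameters
        double[] grad = new double[] {w[0] - batch[0]}; // placeholder gradient
        ps.push(grad);              // push asynchronously; no cross-group barrier
      }
    };
    Thread g1 = new Thread(group), g2 = new Thread(group);
    g1.start(); g2.start(); g1.join(); g2.join();
    System.out.println(ps.fetch()[0]);
  }
}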

Page 16: Introduction to apache horn (incubating)

Async mini-batches using Regional Synchronization (cont.)

[Diagram: same layout as the previous slide, with one task group highlighted as the Coordinator]

One of the groups works as a Coordinator; each group performs a minibatch in the BSP paradigm and interacts with the Parameter Server asynchronously via Parameter Swapping.

Page 17: Introduction to apache horn (incubating)

Neuron-centric Programming APIs

User-defined neuron-centric programming APIs: the activation and cost functions compute the propagated information or error messages and send their updates to the Parameter Server (but not fully designed yet).

Similar to Google’s Pregel (a hypothetical sketch follows below).
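
Since the slide notes the design was not final, here is only a hypothetical sketch of what such a Pregel-style, neuron-centric API might look like; Neuron, forward, backward, and the commented-out propagate/push calls are all invented names, not the actual Horn API:

// Each neuron, like a Pregel vertex, reacts to incoming messages.
abstract class Neuron {
  protected double output;

  // Forward pass: combine inputs from the previous layer, apply the
  // activation function, and emit the result to the next layer.
  public abstract void forward(Iterable<Double> inputs);

  // Backward pass: receive error messages from the next layer, compute
  // this neuron's gradient, and send its update to the Parameter Server.
  public abstract void backward(Iterable<Double> deltas);
}

class SigmoidNeuron extends Neuron {
  @Override
  public void forward(Iterable<Double> inputs) {
    double sum = 0.0;
    for (double in : inputs) sum += in;        // weights folded into messages here
    output = 1.0 / (1 + Math.exp(-sum));       // sigmoid activation
    // propagate(output);  // hypothetical: emit to the next layer
  }

  @Override
  public void backward(Iterable<Double> deltas) {
    double err = 0.0;
    for (double d : deltas) err += d;
    double grad = err * output * (1 - output); // sigmoid derivative
    // pushToParameterServer(grad);  // hypothetical: send the update
  }
}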

Page 18: Introduction to apache horn (incubating)

Job Configuration APIs

/*
 * Sigmoid Activation Function
 */
public static class Sigmoid extends ActivationFunction {
  public double apply(double input) {
    return 1.0 / (1 + Math.exp(-input));
  }
}

...

public static void main(String[] args) {
  ANNJob ann = new ANNJob();

  // Initialize the topology of the model
  ann.addLayer(int featureDimension, Sigmoid.class, int numOfTasks);
  ann.addLayer(int featureDimension, Step.class, int numOfTasks);
  ann.addLayer(int featureDimension, Tanh.class, int numOfTasks);
  …

  ann.setCostFunction(CrossEntropy.class);
  ..
}

Page 19: Introduction to apache horn (incubating)

Job Submission Flow

[Diagram: the user’s ANN Job is submitted through the Apache Horn Client and Web UI to the BSP framework on Apache Hama or YARN clusters, over Hadoop HDFS. Tasks 1–9 are arranged in worker groups, with Data Parallelism across groups and Model Parallelism within each group; one worker group works as a Coordinator, and groups exchange parameters with the two Parameter Servers via Parameter Swapping.]

Page 20: Introduction to apache horn (incubating)

Horn Community
● https://horn.incubator.apache.org/
● https://issues.apache.org/jira/browse/HORN
● Mailing lists
  ○ dev@horn.incubator.apache.org