
DEEP LEARNING IN NEURAL NETWORKS

Tanushri Sarma CSI14007
Roshan Chettri CSI14029

Deep Learning: Definition

Deep learning is a set of algorithms in machine learning that attempt to learn in multiple levels, corresponding to different levels of abstraction. It typically uses artificial neural networks. The levels in these learned statistical models correspond to distinct levels of concepts, where higher-level concepts are defined from lower-level ones, and the same lower-level concepts can help to define many higher-level concepts.

Deep Learning Overview

• Train networks with many layers (vs. shallow nets with just a couple of layers)
• Multiple layers work to build an improved feature space
  – First layer learns 1st-order features (e.g. edges…)
  – 2nd layer learns higher-order features (combinations of first-layer features, combinations of edges, etc.)
  – In current models, layers often learn in an unsupervised mode and discover general features of the input space, serving multiple tasks related to the unsupervised instances (image recognition, etc.)
  – Then the final layer's features are fed into supervised layer(s)
• The entire network is often subsequently tuned using supervised training of the whole net, starting from the initial weightings learned in the unsupervised phase (a sketch of this two-phase workflow follows below)
  – Could also do fully supervised versions, etc.
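To make the two-phase workflow above concrete, here is a minimal sketch in NumPy. It is not from the slides: the autoencoder-style layer trainer, the layer sizes, and the toy data are illustrative assumptions (DBNs, covered later, use RBMs in the same role).

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_layer_unsupervised(X, n_hidden, epochs=200, lr=0.1):
    """Train one autoencoder layer (encoder W_e, linear decoder W_d)
    on squared reconstruction error; return the encoder weights."""
    n_in = X.shape[1]
    W_e = rng.normal(scale=0.1, size=(n_in, n_hidden))
    W_d = rng.normal(scale=0.1, size=(n_hidden, n_in))
    for _ in range(epochs):
        H = sigmoid(X @ W_e)              # encode
        R = H @ W_d                       # decode
        err = R - X                       # reconstruction error
        grad_Wd = H.T @ err / len(X)
        dH = err @ W_d.T * H * (1 - H)    # backprop through the sigmoid
        W_e -= lr * (X.T @ dH / len(X))
        W_d -= lr * grad_Wd
    return W_e

# Greedy layer-wise pretraining: each layer learns general features of
# the previous layer's output, with no labels involved.
X = rng.random((100, 20))                 # toy unlabeled data
features, weights = X, []
for n_hidden in (16, 8):
    W = train_layer_unsupervised(features, n_hidden)
    weights.append(W)
    features = sigmoid(features @ W)      # feed features to the next layer

# 'features' now holds the learned higher-order representation; a
# supervised layer (e.g. logistic regression on labels) would be trained
# on top of it, and the whole stack optionally fine-tuned with backprop.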

A Three-Way Categorization

1. Deep networks for unsupervised or generative learning, which are intended to capture high-order correlations of the observed or visible data for pattern analysis or synthesis purposes when no information about target class labels is available.

2. Deep networks for supervised learning, which are intended to directly provide discriminative power for pattern classification purposes, often by characterizing the posterior distributions of classes conditioned on the visible data.

3. Hybrid deep networks, where the goal is discrimination, assisted, often in a significant way, by the outcomes of generative or unsupervised deep networks.

Supervised Learning

Unsupervised Learning

Deep Learning Architectures

• Deep Neural Networks

• Deep Belief Networks

• Convolutional Neural Networks

• Deep Boltzmann Machines

Deep Neural Networks

• A deep neural network (DNN) is an artificial neural network with multiple hidden layers of units between the input and output layers. Like shallow ANNs, DNNs can model complex non-linear relationships. DNN architectures, e.g. for object detection and parsing, generate compositional models where the object is expressed as a layered composition of image primitives. The extra layers enable composition of features from lower layers, giving the potential of modelling complex data with fewer units than a similarly performing shallow network. A forward pass through such a stack is sketched below.
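As an illustration of the layered composition described above, here is a minimal DNN forward pass in NumPy. The layer sizes, the ReLU activation, and the random weights are assumptions for the sketch, not details from the slides.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def dnn_forward(x, layers):
    """layers: list of (W, b) pairs, one per layer."""
    h = x
    for W, b in layers[:-1]:
        h = relu(h @ W + b)    # each hidden layer composes features
                               # from the layer below it
    W_out, b_out = layers[-1]
    return h @ W_out + b_out   # linear output layer (e.g. class scores)

rng = np.random.default_rng(1)
sizes = [32, 64, 64, 10]       # input, two hidden layers, output
layers = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
scores = dnn_forward(rng.random(32), layers)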

Deep Belief Networks

• Introduced by Geoff Hinton (2006)
• Uses greedy layer-wise training, but each layer is an RBM (Restricted Boltzmann Machine)
• An RBM is a constrained Boltzmann machine with:
  – No lateral connections between hidden (h) and visible (x) nodes
  – Symmetric weights
  – No annealing/temperature schedule, which is acceptable since each RBM is not seeking a global minimum, but rather an incremental transformation of the feature space
  – Typically uses a probabilistic logistic node, but other activations are possible
• A single training step is sketched below.
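Here is a minimal sketch of one contrastive-divergence (CD-1) weight update for a single RBM layer, as used in greedy layer-wise DBN training. It is an illustrative NumPy version under simplified assumptions (binary units, no bias terms), not Hinton's full recipe.

import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(X, W, lr=0.1):
    """One CD-1 update. X: batch of visible vectors (n, n_vis);
    W: symmetric weights (n_vis, n_hid). No lateral connections, so
    hidden units are conditionally independent given the visibles."""
    # Positive phase: infer hidden units from the data
    p_h = sigmoid(X @ W)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # Negative phase: one step of alternating Gibbs sampling
    p_v = sigmoid(h @ W.T)            # reconstruct visibles
    p_h_recon = sigmoid(p_v @ W)      # re-infer hiddens
    # CD-1 estimate: data correlations minus reconstruction correlations
    W += lr * (X.T @ p_h - p_v.T @ p_h_recon) / len(X)
    return W

X = (rng.random((50, 12)) < 0.5).astype(float)   # toy binary data
W = rng.normal(scale=0.01, size=(12, 6))
for _ in range(100):
    W = cd1_step(X, W)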

Convolutional Neural Networks

• Each layer combines (merges, smoothens) patches from previous layers
  – Typically tries to compress large data (images) into a smaller set of robust features, based on local variations
  – Basic convolution can still create many features
• Pooling – this step compresses and smoothens the data
  – Makes the data invariant to small translational changes
  – Usually takes the average or max value across disjoint patches
• Often convolution filters and pooling are hand-crafted – not learned – though tuning can occur
• After this hand-crafted/non-trained/partially trained convolving, the new set of features is used to train a supervised model
• Requires neighborhood regularities in the input space (e.g. images, the stationary property)
• Both operations are sketched below.
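A minimal sketch of the two operations named above: a 2-D valid convolution of an image with a filter, followed by max pooling over disjoint 2x2 patches. Plain NumPy, single channel; the image size and the edge filter are illustrative assumptions.

import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (really cross-correlation, as in most CNNs)."""
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Max over disjoint size x size patches: compresses the feature map
    and gives invariance to small translations."""
    H, W = fmap.shape
    H, W = H - H % size, W - W % size          # trim to a multiple of size
    blocks = fmap[:H, :W].reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))

rng = np.random.default_rng(3)
img = rng.random((8, 8))
edge_filter = np.array([[1., 0., -1.]] * 3)    # crude vertical-edge detector
features = max_pool(conv2d(img, edge_filter))  # (6, 6) -> (3, 3)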


Deep Boltzmann Machines

• A Deep Boltzmann Machine (DBM) is a type of binary pairwise Markov random field (an undirected probabilistic graphical model) with multiple layers of hidden random variables. It is a network of symmetrically coupled stochastic binary units.

Like DBNs, DBMs benefit from the ability to learn complex and abstract internal representations of the input in tasks such as object or speech recognition, using a limited amount of labelled data to fine-tune representations built from a large supply of unlabelled sensory input data.
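As a concrete sketch (a standard textbook form, not from the slides; bias terms omitted), the energy of a two-hidden-layer DBM over visible units v and hidden layers h^(1), h^(2) with symmetric coupling weights W^(1), W^(2) is:

E(\mathbf{v}, \mathbf{h}^{(1)}, \mathbf{h}^{(2)}) = -\mathbf{v}^{\top} W^{(1)} \mathbf{h}^{(1)} - \mathbf{h}^{(1)\top} W^{(2)} \mathbf{h}^{(2)}

The model assigns each joint configuration a probability proportional to e^{-E}. Because all connections are undirected, inferring a middle layer conditions on both the layer below and the layer above it, unlike in a DBN.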

Why GPU in Deep Learning?

• Progress in AI follows a cycle: IDEA → CODE → TRAIN → TEST → back to IDEA.

Why GPU in Deep Learning?

• One important factor that determines our progress in AI is the latency of going from an idea to a tested model, at which point we can go around the cycle again.

• We need to be able to express our ideas about models in code quickly and run them quickly on hardware.

• Boiling our model down into something that can actually run could take years; the latency would be really long.

Why GPU in Deep Learning?

• We need something that is programmable. We need to be able to change our ideas about what our model should look like and just compile and run them.

• This is one of the main advantages of GPUs, and it is why GPUs have been so popular in the field of Deep Learning.

• Another thing that is really important is the time it takes to train. This is where HPC and Parallel Computing come in.

CPUs vs. GPUs

• CPUs and GPUs are designed very differently.
• CPU: latency-oriented design – large caches and control logic to make a single thread finish as quickly as possible.
• GPU: throughput-oriented design – many simple cores that keep thousands of parallel operations in flight. A rough illustration follows below.
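To make the latency-vs-throughput contrast concrete, here is a rough, hedged illustration in NumPy (standing in for throughput-oriented hardware; the matrix sizes are arbitrary). The same multiply-accumulate work runs far faster as one bulk data-parallel operation than as a sequential Python loop.

import time
import numpy as np

rng = np.random.default_rng(4)
A = rng.random((300, 300))
B = rng.random((300, 300))

t0 = time.perf_counter()
C_loop = np.zeros((300, 300))
for i in range(300):                 # latency-style: one output element
    for j in range(300):             # at a time, in sequence
        C_loop[i, j] = A[i, :] @ B[:, j]
t1 = time.perf_counter()
C_vec = A @ B                        # throughput-style: one bulk operation
t2 = time.perf_counter()

print(f"loop: {t1 - t0:.3f}s  vectorized: {t2 - t1:.4f}s")
assert np.allclose(C_loop, C_vec)    # same result, very different latency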

THANK YOU