DESCRIPTION
Deep Learning and Its Applications: Computer Vision (Zipfian Academy Meetup). Deep learning is useful for detecting anomalies such as fraud, spam and money laundering; identifying similarities to augment search and text analytics; predicting customer lifetime value and churn; and recognizing faces and voices. Deeplearning4j's neural nets include restricted Boltzmann machines, deep-belief networks, deep autoencoders, convolutional nets and recursive neural tensor networks.
Deep Learning and Its Applications: Computer Vision
Adam Gibson | deeplearning4j.org // skymind.io // zipfian academy
• Object Recognition
• Image Categorization
• Scene Parsing
• Face Recognition
Computer Vision: A Primer
• OpenCV
• SIFT
• Filters/Edge Detection
• Feature Extraction
What’s currently done?
• Representation Learning
• More precise than hand-engineered features
• Non-linearities and higher-order trends
• Pretraining and Hessian-Free optimization
This is manual!
• Representation Learning
• Position Invariance with convolutions
• Semantic Hashing
Deep Learning and Images
• Normal pixels: intensity values 0-255, normalized (e.g., scaled into [0, 1])
• Sparse: binarization (depending on pixel presence)
Different kinds of images
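The two pixel treatments above can be sketched in plain NumPy (a conceptual illustration with made-up pixel values, not DL4J code):

```python
import numpy as np

# Hypothetical 8-bit grayscale image (intensities 0-255).
pixels = np.array([[0, 128, 255],
                   [64, 0, 192]], dtype=np.uint8)

# Normalization: scale intensities into [0, 1] for the net's input layer.
normalized = pixels.astype(np.float32) / 255.0

# Sparse binarization: 1 where a pixel is present, 0 otherwise.
binary = (pixels > 0).astype(np.float32)
```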
• Faces = a collection of images with persistent patterns of pixels.
• Pixel patterns = features.
• Nets learn to identify features in data, to classify faces as faces and label them: John or Sarah.
• Nets train by reconstructing faces from features many times, measuring their work against a benchmark.
Facial recognition
DL4J’s Facial Reconstructions
• Slices of a feature space (max pooling)
• Learns different portions for easily scalable and robust feature engineering
Position Invariance - Convolutions
Visual Example - Convolutions
Pen Strokes
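The convolution-plus-max-pooling idea behind position invariance can be sketched in NumPy (a toy illustration with a hypothetical vertical-edge kernel, not DL4J's implementation):

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2-D convolution: slide the kernel over the image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the strongest response per patch."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    return feature_map[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Toy 6x6 image with a light-to-dark vertical edge between columns 2 and 3.
image = np.zeros((6, 6))
image[:, :3] = 1.0
edge_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])
fmap = convolve2d(image, edge_kernel)   # responds strongly near the edge
pooled = max_pool(fmap)                 # pooling keeps that response
                                        # wherever the edge sits in the patch
```

Pooling is what buys position invariance: shifting the edge within a pooling patch leaves the pooled response unchanged.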
• Facebook uses facial recognition to make itself stickier and know more about us.
• Government agencies use it to secure national borders.
• Video game makers use it to construct more realistic worlds.
• Stores use it to identify customers and track behavior.
What are faces for?
• 2 layers of neuron-like nodes.
• The 1st is the visible, or input, layer.
• The 2nd is “hidden”: it identifies features in the input.
• Symmetrically connected.
• “Restricted” = no visible-visible or hidden-hidden ties.
• All connections happen between layers.
Restricted Boltzmann Machines (RBMs)
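A minimal NumPy sketch of an RBM trained with one-step contrastive divergence (CD-1); the layer sizes, learning rate, and training loop are illustrative assumptions, not DL4J's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes: 6 visible units, 3 hidden units.
n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.1, (n_visible, n_hidden))  # one weight per between-layer tie
b_v = np.zeros(n_visible)                      # visible biases
b_h = np.zeros(n_hidden)                       # hidden biases

def cd1_step(v0, lr=0.1):
    """One CD-1 update on a binary input vector; returns reconstruction error."""
    global W, b_v, b_h
    # Up: hidden activations from the visible layer (no hidden-hidden ties).
    h0 = sigmoid(v0 @ W + b_h)
    h0_sample = (rng.random(n_hidden) < h0).astype(float)
    # Down and up again: reconstruct the visible layer, re-infer the hidden.
    v1 = sigmoid(h0_sample @ W.T + b_v)
    h1 = sigmoid(v1 @ W + b_h)
    # Move toward the data statistics, away from the reconstruction's.
    W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
    b_v += lr * (v0 - v1)
    b_h += lr * (h0 - h1)
    return np.mean((v0 - v1) ** 2)

v = np.array([1., 1., 0., 0., 1., 0.])
errors = [cd1_step(v) for _ in range(200)]  # reconstruction error falls
```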
• A stack of RBMs.
• Each RBM’s hidden layer becomes the next RBM’s visible/input layer.
• DBNs learn more & more complex features.
• Example:
  1) Pixels = input;
  2) H1 learns an edge or line;
  3) H2 learns a corner or set of lines;
  4) H3 learns two groups of lines forming an object -- a face!
• Final layer classifies feature groups: sunset, elephant, flower, John, Sarah.
Deep-Belief Net (DBN)
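The hidden-layer-feeds-the-next-visible-layer stacking can be sketched as greedy layer-wise pretraining; this toy NumPy version (made-up layer sizes and data, a compressed CD-1 inner loop, not DL4J code) trains one small RBM per layer:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=100):
    """Tiny RBM trained with 1-step contrastive divergence; returns (W, b_h)."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.1, (n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            h0 = sigmoid(v0 @ W + b_h)
            h0_sample = (rng.random(n_hidden) < h0).astype(float)
            v1 = sigmoid(h0_sample @ W.T + b_v)
            h1 = sigmoid(v1 @ W + b_h)
            W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
            b_v += lr * (v0 - v1)
            b_h += lr * (h0 - h1)
    return W, b_h

# Greedy layer-wise pretraining: each RBM's hidden output becomes the
# next RBM's visible input (illustrative layer sizes: 8 -> 5 -> 3).
data = (rng.random((10, 8)) > 0.5).astype(float)  # toy binary "pixel" data
layers, x = [], data
for n_hidden in (5, 3):
    W, b_h = train_rbm(x, n_hidden)
    layers.append((W, b_h))
    x = sigmoid(x @ W + b_h)  # these features feed the next layer
```

Each layer is trained only on the output of the layer below it, which is how deeper layers come to represent edges, then corners, then whole objects.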
• 2 DBNs.
• The 1st DBN *encodes* data into a vector of 10-30 numbers = pre-training.
• The 2nd DBN decodes the data back into its original state.
• Backprop only happens on the 2nd DBN.
• The 2nd DBN is the fine-tuning stage (reconstruction entropy).
• Reduces documents or images to compact vectors.
• Useful in search, QA and information retrieval.
Deep Autoencoder
Deep Autoencoder Architecture
Image Search Results
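The encode/decode split, and why compact codes help search, can be sketched with a single-hidden-layer autoencoder trained by backprop on reconstruction error (a toy NumPy stand-in with invented sizes and data, not the full DBN-based deep autoencoder):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Toy "images": 16-pixel binary vectors compressed to a 2-number code.
X = (rng.random((20, 16)) > 0.5).astype(float)
n_in, n_code = 16, 2
W_enc = rng.normal(0, 0.1, (n_in, n_code))   # encoder (the "1st DBN" role)
W_dec = rng.normal(0, 0.1, (n_code, n_in))   # decoder (the "2nd DBN" role)

lr = 0.5
for _ in range(2000):
    code = sigmoid(X @ W_enc)       # encode: data -> compact vector
    recon = sigmoid(code @ W_dec)   # decode: compact vector -> original state
    err = recon - X                 # backprop on reconstruction error
    delta_dec = err * recon * (1 - recon)
    W_dec -= lr * (code.T @ delta_dec) / len(X)
    delta_code = (delta_dec @ W_dec.T) * code * (1 - code)
    W_enc -= lr * (X.T @ delta_code) / len(X)

# Search: compare compact codes instead of raw pixels.
codes = sigmoid(X @ W_enc)
query = codes[0]
nearest = np.argsort(np.sum((codes - query) ** 2, axis=1))[:3]
```

Nearest-neighbor lookup over 2-number codes is far cheaper than over raw pixel vectors, which is the point of reducing images to compact vectors for retrieval.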
• Top-down & hierarchical rather than feed-forward (DBNs).
• Handles sequence-based classification, windows of several events, entire scenes (multiple objects).
• Features themselves are vectors.
• A tensor = a multi-dimensional matrix, or multiple matrices of the same size.
Recursive Neural Tensor Net
RNTNs & Scene Composition
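The composition an RNTN applies at each merge, a bilinear tensor term plus a standard recursive-net term, can be sketched in NumPy; the dimensions, random parameters, and scene-segment framing are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # dimensionality of each feature vector

# RNTN parameters: a tensor V (d same-sized 2d x 2d matrices, one per
# output dimension) plus a standard matrix W over the concatenation.
V = rng.normal(0, 0.1, (d, 2 * d, 2 * d))
W = rng.normal(0, 0.1, (d, 2 * d))

def compose(a, b):
    """Merge two child vectors (e.g. two image segments) into a parent vector."""
    ab = np.concatenate([a, b])
    bilinear = np.array([ab @ V[k] @ ab for k in range(d)])  # tensor term
    return np.tanh(bilinear + W @ ab)                        # plus linear term

# Scene composition: combine segment vectors bottom-up into one scene vector.
segment1, segment2, segment3 = (rng.normal(size=d) for _ in range(3))
node = compose(segment1, segment2)   # two segments -> an object node
scene = compose(node, segment3)      # object + segment -> whole scene
```

Because the parent is again a d-dimensional vector, the same `compose` is applied recursively, which is what lets one net handle an entire multi-object scene.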