DESCRIPTION
Deep Learning and Its Applications: Computer Vision (Zipfian Academy Meetup). Deep learning is useful for detecting anomalies such as fraud, spam and money laundering; identifying similarities to augment search and text analytics; predicting customer lifetime value and churn; and recognizing faces and voices. Deeplearning4j's neural nets include restricted Boltzmann machines, deep-belief networks, deep autoencoders, convolutional nets and recursive neural tensor networks.
Deep Learning and Its Applications: Computer Vision
Adam Gibson | deeplearning4j.org // skymind.io // zipfian academy
• Object Recognition
• Image Categorization
• Scene Parsing
• Face Recognition
Computer Vision: A Primer
• OpenCV
• SIFT
• Filters/Edge Detection
• Feature Extraction
What’s currently done?
• Representation Learning
• More precise than hand-engineered features
• Non-linearities and higher-order trends
• Pretraining and Hessian-Free optimization
This is manual!
• Representation Learning
• Position Invariance with convolutions
• Semantic Hashing
Deep Learning and Images
• Normal pixels: intensity values 0-255, normalized (e.g., scaled into [0, 1])
• Sparse: binarization (depending on pixel presence)
Different kinds of images
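The two pixel treatments above can be sketched in plain NumPy (a conceptual illustration with made-up pixel values, not DL4J code):

```python
import numpy as np

# Hypothetical 8-bit grayscale image (intensities 0-255).
pixels = np.array([[0, 128, 255],
                   [64, 0, 192]], dtype=np.uint8)

# Normalization: scale intensities into [0, 1] for the net's input layer.
normalized = pixels.astype(np.float32) / 255.0

# Sparse binarization: 1 where a pixel is present, 0 otherwise.
binary = (pixels > 0).astype(np.float32)
```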
• Faces = a collection of images with persistent patterns of pixels.
• Pixel patterns = features.
• Nets learn to identify features in data, to classify faces as faces and label them: John or Sarah.
• Nets train by reconstructing faces from features many times, measuring their work against a benchmark.
Facial recognition
DL4J’s Facial Reconstructions
• Slices of a feature space (max pooling)
• Learns different portions for easily scalable and robust feature engineering
Position Invariance - Convolutions
Visual Example - Convolutions
Pen Strokes
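The convolution-plus-max-pooling idea behind position invariance can be sketched in NumPy (a toy illustration with a hypothetical vertical-edge kernel, not DL4J's implementation):

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2-D convolution: slide the kernel over the image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the strongest response per patch."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    return feature_map[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Toy 6x6 image with a light-to-dark vertical edge between columns 2 and 3.
image = np.zeros((6, 6))
image[:, :3] = 1.0
edge_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])
fmap = convolve2d(image, edge_kernel)   # responds strongly near the edge
pooled = max_pool(fmap)                 # pooling keeps that response
                                        # wherever the edge sits in the patch
```

Pooling is what buys position invariance: shifting the edge within a pooling patch leaves the pooled response unchanged.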
• Facebook uses facial recognition to make itself stickier and know more about us.
• Government agencies use it to secure national borders.
• Video game makers use it to construct more realistic worlds.
• Stores use it to identify customers and track behavior.
What are faces for?
• 2 layers of neuron-like nodes.
• The 1st is the visible, or input, layer.
• The 2nd is “hidden”: it identifies features in the input.
• Symmetrically connected.
• “Restricted” = no visible-visible or hidden-hidden ties.
• All connections happen between layers.
Restricted Boltzmann Machines (RBMs)
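A minimal NumPy sketch of an RBM trained with one-step contrastive divergence (CD-1); the layer sizes, learning rate, and training loop are illustrative assumptions, not DL4J's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes: 6 visible units, 3 hidden units.
n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.1, (n_visible, n_hidden))  # one weight per between-layer tie
b_v = np.zeros(n_visible)                      # visible biases
b_h = np.zeros(n_hidden)                       # hidden biases

def cd1_step(v0, lr=0.1):
    """One CD-1 update on a binary input vector; returns reconstruction error."""
    global W, b_v, b_h
    # Up: hidden activations from the visible layer (no hidden-hidden ties).
    h0 = sigmoid(v0 @ W + b_h)
    h0_sample = (rng.random(n_hidden) < h0).astype(float)
    # Down and up again: reconstruct the visible layer, re-infer the hidden.
    v1 = sigmoid(h0_sample @ W.T + b_v)
    h1 = sigmoid(v1 @ W + b_h)
    # Move toward the data statistics, away from the reconstruction's.
    W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
    b_v += lr * (v0 - v1)
    b_h += lr * (h0 - h1)
    return np.mean((v0 - v1) ** 2)

v = np.array([1., 1., 0., 0., 1., 0.])
errors = [cd1_step(v) for _ in range(200)]  # reconstruction error falls
```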
• A stack of RBMs.
• Each RBM’s hidden layer becomes the next RBM’s visible/input layer.
• DBNs learn more & more complex features.
• Example:
  1) Pixels = input;
  2) H1 learns an edge or line;
  3) H2 learns a corner or set of lines;
  4) H3 learns two groups of lines forming an object -- a face!
• Final layer classifies feature groups: sunset, elephant, flower, John, Sarah.
Deep-Belief Net (DBN)
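The hidden-layer-feeds-the-next-visible-layer stacking can be sketched as greedy layer-wise pretraining; this toy NumPy version (made-up layer sizes and data, a compressed CD-1 inner loop, not DL4J code) trains one small RBM per layer:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=100):
    """Tiny RBM trained with 1-step contrastive divergence; returns (W, b_h)."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.1, (n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            h0 = sigmoid(v0 @ W + b_h)
            h0_sample = (rng.random(n_hidden) < h0).astype(float)
            v1 = sigmoid(h0_sample @ W.T + b_v)
            h1 = sigmoid(v1 @ W + b_h)
            W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
            b_v += lr * (v0 - v1)
            b_h += lr * (h0 - h1)
    return W, b_h

# Greedy layer-wise pretraining: each RBM's hidden output becomes the
# next RBM's visible input (illustrative layer sizes: 8 -> 5 -> 3).
data = (rng.random((10, 8)) > 0.5).astype(float)  # toy binary "pixel" data
layers, x = [], data
for n_hidden in (5, 3):
    W, b_h = train_rbm(x, n_hidden)
    layers.append((W, b_h))
    x = sigmoid(x @ W + b_h)  # these features feed the next layer
```

Each layer is trained only on the output of the layer below it, which is how deeper layers come to represent edges, then corners, then whole objects.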
• 2 DBNs.
• The 1st DBN *encodes* data into a vector of 10-30 numbers = pre-training.
• The 2nd DBN decodes the data back into its original state.
• Backprop only happens on the 2nd DBN.
• The 2nd DBN is the fine-tuning stage (reconstruction entropy).
• Reduces documents or images to compact vectors.
• Useful in search, QA and information retrieval.
Deep Autoencoder
Deep Autoencoder Architecture
Image Search Results
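The encode/decode split, and why compact codes help search, can be sketched with a single-hidden-layer autoencoder trained by backprop on reconstruction error (a toy NumPy stand-in with invented sizes and data, not the full DBN-based deep autoencoder):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Toy "images": 16-pixel binary vectors compressed to a 2-number code.
X = (rng.random((20, 16)) > 0.5).astype(float)
n_in, n_code = 16, 2
W_enc = rng.normal(0, 0.1, (n_in, n_code))   # encoder (the "1st DBN" role)
W_dec = rng.normal(0, 0.1, (n_code, n_in))   # decoder (the "2nd DBN" role)

lr = 0.5
for _ in range(2000):
    code = sigmoid(X @ W_enc)       # encode: data -> compact vector
    recon = sigmoid(code @ W_dec)   # decode: compact vector -> original state
    err = recon - X                 # backprop on reconstruction error
    delta_dec = err * recon * (1 - recon)
    W_dec -= lr * (code.T @ delta_dec) / len(X)
    delta_code = (delta_dec @ W_dec.T) * code * (1 - code)
    W_enc -= lr * (X.T @ delta_code) / len(X)

# Search: compare compact codes instead of raw pixels.
codes = sigmoid(X @ W_enc)
query = codes[0]
nearest = np.argsort(np.sum((codes - query) ** 2, axis=1))[:3]
```

Nearest-neighbor lookup over 2-number codes is far cheaper than over raw pixel vectors, which is the point of reducing images to compact vectors for retrieval.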
• Top-down & hierarchical rather than feed-forward (DBNs).
• Handles sequence-based classification, windows of several events, entire scenes (multiple objects).
• Features themselves are vectors.
• A tensor = a multi-dimensional matrix, or multiple matrices of the same size.
Recursive Neural Tensor Net
RNTNs & Scene Composition
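The composition an RNTN applies at each merge, a bilinear tensor term plus a standard recursive-net term, can be sketched in NumPy; the dimensions, random parameters, and scene-segment framing are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # dimensionality of each feature vector

# RNTN parameters: a tensor V (d same-sized 2d x 2d matrices, one per
# output dimension) plus a standard matrix W over the concatenation.
V = rng.normal(0, 0.1, (d, 2 * d, 2 * d))
W = rng.normal(0, 0.1, (d, 2 * d))

def compose(a, b):
    """Merge two child vectors (e.g. two image segments) into a parent vector."""
    ab = np.concatenate([a, b])
    bilinear = np.array([ab @ V[k] @ ab for k in range(d)])  # tensor term
    return np.tanh(bilinear + W @ ab)                        # plus linear term

# Scene composition: combine segment vectors bottom-up into one scene vector.
segment1, segment2, segment3 = (rng.normal(size=d) for _ in range(3))
node = compose(segment1, segment2)   # two segments -> an object node
scene = compose(node, segment3)      # object + segment -> whole scene
```

Because the parent is again a d-dimensional vector, the same `compose` is applied recursively, which is what lets one net handle an entire multi-object scene.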