Hello World
BrainCreators
BrainCreators builds software solutions
AI technology is our comfort zone
Data is our source of inspiration
1. What is deep learning?
Overview
Convolutional Neural Network
2. Case study
Automated category assignment
Discovering new categories
Today
Machine Learning
Machine learning > Artificial neural networks
10^11 neurons
10^4 synapses per neuron
10^16 “operations” per second
Cortex: 2,500 cm^2, 2 mm thick
1.4 kg, 1.7 liters
250 million neurons per mm^3
180,000 km of “wires”
25 Watts
Learning causes synapses to strengthen or weaken, to appear or disappear.
The Human Brain
8 x 10^12 operations/second
500 Watts
5760 (small) cores
$3000
Are we only a factor 10,000 away from the power of the human brain?
Probably more like 1 million; synapses are complicated.
A factor of 1 million is 30 years of Moore's Law: 2045?
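The 30-year figure can be checked with a short calculation. The sketch below assumes a doubling period of 18 months, a common reading of Moore's Law that the slide does not state; `years_to_close_gap` is a name made up for this illustration.

```python
import math


def years_to_close_gap(factor, doubling_period_years=1.5):
    """Years needed for compute to grow by `factor` if it doubles every period.

    The 1.5-year doubling period is an assumption, not a figure from the slides.
    """
    doublings = math.log2(factor)  # how many times the capacity must double
    return doublings * doubling_period_years


years = years_to_close_gap(1_000_000)
print(round(years))  # 30
```

A factor of 1 million is just under 20 doublings, which is where the 30 years come from.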
The Machine Brain
NVIDIA Titan-Z GPU
Neural Networks
Inspired by the architecture of the brain, researchers have been trying to train neural networks for the last 25 years
We have had good algorithms for learning the weights in networks with 1 hidden layer.
But no successful attempts at training networks with many hidden layers were reported before 2006 …
Deep learning
‘Deep Learning’ means using a neural network with several (hidden) layers of nodes between input and output
The series of layers between input & output do feature identification and processing in a series of stages, just as our brains seem to.
What’s new:
algorithms for training networks
better hardware (faster / more cores)
a lot more data (the internet)
Deep learning
Train in steps as autoencoders.
An autoencoder has an input layer, an output layer and one or more hidden layers connecting them, but the output layer has the same number of nodes as the input layer, and its purpose is to reconstruct its own inputs
Making this happen with (many) fewer hidden units than inputs forces the ‘hidden layer’ units to become good feature detectors
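To make the autoencoder idea concrete, here is a minimal sketch in plain Python. It assumes linear units, a squared reconstruction error and per-sample gradient descent, none of which the slides specify; real deep autoencoders use nonlinear units and are stacked layer by layer. The function name, the toy data and the hyperparameters are all illustrative.

```python
import random


def train_autoencoder(data, n_hidden, epochs=1000, lr=0.05, seed=0):
    """Train a tiny linear autoencoder; returns (encoder, decoder, loss history)."""
    rng = random.Random(seed)
    n_in = len(data[0])
    # enc[j][i]: weight from input i to hidden unit j
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    # dec[i][j]: weight from hidden unit j to output i (same size as the input)
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            h = [sum(w * xi for w, xi in zip(row, x)) for row in enc]  # encode
            y = [sum(w * hj for w, hj in zip(row, h)) for row in dec]  # reconstruct
            err = [yi - xi for yi, xi in zip(y, x)]
            total += sum(e * e for e in err)
            # error signal at the hidden units, computed before dec is changed
            grad_h = [sum(err[i] * dec[i][j] for i in range(n_in))
                      for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    dec[i][j] -= lr * err[i] * h[j]
            for j in range(n_hidden):
                for i in range(n_in):
                    enc[j][i] -= lr * grad_h[j] * x[i]
        history.append(total / len(data))
    return enc, dec, history


# Toy data: 4 inputs that really only vary along 2 directions.
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
enc, dec, history = train_autoencoder(data, n_hidden=2)
print(history[0], history[-1])  # reconstruction error at the start and at the end
```

Because the two hidden units are fewer than the four inputs, the network can only reconstruct its input by finding the two directions the data actually varies along.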
Deep learning algorithms
Convolutional Neural Network
Deep Belief Network
Restricted Boltzmann Machine
Deep Reinforcement Learning
Deep Q Learning
Hierarchical Temporal Memory
Stacked Denoising Autoencoders
Convolutional Neural Network for image classification
[Figure: a CNN labels images as X or O despite translation, scaling, rotation and line weight]
What computers see
[Figure: are two differently drawn X images equal?]
Filtering
Multiply each image pixel by the matching feature pixel: 1 x 1 = 1
Add up the products and divide by the number of pixels: (1+1-1+1+1+1-1+1+1) / 9 = 0.55
Convolution: Apply every possible match
Convolution layer
One image becomes a stack of filtered images
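Filtering and convolution as described above can be sketched in a few lines of plain Python. The sketch assumes pixels are coded as 1 and -1 and that the feature is only placed where it fits entirely inside the image; `match_score`, `convolve` and the small example arrays are illustrative names and data, not taken from the slides.

```python
def match_score(patch, feature):
    """Average of the pixel-by-pixel products of an image patch and a feature."""
    products = [p * f
                for patch_row, feature_row in zip(patch, feature)
                for p, f in zip(patch_row, feature_row)]
    return sum(products) / len(products)


def convolve(image, feature):
    """Apply the feature at every position where it fits and record the score."""
    fh, fw = len(feature), len(feature[0])
    filtered = []
    for r in range(len(image) - fh + 1):
        row = []
        for c in range(len(image[0]) - fw + 1):
            patch = [line[c:c + fw] for line in image[r:r + fh]]
            row.append(match_score(patch, feature))
        filtered.append(row)
    return filtered


diagonal = [[1, -1, -1],
            [-1, 1, -1],
            [-1, -1, 1]]

# A patch that agrees with the feature in 7 of 9 pixels: (7 - 2) / 9 = 5/9.
patch = [[1, -1, 1],
         [-1, 1, -1],
         [1, -1, 1]]
score = match_score(patch, diagonal)

image = [[1, -1, -1, -1],
         [-1, 1, -1, -1],
         [-1, -1, 1, -1],
         [-1, -1, -1, 1]]
filtered = convolve(image, diagonal)

# A convolution layer applies several features, giving a stack of filtered images.
anti_diagonal = [row[::-1] for row in diagonal]
stack = [convolve(image, f) for f in (diagonal, anti_diagonal)]
```

A score of 1.0 means the patch matches the feature exactly; the 5/9 score is the case worked through on the filtering slide.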
Pooling: Reducing size
1. Pick a window size (usually 2 or 3).
2. Pick a stride (usually 2).
3. Walk your window across your filtered images.
4. From each window, take the maximum value.
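The four pooling steps above can be sketched as follows. This is a minimal version that assumes windows hanging over the edge are simply clipped to the image, a detail the slide does not specify; `max_pool` and the example values are illustrative.

```python
def max_pool(image, window=2, stride=2):
    """Walk a window across the image and keep the maximum of each window.

    Windows that extend past the edge are clipped to the image (an assumption).
    """
    pooled = []
    for r in range(0, len(image), stride):
        row = []
        for c in range(0, len(image[0]), stride):
            values = [v for line in image[r:r + window] for v in line[c:c + window]]
            row.append(max(values))
        pooled.append(row)
    return pooled


filtered_image = [[1.0, 0.2, -0.1, 0.3],
                  [0.1, 0.5, 0.4, -0.2],
                  [-0.3, 0.0, 1.0, 0.6],
                  [0.2, 0.7, 0.1, 0.9]]
pooled = max_pool(filtered_image)
print(pooled)  # [[1.0, 0.4], [0.7, 1.0]]
```

The 4 x 4 image shrinks to 2 x 2 while the strongest response in each region is preserved.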
[Figure: max pooling example]
Rectified Linear Units (ReLUs)
Remove negative values: every negative value is set to zero
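As a small illustration, a ReLU applied to a filtered image replaces each negative value with zero and leaves the rest unchanged. The function name and the sample values are made up for this sketch.

```python
def relu(image):
    """Set every negative value in a 2-D list to zero."""
    return [[max(0.0, v) for v in row] for row in image]


rectified = relu([[0.77, -0.11], [-0.33, 1.0]])
print(rectified)  # [[0.77, 0.0], [0.0, 1.0]]
```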
Fully connected layer
Weighted votes: X .92, O .51
Deep Stacking
Convolution -> ReLU -> Pooling -> Convolution -> ReLU -> Convolution -> ReLU -> Pooling -> Fully connected -> Fully connected
Output: X .92, O .51
Put it all together
Training/Learning
Q: Where do all the magic numbers come from?
Features in convolutional layers
Voting weights in fully connected layers
A: Backpropagation
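To show what backpropagation does, here is a minimal sketch of a network with one hidden layer trained by full-batch gradient descent. It assumes sigmoid units, a squared error and the XOR toy problem; these choices, the function names and the hyperparameters are illustrative and not taken from the slides.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, n_hidden=3, epochs=2000, lr=0.5, seed=1):
    """Train a one-hidden-layer network; returns (w1, w2, loss per epoch)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Every weight row ends with a bias weight.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    losses = []
    for _ in range(epochs):
        g1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        g2 = [0.0] * (n_hidden + 1)
        loss = 0.0
        for x, target in samples:
            # Forward pass.
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1]
            hb = h + [1.0]
            y = sigmoid(sum(w * v for w, v in zip(w2, hb)))
            loss += 0.5 * (y - target) ** 2
            # Backward pass: chain rule from the output error back to each weight.
            delta_out = (y - target) * y * (1.0 - y)
            for j in range(n_hidden + 1):
                g2[j] += delta_out * hb[j]
            for j in range(n_hidden):
                delta_h = delta_out * w2[j] * h[j] * (1.0 - h[j])
                for i in range(n_in + 1):
                    g1[j][i] += delta_h * xb[i]
        # Gradient descent step on the summed gradients.
        for j in range(n_hidden + 1):
            w2[j] -= lr * g2[j]
        for j in range(n_hidden):
            for i in range(n_in + 1):
                w1[j][i] -= lr * g1[j][i]
        losses.append(loss)
    return w1, w2, losses


xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w1, w2, losses = train(xor)
print(losses[0], losses[-1])  # error before and after training
```

The same principle, applied to many more weights, is how both the convolutional features and the voting weights are learned from labeled examples.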
Learned
Applications
Speech recognition
Image recognition
Natural language processing
Recommendation systems
Customer relationship management
Etc...
A Case Study
How to categorise data from 100 scraped webshops?
Object recognition
Automatically classify product images (for example: Dress, Boot)
Problem Description
How to create training data
How can we discover “the right” categories
Tools
Nvidia Digits / Caffe: Deep learning
Elasticsearch: Document store
Apache Spark: Distributed clustering
Scrapy: Distributed scraping
Scraped JSON Data
"productID": "37801580",
"price": 12400,
"originalPrice": 12400,
"name": "Dirk Bikkembergs Polo Shirt",
"priceCurrency": "USD",
"url": "http://www.yoox.com/us/37801580KS/item",
"brand": "DIRK BIKKEMBERGS",
"description": "Dirk Bikkembergs Men Polo Shirt on YOOX.COM. The best online selection of Polo Shirts Dirk Bikkembergs. YOOX.COM exclusive items of Italian and international designers - Secure payments",
"seller": "Yoox",
"variantID": "66",
"image": [
  "http://images.yoox.com/37/37801580ks_12_f.jpg",
  "http://images.yoox.com/37/37801580ks_12_r.jpg"
],
"availability": "InStock",
"raw_tag": [
  "Polo shirts",
  "T-Shirts and Tops",
  "men",
  "jersey, solid color, polo collar, long sleeves, logo, no pockets",
  "46% Cotton, 46% Modal, 8% Spandex",
  "DIRK BIKKEMBERGS"
],
Giant product database
5 million products, uncategorised
Basic Steps
Initialization Phase - creating initial training data
Feature Discovery / Refinement Phase
Creating Initial training set
Manually predefined ontology / categorization
Text search by tag
Use simple image features: Edge Histogram, JCD
Create 50 categories with 500 samples
Generate rough initial neural network
Initial Phase: Create labeled training data
Initial Categories
acc_bag, acc_clothing_belts, acc_clothing_bowtie, acc_clothing_gloves, acc_clothing_tie, acc_glasses_glasses, acc_hats, acc_jewelry, clothing_bodywear_bodysuit, clothing_bodywear, clothing_bottomwear_bikini, clothing_bottomwear_leggings, clothing_bottomwear_pajama, clothing_bottomwear_pants, clothing_bottomwear_pants_jeans, clothing_bottomwear_short, clothing_bottomwear_skirt, clothing_dresswear, clothing_footwear, clothing_hosiery, clothing_topwear_blazer, clothing_topwear_blouse, clothing_topwear_coat, clothing_topwear_jacket, clothing_topwear_polo, clothing_topwear_sweater, clothing_topwear_tank, clothing_topwear_tshirt, other_bed, other_electronics, other_home, other_media_book, other_media_movies, other_other_cosmetics
Manual labeling - by tag
search by tag
We would select the top matches and assign the training label
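The tag-based labeling step can be illustrated with a small in-memory sketch. The talk used Elasticsearch text search for this; the code below is only a stand-in that assumes products are dictionaries with the `productID` and `raw_tag` fields of the scraped JSON. The function `label_by_tag`, its word-overlap score and the second product are hypothetical.

```python
def label_by_tag(products, query, label, top_n=500):
    """Give `label` to the `top_n` products whose tags best match `query`."""
    query_words = set(query.lower().split())
    scored = []
    for product in products:
        tag_text = " ".join(product.get("raw_tag", [])).lower().replace(",", " ")
        score = len(query_words & set(tag_text.split()))  # shared words
        if score > 0:
            scored.append((score, product["productID"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best matches first
    return {product_id: label for _, product_id in scored[:top_n]}


products = [
    {"productID": "37801580", "raw_tag": ["Polo shirts", "T-Shirts and Tops", "men"]},
    {"productID": "hypothetical-1", "raw_tag": ["Boots", "women"]},
]
labels = label_by_tag(products, "polo shirts", "clothing_topwear_polo")
print(labels)  # {'37801580': 'clothing_topwear_polo'}
```

In practice a person would still review the top matches before accepting them as training data.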
Simple Image features
LIRE: Lucene Image Retrieval
LIRE is a Java library that provides a simple way to retrieve images and photos based on color and texture characteristics. LIRE creates a Lucene index of image features for content based image retrieval (CBIR) using local and global state-of-the-art methods. Easy to use methods for searching the index and result browsing are provided. Best of all: it's all open source.
Edge Histogram example
Manual labeling - by image feature
search by image
We created our own Elasticsearch plugin to help us visually search and sort by simple image features
Training a neural network
Define a network
Define a dataset
Transfer learning: AlexNet
1.5 million training examples
1000 categories
Define Digits Model
Create Dataset
Training A Model
Applying model
High confidence / Low confidence
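Separating high-confidence from low-confidence predictions might look like the sketch below. It assumes each prediction is a (product ID, label, confidence) tuple and borrows the 0.96 threshold quoted in the results; the function name and the sample predictions are hypothetical.

```python
def split_by_confidence(predictions, threshold=0.96):
    """Split (product_id, label, confidence) tuples at a confidence threshold."""
    high = [p for p in predictions if p[2] > threshold]
    low = [p for p in predictions if p[2] <= threshold]
    return high, low


predictions = [
    ("hypothetical-1", "clothing_dresswear", 0.99),
    ("hypothetical-2", "clothing_footwear", 0.62),
]
high, low = split_by_confidence(predictions)
```

High-confidence predictions can be accepted as labels, while low-confidence ones point to products that the current categories do not cover well.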
Phase 2
Improving categories by clustering features and human selection
Clustering using k-means
Applied Feature Discovery Process
Create deep learning model
Apply model to data
Automatically create new clusters using k-means
A human verifies which clusters are useful
Build new training set from top confidence
Create new cluster labels
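The clustering step in this process can be illustrated with a plain-Python k-means. The talk ran clustering on Apache Spark; this sketch is a single-machine stand-in that assumes each product is represented by a feature vector (here toy 2-D points), and the function name, parameters and data are illustrative.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (cluster index per point, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for idx, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignment[idx] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
assignment, centroids = kmeans(features, k=2)
```

Each resulting cluster is a candidate category; a person then decides which clusters are meaningful enough to name and add to the training set.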
Clustering (Bracelets)
Clustering (Sneakers)
Final Categories
acc_bag_backpack, acc_bag_backpack_human, acc_bag_briefcase, acc_bag_bucket, acc_bag_bucket_human, acc_bag_clutch, acc_bag_duffel, acc_bag_hobo, acc_bag_messenger, acc_bag_pouch, acc_bag_satchel, acc_bag_shoulderbag, acc_bag_shoulderbag_human, acc_bag_suitcase, acc_bag_totes, acc_bag_wallet, acc_clothing_belts, acc_clothing_belts_human, acc_clothing_bowtie, acc_clothing_cufflinks, acc_clothing_gloves, acc_clothing_gloves_human, acc_clothing_suspenders, acc_clothing_tie, acc_clothing_umbrella, acc_glasses_glasses, acc_glasses_glasses_human, acc_glasses_sunglasses, acc_glasses_sunglasses_human, acc_hats_beanie, acc_hats_beanie_human, acc_hats_bucket, acc_hats_bucket_doll, acc_hats_bucket_human, acc_hats_cap, acc_hats_cap_human, acc_hats_fedora, acc_hats_fedora_doll, acc_hats_fedora_human, acc_headwear_headband, acc_headwear_headband_human, acc_headwear_scarf, acc_headwear_scarf_human, acc_jewelry_bracelet, acc_jewelry_brooche, acc_jewelry_earrings, acc_jewelry_earrings_human, acc_jewelry_hairpins, acc_jewelry_hairpins_human, acc_jewelry_necklace, acc_jewelry_necklace_human, acc_jewelry_ring, acc_jewelry_watch, acc_jewelry_watch_strap, clothing_bodywear_bodysuit, clothing_bodywear_bodysuit_human, clothing_bodywear_bodysuit_zoom, clothing_bodywear_swimsuit, clothing_bodywear_swimsuit_human, clothing_bottomwear_bikini, clothing_bottomwear_bikini_human, clothing_bottomwear_leggings_capri, clothing_bottomwear_leggings_capri_human, clothing_bottomwear_leggings_default, clothing_bottomwear_leggings_default_human, clothing_bottomwear_leggings_sport, clothing_bottomwear_pajama_female, clothing_bottomwear_pajama_human, clothing_bottomwear_pajama_human_female, clothing_bottomwear_pajama_human_male, clothing_bottomwear_pants, clothing_bottomwear_pants_chino, clothing_bottomwear_pants_chino_human_female, clothing_bottomwear_pants_chino_human_male, clothing_bottomwear_pants_human, clothing_bottomwear_pants_human_female, clothing_bottomwear_pants_jeans,
clothing_bottomwear_pants_jeans_boyfriend, clothing_bottomwear_pants_jeans_boyfriend_human, clothing_bottomwear_pants_jeans_human_female, clothing_bottomwear_pants_jeans_human_male, clothing_bottomwear_pants_jeans_skinny, clothing_bottomwear_pants_jeans_skinny_human, clothing_bottomwear_pants_jeans_skinny_zoom, clothing_bottomwear_pants_jeans_straight, clothing_bottomwear_pants_jeans_straight_human_female, clothing_bottomwear_pants_jeans_straight_human_male, clothing_bottomwear_pants_jeans_wideflaredbootcut, clothing_bottomwear_pants_jeans_wideflaredbootcut_human, clothing_bottomwear_pants_joggers_sweatpants,...
Conclusion/Results
Conclusions:
From chaotic data we can discover unknown categories and classify the data, thanks to a loop workflow:
Discover new categories. How: feature analysis
Classify/structure the data. How: deep learning
Repeat
Results:
More than 95% of the 5M products classified with confidence > 0.96
More than 250 new labeled categories
Thank you! Please stay in touch
Gerbert [email protected]
Jasper [email protected]
BrainCreators
Prinsengracht 796
1017 JV Amsterdam
The Netherlands
www.braincreators.com