
Page 1: Lecture 6: Computer Vision (part 2)

UNIVERSITY OF JYVÄSKYLÄ

Lecture 6: Computer Vision (part 2)

by: Dr. Oleksiy Khriyenko

IT Faculty, University of Jyväskylä

TIES4911 Deep-Learning for Cognitive Computing for Developers, Spring 2020

Page 2: Lecture 6: Computer Vision (part 2)

Acknowledgement

I am grateful to all the creators/owners of the images that I found via Google and have used in this presentation.


Page 3: Lecture 6: Computer Vision (part 2)

SSD

SSD (Single-Shot Detector) provides enormous speed gains over Faster R-CNN, but does so in a markedly different manner. In contrast to previous models that used a region proposal network to generate regions of interest and then either fully-connected layers or position-sensitive convolutional layers to classify those regions, SSD does the two in a "single shot", simultaneously predicting the bounding box and the class as it processes the image.

Without region proposals, SSD skips the filtering step in which we obtain object regions (with a certain confidence). SSD classifies and draws bounding boxes from every single position in the image, using multiple different shapes, at several different scales. As a result, a much greater number of bounding boxes is generated, and nearly all of them are negative examples. To fix this imbalance, SSD uses:

§ non-maximum suppression (NMS) to group highly-overlapping boxes into a single box. In other words, if four boxes of similar shapes, sizes, etc. contain the same dog, NMS keeps the one with the highest confidence and discards the rest (see the sketch below).

§ hard negative mining to balance classes during training: only a subset of the negative examples with the highest training loss (i.e. false positives) is used at each iteration of training. SSD keeps a 3:1 ratio of negatives to positives.
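To make the NMS step concrete, below is a minimal NumPy sketch of the standard greedy algorithm (an illustration, not SSD's actual implementation): boxes are sorted by confidence, the best box is kept, and any remaining box whose IoU with it exceeds a threshold is discarded. The array names and the iou_threshold value are illustrative assumptions.

import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    # boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # indices sorted by descending confidence
    keep = []
    while order.size > 0:
        i = order[0]                        # box with the highest remaining score
        keep.append(i)
        # intersection of box i with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # keep only boxes that do not overlap box i too much
        order = order[1:][iou <= iou_threshold]
    return keep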

Relevant links:
https://arxiv.org/abs/1512.02325
https://towardsdatascience.com/understanding-ssd-multibox-real-time-object-detection-in-deep-learning-495ef744fab
https://towardsdatascience.com/review-ssd-single-shot-detector-object-detection-851a94607d11
https://towardsdatascience.com/deep-learning-for-object-detection-a-comprehensive-review-73930816d8d9
https://lilianweng.github.io/lil-log/2018/12/27/object-detection-part-4.html
https://github.com/pierluigiferrari/ssd_keras
https://lambdalabs.com/blog/how-to-implement-ssd-object-detection-in-tensorflow/
https://github.com/balancap/SSD-Tensorflow
https://github.com/weiliu89/caffe/tree/ssd
http://silverpond.com.au/2017/02/17/how-we-built-and-trained-an-ssd-multibox-detector-in-tensorflow.html

Page 4: Lecture 6: Computer Vision (part 2)

SSD

Relevant links: https://deepsystems.ai

Bounding-box correction, classification and filtering

Page 5: Lecture 6: Computer Vision (part 2)

SSD

Relevant links: https://deepsystems.ai

[Figure: for a 5×5 feature map with 3 default boxes per cell (5*5*3 = 75 boxes), each box predicts a position-correction vector, applied to the default box to get the resulting bbox, plus probabilities for 21 classes (20 classes + background); detections are then filtered by confidence.]
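As a rough illustration of this decoding step, here is a NumPy sketch (a simplification, not SSD's exact parameterization, which also scales offsets by per-coordinate variances); the arrays defaults, loc and cls stand in for the network outputs:

import numpy as np

def decode_and_filter(defaults, loc, cls, conf_threshold=0.5):
    # defaults: (N, 4) default boxes as [cx, cy, w, h]
    # loc: (N, 4) predicted correction vectors [dx, dy, dw, dh]
    # cls: (N, 21) raw class scores (20 Pascal classes + background)
    cx = defaults[:, 0] + loc[:, 0] * defaults[:, 2]     # shift center by dx * width
    cy = defaults[:, 1] + loc[:, 1] * defaults[:, 3]
    w = defaults[:, 2] * np.exp(loc[:, 2])               # sizes corrected on a log scale
    h = defaults[:, 3] * np.exp(loc[:, 3])
    boxes = np.stack([cx, cy, w, h], axis=1)
    e = np.exp(cls - cls.max(axis=1, keepdims=True))     # stable softmax over 21 classes
    probs = e / e.sum(axis=1, keepdims=True)
    score = probs[:, 1:].max(axis=1)                     # best non-background class
    label = probs[:, 1:].argmax(axis=1) + 1              # assume background is column 0
    keep = score > conf_threshold                        # confidence-based filtering
    return boxes[keep], label[keep], score[keep]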

Page 6: Lecture 6: Computer Vision (part 2)

SSD

Relevant links: https://deepsystems.ai

Page 7: Lecture 6: Computer Vision (part 2)

OSVOS

OSVOS (One-Shot Video Object Segmentation) delivers state-of-the-art results in accuracy and speed (http://people.ee.ethz.ch/~cvlsegmentation/osvos/).

Relevant links:
http://openaccess.thecvf.com/content_cvpr_2017/papers/Caelles_One-Shot_Video_Object_CVPR_2017_paper.pdf
https://github.com/scaelles/OSVOS-TensorFlow

OSVOS is a method that tackles the task of semi-supervised video object segmentation. It is based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one-shot). Experiments on DAVIS 2016 show that OSVOS is faster than currently available techniques and improves the state of the art by a significant margin (79.8% vs 68.0%).

Page 8: Lecture 6: Computer Vision (part 2)

YOLO

YOLO (You Only Look Once) is a system for detecting objects on the Pascal VOC 2012 dataset (http://host.robots.ox.ac.uk:8080/pascal/VOC/).

Relevant links:
http://arxiv.org/abs/1506.02640
https://pjreddie.com/darknet/yolo/
https://towardsdatascience.com/yolov1-you-only-look-once-object-detection-e1f3ffec8a89
https://www.youtube.com/watch?v=VOC3huqHrss
https://github.com/llSourcell/YOLO_Object_Detection
https://www.youtube.com/watch?v=4eIBisqx9_g

In turn, YOLO applies a single neural network to the full image: it divides the image into regions and predicts bounding boxes and their probabilities for each region (2 bboxes per cell in YOLO, and up to 5 in YOLOv2), as well as the class of the corresponding object. These bounding boxes are weighted by the predicted probabilities so that only high-scoring detections are highlighted. Finally, a combined score of both predictions (bounding box and object class) is assessed.

Making predictions with a single network evaluation, YOLO is more than 1000x faster than R-CNN (which requires thousands of evaluations for a single image) and 100x faster than Fast R-CNN.

It can detect the 20 Pascal object classes:
§ person
§ bird, cat, cow, dog, horse, sheep
§ aeroplane, bicycle, boat, bus, car, motorbike, train
§ bottle, chair, dining table, potted plant, sofa, tv/monitor

Unlike SSD and YOLO, prior detection systems repurpose classifiers or localizers to perform detection: they apply the model to an image at multiple locations and scales, and high-scoring regions of the image are considered detections.

Page 9: Lecture 6: Computer Vision (part 2)

YOLO

Add new conv layers to improve the performance…

Relevant links: https://deepsystems.ai

Page 10: Lecture 6: Computer Vision (part 2)

YOLO

Total 7*7*2 = 98 bboxes

Relevant links: https://deepsystems.ai

Page 11: Lecture 6: Computer Vision (part 2)

YOLO

Relevant links: https://deepsystems.ai

Page 12: Lecture 6: Computer Vision (part 2)

YOLO

Do this procedure for all classes…

Relevant links: https://deepsystems.ai

Page 13: Lecture 6: Computer Vision (part 2)

YOLO

YOLO:
§ makes more localization errors, but is less likely to predict false positives on background
§ is fast, but not too precise
§ shows good performance in combination with Fast R-CNN

In YOLOv1, the image is divided into a 7x7 grid where each cell predicts 2 bounding boxes. In the end, you get a tensor of shape 7*7*30. For every grid cell, the two bounding boxes (5 values each) make up the first 10 values of the 1*30 vector; the remaining 20 values are the class probabilities.
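A minimal NumPy sketch of how such a 7*7*30 output can be unpacked (the exact value ordering inside the 30-vector is an assumption for illustration; each box is taken as [x, y, w, h, confidence]):

import numpy as np

output = np.random.rand(7, 7, 30)              # stand-in for the network output

boxes = output[..., :10].reshape(7, 7, 2, 5)   # 2 boxes * 5 values per cell
class_probs = output[..., 10:]                 # 20 conditional class probabilities

# class-specific confidence for each box: P(class | object) * P(object)
box_conf = boxes[..., 4]                                    # (7, 7, 2)
scores = class_probs[:, :, None, :] * box_conf[..., None]   # (7, 7, 2, 20)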

YOLOv2 uses a few tricks to improve training and increase performance. Being trained simultaneously on several datasets (COCO detection and ImageNet classification datasets), YOLO9000 (Better, Faster, Stronger) predicts detections for more than 9000 different object categories. The full YOLOv2 model uses three times as many layers and has a slightly more complex shape, but it's still just a regular convnet. It uses a 13x13 grid where each cell predicts 5 bounding boxes. There is no fully-connected layer in YOLOv2. The last convolutional layer has a 1×1 kernel and exists to reduce the data to the shape 13×13×125.

The "tiny" version of YOLO (Tiny YOLO) has only 9 convolutional layers and 6 pooling layers and can be used in mobile apps. Tiny YOLO runs at 155 fps (the full YOLOv1 at only 45 fps).

YOLO v3: Better, not Faster, Stronger… On top of the original Darknet 53-layer network trained on ImageNet, 53 more layers are added, giving a 106-layer fully convolutional underlying architecture for YOLO v3.

Relevant links:
https://arxiv.org/abs/1612.08242
https://www.youtube.com/watch?v=GBu2jofRJtk
https://medium.com/@jonathan_hui/real-time-object-detection-with-yolo-yolov2-28b1b93e2088
https://medium.com/diaryofawannapreneur/yolo-you-only-look-once-for-object-detection-explained-6f80ea7aaa1e
https://towardsdatascience.com/yolo-v3-object-detection-53fb7d3bfe6b
https://lilianweng.github.io/lil-log/2018/12/27/object-detection-part-4.html

The architecture of Tiny YOLO:

Layer         kernel  stride  output shape
---------------------------------------------
Input                         (416, 416, 3)
Convolution   3×3     1       (416, 416, 16)
MaxPooling    2×2     2       (208, 208, 16)
Convolution   3×3     1       (208, 208, 32)
MaxPooling    2×2     2       (104, 104, 32)
Convolution   3×3     1       (104, 104, 64)
MaxPooling    2×2     2       (52, 52, 64)
Convolution   3×3     1       (52, 52, 128)
MaxPooling    2×2     2       (26, 26, 128)
Convolution   3×3     1       (26, 26, 256)
MaxPooling    2×2     2       (13, 13, 256)
Convolution   3×3     1       (13, 13, 512)
MaxPooling    2×2     1       (13, 13, 512)
Convolution   3×3     1       (13, 13, 1024)
Convolution   3×3     1       (13, 13, 1024)
Convolution   1×1     1       (13, 13, 125)
---------------------------------------------
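The same stack can be written down almost line-for-line in Keras. This is a hedged sketch of the layer shapes only: the real Tiny YOLO also uses batch normalization and leaky-ReLU activations, and loads pre-trained Darknet weights.

from tensorflow.keras import Input, layers, models

def tiny_yolo_v2(num_anchors=5, num_classes=20):
    # sketch of the Tiny YOLO layer stack above (shapes only)
    m = models.Sequential()
    m.add(Input(shape=(416, 416, 3)))
    for i, f in enumerate([16, 32, 64, 128, 256, 512]):
        m.add(layers.Conv2D(f, 3, padding='same', activation='relu'))
        # the final pooling layer uses stride 1, keeping the 13x13 grid
        m.add(layers.MaxPooling2D(2, strides=2 if i < 5 else 1, padding='same'))
    m.add(layers.Conv2D(1024, 3, padding='same', activation='relu'))
    m.add(layers.Conv2D(1024, 3, padding='same', activation='relu'))
    # 125 = 5 anchors * (4 box coordinates + 1 confidence + 20 classes)
    m.add(layers.Conv2D(num_anchors * (5 + num_classes), 1, padding='same'))
    return m

tiny_yolo_v2().summary()   # final output shape: (None, 13, 13, 125)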

Page 14: Lecture 6: Computer Vision (part 2)

YOLO

Main changes implemented in YOLOv3:

• Darknet + ResNet as the base model: the new Darknet-53 still relies on successive 3x3 and 1x1 conv layers, just like the original Darknet architecture, but has residual blocks added.

• Multi-scale prediction: inspired by image pyramids, YOLOv3 adds several conv layers after the base feature extractor model and makes predictions at three different scales among these conv layers. Thus, it has to deal with many more bounding box candidates of various sizes overall.

• Skip-layer concatenation: YOLOv3 also adds cross-layer connections between two prediction layers (except for the output layer) and earlier fine-grained feature maps. The model first up-samples the coarse feature maps and then merges them with the previous features by concatenation. The combination with finer-grained information makes it better at detecting small objects.


• Logistic regression for confidence scores: YOLOv3 uses logistic regression to predict a confidence score for each bounding box (while YOLO and YOLOv2 use a sum of squared errors for the classification terms). Linear regression of offset predictions leads to a decrease in mAP.

• No more softmax for class prediction: to predict class confidence, YOLOv3 uses multiple independent logistic classifiers for each class rather than one softmax layer. This helps especially when one image might have multiple labels and not all the labels are guaranteed to be mutually exclusive, as the sketch below illustrates.
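A small NumPy sketch of the difference (the logit values are illustrative only): softmax forces the class scores to compete and sum to 1, while independent sigmoids let several compatible labels score high at once.

import numpy as np

logits = np.array([2.0, 1.8, -1.0])     # e.g. scores for person, woman, dog

softmax = np.exp(logits) / np.exp(logits).sum()
# -> roughly [0.54, 0.44, 0.03]; the two plausible labels split the mass

sigmoid = 1.0 / (1.0 + np.exp(-logits))
# -> roughly [0.88, 0.86, 0.27]; both labels can be confident at once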

Overall, YOLOv3 performs better and faster than SSD; it performs worse than RetinaNet, but is 3.8x faster.

Relevant links:
https://lilianweng.github.io/lil-log/2018/12/27/object-detection-part-4.html

Page 15: Lecture 6: Computer Vision (part 2)

YOLO

Object detection with YOLO…

Relevant links:
https://github.com/pjreddie/darknet/wiki/YOLO:-Real-Time-Object-Detection

Darknet is an open source neural network framework written in C and CUDA, made by Joseph Redmon (https://pjreddie.com/darknet/). There is a Docker image that contains all the models you need to run Darknet with the following neural networks and models (https://hub.docker.com/r/loretoparisi/darknet/): yolo (real-time object detection), imagenet (classification), nightmare (CNN inception), rnn (recurrent neural network model for text prediction), darkgo (Go game play).
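As a quick usage example, the Darknet site documents running a pre-trained YOLOv3 detector with the command ./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg (paths are those of the upstream repository), which prints the detected objects with their confidences.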

Darkflow is a TensorFlow version of Darknet made by T.H. Trieu (https://github.com/thtrieu/darkflow) that allows you to train and test the YOLO model.

Alternatively, you may use a TensorFlow implementation with the pre-trained models. Such implementations use converted models (.ckpt) for TensorFlow, or (.h5) for Keras. They also include a model convertor (YOLO weight extractor).

Implementations:
q https://github.com/qqwweee/keras-yolo3
q https://modelzoo.co/model/keras-yolov3
q https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/
q https://www.youtube.com/watch?v=PyjBd7IDYZs&index=1&list=PLX-LrBk6h3wSGvuTnxB2Kj358XfctL4BM
q https://appliedmachinelearning.blog/2018/05/27/running-yolo-v2-for-real-time-object-detection-on-videos-images-via-darkflow/
q https://towardsdatascience.com/yolov2-object-detection-using-darkflow-83db6aa5cf5f
q https://github.com/experiencor/keras-yolo2
q https://www.kaggle.com/sajinpgupta/object-detection-yolov2/notebook
q https://github.com/ksanjeevan/dourflow
q https://github.com/gliese581gg/YOLO_tensorflow
q https://github.com/nilboy/tensorflow-yolo
q https://github.com/llSourcell/YOLO_Object_Detection
q https://www.youtube.com/watch?v=lwYUmllJSvA&list=WL&index=30&t=122s
q https://www.youtube.com/watch?v=XRVzuV9RexY
q https://www.youtube.com/watch?v=b5_B9oNqaqE&list=WL&index=26&t=0s

YOLOv3 Object Detector with Darknet in the Cloud:
https://colab.research.google.com/drive/1Mh2HP_Mfxoao6qNFbhfV3u28tG8jAVGk#scrollTo=HCs4VQmESACk

Page 16: Lecture 6: Computer Vision (part 2)

RetinaNet

RetinaNet has been formed by making two improvements over existing single-stage object detection models (like YOLO and SSD):

Relevant links:
https://arxiv.org/abs/1612.03144
https://arxiv.org/pdf/1708.02002.pdf
https://github.com/fizyr/keras-retinanet
https://github.com/tensorflow/tpu/tree/master/models/official/retinanet
https://towardsdatascience.com/object-detection-on-aerial-imagery-using-retinanet-626130ba2203
https://medium.com/@tabdulwahabamin/an-introduction-to-implementing-retinanet-in-keras-for-multi-object-detection-on-custom-dataset-be746024c653
https://mc.ai/object-detection-on-satellite-imagery-using-retinanet-part-1%E2%80%8A-%E2%80%8Atraining/
https://medium.com/@14prakash/the-intuition-behind-retinanet-eb636755607d
https://www.freecodecamp.org/news/object-detection-in-colab-with-fizyr-retinanet-efed36ac4af3/
https://www.tejashwi.io/object-detection-with-fizyr-retinanet/

• Feature Pyramid Network: pyramid networks have been used conventionally to identify objects at different scales. A Feature Pyramid Network (FPN) makes use of the inherent multi-scale pyramidal hierarchy of deep CNNs to create feature pyramids.

• Focal Loss: an improvement on cross-entropy loss that reduces the relative loss for well-classified examples and puts more focus on hard, misclassified examples. The focal loss enables training highly accurate dense object detectors in the presence of vast numbers of easy background examples (see the sketch below).
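Concretely, focal loss scales the cross-entropy term by (1 - p_t)^γ, so confident, easy examples contribute almost nothing. A minimal NumPy sketch for the binary case (γ and α follow the paper's default values; the array names are illustrative):

import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    # p: predicted probability of the positive class; y: 0/1 labels
    p_t = np.where(y == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # (1 - p_t)**gamma down-weights well-classified (high p_t) examples
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# an easy, correctly-classified background example barely contributes:
print(focal_loss(np.array([0.01]), np.array([0])))   # ~7.5e-7 vs ~0.01 for plain CE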

Page 17: Lecture 6: Computer Vision (part 2)

Object Detection

Related tutorials:
§ https://pythonprogramming.net/introduction-use-tensorflow-object-detection-api-tutorial/
§ https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10
§ https://www.curiousily.com/posts/object-detection-on-custom-dataset-with-tensorflow-2-and-keras-using-python/
§ https://github.com/pythonlessons/TensorFlow-object-detection-tutorial/tree/master/5_part%20step%20by%20step%20custom%20object%20detection
§ https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md
§ https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb
§ https://www.oreilly.com/ideas/object-detection-with-tensorflow
§ https://medium.com/@WuStangDan/step-by-step-tensorflow-object-detection-api-tutorial-part-1-selecting-a-model-a02b6aabe39e

Creating an Object Detection Application on Google Cloud using TensorFlow (tutorial with billable components):
https://cloud.google.com/solutions/creating-object-detection-application-tensorflow

TensorFlow Object Detection with Docker from scratch (tutorial):
https://medium.com/@evheniybystrov/tensorflow-object-detection-with-docker-from-scratch-5e015b639b0b

TensorFlow Object Detection API is an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models.
GitHub: https://github.com/tensorflow/models/tree/master/research/object_detection
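A hedged sketch of running inference with a model exported by this API (the model path is a placeholder, and the zero image stands in for a real one; 'detection_boxes', 'detection_scores' and 'detection_classes' are the output keys the API's exported models conventionally provide):

import numpy as np
import tensorflow as tf

model = tf.saved_model.load('exported_model/saved_model')   # hypothetical path
image = np.zeros((1, 300, 300, 3), dtype=np.uint8)          # stand-in input batch
detections = model(tf.constant(image))

boxes = detections['detection_boxes'][0].numpy()     # normalized [ymin, xmin, ymax, xmax]
scores = detections['detection_scores'][0].numpy()
classes = detections['detection_classes'][0].numpy().astype(int)
for box, score, cls in zip(boxes, scores, classes):
    if score > 0.5:                                  # simple confidence threshold
        print(cls, score, box)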

[Figure: performance comparison chart, YOLOv3 vs RetinaNet]

Page 18: Lecture 6: Computer Vision (part 2)

Image Segmentation

Mask R-CNN tutorials:
§ https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/instance_segmentation.md
§ https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb
§ https://github.com/matterport/Mask_RCNN
§ https://github.com/markjay4k/Mask-RCNN-series/blob/master/Mask_RCNN%20Install%20Instructions.ipynb

DeepLab v3:
§ https://github.com/bonlime/keras-deeplab-v3-plus

Segmentation Models:
§ https://github.com/bonlime/segmentation_models

Segmentation with TF:
§ https://www.tensorflow.org/tutorials/images/segmentation
§ https://github.com/tensorflow/models/blob/master/samples/outreach/blogs/segmentation_blogpost/image_segmentation.ipynb
§ https://ai.googleblog.com/2018/03/semantic-image-segmentation-with.html
§ https://medium.freecodecamp.org/how-to-use-deeplab-in-tensorflow-for-object-segmentation-using-deep-learning-a5777290ab6b
§ https://medium.com/nanonets/how-to-do-image-segmentation-using-deep-learning-c673cc5862ef

Page 19: Lecture 6: Computer Vision (part 2)

Systems and Tools

In January 2018, Google announced the launch of the Cloud AutoML platform to help businesses with limited resources and expertise streamline and build high-quality machine learning models. Google believes this will provide opportunities for less-skilled engineers to build powerful AI systems and make AI experts even more productive and efficient.

Relevant links:
https://www.blog.google/topics/google-cloud/cloud-automl-making-ai-accessible-every-business/
https://www.analyticsvidhya.com/blog/2018/01/google-cloud-automl-platform-build-models-without-coding/

Cloud AutoML Vision is the first product in the Cloud AutoML portfolio, intended to make it simpler to train image recognition models. It has a drag-and-drop interface that lets the user upload images, train the model, and then deploy those models directly on Google Cloud. Cloud AutoML Vision is built on Google's transfer learning and neural architecture search technologies (among others).
https://cloud.google.com/automl/
https://www.youtube.com/watch?v=GbLQE2C181U

The entire Cloud AutoML platform will help businesses scale up their AI capabilities without requiring advanced machine learning expertise. It opens the door for non-technical people to understand and utilize machine learning to improve not only their own portfolio, but their company's as well. Google has promised more products under the Cloud AutoML umbrella, so those who don't like coding can enjoy this latest renaissance in the ML and AI fields.

Page 20: Lecture 6: Computer Vision (part 2)

Systems and Tools

Similarly to Google, in January 2018 Facebook came out with a platform for object detection of its own: Detectron.

Relevant links:
https://www.analyticsvidhya.com/blog/2018/01/facebook-launched-detectron-platform-object-detection-research/
https://hackernoon.com/how-to-use-detectron-facebooks-free-platform-for-object-detection-9d41e170bbcb
https://arxiv.org/abs/1703.06870 ; https://arxiv.org/abs/1708.02002 ; https://arxiv.org/abs/1506.01497 ;
https://arxiv.org/abs/1504.08083 ; https://arxiv.org/abs/1605.06409 ; https://arxiv.org/abs/1611.05431 ;
https://arxiv.org/abs/1512.03385 ; https://arxiv.org/abs/1612.03144 ; https://arxiv.org/abs/1409.1556

Detectron is a software system developed by Facebook's AI Research team (FAIR) that "implements state-of-the-art object detection algorithms". It is written in Python and leverages the Caffe2 deep learning framework underneath.
https://research.fb.com/facebook-open-sources-detectron/
https://github.com/facebookresearch/Detectron

The goal of Detectron is to provide a high-quality, high-performance codebase for object detection research. It is designed to be flexible in order to support rapid implementation and evaluation of novel research.

Detectron includes implementations of the following object detection algorithms:
§ Mask R-CNN (Marr Prize at ICCV 2017)
§ RetinaNet (Best Student Paper Award at ICCV 2017)
§ Faster R-CNN
§ RPN
§ Fast R-CNN
§ R-FCN

using the following backbone network architectures:
§ ResNeXt{50,101,152}
§ ResNet{50,101,152}
§ Feature Pyramid Networks (with ResNet/ResNeXt)
§ VGG16

Requirements:
§ NVIDIA GPU (there is currently no CPU implementation), Linux, Python2
§ Caffe2, various standard Python packages, and the COCO API

Page 21: Lecture 6: Computer Vision (part 2)

Systems and Tools

IBM PowerAI Vision makes computer vision with deep learning more accessible to business users. PowerAI Vision includes an intuitive toolset that empowers subject matter experts to label, train, and deploy deep learning vision models, without coding or deep learning expertise.
https://www.ibm.com/fi-en/marketplace/ibm-powerai-vision

ImageAI is a Python library built to empower developers to build applications and systems with self-contained deep learning and computer vision capabilities in just a few lines of code. Built with simplicity in mind, ImageAI supports a list of state-of-the-art machine learning algorithms for image prediction, custom image prediction, object detection, video detection, video object tracking, and custom model training.
https://github.com/OlafenwaMoses/ImageAI

Supervisely is a platform for the entire computer vision lifecycle. Its image annotation and data management tools transform your images / videos / 3D point clouds into high-quality training data. Train your models, track experiments, visualize and continuously improve model predictions, and build custom solutions within a single environment.
https://supervise.ly
https://www.youtube.com/watch?v=7eOLm_dsuz8&t=69s

Page 22: Lecture 6: Computer Vision (part 2)

Systems and Tools

Computer Vision Annotation Tool (CVAT) is a free, online, interactive video and image annotation tool for computer vision. It is being used to annotate millions of objects with different properties. Many UI and UX decisions are based on feedback from a professional data annotation team.
https://github.com/opencv/cvat
https://software.intel.com/en-us/articles/computer-vision-annotation-tool-a-universal-approach-to-data-annotation

VGG Image Annotator (VIA) is a simple and standalone manual annotation tool for image, audio and video. VIA runs in a web browser and does not require any installation or setup. The complete VIA software fits in a single self-contained HTML page of less than 400 kilobytes that runs as an offline application in most modern web browsers.
http://www.robots.ox.ac.uk/~vgg/software/via/

LabelImg is a graphical image annotation tool. It is written in Python and uses Qt for its graphical interface. Annotations are saved as XML files in PASCAL VOC format, the format used by ImageNet. Besides that, it also supports the YOLO format.
https://github.com/tzutalin/labelImg

LabelMe is an online annotation tool to build image databases for computervision research. http://labelme.csail.mit.edu/Release3.0/

RectLabel is an image annotation tool to label images for bounding box object detection and segmentation.
https://rectlabel.com/

The COCO Annotation UI project includes the front-end user interfaces that are used to annotate the COCO dataset. For more details, please visit COCO.
https://github.com/tylin/coco-ui

Page 23: Lecture 6: Computer Vision (part 2)

Computer Vision

Multi-task learning with UberNet

Kokkinos, CVPR 2017: UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision using Diverse Datasets and Limited Memory
http://openaccess.thecvf.com/content_cvpr_2017/papers/Kokkinos_Ubernet_Training_a_CVPR_2017_paper.pdf

UberNet allows training, in an end-to-end manner, a unified CNN architecture that jointly handles:
§ boundary detection
§ normal estimation
§ saliency estimation
§ semantic segmentation
§ human part segmentation
§ semantic boundary detection
§ region proposal generation
§ object detection

Page 24: Lecture 6: Computer Vision (part 2)

Computer Vision

Realtime Multi-Person Pose Estimation

Cao et al., CVPR 2017: Multi-Person Pose Estimation using Part Affinity Fields
https://arxiv.org/pdf/1611.08050.pdf

Zhe Cao: https://people.eecs.berkeley.edu/~zhecao/
Github: https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation
Slides: http://image-net.org/challenges/talks/2016/Multi-person%20pose%20estimation-CMU.pdf
Demo: https://github.com/raymon-tian/keras_Realtime_Multi-Person_Pose_Estimation/blob/master/train/demo.ipynb

Page 25: Lecture 6: Computer Vision (part 2)

Computer Vision

Face (and body pose) analysis with DenseReg

Guler et al., CVPR 2017: DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild
https://arxiv.org/pdf/1612.01202.pdf

Web: http://alpguler.com/DenseReg.html
Video: https://www.youtube.com/watch?v=RZbQXpQ_wdg
Code: https://github.com/ralpguler/DenseReg

Guler et al., submitted Feb 2018: DensePose: Dense Human Pose Estimation In The Wild
https://arxiv.org/abs/1802.00434

Web: http://densepose.org/ (the DensePose-RCNN system)

Page 26: Lecture 6: Computer Vision (part 2)

Adversarial examples

Relevant links:
https://arxiv.org/pdf/1412.6572.pdf
https://arxiv.org/pdf/1710.11342.pdf
https://arxiv.org/pdf/1703.09387.pdf
https://openreview.net/forum?id=HknbyQbC-
https://github.com/tensorflow/cleverhans
https://blog.openai.com/adversarial-example-research/
https://github.com/InnerPeace-Wu/CapsNet-tensorflow
https://github.com/gongzhitaao/tensorflow-adversarial
https://www.kdnuggets.com/2015/07/deep-learning-adversarial-examples-misconceptions.html
https://becominghuman.ai/understand-and-apply-capsnet-on-traffic-sign-classification-a592e2d4a4ea

A Step-by-Step Guide to Synthesizing Adversarial Examples:
http://www.anishathalye.com/2017/07/25/synthesizing-adversarial-examples/

Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake; they're like optical illusions for machines.
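The classic way to synthesize such inputs is the Fast Gradient Sign Method (FGSM) from Goodfellow et al. (the first link above): perturb the input a small step in the direction of the sign of the loss gradient. A minimal TensorFlow 2 sketch, assuming a Keras classifier model, an input batch image with pixels in [0, 1], and its true label:

import tensorflow as tf

def fgsm(model, image, label, eps=0.01):
    # compute the gradient of the loss with respect to the input pixels
    image = tf.convert_to_tensor(image)
    with tf.GradientTape() as tape:
        tape.watch(image)
        loss = tf.keras.losses.sparse_categorical_crossentropy(label, model(image))
    grad = tape.gradient(loss, image)
    # step in the direction that increases the loss, then keep pixels valid
    adversarial = image + eps * tf.sign(grad)
    return tf.clip_by_value(adversarial, 0.0, 1.0)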

In their work One pixel attack for fooling deep neural networks (https://arxiv.org/abs/1710.08864), Jiawei Su and co-authors proposed a novel method for generating one-pixel adversarial perturbations based on differential evolution that requires less adversarial information and can fool more types of networks.

Page 27: Lecture 6: Computer Vision (part 2)

Neural Style Transfer

Object recognition models (e.g. VGG) are invariant to different object representations (style, shape, background, etc.). Therefore, content and style can be separated.

Relevant links:
https://arxiv.org/abs/1705.04058
https://arxiv.org/abs/1505.07376
https://arxiv.org/abs/1508.06576
https://github.com/jcjohnson/neural-style
https://github.com/jcjohnson/fast-neural-style

Deep Learning is more than just classification or regression…

Page 28: Lecture 6: Computer Vision (part 2)

Neural Style Transfer

Relevant links: https://deepsystems.ai

The (i, j) element of the style (Gram) matrix is the inner product of the i-th and j-th feature maps.

Update image pixels after backpropagation…
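A small sketch of that computation (the Gram matrix over CNN feature maps, as in Gatys et al.; the shapes are illustrative):

import numpy as np

def gram_matrix(features):
    # features: (H, W, C) activations from one conv layer
    h, w, c = features.shape
    f = features.reshape(h * w, c)      # each column is one flattened feature map
    return f.T @ f                      # (C, C); entry (i, j) = <map_i, map_j>

style = gram_matrix(np.random.rand(32, 32, 64))   # stand-in activations
print(style.shape)                                # (64, 64)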

Style generation…

Page 29: Lecture 6: Computer Vision (part 2)

Neural Style Transfer

Relevant links: https://deepsystems.ai

Style transfer…

Page 30: Lecture 6: Computer Vision (part 2)

Neural Style Transfer

Tutorials and Implementations:
§ https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf
§ https://github.com/tensorflow/models/blob/master/research/nst_blogpost/4_Neural_Style_Transfer_with_Eager_Execution.ipynb
§ https://github.com/titu1994/Neural-Style-Transfer
§ https://github.com/anishathalye/neural-style
§ https://github.com/cysmith/neural-style-tf
§ https://github.com/lengstrom/fast-style-transfer

Page 31: Lecture 6: Computer Vision (part 2)

Neural Style Transfer

DeepDream: Inceptionism

[Figure: example outputs labeled Deep Style, Thin Style and Deep Dream]

§ https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html
§ https://github.com/google/deepdream
§ https://deepdreamgenerator.com/generator
