Roland Memisevic at AI Frontiers: Common sense video understanding at TwentyBN

Twenty Billion Neurons: Berlin & Toronto based Video Understanding Company

DOMESTIC COMPANIONS (85M smart cameras)

AUGMENTED REALITY (6M AR glasses)

AUTOMOTIVE (10M cars)

COLLABORATIVE ROBOTICS (150M cobots)

SMARTPHONE APPS (3 BN phones)

All figures are estimated number of devices in 2020

By 2020: CONSUMER VIDEOS (80% of Internet Traffic)

Sources: KPCB, Barclays

Dog, Cat

15 people, 3 street signs

1986: “Neural networks don’t work”

2012: “Neural networks can’t do image classification”

2014: “Neural networks can’t translate text”

2016: “Neural networks can’t play Go”

2017: “Neural networks don’t have common sense”

?

At TwentyBN we build the brain that allows cameras to see

Prof. Yoshua Bengio

Scientific Advisor

Professor at MILA Montréal; noted for his pioneering work on deep learning

Valentin Haenel

VP Engineering

Co-initiator of PyData Berlin; contributor in more than 50 open source projects

Nathan Benaich

Advisor

VC investor, technologist, former scientist; Organizer of London.ai and RAAIS

+ 13 full-time staff, including AI researchers, engineers and product people

Roland Memisevic

CEO & Chief Scientist

15+ years experience in DL as Professor (MILA Montreal) & PhD student of Geoff Hinton

Moritz Müller-Freitag

COO & Head of Product

Experience as data scientist (Eleven) & country manager (Savedo/HitFox Group)

Ingo Bax

CTO

Experience as Professor (FH Münster) & principal software architect (XING AG)

Christian Thurau

CBDO

Experience as Co-founder, CTO (Game Analytics, exit) & researcher (Fraunhofer)

Integrated technology stack

● Research & engineering
● Data platform
● Embedded real-time net
● Solutions

Camera based gesture control

Existing solutions

● Require depth sensor devices
● ~5 gestures
● Low accuracy
● Never gained traction

TwentyBN solution

● RGB (for example, cheap, built-in laptop camera)
● Recognizes 25 hand gestures
● Very high accuracy
● Runs in real-time on a laptop using RGB camera input

Variations

Camera angles and scene layouts

Multi-person actions and localization

Interactivity

Complex object interactions

Indoor activity monitoring

Output: “Person picking [something] up”

Output: “[Something] falling like a feather or paper”

Output: “Person leaving through a door”

Output: “Bending [something] until it breaks”

Output: “Trying to bend [something unbendable] so nothing happens”

Output: “[gesture] Zooming Out With Two Fingers”

We support all stages of our clients’ product cycles

Product: Description

Data licensing: High-quality labeled videos customized to support your video applications

Software licensing: Software that adds video capabilities to your product

Hardware licensing: Softcore IP

20BN-JESTER

A crowd-acted dataset of generic human hand gestures.

Number of Videos: 148,094

License: Free for academic use

(Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license, CC BY-NC-ND 4.0)

https://www.twentybn.com/datasets/jester

20BN-SOMETHING-SOMETHING

A crowd-acted dataset of basic interactions with everyday objects.

Number of Videos: 108,499

License: Free for academic use

(Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license, CC BY-NC-ND 4.0)

https://www.twentybn.com/datasets/something-something

Contrastive classes make learning harder and networks stronger

Tearing [something] into two pieces VS Tearing [something] just a little bit 0.74 (0.52)

Pretending to pick [something] up VS Picking [something] up 0.86 (0.75)

Pretending to pour VS Pouring 0.82 (0.64)

Pouring with overflow VS Pouring without 0.76 (0.54)

Pretending to put [something] onto VS Putting [something] onto [something] 0.82 (0.64)
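The slide does not say how the scores above were computed. As a minimal sketch, assuming one wants to measure how well a classifier separates a single contrastive pair, accuracy can be restricted to the clips whose ground truth is one of the two classes. The function name `pair_accuracy` and the toy labels are hypothetical and are not TwentyBN results.

```python
def pair_accuracy(y_true, y_pred, class_a, class_b):
    """Accuracy over samples whose ground truth is class_a or class_b."""
    pair = {class_a, class_b}
    # Keep only samples that belong to the contrastive pair.
    selected = [(t, p) for t, p in zip(y_true, y_pred) if t in pair]
    if not selected:
        raise ValueError("no samples from the requested pair")
    correct = sum(1 for t, p in selected if t == p)
    return correct / len(selected)


# Toy labels for illustration only (not real model output).
y_true = ["Pouring", "Pretending to pour", "Pouring", "Pretending to pour", "Unfolding"]
y_pred = ["Pouring", "Pouring", "Pouring", "Pretending to pour", "Unfolding"]

score = pair_accuracy(y_true, y_pred, "Pouring", "Pretending to pour")
print(score)
```

A prediction outside the pair still counts as an error here, so the score reflects confusion with any class, not only with the contrastive partner.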

Mistaken “opening” predictions

Ground truth: Moving [part] of [something]

Prediction: Opening [something]

Ground truth: Unfolding [something]

Prediction: Opening [something]

Ground truth: Putting [something] on a flat surface without letting it roll

Prediction: Opening [something]

Mistaken “covering” predictions

Ground truth: Putting [something] in front of [something]

Prediction: Covering [something]

Ground truth: Turning [something] upside down

Prediction: Covering [something]
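Error lists like the two above can be produced by grouping misclassified clips by their predicted label. The following is a minimal sketch, assuming predictions are available as (ground truth, prediction) pairs. The helper `confusions_for` is hypothetical, and the toy pairs reuse label names from the slides for illustration only; they are not actual model output.

```python
from collections import Counter


def confusions_for(pairs, predicted_label):
    """Count ground-truth labels of clips wrongly predicted as predicted_label."""
    return Counter(
        truth for truth, pred in pairs
        if pred == predicted_label and truth != pred
    )


# Toy (ground truth, prediction) pairs, for illustration only.
pairs = [
    ("Unfolding [something]", "Opening [something]"),
    ("Moving [part] of [something]", "Opening [something]"),
    ("Opening [something]", "Opening [something]"),
    ("Turning [something] upside down", "Covering [something]"),
    ("Unfolding [something]", "Opening [something]"),
]

opening_errors = confusions_for(pairs, "Opening [something]")
for truth, count in opening_errors.most_common():
    print(f"{count}x ground truth: {truth}")
```

Sorting by count surfaces the ground-truth classes that are most often mistaken for the chosen prediction, which is a natural starting point for adding further contrastive classes.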

Transfer learning

Roland Memisevic

+1 416 826 1032

roland@twentybn.com

www.twentybn.com
