Deep Learning Research of NAVER Clova for AI-Enhanced Business · Deep Learning Research of NAVER Clova for AI-Enhanced Business 2 July 2019 NVidia AI Conference Jung-Woo Ha, PhD

Deep Learning Research of NAVER Clova

for AI-Enhanced Business2 July 2019

NVidia AI Conference

Jung-Woo Ha, PhD

Research Head, Clova AI, NAVER & LINE

NAVER

2

LINE

3

NAVER & LINE

4

NAVER

Korea No.1 Internet Platform Company

LINE

Global Messenger Platform Company

Based on Japan

Clova

Vision of Clova: AI for Everyone

Innovative AI

Technologies

Practical &

Valuable AI

Services

B2B

B2CBetter World

5

Clova AI Core Technologies to B2B

6

Deep

Learning

Speech

NLP

Recomme-

ndation

ML

Platform

Computer

Vision

B2B

OCR

SpeechRecognition

Chatbot

FaceSDK

Voice Synthesis

AI ARS

NSML

AI Technology Hierarchy

7

Powerful Pretrained Models

(Regularizer, DataAug, LR scheduling, Curriculum learning, …)

Task-specific Models

(Fine-tuning, Transfer learning, Distillation, Domain adaptation, ...)

Application Logics + API

Lightweight AutoML

Pretrained Models

8

Distillation: Student Better Than A Teacher

9

[CIFAR-100][Heo et al. Arxiv 2019]

Distillation: Student Better Than A Teacher

10

[ImageNet-1k] [Other Tasks: Object Detection & Segmentation]

[Heo et al. Arxiv 2019]

CutMix: New Robust Data Augmentation

11

[Yun et al. Arxiv 2019]

CutMix: New Robust Data Augmentation

12

[Yun et al. Arxiv 2019]

Lightweight CNN Architecture Design

• Here, the problem is how to design feature-map sizes when the # of parameters is limited.

• The performance could be improved by weight layer reallocation.

13

128

128

256

256

512

512

512

64

Input

Output

64

512

512

512

SOTA Lightweight Image Models

14

[Han et al. 2019; Yun et al. 2019; Yoo et al. 2019]

0

5

10

15

20

25

71 72 73 74 75 76

# o

f para

ms

(M)

[Classification (ImageNet)]

ResNet-50

MobileNet_v2 Clova AI

Top1 정확도 (%)

0

10

20

30

40

0.7 0.72 0.74 0.76 0.78

mAP

[Object detection (Pascal VOC07)]

SSD-ResNet50

TDSOD Clova AI

0

4

8

12

16

20

24

0.75 0.8 0.85 0.9

[Face detection (Widerface Hard)]

S3FD-Base

Clova AI

mAP

S3FD-Small

Lightweight CNN Architecture Design

• Transfer to object detection task (finetuning)• Pascal VOC 07 test results (trained on VOC 0712 trainval):

• Transfer to text detection task (Lite-CRAFT)• ICDAR-13 test results:

15

Backbone # of params Hmean(%)

VGG-16 BN 20.8 M 91.5

Ours 2.3 M 91.0

Ours 2.1 M 89.0

Clova AI 경량화 SSD - MobileNet_v2

PASCAL: VOC 정확도(모델크기) 77.0 (5.4M) 70.1 (5.4M)

LaRva: Language Representation by Clova

• BigLM based on BERT

• New task definition• GLUE vs. KLUE and JLUE

• New encoding, corpus-level curriculum learning, n-gram maksing

• Distributed learning for LaRva

• Fast LaRva

16

Task-specific Models

17

Speech Enhancement

18

[Orignal] [Enhanced]

Audio-Visual Speech Enhancement

19

• Speech separation given lip regions in the video[Afouras et al. Interspeech 2018]

HDTS: SOTA Speech Synthesis

20

Required recorded voice data: 10hrs 4hrs 40mins

Detection in OCR

CRAFT: Character region awareness for text detection

22

[Baek et al. CVPR 2019]

OCR Challenges

• The 1st rank in 4 Leaderboards @ ICDAR Challenges (Jan 2019)

23

Source video

Suga

V

Jungkook

https://channels.vlive.tv/EF0205/home

AutoCut (BTS)

AutoCAM (Black Pink)

Applications of tracking, pose estimation, and person re-id in in-the-wild videos

25

Detection in OCR


26


Detection in OCR


27


Recognition in OCR

• Full integration for practical OCR

28

[Baek et al. ArXiv 2019]

SOTA Fashion Retrieval

29

[Jeon et al. Arxiv 2019]

Audio-Visual Speech Enhancement

30

[Chung et al. ICASSP 2019]

Super-Real Style Transfer

31

[Yoo et al. Arxiv 2019]

[WCT] [PhotoWCT wo pp] [WCT2(ours)]

Super-Real Style Transfer

• Wavelet Pooling for Perfect Reconstruction

32

[Yoo et al. Arxiv 2019]

https://github.com/clovaai/wct2

https://github.com/clovaai/wct2

EXTD: EXtremely Tiny face Detector

33

[EXTD: 0.16M] [MobileFaceNet: 1.5M]

StarGAN

34

[Choi et al. CVPR 2018 (oral)]

LaRva: Language Representation by Clova

• Giant-scale general-purpose language model by improving BERT

• Not Multi-lingual BERT but LaRva

35

[WikiSQL] [KorQuAD]

NSML

NSML: ML research platform that enables you to focus on your model!!

36

[Sung et al. MLSYS 2017@NIPS 2017]

Easy One-Liner CLI

• Dataset registration

• Train

• Serve (Inference)

37

NSML

• Support any kind of GPU clusters

• Automated GPU allocation / release

• Support most deep learning libraries

• Docker-based isolated research environment

• Easy-to-use CLI and Web interfaces

• Effective visualization

• Leaderboard

• Jupyter and AutoML

38

[Sung et al. MLSYS 2017@NIPS 2017]

네이버스마트머신러닝플랫폼

39

쉽고 편리한 사용 다양한 실험과인사이트

효과적 협업과투명한 관리

클라우드 기반으로 즉시 사용이 가능하고,데이터 관리가 편리합니다

효율적 자원 활용

GPU Clustering 및 Scheduling 를통해 자원을 효과적으로 배분합니다

병렬학습 및 형상관리, Visualization 을 통해빠르게 인사이트를 얻습니다

문제를 공유할 수 있고 전체 학습 및 자원 현황을실시간 확인할 수 있습니다

AutoML을 통해 자동으로 파라미터를튜닝하고 성능을 최적화합니다

모델 성능최적화

Active Learning 을 통해성능 향상에 중요한 데이터를 선별하고,자동으로 데이터를 레이블링 합니다

빠르고 효율적인데이터 레이블링

AutoML in NSML

40

+ AutoML

(5) AutoML을통해더빠르게, 더좋은학습결과를얻을수있습니다

41

파라미터 튜닝 자동화

• 간단한 조건 세팅으로 Auto ML을 통해 자동으로 Parameter 를 조정하고 최적 모델 탐색

• 각 모델의 Parameter 조합 및 성능은 그래프를 통해 비교 확인

• 설정에 따른 성능 추이를 분석하고, 탐색 범위를 좁혀가며 추가적인 실험 진행

AutoML 활용 사례

기존 AutoML

기존 AutoML

91% ↓

96% ↓

투입 인력

투입 자원

모델 성능

기존 AutoML

10%up

Clova AI Tech Demo

42

https://clova.ai/techdemo

https://clova.ai/techdemo

We can meet here

43

Research Opportunities

44

Pose TrackingVLive …

Four-hour Recoding Voice

2%(err rate vs others 5+%)

99%

Product Image Search

Optical Character Recognition

80%(Eng recg. vs google 54%)

86%(recall@1 vs Alibaba 79%)

Clova AI World-class Achievement

83%(WIKISQL SOTA 75%)

Lightweight (detection)

75% (1.6M)(vs 72.1% with 1.7M, Intel Lab)

Selected Publication List of Clova AI Since 2018

46

1. Sung et al. NSML: A Machine Learning Platform That Enables You to Focus on Your Models, MLSYSWS@NIPS 2017

2. Seo et al. Neural Speed Reading via Skim-RNN, ICLR 2018.

3. Choi et al. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation, CVPR 2018.

4. Afouras et al. Deep Lip Reading: a comparison of models and an online application, Interspeech 2018.

5. Chung et al. VoxCeleb2: Deep Speaker Recognition, Interspeech 2018.

6. Afouras et al. The Conversation: Deep Audio Visual Speech Enhancement, Interspeech 2018.

7. Lee et al. Acoustic modeling using adversarially trained variational recurrent neural network for speech synthesis, Interspeech 2018.

8. Hwang et al. A Unified Framework for the Generation of Glottal Signals in Deep Learning-Based Parametric Speech Synthesis Systems, Interspeech 2018.

9. Park et al. Representation Learning of Music Using Artist Labels, ISMIR 2018.

10. Lee et al. Unsupervised holistic image generation from key local patches, ECCV 2018.

11. Kim et al. Multimodal Dual Attention Memory for Video Story Question Answering, ECCV 2018.

12. Seo et al. Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension, EMNLP 2018.

13. Lee et al. Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog, NeurIPS 2018.

14. Song et al. Hierarchical Context enabled Recurrent Neural Network for Recommendation, AAAI 2019.

15. Park et al. Adversarial Dropout for Recurrent Neural Networks, AAAI 2019.

16. Park et al. Paraphrase Diversification using Counterfactual Debiasing, AAAI 2019.

17. Heo et al. Knowledge Distillation with Adversarial Samples Supporting Decision Boundary, AAAI 2019.

18. Heo et al. Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons, AAAI 2019.

19. Oh et al. Modeling Uncertainty with Hedged Instance Embeddings, ICLR 2019.

20. Gu et al. DialogWAE: Multimodal Response Generation with Conditional Wasserstein Auto-Encoder, ICLR 2019.

21. Lee et al. Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation, ICLR 2019.

22. Kim et al. Curiosity-Bottleneck: Exploration By Distilling Task-Specific Novelty, ICML 2019.

23. Chung et al. Perfect match: Improved cross-modal Embeddings for audio-visual synchronisation, ICASSP 2019

24. Baek et al. Character Region Awareness for Text Detection, CVPR 2019.

25. Seo et al. Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index, ACL 2019.

26. Chung et al. Who said that?: Audio-visual speaker diarisation of real-world meeting, Interspeech 2019.

27. Afouras et al. My lips are concealed: Audio-visual speech enhancement through obstructions, Interspeech 2019.

28. Yamamoto et al. Probability Density Distillation with Generative Adversarial Networks for High-quality Parallel Waveform Generation, Interspeech 2019.

29. Hwang et al. Parameter enhancement for MELP speech codec in noisy communication environment, Interspeech 2019.

WE ARE HIRING! Join Us!

ResearchScientist

Positions

AI SoftwareEngineer

ResearchInternship

GlobalResidency

Fields and Domains

All fields of AI from fundamental theories to practical applications

Why should I Join Naver Clova?Great Colleagues & Work Environment, Hands-on R&D Experience

Advisory: Kyunghyun Cho, Jun-Yan Zhu, Hannaneh Hajishirzi, and Jaegul Choo

How to [email protected]

47

mailto:[email protected]

We are hiring 2019 Fall / 2020 Spring Global Residency

48

Send Your CV Today!

49

[email protected]

[Clova AI Github] [Clova AI Research Web][Facebook Clova AI Research]

mailto:[email protected]

Documents

Deep Learning Research of NAVER Clova for AI-Enhanced Business · Deep Learning Research of NAVER Clova for AI-Enhanced Business 2 July 2019 NVidia AI Conference Jung-Woo Ha, PhD