Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
Deep Learning Research of NAVER Clova
for AI-Enhanced Business2 July 2019
NVidia AI Conference
Jung-Woo Ha, PhD
Research Head, Clova AI, NAVER & LINE
NAVER
2
LINE
3
NAVER & LINE
4
NAVER
Korea No.1 Internet Platform Company
LINE
Global Messenger Platform Company
Based on Japan
Clova
Vision of Clova: AI for Everyone
Innovative AI
Technologies
Practical &
Valuable AI
Services
B2B
B2CBetter World
5
Clova AI Core Technologies to B2B
6
Deep
Learning
Speech
NLP
Recomme-
ndation
ML
Platform
Computer
Vision
B2B
OCR
SpeechRecognition
Chatbot
FaceSDK
Voice Synthesis
AI ARS
NSML
AI Technology Hierarchy
7
Powerful Pretrained Models
(Regularizer, DataAug, LR scheduling, Curriculum learning, …)
Task-specific Models
(Fine-tuning, Transfer learning, Distillation, Domain adaptation, ...)
Application Logics + API
Lightweight AutoML
Pretrained Models
8
Distillation: Student Better Than A Teacher
9
[CIFAR-100][Heo et al. Arxiv 2019]
Distillation: Student Better Than A Teacher
10
[ImageNet-1k] [Other Tasks: Object Detection & Segmentation]
[Heo et al. Arxiv 2019]
CutMix: New Robust Data Augmentation
11
[Yun et al. Arxiv 2019]
CutMix: New Robust Data Augmentation
12
[Yun et al. Arxiv 2019]
Lightweight CNN Architecture Design
• Here, the problem is how to design feature-map sizes when the # of parameters is limited.
• The performance could be improved by weight layer reallocation.
13
128
128
256
256
512
512
512
64
Input
Output
64
512
512
512
SOTA Lightweight Image Models
14
[Han et al. 2019; Yun et al. 2019; Yoo et al. 2019]
0
5
10
15
20
25
71 72 73 74 75 76
# o
f para
ms
(M)
[Classification (ImageNet)]
ResNet-50
MobileNet_v2 Clova AI
Top1 정확도 (%)
0
10
20
30
40
0.7 0.72 0.74 0.76 0.78
mAP
[Object detection (Pascal VOC07)]
SSD-ResNet50
TDSOD Clova AI
0
4
8
12
16
20
24
0.75 0.8 0.85 0.9
[Face detection (Widerface Hard)]
S3FD-Base
Clova AI
mAP
S3FD-Small
Lightweight CNN Architecture Design
• Transfer to object detection task (finetuning)• Pascal VOC 07 test results (trained on VOC 0712 trainval):
• Transfer to text detection task (Lite-CRAFT)• ICDAR-13 test results:
15
Backbone # of params Hmean(%)
VGG-16 BN 20.8 M 91.5
Ours 2.3 M 91.0
Ours 2.1 M 89.0
Clova AI 경량화 SSD - MobileNet_v2
PASCAL: VOC 정확도(모델크기) 77.0 (5.4M) 70.1 (5.4M)
LaRva: Language Representation by Clova
• BigLM based on BERT
• New task definition• GLUE vs. KLUE and JLUE
• New encoding, corpus-level curriculum learning, n-gram maksing
• Distributed learning for LaRva
• Fast LaRva
16
Task-specific Models
17
Speech Enhancement
18
[Orignal] [Enhanced]
Audio-Visual Speech Enhancement
19
• Speech separation given lip regions in the video[Afouras et al. Interspeech 2018]
HDTS: SOTA Speech Synthesis
20
Required recorded voice data: 10hrs 4hrs 40mins
Detection in OCR
CRAFT: Character region awareness for text detection
22
[Baek et al. CVPR 2019]
OCR Challenges
• The 1st rank in 4 Leaderboards @ ICDAR Challenges (Jan 2019)
23
Source video
Suga
V
Jungkook
https://channels.vlive.tv/EF0205/home
AutoCut (BTS)
AutoCAM (Black Pink)
Applications of tracking, pose estimation, and person re-id in in-the-wild videos
25
Detection in OCR
CRAFT: Character region awareness for text detection
26
[Baek et al. CVPR 2019]
Detection in OCR
CRAFT: Character region awareness for text detection
27
[Baek et al. CVPR 2019]
Recognition in OCR
• Full integration for practical OCR
28
[Baek et al. ArXiv 2019]
SOTA Fashion Retrieval
29
[Jeon et al. Arxiv 2019]
Audio-Visual Speech Enhancement
30
[Chung et al. ICASSP 2019]
Super-Real Style Transfer
31
[Yoo et al. Arxiv 2019]
[WCT] [PhotoWCT wo pp] [WCT2(ours)]
Super-Real Style Transfer
• Wavelet Pooling for Perfect Reconstruction
32
[Yoo et al. Arxiv 2019]
https://github.com/clovaai/wct2
EXTD: EXtremely Tiny face Detector
33
[EXTD: 0.16M] [MobileFaceNet: 1.5M]
StarGAN
34
[Choi et al. CVPR 2018 (oral)]
LaRva: Language Representation by Clova
• Giant-scale general-purpose language model by improving BERT
• Not Multi-lingual BERT but LaRva
35
[WikiSQL] [KorQuAD]
NSML
NSML: ML research platform that enables you to focus on your model!!
36
[Sung et al. MLSYS 2017@NIPS 2017]
Easy One-Liner CLI
• Dataset registration
• Train
• Serve (Inference)
37
NSML
• Support any kind of GPU clusters
• Automated GPU allocation / release
• Support most deep learning libraries
• Docker-based isolated research environment
• Easy-to-use CLI and Web interfaces
• Effective visualization
• Leaderboard
• Jupyter and AutoML
38
[Sung et al. MLSYS 2017@NIPS 2017]
네이버스마트머신러닝플랫폼
39
쉽고 편리한 사용 다양한 실험과인사이트
효과적 협업과투명한 관리
클라우드 기반으로 즉시 사용이 가능하고,데이터 관리가 편리합니다
효율적 자원 활용
GPU Clustering 및 Scheduling 를통해 자원을 효과적으로 배분합니다
병렬학습 및 형상관리, Visualization 을 통해빠르게 인사이트를 얻습니다
문제를 공유할 수 있고 전체 학습 및 자원 현황을실시간 확인할 수 있습니다
AutoML을 통해 자동으로 파라미터를튜닝하고 성능을 최적화합니다
모델 성능최적화
Active Learning 을 통해성능 향상에 중요한 데이터를 선별하고,자동으로 데이터를 레이블링 합니다
빠르고 효율적인데이터 레이블링
AutoML in NSML
40
+ AutoML
(5) AutoML을통해더빠르게, 더좋은학습결과를얻을수있습니다
41
파라미터 튜닝 자동화
• 간단한 조건 세팅으로 Auto ML을 통해 자동으로 Parameter 를 조정하고 최적 모델 탐색
• 각 모델의 Parameter 조합 및 성능은 그래프를 통해 비교 확인
• 설정에 따른 성능 추이를 분석하고, 탐색 범위를 좁혀가며 추가적인 실험 진행
AutoML 활용 사례
기존 AutoML
기존 AutoML
91% ↓
96% ↓
투입 인력
투입 자원
모델 성능
기존 AutoML
10%up
We can meet here
43
Research Opportunities
44
Pose TrackingVLive …
Four-hour Recoding Voice
2%(err rate vs others 5+%)
99%
Product Image Search
Optical Character Recognition
80%(Eng recg. vs google 54%)
86%(recall@1 vs Alibaba 79%)
Clova AI World-class Achievement
83%(WIKISQL SOTA 75%)
Lightweight (detection)
75% (1.6M)(vs 72.1% with 1.7M, Intel Lab)
Selected Publication List of Clova AI Since 2018
46
1. Sung et al. NSML: A Machine Learning Platform That Enables You to Focus on Your Models, MLSYSWS@NIPS 2017
2. Seo et al. Neural Speed Reading via Skim-RNN, ICLR 2018.
3. Choi et al. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation, CVPR 2018.
4. Afouras et al. Deep Lip Reading: a comparison of models and an online application, Interspeech 2018.
5. Chung et al. VoxCeleb2: Deep Speaker Recognition, Interspeech 2018.
6. Afouras et al. The Conversation: Deep Audio Visual Speech Enhancement, Interspeech 2018.
7. Lee et al. Acoustic modeling using adversarially trained variational recurrent neural network for speech synthesis, Interspeech 2018.
8. Hwang et al. A Unified Framework for the Generation of Glottal Signals in Deep Learning-Based Parametric Speech Synthesis Systems, Interspeech 2018.
9. Park et al. Representation Learning of Music Using Artist Labels, ISMIR 2018.
10. Lee et al. Unsupervised holistic image generation from key local patches, ECCV 2018.
11. Kim et al. Multimodal Dual Attention Memory for Video Story Question Answering, ECCV 2018.
12. Seo et al. Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension, EMNLP 2018.
13. Lee et al. Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog, NeurIPS 2018.
14. Song et al. Hierarchical Context enabled Recurrent Neural Network for Recommendation, AAAI 2019.
15. Park et al. Adversarial Dropout for Recurrent Neural Networks, AAAI 2019.
16. Park et al. Paraphrase Diversification using Counterfactual Debiasing, AAAI 2019.
17. Heo et al. Knowledge Distillation with Adversarial Samples Supporting Decision Boundary, AAAI 2019.
18. Heo et al. Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons, AAAI 2019.
19. Oh et al. Modeling Uncertainty with Hedged Instance Embeddings, ICLR 2019.
20. Gu et al. DialogWAE: Multimodal Response Generation with Conditional Wasserstein Auto-Encoder, ICLR 2019.
21. Lee et al. Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation, ICLR 2019.
22. Kim et al. Curiosity-Bottleneck: Exploration By Distilling Task-Specific Novelty, ICML 2019.
23. Chung et al. Perfect match: Improved cross-modal Embeddings for audio-visual synchronisation, ICASSP 2019
24. Baek et al. Character Region Awareness for Text Detection, CVPR 2019.
25. Seo et al. Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index, ACL 2019.
26. Chung et al. Who said that?: Audio-visual speaker diarisation of real-world meeting, Interspeech 2019.
27. Afouras et al. My lips are concealed: Audio-visual speech enhancement through obstructions, Interspeech 2019.
28. Yamamoto et al. Probability Density Distillation with Generative Adversarial Networks for High-quality Parallel Waveform Generation, Interspeech 2019.
29. Hwang et al. Parameter enhancement for MELP speech codec in noisy communication environment, Interspeech 2019.
WE ARE HIRING! Join Us!
ResearchScientist
Positions
AI SoftwareEngineer
ResearchInternship
GlobalResidency
Fields and Domains
All fields of AI from fundamental theories to practical applications
Why should I Join Naver Clova?Great Colleagues & Work Environment, Hands-on R&D Experience
Advisory: Kyunghyun Cho, Jun-Yan Zhu, Hannaneh Hajishirzi, and Jaegul Choo
How to [email protected]
47
We are hiring 2019 Fall / 2020 Spring Global Residency
48
Send Your CV Today!
49
[Clova AI Github] [Clova AI Research Web][Facebook Clova AI Research]