
JSAI Cup 2018 - SIGNATE

Yu Qing

Aizawa Laboratory

University of Tokyo

Self-introduction

• Name: 郁 (Yu) 青 (Qing)

• Hometown: Shanghai, China

• Affiliation: Aizawa Laboratory, University of Tokyo

• Grade: M1


Task

• Image Classification (55 Classes)


Dataset

• 55 classes of food ingredient images provided by cookpad

• Train: 11,995 images (labeled)

• Test: 3,937 images (unlabeled)


Rules

• Ensemble

• K-fold ensemble

• Pseudo labeling

• Test time augmentation

• Pretrained model

• Evaluation metric: accuracy


Problem

• Pretrained models are not allowed, so networks must be trained from scratch

• The dataset is small

• Training is therefore prone to overfitting


Method: Semi-supervised Learning

• Train: 11,195 images (labeled) + 3,937 images (unlabeled)

• Test: 3,937 images (unlabeled)

• Semi-supervised learning: "Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results" [Tarvainen & Valpola, NIPS 2017]

• Implemented in PyTorch


Method

[Diagram] Labeled data: a randomly augmented image goes through the CNN, producing class probabilities (e.g. 0.1 0.3 0.6 0.3); the softmax cross-entropy loss against the ground-truth one-hot label (0 0 1 0) is backpropagated.
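A minimal PyTorch sketch of this supervised path (the model, optimizer, and argument names are placeholders, not the author's code):

```python
import torch.nn.functional as F

def supervised_step(student, optimizer, images, labels):
    """One labeled step: `images` are assumed to be randomly augmented
    by the DataLoader's transform; the softmax cross-entropy loss
    against the ground-truth label is backpropagated."""
    logits = student(images)                # [B, 55] class scores
    loss = F.cross_entropy(logits, labels)  # softmax cross-entropy loss
    optimizer.zero_grad()
    loss.backward()                         # back prop
    optimizer.step()
    return loss.item()
```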

Method

[Diagram] Unlabeled data: the same image receives two independent random augmentations. Augmentation 1 goes through the student CNN (prediction 0.4 0.6 0.2 0.2); augmentation 2 goes through the teacher CNN, which produces a more accurate prediction (0.2 0.9 0.1 0.1). The MSE (consistency) loss between the two predictions is backpropagated into the student CNN.

How is the teacher CNN generated? It has the same structure as the student CNN, and its parameters θ′ are an exponential moving average of the student parameters θ over past iterations.
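A sketch of this consistency term, assuming `student` and `teacher` share the same architecture (MSE between softmax outputs; the function and argument names are mine):

```python
import torch
import torch.nn.functional as F

def consistency_loss(student, teacher, aug1, aug2):
    """aug1 / aug2: two independent random augmentations of the same
    unlabeled images. The teacher's prediction is the target; gradients
    flow only into the student."""
    student_probs = F.softmax(student(aug1), dim=1)      # e.g. 0.4 0.6 0.2 0.2
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(aug2), dim=1)  # e.g. 0.2 0.9 0.1 0.1
    return F.mse_loss(student_probs, teacher_probs)      # MSE loss
```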

Method

[Diagram] Training on a labeled batch: the student CNN (prediction 0.1 0.3 0.6 0.3) is updated with both losses: the softmax cross-entropy loss against the ground-truth label (0 1 0 0) and the MSE loss against the teacher prediction (0.1 0.2 0.8 0.3).

Method

[Diagram] Training on an unlabeled batch: only the MSE loss between the student prediction (0.4 0.6 0.2 0.2) and the teacher prediction (0.2 0.9 0.1 0.1) is backpropagated.
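Putting both losses together, one possible training step over a mixed batch (a sketch: the `NO_LABEL` marker and the consistency weight `w` are my assumptions, not taken from the slides):

```python
import torch
import torch.nn.functional as F

NO_LABEL = -1  # assumed marker for unlabeled images in a mixed batch

def train_step(student, teacher, optimizer, aug1, aug2, labels, w=1.0):
    logits = student(aug1)
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(aug2), dim=1)

    # MSE consistency loss on every image, labeled or not.
    cons = F.mse_loss(F.softmax(logits, dim=1), teacher_probs)

    # Softmax cross-entropy only on the labeled part of the batch.
    labeled = labels != NO_LABEL
    ce = (F.cross_entropy(logits[labeled], labels[labeled])
          if labeled.any() else logits.new_zeros(()))

    loss = ce + w * cons
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```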

Method

After each training step, the teacher parameters θ′ are updated as an exponential moving average of the student parameters θ:

θ′_t = α · θ′_{t−1} + (1 − α) · θ_t,  α = 0.999
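A sketch of this update in PyTorch, applied after every optimizer step:

```python
import torch

@torch.no_grad()
def update_teacher(student, teacher, alpha=0.999):
    """theta'_t = alpha * theta'_(t-1) + (1 - alpha) * theta_t"""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1 - alpha)
```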

Method

[Diagram] Test: each unlabeled test image is expanded by test-time augmentation, all views go through the teacher CNN (e.g. predictions 0.8 0.4 0.1 0.2 and 0.9 0.3 0.1 0.1), and the predictions are combined into the final result.
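At test time the per-view predictions can be averaged (a sketch; the slide only shows the combined result, so averaging the softmax outputs is my assumption):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict(teacher, views):
    """views: [N, C, H, W] tensor of test-time-augmented copies of one
    test image. Average the teacher's softmax outputs over all views."""
    probs = F.softmax(teacher(views), dim=1)  # [N, 55]
    return probs.mean(dim=0).argmax().item()  # final class index
```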

Data Augmentation

• Train: RandomRotation, RandomCrop, RandomHorizontalFlip, ColorJitter, Random Erasing [Zhong, arXiv:1708.04896], Mixup [Zhang, ICLR 2018] (size 224 × 224)

• Mixup example: blend two images, e.g. 0.7 × dog + 0.3 × cat, with the matching soft label

• Test: 144 Crop (size 320 × 320)
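A torchvision sketch of this training pipeline (the rotation angle, jitter strengths, and Mixup α are my assumptions; Random Erasing ships with recent torchvision as `transforms.RandomErasing`):

```python
import torch
import numpy as np
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomRotation(15),          # assumed angle
    transforms.RandomCrop(224),             # random 224 x 224 crop
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4),  # assumed strengths
    transforms.ToTensor(),
    transforms.RandomErasing(),             # [Zhong, arXiv:1708.04896]
])

def mixup(images, onehot, alpha=0.2):
    """Mixup [Zhang, ICLR 2018]: blend pairs of images and labels,
    e.g. 0.7 * dog + 0.3 * cat with soft label (0.7 dog, 0.3 cat)."""
    lam = float(np.random.beta(alpha, alpha))
    perm = torch.randperm(images.size(0))
    return (lam * images + (1 - lam) * images[perm],
            lam * onehot + (1 - lam) * onehot[perm])
```

With mixed soft labels, the cross-entropy is computed against the blended target rather than a single class index.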

144 Crop

[Diagram] Resize the shorter side (h or w) to 4 scales: 340, 380, 410, 450. From each resized image, take 3 square crops (left/center/right, or top/middle/bottom for portrait images). From each square (e.g. 380 × 380), take the 4 corner and center 320 × 320 crops plus the whole square resized to 320 × 320 (6 views). Keep each view and its horizontal flip (× 2).

Sum = 4 × 3 × 6 × 2 = 144 views [Szegedy, CVPR 2015]
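A sketch of the view generation (the scales and crop size are read off the slide; the square-crop convention follows [Szegedy, CVPR 2015], and the helper name is mine):

```python
import torchvision.transforms.functional as TF

SCALES = [340, 380, 410, 450]   # shorter-side scales
CROP = 320                      # final view size

def make_144_views(img):
    """4 scales x 3 squares x (5 crops + 1 resize) x 2 flips = 144
    views of one PIL test image."""
    views = []
    for s in SCALES:
        w, h = img.size
        if w < h:
            resized = TF.resize(img, (round(h * s / w), s))
        else:
            resized = TF.resize(img, (s, round(w * s / h)))
        W, H = resized.size
        side = min(W, H)
        # 3 squares: left/center/right (landscape) or top/middle/bottom.
        if W >= H:
            offsets = [(0, 0), ((W - side) // 2, 0), (W - side, 0)]
        else:
            offsets = [(0, 0), (0, (H - side) // 2), (0, H - side)]
        for x, y in offsets:
            square = TF.crop(resized, y, x, side, side)
            # 4 corners + center at CROP x CROP, plus the square resized.
            sub = [TF.crop(square, 0, 0, CROP, CROP),
                   TF.crop(square, 0, side - CROP, CROP, CROP),
                   TF.crop(square, side - CROP, 0, CROP, CROP),
                   TF.crop(square, side - CROP, side - CROP, CROP, CROP),
                   TF.crop(square, (side - CROP) // 2, (side - CROP) // 2, CROP, CROP),
                   TF.resize(square, (CROP, CROP))]
            for v in sub:
                views.append(v)          # original
                views.append(TF.hflip(v))  # horizontal flip
    return views  # 144 PIL images
```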

CNN

• SE-ResNet-50 [Hu, CVPR 2018]

[Diagram] Squeeze-and-Excitation block
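For reference, a minimal Squeeze-and-Excitation block (the reduction ratio 16 is the paper's default, not stated on the slide):

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation [Hu, CVPR 2018]: global-average-pool the
    feature map, pass it through a small bottleneck, and rescale each
    channel of the input."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze: [B, C, 1, 1]
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # per-channel gates
        )

    def forward(self, x):
        return x * self.fc(x)                              # excite: rescale channels
```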

Hyperparameter

• Optimizer: SGD with Nesterov Momentum

• Learning rate: cosine annealing from 0.1

• Momentum: 0.9

• Weight decay: 1e-3

• Epochs: 300

• Batch size: 300

• Labeled batch size: 280

• Training takes 3 days on 4 Titan Xp GPUs

• One month spent on hyperparameter tuning
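A PyTorch sketch of this setup (the model here is a stand-in for SE-ResNet-50):

```python
import torch

model = torch.nn.Linear(10, 55)  # placeholder for SE-ResNet-50
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            nesterov=True, weight_decay=1e-3)
# Cosine annealing of the learning rate from 0.1 over the 300 epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)

for epoch in range(300):
    ...  # mixed batches of 300 images: 280 labeled + 20 unlabeled
    scheduler.step()
```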


Result

[Chart] Leaderboard score over time (y-axis: score, 0.905 to 0.985; x-axis: date)

Result


Date      | Score   | Method
2018/2/7  | 0.91071 | ResNet-34 + Random Erasing
2018/2/8  | 0.92593 | + Mixup
2018/2/12 | 0.96093 | + Mean Teacher
2018/2/14 | 0.97210 | + Test image size 320
2018/2/15 | 0.97869 | ResNet-34 → SE-ResNet-50
2018/2/22 | 0.97971 | + 144 crop
2018/3/29 | 0.98326 | + Hyperparameter tuning

Score: 0.98326 / Final score: 0.98118

Summary

• Semi-supervised learning is awesome.

• Data augmentation is important.

• Hyperparameter tuning is arduous but indispensable.

