Pairing Up CNNs for High Throughput Deep Learning
Babak Zamirai, Salar Latifi, Scott Mahlke
Input Variation
• There is no single best CNN for all inputs
• Combine multiple CNNs
  - Lower computational complexity
  - Higher accuracy
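[Figure: breakdown of inputs for the AlexNet and ResNet-152 pair]
Common correct: 54%
Complex correct: 25%
Odd correct: 3%
Common wrong: 18%
AlexNet accuracy: 57%
ResNet-152 accuracy: 79%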
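The four categories partition the inputs, so each model's accuracy follows by addition. The snippet below is a minimal sketch of that arithmetic. It assumes (the figure does not say so explicitly) that "odd correct" means only AlexNet is right and "complex correct" means only ResNet-152 is right; the variable names and the "oracle" ceiling are illustrative, not taken from the poster.

```python
# Percentages read from the figure; the category definitions are an assumption.
common_correct = 54   # both CNNs correct
complex_correct = 25  # only the big CNN (ResNet-152) correct
odd_correct = 3       # only the little CNN (AlexNet) correct
common_wrong = 18     # both CNNs wrong

little_accuracy = common_correct + odd_correct    # AlexNet
big_accuracy = common_correct + complex_correct   # ResNet-152
oracle_accuracy = 100 - common_wrong              # ideal per-input choice between the two

print(little_accuracy, big_accuracy, oracle_accuracy)  # prints: 57 79 82
```

Under these assumptions, a perfect per-input selector could exceed the accuracy of the big CNN alone, which is consistent with the claim that combining CNNs can raise accuracy while lowering computation.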
DNN Complexity Trend
• Computational complexity grows fast
  ✓ Accuracy improvement
  ✗ Input-invariant accelerations
[Figure: Top-1 accuracy (%) versus MAC operations (GFLOPs) for ResNet-50 (2016), Inception-v3 (2016), Xception (2017), Inception-v4 (2017), ResNeXt-101 (2017), PolyNet (2017), and SENet (2018). The accuracy axis spans 75 to 84; the operations axis spans 0 to 45.]
Pairing Up CNNs
[Figure: service architecture. A user sends a request (input) to the service and receives an output. The service draws on a model pool and on hardware resources (CPUs, GPUs, FPGAs). Panels: Traditional Service; Dynamic Duo (DD) Service, which adds a Confidence Probe and a runtime.]
Datacenter Throughput
• Same response time as baseline
[Figure: datacenter throughput, annotated 1.8x]
Latency
• Exhaustive search results in only 5% additional gains
[Figure: latency, annotated 69%]
Synergistic Pairs
• Higher accuracy means more room for savings
• Odd corrects and peak accuracy are correlated
• More odd corrects, better synergy
Confidence Probe
• Recovery rate = 26%
• Odd corrects maintain the accuracy
[Figure legend: complex corrects, common corrects]
Softmax Layer
[Figure: CNN output layer passed through softmax]
Class   Output layer   Normalized output   Prediction
Cat     0.9            0.016               ✗
Fish    0.2            0.008               ✗
Bird    1.3            0.024               ✗
Dog     4.98           0.952               ✓
• An estimation of confidence
• Sum of the elements = 1.0
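The normalization in the figure can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library. The function name `softmax` and the pairing of class labels with values are illustrative, and the max-subtraction step is a common numerical-stability habit rather than something shown on the poster.

```python
import math

def softmax(logits):
    """Normalize raw output-layer scores into values that sum to 1."""
    shift = max(logits)  # subtracting the max does not change the result
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["Cat", "Fish", "Bird", "Dog"]
probs = softmax([0.9, 0.2, 1.3, 4.98])
for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
# prints:
# Cat: 0.016
# Fish: 0.008
# Bird: 0.024
# Dog: 0.952
```

The largest normalized value (0.952 for Dog) is what can be read as the network's confidence estimate for its prediction.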
• Run everything on the little CNN
• Detect and recover unreliable outputs
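[Figure: the input is run on the Little CNN and its output goes to the Confidence Probe. Outputs judged reliable are returned to the user; inputs judged unreliable are rerun on the Big CNN, whose output is returned to the user.]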
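The little/big flow can be sketched as a two-stage cascade. The poster does not specify how the Confidence Probe decides reliability, so the sketch below assumes a simple threshold on the largest softmax value. The function `dynamic_duo`, the `threshold` of 0.8, and the stand-in models are all hypothetical and serve only to illustrate the control flow.

```python
import math

def softmax(logits):
    """Normalize raw scores into values that sum to 1."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_duo(x, little_cnn, big_cnn, threshold=0.8):
    """Run the little CNN first; rerun on the big CNN if the probe flags the output.

    The probe here is an assumed max-softmax threshold, not the authors' method.
    """
    probs = softmax(little_cnn(x))
    if max(probs) >= threshold:           # judged reliable: answer from the little CNN
        return probs.index(max(probs)), "little"
    probs = softmax(big_cnn(x))           # judged unreliable: recover on the big CNN
    return probs.index(max(probs)), "big"

# Stand-in models that return fixed output-layer scores (hypothetical).
def little(x):
    return [0.9, 0.2, 1.3, 4.98] if x == "easy" else [1.0, 1.2, 0.9, 1.1]

def big(x):
    return [0.1, 5.0, 0.3, 0.2]

print(dynamic_duo("easy", little, big))   # (3, 'little')
print(dynamic_duo("hard", little, big))   # (1, 'big')
```

With this structure, only the inputs the probe flags pay the cost of the big CNN, which is where the throughput saving would come from.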