Energy-Efficient Face Detection Using Andes RISC-V Processor
Presenter: Chien-Hao Chen
Advisor: Prof. Chen-Yi Lee
Date: 2018/03/12
Outline
• Introduction
  − Motivation
  − Face Detection Model
• Face Detector on Andes Processor
  − Hardware environment
  − Model Simplification and Acceleration
• Experiment Result
• Conclusion
• Reference
Motivation
• Cloud computing
  – Image uploaded to the cloud → processing → result returned
• Edge computing
  – Image processed directly on the edge device → result returned
Face Detection Model: MTCNN, 2016 [1]
1. Resize the image (image pyramid) and sample it with a sliding window
2. P-Net (Proposal): find candidate bounding boxes
3. R-Net (Refine): reject wrong candidates from P-Net
4. O-Net (Output): refine the candidates kept by R-Net into more accurate face regions
[Figure: P-Net → R-Net → O-Net cascade]
Image from "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks," IEEE Signal Processing Letters (SPL), vol. 23, no. 10, pp. 1499-1503, 2016
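Step 1 can be sketched as an image pyramid. This is a minimal sketch, not the authors' code: the helper `pyramid_scales`, the shrink factor 0.709 and the rule "first scale = 12 / min_face" are assumptions taken from common MTCNN implementations; the slides only state that the image is resized and sampled with a sliding window.

```python
import math

def pyramid_scales(height, width, min_face=12, factor=0.709, window=12):
    """Resize factors of an MTCNN-style image pyramid (hypothetical helper).

    The first scale maps a face of `min_face` pixels onto the 12x12 P-Net
    window; every further level shrinks the image by `factor` until its
    shorter side would fall below the window size.
    """
    scales = []
    scale = window / min_face
    min_side = min(height, width)
    while min_side * scale >= window:
        scales.append(scale)
        scale *= factor
    return scales

for s in pyramid_scales(240, 320, min_face=25):
    # each resized image is then scanned by P-Net (12x12 window, stride 2)
    print(f"scale {s:.4f} -> {math.ceil(240 * s)}x{math.ceil(320 * s)}")
```

With `min_face=25` (the minimum face size used in the experiments) a 240 × 320 frame yields 7 pyramid levels; a smaller minimum face adds levels and therefore more P-Net work.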
• P-Net (Proposal):
  − Fully convolutional, with 3 convolution layers and 1 max-pooling layer
  − Rough proposals
• R-Net (Refine):
  − 3 convolution, 2 max-pooling and 1 fully connected layer
  − Rejects false proposals from P-Net
• O-Net (Output):
  − 4 convolution, 3 max-pooling and 1 fully connected layer
  − More complex model
    → rejects false results from R-Net
    → produces more accurate face bounding-box positions
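Because P-Net is fully convolutional, the weights that score one 12 × 12 window can slide over a whole frame. The sketch below traces output sizes through the original P-Net; the 3 × 3 kernels and the 2 × 2, stride-2 max pooling (rounded up, Caffe style) are assumptions from the public MTCNN reference model, since the slide only lists layer counts. `PNET_LAYERS` and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, pad=0):
    # standard convolution output size (floor division)
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride):
    # Caffe-style pooling rounds up (ceil mode)
    return -(-(size - kernel) // stride) + 1

# assumed original P-Net layers: (kind, kernel, stride)
PNET_LAYERS = [("conv", 3, 1), ("pool", 2, 2), ("conv", 3, 1), ("conv", 3, 1)]

def pnet_output_size(h, w):
    for kind, k, s in PNET_LAYERS:
        f = conv_out if kind == "conv" else pool_out
        h, w = f(h, k, s), f(w, k, s)
    return h, w

print(pnet_output_size(12, 12))    # (1, 1): one window -> one score cell
print(pnet_output_size(240, 320))  # (115, 155): whole frame -> score map
```

The 240 × 320 case reproduces the 115 × 155 map used in the soft-max example.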
Hardware environment (Andes RISC-V):
− Processor: 64-bit AndesCore, 60 MHz
− FPGA: Xilinx Kintex-7 XC7K410T
− DRAM: 1 GB
− Flash: 64 MB
Depth-wise separable convolution [3]
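The saving that motivates depth-wise separable convolution can be checked with the MobileNets [3] cost model: a standard k × k convolution costs H·W·k²·C_in·C_out multiply-accumulates (MACs), while the separable version costs H·W·(k²·C_in + C_in·C_out). The layer sizes below (3 × 3 kernel, 10 → 16 channels, 3 × 3 output) are illustrative only.

```python
def standard_conv_macs(h_out, w_out, k, c_in, c_out):
    # every output value of every output channel sees a k x k x c_in patch
    return h_out * w_out * k * k * c_in * c_out

def depthwise_separable_macs(h_out, w_out, k, c_in, c_out):
    depthwise = h_out * w_out * k * k * c_in  # one k x k filter per input channel
    pointwise = h_out * w_out * c_in * c_out  # 1x1 convolution mixes channels
    return depthwise + pointwise

std = standard_conv_macs(3, 3, 3, 10, 16)
dws = depthwise_separable_macs(3, 3, 3, 10, 16)
print(std, dws, round(std / dws, 2))  # 12960 2250 5.76
```

The ratio equals 1 / (1/C_out + 1/k²), so the saving grows with the number of output channels.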
Model Simplification and Acceleration
Step 1: Model simplification (Depth-wise MTCNN; DW = depth-wise separable convolution)
• P-Net (Proposal): fully convolutional
  − 1 convolution layer: stride = 2 (channels: 10)
  − 2 DW convolution layers: stride = 1 (channels: 16, 32)
• R-Net (Refine):
  − 1 convolution layer: stride = 2
  − 1 DW convolution layer: stride = 2
  − 1 DW convolution layer: stride = 1
  − 1 fully connected layer
• O-Net (Output):
  − 1 convolution layer: stride = 2
  − 2 DW convolution layers: stride = 2
  − 2 convolution layers: stride = 1 (channels: 128, 128)
  − 1 fully connected layer
Step 2: Soft-max approximation
Motivation
• Example: for a P-Net input of 240 × 320, output 1 (face classification) is 115 × 155 × 2 and output 2 (bounding-box regression) is 115 × 155 × 4, since P-Net acts as a 12 × 12 filter with stride 2:
$$H_{out} = \frac{H_{in} - H_{filter} + Padding}{Stride} + 1 = \frac{240 - 12 + 0}{2} + 1 = 115$$
• Soft-max:
$$\sigma(x, y) = \left[\frac{e^x}{e^x + e^y},\; \frac{e^y}{e^x + e^y}\right]$$
→ 6 exponentials and 2 divisions per output position (3 exponentials per fraction when each is evaluated independently)
• For the output 1 soft-max: 115 × 155 × 6 ≈ 107k exponentials and 115 × 155 × 2 ≈ 35.7k divisions
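The cost estimate above can be reproduced in a few lines. This sketch applies the slide's output-size formula with P-Net treated as a 12 × 12 filter of stride 2, and counts 6 exponentials and 2 divisions per output position as the slide does; the function names are hypothetical.

```python
def pnet_map_size(h_in, w_in, window=12, stride=2, padding=0):
    # output-size formula from the slide, applied to both dimensions
    h_out = (h_in - window + padding) // stride + 1
    w_out = (w_in - window + padding) // stride + 1
    return h_out, w_out

def naive_softmax_cost(h_out, w_out, exps_per_pos=6, divs_per_pos=2):
    # naive two-class soft-max evaluated independently at every position
    positions = h_out * w_out
    return positions * exps_per_pos, positions * divs_per_pos

h, w = pnet_map_size(240, 320)
exps, divs = naive_softmax_cost(h, w)
print(h, w, exps, divs)  # 115 155 106950 35650
```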
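Soft-max approximation
• A position is kept as a face candidate only if its face probability exceeds the threshold P (0 < P < 1):
$$\frac{e^x}{e^x + e^y} > P$$
$$e^x > P e^x + P e^y$$
$$(1 - P)\, e^x > P e^y$$
$$\ln(1 - P) + x > \ln P + y$$
$$x > \ln\left(\frac{P}{1 - P}\right) + y$$
• $\ln\left(\frac{P}{1 - P}\right)$ is a constant, so the test needs no exponential and no division.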
Example: with P = 0.7,
$$\frac{e^x}{e^x + e^y} = 0.7 \;\Rightarrow\; x = \ln\left(\frac{0.7}{1 - 0.7}\right) + y \approx 0.847 + y$$
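The equivalence derived above can be checked numerically: thresholding the soft-max probability and comparing x against ln(P / (1 − P)) + y select exactly the same positions. This is a sketch assuming, as in the derivation, that x is the face logit and y the non-face logit; the random logits are synthetic test data, not P-Net outputs.

```python
import math
import random

def face_mask_exact(logits, p):
    # reference: full two-class soft-max, then threshold the face probability
    return [math.exp(x) / (math.exp(x) + math.exp(y)) > p for x, y in logits]

def face_mask_fast(logits, p):
    # approximation: one addition and one comparison per position
    c = math.log(p / (1.0 - p))  # computed once per threshold, not per position
    return [x > c + y for x, y in logits]

rng = random.Random(0)
logits = [(rng.gauss(0, 3), rng.gauss(0, 3)) for _ in range(10_000)]
for p in (0.6, 0.7, 0.9):
    assert face_mask_exact(logits, p) == face_mask_fast(logits, p)
print(round(math.log(0.7 / 0.3), 4))  # 0.8473
```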
Experiment Result
• On the FDDB [4] database: P-Net, R-Net thresholds = 0.6, 0.7; minimum face size = 25×25

| Method | Accuracy @ FPPI 0.01 | Accuracy @ FPPI 0.1 | Accuracy @ FPPI 1.0 | Speedup @ Andes RISC-V processor |
|---|---|---|---|---|
| MTCNN | 84.95% | 92.40% | 94.66% | - |
| Ours | 82.59% | 88.15% | 90.68% | 106x |

• FPPI: false positives per image
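"Accuracy @ FPPI" is read here as the fraction of ground-truth faces detected (true-positive rate) when the detector is allowed a given average number of false positives per image, i.e. one point on an ROC-style curve. The sketch below assumes detections have already been matched to the FDDB ground truth (the matching rule is not given on the slides); `recall_at_fppi` and the toy detections are hypothetical.

```python
def recall_at_fppi(detections, num_faces, num_images, fppi):
    """Recall reached before the false-positive budget fppi * num_images is exceeded.

    detections: list of (score, is_true_positive) pairs, already matched
    against the ground truth.
    """
    max_fp = fppi * num_images
    tp = fp = 0
    recall = 0.0
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
            if fp > max_fp:
                break  # lowering the score threshold further exceeds the budget
        recall = tp / num_faces
    return recall

dets = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False), (0.4, True)]
print(recall_at_fppi(dets, 5, 2, 0.5))  # 0.6
print(recall_at_fppi(dets, 5, 2, 1.0))  # 0.8
```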
• On the FDDB database (FPPI: false positives per image):

| Method | Accuracy @ FPPI 1.0 | Speedup @ Andes RISC-V processor |
|---|---|---|
| MTCNN | 94.66% | - |
| Ours | 90.68% | 106x |

| Method | Accuracy @ FPPI 0.1 | Accuracy @ FPPI 0.01 | FPS (Titan X GPU / 1080 Ti) |
|---|---|---|---|
| Brodmann17 | 89.25% | 81.88% | 200 / 90 |
| DeepIR | 88.45% | 82.16% | <=1 |
| Xiaomi | 87.82% | 77.99% | 2? |
| Faceness | 86.04% | 79.67% | 1 |
| Hyperface | 85.63% | 80.68% | 0.33 |
| DP2MFD | 85.57% | 76.73% | <0.05 |
| Ours | 88.15% | 82.59% | 54 |
• On the FDDB database, evaluated without faces smaller than 48×48:
• P-Net, R-Net thresholds = 0.9, 0.85; minimum face size = 48×48

| Method | Accuracy @ FPPI 0.01 | Accuracy @ FPPI 0.1 |
|---|---|---|
| Ours | 86.64% | 87.7% |

• P-Net, R-Net thresholds = 0.6, 0.7; minimum face size = 48×48

| Method | Accuracy @ FPPI 0.01 | Accuracy @ FPPI 0.1 |
|---|---|---|
| Ours | 90.53% | 93.81% |
Conclusion
• Proposed face detection model:
  − Model size: 3.6x smaller
  − Speedup on the Andes processor: 106x faster
  − Accuracy @ FPPI 1.0: 90.68%
Reference
[1] Zhang, Kaipeng, et al. "Joint face detection and alignment using multitask cascaded convolutional networks." IEEE Signal Processing Letters 23.10 (2016): 1499-1503.
[2] Li, Haoxiang, et al. "A convolutional neural network cascade for face detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[3] Howard, Andrew G., et al. "MobileNets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
[4] Jain, Vidit, and Erik Learned-Miller. "FDDB: A benchmark for face detection in unconstrained settings." Vol. 2. No. 4. UMass Amherst Technical Report, 2010.
[5] Sun, Xudong, Pengcheng Wu, and Steven C. H. Hoi. "Face detection using deep learning: An improved faster RCNN approach." Neurocomputing 299 (2018): 42-50.
[6] Jiang, Huaizu, and Erik Learned-Miller. "Face detection with the faster R-CNN." 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017). IEEE, 2017.
[7] Yang, Shuo, et al. "Faceness-net: Face detection through deep facial part responses." IEEE Transactions on Pattern Analysis and Machine Intelligence 40.8 (2018): 1845-1859.
[8] Ranjan, Rajeev, Vishal M. Patel, and Rama Chellappa. "Hyperface: A deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence 41.1 (2019): 121-135.
[9] Ranjan, Rajeev, Vishal M. Patel, and Rama Chellappa. "A deep pyramid deformable part model for face detection." 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS). IEEE, 2015.
Thank you for listening!
Soft-max with NMS
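Soft-max approximation
• $\frac{e^x}{e^x + e^y} > P \;\Rightarrow\; x > \ln\left(\frac{P}{1 - P}\right) + y$
Soft-max approximation with NMS
• NMS keeps the box with the highest score, so it only needs to compare two soft-max scores:
$$\frac{e^{x_1}}{e^{x_1} + e^{y_1}} > \frac{e^{x_2}}{e^{x_2} + e^{y_2}}$$
$$\Rightarrow e^{x_1}\left(e^{x_2} + e^{y_2}\right) > e^{x_2}\left(e^{x_1} + e^{y_1}\right)$$
$$\Rightarrow e^{x_1} e^{x_2} + e^{x_1} e^{y_2} > e^{x_2} e^{x_1} + e^{x_2} e^{y_1}$$
$$\Rightarrow e^{x_1 + y_2} > e^{x_2 + y_1} \quad \text{(the common term } e^{x_1} e^{x_2} \text{ cancels)}$$
$$\Rightarrow x_1 + y_2 > x_2 + y_1$$
$$\Rightarrow x_1 - y_1 > x_2 - y_2$$
• Ranking boxes by $x - y$ therefore gives the same order as ranking by soft-max score.
• Speedup: 1.43x faster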
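A minimal greedy NMS shows why the x − y score is enough: the soft-max score 1 / (1 + e^(y − x)) is monotone in x − y, so both scores give the same ranking and keep the same boxes. The IoU threshold of 0.5, the boxes and the logits below are assumptions for illustration, not values from the slides.

```python
import math

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (21, 19, 31, 29)]
logits = [(2.0, -1.0), (1.5, -2.5), (0.3, 0.1), (1.2, 0.4)]  # (face x, non-face y)

softmax_scores = [math.exp(x) / (math.exp(x) + math.exp(y)) for x, y in logits]
diff_scores = [x - y for x, y in logits]  # no exponential, no division

print(nms(boxes, softmax_scores))  # [1, 3]
print(nms(boxes, diff_scores))     # [1, 3]
```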
Computational Complexity
Model operation complexity comparison
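Original MTCNN

| Network | Input size | MACs |
|---|---|---|
| P-Net | 12x12 | 44.76K |
| P-Net* | 120x160 | 55x75x44.76K = 184.6M |
| R-Net | 24x24 | 1.531M |
| O-Net | 48x48 | 12.91M |

Ours

| Network | Input size | MACs |
|---|---|---|
| P-Net | 12x12 | 7.872K |
| P-Net* | 120x160 | 55x75x7.872K = 32.47M |
| R-Net | 24x24 | 319.3K |
| O-Net | 48x48 | 2.267M |

*: P-Net's input is taken to be a whole 120x160 image rather than a single 12x12 block; 55 × 75 is the number of 12x12 windows at stride 2, since (120 − 12)/2 + 1 = 55 and (160 − 12)/2 + 1 = 75.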
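The "Original MTCNN" MAC counts can be reproduced by counting convolution and fully connected multiply-accumulates layer by layer. The layer lists below (kernel sizes, channel widths, ceil-mode pooling) are assumptions taken from the public MTCNN reference model, not from these slides; only the face-classification (2) and box-regression (4) outputs are counted, and pooling and activations are ignored. Under these assumptions the totals match the table. The "Ours" column cannot be checked the same way because the kernel sizes of the depth-wise model are not listed.

```python
def conv(h, w, k, c_in, c_out):
    """k x k convolution, stride 1, no padding: returns (h_out, w_out, MACs)."""
    h_out, w_out = h - k + 1, w - k + 1
    return h_out, w_out, h_out * w_out * k * k * c_in * c_out

def pool(h, w, k, stride):
    """Ceil-mode max pooling (Caffe convention); contributes no MACs."""
    def out(n):
        return -(-(n - k) // stride) + 1
    return out(h), out(w)

def count_macs(size, layers, c_in=3, head_outputs=6):
    h = w = size
    c, total = c_in, 0
    for layer in layers:
        kind = layer[0]
        if kind == "conv":
            h, w, macs = conv(h, w, layer[1], c, layer[2])
            total, c = total + macs, layer[2]
        elif kind == "pool":
            h, w = pool(h, w, layer[1], layer[2])
        else:  # "fc": flatten the h x w x c volume into layer[1] outputs
            total += h * w * c * layer[1]
            h, w, c = 1, 1, layer[1]
    # output heads: 2 face/non-face scores + 4 box offsets (no landmark head)
    return total + h * w * c * head_outputs

# assumed layers: ("conv", kernel, out_channels) / ("pool", kernel, stride) / ("fc", outputs)
PNET = [("conv", 3, 10), ("pool", 2, 2), ("conv", 3, 16), ("conv", 3, 32)]
RNET = [("conv", 3, 28), ("pool", 3, 2), ("conv", 3, 48), ("pool", 3, 2),
        ("conv", 2, 64), ("fc", 128)]
ONET = [("conv", 3, 32), ("pool", 3, 2), ("conv", 3, 64), ("pool", 3, 2),
        ("conv", 3, 64), ("pool", 2, 2), ("conv", 2, 128), ("fc", 256)]

pnet = count_macs(12, PNET)
rnet = count_macs(24, RNET)
onet = count_macs(48, ONET)
windows = ((120 - 12) // 2 + 1) * ((160 - 12) // 2 + 1)  # 55 x 75 P-Net windows
print(pnet, rnet, onet, windows * pnet)  # 44760 1530768 12907392 184635000
```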
Quantization
Model size comparison
Original MTCNN

| Network | Data type | Model size (bytes) |
|---|---|---|
| P-Net | float32 | 26.04K |
| R-Net | float32 | 398.5K |
| O-Net | float32 | 1.542M |
| Total | | 1.966M |

Ours

| Network | Data type | Model size (bytes) |
|---|---|---|
| P-Net | int8 | 1.088K |
| R-Net | int8 | 137.4K |
| O-Net | int8 | 402.6K |
| Total | | 541.2K |
Quantization Result
• On the FDDB database:

| Model (word length) | Accuracy @ FPPI 0.1 |
|---|---|
| Original MTCNN | 92.40% |
| Ours (float32) | 88.20% |
| Ours (int8) | 88.15% |

• FPPI: false positives per image
Quantization Method
Step 3: Andes DSP
• Weight quantization
$$\text{shift} = 7 - \left\lceil \log_2 \max\left(|w_{min}|,\ |w_{max}|\right) \right\rceil$$
$$w_{shifted} = \operatorname{round}\left(w_{old} \times 2^{\text{shift}}\right)$$
$$w_{shifted} = 127 \ \text{where}\ w_{shifted} > 126, \qquad w_{shifted} = -128 \ \text{where}\ w_{shifted} < -127$$
$$w_{final} = w_{shifted} \div 2^{\text{shift}}$$
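A minimal sketch of the weight quantization rule. Rounding is to the nearest integer, which is what the worked example on the following slides requires (e.g. −0.24 × 2⁵ = −7.68 → −8 → −0.25); that choice and the helper names are assumptions. The sketch returns the de-quantized values (the final weights); an int8 kernel would presumably store the integers and the shift instead.

```python
import math

def shift_number(values):
    """7 - ceil(log2(max |v|)); assumes at least one non-zero value."""
    peak = max(abs(min(values)), abs(max(values)))
    return 7 - math.ceil(math.log2(peak))

def quantize(values, shift):
    """Scale by 2**shift, round, clamp to int8, then scale back."""
    scale = 2 ** shift
    result = []
    for v in values:
        q = round(v * scale)        # nearest integer (Python breaks ties to even)
        q = max(-128, min(127, q))  # same effect as the >126 / <-127 clamping rules
        result.append(q / scale)
    return result

weights = [-0.5, 0.1, 0.9]
s = shift_number(weights)
print(s, quantize(weights, s))  # 7 [-0.5, 0.1015625, 0.8984375]
```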
• Layer output quantization
$$\text{shift} = 7 - \left\lceil \log_2 \max\left(|o_{min}|,\ |o_{max}|\right) \right\rceil$$
• Starting from this shift, successively larger shifts are tried:
    while (shift_start):
        q = round(output × 2^shift)
        q[q > 126] = 127
        q[q < −127] = −128
        final_output = q ÷ 2^shift
        shift += 1
Example
• output = [−4, −0.24, −0.20, …, 0.19, 0.23, 4]
• shift = 7 − log2(4) = 5
• Original quantization (shift 5) = [−4, −0.25, −0.1875, …, 0.1875, 0.21875, 3.96875]
• Corrected quantization (shift 6) = [−2, −0.234375, −0.203125, …, 0.1875, 0.234375, 1.984375]
→ More precise for the small-magnitude values, at the cost of clipping the outliers ±4 to −2 and 1.984375
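The example can be reproduced with round-to-nearest and int8 clamping. The slides do not state how the final shift is chosen (shift 6 is only called "more precise"), so this sketch just generates the candidate shifts and their quantized outputs; picking one, for instance by validation accuracy, is left open. `candidate_shifts` and `extra` are hypothetical names, and only the visible entries of the example vector are used.

```python
import math

def quantize(values, shift):
    # scale by 2**shift, round to nearest, clamp to int8, scale back
    scale = 2 ** shift
    return [max(-128, min(127, round(v * scale))) / scale for v in values]

def candidate_shifts(values, extra=1):
    """Starting shift from the slide's formula plus `extra` larger shifts."""
    peak = max(abs(min(values)), abs(max(values)))
    start = 7 - math.ceil(math.log2(peak))
    return [start + i for i in range(extra + 1)]

# visible entries of the example output (the elided middle is omitted)
output = [-4, -0.24, -0.20, 0.19, 0.23, 4]
for s in candidate_shifts(output):
    print(s, quantize(output, s))
# 5 [-4.0, -0.25, -0.1875, 0.1875, 0.21875, 3.96875]
# 6 [-2.0, -0.234375, -0.203125, 0.1875, 0.234375, 1.984375]
```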
Speed-up each step
• On the FDDB database (FPPI: false positives per image):

| Method | Accuracy @ FPPI 1.0 | Speedup @ Andes RISC-V processor |
|---|---|---|
| Original MTCNN | 94.66% | - |
| Ours | 90.68% | 106x |

| Method | Accuracy @ FPPI 0.1 | Accuracy @ FPPI 0.01 | FPS (Titan X GPU / 1080 Ti) |
|---|---|---|---|
| Brodmann17 | 89.25% | 81.88% | 200 / 90 |
| DeepIR | 88.45% | 82.16% | <=1 |
| Xiaomi | 87.82% | 77.99% | 2? |
| Faceness | 86.04% | 79.67% | 1 |
| Hyperface | 85.63% | 80.68% | 0.33 |
| DP2MFD | 85.57% | 76.73% | <0.05 |
| MTCNN | 92.40% | 84.95% | 51 |
| Ours | 88.15% | 82.59% | 54 |
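| Step | Baseline | Sim#1 | Fast soft-max | DSP-Sim#1 | DSP-Sim#2 |
|---|---|---|---|---|---|
| Overall time (s) | 294.0129 | 99.81 | 53.69 | 3.88 | 2.78 |
| Overall speedup | - | 2.95 | 1.86 | 13.84 | 1.397 |
| FPS | 0.0034 | 0.01002 | 0.01863 | 0.25776 | 0.3601 |
| P-Net overall time (s) | 97.25 | 77.2 | 31.2 | 1.54 | 1.18 |
| P-Net overall speedup | - | 1.26 | 2.47 | 20.30 | 1.30 |
| R-Net overall time (s) | 59.08 | 6.158 | 6.028 | 0.989 | 0.628 |
| R-Net trigger times (calls) | 46 | 22 | 22 | 32 | 29 |
| R-Net normalized time (s per call) | 1.28 | 0.28 | 0.274 | 0.0309 | 0.022 |
| R-Net normalized speedup | - | 4.59 | 1.02 | 8.87 | 1.43 |
| O-Net overall time (s) | 132.19 | 15.034 | 15.004 | 1.35 | 0.96 |
| O-Net trigger times (calls) | 14 | 9 | 9 | 8 | 9 |
| O-Net normalized time (s per call) | 9.44 | 1.67 | 1.67 | 0.17 | 0.107 |
| O-Net normalized speedup | - | 5.65 | 1.002 | 9.9 | 1.57 |

• Each speedup is relative to the preceding column; the cumulative overall speedup is 294.01 s / 2.78 s ≈ 106x, and FPS = 1 / overall time.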
| Step | Baseline | Sim#1 | Fast soft-max | DSP-Sim#1 | DSP-Sim#2 |
|---|---|---|---|---|---|
| Overall time (s) | 294.0129 | 99.8111858 | 53.687959 | 3.879538 | 2.777296 |
| Overall speedup | - | 2.94569088 | 1.8590982 | 13.8388 | 1.396875954 |
| FPS | 0.003401210740668638 | 0.010018917134550104 | 0.018626150268485554 | 0.25776 | 0.360062449 |
| P-Net overall time (s) | 97.248312473297119 | 77.170741379261017 | 31.195177435874939 | 1.536423 | 1.180413 |
| P-Net overall speedup | - | 1.26017077 | 2.4738036 | 20.3038 | 1.301597831 |
| R-Net overall time (s) | 59.077883005142212 | 6.1582962274551392 | 6.0284666419029236 | 0.988531 | 0.627762 |
| R-Net trigger times (calls) | 46 | 22 | 22 | 32 | 29 |
| R-Net normalized time (s per call) | 1.284302 | 0.27992256 | 0.2740212 | 0.03089 | 0.021646966 |
| R-Net normalized speedup | - | 4.58806178 | 1.0215361 | 8.87087 | 1.426989815 |
| O-Net overall time (s) | 132.18732833862305 | 15.033685207366943 | 15.003592789173126 | 1.345193 | 0.961341 |
| O-Net trigger times (calls) | 14 | 9 | 9 | 8 | 9 |
| O-Net normalized time (s per call) | 9.441952 | 1.67040947 | 1.6670659 | 0.16815 | 0.106815667 |
| O-Net normalized speedup | - | 5.65247753 | 1.0020057 | 9.91416 | 1.574207274 |
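The derived rows of the table follow from the measured times: FPS is 1 / overall time, each speedup is the previous column's time divided by the current one, and the normalized rows are total time divided by the number of calls. The sketch below recomputes them from the full-precision values; the variable names are illustrative.

```python
# overall time per frame (s) for each step, from the full-precision table
steps = ["Baseline", "Sim#1", "Fast soft-max", "DSP-Sim#1", "DSP-Sim#2"]
overall = [294.0129, 99.8111858, 53.687959, 3.879538, 2.777296]

fps = [1.0 / t for t in overall]
stepwise = [prev / cur for prev, cur in zip(overall, overall[1:])]
cumulative = overall[0] / overall[-1]

for name, f in zip(steps, fps):
    print(f"{name:>14}: {f:.5f} FPS")
print("step-wise speedups:", [round(s, 3) for s in stepwise])
print(f"cumulative speedup: {cumulative:.1f}x")  # 105.9x

# normalized R-Net cost = total R-Net time / number of R-Net calls
rnet_time = [59.077883005142212, 6.1582962274551392, 6.0284666419029236, 0.988531, 0.627762]
rnet_calls = [46, 22, 22, 32, 29]
rnet_per_call = [t / n for t, n in zip(rnet_time, rnet_calls)]
print("R-Net s/call:", [round(x, 4) for x in rnet_per_call])
```

The cumulative result, about 105.9x, is the 106x speedup quoted in the experiment results.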