Jingrong Wang†, Kaiyang Liu†∗, George Tzanetakis†, Jianping Pan†
†Department of Computer Science, University of Victoria, Victoria, Canada
∗School of Information Science and Engineering, Central South University, Changsha, China
Email: {jingrongwang, liukaiyang, pan}@uvic.ca, [email protected]
Learning-based Cooperative Sound Event Detection with Edge Computing
Problem and Motivation
• Gunshot violence increasing…
– 6,000+ reported last year in the US, but 80% more unreported
– Slow response time: about 10 minutes after an incoming 911 call
– Lives and evidence lost
• New services, e.g., ShotSpotter
– Sensors installed in certain places
– Audio clips sent to the cloud for identification
– 90% identified in about 1 minute
– Cost and scalability problem
How to identify a sound event?
• First, extract the audio features
“Short-Time Fourier Transform” + “Log-Mel Spectrogram”
[Figure: (1) Spectrogram (2) Log-scale mel spectrogram]
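The feature extraction step can be sketched with NumPy alone. This is a minimal illustration, not the authors' pipeline: the frame length (512), hop (256), number of mel bands (64), the HTK-style mel formula, and all function names are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (one common convention, assumed here)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def stft_power(signal, n_fft=512, hop=256):
    """Short-time Fourier transform power, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    hz_edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, ctr, hi = hz_edges[m], hz_edges[m + 1], hz_edges[m + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=64, eps=1e-10):
    """Log-scale mel spectrogram, shape (n_frames, n_mels)."""
    power = stft_power(signal, n_fft, hop)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + eps)  # eps avoids log(0) on silent frames

# Example: one second of a 1 kHz tone sampled at 16 kHz
sr = 16000
tone = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)
features = log_mel_spectrogram(tone, sr)
print(features.shape)  # (61, 64)
```

Each row of `features` is one frame-level feature vector of the kind a downstream classifier would consume.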
Then classification based on extracted features
• E.g., Deep Bag-of-Frames (DBoF) learning-based approach [1]
– Randomly choose 128 batches as audio features
– Frame-level Features → Up-projection Layer → Pooling → Classifier
– NVIDIA GTX 970 4GB: “4h+” + “300MB+”
[1] S. Abu-El-Haija, N. Kothari, J. Lee, P. Natsev, G. Toderici, B. Varadarajan, and S. Vijayanarasimhan, “YouTube-8M: A large-scale video classification benchmark,” arXiv preprint arXiv:1609.08675, 2016.
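The DBoF data flow (sample frames, up-project, pool, classify) can be sketched as a single forward pass. This is an illustration under assumptions, not the trained model from [1]: the layer sizes, ReLU activation, max pooling, per-class sigmoid, and random untrained weights are all placeholders.

```python
import numpy as np

def dbof_forward(frames, w_up, b_up, w_cls, b_cls, rng, n_samples=128):
    """One forward pass of a DBoF-style classifier on a clip.

    frames: (n_frames, d_in) frame-level features, e.g. log-mel vectors.
    Returns per-class scores in (0, 1).
    """
    # Randomly sample frames (with replacement only if the clip is short)
    idx = rng.choice(len(frames), size=n_samples, replace=len(frames) < n_samples)
    sampled = frames[idx]                               # (n_samples, d_in)
    projected = np.maximum(sampled @ w_up + b_up, 0.0)  # up-projection + ReLU
    pooled = projected.max(axis=0)                      # max pooling over frames
    logits = pooled @ w_cls + b_cls                     # linear classifier
    return 1.0 / (1.0 + np.exp(-logits))                # per-class sigmoid

# Hypothetical sizes: 64-dim features, 1024-dim projection, 10 classes
rng = np.random.default_rng(0)
frames = rng.normal(size=(300, 64))
w_up, b_up = 0.01 * rng.normal(size=(64, 1024)), np.zeros(1024)
w_cls, b_cls = 0.01 * rng.normal(size=(1024, 10)), np.zeros(10)
scores = dbof_forward(frames, w_up, b_up, w_cls, b_cls, rng)
print(scores.shape)  # (10,)
```

Pooling over sampled frames discards their order, which is what makes the representation a "bag" of frames.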
Challenges
• Delay-sensitive + computation-intensive
– Front-end devices → limited computation capabilities [2]
– Cloud → high communication latencies [3]
– Communication among devices, or through an access point
• Edge computing
– Enhances and extends the cloud services at the edge of the network
– Deploys computation capacity closer to where the data is captured
– Breakdown between devices, edge and cloud?
[2] X. Ran, H. Chen, X. Zhu, Z. Liu, and J. Chen, “DeepDecision: A mobile deep learning framework for edge video analytics,” in Proc. of IEEE INFOCOM, 2018.
[3] K. Hong, D. Lillethun, U. Ramachandran, B. Ottenwälder, and B. Koldehofe, “Mobile fog: A programming model for large-scale applications on the internet of things,” in Proc. of ACM SIGCOMM workshop on Mobile cloud computing, 2013, pp. 15–20.
Edge computing system setup
• Front-end acoustic devices
– Slow local execution
• Edge server
– Wireless comm. overhead
• Cloud server
– Backbone congestion
Why multiple acoustic sensors?
[Figure: probability of gunshot vs. distance (m), 100 to 600 m]
• Localization by triangulation
• Classification accuracy is affected by:
– Training data (Google AudioSet)
– Learning algorithm (DBoF)
– Distance
• Near field
• Reverberant field
• Joint localization and classification needed
Localization
• Least-squares formulation
– Time difference of arrival (TDOA)
– Minimize the quadratic difference between the predicted and the actual value
• Deadzone
– Hyperbolas + measurement noise
[Figure labels: Deadzone, End devices]
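A least-squares TDOA fit can be sketched as follows. This is one possible solver, not necessarily the one used in the talk: it assumes 2-D positions, a speed of sound of 343 m/s, sensor 0 as the timing reference, and a coarse grid search followed by Gauss-Newton refinement. All names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def range_differences(pos, sensors):
    """Distance to each sensor minus distance to sensor 0, in metres."""
    d = np.linalg.norm(sensors - pos, axis=1)
    return d[1:] - d[0]

def locate_tdoa(sensors, tdoa, area=500.0, step=10.0, n_iter=20):
    """Estimate the source position from TDOAs measured relative to sensor 0."""
    measured = SPEED_OF_SOUND * np.asarray(tdoa)
    # Coarse grid search: pick the grid point with the smallest squared error
    ticks = np.arange(0.0, area + step, step)
    best, best_cost = None, np.inf
    for x in ticks:
        for y in ticks:
            p = np.array([x, y])
            cost = np.sum((range_differences(p, sensors) - measured) ** 2)
            if cost < best_cost:
                best, best_cost = p, cost
    # Gauss-Newton refinement; the small nudge keeps the start off any sensor
    pos = best + 1e-3
    for _ in range(n_iter):
        diff = pos - sensors
        unit = diff / np.linalg.norm(diff, axis=1, keepdims=True)
        jac = unit[1:] - unit[0]  # Jacobian of the range differences
        res = range_differences(pos, sensors) - measured
        delta, *_ = np.linalg.lstsq(jac, -res, rcond=None)
        pos = pos + delta
    return pos

# Four end devices at the corners of a 500 m x 500 m area, noiseless TDOAs
sensors = np.array([[0.0, 0.0], [500.0, 0.0], [0.0, 500.0], [500.0, 500.0]])
source = np.array([123.0, 338.0])
tdoa = range_differences(source, sensors) / SPEED_OF_SOUND
print(locate_tdoa(sensors, tdoa))
```

Each TDOA constrains the source to a hyperbola. With measurement noise the hyperbolas no longer meet at one point, which is why a least-squares fit is used.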
Aggregated classifier
• Merge multiple learners to obtain a more accurate prediction than any individual learner alone
– Ensemble learning → Majority vote
[Figure labels: High confidence, Majority vote]
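One way to combine per-device predictions is a confidence-filtered majority vote. The sketch below is an assumed interpretation of the "high confidence" and "majority vote" labels: the threshold value, the fallback rule, and the function name are hypothetical, not taken from the slides.

```python
from collections import Counter

def aggregate(predictions, threshold=0.3):
    """Majority vote over (label, confidence) pairs reported by devices.

    Reports below the confidence threshold are ignored. If no report
    passes the threshold, all reports vote (assumed fallback rule).
    """
    if not predictions:
        raise ValueError("no predictions to aggregate")
    confident = [label for label, conf in predictions if conf >= threshold]
    if not confident:
        confident = [label for label, _ in predictions]
    return Counter(confident).most_common(1)[0][0]

# Four devices report on the same sound event
reports = [("gunshot", 0.62), ("gunshot", 0.41), ("fireworks", 0.35), ("speech", 0.08)]
print(aggregate(reports))  # gunshot
```

Filtering before voting keeps distant devices with weak, unreliable predictions from outvoting nearby devices.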
Performance evaluation: Scenario and metrics
• Metrics
– Response time (RT)
– Classification accuracy (CA)
– Localization error (LE)
– Dead zone ratio (DZ)
[Figures: deployment area with X and Y from 0 to 500 m; legend: End devices, Edge server]
Fig. 1 Grid deployment
Fig. 2 Random deployment
Tab. 1 System parameters
Performance evaluation – Response time
Fig. 3 Response time (s) for Local, Edge, and Cloud: vs. # of devices (4, 9, 16, 25, 36) and vs. computation capability ratio (4 to 12)
Performance evaluation – Classification accuracy
Fig. 4 Classification accuracy, w/o EC vs. with EC: vs. # of devices (4, 9, 16, 25, 36) and vs. threshold (0.1 to 0.5)
Performance evaluation – Localization & random deployment
Fig. 5 Localization performance: avg. localization error (m) and deadzone vs. # of devices (4, 9, 16, 25, 36)
Tab. 2 Impact of deployment: RT, CA, LE, and DZ values for Grid vs. Random
Conclusion and future work
• Edge-assisted sound event detection framework
– Computation capacity at the edge of the network
• Ensemble-based cooperative processing
– Aggregates information for a more accurate result
• Future work
– Realistic sound propagation model + complex acoustic scenario
– Distance-weighted differentiation
Thanks!
Q&A
Wireless communication model
• Path loss model
$PL_n = PL(d_0) + 10\theta \log_{10}\frac{d_n}{d_0}$
– $d_n$ (in m) $> d_0$ is the distance between the base station and device $n$
– $\theta$ is the path loss exponent
– $d_0$ is the reference distance for the antenna far-field propagation effect
• Received signal strength
$P_n = P^{tx} - PL_n - X_{\sigma_1}$
– $P^{tx}$ (in dBm) is the transmitted power of device $n$
– $X_{\sigma_1}$ denotes the shadowing fading (in dB) subject to the Gaussian distribution with zero mean and standard deviation $\sigma_1$
• Maximum uplink transmission rate
$r_n^{tx} = W \log_2\left(1 + \frac{10^{P_n/10}}{I_n + N_0}\right)$
– $W$ is the channel bandwidth, $N_0$ (in mW) is the noise power
– $I_n$ (in mW) is the interference signal from other devices
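The three formulas chain together: distance gives path loss, path loss gives received power, and received power gives the achievable uplink rate. A minimal sketch follows. The numeric parameter values (reference loss of 40 dB at 1 m, path loss exponent of 3, 1 MHz bandwidth) are illustrative assumptions, not the system parameters of the evaluation, and the function names are hypothetical.

```python
import math

def path_loss_db(d_n, d_0=1.0, pl_d0=40.0, theta=3.0):
    """Log-distance path loss in dB at distance d_n (m), valid for d_n > d_0."""
    return pl_d0 + 10.0 * theta * math.log10(d_n / d_0)

def received_power_dbm(p_tx_dbm, pl_db, shadowing_db=0.0):
    """Received signal strength: transmit power minus path loss and shadowing."""
    return p_tx_dbm - pl_db - shadowing_db

def uplink_rate_bps(p_n_dbm, bandwidth_hz, interference_mw, noise_mw):
    """Maximum uplink rate from the Shannon capacity formula."""
    p_n_mw = 10.0 ** (p_n_dbm / 10.0)  # dBm to mW
    return bandwidth_hz * math.log2(1.0 + p_n_mw / (interference_mw + noise_mw))

# Device 100 m from the base station, 20 dBm transmit power, no shadowing
pl = path_loss_db(100.0)              # 40 + 10 * 3 * log10(100) = 100 dB
p_n = received_power_dbm(20.0, pl)    # 20 - 100 = -80 dBm
rate = uplink_rate_bps(p_n, bandwidth_hz=1e6, interference_mw=0.0, noise_mw=1e-8)
print(pl, p_n, rate)
```

To include shadowing, a sample such as `random.gauss(0.0, sigma)` can be passed as `shadowing_db`.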