Learning-based Cooperative Sound Event Detection with Edge ...jrwang/file/IPCCC_slides.pdf · Jingrong Wang†, Kaiyang Liu†∗, George Tzanetakis†, Jianping Pan† †Department

Jingrong Wang†, Kaiyang Liu†∗, George Tzanetakis†, Jianping Pan†

†Department of Computer Science, University of Victoria, Victoria, Canada

∗School of Information Science and Engineering, Central South University, Changsha, China

Email: {jingrongwang, liukaiyang, pan}@uvic.ca, [email protected]

Learning-based Cooperative Sound Event Detection with Edge Computing


Problem and Motivation

• Gunshot violence increasing…

– 6,000+ reported last year in the US, but 80% more unreported

– Slow response time: about 10 minutes after an incoming 911 call

– Lives and evidence lost

• New services, e.g., ShotSpotter

– Sensors installed in certain places

– Audio clips sent to the cloud for identification

– 90% identified in about 1 minute

– Cost and scalability problems


[Figure panels: (1) Spectrogram; (2) Log-scale mel spectrogram]

How to identify a sound event?

•First, extract the audio features

“Short-Time Fourier Transform” + “Log-Mel Spectrogram”
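The feature-extraction step can be sketched in a few lines of NumPy. This is a minimal illustration rather than the pipeline used in the talk: the function names, the Hann window, the frame length (`n_fft=1024`), the hop size (`hop=512`), the number of mel bands (`n_mels=64`), and the common `2595 * log10(1 + f / 700)` mel formula are all assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (assumed; the slides do not name a variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i:i + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr, n_fft=1024, hop=512, n_mels=64):
    # Short-time Fourier transform: windowed frames followed by a real FFT.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Project the power spectrum onto mel bands, then compress with a log.
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)
```

Each row of the returned array is one frame-level feature vector; a one-second clip sampled at 16 kHz yields 30 frames of 64 mel bands with these defaults.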


Then classification based on extracted features

[1] S. Abu-El-Haija, N. Kothari, J. Lee, P. Natsev, G. Toderici, B. Varadarajan, and S. Vijayanarasimhan, “YouTube-8M: A large-scale video classification benchmark,” arXiv preprint arXiv:1609.08675, 2016.

[Diagram: Frame-level Features → Up-projection Layer → Pooling → Classifier]

Randomly choose 128 batches as audio features

E.g., Deep Bag-of-Frames learning-based approach [1]

NVIDIA GTX 970 4GB: “4h+” + “300MB+”
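The data flow of a Deep Bag-of-Frames style classifier can be sketched as a single forward pass. This is an illustrative sketch only, not the trained model from [1] or from this talk: the function name `dbof_forward`, the ReLU activation, max pooling, the sigmoid output layer, and the interpretation of the sampling step as drawing 128 frames are all assumptions, and the weights are expected to come from training.

```python
import numpy as np

def dbof_forward(frames, w_up, w_cls, n_sample=128, rng=None):
    """Hypothetical DBoF-style forward pass.

    frames: (n_frames, n_features) frame-level features
    w_up:   (n_features, n_hidden) up-projection weights
    w_cls:  (n_hidden, n_classes) classifier weights
    """
    rng = np.random.default_rng() if rng is None else rng
    # Randomly sample a fixed number of frames (with replacement only if
    # the clip has fewer frames than requested).
    idx = rng.choice(len(frames), size=n_sample, replace=len(frames) < n_sample)
    sampled = frames[idx]
    # Up-projection into a higher-dimensional space (ReLU assumed).
    projected = np.maximum(0.0, sampled @ w_up)
    # Pool over the sampled frames (max pooling assumed).
    pooled = projected.max(axis=0)
    # Per-class scores squashed to (0, 1) (sigmoid classifier assumed).
    logits = pooled @ w_cls
    return 1.0 / (1.0 + np.exp(-logits))
```

Because pooling discards frame order, the output depends only on which frames were sampled, which is what makes the approach a "bag" of frames.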


Challenges

• Delay-sensitive + computation-intensive

– Front-end devices → limited computation capabilities [2]

– Cloud → high communication latencies [3]

– Communication among devices, or through an access point

• Edge computing

– Enhances and extends the cloud services at the edge of the network

– Deploys computation capacity closer to where the data is captured

– How to divide the work among devices, edge, and cloud?

[2] X. Ran, H. Chen, X. Zhu, Z. Liu, and J. Chen, “DeepDecision: A mobile deep learning framework for edge video analytics,” in Proc. of IEEE INFOCOM, 2018.

[3] K. Hong, D. Lillethun, U. Ramachandran, B. Ottenwälder, and B. Koldehofe, “Mobile fog: A programming model for large-scale applications on the internet of things,” in Proc. of ACM SIGCOMM workshop on Mobile cloud computing, 2013, pp. 15–20.


Edge computing system setup

• Front-end acoustic devices

– Slow local execution

• Edge server

– Wireless comm. overhead

• Cloud server

– Backbone congestion
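The trade-off among the three tiers can be illustrated with a simple additive delay model. This is a sketch under stated assumptions, not the model from the talk: the function names, the idea of measuring work in CPU cycles, and every parameter (`cycles`, `device_hz`, `data_bits`, `uplink_bps`, `backbone_delay_s`) are hypothetical.

```python
def local_response_time(cycles, device_hz):
    # Local execution: no transmission, but a slow processor.
    return cycles / device_hz

def edge_response_time(data_bits, uplink_bps, cycles, edge_hz):
    # Edge execution: wireless upload plus faster computation.
    return data_bits / uplink_bps + cycles / edge_hz

def cloud_response_time(data_bits, uplink_bps, backbone_delay_s, cycles, cloud_hz):
    # Cloud execution: wireless upload, backbone delay, fastest computation.
    return data_bits / uplink_bps + backbone_delay_s + cycles / cloud_hz
```

Under this model the edge wins whenever its upload time is smaller than the computation time it saves relative to the device, and the cloud wins only if its extra speed outweighs the backbone delay.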


Why multiple acoustic sensors?

[Figure: probability of gunshot vs. distance (m), 100 to 600 m]

• Localization by triangulation

• Classification accuracy is affected by:

– Training data (Google AudioSet)

– Learning algorithm (DBoF)

– Distance

• Near field

• Reverberant field

• Joint localization and classification needed


Localization

• Least-squares formulation

– Time difference of arrival (TDOA)

– Minimize the squared difference between the predicted and the measured values

• Dead zone

– Hyperbolas + measurement noise

[Figure legend: dead zone, end devices]
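A least-squares TDOA fit can be sketched with a Gauss-Newton iteration in NumPy. This is a minimal 2-D illustration, not the solver used in the talk: the function name `locate_tdoa`, the choice of the first sensor as the time reference, the speed of sound (343 m/s), and the initial guess are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed constant

def locate_tdoa(sensors, tdoa, init, iters=50, tol=1e-9):
    """Estimate a 2-D source position from TDOA measurements.

    sensors: (n, 2) sensor coordinates in metres
    tdoa:    (n - 1,) arrival-time differences relative to sensor 0, in seconds
    init:    initial position guess (must not coincide with a sensor)
    """
    sensors = np.asarray(sensors, dtype=float)
    range_diff = SPEED_OF_SOUND * np.asarray(tdoa, dtype=float)
    pos = np.asarray(init, dtype=float)
    for _ in range(iters):
        diff = pos - sensors
        dist = np.linalg.norm(diff, axis=1)
        unit = diff / dist[:, None]
        # Residual: predicted minus measured range differences.
        residual = (dist[1:] - dist[0]) - range_diff
        # Jacobian of the predicted range differences w.r.t. the position.
        jacobian = unit[1:] - unit[0]
        # Gauss-Newton step: linear least-squares solve for the update.
        step, *_ = np.linalg.lstsq(jacobian, -residual, rcond=None)
        pos = pos + step
        if np.linalg.norm(step) < tol:
            break
    return pos
```

Each TDOA constrains the source to one hyperbola; with noisy measurements the hyperbolas no longer meet in a single point, and the least-squares fit returns the position that best reconciles them.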


• Merge multiple learners to obtain a more accurate prediction than any individual learner alone

– Ensemble learning → Majority vote

Aggregated classifier

High confidence Majority vote
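A confidence-filtered majority vote can be sketched as follows. The function name `aggregate`, the `(label, confidence)` input format, and the default threshold of 0.3 are hypothetical; treating "high confidence" as a simple threshold on each device's score is an assumption drawn from the diagram labels, not a confirmed detail of the method.

```python
from collections import Counter

def aggregate(predictions, threshold=0.3):
    """Majority vote over per-device predictions.

    predictions: iterable of (label, confidence) pairs, one per device
    threshold:   minimum confidence for a prediction to take part in the vote
    Returns the winning label, or None if no prediction is confident enough.
    """
    confident = [label for label, conf in predictions if conf >= threshold]
    if not confident:
        return None
    # most_common breaks ties in favour of the label seen first.
    return Counter(confident).most_common(1)[0][0]
```

For example, if three devices report `("gunshot", 0.9)`, `("firework", 0.2)`, and `("gunshot", 0.6)`, the low-confidence firework vote is dropped and the aggregated label is `"gunshot"`.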


Performance evaluation: Scenario and metrics

• Metrics

– Response time (RT)

– Classification accuracy (CA)

– Localization error (LE)

– Dead zone ratio (DZ)

[Fig. 1 Grid deployment; Fig. 2 Random deployment: end devices and edge server, X (m) and Y (m) from 0 to 500]

[Tab. 1 System parameters]
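The two deployment scenarios can be generated with a short NumPy sketch. The function names, the placement of grid devices at cell centres, and the uniform distribution for the random case are assumptions; only the 500 m × 500 m area and the two deployment types come from the figures.

```python
import numpy as np

def grid_deployment(n_devices, side=500.0):
    # Place devices at the centres of a k x k grid (n_devices must be k * k).
    k = int(round(np.sqrt(n_devices)))
    if k * k != n_devices:
        raise ValueError("n_devices must be a perfect square")
    centers = (np.arange(k) + 0.5) * side / k
    xs, ys = np.meshgrid(centers, centers)
    return np.column_stack([xs.ravel(), ys.ravel()])

def random_deployment(n_devices, side=500.0, seed=None):
    # Draw device positions uniformly at random over the square area.
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, side, size=(n_devices, 2))
```

Both functions return an `(n_devices, 2)` array of coordinates in metres, so the same evaluation code can run on either layout.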


Performance evaluation – Response time

[Fig. 3 Response time (s) for Local, Edge, and Cloud: (left) vs. # of devices (4, 9, 16, 25, 36); (right) vs. computation capability ratio (4 to 12)]


Performance evaluation – Classification accuracy

[Fig. 4 Classification accuracy, w/o EC vs. with EC: (left) vs. # of devices (4, 9, 16, 25, 36); (right) vs. threshold (0.1 to 0.5)]


Performance evaluation – Localization & random deployment

[Fig. 5 Localization performance: avg. localization error (m) and deadzone vs. # of devices (4, 9, 16, 25, 36)]

[Tab. 2 Impact of deployment: RT, CA, LE, and DZ values for Grid vs. Random]


Conclusion and future work

• Edge-assisted sound event detection framework

– Computation capacity at the edge of the network

• Ensemble-based cooperative processing

– Aggregates information for a more accurate result

• Future work

– Realistic sound propagation model + complex acoustic scenario

– Distance-weighted differentiation


Thanks!

Q&A


Wireless communication model

• Path loss model

$$PL_n = PL(d_0) + 10\,\theta \log_{10}\frac{d_n}{d_0}$$

– $d_n$ (in m), with $d_n > d_0$, is the distance between the base station and device $n$

– $\theta$ is the path loss exponent

– $d_0$ is the reference distance for the antenna far-field propagation effect

• Received signal strength

$$P_n = P^{tx} - PL_n - X_{\sigma}$$

– $P^{tx}$ (in dBm) is the transmitted power of device $n$

– $X_{\sigma}$ denotes the shadow fading (in dB), which follows a Gaussian distribution with zero mean and standard deviation $\sigma$

• Maximum uplink transmission rate

$$r_n^{tx} = W \log_2\left(1 + \frac{10^{P_n/10}}{I_n + N_0}\right)$$

– $W$ is the channel bandwidth, $N_0$ (in mW) is the noise power

– $I_n$ (in mW) is the interference signal from other devices
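The three steps of the wireless model (path loss, received signal strength, uplink rate) can be chained in a short sketch. The function and parameter names are hypothetical, the path loss is assumed to use a base-10 logarithm as in the usual log-distance model, and the shadow fading term is passed in as a plain number so the caller decides how to sample it.

```python
import math

def path_loss_db(d_n, d_0, pl_d0_db, theta):
    # Log-distance path loss (dB) at distance d_n, relative to reference d_0.
    return pl_d0_db + 10.0 * theta * math.log10(d_n / d_0)

def received_power_dbm(p_tx_dbm, pl_db, shadowing_db=0.0):
    # Received signal strength (dBm): transmit power minus path loss and shadowing.
    return p_tx_dbm - pl_db - shadowing_db

def uplink_rate_bps(w_hz, p_rx_dbm, interference_mw, noise_mw):
    # Shannon-style rate; received power is converted from dBm to mW first.
    p_rx_mw = 10.0 ** (p_rx_dbm / 10.0)
    return w_hz * math.log2(1.0 + p_rx_mw / (interference_mw + noise_mw))
```

As a worked case with illustrative numbers: a reference loss of 40 dB at 1 m and a path loss exponent of 3 give 100 dB of loss at 100 m, so a 20 dBm transmitter is received at -80 dBm, which is 1e-8 mW.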
