Sound shredding moustafa

  1. Sound Shredding: Privacy Preserved Audio Sensing. Presenter: Moustafa Alzantot (UCLA). Authors: Sumeet Kumar, et al., Carnegie Mellon University.
  2. Introduction: Sound sensing can be very useful for context awareness, for example to identify user location and activities. It also poses potential risks to user privacy: speech recognition and speaker identification. How can user privacy be preserved without compromising context-awareness accuracy?
  3. Research Question: This paper presents two approaches for preserving user privacy without significantly decreasing context recognition accuracy or spending much battery on encryption/decryption: sound shredding and sound subsampling.
  4. Methodology: Activity context is the place where the activity takes place (e.g. a restaurant for dining). Context identification process: (a) Audio data collection: 35 sounds collected at 8 kHz using a Nexus 4 phone. (b) Feature extraction: sliding window frames (40 ms window, 50% overlap), with 12 MFCC features for every window. (c) Context recognition: experiments using both simple KNN and SVM.
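The framing step on slide 4 can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `frame_signal` is hypothetical, the input is a synthetic ramp rather than recorded audio, and the per-frame MFCC computation (which would normally come from an audio library) is left out. Only the 8 kHz rate, the 40 ms window, and the 50% overlap come from the slide.

```python
import numpy as np

SAMPLE_RATE_HZ = 8000  # sampling rate stated on the slide
FRAME_MS = 40          # window length stated on the slide
OVERLAP = 0.5          # 50% overlap stated on the slide


def frame_signal(signal, sample_rate=SAMPLE_RATE_HZ, frame_ms=FRAME_MS, overlap=OVERLAP):
    """Split a 1-D signal into overlapping frames (hypothetical helper)."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)  # 320 samples at 8 kHz
    hop = int(frame_len * (1 - overlap))            # 160 samples at 50% overlap
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds samples [i * hop, i * hop + frame_len)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


# Synthetic one-second ramp standing in for recorded audio (assumption)
one_second = np.arange(SAMPLE_RATE_HZ, dtype=float)
frames = frame_signal(one_second)
print(frames.shape)
```

Each row of `frames` would then be passed to an MFCC routine to obtain the 12 coefficients per window mentioned on the slide.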
  5. Methodology: Sound Subsampling: collect only part of the raw data. With 50% subsampling, one frame is discarded after every frame that is stored. Subsampling results in a slight drop in context recognition accuracy.
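The 50% subsampling rule on slide 5 (store one frame, discard the next) amounts to keeping every second frame. The sketch below assumes the frames are already arranged as rows of an array; `subsample_frames` and its `keep_every` parameter are hypothetical names used only for illustration.

```python
import numpy as np


def subsample_frames(frames, keep_every=2):
    """Keep one frame out of every `keep_every` (hypothetical helper).

    keep_every=2 corresponds to the 50% subsampling described on the slide.
    """
    frames = np.asarray(frames)
    return frames[::keep_every]


# Toy input: 5 frames of 2 samples each (assumption, not real audio)
frames = np.arange(10).reshape(5, 2)
kept = subsample_frames(frames)
print(kept)
```

The discarded frames are never stored, so they cannot be recovered later from the retained data.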
  6. Methodology: Sound Shredding: randomize the order of the audio frames within a sound snippet.
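Shredding as described on slide 6 can be sketched as a random permutation of the frame order. This is an illustrative sketch under assumptions: `shred_frames` is a hypothetical helper, the toy frames are not real audio, and the permutation is returned only so the example can be checked. A privacy-preserving pipeline would presumably discard it.

```python
import numpy as np


def shred_frames(frames, seed=None):
    """Return the frames in a random order (hypothetical helper).

    The permutation is returned for inspection only; keeping it would
    allow the original order to be restored.
    """
    rng = np.random.default_rng(seed)
    frames = np.asarray(frames)
    order = rng.permutation(len(frames))
    return frames[order], order


# Toy input: 6 frames of 3 samples each (assumption, not real audio)
frames = np.arange(18).reshape(6, 3)
shredded, order = shred_frames(frames, seed=0)
print(order)
```

Because every frame is kept intact, per-frame features such as MFCCs are unchanged; only the temporal order that speech understanding relies on is lost.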
  7. Results: Context Recognition Accuracy. 35 sound samples were collected in different contexts (faculty meeting, restaurant, walking, coffee shop); 80% of the data was used for training and 20% for testing. Context recognition accuracy drops slightly.
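The 80/20 evaluation with a simple KNN classifier mentioned on slides 4 and 7 can be sketched as below. Everything about the data is an assumption: the 12-dimensional vectors are synthetic stand-ins for MFCC features, drawn from two well-separated clusters, and `knn_predict` with `k=3` is a hypothetical helper, not the classifier configuration used in the paper.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k=3):
    """Classify each test vector by majority vote of its k nearest neighbours."""
    # Euclidean distance between every test vector and every training vector
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


rng = np.random.default_rng(0)
n_per_class, n_features = 50, 12  # 12 mirrors the MFCC count; data is synthetic
x = np.vstack([
    rng.normal(0.0, 1.0, (n_per_class, n_features)),    # synthetic "context 0"
    rng.normal(100.0, 1.0, (n_per_class, n_features)),  # synthetic "context 1"
])
y = np.repeat([0, 1], n_per_class)

# 80% of the vectors for training, 20% for testing, as on the slide
order = rng.permutation(len(x))
split = int(0.8 * len(x))
train_idx, test_idx = order[:split], order[split:]

pred = knn_predict(x[train_idx], y[train_idx], x[test_idx])
accuracy = float(np.mean(pred == y[test_idx]))
print(accuracy)
```

Running the same evaluation on shredded or subsampled frames would be one way to measure the accuracy drop the slide refers to.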
  8. Results: Privacy User Study. The user study involved playing different sounds (shredded and subsampled). Users rated how well they could recognize speech, identify gender, and count people, on a scale from 1 (Yes, I can) to 5 (Not at all). Gender identification improves the least, by 20%.
  9. Results: Computer-Based Recognition
  10. Results: Reconstruction Based on Frequency Content. Number of (10 ms) frames in a 10-second audio snippet = 667 frames. Number of possible orderings = 667! (breaking shredding by brute force is intractable). Reconstruction by frequency content: greedily match the left and right edges of subsequent frames in the frequency domain. The audio can be reconstructed if it is broken into five or fewer segments.
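To give a sense of why slide 10 calls brute force intractable, the size of 667! can be estimated with the log-gamma function instead of expanding the product. This is only a back-of-the-envelope check; `log10_factorial` is a hypothetical helper name.

```python
import math


def log10_factorial(n):
    """Return log10(n!) using lgamma, since lgamma(n + 1) == ln(n!)."""
    return math.lgamma(n + 1) / math.log(10)


n_frames = 667  # frame count stated on the slide
log10_orderings = log10_factorial(n_frames)
digits = math.floor(log10_orderings) + 1  # decimal digits in 667!
print(digits)
```

The result is a number with more than 1,500 decimal digits, so enumerating every ordering is out of reach regardless of hardware.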
  11. Critique of Work (1 slide): Sound subsampling alone is not sufficient for preserving privacy (at least for people counting and gender identification). Shredding can be attacked (as the authors mention at the end of the paper). The work should be compared against other methods (such as filtering or perturbing the speech frequency range in the collected audio).