Pr!!ch: A System for Privacy- Preserving Speech Transcription · 2. Sensitive word scrubbing Final Transcript In Pr,,ch, the DP noise does NOT deteriorate the utility, instead it

Pr𝜀𝜀ch: A System for Privacy-Preserving Speech Transcription

Shimaa Ahmed, Amrita Roy Chowdhury, Kassem Fawaz, and Parmesh Ramanathan

USENIX Security Symposium 2020

1

Speech Transcription*

Speech transcription applications:

2

• Scalable

• Privacy-preserving

• Accurate

* Transcription by IBM Speech-to-Text Demo

Speech Transcription Services

Cloud-based transcription

Deep Speech

+by

3

Open-source offline transcription

Performance Comparison

Standard datasets

4

0.00%

5.00%

10.00%

15.00%

20.00%

25.00%

LibriSpeech DAPS TIMIT

WER* (%)

GoogleAWSDeep Speech

* Word-Error-Rate (WER) = "#$#%&𝐷:𝐼:𝑆:𝑁:

# Deletion# Insertion# Substitution# Reference words

Performance Comparison

Real world use-cases• Facebook hearing before the US Senate

• Supreme Court case “Carpenter v. United States”

• VCTK: non-American accent dataset• Speaker p266 of an Irish accent• Speaker p262 of a Scottish accent

5

Standard vs Real World Performance

0.00%5.00%

10.00%15.00%20.00%25.00%30.00%35.00%40.00%45.00%

LibriS

peec

hDAPS

TIMIT

VCTK - p266

VCTK - p262

Faceb

ook1

Faceb

ook2

Faceb

ook3

Carpne

ter1

Carpen

ter2

WER (%)

GoogleAWSDeepSpeech

6

Standard Real World

Off-the-shelf offline transcribers are not reliable for real world applications

Speech is a Rich Source of Sensitive Information

Voice biometrics

• Personal attributes

• Identity

• Impersonation

7technology can clone a speaker’s voice from a short segment of their speech

Textual content

• Sensitive words

• Statistical analysis of the entire transcript• Topic model• Stylometry analysis• Document classification• Sentiment analysis

8

Speech is a Rich Source of Sensitive Information

Utility-Privacy Trade-off

9

Utility

Cloud service providers

Offline service providers

Privacy

Cloud service providers

Offline service providers

Goal: Design an end-to-end transcription system that provides an intermediate solution along the utility-privacy spectrum

• Obfuscates the users’ voice biometrics

• Protects the sensitive textual content

• Improves on the transcription accuracy compared to offline systems

• Provides control knobs to customize its utility and privacy levels

Original SpeechPrivacy-Preserving Operations

High-accuracy Transcription Final Transcript

De-noising

10

Pr𝜀𝜀ch: Privacy-Preserving Speech Transcription







De-noising

11


Voice Biometrics

Many-to-One Voice Conversion

Voice Conversion

12

Senator Harris

Senator Thune

Mark Zuckerberg

0% accuracy in matching original speakers with their voice-converted speech using Azure speaker identification API







De-noising

13


Break the context

• Segmentation

• Sensitive words scrubbing Original speech Segmentation

~3 non-stop words

Sensitive WordsScrubbing

14

Deep Speechtranscription

information about 87million Facebook users being obtained by the company CambridgeAnalytica

Break the context

• Segmentation

• Sensitive words scrubbing Original speech Segmentation

~3 non-stop words


15


PocketSphinx


Break the context

• Segmentation

• Sensitive words scrubbing

• The textual content is transferred into a bag-of-words model

Original speech Segmentation

~3 non-stop words


16



Differentially Private (DP) Words’ Histogram

• Bag-of-words: histogram of words

• Apply DP to the true histogram• A randomized mechanism 𝐴: ℕ|"| → ℕ|"| satisfies (𝜀, 𝛿)-DP, if for any pair of

histograms 𝐻# and 𝐻$ such that ||𝐻#, 𝐻$||# = 𝑑, and for any set 𝑂 ⊆ ℕ|"|,

Words

Coun

t

17

Pr 𝐴(𝐻# ∈ 𝑂] ≤ 𝑒% Pr 𝐴(𝐻$ ∈ 𝑂] + 𝛿

DP Challenges

• Pr𝜀𝜀ch has access only to the speech, but not the transcript• No access to the true histogram

• The noise ‘dummy words’ must be added in the speech domain

• The dummy words must be indistinguishable from the true speech• Segment length• Voice• Language model

18

Dummy Segments Generation

The Lanham Act's banon federal registrationof scandaloustrademarks is not arestriction on speechbut a valid conditionon participation in afederal program.

Dummy Text Generation TTS

Domain

Legal, courtcase, violation

Original Speech

Offline transcriber

19

ban on federal registration

DP Mechanism

trademarks is not a restriction

in a federal program

End-to-End System(1) Hide voice biometrics (2) Ensure noise indistinguishability

Protect the textual content

20

The Lanham Act's banon federal registration ofscandalous trademarksis not a restriction onspeech but a validcondition onparticipation in a federalprogram.

At issue in thiscase is thegovernment'swarrantlesscollection of 127days ofPetitioner's cellsite locationinformation

Original speech

DomainLegal, courtcase, supremecourt

1. Segmentation

3. Dummy Text Generation 4. TTS

5. Voice Conversion

7. Transcription

8. De-noising and de-shuffling

6. Partitioning,adding noise, and shuffling.

government's At on scandalous

cell issue of location7. Transcription

2. Sensitive wordscrubbing

Final Transcript

In Pr𝜀𝜀ch, the DP noise does NOT deteriorate the utility, instead it adds monetary cost overhead

speech not a registration

in condition a federal restriction

valid Petitioner's 127 Lanham on collection

Participation in this program




• Provides the users with control knobs to customize its utility and privacy levels



De-noising

21


End-to-End System

22

The Lanham Act's banon federal registration ofscandalous trademarksis not a restriction onspeech but a validcondition onparticipation in a federalprogram.

At issue in thiscase is thegovernment'swarrantlesscollection of 127days ofPetitioner's cellsite locationinformation

Original speech

DomainLegal, courtcase, supremecourt

1. Segmentation

3. Dummy Text Generation 4. TTS

5. Voice Conversion

7. Transcription

8. De-noising and de-shuffling

6. Partitioning,adding noise, and shuffling.

government's At on scandalous

cell issue of location7. Transcription

2. Sensitive wordscrubbing

Final Transcript

speech not a registration

in condition a federal restriction

valid Petitioner's 127 Lanham on collection

Participation in this program

Utility: Transcription Accuracy

Dataset No Voice Conversion

One-to-one Voice Conversion

Many-to-One Voice Conversion

Deep Speech

VCTK p266 5.15 16.55 21.92 26.72VCTK p262 4.53 7.39 10.82 15.97Facebook1 8.26 14.60 20.30 24.72Facebook2 9.75 18.27 19.44 26.61Facebook3 14.93 23.25 27.06 30.72Carpneter1 14.43 23.88 22.63 25.85Carpenter2 13.53 33.71 38.90 39.71

Table: WER(%) at different settings of Pr𝜀𝜀ch vs Deep Speech

23

2% − 32.5%44% − 80%

Textual Content Privacy

Noise Level

Original Word Cloud

24

Formal Privacy Guarantee

Post-processing of DP:

25

For a speech file 𝑆, Pr𝜀𝜀ch provides perfect voice privacy using many-to-one voice conversion and an (𝜀, 𝛿)-DP guarantee on the word histogram for the domain considered, under the assumption that the dummy segments are indistinguishable from the true segments.

Any statistical analysis on the noisy words’ histogram does not cause loss in privacy

TakeawaysPr𝜀𝜀ch as a privacy-preserving speech transcription system:• Provides an improved performance relative to offline transcription• 2% to 32.52% relative improvement in WER

• Obfuscates the speakers’ voice biometrics• 0% accuracy in matching real speakers with their voice-converted speech

• allows only a DP view of the textual content.

26

Contact us: [email protected]

mailto:[email protected]

Documents

Pr!!ch: A System for Privacy- Preserving Speech Transcription · 2. Sensitive word scrubbing Final Transcript In Pr,,ch, the DP noise does NOT deteriorate the utility, instead it