Amazon Polly
A service that turns text into lifelike speech
Rafal Kuklinski, Piotr Lewalski, Amazon Text-to-Speech
12/09/2016
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What to Expect from the Session
- Introduction to Amazon Polly
- Features and functionalities
- Text-to-Speech: Under the Hood
- Getting started
- Workshop & Demo
- Pricing
- Q&A
Introduction to Amazon Polly
Why we built Polly
- Apps using voice to communicate with end users are becoming more common every day
- Naturalness of generated speech is a key element of user experience
- Integration of speech varies across use cases
What is Polly
- A service that converts text into lifelike speech
- Offers 47 lifelike voices in 24 languages
- Low-latency responses enable developers to build real-time systems
- Developers can store, replay, and distribute generated speech
Polly Quality
- Natural-sounding speech: a subjective measure of how close TTS output is to human speech.
- Accurate text processing: the ability of the system to interpret common text formats such as abbreviations, numerical sequences, and homographs. Examples:
  Today in Las Vegas, NV it's 90F.
  "We live for the music", live from the Madison Square Garden.
- Highly intelligible: a measure of how comprehensible the speech is. Example:
  Peter Piper picked a peck of pickled peppers.
Polly Language Portfolio
- Americas: Brazilian Portuguese, Canadian French, English (US), Spanish (US)
- APAC: Australian English, Indian English, Japanese
- EMEA: Danish, Dutch, British English, French, German, Icelandic, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, Welsh, Welsh English
Features and Functionality
Polly features: SSML
- Speech Synthesis Markup Language is a W3C recommendation, an XML-based markup language for speech synthesis applications
My name is Kuklinski. It is spelled Kuklinski
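The markup for this example did not survive on the slide, so the following is only a sketch of how the sentence might be written in SSML. It assumes the second occurrence of the name is wrapped in a `say-as` element with `interpret-as="spell-out"`; the helper name `spell_out_ssml` is hypothetical.

```python
from xml.sax.saxutils import escape


def spell_out_ssml(sentence: str, word: str) -> str:
    """Wrap the last occurrence of `word` in an SSML say-as spell-out tag."""
    head, found, tail = sentence.rpartition(word)
    if not found:
        raise ValueError(f"{word!r} not found in sentence")
    # Assumption: spell-out is the intended interpretation for this slide.
    spelled = f'<say-as interpret-as="spell-out">{escape(word)}</say-as>'
    return f"<speak>{escape(head)}{spelled}{escape(tail)}</speak>"


ssml = spell_out_ssml("My name is Kuklinski. It is spelled Kuklinski", "Kuklinski")
print(ssml)
```

A string like this would be passed to `synthesize_speech` with `TextType="ssml"` instead of plain text.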
Polly features: Lexicons
- Enables developers to customize the pronunciation of words or phrases
Example text: My daughter's name is Kaja.
Lexicon entry: graphemes Kaja, kaja, KAJA; phoneme "kaI.@
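The lexicon entry above can be written as a W3C Pronunciation Lexicon Specification (PLS) document. The sketch below builds one with the standard library. It assumes the phoneme string is X-SAMPA and that the lexicon targets `en-US`; neither is stated on the slide, and `build_lexicon` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(graphemes, phoneme, lang="en-US", alphabet="x-sampa"):
    """Return a PLS document mapping several spellings to one pronunciation."""
    ET.register_namespace("", PLS_NS)  # serialize PLS as the default namespace
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
    for grapheme in graphemes:
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
    ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(lexicon, encoding="unicode")


# Assumption: language and alphabet are illustrative defaults.
lexicon_xml = build_lexicon(["Kaja", "kaja", "KAJA"], '"kaI.@')
print(lexicon_xml)
```

The resulting XML is the kind of content that could be uploaded with the Polly `put_lexicon` call and then referenced by name when synthesizing speech.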
Text-to-Speech: Under the Hood
Main Challenges of Text-to-Speech
Goal: Convert text into intelligible, accurate, and natural speech
Challenges:
- Homographs: words written identically that have different pronunciations, e.g. "I live in Las Vegas" vs "This presentation broadcasts live from Las Vegas"
- Text normalization: disambiguation of abbreviations, acronyms, and units, e.g. "St." expanded as "street" or "saint"
- Conversion of text to phonemes (grapheme-to-phoneme) in languages with complex mapping such as English, e.g. tough, through, though
- Foreign words (déjà vu), proper names (François Hollande), slang (ASAP, LOL), etc.
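To make the "St." ambiguity concrete, here is a deliberately naive sketch based on word position. The rule it uses (before a capitalized word means "Saint", otherwise after a word means "Street") is an assumption for illustration only and is not how Polly is described as working; `expand_st` is a hypothetical name.

```python
import re


def expand_st(text: str) -> str:
    """Expand "St." using a toy positional rule."""
    # "St." directly before a capitalized word is read as "Saint".
    text = re.sub(r"\bSt\.(?=\s+[A-Z])", "Saint", text)
    # Any remaining "St." that follows a word is read as "Street".
    text = re.sub(r"(?<=[A-Za-z]\s)St\.", "Street", text)
    return text


print(expand_st("St. Mary lives on Elm St."))
```

The rule breaks as soon as "St." ends one sentence and a capitalized word starts the next, which is one reason real text normalization needs more context than a pair of regular expressions.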
[Figure: text-to-speech pipeline. Example input TEXT: "Market grew by > 20%." Corresponding WORDS: "Market grew by more than twenty percent". Other labels in the figure: PHONEMES, PROSODY CONTOUR, TEXT PROCESSING, UNIT SELECTION AND ADAPTATION, PROSODY MODIFICATION, STREAMING, Speech units inventory.]
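The text-to-words step in the figure can be illustrated with a toy normalizer that reproduces the slide's example. This is a sketch under narrow assumptions: it only knows the ">" sign, the "%" sign, and integers from 0 to 99, and the function names are hypothetical.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if not 0 <= n < 100:
        raise ValueError("this sketch only handles 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text: str) -> list:
    """Turn raw text into a list of speakable words."""
    text = text.replace(">", " more than ")
    # Numbers followed by "%" become "<number> percent".
    text = re.sub(r"(\d+)\s*%",
                  lambda m: f"{number_to_words(int(m.group(1)))} percent", text)
    # Any remaining bare numbers are spelled out.
    text = re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)
    # Keep words (including hyphenated ones) and drop punctuation.
    return re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)


words = normalize("Market grew by > 20%.")
print(words)
```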
Unit Selection
- Conversion of phoneme sequence to waveform
- Database of recorded audio
- Unit: diphone
- Coverage of diphones and various features, e.g. allophonic variation: pin vs spin vs limping
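A diphone runs from the middle of one phone to the middle of the next, so a phoneme sequence maps to its adjacent pairs. The sketch below assumes a silence marker `sil` at both ends and uses simplified phoneme symbols for "pin" and "spin"; both are illustrative choices, and `to_diphones` is a hypothetical helper.

```python
def to_diphones(phonemes, silence="sil"):
    """Return the diphone labels for a phoneme sequence, padded with silence."""
    padded = [silence, *phonemes, silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


pin = to_diphones(["p", "I", "n"])
spin = to_diphones(["s", "p", "I", "n"])
shared = set(pin) & set(spin)

print(pin)
print(spin)
print(sorted(shared))
```

Both words contain the `p-I` diphone, yet the /p/ is aspirated in "pin" and not in "spin". Diphone labels alone miss that difference, which is why the inventory also has to cover features such as allophonic variation.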
Recording Data for TTS
- Tons of text
- Automatic selection of texts
- Recording script: covers all combinations of diphones and significant features in a language
- Few weeks of recordings
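The slide does not say how the texts are selected. One common way to approach this kind of coverage problem is a greedy set cover: repeatedly pick the sentence that adds the most units not yet covered. The sketch below uses letter pairs as a stand-in for real diphones, and all function names are hypothetical.

```python
def letter_pairs(sentence):
    """Stand-in for diphone extraction: adjacent letter pairs within words."""
    words = sentence.lower().split()
    return {w[i:i + 2] for w in words for i in range(len(w) - 1)}


def select_script(candidates, units_of=letter_pairs):
    """Greedily choose sentences until every unit in the corpus is covered."""
    uncovered = set().union(*(units_of(s) for s in candidates))
    remaining = list(candidates)
    script = []
    while uncovered:
        # Pick the sentence that covers the most still-missing units.
        best = max(remaining, key=lambda s: len(units_of(s) & uncovered))
        script.append(best)
        uncovered -= units_of(best)
        remaining.remove(best)
    return script


corpus = ["an apple", "apple", "a day"]
script = select_script(corpus)
print(script)
```

Here "apple" is never selected because "an apple" already covers all of its units, which is the effect that keeps a recording script short.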
an error occurred while searching for your route
because snaps weren't all so obedient anymore,
now we say apple again. and we say apple,
general electric soars today. information on general electric
quick breads, zucchini, holiday, crock pot, cake,
so are you still keeping tabs on your old team,
that weighs more than four tons, disrupts the herring's swim
An apple a day, keeps
Getting started
Get started
Voicing your blog
AWS Blog
Architecture
Components: RSS Feed, AWS Lambda, Amazon Polly, Amazon CloudWatch, Amazon S3
Steps: 1. Trigger, 2. Check, 3. Content, 4. Text, 5. Audio, 6. Audio
Workshop & Demo
https://github.com/awslabs/amazon-polly-sample
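Source
from contextlib import closing

from boto3 import Session, resource

# Create a Polly client using the default credentials and region.
polly = Session().client("polly")

# Synthesize the text as MP3 audio with the "Joanna" voice.
response = polly.synthesize_speech(Text="Sample content", OutputFormat="mp3", VoiceId="Joanna")

# Upload the audio stream to the "podcasts" S3 bucket.
with closing(response["AudioStream"]) as stream:
    bucket = resource("s3").Bucket("podcasts")
    bucket.put_object(Key="output.mp3", Body=stream.read())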
Summary
Components: RSS Feed, AWS Lambda, Amazon Polly, Amazon CloudWatch, Amazon S3
Steps: 1. Trigger, 2. Check, 3. Content, 4. Text, 5. Audio, 6. Audio
Other use cases
Polly is cost-effective
- Pay-as-you-go
- $4 per 1M characters
- Free Tier of 5M characters per month for the first year
- You can store and reuse generated speech
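The figures above translate into a simple monthly estimate. The sketch below assumes that billing is linear in the number of characters and that the free tier is subtracted before pricing; it uses the prices quoted on this slide, which may have changed since. The function name is hypothetical.

```python
PRICE_PER_MILLION_USD = 4.00           # from the slide: $4 per 1M characters
FREE_TIER_CHARS_PER_MONTH = 5_000_000  # from the slide: first year only


def monthly_cost_usd(characters: int, free_tier: bool = True) -> float:
    """Estimate the monthly cost of synthesizing `characters` characters."""
    allowance = FREE_TIER_CHARS_PER_MONTH if free_tier else 0
    billable = max(0, characters - allowance)
    return billable * PRICE_PER_MILLION_USD / 1_000_000


print(monthly_cost_usd(6_000_000))                   # within the first year
print(monthly_cost_usd(6_000_000, free_tier=False))  # after the first year
```

Because generated speech can be stored and reused, the character count that matters is the amount of new text synthesized, not the number of times the audio is played.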
Thank you!