Amazon Polly: A service that turns text into lifelike speech
Rafal Kuklinski, Piotr Lewalski, Amazon Text-to-Speech
12/09/2016
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Announcing Amazon Polly - Turn Text into Lifelike Speech - December 2016 Monthly Webinar Series

What to Expect from the Session

- Introduction to Amazon Polly
- Features and functionalities
- Text-to-Speech: Under the Hood
- Getting started
- Workshop & Demo
- Pricing
- Q&A

Introduction to Amazon Polly

Why we built Polly

- Apps using voice to communicate with end users are becoming more common every day
- Naturalness of generated speech is a key element of user experience
- Integration of speech varies across use cases

What is Polly

- A service that converts text into lifelike speech
- Offers 47 lifelike voices and 24 languages
- Low-latency responses enable developers to build real-time systems
- Developers can store, replay and distribute generated speech

Polly Quality

Natural-sounding speech
A subjective measure of how close TTS output is to human speech.

Accurate text processing
The ability of the system to interpret common text formats such as abbreviations, numerical sequences, homographs, etc.
Examples:
Today in Las Vegas, NV it's 90°F.
"We live for the music", live from the Madison Square Garden.

Highly intelligible
A measure of how comprehensible speech is.
Example: Peter Piper picked a peck of pickled peppers.

Polly Language Portfolio

Americas: Brazilian Portuguese, Canadian French, English (US), Spanish (US)

APAC: Australian English, Indian English, Japanese

EMEA: Danish, Dutch, British English, French, German, Icelandic, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, Welsh, Welsh English

Features and Functionality

Polly features: SSML

Speech Synthesis Markup Language (SSML) is a W3C recommendation, an XML-based markup language for speech synthesis applications.

Example: My name is Kuklinski. It is spelled Kuklinski
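
The markup of the slide example did not survive extraction, so the following is only a sketch of how such a request might look. It assumes the second "Kuklinski" was wrapped in an SSML `say-as` element with `interpret-as="characters"`. The helper names `spell_out_ssml` and `synthesize_ssml` and the voice "Joanna" are illustrative choices, not taken from the slides.

```python
from xml.sax.saxutils import escape


def spell_out_ssml(sentence: str, word: str) -> str:
    """Wrap `sentence` in <speak> and append `word` marked to be spelled out."""
    # Assumption: <say-as interpret-as="characters"> reads the word letter by letter.
    return (
        "<speak>"
        f"{escape(sentence)} "
        f'<say-as interpret-as="characters">{escape(word)}</say-as>'
        "</speak>"
    )


def synthesize_ssml(ssml: str, voice_id: str = "Joanna") -> bytes:
    """Send SSML to Polly and return MP3 bytes (requires AWS credentials)."""
    from boto3 import Session  # imported lazily so the builder works offline

    polly = Session().client("polly")
    response = polly.synthesize_speech(
        Text=ssml, TextType="ssml", OutputFormat="mp3", VoiceId=voice_id
    )
    return response["AudioStream"].read()


ssml = spell_out_ssml("My name is Kuklinski. It is spelled", "Kuklinski")
print(ssml)
```

Building the SSML string separately from the network call keeps the markup easy to inspect before any request is sent.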

Polly features: Lexicons

Lexicons enable developers to customize the pronunciation of words or phrases.

Example: My daughter's name is Kaja.

Lexicon entry: graphemes Kaja, kaja, KAJA; phoneme "kaI.@
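
The lexicon file itself is not preserved in the slide text. As a rough sketch, the entry above could be expressed as a W3C Pronunciation Lexicon Specification (PLS) document and uploaded with `put_lexicon`. The `x-sampa` alphabet, the `en-US` language tag, the lexicon name, and the helper names are all assumptions made for illustration.

```python
from xml.sax.saxutils import escape

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(graphemes, phoneme, lang="en-US", alphabet="x-sampa"):
    """Return a one-lexeme PLS document as a string.

    `lang` and `alphabet` are assumed values; the slide does not state them.
    """
    grapheme_xml = "".join(f"<grapheme>{escape(g)}</grapheme>" for g in graphemes)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}" '
        f'alphabet="{alphabet}" xml:lang="{lang}">'
        f"<lexeme>{grapheme_xml}<phoneme>{escape(phoneme)}</phoneme></lexeme>"
        "</lexicon>"
    )


def upload_lexicon(name: str, content: str) -> None:
    """Store the lexicon in Polly (requires AWS credentials)."""
    from boto3 import Session  # imported lazily so the builder works offline

    Session().client("polly").put_lexicon(Name=name, Content=content)


lexicon = build_lexicon(["Kaja", "kaja", "KAJA"], '"kaI.@')
print(lexicon)
# upload_lexicon("kaja", lexicon)  # hypothetical lexicon name
```

Once uploaded, a lexicon is applied by passing its name in the `LexiconNames` parameter of `synthesize_speech`.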

Text-to-Speech: Under the Hood

Main Challenges of Text-to-Speech

Goal: Convert text into intelligible, accurate, and natural speech

Challenges:

- Homographs: words written identically that have different pronunciations, e.g. "I live in Las Vegas" vs "This presentation broadcasts live from Las Vegas"
- Text normalization: disambiguation of abbreviations, acronyms, and units, e.g. "St." expanded as "street" or "saint"
- Conversion of text to phonemes (Grapheme-to-Phoneme) in languages with complex mapping such as English, e.g. tough, through, though
- Foreign words (déjà vu), proper names (François Hollande), slang (ASAP, LOL), etc.
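
To make the "St." ambiguity concrete, here is a deliberately naive sketch that picks an expansion from the neighbouring words. The two rules and the function name `expand_st` are invented for illustration and are not how Polly resolves abbreviations.

```python
import re


def expand_st(text: str) -> str:
    """Expand "St." using two toy context rules."""
    # Rule 1 (assumed): "St." directly before a capitalized word is "Saint".
    text = re.sub(r"\bSt\.(?=\s+[A-Z])", "Saint", text)
    # Rule 2 (assumed): any remaining "St." that follows a word is "Street".
    text = re.sub(r"(?<=[A-Za-z])\s+St\.", " Street", text)
    return text


print(expand_st("I live on Main St. near St. Mary's Church."))
```

The rules break easily: in "Turn left on Main St. Then stop." the first rule would wrongly produce "Saint", which is exactly why normalization is listed as a challenge.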

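[Diagram: the text-to-speech pipeline. Data representations shown: TEXT ("Market grew by > 20%."), WORDS ("Market grew by more than twenty percent"), PHONEMES, PROSODY CONTOUR. Processing stages shown: TEXT PROCESSING, UNIT SELECTION AND ADAPTATION, PROSODY MODIFICATION, STREAMING. Resource shown: speech units inventory.]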
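
The TEXT to WORDS step for the example sentence can be sketched in a few lines. This toy normalizer handles only the ">" sign and two-digit percentages. The function names and rules are assumptions for illustration, not Polly's text-processing front end.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text: str) -> list:
    """Expand '>' and 'N%' and return the resulting list of words."""
    text = text.replace(">", " more than ")
    text = re.sub(
        r"(\d+)\s*%",
        lambda m: f"{number_to_words(int(m.group(1)))} percent",
        text,
    )
    return re.findall(r"[A-Za-z-]+", text)


print(normalize("Market grew by > 20%."))
```

A production front end would also need context to decide, for instance, whether ">" reads as "more than" or "greater than".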

Unit Selection

- Conversion of phoneme sequence to waveform
- Database of recorded audio
- Unit: diphone
- Coverage of diphones and various features, e.g. allophonic variation: "pin" vs "spin" vs "limping"
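
A diphone spans the transition from one phone to the next, so a phoneme sequence maps to overlapping pairs. The sketch below pads the sequence with a silence marker at both ends. The phoneme symbols, the `"sil"` marker, and the function name are illustrative assumptions.

```python
def to_diphones(phonemes, boundary="sil"):
    """Return the list of adjacent phoneme pairs, padded with a boundary symbol."""
    padded = [boundary, *phonemes, boundary]
    return list(zip(padded, padded[1:]))


# The /p/ appears in different contexts, hence different units are needed.
print(to_diphones(["p", "I", "n"]))       # "pin"
print(to_diphones(["s", "p", "I", "n"]))  # "spin"
```

In "pin" the /p/ is reached from silence, while in "spin" it is reached from /s/, which is one way the inventory can capture allophonic variation.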

Recording Data for TTS

Tons of text -> Automatic selection of texts -> Recording script -> Few weeks of recordings

Recording script: covers all combinations of diphones and significant features in a language
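
The slides do not say how texts are selected automatically. One common way to approach such a coverage problem is a greedy set cover, sketched below. To stay self-contained it uses adjacent letter pairs as a stand-in for diphones, which is a loud simplification. All names here are hypothetical.

```python
def letter_pairs(sentence):
    """Stand-in for diphone extraction: the set of adjacent letter pairs."""
    letters = [c for c in sentence.lower() if c.isalpha()]
    return set(zip(letters, letters[1:]))


def select_script(candidates, units_of):
    """Greedily pick sentences until every unit seen in `candidates` is covered."""
    remaining = set().union(*(units_of(s) for s in candidates))
    script = []
    while remaining:
        # Pick the sentence that covers the most still-missing units.
        best = max(candidates, key=lambda s: len(units_of(s) & remaining))
        script.append(best)
        remaining -= units_of(best)
    return script


candidates = [
    "an apple a day",
    "and we say apple",
    "now we say apple again",
]
script = select_script(candidates, letter_pairs)
print(script)
```

Greedy selection does not guarantee the shortest possible script, but it tends to drop sentences that add no new units.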

an error occurred while searching for your route
because snaps weren't all so obedient anymore,
now we say apple again. and we say apple,
general electric soars today. information on general electric
quick breads, zucchini, holiday, crock pot, cake,
so are you still keeping tabs on your old team,
that weighs more than four tons, disrupts the herring's swim

An apple a day, keeps

Getting started

Get started

Voicing your blog

AWS Blog

Architecture

Components: RSS Feed, Amazon Polly, Amazon CloudWatch, Amazon S3, AWS Lambda

Steps: 1. Trigger, 2. Check, 3. Content, 4. Text, 5. Audio, 6. Audio

Workshop & Demo

https://github.com/awslabs/amazon-polly-sample

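Source

    from contextlib import closing

    from boto3 import Session, resource

    # Create a Polly client using the default credentials and region.
    polly = Session().client("polly")

    # Synthesize the text into an MP3 audio stream.
    response = polly.synthesize_speech(
        Text="Sample content", OutputFormat="mp3", VoiceId="Joanna")

    # Read the stream and upload it to the "podcasts" S3 bucket.
    with closing(response["AudioStream"]) as stream:
        bucket = resource("s3").Bucket("podcasts")
        bucket.put_object(Key="output.mp3", Body=stream.read())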

Summary

Components: RSS Feed, Amazon Polly, Amazon CloudWatch, Amazon S3, AWS Lambda

Steps: 1. Trigger, 2. Check, 3. Content, 4. Text, 5. Audio, 6. Audio

Other use cases

Polly is cost-effective

- Pay-as-you-go
- $4 for 1M characters
- Free Tier of 5M characters/month for the first year
- You can store and reuse generated speech
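
As a quick worked example of the figures on this slide, the sketch below estimates a monthly bill. It assumes the free tier is simply subtracted from the month's character count and that the rest is billed linearly at $4 per million. The function and constant names are illustrative.

```python
PRICE_PER_MILLION_USD = 4.00   # from the slide: $4 for 1M characters
FREE_TIER_CHARS = 5_000_000    # from the slide: 5M characters/month, first year


def monthly_cost_usd(characters: int, free_tier_active: bool = True) -> float:
    """Estimate the monthly charge for a number of synthesized characters."""
    free = FREE_TIER_CHARS if free_tier_active else 0
    billable = max(0, characters - free)
    return billable * PRICE_PER_MILLION_USD / 1_000_000


print(monthly_cost_usd(8_000_000))                          # first year
print(monthly_cost_usd(8_000_000, free_tier_active=False))  # after first year
```

With 8M characters in a month, 3M are billable during the first year ($12), and all 8M are billable afterwards ($32). Reusing stored audio avoids paying again for the same text.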

Thank you!
