S Legrand Snack for Ruby

S Legrand, Snack for Ruby. Talk Objectives: Tour of API, Learn the walk and talk, Have Fun






Talk Objectives

Tour of the API

Learn the walk and talk

Have fun


Snack

The Snack library is a tool to aid in learning about sound, voice, and automatic speech recognition (ASR), and it is hopefully a fun way to experiment.

Snack is a Tcl-based API.

Snack has been adapted to and included in the standard Python distribution.


Snack

Snack is Swedish for “talk” or “chat”.

Kåre Sjölander is the principal investigator for Tcl-based Snack.

Tcl Snack is available at http://www.speech.kth.se/snack/


Snack for Ruby

rbSnack is a Ruby wrapper around Tcl Snack.

rbSnack has additional Ruby-based utilities.

rbSnack has HTML-based help (rdoc + rbTeX).

rbSnack can be found at http://rbsnack.sourceforge.net/


Snack Toolkit Includes

Recording and playback

Waveform display

Spectrogram: Fourier, LPC

Formant analysis

Power analysis

Filters

(will demo)


The Speech Signal

Continuous speech is discretely sampled.

The signal consists of rapidly changing data points.

The display of the sampled signal is called the waveform.

Snack can display the waveform in real time.
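As a concrete picture of discrete sampling, the following plain-Ruby sketch (it does not call rbSnack) samples a continuous sine tone x(t) = sin(2πft) at a sample rate fs, giving the samples x[n] = sin(2πfn/fs). The method name `sample_sine` and the 440 Hz / 16 kHz values are illustrative assumptions.

```ruby
# Sample a continuous tone x(t) = sin(2*pi*f*t) at a sample rate fs:
# the discrete samples are x[n] = sin(2*pi*f*n/fs), n = 0, 1, 2, ...
def sample_sine(freq_hz, sample_rate_hz, n_samples)
  (0...n_samples).map do |n|
    Math.sin(2 * Math::PI * freq_hz * n / sample_rate_hz.to_f)
  end
end

# Illustrative values: a 440 Hz tone sampled at 16 kHz.
samples = sample_sine(440, 16_000, 160)   # 160 samples = 10 ms of signal
puts samples.first(4).map { |x| x.round(4) }.inspect
```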


Analysis Uses Frames

The signal is broken into frames.

Frames may overlap.

Characteristics of the signal are analyzed frame by frame using Fourier and LPC analysis.
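A minimal plain-Ruby sketch of framing, assuming a fixed frame length and a fixed hop size between frame starts. The `frames` helper is hypothetical, and the 25 ms / 10 ms values are common illustrative choices, not rbSnack defaults.

```ruby
# Split a signal into (possibly overlapping) frames.
#   frame_len: number of samples per frame
#   hop:       number of samples between the starts of successive frames
# hop < frame_len gives overlapping frames. In this sketch, trailing
# samples that do not fill a whole frame are dropped.
def frames(signal, frame_len, hop)
  return [] if signal.length < frame_len
  (0..(signal.length - frame_len)).step(hop).map { |start| signal[start, frame_len] }
end

# Illustrative values: 25 ms frames every 10 ms at 16 kHz.
rate      = 16_000
frame_len = (0.025 * rate).round   # 400 samples
hop       = (0.010 * rate).round   # 160 samples
signal    = Array.new(rate) { |n| Math.sin(2 * Math::PI * 440 * n / rate.to_f) }
frame_list = frames(signal, frame_len, hop)
puts "#{frame_list.length} frames of #{frame_list.first.length} samples"
```

With a hop smaller than the frame length, each sample contributes to several frames, so per-frame measurements change smoothly from one frame to the next.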


Going in Circles

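Complex numbers are just a funny way of multiplying: when you multiply two complex numbers, their angles add (and their magnitudes multiply).

Euler’s formula:

$$e^{i\theta} = \cos\theta + i\sin\theta$$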
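Ruby’s built-in `Complex` class can check both ideas numerically. This is a small plain-Ruby illustration, not part of rbSnack; the particular magnitudes and angles are arbitrary.

```ruby
# Multiplying complex numbers multiplies magnitudes and adds angles.
a = Complex.polar(2.0, 0.3)   # magnitude 2.0, angle 0.3 rad
b = Complex.polar(1.5, 0.5)   # magnitude 1.5, angle 0.5 rad
product = a * b
puts product.abs   # approximately 3.0 = 2.0 * 1.5
puts product.arg   # approximately 0.8 = 0.3 + 0.5

# Euler's formula: e^(i*theta) = cos(theta) + i*sin(theta).
theta = 1.2
lhs = Complex(Math::E, 0) ** Complex(0, theta)
rhs = Complex(Math.cos(theta), Math.sin(theta))
puts((lhs - rhs).abs < 1e-12)   # prints true
```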


Fourier Analysis

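The Fourier matrix is a unitary matrix (once scaled by $1/\sqrt{N}$ for an $N$-point signal).

Multiplying a signal by the Fourier matrix returns the frequency components of the signal, called the Fourier coefficients.

The inverse is easy to compute, because the inverse of a unitary matrix is its conjugate transpose; this is called the inverse Fourier transform.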
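In symbols, using one common convention (sign and scaling conventions vary between texts, so treat this as an assumption rather than the slide’s exact definition), for a signal $x$ of length $N$:

```latex
F_{jk} = \omega^{jk}, \qquad \omega = e^{-2\pi i/N}, \qquad j, k = 0, \dots, N-1

X = F x \quad\Longleftrightarrow\quad X_j = \sum_{k=0}^{N-1} x_k \, e^{-2\pi i jk/N}

\tfrac{1}{\sqrt{N}} F \text{ is unitary, so } F^{-1} = \tfrac{1}{N} \overline{F}
\quad\text{and}\quad x = \tfrac{1}{N} \overline{F} X
```

Because $F$ is symmetric, its conjugate transpose is simply its entrywise complex conjugate $\overline{F}$.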


The Fourier Matrix Looks Like

Spinning disks

Multiplying the signal by this matrix produces the Fourier coefficients (frequency components).
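The matrix view can be sketched directly in plain Ruby, assuming the ω = e^(-2πi/N) convention. The helper names are hypothetical. This builds the whole N × N matrix, so it costs O(N²) and is only for illustration; practical code uses a fast Fourier transform.

```ruby
# Build the N x N Fourier matrix F[j][k] = w**(j*k), w = e^(-2*pi*i/N).
# Each entry lies on the unit circle; row j "spins" j times as fast as row 1.
def fourier_matrix(n)
  Array.new(n) { |j| Array.new(n) { |k| Complex.polar(1.0, -2 * Math::PI * j * k / n) } }
end

# Multiply a matrix (array of rows) by a vector.
def mat_vec(m, v)
  m.map { |row| row.zip(v).sum { |a, b| a * b } }
end

# Fourier coefficients: X = F x
def dft(x)
  mat_vec(fourier_matrix(x.length), x)
end

# Inverse: x = (1/N) * conj(F) * X  (F is symmetric, so conj(F) is its conjugate transpose)
def idft(coeffs)
  n = coeffs.length
  f_conj = fourier_matrix(n).map { |row| row.map(&:conj) }
  mat_vec(f_conj, coeffs).map { |c| c / n }
end

x = [1.0, 2.0, 0.0, -1.0]
coeffs = dft(x)
back = idft(coeffs)
puts coeffs.map { |c| c.rect.map { |v| v.round(6) } }.inspect
puts back.map { |c| c.real.round(6) }.inspect   # recovers x
```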


Examining Fourier components

A spectrogram gives a picture of the Fourier components (coefficients) as they evolve over time. Snack can display it in real time.

It looks like an X-ray.

Bands of high activity correspond to formants.
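A spectrogram can be sketched as one magnitude spectrum per overlapping frame. The plain-Ruby version below is a hypothetical illustration: it uses a direct DFT and no window function, whereas practical implementations typically use an FFT and apply a window to each frame. The frame length, hop size, and test tone are arbitrary.

```ruby
# Magnitude spectrum of one frame via a direct (O(N^2)) DFT.
# Only bins 0..N/2 are kept; for real input the other bins mirror these.
def magnitude_spectrum(frame)
  n = frame.length
  (0..n / 2).map do |k|
    frame.each_with_index.sum { |x, t| x * Complex.polar(1.0, -2 * Math::PI * k * t / n) }.abs
  end
end

# Spectrogram: one magnitude spectrum per (overlapping) frame.
def spectrogram(signal, frame_len, hop)
  (0..(signal.length - frame_len)).step(hop).map do |start|
    magnitude_spectrum(signal[start, frame_len])
  end
end

# Illustrative: a 1000 Hz tone at 8 kHz, 64-sample frames, hop of 32.
rate = 8000
tone = Array.new(800) { |t| Math.sin(2 * Math::PI * 1000 * t / rate.to_f) }
spec = spectrogram(tone, 64, 32)
peak_bin = spec.first.each_with_index.max_by { |mag, _k| mag }.last
puts "#{spec.length} frames; peak at bin #{peak_bin} = #{peak_bin * rate / 64} Hz"
```

Each bin k of a 64-sample frame at 8 kHz covers k × 8000 / 64 Hz, so the 1000 Hz tone shows up as a steady bright band at bin 8 across all frames.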


Linear Filters

Useful for understanding the nature of speech signals.

Generators: generate square waves, sine waves, sawtooth waves, etc.

Composers: compose several filters.

FIR: finite impulse response

IIR: infinite impulse response
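To make generators and composers concrete, here is a plain-Ruby sketch. The `generate` and `compose` helpers, the waveform lambdas, and the gain and clip stages are hypothetical names for illustration; they are not Snack’s or rbSnack’s filter API.

```ruby
# Hypothetical waveform shapes: each maps a phase in [0, 1) to a sample in [-1, 1].
SINE     = ->(phase) { Math.sin(2 * Math::PI * phase) }
SQUARE   = ->(phase) { phase < 0.5 ? 1.0 : -1.0 }
SAWTOOTH = ->(phase) { 2.0 * phase - 1.0 }

# Generator: n samples of a waveform at freq_hz, sampled at rate_hz.
def generate(wave, freq_hz, rate_hz, n)
  (0...n).map { |t| wave.call((freq_hz * t / rate_hz.to_f) % 1.0) }
end

# Composer: chains filters (lambdas from sample array to sample array)
# left to right, producing a single filter.
def compose(*filters)
  ->(signal) { filters.reduce(signal) { |s, f| f.call(s) } }
end

gain = ->(k) { ->(s) { s.map { |x| x * k } } }
clip = ->(lim) { ->(s) { s.map { |x| x.clamp(-lim, lim) } } }

chain = compose(gain.call(2.0), clip.call(1.0))
out = chain.call(generate(SAWTOOTH, 100, 8000, 80))
puts out.minmax.inspect
```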


FIR Filter

Determined completely by its response to a unit impulse.

The response is finite in duration.

$$y(t) = b_0 x(t) + b_1 x(t-1) + b_2 x(t-2) + \dots + b_n x(t-n)$$

(We will demo FIR using rbSnack)
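The FIR equation translates directly into code. This plain-Ruby sketch (not the rbSnack demo) treats samples before the start of the signal as zero; `fir_filter` is a hypothetical name.

```ruby
# FIR filter: y(t) = b0*x(t) + b1*x(t-1) + ... + bn*x(t-n),
# treating samples before the start of x as zero.
def fir_filter(b, x)
  x.each_index.map do |t|
    b.each_with_index.sum { |bk, k| t - k >= 0 ? bk * x[t - k] : 0.0 }
  end
end

# The impulse response of an FIR filter is just its coefficients,
# followed by zeros: the response is finite in duration.
b = [0.5, 0.3, 0.2]
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
puts fir_filter(b, impulse).inspect   # => [0.5, 0.3, 0.2, 0.0, 0.0, 0.0]

# A 3-point moving average smooths a signal.
puts fir_filter([1.0 / 3] * 3, [3.0, 6.0, 9.0, 6.0, 3.0]).inspect
```

Feeding in a unit impulse returns b0, ..., bn followed by zeros, which is exactly why an FIR filter is determined completely by its impulse response.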


IIR Filter

Also called a recursive filter.

The response is infinite in duration.

$$y(t) = b_0 x(t) + b_1 x(t-1) + b_2 x(t-2) + \dots + b_n x(t-n) + a_1 y(t-1) + a_2 y(t-2) + \dots + a_n y(t-n)$$

(We will demo IIR using rbSnack)
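A plain-Ruby sketch of the recursive form, using the same sign convention as the IIR equation (the feedback terms are added). `iir_filter` is a hypothetical name; samples before t = 0 are taken as zero, and the feedforward and feedback coefficient lists may have different lengths.

```ruby
# IIR (recursive) filter:
# y(t) = b0*x(t) + ... + bn*x(t-n) + a1*y(t-1) + ... + an*y(t-n)
# b = [b0, ..., bn], a = [a1, ..., an]; samples before t = 0 are zero.
def iir_filter(b, a, x)
  y = []
  x.each_index do |t|
    feedforward = b.each_with_index.sum { |bk, k| t - k >= 0 ? bk * x[t - k] : 0.0 }
    feedback    = a.each_with_index.sum { |ak, k| t - k - 1 >= 0 ? ak * y[t - k - 1] : 0.0 }
    y << feedforward + feedback
  end
  y
end

# One-pole example: y(t) = x(t) + 0.5*y(t-1).
# Its impulse response 1, 0.5, 0.25, ... never becomes exactly zero.
impulse = [1.0] + [0.0] * 7
puts iir_filter([1.0], [0.5], impulse).inspect
# => [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125]
```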


Linear Predictive Analysis

Analogous to Fourier analysis.

Assumption: for each frame, the signal is predicted by

$$y(t) \approx a_1 y(t-1) + a_2 y(t-2) + \dots + a_p y(t-p)$$

The LPC coefficients $a_1, \dots, a_p$ are chosen as the best least-squares approximation.

LPC can also be used to estimate formants.
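One standard way to obtain least-squares LPC coefficients is the autocorrelation method, solved with the Levinson-Durbin recursion. The slides do not say which method rbSnack uses, so the plain-Ruby sketch below is an assumption; the helper names are hypothetical. It checks itself on a signal generated by a known second-order predictor.

```ruby
# Autocorrelation r[k] = sum over t of y[t] * y[t + k], for k = 0..p.
def autocorrelation(y, p)
  (0..p).map { |k| (0...(y.length - k)).sum { |t| y[t] * y[t + k] } }
end

# Levinson-Durbin recursion. Returns [a1, ..., ap] so that
# y(t) ~= a1*y(t-1) + ... + ap*y(t-p) in the least-squares sense
# (autocorrelation method).
def lpc(y, p)
  r = autocorrelation(y, p)
  a = Array.new(p + 1, 0.0)   # a[0] is unused
  err = r[0]                  # prediction error energy
  (1..p).each do |i|
    k = (r[i] - (1...i).sum { |j| a[j] * r[i - j] }) / err   # reflection coefficient
    prev = a.dup
    a[i] = k
    (1...i).each { |j| a[j] = prev[j] - k * prev[i - j] }
    err *= (1 - k * k)
  end
  a[1..p]
end

# Test signal: impulse response of y(t) = 1.3*y(t-1) - 0.4*y(t-2).
y = [1.0, 1.3]
(2...200).each { |t| y << 1.3 * y[t - 1] - 0.4 * y[t - 2] }
puts lpc(y, 2).map { |c| c.round(6) }.inspect   # recovers approximately [1.3, -0.4]
```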


What is Sound? What is Speech?

Sound is the signal created by longitudinal waves in a medium such as air.

Sound waves are continuous.

They can be decomposed into a linear combination of sine waves.

Speech is a special noise made by humans.


It’s Just Tubing…

The simplest model of speech is to consider the lungs and trachea as one long tube.

The resonance frequencies of the tube are called formants.

(Figure: the first two formants, labeled F1 and F2.)


Some Speech Recognition Features

Formants

Pitch

Voiced/Unvoiced

Nasality

Frication

Energy

Our current work uses only formants and energy.
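Energy is the simplest of these features to compute. Below is a plain-Ruby sketch of short-time energy (the sum of squared samples in each frame); `frame_energies` is a hypothetical helper, and the frame length and hop size are illustrative parameters, not rbSnack settings.

```ruby
# Short-time energy: the sum of squared samples in each frame.
def frame_energies(signal, frame_len, hop)
  (0..(signal.length - frame_len)).step(hop).map do |start|
    signal[start, frame_len].sum { |x| x * x }
  end
end

# Illustrative signal: 100 near-silent samples followed by a 100-sample tone.
quiet = Array.new(100) { 0.01 }
tone  = Array.new(100) { |t| Math.sin(2 * Math::PI * t / 20.0) }
energies = frame_energies(quiet + tone, 50, 50)
puts energies.map { |e| e.round(3) }.inspect
```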


Basic Utterances

A basic unit of speech is called a phone.

Vowels are utterances with constant formants.

A diphthong is the transition from one vowel to another.

Vowels and diphthongs are essentially characterized by the first and second formants.


Other Phones: The Consonants

Plosive: closure in the oral cavity /p/

Nasal: closure in the oral cavity, with air flowing through the nasal cavity /m/

Fricative: turbulent airstream noise /s/

Retroflex liquid: vowel-like, tongue high and curled back /r/

Lateral liquid: vowel-like, tongue central, airstream along the sides /l/

Glide: vowel-like /y/


Some Problems with Speech Signals

Segmentation: when does a word begin and end? (Noise?)

Wetware: the speaker’s internal configuration, plus lip smacks, breathing, etc.

Segmentation: the workshop demos one approach.
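The slides do not describe the workshop’s segmentation method. One common, simple approach (not necessarily the one demoed) is to threshold short-time energy and take the first and last frames above the threshold. The `find_endpoints` helper, frame size, and threshold below are hypothetical and would need tuning for real recordings.

```ruby
# Hypothetical energy-threshold endpoint detector. Splits the signal into
# non-overlapping frames, computes each frame's energy, and returns
# [first_frame, last_frame] for frames above the threshold (nil if none).
def find_endpoints(signal, frame_len, threshold)
  energies = signal.each_slice(frame_len).map { |f| f.sum { |x| x * x } }
  active = energies.each_index.select { |i| energies[i] > threshold }
  return nil if active.empty?
  [active.first, active.last]
end

# Near-silence, then a 400-sample "word", then near-silence again.
signal = Array.new(300) { 0.001 } +
         Array.new(400) { |t| 0.5 * Math.sin(2 * Math::PI * t / 16.0) } +
         Array.new(300) { 0.001 }
first, last = find_endpoints(signal, 100, 1.0)
puts "word spans frames #{first}..#{last} (100 samples per frame)"
# => word spans frames 3..6 (100 samples per frame)
```

A fixed threshold is fragile when background noise or lip smacks carry energy, which is one reason segmentation is listed as a problem.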


Code Books

A code book consists of code words.

The idea is to search through the code book to find the code word that best matches the feature sequence.

rbSnack uses the code book approach in word recognition.
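A minimal plain-Ruby sketch of the code book search, assuming each code word stores a sequence of feature vectors and that the best match is the one with the smallest total Euclidean distance. The code book contents, labels, and helper names are hypothetical; this is not rbSnack’s implementation.

```ruby
# Hypothetical code book: each code word pairs a label with a reference
# feature sequence, here one [F1, F2] pair (in Hz) per frame.
# The labels and numbers are made up for illustration.
CODE_BOOK = {
  "word_a" => [[500, 1500], [400, 2000], [300, 2300]],
  "word_b" => [[700, 1200], [650, 1100], [400, 900]]
}.freeze

# Euclidean distance between two feature vectors.
def distance(u, v)
  Math.sqrt(u.zip(v).sum { |a, b| (a - b)**2 })
end

# Frame-by-frame distance between two feature sequences.
# This sketch assumes both sequences have the same number of frames.
def sequence_distance(s1, s2)
  s1.zip(s2).sum { |u, v| distance(u, v) }
end

# Return the label of the code word that best matches the features.
def recognize(features, code_book)
  code_book.min_by { |_label, ref| sequence_distance(features, ref) }.first
end

puts recognize([[520, 1450], [420, 1950], [310, 2250]], CODE_BOOK)   # => word_a
```

Comparing frame i of one sequence only with frame i of the other is what makes this simple version sensitive to speaking rate.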


Code Book Approach

++ Easy to implement

+ Good for isolated words

+- Works best on small vocabularies

-- Is insensitive to context, prone to errors


Code Book Approach

WhichWay is a simple demo of this approach


More Problems with Speech Signals

Accent: Southern vs. New England vs. California Valley vs. other.

Variation in the rate of speech makes it hard to compare words.


Dynamic Time Warping

A pattern comparison technique.

A way of stretching or compressing one sequence to match another.

Evaluated using dynamic programming.


Dynamic Programming

Form a grid, with the start at the lower left and the end at the upper right.

Label each node with the difference (error) between pattern 1 at time i and pattern 2 at time j.

Find the minimal distance from start to end using


Dynamic Programming

A possible path

Basic Assumption:

If the best path P(S,E) passes through node N, then P(S,E) is the concatenation of P(S,N) (the best path from S to N) and P(N,E) (the best path from N to E).
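The grid and the basic assumption lead directly to a dynamic programming recurrence: the best cost to reach a node is its own error plus the best cost of reaching one of its predecessors. The slide’s exact recurrence is not recoverable here, so this plain-Ruby sketch assumes the common symmetric local path (a horizontal, vertical, or diagonal step, each with weight 1), with the start at cell (0, 0) and the end at the opposite corner. `dtw` is a hypothetical helper name.

```ruby
# Dynamic time warping with the common symmetric local recurrence
#   D(i, j) = d(i, j) + min(D(i-1, j), D(i, j-1), D(i-1, j-1)),
# where d(i, j) is the distance between frame i of pattern 1 and
# frame j of pattern 2. Start is (0, 0); end is (n-1, m-1).
def dtw(p1, p2, dist = ->(a, b) { (a - b).abs })
  n, m = p1.length, p2.length
  inf = Float::INFINITY
  d = Array.new(n) { Array.new(m, inf) }
  n.times do |i|
    m.times do |j|
      cost = dist.call(p1[i], p2[j])
      if i.zero? && j.zero?
        d[i][j] = cost
      else
        best = [
          i > 0 ? d[i - 1][j] : inf,
          j > 0 ? d[i][j - 1] : inf,
          i > 0 && j > 0 ? d[i - 1][j - 1] : inf
        ].min
        d[i][j] = cost + best
      end
    end
  end
  d[n - 1][m - 1]
end

# The same contour spoken slowly and quickly aligns with zero cost.
slow = [1, 1, 2, 3, 3, 3, 2, 1]
fast = [1, 2, 3, 2, 1]
puts dtw(slow, fast)   # => 0
```

Filling the table row by row means every predecessor's best cost is already known when a node is visited, which is exactly the basic assumption at work.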


Dynamic Programming

rbSnack includes examples of various time-alignment approaches.

(Figure: local path constraints labeled Type I and Type III, with numeric weights on the steps.)


Dynamic Programming

(Figure: local path constraints labeled Itakura and Type IV, with weights of 1 on the steps.)


Hidden Markov Models

Sometimes the second (or third) best match is the right word. Use HMMs to determine the correct word in the context of the sentence (ditto for phones within a word).

HMMs are similar to nondeterministic finite state machines, except that their output is also nondeterministic.


Hidden Markov Models

Dynamic programming is used to compute weights.

HMMs look like this:

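(Figure: an HMM with states numbered 1 to 4, transitions weighted .4, .2, and .4, and output probabilities P(/i/) = .5, P(/a/) = .2, P(/o/) = .3.)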
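Dynamic programming appears in HMMs as, for example, the Viterbi algorithm, which finds the most likely hidden state sequence for a series of observations. The plain-Ruby sketch below uses a hypothetical two-state left-to-right model; only the first state’s output probabilities are borrowed from the figure, and all other numbers and names are made up.

```ruby
# Viterbi algorithm: the most likely hidden state sequence for a list of
# observations, computed by dynamic programming over time steps.
# Plain probabilities are used for readability; real recognizers usually
# add log probabilities instead to avoid numerical underflow.
def viterbi(obs, states, start_p, trans_p, emit_p)
  # best[t][s]: probability of the best path that ends in state s at time t
  best = [states.map { |s| [s, start_p[s] * emit_p[s][obs[0]]] }.to_h]
  back = [{}]   # back[t][s]: predecessor of s on that best path
  obs.drop(1).each do |o|
    prev = best.last
    col = {}
    ptr = {}
    states.each do |s|
      q = states.max_by { |r| prev[r] * trans_p[r][s] }
      col[s] = prev[q] * trans_p[q][s] * emit_p[s][o]
      ptr[s] = q
    end
    best << col
    back << ptr
  end
  last = states.max_by { |s| best.last[s] }
  path = [last]
  (best.length - 1).downto(1) { |t| path.unshift(back[t][path.first]) }
  [path, best.last[last]]
end

# Hypothetical left-to-right HMM with two states. Only s1's output
# probabilities come from the figure; every other number is made up.
states  = [:s1, :s2]
start_p = { s1: 1.0, s2: 0.0 }
trans_p = { s1: { s1: 0.6, s2: 0.4 }, s2: { s1: 0.0, s2: 1.0 } }
emit_p  = { s1: { i: 0.5, a: 0.2, o: 0.3 }, s2: { i: 0.1, a: 0.6, o: 0.3 } }

path, prob = viterbi([:i, :a, :a], states, start_p, trans_p, emit_p)
puts "#{path.inspect} #{prob.round(4)}"   # => [:s1, :s2, :s2] 0.072
```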


Possible Future Directions

Examine other features (pitch?).

Incorporate other libraries (do the computationally hard work in C).

Add more signal processing routines.

Add more examples.

Use hidden Markov models.


Lessons Learned / To Be Learned

Document everything.

Nothing’s perfect.

Automate everything.

The project is never done.


What’s next?

Try it out.