

Philips tech. Rev. 37, 207-219, 1977, No. 8

Speaker recognition by computer

We all know from experience that people can be easily recognized by their voices. This comes about not just from their individual habits of speech, but from the endless variety of the acoustic parameters of the vocal tract throughout the population - it is almost impossible to find two voices that are perfectly identical. The voice may therefore serve as a passport, provided there are technical means of distinguishing the individual voices from one another with a sufficiently high degree of reliability. This is where the techniques for speech analysis come in. These were developed years ago for resolution and resynthesis of the speech signal to condense the information so that speech could be transmitted in narrower frequency bands; now they have been turned to the more general applications of man-machine communication by voice. As might have been expected, computers have come to play a large part in this. Whereas the prospect of a free and fluent conversation between man and machine still seems to be a long way off, the simpler task of recognizing a person by his voice has been proved to be well within the capabilities of a machine and high reliability has been obtained. Research projects on this subject are going on in several places in the world. One of these is a government-sponsored project called AUROS at Philips Forschungslaboratorium Hamburg, in West Germany.

Introduction

When listening to somebody who is talking, a human listener perceives what is said, who is speaking and how it is said. The human brain is capable of splitting up the complex information contained in the speech signal to answer simultaneously questions about the meaning of what is said, the identity of the speaker and his emotional state. Investigations have been going on for some time to see whether the artificial intelligence available in computers can be employed to perform at least part of this task.

Speech recognition by computers - i.e. computers understanding what is said - still seems to be a very difficult problem and to give valuable results only with restricted vocabularies and restricted groups of speakers. On the other hand, speaker recognition - the identification of who is speaking - seems to have made considerable progress and to be nearly ready for practical application.

The value of a device for speaker recognition is obvious. It would be highly efficient if a simple voice utterance was sufficient in all those cases where a person's identity has to be established or verified. The human voice is very specific to the individual speaker; it cannot be lost or stolen, it cannot easily be imitated [1], and it can be transported over long distances at low cost. For all these reasons the human voice would provide an ideal 'acoustic passport'. This passport could be used to supplement existing systems to increase their security, or by itself, e.g. to allow authorized access to money from banks and to confidential information from data banks and government authorities. A second application is in crime investigation, where recorded voices of blackmailers and kidnappers have to be compared with the voices of suspects.

Systems for speaker recognition have one or more speech-input terminals, connected to a central computer. The connection may be made via the public telephone system. The voice is analysed in the computer and compared with stored voice-patterns.

Dr E. Bunge, formerly with Philips GmbH Forschungslaboratorium Hamburg, Hamburg, West Germany, is now with Bundeskriminalamt, Wiesbaden, West Germany.

[1] B. S. Atal, Automatic recognition of speakers from their voices, Proc. IEEE 64, 460-475, 1976.
A. E. Rosenberg, Automatic speaker verification: a review, Proc. IEEE 64, 475-487, 1976.


Speaker identification has to be distinguished from speaker verification. In speaker verification an unknown speaker claims to be a certain person X, and by comparing his voice data with that of speaker X the system decides whether he really is speaker X or an impostor. For speaker identification, an unknown voice is compared with a stored voice-pattern library and the system finds out if the voice is present in the library, and if so, who it belongs to.

The greater part of this article is a description of an experimental speaker-recognition equipment called 'AUROS' (AUtomatic Recognition Of Speakers), which has been designed at Philips Forschungslaboratorium Hamburg, in West Germany, for a government-sponsored research project at PFH. The system has been set up to enable us to compare the merits of different speaker-identification and verification procedures. Attention is given to the error rates in experiments with larger groups of subjects, and also to the computer time required and the complexity of the equipment, which is quite different for different procedures. The experiments already made - which will be reported in the last section - indicate that the error rates are of the order of one per cent for procedures of average complexity [2].

In speaker recognition by computer there are two fundamental problems: first, all speech utterances are unique and not exactly reproducible, and, secondly, the amount of data to be processed in short time intervals is extremely large. Let us look more closely at these two problems and at the procedures that they compel us to adopt.

The non-reproducibility of human speech

No-one can pronounce a sentence twice in an identical manner. Each repetition yields a different acoustic representation. Fig. 1 shows three curves of sound pressure as a function of time for three utterances of the vowel 'a' pronounced by the same speaker. The differences between the three curves are evident although the speaker tried hard to produce identical utterances.

Fig. 1. Sound pressure p as a function of time t for three utterances of the vowel 'a' by the same speaker. The different curve shapes demonstrate that human speech is not exactly reproducible; this constitutes a problem in speaker recognition by computer.

The non-reproducibility also holds for any feature extracted from the original time function. Fig. 2 shows as examples the Fourier spectra and the 'cepstrum' functions of the three versions of the vowel 'a'. The cepstrum is obtained by computing the power spectrum of the logarithm of the Fourier spectrum. The Fourier spectrum has periodic ripples corresponding to the harmonics of the speech signal; the frequency spacing between these ripples is equal to the fundamental frequency of the speech. The cepstrum, which is again a time-domain function, will therefore have a peak corresponding to the periodicity of the Fourier spectrum, and indicating the fundamental pitch period of the speech [3].
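The relation between the cepstral peak and the pitch period can be tried on a synthetic signal. The following is a minimal sketch, not the AUROS processing: it assumes a pulse train with one pulse every 50 samples (5 ms at a 10 kHz sampling rate), uses a plain discrete Fourier transform, and searches for the peak only in an assumed range of 25 to 80 samples. All function names are illustrative.

```python
import cmath
import math


def dft(values):
    """Plain O(N^2) discrete Fourier transform (adequate for a short demo)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(values))
            for k in range(n)]


def cepstrum(signal, floor=1e-12):
    """Power spectrum of the logarithm of the power spectrum of `signal`."""
    # The small floor avoids log(0) in empty frequency bins.
    log_power = [math.log(abs(c) ** 2 + floor) for c in dft(signal)]
    return [abs(c) ** 2 for c in dft(log_power)]


def pitch_period(signal, min_lag, max_lag):
    """Position (in samples) of the largest cepstral value between min_lag and max_lag."""
    c = cepstrum(signal)
    return max(range(min_lag, max_lag + 1), key=lambda q: c[q])


# Synthetic 'voiced' signal: one pulse every 50 samples.
signal = [1.0 if n % 50 == 0 else 0.0 for n in range(400)]
print(pitch_period(signal, 25, 80))
```

For this idealized pulse train the spectrum repeats every 8 frequency bins, so the cepstrum is concentrated at multiples of 400/8 = 50 samples; real speech gives a broader and noisier peak.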

It follows that the voice of a speaker cannot be described by one single utterance, because other utterances produce different data. Instead, it is necessary to process a set of utterances statistically to determine the characteristic features of a voice. This can be done systematically by pattern-recognition techniques.


Fig. 2. Fourier spectra (a) and cepstra (b) for the three utterances of vowel 'a' represented in fig. 1. Spectrum and cepstrum functions also appear not to be reproduced exactly. P power density. f frequency. C cepstrum function. T pitch period.

[2] E. Bunge, Automatic speaker recognition system AUROS for security systems and forensic voice identification, Proc. 1977 Int. Conf. on Crime Countermeasures - Science and engineering, Oxford 1977, pp. 1-7.
E. Bunge, Vergleichende systematische Untersuchung zur automatischen Identifikation und Verifikation kooperativer Sprecher, dissertation, Darmstadt 1977.

[3] A. M. Noll, Short-time spectrum and 'cepstrum' techniques for vocal-pitch detection, J. Acoust. Soc. Amer. 36, 296-302, 1964.

The large amount of data to be processed

The sampling theorem shows that with an upper limiting frequency of 5 kHz, 1 second of human speech can be represented by a string of 10 000 samples. With a dynamic range of 48 dB, each sample can be coded as an 8-bit computer word. Under these circumstances - which correspond to those in our experiment - the amount of information contained in 1 second of human speech is equal to 80 kilobits. Since many seconds of speech by many different speakers have to be stored and processed, problems of amount of storage and computer time are inevitable. Fortunately, the speech signal has high redundancy; the same characteristic features are encountered over and over again. These features can be extracted by specially developed processors, which is preferably done in real time. Instead of the speech signal it is only necessary to store these voice-specific features in the computer, to form the reference with which newly arrived speech utterances are compared in a manner similar to the procedures used in optical-pattern recognition by computers. The term 'pattern recognition' will therefore be used here.
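The figure of 80 kilobits per second of speech follows from two standard rules of thumb, which the short sketch below makes explicit: a sampling rate of twice the upper limiting frequency, and roughly 6 dB of dynamic range per bit of a linear code. The function name is illustrative.

```python
import math


def speech_data_rate(upper_freq_hz, dynamic_range_db):
    """Return (samples per second, bits per sample, bits per second)."""
    # Sampling theorem: sample at twice the upper limiting frequency.
    sample_rate = 2 * upper_freq_hz
    # Each bit of a linear code adds 20*log10(2), about 6.02 dB, of dynamic range.
    bits_per_sample = math.ceil(dynamic_range_db / (20 * math.log10(2)))
    return sample_rate, bits_per_sample, sample_rate * bits_per_sample


print(speech_data_rate(5000, 48))  # (10000, 8, 80000)
```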

General description of the AUROS equipment

A meaningful approach to the speaker-recognition problem is to combine real-time speech-analysis techniques with statistical pattern-recognition methods. But there is no a priori knowledge that indicates which combination of analysis and pattern-recognition methods will give the highest recognition rates for large populations. The AUROS equipment allows the merits of different combinations to be assessed. It consists of a general-purpose computer for pattern recognition and a set of the signal-analysis processors referred to above. These have been specially designed by us for this application; the voice features resulting from the signal analysis are stored in the computer memory. A dedicated computer for simulation is also available.

Most of the processors perform an analysis in real time on consecutive time segments of the speech signal. Each segment is described by a set of numbers representing quantities such as the output voltages of a filter bank used to measure the frequency spectrum. These numbers are considered as the components of a multidimensional vector. The probability-density distribution over the vector space, or a mean vector, is calculated from the consecutive vectors.

A second set of analysis procedures is based on linear transformations such as the fast Fourier transform, the Walsh-Hadamard transform and inverse filtering. They are used off-line to obtain pitch contours and cepstrum functions, and the frequencies and bandwidths of the higher-order speech resonances (the formants).

An explanation of the three linear transformations mentioned here is perhaps appropriate. The fast Fourier transform is simply a computer algorithm enabling us to carry out the complicated computations of the Fourier coefficients - which involve a large number of multiplications - in a reasonable time. The repeated computation of Fourier spectra required in some applications has only become feasible thanks to this algorithm [4].

Like the Fourier transform, the Walsh-Hadamard transform is used to compute a frequency spectrum. In place of the sine and cosine functions, however, it uses square-wave functions for the set of orthogonal functions required for the signal decomposition [5]. The square-wave functions only have the values +1 and -1; their use simplifies digital implementation.
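Because the basis functions take only the values +1 and -1, the transform needs nothing but additions and subtractions. A minimal sketch of the fast Walsh-Hadamard transform is given here for illustration; it returns the coefficients in natural (Hadamard) order rather than in order of increasing number of sign changes, and the function name is an assumption, not part of the AUROS software.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform (natural order); length must be a power of two."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly stage: only sums and differences, no multiplications.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = data[i], data[i + h]
                data[i], data[i + h] = a + b, a - b
        h *= 2
    return data


coefficients = fwht([1, 0, 1, 0])
print(coefficients)                    # [2, 2, 0, 0]
print([c * c for c in coefficients])   # a simple power spectrum: [4, 4, 0, 0]
```

Applying the transform twice returns the original sequence multiplied by its length, which is a convenient self-check.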

Inverse filtering, finally, is based on the observation that the frequency spectrum of the human voice, characterized by the frequencies and bandwidths of the formants, derives mainly from the acoustic resonances of the vocal tract; the excitation by the glottis - which sets up a pulse train - has in itself a rather flat frequency spectrum. If we succeed in filtering the speech signals in such a way that a flat spectrum is again obtained, then the filter has a transfer function that is the inverse of the acoustic transfer function of the speaker's vocal tract. The parameters of the inverse filter are therefore characteristic of the speaker [6].
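The inverse-filter idea can be tried on a synthetic signal. The sketch below assumes that the inverse filter is estimated by linear prediction (autocorrelation method with the Levinson-Durbin recursion), which is one common choice and not necessarily the AUROS implementation. The second-order 'vocal tract' resonance, the single excitation pulse and all function names are hypothetical.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation values r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def inverse_filter_coefficients(signal, order):
    """Levinson-Durbin recursion; returns [1, a1, ..., a_order] of the inverse filter."""
    r = autocorrelation(signal, order)
    a = [1.0] + [0.0] * order
    error = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / error                      # reflection coefficient
        previous = a[:]
        for j in range(1, m):
            a[j] = previous[j] + k * previous[m - j]
        a[m] = k
        error *= 1.0 - k * k
    return a


def apply_inverse_filter(signal, a):
    """Residual e[n] = x[n] + a1*x[n-1] + ... (samples before n = 0 taken as zero)."""
    return [sum(a[j] * signal[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(signal))]


# Hypothetical 'vocal tract': one resonance excited by a single pulse,
# x[n] = 1.3*x[n-1] - 0.8*x[n-2] + pulse[n].
x = []
for n in range(400):
    value = 1.0 if n == 0 else 0.0
    if n >= 1:
        value += 1.3 * x[n - 1]
    if n >= 2:
        value -= 0.8 * x[n - 2]
    x.append(value)

a = inverse_filter_coefficients(x, order=2)
residual = apply_inverse_filter(x, a)
print([round(c, 6) for c in a])   # approximately [1.0, -1.3, 0.8]
```

The estimated coefficients reproduce the resonance that shaped the signal, and the residual is again (very nearly) the single pulse, whose spectrum is flat.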

In the AUROS system the analysis data, obtained either in real time or by the more time-consuming linear transformations mentioned above, is fed to the general-purpose computer for statistical evaluation and 'classification', i.e. assignment to a known speaker. A large set of pattern-recognition methods is available.

A supervisory program allows a special analysis procedure and a pattern-recognition method to be selected and combined for special experiments. The input to the system takes the form of voice signals, which can either be received from microphones in an on-line experiment or recorded on magnetic tape for off-line experiments with large data bases. The output of the system takes the form of statistical data describing voices, recognition rates, error rates and rejection rates.

The procedure followed in speaker recognition can generally be described by three phases:
- preprocessing the speech signal,
- 'training' the system, and
- testing its reliability.
Each of these three phases will now be dealt with in more detail. This provides a natural opportunity to present an outline of several speech-analysis procedures and classification algorithms.

[4] J. W. Cooley and J. W. Tukey, An algorithm for the machine calculation of complex Fourier series, Math. of Comp. 19, 297-301, 1965.

[5] A good introduction is given by W. K. Pratt, J. Kane and H. C. Andrews, Hadamard transform image coding, Proc. IEEE 57, 58-68, 1969.

[6] J. D. Markel, Digital inverse filtering - a new tool for formant trajectory estimation, IEEE Trans. AU-20, 129-137, 1972.


Fig. 3. Four analyses of a 20-ms segment of vowel 'a'. Even in such a short segment the speech data is characteristic of the individual speaker and can provide sufficient data for voice patterns to be recognized by a computer. a) Fourier power spectrum. b) Power spectrum derived by the Walsh-Hadamard transform. c) Cepstrum function. d) Autocorrelation function. r correlation coefficient. τ time shift.

Preprocessing phase

As was stated earlier, the acoustic speech signal contains information concerning what, who and how, and the most suitable approach is therefore not to use the redundant speech signal for the recognition procedure, but to describe it by a set of features that are typical of the speaker's identity. This is what has to be done in the preprocessing phase. In speaker recognition there are three general ways of feature extraction. One way is to rely on the analysis of one short preselected time segment of the speech utterance. Another method is to measure contours such as those of pitch or energy content, during a specific utterance. The third way is to compute the mean (or the probability-density distribution) of the feature vectors characterizing successive time segments. Each of the three methods was studied by means of the AUROS system.


Short-time analysis of a preselected segment

The idea behind the procedure of analysis of a single preselected segment is that the speaker-specific information is already contained in a segment of a few tens of milliseconds of speech. So, using a segmentation algorithm, a special segment of this length, e.g. a few pitch periods of a nasal sound, is selected from the acoustical signal. The same segment is selected from the same 'code word' for all the speakers. Operations such as a Fourier transform, a Walsh-Hadamard transform or a cepstrum transform are performed on this segment. The coefficients of these transforms are the components of feature vectors used in the pattern-recognition procedure. Fig. 3 shows four examples of an analysis of a 20-ms segment of vowel 'a'.

The problem in short-term feature extraction is the accuracy of the preceding segmentation algorithm. If the defined segment in the acoustic signal is not located exactly, the following steps are not meaningful since representations of physically different events are being compared.

Contour analysis

Contour analysis describes the temporal structure of an utterance. The acoustic signal is segmented equidistantly. For each segment only a single feature, e.g. the pitch period or the energy, is evaluated. The sequence of these segment features forms the resulting contour curve. Fig. 4 shows as an example the pitch contour of the sentence 'My name is Nemo' which was obtained by applying cepstrum analysis and peak detection to the segments (about 100) of this speech utterance.

The results can be combined to form a feature vector of a dimensionality equal to the number of segments. The consequence of this is that if a given sentence is spoken slowly the dimensionality will be high (because many segments are produced), and if it is pronounced rapidly a low dimensionality will result. However, since the pattern-recognition algorithms assume equal dimensionality for each speaker, complicated time-normalization methods have to be applied to the contour vectors to stretch them for fast speakers and to compress them for slow speakers. These time-normalization or time-warping methods are complicated and require a great deal of computer time [7].

The use of contour analysis for feature extraction in speaker recognition entails a restriction to code-word-related recognition, as with the previous method. Text-independent recognition is not possible because the time-normalization methods are adapted to a given code word.
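To make the dimensionality problem concrete, the sketch below brings contours of different lengths to a common length by uniform linear interpolation. This is only the simplest possible stand-in, chosen here for illustration: the time-warping methods referred to in the text [7] are nonlinear and considerably more elaborate. The function name and the sample values are hypothetical.

```python
def resample_contour(contour, target_length):
    """Linearly stretch or compress `contour` to exactly `target_length` points."""
    if len(contour) < 2 or target_length < 2:
        raise ValueError("need at least two points on both sides")
    scale = (len(contour) - 1) / (target_length - 1)
    result = []
    for i in range(target_length):
        position = i * scale
        left = min(int(position), len(contour) - 2)
        fraction = position - left
        # Interpolate between the two neighbouring contour values.
        result.append(contour[left] * (1 - fraction) + contour[left + 1] * fraction)
    return result


fast_speaker = [0.0, 2.0, 4.0]              # few segments: stretch
slow_speaker = [0.0, 1.0, 2.0, 3.0, 4.0]    # many segments: compress
print(resample_contour(fast_speaker, 5))    # [0.0, 1.0, 2.0, 3.0, 4.0]
print(resample_contour(slow_speaker, 3))    # [0.0, 2.0, 4.0]
```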

Fig. 4. Contour analysis describes the temporal structure of an utterance. Here the value of the pitch period T has been evaluated for each 20-ms segment, and the sequence of pitch values is combined to form the pitch contour curve of the sentence 'My name is Nemo'. These contour curves differ from speaker to speaker and can function as specific features in a recognition procedure.

Statistical speech-signal analysis

The speech signal is divided into a sequence of 20-ms segments by equidistant segmentation. A set of coefficients is evaluated for each segment by one of the available methods of analysis; these include the linear transformations mentioned earlier and also the determination of the autocorrelation function. As mentioned earlier, a feature vector is formed from the sequence of these sets of coefficients, either by evaluating the mean-value vector or by approximating probability densities of event groups.

This kind of statistical feature extraction will give speaker recognition without the condition that all the speakers speak the same prescribed text; however, we found that the processed speech signal should last at least 10 seconds. There are no segmentation or time-normalization problems. These advantages turn out to be of considerable importance, which is why, in the course of our investigations, we have come to prefer this statistical method in our research project. As an example of this technique, fig. 5 shows the averaged long-term Fourier spectrum, the long-term Walsh-Hadamard transform, the long-term cepstrum, and the long-term autocorrelation function. All of these have been evaluated from the same text.
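The mean-value variant of this procedure is easy to sketch. The example below assumes normalized autocorrelation coefficients as the per-segment features and a segment length of 200 samples (20 ms at a 10 kHz sampling rate); the test signal and the function names are illustrative only.

```python
def segments(signal, segment_length):
    """Split `signal` into consecutive, non-overlapping segments of equal length."""
    return [signal[i:i + segment_length]
            for i in range(0, len(signal) - segment_length + 1, segment_length)]


def autocorrelation_features(segment, max_lag):
    """Normalized autocorrelation coefficients r(1)..r(max_lag) of one segment."""
    energy = sum(s * s for s in segment)
    if energy == 0.0:
        return [0.0] * max_lag
    return [sum(segment[i] * segment[i + lag] for i in range(len(segment) - lag)) / energy
            for lag in range(1, max_lag + 1)]


def long_term_feature_vector(signal, segment_length, max_lag):
    """Mean of the per-segment feature vectors over the whole utterance."""
    vectors = [autocorrelation_features(seg, max_lag)
               for seg in segments(signal, segment_length)]
    if not vectors:
        raise ValueError("signal is shorter than one segment")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Illustrative signal: 400 samples alternating between +1 and -1.
signal = [1.0, -1.0] * 200
print(long_term_feature_vector(signal, segment_length=200, max_lag=2))
```

Because the averaging runs over all segments regardless of their content, neither a segmentation algorithm nor time normalization is needed, which is the advantage stressed in the text.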

Training phase

To characterize the voice of a speaker it is necessary to combine the analysis results of several utterances (at least ten) into a typical voice reference. Because of the poor reproducibility of human speech a single utterance does not contain a sufficient amount of information. It is in the training phase that a reference 'pattern' for each voice is stored in the system.

Each feature vector generated by preprocessing the speech signal can be mapped as a pattern point in an N-dimensional feature space, where N is a quantity such as the number of Fourier coefficients. Each utterance of a speaker produces a new point in this space. The entirety of all the pattern points of the training phase forms a 'cloud' in the N-dimensional space.

Different methods can be followed to find out whether a pattern point produced by an utterance from an unknown speaker must be assumed to belong to a given speaker or not. The method chosen determines which further operations the voice data will have to undergo while the system is being trained. One method is to evaluate the distance from the point to the centre of gravity of the cloud of pattern points belonging to the known speaker. Another method establishes whether the new point lies inside the (hyper)sphere enveloping the cloud. A third method divides the N-dimensional space into subspaces and determines the probability for each subspace that a point in it will belong to any of the given speakers. The operations performed during the training phase on the points in the N-dimensional space in the first case consist in the evaluation of the centre of gravity of the cloud of points, in the second case in the construction of the (hyper)sphere, in the last case in the determination of the probabilities for the individual subspaces.

Fig. 5. Long-term averages over an utterance about 10 s long of Fourier spectrum (a), Walsh-Hadamard spectrum (b), cepstrum function (c), and autocorrelation function (d). The statistical feature-extraction method avoids the problems of segmentation and time normalization and permits text-independent speaker recognition.

We have used the three methods outlined here in speaker-recognition experiments. There were many modifications for measurements of distance, data normalization and probability-density approximation. These methods will now be described in more detail. The assignment of a point to a given speaker is called 'classification', which is related to the term 'class' used for the array of pattern points of a speaker; the algorithm performing the assignment is called a 'classificator'.

Minimum-distance classificator

The centre of gravity of the cloud of pattern points is evaluated independently for each speaker and stored as reference vector. An unknown pattern point is classified by evaluating the distances from this point to the centres of gravity of each cloud. This point is then associated with the cloud whose centre of gravity is the nearest.


The distance measure used for defining the proximity of points may be the usual Euclidean distance. The distance measure may also be based on the correlation between the two corresponding feature vectors. The correlation is at a maximum when the two vectors coincide; the distance may be defined as the amount by which the correlation falls short of this maximum.

[7] G. R. Doddington, A computer method of speaker verification, dissertation, Madison 1970.


Let the feature vectors be X and Y with components $(x_1, x_2, \ldots, x_N)$ and $(y_1, y_2, \ldots, y_N)$. The Euclidean distance is

$$d_E(X,Y) = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}. \tag{1}$$

The correlation between the two vectors is expressed by the correlation coefficient, whose maximum value is 1. By subtracting the true correlation coefficient from 1 we obtain the distance measure

$$d_C(X,Y) = 1 - r(X,Y), \tag{2}$$

where $r(X,Y)$ denotes the correlation coefficient of X and Y.

Both distance measures can be altered by weighting each dimension by the inverse of its variance between different utterances of the same speaker, to base the decision mainly on the parameters of the feature vector with the highest reliability.
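The minimum-distance classificator and its two distance measures can be sketched in a few lines. The example assumes that the correlation coefficient is the usual Pearson coefficient (the exact definition used in AUROS is not given here), and the training data and function names are invented for illustration.

```python
import math


def centroid(points):
    """Centre of gravity of a list of equal-length feature vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def inverse_variance_weights(points, floor=1e-9):
    """Weight per dimension: inverse of the variance over one speaker's utterances."""
    centre = centroid(points)
    variances = [sum((p[i] - centre[i]) ** 2 for p in points) / len(points)
                 for i in range(len(centre))]
    return [1.0 / max(v, floor) for v in variances]


def euclidean_distance(x, y, weights=None):
    """Euclidean distance, optionally weighted per dimension."""
    weights = weights or [1.0] * len(x)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def correlation_distance(x, y):
    """1 - r, with r taken here as the Pearson correlation coefficient (an assumption)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return 1.0 - num / den if den else 1.0


def classify(point, references, distance=euclidean_distance):
    """Return the speaker whose reference vector is nearest to `point`."""
    return min(references, key=lambda speaker: distance(point, references[speaker]))


# Invented two-dimensional training data for two speakers.
training = {"A": [[1.0, 2.0], [3.0, 2.0]], "B": [[10.0, 9.0], [12.0, 11.0]]}
references = {speaker: centroid(points) for speaker, points in training.items()}
print(classify([2.5, 3.0], references))   # A
```

The weights returned by `inverse_variance_weights` can be passed to `euclidean_distance` so that dimensions which vary little between utterances of the same speaker dominate the decision.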

Distribution-free tolerance-region classificator

The region occupied in multidimensional space by a cloud of pattern points can also be taken as its prime characteristic, regardless of the distribution of the points over this region. This region may be demarcated by evaluating an enveloping hypersphere of the cloud of pattern points in a manner that avoids intersections between different classes. The mathematical equation of this hypersphere defines the tolerance region inside which a point must fall to be classified with the corresponding speaker, and thus represents his voice characteristics. To classify an unknown pattern point a check is made for each class to see whether the point lies within its hypersphere or not. If the point cannot be found in any tolerance region, it will be rejected.
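A rough sketch of such a tolerance-region classificator follows. It makes two simplifying assumptions that are not taken from the article: each sphere is centred on the centre of gravity of the cloud, with a radius reaching the farthest training point, and a point that falls inside more than one sphere is rejected rather than the spheres being constructed so that they cannot intersect. Data and names are invented.

```python
import math


def train_hypersphere(points):
    """Return (centre, radius) of a sphere around the centroid enclosing all points."""
    centre = [sum(column) / len(points) for column in zip(*points)]
    radius = max(math.dist(centre, p) for p in points)
    return centre, radius


def classify_by_region(point, regions):
    """Speaker whose sphere contains `point`; None (rejection) if none or several do."""
    hits = [speaker for speaker, (centre, radius) in regions.items()
            if math.dist(point, centre) <= radius]
    return hits[0] if len(hits) == 1 else None


training = {"A": [[0.0, 0.0], [2.0, 0.0]], "B": [[10.0, 0.0], [12.0, 0.0]]}
regions = {speaker: train_hypersphere(points) for speaker, points in training.items()}
print(classify_by_region([1.5, 0.5], regions))   # A
print(classify_by_region([5.0, 5.0], regions))   # None: outside every tolerance region
```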

Minimum-risk classificator

With the minimum-risk classificator, the N-dimensional pattern space is subdivided into subspaces of equal volume. For each subspace a conditional probability is approximated by counting the number of pattern points in this volume independently for each speaker. This conditional probability indicates the likelihood that a pattern point, given that it belongs to a certain speaker, will lie in that subspace. For classification of a pattern point a check is made to find out which subspace it lies in, and the values of the corresponding conditional probabilities for this subspace are compared. The unknown point is then classified with the speaker having the highest conditional probability. In this way the risk of loss due to a wrong classification is minimized. Fig. 6 shows the principle for the one-dimensional case.
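The counting procedure can be illustrated for the one-dimensional case. In the sketch below the subspaces are cells of equal size on a regular grid; rejecting a point that falls in a cell where no training point was seen is an added assumption, as are the data and the function names.

```python
from collections import Counter


def cell_of(point, cell_size):
    """Index tuple of the equal-volume cell (subspace) containing `point`."""
    return tuple(int(v // cell_size) for v in point)


def train_cells(training, cell_size):
    """Estimate P(cell | speaker) by counting each speaker's training points per cell."""
    model = {}
    for speaker, points in training.items():
        counts = Counter(cell_of(p, cell_size) for p in points)
        model[speaker] = {cell: c / len(points) for cell, c in counts.items()}
    return model


def classify_by_cell(point, model, cell_size):
    """Speaker with the highest conditional probability in the point's cell."""
    cell = cell_of(point, cell_size)
    scores = {speaker: probs.get(cell, 0.0) for speaker, probs in model.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0.0 else None   # empty cell: reject (assumption)


# Invented one-dimensional training data for two classes.
training = {"S1": [[0.2], [1.1], [1.4], [1.7]], "S2": [[1.6], [2.3], [2.8], [2.1]]}
model = train_cells(training, cell_size=1.0)
print(classify_by_cell([1.5], model, 1.0))   # S1 (0.75 against 0.25 in that cell)
print(classify_by_cell([2.5], model, 1.0))   # S2
```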


Reliability-test phase

During the reliability-test phase, utterances that have not been used for training are offered to the system for classification. In a speaker-identification test the utterance belonging to one of the speakers entered into the system may be either correctly classified, or wrongly classified, or rejected as unclassifiable. The wrong classifications and rejections together determine the typical error rate for the system performance. Wrong classifications cause speaker confusion; by counting the number of confusions the probability of confusion of each pair of speakers can be represented in a 'confusion matrix'.
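The bookkeeping of such an identification test is simple to sketch. In the example below each trial is a pair of the true speaker and the decision of the system, with `None` standing for a rejection; the trial data and names are invented.

```python
from collections import Counter


def evaluate(trials):
    """Summarize (true_speaker, decision) pairs; a decision of None means 'rejected'."""
    trials = list(trials)
    confusion = Counter()
    wrong = rejected = 0
    for true_speaker, decision in trials:
        if decision is None:
            rejected += 1
        elif decision != true_speaker:
            wrong += 1
            confusion[(true_speaker, decision)] += 1   # one entry of the confusion matrix
    total = len(trials)
    return {
        "error_rate": (wrong + rejected) / total,      # wrong classifications plus rejections
        "rejection_rate": rejected / total,
        "confusion": confusion,
    }


trials = [("A", "A"), ("A", "B"), ("B", "B"), ("B", None)]
result = evaluate(trials)
print(result["error_rate"], result["rejection_rate"], dict(result["confusion"]))
```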

In speaker verification the system only has to decide between the alternatives of acceptance or rejection. An error may be either the false rejection of a customer or the false acceptance of an impostor. Fig. 7 shows a simplified speaker-verification experiment with real data. Speaker X spoke a test sentence 15 times. A long-term spectrum was evaluated for each utterance. Fig. 7a shows a plot of these 15 spectra. The non-reproducibility is clearly seen. In a 'model' training phase, a tolerance region was constructed in the form of a mask by connecting all the minimum and maximum values (fig. 7b); no consideration was given to the distribution of the values between the maximum and minimum. For verification, the speaker X repeated the same text, and the result of the analysis was compared with the reference. It was accepted, as this long-term spectrum was within the tolerance region (fig. 7c). When speakers Y and Z claimed to be speaker X, they were rejected, because their curves did not fit the tolerance-region template (fig. 7d).
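The mask procedure of this simplified experiment can be written down directly. The sketch below uses three invented four-channel 'spectra' instead of the fifteen measured long-term spectra of the experiment; the function names are illustrative.

```python
def build_mask(spectra):
    """Tolerance mask: per-channel minimum and maximum over the training spectra."""
    lower = [min(channel) for channel in zip(*spectra)]
    upper = [max(channel) for channel in zip(*spectra)]
    return lower, upper


def verify(spectrum, mask):
    """Accept only if every channel value lies inside the mask."""
    lower, upper = mask
    return all(lo <= value <= hi for lo, value, hi in zip(lower, spectrum, upper))


# Invented training spectra of the claimed speaker (four channels each).
training_spectra = [
    [10.0, 20.0, 15.0, 5.0],
    [12.0, 18.0, 16.0, 6.0],
    [11.0, 19.0, 14.0, 4.0],
]
mask = build_mask(training_spectra)
print(verify([11.5, 19.5, 15.5, 5.5], mask))   # True: inside the mask, accepted
print(verify([11.5, 25.0, 15.5, 5.5], mask))   # False: channel 2 outside, rejected
```

As in the text, the mask takes no account of how the training values are distributed between the minimum and the maximum.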


Fig. 6. Principle of the minimum-risk classificator shown for a one-dimensional case. One-dimensional space (the abscissa x) is subdivided into equal lengths for each of which the number of points belonging to the different classes is counted (two classes S1 and S2 are shown). From these numbers it is possible to calculate the conditional probability CP(x) that a point, given its class membership, will lie in the subspace with abscissa x. An unknown point in subspace x will be assigned to the class with the highest conditional probability CP(x) in that subspace. In the figure for abscissa x1 this is class S1. The risk of a wrong classification is then at a minimum.

Fig. 7. Example of a speaker-verification experiment with real data. a) Speaker X speaks a sentence fifteen times. The long-term spectrum is evaluated each time. n number of frequency channel. b) From the fifteen long-term spectra a speaker-specific template is constructed by connecting all the minimum and maximum values. c) A new long-term spectrum from speaker X fits the template, and he is accepted. d) Long-term spectra from speakers Y and Z do not fit the template. These speakers are rejected.

Test results obtained with the AUROS system

A number of classification procedures have been tested using the AUROS system, each of them employing several kinds of voice features, some of which have been mentioned above. In experimenting with different kinds of voice features the advantages of long-term spectra - the absence of segmentation and time normalization - turned out to be very important, as was the possibility of text-independent voice-feature extraction. We therefore found it preferable to use the long-term spectra for our most important tests; the results of these tests will now be reported.

In the AUROS system the long-term spectra are derived in real time, and this is very important for any practical application. Spectral analysis is performed by a 43-channel filter bank covering the frequency band from 100 Hz to 6 kHz; each filter has a bandwidth of nearly 10 per cent of the centre frequency. Equidistant segmentation is carried out by scanning the filter outputs every 20 ms. The short-term spectra are averaged by a special digital multiplex integrator, which also performs a rough energy normalization. For a comparative study 5000 utterances of 82 male and 18 female subjects have been processed. Fig. 8 shows long-term spectra of a 12-s utterance of two men (a) and two women (b). The absence of energy in the first three channels for the women indicates the higher pitch of their voices.

It turns out that for a given data base the variations in the recognition rate with the chosen pattern-recognition algorithm are considerable. Each algorithm only operates in the optimum fashion when a set of a priori assumptions is satisfied, e.g. concerning the kind of distribution of the features or the correlation and covariance of the data of each individual speaker (which may be expressed by a correlation and a covariance matrix). For that reason, investigation of the statistics of the data base is necessary to adapt the pattern-recognition algorithm to the data structure.

From the correlation matrix of the 43 frequency parameters averaged over 2500 long-term spectra (see fig. 9) it can be seen that the 43 parameters of the feature vector are not statistically independent but correlated. The average correlation coefficient is 0.4. This correlation has to be taken into account when designing an appropriate pattern-recognition algorithm.

For optimum performance of most of the pattern-recognition algorithms it is assumed that the components of the feature vectors are not cross-correlated. It can be shown from vector analysis that this, if it is indeed possible, is only the case in one particular position of the system of basic vectors of the vector space. The basic vectors are brought into this position by rotations in accordance with data provided by the covariance matrix mentioned above. In this way decorrelated, 'whitened' feature vectors are obtained. Applying the minimum-distance classificator with Euclidean distance to the whitened data corresponds to 'Mahalanobis classification' [8]; in our tests this yielded error-free identification for all the speakers tested.
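The equivalence between Euclidean distance on whitened data and the Mahalanobis distance can be checked numerically. The sketch below is an assumption-laden illustration: instead of the rotation of the basic vectors described in the text it whitens with a Cholesky factor of the covariance matrix, a different whitening transform that yields the same distances, and the 2 x 2 covariance matrix and the two points are invented toy values.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def whiten(x, L):
    """Solve L z = x by forward substitution (z is the whitened vector)."""
    z = []
    for i in range(len(x)):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((x[i] - s) / L[i][i])
    return z


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mahalanobis_2d(u, v, cov):
    """Direct Mahalanobis distance for the 2 x 2 case, using the explicit inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = u[0] - v[0], u[1] - v[1]
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


# Hypothetical covariance matrix with correlated components, and two points.
cov = [[4.0, 2.0], [2.0, 3.0]]
u, v = [1.0, 2.0], [3.0, 1.0]
L = cholesky(cov)
d_whitened = euclidean(whiten(u, L), whiten(v, L))
d_direct = mahalanobis_2d(u, v, cov)
print(d_whitened, d_direct)
```

Both numbers agree to rounding error. The cost mentioned later in the text is visible even here: the covariance matrix must be stored and factorized, whereas the plain minimum-distance classificator needs only the class centres.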

Whenever recognition rates are being compared, it is necessary to consider all the assumptions and boundary conditions. In speaker-identification tests the Mahalanobis classificator achieved recognition rates of 100% for 2500 utterances of 50 speakers; the utterances were the same apart from the name of the speaker given in each utterance. However, the Mahalanobis classificator is a rather 'expensive' algorithm in storage capacity and computer time. 'Economical' classificators like the minimum-distance classificator or minimum-risk classificator (with histogram approximation) were considerably faster and required far less storage capacity, while the recognition rates did not drop below 98.5%.

Fig. 8. Long-term-averaged frequency spectra of two male voices (a) and two female voices (b). The spectra, which cover frequencies between 100 Hz and 6000 Hz, are obtained by scanning a 43-channel filter bank 50 times a second and adding the output voltages for every separate filter in a multiplex digital integrator. Because of their higher pitch the female voices do not give energy in the lowest three frequency channels.

Speaker identification

For the recognition experiments the 5000 processed utterances have been divided into two data bases of 2500 utterances each. This allowed code-word-related recognition and text-independent recognition to be investigated separately. The speaker-identification tests were performed with 'economical' pattern recognition algorithms. All the speakers tested were known to the system and the system was made to classify any utterance presented; rejection as 'unclassifiable' was not allowed. Recognition rates of 99.5% were achieved for both data bases, code-word related as well as text-independent. For training the classificators 10 to 20 utterances seem to be necessary to obtain this high recognition accuracy.

Speaker verification

While in speaker identification only one rate - that for correct classification - has to be considered, a speaker-verification experiment is characterized by two error rates: the rate for false acceptance (mix-up) and the rate for false rejection. There is a functional relation between these two error rates, given by the decision threshold. During the training phase of speaker verification, in addition to the reference pattern, this threshold is evaluated. If a pattern with a claimed class membership has to be verified, it is compared with the reference of that class using the classificator-typical similarity measure. If the similarity is sufficient, that is, if the distance from the reference does not exceed a given threshold, the claimed class membership is confirmed and the speaker is accepted; otherwise the speaker will be rejected.

The relation between the threshold value and the error rates for false acceptance and false rejection is easily seen: if the threshold value is too low, many speakers will be falsely rejected because of the variability of their utterances. On the other hand, if the threshold value is too high, speakers with similar voices will be falsely accepted and thus be mixed up with the real speakers. So the value of the threshold is a trade-off between security of the system (low false-accept rate) and customer convenience (low false-reject rate). The interdependence of these three variables is shown in fig. 10 for a minimum-distance classificator.

Fig. 9. Cross-correlation between the frequency channel outputs of the filter bank is expressed by vertical deviation in the figure; n is the channel number. The figure is the average for 2500 long-term spectra. The highest correlation is between adjacent channels.

Various methods for decision-threshold evaluation by the computer have been compared. The most effective method is to evaluate the mean of the distance measure determined by the absence of correlation (see above, p. 213). First, the mean is taken within each class, for the distance between each pair of points, and then the results are averaged over all the classes.

[8] K. Fukunaga, Introduction to statistical pattern recognition, Academic Press, New York 1972.


Fig. 10. The dependence of false-accept rate (curve FA) and false-reject rate (curve FR) on threshold level Th in a speaker-verification system. A higher threshold value Th means that a greater distance is permitted between the pattern point of the speaker claiming to be speaker X and the centre of gravity of speaker X's cloud of pattern points.
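The opposite trends of the two curves in fig. 10 follow directly from the decision rule and can be reproduced with a short Python sketch. The function name `error_rates` and the two lists of distances are hypothetical; the only assumption taken from the text is that a claim is accepted when the distance to the claimed speaker's reference does not exceed the threshold.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false-accept rate, false-reject rate) for one threshold value.

    genuine holds distances measured for true claims, impostor those for
    false claims; a claim is accepted when its distance <= threshold.
    """
    false_reject = sum(d > threshold for d in genuine) / len(genuine)
    false_accept = sum(d <= threshold for d in impostor) / len(impostor)
    return false_accept, false_reject


# Hypothetical distances to the reference of speaker X (toy values).
genuine = [0.02, 0.03, 0.05, 0.08]
impostor = [0.06, 0.10, 0.12, 0.20]

for th in (0.04, 0.07, 0.15):
    fa, fr = error_rates(genuine, impostor, th)
    print(th, fa, fr)
```

Raising the threshold lowers the false-reject rate and raises the false-accept rate; where the two distance distributions overlap, no threshold brings both to zero.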

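The distance measure referred to here is $d_c$, expressed by eq. (2). The threshold value obtained by averaging $d_c$ over all pairs of utterances $j$, $l$ within each class $k$ and then over all classes is

$$Th = \frac{R}{S}\sum_{k=1}^{S}\;\underset{j \neq l}{\operatorname{mean}}\left[\,1 - \frac{\sum_{i=1}^{N} x_{ijk}\,x_{ilk}}{\sqrt{\sum_{i=1}^{N} x_{ijk}^{2}\,\sum_{i=1}^{N} x_{ilk}^{2}}}\,\right], \qquad j, l = 1, \dots, M, \tag{3}$$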

where S is the number of speaker classes and M the number of utterances per class. R is a scaling factor allowing the threshold value to be shifted along the horizontal axis in fig. 10 to obtain the desired false-accept/false-reject ratio. Generally R = 1; in particular cases, however, R is adjusted by hand to another value.
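A minimal Python sketch of this threshold evaluation is given below. It assumes that the distance is one minus the normalized correlation of two feature vectors and that the within-class mean runs over all distinct pairs of utterances; both the pairing convention and the names `correlation_distance` and `threshold` are assumptions of this example, and the two toy classes are invented.

```python
import math
from itertools import combinations


def correlation_distance(x, y):
    """One minus the normalized correlation of two feature vectors."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return 1.0 - num / den


def threshold(classes, R=1.0):
    """classes is a list of S classes, each a list of M feature vectors.

    The distance is averaged over all distinct pairs within each class,
    the class means are averaged over the classes, and the result is
    scaled by the factor R.
    """
    class_means = []
    for utterances in classes:
        distances = [correlation_distance(x, y)
                     for x, y in combinations(utterances, 2)]
        class_means.append(sum(distances) / len(distances))
    return R * sum(class_means) / len(class_means)


# Hypothetical data: one class with two orthogonal vectors (distance 1),
# one class with three identical vectors (distance 0).
classes = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],
]
print(threshold(classes))
print(threshold(classes, R=2.0))
```

Because R multiplies the whole average, adjusting it by hand shifts the operating point along the threshold axis without any retraining.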

Fig. 11. During the training phase with a growing number n of samples offered to the speaker-verification system the similarity-threshold level Th gradually approaches a steady value that is high enough to make false rejections extremely unlikely. Some 20 samples appear to be required to reach a steady level.


It is important to know how many speech samples have to be used for training to establish a suitable threshold value. Fig. 11 shows the threshold value as a function of the sample size of the training phase. The threshold was initially evaluated using two samples and the third sample of the same class was verified. Then this third sample was included for training, a new threshold was evaluated and the fourth sample was verified, and so on up to 49 samples for training. The threshold value is drawn as a stepped curve, while the actual distance for verification is plotted as a bar. As long as the bar is below the threshold, the sample is correctly accepted, otherwise it is falsely rejected as in the case of the third bar. By adding this sample to the training phase the threshold, which was obviously too small, is increased, and the following samples are classified correctly. After another correction by the ninth bar, the threshold adapts almost asymptotically to a value at which no false rejects occur. For this picture the false acceptance rate was 0%.
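The stepwise procedure can be imitated with the Python sketch below. Several details are assumptions made for the illustration: the reference is taken as the mean vector of the training samples, the threshold as the mean pairwise correlation distance among them (scaled by R), and the four two-component samples are invented so that the third one is falsely rejected, as in the figure.

```python
import math
from itertools import combinations


def corr_dist(x, y):
    """One minus the normalized correlation of two feature vectors."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return 1.0 - num / den


def mean_vector(vectors):
    return [sum(channel) / len(vectors) for channel in zip(*vectors)]


def run_training(samples, R=1.0, start=2):
    """Verify sample n against the first n samples, then add it to training.

    Returns a list of (training size, threshold, distance, accepted).
    """
    log = []
    for n in range(start, len(samples)):
        training = samples[:n]
        pair_d = [corr_dist(x, y) for x, y in combinations(training, 2)]
        th = R * sum(pair_d) / len(pair_d)
        d = corr_dist(samples[n], mean_vector(training))
        log.append((n, th, d, d <= th))
    return log


# Hypothetical utterances of one speaker; the third deviates strongly.
samples = [[1.0, 0.0], [1.0, 0.1], [1.0, 0.5], [1.0, 0.05]]
log = run_training(samples)
for entry in log:
    print(entry)
```

With these toy values the third sample exceeds the first, too small threshold and is falsely rejected; once it has been added to the training set the threshold rises and the fourth sample is accepted.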

In general, the verification experiments showed that 10 to 20 samples for the training phase seem to be necessary to obtain reliable results.

Applying the verification system to the two voice-data bases of 2500 samples yielded error rates of about 1% for false accept and false reject in both cases, code-word related as well as text-independent.

The effects of transmission by telephone line

For commercial application of a speaker-recognition system it is important to know how telephone transmission of the speech signal affects the security.

Transmitting speech over a telephone channel means frequency-band limitation from 300 Hz to 3500 Hz. The transfer characteristic in this frequency range is not flat but changes its shape depending on the line selected by dialling. The spectra available from telephone lines are therefore band-limited and also multiplied by a transfer function of unknown shape. As a first step it could be shown that band-limiting the long-term spectra to telephone quality did not affect the recognition rate significantly. However, weighting the spectra by arbitrary transfer functions made recognition based on long-term-averaged spectra unreliable because in some cases the effect of the transfer function on the spectra outweighed the voice characteristics. To overcome this difficulty a novel feature set has been introduced to eliminate the effect of the unknown transfer functions. These features, which we call the modified standard-deviation profiles (MSP), are a function of the voice characteristics only and are not affected by the channel transfer function.

The j-th component $S_{bj}$ of this novel feature vector is expressed by

$$S_{bj} = \frac{\displaystyle\sum_{i=1}^{I}\left|\,a_j x_{ij} - \frac{1}{I}\sum_{i'=1}^{I} a_j x_{i'j}\right|}{\displaystyle\frac{1}{I}\sum_{i=1}^{I} a_j x_{ij}} \tag{4}$$

In the numerator a modified standard deviation of the j-th spectral component of the i-th short-term spectrum is evaluated, where I is the number of spectra taken. A large value of the numerator points to a high short-term variability of the speech signal. The numerator still contains the frequency-weighting factor $a_j$ of the unknown transfer function. The average in the denominator, on the other hand, describes the long-term properties of the voice on which the unknown line characteristic is superimposed. Since the unknown weighting factor $a_j$ occurs in both numerator and denominator it vanishes and in this way the effect of the unknown transfer function is eliminated. Fig. 12 shows a long-term spectrum (a) and an MSP vector (b). Simulating a telephone line by a characteristic like that in fig. 12c or f changes the long-term spectrum to the curve in fig. 12d or g respectively, while the MSP vector remains unchanged (e, h). First recognition experiments with these MSP vectors and related functions yield promising results.

Fig. 12. The effect of a hypothetical, erratic telephone-line transfer function (c or f) on a long-term spectrum (a) is shown by (d) and (g) respectively, the effect on the modified standard-deviation profile (b) is shown to be non-existent (e, h). First recognition experiments using the modified standard-deviation profile as a feature vector yield promising results.
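The cancellation of the weighting factor is easy to verify numerically. The Python sketch below assumes that the numerator is the summed absolute deviation of each channel from its mean over the short-term spectra; any constant normalization would leave the invariance untouched. The names `long_term_spectrum`, `msp` and `apply_channel`, the three-channel toy frames and the channel gains are all hypothetical.

```python
def long_term_spectrum(frames):
    """Per-channel average over all short-term spectra."""
    return [sum(channel) / len(frames) for channel in zip(*frames)]


def msp(frames):
    """Modified standard-deviation profile (assumed form).

    Per channel: summed absolute deviation from the channel mean,
    divided by the channel mean.
    """
    n = len(frames)
    profile = []
    for channel in zip(*frames):
        mean = sum(channel) / n
        deviation = sum(abs(v - mean) for v in channel)
        profile.append(deviation / mean)
    return profile


def apply_channel(frames, gains):
    """Weight every short-term spectrum by a fixed per-channel gain a_j."""
    return [[g * v for g, v in zip(gains, frame)] for frame in frames]


# Hypothetical short-term spectra (rows) over three channels (columns).
frames = [
    [1.0, 4.0, 2.0],
    [3.0, 4.0, 6.0],
    [2.0, 1.0, 4.0],
]
gains = [0.5, 2.0, 0.25]  # hypothetical, uneven line characteristic
filtered = apply_channel(frames, gains)

print(long_term_spectrum(frames), long_term_spectrum(filtered))
print(msp(frames), msp(filtered))
```

The long-term spectrum changes with the line characteristic, whereas the profile does not, provided that the gain of each channel is positive and constant over the utterance.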

The work described was sponsored by the German Federal Ministry for Research and Technology under contract No. 0812014 Kap. No. 3004 Titel 68301. Responsibility for the contents of this publication rests with the author.

Summary. AUROS, an experimental system for speaker recognition by computer, has been developed at Philips Forschungslaboratorium Hamburg, in West Germany, in a government-sponsored research project. The system comprises several speech-signal processors which extract different characteristic features from the signal in real time, and can also use procedures such as the fast Fourier transform, the Walsh-Hadamard transform, and computation of the autocorrelation and cepstrum functions. A general-purpose computer is programmed to compare the features with the stored data from a group of known speakers; 5000 utterances from 100 speakers are stored in a data base. Several different measures of similarity are being tried. Some of these are more economical than others in computer time and storage capacity, but yield slightly higher error rates. In speaker-identification experiments using economical classificators recognition rates of 99.5% have been obtained. Experiments where a speaker's claimed identity is verified yield error rates of about 1% for false acceptance and false rejection.


Recent scientific publications

These publications are contributed by staff of laboratories and plants which form part of or cooperate with enterprises of the Philips group of companies, particularly by staff of the following research laboratories:

Philips Research Laboratories, Eindhoven, The Netherlands E
Philips Research Laboratories, Redhill, Surrey, England M
Laboratoires d'Electronique et de Physique Appliquée, 3 avenue Descartes, 94450 Limeil-Brévannes, France L
Philips GmbH Forschungslaboratorium Aachen, Weißhausstraße, 51 Aachen, Germany A
Philips GmbH Forschungslaboratorium Hamburg, Vogt-Kölln-Straße 30, 2000 Hamburg 54, Germany H
MBLE Laboratoire de Recherches, 2 avenue Van Becelaere, 1170 Brussels (Boitsfort), Belgium B
Philips Laboratories, 345 Scarborough Road, Briarcliff Manor, N.Y. 10510, U.S.A. (by contract with the North American Philips Corp.) N

Reprints of most of these publications will be available in the near future. Requests for reprints should be addressed to the respective laboratories (see the code letter) or to Philips Research Laboratories, Eindhoven, The Netherlands.

W. J. Bartels, L. Blok & C. W. Th. Bulle: X-ray topography and diode efficiency of vapour grown GaAs1-xPx layers. J. Crystal Growth 34, 181-188, 1976 (No. 2). E

V. Belevitch: Theory of the proximity effect in multiwire cables, Part II. Philips Res. Repts. 32, 96-117, 1977 (No. 2). B

C. Belin: On the growth of large single crystals of calcite by travelling solvent zone melting. J. Crystal Growth 34, 341-344, 1976 (No. 2). L

C. H. J. van den Brekel (I, II) & J. Bloem (II): Characterization of chemical vapour-deposition processes, Parts I and II. Philips Res. Repts. 32, 118-133, 134-146, 1977 (No. 2). E

A. M. van Diepen: The B-site Mössbauer linewidth in Fe3O4. Physics Letters 57A, 354-356, 1976 (No. 4). E

C. T. Foxon, B. A. Joyce & S. Holloway (University of Leicester): Instrument response function of a quadrupole mass spectrometer used in time-of-flight measurements. Int. J. Mass Spectrom. Ion Phys. 21, 241-255, 1976 (No. 3/4). M

M. Gleria & R. Memming: Novel luminescence generation by electron transfer from semiconductor electrodes to ruthenium-bipyridil complexes. Z. phys. Chemie neue Folge 101, 171-179, 1976 (No. 1-6). H

H. Hieber: Aging properties of gold layers with different adhesion layers. Thin Solid Films 37, 335-343, 1976 (No. 3). H

A. Humbert, L. Hollan & D. Bois (I.N.S.A. Lyon, Villeurbanne): Influence of the growth conditions on the incorporation of deep levels in VPE GaAs. J. appl. Phys. 47, 4137-4144, 1976 (No. 9). L

A. J. R. de Kock: Characterization and elimination of defects in silicon. Festkörperprobleme 16, 179-193, 1976. E

W. Lems, H. Kinkartz & W. Lechner: Incandescent lamp filaments: facet-cooling, failure mechanisms. Philips Res. Repts. 32, 82-95, 1977 (No. 2). A

A. Milch: Etch polishing of GaP single crystals by aqueous solutions of chlorine and iodine. J. Electrochem. Soc. 123, 1256-1258, 1976 (No. 8). N

C. Mulder & H. E. J. Wulms: High speed integrated injection logic (I2L). IEEE J. SC-11, 379-385, 1976 (No. 3). E

J. M. S. Schofield: The physics of gas discharge cells within d.c. memory display panels. 4th Int. Conf. on Gas discharges, Swansea 1976 (IEE Conf. Publn No. 143), pp. 397-400. M

A. L. N. Stevels: On the luminescence of CsI:Mn and CsBr:Mn. Philips Res. Repts. 32, 77-81, 1977 (No. 2). E

A. L. N. Stevels & A. D. M. Schrama-de Pauw: Eu2+ luminescence in hexagonal aluminates containing large divalent or trivalent cations. J. Electrochem. Soc. 123, 691-697, 1976 (No. 5). E

R. P. Tijburg & T. van Dongen: Selective etching of III-V compounds with redox systems. J. Electrochem. Soc. 123, 687-691, 1976 (No. 5). E

Volume 37, 1977, No. 8, pages 189-220. Published 18th May 1978.


Recent United States Patents

Abstracts from patents that describe inventions from the following research laboratories that form part of or cooperate with the Philips group of companies:

Philips Research Laboratories, Eindhoven, The Netherlands E
Philips Research Laboratories, Redhill, Surrey, England R
Laboratoires d'Electronique et de Physique Appliquée, 3 avenue Descartes, 94450 Limeil-Brévannes, France L
Philips GmbH Forschungslaboratorium Aachen, Weißhausstraße, 51 Aachen, Germany A
Philips GmbH Forschungslaboratorium Hamburg, Vogt-Kölln-Straße 30, 2000 Hamburg 54, Germany H
MBLE Laboratoire de Recherches, 2 avenue Van Becelaere, 1170 Brussels (Boitsfort), Belgium B
Philips Laboratories, N.A.P.C., 345 Scarborough Road, Briarcliff Manor, N.Y. 10510, U.S.A. N

4052340
Method for reproducing a voltage dependent resistor and a voltage dependent resistor obtained therewith
R. K. Eijnthoven, J. T. C. van Kemenade E
Voltage dependent resistor obtained by sintering a body of a mixture of ZnO and other metal oxides in an atmosphere which contains bismuth.

4052605
Interpolating non-recursive digital filter
L. D. J. Eggermont
An interpolating non-recursive digital filter for generating output signal samples which occur at a given output sampling frequency and which are related in a predetermined way to a sequence of input signal samples has an output sampling frequency which is an integer multiple of the frequency of the input signal samples. In order to make more efficient use of the storage capacity of a storage device in the filter, multiplying coefficients representative of the difference between two samples of the impulse response which belong to different sets but to the same sampling period are used.

4052633
Restorable cold cathode in a gas discharge electron gun
T. M. B. Schoenmakers E
Restorable cold cathode in a gas discharge electron ion gun in which the eroded material in the active surface of the cathode is restored by supplying new material in the form of a wire which is moved through a hole in the cathode body by means of a screw spindle.

4052681
Gas-discharge laser
K. Bulthuis, B. J. Derksema, H. T. Dijkstra, J. van der Wal
A gas-discharge laser having a Brewster window near at least one of the reflectors which is secured directly to the laser tube. The Brewster window is secured to a surface of the laser tube and a normal to the window makes an angle equal to the Brewster angle with the axis of the laser tube. The surface may be a slot in the laser tube or a recessed end face of the laser tube near a reflector. An advantage is that the Brewster window may have any convenient diameter and that the laser tube need not be drilled to form a cavity.

4052683
Microwave device
J. H. C. van Heuven, F. C. de Ronde E
A microstrip waveguide transition where the substrate is arranged in a symmetry plane of waveguide and is situated parallel to the field lines of the electrical field and the longitudinal axis of the waveguide. The asymmetric microstrip conductor structure is coupled, via a symmetrical-asymmetrical transformer, to symmetric band line provided on the substrate. To be conductive for RF energy, the individual conductors of the band line are connected to opposite walls of the waveguide via broadening conductors.

4052707
Magnetic device having domains of two different sizes in a single layer
W. F. Druyvesteyn, H. M. W. Booij E
A magnetic device comprising at least one thin layer of a magnetizable material having a preferred direction of magnetization which is approximately perpendicular to the surface of the layer in which magnetic domains are generated, maintained and possibly annihilated in the layer. A domain guiding structure which with a given magnetic field substantially in the preferred direction enables the occurrence of two types of magnetic domains. The area of the largest domain is at least 15% and at most 125% larger than that of the other domains. The device also has means which convert one type of domain into the other type of domain.

4052747
Device for the magnetic domain storage of data having a shift register filled with coded series of domains
J. Roos E
A magnetic bubble domain device for recording information on a magnetizable recording medium including a shift register filled with a series of bubble domains coded in accordance with the information to be recorded, the whole bubble domain pattern being printed, in combination with a magnetic transfer field, in one time on a recording medium, new information replacing the old information by shifting of the bubble domains in the register.

4052748
Magnetoresistive magnetic head
K. E. Kuijk
A magnetic head for detecting information-representing magnetic fields on a magnetic recording medium and comprising an elongate magneto-resistive element of a magnetically anisotropic material which at its ends has contacts for connection to a current or voltage source. In order to linearize the playback characteristic of the element, the easy axis of magnetization coincides with the longitudinal direction of the element and means are present which force the current to travel at an angle of minimum 15° and maximum 75° with the longitudinal direction. These means consist in particular of equipotential strips provided on the element.

4052853
Hot-gas reciprocating machine comprising two or more working spaces, provided with a control device for the supply of working medium to the said working spaces
J. H. Abrahams E
A hot-gas reciprocating machine involving a plurality of cycles having a mutually different phase, during each crank shaft revolution working medium from a source of pressurized working medium being successively supplied to each cycle separately, via a control device, comprising one or more slides which are controlled exclusively by the variable cycle pressures.

4053207
Electro-optic devices
E. T. Keve, K. L. Bye
An electro-optic device having a platelet of PLZT material, the birefringence of which depends on an electric field. The thickness of the platelet preferably is smaller than the thickness of one grain. As a result of this the sensitivity is large and this permits a low operating voltage in the device.

4053796
Rectifying circuit
R. J. van de Plassche
A rectifying circuit for balanced input currents, of which the two components are applied to a first and a second point of a selective current mirror circuit respectively, either the first point constituting the input and the second point the output, or the first point constituting the output and the second point the input of the selective current mirror circuit, depending on and under control of the polarity of the difference between the two components. As a result, the selective current mirror circuit follows either the greater or the smaller of the two components of the balanced input current. The first and the second point are each connected to the output terminal of the rectifying circuit via a current circuit. These current circuits each have a reverse direction and each comprise the main current path of a transistor for transferring the difference between the output current of the selective current mirror circuit and the component of the balanced input current which is applied to the relevant output to the output terminal in a voltage decoupling manner.

4053806
Pyroelectric detector comprising nucleating material wettable by aqueous solution of pyroelectric material
A. A. Turnbull, H. Sewell R
A pyroelectric detector employing a substrate supporting a thin, i.e., 0.5 to 5 µm thick, solid layer of pyroelectric material with an intermediate layer of nucleating material, i.e., a material which is wettable by a solution of the pyroelectric material so that an adherent continuous layer is formed thereon. The pyroelectric layer may be in the form of a mosaic of islands separated by an electrically conductive material covered with an electrically insulating material.

4053887
Doppler radar system
K. Holford R
A Doppler radar system for controlling portable traffic signals in response to on-coming traffic. Each of two channel amplifiers of the system is fixed at high gain and passes both noise signals and Doppler signals to a phase detector. A threshold element provides a control signal when the average level of the phase detector output between high and low levels changes sufficiently, due to the presence of Doppler signals, from a mean level which is due to noise alone.

4055961
Device for liquefying gases
P. S. Admiraal E
A liquefactor includes a refrigeration stage for cooling a compressed gaseous body, and a first duct containing a first Joule-Thompson valve for connecting the refrigeration stage to a collecting container for use when the gaseous body comprises a single gas. A second duct parallelly connects the refrigeration stage to the collecting container and contains a second Joule-Thompson valve for use when the gaseous body comprises a mixture of two gases to be separated.

4056810
Integrated injection logic memory circuit
C. M. Hart, A. Slob E
A new integrated circuit in which bias currents are supplied by means of a current injector, a multi-layer structure in which current is supplied, by means of injection and collection of charge carriers via rectifying junctions, to zones to be biased of circuit elements of the circuit, preferably in the form of charge carriers which are collected by the zones to be biased themselves from one of the layers of the current injector. By means of said current injector circuit arrangements can be realized without load resistors being necessary, while the wiring pattern may be very simple and the packing density of the circuit elements may be very high. In addition a simple method of manufacturing with comparatively few operations can in many cases be used in particular upon application of transistors having a structure which is inverted relative to the conventional structure.

4055953
Hot-gas reciprocating engine
A. M. Nederlof E
A hot-gas reciprocating engine in which the tubular connection members interconnect a heater duct inside the heat pipe with the engine's expansion space and regenerator, each connection member extends transverse of the center line of the relevant unit, and is connected to the heat pipe wall with a flexible sealing member.

4056832
Servo system for controlling the position of a reading head
J. de Boer, A. Walraven E
A servo system for controlling the position of a magnetic reading head relative to the center of a selected information track. During recording a long-wave positioning signal is recorded below the data signal in the tracks. Upon reading out, the head not only reads the information of the selected track but, as result of cross-talk, also the positioning signals of the adjacent tracks. After filtering out and processing the positioning signals, a control signal for controlling the head is obtained.

4057063Device for sterilization by transuterine tube coagulationA. C. M. Gieles EG. H. J. SomersA device for sterilizing human females by transuterine fallopiantube coagulation wherein the substantial increase in impedanceofthe tissues during coagulation is used to signal for terminationof treatment.

4057725Device for measuring local radiation absorption in a bodyW. Wagner HA device for measuring the spatial distribution of radiationabsorption in a body wherein a multiplicity of radiators areregularly distributed about a circle surrounding the body, eachradiator emitting a wedge-shaped beam of radiation in the planeof the circle toward a different are portion of the circle betweentwo other radiators, a multiplicity of adjoining detectors in eachare portion measuring radiation from the radiator emittingradiation to that are portion, the spatial distribution of radiationabsorption being calculated from the measured radiation valuesof all the detectors.

4057728
X-ray exposure device comprising a gas-filled chamber
K. Peschmann, H.-G. Junginger  A
An x-ray exposure device comprising a flat and plane rectangular chamber containing an ionizable gas and having walls provided with electrode structures which generate a potential distribution corresponding to that of two concentric spherical electrodes, an insulating foil on which charge carriers resulting from ionization of the gas by the x-radiation and displaceable in the longitudinal direction of the chamber being arranged therewithin.

4057796
Analog-digital converter
A. Hoogendoorn, R. E. J. van de Grift, T. J. van Kessel
An analog-digital converter in which a capacitor is charged or discharged by a reference current with the aid of transistor switches, while a continuous current determined by an input difference voltage flows through the capacitor. One side of the capacitor is connected via a comparator to a flip-flop, to which a clock signal is applied and which drives the transistor switches. The other side of the capacitor is connected to the first input of a differential amplifier, which forms part of a negative feedback loop that maintains the voltage at the first input of the differential amplifier equal to the voltage applied to its second input.

4057831
Video record disc manufactured by a process involving chemical or sputter etching
B. A. J. Jacobs, J. van der Wal, G. B. Gerritsen  E
A mother for manufacturing long-playing video records is provided by (1) selectively exposing a photoresist disposed as the outer layer on a substrate comprising a disc-shaped plate, a thin layer of base material, e.g., an oxide or nitride, adhering to the plate and a thin metal layer, e.g., chromium, silver, nickel or titanium, coating the base material layer, (2) removing non-activated sections of the photoresist layer and (3) sputter or chemically etching the thin metal and base material layers in sections corresponding to the removed sections of the photoresist layer.

4057833
Centering detection system for an apparatus for playing optically readable record carriers
J. J. M. Braat  E
An apparatus is described for reading a record carrier on which information, for example video and/or audio information, is stored in an optically readable track-shaped information structure. A deviation between the center of a read spot which is projected on the information structure and the center line of a track to be read can be detected with the aid of at least two detectors which are disposed in the far field of the information structure in different quadrants. With the aid of the same detectors a reference signal is obtained which is used for deriving a control signal for correcting the position of the read spot relative to the track to be read.

4058382
Hot-gas reciprocating machine with self-centered free piston
J. Mulder  E
A hot-gas reciprocating machine having a free piston, one face of which varies the volume of a working space while its other face bounds a buffer space of constant pressure. A control mechanism maintains a constant nominal central piston position by momentarily connecting the working space and the buffer space.

4058743
Pulse generating circuit
D. R. Armstrong  R
A pulse is generated in an output transformer and this pulse is used to provide a spark across a spark gap for the purpose of igniting any gas/air mixture present. The circuit operates from a 1.5 volt d.c. source.
A blocking oscillator is energized by a two-position switch which is operated to a first position, and this charges a capacitor. The capacitor is then discharged through an output transformer to produce the spark when the switch is returned to its original rest position. The winding used to charge the capacitor from the blocking oscillator is also used in the discharge path for the capacitor, and this inductive loading provides a slower discharge and a more controlled lower energy spark which is found to be better for gas ignition.