
Greeting speech recognition based on semiconductor neurocomputer

Wenming Cao, Yihua Wan
Institute of Intelligent Information System, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
E-mail: [email protected]

Shoujue Wang
Lab of Artificial Neural Networks, Institute of Semiconductors, CAS, Beijing 100083, P.R. China
E-mail: [email protected]

Abstract-Since CASSANDRA-I was successfully produced by Wang Shoujue in 1995, semiconductor neurocomputers have continued to develop. In this paper, we analyze the implementation method of the novel semiconductor neurocomputer CASSANDRA-II. Finally, CASSANDRA-II is applied to greeting speech recognition, and satisfactory performance is obtained in the experimental results.

I. INTRODUCTION

It is understood that a neural network is a non-program parallel computation structure constituted by a large number of identical, interconnected operation cells. There are many ways to implement this structure, all of which can be classified into two kinds, all-hardware implementation and virtual implementation, according to the correspondence between physical units and neurons.

In an all-hardware implementation, the physical processing units and communication channels correspond one-to-one to the neurons and connections of the neural network model for an application problem; every neuron and every connection has its own physical component. If P physical units implement a neural network constituted by N neurons and P < N, we call it a virtual implementation of the neural network. A virtual implementation therefore simulates the neural network in its function.

Now, many famous IC corporations (such as Intel, Motorola, Panasonic, Hitachi, and Fujitsu) have produced their own analog or digital neural network chips. In both network scale and running speed, these chips are close to practical use, which accelerates the application of neural networks. Because a neural network chip in a ship-borne weapon system should learn on-line, many circuits (such as feedback circuits, weight storage circuits, weight computation circuits, and modification circuits) have been integrated into a single chip, realizing an all-hardware, self-learning neural network system, namely an adaptive neural network. In this paper, we study the implementation method of the novel semiconductor neurocomputer. Finally, the semiconductor neurocomputer CASSANDRA-II is applied to greeting speech recognition, and satisfactory performance is obtained in the experimental results.

II. THE HARDWARE IMPLEMENTATION OF CASSANDRA-II AND THE INTRODUCTION OF ITS FUNCTIONS

In this paper, we analyze the novel neurocomputer CASSANDRA-II proposed by Wang Shoujue, which can simulate a neural network of up to 1024 neurons with 512 input synapses. Each input has two weights: one is the direction weight and the other is the kernel weight. CASSANDRA-II has several operating modes, as follows:

1) It can simulate 1024 neurons synchronously as a general feed-forward network, all of which can connect to 256-dimensional input nodes. Among the 1024 neurons, the outputs of the first 256 neurons can not only be connected to any neuron as inputs but can also serve as hidden-layer neurons whose states are readable; the remaining 768 neurons are dedicated to the output layer of the network. It can calculate at most 63 samples in this operating mode.

2) It can simulate 256 neurons with 512 input synapses as a fully connected feedback network; in addition, it can connect 256-dimensional input nodes as required. When computing the feedback network, it can iterate 63 times within one calculation, with the intermediate results of the stepwise iterations and the final result readable simultaneously.

3) It can be used for priority sorting as a single-layer perceptron (POSLP). In this operating mode, with 1024 neurons of 512 input nodes each, it can calculate at most 127 input sample vectors at one time.

A. The general equation of the neurocomputer CASSANDRA-II

The general equation is

O_{mi}(t+1) = F_{k_i}\left(\lambda_i\left[C_i\,(-1)^s\left|\sum_{j=1}^{512} W_{ji}\, I_{nj} + \sum_{g=1}^{256} W'_{gi}\, O_{mg}(t)\right|^{p} - \theta_i\right]\right)   (2-1)


where F_{k_i} is the nonlinear output function of the i-th neuron, and k_i is the index of that function in the function database (k_i = 1, ..., 8). I_{nj} is the j-th input value of the n-th input sample, corresponding to the j-th input node. W_{ji} is the direction weight of the j-th input node with respect to the i-th neuron. The kernel weight W'_{gi} (1 <= g <= 256) is the kernel of the g-th neuron outputting to the i-th neuron. p is the exponential parameter (p is 1/3, 1/2, 1, 2, 3, or 4), and s is the monomial sign, 0 or 1. O_{mg}(t) is the output state value of the g-th neuron (1 <= g <= 256) at time t when the m-th sample is input. \theta_i is the threshold of the i-th neuron (1 <= i <= 1024). C_i is the proportional divisor deciding the input size of the neuron, and \lambda_i is the coordinate proportional divisor of the nonlinear function of the neuron.
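As an illustration only, the following Python sketch evaluates the reconstructed general equation (2-1) for a single neuron. The summation form of (2-1), and every identifier in the code, are our assumptions for exposition, not the actual microarchitecture of the chip.

import numpy as np

def neuron_output(I_n, O_m_t, W_i, Wk_i, theta_i, C_i, lam_i,
                  p=2, s=0, F=np.tanh):
    """One step of the reconstructed general equation (2-1).

    I_n    : external input sample, shape (512,)
    O_m_t  : outputs of the first 256 neurons at time t, shape (256,)
    W_i    : direction weights of neuron i, shape (512,)
    Wk_i   : kernel weights of neuron i, shape (256,)
    theta_i: threshold; C_i, lam_i: proportional divisors
    p      : exponent in {1/3, 1/2, 1, 2, 3, 4}; s: monomial sign (0 or 1)
    F      : nonlinear output function; tanh is a stand-in assumption for
             one of the 8 functions in the chip's function database
    """
    drive = W_i @ I_n + Wk_i @ O_m_t          # weighted external + recurrent input
    monomial = (-1.0) ** s * np.abs(drive) ** p
    return F(lam_i * (C_i * monomial - theta_i))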

B. Mathematical description of the neurocomputer CASSANDRA-II

A mathematical description of the neurocomputer CASSANDRA-II is as follows. The coverage P of class A is

P = \bigcup_i P_i,   P_i = \{ x \mid \rho(x, y) < k,\ y \in B_i,\ x \in R^n \},   (2-3)

B_i = \{ x \mid x = \alpha S_i + (1-\alpha) S_{i+1},\ \alpha \in [0,1] \},   (2-4)

where S_i is an information data point of class A in the information space. Let

d(x, \overline{x_1 x_2}) = \min_{\alpha \in [0,1]} d(x, \alpha x_1 + (1-\alpha) x_2)   (2-5)

be the distance of x to the line segment \overline{x_1 x_2}. Then

d^2(x, \overline{x_1 x_2}) = \begin{cases} \|x - x_1\|^2, & q(x, x_1, x_2) < 0 \\ \|x - x_2\|^2, & q(x, x_1, x_2) > \|x_2 - x_1\| \\ \|x - x_1\|^2 - q^2(x, x_1, x_2), & \text{otherwise} \end{cases}   (2-6)

q(x, x_1, x_2) = \left\langle x - x_1,\ \frac{x_2 - x_1}{\|x_2 - x_1\|} \right\rangle   (2-7)

The neuron of the neurocomputer CASSANDRA-II, S(x_1, x_2; r), is

S(x_1, x_2; r) = \{ x \mid d^2(x, \overline{x_1 x_2}) \le r^2 \}   (2-8)

If x_1 and x_2 are the same information data point in the information space, then d(x, \overline{x_1 x_2}) is equivalent to d(x, x_1), and S(x_1, x_2; r) is equivalent to S(x_1; r): the neuron becomes a hypersphere. If x_1 and x_2 are different points, then the neuron S(x_1, x_2; r) is a connection between x_1 and x_2, with more compact coverage than a hypersphere. A new neuron model, the neuron of the neurocomputer CASSANDRA-II, is defined by the input-output transfer function

f(x; x_1, x_2) = \phi( d(x, \overline{x_1 x_2}) )   (2-9)

where \phi(\cdot) is the nonlinearity of the neuron, the input vector x \in R^n, and x_1, x_2 \in R^n are the two centers. A typical choice of \phi(\cdot) is the Gaussian function used in data fitting, and the threshold function is a variant of the Multiple Weights Neural Network (MWNN) [6]. The network used in data fitting or system control consists of an input layer of source nodes, a single hidden layer, and an output layer of linear weights. The network implements the mapping

f_s(x) = \lambda_0 + \sum_{i=1}^{n_1} \lambda_i\, \phi( d(x, \overline{x_{i1} x_{i2}}) )   (2-10)

where \lambda_i, 0 \le i \le n_1, are weights (parameters). The output layer makes a decision on the outputs of the hidden layer. One such mapping is

f_s(x) = \max_{i=1,\dots,n_1} \phi( d(x, \overline{x_{i1} x_{i2}}) )   (2-11)

where \phi(\cdot) is a threshold function.
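To make equations (2-5)-(2-9) concrete, here is a minimal sketch of the point-to-segment distance and the resulting two-center ("hypersausage") neuron, assuming a Gaussian phi; all identifiers are illustrative.

import numpy as np

def seg_dist2(x, x1, x2):
    """Squared distance d^2(x, segment x1x2), equations (2-6)-(2-7)."""
    v = x2 - x1
    L = np.linalg.norm(v)
    if L == 0.0:                      # x1 == x2: neuron degenerates to a hypersphere
        return float(np.sum((x - x1) ** 2))
    q = np.dot(x - x1, v) / L         # projection length, equation (2-7)
    if q < 0.0:
        return float(np.sum((x - x1) ** 2))
    if q > L:
        return float(np.sum((x - x2) ** 2))
    return float(np.sum((x - x1) ** 2) - q * q)

def neuron(x, x1, x2, r):
    """Hypersausage membership S(x1, x2; r), equation (2-8)."""
    return seg_dist2(x, x1, x2) <= r * r

def transfer(x, x1, x2, sigma=1.0):
    """Transfer function (2-9) with a Gaussian phi (a typical choice)."""
    return np.exp(-seg_dist2(x, x1, x2) / (2.0 * sigma ** 2))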

III. THE APPLICATION OF THE SEMICONDUCTOR NEUROCOMPUTER TO GREETING SPEECH RECOGNITION

A. Feature extraction of greeting speech

Here we use a novel method of data compression, decreasing frames according to a definite angular distance, on top of MFCC feature parameter extraction from the speech signals. After extracting MFCC feature parameters from the primary speech signal frames, we obtained 16 feature parameters per frame. Composing every 16 feature parameters into a vector C_i, i = 1, 2, ..., n, we then calculated the angle \theta_j between adjacent 16-dimensional vectors,


\theta_j = \arccos\left( \frac{\langle C_j, C_{j+1} \rangle}{\|C_j\|\,\|C_{j+1}\|} \right). When the angle is less than the experimentally determined threshold of 0.13 rad, we delete C_j or C_{j+1} and set n = n - 1, until all angles between adjacent vectors are larger than or equal to 0.13 rad, or n < 8. But which of C_j and C_{j+1} should be deleted when the angle is less than 0.13 rad? We used the following method:

\varphi_1 = \begin{cases} 0, & j = 1 \\ \arccos\left( \frac{\langle C_{j-1}, C_{j+1} \rangle}{\|C_{j-1}\|\,\|C_{j+1}\|} \right), & j = 2, \dots, n-1 \end{cases}   (3-1)

\varphi_2 = \begin{cases} 0, & j = n-1 \\ \arccos\left( \frac{\langle C_j, C_{j+2} \rangle}{\|C_j\|\,\|C_{j+2}\|} \right), & j = 1, 2, \dots, n-2 \end{cases}   (3-2)

According to equations (3-1) and (3-2), we calculate the angles \varphi_1 and \varphi_2; if \varphi_1 > \varphi_2, the vector C_{j+1} is deleted, otherwise C_j is deleted.

For the compressed data, we regulated it to a definite length: through manual audition and visual inspection, we selected 8 continuous vectors of every MFCC class for every person, those with the best auditory result for a single syllable (16 x 8 numerical values in all), and composed them into a 128-dimensional feature vector.

To summarize, the process of feature parameter extraction is as follows:

First, we extract feature vectors from the single-word greeting samples with MFCC.

1) Suppose that S is the primary speech signal set of the training samples (single-word greeting speech), S = {S_i | S_i \in S}, where S_i is the sample set of the i-th class and x(n) is the n-th sample point of the set S_i. We pre-emphasize x(n) as follows: x'(n) = x(n) - 0.9375 x(n-1).

2) A Hamming window was applied, x_w(n) = [0.54 - 0.46 \cos(2\pi n / 255)]\, x'(n), so that the greeting single-syllable speech signal was partitioned into frames.

3) Each frame of data was then processed through the Mel cepstrum transformation with a filter bank of 24 filters. From the resulting coefficients D_m, the first coefficient, which has an obvious energy characteristic, and the last 7 coefficients, which tend to 0, were deleted; the remaining 16 coefficients were kept as feature coefficients.

Second, we eliminate redundant data by decreasing frames according to a definite angular distance. We calculate the angle \theta_j between adjacent 16-dimensional vectors, \theta_j = \arccos( \langle C_j, C_{j+1} \rangle / (\|C_j\|\,\|C_{j+1}\|) ). When the angle is less than the experimentally determined 0.13 rad, we delete C_j or C_{j+1} and set n = n - 1, until all angles between adjacent vectors are larger than or equal to 0.13 rad, or n < 8.

Third, we regulate the data to a definite length. We select 8 continuous vectors of every MFCC class for every person, those with the best auditory result for a single syllable (16 x 8 numerical values in all), and compose them into a 128-dimensional feature vector. A sketch of this front end is given below.
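Below is a minimal sketch of the front end under stated assumptions: pre-emphasis and Hamming windowing follow steps 1) and 2); the Mel-cepstrum step of 3) is omitted (any standard MFCC routine can supply the 16 coefficients); and decimate implements the 0.13 rad rule with the tie-break of equations (3-1) and (3-2). The frame hop length is our assumption, since the paper does not state the overlap.

import numpy as np

def angle(a, b):
    """Angle between two feature vectors."""
    c = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(c, -1.0, 1.0))

def preemphasize(x):
    """Step 1: x'(n) = x(n) - 0.9375 x(n-1)."""
    return np.append(x[0], x[1:] - 0.9375 * x[:-1])

def frames_hamming(x, size=256, hop=128):
    """Step 2: split into 256-point frames and apply a Hamming window
    (assumes len(x) >= size; the hop length is our assumption)."""
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(size) / (size - 1))
    n = 1 + (len(x) - size) // hop
    return np.stack([x[i * hop:i * hop + size] * w for i in range(n)])

def decimate(C, thresh=0.13, min_frames=8):
    """Drop one of each adjacent pair closer than 0.13 rad, using the
    tie-break of equations (3-1)-(3-2), until no pair is too close."""
    C = list(C)
    while len(C) >= min_frames:
        angles = [angle(C[j], C[j + 1]) for j in range(len(C) - 1)]
        j = int(np.argmin(angles))
        if angles[j] >= thresh:
            break
        phi1 = angle(C[j - 1], C[j + 1]) if j > 0 else 0.0            # (3-1)
        phi2 = angle(C[j], C[j + 2]) if j + 2 < len(C) else 0.0       # (3-2)
        del C[j + 1 if phi1 > phi2 else j]
    return np.array(C)

The 8 frames retained per syllable are then concatenated into the 128-dimensional feature vector.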

B. Training and recognition with the semiconductor neurocomputer

There are 18 sample classes in all. Suppose the set constituted by each of these 18 classes is S_i (i = 0, 1, ..., 17). The network-construction sets S'_i = {X_u | X_u \in S_i, u = 1, 2, ..., 240} are constituted, after feature extraction, by 10 vectors of 128 dimensions from every person of every sample class (240 sample points in all). After learning from the samples of every network-construction set S'_i (i = 0, 1, ..., 17), the semiconductor neurocomputer CASSANDRA-II is applied to train on and recognize these samples, as illustrated below.
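As an interpretation of this training step rather than the chip's actual procedure, one can chain the training vectors of each class into segments per equations (2-3) and (2-4) and classify a test vector by its nearest class coverage; seg_dist2 is the helper sketched in Section II.B.

import numpy as np
# uses seg_dist2 from the sketch in Section II.B

def build_coverage(samples):
    """Chain successive training vectors of one class into segments,
    following equations (2-3) and (2-4)."""
    return [(samples[i], samples[i + 1]) for i in range(len(samples) - 1)]

def classify(x, coverages):
    """Assign x to the class whose coverage it is nearest to.

    coverages: list over classes; each entry is a list of (x1, x2) segments.
    """
    dists = [min(seg_dist2(x, x1, x2) for (x1, x2) in segs)
             for segs in coverages]
    return int(np.argmin(dists))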

IV. EXPERIMENTS AND ANALYSIS

A. Constructing the continuous speech database to be recognized

The continuous greeting speech database to be recognized is similar to a general continuous speech database in most respects; there are only a few differences between them in the content of the data and the size of the database, described in detail as follows:

1) The difference in the content of the speech data: every speech segment is a greeting sentence constituted by words from the 18 categories and must be spoken continuously, i.e., it is guaranteed to be continuous speech. The 18 word categories are shown in Table I.

TABLE I. THE CATEGORIES OF SINGLE WORDS

Category   Spell
1          Chi
2          Guole
3          Hao
4          Hen
5          Jiao
6          Ma
7          Ming
8          Ni
9          Qing
10         Shang
11         Shenme
12         Wan
13         Wen
14         Wo
15         Wu
16         Xia
17         Zao
18         Zi

2) The difference in the scale of the database:

There are 29 participants in total, 16 women and 13 men, of whom one woman and 4 men did not participate in the training. Each person randomly spoke 3 greeting sentences constituted by words belonging to the 18 classes.

B. The HMM model and the construction of our system's model

Besides the training and recognition method based on the semiconductor neurocomputer that is applied in our system, we also compare its recognition rate with that of an HMM model. We train on the same training samples with the semiconductor neurocomputer method and the HMM model respectively; the same test samples are then recognized by the two methods. The numbers and classes of the samples are shown in Table II: the training samples were classified into 5 groups according to the number of samples per category.

TABLE II. THE GROUPING WITH DIFFERENT NUMBERS OF SAMPLES

Group                                   1     2     3     4     5
Samples per category (per person)       1     3     6     8    10
Sample points per category             24    72   144   192   240

For the HMM, we used a continuous HMM, in which the parameter B is expressed as a Gaussian probability density function. An HMM is generally expressed as \lambda = (\pi, A, B). The combination of basic elements in a continuous HMM is shown in Table III.

TABLE III. THE BASIC ELEMENTS OF A CONTINUOUS HMM

Model parameter    Note
N                  The number of model states
A = {a_ij}         The matrix of state transition probabilities
\pi                The initial probability distribution over the states
B = {b_j(o)}       The output probability density functions

The numbers of states and of Gaussian probability density functions that gave the maximal correct recognition rate after many adjustments when constructing the acoustic model with the HMM are shown in Table IV.

TABLE IV. THE OPTIMAL NUMBERS OF STATES AND GAUSSIAN DENSITY FUNCTIONS FOR HMM MODELING

Number of training samples      24   72  144  192  240
Number of states                 5    5    5    5    5
Number of Gaussian
density functions                2    6    6    6    3
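For reference, the baseline of Table IV (5 states, up to 6 Gaussian mixtures per state) could be reproduced roughly as below. The paper does not name its HMM toolkit, so the use of hmmlearn, and all function names here, are purely our assumptions.

import numpy as np
from hmmlearn import hmm

# One continuous HMM lambda = (pi, A, B) per word class; B is a
# per-state Gaussian mixture (5 states, 6 mixtures, per Table IV).
def train_word_model(sequences):
    """sequences: list of (T_i, 16) MFCC arrays for one word class."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = hmm.GMMHMM(n_components=5, n_mix=6, covariance_type="diag")
    model.fit(X, lengths)
    return model

def recognize(x, models):
    """Pick the word model with the highest log-likelihood for x."""
    return int(np.argmax([m.score(x) for m in models]))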

C. Experimental comparison between our system and the HMM

We trained on the same training samples (the number of training samples of each class being 24, 72, 144, 192, and 240, respectively) with the semiconductor neurocomputer method and the HMM model, and then recognized the same test samples with both methods. The comparison of the experimental results of the two methods is shown in Tables V(a)-(e). The total recognition rate comparison, using the different sizes of training sets, is shown in Table VI.





TABLE V(a). RECOGNITION RATE COMPARISON WITH 24 TRAINING SAMPLES PER CLASS

Single word   Semiconductor neurocomputer   HMM model
Hao           0.8683                        0.7234
Jiao          0.9536                        0.8631
Ma            0.8681                        0.6910
Ming          1.0000                        0.9930
Shang         0.8657                        0.7037
Wan           0.9384                        0.8263
Zao           0.8743                        0.7486
Zi            0.9716                        0.9242

TABLE V(b). RECOGNITION RATE COMPARISON WITH 72 TRAINING SAMPLES PER CLASS

Single word   Semiconductor neurocomputer   HMM model
Hao           0.9530                        0.8805
Jiao          0.9698                        0.8863
Ma            0.9549                        0.9271
Ming          1.0000                        0.9930
Shang         0.9329                        0.8588
Wan           0.9608                        0.9440
Zao           0.9078                        0.8715
Zi            0.9905                        0.9573

TABLE V(c). RECOGNITION RATE COMPARISON WITH 144 TRAINING SAMPLES PER CLASS

Single word   Semiconductor neurocomputer   HMM model
Hao           0.9643                        0.9153
Jiao          0.9930                        0.9327
Ma            0.9896                        0.9549
Ming          1.0000                        0.9930
Shang         0.9606                        0.9074
Wan           0.9692                        0.9776
Zao           0.9777                        0.9190
Zi            1.0000                        0.9787

TABLE V(d). RECOGNITION RATE COMPARISON WITH 192 TRAINING SAMPLES PER CLASS

Single word   Semiconductor neurocomputer   HMM model
Hao           0.9577                        0.9144
Jiao          0.9930                        0.9466
Ma            0.9965                        0.9826
Ming          1.0000                        0.9930
Shang         0.9792                        0.9421
Wan           0.9748                        0.9860
Zao           0.9749                        0.9469
Zi            0.9976                        0.9858

TABLE V(e). RECOGNITION RATE COMPARISON WITH 240 TRAINING SAMPLES PER CLASS

Single word   Semiconductor neurocomputer   HMM model
Hao           0.9802                        0.9059
Jiao          0.9930                        0.9559
Ma            1.0000                        0.9618
Ming          1.0000                        1.0000
Shang         0.9907                        0.9306
Wan           0.9944                        0.9888
Zao           0.9469                        0.9469
Zi            1.0000                        0.9834

TABLE VI. TOTAL RECOGNITION RATE COMPARISON WITH DIFFERENT TRAINING SET SIZES

Training samples per class   Semiconductor neurocomputer   HMM model
24                           0.9114                        0.7998
72                           0.9587                        0.9088
144                          0.9788                        0.9415
192                          0.9796                        0.9532
240                          0.9870                        0.9497

V. CONCLUSION

Since CASSANDRA-I was successfully produced by Wang Shoujue in 1995, semiconductor neurocomputers have continued to develop. In this paper, we studied the implementation method of the novel semiconductor neurocomputer and discussed its realization. The semiconductor neurocomputer CASSANDRA-II was then applied to greeting speech recognition, and satisfactory performance was obtained in the experiments. The experiments demonstrate that this is a good way to address the stability of a speaker-independent continuous greeting speech recognition system: it reaches a high word recognition rate at a short distance between microphone and talker, even in the presence of some street noise. We will further refine the work of this paper; the hardware implementation of the new semiconductor neurocomputer and its applications remain our research emphasis.

REFERENCES

[1] Hou Shouren, Yu Shaobo, The Introduction of Neural Networks, National University of Defense Technology Press, 1992.

[2] Lu Huaxiang, Wang Shoujue, "The research of semiconductor artificial neural networks and its development", Proceedings of Electronic Technology, pp. 10-12, Sept. 1996.

[3] Wang Shoujue, Wang Liyan, Wei Yun, Lu Huaxiang, "A General Purpose Neuro Processor with Digital-Analog Processing", Chinese Journal of Electronics, Vol. 3, No. 4, pp. 73-75, 1994.

[4] Wang Shoujue, Lu Huaxiang, Chen Yudong, Ceng Yujuan, "The hardware implementation methods of artificial neural networks and neurocomputer research", Journal of Shenzhen University, Vol. 4, No. 1, 1997.

[5] Wei Yun, Wang Shoujue, Wang Liyan, Lu Huaxiang, "Design of a General Purpose Neuro Processor with Digital Processing and Discussion on VLSI Integration", Chinese Journal of Electronics, Vol. 23, No. 5, pp. 69-73, 1995.

[6] Wang Shoujue, "Biomimetic (Topological) Pattern Recognition - A New Model of Recognition Theory and Its Applications", Chinese Journal of Electronics, Vol. 30, No. 10, p. 14, 2002.

[7] Wang Shoujue, The Installation and Service Manual of the CASSANDRA-II Neurocomputer, Institute of Semiconductors, CAS, 2001.

[8] Xu Jian, "Hardware Implementation and Applied Research in Special Purpose Neural Computing Systems Based on Biomimetic Pattern Recognition", doctoral dissertation, Institute of Semiconductors, CAS, 2003.
