VoCoRoBo: Remote Speech Recognition and Tilt Sensing Multi-Robotic System


Sagun Man Singh Shrestha (1), Labu Manandhar (2), Ritesh Bhattarai (3)

Department of Electronics and Computer Engineering,

Tribhuvan University – Kathmandu Engineering College, Nepal

Gmail: (1) sagunms, (2) laburocks, (3) reittes | github.com/sagunms/vocorobo

Abstract: This work implements real-time speech recognition using DSP algorithms such as Chebyshev IIR filters, accelerometer-based tilt sensing, and a short-range secure wireless link using the ARC4 cipher, all on low-cost 8-bit ATmega microcontrollers. The system uses a simple but effective algorithm that compares the spoken word with a dictionary of fingerprints using a modified Euclidean distance calculation. It can also securely and wirelessly control the navigation of multiple robots at remote locations from the Control Module, gather the environmental data collected by the Robot Modules, and display it back at the Control Module. Given the time-critical, computation-heavy algorithms and the variety of sensors interfaced in the system, this project demonstrates how an expandable multi-robotic system can be built from cheap and ubiquitous electronics.

Keywords: Speech Recognition, Chebyshev, Digital Signal Processing, Euclidean Distance, ARC4

Cryptography, ATMega16/32, nRF24L01+ Wireless Transceiver, MMA7260Q Accelerometer

I. INTRODUCTION

VoCoRoBo (Voice Controlled RoBot) is a system in which the user can wirelessly control multiple robots either with a voice command or by tilting the controls towards the desired direction. In addition, each robot securely relays temperature and light sensor data back to the user station.

1.1 HARDWARE

A microcontroller is an integrated circuit composed of a microprocessor unit, memory, and input/output peripheral devices. The Atmel ATmega32/16 is a low-power CMOS 8-bit microcontroller based on the AVR RISC architecture; it is used to implement the voice recognition, tilt-sensing, wireless and cryptography algorithms. An accelerometer measures the magnitude and direction of proper acceleration (acceleration relative to free fall) and can therefore be used to sense orientation. The Freescale MMA7260Q 3-axis accelerometer made it possible to control the robots with fun and intuitive tilt gestures. The two parts of the system, the control and robot modules, are linked wirelessly using the popular Nordic nRF24L01+ radio transceiver. It operates in the 2.4 - 2.5 GHz ISM band with an air data rate of up to 2 Mbps, has ultra-low-power operation, and is well suited to remote control and data acquisition. The L293D H-bridge IC is a quad push-pull driver capable of delivering output currents of up to 600 mA per channel. A differential drive technique was used, so each robot turns simply through the speed difference between the wheels on either side.

1.2 SOFTWARE

Speech recognition is the process of capturing an acoustic signal with a microphone and identifying the spoken word from it. Because the system is speaker dependent, it needs to be trained before use. Digital signal processing is concerned with the representation of signals by sequences of numbers and with their processing. An infinite impulse response (IIR) system is one whose impulse response is non-zero over an infinite length of time. Chebyshev Type II filters are an example of IIR filters; they have a steeper roll-off and more stopband ripple than Butterworth filters. They minimize the error between the idealized and the actual filter characteristic over the range of the filter.

1.2.1 Speech Analysis

In speech recognition, the frequency content of the detected word has to be analyzed. Several 4th-order Chebyshev band-pass filters are created by cascading two 2nd-order filters, each using the following Direct Form II Transposed realization of the difference equations.

\[ y(n) = b_1 x(n) + z_1(n-1) \]
\[ z_1(n) = b_2 x(n) + z_2(n-1) - a_2 y(n) \]
\[ z_2(n) = b_3 x(n) - a_3 y(n) \]

The coefficients a and b used in the above equations were obtained using the following syntax in Matlab.

[B,A] = cheby2(2,40,[Freq1, Freq2]);

cheby2 designs a Chebyshev Type II digital filter from the given specifications: 2 defines a 4th-order band-pass filter, 40 defines the stopband ripple (attenuation) in dB, and Freq1 and Freq2 are the normalized stopband edge frequencies. The tf2sos function is then used to convert the transfer function of the filter into its 2nd-order-section form.
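As an illustration of the cascade described above, the following Python sketch runs a signal through two 2nd-order sections, each in Direct Form II Transposed. It is not the project's assembly code: the function names and the coefficient values in SECTIONS are placeholders chosen for the example, and the leading denominator coefficient of each section is assumed to be 1.

```python
def biquad_df2t(x, b, a):
    """Filter the sequence x with one 2nd-order section (Direct Form II Transposed).

    b = (b1, b2, b3) are the numerator coefficients and a = (a2, a3) the
    denominator coefficients; the leading denominator coefficient is assumed 1.
    """
    b1, b2, b3 = b
    a2, a3 = a
    z1 = z2 = 0.0  # the two state variables of the section
    y = []
    for xn in x:
        yn = b1 * xn + z1
        z1 = b2 * xn + z2 - a2 * yn
        z2 = b3 * xn - a3 * yn
        y.append(yn)
    return y


def cascade(x, sections):
    """Pass x through each 2nd-order section in turn (two sections give 4th order)."""
    for b, a in sections:
        x = biquad_df2t(x, b, a)
    return x


# Placeholder coefficients for illustration only (not the values of Tables 3.3/3.4).
SECTIONS = [((0.1, 0.0, -0.1), (-1.5, 0.9)),
            ((0.2, 0.0, -0.2), (-1.4, 0.85))]

impulse = [1.0] + [0.0] * 7
response = cascade(impulse, SECTIONS)
```

Keeping each section as its own small recurrence mirrors the way a 4th-order filter is built from two 2nd-order blocks, and only two state variables per section have to be stored.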

1.2.2 Voice-fingerprint Calculation

Due to the limited RAM on the ATMega32, the relevant information of each spoken word had to be encoded in the form of a 'fingerprint'. To find the correct word, the fingerprints of the stored and sampled words are compared using the following pseudo-Euclidean distance formula.

\[ D(P, Q) = \sum_{i=1}^{n} |p_i - q_i| \]

where P = (p1, p2, ..., pn) is the dictionary fingerprint, Q = (q1, q2, ..., qn) is the sampled word fingerprint, and pi and qi are the fingerprint data points. To decide which word was spoken, the distance between the sampled fingerprint and each dictionary fingerprint is computed, and the dictionary word with the minimum distance is taken as the match. The original Euclidean distance requires squaring the difference between two points; in fixed-point arithmetic this produces numbers large enough to overflow the variables. A modified formula that drops both the square and the square root was therefore used, and it gave satisfactory results in practice.
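The matching step can be sketched in a few lines of Python. This is only an illustration of the sum-of-absolute-differences comparison described above; the function names and the short 4-point fingerprints are invented for the example (real fingerprints have 128 points).

```python
def fingerprint_distance(p, q):
    """Modified distance: sum of absolute differences, no square and no square root."""
    if len(p) != len(q):
        raise ValueError("fingerprints must have the same number of data points")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def closest_word(dictionary, sample):
    """Return the dictionary word whose fingerprint is nearest to the sample."""
    return min(dictionary, key=lambda word: fingerprint_distance(dictionary[word], sample))


# Toy 4-point fingerprints, for illustration only.
DICTIONARY = {
    "front": [10, 20, 30, 40],
    "stop": [90, 80, 70, 60],
}

match = closest_word(DICTIONARY, [12, 22, 29, 41])
```

Because only additions and subtractions are involved, the intermediate values stay far smaller than with squared differences, which is the property the modified formula relies on.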

1.2.3 ARC4 Cryptography

ARC4 is one of the most widely used software stream

ciphers in many encryption schemes, including WEP,

WPA, and SSL. The main factors in ARC4's success

over such a wide range of applications are its speed,

simplicity and efficiency in software and hardware.

3. DESIGN AND IMPLEMENTATION

3.1 HARDWARE ARCHITECTURE

Figure 3.1: Overall Hardware Architecture

[Block diagram. Control Module: ATmega32 @ 16 MHz (speech recognition and MMA7260Q tilt sensing, with the accelerometer x, y, z outputs on the Port A ADC, an LCD and LEDs) bridged to an ATmega16 @ 8 MHz (nRF24L01+ wireless interface with ARC4 cryptography, nRF24L01 module on SPI, Port B). Robot Module: ATmega16 @ 8 MHz (nRF24L01+ with ARC4 and H-bridge interface, nRF24L01 module on SPI, Port B, L293D H-bridge on PD0-PD3 driving two motors, LEDs). The modules communicate over a 2.4 GHz wireless link with a 2-byte payload (control byte + count byte).]

The system is divided into two broad subsystems: the Control Module and the Multi-Robot Module. The Control Module is further divided into two layers: the topmost layer and the second layer.

3.1.1 Control Module

The topmost layer of the control module consists of an ATMega32, which handles speech recognition, MMA7260Q accelerometer sensing and output to a 16x2 text LCD. The 2nd layer consists of an ATMega16, which implements the nRF24L01 wireless routine as well as encryption and decryption with the ARC4 cipher.

The bridge protocol between the 1st and 2nd layers of the control module (Fig. 3.1 and 3.2) is designed such that three output pins of PORTD of the ATMega32, viz. PD2, PD3 and PD4, are connected to the respective input pins of PORTA of the ATMega16, viz. PA0, PA1 and PA2. When the 1st layer recognizes the spoken word (front, back, left, right or stop), the equivalent bit combination is applied to PORTA of the 2nd layer via these bridge lines. The 2nd layer then sends out the corresponding control byte wirelessly via the SPI port. When one of the robots receives this control byte, it decodes it into the matching differential drive motor combination, which physically moves the robot in the commanded direction.

FUNCTION   Equivalent received   PINA (connected to Layer 1)   HEX
           control byte          BINARY: PA2  PA1  PA0
STOP       S                             0    0    0          00H
FRONT      F                             0    0    1          01H
BACK       B                             0    1    0          02H
LEFT       L                             0    1    1          03H
RIGHT      R                             1    0    0          04H
SPD_UP     U                             1    0    1          06H
SPD_DN     D                             1    1    1          07H

Table 3.1: Function control byte to be sent out via Wireless (SPI port) and corresponding bit combination inputted to the second layer of Control Module (PINA).
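The bridge decoding on the second layer can be pictured as a small lookup from the three PINA bits to the ASCII control byte. The Python sketch below uses the binary columns of Table 3.1; the function name, the masking of the upper bits and the choice to return None for the unassigned pattern 110 are assumptions made for the example, not details taken from the firmware.

```python
# Bit patterns (PA2 PA1 PA0) and control bytes as listed in Table 3.1.
BRIDGE_TO_CONTROL = {
    0b000: "S",  # STOP
    0b001: "F",  # FRONT
    0b010: "B",  # BACK
    0b011: "L",  # LEFT
    0b100: "R",  # RIGHT
    0b101: "U",  # SPD_UP
    0b111: "D",  # SPD_DN
}


def control_byte_from_pina(pina):
    """Map the lower three bits of a PINA reading to the ASCII control byte.

    Returns None for a pattern the table does not assign (an assumption of
    this sketch; the firmware may treat it differently).
    """
    return BRIDGE_TO_CONTROL.get(pina & 0b111)


command = control_byte_from_pina(0b00000001)
```

Masking with 0b111 means that whatever is connected to the other PORTA pins does not disturb the decoded command.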

3.1.2 Robot Module

The Robot Module consists of two identical robots (A and B) which can be positioned at different locations, provided they are within the signal range of the Control Module. Each robot consists of an ATMega16 with sensors that collect environmental data, specifically an LM35 temperature sensor and a light-dependent resistor. A 2.4 GHz nRF24L01 wireless transceiver is also on board to receive control data and to transmit the sensor data for data acquisition. With the L293D H-bridge driver, the two differential drive motors are controlled independently so that the robot can navigate front, back, left or right. The four input pins of the L293D, viz. IN1, IN2, IN3 and IN4, are connected to four output pins of PORTD of the ATMega16, viz. PD0, PD1, PD2 and PD3 respectively.

FUNCTION   Equivalent received   PORT D (connected to H-bridge)   HEX
           control byte          BINARY: IN4  IN3  IN2  IN1
STOP       S                             0    0    0    0         00H
FRONT      F                             0    1    1    0         06H
BACK       B                             1    0    0    1         09H
LEFT       L                             0    0    1    1         03H
RIGHT      R                             1    1    0    0         04H
SPD_UP     U
SPD_DN     D

Table 3.2: Function control byte received via Wireless (SPI port) and corresponding bit combinations outputted to H-bridge (PORTD). IN1 (PD0) is the LSB.
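On the robot side, the received control byte selects a 4-bit pattern for the L293D inputs. The following Python sketch uses the binary columns of Table 3.2 for the five movement commands; the function name, the fallback to STOP for an unknown byte and the preservation of the upper four PORTD bits are assumptions of the example rather than documented firmware behaviour.

```python
# IN4 IN3 IN2 IN1 patterns (PD3..PD0) from the binary columns of Table 3.2.
CONTROL_TO_HBRIDGE = {
    "S": 0b0000,  # STOP
    "F": 0b0110,  # FRONT
    "B": 0b1001,  # BACK
    "L": 0b0011,  # LEFT
    "R": 0b1100,  # RIGHT
}


def portd_for_control(control, portd_current=0x00):
    """Return the new PORTD value for a received control byte.

    Only PD0-PD3 are rewritten; the upper nibble is kept as it was.
    Unknown control bytes stop the motors (an assumption of this sketch).
    """
    pattern = CONTROL_TO_HBRIDGE.get(control, 0b0000)
    return (portd_current & 0xF0) | pattern


new_portd = portd_for_control("F")
```

Driving the two motors in opposite senses (LEFT and RIGHT) or the same sense (FRONT and BACK) is what produces the differential drive behaviour.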

3.2 SOFTWARE ARCHITECTURE

3.2.1 Input, Processing, Output

Figure 3.2: Input, Processing and Output block diagram for speech recognition

[Block diagram: Speech → ADC → Band Pass Filters → Generate Voice Fingerprints → COMPARE (against Fingerprint Templates) → Control Signals Output to the Robot]

At a rate of 4 kHz, the algorithm checks the ADC input for an audio signal. If the ADC value exceeds the threshold value, it is taken as the start of a half-second-long word. The sampled word passes through 8 band-pass filters to be encoded into a fingerprint. The words to be matched are stored as fingerprints in a dictionary so that newly generated sample fingerprints can be compared with them later. The modified Euclidean distance calculation finds the fingerprint that is the closest match, and a control signal is then sent to the robot to perform operations such as left, right, front, back and stop.

3.2.2 Initial-Threshold Calculation

All the background sound at startup is treated as a base value, which improves the accuracy of the speech recognition. At startup, the algorithm reads the ADC input using the ATMega32 Timer/Counter0 and accumulates its value 256 times. By interpreting each ADC reading as a fixed-point fraction in units of 1/256 and accumulating 256 readings, the average ADC value is obtained without a multiply or divide. Three average values are taken, each with a 16.4 ms delay between the samples. The threshold is then set to four times the median of the three averages. The threshold value is used to detect whether a word has been spoken or not.
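A minimal Python sketch of this start-up calculation follows. The read_adc callable stands in for the ADC read and is hypothetical, the 16.4 ms delays are omitted, and the right shift by 8 plays the role of reading the high byte of the 16-bit accumulator.

```python
def average_of_256(samples):
    """Average 256 readings by accumulating them and dropping the low 8 bits."""
    if len(samples) != 256:
        raise ValueError("exactly 256 samples are expected")
    return sum(samples) >> 8  # same as dividing by 256, without a divide


def initial_threshold(read_adc):
    """Take three 256-sample averages and return four times their median.

    read_adc is a hypothetical callable returning one ADC reading per call.
    """
    averages = []
    for _ in range(3):
        averages.append(average_of_256([read_adc() for _ in range(256)]))
    median = sorted(averages)[1]
    return 4 * median


# Simulated background noise: three bursts with slightly different levels.
_readings = iter([10] * 256 + [12] * 256 + [11] * 256)
threshold = initial_threshold(lambda: next(_readings))
```

Using the median rather than the mean of the three averages keeps a single noisy burst at start-up from shifting the threshold.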

3.2.3 Voice-fingerprint Generation

Figure 3.3: Filter implementation block diagram for the generation of fingerprints

[Block diagram: ADC → FILTER 2 ... FILTER 9 → one ACCUMULATOR per filter → VOICE FINGERPRINT]

The program considers a word detected if a sample value from the ADC is greater than the threshold value. Once a word has been detected, every ADC sample is stored in an integer variable Ain, which is passed through eight 4th-order band-pass filters for 2000 samples (half a second). Each filter output is squared, and the square is added to the accumulated squares of that filter's previous outputs. After 125 samples the accumulated value is stored as a data point in the fingerprint of that word. The accumulator is then cleared and the process begins again. After 2000 samples, 16 points have been generated from each filter, so every sampled word is divided into 16 parts. Our assembly language code uses 8 filters, and since each one outputs 16 data points, every fingerprint is made up of 128 data points.
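The accumulate-and-dump scheme can be sketched in Python as below. This is an illustration only: the filters are passed in as plain callables, and the simple gain functions used in the example are hypothetical stand-ins for the band-pass filters. The sketch also processes one filter at a time, whereas the microcontroller has to run all filters on each sample as it arrives.

```python
SAMPLES_PER_WORD = 2000   # half a second at 4 kHz
WINDOW = 125              # samples accumulated per fingerprint point


def make_fingerprint(samples, filters):
    """Return len(filters) * 16 data points: summed squared filter outputs per window."""
    if len(samples) != SAMPLES_PER_WORD:
        raise ValueError("a word is expected to be exactly 2000 samples long")
    fingerprint = []
    for filt in filters:
        accumulator = 0
        count = 0
        for ain in samples:
            out = filt(ain)
            accumulator += out * out      # store intensity, not amplitude
            count += 1
            if count == WINDOW:           # one data point every 125 samples
                fingerprint.append(accumulator)
                accumulator = 0
                count = 0
    return fingerprint


# Hypothetical stand-in "filters": plain gains 1..8 instead of band-pass filters.
stand_in_filters = [lambda x, k=k: k * x for k in range(1, 9)]
points = make_fingerprint([1] * SAMPLES_PER_WORD, stand_in_filters)
```

With 8 filters and 2000 / 125 = 16 windows, the result has 8 x 16 = 128 points, matching the fingerprint size stated above.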

3.2.4 Filter Design and Implementation

Figure 3.4: Band pass Filter 200-400 Hz

Figure 3.5: Band pass Filter 1600-1800 Hz

3.2.5 Digital Filter Implementation

The 4th-order Chebyshev digital filter with a 40 dB stopband was chosen for its very sharp transitions after the cutoff frequency. Most of the important frequency content in speech is found within the first 2 kHz, as this range usually contains the first and second speech formants. Thus 8 BPFs with frequencies ranging from 0.2 to 1.8 kHz were designed, as shown in the magnitude and phase plots. This also permitted sampling at 4 kHz (to satisfy the Nyquist criterion for sampling the first 2 kHz of voice frequencies) and left enough time to implement 8 filters. For sufficient frequency resolution to properly identify words, the bandwidth of each filter is set to 200 Hz.

Each 4th-order filter is created in assembly code by cascading two 2nd-order IIR filters whose coefficients are generated using Matlab (Section 1.2.1). Floating-point coefficients are converted to fixed point by multiplying them by 256 and rounding to the nearest integer. Fixed point was used instead of floating point (which would have been more accurate) because floating-point calculations on the ATMega32 are too slow to run all the filters within one 4 kHz sample period.

The ATMega32 has only 2 KB of RAM, and a word sampled at 4 kHz for half a second would require the entire 2 KB. To make a fingerprint from a word, the ADC output therefore has to pass through all the filters in less than the ADC sample time of 250 µs. The output of each filter was squared in order to store the intensity of the sound rather than just the amplitude. The lowest and highest frequencies could be neglected without noticeable degradation in recognition accuracy, and the memory and cycle time of the ATMega32 would not be sufficient to implement all ten filters, so 8 BPFs were used to cover the frequencies between 200 Hz and 1.8 kHz.
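The fixed-point conversion and the timing constraint can be made concrete with a small Python sketch. The function names and the example coefficient values are chosen for illustration; the cycle figure is a rough upper bound that assumes the 16 MHz clock of the ATmega32 and ignores ADC handling, squaring and accumulation overhead.

```python
def to_fixed(coefficient, scale=256):
    """Convert a floating-point coefficient to fixed point (multiply by 256, round)."""
    return int(round(coefficient * scale))


def cycles_per_filter(cpu_hz=16_000_000, sample_hz=4_000, n_filters=8):
    """Upper bound on CPU cycles available to each filter within one sample period."""
    cycles_per_sample = cpu_hz // sample_hz   # 250 us at 16 MHz
    return cycles_per_sample // n_filters


# Illustrative coefficient values (not read from the Matlab design itself).
a_fixed = to_fixed(1.7617)
b_fixed = to_fixed(-0.96875)
budget = cycles_per_filter()
```

At 16 MHz a 250 µs sample period corresponds to 4000 cycles, so eight filters leave at most about 500 cycles each, which is why integer arithmetic was preferred over software floating point.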

3.2.7.1 Chebyshev II filter coefficients

#                        Filter 1       Filter 2       Filter 3       Filter 4
f (kHz)                  0.2 – 0.4      0.4 – 0.6      0.6 – 0.8      0.8 – 1
1st 2nd-order section    A112: 451      A212: 355      A312: 224      A412: 72
coefficients             A113: -248     A213: -248     A313: -248     A413: -248
                         B111: 21       B211: 27       B311: 31       B411: 34
                         B112: -32      B212: -29      B312: -15      B412: 4
                         B113: 21       B213: 27       B313: 31       B413: 34
2nd 2nd-order section    A122: 458      A222: 366      A322: 239      A422: 88
coefficients             A123: -248     A223: -248     A323: -248     A423: -248
                         B121: 2225     B221: 1090     B321: 762      B421: 633
                         B122: -4285    B222: -1826    B322: -965     B422: -464
                         B123: 2225     B223: 1090     B323: 762      B423: 633
Gain                     G1 = 80        G2 = 120       G3 = 140       G4 = 160

Table 3.3: MATLAB filter coefficients for Chebyshev II (40 dB stop band) Filters 1-4

#                        Filter 5       Filter 6       Filter 7       Filter 8
f (kHz)                  1 – 1.2        1.2 – 1.4      1.4 – 1.6      1.6 – 1.8
1st 2nd-order section    A512: -72      A622: -239     A712: -355     A812: -451
coefficients             A513: -248     A623: -248     A713: -248     A813: -248
                         B511: 34       B621: 762      B711: 27       B811: 21
                         B512: -4       B622: 965      B712: 29       B812: 32
                         B513: 34       B623: 762      B713: 27       B813: 21
2nd 2nd-order section    A522: -88      A622: 458      A722: -366     A822: -458
coefficients             A523: -248     A723: -248     A723: -248     A823: -248
                         B521: 633      B721: 2225     B721: 1090     B821: 2225
                         B522: 464      B722: -4285    B722: 1826     B822: 4285
                         B523: 633      B723: 2225     B723: 1090     B823: 2225
Gain                     G5 = 160       G6 = 140       G7 = 120       G8 = 80

Table 3.4: MATLAB filter coefficients for Chebyshev II (40 dB stop band) Filters 5-8

3.2.6 Wireless Packet Format

The preamble byte, composed of alternating zeros and ones, is sent first, followed by a five-byte address field. The data payload of user-settable length (1-32 bytes) is sent next. Two versions of the payload were implemented: a 2-byte payload, containing only the encrypted byte and a count byte, was primarily used, while an 18-byte payload was designed for data acquisition from temperature and light sensors at a remote location. The final part is the two-byte CRC.

3.2.7.2 Wireless Data Payload format

The data payload for the control module is of two types, transmitter mode and receiver mode, both with an 18-byte payload width. The control module has to transmit data packets to individual robots and also receive sensor data from the replying robots, so it has to hold the entire payload of each robot (two in our case) for both transmission and reception. Two 18-byte char arrays, data_tx1 and data_tx2, store the transmitter mode payloads, while two other arrays, data_rx1 and data_rx2, are for the receiver. Both payloads are of size PAYLOAD_SIZE (18 bytes), defined in the wireless routine of the ATmega16.

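Figure 3.6: Transmitter Mode Payload

[Payload layout, PAYLOAD_SIZE = 18 bytes (data_tx1, data_tx2): bytes 15-0 hold the data text (data_text1, data_text2; 16 bytes); the remaining two bytes, 16 and 17, hold the control byte (data_control; 1 byte) and the packet count (1 byte). The figure also marks the encrypted block.]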

Of the three blocks, the first, 16 bytes long, holds the data text to be sent from the control module to the robot modules. The text is entered in RealTerm on the computer and sent to the MCU via UART, which allows text messages to be sent to individual robots at different locations. The control block is formed by the 1-byte data_control, which stores one of the ASCII characters 'F', 'B', 'L', 'R' and 'S', representing the control information for front, back, left, right and stop. When the targeted robot receives the control information in the transmitter payload, it interprets the ASCII control byte as the corresponding robot movement command.

The ARC4 cipher is used to encrypt the control byte and the data text block. Because ARC4 is a stream cipher, the byte count must be kept up to date (missing a packet would result in incorrect decryption from that point on), so a packet count byte is added to each packet. This allows the local unit to catch up to the correct byte in the PRGA (assuming the targeted robot misses fewer than 256 packets in a row).
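One way to picture the catch-up step is the Python sketch below. It is a sketch under stated assumptions, not the firmware logic: the function names are invented, the keystream is any iterator of keystream bytes, and a fixed number of keystream bytes per packet (bytes_per_packet) is assumed.

```python
def missed_packets(expected_count, received_count):
    """Number of packets skipped, using 8-bit wrap-around arithmetic."""
    return (received_count - expected_count) % 256


def catch_up(keystream, expected_count, received_count, bytes_per_packet=1):
    """Discard the keystream bytes of the missed packets.

    Returns the count expected for the next packet. bytes_per_packet is an
    assumption of this sketch (one encrypted byte per packet by default).
    """
    for _ in range(missed_packets(expected_count, received_count) * bytes_per_packet):
        next(keystream)
    return (received_count + 1) % 256


# Stand-in keystream 0, 1, 2, ... so the skipped positions are easy to see.
stream = iter(range(100))
next_expected = catch_up(stream, expected_count=0, received_count=5)
first_usable = next(stream)
```

Because the count is a single byte, the subtraction is taken modulo 256, which is why the scheme only works while fewer than 256 consecutive packets are lost.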

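Figure 3.7: Receiver Mode Payload

[Payload layout, PAYLOAD_SIZE = 18 bytes: bytes 1-0 hold the temperature (2 bytes), bytes 3-2 the light reading (2 bytes) and bytes 15-4 padding (12 bytes); the remaining two bytes, 16 and 17, hold the speed setting (1 byte) and the packet count (1 byte). The figure also marks the encrypted block.]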

The receiver mode payload is needed to receive the encrypted data payload sent by the individual robots, decrypt the encrypted block by syncing with the help of the packet count, separate the sensor readings of temperature (2 bytes), light (2 bytes) and speed setting (1 byte), and store them in their respective variables for data logging. For data acquisition, the payload is divided into temperature and light blocks. The readings from the two sensors in each robot are stored in their respective integer variables and sent to the control module in packet format.
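Splitting a decrypted receiver payload into its fields could look like the following Python sketch. The exact byte positions are assumptions made for illustration: temperature in bytes 0-1, light in bytes 2-3, twelve padding bytes, speed in byte 16 and packet count in byte 17, with little-endian 16-bit readings. The function name is likewise hypothetical.

```python
import struct

PAYLOAD_SIZE = 18


def unpack_sensor_payload(payload):
    """Split an already-decrypted 18-byte receiver payload into named fields.

    Assumed layout: temperature (bytes 0-1), light (bytes 2-3), padding
    (bytes 4-15), speed (byte 16), packet count (byte 17), little-endian.
    """
    if len(payload) != PAYLOAD_SIZE:
        raise ValueError("payload must be exactly 18 bytes")
    temperature, light = struct.unpack_from("<HH", payload, 0)
    return {
        "temperature": temperature,
        "light": light,
        "speed": payload[16],
        "packet_count": payload[17],
    }


# Example payload built with the same assumed layout.
example = struct.pack("<HH", 298, 512) + bytes(12) + bytes([3, 7])
fields = unpack_sensor_payload(example)
```

Keeping the payload at a fixed 18 bytes means the same buffer size serves both directions, at the cost of twelve unused padding bytes in the sensor reply.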

3.2.7.3 Source and destination data pipe addressing

Using switches connected to PD4 and PD5 of the ATMega16, the user can select whether the control byte generated by the roboControl function is directed to data_control1 or data_control2, the control bytes that are concatenated to the respective data packets for each robot. The user can thus select the robot to which the current command is directed. This technique enables a multi-robot control paradigm from a single control module.

To implement a minimalistic star network topology, the receiving pipes of the control module, Robot1 and Robot2 are 0, 1 and 2 respectively, and the corresponding pipe addresses are E7:E7:E7:E7:E7, C2:C2:C2:C2:C2 and C2:C2:C2:C2:C3. The remaining five data pipes in each of the three linked modules are disabled to block reception of packets whose destination is elsewhere. Prior to transmitting a data packet, the destination address should be set.

Figure 3.8: Minimalistic Star Network Topology for

establishing communication link between Control and

Robot Agent modules and their respective destination

multi-pipe addressing

3.2.7 ARC4 Cryptography

ARC4 generates a pseudorandom stream of bits (keystream) which, for encryption, is combined with the plaintext using bit-wise xor; decryption is performed in the same way (since xor is a symmetric operation). To generate the keystream, the cipher makes use of a secret internal state which consists of two parts: a permutation of all 256 possible bytes (denoted "S"), and two 8-bit index-pointers (denoted "i" and "j"). The permutation is initialized with a variable-length key, typically between 40 and 256 bits, using the key-scheduling algorithm (KSA). After this, the stream of bits is generated using the pseudo-random generation algorithm (PRGA). The ARC4 cipher is implemented in conjunction with the wireless routine of the ATMega16 in both the control and robot modules.
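The KSA and PRGA described above fit in a few lines of Python. This sketch follows the standard published ARC4 algorithm rather than the project's AVR code; the function names are chosen for the example.

```python
def rc4_keystream(key):
    """Yield ARC4 keystream bytes for the given key (a bytes object)."""
    # Key-scheduling algorithm (KSA): initialise the permutation S from the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA): one keystream byte per step.
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]


def rc4_crypt(key, data):
    """Encrypt or decrypt data: xor with the keystream works in both directions."""
    keystream = rc4_keystream(key)
    return bytes(byte ^ next(keystream) for byte in data)


ciphertext = rc4_crypt(b"Key", b"Plaintext")
recovered = rc4_crypt(b"Key", ciphertext)
```

Since both ends must be at the same position in the keystream, a receiver that has missed packets has to advance its own generator before decrypting, which is the role of the packet count byte.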

3.2.8 MMA7260Q Tilt Sensing

Figure 3.9: Overall accelerometer tilt sensing algorithm

The MMA7260Q has three sensor output pins, viz. X, Y and Z, connected to three of the ADC inputs of the ATMega32, viz. PA3, PA4 and PA5. The robot functions (front, back, left and right) are controlled in either Speech or Accelerometer mode. In the latter, the tilt-sensing algorithm first samples the X, Y, Z values for the origin into xyzOrigin, and then rapidly stores the subsequent samples into xyzADCArray. These arrays are used by the three decision blocks to determine the speeds in the individual directions. In the speed and decision block, once the speed in either the positive or negative direction (depending on accelerometer orientation) is determined, the algorithm decides whether the function to be interpreted is front, back, left, right or stop. For the movement data to be considered valid, the calculated speed in either X or Y has to exceed a predefined threshold. The command decided by the algorithm is sent to the roboControl function, which ultimately conveys it to one of the robots.
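The decision step can be sketched in Python as follows. Several details are assumptions made for the example: that a positive X deviation means FRONT and a negative one BACK, that the Y axis selects RIGHT and LEFT in the same way, and that the axis with the larger deviation wins when both exceed the threshold. The function name and the numeric values are illustrative only.

```python
def tilt_decision(x, y, x_origin, y_origin, threshold):
    """Return 'F', 'B', 'L', 'R' or 'S' from one pair of ADC readings.

    The speed on each axis is the deviation of the reading from its origin;
    a deviation only counts once its magnitude exceeds the threshold.
    Sign-to-direction mapping and tie-breaking are assumptions of this sketch.
    """
    x_speed = x - x_origin
    y_speed = y - y_origin
    if abs(x_speed) > threshold and abs(x_speed) >= abs(y_speed):
        return "F" if x_speed > 0 else "B"
    if abs(y_speed) > threshold:
        return "R" if y_speed > 0 else "L"
    return "S"


decision = tilt_decision(x=600, y=512, x_origin=512, y_origin=512, threshold=50)
```

Measuring every sample against the origin captured at start-up means the controller does not need to be held perfectly level when it is switched on.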

Figure 3.10: Flowchart showing xSpeed determination and

decision making of robot functions (FRONT and BACK)

[Flowchart for Figure 3.9: START; initialize the origin and speed variables for x, y, z (xOrigin = yOrigin = zOrigin = 0, xSpeed = ySpeed = zSpeed = 0), configure ADC pins 3 to 5 and initialize the LCD; ADC conversion; sample the X, Y, Z values for the origin into the xyzOrigin array; store the remaining values into xyzADCArray; determine xSpeed, ySpeed and zSpeed (REFERENCE AXIS); decide the robot function; send the appropriate control signal.]

[Flowchart for Figure 3.10: compare xADCArray with xOrigin; xSpeed = xADCArray - xOrigin (+ve speed value) or xSpeed = xOrigin - xADCArray (-ve speed value); test whether xSpeed > threshold and whether AXIS = 1; decision = FRONT, BACK, STOP or LT/RT; send the decision to the roboControl function.]

4. RESULTS

4.1 Time domain waveform

The figures depict time domain waveforms of the spoken words, generated by Matlab. The spoken words front, left and right are each of approximately 4s duration. The word back has the lowest duration, 2s, and is consequently recognized with the least accuracy relative to the other four words, while stop has the highest duration, 5s, and the highest accuracy.

Figure 4.1: Time domain representation of Back

Figure 4.2: Time domain representation of Stop

4.2 Frequency domain waveform

These figures depict the spectral analysis (discrete Fourier transform) of the sampled time domain data, generated using Matlab.

Figure 4.3: FFT of the word Back

Figure 4.4: FFT of the word Stop

4.3 Dictionary data points for voice fingerprints

Table 4.1: Dictionary data points for the word FRONT

stored in the flash memory

128 data points for each of the five words are logged

via RealTerm in similar manner during the training

stage and stored as dictionary in the flash memory.

4.4 Speech Recognition

Figure 4.5: Recognition Probability Comparison

[Bar chart of recognition probability, number of tests = 20: Front 95%, Back 90%, Left 95%, Right 95%, Stop 100%.]

Filter1: 731 831 723 2343 4838 2514 7815 1085 681 1025 707 1057 625 309 172 672
Filter2: 177 346 307 364 95 59 10 0 0 0 0 35 4 0 0 0
Filter3: 3120 3704 4341 1001 1957 5105 288 51 156 31 0 732 175 4 0 44
Filter4: 474 1188 1966 539 167 184 78 0 30 52 30 193 0 0 0 0
Filter5: 7662 4377 3991 2200 1639 347 561 134 0 23 20 1309 874 0 0 0
Filter6: 1564 789 4137 1752 1311 1629 52 5 34 68 123 728 343 120 77 76
Filter7: 385 183 306 171 553 163 3 56 72 123 68 219 196 42 41 37
Filter8: 704 764 796 950 2347 1998 489 665 266 379 137 138 729 944 1400 516

The accuracy of the speech recognition was within the acceptable range of 90% or above that we initially expected of the system design. However, given the basic speech algorithm, recognition is valid only for the same person who underwent the preliminary voice training to initialize the dictionary fingerprints. For convenience, the recorded voice of the Oxford dictionary software, stored as a .wav file, was played in relatively quiet surroundings.

4.5 Euclidean Distance Comparison

Figure 4.6: Euclidean Distance Comparison

The Euclidean distance comparison against all five fingerprints already stored in the EEPROM was logged over UART using RealTerm. As expected, the word was recognized as the one with the least distance among the five fingerprints.

4.6 Wireless Transmit and Receive

4.6.1 Correct ARC4 Key Encryption/Decryption

The data logged from RealTerm is presented below. It depicts encryption and decryption with the correct ARC4 key. If the private key matches in both the control and robot modules, as shown below, the encrypted data is decrypted back to the original data once the PRGA of the robot agent updates 12 times to catch up with the PRGA of the Control module.

Control Module log:

CONTROL Initialized!
== Control Module ==
Private Key = SaGuN
- TX to Robot I - Destination: C2:C2:C2:C2:C2 (Pipe1)
Original:
data_tx1[0]= S
data_tx1[1]=0
Encrypted:
data_tx1[0]= ‘
data_tx1[1]=0
Packet sent!
Current Sequence = 1
- TX to Robot I - Destination:
Original:
data_tx1[0]= S
data_tx1[1]=1
Encrypted:
data_tx1[0]= ,
data_tx1[1]=1
Packet sent!

Robot Module I log:

ROBOT Initialized!
== Robot Module I ==
Private Key = SaGuN
- RX from Control - Packet received!
Encrypted
data[0]= ‘
data[1]=0
No. of PRGA updates = 12 times
Decrypted
data[0]= S
data[1]=0
- RX from Control - Packet received!
Encrypted
data[0]= ,
data[1]=1
Decrypted
data[0]= S
data[1]=1


4.6.2 Incorrect ARC4 Key Encryption/Decryption

If the private key is not matched between the two

modules then the encrypted data cannot be decrypted

back to its original data as shown below.

Control Module log:

CONTROL Initialized!
= Control Module =
Private Key = VoCoRoBo
- TX to Robot II - Destination:
Original
data_tx1[0]= S
data_tx1[1]=0
Encrypted
data_tx1[0]= j
data_tx1[1]=0
Packet sent!
- TX to Robot II - Destination:
Original
data_tx1[0]= S
data_tx1[1]=1
Encrypted
data_tx1[0]= D
data_tx1[1]=1
Packet sent!

Robot Module II log:

ROBOT Initialized!
= Robot Module II =
Private Key = SaGuN
- RX from Control - Packet received!
Encrypted
data[0]= j
data[1]=0
No. of PRGA updates = 7 times
Decrypted
data[0]= ƒ
data[1]=0
- RX from Control - Packet received!
Encrypted
data[0]= D
data[1]=1
Decrypted
data[0]= ~
data[1]=1

5. CONCLUSION

This project implements real-time speech recognition using DSP algorithms such as Chebyshev IIR filters, accelerometer-based tilt sensing, and a short-range secure wireless link with the ARC4 cipher, all using ubiquitous low-cost 8-bit microcontrollers. With a speech recognition accuracy of 90% or above, it shows the feasibility of applying the system in low-cost real-time applications. It was observed that words with greater pronunciation stress were recognized better. Although recognition is currently accurate only for the person who trained the system, the system could be made speaker independent through further research on the storage and retrieval of voice fingerprints from a different medium. A multi-channel wireless link with ARC4 was also successfully implemented to exchange control and sensor data. As the nRF24L01 is capable of higher-speed data transmission, the system can also be expanded to incorporate other sensors, such as audio or video sensors, for richer data acquisition.


7. PICTURES

Figure 7.1: Overall System

Figure 7.2: Schematic Diagram of Control Module

Figure 7.3: Schematic Diagram of a single Robot Module
