
Classification of Plant Species from CTFM Ultrasonic Range Data using a Neural Network

NEIL L. HARPER, MACS, AND PHILLIP J. MCKERROW, MEMBER, IEEE

Department of Computer Science, The University of Wollongong, Northfields Avenue,

Wollongong, NSW 2522 [email protected] and [email protected]

ABSTRACT

This paper describes the application of an Artificial Neural Network (ANN) to the recognition of plants using Continuous Tone Frequency Modulation (CTFM) ultrasonic range data. Ultrasonic Mobility Aids have been used successfully by unsighted people for some time. One of these systems uses CTFM ultrasonics and allows perception of the environment by transmitting at ultrasonic frequencies and converting the echo into audible tones. After sufficient practice, unsighted users can navigate independently and confidently because they can recognise objects by the tonal patterns that they produce. This audio signal can also be captured and analysed by a computer.

This paper discusses research that is being done in the Intelligent Robotics Laboratory at the University of Wollongong that aims to isolate the information in the signal that facilitates recognition.

Keywords - Artificial Neural Networks, CTFM Ultrasonics, Machine Perception, Robotics.

1. Introduction

Professor Leslie Kay [6]-[8] developed an ultrasonic blind aid to assist blind people to perceive their environment. This blind aid insonifies the environment with ultrasound, detects the echoes and converts them into audible stereo tones. The pitch of the tone is proportional to the range of the object, and the direction is determined binaurally by the auditory neural system from the stereo tones [7]. The audible representation of the echoes sounds like squeals, chirps, tweets, warbles, or swishes.

With the blind aid, an experienced user can detect whether there is something in the way, and can get a large amount of information from the signal. The nature of the object comes from the tonal pattern, the height of the object is discovered by vertical scanning, and the width by horizontal scanning. A smooth, highly specular surface produces a hard, clear chirp.

Complex objects produce highly complex signal patterns [2]. The tones contain information about the geometric structure of the objects in the auditory field. The human brain converts this information back into a spatial pattern of the environment. This information can be captured by a computer using specialised hardware, after which it can be processed. A detailed overview of the system can be found in Harper and McKerrow [4]. An ANN is then used to classify several different plants.

This paper discusses the initial experiments. Starting with four similar-looking plants of different species, we have found that we can correctly classify the plants at any angle. The ANN is, however, only being used as a stepping stone to our ultimate aim, which is to discover the information content of the echoes that will allow a machine to perceive the environment in a similar way to a human. This will lead to a formal theory of information content and to perception technology for recognising objects. This technology will have applications in machine perception in areas such as robot or vehicle guidance and industrial process control and inspection.

The application of an ANN serves as a proof of concept to show that there is sufficient information encoded in the echo for robust recognition of objects. The experiments will also enable us to determine the resolution of the system.


2. CTFM Ultrasonics

The advantage of ultrasonics over vision is its ability to measure the range of objects. Conventional ultrasonic sensing devices used in robotics use short pulses (or tone bursts) of ultrasonic energy, whereas CTFM ultrasonics transmit continuously. The ultrasonic echo is detected by electrostatic transducers and electronic circuitry converts it into tones audible to the listener.

CTFM radiation consists of a sine wave signal that is swept down over a one-octave frequency range (typically 100 kHz to 50 kHz with a sweep period of 250 ms). The received signal is correlated against the transmitted signal and the difference in frequency is proportional to the range of the target. In the audio signals, the pitch is proportional to the distance to the object. The Blind Aid system that we are using has three transducers mounted side by side in the horizontal plane. The centre one transmits continuously and the left and right sensors receive the echoes from the environment.
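As a rough numerical sketch of this relationship (the sweep parameters are those quoted above; the speed of sound of about 343 m/s is an assumed nominal value, and the helper names are ours, not part of the experimental system):

```python
# Sketch of the CTFM range-frequency relationship for a linear sweep of
# 50 kHz (100 kHz down to 50 kHz) over 250 ms. The speed of sound is an
# assumed nominal value; it varies with temperature and humidity.

SWEEP_BANDWIDTH_HZ = 50_000.0
SWEEP_PERIOD_S = 0.250
SPEED_OF_SOUND_M_S = 343.0

SWEEP_RATE_HZ_PER_S = SWEEP_BANDWIDTH_HZ / SWEEP_PERIOD_S  # 200 kHz per second

def difference_frequency(range_m: float) -> float:
    """Audible difference frequency produced by an echo from range_m metres."""
    round_trip_delay_s = 2.0 * range_m / SPEED_OF_SOUND_M_S
    return SWEEP_RATE_HZ_PER_S * round_trip_delay_s

def range_from_frequency(frequency_hz: float) -> float:
    """Target range corresponding to a given difference frequency."""
    return frequency_hz * SPEED_OF_SOUND_M_S / (2.0 * SWEEP_RATE_HZ_PER_S)

print(difference_frequency(1.0))      # about 1166 Hz for a target at 1 m
print(range_from_frequency(5000.0))   # about 4.3 m at the top of a 5 kHz band
print(range_from_frequency(10.0))     # about 8.6 mm per 10 Hz frequency step
```

Under these assumptions a 10 Hz frequency step corresponds to just under 1 cm of range, which is consistent with the range accuracy and the 0 to 5 kHz audio band described in the following sections.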

3. Machine Perception

Gissoni [2], who is completely blind, highlighted his own successful use of the blind aid [6]. He states that notes are heard singly or in combination and can be as meaningful and informative as words spoken to you in your native tongue. The sensor can give accurate range indications to within 1 cm when the unit is set on short range. Object movement is detected by a change in pitch. Different types of plants produce a variety of signals depending on their leaf or needle structure. Tree trunks are distinguished from poles because they have texture and produce different signals. Other everyday objects such as doormats and picket fences have their own characteristic signals.

In familiar territory the aid is useful for detecting landmarks. Gissoni also refers to a signal bank that he has built up in his mind over the period of using the device. He uses this knowledge to detect unfamiliar objects and to navigate in unfamiliar territory. He describes an experience where he was presented with an object which seemed like a rock wall with foliage growing through it. Upon examination by touch, he found that it was exactly that. He had detected both rock walls and foliage individually in the past and was able to link that experience to what he was currently detecting.

He has achieved increased independence using the system. For example, he can locate the position of elevator signal buttons. He can then detect the doors opening and assess whether people are moving in and out so that he can better judge his own entry. Gissoni points out that he can now even thread his way through heavy pedestrian traffic, find an empty seat on a bus, an empty desk in a classroom, or a table in a restaurant.

The success of Gissoni has inspired our research in this area of ultrasonics, and we are investigating exactly what is in the signal that allows objects to be recognised by the human perceptual system. With a blind aid, the human brain processes the tones received from the ear pieces [7]; it is also possible for a machine to do this.

Most of the current sensing applications simply present an image on some sort of display (e.g. radar, medical ultrasound) and leave it to the operator to interpret the data. It is, however, a much more complex problem for a machine to perceive the environment that is being sensed. An experimental system has been purchased that consists of a binaural sensor connected to a microcomputer through a spectrum analyser and transmit/receive electronics (Figure 1). The system enables us to obtain the outputs of the spectrum analyser for each receiver at a particular instant in time.

Fig. 1. Schematic of the Experimental System

The system outputs data from the spectrum analyser in the form of 512 magnitude values. These represent the amplitude of 512 frequencies over the range 0 to 5 kHz in 10 Hz steps. Like a human who perceives range from the pitch of an audible tone, the computer determines the range to the object from the frequency. The presence of an object is usually indicated by a significant magnitude across a group of frequencies.

The kind of object is indicated by the magnitude variations within a group of frequencies. Specular surfaces have a very narrow frequency band and usually have one particular frequency with a very large magnitude. Diffuse surfaces generally have a wider band of frequencies with smaller magnitudes (see Figure 2 for a typical example).

Fig. 2. Frequency Distribution of different types of surfaces
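One crude way to express this distinction in code is to measure how wide the band around the dominant peak is relative to the peak magnitude. The threshold and the three-bin cut-off below are illustrative assumptions, not parameters taken from the experimental system:

```python
import numpy as np

def classify_surface(spectrum: np.ndarray, rel_threshold: float = 0.5) -> str:
    """Rough surface-type heuristic for a 512-bin magnitude spectrum.

    A specular surface concentrates energy in one or two bins; a diffuse
    surface spreads it over a wider band of smaller magnitudes.
    """
    peak_bin = int(np.argmax(spectrum))
    peak_mag = float(spectrum[peak_bin])
    if peak_mag <= 0.0:
        return "no object"

    # Count contiguous bins around the peak whose magnitude stays above a
    # fraction of the peak magnitude.
    above = spectrum >= rel_threshold * peak_mag
    width = 1
    i = peak_bin - 1
    while i >= 0 and above[i]:
        width += 1
        i -= 1
    i = peak_bin + 1
    while i < len(spectrum) and above[i]:
        width += 1
        i += 1

    return "specular" if width <= 3 else "diffuse"
```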

When more than one object or feature is in the range of the sensors, they appear as distinct lines in the frequency spectrum. Objects at the same range are distinguished binaurally. Figures 3 through 6 are graphs of scans of four plants used in the experiment described in this paper. The x-axis represents individual frequencies (512 in total) and the y-axis represents the magnitude at that particular frequency. Plants were chosen as they were highlighted by Gissoni [2] as having a distinctive tonal pattern. Plants are common in the home and everyday environments and can be a useful navigation landmark for mobile robots.


Figure 3. Frequency spectrum of signal from Hypoestes

Figure 4. Frequency spectrum of signal from Radermachia sinicia

Figure 5. Frequency spectrum of signal from Codiaeum Crotona

Figure 6. Frequency spectrum of signal from Schefflera Arboricola

4. Experimental Conditions

Four species of plants of similar size and shape were selected. All measurements were taken at approximately the same temperature and humidity, since these factors affect the transmission of sound in air [9]. The measurements were taken from stationary plants as they were rotated through 360° in 3° increments. This resulted in 120 samples per plant. The rotation of the plants ensures that the network, once trained, is able to recognise the plant at any orientation.

The sensor was directed about one third from the top of the plant and the edges of the field of audition of the sensor were verified with a solid object before the plant was moved into position. This was done to ensure that the plant was being scanned instead of other objects in the environment such as the pot. We also ensured that the entire width of the plant was in the field of audition of the sensor.

The aim was to see whether the plants could be correctly classified by the network. Seventy-five percent of the data collected from the plants was used to train the network (360 records) and the remaining twenty-five percent was used for testing (120 records). The records used for testing were chosen randomly from the entire set by the pre-processing module.
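The pre-processing module itself is not listed in the paper; the following sketch merely illustrates the kind of random 75/25 split described above, with the 480 records (4 plants by 120 orientations) yielding 360 training and 120 test records:

```python
import random

def split_records(records, test_fraction=0.25, seed=None):
    """Randomly hold out a fraction of the records for testing."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (training, test)

# Example with dummy records, one per (plant, rotation angle) pair.
records = [(plant, angle) for plant in range(4) for angle in range(0, 360, 3)]
training, test = split_records(records, seed=0)
assert len(training) == 360 and len(test) == 120
```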

5. Network Architecture and Pre-processing

The Backpropagation learning algorithm [11] was chosen for several reasons: 1) the available training data suited supervised learning since it was taken from known objects; 2) none of the outputs of the network were to be fed back to the inputs, so a feedforward network was applicable; 3) a commercially tested system (package) was available on our hardware; 4) the paradigm had been used successfully previously (in fact it has been used in over 85% of published applications [12]); and 5) it is commonly used in sonar research (some examples being reported in [3], [13]).

The specific implementation of the Backpropagation algorithm used here was the one provided by NeuralWorks Professional II/Plus from NeuralWare. A simple topology with one hidden layer was chosen, since this has been shown to be sufficient to distinguish arbitrarily complex shapes [1]. Furthermore, it has connections from all units in one layer to all units in the next layer.

When selecting an appropriate topology, the goal is to create a network large enough to learn yet small enough to generalise well. It is extremely difficult to specify an effective topology beforehand for any particular problem; the best way to find one is through experimentation [10]. The number of inputs was chosen by analysing the raw signal, and this was set at one hundred. We found this range to be the envelope of the signal that all of the plants scanned fell within.

The amount of training was determined empirically so that the network did not over train and learn irrelevant details of the individual training vectors. The final network topology is shown in Figure 7.

A topology was initially set with a minimum number of hidden nodes. Training was performed for set numbers of iterations and the test performance was logged. Multiple training runs (20) were conducted and the results averaged, since the initial conditions affect the results. We found the optimum number of hidden nodes to be 6.

Fig. 7. Network Schematic: raw data (512 values) -> pre-processing module -> input layer (100 nodes) -> hidden layer (6 nodes) -> output layer (4 nodes)
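The search procedure is not given in code in the paper; the sketch below only illustrates the kind of sweep described, with `evaluate` standing in for one NeuralWorks training and test run (the dummy evaluator exists purely so the sketch executes):

```python
import random
import statistics

def sweep(evaluate, hidden_unit_options, presentation_options, runs=20):
    """Average the test error over several runs per configuration, since
    the random initial weights affect the result.

    evaluate(hidden_units, presentations, seed) is expected to train a
    network and return its test classification error as a percentage.
    """
    table = []
    for h in hidden_unit_options:
        for n in presentation_options:
            errors = [evaluate(h, n, seed) for seed in range(runs)]
            table.append((h, n, statistics.mean(errors), statistics.stdev(errors)))
    return table

# Placeholder evaluator so that the sketch runs; a real one would wrap the
# NeuralWorks training and testing procedure.
def dummy_evaluate(hidden_units, presentations, seed):
    rng = random.Random(1_000_003 * seed + 101 * hidden_units + presentations)
    return rng.uniform(4.0, 6.0)

for row in sweep(dummy_evaluate, [2, 4, 6, 8], [2000, 3000, 4000, 5000]):
    print(row)
```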

A multiple class model was used for the output nodes, where each class has a separate output node. The winner-takes-all method was used to determine which of the outputs is chosen, i.e. the output with the highest activation designates the class. The weights were initialised to small random numbers between -0.1 and +0.1. The error function used is the mean squared error divided by two.
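A minimal sketch of this output scheme, assuming one output node per plant species and 0/1 target values (with the tanh activation the targets could equally be coded as -1/+1):

```python
import numpy as np

def winner_takes_all(outputs: np.ndarray) -> int:
    """Return the index of the class whose output node is most active."""
    return int(np.argmax(outputs))

def half_mse(outputs: np.ndarray, targets: np.ndarray) -> float:
    """Mean squared error divided by two."""
    return 0.5 * float(np.mean((outputs - targets) ** 2))

# Example: four output nodes, one per species; the target marks species 2.
target = np.array([0.0, 0.0, 1.0, 0.0])
output = np.array([0.10, 0.05, 0.80, 0.20])
print(winner_takes_all(output))   # 2, a correct classification
print(half_mse(output, target))   # small residual error
```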

A learning rule is used to adapt the weights between the processing elements during the training phase. The process of changing these weights to achieve the desired results is called adaption. The learning rule that we chose is the Normalised-Cumulative Delta Rule. This rule accumulates the weight changes over several epochs and makes the changes all at once. Its advantage is that it automatically adjusts the learning coefficients.

Wasserman [12] noted that the choice of transfer function is not critical as long as it is non-linear and bounds the output. We chose to use the tanh activation function for all of the nodes in the network. Kalman et al. [5] noted that the transfer function makes little difference when the network is trained, but have shown that the tanh function has the best overall properties in training a layered feedforward network. The tanh function requires the inputs to be scaled between values of -1.0 and +1.0 before presentation to the network.

Vectors were selected that attempt to capture the features of the object. These consisted of 100 frequency values and included the information around the peak of the frequency spectrum. The resulting data is then normalised [12] and scaled. This flattens the shape of the input data and removes insignificant characteristics. The normalisation and scaling process ensures that the data is easier for the network to learn, by avoiding extreme values, and that it will be within the limits of the tanh activation function. Training data is presented to the network in a random order.
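As an illustration of this stage (the exact window selection and normalisation used by the pre-processing module are not specified, so the centred window and peak normalisation below are assumptions):

```python
import numpy as np

def preprocess(spectrum: np.ndarray, n_inputs: int = 100) -> np.ndarray:
    """Reduce a 512-value spectrum to 100 network inputs in [-1, 1].

    Assumed steps: take a 100-bin window centred on the spectral peak,
    normalise it to unit maximum, then scale it into the input range of
    the tanh activation function.
    """
    peak = int(np.argmax(spectrum))
    start = min(max(peak - n_inputs // 2, 0), len(spectrum) - n_inputs)
    window = spectrum[start:start + n_inputs].astype(float)

    peak_mag = window.max()
    if peak_mag > 0.0:
        window = window / peak_mag    # normalise to [0, 1]
    return 2.0 * window - 1.0         # scale to [-1, 1]
```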

6. Network Training and Results

The network was trained and tested, with the classification results shown in Table 1.

Table 1. Results

Hidden Units   Number of Presentations   Classification Error (%)   Standard Deviation
6              2000                      5.04                       2.50
6              3000                      4.88                       1.05
6              4000                      4.47                       1.14
6              5000                      4.84                       1.13

The number of hidden nodes is shown in column 1. Column 2 represents the number of presentations of individual training vectors. The number of presentations was not selected by any particular method other than being round numbers over a wide range. The number of epochs can be calculated by dividing the number of presentations by the number of training records. Column 3 shows the average classification error (as a percentage) for 20 separate runs of the test data. Column 4 shows the standard deviation of the percentage error.

The classification error for a network with 6 hidden units trained for 4000 presentations (11 epochs) is 4.47%. This indicates an extremely high likelihood of recognising a plant at an arbitrary angle once the network is trained. Increasing the number of presentations of the training set beyond 4000 increases this error, which indicates that overtraining occurred and the network learnt irrelevant features of the training data.

The extremely high probability of correct classification of the test set (95.53%) indicates that there is sufficient information in the CTFM signal to justify continuing with further research in the area. Further work will involve testing the network with data from new plants of the same species. This will show how well the network generalises and will give us a measure of the discrimination of the system. We will also see whether the network can recognise plants at different distances. Initial experiments have proved very promising.

We will then follow on to test the resolution of the system by taking different parts of the spectral envelope and grouping frequencies. Further plans also include modifying the structure of the plants (cutting parts from them) and seeing whether they can still be recognised.

7. Conclusion

In this paper the Ultrasonic Blind Mobility Aid developed by Professor Leslie Kay was introduced and the success of one of its users was highlighted. The aid can provide the user with information about the presence of objects as well as information about the type of object. We discuss some research that is being done at the University of Wollongong using an experimental system connected to a microcomputer.

The ANN has proved that it can recognise and differentiate between plants. Unfortunately, there is currently no good way of extracting information from the weights. The network can, however, give us an indication of whether there is enough information in the signal to use the system for complex object recognition.

Our next step will be to develop a reliable system based on the information in the signal and demonstrate that system in a robot navigation application. In order to gain an understanding of the perception process, we will perform more experiments with this system and test other Artificial Intelligence techniques.

The ultimate aim is a formal theory of object recognition using a CTFM system which can be applied to a large range of applications. This system is of significant importance to the field of mobile robotics. It will be used as an aid to the sensor systems that currently exist.

8. Acknowledgments

This research is supported by an ARC, APIA scholarship funded by Thomson Sintra Pacific (TSP). We wish to thank Professor Leslie Kay of Bay Advanced Technologies for help with the sensor system, and Janice Sendt of TSP for her guidance. We would also like to thank John Fulcher for his valuable comments.

9. Bibliography

[1] Beale, R., Jackson, T., 1990, Neural Computing: An Introduction, IOP Publishing Ltd.

[2] Gissoni, F., 1966, My Cane is Twenty Feet Long, The New Outlook for the Blind, February.

[3] Gorman, R.P., Sejnowski, T.J., 1988, Analysis of hidden units in a layered network trained to classify sonar targets, Neural Networks, Vol. 1, pp 75-79.

[4] Harper, N.L., McKerrow, P.J., 1994, Perception of Object Characteristics by the Interpretation of Ultrasonic Range Data, in Proceedings of the 7th Australian Joint Conference on Artificial Intelligence, pp 418-426, World Scientific.

[5] Kalman, B.L., Kwasny, S.C., 1992, Why Tanh: Choosing a Sigmoidal Function, International Joint Conference on Neural Networks, Baltimore, MD, pp 578-581.

[6] Kay, L., 1964, An Ultrasonic Sensing Probe as a Mobility Aid for the Blind, Ultrasonics, Vol. 2, pp 53-59.

[7] Kay, L., 1974, A sonar aid to enhance spatial perception of the blind: engineering design and evaluation, The Radio and Electronic Engineer, Vol. 44, No. 11.

[8] Kay, L., 1985, Airborne ultrasonic imaging of a robot work space, Sensor Review, January, pp 8-12.

[9] Kinsler, L.E., Frey, A.R., Coppens, A.B., Sanders, J.V., Fundamentals of Acoustics, John Wiley and Sons.

[10] Masters, T., 1993, Practical Neural Network Recipes in C++, Academic Press, Inc.

[11] Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986, Learning internal representations by error propagation, in Rumelhart, D.E., McClelland, J.L. (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. I: Foundations, MIT Press.

[12] Wasserman, P.D., 1993, Advanced Methods in Neural Computing, Van Nostrand Reinhold.

[13] Watanabe, S., Yoneyama, M., 1992, An ultrasonic visual sensor for three-dimensional object recognition using neural networks, IEEE Transactions on Robotics and Automation, Vol. 8, pp 240-249, April 1992.