On-line handwriting recognition using Chain Code representation

Final project by

Michal Shemesh

shemeshm at cs dot bgu dot ac dot il

Introduction

Background

When preparing a first draft, concentrating on content creation, or summarizing lecture notes, pencil and paper are often favored over the keyboard. In some cases (e.g. when copying text into a document or organizing written text) the keyboard has a significant advantage over handwriting, but in others it causes difficulties, e.g. when taking lecture notes that require equations, graphs and other marks. Also, for small-alphabet languages typing is faster than handwriting, whereas for large-alphabet languages, like Chinese, keyboards are cumbersome. In the cases where writing by hand is preferred, handwriting recognition lets the writer keep the pen while gaining the advantages of the alternative.

On-line vs. Off-line

On-line handwriting recognition means that the machine recognizes the writing while the user is writing. It requires a transducer that captures the writing as it is written; the most common such devices are electronic tablets, or digitizers. Off-line handwriting recognition, by contrast, is performed after the writing is complete, possibly much later. Off-line handwriting recognition is a subset of optical character recognition (OCR).

An advantage of on-line devices is that they capture the temporal, or dynamic, information of the writing. This information consists of the number of strokes, the order of the strokes, the direction of the writing for each stroke, and the speed of the writing within each stroke. A stroke is the writing from pen down to pen up. Most on-line transducers capture the trace of the handwriting or line drawing as a sequence of coordinate points. By contrast, off-line conversion of scanner data to line drawings usually requires costly and imperfect preprocessing to extract contours and to thin or skeletonize them. The temporal information provided by on-line entry improves recognition accuracy. At the same time, it complicates recognition with variations that are not apparent in the static images. Nevertheless, these complications can be handled successfully, and the temporal information can be used to advantage. The main disadvantage of on-line handwriting recognition is that the writer is required to use special equipment. Unfortunately, current on-line equipment is not as comfortable and natural to use as pen and paper.

Recognition problems

There are many pattern recognition problems for handwriting and drawing on tablets:

• Recognition of language symbols (e.g. the large alphabet of Chinese characters).
• Pattern recognition (e.g. the pattern recognition problems for the various writing styles of English, where sometimes even a "character segmentation" is required).
• Shape discrimination between characters that look alike (sometimes they can only be distinguished by context or relative position).
• Recognition of equations, line drawings, and gestural symbols.
• Noisy tablet data.

The goal

In this project I simulated a tablet application that tries to recognize a symbol written by the user, belonging to a certain alphabet, after "learning" the user's handwriting of that alphabet in advance. For simplicity, I chose the alphabet to be the ten digits: 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0. Real alphabets usually contain more than ten symbols, and the symbols may be more complicated than these digits, but this set is enough to demonstrate the general idea given my very limited means.

Approach and Method

Assumption: The user always writes the alphabet symbols in a consistent way. I base this assumption on the fundamental property of writing that makes communication possible: differences between different characters are more significant than differences between different drawings of the same character.

Learning and representing the symbols:

With this in mind, I decided to use chain codes, each containing information about one symbol.

The learning process:

In theory, this whole process should be integrated into the tablet or device used by the user. Since I don't have such a device, I used two applications to learn how the user writes each symbol:

1. Easy Video Capture V1.3 – Allowing me to record a video of what is shown on the screen.

2. Virtual Dub 1.8.8 – Allowing me to convert the video file into a sequence of images

For each symbol, I recorded a video file of me writing the symbol and then converted it into a sequence of images. For example, I write the digit '9' like this:

A sequence of images produced from the video file looks like this:

Using MATLAB, I created a sequence of coordinates, each being the 'center of mass' of the difference between two consecutive images. This way the order in which the coordinates were written is also preserved. This way of processing the input was dictated by my limited means, but it has an advantage of its own: it effectively applies a kind of "smoothing filter" to the input data. (One can see that the digit '9' below is smoother than the 'nines0033' digit in the last image above.) The disadvantage is that crucial information may be lost, especially where sharp edges exist. A picture of the above coordinates for the digit nine looks like this:
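The frame-differencing step can be sketched as follows. This is a minimal Python sketch, not the MATLAB code used in the project: the function names, the representation of a frame as a list of pixel rows, the `threshold` parameter, and the choice of an unweighted centroid of the changed pixels are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale image stored as rows of pixel intensities


def diff_centroid(prev: Frame, curr: Frame,
                  threshold: int = 0) -> Optional[Tuple[float, float]]:
    """Centre of mass (x, y) of the pixels that changed between two frames.

    Assumption: every changed pixel gets equal weight. Returns None when
    no pixel changed by more than `threshold`.
    """
    sum_x = sum_y = count = 0
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_prev, row_curr)):
            if abs(c - p) > threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # nothing was drawn between these two frames
    return (sum_x / count, sum_y / count)


def trace_from_frames(frames: List[Frame],
                      threshold: int = 0) -> List[Tuple[float, float]]:
    """One coordinate per pair of consecutive frames, in writing order."""
    points = []
    for prev, curr in zip(frames, frames[1:]):
        centre = diff_centroid(prev, curr, threshold)
        if centre is not None:
            points.append(centre)
    return points
```

Because each coordinate averages all pixels drawn between two frames, the trace is smoothed as a side effect, which matches the behaviour described above.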

Where the red dot is the first coordinate in the sequence and the green dot is the last.

Representing the Symbols:

Each digit is represented by a sequence of directions describing how the direction changes during the pen movement, according to the coordinates measured during writing. That is, each number in the sequence is the angle (in degrees, measured relative to the x axis) of the line connecting two consecutive samples, where each sample is a pair of coordinates (x, y) belonging to the digit drawn in the picture. The coordinate samples are stored in the order in which the user wrote them. A coordinate sequence matching the digit '9' in the above example will be:

The matching direction sequence will be:

[90, 90, 90, 108, 135, 126, 123 …… 270, 270, 270, 270, 270, 270, 270, 270, 270, 270, 270, 270, 270, 270, 270]

At the end of the learning process, we have an 'angles sequence vector' for each symbol. I assume this data is already available when the user writes a symbol for the system to recognize. From now on I'll refer to such a vector as a 'digit chain code'.
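Converting a coordinate sequence into such an angle sequence can be sketched as below. This is an illustrative Python sketch under stated assumptions: the name `direction_sequence` and the `y_down` flag are hypothetical, angles are reported in the range [0, 360), and the report does not say whether the y axis points up or down (image coordinates usually grow downward), so both conventions are offered.

```python
import math
from typing import List, Sequence, Tuple


def direction_sequence(points: Sequence[Tuple[float, float]],
                       y_down: bool = False) -> List[float]:
    """Angle in degrees, relative to the x axis, between consecutive samples.

    Set y_down=True when the points are image coordinates whose y axis
    grows downward (an assumption; the convention is not stated in the text).
    """
    angles = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # repeated sample: the direction is undefined
        if y_down:
            dy = -dy
        angles.append(math.degrees(math.atan2(dy, dx)) % 360.0)
    return angles


# A stroke that goes up, then up-left, then straight down.
print(direction_sequence([(0, 0), (0, 1), (-1, 2), (-1, 1)]))
```

Using `atan2` rather than `atan(dy / dx)` avoids division by zero on vertical moves and keeps the quadrant information, so an upward and a downward move map to 90 and 270 degrees respectively.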

Choosing the right symbol

Choosing the right symbol is in fact the hardest stage in a recognition process of any kind. After the learning process is complete, we have ten 'digit chain codes' representing the ten digits in the user's handwriting. Once the user writes an input symbol, the system has to decide which symbol in the alphabet it resembles the most.

In order to compare the input character with our alphabet, the following methods were tested:

1. Normalizing without using interpolation: for each pair consisting of the input and one of the alphabet 'digit chain codes', the two chains are normalized to the same length (the length of the longer chain). The missing entries are filled by duplicating existing entries at equally spaced locations along the vector, according to the "stretching factor". Once the chains are equal in size, the sum of the absolute differences between corresponding entries is computed (i.e. the difference is measured in the 1-norm) and normalized.

2. Normalizing using interpolation: the only difference from the first method is in how the two vectors are brought to the same length: the additional entries are calculated by linear interpolation between two adjacent entries. This way the matching process is invariant to scaling.

3. Using relative chain codes: both previous approaches were applied to relative 'digit chain codes', i.e. the same process is carried out on the sequence of differences between consecutive measured angles. This way the matching process is invariant to rotation and scaling.

4. Divide and conquer: each digit chain code is divided into a number of smooth segments. The numbers of smooth segments are compared first, narrowing down the set of candidates. Then, for each segment, a procedure similar to methods 1 and 2 is applied.
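The two length-normalization schemes of methods 1 and 2 can be sketched side by side. This is a hedged Python illustration, not the project's code: the function names are hypothetical, duplication is implemented as nearest-lower-index resampling, and the interpolation treats angles as plain numbers (wrap-around between 359 and 0 degrees is ignored, since the text does not describe any special handling).

```python
from typing import List, Sequence


def stretch_by_duplication(chain: Sequence[float], length: int) -> List[float]:
    """Method 1 sketch: repeat existing entries at evenly spaced positions."""
    n = len(chain)
    if n == 0:
        raise ValueError("chain must not be empty")
    return [chain[i * n // length] for i in range(length)]


def stretch_by_interpolation(chain: Sequence[float], length: int) -> List[float]:
    """Method 2 sketch: linear interpolation between adjacent entries."""
    n = len(chain)
    if n == 0:
        raise ValueError("chain must not be empty")
    if n == 1 or length == 1:
        return [float(chain[0])] * length
    out = []
    for i in range(length):
        pos = i * (n - 1) / (length - 1)  # fractional index into the chain
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(chain[lo] * (1 - frac) + chain[hi] * frac)
    return out


short_chain = [90, 180]
print(stretch_by_duplication(short_chain, 4))    # repeated entries
print(stretch_by_interpolation(short_chain, 4))  # gradual transition
```

Duplication produces abrupt steps in the stretched chain, while interpolation produces a gradual transition; this difference matters most when a short chain is stretched by a large factor.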

The final decision is made using the L1 norm of the differences between the compared chains.
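The decision step can be sketched as a nearest-template search under a length-normalized L1 distance. The Python below is an illustration under assumptions: `resample`, `l1_distance` and `classify` are hypothetical names, resampling uses linear interpolation (as in method 2), the distance is divided by the common length as one possible normalization, and angle wrap-around is ignored.

```python
from typing import Dict, List, Sequence


def resample(chain: Sequence[float], length: int) -> List[float]:
    """Bring a chain to `length` entries by linear interpolation."""
    n = len(chain)
    if n == 1 or length == 1:
        return [float(chain[0])] * length
    out = []
    for i in range(length):
        pos = i * (n - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        out.append(chain[lo] + (chain[hi] - chain[lo]) * (pos - lo))
    return out


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference after stretching both chains to the longer length."""
    length = max(len(a), len(b))
    ra, rb = resample(a, length), resample(b, length)
    return sum(abs(x - y) for x, y in zip(ra, rb)) / length


def classify(chain: Sequence[float],
             templates: Dict[str, Sequence[float]]) -> str:
    """Label of the learned chain code closest to the input chain."""
    return min(templates, key=lambda label: l1_distance(chain, templates[label]))


# Toy templates, invented for illustration only (not the recorded alphabet).
templates = {"1": [270] * 10, "7": [0] * 5 + [250] * 8}
print(classify([268, 272, 270, 271], templates))
```

Dividing by the common length keeps distances to long and short templates comparable, so a long template is not penalized merely for having more entries.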

Results

Here is the alphabet recorded by the user:

Method no. 1, 2:

The first two methods gave successful recognition results. Various test symbols (digits) of different sizes were tested, and the results were unambiguous.

• Here are a few examples of symbols recognized correctly:

• There were cases where a symbol was recognized incorrectly. For example, observe the following tested input:

It is relatively small. When it is compared with the above alphabet, its chain is extended in length. With the first method, where entries are duplicated, its 'digit chain code' 'looks like' a seven: the two chain codes consist of very similar angles. And indeed, although the resulting norms were very similar, it was recognized as a seven. With the second method, where interpolation is used, it is recognized correctly.

Method no. 3:

This method produced unsuccessful recognition results. The same database of samples was tested, and the recognition was wrong in most cases.
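For reference, the relative chain code that this method operates on can be sketched as follows. This is a hypothetical Python illustration: the name `relative_chain_code` is invented here, and wrapping each difference into the range [-180, 180) is an assumption, since the text only says that differences between consecutive angles are used.

```python
from typing import List, Sequence


def relative_chain_code(angles: Sequence[float]) -> List[float]:
    """Differences between consecutive angles, wrapped to [-180, 180)."""
    return [((b - a + 180.0) % 360.0) - 180.0
            for a, b in zip(angles, angles[1:])]


absolute = [90, 90, 108, 135, 270]
rotated = [(a + 100) % 360 for a in absolute]  # the same shape, rotated by 100 degrees

# Both print the same relative code: rotation cancels out in the differences.
print(relative_chain_code(absolute))
print(relative_chain_code(rotated))
```

The example shows why the representation is rotation invariant: adding a constant to every absolute angle leaves every consecutive difference unchanged.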

Method no. 4:

This method was not tested. The idea behind it is to classify each symbol according to the number of smooth segments it is constructed from. For example, in the user's handwriting shown above, '0', '6' and '8' are constructed from one smooth segment; '1', '2', '3' and '9' from two; '5' and '7' from three; and '4' from four:

According to this characteristic, an initial selection of relevant candidates can be made. With fewer candidates left to choose from, methods 1, 2 or 3 can be applied to each smooth segment separately. This method should be invariant to scaling and rotation.
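Since this method was not implemented, the following Python sketch is only one possible reading of it. The text does not define a "smooth segment" precisely, so the rule used here (start a new segment whenever the direction turns by more than `max_turn` degrees between consecutive samples) and the names `split_smooth_segments`, `candidates_by_segment_count` and `max_turn` are all assumptions. Pen lifts, which may also delimit segments, are not modelled.

```python
from typing import Dict, List, Sequence


def split_smooth_segments(angles: Sequence[float],
                          max_turn: float = 60.0) -> List[List[float]]:
    """Split a chain code wherever the direction turns by more than max_turn."""
    if not angles:
        return []
    segments = [[angles[0]]]
    for prev, curr in zip(angles, angles[1:]):
        turn = abs(((curr - prev + 180.0) % 360.0) - 180.0)  # wrapped turn size
        if turn > max_turn:
            segments.append([curr])  # sharp corner: begin a new segment
        else:
            segments[-1].append(curr)
    return segments


def candidates_by_segment_count(chain: Sequence[float],
                                templates: Dict[str, Sequence[float]],
                                max_turn: float = 60.0) -> List[str]:
    """Labels of templates with the same number of smooth segments as the input."""
    n = len(split_smooth_segments(chain, max_turn))
    return [label for label, template in templates.items()
            if len(split_smooth_segments(template, max_turn)) == n]


# Toy templates, invented for illustration only.
templates = {"0": [0, 30, 60, 90, 120, 150, 180], "1": [45, 45, 270, 270]}
print(candidates_by_segment_count([50, 48, 265, 270], templates))
```

After this pre-selection, each pair of corresponding segments could be compared with any of the chain-matching methods described earlier.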

Conclusions

• Chain codes consisting of absolute angles proved useful for recognizing symbols, and the matching was invariant to scaling.

• With a larger alphabet, recognition naturally becomes harder. There are a number of steps one should implement in order to deal successfully with similar-looking symbols in the alphabet: (1) smoothing and filtering the input data (filtering can also normalize the data with respect to time); (2) normalizing the size of the input data.

• Relative angles should not be used as is to represent a symbol. Some kind of divide-and-conquer method, or a representation normalized according to the time sequence of the writing process, should help the system overcome mistaken recognitions.

Additional Information

• Full project report (or download it in PDF)
• Oral presentation slides (or download it in PDF)
• Downloadable source code

