
2012 World Congress on Information and Communication Technologies (WICT), IEEE, Trivandrum, India, 30 October to 2 November 2012


VIMPY – A Yapper for the Visually Impaired

Amiya Tripathy 1, Avanish Pathak 2, Amruta Rodrigues 3, Charu Chaudhari 4

Computer Engineering, Don Bosco Institute of Technology, Mumbai, India

1 [email protected], 2 [email protected], 3 [email protected], 4 [email protected]

Abstract: Traditionally, visually impaired people have had only limited means of accessing printed material, so the huge knowledge bank available as printed text is largely closed to them. The two traditional ways for a visually impaired person to access printed data are the Braille script and audio tapes. Even with these media, the blind still have to rely on sighted people for everyday reading. This reliance, together with the high cost of producing alternative-format materials, reduces the amount of material available. This work proposes a device to remove that hindrance. VIMPY - A Yapper for the Visually Impaired is a reader for the blind that reads aloud any printed A4-size physical document. It is a standalone device that runs its own operating system, VimpyOS. The machine converts physical documents containing printed text to speech and provides further facilities: saving text for later use, customizing the pitch, volume and speed of the speech, and reading text files in the Devanagari script.

Keywords - Optical Character Recognition (OCR); Assistive Technology; Print-to-speech; Data Tunneling; Remastering OS.

I. INTRODUCTION

The problems faced by the visually impaired are far beyond the understanding of those privileged with sight. According to the World Health Organization (WHO), 285 million people worldwide are visually impaired: about 39 million are blind and 246 million have low vision [19].

The inaccessibility to the blind of the huge knowledge bank that exists in the form of printed documents is a serious setback for the community. Their only way of obtaining this information is to rely on sighted people, and this reliance is a big hindrance. For years, the only source of information for the blind was speech. The creation of Braille was a great improvement in the way they could access information [20], and many books have since been converted to it. Although this was a huge step, not every document can be converted to Braille.

Recent years have seen wider adoption of audio recordings [21], but like Braille these suffer from a lack of immediacy and availability. Recent advances in technology are beginning to overcome this drawback and to remove a major obstacle to a normal life for the visually impaired.

Assistive technology is a term that covers assistive, adaptive, and rehabilitative devices for people with disabilities, as well as the process of selecting, locating, and using them [24]. Various assistive technologies have been developed to deal with this problem. Some are readers (reading devices) that convert text to Braille documents or provide a tactile display; others provide text-to-speech conversion [1] [2] [3].

Some of the most widely recognized assistive technologies in use were studied, along with their user reviews. The Intel Reader is a hand-held device that uses a five-megapixel camera with flash to take a picture of the text, which the Reader then converts to speech [8]. The Open Book Reader, by contrast, is software that can create MP3 or WAV files and can convert OpenBook files to the .brf and .brl Grade II Braille formats [9].

The KNFB (Kurzweil-National Federation of the Blind) Reader is an application for mobile phones that uses the phone's built-in camera [10]. Zoom Reader is compatible only with the iPhone 4 running iOS 4.2 or later [11]. The major drawback of all these aids is that they rely on a camera, which must be focused accurately on the document. This can be very difficult for a blind user.

Portability is an important issue, and hence there is a clear need for an independent standalone device. Technology has proved to be a very useful tool for reducing manual work, and it can likewise simplify the long and tedious process that the blind must go through to acquire information from a document. A standalone device that provides a ‘reading system’ which can ‘read out’ a page through a user-friendly interface could therefore significantly advance assistive technology for the blind.

In this work, a system named VIMPY - A Yapper for the Visually Impaired has been developed to meet the above requirement. VIMPY makes a tangible document available to the user by reading its contents aloud. Its motivation lies in the inaccessibility of printed documents to the visually impaired community. It is a standalone device that scans a printed A4-size document and uses a speech module to produce recognizable audio output.

This device makes the tedious task of ‘reading’ a document much easier for the blind community. The main aim of VIMPY is to read out printed A4-size documents, books, etc. The key idea behind VIMPY is to use a scanner to scan a document. The scanned image is then converted to text using optical character recognition (OCR) [4]. This text is then passed to a speech module, which reads it aloud.

978-1-4673-4805-8/12/$31.00 ©2012 IEEE

II. MATERIALS AND METHODS

The major technologies used in Vimpy are Optical Character Recognition (OCR) [5, 6], speech synthesis and shell script programming. OCR extracts the letters and words from an image and so recognizes the contents of the document. A speech synthesizer speaks the extracted text aloud; it also handles interaction with the user and is therefore a very important module in Vimpy. Shell scripts interface and call the different modules of Vimpy. Data tunneling is the method of passing operational data from one module to another through a virtual tunnel. It is equally important, because the different modules must be integrated correctly for Vimpy to work.

The packages used for OCR and speech synthesis are Cuneiform and eSpeak respectively. Cuneiform is a software package that recognizes most print fonts [13] [14]. eSpeak is speech synthesis software that reads a text file aloud and allows the way the text is read to be modified [17]. Both packages were chosen because they are open source and lightweight.

When the basic process starts, the scanner scans the document and passes the scanned image to Cuneiform (using data tunneling), which converts the image to a text file [13]. The character recognition software takes the image of the printed text, identifies the letters and symbols, and writes them to a text file for the phonetic translation module. The phonetic translation module produces the specifications for the speech sounds, which the speech synthesis module, eSpeak, converts into synthetic speech in real time [14]. eSpeak generates a .wav file from the text. In the last stage of data tunneling, the playback package mplayer plays the .wav file to produce the desired output, the spoken form of the physical document [18].
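The chain described above can be sketched as four command lines run in sequence. The following Python sketch is illustrative only and is not the authors' C and shell implementation: the file names, the `Step` type and both function names are hypothetical, and the flags shown (`scanimage --format=tiff`, `cuneiform -l -f -o`, `espeak -f -w`) are the commonly documented ones and should be checked against the installed versions. Whether Cuneiform accepts TIFF input directly depends on how it was built.

```python
import subprocess
from pathlib import Path
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    argv: List[str]              # command line of one stage
    stdout_to: Optional[Path]    # file that captures stdout, if the stage writes there


def build_pipeline(workdir: str) -> List[Step]:
    """Build the scan -> OCR -> synthesis -> playback stages (file names are assumed)."""
    work = Path(workdir)
    image, text, audio = work / "page.tiff", work / "page.txt", work / "page.wav"
    return [
        # SANE command-line front end; writes the scanned image to stdout
        Step(["scanimage", "--format=tiff"], image),
        # Cuneiform OCR: English, plain-text output
        Step(["cuneiform", "-l", "eng", "-f", "text", "-o", str(text), str(image)], None),
        # eSpeak: read the text file and write a .wav file instead of speaking
        Step(["espeak", "-f", str(text), "-w", str(audio)], None),
        # mplayer: play the .wav file, with its keyboard controls available
        Step(["mplayer", str(audio)], None),
    ]


def run_pipeline(steps: List[Step]) -> None:
    """Run each stage in order and stop at the first failure."""
    for step in steps:
        if step.stdout_to is None:
            subprocess.run(step.argv, check=True)
        else:
            with open(step.stdout_to, "wb") as out:
                subprocess.run(step.argv, check=True, stdout=out)
```

Keeping the command construction separate from execution means the stages can be inspected without a scanner attached; `run_pipeline(build_pipeline("/tmp/vimpy"))` would execute them on a machine where all four tools are installed.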

Although eSpeak can produce speech output directly, mplayer is used so that keyboard shortcuts for pausing, forwarding and rewinding are available. These shortcuts make it possible to move forward and backward in the document and to pause the output and resume it later. This is useful when the user misses an important part of the document or would like to hear it again, and also when the user does not need to listen to the whole document.

Vimpy uses a scanner to convert the physical document to its electronic counterpart (an image of the document). It uses the Scanner Access Now Easy (SANE) API to interface the scanner with the operating system [16]. SANE provides standardized access to any raster image scanner hardware. Vimpy was designed and tested with the commonly used Canon LiDE 110 scanner, but it also works with other SANE-compatible scanners [16].

The scanner scans the document and tunnels the resulting image file (.tiff format) to the software. The interfacing and management of the hardware and software are handled by VimpyOS, a specialized operating system made specifically for this device. It is built on top of a customized Linux kernel [7].

The kernel has been modified to include only the modules that are useful for Vimpy. Many modules that are unnecessary for Vimpy's purpose, such as networking support, have been excluded; others, such as USB support and audio drivers, have been tuned in detail to make Vimpy work efficiently. The kernel is thus extremely lightweight and carries no unnecessary overhead. This purpose-built kernel saves system resources that would otherwise be wasted on unneeded processes, and it forms the base of VimpyOS.

Figure 1. Vimpy initramfs initialization screen

The OS was initially designed for Intel chipsets, and a stable release for AMD-based chipsets is being worked on. The “stable” VimpyOS is the key to having a portable reading device: when the OS boots on any compatible hardware, the user can resume from the same state in which he/she left the system. VimpyOS, like other Linux-based operating systems, uses an initramfs, a root file system that is embedded into the kernel and loaded at an early stage of the boot process. Figure 1 shows the initramfs initialization screen. VimpyOS was developed from the ground up, starting with the kernel and going all the way up to the terminal interface, which is shown in Figure 2. This terminal is for troubleshooting purposes only. The highlight of the operating system is that it does not require a display or monitor; it has been developed with the visually impaired community in mind. VimpyOS is labeled stable because it runs on a variety of Intel chipsets; it has been tested on i3, i5, dual-core and Pentium 4 processors.


The initramfs is used when the functionality of the kernel has to be extended, since it provides an early user space. It can be used to load third-party modules, provide a rescue shell, mount a root partition, etc. An initramfs usually contains at least one file, /init, which the kernel executes as the main init process (PID 1). It may also contain any number of additional files and directories required by /init. When the kernel mounts the initramfs, the target root partition is not yet mounted, so files stored on that partition cannot be accessed. All the files needed at this stage must therefore be included in the initramfs [25].

Figure 2. Vimpy Interface

VimpyOS is licensed under the GNU GPL version 3.0 [22]. Any external drive, such as a CD or pen drive, can be used to live-boot VimpyOS. The Vimpy package has also been made available for Ubuntu so that it can be used freely [12].

As VimpyOS loads, the Vimpy program starts immediately. This program is written in C and uses shell commands to perform the necessary functions. It is the software component of Vimpy and the backbone of the device. The program initiates the data tunneling process by providing an interface in the form of a ‘talking menu’. This menu serves as the complete user interface and provides all the functions offered by Vimpy. It presents the user with several options from which to choose the operation best suited to him/her. All options are read out, so a blind user can operate the menu easily and no display device is needed. The flow chart in Figure 3 describes this menu.

A document can be scanned and read out, and the user can then save the file if desired. An option to save the file directly for later use is also provided. To read a page horizontally, for example in the case of books (novels), alt+enter should be pressed to select an option instead of enter. Documents are saved as text files in a folder called “Text” in the memory. This folder contains time-stamped subfolders, one per date, and each file is stored in the folder for the current date.
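The storage layout just described can be illustrated with a short Python sketch. The paper does not give the naming scheme of the dated folders, so the ISO date format (for example 2012-04-20), the `.txt` extension and the function names below are assumptions made for illustration.

```python
from datetime import date
from pathlib import Path


def save_path(root, name, today=None):
    """Return <root>/Text/<YYYY-MM-DD>/<name>.txt (the date format is an assumption)."""
    day = today or date.today()
    return Path(root) / "Text" / day.isoformat() / f"{name}.txt"


def save_text(root, name, content, today=None):
    """Write recognized text into the folder for the given (or current) date."""
    target = save_path(root, name, today)
    target.parent.mkdir(parents=True, exist_ok=True)  # create the dated folder on first use
    target.write_text(content, encoding="utf-8")
    return target
```

Because every saved file lives under the single Text folder, copying that folder to an external drive carries all saved documents with it.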

The files stored in these folders can be accessed by navigating through the folders. Files on an external drive such as a USB stick can also be read. Language localization is another innovative feature of Vimpy: it can also read text files written in the Devanagari script, so Hindi/Marathi files on a USB stick can be read. After selecting the file to be read from the USB stick, the user is asked whether the file should be read in Hindi/Marathi or English. To provide portability, the Text folder can be copied to an external drive, and the files saved in it can be used on any other machine. The settings of Vimpy can be customized according to the user's preference: the amplitude and speed of the speech output can be controlled, and the user can choose between a male and a female voice. A “Restore Factory Settings” option resets all customized settings to their defaults. The folders that were created are also deleted in this process, so this option doubles as memory management and returns Vimpy to a brand-new state. To quit or switch off Vimpy, an option called “Exit” is provided. The program can also be left by pressing alt+enter (instead of enter alone) when selecting the option, which opens the terminal interface. Although the menu providing all the above options is the essence of Vimpy, programmers and experts can use the terminal interface to make technical changes.
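As a rough illustration of how such settings could be turned into an eSpeak command line, the Python sketch below maps a settings record onto eSpeak's documented options: `-a` (amplitude), `-s` (speed in words per minute), `-p` (pitch) and `-v` (voice, with a `+f2` variant for a female voice). The `SpeechSettings` type, its default values and the function names are assumptions for illustration, not Vimpy's actual code, and the use of the `hi` voice for Devanagari text is likewise assumed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpeechSettings:
    amplitude: int = 100    # eSpeak -a, 0 to 200 (assumed default)
    speed: int = 175        # eSpeak -s, words per minute (assumed default)
    pitch: int = 50         # eSpeak -p, 0 to 99 (assumed default)
    language: str = "en"    # "en" for English, "hi" assumed for Hindi/Marathi text
    female: bool = False    # eSpeak's base voice is male; a variant simulates a female one


FACTORY_DEFAULTS = SpeechSettings()


def customize(settings, **changes):
    """Return a copy of the settings with the given fields changed."""
    return replace(settings, **changes)


def restore_factory_settings():
    """Discard all customizations (the saved folders are not handled in this sketch)."""
    return FACTORY_DEFAULTS


def espeak_args(settings, text_file):
    """Build the eSpeak command line that reads text_file with these settings."""
    voice = settings.language + ("+f2" if settings.female else "")
    return ["espeak", "-v", voice,
            "-a", str(settings.amplitude),
            "-s", str(settings.speed),
            "-p", str(settings.pitch),
            "-f", text_file]
```

Making the record immutable keeps the factory defaults from being modified by accident; each customization produces a new record instead.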

A help tutorial explains how to use and navigate the program, so that the user can operate it quickly and comfortably without the need for vision.

The user navigates and selects options using the keyboard: the arrow keys move between options and enter selects one. The data tunnel then moves the data through a series of processes, each ready to execute on the data it receives.

The Vimpy interactive program, also called the talking menu, is built using the ncurses library [15]. ncurses (new curses) is a programming library whose API allows the programmer to write text-based user interfaces in a terminal-independent manner. It is a toolkit for developing "GUI-like" application software that runs under a terminal emulator. It also optimizes screen changes in order to reduce the latency experienced when using remote shells [25].
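The idea of a menu that announces every option can be sketched with Python's standard `curses` module, which wraps the same ncurses library. This is a minimal illustration, not the authors' C program: the `TalkingMenu` class, the option names and the `speak` callback (which on a real device might invoke eSpeak) are all hypothetical.

```python
import curses


class TalkingMenu:
    """Menu state that announces every change through a speak(text) callback."""

    def __init__(self, options, speak):
        self.options = list(options)
        self.speak = speak
        self.index = 0
        self.speak(self.options[0])          # announce the first option on start-up

    def move(self, delta):
        """Move the highlight up or down, wrapping around, and announce it."""
        self.index = (self.index + delta) % len(self.options)
        self.speak(self.options[self.index])

    def select(self):
        """Confirm the highlighted option aloud and return it."""
        choice = self.options[self.index]
        self.speak("Selected " + choice)
        return choice


def run(stdscr, menu):
    """Key loop: arrow keys navigate, enter selects. No screen output is needed."""
    stdscr.keypad(True)                      # deliver arrow keys as KEY_UP and KEY_DOWN
    while True:
        key = stdscr.getch()
        if key == curses.KEY_UP:
            menu.move(-1)
        elif key == curses.KEY_DOWN:
            menu.move(1)
        elif key in (curses.KEY_ENTER, 10, 13):
            return menu.select()
```

On a terminal, `curses.wrapper(run, TalkingMenu(["Scan and read", "Settings", "Exit"], speak))` would start the loop. Keeping the menu state separate from the key loop lets the announcements be checked without a terminal.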

Figure 3. Schematic Diagram of Vimpy


Figure 4. Vimpy functionalities and interactions

The hardware required for the basic functioning of the Vimpy Device comprises a scanner, a keyboard, an Atom board, RAM, speakers/headphones and a switched-mode power supply (SMPS). The scanner converts the physical document to its electronic counterpart. The keyboard is used to navigate the Vimpy menu. The Atom board is the basic processing unit and interfaces the hardware and software components with each other. The speakers, headphones or earphones are essential, as they deliver the audio output of Vimpy. The SMPS supplies electrical power to all the necessary hardware.

III. RESULTS AND DISCUSSIONS

Vimpy, a standalone device that bridges the gap between physical documents and their electronic versions, has been created. The device is designed with the many obstacles that visually impaired people face in accessing information in mind. VimpyOS, a stable operating system built on top of a Linux kernel (version 2.6.35.2), has been developed [23]. The generic C program has also been written. This program provides the user interface via speech synthesis. The user navigates through the options using the arrow keys and selects an option by pressing the enter key. The entire program is integrated with speech notifications to provide interactivity and user-friendliness. While a document is being read, the user can stop with the q key, pause with the p key, and move forward or rewind with the arrow keys. The up and down keys seek forward and backward by one minute, and the right and left keys seek forward and backward by ten seconds [18].
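The playback controls listed above can be summarized as a small key table. The Python sketch below is only a model of the behaviour described in the text; it does not call mplayer, and the key labels, the `PLAYBACK_KEYS` table and the `apply_key` function are hypothetical.

```python
# Key -> (action, seek offset in seconds); mirrors the controls listed in the text.
PLAYBACK_KEYS = {
    "q": ("quit", 0),
    "p": ("pause", 0),
    "RIGHT": ("seek", 10),
    "LEFT": ("seek", -10),
    "UP": ("seek", 60),
    "DOWN": ("seek", -60),
}


def apply_key(position, duration, key):
    """Return (action, new position in seconds), clamped to the length of the audio."""
    action, amount = PLAYBACK_KEYS.get(key, ("ignore", 0))
    if action == "seek":
        position = min(max(position + amount, 0), duration)
    return action, position
```

For a five-minute recording, `apply_key(30, 300, "UP")` moves the position from 30 to 90 seconds, while a seek past either end stops at the boundary.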

Vimpy is an open source project, and to promote its freedom the OS as well as the generic program can be downloaded from the dedicated Vimpy site, www.vimpy.in. The site also provides step-by-step tutorials on installing the OS and the program and on building your own Vimpy Device. Because it is an open source project, the user is not restricted to specialized hardware. The scanner, however, depends on SANE, through which it is interfaced with the OS [16].

Although the system has proved helpful to the blind, certain features are still missing, and improvements can be incorporated to make Vimpy more efficient. For example, the orientation in which the paper is placed in the scanner can vary, which can lead to undesired output. Also, using a keyboard is still more difficult than using voice commands.

The system achieves an accuracy of approximately 99.7% for regular font styles such as Times New Roman, Arial, etc. Vimpy is also able to read very small font sizes with an accuracy of 98.67%.
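The text does not state how these accuracy figures were measured. One common definition for OCR output, shown here purely as an assumption, is character accuracy: one minus the edit distance between the reference text and the recognized text, divided by the length of the reference. The Python sketch below implements that definition; both function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(reference, recognized):
    """Fraction of reference characters recovered correctly, between 0.0 and 1.0."""
    if not reference:
        return 1.0 if not recognized else 0.0
    errors = edit_distance(reference, recognized)
    return max(0.0, 1.0 - errors / len(reference))
```

Under this definition, a recognized text with one wrong character in a five-character reference scores 0.8; a figure such as 99.7% would correspond to roughly three character errors per thousand characters of reference text.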

The visually impaired community is the main stakeholder of this system, so testing the device with them is of great significance. For this reason, Vimpy has been tested by several representatives of the blind community. Mr. Krishnakant Mane is an IT advisor for many Government projects. Following his suggestions, the Vimpy Device has been ergonomically designed and developed. The resulting improvements are shown in Figures 5 and 6.

Figure 5. The Vimpy Device

Figure 6. The Vimpy Device with the hardware components

Mr. Satej Dicholkar works as a professor of Computers at the Happy Home and School for the Blind. He is partially visually impaired. The Vimpy Device was tested on 20 April 2012 at the Happy Home and School for the Blind under his supervision. According to the feedback, Vimpy was found to be highly efficient in terms of its functionality. A few suggestions were also made: language localization for scanning as well as reading, and conversion of the document to an audio format that can be used on MP3 players. Work is in progress on finding a stable Hindi OCR and on saving audio files to external drives such as USB sticks. The future scope includes handwriting analysis, which would allow even handwritten text to be read. VIMPY is an open source module, so it is easily available and open to contributions through development, user feedback and bug reporting. Its further distribution to social organizations and libraries for the blind is being worked on.

IV. CONCLUSION

The need to make printed information available to visually impaired users forms the basis and motivation of Vimpy. The project aims to help the visually impaired access data in an easier and better way. Vimpy is affordable compared to other existing devices and software, and it is robust, efficient, user-friendly and portable.

Vimpy has been implemented successfully. It has also been tested by several visually impaired people; their suggestions and issues have been duly noted, and most of the beneficial suggestions have been integrated into Vimpy. VimpyOS is licensed under the GNU GPL version 3.0. VimpyOS as well as the generic program is available for download at www.vimpy.in. The site also provides tutorials and help on how to use Vimpy and how to build the Vimpy Device, so that the user can be independent.

ACKNOWLEDGMENT

We would like to thank Mr. Krishnakant Mane and express our sincere gratitude for his guidance. This project has been sponsored by Don Bosco Institute of Technology, Mumbai, India.

REFERENCES

[1] Velazquez R., Hernandez H., Preza E.; A portable eBook reader for the blind. Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE. Print ISBN: 978-1-4244-4123-5, Issue Date: Aug. 31 2010-Sept. 4 2010, Pp: 2107 – 2110.

[2] Davis, J.H.; Print recognition apparatus for blind readers. Journal of the British Institution of Radio Engineers; Vol: 24 Issue: 2; Aug 1962, Pp: 103 – 110.

[3] Huang Xiaoli; Li Tao; Hu Bing; Cheng Qiang; Xiao Qiang; Huang Qiang; Electronic Reader for the Blind Based on MCU. Electrical and Control Engineering (ICECE), 2010 International Conference. Print ISBN: 978-1-4244-6880-5, Issue Date: 25-27 June 2010. Pp: 888 – 890.

[4] Jianli Liu, Nugent J.H., Bowen D.G., Bowen, J.E.; Intelligent OCR editor, Electrical and Computer Engineering (1993). Canadian Conference, Vol: 3, Pp: 9-11.

[5] Lee F.; Reading Machine: from text to speech; IEEE Transactions on Audio and Electroacoustics; Vol: 17 Issue: 4; Dec 1969, Pp: 275-282.

[6] O’Malley, M.H.: Text-to-speech conversion technology, Volume: 23 Issue: 8, Aug 1990 Pp: 17-23.

[7] Thiruvathukal, G.K.; Gentoo Linux: the next generation of Linux, Computing in Science & Engineering, Issue Date: Oct. 2004, Vol: 6, Issue: 5, Pp: 66 – 74.

[8] About Reader. Intel For Business, Online article, www.intel.com/corporate/healthcare/emea/eng/reader/features.htm (Accessed on 5 March 2011).

[9] Open Book. Nanopac, Inc. Technology for Independence, Online article, www.nanopac.com/open%20book.htm (Accessed on 5 September 2011).

[10] KNFB Reading Technology. KNFB Reading Technology Inc., Online article, http://www.knfbreader.com/ (Accessed on 5 September 2011).


[11] Zoom Reader Technology. Global Accessibility News, Online article, globalaccessibilitynews.com/2011/04/13/zoom-reader-for-iphone-4-an-app-for-low-vision-users/ (Accessed on 5 September 2011).

[12] The Ubuntu Project, Online article, www.ubuntu.com/project (Accessed on 10 September 2011).

[13] Cuneiform, Online article, en.wikipedia.org/wiki/CuneiForm (Accessed on 2 October 2011).

[14] eSpeak, Online article, eSpeak.sourceforge.net (Accessed on 5 February 2012).

[15] Ncurses and Cshell programming, Online article, tldp.org/HOWTO/NCURSES-Programming-HOWTO/menus.html (Accessed on 20 January 2012).

[16] SANE – Scanner Access Now Easy, Online article, en.wikipedia.org/wiki/Scanner_Access_Now_Easy (Accessed on 2 October 2011).

[17] eSpeak commands, Online article, www.eSpeak.sourceforge.net/commands.html (Accessed on 25 January 2012).

[18] Mplayer keyboard controls, Online article, www.mplayerhq.hu/DOCS/man/en/mplayer.1.html (Accessed on 22 February 2012).

[19] Visual Impairment and Blindness, Fact Sheet N°282, Online article, www.who.int/mediacentre/factsheets/fs282/en/ (Accessed on 2 October 2011).

[20] Braille, Online article, www.wikipedia.org/wiki/Braille (Accessed on 21 February 2012).

[21] Audiobook, Online article, wikipedia.org/wiki/Audiobook (Accessed on 21 February 2012).

[22] GNU GPL License, Online article, www.gnu.org/licenses/gpl-howto.html (Accessed on 2 March 2012).

[23] Linux Kernel, Online article, www.kernel.org/ (Accessed on 2 January 2012).

[24] Assistive Technologies, Online article, wikipedia.org/wiki/Ncurses (Accessed on 30 March 2012).

[25] Initramfs. Ubuntu Wiki, Online article, https://wiki.ubuntu.com/Initramfs (Accessed on 1 March 2011).
