8/3/2019 A Visual Keyboard
1/6
A VISUAL KEYBOARD
Replacement for the conventional keyboard...
ABSTRACT
Information and communication technologies can play a key role in helping people with special educational needs, considering both physical and cognitive disabilities. Eye-scanning cameras mounted on computers, replacing the keyboard or mouse, have become necessary tools for people without limbs or those affected by paralysis. The camera scans the image of the character, allowing users to "type" on a monitor as they look at the Visual Keyboard.
The paper describes an input device, based on eye-scanning techniques, that allows people with severe motor disabilities to use gaze for selecting specific areas on a computer screen. Compared to other existing solutions, the strong points of this approach are simplicity and novelty.
KEYWORDS: Vis-Key, ICT, Cornea, Sclera, Choroid, Retina, Digitizer, Pixel, Equalization, Contrast, Digital Image.
1. INTRODUCTION
Vis-Key aims at replacing the conventional hardware keyboard with a Visual Keyboard. It employs sophisticated scanning and pattern-matching algorithms to achieve this objective, exploiting the eye's natural ability to navigate and spot familiar patterns. Eye-typing research extends over twenty years; however, there is little research on the design issues. Recent research indicates that the type of feedback impacts typing speed, error rate, and the user's need to switch gaze between the visual keyboard and the monitor.
2. Description of Concepts
2.1 The Eye
Fig 2.1 Cross-Sectional View of the Human Eye
Fig 2.1 shows the horizontal cross section of the human eye. The eye is nearly a sphere with an average diameter of approximately 20 mm. Three membranes enclose the eye: the cornea and sclera outer cover, the choroid layer, and the retina. When the eye is properly focused, light from an object is imaged on the retina. Pattern vision is afforded by the distribution of discrete light receptors over the surface of the retina. There are two classes of receptors: cones and rods. The cones, typically present in the central portion of the retina called the fovea, are highly sensitive to color. The number of cones in the human eye ranges from 6 to 7 million. Cones can resolve fine detail because each is connected to its own nerve ending. Cone vision is also known as photopic or bright-light vision. The rods are far more numerous than the cones (75 to 150 million). Several rods are connected to a single nerve ending, which reduces the amount of detail discernible by these receptors. Rods give a general, overall picture of the view and are not much involved in color recognition. Rod vision is also known as scotopic or dim-light vision. As illustrated in Fig 2.1, the curvature of the anterior surface of the lens is greater than that of its posterior surface. The shape of the lens is controlled by the tension in the fibers of the ciliary body. To focus on distant objects, the controlling muscles cause the lens to be relatively flattened; similarly, to focus on nearer objects, the muscles allow the lens to become thicker. The distance between the center of the lens and the retina (the focal length) varies from about 17 mm to about 14 mm as the refractive power of the lens increases from its minimum to its maximum.
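As a quick check on the figures above, refractive power in diopters is the reciprocal of the focal length expressed in meters, so the quoted 17 mm to 14 mm range corresponds to roughly 59 to 71 diopters:

```latex
P = \frac{1}{f}, \qquad
P_{\min} = \frac{1}{0.017\,\mathrm{m}} \approx 58.8\,\mathrm{D}, \qquad
P_{\max} = \frac{1}{0.014\,\mathrm{m}} \approx 71.4\,\mathrm{D}
```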
3. The Vis-Key System
The main goal of our system is to provide users suffering from severe motor disabilities (who are therefore able to use neither the keyboard nor the mouse) with a system that allows them to use a personal computer.
Fig 3.1 System Hardware of the Vis-Key System
The Vis-Key system (Fig 3.1) comprises a high-resolution camera that constantly scans the eye in order to capture the character image formed on it. The camera gives a continuous streaming video as output. The idea is to capture individual frames at regular intervals (say, a fraction of a second). These frames are then compared with the base frames stored in the repository. If the probability of a successful match exceeds the threshold value, the corresponding character is displayed on the screen. The hardware requirements are simply a personal computer, the Vis-Key layout (chart) and a web cam connected to the USB port. The system design, which refers to the software level, relies on the construction, design and implementation of image processing algorithms applied to the captured images of the user.
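The thresholded comparison against the repository of base frames can be sketched as follows. This is a minimal illustration, assuming frames are already binarized and flattened to 0/1 vectors of equal length; `matchScore` and `bestMatch` are hypothetical names, not part of the actual Vis-Key code:

```cpp
#include <vector>
#include <cstddef>

// Fraction of pixels that agree between a captured frame and a stored
// base frame (both binary images flattened to 0/1 vectors).
double matchScore(const std::vector<int>& frame, const std::vector<int>& base) {
    std::size_t same = 0;
    for (std::size_t i = 0; i < frame.size(); ++i)
        if (frame[i] == base[i]) ++same;
    return static_cast<double>(same) / frame.size();
}

// Index of the best-matching base frame, or -1 if no score
// exceeds the acceptance threshold.
int bestMatch(const std::vector<int>& frame,
              const std::vector<std::vector<int>>& bases,
              double threshold) {
    int best = -1;
    double bestScore = threshold;
    for (std::size_t k = 0; k < bases.size(); ++k) {
        double s = matchScore(frame, bases[k]);
        if (s > bestScore) { bestScore = s; best = static_cast<int>(k); }
    }
    return best;
}
```

A returned index of -1 would mean "no character displayed", matching the paper's threshold rule.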
4. System Architecture
4.1. Calibration
The calibration procedure aims at initializing the system. The first algorithm, whose goal is to identify the face position, is applied only to the first image, and its result is reused when processing the successive images in order to speed up the process. This choice is acceptable since the user is supposed to make only minor movements. If the background is completely black (easy to obtain), the user's face appears as a white spot, and its borders can be found where the number of black pixels decreases. The camera is positioned below the PC monitor; if it were above, when the user looks at the bottom of the screen the iris would be partially covered by the eyelid, making the identification of the pupil very difficult. The user should not be distant from the camera, so that the image does not contain much besides his or her face. The algorithms that identify the face, the eye and the pupil are based on scanning the image to find the concentration of black pixels: the more complex the image, the slower the algorithm; in addition, the effective image resolution will be lower. The suggested distance is about 30 cm. The user's face should also be very well illuminated, and therefore two lamps were placed, one on each side of the computer screen. Since the identification algorithms work on black-and-white images, shadows should not be present on the user's face.
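The border search described above can be sketched as follows. This is a minimal illustration assuming the binarized image is a 0/1 matrix with 1 for white; `faceBorders` is a hypothetical helper, not the system's actual code:

```cpp
#include <vector>

// On a black background the face shows up as the white region;
// its borders are the extreme rows/columns still containing white pixels.
struct Box { int top, bottom, left, right; };

Box faceBorders(const std::vector<std::vector<int>>& img) { // 1 = white
    int h = static_cast<int>(img.size());
    int w = static_cast<int>(img[0].size());
    Box b{h, -1, w, -1};
    for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c)
            if (img[r][c] == 1) {
                if (r < b.top)    b.top = r;
                if (r > b.bottom) b.bottom = r;
                if (c < b.left)   b.left = c;
                if (c > b.right)  b.right = c;
            }
    return b;
}
```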
4.2. Image Acquisition
Camera image acquisition is implemented via the functions of the AviCap window class, part of the Video for Windows (VFW) API. The entire image of the problem domain is scanned every 1/30 of a second. The output of the camera is fed to an analog-to-digital converter (digitizer), which digitizes it. Individual frames can then be extracted from the motion picture for further analysis and processing.
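The digitizer stage mentioned above can be illustrated with a toy quantizer. This is an assumption-laden sketch, not the VFW API: it simply maps a normalized analog sample in [0, 1] to one of 256 discrete grey levels:

```cpp
#include <cmath>

// A digitizer maps an analog sample (here normalized to [0, 1])
// to one of 256 discrete grey levels; out-of-range inputs are clamped.
int quantize(double sample) {
    if (sample < 0.0) sample = 0.0;
    if (sample > 1.0) sample = 1.0;
    return static_cast<int>(std::lround(sample * 255.0));
}
```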
4.3. Filtering of the eye component
The chosen algorithms work on a binary (black-and-white) image and are based on extracting the concentration of black pixels. Three algorithms are applied to the first acquired image, while from the second image on only the third one is applied.
Fig 4.3.1 Algorithm 1 Face Positioning
The first algorithm, whose goal is to identify the face position, is applied only to the first image, and its result is reused when processing the successive images in order to speed up the process. This choice is acceptable since the user is supposed to make only minor movements. The face algorithm converts the image to black and white and zooms it to obtain an image that contains only the user's face. This is done by scanning the original image and identifying the top, bottom, left and right borders of the face (Fig 4.3.1). Starting from the resulting image, the second algorithm extracts the information about the position of the eyes (both left and right): the horizontal band with the highest concentration of black pixels is the one that contains the eyes. The algorithm uses this information to determine the top and bottom borders of the eye area (Fig 4.3.2), so that it can be extracted from the image. The new image is then analyzed to identify each eye: the algorithm finds the right and left borders and generates new images containing the left and right eyes independently.
The procedure described so far is applied only to the first image of the sequence; the data related to the right-eye position are stored in a buffer and used also for the following images. This is done to speed up the process, and is acceptable if the user makes only minor head movements. The third algorithm extracts the position of the center of the pupil from the right-eye image. The iris identification procedure uses the same approach as the previous algorithm: the left and right borders of the iris are identified, the iris being the band with the highest concentration of black pixels. The center of this region also represents the center of the pupil. The result of this phase is the coordinates of the center of the pupil for each image in the sequence.
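The third algorithm's idea can be sketched minimally as follows, assuming a binary eye image with 0 for black (the helper name `pupilCenter` is invented for illustration): the row and column with the highest black-pixel counts give an estimate of the pupil centre.

```cpp
#include <vector>
#include <utility>

// Estimate the pupil centre as the row and column with the highest
// count of black pixels (0 = black) in the binary eye image.
std::pair<int, int> pupilCenter(const std::vector<std::vector<int>>& eye) {
    int h = static_cast<int>(eye.size());
    int w = static_cast<int>(eye[0].size());
    std::vector<int> rowBlack(h, 0), colBlack(w, 0);
    for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c)
            if (eye[r][c] == 0) { ++rowBlack[r]; ++colBlack[c]; }
    int br = 0, bc = 0;
    for (int r = 1; r < h; ++r) if (rowBlack[r] > rowBlack[br]) br = r;
    for (int c = 1; c < w; ++c) if (colBlack[c] > colBlack[bc]) bc = c;
    return {br, bc}; // (row, column) of the estimated centre
}
```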
4.4. Preprocessing
The key function of preprocessing is to improve the image in ways that increase the chances of success of the subsequent processes. Here preprocessing deals with four important techniques:
- enhancing the contrast of the image;
- eliminating or minimizing the effect of noise on the image;
- isolating regions whose texture indicates they are likely to contain alphanumeric information;
- providing equalization for the image.
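The equalization step listed above can be sketched for an 8-bit greyscale image; this is a simplified illustration of standard histogram equalization, since the paper does not specify its own implementation:

```cpp
#include <vector>
#include <cmath>

// Histogram equalization for an 8-bit greyscale image: each grey level v
// is remapped through the cumulative distribution of the input levels,
// spreading the histogram over the full 0..255 range.
std::vector<int> equalize(const std::vector<int>& pixels) {
    const int L = 256;
    std::vector<int> hist(L, 0);
    for (int v : pixels) ++hist[v];
    std::vector<double> cdf(L, 0.0);
    double running = 0.0;
    for (int v = 0; v < L; ++v) {
        running += hist[v];
        cdf[v] = running / pixels.size();
    }
    std::vector<int> out;
    out.reserve(pixels.size());
    for (int v : pixels)
        out.push_back(static_cast<int>(std::lround(cdf[v] * (L - 1))));
    return out;
}
```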
4.5. Segmentation
Segmentation broadly denotes the partitioning of an input image into its constituent parts or objects. In general, autonomous segmentation is one of the most difficult tasks in digital image processing. A rugged segmentation procedure brings the process a long way towards a successful solution of the imaging problem. In terms of character recognition, the key role of segmentation is to extract individual characters from the problem domain. The output of the segmentation stage is raw pixel data, constituting either the boundary of a region or all points in the region itself. In either case, converting the data into a form suitable for computer processing is necessary. The first decision is whether the data should be represented as a boundary or as a complete region. Boundary representation is appropriate when the focus is on external shape characteristics such as corners and inflections. Regional representation is appropriate when the focus is on internal shape characteristics such as texture and skeletal shape. Description, also called feature selection, deals with extracting features that result in some quantitative information of interest, or features that are basic for differentiating one class of objects from another.
4.6. Recognition and Interpretation
Recognition is the process that assigns a label to an object based on the information provided by its descriptors; this is how the system cognitively recognizes characters using the knowledge base. Interpretation involves assigning a meaning to an ensemble of recognized objects. For example, to identify a character, say 'C', we need to associate the descriptors extracted for that character with the label 'C'.
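The label-assignment step can be sketched as a nearest-template search over descriptor vectors. This is an illustrative stand-in for whatever descriptors the knowledge base actually stores, using a city-block distance for simplicity:

```cpp
#include <vector>
#include <string>
#include <cstdlib>
#include <cstddef>

// A stored template: a label plus the descriptor vector that
// characterizes that class of objects.
struct Template { std::string label; std::vector<int> descriptor; };

// Assign the label of the template whose descriptor is closest
// (city-block distance) to the unknown object's descriptor.
std::string recognize(const std::vector<int>& descriptor,
                      const std::vector<Template>& knowledge) {
    std::string best;
    long bestDist = -1;
    for (const Template& t : knowledge) {
        long d = 0;
        for (std::size_t i = 0; i < descriptor.size(); ++i)
            d += std::abs(descriptor[i] - t.descriptor[i]);
        if (bestDist < 0 || d < bestDist) { bestDist = d; best = t.label; }
    }
    return best;
}
```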
4.7. Knowledge Base
Knowledge about a particular problem domain can be coded into an image processing system in the form of a knowledge database. The knowledge may be as simple as detailing the regions of an image where the information of interest is known to be located, thus limiting the search for that information. Or it can be quite complex, such as an image database in which all entries are high-resolution images. The key distinction of this knowledge base is that, in addition to guiding the operation of the various components, it facilitates feedback between the various modules of the system. As depicted in Fig 4.1, communication between processing modules is based on prior knowledge of what a result should be.
5. Unique Features
This model is a novel idea and the first of its kind in the making. It also opens a new dimension in how we perceive the world, and should prove to be a critical technological breakthrough, considering that there has not been sufficient research in this field of eye scanning. If implemented, it will be one of the awe-inspiring technologies to hit the market.
6. Design Constraints
Though this model is thought-provoking, we need to address the design constraints as well.
- R&D constraints severely hamper our cause for a full-fledged working model of the Vis-Key system.
- The need for a very high-resolution camera calls for a high initial investment.
- The accuracy and the processing capabilities of the algorithms depend heavily on the quality of the input.
Due to these design constraints, we chalk out a plan that encompasses modules 4, 5, 6 and 7 (Preprocessing, Segmentation and Representation, Recognition and Interpretation, and the Knowledge Base). The preferred software tools for implementing these algorithms are C++ and MATLAB 6.0.
7. Alternatives/Related References
The approaches to date have centered on eye-tracking theory, which lays more emphasis on using the eye as a cursor rather than as a data-input device. An eye-tracking device lets users select letters from a screen. Dasher, a prototype program, taps into the natural gaze of the eye and makes predictable words and phrases simpler to write, said David MacKay, project coordinator and physics professor at Cambridge University. Dasher calculates the probability of one letter coming after another, then presents the required letters as if contained on infinitely expanding bookshelves. Researchers say people will be able to write up to 25 words per minute with Dasher, compared to on-screen keyboards, which they say average about 15 words per minute.
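The letter-prediction idea attributed to Dasher above can be illustrated with a simple bigram model. This is not Dasher's actual code, which uses a far more sophisticated language model; it merely shows how P(next letter | previous letter) can be estimated from counts over a training text:

```cpp
#include <map>
#include <string>
#include <utility>
#include <cstddef>

// Estimate P(next | prev) from bigram counts over a training text:
// the number of times the pair (prev, next) occurs, divided by the
// number of times prev occurs as the first element of any pair.
double bigramProb(const std::string& text, char prev, char next) {
    std::map<std::pair<char, char>, int> pairCount;
    std::map<char, int> prevCount;
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        ++pairCount[{text[i], text[i + 1]}];
        ++prevCount[text[i]];
    }
    if (prevCount[prev] == 0) return 0.0;
    return static_cast<double>(pairCount[{prev, next}]) / prevCount[prev];
}
```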
Eye-tracking devices are still problematic. "They need re-calibrating each time you look away from the computer," says Willis. He controls Dasher using a trackball.
Bibliography:
- http://www.cs.uta.fi/~curly/publications/ECEM12-Majaranta.html
- www.inference.phy.cam.ac.uk/djw30/dasher/eye.html
- http://www.inference.phy.cam.ac.uk/dasher/
- http://www.cs.uta.fi/hci/gaze/eyetyping.php
- http://www.acm.org/sigcaph
- Ward, D. J. & MacKay, D. J. C. Fast hands-free writing by gaze direction. Nature, 418, 838 (2002).
- Daisheng Luo, Pattern Recognition and Image Processing, Horwood Series in Engineering Sciences.