A Visual Keyboard


  • 8/3/2019 A Visual Keyboard

    1/6

    A VISUAL KEYBOARD

    Replacement for the conventional keyboard...

    ABSTRACT

    Information and communication technologies can play a key role in helping people with special educational needs, considering both physical and cognitive disabilities. Eye-scanning cameras mounted on computers, replacing the keyboard or mouse, have become necessary tools for people without limbs or those affected by paralysis. The camera scans the image of the character, allowing users to "type" on a monitor as they look at the Visual Keyboard.

    The paper describes an input device, based on eye-scanning techniques, that allows people with severe motor disabilities to use gaze for selecting specific areas on a computer screen. Compared to other existing solutions, the strong points of this approach are simplicity and novelty.

    KEYWORDS: Vis-Key, ICT, Cornea, Sclera, Choroid, Retina, Digitizer, Pixel, Equalization, Contrast, Digital image.

    1. INTRODUCTION

    Vis-Key aims at replacing the conventional hardware keyboard with a Visual Keyboard. It employs sophisticated scanning and pattern-matching algorithms to achieve this objective, exploiting the eye's natural ability to navigate and spot familiar patterns. Eye-typing research extends over twenty years; however, there is little research on the design issues. Recent research indicates that the type of feedback impacts typing speed, error rate, and the user's need to switch gaze between the visual keyboard and the monitor.

    2. Description of Concepts

    2.1 The Eye


    Fig 2.1 Cross-Sectional View of the Human Eye

    Fig 2.1 shows the horizontal cross section of the human eye. The eye is nearly a sphere, with an average diameter of approximately 20 mm. Three membranes enclose the eye: the cornea and sclera outer cover, the choroid layer, and the retina. When the eye is properly focused, light from an object is imaged on the retina. Pattern vision is afforded by the distribution of discrete light receptors over the surface of the retina. There are two classes of receptors: cones and rods. The cones, located mainly in the central portion of the retina called the fovea, are highly sensitive to color. The number of cones in the human eye ranges from 6 to 7 million. These cones can resolve fine details because each is connected to its own nerve end. Cone vision is also known as photopic, or bright-light, vision. The rods are far more numerous than the cones (75 to 150 million). Several rods are connected to a single nerve end, which reduces the amount of detail discernible by these receptors. Rods give a general, overall picture of the field of view and are not much involved in color recognition. Rod vision is also known as scotopic, or dim-light, vision.

    As illustrated in Fig 2.1, the radius of curvature of the anterior surface of the lens is greater than that of its posterior surface. The shape of the lens is controlled by the tension in the fibers of the ciliary body. To focus on distant objects, the controlling muscles cause the lens to be relatively flattened; similarly, to focus on nearer objects, the muscles allow the lens to become thicker. The distance between the center of the lens and the retina (the focal distance) varies from about 17 mm to about 14 mm as the refractive power of the lens increases from its minimum to its maximum.

    3. The Vis-Key System

    The main goal of our system is to provide users suffering from severe motor disabilities (who are therefore able to use neither the keyboard nor the mouse) with a system that allows them to use a personal computer.

    Fig 3.1 System Hardware of the Vis-Key System

    The Vis-Key system (Fig 3.1) comprises a high-resolution camera that constantly scans the eye in order to capture the character image formed on it. The camera gives a continuous streaming video as output. The idea is to capture individual frames at regular intervals (say, a fraction of a second). These frames are then compared with the base frames stored in the repository. If the probability of success in matching exceeds the threshold value, the corresponding character is displayed on the screen.
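
The matching loop described above can be sketched as follows. Frames are modeled here as small binary (0/1) pixel grids and the repository as a dictionary mapping characters to base frames; the function names, data shapes, and threshold value are illustrative assumptions, not taken from the paper.

```python
def match_probability(frame, base):
    """Fraction of pixels that agree between a captured frame and a base frame."""
    total = len(frame) * len(frame[0])
    agree = sum(1 for r in range(len(frame))
                for c in range(len(frame[0]))
                if frame[r][c] == base[r][c])
    return agree / total

def recognize(frame, base_frames, threshold=0.9):
    """Return the character of the best-matching base frame, or None
    if no base frame matches above the threshold."""
    best_char, best_p = None, 0.0
    for char, base in base_frames.items():
        p = match_probability(frame, base)
        if p > best_p:
            best_char, best_p = char, p
    return best_char if best_p >= threshold else None
```

A frame identical to a stored base frame yields probability 1.0 and is recognized; a frame below the threshold against every base frame yields no output, matching the behavior described above.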

    The hardware requirements are simply a personal computer, the Vis-Key layout (chart) and a webcam connected to the USB port. The system design, which refers to the software level, relies on the construction, design and implementation of image-processing algorithms applied to the captured images of the user.

    4. System Architecture


    4.1. Calibration

    The calibration procedure aims at initializing the system. If the background is completely black (easy to obtain), the user's face appears as a white spot, and its borders can be identified where the number of black pixels decreases. The camera is positioned below the PC monitor; if it were above, in fact, when the user looks at the bottom of the screen the iris would be partially covered by the eyelid, making the identification of the pupil very difficult. The user should not be distant from the camera, so that the image does not contain much besides his or her face. The algorithms that respectively identify the face, the eye and the pupil, in fact, are based on scanning the image to find the black-pixel concentration: the more complex the image, the slower the algorithms and the lower the effective image resolution. The suggested distance is about 30 cm. The user's face should also be very well illuminated, and therefore two lamps are placed on each side of the computer screen. In fact, since the identification algorithms work on black and white images, shadows should not be present on the user's face.
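
The border-finding idea described above can be illustrated with a short sketch: assuming a binary image (0 = black, 1 = white) with a completely black background, the face's bounding box is found by scanning rows and columns for the appearance of white pixels. This is an assumption-laden illustration, not the system's actual code.

```python
def face_borders(image):
    """Return (top, bottom, left, right) of the white face region
    in a binary image (list of rows, 0 = black, 1 = white)."""
    rows_with_white = [r for r, row in enumerate(image) if any(row)]
    if not rows_with_white:
        return None  # no face found against the black background
    cols_with_white = [c for c in range(len(image[0]))
                       if any(row[c] for row in image)]
    return (rows_with_white[0], rows_with_white[-1],
            cols_with_white[0], cols_with_white[-1])
```

The returned box would then be used to crop every subsequent frame, which is what lets the later algorithms run on a smaller image.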

    4.2. Image Acquisition

    The camera image acquisition is implemented via the functions of the AviCap window class, which is part of the Video for Windows (VFW) functions. The entire image of the problem domain is scanned every 1/30 of a second. The output of the camera is fed to an analog-to-digital converter (digitizer), which digitizes it. Individual frames can then be extracted from the motion picture for further analysis and processing.

    4.3. Filtering of the eye component

    The chosen algorithms work on a binary (black and white) image, and are based on extracting the concentration of black pixels. Three algorithms are applied to the first acquired image, while from the second image on only the third one is applied.

    Fig 4.3.1 Algorithm 1 Face Positioning

    The first algorithm, whose goal is to identify the face position, is applied only to the first image, and the result will be used for processing the successive images, in order to speed up the process. This choice is acceptable since the user is supposed only to make minor movements. The face algorithm converts the image to black and white, and zooms it to obtain an image that contains only the user's face. This is


    done by scanning the original image and identifying the top, bottom, left and right borders of the face (Fig 4.3.1). Starting from the resulting image, the second algorithm extracts the information about the eye position (both left and right): the area with the highest concentration of black pixels is the one that contains the eyes. The algorithm uses this information to determine the top and bottom borders of the eye area (Fig 4.3.2), so that it can be extracted from the image. The new image is then analyzed to identify the eyes: the algorithm finds the right and left borders, and generates a new image containing the left and right eyes independently.

    The procedure described up to now is applied only to the first image of the sequence, and the data related to the right-eye position are stored in a buffer and used also for the following images. This is done to speed up the process, and is acceptable if the user makes only minor head movements. The third algorithm extracts the position of the center of the pupil from the right-eye image. The iris identification procedure uses the same approach as the previous algorithms: the left and right borders of the iris are identified, and the band between them, containing the iris, is the one with the highest concentration of black pixels. The center of this image also represents the center of the pupil. The result of this phase is the coordinates of the center of the pupil for each image in the sequence.
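
The third algorithm's black-pixel-concentration idea can be sketched as follows: within the right-eye image, the darkest row and column are assumed to pass through the iris, and their intersection is taken as the pupil center. The representation (0 = black, 1 = white) and the function are purely illustrative.

```python
def pupil_center(eye):
    """Return (row, col) of the estimated pupil center in a binary eye image."""
    black_per_row = [row.count(0) for row in eye]
    black_per_col = [sum(1 for row in eye if row[c] == 0)
                     for c in range(len(eye[0]))]
    # The row/column with the highest black-pixel concentration is
    # assumed to cross the iris; their intersection estimates the pupil.
    return (black_per_row.index(max(black_per_row)),
            black_per_col.index(max(black_per_col)))
```

Tracking this coordinate across the frame sequence is what yields the gaze trajectory used by the rest of the system.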

    4.4. Preprocessing

    The key function of preprocessing is to improve the image in ways that increase the chances of success for the other processes. Here preprocessing deals with four important techniques:

    - enhancing the contrast of the image;
    - eliminating or minimizing the effect of noise on the image;
    - isolating regions whose texture indicates a likelihood of alphanumeric information;
    - providing equalization for the image.
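
Of the techniques listed above, equalization is the most algorithmically concrete. A minimal sketch of classic histogram equalization for an 8-bit grayscale image (represented here as a flat list of pixel values) might look like this; the paper does not give its implementation, so this is only one standard realization.

```python
def equalize(pixels, levels=256):
    """Histogram-equalize a list of grayscale pixel values in [0, levels)."""
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function of the histogram.
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        return list(pixels)  # uniform image: nothing to spread
    # Map each value through the normalized CDF to stretch the contrast.
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
            for p in pixels]
```

Equalization spreads a narrow intensity distribution across the full range, which directly serves the contrast-enhancement goal listed first.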

    4.5. Segmentation

    Segmentation broadly defines the partitioning of an input image into its constituent parts or objects. In general, autonomous segmentation is one of the most difficult tasks in digital image processing. A rugged segmentation procedure brings the process a long way towards a successful solution of the imaging problem. In terms of character recognition, the key role of segmentation is to extract individual characters from the problem domain. The output of the segmentation stage is raw pixel data, constituting either the boundary of a region or all points in the region itself. In either case, converting the data into a form suitable for computer processing is necessary. The first decision is whether the data should be represented as a boundary or as a complete region. Boundary representation is appropriate when the focus is on external shape characteristics such as corners and inflections. Regional representation is appropriate when the focus is on internal shape characteristics such as


    texture and skeletal shape. Description, also called feature selection, deals with extracting features that result in some quantitative information of interest, or features that are basic for differentiating one class of objects from another.
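
One simple way to realize the character-extraction role described above is a vertical-projection profile: columns containing no black pixels are treated as gaps, and each maximal run of ink-bearing columns becomes one character region. The paper does not specify its segmentation algorithm, so this is only an illustrative sketch (0 = black ink, 1 = white background).

```python
def segment_columns(image):
    """Return inclusive (start, end) column ranges, one per character."""
    ink = [any(row[c] == 0 for row in image)  # does column c contain ink?
           for c in range(len(image[0]))]
    regions, start = [], None
    for c, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = c                     # a character region begins
        elif not has_ink and start is not None:
            regions.append((start, c - 1))  # a blank column ends it
            start = None
    if start is not None:
        regions.append((start, len(ink) - 1))
    return regions
```

Each returned range is a region representation of one character, ready for the description and recognition stages that follow.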

    4.6. Recognition and Interpretation

    Recognition is the process that assigns a label to an object based on the information provided by its descriptors. This process allows us to cognitively recognize characters based on a knowledge base. Interpretation involves assigning a meaning to an ensemble of recognized objects, i.e. attaching meaning to a set of labeled entities. For example, to identify a character, say 'C', we need to associate the descriptors for that character with the label 'C'.
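
The descriptor-to-label assignment described here can be illustrated as a nearest-neighbour lookup against stored descriptors: the object receives the label whose stored feature vector is closest to its own. The feature vectors below are invented for the example; the paper does not define its descriptors.

```python
def classify(descriptor, knowledge_base):
    """Return the label whose stored descriptor vector is closest
    (Euclidean distance) to the input descriptor."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(knowledge_base,
               key=lambda label: dist(descriptor, knowledge_base[label]))
```

Interpretation then operates on the resulting stream of labels, e.g. assembling recognized characters into words.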

    4.7. Knowledge Base

    Knowledge about a particular problem domain can be coded into an image-processing system in the form of a knowledge database. The knowledge may be as simple as detailing regions of an image where the information of interest is known to reside, thus limiting the search in seeking that information; or it can be quite complex, such as an image database in which all image entries are of high resolution. The key distinction of this knowledge base is that, in addition to guiding the operation of the various components, it facilitates feedback operations between the various modules of the system. As depicted in Fig 4.1, communication between processing modules is based on prior knowledge of what a result should be.

    5. Unique Features

    This model is a novel idea and the first of its kind in the making. It also opens a new dimension to how we perceive the world, and should prove to be a critical technological breakthrough considering the fact that there has not been sufficient research in this field of eye scanning. If implemented, it will be one of the awe-inspiring technologies to hit the market.

    6. Design Constraints

    Though this model is thought-provoking, we need to address its design constraints as well:

    - R & D constraints severely hamper our cause for a full-fledged working model of the Vis-Key system.
    - The need for a very high resolution camera calls for a high initial investment.
    - The accuracy and the processing capabilities of the algorithms depend heavily on the quality of the input.

    Given these design constraints, we are able to chalk out a plan encompassing modules 4, 5, 6 and 7 (Preprocessing; Segmentation and Representation; Recognition and Interpretation; and the Knowledge Base). The preferred software environments for implementing these algorithms are C++ and MATLAB 6.0.

    7. Alternatives / Related References

    The approaches to date have been centered only on eye-tracking theory, which lays more emphasis on the use of the eye as a cursor rather than as a data-input device. An eye-tracking device lets users select letters from a screen. Dasher, a prototype program, taps into the natural gaze of the eye and makes predictable words and phrases simpler to write, said David MacKay, project coordinator and physics professor from Cambridge University. Dasher calculates the probability of one letter coming after another. It then presents the letters required as if contained on infinitely expanding bookshelves. Researchers say people will be able to write up to 25 words per minute with Dasher, compared to on-screen keyboards, which they say average about 15 words per minute.
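
Dasher's letter-prediction step, as summarized above, can be illustrated with a toy bigram model that turns letter-pair counts into next-letter probabilities (the training string below is invented; Dasher's actual language model is more sophisticated).

```python
from collections import defaultdict

def bigram_model(text):
    """Map each letter to a dict of next-letter probabilities,
    estimated from adjacent-letter counts in the training text."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    # Normalize each letter's successor counts into probabilities.
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}
```

In Dasher, probabilities like these determine how much screen area each candidate next letter occupies, so likelier continuations are easier to select by gaze.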

    Eye-tracking devices are still


    problematic. "They need re-calibrating

    each time you look away from the

    computer," says Willis.

    He controls Dasher using a trackball.

    Bibliography:

    - http://www.cs.uta.fi/~curly/publications/ECEM12-Majaranta.html
    - www.inference.phy.cam.ac.uk/djw30/dasher/eye.html
    - http://www.inference.phy.cam.ac.uk/dasher/
    - http://www.cs.uta.fi/hci/gaze/eyetyping.php
    - http://www.acm.org/sigcaph
    - Ward, D. J. & MacKay, D. J. C. Fast hands-free writing by gaze direction. Nature, 418, 838 (2002).
    - Daisheng Luo, Pattern Recognition and Image Processing, Horwood Series in Engineering Sciences.