A Single Camera Based Floating Virtual Keyboard


978-1-4673-4681-8/12/$31.00 ©2012 IEEE

2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel

A Single Camera Based Floating Virtual Keyboard with Improved Touch Detection

Erez Posner, Nick Starzicki, Eyal Katz
Department of Electrical Engineering, Afeka Tel Aviv College of Engineering

218 Bney Efraim Rd., Tel Aviv 69107, Israel
[email protected], [email protected], [email protected]

Abstract: A virtual keyboard enables the user to type on any surface, including plain paper or a desk. Some virtual keyboards give vibration feedback; some are projected on the typing surface, while others give a different kind of visual feedback, such as being shown on a smart phone's screen. The user presses the virtual keys, thus typing the desired input text.

In this work we have implemented a virtual keyboard based on a single standard camera, by improving shadow-based touch detection. The proposed solution is applicable to any surface. The system has been implemented on an Android phone, operates in real time, and gives excellent results.

I. INTRODUCTION

This paper describes an improved method implementing a virtual keyboard using the single integrated standard two-dimensional (2D) camera of a smart phone. Virtual keyboards have been proposed based on different methods [1]-[13]. Camera-based virtual keyboards can be implemented using a single camera or multiple cameras. One of the major challenges is determining whether the finger touches the surface or not. Touch detection based on a real three-dimensional (3D) model built from a stereoscopic camera system is more accurate than single-camera solutions. However, since stereoscopic cameras are not common in mobile phones, this method is less applicable to mobile solutions. The challenge of accurate touch detection is even greater when using a single camera on almost any surface. Floating virtual keyboards, which are portable and allow directing the camera at any surface, are even more challenging to implement. The proposed solution is based on a single standard mobile phone camera; it implements a floating keyboard and presents an improved touch detection method based on shadow analysis. This enables working on any surface, as long as both the finger and its shadow are visible to the camera.


The rest of the paper is organized as follows: Section II reviews related work on existing 2D and 3D touch detection methods. Section III describes the proposed system, the shadow-based touch detection improvement, and the conditions applicable to the proposed solution. Section IV shows comparison results, and Section V concludes the paper.

II. RELATED WORK

A virtual keyboard can be implemented with several different techniques. In [1], a virtual keyboard (VKB) based on true-3D optical ranging is presented. It is accurate and robust; however, it requires a 3D optical imaging system. Similarly, the systems in [2], [3] provide special features such as hand gestures and multi-touch, but [2] requires multiple cameras and [3] requires unique hardware. In [4], the shadow of a finger is detected, and when it is occluded by the finger, a touch is assumed. The touch detection system of [5] was designed to detect a touch by comparing the ratio of black pixels to white ones: small regions around the fingertips are searched, and the number of white pixels is compared to the number of black ones, where black pixels represent the shadow. If the ratio of white to black pixels exceeds a certain threshold, a touch has occurred. However, these methods are sensitive to the direction of lighting; in many cases only a thin portion of the shadow is captured by the camera, so a finger that appears close to touching the surface may still be far from it, since the pixel difference is small. In [6] and [7], a high-speed camera is used, and a special in-air movement must be made.

III. PROPOSED SOLUTION

This section describes in detail the implementation of the proposed solution shown in Figure 1, which is primarily based on [5] with added modifications.


The system is divided into two portions: the captured-image analysis and the shadow analysis. Initially, the captured image is preliminarily filtered to reduce the noise level. Then the captured image is subtracted from an initial reference image, and the user's hand is detected using color segmentation, resulting in the hand Region of Interest (ROI). Morphological operations are used to enhance the preliminary filter results, removing artifacts and undesirable small blobs. This ROI is then further processed by edge detection, which is eventually used to identify the fingertips. For fingertip detection we used the algorithm described in [8]. The challenge of touch detection is addressed by using the shadow as a measure of the distance to the surface.

Hence, the shadow is extracted using image subtraction between an initial reference image and the captured image. The subtracted image is processed in the same way as the captured image in order to find the shadow's contour, which leads to the shadow's tip. The crossing between the fingertip and the shadow's tip is calculated under specific terms and constraints to detect a touch. Feedback is given to the user: the virtual keyboard displayed on the Android phone's screen turns red and the selected letter is presented.

One of the main constraints is the implementation on an Android phone, which is not as powerful as a PC. Specific steps are taken to reduce the complexity of our system.

Figure 1: Block diagram of the proposed virtual keyboard. The red outline marks the improved algorithm.

1. Filtering, Image Subtraction, Color Segmentation and Morphological Operations

The first phase is to separate the pixels that could potentially be hand pixels from those that cannot. The processed frame is blurred using a 3x3 Gaussian filter, which reduces image noise and detail. Then the image is subtracted from an initial reference image, as presented in [5]. To enhance the hand detection and reject abnormal objects, skin segmentation is used. The captured image is coded in the YUV color space due to hardware constraints, so a transformation to the HSV color space is performed. Since the hand's dominant color is red, the HSV color space enables a comprehensive view of the entire red color region, so different users' hand colors are not rejected. The detected hand region that passes a certain threshold is set to white, while the rest of the image is set to black. Ideally, only the hand should remain after these operations; in practice small bumps remained, so we applied a median filter and a morphological close. The median filter is applied using a 3x3 rectangular element as follows:

    med3x3(x, y) = median{ I(x + i, y + j) : i, j = -1, 0, +1 }        (1)

    I'(x, y) = 255  if med3x3(x, y) >= 255/2
             = 0    if med3x3(x, y) <  255/2                           (2)
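As a concrete illustration, the median filtering and thresholding of Eqs. (1)-(2) can be sketched in pure Python; the `median3x3`/`binarize` names and the tiny test image are illustrative assumptions, not the paper's code.

```python
def median3x3(img, x, y):
    """Eq. (1): median of the 3x3 neighborhood around (x, y), clamping at borders."""
    h, w = len(img), len(img[0])
    window = [img[min(max(y + j, 0), h - 1)][min(max(x + i, 0), w - 1)]
              for j in (-1, 0, 1) for i in (-1, 0, 1)]
    return sorted(window)[4]  # 5th of 9 values

def binarize(img):
    """Eq. (2): pixel -> 255 if the 3x3 median >= 255/2, else 0."""
    h, w = len(img), len(img[0])
    return [[255 if median3x3(img, x, y) >= 255 / 2 else 0
             for x in range(w)] for y in range(h)]

# An isolated bright pixel (noise) is removed; the solid bright row survives.
noisy = [[0, 0, 0, 0],
         [0, 255, 0, 0],
         [0, 0, 0, 0],
         [255, 255, 255, 255]]
clean = binarize(noisy)
```

This is the standard salt-and-pepper suppression behavior the text describes: single-pixel "bumps" vanish while coherent hand regions are preserved.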

The morphological close technique is based on dilating and then eroding the ROI using a 3x3 rectangular element.


    dst = erode( dilate( src ) )                                       (3)

where dilate is:

    dilate(x, y) = max over (x', y') in the element of src(x + x', y + y')   (4)

and erode is:

    erode(x, y) = min over (x', y') in the element of src(x + x', y + y')    (5)

where src is the image and (x', y') ranges over the structuring element.
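A minimal sketch of the close operation of Eqs. (3)-(5), assuming a binary image stored as a list of lists with border clamping; the function names are illustrative.

```python
def _apply(img, reduce_fn):
    """Slide a 3x3 structuring element over img, reducing each window with reduce_fn."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = reduce_fn(vals)  # max = dilate, Eq. (4); min = erode, Eq. (5)
    return out

def close(img):
    """Eq. (3): dst = erode(dilate(src)); fills small holes in the ROI."""
    return _apply(_apply(img, max), min)

# A one-pixel hole inside a white blob is filled by closing.
blob = [[255, 255, 255],
        [255, 0, 255],
        [255, 255, 255]]
closed = close(blob)
```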

In some cases small objects other than the hand remained and had to be removed; this is done after the edge detection stage.

Figure 2: Finger Separation. The finger region is shown in white.

2. Edge Detection

The edge detection phase is essential for finding the hand's contour and then the fingertip. To extract the contour we use a Canny filter. Additional modifications are needed because the acquired contour is not continuous; to close it we use an 8-connected neighborhood component analysis that produces a set of counter-clockwise perimeter coordinates tracing the outline of the hand:

    I(j) = { (xj, yj) }                                                (6)

This enables a complete traversal of the hand's edge, used in stage 3, and removal of small bumps that may have remained.

To remove the remaining bumps we threshold the image under the assumption that there should be only one large object in the frame, namely the hand. The area of each remaining object is calculated, and the largest one is determined to be the hand. To reduce complexity, the continuous contour is transformed into a discrete one.

Figure 3: Finger Edge Detection. The continuous finger curve is shown in white (a); the discrete finger curve is shown in yellow (b).

3. Tip Detection

In this stage the tip of the object is found using the approach presented in [8]. The finger's discrete outline is converted into a list of consecutive coordinates representing the contour of the finger. Every three consecutive pixel coordinates [C(j - k), C(j), C(j + k)] represent the three vertices of a triangle, and the head angle is calculated under the assumption that the middle coordinate is the head vertex. The angle is calculated using the law of cosines:

    theta = arccos( (a^2 + b^2 - c^2) / (2ab) )                        (7)

where theta is the angle at the contour point examined as a potential peak, and a, b, c are the triangle edges calculated as the distances between [C(j - k), C(j), C(j + k)].

Once all angles are acquired, the smallest angle represents the fingertip, under the assumption that the neighborhood of the fingertip is the only area yielding the smallest angle.

For this method to work the finger's contour must be discrete; otherwise the coordinates will be too close to each other and even the fingertip angle will be large. In addition, the discretization reduces the complexity of the system.

Figure 4: Fingertip Detection. The detected fingertip is shown as a red dot on the hand's contour.
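The tip-detection rule can be sketched as follows; the contour data, the value of k, and the function names are illustrative assumptions, not data or code from the paper.

```python
import math

def head_angle(p_prev, p, p_next):
    """Eq. (7): angle at p of the triangle (p_prev, p, p_next), via the law of cosines."""
    a = math.dist(p, p_prev)
    b = math.dist(p, p_next)
    c = math.dist(p_prev, p_next)
    return math.acos((a * a + b * b - c * c) / (2 * a * b))

def fingertip(contour, k=2):
    """Index of the contour point whose head angle [C(j-k), C(j), C(j+k)] is smallest."""
    best_j, best_angle = None, math.inf
    for j in range(k, len(contour) - k):
        ang = head_angle(contour[j - k], contour[j], contour[j + k])
        if ang < best_angle:
            best_j, best_angle = j, ang
    return best_j

# A V-shaped contour: the sharp vertex at index 3 has the smallest head angle.
contour = [(0, 6), (1, 4), (2, 2), (3, 0), (4, 2), (5, 4), (6, 6)]
tip = fingertip(contour, k=2)
```

Stepping by k rather than 1 mirrors the discretization argument above: with adjacent pixels the triangle degenerates and even the fingertip's angle would be large.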

    4. Shadow Extraction

Most smart phones, and in particular the Samsung Galaxy S I9000, offer many camera features, among them manual ISO control. ISO determines how sensitive the image sensor is to light; ISO 100 was found effective in preventing dynamic exposure adjustments that could affect the shadow isolation.

The image obtained in stage 1 and the captured image are subtracted from the initial reference image, leaving only the shadow in the newly obtained image (after transforming the result to a binary image). Then the image containing the shadow is processed as if a hand were being detected through


phases 1 through 3 (without skin segmentation). The outcome is as described in stage 3.
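A minimal sketch of this shadow-isolation step, assuming grayscale frames as lists of lists and a hand mask from stage 1; the threshold value and names are illustrative assumptions.

```python
def shadow_mask(reference, frame, hand_mask, thresh=40):
    """Binary shadow image: pixels that darkened versus the reference
    frame and do not belong to the hand mask from stage 1."""
    h, w = len(reference), len(reference[0])
    return [[255 if abs(reference[y][x] - frame[y][x]) > thresh
                    and hand_mask[y][x] == 0 else 0
             for x in range(w)] for y in range(h)]

# Pixel (0,1) darkened by the shadow; pixel (0,2) is the hand itself.
reference = [[200, 200, 200]]
frame     = [[200, 100,  60]]
hand      = [[  0,   0, 255]]
mask = shadow_mask(reference, frame, hand)
```

The resulting binary shadow image can then be fed through the same contour and tip-detection stages as the hand, as the text describes.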

Figure 5: Shadow Processing. (a) the shadow extracted from the captured image; (b) the shadow's contour, consisting of the main shadow and an unnecessary small shadow; (c) the shadow's continuous curve; (d) the detected shadow tip, shown as a green dot on the shadow's contour.

5. Touch Detection

The fingertip detected in stage 3 and the corresponding shadow tip found in stage 4 are estimates of the finger and shadow locations in the image. The fingertip is marked on the hand's curve as (xSF, ySF), shown in red in Figures 6.a and 6.b. The corresponding point on the shadow's curve, (xS, yS), is marked in green in Figure 6.a (the corresponding point on the shadow curve is always visible). Then the distance between (xSF, ySF) and (xS, yS) is calculated as:

    d = || (xSF - xS, ySF - yS) ||                                     (8)

as opposed to the pixel ratio measured in [5]:

    r = (number of white pixels) / (number of black pixels)            (9)

Figure 6: The shadow curve is shown in blue, the finger curve in white. (c) shows a typical no-touch case: the distance between the red and green dots is large. (d) shows a typical touch case: the distance is small. (e) and (f) show the virtual keyboard as seen on the phone's screen: (e) the floating keyboard; (f) a touch detected on the letter "I".

6. Mapping

The bottom half of the phone's screen is divided into 30 buttons. The button coordinates are known, making the mapping quite simple: once a touch is detected, its coordinate is known and is compared to the button ranges to obtain the requested key and provide user feedback. The keyboard was implemented in both English and Hebrew.
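The distance test of Eq. (8) and the button mapping can be sketched together; the 3x10 button grid, the screen geometry, and the threshold below are illustrative assumptions (the paper specifies only 30 buttons on the bottom half of the screen).

```python
import math

def button_of(x, y, screen_w=480, kb_top=400, kb_h=240, rows=3, cols=10):
    """Map a screen coordinate to a button index, or None outside the keyboard."""
    if y < kb_top or y >= kb_top + kb_h:
        return None
    row = (y - kb_top) * rows // kb_h
    col = x * cols // screen_w
    return row * cols + col

def detect_touch(finger_tip, shadow_tip, thresh=3.0):
    """Eq. (8): a touch iff the fingertip and shadow tip are close
    AND lie in the same button region, per the validity rule above."""
    b1, b2 = button_of(*finger_tip), button_of(*shadow_tip)
    if b1 is not None and b1 == b2 and math.dist(finger_tip, shadow_tip) < thresh:
        return b1
    return None

# Tips ~1.4 px apart inside the same button: a touch on that key.
key = detect_touch((100, 450), (101, 451))
# Tips 20 px apart ("almost touch"): no touch is declared.
no_key = detect_touch((100, 450), (100, 470))
```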

IV. RESULTS AND COMPARISONS

As seen in Figure 6.a, when the finger is distant from the surface the distance d is large and there is no touch; when they are close, as shown in Figure 6.b, the distance is small and a touch has occurred. Note that when the distance d is small enough to represent a touch, both the fingertip and the shadow tip must lie in the same region representing a certain letter for the touch to be valid. Special cases are treated or flagged by the system.

Examining the ratio r of Eq. (9) from [5], it is apparent that the shadow area changes little or not at all between "almost touch" and actual touch, and is never reduced to an area of zero. Unlike that measure, the proposed measure calculates the distance between the fingertip and the shadow tip, and the measure of Eq. (8) reduces to zero upon actual touch.

Furthermore, regarding the possibility of false touch detection: the existing method of [5] requires a threshold that depends on lighting conditions and the texture of the virtual keyboard surface, so this threshold acts on variable and noisy input.

The proposed solution always searches for a fingertip-to-shadow-tip distance that ideally reduces to one pixel, depending on the light source position. Therefore, it appears that this


method [5] may yield several points representing a touch, or a point representing a false touch, contrary to the user's intent: there are several places in the image where the ratio favors white pixels, yet this does not necessarily mean the finger is touching the desired point.

    Parameter                            No touch   Almost touch   Touch
    Existing measure: pixel ratio [5]    1.2        0.1            0.07
    Proposed measure: distance (d)       20.5       7.3            0.9

Table 1: Distance measurement vs. pixel ratio, using a 3x3 rectangular element. Note the difference in values between almost-touch and touch.

The presented touch detection method solves these issues by adding another layer of accuracy. Analyzing the finger and fingertip and calculating the distance to the shadow tip yields very accurate coordinates of the location at which a touch has occurred. As mentioned in the previous section, a touch is only valid if both the fingertip and the shadow tip are in the same letter region, further adding to the system's accuracy. Special cases in which the region difference is minimal are treated by the system. Our touch detection implementation runs in real time at 15 fps, with a touch accuracy of approximately 95% and minimal false touch detection.

V. SUMMARY AND CONCLUSIONS

A floating virtual keyboard based on a single camera has been presented, implemented on an Android smart phone, running in real time. Since the implementation runs on a mobile phone, no extra hardware is required.

The touch detection is performed using improved shadow-based touch detection and runs at 15 fps. The detection is based on measuring the distance between corresponding points on the shadow and finger, rather than the ratio of finger and shadow pixels, and is therefore more robust to varying illumination conditions. Hand detection is based on the HSV color space, which, although heuristic, still provides better results than the RGB color space. Our touch accuracy is approximately 95% and false touch detection is minimal.

The virtual keyboard implementation took system complexity into consideration, allowing the final product to run on a machine less powerful than a PC. This was done by rewriting many functions and minimizing their time complexity.

Within our project we implemented the option of choosing more than one language, and more can be added. This was possible because our virtual keyboard is shown on a smart phone's screen rather than on a physical desktop.

For future work, we believe it is possible to improve the system's complexity even further and to add extra functionality, such as more languages, font changes, multi-touch typing and more. Although there is always room for improvement, we believe we have provided a solid virtual keyboard implementation that solves certain challenges met in the past and may serve as a basis for future work.

VI. REFERENCES

[1] H. Du, T. Oggier, F. Lustenberger and E. Charbon, "A virtual keyboard based on true-3D optical ranging," Proc. British Machine Vision Conference (BMVC), Oxford, pp. 220-229, Sept. 2005.

[2] I. Katz, K. Gabayan, H. Aghajan, "A Multi-Touch Surface Using Multiple Cameras," Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305, 2007.

[3] F. Echtler, M. Huber, G. Klinker, "Shadow Tracking on Multi-Touch Tables," Technische Universität München, Institut für Informatik, Boltzmannstr. 3, D-85747 Garching, Germany; AVI '08, 28-30 May 2008, Napoli, Italy.

[4] A. D. Wilson, "PlayAnywhere: A compact interactive tabletop projection-vision system," Proceedings of the 18th annual ACM Symposium on User Interface Software and Technology, October 23-26, 2005, Seattle, WA, USA.

[5] Y. Adajania, J. Gosalia, A. Kanade, H. Mehta, N. Shekokar, "Virtual Keyboard Using Shadow Analysis," IEEE Conference on Emerging Trends in Engineering and Technology (ICETET), 2010.

[6] K. Yamamoto, S. Ikeda, T. Tsuji, and I. Ishii, "A Real-time Finger-tapping Interface Using High-speed Vision System," 2006 IEEE International Conference on Systems, Man, and Cybernetics, October 8-11, 2006, Taipei, Taiwan.

[7] Y. Hirobe, T. Niikura, Y. Watanabe, T. Komuro, M. Ishikawa, "Vision-based Input Interface for Mobile Devices with High-speed Fingertip Tracking," UIST '09, October 4-7, 2009, Victoria, BC, Canada.

[8] S. Malik, "Real-time Hand Tracking and Finger Tracking for Interaction," CSC2503F Project Report, December 18, 2003.

[9] T. Niikura, Y. Hirobe, A. Cassinelli, Y. Watanabe, T. Komuro, M. Ishikawa, "In-air Typing Interface for Mobile Devices with Vibration Feedback," SIGGRAPH 2010.

[10] J. Mäntyjärvi, J. Koivumäki, P. Vuori, "Keystroke Recognition for Virtual Keyboard," 2002.

[11] H. A. Habib and M. Mufti, "Real Time Mono Vision Gesture Based Virtual Keyboard System," IEEE Transactions on Consumer Electronics, Vol. 52, No. 4, November 2006.

[12] S. Zhai, M. Hunter, B. A. Smith, "The Metropolis Keyboard: An Exploration of Quantitative Techniques for Virtual Keyboard Design," Proceedings of ACM Symposium on User Interface Software and Technology (UIST 2000), November 5-8, 2000, San Diego, California, pp. 119-128.

[13] M. Kölsch, M. Turk, "Keyboards without Keyboards: A Survey of Virtual Keyboards," UCSB Technical Report 2002-21, July 12, 2002.