KINECT


CHAPTER 1

    INTRODUCTION

The Xbox 360 is the second video game console developed and produced by Microsoft, and the successor to the Xbox. Kinect is a "controller-free gaming and entertainment experience" for the Xbox 360. It was first announced on June 1, 2009 at the Electronic Entertainment Expo under the codename Project Natal. Based around a webcam-style add-on peripheral for the Xbox 360 console, it enables users to control and interact with the Xbox 360 without the need to touch a game controller, through a natural user interface using gestures, spoken commands and presented objects and images. The Kinect accessory is compatible with all Xbox 360 models, connecting to newer models via a custom connector and to older ones via a USB and mains power adapter. Kinect aims at broadening the Xbox 360's audience beyond its typical gamer base. It holds the Guinness World Record for being the "fastest selling consumer electronics device", and it is also considered an advanced virtual reality controller.

1.1 MOTIVATION

Initially developed by Microsoft for gaming, KINECT holds further scope of

application in a vast array of areas. The purpose of this seminar is to identify such areas, whether already recognized or yet to be recognized, and to familiarize the reader with these fields. The main motivation for this seminar is to provide overview information about the techniques and applications of Kinect, along with familiarizing and sharing knowledge about its future scope. The possible scenarios of virtual reality, and its scope, also remain open to be explored.

1.2 LITERATURE REVIEW

Kinect is a relatively recent technology, no more than one and a half years old. But being such an advanced and novel piece of hardware, a lot of research has already been performed on it. That research includes studies in fields that vary from physical rehabilitation [23] to robotics. A good example of its use in robotics is NASA's integration of Kinect with their prototype robots [24]. The LIREC project (Living with Robots and Interactive Characters) is another good example of Kinect's integration in robotics.

This project is a collaboration between several entities (universities, research institutes and companies) from several different countries, Heriot-Watt being one of the partner universities. Heriot-Watt researchers have been integrating Kinect with their prototype robot and studying how it can be used to facilitate human-robot interaction.

Efficiently tracking and recognizing hand gestures with Kinect is one of the fields that is getting the most attention from researchers [25][26][27][28]. This is a complex problem, but it has a number of diverse applications, being one of the most natural and intuitive ways for people and machines to communicate [28]. Regarding full-body pose recognition, E. Suma et al. [29] developed a toolkit which allows customizable full-body movements and actions to control, among others, games, PC applications, virtual avatars and the on-screen mouse cursor. This is achieved by binding specific movements to virtual keyboard and mouse commands that are sent to the currently active window.

M. Raptis et al. [30] proposed a system that is capable of recognizing, in real time and with high accuracy, a dance gesture from a set of pre-defined dance gestures. This system differs from games like Harmonix's Dance Central in allowing the user to perform random sequences of dance gestures that are not imposed by the animated character. In this paper, the authors also identified noise originating from the player's physique and clothing, from the sensor and from kinetic IQ as one of the main disadvantages of depth sensors like Kinect when compared to other motion capture systems. Following the same idea, D. S. Alexiadis et al. [31] addressed the problem of real-time evaluation and feedback of dance performances, considering the scenario of an online dance class. Here, online users are able to see a dance teacher perform some choreography steps

which they try to imitate later. After comparing the user's performance with the teacher's performance, the user's performance is automatically evaluated and some visual feedback is provided.

In a different perspective, F. Kistler et al. [32] adopted a game book and implemented an interactive storytelling scenario, using full-body tracking with Kinect to provide different types of interaction. In their scenario, two users listen to a virtual avatar narrating parts

of a game book. At specific points, the users need to perform certain gestures to influence the story. Almost none of the Kinect games developed so far concentrate on a story, and this may be an interesting approach for the creation of new games. Another interesting idea is the one presented by M. Caon et al. [33] for the creation of smart environments. Using multiple Kinects, these researchers developed a system capable of recognizing with great precision the direction in which a user is pointing in a room. Their smart living room is composed of several smart objects, like a media centre, several lamps, a fan, etc., and is able to identify several users' postures and pointing gestures. If a user points to any of those objects, they will automatically change their state (on/off). A lot of work can be done to improve home automation based on this idea (and using Kinect).

Motion Detection Real Time 3D Walkthrough in Limkokwing University of Creative Technology (Modet-Walk) using Kinect Xbox, by Behrang Parhizkar, Kanammal A/p Sandrasekaran and Arash Habibi Lashkari, intends to provide interactive communication by implementing Kinect in a 3D walkthrough. The project is based on motion detection that interacts with a virtual 3D walkthrough of a real environment. The possibility of combining a 3D walkthrough with the Kinect now seems to be a success. The paper's emphasis is on combining a virtual 3D walkthrough with Kinect to detect motion. The research examines how the implementation of motion detection using Kinect can help people understand, translate and give meaning to the environment displayed around them ubiquitously.

In almost all of the papers presented before, researchers have used both the RGB and depth sensors to track the human body or objects. At the time this literature review was conducted, we weren't able to find any relevant paper where researchers have studied how well Kinect's RGB camera can recognise and track colours and/or objects' changes in size under different lighting conditions, especially on smart textiles. Searching online, we can find some videos of developers/users demonstrating Kinect applications where colour recognition and tracking seem to be the main objective. However, this is clearly not enough to report on or to formulate an informed opinion. This information would be valuable, as these are very important features for the successful achievement of the proposed goals.

We believe that, given all the possibilities Kinect offers, much more research will be done based on it in the coming years. One fact that supports this idea was the release of Kinect for Windows in February 2012. Kinect for Windows consists of an improved Kinect device and a new SDK version especially designed for Windows PCs. This will allow the development of new kinds of applications and, consequently, new tools will become available to researchers to perform new studies.

CHAPTER 2

    EXISTING SYSTEM

    2.1 INTRODUCTION

Virtual reality (VR) is the creation of a highly interactive, computer-based multimedia environment in which the user becomes a participant with the computer in what is known as a synthetic environment. Virtual reality uses computers to immerse

one inside a three-dimensional program rather than simulate it in two dimensions on a monitor. Utilizing the concept of virtual reality, the computer engineer integrates

video technology, high-resolution image processing, and sensor technology into the data processor so that a person can enter into and react with three-dimensional spaces generated by computer graphics. The goal computer engineers have is to create an artificial world that feels genuine and will respond to every movement one makes, just as the real world does.

Naming discrepancies aside, the concept remains the same: using computer technology to create a simulated, three-dimensional world that a user can manipulate and explore while feeling as if he were in that world. Scientists, theorists and engineers have designed dozens of devices and applications to achieve this goal. Opinions differ on what exactly constitutes a true VR experience, but in general it should include:

Three-dimensional images that appear to be life-sized from the perspective of the user

The ability to track a user's motions, particularly his head and eye movements, and correspondingly adjust the images on the user's display to reflect the change in perspective

Virtual realities are a set of emerging electronic technologies, with applications

in a wide range of fields. These include education, training, athletics, industrial design, architecture and landscape architecture, urban planning, space exploration, medicine and rehabilitation, entertainment, and model building and research in many fields of science. Virtual reality (VR) can be defined as a class of computer-controlled multisensory communication technologies that allow more intuitive interactions with

data and involve human senses in new ways. Virtual reality can also be defined as an environment created by the computer in which the user feels present. This technology was devised to enable people to deal with information more easily. Virtual reality provides a different way to see and experience information, one that is dynamic and immediate. It is also a tool for model building and problem solving. Virtual reality is potentially a tool for experiential learning. The virtual world is interactive; it responds to the user's actions.

Virtual reality is defined as a highly interactive, computer-based multimedia environment in which the user becomes a participant in a computer-generated world. It is the simulation of a real or imagined environment that can be experienced visually in the three dimensions of width, height, and depth, and that may additionally provide an interactive experience with full real-time motion, sound and possibly tactile and other forms of feedback. VR incorporates 3D technologies that give a real-life illusion. VR creates a simulation of a real-life situation. The emergence of augmented reality technology in the form of interactive games has produced a valuable tool for education. One of the emerging strengths of VR is that it enables objects and their behaviour to be more accessible and understandable to the human user.

    2.2 DIFFERENT KINDS OF VIRTUAL REALITY

There is more than one type of virtual reality. Furthermore, there are different schemas for classifying the various types of virtual reality. Jacobson (1993a) suggests the following types of virtual reality:

    (1) Immersive virtual reality

    (2) Desktop virtual reality

(3) Projection virtual reality

    (4) Simulation virtual reality

    (5) Augmented virtual reality

    (6) Text-based virtual reality.

    2.2.1 Immersive virtual reality

An immersive VR system is the most direct experience of virtual environments. Here the user either wears a head-mounted display (HMD) or uses some form of head-coupled display, such as a Binocular Omni-Orientation Monitor (BOOM), to view the virtual environment, in addition to some tracking devices and haptic devices. It is a type of VR in which the user becomes immersed (deeply involved) in a virtual world. It is also a form of VR that uses computer-related components.

    2.2.2 Augmented Reality

A variation of immersive virtual reality is augmented reality, where a see-through layer of computer graphics is superimposed over the real world to highlight certain features and enhance understanding. Augmented virtual reality is the idea of taking what is real and adding to it in some way so that the user obtains more information from their environment. Azuma (1999) explains, "Augmented Reality is about augmentation of human perception: supplying information not ordinarily detectable by human senses." According to Isdale (2001), there are four types of augmented reality (AR) that can be distinguished by their display type:

1. Optical see-through AR uses a transparent head-mounted display (HMD) to display the virtual environment (VE) directly over the real world.

2. Projector-based AR uses real-world objects as the projection surface for the VE.

3. Video see-through AR uses an opaque HMD to display video of the VE merged with the view from cameras on the HMD.

4. Monitor-based AR also uses merged video streams, but the display is a more conventional desktop monitor or a hand-held display. Monitor-based AR is perhaps the least difficult to set up, since it eliminates HMD issues.

    2.2.3 Text-based Virtual Reality

In this type of virtual reality, the reader of a text forms a mental model of the virtual world in their head from the descriptions of people, places and things.

    2.2.4 Through the Window

With this kind of system, also known as desktop VR, the user sees the 3-D world through the window of the computer screen and navigates through the space with a control device such as a mouse. Like immersive virtual reality, this provides a first-person experience. One low-cost example of a through-the-window virtual reality system is the 3-D architectural design planning tool Virtus Walkthrough, which makes it possible to explore virtual reality on a Macintosh or IBM computer. Another example of through-the-window virtual reality comes from the field of dance, where a computer program called Life Forms lets choreographers create sophisticated human motion animations.

    2.2.5 Projected Realities

Projected realities (mirror worlds) provide a second-person experience in which the viewer stands outside the imaginary world but communicates with characters or objects inside it. Mirror world systems use a video camera as an input device. Users see their images superimposed on or merged with a virtual world presented on a large video monitor or video-projected image.

2.3 EXISTING SYSTEMS

    2.3.1 Head-Mounted Display (HMD)

The head-mounted display (HMD) was the first device to provide its wearer with an immersive experience. Evans and Sutherland demonstrated a head-mounted stereo display in 1965. A typical HMD houses two miniature display screens and an optical system that channels the images from the screens to the eyes, thereby presenting a stereo view of a virtual world. A motion tracker continuously measures the position and orientation of the user's head and allows the image-generating computer to adjust the scene representation to the current view. As a result, the viewer can look around and walk through the surrounding virtual environment. To overcome the often uncomfortable intrusiveness of a head-mounted display, alternative concepts (e.g., BOOM and CAVE) for immersive viewing of virtual environments were developed.

    Fig 1: HMD

    2.3.2 BOOM

The BOOM (Binocular Omni-Orientation Monitor), from Fakespace Labs, is a head-coupled stereoscopic display device. Screens and an optical system are housed in a box that is attached to a multi-link arm. The user looks into the box through two holes, sees the virtual world, and can guide the box to any position within the operational volume of the device. Head tracking is accomplished via sensors in the links of the arm that holds the box.

Fig 2: BOOM

    2.3.3 CAVE

The CAVE (Cave Automatic Virtual Environment) was developed at the University of Illinois at Chicago and provides the illusion of immersion by projecting stereo images on the walls and floor of a room-sized cube. Several persons wearing lightweight stereo glasses can enter and walk freely inside the CAVE. A head-tracking system continuously adjusts the stereo projection to the current position of the leading viewer. The advantages of the CAVE are that it gives a wide surrounding field of view and that it can provide a shared experience to a small group. A variety of input devices like data gloves, joysticks, and hand-held wands allow the user to navigate through a virtual environment and to interact with virtual objects. Directional sound, tactile and force feedback devices, voice recognition and other technologies are being employed to enrich the immersive experience and to create more sensualized interfaces.

    Fig 3: CAVE

2.3.4 Data Glove

A data glove is outfitted with sensors on the fingers as well as overall position/orientation tracking equipment. A data glove enables natural interaction with virtual objects through hand gesture recognition. Modern VR gloves are used to communicate hand gestures (such as pointing and grasping) and in some cases return tactile signals to the user's hand.

    Fig 4: Data glove

Concerned about the high cost of the most complete commercial solutions, Pamplona et al. propose a new input device: an image-based data glove (IBDG). By attaching a camera to the hand of the user and a visual marker to each fingertip, they use computer vision techniques to estimate the relative positions of the fingertips. Once they have information about the tips, they apply inverse kinematics techniques in order to estimate the position of each finger joint and recreate the movements of the fingers of the user in a virtual world. Adding a motion tracker device, one can also map pitch, yaw, roll and XYZ translations of the hand of the user, (almost) recreating every gesture and posture performed by the hand of the user with a low-cost device.

CHAPTER 3

KINECT

    3.1 INTRODUCTION

The Microsoft Xbox 360 Kinect has revolutionized gaming in that you are able to use your entire body as the controller. Conventional controllers are not required because the Kinect sensor picks up on natural body movements as inputs for the game. Three major components play a part in making the Kinect function as it does: the movement tracking, the speech recognition, and the motorized tilt of the sensor itself. The name Kinect is a blend of two words: kinetic and connect.

    3.2 EXPLANATION

Kinect is best described by the features that set it apart from all other virtual reality devices, such as:

    Full Body Gaming

Controller-free gaming means full-body play. Kinect responds to how you move. So if you have to kick, then kick. If you have to jump, then jump. You already know how to play. All you have to do now is get off the couch.

    Something For Everyone

Whether you're a gamer or not, anyone can play and have a blast. And with advanced parental controls, Kinect promises a gaming experience that's safe, secure and fun for everyone.

It's All About You

Once you wave your hand to activate the sensor, your Kinect will be able to recognize you and access your avatar. Then you'll be able to jump in and out of different games, and show off and share your moves.

3.3 ARCHITECTURE/COMPONENTS

    3.3.1 COMPONENTS

    The main components of KINECT include:

Video Color CMOS Camera

Infrared (IR) CMOS Camera

Infrared Projector

Audio Multi-Array Microphone

Tilt Control Motor

Accelerometer

Processor and Memory (PrimeSense chip PS1080-A2, 64 MB DDR2 SDRAM)

    3.3.2 ARCHITECTURE

    3.3.2.1 AN RGB COLOR SPACE

An RGB color space is any additive color space based on the RGB color model. A particular RGB color space is defined by the three chromaticities of the red, green, and blue additive primaries, and can produce any chromaticity that lies within the triangle defined by those primary colors. The complete specification of an RGB color space also requires a white point chromaticity and a gamma correction curve. An LCD display can be thought of as a grid of thousands of little red, green, and blue lamps, each with its own dimmer switch. The gamut of the display will depend on the three colors used for the red, green and blue lights.
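
As a small illustration of the gamma-correction component of an RGB color space specification, the sketch below linearizes an 8-bit channel value using a plain power-law curve. The 2.2 exponent is the common textbook approximation, not any Kinect-specific curve (real sRGB uses a piecewise function):

```python
def srgb_to_linear(value_8bit, gamma=2.2):
    """Convert an 8-bit gamma-encoded channel value to linear light.

    Power-law approximation; real sRGB uses a piecewise curve, but the
    2.2 exponent is close enough for illustration.
    """
    normalized = value_8bit / 255.0   # map 0..255 to 0..1
    return normalized ** gamma        # undo the gamma encoding

# Example: mid-gray (128, 128, 128) is only about 22% linear intensity,
# which is why the gamma curve matters when reasoning about brightness.
pixel = (128, 128, 128)
print(tuple(round(srgb_to_linear(c), 3) for c in pixel))
```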

3.3.2.2 3D SENSOR

It is a device that analyzes a real-world object or environment to collect data on its shape and possibly its appearance (i.e., color). The collected data can then be used to construct digital, three-dimensional models useful for a wide variety of applications.

Multiple scans, even hundreds, from many different directions are usually performed to obtain information about all sides of the subject. These scans have to be brought into a common reference system, a process that is usually called alignment or registration, and then merged to create a complete model. This whole process, going from the single range map to the whole model, is usually known as the 3D scanning pipeline.

    3.3.2.3 MULTIPLE MICROPHONE ARRAY

A microphone array is any number of microphones operating in tandem. Typically, an array is made up of omnidirectional microphones distributed about the perimeter of a space, linked to a computer that records and interprets the results into a coherent form. Arrays may also be formed using a number of very closely spaced microphones. In Kinect, the microphone array features four microphone capsules and operates with each channel processing 16-bit audio at a sampling rate of 16 kHz.
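
To illustrate how such an array can be electronically steered toward a talker, here is a minimal delay-and-sum beamformer sketch in Python/NumPy. The microphone positions, look direction and test signal are invented for illustration; this is a generic textbook technique, not Kinect's actual audio pipeline:

```python
import numpy as np

SAMPLE_RATE = 16_000      # Kinect's audio sampling rate (Hz)
SPEED_OF_SOUND = 343.0    # metres per second at room temperature

def delay_and_sum(channels, mic_positions, direction):
    """Align and average multi-channel audio for one look direction.

    channels:      (n_mics, n_samples) array of time-domain audio
    mic_positions: mic x-coordinates in metres along a linear array
    direction:     sine of the steering angle (-1..1)
    """
    out = np.zeros(channels.shape[1])
    for signal, x in zip(channels, mic_positions):
        # Extra distance sound travels to this mic, in whole samples.
        delay = int(round(x * direction / SPEED_OF_SOUND * SAMPLE_RATE))
        out += np.roll(signal, -delay)   # time-align this channel
    return out / len(channels)           # averaging suppresses off-axis noise

# Four mics spread over roughly 0.2 m, loosely like Kinect's wide layout.
mics = np.array([-0.11, -0.04, 0.03, 0.11])
audio = np.random.randn(4, SAMPLE_RATE)  # one second of dummy 4-channel audio
steered = delay_and_sum(audio, mics, direction=0.5)
```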

    3.4 TECHNOLOGIES USED

KINECT uses a combination of the above-mentioned hardware to create a virtual environment, mapping actual physical data to various rendering points and commands.

    3.4.1 SENSING TECHNOLOGY

Behind the scenes of PrimeSense's 3D sensing technology there are three main parts that make it work: an infrared laser projector, an infrared camera, and the RGB color camera. The depth projector simply floods the room with IR laser beams, creating a depth field that can be seen only by the IR camera. Due to infrared's insensitivity to ambient light, the Kinect can be played in any lighting conditions. However, because the face

recognition system is dependent on the RGB camera along with the depth sensor, light is needed for the Kinect to recognize a calibrated player accurately. The following image shows a generalized concept of how Kinect's depth sensing works.

    Figure 3.1: How the sensor sees in 3D

In more detail, the IR depth sensor is a monochrome complementary metal-oxide-semiconductor (CMOS) camera. This means that it only sees two colors, in this case black and white, which is all that's needed to create a "depth map" of any room.

The IR camera used in the Kinect is VGA resolution (640x480), refreshing at a rate of 30 Hz. Each camera pixel has a photodiode connected to it, which receives the IR light beams being bounced off objects in the room. The voltage level of each photodiode depends on how far the object is from the camera. An object that is closer to the camera appears brighter than an object that is farther away; the voltage produced by the photodiode is directly proportional to the distance of the object. Each photodiode voltage is amplified and then sent to an image processor for further processing. With this process being updated 30 times per second, the Kinect has no problem detecting full-body human movements very accurately, provided the player is within the recommended distance.
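
Internally the sensor reports an 11-bit value per pixel rather than metres. A common way to approximate metric depth from those raw readings is an empirical formula popularized by the OpenKinect community; the coefficients below are that community calibration, not an official Microsoft specification:

```python
import numpy as np

def raw_to_meters(raw_disparity):
    """Approximate depth in metres from Kinect's 11-bit raw readings.

    Empirical fit from the OpenKinect community; raw values of 2047
    mean "no reading", so they are masked out as NaN.
    """
    raw = np.asarray(raw_disparity, dtype=np.float64)
    depth = 1.0 / (raw * -0.0030711016 + 3.3309495161)
    return np.where(raw < 2047, depth, np.nan)

# A raw value of about 840 corresponds to roughly 1.3 m.
print(raw_to_meters([400, 840, 2047]))
```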

  • Figure 3.2: Infrared beams in the room

Although the hardware is the basis for creating an image that the processor can interpret, the software behind the Kinect is what makes everything possible. Using statistics, probability, and hours of testing different natural human movements, the programmers developed software to track the movements of 20 main joints on a human body. This software is how the Kinect can differentiate a player from, say, a dog that happens to run in front of the IR projector, or from other players that are playing a game together. The Kinect has the capability of tracking up to six different players at a time, but as of now the software can only track up to two active players.

One of the main features of the Kinect is that it can recognize you individually. When calibrating yourself with the Kinect, the depth sensor and the color camera work together to develop an accurate digital image of how your face looks. The 8-bit color camera, also VGA resolution, detects and stores the skin tone of the person it is calibrating.

The depth sensor helps make the facial recognition more accurate by creating a 3-D shape of your face. Storing these images of your face and skin tone color is how the Kinect can recognize you when you step in front of the projected IR beams. As mentioned earlier, for the facial recognition to work accurately there needs to be a certain amount of light. Another added feature of the color camera is that it takes videos or snapshots at key moments during game play, so you can see how you look while playing.

Figure 3.3: Facial recognition

    3.4.2 SPEECH RECOGNITION

The Xbox 360 Kinect is also capable of speech recognition; that is, it will not only respond to natural body movements, but to voice commands as well. This technology was designed for Kinect solely by Microsoft. Microsoft engineers travelled to an estimated 250 different homes to test their voice recognition system. They placed 16 microphones all over each room to test the acoustics, echoing, etc., to get a feel for how the Kinect would respond in different environments. The end result was placing 4 downward-facing microphones on the bottom of the Kinect unit to listen to human voices. This is also part of why the Kinect is so physically wide: the 3D sensing portion only needs about half the width the Kinect has now. The combination of the microphone placement and the motion sensing technology allows the Kinect to zero in on the user's voice and tell where the sound is coming from while cancelling out other ambient noise. There are 4 microphones, so the audio portion of the Kinect has 4 separate channels. The resolution of the audio is 16 bits and the audio is sampled at 16 kHz. There are three major languages supported by Kinect thus far, English, Spanish, and Japanese, with

plans to support other popular languages soon. The Kinect is always listening as long as it is turned on; when the user says "Xbox", the user will be prompted to select one of the options from the screen. Popular options are "Play Game", "Watch a Movie" or "Sign In". One of the major techniques involved in the Kinect's ability to block out noise is known as echo cancellation.
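
Echo cancellation is usually implemented with an adaptive filter that learns the speaker-to-microphone echo path and subtracts its estimate from the microphone signal. The sketch below uses the generic normalized-LMS algorithm; it is a textbook illustration, not Microsoft's actual implementation:

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, taps=64, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `far_end` from `mic`.

    far_end: signal sent to the speakers (the source of the echo)
    mic:     microphone signal = near-end speech + echo
    Returns the echo-suppressed signal.
    """
    w = np.zeros(taps)                 # adaptive FIR estimate of echo path
    out = np.zeros(len(mic))
    for n in range(taps, len(mic)):
        x = far_end[n - taps:n][::-1]  # most recent far-end samples
        e = mic[n] - w @ x             # residual after removing echo estimate
        w += mu * e * x / (x @ x + eps)  # NLMS weight update
        out[n] = e
    return out

# Dummy demo: the "echo" is a delayed, attenuated copy of the far-end
# signal, so the residual shrinks toward zero as the filter adapts.
rng = np.random.default_rng(0)
far = rng.standard_normal(16_000)
echo = 0.6 * np.concatenate([np.zeros(10), far[:-10]])
cleaned = nlms_echo_canceller(far, echo)
```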

Figure 3.4: Kinect sensor

    3.4.3 MOTORISED TILT

The Kinect comes equipped with a built-in motor that is able to tilt the entire unit up or down, expanding its field of view. Without moving, the Kinect has a 43° vertical viewing angle and a 57° horizontal viewing angle. The tilt motor can pivot the sensor up or down by up to 27°, expanding its vertical coverage. The Kinect is powered via a standard USB connection; however, it also requires a special type of connector for the motor. USB is capable of supplying 2.5 W, but this is not enough power to run the sensor and the motor simultaneously. So Microsoft developed a special connector that draws power from the Xbox's power supply; this comes only with the newer Xbox models. Older Xbox models must have a separate power supply for the Kinect.

3.4.4 HUMAN DETECTION

Figure 3.5: Overview of the human detection method

    3.4.4.1 Pre-processing

To prepare the data for processing, some basic pre-processing is needed. In the depth image taken by the Kinect, all the points at which the sensor is not able to measure depth are set to 0 in the output array. We regard this as a kind of noise. To avoid its interference, we want to recover the true depth value. It is assumed that the space is continuous, and that a missing point is likely to have a depth value similar to its neighbours'. With this assumption, we regard all the 0 pixels as vacant and needing to be filled.
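
The report does not specify the exact filling filter, so the sketch below shows one straightforward realization of the idea: iteratively replacing zero pixels with the mean of their valid 4-neighbours.

```python
import numpy as np

def fill_depth_holes(depth, max_iters=10):
    """Fill 0-valued (unmeasured) pixels from their valid 4-neighbours."""
    d = depth.astype(np.float64).copy()
    for _ in range(max_iters):
        holes = d == 0
        if not holes.any():
            break
        nb_sum = np.zeros_like(d)   # sum of valid neighbour depths
        nb_cnt = np.zeros_like(d)   # count of valid neighbours
        # np.roll wraps at the borders, which is acceptable for a sketch.
        for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
            shifted = np.roll(d, shift, axis=axis)
            valid = shifted > 0
            nb_sum += np.where(valid, shifted, 0.0)
            nb_cnt += valid
        fillable = holes & (nb_cnt > 0)
        d[fillable] = nb_sum[fillable] / nb_cnt[fillable]
    return d

depth = np.array([[800,   0, 820],
                  [790,   0,   0],
                  [785, 800, 810]])
print(fill_depth_holes(depth))   # zeros replaced by neighbourhood means
```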

    3.4.4.2 2D chamfer distance matching

The first stage of the method is to use the edge information embedded in the depth array to locate the possible regions that may indicate the presence of a person. It is a rough scanning stage, in that we need a rough detection result with a false negative rate as low as possible, even if it has a comparatively high false positive rate, to pass on to the next stage. We use 2D chamfer distance matching in this stage for quick processing. Chamfer distance matching is also a good 2D shape matching algorithm that is invariant to scale, and it utilizes the edge information in the depth array, which represents the boundaries of all the objects in the scene.

We use the Canny edge detector to find all edges in the depth array. To reduce calculation and the disturbance from surrounding irregular objects, we eliminate all the edges whose sizes are smaller than a certain threshold. We use a binary head template and match the template to the resulting edge image. To increase the efficiency, a distance transform is calculated before the matching process. This results in a distance map of the edge image, where pixels contain the distances to the closest data pixels in the edge image. Matching consists of translating and positioning the template at various locations of the distance map; the matching measure is determined by the pixel values of the distance image which lie under the data pixels of the transformed template. The lower these values are, the better the match between image and template at this location.
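
A compact OpenCV sketch of this stage, combining the Canny edge detector, a distance transform and sliding-window scoring. The thresholds and the circular head template are invented placeholders; the report's actual head-and-shoulders template is not reproduced here:

```python
import cv2
import numpy as np

def chamfer_match(image_8bit, template):
    """Locate the best chamfer match of a binary template in an image.

    image_8bit: 8-bit single-channel image (e.g., a scaled depth array)
    template:   float32 mask (1.0 on the shape outline, 0.0 elsewhere)
    Returns (x, y) of the top-left corner of the best match.
    """
    edges = cv2.Canny(image_8bit, 50, 150)
    # Distance from every pixel to its nearest edge pixel.
    dist = cv2.distanceTransform(255 - edges, cv2.DIST_L2, 3)
    # Cross-correlating the template with the distance map sums the
    # distances under the template's "on" pixels; lower means better.
    scores = cv2.matchTemplate(dist, template, cv2.TM_CCORR)
    _, _, min_loc, _ = cv2.minMaxLoc(scores)
    return min_loc

# Hypothetical usage with an invented circular "head" outline.
image = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
head_template = np.zeros((24, 24), np.float32)
cv2.circle(head_template, (12, 12), 10, 1.0, thickness=1)
print(chamfer_match(image, head_template))
```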

    3.4.4.3 Generate 3D model

Considering that the computational complexity of 3D model fitting is comparatively high, we want the model to be view-invariant, so that we don't have to use several different models or rotate the model and run it several times. The model should generalize the characteristics of the head from all views: front, back, side, and also higher and lower views, when the sensor is placed higher or lower or when the person is taller or shorter. To meet these constraints in the simplest way, we use a hemisphere as the 3D head model.

    3.4.4.4 Extract contours

We extract the overall contour of the person so that we may track his/her hands and feet and recognize the activity. In an RGB image, even though the person is standing on the ground, it is not much of a problem to detect the boundary between the feet and the ground plane using gradient features. However, in a depth array, the values at the person's feet and the local ground plane are the same. Therefore, it is not feasible to compute a human's whole-body contours from a depth array using regular edge detectors. The same applies when the person touches any other object that is partially at the same depth as the person. To resolve this issue, we take advantage of the fact that a person's feet generally appear upright in a depth array regardless of posture.

We use the filter response to extract the boundary between the person and the ground. We develop a region growing algorithm to extract the whole-body contours from

the processed depth array. It is assumed that the depth values on the surface of a human object are continuous and vary only within a specific range. The algorithm starts with a seed location, which is the centroid of the region detected by 3-D model fitting. The rule for growing a region is based on the similarity between the region and its neighboring pixels. The similarity between two pixels x and y in the depth array is defined as:

S(x, y) = |depth(x) - depth(y)|


    Figure 3.6 Contour extraction

A - Original depth array; some parts of the body have been merged with the background.
B - Input depth array to the region growing algorithm.

Region growing continues until the similarity measure between the region and its neighboring pixels rises above a threshold:

i. Initialize: region = seed
ii. (1) Find all neighboring pixels of the region.
    (2) Measure the similarity of each pixel to the region (s1, s2, ...) and sort the pixels according to similarity.
    (3) If smin < threshold:
        (3.1) Add the pixel with the highest similarity to the region.
        (3.2) Calculate the new mean depth of the region.
        (3.3) Repeat (1)-(3).
        Otherwise, the algorithm terminates.
iii. Return the region.


    Figure 3.7 Region growing algorithm

C - Result of our region growing algorithm.
D - The extracted whole-body contours superimposed on the depth map.

The depth of a region is defined as the mean depth of all the pixels in that region:

depth(region) = (1 / N) × Σ depth(x), summed over all N pixels x in the region.
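
A compact Python rendering of the region growing procedure described above, assuming a hole-filled depth array and a seed from the model-fitting stage. The threshold is illustrative, and a priority queue stands in for the repeated sort; queued candidates keep the score they had when pushed, a simplification relative to re-scoring against the updated mean on every iteration:

```python
import heapq
import numpy as np

def grow_region(depth, seed, threshold=30.0):
    """Grow a region from `seed` while neighbouring pixels stay similar.

    Similarity follows S(x, y) = |depth(x) - depth(y)|, measured against
    the region's running mean depth; lower S means more similar.
    """
    h, w = depth.shape
    in_region = np.zeros((h, w), dtype=bool)
    in_region[seed] = True
    total, count = float(depth[seed]), 1
    frontier = []                          # min-heap ordered by S

    def push_neighbors(y, x, mean):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not in_region[ny, nx]:
                s = abs(float(depth[ny, nx]) - mean)
                heapq.heappush(frontier, (s, ny, nx))

    push_neighbors(*seed, total / count)
    while frontier:
        s, y, x = heapq.heappop(frontier)
        if s >= threshold:                 # best candidate is too dissimilar
            break
        if in_region[y, x]:
            continue
        in_region[y, x] = True             # add pixel, update the mean depth
        total += float(depth[y, x]); count += 1
        push_neighbors(y, x, total / count)
    return in_region

depth = np.random.randint(700, 900, (120, 160))
mask = grow_region(depth, seed=(60, 80))
```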

    3.4.4.5 TRACKING

Finally, we give preliminary results on tracking using depth information based on our detection result. Tracking in RGB images is usually based on color, the assumption being that the color of the same object in different time frames should be similar. In depth images we don't have such color information. What we have is the 3D spatial information of the objects, so we can measure the movements of the objects in 3D space. We assume that the coordinates and speed of the same objects in neighboring frames change smoothly.
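
One simple way to exploit this smoothness assumption is a constant-velocity predictor that associates a tracked object with the nearest new detection in 3D. This is a hypothetical minimal tracker, not the authors' exact method:

```python
import numpy as np

class CentroidTracker3D:
    """Track one object's 3D centroid, assuming near-constant velocity."""

    def __init__(self, initial_position):
        self.position = np.asarray(initial_position, dtype=float)
        self.velocity = np.zeros(3)

    def update(self, detections):
        """Associate with the detection closest to the predicted position."""
        predicted = self.position + self.velocity
        detections = np.asarray(detections, dtype=float)
        nearest = detections[np.argmin(
            np.linalg.norm(detections - predicted, axis=1))]
        self.velocity = nearest - self.position   # smooth-motion assumption
        self.position = nearest
        return nearest

tracker = CentroidTracker3D([0.0, 1.0, 2.5])
print(tracker.update([[0.1, 1.0, 2.4], [2.0, 0.5, 3.0]]))  # picks the near one
```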

3.5 WORKING OF KINECT

    The Kinect uses structured light and machine learning.

Inferring body position is a two-stage process: first compute a depth map (using structured light), then infer body position (using machine learning).

The system uses many college-level math concepts and demonstrates the remarkable advances in computer vision over the last 20 years.

Fig. 3.8: Stage 1 - The depth map is constructed by analyzing a speckle pattern of infrared laser light

The Kinect uses an infrared projector and sensor; it does not use its RGB camera for depth computation.

    The technique of analyzing a known pattern is called structured light.

The Kinect combines structured light with two classic computer vision techniques: depth from focus and depth from stereo. The general principle of structured light is to project a known pattern onto the scene and infer depth from the deformation of that pattern.

Fig. 3.9: Depth from focus uses the principle that objects that are more blurry are further away.

The Kinect dramatically improves the accuracy of traditional depth from focus. The Kinect uses a special (astigmatic) lens with different focal lengths in the x- and y-directions. A projected circle then becomes an ellipse whose orientation depends on depth.

Depth from stereo, in turn, uses parallax.

If you look at the scene from another angle, objects that are close get shifted to the side more than objects that are far away.

The Kinect analyzes the shift of the speckle pattern by projecting from one location and observing from another.
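
The shift-to-depth relation is ordinary stereo triangulation. In generic form (the Kinect's actual calibration constants are not public), for a projector-camera baseline $b$, focal length $f$ and observed pattern shift (disparity) $d$, the depth is

$$ z = \frac{f \, b}{d} $$

so a larger shift of the speckle pattern corresponds to a nearer surface.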

Inferring body position is a two-stage process: first compute a depth map, then infer body position.

Stage 1: The depth map is constructed by analyzing a speckle pattern of infrared laser light.

Stage 2: Body parts are inferred using a randomized decision forest, learned from over 1 million training examples.
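
The published body-part classifier (Shotton et al.) labels each depth pixel using simple depth-difference features fed into the forest. The toy sketch below mimics that idea with scikit-learn; the probe offsets, labels and training data are all invented, and the real system trains on a huge synthetic corpus:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Invented pixel-probe offsets; the real system evaluates thousands.
OFFSETS = [(8, 0), (-8, 0), (0, 8), (0, -8), (12, 12), (-12, 12)]

def depth_features(depth, y, x):
    """Depth-difference features, scaled by 1/depth for size invariance."""
    d = depth[y, x]
    feats = []
    for du, dv in OFFSETS:
        # Probes shrink with distance so they span a fixed real-world size.
        yy = int(np.clip(y + du / d * 50, 0, depth.shape[0] - 1))
        xx = int(np.clip(x + dv / d * 50, 0, depth.shape[1] - 1))
        feats.append(depth[yy, xx] - d)
    return feats

# Toy training data: random depths with made-up "body part" labels.
rng = np.random.default_rng(1)
depth = rng.uniform(1.0, 4.0, (120, 160))
pixels = [(int(rng.integers(0, 120)), int(rng.integers(0, 160)))
          for _ in range(500)]
X = [depth_features(depth, y, x) for y, x in pixels]
labels = rng.integers(0, 3, 500)          # pretend there are 3 body parts

forest = RandomForestClassifier(n_estimators=10).fit(X, labels)
print(forest.predict([depth_features(depth, 60, 80)]))
```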

The basic techniques behind all these methods, implemented to create a virtual reality environment for the user, are:

    Motion Sensor:

Kinect uses a motion sensor that tracks your entire body. So when you play, it's not only about your hands and thumbs; it's about all of you. Arms, legs, knees, waist, hips and so on. Which means to get into the game, you'll need to jump off the couch.

    Skeletal Tracking:

As you play, Kinect creates a digital skeleton of your avatar based on depth data. So when you move left or right or jump around, the sensor will process it and translate it into gameplay.

    Facial Recognition:

Kinect ID remembers who you are by collecting physical data that's stored in your profile. So when you want to play again, your Kinect will know it's you.

Voice Recognition:

Kinect uses four strategically placed microphones within the sensor to determine an acoustic profile of the room you play in, so that it is calibrated to pick up your voice.

CHAPTER 4

    APPLICATIONS

    4.1 VIRTUAL REALITY

As the technologies of virtual reality evolve, the applications of VR become literally unlimited. It is assumed that VR will reshape the interface between people and information technology by offering new ways for the communication of information, the visualization of processes, and the creative expression of ideas. A virtual environment can represent any three-dimensional world that is either real or abstract. This includes real systems like buildings, landscapes, underwater shipwrecks, spacecraft, archaeological excavation sites, human anatomy, sculptures, crime scene reconstructions, solar systems, and so on. Of special interest is the visual and sensual representation of abstract systems like magnetic fields, turbulent flow structures, molecular models, mathematical systems, auditorium acoustics, stock market behavior, population densities, information flows, and any other conceivable system, including artistic and creative work of an abstract nature. These virtual worlds can be animated, interactive and shared, and can expose behavior and functionality.

Useful applications of VR include training in a variety of areas (military, medical, equipment operation, etc.), education, design evaluation (virtual prototyping), architectural walk-throughs, human factors and ergonomic studies, simulation of assembly sequences and maintenance tasks, assistance for the handicapped, study and treatment of phobias (e.g., fear of heights), entertainment, and much more. Virtual reality appears to offer educational potential in the following areas:

    (1) Data gathering and visualization,

    (2) Project planning and design,

(3) The design of interactive training systems,

    (4) Virtual field trips,

    (5) The design of experiential learning environments.

Virtual reality also offers many possibilities as a tool for non-traditional learners, including the physically disabled and those undergoing rehabilitation who must learn communication and psychomotor skills.

In industry, VR has proven to be an effective tool for helping workers evaluate product designs. In 1999, BMW explored the capability of VR for verifying product designs. They concluded that VR has the potential to reduce the number of physical mock-ups needed, to improve overall product quality, and to obtain quick answers in an intuitive way during the concept phase of a product. In addition, Motorola developed a VR system for training workers to run a pager assembly line (Wittenberg, 1995).

In the past decade, medical applications of virtual reality technology have been developing rapidly, and the technology has changed from a research curiosity to a commercially and clinically important area of medical informatics technology. Virtual reality is under exploration as a therapeutic tool for patients. For example, psychologists and other professionals are using virtual reality as a tool with patients that are afraid of heights. NASA has developed a number of virtual environment projects. These include the Hubble Telescope Rescue Mission training project, the Space Station Cupola training project, and the shared virtual environment where astronauts can practice reconnoitring outside the space shuttle for joint training, human factors, and engineering design. NASA researcher Bowen Loftin has developed the Virtual Physics Lab, where learners can explore conditions such as changes in gravity. Virtual reality can make it possible to reduce the time lag between receiving equipment and implementing training, by making possible virtual prototypes or models of the equipment for training purposes.

In the entertainment field, virtual realities are used in movies and games. One of the advantages of using VR games is that they create a level playing field. These virtual environments eliminate contextual factors that create inequalities between learners, which would interfere with the actual learning skills featured in the training program, that is, interpersonal skills, collaboration, and team building. Serious games are being deployed more and more in such diverse areas as public awareness, military training, and higher education. One of the driving forces behind this stems from the rapidly growing availability of game technologies, providing not only better, faster, and more realistic graphics, physics, and animations, but above all making the language of game

development accessible to increasingly more people. In game-based simulations, researchers have proposed an architecture for a professional fire-fighter training simulator that incorporates novel visualization and interaction modes. The serious game, developed in cooperation with the government agency responsible for the training of fire and rescue personnel, is a good example of how virtual reality and game technology help to achieve the delicate combination of engaging level design and carefully tuned learning objectives.

The emergence of augmented reality technology in the form of interactive games has produced a valuable tool for education. The live communal nature of these games, blending virtual content with global access and communication, has resulted in a new research arena previously called "edutainment" but more recently called "learning games". Windows Live combined with the Xbox 360 and Kinect technology provides an agile, real-time environment with case-based reasoning, where learners can enjoy games, simulations and face-to-face chat, and stream HD movies and television, music, sports and even Twitter and Facebook, with others around the world, or alone, in the privacy of the home.

4.2 TELEIMMERSIVE CONFERENCING

Fig. 4.1: Avatar Kinect virtual environment.

With increasing economic globalization and workforce mobilization, there is a strong need for immersive experiences that enable people across geographically distributed sites to interact collaboratively. Such advanced infrastructures and tools require a deep understanding of multiple disciplines. In particular, computer vision, graphics, and acoustics are indispensable to capturing and rendering 3D environments that create the illusion that the remote participants are in the same room. Existing videoconferencing systems, whether they are available on desktop and mobile devices or in dedicated conference rooms with built-in furniture and life-sized high-definition video, leave a great deal to be desired: mutual gaze, 3D, motion parallax and spatial audio, to name a few. For the first time, the necessary immersive technologies are emerging

and coming together to enable real-time capture, transport, and rendering of 3D holograms. The Immersive Telepresence project at Microsoft Research addresses the scenario of a fully distributed team. The figure illustrates three people joining a virtual/synthetic meeting from their own offices in three separate locations. A capture device (one or multiple Kinect sensors) at each location captures users in 3D with high

fidelity (in both geometry and appearance). They are then placed into a virtual room as if they were seated at the same table.

Each user's position is tracked by the camera, so the virtual room is rendered appropriately at each location from the user's eye perspective, which produces the right motion parallax effect, exactly like what a user would see in the real world if the three people met face to face. Because a consistent geometry is maintained and the user's position is tracked, the mutual gaze between remote users is preserved. In the figure, users A and C are looking at each other, and B will see that A and C are looking at each other because B only sees their side views. Furthermore, the audio is also spatialized, and the voice of each remote person comes from his location in the virtual room. The display at each location can be 2D or 3D, flat or curved, single or multiple, transparent or opaque, and so forth; the possibilities are numerous. In general, the larger a display is, the more immersive the user's experience. Because each person must be seen from different angles by remote people, a single Kinect does not provide enough spatial coverage, and the visual quality is insufficient. Cha Zhang at Microsoft Research, with help from others, has developed an enhanced 3D capture device that runs in real time with multiple IR projectors, IR cameras, and RGB cameras.
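
Capturing a participant in 3D amounts to back-projecting every depth pixel through the camera intrinsics into a point cloud that can then be merged across sensors and rendered into the virtual room. A minimal sketch with nominal pinhole intrinsics (the focal length and principal point below are illustrative, not calibrated Kinect values):

```python
import numpy as np

# Nominal pinhole intrinsics for a 640x480 depth camera (illustrative).
FX = FY = 580.0           # focal length in pixels
CX, CY = 320.0, 240.0     # principal point

def depth_to_point_cloud(depth_m):
    """Back-project a (480, 640) depth map in metres to an Nx3 point cloud."""
    h, w = depth_m.shape
    v, u = np.mgrid[0:h, 0:w]               # pixel row/column coordinates
    x = (u - CX) * depth_m / FX              # pinhole camera model
    y = (v - CY) * depth_m / FY
    points = np.stack([x, y, depth_m], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]          # drop invalid zero-depth pixels

depth = np.full((480, 640), 2.0)             # dummy flat wall two metres away
cloud = depth_to_point_cloud(depth)
print(cloud.shape)                            # (307200, 3)
```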

4.3 OTHER APPLICATIONS

• Meet students from different schools.

• Practice proper footwork for dancing (ballroom, square, etc.).

• See if students really studied for a test by checking whether they logged on to the content.

• Enhance the real-world environment.

• The Your Shape: Fitness Evolved game helps students with disabilities practice range of motion on a prescribed schedule and then assess their performance.

• Displays virtual humans so you can learn their parts and take an anatomy course without the formaldehyde.

• Easier for special-needs people to play, because they don't have to use hard-to-hold controllers.

• Kinect Adventures is good for full-body motion, encouraging people to get moving.

• Kinectimals teaches students how to care for and feed a pet.

• Kinect Sports can be used by students with disabilities who cannot participate in gym. This game can also be used to encourage movement and reduce boredom during indoor recess.

• Brings kids, parents, educators, mentors, etc. together by breaking the ice.

• Promotes teamwork, sportsmanship and fair play.

• In defence, to control non-piloted automated weaponry.

• In deep space exploration.

• In underwater exploration.

• In remote surgery.

CHAPTER 5

    CONCLUSION

Kinect is a "controller-free gaming and entertainment experience" for the Xbox 360. By integrating all these techniques into a single console, Kinect acts as a perfect device for creating a virtual reality for the user. Several research projects are now carried out using Kinect as the main tracking device. Some of these researches and projects have already proved that Kinect is not just a gaming console, but also an eye for the computer. The Kinect sensor offers an unlimited number of opportunities for old and new applications, and this report only gives a taste of what is possible. Thus far, additional research areas include hand-gesture recognition, human-activity recognition, body biometrics estimation (such as weight, gender, or height), 3D surface reconstruction, and healthcare applications. Here, I have included just one reference per application area, not trying to be exhaustive.

REFERENCES

    [1] www.xbox.com/KINECT

    [2] www.ieee.org

    [3] http://kinecthacks.net/

[4] K. Sung, "Recent Videogame Console Technologies," Computer, vol. 44, no. 2, pp. 91-93, Feb. 2011.

[5] P. Doliotis, A. Stefan, C. McMurrough, D. Eckhard, and V. Athitsos, "Comparing Gesture Recognition Accuracy Using Color and Depth Information," in Proceedings of the 4th International Conference on PErvasive Technologies Related to Assistive Environments, 2011.

[6] Z. Ren, J. Meng, J. Yuan, and Z. Zhang, "Robust hand gesture recognition with Kinect sensor," in Proceedings of the 19th ACM International Conference on Multimedia, 2011, pp. 759-760.

[7] F. Kistler, D. Sollfrank, N. Bee, and E. André, "Full Body Gestures enhancing a Game Book for Interactive Story Telling," in Interactive Storytelling, vol. 7069, Springer Berlin/Heidelberg, 2011, pp. 207-218.
