
Line-of-sight signaling and positioning for mobile devices

Wei Hong, Ramin Samadani, Mary Baker

HP Laboratories
HPL-2013-22

Keyword(s): Line-of-sight; device positioning; device identification; optical signaling; mobile device

Abstract: We are developing an optical line-of-sight signaling technology that enables mobile devices such as smartphones, tablets, and laptops to identify, locate, and receive information from other devices and infrastructure within their vicinity. Using this technology we can determine where particular devices are with respect to each other in a room - the angles between the devices - and we can use our mobile devices to read light-emitting codes similar to Quick Response (QR) codes in the environment. Our technology uses a robust encoding method and can tolerate shaking due to unsteady hands holding the mobile devices. Experimental results using hand-held cameras show our technology can accurately and robustly locate and decode signals from small LED emitters even at indoor distances of at least 5 meters.

External Posting Date: March 21, 2013 [Fulltext]   Approved for External Publication
Internal Posting Date: March 21, 2013 [Fulltext]

Copyright 2013 Hewlett-Packard Development Company, L.P.


LINE-OF-SIGHT SIGNALING AND POSITIONING FOR MOBILE DEVICES

Wei Hong, Ramin Samadani, Mary Baker
HP Labs

{whong, ramin.samadani, mary.baker}@hp.com

ABSTRACT

We are developing an optical line-of-sight signaling technology that enables mobile devices such as smartphones, tablets, and laptops to identify, locate, and receive information from other devices and infrastructure within their vicinity. Using this technology we can determine where particular devices are with respect to each other in a room – the angles between the devices – and we can use our mobile devices to read light-emitting codes similar to Quick Response (QR) codes in the environment. Our technology uses a robust encoding method and can tolerate shaking due to unsteady hands holding the mobile devices. Experimental results using hand-held cameras show our technology can accurately and robustly locate and decode signals from small LED emitters even at indoor distances of at least 5 meters.

Index Terms— Line-of-sight, optical communication, mobile devices, relative position

1. INTRODUCTION

In this paper we describe an optical line-of-sight signaling technology that makes it possible for mobile devices such as smartphones, tablets, and laptop computers to identify, locate, and receive information from other devices and infrastructure in their vicinity. Using this technology we can support several use cases for mobile devices.

First, we can determine where particular devices are with respect to each other in a room. In our daily life, it is often important for us to know who is around us and where they are. Likewise, it can be important to do the same thing with mobile computing devices and other devices in our environment including displays, printers, projectors, gaming consoles, and TVs. When we know who and what is where, we can “flick” content from one device to another, control a display from a mobile phone, identify those sitting across from us at a meeting (without revealing we’ve forgotten their names), and know who is sitting where around a tablet-based group design activity or game. To do this, we need our mobile devices to detect the identity of other devices in their vicinity and also their relative positions – the angles between the devices.

Second, we can use our mobile devices to read light-emitting codes similar to Quick Response (QR) codes in the environment. Stores can exploit such codes to provide navigational aid or customer incentives like coupons without requiring users to be right next to a printed QR code.

The problem we solve is to use optical signaling to determine, within the coordinate system of a receiving device, the relative angular positions of the other devices within its line-of-sight. Where line-of-sight information is not directly available, we may be able to provide it transitively through intermediate devices that are in the line of sight. One of the main challenges of our problem is that in practice the number and positions of the signal emitters are unknown. To address this issue we use pixel-independent temporal processing for the decoding at each pixel of the received images. To be robust to camera shake, we apply two important pre-processing steps: a global motion compensation to allow good performance at larger emitter distances, followed by a local neighborhood max operation that in turn allows us to decode subsequently using only pixel-independent temporal processing.

Existing methods for detecting devices within proximity, such as those using WiFi triangulation [1, 2], can determine the identity of the devices but do not give us the relative positional information we require. Optical methods are better suited to solving the positional problem, but the existing methods suffer from optical sensor noise, changes in surrounding light, and shaking of mobile devices due to unsteady hands.

Some previous work has used optical communication to identify targets. Balloon Tag [3] requires four emitters to transmit a clock signal to synchronize the sender and receiver. ID CAM [4] requires an image sensor operating at a frequency much higher than the emitter’s frequency. Park et al.’s ID recognition system [5] applies simple binarization to detect and track an IR-LED. The binarization does not work if the background has any other bright areas, such as a window. These three systems all use optical communication, but they require a fixed, steady camera, which makes them useless on mobile devices. PicapiCamera [6] uses a large color-coded dot to transmit an ID between two mobile phones, but the dot is obtrusive and has a low bandwidth. Furthermore, all the existing optical solutions focus on identifying the emitters; none of them provides relative physical location.

Our solution can identify emitters within line-of-sight reliably from complex backgrounds using temporal matched filters. It can tolerate shaking of mobile devices due to unsteady hands using 2D filtering and video stabilization, and it provides the relative angular positions of other devices for applications such as mapping, gestural content sharing, and remote control. In this paper we describe our technique in more detail and motivate its use with some example applications. Experimental results using hand-held cameras show our technology can accurately and robustly locate and decode signals from small LED emitters even at indoor distances of at least 5 meters.

2. PROPOSED APPROACH

In this section we describe our optical line-of-sight signaling technology for identifying, locating, and receiving information from nearby devices.

2.1. Identification of other devices through robust optical communication

Our technique for identifying other devices relies on asynchronous line-of-sight optical communication between an omni-directional emitter (e.g. using visible light LEDs or IR-LEDs) and a camera on a mobile device. Normally, line-of-sight optical communication is vulnerable to noise and changes in ambient light. We use Manchester coding [7] with adaptive thresholding and temporal matched filters to detect packets and decode the IDs of other devices robustly.

2.1.1. Packet format and Manchester coding

On the sending side, a data packet, encoded by Manchester encoding, modulates the intensity of the emitter. In our implementation, the data packet consists of four parts: a header, an ID, a parity bit, and a footer. Figure 1 shows an example format for a data packet. The header is a special code that does not exist in natural ambient light and that is disallowed from appearing in the user data. We choose H = {0, 1, 0, 1, 0, 0, 1, 1} as the header. In our current implementation the user data is an ID that can be mapped through a service (accessed via WiFi or another available network) to other identifying information about the sending device or its user. Applications using this technology can embed other application-specific user data.
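To make the packet structure concrete, here is a minimal emitter-side sketch in Arduino-style C++ (the paper drives its LEDs from an Arduino Uno, but this is not the authors' firmware). The header, 12-bit ID, parity bit, and 2-bit trailer follow Figure 1; the even-parity rule (which is consistent with the parity bit shown in Figure 1), the Manchester polarity, and the LED pin are assumptions made for illustration.

```cpp
// Illustrative emitter sketch (Arduino-style C++), not the authors' firmware.
// Packet layout follows Figure 1: 8-bit header, 12-bit ID, parity bit, 2-bit trailer.
const int LED_PIN = 13;                       // assumed LED pin
const int HEADER[8]  = {0, 1, 0, 1, 0, 0, 1, 1};
const int TRAILER[2] = {0, 1};
// One Manchester symbol per camera frame at 30 fps, so a 23-bit packet
// (46 symbols after Manchester coding) takes about 1.53 s (see Section 4.2).
const unsigned long SYMBOL_MS = 1000UL / 30;

// Emit one data bit as a pair of Manchester symbols (0 -> on/off, 1 -> off/on; polarity assumed).
void sendManchesterBit(int bit) {
  digitalWrite(LED_PIN, bit ? LOW : HIGH);
  delay(SYMBOL_MS);
  digitalWrite(LED_PIN, bit ? HIGH : LOW);
  delay(SYMBOL_MS);
}

// Send one 23-bit packet: header, 12-bit ID (MSB first), parity bit, trailer.
void sendPacket(unsigned int id) {
  for (int i = 0; i < 8; ++i) sendManchesterBit(HEADER[i]);
  int ones = 0;
  for (int i = 11; i >= 0; --i) {
    int bit = (id >> i) & 1;
    ones += bit;
    sendManchesterBit(bit);
  }
  sendManchesterBit(ones % 2);                // assumed even-parity bit over the ID
  for (int i = 0; i < 2; ++i) sendManchesterBit(TRAILER[i]);
}

void setup() { pinMode(LED_PIN, OUTPUT); }
void loop()  { sendPacket(1234); }            // repeat the example packet continuously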

On the receiving side, a camera receives the signals from the emitter. We create an independent decoder at each pixel because we do not know which pixels are covered by the signal from the emitter. At each pixel location, we first decode the pixel values into a binary sequence using Manchester decoding. The pixel difference between the current frame and the previous frame, $\mathbf{d}(x,y,n) = \mathbf{I}(x,y,n) - \mathbf{I}(x,y,n-1)$, is compared with a threshold $\mathbf{T}(x,y,n)$ to decode the Manchester-encoded pixel values into binary bits $\mathbf{D}(x,y,n)$:

$$
\mathbf{D}(x,y,n) =
\begin{cases}
1 & \text{if } \mathbf{d}(x,y,n) \ge \mathbf{T}(x,y,n), \\
\text{invalid} & \text{if } |\mathbf{d}(x,y,n)| < \mathbf{T}(x,y,n), \\
0 & \text{if } \mathbf{d}(x,y,n) \le -\mathbf{T}(x,y,n).
\end{cases}
$$

When a decoded binary bit is invalid at a pixel location, this pixel does not receive any Manchester code and the decoding of this pixel is reset.
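A minimal sketch of this per-pixel tri-state decision; the enum and function names are illustrative, not from the paper.

```cpp
// Tri-state Manchester bit decision at one pixel (illustrative sketch).
// d: intensity difference between the current and previous frame at (x, y);
// T: the current adaptive threshold at that pixel (Section 2.1.3).
enum class Bit { Zero, One, Invalid };

Bit decodeBit(float d, float T) {
  if (d >= T)  return Bit::One;    // strong rise in intensity -> bit 1
  if (d <= -T) return Bit::Zero;   // strong fall in intensity -> bit 0
  return Bit::Invalid;             // |d| < T: no Manchester signal at this pixel
}
```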

2.1.2. Temporal matched filter for header detection

To reject random binary temporal sequences created by noise and changes in the surrounding light, we use a temporal matched filter to detect the header $\mathbf{H}$ from the Manchester-decoded data $\mathbf{D}(x,y,n)$ at every pixel location:

$$
\mathbf{R}(x,y,n) = \sum_{i=0}^{L_H-1} \bigl(2\,\mathbf{D}(x,y,n-2i) - 1\bigr)\bigl(2\,\mathbf{H}(L_H - i) - 1\bigr),
$$

where $L_H$ is the length of the header. Since the offset of the Manchester code is unknown, $2L_H - 1$ Manchester-decoded frames are stored in a buffer for detecting the header. We detect a header if $\mathbf{R}(x,y,n) = L_H$ at a pixel location. At the pixel location of the header, we parse the upcoming binary sequence into an ID, parity bit, and trailer. We verify successful transmission of the ID using the parity bit and trailer.
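A sketch of the header correlation at one pixel, assuming the decoder keeps the last 2L_H - 1 decoded bits for that pixel (most recent first) and marks invalid bits with -1; this storage convention is an assumption consistent with the formula above, not taken from the paper.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t L_H = 8;                       // header length
constexpr std::array<int, L_H> HEADER = {0, 1, 0, 1, 0, 0, 1, 1};

// Matched-filter response R(x, y, n) at one pixel.
// bits[k] holds the decoded bit D(x, y, n - k): 0, 1, or -1 for "invalid".
// A header is detected exactly when the returned value equals L_H.
int headerCorrelation(const std::array<int, 2 * L_H - 1>& bits) {
  int r = 0;
  for (std::size_t i = 0; i < L_H; ++i) {
    const int d = bits[2 * i];                       // D(x, y, n - 2i)
    if (d < 0) return 0;                             // an invalid bit rules out a header here
    const int h = HEADER[L_H - 1 - i];               // H(L_H - i), stored 0-based
    r += (2 * d - 1) * (2 * h - 1);                  // map {0,1} to {-1,+1} and correlate
  }
  return r;
}
```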

The same ID can be detected at multiple pixel locations because one emitter may cover multiple pixels. After detecting each ID at a pixel location of frame $n$, we define the confidence value of this ID at this location as

$$
\mathbf{C}(x,y,n) = \sum_{i=0}^{L_P-1} \left|\mathbf{d}(x,y,n-2i)\right|,
$$

where $L_P$ is the length of the packet. Intuitively, $\mathbf{C}(x,y,n)$ measures the strength of the emitter signal at the camera based on the transition strengths of the captured imagery. The center of the captured image of the emitter has the strongest oscillating pattern that occurs in the Manchester code, and will result in the highest confidence. We designate the pixel location with the highest confidence as the detected pixel location of this ID. If there are multiple pixel locations with the same confidence, we choose the average pixel location if they are close; if they are far apart, the detection must be incorrect and is ignored.
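A corresponding sketch of the confidence value, assuming the decoder also keeps the absolute frame differences |d| for the most recent frames at each pixel; the names and buffer layout are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Confidence C(x, y, n) of an ID decoded at one pixel: the sum of absolute
// frame differences |d(x, y, n - 2i)| over the packet length L_P.
// absDiffs[k] holds |d(x, y, n - k)| and must cover at least 2*packetLen - 1 frames.
float idConfidence(const std::vector<float>& absDiffs, std::size_t packetLen) {
  float c = 0.0f;
  for (std::size_t i = 0; i < packetLen; ++i)
    c += absDiffs[2 * i];
  return c;
}
```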

Figure 1. An example of a packet before Manchester coding. From left to right: header 0 1 0 1 0 0 1 1, user data (a 12-bit ID, 1234 in decimal) 0 1 0 0 1 1 0 1 0 0 1 0, parity bit 1, and trailer 0 1. The length of the packet is 23 bits before Manchester coding and 46 bits after Manchester coding.


2.1.3. Adaptive thresholding

Since the received signal strength can change drastically depending on the emitter brightness, the distance between the camera and the emitter, and the surrounding light, the Manchester decoding threshold $\mathbf{T}(x,y,n)$ needs to be adaptive. Initially, the threshold $\mathbf{T}(x,y,0)$ is set to $T_0$ for all pixels. After detecting a header at a pixel location on frame $n$, we update the threshold at that pixel location to

$$
\mathbf{T}(x,y,n) = \alpha \cdot \sum_{i=0}^{L_H-1} \left|\mathbf{d}(x,y,n-2i)\right|,
$$

where $\alpha$ is a parameter that is determined empirically. After a full packet is decoded at a pixel location, the threshold at this location is reset back to $T_0$.

2.1.4. Synchronization

The bit rate of the emitter needs to match the camera frame rate to maximize the transmission rate. The above decoding scheme works very well when the clocks of the emitter and camera are synchronized. In reality, small differences between the two clocks cause the phase difference between them to change slowly over time. After a while, the phase difference between the emitter and camera becomes half of a bit and the received signal becomes undetectable. Fortunately, the thresholding in Manchester decoding detects these cases and ignores them. Thus, we lose some packets, but we do not make a false identification if the thresholding parameters are carefully chosen. To avoid the situation where the two clock rates are exactly identical and the phase difference stays at half of a bit, we add a very small perturbation to the frequency of the emitter signal.

2.2. Compensating for camera shake

The above optical communication scheme works well when the sending and receiving devices are both static. However, it works poorly on hand-held mobile devices because the pixel-wise correspondence among consecutive frames is often lost due to camera shake from unsteady hands holding the devices.

2.2.1. 2D max filter

To tackle this problem, we compute at each pixel the maximum of the pixels in a two-dimensional neighborhood of size S surrounding that pixel to compensate for camera shake. A bright pixel becomes an S by S square in the filtered frame. With the max filter applied, if the movement introduced by the camera shake is less than S/2, as shown in Figure 2, the pixel-wise correspondence among consecutive frames is still preserved for some pixels and the pixel-independent decoding remains correct.
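A plain nested-loop sketch of the S x S max filter on an 8-bit grayscale frame; it is equivalent to a grayscale morphological dilation with a square structuring element, which an optimized implementation would likely use instead.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Replace each pixel with the maximum over an S x S neighborhood (S odd).
// src is a row-major 8-bit grayscale frame of width x height pixels.
std::vector<std::uint8_t> maxFilter(const std::vector<std::uint8_t>& src,
                                    int width, int height, int S) {
  const int r = S / 2;                          // neighborhood radius
  std::vector<std::uint8_t> dst(src.size(), 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      std::uint8_t m = 0;
      for (int dy = -r; dy <= r; ++dy) {
        for (int dx = -r; dx <= r; ++dx) {
          const int yy = std::clamp(y + dy, 0, height - 1);
          const int xx = std::clamp(x + dx, 0, width - 1);
          m = std::max(m, src[yy * width + xx]);
        }
      }
      dst[y * width + x] = m;                   // a bright emitter pixel spreads to an S x S square
    }
  }
  return dst;
}
```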

Figure 2. Left: Motion causes loss of pixel-wise correspondence among the five frames. Right: With a max filter on every frame, we can still decode the correct signal 10101 from the pixels within the green rectangle.

2.2.2. Video stabilization

The size of the 2D max filter cannot be very large because it will cause problems if the emitter is close to another emitter or to another bright source such as a window. The filter can therefore only compensate for small amounts of camera shake. If a person is walking, the camera shake can be very large. In these cases, we first stabilize the input video using a global-motion video stabilization module. The motion stabilization can compensate for large motion; in our implementation, we set the range to +/-32 pixels. However, video stabilization alone cannot recover the pixel correspondence between frames since the global motion model is not accurate enough. For large-motion cases we thus need to cascade video stabilization with the 2D max filter to achieve good results.

2.3. Estimation of relative angular position

Existing proximity detection methods such as WiFi triangulation can also find nearby devices, but they do not provide accurate angular position information for surrounding devices. We can use relative angular positions to tell devices apart and interact with them individually, for instance by "flicking" (dragging) content in the direction of a particular device. As shown in Figure 3, the angle $\varphi$ between device B and the y-z plane is defined as the angular position of device B relative to device A. The x-axis and y-axis are the axes of the camera image plane and the z-axis is the optical axis of the camera. We can estimate the relative angular position $\varphi$ as

$$
\varphi = \arctan\!\left(\frac{x_0 - \frac{W}{2}}{\sqrt{f^2 + \left(y_0 - \frac{H}{2}\right)^2}}\right),
$$

where $x_0$ and $y_0$ are the horizontal and vertical pixel coordinates of the emitter on the camera image, $W$ and $H$ are the width and height of the camera image, and $f$ is the focal length of the camera in pixels. When we tilt device A (rotate along the x-axis), the y-z plane does not change, and thus by definition $\varphi$ is invariant to the tilt angle of device A. Users do not have to hold their mobile devices upright to get the relative angular positions of other devices. Here we assume a pinhole camera model. If the camera has significant lens distortion, the distortion should be corrected beforehand.
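A direct implementation of this estimate under the pinhole model, with the focal length given in pixels as above; the function name and argument layout are illustrative.

```cpp
#include <cmath>

// Relative angular position (radians) of an emitter detected at pixel (x0, y0).
// width, height: camera image size in pixels; focalPx: focal length in pixels.
double relativeAngularPosition(double x0, double y0,
                               double width, double height, double focalPx) {
  const double dx = x0 - width / 2.0;    // horizontal offset from the image center
  const double dy = y0 - height / 2.0;   // vertical offset from the image center
  return std::atan(dx / std::sqrt(focalPx * focalPx + dy * dy));
}
```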

Figure 3. Relative angular position $\varphi$ and flick angle. The angle $\varphi$ between device B and the y-z plane is defined as the angular position of device B relative to device A. The angle between a finger flick and the y-axis is defined as the flick angle of this gesture.

2.4. Visible light or IR light

Our optical communication scheme can use either visible light or IR light, depending on the application. The advantage of IR light is that it is non-intrusive to users, but it requires an IR camera, which is not available on current mobile devices. However, if IR-based 3D cameras become standard on future mobile devices, we can exploit those IR cameras. The advantage of using visible light is that regular cameras are already available on current mobile devices. Also, visible light is more suitable for some applications, such as the QR-like codes described below.

2.5. Power consumption

Power consumption is critical for mobile devices. The emitter can be always on, intermittently on, or on only when the device wants to be discovered by other devices (in discoverable mode) to save power. Leaving an LED emitter always on does not consume much power because LEDs are very power efficient. Typical indicator LEDs are designed to operate with no more than 30-60 mW, and we find even more efficient ones on mobile devices. The camera only needs to be turned on when the current device needs to identify other devices. After the target device is identified, the camera is turned off to save power.

3. SAMPLE APPLICATIONS

In this section we describe example applications that can make use of our optical line-of-sight signaling technology.

3.1. People identification

An application that makes use of our technology can non-intrusively identify other people within line-of-sight who are using mobile devices. Using the IDs transmitted by the devices and a service that maps IDs to identifying information (owner's name, title, email address, etc.), mobile applications can provide the user with a layout of who is sitting where in a meeting. Since there may be multiple devices within the line-of-sight of the current device, visualizations of the identification results should allow users to associate them with real persons easily. One method is augmented reality, which overlays the tags of the detected devices on a video captured by a regular camera, as shown in the left image in Figure 4. Another method places detected devices on a circular map according to their relative angular positions $\varphi$, as shown in the right image in Figure 4.

Figure 4. Visualization of the identification results. Left: augmented reality. Right: circular map.

3.2. Content sharing using finger flicks

We often need to share content with other people in the same room. Many content sharing applications can make use of our positional technology to provide an effortless sharing interface for users. For instance, we can allow a user to select content and "flick" it (drag a finger across the mobile display) in the direction of the receiving device. As shown in Figure 3, the angle between the finger flick and the y-axis is defined as the flick angle of this gesture. The device with a relative angular position $\varphi$ closest to the flick angle will be the target. Alternatively, users could drag content onto a tag, with the tags positioned on the display to reflect the actual positions of the associated devices, as shown in Figure 4. Besides mobile devices, the target device could be a display, a projector, a printer, or a TV.
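A minimal sketch of how a sharing application might choose the flick target from the set of detected devices; the DetectedDevice structure and the simple nearest-angle rule are assumptions made for illustration (angles in radians, measured as in Figure 3).

```cpp
#include <cmath>
#include <string>
#include <vector>

struct DetectedDevice {
  std::string id;     // ID decoded from the device's emitter
  double phi;         // relative angular position from Section 2.3 (radians)
};

// Return the index of the device whose angular position is closest to the
// flick angle, or -1 if no device has been detected.
int flickTarget(const std::vector<DetectedDevice>& devices, double flickAngle) {
  int best = -1;
  double bestDiff = 0.0;
  for (int i = 0; i < static_cast<int>(devices.size()); ++i) {
    const double diff = std::fabs(devices[i].phi - flickAngle);
    if (best < 0 || diff < bestDiff) {
      best = i;
      bestDiff = diff;
    }
  }
  return best;
}
```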

3.3. QR-like code from a distance

A mobile device can only decode a regular 2D QR code from a short distance, but in many scenarios people want to check the information from farther away. For example, shoppers in a grocery store may want to check the sale items of an aisle without walking to that aisle, or pick up coupons relevant to their current position in the store. We can mount the modulated emitter of our solution (either a visible light LED or an IR-LED) on any object or wall to create a temporal QR-like code that allows multiple people to receive the encoded information from a distance. Besides a single LED, we can use multiple LEDs or a video of coded patterns displayed on a screen as a 3D-QR code (which can provide a higher bandwidth). Our previously described per-pixel decoding approach works well for these scenarios.

3.4. Controlling devices in the vicinity

The positional information we gather with our technique also enables gestural control of devices in the vicinity. Besides the previous "finger-flicking," we can use motion-sensor-based methods to recognize pointing a phone at target devices to, for instance, control the brightness or volume of a TV.

4. EVALUATION

In this section we describe the experiments we have performed to investigate the behavior of our optical line-of-sight signaling technology. In particular, we explore the distance and speed of packet detection.

4.1. Experimental setup

We have built a real-time prototype demonstration system. The output of the system shows the IDs and the relative angular positions of detected devices. We can configure the system for both visible light and IR light. The emitter is a regular LED for the visible light setup and an 850 nm IR-LED for the IR setup. We use a hand-held camera to simulate a camera on a mobile device. The camera is a Logitech webcam for the visible light setup and a PrimeSense IR camera for the IR setup. We modulate each emitter at 30 bits per second using an Arduino Uno board. Figure 5 shows one example setup. The receiving device is an HP EliteBook 8540w mobile workstation equipped with the camera described above at QVGA resolution and a 30 fps frame rate. We implemented the decoder in C++, which provides real-time performance. In our experiments, the system can determine the IDs and relative angular positions of multiple emitters simultaneously from at least 5 meters away, even when the camera is hand-held.

Figure 5. A visible light LED modulated by an Arduino Uno board.

4.2. Evaluation of the accuracy and robustness

For the 23-bit packet shown in Figure 1, since it is encoded by Manchester coding, transmitting each packet at 30 bps takes 1.53 s. We tested our system on one-minute video clips in which the transmission was continuously repeated. In one minute, we transmit about 39 packets, but we cannot detect all of them due to camera shake, noise, and the synchronization issue described in Section 2.1.4; therefore, the average detection time is longer than 1.53 s per packet. In Table 1 we compare the average time to detect a packet and identify its ID using different camera shake compensation methods, different emitter distances, and different capture motion conditions. For the capture motion conditions, "fixed" means the camera is statically mounted rather than hand-held, "hand-held" means the camera is held by a person standing still, and "hand-held + walking" means the camera is held by a person who walks in place without moving forward (to keep the distance from the emitter fixed). For each condition and distance, we do three trials with different people holding the camera.
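As a check on these numbers (this assumes, consistently with the 46-bit Manchester-coded packet and the 30 fps camera, that the 30 bit/s rate counts Manchester-coded symbols, one per frame):

$$
t_{\text{packet}} = \frac{23 \times 2 \ \text{symbols}}{30 \ \text{symbols/s}} \approx 1.53 \ \text{s},
\qquad
\frac{60 \ \text{s/min}}{1.53 \ \text{s/packet}} \approx 39 \ \text{packets per minute}.
$$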

Table 1: Average time in seconds to detect packets and identify their IDs in different conditions. The shaded cells of the original table highlight poor detection times.

Condition             Distance   No camera shake       2D max filter (s)   Stabilization +
                                 compensation (s)                          2D max filter (s)
Fixed                 1 m        2.00                  2.00                2.00
Fixed                 3 m        2.14                  2.14                2.14
Fixed                 5 m        2.40                  2.31                2.30
Hand-held             1 m        5.63                  2.28                2.28
Hand-held             3 m        12                    2.40                2.43
Hand-held             5 m        20                    2.47                2.37
Hand-held + walking   5 m        ∞                     60                  2.65

Our evaluation shows that with a fixed camera we can reliably identify an ID from at least as far as 5 meters. (We have not tested it from farther away than 5 meters.) The identification time increases slightly as the distance increases because the signal-to-noise ratio worsens. With a hand-held camera the identification time increases sharply due to camera shake. However, our camera shake compensation solution, the 2D max filter, significantly improves the performance for hand-held cameras. With the 2D max filter applied, the performance of a hand-held camera is almost the same as that of a fixed camera. In the last row of the table, when the camera holder walks, the camera shake motion is too large and the max filter alone fails to provide good performance. By applying video stabilization followed by the max filter we can achieve almost the same detection time as the fixed camera. In these experiments, the size of the 2D max filter is set to 9x9 pixels and the video stabilization motion range is +/-32 pixels to achieve the best results.

4.3. Real-time demo of estimating relative angular positions

In this demo, we used one camera and two emitters to simulate one device interacting with two other devices, as shown in Figure 6. One emitter is mounted on a tablet and another on a laptop. The IDs of the emitters are set to 2000 and 4000, respectively. Our real-time system identifies the IDs of the other two devices as well as their relative angles to the camera, and displays their IDs on a circular map according to their relative angular positions. When any of the devices moves, the circular map updates automatically based on the newly identified relative locations.

5. CONCLUSION

We have developed an optical line-of-sight signaling method that enables a mobile device to detect, locate, and receive information from other devices in its proximity. The method proves robust in the face of camera shake and larger motions, even at distances of 5 meters from a small LED emitter. Detection times vary with conditions, but in our most stressful test (a person walking with a hand-held device at 5 meters from the emitter) we achieved a 2.65-second average packet detection and decode time for 23-bit Manchester-encoded packets. We also illustrate the estimation of relative angular positions of other mobile devices using our real-time demo system.

6. REFERENCES

[1] H. Liu, H. Darabi, P. Banerjee, and J. Liu, "Survey of Wireless Indoor Positioning Techniques and Systems," IEEE Transactions on Systems, Man, and Cybernetics, pp. 1067-1080, Nov. 2007.

[2] S. Sen, B. Radunovic, R. Choudhury, and T. Minka, "Spot Localization using PHY Layer Information," MobiSys, 2012.

[3] H. Aoki and S. Matsushita, "Balloon Tag: (in)visible marker which tells who's who," Wearable Computers, 2000.

[4] N. Matsushita, D. Hihara, T. Ushiro, S. Yoshimura, J. Rekimoto, and Y. Yamamoto, "ID CAM: a smart camera for scene capturing and ID recognition," The Second IEEE and ACM International Symposium on Mixed and Augmented Reality, Proceedings, pp. 227-236, Oct. 2003.

[5] J. Park, K. H. Lee, and S. G. Lee, "ID recognition for museum tourists using infrared LEDs," Second Workshop on Digital Media and its Application in Museum & Heritage, pp. 115-117, Dec. 2007.

[6] http://www.core77.com/blog/technology/will_tri-colored_dots_supplant_qr_codes_22839.asp

[7] W. Stallings, Data and Computer Communications, 7th edition, Prentice Hall, pp. 137-138, 2004.

Figure 6. Our real-time prototype demonstration system. Two devices with emitter IDs 2000 and 4000 are identified by the current device with a camera. The current device shows a circular map indicating the relative angular positions of the other two devices. When any of these devices moves, the relative angular positions on the circular map update automatically.