
CONTROL YOUR SMART HOME WITH AN AUTONOMOUSLY MOBILE SMARTPHONE

Haidong Wang, Jamal Saboune, Abdulmotaleb El Saddik

Discover Lab, University of Ottawa
[email protected], [email protected], [email protected]

ABSTRACT

Since smart homes will become more and more popular in the future, the need for a friendly and intelligent interface between users and the home environment is growing. However, conventional interfaces have reached their limits. This paper therefore describes a new concept of interface for the home environment. The work is characterized by building a smart robot: a smart phone is used as the brain, and a robot car is used as the body. With this system, a person can carry the robot's brain when she is away from home. The brain is thus not only a normal smart phone, but can also monitor parameters such as the temperature at home. When the person is at home, she can place her phone on the robot car. By invoking the system, she obtains an assist-robot. This assist-robot acts as a new interface in the smart home by following the person and recognizing her voice commands for taking notes, reading notes, and controlling smart home devices.

Index Terms— Smart Home, Robot, Smart phone, Separable Robot, Following a person

1. INTRODUCTION

The smart home environment refers to a home where devices are smart enough to acquire information from the users and their living environment, and to use this collected information to serve people better.

According to [1], there are several core technologies for studying smart homes, listed as follows: sensor and actuator technologies, communication technologies, and interface and management technologies.

With the fast development of these smart home technologies, people have developed many smart home devices that contain sensors and actuators to serve people at home. Also, many communication technologies can act as bridges between sensors, actuators and interfaces. Moreover, there are many technologies that can be used as interfaces in smart homes.

However, computers, tablets, smart phones and robots, which are widely used as interfaces in smart homes [2][3][4][5], all have drawbacks. In brief, computers and laptops are not convenient to carry when people leave their homes. Although the Internet and web services [6][7] offer a good way to let people remotely control their smart homes, people still need to find a computer or laptop first. Smart phones have improved smart home interfaces by making them smaller, but people still need to hold and carry them at home, which is not convenient. Robot assistants are a potential solution for interfaces at home; however, robots are expensive and not convenient to use when people leave their homes.

Therefore, the main contribution of this paper is that we propose a composite robot as the interface in a smart home. This robot consists of an Android smart phone combined with a WiFi robot car. When a person is not at home, she can receive information such as the room temperature and control home devices such as lights through her smart phone. When she is at home, she can put her phone on the WiFi robot car. This robot assistant can understand the user's voice commands to take personal notes and control smart home devices, and it can follow the user visually. This software/hardware solution addresses the high cost and limited portability of a robot and the lack of self-movement of a smart phone. Moreover, smart phones are devices that almost all of us own and are familiar with. The second contribution of this paper is that we successfully allow our composite robot to follow a person by using the smart phone's camera. We designed an algorithm to detect the direction in which a person is walking according to the person's position in each frame captured by the smart phone's camera; the smart phone then controls the robot car to follow the person. The third contribution of this paper is a method that lets a smart phone decide, by using its camera, when it should invoke its voice recognition. Today, many voice recognition services, such as Google Voice Search and Siri, allow smart phones to understand voice commands. However, most of them require people to press a button to initialize the voice recognition service, which is impractical for our robot.

This paper is organized as follows. We briefly discuss some previous work on smart home interfaces in Section 2. The details of our robot are explained in Section 3. Some preliminary experimental results and an evaluation of our approach are provided in Section 4. The conclusion and future work are summarized in Section 5.


Fig. 1. The high-level design

2. PREVIOUS SMART HOME INTERFACES

Computers are the traditional devices used as interfaces in smart homes. They were used in many previous works such as the Aware Home [2] and the Ubiquitous Home [3]. Meanwhile, tablets, PDAs and phones are widely used as interfaces in smart homes. For example, a PDA was used by Stephen S. Intille from MIT in [8], and a phone was used by the authors in [9]. They are portable when people leave their home, but it is not comfortable for people to hold them at home. Robots have become more popular as smart home interfaces in recent research [4][5]. Werner et al. evaluated the humanoid robot NAO in [10]. They show that humanoid robots like NAO make people feel safe and unafraid, and that these robots can also communicate with people through voice, which makes people feel more comfortable. However, robots are still not mobile enough to accompany people outside the home.

3. THE PROPOSED SEPARABLE ROBOT INTERFACE

3.1. High-level design

Figure 1 illustrates the high-level design of our system, in which there are two modes: the Portable Mode and the Robot Mode.

The Portable Mode: If a person is not at home, her Android phone can be manually connected to her Google Calendar or to a Home Automation web service. The home automation web service we use was developed in a previous work; it allows users to monitor their room information and to control home devices. Similarly, Google Calendar is a web service the person can use to take notes. In this case, the Android phone acts as a traditional phone interface in a smart home.

Fig. 2. Our proposed separable robot

The Robot Mode: When a person is at home, she can select the Robot Mode and put her Android phone on the robot car. Then, a smart robot is set up and acts as the interface of the smart home. It can start a conversation with the user by recognizing the user's face, and it can invoke different web services by understanding the user's voice commands. Moreover, it is able to follow the user if the user wishes.

3.2. Invoking the Google Voice Recognition

Under the Robot Mode, the smart phone is attached to a robot car, as shown in Figure 2, and the user can be several meters away from the separable robot. In this situation, it is unfeasible for the user to press a button on the phone to invoke the voice recognition and issue voice commands. Therefore, this subsection explains the method we developed to let the smart phone decide, by using its camera, when it should invoke its voice recognition.

People commonly notice that someone wants to talk to them when they find that person looking at them. We therefore propose a method for Android phones based on the same concept, using image processing.

When the Robot Mode is activated, the camera of the Android phone is opened and first looks for faces. We implement the Haar-cascade model in OpenCV (Open Source Computer Vision) to detect faces. The Haar-like feature cascade presented in [11] is an object detection method based on a boosted cascade of simple features. The facial features mainly used in Haar-cascade face detection are the eyes; therefore, a person's face is only detected if he or she faces the camera, which is similar to the situation in which a person wants to start a conversation with another.
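As an illustration, this detection step can be sketched with OpenCV's Python API (the paper uses the OpenCV port for Android; the cascade file and detection parameters below are assumptions, not the authors' exact settings):

import cv2

# Load the stock frontal-face Haar cascade shipped with OpenCV.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_frontal_faces(frame_bgr):
    # Returns bounding boxes (x, y, w, h); empty when nobody faces the camera.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

Because the frontal cascade relies on both eyes being visible, a profile view simply produces no detection, which is the behaviour this interaction design relies on.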

Moreover, in order to improve the security and personalization of our system, we developed a face recognition web service, implementing the Eigenfaces algorithm [12], to recognize the person before invoking the voice recognition.


Fig. 3. The flow chart for initializing the voice recognizer

Therefore, under the Robot Mode, if a face is detected, a screenshot is automatically sent to the face recognition web service at the same time. The face in the screenshot is compared with the information saved in the web service. After the face recognition, a name is returned to the smart phone. Then, the smart phone starts the conversation by calling the user by her name.
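A minimal sketch of the server-side Eigenfaces step could look as follows (Python with the opencv-contrib face module; the enrolled image set, image size and label-to-name mapping are assumptions):

import cv2
import numpy as np

def train_eigenfaces(face_images, labels):
    # face_images: list of equal-size grayscale arrays, one per enrolled photo.
    # labels: integer identity of the person shown in each photo.
    model = cv2.face.EigenFaceRecognizer_create()
    model.train(face_images, np.array(labels))
    return model

def recognize(model, face_gray, id_to_name):
    # predict() returns the closest enrolled identity and a distance-like score.
    label, score = model.predict(face_gray)
    return id_to_name.get(label, "unknown"), score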

3.3. Following a Person

In our system, the robot needs to follow the person by using the smart phone's camera and the robot car's wheels. Therefore, we developed a method for following a person visually. In our proposed method, we take advantage of the mobility of the robot car to follow a person, instead of visually calculating the exact position of the person in the real world and then controlling the robot to follow that position. The basic concept of our method is that the person's position is compared with two thresholds in each frame taken by the smart phone's camera. Each threshold in our system is a value in pixels, beyond which we consider that the person has moved. The robot car then moves forward in order to keep a constant distance to the person, and it turns left or right in order to keep the person in the middle of each frame.
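As a rough sketch of this decision logic (the command names and threshold handling are assumptions; the flow chart in Figure 3 is the authoritative description):

FRAME_WIDTH, FRAME_HEIGHT = 240, 160   # working resolution used for detection

def choose_command(center_x, body_height_px, lr_threshold_px, fwd_threshold_px):
    # center_x: horizontal centre of the detected person in the frame (pixels).
    # body_height_px: vertical extent of the detected person (pixels).
    if center_x < FRAME_WIDTH / 2 - lr_threshold_px:
        return "TURN_LEFT"        # person drifted toward the left edge
    if center_x > FRAME_WIDTH / 2 + lr_threshold_px:
        return "TURN_RIGHT"       # person drifted toward the right edge
    if body_height_px < fwd_threshold_px:
        return "FORWARD"          # person appears smaller, i.e. stepped away
    return "STOP"                 # person is centred and close enough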

Figure 3 illustrates the flow chart of how our Robot Mode follows a person; it is explained in the following subsections.

3.3.1. People Detection Method

Before the phone controls our robot car to follow a person, it requires the position of the person. In 2005, Dalal and Triggs [13] found that locally normalized Histogram of Oriented Gradients (HOG) descriptors are excellent at describing human feature sets relative to other feature sets. This HOG method for human detection has proven to be particularly successful, so we implement it in our system. The resolution of each frame taken by the camera is set to 240×160 in order to improve the detection speed in real time. Also, we set the maximum height of a person in each frame to 140 pixels in order to ensure the accuracy of the person detection.
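A minimal sketch of this detection step with OpenCV's default HOG people detector (the stride, padding and scale parameters are assumptions):

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_person(frame_bgr):
    # Work at the reduced 240x160 resolution mentioned above for speed.
    small = cv2.resize(frame_bgr, (240, 160))
    rects, weights = hog.detectMultiScale(small, winStride=(4, 4),
                                          padding=(8, 8), scale=1.05)
    return rects   # (x, y, w, h) boxes of detected people in the small frame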

Then, the person's position in each frame is compared to a left-right threshold and a forward threshold, which are computed using the algorithms described below.

3.3.2. The Thresholds

To calculate the left-right and forward thresholds for finding out the direction of a person's movement, we require the following information: the resolution of the images taken by the smart phone camera, the horizontal center point and the vertical range (in pixels) of the person's whole body when she is detected in a new position, the minimum detection distance between the robot and the person, and the horizontal angle of view of the camera.

As mentioned above, the image resolution for person detection is 240×160 pixels. The angle of view of the camera depends on the manufacturer of the smart phone.

By combining the position of the smart phone on the robot car, the height of the person, and the vertical angle of view of the camera, we can calculate the distance between the smart phone and the person at which the smart phone detects the person and controls the robot car to follow her. Figure 4 gives an example in which the height of the person's whole body is 140 pixels in the image captured by the smart phone's camera. The point o stands for the camera, the line AB stands for the height of the person, and the angle ∠AoB stands for the vertical angle of view of the camera. The following distance oC can then be calculated from equations 1 and 2.

tan∠AoC = AC / oC    (1)

tan∠BoC = BC / oC    (2)
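As a hedged numeric sketch (not the authors' exact procedure), the following Python assumes the person is vertically centred in the frame and derives oC from the vertical angle of view; the FOV value in the example is an assumption:

import math

def following_distance(person_height_m, vert_fov_deg,
                       span_px=140, frame_h_px=160):
    # Focal length in pixels for a pinhole camera with the given vertical FOV.
    f_px = (frame_h_px / 2) / math.tan(math.radians(vert_fov_deg) / 2)
    # Angle AoC (= BoC) subtended by half of the person, assumed centred.
    half_angle = math.atan((span_px / 2) / f_px)
    # Equation (1): oC = AC / tan(AoC), with AC = half the person's real height.
    return (person_height_m / 2) / math.tan(half_angle)

# e.g. following_distance(1.8, 48.0) estimates the distance at which a 1.8 m
# person spans 140 of the 160 pixel rows; the result depends on the real FOV.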

Since the smart phone camera can be modeled as a pinhole camera, Figure 5 shows a mathematical model for deciding whether the person is moving left or right. The point O stands for the optical center of the smart phone's camera. The point E is the center point of the person's body displayed in the image when he or she has moved from D to P. The line OD stands for the following distance. The angle ∠AOB is the horizontal angle of view of the camera. If a person is one step away from the center point D, we consider the corresponding value of the line CE to be the left-right threshold. It can be calculated by equation 3.

CE = (DP · AE) / (OD · tan∠AOE)    (3)


Fig. 4. Calculating the following distance

Fig. 5. Mathematical model for moving left or right

Figure 6 shows a mathematical model for deciding whether the person is moving forward. The line C′D′ is the new vertical range of the person's body in the image when the person has moved from AB to A′B′. The angle ∠COD is the real vertical angle of view of the camera. The line OL stands for the following distance. If a person has moved one step forward, we consider the corresponding value of the line C′D′ to be the forward threshold. It can be calculated by equation 6.

∠BOB′ = arctan(BL / OL) − arctan(BL / (OL + AA′))    (4)

∠AOA′ = arctan(AL / OL) − arctan(AL / (OL + AA′))    (5)

Fig. 6. Mathematical model for moving forward

C′D′ / CD = [tan(∠COD/2 − ∠AOA′) + tan(∠COD/2 − ∠BOB′)] / [2 tan(∠COD/2)]    (6)

According to [14], the average step length for an adult is (0.74m + 0.04/leg length). Therefore, if a person steps to the left, to the right or forward, his or her body's central point moves about 0.37 m away from its original position. Therefore, the value of the line DP in Figure 5 and of the line AA′ in Figure 6 is 0.37 m.
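As a hedged sketch, the two thresholds can be approximated with a plain pinhole model (the angle-of-view values are assumptions, and the paper's exact formulas (3)-(6) differ slightly from this simplification):

import math

STEP_M = 0.37   # half of the ~0.74 m average step length cited above

def left_right_threshold_px(follow_dist_m, horiz_fov_deg, frame_w_px=240):
    # Pixel shift of the body centre caused by a 0.37 m sideways step,
    # playing the role of CE in equation (3).
    f_px = (frame_w_px / 2) / math.tan(math.radians(horiz_fov_deg) / 2)
    return f_px * STEP_M / follow_dist_m

def forward_threshold_px(follow_dist_m, body_span_px=140):
    # Under the pinhole model the body's image height shrinks roughly in
    # proportion to distance when the person steps 0.37 m further away.
    return body_span_px * follow_dist_m / (follow_dist_m + STEP_M)

For the 1.8 m person and 2.9 m following distance quoted in Section 4, forward_threshold_px(2.9) gives roughly 124 pixels, close to the 126 pixels obtained with equations (4)-(6).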

4. EXPERIMENT RESULTS

Figure 7 shows our robot following a person. The robot consists of an HTC EVO 3D Android phone and a WiFi robot car. The WiFi robot car was developed by the MXchip Information Technology company in Shanghai. It is powered by a pair of 14500 3.7 V Li-ion batteries, and its four wheels are driven by four small motors. These motors are connected to an STM32F107 ARM Cortex-M3 board [15], which incorporates the high-performance ARM Cortex-M3 32-bit RISC core. The ARM core operates at up to 72 MHz, with 256 KB of Flash and 64 KB of RAM internal memory. Also, a 3.2-inch TFT LCD


Fig. 7. Following a person in the real world

panel (320×240) with a touch screen is mounted on the board. Moreover, a WiFi module, the EM380C, also designed by the MXchip Information Technology company, is used on the board. The robot car is controlled over WiFi using TCP/IP; in our system the IP address of the robot car is "192.168.1.8", and the IP address of the smart phone is "192.168.1.2".
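As a hedged illustration of issuing a drive command over this link (the port number and the command format are assumptions; the MXchip board's actual protocol is not given in the paper):

import socket

ROBOT_ADDR = ("192.168.1.8", 2000)   # robot car IP from the paper; port assumed

def send_command(command: bytes):
    # Open a TCP connection to the robot car and send one drive command.
    with socket.create_connection(ROBOT_ADDR, timeout=2.0) as sock:
        sock.sendall(command)

# e.g. send_command(b"FORWARD\n") after the earlier choose_command() sketch
# returns "FORWARD".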

We conducted experiments to test the accuracy of our algorithm for visually detecting a person's moving direction in the real world. We invited 8 people (7 men, 1 woman, aged from 20 to 28) to do the experiment. According to each person's height, the related thresholds and following distance in our system were calculated by our algorithm. For example, if a person is 1.8 m tall, the minimum following distance is 2.9 m from the robot car, and the left-right and forward thresholds are 30 pixels and 126 pixels. After making sure that each person was displayed in the middle of the image and was detected successfully by the smart phone, each person was asked to step left, step right, one step forward, and two steps forward, 10 times each. A TCP debugging tool was used to simulate the robot car of our system and to display each command that the smart phone sent. If the TCP debugging tool received the correct command for the person's moving direction, it was counted as a successful detection. Table 1 shows that our algorithm has high accuracy in detecting the direction in which a person is moving.

Also, each person was asked to sit on a chair, doing anything he or she wanted, while our robot stood beside them in Robot Mode. The user could then face the smart phone's camera whenever he or she wanted to interact with the robot. Afterwards, the user filled in a questionnaire about the naturalness of the interaction with the robot and the convenience of its different features.

Table 1. The average ratio of detecting a person's moving direction successfully

Detecting Left Successfully: 95%
Detecting Right Successfully: 96.25%
Detecting One Step Forward Successfully: 72.5%
Detecting Two Steps Forward Successfully: 100%

The questionnaire consisted of the following two questions: how natural our system is at starting a conversation, and how good the idea is that our robot can recognize the user. For question 1, 7 people were highly satisfied and 1 was satisfied; for question 2, all 8 people were highly satisfied.

5. CONCLUSION AND FUTURE WORK

We introduced a new potential interface for smart homes in this paper. It is a composite robot in which a smart phone acts as the brain and a WiFi robot car acts as the body.

We developed a method, implementing the Haar cascade from OpenCV, to make the Android phone aware of when it should invoke Google Voice Search by itself. Moreover, we also developed a face recognition web service implementing the Eigenfaces algorithm to identify the user before invoking the Google voice RecognizerIntent.

Experiments were conducted to test the accuracy of our algorithm for detecting a person's moving direction in real time. Moreover, we evaluated the proposed method of invoking the Google voice recognition through face detection. The experimental results show that our proposed system is well accepted.

Improving the person detection method is important future work. From Table 1, we can see that the rate of successful one-step-forward detection is only 72.5%. The reason is that, although HOG has proven to be a successful method for detecting a person, it still has problems when implemented in our system: in particular, the detected vertical range of a person's body fluctuates considerably.

We also intend to improve the robot car. First of all, we will add more sensors to the robot car to avoid obstacles. Secondly, the robot body will be made more human-like. Ideally, the robot will be able to move upstairs and downstairs and be used in complicated environments.

6. REFERENCES

[1] D. Cook and S. Das, Smart Environments: Technology, Protocols, and Applications, Wiley Inter-Science, 2004.


[2] C. Kidd, R. Orr, G. Abowd, C. Atkeson, I. Essa, B. MacIntyre, E. Mynatt, T. Starner, and W. Newstetter, "The Aware Home: A living laboratory for ubiquitous computing research," Cooperative Buildings: Integrating Information, Organizations, and Architecture, pp. 191–198, 1999.

[3] T. Yamazaki, "Ubiquitous home: Real-life testbed for home context-aware service," in Testbeds and Research Infrastructures for the Development of Networks and Communities, 2005 (Tridentcom 2005), First International Conference on. IEEE, 2005, pp. 54–59.

[4] C. Giovanni, D. Berardina, and P. Sebastiano, "Social robots as mediators between users and smart environments," in Proceedings of the 12th International Conference on Intelligent User Interfaces (IUI '07), New York, NY, USA, 2007, pp. 353–356, ACM.

[5] S. Baeg, J. Park, J. Koh, K. Park, and M.-H. Baeg, "Building a smart home environment for service robots based on RFID and sensor networks," in Control, Automation and Systems, 2007 (ICCAS '07), International Conference on, Oct. 2007, pp. 1078–1082.

[6] M. Aiello, “The role of web services at home,” 2005.

[7] J. Parra, M. A. Hossain, A. Uribarren, E. Jacob, and A. El Saddik, "Flexible smart home architecture using device profile for web services: A peer-to-peer approach," International Journal of Smart Home, vol. 3, no. 2, pp. 39–56, 2009.

[8] S. S. Intille, "Designing a home of the future," IEEE Pervasive Computing, vol. 1, no. 2, pp. 76–82, April–June 2002.

[9] M. Al-Qutayri, H. Barada, S. Al-Mehairi, and J. Nuaimi,“A framework for an end-to-end secure wireless smarthome system,” in Systems Conference, 2008 2nd AnnualIEEE. IEEE, 2008, pp. 1–7.

[10] K. Werner, J. Oberzaucher, and F. Werner, "Evaluation of human robot interaction factors of a socially assistive robot together with older people," in Complex, Intelligent and Software Intensive Systems (CISIS), 2012 Sixth International Conference on. IEEE, 2012, pp. 455–460.

[11] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Computer Vision and Pattern Recognition, 2001 (CVPR 2001), Proceedings of the 2001 IEEE Computer Society Conference on, 2001, vol. 1, pp. I-511–I-518.

[12] M. A. Turk and A. P. Pentland, "Face recognition using eigenfaces," in Computer Vision and Pattern Recognition, 1991 (CVPR '91), Proceedings of the IEEE Computer Society Conference on. IEEE, 1991, pp. 586–591.

[13] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Computer Vision and Pattern Recognition, 2005 (CVPR 2005), IEEE Computer Society Conference on, June 2005, vol. 1, pp. 886–893.

[14] J. Judge, R. B. Davis, and S. Ounpuu, "Step length reductions in advanced age: The role of ankle and hip kinetics," The Journals of Gerontology Series A: Biological Sciences and Medical Sciences, vol. 51, no. 6, pp. M303, 1996.

[15] "STM32F107," http://www.embedinfo.com/en/list.asp?id=57, accessed February 24, 2013.