A Smart Home Gateway Platform for Data
Collection and Awareness
Pan Wang, Feng Ye∗, Member, IEEE, Xuejiao Chen
Abstract
Smart homes have attracted much attention due to the expansion of the Internet of Things (IoT) and
smart devices. In this paper, we propose a smart gateway platform for data collection and awareness
in smart home networks. A smart gateway will replace the traditional network gateway to connect the
home network and the Internet. A smart home network supports different types of smart devices, such as
in-home IoT devices, smart phones, smart electric appliances, etc. A traditional network gateway is not
capable of providing quality-of-service measurement, user behavioral analytics, or network optimization.
Such tasks are traditionally performed with measurement agents such as optical splitters or network
probes deployed in the core network. Our proposed platform is a lightweight plug-in for the smart
gateway to accomplish data collection, awareness and reporting. While the smart gateway is able to adjust
the control policy for data collection and awareness locally, a cloud-based controller is also included for
more refined control policy updates. Furthermore, we propose a multi-dimensional awareness framework
to achieve accurate data awareness at the smart gateway. The efficiency of data collection and accuracy
of data awareness of the proposed platform are demonstrated through tests using actual data traffic
from a large number of smart home users.
Index Terms
Smart Home; Smart Gateway; Data Collection; Data Awareness; IoT.
Pan Wang is with the Department of Modern Posts, Nanjing University of Posts & Telecommunications, Nanjing, China. E-mail:
Feng Ye is with the Department of Electrical and Computer Engineering, University of Dayton, Dayton, OH, USA. E-mail:
Xuejiao Chen is with the Department of Communication, Nanjing College of Information Technology, Nanjing, China. E-mail:
April 5, 2018 DRAFT
arXiv:1804.01242v1 [cs.NI] 4 Apr 2018
I. INTRODUCTION
A smart home is a cyber-physical system built on the Internet of Things (IoT), computers, and
smart electric appliances, with human interactions through in-home communication networks
and the Internet [1]. As a data concentrator and gateway to the Internet, a smart home gateway
monitors smart home devices that control the home environment and serve home users.
Traditionally, a network gateway, e.g., a modem/router, bridges the Internet connection with
the in-home local area network. The gateway is also the network manager for most in-home
network devices, such as smart phones, smart electric appliances, TV boxes, etc. In a smart
home setting, the traditional network gateway would struggle to provide user-oriented network
management. Therefore, in this paper, we propose a smart gateway platform that performs data
collection and awareness so that the network service provider can optimize network resources based on user
quality-of-experience (QoE) in smart homes.
In-home IoT devices, such as smart lockers, visitor video recorders, remote controllers, etc.
are connected through various communication technologies to provide different types of smart
applications, including environment monitoring, security, home automation, user entertainment,
etc. [2]. The overall network management of those smart home applications is conducted at the
smart gateway. The smart gateway also interacts with external systems such as cloud services
and Internet services. In order to provide network services with good user QoE, a large amount
of data must be collected by the smart gateway for analysis, e.g., in a cloud. The data analysis
would be focused on the network quality-of-service (QoS) measurement data, security and smart
home user behavior [3–5].
In a traditional setting, a network service provider would deploy dedicated measurement
agents, such as optical splitters, network test access points, etc., in the core network. Measurements
would be taken passively to collect data [6]. However, the traditional setting has several
drawbacks when applied to smart homes. First, the traditional setting has high complexity and
high cost of deployment. Second, it is challenging to keep up with hardware upgrades to meet the
growing demands of smart homes [7]. In this paper, we propose a smart gateway platform for
data collection and awareness that can be deployed at each smart home. The proposed platform
is a simple software plug-in embedded in a smart gateway. Once data is collected,
data awareness can also be performed at the smart gateway based on control policies that are
assigned from a cloud controller. We propose a multi-dimensional awareness (MDA) framework
to set control policies for data awareness. Thus data can be accurately classified depending on
multiple factors, e.g., application, location, device, etc. While the cloud controller is capable of
overwriting policies of a smart gateway, the smart gateway is allowed to adjust policies based on
its data collection and processing results for accurate data awareness. The proposed framework
is tested with a data set collected over 90 days. The results demonstrate the efficiency of the
proposed smart gateway platform and the accuracy of the proposed MDA scheme.
The remainder of the paper is organized as follows. The proposed smart gateway data collection
and awareness framework for smart homes is presented in Section II. The MDA scheme is
presented in Section III. Deployment and operation of the proposed data collection and awareness
schemes are presented in Section IV. Evaluation and experimental results are presented in
Section V to demonstrate our proposed method. Finally, the conclusion is drawn in Section VI.
II. A SMART GATEWAY DATA COLLECTION AND AWARENESS FRAMEWORK
A. Overview of the Proposed Framework
The proposed smart gateway data collection and awareness framework for smart homes is
shown in Fig. 1. The framework consists of three layers: smart home infrastructure layer, smart
gateway layer, and smart home cloud layer.
The smart home infrastructure layer consists of smart devices in a smart home, such as
smart appliances, computers, in-home IoT devices, etc. Smart devices require access to the
external network, i.e., the Internet, through the smart gateway.
The smart gateway layer consists of the smart gateway, which hosts the Home Gateway
Unit (HGU) [8]. The HGU performs the core functions of data collection and awareness at the
smart gateway. Specifically, a simple software plug-in is implemented at the level of the
operating system (OS).
The cloud layer is provided by network service providers and smart home service providers
for three functions. First, the cloud stores data reported by smart homes in the format of Smart
Home Detail Record (SHDR). Second, the cloud receives the status of each HGU through the
HGU Management System (HMS). Third, data collection and awareness policies are adjusted
and sent by the cloud [9]. As smart homes are numerous and widely distributed, they often
require hierarchical and sub-regional smart home clouds.
Fig. 1: Smart gateway data collection and awareness framework for smart home networks.
B. The Software Architecture of HGU
The proposed software architecture of HGU is shown in Fig. 2. Since the HGU physically
resides in the smart gateway, it provides networking functions for various smart devices
in the home network. It also connects to the external network, i.e., the Internet. The software
architecture of HGU includes the following parts: HGU OS, basic service platform, traffic
collection plug-in, MDA plug-in, and data report interface.
The HGU OS provides basic functions of the gateway, including packet forwarding, addressing,
QoS and security. For example, OpenWrt is a popular OS among smart gateway
developers.
OSGi is chosen as the open platform. The OSGi specifications describe a modular
system and a service platform for the Java programming language [10]. With the OSGi open platform,
Fig. 2: The software architecture of HGU.
applications or components of a smart gateway can be remotely installed, started, stopped,
updated, and deleted without interrupting the on-going operation of the system.
The traffic collection plug-in located in the kernel is responsible for extracting IP packets from
the network card driver. Packets can be captured by mounting different hook functions based
on the NetFilter framework. However, gateway manufacturers have started to add network hardware
acceleration functions to improve the efficiency of packet processing. Such acceleration prevents
the OS kernel from receiving packets. Fortunately, the problem can be bypassed if gateway
manufacturers are willing to open the related interfaces. Once the packets are collected, useful
information will be extracted and sent to the MDA plug-in.
The MDA plug-in performs data awareness according to multiple factors, such as types
of services, devices, applications, QoS, etc. The results of data awareness are formatted into
SHDR and saved as a file to be submitted to the data report interface.
The data report interface consists of two types of services: non real-time reporting and real-
time reporting. The non real-time reporting is for offline analysis of a single class of data [11].
The real-time reporting is for delay-sensitive data, e.g., alarms, notifications, real-time feedback
and control, etc. The SHDR format includes the records of user behavior on the Internet, QoS
measurement data, etc. The format can be adjusted according to the configuration policy
issued by the HMS in the cloud in order to meet different data acquisition requirements
in smart homes. The final data upload can use HTTP POST or HTTP PUT.
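As a rough illustration of the non real-time reporting path, the following Python sketch serializes records and builds (but does not send) an HTTP POST request. The field names, the one-JSON-object-per-line encoding, and the endpoint URL are assumptions made for illustration only; the SHDR layout is not specified at this level of detail.

```python
import json
import urllib.request
from dataclasses import dataclass, asdict


@dataclass
class SHDR:
    """Hypothetical Smart Home Detail Record; field names are illustrative."""
    app_type: str
    src_mac: str
    dst_ip: str
    dst_port: int
    packet_length: int
    timestamp: float
    host: str
    url: str
    user_agent: str


def encode_records(records):
    """Serialize records as one JSON object per line (an assumed file format)."""
    lines = (json.dumps(asdict(r), sort_keys=True) for r in records)
    return "\n".join(lines).encode("utf-8")


def build_upload_request(records, endpoint="http://cloud.example.invalid/shdr"):
    """Build an HTTP POST request for a batch of records without sending it."""
    return urllib.request.Request(
        endpoint,
        data=encode_records(records),
        method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )


# Placeholder values; none of them come from the evaluated data set.
record = SHDR("http_video", "02:00:00:aa:bb:cc", "203.0.113.7", 80,
              512, 1522886400.0, "video.example.invalid", "/clip.mp4",
              "ExampleAgent/1.0")
request = build_upload_request([record])
print(request.get_method(), request.full_url, len(request.data))
```

Building the request separately from sending it keeps the record format independent of the transport, so the same encoded batch could be written to a file first and uploaded later.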
III. MULTI-DIMENSIONAL AWARENESS FOR IN-HOME DATA
Accurate data awareness in smart homes can help network service providers to allocate network
resources adaptively. Therefore, it will help to improve network reliability and security, to provide
real-time protection and to enhance active service capabilities for smart home applications.
Accurate data awareness can also enhance QoE of smart home users. By understanding users’
profiles, such as application preference, active time and locations, types of smart devices, etc.,
the network service provider can discover service areas, top services and types of devices in
smart homes so as to effectively improve user QoE in smart homes. In this section, we propose
a data awareness scheme based on multi-dimensional factors, as illustrated in Fig. 3. The dimensions
include services-oriented awareness, application-oriented awareness, location awareness,
QoS awareness, devices-oriented awareness, and subscriber-oriented awareness. Note that the
framework has a modular design, which can be easily updated in the future.
Fig. 3: Multi-dimensional awareness for in-home data.
Service awareness and application awareness are the dimensions based on different types
of services and applications. Service awareness is coarser-grained than
application awareness. For example, types of service include HTTP, P2P, etc. HTTP services
can be further divided into different applications such as web browsing, HTTP video streaming,
web gaming, etc. In recent years, even finer definitions of applications, i.e., actions of each
application, have attracted the attention of researchers. For example, an E-commerce web browsing
application can be further divided into actions such as general browsing, searching, checking out,
etc. Traffic identification methods are usually applied to realize service awareness and application
awareness. Examples include deep packet inspection (DPI), port matching, connection pattern
recognition, and statistical traffic feature recognition. However, port matching and protocol analysis have
become ineffective due to the growing number of proprietary and customized service protocols. DPI technology is
still effective in this case. Nonetheless, accurate data awareness using DPI requires constant
updates of the database of application signatures. Moreover, DPI technology cannot be applied
to encrypted services, e.g., HTTPS [12].
Recently, more research efforts have been made in traffic modeling with machine learning
methods to extract connection patterns. For example, learning methods such as HMM [13],
Naive Bayesian models [14], AdaBoost and maximum entropy methods [15] have been applied
for identifying the types of services and applications. In smart home networks, traditional Internet
services and home entertainment applications can be identified by port matching, protocol
analysis, DPI or a combination of these methods. For example, online multi-media streaming
services often use the HTTP protocol for transmission. Moreover, DPI can be applied to
identify file name extensions. Specifically, the file name extensions of multi-media data are often
distinguishable, e.g., mov, asf, 3gp, swf, etc. Therefore, online multi-media streaming services
can be easily identified by using HTTP protocol analysis combined with DPI methods. However,
home automation applications such as smart smoke detection and smart light control often use
proprietary protocols to interact with the server for security reasons. Therefore, they cannot
be identified by port matching or protocol analysis. Nonetheless, the cloud servers for such
applications are often few in number and the target IP addresses are also fixed in most cases. As a
result, such applications can be recognized by extracting the target IP addresses. For some
specific application actions that are encrypted for security reasons, such as the smoke
alarm of a smart smoke detector, the traditional identification methods are ineffective. In this
case, we should consider session parameters, such as the number of packets, packet lengths,
duration of each session, inter-arrival time of incoming packets, etc. We can use machine
learning methods, e.g., decision trees, to model these factors and identify such applications. For
example, a smart smoke alarm packet usually has a fixed length, i.e., 96 bytes.
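The session-parameter approach can be sketched as follows. This is a minimal Python illustration, not the trained model: the `Packet` type, the feature names, and the hand-written rules in `classify` (including the `max_packets` threshold) are assumptions that stand in for a decision tree learned from labeled sessions. Only the 96-byte alarm length comes from the text.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Packet:
    timestamp: float  # arrival time in seconds
    length: int       # packet length in bytes


def session_features(packets):
    """Compute session parameters for one non-empty session."""
    times = [p.timestamp for p in packets]
    lengths = [p.length for p in packets]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "num_packets": len(packets),
        "mean_length": mean(lengths),
        "distinct_lengths": len(set(lengths)),
        "duration": times[-1] - times[0],
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
    }


def classify(features, alarm_length=96, max_packets=10):
    """Hand-written rules standing in for a trained decision tree."""
    fixed_length = features["distinct_lengths"] == 1
    if fixed_length and features["mean_length"] == alarm_length:
        if features["num_packets"] <= max_packets:
            return "smoke_alarm"
    return "unknown"


alarm_session = [Packet(0.0, 96), Packet(0.5, 96), Packet(1.0, 96)]
other_session = [Packet(0.0, 1500), Packet(0.1, 60), Packet(0.2, 1500)]
print(classify(session_features(alarm_session)))
print(classify(session_features(other_session)))
```

Because only packet counts, lengths and timing are used, the same features remain available when the payload is encrypted.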
QoS awareness is the dimension based on network parameters, such as bandwidth, delay and
concurrent connections. Such measurements are sent to the cloud server for further evaluation.
The network service provider and the smart gateway will optimize the network management
accordingly. Traditionally, QoS measurements are conducted by implementing passive or active
probes at core network links. However, the accuracy is not guaranteed in smart homes. In the
proposed MDA scheme, different QoS parameters are formulated for different applications at the
service level. For example, to measure bandwidth, we can accumulate the
packet lengths observed on the wide-area network interface of a smart gateway. Bandwidth
statistics can then be broken down by types of services, applications and devices. As for delay, we can
first record the interval between the first and last packets of an interactive session in the smart gateway.
Then, we can obtain the delay of services or applications based on sampling measurement data.
Those methods are mostly passive measurements. In comparison, active measurement is done by
sending controlled testing packets to destination servers. Both passive and active measurements
are implemented as software plug-ins embedded in smart gateways. With QoS awareness,
the network service provider can locate QoS bottlenecks more accurately and quickly, and thus
improve network operation and maintenance.
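The two passive measurements described above can be sketched in a few lines of Python. The function names, the `(key, length)` input shape, and the fixed measurement window are assumptions for illustration; the plug-in's actual data structures are not described in the text.

```python
from collections import defaultdict


def bandwidth_by_key(packets, window_seconds):
    """Average throughput in bits per second per key over one window.

    `packets` is an iterable of (key, length_in_bytes) pairs observed on the
    WAN interface during the window. The key can be a service, an
    application or a device, depending on the awareness dimension.
    """
    total_bytes = defaultdict(int)
    for key, length in packets:
        total_bytes[key] += length
    return {key: count * 8 / window_seconds for key, count in total_bytes.items()}


def session_interval(timestamps):
    """Interval between the first and last packet of one session, in seconds."""
    return max(timestamps) - min(timestamps)


observed = [("video", 1000), ("video", 500), ("web", 250)]
print(bandwidth_by_key(observed, window_seconds=10))
print(session_interval([1.0, 1.25, 3.5]))
```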
Device awareness is the dimension based on the types of devices and operating systems of
devices. Device awareness is mainly conducted through passive measurement methods such as
DPI, identification of MAC addresses, identification of user agents, etc. For example, the user
agent in an HTTP header has a distinguishable pattern, e.g., AppleWebKit/534.30 (KHTML,
like Gecko) Version/4.0 Mobile Safari/534.30, which is the description of the web browser
of the smart devices. With this information, we can identify different types of smart devices,
especially home entertainment equipment such as TV boxes, gaming consoles, etc. As for home
automation appliances such as smart fire detectors and smart thermometers, we can easily identify
them according to their hardware addresses (e.g., MAC addresses) which are already recorded
by the smart gateway during network initialization in smart homes.
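A minimal sketch of device awareness, assuming the gateway keeps a table of known hardware-address prefixes and a short list of User-Agent patterns. Both tables below are hypothetical placeholders rather than real signature sets, and the `02:00:00` prefix is a made-up locally administered value.

```python
import re

# Hypothetical prefix table; real entries would come from the MAC addresses
# recorded by the gateway during network initialization.
OUI_TABLE = {"02:00:00": "smart smoke detector"}

# Ordered, illustrative User-Agent patterns; not an exhaustive signature set.
UA_PATTERNS = [
    (re.compile(r"iPad"), "PAD"),
    (re.compile(r"iPhone|Android.*Mobile"), "smart phone"),
    (re.compile(r"SmartTV|TV ?Box", re.IGNORECASE), "TV box"),
]


def device_type(mac, user_agent=None):
    """Classify a device by MAC prefix first, then by User-Agent pattern."""
    prefix = mac.lower()[:8]
    if prefix in OUI_TABLE:
        return OUI_TABLE[prefix]
    if user_agent:
        for pattern, label in UA_PATTERNS:
            if pattern.search(user_agent):
                return label
    return "unknown"


agent = ("Mozilla/5.0 (Linux; U; Android 4.0.3; en-us) AppleWebKit/534.30 "
         "(KHTML, like Gecko) Version/4.0 Mobile Safari/534.30")
print(device_type("aa:bb:cc:00:00:01", agent))
print(device_type("02:00:00:12:34:56"))
```

Checking the hardware address first reflects the observation that home automation appliances typically send no HTTP user agent at all.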
Service provider awareness is the dimension based on the identification of service providers,
such as Facebook, Twitter, YouTube, etc. Service provider awareness can be realized by extracting
the service IP addresses, service Uniform Resource Locator (URL), etc.
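For the URL-based case, one simple approach is to match the host name of a request against known domain suffixes. The sketch below assumes a small hand-maintained mapping; the table and the function name are illustrative only.

```python
# Hypothetical mapping from host-name suffixes to service providers.
PROVIDER_SUFFIXES = {
    "facebook.com": "Facebook",
    "twitter.com": "Twitter",
    "youtube.com": "YouTube",
}


def provider_for_host(host):
    """Return the provider whose domain matches the host, or None."""
    host = host.lower().rstrip(".")
    for suffix, provider in PROVIDER_SUFFIXES.items():
        # Match the domain itself or any subdomain, but not look-alikes
        # such as "notyoutube.com".
        if host == suffix or host.endswith("." + suffix):
            return provider
    return None


print(provider_for_host("www.youtube.com"))
print(provider_for_host("notyoutube.com"))
```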
Location awareness is the dimension based on the identification of the physical location
of a user to obtain congestion points (such as APs and base stations) that may have access
bottlenecks in the network, so as to provide the data basis for network capacity planning. The technical
attributes of the location dimension include the physical location, and access topology of a user.
The awareness of the location dimension is mainly conducted by identifying IP addresses, port
numbers, identification of the Dynamic Host Configuration Protocol (DHCP) server, etc.
Subscriber awareness is the dimension based on the types of subscribers. The awareness can
be achieved by identifying user accounts, DHCP address segments, IP addresses, port numbers,
etc.
IV. THE OPERATION OF DATA COLLECTION AND AWARENESS
Data collection and awareness are operated with the permission of users. Users who do not
participate in this program will be provided with traditional network services. With permission,
the HGU based data collection and awareness will perform extensive data processing and analysis
on the cloud to identify the bottlenecks of network performance and adjust network services to
enhance user QoE. The process of data collection and awareness is shown in Fig. 4.
Fig. 4: Traffic data collection and processing.
Since most applications in smart homes use HTTP as the interactive protocol, the behavior
information of users is mainly obtained from HTTP request messages. Two types of HTTP request
message are considered. One is the HTTP GET message, which often contains detailed requests from home
users for cloud services or applications, such as requesting the URL of a website
or a video clip of a streaming application. The other type is the HTTP POST message, which
contains User Generated Content (UGC), such as website comments, microblog posts, etc. Because
the content of HTTP POST messages often contains private user information, network operators do not collect such
information in most cases. Therefore, our focus is on the collection and processing of
HTTP GET messages. As mentioned earlier, the message information is stored in SHDR format.
This record usually contains service or application type, source/destination MAC addresses,
source/destination IP addresses, source/destination ports, packet length, arrival timestamp of a
packet, HTTP GET header (i.e., URL/Host Name/User Agent/Referer), etc. The gateway plug-in
will first write SHDRs to a file and then periodically upload it to the cloud, or upload the records in real
time through a Kafka interface.
The plug-in for non real-time traffic awareness decodes HTTP GET messages according to
the HTTP protocol specifications after receiving raw packets from the traffic collection
plug-in, which is based on NetFilter in the OS kernel. It then extracts the information from key
fields of the HTTP GET request header. Because of redundant requests (e.g., for JS scripts, CSS
style sheets, pictures, advertisement links, etc.) among the HTTP GET messages, there would be high
storage and computing overhead in the cloud if all data were collected. Therefore, the collected data is
cleansed first according to the policy subscribed from the cloud controller. The filtered data will
then be reported through the data report interface. The cleansing process ensures the efficiency
and accuracy of data mining in the cloud.
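The decode, extract and cleanse steps above can be sketched as follows, assuming the request arrives as raw bytes and the cleansing policy is a list of path suffixes to drop. The suffix list, the set of key fields, and the function names are illustrative assumptions, not the policy issued by the cloud controller.

```python
from urllib.parse import urlsplit

# Assumed cleansing policy: path suffixes treated as redundant page resources.
FILTERED_SUFFIXES = (".js", ".css", ".png", ".jpg", ".gif", ".ico")
KEY_FIELDS = ("host", "user-agent", "referer")


def parse_get(raw):
    """Extract key fields from a raw HTTP GET request, or None if not a GET."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or parts[0] != "GET":
        return None
    record = {"url": parts[1]}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() in KEY_FIELDS:
            record[name.strip().lower()] = value.strip()
    return record


def keep(record):
    """Apply the cleansing policy: drop requests for redundant resources."""
    path = urlsplit(record["url"]).path.lower()
    return not path.endswith(FILTERED_SUFFIXES)


video = (b"GET /video/clip.mp4?id=1 HTTP/1.1\r\n"
         b"Host: example.invalid\r\nUser-Agent: X/1.0\r\n\r\n")
script = b"GET /static/app.js HTTP/1.1\r\nHost: example.invalid\r\n\r\n"
for raw in (video, script):
    parsed = parse_get(raw)
    print(parsed["url"], keep(parsed))
```

Filtering on the URL path rather than the full URL keeps query strings such as `?id=1` from hiding the file extension.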
The plug-in for real-time traffic awareness operates in a different way. Once the plug-in
completes the decoding of the HTTP GET message and the extraction of the HTTP GET message
information, it reports to the cloud in real time. This information usually triggers some QoS
related real-time control in smart homes. For example, smart smoke alert information should be
reported in real time to request a higher level of QoS for emergencies. All of the plug-ins update
their policies with policy configuration scripts, which run on a Lua-based scripting engine.
The high-concurrency data collectors are implemented based on the Flume-ng architecture or
Nginx and Node.js. Data access can be based on Hadoop HBase. In addition, non real-time data
can be analyzed using Hive and real-time data is analyzed and mined using Storm.
V. EVALUATION AND EXPERIMENT RESULTS
A. Settings and Data Sets for Evaluation and Experiments
In this section, we demonstrate the proposed smart gateway platform and the MDA scheme
using data collected from 7195 smart home users over 90 days. The volume of data is roughly 10
GBytes per day. All plug-ins are distributed and installed from the cloud to each smart gateway.
The configuration of the tested smart gateway is as follows: 2000 DMIPS, clocked at 600 MHz,
512 MB RAM, 256 MB Flash memory. The plug-in itself is 2.7 MB in size.
B. Data Analysis of Smart Home
We first demonstrate the MDA scheme for data awareness. As shown in Fig. 5, the proposed
scheme is able to identify data traffic in multiple dimensions. Due to limited space in this paper,
only device, location and QoS awareness are illustrated in the figure.
[Fig. 5 panels: "The Ratio of Devices Type in Smart Home" (legend: smart phone, PAD, TV box, smart smoke detector, MP3 player, mobile handheld, student tablet, reader, digital camera; shares as extracted: 92.1742%, 3.5273%, 2.4770%, 1.7700%, 0.0350%, 0.0078%, 0.0041%, 0.0041%, 0.0005%); "Number of Smart Phones" by brand (Apple, Xiaomi, Samsung, HUAWEI, Coolpad, Vivo, Meizu, OPPO, Lenovo); "QoS Analysis of HTTP Video" showing successful connection rate (%) and handshake delay (ms) by location of smart home.]
Fig. 5: Results of data awareness in multiple dimensions.
Specifically, coarse-grained types of devices, such as smart phone and PAD, can be identified.
It can be seen that the proportion of smart phones is much larger than that of other devices. Fine-grained
types of smart devices, such as different brands of smart phone, can also be identified
with the proposed MDA scheme. Besides smart phones, we were able to identify the brand
and model of TV boxes by extracting information from user agents. The proposed platform
and MDA scheme are also successful in detecting smart smoke detectors by checking the
MAC addresses and HTTP URL records that include the same destination host name of service
providers. The QoS awareness results are generated mostly based on HTTP video (mostly from
Chinese websites) in smart homes. The QoS results can be clearly captured by checking the
rates of successful connection and delay of handshakes. Network service providers can use such
results to improve network management and enhance user QoE in smart homes.
C. Performance of the Plug-in
In this subsection, we evaluate the performance of the plug-in that is installed in each gateway.
The test is conducted by analyzing the usage of the central processing unit (CPU) and memory
of the smart gateway with and without an active plug-in. In particular, we created a script file on
a laptop to simulate 10,000 HTTP GET requests per second to the cloud, which is a demanding
case for a smart home network. The usage of the CPU and memory is recorded once every 6
seconds.
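The text does not state how the usage samples were taken. As one possible approach, assuming a Linux-based gateway that exposes `/proc/stat`, CPU usage over a sampling interval can be derived from two readings of the aggregate `cpu` line. The function names below are illustrative.

```python
def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (Linux only; it is the first line)."""
    with open(path) as stat_file:
        return stat_file.readline()


def parse_cpu_line(line):
    """Return (idle, total) jiffies from an aggregate 'cpu' line."""
    fields = [int(value) for value in line.split()[1:]]
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
    total = sum(fields[:8])  # guest time is already counted in user/nice
    return idle, total


def cpu_usage_percent(before, after):
    """CPU usage in percent between two samples of the 'cpu' line."""
    idle_before, total_before = parse_cpu_line(before)
    idle_after, total_after = parse_cpu_line(after)
    delta_total = total_after - total_before
    if delta_total <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle_after - idle_before) / delta_total)


# Two made-up samples standing in for readings taken 6 seconds apart.
first = "cpu 100 0 100 700 100 0 0 0 0 0"
second = "cpu 225 0 225 950 100 0 0 0 0 0"
print(cpu_usage_percent(first, second))
```

On a real gateway, `read_cpu_line()` would be called once per sampling period and consecutive readings passed to `cpu_usage_percent`.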
[Fig. 6 plot: CPU usage (%) versus duration (seconds), 0 to 300 s, for two series: with plug-in and no plug-in.]
Fig. 6: CPU usage of the implemented plug-in.
As shown in Fig. 6, CPU usage barely increases with an active plug-in. The periodic
pattern of CPU usage is due to heartbeat messages from the cloud to keep active connections.
The sudden drops are due to periodic sleeps of the CPU for power efficiency. In addition to
CPU usage, the memory consumption is around 5 MBytes. Since the plug-in is pre-allocated a
memory buffer, it does not need to request extra heap space. Therefore, the plug-in adds
little performance burden to the smart gateway.
VI. CONCLUSION
In this paper, a smart-gateway-based data collection and awareness plug-in framework is
proposed. By embedding a software plug-in into the smart gateway, data collection, awareness and
reporting can be achieved. Moreover, the cloud controller can easily dispatch control policies
and assign specific jobs to each smart gateway. Furthermore, we defined the MDA framework
to describe the data collected by the smart gateway. The evaluation and experiment with actual
smart home data demonstrated that the proposed platform and MDA scheme can efficiently
collect data and accurately provide data awareness. The performance evaluation demonstrated
that the implemented plug-in is frugal in its use of computing power. In future work, we will focus
on the improvement of user QoE based on the management and control of smart gateways in
the cloud.
REFERENCES
[1] S. Suresh and P. V. Sruthi, “A review on smart home technology,” in 2015 Online International Conference on Green
Engineering and Technologies (IC-GET), Nov 2015, pp. 1–3.
[2] S. Guoqiang, C. Yanming, Z. Chao, and Z. Yanxu, “Design and implementation of a smart IoT gateway,” in 2013 IEEE
International Conference on Green Computing and Communications and IEEE Internet of Things and IEEE Cyber, Physical
and Social Computing, Aug 2013, pp. 720–723.
[3] P. Wang, S. Y. Zhang, and X. J. Chen, “Sfarima: A new network traffic prediction algorithm,” in 2009 First International
Conference on Information Science and Engineering, Dec 2009, pp. 1859–1863.
[4] F. K. Santoso and N. C. H. Vun, “Securing IoT for smart home system,” in 2015 International Symposium on Consumer
Electronics (ISCE), June 2015, pp. 1–2.
[5] D. Vavilov, A. Melezhik, and I. Platonov, “Reference model for smart home user behavior analysis software module,” in
2014 IEEE Fourth International Conference on Consumer Electronics Berlin (ICCE-Berlin), Sept 2014, pp. 3–6.
[6] P. Wang and X. Chen, “Co hijacking monitor: Collaborative detecting and locating mechanism for HTTP spectral hijacking,”
in The 2017 IEEE Cyber Science and Technology Congress (CyberSciTech 2017), Oct 2017.
[7] P. Wang, “Big data plug-in technology for smart router based on multidimensional awareness,” Journal of Nanjing University
of Posts and Telecommunications(Natural Science), pp. 18–21, 2016.
[8] P. Wang, S. Liu, F. Ye, and X. Chen, “A fog-based architecture and programming model for IoT applications in the smart
grid,” Under submission, 2017.
[9] P. Wang, S. Zhang, and X. Chen, “A novel reputation reporting mechanism based on cloud model and gray system theory,”
vol. 3, pp. 75–84, 11 2011.
[10] D. Manzaroli, L. Roffia, T. S. Cinotti, E. Ovaska, P. Azzoni, V. Nannini, and S. Mattarozzi, “Smart-M3 and OSGi: The
interoperability platform,” in The IEEE symposium on Computers and Communications, June 2010, pp. 1053–1058.
[11] G. Huaiyu, X. Jinsong, and P. Wang, “An associated perception import method for big data,” in Telecommunications
Science, vol. 32, 2016, pp. 130–134.
[12] X. Chen, P. Wang, and S. Liu, “Key technology of SSL encrypted application identification under imbalance of application
class,” in Telecommunications Science, vol. 31, 2015, pp. 83–89.
[13] C. Wright, F. Monrose, and G. M. Masson, “HMM profiles for network traffic classification,” in Proceedings of the 2004
ACM Workshop on Visualization and Data Mining for Computer Security, ser. VizSEC/DMSEC ’04. New York, NY,
USA: ACM, 2004, pp. 9–15. [Online]. Available: http://doi.acm.org/10.1145/1029208.1029211
[14] A. W. Moore and D. Zuev, “Internet traffic classification using Bayesian analysis techniques,” SIGMETRICS Perform.
Eval. Rev., vol. 33, no. 1, pp. 50–60, Jun. 2005. [Online]. Available: http://doi.acm.org/10.1145/1071690.1064220
[15] N. Williams, S. Zander, and G. Armitage, “A preliminary performance comparison of five machine learning algorithms
for practical IP traffic flow classification,” SIGCOMM Comput. Commun. Rev., vol. 36, no. 5, pp. 5–16, Oct. 2006.
[Online]. Available: http://doi.acm.org/10.1145/1163593.1163596