
Proceedings of the 12th INDIACom; INDIACom-2018; IEEE Conference ID: 42835

2018 5th International Conference on “Computing for Sustainable Global Development”, 14th-16th March, 2018

Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA)

A Virtual Sensor Network Framework for Vehicle Quality Evaluation

Mohammad Alwadi

Faculty of ESTeM,

University of Canberra,

Canberra, AUSTRALIA

Girija Chetty

Faculty of ESTeM

University of Canberra

AUSTRALIA

Mohammad Yamin

Faculty of Economics and Admin

King Abdulaziz University

SAUDI ARABIA

Abstract—In this paper we propose a novel virtual sensor network framework for assessing the value and quality of a vehicle using a data driven approach based on machine learning and data science techniques. An evaluation of the proposed approach on two publicly available datasets showed that it can automatically predict vehicle quality from different characteristics of the vehicle captured at different levels of the vehicle supply chain, from manufacturing to the end user.

Keywords—Machine learning; Vehicle quality; Risk assessment

I. INTRODUCTION

Artificial Intelligence (AI) and machine learning systems have become a focus of intensive research, especially in the area of cyber-physical systems: monitoring large physical environments and tracking activity within homes, cars, and other indoor and outdoor spaces. Recently, the vehicle and automotive industry has undergone a tectonic shift, moving away from traditional methods of selling cars towards data driven solutions based on machine learning, big data and artificial intelligence. As car and ride sharing gain in popularity, assessing vehicle quality, safety and risk from historical data with data driven models is becoming increasingly important. Demand has also grown for several new services, such as remote diagnostics, user behavior analysis and automatic owner identification. As a result, the car is being transformed into a connected gadget on wheels: a browser for the real world, and a powerful sensor or tracking device for everyone. The traditional players in the vehicle industry, the manufacturers and dealerships, therefore need to share the market opportunity with the IT industry. So how can the latest trends in IT (artificial intelligence, machine learning and big data analytics) help in the search for new data driven models in a changing car market?

Traditionally, vehicles, particularly cars, have been purely mechanical transportation solutions and were not designed to provide any digital services. Over the years, car manufacturers invested in improving the quality of their engines and chassis, with a focus on safety and productivity. However, the growth of internet technologies and the advances achieved in AI, machine learning and big data analytics are now fueling a transformation of the auto industry as well. With these technologies, car manufacturers have been able to improve the driver experience, meet customer expectations, and enhance safety, both by ensuring that the vehicle is built safely enough for the road and by providing decision support to the driver, such as emergency road assistance during accidents and breakdowns. The same technologies allow remote monitoring of the vehicle for better driver support and navigation, as well as tracking and surveillance by the control centers of law enforcement agencies. This tremendous improvement in vehicle capabilities has been made possible by advances in hardware and software technologies, and by novel data driven approaches that learn from previous mistakes recorded in historical data, helping users and customers to choose a safe and acceptable car.

According to automotive industry forecasts [1], by 2020 there will be roughly 152 million connected cars on the roads, each generating up to 30 terabytes of data daily. Machine learning and data mining algorithms can exploit this massive volume of data to improve vehicle and driver safety, preserve customer privacy, and identify driver behavior patterns, allowing manufacturers to offer services that meet specific customer needs over the long term, well after the vehicle has been purchased. For example, by processing data from the car's management block, it is possible to identify when a car is about to break down, before it happens. The car owner can be alerted, and the manufacturer can proactively contact a service center and register the car for repairs with a single click [1]. Further, value added connected car services can bring manufacturers a higher marginal return than selling cars as standalone products. AI, machine learning and data science algorithms make it possible to sell each separate service efficiently to each user, and thus to earn more. More importantly, a car equipped with connected services is not just a product, but a complete vehicle ecosystem of services, enabling continuous channels of communication between the manufacturer and the customer.

II. BACKGROUND

The automobile development process involves long cycles, from manufacturers to dealers/resellers to end users/customers. Massive data stores that log and track the activities at each stage of this process provide immense opportunities to achieve efficiencies: the data can be analyzed in depth using AI, machine learning and big data analysis algorithms, enriching the product planning process and giving clear direction about the nature of customers and their needs as consumers. AI can indicate a good time for changing cars, recognize that a lifestyle has evolved, and offer drivers a new car already customized for their needs. Big data analysis and AI algorithms can help manufacturers to forecast customer behavior. Collecting data from networked and connected cars, together with other subsystems in the vehicle manufacturing chain, including several internal production systems, and analyzing it with machine learning and data mining algorithms can speed up the business processes and decision making of car manufacturers. These technologies can help vehicle manufacturers to be more customer centric: to choose the right marketing solution and strategy for different demographics and customer profiles, to provide better sales and after-sales services by reaching out easily to end users, and to improve product quality through continuous customer feedback.

Copy Right © INDIACom-2018; ISSN 0973-7529; ISBN 978-93-80544-28-1 1416

In this paper, we present our research towards an innovative AI and machine learning technology platform for assessing car/automobile quality based on historical information, and for solutions based on forecasting user preferences and interests. The platform provides value added services based on the requirements of end users: by automatically understanding their needs, interests and requirements, it offers customized solutions inside and outside the vehicle for monitoring and maintaining the vehicle. This creates a unique and innovative solution of soft sensors that runs seamlessly throughout the production chain, from the manufacturer, to the network of dealers and resellers, up to the end users and customers.

We use a novel formulation of the research problem, casting it as a virtual soft sensor network problem, and achieve dynamic node allocation with a machine learning based strategy that optimizes the network nodes participating in the human machine communication subsystem. The optimal machine learning solution is obtained by selecting the most significant set of sensors, instead of using a large number of data collecting sensor nodes that collect and store data continuously from the time the vehicle is purchased until it is discarded or its ownership changes [2, 3]. This is analogous to sensor networks, which have been utilized in numerous applications, such as wildlife monitoring [4], military target tracking and surveillance [5], hazardous environment exploration [6], and natural disaster relief [7]. As many of the soft sensors collect data continuously and run unsupervised for long periods of time, spanning months and years, they could place immense pressure on energy resources. We therefore need to design suitable data collection schemes that cap the quantity of data transmitted in the virtual soft sensor network. In this article, we propose an effective machine learning based mechanism for a virtual soft sensor network (SSN) for monitoring the automobile supply chain, which we claim is energy efficient. The proposed approach models the virtual SSN on the lines of wireless sensor networks (WSN) used for data transmission, and uses an adaptive routing scheme for energy efficiency. The adaptive routing scheme eliminates redundant nodes in the SSN by selecting the most significant soft sensors for accurate modeling of the data centric virtual SSN environment. The information source is historical data on vehicle/automobile characteristics, usage and quality assessment, and the network predicts quality of service parameters, including vehicle quality, safety and associated risks.

The experimental evaluation of the proposed virtual energy efficient SSN scheme, on two publicly available car and automobile datasets, validates that the proposed scheme provides a good solution to the complex communication and interaction aspects between internal and external vehicle systems. The scheme allows the virtual SSN to be visualized in the same way as the traditional wireless sensor network (WSN) paradigm, but adds a machine learning formulation that can provide better decision support by leveraging data centric modelling techniques. We handle the complexity of the virtual SSN with a data mining mechanism in which each virtual sensor is treated as an attribute of the dataset, and all the virtual sensor nodes together constitute the SSN, equivalent to the multiple features or attributes of the dataset. This makes it possible to dynamically adapt the sensor nodes participating in the decision making loop by using powerful feature selection, dimensionality reduction and learning classifier algorithms from the machine learning/data mining field. In this way, we obtain an energy efficient monitoring/tracking system for assessing several endogenous (internal) and exogenous (external) aspects of a vehicle, such as vehicle quality, safety and risks, based on historical data [7]. This amounts to sourcing effective and efficient selection and classification algorithms. For example, we can obtain an energy efficient solution even with missing, noisy and poor quality data, and with sparse and insufficient information. Since the accuracy of data mining schemes depends on the quantity of historical data used to forecast the future state of the environment, the virtual SSN may learn adaptively as the quantity of data increases. This creates a trade-off between energy efficiency and prediction accuracy. We demonstrate that this can be achieved through an experimental validation of the proposed scheme on two publicly available datasets, the car evaluation data set [9] and the automobile data set [10]. In the next section, we discuss the concept of the virtual soft sensor network and describe the two datasets used. The classification and regression algorithms used for experimental validation are described in Section IV, the experimental results are presented in Section V, and Section VI concludes the paper.
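The mapping of soft sensors to dataset attributes described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the class name `VirtualSSN`, its methods, and the use of the number of collected values as a proxy for energy cost are hypothetical and not part of the original framework.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class VirtualSSN:
    """Hypothetical virtual soft sensor network: each soft sensor is one attribute (column)."""
    sensor_names: List[str]
    readings: List[List[float]]  # one row per vehicle record, one value per soft sensor

    def active_subset(self, selected: Sequence[str]) -> "VirtualSSN":
        """Return a reduced network in which only the selected soft sensors keep reporting."""
        idx = [self.sensor_names.index(name) for name in selected]
        reduced = [[row[i] for i in idx] for row in self.readings]
        return VirtualSSN([self.sensor_names[i] for i in idx], reduced)

    def values_transmitted(self) -> int:
        """Number of sensor values collected; used here as a rough proxy for energy cost."""
        return sum(len(row) for row in self.readings)


# Usage: three vehicle records, four soft sensors; keep only two of them.
network = VirtualSSN(
    ["buying", "maint", "persons", "safety"],
    [[3, 2, 1, 2], [0, 1, 2, 2], [1, 1, 0, 1]],
)
reduced = network.active_subset(["persons", "safety"])
print(network.values_transmitted(), reduced.values_transmitted())  # 12 6
```

In the framework, the subset passed to `active_subset` would come from a feature selection algorithm rather than being fixed by hand.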

III. VIRTUAL SOFT SENSOR NETWORKS IN AUTOMOBILES

Depending on the model, a modern car or automobile has more than 100 sensors deployed to measure the wear and tear of the brakes, the tire pressure, the temperature, and whether a person or object is too close to the car. Most of these sensors focus on monitoring the state of the car and its safety. Recently, soft sensors have opened new avenues to monitor and strengthen the safety and comfort of the riders. For example, soft sensors embedded in a car seat can be used to determine how comfortably the riders sit in the vehicle, capturing the weight distribution and posture of the driver or riders. Moreover, the seats can be automatically adjusted to the personal liking of the riders, to ensure their comfort continuously throughout the journey. Safety features such as airbag sensors can be dynamically tuned to the individual sitting in the seat, whether an adult or a child, enabling the car to deploy the airbag with appropriate pressure and height in the event of an accident. The two publicly available datasets used in this study contain similar information for monitoring the quality and safety aspects of a particular type of vehicle, based on data logged from several vehicles, and can provide decision support to prospective car purchasers in making appropriate decisions. The details of the two datasets are described next.

A. Car Evaluation Dataset

This publicly available multivariate dataset [9] contains car evaluation information expressed through a single performance measure, the car acceptability metric. The metric is derived from several attributes: the overall price (buying price and maintenance price), and the technical characteristics and comfort level offered, represented by the number of doors, the number of people the car can carry, the boot size, and the estimated car safety. A summary of the dataset is given in Table 1. The prediction variable is the car acceptability metric, which makes this a 4-class classification problem: the car quality is evaluated as unacceptable, acceptable, good or very good from the attributes (aka soft sensor values), namely the buying and maintenance price values, the technical and comfort level measurements, and the safety assessment metric. Since the soft sensor attributes are not physically located or connected in the way the hard sensors of a physical wireless sensor network are, what we have here is a virtual sensor network consisting of soft sensor nodes. We model this network with a machine learning formulation and predict car quality from the data collected from several such vehicles and their attributes. Table 1 shows the structure of the car evaluation dataset (Dataset 1), and the class distribution (instances per class) is shown in Table 2.

TABLE I. CAR EVALUATION DATA SET

Attribute | Values
Buying | vhigh, high, med, low
Maint | vhigh, high, med, low
Doors | 2, 3, 4, 5more
Persons | 2, 4, more
Lug_boot | small, med, big
Safety | low, med, high
Car quality/price (class) | unacc, acc, good, vgood

TABLE II. CLASS DISTRIBUTION (NUMBER OF INSTANCES PER CLASS)

Class | N | N [%]
unacc | 1210 | 70.023%
acc | 384 | 22.222%
good | 69 | 3.993%
vgood | 65 | 3.762%

This dataset is highly imbalanced: one class (unacc) has far more instances than the other classes.
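The class shares in Table II, and the majority class (ZeroR) baseline reported in Table IV, can be reproduced from the counts with a short Python sketch. The attribute domains follow Table I; the ordinal encoding of the categorical soft sensor values (their position in each domain, in the order listed in Table I) is an illustrative assumption, not necessarily the encoding used in the experiments. The product of the domain sizes, 4 × 4 × 4 × 3 × 3 × 3 = 1728, equals the total number of instances, consistent with the UCI description of this dataset as completely covering the attribute space.

```python
from itertools import product

# Attribute domains (soft sensors) as listed in Table I.
CAR_ATTRIBUTES = {
    "buying": ["vhigh", "high", "med", "low"],
    "maint": ["vhigh", "high", "med", "low"],
    "doors": ["2", "3", "4", "5more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}

# Instances per class, from Table II.
CLASS_COUNTS = {"unacc": 1210, "acc": 384, "good": 69, "vgood": 65}


def ordinal_encode(record):
    """Illustrative encoding: replace each categorical value by its index in the domain."""
    return [CAR_ATTRIBUTES[name].index(record[name]) for name in CAR_ATTRIBUTES]


def class_percentages(counts):
    """Share of each class, in percent of all instances."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


def zero_r_accuracy(counts):
    """ZeroR always predicts the majority class, so its accuracy is that class's share."""
    return 100.0 * max(counts.values()) / sum(counts.values())


attribute_space = list(product(*CAR_ATTRIBUTES.values()))
print(len(attribute_space), sum(CLASS_COUNTS.values()))  # 1728 1728
print({k: round(v, 3) for k, v in class_percentages(CLASS_COUNTS).items()})
print(round(zero_r_accuracy(CLASS_COUNTS), 2))  # 70.02
```

Any classifier therefore has to beat roughly 70% accuracy before it adds value over always answering "unacc".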

B. Automobile Dataset

This is the second publicly available dataset used in the study. It consists of three types of entities: (a) the specification of the auto in terms of various characteristics, (b) its assigned insurance risk rating, and (c) its normalized losses in use as compared to other cars. The second entity, the risk rating, corresponds to the degree to which the auto is more risky than its price indicates. Cars are initially assigned a risk factor symbol associated with their price. Then, if a car is more risky (or less), this symbol is adjusted by moving it up (or down) the scale. This "symboling" process assigns a value of +3 to indicate that the auto is risky, and -3 that it is probably pretty safe.
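The symboling adjustment described above can be sketched as a move along the -3 to +3 scale. The initial, price-based symbol and the number of steps moved are hypothetical inputs assumed for illustration, since how insurers derive them is not described here; clamping the result at the ends of the scale is also our assumption.

```python
SYMBOL_MIN, SYMBOL_MAX = -3, 3  # -3: probably pretty safe, +3: risky


def adjust_symboling(initial_symbol: int, risk_steps: int) -> int:
    """Move a price-based risk symbol up (riskier) or down (safer), staying on the scale.

    Both arguments are hypothetical inputs; clamping to [-3, +3] is an assumption.
    """
    if not SYMBOL_MIN <= initial_symbol <= SYMBOL_MAX:
        raise ValueError("initial symbol must lie on the -3..+3 scale")
    return max(SYMBOL_MIN, min(SYMBOL_MAX, initial_symbol + risk_steps))


print(adjust_symboling(1, 1))   # riskier than its price suggests: 2
print(adjust_symboling(0, -5))  # much safer, clamped at the end of the scale: -3
```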

TABLE III. AUTOMOBILE DATA SET

No. | Attribute | No. | Attribute
1 | Symboling | 14 | Curb weight
2 | Normalized losses | 15 | Engine type
3 | Make | 16 | Number of cylinders
4 | Fuel type | 17 | Engine size
5 | Aspiration | 18 | Fuel system
6 | Number of doors | 19 | Bore
7 | Body style | 20 | Stroke
8 | Drive wheels | 21 | Compression ratio
9 | Engine location | 22 | Horsepower
10 | Wheel base | 23 | Peak rpm
11 | Length | 24 | City mpg
12 | Width | 25 | Highway mpg
13 | Height | 26 | Price

The third entity is the relative average loss payment per insured vehicle year. This value is normalized for all autos within a particular size classification (two-door small, station wagons, sports/specialty, etc.) and represents the average loss per car per year. This regression dataset is sparse, with a large set of attributes (26 attributes) and few instances available for learning (around 205 instances). Further, many instances have missing values for several attributes. Table 3 shows the structure of the automobile dataset. The output regression variable is the price of the vehicle, to be predicted from 24 different vehicle attributes.
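The preprocessing needed for this sparse dataset can be sketched as below. The study mentions standardization (Section IV) but does not state how missing values were handled, so mean imputation here is our assumption. The '?' marker matches how missing values are written in the raw UCI file; the function names are hypothetical.

```python
import math
from typing import List, Optional


def parse_column(raw: List[str]) -> List[Optional[float]]:
    """Convert raw strings to floats, treating '?' (UCI missing-value marker) as None."""
    return [None if value.strip() == "?" else float(value) for value in raw]


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries by the mean of the observed ones (assumed strategy)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def standardize(values: List[float]) -> List[float]:
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [0.0 if std == 0 else (v - mean) / std for v in values]


# Usage on a toy column with one missing entry (values are illustrative only).
column = standardize(mean_impute(parse_column(["164", "?", "158"])))
print(column)
```

Imputing before standardizing keeps every record usable, which matters when only about 205 instances are available.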

The two datasets differ in complexity (classification vs. regression), in the type and number of attributes, and in the amount of data available. With the proposed virtual soft sensor network characterization of the problem, and a machine learning approach that learns from historical information, it is possible to develop a data driven decision support model in spite of complex and poor quality information, including class imbalance and sparsity. The combined machine learning based virtual SSN strategy leverages the benefits of both technologies to predict the outputs of interest, here the car quality, price, safety and associated risk, even when the available information is incomplete, sparse and imbalanced. The next section discusses the algorithms used in this study.

IV. ALGORITHMS USED FOR THE PROPOSED STUDY

We examined two different sets of learning algorithms, for Dataset 1 (car evaluation dataset) and Dataset 2 (automobile dataset). For Dataset 1, six different classification algorithms were examined: Naive Bayes, lazy learning (kNN), the logistic classifier, Bagging with Random Forest as the base learner, the J48 decision tree classifier, and CV Parameter Selection (cross validation parameter selection) with Random Forest as the base learner. We used stratified cross validation with different numbers of folds to examine the classifier algorithms.
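Stratified cross validation keeps the class proportions of Table II in every fold, which matters for this imbalanced dataset. A minimal Python sketch of the fold assignment follows. It groups instances by class and deals them out round-robin; the function names and the shuffling seed are our own assumptions, not the exact WEKA procedure used in the experiments.

```python
import random
from collections import defaultdict
from typing import List, Sequence


def stratified_folds(labels: Sequence[str], k: int, seed: int = 0) -> List[List[int]]:
    """Split instance indices into k folds whose class proportions match the full data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    ordered = []
    for label in sorted(by_class):          # group instances class by class
        members = by_class[label]
        rng.shuffle(members)                # randomize order within each class
        ordered.extend(members)
    return [ordered[f::k] for f in range(k)]  # deal round-robin into k folds


def cv_splits(labels: Sequence[str], k: int, seed: int = 0):
    """Yield (train, test) index lists; each fold is used once as the test set."""
    folds = stratified_folds(labels, k, seed)
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test


# Usage with the class counts of Table II and k = 10.
labels = ["unacc"] * 1210 + ["acc"] * 384 + ["good"] * 69 + ["vgood"] * 65
for train, test in cv_splits(labels, k=10):
    print(len(train), len(test), sum(labels[i] == "unacc" for i in test))
```

Because each class occupies a contiguous block before dealing, every fold receives the same number of instances of a class, give or take one.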

For Dataset 2 (automobile dataset), being a regression task, five different regression learning algorithms were examined, including linear regression, Random Forest, CV Parameter Selection, the multilayer perceptron, and support vector regression (SMOreg) with a regularized optimizer and polynomial kernel. As this dataset has many missing values, in addition to a large number of attributes (26 attributes) relative to its small size (205 instances), we used preprocessing algorithms, including standardization and resampling, together with feature selection algorithms to reduce dimensionality, namely correlation based feature selection with two different search strategies: best first search, and greedy (forward and backward) stepwise search. Further details of each of these algorithms are available in [3] and [9]. The next section discusses the experimental results achieved for each set of experiments.
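Correlation based feature selection scores a candidate subset of k soft sensors by how strongly its members correlate with the target and how weakly they correlate with each other, using the merit k·r̄_cf / sqrt(k + k(k-1)·r̄_ff), where r̄_cf is the mean feature-target correlation and r̄_ff the mean feature-feature correlation (Hall's CFS heuristic). A minimal Python sketch of this merit, combined with a greedy forward search, is shown below. It uses absolute Pearson correlations, which suits numeric attributes such as those of Dataset 2, and assumes no column is constant; WEKA's CFS implementation differs in details (for example, in how nominal attributes are handled), and the function names are hypothetical.

```python
import numpy as np


def cfs_merit(X: np.ndarray, y: np.ndarray, subset: list) -> float:
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return float(r_cf)
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))


def greedy_forward_cfs(X: np.ndarray, y: np.ndarray):
    """Add the soft sensor that most improves the merit; stop when no addition helps."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: cfs_merit(X, y, selected + [j]) for j in remaining}
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= best:
            break
        best = scores[candidate]
        selected.append(candidate)
        remaining.remove(candidate)
    return selected, best


# Usage on synthetic data: column 1 nearly duplicates column 0, column 3 is noise.
rng = np.random.default_rng(0)
a, b, noise = rng.normal(size=(3, 200))
X = np.column_stack([a, a + 0.01 * rng.normal(size=200), b, noise])
y = a + b
print(greedy_forward_cfs(X, y))
```

On this synthetic data the search should keep column 2 and only one of the two near-duplicate columns 0 and 1, since the duplicate adds inter-feature correlation without adding information about the target.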

V. EXPERIMENTAL RESULTS

Different sets of experiments were performed to examine the relative performance of the classification and regression learning algorithms within the proposed vSSN learning framework. We used k-fold stratified cross validation with k=10 and k=5. For regression learning, since the dataset was very small, we also used the full training dataset to obtain baseline benchmark performance measures. For Dataset 1, as only a few attributes were available, we did not use a feature selection stage to extract the most significant features. As can be seen in Table 4 and Figure 1, for Dataset 1 it was possible to achieve 96% car quality prediction accuracy (unacceptable, acceptable, good and very good) based on 6 attributes (or soft sensor values).

TABLE IV. CAR EVALUATION PREDICTION ACCURACY (UNACC, ACC, VGOOD AND GOOD)

Classifier Algorithm | 10-fold CV | 5-fold CV
ZeroR | 70.02% | 70.02%
Naïve Bayes | 77.31% | 76.85%
Lazy learning (kNN) | 80.38% | 80.72%
Logistic Classifier | 82.35% | 82.46%
Bagging (Random Forest learner) | 96.4% | 96.6%
J48 (Decision Trees) | 96.35% | 95.83%
CVParameter Selection | 96.64% | 95.5%

TABLE V. CAR EVALUATION PERFORMANCE MATRIX

Further, the prediction performance shown in Table 4 for k=10 and k=5 folds is almost identical, which suggests that the amount of data used for building the models did not materially affect performance; despite the class imbalance, the best classifiers also remain well above the 70.02% ZeroR (majority class) baseline. The three best performing classifiers are J48 (decision trees), Bagging with Random Forest as the base learner, and the CV Parameter Selection classifier, which achieve prediction accuracies higher than 95%, as well as better performance on other metrics such as the confusion matrix, true positive and false positive rates, precision and recall. Table 5 shows these metrics for one of the best performing classifiers.
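The per-class metrics mentioned above (true positive rate, false positive rate, precision and recall) all follow from the confusion matrix. A minimal NumPy sketch is given below; the function name is hypothetical, and the 2-class matrix in the usage lines is a toy example, not a result from the experiments.

```python
import numpy as np


def per_class_metrics(confusion):
    """Per-class metrics from a confusion matrix (rows: true class, columns: predicted).

    Assumes every class occurs and is predicted at least once, so no denominator is zero.
    """
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as the class but actually another class
    fn = cm.sum(axis=1) - tp   # instances of the class predicted as another class
    tn = cm.sum() - tp - fp - fn
    return {
        "tp_rate": tp / (tp + fn),   # identical to recall
        "fp_rate": fp / (fp + tn),
        "precision": tp / (tp + fp),
        "accuracy": tp.sum() / cm.sum(),
    }


# Toy 2-class example (not data from this study).
metrics = per_class_metrics([[8, 2], [1, 9]])
print(metrics["tp_rate"], metrics["fp_rate"], metrics["accuracy"])
```

For an imbalanced dataset such as Dataset 1, these per-class rates reveal whether the minority classes (good, vgood) are recognized, which overall accuracy alone can hide.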

Fig. 1. Results from car evaluation dataset

For Dataset 2, being a regression learning task, we used %RMSE as the evaluation metric and derived the prediction accuracy as (100 - %RMSE). Since the available data was very small, we used 3-fold CV for building the model instead of 5 and 10 folds, and we also report the performance achieved when the full training set was used for building the model, purely as a baseline benchmark performance measure.
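The text does not state how the RMSE is normalized into a percentage. The Python sketch below assumes normalization by the range of the observed target (here, price) purely for illustration; dividing by the mean price would be an equally plausible choice. The accuracy proxy (100 - %RMSE) follows the definition given above, and the function names are hypothetical.

```python
import math
from typing import Sequence


def percent_rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE as a percentage of the target range (the normalization is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    spread = max(y_true) - min(y_true)
    return 100.0 * math.sqrt(mse) / spread


def accuracy_proxy(pct_rmse: float) -> float:
    """Prediction 'accuracy' as derived in the text: 100 - %RMSE."""
    return 100.0 - pct_rmse


# Toy prices (illustrative only, not values from the automobile dataset).
true_prices = [10000.0, 20000.0, 30000.0]
predicted = [12000.0, 18000.0, 30000.0]
score = percent_rmse(true_prices, predicted)
print(round(score, 2), round(accuracy_proxy(score), 2))
```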

TABLE VI. %RMSE FOR PRICE PREDICTION FOR DATASET 2

Algorithm | Full training (all features) | 3-fold CV (all 26 features) | 3-fold CV, BestFirst feature selection (8 features) | 3-fold CV, GreedySearch feature selection (5 features)
Linear Regression | 3.69 | 7.19 | 7.19 | 7.37
Random Forest | 0.75 | 2.03 | 2.38 | 2.42
Bagging | 0.58 | 2.23 | 2.74 | 2.88
CVParameter Selection | 0.72 | 1.97 | 2.40 | 2.43
MLP | 2.13 | 3.19 | 5.44 | 6.03
SMOreg (Poly) | 4.37 | 5.20 | 7.93 | 8.24

Table 6 shows the %RMSE (root mean square error) achieved on the second dataset, the automobile dataset, which ranges between 0.58 and 8.24. The results are reported as %RMSE rather than % accuracy because the automobile dataset poses a regression task, not a classification task. However, by subtracting the %RMSE from 100 (100 - %RMSE), the performance achieved by each algorithm can be compared across the two datasets.

Fig. 2. Automobile data set experiments

As can be seen in Table 6, the performance achieved with the two soft sensor (feature) selection algorithms (BestFirst search and Greedy search) is comparable to the 3-fold CV results obtained with all 26 features for linear regression and the tree-based learners (Random Forest, Bagging and CV Parameter Selection), where %RMSE increases by at most 0.65 percentage points; the MLP and SMOreg learners degrade more. However, instead of all 26 virtual sensors (attributes), only 8 virtual sensors are needed with BestFirst feature selection and only 5 with Greedy search to achieve this similar performance. We could therefore achieve an energy efficiency gain by a factor of 26/8 = 3.25 and 26/5 = 5.2 for the two automatic sensor selection strategies, respectively.
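The energy efficiency factors above follow from the number of active soft sensors, assuming that collection and transmission cost grows linearly with the number of sensors that report (an assumption; no direct energy measurements are reported here). The Python sketch below reproduces the two factors and, for Random Forest, the accuracy proxy (100 - %RMSE) from the 3-fold CV values in Table VI; the function name is hypothetical.

```python
TOTAL_SENSORS = 26
SELECTED_SENSORS = {"BestFirst": 8, "GreedySearch": 5}

# Random Forest %RMSE under 3-fold CV, taken from Table VI.
RF_PCT_RMSE = {"all": 2.03, "BestFirst": 2.38, "GreedySearch": 2.42}


def energy_reduction_factor(total: int, active: int) -> float:
    """Reduction in collected values, assuming cost is linear in the number of active sensors."""
    return total / active


for strategy, active in SELECTED_SENSORS.items():
    factor = energy_reduction_factor(TOTAL_SENSORS, active)
    print(f"{strategy}: {active} sensors, factor {factor:.2f}, "
          f"RF accuracy {100 - RF_PCT_RMSE[strategy]:.2f} vs {100 - RF_PCT_RMSE['all']:.2f}")
```

Under this linear-cost assumption, the trade-off is a reduction of roughly 0.4 percentage points in Random Forest accuracy for collecting 3.25 to 5.2 times fewer values.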

VI. CONCLUSION

In this paper we proposed a novel virtual sensor network framework, based on a data driven formulation, for assessing vehicle price and quality. The experimental validation of the proposed framework on two publicly available datasets, with different classification and regression learning algorithms, showed promising results. The framework provides several opportunities for better connection and communication in the automobile supply chain ecosystem, with a data driven strategy for providing value added services.

REFERENCES

[1] Automotive: Decisions fueled by insight. [Online]. Last accessed on 1/11/2017 from https://www.ihs.com/industry/automotive.html

[2] Ping, S., Delay measurement time synchronization for wireless sensor networks. Intel Research Berkeley Lab, 2003.

[3] Hall, M., et al., The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 2009. 11(1): p. 10-18.

[4] Csirik, J., P. Bertholet, and H. Bunke, Pattern recognition in wireless sensor networks in presence of sensor failures. 2011.

[5] Nakamura, E.F. and A.A.F. Loureiro, Information fusion in wireless sensor networks, in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008, ACM: Vancouver, Canada. p. 1365-1372.

[6] Bashyal, S. and G.K. Venayagamoorthy, Collaborative routing algorithm for wireless sensor network longevity. IEEE, 2007.

[7] Richter, R., Distributed Pattern Recognition in Wireless Sensor Networks, 2008. [Online]. Last accessed on November 1, 2017 from https://www.semanticscholar.org/paper/Distributed-Pattern-Recognition-in-Wireless-Sensor-Richter/d889fc994f21c0dad4eba693556de67ab1bf0e2b?tab=references

[8] Alwadi, M. and G. Chetty, Energy Efficient Data Mining Scheme for High Dimensional Data, Biodiversity Environment. Procedia Computer Science, 2015. 46: p. 483-490.

[9] Lichman, M., UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science, 2013.

[10] Zupan, B., M. Bohanec, I. Bratko, and J. Demsar, Machine learning by function decomposition. ICML-97, Nashville, TN, 1997.

[11] Kibler, D., D.W. Aha, and M. Albert, Instance-based prediction of real-valued attributes. Computational Intelligence, 1989. 5: p. 51-57.

