Reliability, November 2016
Reliability Society Administrative Committee (AdCom) Members
Officers (Excom)
President
Christian Hansen
Sr. Past President
Jeffrey Voas
Jr. Past President
Dennis Hoffman
Vice Presidents
Technical Activities
Shiuhpyng Winston Shieh
Publications
Jeffrey Voas (acting)
Meetings and Conferences
Carole Graas
Membership
Joe Childs
Secretary
Pradeep Ramuhalli
Treasurer
Bob Loomis
Elected Members-at-Large (with vote)
TERM EXPIRES 2016
(DEC 31)
TERM EXPIRES 2017
(DEC 31)
TERM EXPIRES 2018
(DEC 31)
Lou Gullo
Christian Hansen
Pradeep Lall
Zhaojun (Steven) Li
Bob Loomis
Pradeep Ramuhalli
Joseph A. Childs
Pierre Dersin
Lance Fiondella
Carole Graas
Samuel J. Keene
W. Eric Wong
Scott Abrams
Evelyn H. Hirt
Charles H. Recchia
Jason W. Rupe
Alfred M. Stevens
Jeffrey Voas
Standing Committees and Activities/Initiatives
Web Master
Lon Chase
Standards
Lou Gullo
Chapters Coordinator
Loretta Arellano
Professional Development
Marsha Abramo
Constitution and Bylaws
Dennis Hoffman
Fellows
Sam Keene
Finance
Christian Hansen
Academic
Education/Scholarship
Sam Keene
Meetings Organization
Alfred Stevens
Membership Development
Marsha Abramo
Transactions Editor
Eric Wong
Newsletter Editor
Lon Chase
Video Tutorials
Christian Hansen
Nominations and Awards
Jeffrey Voas
Life Science Initiative
Peter Ghavami
Transportation
Electrification
Sam Keene
Pradeep Lall
Michael Austin
IEEE Press Liaison
Dev Raheja
IEEE Government Relations Committees
Energy Policy
Jeffrey Voas
Transportation and Aerospace Policy
Scott Abrams
Medical Technology Policy
Jeffrey Voas
Critical Infrastructure Protection
Sam Keene
Career and Workforce Policy
Christian Hansen
(corresponding only)
Intellectual Property Committee
Carole Graas
(corresponding only)
Research and Development Committee
Pradeep Lall
Combined TAB Ad Hoc Committee on
Attracting Industrial Members
Dennis Hoffman
2013 TAB Awards and Recognition
Committee (TABARC)
Dennis Hoffman
Technical Committees
Vice President for Technical Activities
Shiuhpyng Winston Shieh, National Chiao Tung University
Email: ssp@cs.nctu.edu.tw
Technical Committee on Internet of Things (IoT)
Chair: Jeffrey M. Voas, National Institute of Standards and Technology
Email: jeff.voas@nist.gov
Co–chair: Irena Bojanova, National Institute of Standards and Technology
Email: irena.bojanova@nist.gov
Committee Member:
1. George F. Hurlburt: CEO of Change Index
Technical Committee on System and Software Assurance
Chair: Eric Wong, University of Texas at Dallas
Email: ewong@utdallas.edu
Technical Committee on Prognostics and Health Management (PHM)
Chair: Rex Sallade, Sikorsky Aircraft Co.
Email: Rex.Sallade@SIKORSKY.COM
Co–chair: Pradeep Lall, Auburn University
Email: lall@eng.auburn.edu
Technical Committee on Big Data
Chair: David Belanger, Stevens Institute of Technology
Email: david.belanger@stevens.edu
Technical Committee on Trustworthy Computing and Cybersecurity
Chair: Wen-Guey Tzeng, National Chiao Tung University
Email: wgtzeng@cs.nctu.edu.tw
Co–chair: Yu-Lun Huang, National Chiao Tung University
Email: ylhuang@cn.nctu.edu.tw
Committee Members:
1. Raul Santelices: Assistant Professor, Department of Computer Science and Engineering, University of Notre Dame
2. Brahim Hamid: Associate Professor, IRIT Research Laboratory, University of Toulouse, France
3. Yu-Sung Wu: Assistant Professor, Department of Computer Science, National Chiao Tung University, Taiwan
Technical Committee on Reliability Science for Advanced Materials &
Devices
Chair: Carole Graas, Colorado School of Mines & IBM Systems and Technology Group
Email: cdgraas@mines.edu
Technical Committee on Systems of Systems
Chair: Pierre Dersin, Alstom Transport
Email: pierre.dersin@transport.alstom.com
Technical Committee on Resilient Cyber-Systems
Chair: Pradeep Ramuhalli, Pacific Northwest National Laboratory
Email: pradeep.ramuhalli@pnnl.gov
Technical Committee on Cloud Computing, SDN and NFV
Co–chair: Chih-Wei Yi, National Chiao Tung University
Email: yi@cs.nctu.edu.tw
Co–chair: Jason W. Rupe, Polar Star Consulting
Email: jrupe@ieee.org
Committee Member:
1. Li-Ping Tung, National Chiao Tung University
Technical Committee on Power and Energy
Chair: Shiao-Li Tsao, National Chiao Tung University
Email: sltsao@cs.nctu.edu.tw
Technical Committee on Electronic and Computer System Reliability
Chair: Terngyin Hsu, National Chiao Tung University
Email: tyhsu@cs.nctu.edu.tw
Committee Member:
1. Kai-Chiang Wu: Assistant Professor, Department of Computer Science, National Chiao Tung University, Taiwan
Standards Committee
Chair: Louis J Gullo, IEEE RS Standards
Email: Lou.Gullo@RAYTHEON.COM
Committee Members:
1. Ann Marie Neufelder: Softrel, LLC – Owner; IEEE P1633 Standard Working Group Chair
2. Lance Fiondella: Assistant Professor, Dept. of Electrical and Computer Engineering, University of Massachusetts
Dartmouth; IEEE P1633 Standard Working Group Vice Chair
3. Steven Li: Assistant Professor, Industrial Engineering and Engineering Management, Western New England
University; IEEE P61014 Standard Working Group Chair
4. Diganta Das: Research Staff at the Center for Advanced Life Cycle Engineering, University of Maryland
5. Sony Mathews: Engineer at Halliburton, IEEE P1856 Standard Working Group Chair
6. Mike Pecht: Chair Professor and Director of Center for Advanced Life Cycle Engineering, University of Maryland
7. Arvind Sai Sarathi Vasan: Research Assistant at Center for Advanced Life Cycle Engineering, University of
Maryland; IEEE P1856 Standard Working Group Vice-Chair
8. Joe Childs: Staff Reliability/Testability Engineer at Lockheed Martin
Working Group on Education
Chair: Zhaojun (Steven) Li, Western New England University
Email: zhaojun.li@wne.edu
Committee Member:
1. Emmanuel Gonzalez, Jardine Schindler Elevator Corporation
Editorial Board
Editor-in-Chief
Shiuhpyng Winston Shieh,
National Chiao Tung University
Email: ssp@cs.nctu.edu.tw
Area Editors
Jeffrey M. Voas, Internet of Things (IoT)
National Institute of Standards and Technology
Email: jeff.voas@nist.gov
Irena Bojanova, Internet of Things (IoT)
National Institute of Standards and Technology
Email: irena.bojanova@nist.gov
Eric Wong, System and Software Assurance
University of Texas at Dallas
Email: ewong@utdallas.edu
Rex Sallade, Prognostics and Health Management (PHM)
Sikorsky PHM
Email: Rex.Sallade@SIKORSKY.COM
Pradeep Lall, Prognostics and Health Management (PHM)
Auburn University
Email: lall@eng.auburn.edu
David Belanger, Big Data
Stevens Institute of Technology
Email: david.belanger@stevens.edu
Wen-Guey Tzeng, Trustworthy Computing and Cybersecurity
National Chiao Tung University
Email: wgtzeng@cs.nctu.edu.tw
Yu-Lun Huang, Trustworthy Computing and Cybersecurity
National Chiao Tung University
Email: ylhuang@cn.nctu.edu.tw
Carole Graas, Reliability Science for Advanced Materials & Devices
Colorado School of Mines
Email: cdgraas@mines.edu
Pierre Dersin, Systems of Systems
Alstom Transport
Email: pierre.dersin@transport.alstom.com
Pradeep Ramuhalli, Resilient Cyber-Systems
Pacific Northwest National Laboratory
Email: pradeep.ramuhalli@pnnl.gov
Chih-Wei Yi, Cloud Computing, SDN and NFV
National Chiao Tung University
Email: yi@cs.nctu.edu.tw
Jason W. Rupe, Cloud Computing, SDN and NFV
Polar Star Consulting
Email: jrupe@ieee.org
Shiao-Li Tsao, Power and Energy
National Chiao Tung University
Email: sltsao@cs.nctu.edu.tw
Terngyin Hsu, Electronic and Computer System Reliability
National Chiao Tung University
Email: tyhsu@cs.nctu.edu.tw
Editorial Staff
Zhi-Kai Zhang
Assistant Editor
Email: skyzhang.cs99g@g2.nctu.edu.tw
Hao-Wen Cheng
Assistant Editor
Email: chris38c28@gmail.com
Regular Issue
1. Effects of Sampling Rate on the Accuracy of the Gas Turbine Performance Deterioration Modeling
   Houman Hanachi, Jie Liu, Avisekh Banerjee, and Ying Chen ........ 1-6
2. Applying Genetic Algorithm to Rearrange Cloud Virtual Machines for Balancing Resource Allocation
   Yu-Lun Huang and Zong-Xian Li ........ 7-12
3. Resolving Malware-Loading Problem by Leveraging Shellcode Detection Heuristics
   Michael Cheng Yi Cho, Zhi-Kai Zhang, and Shiuhpyng Shieh ........ 13-19
Effects of Sampling Rate on the Accuracy of the Gas Turbine Performance Deterioration Modeling

Houman Hanachi, Queen's University, houman.hanachi@queensu.ca
Jie Liu, Carleton University, jliu@mae.carleton.ca
Avisekh Banerjee, Life Prediction Technologies Inc., banerjeea@lifepredictiontech.com
Ying Chen, Life Prediction Technologies Inc., cheny@lifepredictiontech.com
Abstract - Logging the operating parameters of a gas turbine engine (GTE) is essential for real-time performance monitoring and future health-state prediction. The sampling rate of the operating data is restricted by the available resources, especially when the data are logged manually by operators. In recent research works, the authors have introduced a physics-based approach to develop unified indicators for gas turbine performance monitoring. The "Heat Loss index" and the "Power Deficit index" were two of the indicators introduced to provide metrics for the health state of gas turbines using the data logged from the engine control system. In this paper, prediction models are established on the time series of the indicators, and the modeling accuracy is investigated with respect to the sampling rate and the time window within which the model is established. As a result, this paper provides an insight into the uncertainty of the performance prediction model with respect to the sampling rate and the length of the time window.
Keywords - gas turbine health monitoring; performance
deterioration; modeling uncertainty; sampling rate; sampling
decimation.
I. INTRODUCTION
Modern approaches for health management of the gas turbine engine (GTE) tend to account for the real-time health condition of the machine in maintenance decisions, an approach called condition-based maintenance (CBM). Such strategies call for unscheduled maintenance actions ahead of imminent failures, and prevent the replacement of healthy parts based only on prescheduled maintenance tables. In this way, CBM improves the reliability and availability of the GTE and reduces maintenance costs by using the actual health information from diagnostic and prognostic analyses [1].
For prognosis of the future performance deterioration of the GTE, the variation of the GTE performance needs to be investigated by trending, for which different statistical and inference techniques can be used. Residuals are generated, which are the differences between the measurements and the values predicted by the model. A well-fitted modeling curve can be achieved by minimizing a norm of the residuals [2]. For a prediction model fitted on numerical data, the data sampling rate is a key factor in the modeling accuracy and needs to be carefully investigated. In the case of GTE performance, the short-term performance is expected to deteriorate steadily, which implies that the performance prediction model has a non-periodic nature. As a result, the minimum sampling frequency cannot be defined by the Nyquist-Shannon sampling theorem [3].
In recent research work [4, 5], a physics-based approach has been taken by the authors to develop performance indicators that provide metrics for the health level of single-shaft GTEs for diagnostics. Two of the introduced performance indicators are the "heat loss index" (HL) and the "power deficit index" (PD), which showed promising robustness to variations of the operating conditions. HL is defined as the normalized difference between the measured exhaust gas temperature (EGT) and the ideal EGT, i.e., the EGT calculated given the measured operating conditions. PD is defined as the normalized difference between the ideal power and the measured power of the GTE. Both indicators are calculated using the measurements obtained from the GTE control system. Therefore, these indicators are time series with the same time index as the measurements. The variation trends of the indicators are then modeled by polynomial curve fitting [6], which is well accepted for performance deterioration modeling [7].
When a model is established using sampled data, the prediction accuracy of the model depends on the sampling rate. Given a particular statistical characteristic of the introduced performance indicators, the corresponding sampling rate should be investigated with regard to the required accuracy of the performance prediction. The dependency of the modeling accuracy on the sampling rate of the indicators is further investigated in this research. The study is intended to determine the required data-logging frequency from the GTE control system. The results will help GTE operators optimize the logging periods, as well as help in subsequent performance monitoring. The results can also be utilized in state-estimation frameworks using sequential hybrid filters, where the modeling uncertainty is a necessary parameter.
II. UNIFIED PERFORMANCE INDICATORS
The control system of the GTE provides measurements of the gas-path parameters of the machine, which depend on the GTE operating conditions. Therefore, no single parameter can individually provide a measure of the GTE performance. A unified performance indicator that provides the health level of the machine as a single scalar is therefore highly beneficial. Given the measurements available from the GTE control system, the dependency among the measured parameters can be modeled by thermodynamic modeling of the GTE gas path, which can provide an insight into the health status of the GTE.
A. Available Operating Data
The formulation of the GTE thermodynamic model
depends on the available turbine design data, as well as input
and output parameters. The gas turbine in this study is a single
shaft 5 MW GTE, which was in service in a local power plant
for 38 months. The basic technical data of the GTE are
provided in Table I. The operating parameters were measured once every two hours during the entire operating period. There are 19 readings from the cycle parameters that can be used for thermodynamic modeling. For the same period, ambient conditions such as air pressure and relative humidity were also acquired from the historical weather records for the plant location. After performing a sensitivity analysis for developing the GTE performance model, three ambient conditions, i.e., inlet air temperature, pressure, and relative humidity, and three operating parameters, i.e., shaft speed, power, and EGT, were selected. Fig. 1 shows the variations of the operating data during the operating period. The gaps in the plots correspond to GTE down time. It can be concluded from the plots that the health state of the GTE cannot be readily acquired from an individual parameter.
B. Introducing the Performance Indicators
The only measurements available for defining the performance indicators were presented in the previous section. In a recent research work, a comprehensive nonlinear model for single-shaft GTEs using humid air as the working fluid was constructed [8]. The model combines submodules of the compressor, combustion chamber(s), and turbine sections, using heat balance and mass conservation principles. The mathematical forms of the model predict the power and the EGT from the measured operating conditions.
TABLE I. BASIC TECHNICAL SPECIFICATIONS OF THE GTE

Parameter (at design point)     Value
Nominal power                   5 MW
Shaft speed                     16500 RPM
Thermal efficiency              30.2 %
Pressure ratio                  14.0
Exhaust gas temperature         785 K
Fig. 1. Variations of the ambient conditions (blue), and the operating
parameters (brown): (a) intake air temperature; (b) intake air pressure; (c)
relative humidity; (d) shaft speed; (e) power and (f) EGT.
In these model equations, the specific humidity of the intake air and the fuel-to-air mass ratio appear as parameters. The specific humidity is a function of the relative humidity at a given temperature and pressure, and it was shown that the fuel-to-air ratio can be eliminated from the equations by taking into account the measurements of EGT and power in (1) and (2), respectively. As a result, the model equations reduce to the forms (3) and (4).
Once the model is calibrated with the design parameters of a GTE, (3) and (4) predict the expected values for the power and the EGT in the ideal, brand-new condition. During GTE operation, the internal conditions of the machine deviate from their ideal level and the performance gradually deteriorates. Consequently, the model predictions will no longer fit the measurements. At a given power, higher performance deterioration accounts for more energy wasted through a warmer exhaust gas. Similarly, at a given EGT, a lower power is expected from a degraded GTE compared to that of a brand-new GTE. Using this concept, two robust performance indicators were introduced [4] to quantify the level of performance deterioration. The heat loss index is defined as the difference between the measured EGT and the model prediction, normalized by the EGT at the design point:

HL = (EGT_measured - EGT_model) / EGT_design

The power deficit index is defined as the shortfall of the measured power from the model prediction, normalized by the nominal power:

PD = (P_model - P_measured) / P_nominal

Fig. 2 shows the processes through which HL and PD are calculated from the GTE measurements.
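As a concrete restatement of the two definitions above, here is a minimal sketch. The function and variable names are ours, not the paper's, and the illustrative numbers are hypothetical:

```python
import numpy as np

def heat_loss_index(egt_measured, egt_model, egt_design):
    """Heat loss index: measured-minus-predicted EGT, normalized
    by the design-point EGT, per the definition in the text."""
    return (np.asarray(egt_measured) - np.asarray(egt_model)) / egt_design

def power_deficit_index(power_model, power_measured, power_nominal):
    """Power deficit index: shortfall of the measured power from the
    model prediction, normalized by the nominal power."""
    return (np.asarray(power_model) - np.asarray(power_measured)) / power_nominal

# Illustrative values only (not from the paper): EGT in K, power in W.
hl = heat_loss_index([800.0, 810.0], [790.0, 792.0], 785.0)
pd = power_deficit_index([5.0e6, 5.0e6], [4.9e6, 4.8e6], 5.0e6)
```

Both indicators are dimensionless, so a value of 0.02 reads directly as a 2% deviation from the healthy model prediction.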
2
Reliability, November 2016
C. Model Implementation Results
To investigate the GTE performance deterioration, the established model in Fig. 2 is used at each time step, creating a time series for each performance indicator. The variations of HL and PD throughout the operating time are plotted in Fig. 3. There are two distinct trends in the plots: a short-term sequential saw-tooth trend with gradual growths and sudden falls, and an overall slow growth during the entire operating time. The short-term trends in the indicators reflect the deterioration in the performance, which recovers at the end of each segment.

The times of the sudden falls were checked against the GTE service logs, which revealed that they coincide with the dates of the compressor washes. There are seventeen distinct segments observable between consecutive compressor washes, which are given in Table II. As an example, Fig. 4 shows the indicator variation in the 5th time segment. To predict the performance as a function of time, trend-modeling techniques such as curve fitting can be utilized in each time segment.
III. PERFORMANCE PREDICTION MODELING
A. Performance Trend Modeling by Curve Fitting
In curve fitting there are always residuals, which are the differences between the data points and the values predicted by the fitted curve. The root-mean-square residual (RMSR) measures the spread of the data around the fitted curve, and it can be used as a measure of accuracy to compare the forecasting errors of different models for a particular variable, though not between different variables, because of its dependence on scale [9]. Given a set of N data values y(t), the fitted model estimates these values by yhat(t), which creates the residuals e(t) = y(t) - yhat(t). The RMSR of the fitted model is:

RMSR = sqrt( (1/N) * sum over t of e(t)^2 )

Fig. 2. The process flow to find HL and PD from the GTE measurements.

To utilize the RMSR of a model for comparing modeling errors between different variables, the scale of the variables should be removed. As a result, the coefficient of variation of the RMSR is defined as the RMSR normalized by the mean of the variable.
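The two error measures can be sketched in a few lines (a sketch with our own names; the sample data are hypothetical):

```python
import numpy as np

def rmsr(y, y_hat):
    """Root-mean-square residual of a fitted model."""
    e = np.asarray(y) - np.asarray(y_hat)
    return np.sqrt(np.mean(e ** 2))

def cv_rmsr(y, y_hat):
    """Coefficient of variation of the RMSR: the RMSR normalized by
    the mean of the observed variable, so that models fitted to
    differently scaled variables can be compared."""
    return rmsr(y, y_hat) / np.mean(y)

# Hypothetical data: a short record and a crude fit to it.
y = np.array([1.0, 2.1, 2.9, 4.2])
y_hat = np.array([1.0, 2.0, 3.0, 4.0])
```

Because cv_rmsr is dimensionless, it is the form used for the modeling errors reported later in the paper.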
The best-fit curve is defined as a model whose parameters are set such that the corresponding mean squared residual reaches its global minimum. As a result, for a given type of function for the fitted curve, the modeling error increases as the curve deviates from the best fit. For the short-term performance deterioration between two consecutive washes, the performance indicators gradually increase. To fit prediction models on the time series of the indicators, polynomial curves of different degrees were examined. Eventually, the third-degree polynomial was adopted, as it showed comparatively smaller modeling errors.

Fig. 3. Variations of the indicators: (a) HL; (b) PD.

Fig. 4. The best-fit curves evolve upon new observations.

TABLE II. SEGMENTS OF SHORT TERM PERFORMANCE VARIATIONS

Segment  Period (hours)    Segment  Period (hours)
1        2 - 1132          10       17112 - 18782
2        1150 - 2872       11       18786 - 19524
3        4128 - 5002       12       19606 - 21122
4        5012 - 6672       13       21136 - 22128
5        6714 - 9700       14       22146 - 23434
6        9746 - 11098      15       23444 - 25790
7        11158 - 13572     16       25900 - 27082
8        13584 - 15748     17       27382 - 27612
9        15786 - 17104
It is known from Bayesian inference that a subjective degree of belief is updated to account for newly observed evidence [10]. In this application, this means: given the indicator time series in an observation window and the corresponding best-fit model, the model will no longer remain the best estimator once new data are added to the observation window. Taking the length of the observation window, T, as the time between the last compressor wash and the present time, the expected indicators HL and PD are estimated by the best-fit curve on the available data. As time elapses, T expands and more data become available, which calls for a new curve fit. As a result, when the accuracy of the fitted model is studied, the length of the time window is an effective factor to be investigated. Fig. 4 shows the difference between models fitted on data in different observation windows in the 5th time segment.
B. Change of Sampling Rate
It is believed that a higher sampling rate provides more up-to-date data and a better understanding of the real-time state of a system. However, it requires more resources, especially when manual labor is involved. When the sampling rate decreases, variations of the measured parameter between two measurements will be missed, and the corresponding fitted curve may deviate from the best fit that could have been obtained with more samples. For a set of data with readout frequency f0, a decimation factor of k stands for sub-sampling at the frequency f0/k. Therefore, with regard to the original data points, k - 1 points will be lost between two consecutive sub-samples, which may have contained useful information about the variation trend. At the same time, sub-sampling can start from any of the 1st, 2nd, ..., or kth time steps, which leads to k independent subsets of the sub-sampled data points, S1, S2, ..., Sk.
Fig. 5 depicts the effects of data decimation and the choice of the sub-sampled subset on the resulting fitted model. It shows the fitted models in the observation window of the first 960 hours (40 days) of the 5th time segment. The orange bold line is the best-fit model on the entire record, with f0 = 1/2 hr^-1, within the time window. The data are then sub-sampled at the frequency 1/12 hr^-1 (k = 6). The green dash-dot line shows the curve fitted on the 1st subset, and the purple dashed line is fitted on the 4th subset. The deviation of these fitted models from that of the original data indicates the dependency of the modeling accuracy on the sampling rate; lowering the sampling rate comes at the cost of losing accuracy.
C. Modeling Error
As explained, sampling at the frequency f0/k creates k decimated subsets. In the observation window T, curve fitting on the jth subset yields a set of estimated values yj(t). For each such subset there is a modeling residual at each time step t, and the modeling error of the jth subset, ej, is the RMSR of these residuals normalized by the mean of the data:

ej = sqrt( (1/N) * sum over t of (y(t) - yj(t))^2 ) / mean(y)

All decimated sub-samples have equal chances of being picked. Therefore, the expected modeling error, regardless of the specific decimated subset, is the average over the k subsets:

E = (1/k) * sum over j of ej    (10)

The expected modeling error is independent of both the population size and the scale. As a result, it can be used to monitor the error when the observation window expands or the decimation factor changes, and even to compare the modeling errors between two different indicators.
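Numerically, the expected error is straightforward to compute. The sketch below follows the paper's choice of a third-degree polynomial and assumes (our reading, not stated explicitly in the excerpt) that each subset's residuals are evaluated at every original time step, with the per-subset error normalized by the mean of the data:

```python
import numpy as np

def expected_modeling_error(t, y, k, degree=3):
    """Average, over the k decimated subsets, of each subset's
    normalized modeling error: fit the polynomial to the subset,
    evaluate residuals at every original time step, take the RMSR,
    and divide by the mean of the data to remove the scale."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    errors = []
    for j in range(k):                       # j-th decimated subset
        coeffs = np.polyfit(t[j::k], y[j::k], degree)
        residuals = y - np.polyval(coeffs, t)
        errors.append(np.sqrt(np.mean(residuals ** 2)) / np.mean(y))
    return float(np.mean(errors))
```

With the full record (k = 1) this reduces to the coefficient of variation of the RMSR; for data that the polynomial can represent exactly, the error vanishes regardless of k.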
IV. RESULTING ERRORS AND ANALYSIS
Equation (10) gives the modeling error as a function of the sampling period and the length of the observation window, and it is depicted in Fig. 6 for selected time segments between compressor washes. It is seen that for both HL and PD, the modeling error steadily increases with the sampling period. At the same time, as the observation window expands, the modeling error becomes less dependent on the length of the window and gradually stabilizes.
Fig. 7 shows the variation of the average modeling errors for all 17 time segments between consecutive compressor washes. It can be seen that the modeling errors vary similarly for both indicators, HL and PD; there is no tangible difference between the models established on either indicator's data to predict its values. The plots show that the modeling errors depend on the observation window length when the window is short. For windows longer than 480 hours, the errors practically do not change any more if the sampling frequency is fixed. At the same time, the modeling errors are highly dependent on the sampling frequency: low sampling frequencies generate large errors, while higher sampling frequencies lead to smaller errors. This dependency is especially significant when the observation window is short. For instance, for short observation windows, with a sampling frequency of less than 0.06 hr^-1 the modeling errors exceed 0.10, while the errors fall below 0.02 when the sampling frequency rises above 0.3 hr^-1. It is interesting that, for sampling frequencies above 0.08 hr^-1, the modeling errors become independent of the observation window length.

Fig. 5. Sampling rate affects the established model.

The modeling errors are defined as the normalized standard deviation of the residuals. Given a normal distribution for the residuals, a modeling error of 0.02 means that about 68% of the observations are predicted by the model within a tolerance of ±2% of the average index value. If the model is built on data with a 0.03 hr^-1 sampling frequency, the corresponding error reaches 0.07, meaning that the tolerance band must expand to ±7% of the average index in order to include the same 68% of the observations.
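The tolerance argument above is just the one-sigma coverage of a normal distribution; a quick numerical check (a sketch, with our own function names):

```python
from math import erf, sqrt

def one_sigma_coverage():
    """Fraction of normally distributed residuals within one standard
    deviation of the model: erf(1/sqrt(2)), i.e. the 'about 68%'
    quoted in the text."""
    return erf(1.0 / sqrt(2.0))

def tolerance_band_percent(modeling_error):
    """Half-width of the +/-1-sigma tolerance band as a percentage of
    the average index value, given the normalized modeling error
    (the normalized standard deviation of the residuals)."""
    return 100.0 * modeling_error

# An error of 0.02 gives a +/-2% band; 0.07 widens it to +/-7%,
# both covering the same ~68% of observations.
```

The band scales linearly with the modeling error, which is why the 0.02-to-0.07 jump translates directly into a 2%-to-7% tolerance.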
V. CONCLUSION
In this paper, the effect of the sampling frequency on the error of the prediction models for the HL and PD indicators is studied. It is quantitatively shown that increasing the sampling frequency improves the modeling accuracy. It is also shown that establishing the model on a larger number of data points improves the modeling accuracy; however, the improvement is limited, i.e., after a certain number of data points the accuracy does not increase any further. The results of this work can be utilized to optimize the sampling frequency of the measured data when a certain level of accuracy is desired for performance monitoring models. The study also provides a measure of model prediction uncertainty, which is necessary when the model is employed in sequential data-model fusion frameworks for state estimation [11].
Fig. 6. Dependency of the modeling error on the length of the observation window (T) and the sampling period: HL modeling error in segments 9 (a) and 10 (b); PD modeling error in segments 9 (c) and 10 (d).
Fig. 7. Average modeling errors show similar behavior: (a) HL model error;
(b) PD model error.
ACKNOWLEDGMENT
This project was funded and supported by the Natural
Sciences and Engineering Research Council (NSERC) of
Canada, and Life Prediction Technologies Inc. (LPTi), Ottawa,
Canada.
REFERENCES
[1] A. K. Jardine, D. Lin and D. Banjevic, "A review on
machinery diagnostics and prognostics implementing
condition-based maintenance," Mechanical Systems and
Signal Processing, vol. 20, pp. 1483-1510, 2006.
[2] M. Li, "Minimum description length based 2D shape
description," in Proceedings of the Fourth International
Conference on Computer Vision, 1993, pp. 512-517.
[3] C. E. Shannon, "Communication in the presence of
noise," Proceedings of the IRE, vol. 37, pp. 10-21, 1949.
[4] H. Hanachi, J. Liu, A. Banerjee, Y. Chen and A. Koul, "A
Physics-Based Modeling Approach for Performance
Monitoring in Gas Turbine Engines," IEEE Transactions on
Reliability, vol. PP, 2014.
[5] H. Hanachi, J. Liu, A. Banerjee, Y. Chen and A. Koul, "A
physics-based performance indicator for gas turbine
engines under variable operating conditions," in
Proceedings of ASME Turbo Expo 2014, Düsseldorf,
Germany, 2014.
[6] H. Hanachi, J. Liu, A. Banerjee and Y. Chen, "Effects of
sampling decimation on a gas turbine performance
monitoring," in 2014 IEEE Conference on Prognostics and
Health Management (PHM), Cheney, WA, USA, 2014,
pp. 1-6.
[7] D. Sánchez, R. Chacartegui, J. Becerra and T. Sánchez,
"Determining compressor wash programmes for fouled
gas turbines," Proc. Inst. Mech. Eng. A: J. Power
Energy, vol. 223, pp. 467-476, 2009.
[8] H. Hanachi, J. Liu, A. Banerjee and Y. Chen, "Effects of
the intake air humidity on the gas turbine performance
monitoring," in ASME Turbo Expo 2015: Turbine
Technical Conference and Exposition, Montreal, Canada,
2015.
[9] R. J. Hyndman and A. B. Koehler, "Another look at
measures of forecast accuracy," Int. J. Forecast., vol. 22,
pp. 679-688, 2006.
[10] J. Joyce, "Bayes' Theorem," in The Stanford Encyclopedia
of Philosophy, 2008.
[11] J. Sun, H. Zuo, W. Wang and M. G. Pecht, "Application
of a state space modeling technique to system prognostics
based on a health index for condition-based
maintenance," Mechanical Systems and Signal
Processing, vol. 28, pp. 585-596, 2012.
AUTHOR BIOGRAPHY
Houman Hanachi received his B.Sc. and
M.Sc. in mechanical engineering from Sharif
University of Technology, Iran and his Ph.D.
in mechanical engineering from Carleton
University, Ottawa, Canada. He worked as a
professional engineer for thirteen years and
is currently a postdoctoral research fellow
and Term Adjunct Assistant Professor at Queen's University,
Kingston, Canada. His research focuses on diagnostics,
prognostics and health management, and hybrid techniques for
state estimation, with specialization in gas turbine engines.
Jie Liu received his B.Eng. in electronics
and precision engineering from Tianjin
University, China; his M.Sc. in control
engineering from Lakehead University,
Canada; and his Ph.D. in mechanical
engineering from the University of Waterloo,
Canada. He is currently an Associate
Professor in the Department of Mechanical & Aerospace
Engineering at Carleton University, Ottawa, Canada. He is
leading research efforts in the areas of prognostics and health
management, intelligent mechatronic systems, and power
generation and storage. He has published over 30 papers in
peer-reviewed journals. Prof. Liu is a registered professional
engineer in Ontario, Canada.
Avisekh Banerjee, Ph.D, P.Eng., is the
Systems Development and Services
Manager at Life Prediction Technologies
Inc. (LPTi). He manages the development
of diagnostics and prognostics tools for
turbo-machinery and avionics at LPTi. His
broad areas of research interest include physics-based
prognostics case studies, ENSIP, data trending for failure
prediction, the development of parts-life tracking systems, and
the development of PHM frameworks. He works extensively
with end users requiring prognostics services.
Ying Chen is a senior Thermal Engineer at
Life Prediction Technologies Inc. (LPTi).
He received his B.E. and M.E. degrees in
the Department of Power and Energy
Engineering from Huazhong University of
Science and Technology, Wuhan, China. He
received his Ph.D. degree (2007) in the
Department of Mechanical and Aerospace Engineering from
Carleton University, Canada. He is in charge of the thermal
aspects of all the projects at LPTi, and his research interests
include gas turbine performance monitoring, conjugate heat
transfer (CHT) based computational fluid dynamics (CFD) of
turbomachinery components, thermal analysis of
electromechanical systems and sustainable energies.
Reliability, November 2016
Applying Genetic Algorithm to Rearrange Cloud Virtual Machines for Balancing Resource Allocation
Yu-Lun Huang National Chiao Tung University ylhuang@g2.nctu.edu.tw
Zong-Xian Li National Chiao Tung University noreg0379482.eed00@g2.nctu.edu.tw
Abstract - Advances in cloud computing technologies reduce
hardware cost by sharing resources operated by a cloud
service provider (CSP). However, if an inappropriate
distribution strategy results in unbalanced resource
allocation, the virtual machines on overloaded hosts cannot
perform as expected. In this paper, we present a framework
that applies genetic algorithms to balance resource allocation
and reallocation, helping a CSP manage the resources of its
physical machines. The framework collects CPU utilization
data so that the control server can construct a proper
distribution strategy using the proposed algorithm, GASd
(Genetic Algorithm-based Strategy Daemon). GASd allows the
CSP to select a load dimension (CPU, memory, or I/O
utilization) and to control the migration ratio of virtual
machines running on different physical machines. We conduct
several experiments to evaluate GASd; the results show that
GASd can produce an optimized strategy when considering
different load dimensions in distributing VMs on physical
machines. We further show that GASd can determine an
optimized strategy for distributing 10000 virtual machines on
2500 physical machines in 139 seconds (using TS selection)
with a genetic distance of 6.25%.
Keywords - cloud resource management, genetic algorithm,
virtual machine rearrangement
I. INTRODUCTION
In a cloud system, several physical machines (PMs) provide
resources for virtual machines (VMs) through modern
virtualization technologies. Because different VMs require
different amounts of resources, a proper resource management
strategy is needed to balance resource allocation among VMs
and to guarantee the performance of the VMs running in the
cloud system. Since a cloud service provider (CSP) must
manage a large amount of resources in a cloud system,
generating a proper distribution strategy is not easy.
Many researchers have leveraged genetic algorithms to design
resource allocation algorithms for this problem, so that a
CSP can construct a good distribution strategy for VMs and
ensure their performance. Consider managing 20 VMs on 5 PMs:
the CSP would need to compare 5^20 possible combinations to
find proper strategies for allocating resources to the VMs.
To tackle such a difficult problem, genetic algorithms (GA)
are adopted to construct the distribution strategy for the CSP.
GA is a global search algorithm developed by John Holland in
the late 1960s and early 1970s. The fundamental concept of GA
is to apply evolution across generations, so that successive
generations gradually converge to an optimized solution.
Running GA on a control server, a CSP can rearrange VMs among
PMs to better utilize the physical resources of the PMs, as
illustrated in Fig. 1.
Fig. 1: The application of Genetic Algorithm
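As a concrete illustration of how such a rearrangement can be encoded for a GA, consider the sketch below, assuming one gene per VM whose value is the index of its hosting PM (all names are hypothetical; the paper does not show code):

```python
import random

def random_individual(num_vms, num_pms, rng=random):
    # One gene per VM; the gene value is the index of the PM
    # hosting that VM.
    return [rng.randrange(num_pms) for _ in range(num_vms)]

def pm_loads(individual, vm_loads, num_pms):
    # Total load each PM carries under this assignment.
    loads = [0.0] * num_pms
    for vm, pm in enumerate(individual):
        loads[pm] += vm_loads[vm]
    return loads
```

With this encoding, a population is simply a list of such individuals, and the GA searches the 5^20-sized space of assignments mentioned above.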
II. THE EXISTING GA-BASED RESOURCE
ALLOCATION ALGORITHMS
In recent years, several GA-based algorithms have been
proposed to help allocate physical resources for virtual
machines in a cloud system. In 2010, Jinhua Hu et al. [1]
designed an algorithm (called HGA) that allocates resources
while keeping the CPU load balanced. In 2012,
Shin Chen et al. [2] presented another algorithm that
allocates resources by considering more factors.
Hu's algorithm searches for an optimized solution to
reallocate physical resources such that all CPUs run in a
balanced state while keeping the migration cost as low as
possible. The algorithm applies a high mutation rate in the
early generations and then gradually reduces the mutation
rate to speed up convergence. Hu's algorithm considers only
one factor, which is no longer practical since memory
and network I/O are important factors in managing virtual
machines.
Shin Chen et al. presented HGA to reallocate resources for
VMs by considering more factors, lower migration cost, and
fewer active physical machines. Three dimensions (load,
number of active PMs, and migration cost) are used in HGA.
In HGA, load dimensions, instead of load balance, are
considered: two PMs are considered load balanced if they
have the same ratio of CPU utilization and network throughput.
The above algorithms use fitness functions to evaluate the
survival ability of an individual in GA, where a larger
fitness value implies higher survival ability. Since the load
deviation is the denominator of the fitness functions, a small
decrease in load deviation may cause a dramatic increase in
fitness value and lead to a distorted situation.
In this paper, we propose a framework for resource
management (VRMF) and a GA-based strategy algorithm
(GASd) that uses unfitness functions and unfitness values
to evaluate an individual's survival ability. To guarantee the
service provided by a CSP, no idle PM is allowed in VRMF.
Hence, only system load and migration cost are considered in
GASd.
III. VRMF
VRMF is a framework responsible for allocating resources
and keeping the load balanced among all PMs. VRMF is
composed of two components: U-probe and Resource Manager.
U-probe operates in each VM, while Resource Manager (running
GASd) executes on the cloud control server (CCS), as
illustrated in Fig. 2. Two load dimensions, CPU
utilization and memory utilization, are considered in GASd;
hence, U-probe collects these two factors for GASd by reading
/proc/stat and /proc/meminfo.
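A minimal sketch of the probing U-probe performs, assuming the standard field layouts of /proc/stat and /proc/meminfo; note that a real probe would compute CPU utilization from the difference between two samples over an interval, while this simplification uses a single snapshot:

```python
def cpu_utilization(stat_line):
    # Parse the aggregate "cpu ..." line of /proc/stat; the fields
    # are cumulative jiffies: user, nice, system, idle, iowait, ...
    fields = [int(v) for v in stat_line.split()[1:]]
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
    total = sum(fields)
    return 100.0 * (total - idle) / total

def memory_utilization(meminfo_text):
    # Parse MemTotal/MemFree (in kB) from /proc/meminfo.
    info = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        info[key.strip()] = int(rest.split()[0])
    return 100.0 * (info["MemTotal"] - info["MemFree"]) / info["MemTotal"]

# On a live Linux VM, U-probe would read the pseudo-files directly:
# cpu_utilization(open("/proc/stat").read().splitlines()[0])
```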
Fig. 2: The system architecture of VRMF: (a) The
physical cloud computing environment (b) The software
corresponding to physical environment
Resource Manager receives the utilization data from U-
probe and then applies GASd to make a new distribution
strategy that keeps the load balanced. As shown in Fig. 3,
Resource Manager is composed of Monitor, Analyzer, and
Strategy Maker. Initially, Resource Manager keeps the hardware
specification of each PM and the resource requirement of each
VM in PMPROP and VMPROP, respectively. Monitor collects and
stores the raw probing data from U-probe and forwards the raw
data to the Analyzer. The Analyzer first smooths the sequence
of raw data to better evaluate the resource utilization of a
VM. The exponentially weighted moving average (EWMA), a
first-order low-pass filter, is adopted to smooth the
sequence of utilization data in the Analyzer. The input of the
Analyzer is a sequence of raw probing data, while the output
is a single average utilization of a VM. Then, Strategy Maker
runs GASd and generates a new distribution strategy to keep
the load balanced among PMs in the cloud system.
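The Analyzer's EWMA smoothing can be sketched as follows; the smoothing factor alpha is an assumption for illustration, as the paper does not specify it:

```python
def ewma(samples, alpha=0.3):
    # Exponentially weighted moving average: a first-order
    # low-pass filter over the raw probing sequence. The
    # smoothing factor alpha weights the newest sample.
    avg = samples[0]
    for x in samples[1:]:
        avg = alpha * x + (1 - alpha) * avg
    return avg
```

The Analyzer would feed each VM's raw utilization sequence through this filter and report the final smoothed value to Strategy Maker.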
Fig. 3: The Resource Manager
GASd is an algorithm based on the genetic algorithm (GA).
GASd runs in the Strategy Maker of Resource Manager to
determine a strategy that keeps the load balanced among PMs
in terms of both CPU and memory. GASd also accounts for the
migration cost when a new strategy is made. The migration
cost should consider environmental factors, such as the VM's
image size, the disk I/O rates of the source and destination
PMs, and the network bandwidth between the two PMs. Since
these factors depend on the use case, GASd instead estimates
the migration cost by the ratio of the number of migrated VMs
to the total number of VMs. Before presenting GASd, we define
the workload caused by a VM and its unfitness functions.
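One possible form of such an unfitness value, combining weighted load deviations with the migration ratio described above, is sketched below; the specific weights and the use of the population standard deviation are assumptions for illustration (lower unfitness is better):

```python
import statistics

def unfitness(individual, previous, vm_cpu, vm_mem, num_pms,
              w_cpu=0.9, w_mem=0.1, w_migration=0.5):
    # Load deviation per dimension: standard deviation of the
    # per-PM totals under this VM-to-PM assignment.
    def deviation(vm_loads):
        totals = [0.0] * num_pms
        for vm, pm in enumerate(individual):
            totals[pm] += vm_loads[vm]
        return statistics.pstdev(totals)

    # Migration cost approximated by the ratio of moved VMs
    # relative to the previous assignment.
    moved = sum(1 for a, b in zip(individual, previous) if a != b)
    migration_ratio = moved / len(individual)

    # Lower unfitness means higher survival ability.
    return (w_cpu * deviation(vm_cpu) + w_mem * deviation(vm_mem)
            + w_migration * migration_ratio)
```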
Taking GA as its base, GASd performs a series of
operations such as selection, crossover, and mutation. Finally,
the output of GASd is an optimized distribution strategy
according to the utilization data obtained in the previous
period, as illustrated in Fig. 4.
Fig. 4: The workflow of GASd
GASd consists of five steps: initial, update, selection,
crossover, and mutation, explained in the following
subsections:
A. Initial
Initially, GASd loads the utilization data and the environment
parameters, i.e., the hardware specification of each PM and
the resource requirement of each VM, and generates the initial
population.
B. Update
GASd updates its current solution in the 'update' step. At
the beginning of the update, the proposed unfitness function
is used to calculate every individual's unfitness value for
this generation; the current solution is then updated
according to these new unfitness values.
In the first generation, this step simply chooses the
individual with the lowest unfitness value as the current
solution. In the following generations, this step is conducted
after the mutation operation, and the individual with the
lowest unfitness value is compared with the current solution.
If the chosen individual has a lower unfitness value than the
current solution, it replaces the current solution. When
GASd finishes its execution, this current solution becomes the
final solution. Besides updating the current solution, this
step also checks the termination condition. The termination
condition generally occurs either when the maximum number of
iterations is reached or when the user requirements are
satisfied. For example, suppose the termination condition is
set to 300 iterations or a standard deviation smaller than 5%;
if the current solution reaches a standard deviation smaller
than 5%, GASd terminates its execution.
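The update step and its termination check might look like the following sketch; the parameter values come from the example above, and the names are hypothetical:

```python
def update_step(population, unfitness_of, current_best,
                iteration, max_iterations=300, target_deviation=0.05):
    # Pick this generation's best (lowest-unfitness) individual.
    best = min(population, key=unfitness_of)
    # Elitist update: keep it only if it beats the current solution.
    if current_best is None or unfitness_of(best) < unfitness_of(current_best):
        current_best = best
    # Terminate when the iteration cap is reached or the current
    # solution is already good enough.
    done = (iteration >= max_iterations
            or unfitness_of(current_best) < target_deviation)
    return current_best, done
```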
C. Selection
The 'selection' step is responsible for choosing parent
individuals from the current generation and putting the chosen
individuals into the selected pool for the next generation.
From the viewpoint of diversity, the selection step inevitably
reduces the diversity of the next generation and prompts GASd
to converge. Basically, the selection step tends to choose
individuals with lower unfitness values. Different selection
methods tolerate bad individuals differently, and the
performance of GASd differs accordingly. In this paper, we
apply three selection methods in GASd: Proportional Roulette
Wheel Selection (PRWS), Rank-based Roulette Wheel Selection
(RRWS), and Tournament Selection (TS).
1) PRWS (Proportional Roulette Wheel Selection)
PRWS first maps every individual onto a wheel according to
its fitness value; the proportion of the wheel occupied by an
individual is decided by its fitness value, so a larger
fitness value occupies more of the wheel. After mapping the
individuals, the wheel spins repeatedly, and each time it
stops, the individual under the pointer is chosen, until the
population count is reached, as shown in Fig. 5.
Fig. 5: PRWS
Each time the wheel stops spinning, the individual under the
pointer is selected. With this mechanism, the larger the
proportion an individual occupies, the higher its probability
of being selected.
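A PRWS sketch is given below; it assumes fitness values are already available, e.g., by taking fitness as the reciprocal of unfitness (an assumed conversion, since GASd minimizes unfitness):

```python
import random

def prws(population, fitness_values, rng=random):
    # Proportional roulette wheel: each individual occupies a slice
    # of the wheel proportional to its fitness; spin once per slot
    # in the selected pool.
    total = sum(fitness_values)
    selected = []
    for _ in range(len(population)):
        spin = rng.uniform(0, total)
        cumulative = 0.0
        for individual, fitness in zip(population, fitness_values):
            cumulative += fitness
            if spin <= cumulative:
                selected.append(individual)
                break
    return selected
```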
2) RRWS (Rank-based Roulette Wheel Selection)
RRWS uses a rank value, instead of the unfitness values, to
map individuals onto the wheel. This idea helps GASd avoid
premature convergence. When a few individuals are much better
than the others, these stronger individuals have a very high
probability of being chosen as the parents of the next
generation. This situation drastically reduces the diversity
of the population and makes GASd give up the potential of the
weaker individuals. Therefore, RRWS maps individuals onto the
wheel by rank value to eliminate the bias caused by large
gaps in unfitness values. Note that, in RRWS, a parameter SP
is used to adjust the tolerance for poorly performing
individuals: a larger SP normally implies a smaller tolerance
for poorly performing individuals.
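One common way to realize rank-based mapping with a selection-pressure parameter is linear ranking; the sketch below is an assumed formulation for illustration, not necessarily the exact one used in GASd:

```python
def rank_probabilities(n, sp=2.0):
    # Linear ranking: individuals are sorted from worst (rank 0)
    # to best (rank n-1); the selection pressure SP in [1, 2]
    # widens the gap between the best and worst probabilities.
    # With SP = 1 the distribution is uniform.
    return [(2 - sp) / n + 2 * rank * (sp - 1) / (n * (n - 1))
            for rank in range(n)]
```

These probabilities would then occupy the wheel slices in place of raw unfitness values, so the gap between strong and weak individuals depends only on rank, not on the magnitude of their unfitness.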
3) TS (Tournament Selection)
TS draws several candidates from the previous generation,
and puts the best candidate into the pool. The parameter ts
in TS stands for the number of candidates drawn from the
population. In Fig. 6, 3 out of 6 individuals are drawn
(since ts is set to 3), and the best one is selected as the
result of TS.
Fig. 6: TS with ts=3
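A TS sketch, treating ts as the number of candidates drawn per tournament (names hypothetical):

```python
import random

def tournament_select(population, unfitness_of, ts=3, rng=random):
    # Draw ts candidates at random and return the one with the
    # lowest unfitness value (the tournament winner).
    candidates = rng.sample(population, ts)
    return min(candidates, key=unfitness_of)
```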
D. Crossover
Crossover is the key to a new generation. Crossover
exchanges features between a pair of individuals, increasing
the diversity of the newly generated individuals and
approaching an optimized solution. In this step, GASd randomly
chooses a pair of individuals and a crossover point to obtain
new individuals with new chromosomes, as shown in Fig. 7.
Fig. 7: The flow chart of crossover and its detail.
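A single-point crossover sketch consistent with the description above:

```python
import random

def single_point_crossover(parent_a, parent_b, rng=random):
    # Cut both chromosomes at one random point and swap the tails,
    # producing two children that mix the parents' features.
    point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b
```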
E. Mutation
Mutation further increases the diversity of the next
generation. GASd selects one bit of the chromosome and
changes its value randomly. The mutation repeats until all new
individuals are generated. Fig. 8 shows the flow chart of
mutation and its details.
Fig. 8: The flow chart of mutation and its detail.
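A mutation sketch, changing one randomly chosen gene to a random PM index (names hypothetical):

```python
import random

def mutate(individual, num_pms, rng=random):
    # Pick one random gene and reassign its VM to a random PM,
    # injecting diversity into the new generation.
    mutated = list(individual)
    gene = rng.randrange(len(mutated))
    mutated[gene] = rng.randrange(num_pms)
    return mutated
```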
IV. EVALUATION
VRMF and GASd are realized in a cloud system running
Xen as its hypervisor. Each PM installs XenServer as its host
OS, and the PMs connect to the cloud control server through
the network. The cloud control server runs a program called
XenCenter to manage the PMs. The control system designed
by Xen allows the administrator to easily control every PM
and VM through XenCenter.
Since XenServer provides hypervisor-based virtualization
interfaces, it can be installed directly on a PM and can
manage hardware resources more easily. In such a system, Dom0,
a special VM, provides the service console and the management
tools, while DomU, managed by Dom0, runs guest OSs on the
PM. Resource requests from DomU are managed by Dom0 for
better resource control, as illustrated in Fig. 9.
Fig. 9: XenServer’s system architecture.
In the above Xen-based cloud system, every VM runs U-
probe to collect utilization data for GASd. In this paper, we
only show how GASd rearranges VMs, after analyzing the
utilization data from the U-probes, to keep the load balanced
in the whole cloud system. In this experiment, we adopt RRWS
with SP = 2 for selecting individuals and generating new
individuals. The experimental cloud system runs 4 PMs, each
with 4 cores and 16 GB of memory. In the beginning, 16 VMs
were launched on the 4 PMs. Table I shows the CPU loads and
memory requirements of each VM.
TABLE I: CPU loads collected by U-Probe in the Experiment

VM ID  CPU load  Memory (MB)    VM ID  CPU load  Memory (MB)
vm1    28.       2048           vm9    18.       2560
vm2    23.       1024           vm10   9.2       3584
vm3    17.       2048           vm11   38.       4096
vm4    16.       1536           vm12   7.3       2048
vm5    12.       3072           vm13   8.1       3072
vm6    22.       4096           vm14   28.       4096
vm7    33.       2048           vm15   24.       1024
vm8    40.       2048           vm16   26.       1536
After performing GASd on the cloud control server,
different distribution strategies may result from different
weights on CPU loads and memory requirements. If we assign a
weight of 0.9 to the CPU loads and 0.1 to the memory
requirements, we may obtain Distribution Strategy I (see Fig.
10). If we care more about memory requirements and assign its
weight as 0.9, we may obtain Distribution Strategy II
(see Fig. 11). Moreover, after applying Distribution Strategy
I, we obtain a deviation of 0.32 for CPU loads and 7.44 for
memory requirements, which means that the four PMs in the
cloud system have similar computational loads. After applying
Distribution Strategy II, we obtain a deviation of 4.02 for
CPU loads and 1.81 for memory requirements, which means that
the four PMs have similar memory requirements.
Fig. 10: Distribution Strategy I (weighted toward CPU loads)
Fig. 11: Distribution Strategy II (weighted toward memory
requirements)
From the experiment, we show that GASd is able to make a
distribution strategy according to the policy given by the cloud
service provider.
V. SCALABILITY
The previous experiments showed that GASd reliably finds an
optimized distribution strategy, but they used only a few PMs
and VMs. Here, we show GASd's performance at a larger scale:
we run 2000 to 10000 VMs on 500 to 2500 PMs with different
selection methods in GASd. Fig. 12 shows the execution time
and Fig. 13 shows the distance of each case. In short, GASd
can produce a distribution strategy mapping 10000 VMs onto
2500 PMs in 139 seconds with a distance of 6.25% (when TS
selection is applied). This demonstrates the scalability of
GASd.
Fig. 12: The execution time for different amounts of
machines (second)
Fig. 13: The distance of different amounts of machines (%)
VI. CONCLUSION
In this article, we present a resource management
framework (VRMF) for cloud service providers. VRMF
reallocates resources for each VM and keeps the load balanced
among PMs. A probing agent (U-probe) is installed in each VM
to collect the VM's utilization data. Then, the Resource
Manager running on the cloud control server launches the
proposed GA-based algorithm to generate an optimized
distribution strategy according to the preference of the
cloud service provider. We conduct a simple experiment to
show that different strategies may be obtained for different
preferences given by the cloud service provider. Further, we
discuss the scalability of GASd and show that GASd can
complete its execution in 139 seconds (using TS selection)
with a distance of 6.25% when allocating 10000 VMs on
2500 PMs.
REFERENCES
[1] J. Hu, J. Gu, G. Sun, and T. Zhao, ”A scheduling strategy
on load balancing of virtual machine resources in cloud
computing environment,” Parallel Architectures,
Algorithms and Programming (PAAP), 2010 Third
International Symposium on, pp. 89–96, Dec 2010.
[2] S. Chen, J. Wu, and Z. Lu, ”A cloud computing resource
scheduling policy based on genetic algorithm with
multiple fitness,” Computer and Information Technology
(CIT), 2012 IEEE 12th International Conference on, pp.
177–184, Oct 2012.
[3] N. M. Razali and J. Geraghty, "Genetic algorithm
performance with different selection strategies in solving
TSP," Proceedings of the World Congress on Engineering,
vol. 2, Jul. 2011.
[4] R. Patel, M. M. Raghuwanshi, and L. G. Malik, ”An
improved ranking scheme for selection of parents in
multi-objective genetic algorithm,” Communication
Systems and Network Technologies (CSNT), 2011
International Conference on, pp. 734–739, June 2011.
[5] L. Zhang, H. Chang, and R. Xu, ”Equal-width
partitioning roulette wheel selection in genetic algorithm,”
2012 Conference on Technologies and Applications of
Artificial Intelligence, pp. 62–67, Nov 2012.
[6] J. Grover and S. Katiyar, ”Agent based dynamic load
balancing in cloud computing,” Human Computer
Interactions (ICHCI), 2013 International Conference on,
pp. 1–6, Aug 2013.
[7] Z. Gao, ”The allocation of cloud computing resources
based on the improved ant colony algorithm,” Intelligent
Human-Machine Systems and Cybernetics (IHMSC),
2014 Sixth International Conference on, vol. 2, pp. 334–
337, Aug 2014.
[8] V. Behal and A. Kumar, ”Cloud computing: Performance
analysis of load balancing algorithms in cloud
heterogeneous environment,” Confluence The Next
Generation Information Technology Summit
(Confluence), 2014 5th International Conference, pp.
200–205, Sept 2014.
[9] F. Bei, ”An improved ant colony algorithm based on
distribution estimation,” Intelligent Systems Design and
Engineering Applications (ISDEA), 2014 Fifth
International Conference on, pp. 161–164, June 2014.
[10] M. Rana, S. Bilgaiyan, and U. Kar, ”A study on load
balancing in cloud computing environment using
evolutionary and swarm based algorithms,” Control,
Instrumentation, Communication and Computational
Technologies (ICCICCT), 2014 International Conference
on, pp. 245–250, July 2014.
[11] P. A. Pattanaik, S. Roy, and P. K. Pattnaik, ”Performance
study of some dynamic load balancing algorithms in
cloud computing environment,” Signal Processing and
Integrated Networks (SPIN), 2015 2nd International
Conference on, pp. 619–624, Feb 2015.
[12] Y. Dai, Y. Lou, and X. Lu, ”A task scheduling algorithm
based on genetic algorithm and ant colony optimization
algorithm with multi-qos constraints in cloud computing,”
Intelligent Human-Machine Systems and Cybernetics
(IHMSC), 2015 7th International Conference on, vol. 2,
pp. 428–431, Aug 2015.
[13] S. Aslam and M. A. Shah, ”Load balancing algorithms in
cloud computing: A survey of modern techniques,” 2015
National Software Engineering Conference (NSEC), pp.
30–35, Dec 2015.
[14] F. T. Novais, L. S. Batista, A. J. Rocha, and F. G.
Guimares, ”A multiobjective estimation of distribution
algorithm based on artificial bee colony,” Computational
Intelligence and 11th Brazilian Congress on
Computational Intelligence (BRICS-CCI CBIC), 2013
BRICS Congress on, pp. 415–421, Sept 2013.
[15] P. K. Sundararajan, E. Fellery, J. Forgeaty, and O. J.
Mengshoel, ”A constrained genetic algorithm for
rebalancing of services in cloud data centers,” 2015 IEEE
8th International Conference on Cloud Computing, pp.
653–660, June 2015.
[16] R. Buyya, R. Ranjan, and R. N. Calheiros, ”Modeling and
simulation of scalable cloud computing environments and
the cloudsim toolkit: Challenges and opportunities,” High
Performance Computing Simulation, 2009. HPCS ’09.
International Conference on, pp. 1–11, June 2009.
[17] C. Y. Liu, C. M. Zou, and P. Wu, ”A task scheduling
algorithm based on genetic algorithm and ant colony
optimization in cloud computing,” Distributed Computing
and Applications to Business, Engineering and Science
(DCABES), 2014 13th International Symposium on, pp.
68–72, Nov 2014.
[18] E. K. P. Chong and S. H. Zak, An Introduction to
Optimization, 3rd ed. John Wiley & Sons Inc,
Feb. 2008.
[19] J. Zhao, W. Zeng, M. Liu, and G. Li, "Multi-objective
optimization model of virtual resources scheduling under
cloud computing and its solution," Cloud and Service
Computing (CSC), 2011 International Conference on, pp.
185–190, Dec 2011.
[20] N. Chen, X. Fang, and X. Wang, ”A cloud computing
resource scheduling scheme based on estimation of
distribution algorithm,” Systems and Informatics (ICSAI),
2014 2nd International Conference on, pp. 304–308, Nov
2014.
[21] Z. I. M. Yusoh and M. Tang, ”A penalty-based genetic
algorithm for the composite saas placement problem in
the cloud,” Evolutionary Computation (CEC), 2010 IEEE
Congress on, pp. 1–8, July 2010.
[22] Z. Zheng, R. Wang, H. Zhong, and X. Zhang, ”An
approach for cloud resource scheduling based on parallel
genetic algorithm,” Computer Research and Development
(ICCRD), 2011 3rd International Conference on, vol. 2,
pp. 444–447, March 2011.
[23] F. Xie, Y. Du, and H. Tian, ”A resource allocation
strategy based on particle swarm algorithm in cloud
computing environment,” Digital Manufacturing and
Automation (ICDMA), 2013 Fourth International
Conference on, pp. 69–72, June 2013.
[24] D. Kumar and Z. Raza, ”A pso based vm resource
scheduling model for cloud computing,” Computational
Intelligence Communication Technology (CICT), 2015
IEEE International Conference on, pp. 213–219, Feb
2015.
[25] S. Varshney, L. Srivastava, and M. Pandit, ”Comparison
of pso models for optimal placement and sizing of
statcom,” Sustainable Energy and Intelligent Systems
(SEISCON 2011), International Conference on, pp. 346–
351, July 2011.
[26] S.-C. Wang, K. Q. Yan, W.-P. Liao, and S.-S.
Wang, ”Towards a load balancing in a three-level cloud
computing network,” Computer Science and Information
Technology (ICCSIT), 2010 3rd IEEE International
Conference on, vol. 1, pp. 108–113, July 2010.
AUTHOR BIOGRAPHY
Yu-Lun Huang received the B.S. and Ph.D.
degrees in Computer Science and
Information Engineering from National
Chiao-Tung University, Taiwan, in 1995 and
2001, respectively. She has been a member
of the Phi Tau Phi Society since 1995. She is
now an associate professor in the Department
of Electrical & Computer Engineering of National Chiao-
Tung University (NCTU). She is now the Associate Dean of
NCTU Academic Affairs, Director of Center for Continuing
Education and Training at NCTU, and Director of Center for
Teaching and Learning Development at NCTU. She has been
serving the Secretary General of Taiwan Open Course
Consortium since 2014. Her research interests include wireless
security, virtualization security, embedded software,
embedded operating systems, risk assessment, secure payment
systems, VoIP, QoS and critical information infrastructure
protection (CIIP), IoT Security, LTE Security, creative and
innovative teaching model, etc.
Zong-Xian Li received the B.S. and M.S.
degrees in Electrical and Computer
Engineering from National Chiao-Tung
University, Taiwan, in 2015 and 2016,
respectively. His research interests include
embedded operating systems, cloud
computing, and virtualization technologies.
Resolving Malware-Loading Problem by Leveraging Shellcode Detection Heuristics Michael Cheng Yi Cho National Chiao Tung University michcho.cs98g@g2.nctu.edu.tw
Zhi-Kai Zhang National Chiao Tung University skyzhang.cs99g@g2.nctu.edu.tw
Shiuhpyng Shieh National Chiao Tung University ssp@cs.nctu.edu.tw
Abstract - Dynamic malware analysis platforms are developed
to extract malicious behavior from malware without
knowledge of its internal structure. Based on the analysis
results, security experts can derive a signature for future
detection. However, a malware program may not be invoked
unless certain conditions are met. Therefore, a basic
requirement for dynamic malware analysis is the ability to
determine the conditions that activate a malware program. For
execution, malware, like any other software, may
require additional supplementary information, such as
parameters, arguments, and an entry point. Therefore, loading
malware with the appropriate conditions to activate its
malicious behavior can be a challenge. This paper resolves the
malware-loading problem by leveraging shellcode detection
heuristics based on remote exploit traffic. A formal model is
proposed to model the malware-loading problem. To cope with
the problem, a new scheme is proposed, implemented, and
evaluated using malware samples collected from the wild. As
the evaluation shows, the scheme is highly effective: 15%
of the malware samples initially concealed their activities
when loaded, but after the proposed scheme was applied, 97% of
the malware samples demonstrated activities.
Keywords - DLL malware; dynamic malware analysis;
honeypot; shellcode detection.
I. INTRODUCTION
The Internet allows users to exchange information
conveniently; along with the convenience, it also provides a
medium for malware distribution. One malware distribution
method is exploiting vulnerable network service applications,
a type of exploitation known as remote exploit. To understand
remote exploits, research [1–4] has described common scenarios
in which attackers exploit network security vulnerabilities.
The common scenarios can be divided into three phases: the
pre-attack, attack, and post-attack phases. The pre-attack
phase gathers information about the targeted victim in
preparation for exploitation; the gathered information
typically includes the type and version of a network service.
The exploitation occurs in the attack phase, where the
attacker uses the gathered information to craft remote
exploits against the vulnerable network service. The
post-attack phase achieves the objectives of the attacker,
such as data exfiltration, internal network system infection,
and a connect-back shell.
According to recent research [1][5][6], the attack phase can
be divided into two procedures: vulnerability exploitation
and egg-download. The vulnerability exploitation sets up a
malware download (egg-download) from a remote host and
executes the downloaded malware to achieve the objectives of
the attacker. This exploitation approach is designed to
circumvent buffer limitations and achieve sophisticated
post-attack objectives while maintaining the flexibility of
coupling exploit code with malware. In this scenario, two
software programs carry out the exploitation. One stages the
malware download and malware execution, and is conventionally
called shellcode. The other carries out the objective of the
post-attack phase, namely the malware.
Leveraging the aforementioned attack pattern, Baecher et al.
[5] presented a passive honeypot that simulates security
vulnerabilities to capture wild malware. The captured malware
is sent to a third-party dynamic malware analysis platform
(a sandbox [7–9]) for malicious behavior analysis.
Unfortunately, analyzing the captured malware can be a
tedious task due to the assumption of dynamic malware
analysis: it is useful only if the malware can be executed on
the dynamic malware analysis platform. To narrow down the
research scope, we look specifically at malware on the
Windows platform. Windows malware samples are generally
delivered in three executable formats [10], namely,
executables, drivers, and DLLs. Although the loading methods
differ among the file types, additional supplementary
information is often needed for execution, such as input
arguments, system parameters, and entry points. Without the
required supplementary information, the behavior of the
malware is concealed and cannot be analyzed directly on the
dynamic malware analysis platform. For simplicity, we refer
to this problem as the malware-loading problem throughout
this paper.
Malware execution can be context sensitive: malicious
behaviors can only be observed when certain conditions are
met. Research [11–13] addressed this problem and suggested
that different behaviors can be observed when multiple
execution traces are available. The approach is based on
concolic testing [14][15], where the conditional branches of
a piece of malware are labeled and conditions are computed to
extend the code coverage of malware execution. Although this
targets the related problem of finding execution conditions
within a program, the malware-loading problem incurred in the
shellcode is not addressed. We leverage shellcode detection
[16–18], which detects the existence of shellcode in a file.
Our proposed scheme locates the conditions needed to execute
a malware program: it relies on shellcode detection to
resolve the URI from the detected shellcode in order to
locate and download the malware for further analysis. Our
approach ensures that the malware is executed, rather than
exploring code coverage, to trigger malicious behaviors.
The contribution of this research is two-fold. First, a
formal model is derived to model the malware-loading problem,
and a solution to the problem is given. Second, an automatic
execution framework is proposed that prepares captured
malware for dynamic malware analysis. Since the framework is
automatic, it saves time and human resources in preparing
malware for dynamic malware analysis. Real malware samples
from the wild are used to evaluate the proposed solution, and
the evaluation demonstrates that the proposed solution is
effective.
The remainder of this paper is organized as follows. Section
2 discusses related work. Section 3 covers the background
knowledge related to shellcode detection and dynamic malware
analysis platforms. The formal model of the research problem
and its solution are discussed in Section 4. Section 5
contains the system design and implementation, while Section
6 presents the results of the evaluation using wild malware
samples. Section 7 discusses the limitations of the proposed
system, and Section 8 summarizes the results of this paper.
II. RELATED WORK
Although the malware-loading problem remains unsolved,
the following related work helps in understanding it:
shellcode detection, dynamic malware analysis, and
code coverage for malware discovery.
Shellcode detection research focuses on detecting
shellcode in an arbitrary file; its goal is to discover
shellcode traits in any given file. Since the majority of
shellcode is encrypted to evade static signature detection,
most shellcode research relies on dynamic execution to locate
the traits of shellcode. Research [16–18] relies on an x86
emulator [19] to locate these traits, which include various
implementations of GetPC code [20] and suspicious techniques
for locating APIs in memory. These are crucial pieces of
information that malicious shellcode programmers need to
perform useful attacks. Although these coding tricks are
applicable to any given program or shellcode, a legitimate
program author does not have to take such a long route to
obtain this program information.
Dynamic malware analysis is used to extract malware
behavior by executing the targeted malware. Researchers [7–9]
have dedicated their efforts to the correctness of the extracted
execution behaviors. These efforts are vital for antivirus
companies to develop antidotes that remove malware from a
victim host, as well as detection signatures that prevent
infection. The problems of malware execution and trigger-
based malware are still not fully addressed in this research
domain.
Exploring malware code coverage [11–13] mixes
symbolic and concrete execution to determine the execution
conditions that trigger malicious behavior when malware
executes. The concept is based on concolic testing [14], [15],
where the code coverage of the analysis is maximized for
software reliability. However, without knowledge of the trigger
type, the process of finding a solution can be time consuming
and resource intensive. Our proposed solution aims to quickly
determine execution conditions for activating malware; it is
meant to complement, not replace, trigger-based malware
condition identification research.
III. BACKGROUND KNOWLEDGE
Malware is built to install itself on a victim and perform
malicious acts such as infiltration, building botnets, and taking
control of the victim. To infect a victim with a piece of
malware, an attacker needs to trick the victim into installing the
desired malware. The methods can be divided into active and
passive from the attacker's perspective. The active method
leverages security vulnerabilities to install the desired malware.
The passive method, on the other hand, requires some medium
to trick users into downloading and installing the malware; in
this scenario, the attacker must wait until a victim responds. In
this paper, we limit our scope to an active method called
arbitrary code execution, which leverages memory corruption
vulnerabilities remotely.
As mentioned in Section I, a remote exploit that performs
arbitrary code execution on a memory corruption vulnerability
often involves two pieces of software: shellcode and malware.
Shellcode is a code segment that stages malware installation
and execution, while the malware carries out post-attack
objectives. Shellcode is usually compact in size so that it fits
into a network packet, and it uses a limited set of application
programming interface (API) functions to remain highly
portable. One particularly common choice of API functions
appearing in wild shellcode samples is LoadLibrary and
GetProcAddress from Kernel32.dll [21–23]. As the names
suggest, these APIs are used to load libraries dynamically and
to locate API functions within the loaded libraries;
Kernel32.dll is loaded for all Windows applications. Since
shellcode exercises distinctive behaviors when searching for
these APIs, research [16–18] used these behaviors as signatures
to detect the existence of shellcode in a file or a network packet.
Other detection methods also exist in the cited literature,
based on the idea that shellcode attempts to discover the
memory location at which it currently resides. This information
is particularly needed when the shellcode adopts code
obfuscation, since the decryption routine must determine the
memory location of the obfuscated code. The GetPC code [20]
is not encrypted, given the need for direct execution; hence, it
is an effective anchor for locating the shellcode.
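As a simplified illustration of the GetPC idea above, the following Python sketch statically scans a byte buffer for two well-known GetPC byte sequences. This is only a stand-in under stated assumptions, not the emulation-based detectors of [16–18], which execute the bytes rather than pattern-match; the patterns chosen and the function name are ours.

```python
# Illustrative heuristic only (assumed patterns, our function names), not the
# emulation-based detectors of [16-18]: statically scan a buffer for two
# well-known GetPC byte sequences.
GETPC_PATTERNS = [
    bytes([0xE8, 0x00, 0x00, 0x00, 0x00]),  # call $+5: pushes EIP onto the stack
    bytes([0xD9, 0x74, 0x24, 0xF4]),        # fnstenv [esp-0xC]: spills the FPU EIP
]

def find_getpc_offsets(buf: bytes) -> list:
    """Return byte offsets of candidate GetPC constructs found in buf."""
    hits = []
    for pat in GETPC_PATTERNS:
        start = 0
        while (i := buf.find(pat, start)) != -1:
            hits.append(i)
            start = i + 1
    return sorted(hits)
```

A static scan like this is easily evaded by encoding variations, which is precisely why the cited detectors rely on emulation rather than byte signatures.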
Dynamic malware analysis [7–9] adopts the strategy of
recording execution behavior in a controlled environment.
The method is useful for capturing malware behaviors;
however, activating the malware is a basic requirement for
using it. A piece of software is defined as malware if it
performs malicious functions. Just like any software, malware
may need supplementary information to run, such as
parameters, arguments, and its entry point. Without this
information, the malware
Reliability, November 2016
cannot be executed on a dynamic malware analysis platform.
Therefore, we propose an activation scheme that leverages
shellcode detection results to provide the execution information
needed to activate the targeted malware.
IV. THE PROBLEM
As aforementioned, supplementary information may
be required in order to activate malware. Without this
required information, malware cannot be analyzed on a
dynamic malware analysis platform. Therefore, we leverage
knowledge from the shellcode detection research domain to
extract the required information. The extracted information can
then aid dynamic malware analysis in extracting the malicious
behaviors of a studied malware sample. The problem is
formally defined as follows.
A computer program can be modeled as a Mealy machine, where a set of inputs and a set of states generate a set of outputs. Since the problem of interest concerns computer programs (malware), it is appropriate to model the problem using a Mealy machine.
Definition 1: an FSM (Finite State Machine) is a quintuple
M = (I, O, S, δ, λ), where I, O, and S are finite and nonempty sets of input
symbols, output symbols, and states, respectively; δ: S × I → S is the state
transition function; and λ: S × I → O is the output function.
Majuscule letters represent sets and minuscule letters
represent elements.
We use Definition 1 to define the properties of dynamic malware analysis. Dynamic malware analysis adopts the concept of black-box testing, where the analysis focuses only on the outputs generated by the malware; the internal structure of the malware is not studied. Thus, we define the inputs and the outputs of dynamic malware analysis for the research problem in Definition 2.
Definition 2: input and output of dynamic malware analysis.
Suppose the target of interest for dynamic malware
analysis is M = (I, O, S, δ, λ).
Input: the starting state s0 ∈ S and the input symbols i ∈ I supplied to it
Output: the observed output symbols o ∈ O
According to Definition 2, the necessary conditions to apply dynamic malware analysis are the starting state of the targeted malware and the corresponding input for that starting state. The conditions refer to parameters/arguments as the input symbols and the entry point as the starting state of the malware under investigation. The outputs refer to execution behaviors such as additions, removals, and modifications of main memory, files, etc. Hence, knowing the starting state and the input symbols for that state is crucial to applying dynamic malware analysis.
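To make the Mealy-machine view concrete, the sketch below implements a toy machine in Python. All states, inputs, and outputs are invented for illustration; the point is that run() cannot be invoked at all without a starting state and input symbols, which is exactly the activation problem described here.

```python
# Toy Mealy machine M = (I, O, S, delta, lambda) illustrating the black-box
# view of dynamic analysis. All states, inputs, and outputs are invented.
class MealyMachine:
    def __init__(self, delta, lam):
        self.delta = delta  # transition function: (state, input) -> next state
        self.lam = lam      # output function: (state, input) -> output

    def run(self, start_state, inputs):
        """Feed inputs from a starting state; observe only the emitted
        outputs (black-box testing view of dynamic malware analysis)."""
        state, outputs = start_state, []
        for sym in inputs:
            outputs.append(self.lam[(state, sym)])
            state = self.delta[(state, sym)]
        return outputs

# Without the correct starting state ("entry") and input symbols, run()
# cannot be invoked, which mirrors the malware activation problem.
delta = {("entry", "run"): "active", ("active", "run"): "active"}
lam = {("entry", "run"): "drop_file", ("active", "run"): "beacon"}
machine = MealyMachine(delta, lam)
```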
Before we model a solution to the problem, we need to state our assumptions about remote exploits that leverage buffer overflow vulnerabilities. According to research [1–4], there are network exploits that download malware to infect victims. Within this exploitation, the following assumptions are made:
- A piece of shellcode is included as the exploitation input.
- The shellcode is responsible for downloading the malware and executing the downloaded malware.
A program vulnerable to memory corruption can be modeled as follows according to Definition 1. Note that the vulnerabilities of a program often arise from a careless implementation of the original FSM. Given such a careless implementation, we use primes to represent the elements that differ from the original FSM.
Let M′ = (I′, O′, S′, δ′, λ′) be a program that is prone to memory corruption,
such that:
i′ ∈ I′ is an input that causes memory corruption;
s′ ∈ S′ is a state that contains a memory
corruption vulnerability;
o′ ∈ O′ is the output generated by the state
prone to memory corruption;
δ′(s′, i′) is the state after
memory corruption; and
λ′(s′, i′) is the output generated after s′ is
exploited.
Majuscule letters represent sets and minuscule letters represent elements.
According to the defined assumptions, three FSMs are involved in a scenario of memory corruption exploitation. They are:
M′, a program that is prone to
memory corruption due to erroneous implementation;
Mshell, a piece of shellcode that
carries out the actions of malware download and
downloaded-malware execution;
Mmal, a piece of malware that
is downloaded and executed by Mshell; it is also the
target for dynamic malware analysis.
The relations of the three FSMs are described as follows.
For Mmal to execute given M′:
an exploit input i′ exists in I′, and
i′ contains the shellcode,
where the embedded code segment is identical to Mshell.
Mshell can be executed independently if
the starting state of Mshell is identified
and Mshell is independent of the context of M′,
where the initial input symbol of Mshell is null.
Since Mshell is responsible for downloading and executing Mmal, individual execution of Mshell will cause Mmal to execute if no dependency exists between M′ and Mshell. A typical program dependency is the context of a given program, i.e., the context of the current memory snapshot. Therefore, if Mshell and M′ are independent, direct execution of Mshell will cause Mmal to be downloaded and executed.
V. PROPOSED SCHEME
The goal is to extract shellcode from remote exploit network traffic in order to execute the downloaded malware. The execution behavior can be reconstructed by taking the shellcode directly from the remote exploit packet, i.e., the network stream recorded during remote honeypot exploitation. The recorded network streams are passed to a shellcode detection algorithm to identify the offset of the shellcode. Upon identification, the shellcode is extracted from the recorded network stream and packed as an executable program, which is then ready to be analyzed on a dynamic malware analysis platform. To ensure that the sandbox evaluation process carries out the malware download action accordingly, a gateway router is set up to redirect the malware download network traffic to a customized malware host.
Figure 1 depicts the relations between the proposed scheme, the honeypot, and the dynamic malware analysis system. The honeypot captures wild malware, while the dynamic malware analysis platform analyzes the captured malware to develop signatures for system protection. As aforementioned, a portion of the captured wild malware cannot be directly executed on the dynamic malware analysis platform because it requires supplementary information for execution. Therefore, we propose a new scheme that operates between the honeypot and the dynamic malware analysis platform to assist the malware analysis process.
Fig. 1. System architecture of the proposed scheme
There are three major components in the proposed scheme: shellcode extraction, shellcode packaging, and the analysis gateway. The shellcode extraction component leverages a shellcode detection algorithm to locate the shellcode offset value in the incoming packet. The offset value is used to extract the shellcode as the input for the shellcode packaging component. The shellcode packaging component packages the shellcode as an executable file that can be directly executed on the targeted dynamic malware analysis platform. The last component, the analysis gateway, acts as the network gateway for shellcode execution when the shellcode is analyzed on the dynamic malware analysis platform. In order to serve the corresponding malware when the executable shellcode makes its request, the analysis gateway must preserve the information from the Uniform Resource Identifier (URI) detection component on the honeypot. The preserved information is later used to serve the corresponding malware when the request is made by the captured shellcode.
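The extraction and packaging steps can be sketched as follows in Python. This is a deliberately simplified stand-in under stated assumptions: the detector has already reported the shellcode offset, and the "package" is a toy magic-plus-length container rather than the Windows executable the real scheme produces; all names here are ours.

```python
# Sketch of the extraction and packaging steps, assuming a detector has
# already reported the shellcode offset. The "package" here is a toy
# magic-plus-length container; the real scheme emits a Windows executable
# whose entry point transfers control to the embedded shellcode.
def extract_shellcode(stream: bytes, offset: int, length=None) -> bytes:
    """Slice the detected shellcode out of a recorded network stream."""
    end = len(stream) if length is None else offset + length
    return stream[offset:end]

def package_shellcode(shellcode: bytes) -> bytes:
    """Wrap shellcode as magic + little-endian length + payload."""
    return b"SCPK" + len(shellcode).to_bytes(4, "little") + shellcode
```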
Executing the packaged shellcode on the dynamic malware
analysis platform will trigger the behavior the exploitation
originally designed, provided the shellcode has no dependency
on the exploited network service. The expected behavior
includes the download of malware and the execution of the
downloaded malware. Both behaviors are recorded, and reports
are generated accordingly for information security experts to
develop signatures for the analyzed malware.
VI. EVALUATION
The evaluation samples were collected by deploying Dionaea [24], a honeypot, in the National Chiao Tung University (NCTU) campus network. The honeypot used a public Internet Protocol (IP) address and collected wild malware for over one year. In total, 343 distinct wild malware samples were collected, and over 15,000 exploits were recorded during the operation period. The wild malware samples were downloaded from 807 distinct URIs at 755 unique IP addresses. They were mainly delivered using the Hypertext Transfer Protocol (HTTP) and the Server Message Block (SMB) protocol, accounting for 658 and 149 downloads, respectively.
Figure 2 shows the file type distribution of the collected wild malware samples. There are 275 executable malware samples and 68 data files in total. Among the executable samples, 247 were delivered by HTTP individually, and the other 28 were delivered together with the 68 data/raw files carried by SMB. After investigating the exploitations behind the malware samples, they can be divided into two categories based on the results of exploitation: category one achieved arbitrary code execution [24], [25], while the other achieved remote code execution [26], [27]. The two categories are distinguished by the existence of shellcode when the vulnerability is exploited: the former results in arbitrary code/shellcode execution, while the latter calls library APIs to achieve its malicious acts. The malware samples delivered by HTTP belong to category one, and the malware samples delivered by SMB belong to the other category. Since the proposed scheme is shellcode based, this evaluation focuses only on the 247 executable malware samples. Figure 2 also shows the ratio of successful malware execution without prior knowledge of the execution conditions. About 15% of the wild DLL malware samples and 38% of the wild executable malware samples require additional information for malware execution. Again, only the DLL malware samples are eligible for this evaluation; therefore, 15% of the evaluation malware samples require additional information for dynamic malware analysis.
Fig. 2. Sample malware file type distribution
Using the implementation of the proposed scheme, the shellcode of the collected remote exploits was extracted and wrapped as Windows binary executables for dynamic malware analysis. The dynamic malware analysis platform for the evaluation is the Malware Behavior Analyser (MBA) [28], [29], an enhancement of the QEMU [30] platform. Like other dynamic malware analysis platforms, MBA is designed to execute malware samples in a virtual environment and record their execution behaviors, including alterations to the file system and registry as well as packet captures throughout the execution. Each of the evaluated shellcode samples is given a window of 2 minutes to carry out its execution. After this window, a report is generated by the MBA and then summarized to evaluate the result of the execution.
Since there are over 14,000 samples available for evaluation, an experiment environment was set up to carry out the experiment automatically. The environment is based on a distributed system design in which a master machine distributes evaluation samples to 13 slave machines with MBA pre-installed. Once an evaluation task is completed on a slave machine, the report is sent back to the master machine for summarization. During the experiment, each evaluation sample tries to acquire the executable malware from a remote host, as the shellcode originally intended. The malware acquisition traffic is redirected to a customized malware repository host by a preconfigured router. The repository host is designed to read the HTTP request and respond according to a preconfigured database.
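The repository host's behavior, answering a shellcode's download request from a preconfigured database, can be sketched as below. The URI, the mapping, and the sample bytes are all hypothetical; a real deployment serves this lookup over HTTP behind the redirecting gateway router.

```python
# Sketch of the customized malware repository host, assuming a hypothetical
# preconfigured URI-to-sample database. A real deployment serves this lookup
# over HTTP behind the redirecting gateway router.
MALWARE_DB = {
    "/update.exe": b"MZ...sample-bytes...",  # hypothetical preserved URI and sample
}

def serve_request(uri: str):
    """Answer a shellcode's download request like a minimal HTTP handler:
    (200, sample bytes) if the URI was preserved at capture time,
    else (404, empty body)."""
    body = MALWARE_DB.get(uri)
    return (200, body) if body is not None else (404, b"")
```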
As a result of the evaluation, 97% of the evaluated samples displayed clear signs of malware execution. After investigating the failures, the failed shellcode samples can be grouped into three categories. In category one, the evaluated shellcode does not demonstrate the malware acquisition behavior. This problem is due to delivery failures at the malware host: in the experiment environment, only the simplest HTTP server was set up to answer malware acquisition calls, and due to its simplicity, HTTP request-dropping events occurred occasionally. This problem can be resolved by re-evaluating the failed shellcode samples or by deploying a more reliable malware host. The second category demonstrated only the beginning of the malware acquisition behavior. This is a common problem when evaluating a malware sample on a dynamic analysis platform: the execution window expires before the malware is completely acquired from the malware host. To resolve the problem, a longer execution window is assigned for another round of evaluation. The last category shows no sign of shellcode execution at all. After investigation, the error occurs during the shellcode extraction process when a code segment exists before the location of the GetPC code [20], in other words, before the detected shellcode offset. Resolving this problem requires human interpretation to correct the shellcode offset and the entry point of the shellcode.
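The remedy described for the second failure category, assigning a longer execution window and re-running, can be sketched as a simple retry loop. run_sample is a hypothetical callback standing in for an MBA run, and the doubling policy is our illustrative choice; only the 120-second base window comes from the 2-minute window used in the evaluation.

```python
# Sketch of a re-evaluation policy for timeout failures: re-run a failed
# sample with a doubled execution window each round. run_sample is a
# hypothetical callback standing in for an MBA run; the 120-second base
# window matches the 2-minute window used in the evaluation.
def evaluate_with_retry(sample, run_sample, base_timeout=120, max_rounds=3):
    """Return the window (seconds) that sufficed, or None if all rounds fail."""
    timeout = base_timeout
    for _ in range(max_rounds):
        if run_sample(sample, timeout):  # True once acquisition + execution finish
            return timeout
        timeout *= 2
    return None  # still failing; needs manual inspection
```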
VII. LIMITATIONS AND FUTURE WORK
The proposed system does not guarantee that malware execution exercises the intended malicious behaviors. Ensuring the correctness of the intended malicious behaviors is a difficult problem to solve. Instead, our evaluation takes the approach of ensuring that the targeted malware has been exercised and that our proposed scheme works properly according to the hypothesis.
One limitation concerns shellcode detection for different types of remote exploit. As mentioned in the introduction, other forms of remote exploitation exist, e.g., document files with embedded macros/scripts that decrypt and execute shellcode. In such a scenario, the shellcode is usually encrypted, so the shellcode detection heuristic used in this paper will produce a false negative. Studies [31–34] showed that the embedded macro/script is the medium for attacking the vulnerable functions and for decrypting and executing the shellcode. Due to the scope of this paper, document malware of this kind is not included in the evaluation of our proposed scheme. However, the scheme remains effective if the shellcode detection module is substituted with an appropriate document shellcode detection scheme, along with minor adjustments to the shellcode extraction scripts. If the extracted shellcode does not rely on the context of the document reader, the proposed scheme will work as originally proposed.
Shellcode execution that is strongly coupled with the context of the platform is another problem to be addressed. We were well aware of this problem at the beginning of this research but decided to address it in future work, for the following reasons. For a piece of shellcode to remain portable, its developer will try to avoid depending on the context of the platform, so that the shellcode can be adapted to a number of exploits; thus, there is a sizable population of portable shellcode in the wild. Another consideration is the malware sample collection: since the malware samples are collected by a passive honeypot (Nepenthes) in which the vulnerabilities are emulated, it is reasonable to assume that the collected samples have minimal dependency on the vulnerable services.
VIII. CONCLUSION
In this paper, we proposed a new scheme that enables dynamic malware analysis of malware requiring supplementary execution information. The scheme leverages the shellcode detection results of honeypot/intrusion detection research to extract shellcode that aids the dynamic malware analysis platform. As a result, the extracted shellcode acts as an agent that executes the target malware for dynamic malware analysis. The scheme was evaluated with wild malware captured by the Dionaea honeypot platform, and 97% of the malware samples demonstrated execution behaviors
when the proposed scheme was applied. Additionally, a formal model is constructed to support the proposed scheme. Using our scheme, information security experts can immediately analyze a piece of malware that requires supplementary execution information, without spending additional time determining the execution conditions. Furthermore, the proposed scheme is automated, saving human effort in generating the execution conditions for malware.
ACKNOWLEDGMENTS
This work was supported in part by MOST 104-2218-E-
001-002, Taiwan Information Security Center (TWISC),
Ministry of Science and Technology, Ministry of Education
(R.O.C.), Bureau of Investigation, National Security Council of
Taiwan, TTC, NCC, Chunghwa Telecom, TSMC, and Trend
Micro Inc.
REFERENCES
[1] G. Gu, P. Porras, V. Yegneswaran, M. Fong, and W. Lee, "BotHunter: Detecting Malware Infection Through IDS-Driven Dialog Correlation," in Proc. of 16th USENIX Security Symp., 2007, pp. 167-182.
[2] I. Arce, “The Shellcode Generation,” in Security & Privacy 2.5, IEEE, 2004, pp. 72-76.
[3] D. Kennedy, J. O’Gorman, D. Kearns, and M. Aharoni, Metasploit: the penetration tester's guide. San Francisco, CA: No Starch Press, 2011.
[4] Metasploit Development Team, (2014), Metasploit Project [Online]. http://www.metasploit.com.
[5] P. Baecher, M. Koetter, T. Holz, M. Dornseif, and F. Freiling, “The Nepenthes Platform: An Efficient Approach to Collect Malware,” in Recent Advances in Intrusion Detection (RAID), 2006, pp. 165-184.
[6] M. Sikorski, and A. Honig, Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software. San Francisco, CA: No Starch Press, 2012.
[7] C. Willems, T. Holz, and F. Freiling, “Toward Automated Dynamic Malware Analysis Using CWSandbox,” in Journal of IEEE Security and Privacy 5.2, March 2007, pp. 32-39.
[8] U. Bayer, C. Kruegel, and E. Kirda, “TTAnalyze: A Tool for Analyzing Malware.” na, 2006.
[9] C. Guarnieri, et al., (2014), Cuckoo Sandbox [Online]. http://www.cuckoosandbox.org.
[10] U. Bayer, I. Habibi, D. Balzarotti, E. Kirda, and C. Kruegel, “A View on Current Malware Behaviors,” in USENIX Workshop on Large-scale Exploits and Emergent Threats (LEET), April 2009.
[11] D. Brumley, et al., “Automatically Identifying Trigger-based Behavior in Malware,” in Botnet Detection. Springer US, 2008, pp. 65-88.
[12] A. Moser, C. Kruegel, and E. Kirda, “Exploring Multiple Execution Paths for Malware Analysis,” in Proc. of the 2007 IEEE Symp. on Security and Privacy (SP ’07), 2007, pp. 231-245.
[13] M. Egele, T. Scholte, E. Kirda, and C. Kruegel, “A Survey on Automated Dynamic Malware-analysis Techniques and Tools,” in ACM Computing Surveys 44.2:6 (CSUR), February 2012.
[14] J. C. King, “Symbolic Execution and Program Testing,” in Communications of the ACM 19.7, July 1976, pp. 385-394.
[15] K. Sen, D. Marinov, and G. Agha, “CUTE: a Concolic Unit Testing Engine for C”, in the Proc. of the 10th European Software Engineering Conference (ESEC/FSE-13), 2005, pp. 263-273.
[16] M. Polychronakis, K. G. Anagnostakis, and E. P. Markatos, “Comprehensive Shellcode Detection using Runtime Heuristics,” in Proc. of the 26th Annu. Computer Security Applications Conf. (ACSAC ‘10), 2010, pp. 287-296.
[17] M. Polychronakis, K. G. Anagnostakis, and E. P. Markatos, “Emulation-based Detection of Non-self-contained Polymorphic Shellcode,” in Recent Advances in Intrusion Detection (RAID), 2007, pp. 87-106.
[18] Q. Zhang, D. S. Reeves, P. Ning, and S. P. Iyer, “Analyzing Network Traffic to Detect Self-decrypting Exploit Code,” in Proc. of the 2nd ACM Symp. on Information, Computer and Communications Security (ASIACCS ‘07), March 2007, pp. 4-12.
[19] P. Baecher and M. Koetter, (2014), libemu [Online]. http://libemu.carnivore.it.
[20] SkyLined and Cipher, (2014), Hacking/Shellcode/GetPC [Online]. http://skypher.com/wiki/index.php?title=Hacking/Shellcode/GetPC.
[21] C. Anley, J. Heasman, F. Linder, and G. Richarte, The Shellcoder's Handbook: Discovering and Exploiting Security Holes, 2nd Edition. Indianapolis: Wiley, 2007.
[22] Skape, (2003), Understanding Windows Shellcode [Online]. http://www.hick.org/code/skape/papers/win32-shellcode.pdf.
[23] SkyLined and Cipher, (2014), Hacking/Shellcode/kernel32 [Online]. http://skypher.com/wiki/index.php?title=Hacking/Shellcode/kernel32.
[24] Dionaea Development Team, (2014), Dionaea Catches Bug [Online]. http://dionaea.carnivore.it/.
[25] Microsoft Security TechCenter, (2014), Microsoft Security Bulletin MS08-067 – Critical [Online]. https://technet.microsoft.com/en-us/library/security/ms08-067.aspx.
[26] Dionaea Development Team, (2014), MS10-061 Attacks? [Online]. http://carnivore.it/2010/10/18/ms10-061_attacks.
[27] Microsoft Security TechCenter, (2014), Microsoft Security Bulletin MS10-061 – Critical [Online]. https://technet.microsoft.com/en-us/library/security/ms10-061.aspx.
[28] C. W. Wang, C. W. Wang, C. W. Hsu, and S. P. Shieh, “Malware Behavior Analysis Based on Virtual Machine Introspection and Snapshot Comparison,” in the 20th Cryptology and Information Security Conference (CISC 2010), Taiwan, May 2010, pp. 69-74.
[29] C. W. Wang, and S. P. Shieh, “SWIFT: Decoupled System-Wide Information Flow Tracking and its Optimizations,” in Journal of Information Science and Engineering, 31.4, 2015, pp. 1413 – 1429.
[30] F. Bellard, “QEMU, a Fast and Portable Dynamic Translator,” in USENIX Annu. Technical Conf., FREENIX Track, April 2005, pp. 41-46.
[31] D. Stevens, “Malicious PDF Documents Explained,” in Security & Privacy 9.1, IEEE, 2011, pp. 80-82.
[32] Z. Tzermias, G. Sykiotakis, M. Polychronakis, and E. P. Markatos, “Combining Static and Dynamic Analysis for the Detection of Malicious Documents,” in Proc. of the Fourth European Workshop on System Security (EUROSEC ’11), 2011.
[33] R. Merritt, (2014), Analyzing PDF Malware – Part 1 [Online]. http://blog.spiderlabs.com/2011/09/analyzing-pdf-malware-part-1.html.
[34] R. Merritt, (2014), Analyzing PDF Malware – Part 2 [Online]. http://blog.spiderlabs.com/2012/01/analyzing-pdf-malware-part-2.html.
AUTHOR BIOGRAPHY
Michael Cheng Yi Cho is a Ph.D. candidate in the Department of Computer Science at National Chiao Tung University, Hsinchu, Taiwan. His research interests include Named Data Networking, access control, network security, and malware analysis.
Zhi-Kai Zhang is a Ph.D. candidate in the Department of Computer Science at National Chiao Tung University, Hsinchu, Taiwan. His research interests include cryptography, IoT security, cloud security, and access control.
Prof. Shiuhpyng Winston Shieh received his M.S. and Ph.D. degrees in electrical and computer engineering from the University of Maryland, College Park. He is a Distinguished Professor of the Computer Science Department and the Director of the Taiwan Information Security Center at National Chiao Tung University (NCTU). He has served as the adviser to the National Security Council of Taiwan, the chair of the Department of Computer Science, NCTU, and President of the Chinese Cryptology and Information Security Association (CCISA). Actively involved in IEEE, he has served as EIC of IEEE Reliability Magazine, EIC of the RS Newsletter, Reliability Society VP Tech, and Editor of IEEE Trans. on Reliability and IEEE Trans. on Dependable and Secure Computing. Dr. Shieh has also served as an ACM SIGSAC Awards Committee member, Associate Editor of ACM Trans. on Information and System Security, Journal of Computer Security, Journal of Information Science and Engineering, and Journal of Computers, and guest editor of IEEE Internet Computing. Furthermore, he has been on the organizing committees of many conferences, for example as the founding Steering Committee Chair and Program Chair of the ACM Symposium on Information, Computer and Communications Security (AsiaCCS), founding Steering Committee Chair of the IEEE Conference on Dependable and Secure Computing, and Program Chair of the IEEE Conference on Security and Reliability. Along with Virgil Gligor of Carnegie Mellon University, he obtained the first US patent in the intrusion detection field, and he has published 200 technical papers, patents, and books. He is an IEEE Fellow, ACM Distinguished Scientist, and Distinguished Professor of the Chinese Institute of Electrical Engineers. His research interests include system penetration and protection, malware behavior analysis, and network and system security.
Submission Instructions
Authors should use the designated IEEE Reliability
Manuscript Central website to submit their papers. Please
follow these steps to submit your paper:
1. Login to IEEE Reliability Manuscript Central. If you
have no account, sign up for one.
2. Click “Authors: Submit an article or manage
submissions”.
3. Please click “CLICK HERE” at the bottom of the page,
and you will be brought to the five-step submission
process.
4. You need to 1) choose the section to which you are
submitting your paper; 2) complete the submission
checklist; 3) enter comments for the editor, which are
optional; and 4) save and continue.
5. If you have any supplementary files, please upload them
in step 4.
Manuscript Types
Manuscripts for regular issues fit within the scope of the
magazine, but are not intended for a special issue. Special
issue manuscripts cover a specific topic scheduled on our
editorial calendar. Please select the appropriate issue
(manuscript type) when uploading your manuscript. For
more information and to see upcoming special issue topics,
see our Editorial Calendar at
http://rs.ieee.org/reliability-digest/author-guidelines.html.
Typing Specifications
The manuscript should be written in Times New Roman in a
double-column format. The typical length of the submitted
manuscript is 4 single-spaced pages. The text portion of the
manuscript should be in 10-point font and the title should be
in 24-point font, bold.
Manuscript Length
The typical length of the submitted paper is 4 pages,
including text, bibliography, and author biographies. Please
note that proper citations are required.
Illustrations
The illustrations in the articles must be cited in the text and
numbered sequentially. Captions that identify and briefly
describe the subject are needed as well. In order to avoid
dense and hard-to-read illustrations, graphs should show
only the coordinate axes, or at most the major grid lines.
Line drawings should be clear. To prevent potential layout
problems, related figures described within the same section
of text should be grouped together as parts (a), (b), and
so on.
References
All manuscript pages, footnotes, equations, and references
should be labeled in the consecutive numerical order in
which they are mentioned in the text. Figures and tables
should be cited in the text in numerical order.
Biographical Sketch
A brief biographical sketch should contain the full title of
the paper and complete names, affiliations, addresses, and
electronic mail addresses of all authors. The corresponding
author should be indicated.
Please provide a short biography and a picture for each
author at the end of your paper. The short biography
should contain no more than 150 words.
Copyright
The IEEE Reliability Society owns the copyright of the
articles published in Reliability. If you wish to reproduce the
copyrighted material, please contact us to seek
permission. The contents of this website can be referenced
with proper citation.
Special Issue Proposal Submissions
For a special issue of Reliability, experts are welcome to
serve as guest editors. For more information,
please contact the Editor-in-Chief of Reliability,
Shiuhpyng Shieh: ssp@cs.nctu.edu.tw.