ABSTRACT: Structural health monitoring is based on the development of reliable and robust indicators able to detect, locate,
quantify and predict damage. Studies related to damage detection in civil engineering structures have a noticeable interest for
researchers in this area. Indeed, the detection of structural changes likely to become critical can avoid the occurrence of major
dysfunctions associated with social, economic and environmental consequences. Recently, many researchers have focused on
dynamic assessment as part of structural diagnosis. Most of the studied techniques are based on time or frequency domain
analyses to extract compressed information from modal characteristics or based on indicators built from these parameters. This
work has as its main interest the use a soft-clustering method coupled with the Symbolic Data Analysis (SDA) to detect
structural damage. SDA is applied to dynamic measurements obtained on site (accelerations) and to the identified modal
parameters. In order to attest the efficiency of the proposed approaches, an experimental investigation is carried out. It is shown
that SDA coupled with clustering methods is able to distinguish structural conditions with adequate rates.
KEY WORDS: Structural Health Monitoring, Damage Detection, Novelty Techniques
1 INTRODUCTION
Studies related to early damage detection is of special concern
for civil engineering structures. It is common knowledge that
if a damage process is not identified in time, structural
systems may have serious safety and economic consequences.
Traditional methods of damage detection and health
monitoring are often based on the variation of structural
vibration characteristics, i.e. natural frequencies, damping
ratios and mode shapes. These modal parameters are directly
affected by changes in the physical properties of the structure
including its mass and stiffness. Nevertheless, modal
parameters identification is a sort of filtering process, leading
to a loss of information compared to the raw data. This
compression process can erase any small changes due to a
structural modification. In turn, using raw dynamic
measurements (especially if high sampling frequencies are
used) leads to the storage of large set of data. Dynamic
measurements can easily contain over thou-sands of values
making an analysis process extensive and prohibitive.
Nevertheless, several damage detection methods exist in the
literature based on signature principles, but they usually fail
when making them practical. In this sense, despite the current
processing power of computers, the necessary computational
effort to manipulate large data sets re-mains a problem.
Furthermore, and this is certainly the major drawback when
using modal parameters, is that modal components are
essentially describing an equivalent linear behavior, a feature
which may be not exact for the analysis of specific degraded
systems.
Data mining is the process of extracting hidden patterns
from data. As more data is gathered in monitoring, data
mining is becoming an increasingly important tool to trans-
form this data into information. It is commonly used in a wide
range of profiling practices, such as marketing, fraud
detection and scientific discovery. Data mining can be applied
to data sets of any size. However, while it can be used to
uncover hidden patterns in data that has been collected,
obviously it can neither uncover patterns which are not
already present in the data, nor can it uncover patterns in data
that has not been collected. This richer type of data is called
symbolic data and it allows representing the variability and
uncertainty present in each variable. The development of new
methods of data analysis suitable for treating this type of data
is the aim of Symbolic Data Analysis (SDA). Most of the
currently developed techniques in symbolic data analysis are
extensions of statistical methods. Diday et al. [1] certify the
growth of data of symbolic nature and alert to the necessity of
developing new statistical methodologies for the treatment of
such information. In general, SDA provides suitable tools for
managing complex, aggregated, relational, and higher-level
data. The methodological issues under development
generalize the classical data analysis techniques, like
visualization, factorial techniques, decision tree,
discrimination, regression as well as classification and
clustering methods. Previous works in the literature
introduced the use of SDA applied to data representing
structural temperature variation in time.
The purpose of this paper is to present some principles of
symbolic data analysis and, more specifically, the use of
clustering methods applied to structural damage assessment.
The idea consists in using different classification procedures
in order to have insights about structural health conditions. In
other words, the SDA is applied to either the vibration data
(signals) obtained through dynamic tests or the modal
parameters in order to infer if a damage process is in progress
or if it has already occurred in the structure. In order to show
the potentialities of the proposed methodology, several results
regarding experimental tests performed on a real case, a rail
Symbolic soft-clustering applied to structural health monitoring
Vinícius Alves1, Alexandre Cury2, Flávio Barbosa, Christian Cremona3
1University of Ouro Preto, Post-Graduate Program in Civil Engineering, Ouro Preto, Brazil 2University of Juiz de Fora, Department of Applied Computational Mechanics, Juiz de Fora, Brazil
3Technical Centre for Bridge Engineering, CEREMA/DTITM, Sourdun, France
email: [email protected], [email protected], [email protected], christian.cremona@developpement-
durable.gouv.fr
Proceedings of the 9th International Conference on Structural Dynamics, EURODYN 2014Porto, Portugal, 30 June - 2 July 2014
A. Cunha, E. Caetano, P. Ribeiro, G. Müller (eds.)ISSN: 2311-9020; ISBN: 978-972-752-165-4
2457
bridge located in France on the high speed line between Paris
and Lyon. Vibration measurements were obtained under three
different conditions: before, during and after a structural
modification process.
2 PRINCIPLES OF SYMBOLIC CLUSTERING
METHODS
In general, data acquisition campaigns in civil engineering
structures gather thousands of accelerations values measured
by several sensors. Consequently, analyzing all of these data
(classical data) directly may usually be time-consuming or
even prohibitive. In this sense, transforming this massive
quantity of data into a compact but also rich descriptive type
of data (symbolic data) becomes an attractive approach. Let us
consider, for instance, a signal X (which is part of a dynamic
test) containing n acceleration values measured by one single
sensor. There are several ways to transform classical data into
symbolic data. This signal can be represented by:
a k-category histogram: X={a1(n1), a2(n2), a3(n3), …,
ak(nk)};
an interquartile interval: X = [a25%; a75%];
a min/max interval: X = [amin; amax].
The same representation can be applied to modal
parameters, i.e. natural frequencies and mode shapes. In other
words, both of these quantities can be represented by intervals
or histograms. Transforming classical data to symbolic data is
carried out almost instantly, which does not prohibit or make
difficult the use of this methodology for a large ensemble of
dynamic tests. 10-category histograms and interquartile
intervals were used in the present study.
Data clustering is a common technique for statistical data
analysis, which is used in many fields. A clustering procedure
is an unsupervised learning process and can be defined as a
way of classifying a number of objects into different groups.
More precisely, it can be described as the partitioning of a
data set into subsets (clusters), so that the data in each subset
share some common properties. For an appropriate clustering,
it is necessary to minimize the within-cluster variation to
obtain the most homogeneous clusters as possible and, as a
natural consequence, to maximize the between-cluster
variation to obtain the most dissimilar clusters among each
other. To define these clusters and determine the proximity (or
similarity) among the tests, it is necessary to define suitable
dissimilarity measures. In a common sense, the lower these
values are the more similar the objects are and thus, they are
gathered in the same cluster. Conversely, the objects allocated
into different clusters are the ones that have greater distances
between them. Dissimilarity measures can take a variety of
forms and some applications might require specific ones.
More details can be found in Billard and Diday [2]. This paper
uses a soft-clustering method to discriminate structural
behaviors.
Fuzzy clustering plays an important role in solving
problems in the areas of pattern recognition and fuzzy model
identification. A variety of fuzzy clustering methods has been
proposed and most of them are based upon distance criteria
[3]. One widely used algorithm is the fuzzy c-means (FCM)
algorithm. It uses reciprocal distance to compute fuzzy
weights. A more efficient algorithm is the new FCFM. It
computes the cluster center using Gaussian weights, uses large
initial prototypes, and adds processes of eliminating,
clustering and merging.
2.1 Fuzzy c-means clustering method
Fuzzy clustering plays an important role in solving problems
in the areas of pattern recognition and fuzzy model
identification. A variety of fuzzy clustering methods has been
proposed and most of them are based upon distance criteria.
One widely used algorithm is the fuzzy c-means (FCM)
algorithm. It uses reciprocal distance to compute fuzzy
weights. A more efficient algorithm is the new FCFM. It
computes the cluster center using Gaussian weights, uses large
initial prototypes, and adds processes of eliminating,
clustering and merging.
The fuzzy c-means (FCM) algorithm has successfully been
applied to a wide variety of clustering problems. This
approach partitions a set of object data {x1,…,xn} Rs into c
(fuzzy) clusters based on a computed minimizer of the fuzzy
within-group least squares functional
2
21 1
),( ik
c
i
n
k
m
ikm vxUvUJ
(1)
where,
m > 1 is the fuzzification parameter,
vi Rs is the prototype (or mean) of the ith cluster,
Uik [0,1] is the degree to which datum xk belongs to the
ith cluster,
v = [vji] = [v1,…,vc] Rsc is the matrix of cluster
prototypes,
U = [Uik] is the partition matrix, and
2
2*
is the Euclidean or 2-norm, squared.
For later notational convenience, we will array the object
data {x1,…,xn} as columns in the object data matrix X = [xjk]
= [x1,…,xn] Rsn. The partition matrix U is a convenient
tool for representing cluster structure in the data {x1,…,xn};
we define the set of all nondegenerate fuzzy cn partition
matrices for partitioning n data into c clusters as
}0;1;10:,|{M11
fcn
n
k
ik
c
i
ikik
nc UUUkiRU (2)
The most popular and effective method of optimizing (1) is
the fuzzy c-means algorithm, which alternates between
optimizations of *)v|U(J~
m over U with v* fixed and
*)U|v(Jm over v with U* fixed, producing a sequence
{(U(r),v(r))}. Specifically, the r+1st value of v = [v1,…,vc] is
computed using the rth value of U in the right hand side of
vi =
n
1k
mik
n
1k
kmik
UU x for i=1,…,c (3)
Then the updated r+1st value of v is used to calculate the
r+1st value of U via
Uik =
c
1j
)1m/(1jk
)1m/(1ik
dd (4)
Proceedings of the 9th International Conference on Structural Dynamics, EURODYN 2014
2458
where,
dik = 2
2ik vx > 0, for i = 1,…,c
and k = 1,…,n. (5)
The FCM iteration is initialized using some U Mfcn (or
possibly v Rsc), and continues by alternating the updates
in (3) and (4) until the difference measured in any norm on
Rcn (or Rsc), in successive partition matrices (or v
matrices) is less than some prescribed tolerance, .
3 EXPERIMENTAL APPLICATION
The studied bridge is an embedded steel bridge (Fig.1) located
on the South-East high speed track in France at the kilometric
point 075+317; it crosses the secondary road D939 between
the towns of Sens and Soucy in the Yonne County.
Figure 1. PK 075+317 bridge on the Paris-Lyon high speed track.
This bridge was built in the early eighties; the increase of
the operating speed of high speed trains (TGV) has moved the
excitation frequency of the trains close to the first natural
frequency of the bridge. This risk of resonance was
furthermore emphasized by the uncertainties in the mass of
the ballast disposed on the bridge. The first natural frequency
was 5.86 Hz and the excitation frequency is around 4.0 Hz.
The French railways SNCF considered that this difference
was not large enough and ballast recharging in connection
with new operating speeds may reduce it furthermore. This is
why SNCF thus set up a system of rods near the bearings
(fig.2a) tightened by torque wrench (fig.2b); this
strengthening brings stiffness and increase natural
frequencies. In 2003 a strengthening intervention was
scheduled and led to a change in natural frequencies and mode
shapes as given in Tab.1 and Fig.3 [4]. Fig.3 provides a spatial
visualization of the mode shapes (the missing measurement
points are interpolated from the identified modal values).
Figure 2(a). Strengthening system of the bridge
Figure 2(b). Strengthening process of the bridge
Table 1. Natural frequencies of the PK 075+317 bridge before
and after strengthening (2003)
Frequency Before strengthening After strengthening
Mean [Hz] Standard deviation
Mean [Hz] Standard deviation
1 5.848 0.242 6.461 0.267
2 8.507 0.322 8.592 0.415
3 13.017 0.305 13.078 0.296
4 16.850 0.502 17.142 0.507
a) Bending mode shape
Proceedings of the 9th International Conference on Structural Dynamics, EURODYN 2014
2459
b) Bending and torsional mode shape
c) Torsional mode shape
d) Bending and torsional mode shape
Figure 3. Spatial visualization of the first four mode shapes
A long term monitoring was later on decided to appraise the
efficiency of this strengthening over a non-continuous 2 years
period (December 2004 to March 2006). Confidence intervals
of the identified natural frequencies (±1.96 standard
deviation) for the different measurement periods are compared
to the frequencies before and just after strengthening. It can be
noticed that, for all frequencies, but most notably for the first
one, the mean values have a shift to the right characterising a
sensitive frequency increase after the strengthening procedure.
For the subsequent campaigns (especially those in 2005),
there is a slight frequency decrease tendency showing a
possible efficiency loss of the strengthening procedure.
However, the confidence intervals are always superposed.
Mode shapes are also compared. It is particularly obvious that
it is hardly difficult to distinguish them over the measurement
period. The MAC values between “after strengthening” and
the different measurement sequences vary from 97% to 99%;
this highlights that the MAC values are not enough
discriminating.
The arising problem is therefore to know if the bridge
dynamic behaviour at a certain period of time remains close to
the behaviour after strengthening or if it moves to the initial
structural condition, or to a completely different behaviour.
This problem is an assignment problem in clustering
techniques: assuming different clusters of data, is it possible
to affect any new information to one of these clusters, or
eventually to identify a new type of information that cannot be
related to any of these clusters?
In a previous paper [5], the authors analysed the vibrational
response and the modal properties of this bridge just before
and after strengthening by different clustering techniques
(dynamic clouds, hierarchy-divisive and hierarchy-
agglomerative techniques). The objective was to identify the
optimal number of clusters (i.e. of different structural
behaviours) and the corresponding clusters. The expected goal
was to prove that several structural behaviours could be
detected taking into account the variability in the structural
response (due to environmental or operating conditions). For
this purpose, as usual clustering techniques rely on
deterministic data, it was instead proposed to handle the
statistical variability of the dynamic characteristics by
coupling these techniques with a Symbolic Data Analysis
(SDA) modelling. Different types of data can be indeed
manipulated in data analysis. SDA provides suitable tools for
managing complex, aggregated, relational, and higher-level
data. The methodological issues under development
generalize the classical data analysis techniques, like
visualization, factorial techniques, decision tree,
discrimination, regression as well as classification and
clustering methods.
The purpose of this paper is applied the coupled
SDA/clustering approach to the classification of new data to
pre-identified clusters as done in [5]. In a first part, clustering
techniques and data assignment are presented while in a
second part, the initial clustering results are presented as they
will serve for basis in new data classification.
4 SYMBOLIC DATA ANALYSIS AND CLUSTERING
PROCEDURES
Symbolic data analysis coupled with fuzzy c-means clustering
method is first applied to measured accelerations. The main
objective here is to try to separate the 3 structural conditions
(before, during and after modification) into 3 different
clusters. In other words, the goal is to separate the whole set
of 41 measurements into specific groups (before, during and
after). Table 2 shows the percentages of correct classification
for the 3 clusters when 12-category histograms are used as
symbolic representations.
Table 2. Percentages of correct classification (signals).
(%) Before During After Pertinence
Cluster 1 67 20 13 0,46 Cluster 2 31 62 8 0,66 Cluster 3 31 31 38 0,41
Results from Table 2 show that the method is relatively
robust to detect structural modification. As mentioned before,
the advantage of using soft-clustering methods in spite of hard
clustering methods is that the former gives the possibility to
classify each test into every single one of the clusters. That
means that a test of type “before” could be classified into an
“after” cluster. Evidently, this result would be absurd. Thus, in
order to attest the efficiency of this clustering method, it is
Proceedings of the 9th International Conference on Structural Dynamics, EURODYN 2014
2460
possible to evaluate and index – named “pertinence” – which
quantifies the certainty of the classification. If this index is 1,
it means the method is completely sure about its results
(which does not imply that the result is correct). Otherwise, if
the pertinence value is close 0, the method is not sure about its
classification. In this sense, it is possible to observe that
pertinence values for the classification showed in Table 2 are
relatively small (except for Cluster 2).
Table 3 summarizes the results obtained when natural
frequencies are used to classify the 41 dynamic tests. In this
case, results are much better when compared to those obtained
using signals. Pertinence values are, however, very low. This
indicates that, although the method classifies the test
correctly, it is not too sure about its results.
Table 3. Percentages of correct classification (natural frequencies).
(%) Before During After Pertinence
Cluster 1 87 13 0 0,33 Cluster 2 0 100 0 0,33 Cluster 3 0 8 92 0,33
Table 4-7 present the results when mode shapes are used as
symbolic features. When it comes to the first and third mode
shapes (Tables 4 and 6), percentages of correct classification
are rather fair. It is observed that the use of mode shapes is
better than employing signals but worse compared to natural
frequencies. Pertinence values are, once again, very low. This
indicates that, although the method classifies the test
correctly, it is not too sure about its results.
Table 4. Percentages of correct classification (1st mode shape).
(%) Before During After Pertinence
Cluster 1 80 13 7 0,33 Cluster 2 8 69 23 0,33 Cluster 3 0 0 100 0,33
Table 5. Percentages of correct classification (2nd mode
shape).
(%) Before During After Pertinence
Cluster 1 67 33 0 0,33 Cluster 2 8 77 15 0,33 Cluster 3 0 0 100 0,33
Results obtained using the 2nd and 4th mode shapes (Tables
5 and 7) are somewhat worse compared to the other mode
shapes. This might be related to the sensitivity of the modal
amplitudes regarding the structural modifications carried out.
Pertinence values are the same as those previously obtained.
Table 6. Percentages of correct classification (3rd mode
shape).
(%) Before During After Pertinence
Cluster 1 80 13 7 0,33 Cluster 2 0 62 38 0,33 Cluster 3 8 15 77 0,33
Table 7. Percentages of correct classification (4th mode
shape).
(%) Before During After Pertinence
Cluster 1 67 20 13 0,33 Cluster 2 0 85 15 0,33 Cluster 3 15 23 62 0,33
In order to try to improve classification results, the
structural state “during” was removed from the set of input
tests. In fact, since the “during” state is a continuous and
rather transitory condition, it might not represent a real
structural state.
Table 8 presents the results obtained when raw data
(accelerations) are used. It is possible to observe that not only
the percentages of correct classifications have increased, but
also the pertinence values. This may be explained by the fact
that method is surer about its results and less “confused” by
the input data.
Table 8. Percentages of correct classification (signals).
(%) Before After Pertinence
Cluster 1 73 27 0,87 Cluster 2 38 62 0,72
When natural frequencies are considered, results are even
better (see Table 9). Now, 100% of tests are correctly
classified into two different clusters (“before” and “after”).
There is, however, and interesting remark: even though results
are perfect, the method has some “doubts” about its
classification. Pertinence values are equal to 0,5 (or 50%)
which means the tests could be classified either way. In order
to circumvent or minimize this issue, some specific
parameters (such as the fuzzyfication index) can be calibrated.
Table 9. Percentages of correct classification (natural
frequencies).
(%) Before After Pertinence
Cluster 1 100 0 0,50 Cluster 2 0 100 0,50
Table 10 assembles the results obtained when the 1st and 2nd
mode shapes are used as inputs. As observed with the natural
frequencies, the classification is perfect for both clusters.
However, the pertinence values are also 0,5. Again, this could
be due to the fuzzyfication index adopted in this study.
Table 10. Percentages of correct classification (1st and 2nd
mode shapes).
(%) Before After Pertinence
Cluster 1 100 0 0,50 Cluster 2 0 100 0,50
Finally, the percentages of correct classification using the
3rd and 4th modes are presented in Table 11. Results are
slightly worse compared to those obtained using the first two
mode shapes. Pertinence values, remain, however, unchanged.
Proceedings of the 9th International Conference on Structural Dynamics, EURODYN 2014
2461
Table 11. Percentages of correct classification (3rd and 4th
mode shapes).
(%) Before After Pertinence
Cluster 1 80 20 0,50 Cluster 2 20 80 0,50
5 CONCLUSIONS
In this paper a novelty technique based on Symbolic Data
Analysis was introduced in order to classify different
structural behaviors. For this purpose, raw information
(acceleration measurements) as well as processed information
(modal data) were used for feature extraction. A soft-
clustering technique was applied for data classification: fuzzy
c-means algorithm.
Furthermore, an experimental application containing three
structural states of a railway bridge located in France was
studied. The objective was to use the clustering methods
applied to the data in order to discriminate the structure’s
different conditions. The results obtained showed that the
SDA methods are efficient to classify and to discriminate
structural modifications either considering the vibration data
or the modal parameters. In fact, more robust results are given
when considering modal data rather than applied to
measurement data.
In order to try a different classification approach, only two
data sets were considered (those belonging to the “before”
structural state and those of “after”). Removing the “during”
data set allowed improving the classification ratios on all
simulations. It could be observed that this data set was
bringing some “confusion” to the clustering algorithm.
Pertinence values – which measure the certainty of
classification – were mostly low throughout the simulations.
This, however, might by due to a specific parameter inherent
to the fuzzy c-means algorithm. Future works will concentrate
on improving these values as well as the percentages of
correct classification.
ACKNOWLEDGEMENTS
The authors would like to thank UFJF (Universidade Federal
de Juiz de Fora - Federal University of Juiz de Fora), CAPES
(Coordenação de Aperfeiçoamento de Pessoal de Nível
Superior), CNPq (Conselho Nacional de Desenvolvimento
Científico e Tecnológico - "National Council of
Technological and Scientific Development") and FAPEMIG
(Fundação de Amparo à Pesquisa do Estado de Minas Gerais)
for the financial support.
REFERENCES
[1] Diday, E., Noirhomme-Fraiture, M., 2008, Symbolic Data Analysis and the SODAS Software, Wiley and Sons.
[2] Billard, L.; Diday, E., 2006, Symbolic Data Analysis, Wiley and Sons.
[3] Kroszynski, U., Zhou, J., Fuzzy Clustering Principles, Methods and Examples, IKS, December 1998
[4] Cremona, C., 2004, Dynamic monitoring applied to the detection of
structural modifications. A high speed railway bridge study, Progress in Structural Engineering and Materials, 3, p.147-161.
[5] Cury, A., Cremona, C., Diday E., 2010, Application of Symbolic Data Analysis for structural modification assessment, Engineering Structures,
32, p.762-775.
Proceedings of the 9th International Conference on Structural Dynamics, EURODYN 2014
2462