
ABSTRACT: Structural health monitoring is based on the development of reliable and robust indicators able to detect, locate,

quantify and predict damage. Studies related to damage detection in civil engineering structures have a noticeable interest for

researchers in this area. Indeed, the detection of structural changes likely to become critical can avoid the occurrence of major

dysfunctions associated with social, economic and environmental consequences. Recently, many researchers have focused on

dynamic assessment as part of structural diagnosis. Most of the studied techniques are based on time or frequency domain

analyses to extract compressed information from modal characteristics or based on indicators built from these parameters. This

work focuses on the use of a soft-clustering method coupled with Symbolic Data Analysis (SDA) to detect

structural damage. SDA is applied to dynamic measurements obtained on site (accelerations) and to the identified modal

parameters. In order to attest the efficiency of the proposed approaches, an experimental investigation is carried out. It is shown

that SDA coupled with clustering methods is able to distinguish structural conditions with adequate rates.

KEY WORDS: Structural Health Monitoring, Damage Detection, Novelty Techniques

1 INTRODUCTION

Studies related to early damage detection are of special concern

for civil engineering structures. It is common knowledge that

if a damage process is not identified in time, structural

systems may have serious safety and economic consequences.

Traditional methods of damage detection and health

monitoring are often based on the variation of structural

vibration characteristics, i.e. natural frequencies, damping

ratios and mode shapes. These modal parameters are directly

affected by changes in the physical properties of the structure

including its mass and stiffness. Nevertheless, modal

parameters identification is a sort of filtering process, leading

to a loss of information compared to the raw data. This

compression process can erase any small changes due to a

structural modification. In turn, using raw dynamic

measurements (especially if high sampling frequencies are

used) leads to the storage of large data sets. Dynamic measurements can easily contain many thousands of values, making the analysis process lengthy or even prohibitive.

Several damage detection methods based on signature principles exist in the literature, but they usually prove difficult to apply in practice: despite the current processing power of computers, the computational effort needed to manipulate large data sets remains a problem. Furthermore, and this is certainly the major drawback of using modal parameters, modal components essentially describe an equivalent linear behavior, which may not hold for certain degraded systems.

Data mining is the process of extracting hidden patterns

from data. As more data is gathered in monitoring, data

mining is becoming an increasingly important tool to transform this data into information. It is commonly used in a wide

range of profiling practices, such as marketing, fraud

detection and scientific discovery. Data mining can be applied

to data sets of any size. However, while it can be used to

uncover hidden patterns in data that has been collected,

obviously it can neither uncover patterns which are not

already present in the data, nor can it uncover patterns in data

that has not been collected. Classical data describe each variable by a single value; a richer type of data, called symbolic data, can instead represent the variability and uncertainty present in each variable (for example, as an interval or a histogram). The development of new

methods of data analysis suitable for treating this type of data

is the aim of Symbolic Data Analysis (SDA). Most of the

currently developed techniques in symbolic data analysis are

extensions of statistical methods. Diday et al. [1] document the growth of data of a symbolic nature and stress the necessity of

developing new statistical methodologies for the treatment of

such information. In general, SDA provides suitable tools for

managing complex, aggregated, relational, and higher-level

data. The methodological issues under development

generalize the classical data analysis techniques, like

visualization, factorial techniques, decision tree,

discrimination, regression as well as classification and

clustering methods. Previous works in the literature

introduced the use of SDA applied to data representing

structural temperature variation in time.

The purpose of this paper is to present some principles of

symbolic data analysis and, more specifically, the use of

clustering methods applied to structural damage assessment.

The idea consists in using different classification procedures

in order to have insights about structural health conditions. In

other words, the SDA is applied to either the vibration data

(signals) obtained through dynamic tests or the modal

parameters in order to infer if a damage process is in progress

or if it has already occurred in the structure. In order to show the potential of the proposed methodology, several results are presented from experimental tests performed on a real case, a rail bridge located in France on the high speed line between Paris and Lyon. Vibration measurements were obtained under three different conditions: before, during and after a structural modification process.

Symbolic soft-clustering applied to structural health monitoring

Vinícius Alves1, Alexandre Cury2, Flávio Barbosa, Christian Cremona3

1University of Ouro Preto, Post-Graduate Program in Civil Engineering, Ouro Preto, Brazil

2University of Juiz de Fora, Department of Applied Computational Mechanics, Juiz de Fora, Brazil

3Technical Centre for Bridge Engineering, CEREMA/DTITM, Sourdun, France

email: [email protected], [email protected], [email protected], christian.cremona@developpement-durable.gouv.fr

Proceedings of the 9th International Conference on Structural Dynamics, EURODYN 2014, Porto, Portugal, 30 June - 2 July 2014

A. Cunha, E. Caetano, P. Ribeiro, G. Müller (eds.), ISSN: 2311-9020; ISBN: 978-972-752-165-4

2 PRINCIPLES OF SYMBOLIC CLUSTERING METHODS

In general, data acquisition campaigns in civil engineering

structures gather thousands of acceleration values measured by several sensors. Consequently, analyzing all of these data (classical data) directly is often time-consuming or

even prohibitive. In this sense, transforming this massive

quantity of data into a compact but also rich descriptive type

of data (symbolic data) becomes an attractive approach. Let us

consider, for instance, a signal X (which is part of a dynamic

test) containing n acceleration values measured by one single

sensor. There are several ways to transform classical data into

symbolic data. This signal can be represented by:

a k-category histogram: X = {a1(n1), a2(n2), a3(n3), …, ak(nk)};

an interquartile interval: X = [a25%; a75%];

a min/max interval: X = [amin; amax].

The same representation can be applied to modal

parameters, i.e. natural frequencies and mode shapes. In other

words, both of these quantities can be represented by intervals

or histograms. Transforming classical data to symbolic data is carried out almost instantly, so the methodology remains practical even for a large ensemble of dynamic tests. 10-category histograms and interquartile

intervals were used in the present study.
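As an illustration only, the three symbolic representations listed above can be sketched in a few lines of Python. The function name `to_symbolic`, the use of equal-width bins, the storage of relative frequencies instead of raw counts, and the synthetic signal are all assumptions made for this sketch; the paper does not specify these details.

```python
import numpy as np

def to_symbolic(signal, k=10):
    """Summarise a 1-D signal by three symbolic descriptions (illustrative)."""
    x = np.asarray(signal, dtype=float)
    # k-category histogram: k equal-width bins spanning [min, max] (assumption)
    counts, edges = np.histogram(x, bins=k)
    # Interquartile interval [a25%; a75%]
    q25, q75 = np.percentile(x, [25, 75])
    return {
        "histogram": (edges, counts / counts.sum()),  # relative frequencies
        "interquartile": (q25, q75),
        "minmax": (x.min(), x.max()),
    }

# Hypothetical stand-in for the record of one sensor during one dynamic test
rng = np.random.default_rng(0)
signal = rng.normal(0.0, 1.0, size=5000)
sym = to_symbolic(signal, k=10)
```

Whatever the length of the record, each description has a fixed, small size (k frequencies or two bounds), which is what makes the subsequent clustering tractable.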

Data clustering is a common technique for statistical data

analysis, which is used in many fields. A clustering procedure

is an unsupervised learning process and can be defined as a

way of classifying a number of objects into different groups.

More precisely, it can be described as the partitioning of a

data set into subsets (clusters), so that the data in each subset

share some common properties. For an appropriate clustering, it is necessary to minimize the within-cluster variation, so that the clusters are as homogeneous as possible, and, as a natural consequence, to maximize the between-cluster variation, so that the clusters are as dissimilar from each other as possible. To define these clusters and determine the proximity (or similarity) among the tests, it is necessary to define suitable dissimilarity measures. Generally, the lower these values are, the more similar the objects are, and the more likely they are to be gathered in the same cluster. Conversely, objects allocated to different clusters are those separated by greater distances. Dissimilarity measures can take a variety of

forms and some applications might require specific ones.

More details can be found in Billard and Diday [2]. This paper

uses a soft-clustering method to discriminate structural

behaviors.
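The paper does not state which dissimilarity measure is used, so the following is only a sketch of two common choices for symbolic data: the Hausdorff distance between two intervals and the Euclidean distance between two histograms defined on the same categories. The function names and the numerical values are hypothetical.

```python
import numpy as np

def interval_hausdorff(a, b):
    """Hausdorff distance between closed intervals a = [a_lo, a_hi], b = [b_lo, b_hi]."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def histogram_euclidean(p, q):
    """Euclidean distance between two histograms sharing the same categories."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((p - q) ** 2)))

# Hypothetical interquartile intervals of two tests
d_int = interval_hausdorff((-0.7, 0.6), (-0.5, 0.9))

# Hypothetical 3-category histograms (relative frequencies) of two tests
d_hist = histogram_euclidean([0.2, 0.5, 0.3], [0.1, 0.6, 0.3])
```

In both cases a value close to zero means that the two tests have nearly identical symbolic descriptions.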

Fuzzy clustering plays an important role in solving

problems in the areas of pattern recognition and fuzzy model

identification. A variety of fuzzy clustering methods has been

proposed and most of them are based upon distance criteria

[3]. One widely used algorithm is the fuzzy c-means (FCM)

algorithm. It uses reciprocal distance to compute fuzzy

weights. A more efficient algorithm is the new FCFM. It

computes the cluster center using Gaussian weights, uses large

initial prototypes, and adds processes of eliminating,

clustering and merging.

2.1 Fuzzy c-means clustering method


The fuzzy c-means (FCM) algorithm has successfully been applied to a wide variety of clustering problems. This approach partitions a set of object data $\{x_1,\dots,x_n\} \subset \mathbb{R}^s$ into c (fuzzy) clusters based on a computed minimizer of the fuzzy within-group least squares functional

$$J_m(U, v) = \sum_{i=1}^{c} \sum_{k=1}^{n} U_{ik}^{m} \, \lVert x_k - v_i \rVert_2^2 \qquad (1)$$

where

$m > 1$ is the fuzzification parameter,

$v_i \in \mathbb{R}^s$ is the prototype (or mean) of the ith cluster,

$U_{ik} \in [0,1]$ is the degree to which datum $x_k$ belongs to the ith cluster,

$v = [v_{ji}] = [v_1,\dots,v_c] \in \mathbb{R}^{s \times c}$ is the matrix of cluster prototypes,

$U = [U_{ik}]$ is the partition matrix, and

$\lVert \cdot \rVert_2^2$ is the Euclidean or 2-norm, squared.

For later notational convenience, we will array the object data $\{x_1,\dots,x_n\}$ as columns in the object data matrix $X = [x_{jk}] = [x_1,\dots,x_n] \in \mathbb{R}^{s \times n}$. The partition matrix U is a convenient tool for representing cluster structure in the data $\{x_1,\dots,x_n\}$; we define the set of all nondegenerate fuzzy $c \times n$ partition matrices for partitioning n data into c clusters as

$$M_{fcn} = \Big\{ U \in \mathbb{R}^{c \times n} \;\Big|\; \forall i,k:\ 0 \le U_{ik} \le 1;\ \sum_{i=1}^{c} U_{ik} = 1;\ \sum_{k=1}^{n} U_{ik} > 0 \Big\} \qquad (2)$$

The most popular and effective method of optimizing (1) is the fuzzy c-means algorithm, which alternates between optimizations of $J_m(U \mid v^*)$ over U with $v^*$ fixed and $J_m(v \mid U^*)$ over v with $U^*$ fixed, producing a sequence $\{(U^{(r)}, v^{(r)})\}$. Specifically, the (r+1)st value of $v = [v_1,\dots,v_c]$ is computed using the rth value of U in the right-hand side of

$$v_i = \frac{\sum_{k=1}^{n} U_{ik}^{m} \, x_k}{\sum_{k=1}^{n} U_{ik}^{m}} \quad \text{for } i = 1,\dots,c \qquad (3)$$

Then the updated (r+1)st value of v is used to calculate the (r+1)st value of U via

$$U_{ik} = \frac{d_{ik}^{1/(1-m)}}{\sum_{j=1}^{c} d_{jk}^{1/(1-m)}} \qquad (4)$$


where

$$d_{ik} = \lVert x_k - v_i \rVert_2^2 > 0, \quad \text{for } i = 1,\dots,c \text{ and } k = 1,\dots,n. \qquad (5)$$

The FCM iteration is initialized using some $U \in M_{fcn}$ (or possibly $v \in \mathbb{R}^{s \times c}$), and continues by alternating the updates in (3) and (4) until the difference, measured in any norm on $\mathbb{R}^{c \times n}$ (or $\mathbb{R}^{s \times c}$), between successive partition matrices (or v matrices) is less than some prescribed tolerance.
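The alternating updates of Eqs. (3) to (5) can be sketched as follows. This is a minimal NumPy illustration and not the authors' implementation: the function name `fuzzy_c_means`, the random initialisation of U, the max-norm stopping test, the default tolerance and the small floor on the distances are all assumptions. For convenience the objects are stored as rows of X, i.e. the transpose of the column convention used in the text.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Sketch of FCM. X: (n, s) array of objects. Returns U (c, n), V (c, s)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)      # each column sums to 1, as in Eq. (2)
    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)               # Eq. (3)
        d = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # Eq. (5)
        d = np.maximum(d, 1e-12)           # keep d_ik > 0 (assumed safeguard)
        P = d ** (1.0 / (1.0 - m))
        U_new = P / P.sum(axis=0, keepdims=True)                 # Eq. (4)
        converged = np.abs(U_new - U).max() < tol   # max-norm on successive U
        U = U_new
        if converged:
            break
    return U, V   # V is the prototype matrix from the last centre update

# Hypothetical data: two well-separated groups of 20 objects in R^2
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
U, V = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=0)   # "hardened" assignment: cluster of largest membership
```

Each column of U gives the degrees of membership of one object in all c clusters; taking the largest entry recovers a hard assignment when one is needed.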

3 EXPERIMENTAL APPLICATION

The studied bridge is an embedded steel bridge (Fig.1) located

on the South-East high speed track in France at the kilometric

point 075+317; it crosses the secondary road D939 between

the towns of Sens and Soucy in the Yonne County.

Figure 1. PK 075+317 bridge on the Paris-Lyon high speed track.

This bridge was built in the early eighties; the increase of

the operating speed of high speed trains (TGV) has moved the

excitation frequency of the trains close to the first natural

frequency of the bridge. This risk of resonance was

furthermore emphasized by the uncertainties in the mass of

the ballast disposed on the bridge. The first natural frequency

was 5.86 Hz and the excitation frequency is around 4.0 Hz.

The French railways SNCF considered that this difference was not large enough and that ballast recharging, in connection with new operating speeds, might reduce it further. SNCF therefore set up a system of rods near the bearings (Fig.2a), tightened by torque wrench (Fig.2b); this strengthening adds stiffness and increases the natural frequencies. In 2003 a strengthening intervention was scheduled; it led to a change in natural frequencies and mode shapes, as given in Tab.1 and Fig.3 [4]. Fig.3 provides a spatial visualization of the mode shapes (the missing measurement points are interpolated from the identified modal values).

Figure 2(a). Strengthening system of the bridge

Figure 2(b). Strengthening process of the bridge

Table 1. Natural frequencies of the PK 075+317 bridge before and after strengthening (2003)

| Frequency | Before strengthening: mean [Hz] | Before strengthening: standard deviation | After strengthening: mean [Hz] | After strengthening: standard deviation |
|---|---|---|---|---|
| 1 | 5.848 | 0.242 | 6.461 | 0.267 |
| 2 | 8.507 | 0.322 | 8.592 | 0.415 |
| 3 | 13.017 | 0.305 | 13.078 | 0.296 |
| 4 | 16.850 | 0.502 | 17.142 | 0.507 |

Figure 3. Spatial visualization of the first four mode shapes: a) bending mode shape; b) bending and torsional mode shape; c) torsional mode shape; d) bending and torsional mode shape.

Long-term monitoring was later undertaken to appraise the efficiency of this strengthening over a non-continuous two-year period (December 2004 to March 2006). Confidence intervals of the identified natural frequencies (±1.96 standard deviations) for the different measurement periods are compared to the frequencies before and just after strengthening. For all frequencies, but most notably for the first one, the mean values shift upward, indicating a noticeable frequency increase after the strengthening procedure. For the subsequent campaigns (especially those in 2005), there is a slight tendency for the frequencies to decrease, suggesting a possible loss of efficiency of the strengthening. However, the confidence intervals always overlap. Mode shapes are also compared, and they prove difficult to distinguish over the measurement period. The MAC values between “after strengthening” and the different measurement sequences vary from 97% to 99%, which shows that MAC values are not discriminating enough.
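As a worked illustration of the ±1.96 standard deviation intervals, the first natural frequency of Table 1 can be used. The values of the later monitoring campaigns are not listed in the text, so this sketch only compares the "before" and "just after" states; the helper name `confidence_interval` is hypothetical.

```python
Z = 1.96  # multiplier of the standard deviation quoted in the text

def confidence_interval(mean, std, z=Z):
    """Return the interval (mean - z*std, mean + z*std)."""
    return (mean - z * std, mean + z * std)

# First natural frequency from Table 1: (mean [Hz], standard deviation)
ci_before = confidence_interval(5.848, 0.242)   # about (5.374, 6.322) Hz
ci_after = confidence_interval(6.461, 0.267)    # about (5.938, 6.984) Hz

# Two intervals overlap when the larger lower bound does not exceed
# the smaller upper bound.
overlap = max(ci_before[0], ci_after[0]) <= min(ci_before[1], ci_after[1])
```

Even for the first frequency, whose mean increases the most, the two intervals share the range from roughly 5.94 Hz to 6.32 Hz, which illustrates why overlapping intervals alone cannot separate the structural states.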

The problem is therefore to know whether the bridge's dynamic behaviour at a given time remains close to the behaviour after strengthening, moves back towards the initial structural condition, or moves to a completely different behaviour. This is an assignment problem in clustering techniques: given different clusters of data, is it possible to assign any new information to one of these clusters, or possibly to identify a new type of information that cannot be related to any of them?

In a previous paper [5], the authors analysed the vibrational

response and the modal properties of this bridge just before

and after strengthening by different clustering techniques

(dynamic clouds, hierarchy-divisive and hierarchy-

agglomerative techniques). The objective was to identify the

optimal number of clusters (i.e. of different structural

behaviours) and the corresponding clusters. The expected goal

was to prove that several structural behaviours could be

detected taking into account the variability in the structural

response (due to environmental or operating conditions). For

this purpose, as usual clustering techniques rely on

deterministic data, it was instead proposed to handle the

statistical variability of the dynamic characteristics by

coupling these techniques with a Symbolic Data Analysis

(SDA) modelling. Different types of data can be indeed

manipulated in data analysis. SDA provides suitable tools for

managing complex, aggregated, relational, and higher-level

data. The methodological issues under development

generalize the classical data analysis techniques, like

visualization, factorial techniques, decision tree,

discrimination, regression as well as classification and

clustering methods.

The purpose of this paper is to apply the coupled SDA/clustering approach to the classification of new data into pre-identified clusters, as done in [5]. In a first part, clustering techniques and data assignment are presented, while in a second part, the initial clustering results are presented, as they will serve as the basis for new data classification.

4 SYMBOLIC DATA ANALYSIS AND CLUSTERING PROCEDURES

Symbolic data analysis coupled with fuzzy c-means clustering

method is first applied to measured accelerations. The main

objective here is to try to separate the 3 structural conditions

(before, during and after modification) into 3 different

clusters. In other words, the goal is to separate the whole set

of 41 measurements into specific groups (before, during and

after). Table 2 shows the percentages of correct classification

for the 3 clusters when 12-category histograms are used as

symbolic representations.

Table 2. Percentages of correct classification (signals).

| (%) | Before | During | After | Pertinence |
|---|---|---|---|---|
| Cluster 1 | 67 | 20 | 13 | 0.46 |
| Cluster 2 | 31 | 62 | 8 | 0.66 |
| Cluster 3 | 31 | 31 | 38 | 0.41 |

Results from Table 2 show that the method is relatively robust in detecting the structural modification: Clusters 1 and 2 are dominated by the “before” (67%) and “during” (62%) tests respectively, whereas Cluster 3 is less clear-cut (38% “after”). As mentioned before, the advantage of soft-clustering methods over hard clustering methods is that the former allow each test to be assigned, to some degree, to every one of the clusters. That means that a test of type “before” could be classified into an “after” cluster. Evidently, this result would be absurd. Thus, in order to assess the efficiency of this clustering method, it is possible to evaluate an index, named “pertinence”, which quantifies the certainty of the classification. If this index is 1, the method is completely sure about its results (which does not imply that the result is correct). Conversely, if the pertinence value is close to 0, the method is not sure about its classification. In this sense, it can be observed that the pertinence values for the classification shown in Table 2 are relatively small (except for Cluster 2).

Table 3 summarizes the results obtained when natural frequencies are used to classify the 41 dynamic tests. In this case, results are much better than those obtained using signals. Pertinence values are, however, very low. This indicates that, although the method classifies the tests correctly, it is not very confident about its results.

Table 3. Percentages of correct classification (natural frequencies).



Tables 4-7 present the results when mode shapes are used as symbolic features. For the first and third mode shapes (Tables 4 and 6), percentages of correct classification are rather fair. Using mode shapes performs better than using signals but worse than using natural frequencies. Pertinence values are, once again, very low: although the method classifies the tests correctly, it is not very confident about its results.

Table 4. Percentages of correct classification (1st mode shape).



Table 5. Percentages of correct classification (2nd mode

shape).



Results obtained using the 2nd and 4th mode shapes (Tables

5 and 7) are somewhat worse compared to the other mode

shapes. This might be related to the sensitivity of the modal

amplitudes regarding the structural modifications carried out.

Pertinence values are the same as those previously obtained.

Table 6. Percentages of correct classification (3rd mode

shape).



Table 7. Percentages of correct classification (4th mode

shape).



In order to try to improve classification results, the

structural state “during” was removed from the set of input

tests. In fact, since the “during” state is a continuous and

rather transitory condition, it might not represent a real

structural state.

Table 8 presents the results obtained when raw data (accelerations) are used. Not only have the percentages of correct classification increased, but so have the pertinence values. This may be explained by the fact that the method is surer about its results and less “confused” by the input data.

Table 8. Percentages of correct classification (signals).

| (%) | Before | After | Pertinence |
|---|---|---|---|
| Cluster 1 | 73 | 27 | 0.87 |
| Cluster 2 | 38 | 62 | 0.72 |

When natural frequencies are considered, results are even better (see Table 9). Now, 100% of tests are correctly classified into two different clusters (“before” and “after”). There is, however, an interesting remark: even though results are perfect, the method has some “doubts” about its classification. Pertinence values are equal to 0.5 (or 50%), which means the tests could be classified either way. In order to circumvent or minimize this issue, some specific parameters (such as the fuzzification index) can be calibrated.

Table 9. Percentages of correct classification (natural frequencies).

| (%) | Before | After | Pertinence |
|---|---|---|---|
| Cluster 1 | 100 | 0 | 0.50 |
| Cluster 2 | 0 | 100 | 0.50 |
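The influence of the fuzzification index can be illustrated with the membership update of Eq. (4) applied to a single datum. The paper does not give the formula of the pertinence index, so this sketch only shows how the memberships themselves react to m; the squared distances are hypothetical and the function name `memberships` is an assumption.

```python
import numpy as np

def memberships(d, m):
    """Eq. (4) for one datum; d holds its squared distances to each prototype."""
    p = np.asarray(d, dtype=float) ** (1.0 / (1.0 - m))
    return p / p.sum()

# Hypothetical datum that is clearly closer to the first of two prototypes
d = [1.0, 4.0]

u_crisp = memberships(d, 1.5)    # m close to 1: memberships nearly binary
u_default = memberships(d, 2.0)  # m = 2 gives [0.8, 0.2]
u_soft = memberships(d, 20.0)    # large m: memberships approach 1/c = 0.5
```

For two clusters, a large m therefore drives every membership towards 0.5 even when the hard assignment is unambiguous. If the pertinence index is derived from the memberships, this would be one possible reason for values of 0.5 alongside a perfect classification.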

Table 10 gathers the results obtained when the 1st and 2nd mode shapes are used as inputs. As observed with the natural frequencies, the classification is perfect for both clusters. However, the pertinence values are also 0.5. Again, this could be due to the fuzzification index adopted in this study.

Table 10. Percentages of correct classification (1st and 2nd mode shapes).

| (%) | Before | After | Pertinence |
|---|---|---|---|
| Cluster 1 | 100 | 0 | 0.50 |
| Cluster 2 | 0 | 100 | 0.50 |

Finally, the percentages of correct classification using the 3rd and 4th modes are presented in Table 11. Results are slightly worse than those obtained using the first two mode shapes. Pertinence values, however, remain unchanged.


Table 11. Percentages of correct classification (3rd and 4th mode shapes).

| (%) | Before | After | Pertinence |
|---|---|---|---|
| Cluster 1 | 80 | 20 | 0.50 |
| Cluster 2 | 20 | 80 | 0.50 |

5 CONCLUSIONS

In this paper a novelty technique based on Symbolic Data

Analysis was introduced in order to classify different

structural behaviors. For this purpose, raw information

(acceleration measurements) as well as processed information

(modal data) were used for feature extraction. A soft-

clustering technique was applied for data classification: fuzzy

c-means algorithm.

Furthermore, an experimental application containing three

structural states of a railway bridge located in France was

studied. The objective was to use the clustering methods

applied to the data in order to discriminate the structure’s

different conditions. The results obtained showed that the SDA methods are able to classify and discriminate structural modifications using either the vibration data or the modal parameters. In fact, more robust results are obtained with modal data than with raw measurement data.

In order to try a different classification approach, only two

data sets were considered (those belonging to the “before”

structural state and those of “after”). Removing the “during”

data set allowed improving the classification ratios on all

simulations. It could be observed that this data set was

bringing some “confusion” to the clustering algorithm.

Pertinence values, which measure the certainty of classification, were mostly low throughout the simulations. This, however, might be due to a specific parameter inherent to the fuzzy c-means algorithm. Future work will concentrate on improving these values as well as the percentages of correct classification.

ACKNOWLEDGEMENTS

The authors would like to thank UFJF (Universidade Federal

de Juiz de Fora - Federal University of Juiz de Fora), CAPES

(Coordenação de Aperfeiçoamento de Pessoal de Nível

Superior), CNPq (Conselho Nacional de Desenvolvimento

Científico e Tecnológico - "National Council of

Technological and Scientific Development") and FAPEMIG

(Fundação de Amparo à Pesquisa do Estado de Minas Gerais)

for the financial support.

REFERENCES

[1] Diday, E., Noirhomme-Fraiture, M., 2008, Symbolic Data Analysis and the SODAS Software, Wiley and Sons.

[2] Billard, L., Diday, E., 2006, Symbolic Data Analysis, Wiley and Sons.

[3] Kroszynski, U., Zhou, J., 1998, Fuzzy Clustering Principles, Methods and Examples, IKS, December 1998.

[4] Cremona, C., 2004, Dynamic monitoring applied to the detection of structural modifications. A high speed railway bridge study, Progress in Structural Engineering and Materials, 3, p.147-161.

[5] Cury, A., Cremona, C., Diday, E., 2010, Application of Symbolic Data Analysis for structural modification assessment, Engineering Structures, 32, p.762-775.

