ORIGINAL ARTICLE
An informative probability model enhancing real time echobiometry to improve fetal weight estimation accuracy
G. Cevenini · F. M. Severi · C. Bocchi · F. Petraglia · P. Barbini
Received: 4 May 2007 / Accepted: 28 November 2007 / Published online: 10 January 2008
© International Federation for Medical and Biological Engineering 2007
Abstract A multinormal probability model is proposed to correct human errors in fetal echobiometry and improve the estimation of fetal weight (EFW). Model parameters were designed to depend on major pregnancy data and were estimated by feed-forward artificial neural networks (ANNs). Data from 4,075 women in labour were used to train and test the ANNs. The model was implemented numerically to provide EFW together with probabilities of congruence among the measured echobiometric parameters. It enabled ultrasound measurement errors to be checked and corrected interactively in real time. The software was useful for training medical staff and standardizing measurement procedures, and it provided multiple statistical data on fetal morphometry and support for clinical decisions. A clinical protocol testing the system's ability to detect measurement errors was conducted with 61 women in the last week of pregnancy. It led to decisive improvements in EFW accuracy.
Keywords Probability model · Neural networks · Ultrasound · Echobiometry · Fetal weight estimation
1 Introduction
Many decisions in obstetrics depend on gestational age (GA) and fetal weight (FW). An accurate ultrasound examination performed before 20 weeks of gestation enables the true GA to be estimated [42]. On the other hand, estimation of FW (EFW) using standard biometric parameters, usually related to geometric dimensions of the fetal head, abdomen and long bones of the extremities, is still problematic [18].
Monitoring of fetal growth is fundamental in modern perinatology because it is closely related to fetal/neonatal wellbeing [43]. Moreover, identification of abnormal intrauterine growth patterns enables better pregnancy management [10, 21, 43].
In the last 30 years, many methods have been developed
to improve EFW accuracy, most based on formulae derived
by regression analysis [3, 16, 22, 23, 25, 27, 33, 35, 36, 38,
41, 44], or on physical models [2, 14, 17, 29]. Artificial
neural networks (ANNs) and volumetric methods based on
three-dimensional (3D) ultrasonography were also recently
proposed [11, 20, 40].
Clinical use of these mathematical models led to the introduction of EFW in ultrasound reports. Although these models were effective in the original papers, ultrasound operators know that every estimation model loses efficacy when applied in clinical practice [9, 17]. The differences between the accuracies reported in the literature and those obtained in local clinical institutions are due to many factors, the main ones being significant statistical dissimilarity between the original and local populations and samples, differences in echobiometric measurement procedures and lack of model generalization. Little attention has usually been paid to generalization, which refers to a model's ability to provide the same accuracy on data not used for model identification [5]. Specifically, empirical formulae do not guarantee a good compromise between model flexibility, needed to fit all useful information, and robustness, needed to filter out useless data variability. Too many model parameters have been estimated from few ultrasound cases near delivery. Sometimes fetuses with non-homogeneous weight or GA intervals not representative of the whole population are used. In other cases the clinical condition of women in labour is neglected or incorrectly reported.

G. Cevenini (✉) · P. Barbini
Department of Surgery and Bioengineering, University of Siena,
Viale Mario Bracci 16, 53100 Siena, Italy
e-mail: [email protected]

F. M. Severi · C. Bocchi · F. Petraglia
Department of Pediatrics, Obstetrics and Reproductive Medicine, University of Siena,
Viale Mario Bracci 16, 53100 Siena, Italy

Med Bio Eng Comput (2008) 46:109–120
DOI 10.1007/s11517-007-0299-2
Although attempts have been made to reduce statistical sampling errors and lack of generalization power by selecting the most accurate and representative models, a mean absolute percentage error below 7–8% of the true birth weight (BW) has never been achieved in current clinical practice, with 25% (or more) of estimates having an absolute error over 10% [29]. Unfortunately, since most obstetricians take 10% as the critical error threshold above which EFW cannot guarantee correct clinical management, the method cannot yet be considered reliable for clinical decision-making [7, 17].
Although many attempts have been made to reduce estimation errors by means of models specialized in particular ranges of FW or GA [16, 23, 36], or derived from sophisticated 3D and ANN methods [11, 40], it has not proven possible to reduce the error significantly, presumably because it arises from many different unpredictable factors (human, environmental, instrumental, technological, etc.) associated with the digital processing of echobiometric values [17].
Since current accuracy is not far from the 10% error limit for all fetal populations, there is great interest in finding solutions that could improve EFW accuracy enough to reach this goal.
At present, the only way to enhance fetal weight prediction accuracy seems to be to reduce operator measurement error. Indeed, readings made by operators with long experience in fetal ultrasound have significantly, though not yet sufficiently, lower errors.
This paper describes a computerized information system that helps ultrasound operators check and interactively correct measurement errors in two-dimensional fetal biometry. It is based on a Gaussian multivariate (multinormal) probability model whose parameters are identified by ANNs trained with sample data representing a wide fetal population. It therefore belongs to the class of machine learning methods widely used in computing applications to support clinical decision making. The real-time improvement in EFW accuracy achieved by the system was tested clinically in a small sample of pregnant women.
2 Methods
2.1 Population and samples
To design the model we used data from 4,075 fetuses in the last week before birth, recorded in our clinics over the last 10 years. Only fetuses with evident malformations were excluded from the database, which was divided into three samples equally representative of the fetal population: a training set and a validation set of the same size, drawn from the first 3,200 fetuses of the chronologically ordered list (the former from odd positions, the latter from even positions), and a testing set formed by the last 875 cases. The training and validation sets were used for model training, whereas the testing set was used to check that model performance remained statistically equivalent with new data (generalization ability). Finally, the system was applied in clinical practice to 61 pregnant women in the last week before delivery to verify its actual capacity to support interactive correction of real-time ultrasound measurements and to improve EFW accuracy.
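The split just described can be sketched as follows. This is a minimal illustration, assuming the records are already in chronological order and that positions are counted from 1 (so the first record occupies an "odd" position); the function name is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_chronological(records: Sequence[T], n_design: int = 3200
                        ) -> Tuple[List[T], List[T], List[T]]:
    """Split chronologically ordered records into training, validation and testing sets.

    Among the first n_design records, odd positions (1st, 3rd, ...) go to the
    training set and even positions (2nd, 4th, ...) to the validation set.
    All remaining records form the testing set.
    """
    design = list(records[:n_design])
    training = design[0::2]    # 1-based positions 1, 3, 5, ...
    validation = design[1::2]  # 1-based positions 2, 4, 6, ...
    testing = list(records[n_design:])
    return training, validation, testing


# Usage with 4,075 placeholder case IDs in chronological order.
train, val, test = split_chronological(list(range(4075)))
print(len(train), len(val), len(test))  # 1600 1600 875
```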
2.2 Measurement variables
Fetal echobiometric data, including biparietal diameter (BPD), head and abdominal circumferences (HC, AC) and femur length (FL), were measured by transabdominal ultrasound scan with a Siemens Sonoline Elegra Millenium Edition ultrasound system or a MYLAB Family instrument (ESAOTE spa, Genova, Italy). Gestational age (GA) in weeks was established by accurate menstrual history confirmed by ultrasound examination before the 20th week of gestation. True FW was determined by measuring birth weight (BW) with a precision balance shortly after delivery. BW was the dependent variable used to train our model to estimate FW from ultrasound scans just before delivery.
Essential pregnancy data, namely amniotic fluid volume (AF), number of fetuses (FN) and number of days between the last ultrasound examination and delivery (US-D), were also used in the training process.
AF was treated as a binary-coded qualitative variable with four categories: normal, absent, reduced and augmented volume. US-D ranged from 0 (i.e. ultrasound examination and delivery on the same day) to 6 (i.e. ultrasound examination 6 days before delivery).
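A numeric version of the pregnancy information can be assembled as in the sketch below. The text says only that AF is "binary-coded"; the one-hot layout (one 0/1 indicator per category), the input order and the function name are our assumptions.

```python
from typing import List

# The four AF categories listed in the text; the order is an assumption.
AF_CATEGORIES = ("normal", "absent", "reduced", "augmented")


def encode_information_vector(bw_g: float, af: str, fn: int, us_d: int) -> List[float]:
    """Build a numeric input vector from w = [BW AF FN US-D].

    AF is expanded into four 0/1 indicators (assumed one-hot coding).
    """
    if af not in AF_CATEGORIES:
        raise ValueError(f"unknown amniotic fluid category: {af!r}")
    if not 0 <= us_d <= 6:
        raise ValueError("US-D must lie between 0 and 6 days")
    af_bits = [1.0 if af == category else 0.0 for category in AF_CATEGORIES]
    return [float(bw_g), *af_bits, float(fn), float(us_d)]


print(encode_information_vector(3400, "reduced", 1, 2))
# [3400.0, 0.0, 0.0, 1.0, 0.0, 1.0, 2.0]
```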
2.3 Multinormal probability model
To describe the probability space of the ultrasound measurements we used the multivariate Gaussian density function:

$$p(\mathbf{x}|\mathbf{w}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}(\mathbf{w})|^{1/2}}\exp\left\{-\frac{1}{2}\,[\mathbf{x}-\boldsymbol{\mu}(\mathbf{w})]^{T}\,\boldsymbol{\Sigma}^{-1}(\mathbf{w})\,[\mathbf{x}-\boldsymbol{\mu}(\mathbf{w})]\right\} \tag{1}$$

where T is the vector transposition operator, d = 5 is the parameter space dimension, x = [BPD HC AC FL GA] is the vector of current echobiometric parameters, w = [BW AF FN US-D] is an information vector conditioning density function (1), and μ(w) and Σ(w) are, respectively, the mean vector and covariance matrix of the parameters, which depend on w and have to be estimated to completely define the probability model (1).
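Eq. (1) can be evaluated numerically as in the sketch below, once μ(w) and Σ(w) are available. It is a minimal illustration under our own choices (function name, linear solve instead of an explicit inverse, log-determinant for numerical stability), not the authors' software.

```python
import numpy as np


def multinormal_density(x, mu, sigma):
    """Evaluate the d-variate Gaussian density p(x|w) of Eq. (1).

    mu and sigma are the mean vector and covariance matrix already
    evaluated at the information vector w.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = x.size
    delta = x - mu
    q = delta @ np.linalg.solve(sigma, delta)   # quadratic form in the exponent
    sign, logdet = np.linalg.slogdet(sigma)     # log|Sigma| without overflow
    if sign <= 0:
        raise ValueError("covariance matrix must have a positive determinant")
    log_p = -0.5 * (d * np.log(2.0 * np.pi) + logdet + q)
    return float(np.exp(log_p))


# At the mean of a 5-D standard normal the density equals (2*pi)**(-5/2).
print(multinormal_density(np.zeros(5), np.zeros(5), np.eye(5)))
```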
2.4 Artificial neural networks
Three feed-forward ANNs were designed to estimate the parameters μ(w) and Σ(w) of the multivariate normal model. They were made sufficiently flexible (enough hidden neurons and appropriate neuron activation functions) to capture all deterministic data patterns. Proceeding by trial and error, we selected an ANN architecture with ten neurons in a single hidden layer, which offered a good compromise between simplicity and generalization ability through error minimisation. Hidden neurons were equipped with biased tansig (hyperbolic tangent sigmoid) activation functions, and output neurons had linear activation for estimating the model parameters. The input data were standardized before presentation to the network so as to have zero mean and unit standard deviation. Standardization has been shown to increase the efficiency of ANN training [6].
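The standardization step can be sketched as below. Computing the statistics on the training set only and reusing them for validation, testing and prediction is our assumption, since the text does not say which cases the statistics come from; the function names are hypothetical.

```python
import numpy as np


def fit_standardizer(train_inputs):
    """Column means and (population) standard deviations of the training inputs."""
    X = np.asarray(train_inputs, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0   # constant columns: avoid division by zero
    return mean, std


def standardize(inputs, mean, std):
    """Shift and scale each column to zero mean and unit standard deviation."""
    return (np.asarray(inputs, dtype=float) - mean) / std


mean, std = fit_standardizer([[3000.0, 1.0], [4000.0, 3.0]])
z = standardize([[3000.0, 1.0], [4000.0, 3.0], [3500.0, 2.0]], mean, std)
print(z)  # rows: [-1, -1], [1, 1], [0, 0]
```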
The first ANN, ANN1, was designed to estimate the model mean vector μ(w) for each combination of pregnancy information w, taken as input data. A block diagram of ANN1 is shown in Fig. 1, where the training (T) and prediction (P) phases are in the upper and lower left sides, respectively. Specifically, ANN1 is trained to recognize the set of echobiometric measurements x, i.e. BPD, HC, AC, FL and GA, from input data w, i.e. BW, AF, FN and US-D. Once trained, ANN1 predicts the corresponding most likely (expected) parameter values x̄, i.e. the expected BPD, HC, AC, FL and GA, for any given set of pregnancy information. These expected values are taken as a reliable estimate of the mean parameter vector μ(w). The ANN1 prediction phase is shown in Fig. 1 because it is needed to obtain the parameter deviations [x_i − μ_i(w)] (i = 1, 2, …, 5), namely the differences between an echobiometric measurement x_i and its corresponding mean value μ_i, estimated by ANN1 as a function of input data w. The centre of Fig. 1 illustrates the calculation of the deviations, together with their squared values, i.e. the deviances δ_i² = [x_i − μ_i(w)]², and all their pairwise products, i.e. the codeviances δ_iδ_j = [x_i − μ_i(w)]·[x_j − μ_j(w)] (i ≠ j; i, j = 1, 2, …, 5).
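The training targets for ANN2 and ANN3 follow directly from these definitions. A minimal sketch, assuming the measurements and the ANN1 predictions are stored as arrays with columns ordered BPD, HC, AC, FL, GA; the pairs i < j come out in the same order as the codeviance labels of Fig. 1. The function name is hypothetical.

```python
from itertools import combinations

import numpy as np


def deviance_targets(x, mu):
    """Deviances and codeviances used as ANN2 and ANN3 training targets.

    x, mu: arrays of shape (n_cases, 5) with measured values and the expected
    values mu(w) predicted by ANN1 (columns BPD, HC, AC, FL, GA).
    Returns arrays of shape (n_cases, 5) and (n_cases, 10).
    """
    delta = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    deviances = delta ** 2                                  # delta_i^2
    pairs = combinations(range(delta.shape[1]), 2)          # the 10 pairs i < j
    codeviances = np.column_stack([delta[:, i] * delta[:, j] for i, j in pairs])
    return deviances, codeviances


dev, codev = deviance_targets([[1, 2, 3, 4, 5]], np.zeros((1, 5)))
print(dev)    # [[ 1.  4.  9. 16. 25.]]
print(codev)  # [[ 2.  3.  4.  5.  6.  8. 10. 12. 15. 20.]]
```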
The two remaining ANNs, ANN2 (upper right side of Fig. 1) and ANN3 (lower right side of Fig. 1), were then trained to recognize deviances and codeviances, respectively. Once trained, ANN2 and ANN3 could therefore estimate the expected values of deviances and codeviances, E{[x_i − μ_i(w)]²} and E{[x_i − μ_i(w)]·[x_j − μ_j(w)]}, respectively, which were taken as suitable estimates of the variances σ_i² and covariances σ_ij of the model covariance matrix Σ(w). Of all the pregnancy information, only BW was assumed to affect the model covariance matrix. It is well known that statistical inference benefits from reducing data dimensionality, especially when a large number of parameters (matrix elements) have to be estimated [6]. The significantly improved accuracy of the estimates largely compensates for the lack of other pregnancy information. ANN2 and ANN3 were therefore equipped with a single BW input (see right of Fig. 1). Their prediction phase is not shown in Fig. 1, to avoid unnecessary detail.

Fig. 1 Block diagram of the feed-forward ANN training process (panels: ANN1 in training (T) and prediction (P) mode, with inputs BW, AF, FN, US-D and outputs BPD, HC, AC, FL, GA; deviances δ_i² and codeviances δ_iδ_j computed from the ANN1 predictions; ANN2 (T), input BW, outputs δ²_BPD, δ²_HC, δ²_AC, δ²_FL, δ²_GA; ANN3 (T), input BW, outputs the ten codeviances from δ_BPDδ_HC to δ_FLδ_GA)
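ANN2 and ANN3 together supply the 5 variances and 10 covariances of Σ(BW). A sketch of assembling them into a symmetric matrix, assuming the outputs are ordered as in Fig. 1; the Cholesky call is our addition, used only to reject a predicted matrix that is not positive definite, as Eq. (1) requires. The function name is hypothetical.

```python
from itertools import combinations

import numpy as np


def assemble_covariance(variances, covariances):
    """Build the covariance matrix Sigma(BW) from ANN2 and ANN3 outputs.

    variances: predicted E{delta_i^2}, ordered BPD, HC, AC, FL, GA.
    covariances: predicted E{delta_i delta_j} for i < j, in the order of
    itertools.combinations(range(5), 2).
    """
    variances = np.asarray(variances, dtype=float)
    d = variances.size
    sigma = np.diag(variances)
    for (i, j), c in zip(combinations(range(d), 2), covariances):
        sigma[i, j] = sigma[j, i] = c      # keep the matrix symmetric
    np.linalg.cholesky(sigma)              # raises LinAlgError if not positive definite
    return sigma


sigma = assemble_covariance([1, 2, 3, 4, 5], [0.1] * 10)
print(sigma[0])  # [1.  0.1 0.1 0.1 0.1]
```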
All the ANNs were trained by a batch method, which updates synaptic weights and neuron biases only after all inputs and targets have been presented, i.e. once per iteration (epoch). An iterative training algorithm, gradient descent with momentum and adaptive learning rate, was used to minimise the mean squared error between target and predicted outputs.
To limit the influence of training algorithm initialization on the solution, we performed 99 training sessions starting from 99 different randomly selected initial values of the ANN parameters (i.e. synaptic weights and biases), and chose the session giving the median error value (the 50th sorted value).
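Selecting the median session can be sketched generically. `train_once` is a hypothetical callable that trains one network from a given random seed and returns its final error together with the trained model; `toy_train` is only a stand-in used to exercise the function.

```python
import random
from typing import Callable, Tuple, TypeVar

Model = TypeVar("Model")


def median_training_session(train_once: Callable[[int], Tuple[float, Model]],
                            n_sessions: int = 99) -> Tuple[float, Model]:
    """Train n_sessions times from different random initializations and keep
    the session whose final error is the median (the 50th of 99 sorted values)."""
    sessions = [train_once(seed) for seed in range(n_sessions)]
    sessions.sort(key=lambda s: s[0])      # sort by final error only
    return sessions[n_sessions // 2]       # index 49 for 99 sessions


def toy_train(seed: int) -> Tuple[float, dict]:
    """Stand-in trainer: returns a pseudo-random 'error' for each seed."""
    rng = random.Random(seed)
    return rng.random(), {"seed": seed}


error, model = median_training_session(toy_train)
```

With an odd number of sessions the median is an actual trained network, not an average of two, which is presumably why 99 rather than 100 sessions are convenient.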
The early-stopping method was applied during training to control ANN generalization power and avoid overfitting [6, 24]. At each iteration, training and validation errors were calculated from the data used to train the ANN (training set) and to validate generalization (validation set), respectively. Training was stopped when the validation error had not decreased for ten consecutive iterations. Testing data were then used to confirm generalization on a third set of cases that had not been used during training.
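The training scheme described in this subsection can be sketched in NumPy for one network: ten tanh hidden units with biases, linear outputs, batch gradient descent with momentum and adaptive learning rate, and early stopping after ten epochs without validation improvement. The rate-adaptation rule and its default settings are modelled on MATLAB's `traingdx`; the authors' actual settings, initialization and epoch limit are not stated, so those choices and the function name are assumptions.

```python
import numpy as np


def train_mlp(X_tr, Y_tr, X_val, Y_val, n_hidden=10, lr=0.01, momentum=0.9,
              lr_inc=1.05, lr_dec=0.7, max_err_inc=1.04, patience=10,
              max_epochs=5000, seed=0):
    """Batch-train a one-hidden-layer network (tanh hidden, linear output).

    A step that raises the training MSE by more than max_err_inc is discarded
    and lr is multiplied by lr_dec; otherwise the step is kept and lr is
    multiplied by lr_inc if the MSE decreased. Training ends after `patience`
    accepted epochs without a new best validation MSE; the weights with the
    best validation MSE are returned.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = X_tr.shape[1], Y_tr.shape[1]
    params = {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

    def forward(p, X):
        H = np.tanh(X @ p["W1"] + p["b1"])   # tansig hidden layer with biases
        return H, H @ p["W2"] + p["b2"]       # linear output layer

    def mse(p, X, Y):
        return float(np.mean((forward(p, X)[1] - Y) ** 2))

    def gradients(p, X, Y):
        H, out = forward(p, X)
        g_out = 2.0 * (out - Y) / Y.size      # d(MSE)/d(output) over the whole batch
        g_hid = (g_out @ p["W2"].T) * (1.0 - H ** 2)
        return {"W1": X.T @ g_hid, "b1": g_hid.sum(axis=0),
                "W2": H.T @ g_out, "b2": g_out.sum(axis=0)}

    velocity = {k: np.zeros_like(v) for k, v in params.items()}
    err = mse(params, X_tr, Y_tr)
    best_val, best_params, stall = np.inf, params, 0
    for _ in range(max_epochs):
        grads = gradients(params, X_tr, Y_tr)
        step = {k: momentum * velocity[k] - lr * grads[k] for k in params}
        candidate = {k: params[k] + step[k] for k in params}
        new_err = mse(candidate, X_tr, Y_tr)
        if new_err > err * max_err_inc:
            lr *= lr_dec                      # discard the step and slow down
            velocity = {k: np.zeros_like(v) for k, v in params.items()}
            continue
        if new_err < err:
            lr *= lr_inc                      # progress: enlarge the step
        params, velocity, err = candidate, step, new_err
        val = mse(params, X_val, Y_val)
        if val < best_val:
            best_val, best_params, stall = val, params, 0
        else:
            stall += 1
            if stall >= patience:             # early stopping
                break
    return best_params, best_val


# Usage on synthetic data: alternate cases go to training and validation.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
Y = 0.5 * X[:, :1] - 0.3 * X[:, 1:]
params, val_mse = train_mlp(X[0::2], Y[0::2], X[1::2], Y[1::2])
```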
2.5 Fetal weight estimation
The principal aim of this study was to predict FW, which was identified with BW when training the ANNs. BW is the first component of the pregnancy information vector w and cannot be known for an unborn fetus.
For a given fetus, whose mathematical quantities are denoted by a tilde (e.g. w̃, x̃), knowledge of the other three components of vector w̃, that is AF, FN and US-D, and of its measured echobiometric parameters x̃ allows ANN1 to identify the vector of expected parameters μ̃(BW) as a function of the unknown BW. This vector defines five monotonic curves on which the five expected values of BW corresponding to the actual measurements x̃ can be found; they are collected in the five-dimensional vector BWexp.
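Finding BWexp amounts to inverting five monotonic curves. A minimal sketch, assuming the curves μ̃_k(BW) are tabulated on a BW grid and inverted by linear interpolation (our choice; the text does not say how the inversion was done). `mu_of_bw` is a hypothetical stand-in for the trained ANN1 with AF, FN and US-D held fixed.

```python
import numpy as np


def expected_weights(x, mu_of_bw, bw_grid):
    """Univariate expected weights BWexp: for each parameter k, the BW at which
    the monotonic curve mu_k(BW) equals the measured value x_k.

    bw_grid must be increasing; measurements outside the tabulated range are
    clipped to the grid ends by np.interp.
    """
    bw_grid = np.asarray(bw_grid, dtype=float)
    curves = np.array([mu_of_bw(bw) for bw in bw_grid])   # shape (len(grid), d)
    bw_exp = []
    for k, x_k in enumerate(x):
        curve, grid = curves[:, k], bw_grid
        if curve[-1] < curve[0]:          # decreasing curve: flip, np.interp needs increasing xp
            curve, grid = curve[::-1], grid[::-1]
        bw_exp.append(float(np.interp(x_k, curve, grid)))
    return np.array(bw_exp)


# Toy linear curves (hypothetical, for illustration only).
print(expected_weights([82.5, 340.0], lambda b: [b / 40, b / 10], np.arange(2000, 5001, 10)))
# [3300. 3400.]
```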
The most probable value of BW, BWmp, corresponding to x̃ can be derived from model (1) by calculating the volume of the confidence region in parameter space, as follows. Once the available pregnancy data in the information vector w̃ are known, this volume depends only on its first, unknown component, BW, and describes the cumulative conditional probability of x̃, representing the strength of association between the true fetal weight and its just-measured ultrasound parameters. The higher the volume, the more the measurements are expected to be mutually congruent and accurately related to the associated weight.
The confidence region can be described mathematically by considering the scalar quantity in the exponential term of model equation (1):

$$Q = \boldsymbol{\delta}^{T}\,\boldsymbol{\Sigma}^{-1}\,\boldsymbol{\delta} \tag{2}$$

where δ = x − μ represents the vector of generic parameter deviations.
Q is a quadratic form which has been shown to be distributed as d(n² − 1)/[n(n − d)] times a random variable following Fisher's F distribution with d and (n − d) degrees of freedom [28]. In our application, the number of fetuses n used for model design was much greater than the parameter space dimension d, so the approximations (n² − 1) ≈ n² and (n − d) ≈ n, and therefore d(n² − 1)/[n(n − d)] ≅ d, were used for simplicity. Thus, the confidence region at probability level α can be defined as the locus of parameter deviations δ which satisfy the following inequality:

$$Q \le d\,F_c^{-1}(d, n; \alpha) \tag{3}$$

where F_c^{-1} is the inverse of the cumulative F distribution, F_c, with d and n degrees of freedom, evaluated at the probability level α.
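As a worked check of the approximation, assume n = 4,075 (the size of the whole design database; the text does not state exactly which n entered the formula) and d = 5:

$$\frac{d\,(n^2-1)}{n\,(n-d)} = \frac{5\,(4075^2-1)}{4075 \times 4070} = \frac{5 \times 16\,605\,624}{16\,585\,250} \approx 5.006$$

so replacing the scale factor by d = 5 changes it by only about 0.12%, and using n instead of n − d = 4,070 degrees of freedom is similarly negligible.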
Equation (3) describes a five-dimensional hyperellipsoidal region.
The probability α̃ defines the volume of the hyperellipsoid on whose surface the current measurements x̃ lie. It can be derived by inverting Eq. (3):

$$\tilde{\alpha} = F_c\!\left(d, n; \tilde{Q}/d\right) \tag{4}$$

where F_c is evaluated at the value Q̃/d, and Q̃ is calculated from formula (2) using δ̃ = x̃ − μ.
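The text uses α̃ as a congruence probability that is larger when the measurements agree better: Eq. (5) maximizes it, and values below 50% prompt a re-check. A minimal sketch consistent with that use evaluates the upper-tail probability of the F(d, n) distribution at Q̃/d with SciPy's `scipy.stats.f.sf`. Reading F_c in this way (as 1 minus the lower cumulative F), the function name and the default n = 4,075 are our assumptions.

```python
from scipy.stats import f


def congruence_probability(q: float, d: int = 5, n: int = 4075) -> float:
    """Probability attached to the hyperellipsoid through the current measurements.

    q is the quadratic form of Eq. (2). The upper-tail probability of F(d, n)
    at q/d equals 1 for q = 0 and falls toward 0 as the deviations grow.
    """
    return float(f.sf(q / d, d, n))


print(congruence_probability(0.0))   # 1.0: measurements exactly at their expected values
print(congruence_probability(5.0))   # moderate deviations
print(congruence_probability(20.0))  # large deviations, low congruence
```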
The quadratic form in (3) implies a unique maximum, α̃_max, for α̃. It corresponds to a value of BW necessarily located between the minimum and maximum values of vector BWexp. Although α̃_max could theoretically be evaluated analytically, for practical reasons we performed a numerical search over the α̃ values corresponding to N sampled values of BW, spaced at steps ΔBW of 10 g, that is:

$$\tilde{\alpha}_{\max} = \max_{i=1,\dots,N}\left\{\tilde{\alpha}(BW_i)\right\}, \qquad
\begin{aligned}
BW_1 &= \min(\mathbf{BW}_{\mathrm{exp}}) \\
BW_N &= \max(\mathbf{BW}_{\mathrm{exp}}) \\
BW_{i+1} &= BW_i + \Delta BW, \quad \Delta BW = 10\ \mathrm{g}
\end{aligned} \tag{5}$$
BWmp was chosen to correspond to the region of maximum probability volume, α̃_max, and was taken as the current EFW, even long before birth. It represents the most plausible value of FW given the available pregnancy information and the current echobiometric measurements, taken together.
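Eq. (5) can be sketched as a plain grid search. Because α̃ in Eq. (4) depends on BW only through Q̃ and the F distribution function is monotonic, the extremum of α̃ over the grid coincides with the smallest Q̃, so the sketch minimizes Q̃ directly. `mu_of_bw` and `sigma_of_bw` are hypothetical callables standing in for the trained ANNs with AF, FN and US-D held fixed.

```python
import numpy as np


def most_probable_bw(x, mu_of_bw, sigma_of_bw, bw_exp, step_g=10.0):
    """Grid search of Eq. (5) for BWmp between min(BWexp) and max(BWexp).

    Returns (BWmp, Q at BWmp).
    """
    x = np.asarray(x, dtype=float)
    grid = np.arange(min(bw_exp), max(bw_exp) + step_g / 2, step_g)  # 10 g steps
    q_values = []
    for bw in grid:
        delta = x - np.asarray(mu_of_bw(bw), dtype=float)
        sigma = np.asarray(sigma_of_bw(bw), dtype=float)
        q_values.append(float(delta @ np.linalg.solve(sigma, delta)))  # Eq. (2)
    best = int(np.argmin(q_values))
    return float(grid[best]), q_values[best]


# Toy model (hypothetical): two linear mean curves and unit covariance.
bw_mp, q_min = most_probable_bw([82.5, 330.0], lambda b: [b / 40, b / 10],
                                lambda b: np.eye(2), [3200, 3400])
print(bw_mp, q_min)  # 3300.0 0.0
```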
The vector μ̃ = μ(BWmp) of expected parameter values evaluated at BWmp provides the model deviations δ̃_m = x̃ − μ̃ from the actual measurements, and their probabilities α̃_m, which account for measurement errors and morphological characteristics of fetal physiopathology.
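α̃_m can be derived by projecting the multivariate normal model (1) onto each parameter axis x_k (k = 1, 2, …, 5), as follows:

$$p(x_k|\mathbf{w}) = \frac{1}{\sqrt{2\pi\tilde{\sigma}_k^2}}\exp\left\{-\frac{1}{2}\,\frac{(x_k-\tilde{\mu}_k)^2}{\tilde{\sigma}_k^2}\right\} \tag{6}$$

where μ̃_k is of course the kth component of μ̃ and σ̃_k² is the corresponding variance on the principal diagonal of the covariance matrix Σ̃ = Σ(BWmp). Any component α̃_{m,k} of vector α̃_m can therefore be calculated from (6):

$$\tilde{\alpha}_{m,k} = \begin{cases}
1 - 2\displaystyle\int_{-\infty}^{\tilde{x}_k} p(x_k|\mathbf{w})\,dx_k & \text{if } \tilde{x}_k \le \tilde{\mu}_k \\[2ex]
1 - 2\displaystyle\int_{\tilde{x}_k}^{+\infty} p(x_k|\mathbf{w})\,dx_k & \text{if } \tilde{x}_k > \tilde{\mu}_k
\end{cases} \tag{7}$$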
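As printed, Eq. (7) gives 1 minus twice the tail area beyond the measurement, i.e. the probability of the central interval bounded by x̃_k, and by symmetry both branches reduce to one expression in z = |x̃_k − μ̃_k|/σ̃_k. Because the text treats low α̃_m values as suspect, the sketch below returns the complementary two-sided tail probability 2[1 − Φ(z)], which is 1 at the expected value and falls toward 0 as the measurement moves away. This reading and the function name are our assumptions.

```python
import math


def parameter_congruence(x_k: float, mu_k: float, var_k: float) -> float:
    """Two-sided tail probability of one parameter under the marginal normal of Eq. (6).

    mu_k and var_k are the kth component of mu(BWmp) and the kth diagonal
    element of Sigma(BWmp).
    """
    z = abs(x_k - mu_k) / math.sqrt(var_k)
    return math.erfc(z / math.sqrt(2.0))   # equals 2 * (1 - Phi(z))


print(parameter_congruence(10.0, 10.0, 4.0))        # 1.0 at the expected value
print(round(parameter_congruence(3.92, 0.0, 4.0), 3))  # about 0.05 at z = 1.96
```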
Accuracy of EFW was evaluated by computing the mean absolute percentage error, MAE%:

$$\mathrm{MAE\%} = \frac{\sum_{i=1}^{N} AE_i}{N}\times 100, \qquad AE_i = \frac{|EFW_i - BW_i|}{BW_i} \tag{8}$$

where AE_i is the relative absolute error of the model in predicting the ith fetal weight.
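Eq. (8) and the two companion indices reported in the Results (maximum AE% and the percentage of cases with AE% above 10%) can be computed as in this sketch; the function name and dictionary keys are our own. The usage example reuses the 3,640 g fetus estimated at 3,250 g mentioned later in the paper, paired with a hypothetical exact estimate.

```python
import numpy as np


def error_summary(efw, bw):
    """MAE% of Eq. (8), plus AEmax% and AEgt10% (percentage of cases with AE% > 10%)."""
    efw = np.asarray(efw, dtype=float)
    bw = np.asarray(bw, dtype=float)
    ae = np.abs(efw - bw) / bw                 # relative absolute error AE_i
    return {
        "MAE%": 100.0 * ae.mean(),
        "AEmax%": 100.0 * ae.max(),
        "AEgt10%": 100.0 * np.mean(ae > 0.10),
    }


s = error_summary([3250, 3000], [3640, 3000])
print({k: round(float(v), 2) for k, v in s.items()})
# {'MAE%': 5.36, 'AEmax%': 10.71, 'AEgt10%': 50.0}
```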
2.6 Clinical evaluation of model performance
Our method for real-time control of fetal echobiometry was then tested for its ability to detect and correct measurement errors and thereby improve EFW accuracy.
Ultrasound parameters of 61 fetuses were evaluated within 5 days of delivery in the Department of Pediatrics, Obstetrics and Reproductive Medicine, University of Siena, by real-time interaction with our multinormal model, implemented numerically in software developed in the Matlab language [19].
To investigate whether the system was able to appropriately correct measurement errors that are difficult to detect and to significantly improve EFW accuracy, an obstetrician with good experience in ultrasound (at least 2 years' experience) was chosen to perform fetal biometry. Ultrasound data were entered in the model to evaluate the probability of agreement among the measured fetal biometric parameters and the current EFW.
On the basis of clinical evidence, the model-estimated maximum probability α̃_max corresponding to the most probable EFW (i.e. BWmp), and the congruence probabilities of the parameters α̃_m, the operator decided autonomously whether or not to correct the first set of measurements and proceed with further refined measurements. Specifically, for each set x̃ of measured echobiometric parameters, the operator was advised to consider possible measurement errors when at least one of the parameter probabilities α̃_m, or the EFW probability α̃_max, was less than 50%. In this case, the operator decided whether to make new ultrasound measurements or keep the current ones, depending on his/her clinical experience and on case-specific clinical information.
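The 50% rule can be written as a small helper that flags a measurement set for review; as in the protocol, it only suggests a re-check and leaves the decision to the operator. The function name and return format are hypothetical; the usage example uses the HC probability (10%) and EFW probability (13%) of the Fig. 3 case.

```python
from typing import Dict, List, Tuple


def review_measurements(param_probs: Dict[str, float], alpha_max: float,
                        threshold: float = 0.50) -> Tuple[bool, List[str]]:
    """Apply the 50% rule of the clinical protocol.

    Returns (review_suggested, suspect_parameters): a review is suggested when
    any parameter congruence probability or the EFW probability is below threshold.
    """
    suspects = [name for name, p in param_probs.items() if p < threshold]
    return (bool(suspects) or alpha_max < threshold), suspects


print(review_measurements({"HC": 0.10}, 0.13))  # (True, ['HC'])
```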
Improvements in EFW accuracy were assessed by applying our interactive method online to the 61 above-mentioned pregnant women in the last week before delivery. We calculated mean and maximum AE% (MAE% and AEmax%) and the percentage of FW estimates with AE% greater than 10% (AEgt10%).
The effectiveness of measurement error correction was also evaluated using mathematical models from the literature [3, 14, 22, 25, 33, 35, 44] that had been shown, by error comparison with the nonparametric Wilcoxon test [1], to give performance equivalent to our model.
3 Results
3.1 Model estimation of fetal weight
Model performance was statistically equivalent for the
training, validation and testing data sets (Wilcoxon test,
p [ 0.05). We therefore report the results for the entire
data set used for model design. Figure 2 shows the distri-
bution of percentage error in relation to birth weight for the
multinormal probability model and the seven models which
gave statistically equivalent performance on the 61 data
items used for evaluating our model in real-time clinical
practice. Table 1 gives the MAE% and the percentage of
cases with AE% greater than 10% (AEgt10%) for each
model. As we can see (Fig. 2), only our proposed multi-
normal model, by virtue of its probability nature, has
uniform non-biased behaviour over the whole range of
BW. On the contrary, all the other models based on
regression techniques have an error distribution strongly
influenced by training data density in BW space, with the
only exception being the Hadlock model, which has
moderate bias because it was trained on a data set having a
Med Bio Eng Comput (2008) 46:109–120 113
123
quite uniform BW distribution [22]. Table 1 shows that this
model had errors very similar (MAE% = 7.81, AEgt10% =
30.8%) to our model (MAE% = 7.86, AEgt10% = 31.3%).
In particular, Fig. 2 shows that the Ott [33], Combs [14],
Woo [44] and Robson [35] models overestimate low BWs
and underestimate high BWs, whereas the Hill [25] and
Benson [3] models have different biases, underestimating
low and high BWs and overestimating intermediate BWs.
The lowest performances in Table 1 are shown by models
particularly biased at high BWs. Cases with high errors
generally also had low probabilities associated with our
model EFW, presumably due to ultrasound measurement
errors. Probability region boundaries with low probability
values are therefore an inspection area in which measure-
ment errors should be checked and where the accuracy of
EFW could improve.
Fig. 2 Distribution of percentage error in relation to birth weight in our multinormal model and the other seven models selected to give statistically equivalent performance with our clinical data
A prototype numerical implementation of our model is shown in Fig. 3, a screen copy of the graphical user interface of the underlying software. The right side of Fig. 3 shows the gestational information (w̃), the actual measurements (x̃), the probabilities of congruence among them (α̃_m) and their model-estimated expected values (μ̃). The lower the probability of parameter congruence, the more suspect that parameter should be considered. A high deviation (δ̃_m) from the expected values may be due to measurement errors. Excessively low probability values, or low values for more than one parameter, suggest that the ultrasound session should be repeated. The left side of Fig. 3 shows five plot windows with the most probable parameter values (black lines) and standard deviations (light blue lines) in relation to BW, as estimated by the ANNs. Dots around the curves represent training data. At the top of the graphic windows are the EFW (BWmp) and its multivariate probability (α̃_max). Again, the lower this probability, the larger the measurement errors, or the more unusual the body conformation, or both, that can be expected. When α̃_max is particularly low, at least one of the congruence probabilities α̃_m is low as well. Dashed blue lines mark both the EFW (BWmp, vertical lines) and its corresponding model-estimated expected parameter values (μ̃, horizontal lines). At the bottom of each plotting area, the univariate expected EFWs (BWexp, vertical dashed red lines) are reported with the measured parameter values (x̃, horizontal dashed red lines). The multivariate most probable EFW, BWmp, necessarily lies between the minimum and maximum of the five univariate BWexp values.
Figure 3 shows an example of EFW by our system for a fetus at 40 weeks. The system indicates that the measured head circumference (HC = 350 mm) has a low probability (10%) of being congruent with the other fetal biometric parameters, and it returns an EFW of 3,331 g with a probability of 13%. This could mean: (1) that the HC measurement is incorrect and needs to be repeated; (2) that the fetal HC is correct but larger than expected because of hereditary predisposition; or (3) that HC is larger for pathological reasons. Only the operator's experience, if necessary combined with other clinical information, can answer this question.
Table 1 Model performance evaluated on the whole set of data (training, validation and testing sets) used to design the multinormal model

Model          MAE%    AEgt10%
Multinormal    7.86    31.3
Ott            7.45    27.2
Combs          8.43    33.1
Hill           8.00    29.7
Woo            7.53    28.7
Benson         8.43    32.6
Hadlock        7.81    30.8
Robson         7.74    30.0

MAE%: mean absolute percentage error; AEgt10%: percentage of fetuses estimated with an AE% greater than 10%
Fig. 3 Graphical user interface of the interactive software for fetal echobiometry control and correction, to improve EFW accuracy
3.2 Clinical evaluation of model performance
In 16 out of 61 cases (26.2%) fetal biometry was measured once, and in 45 cases it was repeated two or more times, for a total of 153 measurements. System performance was assessed by comparing its 61 initial FW estimates with those obtained after no (16 cases), one (3 cases) or more (42 cases) re-measurements of ultrasound parameters associated with low (less than 50%) congruence probabilities. For comparison we used EFWs derived from 182 formulas (from 59 published papers) [17]. Considering the 61 initial estimates, seven formulas [3, 14, 22, 25, 33, 35, 44] showed performance statistically equivalent to our system (Wilcoxon test, P > 0.05). All other formulas gave significantly higher errors. Table 2 shows the performance of all models. Correction of the detected errors yielded statistically significant improvements not only in our model's EFW (MAE% from 6.5% to 2.6%) but also when the new biometry was tested with the seven best models (e.g. Hadlock formula MAE% from 6.7% to 3.5%), confirming that the system is able to correct measurement errors that worsen model accuracy.
In particular, although the Hadlock model showed only the second-largest decrease in MAE% after our system, it showed a drastic reduction in error variability, with a maximum error of 9.0% (in the same fetus), lower than that of our system (maximum error of 10.7%). Nevertheless, this maximum error of 10.7% is acceptable, because it concerns a normal weight fetus (real weight 3,640 g) that was underestimated by the system (EFW equal to 3,250 g). Other models also performed very well, with few errors above 10%.
In the cases we analyzed, MAE% was already low for the initial estimates because the measurements were made by an experienced operator. After correction, the percentage of cases with an error above 10% fell to zero or to a single case (1.6%) for all models except two (Benson, 3.3%, and Robson, 9.8%), as shown in Table 2. For most models the maximum error was lower than, or only slightly above, 10%.
4 Discussion
Accurate prediction of BW by ultrasonographic measurement of classical fetal morphometric parameters plus other related pregnancy data, such as gestational age, amniotic fluid volume and number of fetuses, is of considerable interest in obstetrics, enabling clinicians to predict infant morbidity and mortality more accurately [17]. Moreover, EFW in utero is of great clinical interest for monitoring fetal growth [31, 34] and may play a central role in major medical decisions in critical conditions of preterm delivery and fetal macrosomia [15, 20, 35, 36].
Although many sophisticated mathematical formulas and models have been developed in the last 30 years [3, 11, 14–17, 20, 22, 23, 25, 27, 29, 33, 35, 36, 38, 41, 44], estimates still typically have too high an error variance, preventing reliable clinical use [13, 15, 17, 29]. Even operators with proven ability in ultrasound examination produce remarkably high percentages (15–25%) of fetuses whose BW is estimated with an AE% greater than 10%. This problem seems difficult to overcome because the many errors of fetal ultrasound evaluation are presumably due to technological and environmental factors, intra- and interobserver variability in fetal measurement and so forth [17, 29]. Major revolutions in technology, ultrasonographic practice or other methods that could significantly improve the accuracy of measurements and/or their ability to predict BW are currently unlikely. At the moment, it is not at all easy to quantify errors, and particularly to discriminate errors due to intra- and interobserver variability in ultrasound measurements. Efforts must be made to minimise this variability if EFW is to be considered clinically useful [17].

Table 2 Model performance evaluated in 61 pregnancies before (initial measurements) and after (ultimate measurements) zero (16 cases), one (3 cases) or more (42 cases) corrections of the initial ultrasound measurements

               Initial measurements          Ultimate measurements
Model          MAE%   AEmax%   AEgt10%       MAE%   AEmax%   AEgt10%
Multinormal    6.5    19.3     13.1          2.6    10.7      1.6
Ott            5.7    16.9      9.8          4.3     9.6      0.0
Combs          5.8    19.1     11.5          4.2    12.2      1.6
Hill           6.2    18.6     13.1          4.6    10.2      1.6
Woo            6.6    20.1     18.0          4.6     9.8      0.0
Benson         6.7    19.1     16.4          4.9    13.6      3.3
Hadlock        6.7    18.1     16.4          3.5     9.0      0.0
Robson         6.7    16.8     16.4          5.4    14.7      9.8

Corrections were decided autonomously by the operator using an interactive system based on the proposed multinormal model for fetal weight estimation. AE%: absolute percentage error; MAE%: mean absolute percentage error; AEmax%: maximum absolute percentage error; AEgt10%: percentage of fetuses estimated with AE% greater than 10%
Many recent attempts have been made to reduce the estimation error for lower and higher FWs, where clinical interest is of course focused. In general, clinicians distinguish these two critical weight intervals from an intermediate one that typically ranges from 2,500 to 4,000 g [16, 20, 23]. Almost all models for EFW exhibit worse accuracy in the critical weight classes (below 2,500 g and above 4,000 g), where lower/higher weights are usually over/underestimated [13, 16, 29]. Most mathematical models are derived from statistical regressions and account nonlinearly for ultrasound measurements by fitting experimental data. They are therefore most accurate for intermediate weights, where experimental data have higher density, and produce increasing biases moving from the median towards lower or higher FWs, where data density progressively decreases. It is important to underline that it is precisely in the critical weight classes that weight estimation becomes fundamental from a clinical point of view. The result is a dangerous increase in the rate of falsely normal weight estimates. In other words, such biased models tend to be excessively reassuring about a normal FW, correctly identifying only very critical conditions that can be detected by simple qualitative investigations.
Models specialized in critical weight ranges have also
been constructed and tested: they are sometimes much
more accurate in the range where they have been fitted and
dramatically less accurate elsewhere, as would be expected
[15, 17, 23, 29, 35, 36, 38, 41]. The use of these specialized
models therefore requires prior knowledge about the
weight range in which to classify the fetus, leading to
dangerous amplification of errors in borderline areas which
are of critical clinical interest. This also has legal
implications for ultrasonographers, who may make gross errors
with severe consequences for maternal and fetal health.
Moreover, several studies have evaluated the efficacy of
mathematical models specialized for specific GA intervals
[32, 41]. Although GA intervals are better defined than
weight intervals, they depend on the precision of GA
estimation, which decreases as pregnancy progresses, and
GA is only partially related to whether a fetus is microsomic
or macrosomic.
In our opinion, the use of mathematical models spe-
cialized for specific FW and/or GA ranges can therefore be
dangerous, of little clinical interest and not significantly
better than those applicable to the entire fetal population. In
short, they are of no help.
All other efforts to decrease AE% by introducing cor-
rection factors in the algorithms and new information, such
as amniotic fluid volume, number of fetuses and maternal
pathologies, or non-routine echobiometric parameters, have
failed to bring effective improvements [8]. Moreover, more
recent mathematical models, besides the above mentioned
limits, are sometimes based on echobiometric parameters
difficult to obtain, particularly by unskilled operators [8,
37, 40]. For example, three-dimensional (3D) ultrasound
enables volumetric parameters such as fetal thigh, upper
arm and abdominal volumes to be measured for EFW. Although
preliminary studies seem to indicate improvements [40],
doubts remain about whether 3D ultrasound substantially
improves the accuracy of EFW [17]. Moreover, 3D
ultrasound systems are expensive, less widespread than 2D
systems, and unfamiliar to operators doing routine fetal
biometry. In any case, if the superiority of 3D ultrasound
systems were established, our model could easily be
extended to volumetric measurements.
Today, about ten models are considered to give the best
performance, with no significant differences among them,
and none achieves an MAE% below 7–8% [15, 17, 29].
We chose to tackle the problem of reducing human error
in the use of ultrasound devices for fetal biometry, with the
aim of significantly improving the accuracy of EFW. An interesting
attempt to control ultrasound measurement errors by
enhancing the fetal border and reducing noise was recently
proposed for evaluation of nuchal translucency thickness
[30]. Its impact on fetal echobiometry for improving the
accuracy of EFW should be investigated.
We designed a weight-dependent Gaussian probability
model [1, 28] over the whole range of BWs, which avoids
the above-mentioned biases and provides detailed infor-
mation about the reliability of measurements through
interactive software, allowing redefinition of measurements
and real-time correction. Model parameters were estimated
from a large database of 3,000 fetuses, collected by ultra-
sound operators of proven experience, though presumably
containing measurement errors. Our hypothesis was that by
correcting or limiting these errors, we could obtain an EFW
of acceptable accuracy to protect fetal and maternal health
and reduce wrong medical decisions, which sometimes also
have legal implications.
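The sketch below makes the idea of a weight-dependent Gaussian model concrete. It treats the echobiometric vector x as multinormal with mean μ(W) and covariance Σ(W), and it takes EFW to be the weight on a grid that maximises the log-likelihood. Both the maximum-likelihood rule and the parameter functions are assumptions for illustration: the hypothetical `model_params` uses a linear μ(W) and a constant-correlation Σ(W) for x = [AC, FL] with invented coefficients. In the paper, these parameters come from ANNs that also take pregnancy data as input.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def mvn_logpdf(x, mean, cov):
    """Log density of a multivariate normal, computed through a Cholesky factor."""
    L = cholesky(cov)
    d = len(x)
    # Forward substitution solves L z = (x - mean); the quadratic form is then z'z.
    z = []
    for i in range(d):
        r = x[i] - mean[i] - sum(L[i][k] * z[k] for k in range(i))
        z.append(r / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))

def model_params(w):
    """Hypothetical mean and covariance of [AC, FL] (mm) at weight w (g); invented numbers."""
    mean = [180.0 + 0.045 * w, 45.0 + 0.008 * w]
    sd = [10.0 + 0.003 * w, 3.0]
    rho = 0.5
    cov = [[sd[0] ** 2, rho * sd[0] * sd[1]],
           [rho * sd[0] * sd[1], sd[1] ** 2]]
    return mean, cov

def estimate_weight(x, grid=range(500, 5501, 10)):
    """Grid search for the weight that maximises log p(x | W)."""
    return max(grid, key=lambda w: mvn_logpdf(x, *model_params(w)))

est = estimate_weight([328.5, 71.4])   # the mean vector of the hypothetical model at 3300 g
print(est)
```

Σ(W) changes with W, so the log-determinant term also shifts the maximiser. For this reason, the estimate for a perfectly "typical" measurement vector lands close to, but not exactly on, the weight that generated it.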
In line with Dudley [17], we consider that insufficient
accuracy in EFW depends on excessive intra- and inter-
observer variability of measurements. The great advantage
of using a multivariate Gaussian model is that it assigns
probability values to the different ultrasound measurements
and to EFW. The model is designed and trained on ultra-
sound data measured by experienced ultrasound operators
who carefully followed the standardised protocols for
correct echobiometry [4]. It can therefore guide operators
through its reliable statistical representation, suggesting
that divergent readings be repeated to reduce errors. We
assumed that human errors occur more frequently in the
space of ultrasound measurements where the model
indicates lower probabilities of congruence among bio-
metric parameters. However, low probabilities can also
arise from fetal pathology or peculiar morphology, for
example in cases of maternal diabetes, unusual parental
build or abnormal fetal growth. Although the two situations
may not be distinguishable by ultrasound examination alone,
both are of great clinical interest. Thus, when operators
encounter low model probabilities, they are alerted to
investigate more thoroughly than usual and to repeat the
suggested biometric measurements. Two outcomes are
possible: the new measurements may be (1) the same as
before and/or still associated with low probabilities, or
(2) substantially different but shifted toward the values
expected by the model, increasing the probability of
congruence with other fetal parameters. In the first case,
there may be abnormalities
suggesting the need to review other clinical data, such as
maternal/paternal build and pathologies. In the second
case, measurement errors may be detected and corrected. In
both situations, at least a third session of measurements is
recommended for confirmation. If any disagreement still
remains between measurement sessions, operators should
decide on the basis of other clinical information and/or
experience.
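The following sketch shows one way to turn "probability of congruence" into a flag on a single measurement. It is an assumption, not the authors' exact computation. Each measurement is compared with its conditional distribution given the other measurements under the same multinormal model. With precision matrix P = Σ⁻¹ and δ = x − μ, the conditional variance of x_i given the rest is 1/P_ii, and the standardized residual is z_i = (Pδ)_i / √P_ii. A small two-sided probability marks that reading for repetition. An overall probability is also computed from the squared Mahalanobis distance δᵀPδ, which follows a chi-square distribution with as many degrees of freedom as there are parameters. The closed-form tail used here is valid only for an even number of parameters. The mean vector, SDs, common correlation and 0.05 threshold are all invented.

```python
import math

def invert(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def chi2_sf_even(x, k):
    """Upper-tail chi-square probability; this closed form holds only for even k."""
    if k % 2:
        raise ValueError("closed form needs an even number of degrees of freedom")
    term, total = 1.0, 1.0
    for i in range(1, k // 2):
        term *= (x / 2.0) / i
        total += term
    return math.exp(-x / 2.0) * total

def congruence(x, mean, cov, names):
    prec = invert(cov)
    d = [xi - mi for xi, mi in zip(x, mean)]
    pd = [sum(prec[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    overall = chi2_sf_even(sum(di * pi for di, pi in zip(d, pd)), len(x))
    per_param = {}
    for i, name in enumerate(names):
        z = pd[i] / math.sqrt(prec[i][i])      # residual from conditional mean, in conditional SDs
        per_param[name] = (z, math.erfc(abs(z) / math.sqrt(2.0)))   # two-sided probability
    return overall, per_param

# Hypothetical model at one weight: [BPD, HC, AC, FL] in mm, common correlation 0.6.
names = ["BPD", "HC", "AC", "FL"]
mean = [93.0, 330.0, 340.0, 72.0]
sd = [3.0, 10.0, 15.0, 3.0]
cov = [[(1.0 if i == j else 0.6) * sd[i] * sd[j] for j in range(4)] for i in range(4)]
overall, per = congruence([93.0, 330.0, 340.0, 64.0], mean, cov, names)   # FL 8 mm short
flagged = [n for n, (z, p) in per.items() if p < 0.05]
print(round(overall, 4), flagged)
```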
Since our method incorporated certain clinical information
about pregnancy, it was convenient to use an ANN
approach [24] to estimate the multinormal model parameters
(i.e. mean vectors and covariance matrices), which were
made to depend on pregnancy data and FW. Making the
model depend on pregnancy information gives a more
accurate probability, but it makes estimating the model
parameters from sample data impractical with common
statistical methods, such as multivariate regression, which
would be inaccurate. For example, means of the parameter
vector could be estimated by entering pregnancy variables
in multivariate linear regression models where echobio-
metric measurements are assumed as dependent variables.
Unfortunately, all regression techniques are very sensitive
to empty regions in observation space and to outliers [1, 5,
6, 12, 28], and are most accurate where observations are
densest. Since clinical applications are particularly concerned
with regions of low data density, e.g. macrosomic and
microsomic fetuses, we chose an ANN approach to
overcome the many limitations of regression techniques [6, 24,
26]. ANNs are sophisticated machine learning methods
which make it possible to express the knowledge contained
in experimental data with great flexibility and precision,
and provide a uniform description, without discontinuities,
of the input-output relationship. They can therefore deter-
mine expected output values with satisfactory accuracy, by
interpolating missing data even in multivariate space with
few sparse observations [6]. Other important advantages of
ANNs with respect to statistical regression models are that
model structure need not be specified in advance, no
hypotheses about the statistical distribution of the data are
needed, they can describe nonlinearities, they naturally take
correlation among input variables into account, and they can
be trained by example, as humans learn [24, 26, 39]. ANNs have recently
been successfully applied in many fields of medicine. All
that is required is a sufficiently large, representative set of
training examples. The main difficulty with ANNs is their
training, which must be done with care to avoid overfitting,
i.e. the tendency of ANNs to learn even the variability that is
specific to the training data and cannot be generalized to the
whole phenomenon.
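This section does not specify how ANN outputs are turned into a mean vector and a valid covariance matrix. The sketch below assumes one common device. The network outputs the mean directly and outputs the covariance through a Cholesky factor L whose diagonal entries pass through exp(), so Σ = LLᵀ is always positive definite. The architecture (one tanh hidden layer), the two-dimensional output and the three input variables are all hypothetical. The weights are random and untrained. Training is not shown; one option would be to maximise the Gaussian likelihood of the observed measurements.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(layer, x, act=None):
    w, b = layer
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [act(v) for v in out] if act else out

def gaussian_params(net, inputs):
    """Map inputs to a 2-D mean vector and a covariance that is positive definite by construction."""
    hidden = dense(net[0], inputs, math.tanh)
    o = dense(net[1], hidden)                         # 5 outputs: 2 means + 3 Cholesky entries
    mean = o[0:2]
    l11, l21, l22 = math.exp(o[2]), o[3], math.exp(o[4])   # exp keeps the diagonal of L > 0
    cov = [[l11 * l11, l11 * l21],
           [l11 * l21, l21 * l21 + l22 * l22]]        # cov = L L^T
    return mean, cov

rng = random.Random(0)
net = [init_layer(3, 8, rng), init_layer(8, 5, rng)]
# Hypothetical standardized inputs: [fetal weight, gestational age, maternal BMI].
mean, cov = gaussian_params(net, [0.2, -0.5, 1.1])
print(mean, cov)
```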
There are many methods of ensuring ANN generalization
power, for example regularization techniques, growing and
pruning algorithms, genetic algorithms and early-stopping
(ES) procedures [6, 26]. We applied ES, which is widely
used to train ANNs because of its short computation time
[6, 24]. It divides the available data into training and
validation sets. Generalization is ensured by stopping the
training process at the iteration when the ANN begins to
overfit, that is, when the error computed on the validation
set starts to increase. However, since the validation set is
involved in the training process, it must not be
used for estimating the generalization error. We therefore
tested the ANNs on a third set of data (testing set),
which had not been used during training [6].
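The split-and-stop logic of ES can be written independently of any network library. In the sketch below, the training step and the validation error are caller-supplied functions (hypothetical names). The 60/20/20 split fractions and the patience rule are assumptions. The demonstration replaces a real network with a made-up validation curve that falls until epoch 30 and then rises.

```python
import random

def split_indices(n, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle 0..n-1 and cut into disjoint training, validation and testing index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_tr = int(n * frac_train)
    n_va = int(n * frac_val)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

def early_stopping(train_step, val_error, max_epochs=1000, patience=10):
    """Call train_step() once per epoch and keep the state with the lowest validation error.

    Stops when the validation error has not improved for `patience` consecutive epochs.
    Returns (best_epoch, best_val_error, best_state).
    """
    best = (0, float("inf"), None)
    since_best = 0
    for epoch in range(1, max_epochs + 1):
        state = train_step()
        err = val_error(state)
        if err < best[1]:
            best = (epoch, err, state)
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best

# Demonstration with a made-up validation curve (minimum at epoch 30).
counter = {"epoch": 0}

def fake_step():
    counter["epoch"] += 1
    return counter["epoch"]            # the "state" here is just the epoch number

best_epoch, best_err, _ = early_stopping(fake_step, lambda s: (s - 30) ** 2 + 5.0, patience=5)
train_idx, val_idx, test_idx = split_indices(100)
print(best_epoch, best_err, counter["epoch"], len(train_idx), len(val_idx), len(test_idx))
```

With a real network, `train_step` should return a copy of the weights rather than a reference that later epochs modify. Otherwise the stored best state is silently overwritten. The testing indices are used only once, after training, to report the generalization error.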
When we tested our model in clinical practice to correct
operator measurement errors in real time, we obtained very
encouraging results. Fetal biometric measurements were
performed by an experienced operator because we wanted
to understand whether, under optimum conditions, it was
possible to obtain errors below 10%. We were successful in
this endeavour.
The significant reduction in MAE% that we obtained when
we entered the corrected parameters into the best
estimation models in the literature confirms that our system
can indeed help operators to correct measurement
errors. The system also promises to be useful for training
less experienced sonographers and could be used as a
quality control system for fetal biometry. By reducing
human error, it enhances EFW and clinical obstetric
management.
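The sketch below shows how entering a corrected parameter into a published formula changes EFW and AE%. It uses the Hadlock et al. [22] head–abdomen–femur equation as it is commonly quoted: log10(EFW) = 1.326 − 0.00326·AC·FL + 0.0107·HC + 0.0438·AC + 0.158·FL, with HC, AC and FL in cm and EFW in grams. This section does not state whether this particular formula was among the models used in the clinical test. The measurements and the 3,500 g birth weight are invented.

```python
def hadlock_hc_ac_fl(hc_cm, ac_cm, fl_cm):
    """EFW in grams from HC, AC and FL in centimetres (Hadlock formula as commonly quoted)."""
    log10_w = (1.326 - 0.00326 * ac_cm * fl_cm + 0.0107 * hc_cm
               + 0.0438 * ac_cm + 0.158 * fl_cm)
    return 10.0 ** log10_w

def abs_pct_error(efw, bw):
    """Absolute percentage error of an estimate relative to the actual birth weight."""
    return 100.0 * abs(efw - bw) / bw

# Invented example: an underestimated AC (32 cm) is re-measured as 35 cm; actual BW 3500 g.
before = hadlock_hc_ac_fl(33.0, 32.0, 7.5)
after = hadlock_hc_ac_fl(33.0, 35.0, 7.5)
print(round(before), round(after))
print(round(abs_pct_error(before, 3500), 1), round(abs_pct_error(after, 3500), 1))
```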
5 Conclusions
A multinormal probability model for the estimation of fetal
weight was implemented numerically to provide clinical
indications about the type and size of measurement errors
in real-time fetal echobiometry. The model compared
actual measurements with expected values and associated
probability values with EFW, indicating the reliability of
EFW in terms of congruence with ultrasound measurements.
Low probabilities prompt careful repetition of suspect
measurements and help ultrasound operators to
interpret fetal morphology by distinguishing between
measurement errors and real pathophysiological
conditions.
Compared to other EFW models of equivalent accuracy,
probability models also have the major clinical advantage
of avoiding over- and under-estimation of micro- and
macrosomic fetal weights, respectively.
Clinical testing of the model on a sample of 61 fetuses
revealed its good performance in correcting measurement
errors and showed a remarkable improvement in accuracy of
EFW, confirmed by other mathematical models of proven
accuracy. Our proposed interactive software therefore offers
valid support for training operators in fetal echobiometry.
Although the system clearly needs to be tested on a
wider scale, its clinical utility and simplicity, as well as the
sharp improvement in accuracy of EFW, suggest that it
could be used as a reliable aid to clinical decision
making in pregnancy. It is also a step toward standardizing
measurement procedures, the lack of which is often a severe
limiting factor in ultrasonographic practice.
Acknowledgments This work was financed by the Italian Ministry of
Education, University and Research (MIUR). Special thanks to ESAOTE
S.p.A., Genoa, Italy, for its valuable and prompt technical support.
References
1. Armitage P, Berry G (1987) Statistical methods in medical
research. Blackwell, Oxford
2. Ben-Haroush A, Yogev Y, Hod M (2004) Fetal weight estimation
in diabetic pregnancies and suspected fetal macrosomia. J Perinat
Med 32(2):113–121
3. Benson CB, Doubilet PM, Saltzman DH (1987) Sonographic
determination of fetal weights in diabetic pregnancies. Am J
Obstet Gynecol 156(2):441–444
4. Bettelheim D, Deutinger J, Bernaschek (1997) Fetal sonographic
biometry: a guide to normal and abnormal measurements. The
Parthenon Publishing Group
5. Biagioli B, Scolletta S, Cevenini G, Barbini E, Giomarelli P,
Barbini P (2006) A multivariate Bayesian model for assessing
morbidity after coronary artery surgery. Crit Care 10(3):R94. doi:
10.1186/cc4951
6. Bishop CM (1995) Neural networks for pattern recognition.
Clarendon, Oxford
7. Chauhan SP, Hendrix NW, Magann EF, Morrison JC, Kenney
SP, Devoe LD (1998) Limitations of clinical and sonographic
estimates of birth weight: experience with 1034 parturients.
Obstet Gynecol 91(1):72–77
8. Chauhan SP, West DJ, Scardo JA, Boyd JM, Joiner J, Hendrix
NW (2000) Antepartum detection of macrosomic fetus: clinical
versus sonographic, including soft-tissue measurements. Obstet
Gynecol 95(5):639–642
9. Chauhan SP, Hendrix NW, Magann EF, Morrison JC, Scardo JA,
Berghella V (2005) A review of sonographic estimate of fetal
weight: vagaries of accuracy. J Matern Fetal Neonatal Med
18(4):211–220
10. Chauhan SP, Cole J, Sanderson M, Magann EF, Scardo JA (2006)
Suspicion of intrauterine growth restriction: use of abdominal
circumference alone or estimated fetal weight below 10%. J Ma-
tern Fetal Neonatal Med 19(9):557–562
11. Chuang L, Hwang JY, Chang CH, Yu CH, Chang FM (2002)
Ultrasound estimation of fetal weight with the use of computer-
ized artificial neural network model. Ultrasound Med Biol
28(8):991–996
12. Cohen J, Cohen P, West SG, Aiken LS (2003) Applied multiple
regression/correlation analysis for the behavioral sciences.
Erlbaum, London
13. Colman A, Maharaj D, Hutton J, Tuohy J (2006) Reliability of
ultrasound estimation of fetal weight in term singleton pregnan-
cies. New Zeal Med J 119(1241):U2146
14. Combs CA, Jaekle RK, Rosenn B, Pope M, Miodovnik M, Sid-
diqi TA (1993) Sonographic estimation of fetal weight based on a
model of fetal volume. Obstet Gynecol 82(3):365–370
15. Coomarasamy A, Connock M, Thornton J, Khan KS (2005)
Accuracy of ultrasound biometry in the prediction of macroso-
mia: a systematic quantitative review. Brit J Obstet Gynaec
112(11):1461–1466
16. Dudley NJ (1995) Selection of appropriate ultrasound methods
for the estimation of fetal weight. Brit J Radiol 68:385–388
17. Dudley NJ (2005) A systematic review of the ultrasound esti-
mation of fetal weight. Ultrasound Obstet Gynecol 25(1):80–89
18. Edwards A, Goff J, Baker L (2001) Accuracy and modifying
factors of the sonographic estimation of fetal weight in a high-
risk population. Aust NZ J Obstet Gyn 41(2):187–190
19. Etter DM, Kuncicky DC, Moore H (2005) Introduction to
MATLAB 7. Prentice Hall, Englewood Cliffs
20. Farmer RM, Medearis AL, Hirata GI, Platt LD (1992) The use of
a neural network for the ultrasonographic estimation of fetal
weight in the macrosomic fetus. Am J Obstet Gynecol
166(5):1467–1472
21. Goldberg JD (2004) Routine screening for fetal anomalies:
expectations. Obstet Gynecol Clin North Am 31(1):35–50
22. Hadlock FP, Harrist RB, Sharman RS, Deter RL, Park SK (1985)
Estimation of fetal weight with the use of head, body, and femur
measurements: a prospective study. Am J Obstet Gynecol
151:333–337
23. Hadlock FP (1990) Sonographic estimation of fetal age and
weight. Fetal Ultrasound 28(1):39–51
24. Haykin S (1994) Neural networks: a comprehensive foundation.
Maxwell Macmillan, Canada
25. Hill LM, Breckle R, Gehrking WC, O’Brien PC (1985) Use of
femur length in estimation of fetal weight. Am J Obstet Gynecol
152:847–852
26. Jamshidi M (2003) Tools for intelligent control: fuzzy control-
lers, neural networks and genetic algorithms. Philos Transact A
Math Phys Eng Sci 361(1809):1781–1808
27. Jordaan HV (1983) Estimation of fetal weight by ultrasound.
J Clin Ultrasound 11(2):59–66
28. Krzanowski WJ (1988) Principles of multivariate analysis: a
user’s perspective. Clarendon, Oxford
29. Kurmanavicius J, Burkhardt T, Wisser J, Huch R (2004) Ultr-
asonographic fetal weight estimation: accuracy of formulas and
accuracy of examiners by birth weight from 500 to 5000 g.
J Perinat Med 32(2):155–161
30. Lee YB, Kim MJ, Kim MH (2007) Robust border enhancement
and detection for measurement of fetal nuchal translucency in
ultrasound images. Med Biol Eng Comput (Spec issue). doi:
10.1007/s11517-007-0225-7
31. Lockwood CJ, Weiner S (1986) Assessment of fetal growth. Clin
Perinatol 13(1):3–35
32. Mongelli M, Biswas A (2002) Menstrual age-dependent
systematic error in sonographic fetal weight estimation: a
mathematical model. J Clin Ultrasound 30(3):139–144
33. Ott WJ, Doyle S, Flamm S, Wittman J (1986) Accurate ultrasonic
estimation of fetal weight. Prospective analysis of a new
ultrasonic formula. Am J Perinatol 3(4):307–310
34. Ott WJ (2006) Sonographic diagnosis of fetal growth restriction.
Clin Obstet Gynecol 49(2):295–307
35. Robson SC, Gallivan S, Walkinshaw SA, Vaughan J, Rodeck CH
(1993) Ultrasonic estimation of fetal weight: use of targeted
formulas in small for gestational age fetuses. Obstet Gynecol
82(3):359–364
36. Rosati P, Exacoustos C, Caruso A, Mancuso S (1992)
Ultrasound diagnosis of fetal macrosomia. Ultrasound Obstet
Gynecol 2(1):23–29
37. Rotmensch S, Celentano C, Liberati M, Malinger G, Sadan O,
Bellati U, Glezerman M (1999) Screening efficacy of the sub-
cutaneous tissue width/femur length ratio for fetal macrosomia in
the non-diabetic pregnancy. Ultrasound Obstet Gynecol
13(5):340–344
38. Sabbagha RE, Minogue J, Tamura RK, Hungerford SA (1989)
Estimation of birth weight by use of ultrasonographic formulas
targeted to LGA, AGA, and SGA fetuses. Am J Obstet Gynecol
160:854–862
39. Sargent DJ (2001) Comparison of artificial neural networks with
other statistical approaches: results from medical data sets.
Cancer 91(S8):1636–1642
40. Schild RL, Fimmers R, Hansmann M (2000) Fetal weight esti-
mation by three-dimensional ultrasound. Ultrasound Obstet
Gynecol 16(5):445–452
41. Secher NJ, Djursing H, Hansen PK, Lenstrup C, Sindberg-Erik-
sen P, Thomsen BL, Keiding N (1987) Estimation of fetal weight
in the third trimester by ultrasound. Eur J Obstet Gynecol Reprod
Biol 24:1–11
42. Sladkevicius P, Saltvedt S, Almstrom H, Kublickas M, Grune-
wald C, Valentin L (2005) Ultrasound dating at 12–14 weeks of
gestation. A prospective cross-validation of established dating
formulae in in vitro fertilized pregnancies. Ultrasound Obstet
Gynecol 26(5):504–511
43. Thornton JG, Hornbuckle J, Vail A, Spiegelhalter DJ, Levene M,
GRIT study group (2004) Infant wellbeing at 2 years of age in the
growth restriction intervention trial (GRIT): multicentred ran-
domised controlled trial. Lancet 364(9433):513–520
44. Woo JS, Wan MC (1986) An evaluation of fetal weight predic-
tion using a simple equation containing the fetal femur length.
J Ultrasound Med 5(8):453–457