
ORIGINAL ARTICLE

An informative probability model enhancing real-time echobiometry to improve fetal weight estimation accuracy

G. Cevenini · F. M. Severi · C. Bocchi · F. Petraglia · P. Barbini

Received: 4 May 2007 / Accepted: 28 November 2007 / Published online: 10 January 2008

© International Federation for Medical and Biological Engineering 2007

Abstract A multinormal probability model is proposed to correct human errors in fetal echobiometry and improve the estimation of fetal weight (EFW). Model parameters were designed to depend on major pregnancy data and were estimated through feed-forward artificial neural networks (ANNs). Data from 4,075 women in labour were used for training and testing the ANNs. The model was implemented numerically to provide EFW together with probabilities of congruence among the measured echobiometric parameters. It enabled ultrasound measurement errors to be checked and corrected interactively in real time. The software proved useful for training medical staff and standardizing measurement procedures, and it provided multiple statistical data on fetal morphometry as well as support for clinical decisions. A clinical protocol testing the system's ability to detect measurement errors was conducted with 61 women in the last week of pregnancy, and it led to decisive improvements in EFW accuracy.

Keywords Probability model · Neural networks · Ultrasound · Echobiometry · Fetal weight estimation

1 Introduction

Many decisions in obstetrics depend on gestational age (GA) and fetal weight (FW). An accurate ultrasound examination performed before 20 weeks of gestation enables the true GA to be estimated [42]. On the other hand, estimation of FW (EFW) using standard biometric parameters, usually related to the geometric dimensions of the fetal head, abdomen and long bones of the extremities, is still problematic [18].

Monitoring of fetal growth is fundamental in modern perinatology because it is closely related to fetal/neonatal wellbeing [43]. Moreover, identification of abnormal intrauterine growth patterns enables better pregnancy management [10, 21, 43].

In the last 30 years, many methods have been developed to improve EFW accuracy, most of them based on formulae derived by regression analysis [3, 16, 22, 23, 25, 27, 33, 35, 36, 38, 41, 44] or on physical models [2, 14, 17, 29]. Artificial neural networks (ANNs) and volumetric methods based on three-dimensional (3D) ultrasonography have also been proposed recently [11, 20, 40].

Clinical use of these mathematical models led to the introduction of EFW in ultrasound reports. Although the models were effective in the original papers, ultrasound operators know that every estimation model loses efficacy when applied in clinical practice [9, 17]. The differences between the accuracies reported in the literature and those obtained in local clinical institutions are due to many factors, the main ones being significant statistical dissimilarity between original and local populations and samples, diversity in echobiometric measurement procedures and lack of model generalization. Little attention has usually been paid to generalization, which refers to a model's ability to provide the same accuracy on data not used for model identification [5]. Specifically, empirical formulae do not guarantee a good compromise between model flexibility, needed to fit all useful information, and robustness, needed to filter out useless data variability. Too many model parameters have been estimated from few ultrasound cases near delivery. Sometimes fetuses with non-homogeneous weight or GA intervals not representative of the whole population are used. In other cases the clinical condition of women in labour is neglected or incorrectly reported.

G. Cevenini (corresponding author) · P. Barbini
Department of Surgery and Bioengineering, University of Siena,
Viale Mario Bracci 16, 53100 Siena, Italy
e-mail: [email protected]

F. M. Severi · C. Bocchi · F. Petraglia
Department of Pediatrics, Obstetrics and Reproductive Medicine, University of Siena,
Viale Mario Bracci 16, 53100 Siena, Italy

Med Bio Eng Comput (2008) 46:109–120
DOI 10.1007/s11517-007-0299-2

Although attempts have been made to reduce statistical sampling errors and lack of generalization power by selecting the most accurate and representative models, a percentage mean absolute error below 7–8% of the true birth weight (BW) has never been achieved in current clinical practice, with 25% (or more) of estimates having an absolute error over 10% [29]. Unfortunately, since most obstetricians take 10% as the critical error threshold above which EFW cannot guarantee correct clinical management, the method cannot yet be considered reliable for clinical decision-making [7, 17].

Although many attempts have been made to reduce estimation errors by means of models specialized in particular ranges of FW or GA [16, 23, 36], or derived from sophisticated 3D and ANN methods [11, 40], it has not proven possible to reduce the error significantly, presumably because it arises from many different unpredictable factors (human, environmental, instrumental, technological, etc.) associated with digital processing of echobiometric values [17].

Since the 10% error limit for all populations of fetuses is not so far away, there is great interest in finding solutions that could improve EFW accuracy enough to reach this goal.

At present, the only way to enhance fetal weight prediction accuracy seems to be the reduction of operator measurement error. Indeed, readings made by operators with long experience in fetal ultrasound have significantly, but still not sufficiently, lower errors.

This paper describes a computerized information system to help ultrasound operators in the control and interactive correction of measurement errors in two-dimensional fetal biometry. It is based on a Gaussian multivariate (multinormal) probability model, the parameters of which are identified by ANNs trained with sample data representing a wide fetal population. It therefore belongs to the class of machine learning methods, which are widely used in computing applications to support clinical decision making. The real-time improvement in EFW accuracy achieved in practice was tested clinically in a small sample of pregnant women.

2 Methods

2.1 Population and samples

To design the model we used data on 4,075 fetuses in the last week before birth, recorded in our clinics over the last 10 years. Only fetuses with evident malformations were excluded from the database, which was divided into three samples equally representative of the fetal population: a training set and a validation set of the same size, taken from the first 3,200 fetuses of the chronologically ordered list (the former from the odd positions and the latter from the even positions); the last 875 cases constituted a testing set. The training and validation sets were used for model training, whereas the testing set was used to check that model performance remained statistically equivalent with new data (generalization ability). Finally, the system was applied in clinical practice to 61 pregnant women in the last week before delivery to verify its effective capacity to support interactive correction of real-time ultrasound measurements and to improve EFW accuracy.
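The split can be sketched in Python as follows. `split_dataset` is a hypothetical helper, and the records are assumed to be already sorted chronologically; "odd" and "even" positions are counted from 1, as in the text.

```python
def split_dataset(records, n_design=3200):
    """Split chronologically ordered records into training, validation and testing sets.

    Positions are counted from 1 as in the text, so the "odd" positions
    (1st, 3rd, 5th, ...) are Python indices 0, 2, 4, ...
    """
    design, testing = records[:n_design], records[n_design:]
    training = design[0::2]    # odd positions of the first 3,200 cases
    validation = design[1::2]  # even positions of the first 3,200 cases
    return training, validation, testing


# Placeholder identifiers standing in for the 4,075 chronologically ordered fetuses.
train, val, test = split_dataset(list(range(4075)))
print(len(train), len(val), len(test))  # 1600 1600 875
```

Interleaving the first 3,200 cases keeps the training and validation sets matched over the whole recording period, while the most recent 875 cases are held out as the testing set.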

2.2 Measurement variables

Fetal echobiometric data, including biparietal diameter (BPD), head and abdominal circumferences (HC, AC) and femur length (FL), were measured by transabdominal ultrasound scan with a Siemens Sonoline Elegra Millenium Edition ultrasound system or a MYLAB Family instrument (ESAOTE spa, Genova, Italy). Gestational age (GA) in weeks was established by accurate menstrual history confirmed by ultrasound examination before the 20th week of gestation. True FW was determined by measuring birth weight (BW) with a precision balance shortly after delivery. BW was the dependent variable used to train our model to estimate FW from ultrasound scans just before delivery.

Essential pregnancy data, namely amniotic fluid volume (AF), number of fetuses (FN) and number of days between the last ultrasound examination and delivery (US-D), were also entered in the training process.

AF was conceived as a binary-coded qualitative variable with four categories: normal, absent, reduced and augmented volume. US-D ranged from 0 (i.e. ultrasound examination and delivery on the same day) to 6 (i.e. ultrasound examination 6 days before delivery).
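The text does not specify how the four AF categories were binary-coded. One plausible reading, assumed here, is a one-hot code (one indicator bit per category); the sketch below assembles a hypothetical input vector w = [BW AF FN US-D] on that basis.

```python
AF_CATEGORIES = ("normal", "absent", "reduced", "augmented")


def encode_af(category):
    """One-hot code for amniotic fluid volume (assumed coding, one bit per category)."""
    if category not in AF_CATEGORIES:
        raise ValueError(f"unknown AF category: {category!r}")
    return [1 if category == c else 0 for c in AF_CATEGORIES]


def pregnancy_vector(bw, af, fn, us_d):
    """Hypothetical layout of w: BW (g), four AF bits, FN, US-D (days)."""
    if not 0 <= us_d <= 6:
        raise ValueError("US-D is expected to lie between 0 and 6 days")
    return [bw, *encode_af(af), fn, us_d]


print(pregnancy_vector(3300.0, "reduced", 1, 2))  # [3300.0, 0, 0, 1, 0, 1, 2]
```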

2.3 Multinormal probability model

To describe the probability space of the ultrasound measurements we used the multivariate Gaussian density function:

$$p(\mathbf{x}/\mathbf{w}) = \frac{1}{(2\pi)^{d/2}\,\lvert\boldsymbol{\Sigma}(\mathbf{w})\rvert^{1/2}}\exp\left\{-\frac{1}{2}\left[\mathbf{x}-\boldsymbol{\mu}(\mathbf{w})\right]^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{w})\left[\mathbf{x}-\boldsymbol{\mu}(\mathbf{w})\right]\right\} \tag{1}$$

where T is the vector transposition operator, d = 5 the parameter space dimension, x = [BPD HC AC FL GA] the vector of current echobiometric parameters, w = [BW AF FN US-D] an information vector conditioning density function (1), and μ(w) and Σ(w) the mean vector and covariance matrix, respectively, of the parameters, which depend on w and have to be estimated to completely define the probability model (1).
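Evaluating Eq. (1) directly can underflow for five-dimensional data, so a common choice is to work with its logarithm. The NumPy sketch below does so; `mu` and `sigma` are plain arrays standing in for μ(w) and Σ(w), which in the paper come from the ANNs.

```python
import numpy as np


def multinormal_logpdf(x, mu, sigma):
    """Natural log of Eq. (1) for measurement vector x, mean mu(w) and covariance Sigma(w)."""
    x, mu, sigma = (np.asarray(a, dtype=float) for a in (x, mu, sigma))
    d = x.size
    delta = x - mu
    sign, logdet = np.linalg.slogdet(sigma)  # log |Sigma| without forming the determinant
    if sign <= 0:
        raise ValueError("Sigma(w) must be positive definite")
    quad = float(delta @ np.linalg.solve(sigma, delta))  # [x - mu]^T Sigma^-1 [x - mu]
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)
```

For a diagonal covariance matrix the result reduces to a sum of independent univariate normal log-densities, which is a convenient sanity check.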

2.4 Artificial neural networks

Three feed-forward ANNs were designed to estimate the parameters μ(w) and Σ(w) of the multivariate normal model. They were made sufficiently flexible (a sufficient number of hidden neurons and appropriate neuron activation functions) to encompass all deterministic data patterns. Proceeding by trial and error, we selected an ANN architecture with ten neurons in a single hidden layer, which offered a good compromise between simplicity and generalization ability through error minimisation. Hidden neurons were equipped with biased tansig (hyperbolic tangent sigmoid) activation functions. The output neurons had linear activation for estimating the model parameters. The input data were standardized before presentation to the network, so as to have zero mean and unit standard deviation. Standardization has been shown to increase the efficiency of ANN training [6].
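The architecture just described (standardized inputs, one hidden layer of ten tansig units, linear outputs) can be sketched with NumPy as below. MATLAB's tansig is mathematically identical to the hyperbolic tangent, so `np.tanh` is used. The seven input columns assume that AF is one-hot coded into four bits (an assumption; the text only says it is binary-coded), and the random weights are placeholders rather than trained values.

```python
import numpy as np


def standardize(inputs):
    """Z-score each input column; the statistics are returned for reuse at prediction time."""
    inputs = np.asarray(inputs, dtype=float)
    mean, std = inputs.mean(axis=0), inputs.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (inputs - mean) / std, mean, std


def mlp_forward(z, w_hidden, b_hidden, w_out, b_out):
    """Single hidden layer with biased tansig (= tanh) units and linear output units."""
    hidden = np.tanh(z @ w_hidden + b_hidden)
    return hidden @ w_out + b_out


rng = np.random.default_rng(0)
raw = rng.normal(size=(100, 7))  # hypothetical inputs: BW, 4 AF bits, FN, US-D
z, mean, std = standardize(raw)
params = (rng.normal(size=(7, 10)), np.zeros(10), rng.normal(size=(10, 5)), np.zeros(5))
out = mlp_forward(z, *params)
print(out.shape)  # (100, 5): one expected [BPD HC AC FL GA] vector per case, as for ANN1
```

The same `mean` and `std` must be reapplied to new cases at prediction time, otherwise the network sees inputs on a different scale from the one it was trained on.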

The first ANN, ANN1, was designed to estimate the model mean vector, μ(w), for each combination of pregnancy information w, considered as input data. A block diagram of ANN1 is shown in Fig. 1, where the training (T) and prediction (P) phases are on the upper and lower left sides, respectively. Specifically, ANN1 is trained to recognize the set of echobiometric measurements x, i.e. BPD, HC, AC, FL and GA, from input data w, i.e. BW, AF, FN and US-D. Once trained, ANN1 predicts the corresponding most likely (expected) parameter values x̄, i.e. the expected BPD, HC, AC, FL and GA, for any given set of pregnancy information. These expected values are taken as a reliable estimate of the mean parameter vector μ(w). The ANN1 prediction phase is shown in Fig. 1 because it is needed to obtain the parameter deviations δi = xi − μi(w) (i = 1, 2, …, 5), namely the differences between an echobiometric measurement, xi, and its corresponding mean value, μi, estimated by ANN1 as a function of input data w. The centre of Fig. 1 illustrates the calculation of the deviations, together with their squared values, i.e. the deviances δi² = [xi − μi(w)]², and all their paired products, i.e. the codeviances δiδj = [xi − μi(w)][xj − μj(w)] (i, j = 1, 2, …, 5; i < j).
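Given a measured vector x and the corresponding ANN1 output μ(w), the training targets for ANN2 and ANN3 follow directly from the definitions above. A minimal sketch; the numbers in the usage line are hypothetical, and the pair ordering (i < j, row by row) is chosen to match the order of the ten codeviances in Fig. 1.

```python
from itertools import combinations


def deviance_targets(x, mu):
    """Return the 5 deviances (ANN2 targets) and 10 codeviances (ANN3 targets) for one case.

    x  : measured [BPD, HC, AC, FL, GA]
    mu : ANN1 estimate mu(w) for the same case
    """
    delta = [xi - mi for xi, mi in zip(x, mu)]  # deviations x_i - mu_i(w)
    deviances = [di * di for di in delta]  # delta_i^2
    codeviances = [delta[i] * delta[j] for i, j in combinations(range(len(delta)), 2)]  # i < j
    return deviances, codeviances


dev, codev = deviance_targets([95.0, 340.0, 350.0, 74.0, 40.0], [94.0, 335.0, 352.0, 73.0, 39.5])
print(dev)         # [1.0, 25.0, 4.0, 1.0, 0.25]
print(len(codev))  # 10
```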

Fig. 1 Block diagram of the feed-forward ANN training process. [Figure: ANN1, shown in training (T) and prediction (P) modes, maps BW, AF, FN and US-D to BPD, HC, AC, FL and GA; the deviations δi are squared (δi²) and multiplied in pairs (δiδj); ANN2 (T) maps BW to the five deviances δ²BPD, δ²HC, δ²AC, δ²FL and δ²GA, and ANN3 (T) maps BW to the ten codeviances from δBPDδHC to δFLδGA.]

The two remaining ANNs, ANN2 (upper right side of Fig. 1) and ANN3 (lower right side of Fig. 1), were then trained to recognize deviances and codeviances, respectively. Once trained, ANN2 and ANN3 could therefore estimate the expected values of deviances and codeviances, E{[xi − μi(w)]²} and E{[xi − μi(w)][xj − μj(w)]}, respectively, which were taken as suitable estimates of the variances σi² and covariances σij of the model covariance matrix Σ(w). Of all the pregnancy information, only BW was assumed to affect the model covariance matrix. It is well known that statistical inference benefits from a reduction of data dimensionality, especially when a large number of parameters (matrix elements) have to be estimated [6]. The significantly improved accuracy of the estimates largely compensates for the lack of the other pregnancy information. ANN2 and ANN3 were therefore equipped with a single BW input (see right of Fig. 1). Their prediction phase is not shown in Fig. 1, to avoid unnecessary detail.
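Once ANN2 and ANN3 have been trained, their outputs at a given BW can be placed into Σ(BW). The sketch below assumes the codeviances are ordered as pairs i < j, row by row. Because the variances and covariances come from separately trained networks, nothing guarantees that the assembled matrix is a valid (positive definite) covariance matrix; the Cholesky check is a precaution added here, not a step described in the text.

```python
from itertools import combinations

import numpy as np


def assemble_covariance(variances, covariances):
    """Build Sigma(BW) from ANN2 outputs (diagonal) and ANN3 outputs (off-diagonal, i < j)."""
    d = len(variances)
    pairs = list(combinations(range(d), 2))
    if len(covariances) != len(pairs):
        raise ValueError(f"expected {len(pairs)} covariances, got {len(covariances)}")
    sigma = np.diag(np.asarray(variances, dtype=float))
    for value, (i, j) in zip(covariances, pairs):
        sigma[i, j] = sigma[j, i] = value  # covariance matrices are symmetric
    np.linalg.cholesky(sigma)  # raises LinAlgError if Sigma is not positive definite
    return sigma


sigma = assemble_covariance([4.0, 9.0, 16.0, 1.0, 1.0], [1.0] + [0.0] * 9)
print(sigma[0, 1], sigma[1, 0])  # 1.0 1.0
```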

All the ANNs were trained using a batch training method, which updates synaptic weights and neuron biases only after all inputs and targets have been presented, i.e. once per iteration. An iterative training algorithm based on gradient descent with momentum and an adaptive learning rate was used to minimise the mean squared error between real and predicted outputs.
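The training rule can be sketched as follows, in the spirit of MATLAB's `traingdx` (gradient descent with momentum and adaptive learning rate). The constants (`lr_inc`, `lr_dec`, `max_perf_inc`, momentum 0.9) are assumed values, not reported in the text, and `loss_grad` is a hypothetical callable returning the batch mean squared error and its gradient, so each call corresponds to one full pass over the training set.

```python
import numpy as np


def train_gdx(loss_grad, w0, lr=0.01, momentum=0.9, lr_inc=1.05, lr_dec=0.7,
              max_perf_inc=1.04, epochs=500):
    """Batch gradient descent with momentum and an adaptive learning rate."""
    w = np.asarray(w0, dtype=float)
    step = np.zeros_like(w)
    loss, grad = loss_grad(w)
    for _ in range(epochs):
        new_step = momentum * step - lr * grad  # one weight update per epoch (batch mode)
        new_w = w + new_step
        new_loss, new_grad = loss_grad(new_w)
        if new_loss > max_perf_inc * loss:
            lr *= lr_dec  # error grew too much: reject the step, slow down, drop momentum
            step = np.zeros_like(w)
            continue
        if new_loss < loss:
            lr *= lr_inc  # error decreased: accept and speed up
        w, step, loss, grad = new_w, new_step, new_loss, new_grad
    return w, loss


# Toy check on a quadratic error surface with its minimum at w = 3.
w, loss = train_gdx(lambda w: (float(np.sum((w - 3.0) ** 2)), 2.0 * (w - 3.0)), [0.0])
```

The learning rate grows while the error keeps falling and shrinks when a step overshoots, so no single fixed rate has to be tuned for each of the three networks.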

To limit the influence of training algorithm initialization on the solution, we performed 99 training sessions starting from 99 different randomly selected initial values of the ANN parameters (i.e. synaptic biases and weights), and chose the session giving the median error value (the 50th sorted value).
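Selecting the median session out of 99 random initializations can be sketched as below. `train_once` is a hypothetical callable that trains one network from a random start and returns its error together with the trained model; sorting only on the error avoids comparing model objects.

```python
import random


def median_session(train_once, n_sessions=99, seed=0):
    """Train from n_sessions random initialisations and keep the session with the median error.

    For 99 sessions, index 49 of the error-sorted list is the 50th sorted value.
    """
    rng = random.Random(seed)  # one generator shared by all sessions, for reproducibility
    sessions = [train_once(rng) for _ in range(n_sessions)]
    sessions.sort(key=lambda session: session[0])
    return sessions[n_sessions // 2]


# Stand-in for real training: each "session" just draws a random final error.
errors = []


def fake_training(rng):
    error = rng.random()
    errors.append(error)
    return error, f"model-{len(errors)}"


best_error, best_model = median_session(fake_training)
```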

The early-stopping method was applied directly during the training process to control ANN generalization power and avoid overfitting [6, 24]. At each iteration, training and validation errors were calculated from the data used to train the ANN (training set) and to validate generalization (validation set), respectively. Training was stopped when the validation error did not decrease for ten consecutive iterations. The testing data were then used to confirm generalization on a third set of cases that had not been used during training.
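The stopping rule (no decrease in validation error for ten consecutive iterations) can be sketched as follows. `step` and `val_error` are hypothetical callables for one batch training iteration and for the current validation-set error; returning the best iteration, so that the weights saved there can be restored, is a common convention assumed here rather than a detail given in the text.

```python
def train_with_early_stopping(step, val_error, patience=10, max_epochs=10_000):
    """Run training until the validation error has not improved for `patience` iterations."""
    best_error, best_epoch, stale = float("inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        step()
        error = val_error()
        if error < best_error:
            best_error, best_epoch, stale = error, epoch, 0  # new best: reset the counter
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best_error


# Simulated validation curve with its minimum at iteration 30.
state = {"epoch": 0}


def fake_step():
    state["epoch"] += 1


def fake_val_error():
    return (state["epoch"] - 30) ** 2 + 5


result = train_with_early_stopping(fake_step, fake_val_error)
print(result)  # (30, 5)
```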

2.5 Fetal weight estimation

The principal aim of this study was to predict FW, for which BW served as the reference when training the ANNs. However, BW is the first component of the pregnancy information vector w and cannot be known for an unborn fetus.

For an unborn fetus, whose quantities will be denoted with a tilde (~), knowledge of the other three components of vector w̃, that is AF, FN and US-D, and of its measured echobiometric parameters, x̃, allows ANN1 to identify the vector of expected parameters, μ̃(BW), as a function of the unknown BW. Its five components define five monotonic curves, on which the five expected values of BW corresponding to the actual measurements x̃ can be found; they are collected in the five-dimensional vector BWexp.
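Because each component of μ̃(BW) is monotonic in BW, each element of BWexp can be found by a one-dimensional root search. The bisection sketch below is one way to do it; the search bounds (300 to 6,000 g), the 0.1 g tolerance and the linear example curves are illustrative assumptions, not values from the paper.

```python
def invert_monotonic(curve, target, lo=300.0, hi=6000.0, tol=0.1):
    """Return BW in [lo, hi] (g) such that curve(BW) = target, for a monotonic curve."""
    f_lo = curve(lo) - target
    if f_lo * (curve(hi) - target) > 0:
        raise ValueError("measurement outside the range spanned by the curve")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = curve(mid) - target
        if f_lo * f_mid <= 0:
            hi = mid  # sign change in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # sign change in [mid, hi]
    return 0.5 * (lo + hi)


def bw_expected(curves, x_measured):
    """BWexp: one expected BW per echobiometric parameter."""
    return [invert_monotonic(curve, xk) for curve, xk in zip(curves, x_measured)]


# Hypothetical linear curves for AC (mm) and GA (weeks) as functions of BW (g).
curves = [lambda bw: 200.0 + 0.045 * bw, lambda bw: 30.0 + bw / 400.0]
print([round(b) for b in bw_expected(curves, [350.0, 38.0])])  # [3333, 3200]
```

The sign test makes the same routine work for increasing and decreasing curves.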

The most probable value of BW, BWmp, corresponding to x̃ can be derived from model (1) by calculating the volume of the confidence region in parameter space, as follows. Once the available pregnancy data of information vector w̃ are known, this volume depends only on its first, unknown component, BW, and describes the cumulative conditional probability of x̃, representing the strength of association between the true fetal weight and its just-measured ultrasound parameters. The higher the volume, the more the measurements are expected to be mutually congruent and accurately related to the associated weight.

The confidence region can be described mathematically by considering the scalar quantity in the exponential term of model equation (1):

$$Q = \boldsymbol{\delta}^{T}\boldsymbol{\Sigma}^{-1}\boldsymbol{\delta} \tag{2}$$

where δ = x − μ represents the vector of generic parameter deviations.

Q is a quadratic form which has been shown to be distributed as d(n² − 1)/[n(n − d)] times a Fisher (F) random variable with d and (n − d) degrees of freedom [28]. In our application, the number of fetuses used for model design, n, was much greater than the parameter space dimension d, so the approximations (n² − 1) ≈ n² and (n − d) ≈ n, and therefore d(n² − 1)/[n(n − d)] ≈ d, were used for simplification. Thus, the confidence region at probability level α can be defined as the locus of parameter deviations δ which satisfy the following inequality:

$$Q \le d\,F_c^{-1}(d, n, \alpha) \tag{3}$$

where Fc⁻¹ is the inverse of the cumulative F distribution, Fc, with d and n degrees of freedom, evaluated at the probability level α.
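A quick numerical check shows how good the approximation d(n² − 1)/[n(n − d)] ≈ d is at this sample size. Taking n = 4,075 (the whole database) is an assumption; with the 1,600 training cases alone the factor is still within about 0.3% of d.

```python
def f_scale(n, d=5):
    """Exact factor d(n^2 - 1) / (n(n - d)) multiplying the F(d, n - d) variable."""
    return d * (n * n - 1) / (n * (n - d))


for n in (1600, 4075):
    factor = f_scale(n)
    print(n, round(factor, 4), f"{100 * (factor / 5 - 1):.2f}% above d")
```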

Equation (3) describes a five-dimensional hyperellipsoidal region.

The probability α̃ defines the volume of the hyperellipsoid on whose surface the current measurements x̃ lie. It can be derived by inverting Eq. (3):

$$\tilde{\alpha} = F_c\left(d, n, \tilde{Q}/d\right) \tag{4}$$

where Fc is evaluated at the value Q̃/d, and Q̃ is calculated from formula (2) using δ̃ = x̃ − μ̃(BW).
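Equations (2) and (4) can be sketched for d = 5 without a statistics library by using the large-n limit implied above: as n grows, d·F(d, n) tends to a chi-square variable with d degrees of freedom, so Fc(d, n, Q̃/d) tends to the chi-square CDF of Q̃, which has a closed form for odd d. (For the exact finite-n value, `scipy.stats.f.cdf(q / d, d, n)` could be used instead.) As printed, Eq. (4) gives the probability mass inside the hyperellipsoid through x̃, which grows with Q̃, whereas the text treats larger values as greater congruence and maximizes α̃, which matches the complementary probability. The sketch therefore returns both; which one the software displays is an assumption left open here.

```python
import math

import numpy as np


def chi2_cdf_5(q):
    """CDF of a chi-square variable with 5 degrees of freedom (closed form for odd d)."""
    if q <= 0.0:
        return 0.0
    return math.erf(math.sqrt(q / 2.0)) - math.sqrt(2.0 * q / math.pi) * math.exp(-q / 2.0) * (1.0 + q / 3.0)


def alpha_tilde(x, mu, sigma):
    """Eq. (2) and the large-n form of Eq. (4) for the five echobiometric parameters."""
    delta = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    if delta.size != 5:
        raise ValueError("this closed form assumes d = 5")
    q = float(delta @ np.linalg.solve(np.asarray(sigma, dtype=float), delta))  # Eq. (2)
    inside = chi2_cdf_5(q)  # mass inside the hyperellipsoid through x (Eq. (4) as printed)
    return q, inside, 1.0 - inside  # complement: falls as x moves away from mu
```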

The quadratic form in (3) implies a unique maximum, α̃max, for α̃. It corresponds to a value of BW necessarily located in the interval between the minimum and the maximum value of vector BWexp. Though α̃max could theoretically be evaluated analytically, for practical reasons we performed a numerical search among the α̃ values corresponding to N sampled values of BW, spaced at steps ΔBW of 10 g, that is

$$\begin{aligned}\tilde{\alpha}_{max} &= \max_{i=1,\dots,N}\left\{\tilde{\alpha}(BW_i)\right\}\\ BW_1 &= \min\left(\mathbf{BW}_{exp}\right),\quad BW_N = \max\left(\mathbf{BW}_{exp}\right),\quad BW_{i+1} = BW_i + \Delta BW,\quad \Delta BW = 10\ \text{g}\end{aligned} \tag{5}$$
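The grid search of Eq. (5) is straightforward to sketch. `congruence` is a hypothetical callable returning α̃ at a given BW (in practice built from the ANN estimates μ̃(BW) and Σ(BW) through Eqs. (2) and (4)); here a toy peaked function stands in for it. The final grid point is set exactly to max(BWexp), so the last interval may be shorter than 10 g.

```python
import math

import numpy as np


def most_probable_bw(bw_exp, congruence, step=10.0):
    """Eq. (5): scan BW from min(BWexp) to max(BWexp) in 10 g steps and keep the maximiser."""
    lo, hi = min(bw_exp), max(bw_exp)
    grid = np.append(np.arange(lo, hi, step), hi)  # BW_1 = min, ..., BW_N = max
    values = [congruence(bw) for bw in grid]
    best = int(np.argmax(values))
    return float(grid[best]), values[best]


# Toy congruence curve peaking near 3,251 g (illustrative only).
bw_mp, alpha_max = most_probable_bw(
    [3100, 3400, 3200, 3300, 3150], lambda bw: math.exp(-((bw - 3251.0) / 200.0) ** 2)
)
print(bw_mp)  # 3250.0
```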

BWmp was chosen to correspond to the region of maximum probability volume, α̃max, and was taken as the current EFW, even long before birth. It represents the most plausible value of FW associated with the available pregnancy information and the current echobiometric measurements, taken together.

The vector μ̃ = μ(BWmp) of expected parameter values evaluated at BWmp provides the model deviations of the actual measurements, δ̃m = x̃ − μ̃, and their probabilities, α̃m, which account for measurement errors and morphological characteristics of fetal physiopathology.

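α̃m can be derived by projecting the multivariate normal model (1) onto any generic parameter axis xk (k = 1, 2, …, 5), as follows:

$$p(x_k/\mathbf{w}) = \frac{1}{\sqrt{2\pi\tilde{\sigma}_k^{2}}}\exp\left\{-\frac{1}{2}\frac{(x_k-\tilde{\mu}_k)^{2}}{\tilde{\sigma}_k^{2}}\right\} \tag{6}$$

where μ̃k is of course the kth component of μ̃ and σ̃k² is the corresponding variance from the principal diagonal of the covariance matrix Σ̃ = Σ(BWmp). Any component α̃m,k of vector α̃m can therefore be calculated from (6):

$$\tilde{\alpha}_{m,k} = \begin{cases} 1 - 2\displaystyle\int_{-\infty}^{\tilde{x}_k} p(x_k/\mathbf{w})\,dx_k & \text{if } \tilde{x}_k \le \tilde{\mu}_k \\[2ex] 1 - 2\displaystyle\int_{\tilde{x}_k}^{+\infty} p(x_k/\mathbf{w})\,dx_k & \text{if } \tilde{x}_k > \tilde{\mu}_k \end{cases} \tag{7}$$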
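Eqs. (6) and (7) need only the normal CDF, available through `math.erf`. The sketch returns the quantity exactly as printed in Eq. (7), 1 − 2 × (tail area), which is the mass between the measurement and its mirror image about μ̃k, and also the two-sided tail probability 2 × (tail area), which decreases as the measurement moves away from μ̃k, as the text's reading of a low congruence probability suggests. Which one the software reports is an assumption left open; the numbers in the example are hypothetical.

```python
import math


def normal_cdf(x, mu, var):
    """CDF of the univariate normal density of Eq. (6)."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))


def alpha_mk(x_k, mu_k, var_k):
    """Eq. (7) for one parameter: tail area beyond x_k on its own side of the mean."""
    cdf = normal_cdf(x_k, mu_k, var_k)
    tail = cdf if x_k <= mu_k else 1.0 - cdf
    central = 1.0 - 2.0 * tail  # Eq. (7) as printed
    return central, 2.0 * tail  # 2 * tail: two-sided probability, low for large deviations


# Hypothetical HC example: measured 350 mm, expected 335 mm, standard deviation 10 mm (z = 1.5).
central, two_sided = alpha_mk(350.0, 335.0, 10.0 ** 2)
print(round(two_sided, 3))  # 0.134
```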

Accuracy of EFW was evaluated by computing the mean absolute percentage error, MAE%:

$$MAE\% = \frac{\sum_{i=1}^{N} AE_i}{N}\times 100, \qquad AE_i = \frac{\lvert EFW_i - BW_i\rvert}{BW_i} \tag{8}$$

where AEi is the relative absolute error of the model in predicting the i-th fetal weight and N is the number of fetuses considered.
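Eq. (8), together with the AEmax% and AEgt10% figures reported in the Results, can be computed as below. The first pair in the example reuses the 10.7% case described in the Results (EFW 3,250 g vs BW 3,640 g); the other two pairs are hypothetical.

```python
def error_summary(efw, bw):
    """MAE% (Eq. 8), AEmax% and AEgt10% for paired weight estimates and birth weights (g)."""
    if len(efw) != len(bw) or not bw:
        raise ValueError("need equally long, non-empty sequences")
    ae = [abs(e - b) / b * 100.0 for e, b in zip(efw, bw)]  # AE_i in percent
    mae = sum(ae) / len(ae)
    gt10 = 100.0 * sum(a > 10.0 for a in ae) / len(ae)
    return mae, max(ae), gt10


mae, ae_max, gt10 = error_summary([3250, 3000, 3100], [3640, 3000, 3000])
print(round(mae, 2), round(ae_max, 2), round(gt10, 1))  # 4.68 10.71 33.3
```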

2.6 Clinical evaluation of model performance

Our method for real-time control of fetal echobiometry was then tested for its effective ability to detect and correct measurement errors and therefore improve EFW accuracy.

The ultrasound parameters of 61 fetuses were evaluated within 5 days of delivery in the Department of Pediatrics, Obstetrics and Reproductive Medicine, University of Siena, by real-time interaction with our multinormal model, implemented numerically in software developed in the MATLAB language [19].

To investigate whether the system was able to appropriately correct measurement errors that are difficult to detect, and to significantly improve the accuracy of EFW, an obstetrician with good experience in ultrasound (at least 2 years) was chosen to perform fetal biometry. Ultrasound data were entered in the model to evaluate the probability of agreement among the measured fetal biometric parameters and the current EFW.

On the basis of clinical evidence, the model-estimated maximum probability α̃max, corresponding to the most probable EFW (i.e. BWmp), and the congruence probabilities of the parameters, α̃m, the operator decided autonomously whether or not to correct the first set of measurements and to proceed with further refined measurements. Specifically, for each set x̃ of measured echobiometric parameters, the operator was advised to consider possible measurement errors when at least one of the parameter probabilities α̃m, or the EFW probability α̃max, was less than 50%. In this case, the operator decided whether to make new ultrasound measurements or to keep the current ones, depending on his/her clinical experience and on case-specific clinical information.
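The advisory rule can be written as a small check. The dictionary layout and function name are hypothetical; the example uses the HC (10%) and EFW (13%) probabilities of the case shown in Fig. 3, while the other four parameter probabilities are invented for illustration. The function only flags; as in the protocol, the decision to remeasure stays with the operator.

```python
def needs_review(alpha_m, alpha_max, threshold=0.5):
    """Flag a measurement set when any congruence probability or the EFW probability is below 50%.

    Returns (flag, names of suspect parameters).
    """
    suspects = [name for name, p in alpha_m.items() if p < threshold]
    return bool(suspects) or alpha_max < threshold, suspects


flag, suspects = needs_review(
    {"BPD": 0.72, "HC": 0.10, "AC": 0.64, "FL": 0.81, "GA": 0.58}, alpha_max=0.13
)
print(flag, suspects)  # True ['HC']
```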

Improvements in EFW accuracy were assessed by applying our interactive method on-line to the 61 above-mentioned pregnant women in the last week before delivery. We calculated the mean and maximum AE% (MAE% and AEmax%) and the percentage of fetuses with AE% greater than 10% (AEgt10%).

The effectiveness of measurement error correction was also evaluated using mathematical models from the literature [3, 14, 22, 25, 33, 35, 44] whose performance was shown to be equivalent to that of our model by comparing errors with the non-parametric Wilcoxon test [1].

3 Results

3.1 Model estimation of fetal weight

Model performance was statistically equivalent for the training, validation and testing data sets (Wilcoxon test, p > 0.05). We therefore report the results for the entire data set used for model design. Figure 2 shows the distribution of percentage error in relation to birth weight for the multinormal probability model and for the seven models that gave statistically equivalent performance on the 61 cases used to evaluate our model in real-time clinical practice. Table 1 gives the MAE% and the percentage of cases with AE% greater than 10% (AEgt10%) for each model. As Fig. 2 shows, only our multinormal model, by virtue of its probabilistic nature, has uniform, unbiased behaviour over the whole range of BW. In contrast, all the other models, which are based on regression techniques, have an error distribution strongly influenced by the training data density in BW space, the only exception being the Hadlock model, which has moderate bias because it was trained on a data set with a fairly uniform BW distribution [22]. Table 1 shows that this model had errors very similar (MAE% = 7.81, AEgt10% = 30.8%) to those of our model (MAE% = 7.86, AEgt10% = 31.3%).

In particular, Fig. 2 shows that the Ott [33], Combs [14], Woo [44] and Robson [35] models overestimate low BWs and underestimate high BWs, whereas the Hill [25] and Benson [3] models have different biases, underestimating low and high BWs and overestimating intermediate BWs. The lowest performances in Table 1 are shown by the models that are particularly biased at high BWs. Cases with high errors generally also had low probabilities associated with our model EFW, presumably because of ultrasound measurement errors. Low-probability regions near the boundaries of the probability region are therefore an inspection area in which measurement errors should be checked and where the accuracy of EFW could be improved.

Fig. 2 Distribution of percentage error in relation to birth weight in our multinormal model and the other seven models selected to give statistically equivalent performance with our clinical data



A prototypical numerical implementation of our model is shown in Fig. 3, which reports a screen hard copy of the graphical user interface of the underlying software. The right side of Fig. 3 shows the gestational information (w̃), the actual measurements (x̃), the probabilities of congruence among them (α̃m) and their model-estimated expected values (μ̃). The lower the probability of parameter congruence, the more suspect that parameter should be considered. A high deviation (δ̃m) from expected values may be due to measurement errors. Excessively low probability values, or low values for more than one parameter, suggest that the ultrasound session should be repeated. The left side of Fig. 3 shows the five plot windows of most probable parameter values (black lines) and standard deviations (light blue lines) in relation to BW, as estimated by the ANNs. Dots around the curves represent training data. At the top of the graphic windows are the EFW (BWmp) and its multivariate probability (α̃max). Again, the lower this probability, the more likely it is that large measurement errors, unusual body conformation, or both are present. When α̃max is particularly low, at least one of the congruence probabilities α̃m is low as well. Dashed blue lines mark both the EFW (BWmp, vertical lines) and its corresponding model-estimated expected parameter values (μ̃, horizontal lines). At the bottom of each plotting area, the univariate expected EFWs (BWexp, vertical dashed red lines) are reported together with the measured parameter values (x̃, horizontal dashed red lines). The multivariate most probable EFW, BWmp, of course lies between the minimum and maximum of the five univariate BWexp values.

Figure 3 shows an example of EFW by our system for a fetus at 40 weeks. The system indicates that the measured head circumference (HC = 350 mm) has a low probability (10%) of being congruent with the other fetal biometric parameters, and gives an EFW of 3,331 g (probability 13%). This could mean: (1) that the HC measurement is incorrect and needs to be repeated; (2) that the fetal HC is correct but larger than expected because of hereditary predisposition; or (3) that the HC is larger for pathological reasons. Only the operator's experience, if necessary combined with other clinical information, can answer this question.

Table 1 Model performance evaluated on the whole set of data (training, validation and testing sets) used to design the multinormal model

Model          MAE%    AEgt10%
Multinormal    7.86    31.3
Ott            7.45    27.2
Combs          8.43    33.1
Hill           8.00    29.7
Woo            7.53    28.7
Benson         8.43    32.6
Hadlock        7.81    30.8
Robson         7.74    30.0

MAE%, mean absolute percentage error; AEgt10%, percentage of fetuses estimated to have an AE% greater than 10%

Fig. 3 Graphic user interface of interactive software for fetal echobiometry control and correction, to improve EFW accuracy



3.2 Clinical evaluation of model performance

In 16 out of 61 cases (26.2%) fetal biometry was measured once, and in 45 cases it was measured two or more times, for a total of 153 measurements. System performance was assessed by comparing its 61 initial FW estimates with those obtained after no (16 cases), one (3 cases) or more (42 cases) re-measurements of the ultrasound parameters associated with low (less than 50%) congruence probabilities. For comparison we used EFWs derived from 182 formulas (from 59 published papers) [17]. Considering the 61 initial estimates, seven formulas [3, 14, 22, 25, 33, 35, 44] showed a performance statistically equivalent to our system (Wilcoxon test, P > 0.05). All other formulas gave significantly higher errors. Table 2 shows the performances of all models. Correction of the detected errors yielded statistically significant improvements not only in our model EFW (MAE% from 6.5% to 2.6%) but also when the new biometry was tested with the seven best models (e.g. Hadlock formula MAE% from 6.7% to 3.5%), thus confirming that the system is able to correct measurement errors that worsen model accuracy.

In particular, the Hadlock model, which showed the second-best decrease in MAE% after our system, also showed a drastic reduction in error variability, with a maximum error of 9.0% (in the same fetus), lower than that of our system (maximum error of 10.7%). Nevertheless, this maximum error of 10.7% is acceptable, because it concerns a normal-weight fetus (real weight 3,640 g) that was underestimated by the system (EFW of 3,250 g). Other models also showed very good performance, with few errors above 10%.

In the cases we analyzed, MAE% was already low for the initial estimates because the measurements were made by an experienced operator. After correction, the percentage of cases with an error above 10% fell to zero or to a single case (1.6%) for all models except two (Benson, 3.3%, and Robson, 9.8%), as shown in Table 2. For most models, the maximum error was lower than, or just a little higher than, 10%.

4 Discussion

Accurate prediction of BW by ultrasonographic measurement of classical fetal morphometric parameters plus other related pregnancy data, such as gestational age, amniotic fluid volume and number of fetuses, is of considerable interest in obstetrics, enabling clinicians to predict infant morbidity and mortality more accurately [17]. Moreover, EFW in utero is of great clinical interest for monitoring fetal growth [31, 34] and may play a central role in major medical decisions in critical conditions such as preterm delivery and fetal macrosomia [15, 20, 35, 36].

Although many sophisticated mathematical formulas and models have been developed in the last 30 years [3, 11, 14–17, 20, 22, 23, 25, 27, 29, 33, 35, 36, 38, 41, 44], estimates still typically have too high an error variance for reliable clinical use [13, 15, 17, 29]. Even operators with proven ability in ultrasound examination produce remarkably high percentages (15–25%) of fetuses whose BW is estimated with an AE% greater than 10%. This problem seems difficult to overcome because the many errors of fetal ultrasound evaluation are presumably due to technological and environmental factors, intra- and inter-observer variability in fetal measurement, and so forth [17, 29]. Major breakthroughs in technology, ultrasonographic practice or other methods that could significantly improve the accuracy of measurements and/or their ability to predict BW seem unlikely at present. At the moment, it is not at all easy to quantify errors, and particularly to discriminate errors due to intra- and inter-observer variability in ultrasound measurements. Efforts must be made to minimise this variability if EFW is to be considered clinically useful [17].

Table 2 Model performance evaluated in 61 pregnancies before (initial measurements) and after (ultimate measurements) zero (16 cases), one (3 cases), or more corrections (42 cases) of the initial ultrasound measurements

               Initial measurements         Ultimate measurements
Model          MAE%   AEmax%   AEgt10%      MAE%   AEmax%   AEgt10%
Multinormal    6.5    19.3     13.1         2.6    10.7     1.6
Ott            5.7    16.9      9.8         4.3     9.6     0.0
Combs          5.8    19.1     11.5         4.2    12.2     1.6
Hill           6.2    18.6     13.1         4.6    10.2     1.6
Woo            6.6    20.1     18.0         4.6     9.8     0.0
Benson         6.7    19.1     16.4         4.9    13.6     3.3
Hadlock        6.7    18.1     16.4         3.5     9.0     0.0
Robson         6.7    16.8     16.4         5.4    14.7     9.8

Corrections were decided autonomously by the operator using an interactive system based on the proposed multinormal model for fetal weight estimation. AE%, absolute percentage error; MAE%, mean absolute percentage error; AEmax%, maximum absolute percentage error; AEgt10%, percentage of fetuses estimated to have AE% greater than 10%

Many recent attempts have been made to reduce the estimation error for low and high FWs, where clinical interest is naturally focused. In general, clinicians distinguish these two critical weight intervals from an intermediate one that typically ranges from 2,500 to 4,000 g [16, 20, 23]. Almost all EFW models lose accuracy in the critical weight classes (below 2,500 g and above 4,000 g), where low weights are usually overestimated and high weights underestimated [13, 16, 29]. Most mathematical models are derived from statistical regressions that account nonlinearly for ultrasound measurements by fitting experimental data. They are therefore most accurate for intermediate weights, where experimental data are densest, and produce increasing biases from the median towards lower or higher FWs, where data density progressively decreases. This is a serious problem, because it is precisely in the critical weight classes that weight estimation becomes clinically decisive. The result is a dangerous increase in the rate of falsely normal weight estimates. In other words, such biased models tend to give false reassurance of a normal FW, correctly identifying only extreme conditions that could be detected by simple qualitative investigation.
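The direction of this bias can be reproduced with a small simulation. The sketch below is a minimal illustration under assumed numbers (a normally distributed population of true weights and a single error-prone ultrasound index expressed in grams), not a model from the cited studies. It isolates one mechanism, regression towards the mean: when FW is fitted by least squares on a noisy measurement, predictions are pulled towards the population mean, so fetuses below 2,500 g come out overestimated and those above 4,000 g underestimated.

```python
import numpy as np

# Illustrative assumptions only: true fetal weights (g) concentrated around the
# median, and one ultrasound-derived index equal to true weight plus error.
rng = np.random.default_rng(0)
n = 20_000
true_fw = rng.normal(3300.0, 450.0, n)
index = true_fw + rng.normal(0.0, 300.0, n)

# Least-squares calibration of FW on the noisy index, as a regression formula would do.
slope, intercept = np.polyfit(index, true_fw, 1)
error = (intercept + slope * index) - true_fw   # positive = overestimate

low = true_fw < 2500.0
high = true_fw > 4000.0
middle = ~low & ~high

bias_low = float(error[low].mean())
bias_mid = float(error[middle].mean())
bias_high = float(error[high].mean())
print(f"slope = {slope:.2f}")
print(f"mean signed error: <2500 g {bias_low:+.0f} g, "
      f"2500-4000 g {bias_mid:+.0f} g, >4000 g {bias_high:+.0f} g")
```

With these assumed settings the fitted slope is well below 1, and the mean signed error is positive in the low class, close to zero in the intermediate class and negative in the high class.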

Models specialized in critical weight ranges have also been constructed and tested: they are sometimes much more accurate in the range where they were fitted and dramatically less accurate elsewhere, as would be expected [15, 17, 23, 29, 35, 36, 38, 41]. The use of these specialized models therefore requires prior knowledge of the weight range to which the fetus belongs, leading to dangerous amplification of errors in borderline areas, which are of critical clinical interest. This also has legal implications for ultrasonographers, who may make gross errors with severe consequences for maternal and fetal health.

Moreover, several studies have evaluated the efficacy of mathematical models tailored to specific GA intervals [32, 41]. Although GA intervals are better defined than weight intervals, they depend on the precision of GA estimation, which decreases as pregnancy advances, and GA is only partially related to microsomic and macrosomic fetuses.

In our opinion, mathematical models specialized for specific FW and/or GA ranges can therefore be dangerous, are of little clinical interest and are not significantly better than models applicable to the entire fetal population. In other words, they are of no help.

All other efforts to decrease AE% by introducing correction factors into the algorithms and new information, such as amniotic fluid volume, number of fetuses and maternal pathologies, or non-routine echobiometric parameters, have failed to produce effective improvements [8]. Moreover, besides the limitations mentioned above, more recent mathematical models are sometimes based on echobiometric parameters that are difficult to obtain, particularly for unskilled operators [8, 37, 40]. Specifically, three-dimensional (3D) ultrasound enables volumetric parameters of the fetal thigh, upper arm and abdomen to be measured for EFW. Although preliminary studies seem to indicate improvements [40], doubts remain about whether 3D ultrasound substantially improves the accuracy of EFW [17]. Moreover, 3D ultrasound systems are expensive, not as widespread as 2D systems, and unfamiliar to operators doing routine fetal biometry. In any case, if the superiority of 3D ultrasound systems were established, our model could easily be extended to volumetric measurements.

Today, about ten models are considered to give the best performance, with no significant differences among them, and none gives an MAE% below 7–8% [15, 17, 29].
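The accuracy indices used in this paper can be computed directly from paired estimates and birth weights. The sketch below assumes that AE% is the absolute difference between EFW and BW expressed as a percentage of BW, the usual convention; the function name `efw_accuracy` and the four-fetus data set are hypothetical.

```python
import numpy as np

def efw_accuracy(efw, bw):
    """Accuracy indices of estimated fetal weights (EFW) against birth weights (BW)."""
    efw = np.asarray(efw, dtype=float)
    bw = np.asarray(bw, dtype=float)
    ae = 100.0 * np.abs(efw - bw) / bw                 # AE% for each fetus
    return {
        "MAE%": float(ae.mean()),                       # mean absolute percentage error
        "AEmax%": float(ae.max()),                      # maximum absolute percentage error
        "AEgt10%": float(100.0 * np.mean(ae > 10.0)),   # % of fetuses with AE% > 10%
    }

# Hypothetical example with four fetuses (grams); not data from the study.
bw = [3000, 2400, 4200, 3500]
efw = [3150, 2700, 3900, 3500]
res = efw_accuracy(efw, bw)
print(res)  # MAE% ≈ 6.16, AEmax% = 12.5, AEgt10% = 25.0
```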

We chose to tackle the problem of reducing human error in the use of ultrasound devices for fetal biometry, with the aim of significantly improving the accuracy of EFW. An interesting attempt to control ultrasound measurement errors by enhancing the fetal border and reducing noise was recently proposed for evaluating nuchal translucency thickness [30]. Its potential for improving the accuracy of EFW in fetal echobiometry remains to be investigated.

We designed a weight-dependent Gaussian probability model [1, 28] over the whole range of BWs, which avoids the above-mentioned biases and, through interactive software, provides detailed information about the reliability of measurements, allowing them to be redefined and corrected in real time. Model parameters were estimated from a large database of 3,000 fetuses, collected by ultrasound operators of proven experience but presumably still containing measurement errors. Our hypothesis was that by correcting or limiting these errors, we could obtain an EFW accurate enough to protect fetal and maternal health and reduce incorrect medical decisions, which sometimes also have legal implications.
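The core idea of a weight-dependent Gaussian model can be sketched as follows. Everything numerical here is an assumption for illustration: the four parameters (BPD, HC, AC, FL), the linear mean functions in `mean_vector`, the fixed correlation matrix and the 4% coefficient of variation are invented, and the dependence on pregnancy data is omitted. EFW is taken as the weight on a grid that maximizes the multinormal likelihood of the measured vector, and congruence is expressed as the chi-square tail probability of the squared Mahalanobis distance at that weight. This is one plausible reading of the approach, not the authors' implementation.

```python
import numpy as np

# Hypothetical model: four biometric parameters (mm) whose expected values and
# spread depend on fetal weight w (g). All coefficients are invented.
def mean_vector(w):
    kg = w / 1000.0
    return np.array([60.0 + 9.0 * kg,     # BPD
                     220.0 + 34.0 * kg,   # HC
                     180.0 + 45.0 * kg,   # AC
                     50.0 + 6.0 * kg])    # FL

CORR = np.array([[1.0, 0.8, 0.6, 0.6],
                 [0.8, 1.0, 0.6, 0.6],
                 [0.6, 0.6, 1.0, 0.6],
                 [0.6, 0.6, 0.6, 1.0]])

def cov_matrix(w, cv=0.04):
    sd = cv * mean_vector(w)          # SD proportional to the expected value
    return CORR * np.outer(sd, sd)

def log_likelihood(x, w):
    """Multinormal log-density of the measurement vector x given weight w."""
    L = np.linalg.cholesky(cov_matrix(w))
    z = np.linalg.solve(L, x - mean_vector(w))
    return -0.5 * z @ z - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2.0 * np.pi)

def chi2_sf_even(d2, k):
    """Upper-tail chi-square probability for even degrees of freedom k (closed form)."""
    assert k % 2 == 0
    half, term, total = d2 / 2.0, 1.0, 1.0
    for i in range(1, k // 2):
        term *= half / i
        total += term
    return float(np.exp(-half) * total)

WEIGHT_GRID = np.arange(500.0, 5501.0, 5.0)

def estimate(x):
    """EFW by maximum likelihood over WEIGHT_GRID, plus a congruence probability."""
    ll = [log_likelihood(x, w) for w in WEIGHT_GRID]
    efw = float(WEIGHT_GRID[int(np.argmax(ll))])
    diff = x - mean_vector(efw)
    d2 = float(diff @ np.linalg.solve(cov_matrix(efw), diff))  # squared Mahalanobis distance
    return efw, chi2_sf_even(d2, len(x))

x_good = mean_vector(3200.0)          # a perfectly typical 3,200 g fetus
x_bad = x_good.copy()
x_bad[2] -= 30.0                      # AC misread by 30 mm
efw_good, p_good = estimate(x_good)
efw_bad, p_bad = estimate(x_bad)
print(efw_good, round(p_good, 3))
print(efw_bad, round(p_bad, 3))
```

Shortening AC by 30 mm moves the estimate downwards and sharply lowers the congruence probability, which is the kind of signal that would prompt a repeat measurement. Because the assumed covariance grows with weight, the maximum-likelihood estimate for a perfectly typical vector sits slightly below the weight that generated it; a prior on FW or a different estimator would change this.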

In line with Dudley [17], we consider that insufficient accuracy in EFW stems from excessive intra- and inter-observer variability of measurements. The great advantage of using a multivariate Gaussian model is that it assigns probability values to the different ultrasound measurements and to EFW. The model is designed and trained on ultrasound data measured by experienced ultrasound operators who carefully followed the standardised protocols for correct echobiometry [4]. It can therefore guide operators through its reliable statistical representation, suggesting that divergent readings be repeated to reduce errors. We assumed that human errors occur more frequently in regions of the ultrasound measurement space where the model indicates lower probabilities of congruence among biometric parameters. However, low probabilities can also arise from fetal pathology or peculiar morphology, associated for example with maternal diabetes, unusual parental build or abnormal fetal growth. Although these two situations may not be distinguishable by ultrasound examination alone, both are of great clinical interest. Thus, when operators encounter low model probabilities, they are alerted to investigate more thoroughly than usual and to repeat the suggested biometric measurements. Two distinct outcomes are then possible: the new measurements may (1) be the same as before and/or still be associated with low probabilities, or (2) be substantially different but shifted in the direction of the model expected values, increasing the probability of congruence with the other fetal parameters. In the first case, there may be abnormalities suggesting the need to review other clinical data, such as maternal/paternal build and pathologies. In the second case, measurement errors may be detected and corrected. In both situations, at least a third session of measurements is recommended for confirmation. If any disagreement still remains between measurement sessions, operators should decide on the basis of other clinical information and/or experience.
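The repetition protocol described above needs to know which reading is out of line. Under a multinormal model, one standard way to locate it, offered here as an assumption rather than the authors' procedure, is the standardized residual of each parameter given all the others, which follows from the precision matrix P = inverse(Σ) as z_i = [P(x − μ)]_i / sqrt(P_ii). The expected values, correlation matrix, alert limits `z_limit` and `p_alert`, and all function names below are hypothetical.

```python
import numpy as np

# Hypothetical expected values (mm) for BPD, HC, AC, FL at a given weight,
# with a hypothetical correlation matrix and a 4% coefficient of variation.
MU = np.array([88.8, 328.8, 324.0, 69.2])
CORR = np.array([[1.0, 0.8, 0.6, 0.6],
                 [0.8, 1.0, 0.6, 0.6],
                 [0.6, 0.6, 1.0, 0.6],
                 [0.6, 0.6, 0.6, 1.0]])
SD = 0.04 * MU
COV = CORR * np.outer(SD, SD)

def conditional_z(x, mu, cov):
    """Standardized residual of each parameter conditional on all the others."""
    prec = np.linalg.inv(cov)
    return prec @ (np.asarray(x, dtype=float) - mu) / np.sqrt(np.diag(prec))

def flag_suspects(x, mu, cov, z_limit=2.0):
    """Indices of parameters whose conditional |z| exceeds the (hypothetical) limit."""
    z = conditional_z(x, mu, cov)
    return [i for i in range(len(z)) if abs(z[i]) > z_limit]

def classify_repeat(p_first, p_repeat, p_alert=0.05):
    """Map the congruence probabilities of two sessions onto outcomes (1) and (2).

    In both cases a third session is still recommended for confirmation.
    """
    if p_repeat >= p_alert and p_repeat > p_first:
        return "outcome 2: probable measurement error, corrected"
    return "outcome 1: persistent incongruence, review clinical data"

x = MU.copy()
x[2] -= 30.0  # AC read 30 mm below its expected value
print(np.round(conditional_z(x, MU, COV), 2))
print(flag_suspects(x, MU, COV))  # -> [2]
print(classify_repeat(0.01, 0.60))
```

Conditioning on the other parameters is what lets a single misread value stand out even though all four measurements are strongly correlated.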

Since our method incorporated certain clinical information about pregnancy, it was convenient to use an ANN approach [24] to estimate the multinormal model parameters (i.e. mean vectors and covariance matrices), which were made to depend on pregnancy data and FW. Making the model depend on pregnancy information gives more accurate probabilities, but makes estimating its parameters from sample data unfeasible with common statistical methods such as multivariate regression, which would be inaccurate. For example, the means of the parameter vector could be estimated by entering pregnancy variables into multivariate linear regression models with the echobiometric measurements as dependent variables. Unfortunately, all regression techniques are very sensitive to empty regions in observation space and to outliers [1, 5, 6, 12, 28], and are most accurate where observations are densest. Since clinical applications are particularly concerned with regions of low data density, e.g. macrosomic and microsomic fetuses, we chose an ANN approach to overcome the many limitations of regression techniques [6, 24, 26]. ANNs are sophisticated machine learning methods which make it possible to express the knowledge contained in experimental data with great flexibility and precision, and provide a uniform description, without discontinuities, of the input-output relationship. They can therefore determine expected output values with satisfactory accuracy, interpolating missing data even in multivariate spaces with few, sparse observations [6]. Other important advantages of ANNs over statistical regression models are that neither the model structure nor hypotheses about the statistical distribution of the data need to be specified, that they can describe nonlinearities and naturally take correlations among input variables into account, and that, like humans, they can be trained with examples [24, 26, 39]. ANNs have recently been applied successfully in many fields of medicine. All that is required is a sufficiently large, representative set of training examples. The main difficulty with ANNs is training, which must be done carefully to avoid overfitting, i.e. the tendency of ANNs to learn even the variability of the training data, which cannot be generalized to the whole phenomenon.
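The text does not specify how the ANN outputs are turned into covariance matrices. One common way to make a network emit a valid multinormal parameter set, shown here as a hypothetical sketch with untrained random weights, is to output the mean vector together with the entries of a lower-triangular Cholesky factor L whose diagonal is exponentiated, so that Σ = L Lᵀ is symmetric and positive definite for any input. The input layout (standardized pregnancy variables plus FW), the layer sizes and the function names are assumptions.

```python
import numpy as np

K = 4  # number of biometric parameters (assumed, e.g. BPD, HC, AC, FL)

def init_network(n_in, n_hidden, k=K, seed=0):
    """Random, untrained weights for a one-hidden-layer network (illustration only)."""
    rng = np.random.default_rng(seed)
    n_out = k + k * (k + 1) // 2  # mean vector + lower-triangular Cholesky entries
    return {
        "W1": rng.normal(0.0, 0.5, (n_hidden, n_in)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_out, n_hidden)), "b2": np.zeros(n_out),
    }

def forward(net, inputs, k=K):
    """Map (hypothetical) standardized pregnancy data and FW to a mean vector and covariance."""
    hidden = np.tanh(net["W1"] @ inputs + net["b1"])
    out = net["W2"] @ hidden + net["b2"]
    mu = out[:k]
    L = np.zeros((k, k))
    L[np.tril_indices(k)] = out[k:]
    L[np.diag_indices(k)] = np.exp(np.diag(L))  # positive diagonal -> valid Cholesky factor
    return mu, L @ L.T

# Hypothetical input: e.g. gestational age, maternal weight, parity, FW (all standardized).
net = init_network(n_in=4, n_hidden=8)
mu, cov = forward(net, np.array([0.2, -0.5, 1.0, 0.3]))
print(mu.shape, cov.shape)
print(np.all(np.linalg.eigvalsh(cov) > 0))
```

Training such a network would typically minimize the negative multinormal log-likelihood of the measured biometry, with outputs expressed in standardized units; that step is omitted here.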

There are many methods of ensuring ANN generalization power, for example regularization techniques, growing and pruning algorithms, genetic algorithms and early-stopping (ES) procedures [6, 26]. We applied ES, which is widely used to train ANNs because of its short computation time [6, 24]. It divides the available data into training and validation sets. Generalization is ensured by stopping the training process at the iteration when the ANN begins to overfit, that is, when the error computed on the validation set starts to increase. However, since the validation set is involved in the training process, it must not be used for estimating the generalization error. We therefore tested the ANNs on a third set of data (the testing set), which had not been used during training [6].
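The ES procedure can be illustrated on a toy regression problem. The sketch below uses a synthetic data set, a one-hidden-layer tanh network trained by full-batch gradient descent and an assumed 60/20/20 split; it keeps the weights with the lowest validation error and stops once that error has failed to improve for `patience` consecutive epochs, a common variant of stopping at the first increase that tolerates small fluctuations. The generalization error is then estimated on the untouched testing set. All names and settings are hypothetical.

```python
import numpy as np

def predict(p, X):
    """One-hidden-layer tanh network with a linear output."""
    return np.tanh(X @ p["W1"] + p["b1"]) @ p["w2"] + p["b2"]

def mse(p, X, y):
    return float(np.mean((predict(p, X) - y) ** 2))

def train_early_stopping(X_tr, y_tr, X_va, y_va, hidden=16, lr=0.05,
                         max_epochs=3000, patience=100, seed=0):
    """Gradient descent on the training set, stopped on the validation error."""
    rng = np.random.default_rng(seed)
    p = {"W1": rng.normal(0.0, 0.5, (X_tr.shape[1], hidden)), "b1": np.zeros(hidden),
         "w2": rng.normal(0.0, 0.5, hidden), "b2": np.zeros(())}
    best, best_val, best_epoch, wait = None, np.inf, 0, 0
    n = len(y_tr)
    for epoch in range(max_epochs):
        H = np.tanh(X_tr @ p["W1"] + p["b1"])               # hidden activations
        g_out = 2.0 * (H @ p["w2"] + p["b2"] - y_tr) / n    # d(MSE)/d(prediction)
        g_H = np.outer(g_out, p["w2"]) * (1.0 - H ** 2)     # back through tanh
        p["w2"] -= lr * (H.T @ g_out)
        p["b2"] -= lr * g_out.sum()
        p["W1"] -= lr * (X_tr.T @ g_H)
        p["b1"] -= lr * g_H.sum(axis=0)
        val = mse(p, X_va, y_va)
        if val < best_val:             # validation error still falling
            best_val, best_epoch, wait = val, epoch, 0
            best = {k: v.copy() for k, v in p.items()}
        else:                          # possible onset of overfitting
            wait += 1
            if wait >= patience:
                break
    return best, best_epoch, best_val

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (300, 1))
y = np.sin(3.0 * X[:, 0]) + rng.normal(0.0, 0.2, 300)
idx = rng.permutation(300)
tr, va, te = idx[:180], idx[180:240], idx[240:]   # training, validation, testing

best, best_epoch, best_val = train_early_stopping(X[tr], y[tr], X[va], y[va])
test_mse = mse(best, X[te], y[te])                # generalization error estimate
print(best_epoch, round(best_val, 3), round(test_mse, 3))
```

Only the testing-set error is reported as the generalization estimate, because the validation set has already influenced which weights were kept.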

When we tested our model in clinical practice to correct operator measurement errors in real time, we obtained very encouraging results. Fetal biometric measurements were performed by an experienced operator because we wanted to establish whether, under optimum conditions, errors below 10% could be obtained. This proved possible.

The significant reduction in MAE% obtained when the corrected parameters were entered into the best estimation models in the literature confirms that our system can in fact help operators to correct measurement errors. The system also promises to be useful for training less experienced sonographers and could be used as a quality control system for fetal biometry. By reducing human error, it improves EFW and hence clinical obstetric management.

5 Conclusions

A multinormal probability model for the estimation of fetal weight was implemented numerically to provide clinical indications about the type and size of measurement errors in real-time fetal echobiometry. The model compared actual measurements with expected values and associated probability values with EFW, indicating the reliability of EFW in terms of congruence with ultrasound measurements. Low probabilities prompt a more careful repetition of suspect measurements and help ultrasound operators to interpret fetal morphology by distinguishing between measurement errors and real pathophysiological conditions.

Compared with other EFW models of equivalent accuracy, probability models also have the major clinical advantage of avoiding overestimation of microsomic and underestimation of macrosomic fetal weights.

Clinical testing of the model on a sample of 61 fetuses revealed its good performance in correcting measurement errors and showed a remarkable improvement in the accuracy of EFW, confirmed by other mathematical models of proven accuracy. Our proposed interactive software therefore offers valid support for training operators in fetal echobiometry.

Although the system clearly needs to be tested on a wider scale, its clinical utility and simplicity, as well as the sharp improvement in the accuracy of EFW, suggest that it could be used as a reliable aid for clinical decision making in pregnancy. It is also a step towards the standardization of measuring procedures, the lack of which is often a severe limiting factor in ultrasonographic practice.

Acknowledgments This work was financed by the Italian Ministry of Education, University and Research (MIUR). Special thanks to ESAOTE S.p.A., Genoa, Italy, for its precious and prompt technical support.

References

1. Armitage P, Berry G (1987) Statistical methods in medical research. Blackwell, Oxford

2. Ben-Haroush A, Yogev Y, Hod M (2004) Fetal weight estimation in diabetic pregnancies and suspected fetal macrosomia. J Perinat Med 32(2):113–121

3. Benson CB, Doubilet PM, Saltzman DH (1987) Sonographic determination of fetal weights in diabetic pregnancies. Am J Obstet Gynecol 156(2):441–444

4. Bettelheim D, Deutinger J, Bernaschek G (1997) Fetal sonographic biometry: a guide to normal and abnormal measurements. The Parthenon Publishing Group

5. Biagioli B, Scolletta S, Cevenini G, Barbini E, Giomarelli P, Barbini P (2006) A multivariate Bayesian model for assessing morbidity after coronary artery surgery. Crit Care 10(3):R94. doi:10.1186/cc4951

6. Bishop CM (1995) Neural networks for pattern recognition. Clarendon, Oxford

7. Chauhan SP, Hendrix NW, Magann EF, Morrison JC, Kenney SP, Devoe LD (1998) Limitations of clinical and sonographic estimates of birth weight: experience with 1034 parturients. Obstet Gynecol 91(1):72–77

8. Chauhan SP, West DJ, Scardo JA, Boyd JM, Joiner J, Hendrix NW (2000) Antepartum detection of macrosomic fetus: clinical versus sonographic, including soft-tissue measurements. Obstet Gynecol 95(5):639–642

9. Chauhan SP, Hendrix NW, Magann EF, Morrison JC, Scardo JA, Berghella V (2005) A review of sonographic estimate of fetal weight: vagaries of accuracy. J Matern Fetal Neonatal Med 18(4):211–220

10. Chauhan SP, Cole J, Sanderson M, Magann EF, Scardo JA (2006) Suspicion of intrauterine growth restriction: use of abdominal circumference alone or estimated fetal weight below 10%. J Matern Fetal Neonatal Med 19(9):557–562

11. Chuang L, Hwang JY, Chang CH, Yu CH, Chang FM (2002) Ultrasound estimation of fetal weight with the use of computerized artificial neural network model. Ultrasound Med Biol 28(8):991–996

12. Cohen J, Cohen P, West SG, Aiken LS (2003) Applied multiple regression: correlation analysis for the behavioral sciences. Erlbaum, London

13. Colman A, Maharaj D, Hutton J, Tuohy J (2006) Reliability of ultrasound estimation of fetal weight in term singleton pregnancies. New Zeal Med J 119(1241):U2146

14. Combs CA, Jaekle RK, Rosenn B, Pope M, Miodovnik M, Siddiqi TA (1993) Sonographic estimation of fetal weight based on a model of fetal volume. Obstet Gynecol 82(3):365–370

15. Coomarasamy A, Connock M, Thornton J, Khan KS (2005) Accuracy of ultrasound biometry in the prediction of macrosomia: a systematic quantitative review. Brit J Obstet Gynaec 112(11):1461–1466

16. Dudley NJ (1995) Selection of appropriate ultrasound methods for the estimation of fetal weight. Brit J Radiol 68:385–388

17. Dudley NJ (2005) A systematic review of the ultrasound estimation of fetal weight. Ultrasound Obstet Gynecol 25(1):80–89

18. Edwards A, Goff J, Baker L (2001) Accuracy and modifying factors of the sonographic estimation of fetal weight in a high-risk population. Aust NZ J Obstet Gyn 41(2):187–190

19. Etter DM, Kuncicky DC, Moore H (2005) Introduction to MATLAB 7. Prentice Hall, Englewood Cliffs

20. Farmer RM, Medearis AL, Hirata GI, Platt LD (1992) The use of a neural network for the ultrasonographic estimation of fetal weight in the macrosomic fetus. Am J Obstet Gynecol 166(5):1467–1472

21. Goldberg JD (2004) Routine screening for fetal anomalies: expectations. Obstet Gynecol Clin North Am 31(1):35–50

22. Hadlock FP, Harrist RB, Sharman RS, Deter RL, Park SK (1985) Estimation of fetal weight with the use of head, body, and femur measurements: a prospective study. Am J Obstet Gynecol 151:333–337

23. Hadlock FP (1990) Sonographic estimation of fetal age and weight. Fetal Ultrasound 28(1):39–51

24. Haykin S (1994) Neural networks: a comprehensive foundation. Maxwell Macmillan, Canada

25. Hill LM, Breckle R, Gehrking WC, O’Brien PC (1985) Use of femur length in estimation of fetal weight. Am J Obstet Gynecol 152:847–852

26. Jamshidi M (2003) Tools for intelligent control: fuzzy controllers, neural networks and genetic algorithms. Philos Transact A Math Phys Eng Sci 361(1809):1781–1808

27. Jordaan HV (1983) Estimation of fetal weight by ultrasound. J Clin Ultrasound 11(2):59–66

28. Krzanowski WJ (1988) Principles of multivariate analysis: a user’s perspective. Clarendon, Oxford

29. Kurmanavicius J, Burkhardt T, Wisser J, Huch R (2004) Ultrasonographic fetal weight estimation: accuracy of formulas and accuracy of examiners by birth weight from 500 to 5000 g. J Perinat Med 32(2):155–161

30. Lee YB, Kim MJ, Kim MH (2007) Robust border enhancement and detection for measurement of fetal nuchal translucency in ultrasound images. Med Biol Eng Comput (Spec issue). doi:10.1007/s11517-007-0225-7

31. Lockwood CJ, Weiner S (1986) Assessment of fetal growth. Clin Perinatol 13(1):3–35

32. Mongelli M, Biswas A (2002) Menstrual age-dependent systematic error in sonographic fetal weight estimation: a mathematical model. J Clin Ultrasound 30(3):139–144



33. Ott WJ, Doyle S, Flamm S, Wittman J (1986) Accurate ultrasonic estimation of fetal weight. Prospective analysis of a new ultrasonic formula. Am J Perinatol 3(4):307–310

34. Ott WJ (2006) Sonographic diagnosis of fetal growth restriction. Clin Obstet Gynecol 49(2):295–307

35. Robson SC, Gallivan S, Walkinshaw SA, Vaughan J, Rodeck CH (1993) Ultrasonic estimation of fetal weight: use of targeted formulas in small for gestational age fetuses. Obstet Gynecol 82(3):359–364

36. Rosati P, Exacoustos C, Caruso A, Mancuso S (1992) Ultrasound diagnosis of fetal macrosomia. Ultrasound Obstet Gynecol 2(1):23–29

37. Rotmensch S, Celentano C, Liberati M, Malinger G, Sadan O, Bellati U, Glezerman M (1999) Screening efficacy of the subcutaneous tissue width/femur length ratio for fetal macrosomia in the non-diabetic pregnancy. Ultrasound Obstet Gynecol 13(5):340–344

38. Sabbagha RE, Minogue J, Tamura RK, Hungerford SA (1989) Estimation of birth weight by use of ultrasonographic formulas targeted to LGA, AGA, and SGA fetuses. Am J Obstet Gynecol 160:854–862

39. Sargent DJ (2001) Comparison of artificial neural networks with other statistical approaches: results from medical data sets. Cancer 91(S8):1636–1642

40. Schild RL, Fimmers R, Hansmann M (2000) Fetal weight estimation by three-dimensional ultrasound. Ultrasound Obstet Gynecol 16(5):445–452

41. Secher NJ, Djursing H, Hansen PK, Lenstrup C, Sindberg-Eriksen P, Thomsen BL, Keiding N (1987) Estimation of fetal weight in the third trimester by ultrasound. Eur J Obstet Gynecol Reprod Biol 24:1–11

42. Sladkevicius P, Saltvedt S, Almstrom H, Kublickas M, Grunewald C, Valentin L (2005) Ultrasound dating at 12–14 weeks of gestation. A prospective cross-validation of established dating formulae in in vitro fertilized pregnancies. Ultrasound Obstet Gynecol 26(5):504–511

43. Thornton JG, Hornbuckle J, Vail A, Spiegelhalter DJ, Levene M, GRIT study group (2004) Infant wellbeing at 2 years of age in the growth restriction intervention trial (GRIT): multicentred randomised controlled trial. Lancet 364(9433):513–520

44. Woo JS, Wan MC (1986) An evaluation of fetal weight prediction using a simple equation containing the fetal femur length. J Ultrasound Med 5(8):453–457

