Examination of relation between nutrient components and fruits: Biplot approach

CEMAL ATAKAN¹, BARIS ALKAN² & AFSIN SAHIN³

¹Ankara University, Faculty of Science, Department of Statistics, Tandogan, Ankara, Turkey, ²Sinop University, Faculty of Sciences and Arts, Department of Statistics, Sinop, Turkey, and ³Ministry of Agriculture and Rural Affairs, Foreign Relations and European Union, Tarim Bakanligi Kampusu Lodumlu, Ankara, Turkey

Abstract

Adequate intake of fruits and vegetables as part of the daily diet may help prevent major diseases. Low fruit intake is a major risk factor for cancer, coronary heart disease and stroke. The World Health Organization recommends eating at least five portions of a variety of fruit, which is nearly 400 g/day. Essential nutrients, water, carbohydrates, oils and vitamins are needed in appropriate quantities in order to have a well-functioning body. In this study we carry out a food composition study to identify and determine the chemical nature of the organic and inorganic macro-nutrient and micro-nutrient properties of the main fruit types that affect human nature, by a biplot graphical approach. The biplot can be considered a multivariate equivalent of the scatter plots that have been used for graphically analyzing bivariate data. Biplot approaches show a simultaneous display of fruits and nutrient components in low dimensions. In the present study, the theory of the biplot and different types of biplot are given, and then the biplot approach is applied to real data.

Keywords: Nutrients, singular value decomposition, biplot

Introduction

Adequate intake of fruits and vegetables as part of the daily diet may help prevent

major non-communicable diseases (Ebrahimof et al. 2006). Low fruit intake is a major

risk factor for cancer, coronary heart disease and stroke (Bashirian et al. 2008).

Therefore, diets rich in fruits are associated with a lower risk of chronic diseases.

Anemia, rickets, obesity, avitaminosis, simple goiter and tooth decay are common

diseases arising from poor nutritional habits. There is clear and growing evidence for

the protective effects of fruits against chronic diseases. Fruit consumption also increases fiber intake,

reduces fat intake, and helps to reduce obesity, manage diabetes, and lower blood pressure.

The World Health Organization (WHO) recommends eating at least five portions of a

variety of fruit, which is nearly 400 g/day (see WHO [1990] for the details).

Essential nutrients, water, carbohydrates, oils, proteins (nitrogen-containing class of

foods assembled from monomers called amino acids; see Carey and Hanley [1999] for

the details) and vitamins are needed in appropriate quantities in order to have a well-functioning body. Different fruits contain different types of nutrient components in different amounts. These nutrient components are carbohydrates, proteins, oils, vitamins and water. Fruits are rich in vitamins, minerals and phytochemical components. The energy-yielding organic nutrients (carbohydrates, oils and proteins) provide fuel for the bioenergetic reactions taking place in water (nearly 45–75% of our body), and vitamins catalyze the reactions involving those carbon, hydrogen and oxygen elements (Driskell 2000: p. 3).

Correspondence: Baris Alkan, Sinop University, Faculty of Sciences and Arts, Department of Statistics, 57000 Sinop, Turkey. Email: [email protected]

ISSN 0963-7486 print/ISSN 1465-3478 online © 2009 Informa UK Ltd

DOI: 10.1080/09637480802668489

International Journal of Food Sciences and Nutrition, August 2009; 60(S1): 181–189

Int J Food Sci Nutr. Downloaded from informahealthcare.com by University of Laval on 07/09/14. For personal use only.

Human choices are crucial for nutrient selection. Human beings maximize their

utility by consuming fruits consciously. This is important both for human beings and

for the ecosystem. Therefore, in this study we try to carry out a food composition

study to identify and determine the chemical nature of the organic and inorganic

macro-nutrient and micro-nutrient properties of the main fruit types that affect

human nature, by a biplot graphical approach. The biplot technique used for showing

significant properties of multivariate data structure was initially introduced by Gabriel

(1971) and later developed by Bradu and Gabriel (1978), Gabriel and Zamir (1979),

Gower and Harding (1988), and Gower and Hand (1996). This technique has been

applied in different areas and its practicability has been demonstrated. The biplot can be

considered a multivariate equivalent of the scatter plots that have been used for

analyzing bivariate data. When we have more than two variables, graphical analysis of

the relationships among the variables becomes harder. To interpret the relationships in a

multivariate data-set, an approximation in a lower-dimensional, generally two-dimensional,

space is sought. This reduction causes some loss of information; however, the

relationships can be interpreted more easily.

The biplot approach produces graphs showing a simultaneous representation of

fruits and nutrient component variables of the data matrix. The prefix ‘bi’ in

biplot does not represent the dimension of the graph, but shows that the fruits and

nutrient components can be shown in the same graph. This technique is based on the

singular value decomposition analysis.

The following section gives the theoretical background for the biplot technique and

considers several types. In the third section of the study, the results of the biplot

approach to the nutrient components of the 25 types of fruit are presented and the

results are discussed.

Materials and methods

Singular value decomposition

A square matrix is singular if its determinant is zero, that is, if it is not invertible. Singular value decomposition (SVD) is a generalization of the eigenvalue decomposition. It can be applied to any matrix, including non-invertible and rectangular ones. When we apply SVD to a matrix, we obtain three simple matrices. Two of them have orthonormal columns and the third is a diagonal matrix.

The SVD of a data matrix $X \in \mathbb{R}^{n \times p}$ of rank $r$ ($r \le \min\{n, p\}$) is given by Equation (1) (Johnson and Wichern 1998):

$$X_{n \times p} = U_{n \times r} \Gamma_{r \times r} V^T_{r \times p} \qquad (1)$$

where the columns of $U_{n \times r}$ and $V_{p \times r}$ are the left and right singular vectors, respectively. U and V satisfy $U^TU = I_r$ and $V^TV = I_r$, so their columns are orthonormal. The symbol $I_r$ denotes the identity matrix of size $r \times r$. $\Gamma_{r \times r}$ is a diagonal matrix consisting of the following singular values:

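$$\Gamma = \mathrm{diag}(\gamma_1, \gamma_2, \ldots, \gamma_r), \qquad \gamma_1 \ge \gamma_2 \ge \cdots \ge \gamma_r > 0.$$

The square roots of the non-zero eigenvalues ($\lambda_i$, $i = 1, 2, \ldots, r$) of the $XX^T$ or $X^TX$ matrices give the singular values. Moreover, the rank of X equals the number of non-zero singular values. The normalized eigenvectors of $XX^T$ are the ‘left’ singular vectors (U) while the normalized eigenvectors of $X^TX$ are the ‘right’ singular vectors (V).

The SVD of a data matrix X can be written as:

$$X_{n \times p} = U_{n \times r} \Gamma_{r \times r} V^T_{r \times p} = \sum_{i=1}^{r} \gamma_i u_i v_i^T = \gamma_1 u_1 v_1^T + \gamma_2 u_2 v_2^T + \cdots + \gamma_r u_r v_r^T \qquad (2)$$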
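The properties stated above can be checked numerically. The following is a minimal sketch using NumPy; the 4 × 3 matrix and all variable names are illustrative assumptions and are not taken from the study's data or its MATLAB programs.

```python
import numpy as np

# Small illustrative 4x3 matrix (hypothetical values, not taken from Table I).
X = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 4.0],
              [1.0, 1.0, 1.0]])

# Thin SVD: U is n x r, gamma holds the singular values, Vt is r x p.
U, gamma, Vt = np.linalg.svd(X, full_matrices=False)
r = len(gamma)

# Orthonormal columns: U^T U = I_r and V^T V = I_r.
orthonormal = (np.allclose(U.T @ U, np.eye(r))
               and np.allclose(Vt @ Vt.T, np.eye(r)))

# Singular values are the square roots of the eigenvalues of X^T X.
eigvals = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
sqrt_eig_match = np.allclose(gamma, np.sqrt(eigvals))

# Equation (2): X is the sum of r rank-one terms gamma_i * u_i * v_i^T.
X_sum = sum(gamma[i] * np.outer(U[:, i], Vt[i]) for i in range(r))
reconstructs = np.allclose(X, X_sum)

print(orthonormal, sqrt_eig_match, reconstructs)
```

The thin form (`full_matrices=False`) is used so that the shapes match the n × r, r × r and r × p factors of Equation (1) when the matrix has full column rank.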

Eckart–Young theorem

The SVD of a data matrix X is given by Equation (1). The rank-k matrix $X^{(k)}_{n \times p}$, where $k \le r$, built from the first k singular values and singular vectors, is given by:

$$X^{(k)}_{n \times p} = U_{n \times k} \Gamma_{k \times k} V^T_{k \times p} \qquad (3)$$

The Eckart–Young theorem states that $X^{(k)}_{n \times p}$ is the optimal rank-k approximation of $X_{n \times p}$ in the least-squares sense; that is, it minimizes the sum of squared residuals. Accordingly, we obtain the following (Eckart and Young 1936):

$$\min_{\mathrm{rank}(B) = k} \|X - B\| = \|X - X^{(k)}\| = \sqrt{\sum_{i=k+1}^{r} \gamma_i^2} = \sqrt{\sum_{i=k+1}^{r} \lambda_i} \qquad (4)$$
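As an illustration of Equation (4), the sketch below truncates the SVD of an arbitrary random matrix and compares the approximation error with the square root of the sum of the discarded squared singular values. The data, the seed and the helper name `rank_k_approx` are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, arbitrary test data
X = rng.normal(size=(6, 4))             # hypothetical 6x4 data matrix

U, gamma, Vt = np.linalg.svd(X, full_matrices=False)

def rank_k_approx(k):
    """Equation (3): keep the first k singular values and vectors."""
    return U[:, :k] @ np.diag(gamma[:k]) @ Vt[:k, :]

k = 2
X_k = rank_k_approx(k)
error = np.linalg.norm(X - X_k)                 # Euclidean (Frobenius) norm
predicted = np.sqrt(np.sum(gamma[k:] ** 2))     # right-hand side of Equation (4)

# Any other matrix of rank k should do no better; try one arbitrary competitor.
B = rng.normal(size=(6, k)) @ rng.normal(size=(k, 4))
competitor_error = np.linalg.norm(X - B)

print(error, predicted, competitor_error)
```

A single competitor does not prove optimality, of course; it only shows the inequality holding in one case, while the equality between `error` and `predicted` is exact up to rounding.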

Approximation to the X matrix with lower-rank matrices

Assume that we want to approximate the matrix $X_{n \times p}$ of rank r by a matrix $X^{(k)}_{n \times p}$ of lower rank ($k \le r$). To do this, we first define the approximation error. The approximation error is generally measured by the Euclidean norm of the error matrix $E = X - X^{(k)}$. The squared Euclidean norm of a matrix can be written as the trace of its inner product with itself, and therefore we can write the following equation (Bartkowiak and Szustalewicz 1995):

$$\|E\| = \|X - X^{(k)}\| = [\mathrm{trace}(E^TE)]^{1/2} = \left( \sum_{i=1}^{n} \sum_{j=1}^{p} e_{ij}^2 \right)^{1/2} \qquad (5)$$

The problem is how to approximate the matrix $X_{n \times p}$ by lower-rank matrices with minimum error. This problem was initially considered by Householder and Young (1938). According to their result, the best approximation of a matrix $X_{n \times p}$ of rank r by a matrix $X^{(k)}_{n \times p}$ of rank $k \le r$, in the sense of minimizing the Euclidean norm of the error matrix $E = X - X^{(k)}$, is obtained by taking the first k components of the singular value decomposition of X (Bartkowiak and Szustalewicz 1995). This approximating matrix $X^{(k)}_{n \times p}$ is given by Equation (3). If $k = r$ in Equation (5), the error is zero. In general, the approximation error can be obtained from the SVD of the matrices $X_{n \times p}$ and $X^{(k)}_{n \times p}$ and the Eckart–Young theorem:

$$\begin{aligned}
\|E\| &= \min_{\mathrm{rank}(B) = k} \|X - B\| = \|X - X^{(k)}\| = \|U_{n \times r} \Gamma_{r \times r} V^T_{r \times p} - U_{n \times k} \Gamma_{k \times k} V^T_{k \times p}\| \\
&= \left\| \sum_{i=1}^{r} \gamma_i u_i v_i^T - \sum_{i=1}^{k} \gamma_i u_i v_i^T \right\|, \qquad k \le r \\
&= \|\gamma_{k+1} u_{k+1} v_{k+1}^T + \cdots + \gamma_r u_r v_r^T\|, \qquad \left( \|X\| = \sqrt{\mathrm{trace}(X^TX)} \right) \\
&= \sqrt{\sum_{i=k+1}^{r} \gamma_i^2} = \sqrt{\sum_{i=k+1}^{r} \lambda_i}
\end{aligned}$$

The quality of the approximation is called the goodness-of-fit. It is defined as the ratio of the squared norm of the $X^{(k)}_{n \times p}$ matrix to the squared norm of the $X_{n \times p}$ matrix:

$$\text{Goodness-of-fit} = \frac{\|X^{(k)}\|^2}{\|X\|^2} = \frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_r}, \qquad k \le r.$$

The sum of all the eigenvalues is equal to the total variance of the data cloud given by the X matrix:

$$\text{Total variance} = \|X\|^2 = \mathrm{trace}(X^TX) = \lambda_1 + \cdots + \lambda_r.$$
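As a worked example of the goodness-of-fit ratio, the short sketch below applies it to the five eigenvalues reported in Table II of the Results section. Only the eigenvalues come from the paper; the function and variable names are illustrative.

```python
# Eigenvalues (squared singular values) as reported in Table II.
eigenvalues = [80.7985, 30.1686, 6.6780, 2.0443, 0.3108]

total_variance = sum(eigenvalues)            # lambda_1 + ... + lambda_r

def goodness_of_fit(k):
    """Share of the total variance retained by the rank-k approximation."""
    return sum(eigenvalues[:k]) / total_variance

# Proportion explained by each dimension and the cumulative goodness-of-fit.
proportions = [lam / total_variance for lam in eigenvalues]
for k in range(1, len(eigenvalues) + 1):
    print(k, round(proportions[k - 1], 4), round(goodness_of_fit(k), 4))
```

With k = 2 the ratio is about 0.92, which is the share of variance attributed to the two-dimensional biplot in the Results section.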

Biplot approach

The biplot is an approach showing the fruits and nutrient components of the matrix

$X_{n \times p}$ on a single graph. The biplot approach can be used for continuous and categorical data. In nearly all applications, the X matrix is first transformed according to the structure of the data, and the biplot technique is then applied. Examples of such transformations are centering each variable on its mean, standardization of the variables and logarithmic transformations (Kuhfeld 1992). Assume that the rank of the matrix $Z_{n \times p}$ obtained after the transformation is r. The decomposition of $Z_{n \times p}$ is given by Equation (6) (Aitchison and Greenacre 2002):

$$Z_{n \times p} = F_{n \times r} G^T_{r \times p} \qquad (6)$$

SVD is used to obtain this decomposition.

In the r-dimensional Euclidean space, the columns of the $G^T$ matrix and the rows of the F matrix give the coordinates of the p points for the nutrient components and the n points for the fruits, respectively. This space is called the whole space because its dimension is equal to the rank of the Z matrix. Different types of biplot of $Z_{n \times p}$ (of rank r) arise from distributing the singular values differently between the singular vectors U and V (Cox and Cox 2001).

Biplot types

Different biplot types are obtained according to the value of the constant α used for obtaining the F and G matrices. The F and G matrices are found by the equations $F = U\Gamma^{\alpha}$ and $G = V\Gamma^{1-\alpha}$, where the constant α can take values in $0 \le \alpha \le 1$ (Aitchison and Greenacre 2002). The most frequently used values of α are zero and one. All of these choices give the same matrix approximation but provide different displays of the data matrix. Here, the term basic coordinates is used for singular vectors scaled by singular values, and the term standard coordinates is used for singular vectors not scaled by singular values (Greenacre 1984).

Low values of α yield graphs that give more weight to the nutrient components, whereas higher values of α yield graphs that give more weight to the fruits.
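The claim that every choice of the constant gives the same matrix approximation can be illustrated numerically. The sketch below is a hedged example on arbitrary centered data; `biplot_factors` and the other names are hypothetical helpers, not the authors' MATLAB routines.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))              # hypothetical raw data, 8 rows x 3 variables
Z = X - X.mean(axis=0)                   # column-centered matrix

U, gamma, Vt = np.linalg.svd(Z, full_matrices=False)
V = Vt.T

def biplot_factors(alpha):
    """Return F = U Gamma^alpha and G = V Gamma^(1 - alpha)."""
    F = U * gamma ** alpha               # scales column i of U by gamma_i^alpha
    G = V * gamma ** (1.0 - alpha)       # scales column i of V by gamma_i^(1-alpha)
    return F, G

# The product F G^T should reproduce Z whatever the value of alpha.
products = {}
for alpha in (0.0, 0.5, 1.0):
    F, G = biplot_factors(alpha)
    products[alpha] = F @ G.T

print({a: bool(np.allclose(P, Z)) for a, P in products.items()})
```

Only the split of the singular values between row and column coordinates changes with the constant, so the displays differ while the product stays fixed.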


Covariance biplot

If α = 0, the singular values are assigned to the right singular vectors of the Z matrix. The fruits are in standard coordinates and the nutrient components are in basic coordinates; therefore, we obtain graphs that emphasize the nutrient components. In the covariance biplot, $GG^T/(n-1)$ is the ordinary least-squares approximation of the covariance matrix $S = Z^TZ/(n-1)$ (Aitchison and Greenacre 2002). For α = 0 the following equations are obtained:

$$F_{n \times r} = U \quad \text{and} \quad G_{p \times r} = V\Gamma \qquad (7)$$

Properties of the covariance biplot

i. The length of a nutrient component vector is proportional to the standard deviation of that component.

ii. The cosine of the angle between two nutrient component vectors is equal to the correlation between the nutrient components:

$$r = \frac{\mathrm{Cov}(z_i, z_j)}{\sqrt{s^2_{z_i} \, s^2_{z_j}}} = \frac{z_i^T z_j}{\sqrt{\|z_i\|^2 \|z_j\|^2}} = \frac{\|z_i\| \, \|z_j\| \cos\theta}{\|z_i\| \, \|z_j\|} = \cos\theta \qquad (8)$$

iii. In whole space, the distance between two fruit points approximates the Mahalanobis distance between the corresponding rows, which is given by:

$$d^2_{i,j} = (z_i - z_j)^T S^{-1} (z_i - z_j) \qquad (9)$$
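The three properties can be verified in whole space on a small example. The sketch below assumes column-centered data with full column rank and an n − 1 denominator for the covariance matrix; the data and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 2.0]])
X = rng.normal(size=(10, 3)) @ mixing    # hypothetical correlated data
n = X.shape[0]
Z = X - X.mean(axis=0)

U, gamma, Vt = np.linalg.svd(Z, full_matrices=False)
F = U                      # row points in standard coordinates (alpha = 0)
G = Vt.T * gamma           # variable vectors, G = V Gamma

S = Z.T @ Z / (n - 1)      # sample covariance matrix
cov_from_G = G @ G.T / (n - 1)

# Property i: vector length divided by sqrt(n - 1) is the standard deviation.
lengths = np.linalg.norm(G, axis=1) / np.sqrt(n - 1)
std_devs = Z.std(axis=0, ddof=1)

# Property ii: cosine of the angle between two variable vectors = correlation.
unit = G / np.linalg.norm(G, axis=1, keepdims=True)
cosines = unit @ unit.T
correlations = np.corrcoef(Z, rowvar=False)

# Property iii: (n - 1) times the squared distance between two rows of F
# is the squared Mahalanobis distance between the same rows of Z.
d_biplot = (n - 1) * np.sum((F[0] - F[1]) ** 2)
diff = Z[0] - Z[1]
d_mahal = diff @ np.linalg.inv(S) @ diff

print(np.allclose(cov_from_G, S), np.allclose(cosines, correlations))
```

In a two-dimensional display these equalities hold only approximately, with the quality of the approximation governed by the goodness-of-fit.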

Form biplot

If α = 1, the singular values are assigned to the left singular vectors of the Z matrix. This yields the form biplot, a graph that emphasizes the fruits. $FF^T/(n-1)$ approximates the form matrix $ZZ^T/(n-1)$, which contains the scalar products between the rows of the Z matrix (Aitchison and Greenacre 2002).

For α = 1 the following equations are obtained:

$$F_{n \times r} = U\Gamma \quad \text{and} \quad G_{p \times r} = V \qquad (10)$$

Properties of the form biplot

1. The distance between fruit points approximates the Euclidean distance between the corresponding rows.

2. The length of a nutrient component vector shows the goodness of approximation of that component.

3. The projection of a fruit point onto a nutrient component vector gives the approximate element of the transformed data matrix.
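A brief numerical sketch of these properties follows. It uses arbitrary centered data and illustrative names; in whole space the row distances are reproduced exactly, while a two-dimensional display reproduces the transformed matrix only up to the error given by the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(7, 4))              # hypothetical data, 7 rows x 4 variables
Z = X - X.mean(axis=0)

U, gamma, Vt = np.linalg.svd(Z, full_matrices=False)
F = U * gamma              # row points in basic coordinates, F = U Gamma
G = Vt.T                   # variable vectors in standard coordinates, G = V

# Whole space: scalar products and Euclidean distances between rows are preserved.
same_scalar_products = np.allclose(F @ F.T, Z @ Z.T)
d_F = np.linalg.norm(F[0] - F[1])
d_Z = np.linalg.norm(Z[0] - Z[1])

# Two dimensions: projecting row points onto variable vectors approximates Z;
# the product of the first two columns is the rank-2 matrix of Equation (3).
Z2 = F[:, :2] @ G[:, :2].T
error_2d = np.linalg.norm(Z - Z2)
expected_error = np.sqrt(np.sum(gamma[2:] ** 2))

print(same_scalar_products, d_F, d_Z, error_2d)
```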

Results and discussion

In this study, we applied the biplot technique to detect the similarities among the 25

types of fruits in terms of the quantities of five nutrient components. Table I presents

the names of the fruits used and the five types of nutrient components we considered.

The rows in Table I are the fruits, and the columns are the five nutrient components.

We first standardized the 25 × 5 data matrix for the application of the biplot

technique.

Second, we applied SVD to the standardized data matrix. Table II presents the

singular values and the proportion of the total variance explained by each singular value.
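The two steps just described can be sketched as follows, using the values of Table I as printed. The use of the sample standard deviation (n − 1 denominator) is an assumption, although it is consistent with the total of about 120 = 24 × 5 reported in Table II. Individual singular values from this sketch may differ from Table II if any printed table entry differs from the data the authors used.

```python
import numpy as np

# Table I as printed: water, protein, carbohydrate, oil (g) and energy (kcal) per 100 g.
table_1 = np.array([
    [81, 0.6, 17.3, 0.3, 67],     # 1 Grape (fresh)
    [18, 2.5, 77.4, 0.2, 289],    # 2 Grape (dry)
    [82, 1.2, 20.4, 0.4, 88],     # 3 Fig (fresh)
    [23, 4.3, 6.9, 1.3, 274],     # 4 Fig (dry)
    [86, 0.8, 8.5, 0.1, 35],      # 5 Orange
    [87, 0.8, 11.6, 0.2, 46],     # 6 Tangerine
    [91, 0.6, 5.3, 0.2, 43],      # 7 Grapefruit
    [91, 0.3, 1.6, 0.5, 7],       # 8 Lemon (juice)
    [84, 0.3, 11.9, 0.3, 46],     # 9 Apple
    [71, 1.1, 19.2, 0.2, 76],     # 10 Banana
    [86, 0.6, 9.1, 0.2, 37],      # 11 Peach
    [89, 0.6, 6.2, 0.3, 26],      # 12 Strawberry
    [85, 1.0, 18.2, 0.6, 51],     # 13 Apricot (fresh)
    [25, 5.0, 66.5, 1.0, 260],    # 14 Apricot (dry)
    [83, 0.3, 10.6, 0.2, 41],     # 15 Pear
    [92, 0.5, 6.4, 0.2, 26],      # 16 Watermelon
    [90, 0.8, 7.7, 0.3, 33],      # 17 Melon
    [69, 4.2, 1.8, 22.2, 223],    # 18 Avocado
    [81, 0.5, 17.8, 0.2, 66],     # 19 Plum (red)
    [77, 0.5, 11.6, 0.3, 46],     # 20 Pineapple (concentrate)
    [82, 0.5, 16.0, 0.3, 63],     # 21 Pomegranate
    [80, 1.3, 17.4, 0.3, 70],     # 22 Berry
    [76, 0.9, 19.8, 1.1, 93],     # 23 Black mulberry
    [84, 1.2, 13.6, 0.5, 55],     # 24 Raspberry
    [83, 0.4, 15.3, 0.1, 71],     # 25 Quince
])
n, p = table_1.shape

# Step 1: standardize each column (assumed: sample standard deviation, ddof=1).
Z = (table_1 - table_1.mean(axis=0)) / table_1.std(axis=0, ddof=1)

# Step 2: SVD of the standardized matrix and proportions of variance explained.
U, gamma, Vt = np.linalg.svd(Z, full_matrices=False)
eigenvalues = gamma ** 2
proportion = eigenvalues / eigenvalues.sum()

print(np.round(gamma, 4))
print(np.round(proportion, 4), round(float(eigenvalues.sum()), 4))
```

Whatever the individual values, the squared singular values of a matrix standardized this way must sum to (n − 1)p, which gives a quick check on the standardization.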


Table I. Nutrient components in fruits (100 g).

| No. | Fruit | Water (g) | Protein (g) | Carbohydrate (g) | Oil (g) | Energy (kcal) |
|---|---|---|---|---|---|---|
| 1 | Grape (fresh) | 81 | 0.6 | 17.3 | 0.3 | 67 |
| 2 | Grape (dry) | 18 | 2.5 | 77.4 | 0.2 | 289 |
| 3 | Fig (fresh) | 82 | 1.2 | 20.4 | 0.4 | 88 |
| 4 | Fig (dry) | 23 | 4.3 | 6.9 | 1.3 | 274 |
| 5 | Orange | 86 | 0.8 | 8.5 | 0.1 | 35 |
| 6 | Tangerine | 87 | 0.8 | 11.6 | 0.2 | 46 |
| 7 | Grapefruit | 91 | 0.6 | 5.3 | 0.2 | 43 |
| 8 | Lemon (juice) | 91 | 0.3 | 1.6 | 0.5 | 7 |
| 9 | Apple | 84 | 0.3 | 11.9 | 0.3 | 46 |
| 10 | Banana | 71 | 1.1 | 19.2 | 0.2 | 76 |
| 11 | Peach | 86 | 0.6 | 9.1 | 0.2 | 37 |
| 12 | Strawberry | 89 | 0.6 | 6.2 | 0.3 | 26 |
| 13 | Apricot (fresh) | 85 | 1.0 | 18.2 | 0.6 | 51 |
| 14 | Apricot (dry) | 25 | 5.0 | 66.5 | 1.0 | 260 |
| 15 | Pear | 83 | 0.3 | 10.6 | 0.2 | 41 |
| 16 | Watermelon | 92 | 0.5 | 6.4 | 0.2 | 26 |
| 17 | Melon | 90 | 0.8 | 7.7 | 0.3 | 33 |
| 18 | Avocado | 69 | 4.2 | 1.8 | 22.2 | 223 |
| 19 | Plum (red) | 81 | 0.5 | 17.8 | 0.2 | 66 |
| 20 | Pineapple (concentrate) | 77 | 0.5 | 11.6 | 0.3 | 46 |
| 21 | Pomegranate | 82 | 0.5 | 16.0 | 0.3 | 63 |
| 22 | Berry | 80 | 1.3 | 17.4 | 0.3 | 70 |
| 23 | Black mulberry | 76 | 0.9 | 19.8 | 1.1 | 93 |
| 24 | Raspberry | 84 | 1.2 | 13.6 | 0.5 | 55 |
| 25 | Quince | 83 | 0.4 | 15.3 | 0.1 | 71 |

Sources: Food-Info (2005), DFCD (2005), and GDS (2005).

Table II. Singular values and proportions explained.

| Dimension | Singular value | Square of singular value (eigenvalue) | Proportion explained | Cumulative proportion |
|---|---|---|---|---|
| 1 | 8.9888 | 80.7985 | 0.6700 | 0.6700 |
| 2 | 5.4926 | 30.1686 | 0.2500 | 0.9200 |
| 3 | 2.5842 | 6.6780 | 0.0600 | 0.9800 |
| 4 | 1.4298 | 2.0443 | 0.0170 | 0.9970 |
| 5 | 0.5575 | 0.3108 | 0.0030 | 1 |
| Total | | 120.0002 | 1 | |

Table III and Table IV give the nutrient component coordinates for the covariance biplot (α = 0) and the form biplot (α = 1), respectively. Sixty-seven percent of the total variance is explained by the first dimension, and 25% by the second dimension. Six percent of the total variance is explained by the third dimension, 1.7% by the fourth dimension, and 0.3% by the fifth dimension. As a result, 92% of the total variance is explained by the two-dimensional biplot graph (Johnson and Wichern 1998). Accordingly, the optimal approximation to the matrix $Z_{25 \times 5}$, which has a rank of five, is provided by the matrix $Z^{(2)}_{25 \times 5}$, which has a rank of two. The U and V singular vectors and the singular values are obtained by SVD. The F and G matrices are then obtained from the singular vectors and singular values for the values zero and one of the constant α. For the two different pairs of F and G matrices, we obtained the biplot graphs shown in Figures 1 and 2.
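The link between the two sets of nutrient component coordinates can be checked directly from the published numbers: the coordinates for the constant equal to zero (Table III) should be those for the constant equal to one (Table IV) multiplied, dimension by dimension, by the singular values of Table II. The sketch below uses only values printed in the tables; the variable names are illustrative.

```python
# Singular values of the first two dimensions, from Table II.
singular_values = [8.9888, 5.4926]

# Nutrient component coordinates (water, protein, carbohydrate, oil, energy).
table_iii = [[-4.6613, 4.5136, 3.5139, 1.7326, 4.8320],    # constant = 0
             [1.0611, 1.3155, -2.8243, 4.3885, 0.2751]]
table_iv = [[-0.5186, 0.5021, 0.3909, 0.1928, 0.5376],     # constant = 1
            [0.1932, 0.2395, -0.5142, 0.7990, 0.0501]]

# Scale each Table IV dimension by its singular value and compare with Table III.
max_gap = 0.0
for dim in range(2):
    for j in range(5):
        rescaled = table_iv[dim][j] * singular_values[dim]
        max_gap = max(max_gap, abs(rescaled - table_iii[dim][j]))

# Each Table IV dimension should have unit length if it is a right singular vector.
norms = [sum(v * v for v in row) ** 0.5 for row in table_iv]

print(max_gap, norms)
```

The largest discrepancy stays below 0.001, which is what rounding to four decimals would produce.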

If we take α = 0, we obtain the covariance biplot graph shown in Figure 1. The cosine of the angle between any two component vectors gives approximately the correlation between these nutrient components. Thus, nutrient components that have a high positive correlation point in similar directions. When the graph in Figure 1 is analyzed, it is seen that the protein and energy vectors are highly correlated (r = 0.9744, θ = 13°) and point in a similar direction. At the same time, there is a high negative correlation between protein and water (r = −0.8740, θ = 151°) and between carbohydrate and water (r = −0.8991, θ = 154°); these vectors point in opposite directions. Avocado and dry fig are the two fruits containing the most oil. The other fruits show similar properties concerning the oil content. Dry apricot, dry grape, avocado and dry fig are the richest fruits in protein and energy. Dry grape, dry apricot and dry fig are the richest fruits in carbohydrates. The fruits that are the richest in water are those from Fruit 8 to Fruit 23.
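The correlations and angles quoted for Figure 1 can be recovered from the two-dimensional coordinates in Table III by computing the cosine of the angle between pairs of component vectors. The sketch below uses only the published coordinates; the dictionary and function names are illustrative.

```python
import math

# Two-dimensional covariance biplot coordinates from Table III.
coords = {
    "water":        (-4.6613, 1.0611),
    "protein":      (4.5136, 1.3155),
    "carbohydrate": (3.5139, -2.8243),
    "oil":          (1.7326, 4.3885),
    "energy":       (4.8320, 0.2751),
}

def cosine_and_angle(a, b):
    """Cosine of the angle between two component vectors, and the angle in degrees."""
    (x1, y1), (x2, y2) = coords[a], coords[b]
    cos_theta = (x1 * x2 + y1 * y2) / (math.hypot(x1, y1) * math.hypot(x2, y2))
    return cos_theta, math.degrees(math.acos(cos_theta))

for pair in [("protein", "energy"), ("protein", "water"), ("carbohydrate", "water")]:
    r, theta = cosine_and_angle(*pair)
    print(pair, round(r, 4), round(theta))
```

The three cosines agree with the quoted r values to about three decimal places, which indicates that the reported correlations are those implied by the two-dimensional display.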

If we take α = 1, the biplot graph shown in Figure 2 is obtained. The form biplot and the covariance biplot give similar information, but the form biplot is generally preferred when the relationships among the fruits are of interest. Fruits that lie close together in the form biplot have similar structures. This graph shows the relations among the fruits and separates the different types of fruits. When we analyze the form biplot graph in Figure 2, dry apricot and dry grape show a similar structure. Another interpretation is that the fruits between Fruit 8 and Fruit 23 have a similar structure. The biplot graphs presented in this study were produced with programs written in MATLAB.

Figure 1. Covariance biplot.


Table III. Nutrient component coordinates for α = 0.

| | Water | Protein | Carbohydrate | Oil | Energy |
|---|---|---|---|---|---|
| Dimension 1 | −4.6613 | 4.5136 | 3.5139 | 1.7326 | 4.8320 |
| Dimension 2 | 1.0611 | 1.3155 | −2.8243 | 4.3885 | 0.2751 |

Figure 2. Form biplot.

Table IV. Nutrient component coordinates for α = 1.

| | Water | Protein | Carbohydrate | Oil | Energy |
|---|---|---|---|---|---|
| Dimension 1 | −0.5186 | 0.5021 | 0.3909 | 0.1928 | 0.5376 |
| Dimension 2 | 0.1932 | 0.2395 | −0.5142 | 0.7990 | 0.0501 |

Conclusion

The biplot approach is a successful technique for displaying the relations among variables and summarizing the properties of a multivariate data-set. This approach gives an opportunity to analyze the data by a visual graph. The form and covariance biplot graphs perform well in classifying the fruits, and the two graphs give consistent results. Considering these two biplot graphs, we observed that the protein and energy components are highly correlated, the protein–water and carbohydrate–water components are negatively correlated, and dry grape and dry apricot show a similar structure. The fruits that are rich in protein and energy are dry apricot, dry grape, avocado and dry fig; and the fruit that has a high amount of oil is avocado. We can obtain more meaningful information by combining the biplot approach with clustering methods such as k-means clustering. One of the deficiencies of clustering methods is that they do not provide detailed information on the clusters. This deficiency can be remedied by using the biplot approach.

References

Aitchison J, Greenacre M. 2002. Biplots of compositional data. Appl Stat 51:375–392.

Bartkowiak A, Szustalewicz A. 1995. The augmented biplot and some examples of its use. Machine Graphics Vision 4:161–185.

Bashirian S, Allahverdpour H, Moeini B. 2008. Fruit and vegetable intakes among elementary schools’ pupils: Using five-a-day educational program. J Res Health Sci 8:56–63.

Bradu D, Gabriel KR. 1978. The biplot as a diagnostic tool for models of two-way tables. Technometrics 20:47–68.

Carey J, Hanley V. 1999. Protein structure. Maryland: In online textbook of the Biophysical Society.

Cox TF, Cox MAA. 2001. Multidimensional scaling. London: Chapman & Hall/CRC.

DFCD. 2005. Danish food composition databank. National Food Institute of Technical University of Denmark. Available online at: http://www.foodcomp.dk/v7/fcdb_download.asp. (Accessed 3 June 2008).

Driskell JA. 2000. Sports nutrition. New York: CRC Press.

Ebrahimof S, Hoshyarrad A, Hossein A, Zandi N, Larijani B, Kimiagar M. 2006. Fruit and vegetable intake in postmenopausal women with osteopenia. ARYA J 1:183–187.

Eckart C, Young G. 1936. The approximation of one matrix by another of lower rank. Psychometrika 1:211–218.

Food-Info. 2005. Food composition table. Wageningen University, The Netherlands. Available online at: http://www.food-info.net/uk/foodcomp/table.htm. (Accessed 3 June 2008).

Gabriel KR. 1971. The biplot graphical display of matrices with application to principal component analysis. Biometrika 58:453–467.

Gabriel KR, Zamir S. 1979. Lower rank approximation of matrices by least squares with any choice of weights. Technometrics 21:489–498.

GDS. 2005. Energy, protein, oil ingredients of the nutrients. Food Industry. Available online at: http://www.gidasanayii.com/modules.php?name=News&file=article&sid=5203. (Accessed 4 June 2008).

Gower JC, Hand DJ. 1996. Biplots. London: Chapman & Hall.

Gower JC, Harding SA. 1988. Nonlinear biplots. Biometrika 75:445–455.

Greenacre MJ. 1984. Theory and applications of correspondence analysis. London: Academic Press.

Householder AS, Young G. 1938. Matrix approximation and latent roots. American Mathematical Monthly 45:165–171.

Johnson RA, Wichern DW. 1998. Applied multivariate statistical analysis. New Jersey: Prentice Hall.

Kuhfeld WF. 1992. Marketing research: Uncovering competitive advantages. SAS Technical Report. Cary NC: SAS Institute Inc.

WHO. 1990. Diet, nutrition and prevention of chronic diseases. Geneva: World Health Organization.

This paper was first published online on iFirst on 23 February 2009.

