Examination of relation between nutrient components and fruits: Biplot approach
CEMAL ATAKAN1, BARIS ALKAN2 & AFSIN SAHIN3
1 Ankara University, Faculty of Science, Department of Statistics, Tandogan, Ankara, Turkey
2 Sinop University, Faculty of Sciences and Arts, Department of Statistics, Sinop, Turkey
3 Ministry of Agriculture and Rural Affairs, Foreign Relations and European Union, Tarim Bakanligi Kampusu Lodumlu, Ankara, Turkey
Abstract
Adequate intake of fruits and vegetables as part of the daily diet may help prevent major diseases. Low fruit intake is a major risk factor for cancer, coronary heart disease and stroke. The World Health Organization recommends eating at least five portions of a variety of fruit, which is nearly 400 g/day. Essential nutrients, water, carbohydrates, oils and vitamins are needed in appropriate quantities for a well-functioning body. In this study we carry out a food composition study to identify the organic and inorganic macro-nutrient and micro-nutrient properties of the main fruit types that affect human health, using a biplot graphical approach. The biplot can be considered the multivariate equivalent of the scatter plot used for graphically analyzing bivariate data. Biplot approaches display fruits and nutrient components simultaneously in low dimensions. We present the theory of the biplot and its different types, and then apply the biplot approach to real data.
Keywords: Nutrients, singular value decomposition, biplot
Introduction
Adequate intake of fruits and vegetables as part of the daily diet may help prevent
major non-communicable diseases (Ebrahimof et al. 2006). Low fruit intake is a major
risk factor for cancer, coronary heart disease and stroke (Bashirian et al. 2008).
Therefore, diets rich in fruits are associated with a lower risk of chronic diseases.
Anemia, rickets, obesity, avitaminosis, simple goiter and tooth decay are common
diseases arising from poor nutritional habits. There is clear and growing evidence for
the protective effects of fruits against chronic diseases. Fruit consumption also increases
fiber intake, reduces fat intake, and helps reduce obesity, manage diabetes and lower blood pressure.
The World Health Organization (WHO) recommends eating at least five portions of a
variety of fruit, which is nearly 400 g/day (see WHO [1990] for the details).
Essential nutrients, water, carbohydrates, oils, proteins (nitrogen-containing class of
foods assembled from monomers called amino acids; see Carey and Hanley [1999] for
the details) and vitamins are needed in appropriate quantities in order to have a well-
functioning body. In several fruits, there are several different types of nutrient
Correspondence: Baris Alkan, Sinop University, Faculty of Sciences and Arts, Department of Statistics,
57000 Sinop, Turkey. Email: [email protected]
ISSN 0963-7486 print/ISSN 1465-3478 online © 2009 Informa UK Ltd
DOI: 10.1080/09637480802668489
International Journal of Food Sciences and Nutrition,
August 2009; 60(S1): 181-189
Int J Food Sci Nutr. Downloaded from informahealthcare.com by University of Laval on 07/09/14. For personal use only.
components in different amounts. These nutrient components are carbohydrates,
proteins, oils, vitamins and water. Fruits are rich in vitamins, minerals and phytochemical
components. The energy-yielding organic nutrients (carbohydrates, oils and proteins) provide fuel for the bioenergetic reactions taking place in water (nearly 45-75% of
our body), and vitamins catalyze reactions involving those carbon, hydrogen and oxygen elements
(Driskell 2000: p. 3).
Human choices are crucial for nutrient selection. People maximize their
utility by consciously choosing the fruits they consume. This matters both for human beings and
for the ecosystem. Therefore, in this study we carry out a food composition
study to identify the organic and inorganic
macro-nutrient and micro-nutrient properties of the main fruit types that affect
human health, using a biplot graphical approach. The biplot technique, which displays the
significant properties of a multivariate data structure, was introduced by Gabriel
(1971) and later developed by Bradu and Gabriel (1978), Gabriel and Zamir (1979),
Gower and Harding (1988) and Gower and Hand (1996). The technique has been
applied in many different areas and has proven practical. The biplot can be
considered the multivariate equivalent of the scatter plot used for
analyzing bivariate data. With more than two variables, graphical analysis of
the relationships among the variables becomes harder. To interpret the relationships
in a multivariate data-set, the data are therefore represented in a lower-dimensional,
usually two-dimensional, space. This causes some loss of information, but the
relationships become easier to interpret.
The biplot approach produces graphs showing a simultaneous representation of
the fruits and the nutrient component variables of the data matrix. The prefix ‘bi’ in
biplot does not refer to the dimension of the graph; it indicates that both the fruits and the
nutrient components are shown in the same graph. The technique is based on the
singular value decomposition.
The following section gives the theoretical background for the biplot technique and
considers several types. In the third section of the study, the results of the biplot
approach to the nutrient components of the 25 types of fruit are presented and the
results are discussed.
Materials and methods
Singular value decomposition
A square matrix is singular if its determinant is zero, that is, if it is not invertible. Singular value
decomposition (SVD) is a generalization of the eigenvalue decomposition that can be applied
to any matrix, including rectangular and non-invertible ones. Applying SVD to a matrix
yields three simpler matrices: two with orthonormal columns and one diagonal.
The SVD of a data matrix $X \in \mathbb{R}^{n \times p}$ of rank $r$, with $r \le \min\{n, p\}$, is
given by Equation (1) (Johnson and Wichern 1998):
$$X_{n \times p} = U_{n \times r}\,\Gamma_{r \times r}\,V^{T}_{r \times p} \tag{1}$$
where the columns of $U_{n \times r}$ and $V_{p \times r}$ are the left and right singular vectors, respectively. They
satisfy $U^{T}U = I_r$ and $V^{T}V = I_r$, so the columns of each matrix are orthonormal.
Here $I_r$ denotes the identity matrix of size
$r \times r$, and $\Gamma_{r \times r}$ is a diagonal matrix consisting of the following singular values:
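$$\Gamma = \mathrm{diag}(\gamma_1, \gamma_2, \ldots, \gamma_r), \qquad \gamma_1 \ge \gamma_2 \ge \cdots \ge \gamma_r > 0.$$
The singular values are the square roots of the non-zero eigenvalues ($\lambda_i$, $i = 1, 2, \ldots, r$) of $XX^{T}$ or $X^{T}X$.
In addition, the rank of $X$ equals the number of non-zero singular values.
The normalized eigenvectors of $XX^{T}$ are the ‘left’ singular vectors ($U$), while the
normalized eigenvectors of $X^{T}X$ are the ‘right’ singular vectors ($V$).
The SVD of a data matrix $X$ can be written as:
$$X_{n \times p} = U_{n \times r}\,\Gamma_{r \times r}\,V^{T}_{r \times p} = \sum_{i=1}^{r} \gamma_i u_i v_i^{T} = \gamma_1 u_1 v_1^{T} + \gamma_2 u_2 v_2^{T} + \cdots + \gamma_r u_r v_r^{T} \tag{2}$$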
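The relations above can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using a small made-up matrix rather than the fruit data; the variable names (`gamma`, `Vt`) are illustrative choices, not notation from the original programs.

```python
import numpy as np

# Small illustrative matrix (hypothetical values, not the fruit data).
X = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 4.0],
              [1.0, 1.0, 1.0]])

# Thin SVD: U is n x r, gamma holds the singular values, Vt is r x p.
U, gamma, Vt = np.linalg.svd(X, full_matrices=False)

# The columns of U and V are orthonormal: U^T U = I_r and V^T V = I_r.
orthonormal = (np.allclose(U.T @ U, np.eye(len(gamma)))
               and np.allclose(Vt @ Vt.T, np.eye(len(gamma))))

# The singular values are the square roots of the eigenvalues of X^T X.
eigvals = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
roots_match = np.allclose(gamma, np.sqrt(eigvals))

# Equation (2): X is the sum of r rank-one terms gamma_i * u_i * v_i^T.
X_sum = sum(g * np.outer(U[:, i], Vt[i]) for i, g in enumerate(gamma))
sum_matches = np.allclose(X, X_sum)

print(orthonormal, roots_match, sum_matches)
```

The thin form (`full_matrices=False`) is used here because it returns the $n \times r$ and $r \times p$ factors that appear in Equation (1) when the matrix has full column rank, as this example does.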
Eckart-Young theorem
The SVD of a data matrix $X$ is given by Equation (1). The rank-$k$ truncation $X^{(k)}_{n \times p}$, where $k \le r$, is
given by:
$$X^{(k)}_{n \times p} = U_{n \times k}\,\Gamma_{k \times k}\,V^{T}_{k \times p} \tag{3}$$
The Eckart-Young theorem states that $X^{(k)}_{n \times p}$ is the optimal rank-$k$ approximation of $X_{n \times p}$ in the
least-squares sense; that is, it minimizes the sum of squared
residuals. Accordingly, we obtain the following (Eckart and Young 1936):
$$\min_{\mathrm{rank}(B) = k} \|X - B\| = \|X - X^{(k)}\| = \sqrt{\sum_{i=k+1}^{r} \gamma_i^{2}} = \sqrt{\sum_{i=k+1}^{r} \lambda_i} \tag{4}$$
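As a rough numerical illustration of Equation (4), the sketch below truncates the SVD of a randomly generated matrix (hypothetical data, assuming NumPy) and compares the approximation error with the square root of the sum of the discarded squared singular values. The helper name `rank_k_approx` is introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))            # hypothetical data matrix of rank 4

U, gamma, Vt = np.linalg.svd(X, full_matrices=False)

def rank_k_approx(k):
    """Equation (3): keep only the first k singular triplets."""
    return U[:, :k] @ np.diag(gamma[:k]) @ Vt[:k, :]

k = 2
# np.linalg.norm of a 2-D array defaults to the Frobenius (Euclidean) norm.
error = np.linalg.norm(X - rank_k_approx(k))
# Right-hand side of Equation (4): discarded singular values only.
predicted = np.sqrt(np.sum(gamma[k:] ** 2))

print(error, predicted)
```

Any other rank-two matrix, for instance one built from the first and third singular triplets, should give an error at least as large as `error`.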
Approximation of the matrix X by lower-rank matrices
Assume that we want to approximate the matrix $X_{n \times p}$ of rank $r$ by a matrix $X^{(k)}_{n \times p}$ of
lower rank ($k \le r$). To do this, we first define the approximation error. The
error is generally measured by the Euclidean norm of the error matrix $E = X - X^{(k)}$.
The squared Euclidean norm of a matrix can be written as the trace of the
product of its transpose with itself, which gives the following equation (Bartkowiak
and Szustalewicz 1995):
$$\|E\| = \|X - X^{(k)}\| = \left[\mathrm{trace}(E^{T}E)\right]^{1/2} = \left(\sum_{i=1}^{n} \sum_{j=1}^{p} e_{ij}^{2}\right)^{1/2} \tag{5}$$
The problem is how to approximate the matrix $X_{n \times p}$ by lower-rank matrices with
minimum error. This problem was first considered by Householder and Young
(1938). The best approximation of a matrix $X_{n \times p}$ of rank $r$ by a
matrix $X^{(k)}_{n \times p}$ of rank $k \le r$, in the sense of minimizing the Euclidean norm of the error matrix $E = X - X^{(k)}$,
is obtained by taking the first $k$ components of the
singular value decomposition of $X$ (Bartkowiak and Szustalewicz 1995). The singular
value decomposition of the approximating matrix $X^{(k)}_{n \times p}$ is given by Equation (3).
If $k = r$ in Equation (5), the error is zero.
The approximation error can be obtained from the SVDs of $X_{n \times p}$ and $X^{(k)}_{n \times p}$
and the Eckart-Young theorem:
$$\begin{aligned}
\|E\| &= \min_{\mathrm{rank}(B) = k} \|X - B\| = \|X - X^{(k)}\| = \left\|U_{n \times r}\Gamma_{r \times r}V^{T}_{r \times p} - U_{n \times k}\Gamma_{k \times k}V^{T}_{k \times p}\right\| \\
&= \left\|\sum_{i=1}^{r} \gamma_i u_i v_i^{T} - \sum_{i=1}^{k} \gamma_i u_i v_i^{T}\right\|, \qquad k \le r \\
&= \left\|\gamma_{k+1} u_{k+1} v_{k+1}^{T} + \cdots + \gamma_r u_r v_r^{T}\right\| \qquad \left(\|X\| = \sqrt{\mathrm{trace}(X^{T}X)}\right) \\
&= \sqrt{\sum_{i=k+1}^{r} \gamma_i^{2}} = \sqrt{\sum_{i=k+1}^{r} \lambda_i}
\end{aligned}$$
The quality of the approximation is called the goodness-of-fit. It is defined as
the ratio of the squared norm of the matrix $X^{(k)}_{n \times p}$ to the squared norm of the matrix $X_{n \times p}$:
$$\text{Goodness-of-fit} = \frac{\|X^{(k)}\|^{2}}{\|X\|^{2}} = \frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_r}, \qquad k \le r.$$
The sum of all the eigenvalues equals the total variance of the data cloud given by the matrix $X$:
$$\text{Total variance} = \|X\|^{2} = \mathrm{trace}(X^{T}X) = \lambda_1 + \cdots + \lambda_r.$$
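To make the ratio concrete, the short sketch below applies it to the eigenvalues reported in Table II. The function name `goodness_of_fit` is a hypothetical helper, and the result should come out at roughly 0.92 for two dimensions, in line with the 92% quoted in the results.

```python
# Eigenvalues (squared singular values) copied from Table II.
eigenvalues = [80.7985, 30.1686, 6.6780, 2.0443, 0.3108]

def goodness_of_fit(eigs, k):
    """Share of the total variance retained by the first k dimensions."""
    return sum(eigs[:k]) / sum(eigs)

# Proportion of total variance explained by each dimension separately.
proportions = [lam / sum(eigenvalues) for lam in eigenvalues]
fit_2d = goodness_of_fit(eigenvalues, 2)

print([round(p, 4) for p in proportions], round(fit_2d, 4))
```

The individual proportions differ slightly from the rounded values printed in Table II because they are computed here from the unrounded ratios.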
Biplot approach
The biplot is an approach for showing the fruits and nutrient components of the matrix
$X_{n \times p}$ on a single graph. It can be used for continuous and categorical
data. In nearly all applications, a transformation suited to the structure of the data
is applied to the matrix $X$ before the biplot technique is used. Examples of
such transformations are centering each variable at its mean, standardization of the variables
and logarithmic transformations (Kuhfeld 1992). Assume that
the rank of the transformed matrix $Z_{n \times p}$ is $r$. The decomposition of $Z_{n \times p}$ is
given by Equation (6) (Aitchison and Greenacre 2002):
$$Z_{n \times p} = F_{n \times r}\,G^{T}_{r \times p} \tag{6}$$
SVD is used to obtain this decomposition.
In $r$-dimensional Euclidean space, the columns of the matrix $G^{T}$ and the rows of
the matrix $F$ give the coordinates of the $p$ points for the nutrient components and the $n$ points
for the fruits, respectively. This space is called the whole space because its dimension equals
the rank of the matrix $Z$. Different types of biplot of $Z_{n \times p}$ (of rank $r$) arise from
distributing the singular values differently between the singular vectors $U$ and $V$ (Cox and Cox 2001).
Biplot types
Different biplots are obtained according to the value of the constant $\alpha$ used to compute
the matrices $F$ and $G$. These matrices are given by the
equations $F = U\Gamma^{\alpha}$ and $G = V\Gamma^{1-\alpha}$, where the constant $\alpha$ takes values in
$0 \le \alpha \le 1$ (Aitchison and Greenacre 2002). The most frequently used values of $\alpha$
are zero and one. Every choice of $\alpha$ gives the same matrix approximation but
a different display of the data matrix. Here, the term basic coordinates is
used for singular vectors scaled by the singular values, and standard coordinates
for singular vectors not scaled by the singular values (Greenacre 1984).
Low values of $\alpha$ give graphs that put more weight on the nutrient components,
whereas higher values give graphs that put more weight on the fruits.
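The role of the constant can be sketched in a few lines. The example below assumes NumPy and a hypothetical column-centered matrix standing in for the transformed data; `biplot_coordinates` is an illustrative helper name, not a function from the original programs.

```python
import numpy as np

def biplot_coordinates(Z, alpha):
    """Return F = U Gamma^alpha and G = V Gamma^(1 - alpha) for 0 <= alpha <= 1."""
    U, gamma, Vt = np.linalg.svd(Z, full_matrices=False)
    F = U * gamma ** alpha              # scales column i of U by gamma_i^alpha
    G = Vt.T * gamma ** (1.0 - alpha)   # scales column i of V by gamma_i^(1 - alpha)
    return F, G

# Hypothetical column-centered matrix, standing in for a transformed Z.
rng = np.random.default_rng(1)
Z = rng.normal(size=(8, 3))
Z -= Z.mean(axis=0)

for alpha in (0.0, 0.5, 1.0):
    F, G = biplot_coordinates(Z, alpha)
    # In the whole space, every alpha reproduces Z exactly as F G^T.
    print(alpha, np.allclose(F @ G.T, Z))
```

The product of the two coordinate matrices is the same for every value of the constant; only the way the singular values are shared between the row points and the column points changes.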
Covariance biplot
If $\alpha = 0$, the singular values are assigned to the right singular vectors of the matrix $Z$. The fruits are
then in standard coordinates and the nutrient components in basic coordinates;
therefore, the resulting graph emphasizes the nutrient components. In the covariance
biplot, $GG^{T}/(n-1)$ is the ordinary least-squares approximation of the
covariance matrix $S = Z^{T}Z/(n-1)$ (Aitchison and Greenacre 2002). For $\alpha = 0$ the
following equations are obtained:
$$F_{n \times r} = U \quad \text{and} \quad G_{p \times r} = V\Gamma \tag{7}$$
Properties of the covariance biplot
i. The length of a nutrient component vector is proportional to the standard deviation of
that component.
ii. The cosine of the angle $\theta$ between two nutrient component vectors is equal to the
correlation between the nutrient components:
$$r = \frac{\mathrm{Cov}(z_i, z_j)}{\sqrt{s^{2}_{z_i}\, s^{2}_{z_j}}} = \frac{z_i^{T} z_j}{\sqrt{\|z_i\|^{2}\,\|z_j\|^{2}}} = \frac{\|z_i\|\,\|z_j\| \cos\theta}{\|z_i\|\,\|z_j\|} = \cos\theta \tag{8}$$
iii. In whole space, the distance between two fruit points approximates the Mahalanobis
distance between the corresponding rows, which is given by:
$$d^{2}_{i,j} = (z_i - z_j)^{T} S^{-1} (z_i - z_j) \tag{9}$$
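These three properties can be verified in the whole space with a small sketch. It assumes NumPy and a hypothetical column-centered matrix of full column rank; with these conventions the squared distance between rows of $F$ matches the squared Mahalanobis distance up to the factor $(n-1)$, which is worked out here as part of the example rather than taken from the source.

```python
import numpy as np

rng = np.random.default_rng(2)
Z = rng.normal(size=(10, 3))           # hypothetical data, not the fruit table
Z -= Z.mean(axis=0)                    # column-center, as assumed for Z
n = Z.shape[0]

U, gamma, Vt = np.linalg.svd(Z, full_matrices=False)
F, G = U, Vt.T * gamma                 # Equation (7): alpha = 0

S = Z.T @ Z / (n - 1)                  # sample covariance matrix
cov_from_G = G @ G.T / (n - 1)         # reproduces S in the whole space

# Property i: vector lengths are proportional to the standard deviations.
lengths = np.linalg.norm(G, axis=1)
sds = np.sqrt(np.diag(S))

# Property ii: cosines between rows of G equal the correlations.
cosines = (G @ G.T) / np.outer(lengths, lengths)
corr = np.corrcoef(Z, rowvar=False)

# Property iii: (n - 1) times the squared distance between rows of F
# equals the squared Mahalanobis distance between the rows of Z.
d_F = (n - 1) * np.sum((F[0] - F[1]) ** 2)
diff = Z[0] - Z[1]
d_mahal = diff @ np.linalg.inv(S) @ diff

print(np.allclose(cov_from_G, S), np.allclose(cosines, corr), np.isclose(d_F, d_mahal))
```

In a two-dimensional display only the first two columns of `F` and `G` are plotted, so these equalities then hold approximately rather than exactly.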
Form biplot
If $\alpha = 1$, the singular values are assigned to the left singular vectors of the matrix $Z$. This
gives the form biplot, whose graph emphasizes the fruits. Here
$FF^{T}/(n-1)$ approximates the form matrix $ZZ^{T}/(n-1)$, which contains the scalar products
between the rows of the matrix $Z$ (Aitchison and Greenacre 2002).
For $\alpha = 1$ the following equations are obtained:
$$F_{n \times r} = U\Gamma \quad \text{and} \quad G_{p \times r} = V \tag{10}$$
Properties of the form biplot
1. The distance between fruit points approximates the Euclidean distance between the fruits.
2. The length of a nutrient component vector shows how well that component is approximated.
3. The projection of a fruit point onto a nutrient component vector gives the
approximate element of the transformed data matrix.
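A comparable sketch for the form biplot is given below, again assuming NumPy and hypothetical centered data. It checks the distance and scalar-product properties in the whole space and then keeps two columns to mimic a two-dimensional display; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
Z = rng.normal(size=(7, 4))            # hypothetical data, not the fruit table
Z -= Z.mean(axis=0)

U, gamma, Vt = np.linalg.svd(Z, full_matrices=False)
F, G = U * gamma, Vt.T                 # Equation (10): alpha = 1

# Property 1 (whole space): distances between rows of F equal
# the Euclidean distances between the corresponding rows of Z.
dist_F = np.linalg.norm(F[0] - F[1])
dist_Z = np.linalg.norm(Z[0] - Z[1])

# Property 3: the scalar product of row i of F with row j of G gives z_ij.
z_02 = F[0] @ G[2]

# Two-dimensional display: keep the first two columns of F and G.
F2, G2 = F[:, :2], G[:, :2]
Z2 = F2 @ G2.T                         # rank-two approximation of Z

# Property 2: in two dimensions each component vector has length at most 1;
# lengths close to 1 indicate a well-represented component.
lengths_2d = np.linalg.norm(G2, axis=1)

print(np.isclose(dist_F, dist_Z), np.isclose(z_02, Z[0, 2]), lengths_2d)
```

The upper bound of 1 on the two-dimensional lengths follows from the rows of $V$ having unit length when $V$ is square and orthogonal, which is the case in this example.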
Results and discussion
In this study, we applied the biplot technique to detect similarities among 25
types of fruit in terms of the quantities of five nutrient components. Table I presents
the names of the fruits and the five nutrient components considered.
The rows in Table I are the fruits, and the columns are the five nutrient components.
We first standardized the 25 × 5 data matrix before applying the biplot
technique.
Second, we applied SVD to the standardized data matrix. Table II presents the
singular values and the proportion of the total variance explained by each singular value.
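The standardization step can be sketched as follows, using only the first five rows of Table I for brevity and assuming NumPy. Dividing by the sample standard deviation (`ddof=1`) is an assumption; the text does not say which divisor was used.

```python
import numpy as np

# First five rows of Table I (water, protein, carbohydrate, oil, energy).
X = np.array([[81, 0.6, 17.3, 0.3,  67],
              [18, 2.5, 77.4, 0.2, 289],
              [82, 1.2, 20.4, 0.4,  88],
              [23, 4.3,  6.9, 1.3, 274],
              [86, 0.8,  8.5, 0.1,  35]])

# Center each column and divide by its sample standard deviation
# (ddof=1 is an assumption, not stated in the text).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

n, p = Z.shape
total = np.trace(Z.T @ Z)              # equals (n - 1) * p for standardized data
print(total, (n - 1) * p)
```

For the full 25 × 5 matrix the same identity would give 24 × 5 = 120, which agrees with the total of 120.0002 in Table II; this is consistent with the `ddof=1` assumption but does not prove it.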
Table III and Table IV give the nutrient component coordinates for the covariance biplot ($\alpha = 0$) and
the form biplot ($\alpha = 1$), respectively. The first dimension explains 67% of the total variance
and the second dimension 25%. The third, fourth and fifth dimensions explain 6%, 1.7% and 0.3%
of the total variance, respectively. As a result, 92% of the total variance is
explained by the two-dimensional biplot graph (Johnson and Wichern 1998).
Accordingly, the optimal approximation of $Z_{25 \times 5}$, which has rank five, is
given by the rank-two matrix $Z^{(2)}_{25 \times 5}$. The singular vectors $U$ and $V$
and the singular values are obtained by SVD. Also, the $F$ and $G$ matrices are obtained from
Table I. Nutrient components in fruits (per 100 g).

| No. | Fruit | Water (g) | Protein (g) | Carbohydrate (g) | Oil (g) | Energy (kcal) |
|---|---|---|---|---|---|---|
| 1 | Grape (fresh) | 81 | 0.6 | 17.3 | 0.3 | 67 |
| 2 | Grape (dry) | 18 | 2.5 | 77.4 | 0.2 | 289 |
| 3 | Fig (fresh) | 82 | 1.2 | 20.4 | 0.4 | 88 |
| 4 | Fig (dry) | 23 | 4.3 | 6.9 | 1.3 | 274 |
| 5 | Orange | 86 | 0.8 | 8.5 | 0.1 | 35 |
| 6 | Tangerine | 87 | 0.8 | 11.6 | 0.2 | 46 |
| 7 | Grapefruit | 91 | 0.6 | 5.3 | 0.2 | 43 |
| 8 | Lemon (juice) | 91 | 0.3 | 1.6 | 0.5 | 7 |
| 9 | Apple | 84 | 0.3 | 11.9 | 0.3 | 46 |
| 10 | Banana | 71 | 1.1 | 19.2 | 0.2 | 76 |
| 11 | Peach | 86 | 0.6 | 9.1 | 0.2 | 37 |
| 12 | Strawberry | 89 | 0.6 | 6.2 | 0.3 | 26 |
| 13 | Apricot (fresh) | 85 | 1.0 | 18.2 | 0.6 | 51 |
| 14 | Apricot (dry) | 25 | 5.0 | 66.5 | 1.0 | 260 |
| 15 | Pear | 83 | 0.3 | 10.6 | 0.2 | 41 |
| 16 | Watermelon | 92 | 0.5 | 6.4 | 0.2 | 26 |
| 17 | Melon | 90 | 0.8 | 7.7 | 0.3 | 33 |
| 18 | Avocado | 69 | 4.2 | 1.8 | 22.2 | 223 |
| 19 | Plum (red) | 81 | 0.5 | 17.8 | 0.2 | 66 |
| 20 | Pineapple (concentrate) | 77 | 0.5 | 11.6 | 0.3 | 46 |
| 21 | Pomegranate | 82 | 0.5 | 16.0 | 0.3 | 63 |
| 22 | Berry | 80 | 1.3 | 17.4 | 0.3 | 70 |
| 23 | Black mulberry | 76 | 0.9 | 19.8 | 1.1 | 93 |
| 24 | Raspberry | 84 | 1.2 | 13.6 | 0.5 | 55 |
| 25 | Quince | 83 | 0.4 | 15.3 | 0.1 | 71 |

Sources: Food-Info (2005), DFCD (2005), and GDS (2005).
Table II. Singular values and proportions explained.

| Dimension | Singular values | Squares of singular values (eigenvalues) | Proportion explained | Cumulative proportion |
|---|---|---|---|---|
| 1 | 8.9888 | 80.7985 | 0.6700 | 0.6700 |
| 2 | 5.4926 | 30.1686 | 0.2500 | 0.9200 |
| 3 | 2.5842 | 6.6780 | 0.0600 | 0.9800 |
| 4 | 1.4298 | 2.0443 | 0.0170 | 0.9970 |
| 5 | 0.5575 | 0.3108 | 0.0030 | 1 |
| Total | | 120.0002 | 1 | |
the singular values for $\alpha = 0$ and $\alpha = 1$. For the two resulting pairs of
$F$ and $G$ matrices, we obtained the biplot graphs shown in Figures 1 and 2.
If we take $\alpha = 0$, we obtain the covariance biplot shown in Figure 1. The
cosine of the angle between any two component vectors approximates the correlation between
those nutrient components. Thus, two nutrient components with
a high positive correlation point in similar directions. In Figure 1,
the protein and energy vectors are highly correlated
($r = 0.9744$, $\theta = 13^{\circ}$) and point in a similar direction. At the same time, there is a high
negative correlation between protein and water ($r = -0.8740$, $\theta = 151^{\circ}$) and between
carbohydrate and water ($r = -0.8991$, $\theta = 154^{\circ}$); these vectors point in opposite directions.
Avocado and dry fig are the two fruits containing the most oil. The other fruits have
similar oil contents. Dry apricot, dry grape, avocado and dry
fig are the richest fruits in protein and energy. Dry grape, dry apricot and
dry fig are the richest fruits in carbohydrates. The fruits that are
richest in water are those from Fruit 8 to Fruit 23.
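The reported correlations and angles can be recomputed from the two-dimensional coordinates in Table III. The sketch below uses only the Python standard library; the dictionary `coords` and the helper `cosine_and_angle` are names introduced for this illustration.

```python
import math

# Two-dimensional covariance biplot coordinates copied from Table III.
coords = {
    "water":        (-4.6613,  1.0611),
    "protein":      ( 4.5136,  1.3155),
    "carbohydrate": ( 3.5139, -2.8243),
    "oil":          ( 1.7326,  4.3885),
    "energy":       ( 4.8320,  0.2751),
}

def cosine_and_angle(a, b):
    """Cosine of the angle between two component vectors, and the angle in degrees."""
    (x1, y1), (x2, y2) = coords[a], coords[b]
    cos = (x1 * x2 + y1 * y2) / (math.hypot(x1, y1) * math.hypot(x2, y2))
    return cos, math.degrees(math.acos(cos))

for pair in [("protein", "energy"), ("protein", "water"), ("carbohydrate", "water")]:
    cos, angle = cosine_and_angle(*pair)
    print(pair, round(cos, 4), round(angle))
```

Because these cosines come from the two retained dimensions only, they approximate the correlations in the full standardized data rather than reproduce them exactly.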
If we take $\alpha = 1$, we obtain the form biplot shown in Figure 2. The form biplot and
the covariance biplot give similar information, but the form biplot is generally preferred
when the relationships among the fruits are of interest. Fruits that lie close together in the
form biplot have similar structures. This graph shows the
relations among the fruits and separates the different types of fruit. In
the form biplot in Figure 2, dry apricot and dry grape show a similar
structure. In addition, the fruits between Fruit 8 and Fruit 23 have
a similar structure. The biplot graphics in this study were produced with
programs written in MATLAB.
Figure 1. Covariance biplot.
Conclusion
The biplot approach is a successful technique for displaying the relations among
variables and summarizing the properties of a multivariate data-set. It makes it
possible to analyze the data visually. Form and covariance biplot
graphs perform well in classifying the fruits, and the two
graphs give consistent results. Considering these two biplot
graphs, we observed that protein and energy components are highly correlated,
protein-water and carbohydrate-water components are negatively correlated, and dry
Table III. Nutrient component coordinates for $\alpha = 0$.

| | Water | Protein | Carbohydrate | Oil | Energy |
|---|---|---|---|---|---|
| Dimension 1 | -4.6613 | 4.5136 | 3.5139 | 1.7326 | 4.8320 |
| Dimension 2 | 1.0611 | 1.3155 | -2.8243 | 4.3885 | 0.2751 |
Figure 2. Form biplot.
Table IV. Nutrient component coordinates for $\alpha = 1$.

| | Water | Protein | Carbohydrate | Oil | Energy |
|---|---|---|---|---|---|
| Dimension 1 | -0.5186 | 0.5021 | 0.3909 | 0.1928 | 0.5376 |
| Dimension 2 | 0.1932 | 0.2395 | -0.5142 | 0.7990 | 0.0501 |
grape and dry apricot show similar structure. The fruits rich in protein and
energy are dry apricot, dry grape, avocado and dry fig, and the fruit with a high
amount of oil is avocado. More meaningful information can be obtained by combining
the biplot approach with cluster analysis methods such as k-means
clustering. One shortcoming of cluster analysis
is that it does not provide detailed information on the clusters. This shortcoming
can be remedied by using the biplot approach.
References
Aitchison J, Greenacre M. 2002. Biplots of compositional data. Appl Stat 51:375-392.
Bartkowiak A, Szustalewicz A. 1995. The augmented biplot and some examples of its use. Machine Graphics Vision 4:161-185.
Bashirian S, Allahverdpour H, Moeini B. 2008. Fruit and vegetable intakes among elementary schools’ pupils: Using five-a-day educational program. J Res Health Sci 8:56-63.
Bradu D, Gabriel KR. 1978. The biplot as a diagnostic tool for models of two-way tables. Technometrics 20:47-68.
Carey J, Hanley V. 1999. Protein structure. Maryland: In online textbook of the Biophysical Society.
Cox TF, Cox MAA. 2001. Multidimensional scaling. London: Chapman & Hall/CRC.
DFCD. 2005. Danish food composition databank. National Food Institute of Technical University of Denmark. Available online at: http://www.foodcomp.dk/v7/fcdb_download.asp. (Accessed 3 June 2008).
Driskell JA. 2000. Sports nutrition. New York: CRC Press.
Ebrahimof S, Hoshyarrad A, Hossein A, Zandi N, Larijani B, Kimiagar M. 2006. Fruit and vegetable intake in postmenopausal women with osteopenia. ARYA J 1:183-187.
Eckart C, Young G. 1936. The approximation of one matrix by another of lower rank. Psychometrika 1:211-218.
Food-Info. 2005. Food composition table. Wageningen University, The Netherlands. Available online at: http://www.food-info.net/uk/foodcomp/table.htm. (Accessed 3 June 2008).
Gabriel KR. 1971. The biplot graphical display of matrices with application to principal component analysis. Biometrika 58:453-467.
Gabriel KR, Zamir S. 1979. Lower rank approximation of matrices by least squares with any choice of weights. Technometrics 21:489-498.
GDS. 2005. Energy, protein, oil ingredients of the nutrients. Food Industry. Available online at: http://www.gidasanayii.com/modules.php?name=News&file=article&sid=5203. (Accessed 4 June 2008).
Gower JC, Hand DJ. 1996. Biplots. London: Chapman & Hall.
Gower JC, Harding SA. 1988. Nonlinear biplots. Biometrika 75:445-455.
Greenacre MJ. 1984. Theory and applications of correspondence analysis. London: Academic Press.
Householder AS, Young G. 1938. Matrix approximation and latent roots. American Mathematical Monthly 45:165-171.
Johnson RA, Wichern DW. 1998. Applied multivariate statistical analysis. New Jersey: Prentice Hall.
Kuhfeld WF. 1992. Marketing research: Uncovering competitive advantages. SAS Technical Report. Cary NC: SAS Institute Inc.
WHO. 1990. Diet, nutrition and prevention of chronic diseases. Geneva: World Health Organization.
This paper was first published online on iFirst on 23 February 2009.