Andrew Smith
Describing childhood diet with cluster analysis
6th September 2012
Describing diet with cluster analysis
• Kate Northstone
• Pauline Emmett
• PK Newby
• World Cancer Research Fund
• MRC, Wellcome Trust, University of Bristol
Outline
• Introductions
• ALSPAC
• Food frequency questionnaires / diet diaries
• Dietary patterns
• Cluster analysis
  – k-means cluster analysis
• Results
  – 4 cluster solution
  – Associations with socio-demographic variables
ALSPAC
• Avon Longitudinal Study of Parents and Children
• Birth cohort study
• 14,541 pregnant women and their children
• www.bris.ac.uk/alspac
Food frequency questionnaires
Diet diaries
• Records all food and drink consumed over a 3-day period
• 2 weekdays and 1 weekend day
• Parent completes at age 7
• Child completes at ages 10 and 13
Dietary patterns
• Examine diet as a whole
• Start with many variables (food group intakes)
• Express as a small number of variables
Image: Paul / FreeDigitalPhotos.net
Principal components analysis (PCA)
• Examine diet as a whole
• Start with many variables
• Use correlations between foods
• Express as a small number of components
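The slide names PCA without showing the computation. One common way to compute it, sketched here under the assumption that intakes sit in a children × food-groups array, is to take the eigenvectors of the correlation matrix between food groups: the leading eigenvectors are the component loadings, and each child's score is their standardised intakes projected onto them. The function name and the random data are hypothetical.

```python
import numpy as np


def pca_from_correlations(X, n_components):
    """PCA of a children x food-groups intake matrix, based on the
    correlations between food groups."""
    X = np.asarray(X, dtype=float)
    # Correlation matrix between food groups (columns), not between people
    R = np.corrcoef(X, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components]
    # Component scores: each child's standardised intakes projected onto the loadings
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    scores = Z @ loadings
    return eigvals[:n_components], loadings, scores


# Hypothetical data: 200 children x 6 food-group intakes
X = np.random.default_rng(0).normal(size=(200, 6))
eigvals, loadings, scores = pca_from_correlations(X, n_components=2)
print(eigvals.shape, loadings.shape, scores.shape)  # (2,) (6, 2) (200, 2)
```

Each column of `loadings` describes one pattern; its largest entries in absolute value show which food groups drive it. Real intake data would replace the random matrix.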
Cluster analysis
• Examine diet as a whole
• Start with many variables
• Use similarities between people
• Express as a small number of clusters
Cluster analysis
• Separate subjects into non-overlapping groups
• Based on ‘distances’ between individuals
• Unsupervised learning
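The ‘distance’ between two children is usually taken to be the Euclidean distance between their vectors of food-group intakes. That is an assumption here, since the slides do not name the metric, although k-means itself is built on (squared) Euclidean distance. A minimal sketch of the full matrix of pairwise distances, using made-up intakes for three children:

```python
import numpy as np


def pairwise_euclidean(X):
    """Euclidean distance between every pair of subjects (rows of X)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]  # shape (n, n, n_food_groups)
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical intakes of 3 food groups (servings/day) for 3 children
X = np.array([[1.0, 2.0, 0.0],
              [1.0, 2.0, 4.0],
              [4.0, 6.0, 0.0]])
D = pairwise_euclidean(X)
print(D[0, 1], D[0, 2])  # 4.0 5.0
```

Children 0 and 1 differ only in the third food group, while children 0 and 2 differ in the first two. Cluster analysis places children with small mutual distances in the same group.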
Image: Boaz Yiftach / FreeDigitalPhotos.net
k-means cluster analysis
• Most widely used clustering method for dietary patterns
• Number of clusters, k, is specified beforehand
• Minimises:
  – Squared distance from each subject to his/her cluster mean
  – Summed over all subjects in that cluster
  – Summed over all clusters
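Written as a formula, the criterion is the within-cluster sum of squares. The notation is introduced here for illustration: x_i is child i's vector of food-group intakes, C_j is the set of children in cluster j and μ_j is that cluster's mean. The standard k-means criterion uses squared Euclidean distance:

```latex
W = \sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i
```

The inner sum runs over the subjects in one cluster and the outer sum over all k clusters, matching the two "summed over" bullets.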
k-means cluster analysis
Problems with the standard algorithm
The algorithm for k-means cluster analysis is:
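• Short-sighted (greedy): each step makes only the best immediate move, so the answer depends on the starting point
• Tends to find solutions that are at a local minimum
  – So run the algorithm 100 times from different random starting points and choose the solution that is the minimum out of all the minima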
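A minimal sketch of the fix described above, assuming the standard Lloyd's algorithm (the slides do not name the software or implementation used). Each run starts from k randomly chosen children and alternates assignment and update steps until the cluster means stop moving. The run with the smallest within-cluster sum of squares is kept. Function names and data are hypothetical.

```python
import numpy as np


def lloyd_kmeans(X, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from k randomly chosen subjects.
    Returns (labels, centres, within-cluster sum of squares)."""
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Assignment step: each subject joins its nearest cluster mean
        sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        # Update step: each centre moves to the mean of its members
        # (an empty cluster keeps its previous centre)
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break  # means stopped moving: a (possibly local) minimum
        centres = new_centres
    wss = float(((X - centres[labels]) ** 2).sum())
    return labels, centres, wss


def kmeans_best_of(X, k, n_runs=100, seed=0):
    """Repeat k-means n_runs times from different random starts and keep the
    run with the smallest objective: the minimum out of all the minima."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    runs = [lloyd_kmeans(X, k, rng) for _ in range(n_runs)]
    best = min(runs, key=lambda run: run[2])
    return best, runs


# Hypothetical data: 300 children x 5 standardised food-group intakes
X = np.random.default_rng(42).normal(size=(300, 5))
(labels, centres, wss), runs = kmeans_best_of(X, k=4)
print(f"best objective {wss:.1f}, worst local minimum {max(r[2] for r in runs):.1f}")
```

Different starts can converge to different local minima, which is why the best and worst objectives often differ. In scikit-learn, `KMeans` applies the same idea through its `n_init` parameter.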
Standardising the input variables
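The slide gives no detail, but a common reason for standardising is that k-means works on distances. A food group measured on a large scale (grams per day, say) would therefore swamp one measured in servings. Converting each variable to z-scores (mean 0, standard deviation 1) is one usual choice and is assumed here. The units and values below are made up.

```python
import numpy as np


def standardise(X):
    """Convert each food-group column to z-scores (mean 0, SD 1)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


# Hypothetical intakes: column 0 in grams/day, column 1 in servings/day
X = np.array([[300.0, 1.0],
              [100.0, 3.0],
              [200.0, 2.0]])
print(np.linalg.norm(X[0] - X[1]))  # ≈ 200.01, almost entirely the gram column
Z = standardise(X)
print(np.linalg.norm(Z[0] - Z[1]))  # ≈ 2.83, each column differs by 2 SD units
```

After standardising, children 0 and 1 differ by the same number of standard deviations in both food groups, so neither dominates the distance.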
Reliability of the cluster solution
• Split sample in half
• Perform separate analyses on each half
• See how many children change clusters
• Repeat 5 times
  – 32 out of 8,279 children changed cluster (0.4%)
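The slides do not spell out how "changing cluster" was counted. This sketch assumes one plausible reading. First cluster the whole sample, then cluster each random half separately. Next, match each half-sample cluster to the nearest whole-sample cluster mean, because cluster numbers are arbitrary. Finally, flag children whose two assignments disagree, and across 5 repeats count the children flagged at least once. The compact `kmeans` helper, the function names and the data are all hypothetical.

```python
import numpy as np


def kmeans(X, k, rng, n_runs=10, max_iter=100):
    """Compact k-means (Lloyd's algorithm), best of n_runs random starts.
    Returns (labels, centres)."""
    best = None
    for _ in range(n_runs):
        c = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            lab = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            new = np.array([X[lab == j].mean(axis=0) if np.any(lab == j) else c[j]
                            for j in range(k)])
            if np.allclose(new, c):
                break
            c = new
        wss = ((X - c[lab]) ** 2).sum()
        if best is None or wss < best[0]:
            best = (wss, lab, c)
    return best[1], best[2]


def split_half_changed(X, k, rng):
    """Hypothetical split-half check. Returns a boolean array that is True
    where a child's half-sample cluster differs from their whole-sample one."""
    X = np.asarray(X, dtype=float)
    full_labels, full_centres = kmeans(X, k, rng)
    order = rng.permutation(len(X))
    changed = np.zeros(len(X), dtype=bool)
    for half in (order[: len(X) // 2], order[len(X) // 2:]):
        labels, centres = kmeans(X[half], k, rng)
        # Cluster numbers are arbitrary: map each half-sample cluster to the
        # nearest whole-sample cluster mean before comparing assignments
        match = ((centres[:, None, :] - full_centres[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        changed[half] = match[labels] != full_labels[half]
    return changed


# Hypothetical data: 4 well-separated groups of 100 children, 3 food groups
rng = np.random.default_rng(7)
group_means = np.array([[0, 0, 0], [4, 0, 0], [0, 4, 0], [0, 0, 4]], dtype=float)
X = np.vstack([m + rng.normal(scale=0.5, size=(100, 3)) for m in group_means])

masks = [split_half_changed(X, k=4, rng=rng) for _ in range(5)]  # repeat 5 times
n_changed = int(np.any(masks, axis=0).sum())  # children who changed at least once
print(f"{n_changed} of {len(X)} children changed cluster "
      f"({100 * n_changed / len(X):.1f}%)")
```

For the figures on the slide, 100 × 32 / 8,279 ≈ 0.39%, which rounds to the 0.4% quoted.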
Results
• Food frequency questionnaire (FFQ) data
  – Age 7
  – 3 clusters
• Diet diary data
  – Ages 7, 10 and 13
  – 4 clusters
Processed: 30.2% of children
Image: Suat Eman, artemisphoto, -Marcus- / FreeDigitalPhotos.net
Plant-based (Healthy): 27.8% of children
Image: Suat Eman, Paul, Rob Wiltshire, Simon Howden, winnond / FreeDigitalPhotos.net
Traditional British: 21.3% of children
Image: Suat Eman, Maggie Smith, Simon Howden / FreeDigitalPhotos.net
Packed Lunch: 20.6% of children
Image: Grant Cochrane, luigi diamanti, Rawich, Master Isolated Images / FreeDigitalPhotos.net
Associations with socio-demographic variables
Sex                     n  Processed vs Plant-based  Plant-based vs Traditional British  Traditional British vs Processed
Girls               3,115  1                         1                                   1
Boys                2,941  0.82 (0.72, 0.93)         1.03 (0.89, 1.20)                   1.18 (1.04, 1.34)
Associations with socio-demographic variables
Maternal age            n  Processed vs Plant-based  Plant-based vs Traditional British  Traditional British vs Processed
< 21                  130  1                         1                                   1
21-25                 994  0.59 (0.33, 1.07)         1.07 (0.56, 2.05)                   1.57 (1.02, 2.43)
26-30               2,644  0.52 (0.29, 0.92)         1.20 (0.64, 2.28)                   1.60 (1.04, 2.46)
31+                 2,288  0.37 (0.21, 0.67)         1.50 (0.79, 2.88)                   1.77 (1.13, 2.76)
Associations with socio-demographic variables
Maternal education      n  Processed vs Plant-based  Plant-based vs Traditional British  Traditional British vs Processed
CSE                   740  1                         1                                   1
Vocational            504  0.84 (0.60, 1.17)         1.19 (0.82, 1.72)                   1.01 (0.76, 1.32)
O level             2,163  0.65 (0.51, 0.83)         1.46 (1.10, 1.94)                   1.05 (0.86, 1.30)
A level             1,604  0.42 (0.33, 0.55)         2.01 (1.50, 2.69)                   1.18 (0.95, 1.48)
Degree              1,045  0.30 (0.23, 0.39)         2.75 (2.00, 3.76)                   1.22 (0.94, 1.57)
Associations with socio-demographic variables
Siblings                n  Processed vs Plant-based  Plant-based vs Traditional British  Traditional British vs Processed
0 older             2,755  1                         1                                   1
1 older             2,317  1.21 (1.03, 1.42)         1.12 (0.94, 1.36)                   0.73 (0.62, 0.86)
2+ older              984  1.58 (1.28, 1.97)         0.99 (0.76, 1.27)                   0.64 (0.52, 0.80)
Associations with socio-demographic variables
Siblings                n  Processed vs Plant-based  Plant-based vs Traditional British  Traditional British vs Processed
0 younger           2,946  1                         1                                   1
1 younger           2,490  1.01 (0.86, 1.19)         0.58 (0.48, 0.71)                   1.69 (1.44, 1.99)
2+ younger            620  1.21 (0.92, 1.57)         0.43 (0.33, 0.58)                   1.90 (1.50, 2.40)
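The three columns of each table form a closed cycle (Processed vs Plant-based, Plant-based vs Traditional British, Traditional British vs Processed), which allows a quick arithmetic check. Suppose all three ratios come from one model with a shared reference category, such as a multinomial logistic regression (an assumption, since the slides do not name the model). Each cluster then appears once as a numerator and once as a denominator, so the three estimates in each row should multiply to roughly 1. The check below uses rows copied from the tables:

```python
# Estimates copied from the tables above, one row per category:
# (category, Processed vs Plant-based, Plant-based vs Traditional British,
#  Traditional British vs Processed)
rows = [
    ("Boys", 0.82, 1.03, 1.18),
    ("Maternal age 31+", 0.37, 1.50, 1.77),
    ("Maternal education: Degree", 0.30, 2.75, 1.22),
    ("2+ older siblings", 1.58, 0.99, 0.64),
    ("2+ younger siblings", 1.21, 0.43, 1.90),
]

for category, a, b, c in rows:
    # Around a closed cycle of comparisons the ratios should cancel to about 1;
    # small departures come from rounding each estimate to two decimals
    print(f"{category:28s} product = {a * b * c:.3f}")
```

A row whose product is far from 1, or an interval that does not contain its own estimate, would point to a transcription error.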
Summary
• Multivariate methods compress dietary data into dietary patterns
• k-means cluster analysis is widespread but must be applied carefully (multiple random starts, standardised inputs, a reliability check)
• 3 clusters in FFQ data (Processed, Plant-based and Traditional British)
• 4 clusters in diet diary data (the same three plus Packed Lunch)
References
• Northstone, AS et al. (2012) ‘Longitudinal comparisons of dietary patterns derived by cluster analysis in 7 to 13 year old children.’ British Journal of Nutrition (to appear).
• AS et al. (2011) ‘A comparison of dietary patterns derived by cluster and principal components analysis in a UK cohort of children.’ European Journal of Clinical Nutrition 65, pp. 1102-9.