VERONIQUE STORME
SAS and SAS/EG at the Plant Systems Biology Department
Outline
Using the mixed procedure to analyze data with empty cells
Random coefficient model with a fitted periodic function
Invoke SAS in batch mode in a Unix environment, passing parameters through environment variables
SAS Enterprise Guide: using prompts
SAS information
Using the mixed procedure to analyze data with empty cells
Problem setting
Leaf area data on 6 different plant varieties
Clustered data
Leaf series analysis: leaf area was measured on 11 leaves from a single plant
Some varieties have fewer than 11 leaves
The experiment was performed 3 times
[Figure: subject plots]
Frequency table of Line by Leaf (rows: Line, columns: Leaf, cells: number of observations)

Line       0    1    3    4    5    6    7    8    9   10   11  Total
AA1_H     30   30   30   30   30   30   30   28   12    1    0    251
AA1_WT    30   30   30   30   30   30   30   30   23    4    0    267
AA2_H     28   28   28   28   28   28   28   28   25    5    1    255
AA2_WT    28   28   28   28   28   28   28   28   25    4    0    253
AA3_H     24   24   24   24   24   24   24   23   17    4    0    212
AA3_WT    30   30   30   30   30   30   30   30   28   14    5    287
Total    170  170  170  170  170  170  170  167  130   32    6   1525
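A cross-tabulation like the one above is a quick way to spot the empty Line by Leaf cells (the zeros in the Leaf 11 column) before fitting the model. The following is a minimal sketch, assuming the data sit in a dataset called leafarea; that name is hypothetical and does not come from the slides.

```sas
/* Sketch only: "leafarea" is a hypothetical dataset name.
   Line and Leaf are the classification variables used in the model. */
proc freq data=leafarea;
    /* Show cell counts only: suppress row, column and overall percentages */
    tables Line*Leaf / norow nocol nopercent;
run;
```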
Proc mixed analyses
Analysis 1 (Kenward-Roger degrees of freedom, unstructured covariance):
    model Area = Line|Leaf / ddfm=KR;
    repeated Leaf / type=un subject=PlantID;
    random int / subject=experiment;
Analysis 2 (Kenward-Roger degrees of freedom, AR(1) covariance):
    model Area = Line|Leaf / ddfm=KR;
    repeated Leaf / type=ar(1) subject=PlantID;
    random int / subject=experiment;
Analysis 3 (default containment degrees of freedom, AR(1) covariance):
    model Area = Line|Leaf;
    repeated Leaf / type=ar(1) subject=PlantID;
    random int / subject=experiment;
Results: type 3 tests of fixed effects for Analysis 1 (KR, UN), Analysis 2 (KR, AR(1)) and Analysis 3 (containment, AR(1)) [output tables not reproduced]
The singular option
Option 1: truncate the data
Option 2: tune the sensitivity in sweeping with the singular= option of the model statement
    model Area = Line|Leaf / ddfm=KR singular=1E-7;
    repeated Leaf / type=ar(1) subject=PlantID;
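Put together as a complete step, Option 2 might look like the sketch below. The dataset name leafarea and the class statement are assumptions, and the random statement is carried over from Analysis 2 on the assumption that it was kept; the slide itself shows only the model and repeated statements.

```sas
/* Sketch only: dataset name and class statement are assumptions. */
proc mixed data=leafarea;
    class Line Leaf PlantID experiment;
    /* singular= tunes the sensitivity in sweeping (value taken from the slide) */
    model Area = Line|Leaf / ddfm=KR singular=1E-7;
    repeated Leaf / type=ar(1) subject=PlantID;
    /* Assumed to be retained from Analysis 2 */
    random int / subject=experiment;
run;
```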
Random coefficient model with a fitted periodic function
Problem setting
2 Arabidopsis varieties
4 treatment conditions
High-throughput phenotyping (IGIS)
Phenotype: compactness
Compactness describes whether the leaves lie close to the centroid or farther away from it, e.g. because they have longer stipes
Rhythmic leaf movements (circadian clock)
Research question: is there an effect of the variety and/or treatment on the amplitude?
[Figure: mean line plot]
The analysis model
Fit a model with a fundamental sine wave
Allow sinusoidal deviations for each plant
Assume these random coefficients come from the same normal distribution
Use a single variance component for all the trigonometric components
    model compactness =
        S C G T time S*G C*G S*T C*T G*time T*time
        S*G*T C*G*T G*T*time / ddfm=Satterthwaite;
    random plantID;
    random S*plantID C*plantID / type=toep(1);
where G = genotype, T = treatment,
    S = sin(2*constant('pi')*time/24)
    C = cos(2*constant('pi')*time/24)
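Because the model statement cannot evaluate functions, S and C have to exist as variables before proc mixed runs. The sketch below shows one way to set this up. The dataset names compact and compact_trig and the class statement are assumptions; the model and random statements are the ones given above, and the divisor 24 implies a period of 24 time units.

```sas
/* Sketch only: dataset names and the class statement are assumptions. */
data compact_trig;
    set compact;
    /* Fundamental sine and cosine terms with a period of 24 time units */
    S = sin(2*constant('pi')*time/24);
    C = cos(2*constant('pi')*time/24);
run;

proc mixed data=compact_trig;
    class G T plantID;
    model compactness =
        S C G T time S*G C*G S*T C*T G*time T*time
        S*G*T C*G*T G*T*time / ddfm=Satterthwaite;
    /* Random intercept per plant */
    random plantID;
    /* Plant-specific sine and cosine deviations sharing one variance component */
    random S*plantID C*plantID / type=toep(1);
run;
```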
The regression model
b0 + b1*S + b2*C + b3*G1 + b4*T1 + b5*T2 + b6*T3 + b7*time +
b8*S*G1 + b9*C*G1 +
b10*S*T1 + b11*S*T2 + b12*S*T3 + b13*C*T1 + b14*C*T2 + b15*C*T3 +
b16*time*G1 + b17*time*T1 + b18*time*T2 + b19*time*T3 +
b20*S*G1*T1 + b21*S*G1*T2 + b22*S*G1*T3 +
b23*C*G1*T1 + b24*C*G1*T2 + b25*C*G1*T3 +
b26*time*G1*T1 + b27*time*G1*T2 + b28*time*G1*T3
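Mean predicted amplitude
Reference genotype and reference treatment:
    Mean model: b0 + b1*S + b2*C + b7*time
    Amplitude: $\sqrt{b_1^2 + b_2^2}$
Genotype GT1 and reference treatment:
    Amplitude: $\sqrt{(b_1 + b_8)^2 + (b_2 + b_9)^2}$
Reference genotype and treatment T1:
    Amplitude: $\sqrt{(b_1 + b_{10})^2 + (b_2 + b_{13})^2}$
Genotype GT1 and treatment T1:
    Amplitude: $\sqrt{(b_1 + b_8 + b_{10} + b_{20})^2 + (b_2 + b_9 + b_{13} + b_{23})^2}$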
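The square-root form of the amplitude follows from the standard identity for a sum of a sine and a cosine with the same frequency. In the sketch below, $a$ and $c$ stand for the total coefficients of S and C in a given genotype and treatment combination (for example $a = b_1 + b_8$ and $c = b_2 + b_9$ for GT1 under the reference treatment); the symbols $a$, $c$, $\omega$, $A$ and $\varphi$ are introduced here for illustration and do not appear on the slides.

```latex
\begin{align*}
a\,S + c\,C &= a\sin(\omega t) + c\cos(\omega t), \qquad \omega = \frac{2\pi}{24} \\
            &= A\sin(\omega t + \varphi), \\
\text{with}\quad A &= \sqrt{a^2 + c^2}, \qquad A\cos\varphi = a, \qquad A\sin\varphi = c .
\end{align*}
```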
Parametric bootstrap (i.e. resampling residuals)
Fit the model
Draw a bootstrap sample from the residuals
Add the randomly resampled residuals e to the fitted values Y-hat
Refit the model for each of the B replicates
Compute the bootstrap estimates
Difficulty: unbalanced clustered data
Standard errors?
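The steps above can be written compactly as follows. This is only a sketch of the usual residual-bootstrap recipe: $\hat\theta$ denotes whichever quantity is of interest (for example a predicted amplitude), and the standard-error formula is the conventional bootstrap standard deviation, which the slides do not state explicitly. It treats the residuals as exchangeable, which is exactly what the unbalanced, clustered design calls into question.

```latex
\begin{align*}
y^{*}_{b} &= \hat{y} + e^{*}_{b}, \qquad b = 1, \dots, B
  && \text{(resampled residuals added to the fitted values)} \\
\hat\theta^{*}_{b} &= \hat\theta\!\left(y^{*}_{b}\right)
  && \text{(model refitted to each bootstrap sample)} \\
\widehat{\mathrm{SE}}_{\mathrm{boot}}(\hat\theta) &=
  \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat\theta^{*}_{b} - \bar\theta^{*}\right)^{2}},
  \qquad \bar\theta^{*} = \frac{1}{B}\sum_{b=1}^{B}\hat\theta^{*}_{b}
\end{align*}
```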
Invoke SAS in batch mode in a Unix environment, passing parameters through environment variables
Main code
%let path=%sysget(fullpath);   * returns the value of the environment variable as a string;
%let libname=%sysget(wkd);
libname &libname "&path";
%include "&path.m_NameConversions.sas" /source2;
%include "&path.m_selectSNPs.sas" /source2;
%include "&path.m_selectPheno.sas" /source2;
%NameConversions(libname=wkd,traitfile=file);
%selectSNPs(libname=wkd);
%selectPheno(libname=wkd);
Invoke SAS in batch mode (Red Hat Enterprise Linux 6)
[vesto@midas TEST]$ sas \
    -set fullpath "/group/biostat/myGWASprojects/SNP_ARAB/GCEP/GALAXY/TEST/" \
    -set wkd "stem" \
    -set file "stems3.txt" \
    -sysin workflow_unix.sas
-set: defines an environment variable
-sysin: specifies the external file that contains the SAS program to run
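Since the program depends entirely on the variables passed with -set, it can help to stop early when one of them is missing. The macro below is a hypothetical helper, not part of the original workflow; it is a sketch that relies on the SYSEXIST function, which reports whether an operating-environment variable exists.

```sas
/* Hypothetical helper: stop the batch job if an environment variable is missing. */
%macro check_env(name);
    %if %sysfunc(sysexist(&name)) = 0 %then %do;
        %put ERROR: environment variable &name is not set.;
        %abort cancel;
    %end;
%mend check_env;

/* Check the three variables set on the command line */
%check_env(fullpath)
%check_env(wkd)
%check_env(file)
```

Calling the checks at the top of workflow_unix.sas, before the %sysget calls, would be one place to put them.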
SAS Enterprise Guide: using prompts
A way to automate your project
Prompts pass parameters to macro variables
Example:
Routine two-way analysis of variance where the whole experiment was repeated 3 times independently
Performing simple tests of effects with the plm procedure
SAS Enterprise Guide prompts: steps
Open the SAS program
Create prompts
Assign the prompts to the program
libname dir "&path";
ods graphics on;
proc mixed data =dir.&inputdata scoring = 3;
class &F1 &F2 █
model &Y=&F1 &F2 &F1.*&F2 /ddfm= satterthwaite solution vciry
outp=out singular=1E-7;
random █
lsmeans &F1.*&F2 ;
repeated /group = &F2 ;
store work.result;
run;
ods graphics off;
proc plm restore = work.result;
slice &F1.*&F2/sliceby=&F2 diff adjust=&method;
effectplot;
lsmeans &F1.*&F2 ;
run;
Prompt name (this is the name of the macro variable): path
Prompt text: Give the path where the input data is located
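Because each prompt simply fills a macro variable of the same name, the program can also be tried outside Enterprise Guide by assigning those variables by hand. The sketch below does this for the macro variables referenced in the code above; every value is a hypothetical placeholder, and the variables hidden behind the masked parts of the class and random statements are not covered.

```sas
/* Hypothetical values: none of these appear in the original slides. */
%let path      = /path/to/project/data;   /* folder with the input data */
%let inputdata = mydata;                  /* dataset name inside that folder */
%let Y         = response;                /* response variable */
%let F1        = factorA;                 /* first fixed factor */
%let F2        = factorB;                 /* second fixed factor */
%let method    = tukey;                   /* multiplicity adjustment for the slice differences */
```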
SAS provides a wealth of information
ACKNOWLEDGEMENTS
Koen Knapen
SAS Technical Support: Aditya, Bart
Other convergence problems
Default algorithm to optimize the likelihood function: ridge-stabilized Newton-Raphson algorithm
Possible problems:
Covariance parameters are on different scales: rescale the effects
Poor MIVQUE(0) starting values: use the Fisher scoring algorithm in the first 3 iterations
    proc mixed data=input scoring=3;