
Users Man - UGent · 2008. 1. 2. · TIMECO V Statemen t CONVDT Statemen t CLASS Statemen t JOINCODE Statemen t TIMEDEP Statemen t. CONTENTS COMBINE Statemen t OUTPUT Statemen t FUTURE


The Survival Kit V����

User's Manual

Vincent Ducrocq and Johann Sölkner

July ��� ����


Contents

Preface
  Class of models supported
  Main features
  The programs
  Hardware requirements
  Last Changes
  Disclaimer
  Availability

Data Preparation Program (PREPARE)
  Syntax
    FILES Statement
    INPUT Statement
    TIME Statement
    CENSCODE Statement
    DISCRETE SCALE Statement
    IDREC Statement
    TRUNCATE Statement
    TIMECOV Statement
    CONVDT Statement
    CLASS Statement
    JOINCODE Statement
    TIMEDEP Statement
    COMBINE Statement
    OUTPUT Statement
    FUTURE Statement
    PEDIGREE Statement
    FORMOUT Statement
  Sorting of the recoded file
    COX Program
    WEIBULL Program
    Sorting of "future records"

Programs COX and WEIBULL
  Parameter file 1
    Syntax
    TIME Statement
    CODE Statement
    ID Statement and PREVIOUS TIME Statement
    COVARIATE Statement
    DISCRETE SCALE Statement
    JOINCODE Statement
    Format Statement
  Parameter file 2
    Syntax
    FILES Statement
    TITLE Statement
    STRATA Statement
    RHO FIXED Statement
    MODEL Statement
    COEFFICIENT Statement
    [ONLY] STATISTICS Statement
    RANDOM Statement
    INTEGRATE OUT Statement
    INCLUDE ONLY Statement
    TEST Statement
    STD ERROR and DENSE HESSIAN Statements
    BASELINE Statement
    KAPLAN Statement
    SURVIVAL Statement
    RESIDUALS Statement
    CONSTRAINTS Statement
    CONVERGENCE CRITERION Statement
    STORAGE Statement
    STORE SOLUTIONS and READ OLD SOLUTIONS Statement
    STORE BINARY Statement

The parinclu file

A small example

Analysis of very large data sets
  PREPARE program
  Before running the COX or WEIBULL programs
  COX and WEIBULL programs

Preface

The Survival Kit is primarily intended to fill a gap in the software available to animal breeders, who generally tend to use extremely large data sets and want to estimate random effects. Methods of survival analysis have primarily been developed in the area of clinical biometrics, where data sets and the number of levels of effects are usually smaller. Theory about random effects in survival analysis (frailty models) is less well developed than for linear models, and there are no programs publicly available which deal with such models.

Although developed by animal breeders for animal breeders, the programs of the Survival Kit should therefore be interesting for people from other areas encountering similar problems of large models and random effects. To make the Survival Kit user friendly, commands used in the parameter files mimic the SAS command language.

Class of models supported

The models supported by the Survival Kit belong to the following class of univariate proportional hazards models with a single continuous or discrete response time:

$$\lambda(t; x(t), z(t)) = \lambda_{0,j}(t)\,\exp\{x(t)'b + z(t)'u\} \qquad (1)$$

where $\lambda(t; x(t), z(t))$ is the hazard function of an individual depending on time $t$, $\lambda_{0,j}(t)$ is the baseline hazard function, $x(t)$ is a vector of (possibly time-dependent) fixed covariates with corresponding parameter vector $b$, and $z(t)$ is a vector of random (possibly time-dependent) covariates with corresponding parameter vector $u$.

For a detailed statistical presentation of the methodology of survival analysis, see (Cox, ����; Prentice and Gloeckler, ����; Cox and Oakes, ����; Kalbfleisch and Prentice, ����; Klein and Moeschberger, ����).

Main features

Baseline hazard function:

The (possibly stratified) baseline hazard function $\lambda_{0,j}(t)$ may either be unspecified or follow a Weibull hazard distribution, $\lambda_{0,j}(t) = \lambda_j \rho_j (\lambda_j t)^{\rho_j - 1}$. In the first case, if the failure time variable $t$ is continuous, (1) defines a Cox model. Estimates of $b$ and $u$ are obtained using what is known as a partial likelihood, a part of the full likelihood in which the baseline hazard function does not appear. When the failure time variable is discrete with few categories and many observations with the same failure time (ties), Cox's partial likelihood approach is no longer valid. Because the baseline hazard function can then be described with few parameters, these can be estimated together with $b$ and $u$ using an approach due to Prentice and Gloeckler (����).

The second case corresponds to a Weibull model, a common type of regression model that has been shown to be flexible and often adequate for biological data.
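As a rough numerical illustration of the Weibull case, the sketch below evaluates the hazard $\lambda\rho(\lambda t)^{\rho-1}\exp\{x'b + z'u\}$ for one individual in a single stratum. The function names and all parameter values are assumptions chosen for illustration; they are not taken from the Survival Kit code.

```python
import math

def weibull_baseline(t, lam, rho):
    """Weibull baseline hazard: lam * rho * (lam * t) ** (rho - 1)."""
    return lam * rho * (lam * t) ** (rho - 1.0)

def hazard(t, lam, rho, x, b, z=(), u=()):
    """Proportional hazards: baseline(t) * exp(x'b + z'u)."""
    eta = sum(xi * bi for xi, bi in zip(x, b))
    eta += sum(zi * ui for zi, ui in zip(z, u))
    return weibull_baseline(t, lam, rho) * math.exp(eta)

# Hypothetical values, used only to show the proportionality property.
h_ref = hazard(10.0, lam=0.05, rho=2.0, x=[0.0], b=[0.7])
h_exp = hazard(10.0, lam=0.05, rho=2.0, x=[1.0], b=[0.7])
ratio = h_exp / h_ref  # the same at every t: exp(0.7)
```

The ratio of the two hazards does not depend on t, which is the proportional hazards assumption in its simplest form.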

Fixed covariates:

Any number of fixed covariates is supported. They may either be discrete (class) variables or continuous. There is no explicit limit to the number of levels of a discrete covariate (the limit will usually be a function of the computer memory available). An intercept (grand mean) is always implicitly included in the Weibull model (when there is only one stratum and no covariate specified, this intercept is equal to $\rho \log \lambda$, where $\rho$ and $\lambda$ are the two Weibull parameters). Covariates may be time dependent ($x(t)$), where the dependency is modelled through (piecewise) constant hazard functions with jumps at times corresponding to calendar dates (e.g., January 1st) or linked to the individual itself (e.g., starting or stopping of smoking when the effect of smoking on survival is investigated).
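The value of the intercept can be checked with a short derivation, assuming the Weibull baseline is parameterized as $\lambda_0(t) = \lambda\rho(\lambda t)^{\rho-1}$ with a single stratum:

```latex
\lambda_0(t) = \lambda \rho (\lambda t)^{\rho - 1}
             = \rho \, t^{\rho - 1} \lambda^{\rho}
             = \rho \, t^{\rho - 1} \exp\{\rho \log \lambda\}
```

The term $\rho \log \lambda$ therefore enters the exponent in the same way as a regression coefficient attached to a constant covariate, which is why it plays the role of a grand mean.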

Random covariates:

The random (discrete) covariates in vector $z(t)$ may be defined to follow a log-gamma or a normal distribution. They may also follow a multivariate normal distribution where the covariance structure between individuals is modelled by the matrix of genetic relationships (a typical approach to describe the additive genetic values of individuals in animal breeding). The log-gamma was chosen because $\exp(u)$ is then gamma distributed, a usual assumption for frailty models. The two parameters of the gamma distribution are taken to be equal so that the expectation $E[\exp(u)] = 1$. With the normal distribution, $E(u) = 0$. The distribution parameters (gamma parameter for the log-gamma distribution and variance for the normal or multivariate normal distribution) may be either prespecified or estimated alongside the effects in the model (Ducrocq and Casella, ����). Several random effects may be specified in the same model, but only one may involve a relationship matrix. Random effects may also be time dependent. Although the expression frailty term is used to generally describe random effects in survival analysis, it was originally introduced to account for individual heterogeneity of observations, as is done by the error term in the linear model. Such a term may also follow one of the distributions mentioned above and is technically treated in the same way as the other random effects.
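The effect of taking the two gamma parameters equal can be checked by simulation. The sketch below is an illustration only: the function name, the value 4.0 for the gamma parameter and the sample size are assumptions, and nothing here reproduces the estimation done by the programs.

```python
import math
import random

def draw_log_gamma_effects(gamma_param, n, seed=0):
    """Draw n effects u such that exp(u) ~ Gamma(shape=gamma_param, rate=gamma_param)."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); scale = 1 / rate.
    return [math.log(rng.gammavariate(gamma_param, 1.0 / gamma_param))
            for _ in range(n)]

u = draw_log_gamma_effects(gamma_param=4.0, n=20000)
frailties = [math.exp(ui) for ui in u]
mean_frailty = sum(frailties) / len(frailties)
var_frailty = sum((f - mean_frailty) ** 2 for f in frailties) / len(frailties)
```

With equal shape and rate, the mean of exp(u) is 1 and its variance is the reciprocal of the gamma parameter (here 1/4), so a larger gamma parameter means less heterogeneity.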

Strata:

Stratification may be used to separate groups of individuals with different baseline hazard functions $\lambda_{0,j}(t)$, where $j$ serves as group indicator. Together with time-dependent covariates, this is another means of relaxing the required assumption of proportional hazards for all individuals over the total observation period. Only one variable may be chosen as strata variable. The number of strata is not restricted.

In addition to estimation of fixed and random effects (and their distribution parameters), the Survival Kit offers options for calculating asymptotic standard errors of effects (only for smaller models, where the matrix of second derivatives may actually be set up), a sequence of likelihood ratio tests, and different ways of setting constraints to deal with dependencies in the model. As a special feature, different values of the survivor function may be estimated for individuals with preset covariate structure. This way, it is possible to calculate the estimated median survival time (for example) or the survival probability to a specified age for combinations of covariate values of special interest. Generalized, martingale and deviance residuals (Cox and Snell, ����; Klein and Moeschberger, ����) can also be computed.
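For the simplest Weibull case (one stratum, no time-dependent covariates), the median survival time has a closed form, because the survivor function is $S(t) = \exp\{-(\lambda t)^{\rho} e^{\eta}\}$ with $\eta = x'b + z'u$. The sketch below works under these assumptions; the function names and numerical values are hypothetical and do not describe how the programs themselves compute these quantities.

```python
import math

def survivor(t, lam, rho, eta=0.0):
    """S(t) = exp(-(lam * t) ** rho * exp(eta)) for a Weibull proportional hazards model."""
    return math.exp(-((lam * t) ** rho) * math.exp(eta))

def median_survival(lam, rho, eta=0.0):
    """Time t at which S(t) = 0.5, obtained by solving (lam * t) ** rho * exp(eta) = log 2."""
    return (math.log(2.0) * math.exp(-eta)) ** (1.0 / rho) / lam

# Hypothetical parameter values for two covariate combinations.
t_med_ref = median_survival(lam=0.05, rho=2.0, eta=0.0)
t_med_risk = median_survival(lam=0.05, rho=2.0, eta=0.7)
p_at_10 = survivor(10.0, lam=0.05, rho=2.0, eta=0.7)  # survival probability to t = 10
```

A larger value of eta (higher risk) gives a shorter median survival time.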

The programs

The Survival Kit mainly consists of a set of three Fortran programs, called prepare.f, cox.f and weibull.f (denoted as PREPARE, COX and WEIBULL subsequently), and a file parinclu holding parameter definitions that is included in each of the programs (via a Fortran include statement). The package works stand-alone, i.e., it does not rely on any subroutines from mathematical subroutine libraries. The optimisation routines used are partly taken from public domain subroutine libraries (Liu and Nocedal, ����; Perez-Enciso et al., ����) and are integrated in the programs. PREPARE is used to prepare the data for the actual analysis done with either COX or WEIBULL. Data preparation includes recoding of class variables and, in the presence of time-dependent covariates, splitting up individual records into so-called elementary records, with each elementary record covering only the time span from one change in any time-dependent covariate to the next. The recoded file may therefore have many more records than the original one.
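The splitting into elementary records can be pictured with a small sketch. The record layout used here (a dictionary with start, stop, failure flag and current covariate values) and the covariate names are assumptions made for illustration; the actual layout of the recoded file is defined by PREPARE.

```python
def split_into_elementary_records(end_time, failed, initial, changes):
    """Cut one observation (0, end_time] at the change points of its
    time-dependent covariates.

    initial: dict of covariate values at time 0.
    changes: list of (time, covariate_name, new_value) tuples; only changes
             strictly inside (0, end_time) create a cut.
    """
    records = []
    state = dict(initial)
    start = 0
    cut_points = sorted({t for t, _, _ in changes if 0 < t < end_time})
    for cut in cut_points:
        # Every piece except the last one ends without a failure.
        records.append({"start": start, "stop": cut, "failed": False, **state})
        for t, name, value in changes:
            if t == cut:
                state[name] = value
        start = cut
    records.append({"start": start, "stop": end_time, "failed": failed, **state})
    return records

recs = split_into_elementary_records(
    end_time=100,
    failed=True,
    initial={"lactation": 1, "herd_size": 1},
    changes=[(30, "lactation", 2), (60, "herd_size", 2), (60, "lactation", 3)],
)
```

One original record becomes three elementary records here, (0, 30], (30, 60] and (60, 100], and only the last one carries the failure.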

The estimation of effects under the proportional hazards model described above is then performed by COX or WEIBULL, depending on whether the baseline hazard function is assumed to be unspecified, as in the Cox model, or to follow the two-parameter Weibull hazard distribution. Specifications for both models are similar, but it is computationally easier and less time consuming to estimate the parameters of distributions of the random effects under the Weibull model.

For extremely large applications, e.g., national genetic evaluations, the number of elementary records may become huge (sometimes � ��� millions) when time-dependent covariates are used in the model. In the "Survival Kit" version �� and above, modified versions of programs PREPARE and WEIBULL (preparec.f and weibullc.f, denoted hereafter as PREPAREC and WEIBULLC) were written using public domain C subroutines for compressing and decompressing data during I/O operations (zlib general purpose compression library, version ����, J. Gailly and M. Adler, web page: http://quest.jpl.nasa.gov/zlib/). Our experience is that the resulting programs are about � times slower, but compression is extremely efficient since the compressed files may take up to �� or �� times less disk space. A similar version for the Cox model or for the grouped data model of Prentice and Gloeckler (����) was not implemented, as they are not really suited for huge applications.
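The trade-off between disk space and speed can be tried out on synthetic data with Python's standard zlib module, which wraps the same compression library. This is only a sketch: the record layout below is invented, and the ratio obtained says nothing about the ratios observed with PREPAREC and WEIBULLC.

```python
import zlib

# Invented fixed-width records, standing in for a file of elementary records.
records = "".join(
    f"{i:8d} {i % 7:3d} {i % 2:1d}\n" for i in range(10000)
).encode("ascii")

compressed = zlib.compress(records, 9)   # 9 = best compression, slowest
restored = zlib.decompress(compressed)   # lossless round trip
ratio = len(records) / len(compressed)
```

Files with many repeated values, as produced when records are cut into elementary records, tend to compress well.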

Hardware requirements

The programs have been written in Fortran and have been tested on PC (using Lahey's Fortran compiler) and on several UNIX platforms. No system routines are used, except a timing subroutine (second(), for UNIX platforms), which can be replaced or switched off without any consequence. The size of the program may be varied through changes in parameters affecting the maximum number of records and the maximum number of levels of effects to be estimated. These parameters are defined in a single file called parinclu and included in each program using a Fortran INCLUDE statement. To make the changes effective (and only then), it is necessary to recompile the programs. For PC, compilers making use of extended memory are favorable.

To be used on most PCs, it is important to change the extension ".f" into ".for" for all programs (e.g., prepare.for, cox.for, weibull.for). At least with Lahey's Fortran compiler, one should also rename the parinclu file to parinclu.for and make all appropriate changes to the include statements (include 'parinclu.for') of all subroutines of all programs. It is also often preferable to use Fixed Formats instead of Free Formats.

Programs PREPAREC and WEIBULLC require the compiled C subroutines of zlib. These are supplied with the "Survival Kit".

Last Changes

• In version ��� the programs PREPAREC and WEIBULLC were made available for very large applications. A few bugs were corrected and some features (like the JOINCODE, COEFFICIENT, RESIDUALS, INTEGRATE OUT ... WITHIN ... statements, choice of input/output formats, inclusion of groups of unknown parents, use of left-truncated records with COX) were added.

• In version ��� the main addition was the possibility to properly analyze discrete failure time data. Discrete failure times with very few distinct observed values occur, for example, when survival time is expressed in years or parities. Then, it becomes more difficult to find proper parametric proportional hazards models, and the semi-parametric approach of Cox (Cox model) is no longer suitable (an exact calculation of the partial likelihood is not feasible in general in the presence of many "ties"). The approach of Prentice and Gloeckler (����) for grouped data is much more satisfying: a full likelihood is written, involving a limited number of parameters describing the baseline hazard function. Using a reparameterization of Prentice and Gloeckler's model, it is possible to transform the grouped data model into a form close to an exponential regression model including a particular time-dependent covariate ("time unit"), with changes at every time point. The PREPARE and WEIBULL programs have been modified to accommodate such a model (note, however, that the resulting model is not a Weibull model). The only change that is required from the user is the specification that the time scale is discrete (keyword DISCRETE SCALE) in the first parameter file. The use of D6 (ddmmyy) formats with years larger than or equal to year ���� is now possible (option NBUG ���� in the parinclu file).

• In version ��� (April ����), another set of small bugs was corrected. The statement SIRE DAM MODEL was made obsolete (JOINCODE does the same thing and should be preferred). It is now possible to use the statements STRATA and INTEGRATE OUT ... WITHIN ... at the same time.

Disclaimer

The "Survival Kit" can be freely used for non-commercial purposes provided its use is being credited (Ducrocq and Sölkner, ����; Ducrocq and Sölkner, ����) and can be freely distributed. Use it at your own risk. There is no technical support, but questions can be directed to:

Dr Vincent Ducrocq
Station de Génétique Quantitative et Appliquée
Institut National de la Recherche Agronomique
���� Jouy-en-Josas, France
Phone: � �� � �� �� �� ��  Fax: � �� � �� �� �� ��
email: Vincent.Ducrocq@dga.jouy.inra.fr

or:

Dr Johann Sölkner
Universität für Bodenkultur
Gregor Mendel Strasse ������ Vienna, Austria
Tel: � �� � ���� ���  Fax: ��� � ������
email: soelkner@mail.boku.ac.at

Availability

The programs and this manual can be retrieved on the Web in compressed form at:

http://www.boku.ac.at/nuwi/popgen or at http://www-sgqa.jouy.inra.fr/diffusions.htm


Chapter �

Data Preparation Program (PREPARE)

When called, PREPARE will ask for the name of a parameter file, or a default name can be prespecified in the parinclu file. This parameter file provides the program with information about the files and variables to be used. In particular, the dependent and censoring variables are defined, and class variables and variables that may be combined into new variables are stated. Information relating to the time scale may either be given in absolute values relative to the start of the individual observation or via dates. In the latter case, the date of the start of the observation also has to be given, and the program will transform the information given through dates into information about time relative to the starting point of the observation. Levels of variables defined in the CLASS statement will be recoded from 1 to the number of levels for use by the analysis programs COX and WEIBULL. When time-dependent variables are defined (or when it is specified that the time scale is discrete with few categories), the program will "cut" the observation into pieces (called elementary records), each ranging from one change in any of the time-dependent covariates to the next.

Syntax

The following statements may be used with PREPARE. Statements written in capitals are obligatory. Names between [ ] are omitted when they are not needed. The sequence of statements should be as shown. Comments may be included in the parameter file. The start of a comment is ��, the end is ��, i.e., �� This text is a comment ��. The text between the two delimiters may be on more than one line. No more than �� characters should appear on one line, but a single statement can cover several lines; each statement ends with a semicolon (;).
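The rules above (statements ending with a semicolon, statements spanning several lines, comments between two delimiters) can be sketched as a small reader. This is not the parser used by PREPARE. The comment delimiters did not survive in this copy of the manual, so the sketch assumes SAS-style `/*` and `*/` purely for illustration and takes them as arguments; the file names in the example are hypothetical, and `type` stands for a variable type as in the syntax summary.

```python
import re

def split_statements(text, comment_start="/*", comment_end="*/"):
    """Return the statements of a parameter file as lists of tokens.

    The default comment delimiters are an assumption, not taken from the manual.
    """
    pattern = re.escape(comment_start) + r".*?" + re.escape(comment_end)
    without_comments = re.sub(pattern, " ", text, flags=re.DOTALL)
    statements = []
    for chunk in without_comments.split(";"):
        tokens = chunk.split()
        if tokens:  # skip the empty remainder after the last semicolon
            statements.append(tokens)
    return statements

example = """
FILES data.txt recoded.txt codes.txt par1.txt;
/* a comment that
   spans two lines */
INPUT id type length type
      code type;
"""
parsed = split_statements(example)
```

Here `parsed` holds two statements, FILES and INPUT, the second one collected from two lines.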


FILES file1 file2 file3 file4;
INPUT variable type variable type ... variable type;
TIME variable [variable [variable]];
CENSCODE variable value;
discrete scale;
idrec variable;
truncate variable;
timecov variable type values;
convdt variable=variable-variable variable=variable-variable ...;
class variables;
joincode variable variable;
timedep variable typedi variable typedi ...;
combine variable=variable�variable variable=variable�variable�variable ...;
OUTPUT variables;
future file� file�;
pedigree file� file� variable [variable];
formout type;

FILES Statement

FILES file1 file2 file3 file4;

The FILES statement is used to define the names of the files needed in the process of data preparation for the actual survival analysis.

• file1 is the name of the original data file (it must exist in your directory).

• file2 is the name of the recoded file produced by the program, to be used by the subsequent programs COX or WEIBULL. This file may be much larger than the original data file when time-dependent covariates are used in the analysis, as so-called elementary records are written spanning the times between any changes in time-dependent covariates.

• file3 is the name of a file containing the original and new codes (after internal recoding) for each variable in the CLASS statement. The file name is required even when no CLASS statement is defined. In this case it will be an empty file.

• file4 is the name of a file containing information about the variables in the data set, to be used by the subsequent COX or WEIBULL programs. (It is one of two parameter files required by those programs.)


Note for UNIX users: If the UPPER CASE flag in file parinclu (see later for a detailed description of the parameters in parinclu) is set to TRUE, all lowercase letters in the parameter file are transformed into upper case internally (to avoid trouble with variable names typed in different ways by the user). This implies that on systems where file names are case sensitive (mainly UNIX), the file name of file1 has to be upper case to be recognised by the program. Also, the names of the files produced by PREPARE (file2 to file4) will be upper case even if stated in lower case in the FILES statement.

If the UPPER CASE flag is set to FALSE, all names (other than keywords, which can be either upper or lower case) used in parameter files are case sensitive (they are not transformed).

INPUT Statement

INPUT variable type variable type ... variable type;

In the INPUT statement, variable names and their respective types are defined. The input file (file1) is read in free format, implying that variables have to be separated by blanks and that all variables in the data set (whether needed or not in the current analysis) have to be specified. Variable names may be up to � characters long (otherwise the name is truncated) and may be composed of any combination of characters, numbers and symbols, except for the following symbols: � �� � ��. The following variable types are allowed:

I�  integer number
R�  real number
D6  date with 6 digits (ddmmyy)
D8  date with 8 digits (ddmmyyyy)

Note that the "Y ���� bug" is avoided for D6 dates by adding ��� to the value of "yy" when yy is less than NBUG ���� (set in the parinclu file).
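The handling of two-digit years can be sketched as follows. The constants are garbled in this copy of the manual, so both the offset (assumed here to be 100) and the threshold (the hypothetical argument `pivot`, standing in for the NBUG parameter of parinclu) are assumptions; the function name is invented.

```python
def expand_d6(ddmmyy, pivot, offset=100):
    """Split a six-digit ddmmyy date into (day, month, year value).

    Two-digit years below `pivot` are shifted by `offset` so that they sort
    after the years at or above the pivot. Both values are assumptions.
    """
    day, rest = divmod(int(ddmmyy), 10000)
    month, yy = divmod(rest, 100)
    if yy < pivot:
        yy += offset
    return day, month, yy

before = expand_d6(150399, pivot=50)   # 15 March, yy = 99
after = expand_d6("010205", pivot=50)  # 1 February, yy = 05
```

With these assumed values, `before` is (15, 3, 99) and `after` is (1, 2, 105), so the later date compares as later once the year is adjusted.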

TIME Statement

TIME variable [variable [variable]];

In the TIME statement, the dependent variable is defined. This is usually time, but may be other things like lifetime production, etc. Time values of � should be avoided. The TIME statement may have three different forms:

• TIME variable;

The variable must have been specified in the INPUT statement and contains the time till death or censoring of an individual (it must be of type I�).

• TIME variable1 variable2;

variable1 is the time variable as above; variable2 is a date (D6 or D8) indicating the beginning of the observation (it must also be contained in the list of input variables). This is necessary for models with time-dependent covariates when changes in risk are given by dates and not by times in the TIMECOV or TIMEDEP statements.

• TIME variable1 variable2 variable3;

variable1 is the name of the time variable. It may or may not be contained in the list of variables in the INPUT statement. variable2 and variable3 are dates (D6 or D8) indicating the beginning and end of the observation. They must have been defined in the INPUT statement. If variable1 is not found in the INPUT statement, its value will be calculated as the difference between end and beginning of the observation.

Note: Only one dependent variable may be specified. If different dependent variables are of interest, separate analyses have to be run, all of them starting with the data preparation step.
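For the third form, the dependent variable is obtained as the difference between two dates. A minimal sketch for eight-digit ddmmyyyy dates follows; the function names are invented, and expressing the difference in days is an assumption, since the time unit is not stated in this section.

```python
from datetime import date

def parse_d8(ddmmyyyy):
    """Convert a ddmmyyyy integer or string into a datetime.date."""
    s = f"{int(ddmmyyyy):08d}"
    return date(int(s[4:]), int(s[2:4]), int(s[:2]))

def observation_time(begin_d8, end_d8):
    """Length of the observation, end minus beginning, in days (unit assumed)."""
    return (parse_d8(end_d8) - parse_d8(begin_d8)).days

t = observation_time("01012001", "01012002")  # one non-leap year
```

Leading zeros are preserved by formatting to eight digits, so a date read as the integer 1012001 is still interpreted as 01-01-2001.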

CENSCODE Statement

CENSCODE variable value;

In the CENSCODE statement, the censoring variable is defined. This is an indicator variable. The variable name must be identical to one of the names specified in the INPUT statement and must contain the censoring code (it must be of type I�). value defines a number indicating (right) censored records. All other values of the variable indicate death (failure).
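The rule "one value marks censoring, everything else is a failure" can be written in one line. The codes below are invented for illustration.

```python
def is_failure(code, censoring_value):
    """A record is a failure unless its code equals the declared censoring value."""
    return code != censoring_value

codes = [0, 1, 1, 0, 2]  # hypothetical data, with 0 declared as the censoring value
failures = [is_failure(c, censoring_value=0) for c in codes]
```

Note that the code 2 is also counted as a failure here, because only the single value given in the statement denotes censoring.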

DISCRETE SCALE Statement

DISCRETE SCALE;

The DISCRETE SCALE statement specifies that the failure time variable (variable1 in the TIME statement) is expressed in discrete units with few (say, � ��) distinct values. This is in preparation for a later analysis using a grouped data model (Prentice and Gloeckler, ����) with the WEIBULL program (although it is not a Weibull model).

The immediate consequence of the DISCRETE SCALE statement is the implicit definition and calculation of a time-dependent covariate called time unit, taking distinct values at time �� �� �� �� � � � � failure time��. An elementary record is created for each value of this time unit variable.

IDREC Statement

The statement can have two mutually exclusive alternative forms:

IDREC variable;
IDREC STORE PREVIOUS TIME;

• In the IDREC statement, normally a variable is defined that is unique to each record. This variable may be needed for sorting the elementary records when using a Weibull model and is therefore obligatory when using WEIBULL as analysis program. The variable must be of type I�.

Note: The same variable may also appear in the CLASS statement (e.g., animal identification for an animal model). In this case, the variable will appear twice in the recoded file that is a result of this program (first, unrecoded as the 3rd variable on the record, and second, recoded as one of the class variables).

• The alternate formulation (IDREC STORE PREVIOUS TIME) is needed in the special case when, later on in the WEIBULL program, a (log-gamma) random time-dependent covariate is integrated out (and only then). The beginning of each interval is then stored at the place where the identity of the record is normally stored.

Note: if the "INTEGRATE OUT variable1 WITHIN variable2" statement is used in the WEIBULL parameter file, the basic form (IDREC variable) is maintained (no need for "STORE PREVIOUS TIME").

TRUNCATE Statement

In the TRUNCATE statement, information is provided for handling left-truncated records. For these records, the origin point (e.g., date of birth) is known but occurs before the beginning of the study period, so that the covariate status between the origin point and the beginning of the study period is unknown. It has the following form:

TRUNCATE variable;

variable holds the time of the truncation point. It must be of type I�, D6 or D8. If of type I�, it specifies the time between the beginning of the observation and the point in time when the individual actually entered the study (truncation point). If of type D6 or D8, it holds the truncation date, and time will be calculated as the difference between this date and the date specified in the first argument of the TIME statement (types must be identical). If the variable is coded � in the data set, the record is treated as untruncated (independently of data type).

Left-truncated records are now (in version �� and above) handled by both the WEIBULL and the COX programs.

TIMECOV Statement

TIMECOV variable type values;

The TIMECOV statement is needed when the hazard changes with time (this may be calendar time or time as measured from the beginning of each observation), independently of special (stochastic) events.

• variable defines the name of the timecov variable. This must be a new name not specified previously (in the INPUT or TIME statements).

• type may be either integer (I�) or a date (D6 or D8).

• values indicates a list of values (with a maximum number specified through the parameter MAXFIG in the parinclu file) defining points in time when the hazard changes.

If type is integer, values indicate changes in the hazard rate relative to the beginning of each observation. If type is a date, it determines points in calendar time when the risk changes for all individuals. The change on the time axis of the individual is then calculated as the difference between the respective date given and the date indicating the starting point of the observation in the TIME statement. Therefore, when using a TIMECOV statement with type as a date, the TIME statement must hold the date of the beginning of the observation as second argument and the date of the end of the observation as third argument. The variable types of those arguments must be identical.

Note: Only one TIMECOV statement can be handled by the program (in version �� and above).
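When the change points are calendar dates, each individual gets its own change times, measured from its own starting date. The sketch below illustrates that conversion; the function name is invented, days are assumed as the time unit, and dropping the dates that fall outside the observation period is an assumption about behaviour that this section does not spell out.

```python
from datetime import date

def change_points_on_individual_axis(begin, end, calendar_changes):
    """Days from `begin` to each calendar change date that falls strictly
    inside the observation period (begin, end)."""
    return [(d - begin).days for d in sorted(calendar_changes) if begin < d < end]

# Hypothetical individual observed from 1 July 1999 to 1 March 2001,
# with the hazard allowed to change every 1 January.
points = change_points_on_individual_axis(
    begin=date(1999, 7, 1),
    end=date(2001, 3, 1),
    calendar_changes=[date(1999, 1, 1), date(2000, 1, 1),
                      date(2001, 1, 1), date(2002, 1, 1)],
)
```

For this individual the hazard changes at days 184 and 550 of its own time axis; another individual with a different starting date would get different values from the same calendar dates.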

CONVDT Statement

CONVDT variable=variable-variable variable=variable-variable ...;

With the CONVDT statement, the difference between two date variables is calculated and written onto a new time variable. The variable to the left of the equality sign must be a new name; the variables on both sides of the minus sign must be dates of the same type (D6 or D8). This statement will be used for calculation of covariates (e.g., age at first calving from date of birth and date of calving) but not for the dependent variable, which is processed in the TIME statement.

Note: the variable to the left of the minus sign is the date later in time (e.g., date of first calving); to the right is the one earlier in time (e.g., date of birth).

CLASS Statement

CLASS variables;

The CLASS statement lists the names of the classification variables to be used in the analysis. Independent variables defined in the MODEL statements of the COX or WEIBULL programs which are not defined in the CLASS statement will be treated as continuous covariates. All variables defined in the CLASS statement are internally recoded (list of codes in file�). Codes are transformed back to their original values for the listing of the results from COX and WEIBULL. The recoded values are needed, though, in the CONSTRAINTS statement of COX and WEIBULL (see there).

JOINCODE Statement

JOINCODE variable variable;

The JOINCODE statement specifies that the two variables indicated must be recoded together, as if they pertained to a single list. This is necessary, e.g., when sires and maternal grandsires are specified as different variables in the INPUT statement but must appear in a single recoded list: sires can also appear as maternal grandsires, and their additive genetic effect as grandsires is 0.5 times their effect as sires. This is also one way (out of �) to define sire-dam models: sires and dams are recoded together using the JOINCODE option and an "overall" relationship matrix is used for both sires and dams.

Both variable names must have been declared previously in the CLASS statement.

TIMEDEP Statement

TIMEDEP variable typediff variable typediff ...;

Time-dependent covariates, i.e., independent variables whose value may change during the time an individual is observed (these may be CLASS or continuous covariates), are defined with this statement. The variable names must be defined in the INPUT statement.

In the input file (file�), any changes in the value of these variables over time have to be given at the end of each record as triplets in the following way (see also the section about files):

a) the consecutive number of the time-dependent covariate. This means that one has to count the position in the record at which the variable (with its initial value) is located, and supply this value;

b) the date or time of the change in covariate value;

c) the new covariate value.

typediff relates to the way the difference between the beginning of the observation and the actual change of the covariate is given. If it is I�, it is given as a time; if it is D� or D�, it is calculated as the difference between the date given in b) and the date describing the beginning of the observation in the TIME statement (both dates must be of the same type).

One of the most frequent problems (errors) that users encounter with the Survival Kit is related to an inadequate preparation of these triplets (for example, a time of change equal to 0, or even negative, or dates of change before the origin (or the truncation) point or after the failure date).
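Because badly prepared triplets are such a common source of errors, it can help to screen them before running PREPARE. The following Python sketch is not part of the Survival Kit; the function name check_triplets, its arguments, and the exact boundary conventions (a change must fall strictly after the start and not after the end) are assumptions chosen to mirror the problems listed above.

```python
def check_triplets(n_fields, triplets, start, stop):
    """Return a list of problems found in time-dependent covariate triplets.

    n_fields : number of fields in the record before the triplets begin.
    triplets : iterable of (position, time_of_change, new_value).
    start    : origin (or truncation point) on the individual's time axis.
    stop     : failure or censoring time on the same axis.
    """
    problems = []
    for pos, when, _value in triplets:
        if not 1 <= pos <= n_fields:
            problems.append(f"position {pos} is outside the record (1..{n_fields})")
        if when <= start:
            problems.append(f"change at {when} is not after the start ({start})")
        elif when > stop:
            problems.append(f"change at {when} is after the end ({stop})")
    return problems
```

An empty list means none of these particular problems was detected; it does not guarantee that the record is otherwise valid.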

COMBINE Statement

COMBINE variable = variable�variable variable = variable�variable�variable ...;

The COMBINE statement is used to produce "interaction effects" between two or three variables defined in the CLASS statement. Continuous variables are not allowed. The variable to the left of the equality sign must be a new name. If one of the variables combined is time-dependent (defined in the TIMEDEP statement), the resulting variable will also be time-dependent.

OUTPUT Statement

OUTPUT variables;

In the OUTPUT statement, the variables defined in the variables list are written to the output file (file� of the FILES statement). This file will then be used for the analysis with the COX or WEIBULL programs. The variables must be of type I� or R� when defined in the INPUT statement. All variables newly defined in subsequent statements (TIME, CONVDT, COMBINE) are also allowed. No date variables (D� or D�) may be included in the output file.


FUTURE Statement

FUTURE file� file�;

In the FUTURE statement, you provide information about a set of records for which you want a printout of predicted values of the survivor curve (quantile values or functional values of the survival function at given times).

• The first file is the original file holding the "future" records (it must exist in your directory). The structure of this file must be exactly identical to that of file�.

• The second file is the recoded file produced by the program, identical in structure to the file� data file. It will be used by COX or WEIBULL to produce values related to the corresponding survival functions. Note: if you have a FUTURE statement, you also need an IDREC statement, as elementary records have to be sorted by identity of the original records.

PEDIGREE Statement

In the PEDIGREE statement, you give information about a pedigree file for including relationships between individuals in the model. Depending on the model to be used later, it can take two alternative forms:

PEDIGREE file� file� variable; (for sire or animal models, or for sire-dam models if "JOINCODE sire dam;" is used)

PEDIGREE file� file� variable1 variable2; (for sire-dam models)

• The first file is the original pedigree file (it must exist).

• The second file is the recoded pedigree file produced by the program.

• variable is the name of the random variable (animal or sire) in the data file (it must be stated in INPUT and OUTPUT).

• variable1 and variable2 (for sire-dam models, unless JOINCODE is used) are the names of the sire and dam variables in the data file (they must be stated in INPUT and OUTPUT).

FORM OF THE PEDIGREE FILE: the pedigree file ALWAYS has to consist of four columns:

• identity of animal

• sex code in sire-dam models (� = male, � = female, � for a group of unknown parents); this column may be used for other purposes in other situations

• identity of sire

• identity of dam (or maternal grandsire)

Unknown parents are defined as � or with negative values corresponding to groups of unknown parents.

Important: there should be one line defined for each animal, each parent and each group of unknown parents.
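The requirement that every animal, parent and group of unknown parents has its own line can be checked before the pedigree file is passed to PREPARE. The Python sketch below is an illustration only: the function name missing_pedigree_lines is hypothetical, and the code for an unknown parent is taken to be 0, which is an assumption because that value is not legible in this copy of the manual. Negative identities are treated as groups of unknown parents and are therefore expected to have a line of their own.

```python
def missing_pedigree_lines(rows, unknown=0):
    """Return identities that are used as a parent but have no line.

    rows    : list of (animal, sex_code, sire, dam) tuples, one per line
              of the four-column pedigree file.
    unknown : code for an unknown parent (assumed to be 0 in this sketch).
    """
    declared = {animal for animal, _sex, _sire, _dam in rows}
    referenced = {
        parent
        for _animal, _sex, sire, dam in rows
        for parent in (sire, dam)
        if parent != unknown
    }
    return sorted(referenced - declared)
```

With four lines for animals 1 to 4, where animal 2 has dam group -1 and animal 4 has dam 5, the function reports -1 and 5 as identities that still need a line.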

FORMOUT Statement

FORMOUT type;

FORMOUT FIXED_FORMAT In Fn.d;

With the FORMOUT statement, the format of the recoded output files is specified (these are file� of the FILES statement and file� of the FUTURE statement). If a fixed format is chosen, the type FIXED_FORMAT must be followed by the Fortran description of formats for integers and reals ("In Fn.d", e.g., "I� F��.�" for integers with � digits and reals with �� digits, including � decimal figures). The other admitted types are:

FREE_FORMAT
UNFORMATTED
BLOCKED_UNFORMATTED
COMPRESSED

• FREE_FORMAT is the default option.

• FIXED_FORMAT is useful when you need to sort output files but your sorting routine is not really adequate when the records are not strictly presented in columns (especially on PCs). Be careful: if the formats for integers and/or reals are not big enough, the results will be incorrect. Note: the previous way (version ���) of producing a recoded file in fixed format through a parameter in the parinclu file is no longer available.

• UNFORMATTED writes binary files. This saves time and (some) space when writing.


• BLOCKED_UNFORMATTED reads and writes NRECBLOC binary records together. NRECBLOC is specified in the parinclu file. It is the most efficient option: it is much faster than when records are written or read one at a time.

• With COMPRESSED, blocks are compressed before being written and decompressed after being read. COMPRESSED is available only when the program PREPAREC (which calls subroutines for compression/decompression) is used. It takes about � times longer to read and write compressed files, but these are substantially smaller.

The BLOCKED_UNFORMATTED and COMPRESSED options should not be used if the output file(s) must be sorted. In this same situation, the UNFORMATTED option can be used only if a binary sort subroutine is available.

Sorting of the recoded file

For use in COX or (sometimes) in WEIBULL, the recoded file produced by PREPARE (file� of the FILES statement) has to be sorted. The sorting order is different for COX and WEIBULL. A sorting routine is not included in the Survival Kit, as it is often machine-dependent. The specification of a fixed format (FORMOUT FIXED_FORMAT) may be necessary when compiling the Survival Kit with some (PC) compilers, where the output of a single record in free format is split over several lines when it exceeds �� characters. You have to find out.

COX Program

This is the sorting order (from highest to lowest level) for COX:

• STRATA variable, in ascending order (this is necessary only when a STRATA statement is used in COX, see below)

• TIME variable, in descending order (the TIME variable is always the first variable in the recoded file)

• CENSCODE variable, in ascending order (the censoring variable is always the second variable in the recoded file)

This sort prevents the use of the UNFORMATTED (on most computers), BLOCKED_UNFORMATTED or COMPRESSED formats.
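Since no sorting routine ships with the Survival Kit, the COX ordering has to be produced with an external tool. As a minimal sketch, assuming the recoded records have already been read into memory as numeric tuples, the three-level order above could be obtained in Python as follows. The function name sort_for_cox and the zero-based strata_pos argument are hypothetical.

```python
def sort_for_cox(records, strata_pos=None):
    """Return the records in the order required by COX.

    Each record is a sequence whose first item is the TIME variable and
    whose second item is the censoring code. strata_pos is the zero-based
    index of the STRATA variable, or None when no STRATA statement is used.
    """
    def key(rec):
        stratum = rec[strata_pos] if strata_pos is not None else 0
        # strata ascending, time descending, censoring code ascending
        return (stratum, -rec[0], rec[1])

    return sorted(records, key=key)
```

Negating the time value is a simple way to get a descending order on one key while keeping the others ascending; it only works for numeric times, which is the case after recoding.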


WEIBULL Program

Several special cases exist. These cases are mutually exclusive. (This also applies to the grouped data model of Prentice and Gloeckler (1978).)

• A STRATA statement is used in WEIBULL. The sorting order should be:

    - STRATA variable, in ascending order

    - IDREC variable, in ascending order (the IDREC variable is always the third variable in the recoded file)

    - TIME variable, in ascending order (the TIME variable is always the first in the recoded file)

• A LOGGAMMA random variable is defined in the WEIBULL parameter file. This random variable is algebraically integrated out in order to reduce the number of parameters to estimate or to obtain an exact marginal posterior distribution. This is done using the INTEGRATE_OUT statement in the parameter file (see below).

  Again, two situations exist:

    - If the INTEGRATE_OUT statement is followed by the name of one variable only (e.g., INTEGRATE_OUT sire;), the sorting order should be:

        - the variable to be integrated out, in ascending order

        - IDREC variable, in ascending order (the IDREC variable is always the third variable in the recoded file)

        - TIME variable, in ascending order (the TIME variable is always the first in the recoded file)

      Note that if the variable to be integrated out is time-dependent (e.g., herd-year-season), the statement IDREC STORE_PREVIOUS_TIME must have been used.

    - If the INTEGRATE_OUT statement is followed by the names of two variables (e.g., INTEGRATE_OUT year WITHIN herd;) and if the original records were already sorted by herd, the statement IDREC STORE_PREVIOUS_TIME is not needed, and there is no need to sort according to the variable (year). Records should simply be sorted by IDREC variable and ascending TIME variable. This is the natural way records are recoded and therefore the need to sort the elementary records disappears. This may lead to a substantial saving of time and allows the use of the UNFORMATTED, BLOCKED_UNFORMATTED or COMPRESSED formats.


• General case:

  In all other situations, there is no need to sort: the recoded data set is naturally ordered by IDREC variable and ascending TIME variable. This allows the use of the UNFORMATTED, BLOCKED_UNFORMATTED or COMPRESSED formats.

Sorting of future records

The sorting of "future records" (file� of the FUTURE statement) is identical to the sorting order for WEIBULL (general case), irrespective of whether COX or WEIBULL is used later on.


Chapter �

Programs COX and WEIBULL

Depending on the parameterisation of the baseline hazard function, either program COX or program WEIBULL may be called after data preparation with PREPARE. Using the same recoded data set, various alternative survival analyses may be carried out with both programs. The programs require two different parameter files. The first is produced by PREPARE. The second parameter file essentially describes the model of analysis. Most statements used in the second parameter file may be called by both programs; a few (mainly relating to estimation of the distributional parameters of log-gamma distributed random effects) may only be called by WEIBULL. Generalized residuals are computed only by COX.

As already mentioned, for extremely large applications for which the number of elementary records is huge, a modified version of WEIBULL (weibullc.f, denoted hereafter WEIBULLC) was written using public domain C subroutines for compressing and decompressing data during I/O operations (a similar version for the Cox model was not implemented, as the Cox model is not suited for huge applications). Except for the format specification in parameter file 1 (which is the direct result of the use of PREPAREC, see above), the parameter files for WEIBULLC are exactly the same as for WEIBULL.

Parameter file 1

As pointed out above, this parameter file is produced by PREPARE (file� of the FILES statement of PREPARE). Note that you normally will not want to change anything in this file. This section is written to help the user understand the file.

Syntax

The following statements are used in this parameter file:


TIME position;
CODE position;
ID position; or PREVIOUS_TIME position;
COVARIATE variable nlevels pos1 pos2
          variable nlevels pos1 pos2 ...;
discrete_scale;
joincode variable1 variable2;
format type;

TIME Statement

TIME position;

position is a figure which gives the position of the dependent (TIME) variable in the recoded data file (file� of the FILES statement of PREPARE).
Note: the position of the time variable is always 1.

CODE Statement

CODE position;

position is a figure which gives the position of the censoring code in the recoded data file (file� of the FILES statement of PREPARE). In this file, censored records are always coded � and uncensored records are coded �, but other codes are also used: �� for the first elementary record referring to a truncated observation, �� for an elementary record indicating a change in a time-dependent covariate.
Note: the position of the censoring code is always 2.

ID Statement and PREVIOUS_TIME Statement

The two statements are mutually exclusive.

ID position;

PREVIOUS_TIME position;

position gives the position of the record identification code (ID statement) or of the beginning of the time period for an elementary record in the special case described in the IDREC STORE_PREVIOUS_TIME statement of program PREPARE (required when a time-dependent covariate is integrated out in the WEIBULL program, and only then).
Note: the position of the identification variable is always 3.


COVARIATE Statement

COVARIATE variable nlevels pos1 pos2 variable nlevels pos1 pos2 ...;

The COVARIATE statement gives information about the covariates that might be used in the model of analysis. These are all the variables in the OUTPUT statement of the PREPARE parameter file except for the TIME and CODE variables described above.

The covariates are listed one per line, first continuous and then class covariates. variable is the name of the covariate, nlevels gives the number of different levels found for class variables (zero for continuous covariates), and pos1 and pos2 give the position(s) of the covariate in the recoded file (file� in the FILES statement of the PREPARE parameter file). When pos1 and pos2 are identical, the covariate is time-independent. When pos1 and pos2 are different (consecutive) numbers, the covariate is time-dependent and the covariate values at pos1 and pos2 give the status of the covariate before and after the change of the hazard function of an individual due to a change in this (or another) time-dependent covariate.

DISCRETE_SCALE Statement

DISCRETE_SCALE;

The DISCRETE_SCALE statement indicates that the recoded file is prepared to be used with the WEIBULL program to fit the grouped data model of Prentice and Gloeckler (1978).

It is the direct result of the use of DISCRETE_SCALE in the PREPARE parameter file. The consequence has been the definition of a specific time-dependent covariate called time_unit, which changes value at each time point (�, �, ...) between the origin and the observed failure time or censoring time. A corresponding number of elementary records were therefore created.

The statement also avoids the need to repeat it in the user-supplied parameter file 2. If one wants to run a regular Weibull model, one can simply comment out this statement in parameter file 1. However, if a lot of elementary records were created because of the definition of the time_unit time-dependent variable, it may be more efficient to start again, running PREPARE after deleting the DISCRETE_SCALE statement in its own input parameter file.

JOINCODE Statement

JOINCODE variable1 variable2;

The JOINCODE statement indicates that the two variables variable1 and variable2 were recoded together, in a single list. This is the case, e.g., when sires and maternal grandsires are specified as different variables but used in a sire-maternal grandsire model: sires may appear as maternal grandsires, and their additive genetic effect as grandsires is 0.5 times their effect as sires. This statement is the direct result of the use of the JOINCODE statement in the PREPARE parameter file.

Format Statement

format type;

format type gives the format of the recoded (output) files from PREPARE (in the PREPARE parameter file, these are file� of the FILES statement and file� of the FUTURE statement).

If a fixed format is chosen, format type is FIXED_FORMAT and is followed by the Fortran description of formats for integers and reals ("In Fn.d", e.g., "FIXED_FORMAT I� F��.�" for integers with � digits and reals with �� digits, including � decimal figures). The other possible expressions for format type are:

FREE_FORMAT; (= default)
UNFORMATTED;
BLOCKED_UNFORMATTED;
COMPRESSED; (only with PREPAREC/WEIBULLC)

Parameter file 2

In this parameter file, the statistical model used in the data analysis is described, together with various options regarding storage, statistical tests and output. Most statements and options are valid for both the COX and WEIBULL programs. Therefore, the same parameter file can be used with both programs. However, a few statements apply to only one of these programs and should be commented out when the other program is run, in order to avoid warning or error messages. These will be indicated below.

The grouped data model of Prentice and Gloeckler (1978) can also be fitted using the WEIBULL program, although it is not a Weibull model. For the user, parameter file 2 has exactly the same form as for the Weibull model, the use of the grouped data model being already specified in parameter file 1 through the DISCRETE_SCALE keyword. The latter will automatically generate an extra time-dependent covariate time_unit which will appear in the model (do not repeat it in the MODEL statement). Note that a few statements are not applicable to such a model (e.g., RHO_FIXED or INTEGRATE_OUT).


Syntax

The following statements are used in this parameter file. Statements written in capitals are obligatory. Names between [ ] are omitted when they are not needed. The sequence of statements should be as shown.

Comments may be included in the parameter file. The start of a comment is ��, the end is ��, i.e., �� This text is a comment ��. The text between the two delimiters may span more than one line.

Sometimes the statement or option names are longer than 8 characters (e.g., the statement read_old_solutions shown below). In this case, only the first 8 characters (i.e., read_old in our example) are significant to the programs; the rest may or may not be written by the user.

FILES file1 file2 file3 [file4 file5 file6];
title Title of analysis;
strata variable;
rho_fixed value;
MODEL [variables];
coefficient variable value;
[only_]statistics;
random variable [estimate [moments]] distribution [rules] parameter(s)
       [repeat sequence for next variable(s)];
integrate_out [joint_mode] variable1 [within variable2];
include_only variable value value;
test [type(s) of test];
std_error;
dense_hessian;
baseline;
kaplan;
survival file1 file2 option;
residual file1 file2;
constraints options;
convergence_criterion;
storage on_disk or in_core;
store_solutions;
read_old_solutions;
store_binary;


FILES Statement

FILES file1 file2 file3 [file4 file5 file6];

The FILES statement is used to define the names of the files needed for running COX or WEIBULL.

file1 is the name of the recoded data file (file� of the FILES statement in PREPARE). This file must exist in your directory.

file2 is the name of the file containing the original and new codes (after internal recoding) for each CLASS variable (file� of the FILES statement in PREPARE). The file name is required even when no CLASS variables were defined in PREPARE. In this case it will be an empty file.

file3 is the name of the output file where the results of the analysis are listed.

file4 is the (recoded) pedigree file (file� of the PEDIGREE statement of PREPARE). This file is not obligatory and will only be needed for analyses where the additive genetic relationship structure between individuals is considered.

file5 is only needed when, for very large applications, solutions from a previous run are to be read in because the READ_OLD_SOLUTIONS statement is used (see below).

file6 is only needed when, for very large applications, solutions are to be stored for a subsequent run with the STORE_SOLUTIONS statement (see below).

The recoded data file and the file of codes must exist in your directory. The pedigree file and the file of old solutions must exist when subsequent statements make use of them. The output file and the file of stored solutions will be new files (note: existing files with the same name are overwritten without warning). Because the file names are identified by their position in the statement, dummy names have to be stated for any optional file that is not needed but precedes one that is.

Note for UNIX users: unless the UPPER_CASE flag in file parinclu (see later for a detailed description of the parameters in parinclu) is set to FALSE, all lowercase letters in the parameter file are transformed into upper case internally (to avoid trouble with variable names typed in different ways by the user). This implies that on systems where file names are case-sensitive (mainly UNIX), the name of the recoded data file has to be upper case to be recognised by the program. Also, the names of the files produced by PREPARE (file� to file�) will be upper case even if stated in lower case in the FILES statement. If UPPER_CASE is set to FALSE in file parinclu, file and variable names will not be recognised if typed differently.

TITLE Statement

TITLE Title of analysis;

The title of the analysis may fill the rest of the line (up to position ��) after the statement name. It will be written at the beginning of the output files (file3 of the FILES statement above and file� of the SURVIVAL statement below).

STRATA Statement

STRATA variable;

Stratification is an extension of the proportional hazards model. With stratification, the assumption of the proportionality of hazards is restricted to individuals within subgroups of the population, where grouping is defined by the variable indicated in the STRATA statement. Only one variable can be used as strata variable. If a combination of variables is more sensible (e.g., year-season), the individual variables may be combined in the COMBINE statement of PREPARE.

Note: do not forget that in models with strata, data have to be sorted by strata (ascending) and, within strata, by the time variable (descending) and the censoring variable (ascending). See the section on files, records and sorting.

RHO_FIXED Statement

RHO_FIXED value;

This statement can be used only with the WEIBULL program (with COX, a warning message is printed and the statement is ignored). It specifies a fixed value for the Weibull parameter ρ. For example, RHO_FIXED 1.0; will constrain ρ to be 1, therefore defining an exponential regression model instead of a more general Weibull model.

This statement is ignored when a grouped data model is fitted (then ρ is always assumed to be 1).
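To see why fixing ρ at 1 gives an exponential model, consider the Weibull baseline hazard. The short Python sketch below assumes the common parameterisation h(t) = λρ(λt)^(ρ-1); the manual text above does not spell out the parameterisation, so treat this form and the function name weibull_hazard as assumptions. With ρ = 1 the time-dependent factor becomes (λt)^0 = 1 and the hazard reduces to the constant λ, which is the defining property of the exponential distribution.

```python
def weibull_hazard(t, lam, rho):
    """Weibull baseline hazard h(t) = lam * rho * (lam * t) ** (rho - 1).

    lam (scale) and rho (shape) are assumed to be positive and t > 0.
    """
    return lam * rho * (lam * t) ** (rho - 1.0)


# With rho = 1 the hazard no longer depends on t (exponential model).
constant_hazards = [weibull_hazard(t, lam=0.2, rho=1.0) for t in (1.0, 10.0, 50.0)]

# With rho > 1 the hazard increases with time.
increasing_hazards = [weibull_hazard(t, lam=0.2, rho=2.0) for t in (1.0, 2.0)]
```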

MODEL Statement

MODEL [variables];

The MODEL statement specifies the independent variables affecting the dependent variable described in the TIME statement (remember that only one dependent variable may be specified there). The variables listed must be a subset of the variables defined in the COVARIATE statement of parameter file 1.

Model building capabilities: discrete (class) and continuous variables may be included. No covariate name at all is also accepted.

• no covariate: if no variable is specified between MODEL and the semicolon, the COX model can be used to simply compute a nonparametric estimate of the survivor curve (KAPLAN statement, see below) for each stratum. The WEIBULL program with no covariate specified will lead to the fit of a 2-parameter Weibull function for each stratum.

• discrete covariates: they have to be stated in the CLASS and COMBINE statements of the parameter file for PREPARE.

• continuous covariates: all variables in the MODEL which have not been specified via CLASS and COMBINE. Polynomial regression models may be fitted by stating variable**N, where N is the power to which the covariate value is taken. For example, for a third-order polynomial in varx, the statement would be:

  MODEL varx varx**2 varx**3 [any number of other discrete and continuous covariates];

  Covariables are always automatically centered for computations (i.e., their mean value is subtracted from each observation). For polynomial expressions, the values are first taken to the power of N; the mean value is then calculated and subtracted for this new variable. The mean value (or the sum of mean values if there is more than one continuous covariate) is printed in the output of COX and is incorporated into the intercept in the output of WEIBULL. This is important to know when you want to draw the regression curves.

• interactions between discrete covariates: they may only be fitted by combining the original values of two or three variables into a new one via the COMBINE statement of PREPARE.

Interactions between class and continuous covariates (i.e., individual regressions) and nested models are currently not supported.

Variables are treated as fixed unless they are stated in the RANDOM statement described below.

When a grouped data model (= Prentice and Gloeckler's model) is fitted, an extra time-dependent covariate time_unit will be automatically generated and will appear in the model (do not repeat it in the MODEL statement). The purpose of this covariate is the estimation of the baseline hazard function at the same time as the regression parameters.
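The order of operations used for centering polynomial terms (power first, then mean, then subtraction) matters when regression curves are redrawn from the printed solutions. The following Python sketch illustrates that rule on a toy covariate; the function name centered_power is hypothetical and the code is not taken from the Survival Kit.

```python
def centered_power(values, n):
    """Raise each covariate value to the power n, then centre the result.

    Returns the centred values together with the mean that was subtracted,
    i.e. the quantity that is reported (COX) or absorbed into the
    intercept (WEIBULL) according to the description above.
    """
    powered = [v ** n for v in values]
    mean = sum(powered) / len(powered)
    return [p - mean for p in powered], mean


centered, mean_sq = centered_power([1.0, 2.0, 3.0], 2)
```

For the values 1, 2 and 3 the mean of the squares is 14/3, which differs from the square of the mean (4). Centering the raw covariate first and squaring afterwards would therefore not reproduce the variable used in the computations.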

COEFFICIENT Statement

COEFFICIENT variable value [variable value ...];

The COEFFICIENT statement specifies that the covariate variable will always be multiplied by the coefficient value. This is especially useful when used in conjunction with the statement JOINCODE in PREPARE. For example, a typical sire-maternal grandsire model used in animal breeding will be obtained by specifying:

JOINCODE sire mgs; (in the data preparation parameter file for PREPARE)
and
COEFFICIENT mgs 0.5;

Also, the best possible way to define a sire-dam model is:

JOINCODE sire dam; (in the data preparation parameter file for PREPARE)
and
COEFFICIENT sire 0.5 dam 0.5;

[ONLY_]STATISTICS Statement

ONLY_STATISTICS;
STATISTICS;

These statements can be used only with the WEIBULL program (they are ignored otherwise). They request the computation and printing of a number of elementary statistics related to each level of the class variables specified in the MODEL statement (number of observations, number of observed failures, age at failure, average value of continuous covariates for all observations and for the uncensored ones, etc.). If STATISTICS is preceded by ONLY_, the program will stop as soon as these values have been printed.

RANDOM Statement

RANDOM variable [ESTIMATE [MOMENTS]] distribution [rules] parameter(s) [repeat previous sequence for next variable(s)];

The RANDOM statement gives information on variables to be treated as random. The distribution parameters of the random covariates are either assumed to be known or they may be estimated (optional parameter ESTIMATE).

distribution specifies the distribution that the random variable is assumed to follow. The user may choose from 3 alternatives:

• LOGGAMMA: the levels of the random effect follow a log-gamma distribution. This is identical to assuming a gamma frailty term. The frailty term, say w, is defined as a multiplicative term to the usual hazard function with fixed effects only, i.e., w = exp{z(t)u}.

• NORMAL: the levels of the random effect are independently normally distributed. This assumption is not so common in survival analysis, but for rather large gamma parameters (� � ��) the log-gamma and normal distributions are similar, and the normal distribution is needed to make the step to the multivariate normal distribution described next.

• MULTINORMAL rules: the levels of the random effect follow a multivariate normal distribution, with covariances between levels being induced by genetic relationships.


Two types of relationships are allowed and may be stated via the rules parameter: USUAL_RULES are the relationships under an animal model; MGS_RULES relate to a sire model or a sire-maternal grandsire model, accounting for male relationships. Note that SIRE_DAM_MODEL, which was used in earlier versions to relate sires and dams in a sire-dam model in which sires and dams had been recoded separately, is now obsolete and not accepted any more. Instead, simply use the JOINCODE option in PREPAREF and MULTINORMAL USUAL_RULES sire ��� for sire-dam models. When MULTINORMAL is used, information about the covariances between individuals has to be provided via a pedigree file (file� of the FILES statement).

parameter is the distribution parameter (gamma parameter with the log-gamma distribution and the variance with normal or multivariate normal distributions). It may be preset, in which case one value of the parameter has to be provided, or it may be estimated (see ESTIMATE option below), in which case three values have to be given (see below).

ESTIMATE: when stated after the variable name, the distribution parameter is estimated as the mode of its marginal posterior density, which is approximated by Laplacian integration; the parameter value is replaced by three values: the first two values are the bounds of the interval to be searched, the third value gives the final tolerance (accuracy). Using the ESTIMATE option, the parameter of only one random variable may be estimated at one time. If the estimated value is at one of the prespecified bounds, a warning message is printed and the analysis should be repeated with different bounds. (Example: you stated parameters ��� �� ���� as lower bound, upper bound and tolerance, which means that the program must look for the mode of the marginal posterior density in the interval ����� �� and the searching process is stopped when the current interval for the estimate is smaller than ����. If the program output tells you �the mode is between ����� and ����� and the best value is ������� you need to reset the bounds, e.g., to ���� and ���� and start again.) The STORE_SOLUTIONS and READ_OLD_SOLUTIONS statements described below may be used to avoid starting from scratch again.

When the INTEGRATE_OUT JOINT_MODE statement is used (only with a Weibull model and for a log-gamma random effect), the ESTIMATE statement should not be used: the gamma parameter is then estimated jointly with the other effects after exact algebraic integration of the log-gamma random effect.
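
The role of the three values given with ESTIMATE (lower bound, upper bound, tolerance) can be illustrated with a small Python sketch. It searches an interval for the mode of a one-dimensional function and flags an estimate that ends up at one of the bounds. The golden-section method and the names find_mode and log_density are assumptions made for this illustration only; the manual does not state which search algorithm the Survival Kit uses internally.

```python
import math


def find_mode(log_density, lower, upper, tol):
    """Golden-section search for the maximum of a unimodal function.

    Illustrative only: the search stops when the bracketing interval
    [a, b] is narrower than tol. Returns (estimate, at_bound), where
    at_bound is True if the estimate lies within tol of a bound.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lower, upper
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = log_density(c), log_density(d)
    while (b - a) > tol:
        if fc > fd:
            # The maximum lies in [a, d]: shrink from the right.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = log_density(c)
        else:
            # The maximum lies in [c, b]: shrink from the left.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = log_density(d)
    estimate = 0.5 * (a + b)
    at_bound = (estimate - lower) <= tol or (upper - estimate) <= tol
    return estimate, at_bound


# Hypothetical log-density with its mode at 2.0.
def example_log_density(x):
    return -(x - 2.0) ** 2


inside, inside_flag = find_mode(example_log_density, 0.0, 5.0, 1e-4)
clipped, clipped_flag = find_mode(example_log_density, 3.0, 5.0, 1e-4)
```

In the second call the true mode lies outside the interval, so the search ends at the lower bound and the flag is set. This corresponds to the situation in which the bounds have to be reset and the analysis repeated.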

MOMENTS: this option may be used together with ESTIMATE to compute the estimated mean, standard deviation and skewness of the marginal posterior value of the distributional parameter. The computation is based on an iterated Gauss-Hermite quadrature of the approximate marginal posterior densities. The number of points used for the integration is NPGAUSS, which is specified in the parinclu file.

When the amount of information available for the estimation is limited, unpredictable results (if any) may be obtained, although the mode of the distribution may have been successfully computed.

INTEGRATE_OUT Statement

INTEGRATE_OUT [JOINT_MODE] variable [WITHIN variable�];

This option is only available in the WEIBULL program (the COX program prints an error message and stops) and is not applicable to a grouped data model. When a random effect is assumed to follow a log-gamma distribution in a Weibull model, it is possible to algebraically integrate it out from the joint posterior density. This technique decreases the number of parameters to estimate (sometimes drastically). Algebraic integration is equivalent here to absorption of a group of equations in systems of linear equations. The consequences are similar: a smaller system (but usually much less sparse) and no direct availability of the estimates of the effects integrated out (other than the gamma parameter).

The INTEGRATE_OUT statement can be used alone, i.e., only followed by the name of the random variable to integrate out (which must appear in the RANDOM statement too). Then, the gamma parameter of the log-gamma distribution is assumed to be known. For example:
MODEL effect� effect� effect�;
RANDOM LOGGAMMA effect� ���;
INTEGRATE_OUT effect�;
will lead to the integration of the log-gamma distributed random variable effect� and the estimation of effects effect� and effect� and of the Weibull parameters, assuming a value of ��� for the gamma parameter of effect�.

If the INTEGRATE_OUT statement is used with the JOINT_MODE option (as INTEGRATE_OUT JOINT_MODE effect�; in the example above), then the gamma parameter is estimated jointly with the other effects as the mode of the marginal posterior distribution of the gamma parameter, effect�, effect� and the Weibull parameters. The value ��� specified in the RANDOM statement is just used as a starting value in the optimisation subroutine.
The advantage of this approach is that it performs exactly the marginalisation of the random effect, which otherwise is done only approximately via Laplacian integration (Ducrocq and Casella, ����). Note, however, that the fixed and other random effects as well as the Weibull parameters still have to be integrated out if one wants the full marginal posterior density for the gamma parameter of the log-gamma distribution.
The combined use of RANDOM ESTIMATE ... and INTEGRATE_OUT JOINT_MODE ... offers an easy solution to the joint estimation of the distribution parameters of two random variables.
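
The idea of algebraic integration can be checked numerically for a single level of the random effect. The Python sketch below assumes a frailty w that follows a gamma distribution with shape and rate both equal to the gamma parameter (mean 1), a level with n_failures observed failures and a summed cumulative hazard cum_hazard. Under these assumptions the integral over w has a closed form, which is compared with a brute-force numerical integration. The function names and this parameterisation are illustrative and are not taken from the Survival Kit source code.

```python
import math


def log_marginal_closed_form(gamma, n_failures, cum_hazard):
    """Log of the integral over w of
    w**n_failures * exp(-w * cum_hazard) * GammaPdf(w; shape=gamma, rate=gamma).

    Factors that do not involve w (the individual hazards) are omitted.
    """
    return (gamma * math.log(gamma)
            + math.lgamma(gamma + n_failures)
            - math.lgamma(gamma)
            - (gamma + n_failures) * math.log(gamma + cum_hazard))


def log_marginal_numeric(gamma, n_failures, cum_hazard,
                         upper=40.0, steps=100000):
    """Same integral by the midpoint rule on (0, upper).

    Meant for gamma >= 1, where the integrand is bounded at zero.
    """
    h = upper / steps
    log_norm = gamma * math.log(gamma) - math.lgamma(gamma)
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * h
        total += math.exp(log_norm
                          + (gamma + n_failures - 1.0) * math.log(w)
                          - (gamma + cum_hazard) * w)
    return math.log(total * h)


# Hypothetical values for one level of the random effect.
closed = log_marginal_closed_form(2.0, 3, 1.5)
numeric = log_marginal_numeric(2.0, 3, 1.5)
```

Because the closed form depends on the level only through n_failures and cum_hazard, the level itself no longer needs an equation of its own, which is why the system to solve becomes smaller.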


Note: In order to perform the algebraic integration, the recoded data file file � must be sorted by increasing levels of the random effect to integrate out.

For very large datasets, this sorting according to the levels of the random effect to integrate out may be cumbersome, in particular because it is not compatible with the use of the BLOCKED_UNFORMATTED and COMPRESSED options. An alternative approach which avoids this sorting, using the "WITHIN" statement, has been developed for the case where the effect to integrate out has a hierarchical structure. This is best illustrated through the example of a random herd-year (season) interaction effect.

The "hard work" approach is to have the herd and year effects included in the original data file. The data preparation parameter file (for PREPARE) includes the following statements:

INPUT ... herd I� year I� ... ;
...
IDREC STORE_PREVIOUS_TIME;
CLASS ... herd year ... ;
COMBINE ... hy = herd + year ... ;
OUTPUT ... hy ... ;
FORMOUT FREE_FORMAT;

Each possible herd-year combination is recoded separately and the recoded data file has to be sorted by increasing hy levels. The use of IDREC STORE_PREVIOUS_TIME is necessary because the effect hy to integrate out is a time-dependent covariate (see the description of the IDREC STORE_PREVIOUS_TIME statement and the sorting rules). The need to sort also forces the use of a free format for the output file. Then, in WEIBULL, the integration of a random herd-year-season effect implies the following statements (assuming a log-gamma distribution with initial parameter ���):

MODEL ... hy ... ;
RANDOM hy LOGGAMMA ��� ... ;
INTEGRATE_OUT JOINT_MODE hy;

The alternative (much easier) way simply requires that the initial data set be sorted by herd (and by herd only). There is no need to recode each herd-year interaction separately, no need to specify STORE_PREVIOUS_TIME, no need to restrict the output format to free format, and no need to sort the recoded file. The data preparation parameter file looks like (for example):

INPUT ... herd I� year I� ... ;
...
IDREC animal_id;
CLASS ... herd year ... ;
OUTPUT ... herd year ... ;
FORMOUT BLOCKED_UNFORMATTED;
...

and the parameter file ��� for WEIBULL:

MODEL ... year ... ;
RANDOM year LOGGAMMA ��� ... ;
INTEGRATE_OUT JOINT_MODE year WITHIN herd;

The output of these two approaches would be exactly the same.

INCLUDE_ONLY Statement

INCLUDE_ONLY variable value value;

This option was developed in version �� of the Survival Kit, almost exclusively for the estimation of the gamma parameter of a log-gamma time-dependent random variable. This was needed because the Weibull model did not accept left-truncated records. Now that left-truncated records are handled by WEIBULL as well, INCLUDE_ONLY can be regarded as obsolete.

TEST Statement

TEST [type(s) of test];

Hypothesis testing is generally performed via likelihood ratio tests. The following types of tests may be requested by the TEST statement:

GENERAL: test of the full model vs. the model with no covariate. This is the default option and will be used even without request.

SEQUENTIAL: test of the effects included in the model in sequential order (i.e., depending on the sequence in the MODEL statement).

LAST: likelihood ratio test comparing the full model with models excluding one effect at a time. This is done for each effect separately.

EFFECT list of variables: the same type of test as with LAST is performed, but only for effects stated in the list.

SEQUENTIAL and LAST may be stated together. EFFECT may be stated either alone, with SEQUENTIAL or with LAST. When EFFECT is used with LAST, it is redundant. If it is used with SEQUENTIAL, the sequential inclusion of the effects in the model one at a time will start with the first effect appearing in the MODEL statement which is stated in the list following EFFECT. For example:
MODEL var� var� var� var� var� var�;
TEST SEQUENTIAL EFFECT var� var� var�;
will lead to likelihood ratio tests corresponding to the sequential introduction of variables var�, var�, var�.
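
Each of these tests compares two nested models through their maximised log-likelihoods. A minimal Python sketch of the computation is given below, assuming that the test statistic is referred to a chi-square distribution whose degrees of freedom equal the number of parameters removed from the full model. The function names and the log-likelihood values are hypothetical; the sketch is not the code used by COX or WEIBULL.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with df degrees
    of freedom, from the series expansion of the regularised lower
    incomplete gamma function (adequate for moderate x)."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 1000):
        term *= z / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """Return (statistic, p_value) for two nested models."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods: the reduced model drops an effect
# that accounts for 2 degrees of freedom.
lr_stat, lr_p = likelihood_ratio_test(-100.0, -103.0, 2)
```

With SEQUENTIAL, the reduced model of one comparison becomes the full model of the previous one, so the same function can be applied along the sequence of effects.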

STD_ERROR and DENSE_HESSIAN Statements

Asymptotic standard errors of estimates may be requested by the following statement:

STD_ERROR;

Be cautious with the use of the STD_ERROR statement with large models, as the Hessian matrix has to be calculated and stored to calculate standard deviations of estimates. In the case of the COX program, or the WEIBULL program with the DENSE_HESSIAN statement, the full (square) Hessian matrix is stored. Its dimension is specified by the parameter NDIMAX, which is defined in the parinclu file and may become limiting with respect to storage capacity. In the case of the WEIBULL program, the Hessian matrix is stored in sparse form (unless the DENSE_HESSIAN statement is specified) and the vector space required in order to do so is specified by the parameters AVAIL and NZE in the parinclu file.

DENSE_HESSIAN;

The full storage of the square Hessian matrix is the only option when the COX program is used (the DENSE_HESSIAN statement is not required). The default option for the WEIBULL program is the sparse storage of the Hessian, unless the DENSE_HESSIAN statement is used. In cases where the memory is not limiting or when a log-gamma random effect is algebraically integrated out (leading to a smaller but denser Hessian), the DENSE_HESSIAN option may save substantial computing time.

BASELINE Statement

BASELINE;

The statement specifies that the baseline hazard, the baseline cumulative hazard and the baseline survivor functions should be computed and printed in the COX program. These values are computed at each distinct value of the TIME variable. In stratified analyses (use of the STRATA statement), separate baseline functions are calculated for strata.

With the WEIBULL program, the computation of the baseline Weibull parameters is always implicit: the BASELINE statement is ignored.

KAPLAN Statement

KAPLAN;

The statement specifies that the product-limit (= Kaplan-Meier) estimate of the survivor function should be computed and printed. The Kaplan-Meier estimator is a non-parametric population estimator of the survivor function that does not take into account any of the effects stated in the MODEL statement. The KAPLAN statement is permitted only with the COX program (it is ignored otherwise).
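
For readers unfamiliar with the product-limit estimator, the following Python sketch computes it for a small set of records. The function name kaplan_meier, the input layout (one time and one 0/1 failure indicator per record) and the data are assumptions made for this illustration; the sketch ignores left truncation and is not the implementation used in COX.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survivor function.

    times:  observed times (failure or censoring)
    events: 1 if the failure was observed, 0 if the record is censored
    Returns a list of (time, S(time)) at each distinct failure time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all records sharing the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Hypothetical records: failures at 1, 2 and 3; censoring at 2 and 4.
km_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

The estimate only changes at observed failure times. Censored records contribute by staying in the risk set until their censoring time.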

SURVIVAL Statement

SURVIVAL file� file� option(s);

The SURVIVAL statement may be used to get information about the survivor function of individuals with specific covariate values. It is very useful to relate the estimated regression coefficients to a more conventional scale like median survival time or probability of survival to a certain age.

file� is the name of the recoded file produced by program PREPARE on request in the FUTURE statement of PREPARE (called file there).

file� holds the results invoked by the use of the SURVIVAL statement.

The following options may be used:

QUANTILES value(s)
With this option, times at which specific values (quantiles) of the survivor curve are reached are calculated and printed for each individual specified in file�. The values must lie between � and ��� and will be divided by ��� to produce the value of the survivor function (e.g., with a value of ��, S(t) = ���). The median survival time can therefore be requested by the option QUANTILE ��.

ALL_TIMES
The estimate of the survivor curve will be computed at all times (= failure time or change in time-dependent covariates) indicated for each individual in file�.

EQUAL_SPACE interval limit
The estimate of the survivor curve will be computed at equally spaced times for each individual; two figures are needed: time interval and time limit. For example, SURVIVAL EQUAL_SPACE � ��� will compute the value of the survivor curve at time �, ��, ��, ..., ��, ���.

SPECIFIC time(s)
The estimate of the survivor curve will be computed at the specified times indicated for each individual.

Note: The statements SURVIVAL and RESIDUALS are mutually exclusive and should not appear in the same parameter file.
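
The QUANTILES option described above amounts to inverting the survivor function. The Python sketch below does this in closed form for a Weibull model. It assumes that quantiles are given as percentages and that the survivor function is S(t) = exp(-(lam t)^rho exp(eta)), where eta stands for the sum of the estimated effects of one individual; this parameterisation and the names weibull_survivor and quantile_time are assumptions made for the example, not statements about the internals of the programs.

```python
import math


def weibull_survivor(t, rho, lam, eta=0.0):
    """Survivor function under the assumed Weibull parameterisation."""
    return math.exp(-((lam * t) ** rho) * math.exp(eta))


def quantile_time(percent, rho, lam, eta=0.0):
    """Time at which the survivor function reaches percent / 100."""
    p = percent / 100.0
    if not 0.0 < p < 1.0:
        raise ValueError("percent must lie strictly between 0 and 100")
    return (-math.log(p) / math.exp(eta)) ** (1.0 / rho) / lam


# Hypothetical parameter values; a percentage of 50 gives the median.
median_time = quantile_time(50, rho=2.0, lam=0.01, eta=0.3)
check = weibull_survivor(median_time, rho=2.0, lam=0.01, eta=0.3)
```

With time-dependent covariates, eta changes over time and this closed form no longer applies directly; the survivor curve then has to be followed piece by piece.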

RESIDUALS Statement

RESIDUALS file� file�;

The use of the RESIDUALS statement is (so far) limited to the COX program. It specifies that the generalized residuals (Cox and Snell, ����) of the records in file � should be computed and printed in the output file file �. If all generalized residuals are requested (general case), file � is the unsorted recoded data file (direct output of PREPARE). If the option STORAGE ON_DISK is used, the output will also include the martingale residuals and the deviance residuals (Klein and Moeschberger, ����).

Note: The statements RESIDUALS and SURVIVAL are mutually exclusive and should not appear in the same parameter file.
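
The three kinds of residuals are closely related. The Python sketch below uses their usual textbook definitions: the generalized (Cox-Snell) residual is the estimated cumulative hazard of the record at its observed time, the martingale residual is the failure indicator minus that value, and the deviance residual is a symmetrising transformation of the martingale residual. The function name and inputs are hypothetical, and the sketch makes no claim about how COX computes or prints these quantities.

```python
import math


def survival_residuals(cum_hazard, event):
    """Return (cox_snell, martingale, deviance) for one record.

    cum_hazard: estimated cumulative hazard at the observed time
                (must be positive when event == 1)
    event:      1 if the failure was observed, 0 if censored
    """
    cox_snell = cum_hazard
    martingale = event - cox_snell
    if event == 1:
        # event - martingale equals cox_snell, hence log(cox_snell).
        inner = martingale + math.log(cox_snell)
    else:
        inner = martingale
    sign = (martingale > 0) - (martingale < 0)
    deviance = sign * math.sqrt(max(0.0, -2.0 * inner))
    return cox_snell, martingale, deviance


# Hypothetical records: one censored, one with an observed failure.
res_censored = survival_residuals(0.5, 0)
res_failed = survival_residuals(0.5, 1)
```

Martingale residuals are bounded above by 1 but unbounded below, which is the asymmetry that the deviance transformation is meant to reduce.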

CONSTRAINTS Statement

CONSTRAINTS option;

For models not of full rank (all models with fixed discrete class covariates), constraints can be imposed upon the effects to be estimated to get a set of meaningful, easy to interpret estimable effects. The program offers different options for setting those constraints:

LARGEST: the program will set to zero the level of each discrete covariate with the largest number of uncensored failures. This is the default procedure of handling constraints.

FIND: the constraints are found by the program. This will guarantee the correct number of constraints to produce a full rank Hessian matrix, so that the tests for the full model as well as for individual effects are based on the correct degrees of freedom (however, the constraints found do not guarantee an easy interpretation of the results). Linear dependencies in the Hessian matrix are found by performing a Cholesky decomposition. For very large problems, the storage of this matrix may be a limiting factor (as for the computation of the standard error, see the STD_ERROR statement).

NONE: no constraints specified (but implicit ones will appear if standard errors are to be computed, i.e., a generalized inverse of the Hessian matrix will be used if needed).

IMPOSE [CHECK] variable rec_level variable rec_level ...: the constraints are supplied by the user. variable defines the effect for which a constraint is to be set (it must be a CLASS variable that has been stated in the MODEL statement). rec_level is the recoded value of the effect that should be constrained.
Note: You have to look into the file that holds original and recoded levels of the CLASS effects (file� defined in the PREPARE parameter file) to look up the recoded value of the level you actually want to put the constraint on. To do so, look for the name of the effect of interest. Columns � (when there is no interaction), � and � (when two effects were combined into an interaction) or �, � and � (when three effects were combined) display the original code. The last column defines the new code. With the option CHECK, the program will check whether the constraints specified by the user make sense.
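
The way a Cholesky decomposition reveals linear dependencies, as mentioned for the FIND option, can be sketched in a few lines of Python. The sketch works on a small symmetric positive semi-definite matrix (for instance the negative of a Hessian) and reports the positions whose pivot is numerically zero. The function name find_dependencies, the tolerance and the rule of skipping a dependent column are assumptions made for this illustration and may differ from what the program does.

```python
import math


def find_dependencies(matrix, tol=1e-10):
    """Return the indices of rows/columns that depend linearly on the
    preceding ones, detected as (near) zero Cholesky pivots."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    dependent = []
    for j in range(n):
        pivot = matrix[j][j] - sum(chol[j][k] ** 2 for k in range(j))
        if pivot <= tol:
            # Column j adds no new information: flag it and leave the
            # corresponding column of the factor at zero.
            dependent.append(j)
            continue
        chol[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            partial = sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = (matrix[i][j] - partial) / chol[j][j]
    return dependent


# Hypothetical example: the third column is the sum of the first two.
singular = [[1.0, 0.0, 1.0],
            [0.0, 1.0, 1.0],
            [1.0, 1.0, 2.0]]
regular = [[2.0, 0.0],
           [0.0, 3.0]]
deps_singular = find_dependencies(singular)
deps_regular = find_dependencies(regular)
```

Each flagged position corresponds to one constraint to impose, which is how the number of constraints, and hence the degrees of freedom of the tests, can be obtained.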

CONVERGENCE_CRITERION Statement

CONVERGENCE_CRITERION real number;

The real number in the CONVERGENCE_CRITERION statement provides the termination point for the optimisation routine used in the likelihood maximisation. Its default value is the parameter EPS_BFDEF set in parinclu (usually, �D ��). Alternative values may be invoked either by using this statement or by changing the value of EPS_BFDEF in parinclu.
Warning: an unnecessarily strict convergence criterion set by the user will lead to a warning message by the optimisation routine telling (amongst others) "LINE SEARCH FAILED", although the results are printed and usually valid.

STORAGE Statement

STORAGE option;

The data file file� may either be read in and held in core (option IN_CORE) or, while being read for the first time, be written out on a temporary file in binary mode on disk (option ON_DISK) for faster access in subsequent readings of the file in cases where the core memory available is not big enough to hold the whole data file. The largest number of elementary records possibly stored in core at the same time is specified by the parameter NRECMAX in the parinclu file.

STORE_SOLUTIONS and READ_OLD_SOLUTIONS Statements

STORE_SOLUTIONS;

READ_OLD_SOLUTIONS;

These two statements are only useful for very large applications (applications running for hours and days). If the STORE_SOLUTIONS statement has been used to write the final solutions to the corresponding file of the FILES statement, the READ_OLD_SOLUTIONS statement may be used in the next run to retrieve those solutions from file� of the FILES statement. The model does not need to be the same in both runs. This is mainly useful when one is fitting a complex model from a simpler one or in connection with the ESTIMATE option of the RANDOM statement. When the results of the estimation indicate that the true value of the distributional parameter for the log-gamma or normal distributions lies outside the prespecified interval, the values of the interval may be changed and, using READ_OLD_SOLUTIONS, the optimisation may proceed from the point reached already.
For storage of solutions not only at the end of the optimisation procedure but after each iteration, an alternative approach has to be taken. In the parinclu file, a character parameter BACKUP is defined. This parameter may be used to define the name of a file where solutions will be stored at each iteration. If the solutions are needed (e.g., in a restart after a system shut down), you only have to give the BACKUP filename as file� (they have exactly the same structure) in the FILES statement and include the READ_OLD_SOLUTIONS statement in your parameter file.

STORE_BINARY Statement

STORE_BINARY;

This statement is only useful for very large applications (applications running for hours and days). The reading of binary files is much faster than the reading of files in free format. If you have very large datasets which you may use for different types of repeated analyses, it may therefore be useful to create and repeatedly read a binary version of the recoded data file. This is easily done by specifying "FORMOUT UNFORMATTED;" or "FORMOUT BLOCKED_UNFORMATTED;" in the data preparation parameter file (for PREPARE), when the initial data file is recoded.

However, this strategy is often (in the case of FORMOUT UNFORMATTED) or always (in the case of FORMOUT BLOCKED_UNFORMATTED) incompatible with the sorting of the recoded file, for example when stratification or integration of a random effect is envisioned. The statement STORE_BINARY is a way to avoid this difficulty: with STORE_BINARY in your parameter file, the program only reads file� in free format and writes it (under the same name + .bin) as a binary (blocked unformatted) file. Then the program will stop. In the new file, records are blocked together by groups of size NRECBLOC, a parameter that is set (and may be changed) in parinclu. Then, you must change the format statement in parameter file ��� from what it is to BLOCKED_UNFORMATTED. In the subsequent runs, the STORE_BINARY statement is deleted and file� is replaced by its new name (i.e., with .bin at the end). Note however that you cannot sort the blocked binary file anymore. This means that the models should not differ in the definition of the STRATA statement or of the INTEGRATE_OUT statement (these are sorting criteria).

Note: the statement READ_BINARY used in previous versions of the Survival Kit is now obsolete: the fact that the file to be read is unformatted is indicated in the parameter file ��� used in COX or WEIBULL.


Chapter �

The parinclu �le

In the parinclu file, parameters are defined which are used for dimensioning of vectors and arrays. They may be redefined to fit your personal needs and computational facilities. Criteria like iteration numbers and stopping rules are defined and may be redefined. Finally, file and directory names may be supplied for redirecting special input and output files. The variables and their settings are explained subsequently. If you make any changes to parinclu, you have to recompile prepare.f, cox.f and weibull.f (or preparec.f and weibullc.f).

c�����������������������������������������������������������������������

c Definition of parameters to be used in the various programs of the �

c SURVIVAL KIT �

c�����������������������������������������������������������������������

IMPLICIT NONE

c

c�����������������������������������������������������������������������

c�����������������������������������������������������������������������

c PARAMETERS THAT MAY BE MODIFIED BY THE USER

c Remember� when you change any of the parameters� you have to recompile

c your programs�

c�����������������������������������������������������������������������

c�����������������������������������������������������������������������

c

c

c Parameters related to the size of your problem

c ����������������������������������������������

c

INTEGER MXSTRA�NTIMMAX�NSTIMAX


INTEGER NRECMAX�NRECMAX��MXEF�MXEF�USED�NDIMAX�NDIMAX�

c

c�� NRECMAX � maximum number of elementary records

c�� �Caution� with time�dependent variables this number may be much larger

c�� than the number of records in your original data file

c

PARAMETER�NRECMAX�������c

c�� NRECMAX� � maximum number of elementary records in the future

c�� data file �FUTURE statement in PREPARE�

c�� These will normally be much fewer than in the data file

c

PARAMETER�NRECMAX�����c

c�� MXEF � max� number of covariates� These are discrete

c�� �CLASS covariates and continuous covariates�

c�� Time dependent covariates have to be counted twice �because

c�� the states before and after change occupy two positions on

c�� the recoded data file

c

PARAMETER�MXEF���c

c�� MXEF�USED � max� number of covariates actually used in the

c�� MODEL statement of COX or WEIBULL�

c�� There may be situations where you create recoded files with

c�� very many covariates with PREPARE and never use all of them

c�� together in COX or WEIBULL� In this case it may make sense

c�� to have MXEF_USED smaller than mxef �it must not be larger

c�� to save core memory space�

c

PARAMETER�MXEF�USED���c

c�� MXSTRA� max� number of strata� i�e� levels of the strata variable

c�� defined in the STRATA statement�

c

PARAMETER�MXSTRA���c

c�� NDIMAX � max� total number of levels of effects in the model

c�� �size of the vector of solutions

c

PARAMETER�NDIMAX����


c

c�� NDIMAX� � this parameter is used to indicate whether the Hessian

c�� matrix is to be stored or not� It is used in COX and �WEIBULL with the

c�� DENSE�HESSIAN statement�

c�� The Hessian is needed to calculate standard deviations of estimated

c�� effects �STD�ERROR statement and to find dependencies �option FIND

c�� in the CONSTRAINT statement�

c�� In WEIBULL� the Hessian is normally stored in sparse form �see

c�� parameters AVAIL and NZE below except when stating DENSE�HESSIAN

c�� ndimax� � ndimax �� Hessian is needed �and fully stored

c�� ndimax� � � �� Hessian is not needed �or sparsely stored

c

c�� vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv

c�� � CAUTION� this is likely to be a limiting parameter for the size �

c�� � of your problem� An incorrect specification of this parameter is �

c�� � also one of the most frequent sources of errors when using the �

c�� � Survival Kit �

c�� vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv

c

c PARAMETER�NDIMAX���

PARAMETER�NDIMAX��NDIMAXc

c�� NTIMMAX � largest possible value of the �dependent time variable

c�� this value is necessary as an upper limit for an efficient

c�� computation of log�time and exp�time and when calculating

c�� functional values of the survivor curve �SURVIVAL statement

c�� of COX and WEIBULL

c

PARAMETER�NTIMMAX�����c

c�� NSTIMAX � maximum number of distinct times or quantiles that

c�� can be defined in the SURVIVAL statement options SPECIFIC

c�� and QUANTILE�

c

PARAMETER�NSTIMAX����c

c�� INTE�ST gives the possibility to integrate out one loggamma

c�� random effect with the option INTEGRATE�OUT ��� WITHIN ���

c�� WHILE a STRATA statement is specified

c�� INTE�ST�� ��� usual case

c�� INTE�ST�� ��� STRATA � INTEGRATE�OUT �� WITHIN ���

c

PARAMETER�INTE�ST��


c

INTEGER BLKSIZE�BLKSIZE��BLKSIZE�

PARAMETER�BLKSIZE���MXEF���NRECMAX

PARAMETER�BLKSIZE��MXEF�USED�INTE�ST�NRECMAX

PARAMETER�BLKSIZE���MXEF���NRECMAX�

c

c Parameters used in the lbfgs maximisation routine

c ��������������������������������������������������

c

INTEGER ITER�BF�MVEC�BF�NO�LOG

REAL�� EPS�BFDEF

c�� MVEC�BF � number of vectors used to store the approximation of the Hessian

c�� �rank of approx� Hessian � MVEC�BF� Values between � and �� might be

c�� used�

c

PARAMETER�MVEC�BF� ��c

c�� EPS�BSDEF � Default convergence criterion� may be overridden using

c�� the CONVERGENCE�CRITERION statement in COX or WEIBULL�

c�� ���GRADIENT���FUNCTION � EPS�BS max�����BETA��

c

PARAMETER�EPS�BFDEF���D��

c

c�� ITER�BF � maximum number of iterations�

c

PARAMETER�ITER�BF� ����c

c�� NO�LOG relates to the parameter rho of the WEIBULL distribution�

c�� no�log � � �� rho is used in the maximisation routine

c�� no�log � � �� log�rho is used in the maximisation routine

c

PARAMETER�NO�LOG��

c

c Parameters used in the numerical integration to get moments of the

c marginal posterior of a random effect variance parameter

c �when the MOMENTS option in the RANDOM statement is used

c�������������������������������������������������������������������

c

INTEGER NPGAUSS�NITER�GAUSS

c

c�� NPGAUSS � number of points used during Gauss�Hermite integration

c �has to be between � and �

c

PARAMETER�NPGAUSS��


c

c�� NITER�GAUSS � number of iterations in the Gauss�Hermite integration

c process

C

PARAMETER�NITER�GAUSS��

c

c

c Parameters relating to sparse storage of the Hessian matrix in WEIBULL

c ����������������������������������������������������������������������

c

INTEGER NDIMSPAR

INTEGER AVAIL�NZE

c

c�� NDIMSPAR � working vector to store intermediate elements used in

c�� the recoding in PREPARE or during the sparse computation of

c�� the HESSIAN in Weibull

c�� this should be a few times bigger than NDIMAX �no general rule

c

PARAMETER�NDIMSPAR�NDIMAX����c

c�� AVAIL � declared dimension of working vector IS �used in fspak�f�

c�� the sparse matrix package of MISZTAL and PEREZ�ENCISO

c�� the required number is a function of the number of nonzero elements

c�� of the HESSIAN matrix

c�� �note� this is the dimension of a VECTOR� not a MATRIX

c

PARAMETER�AVAIL������c

c�� NZE � declared dimension of the vector �HESSIAN� which includes

c�� all nonzero elements of the matrix of second derivatives�

c�� Often� one can take NZE � AVAIL

c�� �note� this is the dimension of a VECTOR� not a MATRIX

c

PARAMETER�NZE������

c

c

c Parameters relating to the binary storage of the initial file when

c�� the statement STORE�BINARY is used

c ����������������������������������������������������������������������

INTEGER NRECBLOC

c

c�� nrecbloc � number of records per block when writing�reading

c�� a binary file for fast I�O �statements STORE�BINARY and

c�� READ�BINARY

c

PARAMETER�NRECBLOC������c

c Parameters related to the parameter file of PREPARE

c ���������������������������������������������������

c

LOGICAL SUPPLY�INPUT�

CHARACTER��� INPUT�FILE�

c

c�� Normally� the name of the parameter file will be requested by

c�� program PREPARE� The following parameters are designed to

c�� circumvent this request and start the program using a defined name

c�� for the parameter file

c

c�� SUPPLY�INPUT� is a logical variable defining the way the name

c�� of the parameter file is supplied� �true� if the name of the input

c�� supply�file��true� �� parameter file is supplied at running time

c�� supply�file��false� �� name of parameter file is defined in parinclu

c

PARAMETER �SUPPLY�INPUT���true�

c

c�� INPUT�FILE� � name of the parameter file �used only if

c�� supply�file� � �false�

c

PARAMETER �INPUT�FILE�� PARM�

c

c

c Parameters related to the parameter and report files in COX and WEIBULL

c �����������������������������������������������������������������������

c

LOGICAL UPPER�CASE�DETAIL�IT

LOGICAL SUPPLY�INPUT��SUPPLY�INPUT��SUPPLY�OUTPUT

CHARACTER��� INPUT�FILE��INPUT�FILE��OUTPUT�FILE

CHARACTER���� TEMPDIR�BACKUP

c

c�� upper�case � �TRUE� if all words �including file names used in the

c�� parameter file are to be converted to upper case �otherwise� the

c�� variables in the statements are case sensitive�

PARAMETER�UPPER�CASE��false�

c


c�� SUPPLY�FILE� � logical variable � �true� if the name of parameter file �

c�� �provided by PREPARE is supplied at running time ��false� if supplied

c�� in parinclu

c

PARAMETER �SUPPLY�INPUT���true�

c

c�� INPUT�FILE� � name of parameter file � �only if supply�file���false�

c

PARAMETER �INPUT�FILE�� para

c

c�� SUPPLY�FILE� � logical variable � �true� if the name of parameter file �

c�� is supplied at running time ��false� if supplied in parinclu

c

PARAMETER �SUPPLY�INPUT���true�

c

c�� INPUT�FILE� � name of parameter file � �used only if supply�file���false�

c

PARAMETER �INPUT�FILE�� PARM�

c

c�� SUPPLY�OUTPUT � logical variable � �true� if the name of the detailed output

c�� �with information about iterations is supplied at running time

c

PARAMETER �SUPPLY�OUTPUT��true�

c

c�� OUTPUT�FILE � name of output file �used only if supply�output��false�

c

PARAMETER �OUTPUT�FILE�

c

c�� TEMPDIR � name of the directory used for temporary files

c�� �� if � current directory

c

PARAMETER �TEMPDIR�

c

c�� BACKUP � name of the file used to backup solutions� Solutions

c�� will be stored after each iteration and may be used to restart

c�� the iteration procedure in case the machine breaks down during

c�� the maximisation �you have to use the READ�OLD�SOLUTIONS statement

c�� of COX or WEIBULL then� Only useful for very large applications�

c�� backup � if no backup file should be produced�

c

PARAMETER �BACKUP� backup


c

c�� DETAIL�IT � switch on �if � �true� or off �if ��false� the

c�� printing of detailed information �for each effect at each iteration

c�� of the maximization subroutine �average�mean or max absolute change

c�� between iterations

PARAMETER �DETAIL�IT��false�

c

c�����������������������������������������������������������������������

c�����������������������������������������������������������������������

c PARAMETERS THAT MAY �NOT� BE MODIFIED BY THE USER

c Don t you dare look beyond this point ��

c�����������������������������������������������������������������������

c�����������������������������������������������������������������������

c

c Parameters related to the reading and translation of parameter files

c ��������������������������������������������������������������������

c

INTEGER MAXWORD�MAXLENG�MAXFIG�LRECL�NBUG�����

INTEGER NKEYFE�NKEYC�NKEYW

c

c�� maxword � max� number of words in the parameter file

c�� maxleng � max� number of characters in a word

c�� maxfig � max� number of figures �i�e� numbers in the parameter file

c�� lrecl � number of columns read per record

c�� nbug����� � maximum year value �yy������nbug����� under which a D� date

c�� is transformed from ddmmyy to ddmm�����yy

C�� to avoid the year ���� bug

c�� parameter files should be automatically transformed to upper�case names

c�� nkeyfe � number of keywords �i�e� statement types used in the

c�� parameter file for PREPARE

c�� nkeyc � number of keywords used in the parameter file for COX

c�� nkeyw � number of keywords used in the parameter file for WEIBULL

PARAMETER�MAXWORD�����

PARAMETER�MAXLENG���

PARAMETER�MAXFIG����

PARAMETER�LRECL���

PARAMETER�NBUG��������

PARAMETER�NKEYFE���

PARAMETER�NKEYC���

PARAMETER�NKEYW���


Chapter �

A small example

The following (completely artificial) example will be used to illustrate how to analyze survival data using the Survival Kit.

� Initial data file small.dat

Consider the data file:

� � � � � � � � ��

� �� � � � � � � ��

� �� � � � � � � ��

� �� � � � �

� �� � � � �

� �� � � � � � � ��

� �� � � � �

� � � � � �

�� � � � �

�� �� � � � � � �� ��

�� �� � � � � � �� ��

As specified in the INPUT line of the data preparation parameter file (small.par, see below), the first variable on each line is the animal identification number (not necessarily from 1 to n, as is the case here). The second represents the failure or censoring time. The third is the censoring code (here, � = censored). Column 4 refers to the level of the sex effect and column 5 to the level of the treatment effect. It is assumed that treatment does not start at time 0 but later on. Hence, the treatment variable is considered as a time-dependent covariate. This explains why the fifth column is always � here: the animal is not treated at t = 0. The next column indicates the number of changes in time-dependent covariates. Here, there is at most 1 change per animal. When there is a change (column 6 = 1), a triplet follows, indicating the column number of the relevant covariate (column 5 = treatment), the time of change (e.g., t = � for record �) and the new value of the covariate (�� here): at time �, the treatment level for animal � changes from the value � to the value ��.
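The record layout described above can be sketched in code. The following Python fragment is only an illustration and is not part of the Survival Kit: the names `RawRecord` and `parse_record` and the sample line are hypothetical, and the layout (five fixed fields, a change count, then one triplet per change) is assumed from the description of small.dat.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RawRecord:
    ident: int    # column 1: animal identification number
    time: int     # column 2: failure or censoring time
    code: int     # column 3: censoring code
    sex: int      # column 4: level of the sex effect
    treat: int    # column 5: level of the treatment effect at the start
    # one (covariate column, time of change, new value) triplet per change
    changes: List[Tuple[int, int, int]]


def parse_record(line: str) -> RawRecord:
    """Split one free-format line into fixed fields and change triplets."""
    fields = [int(tok) for tok in line.split()]
    ident, time, code, sex, treat, n_changes = fields[:6]
    rest = fields[6:]
    if len(rest) != 3 * n_changes:
        raise ValueError("number of triplets does not match the change count")
    changes = [(rest[i], rest[i + 1], rest[i + 2]) for i in range(0, len(rest), 3)]
    return RawRecord(ident, time, code, sex, treat, changes)


# Hypothetical line: animal 3, time 20, code 1, sex 2, untreated at the start,
# one change: covariate in column 5 takes the value 1 at time 8.
print(parse_record("3 20 1 2 0 1 5 8 1"))
```

A record without any change simply ends after the sixth field, in which case `changes` is an empty list.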


� Data preparation parameter file small.par

This is the file that will be used as input for the PREPARE program.

� FILE NAMES �

FILES small�dat small��dat small�nco small�paa

� VARIABLE NAMES �

INPUT id I� longev I� code I� sex I� treat I�

� TIME VARIABLE �

TIME longev

� CENSORING CODE �

CENSCODE code �

� TIME DEPENDENT COVARIATES �

TIMEDEP treat I�

� IDENTIFICATION VARIABLE FOR EACH RECORD �

IDREC id

� DISCRETE VARIABLES �

CLASS treat sex

� DEFINITION OF AN INTERACTION �

COMBINE s�by�t� sex � treat

� LIST OF RECODED VARIABLES �

OUTPUT longev id code treat sex s�by�t

� FORMAT OF THE OUTPUT FILE �

FORMOUT FREE�FORMAT

The treatment variable is explicitly defined as a time-dependent covariate, with changes indicated as integers (I�), not dates. The COMBINE statement defines a new variable, s_by_t, representing the interaction of sex by treatment. The id does not appear in the CLASS statement and therefore will not be recoded.

� Log file from the PREPARE program

The file name small.par is typed when PREPARE asks for an input file. The log file indicates that 17 elementary (recoded) records were created (> 11) because treatment is time dependent. The number of classes is also indicated for the discrete (class) variables.

Records read from input data file ��

Elementary records written ��

Class variable treat with � classes

Class variable sex with � classes

Class variable s�by�t with � classes

REMINDER� File small��dat has to be sorted for the use of COX�

Sorting order�


Strata variable� ascending �only if strata are used�

Time variable� descending

Censoring variable� ascending

� Recoded data set small�.dat

� �� � � � � � �

� � � � � � � �

� �� � � � � � �

�� � � � � � � �

� �� � � � � � �

�� � � � � � � �

�� � � � � � � �

�� � � � � � � �

� �� � � � � � �

�� � � � � � � �

�� � � � � � � �

� � � � � � � �

�� � � � � � �

�� �� �� � � � � �

�� � �� � � � � �

�� �� �� � � � � �

�� � �� � � � � �

The structure of this file is given in the file small.paa, which is an output file of the PREPARE program and an input file for the programs COX and WEIBULL (see below). The first column is the time of failure, of censoring, of change in at least one time-dependent covariate, or of truncation (for truncated records). These four categories are indicated by a code in column 2, with value �, �, � or � respectively. The third column indicates the id number. The remaining ones correspond to the covariates used, individually recoded from 1 to the total number of levels for discrete covariates. When a covariate is time dependent, two columns are used, one corresponding to the value of the covariate before the change and the next one to the value after the change. For example, for animal � (first elementary record), at time �, the level of the treatment effect changes (code = �) from recoded value � to recoded value � (columns 4 and 5), while the level of the s_by_t effect also changes from � to � (columns 7 and 8). At time � (second elementary record), animal � fails (code = �).
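How one initial record turns into several elementary records can be sketched as follows. This is a simplified illustration under stated assumptions, not the PREPARE algorithm itself: the function name `elementary_records` is hypothetical, events are labelled with strings instead of the numeric codes used in the recoded file, levels are left unrecoded, and the final record is assumed to carry the same level in both the "before" and "after" columns.

```python
from typing import List, Tuple

# (time, event, id, level before, level after)
Row = Tuple[int, str, int, int, int]


def elementary_records(ident: int, end_time: int, end_event: str,
                       start_level: int,
                       changes: List[Tuple[int, int]]) -> List[Row]:
    """Cut one animal's history into intervals with a constant covariate level.

    `changes` holds (time of change, new level) pairs for one covariate.
    """
    rows: List[Row] = []
    level = start_level
    for change_time, new_level in sorted(changes):
        # one elementary record per change: level before and level after
        rows.append((change_time, "change", ident, level, new_level))
        level = new_level
    # last elementary record: failure or censoring at the final level
    rows.append((end_time, end_event, ident, level, level))
    return rows


# Hypothetical animal 3: treated from time 8 onwards, fails at time 20.
for row in elementary_records(3, 20, "failure", 0, [(8, 1)]):
    print(row)
```

With one change, a single initial record yields two elementary records, which is why the recoded file is longer than the initial data file.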

� parameter file (�) (for COX and WEIBULL) small.paa

This file is created by the PREPARE program. It explains the structure of the small�.dat recoded file.

TIME �

CODE �

ID �

COVARIATE

� class variables �

treat � � �

sex � � �

s�by�t � � �

FREE�FORMAT


For example, there are 2 levels of treatment, stored in columns 4 (before any change) and 5 (after the change). The sex effect is time independent: only one column (6) is needed.

� file small.nco

This file is also an output file from the PREPARE program. It gives the correspondence between old and new codes.

treat � � � � � �� �

treat � � �� � � � �

sex � � � � � � �

sex � � � � � � �

s�by�t � � � � � � �

s�by�t � � � �� � � �

s�by�t � � � � � � �

s�by�t � � � �� � � �

For each level of a discrete covariate, the name of the covariate (e.g., s_by_t for the last line) is followed by the total number of levels of this effect (4), the number of columns involved in its description (2, because it is a 2-term interaction), and the corresponding original codes (sex � x treatment ��). Up to 3 effects can be combined into an interaction. Column 7 indicates the number of elementary records found with this particular level of the covariate (�) and column 8 (last one) gives the new code (�).

recoded data set for COX small�s.dat

This is the recoded data set small�.dat after sorting by decreasing time (column 1) and increasing code (column 2); no stratification is assumed.

�� � �� � � � � �

�� � �� � � � � �

�� � � � � � �

� � � � � � � �

�� � � � � � � �

�� �� �� � � � � �

�� �� �� � � � � �

�� � � � � � � �

�� � � � � � � �

�� � � � � � � �

�� � � � � � � �

�� � � � � � � �

� �� � � � � � �

� �� � � � � � �

� �� � � � � � �

� � � � � � � �

� �� � � � � � �
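The sorting order required by COX (time descending, then code ascending, when no strata are used) can be expressed as a two-part sort key. The sketch below is a hypothetical illustration: the function name `sort_for_cox`, the numeric codes and the sample rows are assumptions chosen for the example and are not taken from small�.dat; in practice the file is usually sorted with a system sort utility.

```python
from typing import List, Sequence


def sort_for_cox(rows: List[Sequence[int]]) -> List[Sequence[int]]:
    """Sort elementary records by decreasing time (column 1),
    then by increasing code (column 2)."""
    return sorted(rows, key=lambda r: (-r[0], r[1]))


# Hypothetical elementary records: (time, code, id)
rows = [(8, -1, 3), (20, 1, 3), (20, 0, 5), (12, 1, 4)]
for row in sort_for_cox(rows):
    print(row)
```

If strata were used, the stratum variable would come first in the key, in ascending order, as stated in the PREPARE reminder.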

� parameter file (�) for COX


� FILES �

FILES small�s�dat small�nco small�rco

� TITLE �

TITLE SMALL EXAMPLE � COX MODEL

� MODEL �

MODEL sex treat

� TEST �

TEST SEQUENTIAL LAST

� STANDARD ERROR �

STD�ERROR

� OTHER COMPUTATIONS �

BASELINE

RESIDUAL small��dat small�res

The input data file is the sorted, recoded data file small�s.dat. The model simply includes sex and treatment effects (no interaction here). The results will be stored in small.rco. Likelihood ratio tests will be performed, comparing models with no covariates, with sex only, and with sex and treatment (TEST SEQUENTIAL), and comparing the full model with restricted models (excluding sex = with treatment only, and excluding treatment = with sex only; TEST LAST). After computing the regression coefficients using Cox's partial likelihood, the baseline will be computed (BASELINE), and generalized residuals will be calculated for all records in the unsorted file small�.dat and stored in the small.res file.

� log file from COX

This file appears on the screen (terminal) if the answer 'NONE' or ' ' is given when a name for the detailed output file is requested.

The file starts with the names of the 2 input parameter files and other file names, followed by a brief list of characteristics of the data set and of the model, and some elementary statistics.

parameter file� small�paa

parameter file� small�cox

�����������������������������������������

SMALL EXAMPLE � COX MODEL

�����������������������������������������

FILES USED �

RECODED DATASET � small�s�dat

NEWOLD CODES � small�nco

OUTPUT � small�rco


RESIDUALS �

RECODED DATASET � small��dat

OUTPUT � small�res

�����������������������������������������

NO STRATIFICATION

FAILURE TIME READ IN COLUMN � �

CENSORING CODE READ IN COLUMN � �

IDENTIFICATION NUMBER READ IN COLUMN � �

TOTAL NUMBER OF COVARIATES � �

CONTINUOUS COVARIATES INCLUDED IN THE ANALYSIS � �

DISCRETE COVARIATES INCLUDED IN THE ANALYSIS � �

COVARIATES READ BUT NOT INCLUDED IN THE ANALYSIS � �

POWERS OF CONTINUOUS COVARIATES � �

�������������� COX MODEL ����������������

COVARIATE CHARACTERISTICS

� sex � DISCRETE TIME�INDEPENDENT READ IN COLUMN �

� treat � DISCRETE TIME�DEPENDENT READ IN COLUMNS � �BEFORE T� AND � �AFTER T�

CONVERGENCE CRITERION USED � ������D��� �� DEFAULT VALUE�

���������� SIMPLE STATISTICS ������������

TOTAL NUMBER OF STRATA � �

TOTAL NUMBER OF ELEMENTARY RECORDS � ��

RIGHT CENSORED RECORDS � � � ��������

MINIMUM CENSORING TIME � ��

MAXIMUM CENSORING TIME � ��

AVERAGE CENSORING TIME � ������

UNCENSORED RECORDS �

MINIMUM FAILURE TIME � �

MAXIMUM FAILURE TIME � ��

AVERAGE FAILURE TIME � ������

EFFECT � sex MIN � � MAX � �

EFFECT � treat MIN � � MAX � �

Then the first few recoded elementary records are printed, as well as the constraints used (here the COX program chose to set to zero level � of sex and level �� of treatment, because they are the levels with the largest number of uncensored observations).

DATA �

� � �� � � � � �

� � �� � � � � �

� � �� � � � � �

� � � � � � � �

� � �� � � � � �

� � �� �� � � � �

� � �� �� � � � �


� � �� � � � � �

� �� � � � � �

NUMBER OF ELEMENTARY RECORDS KEPT � ��

������� CONSTRAINTS ������

THE SOLUTION FOR THE FOLLOWING �RECODED� LEVELS IS SET TO ��

�� LEVEL WITH LARGEST NUMBER OF UNCENSORED FAILURES FOR EACH EFFECT�

�WARNING � THE VALIDITY OF THESE CONSTRAINTS IS NOT CHECKED�

IN CASE THEY ARE MORE DEPENDENCIES�

THE DEGREES OF FREEDOM IN THE TEST�S� BELOW ARE INCORRECT�

EFFECT sex LEVEL �

EFFECT treat LEVEL ��

Then the limited-storage quasi-Newton (Liu and Nocedal, 1989) minimization of FUNCT (= minus the likelihood function) starts for each (sub-)model. GNORM is the norm of the vector of first derivatives of FUNCT. CONV CRITERION is equal to:

GNORM / max(1, FUNCT)

Let CC be the requested convergence criterion. The program stops the minimization process in the following situations:

• CONV CRITERION is less than ��� � CC.

• CONV CRITERION is less than CC and the total number of FUNCT evaluations is at least equal to the number of corrections (reported as NUMBER OF CORRECTIONS below).

• After many attempts (usually ��), the program fails to find a smaller value of FUNCT. This is very often due to too strict a convergence criterion. A careful examination of the variation of the FUNCT value during the last iterations will help to reveal that. If FUNCT is still varying substantially, the program failed to find a proper minimum (often because the model is incorrect or inconsistent). If FUNCT is stable, one can safely assume that the results are correct and that a minimum has been found.

DATA �

�������������������������������������������������

N� � NUMBER OF CORRECTIONS���

INITIAL VALUES

F� �����D��� GNORM� ����D���

�������������������������������������������������

I NFN FUNCT GNORM STEPLENGTH CONV� CRITERION

� � ����������D��� �����D��� ����D��� ����D���

� � ����������D��� �����D��� �����D��� �����D���

� � ����������D��� �����D��� �����D��� �����D���

� � ����������D��� ����D��� �����D��� �����D���

� � ����������D��� ����D��� �����D��� �����D��


� � ����������D��� ����D��� �����D��� ����D���

THE MINIMIZATION TERMINATED WITHOUT DETECTING ERRORS�

CONVERGENCE AFTER � EVALUATIONS OF FUNCTION FCOX IN ��� SECONDS

�������������������������������������������������

N� � NUMBER OF CORRECTIONS���

INITIAL VALUES

F� �����D��� GNORM� �����D���

�������������������������������������������������

I NFN FUNCT GNORM STEPLENGTH CONV� CRITERION

� � �����������D��� �����D��� �����D��� �����D���

� � �����������D��� ����D��� �����D��� �����D���

� � �����������D��� ���D��� �����D��� �����D���

� � �����������D��� �����D��� �����D��� �����D���

THE MINIMIZATION TERMINATED WITHOUT DETECTING ERRORS�

CONVERGENCE AFTER � EVALUATIONS OF FUNCTION FCOX IN ��� SECONDS

�������������������������������������������������

N� � NUMBER OF CORRECTIONS���

INITIAL VALUES

F� �����D��� GNORM� �����D���

�������������������������������������������������

I NFN FUNCT GNORM STEPLENGTH CONV� CRITERION

� � �����������D��� �����D��� ����D��� �����D���

� � ����������D��� �����D��� �����D��� �����D���

� � ���������D��� �����D��� �����D��� �����D���

� � ���������D��� �����D��� �����D��� ����D���

� � ���������D��� �����D��� �����D��� �����D���

� � ���������D��� ����D��� �����D��� ���D���

� � ���������D��� ����D��� �����D��� �����D��

� ���������D��� �����D��� �����D��� ����D���

THE MINIMIZATION TERMINATED WITHOUT DETECTING ERRORS�

CONVERGENCE AFTER � EVALUATIONS OF FUNCTION FCOX IN ��� SECONDS

VALUE OF F � �����������

TIME FOR THE COMPUTATION OF THE HESSIAN � ��� SECONDS

TIME FOR NUMERICAL FACTORIZATION � ��� SECONDS

TOTAL TIME � ��� SECONDS

������� BYE � BYE ��������

�� Results from COX

The beginning is exactly the same as the log file (file names, model characteristics, simple statistics, constraints).

�����������������������������������������


SMALL EXAMPLE � COX MODEL

�����������������������������������������

FILES USED �

RECODED DATASET � small�s�dat

NEWOLD CODES � small�nco

OUTPUT � small�rco

RESIDUALS �

RECODED DATASET � small��dat

OUTPUT � small�res

�����������������������������������������

NO STRATIFICATION

FAILURE TIME READ IN COLUMN � �

CENSORING CODE READ IN COLUMN � �

IDENTIFICATION NUMBER READ IN COLUMN � �

TOTAL NUMBER OF COVARIATES � �

CONTINUOUS COVARIATES INCLUDED IN THE ANALYSIS � �

DISCRETE COVARIATES INCLUDED IN THE ANALYSIS � �

COVARIATES READ BUT NOT INCLUDED IN THE ANALYSIS � �

POWERS OF CONTINUOUS COVARIATES � �

�������������� COX MODEL ����������������

COVARIATE CHARACTERISTICS

� sex � DISCRETE TIME�INDEPENDENT READ IN COLUMN �

� treat � DISCRETE TIME�DEPENDENT READ IN COLUMNS � �BEFORE T� AND � �AFTER T�

���������� SIMPLE STATISTICS ������������

TOTAL NUMBER OF STRATA � �

TOTAL NUMBER OF ELEMENTARY RECORDS � ��

RIGHT CENSORED RECORDS � � � ��������

MINIMUM CENSORING TIME � ��

MAXIMUM CENSORING TIME � ��

AVERAGE CENSORING TIME � ������

UNCENSORED RECORDS �

MINIMUM FAILURE TIME � �

MAXIMUM FAILURE TIME � ��

AVERAGE FAILURE TIME � ������

EFFECT � sex MIN � � MAX � �

EFFECT � treat MIN � � MAX � �

NUMBER OF ELEMENTARY RECORDS KEPT � ��

������� CONSTRAINTS ������

THE SOLUTION FOR THE FOLLOWING �RECODED� LEVELS IS SET TO ��

�� LEVEL WITH LARGEST NUMBER OF UNCENSORED FAILURES FOR EACH EFFECT�

�WARNING � THE VALIDITY OF THESE CONSTRAINTS IS NOT CHECKED�

IN CASE THEY ARE MORE DEPENDENCIES�


THE DEGREES OF FREEDOM IN THE TEST�S� BELOW ARE INCORRECT�

EFFECT sex LEVEL �

EFFECT treat LEVEL ��

CONVERGENCE AFTER � EVALUATIONS OF FUNCTION FCOX IN ��� SECONDS

CONVERGENCE AFTER � EVALUATIONS OF FUNCTION FCOX IN ��� SECONDS

CONVERGENCE AFTER � EVALUATIONS OF FUNCTION FCOX IN ��� SECONDS

The first part of the results includes all the likelihood ratio tests:

������� RESULTS ��������

�� LOG LIKELIHOOD ����������

STANDARDIZED NORM OF GRAD��� LOG L� � ������

������� TESTS ��������

LIKELIHOOD RATIO TEST FOR H�� BETA � �

MODEL CHI� � �������� WITH � DF �PROB �CHI� � ������

R� OF MADDALA ��MEASURE OF EXPLAINED VARIATION� � ����

LIKELIHOOD RATIO TESTS � SEQUENTIAL

VARIABLE TOTAL �� LOG LIK CHI� DELTA PROB R� OF

Z DF INCLUDING Z DF �CHI� MADDALA

NO COVARIATE � ���������

sex � ����������� ����� � ����� �����

treat � ���������� ����� � ����� ����

LIKELIHOOD RATIO TESTS � LAST

VARIABLE TOTAL �� LOG LIK CHI� DELTA PROB R� OF

Z DF EXCLUDING Z DF �CHI� MADDALA

sex � ����������� ������ � ����� �����

treat � ����������� ����� � ����� �����

For example, when the treatment effect is added to a model with sex as the only covariate, the full model includes sex + treatment while the restricted model includes sex only, and the corresponding likelihood ratio statistic has a value of �����. This statistic follows a chi-square distribution with 1 degree of freedom under H0 (treatment effects = 0), with a corresponding p-value of ��. The R2 OF MADDALA is a measure of the proportion of variation explained by the model (Maddala, ����, quoted by Schemper, ����). Its formal definition is:

$$ R^2_M = 1 - \left( \frac{L_R}{L_U} \right)^{2/n} $$

where n denotes the total sample size and $L_R$ and $L_U$ represent the restricted and unrestricted (full model) maximum likelihoods, respectively. Under moderate censoring, $R^2_M$ can be roughly interpreted as the usual $R^2$ for linear models.
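Since the output reports -2 LOG LIK rather than the likelihoods themselves, it can help to rewrite the definition in those terms: (L_R/L_U)^(2/n) = exp(-chi2/n), where chi2 is the likelihood ratio statistic. The short Python sketch below applies this identity; the function names and the numerical values are hypothetical and are not taken from the output above.

```python
import math


def lr_statistic(m2ll_restricted: float, m2ll_full: float) -> float:
    """Likelihood ratio chi-square: difference of the two -2 log L values."""
    return m2ll_restricted - m2ll_full


def maddala_r2(m2ll_restricted: float, m2ll_full: float, n: int) -> float:
    """R2 of Maddala, 1 - (L_R / L_U)**(2/n), computed from -2 log L values."""
    chi2 = lr_statistic(m2ll_restricted, m2ll_full)
    return 1.0 - math.exp(-chi2 / n)


# Hypothetical values: -2 log L of 50.0 (restricted) and 46.0 (full), n = 11.
print(lr_statistic(50.0, 46.0))
print(maddala_r2(50.0, 46.0, 11))
```

When the two models have the same likelihood, the statistic is 0 and so is the R2; the R2 increases with the likelihood ratio statistic for a fixed n.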


Then the estimates of the regression coefficients are presented, followed by their asymptotic standard error, a chi-square statistic (= square of estimate/standard error) for a Wald test of each particular regression coefficient β_i (which tests β_i = 0), with its associated p-value. The risk ratio is the exponential of the estimate of the regression coefficient. The last column indicates the number of uncensored records for each level (it is related to the standard error of the estimate).

������� SOLUTIONS ��������

COVARIATE � ESTIMATE STANDARD CHI� PROB RISK UNCENSORED

ERROR �CHI� RATIO FAILURES

� sex �DISCRETE�

� ����� � � � ����� �

� ������� ������ ���� ���� ���� �

� treat �DISCRETE�

� ������� ���� ��� ����� ���� �

�� ����� � � � ����� �

For example, females (sex = �) are at a ������� � �� lower risk of dying than males (sex = �).
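The columns of the SOLUTIONS table are linked by simple formulas, sketched below in Python. This is an illustration only: the function name `wald_summary` and the input values are hypothetical, and the p-value is computed for a chi-square distribution with 1 degree of freedom through the complementary error function, which is an assumption about how PROB >CHI2 is obtained.

```python
import math
from typing import Tuple


def wald_summary(estimate: float, std_error: float) -> Tuple[float, float, float]:
    """Return (chi-square, p-value, risk ratio) for one regression coefficient."""
    chi2 = (estimate / std_error) ** 2
    # upper tail probability of a chi-square with 1 df: P(|Z| > sqrt(chi2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    risk_ratio = math.exp(estimate)
    return chi2, p_value, risk_ratio


# Hypothetical coefficient: estimate -0.693 with standard error 0.5.
chi2, p_value, risk_ratio = wald_summary(-0.693, 0.5)
print(chi2, p_value, risk_ratio)
```

A risk ratio of about 0.5 would be read as a risk roughly 50% lower than that of the reference level, whose estimate is constrained to 0 and whose risk ratio is therefore 1.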

For ease of interpretation, it is a good idea to impose constraints to force the computation of specific contrasts. For example, one may be interested in evaluating the increase in risk of males compared to females and of treated compared to non-treated animals, by choosing non-treated females as a reference. This is done by using the CONSTRAINTS IMPOSE ... statement in the parameter file small.cox. To define the proper constraints, one has to look in the small.nco file: sex � (old code) is recoded as � (new code) and treatment � (old code) is recoded as � (new code). If we add:

CONSTRAINTS IMPOSE sex � treat ��

to small.cox and run COX, we now get:

������� SOLUTIONS ��������

COVARIATE � ESTIMATE STANDARD CHI� PROB RISK UNCENSORED

ERROR �CHI� RATIO FAILURES

� sex �DISCRETE�

� ������ ������ ���� ���� ����� �

� ����� � � � ����� �

� treat �DISCRETE�

� ����� � � � ����� �

�� ������ ���� ��� ����� ����� �


Now it is directly apparent that males are � times more likely to fail than females and that treated animals are at a ��� times higher risk than non-treated ones. The contrasts between males and females, or between treated and non-treated animals, remain the same. The baseline, however, is changed. The one reported below corresponds to the situation without explicit constraints.

������� BASELINE ��������

CUMULATIVE HAZARD FUNCTION AND SURVIVOR FUNCTION FOR STRATUM �

TIME CUMULATIVE SURVIVOR

HAZARD FUNCTION

� ������ ������

�� ������ ������

�� ������� ������

�� ������ �����

�� ������� ������

� ������ ������

�� ������� ������

�� �������� ������

TOTAL TIME � ��� SECONDS

�� Residuals for the Cox model

The generalized residuals of Cox and Snell (����) are calculated for all records of the unsorted small�.dat file. If the model is correct, they are distributed according to a censored exponential distribution with parameter 1. This can be checked globally by plotting the cumulated hazard distribution (-log S(RES)) or the expected order statistics of the unit exponential distribution against the value of RESIDUALS. This should result in a straight line going through the origin with slope 1. Note that expected order statistics are computed only when STORAGE IN CORE is used.

ANIMALS TOTAL FAILURE RESIDUALS S�RES� �LOG S�RES� EXPECTED

AT RISK FAILED ORDER STAT�

�� � � �������� ���� ������� �����

� � ������� �������� ������� ��������

� � � ������� �������� �������� ��������

� � � �������� �������� �������� �������

� � � �������� �������� ������� ��������

� � � ������� �������� ������� ��������

� � � ��������� ������� ��������� ��������

� � � ������� �������� ��������� ��������

� � ��������� �������� INF ��������

It is also possible to get individual residuals, i.e., for each observation, including martingale and deviance residuals, by simply specifying STORAGE ON DISK in the small.cox file.


TIME ANIMAL CENSORING STRATUM GENERALIZED MARTINGALE DEVIANCE

CODE RESIDUAL RESIDUAL RESIDUAL

� � � � �������� �������� �������

�� � � � ������� ������� ��������

�� � � � ������� �������� ������

�� � � � �������� ��������� ���������

�� � � � �������� ����� ���������

�� � � � �������� �������� �������

�� � � � ��������� ��������� ��������

� � � � ������� ������� ��������

�� � � ������� ������� �������

�� �� � � ��������� ���������� ���������

�� �� � � ��������� ���������� ��������
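The three residual columns above are related. The manual does not give the formulas, so the sketch below uses the textbook definitions and assumes, without confirmation from the source, that the program follows them: the martingale residual is the failure indicator minus the generalized (Cox and Snell) residual, and the deviance residual is a signed square-root transform of the martingale residual. The function names and the 1 = failed, 0 = censored convention for `delta` are hypothetical.

```python
import math


def martingale_residual(delta: int, cox_snell: float) -> float:
    """delta = 1 for a failure, 0 for a censored record (assumed convention)."""
    return delta - cox_snell


def deviance_residual(delta: int, cox_snell: float) -> float:
    """sign(M) * sqrt(-2 * (M + delta * log(delta - M))), with delta - M = cox_snell."""
    m = martingale_residual(delta, cox_snell)
    log_term = math.log(cox_snell) if delta == 1 else 0.0
    inner = -2.0 * (m + log_term)
    # guard against tiny negative values caused by rounding
    return math.copysign(math.sqrt(max(inner, 0.0)), m)


# Hypothetical generalized residual of 0.5 for a censored and a failed record.
print(martingale_residual(0, 0.5), deviance_residual(0, 0.5))
print(martingale_residual(1, 0.5), deviance_residual(1, 0.5))
```

Under these definitions, martingale residuals are at most 1 and are always negative or zero for censored records, while deviance residuals are more symmetrically distributed around 0.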

�� parameter file (�) for Weibull

The following parameter file small.wei now replaces small.cox:

� FILES �

FILES small��dat small�nco small�rwe

� TITLE �

TITLE SMALL EXAMPLE � WEIBULL MODEL

� MODEL �

MODEL sex treat

STATISTICS

� TEST �

TEST SEQUENTIAL LAST

� STANDARD ERROR �

STD�ERROR

�� results from WEIBULL

The beginning is similar to the Cox outputs, except that here, raw statistics by level of each discrete factor included in the model are given:

�����������������������������������������

SMALL EXAMPLE � WEIBULL MODEL

�����������������������������������������

FILES USED�

RECODED DATASET � small��dat

NEWOLD CODES � small�nco

OUTPUT � small�rwe

�����������������������������������������

FAILURE TIME READ IN COLUMN � �

CENSORING CODE READ IN COLUMN � �


IDENTIFICATION NUMBER READ IN COLUMN � �

TOTAL NUMBER OF COVARIATES � �

CONTINUOUS COVARIATES INCLUDED IN THE ANALYSIS � �

DISCRETE COVARIATES INCLUDED IN THE ANALYSIS � �

COVARIATES READ BUT NOT INCLUDED IN THE ANALYSIS � �

POWERS OF CONTINUOUS COVARIATES � �

���������������� MODEL ������������������

COVARIATE CHARACTERISTICS

� sex � DISCRETE TIME�INDEPENDENT READ IN COLUMN �

� treat � DISCRETE TIME�DEPENDENT READ IN COLUMNS � AND �

���������� SIMPLE STATISTICS ������������

TOTAL NUMBER OF WEIBULL PARAMETERS TO COMPUTE � �

TOTAL NUMBER OF RECORDS � ��

TOTAL NUMBER OF ELEMENTARY RECORDS � ��

RIGHT CENSORED RECORDS � � � ��������

MINIMUM CENSORING TIME � ��

MAXIMUM CENSORING TIME � ��

AVERAGE CENSORING TIME � ������

UNCENSORED RECORDS �

MINIMUM FAILURE TIME � �

MAXIMUM FAILURE TIME � ��

AVERAGE FAILURE TIME � ������

EFFECT � sex MIN � � MAX � �

EFFECT � treat MIN � � MAX � �

�����������

� WEIBULL PARAMETER�S� RHO WILL BE ESTIMATED

CONVERGENCE CRITERION USED � ������D��� �� DEFAULT VALUE�

NUMBER OF ELEMENTARY RECORDS KEPT � ��

���������� STATISTICS LEVEL ������������

EFFECT NUMBER � OBSERVED � AGE AT TOTAL � AVERAGE

OF FAILURES FAILURE TIME TIME

INDIVIDUALS

� sex

� � ����� � ����� ����� ��� ����� �����

� � ����� � ����� ���� ���� ����� �����

� treat

� �� ������ � ����� ����� ���� ����� �����

�� � ����� � ����� ����� ��� ���� ����

Note that these statistics should be interpreted with caution for time-dependent covariates: for example, here, 11 individuals had a record with level of treatment effect = � (all animals at t = 0), while 6 of them also had a record with level of treatment effect = ��.

������� CONSTRAINTS ������

THE SOLUTION FOR THE FOLLOWING �RECODED� LEVELS IS SET TO ��

�� LEVEL WITH LARGEST NUMBER OF UNCENSORED FAILURES FOR EACH EFFECT�

�WARNING � THE VALIDITY OF THESE CONSTRAINTS IS NOT CHECKED�

IF THEY ARE MORE DEPENDENCIES� DEGREES OF FREEDOM IN TEST�S� BELOW ARE INCORRECT�

EFFECT sex LEVEL �

EFFECT treat LEVEL ��

CONVERGENCE AFTER � EVALUATIONS OF FUNCTION FWEIB IN ��� SECONDS

CONVERGENCE AFTER �� EVALUATIONS OF FUNCTION FWEIB IN ��� SECONDS

CONVERGENCE AFTER �� EVALUATIONS OF FUNCTION FWEIB IN ��� SECONDS

������� RESULTS ��������

�� LOG LIKELIHOOD ����������

STANDARDIZED NORM OF GRAD��� LOG L� � ������

������� TESTS ��������

LIKELIHOOD RATIO TEST FOR H�� BETA � �

MODEL CHI� � ������� WITH � DF �PROB �CHI� � ������

R� OF MADDALA ��MEASURE OF EXPLAINED VARIATION� � �����

LIKELIHOOD RATIO TESTS � SEQUENTIAL

VARIABLE TOTAL �� LOG LIK CHI� DELTA PROB R� OF

Z DF INCLUDING Z DF �CHI� MADDALA

NO COVARIATE � �����������

sex � ���������� ���� � ����� �����

treat � ���������� ������ � ����� �����

LIKELIHOOD RATIO TESTS � LAST

VARIABLE TOTAL �� LOG LIK CHI� DELTA PROB R� OF

Z DF EXCLUDING Z DF �CHI� MADDALA

sex � ����������� ������ � ����� �����

treat � ���������� ������ � ����� �����

The main difference in WEIBULL compared to COX is the inclusion of estimates of the Weibull parameter $\rho$ and $\rho \log \lambda$ (called INTERCEPT here). These two parameters fully describe the baseline hazard function and the baseline survivor function $S_0(t) = \exp\{-(\lambda t)^{\rho}\}$.
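As a rough illustration of how the two reported parameters determine the baseline, the sketch below recovers $\lambda$ from RHO and INTERCEPT and evaluates the survivor function given above. The hazard expression is obtained by differentiating $-\log S_0(t)$. The function name and the numerical values are hypothetical and are not taken from the listing.

```python
import math

def weibull_baseline(rho, intercept):
    """Return (hazard, survivor) functions for a Weibull baseline.

    Assumes intercept = rho * log(lambda), as described in the text.
    """
    lam = math.exp(intercept / rho)

    def hazard(t):
        # h0(t) = -d/dt log S0(t) = lambda * rho * (lambda * t)**(rho - 1)
        return lam * rho * (lam * t) ** (rho - 1.0)

    def survivor(t):
        # S0(t) = exp{-(lambda * t)**rho}
        return math.exp(-((lam * t) ** rho))

    return hazard, survivor

# Hypothetical values, chosen only to exercise the functions.
hazard, survivor = weibull_baseline(rho=2.0, intercept=-6.0)
```

With `rho` equal to 1 the hazard is constant (exponential baseline); values above 1 give a hazard that increases with time.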

������� SOLUTIONS ��������

WEIBULL PARAMETER�S��


FOR STRATUM � ��� RHO � ������� STE � �����

INTERCEPT � �������� STE � �������

COVARIATE � ESTIMATE STANDARD CHI� PROB RISK UNCENSORED

ERROR �CHI� RATIO FAILURES

� sex �DISCRETE�

� ����� � � � ����� �

� ������ ������ ��� ����� ���� �

� treat �DISCRETE�

� ������ ����� ���� ����� ���� �

�� ����� � � � ����� �

TOTAL TIME � ��� SECONDS


Chapter �

Analysis of very large data sets

This section includes some advice on how to analyze very large data sets efficiently.

�� PREPARE program

Usually, computing time and memory requirements related to the PREPARE program are not limiting factors. The only potential problem that may occur is related to the size of the output (recoded) data file, because the use of time-dependent covariates may lead to a much larger number of elementary records (corresponding to each time interval for which all time-dependent covariates remain constant) than the actual number of records in the initial data set. In order to limit the size of the recoded data set:

• in the OUTPUT statement, specify only the covariates that you will need later;

• because (at this stage) the PREPARE program requires all main effects to be included in the OUTPUT statement when an interaction is created with the COMBINE statement, it may be advantageous to recode the interactions beforehand, if the main effects are not to be used later. In contrast, if the interaction effect is going to be treated as a random, log-gamma effect that will be integrated out in a Weibull model, it is possible to avoid the recoding of the interaction in PREPARE if INTEGRATE OUT ... WITHIN ... is used later on;

• for extremely large applications, the program PREPAREC with the statement FORMOUT COMPRESSED should be preferred, as the output (recoded) file will be written in compressed format, using C subroutines. In some applications, the compressed file may be �� to �� times smaller but computing times are typically increased by a factor �� to �.


Also, unless an individual random effect is to be fitted later on, the ID number should not be recoded, i.e., it should not appear in the CLASS statement (only in the IDREC and OUTPUT statements).
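To see why time-dependent covariates inflate the recoded file, the following sketch splits one individual's follow-up into intervals over which a single covariate stays constant. It is only an illustration of the idea of an elementary record: the tuple layout (start, end, level, failed) and the function name are hypothetical and do not reproduce the actual format written by PREPARE.

```python
def elementary_records(end_time, failed, changes, start_time=0):
    """Split one follow-up period into elementary records.

    changes: list of (time, level) pairs; the first pair gives the level
    at start_time, later pairs give the times at which the level changes.
    Returns (start, end, level, failed) tuples; only the interval that
    ends at end_time can carry the failure flag.
    """
    changes = sorted(changes)
    records = []
    for i, (t, level) in enumerate(changes):
        if t >= end_time:
            break  # changes after the end of follow-up are irrelevant
        next_change = changes[i + 1][0] if i + 1 < len(changes) else end_time
        stop = min(next_change, end_time)
        records.append((max(t, start_time), stop, level,
                        failed and stop == end_time))
    return records

# One initial record with two changes of level becomes three elementary records.
recs = elementary_records(end_time=10, failed=True,
                          changes=[(0, 1), (4, 2), (7, 1)])
```

With several time-dependent covariates, a new elementary record starts whenever any one of them changes, so the recoded file can grow much faster than the number of individuals.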

�� Before running the COX or WEIBULL programs

• In order to limit memory requirements, it is essential to accurately specify some parameters in the parinclu file. In particular:

- NRECMAX (maximum number of elementary records stored in core at one time). This number should be relatively large (in particular, it should be larger than or equal to the exact number of elementary records if the statement STORAGE IN CORE is used). However, for obvious reasons, this number should not exceed the memory capacity.

- MXEF USED (maximum number of covariates actually used in the MODEL statement of COX or WEIBULL). Often, the recoded files created with PREPARE contain many variables but these are never actually used together in COX or WEIBULL. In such a case, MXEF USED should be equal to the strict maximum number of effects fitted together.

• For fast input/output:

- I/O operations on free formatted files are relatively slow. In order to read and write the data file efficiently, the options UNFORMATTED or (even better) BLOCKED UNFORMATTED should have been used in PREPARE, or the STORE BINARY statement should be used the first time COX or WEIBULL is run. In the latter case, the recoded data file is copied as a blocked binary file (and the format in the first parameter file must be changed to BLOCKED UNFORMATTED). NRECBLOC records are written and read together. This parameter is specified in the parinclu file. It should be rather large (several thousands) in order to play the role of a buffer, but not unrealistically large (e.g., NRECBLOC = ������ when the actual number of records is ������ will lead to the writing and reading of two blocks of size ������).

- When the STORAGE ON DISK statement is used, a blocked binary temporary file is created with NRECMAX records per block. Again, NRECMAX should be large but not too large. If space is limiting on the directory where the programs are run, the name of a "work" directory should be specified in the parinclu file (parameter TEMPDIR).
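The trade-off in choosing a block size can be checked with simple arithmetic. The sketch below assumes that every block is written and read at full size, as the example in the text suggests. The function name and the record counts are hypothetical.

```python
def blocked_io_cost(n_records, nrecbloc):
    """Return (number of blocks, record slots transferred, unused slots).

    Assumes every block is written and read at full size.
    """
    n_blocks = -(-n_records // nrecbloc)  # integer ceiling division
    slots = n_blocks * nrecbloc
    return n_blocks, slots, slots - n_records

# Hypothetical figures: the data set is one record longer than the block.
too_large = blocked_io_cost(100_001, 100_000)
moderate = blocked_io_cost(100_001, 5_000)
print(too_large)  # (2, 200000, 99999)
print(moderate)   # (21, 105000, 4999)
```

Under this assumption, a block size just below the number of records nearly doubles the volume transferred, whereas a block of a few thousand records wastes very little.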


�� COX and WEIBULL programs

• If the assumption of a Weibull baseline hazard function is reasonable, it is highly recommended to use a Weibull model instead of a Cox model, in particular when standard errors or distribution parameters of random effects are requested.

• In order to decrease the size of the vectors of parameters:

- Use stratification if possible (STRATA statement).

- With WEIBULL, when a random effect is included in the model whose estimates are not really needed, a loggamma prior should be used if possible, and the effect should be integrated out (INTEGRATE OUT statement); this is analogous to the absorption of the equations corresponding to the specified effect in a linear system of equations. Note that, as with an "absorption", the resulting Hessian matrix is no longer very sparse. Hence, the DENSE HESSIAN statement should also be used.

• To limit the number of iterations or function evaluations required:

- Store solutions at the end of each run (STORE SOLUTIONS statement) or use backup files (BACKUP parameter in the parinclu file). Start the next run from the old solutions (READ OLD SOLUTIONS statement).

- If only a few effects are added to a "well known" model, don't use the time-consuming TEST SEQUENTIAL or TEST LAST statements, which specify that all the effects in the model should be tested, starting from scratch. Use the TEST [SEQUENTIAL] EFFECTS ... statement instead.

- Check that the convergence criterion is not too strict, by looking at the evolution of -Log likelihood, the function which is minimized to get the parameter estimates (CONVERGENCE statement). Be careful, however: an incorrect convergence criterion may lead to incorrect results.

- With WEIBULL, the use of the RHO FIXED statement should lead to a faster convergence rate.

• When estimating the hyperparameter(s) of the distribution(s) of random effects:

- When a loggamma prior is used with the INTEGRATE OUT statement, the JOINT MODE option (which computes the gamma parameter jointly with the other effects) should be preferred (faster convergence).

- Don't choose huge initial intervals or too strict final tolerances with the ESTIMATE statement.

- Use the MOMENTS statement only if necessary, or at least after a run to find the mode of the marginal posterior distribution of the hyperparameter.
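The remark that absorption destroys sparsity can be illustrated with a generic linear-algebra sketch. This is not the algorithm used by WEIBULL; the matrix and the function name are hypothetical. Absorbing one equation that is linked to all the others replaces the remaining block A by A - c c'/d, which fills in every off-diagonal element.

```python
def absorb_last(matrix):
    """Absorb the last equation of a symmetric system (Schur complement).

    The input is partitioned as [[A, c], [c', d]]; the function returns
    the reduced matrix A - c c' / d.
    """
    n = len(matrix) - 1
    d = matrix[n][n]
    return [[matrix[i][j] - matrix[i][n] * matrix[n][j] / d
             for j in range(n)] for i in range(n)]

# Hypothetical "arrow" matrix: sparse except for its last row and column.
arrow = [
    [4.0, 0.0, 0.0, 1.0],
    [0.0, 5.0, 0.0, 2.0],
    [0.0, 0.0, 6.0, 3.0],
    [1.0, 2.0, 3.0, 10.0],
]
reduced = absorb_last(arrow)

# Count zero elements in the leading 3 x 3 block before and after absorption.
zeros_before = sum(arrow[i][j] == 0.0 for i in range(3) for j in range(3))
zeros_after = sum(reduced[i][j] == 0.0 for i in range(3) for j in range(3))
```

In this small case the six zeros of the leading block all disappear after absorption, which is why a dense storage scheme becomes the natural choice for the reduced system.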


Bibliography

Cox, D. (1972). Regression models and life tables. J. Royal Stat. Soc., Series B, 34:187-220, with discussion.

Cox, D. R. and Oakes, D. (1984). Analysis of survival data. Chapman and Hall, London, UK.

Cox, D. and Snell, E. J. (1968). A general definition of residuals. J. Royal Stat. Soc., Series B, 30:248-275, with discussion.

Ducrocq, V. and Casella, G. (1996). A Bayesian analysis of mixed survival models. Genet. Sel. Evol., 28:505-529.

Ducrocq, V. and Sölkner, J. (1994). "The Survival Kit", a FORTRAN package for the analysis of survival data. In: 5th World Cong. Genet. Appl. Livest. Prod., Volume 22, pages 51-52. Dep. Anim. Poultry Sci., Univ. of Guelph, Guelph, Ontario, Canada.

Ducrocq, V. and Sölkner, J. (1998). "The Survival Kit - V3.0", a package for large analyses of survival data. In: 6th World Cong. Genet. Appl. Livest. Prod., Volume 27, pages 447-448. Anim. Genetics and Breeding Unit, Univ. of New England, Armidale, Australia.

Kalbfleisch, J. D. and Prentice, R. L. (1980). The statistical analysis of failure time data. John Wiley and Sons, New York, USA.

Klein, J. P. and Moeschberger, M. (1997). Survival analysis. Springer Verlag, New York, USA.

Liu, D. C. and Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45:503-528.

Maddala, G. S. (1983). Limited-dependent and qualitative variables in econometrics. Cambridge Univ. Press, UK.


Perez-Enciso, M., Misztal, I., and Elzo, M. A. (1994). Fspak: an interface for public domain sparse matrix subroutines. In: 5th World Cong. Genet. Appl. Livest. Prod., Volume 22, pages 87-88. Dep. Anim. Poultry Sci., Univ. of Guelph, Guelph, Ontario, Canada.

Prentice, R. and Gloeckler, L. (1978). Regression analysis of grouped survival data with application to breast cancer data. Biometrics, 34:57-67.

Schemper, M. (1992). Further results on the explained variation in proportional hazards regression. Biometrika, 79:202-204.
