Upload
dokhue
View
216
Download
0
Embed Size (px)
Citation preview
1
AnImprovedTwo-StageProceduretoCompareHazardCurves
ZhongxueChen1*,HanwenHuang2*,andPeihuaQiu31DepartmentofEpidemiologyandBiostatistics,SchoolofPublicHealth,IndianaUniversity
Bloomington.1025E.7thstreet,Bloomington,IN,47405,USA.
2DepartmentofEpidemiologyandBiostatistics,CollegeofPublicHealth,UniversityofGeorgia.
Athens,GA30602,USA.
3DepartmentofBiostatistics,CollegeofPublicHealth&HealthProfessionsandCollegeof
Medicine,UniversityofFlorida.Gainesville,FL32611,USA.
*Authorscontributeequally
Runningtitle:Improvedtwo-stageprocedureforsurvivaldata
2
Abstract
Qiu and Sheng has proposed a powerful and robust two-stage procedure to compare two
hazard rate functions. In this paper we improve their method by using the Fisher test to
combine the asymptotically independent p-values obtained from the two stages of their
procedure. In addition, we extend the procedure to situations with multiple hazard rate
functions.Our comprehensive simulation study shows that theproposedmethodhasagood
performanceintermsofcontrollingthetypeIerrorrateandofdetectingpower.Tworealdata
applicationsareconsideredforillustratingtheuseofthenewmethod.
Keywords:Bonferronicorrection;crossing;Fishertest;hazardrate;survivaldata
3
1. IntroductionInsurvivaldataanalysis,onecommontask is tocomparetwoormorehazardcurves.Tothis
end, severalnonparametricmethods, including the log-rank (LR),Gehan–WilcoxonandPeto–
Petotests,amongseveralothers,havebeenproposedintheliterature(LeeandWang,2003).
However, theperformanceof thesemethodsdependson the truealternatives. For instance,
theLRtestismorepowerfulthanothermethodsifthehazardfunctionsunderinvestigationare
parallel. But,when such a proportional hazards assumption is violated, the LR testmay lose
power dramatically. To circumvent this limitation, some methods that are robust to the
violationoftheproportionalhazardsassumptionhavebeenproposedintheliterature(Cheng
et al., 2009; Li et al., 2015; Lin andWang, 2004; Liuet al., 2007;Mantel andStablein, 1988;
Moreauetal.,1992;ParkandQiu,2014;QiuandSheng,2008).
The method proposed by Qiu and Sheng (called the QS test hereafter) is a two-stage
proceduredesignedforcomparingtwohazardratefunctions(QiuandSheng,2008).Inthefirst
stage,theLRtestisperformedwiththehopethattwoparallelhazardscanbedistinguishedin
the casewhen they arenot identical. In the second stage, a test statistic (denoted asNP) is
constructed which has the following properties: it is asymptotically independent of the one
usedinthefirststage,anditispowerfulwhenthetwohazardfunctionscrosseachother.Let𝛼
be the overall significance level; 𝛼! and 𝛼! the significance levels for stages one and two,
respectively.Denote𝑝!and𝑝! thep-valuesfromthefirstandthesecondstages,respectively.
TheQS testperformsas follows. In stageone, if𝑝! ≤ 𝛼!, thenwe reject thenullhypothesis
that thetwohazard functionsarethesameandstopthetestingprocedure;otherwise,goto
stage two. In stage two, if 𝑝! ≤ 𝛼!, then we reject the null hypothesis mentioned above;
4
otherwise,theQStestfailstorejectthenullhypothesis.Therefore,asymptoticallywehavethe
followingrelationshipamongthesignificancelevels:
𝛼! + 𝛼! 1− 𝛼! = 𝛼.(1)
Thereareinfinitelymanydifferentchoicesfor𝛼!and𝛼!foragivenoverallsignificancelevel
𝛼in(1).IntheoriginalQStest,forconvenience,𝛼!and𝛼!arechosentobe
𝛼! = 𝛼! = 1− 1− 𝛼.(2)
Theoverallp-valuefortheQStestcanbedefinedby
𝑝 − 𝑣𝑎𝑙𝑢𝑒 =𝑝!, 𝑝! ≤ 𝛼!
𝛼! + 𝑝!(1− 𝛼!), 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒.(3)
There are some interesting characteristics associatedwith theQS test. First, theoverall p-
value and power depend on the choices of𝛼! and𝛼!, and they also depend on the preset
significance level𝛼.Second, from(3), it isobviousthattheoverall𝑝 − 𝑣𝑎𝑙𝑢𝑒 ≥ min (𝑝!,𝛼!).
Thismaybeashortcomingifoneviewsthep-valueasthedegreeofevidenceagainstthenull
hypothesis(e.g.,asmallerp-valueimpliesastrongerevidence).Third,thep-valueandpowerof
thistestdependontheorderofthetwostages.Fourth,theQStestwasdesignedforsituations
with two hazard curves only. In practice, onemay havemultiple hazard curves to compare;
therefore,itisdesirabletoextendtheQStesttocaseswithmultiplehazardcurves.
To overcome the aforementioned shortcomings of the QS test, and more importantly, to
increase its detecting power and make it suitable for handling multiple hazard curves, we
proposean improvementof theQS test. Througha comprehensive simulation study,wewill
showthatthenewmethodismorepowerfulthantheQStestundermanydifferentsituations.
Wewillalsousethreerealdataexamplestoillustratetheuseoftheproposedmethod.
5
2. Method
2.1.TheQStest
Suppose we have two treatment groups with nj (j=1,2) subjects and the total number of
subjects is n=n1+n2.We denote the D distinct ordered failure times as t1, t2, …, tD. At each
failuretimeti (i=1,2,…,D),dijandYijarethenumberofeventsandthenumberofsubjectsat
riskforgroupj.Letdi=di1+di2,andYi=Yi1+Yi2.TheLRteststatisticusedinthefirststageoftheQS
testcanbewrittenas
𝑈 = 𝑤!!(𝑑!! − 𝑌!!!!!!)!
!!! / !!!!!
!!!!!
!!!!
!!!!!!!!!
𝑑! ,(4)
where𝑤!! = 1here.
Toconstructtheteststatisticinthesecondstage,weneedthefollowingnotations.Foreachj
(j=1,2), let Tkj (k=1,2,…,nj) denote the event timeof the kth subject in group jwhichhas the
cumulative distribution function Fj, and Ckj be the censoring time which has the cumulative
distributionfunctionGj.Denotethesurvivalfunctionsforthesurvivaltimeandcensoringtime
asSj(s)=1-Fj(s)andLj(s)=1-Gj(s),respectively.LetXkj=min(Tkj,Ckj)betheobservedsurvivalor
censored time, 𝛿!" = 𝐼(𝑇!" < 𝐶!") be the censoring indicator, and 𝜋! 𝑠 = 𝑃 𝑋!" > 𝑠 =
𝑆!(𝑠)𝐿!(𝑠)bethesurvivalfunctionoftheobservedtime.Then,underthenullhypothesisthat
thetwohazardfunctionsarethesame,wehave𝐹! = 𝐹!(orequivalently,𝑆! = 𝑆!).
TheteststatisticusedinthesecondstageoftheQStestisaweightedlog-ranktestdefined
by 𝑉 = 𝑠𝑢𝑝!!!!!!!!!(𝑉!),(5)
where𝑚 = 𝐷𝑟 denotestheintegerpartof𝐷𝑟,forany𝑟 ∈ 𝜀, 1− 𝜀 ,0 < 𝜀 < 0.5isagiven
smallnumber,𝑉! = 𝑤!!(!)(𝑑!! − 𝑌!!
!!!!)!
!!! / 𝑤!!(!) !!!
!!
!!!!!
!!!!
!!!!!!!!!
𝑑! ,
6
𝑤!!(!) = −1, 𝑖𝑓 𝑖 = 1,2,… ,𝑚
𝑐!, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒,
𝑐! = !! !! !! !!!!! !! !! !
!!! !! !!
∆𝑆(𝑡!)!!!! / !! !! !! !!
!!! !! !! !
!!! !! !!
∆𝑆(𝑡!)!!!!!! ,and𝐿!,𝐿!,and𝑆are
theKaplan-Meierestimates(KaplanandMeier,1958)ofL1,L2andS,respectively.Thep-value
forVcanbeapproximatedbyabootstrapprocedure.
ForthetwostatisticsUandVusedinthetwostagesoftheQStest,wehavethefollowing
niceproperty(QiuandSheng,2008).
Theorem1.ThetwostatisticsUandVin(4)and(5)areasymptoticallyindependentunderthe
nullhypothesisthatthetwohazardfunctionsinquestionarethesame.
2.2.Theproposedtest
SincethetwostatisticsUandVareasymptoticallyindependent,thetwop-valuesobtainedby
thetwostatisticsarealsoasymptoticallyindependentandidenticallydistributedfrom0and1
underthenullhypothesisthatthetwohazardcurvesareequal.Basedonthisfact,wepropose
toobtainanoverallp-valueusingtheFisher-testmethodsince it ismorerobust in thesense
that ithasreasonablepowerwheneitheroneofthetwop-values issmall (Chen,2011;Chen
andNadarajah,2014;Fisher,1932;Owen,2009).Bythismethod,theoverallp-valueisdefined
by
𝑝! = 𝐻!!(−2 ln[𝑝!𝑝!]),(6)
where𝐻 is the survival functionofa randomvariable thathasa chi-squaredistributionwith
degreesoffreedom4basedontheFisher’stheorem(Fisher,1932).
Method(6)canbeextendedtocaseswith𝐾 (𝐾 ≥ 2)hazardcurveseasily, inwhichwecan
make pairwise comparisons for a total of 𝐾(𝐾 − 1)/2 pairs of curves. For each of the
7
comparisons,wecancalculateitsoverallp-valueusing(6).Then,theBonferroniprocedurefor
multiplecomparisonscanbeusedtodeterminewhetherthenullhypothesisshouldberejected
ornot.Morespecifically,ifthesmallestp-valuefromthe𝐾(𝐾 − 1)/2comparisonsislessthan
the given significance level divided by 𝐾(𝐾 − 1)/2, then the null hypothesis is rejected;
otherwiseitisnotrejected.Itshouldbepointedoutthatherewedon’tuseaglobaltestforthe
overallcomparisonalthoughtheadoptionofBonferronicorrectionguaranteesthecontrolling
of family-wise error rate. It is possible that some global tests, if exist, may reject the null
hypothesis at a given significance levelbutall of theadjustedp-values fromourmethodare
greaterthanthatnominallevel.
There are some good properties for the proposed test. Figure 1 shows the acceptance
regionsoftheoriginalQStestandtheproposedmethodbasedontheFishertest,bothwitha
significancelevelof0.05incaseswhenK=2.ItclearlyshowsthattheQStestwouldrejectthe
null hypothesis when either p1 or p2 is less than 0.0253. In contrast, the proposedmethod
basedontheFishertestcanrejectthenullhypothesisincertaincaseswhenbothp1andp2are
larger than0.0253 (i.e., (p1,p2) in the shaded region inFigure1). Inotherwords, for certain
alternativehypotheses thatbothUandVhave considerablepowers todetect, theproposed
methodbasedontheFishertestwouldbemorepowerfulthantheoriginalQStest.Inpractice,
suchalternativehypothesescorrespondtocaseswhenthetwohazardcurvesacrossatanearly
or late time, which are common in applications. On the other hand, for the alternative
hypothesesthatoneofUandVhaslittlepowertodetectbuttheotheronehasagoodpower,
theQS testmay perform slightly better. These alternative hypotheses usually correspond to
caseswhenthetwohazardcurvescrossataquitemiddletimepointortheyarequiteparallel
8
to each other, which are less common in practice. In cases when there aremultiple hazard
curves,itwillbeshownthattheproposedmethodismuchmorepowerfulthantheQStestin
mostscenarios.
Results
3.1. Simulationstudy
Toassesstheperformanceoftheproposedtest,weconductacomprehensivesimulationstudy
to compare itwith theQS tests. Thedistributionsof the survival time (T)areassumed tobe
uniform (𝑈:𝑈(𝜃 − 𝑎,𝜃 + 𝑎)), exponential (𝐸: exp 𝑎 + 𝜃),or log-normal (LN) (𝐿𝑁: 𝐿𝑁(𝜃,𝑎));
the corresponding censoring time (C) follows a uniform (𝑈(𝜃 − 𝑎,𝜃 + 𝑎 + 2(1− 2𝑝)/𝑝)),
exponential (𝐸: exp(𝑎 !!!!
)+ 𝜃), and log-uniform (LU) (𝐿𝑈: 𝐿𝑈(𝜃 + 𝑎𝑈(−2,−2+ 2/𝑝))),
respectively. Here p is the expected censoring rate. In the simulation we consider different
valuesforp:0,0.2,0.4,and0.6.WhenestimatingtheempiricaltypeIerrorrate,weassumethe
distributions for the survival times are identical for all groups. When assess the power, in
additiontocaseswhenallgroupshavethesametypesofdistribution,wealsoconsidercases
when treatment groups have different types of distributions. More specifically, we also
consider the following cases: some groups have uniform distributions and the others have
exponential distributions (denoted as U+E), some groups have uniform distributions and the
others have log-normal distributions (denoted as U+LN), and some groups have exponential
distributions and theothershave log-normal (denotedas E+LN). We compare theproposed
method (denoted as New) with the QS two-stage test using default values 𝛼! = 𝛼! = 1−
1− 𝛼(denotedasTS),thelog-ranktest(denotedasLR),andthetestusedinthesecondstage
9
oftheQStest(denotedasNP).Toseehowthesignificancelevels𝛼!,𝛼!inthetwostageaffect
theperformanceof theQSapproach,wealsoconsider twoQStestswithdifferentvalues for
𝛼!and 𝛼!. Specifically,wechose𝛼! = 0.01,𝛼! = 0.0404 (denotedasTS1), and𝛼! = 0.0404,
𝛼! = 0.01(denotedasTS2).
Wechoosesamplesizestobe50forbothgroups.WeusetheRpackageTSHRCwithdefault
setting(i.e,𝜀 = 0.1,andthenumberofbootstrapsamplesis1000)togetthep-valuesforTS,
LR,andNP.Tocompute theempirical type Ierror rateand thepower,weusea significance
levelof0.05andcomputetheproportionofrejectionsoutof1000replicatedsimulations.
First,weconsider thecaseswhenthereareonly twohazardcurves.Table1belowreports
theempirical type Ierrorrates foreachmethod. Itcanbeseenthatallmethodsconsidered
cancontrolthetypeIerrorratewellindifferentcases,althoughthemethodsTS,NP,andNew
tend tohave slightly smallerempirical type I error rates in certain caseswhen the censoring
ratesarelow.
Table2givestheempiricalpowervaluesforeachmethodunderdifferentsettings.Itshows
that(i)incaseswhenbothLRandNPhavesomepowertodetectthedifference,theproposed
test(i.e.,New)isusuallymorepowerfulthantheoriginalQStest(i.e.,TS),and(ii)incaseswhen
onlyoneofLRandNPhaspower,theQStest isslightlybetterthantheproposedtest.These
resultsareconsistentwithourexpectationsdemonstratedinFigure1.
Next,weconsidersituationswhenwehavefourhazardcurves.Similartotheproposedtest,
forcomparisonofmultiplehazardcurves,thep-valuesfromthepairwiseQStestscanbeused
todeterminewhetherweshouldrejecttheoverallnullhypothesisornot.Inotherwords,ifthe
smallest p-value of the QS test obtained from the𝐾(𝐾 − 1)/2 pairwise comparisons is less
10
thanthegivensignificanceleveldividedby𝐾(𝐾 − 1)/2,thentheovernullhypothesisthatall
hazardcurvesarethesameisrejected;otherwisethenullisnotrejected.Itshouldbepointed
thattheLRtestcanbeapplieddirectlytocaseswhenK>2.ThemethodNPisnotincludedhere
sinceitrequiresanenormousnumberofbootstrapsamplestoaccuratelyestimateitsp-value,
whichisnotfeasibleinthisexamplewithrelativelysmallsamplesizes.
Table3reportstheempiricalsizesofthemethodbasedontheQStests(TS,TS1,andTS2),
the LR test and theproposedmethodNew.Again, all of themethods can control the type I
errorratereasonablywell.
Table4reportstheempiricalpowervaluesforeachmethodunderdifferentsettings.Itcan
beseenthatinmanysituations,theproposedtesthaslargerpowervaluesthantheLRtestand
theQStests.Sometimes,thedifferencesofthepowervaluesbetweentheproposedtestand
theQStestsarebig.Forexample,whenthesurvivaltimesareallfromtheuniformdistributions
forthefourhazardcurves,regardlessofthecensoringrates,theproposedmethodcandetect
thedifferencealmostallthetime.Incontrast,theLRandQSmethodshavemuchlower
powers,especiallywhenthecensoringratesarehigh(e.g.,p=0.4and0.6).
AmongthoseTStestswithdifferentvaluesfor 𝛼! and 𝛼!,ingeneral,asexpected,under
situationswhereLRtestismorepowerfulthantheNPtest,TS2ismorepowerfulthanTS1;on
theotherhand,whenLRislesspowerfulthantheNPtest,TS1ismorepowerfulthanTS2.If
therearetwogroupstobecompared(Tables2andS2),weobservedthatundersome
conditions,theTS1orTS2mayhavelargerempiricalpowervaluesthantheproposedtest.
However,ingeneral,thenewtesthasreasonablepowerunderallconditionsconsideredinthe
simulation:ithassimilarorlargerpowervaluescomparedwiththemaximumpowervalues
11
fromTS,TS1andTS2.Whentherearefourhazardratefunctionstobecompared(Tables4and
S4),thenewtestusuallyhaslargerpowervaluesthanthosefromTS,TS1,andTS2.Inaddition,
withdifferentvaluesof 𝛼! and 𝛼!,theempiricalpowervalueoftheTStests(e.g.,TS1vs.TS2)
maychangedramatically.
ToseehowsamplesizesaffecttheperformanceofthenewtestandtheTStests,wealso
simulateddatawithsamplesize100pergroup.Wekeepalloftheotherparametersthesame
butusesignificancelevel𝛼 = 0.01.ThesimulationresultswerereportedintheSupplementary
TablesS1-S4.WehavesimilarobservationsasthosefromTables1-4:ingeneral,theproposed
testispreferredasithasreasonablepowerunderallofthesituationsconsidered.
3.2. Realdataapplications
Toillustratetheuseoftheproposedtest,weappliedittothreedatasets.Thefirstdatasetwas
obtainedfromastudyaboutthetumorigenesisofadrug(Manteletal.,1977)whichhasbeen
analyzed by Qiu and Sheng (Qiu and Sheng, 2008). In this study, rats were taken from 50
distinctlitters.Oneratfromeachlitterwasrandomlyselectedandgiventhedrug,andanother
tworatswereselectedascontrolsandweregivenaplacebo.Therewere29and81censored
observations in the treatmentgroupandthecontrolgroup, respectively.The twop-values in
thefirstandthesecondstagesoftheQStestwere0.0034and0.051,respectively.Ifweusethe
commonlyusedsignificancelevel0.05,theoverallp-valuefromtheQStestwillbe0.0034and
thenullhypothesis isthenrejected.However, ifwechoosethesignificanceleveltobe0.005,
then the overall p-value will be 0.053, which is larger than the preset significance level.
Therefore,wewillnotrejectthenullhypothesisinsuchcases.Asacomparison,theproposed
12
test has the overall p-value 0.0017,which does not depend on the preset significance level.
Therefore,itismorepowerfulinthisexample.
Theseconddatasetistakenfromastudyonthekidneydialysispatientstoassessthetimeto
first exit site infection (in months) in 119 patients with renal impairment. Among those
patients,43utilizedasurgicallyplacedcatheter(group1),76utilizedapercutaneousplacement
of their catheter (group 2). Therewere 27 and 65 censored observations in groups 1 and 2,
respectively.ThisdatasetwasanalyzedinthepapersbyLinandWang(LinandWang,2004),
andQiuandSheng (QiuandSheng,2008).Using thesignificance levelof0.05,wegot thep-
valuesof0.11and0.00078inthefirststageandthesecondstageoftheQStest,andtheoverall
p-valueis0.026.Ifwechangethesignificancelevelto0.001,theoverallp-valueoftheQStest
becomes0.0013,whichislargerthan0.001;therefore,wewillnotrejectthenullhypothesis.In
contrast,thep-valuefromtheproposedtestis0.00090,whichismuchsmallerthantheoverall
p-valuesoftheQStestinbothcases.
Thethirddatasetwasfromtherandomized,double-blindedDigoxinInterventionTrial(The
DigitalisInvestigationGroup,1997).Inthetrial,patientswithleftventricularejectionfractions
of0.45or lesswere randomlyassigned todigoxin (3397patients)orplacebo (3403patients)
groups.Aprimaryoutcomewasthemortalityduetoworseningheartfailure.Thesubjectswere
categorized based on their treatments and gender. It is of interest to compare the survival
distributionsamongthefourgroups.Weappliedvariousmethodstothisdataset.Table5lists
thep-valuesobtainedbytheQStestandtheproposedtest.Herewehavesixcomparisonsin
total, using 0.05 as the significance level, a comparison with an adjusted p-value less than
0.05/6,i.e.,0.0083,wouldbeclaimedsignificant.Noneofthep-valuesfromQStestislessthan
13
0.0083,however,thep-valueforcomparingthefirstandsecondgroupsfromtheproposedtest
is0.0028.Inaddition,thep-valuesfromtheLRtestandtheWilcoxontestare0.11and0.092,
respectively.
4. DiscussionandConclusion
Inthispaper,weproposedanimprovementoftheQStestbyusingtheFishertestindefining
theoverallp-valueofthetest.Thereareseveraladvantagestousetheproposedmethod.First,
itismorepowerfulthantheoriginalQStestinvariouscases.Second,unliketheQStest,thep-
valueofourproposedmethodisindependentofthepresetsignificancelevelandoftheorder
of the two stages as well. Third, for comparing multiple hazard curves, the proposed test
performsmuchbetterthantheQStestinmanycases.
It shouldbepointedout that in theoriginalQS test, the authors set thedefault values as
𝛼! = 𝛼! = 1− 1− 𝛼.We should choose appropriate values for𝛼! and 𝛼! in the TS test if
priorinformationaboutthehazardcurvesisavailable,sothattheTStesthasoptimaldetecting
power.However, if the valuesof𝛼! and 𝛼! are set inappropriately, theTSmethodmayalso
losepowerdramatically.
BesidestheFishertestconsideredhere,severalotherp-valuecombiningmethodshavebeen
proposedintheliterature.Forexample,theweightedZtestsandthegeneralizedFishertests
arecommonlyusedintheliterature(Chen,2011;ChenandNadarajah,2014;Chenetal.,2014).
The method proposed by Chen and Nadarajah (Chen and Nadarajah, 2014) has similar
performancetothatoftheFishertest.However,themethodsbasedontheweightedZ-testare
notrecommendedsincetheyarenotrobusttothep-valuesobtainedintheindividualstepsof
thetwo-stageschemeandcanpotentiallylosepowerdramatically.Inaddition,ifwehavesome
14
prior information about the hazard curves, powerfulmethods of combining p-values can be
designedaccordingly.
Intheliterature,therearemanymultiplecomparisonmethods,suchastheSidakprocedure,
Tukey’s procedure, and Dunnett’s method. However, all of them require some assumptions
thatmaynotbevalidhere.Asacomparison,theBonferroniprocedureadoptedheredoesnot
requireanyassumptions, suchas the independenceamong individualp-values. Furthermore,
our simulation study has shown that the proposed method using the Bonferroni procedure
workswellintermsofcontrollingthetypeIerrorrate.
It shouldbepointedout that, although theoriginalQS test and theproposedmethodare
designed for comparing hazard rate functions, they can also be used to compare survival
curves. In practice, one may want to compare certain quantiles (e.g., the survival median)
(BrookmeyerandCrowley,1982;Chen,2014a,b;ChenandZhang,2016)insteadofthewhole
survival curves. In such cases, both the QS test and the proposed new method cannot be
applieddirectly,andmuchfutureresearchisneeded.
References
Brookmeyer,R.,Crowley,J.,1982.Ak-samplemediantestforcensoreddata.J.Amer.Statist.Assoc.77,433-440.
Chen,Z.,2011.Istheweightedz-testthebestmethodforcombiningprobabilitiesfromindependenttests?J.Evol.Biol.24,926-930.
Chen,Z.,2014a.ExtensionofMood’smediantestforsurvivaldata.Statistics&ProbabilityLetters95,77-84.
Chen,Z.,2014b.ANonparametricApproachtoDetectingtheDifferenceofSurvivalMedians.CommunStatSimulat,(inpress),DOI:10.1080/03610918.03612014.03964804.
Chen,Z.,Nadarajah,S.,2014.Ontheoptimallyweightedz-testforcombiningprobabilitiesfromindependentstudies.Comput.Stat.DataAnal.70,387–394.
15
Chen,Z.,Yang,W.,Liu,Q.,Yang,J.Y.,Li,J.,Yang,M.Q.,2014.Anewstatisticalapproachtocombiningp-valuesusinggammadistributionanditsapplicationtogenome-wideassociationstudy.BMCBioinformatics15(Suppl17),S3.
Chen,Z.,Zhang,G.,2016.Comparingsurvivalcurvesbasedonmedians.BMCMed.Res.Methodol.16,1.
Cheng,M.-Y.,Qiu,P.,Tan,X.,Tu,D.,2009.Confidenceintervalsforthefirstcrossingpointoftwohazardfunctions.LifetimeDataAnal.15,441-454.
Fisher,R.A.,1932.StatisticalMethodsforResearchWorkers.OliverandBoyd,Edinburgh.
Kaplan,E.L.,Meier,P.,1958.Nonparametricestimationfromincompleteobservations.J.Amer.Statist.Assoc.53,457-481.
Lee,E.T.,Wang,J.,2003.Statisticalmethodsforsurvivaldataanalysis.Wiley-Interscience.
Li,H.,Han,D.,Hou,Y.,Chen,H.,Chen,Z.,2015.StatisticalInferenceMethodsforTwoCrossingSurvivalCurves:AComparisonofMethods.PLoSOne10.
Lin,X.,Wang,H.,2004.Anewtestingapproachforcomparingtheoverallhomogeneityofsurvivalcurves.BiometricalJournal46,489-496.
Liu,K.,Qiu,P.,Sheng,J.,2007.ComparingtwocrossinghazardratesbyCoxproportionalhazardsmodelling.Stat.Med.26,375-391.
Mantel,N.,Bohidar,N.R.,Ciminera,J.L.,1977.Mantel-Haenszelanalysesoflitter-matchedtime-to-responsedata,withmodificationsforrecoveryofinterlitterinformation.CancerRes.37,3863-3868.
Mantel,N.,Stablein,D.M.,1988.Thecrossinghazardfunctionproblem.TheStatistician,59-64.
Moreau,T.,Maccario,J.,Lellouch,J.,Huber,C.,1992.Weightedlogrankstatisticsforcomparingtwodistributions.Biometrika79,195-198.
Owen,A.B.,2009.KarlPearson'smeta-analysisrevisited.Ann.Statist37,3867–3892.
Park,K.Y.,Qiu,P.,2014.Modelselectionanddiagnosticsforjointmodelingofsurvivalandlongitudinaldatawithcrossinghazardratefunctions,.Stat.Med.33,4532–4546.
Qiu,P.,Sheng,J.,2008.Atwo-stageprocedureforcomparinghazardratefunctions.J.Roy.Stat.Soc.Ser.B.(Stat.Method.)70,191-208.
TheDigitalisInvestigationGroup,1997.Theeffectofdigoxinonmortalityandmorbidityinpatientswithheartfailure.N.Engl.J.Med.336,525-533.
16
Tables:
Table 1.Empirical type I error rates of eachmethodunder different settingswith sample size 50 foreachgroup(twogroups)andthesignificancelevel0.05.Theresultswereobtainedfrom1000replicates.distribution method censoringrate
0,0 0.2,0.2 0.4,0.4 0.6,0.6 0.4,0.6U:
𝜃 = (6,6)𝑎 = (2,2)
TS 0.046 0.045 0.043 0.049 0.046TS1 0.044 0.042 0.043 0.048 0.045TS2 0.049 0.049 0.050 0.052 0.050LR 0.053 0.055 0.052 0.052 0.049NP 0.039 0.036 0.041 0.046 0.043New 0.042 0.040 0.043 0.048 0.046
E:𝜃 = (12,12)𝑎 = (0.1,0.1)
TS 0.044 0.045 0.047 0.049 0.044TS1 0.043 0.042 0.045 0.048 0.042TS2 0.049 0.049 0.054 0.052 0.049LR 0.051 0.051 0.055 0.050 0.050NP 0.039 0.039 0.039 0.045 0.039New 0.040 0.043 0.044 0.046 0.043
LN:𝜃 = (5,5)𝑎 = (1,1)
TS 0.047 0.046 0.045 0.050 0.049TS1 0.045 0.046 0.040 0.050 0.046TS2 0.051 0.049 0.052 0.052 0.053LR 0.054 0.050 0.053 0.051 0.053NP 0.040 0.042 0.035 0.046 0.043New 0.045 0.040 0.041 0.051 0.046
Table2.Empiricalpowersofeachmethodunderdifferentsettingswithsamplesize50foreachgroup(twogroups)andthesignificancelevel0.05.Theresultswereobtainedfrom1000replicates.distribution method censoringrate
0,0 0.2,0.2 0.4,0.4 0.6,0.6 0.4,0.6U:
𝜃 = (6,6)𝑎 = (8,3)/3
TS 0.616 0.547 0.4750 0.375 0.388TS1 0.606 0.607 0.550 0.445 0.465TS2 0.549 0.444 0.355 0.298 0.287LR 0.243 0.105 0.049 0.067 0.068NP 0.569 0.605 0.570 0.469 0.488New 0.769 0.586 0.463 0.381 0.367
E:𝜃 = (12,10)𝑎 = (4,5)/40
TS 0.730 0.743 0.749 0.774 0.780TS1 0.720 0.734 0.740 0.757 0.783TS2 0.709 0.721 0.728 0.751 0.757LR 0.593 0.595 0.600 0.631 0.639NP 0.632 0.652 0.637 0.614 0.729New 0.810 0.825 0.832 0.846 0.854
LN:𝜃 = (5,5.6)𝑎 = (1,1)
TS 0.723 0.676 0.604 0.543 0.575TS1 0.635 0.585 0.514 0.447 0.468TS2 0.762 0.721 0.648 0.598 0.630
17
LR 0.773 0.733 0.666 0.619 0.642NP 0.265 0.243 0.204 0.145 0.120New 0.748 0.708 0.625 0.560 0.581
U+E:𝜃 = (10, 4)𝑎 = (5,0.1)
TS 0.908 0.607 0.544 0.526 0.659TS1 0.832 0.553 0.577 0.593 0.700TS2 0.927 0.622 0.456 0.409 0.547LR 0.899 0.485 0.161 0.069 0.115NP 0.156 0.342 0.557 0.613 0.715New 0.991 0.789 0.592 0.520 0.658
U+LN:𝜃 = (5, 1)𝑎 = (5,1)
TS 0.733 0.449 0.352 0.404 0.375TS1 0.793 0.462 0.296 0.303 0.295TS2 0.645 0.419 0.383 0.471 0.419LR 0.163 0.287 0.393 0.490 0.440NP 0.800 0.422 0.136 0.049 0.086New 0.752 0.505 0.384 0.388 0.366
E+LN:𝜃 = (0, 1)𝑎 = (0. 2, 2)
TS 0.667 0.525 0.401 0.269 0.301TS1 0.667 0.582 0.467 0.306 0.351TS2 0.602 0.425 0.296 0.213 0.228LR 0.247 0.097 0.053 0.096 0.080NP 0.616 0.588 0.491 0.308 0.369New 0.864 0.538 0.369 0.278 0.297
Table 3. Empirical type I error rates of eachmethodunder different settingswith sample size 50 foreachgroup(fourgroups)andthesignificancelevel0.05.Theresultswereobtainedfrom1000replicates.distribution method censoringrate
0,0,0,0 (2,2,2,2)/10 (4,4,4,4)/10 (6,6,6,6)/10 (4,6,4,6)/10U:
𝜃 = (6,6,6,6)𝑎 = (2,2,2,2)
TS 0.046 0.040 0.040 0.044 0.048TS1 0.046 0.040 0.040 0.044 0.048TS2 0.046 0.040 0.040 0.044 0.048LR 0.059 0.057 0.049 0.055 0.049New 0.030 0.031 0.035 0.044 0.043
E:𝜃 = (12,12,12,12)𝑎 = (0.1,0.1,0.1,0.1)
TS 0.044 0.035 0.048 0.043 0.044TS1 0.044 0.035 0.048 0.043 0.044TS2 0.044 0.035 0.048 0.043 0.044LR 0.053 0.046 0.059 0.063 0.055New 0.029 0.028 0.045 0.050 0.035
LN:𝜃 = (5,5,5,5)𝑎 = (1,1,1,1)
TS 0.060 0.052 0.047 0.058 0.051TS1 0.060 0.052 0.047 0.058 0.051TS2 0.060 0.052 0.047 0.058 0.051LR 0.060 0.051 0.076 0.073 0.054New 0.032 0.037 0.040 0.050 0.052
Table4.Empiricalpowersofeachmethodunderdifferentsettingswithsamplesize50foreachgroup(fourgroups)andthesignificancelevel0.05.Theresultswereobtainedfrom1000replicates.distribution method censoringrate
0,0,0,0 (2,2,2,2)/10 (4,4,4,4)/10 (6,6,6,6)/10
(4,6,4,6)/10
18
U:𝜃 = (6, 6,6,6)𝑎 = (8,8,3,3)/3
TS 0.676 0.380 0.069 0.286 0.217TS1 0.676 0.380 0.069 0.286 0.217TS2 0.676 0.380 0.069 0.286 0.217LR 0.823 0.507 0.092 0.329 0.149New 1.000 1.000 1.000 0.994 0.999
E:𝜃 = (12,10,12,10)𝑎 = (4,4,5,5)/40
TS 0.412 0.418 0.424 0.484 0.502TS1 0.412 0.418 0.424 0.484 0.502TS2 0.412 0.418 0.424 0.484 0.502LR 0.441 0.449 0.476 0.554 0.557New 0.736 0.789 0.802 0.842 0.828
LN:𝜃 = (5,5,5.6,5.6)𝑎 = (1, 1,1,1)
TS 0.850 0.814 0.773 0.687 0.725TS1 0.850 0.814 0.773 0.687 0.725TS2 0.850 0.814 0.773 0.687 0.725LR 0.913 0.876 0.857 0.765 0.803New 0.854 0.814 0.767 0.654 0.716
U+E:𝜃 = (10,10,4,4)𝑎 = (5,5, 0.1,0.1)
TS 0.931 0.483 0.135 0.063 0.108TS1 0.931 0.483 0.135 0.063 0.108TS2 0.931 0.483 0.135 0.063 0.108LR 0.976 0.602 0.192 0.076 0.142New 0.933 0.537 0.509 0.526 0.537
U+LN:𝜃 = (5,5,1, 1)𝑎 = (5, 5,1,1)
TS 0.166 0.293 0.445 0.550 0.511TS1 0.166 0.293 0.445 0.550 0.511TS2 0.166 0.293 0.445 0.550 0.511LR 0.194 0.342 0.497 0.628 0.537New 0.842 0.602 0.437 0.470 0.455
E+LN:𝜃 = (0,0,1,1)
𝑎 = ( 0.2,0.2,2,2)
TS 0.198 0.079 0.039 0.113 0.065TS1 0.198 0.079 0.039 0.113 0.065TS2 0.198 0.079 0.039 0.113 0.065LR 0.310 0.108 0.049 0.126 0.075New 0.610 0.471 0.369 0.291 0.352
Table5.P-valuesobtainedbytheQStestandtheproposedmethodfromeachpairofgroupsoftheDITdata.
Grouppair LR NP TS New1vs.2 0.019 0.026 0.019 0.00281vs.3 0.17 0.32 0.36 0.211vs.4 0.38 0.16 0.18 0.232vs.3 0.86 0.016 0.041 0.0732vs.4 0.47 0.012 0.037 0.0353vs.4 0.66 0.69 0.70 0.81