Statistics for Biology and Health
Series Editors: M. Gail, K. Krickeberg, J. Samet, A. Tsiatis, W. Wong
For other titles published in this series, go to http://www.springer.com/series/2848



Toshiro Tango

Statistical Methods for Disease Clustering

Toshiro Tango
Department of Technology Assessment & Biostatistics
National Institute of Public Health
3-6 Minami 2 chome
Wako, Saitama 351-0197
[email protected]

Series Editors:

M. Gail National Cancer Institute Bethesda, MD 20892 USA

K. Krickeberg Le Chatelet F-63270 Manglieu France

Jonathan M. Samet Department of Preventive Medicine Keck School of Medicine University of Southern California 1441 Eastlake Ave. Room 4436, MC 9175 Los Angeles, CA 90089 USA

A. Tsiatis Department of Statistics North Carolina State University Raleigh, NC 27695 USA

W. Wong Department of Statistics Stanford University Stanford, CA 94305-4065 USA

ISSN 1431-8776

ISBN 978-1-4419-1571-9 e-ISBN 978-1-4419-1572-6 DOI 10.1007/978-1-4419-1572-6

Springer New York Dordrecht Heidelberg London

Library of Congress Control Number: 2010920016

© Springer Science+Business Media, LLC 2010
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

This book is intended to provide a text on statistical methods for detecting clusters and/or clustering of health events that is of interest to final-year undergraduate- and graduate-level statistics, biostatistics, epidemiology, and geography students but will also be of relevance to public health practitioners, statisticians, biostatisticians, epidemiologists, medical geographers, human geographers, environmental scientists, and ecologists. Prerequisites are introductory biostatistics and epidemiology courses.

With increasing public health concerns about environmental risks, the need for sophisticated methods for analyzing spatial health events is immediate. Furthermore, the research area of statistical tests for disease clustering now attracts a wide audience due to the perceived need to implement wide-ranging monitoring systems to detect possible health-related bioterrorism activity. With this background and the development of the geographical information system (GIS), the analysis of disease clustering of health events has seen considerable development over the last decade. Therefore, several excellent books on spatial epidemiology and statistics have recently been published. However, it seems to me that there is no other book solely focusing on statistical methods for disease clustering. I hope that readers will find this book useful and interesting as an introduction to the subject.

Although the view of statistical methods of disease clustering embodied in this book is, of course, my own, it has been formed over many years through collaboration and contact with many statisticians. Especially, I must acknowledge the tremendous debt I owe to Martin Kulldorff, who has always provided me with invaluable insight and suggestions for improving my original ideas. I also thank Kunihiko Takahashi for preparing several figures and carefully reading the final text. My thanks also go to John Kimmel of Springer for inviting me to write this book and providing continual support and encouragement. Finally, I would like to thank Taeko Becque for checking my poor English.

Tokyo, July 2009
Toshiro Tango


Contents

1 Introduction
  1.1 Classification of Disease Clustering
  1.2 Data Used for Disease Clustering
  1.3 Organization of the Book
  1.4 Organization of the Chapters
  1.5 Statistical Software
    1.5.1 R
    1.5.2 SaTScan
    1.5.3 FleXScan
    1.5.4 Splancs

2 Clustering and Clusters
  2.1 Spatial Pattern
  2.2 Spatial Point Process
    2.2.1 Homogeneous Poisson Process
    2.2.2 Inhomogeneous Poisson Process
  2.3 Back to the Questions
  2.4 Approaches Using Regional Count Data
  2.5 Approaches Using Case-Control Location Data
  2.6 Monte Carlo Hypothesis Testing
  2.7 Spatial Autocorrelation

3 Disease Mapping: Visualization of Spatial Clustering
  3.1 Standardization
  3.2 Basic Models for Relative Risk
  3.3 Likelihood Models
  3.4 Poisson-Gamma Bayesian Models
    3.4.1 Empirical Bayes Estimator
    3.4.2 Hierarchical Full Bayes Estimator
  3.5 Hierarchical Bayesian Models
    3.5.1 Log-normal Model
    3.5.2 Conditional Autoregressive Model

4 Tests for Temporal Clustering
  4.1 Data
  4.2 Null Hypothesis vs. Alternative Hypothesis
  4.3 Historical Overview of Methods
  4.4 Selected Methods
    4.4.1 Ederer-Myers-Mantel’s Method for Count Data
    4.4.2 Naus’ Scan Statistic for Point Data
    4.4.3 Nagarwalla’s Scan Statistic for Point Data
    4.4.4 Kulldorff’s Scan Statistic for Count Data
    4.4.5 Tango’s Index for Count Data
  4.5 Illustration with Real Data
    4.5.1 Congenital Oesophageal Atresia Data
    4.5.2 Trisomy Data
  4.6 Discussion

5 General Tests for Spatial Clustering: Regional Count Data
  5.1 Data
  5.2 Null Hypothesis vs. Alternative Hypothesis
  5.3 Historical Overview of Methods
  5.4 Selected Methods
    5.4.1 Tango’s Index for Spatial Clustering
    5.4.2 Kulldorff’s Circular Spatial Scan Statistic
    5.4.3 Tango and Takahashi’s Flexible Spatial Scan Statistic
    5.4.4 Tango’s Spatial Scan Statistic with Restricted Likelihood Ratio
  5.5 Illustration with Real Data
    5.5.1 Japanese Gallbladder Cancer Mortality Data
    5.5.2 New York Incident Leukemia Cases
  5.6 Power Comparison

6 General Tests for Spatial Clustering: Case-Control Point Data
  6.1 Data
  6.2 Null Hypothesis vs. Alternative Hypothesis
  6.3 Historical Overview of Methods
  6.4 Selected Methods
    6.4.1 Cuzick and Edwards’ Test
    6.4.2 Tango’s Index for Spatial Clustering
    6.4.3 Diggle and Chetwynd’s Test
    6.4.4 Kulldorff’s Spatial Scan Statistic
  6.5 Illustration with Real Data
    6.5.1 Leukemia and Lymphoma in North Humberside
    6.5.2 Early Medieval Grave Sites
  6.6 Discussion

7 Tests for Space-Time Clustering
  7.1 Data
  7.2 Null Hypothesis vs. Alternative Hypothesis
  7.3 Historical Overview of Methods
  7.4 Selected Methods
    7.4.1 Knox’s Test
    7.4.2 Mantel’s Test
    7.4.3 Baker’s Max Test for the Knox Test
    7.4.4 Jacquez’s k-NN Test
    7.4.5 Diggle et al.’s Test
    7.4.6 Kulldorff and Hjalmars’s Approach for the Knox Test
  7.5 Illustrations with Real Data
    7.5.1 Kaposi’s Sarcoma in the West Nile District of Uganda
  7.6 Power Comparison

8 Focused Tests for Spatial Clustering
  8.1 Data
  8.2 Null Hypothesis vs. Alternative Hypothesis
  8.3 Historical Overview of Methods
  8.4 Selected Methods
    8.4.1 Stone’s Test
    8.4.2 Bithell’s Linear Risk Score Test
    8.4.3 Waller and Lawson’s Score Test
    8.4.4 Tango’s Score Test for Decline Trend
    8.4.5 Tango’s Score Test for Peak-Decline Trend
    8.4.6 Diggle, Morris, and Morton-Jones’ Test Based on Case-Control Point Data
  8.5 Illustration with Real Data
    8.5.1 Infant Deaths Around Municipal Solid Waste Incinerators
    8.5.2 Leukemia Cases Near Inactive Hazardous Waste Sites
    8.5.3 Larynx and Lung Cancer Near a Disused Incinerator
  8.6 Power Comparison

9 Space-Time Scan Statistics
  9.1 Data
  9.2 Null Hypothesis vs. Alternative Hypothesis
  9.3 Historical Overview of Methods
    9.3.1 Retrospective Analysis
    9.3.2 Syndromic Surveillance
  9.4 Selected Methods
    9.4.1 Kulldorff’s Cylindrical Space-Time Scan Statistic
    9.4.2 Takahashi et al.’s Prismatic Space-Time Scan Statistic
  9.5 Illustration with Real Data
    9.5.1 Syndromic Surveillance of the Massachusetts Data
  9.6 Power Comparison
  9.7 Discussion with a New Proposal

A List of R functions

References

Index