Points to Consider Document: Scientific and Regulatory Considerations for the Analytical Validation of Assays Used in the Qualification of Biomarkers in Biological Matrices

Biomarker Assay Collaborative Evidentiary Considerations Writing Group, Critical Path Institute (C-Path)

Steven P. Piccoli, Neoteric Consulting & John Michael Sauer, Critical Path Institute

Contributing Authors:
Shashi Amur, U.S. FDA; Jiri Aubrecht, Pfizer; Amanda Baker, Critical Path Institute; Robert Becker, U.S. FDA; Jennifer Burkey, Critical Path Institute; Robert Dean, Eli Lilly; Martha Donoghue, U.S. FDA; Russell Grant, LabCorp; Steven Gutman, Illumina; Kylie Haskins, U.S. FDA; John Kadavil, U.S. FDA; Nicholas King, Critical Path Institute; Jean Lee, BioQualQuan; Vasum Peiris, U.S. FDA; Afshin Safavi, BioAgilytix; Shelli Schomaker, Pfizer; Meena Subramanyam, Takeda; Sue Jane Wang, U.S. FDA

Points to Consider Document - Duke University · 2019-05-23



Table of Contents

Points to Consider Document: Scientific and Regulatory Considerations for the Analytical Validation of Assays Used in the Qualification of Biomarkers in Biological Matrices
List of Tables
List of Figures
Introduction
    Biomarker Qualification and the Context of Use
    Analytical Validation vs Clinical Validation
    Biomarker Assay Validation and Fit-for-Purpose
History of Guidance Documents Relevant to Assay Validation
Assay Design, Development and Validation
    Assay Design and Technology Selection
        Pre-Analytical
        Analytical Performance Requirements for Biomarker Assays
        Assay Performance
    Assay Validation Acceptance Criteria
        Accuracy (Relative)
        Analytical Measurement Range (AMR)
        Parallelism
        Reproducibility
        Selectivity
        Specificity
        Stability (Sample)
Case Study: Analytical Validation Approach for Kidney Safety Biomarkers
Conclusions
References
Appendix 1. Assay Performance Characteristics Definitions
    Accuracy (Relative)
    Analytical Measurement Range (AMR)
    Analytical Validation
    Bias
    Characterization of Reference Materials (and Stability)
    Context of Use
    Detection Limit or Limit of Detection (LOD)
    Intended Use
    Linearity/Dilution Verification/Parallelism
    Lower Limit of Quantitation (LLOQ) and Upper Limit of Quantitation (ULOQ)
    Parallelism
    Precision
    Quality Control/Reproducibility
    Reportable Range
    Robustness and Ruggedness
    Selectivity/Interference
    Sensitivity
    Specificity
    Spike Recovery
    Stability
    Standard/Calibration Curve Range and Model
    Working Range/Reportable Range
Appendix 2. Pre-Analytical Resources
    Websites
    Literature Citations

List of Tables

Table 1: Approaches for Biomarker Assay Validation
Table 2: CLSI Guidelines Related to the Validation of Biomarker Assays
Table 3: Points to Consider in Assay Design and Development
Table 4: Examples of Pre-Analytical Factors to be Considered
Table 5: Example data and two-level Nested ANOVA for Preliminary CVI and CVG determination
Table 6: Calculating TAE from Bias and Precision and Determining Measurement Ranges (Uncertainty)
Table 7: Seven Key Analytical Parameters to be Considered during Biomarker Assay Validation
Table 8: Additional Analytical Parameters to be Considered during Biomarker Assay Validation
Table 9: Comparison of Regulatory Expectations for Precision Validation Studies
Table 10: Considerations for Evaluating Inter-laboratory vs. Intra-laboratory Reproducibility
Table 11: Example Scenarios for Parallelism Assessment
Table 12: Pre-Analytical Factors Considered during the Validation of Neutrophil Gelatinase-Associated Lipocalin (NGAL) (specific to the BioPorto assay)
Table 13: Analytical Parameters Evaluated during the Validation of Neutrophil Gelatinase-Associated Lipocalin (NGAL)
Table 14: Summary of the Neutrophil Gelatinase-Associated Lipocalin (NGAL) Validation
Table 15: CLSI guidelines for Pre-analytical Variables

List of Figures

Figure 1: Definitions of Precision and Accuracy in terms of Random, Systematic and Total Analytical Errors
Figure 2: Extrapolation of Measurement Uncertainty from TAE
Figure 3: Influence of power analysis on measurement differences as a function of TAE or CVA

Introduction

The characterization and analytical validation of biomarker assay performance to define assay capability and limitations is an important aspect of biomarker qualification. To ensure accuracy and reproducibility of the data generated to support biomarker qualification, assays should be analytically validated before confirmatory clinical validation studies for the biomarker are performed. It is important that the assay procedure and resulting measurements are suitable for their intended purpose. Measurement errors that could result in biases and affect the biomarker’s predictive accuracy would thus limit its utility as a valuable Drug Development Tool (DDT).

Unlike xenobiotics (drugs), biomarkers are endogenous entities or molecules. Therefore, biomarker assays typically measure an increase or decrease in the endogenous level of the molecule, which often fluctuates because of individual variability in physiology, disease biology, pathology, comorbidities, treatment administered, and environmental factors. Given these factors, the requirements and expectations for assays used in the qualification of biomarkers must take into consideration 1) the type of molecules being measured and 2) the context in which the biomarker is being applied in drug development and in regulatory decision-making.

The key acceptance criteria for the analytical validation (hereafter, unless otherwise specified, the term validation refers exclusively to analytical validation) of pharmacokinetic assays (i.e., drug concentration), and for in vitro diagnostic (IVD) devices used in clinical practice, have been well defined but are not universally transferable or applicable to biomarker assays as DDTs. This is because the expectations (both clinical and analytical) for assays used to support biomarker qualification are distinct. While the criteria used in the validation of drug concentration assays and IVD devices used in clinical practice can be applied as a framework for the development of criteria for biomarker assay validation, they cannot be adopted without modification. Thus, the analytical validation of assays used to generate data for biomarker qualification must be refined to fit the proposed drug development context of use.

The goal of this document is to define the scientific and regulatory considerations for the analytical validation of assays for fluid-based biomarkers used in the qualification of DDTs. It does not address exploratory activities for internal use only. The topics to be discussed include considerations for assay design and technology selection, optimization of pre-analytical factors, core assay performance expectations, and setting minimally acceptable assay performance criteria. Technology areas covered include singleplex ligand and immuno-binding assays, mass spectrometry, and enzyme-based assays. Out of scope of this document are immunohistochemistry (IHC), flow cytometry, genetics, genomics, imaging biomarkers, and multiplex assays. Likewise, the development and analytical validation of assays to be used in commercial clinical practice (i.e., those regulated solely by Clinical Laboratory Improvement Amendments of 1988 [CLIA]), as well as assays used for measuring exploratory biomarkers in clinical drug development, are outside the scope of this document. However, the general analytical validation principles outlined in this document for biomarker assays may also be applicable to biomarker methods used in clinical development of biopharmaceutics.

The two primary areas of this document that require consensus and agreement are 1) the experimental characterization of the biomarker assays used in qualification (Assay Design, Development and Validation), and 2) the approach to defining the requisite assay performance and acceptance criteria (Assay Validation Acceptance Criteria). It is recognized that multiple iterations of each may be necessary to achieve the final objectives.

Biomarker Qualification and the Context of Use

The U.S. Food and Drug Administration’s (FDA) Biomarker Qualification Program (BQP) is designed to provide a mechanism for external stakeholders to work with the Center for Drug Evaluation and Research (CDER) to develop biomarkers for use as tools in the drug development process (FDA 2017). The goals of the BQP are to provide a platform to 1) qualify biomarkers and make supporting information publicly available, 2) facilitate uptake of qualified biomarkers in the regulatory review process, and 3) encourage the identification of new biomarkers for use in drug development and regulatory decision-making (Amur et al. 2015). Terms used in biomarker qualification have been defined by the FDA-NIH Biomarker Working Group and can be found in the BEST (Biomarkers, EndpointS, and other Tools) Resource (BEST Resource 2016) under BEST glossary.

A biomarker is a “defined characteristic that is measured as an indicator of normal biological processes, pathogenic processes, or responses to an exposure or intervention, including therapeutic interventions. Molecular, histologic, radiographic, or physiologic characteristics are types of biomarkers. A biomarker is not an assessment of how an individual feels, functions, or survives” (BEST Resource 2016).

Qualification is defined as “a conclusion, based on a formal regulatory process, that within the stated context of use (COU), a medical product development tool can be relied upon to have a specific interpretation and application in medical product development and regulatory review” (BEST Resource 2016).

Once a biomarker is qualified, it can be used for the qualified COU in drug development programs without the need for CDER to re-review the supporting information.

The Context of Use (COU) is “A statement that fully and clearly describes the way the medical product development tool is to be used and the medical product development-related purpose of the use” (BEST Resource 2016).

A biomarker’s COU should be proposed early in the biomarker qualification process, at the Letter of Intent stage (FDA 2014a), as it is the basis of the level of evidence that needs to be considered for qualification. The COU may be modified, as needed, as new data are acquired. The COU consists of a concise ‘Use Statement’ containing the biomarker’s name, identity and proposed use in drug development, as well as the ‘Conditions for Qualified Use’, a comprehensive description of how the biomarker will be used in the qualified setting (FDA 2014b). The proposed use in drug development may include assessment of susceptibility or risk, diagnosis of disease or disease subtype, prognosis of disease severity, prediction of clinical outcome of interest or patient early response or tolerability, and monitoring of toxicity or response to therapy. It should be noted that the aims of the COUs are specific to drug development and do not necessarily directly overlap with the indications for use of an FDA Premarket Approval Application (PMA) or Premarket Notification (510(k)) for IVD devices used in clinical practice. As such, there is a continuum of validation requirements to address the needs of qualifications resultant from disparate COUs.

The COU also helps define the fit-for-purpose expectations for the validation of the biomarker assay.

Fit-for-purpose is a conclusion that the level of [assay] validation associated with a medical product development tool is sufficient to support its context of use (BEST Resource 2016).

The COU determines the assay rigor by defining the use of the biomarker measurement in drug development. Since drug development decisions will be made based upon qualified biomarkers, the assay used to measure the biomarker must be robust, sensitive, and specific enough to support the specific decisions defined by the COU.

Biomarker assays for soluble targets are required to measure changes, in response to disease or treatment, in the endogenous concentrations or activities of biomolecules against a variable background found in the intended use population defined by the COU. It is important that the relevant changes in biomarker concentrations are measured as accurately and precisely as necessary to enable investigators and health authorities to make informed decisions. Therefore, the magnitude of the biomarker change from baseline needed to reach a medically actionable level (cut-off value) will have a direct effect on the amount of acceptable analytical variability in an assay. For example, if a biomarker has a baseline of 5 units and a medically relevant change in that biomarker is an increase of 2 units, an assay capable of appropriate discrimination must be very precise, with only a small amount of total analytical error. However, if a medically relevant change is an increase of 200 units in that biomarker, then a lower level of assay precision and a higher amount of total analytical error may be acceptable to yield medically useful results. If the assay yields a result of 10 ± 6 in the first example, the data are not useful because the variability associated with the result spans the cut-off value of 7 units; in the second example, the same result is useful and can be interpreted as a change in the biomarker that is not medically relevant, because it falls well below the cut-off value of 205 units. This determination is further compounded by intra- and inter-individual variation for normal and diseased states for the biomarker. This topic is further discussed in the Assay Validation Acceptance Criteria section of this document, and put into the context of a Performance Standard (PS) for a biomarker assay and Allowable Total Error (ATE) for the biomarker.
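The baseline-and-change example in the preceding paragraph can be sketched in a few lines of Python. This is a minimal illustration only: the function name `classify_change`, its parameters, and the rule of comparing the whole uncertainty interval with the cut-off are assumptions made here for clarity, not a method prescribed by this document.

```python
def classify_change(baseline, result, uncertainty, relevant_increase):
    """Compare a result's uncertainty interval with the medically relevant cut-off.

    Hypothetical helper: the cut-off is taken as baseline + relevant_increase,
    and the result is treated as the interval result +/- uncertainty.
    """
    cutoff = baseline + relevant_increase
    low, high = result - uncertainty, result + uncertainty
    if low >= cutoff:
        return "at or above cut-off"   # whole interval reaches the cut-off
    if high < cutoff:
        return "below cut-off"         # whole interval stays under the cut-off
    return "indeterminate"             # interval straddles the cut-off


# First example from the text: relevant change is +2 units from a baseline of 5.
first = classify_change(baseline=5, result=10, uncertainty=6, relevant_increase=2)
# Second example from the text: relevant change is +200 units.
second = classify_change(baseline=5, result=10, uncertainty=6, relevant_increase=200)
print(first)
print(second)
```

Under these assumptions the same result of 10 ± 6 cannot be interpreted against a cut-off of 7 units, but is clearly below a cut-off of 205 units, which mirrors the reasoning in the text.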

The intended use population defined by the COU will also determine the expected reference interval for the biomarker. The reference interval, or commonly, reference range, is the central 95% of the distribution of values (approximately mean ± 2 SD for normally distributed data) obtained from a group of otherwise healthy patients. The measurements generated by the assay are described as in range or out of range, bounded by the upper and lower limits of normal. The reference interval can be influenced by endogenous factors such as age and sex, comorbidities, and exogenous factors such as exercise or fasting. Genetics, geographical location, different laboratories, and different statistical analysis methods can also impact the reference interval.
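As a rough illustration of the reference interval described above, the sketch below estimates the central 95% of a set of values in two ways: as mean ± 2 SD, and as the 2.5th and 97.5th percentiles. The data are simulated, and the function names are hypothetical; how a reference interval should actually be established for a given COU is not defined by this example.

```python
import random
import statistics


def reference_interval_mean_sd(values):
    """Parametric estimate: mean +/- 2 SD (assumes roughly Gaussian values)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean - 2 * sd, mean + 2 * sd


def reference_interval_percentile(values):
    """Nonparametric estimate: 2.5th and 97.5th percentiles (central 95%)."""
    cuts = statistics.quantiles(values, n=40)  # 39 cut points in 2.5% steps
    return cuts[0], cuts[-1]


def classify(value, interval):
    """Label a measurement relative to the lower and upper limits of normal."""
    low, high = interval
    return "in range" if low <= value <= high else "out of range"


# Hypothetical data: 200 simulated results from healthy subjects (arbitrary units).
rng = random.Random(0)
healthy = [rng.gauss(50, 5) for _ in range(200)]

parametric = reference_interval_mean_sd(healthy)
percentile = reference_interval_percentile(healthy)
print(parametric, percentile)
print(classify(52, parametric), classify(75, parametric))
```

The two estimates agree closely only when the values are approximately normally distributed, which is one reason the choice of statistical method can shift the reported interval.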

The COU will help to determine the performance characteristics for the assay based in part on the medical decision point for the population being tested, be that a normal or diseased population or both, and each population will have an appropriately defined reference interval or cut-off value. Reference intervals are rarely generated for disease populations; instead, receiver operating characteristic (ROC) curves are usually generated to define clinical sensitivity and specificity with appropriate cut-off values.
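To make the relationship between a cut-off value and clinical sensitivity and specificity concrete, the sketch below tabulates both quantities at each candidate cut-off, which is the information a ROC curve plots. The data are hypothetical, higher values are assumed to indicate disease, and selecting the cut-off by Youden's J statistic (sensitivity + specificity - 1) is an illustrative choice made here, not a criterion taken from this document.

```python
def sensitivity_specificity(diseased, healthy, cutoff):
    """Treat values >= cutoff as positive (assumes higher values indicate disease)."""
    true_pos = sum(v >= cutoff for v in diseased)
    true_neg = sum(v < cutoff for v in healthy)
    return true_pos / len(diseased), true_neg / len(healthy)


def roc_points(diseased, healthy):
    """Return (cutoff, sensitivity, specificity) for every observed value."""
    cutoffs = sorted(set(diseased) | set(healthy))
    return [(c, *sensitivity_specificity(diseased, healthy, c)) for c in cutoffs]


def best_cutoff_youden(diseased, healthy):
    """Pick the cut-off maximizing Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(diseased, healthy), key=lambda p: p[1] + p[2] - 1)


# Hypothetical biomarker values (arbitrary units).
healthy = [3, 4, 5, 5, 6, 7]
diseased = [6, 8, 9, 10, 12, 15]

cutoff, sens, spec = best_cutoff_youden(diseased, healthy)
print(cutoff, round(sens, 3), round(spec, 3))
```

In practice the preferred operating point depends on the COU, since the relative cost of false positives and false negatives differs between applications.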

Analytical Validation vs Clinical Validation

In the qualification of biomarkers, both analytical and clinical factors must be considered. Thus, for biomarker qualification, demonstration of both analytical validation, as it relates to the accurate and precise measurement of the biomarker, and clinical validation, as it relates to the correct interpretation of the biomarker measurement for a specific COU, is necessary. However, these concepts are easily confused and mistakenly combined into one concept.

Analytical validation is the process of “Establishing that the performance characteristics of a test, tool, or instrument are acceptable in terms of its sensitivity, specificity, accuracy, precision, and other relevant performance characteristics using a specified technical protocol (which may include specimen collection, handling and storage procedures). This is validation of the test, tools, or instrument’s technical performance, but is not validation of the item’s usefulness.” (BEST Resource 2016).

Clinical validation is the process of “Establishing that the test, tool, or instrument acceptably identifies, measures, or predicts the concept of interest.” (BEST Resource 2016).

Analytical validation supports the biomarker measurement and includes all factors that are part of the assay and is dependent only upon the acceptability of the samples, critical reagents, and the performance characteristics of the test system. Clinical validation supports the interpretation of the biomarker measurement and is dependent on the consistency and clinical performance (clinical sensitivity, clinical specificity, clinical accuracy) of the biomarker in predicting the outcome claimed. Clinical validation should not be confused with clinical utility, which expresses to what extent diagnostic testing improves health outcomes relative to the current best alternative (Bossuyt et al. 2012), or “The conclusion that a given use of a medical product will lead to a net improvement in health outcome or provide useful information about diagnosis, treatment, management, or prevention of a disease. Clinical utility includes the range of possible benefits or risks to individuals and populations.” (BEST Resource 2016). This document focuses solely on the analytical validation of fluid-based biomarker assays used to generate data for biomarker qualification.

Biomarker Assay Validation and Fit-for-Purpose

As stated in the Biomarker Qualification and the Context of Use section of this document, the COU helps to define the fit-for-purpose expectations for the validation of the assay. Fundamentally, all valid bioanalytical assays are fit-for-purpose based on their defined application. The remainder of this document is dedicated to defining the appropriate level of characterization and validation that should be expected for assays used for biomarker qualification.

The goal of biomarker assay development is to construct an assay that adequately meets the goals of the investigation. The term fit-for-purpose is often used in this context. Properly defined, fit-for-purpose is “A conclusion that the level of validation associated with a medical product development tool [assay] is sufficient to support its context of use” (BEST Resource 2016). However, too often the term is used inappropriately and without sufficient rationale, labeling assays as such without correlating the level of validation with the assay’s purpose.

Assays that measure biomarkers seeking qualification are used to produce the evidence required to establish and confirm decision points, and therefore should undergo sufficiently extensive and rigorous validation to ensure that assay performance and application match (Table 1). A fully validated assay would be required in all confirmatory biomarker qualification studies, including the establishment of reference ranges and biomarker response decision points.

The fit-for-purpose process can be used to develop an assay that accomplishes what is clinically necessary and relevant. The concept and proper implementation of fit-for-purpose has been thoroughly summarized by Lee et al. (2006) and Lee (2009). This is an iterative process, where data inform further development and refinement of the assay (Table 1). The fit-for-purpose process involves four continuous steps, namely method development, exploratory method qualification, “full” or extensive method validation, and in-study method validation, in an iterative progression with the intended use of the biomarker data as the driving force for the analytical validation (Lee et al. 2006). This process must be directly related to and support the COU.


Table 1: Approaches for Biomarker Assay Validation

| | Discovery/Exploratory Validation | Translational/Partial Validation | Full Validation* |
|---|---|---|---|
| Decision level | Rank ordering, screening | Candidate selection | High risk actionable data |
| Stage of drug development | Discovery | Translational Research | Clinical trials |
| Reference Standard | When available, or surrogate | When available, or surrogate | Requires calibrator or reference standard or surrogate |
| Matrix | Authentic or surrogate; test parallelism if samples available | Authentic or surrogate matrix; spiked reference standard; consider disease state, multiple donors; test parallelism | Authentic or surrogate matrix; spiked reference calibrator; consider disease state, multiple donors; test parallelism |
| Standard and Quality Control Accuracy and Precision criteria | Acceptance criteria not needed; established based on evaluation results | Acceptance criteria based on evaluation results and technology-based analytical considerations; native animal/human samples as quality control samples | Acceptance criteria based on evaluation results and technology-based analytical considerations; native animal/human samples as quality control samples |
| Accuracy** and Precision qualification | Not required | Minimum one run | Six runs |
| Stability evaluation | Bench top; scientific judgment | Collection, room temperature, freeze/thaw, and long term stability as needed; reference standard or matrix stability test with acquired animal/human samples | Collection, room temperature, freeze/thaw, and long term stability; reference standard or matrix stability test with acquired animal/human samples |
| Data output | Qualitative; semi-quantitative | Relative quantitative; semi-quantitative; absolute quantitative** | Absolute quantitation**; relative quantitative; semi-quantitative; qualitative |

    *Assays that measure biomarkers seeking qualification are used to produce the evidence required to establish and confirm decision points, and therefore should undergo full validation to ensure that assay performance and application match

    **For heterogeneous (i.e., large molecule) biomarkers, the calibrators are generally prepared with recombinant reference material in a surrogate matrix. The assay cannot provide absolute quantification; only relative accuracy can be evaluated. Thus, the term relative accuracy (rather than accuracy) is appropriate for nearly all biomarkers where the calibration material differs from the endogenous biomarker.


    History of Guidance Documents Relevant to Assay Validation

    Multiple guidance documents have been published for pharmacokinetic (PK)/bioequivalence and IVD assay development and validation. These documents recommend fundamental concepts necessary for the development and validation of biomarker assays for use in the qualification of DDTs. Although the application of these concepts for biomarker assay validation has not been codified, the lessons learned and knowledge gained in the development of these guidance documents can be used to build a more comprehensive and relevant document that is directly applicable to biomarker qualification. Below is an overview of the currently available documents related to guidelines for the validation of biomarker qualification assays.

    In 2001, CDER and the Center for Veterinary Medicine (CVM) at the FDA jointly published the “Guidance for Industry, Bioanalytical Method Validation” (FDA 2001). This document addressed the validation of methods for use in human clinical pharmacology, bioavailability, and bioequivalence studies requiring a pharmacokinetic evaluation. It described three types of validation (full, partial, and cross-validation) and identified four key parameters recommended for validation: (1) selectivity, (2) accuracy, precision, and recovery, (3) calibration curve, and (4) stability of analyte in spiked samples.

    In September 2013, the FDA published a revised draft, “Guidance for Industry, Bioanalytical Method Validation” (FDA 2013). This draft guidance was intended to address recent advances in science and technology related to bioanalytical method validation, while still identifying a familiar list of fundamental parameters for method validation including accuracy, precision, selectivity, sensitivity, reproducibility, and stability. After the draft guidance was opened to public review and comment, the joint FDA/American Association of Pharmaceutical Scientists (AAPS) Crystal City V Meeting took place in Baltimore from December 3-5, 2013, to continue the feedback/comment process. A consensus was reached on several issues at this meeting (Booth et al. 2015). As a follow-up to Crystal City V, the AAPS Workshop Crystal City VI: Bioanalytical Methods Validation on Biomarkers was held in September 2015 in Baltimore to clarify residual concerns pertaining to validation of Ligand Binding Assays (LBA) and Liquid Chromatography-Mass Spectrometry (LC-MS) assays (Lowes and Ackerman 2016; Arnold et al. 2016) and to distinguish biomarker assay validation from these principles. However, a final guidance has yet to be issued.

    Currently, specified criteria for PK assay performance outlined in the 2013 Bioanalytical Method Validation Draft Guidance (FDA 2013) are being considered for assays to support biomarker measurement. The draft guidance states that “Method validation for biomarker assays should address the same questions as method validation for PK assays. Tiered approaches have been informally proposed by both regulators and industry organizations… The accuracy, precision, selectivity, range, reproducibility, and stability of a biomarker assay are important characteristics that define the method. The approach used for PK assays should be the starting point for validation of biomarker assays, although FDA realizes that some characteristics may not apply or that different considerations may need to be addressed as it pertains to certain parameters such as relative accuracy and parallelism.”


    Parallelism is the extent to which the dose-response relationship between two materials (i.e., calibrator versus unknown specimens) is constant for the examined range of concentrations. Note that evolving terminology reflects a shift from “accuracy” to “relative accuracy” in almost all cases, and parallelism is now regarded as required, not optional. The level of analytical rigor and the industry standard practices needed to ensure accurate and reliable conclusions from biomarker assays are currently poorly defined (Lee et al. 2006).
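One common way to examine parallelism can be sketched as follows: a specimen with high endogenous biomarker is serially diluted, each dilution is read off the calibration curve, the reading is multiplied by its dilution factor, and the scatter of the dilution-corrected results is summarized as a %CV. This is only an illustrative sketch. The function name, the example readings, and the 30% acceptance limit are assumptions made for the example and are not taken from this document.

```python
import statistics

def parallelism_cv(measured, dilution_factors):
    """Return dilution-corrected concentrations and their %CV."""
    corrected = [m * d for m, d in zip(measured, dilution_factors)]
    mean = statistics.mean(corrected)
    cv_percent = 100.0 * statistics.stdev(corrected) / mean
    return corrected, cv_percent

# Hypothetical back-calculated readings (ng/mL) at 1:2, 1:4, 1:8 and 1:16
measured = [50.0, 26.0, 12.0, 6.5]
dilutions = [2, 4, 8, 16]

corrected, cv = parallelism_cv(measured, dilutions)

MAX_CV = 30.0  # assumed acceptance limit, set per assay and COU
is_parallel = cv <= MAX_CV
print(corrected, round(cv, 2), is_parallel)
```

If the calibrator and the endogenous analyte behave alike, the corrected concentrations agree across dilutions and the %CV stays small; a trend with dilution suggests a matrix effect or non-parallelism.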

    To date, the most comprehensive collection of guidance documents addressing analytical validation of biomarker assays cleared or approved as IVDs is that provided by the Clinical and Laboratory Standards Institute (CLSI) (www.clsi.org). CLSI is a recognized standards development organization and has a well-defined process for issuing standards and other guidance documents, many of which are recognized internationally. CLSI has published dozens of documents addressing issues related to laboratory testing and the development of laboratory testing products for commercial distribution, some of which (Table 2) are directly relevant to this document. The Center for Devices and Radiological Health (CDRH) also has a formal process for standards recognition and has granted formal or informal, full or partial recognition, to a number of the CLSI published standards.

    However, it should be noted that CLSI documents, although directed at parameters and metrics akin to those described in this document, have a different purpose. CLSI documents are frequently used in FDA premarket reviews of diagnostic products proposed for decision-making in medical practice. These documents may help FDA in demonstrating that a new version of a legally marketed product is substantially equivalent to its predicate or in demonstrating that a new device for a known and qualified biomarker is safe and effective.

    This Points to Consider document is intended to address the performance of an assay used to generate data to support biomarker qualification. Testing in this setting is likely to be limited and performed under the well-controlled environment of drug development. Qualified biomarkers are most often expected to be used as part of early feasibility testing to make developmental decisions, identify likely drug candidates for further testing, monitor safety of individual patients, or help in the planning of more definitive clinical trials. These uses may allow for a validation that is suitable for its COU, but is insufficient for a commercial diagnostic product being sold to multiple laboratories for potential use in multiple different settings.

    Table 2: CLSI Guidelines Related to the Validation of Biomarker Assays

    CLSI EP05-A3 Evaluation of Precision of Quantitative Measurement Procedures; Approved Guideline – Third Edition

    CLSI EP06-A Evaluation of Linearity of Quantitative Measurement Procedures: A Statistical Approach; Approved Guideline

    CLSI EP07-A2 Interference Testing in Clinical Chemistry; Approved Guideline – Second Edition


    CLSI EP09-A3 Measurement Procedure Comparison and Bias Estimation Using Patient Samples; Approved Guideline – Third Edition

    CLSI EP17-A2 Evaluation of Detection Capability for Clinical Laboratory Measurement Procedures; Approved Guideline – Second Edition

    CLSI EP21-Ed2 Evaluation of Total Analytical Error for Quantitative Medical Laboratory Measurement Procedures – Second Edition

    CLSI EP28-A3c Defining, Establishing, and Verifying Reference Intervals in the Clinical Laboratory; Approved Guideline – Third Edition

    In addition to use of CLSI standards, the Office of In Vitro Diagnostics and Radiological Health (OIR) in CDRH has published over 100 guidance documents addressing a wide variety of products (OIVD Guidance Documents).

    Although there are no free-standing documents addressing analytical validity per se, many of the product specific documents have sections outlining current thinking on best practices for establishing the analytical validation of different types of new tests and test technology.

    Finally, if a biomarker test is used for clinical decision making in the context of medical practice outside of a clinical trial, or under some circumstances in a controlled clinical study, it becomes subject to oversight under the Clinical Laboratory Improvement Amendments (CLIA) of 1988, administered by the Centers for Medicare & Medicaid Services (CMS). Among the broad range of oversight this covers are specific requirements for types of analytical performance, types of calibration, and types of quality control (CLIA manual).

    In conclusion, there are already numerous guidance documents in place published by FDA, CLSI, and CMS to aid in establishing the analytical validity of biomarker assays. These have varying relevance for biomarker assays intended to support biomarker qualification, depending on the COU, testing objectives, analytes of interest, and types of regulatory control that may be dictated by current government requirements. This Points to Consider document is intended to develop a practical and pragmatic approach to establishing analytical performance, specifically for use in biomarker qualification as a DDT.


    OIVD Guidance Documents: http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfggp/Results.CFM?Doc_Type=1&Doc_IsCur=1&Doc_OFFICE=OIVD&lookandfeel=1&SORT_ORDER=origin,documentdate%20desc

    CLIA manual: https://www.cms.gov/Regulations-and-Guidance/Legislation/CLIA/Downloads/App-C_Survey-Procedures-IGs-for-Labs-Labs-Svcs-Final.pdf

    Assay Design, Development and Validation

    In order to develop this document, several key assumptions regarding the nature and use of assays for qualification of soluble biomarkers measured in biological matrices were made and are outlined below.

    1. Assay design and technology selection considerations for biomarkers are dependent on the COU.
    2. The analytical validation parameters for assays used in biomarker qualification are not identical to the expectations outlined for pharmacokinetic (drug concentration) or toxicokinetic assays.
    3. The expectations for the performance characteristics of assays used in biomarker qualification are dependent on the COU and ultimately the application in drug development.
    4. Qualification of a biomarker does not indicate that assays used to generate the qualification data are approved or cleared by CDRH.
    5. An FDA approved or cleared assay is not required to support a biomarker qualification effort, and the strictly defined clinical utility statement for such assays may not match the desired COU as a DDT, and may be considered an “off-label” use of the biomarker assay. Thus, the final validated method should meet acceptable performance characteristics to support qualification of a biomarker.
    6. The performance characteristics of the assays used for qualifying biomarkers are considered suitable for use in drug development and regulatory submissions but are not assumed to be directly acceptable in, or transferrable to, regulated clinical practice, without clearance or approval by CDRH.

    Assay Design and Technology Selection

    The most important pre-requisite for assay design and technology selection is the definition of the intended purpose or COU of the biomarker. Full consideration of the COU will focus attention on practical considerations for the assay’s design. For example, consideration of risks and standard of care practices for sample acquisition might affect assay design. Likewise, intended use under highly controlled or field-like conditions, for high volume or low volume testing, and with professionally trained or lay operators can drive design decisions enabling practical use of the assay. For some COUs, planning for iterative changes in the assay is needed (e.g., for scale-up or for different/improved performance over the course of a drug development program).

    Even though limited historical data for novel biomarker end points may be available with regard to endogenous levels and prevalence in normal and diseased populations, establishing the working criteria for the assay is foundational for the selection of appropriate detection technology, and for designing the assay format and selection of optimal reagents. Table 3 highlights points to consider in assay design and development.

    Table 3: Points to Consider in Assay Design and Development

    Biomarker Stability of biomarker if known in disease conditions


    Context of Use Application requirements

    Test population (e.g., human (healthy, disease), animal)

    Patient population comorbidities

    Sample acquisition

    Use Environment Lab vs. field

    Ruggedness / Robustness

    User training

    Maintenance

    Sample collection timing, methods, transport and storage

    Immediate vs stored analyses

    Sample preservation

    Single/multiple sites

    Single/multiple use

    Contamination effects (e.g., blood in CSF)

    Assay features Analyte(s) selection (measurands)

    Qualitative/SemiQuantitative/Quantitative

    Calibrators/ reference material

    Controls (external, internal)

    Reportable range

    Reference interval

    Specimen volume/quantity requirements

    Allowable analytical precision and total error

    Desired detection sensitivity – upper and lower limits and putative detection range

    Selectivity and specificity considerations including probable interference factors in endogenous matrix

    Results turn-around time

    Batchwise vs random access performance in automated clinical analyzers

    Automation

    Process software

    Analyte or reagent carry-over

    Analytical software, user interface

    Waste/hazard containment

    Cost

    Technical support requirements

    Iteration (versioning, Platform/technology migration, convergence) Interim data evaluation

    Assay refinement

    Selection of a technology platform for biomarker detection will be primarily driven by the nature of the biomarker being measured (protein, nucleic acid, etc.) and the sensitivity requirements. The biophysical nature of the assay technology and the quality of the assay reagents will impact the absolute and relative measurements of the intended biomarker. Typically, plate- and bead-based assay formats and a variety of detection modalities including fluorescence, chemiluminescence, electrochemiluminescence, chromogenic detection, mass-spectrometer based assessments, and relatively new acoustic detection systems can be considered for the evaluation. Since most of the current automated technologies demonstrate acceptable precision, comparing various assay parameters using available reagents for biomarker detection with a given technology becomes a critical consideration for technology selection. A method comparison of performance between technology platforms may be assessed using a fixed set of assay reagents and normal samples to estimate the reproducibility and relative error of back fit concentrations of the biomarker (spiked or endogenous) in relevant buffer matrix. Then the comparison should be extended to disease samples of interest to measure endogenous biomarker detection. Use of parallelism criteria will enable the identification of potential interference factors in the desired matrix. Another important consideration for technology selection is scalability. Manual methodologies requiring high technical expertise may not be suitable for a biomarker method that requires global implementation. Likewise, the ease of use of the data processing software is another important consideration.

    Once the technology platform is selected, the assay can be optimized using a checkerboard or design of experiment (DOE) approach (fractional factorial experiments, central composite designs) to simultaneously evaluate multiple parameters such as minimum required dilution of samples, assay reagent concentrations, calibrator levels, incubation periods, and blocking and washing parameters, prior to finalizing the assay format. Another important consideration in the assay design finalization phase is the selection of a regression model for the calibration curve of quantitative assays (polynomial (linear, quadratic); nonlinear models (four or five parameter logistic models, power model)) to assess the performance characteristics of the prototype method and to show acceptability of system suitability criteria. Further development of the assay may then proceed, defining pre-analytical factors, followed by validation, in preparation for implementation of the developed methodology.
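As an illustration of one of the regression models named above, the following minimal sketch shows a four parameter logistic (4PL) calibration function and its inverse, which is what back-calculation of a concentration from a measured signal amounts to. The parameter values and calibrator levels are hypothetical and chosen only for the example; in practice the parameters would come from fitting the calibrator responses, and the weighting scheme and model choice would be justified during development.

```python
def four_pl(x, a, b, c, d):
    """4PL response: a = response at zero concentration,
    d = response at infinite concentration,
    c = inflection point, b = slope factor."""
    return d + (a - d) / (1.0 + (x / c) ** b)

def inverse_four_pl(y, a, b, c, d):
    """Back-calculate the concentration that gives response y."""
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)

# Hypothetical fitted parameters (signal in arbitrary units, conc in ng/mL)
params = dict(a=0.05, b=1.2, c=40.0, d=2.5)

# Hypothetical calibrator levels (ng/mL)
calibrators = [5.0, 10.0, 20.0, 40.0, 80.0]

for nominal in calibrators:
    signal = four_pl(nominal, **params)
    back_fit = inverse_four_pl(signal, **params)
    relative_error = 100.0 * (back_fit - nominal) / nominal
    print(nominal, round(signal, 4), round(back_fit, 4), round(relative_error, 6))
```

With real data the signal would be the observed calibrator response rather than the model value, so the relative error of the back-fit concentration would be non-zero and could be compared across candidate regression models.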

    Pre-Analytical

    The following discussion is meant to provide points to consider, but will not necessarily apply to each qualification submission. It is important to evaluate early within each project which pre-analytical factors are relevant and strive to find the appropriate balance of rigor necessary in a fit-for-purpose approach. Pre-analytical factors refer to all procedures that occur prior to sample analysis including sample collection, processing, transportation, and storage (see Table 4 for some examples). The physiology and/or patient specific characteristics of the human research participants are largely outside the control of the laboratory but can also have a significant impact on laboratory results. These include, but are not limited to, such factors as age, gender, ethnicity and ongoing diseases. Factors such as exercise, eating, drinking, and medication also affect patient results. These factors should be thought of as part of the sample history and should be documented as completely as possible.

    Table 4 lists some examples of the pre-analytical factors in sample handling and processing that can affect quantitation of biomarkers. These variables can introduce inconsistency to assay results. Not only must these factors be taken into consideration with regard to the COU in specific populations early in the assay development phase prior to the full validation of the assay, but they must be established and remain consistent across assay validation, qualification, and post-qualification use. To ensure consistency, standard operating procedures, quality control indices, and criteria for sample acceptance or exclusion must be developed.

    It should be appreciated that the pre-analytical factors may change across multiple assays for the same biomarker and need to be established for each assay. The pre-analytical factors for the same biomarker may also be different depending upon the biological matrix being analyzed. Not all biomarkers will be impacted by all factors, but as learning increases, documentation of earlier studies will make previously collected data interpretable. Table 4 is not meant to be exhaustive, but to provide common examples of factors to consider; sound scientific expertise and understanding must be utilized for each assay developed for a specific COU.

    Table 4: Examples of Pre-Analytical Factors to be Considered

    Pre-Analytical Factor Examples (not all inclusive)

    Sample Type Whole blood, cord blood, serum, plasma, platelet-poor plasma (PPP), platelet-rich plasma (PRP), neat urine, centrifuged urine, saliva, ocular fluid, cerebrospinal fluid (CSF)

    Interference Endogenous: lipids (lipemia), hemoglobin (hemolysis), icterus (bilirubin), glucose, rheumatoid factors (immunoglobulins, CRP)

    Exogenous: drug interferences, OTC medications, skin disinfectants, collection tube additives/preservatives, bacterial contamination

    Collection Procedure Collection method (catheter vs venipuncture), type of needle, time of venous occlusion, collection site, volume, draw order, patient posture, adherence of staff to SOPs, timing of sample to pretreatment (protein inhibitors)

    Collection Tube Anticoagulant or preservative type and concentration (e.g. clot activator, EDTA, heparin, thrombin, sodium citrate, acid citrate, sodium fluoride, protein inhibitors), tube composition (low protein adherence, plastic leaching); breakage, proper tube labelling

    Sample Collection Time Time of day, frequency

    Collection Variables Proper mixing; use of additive, preservative, and/or anticoagulant, temperature, light exposure, timing between collection and processing

    Sample processing Centrifugation (relative centrifugal force, angle/pelleting factor, time, braking), aliquoting (e.g. micro-aliquots µL relative to tube volume), storage tube material, closure, type of aliquot tube, de-salting, solid-phase extraction, adherence of staff to SOPs

    Post Collection Variables Collection and immediate storage temperature, minimization of time not stabilized, requirements for protection from light

    Logistics of transport Temperature (shipping on wet ice, dry ice), permits for human or primate blood, manifests, upright shipping, light exposure

    Storage Considerations and Stability Desired short and long term stability goals (timeframe), desiccation, oxidation, sublimation, temperature (4°C, -20°C, -70°C, -80°C, -120°C, -196°C (liquid nitrogen)), freeze/thaw cycles

    Thawing Considerations Temperature (room temperature, quick thaw, etc.), addition of stabilizers

    A number of resources and references have been developed to help identify and control sources of pre-analytical variation, including CLSI guidelines which are listed in Appendix 2. The NCI Biospecimen Research Database (http://brd.nci.nih.gov) provides a compilation of primary literature that addresses biospecimen science. The Biospecimen Reporting for Improved Study Quality recommendations outline and prioritize elements for biospecimen studies (Moore et al. 2011). The International Society for Biological and Environmental Repositories Biospecimen Science Working Group developed a “Standard PREanalytical Code” (SPREC) that provides a common list of pre-analytical variables for fluid samples and a corresponding sample labeling system (code) that is intended to provide a generic format for specimen comparison (Betsou et al. 2010). Many of the measures implemented in clinical diagnostics as quality indicators for the pre-analytical phase may also apply to the qualification setting. For example, the International Federation of Clinical Chemistry Working Group on Laboratory Errors and Patient Safety has defined quality indicators for the pre-analytical stage (West et al. 2017). These are some of the many references available to aid the assay developer in controlling pre-analytical variability.

    The importance of documenting patient characteristics and understanding the influence of pre-analytical factors cannot be overemphasized. Standardized techniques for sample collection and handling need to be employed, quality control procedures developed, and personnel adequately trained to ensure sample integrity. It is important to realize that some retrospective/biobanked samples may not have been collected in a manner consistent with the pre-analytical conditions defined during assay validation. Indeed, some biobanked samples may have incomplete documentation which makes it difficult or impossible to establish the full history of the patient or status of the sample. These samples should be used with caution, and consideration given to not using the sample in question or flagging the results as questionable.


    Analytical Performance Requirements for Biomarker Assays

    When considering the performance needs of a biomarker assay, it is expected that efforts are made to “understand the biology” of the biomarker. While it may be difficult to execute this to complete satisfaction, a process will be described to provide preliminary assessment for comparison of required precision and bias goals based upon biological variation of the biomarker levels in the intended population. When this approach is technically unfeasible, a consideration for the “confidence” in measuring effect sizes is described (such as treatment over time for an individual or treatment versus control group), derived from analytical parameters determined during assay validation.

    A number of approaches are proposed based on the use of a biomarker for decision making (Klee 2010). Many of these approaches require a clinical context for assignment of appropriate performance characteristics. A more tractable determination of appropriate quality specifications is derived from two components of biological variation, namely, within-subject (CVI) and between-subject (CVG) variation, where the coefficient of variation (CV) is the standard deviation divided by the mean, expressed as a percentage (Fraser et al. 1997). These are base determinants specifying the minimum meaningful change in biomarker concentration which can be used to support or demonstrate a significant clinical change. As analytical variation (CVA) will add variability to the “true” test result, three levels of CVA are proposed. The optimal specification is CVA < 0.25*CVI, where CVA adds ~3% to the variability of the test result. A more appropriate and widely accepted quality standard is a “desirable” specification of CVA < 0.5*CVI, where CVA adds ~12% to the variability of the test result. In the situation where the desired performance is outside of the performance capability of the current technology or methodology, a minimal specification of CVA < 0.75*CVI, where CVA adds ~25%, is proposed, with desirable specifications set as an improvement goal.
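The ~3%, ~12% and ~25% figures can be reproduced with a short calculation, sketched below under the assumption (not stated explicitly in this document) that analytical and within-subject variation are independent, so that their variances add and the total CV is the square root of CVA² + CVI². The function and dictionary names are illustrative only.

```python
import math

# Ratio of CVA to CVI allowed at each tier, as given in the text
TIERS = {"optimal": 0.25, "desirable": 0.50, "minimal": 0.75}

def added_variability_percent(ratio):
    """Percent increase in total CV over CVI alone when CVA = ratio * CVI,
    assuming independent sources so that variances add."""
    return 100.0 * (math.sqrt(1.0 + ratio ** 2) - 1.0)

def precision_goals(cv_i):
    """Upper limit on CVA (%) at each tier for a given CVI (%)."""
    return {name: ratio * cv_i for name, ratio in TIERS.items()}

for name, ratio in TIERS.items():
    print(name, round(added_variability_percent(ratio), 1))

# Illustrative CVI of 16% (a made-up value for the example)
print(precision_goals(16.0))
```

The loop prints roughly 3.1, 11.8 and 25.0, which matches the rounded values quoted for the optimal, desirable and minimal tiers.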

    Furthermore, analytical bias (BA) may be considered in a similar context, that is, the acceptable error associated with a measurement that would incorrectly assign a change from a group as a function of analytical performance (i.e., the error that an effect of treatment is assigned when compared to the group of subjects receiving treatment or the subject result pre-treatment). Three tiers are proposed. Optimal bias, $B_A < 0.125\,(CV_I^2 + CV_G^2)^{1/2}$, would falsely assign a maximum of 3.3% and minimum of 1.8% of subjects outside the group at a 90% confidence interval of the reference limits (mean ± 1.645 times the standard error (SE), where $SE = SD/N^{1/2}$, with N being the relevant sample size). Desirable bias is defined as $B_A < 0.25\,(CV_I^2 + CV_G^2)^{1/2}$ (falsely assigning a maximum of 4.4% and minimum of 1.4% outside the group at a 90% confidence interval). Minimal acceptable bias is thus defined as $B_A < 0.375\,(CV_I^2 + CV_G^2)^{1/2}$ (5.74% and 1.4% above and below the group at a 90% confidence interval). As noted above, minimal bias specifications should only be used when the performance capability of the current technology or methodology does not facilitate achievement of desirable bias goals, the latter being a goal for enhancement of method performance.
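The three bias tiers translate directly into numeric limits once CVI and CVG are known. The sketch below applies the multipliers 0.125, 0.25 and 0.375 from the text; the function name is illustrative, and the input values (CVI = 15.938%, CVG = 27.874%) are the example estimates reported in Table 5 of this document, used here only to show the arithmetic.

```python
import math

# Multipliers for each bias tier, as given in the text
BIAS_TIERS = {"optimal": 0.125, "desirable": 0.25, "minimal": 0.375}

def bias_goals(cv_i, cv_g):
    """Upper limit on analytical bias BA (%) at each tier:
    BA < factor * sqrt(CVI^2 + CVG^2)."""
    combined = math.sqrt(cv_i ** 2 + cv_g ** 2)
    return {name: factor * combined for name, factor in BIAS_TIERS.items()}

# Example CVs (%) taken from Table 5
goals = bias_goals(cv_i=15.938, cv_g=27.874)
for name, limit in goals.items():
    print(name, round(limit, 2))
```

Because the limits scale with the combined biological variation, a biomarker with wide between-subject variation tolerates proportionally more analytical bias than a tightly regulated one.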

    Preliminary Determination of CVI and CVG

    A streamlined and simplified proposal for provisional determination of CVI and CVG is described below (Ichihara and Boyd 2010). The example overcomes the confounding variables effect of univariate analysis by way of nested analysis of variance (ANOVA), allowing simultaneous comparison of multiple sources of variance within a single experiment. The study comprises drawing samples from three subjects (pure component of between-individual variance, CVG) over 3 days (pure component of within-individual variance, CVI) and measuring each specimen twice (singlicate measure on two separate days to derive in part the pure determination of analytical variance, CVA). The subjects required for this study should either be normal (control arm of study), diseased (testing arm of the study) or replicated as both a control arm and testing arm independently (two sets of 3 subjects). The goal of the study is to determine CVI and CVG in one or the other of the subject groups to define performance needs. Therefore, the 3 subjects must not be a mixture of normal and diseased groups. More subjects are naturally optimal if the expected change in the biomarker is small (refer to “power” and statistician engagement comments below, where small is arbitrarily assigned as < 20%, to reflect the methodological constraints often associated with immunometric and MS based assay performance).

    Table 5: Example data and two-level Nested ANOVA for Preliminary CVI and CVG determination

    Sample draw
    Subject/Assay run    Day 1    Day 2    Day 3
    Subject 1 run 1      23       25       27
    Subject 1 run 2      25       24       25
    Subject 2 run 1      28       35       39
    Subject 2 run 2      28       34       40
    Subject 3 run 1      52       48       37
    Subject 3 run 2      50       48       36

    Two-level Nested ANOVA (alpha = 0.05)
    Source                               SS        df    MS        F        p-value    sig
    Between individual variance (CVG)    1244.3    2     622.17    10.04    0.0122     yes
    Within individual variance (CVI)     371.7     6     61.94     69.69    0.0000     yes
    Residual                             8.0       9     0.89
    Total Variance                       1624      17    95.53

    Analysis of variance component (VC)
    Source                               VC         VC, %     SD       CV (VC)
    Between individual variance (CVG)    93.370     74.824    9.663    27.874
    Within individual variance (CVI)     30.528     24.464    5.525    15.938
    Residual                             0.889      0.712     0.943    2.720
    Sum of variance                      124.787

    Grand mean 34.67
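The Table 5 figures can be reproduced from the raw data with a balanced two-level nested ANOVA. The sketch below uses only the Python standard library; the variable names are illustrative, and the variance components are obtained from the expected mean squares of a balanced design (an assumption that holds here because every subject has the same number of days and runs). A negative component, which can occur with noisy data, would normally be set to zero; that case does not arise in this example.

```python
import math

# data[subject][day] = [run 1, run 2], values copied from Table 5
data = [
    [[23, 25], [25, 24], [27, 25]],   # Subject 1
    [[28, 28], [35, 34], [39, 40]],   # Subject 2
    [[52, 50], [48, 48], [37, 36]],   # Subject 3
]
n_subj, n_day, n_run = len(data), len(data[0]), len(data[0][0])

def mean(xs):
    return sum(xs) / len(xs)

grand_mean = mean([v for subj in data for day in subj for v in day])
subj_means = [mean([v for day in subj for v in day]) for subj in data]
cell_means = [[mean(day) for day in subj] for subj in data]

# Sums of squares for each level of the nesting
ss_between = n_day * n_run * sum((m - grand_mean) ** 2 for m in subj_means)
ss_within = n_run * sum((cell_means[i][j] - subj_means[i]) ** 2
                        for i in range(n_subj) for j in range(n_day))
ss_resid = sum((v - cell_means[i][j]) ** 2
               for i in range(n_subj) for j in range(n_day) for v in data[i][j])

# Mean squares
ms_between = ss_between / (n_subj - 1)
ms_within = ss_within / (n_subj * (n_day - 1))
ms_resid = ss_resid / (n_subj * n_day * (n_run - 1))

# Variance components from expected mean squares (balanced design)
vc_resid = ms_resid
vc_within = (ms_within - ms_resid) / n_run
vc_between = (ms_between - ms_within) / (n_day * n_run)

def cv_percent(vc):
    return 100.0 * math.sqrt(vc) / grand_mean

cv_g, cv_i, cv_a = cv_percent(vc_between), cv_percent(vc_within), cv_percent(vc_resid)
print(round(cv_g, 3), round(cv_i, 3), round(cv_a, 3))
```

The p-values in Table 5 additionally require the F distribution, which is not in the standard library, so they are not recomputed here.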


    Using data derived from Table 5 (Ichihara and Boyd 2010), the preliminary desirable determination of analytical precision (CVA) would be < 7.969% (CVA

    estimation of TAE, a multiplier of 1.65 is used (Z-score, 5% excluded at both ends of a distribution). Consequently, the formula becomes TAE = BA + 1.65*CVA.

    Analytical determination of precision is generally derived from inter-assay precision studies; however, determination of bias requires some consideration of absolute truth. For established biomarkers, higher order reference methods (with materials for testing) or comparison to existing assays is performed to determine bias of new methods (Klee 2010; Westgard et al. 1974; Fraser 2001). Determination of bias in the absence of these comparators may require consideration of analytical parameters that enable calculation, such as spike and recovery (Bias (%) = 100 – recovery (%)), or back-calculated bias of samples of known concentration, such as Lower Limit of Quantification (LLOQ) and Upper Limit of Quantification (ULOQ) replicates from inter-assay accuracy studies (Bias (%) = 100 – accuracy (%)). The influence of imprecision on the measurement can be reduced by assaying replicates in multiple runs, which reduces imprecision by a factor of $n^{1/2}$ (n = number of replicates; Fraser 2001).
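The two relationships in the preceding paragraph are simple enough to express directly. The sketch below is illustrative only: the function names and the example inputs (92% recovery, 15% CVA, 4 replicates) are assumptions chosen for the example, not values from this document.

```python
import math

def bias_from_recovery(recovery_percent):
    """Bias (%) = 100 - recovery (%), per the spike-and-recovery relation."""
    return 100.0 - recovery_percent

def cv_of_replicate_mean(cv_a, n_replicates):
    """Imprecision of the mean of n replicates: CVA / sqrt(n)."""
    return cv_a / math.sqrt(n_replicates)

# Hypothetical spike recovery of 92%
bias = bias_from_recovery(92.0)

# Hypothetical single-measurement CVA of 15%, averaged over 4 replicates
cv_mean = cv_of_replicate_mean(15.0, 4)

print(bias, cv_mean)
```

Averaging replicates tightens precision but leaves bias unchanged, which is why the two quantities are estimated and specified separately.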

    An example of the generation of TAE for an assay with a bias of 10% and precision of 15% (assumed homoscedasticity) is shown in Table 6 below. The calculation is TAE = 10 + 1.65*15 = 34.75%. When analyzing samples of true concentrations (10, 30 and 50 ng/mL), the measurable concentration range incorporating TAE is calculated as upper (true result * (100 + TAE)/100) and lower (true result * (100 - TAE)/100) boundaries.

    Table 6: Calculating TAE from Bias and Precision and Determining Measurement Ranges (Uncertainty)

    True Concentration   Bias      Precision   Total Allowable   Measured Concentration Range
    (ng/mL)              (%, BA)   (%, CVA)    Error (%, TAE)    within TAE (ng/mL)
                                                                 Lower      Upper
    10                   10        15          34.75             6.525      13.475
    30                   10        15          34.75             19.575     40.425
    50                   10        15          34.75             32.625     67.375
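The Table 6 values follow directly from TAE = BA + 1.65*CVA. A minimal sketch of that arithmetic is given below; the function names are illustrative and are not taken from the source.

```python
def total_allowable_error(bias_pct, cv_pct, z=1.65):
    """TAE (%) = bias + z * imprecision, with the 1.65 multiplier from the text."""
    return bias_pct + z * cv_pct

def measured_range(true_conc, tae_pct):
    """Lower and upper measured concentrations consistent with a true value."""
    lower = true_conc * (100.0 - tae_pct) / 100.0
    upper = true_conc * (100.0 + tae_pct) / 100.0
    return lower, upper

tae = total_allowable_error(10.0, 15.0)
for conc in (10.0, 30.0, 50.0):
    lower, upper = measured_range(conc, tae)
    print(f"{conc:5.1f} ng/mL: TAE {tae:.2f}%  range {lower:.3f} to {upper:.3f} ng/mL")
```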

    The results from Table 6 are graphically displayed in Figure 2. The line of unity (solid) is bracketed with divergent TAE boundaries for upper (short dash) and lower (long dash) lines with slopes of y = 1.3475x and y = 0.6525x respectively. For analysis of a sample with a measured result of 40 ng/mL (dotted line), the true result can be interpolated from these TAE boundary conditions; lower range of true result = 29.685 ng/mL (40 ng/mL / 1.3475) and upper range of true result = 61.303 ng/mL (40 ng/mL / 0.6525). The range of these results represents, in part, the measurement uncertainty.
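The interpolation described above inverts the two TAE boundary lines. A short sketch, with a hypothetical function name and assuming TAE is below 100%:

```python
def true_range_from_measured(measured, tae_pct):
    """Range of true concentrations that could yield a given measured result.

    The lower bound divides by the upper boundary slope (1 + TAE/100) and the
    upper bound divides by the lower boundary slope (1 - TAE/100).
    """
    if not 0.0 <= tae_pct < 100.0:
        raise ValueError("tae_pct must be in [0, 100)")
    lower_true = measured / (1.0 + tae_pct / 100.0)
    upper_true = measured / (1.0 - tae_pct / 100.0)
    return lower_true, upper_true

lower_true, upper_true = true_range_from_measured(40.0, 34.75)
print(round(lower_true, 3), round(upper_true, 3))
```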


    Figure 2: Extrapolation of Measurement Uncertainty from TAE

    [Figure: measured concentration (ng/mL, y-axis, 0 to 100) plotted against true concentration (ng/mL, x-axis, 0 to 100). Series: upper TAE boundary (y = 1.3475x), true value (y = x), lower TAE boundary (y = 0.6525x), and the interpolated 40 ng/mL measured result.]

    The penultimate component of the process determines the change that may be observed in a subject following treatment (pretreatment versus post-treatment measurement), where the impact of treatment upon biology is demonstrated following deconvolution of measurement uncertainty from the two measurements. Deciding whether an observed difference may be assigned to biological change requires consideration of the degree of false positivity that is acceptable (incorrectly assigning measurement error to biological change; type 1 error in Figure 3) together with the degree of false negativity that is acceptable (incorrectly missing biological change because the results fall within measurement error; type 2 error). Using 95% power as in Figure 3, biological change would be inferred with a 5% false negative rate.

    Consequently, for a TAE of 34.75%, the difference between two measures of the same subject that could be attributed to biological changes with 5% false positivity and 5% false negativity is calculated as 177.35% (biological change threshold = 5.1035*34.75%). In other words, an almost three-fold difference between two results would be necessary before there is confidence that a biological change is being observed. Lower power results in less confidence that observed biological changes are true. For example, using 80% power (20% type 2 error), measured differences > 138.17% (greater than two-fold changes) are attributable to biological change; however, the false negative rate is 4-fold higher than at 95% power.
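The thresholds above are a fixed multiplier applied to TAE. The sketch below takes the multipliers as given in the Figure 3 legend rather than deriving them; the dictionary and function names are illustrative.

```python
# Slopes read from the Figure 3 legend (power in %, multiplier of TAE or CVA).
# They are used here as given constants, not derived from first principles.
POWER_MULTIPLIER = {70: 3.5223, 80: 3.9762, 90: 4.5937, 95: 5.1035}

def biological_change_threshold(tae_pct, power=95):
    """Minimum % difference between two results attributable to biology."""
    if power not in POWER_MULTIPLIER:
        raise ValueError(f"no multiplier tabulated for power={power}")
    return POWER_MULTIPLIER[power] * tae_pct

for power in sorted(POWER_MULTIPLIER):
    print(power, round(biological_change_threshold(34.75, power), 2))
```

For a TAE of 34.75% this evaluates to roughly 177% at 95% power and 138% at 80% power.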


    Figure 3: Influence of power analysis on measurement differences as a function of TAE or CVA

    [Figure: % difference (y-axis, 0% to 300%) plotted against TAE or CVA (x-axis, 0% to 50%). Series: Power 70, y = 3.5223x; Power 80, y = 3.9762x; Power 90, y = 4.5937x; Power 95, y = 5.1035x.]

    The final step (or perhaps the first step when anticipated effect sizes are small (< 20%)) in considering the applicability of the analytical method (from analytical validation studies, not a priori CVI and CVG assessment) involves the “effect size” that needs to be measured with confidence (prescribed false negative and false positive rates). While the above example demonstrates the implications of TAE when measuring one subject at two discrete time points in singlicate, smaller “effect sizes” may be discernible by incorporating a larger number of subjects in cohorts or by repeat analysis of subject specimens from all time points. We recommend that the details described thus far be used as a framework for discussion with an appropriate statistician. Ideally, the study design incorporates these criteria to discern significant biological changes from analytical limitations a priori, ensuring that appropriately powered studies are carried out to support the COU.

    Assay Performance

    Parameters for Validating Analytical Assay Performance Characteristics

    In this section, the seven key analytical assay parameters needed to validate biomarker assay performance are discussed. As outlined in the draft PK bioanalytical guidance (FDA 2013), basic parameters have already been identified that should be considered when developing an assay for the qualification of biomarkers. It should be noted that not all parameters will be applicable for every biomarker assay, but each should be considered based on the biomarker COU. If a parameter is not addressed, a justification should be formulated for why it was excluded at that time. The parameter may be added back as needed, such as if the COU changes. Different platforms will have different requirements for the assessment of performance criteria and may have other considerations beyond this list, or may not include some parameters.

    When considering the performance and rigor of criteria required for biomarker assay analytical validation, it is essential to understand the purpose and clinical requirements of that assay as they relate to the biomarker’s COU. Early in the exploration of a biomarker’s utility, a simple and minimally validated assay may be sufficient to generate informative data. However, when qualifying a biomarker, a fully validated assay will be needed to provide robust data for confirmatory and clinical study sample analysis.

    Analytical validation is the confirmation via extensive laboratory investigations that the analytical performance characteristics of an assay are suitable and reliable for its intended use. At a fundamental level, the validation of a biomarker assay used for qualification should include the assessment of seven parameters: accuracy (relative), analytical measurement range (including LLOQ and ULOQ), parallelism (and/or dilutional linearity as appropriate), reproducibility, selectivity, specificity, and stability (Table 7). In some cases, information on additional analytical performance parameters may be needed, including accuracy (or relative accuracy), linearity, robustness, and ruggedness (Table 8). Detailed definitions of these measurements can be found in Appendix 1. Information on how to evaluate assays using these parameters can be found in the next section titled Assay Validation Acceptance Criteria and also in Appendix 1.

    Table 7: Seven Key Analytical Parameters to be Considered during Biomarker Assay Validation

    Accuracy (Relative)
    Analytical Measurement Range
      o Lower limit of quantitation
      o Upper limit of quantitation
    Parallelism
      o Minimum Required Dilution
      o Dilutional linearity
    Reproducibility
      o Within run
      o Between runs
      o Between days
      o Between operators
      o Between lots
    Selectivity
    Specificity
    Stability
      o Bench top
      o Short term
      o Long term
      o Freeze-thaw

    Table 8: Additional Analytical Parameters to be Considered during Biomarker Assay Validation

    Accuracy/Trueness
    Robustness
    Ruggedness

    As with validation of all bioanalytical methods, a primary consideration is the number of samples that will be required during the validation of a biomarker assay. Table 9 gives a range of expectations for evaluations of precision, based upon the guidance documents and standards for PK and IVD method validations, according to varying COUs and intended uses. As seen in Table 9, method validation for use in exploratory/feasibility studies requires the least amount of performance data; review of class III (high risk) medical devices requires the most. Table 9 has been derived from information condensed from guidance documents and pivotal scientific publications; individual reference documents should be consulted for additional detail and justification. Additional samples will be needed depending on the number of analytical parameters being characterized. For assays being used to support biomarker qualification, an approach like that outlined for the CDER Bioanalytical Full Method Validation in Table 9 is appropriate.

    Table 9: Comparison of Regulatory Expectations for Precision Validation Studies

    Columns:
      (1) Crystal City White Papers, Partial Method Validation: Exploratory/Feasibility Phase of Testing (a)
      (2) CDER Bioanalytical Full Method Validation: For Use in Biomarker Qualification (b, c)
      (3) CDRH 510(k): For Clinical Use (Class II Medical Devices) (d)
      (4) CDRH PMA: For Clinical Use (Class III Medical Devices) (d)

    Parameter                      (1)      (2)      (3)      (4)
    Controls                       3        6        2        3
    Duplicates, analytical         2        2        2        2
    Replicates, sample             5        5        -        -
    Sites                          1        1        2        3
    Operators                      1 (e)    1 (e)    2        3
    Reagent Lots                   1        1        2        3
    Calibration Cycles             NA       NA       5        5
    Runs                           6        6        2 (f)    2 (f)
    Days                           3 (g)    3 (g)    20       20
    Runs/Day                       1        1        2        2
    Minimum Data Points / Sample   60       120      640      2160

    (a) White Papers: DeSilva (2003), Viswanathan (2007a,b), Lee (2007), Lee (2009)
    (b) FDA Bioanalytical Method Validation Final 2001
    (c) FDA Bioanalytical Method Validation Draft 2013
    (d) Harmonized w/ CLSI Approved Guideline Method Evaluation Protocol EP05-A3
    (e) DeSilva (2003), Viswanathan (2007a,b), Lee (2006), Lee (2009) recommend two (2)
    (f) Two runs per day (AM & PM) for 20 days yielding a total of 40 runs
    (g) Not per day, but over three days, ergo a total of 6 runs

    Method precision and relative accuracy are performance characteristics that describe the magnitude of random errors (variation) and systematic error (bias) associated with repeated measurements of the same homogeneous (native or spiked pools or spiked non-physiological matrices, in decreasing order of preference) sample under specified conditions. Within-run precision, between-run precision, and accuracy should be initially established during method development, followed by confirmation during pre-study validation. However, biomarkers rarely have well-characterized reference standards or calibrators. Therefore, precision and relative accuracy parameters are often established either (preferably) using patient samples, or a surrogate such as the most appropriate recombinant control material spiked into a blank or surrogate matrix. In the absence of reference methods or materials, sponsors developing assays to support biomarker qualification should have well-defined and well-characterized surrogates for reference standards to ensure performance of the assay of interest remains consistent over time.

    It should be noted that if testing is being performed for the purpose of individual clinical decision-making within clinical trials, such as patient dose selection, assay performance will be subjected to oversight by the CLIA administered by the CMS. CLIA requires laboratories to establish and test analytical performance and to assure constant test performance by carrying out calibration verification using samples with known values at 6-month intervals. This repeat testing is possible only if there is a reliable source of a reference or surrogate standard for assessment of calibration drift.

    When biomarker samples are being analyzed across multiple laboratories, both intra- and inter-laboratory reproducibility should be evaluated. Table 10 provides a guide for comparing sample requirements for inter-laboratory versus intra-laboratory reproducibility based on FDA 510(k) and PMA guidelines. In cases where only a single laboratory is utilized to conduct biomarker qualification analysis, there is no need to demonstrate inter-laboratory reproducibility. However, inter-instrument reproducibility may be applicable. These numbers reflect diagnostic standards and are shown only for comparative purposes.


    Table 10: Considerations for Evaluating Inter-laboratory vs. Intra-laboratory Reproducibility

    Validation Sample Replicate    Multiple        Single
    Expectations                   laboratories    laboratory
    Controls                       6               3
    Duplicates                     2               -
    Replicates                     5               2
    Sites                          2-3             1
    Operators                      2-3             1
    Reagent Lots                   2-3             1
    Calibration Cycles             5               0
    Runs                           40              6
    Days                           20              2-3
    Runs/Day                       2               1
    Min. Obs./Sample               ≥640            120

    System Suitability, Assay Format and Detection System

    Initial decisions on assay format and the detection system should be made based on the characteristics of the analyte. These decisions can be influenced by factors such as the necessary assay detection limits, the available reagents, and the volume of sample that the study will provide. The system suitability is commonly measured by injecting replicate standards on a GC, HPLC, or MS, or detecting known positives with a kit assay.

    Assay Validation Acceptance Criteria

    Determining assay acceptance criteria for biomarker assays is likely the most challenging exercise for a biomarker assay validation. Unlike the predefined acceptance criteria established for small and large molecule PK assays, the acceptance criteria for biomarker assays are dependent upon each biomarker’s physiological behavior, similar to the validation approach used for IVD methods. However, a more difficult question is the nature of the appropriate validation samples.

    As discussed by Lee et al. (2006), the fit-for-purpose status of a biomarker method is deemed acceptable if the assay is capable of discriminating changes that are statistically significant from the intra- and inter-subject variation associated with the biomarker. If the assay is not capable of such discrimination, either the assay lacks the appropriate analytical attributes, the biomarker is not suitable for the proposed purpose, or the study size / subject selection is inappropriate. For example, an assay with 40% total allowable error determined during validation may be adequate for statistically detecting a desired treatment effect in a clinical trial for a certain acceptable sample size, but this same assay may not be suitable for a clinical trial involving a different study population that has much greater physiological variability. In this example, bias would be ~10% and imprecision would be 18% (TAE = bias + 1.65 times imprecision). The fallacious assumption that bias is zero mistakenly allows for apparently increased imprecision. However, if bias goes up, the permitted imprecision goes down; i.e., if bias = 20%, then imprecision must not exceed about 12% to meet this TAE criterion.
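The trade-off between bias and imprecision in this example can be made explicit by rearranging TAE = bias + 1.65 * imprecision. The sketch below is a simple illustration with a hypothetical function name.

```python
def max_allowable_imprecision(tae_pct, bias_pct, z=1.65):
    """Largest CV (%) that still satisfies bias + z * CV <= TAE."""
    if bias_pct >= tae_pct:
        raise ValueError("bias alone already consumes the total allowable error")
    return (tae_pct - bias_pct) / z

# With a 40% TAE, more bias leaves less room for imprecision.
for bias in (0.0, 10.0, 20.0):
    print(bias, round(max_allowable_imprecision(40.0, bias), 1))
```

Assuming zero bias would suggest that roughly 24% imprecision is tolerable, which is the overstatement the text warns about.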

    To be considered acceptably validated: (1) appropriate assay characterization practices must be applied (the Seven Key Analytical Parameters defined in Table 7 plus relative accuracy), and (2) the assay must be able to distinguish biomarker changes that are outside of the normal biological variability. Of course, it is desirable to have a well-performing, fully validated assay so that if additional analytical error is introduced into the assay, the biomarker’s performance will not suffer.

    Accuracy (Relative)

    Accuracy is the closeness of agreement between the result of a measurement and the true value of the measure. In practice, an accepted reference value, where available, is substituted for the true value. Accuracy can be expressed as %bias, and is also called Trueness or Bias. Ideally this requires a “gold” standard material or reference method procedure, which is frequently not available for biomarkers. In the absence of these metrological anchors, a comparison to an established reference laboratory’s results may substitute. Accuracy is influenced by the number of measurements (i.e., fewer measurements are usually less accurate than more). Relative accuracy is commonly measured by comparing the measured value of a known specimen to the known value of a reference material (or spiked sample) in replicate samples, preferably in the expected range of concentrations.

    Accuracy = ((Actual value - Measurement) / Actual value) x 100%
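A minimal sketch of the formula above, applied to the mean of replicate measurements of a sample with an assigned value. The function name and the example numbers are hypothetical.

```python
def accuracy_percent(actual_value, measurements):
    """((actual - mean measurement) / actual) x 100%, per the formula in the text."""
    if actual_value == 0:
        raise ValueError("actual_value must be non-zero")
    mean_measured = sum(measurements) / len(measurements)
    return (actual_value - mean_measured) * 100.0 / actual_value

# Hypothetical replicates of a sample with an assigned value of 50 ng/mL.
print(accuracy_percent(50.0, [44.0, 45.0, 46.0]))
```

With this sign convention, a positive result means the assay under-recovers relative to the reference value.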

    Analytical Measurement Range (AMR)

    The Analytical Measurement Range (AMR) is the range of analyte values that a method can directly measure on the specimen without any dilution, concentration, or other pretreatment that is not part of the usual assay process. AMR validation is the process of confirming that the assay system will correctly recover the concentration or activity of the analyte over the AMR. As an example, for assays that can measure a specimen without dilution (for instance, externally calibrated MS assays with isotope dilution), the AMR is determined using the maximum validated dilution and calculated as:

    Analytical Measurement Range = LLOQ up to the (ULOQ * maximum validated dilution)

    For assays which require specimen dilution prior to measurement (immunoassays using specimen dilution which differs from calibrator dilution), the AMR is calculated using the minimum required dilution and the maximum validated dilution as:

    Analytical Measurement Range = (LLOQ * minimum required dilution) up to the (ULOQ * maximum validated dilution)
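Both AMR formulas can be captured in one helper by treating the no-dilution case as a minimum required dilution of 1. This is a sketch only; the function name and the example LLOQ, ULOQ and dilution factors are hypothetical.

```python
def analytical_measurement_range(lloq, uloq, max_validated_dilution,
                                 min_required_dilution=1.0):
    """Return (lower, upper) limits of the AMR in specimen concentration units.

    min_required_dilution=1.0 corresponds to assays that measure the specimen
    without dilution; larger values apply to assays that require dilution.
    """
    if lloq <= 0 or uloq <= lloq:
        raise ValueError("expected 0 < lloq < uloq")
    if min_required_dilution < 1 or max_validated_dilution < min_required_dilution:
        raise ValueError("expected 1 <= min_required_dilution <= max_validated_dilution")
    return lloq * min_required_dilution, uloq * max_validated_dilution

# Undiluted assay: LLOQ 1, ULOQ 100, validated up to a 10-fold dilution.
print(analytical_measurement_range(1.0, 100.0, 10.0))
# Assay with a 4-fold minimum required dilution and 20-fold maximum dilution.
print(analytical_measurement_range(0.5, 50.0, 20.0, min_required_dilution=4.0))
```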

    Sensitivity is often equated with the assay LLOQ, although it is formally defined as the slope of the calibration curve. However, the AMR is still bounded by the LLOQ, the lowest concentration of analyte that has been demonstrated to be measurable with acceptable levels of total error. Total error may be initially evaluated (as described above using ANOVA for replicate specimens from control or diseased/pre-treatment specimens) or determined from spiking test samples in surrogate matrix (ULOQ and LLOQ back-fit accuracy and precision specimens) and pooled authentic matrices (spike and recovery studies for accuracy, repetition for precision).

    For commercial diagnostic kits the analytical sensitivity is usually defined by the limit of detection (LOD), which is determined by extrapolating a concentration from a response signal equal to the mean background signal + 3 SD, where the background is measured using blank matrix samples (n > 10, usually assay diluent). It must be noted that the variability in the LOD to LLOQ range is much higher than that in the working range. Therefore, data below the LLOQ should be applied with caution.
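The signal threshold underlying the LOD can be sketched as follows. This covers only the response-signal step (mean blank signal + 3 SD); converting that signal to a concentration requires the assay's own calibration function, which is not modeled here. The use of the sample standard deviation, the function name and the blank readings are assumptions for illustration.

```python
import statistics

def lod_signal_threshold(blank_signals, k=3.0):
    """Response signal at the LOD: mean of blanks + k * SD of blanks."""
    if len(blank_signals) <= 10:
        raise ValueError("the text calls for more than 10 blank measurements")
    mean_blank = statistics.mean(blank_signals)
    sd_blank = statistics.stdev(blank_signals)  # sample SD (n - 1), an assumption
    return mean_blank + k * sd_blank

# Hypothetical blank readings in arbitrary signal units (n = 12).
blanks = [1.0, 2.0, 3.0] * 4
print(round(lod_signal_threshold(blanks), 4))
```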

    A minimum of 5 samples with known concentrations spaced evenly across the range (previously assigned via higher order methods or prepared through spiking into, ideally, authentic matrix or else surrogate matrix), including samples with concentrations that exceed the limits by 10-20%, are used to validate the AMR. Samples should be measured in duplicate for assays without internal standardization (in singlicate for assays with internal standardization). Regression analysis by an appropriate linear or non-linear method should be performed comparing the measured to the expected analyte concentration across the quantification range. The results should be plotted, a best fitting line determined, and the y-intercept should be close to zero. The acceptance criteria for the correlation coefficient (r) should be predetermined based on the COU. The general considerations for r value ranges are: 0-0.19 very weak, 0.20-0.39 weak, 0.40-0.59 moderate, 0.60-0.79 strong, and 0.80-1 very strong correlation. Calibration systems with non-linear response functions (such as sigmoidal curves for immunoassays) may require consideration of the range of concentrations used in this regression due to increased imprecision at or beyond asymptotic regions of measurement.
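For the linear case, the regression of measured against expected concentration and the descriptive r categories listed above can be sketched as below. Ordinary least squares is used here as one possible "appropriate linear method"; the function names and the example data are hypothetical.

```python
import math

def linear_fit(expected, measured):
    """Ordinary least-squares slope, intercept and Pearson r."""
    n = len(expected)
    if n < 5 or n != len(measured):
        raise ValueError("need at least 5 paired values")
    mean_x = sum(expected) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, measured))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

def describe_correlation(r):
    """Map |r| onto the descriptive ranges given in the text."""
    a = abs(r)
    if a < 0.2:
        return "very weak"
    if a < 0.4:
        return "weak"
    if a < 0.6:
        return "moderate"
    if a < 0.8:
        return "strong"
    return "very strong"

# Hypothetical AMR validation data (expected vs. measured, ng/mL).
expected = [1.0, 25.0, 50.0, 75.0, 100.0]
measured = [1.2, 24.1, 51.5, 73.8, 102.0]
slope, intercept, r = linear_fit(expected, measured)
print(round(slope, 3), round(intercept, 3), round(r, 4), describe_correlation(r))
```

A pre-specified acceptance criterion for r, and a check that the intercept is close to zero, would then be applied according to the COU.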

    To further understand an assay’s tolerance in the event of additional bias, the concept of Performance Standard (PS) has been applied (CLSI EP21-Ed2). As both the assay and the biomarker’s intrinsic physiological behavior are the primary sources of variability in demonstrating the utility of a biomarker and its qualification, both of these sources of error must be taken into account. This approach is outlined below by defining a minimal PS for the biomarker.

    PS is defined by the amount of TAE for the biomarker at the Decision Level (XC).

    PS = TAE at XC

    TAE is the amount of error that can be tolerated without invalidating the medical usefulness of the result.

    Decision Level is any concentration of the analyte that is critical for medical interpretation (i.e., diagnosis, monitoring and medical decision limits).

    For biomarkers, Acceptable Imprecision can be derived from intra-individual biological variation of the biomarker itself, and the magnitude of the biomarker’s change from baseline in response to a valid biological stimulus or medically significant event. The Bias needs to be calculated as the sum of squares from both the individual and group variances. Thus, the biomarker’s minimal PS can be used as a guide to set criteria for the acceptability of the TE associated with the assay.

    TE is the sum of all systematic bias and variance components that affect a result (i.e., the sum of the absolute value of the Bias (B) and Intermediate Precision (PI) of the biomarker assay). This reflects the closeness of the test results obtained by the biomarker assay to the true value (concentration) of the biomarker.

    TE = B + PI

    Bias is any systematic error that contributes to the difference between the mean of a large number of test results and an accepted reference value.

    Intermediate Precision is the within-laboratory variation based on different days, different analysts, different equipment, etc.

    Finally, performance criteria can be formulated to judge the acceptability of an assay’s performance by comparing the observed analytical TE to the specification for the final Performance Standard. This is generally not possible for exploratory or partially validated methods.

    Performance is acceptable when observed analytical TE is less than the PS (TE < PS).

    Performance is not acceptable when observed analytical TE is greater than the PS (TE > PS).
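The TE versus PS comparison can be written as a short check. This sketch follows the definitions above (TE as the absolute bias plus intermediate precision); the function names and example values are hypothetical, and the text does not say how TE exactly equal to PS should be treated, so it is reported as not acceptable here.

```python
def total_error(bias_pct, intermediate_precision_pct):
    """TE = |B| + PI, both expressed in percent."""
    return abs(bias_pct) + intermediate_precision_pct

def performance_acceptable(te_pct, ps_pct):
    """Acceptable only when TE < PS; TE == PS is treated as not acceptable (assumption)."""
    return te_pct < ps_pct

# Hypothetical assay: -10% bias, 15% intermediate precision, PS of 34.75%.
te = total_error(-10.0, 15.0)
print(te, performance_acceptable(te, 34.75))
```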

    Using this approach, biomarkers with a high degree of biological variability and lower amplitude of response to stimulus would require an assay with relatively low TE, while higher TE would be acceptable for assays with biomarkers that have low biological variability and higher amplitude of response to stimulus.

    The concept of a PS for a biomarker in conjunction with an assay’s TE also allows for the determination of stability and interference thresholds. Both lack of stability and assay interference introduce bias into an assay and directly contribute to TE. As described above, if either of these factors results in the TE exceeding the PS, the performance of the assay would be