
2013 SMEP-TV Web Streaming Conference Schedule
October 17–19, 2013

TradeWinds Island Resort, St. Petersburg, Florida

Thursday, October 17

Issues in Modeling I
9:50–10:30, Horizons Room

9:50–10:10  Comparing visual and statistical analysis in single-subject studies using published studies
Wayne F. Velicer & Magdalena Harrington
University of Rhode Island

10:10–10:20  Assessing the effects of child anxiety treatment when an RCT ceases to be an RCT: Combining IDA with causal mediation
Antonio A. Morgan-Lopez, Lissette M. Saavedra, & Wendy K. Silverman
University of North Carolina, Research Triangle Institute

Issues in Modeling II
10:50–12:00, Horizons Room

10:50–11:10  2→1→1 Multilevel mediation: Conceptual problems linking levels of analysis
Niall Bolger & Jean-Philippe Laurenceau
Columbia University

11:10–11:30  Construct validity and the use of risk indices

Keith F. Widaman
University of California, Davis

11:30–11:50  Exploring predictors of well-being as people transition into retirement

Gitta Lubke
University of Notre Dame

11:50–12:00  Weighted least squares with missing data

Michael Neale & Ryne Estabrook
Virginia Commonwealth University


Intensive Observations and Dynamic Processes
1:30–3:00, Horizons Room

1:30–1:50  Is autocorrelation needed in single-subject designs?

David Rindskopf
City University of New York

2:00–2:20  More explorations of generalized additive (mixed) models for analyzing single-case designs
William R. Shadish, Alain Zuur, & Kristynn J. Sullivan
University of California, Merced

2:20–2:40  Dynamic factor analysis of nonstationary processes

Peter C.M. Molenaar & A.M. Belz
The Pennsylvania State University

2:40–3:00  Maintained individual data distributed likelihood estimation (MIDDLE)

Steven Boker & Michael Neale
University of Virginia & Virginia Commonwealth University

Cattell Award Address
3:30–4:20, Horizons Room (Introduction: Robert MacCallum)

3:30–4:20  Flexible multidimensional item analysis, test scoring, and model fit evaluation

Li Cai
University of California, Los Angeles


Friday, October 18

Measurement
9:00–10:10, Horizons Room

9:00–9:20  Identifying the source of misfit in item response theory models

Yang Liu & Alberto Maydeu-Olivares
University of Barcelona

9:20–9:30  Combining item response theory estimates across discrete stages of testing

Susan Embretson, Kristin Morrison, & Hea Won Jung
Georgia Institute of Technology

9:30–9:50  A cross-national study of uncertainty and perceptions of global climate change

David V. Budescu, Han Hui Por, & Stephen Broomell
Fordham University

Latent Variable Modeling
10:30–11:40, Horizons Room

10:50–11:10  A nonlinear bifactor model of polysubstance use

Patrick J. Curran, Sierra Bainter, Andrea M. Hussong, Daniel J. Bauer, & Andrea Howard
University of North Carolina, Chapel Hill

11:10–11:30  Slowly moving from repeated measures ANOVA to dynamic but structural modeling
John McArdle
University of Southern California

11:30–11:40  The cusp catastrophe model as a regime-switching mixture structural equation model
Sy-Miin Chow, Katie Witkiewitz, Raoul Grasman, & Stephen A. Maisto
The Pennsylvania State University

Presidential Address
1:40–2:30, Horizons Room (Introduction: Patrick Shrout)

1:40–2:30  On the robustness of results from longitudinal observational studies: Integrative data analysis and designs for optimizing detection of within-person change
Scott Hofer
University of Victoria


Saturday, October 19

Statistical Issues I
9:00–10:10, Horizons Room

9:00–9:20  Tukey post hoc means comparison test reconceptualized as an effect size

Joseph S. Rossi
University of Rhode Island

9:20–9:40  Confidence distributions for standardized effect size estimates: The standardized regression coefficient
Jeremy C. Biesanz
University of British Columbia

9:40–10:00  Fungible correlation matrices: A new tool for evaluating penalized regression models
Niels Waller
University of Minnesota

10:00–10:10  The impact of violations of measurement invariance on selection: The discrete case
Roger E. Millsap
Arizona State University

Statistical Issues II
10:30–11:30, Horizons Room

10:30–10:50  Bayesian model averaging for propensity score analysis

David Kaplan & Jianshen Chen
University of Wisconsin, Madison

10:50–11:10  Benchmarking in classification and cluster analysis

Douglas Steinley
University of Missouri

11:10–11:20  Standard errors for SAPA correlations

William Revelle & Ashley Brown
Northwestern University

11:20–11:30  Evaluating approaches for synthesizing a common measure from different instruments to facilitate multi-study integrative data analyses of substance use and disorder
Daniel Bauer
University of North Carolina, Chapel Hill


Quantitative Training
11:30–11:50, Horizons Room

11:30–11:40  Teaching thinking skills in research methods courses

Charles S. Reichardt
University of Denver

Cognitive Abilities
2:30–3:10, Horizons Room

2:30–2:50  Are birth order effects on intelligence really Flynn effects? Reinterpreting Belmont and Marolla 40 years later
Joseph Lee Rodgers
Vanderbilt University

2:50–3:10  Forty years later: What happens to mathematically precocious youth identified at age 12?
David Lubinski, Camilla P. Benbow, & Harrison J. Kell
Vanderbilt University

Sells Award Address
3:10–4:00, Horizons Room (Introduction: Kristopher Preacher)

3:10–4:00  A Letter from Tuck, and How It Triggered One Miserable Experience and Decades of Research on the Nature and Effects of Error
Robert MacCallum
University of North Carolina, Chapel Hill


2013 SMEP Conference Abstracts
(sorted alphabetically by SMEP member last name)

Evaluating approaches for synthesizing a common measure from different instruments to facilitate multi-study integrative data analyses of substance use and disorder

Daniel J. Bauer, Patrick J. Curran, & Andrea M. Hussong
The University of North Carolina at Chapel Hill

Perennial challenges in the field of substance use include obtaining sufficient sample sizes to test low-base rate behaviors, comparing findings across a wide array of substance use measures and diagnostic instruments, and evaluating the generalizability of findings across the highly heterogeneous population of substance users (e.g., subgroup comparisons). One methodological approach that can help to address these challenges is Integrative Data Analysis (IDA), or the simultaneous analysis of data pooled from multiple studies. Recent research using IDA demonstrates the feasibility of this approach for studying substance use and disorder. The promise of IDA, however, rests on the availability of effective techniques for establishing common measurement across studies, that is, the creation of substance use and substance use disorder measures that are equivalent in scale and meaning across studies despite variation in the primary measures originally used to assess the participants. We describe a newly funded program of research, combining the use of computer simulation studies with laboratory analogue studies, to determine the conditions under which psychometric models can effectively synthesize common measures from distinct primary assessment instruments.


Confidence Distributions for Standardized Effect Size Estimates: The Standardized Regression Coefficient

Jeremy C. Biesanz
University of British Columbia

Methodological recommendations strongly emphasize the routine reporting of effect sizes and associated confidence intervals to express the uncertainty around primary outcomes. Confidence intervals (CIs) for unstandardized effects are easy to construct. However, CIs for standardized effect size estimates such as the standardized mean difference (e.g., Cohen's d), the Pearson product-moment correlation, partial correlation, squared multiple correlation, and the standardized regression coefficient are far more difficult to construct. The confidence distribution represents a general approach for generating confidence intervals that places a single distribution – the confidence distribution – around an effect size estimate. Confidence distributions are distributions whose quantiles provide the correct confidence interval limits. Addressing a critical gap in the literature, exact confidence intervals for standardized effect size estimates are developed for both fixed and random predictor models using confidence distributions. Using this approach, an improved approximate confidence interval for the standardized regression coefficient based on a confidence distribution is developed and compared to existing approaches including bootstrapping, Kelley's MBESS routine in R, and Yuan's (2011) asymptotic delta method.
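To illustrate the quantile idea in its simplest familiar form (a Pearson correlation rather than the standardized regression coefficient developed in the talk), the sketch below treats the Fisher-z approximation as an approximate confidence distribution; this is a generic illustration in R, not the talk's method.

    # Approximate confidence distribution for a correlation via Fisher's z:
    # back-transformed quantiles of the distribution give the CI limits.
    conf_dist_r <- function(r, n, probs = c(.025, .975)) {
      z  <- atanh(r)                          # Fisher z of the estimate
      se <- 1 / sqrt(n - 3)                   # approximate SE on the z scale
      tanh(qnorm(probs, mean = z, sd = se))   # quantiles mapped back to r
    }
    conf_dist_r(r = .30, n = 120)             # approximate 95% CI for r = .30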


Maintained Individual Data Distributed Likelihood Estimation (MIDDLE)

Steven Boker
University of Virginia

Michael Neale
Virginia Commonwealth University

Maintained Individual Data Distributed Likelihood Estimation (MIDDLE) is a revolutionary new paradigm for the design and analysis of research in the behavioral, social, and health sciences. The MIDDLE approach is based on the seemingly impossible idea that data can be privately maintained by participants and never revealed to researchers, while still enabling statistical models to be fit and scientific hypotheses tested. MIDDLE rests on the assumption that data should belong to, be controlled by, and remain in the possession of participants. Statistical models are fit by sending an objective function and vector of parameters to each participant's personal device (e.g., smartphone), where the likelihood of that individual's data is calculated locally. Only the likelihood value is returned to the central optimizer. The optimizer aggregates likelihood values from all participants and chooses new vectors of parameters until the model converges. A MIDDLE study provides: significantly greater privacy for participants; automatic management of opt-in and opt-out consent; lower cost for the researcher and funding institute; and faster determination of results. Furthermore, if a participant opts into several studies simultaneously and opts into data sharing, all of the studies automatically have access to individual-level longitudinal data linked across all studies.
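A toy sketch of the estimation loop just described, assuming a simple normal model (illustrative only, not the authors' implementation; in a real MIDDLE study the raw data would stay on each device and only likelihood values would travel):

    set.seed(1)
    devices <- lapply(1:50, function(i) rnorm(20, mean = 1, sd = 2))  # each participant's private data

    local_loglik <- function(y, pars) {          # computed on the participant's own device
      sum(dnorm(y, mean = pars[1], sd = exp(pars[2]), log = TRUE))
    }

    central_objective <- function(pars) {        # the optimizer sees only the returned values
      -sum(vapply(devices, local_loglik, numeric(1), pars = pars))
    }

    fit <- optim(c(0, 0), central_objective)     # aggregate, choose new parameters, repeat
    c(mean = fit$par[1], sd = exp(fit$par[2]))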


2→1→1 Multilevel Mediation: Conceptual Problems Linking Levels of Analysis

Niall Bolger
Columbia University

Jean-Philippe Laurenceau
University of Delaware

Although quantitative psychologists now have effectively solved the statistical issues in assessing multilevel mediation, conceptual problems remain. In particular, the conceptual status of 2→1→1 mediation, where the X is a between-subject variable and M and Y are within-subject variables, can be problematic. The 2→1 link is of necessity between-subjects, but the 1→1 link can be assessed at both levels of analysis. What if, as is frequently the case in intensive longitudinal data, the between-subjects 1→1 link is markedly different (in magnitude and/or direction) from the within-subjects link? How should one calculate indirect effects? If one's conceptual goal is to understand within-person processes, the within-person 1→1 link is the logical one to use in calculating indirect effects. Doing so, however, will result in an estimate that bears no necessary relation to the total effect of X on Y. If one carries out a more standard 2→1→1 mediation analysis, the standard mediation decomposition of effects can be calculated, but this result can be questioned on conceptual grounds. We will illustrate these issues with intensive longitudinal data on personality processes in daily life.
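The two candidate indirect effects can be sketched as follows (a hedged illustration with lme4, assuming a hypothetical data frame dat containing a person-level x and repeated measures m and y nested within id; this is not the speakers' code):

    library(lme4)
    dat$m_between <- ave(dat$m, dat$id)            # person means of the mediator
    dat$m_within  <- dat$m - dat$m_between         # within-person deviations

    a_fit <- lmer(m ~ x + (1 | id), data = dat)    # the 2 -> 1 path (between persons)
    b_fit <- lmer(y ~ m_within + m_between + x + (1 | id), data = dat)

    a <- unname(fixef(a_fit)["x"])
    c(indirect_within  = a * fixef(b_fit)["m_within"],    # uses the within-person 1 -> 1 link
      indirect_between = a * fixef(b_fit)["m_between"])   # uses the between-persons 1 -> 1 link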


A cross-national study of uncertainty and perceptions of global climate change

David V. Budescu, Han Hui Por, & Stephen Broomell
Fordham University

The Intergovernmental Panel on Climate Change (IPCC) assessments use seven linguistic probability terms (e.g., likely) to communicate uncertainty. Budescu, Por and Broomell (2012) have shown that the public misinterprets these probabilistic statements and that supplementing the probability words with numerical ranges makes communication of uncertainty more effective. We report results of an international experiment (27 samples, 18 languages, and 10,792 responses) that confirms that the dual presentation format (Words and Numbers) is highly beneficial. These results are remarkably stable across all samples and languages and provide the strongest possible justification for changing the way the IPCC communicates uncertainty to the public all over the world.


Flexible multidimensional item analysis, test scoring, and model fit evaluation

Li Cai
University of California, Los Angeles

Extending prior work on parameter estimation and test scoring using single and multilevel hierarchical item factor models, I describe and implement a framework suitable for operational item analysis and test scoring with multidimensional item response models that would allow for flexible specifications of latent and observed variable distributions. Mixtures of classification and continuous latent variables are permitted, thereby giving the framework sufficient room to handle popular non-compensatory diagnostic classification models, along with compensatory hierarchical and/or higher-order item factor models. In addition, the distributional shape of dimensional latent variables may be characterized empirically, further relaxing the multivariate normality assumption ubiquitous to item factor analysis. The developments, however, do not take away existing capacity to implement computationally efficient bifactor-type dimension reduction when a multitude of specific dimensions are included to model residual item-level dependence. Limited-information model fit statistics and diagnostics are also developed.


The Cusp Catastrophe Model as a Regime-Switching Mixture Structural Equation Model

Sy-Miin Chow
Pennsylvania State University

Katie Witkiewitz
University of New Mexico

Raoul Grasman
University of Amsterdam

Stephen A. Maisto
Syracuse University

Catastrophe theory is the study of the many ways in which continuous changes in a system's parameters can result in discontinuous changes in one or several outcome variables of interest. Catastrophe theory-inspired models have been used to represent a variety of change phenomena in the realm of social and behavioral sciences. Although promising in its own right, current approaches of fitting catastrophe models do not address several practical data analytic problems, such as the presence of incomplete data and categorical indicators, difficulties in performing model comparison, as well as heterogeneous timing of shifts within and across subjects. To account for these data analytic issues, we propose a mixture structural equation model with regime switching (MSEM-RS) as an alternative way to represent features of one specific type of catastrophe model, the cusp catastrophe model. Using a simulation study and empirical examples based on longitudinal drinking and affect data, we show that the proposed model can capture key features of the cusp catastrophe model while providing renewed insights into the study of change.


A Nonlinear Bifactor Model of Polysubstance Use

Patrick J. Curran, Sierra Bainter, Andrea M. Hussong, Daniel J. Bauer, & Andrea Howard
University of North Carolina, Chapel Hill

A longstanding challenge in the study of substance use and abuse has been how to best obtain optimal numerical measures of consumption and impairment. Not only are individual items typically measured using scales that are binary, ordinal, or counts, but it is not always clear how multiple items should best be combined within a single scale. In an attempt to address these issues, we applied a moderated nonlinear bifactor model to eight self-reported items assessing alcohol and illicit drug use obtained from a longitudinal sample of n=1964 adolescents and young adults. Individual items were defined by binary, trichotomous, and ordinal response scales; four of the items assessed alcohol use and four assessed use of specific types of illicit drugs. A bifactor model was defined such that all eight items loaded on a general polysubstance factor, and four of the eight loaded on an orthogonal alcohol use-specific factor. This bifactor model was then regressed on a set of covariates including gender, age, and parental alcoholism diagnosis to allow for tests of impact and differential item functioning. An initial measurement model was developed using a randomly drawn calibration sample; these parameter estimates were then used to obtain continuously distributed factor scores of polysubstance use for the entire longitudinal sample. We present an overview of our general modeling strategy; we examine the estimated general and specific scores; and we discuss the potential advantages and disadvantages of this analytic approach.


Combining Item Response Theory Estimates Across Discrete Stages of Testing

Susan Embretson, Kristin Morrison, & Hea Won Jung
Georgia Institute of Technology

Many tests are administered in multiple stages. For example, multistage adaptive tests consist of separate tests that are administered sequentially. Other tests, particularly those used for achievement or certification, consist of sections that are administered on separate occasions. Ideally, item response theory estimates of trait level are obtained using the full set of item responses across stages. However, joint estimation can be impractical if the tests are administered in discrete stages. In such cases, weighting the person's estimates across stages by the associated information value is often recommended for estimating overall trait level. Another possible approach, which is Bayesian, is to use estimates from preceding stages as individual priors for succeeding stages. A simulation study was conducted to compare the information-weighting method to the individual prior method. It was found that the Bayesian method with individual priors was more precise across varying conditions of overall test length and relative length of the stages.
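The individual-prior idea can be made concrete with a grid-based EAP sketch for a 2PL model (illustrative only, not the authors' simulation code; the item parameters a1, b1, a2, b2 and response vectors resp1, resp2 are hypothetical):

    theta <- seq(-4, 4, length.out = 201)                 # quadrature grid
    eap_update <- function(resp, a, b, prior) {
      lik <- vapply(theta, function(th) {
        p <- plogis(a * (th - b))                         # 2PL item probabilities
        prod(p^resp * (1 - p)^(1 - resp))
      }, numeric(1))
      post <- prior * lik
      post <- post / sum(post)
      list(theta_hat = sum(theta * post), posterior = post)
    }
    stage1 <- eap_update(resp1, a1, b1, prior = dnorm(theta))       # standard normal prior
    stage2 <- eap_update(resp2, a2, b2, prior = stage1$posterior)   # stage-1 posterior as the stage-2 prior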


On the Robustness of Results from Longitudinal Observational Studies: Integrative Data Analysis and Designs for Optimizing Detection of Within-Person Change

Scott Hofer
University of Victoria

Research findings and conclusions often differ across independent longitudinal studies addressing the same topic. Differences in measurements, sample composition (e.g., age, cohort, country/culture), and statistical models (e.g., change/time function, covariate set, centering, treatment of incomplete data) can affect the replicability of results (i.e., pattern of significance, magnitude of effect). The central aim of the Integrative Analysis of Longitudinal Studies of Aging (IALSA) research network (NIH/NIA R01AG026453, P01AG043362) is to optimize opportunities for replication and cross-validation across heterogeneous sources of longitudinal data by evaluating comparable conceptual and statistical models at the construct level. In the first part of my talk, I will examine the robustness of within-study results to the selection of measures and measurement models, study design factors, and statistical models, then discuss the implications for cross-study comparison of results.

The second part of my talk focuses on optimal designs for the prospective identification of critical changes in cognitive functioning, with the potential for improved and earlier treatment of cognitive decline. We have learned from long-term longitudinal studies that it is not uncommon for participants to have undiagnosed dementia for years. Recent results from longitudinal studies show that onset and rate of change in the prodromal period of Alzheimer's disease varies as a function of cognitive domain (e.g., onset of accelerated decline for fluid versus crystallized abilities occurs approximately 10 and 5 years before diagnosis, respectively). Rather than defining impairment relative to the performance of others (i.e., norm-referenced), clinical care and research require more frequent measurements of functioning over long periods of time in order to permit "prospective" detection of change from an individual's own normative or typical level of functioning. I will describe essential design features to identify individual change-points and our current research that leverages a client-facing web portal connected with a regional, provider-facing electronic health record (EHR) platform to enable development and deployment of repeated cognitive assessments.


Bayesian Model Averaging for Propensity Score Analysis

David Kaplan & Jianshen Chen
University of Wisconsin, Madison

This paper considers Bayesian model averaging as a means of addressing uncertainty in the selection of the propensity score equation. Given the full list of covariates in the propensity score equation, we investigate an approximate Bayesian model averaging approach based on the model-averaged propensity score estimates produced by the BMA package, but which ignores the uncertainty in the propensity score. Therefore, we also provide a fully Bayesian model averaging approach via a Markov chain Monte Carlo (MCMC) method to account for uncertainty in both parameters and models. A detailed study of our approach examines the differences in the causal estimate when incorporating non-informative versus informative priors in the model averaging stage. We examine these approaches under propensity score stratification, optimal matching, and regression adjustment. In addition, we evaluate the impact of changing the size of Occam's window used to narrow down the range of possible models. Two comprehensive simulation studies and one case study are conducted. Overall, results of both simulation and case studies show that both Bayesian model averaging propensity score approaches recover the treatment effect estimates well and generally provide larger uncertainty estimates, as expected. The approximate Bayesian model averaging performs best under stratification, optimal matching, and regression, while the fully Bayesian model averaging approach performs best under the weighting method. Results also suggest that priors on the propensity score model parameters have little impact on the treatment effect estimation, but this depends on the choice of priors. In addition, the Bayesian model averaging propensity score approaches are robust to the misspecification of the propensity score model. Results of the case study indicate that the effects of the Occam's window vary slightly across different propensity score methods.
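For intuition, the approximate (non-MCMC) route can be sketched with BIC-based weights standing in for the BMA package's machinery (a rough illustration, not the authors' code; the data frame dat with treatment z and covariates x1-x3 is hypothetical):

    models <- list(z ~ x1, z ~ x1 + x2, z ~ x1 + x2 + x3)       # candidate propensity score equations
    fits   <- lapply(models, glm, family = binomial, data = dat)
    bic    <- vapply(fits, BIC, numeric(1))
    w      <- exp(-0.5 * (bic - min(bic)))                       # BIC-approximated posterior
    w      <- w / sum(w)                                         # model probabilities
    ps     <- vapply(fits, fitted, numeric(nrow(dat)))           # per-model propensity scores
    ps_bma <- as.vector(ps %*% w)                                # model-averaged propensity score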


Forty Years Later: What Happens to Mathematically Precocious Youth Identified at Age 12?

David Lubinski, Camilla P. Benbow, & Harrison J. Kell
Vanderbilt University

Preliminary findings from the first midlife follow-up of 1,650 participants from the Study of Mathematically Precocious Youth's (SMPY) two oldest cohorts will be presented. During 1972-1974 or 1976-1978, participants were identified at age 12 as in the top 1% in mathematical reasoning ability. They were surveyed over the web from January 2012 to February 2013 on their accomplishments, family, and personal well-being. Particular attention will be devoted to their occupational status, creative accomplishments, and mate preferences, as well as how they invest their time and future plans. Sex differences in occupational preferences, personal views, and life values will be reviewed. This presentation will conclude with a discussion of satisfaction with: career success, career direction, relationships, and life.


Exploring predictors of well-being as people transition into retirement

Gitta Lubke
University of Notre Dame

Longitudinal data collections in Psychology are often limited to a few hundred subjects but include large numbers of different questionnaires. Prior knowledge that would permit the specification of structural relations between the constructs measured with these questionnaires is also often rather limited, and therefore the use of parametric modeling is not always appropriate. Data-mining methods rely on fewer assumptions, and are designed for the exploration of data sets where the number of variables can be larger than the number of subjects. Gradient boosting is a tree-based method that can be used to build prediction models in an iterative fashion. This technique is illustrated with an application to psychological well-being using data from the Notre Dame Study of Health and Wellbeing. The focus is on exploring the changing importance of predictors of psychological well-being as people transition from work life to pre-retirement to post-retirement.
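A generic gradient-boosting sketch with the gbm package conveys the basic workflow (the data frame wellbeing_data and its variables are hypothetical; the talk's actual analysis may use different software and settings):

    library(gbm)
    fit <- gbm(wellbeing ~ ., data = wellbeing_data,
               distribution = "gaussian",
               n.trees = 2000, interaction.depth = 2,
               shrinkage = 0.01, cv.folds = 5)
    best <- gbm.perf(fit, method = "cv")      # number of trees chosen by cross-validation
    summary(fit, n.trees = best)              # relative influence (importance) of each predictor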


A Letter from Tuck, and How It Triggered One Miserable Experience and Decades of Research on the Nature and Effects of Error

Robert MacCallum
University of North Carolina, Chapel Hill

In this talk I will present a retrospective overview of many years of research on the nature and consequences of error in data, especially in the context of covariance structure and factor analysis models and methods, along with a description of some ongoing developments. I will describe how much of this research originated in key comments in a 1987 letter from Ledyard Tucker, as well as in a paper that was soundly rejected for publication. A subsequent published paper became the basis for several distinct lines of research on estimation methods, sample size questions, and parcels. Turning to ongoing developments, I will discuss how the general notion of the nature and role of error is the basis for recent research in the field of uncertainty quantification on the problem of fitting models to data in the presence of both sampling error and model error.


Imagery and Memory Theory as Substantive Validation of Mediation Analysis

David P. MacKinnon, Ingrid C. Wurpts, & Matthew J. Valente
Arizona State University

This presentation describes the validation of a statistical model by demonstrating that it generates accurate conclusions in the analysis of data on a known or established phenomenon. The established phenomenon studied was that increasing imagery when learning a list of words improves recall for words. The validity of statistical mediation analysis methods was evaluated. An inaccurate method would generate inconsistent estimates and research conclusions for a known substantive effect. Six studies of an imagery manipulation on reported imagery and word recall were conducted. Traditional and modern causal inference mediation methods were applied to the data. The different methods led to conclusions about mediation that were generally consistent with established theory that increased imagery leads to increased memory. Validation based on established theory or Known Effect Validation (KEV) is discussed in the context of other approaches such as mathematical proof, successful application by researchers, and statistical simulation. Validation based on established substantive theory is discussed as a general way to investigate statistical methods.
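The two families of methods being compared can be sketched side by side (a hedged illustration; the data frame dat and its variables are hypothetical, with condition assumed coded 0/1, and this is not the authors' analysis code):

    # Traditional product-of-coefficients mediation
    m_fit <- lm(reported_imagery ~ condition, data = dat)
    y_fit <- lm(recall ~ condition + reported_imagery, data = dat)
    ab    <- coef(m_fit)["condition"] * coef(y_fit)["reported_imagery"]

    # Potential-outcomes (causal inference) mediation
    library(mediation)
    med <- mediate(m_fit, y_fit, treat = "condition",
                   mediator = "reported_imagery", sims = 1000, boot = TRUE)
    summary(med)   # the ACME should roughly agree with ab under these linear models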


Identifying the source of misfit in item response theory models

Yang Liu
University of North Carolina, Chapel Hill

Alberto Maydeu-Olivares
University of Barcelona

When an item response theory model fails to fit adequately, the items for which the model provides a good fit and those for which it does not must be determined. To this end, we compare the performance of several statistics for item pairs with known asymptotic distributions under maximum likelihood estimation of the item parameters: a) a mean and variance adjustment to Pearson's X2, b) a bivariate subtable analogue to Reiser's (1996) overall goodness-of-fit test, c) a z-statistic for the residual cross-product, d) Maydeu-Olivares and Joe's (2006) M2 statistic. The unadjusted Pearson's X2 and X2 with degrees of freedom heuristically equal to that of the independence model, as suggested by Chen and Thissen (1997), are also included in the comparison. For binary and ordinal data, the z-statistic is recommended due to its Type I error and power results in the simulation study. For polytomous nominal data, the mean and variance adjusted X2 is recommended.


Slowly Moving from Repeated Measures ANOVA to Dynamic BUT Structural Modeling

John J. McArdle
University of Southern California

The predominance of Repeated Measures ANOVA (RANOVA) in longitudinal data analysis is considered. Controversies about the required covariance assumptions of the data (i.e., compound symmetry) may have been settled by the use of an epsilon factor to correct the probability values.

But the recent surge of activity in Longitudinal Structural Equation Models (LSEM) should not be ignored either (McArdle, 2009; McArdle & Prindle, 2008). Although it is not often stated, the RANOVA can be thought of and fitted as a special case of the more general LSEM approach. That is, RANOVA can be fitted as an LSEM with restrictions. As soon as this basic RANOVA option is demonstrated in LSEM, other longitudinal modeling approaches become clear.

The need for these new approaches to dynamic analysis comes largely when we want to examine hypotheses about the individual differences in changes. The LSEM is not considered the final statement here, and other models can be used instead. To clarify this first option, numerical examples are presented using standard SEM and SAS software. The key dynamic question arises – "What is your model for change?" The biggest surprise comes when many researchers have questions and ideas that are well beyond the RANOVA approaches they use.

McArdle, J.J. (2009). Latent variable modeling of differences and changes with longitudinal data. Annual Review of Psychology, 60, 577–605. PMID: 18817479

McArdle, J.J. & Prindle, J.J. (2008). A latent change score analysis of a randomized clinical trial in reasoning training. Psychology and Aging, 23(4), 702–719. PMID: 19140642
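As one concrete instance of "RANOVA fitted as an LSEM with restrictions," a compound-symmetry model for four repeated measures can be written in lavaan (a sketch under the usual equal-variance assumptions, with a hypothetical wide-format data frame dat holding y1-y4; not the talk's own SEM/SAS examples):

    library(lavaan)
    ranova_model <- '
      person =~ 1*y1 + 1*y2 + 1*y3 + 1*y4   # random person effect, loadings fixed at 1
      y1 ~~ e*y1                            # equal residual variances give the
      y2 ~~ e*y2                            # compound-symmetric covariance structure
      y3 ~~ e*y3
      y4 ~~ e*y4
      y1 ~ 1                                # free occasion means (the RANOVA time effect)
      y2 ~ 1
      y3 ~ 1
      y4 ~ 1
      person ~ 0*1                          # latent mean fixed at zero
    '
    fit <- sem(ranova_model, data = dat)
    summary(fit, fit.measures = TRUE)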


The impact of violations of measurement invariance on selection: The discrete case

Roger E. Millsap
Arizona State University

While a large literature exists on methods for detecting violations of measurement invariance, relatively little work exists on the consequences of violations for the use of tests in practice. Millsap and Kwok (2004) presented a method for evaluating the impact of violations of invariance on the use of tests for selection. The method examined the impact on the accuracy of selection in two populations, using sensitivity, specificity, and the success ratio as the statistics of interest. One weakness of the method was that it assumed the measures were continuous in scale, an assumption that is inappropriate for most item data. The extension of the method to handle discrete measures is now being developed. This brief talk will describe the progress to date on this project.


Dynamic Factor Analysis of Nonstationary Processes

Molenaar, P. C. M., & Belz, A. M.
The Pennsylvania State University

Using a Gaussian linear dynamic factor model with arbitrarily time-varying parameters, a Monte Carlo study is carried out to test the performance of the second-order extended Kalman filter to fit this model to the data without any a priori knowledge about which parameters are time-varying and, if so, how. Some results will be presented. In the closing part, theoretical considerations will be presented on the inadequacy of standard methods to test for measurement invariance in factor models for this class of nonstationary dynamic factor analysis, and possible extensions of testing for measurement equivalence will be hinted at.


Assessing the Effects of Child Anxiety Treatment When an RCT Ceases to be an RCT: Combining Integrative Data Analysis with Causal Mediation

Antonio A. Morgan-Lopez, Lissette M. Saavedra
RTI International

Wendy K. Silverman
Yale University

Evidence from over 30 randomized controlled trials provides strong and consistent evidence for the short-term efficacy of exposure-based cognitive behavioral treatment (CBT) for reducing anxiety in children. The picture is not as clear with regard to the long-term treatment effects of CBT on anxiety and the prevention of other conditions (e.g., depression, substance use) during young adulthood because wait-list control conditions (the design of choice in the late 1990s) compromise the assessment of long-term follow-up in RCTs. Using non-RCT alternatives, anxiety researchers have compared (a) an original treated group and a wait-listed treated group, (b) comparisons between two or more active CBT conditions, or (c) the long-term comparison of children who still met criteria for anxiety after one year versus those that did not; all of these alternatives are open to clear threats to causal inference. We proposed a quasi-experiment comparing (a) two combined clinical trial samples of treated anxious youth (total n = 106), and (b) an independent cohort of untreated anxious youth (n = 274) and used both the Causal Mediation framework and Integrative Data Analysis to address both selection and measurement bias that is germane to non-equivalent control group designs. Preliminary results from a sensitivity analysis indicated that, though similar inferences would have been made favoring the long-term efficacy of CBT on young adult substance use (as mediated through changes in anxiety), the differences in effect sizes favoring treatment across methods were non-trivial. This work serves as an illustration that quasi-experimental designs (in combination with propensity scoring and IDA) offer a potent alternative to true RCTs (which are not always feasible in child mental health treatment) and to weaker long-term follow-up designs.


Weighted least squares with missing data

Michael Neale
Virginia Commonwealth University

Ryne Estabrook
Northwestern University

Analysis of raw ordinal data by normal theory full-information maximum likelihood (FIML) is attractive but can be prohibitively slow for large numbers of items, or other sources of non-independence that create 'wide' data. One of the attractions of FIML is its handling of missing data, which may be missing completely at random (MCAR) or at random (MAR) without causing asymptotic bias in the estimates of tetrachoric and polychoric correlations. Earlier seminal work by Browne (1992) showed how asymptotic weighted least squares offered a practical and rapid approach to the analysis of ordinal data, but certain patterns of missing data present problems. In particular, MAR structures may yield (downwardly) biased estimates of correlations with two-step approaches used by popular commercial software packages. Accordingly, we show how a hybrid approach of maximum likelihood estimation of correlations may be combined with a weighted least squares approach to facilitate model fitting with many items. Applications to substance use and abuse, and to stem-probe interview designs are considered.


Teaching Thinking Skills in Research Methods Courses

Charles S. Reichardt
University of Denver

Thinking is a skill. Becoming skilled at thinking requires extensive practice. I survey a large collection of exercises that are well suited for use in research methods courses and that provide students with extensive practice in critical thinking. The exercises are available at http://mysite.du.edu/~creichar.


Standard Errors for SAPA correlations

William Revelle & Ashley Brown
Northwestern University

As part of the SAPA project (Synthetic Aperture Personality Assessment) we use a Massively Missing at Random strategy, which is essentially the Matrix Sampling type-12 procedure discussed by Lord (1955). We compare the standard errors of the correlations between two constructs using SAPA techniques and more conventional procedures and argue for the benefits of SAPA. Consider the case of studying two constructs with 1600 subjects who, because of time constraints, may be given only 4 items each. Is it better to give all 1600 the same 4 items (two for each measure), or is it better to randomly sample 4 items from 16 items and thus give each item to only 400 people? We show some surprising (at least to us) results of the benefits of the SAPA procedure and consider the implications for large scale survey designs.


Is autocorrelation needed in single-subject designs?

David Rindskopf
City University of New York

Models for single-case designs (SCDs), being models for short time series, sometimes include an autocorrelation parameter. However, there is good reason to believe that these autocorrelations are at best overestimates, and at worst actually zero. The problem is that autocorrelation can be caused by omitted fixed effects, which commonly plague SCDs. This paper discusses implications of incorrectly fitting autocorrelations, and detection of omitted fixed effects.
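The mechanism is easy to demonstrate in a few lines (a minimal simulation, not taken from the talk): white-noise errors plus an unmodeled level shift look autocorrelated until the fixed effect is included in the model.

    set.seed(1)
    n     <- 40
    phase <- rep(c(0, 1), each = n / 2)                   # treatment phase (a fixed effect)
    y     <- 2 + 1.5 * phase + rnorm(n)                   # errors are truly white noise
    acf(y, plot = FALSE)$acf[2]                           # lag-1 autocorrelation appears substantial
    acf(residuals(lm(y ~ phase)), plot = FALSE)$acf[2]    # near zero once the fixed effect is modeled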


Are Birth Order Effects on Intelligence Really Flynn Effects? Reinterpreting Belmont and Marolla 40 Years Later

Joseph Lee Rodgers
Vanderbilt University

I reinterpret a forty-year-old finding by Belmont and Marolla (1973), who believed their Dutch IQ patterns were caused by within-family processes related to birth order. However, their inferred relation was almost certainly caused by differences between families – in parental IQ, maternal education, and/or dozens of other processes. I show that the Flynn Effect (which emerges from and is likely caused by combinations of such between-family processes) can theoretically account for the Belmont and Marolla patterns. I then draw on past research and additional analysis to show that the Flynn Effect was actually occurring in the Netherlands at the correct time and magnitude to explain the Belmont and Marolla patterns.


Tukey Post Hoc Means Comparison Test Reconceptualized as an Effect Size

Joseph S. Rossi
University of Rhode Island

In simple ANOVA designs, the test of interest is usually not the omnibus F test but the follow-up post hoc pairwise means comparison test. Such tests usually require quantities calculated from the obtained study results (e.g., MS-error). A simple rearrangement of terms permits the Tukey post hoc test to be reconceptualized in terms of Cohen's d. Doing so permits researchers to calculate easily the critical magnitude of effect, dc, that will be necessary for the Tukey test results to be significant. This approach also permits the construction of a simple table that allows researchers to look up dc based only on the number of groups and the number of subjects per group. Since both of these study design parameters should be known prior to conducting a study, this approach permits researchers to estimate the minimum effect size needed to be significant before conducting a study.
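One plausible reading of the rearrangement (under equal group sizes; the speaker's exact formulation may differ): with k groups of n subjects, a pairwise difference is significant by Tukey's test when Cohen's d exceeds dc = q(1 - alpha, k, k(n - 1)) / sqrt(n), which depends only on the design parameters.

    # Critical Cohen's d for Tukey's HSD from design parameters alone (a sketch)
    tukey_dc <- function(k, n, alpha = .05) {
      qtukey(1 - alpha, nmeans = k, df = k * (n - 1)) / sqrt(n)
    }
    tukey_dc(k = 4, n = 25)   # minimum |d| needed for significance with 4 groups of 25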


More explorations of generalized additive (mixed) models for analyzing single-case designs

William R. Shadish, Alain Zuur, & Kristynn J. Sullivan
University of California, Merced

Generalized additive models (GAMs) are a semi-parametric regression method that allows researchers to combine parametric predictors with nonparametric smoothers in the same analysis. GAMs use nonparametric smoothers to create data-based nonlinear functions of the relationship between covariates and the dependent variable, where such relationships exist. This is useful when the functional form of that relationship is not known, as is typically the case in parametric regression. This talk will demonstrate the application of GAMs to assessing functional form in single-case designs (SCDs), which are interrupted time series used to assess treatment effects in many areas. We will also show how to use generalized additive mixed models to include autoregressive and random effects models.
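A generic sketch of this kind of specification with the mgcv package (variable names are hypothetical and the talk's own models may differ; time is assumed to be an integer session number):

    library(mgcv)
    # Single case: treatment phase as a parametric effect, a smooth of time for trend
    fit1 <- gam(y ~ phase + s(time), data = scd_data)

    # Several cases: add a random intercept per case and AR(1) errors within case
    fit2 <- gamm(y ~ phase + s(time), data = scd_data,
                 random = list(case = ~ 1),
                 correlation = corAR1(form = ~ time | case))
    summary(fit2$gam)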


Benchmarking in Classification and Cluster Analysis

Douglas Steinley
University of Missouri

As incoming president of the Classification Society, I am spearheading an effort in benchmarking. The goal herein is to establish a set of foundational guidelines for benchmarking in cluster analysis for the evaluation of clustering algorithms. By no means are we under the illusion that such guidelines will be comprehensive; however, we hope to discuss a broad enough set of concerns such that a working rubric is provided for a minimum standard of inclusion when evaluating the performance of newly proposed methodologies. Topics discussed include appropriate evaluative methods for simulation studies and decision-making processes in assessing methodologies in general, with specific emphasis geared toward multivariate analysis.


Comparing Visual and Statistical Analysis in Single-Subject Studies Using Published Studies

Wayne F. Velicer & Magdalena Harrington
University of Rhode Island

Objective. There has been an ongoing scientific debate regarding the most reliable and valid method of single-subject data evaluation in the applied behavior analysis area between advocates of visual analysis and proponents of interrupted time-series analysis (ITSA). To address this debate, a head-to-head comparison of both methods was performed, as well as an overview of serial dependency, effect sizes, and sample sizes.

Method. Conclusions drawn from visual analysis of the graphs published in the Journal of Applied Behavior Analysis (2010) were compared with the findings based on the ITSA of the same data. This comparison was made possible by the development of software, called UnGraph®, which permits the recovery of the raw data from the graphs, allowing application of ITSA.

Results. ITSA was successfully applied to 94% of the examined time-series data, with the number of observations ranging from 8 to 136. Over 60% of the data had moderate to high first-order autocorrelations (> .40). A large effect size (≥ .80) was found for 73% of eligible studies. Comparison of the conclusions drawn from visual analysis and ITSA revealed an overall low level of agreement (Kappa = .14).

Conclusions. These findings show that ITSA can be broadly implemented in applied behavior analysis research and can facilitate evaluation of intervention effects, particularly when specific characteristics of single-subject data limit the reliability and validity of visual analysis. These two methods should be viewed as complementary and used concurrently.
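For reference, a minimal ITSA of one recovered series might look like the following (a generic sketch with nlme, not the software or exact model used in the study; the data frame d with an integer session number, a 0/1 phase indicator, and outcome y is hypothetical):

    library(nlme)
    d$time_after <- with(d, pmax(0, session - max(session[phase == 0])))  # slope change after the interruption
    fit <- gls(y ~ session + phase + time_after, data = d,
               correlation = corAR1(form = ~ session))                    # first-order autocorrelation
    summary(fit)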


Fungible Correlation Matrices: A New Tool for Evaluating Penalized Regression Models

Niels G. Waller
University of Minnesota

Although much is known about the asymptotic properties of penalized regression models (e.g., the lasso, the elastic-net, ridge regression; Huang, Horowitz, & Ma, 2008; Knight & Fu, 2000; Zhao & Yu, 2006), less is known about their finite sample performance. Thus, applied researchers often wonder "Which penalized regression model is best for my data?" Simulated data, or what Raymond Cattell called 'data plasmodes', can help answer this question. In this talk, I demonstrate that for a fixed set of standardized regression coefficients, β, and a fixed coefficient of determination, R2, there are an infinite number of predictor correlation matrices, R*x, that will satisfy βT R*x β = R2. I call such matrices fungible correlation matrices. I describe an algorithm for generating positive definite, positive semidefinite, or indefinite fungible correlation matrices that have a random or fixed (user-defined) smallest eigenvalue. The underlying equations of the algorithm are reviewed from both algebraic and geometric perspectives. Monte Carlo results indicate that fungible correlation matrices can be profitably used to evaluate the relative performance of penalized regression algorithms and variable selection routines. R code for generating fungible correlation matrices is described.
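A toy illustration of fungibility (verification only, not Waller's generating algorithm): with three predictors, different predictor correlation matrices can reproduce exactly the same β and R2.

    beta <- c(.4, .3, .2)                          # fixed standardized coefficients
    R2   <- .30                                    # fixed coefficient of determination

    make_Rx <- function(r12, r13) {                # solve for r23 so that beta' Rx beta = R2
      r23 <- (R2 - sum(beta^2) - 2 * (beta[1] * beta[2] * r12 + beta[1] * beta[3] * r13)) /
             (2 * beta[2] * beta[3])
      Rx <- diag(3)
      Rx[1, 2] <- Rx[2, 1] <- r12
      Rx[1, 3] <- Rx[3, 1] <- r13
      Rx[2, 3] <- Rx[3, 2] <- r23
      Rx
    }

    Rx_a <- make_Rx(r12 = .10, r13 = .05)
    Rx_b <- make_Rx(r12 = .25, r13 = -.10)
    c(t(beta) %*% Rx_a %*% beta, t(beta) %*% Rx_b %*% beta)      # both equal .30
    sapply(list(Rx_a, Rx_b), function(R) min(eigen(R)$values))   # both positive definite here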


Construct Validity and the Use of Risk Indices

Keith F. Widaman
University of California, Davis

For more than three decades, developmental scientists have used risk indices as shorthand indicators of barriers to optimal development. For example, Sameroff and colleagues identified 10 variables reflecting difficult environmental circumstances, dichotomized these 10 variables at notable points, and summed them – resulting in a 0-10 scale of caretaking casualty. More recently, work on GxE studies has used single nucleotide polymorphisms (SNPs). SNPs often are coded 0, 1, 2 for the number of putative risk alleles. These SNPs can then be summed to form a risk index across a set of SNPs. I will discuss some patterns in data to support the use of such risk indices as well as analyses that can be performed to investigate the construct validity of risk indices. Implications for research and theory – and for the way in which we analyze our data – will be stressed.
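The construction being described is simple to state in code (a schematic sketch with hypothetical variables and cut points, not the speaker's data):

    # Environmental risk index: dichotomize each indicator at a chosen cut point, then sum
    env_risk <- (dat$income < 15000) +
                (dat$maternal_stress > 20) +
                (dat$crowding > 1.5)              # 0-3 here; 0-10 with ten dichotomized indicators

    # Genetic risk index: sum SNPs coded 0/1/2 for the number of putative risk alleles
    snp_cols <- c("snp1", "snp2", "snp3")
    gen_risk <- rowSums(dat[, snp_cols])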

