The DEcIDE Methods Center publishes a monthly literature scan of current articles of interest to the field of comparative effectiveness research.

You can find them all here.

July 2011


CER Scan [Published within the past 30 days]


    1. Pharmacoepidemiol Drug Saf. 2011 Jun 30. doi: 10.1002/pds.2152. [Epub ahead of print]

    Confounding adjustment via a semi-automated high-dimensional propensity score algorithm: an application to electronic medical records. Toh S, García Rodríguez LA, Hernán MA. Department of Population Medicine, Harvard Medical School/Harvard Pilgrim Health Care Institute, Boston, MA, USA.

    PURPOSE: A semi-automated high-dimensional propensity score (hd-PS) algorithm has been proposed to adjust for confounding in claims databases. The feasibility of using this algorithm in other types of healthcare databases is unknown.

    METHODS: We estimated the comparative safety of traditional non-steroidal anti-inflammatory drugs (NSAIDs) and selective COX-2 inhibitors regarding the risk of upper gastrointestinal bleeding (UGIB) in The Health Improvement Network, an electronic medical record (EMR) database in the UK. We compared the adjusted effect estimates when the confounders were identified using expert knowledge or the semi-automated hd-PS algorithm.

    RESULTS: Compared with the 411,616 traditional NSAID initiators, the crude odds ratio (OR) of UGIB was 1.50 (95% CI: 0.98, 2.28) for the 43,569 selective COX-2 inhibitor initiators. The OR dropped to 0.81 (0.52, 1.27) upon adjustment for known risk factors for UGIB that are typically available in both claims and EMR databases. The OR remained similar when further adjusting for covariates (smoking, alcohol consumption, and body mass index) that are not typically recorded in claims databases (OR 0.81; 0.51, 1.26) or adding 500 empirically identified covariates using the hd-PS algorithm (OR 0.78; 0.49, 1.22). Adjusting for age and sex plus 500 empirically identified covariates produced an OR of 0.87 (0.56, 1.34).

    CONCLUSIONS: The hd-PS algorithm can be implemented in pharmacoepidemiologic studies that use primary care EMR databases such as The Health Improvement Network. For the NSAID-UGIB association for which major confounders are well known, further adjustment for covariates selected by the algorithm had little impact on the effect estimate. Copyright © 2011 John Wiley & Sons, Ltd.

    PMID: 21717528 [PubMed – as supplied by publisher]
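For readers who want to see how a crude odds ratio and 95% confidence interval of the kind reported above can be obtained from a 2x2 table, here is a minimal Python sketch. The cell counts are hypothetical (the abstract does not report them), and the Wald interval on the log-odds scale is an assumption about the interval type, not necessarily the authors' method.

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, chosen only to illustrate the arithmetic
or_hat, ci_low, ci_high = odds_ratio_wald_ci(a=20, b=80, c=10, d=90)
print(f"OR = {or_hat:.2f} (95% CI: {ci_low:.2f}, {ci_high:.2f})")
```

The adjusted estimates in the abstract come from models that condition on covariates or a propensity score, which this crude calculation does not attempt to reproduce.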

CER Scan [Published within the last 2 months]

    2. BMC Med Res Methodol. 2011 May 23;11:77.

    Logistic random effects regression models: a comparison of statistical packages for binary and ordinal outcomes. Li B, Lingsma HF, Steyerberg EW, Lesaffre E. Department of Biostatistics, Erasmus MC, Dr. Molewaterplein 50, Rotterdam, the Netherlands.

    BACKGROUND: Logistic random effects models are a popular tool to analyze multilevel (also called hierarchical) data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models.

    METHODS: We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR; Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC. Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted.

    RESULTS: The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on a relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient.

    CONCLUSIONS: On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference (of course if there is no preference from a philosophical point of view) for either a frequentist or Bayesian approach (if based on vague priors). The choice for a particular implementation may largely depend on the desired flexibility, and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated zero with a standard error that is either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen “non-informative” prior of the variance parameter. The starting value for the variance parameter may also be critical for the convergence of the Markov chain.

    PMCID: PMC3112198 PMID: 21605357 [PubMed – in process]

    Free article:

    We recommend reviewing Supplemental Material: Additional File #2
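As orientation for the models compared in this article, the three model types named in the abstract can be written out as below. This is a sketch in generic textbook notation: the symbols (patient i, center j, trial k, covariate vector x, cut-points alpha) and the sign convention of the proportional odds model are assumptions, not taken from the paper, and individual packages parameterize these models differently.

```latex
% Binary outcome, one random effect (center j):
\[
\operatorname{logit} \Pr(Y_{ij} = 1 \mid u_j)
  = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
\qquad u_j \sim N(0, \sigma_u^2)
\]

% Binary outcome, two random effects (center j and trial k):
\[
\operatorname{logit} \Pr(Y_{ijk} = 1 \mid u_j, v_k)
  = \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + u_j + v_k,
\qquad u_j \sim N(0, \sigma_u^2),\quad v_k \sim N(0, \sigma_v^2)
\]

% Ordinal 5-point outcome, proportional odds with a random center effect
% (cut-points c = 1, ..., 4):
\[
\operatorname{logit} \Pr(Y_{ij} \le c \mid u_j)
  = \alpha_c + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
\qquad \alpha_1 < \alpha_2 < \alpha_3 < \alpha_4
\]
```

The variance components sigma_u^2 and sigma_v^2 are the quantities the abstract describes as difficult to estimate in small data sets.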

    3. Stat Med. 2011 Jul 10;30(15):1837-51. doi: 10.1002/sim.4240. Epub 2011 Apr 15.

    Semiparametric regression models for detecting effect modification in matched case-crossover studies. Kim I, Cheong HK, Kim H. Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, VA, U.S.A.

    In matched case-crossover studies, it is generally accepted that covariates on which a case and associated controls are matched cannot exert a confounding effect on independent predictors included in the conditional logistic regression model because any stratum effect is removed by the conditioning on the fixed number of sets of a case and controls in the stratum. Hence, the conditional logistic regression model is not able to detect any effects associated with the matching covariates by stratum. In addition, the matching covariates may be effect modifiers, and the methods for assessing and characterizing effect modification by matching covariates are quite limited. In this article, we propose a unified approach in its ability to detect both parametric and nonparametric relationships between the predictor and the relative risk of disease or binary outcome, as well as potential effect modifications by matching covariates. Two methods are developed using two semiparametric models: (1) the regression spline varying coefficients model and (2) the regression spline interaction model. Simulation results show that the two approaches are comparable. These methods can be used in any matched case-control study and extend to multilevel effect modification studies. We demonstrate the advantage of our approach using an epidemiological example of a 1-4 bi-directional case-crossover study of childhood aseptic meningitis associated with drinking water turbidity. Copyright © 2011 John Wiley & Sons, Ltd.

    PMID: 21495061 [PubMed – in process]

Theme: Data Linkage



    1. J Clin Epidemiol. 2011 May;64(5):565-72. Epub 2010 Oct 16.

    Results from simulated data sets: probabilistic record linkage outperforms deterministic record linkage. Tromp M, Ravelli AC, Bonsel GJ, Hasman A, Reitsma JB. Department of Medical Informatics, Academic Medical Center, University of Amsterdam, 1100 DE Amsterdam, The Netherlands.

    OBJECTIVE: To gain insight into the performance of deterministic record linkage (DRL) vs. probabilistic record linkage (PRL) strategies under different conditions by varying the frequency of registration errors and the amount of discriminating power.

    STUDY DESIGN AND SETTING: A simulation study in which data characteristics were varied to create a range of realistic linkage scenarios. For each scenario, we compared the number of misclassifications (number of false nonlinks and false links) made by the different linking strategies: deterministic full, deterministic N-1, and probabilistic.

    RESULTS: The full deterministic strategy produced the lowest number of false positive links but at the expense of missing considerable numbers of matches dependent on the error rate of the linking variables. The probabilistic strategy outperformed the deterministic strategy (full or N-1) across all scenarios. A deterministic strategy can match the performance of a probabilistic approach providing that the decision about which disagreements should be tolerated is made correctly. This requires a priori knowledge about the quality of all linking variables, whereas this information is inherently generated by a probabilistic strategy.

    CONCLUSION: PRL is more flexible and provides data about the quality of the linkage process that in turn can minimize the degree of linking errors, given the data provided.

    PMID: 20952162 [PubMed – indexed for MEDLINE]
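The three linking strategies compared in this study (deterministic full, deterministic N-1, and probabilistic) can be sketched in a few lines of Python. This is an illustration only, not the authors' simulation code: the field names, the m- and u-probabilities, and the decision threshold are all hypothetical, and the probabilistic rule uses the usual agreement/disagreement log-weight scoring as an assumed stand-in for whatever implementation the paper used.

```python
import math

# Hypothetical linking variables
FIELDS = ("surname", "birth_date", "postcode", "sex")

# Hypothetical probabilities that a field agrees among true matches (M)
# and among non-matches (U); in practice these are estimated from data.
M = {"surname": 0.95, "birth_date": 0.97, "postcode": 0.90, "sex": 0.99}
U = {"surname": 0.01, "birth_date": 0.005, "postcode": 0.02, "sex": 0.50}

def agreements(rec_a, rec_b):
    """Field-by-field exact agreement between two records."""
    return {f: rec_a[f] == rec_b[f] for f in FIELDS}

def deterministic_full(rec_a, rec_b):
    """Link only if every linking variable agrees."""
    return all(agreements(rec_a, rec_b).values())

def deterministic_n_minus_1(rec_a, rec_b):
    """Link if at most one linking variable disagrees."""
    return sum(agreements(rec_a, rec_b).values()) >= len(FIELDS) - 1

def match_weight(rec_a, rec_b):
    """Sum of log2 agreement/disagreement weights over all fields."""
    total = 0.0
    for field, agree in agreements(rec_a, rec_b).items():
        if agree:
            total += math.log2(M[field] / U[field])
        else:
            total += math.log2((1 - M[field]) / (1 - U[field]))
    return total

def probabilistic(rec_a, rec_b, threshold=10.0):
    """Link if the total weight reaches a (hypothetical) threshold."""
    return match_weight(rec_a, rec_b) >= threshold

# Two records for the same hypothetical person, with a postcode typo
rec_1 = {"surname": "Jansen", "birth_date": "1980-02-14",
         "postcode": "1100DE", "sex": "F"}
rec_2 = {"surname": "Jansen", "birth_date": "1980-02-14",
         "postcode": "1100DF", "sex": "F"}

print(deterministic_full(rec_1, rec_2))       # False: one field disagrees
print(deterministic_n_minus_1(rec_1, rec_2))  # True: one disagreement tolerated
print(probabilistic(rec_1, rec_2))            # True under these assumed weights
```

In this sketch the N-1 rule tolerates a disagreement on any single field equally, whereas the weights let a disagreement on a highly discriminating field count for more than one on a weakly discriminating field such as sex, which is the kind of variable-quality information the abstract says a probabilistic strategy generates.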

    2. Am J Epidemiol. 2011 May 1;173(9):1059-68. Epub 2011 Mar 23.

    Use of a medical records linkage system to enumerate a dynamic population over time: the Rochester epidemiology project. St Sauver JL, Grossardt BR, Yawn BP, Melton LJ 3rd, Rocca WA. Division of Epidemiology, Department of Health Sciences Research, College of Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA.

    The Rochester Epidemiology Project (REP) is a unique research infrastructure in which the medical records of virtually all persons residing in Olmsted County, Minnesota, for over 40 years have been linked and archived. In the present article, the authors describe how the REP links medical records from multiple health care institutions to specific individuals and how residency is confirmed over time. Additionally, the authors provide evidence for the validity of the REP Census enumeration. Between 1966 and 2008, 1,145,856 medical records were linked to 486,564 individuals in the REP. The REP Census was found to be valid when compared with a list of residents obtained from random digit dialing, a list of residents of nursing homes and senior citizen complexes, a commercial list of residents, and a manual review of records. In addition, the REP Census counts were comparable to those of 4 decennial US censuses (e.g., it included 104.1% of 1970 and 102.7% of 2000 census counts). The duration for which each person was captured in the system varied greatly by age and calendar year; however, the duration was typically substantial. Comprehensive medical records linkage systems like the REP can be used to maintain a continuously updated census and to provide an optimal sampling framework for epidemiologic studies.

    PMCID: PMC3105274 [Available on 2012/5/1] PMID: 21430193 [PubMed – indexed for MEDLINE]

    3. Stat Methods Med Res. 2011 Jun 10. [Epub ahead of print]

    Linkage of patient records from disparate sources. Li X, Shen C. Division of Biostatistics, Indiana University School of Medicine, Indianapolis, US.

    We review ideas, approaches and progress in the field of record linkage. We point out that the latent class models used in probabilistic matching have been well developed and applied in a different context of diagnostic testing when the true disease status is unknown. The methodology developed in the diagnostic testing setting can be potentially translated and applied in record linkage. Although there are many methods for record linkage, a comprehensive evaluation of methods for a wide range of real-world data with different data characteristics and with true match status is absent due to lack of data sharing. However, the recent availability of generators of synthetic data with realistic characteristics renders such evaluations feasible.

    PMID: 21665896 [PubMed – as supplied by publisher]
