Publications

 

The following informatics and translational research articles are hand-curated from various sources and include publications by CTSA Program authors.

 

Informatics Publications

Ethical Machine Learning in Health Care

The use of machine learning (ML) in health care raises numerous ethical concerns, especially as models can amplify existing health inequities. Here, we outline ethical considerations for equitable ML in the advancement of health care. Specifically, we frame ethics of ML in health care through the lens of social justice. We describe ongoing efforts and outline challenges in a proposed pipeline of ethical ML in health, ranging from problem selection to post-deployment considerations. We close by summarizing recommendations to address these challenges.

Irene Y. Chen, Emma Pierson, Sherri Rose, Shalmali Joshi, Kadija Ferryman, and Marzyeh Ghassemi

Annual Reviews
The case for open science: rare diseases

The premise of Open Science is that research and medical management will progress faster if data and knowledge are openly shared. The value of Open Science is nowhere more important and appreciated than in the rare disease (RD) community. Research into RDs has been limited by insufficient patient data and resources, a paucity of trained disease experts, and lack of therapeutics, leading to long delays in diagnosis and treatment. These issues can be ameliorated by following the principles and practices of sharing that are intrinsic to Open Science. Here, we describe how the RD community has adopted the core pillars of Open Science, adding new initiatives to promote care and research for RD patients and, ultimately, for all of medicine. We also present recommendations that can advance Open Science more globally.

https://doi.org/10.1093/jamiaopen/ooaa030

Yaffa R Rubinstein, Peter N Robinson, William A Gahl, Paul Avillach, Gareth Baynam, Helene Cederroth, Rebecca M Goodwin, Stephen C Groft, Mats G Hansson, Nomi L Harris, Vojtech Huser, Deborah Mascalzoni, Julie A McMurry, Matthew Might, Christoffer Nellaker, Barend Mons, Dina N Paltoo, Jonathan Pevsner, Manuel Posada, Alison P Rockett-Frase, Marco Roos, Tamar B Rubinstein, Domenica Taruscio, Esther van Enckevort, Melissa A Haendel

JAMIA Open
Clinical concept extraction: A methodology review

Background: Concept extraction, a subdomain of natural language processing (NLP) focused on extracting concepts of interest, has been adopted to computationally extract clinical information from text for applications ranging from clinical decision support to care quality improvement.

Objectives: In this literature review, we provide a methodology review of clinical concept extraction, aiming to catalog development processes, available methods and tools, and specific considerations when developing clinical concept extraction applications.

Methods: Based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, a literature search was conducted for retrieving EHR-based information extraction articles written in English and published from January 2009 through June 2019 from Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and the ACM Digital Library.

Results: A total of 6,686 publications were retrieved. After title and abstract screening, 228 publications were selected. The methods used for developing clinical concept extraction applications were discussed in this review.

https://doi.org/10.1016/j.jbi.2020.103526

Sunyang Fu, David Chen, Huan He, Sijia Liu, Sungrim Moon, Kevin J. Peterson, Feichen Shen, Liwei Wang, Yanshan Wang, Andrew Wen, Yiqing Zhao, Sunghwan Sohn, Hongfang Liu

Journal of Biomedical Informatics
FDA Finalizes Guidance on Civil Money Penalties Relating to the ClinicalTrials.gov Data Bank

The U.S. Food and Drug Administration (FDA) has finalized guidance for civil money penalties related to reporting violations on the ClinicalTrials.gov website. The document details how the agency plans to identify if responsible parties have failed to submit required clinical trial registration or results information to the data bank, if they have submitted false or misleading information, or if they have failed to submit certification to the FDA. Additionally, the guidance lists the situations in which FDA may seek civil money penalties for noncompliance and the penalty amounts that could be assessed for ClinicalTrials.gov reporting violations. "Innovative advances in medical products and transparency in the clinical trials process depend on compliance with ClinicalTrials.gov submission requirements. Certain clinical trials must be registered, and summary results information for such clinical trials must, generally, be submitted within one year of the trial's primary completion date," says Anand Shah, MD, FDA's Deputy Commissioner for Medical and Scientific Affairs. Shah adds that while voluntary compliance with the law is optimal, "we intend to hold responsible parties and submitters accountable, including potential legal action, if they are not in compliance."

Anand Shah, MD, Deputy Commissioner for Medical and Scientific Affairs

FDA Website
Translational Personas and Hospital Library Services

Academic health centers, CTSA hubs, and hospital libraries experience similar funding challenges and charges to do more with less. In recent years academic health center and hospital librarians have risen to these challenges by examining their service models, and beyond that, examining their patron base and users’ needs. To meet the needs of employees, patients, and those who assist patients, hospital librarians can employ the CTS Personas, a project of the Clinical and Translational Science Awards (CTSA) Program National Center for Data to Health. The Persona profiles, which outline the motivations, goals, pain points, wants, and needs of twelve employees and two patients in translational science, provide vital information and insights that can inform everything from designing software tools and educational services, to advertising these services, to designing impactful and collaborative library spaces.

https://doi.org/10.1080/15323269.2020.1778983

Sara Gonzales, Lisa O’Keefe, Karen Gutzman, Guillaume Viger, Annie B. Wescott, Bailey Farrow, Allison P. Heath, Meen Chul Kim, Deanne Taylor, Robin Champieux, Po-Yin Yen & Kristi L. Holmes

 

Journal of Hospital Librarianship
Leveraging Synthetic Data for COVID-19 Research, Collaboration

Researchers at Washington University are using synthetic data to accelerate COVID-19 research and facilitate collaboration among healthcare institutions.

Jessica Kent

Health IT Analytics
Interpretable Clinical Genomics with a Likelihood Ratio Paradigm

Human Phenotype Ontology (HPO)-based analysis has become standard for genomic diagnostics of rare diseases. Current algorithms use a variety of semantic and statistical approaches to prioritize the typically long lists of genes with candidate pathogenic variants. These algorithms do not provide robust estimates of the strength of the predictions beyond the placement in a ranked list, nor do they provide measures of how much any individual phenotypic observation has contributed to the prioritization result. However, given that the overall success rate of genomic diagnostics is only around 25%–50% or less in many cohorts, a good ranking cannot be taken to imply that the gene or disease at rank one is necessarily a good candidate. Here, we present an approach to genomic diagnostics that exploits the likelihood ratio (LR) framework to provide an estimate of (1) the posttest probability of candidate diagnoses, (2) the LR for each observed HPO phenotype, and (3) the predicted pathogenicity of observed genotypes. LIkelihood Ratio Interpretation of Clinical AbnormaLities (LIRICAL) placed the correct diagnosis within the first three ranks in 92.9% of 384 case reports comprising 262 Mendelian diseases, and the correct diagnosis had a mean posttest probability of 67.3%. Simulations show that LIRICAL is robust to many typically encountered forms of genomic and phenomic noise. In summary, LIRICAL provides accurate, clinically interpretable results for phenotype-driven genomic diagnostics.
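The posttest probabilities described above come from a standard likelihood-ratio update: pretest odds multiplied by the LR contributed by each observed HPO phenotype, then converted back to a probability. The sketch below shows that arithmetic only; it is not the authors' LIRICAL implementation, and the prior and LR values are illustrative.

```python
# Minimal sketch of a likelihood-ratio (LR) update: each observed phenotype
# contributes a Bayes factor, and the posttest probability for a candidate
# diagnosis follows from the product of those factors and the pretest odds.

def posttest_probability(pretest_prob, likelihood_ratios):
    """Combine a pretest probability with per-phenotype likelihood ratios."""
    odds = pretest_prob / (1.0 - pretest_prob)  # probability -> odds
    for lr in likelihood_ratios:
        odds *= lr                              # one Bayes-factor update per phenotype
    return odds / (1.0 + odds)                  # odds -> probability

# Illustrative numbers: a pretest probability of 1/500 and three phenotypes
# whose LRs favor the candidate diagnosis.
print(round(posttest_probability(1 / 500, [12.0, 8.0, 5.0]), 3))
```

A phenotype that argues against the diagnosis simply enters with an LR below 1, pulling the posttest probability down; this is what lets the method report how much each individual observation contributed to the ranking.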

https://doi.org/10.1016/j.ajhg.2020.06.021

Peter N. Robinson, Vida Ravanmehr, Julius O.B. Jacobsen, Daniel Danis, Xingmin Aaron Zhang, Leigh C. Carmody, Michael A. Gargano, Courtney L. Thaxton, UNC Biocuration Core, Guy Karlebach, Justin Reese, Manuel Holtgrewe, Sebastian Köhler, Julie A. McMurry, Melissa A. Haendel, Damian Smedley

AJHG
Big Data and Collaboration Seek to Fight Covid-19

Researchers try unprecedented data sharing and cooperation to understand COVID-19—and develop a model for diseases beyond the coronavirus pandemic.

Emma Yasinski
The Scientist
Understanding enterprise data warehouses to support clinical and translational research

Objective

Among National Institutes of Health Clinical and Translational Science Award (CTSA) hubs, adoption of electronic data warehouses for research (EDW4R) containing data from electronic health record systems is nearly ubiquitous. Although benefits of EDW4R include more effective, efficient support of scientists, little is known about how CTSA hubs have implemented EDW4R services. The goal of this qualitative study was to understand the ways in which CTSA hubs have operationalized EDW4R to support clinical and translational researchers.

Materials and Methods

After conducting semistructured interviews with informatics leaders from 20 CTSA hubs, we performed a directed content analysis of interview notes informed by naturalistic inquiry.

Results

We identified 12 themes: organization and data; oversight and governance; data access request process; data access modalities; data access for users with different skill sets; engagement, communication, and literacy; service management coordinated with enterprise information technology; service management coordinated within a CTSA hub; service management coordinated between informatics and biostatistics; funding approaches; performance metrics; and future trends and current technology challenges.

Discussion

This study is a step in developing an improved understanding and creating a common vocabulary about EDW4R operations across institutions. Findings indicate an opportunity for establishing best practices for EDW4R operations in academic medicine. Such guidance could reduce the costs associated with developing an EDW4R by establishing a clear roadmap and maturity path for institutions to follow.

Conclusions

CTSA hubs described varying approaches to EDW4R operations that may assist other institutions in better serving investigators with electronic patient data.

https://doi.org/10.1093/jamia/ocaa089

Thomas R Campion Jr, Catherine K Craven, David A Dorr, Boyd M Knosp

JAMIA
Celebrating G. Octo Barnett, MD

In the eighth month of 2020, in which the COVID-19 (coronavirus disease 2019) pandemic remains a global health crisis and there is heightened awareness of structural racism in our society, I’ve chosen to step back from these critical issues and briefly reflect on the legacy of G. Octo Barnett, MD, medical informatics pioneer, who died at the end of June.

Octo will be missed, but there is no doubt that his influence on our field will live on.

https://doi.org/10.1093/jamia/ocaa170

JAMIA editorial

JAMIA
SCOR: A secure international informatics infrastructure to investigate COVID-19

Global pandemics call for large and diverse healthcare data to study various risk factors, treatment options, and disease progression patterns. Despite the enormous efforts of many large data consortium initiatives, the scientific community still lacks a secure and privacy-preserving infrastructure to support auditable data sharing and facilitate automated and legally compliant federated analysis on an international scale. Existing health informatics systems do not incorporate the latest progress in modern security and federated machine learning algorithms, which are poised to offer solutions. An international group of passionate researchers came together with a joint mission to solve this problem. The SCOR consortium has developed a ready-to-deploy secure infrastructure using world-class privacy and security technologies to reconcile the privacy/utility conflicts. We hope our effort will make a difference and accelerate research in future pandemics with broad and diverse samples on an international scale.

https://doi.org/10.1093/jamia/ocaa172

J L Raisaro, Francesco Marino, Juan Troncoso-Pastoriza, Raphaelle Beau-Lejdstrom, Riccardo Bellazzi, Robert Murphy, Elmer V Bernstam, Henry Wang, Mauro Bucalo, Yong Chen, Assaf Gottlieb, Arif Harmanci, Miran Kim, Yejin Kim, Jeffrey Klann, Catherine Klersy, Bradley A Malin, Marie Méan, Fabian Prasser, Luigia Scudeller, Ali Torkamani, Julien Vaucher, Mamta Puppala, Stephen T C Wong, Milana Frenkel-Morgenstern, Hua Xu, Baba Maiyaki Musa, Abdulrazaq G Habib, Trevor Cohen, Adam Wilcox, Hamisu M Salihu, Heidi Sofia, Xiaoqian Jiang, J P Hubaux

JAMIA
Economic evaluations of big data analytics for clinical decision-making: a scoping review

Objective

Much has been invested in big data analytics to improve health and reduce costs. However, it is unknown whether these investments have achieved the desired goals. We performed a scoping review to determine the health and economic impact of big data analytics for clinical decision-making.

Materials and Methods

We searched Medline, Embase, Web of Science and the National Health Services Economic Evaluations Database for relevant articles. We included peer-reviewed papers that report the health economic impact of analytics that assist clinical decision-making. We extracted the economic methods and estimated impact and also assessed the quality of the methods used. In addition, we estimated how many studies assessed “big data analytics” based on a broad definition of this term.

Results

The search yielded 12 133 papers but only 71 studies fulfilled all eligibility criteria. Only a few papers were full economic evaluations; many were performed during development. Papers frequently reported savings for healthcare payers but only 20% also included costs of analytics. Twenty studies examined “big data analytics” and only 7 reported both cost-savings and better outcomes.

Discussion

The promised potential of big data is not yet reflected in the literature, partly since only a few full and properly performed economic evaluations have been published. This and the lack of a clear definition of “big data” limit policy makers and healthcare professionals from determining which big data initiatives are worth implementing.

https://doi.org/10.1093/jamia/ocaa102

Lytske Bakker, Jos Aarts, Carin Uyl-de Groot, William Redekop

JAMIA
Artificial intelligence driven assessment of routinely collected healthcare data is an effective screening test for COVID-19 in patients presenting to hospital

This article is a preprint and has not been peer-reviewed. It reports new medical research that has yet to be evaluated and so should not be used to guide clinical practice.

The early clinical course of SARS-CoV-2 infection can be difficult to distinguish from other undifferentiated medical presentations to hospital; however, virus-specific real-time polymerase chain reaction (RT-PCR) testing has limited sensitivity and can take up to 48 hours for operational reasons. In this study, we develop two early-detection models to identify COVID-19 using routinely collected data typically available within one hour (laboratory tests, blood gas and vital signs) during 115,394 emergency presentations and 72,310 admissions to hospital. Our emergency department (ED) model achieved 77.4% sensitivity and 95.7% specificity (AUROC 0.939) for COVID-19 amongst all patients attending hospital, and our Admissions model achieved 77.4% sensitivity and 94.8% specificity (AUROC 0.940) for the subset admitted to hospital. Both models achieve high negative predictive values (>99%) across a range of prevalences (<5%), facilitating rapid exclusion during triage to guide infection control. We prospectively validated our models across all patients presenting and admitted to a large UK teaching hospital group in a two-week test period, achieving 92.3% accuracy (n=3,326, NPV: 97.6%, AUROC: 0.881) and 92.5% accuracy (n=1,715, NPV: 97.7%, AUROC: 0.871) in comparison to RT-PCR results. Sensitivity analyses to account for uncertainty in negative PCR results improve apparent accuracy (95.1% and 94.1%) and NPV (99.0% and 98.5%). Our artificial intelligence models perform effectively as a screening test for COVID-19 in emergency departments and hospital admission units, offering high impact in settings where rapid testing is unavailable.
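The rule-out claim in the abstract (NPV >99% at low prevalence) follows directly from the standard relationship between sensitivity, specificity, and prevalence. The calculation below applies that textbook formula to the reported ED-model figures (77.4% sensitivity, 95.7% specificity); it is an illustrative computation, not the authors' code.

```python
# Negative predictive value (NPV) from sensitivity, specificity, and
# prevalence, using the reported ED-model operating point (0.774 / 0.957).

def npv(sensitivity, specificity, prevalence):
    true_neg = specificity * (1.0 - prevalence)   # uninfected patients correctly cleared
    false_neg = (1.0 - sensitivity) * prevalence  # infected patients missed
    return true_neg / (true_neg + false_neg)

# At low prevalence, even moderate sensitivity yields a very high NPV,
# which is what makes such a model usable as a rule-out screen at triage.
for prev in (0.01, 0.03, 0.05):
    print(f"prevalence {prev:.0%}: NPV {npv(0.774, 0.957, prev):.3f}")
```

The same formula also shows why the NPV degrades as prevalence rises, so a screening threshold tuned during a low-prevalence period needs re-checking when case rates surge.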

medRxiv
OpenSAFELY: Factors Associated with COVID-19 Deaths in 17 Million Patients

COVID-19 has rapidly affected mortality worldwide. There is unprecedented urgency to understand who is most at risk of severe outcomes, requiring new approaches for timely analysis of large datasets. Working on behalf of NHS England, here we created OpenSAFELY: a secure health analytics platform covering 40% of all patients in England, holding patient data within the existing data centre of a major primary care electronic health records vendor. Primary care records of 17,278,392 adults were pseudonymously linked to 10,926 COVID-19-related deaths. COVID-19-related death was associated with: being male (hazard ratio (HR) 1.59, 95% confidence interval (CI) 1.53–1.65); older age and deprivation (both with a strong gradient); diabetes; severe asthma; and various other medical conditions. Compared with people with white ethnicity, Black and South Asian people were at higher risk even after adjustment for other factors (HR 1.48, 1.30–1.69 and 1.44, 1.32–1.58, respectively). We have quantified a range of clinical risk factors for COVID-19-related death in the largest cohort study conducted by any country to date. OpenSAFELY is rapidly adding further patients’ records; we will update and extend results regularly.

https://doi.org/10.1038/s41586-020-2521-4

Elizabeth J. Williamson, Alex J. Walker, Krishnan Bhaskaran, Seb Bacon, Chris Bates, Caroline E. Morton, Helen J. Curtis, Amir Mehrkar, David Evans, Peter Inglesby, Jonathan Cockburn, Helen I. McDonald, Brian MacKenna, Laurie Tomlinson, Ian J. Douglas, Christopher T. Rentsch, Rohini Mathur, Angel Y. S. Wong, Richard Grieve, David Harrison, Harriet Forbes, Anna Schultze, Richard Croker, John Parry, Frank Hester, Sam Harper, Rafael Perera, Stephen J. W. Evans, Liam Smeeth & Ben Goldacre

Nature
Communication through the electronic health record: frequency and implications of free text orders

Communication for non-medication order (CNMO) is a type of free text communication order providers use for asynchronous communication about patient care. The objective of this study was to understand the extent to which non-medication orders are being used for medication-related communication. We analyzed a sample of 26 524 CNMOs placed in 6 hospitals. A total of 42% of non-medication orders contained medication information. There was large variation in the usage of CNMOs across hospitals, provider settings, and provider types. The use of CNMOs for communicating medication-related information may result in delayed or missed medications, receiving medications that should have been discontinued, or important clinical decision being made based on inaccurate information. Future studies should quantify the implications of these data entry patterns on actual medication error rates and resultant safety issues.

https://doi.org/10.1093/jamiaopen/ooaa020

Swaminathan Kandaswamy, Aaron Z Hettinger, Daniel J Hoffman, Raj M Ratwani, Jenna Marquard

JAMIA Open
Is authorship sufficient for today’s collaborative research? A call for contributor roles

Assigning authorship and recognizing contributions to scholarly works is challenging on many levels. Here we discuss ethical, social, and technical challenges to the concept of authorship that may impede the recognition of contributions to a scholarly work. Recent work in the field of authorship shows that shifting to a more inclusive contributorship approach may address these challenges. Recent efforts to enable better recognition of contributions to scholarship include the development of the Contributor Role Ontology (CRO), which extends the CRediT taxonomy and can be used in information systems for structuring contributions. We also introduce the Contributor Attribution Model (CAM), which provides a simple data model that relates the contributor to research objects via the role that they played, as well as the provenance of the information. Finally, requirements for the adoption of a contributorship-based approach are discussed.

https://doi.org/10.1080/08989621.2020.1779591

Nicole A. Vasilevsky, Mohammad Hosseini, Samantha Teplitzky, Violeta Ilik, Ehsan Mohammadi, Juliane Schneider, Barbara Kern, Julien Colomb, Scott C. Edmunds, Karen Gutzman, Daniel S. Himmelstein, Marijane White, Britton Smith, Lisa O’Keefe, Melissa Haendel & Kristi L. Holmes

Accountability in Research
COVID-19 TestNorm - A tool to normalize COVID-19 testing names to LOINC codes

Large observational data networks that leverage routine clinical practice data in electronic health records (EHRs) are critical resources for research on COVID-19. Data normalization is a key challenge for the secondary use of EHRs for COVID-19 research across institutions. In this study, we addressed the challenge of automating the normalization of COVID-19 diagnostic tests, which are critical data elements, but for which controlled terminology terms were published after clinical implementation. We developed a simple but effective rule-based tool called COVID-19 TestNorm to automatically normalize local COVID-19 testing names to standard LOINC codes. COVID-19 TestNorm was developed and evaluated using 568 test names collected from eight healthcare systems. Our results show that it could achieve an accuracy of 97.4% on an independent test set. COVID-19 TestNorm is available as an open-source package for developers and as an online web application for end-users (https://clamp.uth.edu/covid/loinc.php). We believe it will be a useful tool to support secondary use of EHRs for research on COVID-19.
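To make the idea of rule-based test-name normalization concrete, here is a toy sketch in the spirit of the approach the abstract describes: clean the local name, then match keyword patterns to a target code. These are not COVID-19 TestNorm's actual rules, and the LOINC codes shown are illustrative placeholders; consult LOINC itself for authoritative codes.

```python
import re

# Toy rule-based normalizer: lowercase and strip punctuation from a local
# test name, then return the code of the first keyword rule that matches.
# Rules and codes are illustrative only.
RULES = [
    (re.compile(r"\b(pcr|naa|rna)\b"), "94500-6"),     # molecular / RNA detection
    (re.compile(r"\b(igg|antibody|ab)\b"), "94563-4"),  # antibody testing
]

def normalize(test_name):
    cleaned = re.sub(r"[^a-z0-9 ]", " ", test_name.lower())  # strip punctuation
    for pattern, loinc in RULES:
        if pattern.search(cleaned):
            return loinc
    return None  # unmapped local name

print(normalize("SARS-CoV-2 RNA (PCR)"))  # matched by the molecular rule
```

A production tool must also handle ordering (more specific rules first), specimen type, and unmappable names, which is where most of the curation effort in this kind of system goes.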

https://doi.org/10.1093/jamia/ocaa145

Xiao Dong, MD; Jianfu Li, PhD; Ekin Soysal, BS; Jiang Bian, PhD; Scott L DuVall, PhD; Elizabeth Hanchrow, RN, MSN; Hongfang Liu, PhD; Kristine E Lynch, PhD; Michael Matheny, MD, MS, MPH; Karthik Natarajan, PhD; Lucila Ohno-Machado, MD, PhD; Serguei Pakhomov, PhD; Ruth Madeleine Reeves, PhD; Amy M Sitapati, MD; Swapna Abhyankar, MD; Theresa Cullen, MD, MS; Jami Deckard; Xiaoqian Jiang, PhD; Robert Murphy, MD; Hua Xu, PhD

JAMIA
Special Issue on Novel Informatics Approaches to COVID-19 Research

The outbreak of the Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) started in December 2019 and was declared a pandemic by the World Health Organization (WHO) on March 11th 2020 [1]. As of May 27th, over 5 million cases and 355,000 deaths have been reported worldwide [2]. In addition to the human health burden, the COVID-19 pandemic has disrupted the global economy and daily life on an unprecedented scale. Researchers worldwide have acted quickly to combat COVID-19, working from different perspectives such as omics, imaging, clinical, and population health research, to understand the etiology and to identify effective treatment and prevention strategies. Informatics methods and tools have played an important role in research about the COVID-19 pandemic. For example, using virus genomes collected across the world, researchers were able to reconstruct the early evolutionary paths of COVID-19 by genetic network analysis, providing insights into virus transmission patterns [3]. In a clinical context, researchers have developed novel approaches to predict infection with SARS-CoV-2 accurately using lung CT scans and other clinical data [4]. At a population scale, researchers have used Bayesian methods to integrate continental-scale data on mobility and mortality to infer the time-varying reproductive rate and the true number of people infected [5]. This Special Issue aims to highlight the development of novel informatics approaches to collect, integrate, harmonize, and analyze all types of data relevant to COVID-19 in order to accelerate knowledge acquisition and scientific discoveries in COVID-19 research, thus informing better decision making in clinical practice and health policies. Investigators are encouraged to submit clear and detailed descriptions of their novel methodological results.

https://doi.org/10.1016/j.jbi.2020.103485

Hua Xu, David Buckeridge, Fei Wang (and Guest Editors from the Department of Population Health Sciences, Cornell University, New York, NY, USA)

Journal of Biomedical Informatics
EHR Data Reveals Risk Factors for Poor Outcomes with COVID-19

A team from NYU Langone Health analyzed EHR data and found that low levels of blood oxygen and markers of inflammation were strongly associated with poor outcomes among patients hospitalized with COVID-19.


Jessica Kent

Health IT Analytics
The Role of Preprints During the Pandemic

A new analysis reveals the breadth and scope of preprint articles related to the COVID-19 pandemic. According to the research, articles about COVID-19 are accessed and distributed from the biomedical servers bioRxiv and medRxiv 15 times more frequently than articles not related to the virus. In addition, preprints account for about 40 percent of papers about COVID-19, the report finds. COVID-19-related preprints are also shared much more often on Twitter. The most tweeted pandemic-related preprints were tweeted more than 10,000 times, compared with about 1,300 tweets for the most tweeted preprint not related to COVID-19. The study further notes that COVID-19 preprints were published more rapidly than other preprints—26 days faster, on average—and nearly three-quarters had no changes to the wording or numbers in their abstracts, when comparing the preprints to their published versions. The findings were posted on bioRxiv.

Gemma Conroy

Nature Index
COVID-19 and the Need for a National Health Information Technology Infrastructure

The need for timely, accurate, and reliable data about the health of the US population has never been greater. Critical questions include the following: (1) how many individuals test positive for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and how many are affected by the disease it causes, novel coronavirus disease 2019 (COVID-19), in a given geographic area; (2) what are the age and race of these individuals; (3) how many people sought care at a health care facility; (4) how many were hospitalized; (5) within individual hospitals, how many patients required intensive care, received ventilator support, or died; and (6) what was the length of stay in the hospital and in the intensive care unit for patients who survived and for those who died. In an attempt to answer some of these questions, on March 29, 2020, Vice President Mike Pence requested all hospitals to email key COVID-19 testing data to the US Department of Health and Human Services (HHS). The National Healthcare Safety Network, an infection-tracking system of the CDC, was tasked with coordinating additional data collection through a new web-based COVID-19 module. Because reporting is optional and partial reporting is allowed, it is unclear how many elements of the requested information are actually being collected and how they will be used. Although the US is one of the most technologically advanced societies in the world and one that spends the most money on health care, this approach illustrates the need for more effective solutions for gathering COVID-19 data at a national level.

https://doi.org/10.1001/jama.2020.7239

Dean F. Sittig, PhD; Hardeep Singh, MD, MPH

JAMA Network
Domains, tasks, and knowledge for health informatics practice: results of a practice analysis

Objective

To develop a comprehensive and current description of what health informatics (HI) professionals do and what they need to know.

Materials and Methods

Six independent subject-matter expert panels drawn from and representative of HI professionals contributed to the development of a draft HI delineation of practice (DoP). An online survey was distributed to HI professionals to validate the draft DoP. A total of 1011 HI practitioners completed the survey. Survey respondents provided domain, task, knowledge and skill (KS) ratings, qualitative feedback on the completeness of the DoP, and detailed professional background and demographic information.

Results

This practice analysis resulted in a validated, comprehensive, and contemporary DoP comprising 5 domains, 74 tasks, and 144 KS statements.

Discussion

The HI practice analysis defined “health informatics professionals” to include practitioners with clinical (eg, dentistry, nursing, pharmacy), public health, and HI or computer science training. The affirmation of the DoP by reviewers and survey respondents reflects the emergence of a core set of tasks performed and KSs used by informaticians representing a broad spectrum of those currently practicing in the field.

Conclusion

The HI practice analysis represents the first time that HI professionals have been surveyed to validate a description of their practice. The resulting HI DoP is an important milestone in the maturation of HI as a profession and will inform HI certification, accreditation, and education activities.

https://doi.org/10.1093/jamia/ocaa018

Cynthia S Gadd, Elaine B Steen, Carla M Caro, Sandra Greenberg, Jeffrey J Williamson, Douglas B Fridsma

JAMIA
Future-proofing Biobanks' Governance

Good biobank governance implies, at a minimum, transparency, accountability, and the implementation of oversight mechanisms. While the biobanking community is in general committed to such principles, little is known about precisely which governance strategies biobanks adopt to meet those objectives. We conducted an exploratory analysis of governance mechanisms adopted by research biobanks, including genetic biobanks, located in Europe and Canada. We reviewed information available on the websites of 69 biobanks, and contacted them directly for additional information. Our study identified six types of commonly adopted governance strategies: communication, compliance, expert advice, external review, internal procedures, and partnerships. Each strategy is implemented through different mechanisms, including independent ethics assessment, informed consent processes, quality management, data access control, legal compliance, standard operating procedures, and external certification. Such mechanisms rely on a wide range of bodies, committees, and actors from both within and outside the biobanks themselves. We found that most biobanks aim to be transparent about their governance mechanisms, but could do more to provide complete and detailed information about them. In particular, the retrievable information, while showing efforts to ensure biobanks operate in a legitimate way, does not specify in sufficient detail how governance mechanisms support accountability, nor how they ensure oversight of research operations. This state of affairs can undermine biobanks' long-term trustworthiness to stakeholders and the public. Given the ever-increasing reliance of biomedical research on large biological repositories and their associated databases, we recommend that biobanks increase their efforts to future-proof their governance.

PMID: 32424324 | https://doi.org/10.1038/s41431-020-0646-4

Felix Gille, Effy Vayena, Alessandro Blasimme

European Journal of Human Genetics
Real-time tracking of self-reported symptoms to predict potential COVID-19

A total of 2,618,862 participants reported their potential symptoms of COVID-19 on a smartphone-based app. Among the 18,401 who had undergone a SARS-CoV-2 test, the proportion of participants who reported loss of smell and taste was higher in those with a positive test result (4,668 of 7,178 individuals; 65.03%) than in those with a negative test result (2,436 of 11,223 participants; 21.71%) (odds ratio = 6.74; 95% confidence interval = 6.31–7.21). A model combining symptoms to predict probable infection was applied to the data from all app users who reported symptoms (805,753) and predicted that 140,312 (17.42%) participants are likely to have COVID-19.
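The headline estimate can be checked directly from the counts reported above. A minimal sketch computing the crude odds ratio and a Wald 95% confidence interval (the published OR of 6.74 likely reflects additional adjustment, so the crude value differs slightly):

```python
import math

# 2x2 table from the reported counts: loss of smell/taste vs SARS-CoV-2 test result
a = 4668            # symptom present, test positive
b = 7178 - 4668     # symptom absent, test positive
c = 2436            # symptom present, test negative
d = 11223 - 2436    # symptom absent, test negative

odds_ratio = (a / b) / (c / d)

# Wald 95% confidence interval on the log-odds scale
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lower = math.exp(math.log(odds_ratio) - 1.96 * se)
upper = math.exp(math.log(odds_ratio) + 1.96 * se)

print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
# → OR = 6.71 (95% CI 6.28-7.17)
```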

Cristina Menni, Ana M. Valdes, Maxim B. Freidin, Carole H. Sudre, Long H. Nguyen, David A. Drew, Sajaysurya Ganesh, Thomas Varsavsky, M. Jorge Cardoso, Julia S. El-Sayed Moustafa, Alessia Visconti, Pirro Hysi, Ruth C. E. Bowyer, Massimo Mangino, Mario Falchi, Jonathan Wolf, Sebastien Ourselin, Andrew T. Chan, Claire J. Steves & Tim D. Spector

Nature Medicine (a Nature Research journal)
Estimating the deep replicability of scientific findings using human and artificial intelligence

Replicability tests of scientific papers show that the majority of papers fail replication. Moreover, failed papers circulate through the literature as quickly as replicating papers. This dynamic weakens the literature, raises research costs, and demonstrates the need for new approaches for estimating a study’s replicability. Here, we trained an artificial intelligence model to estimate a paper’s replicability using ground truth data on studies that had passed or failed manual replication tests, and then tested the model’s generalizability on an extensive set of out-of-sample studies. The model predicts replicability better than the base rate of reviewers and comparably as well as prediction markets, the best present-day method for predicting replicability. In out-of-sample tests on manually replicated papers from diverse disciplines and methods, the model had strong accuracy levels of 0.65 to 0.78. Exploring the reasons behind the model’s predictions, we found no evidence for bias based on topics, journals, disciplines, base rates of failure, persuasion words, or novelty words like “remarkable” or “unexpected.” We did find that the model’s accuracy is higher when trained on a paper’s text rather than its reported statistics and that n-grams, higher order word combinations that humans have difficulty processing, correlate with replication. We discuss how combining human and machine intelligence can raise confidence in research, provide research self-assessment techniques, and create methods that are scalable and efficient enough to review the ever-growing numbers of publications—a task that entails extensive human resources to accomplish with prediction markets and manual replication alone.
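As an illustration of the kind of feature the abstract mentions, word n-grams (ordered word combinations) can be counted in a few lines. This is only a sketch of the feature type, not the authors' actual pipeline:

```python
from collections import Counter
import re


def ngrams(text: str, n: int) -> Counter:
    """Count word n-grams: ordered combinations of n consecutive words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


sentence = "the results were remarkable and the effect was unexpected"
print(ngrams(sentence, 2).most_common(3))
```

Features like these can then be fed to any classifier trained on replicated-vs-failed labels.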

https://doi.org/10.1073/pnas.1909046117

Yang Yang, Wu Youyou, and Brian Uzzi

PNAS
Against pandemic research exceptionalism

The global outbreak of coronavirus disease 2019 (COVID-19) has seen a deluge of clinical studies, with hundreds registered on clinicaltrials.gov. But a palpable sense of urgency and a lingering concern that “in critical situations, large randomized controlled trials are not always feasible or ethical” (1) perpetuate the perception that, when it comes to the rigors of science, crisis situations demand exceptions to high standards for quality. Early phase studies have been launched before completion of investigations that would normally be required to warrant further development of the intervention (2), and treatment trials have used research strategies that are easy to implement but unlikely to yield unbiased effect estimates. Numerous trials investigating similar hypotheses risk duplication of effort, and droves of research papers have been rushed to preprint servers, essentially outsourcing peer review to practicing physicians and journalists. Although crises present major logistical and practical challenges, the moral mission of research remains the same: to reduce uncertainty and enable caregivers, health systems, and policy-makers to better address individual and public health. Rather than generating permission to carry out low-quality investigations, the urgency and scarcity of pandemics heighten the responsibility of key actors in the research enterprise to coordinate their activities to uphold the standards necessary to advance this mission.

Alex John London, Jonathan Kimmelman

Science
A real-time dashboard of clinical trials for COVID-19

Given the accelerated rate at which trial information and findings are emerging, an urgent need exists to track clinical trials, avoid unnecessary duplication of efforts, and understand what trials are being done and where. In response, we have developed a COVID-19 clinical trials registry to collate all trials. Data are pulled from the International Clinical Trials Registry Platform, including those from the Chinese Clinical Trial Registry, ClinicalTrials.gov, Clinical Research Information Service - Republic of Korea, EU Clinical Trials Register, ISRCTN, Iranian Registry of Clinical Trials, Japan Primary Registries Network, and German Clinical Trials Register. Both automated and manual searches are done to ensure minimisation of duplicated entries and for appropriateness to the research questions. Identified studies are then manually reviewed by two separate reviewers before being entered into the registry. Concurrently, we have developed artificial intelligence (AI)-based methods for data searches to identify potential clinical studies not captured in trial registries. These methods provide estimates of the likelihood of importance of a study being included in our database, such that the study can then be reviewed manually for inclusion. Use of AI-based methods saves 50–80% of the time required to manually review all entries without loss of accuracy. Finally, we will use content aggregator services, such as LitCovid, to ensure our data acquisition strategy is complete. With this three-step process, the probability of missing important publications is greatly diminished and so the resulting data are representative of global COVID-19 research efforts.

Kristian Thorlund, Louis Dron, Jay Park, Grace Hsu, Jamie Forrest, Edward J Mills

The Lancet Digital Health
International Electronic Health Record-Derived COVID-19 Clinical Course Profile: The 4CE Consortium

INTRODUCTION: The Coronavirus Disease 2019 (COVID-19) epidemic has caused extreme strains on health systems, public health infrastructure, and economies of many countries. A growing literature has identified key laboratory and clinical markers of pulmonary, cardiac, immune, coagulation, hepatic, and renal dysfunction that are associated with adverse outcomes. Our goal is to consolidate and leverage the largely untapped resource of clinical data from electronic health records of hospital systems in affected countries with the aim to better define markers of organ injury to improve outcomes. METHODS: A consortium of international hospital systems of different sizes utilizing Informatics for Integrating Biology and the Bedside (i2b2) and Observational Medical Outcomes Partnership (OMOP) platforms was convened to address the COVID-19 epidemic. Over a course of two weeks, the group initially focused on admission comorbidities and temporal changes in key laboratory test values during infection. After establishing a common data model, each site generated four data tables of aggregate data as comma-separated values files. These non-interlinked files encompassed, for COVID-19 patients, daily case counts; demographic breakdown; daily laboratory trajectories for 14 laboratory tests; and diagnoses by diagnosis codes. RESULTS: 96 hospitals in the US, France, Italy, Germany, and Singapore contributed data to the consortium for a total of 27,927 COVID-19 cases and 187,802 performed laboratory values. Case counts and laboratory trajectories were concordant with existing literature. Laboratory test values at the time of viral diagnosis showed hospital-level differences that were equivalent to country-level variation across the consortium partners. CONCLUSIONS: In under two weeks, we formed an international community of researchers to answer critical clinical and epidemiological questions around COVID-19.
Harmonized data sets analyzed locally and shared as aggregate data have allowed for rapid analysis and visualization of regional differences and global commonalities. Despite the limitations of our datasets, we have established a framework to capture the trajectory of COVID-19 disease in various subsets of patients and in response to interventions.
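The aggregate-only sharing model described above can be illustrated with a short sketch. The file layout and column names here are invented for illustration; the consortium's actual table formats may differ:

```python
import csv
import io

# Hypothetical aggregate table in the spirit of the consortium's
# non-interlinked CSV files: daily case counts per site, no patient-level data.
raw = """site_id,date,new_cases,cumulative_cases
SITE_A,2020-03-01,12,12
SITE_A,2020-03-02,19,31
SITE_B,2020-03-01,7,7
"""

# Combine sites into daily totals without ever touching individual records.
totals = {}
for row in csv.DictReader(io.StringIO(raw)):
    totals[row["date"]] = totals.get(row["date"], 0) + int(row["new_cases"])

print(totals)  # → {'2020-03-01': 19, '2020-03-02': 19}
```

Because each site ships only aggregates, cross-site analysis reduces to simple merges like this one, which is what made the two-week turnaround feasible.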

doi: https://doi.org/10.1101/2020.04.13.20059691

Gabriel A Brat, Griffin M Weber, Nils Gehlenborg, et al

medRxiv
Machine intelligence in healthcare-perspectives on trustworthiness, explainability, usability, and transparency

Machine Intelligence (MI) is rapidly becoming an important approach across biomedical discovery, clinical research, medical diagnostics/devices, and precision medicine. Such tools can uncover new possibilities for researchers, physicians, and patients, allowing them to make more informed decisions and achieve better outcomes. When deployed in healthcare settings, these approaches have the potential to enhance efficiency and effectiveness of the health research and care ecosystem, and ultimately improve quality of patient care. In response to the increased use of MI in healthcare, and issues associated when applying such approaches to clinical care settings, the National Institutes of Health (NIH) and National Center for Advancing Translational Sciences (NCATS) co-hosted a Machine Intelligence in Healthcare workshop with the National Cancer Institute (NCI) and the National Institute of Biomedical Imaging and Bioengineering (NIBIB) on 12 July 2019. Speakers and attendees included researchers, clinicians and patients/patient advocates, with representation from industry, academia, and federal agencies. A number of issues were addressed, including: data quality and quantity; access and use of electronic health records (EHRs); transparency and explainability of the system in contrast to the entire clinical workflow; and the impact of bias on system outputs, among other topics. This whitepaper reports on key issues associated with MI specific to applications in the healthcare field, identifies areas of improvement for MI systems in the context of healthcare, and proposes avenues and solutions for these issues, with the aim of surfacing key areas that, if appropriately addressed, could accelerate progress in the field effectively, transparently, and ethically.

doi: 10.1038/s41746-020-0254-2

Christine M Cutillo, Karlie R Sharma, Luca Foschini, Shinjini Kundu, Maxine Mackintosh, Kenneth D Mandl

npj | Digital Medicine
Early in the epidemic: impact of preprints on global discourse about COVID-19 transmissibility.

Since it was first reported by WHO on Jan 5, 2020, over 80 000 cases of a novel coronavirus disease (COVID-19) have been diagnosed in China, with exportation events to nearly 90 countries, as of March 6, 2020 [1]. Given the novelty of the causative pathogen (named SARS-CoV-2), scientists have rushed to fill epidemiological, virological, and clinical knowledge gaps—resulting in over 50 new studies about the virus between January 10 and January 30 alone [2]. However, in an era where the immediacy of information has become an expectation of decision makers and the general public alike, many of these studies have been shared first in the form of preprint papers—before peer review.

DOI:https://doi.org/10.1016/S2214-109X(20)30113-3

Maimuna S Majumder and Kenneth D Mandl

The Lancet Global Health
Time for NIH to lead on data sharing

Vol. 367, Issue 6484, pp. 1308-1309; DOI: 10.1126/science.aba4456

Ida Sim, Michael Stebbins, Barbara E. Bierer, Atul J. Butte, Jeffrey Drazen, Victor Dzau, Adrian F. Hernandez  

Science
Data Citizenship Under the 21st Century Cures Act

A new federal rule facilitates health data exchange and enforces right of access to a computable version of one’s medical record. The essential next steps include addressing cybersecurity, privacy, and insurability risks.

PMID: 32160449;  DOI: 10.1056/NEJMp1917640

Kenneth D. Mandl, MD, MPH and Isaac S. Kohane, MD, PhD

New England Journal of Medicine
Personas for the translational workforce

Twelve evidence-based profiles of roles across the translational workforce and two patients were made available through clinical and translational science (CTS) Personas, a project of the Clinical and Translational Science Awards (CTSA) Program National Center for Data to Health (CD2H). The persona profiles were designed and researched to demonstrate the key responsibilities, motivators, goals, software use, pain points, and professional development needs of those working across the spectrum of translation, from basic science to clinical research to public health. The project’s goal was to provide reliable documents that could be used to inform CTSA software development projects, educational resources, and communication initiatives. This paper presents the initiative to create personas for the translational workforce, including the methodology, engagement strategy, and lessons learned. Challenges faced and successes achieved by the project may serve as a roadmap for others searching for best practices in the creation of Persona profiles.

Sara Gonzales, Lisa O’Keefe, Karen Gutzman, Guillaume Viger, Annie B. Wescott, Bailey Farrow, Allison P. Heath, Meen Chul Kim, Deanne Taylor, Robin Champieux, Po-Yin Yen and Kristi Holmes

Journal of Clinical and Translational Science
20 things to know about Epic, Cerner heading into 2020

Epic and Cerner are the two largest EHR companies for hospitals and health systems across the country. Here are 10 things to know about each company as they approach the new decade.

Laura Dyrda

Health IT
Leaf: an open-source, model-agnostic, data-driven web application for cohort discovery and translational biomedical research

Academic medical centers and health systems are increasingly challenged with supporting appropriate secondary use of clinical data. Enterprise data warehouses have emerged as central resources for these data, but often require an informatician to extract meaningful information, limiting direct access by end users. To overcome this challenge, we have developed Leaf, a lightweight self-service web application for querying clinical data from heterogeneous data models and sources.

Nicholas J Dobbins, Clifford H Spital, Robert A Black, Jason M Morrison, Bas de Veer, Elizabeth Zampino, Robert D Harrington, Bethene D Britt, Kari A Stephens, Adam B Wilcox, Peter Tarczy-Hornoch, Sean D Mooney

Journal of the American Medical Informatics Association
Results of VIVO Community Feedback Survey

In early 2018, the VIVO Leadership group brought together parties from across the broader VIVO community at Duke University to discuss critical aspects of VIVO as both a product and a community. At the meeting, a number of working groups were created to do deeper work on a set of focus areas to help inform the VIVO leadership in taking steps toward the future growth of VIVO. One group was tasked with understanding the current perception of VIVO's governance and structure, from effectiveness to openness and inclusivity, and with making recommendations to the VIVO Leadership group concerning key strengths to preserve and challenges that needed to be addressed.

Michael Conlon, Kristi Holmes, Daniel W Hook, Dean B Krafft, Mark P Newton, Julia Trimmer

VIVO: 2019 Conference
A Platform to Support Science of Translational Science Research

There are numerous sources of metadata regarding research activity that Clinical and Translational Science Award (CTSA) hubs currently duplicate effort in acquiring, linking and analyzing. The Science of Translational Science (SciTS) project provides a shared data platform for hubs to collaboratively manage these resources, and avoid redundant effort. In addition to the shared resources, participating CTSA hubs are provided private schemas for their own use, as well as support in integrating these resources into their local environments.

This project builds upon multiple components completed in the first phase of the Center for Data to Health (CD2H), specifically: a) data aggregation and indexing work of research profiles and their ingest into and improvements to CTSAsearch by Iowa (http://labs.cd2h.org/search/facetSearch.jsp); b) NCATS 4DM, a map of translational science; and c) metadata requirements analysis and ingest from a number of other CD2H and CTSA projects, including educational resources from DIAMOND and N-lighten, development resources from GitHub, and data resources from DataMed (bioCADDIE) and DataCite. This work also builds on other related work on data sources, workflows, and reporting from the SciTS team, including entity extraction from the acknowledgement sections of PubMed Central papers, disambiguated PubMed authorship, ORCID data and integrations, NIH RePORT, Federal RePORTER, and other data sources and tools.

David Eichmann, Kristi Holmes

VIVO: 2019 Conference
Feasibility and utility of applications of the common data model to multiple, disparate observational health databases

Objectives To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research.

Materials and methods Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results.

Results Transformation to the CDM resulted in minimal information loss across all 6 databases. Patients and observations were excluded due to identified data quality issues in the source system; 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour.

Discussion The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of inclusion criteria using the protocol, and identified differences in patient characteristics and coding practices across databases.

Conclusion Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases.
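The mapping step at the heart of the CDM transformation can be sketched in a few lines. The vocabulary entries and concept IDs below are invented for illustration and are not taken from the actual OMOP standard vocabulary:

```python
# Hypothetical (source vocabulary, source code) -> standard concept_id map,
# illustrating the kind of lookup a CDM ETL performs. IDs are illustrative.
SOURCE_TO_STANDARD = {
    ("ICD9CM", "250.00"): 201826,        # a diabetes diagnosis concept (example)
    ("NDC", "00093-7146-56"): 1539403,   # a drug product concept (example)
}


def map_record(vocabulary: str, source_code: str) -> int:
    """Return the standard concept_id, or 0 for unmapped source codes."""
    return SOURCE_TO_STANDARD.get((vocabulary, source_code), 0)


records = [("ICD9CM", "250.00"), ("NDC", "00093-7146-56"), ("ICD9CM", "999.99")]
mapped = [map_record(vocab, code) for vocab, code in records]
coverage = sum(concept != 0 for concept in mapped) / len(records)
print(mapped, f"{coverage:.0%} mapped")  # → [201826, 1539403, 0] 67% mapped
```

Reporting the share of records that map to a standard concept, as in the abstract's 90% to 99% figures, is exactly this kind of coverage calculation run over the full source tables.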

https://doi.org/10.1093/jamia/ocu023

Erica A Voss, Rupa Makadia, Amy Matcho, Qianli Ma, Chris Knoll, Martijn Schuemie, Frank J DeFalco, Ajit Londhe, Vivienne Zhu, Patrick B Ryan

JAMIA
Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2)

Informatics for Integrating Biology and the Bedside (i2b2) is one of seven projects sponsored by the NIH Roadmap National Centers for Biomedical Computing (http://www.ncbcs.org). Its mission is to provide clinical investigators with the tools necessary to integrate medical record and clinical research data in the genomics age, a software suite to construct and integrate the modern clinical research chart. i2b2 software may be used by an enterprise's research community to find sets of interesting patients from electronic patient medical record data, while preserving patient privacy through a query tool interface. Project-specific mini-databases (“data marts”) can be created from these sets to make highly detailed data available on these specific patients to the investigators on the i2b2 platform, as reviewed and restricted by the Institutional Review Board. The current version of this software has been released into the public domain and is available at the URL: http://www.i2b2.org/software.

DOI: 10.1136/jamia.2009.000893

Shawn N Murphy, Griffin Weber, Michael Mendis, Vivian Gainer, Henry C Chueh, Susanne Churchill, Isaac Kohane

JAMIA