Cancer Research

Bioinformatics and Systems Biology

Abstract 5303: Identification of myelofibrosis from electronic health records with novel algorithms and JAKextractor

Cosmin A. Bejan, Andrew Sochacki, Shilin Zhao, Yaomin Xu and Michael Savona
Cosmin A. Bejan
Vanderbilt University, Nashville, TN.
Andrew Sochacki
Vanderbilt University, Nashville, TN.
Shilin Zhao
Vanderbilt University, Nashville, TN.
Yaomin Xu
Vanderbilt University, Nashville, TN.
Michael Savona
Vanderbilt University, Nashville, TN.
DOI: 10.1158/1538-7445.AM2018-5303 Published July 2018
Proceedings: AACR Annual Meeting 2018; April 14-18, 2018; Chicago, IL

Abstract

Myelofibrosis (MF) is a devastating myeloproliferative neoplasm (MPN) characterized by marrow fibrosis, extramedullary hematopoiesis, vascular thromboembolism, and an approximately 50% incidence of JAK2V617F. MF is difficult to study in large electronic health record (EHR) datasets because of its clinical heterogeneity and unreliable ICD coding. The Synthetic Derivative is a cloned and de-identified research EHR with 2.9 million unique patients, linked to BioVU, a DNA biorepository. To develop phenotype-genotype associations, we created an algorithm that classifies MF using ICD codes, medications, and NLP with negation detection of MF keywords. To enrich our cohort, we developed JAKextractor, an algorithm that identifies patients tested clinically for JAK2V617F across all 248,000 BioVU patients.
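
As an illustration of negation detection over MF keywords, the following is a minimal sketch only. The abstract does not describe the NLP implementation, so the cue list, the keyword pattern, the token window, and the function name `mf_mentions` are all hypothetical choices made for this example.

```python
import re

# Hypothetical lexicon: the abstract does not publish its cues or keywords.
NEGATION_CUES = ("no", "not", "without", "denies", "negative for", "no evidence of")
MF_KEYWORD = re.compile(r"\b(myelofibrosis|marrow fibrosis)\b", re.IGNORECASE)


def mf_mentions(sentence, window=5):
    """Return (keyword, is_negated) for each MF keyword found in one sentence.

    A mention counts as negated when a cue appears among the `window`
    whitespace-separated tokens that precede it. Cues that follow the
    keyword (for example "myelofibrosis was ruled out") are not handled.
    """
    results = []
    for match in MF_KEYWORD.finditer(sentence):
        preceding = sentence[:match.start()].lower().split()[-window:]
        # Pad with spaces so that cues only match whole tokens.
        context = " " + " ".join(preceding) + " "
        negated = any(" " + cue + " " in context for cue in NEGATION_CUES)
        results.append((match.group(0), negated))
    return results
```

Under these assumptions, a sentence such as "Bone marrow biopsy shows no evidence of myelofibrosis." would yield a negated mention, and only affirmed mentions would contribute to a patient's text-mention count.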

For MF identification, we trained a supervised learning algorithm to learn decision rules that encode counts of MF-specific ICD codes, medications, and text mentions, as well as the assertion status of MF and JAK2 mentions in patient notes. Models were evaluated using 10-fold cross-validation. JAKextractor used pattern matching to extract the status (WT vs MUT) of each JAK2 text mention. A machine learning model then predicted whether a patient was JAK2V617F-positive from the mention-level information extracted from patient notes in the previous step. To validate JAKextractor, we subsequently genotyped banked DNA from an enriched subset of MF cases with an Illumina® TruSight myeloid NGS panel.
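
The mention-level pattern matching step can be sketched roughly as follows. This is not JAKextractor itself: the regular expressions, the status labels other than WT and MUT, and the function names `mention_statuses` and `status_counts` are assumptions made for illustration. The per-patient counts it produces are the kind of mention-level information a downstream classifier could consume; the classifier is not shown.

```python
import re
from collections import Counter

# Hypothetical patterns: the abstract does not list the ones JAKextractor uses.
JAK2_MENTION = re.compile(r"\bJAK-?\s?2", re.IGNORECASE)
WT_CUES = re.compile(r"\b(negative|not detected|wild[- ]?type|absent)\b", re.IGNORECASE)
MUT_CUES = re.compile(r"\b(positive|detected|mutated|present)\b", re.IGNORECASE)
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def mention_statuses(note):
    """Label each JAK2 mention in a note as "WT", "MUT", or "UNK"."""
    statuses = []
    for sentence in SENTENCE_BREAK.split(note):
        for match in JAK2_MENTION.finditer(sentence):
            # Only text after the mention, within the same sentence, is inspected.
            rest = sentence[match.end():]
            # WT cues are checked first because "not detected" contains "detected".
            if WT_CUES.search(rest):
                statuses.append("WT")
            elif MUT_CUES.search(rest):
                statuses.append("MUT")
            else:
                statuses.append("UNK")
    return statuses


def status_counts(notes):
    """Aggregate mention statuses over all of one patient's notes."""
    return Counter(s for note in notes for s in mention_statuses(note))
```

For example, `status_counts(["JAK2 V617F mutation was detected.", "JAK2V617F not detected."])` counts one MUT mention and one WT mention, leaving the patient-level call to a separate model.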

The top-performing MF algorithm combined all sources of clinical information, achieved an F1-measure (F1) of 96%, and identified 309 MF patients in BioVU. The extracted decision rule for predicting an MF patient was [JAK2V617F ^ ICD>1] v [JAK2WT ^ ICD>1 ^ TXT>3]. Because ICD>1 appears in both clauses, ICD codes are necessary but not sufficient to identify MF. Using ICD counts alone yielded a significantly lower F1 of 88% (P<0.001). In our MF cohort, the mean age at onset was 60.3±12.6 years, the mean age at last visit was 63.1±12.1 years, and 46.1% of patients had JAK2V617F. The mean age at MF onset was higher with JAK2V617F (64) than with JAK2WT (57) (P<0.001). Survival did not differ between JAK2V617F and JAK2WT MF cases by log-rank test (P=0.11), with a median survival of 108 months. Of the 131 MF cases genotyped, NGS detected JAK2V617F in 71 (54.2%), compared with 66 (50.4%) identified by JAKextractor. The mean JAK2V617F allelic frequency was 0.569, with detected values ranging from 0.069 to 0.976. Ten cases showed disagreement between JAKextractor and NGS: 2 FP and 4 FN JAKextractor predictions (6/131, 4.6%), 2 true NGS failures, 1 incomplete chart, and 1 loss of JAK2V617F over time. NGS also detected JAK2V617F in 7/131 (5.3%) MF patients who had not been previously tested.
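
The extracted decision rule can be written out as a small function. This is a sketch under stated assumptions: ICD and TXT are read as per-patient counts of MF-specific ICD codes and MF text mentions, the status strings and the name `predicts_mf` are hypothetical, and a patient with no known JAK2 status is treated as satisfying neither clause.

```python
def predicts_mf(jak2_status, icd_count, txt_count):
    """Evaluate [JAK2V617F ^ ICD>1] v [JAK2WT ^ ICD>1 ^ TXT>3] for one patient.

    jak2_status: "V617F", "WT", or any other value when status is unknown
                 (hypothetical encoding for this example).
    icd_count:   assumed count of MF-specific ICD codes.
    txt_count:   assumed count of MF text mentions in notes.
    """
    has_icd = icd_count > 1  # shared by both clauses of the rule
    if jak2_status == "V617F":
        return has_icd
    if jak2_status == "WT":
        return has_icd and txt_count > 3
    return False  # neither clause can hold without a known JAK2 status
```

Written this way, the rule makes the role of ICD codes explicit: no patient is predicted as MF without more than one ICD code, and JAK2 wild-type patients additionally need more than three text mentions.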

Our results demonstrate successful identification of MF and JAK2V617F within an EHR. We established the feasibility of creating an MPN database with retrospective genotyping of biobanked DNA. We plan a scaled implementation of similar algorithms across all myeloid diseases within BioVU, with the ability to retrospectively genotype each case.

Citation Format: Cosmin A. Bejan, Andrew Sochacki, Shilin Zhao, Yaomin Xu, Michael Savona. Identification of myelofibrosis from electronic health records with novel algorithms and JAKextractor [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 5303.

  • ©2018 American Association for Cancer Research.
Cancer Research: 78 (13 Supplement)
July 2018
Volume 78, Issue 13 Supplement