bioRxiv | 2019

ModEx: A text mining system for extracting mode of regulation of Transcription Factor-gene regulatory interaction

 
 
 

Abstract


Transcription factors (TFs) are proteins that are fundamental to transcription and the regulation of gene expression. Each TF may regulate multiple genes, and each gene may be regulated by multiple TFs. TFs can act as either activators or repressors of gene expression. This complex network of interactions between TFs and genes underlies many developmental and biological processes and is implicated in several human diseases, such as cancer. Hence, deciphering the network of TF-gene interactions, together with information on the mode of regulation (activation vs. repression), is an important step toward understanding the regulatory pathways that underlie complex traits. There are many experimental, computational, and manually curated databases of TF-gene interactions. In particular, high-throughput ChIP-seq datasets provide a large-scale map of transcriptional regulatory interactions. However, these interactions are not annotated with information on context and mode of regulation. Such information is crucial for gaining a global picture of gene regulatory mechanisms and can aid in developing machine learning models for applications such as biomarker discovery, prediction of response to therapy, and precision medicine. In this work, we introduce a text-mining system that annotates ChIP-seq-derived interactions with such metadata by mining PubMed articles. We evaluate the performance of our system using the gold-standard, small-scale, manually curated TRRUST database. Our results show that the method accurately extracts the mode of regulation, with an F-score of 0.77 on TRRUST-curated interactions and an F-score of 0.96 on the intersection of TRRUST and the ChIP-network. We provide an HTTP REST API for our code to facilitate usage.

Availability
Source code and datasets are available for download on GitHub: https://github.com/samanfrm/modex
HTTP REST API: https://watson.math.umb.edu/modex/
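As an illustration of the evaluation metric, the following minimal Python sketch computes a per-class F-score for mode-of-regulation labels. It assumes the standard F1 definition (the harmonic mean of precision and recall); the abstract does not state which F-score variant or averaging scheme was used, and the function name `f1_score` and the toy labels are hypothetical, not taken from the ModEx code or the TRRUST data.

```python
def f1_score(gold, predicted, positive):
    """F1 for one class: harmonic mean of precision and recall.

    Assumption: standard F1; the abstract does not specify the variant.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for five TF-gene pairs (not real data).
gold = ["activation", "repression", "activation", "activation", "repression"]
predicted = ["activation", "repression", "repression", "activation", "activation"]

f1_activation = f1_score(gold, predicted, "activation")
f1_repression = f1_score(gold, predicted, "repression")
# Macro average: unweighted mean of the per-class scores.
macro_f1 = (f1_activation + f1_repression) / 2
print(f"activation={f1_activation:.3f} repression={f1_repression:.3f} macro={macro_f1:.3f}")
```

Scoring each class separately shows whether errors are concentrated in activation or repression calls, which a single pooled number would hide.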

DOI 10.1101/672725
Language English
Journal bioRxiv

Full Text