Martin Renqiang Min
Princeton University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Martin Renqiang Min.
Bioinformatics | 2015
Pavel P. Kuksa; Martin Renqiang Min; Rishabh Dugar; Mark Gerstein
MOTIVATIONnEffective computational methods for peptide-protein binding prediction can greatly help clinical peptide vaccine search and design. However, previous computational methods fail to capture key nonlinear high-order dependencies between different amino acid positions. As a result, they often produce low-quality rankings of strong binding peptides. To solve this problem, we propose nonlinear high-order machine learning methods including high-order neural networks (HONNs) with possible deep extensions and high-order kernel support vector machines to predict major histocompatibility complex-peptide binding.nnnRESULTSnThe proposed high-order methods improve quality of binding predictions over other prediction methods. With the proposed methods, a significant gain of up to 25-40% is observed on the benchmark and reference peptide datasets and tasks. In addition, for the first time, our experiments show that pre-training with high-order semi-restricted Boltzmann machines significantly improves the performance of feed-forward HONNs. Moreover, our experiments show that the proposed shallow HONN outperform the popular pre-trained deep neural network on most tasks, which demonstrates the effectiveness of modelling high-order feature interactions for predicting major histocompatibility complex-peptide binding.nnnAVAILABILITY AND IMPLEMENTATIONnThere is no associated distributable [email protected] or [email protected] INFORMATIONnSupplementary data are available at Bioinformatics online.
pacific symposium on biocomputing | 2013
Martin Renqiang Min; Salim A. Chowdhury; Yanjun Qi; Alex Stewart; Rachel Ostroff
Disrupted or abnormal biological processes responsible for cancers often quantitatively manifest as disrupted additive and multiplicative interactions of gene/protein expressions correlating with cancer progression. However, the examination of all possible combinatorial interactions between gene features in most case-control studies with limited training data is computationally infeasible. In this paper, we propose a practically feasible data integration approach, QUIRE (QUadratic Interactions among infoRmative fEatures), to identify discriminative complex interactions among informative gene features for cancer diagnosis and biomarker discovery directly based on patient blood samples. QUIRE works in two stages, where it first identifies functionally relevant gene groups for the disease with the help of gene functional annotations and available physical protein interactions, then it explores the combinatorial relationships among the genes from the selected informative groups. Based on our private experimentally generated data from patient blood samples using a novel SOMAmer (Slow Off-rate Modified Aptamer) technology, we apply QUIRE to cancer diagnosis and biomarker discovery for Renal Cell Carcinoma (RCC) and Ovarian Cancer (OVC). To further demonstrate the general applicability of our approach, we also apply QUIRE to a publicly available Colorectal Cancer (CRC) dataset that can be used to prioritize our SOMAmer design. Our experimental results show that QUIRE identifies gene-gene interactions that can better identify the different cancer stages of samples, as compared to other state-of-the-art feature selection methods. A literature survey shows that many of the interactions identified by QUIRE play important roles in the development of cancer.
international conference on big data | 2016
Ke Zhang; Jianwu Xu; Martin Renqiang Min; Guofei Jiang; Konstantinos Pelechrinis; Hui Zhang
In mission critical IT services, system failure prediction becomes increasingly important; it prevents unexpected system downtime, and assures service reliability for end users. While operational console logs record rich and descriptive information on the health status of those IT systems, existing system management technologies mostly use them in a labor-intensive forensics approach, i.e., identifying what went wrong after the fact. Recent efforts on log-based system management take an automation approach with text mining techniques, such as term frequency — inverse document frequency (TF-IDF). However, those techniques lead to a high-dimensional feature space, and are not easily generalizable to heterogeneous log formats. In this paper, we present a novel system that automatically parses streamed console logs and detects early warning signals for IT system failure prediction. In particular, our solution includes a log pattern extraction method by clustering together logs with similar format and content. We then resemble the TF-IDF idea by considering each pattern as a word and the set of patterns in each discretized epoch as a document. This leads to a feature space with significantly lower dimensionality that can provide robust signals for the status of the system. As system failures tend to occur very rare, we apply a recurrent neural network, namely, Long Short-Term Memory (LSTM), to deal with the “rarity” of labeled data in the training process. LSTM is able to capture the long-range dependency across sequences, therefore outperforms traditional supervised learning methods in our application domain. We evaluated and compared our proposed technology with state-of-the-art machine learning approaches using real log traces from two large enterprise systems. The results showed the advantage and potentials of our system in prediction of complex IT failures. To our knowledge, our work is the first that employs LSTM for log-based system failure prediction.
knowledge discovery and data mining | 2017
Huayu Li; Martin Renqiang Min; Yong Ge; Asim Kadav
Neural network based sequence-to-sequence models in an encoder-decoder framework have been successfully applied to solve Question Answering (QA) problems, predicting answers from statements and questions. However, almost all previous models have failed to consider detailed context information and unknown states under which systems do not have enough information to answer given questions. These scenarios with incomplete or ambiguous information are very common in the setting of Interactive Question Answering (IQA). To address this challenge, we develop a novel model, employing context-dependent word-level attention for more accurate statement representations and question-guided sentence-level attention for better context modeling. We also generate unique IQA datasets to test our model, which will be made publicly available. Employing these attention mechanisms, our model accurately understands when it can output an answer or when it requires generating a supplementary question for additional input depending on different contexts. When available, users feedback is encoded and directly applied to update sentence-level attention to infer an answer. Extensive experiments on QA and IQA datasets quantitatively demonstrate the effectiveness of our model with significant improvement over state-of-the-art conventional QA models.
knowledge discovery and data mining | 2014
Sanjay Purushotham; Martin Renqiang Min; C.-C. Jay Kuo; Rachel Ostroff
Identifying interpretable discriminative high-order feature interactions given limited training data in high dimensions is challenging in both machine learning and data mining. In this paper, we propose a factorization based sparse learning framework termed FHIM for identifying high-order feature interactions in linear and logistic regression models, and study several optimization methods for solving them. Unlike previous sparse learning methods, our model FHIM recovers both the main effects and the interaction terms accurately without imposing tree-structured hierarchical constraints. Furthermore, we show that FHIM has oracle properties when extended to generalized linear regression models with pairwise interactions. Experiments on simulated data show that FHIM outperforms the state-of-the-art sparse lear-ning techniques. Further experiments on our experimentally generated data from patient blood samples using a novel SOMAmer (Slow Off-rate Modified Aptamer) technology show that, FHIM performs blood-based cancer diagnosis and bio-marker discovery for Renal Cell Carcinoma much better than other competing methods, and it identifies interpretable block-wise high-order gene interactions predictive of cancer stages of samples. A literature survey shows that the interactions identified by FHIM play important roles in cancer development.
international joint conference on artificial intelligence | 2017
Martin Renqiang Min; Hongyu Guo; Dongjin Song
Metric learning methods for dimensionality reduction in combination with k-Nearest Neighbors (kNN) have been extensively deployed in many classification, data embedding, and information retrieval applications. However, most of these approaches involve pairwise training data comparisons, and thus have quadratic computational complexity with respect to the size of training set, preventing them from scaling to fairly big datasets. Moreover, during testing, comparing test data against all the training data points is also expensive in terms of both computational cost and resources required. Furthermore, previous metrics are either too constrained or too expressive to be well learned. To effectively solve these issues, we present an exemplar-centered supervised shallow parametric data embedding model, using a Maximally Collapsing Metric Learning (MCML) objective. Our strategy learns a shallow high-order parametric embedding function and compares training/test data only with learned or precomputed exemplars, resulting in a cost function with linear computational complexity for both training and testing. We also empirically demonstrate, using several benchmark datasets, that for classification in two-dimensional embedding space, our approach not only gains speedup of kNN by hundreds of times, but also outperforms state-of-the-art supervised embedding approaches.
international conference on machine learning | 2010
Martin Renqiang Min; Laurens van der Maaten; Zineng Yuan; Anthony J. Bonner; Zhaolei Zhang
international conference on artificial intelligence and statistics | 2014
Martin Renqiang Min; Xia Ning; Chao Cheng; Mark Gerstein
international conference on learning representations | 2018
Bo Zong; Qi Song; Martin Renqiang Min; Wei Cheng; Cristian Lumezanu; Daeki Cho; Haifeng Chen
arXiv: Learning | 2015
Hongyu Guo; Xiaodan Zhu; Martin Renqiang Min