Publication


Featured research published by Junming Yin.


IEEE Transactions on Knowledge and Data Engineering | 2016

Scalable Temporal Latent Space Inference for Link Prediction in Dynamic Social Networks

Linhong Zhu; Dong Guo; Junming Yin; Greg Ver Steeg; Aram Galstyan

We propose a temporal latent space model for link prediction in dynamic social networks, where the goal is to predict links over time based on a sequence of previous graph snapshots. The model assumes that each user lies in an unobserved latent space, and interactions are more likely to occur between similar users in the latent space representation. In addition, the model allows each user to gradually move its position in the latent space as the network structure evolves over time. We present a global optimization algorithm to effectively infer the temporal latent space. Two alternative optimization algorithms with local and incremental updates are also proposed, allowing the model to scale to larger networks without compromising prediction accuracy. Empirically, we demonstrate that our model, when evaluated on a number of real-world dynamic networks, significantly outperforms existing approaches for temporal link prediction in terms of both scalability and predictive power.

We study the temporal link prediction problem, where, given past interactions, our goal is to predict new interactions. We propose a dynamic link prediction method based on nonnegative matrix factorization. This method assumes that interactions are more likely between users that are similar to each other in the latent space representation. We propose a global optimization algorithm to effectively learn the temporal latent space with quadratic convergence rate and bounded error. In addition, we propose two alternative algorithms with local and incremental updates, which provide much better scalability without deteriorating prediction accuracy. We evaluate our model on a number of real-world dynamic networks and demonstrate that our model significantly outperforms existing approaches for temporal link prediction in terms of both scalability and predictive power.


Bioinformatics | 2009

Joint estimation of gene conversion rates and mean conversion tract lengths from population SNP data

Junming Yin; Michael I. Jordan; Yun S. Song

Motivation: Two known types of meiotic recombination are crossovers and gene conversions. Although they leave behind different footprints in the genome, it is a challenging task to tease apart their relative contributions to the observed genetic variation. In particular, for a given population SNP dataset, the joint estimation of the crossover rate, the gene conversion rate and the mean conversion tract length is widely viewed as a very difficult problem. Results: In this article, we devise a likelihood-based method using an interleaved hidden Markov model (HMM) that can jointly estimate the aforementioned three parameters fundamental to recombination. Our method significantly improves upon a recently proposed method based on a factorial HMM. We show that modeling overlapping gene conversions is crucial for improving the joint estimation of the gene conversion rate and the mean conversion tract length. We test the performance of our method on simulated data. We then apply our method to analyze real biological data from the telomere of the X chromosome of Drosophila melanogaster, and show that the ratio of the gene conversion rate to the crossover rate for the region may not be nearly as high as previously claimed. Availability: A software implementation of the algorithms discussed in this article is available at http://www.cs.berkeley.edu/~yss/software.html.


Pacific Symposium on Biocomputing | 2011

Finding genome-transcriptome-phenome association with structured association mapping and visualization in GenAMap.

Ross E. Curtis; Junming Yin; Peter Kinnaird; Eric P. Xing

Despite the success of genome-wide association studies in detecting novel disease variants, we are still far from a complete understanding of the mechanisms through which variants cause disease. Most previous studies have considered only genome-phenome associations. However, the integration of transcriptome data may help further elucidate the mechanisms through which genetic mutations lead to disease and uncover potential pathways to target for treatment. We present a novel structured association mapping strategy for finding genome-transcriptome-phenome associations when SNP, gene-expression, and phenotype data are available for the same cohort. We do so via a two-step procedure where genome-transcriptome associations are identified by GFlasso, a sparse regression technique presented previously. Transcriptome-phenome associations are then found by a novel proposed method called gGFlasso, which leverages structure inherent in the genes and phenotypic traits. Due to the complex nature of three-way association results, visualization tools can aid in the discovery of causal SNPs and regulatory mechanisms affecting diseases. Using well-grounded visualization techniques, we have designed new visualizations that filter through large three-way association results to detect interesting SNPs and associated genes and traits. The two-step GFlasso-gGFlasso algorithmic approach and new visualizations are integrated into GenAMap, a visual analytics system for structured association mapping. Results on simulated datasets show that our approach has the potential to increase the sensitivity and specificity of association studies, compared to existing procedures that do not exploit the full structural information of the data. We report results from an analysis on a publicly available mouse dataset, showing that identified SNP-gene-trait associations are compatible with known biology.


Statistical Applications in Genetics and Molecular Biology | 2006

Model selection for mixtures of mutagenetic trees.

Junming Yin; Niko Beerenwinkel; Jörg Rahnenführer; Thomas Lengauer

The evolution of drug resistance in HIV is characterized by the accumulation of resistance-associated mutations in the HIV genome. Mutagenetic trees, a family of restricted Bayesian tree models, have been applied to infer the order and rate of occurrence of these mutations. Understanding and predicting this evolutionary process is an important prerequisite for the rational design of antiretroviral therapies. In practice, mixture models of K mutagenetic trees provide more flexibility and are often more appropriate for modelling observed mutational patterns. Here, we investigate the model selection problem for K-mutagenetic trees mixture models. We evaluate several classical model selection criteria including cross-validation, the Bayesian Information Criterion (BIC), and the Akaike Information Criterion. We also use the empirical Bayes method by constructing a prior probability distribution for the parameters of a mutagenetic trees mixture model and deriving the posterior probability of the model. In addition to the model dimension, we consider the redundancy of a mixture model, which is measured by comparing the topologies of trees within a mixture model. Based on the redundancy, we propose a new model selection criterion, which is a modification of the BIC. Experimental results on simulated and on real HIV data show that the classical criteria tend to select models with far too many tree components. Only cross-validation and the modified BIC recover the correct number of trees and the tree topologies most of the time. At the same optimal performance, the runtime of the new BIC modification is about one order of magnitude lower. Thus, this model selection criterion can also be used for large data sets for which cross-validation becomes computationally infeasible.
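
As a rough illustration of the classical criteria named in the abstract above, the sketch below scores candidate mixture sizes K with the textbook AIC and BIC formulas. The log-likelihoods, parameter counts, and sample size are made-up placeholders, not values from the paper, and the paper's redundancy-based BIC modification is not reproduced because the abstract does not define it.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion: -2 log L + 2k (lower is better)."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: -2 log L + k ln(n) (lower is better)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


# Hypothetical fits: K -> (maximized log-likelihood, number of free parameters).
# These numbers are invented for illustration only.
candidates = {1: (-520.0, 10), 2: (-500.0, 21), 3: (-495.0, 32)}
n_obs = 200  # hypothetical number of observed mutational patterns

aic_scores = {k: aic(ll, p) for k, (ll, p) in candidates.items()}
bic_scores = {k: bic(ll, p, n_obs) for k, (ll, p) in candidates.items()}

# Pick the K with the smallest score under each criterion.
best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)

print("AIC selects K =", best_aic)
print("BIC selects K =", best_bic)
```

With these placeholder numbers AIC prefers K = 2 and BIC prefers K = 1, because the ln(n) penalty grows faster with the parameter count than the constant factor of 2.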


Bioinformatics | 2016

A time-varying group sparse additive model for genome-wide association studies of dynamic complex traits

Micol Marchetti-Bowick; Junming Yin; Judie A. Howrylak; Eric P. Xing

Motivation: Despite the widespread popularity of genome-wide association studies (GWAS) for genetic mapping of complex traits, most existing GWAS methodologies are still limited to the use of static phenotypes measured at a single time point. In this work, we propose a new method for association mapping that considers dynamic phenotypes measured at a sequence of time points. Our approach relies on the use of Time-Varying Group Sparse Additive Models (TV-GroupSpAM) for high-dimensional, functional regression. Results: This new model detects a sparse set of genomic loci that are associated with trait dynamics, and demonstrates increased statistical power over existing methods. We evaluate our method via experiments on synthetic data and perform a proof-of-concept analysis for detecting single nucleotide polymorphisms associated with two phenotypes used to assess asthma severity: forced vital capacity, a sensitive measure of airway obstruction, and bronchodilator response, which measures lung response to bronchodilator drugs. Availability and implementation: Source code for TV-GroupSpAM, implemented in MATLAB, is freely available for download at http://www.cs.cmu.edu/~mmarchet/projects/tv_group_spam. Supplementary information: Supplementary data are available at Bioinformatics online.


PLOS ONE | 2014

GWAS in a box: statistical and visual analytics of structured associations via GenAMap.

Eric P. Xing; Ross E. Curtis; Georg P. Schoenherr; Seunghak Lee; Junming Yin; Kriti Puniyani; Wei Wu; Peter Kinnaird

With the continuous improvement in genotyping and molecular phenotyping technology and the decreasing typing cost, it is expected that in a few years, more and more clinical studies of complex diseases will recruit thousands of individuals for pan-omic genetic association analyses. Hence, there is a great need for algorithms and software tools that could scale up to the whole omic level, integrate different omic data, leverage rich structure information, and be easily accessible to non-technical users. We present GenAMap, an interactive analytics software platform that 1) automates the execution of principled machine learning methods that detect genome- and phenome-wide associations among genotypes, gene expression data, and clinical or other macroscopic traits, and 2) provides new visualization tools specifically designed to aid in the exploration of association mapping results. Algorithmically, GenAMap is based on a new paradigm for GWAS and PheWAS analysis, termed structured association mapping, which leverages various structures in the omic data. We demonstrate the function of GenAMap via a case study of the Brem and Kruglyak yeast dataset, and then apply it on a comprehensive eQTL analysis of the NIH heterogeneous stock mice dataset and report some interesting findings. GenAMap is available from http://sailing.cs.cmu.edu/genamap.


IEEE Transactions on Knowledge and Data Engineering | 2018

Supervised Topic Modeling Using Hierarchical Dirichlet Process-Based Inverse Regression: Experiments on E-Commerce Applications

Weifeng Li; Junming Yin; Hsinchun Chen

The proliferation of e-commerce calls for mining consumer preferences and opinions from user-generated text. To this end, topic models have been widely adopted to discover the underlying semantic themes (i.e., topics). Supervised topic models have emerged to leverage discovered topics for predicting the response of interest (e.g., product quality and sales). However, supervised topic modeling remains a challenging problem because of the need to prespecify the number of topics, the lack of predictive information in topics, and limited scalability. In this paper, we propose a novel supervised topic model, Hierarchical Dirichlet Process-based Inverse Regression (HDP-IR). HDP-IR characterizes the corpus with a flexible number of topics, which prove to retain as much predictive information as the original corpus. Moreover, we develop an efficient inference algorithm capable of examining large-scale corpora (millions of documents or more). Three experiments were conducted to evaluate the predictive performance over major e-commerce benchmark testbeds of online reviews. Overall, HDP-IR outperformed existing state-of-the-art supervised topic models. Particularly, retaining sufficient predictive information improved predictive R-squared by over 17.6 percent; having topic structure flexibility contributed to predictive R-squared by at least 4.1 percent. HDP-IR provides an important step for future study on user-generated texts from a topic perspective.


BMC Genetics | 2014

Hypothesis testing of meiotic recombination rates from population genetic data.

Junming Yin

Background: Meiotic recombination, one of the central biological processes studied in population genetics, comes in two known forms: crossovers and gene conversions. A number of previous studies have shown that when one of these two events is nonexistent in the genealogical model, the point estimation of the corresponding recombination rate by population genetic methods tends to be inflated. Therefore, it has become necessary to obtain statistical evidence from population genetic data about whether one of the two recombination events is absent. Results: In this paper, we formulate this problem in a hypothesis testing framework and devise a testing procedure based on the likelihood ratio test (LRT). However, because the null value (i.e., zero) lies on the boundary of the parameter space, the regularity conditions for the large-sample approximation to the distribution of the LRT statistic do not apply. In turn, the standard chi-squared approximation is inaccurate. To address this critical issue, we propose a parametric bootstrap procedure to obtain an approximate p-value for the observed test statistic. Coalescent simulations are conducted to show that our approach yields accurate null p-values that closely follow the theoretical prediction while the estimated alternative p-values tend to concentrate closer to zero. Finally, the method is demonstrated on a real biological data set from the telomere of the X chromosome of African Drosophila melanogaster. Conclusions: Our methodology provides a necessary complement to the existing procedures of estimating meiotic recombination rates from population genetic data.
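
The general idea of a parametric bootstrap for an LRT whose null value sits on the boundary can be sketched on a toy problem. The example below is not the paper's coalescent-based procedure: it assumes, purely for illustration, data drawn from Normal(theta, 1) with theta constrained to be nonnegative, and tests H0: theta = 0. The function names and the add-one p-value estimate are choices made for this sketch.

```python
import random
import statistics


def lrt_statistic(sample):
    """LRT statistic for H0: theta = 0 vs H1: theta > 0, data ~ Normal(theta, 1).

    The constrained MLE is max(0, sample mean), so the statistic
    2 * (loglik(theta_hat) - loglik(0)) reduces to n * theta_hat ** 2.
    """
    n = len(sample)
    theta_hat = max(0.0, statistics.fmean(sample))
    return n * theta_hat ** 2


def bootstrap_p_value(sample, n_boot=2000, seed=0):
    """Approximate the p-value by simulating data sets under the null model."""
    rng = random.Random(seed)
    observed = lrt_statistic(sample)
    n = len(sample)
    exceed = 0
    for _ in range(n_boot):
        # Draw a replicate data set from the null model (theta = 0).
        null_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if lrt_statistic(null_sample) >= observed:
            exceed += 1
    # Add-one estimate keeps the p-value strictly positive.
    return (1 + exceed) / (1 + n_boot)


# Toy data sets (assumed for illustration): one far from the null, one at it.
shifted = [3.0] * 20
centered = [0.0] * 20

p_shifted = bootstrap_p_value(shifted)
p_centered = bootstrap_p_value(centered)
print("p-value, shifted sample:", p_shifted)
print("p-value, centered sample:", p_centered)
```

In this toy setting the statistic is exactly zero whenever the sample mean is negative, which is why the usual chi-squared reference distribution is a poor fit and simulating the null distribution directly is a natural alternative.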


Neural Information Processing Systems | 2013

A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks

Junming Yin; Qirong Ho; Eric P. Xing


Archive | 2013

Petuum: A Framework for Iterative-Convergent Distributed ML

Wei Dai; Jinliang Wei; Xun Zheng; Jin Kyu Kim; Seunghak Lee; Junming Yin; Qirong Ho; Eric P. Xing

Collaboration


Dive into Junming Yin's collaborations.

Top Co-Authors

Eric P. Xing
Carnegie Mellon University

Qirong Ho
University of Arizona

Aram Galstyan
University of Southern California

Dong Guo
University of Southern California

Greg Ver Steeg
University of Southern California

Linhong Zhu
University of Southern California

Peter Kinnaird
Carnegie Mellon University

Ross E. Curtis
Carnegie Mellon University

Seunghak Lee
Carnegie Mellon University