Prabhanjan Kambadur | Researchain

Publication

Featured research published by Prabhanjan Kambadur.

North American Chapter of the Association for Computational Linguistics | 2016

Geolocation for Twitter: Timing Matters

Mark Dredze; Miles Osborne; Prabhanjan Kambadur

Automated geolocation of social media messages can benefit a variety of downstream applications. However, these geolocation systems are typically evaluated without attention to how changes over time affect geolocation. Because different people in different locations write messages at different times, these factors can cause the performance of a geolocation system to vary significantly over time. We demonstrate cyclical temporal effects on geolocation accuracy in Twitter, as well as rapid drops as test data moves beyond the time period of the training data. We show that temporal drift can be effectively countered with even modest online model updates.
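
As an illustration of why modest online updates can counter temporal drift, here is a minimal sketch of an incrementally updatable text geolocator. It assumes a multinomial naive Bayes model over whitespace tokens; the class `OnlineGeolocator` and its methods are hypothetical and are not the model used in the paper. Because the model is only a set of counts, folding in a newly labeled tweet is a cheap update rather than a full retrain.

```python
import math
from collections import Counter, defaultdict


class OnlineGeolocator:
    """Hypothetical multinomial naive Bayes geolocator with incremental updates."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # additive (Laplace) smoothing
        self.doc_counts = Counter()               # labeled tweets seen per location
        self.word_counts = defaultdict(Counter)   # token counts per location
        self.total_words = Counter()              # total tokens per location
        self.vocab = set()

    def update(self, text, location):
        """Fold one newly labeled tweet into the counts (an online update)."""
        tokens = text.lower().split()
        self.doc_counts[location] += 1
        self.word_counts[location].update(tokens)
        self.total_words[location] += len(tokens)
        self.vocab.update(tokens)

    def predict(self, text):
        """Return the location with the highest smoothed log-posterior."""
        tokens = text.lower().split()
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for loc, n in self.doc_counts.items():
            score = math.log(n / n_docs)          # log prior of the location
            denom = self.total_words[loc] + self.alpha * v
            for t in tokens:
                score += math.log((self.word_counts[loc][t] + self.alpha) / denom)
            if score > best_score:
                best, best_score = loc, score
        return best
```

For example, after `g.update("pizza slice", "New York")`, a query containing `pizza` is assigned to New York, mimicking how vocabulary from a later time period can be absorbed without retraining from scratch.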

International Conference on Data Mining | 2014

Orthogonal Matching Pursuit for Sparse Quantile Regression

Aleksandr Y. Aravkin; Aurelie C. Lozano; Ronny Luss; Prabhanjan Kambadur

We consider new formulations and methods for sparse quantile regression in the high-dimensional setting. Quantile regression plays an important role in many data mining applications, including outlier-robust exploratory analysis in gene selection. In addition, the sparsity consideration in quantile regression enables the exploration of the entire conditional distribution of the response variable given the predictors and therefore yields a more comprehensive view of the important predictors. We propose a generalized Orthogonal Matching Pursuit algorithm for variable selection, taking the misfit loss to be either the traditional quantile loss or a smooth version we call quantile Huber, and compare the resulting greedy approaches with convex sparsity-regularized formulations. We apply a recently proposed interior point methodology to efficiently solve all formulations, provide theoretical guarantees of consistent estimation, and demonstrate the performance of our approach using empirical studies of simulated and genomic datasets.
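
The two misfit losses and the greedy selection loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the quantile Huber parameterization below (quadratic on [-(1 - τ)κ, τκ], linear with matching slope outside) is one common smoothing of the check loss, residuals are taken as y minus prediction, and the refit on the selected support uses plain gradient descent rather than the interior point method described in the abstract. All function names are hypothetical.

```python
def check_loss(r, tau):
    """Standard quantile (check) loss on a residual r = y - prediction."""
    return tau * r if r >= 0 else (tau - 1.0) * r


def quantile_huber(r, tau, kappa):
    """Assumed smoothing: quadratic on [-(1 - tau) * kappa, tau * kappa], linear outside."""
    if r > tau * kappa:
        return tau * r - tau ** 2 * kappa / 2
    if r < -(1 - tau) * kappa:
        return -(1 - tau) * r - (1 - tau) ** 2 * kappa / 2
    return r * r / (2 * kappa)


def quantile_huber_grad(r, tau, kappa):
    """Derivative of quantile_huber with respect to r."""
    if r > tau * kappa:
        return tau
    if r < -(1 - tau) * kappa:
        return -(1 - tau)
    return r / kappa


def greedy_quantile_omp(X, y, n_features, tau=0.5, kappa=1.0, iters=2000):
    """Greedy OMP-style selection under the quantile Huber loss (sketch)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    support = []

    def residuals():
        return [y[i] - sum(X[i][j] * beta[j] for j in support) for i in range(n)]

    for _ in range(min(n_features, p)):
        psi = [quantile_huber_grad(r, tau, kappa) for r in residuals()]
        # d(loss)/d(beta_j) = -sum_i psi_i * X[i][j]; pick the largest magnitude.
        scores = {j: abs(sum(psi[i] * X[i][j] for i in range(n)))
                  for j in range(p) if j not in support}
        support.append(max(scores, key=scores.get))
        # Refit on the support; kappa / ||X_S||_F^2 bounds the step by 1/Lipschitz.
        frob = sum(X[i][j] ** 2 for i in range(n) for j in support)
        step = kappa / frob if frob > 0 else 0.0
        for _ in range(iters):
            psi = [quantile_huber_grad(r, tau, kappa) for r in residuals()]
            for j in support:
                beta[j] += step * sum(psi[i] * X[i][j] for i in range(n))
    return support, beta
```

As κ shrinks toward 0, `quantile_huber` approaches `check_loss`, while a positive κ keeps the loss differentiable so the refit step can use gradients.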

Nature Communications | 2018

Stratification of TAD boundaries reveals preferential insulation of super-enhancers by strong boundaries

Yixiao Gong; Charalampos Lazaris; Theodore Sakellaropoulos; Aurelie C. Lozano; Prabhanjan Kambadur; Panagiotis Ntziachristos; Iannis Aifantis; Aristotelis Tsirigos

The metazoan genome is compartmentalized in areas of highly interacting chromatin known as topologically associating domains (TADs). TADs are demarcated by boundaries mostly conserved across cell types and even across species. However, a genome-wide characterization of TAD boundary strength in mammals is still lacking. In this study, we first use fused two-dimensional lasso as a machine learning method to improve Hi-C contact matrix reproducibility, and, subsequently, we categorize TAD boundaries based on their insulation score. We demonstrate that higher TAD boundary insulation scores are associated with elevated CTCF levels and that they may differ across cell types. Intriguingly, we observe that super-enhancers are preferentially insulated by strong boundaries. Furthermore, we demonstrate that strong TAD boundaries and super-enhancer elements are frequently co-duplicated in cancer patients. Taken together, our findings suggest that super-enhancers insulated by strong TAD boundaries may be exploited, as a functional unit, by cancer cells to promote oncogenesis.

Topologically associating domains (TADs) detected by Hi-C technologies are megabase-scale areas of highly interacting chromatin. Here Gong, Lazaris et al. develop a computational approach to improve the reproducibility of Hi-C contact matrices and stratify TAD boundaries based on their insulating strength.
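
To make the idea of boundary strength concrete, the sketch below scores a candidate boundary in a Hi-C contact matrix by comparing contacts within the flanking windows to contacts that cross the boundary. This is a hypothetical, simplified score chosen so that larger values mean stronger insulation; the insulation score used in the paper may be defined differently, and the window size `w` and the function name are assumptions.

```python
import math
from statistics import mean


def boundary_insulation(M, i, w, eps=1e-9):
    """Hypothetical score for the boundary between bins i-1 and i of a symmetric
    contact matrix M: log2(mean within-window contacts / mean cross-boundary
    contacts), using w bins on each side. Larger values mean stronger insulation."""
    if w < 2 or i - w < 0 or i + w > len(M):
        raise ValueError("need w >= 2 bins on both sides of the boundary")
    left, right = range(i - w, i), range(i, i + w)
    # Contacts between a bin left of the boundary and a bin right of it.
    cross = [M[a][b] for a in left for b in right]
    # Contacts among distinct bins on the same side (upper triangle only).
    within = [M[a][b] for side in (left, right) for a in side for b in side if a < b]
    return math.log2((mean(within) + eps) / (mean(cross) + eps))
```

Boundaries could then be stratified, for example, by ranking these scores and splitting them into quantiles.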

Journal of the American Statistical Association | 2017

Statistical Tests for Large Tree-Structured Data

Karthik Bharath; Prabhanjan Kambadur; Dipak K. Dey; Arvind Rao; Veerabhadran Baladandayuthapani

We develop a general statistical framework for the analysis and inference of large tree-structured data, with a focus on developing asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we develop a class of invariant tests. Using the model for binary trees, we then construct tests for general trees by using the distributional properties of the continuum random tree, which arises as the invariant limit for a broad class of models for tree-structured data based on conditioned Galton–Watson processes. The test statistics for the goodness-of-fit tests are simple to compute and are asymptotically distributed as χ² and F random variables. We illustrate our methods on an important application of detecting tumor heterogeneity in brain cancer. We use a novel approach with tree-based representations of magnetic resonance images and employ the developed tests to ascertain tumor heterogeneity between two groups of patients. Supplementary materials for this article are available online.

Research in Computational Molecular Biology | 2016

An Efficient Nonlinear Regression Approach for Genome-wide Detection of Marginal and Interacting Genetic Variations

Seunghak Lee; Aurelie C. Lozano; Prabhanjan Kambadur; Eric P. Xing

Genome-wide association studies have revealed individual genetic variants associated with phenotypic traits such as disease risk and gene expression. However, detecting pairwise interaction effects of genetic variants on traits remains a challenge because of the large number of variant combinations (~10¹¹ SNP pairs in the human genome) and relatively small sample sizes (typically < 10⁴). Despite recent breakthroughs in detecting interaction effects, several open problems remain: (1) how to quickly process a large number of SNP pairs, (2) how to distinguish between true signals and SNPs/SNP pairs merely correlated with true signals, (3) how to detect non-linear associations between SNP pairs and traits given small sample sizes, and (4) how to control false positives. In this paper, we present a unified framework, called SPHINX, which addresses these challenges. We first propose a piecewise linear model for interaction detection because it is simple enough to estimate model parameters given small sample sizes but complex enough to capture non-linear interaction effects. Then, based on the piecewise linear model, we introduce randomized group lasso under stability selection, and a screening algorithm to address the statistical and computational challenges mentioned above. In our experiments, we first demonstrate that SPHINX achieves better power than existing methods for interaction detection under false positive control. We further apply SPHINX to a late-onset Alzheimer’s disease dataset and report 16 SNPs and 17 SNP pairs associated with gene traits. We also present a highly scalable implementation of our screening algorithm that can screen ~118 billion candidate associations on a 60-node cluster in under 5.5 hours. SPHINX is available at http://www.cs.cmu.edu/~seunghak/SPHINX/.
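
Stability selection, which SPHINX combines with a randomized group lasso, can be illustrated generically: a base selector is run on many random subsamples, and only features chosen in at least a fixed fraction of runs are kept. In the sketch below the base selector is a hypothetical stand-in (the top-k features by absolute marginal correlation), not the randomized group lasso, and the subsample fraction, number of rounds, and threshold are illustrative values only.

```python
import math
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def top_k_selector(X, y, k):
    """Hypothetical stand-in base selector: top-k features by |marginal correlation|."""
    p = len(X[0])
    scores = [abs_corr([row[j] for row in X], y) for j in range(p)]
    return set(sorted(range(p), key=lambda j: -scores[j])[:k])


def stability_selection(X, y, k, n_rounds=100, frac=0.5, threshold=0.6, seed=0):
    """Keep features that the base selector picks in >= threshold of subsamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(2, int(frac * n))
    counts = [0] * p
    for _ in range(n_rounds):
        idx = rng.sample(range(n), m)  # subsample rows without replacement
        chosen = top_k_selector([X[i] for i in idx], [y[i] for i in idx], k)
        for j in chosen:
            counts[j] += 1
    return {j for j in range(p) if counts[j] / n_rounds >= threshold}
```

Raising `threshold` trades sensitivity for fewer spurious selections, which is the lever stability selection offers for false positive control.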

International Conference on Management of Data | 2016

How Twitter is Changing the Nature of Financial News Discovery

Mark Dredze; Prabhanjan Kambadur; Gary Kazantsev; Gideon Mann; Miles Osborne

Access to the most relevant and current information is critical to financial analysis and decision making. Historically, financial news has been discovered through company press releases, required disclosures and news articles. More recently, social media has reshaped the financial news landscape, radically changing the dynamics of news dissemination. In this paper we discuss the ways in which Twitter, a leading social media platform, has contributed to changes in this landscape. We explain why today Twitter is a valuable source of material financial information and describe opportunities and challenges in using this novel news source for financial information discovery.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2018

Weakly-supervised Contextualization of Knowledge Graph Facts

Nikos Voskarides; Edgar Meij; Ridho Reinanda; Abhinav Khaitan; Miles Osborne; Giorgio Stefanoni; Prabhanjan Kambadur; Maarten de Rijke

Knowledge graphs (KGs) model facts about the world; they consist of nodes (entities such as companies and people) that are connected by edges (relations such as founderOf). Facts encoded in KGs are frequently used by search applications to augment result pages. When presenting a KG fact to the user, providing other facts that are pertinent to that main fact can enrich the user experience and support exploratory information needs. KG fact contextualization is the task of augmenting a given KG fact with additional and useful KG facts. The task is challenging because of the large size of KGs; discovering other relevant facts even in a small neighborhood of the given fact results in an enormous number of candidates. We introduce a neural fact contextualization method (NFCM) to address the KG fact contextualization task. NFCM first generates a set of candidate facts in the neighborhood of a given fact and then ranks the candidate facts using a supervised learning to rank model. The ranking model combines features that we automatically learn from data and that represent the query-candidate facts with a set of hand-crafted features we devised or adjusted for this task. In order to obtain the annotations required to train the learning to rank model at scale, we generate training data automatically using distant supervision on a large entity-tagged text corpus. We show that ranking functions learned on this data are effective at contextualizing KG facts. Evaluation using human assessors shows that NFCM significantly outperforms several competitive baselines.
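
The two-stage structure of NFCM (candidate generation in the neighborhood of a fact, then ranking) can be sketched on a toy knowledge graph stored as (subject, relation, object) triples. This is a hypothetical illustration: the hop-based neighborhood, the two hand-crafted features, and the fixed linear weights stand in for NFCM's learned neural features and its learning to rank model.

```python
from collections import defaultdict


def candidate_facts(triples, fact, hops=1):
    """Collect facts within `hops` entity hops of the main fact (excluding it)."""
    by_entity = defaultdict(set)
    for t in triples:
        by_entity[t[0]].add(t)
        by_entity[t[2]].add(t)
    frontier = {fact[0], fact[2]}
    seen = set(frontier)
    candidates = set()
    for _ in range(hops):
        next_frontier = set()
        for entity in frontier:
            for t in by_entity[entity]:
                if t == fact or t in candidates:
                    continue
                candidates.add(t)
                for e in (t[0], t[2]):  # expand through newly reached entities
                    if e not in seen:
                        seen.add(e)
                        next_frontier.add(e)
        frontier = next_frontier
    return candidates


def features(candidate, fact):
    """Two hypothetical hand-crafted features for a candidate fact."""
    shared_entities = len({fact[0], fact[2]} & {candidate[0], candidate[2]})
    same_relation = 1.0 if candidate[1] == fact[1] else 0.0
    return [shared_entities, same_relation]


def rank_candidates(candidates, fact, weights):
    """Sort candidates by a fixed linear score (stand-in for a learned ranker)."""
    def score(t):
        return sum(w * f for w, f in zip(weights, features(t, fact)))
    return sorted(candidates, key=lambda t: (-score(t), t))
```

Even in this toy graph, each extra hop pulls in more candidates, which is why the abstract stresses that the neighborhood quickly yields an enormous candidate set.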

bioRxiv | 2017

Stratification of TAD boundaries identified in reproducible Hi-C contact matrices reveals preferential insulation of super-enhancers by strong boundaries

Yixiao Gong; Charalampos Lazaris; Theodore Sakellaropoulos; Aurelie C. Lozano; Prabhanjan Kambadur; Panagiotis Ntziachristos; Iannis Aifantis; Aristotelis Tsirigos

The metazoan genome is compartmentalized in megabase-scale areas of highly interacting chromatin known as topologically associating domains (TADs), typically identified by computational analyses of Hi-C sequencing data. TADs are demarcated by boundaries that have been shown to be largely conserved across cell types and even across species. Increasing evidence suggests that the seemingly invariant TADs may exhibit some plasticity in certain cases and their boundary strength can vary. However, a genome-wide characterization of TAD boundary strength in mammals is still lacking. In this study, we use fused two-dimensional lasso as a machine-learning method to first improve Hi-C contact matrix reproducibility and subsequently categorize TAD boundaries based on their strength. We demonstrate that increased boundary strength is associated with elevated levels of CTCF and that TAD boundary insulation scores may differ across cell types. Intriguingly, we also found that super-enhancer elements are preferentially insulated by strong boundaries. Presumably, genetic or epigenetic inactivation of strong boundaries may lead to loss of insulation around super-enhancers, disrupt the physiological transcriptional program and cause disease.

The metazoan genome is compartmentalized in megabase-scale areas of highly interacting chromatin known as topologically associating domains (TADs), typically identified by computational analyses of Hi-C sequencing data. TADs are demarcated by boundaries that are largely conserved across cell types and even across species, although increasing evidence suggests that the seemingly invariant TAD boundaries may exhibit plasticity and their insulating strength can vary. However, a genome-wide characterization of TAD boundary strength in mammals is still lacking. A systematic classification and characterization of TAD boundaries may generate new insights into their function.
In this study, we first use fused two-dimensional lasso as a machine learning method to improve Hi-C contact matrix reproducibility, and, subsequently, we categorize TAD boundaries based on their insulation score. We demonstrate that higher TAD boundary insulation scores are associated with elevated CTCF levels and that they may differ across cell types. Intriguingly, we observe that super-enhancer elements are preferentially insulated by strong boundaries, i.e., boundaries with higher insulation scores. Furthermore, we perform a pan-cancer analysis to demonstrate that strong TAD boundaries and super-enhancer elements are frequently co-duplicated in cancer patients. Taken together, our findings suggest that super-enhancers insulated by strong TAD boundaries may be exploited, as a functional unit, by cancer cells to promote oncogenesis.

bioRxiv | 2017