Is this you? Create Your Porfile

Vineet Chaoji

Rensselaer Polytechnic Institute

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Vineet Chaoji is active.

Explore More

Publication

Featured researches published by Vineet Chaoji.

international world wide web conferences | 2010

Matrix "Bit" loaded: a scalable lightweight join query processor for RDF data

Medha Atre; Vineet Chaoji; Mohammed Javeed Zaki; James A. Hendler

The Semantic Web community, until now, has used traditional database systems for the storage and querying of RDF data. The SPARQL query language also closely follows SQL syntax. As a natural consequence, most of the SPARQL query processing techniques are based on database query processing and optimization techniques. For SPARQL join query optimization, previous works like RDF-3X and Hexastore have proposed to use 6-way indexes on the RDF data. Although these indexes speed up merge-joins by orders of magnitude, for complex join queries generating large intermediate join results, the scalability of the query processor still remains a challenge. In this paper, we introduce (i) BitMat - a compressed bit-matrix structure for storing huge RDF graphs, and (ii) a novel, light-weight SPARQL join query processing method that employs an initial pruning technique, followed by a variable-binding-matching algorithm on BitMats to produce the final results. Our query processing method does not build intermediate join tables and works directly on the compressed data. We have demonstrated our method against RDF graphs of upto 1.33 billion triples - the largest among results published until now (single-node, non-parallel systems), and have compared our method with the state-of-the-art RDF stores - RDF-3X and MonetDB. Our results show that the competing methods are most effective with highly selective queries. On the other hand, BitMat can deliver 2-3 orders of magnitude better performance on complex, low-selectivity queries over massive data.

PLOS ONE | 2009

Proteomic and phospho-proteomic profile of human platelets in basal, resting state: insights into integrin signaling.

Amir H. Qureshi; Vineet Chaoji; Dony Maiguel; Mohd Hafeez Faridi; Constantinos J. Barth; Saeed Salem; Mudita Singhal; Darren Stoub; Bryan Krastins; Mitsunori Ogihara; Mohammed Javeed Zaki; Vineet Gupta

During atherogenesis and vascular inflammation quiescent platelets are activated to increase the surface expression and ligand affinity of the integrin αIIbβ3 via inside-out signaling. Diverse signals such as thrombin, ADP and epinephrine transduce signals through their respective GPCRs to activate protein kinases that ultimately lead to the phosphorylation of the cytoplasmic tail of the integrin αIIbβ3 and augment its function. The signaling pathways that transmit signals from the GPCR to the cytosolic domain of the integrin are not well defined. In an effort to better understand these pathways, we employed a combination of proteomic profiling and computational analyses of isolated human platelets. We analyzed ten independent human samples and identified a total of 1507 unique proteins in platelets. This is the most comprehensive platelet proteome assembled to date and includes 190 membrane-associated and 262 phosphorylated proteins, which were identified via independent proteomic and phospho-proteomic profiling. We used this proteomic dataset to create a platelet protein-protein interaction (PPI) network and applied novel contextual information about the phosphorylation step to introduce limited directionality in the PPI graph. This newly developed contextual PPI network computationally recapitulated an integrin signaling pathway. Most importantly, our approach not only provided insights into the mechanism of integrin αIIbβ3 activation in resting platelets but also provides an improved model for analysis and discovery of PPI dynamics and signaling pathways in the future.

very large data bases | 2010

GRAIL: scalable reachability index for large graphs

Hilmi Yildirim; Vineet Chaoji; Mohammed Javeed Zaki

Given a large directed graph, rapidly answering reachability queries between source and target nodes is an important problem. Existing methods for reachability trade-off indexing time and space versus query time performance. However, the biggest limitation of existing methods is that they simply do not scale to very large real-world graphs. We present a very simple, but scalable reachability index, called GRAIL, that is based on the idea of randomized interval labeling, and that can effectively handle very large graphs. Based on an extensive set of experiments, we show that while more sophisticated methods work better on small graphs, GRAIL is the only index that can scale to millions of nodes and edges. GRAIL has linear indexing time and space, and the query time ranges from constant time to being linear in the graph order and size.Given a large directed graph, rapidly answering reachability queries between source and target nodes is an important problem. Existing methods for reachability trade-off indexing time and space versus query time performance. However, the biggest limitation of existing methods is that they simply do not scale to very large real-world graphs. We present a very simple, but scalable reachability index, called GRAIL, that is based on the idea of randomized interval labeling, and that can effectively handle very large graphs. Based on an extensive set of experiments, we show that while more sophisticated methods work better on small graphs, GRAIL is the only index that can scale to millions of nodes and edges. GRAIL has linear indexing time and space, and the query time ranges from constant time to being linear in the graph order and size.

international conference on data mining | 2007

ORIGAMI: Mining Representative Orthogonal Graph Patterns

M. Al Hasan; Vineet Chaoji; Saeed Salem; Jérémy Besson; Mohammed Javeed Zaki

In this paper, we introduce the concept of alpha-orthogonal patterns to mine a representative set of graph patterns. Intuitively, two graph patterns are alpha-orthogonal if their similarity is bounded above by alpha. Each alpha-orthogonal pattern is also a representative for those patterns that are at least beta similar to it. Given user defined alpha, beta isin [0,1], the goal is to mine an alpha-orthogonal, beta-representative set that minimizes the set of unrepresented patterns. We present ORIGAMI, an effective algorithm for mining the set of representative orthogonal patterns. ORIGAMI first uses a randomized algorithm to randomly traverse the pattern space, seeking previously unexplored regions, to return a set of maximal patterns. ORIGAMI then extracts an alpha-orthogonal, beta-representative set from the mined maximal patterns. We show the effectiveness of our algorithm on a number of real and synthetic datasets. In particular, we show that our method is able to extract high quality patterns even in cases where existing enumerative graph mining methods fail to do so.

international world wide web conferences | 2012

Recommendations to boost content spread in social networks

Vineet Chaoji; Sayan Ranu; Rajeev Rastogi; Rushi Bhatt

Content sharing in social networks is a powerful mechanism for discovering content on the Internet. The degree to which content is disseminated within the network depends on the connectivity relationships among network nodes. Existing schemes for recommending connections in social networks are based on the number of common neighbors, similarity of user profiles, etc. However, such similarity-based connections do not consider the amount of content discovered. In this paper, we propose novel algorithms for recommending connections that boost content propagation in a social network without compromising on the relevance of the recommendations. Unlike existing work on influence propagation, in our environment, we are looking for edges instead of nodes, with a bound on the number of incident edges per node. We show that the content spread function is not submodular, and develop approximation algorithms for computing a near-optimal set of edges. Through experiments on real-world social graphs such as Flickr and Twitter, we show that our approximation algorithms achieve content spreads that are as much as 90 times higher compared to existing heuristics for recommending connections.

conference on information and knowledge management | 2010

Predicting product adoption in large-scale social networks

Rushi Bhatt; Vineet Chaoji; Rajesh Parekh

Online social networks offer opportunities to analyze user behavior and social connectivity and leverage resulting insights for effective online advertising. We study the adoption of a paid product by members of a large and well-connected Instant Messenger (IM) network. This product is important to the business and poses unique challenges to advertising due to its low baseline adoption rate. We find that adoption by highly connected individuals is correlated with their social connections (friends) adopting after them. However, there is little evidence of social influence by these high degree individuals. Further, the spread of adoption remains mostly local to first-adopters and their immediate friends. We observe strong evidence of peer pressure wherein future adoption by an individual is more likely if the product has been widely adopted by the individuals friends. Social neighborhoods rich in adoptions also continue to add more new adoptions compared to those neighborhoods that are poor in adoption. Using these insights we build predictive models to identify individuals most suited for two types of marketing campaigns - direct marketing where individuals with highest propensity for future adoption are targeted with suitable ads and social neighborhood marketing which involves messaging to members of the social network who are most effective in using the power of their network to convince their friends to adopt. We identify the most desirable features for predicting future adoption of the PC To Phone product which can in turn be leveraged to effectively promote its adoption. Offline analysis shows that building predictive models for direct marketing and social neighborhood marketing outperforms several widely accepted marketing heuristics. Further, these models are able to effectively combine user features and social features to predict adoption better than using either user features or social features in isolation.

very large data bases | 2012

GRAIL: a scalable index for reachability queries in very large graphs

Hilmi Yildirim; Vineet Chaoji; Mohammed Javeed Zaki

Given a large directed graph, rapidly answering reachability queries between source and target nodes is an important problem. Existing methods for reachability tradeoff indexing time and space versus query time performance. However, the biggest limitation of existing methods is that they do not scale to very large real-world graphs. We present a simple yet scalable reachability index, called GRAIL, that is based on the idea of randomized interval labeling and that can effectively handle very large graphs. Based on an extensive set of experiments, we show that while more sophisticated methods work better on small graphs, GRAIL is the only index that can scale to millions of nodes and edges. GRAIL has linear indexing time and space, and the query time ranges from constant time to being linear in the graph order and size. Our reference C++ implementations are open source and available for download at http://www.code.google.com/p/grail/.

Pattern Recognition Letters | 2009

Robust partitional clustering by outlier and density insensitive seeding

Mohammad Al Hasan; Vineet Chaoji; Saeed Salem; Mohammed Javeed Zaki

The leading partitional clustering technique, k-means, is one of the most computationally efficient clustering methods. However, it produces a local optimal solution that strongly depends on its initial seeds. Bad initial seeds can also cause the splitting or merging of natural clusters even if the clusters are well separated. In this paper, we propose, ROBIN, a novel method for initial seed selection in k-means types of algorithms. It imposes constraints on the chosen seeds that lead to better clustering when k-means converges. The constraints make the seed selection method insensitive to outliers in the data and also assist it to handle variable density or multi-scale clusters. Furthermore, they (constraints) make the method deterministic, so only one run suffices to obtain good initial seeds, as opposed to traditional random seed selection approaches that need many runs to obtain good seeds that lead to satisfactory clustering. We did a comprehensive evaluation of ROBIN against state-of-the-art seeding methods on a wide range of synthetic and real datasets. We show that ROBIN consistently outperforms existing approaches in terms of the clustering quality.

Data Mining and Knowledge Discovery | 2008

An integrated, generic approach to pattern mining: data mining template library

Vineet Chaoji; Mohammad Al Hasan; Saeed Salem; Mohammed Javeed Zaki

Frequent pattern mining (FPM) is an important data mining paradigm to extract informative patterns like itemsets, sequences, trees, and graphs. However, no practical framework for integrating the FPM tasks has been attempted. In this paper, we describe the design and implementation of the Data Mining Template Library (DMTL) for FPM. DMTL utilizes a generic data mining approach, where all aspects of mining are controlled via a set of properties. It uses a novel pattern property hierarchy to define and mine different pattern types. This property hierarchy can be thought of as a systematic characterization of the pattern space, i.e., a meta-pattern specification that allows the analyst to specify new pattern types, by extending this hierarchy. Furthermore, in DMTL all aspects of mining are controlled by a set of different mining properties. For example, the kind of mining approach to use, the kind of data types and formats to mine over, the kind of back-end storage manager to use, are all specified as a list of properties. This provides tremendous flexibility to customize the toolkit for various applications. Flexibility of the toolkit is exemplified by the ease with which support for a new pattern can be added. Experiments on synthetic and public dataset are conducted to demonstrate the scalability provided by the persistent back-end in the library. DMTL been publicly released as open-source software (http://dmtl.sourceforge.net/), and has been downloaded by numerous researchers from all over the world.

international conference on formal concept analysis | 2005

Towards generic pattern mining

Mohammed Javeed Zaki; Nagender Parimi; Nilanjana De; Feng Gao; Benjarath Phoophakdee; Joe Urban; Vineet Chaoji; Mohammad Al Hasan; Saeed Salem

Frequent Pattern Mining (FPM) is a very powerful paradigm for mining informative and useful patterns in massive, complex datasets. In this paper we propose the Data Mining Template Library, a collection of generic containers and algorithms for FPM, as well as persistency and database management classes. DMTL provides a systematic solution to a whole class of common FPM tasks like itemset, sequence, tree and graph mining. DMTL is extensible, scalable, and high-performance for rapid response on massive datasets. Our experiments show that DMTL is competitive with special purpose algorithms designed for a particular pattern type, especially as database sizes increase.

Explore More