Is this you? Create Your Porfile

Nima Asadi

University of Maryland, College Park

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Nima Asadi is active.

Explore More

Publication

Featured researches published by Nima Asadi.

international world wide web conferences | 2012

Mr. LDA: a flexible large scale topic modeling package using variational inference in MapReduce

Ke Zhai; Jordan L. Boyd-Graber; Nima Asadi; Mohamad L. Alkhouja

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for exploring document collections. Because of the increasing prevalence of large datasets, there is a need to improve the scalability of inference for LDA. In this paper, we introduce a novel and flexible large scale topic modeling package in MapReduce (Mr. LDA). As opposed to other techniques which use Gibbs sampling, our proposed framework uses variational inference, which easily fits into a distributed environment. More importantly, this variational implementation, unlike highly tuned and specialized implementations based on Gibbs sampling, is easily extensible. We demonstrate two extensions of the models possible with this scalable framework: informed priors to guide topic discovery and extracting topics from a multilingual corpus. We compare the scalability of Mr. LDA against Mahout, an existing large scale topic modeling package. Mr. LDA out-performs Mahout both in execution speed and held-out likelihood.

international acm sigir conference on research and development in information retrieval | 2013

Effectiveness/efficiency tradeoffs for candidate generation in multi-stage retrieval architectures

Nima Asadi; Jimmy J. Lin

This paper examines a multi-stage retrieval architecture consisting of a candidate generation stage, a feature extraction stage, and a reranking stage using machine-learned models. Given a fixed set of features and a learning-to-rank model, we explore effectiveness/efficiency tradeoffs with three candidate generation approaches: postings intersection with SvS, conjunctive query evaluation with WAND, and disjunctive query evaluation with WAND. We find no significant differences in end-to-end effectiveness as measured by NDCG between conjunctive and disjunctive WAND, but conjunctive query evaluation is substantially faster. Postings intersection with SvS, while fast, yields substantially lower end-to-end effectiveness, suggesting that document and term frequencies remain important in the initial ranking stage. These findings show that conjunctive WAND is the best overall candidate generation strategy of those we examined.

IEEE Transactions on Knowledge and Data Engineering | 2014

Runtime Optimizations for Tree-Based Machine Learning Models

Nima Asadi; Jimmy J. Lin; Arjen P. de Vries

Tree-based models have proven to be an effective solution for web ranking as well as other machine learning problems in diverse domains. This paper focuses on optimizing the runtime performance of applying such models to make predictions, specifically using gradient-boosted regression trees for learning to rank. Although exceedingly simple conceptually, most implementations of tree-based models do not efficiently utilize modern superscalar processors. By laying out data structures in memory in a more cache-conscious fashion, removing branches from the execution flow using a technique called predication, and micro-batching predictions using a technique called vectorization, we are able to better exploit modern processor architectures. Experiments on synthetic data and on three standard learning-to-rank datasets show that our approach is significantly faster than standard implementations.

ACM Transactions on Information Systems | 2013

Fast candidate generation for real-time tweet search with bloom filter chains

Nima Asadi; Jimmy J. Lin

The rise of social media and other forms of user-generated content have created the demand for real-time search: against a high-velocity stream of incoming documents, users desire a list of relevant results at the time the query is issued. In the context of real-time search on tweets, this work explores candidate generation in a two-stage retrieval architecture where an initial list of results is processed by a second-stage rescorer to produce the final output. We introduce Bloom filter chains, a novel extension of Bloom filters that can dynamically expand to efficiently represent an arbitrarily long and growing list of monotonically-increasing integers with a constant false positive rate. Using a collection of Bloom filter chains, a novel approximate candidate generation algorithm called BWand is able to perform both conjunctive and disjunctive retrieval. Experiments show that our algorithm is many times faster than competitive baselines and that this increased performance does not require sacrificing end-to-end effectiveness. Our results empirically characterize the trade-off space defined by output quality, query evaluation speed, and memory footprint for this particular search architecture.

Information Retrieval | 2013

Document vector representations for feature extraction in multi-stage document ranking

Nima Asadi; Jimmy J. Lin

We consider a multi-stage retrieval architecture consisting of a fast, “cheap” candidate generation stage, a feature extraction stage, and a more “expensive” reranking stage using machine-learned models. In this context, feature extraction can be accomplished using a document vector index, a mapping from document ids to document representations. We consider alternative organizations of such a data structure for efficient feature extraction: design choices include how document terms are organized, how complex term proximity features are computed, and how these structures are compressed. In particular, we propose a novel document-adaptive hashing scheme for compactly encoding term ids. The impact of alternative designs on both feature extraction speed and memory footprint is experimentally evaluated. Overall, results show that our architecture is comparable in speed to using a traditional positional inverted index but requires less memory overall, and offers additional advantages in terms of flexibility.

international acm sigir conference on research and development in information retrieval | 2011

Cross-corpus relevance projection

Nima Asadi; Donald Metzler; Jimmy J. Lin

Document corpora are key components of information retrieval test collections. However, for certain tasks, such as evaluating the effectiveness of a new retrieval technique or estimating the parameters of a learning to rank model, a corpus alone is not enough. For these tasks, queries and relevance judgments associated with the corpus are also necessary. However, researchers often find themselves in scenarios where they only have access to a corpus, in which case evaluation and learning to rank become challenging. Document corpora are relatively straightforward to gather. On the other hand, obtaining queries and relevance judgments for a given corpus is costly. In production environments, it may be possible to obtain low-cost relevance information using query and click logs. However, in more constrained research environments these options are not available, and relevance judgments are usually provided by humans. To reduce the cost of this potentially expensive process, researchers have developed low-cost evaluation strategies, including minimal test collections [2] and crowdsourcing [1]. Despite the usefulness of these strategies, the resulting relevance judgments cannot easily be “ported” to a new or different corpus. To overcome these issues, we propose a new method to reduce manual annotation costs by transferring relevance judgments across corpora. Assuming that a set of queries and relevance judgments have been manually constructed for a source document corpus Ds, our goal is to automatically construct a test collection for a target document corpus Dt by projecting the existing test collection from Ds onto Dt. The goal of projecting test collections is not to produce manual quality test collections. In fact, it is assumed that projected test collections will contain noisy relevance judgments (i.e., ones which humans are unlikely to agree with). The important question, however, is whether these noisy projected judgments are useful for training ranking models in the target corpus.

international acm sigir conference on research and development in information retrieval | 2011