
Publication


Featured research published by Virgil Pavlu.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013

A document rating system for preference judgements

Maryam Bashir; Jesse Anderton; Jie Wu; Peter B. Golbus; Virgil Pavlu; Javed A. Aslam

High quality relevance judgments are essential for the evaluation of information retrieval systems. Traditional methods of collecting relevance judgments are based on collecting binary or graded nominal judgments, but such judgments are limited by factors such as inter-assessor disagreement and the arbitrariness of grades. Previous research has shown that it is easier for assessors to make pairwise preference judgments. However, unless the preferences collected are largely transitive, it is not clear how to combine them in order to obtain document relevance scores. Another difficulty is that the number of pairs that need to be assessed is quadratic in the number of documents. In this work, we consider the problem of inferring document relevance scores from pairwise preference judgments by analogy to tournaments using the Elo rating system. We show how to combine a linear number of pairwise preference judgments from multiple assessors to compute relevance scores for every document.
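
As a rough illustration of the tournament analogy described above (not the paper's exact procedure), an Elo-style update can turn a stream of pairwise preference judgments into per-document scores; the K-factor and initial rating below are arbitrary choices:

```python
# Illustrative Elo-style rating of documents from pairwise preference
# judgments: each judgment says one document was preferred over another,
# and ratings drift toward relative relevance scores.

def expected_score(r_winner, r_loser):
    """Elo's predicted probability that the winner beats the loser."""
    return 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))

def rate_documents(preferences, k=32.0, initial=1500.0):
    """preferences: iterable of (preferred_doc, other_doc) pairs."""
    ratings = {}
    for winner, loser in preferences:
        rw = ratings.setdefault(winner, initial)
        rl = ratings.setdefault(loser, initial)
        e = expected_score(rw, rl)
        ratings[winner] = rw + k * (1.0 - e)   # winner gains rating
        ratings[loser] = rl - k * (1.0 - e)    # loser loses the same amount
    return ratings

prefs = [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]
scores = rate_documents(prefs)
# d1 wins both of its comparisons, so it ends with the highest rating.
```

Note that each judgment triggers a constant-time update, which is what makes a linear number of judgments sufficient in this sketch.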


Conference on Information and Knowledge Management | 2015

Aggregation of Crowdsourced Ordinal Assessments and Integration with Learning to Rank: A Latent Trait Model

Pavel Metrikov; Virgil Pavlu; Javed A. Aslam

Existing approaches used for training and evaluating search engines often rely on crowdsourced assessments of document relevance with respect to a user query. To use such assessments for either evaluation or learning, we propose a new framework for the inference of true document relevance from crowdsourced data---one simpler than previous approaches and achieving better performance. For each assessor, we model assessor quality and bias in the form of Gaussian distributed class conditionals of relevance grades. For each document, we model true relevance and difficulty as continuous variables. We estimate all parameters from crowdsourced data, demonstrating better inference of relevance as well as realistic models for both documents and assessors. A document-pair likelihood model works best, and it is extended to pairwise learning to rank. Utilizing more information directly from the input data, it shows better performance as compared to existing state-of-the-art approaches for learning to rank from crowdsourced assessments. Experimental validation is performed on four TREC datasets.
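
A toy version of the inference idea above (with entirely made-up assessor parameters, not those estimated in the paper): if each assessor's ordinal grades are modeled as Gaussian class conditionals over a continuous true relevance z, the z that maximizes the joint likelihood of the observed grades can be found by a simple grid search:

```python
import math

# Toy inference of a document's continuous true relevance z from
# crowdsourced ordinal grades, assuming each assessor's grades are
# Gaussian class conditionals over z (all parameters here are made up).

def gaussian(z, mu, sigma):
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Per-assessor model: for each ordinal grade, a (mean, stddev) over z.
assessors = {
    "a1": {0: (0.0, 0.5), 1: (1.0, 0.5), 2: (2.0, 0.5)},   # accurate
    "a2": {0: (0.5, 1.0), 1: (1.2, 1.0), 2: (1.8, 1.0)},   # biased, noisy
}

def infer_relevance(grades, grid=None):
    """grades: list of (assessor_id, grade). Returns the z on a coarse
    grid that maximizes the joint likelihood of the observed grades."""
    grid = grid or [i / 100.0 for i in range(0, 301)]  # z in [0, 3]
    def loglik(z):
        return sum(math.log(gaussian(z, *assessors[a][g])) for a, g in grades)
    return max(grid, key=loglik)

z_hat = infer_relevance([("a1", 2), ("a2", 2)])
# Both assessors gave the top grade, so z_hat lands near the high end.
```

The accurate assessor's narrower class conditionals give its grades more weight in the joint likelihood, which is how assessor quality enters the inference.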


Conference on Information and Knowledge Management | 2016

A Study of Realtime Summarization Metrics

Matthew Ekstrand-Abueg; Richard McCreadie; Virgil Pavlu; Fernando Diaz

Unexpected news events, such as natural disasters or other human tragedies, create a large volume of dynamic text data from official news media as well as less formal social media. Automatic real-time text summarization has become an important tool for quickly transforming this overabundance of text into clear, useful information for end-users including affected individuals, crisis responders, and interested third parties. Despite the importance of real-time summarization systems, their evaluation is not well understood as classic methods for text summarization are inappropriate for real-time and streaming conditions. The TREC 2013-2015 Temporal Summarization (TREC-TS) track was one of the first evaluation campaigns to tackle the challenges of real-time summarization evaluation, introducing new metrics, ground-truth generation methodology and dataset. In this paper, we present a study of TREC-TS track evaluation methodology, with the aim of documenting its design, analyzing its effectiveness, as well as identifying improvements and best practices for the evaluation of temporal summarization systems.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013

Exploring semi-automatic nugget extraction for Japanese one click access evaluation

Matthew Ekstrand-Abueg; Virgil Pavlu; Makoto Kato; Tetsuya Sakai; Takehiro Yamamoto

Building test collections based on nuggets is useful for evaluating systems that return documents, answers, or summaries. However, nugget construction requires substantial manual work and is not feasible for large query sets. Toward an efficient and scalable nugget-based evaluation, we study the applicability of semi-automatic nugget extraction in the context of the ongoing NTCIR One Click Access (1CLICK) task. We compare manually extracted and semi-automatically extracted Japanese nuggets to demonstrate the coverage and efficiency of semi-automatic nugget extraction. Our findings suggest that manual nugget extraction can be replaced with a direct adaptation of the English semi-automatic nugget extraction system, especially for queries for which the user desires broad answers from free-form text.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013

Live nuggets extractor: a semi-automated system for text extraction and test collection creation

Matthew Ekstrand-Abueg; Virgil Pavlu; Javed A. Aslam

The Live Nugget Extractor system provides users with a method of efficiently and accurately collecting relevant information for any web query, rather than providing a simple ranked list of documents. The system uses an online learning procedure to infer the relevance of unjudged documents while extracting and ranking information from judged documents. This produces a set of judged and inferred relevance scores for both documents and text fragments, which can be used for test collections, summarization, and other tasks that require high accuracy and large collections with minimal human effort.


International Conference on the Theory of Information Retrieval | 2013

A Modification of LambdaMART to Handle Noisy Crowdsourced Assessments

Pavel Metrikov; Jie Wu; Jesse Anderton; Virgil Pavlu; Javed A. Aslam

We consider noisy crowdsourced assessments and their impact on learning-to-rank algorithms. Starting with EM-weighted assessments, we modify LambdaMART in order to use smoothed probabilistic preferences over pairs of documents, directly as input to the ranking algorithm.
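
One way to picture the "smoothed probabilistic preferences" mentioned above (an illustrative formula, not necessarily the paper's): given per-document relevance probabilities, e.g. from EM over noisy crowd labels, a preference probability for each document pair can be derived and fed to a pairwise ranker:

```python
# Illustrative conversion of per-document relevance probabilities (e.g.
# EM-weighted estimates from noisy crowd labels) into a smoothed pairwise
# preference probability P(d_i preferred over d_j).

def preference_probability(p_i, p_j, eps=1e-6):
    """Probability that d_i is relevant and d_j is not, normalized over
    the two discordant outcomes; eps guards against division by zero."""
    num = p_i * (1.0 - p_j)
    den = num + p_j * (1.0 - p_i)
    return num / (den + eps)

p = preference_probability(0.9, 0.1)   # confident pair: close to 1.0
q = preference_probability(0.5, 0.5)   # maximally uncertain pair: ~0.5
```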


European Conference on Information Retrieval | 2016

An Empirical Study of Skip-Gram Features and Regularization for Learning on Sentiment Analysis

Cheng Li; Bingyu Wang; Virgil Pavlu; Javed A. Aslam

The problem of deciding the overall sentiment of a user review is usually treated as a text classification problem. The simplest machine learning setup for text classification uses a unigram bag-of-words feature representation of documents, and this has been shown to work well for a number of tasks such as spam detection and topic classification. However, the problem of sentiment analysis is more complex and not as easily captured with unigram (single-word) features. Bigram and trigram features capture certain local context and short distance negations—thus outperforming unigram bag-of-words features for sentiment analysis. But higher order n-gram features are often overly specific and sparse, so they increase model complexity and do not generalize well.
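
A minimal sketch of the feature families discussed above: skip-grams sit between unigrams and contiguous higher-order n-grams, capturing short-distance context (such as negation) without the sparsity of long n-grams. The function below is illustrative, not the paper's implementation:

```python
# Illustrative extraction of unigram, bigram, and 1-skip-bigram features
# from a tokenized sentence. Skip-bigrams pair a token with a token up to
# max_skip positions further ahead, linking e.g. a negation word to the
# sentiment word it modifies.

def skipgram_features(tokens, max_skip=1):
    feats = set(tokens)                                   # unigrams
    for i, tok in enumerate(tokens):
        for skip in range(0, max_skip + 1):
            j = i + 1 + skip
            if j < len(tokens):
                feats.add(f"{tok}_{tokens[j]}")           # (skip-)bigrams
    return feats

feats = skipgram_features("not a good movie".split())
# Contains the contiguous bigram "a_good" and the skip-bigram "not_good",
# which connects the negation to the sentiment word across one token.
```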


Text Retrieval Conference | 2007

Million Query Track 2007 Overview

James Allan; Ben Carterette; Javed A. Aslam; Virgil Pavlu; Blagovest Dachev; Evangelos Kanoulas


Text Retrieval Conference | 2013

TREC 2013 Temporal Summarization.

Javed A. Aslam; Matthew Ekstrand-Abueg; Virgil Pavlu; Fernando Diaz; Tetsuya Sakai


NTCIR | 2013

Overview of the NTCIR-10 1CLICK-2 Task.

Makoto Kato; Matthew Ekstrand-Abueg; Virgil Pavlu; Tetsuya Sakai; Takehiro Yamamoto

Collaboration


Dive into Virgil Pavlu's collaborations.

Top Co-Authors

Bingyu Wang
Northeastern University

Cheng Li
Northeastern University

Jie Wu
Northeastern University