
Publication


Featured research published by Piyush Arora.


International Conference on Computational Linguistics | 2014

DCU: Aspect-based Polarity Classification for SemEval Task 4

Joachim Wagner; Piyush Arora; Santiago Cortes; Utsab Barman; Dasha Bogdanova; Jennifer Foster; Lamia Tounsi

We describe the work carried out by DCU on the Aspect Based Sentiment Analysis task at SemEval 2014. Our team submitted one constrained run for the restaurant domain and one for the laptop domain for sub-task B (aspect term polarity prediction), ranking highest out of 36 systems on the restaurant test set and joint highest out of 32 systems on the laptop test set.
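
Sub-task B asks systems to predict the polarity expressed towards a given aspect term in a sentence. The following is only a minimal illustrative sketch of that task setup, not the DCU system: a bag-of-words logistic regression over (sentence, aspect term) pairs, with invented toy examples standing in for the SemEval training data.

```python
# Minimal sketch (not the DCU system): polarity classification for
# (sentence, aspect term) pairs using bag-of-words logistic regression.
# The toy training examples below are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train = [
    ("The pizza was fantastic but the service was slow.", "pizza", "positive"),
    ("The pizza was fantastic but the service was slow.", "service", "negative"),
    ("Battery life is average at best.", "battery life", "neutral"),
    ("The screen is gorgeous.", "screen", "positive"),
]

def featurize(sentence, aspect):
    # Crude pairing of sentence and aspect term into one text feature;
    # real systems add context windows, sentiment lexicons, parse features, etc.
    return f"{sentence} ASPECT={aspect.replace(' ', '_')}"

X = [featurize(s, a) for s, a, _ in train]
y = [label for _, _, label in train]

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(X, y)

print(model.predict([featurize("The service was slow.", "service")]))
```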


Advances in Social Networks Analysis and Mining | 2015

The Good, the Bad and their Kins: Identifying Questions with Negative Scores in StackOverflow

Piyush Arora; Debasis Ganguly; Gareth J. F. Jones

A rapid increase in the number of questions posted on community question answering (CQA) forums is creating a need for automated methods of question quality moderation to improve the effectiveness of such forums in terms of response time and quality. Such automated approaches should aim to classify questions as good or bad for a particular forum as soon as they are posted, based on the guidelines and quality standards defined by the forum: if a question meets the standards of the forum it is classified as good, otherwise it is classified as bad. In this paper, we propose a method that addresses this classification problem by retrieving similar questions previously asked in the same forum, and then using the text from these similar questions to predict the quality of the current question. We empirically validate our approach on data from StackOverflow, a massive CQA forum for programmers, comprising about 8M questions. With the use of this additional text retrieved from similar questions, we improve question quality prediction accuracy by about 2.8% and the recall of negatively scored questions by about 4.2%. This improvement in recall would help in automatically flagging questions as bad (unsuitable) for the forum and would speed up the moderation process, saving time and human effort.
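
As a rough sketch of the general idea, not the authors' exact pipeline, the snippet below retrieves the most similar previously asked questions with TF-IDF, appends their text to the new question, and trains a good/bad classifier; the historical questions and labels are invented.

```python
# Sketch of the general idea (not the paper's exact pipeline): augment a new
# question with text from similar past questions before classifying it.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Invented historical questions with good(1)/bad(0) labels.
past_q = [
    "How do I reverse a list in Python?",
    "Why is my code not working? Please help!!!",
    "How can I read a CSV file line by line in Java?",
    "urgent!!! need code for my homework",
]
past_y = [1, 0, 1, 0]

vec = TfidfVectorizer().fit(past_q)
P = vec.transform(past_q)

def augment(question, k=2):
    """Append the text of the k most similar past questions to the new one."""
    sims = (vec.transform([question]) @ P.T).toarray().ravel()
    neighbours = np.argsort(sims)[::-1][:k]
    return question + " " + " ".join(past_q[i] for i in neighbours)

# Train on augmented representations of the historical questions themselves.
X_train = vec.transform([augment(q) for q in past_q])
clf = LogisticRegression(max_iter=1000).fit(X_train, past_y)

new_q = "plz send me the code to parse csv asap"
print(clf.predict(vec.transform([augment(new_q)])))
```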


International World Wide Web Conferences | 2015

A Comparative Study of Online Translation Services for Cross Language Information Retrieval

Ali Hosseinzadeh Vahid; Piyush Arora; Qun Liu; Gareth J. F. Jones

Technical advances and increasing availability mean that Machine Translation (MT) is now widely used for the translation of search queries in multilingual search tasks. A number of free-to-use, high-quality online MT systems are now available and, although imperfect in their translation behavior, are found to produce good performance in Cross-Language Information Retrieval (CLIR) applications. Users of these MT systems in CLIR tasks generally assume that they all behave similarly, and the choice of MT system is often made on the basis of convenience. We present a set of experiments which compare the impact of applying two of the best known online systems, Google and Bing translation, for query translation across multiple language pairs and for two very different CLIR tasks. Our experiments show that the MT systems perform differently on average for different tasks and language pairs, but more significantly for different individual queries. We examine the differing translation behavior of these tools and seek to draw conclusions in terms of their suitability for use in different settings.
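
The comparison ultimately comes down to running the same retrieval system over queries translated by each service and contrasting per-query and average effectiveness. The sketch below illustrates only that evaluation step under invented placeholder data (the ranked lists and relevance judgements are not real run output), computing average precision per query for two translation-based runs.

```python
# Sketch of the per-query comparison step only; the ranked lists and
# relevance judgements below are invented placeholders, not real run data.
def average_precision(ranked_docs, relevant):
    hits, score = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank
    return score / max(len(relevant), 1)

# run[query_id] -> ranked list of doc ids, one run per translation service.
run_google = {"q1": ["d3", "d1", "d9"], "q2": ["d2", "d7", "d4"]}
run_bing   = {"q1": ["d1", "d3", "d9"], "q2": ["d5", "d2", "d4"]}
qrels      = {"q1": {"d1", "d9"},       "q2": {"d2"}}

for qid in sorted(qrels):
    ap_g = average_precision(run_google[qid], qrels[qid])
    ap_b = average_precision(run_bing[qid], qrels[qid])
    print(f"{qid}: AP(google)={ap_g:.3f}  AP(bing)={ap_b:.3f}")
```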


North American Chapter of the Association for Computational Linguistics | 2015

DCU: Using Distributional Semantics and Domain Adaptation for the Semantic Textual Similarity SemEval-2015 Task 2

Piyush Arora; Chris Hokamp; Jennifer Foster; Gareth J. F. Jones

We describe the work carried out by the DCU team on the Semantic Textual Similarity task at SemEval-2015. We learn a regression model to predict a semantic similarity score for a sentence pair. Our system exploits distributional semantics in combination with tried-and-tested features from previous tasks in order to compute sentence similarity. Our team submitted 3 runs for each of the five English test sets. For two of the test sets, belief and headlines, our best system ranked second and fourth out of the 73 submitted systems. Our best submission, averaged over all test sets, ranked 26th out of the 73 systems.
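
A minimal sketch of the idea, assuming pre-trained word vectors are available: one distributional feature (cosine similarity of averaged word vectors) and one surface-overlap feature feed a ridge regressor. The tiny embedding table and training pairs are invented, and the feature set is far smaller than the paper's.

```python
# Minimal sketch of the idea (not the DCU feature set): combine a
# distributional-semantics feature with a surface-overlap feature in a
# regression model. The toy embeddings and training pairs are invented.
import numpy as np
from sklearn.linear_model import Ridge

emb = {  # stand-in for pre-trained word vectors
    "a": np.array([0.1, 0.2]), "man": np.array([0.9, 0.1]),
    "person": np.array([0.8, 0.2]), "walks": np.array([0.2, 0.9]),
    "runs": np.array([0.3, 0.8]), "dog": np.array([0.7, 0.6]),
}

def sent_vec(sentence):
    vecs = [emb[w] for w in sentence.lower().split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

def features(s1, s2):
    v1, v2 = sent_vec(s1), sent_vec(s2)
    cos = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9))
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    overlap = len(w1 & w2) / max(len(w1 | w2), 1)
    return [cos, overlap]

pairs = [("a man walks", "a person walks", 4.5),
         ("a man runs", "a dog runs", 2.0),
         ("a man walks", "a dog runs", 1.0)]

X = np.array([features(a, b) for a, b, _ in pairs])
y = np.array([score for _, _, score in pairs])
model = Ridge(alpha=1.0).fit(X, y)
print(model.predict([features("a person runs", "a man runs")]))
```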


Workshop on Statistical Machine Translation | 2014

DCU-Lingo24 Participation in WMT 2014 Hindi-English Translation task

Xiaofeng Wu; Rejwanul Haque; Tsuyoshi Okita; Piyush Arora; Andy Way; Qun Liu

This paper describes the DCU-Lingo24 submission to WMT 2014 for the Hindi-English translation task. We exploit a variety of methods in our system, including: Context-Informed PB-SMT, OOV Word Conversion (OWC), Multi-Alignment Combination (MAC), the Operation Sequence Model (OSM), Stemming Align and Normal Phrase Extraction (SANPE), and Language Model Interpolation (LMI). We also describe various preprocessing steps we tried for Hindi in this task.
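
Of the components listed, Language Model Interpolation (LMI) is the most self-contained: language models estimated on different corpora are mixed with a weight tuned on held-out data. The unigram sketch below, with invented toy counts, illustrates only that general idea and is unrelated to the WMT models used in the submission.

```python
# Minimal unigram sketch of language model interpolation (toy counts only):
#   p(w) = lam * p_in_domain(w) + (1 - lam) * p_general(w),
# with lam chosen to minimise perplexity on a held-out sample.
import math
from collections import Counter

def unigram_lm(tokens, vocab, alpha=0.1):
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}

in_domain = "the parliament adopted the resolution".split()
general   = "the cat sat on the mat".split()
heldout   = "the parliament sat".split()

vocab = set(in_domain) | set(general) | set(heldout)
p_in, p_gen = unigram_lm(in_domain, vocab), unigram_lm(general, vocab)

def perplexity(lam):
    logp = sum(math.log(lam * p_in[w] + (1 - lam) * p_gen[w]) for w in heldout)
    return math.exp(-logp / len(heldout))

best = min((perplexity(i / 10), i / 10) for i in range(1, 10))
print(f"best lambda={best[1]}, perplexity={best[0]:.2f}")
```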


Cross Language Evaluation Forum | 2017

Query Expansion for Sentence Retrieval Using Pseudo Relevance Feedback and Word Embedding

Piyush Arora; Jennifer Foster; Gareth J. F. Jones

This study investigates the use of query expansion (QE) methods in sentence retrieval for non-factoid queries to address the query-document term mismatch problem. Two alternative QE approaches are explored: i) pseudo relevance feedback (PRF) using Robertson term selection, and ii) word embeddings (WE) of query words. Experiments are carried out on the WebAP data set developed using the TREC GOV2 collection. Experimental results using P@10, NDCG@10 and MRR show that QE using PRF achieves a statistically significant improvement over baseline retrieval models, while WE also improves over the baseline but not statistically significantly. A method combining PRF and WE expansion performs consistently better than using only the PRF method.
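
Robertson term selection ranks candidate expansion terms from the pseudo-relevant (top-ranked) documents by an offer weight, the product of the number of feedback documents containing the term and its Robertson/Sparck Jones relevance weight. A hedged sketch over an invented toy collection is given below; the embedding-based expansion is not shown.

```python
# Hedged sketch of PRF term selection using the Robertson offer weight:
#   RW(t) = log(((r+0.5)(N-n-R+r+0.5)) / ((n-r+0.5)(R-r+0.5)))
#   OW(t) = r * RW(t)
# where r = feedback docs containing t, R = number of feedback docs,
# n = collection document frequency of t, N = collection size.
# The toy collection below is invented; it is not the WebAP data.
import math
from collections import Counter

collection = [
    "solar panels convert sunlight into electricity",
    "wind turbines generate renewable electricity",
    "coal plants burn fossil fuel",
    "solar energy is a renewable source of power",
    "the match was decided on penalties",
]
feedback_ids = [0, 3]          # top-ranked docs assumed pseudo-relevant
N, R = len(collection), len(feedback_ids)

doc_terms = [set(d.split()) for d in collection]
df = Counter(t for terms in doc_terms for t in terms)           # n per term
r = Counter(t for i in feedback_ids for t in doc_terms[i])      # r per term

def offer_weight(t):
    n = df[t]
    rw = math.log(((r[t] + 0.5) * (N - n - R + r[t] + 0.5)) /
                  ((n - r[t] + 0.5) * (R - r[t] + 0.5)))
    return r[t] * rw

candidates = sorted(r, key=offer_weight, reverse=True)
print(candidates[:5])          # top expansion terms to add to the query
```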


Conference on Human Information Interaction and Retrieval | 2017

Identifying Useful and Important Information within Retrieved Documents

Piyush Arora; Gareth J. F. Jones

We describe an initial study into the identification of important and useful information units within documents retrieved by an information retrieval system in response to a user query expressing an underlying information need. This study is part of a larger investigation into exploiting useful and important units from retrieved documents to generate rich document surrogates that improve the user search experience. We report three user studies using a crowdsourcing platform, where participants were first asked to read an information need and the contents of a relevant document and then to perform actions depending on the type of study: i) write important information units (WIIU), ii) highlight important information units (HIIU) and iii) assess the importance of already highlighted information units (AIHIU). Further, we discuss a novel mechanism for measuring similarities between content annotations. We find majority agreement of about 0.489 and pairwise agreement of 0.340 among user annotations in the AIHIU study, and average cosine similarity of 0.50 and 0.57 between participant annotations and documents in the WIIU and HIIU studies respectively.
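
The abstract reports cosine similarities between annotations; as a generic illustration of how such a comparison can be computed (not necessarily the paper's exact mechanism), the snippet below represents each free-text annotation as a term-frequency vector over a shared vocabulary and computes pairwise cosine similarity. The example annotations are invented.

```python
# Hedged sketch: cosine similarity between free-text annotations, each
# represented as a term-frequency vector over a shared vocabulary. This is
# a generic illustration, not necessarily the paper's exact mechanism.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

annotations = [
    "the document explains the side effects of the new drug",
    "side effects of the drug are described in detail",
    "the article is about football transfers",
]

X = CountVectorizer().fit_transform(annotations).toarray().astype(float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

for i in range(len(annotations)):
    for j in range(i + 1, len(annotations)):
        print(f"annotation {i} vs {j}: {cosine(X[i], X[j]):.3f}")
```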


North American Chapter of the Association for Computational Linguistics | 2016

DCU-SEManiacs at SemEval-2016 Task 1: Synthetic Paragram Embeddings for Semantic Textual Similarity.

Chris Hokamp; Piyush Arora

We experiment with learning word representations designed to be combined into sentence-level semantic representations, using an objective function which does not directly make use of the supervised scores provided with the training data, instead opting for a simpler objective which encourages similar phrases to be close together in the embedding space. This simple objective lets us start with high-quality embeddings trained using the Paraphrase Database (PPDB) (Wieting et al., 2015; Ganitkevitch et al., 2013), and then tune these embeddings using the official STS task training data, as well as synthetic paraphrases for each test dataset, obtained by pivoting through machine translation. Our submissions include runs which only compare the similarity of phrases in the embedding space, directly using the similarity score to produce predictions, as well as a run which uses vector similarity in addition to a suite of features we investigated for our 2015 SemEval submission. For the cross-lingual task, we simply translate the Spanish sentences to English and use the same system we designed for the monolingual task.
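
The "simpler objective" of pulling paraphrases together in the embedding space is commonly realised as a margin (hinge) loss over similarities to a paraphrase versus a sampled negative. The numpy sketch below only evaluates such a loss for one triple, with random stand-in vectors instead of Paragram embeddings; it is not the submitted system.

```python
# Hedged sketch (not the submitted system): a margin/hinge objective that
# encourages a phrase to be closer to its paraphrase than to a random
# negative in the embedding space; sentence vectors are word-vector averages.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["cheap", "inexpensive", "flight", "ticket", "weather", "storm"]
emb = {w: rng.normal(size=4) for w in vocab}   # stand-in for Paragram vectors

def sent_vec(s):
    return np.mean([emb[w] for w in s.split()], axis=0)

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def hinge_loss(phrase, paraphrase, negative, margin=0.4):
    p, q, n = sent_vec(phrase), sent_vec(paraphrase), sent_vec(negative)
    return max(0.0, margin - cos(p, q) + cos(p, n))

print(hinge_loss("cheap flight ticket", "inexpensive flight", "storm weather"))
# At test time a similarity prediction is just cos(sent_vec(s1), sent_vec(s2)).
```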


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2015

Promoting User Engagement and Learning in Amorphous Search Tasks

Piyush Arora

Much research in information retrieval (IR) focuses on optimization of the rank of relevant retrieval results for single-shot ad hoc IR tasks. Relatively little research has been carried out on user engagement to support more complex search tasks. We seek to improve user engagement for IR tasks by providing a richer representation of retrieved information. It is our expectation that this strategy will promote implicit learning within search activities. Specifically, we plan to explore methods of finding semantic concepts within retrieved documents, with the objective of creating improved document surrogates. Further, we would like to study search effectiveness in terms of different facets such as the user's search experience, satisfaction, engagement and learning. We intend to investigate this in an experimental study, where our richer document representations are compared with traditional document surrogates for the same user queries.


Forum for Information Retrieval Evaluation | 2013

Applying Query Formulation and Fusion Techniques For Cross Language News Story Search

Piyush Arora; Jennifer Foster; Gareth J. F. Jones

Cross Language News Story Search (CLNSS) is concerned with finding documents describing the same events in different languages. As well as supporting information retrieval (IR), CLNSS has other applications in mining parallel and comparable data across different languages. In this paper, we present an overview of the work carried out for our participation in the Cross Language !ndian News Story Search (CL!NSS) task at FIRE 2013. In the CL!NSS task we explored the problem of cross language news search for the English-Hindi language pair. English news stories are used as queries to seek similar news documents from Hindi news articles. Hindi, being a resource-scarce language, offers many challenges for retrieving relevant news articles. We investigate and contrast translation of input queries from English to Hindi using the Google and Bing translation services. To support translation of out-of-vocabulary words we use the Google transliteration service. A key challenge of the CL!NSS task is the formation of search queries from the English news articles, since these are much longer than the queries typically used in IR applications. To address this problem, we explore the use of summarization to extract a query from the input news documents, and use these summarized queries as the input to the cross language IR system. We explore the use of query expansion using pseudo relevance feedback (PRF) in the IR process, since this has been shown to be effective for cross language IR in many previous investigations. We also explore in detail the use of data fusion techniques over different sets of retrieved results obtained using diverse query formulation techniques. For the CL!NSS task our team submitted 3 main runs. Our best run was ranked first among official submissions based on NDCG@5 and NDCG@10 values and second based on NDCG@1 values. For the 25 test queries the results of our best main run were NDCG@1 0.7400, NDCG@5 0.6809 and NDCG@10 0.7268. We present our methodology, official results and the results of a number of post-task experiments conducted to further examine the cross language search problem. Our experiments reveal that query formulation plays a vital role in improving search results for news documents across different languages: the summarized queries perform better than queries formed from the complete news documents. Data fusion techniques also help to improve the performance of the system by boosting document ranks, thus improving the NDCG scores.
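
Of the techniques used, the data fusion step can be illustrated compactly: CombSUM-style fusion sums the normalised scores a document receives across the result lists produced by different query formulations, boosting documents retrieved by several of them. The sketch below uses invented result lists.

```python
# Hedged sketch of CombSUM-style data fusion over result lists produced by
# different query formulations; the ranked lists below are invented.
from collections import defaultdict

def min_max_normalise(run):
    lo, hi = min(run.values()), max(run.values())
    return {d: (s - lo) / (hi - lo) if hi > lo else 1.0 for d, s in run.items()}

def combsum(runs):
    fused = defaultdict(float)
    for run in runs:
        for doc, score in min_max_normalise(run).items():
            fused[doc] += score
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)

run_full_doc_query = {"h12": 9.1, "h04": 7.3, "h33": 6.8}
run_summary_query  = {"h04": 5.2, "h12": 4.9, "h77": 3.1}
run_prf_expanded   = {"h04": 8.8, "h77": 6.0, "h12": 5.5}

for doc, score in combsum([run_full_doc_query, run_summary_query, run_prf_expanded]):
    print(doc, round(score, 3))
```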

Collaboration


Dive into Piyush Arora's collaborations.

Top Co-Authors


Qun Liu

Dublin City University


Andy Way

Dublin City University
