Denzil Correa
Indraprastha Institute of Information Technology
Publications
Featured research published by Denzil Correa.
international world wide web conferences | 2014
Denzil Correa; Ashish Sureka
Stack Overflow is the most popular Community-based Question Answering (CQA) website for programmers, with 2.05M users, 5.1M questions and 9.4M answers. Stack Overflow has explicit, detailed guidelines on how to post questions and an ebullient moderation community. Despite these precise communications and safeguards, questions posted on Stack Overflow can be extremely off-topic or of very poor quality. Such questions can be deleted at the discretion of experienced community members and moderators. We present the first study of deleted questions on Stack Overflow. We divide our study into two parts: (i) characterization of deleted questions over ~5 years (2008-2013) of data, and (ii) prediction of deletion at the time of question creation. Our characterization study reveals multiple insights into the question deletion phenomenon. We find that it takes substantial time for a question to be voted for deletion, but once it is voted, the community takes swift action. We also see that question authors delete their own questions to salvage reputation points. We notice some instances of accidental deletion of good-quality questions, but such questions are quickly voted to be undeleted. We discover a pyramidal structure of question quality on Stack Overflow and find that deleted questions lie at the bottom (lowest quality) of the pyramid. We also build a predictive model that detects, at creation time, whether a question will be deleted. We experiment with 47 features (based on User Profile, Community Generated, Question Content and Syntactic Style) and report an accuracy of 66%. Our findings offer important suggestions for content quality maintenance on community-based question answering websites. To the best of our knowledge, this is the first large-scale study of poor-quality (deleted) questions on Stack Overflow.
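The prediction half of the study relies on features that can be computed when a question is posted. As a minimal sketch, assuming question bodies are HTML with code wrapped in `<pre><code>` (as in Stack Overflow data dumps), the Python below computes a few hypothetical Question Content and Syntactic Style features; the feature names and choices are illustrative assumptions, not the paper's 47 features.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns; assumes HTML bodies with code in <pre><code> blocks.
CODE_BLOCK_RE = re.compile(r"<pre><code>.*?</code></pre>", re.DOTALL)
URL_RE = re.compile(r"https?://\S+")


@dataclass
class QuestionFeatures:
    """Illustrative creation-time features (not the paper's feature set)."""
    title_words: int
    body_words: int
    code_blocks: int
    urls: int
    uppercase_ratio: float
    title_is_question: bool


def extract_features(title: str, body: str) -> QuestionFeatures:
    """Compute simple content and syntactic-style features for one question."""
    letters = [c for c in title + body if c.isalpha()]
    uppercase = sum(1 for c in letters if c.isupper())
    return QuestionFeatures(
        title_words=len(title.split()),
        body_words=len(body.split()),
        code_blocks=len(CODE_BLOCK_RE.findall(body)),
        urls=len(URL_RE.findall(body)),
        # Share of letters that are uppercase: a rough "shouting" signal.
        uppercase_ratio=uppercase / len(letters) if letters else 0.0,
        title_is_question=title.rstrip().endswith("?"),
    )
```

Vectors like these would then be passed to a classifier; which classifier the paper used is not stated in this abstract.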
Proceedings of the 3rd international workshop on Search and mining user-generated contents | 2011
Denzil Correa; Ashish Sureka
Automatic tag recommendation or annotation can help improve the efficiency of text-based information retrieval on online social media services such as Blogger, Last.FM, Flickr and YouTube. In this work, we investigate alternative solutions for tag recommendation by employing a Wisdom of Crowds approach in a mashup framework. In particular, we mine tweets on Twitter and use their hashtag(s) and content to annotate media on Flickr, Photobucket, YouTube, Dailymotion and SoundCloud. We crawl Twitter to collect a random sample of tweets containing Flickr, Photobucket, YouTube, Dailymotion and SoundCloud URLs. We then recommend tags for these services using the hashtag(s) and content present in the tweets. We use a hybrid (automated and manual) technique to validate our results on different subsets of the data (presence/absence of hashtags, presence/absence of media tags). Experimental results demonstrate that the proposed approach is effective and reliable.
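The mashup idea can be illustrated with a small sketch: keep only tweets that link to one of the media services, then count their hashtags and content words as candidate tags for each linked URL. The Python below is a minimal illustration under assumptions of our own (the tiny stopword list, the double weight given to hashtags, and plain substring matching on domain names, which misses short links such as `youtu.be`); it is not the paper's algorithm.

```python
import re
from collections import Counter, defaultdict

MEDIA_DOMAINS = ("flickr.com", "photobucket.com", "youtube.com",
                 "dailymotion.com", "soundcloud.com")
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
WORD_RE = re.compile(r"[a-z]{3,}")
# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "and", "this", "that", "with", "for", "check", "out"}
HASHTAG_WEIGHT = 2  # assumption: hashtags count double compared with plain words


def recommend_tags(tweets, top_k=5):
    """Return {media_url: [candidate tags]} mined from tweets linking to it."""
    scores = defaultdict(Counter)
    for text in tweets:
        urls = [u for u in URL_RE.findall(text)
                if any(domain in u for domain in MEDIA_DOMAINS)]
        if not urls:
            continue  # ignore tweets without a supported media link
        hashtags = [h.lower() for h in HASHTAG_RE.findall(text)]
        # Strip URLs and hashtags, then keep the remaining content words.
        content = HASHTAG_RE.sub(" ", URL_RE.sub(" ", text)).lower()
        words = [w for w in WORD_RE.findall(content) if w not in STOPWORDS]
        for url in urls:
            for tag in hashtags:
                scores[url][tag] += HASHTAG_WEIGHT
            for word in words:
                scores[url][word] += 1
    return {url: [tag for tag, _ in counter.most_common(top_k)]
            for url, counter in scores.items()}
```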
ACM Sigcas Computers and Society | 2016
Swati Agarwal; Nitish Mittal; Rohan Katyal; Ashish Sureka; Denzil Correa
The low participation of women authors in research is an important equity issue in Computer Science Research (CSR). Various parameters and methodologies can be used to measure gender imbalance. In this paper, we present a study of the gender gap, gender imbalance and women's participation in CSR. We conduct our experiments on the DBLP bibliographic database and analyze several years of publication data across various domains of CSR. We perform Exploratory Data Analysis on the bibliographic dataset and study the trend of gender imbalance over several years. We propose eight research questions across various facets, and our results show a significant gender imbalance in different sub-fields within CSR and a low rate of women's participation across various regions of the world.
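One simple way to track women's participation over time is the share of authorships attributed to women in each year. DBLP lists author names rather than genders, so the Python sketch below assumes each authorship has already been labelled `F`, `M` or `U` (unknown) by some name-based inference step; the label scheme and the decision to drop unknowns are assumptions, not the paper's methodology.

```python
from collections import defaultdict


def women_share_by_year(authorships):
    """Map each year to the fraction of gender-labelled authorships that are women.

    authorships: iterable of (year, gender) pairs, gender in {"F", "M", "U"}.
    Authorships with unknown gender ("U") are excluded (an assumption).
    """
    labelled = defaultdict(int)
    women = defaultdict(int)
    for year, gender in authorships:
        if gender == "U":
            continue
        labelled[year] += 1
        if gender == "F":
            women[year] += 1
    return {year: women[year] / labelled[year] for year in sorted(labelled)}
```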
conference on privacy, security and trust | 2012
Denzil Correa; Ashish Sureka; Raghav Sethi
Twitter is a popular micro-blogging website that allows users to post messages of at most 140 characters, called tweets. We demonstrate a cheap and elegant solution, WhACKY!, that harnesses the multi-source information in tweets to link Twitter profiles to accounts on external services. In particular, we exploit activity feed sharing patterns to map Twitter profiles to their corresponding external service accounts using publicly available APIs. We illustrate a proof of concept by mapping 69,496 Twitter profiles to at least one of five popular external services: Flickr (photo sharing), Foursquare (location-based service), YouTube (video sharing), Facebook (a popular social network) and LastFM (music sharing). We evaluate our solution against a commercial social identity mapping service, FlipTop, and demonstrate the efficiency of our approach. WhACKY! guarantees that the mapped profiles are 100% true positives and helps quantify the unintended leakage of Personally Identifiable Information (PII) attributes. In the process, WhACKY! is also able to detect duplicate Twitter profiles connected to multiple external services. We also develop a web application based on WhACKY! that Twitter users can peruse to better understand the unintended leakage of their PII.
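The idea of linking accounts through shared activity can be sketched as follows: when a tweet embeds a link whose URL contains an external username, that username is attributed to the tweeting profile, and an external account claimed by several Twitter profiles is flagged as a possible duplicate. The URL patterns below are hypothetical simplifications; the abstract says WhACKY! resolves accounts through the services' public APIs, and regex matching on its own does not give the 100% true-positive guarantee described there.

```python
import re
from collections import defaultdict

# Hypothetical URL patterns that embed an external username.
PROFILE_PATTERNS = {
    "flickr": re.compile(r"https?://(?:www\.)?flickr\.com/photos/([\w@-]+)"),
    "youtube": re.compile(r"https?://(?:www\.)?youtube\.com/user/([\w-]+)"),
    "lastfm": re.compile(r"https?://(?:www\.)?last\.fm/user/([\w-]+)"),
}


def map_profiles(tweets):
    """Link Twitter handles to external usernames found in their tweets.

    tweets: iterable of (twitter_handle, tweet_text) pairs.
    Returns (mapping, duplicates): mapping is {handle: {service: username}};
    duplicates is {(service, username): set_of_handles} for external accounts
    that more than one Twitter profile points to.
    """
    mapping = defaultdict(dict)
    owners = defaultdict(set)
    for handle, text in tweets:
        for service, pattern in PROFILE_PATTERNS.items():
            for username in pattern.findall(text):
                mapping[handle][service] = username
                owners[(service, username)].add(handle)
    duplicates = {acct: handles for acct, handles in owners.items()
                  if len(handles) > 1}
    return dict(mapping), duplicates
```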
australian software engineering conference | 2013
Denzil Correa; Ashish Sureka
Issue tracking systems such as Bugzilla are tools that facilitate collaboration between software maintenance professionals. Popular issue tracking systems include discussion forums to facilitate bug reporting and comment posting. We observe that several comments posted in issue tracking systems contain links to external websites such as YouTube (video sharing website), Twitter (micro-blogging website), Stack Overflow (a community-based question answering website for programmers), Wikipedia and focused discussion forums. Stack Overflow is widely used by software engineers, as it contains answers to millions of questions (an extensive knowledge resource) posted by programmers on diverse topics. We conduct a series of experiments on open-source Google Chromium and Android issue tracker data (a publicly available real-world dataset) to understand the role and impact of Stack Overflow in issue resolution. Our experimental results show evidence of several references to Stack Overflow in threaded discussions and demonstrate a correlation between a lower mean time to repair (in one dataset) and the presence of Stack Overflow links. We also observe that the average number of comments posted in response to bug reports is lower when Stack Overflow links are present than for bug reports without Stack Overflow references. We conduct experiments based on textual similarity analysis (content-based linguistic features) and contextual data analysis (exploiting metadata such as the tags associated with a Stack Overflow question) to recommend Stack Overflow questions for an incoming bug report. We perform an empirical analysis to measure the effectiveness of the proposed method on a dataset containing ground truth and present our insights. Finally, we present the results of a survey of Google Chromium developers that we conducted to understand practitioners' perspectives and experience.
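The recommendation step combines textual similarity with metadata. A minimal Python sketch, assuming a linear blend of TF-IDF cosine similarity (bug report text vs. question text) with Jaccard overlap (bug keywords vs. question tags); the weight `alpha`, the tokenizer and the `bug_keywords` input are hypothetical choices, not the paper's configuration.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9_]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, with smoothed IDF."""
    term_counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for counts in term_counts for term in counts)
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: f * idf[t] for t, f in counts.items()} for counts in term_counts]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(bug_text, bug_keywords, questions, alpha=0.7, top_k=3):
    """Rank (question_id, text, tags) tuples for an incoming bug report.

    score = alpha * text cosine + (1 - alpha) * tag Jaccard (hypothetical blend).
    """
    vectors = tfidf_vectors([bug_text] + [text for _, text, _ in questions])
    bug_vec = vectors[0]
    scored = []
    for (qid, _, tags), vec in zip(questions, vectors[1:]):
        score = (alpha * cosine(bug_vec, vec)
                 + (1 - alpha) * jaccard(set(bug_keywords), set(tags)))
        scored.append((score, qid))
    scored.sort(reverse=True)
    return [qid for _, qid in scored[:top_k]]
```

A linear blend keeps the two signals interpretable; tuning `alpha` on a ground-truth set is one plausible way to balance them.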
granular computing | 2009
Ashish Sureka; Vikram Goyal; Denzil Correa; Anirban Mondal
The semantic orientation of a word indicates whether the word denotes a positive or a negative evaluation. We present an approach to computing the semantic orientation of words using machine-interpretable commonsense knowledge. We employ ConceptNet (a large semantic network of commonsense knowledge) to determine the polarity, or semantic orientation, of a sentiment-expressing word. We apply heuristics to certain predefined predicates expressing semantic relationships between two concepts in order to classify words as having positive or negative polarity and to find words that have similar polarity. The advantage of the proposed approach is that it requires neither a pre-annotated training dataset nor a manually created seed list. The proposed solution relies on a lexical resource created by volunteers on the Internet rather than by trained or specialized knowledge engineers. We test our approach on a publicly available pre-classified sentiment lexicon, present the results of our experiments, and examine the trade-offs and limitations of the proposed solution. We conclude that it is possible to determine the polarity of words with high accuracy by exploiting machine-understandable layman's knowledge: the basic facts that ordinary people know about the world.
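The predicate heuristics can be illustrated on a toy, in-memory set of ConceptNet-style assertions. The Python sketch below uses two rules of our own choosing (the object of `person Desires X` takes the sign of that assertion, and a concept that `Causes` a polar concept inherits its polarity); the triples, the negation flag and both rules are hypothetical, and the code does not query the real ConceptNet. As in the abstract, no seed word list is used: polarity comes from the predicates alone.

```python
from collections import defaultdict

# Hypothetical ConceptNet-style assertions: (subject, relation, object, negated).
ASSERTIONS = [
    ("person", "Desires", "happiness", False),
    ("person", "Desires", "pain", True),  # i.e. "a person does not desire pain"
    ("war", "Causes", "pain", False),
    ("friendship", "Causes", "happiness", False),
]


def word_polarity(assertions):
    """Classify concepts as positive/negative with two assumed heuristics."""
    votes = defaultdict(int)
    # Heuristic 1 (assumed): what people desire is positive; what they
    # explicitly do not desire is negative.
    for subj, rel, obj, negated in assertions:
        if subj == "person" and rel == "Desires":
            votes[obj] += -1 if negated else 1
    direct = {w: (1 if v > 0 else -1) for w, v in votes.items() if v != 0}
    # Heuristic 2 (assumed): a concept that causes a polar concept shares its polarity.
    for subj, rel, obj, negated in assertions:
        if rel == "Causes" and not negated and obj in direct:
            votes[subj] += direct[obj]
    return {w: "positive" if v > 0 else "negative"
            for w, v in votes.items() if v != 0}
```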
asia-pacific software engineering conference | 2013
Denzil Correa; Sangeeta Lal; Apoorv Saini; Ashish Sureka
Several widely used issue tracking systems (such as Google Issue Tracker and Bugzilla) contain an integrated threaded discussion forum to facilitate discussion within the development and maintenance team (bug reporters, bug triagers, bug fixers and quality assurance managers). We observe that several comments (and even bug report descriptions) posted to issue tracking systems contain links to external websites as references to knowledge sources relevant to the discussion. We conduct a survey of Google Chromium developers (and present its results) on the importance and usefulness of web references in issue tracking system comments and on the need for a web-browser extension that facilitates easy organization and inclusion of web links in posts. We conduct a characterization study on an experimental dataset from the Google Chromium issue tracking system and present results on the distribution of the number of links in the dataset, the categorization of links into predefined classes (such as blogs, community-based Q&A websites, developer discussion forums and version control systems), the correlation of the number and types of links with various bug report types (such as security, crash, regression and clean-up), and the relation between the presence of links and bug resolution time. The survey results and the characterization study motivate the need for a developer productivity tool that facilitates the organization and inclusion of web links (as references) in issue tracking system comments. We present a Google Chromium web browser extension called Samekana and publish it on the Google Chromium Web Store, from which it can be freely downloaded by users worldwide. The extension provides features such as annotating (using tags, title and description) and saving web references pertaining to multiple bug reports and tasks, and then posting them as a bibliography (for easy citation and reference) in issue tracking system comments.
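The link categorization step of the characterization study can be sketched as a host-name lookup. The rule table below is a hypothetical example (the paper's class definitions and domain lists are not given in the abstract), and the URL regex naively keeps any trailing punctuation attached to a link.

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")
# Hypothetical host-suffix rules mapping links to predefined classes.
CATEGORY_RULES = {
    "stackoverflow.com": "community-based Q&A",
    "superuser.com": "community-based Q&A",
    "groups.google.com": "developer discussion forum",
    "blogspot.com": "blog",
    "wordpress.com": "blog",
    "github.com": "version control system",
    "src.chromium.org": "version control system",
}


def categorize(url):
    """Return the class of a URL by matching its host against CATEGORY_RULES."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, category in CATEGORY_RULES.items():
        if host == suffix or host.endswith("." + suffix):
            return category
    return "other"


def link_distribution(comments):
    """Count links per class across a list of issue-tracker comments."""
    return Counter(categorize(url) for text in comments
                   for url in URL_RE.findall(text))
```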
conference on online social networks | 2013
Denzil Correa; Ashish Sureka
Archive | 2010
Ashish Sureka; Denzil Correa
Proceedings of the 5th Ph.D. workshop on Information and knowledge | 2012
Denzil Correa; Ashish Sureka