Yasen Kiprov | Researchain

Publication

Featured research published by Yasen Kiprov.

North American Chapter of the Association for Computational Linguistics | 2016

PMI-cool at SemEval-2016 Task 3: Experiments with PMI and Goodness Polarity Lexicons for Community Question Answering.

Daniel Balchev; Yasen Kiprov; Ivan Koychev; Preslav Nakov

We describe our submission to SemEval-2016 Task 3 on Community Question Answering. We participated in subtask A, which asks participants to rerank the comments in the thread for a given forum question from good to bad. Our approach focuses on the generation and use of goodness polarity lexicons, similar to the sentiment polarity lexicons that are very popular in sentiment analysis. In particular, we use a combination of bootstrapping and pointwise mutual information to estimate the strength of association between a word (from a large unannotated set of question-answer threads) and the class of good/bad comments. We then use various features based on these lexicons to train a regression model, whose predictions we use to induce the final comment ranking. While our own system was not very strong, as it lacked important features, our lexicons contributed to the strong performance of another top-performing system.
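The lexicon-building step can be illustrated with a minimal sketch. It is not the authors' code: it assumes good/bad labels on comments are already available (for example, from a bootstrapping step that is not shown), scores each word as PMI(w, good) - PMI(w, bad), which simplifies to log2 P(w|good) / P(w|bad), and applies add-alpha smoothing. The names `goodness_lexicon` and `lexicon_features`, and the particular feature set, are hypothetical.

```python
import math
from collections import Counter


def goodness_lexicon(comments, alpha=0.5):
    """Score words by PMI(w, good) - PMI(w, bad) = log2(P(w|good) / P(w|bad)).

    `comments` is a list of (tokens, label) pairs, label in {"good", "bad"}.
    Counts are the number of comments of each class that contain the word;
    `alpha` is add-alpha smoothing so a zero count never produces log(0).
    """
    contains = {"good": Counter(), "bad": Counter()}
    n_class = Counter()
    for tokens, label in comments:
        n_class[label] += 1
        contains[label].update(set(tokens))
    lexicon = {}
    for word in set(contains["good"]) | set(contains["bad"]):
        p_good = (contains["good"][word] + alpha) / (n_class["good"] + 2 * alpha)
        p_bad = (contains["bad"][word] + alpha) / (n_class["bad"] + 2 * alpha)
        lexicon[word] = math.log2(p_good / p_bad)
    return lexicon


def lexicon_features(tokens, lexicon):
    """Hypothetical per-comment features: sum, max, min, and count of positive scores."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        return [0.0, 0.0, 0.0, 0]
    return [sum(scores), max(scores), min(scores), sum(1 for s in scores if s > 0)]


# Toy labeled comments, for illustration only.
comments = [
    (["apply", "online", "visa"], "good"),
    (["apply", "office"], "good"),
    (["lol", "same"], "bad"),
    (["lol", "haha"], "bad"),
]
lex = goodness_lexicon(comments)
print(round(lex["apply"], 3), round(lex["lol"], 3))  # 2.322 -2.322
```

Feature vectors like these could then be fed to any regression model to produce per-comment scores for ranking.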

Cross Language Evaluation Forum | 2017

The Case for Being Average: A Mediocrity Approach to Style Masking and Author Obfuscation

Georgi Karadzhov; Tsvetomila Mihaylova; Yasen Kiprov; Georgi Georgiev; Ivan Koychev; Preslav Nakov

Users posting online expect to remain anonymous unless they have logged in, which is often necessary for them to discuss various topics freely. Preserving the anonymity of a text's author can also be important in other contexts, e.g., in witness protection or anonymity programs. However, each person has his/her own style of writing, which can be analyzed using stylometry; as a result, the true identity of the author of a piece of text can be revealed even if s/he has tried to hide it. Thus, it could be helpful to design automatic tools that help a person obfuscate his/her identity when writing text. In particular, here we propose an approach that changes the text so that it is pushed towards average values for some general stylometric characteristics, thus making these characteristics less discriminative. The approach consists of three main steps: first, we calculate the values of some popular stylometric metrics that can indicate authorship; then we apply various transformations to the text so that these metrics are adjusted towards the average level, while preserving the semantics and soundness of the text; finally, we add random noise. This approach turned out to be very effective and yielded the best performance on the Author Obfuscation task at the PAN-2016 competition.
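The three steps above can be sketched as a toy pipeline. This is a hypothetical illustration, not the system submitted to PAN-2016: it uses a single metric (mean words per sentence), a single transformation (splitting sentences at semicolons when the text is above an assumed target average), and random contraction toggling as the noise step. The target value, the contraction table, and all function names are assumptions.

```python
import random
import re

# A sentence ends with ., ! or ? followed by whitespace.
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# Hypothetical contraction table used only by the noise step.
_CONTRACTIONS = {"do not": "don't", "it is": "it's", "cannot": "can't"}


def split_sentences(text):
    return [s for s in _SENT_BOUNDARY.split(text.strip()) if s]


def avg_sentence_length(text):
    """Step 1: one stylometric metric, mean number of words per sentence."""
    sents = split_sentences(text)
    return sum(len(s.split()) for s in sents) / max(len(sents), 1)


def split_at_semicolons(text):
    """A meaning-preserving transformation that shortens sentences: 'A; b.' -> 'A. B.'"""
    out = []
    for sent in split_sentences(text):
        parts = [p.strip() for p in sent.split(";") if p.strip()]
        for i, part in enumerate(parts):
            part = part[0].upper() + part[1:]
            out.append(part + "." if i < len(parts) - 1 else part)
    return " ".join(out)


def add_noise(text, p=0.3, seed=None):
    """Step 3: randomly contract or expand a few phrases (whole-word matches only)."""
    rng = random.Random(seed)
    for full, short in _CONTRACTIONS.items():
        if rng.random() < p:
            text = re.sub(rf"\b{re.escape(full)}\b", short, text)
        elif rng.random() < p:
            text = re.sub(rf"\b{re.escape(short)}\b", full, text)
    return text


def obfuscate(text, target_avg_len, seed=None):
    """Step 2: push the metric toward an assumed corpus average, then add noise."""
    if avg_sentence_length(text) > target_avg_len:
        text = split_at_semicolons(text)
    return add_noise(text, seed=seed)


text = "I went home; it was late. I slept."
print(avg_sentence_length(text))           # 4.0
print(obfuscate(text, 3.0, seed=1))        # I went home. It was late. I slept.
```

A real system would track several metrics and need transformations in both directions (lengthening as well as shortening); this sketch only shows how the three stages fit together.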

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2017

Large-Scale Goodness Polarity Lexicons for Community Question Answering

Todor Mihaylov; Daniel Balchev; Yasen Kiprov; Ivan Koychev; Preslav Nakov

We transfer a key idea from the field of sentiment analysis to a new domain: community question answering (cQA). The cQA task we are interested in is the following: given a question and a thread of comments, we want to re-rank the comments so that the ones that are good answers to the question are ranked higher than the bad ones. We notice that good and bad comments use specific vocabulary, and that one can often predict the goodness/badness of a comment from its contents alone, even ignoring the question. This leads us to the idea of building a good/bad polarity lexicon as an analogue of the positive/negative sentiment polarity lexicons commonly used in sentiment analysis. In particular, we use pointwise mutual information to build large-scale goodness polarity lexicons in a semi-supervised manner, starting with a small number of initial seeds. The evaluation results show an improvement of 0.7 MAP points absolute over a very strong baseline, and state-of-the-art performance on SemEval-2016 Task 3.
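To make the evaluation measure concrete, here is a minimal sketch of re-ranking a thread with lexicon scores and computing mean average precision (MAP). It assumes binary good/bad gold labels, and questions with no good comment score 0 here; the official SemEval scorer may differ in details such as ranking cutoffs. MAP is usually reported on a 0-100 scale, so 0.7 points would correspond to 0.007 in this code. The lexicon values and function names are illustrative assumptions.

```python
def average_precision(ranked_is_good):
    """AP for one question: mean of precision@k over the ranks k that hold a good comment."""
    hits, precisions = 0, []
    for k, is_good in enumerate(ranked_is_good, start=1):
        if is_good:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(rankings):
    """MAP over questions; `rankings` holds one ranked list of booleans per question."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def rerank_by_lexicon(comments, lexicon):
    """Sort (tokens, is_good) pairs by summed lexicon score, highest first."""
    return sorted(comments, key=lambda c: sum(lexicon.get(t, 0.0) for t in c[0]), reverse=True)


lexicon = {"apply": 2.0, "office": 1.0, "lol": -2.0}  # toy scores, not real data
thread = [(["lol", "same"], False), (["apply", "office"], True), (["thanks"], False)]
ranked = [is_good for _, is_good in rerank_by_lexicon(thread, lexicon)]
print(ranked, average_precision(ranked))  # [True, False, False] 1.0
```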

North American Chapter of the Association for Computational Linguistics | 2016

SUper Team at SemEval-2016 Task 3: Building a Feature-Rich System for Community Question Answering.

Tsvetomila Mihaylova; Pepa Gencheva; Martin Boyanov; Ivana Yovcheva; Todor Mihaylov; Momchil Hardalov; Yasen Kiprov; Daniel Balchev; Ivan Koychev; Preslav Nakov; Ivelina Nikolova; Galia Angelova

We present the system we built for participating in SemEval-2016 Task 3 on Community Question Answering. We achieved the best results on subtask C, and strong results on subtasks A and B, by combining a rich set of features of various types: semantic, lexical, metadata, and user-related. The most important feature groups turned out to be the metadata for the question and for the comment, semantic vectors trained on QatarLiving data, and similarities between the question and the comment for subtasks A and C, and between the original and the related question for subtask B.
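Question-comment similarity features of the kind listed above might be computed as follows. This is a sketch under stated assumptions, not the team's feature extractor: the semantic feature is the cosine between averaged word vectors, the lexical feature is Jaccard overlap of token sets, and the tiny `embeddings` dictionary is a stand-in for vectors trained on QatarLiving data. All names are hypothetical.

```python
import math


def average_vector(tokens, embeddings):
    """Mean of the vectors of known tokens, or None if no token has a vector."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_features(question, comment, embeddings):
    """Hypothetical question-comment features: embedding cosine and token Jaccard."""
    q_vec = average_vector(question, embeddings)
    c_vec = average_vector(comment, embeddings)
    semantic = cosine(q_vec, c_vec) if q_vec is not None and c_vec is not None else 0.0
    q_set, c_set = set(question), set(comment)
    union = q_set | c_set
    jaccard = len(q_set & c_set) / len(union) if union else 0.0
    return {"cosine": semantic, "jaccard": jaccard}


# Toy 2-d vectors standing in for embeddings trained on QatarLiving data.
embeddings = {"visa": [1.0, 0.0], "permit": [0.9, 0.1], "lol": [0.0, 1.0]}
print(similarity_features(["visa", "renewal"], ["permit", "renewal"], embeddings))
# cosine ~0.994, jaccard ~0.333
```

The same functions apply unchanged to the question-question similarity used for subtask B.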

International Conference on Computational Linguistics | 2014

SU-FMI: System Description for SemEval-2014 Task 9 on Sentiment Analysis in Twitter

Boris Velichkov; Borislav Kapukaranov; Ivan Grozev; Jeni Karanesheva; Todor Mihaylov; Yasen Kiprov; Preslav Nakov; Ivan Koychev; Georgi Georgiev

We describe the submission of the Sofia University team to SemEval-2014 Task 9 on Sentiment Analysis in Twitter. We participated in subtask B, where the participating systems had to predict whether a Twitter message expresses positive, negative, or neutral sentiment. We trained an SVM classifier with a linear kernel using a variety of features. We used only publicly available resources, and thus our results should be easily replicable. Overall, our system was ranked 20th out of 50 submissions (by 44 teams) based on the average of the three 2014 evaluation data scores, with an F1-score of 63.62 on general tweets, 48.37 on sarcastic tweets, and 68.24 on LiveJournal messages.
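As a worked check of the numbers above, the average of the three 2014 scores is (63.62 + 48.37 + 68.24) / 3 ≈ 60.08. The sketch below computes that average, and also a per-dataset score under the assumption that the task measure is the usual SemEval Twitter sentiment metric: the mean of the F1 scores for the positive and negative classes, with neutral not averaged in. The dictionary keys are labels chosen here for illustration.

```python
def f1_for_class(gold, pred, cls):
    """F1 of one class from parallel lists of gold and predicted labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def semeval_f1(gold, pred):
    """Assumed task measure: mean F1 over the positive and negative classes."""
    return (f1_for_class(gold, pred, "positive") + f1_for_class(gold, pred, "negative")) / 2


# The three 2014 scores reported in the abstract (keys are illustrative labels).
scores_2014 = {"tweets": 63.62, "sarcastic tweets": 48.37, "LiveJournal": 68.24}
average_2014 = sum(scores_2014.values()) / len(scores_2014)
print(round(average_2014, 2))  # 60.08
```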

International Conference on User Modeling, Adaptation and Personalization | 2017

Generating Labeled Datasets of Twitter Users

Yasen Kiprov; Pepa Gencheva; Ivan Koychev

In this paper, we present a simple yet powerful approach to generating labeled datasets of Twitter users. Our focus falls on sensitive personal details shared as background information in tweets. Such tweets escape the focus of the user's attention and also tend to be less affected by the large amounts of humor, wishes, and hypothetical thinking typical of tweets. Our approach combines the selection of search queries with semi-supervised filtering of indicative messages. We create datasets in several unrelated domains and show that many kinds of target groups can be built with minimal manual annotation effort. The generated datasets include separate groups of users with specific characteristics: pet ownership, blood pressure, diabetes, and psychotropic medicine usage, for which, to our knowledge, no manually labeled data was previously available. We also use our search-based approach to generate a cross-domain corpus that matches Twitter users with their Yelp profiles.
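The two-stage pipeline described above (search queries, then semi-supervised filtering) can be sketched as a toy. This is hypothetical and not the authors' implementation: query matching is plain case-insensitive substring search, the filter is a self-trained multinomial naive Bayes classifier started from a few hand-labeled tweets, and a user is labeled positive if any of their selected tweets is classified as indicative. The confidence threshold, number of rounds, and all names are assumptions.

```python
import math
import re
from collections import Counter


def select_candidates(tweets, queries):
    """Stage 1: keep tweets whose text contains any search query (case-insensitive)."""
    qs = [q.lower() for q in queries]
    return [t for t in tweets if any(q in t["text"].lower() for q in qs)]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing; True = indicative tweet."""

    def fit(self, texts, labels):
        self.counts = {True: Counter(), False: Counter()}
        self.priors = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def prob_indicative(self, text):
        n = sum(self.priors.values())
        logp = {}
        for label in (True, False):
            total = sum(self.counts[label].values())
            lp = math.log((self.priors[label] + 1) / (n + 2))
            for w in tokenize(text):
                lp += math.log((self.counts[label][w] + 1) / (total + len(self.vocab) + 1))
            logp[label] = lp
        top = max(logp.values())  # subtract max before exp for numerical stability
        odds = {label: math.exp(v - top) for label, v in logp.items()}
        return odds[True] / (odds[True] + odds[False])


def self_train(seed_texts, seed_labels, unlabeled, threshold=0.7, rounds=3):
    """Stage 2: repeatedly add confidently classified tweets to the training set."""
    texts, labels, pool = list(seed_texts), list(seed_labels), list(unlabeled)
    for _ in range(rounds):
        model = NaiveBayes().fit(texts, labels)
        confident = []
        for t in pool:
            p = model.prob_indicative(t)
            if p >= threshold or p <= 1 - threshold:
                confident.append((t, p >= threshold))
        if not confident:
            break
        for t, label in confident:
            texts.append(t)
            labels.append(label)
        added = {t for t, _ in confident}
        pool = [t for t in pool if t not in added]
    return NaiveBayes().fit(texts, labels)


def label_users(tweets, model, threshold=0.5):
    """A user is positive if any selected tweet looks indicative."""
    return {t["user"] for t in tweets if model.prob_indicative(t["text"]) >= threshold}


tweets = [{"user": "a", "text": "Walking my dog before work"},
          {"user": "b", "text": "Nice weather today"}]
model = self_train(["my dog ate my shoes", "i wish i had a dog"], [True, False],
                   ["walking my dog", "i wish i had a cat"])
print(label_users(select_candidates(tweets, ["my dog"]), model))  # {'a'}
```

The seed pair shows why filtering matters: both seeds match the query "dog", but only the first reports actual pet ownership.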

Unknown Journal | 2015