Yasen Kiprov
Sofia University
Publication
Featured research published by Yasen Kiprov.
north american chapter of the association for computational linguistics | 2016
Daniel Balchev; Yasen Kiprov; Ivan Koychev; Preslav Nakov
We describe our submission to SemEval-2016 Task 3 on Community Question Answering. We participated in subtask A, which asks participants to rerank the comments in the thread for a given forum question from good to bad. Our approach focuses on the generation and use of goodness polarity lexicons, similar to the sentiment polarity lexicons that are very popular in sentiment analysis. In particular, we use a combination of bootstrapping and pointwise mutual information to estimate the strength of association between a word (from a large unannotated set of question-answer threads) and the class of good/bad comments. We then use various features based on these lexicons to train a regression model, whose predictions we use to induce the final comment ranking. While our system was not very strong as it lacked important features, our lexicons contributed to the strong performance of another top-performing system.
cross language evaluation forum | 2017
Georgi Karadzhov; Tsvetomila Mihaylova; Yasen Kiprov; Georgi Georgiev; Ivan Koychev; Preslav Nakov
Users posting online expect to remain anonymous unless they have logged in, which is often needed for them to be able to discuss freely on various topics. Preserving the anonymity of a text's writer can also be important in some other contexts, e.g., in the case of witness protection or anonymity programs. However, each person has his/her own style of writing, which can be analyzed using stylometry, and as a result, the true identity of the author of a piece of text can be revealed even if s/he has tried to hide it. Thus, it could be helpful to design automatic tools that can help a person obfuscate his/her identity when writing text. In particular, here we propose an approach that changes the text, so that it is pushed towards average values for some general stylometric characteristics, thus making the use of these characteristics less discriminative. The approach consists of three main steps: first, we calculate the values for some popular stylometric metrics that can indicate authorship; then we apply various transformations to the text, so that these metrics are adjusted towards the average level, while preserving the semantics and the soundness of the text; and finally, we add random noise. This approach turned out to be very efficient, and yielded the best performance on the Author Obfuscation task at the PAN-2016 competition.
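The first two steps described in the abstract above (measure stylometric metrics, then decide in which direction each should move towards the average) can be illustrated with a minimal sketch. The three metrics, the `corpus_average` values, and the `tolerance` threshold are all assumptions made for illustration; the abstract does not list the metrics or transformations the authors actually used, and the text-rewriting and random-noise steps are omitted here.

```python
import re


def stylometric_profile(text):
    """Compute a few simple stylometric metrics (illustrative choice only)."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
    }


def adjustment_plan(profile, corpus_average, tolerance=0.1):
    """For each metric, say whether the text should move up, down, or stay.

    corpus_average is a hypothetical dict of target values; tolerance is the
    relative band around the target that counts as already average.
    """
    plan = {}
    for name, value in profile.items():
        target = corpus_average[name]
        if value > target * (1 + tolerance):
            plan[name] = "decrease"
        elif value < target * (1 - tolerance):
            plan[name] = "increase"
        else:
            plan[name] = "keep"
    return plan
```

A plan entry such as `"avg_sentence_len": "increase"` would then be handed to a transformation that merges short sentences, which is outside the scope of this sketch.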
international acm sigir conference on research and development in information retrieval | 2017
Todor Mihaylov; Daniel Balchev; Yasen Kiprov; Ivan Koychev; Preslav Nakov
We transfer a key idea from the field of sentiment analysis to a new domain: community question answering (cQA). The cQA task we are interested in is the following: given a question and a thread of comments, we want to re-rank the comments, so that the ones that are good answers to the question would be ranked higher than the bad ones. We notice that good vs. bad comments use specific vocabulary and that one can often predict the goodness/badness of a comment even ignoring the question, based on the comment contents only. This leads us to the idea of building a good/bad polarity lexicon as an analogy to the positive/negative sentiment polarity lexicons, commonly used in sentiment analysis. In particular, we use pointwise mutual information in order to build large-scale goodness polarity lexicons in a semi-supervised manner starting with a small number of initial seeds. The evaluation results show an improvement of 0.7 MAP points absolute over a very strong baseline, and state-of-the-art performance on SemEval-2016 Task 3.
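The pointwise mutual information scoring mentioned in the abstract above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it scores each word as PMI(word, good) minus PMI(word, bad), a common formulation for sentiment lexicons, and it assumes the comments already carry a good/bad label. The seed-based bootstrapping step is omitted, and the function name `pmi_lexicon` and the add-one `smoothing` parameter are hypothetical.

```python
import math
from collections import Counter


def pmi_lexicon(labeled_comments, smoothing=1.0):
    """Score each word by PMI(word, good) - PMI(word, bad).

    labeled_comments: iterable of (tokens, label) pairs,
    where label is "good" or "bad".
    """
    word_class = {"good": Counter(), "bad": Counter()}
    class_total = Counter()
    for tokens, label in labeled_comments:
        for tok in tokens:
            word_class[label][tok] += 1
            class_total[label] += 1
    if class_total["good"] == 0 or class_total["bad"] == 0:
        raise ValueError("need at least one token in each class")

    scores = {}
    for word in set(word_class["good"]) | set(word_class["bad"]):
        # The difference of the two PMI values simplifies to
        # log2( count(w, good) * total(bad) / (count(w, bad) * total(good)) ).
        # Smoothing keeps the ratio finite for words seen in one class only.
        good = word_class["good"][word] + smoothing
        bad = word_class["bad"][word] + smoothing
        scores[word] = math.log2(
            (good * class_total["bad"]) / (bad * class_total["good"])
        )
    return scores
```

Positive scores mark words associated with good comments and negative scores mark words associated with bad ones; features such as the sum or maximum of these scores over a comment could then feed a ranking model.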
north american chapter of the association for computational linguistics | 2016
Tsvetomila Mihaylova; Pepa Gencheva; Martin Boyanov; Ivana Yovcheva; Todor Mihaylov; Momchil Hardalov; Yasen Kiprov; Daniel Balchev; Ivan Koychev; Preslav Nakov; Ivelina Nikolova; Galia Angelova
We present the system we built for participating in SemEval-2016 Task 3 on Community Question Answering. We achieved the best results on subtask C, and strong results on subtasks A and B, by combining a rich set of various types of features: semantic, lexical, metadata, and user-related. The most important group turned out to be the metadata for the question and for the comment, semantic vectors trained on QatarLiving data and similarities between the question and the comment for subtasks A and C, and between the original and the related question for Subtask B.
international conference on computational linguistics | 2014
Boris Velichkov; Borislav Kapukaranov; Ivan Grozev; Jeni Karanesheva; Todor Mihaylov; Yasen Kiprov; Preslav Nakov; Ivan Koychev; Georgi Georgiev
We describe the submission of the Sofia University team to SemEval-2014 Task 9 on Sentiment Analysis in Twitter. We participated in subtask B, where the participating systems had to predict whether a Twitter message expresses positive, negative, or neutral sentiment. We trained an SVM classifier with a linear kernel using a variety of features. We used publicly available resources only, and thus our results should be easily replicable. Overall, our system is ranked 20th out of 50 submissions (by 44 teams) based on the average of the three 2014 evaluation data scores, with an F1-score of 63.62 on general tweets, 48.37 on sarcastic tweets, and 68.24 on LiveJournal messages.
international conference on user modeling adaptation and personalization | 2017
Yasen Kiprov; Pepa Gencheva; Ivan Koychev
In this paper we present a simple, yet powerful approach to generating labeled datasets of Twitter users. Our focus falls on sensitive personal details, shared as background information in tweets. Such tweets avoid the focus of users' attention and also tend to resist the vast amounts of humor, wishes or hypothetical thinking typical for tweets. Our approach combines selecting search queries, followed up by a semi-supervised filtering of indicative messages. We create datasets in several unrelated domains and prove that all sorts of target groups can be built with minimal manual annotator effort. The generated datasets include separate groups of users with specific characteristics: pet ownership, blood pressure, diabetes and psychotropic medicine usage, for which to our knowledge manually labeled data was previously not available. Our search-based approach is also used to generate a cross-domain corpus, matching Twitter users with their Yelp profiles.
Unknown Journal | 2015
Yasen Kiprov; Momchil Hardalov; Preslav Nakov; Ivan Koychev
cross-language evaluation forum | 2017
Georgi Karadzhov; Tsvetomila Mihaylova; Yasen Kiprov; Georgi Georgiev; Ivan Koychev; Preslav Nakov
CLEF (Working Notes) | 2016
Pepa Gencheva; Martin Boyanov; Elena Deneva; Preslav Nakov; Georgi Georgiev; Yasen Kiprov; Ivan Koychev
CLEF (Working Notes) | 2016
Valentin Zmiycharov; Dimitar Alexandrov; Hristo Georgiev; Yasen Kiprov; Georgi Georgiev; Ivan Koychev; Preslav Nakov