Publication


Featured research published by Todor Mihaylov.


Conference on Computational Natural Language Learning | 2015

Finding Opinion Manipulation Trolls in News Community Forums

Todor Mihaylov; Georgi Georgiev; Preslav Nakov

The emergence of user forums in electronic news media has given rise to the proliferation of opinion manipulation trolls. Finding such trolls automatically is a hard task, as there is no easy way to recognize or even to define what they are; this also makes it hard to get training and testing data. We solve this issue pragmatically: we assume that a user who is called a troll by several people is likely to be one. We experiment with different variations of this definition, and in each case we show that we can train a classifier to distinguish a likely troll from a non-troll with very high accuracy, 82‐95%, thanks to our rich feature set.
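The labeling heuristic in this abstract (a user called a troll by several people is likely to be one) can be illustrated with a minimal sketch. The input format, the function name `label_mentioned_trolls`, and the threshold `min_accusers=3` are assumptions made for illustration; the abstract only says that different variations of the definition were tried.

```python
from collections import defaultdict


def label_mentioned_trolls(accusations, min_accusers=3):
    """Return users accused of trolling by at least `min_accusers` distinct users.

    `accusations` is an iterable of (accuser, accused) pairs. This format and
    the default threshold are hypothetical, not taken from the paper.
    """
    accusers_by_user = defaultdict(set)
    for accuser, accused in accusations:
        if accuser != accused:  # ignore self-accusations
            accusers_by_user[accused].add(accuser)
    return {
        user
        for user, accusers in accusers_by_user.items()
        if len(accusers) >= min_accusers
    }


# Toy data: "bob" is accused by three different users, "eve" by only two
# (repeated accusations from the same user count once).
accusations = [
    ("ann", "bob"), ("cat", "bob"), ("dan", "bob"),
    ("ann", "eve"), ("ann", "eve"), ("cat", "eve"),
]
likely_trolls = label_mentioned_trolls(accusations, min_accusers=3)
```

Counting distinct accusers rather than raw mentions keeps a single persistent accuser from labeling someone on their own; varying `min_accusers` gives stricter or looser variants of the definition.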


North American Chapter of the Association for Computational Linguistics | 2016

SemanticZ at SemEval-2016 Task 3: Ranking Relevant Answers in Community Question Answering Using Semantic Similarity Based on Fine-tuned Word Embeddings

Todor Mihaylov; Preslav Nakov

We describe our system for finding good answers in a community forum, as defined in SemEval-2016, Task 3 on Community Question Answering. Our approach relies on several semantic similarity features based on fine-tuned word embeddings and topic similarities. In the main Subtask C, our primary submission was ranked third, with a MAP of 51.68 and accuracy of 69.94. In Subtask A, our primary submission was also third, with a MAP of 77.58 and accuracy of 73.39.


Meeting of the Association for Computational Linguistics | 2016

Hunting for troll comments in news community forums

Todor Mihaylov; Preslav Nakov

There are different definitions of what a troll is. Certainly, a troll can be somebody who teases people to make them angry, or somebody who offends people, or somebody who wants to dominate any single discussion, or somebody who tries to manipulate people’s opinion (sometimes for money), etc. The last definition is the one that dominates the public discourse in Bulgaria and Eastern Europe, and this is our focus in this paper. In our work, we examine two types of opinion manipulation trolls: paid trolls that have been revealed from leaked “reputation management contracts” and “mentioned trolls” that have been called such by several different people. We show that these definitions are sensible: we build two classifiers that can distinguish a post by such a paid troll from one by a non-troll with 81-82% accuracy; the same classifier achieves 81-82% accuracy on so-called mentioned troll vs. non-troll posts.


arXiv: Computation and Language | 2017

Story Cloze Ending Selection Baselines and Data Examination

Todor Mihaylov; Anette Frank

This paper describes two supervised baseline systems for the Story Cloze Test Shared Task (Mostafazadeh et al. 2016). We first build a classifier using features based on word embeddings and semantic similarity computation. We further implement a neural LSTM system with different encoding strategies that try to model the relation between the story and the provided endings. Our experiments show that a model using representation features based on average word embedding vectors over the given story words and the candidate ending sentence words, together with similarity features between the story and candidate ending representations, performed better than the neural models. Our best model achieves an accuracy of 72.42, ranking 3rd in the official evaluation.
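The averaged-embedding representation mentioned in this abstract can be sketched as follows. This is a simplified illustration, not the authors' system: the toy two-dimensional embeddings, the helper names, and the use of a plain argmax over cosine similarity in place of a trained classifier are all assumptions.

```python
import math


def average_vector(tokens, embeddings, dim):
    """Average the embedding vectors of the tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_ending(story_tokens, endings, embeddings, dim):
    """Score each candidate ending against the story and return (best index, scores)."""
    story_vec = average_vector(story_tokens, embeddings, dim)
    scores = [
        cosine(story_vec, average_vector(ending, embeddings, dim))
        for ending in endings
    ]
    best = max(range(len(endings)), key=scores.__getitem__)
    return best, scores


# Hypothetical 2-dimensional embeddings, for illustration only.
embeddings = {
    "dog": [1.0, 0.0], "played": [0.9, 0.1], "park": [0.8, 0.2],
    "happy": [0.7, 0.3], "stock": [0.0, 1.0], "market": [0.1, 0.9],
    "crashed": [0.2, 0.8],
}
story = ["dog", "played", "park"]
endings = [["dog", "happy"], ["stock", "market", "crashed"]]
best, scores = pick_ending(story, endings, embeddings, dim=2)
```

In a feature-based setup like the one the abstract describes, the averaged vectors and the similarity score would be fed to a classifier as features rather than used directly for selection.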


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2017

Large-Scale Goodness Polarity Lexicons for Community Question Answering

Todor Mihaylov; Daniel Balchev; Yasen Kiprov; Ivan Koychev; Preslav Nakov

We transfer a key idea from the field of sentiment analysis to a new domain: community question answering (cQA). The cQA task we are interested in is the following: given a question and a thread of comments, we want to re-rank the comments, so that the ones that are good answers to the question would be ranked higher than the bad ones. We notice that good vs. bad comments use specific vocabulary and that one can often predict the goodness/badness of a comment even ignoring the question, based on the comment contents only. This leads us to the idea of building a good/bad polarity lexicon by analogy with the positive/negative sentiment polarity lexicons commonly used in sentiment analysis. In particular, we use pointwise mutual information in order to build large-scale goodness polarity lexicons in a semi-supervised manner starting with a small number of initial seeds. The evaluation results show an improvement of 0.7 MAP points absolute over a very strong baseline, and state-of-the-art performance on SemEval-2016 Task 3.
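A PMI-based goodness lexicon of the kind described in this abstract can be sketched roughly as below. The sketch makes several assumptions that are not in the abstract: comments are pseudo-labeled by the seed words they contain, each word is scored as PMI(word, good) minus PMI(word, bad), and add-one smoothing is applied. The seed words and toy comments are invented.

```python
import math
from collections import Counter


def build_goodness_lexicon(comments, good_seeds, bad_seeds, smoothing=1.0):
    """Score words by PMI(word, good) - PMI(word, bad).

    The difference of the two PMI values simplifies to
    log2(P(word | good) / P(word | bad)), which is what is computed here.
    Comments are pseudo-labeled by seed words (an assumption of this sketch).
    """
    good_counts, bad_counts = Counter(), Counter()
    for tokens in comments:
        words = set(tokens)
        has_good = bool(words & good_seeds)
        has_bad = bool(words & bad_seeds)
        if has_good == has_bad:
            continue  # no seed, or conflicting seeds: leave unlabeled
        (good_counts if has_good else bad_counts).update(words)

    vocab = set(good_counts) | set(bad_counts)
    good_total = sum(good_counts.values()) + smoothing * len(vocab)
    bad_total = sum(bad_counts.values()) + smoothing * len(vocab)

    lexicon = {}
    for word in vocab:
        p_word_good = (good_counts[word] + smoothing) / good_total
        p_word_bad = (bad_counts[word] + smoothing) / bad_total
        lexicon[word] = math.log2(p_word_good / p_word_bad)
    return lexicon


# Toy tokenized comments and hypothetical seed words.
comments = [
    ["try", "the", "visa", "office"],
    ["thanks", "try", "the", "embassy"],
    ["lol", "no", "idea"],
    ["lol", "who", "cares"],
    ["thanks", "visa", "office"],
]
lexicon = build_goodness_lexicon(
    comments, good_seeds={"thanks", "try"}, bad_seeds={"lol"}
)
```

Positive scores mark words that lean toward good answers and negative scores words that lean toward bad ones; such scores could then be aggregated per comment and used as features for re-ranking.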


North American Chapter of the Association for Computational Linguistics | 2016

SUper Team at SemEval-2016 Task 3: Building a Feature-Rich System for Community Question Answering

Tsvetomila Mihaylova; Pepa Gencheva; Martin Boyanov; Ivana Yovcheva; Todor Mihaylov; Momchil Hardalov; Yasen Kiprov; Daniel Balchev; Ivan Koychev; Preslav Nakov; Ivelina Nikolova; Galia Angelova

We present the system we built for participating in SemEval-2016 Task 3 on Community Question Answering. We achieved the best results on subtask C, and strong results on subtasks A and B, by combining a rich set of various types of features: semantic, lexical, metadata, and user-related. The most important group turned out to be the metadata for the question and for the comment, semantic vectors trained on QatarLiving data and similarities between the question and the comment for subtasks A and C, and between the original and the related question for Subtask B.


International Conference on Computational Linguistics | 2014

SU-FMI: System Description for SemEval-2014 Task 9 on Sentiment Analysis in Twitter

Boris Velichkov; Borislav Kapukaranov; Ivan Grozev; Jeni Karanesheva; Todor Mihaylov; Yasen Kiprov; Preslav Nakov; Ivan Koychev; Georgi Georgiev

We describe the submission of the Sofia University team to SemEval-2014 Task 9 on Sentiment Analysis in Twitter. We participated in subtask B, where the participating systems had to predict whether a Twitter message expresses positive, negative, or neutral sentiment. We trained an SVM classifier with a linear kernel using a variety of features. We used publicly available resources only, and thus our results should be easily replicable. Overall, our system is ranked 20th out of 50 submissions (by 44 teams) based on the average of the three 2014 evaluation data scores, with an F1-score of 63.62 on general tweets, 48.37 on sarcastic tweets, and 68.24 on LiveJournal messages.


Internet Research | 2018

The dark side of news community forums: opinion manipulation trolls

Todor Mihaylov; Tsvetomila Mihaylova; Preslav Nakov; Lluís Màrquez; Georgi Georgiev; Ivan Koychev

The purpose of this paper is to explore the dark side of news community forums: the proliferation of opinion manipulation trolls. In particular, it explores the idea that a user who is called a troll by several people is likely to be one. It further demonstrates the utility of this idea for detecting accused and paid opinion manipulation trolls and their comments, as well as for predicting the credibility of comments in news community forums. The authors aim to build a classifier to distinguish trolls from regular users. Unfortunately, it is not easy to get reliable training data. The authors solve this issue pragmatically: they assume that a user who is called a troll by several people is likely to be one; such users are called accused trolls. Based on this assumption and on leaked reports about actual paid opinion manipulation trolls, the authors build a classifier to distinguish trolls from regular users. The authors compare the profiles of paid trolls, accused trolls, and non-trolls, and show that a classifier trained to distinguish accused trolls from non-trolls also does quite well at telling paid trolls apart from non-trolls. Troll detection works even for users with about 10 comments, but it achieves the best performance for users with a sizable number of comments in the forum, e.g. 100 or more. There is no such limitation for troll comment detection. The approach would help forum moderators in their work by pointing them to the most suspicious users and comments. It would also be useful to investigative journalists who want to find paid opinion manipulation trolls. The authors can offer a better experience to online users by filtering out opinion manipulation trolls and their comments. The authors propose a novel approach for finding paid opinion manipulation trolls and their posts.


Proceedings of the CoNLL-16 shared task | 2016

Discourse Relation Sense Classification Using Cross-argument Semantic Similarity Based on Word Embeddings

Todor Mihaylov; Anette Frank

This paper describes our system for the CoNLL 2016 Shared Task’s supplementary task on Discourse Relation Sense Classification. Our official submission employs a Logistic Regression classifier with several cross-argument similarity features based on word embeddings and achieves overall F-scores of 64.13 on the Dev set, 63.31 on the Test set and 54.69 on the Blind set, ranking first in the Overall ranking for the task. We compare the feature-based Logistic Regression classifier to different Convolutional Neural Network architectures. After the official submission we enriched our model for Non-Explicit relations by including similarities of explicit connectives with the relation arguments, and part-of-speech similarities based on modal verbs. This improved our Non-Explicit result by 1.46 points on the Dev set and by 0.36 points on the Blind set.


Recent Advances in Natural Language Processing | 2015

Exposing Paid Opinion Manipulation Trolls

Todor Mihaylov; Ivan Koychev; Georgi Georgiev; Preslav Nakov

Collaboration


Dive into Todor Mihaylov's collaborations.

Top Co-Authors

Preslav Nakov
Qatar Computing Research Institute

Tushar Khot
University of Wisconsin-Madison

Lluís Màrquez
Qatar Computing Research Institute