Publications


Featured research published by Vasiliki Simaki.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

Queries without clicks: evaluating retrieval effectiveness based on user feedback

Athanasia Koumpouri; Vasiliki Simaki

Until recently, the lack of user activity on search results was perceived as a sign of user dissatisfaction with retrieval performance. However, recent studies have reported that some queries might not be followed by clicks on the retrieved results, because the search task can be satisfied by the result list itself, without the need to click through. In this paper, we propose a method for evaluating user satisfaction with the results of searches that are not followed by clickthrough activity. We found a strong association between some implicit measures of user activity and users' explicit satisfaction judgments. Moreover, we developed a predictive model of user satisfaction based on implicit measures, achieving an accuracy of up to 86%.


Corpus Linguistics and Linguistic Theory | 2017

Annotating speaker stance in discourse: The Brexit Blog Corpus

Vasiliki Simaki; Carita Paradis; Maria Skeppstedt; Magnus Sahlgren; Kostiantyn Kucher; Andreas Kerren

The aim of this study is to explore the possibility of identifying speaker stance in discourse, to provide an analytical resource for it, and to evaluate the level of agreement across speakers. We also explore to what extent language users agree about what kinds of stance are expressed in natural language use, or whether their interpretations diverge. To perform this task, a comprehensive cognitive-functional framework of ten stance categories was developed based on previous work on speaker stance in the literature. A corpus of opinionated texts, the Brexit Blog Corpus (BBC), was compiled. An analytical protocol and interface (Active Learning and Visual Analytics) for the annotations was set up, and the data were independently annotated by two annotators. The annotation procedure, the annotation agreements, and the co-occurrence of more than one stance in the utterances are described and discussed. The careful, analytical annotation process returned satisfactory inter- and intra-annotator agreement scores, resulting in a gold standard corpus, the final version of the BBC.


International Conference on Speech and Computer | 2017

Stance Classification in Texts from Blogs on the 2016 British Referendum

Vasiliki Simaki; Carita Paradis; Andreas Kerren

This paper addresses the problem of identifying and correctly attributing speaker stance in human communication. The data set consists of political blogs dealing with the 2016 British referendum. A cognitive-functional framework is adopted, with data annotated for six notional stance categories: contrariety, hypotheticality, necessity, prediction, source of knowledge, and uncertainty. We show that these categories can be implemented in a text classification task and automatically detected. To this end, we propose a large set of lexical and syntactic linguistic features. These features were tested in classification experiments using different algorithms. We achieved an accuracy of up to 30% in the six-class experiments, which is not fully satisfactory. As a second step, we ran binary classification experiments on the pairwise combinations of the stance categories. The contrariety versus necessity classification achieved the best results, with up to 71% accuracy.


International Conference on Speech and Computer | 2017

Detection of stance and sentiment modifiers in political blogs

Maria Skeppstedt; Vasiliki Simaki; Carita Paradis; Andreas Kerren

The automatic detection of seven types of modifiers was studied: Certainty, Uncertainty, Hypotheticality, Prediction, Recommendation, Concession/Contrast, and Source. A classifier aimed at detecting local cue words that signal the categories was the most successful method for five of the categories. For Prediction and Hypotheticality, however, better results were obtained with a classifier trained on tokens and bigrams present in the entire sentence. Unsupervised cluster features were shown to be useful for the categories Source and Uncertainty when only a subset of the available training data was used. However, when all 2,095 sentences that had been actively selected and manually annotated were used as training data, the cluster features had a very limited effect. Some of the classification errors made by the models could be avoided by extending the training data set, while other error types would require other features and feature representations, as well as the incorporation of pragmatic knowledge.


Journal of Quantitative Linguistics | 2017

Sociolinguistic Features for Author Gender Identification: From Qualitative Evidence to Quantitative Analysis

Vasiliki Simaki; Christina Aravantinou; Iosif Mporas; Marianna Kondyli; Vasileios Megalooikonomou

Theoretical and empirical studies demonstrate a strong relationship between social factors and individual linguistic attitudes. Different social categories, such as gender, age, education, profession, and social status, are strongly related to the linguistic diversity of people’s everyday spoken and written interaction. In this paper, sociolinguistic studies addressing gender differentiation are reviewed in order to identify how various linguistic characteristics differ between women and men. We then examine whether and how these qualitative features can become quantitative metrics for the task of gender identification from texts on web blogs. The evaluation results showed that “syntactic complexity”, “tag questions”, “period length”, “adjectives”, and “vocabulary richness” appear to be significantly distinctive with respect to the author’s gender.


Panhellenic Conference on Informatics | 2013

Evaluating the correspondence of educational software to learning theories

Maria Spyropoulou; Dimitra Ntourou; Vasiliki Simaki; Dionysia Malagkoniari; Athanasia Koumpouri; Maria Sorra

As new technologies emerge, more and more people depend on them for a variety of purposes. Now more than ever, there is a tendency for technology to substitute for face-to-face communication and education. In this paper, we investigate whether the usability and pedagogical factors for quality educational platforms meet the expectations of three dominant learning theories of the past century, namely behaviorism, cognitivism, and constructivism. We assign specific factors derived from 9 evaluation models to the 3 learning theories. A list of 15 questions was produced to help evaluators assess the educational software. We then evaluated 11 educational websites that aim to help anglophone students improve their language skills, e.g., through grammar and spelling exercises. The results show the level of correspondence of these educational websites to the learning theories.


ICAME Journal | 2018

Evaluating stance-annotated sentences from the Brexit Blog Corpus: A quantitative linguistic analysis

Vasiliki Simaki; Carita Paradis; Andreas Kerren

This paper offers a formally driven quantitative analysis of stance-annotated sentences in the Brexit Blog Corpus (BBC). Our goal is to identify features that determine the formal profiles of six stance categories (contrariety, hypotheticality, necessity, prediction, source of knowledge, and uncertainty) in a subset of the BBC. The study has two parts. First, it examines a large number of formal linguistic features occurring in the sentences, such as punctuation, words, and grammatical categories, in order to describe the specific characteristics of each category. Second, it compares characteristics across the entire data set in order to determine stance similarities. We show that among the six stance categories in the corpus, contrariety and necessity are the most discriminative ones. Contrariety uses longer sentences, more conjunctions, more repetitions, and shorter forms than sentences expressing other stances, whereas necessity has longer lexical forms but shorter sentences, which are syntactically more complex. We show that stance in our data set is expressed in sentences of around 21 words on average. The sentences consist mainly of alphabetical characters forming a varied vocabulary without special forms, such as digits or special characters.


Recent Advances in Natural Language Processing | 2017

Identifying the Authors' National Variety of English in Social Media Texts

Vasiliki Simaki; Panagiotis Simakis; Carita Paradis; Andreas Kerren

In this paper, we present a study on identifying authors’ national variety of English in texts from social media. In data from Facebook and Twitter, information about each author’s social profile is annotated, and the national English variety (US, UK, AUS, CAN, NNS) that each author uses is attributed. We tested four feature types: formal linguistic features, POS features, lexicon-based features related to the different varieties, and data-based features from each English variety. We used various machine learning algorithms for the classification experiments and implemented a feature selection process. When the 31 highest-ranked features were used, the classification accuracy reached up to 77.32%. The experimental results are evaluated, and the efficacy of the ranked features is discussed.


Conference on Intelligent Text Processing and Computational Linguistics | 2016

Age Identification of Twitter Users: Classification Methods and Sociolinguistic Analysis

Vasiliki Simaki; Iosif Mporas; Vasileios Megalooikonomou

In this article, we address the problem of identifying the age of Twitter users based on their online text. We used a set of text mining, sociolinguistic-based, and content-related text features, and we evaluated a number of well-known and widely used machine learning algorithms for classification in order to examine their suitability for this task. The experimental results showed that the Random Forest algorithm offered superior performance, achieving an accuracy of 61%. We ranked the classification features by their informativeness using the ReliefF algorithm, and we analyzed the results in terms of the sociolinguistic principles of age-related linguistic variation.


Text, Speech, and Dialogue | 2015

Using Sociolinguistic Inspired Features for Gender Classification of Web Authors

Vasiliki Simaki; Christina Aravantinou; Iosif Mporas; Vasileios Megalooikonomou

In this article, we present a methodology for classifying text from web authors using sociolinguistically inspired text features. The proposed methodology uses a baseline feature set based on text mining, which is combined with text features that quantify results from theoretical and sociolinguistic studies. Two combination approaches were evaluated, and the evaluation results indicated a significant improvement in both cases. For the best-performing combination approach, the accuracy was 84.36%, measured as the percentage of correctly classified web posts.
