
Publication


Featured research published by Lucie Flekova.


International World Wide Web Conferences | 2014

What Makes a Good Biography? Multidimensional Quality Analysis Based on Wikipedia Article Feedback Data

Lucie Flekova; Oliver Ferschke; Iryna Gurevych

With more than 22 million articles, the largest collaborative knowledge resource never sleeps, experiencing several article edits every second. Over one fifth of these articles describe individual people, the majority of whom are still alive. Such articles are, by their nature, prone to corruption and vandalism. Manual quality assurance by experts can barely cope with this massive amount of data. Can it be effectively replaced by feedback from the crowd? Can we provide meaningful support for quality assurance with automated text processing techniques? Which properties of the articles should then play a key role in the machine learning algorithms and why? In this paper, we study the user-perceived quality of Wikipedia articles based on a novel Wikipedia user feedback dataset. In contrast to previous work on quality assessment, which mostly relied on judgements of active Wikipedia authors, we analyze ratings of ordinary Wikipedia users along four quality dimensions (Complete, Well written, Trustworthy and Objective). We first present an empirical analysis of the novel dataset with over 36 million Wikipedia article ratings. We then select a subset of biographical articles and perform classification experiments to predict their quality ratings along each of the dimensions, exploring multiple linguistic, surface and network properties of the rated articles. Additionally, we study the classification performance and differences for the biographies of living and dead people as well as those for men and women. We demonstrate the effectiveness of our approach with F-scores of 0.94, 0.89, 0.73, and 0.73 for the dimensions Complete, Well written, Trustworthy, and Objective, respectively. Based on the results, we believe that the quality assessment of big textual data can be effectively supported by current text classification and language processing tools.


Meeting of the Association for Computational Linguistics | 2016

Exploring Stylistic Variation with Age and Income on Twitter

Lucie Flekova; Daniel Preoţiuc-Pietro; Lyle H. Ungar

Writing style allows NLP tools to adjust to the traits of an author. In this paper, we explore the relation between stylistic and syntactic features and authors’ age and income. We confirm our hypothesis that for numerous feature types writing style is predictive of income even beyond age. We analyze the predictive power of writing style features in a regression task on two data sets of around 5,000 Twitter users each. Additionally, we use our validated features to study daily variations in writing style of users from distinct income groups. Temporal stylistic patterns not only provide novel psychological insight into user behavior, but are useful for future research and applications in social media.


Meeting of the Association for Computational Linguistics | 2016

Analysing Biases in Human Perception of User Age and Gender from Text

Lucie Flekova; Jordan Carpenter; Salvatore Giorgi; Lyle H. Ungar; Daniel Preoţiuc-Pietro

User traits disclosed through written text, such as age and gender, can be used to personalize applications such as recommender systems or conversational agents. However, human perception of these traits is not perfectly aligned with reality. In this paper, we conduct a large-scale crowdsourcing experiment on guessing age and gender from tweets. We systematically analyze the quality and possible biases of these predictions. We identify the textual cues which lead to misassessments of traits or make annotators more or less confident in their choice. Our study demonstrates that differences between real and perceived traits are noteworthy and elucidates inaccurately used stereotypes in human perception.


Social Psychological and Personality Science | 2017

Real Men Don’t Say “Cute”: Using Automatic Language Analysis to Isolate Inaccurate Aspects of Stereotypes

Jordan Carpenter; Daniel Preoţiuc-Pietro; Lucie Flekova; Salvatore Giorgi; Courtney Hagan; Margaret L. Kern; Anneke Buffone; Lyle H. Ungar; Martin E. P. Seligman

People associate certain behaviors with certain social groups. These stereotypical beliefs consist of both accurate and inaccurate associations. Using large-scale, data-driven methods with social media as a context, we isolate stereotypes by using verbal expression. Across four social categories—gender, age, education level, and political orientation—we identify words and phrases that lead people to incorrectly guess the social category of the writer. Although raters often correctly categorize authors, they overestimate the importance of some stereotype-congruent signal. Findings suggest that data-driven approaches might be a valuable and ecologically valid tool for identifying even subtle aspects of stereotypes and highlighting the facets that are exaggerated or misapplied.


Empirical Methods in Natural Language Processing | 2015

Personality Profiling of Fictional Characters using Sense-Level Links between Lexical Resources

Lucie Flekova; Iryna Gurevych

This study focuses on personality prediction of protagonists in novels based on the Five-Factor Model of personality. We present and publish a novel collaboratively built dataset of fictional character personality and design our task as a text classification problem. We incorporate a range of semantic features, including WordNet and VerbNet sense-level information and word vector representations. We evaluate three machine learning models based on the speech, actions and predicatives of the main characters, and show that especially the lexical-semantic features significantly outperform the baselines. The most predictive features correspond to reported findings in personality psychology.


Empirical Methods in Natural Language Processing | 2015

Analysing domain suitability of a sentiment lexicon by identifying distributionally bipolar words

Lucie Flekova; Daniel Preoţiuc-Pietro; Eugen Ruppert

Contemporary sentiment analysis approaches rely heavily on lexicon-based methods. This is mainly due to their simplicity, although the best empirical results can be achieved by more complex techniques. We introduce a method to assess suitability of generic sentiment lexicons for a given domain, namely to identify frequent bigrams where a polar word switches polarity. Our bigrams are scored using Lexicographer's Mutual Information and leveraging large automatically obtained corpora. Our score matches human perception of polarity and demonstrates improvements in classification results using our enhanced context-aware method. Our method enhances the assessment of lexicon-based sentiment detection algorithms and can be further used to quantify ambiguous words.


International Conference on Computational Linguistics | 2014

UKPDIPF: Lexical Semantic Approach to Sentiment Polarity Prediction in Twitter Data

Lucie Flekova; Oliver Ferschke; Iryna Gurevych

We present a sentiment classification system that participated in the SemEval 2014 shared task on sentiment analysis in Twitter. Our system expands tokens in a tweet with semantically similar expressions using a large novel distributional thesaurus and calculates the semantic relatedness of the expanded tweets to word lists representing positive and negative sentiment. This approach helps to assess the polarity of tweets that do not directly contain polarity cues. Moreover, we incorporate syntactic, lexical and surface sentiment features. On the message level, our system achieved 8th place in terms of macro-averaged F-score among 50 systems, with particularly good performance on the LiveJournal corpus (F1=71.92) and the Twitter sarcasm dataset (F1=54.59). On the expression level, our system ranked 14th out of 27 systems, based on macro-averaged F-score.


Archive | 2018

Content-based Analysis and Visualization of Story Complexity

Lucie Flekova; Florian Stoffel; Iryna Gurevych; Daniel A. Keim

Diagrams also play a major role in linguistics. Because diagrams are created and used so readily, reflection on diagrammatic practice is sometimes lost. The following contribution is a plea to examine this practice from three different perspectives: the diagrammatic, the algorithmic, and that of the history of knowledge. This program of a “Visual Linguistics” raises questions about the character of diagrams, the status of diagrams in research processes, and in particular the influence that digitality exerts on the visualization of linguistic phenomena. Finally, following Ludwik Fleck, diagrammatic practice can be related to scientific thought styles. Against the background of these considerations, five basic diagrammatic forms emerge that play an important role in the visualization of linguistic data: list, map, score, vectors, and graph/network. Lists and scores are discussed in detail in this contribution, which shows the role they play in constituting the object of study in linguistics.


Meeting of the Association for Computational Linguistics | 2016

Supersense Embeddings: A Unified Model for Supersense Interpretation, Prediction, and Utilization

Lucie Flekova; Iryna Gurevych

Coarse-grained semantic categories such as supersenses have proven useful for a range of downstream tasks such as question answering or machine translation. To date, no effort has been put into integrating the supersenses into distributional word representations. We present a novel joint embedding model of words and supersenses, providing insights into the relationship between words and supersenses in the same vector space. Using these embeddings in a deep neural network model, we demonstrate that the supersense enrichment leads to a significant improvement in a range of downstream classification tasks.


CLEF (Working Notes) | 2013

Can We Hide in the Web? Large Scale Simultaneous Age and Gender Author Profiling in Social Media: Notebook for PAN at CLEF 2013

Lucie Flekova; Iryna Gurevych

Collaboration


Dive into Lucie Flekova's collaborations.

Top Co-Authors

Iryna Gurevych
Technische Universität Darmstadt

Lyle H. Ungar
University of Pennsylvania

Jordan Carpenter
University of Pennsylvania

Salvatore Giorgi
University of Pennsylvania

Oliver Ferschke
Technische Universität Darmstadt

Eugen Ruppert
Technische Universität Darmstadt