Daniel Preotiuc-Pietro

Publication

Featured research published by Daniel Preotiuc-Pietro.

Meeting of the Association for Computational Linguistics | 2016

Exploring Stylistic Variation with Age and Income on Twitter

Lucie Flekova; Daniel Preotiuc-Pietro; Lyle H. Ungar

Writing style allows NLP tools to adjust to the traits of an author. In this paper, we explore the relation between stylistic and syntactic features and authors’ age and income. We confirm our hypothesis that for numerous feature types writing style is predictive of income even beyond age. We analyze the predictive power of writing style features in a regression task on two data sets of around 5,000 Twitter users each. Additionally, we use our validated features to study daily variations in writing style of users from distinct income groups. Temporal stylistic patterns not only provide novel psychological insight into user behavior, but are useful for future research and applications in social media.

Information Processing and Management | 2017

Sub-story detection in Twitter with hierarchical Dirichlet processes

P. K. Srijith; Mark Hepple; Kalina Bontcheva; Daniel Preotiuc-Pietro

Social media has now become the de facto information source on real world events. The challenge, however, due to the high volume and velocity nature of social media streams, is in how to follow all posts pertaining to a given event over time, a task referred to as story detection. Moreover, there are often several different stories pertaining to a given event, which we refer to as sub-stories and the corresponding task of their automatic detection as sub-story detection. This paper proposes hierarchical Dirichlet processes (HDP), a probabilistic topic model, as an effective method for automatic sub-story detection. HDP can learn sub-topics associated with sub-stories which enables it to handle subtle variations in sub-stories. It is compared with state-of-the-art story detection approaches based on locality sensitive hashing and spectral clustering. We demonstrate the superior performance of HDP for sub-story detection on real world Twitter data sets using various evaluation measures. The ability of HDP to learn sub-topics helps it to recall the sub-stories with high precision. Another contribution of this paper is in demonstrating that the conversational structures within the Twitter stream can be used to improve sub-story detection performance significantly.

Artificial Intelligence Review | 2014

Unsupervised word sense disambiguation with N-gram features

Daniel Preotiuc-Pietro; Florentina Hristea

The present paper concentrates on the issue of feature selection for unsupervised word sense disambiguation (WSD) performed with an underlying Naïve Bayes model. It introduces web N-gram features which, to our knowledge, are used for the first time in unsupervised WSD. While creating features from unlabeled data, we are “helping” a simple, basic knowledge-lean disambiguation algorithm to significantly increase its accuracy as a result of receiving easily obtainable knowledge. The performance of this method is compared to that of others that rely on completely different feature sets. Test results concerning nouns, adjectives and verbs show that web N-gram feature selection is a reliable alternative to previously existing approaches, provided that a “quality list” of features, adapted to the part of speech, is used.
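For readers unfamiliar with the underlying model, the following is a minimal sketch of unsupervised sense clustering with a Naive Bayes mixture trained by expectation-maximization. It is not the authors' implementation: the function name `em_naive_bayes`, the bag-of-words context representation, the add-one smoothing, and the random initialization are all assumptions made for illustration, and the web N-gram feature selection described in the abstract is not reproduced here.

```python
import math
import random


def em_naive_bayes(contexts, num_senses, iterations=25, seed=0):
    """Cluster bag-of-words contexts into senses with EM over a Naive Bayes mixture.

    Hypothetical helper for illustration; returns one list of sense
    probabilities per context.
    """
    vocabulary = sorted({word for context in contexts for word in context})
    rng = random.Random(seed)

    # Start from random soft assignments, since no sense labels are available.
    responsibilities = []
    for _ in contexts:
        weights = [rng.random() + 1e-3 for _ in range(num_senses)]
        total = sum(weights)
        responsibilities.append([w / total for w in weights])

    for _ in range(iterations):
        # M-step: re-estimate sense priors and per-sense word probabilities.
        priors = [
            max(sum(r[s] for r in responsibilities) / len(contexts), 1e-12)
            for s in range(num_senses)
        ]
        word_probs = []
        for s in range(num_senses):
            counts = {word: 1.0 for word in vocabulary}  # add-one smoothing
            for context, r in zip(contexts, responsibilities):
                for word in context:
                    counts[word] += r[s]
            total = sum(counts.values())
            word_probs.append({word: c / total for word, c in counts.items()})

        # E-step: posterior over senses for each context, computed in log space.
        responsibilities = []
        for context in contexts:
            log_scores = [
                math.log(priors[s])
                + sum(math.log(word_probs[s][w]) for w in context)
                for s in range(num_senses)
            ]
            peak = max(log_scores)
            scores = [math.exp(score - peak) for score in log_scores]
            total = sum(scores)
            responsibilities.append([score / total for score in scores])

    return responsibilities


# Toy contexts around an ambiguous word; the context words are made up.
toy_contexts = [
    ["money", "loan", "interest"],
    ["loan", "money", "deposit"],
    ["river", "water", "shore"],
    ["water", "river", "mud"],
]
sense_posteriors = em_naive_bayes(toy_contexts, num_senses=2)
```

In this framing, feature selection amounts to choosing which context words enter each bag, which is the step the paper addresses with web N-grams.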

Social Psychological and Personality Science | 2017

Real Men Don’t Say “Cute”: Using Automatic Language Analysis to Isolate Inaccurate Aspects of Stereotypes

Jordan Carpenter; Daniel Preotiuc-Pietro; Lucie Flekova; Salvatore Giorgi; Courtney Hagan; Margaret L. Kern; Anneke Buffone; Lyle H. Ungar; Martin E. P. Seligman

People associate certain behaviors with certain social groups. These stereotypical beliefs consist of both accurate and inaccurate associations. Using large-scale, data-driven methods with social media as a context, we isolate stereotypes by using verbal expression. Across four social categories—gender, age, education level, and political orientation—we identify words and phrases that lead people to incorrectly guess the social category of the writer. Although raters often correctly categorize authors, they overestimate the importance of some stereotype-congruent signal. Findings suggest that data-driven approaches might be a valuable and ecologically valid tool for identifying even subtle aspects of stereotypes and highlighting the facets that are exaggerated or misapplied.

Conference on Information and Knowledge Management | 2016

Studying the Dark Triad of Personality through Twitter Behavior

Daniel Preotiuc-Pietro; Jordan Carpenter; Salvatore Giorgi; Lyle H. Ungar

Research into the darker traits of human nature is attracting growing interest, especially in the context of increased social media usage, which allows users to express themselves to a wider online audience. We study the extent to which the standard model of dark personality -- the dark triad -- consisting of narcissism, psychopathy and Machiavellianism, is related to observable Twitter behavior such as platform usage, posted text and profile image choice. Our results show that we can map various behaviors to psychological theory and study new aspects related to social media usage. Finally, we build a machine learning algorithm that predicts the dark triad of personality in out-of-sample users with reliable accuracy.

Meeting of the Association for Computational Linguistics | 2014

Gaussian Processes for Natural Language Processing

Trevor Cohn; Daniel Preotiuc-Pietro; Neil D. Lawrence

Gaussian Processes (GPs) are a powerful modelling framework incorporating kernels and Bayesian inference, and are recognised as state-of-the-art for many machine learning tasks. Despite this, GPs have seen few applications in natural language processing (notwithstanding several recent papers by the authors). We argue that the GP framework offers many benefits over commonly used machine learning frameworks, such as linear models (logistic regression, least squares regression) and support vector machines. Moreover, GPs are extremely flexible and can be incorporated into larger graphical models, forming an important additional tool for probabilistic inference. Notably, GPs are one of the few models which support analytic Bayesian inference, avoiding the many approximation errors that plague approximate inference techniques in common use for Bayesian models (e.g. MCMC, variational Bayes). GPs accurately model not just the underlying task, but also the uncertainty in the predictions, such that uncertainty can be propagated through pipelines of probabilistic components. Overall, GPs provide an elegant, flexible and simple means of probabilistic inference and are well overdue for consideration by the NLP community. This tutorial will focus primarily on regression and classification, both fundamental techniques of widespread use in the NLP community. Within NLP, linear models are near ubiquitous, because they provide good results for many tasks, support efficient inference (including dynamic programming in structured prediction) and support simple parameter interpretation. However, linear models are inherently limited in the types of relationships between variables they can model. Often
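As a rough illustration of the analytic inference mentioned in the abstract, here is a minimal sketch of exact GP regression on scalar inputs. It is not taken from the tutorial: the squared-exponential kernel, its hyperparameters, the noise level, and the function names (`rbf_kernel`, `gp_predict`, and the solver helpers) are assumptions chosen for illustration. The posterior mean and variance come out in closed form through a Cholesky factorization, with no sampling or variational approximation.

```python
import math


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def solve_lower(lower, rhs):
    """Forward substitution: solve L x = rhs."""
    x = []
    for i, row in enumerate(lower):
        x.append((rhs[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def solve_lower_transpose(lower, rhs):
    """Back substitution: solve L^T x = rhs."""
    n = len(lower)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (rhs[i] - s) / lower[i][i]
    return x


def gp_predict(train_x, train_y, test_x, noise=1e-2):
    """Exact GP posterior (mean, variance) of the latent function at each test input."""
    n = len(train_x)
    # Gram matrix of the training inputs plus observation noise on the diagonal.
    gram = [
        [rbf_kernel(train_x[i], train_x[j]) + (noise if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    lower = cholesky(gram)
    # alpha = (K + noise * I)^{-1} y, obtained from two triangular solves.
    alpha = solve_lower_transpose(lower, solve_lower(lower, train_y))
    predictions = []
    for x_star in test_x:
        k_star = [rbf_kernel(x_star, x) for x in train_x]
        mean = sum(k * a for k, a in zip(k_star, alpha))
        v = solve_lower(lower, k_star)
        variance = max(rbf_kernel(x_star, x_star) - sum(e * e for e in v), 0.0)
        predictions.append((mean, variance))
    return predictions


# Toy data: noisy-free samples of sin(x); the second test point lies far from the data.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.sin(x) for x in xs]
posterior = gp_predict(xs, ys, [1.5, 10.0])
```

The predictive variance grows as a test input moves away from the training data, which is the kind of uncertainty estimate the abstract describes as something that can be propagated through a pipeline.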

Anthrozoos | 2017

Personality Profiles of Users Sharing Animal-related Content on Social Media

Courtney Hagan; Jordan Carpenter; Lyle H. Ungar; Daniel Preotiuc-Pietro

Animal preferences are thought to be linked with more salient psychological traits of people, and most research examining owner personality as a differentiating factor has obtained mixed results. The rise in usage of social networks offers users a new medium in which they can broadcast their preferences and activities, including about animals. In two studies, the first on Facebook status updates and the second on images shared on Twitter, we revisited the link between Big Five personality traits and animal preference, specifically focusing on cats and dogs. We used automatic content analysis of text and images to unobtrusively measure preference for animals online using large datasets. In study 1, a dataset of Facebook status updates (n = 72,559) was analyzed and it was found that those who mentioned ownership of a cat (by using the phrase “my cat” (n = 5,053)) in their status updates were more open to experience, introverted, neurotic, and less conscientious when compared with the general population. Users mentioning ownership of a dog (by using “my dog” (n = 8,045)) were only less conscientious compared with the rest of the population. In study 2, a dataset of Twitter images was analyzed and revealed that users who featured either cat (n = 1,036) or dog (n = 1,499) images in their tweets were more neurotic, less conscientious, and less agreeable than those who did not. In addition, posting images containing cats was specific to users higher in openness, while posting images featuring dogs was associated with users higher in extraversion. These findings taken together align with some previous findings on the relationship between owner personality and animal preference, additionally highlighting some social media-specific behaviors.

International Conference on Weblogs and Social Media | 2012