Thin Nguyen
Deakin University
Publication
Featured research published by Thin Nguyen.
Knowledge and Information Systems | 2013
Thin Nguyen; Dinh Q. Phung; Brett Adams; Svetha Venkatesh
Significant world events often cause the behavioral convergence of the expression of shared sentiment. This paper examines the use of the blogosphere as a framework to study user psychological behaviors, using their sentiment responses as a form of ‘sensor’ to infer real-world events of importance automatically. We formulate a novel temporal sentiment index function using a quantitative measure of the valence values of affect-bearing words in blog posts, where the set of affect-bearing words is drawn from psychological research on emotion structure. The annual local minimum and maximum of the proposed sentiment signal function are used to extract significant events of the year, and the corresponding blog posts are further analyzed using topic modeling tools to understand their content. The paper then examines the correlation of the topics discovered with world news events reported by the mainstream news service provider, Cable News Network, and by using the Google search engine. Next, aiming to understand sentiment at a finer granularity over time, we propose a stochastic burst detection model, extended from the work of Kleinberg, to work incrementally with stream data. The proposed model is then used to extract sentiment bursts occurring within a specific mood label (for example, a burst of observing ‘shocked’). The blog posts at those time indices are analyzed to extract topics, and these are compared to real-world news events. Our comprehensive set of experiments conducted on a large-scale set of 12 million posts from LiveJournal shows that the proposed sentiment index function coincides well with significant world events, while bursts in sentiment allow us to locate finer-grained external world events.
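The sentiment index idea in this abstract can be illustrated with a minimal sketch. It assumes the index for a day is the mean valence of the lexicon words found in that day's posts, which is one plausible reading of the abstract rather than the paper's exact formula. The `VALENCE` table, the function names, and the sample posts are all hypothetical.

```python
from collections import defaultdict

# Hypothetical valence scores on a 1-9 scale (illustrative values only,
# not taken from any published lexicon).
VALENCE = {"happy": 8.0, "love": 8.5, "sad": 2.0, "shocked": 3.0, "angry": 2.5}

def sentiment_index(posts):
    """Map each day to the mean valence of the lexicon words in that day's posts."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, text in posts:
        for word in text.lower().split():
            if word in VALENCE:
                totals[day] += VALENCE[word]
                counts[day] += 1
    # Days with no lexicon words are left out of the index.
    return {day: totals[day] / counts[day] for day in counts}

def extremes(index):
    """Return the days with the lowest and the highest index value."""
    low = min(index, key=index.get)
    high = max(index, key=index.get)
    return low, high

# Made-up posts, keyed by a day label.
posts = [
    ("day-1", "happy happy love"),
    ("day-2", "shocked sad"),
    ("day-3", "happy sad"),
]
index = sentiment_index(posts)
low, high = extremes(index)
print(index, low, high)
```

In this toy input the lowest-scoring day is `day-2` and the highest is `day-1`; in the setting the abstract describes, the posts from such extreme days would then be passed to a topic model.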
knowledge discovery and data mining | 2010
Thin Nguyen; Dinh Q. Phung; Brett Adams; Truyen Tran; Svetha Venkatesh
Automatic data-driven analysis of mood from text is an emerging problem with many potential applications. Unlike generic text categorization, mood classification based on textual features is complicated by various factors, including its context- and user-sensitive nature. We present a comprehensive study of different feature selection schemes in machine learning for the problem of mood classification in weblogs. Notably, we introduce the novel use of a feature set based on the affective norms for English words (ANEW) lexicon studied in psychology. This feature set has the advantage of being computationally efficient while maintaining accuracy comparable to other state-of-the-art feature sets experimented with. In addition, we present results of data-driven clustering on a dataset of over 17 million blog posts with mood groundtruth. Our analysis reveals an interesting, and readily interpreted, structure to the linguistic expression of emotion, one that comprises valuable empirical evidence in support of existing psychological models of emotion, and in particular the dipoles pleasure–displeasure and activation–deactivation.
Knowledge and Information Systems | 2014
Thin Nguyen; Dinh Q. Phung; Brett Adams; Svetha Venkatesh
We present a large-scale mood analysis of social media texts. We organise the paper in three parts: (1) we address the problem of feature selection and classification of mood in the blogosphere; (2) we extract global mood patterns at different levels of aggregation from a large-scale data set of approximately 18 million documents; and (3) we extract the mood trajectory for an egocentric user and study how it can be used to detect subtle emotion signals in a user-centric manner, supporting discovery of hyper-groups of communities based on sentiment information. For mood classification, two feature sets proposed in psychology are used, showing that these features are efficient, do not require a training phase, and yield classification results comparable to state-of-the-art supervised feature selection schemes. On mood patterns, empirical results for mood organisation in the blogosphere are provided, analogous to the structure of human emotion proposed independently in the psychology literature. On community structure discovery, a sentiment-based approach can yield useful insights into community formation.
international conference on multimedia and expo | 2013
Thin Nguyen; Dinh Q. Phung; Svetha Venkatesh
The growing number of individuals with autism spectrum disorder (ASD) requires continuous support and care. With the popularity of social media, online communities of people affected by ASD have emerged. This paper presents an analysis of these online communities through understanding aspects that differentiate such communities. In this paper, the aspects given are not expressed in terms of friendship, exchange of information, social support or recreation, but rather with regard to the topics and linguistic styles that people express in their online writing. Using data collected unobtrusively from LiveJournal, we analyze posts made by ten autism communities in conjunction with those made by a control group of standard communities. Significant differences have been found between autism and control communities when characterized by latent topics of discussion and psycholinguistic features. Latent topics are found to have greater predictive power than linguistic features when classifying blog posts as belonging to either an autism or a control community. This study suggests that data mining of online blogs has the potential to detect clinically meaningful data. It opens the door to possibilities including sentinel risk surveillance and harnessing the power in diverse large datasets.
web information systems engineering | 2011
Thin Nguyen; Dinh Q. Phung; Brett Adams; Svetha Venkatesh
Social media corpora, including the textual output of blogs, forums, and messaging applications, provide fertile ground for linguistic analysis: material diverse in topic and style, and at Web scale. We investigate manifest properties of textual messages, including latent topics, psycholinguistic features, and author mood, of a large corpus of blog posts, to analyze the impact of age, emotion, and social connectivity. These properties are found to be significantly different across the examined cohorts, which suggests discriminative features for a number of useful classification tasks. We build binary classifiers for old versus young bloggers, social versus solo bloggers, and happy versus sad posts with high performance. Analysis of discriminative features shows that age turns upon choice of topic, whereas sentiment orientation is evidenced by linguistic style. Good prediction is achieved for social connectivity using topic and linguistic features, leaving tagged mood a modest role in all classifications.
IEEE Transactions on Multimedia | 2013
Dinh Q. Phung; Sunil Kumar Gupta; Thin Nguyen; Svetha Venkatesh
Social capital indicative of community interaction and support is intrinsically linked to mental health. Increasing online presence is now the norm. Whilst social capital and its impact on social networks have been examined, its underlying connection to emotional responses such as mood has not been investigated. This paper studies this phenomenon, revisiting the concept of “online social capital” in social media communities using measurable aspects of social participation and social support. We establish the link between online capital derived from social media and mood, demonstrating results for different cohorts of social capital and social connectivity. We use novel Bayesian nonparametric factor analysis to extract the shared and individual factors in mood transition across groups of users of different levels of connectivity, quantifying patterns and degree of mood transitions. Using more than 1.6 million users from LiveJournal, we show quantitatively that groups with lower social capital have fewer positive moods and more negative moods than groups with higher social capital. We show similar effects in mood transitions. We establish a framework of how social media can be used as a barometer for mood. The significance lies in the importance of online social capital to overall mental well-being. In establishing the link between mood and social capital in online communities, this work may suggest the foundation of new systems to monitor online mental well-being.
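To make the notion of a mood transition concrete, the sketch below estimates transition probabilities between consecutive mood tags in a single user's posting history. This is only a simple empirical estimate of the kind of quantity such an analysis could start from; it is not the Bayesian nonparametric factor analysis named in the abstract. The function name and the mood sequence are hypothetical.

```python
from collections import Counter

def transition_probs(sequence):
    """Estimate P(next mood | current mood) from consecutive pairs in a sequence."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    # Every element except the last one is the start of exactly one transition.
    from_counts = Counter(sequence[:-1])
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}

# Made-up sequence of mood tags for one user, in posting order.
moods = ["happy", "happy", "sad", "happy", "sad", "sad"]
probs = transition_probs(moods)
for (current, following), p in sorted(probs.items()):
    print(f"{current} -> {following}: {p:.2f}")
```

Comparing such estimates between cohorts (for example, users with low versus high connectivity) is the sort of contrast the abstract reports, although the paper obtains its results with a different model.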
Multimedia Tools and Applications | 2017
Thin Nguyen; Bridianne O'Dea; Mark E. Larsen; Dinh Q. Phung; Svetha Venkatesh; Helen Christensen
Depression is a highly prevalent mental health problem and is a co-morbidity of other mental, physical, and behavioural disorders. The internet allows individuals who are depressed, or caring for those who are depressed, to connect with others via online communities; however, the characteristics of these discussions have not yet been fully explored. This work aims to explore the textual cues of online communities interested in depression. A total of 5,000 posts were randomly selected from 24 online communities. Five subgroups of online communities were identified: Depression, Bipolar Disorder, Self-Harm, Grief/Bereavement, and Suicide. Psycholinguistic features and content topics were extracted from the posts and analysed. Machine learning techniques were used to discriminate the online conversations in the depression communities from the other subgroups. Topics and psycholinguistic features were found to be highly valid predictors of community subgroup. Clear discrimination between linguistic features and topics, alongside good predictive power, is an important step in understanding social media and its use in mental health.
Journal of Epidemiology and Community Health | 2015
Thin Nguyen; Truyen Tran; Wei Luo; Sunil Kumar Gupta; Santu Rana; Dinh Q. Phung; Melanie Nichols; Lynne Millar; Svetha Venkatesh; S. Allender
Background: The WHO framework for non-communicable disease (NCD) describes risks and outcomes comprising the majority of the global burden of disease. These factors are complex and interact at biological, behavioural, environmental and policy levels, presenting challenges for population monitoring and intervention evaluation. This paper explores the utility of machine learning methods applied to population-level web search activity behaviour as a proxy for chronic disease risk factors. Methods: Web activity output for each element of the WHO's Causes of NCD framework was used as a basis for identifying relevant web search activity from 2004 to 2013 for the USA. Multiple linear regression models with regularisation were used to generate predictive algorithms, mapping web search activity to Centers for Disease Control and Prevention (CDC) measured risk factor/disease prevalence. Predictions for subsequent target years not included in the model derivation were tested against CDC data from population surveys using Pearson correlation and Spearman's r. Results: For 2011 and 2012, predicted prevalence was very strongly correlated with measured risk data ranging from fruits and vegetables consumed (r=0.81; 95% CI 0.68 to 0.89) to alcohol consumption (r=0.96; 95% CI 0.93 to 0.98). Mean difference between predicted and measured differences by State ranged from 0.03 to 2.16. Spearman's r for state-wise predicted versus measured prevalence varied from 0.82 to 0.93. Conclusions: The high predictive validity of web search activity for NCD risk has potential to provide real-time information on population risk during policy implementation and other population-level NCD prevention efforts.
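The evaluation step described in the Methods, comparing predicted with measured prevalence using Pearson correlation and Spearman's r, can be sketched in plain Python as follows. The helper names and the numbers are hypothetical and are not taken from the study; the regression models themselves are not reproduced here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up state-level prevalence values (percent), for illustration only.
predicted = [20.0, 25.0, 30.0, 35.0]
measured = [21.0, 24.0, 33.0, 34.0]
r = pearson(predicted, measured)
rho = spearman(predicted, measured)
print(round(r, 3), round(rho, 3))
```

Pearson correlation reflects how closely the predictions track the measured values linearly, while Spearman's r only checks that states are ordered the same way, which is why the two are reported side by side.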
web information systems engineering | 2014
Bo Dao; Thin Nguyen; Dinh Q. Phung; Svetha Venkatesh
Depression afflicts one in four people during their lives. Several studies have shown that for the isolated and mentally ill, the Web and social media provide effective platforms for support and treatment, as well as for acquiring a scientific, clinical understanding of this mental condition. More and more individuals affected by depression join online communities to seek information, express themselves, share their concerns and look for support [12]. For the first time, we collect and study a large online depression community of more than 12,000 active members from LiveJournal. We examine the effect of mood, social connectivity and age on the online messages authored by members of an online depression community. The posts are considered in two aspects: what is written (topic) and how it is written (language style). We use statistical and machine learning methods to discriminate the posts made by bloggers in low versus high valence mood, in different age categories and with different degrees of social connectivity. Using statistical tests, language styles are found to be significantly different between low and high valence cohorts, whilst topics are significantly different between people with different degrees of social connectivity. High performance is achieved for low versus high valence post classification using writing style as features. The findings suggest the potential of using social media in depression screening, especially in online settings.
IEEE Transactions on Affective Computing | 2015
Thin Nguyen; Thi V. Duong; Svetha Venkatesh; Dinh Q. Phung
The Internet has provided an increasingly popular platform for individuals to voice their thoughts, and for like-minded people to share stories. This unintentionally leaves traces of the characteristics of individuals and communities, which are often difficult to collect in traditional studies. Individuals with autism are one such case: the Internet could facilitate even more communication, given that the social-spatial distance it offers is a characteristic preference of individuals with autism. Previous studies examined the traces left in the posts of online autism communities (Autism) in comparison with other online communities (Control). This work further investigates these online populations through the contents of not only their posts but also their comments. We first compare the Autism and Control blogs based on three features: topics, language styles and affective information. The autism groups are then further examined, based on the same three features, by looking at their personal (Personal) and community (Community) blogs separately. Machine learning and statistical methods are used to discriminate blog contents in both cases. All three features are found to be significantly different between Autism and Control, and between autism Personal and Community. These features also show good indicative power in prediction of autism blogs in both personal and community settings.