Publication


Featured research published by Dong-Phuong Nguyen.


conference on information and knowledge management | 2012

Federated search in the wild: the combined power of over a hundred search engines

Dong-Phuong Nguyen; Thomas Demeester; Dolf Trieschnigg; Djoerd Hiemstra

Federated search has the potential of improving web search: the user becomes less dependent on a single search provider and parts of the deep web become available through a unified interface, leading to a wider variety in the retrieved search results. However, a publicly available dataset for federated search reflecting an actual web environment has been absent. As a result, it has been difficult to assess whether proposed systems are suitable for the web setting. We introduce a new test collection containing the results from more than a hundred actual search engines, ranging from large general web search engines such as Google and Bing to small domain-specific engines. We discuss the design and analyze the effect of several sampling methods. For a set of test queries, we collected relevance judgements for the top 10 results of each search engine. The dataset is publicly available and is useful for researchers interested in resource selection for web search collections, result merging and size estimation of uncooperative resources.


Computational Linguistics | 2016

Computational sociolinguistics: A survey

Dong-Phuong Nguyen; A. Seza Doğruöz; Carolyn Penstein Rosé; Franciska de Jong

Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of “computational sociolinguistics” that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction, and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions used in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges.


asia information retrieval symposium | 2012

What Snippets Say About Pages in Federated Web Search

Thomas Demeester; Dong-Phuong Nguyen; Dolf Trieschnigg; Chris Develder; Djoerd Hiemstra

What is the likelihood that a Web page is considered relevant to a query, given the relevance assessment of the corresponding snippet? Using a new federated IR test collection that contains search results from over a hundred search engines on the internet, we are able to investigate such research questions from a global perspective. Our test collection covers the main Web search engines like Google, Yahoo!, and Bing, as well as a number of smaller search engines dedicated to multimedia, shopping, etc., and as such reflects a realistic Web environment. Using a large set of relevance assessments, we are able to investigate the connection between snippet quality and page relevance. The dataset is strongly inhomogeneous, and although the assessors’ consistency is shown to be satisfying, care is required when comparing resources. To this end, a number of probabilistic quantities, based on snippet and page relevance, are introduced and evaluated.


european conference on information retrieval | 2013

Folktale classification using learning to rank

Dong-Phuong Nguyen; Dolf Trieschnigg; Mariët Theune

We present a learning to rank approach to classify folktales, such as fairy tales and urban legends, according to their story type, a concept that is widely used by folktale researchers to organize and classify folktales. A story type represents a collection of similar stories often with recurring plot and themes. Our work is guided by two frequently used story type classification schemes. Contrary to most information retrieval problems, the text similarity in this problem goes beyond topical similarity. We experiment with approaches inspired by distributed information retrieval and features that compare subject-verb-object triplets. Our system was found to be highly effective compared with a baseline system.


Information Retrieval | 2016

Predicting relevance based on assessor disagreement: analysis and practical applications for search evaluation

Thomas Demeester; Robin Aly; Djoerd Hiemstra; Dong-Phuong Nguyen; Chris Develder

Evaluation of search engines relies on assessments of search results for selected test queries, from which we would ideally like to draw conclusions in terms of relevance of the results for general (e.g., future, unknown) users. In practice, however, most evaluation scenarios only allow us to conclusively determine the relevance towards the particular assessor that provided the judgments. A factor that cannot be ignored when extending conclusions made from assessors towards users is the possible disagreement on relevance, assuming that a single gold truth label does not exist. This paper presents and analyzes the predicted relevance model (PRM), which allows predicting a particular result’s relevance for a random user, based on an observed assessment and knowledge of the average disagreement between assessors. With the PRM, existing evaluation metrics designed to measure binary assessor relevance can be transformed into more robust and effectively graded measures that evaluate relevance towards a random user. It also leads to a principled way of quantifying multiple graded or categorical relevance levels for use as gains in established graded relevance measures, such as normalized discounted cumulative gain, which nowadays often use heuristic and data-independent gain values. Given a set of test topics with graded relevance judgments, the PRM allows evaluating systems on different scenarios, such as their capability of retrieving top results, or how well they are able to filter out non-relevant ones. Its use in actual evaluation scenarios is illustrated on several information retrieval test collections.
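
The idea of replacing heuristic gain values with data-driven ones can be illustrated with a small sketch. This is not the paper's formulation of the PRM. It assumes a hypothetical set of doubly-assessed results, estimates for each grade given by one assessor the fraction of cases in which a second assessor found the result relevant, and uses those fractions as gains in normalized discounted cumulative gain. The function names, the input format, and the choice to compute the ideal ranking from the ranked list itself are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def estimate_gains(pairs):
    """Estimate a gain per grade from doubly-assessed results.

    pairs: iterable of (grade_from_assessor_a, relevant_for_assessor_b),
    a hypothetical input format. The gain of a grade is the observed
    fraction of results with that grade that the second assessor
    judged relevant.
    """
    counts = defaultdict(lambda: [0, 0])  # grade -> [relevant, total]
    for grade, relevant in pairs:
        counts[grade][0] += int(relevant)
        counts[grade][1] += 1
    return {grade: rel / total for grade, (rel, total) in counts.items()}


def dcg(gains):
    """Discounted cumulative gain with the usual log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg(ranked_grades, gain_of, k=10):
    """nDCG@k using estimated gains instead of fixed heuristic ones.

    Simplification: the ideal ranking is built from the ranked list
    itself, not from the full pool of judged results.
    """
    gains = [gain_of[grade] for grade in ranked_grades]
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0


# Toy data: grade 0 results were relevant to a second assessor 1 time in 4,
# grade 1 results 1 time in 2, grade 2 results always.
pairs = [(0, False), (0, False), (0, True), (0, False),
         (1, True), (1, False), (2, True), (2, True)]
gains = estimate_gains(pairs)
best_first = ndcg([2, 1, 0], gains)
worst_first = ndcg([0, 1, 2], gains)
```

With gains estimated this way, a grade that assessors often disagree on contributes less than a fixed heuristic gain would assign it.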


workshop on computational approaches to code switching | 2014

Predicting Code-switching in Multilingual Communication for Immigrant Communities

Evangelos E. Papalexakis; Dong-Phuong Nguyen; A. Seza Doğruöz

Immigrant communities host multilingual speakers who switch across languages and cultures in their daily communication practices. Although there are in-depth linguistic descriptions of code-switching across different multilingual communication settings, there is a need for automatic prediction of code-switching in large datasets. We use emoticons and multi-word expressions as novel features to predict code-switching in a large online discussion forum for the Turkish-Dutch immigrant community in the Netherlands. Our results indicate that multi-word expressions are powerful features to predict code-switching.


Digital Scholarship in the Humanities | 2016

The Apocalypse on Twitter

Theo Meder; Dong-Phuong Nguyen; Rilana Gravel

There was one trending topic on Twitter in December 2012 that we could have seen coming for a few years now: the New Age prophecy of the End of Times on 21 December 2012, all because some Mayan calendar supposedly ended on this date. For 2 weeks (a week before the Apocalypse and a week after) we monitored Twitter for Dutch words concerning the End of the World. We caught 52,000 tweets in 2 weeks. When did the stream of rumours peak? How many retweets were involved? Was there much micro-variation? What was the overall content of the tweets? What emotions were expressed in the tweets? How did religious people respond? And finally, how many people confessed they were truly scared because of the prophecy? These are intriguing questions that we can answer by using a few basic computational tools. Although the Apocalypse got a lot of attention in the news media, it turned out most Dutch people on Twitter took the End of Days with a grain of salt.


empirical methods in natural language processing | 2015

#SupportTheCause: Identifying Motivations to Participate in Online Health Campaigns

Dong-Phuong Nguyen; Tijs Adriaan van den Broek; Claudia Hauff; Djoerd Hiemstra; Michel Léon Ehrenhard

We consider the task of automatically identifying participants’ motivations in the public health campaign Movember and investigate the impact of the different motivations on the amount of campaign donations raised. Our classification scheme is based on the Social Identity Model of Collective Action (van Zomeren et al., 2008). We find that automatic classification based on Movember profiles is fairly accurate, while automatic classification based on tweets is challenging. Using our classifier, we find a strong relation between types of motivations and donations. Our study is a first step towards scaling-up collective action research methods.


Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology | 2016

Automatic Detection of Intra-Word Code-Switching

Dong-Phuong Nguyen; Leonie M.E.A. Cornips

Many people are multilingual and they may draw from multiple language varieties when writing their messages. This paper is a first step towards analyzing and detecting code-switching within words. We first segment words into smaller units. Then, words are identified that are composed of sequences of subunits associated with different languages. We demonstrate our method on Twitter data in which both Dutch and dialect varieties labeled as Limburgish, a minority language, are used.


empirical methods in natural language processing | 2013

Word Level Language Identification in Online Multilingual Communication

Dong-Phuong Nguyen; A. Seza Doğruöz

Collaboration


Dive into Dong-Phuong Nguyen's collaborations.

Top Co-Authors

Claudia Hauff

Delft University of Technology

Franciska de Jong

Erasmus University Rotterdam

Theo Meder

Royal Netherlands Academy of Arts and Sciences
