Bei Yu | Researchain

Publication

Featured research published by Bei Yu.

Knowledge Discovery and Data Mining | 2004

A cross-collection mixture model for comparative text mining

ChengXiang Zhai; Atulya Velivelli; Bei Yu

In this paper, we define and study a novel text mining problem, which we refer to as Comparative Text Mining (CTM). Given a set of comparable text collections, the task of comparative text mining is to discover any latent common themes across all collections as well as summarize the similarity and differences of these collections along each common theme. This general problem subsumes many interesting applications, including business intelligence and opinion summarization. We propose a generative probabilistic mixture model for comparative text mining. The model simultaneously performs cross-collection clustering and within-collection clustering, and can be applied to an arbitrary set of comparable text collections. The model can be estimated efficiently using the Expectation-Maximization (EM) algorithm. We evaluate the model on two different text data sets (i.e., a news article data set and a laptop review data set), and compare it with a baseline clustering method also based on a mixture model. Experiment results show that the model is quite effective in discovering the latent common themes across collections and performs significantly better than our baseline mixture model.
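The abstract describes a mixture model with themes shared across collections and collection-specific variants of each theme. One way to write such a word distribution is sketched below. The notation is an assumption made here for illustration and is not taken from the abstract: $\theta_B$ is a background word distribution, $\theta_j$ the common version of theme $j$, $\theta_{j,i}$ its variant specific to collection $C_i$, $\pi_{d,j}$ the weight of theme $j$ in document $d$, and $\lambda_B$, $\lambda_C$ mixing weights.

```latex
p(w \mid d, C_i) = \lambda_B \, p(w \mid \theta_B)
  + (1 - \lambda_B) \sum_{j=1}^{k} \pi_{d,j}
    \left[ \lambda_C \, p(w \mid \theta_j) + (1 - \lambda_C) \, p(w \mid \theta_{j,i}) \right]
```

Under this form, EM would alternate between inferring which component generated each word occurrence and re-estimating the word distributions and $\pi_{d,j}$ from those expected assignments.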

Cornell Hospitality Quarterly | 2013

Spreading Social Media Messages on Facebook: An Analysis of Restaurant Business-to-Consumer Communications

Linchi Kwok; Bei Yu

To determine what types of social media messages work best for hospitality firms, this study examined which types of messages gained the most “Likes” and comments on Facebook. An analysis of the number of likes and comments on 982 Facebook messages from ten restaurant chains and two independent operators revealed clear patterns. The more popular messages contained keywords about the restaurant (e.g., menu descriptions), and the less popular messages contained marketing-related words (including “winner” and “check”). Dividing the messages into four media types, namely, status (text only), link (containing a URL), video (embedding a video), and photo (showing photos), revealed that photo and status receive more likes and comments than the other two categories. Social media messages can also be categorized into two message types: sales and marketing (about two-thirds of the messages in this study) and conversational messages. Based on the number of likes and comments, conversational messages are endorsed by more Facebook users. Finally, cross-effects of media type and message type affect the number of comments a message receives. Although these results do not expressly assess Facebook users’ reactions, the guidelines developed here should help managers improve their use of Facebook, as well as provide groundwork for developing a defined typology of Facebook messages and an automatic text classifier based on machine learning techniques.

British Journal of Political Science | 2012

Language and Ideology in Congress

Daniel Diermeier; Jean-François Godbout; Bei Yu; Stefan Kaufmann

Legislative speech records from the 101st to 108th Congresses of the US Senate are analysed to study political ideologies. A widely-used text classification algorithm – Support Vector Machines (SVM) – allows the extraction of terms that are most indicative of conservative and liberal positions in legislative speeches and the prediction of senators’ ideological positions, with a 92 per cent level of accuracy. Feature analysis identifies the terms associated with conservative and liberal ideologies. The results demonstrate that cultural references appear more important than economic references in distinguishing conservative from liberal congressional speeches, calling into question the common economic interpretation of ideological differences in the US Congress.

Journal of Medical Internet Research | 2013

Crowdsourcing Participatory Evaluation of Medical Pictograms Using Amazon Mechanical Turk

Bei Yu; Matt Willis; Peiyuan Sun; Jun Wang

Background: Consumer and patient participation proved to be an effective approach for medical pictogram design, but it can be costly and time-consuming. We proposed and evaluated an inexpensive approach that crowdsourced the pictogram evaluation task to Amazon Mechanical Turk (MTurk) workers, who are usually referred to as the “turkers”.
Objective: To answer two research questions: (1) Is the turkers’ collective effort effective for identifying design problems in medical pictograms? and (2) Do the turkers’ demographic characteristics affect their performance in medical pictogram comprehension?
Methods: We designed a Web-based survey (open-ended tests) to ask 100 US turkers to type in their guesses of the meaning of 20 US pharmacopeial pictograms. Two judges independently coded the turkers’ guesses into four categories: correct, partially correct, wrong, and completely wrong. The comprehensibility of a pictogram was measured by the percentage of correct guesses, with each partially correct guess counted as 0.5 correct. We then conducted a content analysis on the turkers’ interpretations to identify misunderstandings and assess whether the misunderstandings were common. We also conducted a statistical analysis to examine the relationship between turkers’ demographic characteristics and their pictogram comprehension performance.
Results: The survey was completed within 3 days of our posting the task to the MTurk, and the collected data are publicly available in the multimedia appendix for download. The comprehensibility for the 20 tested pictograms ranged from 45% to 98%, with an average of 72.5%. The comprehensibility scores of 10 pictograms were strongly correlated to the scores of the same pictograms reported in another study that used oral response–based open-ended testing with local people. The turkers’ misinterpretations shared common errors that exposed design problems in the pictograms. Participant performance was positively correlated with their educational level.
Conclusions: The results confirmed that crowdsourcing can be used as an effective and inexpensive approach for participatory evaluation of medical pictograms. Through Web-based open-ended testing, the crowd can effectively identify problems in pictogram designs. The results also confirmed that education has a significant effect on the comprehension of medical pictograms. Since low-literate people are underrepresented in the turker population, further investigation is needed to examine to what extent turkers’ misunderstandings overlap with those elicited from low-literate people.
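The comprehensibility measure described in the Methods (percentage of correct guesses, with each partially correct guess counted as 0.5) can be illustrated with a minimal sketch. The label strings, the function name `comprehensibility`, and the example data are hypothetical and chosen here for illustration only; they are not the study's data or code.

```python
from collections import Counter

# Credit per coded category. The label strings are assumptions; the abstract
# only names the four categories and the 0.5 weight for partial credit.
WEIGHTS = {
    "correct": 1.0,
    "partially correct": 0.5,
    "wrong": 0.0,
    "completely wrong": 0.0,
}


def comprehensibility(labels):
    """Return the comprehensibility of one pictogram as a percentage."""
    if not labels:
        raise ValueError("need at least one coded guess")
    counts = Counter(labels)
    # Sum the credit earned across all coded guesses for this pictogram.
    credit = sum(WEIGHTS[label] * n for label, n in counts.items())
    return 100.0 * credit / len(labels)


# Made-up example: 10 coded guesses for a single pictogram.
example = ["correct"] * 6 + ["partially correct"] * 2 + ["wrong", "completely wrong"]
print(comprehensibility(example))
```

In this made-up example, six correct guesses and two partially correct guesses out of ten earn 7 credits, so the score is 70%.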

Literary and Linguistic Computing | 2014

Language and gender in Congressional speech

Bei Yu

This study draws from a large corpus of Congressional speeches from the 101st to the 110th Congress (1989–2008), to examine gender differences in language use in a setting of political debates. Female legislators’ speeches demonstrated characteristics of both a feminine language style (e.g. more use of emotion words, fewer articles) and a masculine one (e.g. more nouns and long words, fewer personal pronouns). A trend analysis found that these gender differences have consistently existed in the Congressional speeches over the past 20 years, regardless of the topic of debate. The findings lend support to the argument that gender differences in language use persist in professional settings like the floor of Congress.

Tourism and Hospitality Research | 2016

Taxonomy of Facebook messages in business-to-consumer communications: What really works?

Linchi Kwok; Bei Yu

This research combines machine learning and human intelligence to analyze 2654 Facebook messages initiated by 26 hospitality companies to develop the taxonomy of Facebook messages in business-to-consumer (B2C) communications. Facebook messages can be classified into two broad message types: Sales/Marketing messages, with five sub-categories of Social Responsibility, Direct Boasting, Indirect Boasting, Product Highlight, and Campaign/Sales, and Conversational messages, with four sub-categories of Call for Action, Provoke Feedback, Advice/Suggestions, and Updates. By comparison, Conversational messages received more “Likes” and comments than Sales/Marketing messages. Direct Boasting, Product Highlight, Call for Action, Provoke Feedback, Advice/Suggestions, and Updates received more “Likes” than other types; Provoke Feedback and Call for Action received more comments. As compared to current literature, the results allow managers to advance more specific strategies to better engage with Facebook users and provide a more thorough taxonomy for additional analysis on companies’ B2C messages on other social media websites.

Worldwide Hospitality and Tourism Themes | 2015

Documenting business-to-consumer (B2C) communications on Facebook: what have changed among restaurants and consumers?

Linchi Kwok; Feifei Zhang; Yung-Kuei Huang; Bei Yu; Prabhukrishna Maharabhushanam; Kasturi Rangan

Purpose – The purpose of this study is to document how restaurants’ business-to-consumer communication strategies evolved on Facebook over time and how consumers’ reactions to a variety of Facebook messages changed over time. Design/methodology/approach – This study analyzed 2,463 Facebook messages posted by seven quick-service restaurant chains and three casual-dining restaurant chains in the fourth quarter of 2010, 2012 and 2014. ANOVA and post hoc t-tests were used to compare the differences among four media types (photo, status update, video and hyperlink) in terms of their usage by companies and Facebook users’ reactions to these messages (measured by number of “Likes”, number of comments and number of shares). Findings – Over the three periods under observation, there was a substantial decrease in status updates by restaurants and a dramatic increase in photo updates. Photo remained the most “popular” media type, receiving the most “Likes”, comments and shares from consumers. Video was not cons...
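The comparison of media types by ANOVA mentioned in this abstract can be illustrated with a minimal one-way ANOVA sketch. The function name `one_way_anova_f` and the sample numbers are hypothetical; the study's actual data and analysis code are not given in the abstract, and this sketch computes only the F statistic, without a p-value or post hoc tests.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need at least two non-empty groups and n > k")

    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    # If this is zero, the F ratio is undefined and the division below fails.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up "Likes" per message for three media types (not the study's data).
likes_by_media_type = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova_f(likes_by_media_type)
print(f_stat, df_b, df_w)
```

A large F relative to the F distribution with the returned degrees of freedom would suggest that mean reactions differ across media types; pairwise post hoc tests would then indicate which types differ.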

Proceedings of the 2011 iConference | 2011

The emotional world of health online communities

Bei Yu

This article presents a preliminary study on the emotional world of health online communities. Using sentiment analysis and natural language processing techniques, this study aims to (1) examine the strength of various kinds of emotions (positivity, optimism, negativity, anxiety, anger, and sadness) in online health forum discussions, and (2) compare the emotional status and expression of forum participants under different roles, such as askers and answerers, men and women, and caregivers and patients. This study is expected to improve the understanding of the emotional communication in online health communities.

International Conference on Data Mining | 2002

Concept tree based clustering visualization with shaded similarity matrices

Jun Wang; Bei Yu; Les Gasser

One problem with existing clustering methods is that the interpretation of clusters may be difficult. Two different approaches have been used to solve this problem: conceptual clustering in machine learning and clustering visualization in statistics and graphics. The purpose of this paper is to investigate the benefits of combining clustering visualization and conceptual clustering to obtain better cluster interpretations. In our research we have combined concept trees for conceptual clustering with shaded similarity matrices for visualization. Experimentation shows that the two interpretation approaches can complement each other to help us understand data better.

Conference on Recommender Systems | 2015

3rd International Workshop on News Recommendation and Analytics (INRA 2015)

Jon Atle Gulla; Bei Yu; Özlem Özgöbek; Nafiseh Shabib

The 3rd International Workshop on News Recommendation and Analytics (INRA 2015) is held in conjunction with the RecSys 2015 Conference in Vienna, Austria. This paper presents a brief summary of INRA 2015. The workshop aims to create an interdisciplinary community that addresses design issues in news recommender systems and news analytics, and to promote fruitful collaboration between researchers, media companies and practitioners. The program includes a keynote speaker and an invited demo presentation in addition to the 4 papers accepted at the workshop.
