Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Chee Wee Leong is active.

Publication


Featured research published by Chee Wee Leong.


International Conference on Multimodal Interfaces | 2014

Towards Automated Assessment of Public Speaking Skills Using Multimodal Cues

Lei Chen; Gary Feng; Jilliam Joe; Chee Wee Leong; Christopher Kitchen; Chong Min Lee

Traditional assessments of public speaking skills rely on human scoring. We report an initial study on the development of an automated scoring model for public speaking performances using multimodal technologies. Task design, rubric development, and human rating were conducted according to standards in educational assessment. An initial corpus of 17 speakers performing 4 speaking tasks was collected using audio, video, and 3D motion capture devices. A scoring model based on basic features of speech content, speech delivery, and hand, body, and head movements significantly predicts human ratings, suggesting the feasibility of using multimodal technologies in the assessment of public speaking skills.
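
Below is a minimal sketch of the kind of scoring model this abstract describes: a linear regression over fused delivery and movement features, evaluated against human ratings. The feature names and data are hypothetical placeholders, not the study's actual features or corpus.

```python
# Minimal sketch of a fused multimodal scoring model; features and data are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from scipy.stats import pearsonr

# Hypothetical per-response features: speaking rate, pause ratio,
# pitch variation, hand motion energy, head rotation range
rng = np.random.default_rng(0)
X = rng.random((68, 5))                  # 17 speakers x 4 tasks = 68 responses
y = 2.0 + X @ np.array([1.5, -2.0, 1.0, 0.8, 0.5]) + rng.normal(0, 0.3, 68)  # simulated ratings

# Cross-validated predictions from a simple fused-feature regression model
pred = cross_val_predict(LinearRegression(), X, y, cv=5)
r, _ = pearsonr(pred, y)
print(f"Correlation with (simulated) human ratings: {r:.2f}")
```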


International Conference on Multimodal Interfaces | 2015

Evaluating Speech, Face, Emotion and Body Movement Time-series Features for Automated Multimodal Presentation Scoring

Vikram Ramanarayanan; Chee Wee Leong; Lei Chen; Gary Feng; David Suendermann-Oeft

We analyze how fusing features obtained from different multimodal data streams such as speech, face, body movement, and emotion tracks can be applied to the scoring of multimodal presentations. We compute both time-aggregated and time-series-based features from these data streams: the former are statistical functionals and other cumulative features computed over the entire time series, while the latter, dubbed histograms of co-occurrences, capture how different prototypical body postures or facial configurations co-occur within different time lags of each other over the evolution of the multimodal, multivariate time series. We examine the relative utility of these features, along with curated speech-stream features, in predicting human-rated scores of multiple aspects of presentation proficiency. We find that different modalities are useful in predicting different aspects, even outperforming a naive human inter-rater agreement baseline for a subset of the aspects analyzed.
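
As a rough illustration of the time-series features described above, the sketch below computes a simple histogram-of-co-occurrences representation from a sequence of hypothetical prototypical-posture labels; the cluster labels, lags, and normalization are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative histogram-of-co-occurrences features, assuming each video frame
# has already been assigned to a prototypical posture cluster (e.g., via k-means).
import numpy as np

def histogram_of_cooccurrences(labels, n_clusters, lags=(1, 5, 10)):
    """Count how often cluster pairs (i, j) co-occur at each time lag."""
    feats = []
    for lag in lags:
        counts = np.zeros((n_clusters, n_clusters))
        for t in range(len(labels) - lag):
            counts[labels[t], labels[t + lag]] += 1
        feats.append(counts.flatten() / max(len(labels) - lag, 1))  # normalize per lag
    return np.concatenate(feats)

frame_labels = np.random.randint(0, 4, size=300)   # hypothetical posture labels per frame
features = histogram_of_cooccurrences(frame_labels, n_clusters=4)
print(features.shape)   # (len(lags) * n_clusters**2,) -> (48,)
```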


Proceedings of the Third Workshop on Metaphor in NLP | 2015

Supervised Word-Level Metaphor Detection: Experiments with Concreteness and Reweighting of Examples

Beata Beigman Klebanov; Chee Wee Leong; Michael Flor

We present a supervised machine learning system for word-level classification of all content words in a running text as being metaphorical or non-metaphorical. The system provides a substantial improvement upon a previously published baseline, using re-weighting of the training examples and using features derived from a concreteness database. We observe that while the first manipulation was very effective, the second was only slightly so. Possible reasons for these observations are discussed.
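
A hedged, toy-scale sketch of the two ingredients mentioned above, word-level classification with re-weighted examples and a concreteness feature, is given below; the tiny lexicon and training items are invented, and scikit-learn's class_weight="balanced" stands in for the paper's re-weighting scheme.

```python
# Toy word-level metaphor classifier; lexicon, features, and data are hypothetical,
# and class_weight="balanced" is a stand-in for the paper's example re-weighting.
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction import DictVectorizer

concreteness = {"grasp": 4.6, "idea": 1.6, "mountain": 4.9, "climb": 4.4}  # toy lexicon

def word_features(word, pos):
    return {"w=" + word.lower(): 1.0,
            "pos=" + pos: 1.0,
            "concreteness": concreteness.get(word.lower(), 2.5)}

train = [("grasp", "VERB", 1), ("climb", "VERB", 0),
         ("idea", "NOUN", 0), ("mountain", "NOUN", 0)]

vec = DictVectorizer()
X = vec.fit_transform([word_features(w, p) for w, p, _ in train])
y = [label for _, _, label in train]

# "balanced" up-weights the rarer metaphorical examples during training
clf = LogisticRegression(class_weight="balanced").fit(X, y)
print(clf.predict(vec.transform([word_features("grasp", "VERB")])))
```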


Proceedings of the 2014 ACM workshop on Multimodal Learning Analytics Workshop and Grand Challenge | 2014

Using Multimodal Cues to Analyze MLA'14 Oral Presentation Quality Corpus: Presentation Delivery and Slides Quality

Lei Chen; Chee Wee Leong; Gary Feng; Chong Min Lee

The ability to create presentation slides and deliver them effectively to convey information to an audience is increasingly important, particularly for both academic and professional career success. We envision that multimodal sensing and machine learning techniques can be employed to evaluate, and potentially help improve, the quality of the content and delivery of public presentations. To this end, we report a study using the Oral Presentation Quality Corpus provided by the 2014 Multimodal Learning Analytics (MLA) Grand Challenge. A set of multimodal features was extracted from slides, speech, posture, and hand gestures, as well as head poses. We also examined the dimensionality of the human scores, which could be concisely represented by two Principal Component (PC) scores: comp1 for delivery skills and comp2 for slides quality. Several machine learning experiments were performed to predict the two PC scores from multimodal features. Our experiments suggest that multimodal cues can predict human scores on presentation tasks, and that a scoring model comprising both verbal and visual features can outperform one using just a single modality.
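
The dimensionality-reduction step described above can be illustrated with a short PCA sketch over a response-by-rubric score matrix; the scores here are randomly generated placeholders, and the two components merely play the role of the paper's comp1 (delivery) and comp2 (slides quality).

```python
# PCA over human rubric scores; the score matrix below is a random placeholder.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
scores = rng.integers(1, 5, size=(50, 8)).astype(float)  # 50 responses x 8 rubric items (hypothetical)

pca = PCA(n_components=2)
pc = pca.fit_transform(scores)        # PCA centers the data internally
comp1, comp2 = pc[:, 0], pc[:, 1]     # stand-ins for delivery and slides-quality scores
print(pca.explained_variance_ratio_)  # variance captured by the two components
```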


Proceedings of the 2014 workshop on Emotion Representation and Modelling in Human-Computer-Interaction-Systems | 2014

An Initial Analysis of Structured Video Interviews by Using Multimodal Emotion Detection

Lei Chen; Su-Youn Yoon; Chee Wee Leong; Michelle Paulette Martin; Min Ma

Recently, online video interviews have been increasingly used in the employment process. Though several automatic techniques have emerged to analyze interview videos, so far only simple emotion analyses have been attempted, e.g., counting the number of smiles on the face of an interviewee. In this paper, we report our initial study of employing advanced multimodal emotion detection approaches for the purpose of measuring performance on an interview task that elicits emotion. On an acted interview corpus we created, we performed our evaluations using a Speech-based Emotion Recognition (SER) system, as well as an off-the-shelf facial expression analysis toolkit (FACET). While the results obtained suggest the promise of using FACET for emotion detection, the benefits of employing the SER system are somewhat limited.


International Conference on Multimodal Interfaces | 2015

Utilizing Depth Sensors for Analyzing Multimodal Presentations: Hardware, Software and Toolkits

Chee Wee Leong; Lei Chen; Gary Feng; Chong Min Lee; Matthew Mulholland

Body language plays an important role in learning processes and communication. For example, communication research has produced evidence that mathematical knowledge can be embodied in gestures made by teachers and students. Likewise, body postures and gestures are also utilized by speakers in oral presentations to convey ideas and important messages. Consequently, capturing and analyzing non-verbal behaviors is an important aspect of multimodal learning analytics (MLA) research. With regard to sensing capabilities, the introduction of depth sensors such as the Microsoft Kinect has greatly facilitated research and development in this area. However, the rapid advancement in hardware and software capabilities is not always in sync with the expanding set of features reported in the literature. For example, though Anvil is a widely used state-of-the-art annotation and visualization toolkit for motion traces, its motion recording component based on OpenNI is outdated. As part of our research in developing multimodal educational assessments, we began an effort to develop and standardize algorithms for the purposes of multimodal feature extraction and automated scoring model creation. This paper provides an overview of relevant work in multimodal research on educational tasks, and proceeds to summarize our work using multimodal sensors in developing assessments of communication skills, with attention to the use of depth sensors. Specifically, we focus on the task of public speaking assessment using Microsoft Kinect. Additionally, we introduce an open-source Python package for computing expressive body language features from Kinect motion data, which we hope will benefit the MLA research community.
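
The snippet below is not the package introduced in the paper; it is only a minimal illustration, under stated assumptions, of the kind of expressive body-language features (overall motion energy, posture expansiveness) that can be computed from Kinect joint-position time series. The joint indices and data are hypothetical.

```python
# Illustrative body-language features from joint-position time series;
# the joint layout and data (frames x joints x xyz) are hypothetical.
import numpy as np

def motion_energy(joints):
    """Mean frame-to-frame displacement, summed over all joints."""
    return np.linalg.norm(np.diff(joints, axis=0), axis=2).sum(axis=1).mean()

def expansiveness(joints, left_hand=0, right_hand=1):
    """Average distance between the two hands (a proxy for open posture)."""
    return np.linalg.norm(joints[:, left_hand] - joints[:, right_hand], axis=1).mean()

frames = np.random.rand(900, 20, 3)     # ~30 s at 30 fps, 20 joints (placeholder data)
print(motion_energy(frames), expansiveness(frames))
```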


Workshop on Innovative Use of NLP for Building Educational Applications | 2014

Automated scoring of speaking items in an assessment for teachers of English as a Foreign Language

Klaus Zechner; Keelan Evanini; Su-Youn Yoon; Lawrence Davis; Xinhao Wang; Lei Chen; Chong Min Lee; Chee Wee Leong

This paper describes an end-to-end prototype system for automated scoring of spoken responses in a novel assessment for teachers of English as a Foreign Language who are not native speakers of English. The 21 speaking items contained in the assessment elicit both restricted and moderately restricted responses, and their aim is to assess the essential speaking skills that English teachers need in order to be effective communicators in their classrooms. Our system consists of a state-of-the-art automatic speech recognizer; multiple feature generation modules addressing diverse aspects of speaking proficiency, such as fluency, pronunciation, prosody, grammatical accuracy, and content accuracy; a filter that identifies and flags problematic responses; and linear regression models that predict response scores based on subsets of the features. The automated speech scoring system was trained and evaluated on a data set involving about 1,400 test takers, and achieved a speaker-level correlation (when scores for all 21 responses of a speaker are aggregated) with human expert scores of 0.73.
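
The speaker-level evaluation mentioned above can be sketched as follows: per-response machine scores are averaged within each speaker and then correlated with the correspondingly aggregated human scores. The data frame, column names, and score values below are hypothetical.

```python
# Sketch of speaker-level aggregation and correlation; data are simulated placeholders.
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "speaker": np.repeat(np.arange(100), 21),        # 100 speakers x 21 items
    "human": rng.normal(3, 1, 2100),                 # simulated human scores
})
df["machine"] = df["human"] * 0.7 + rng.normal(0, 1, 2100)  # simulated machine scores

# Aggregate both score types per speaker, then correlate
agg = df.groupby("speaker")[["human", "machine"]].mean()
r, _ = pearsonr(agg["human"], agg["machine"])
print(f"Speaker-level correlation: {r:.2f}")
```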


Meeting of the Association for Computational Linguistics | 2016

Semantic classifications for detection of verb metaphors

Beata Beigman Klebanov; Chee Wee Leong; E. Dario Gutierrez; Ekaterina Shutova; Michael Flor

We investigate the effectiveness of semantic generalizations/classifications for capturing the regularities of the behavior of verbs in terms of their metaphoricity. Starting from orthographic word unigrams, we experiment with various ways of defining semantic classes for verbs (grammatical, resource-based, distributional) and measure the effectiveness of these classes for classifying all verbs in a running text as metaphor or non-metaphor.
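
One resource-based way to define verb classes, in the spirit of the experiments above, is to use WordNet lexicographer files (supersenses); the sketch below shows this single feature in isolation and is an assumption-laden illustration rather than the paper's exact setup.

```python
# Resource-based verb class via WordNet lexicographer files.
# Requires the WordNet data: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def verb_class(verb):
    """Return the supersense of the verb's first sense, or a fallback label."""
    synsets = wn.synsets(verb, pos=wn.VERB)
    return synsets[0].lexname() if synsets else "verb.unknown"

for v in ["devour", "grasp", "attack"]:
    print(v, "->", verb_class(v))   # e.g. devour -> verb.consumption
```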


Conference of the International Speech Communication Association | 2016

Automatic Scoring of Monologue Video Interviews Using Multimodal Cues.

Lei Chen; Gary Feng; Michelle P. Martin-Raugh; Chee Wee Leong; Christopher Kitchen; Su-Youn Yoon; Blair Lehman; Harrison J. Kell; Chong Min Lee

Job interviews are an important tool for employee selection. When making hiring decisions, a variety of information from interviewees, such as previous work experience, skills, and verbal and nonverbal communication, is jointly considered. In recent years, Social Signal Processing (SSP), an emerging research area on enabling computers to sense and understand human social signals, is being used to develop systems for the coaching and evaluation of job interview performance. However, this research area is still in its infancy and lacks essential resources (e.g., adequate corpora). In this paper, we report on our efforts to create an automatic interview rating system for monologue-style video interviews, which have been widely used in today’s job hiring market. We created the first multimodal corpus for such video interviews. Additionally, we conducted manual rating of the interviewee’s personality and performance during 12 structured interview questions measuring different types of job-related skills. Finally, focusing on predicting overall interview performance, we explored a set of verbal and nonverbal features and several machine learning models. We found that using both verbal and nonverbal features provides more accurate predictions. Our initial results suggest that it is feasible to continue working in this newly formed area.
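
A hedged sketch of the verbal versus verbal-plus-nonverbal comparison reported above is shown below, using randomly generated stand-in features and a generic regressor rather than the actual corpus or models.

```python
# Compare a verbal-only feature set with a fused verbal+nonverbal set;
# all features and targets below are simulated placeholders.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 200
verbal = rng.normal(size=(n, 10))
nonverbal = rng.normal(size=(n, 10))
y = verbal[:, 0] + 0.5 * nonverbal[:, 0] + rng.normal(scale=0.5, size=n)

for name, X in [("verbal only", verbal),
                ("verbal + nonverbal", np.hstack([verbal, nonverbal]))]:
    r2 = cross_val_score(SVR(), X, y, cv=5).mean()
    print(f"{name}: mean CV R^2 = {r2:.2f}")
```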


International Conference on Multimodal Interfaces | 2016

Automated scoring of interview videos using Doc2Vec multimodal feature extraction paradigm

Lei Chen; Gary Feng; Chee Wee Leong; Blair Lehman; Michelle P. Martin-Raugh; Harrison J. Kell; Chong Min Lee; Su-Youn Yoon
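
This entry has no abstract here; as an assumption-laden illustration of the Doc2Vec idea named in the title, the sketch below learns fixed-length vectors for a couple of invented interview-transcript snippets, which could then feed a downstream scoring model.

```python
# Doc2Vec illustration with invented transcripts; not the paper's actual pipeline.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

transcripts = [
    "i have five years of experience leading small teams",
    "i enjoy solving problems and collaborating with colleagues",
]
docs = [TaggedDocument(t.split(), [i]) for i, t in enumerate(transcripts)]

model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)
vec = model.infer_vector("tell me about your experience".split())
print(vec.shape)   # (50,) document-level feature vector
```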

Collaboration


Dive into Chee Wee Leong's collaborations.

Top Co-Authors

Lei Chen

Princeton University

Vikram Ramanarayanan

University of Southern California
