Gary Feng
Princeton University
Publications
Featured research published by Gary Feng.
international conference on multimodal interfaces | 2014
Lei Chen; Gary Feng; Jilliam Joe; Chee Wee Leong; Christopher Kitchen; Chong Min Lee
Traditional assessments of public speaking skills rely on human scoring. We report an initial study on the development of an automated scoring model for public speaking performances using multimodal technologies. Task design, rubric development, and human rating were conducted according to standards in educational assessment. An initial corpus of 17 speakers with 4 speaking tasks was collected using audio, video, and 3D motion capture devices. A scoring model based on basic features of the speech content, speech delivery, and hand, body, and head movements significantly predicts human ratings, suggesting the feasibility of using multimodal technologies in the assessment of public speaking skills.
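To make the kind of scoring model described above concrete, the sketch below shows one plausible, deliberately simplified setup: concatenate content, delivery, and movement features into one matrix and regress human ratings on them. The data, feature dimensions, and the choice of linear regression are assumptions for illustration, not the study's actual pipeline.

```python
# Illustrative sketch (not the study's code): regress human holistic ratings
# on concatenated speech-content, delivery, and movement features.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical feature matrix: one row per speaking performance
# (e.g., 17 speakers x 4 tasks), columns = multimodal features.
X = rng.normal(size=(68, 12))
y = rng.normal(loc=3.0, scale=0.7, size=68)  # placeholder human ratings

# How well do the features predict the ratings, estimated by cross-validation?
r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("mean cross-validated R^2:", r2.mean())
```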
international conference on multimodal interfaces | 2015
Vikram Ramanarayanan; Chee Wee Leong; Lei Chen; Gary Feng; David Suendermann-Oeft
We analyze how fusing features obtained from different multimodal data streams, such as speech, face, body movement, and emotion tracks, can be applied to the scoring of multimodal presentations. We compute both time-aggregated and time-series-based features from these data streams: the former are statistical functionals and other cumulative features computed over the entire time series, while the latter, dubbed histograms of co-occurrences, capture how different prototypical body posture or facial configurations co-occur within different time lags of each other over the evolution of the multimodal, multivariate time series. We examine the relative utility of these features, along with curated speech-stream features, in predicting human-rated scores of multiple aspects of presentation proficiency. We find that different modalities are useful in predicting different aspects, even outperforming a naive human inter-rater agreement baseline for a subset of the aspects analyzed.
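The histogram-of-co-occurrences idea lends itself to a short sketch. Assuming each frame of a body-posture or facial time series has already been assigned a prototypical cluster label (e.g., by k-means, which is an assumption here), the function below counts how often pairs of labels occur a fixed number of frames apart, for several lags, and flattens the counts into a feature vector. It is a minimal illustration, not the authors' implementation.

```python
# Minimal sketch of a histogram-of-co-occurrences feature. Frames are assumed
# to have been quantized already into prototypical posture/face cluster labels.
import numpy as np

def cooccurrence_histogram(labels, n_clusters, lags=(1, 2, 5, 10)):
    """Count how often cluster labels (i, j) occur `lag` frames apart, for each
    lag, and flatten the normalized count matrices into one feature vector."""
    labels = np.asarray(labels)
    feats = []
    for lag in lags:
        hist = np.zeros((n_clusters, n_clusters))
        for a, b in zip(labels[:-lag], labels[lag:]):
            hist[a, b] += 1
        hist /= max(hist.sum(), 1.0)  # length-normalize so sequences are comparable
        feats.append(hist.ravel())
    return np.concatenate(feats)

# Toy label sequence, e.g., k-means labels of body-posture frames.
features = cooccurrence_histogram([0, 1, 1, 2, 0, 2, 1, 0, 0, 2], n_clusters=3)
print(features.shape)  # (len(lags) * n_clusters**2,) -> (36,)
```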
Proceedings of the 2014 ACM workshop on Multimodal Learning Analytics Workshop and Grand Challenge | 2014
Lei Chen; Chee Wee Leong; Gary Feng; Chong Min Lee
The ability to make presentation slides and deliver them effectively to convey information to an audience is of increasing importance, particularly in the pursuit of both academic and professional career success. We envision that multimodal sensing and machine learning techniques can be employed to evaluate, and potentially help improve, the quality of the content and delivery of public presentations. To this end, we report a study using the Oral Presentation Quality Corpus provided by the 2014 Multimodal Learning Analytics (MLA) Grand Challenge. A set of multimodal features was extracted from slides, speech, posture, and hand gestures, as well as head poses. We also examined the dimensionality of the human scores, which could be concisely represented by two Principal Component (PC) scores: comp1 for delivery skills and comp2 for slide quality. Several machine learning experiments were performed to predict the two PC scores using multimodal features. Our experiments suggest that multimodal cues can predict human scores on presentation tasks, and that a scoring model comprising both verbal and visual features can outperform one using just a single modality.
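A hedged sketch of the dimensionality-reduction step described above: compress the correlated human rubric scores into two principal components and predict each from multimodal features. The array shapes, the ridge regressor, and the synthetic data are illustrative assumptions; only the two-component PCA idea comes from the abstract.

```python
# Hedged sketch, assuming per-presentation rubric scores and multimodal
# features are already available as arrays (synthetic placeholders here).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
human_scores = rng.normal(size=(32, 6))   # rubric scores, one row per presentation
X_multimodal = rng.normal(size=(32, 20))  # slide/speech/posture/gesture/head features

# Two principal components of the human scores (interpreted in the paper
# as delivery skills and slide quality).
pc_scores = PCA(n_components=2).fit_transform(human_scores)

# Predict each PC score from the multimodal features.
for k, name in enumerate(["comp1 (delivery)", "comp2 (slides)"]):
    pred = cross_val_predict(Ridge(alpha=1.0), X_multimodal, pc_scores[:, k], cv=4)
    print(name, "correlation:", round(float(np.corrcoef(pred, pc_scores[:, k])[0, 1]), 3))
```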
international conference on multimodal interfaces | 2015
Chee Wee Leong; Lei Chen; Gary Feng; Chong Min Lee; Matthew Mulholland
Body language plays an important role in learning processes and communication. For example, communication research has produced evidence that mathematical knowledge can be embodied in gestures made by teachers and students. Likewise, body postures and gestures are also used by speakers in oral presentations to convey ideas and important messages. Consequently, capturing and analyzing nonverbal behaviors is an important aspect of multimodal learning analytics (MLA) research. With regard to sensing capabilities, the introduction of depth sensors such as the Microsoft Kinect has greatly facilitated research and development in this area. However, the rapid advancement of hardware and software capabilities is not always in sync with the expanding set of features reported in the literature. For example, though Anvil is a widely used, state-of-the-art annotation and visualization toolkit for motion traces, its motion recording component, based on OpenNI, is outdated. As part of our research in developing multimodal educational assessments, we began an effort to develop and standardize algorithms for multimodal feature extraction and the creation of automated scoring models. This paper provides an overview of relevant work in multimodal research on educational tasks, then summarizes our work using multimodal sensors in developing assessments of communication skills, with attention to the use of depth sensors. Specifically, we focus on the task of public speaking assessment using the Microsoft Kinect. Additionally, we introduce an open-source Python package for computing expressive body language features from Kinect motion data, which we hope will benefit the MLA research community.
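The abstract mentions an open-source Python package for expressive body language features; the sketch below is not that package's API, just a self-contained illustration of the kind of kinematic summaries (movement speed, energy, range of motion) one can compute from Kinect joint traces. The `movement_features` function, its input shape, and the feature names are all assumptions.

```python
# Not the package described in the paper: a self-contained sketch of simple
# kinematic features derivable from Kinect joint traces. `joints` is assumed
# to be an array of shape (frames, joints, 3), in meters.
import numpy as np

def movement_features(joints, fps=30.0):
    """Per-joint speeds plus overall movement energy and range of motion."""
    velocity = np.diff(joints, axis=0) * fps        # (frames-1, joints, 3)
    speed = np.linalg.norm(velocity, axis=-1)       # (frames-1, joints)
    return {
        "mean_speed": float(speed.mean()),          # average movement speed
        "speed_std": float(speed.std()),            # variability of movement
        "energy": float((speed ** 2).mean()),       # rough kinetic-energy proxy
        "range_of_motion": float((joints.max(axis=0) - joints.min(axis=0)).mean()),
    }

# Toy example: 300 frames of 25 Kinect v2 joints.
print(movement_features(np.random.default_rng(2).normal(size=(300, 25, 3))))
```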
conference of the international speech communication association | 2016
Lei Chen; Gary Feng; Michelle P. Martin-Raugh; Chee Wee Leong; Christopher Kitchen; Su-Youn Yoon; Blair Lehman; Harrison J. Kell; Chong Min Lee
Job interviews are an important tool for employee selection. When making hiring decisions, a variety of information from interviewees, such as previous work experience, skills, and verbal and nonverbal communication, is jointly considered. In recent years, Social Signal Processing (SSP), an emerging research area on enabling computers to sense and understand human social signals, has been used to develop systems for the coaching and evaluation of job interview performance. However, this research area is still in its infancy and lacks essential resources (e.g., adequate corpora). In this paper, we report on our efforts to create an automatic interview rating system for monologue-style video interviews, which have been widely used in today's job hiring market. We created the first multimodal corpus for such video interviews. Additionally, we conducted manual rating of the interviewees' personality and performance during 12 structured interview questions measuring different types of job-related skills. Finally, focusing on predicting overall interview performance, we explored a set of verbal and nonverbal features and several machine learning models. We found that using both verbal and nonverbal features provides more accurate predictions. Our initial results suggest that it is feasible to continue working in this newly formed area.
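As a rough illustration of the verbal-plus-nonverbal finding, the snippet below compares cross-validated prediction accuracy for verbal features alone, nonverbal features alone, and the two fused by simple concatenation. The synthetic data, feature counts, and the random-forest regressor are assumptions; the paper's actual features and models are not reproduced here.

```python
# Synthetic comparison (not the paper's pipeline): verbal features alone,
# nonverbal features alone, and early fusion by concatenation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 60
verbal = rng.normal(size=(n, 15))     # e.g., lexical and prosodic features (assumed)
nonverbal = rng.normal(size=(n, 10))  # e.g., facial and gestural features (assumed)
# Toy overall-performance rating that depends on both modalities.
y = verbal[:, 0] + 0.5 * nonverbal[:, 0] + rng.normal(scale=0.5, size=n)

for name, X in [("verbal only", verbal),
                ("nonverbal only", nonverbal),
                ("fused", np.hstack([verbal, nonverbal]))]:
    r2 = cross_val_score(RandomForestRegressor(random_state=0), X, y, cv=5, scoring="r2")
    print(name, round(float(r2.mean()), 3))
```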
international conference on multimodal interfaces | 2016
Lei Chen; Gary Feng; Chee Wee Leong; Blair Lehman; Michelle P. Martin-Raugh; Harrison J. Kell; Chong Min Lee; Su-Youn Yoon
Archive | 2014
Lei Chen; Gary Feng; Chee Wee Leong; Christopher Kitchen; Chong Min Lee
ETS Research Report Series | 2015
Jilliam Joe; Christopher Kitchen; Lei Chen; Gary Feng
ETS Research Report Series | 2017
Harrison J. Kell; Michelle P. Martin-Raugh; Lauren Carney; Patricia A. Inglese; Lei Chen; Gary Feng