Joseph G. Ellis
Columbia University
Publications
Featured research published by Joseph G. Ellis.
international conference on multimodal interfaces | 2014
Joseph G. Ellis; Brendan Jou; Shih-Fu Chang
We present a multimodal sentiment study performed on a novel collection of videos mined from broadcast and cable television news programs. To the best of our knowledge, this is the first dataset released for studying sentiment in the domain of broadcast video news. We describe our algorithm for the processing and creation of person-specific segments from news video, yielding 929 sentence-length videos, which are annotated via Amazon Mechanical Turk. The spoken transcript and the video content itself are each annotated for their expression of positive, negative, or neutral sentiment. Based on these gathered user annotations, we demonstrate the importance of taking multimodal information into account for sentiment prediction in news video, in particular challenging previous text-based approaches that rely solely on available transcripts. We show that as much as 21.54% of the sentiment annotations for transcripts differ from their respective sentiment annotations when the video clip itself is presented. We present audio and visual classification baselines over a three-way sentiment prediction of positive, negative, and neutral, as well as the influence of person-dependent versus person-independent classification on performance. Finally, we release the News Rover Sentiment dataset to the greater research community.
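The multimodal baselines described above can be approximated with a simple late-fusion setup. The sketch below is illustrative only: it assumes pre-extracted per-clip feature vectors for each modality (the dimensions, feature types, and classifiers are placeholder assumptions, not the paper's actual pipeline).

```python
# Minimal late-fusion sketch for three-way sentiment prediction.
# Feature dimensions and the classifier choice are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
LABELS = ["negative", "neutral", "positive"]

# Stand-ins for transcript, audio, and visual features of 929 clips.
n_clips = 929
text_feats = rng.normal(size=(n_clips, 300))    # e.g. averaged word embeddings
audio_feats = rng.normal(size=(n_clips, 64))    # e.g. prosodic statistics
visual_feats = rng.normal(size=(n_clips, 128))  # e.g. facial-expression descriptors
y = rng.integers(0, 3, size=n_clips)            # stand-in sentiment labels

# Train one unimodal classifier per modality, then average their
# class probabilities (simple late fusion).
modalities = [text_feats, audio_feats, visual_feats]
models = [LogisticRegression(max_iter=1000).fit(X, y) for X in modalities]
fused = np.mean([m.predict_proba(X) for m, X in zip(models, modalities)], axis=0)
pred = fused.argmax(axis=1)
print("fused training accuracy:", (pred == y).mean())
```

With real features, the gap between the text-only classifier and the fused prediction is one way to quantify the disagreement between transcript-only and video-based sentiment noted in the abstract.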
international conference on multimedia retrieval | 2018
Hongzhi Li; Joseph G. Ellis; Lei Zhang; Shih-Fu Chang
Visual patterns represent the discernible regularity in the visual world. They capture the essential nature of visual objects or scenes. Understanding and modeling visual patterns is a fundamental problem in visual recognition with wide-ranging applications. In this paper, we study the problem of visual pattern mining and propose a novel deep neural network architecture called PatternNet for discovering patterns that are both discriminative and representative. The proposed PatternNet leverages the filters in the last convolutional layer of a convolutional neural network to find locally consistent visual patches, and by combining these filters we can effectively discover unique visual patterns. In addition, PatternNet can discover visual patterns efficiently without performing expensive image patch sampling, an advantage that provides an order-of-magnitude speedup compared to most other approaches. We evaluate the proposed PatternNet subjectively, by showing randomly selected visual patterns discovered by our method, and quantitatively, by performing image classification with the identified visual patterns and comparing our performance with the current state-of-the-art. We also directly evaluate the quality of the discovered visual patterns by leveraging the identified patterns as proposed objects in an image and comparing with other relevant methods. Our proposed network and procedure, PatternNet, outperforms competing methods on the tasks described.
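A rough sketch of the core idea, as described in the abstract, follows: treat spatial locations where a last-layer convolutional filter fires strongly and consistently across images as candidate visual patches. This is not the authors' implementation; the tiny backbone, batch, and thresholds are placeholder assumptions.

```python
# Sketch: locate filters in the last conv layer that respond strongly and
# consistently across a batch of images, as candidate "pattern" filters.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder backbone; in practice this would be a pretrained CNN.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # "last conv layer"
)

images = torch.rand(8, 3, 128, 128)          # stand-in image batch
with torch.no_grad():
    fmap = backbone(images)                  # (8, 64, 16, 16) feature maps

# For each filter, find its strongest spatial response in each image;
# consistently high responses suggest a recurring local pattern.
B, C, H, W = fmap.shape
peak_vals, _ = fmap.view(B, C, H * W).max(dim=2)   # (B, C) peak response per filter

# Keep filters that fire above a (here arbitrary) threshold in most images.
fires = (peak_vals > peak_vals.mean()).float().mean(dim=0)
candidate_filters = (fires > 0.75).nonzero(as_tuple=True)[0]
print("candidate pattern filters:", candidate_filters.tolist())
```

Because the candidate locations come directly from feature-map peaks, no explicit image-patch sampling is needed, which is the source of the speedup claimed above.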
acm multimedia | 2017
Tongtao Zhang; Spencer Whitehead; Hanwang Zhang; Hongzhi Li; Joseph G. Ellis; Lifu Huang; Wei Liu; Heng Ji; Shih-Fu Chang
In this paper, we focus on improving Event Extraction (EE) by incorporating visual knowledge with words and phrases from text documents. We first discover visual patterns from large-scale text-image pairs in a weakly-supervised manner and then propose a multimodal event extraction algorithm where the event extractor is jointly trained with textual features and visual patterns. Extensive experimental results on benchmark data sets demonstrate that the proposed multimodal EE method can achieve significantly better performance on event extraction: absolute 7.1% F-score gain on event trigger labeling and 8.5% F-score gain on event argument labeling.
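The fusion idea described above can be sketched as concatenating textual features with visual-pattern features and training a single classifier for event trigger labels. The dimensions, toy model, and label count below are illustrative assumptions, not the paper's architecture.

```python
# Sketch: jointly train an event-trigger classifier on concatenated
# textual and visual-pattern features. All sizes are placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_tokens, text_dim, vis_dim, n_event_types = 256, 300, 128, 34

text_feats = torch.randn(n_tokens, text_dim)    # e.g. contextual word embeddings
vis_feats = torch.randn(n_tokens, vis_dim)      # visual-pattern features for the document
labels = torch.randint(0, n_event_types, (n_tokens,))

classifier = nn.Sequential(
    nn.Linear(text_dim + vis_dim, 256), nn.ReLU(),
    nn.Linear(256, n_event_types),
)
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):                          # joint training on fused features
    logits = classifier(torch.cat([text_feats, vis_feats], dim=1))
    loss = loss_fn(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final training loss:", float(loss))
```

Comparing this fused classifier against a text-only variant (dropping `vis_feats`) mirrors the ablation implied by the reported F-score gains.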
international conference on multimedia retrieval | 2016
Emily Song; Joseph G. Ellis; Hongzhi Li; Shih-Fu Chang
Accurately gauging the political atmosphere is especially difficult in this day and age, as individuals have access to a constantly growing collection of written and audiovisual news sources. This is especially true with regard to the U.S. presidential election, as there are numerous candidates, countless stories, and opinion articles discussing the merits of each particular candidate. It is therefore challenging for people to make an accurate assessment of what each candidate represents and how they would act if elected into office. To address this problem, we present a large-scale dataset of videos of politicians speaking, organized by the topics they are speaking about, and a user interface for exploring this dataset. Our interface links people and events to relevant pieces of audiovisual media and presents the desired information in a meaningful and intuitive manner. Our approach is unique in linking directly to footage of politicians speaking about specific topics, rather than to textual quotes alone. We describe the larger underlying infrastructure, a novel automated system that crawls thousands of internet news sources and 100 television news channels daily, automatically discovers entities, and indexes the content into events and topics. We examine how our user interface provides helpful and unique insights to its users, and give an example of the type of large-scale trend analysis that can be performed with our system. Our online demo can be accessed at: http://www.ee.columbia.edu/dvmm/PoliticialSpeakerDemo
acm multimedia | 2016
Joseph G. Ellis; Svebor Karaman; Hongzhi Li; Hong Bin Shim; Shih-Fu Chang
With the growth of social media platforms in recent years, social media is now a major source of information and news for many people around the world. In particular, the rise of hashtags has helped build communities of discussion around particular news stories, topics, opinions, and ideologies. However, television news programs still provide value and are used by a vast majority of the population to obtain their news, yet these videos are not easily linked to the broader discussion on social media. We have built a novel pipeline that places television news in its relevant social media context by leveraging hashtags. In this paper, we present a method for automatically collecting television news and social media content (Twitter) and discovering the hashtags that are relevant for a TV news video. Our algorithms incorporate both the visual and text information within social media and television content, and we show that leveraging both modalities improves performance over single-modality approaches.
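One way to picture the ranking step described above is as a multimodal similarity score between a TV news segment and each candidate hashtag, combining a text score (transcript versus tweets using the hashtag) with a visual score (keyframes versus tweet images). The sketch below is a hedged illustration under those assumptions; the feature representations and fusion weight are not from the paper.

```python
# Sketch: rank candidate hashtags for a TV news segment by a weighted
# combination of text and visual cosine similarity. Features are stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

hashtags = ["#election2016", "#debate", "#weather"]
# Stand-ins for aggregated tweet features per hashtag.
tag_text = {h: rng.normal(size=300) for h in hashtags}
tag_visual = {h: rng.normal(size=512) for h in hashtags}

# Stand-ins for one TV news segment's transcript and keyframe features.
video_text = rng.normal(size=300)
video_visual = rng.normal(size=512)

alpha = 0.5  # modality weight; a tunable assumption, not a published value
scores = {
    h: alpha * cosine(video_text, tag_text[h])
       + (1 - alpha) * cosine(video_visual, tag_visual[h])
    for h in hashtags
}
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Setting `alpha` to 0 or 1 recovers the single-modality baselines the abstract compares against.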
acm multimedia | 2013
Brendan Jou; Hongzhi Li; Joseph G. Ellis; Daniel Morozoff-Abegauz; Shih-Fu Chang
acm multimedia | 2016
Hongzhi Li; Joseph G. Ellis; Heng Ji; Shih-Fu Chang
international world wide web conferences | 2016
Maja R. Rudolph; Joseph G. Ellis; David M. Blei
Archive | 2016
Shih-Fu Chang; Brendan Jou; Hongzhi Li; Joseph G. Ellis; Daniel Morozoff-Abezgauz
international symposium on multimedia | 2014
Joseph G. Ellis; W. Sabrina Lin; Ching-Yung Lin; Shih-Fu Chang