Publication


Featured research published by Margaret Mitchell.


International Conference on Computer Vision | 2015

VQA: Visual Question Answering

Stanislaw Antol; Aishwarya Agrawal; Jiasen Lu; Margaret Mitchell; Dhruv Batra; C. Lawrence Zitnick; Devi Parikh

We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines for VQA are provided and compared with human performance.
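
A minimal sketch of the multiple-choice evaluation setting mentioned above, assuming each example carries an image, a question, a small set of candidate answers, and a single ground-truth answer; the scoring function is a placeholder, and this is not the official evaluation code from www.visualqa.org.

```python
# Minimal sketch of multiple-choice VQA evaluation. The example format and
# the scoring function are illustrative assumptions, not the official
# visualqa.org evaluation protocol.

def pick_answer(model_scores):
    """Return the index of the highest-scoring candidate answer."""
    return max(range(len(model_scores)), key=lambda i: model_scores[i])

def multiple_choice_accuracy(examples, score_fn):
    correct = 0
    for ex in examples:
        scores = [score_fn(ex["image"], ex["question"], cand)
                  for cand in ex["candidates"]]
        if ex["candidates"][pick_answer(scores)] == ex["answer"]:
            correct += 1
    return correct / len(examples)

# Toy usage with a canned scorer standing in for a trained VQA model.
examples = [{
    "image": "example.jpg",
    "question": "What color is the bus?",
    "candidates": ["yellow", "a large yellow vehicle", "blue"],
    "answer": "yellow",
}]
canned = {"yellow": 0.9, "a large yellow vehicle": 0.5, "blue": 0.1}
print(multiple_choice_accuracy(examples, lambda img, q, a: canned[a]))
```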


Computer Vision and Pattern Recognition | 2015

From captions to visual concepts and back

Hao Fang; Saurabh Gupta; Forrest N. Iandola; Rupesh Kumar Srivastava; Li Deng; Piotr Dollár; Jianfeng Gao; Xiaodong He; Margaret Mitchell; John Platt; C. Lawrence Zitnick; Geoffrey Zweig

This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives. The word detector outputs serve as conditional inputs to a maximum-entropy language model. The language model learns from a set of over 400,000 image descriptions to capture the statistics of word usage. We capture global semantics by re-ranking caption candidates using sentence-level features and a deep multimodal similarity model. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1%. When human judges compare the system captions to ones written by other people on our held-out test set, the system captions have equal or better quality 34% of the time.
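
A schematic sketch of the three-stage pipeline described in this abstract; the word detector, language model, and similarity scorer below are trivial stand-ins rather than the paper's trained models.

```python
# Schematic sketch of the caption-generation pipeline described above:
# (1) detect likely caption words in the image, (2) have a language model
# arrange them into candidate sentences, (3) re-rank candidates with a
# sentence-level image-text similarity score. All three components here
# are placeholders for illustration only.

def detect_words(image):
    # Stand-in for the multiple-instance-learning visual word detectors.
    return {"dog": 0.9, "grass": 0.8, "running": 0.6}

def generate_candidates(word_scores):
    # Stand-in for the maximum-entropy language model conditioned on the
    # detected words; here we just emit fixed templates.
    words = sorted(word_scores, key=word_scores.get, reverse=True)
    return [f"a {words[0]} {words[2]} on the {words[1]}",
            f"a {words[0]} on the {words[1]}"]

def similarity(image, sentence):
    # Stand-in for the deep multimodal similarity model: reward sentences
    # that mention more of the detected words.
    return sum(w in sentence for w in detect_words(image))

def describe(image):
    candidates = generate_candidates(detect_words(image))
    return max(candidates, key=lambda s: similarity(image, s))

print(describe("example.jpg"))
```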


North American Chapter of the Association for Computational Linguistics | 2015

A Neural Network Approach to Context-Sensitive Generation of Conversational Responses

Alessandro Sordoni; Michel Galley; Michael Auli; Chris Brockett; Yangfeng Ji; Margaret Mitchell; Jian-Yun Nie; Jianfeng Gao; Bill Dolan

We present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations. A neural network architecture is used to address sparsity issues that arise when integrating contextual information into classic statistical models, allowing the system to take into account previous dialog utterances. Our dynamic-context generative models show consistent gains over both context-sensitive and non-context-sensitive Machine Translation and Information Retrieval baselines.
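
A minimal PyTorch sketch of the general idea of conditioning a neural response generator on the preceding dialog turn as well as the current message; sizes, vocabulary, and training are omitted, and this illustrates the dataflow rather than the paper's specific architecture.

```python
# Minimal sketch of context-sensitive response generation: the previous
# utterance and the current message are encoded together, and a recurrent
# decoder produces per-token logits for the response. Illustrative only.
import torch
import torch.nn as nn

class ContextResponder(nn.Module):
    def __init__(self, vocab_size, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, context_ids, message_ids, response_ids):
        # Encode context and message as one sequence so the response is
        # conditioned on the previous turn and the current one.
        src = torch.cat([context_ids, message_ids], dim=1)
        _, h = self.encoder(self.embed(src))
        dec_out, _ = self.decoder(self.embed(response_ids), h)
        return self.out(dec_out)  # per-token logits over the vocabulary

model = ContextResponder(vocab_size=1000)
ctx = torch.randint(0, 1000, (1, 8))
msg = torch.randint(0, 1000, (1, 6))
rsp = torch.randint(0, 1000, (1, 5))
print(model(ctx, msg, rsp).shape)  # torch.Size([1, 5, 1000])
```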


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Spoken Language Derived Measures for Detecting Mild Cognitive Impairment

Brian Roark; Margaret Mitchell; John Paul Hosom; Kristy Hollingshead; Jeffrey Kaye

Spoken responses produced by subjects during neuropsychological exams can provide diagnostic markers beyond exam performance. In particular, characteristics of the spoken language itself can discriminate between subject groups. We present results on the utility of such markers in discriminating between healthy elderly subjects and subjects with mild cognitive impairment (MCI). Given the audio and transcript of a spoken narrative recall task, a range of markers are automatically derived. These markers include speech features such as pause frequency and duration, and many linguistic complexity measures. We examine measures calculated from manually annotated time alignments (of the transcript with the audio) and syntactic parse trees, as well as the same measures calculated from automatic (forced) time alignments and automatic parses. We show statistically significant differences between clinical subject groups for a number of measures. These differences are largely preserved with automation. We then present classification results, and demonstrate a statistically significant improvement in the area under the ROC curve (AUC) when using automatic spoken language derived features in addition to the neuropsychological test scores. Our results indicate that using multiple, complementary measures can aid in automatic detection of MCI.
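
A sketch, under assumed data formats, of two pieces mentioned above: pause statistics computed from word-level time alignments, and an ROC-AUC comparison of a classifier using test scores alone versus test scores plus a spoken-language feature. All data below are synthetic and the alignment format is an assumption.

```python
# Illustrative sketch: pause features from (word, start, end) alignments,
# plus a toy AUC comparison with and without a speech-derived feature.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def pause_features(word_alignments, min_pause=0.25):
    """word_alignments: ordered list of (word, start_sec, end_sec)."""
    gaps = [b[1] - a[2] for a, b in zip(word_alignments, word_alignments[1:])]
    pauses = [g for g in gaps if g >= min_pause]
    total = word_alignments[-1][2] - word_alignments[0][1]
    return {"pauses_per_min": 60.0 * len(pauses) / total,
            "mean_pause_dur": float(np.mean(pauses)) if pauses else 0.0}

print(pause_features([("the", 0.0, 0.3), ("cat", 0.8, 1.1), ("sat", 1.2, 1.5)]))

# Toy comparison on synthetic data (evaluated on the training data purely
# for illustration): test scores alone vs. test scores + speech feature.
rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, n)                 # 0 = healthy, 1 = MCI
test_scores = labels + rng.normal(0, 1.5, n)   # weak signal
speech_feat = labels + rng.normal(0, 0.8, n)   # complementary signal
base = LogisticRegression().fit(test_scores[:, None], labels)
both = LogisticRegression().fit(np.c_[test_scores, speech_feat], labels)
print(roc_auc_score(labels, base.predict_proba(test_scores[:, None])[:, 1]))
print(roc_auc_score(labels, both.predict_proba(np.c_[test_scores, speech_feat])[:, 1]))
```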


International Joint Conference on Natural Language Processing | 2015

Language Models for Image Captioning: The Quirks and What Works

Jacob Devlin; Hao Cheng; Hao Fang; Saurabh Gupta; Li Deng; Xiaodong He; Geoffrey Zweig; Margaret Mitchell

Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence. The second uses the penultimate activation layer of the CNN as input to a recurrent neural network (RNN) that then generates the caption sequence. In this paper, we compare the merits of these different language modeling approaches for the first time by using the same state-of-the-art CNN as input. We examine issues in the different approaches, including linguistic irregularities, caption repetition, and data set overlap. By combining key aspects of the ME and RNN methods, we achieve a new record performance over previously published results on the benchmark COCO dataset. However, the gains we see in BLEU do not translate to human judgments.


Computer Vision and Pattern Recognition | 2012

Understanding and predicting importance in images

Alexander C. Berg; Tamara L. Berg; Hal Daumé; Jesse Dodge; Amit Goyal; Xufeng Han; Alyssa Mensch; Margaret Mitchell; Aneesh Sood; Karl Stratos; Kota Yamaguchi

What do people care about in an image? To drive computational visual recognition toward more human-centric outputs, we need a better understanding of how people perceive and judge the importance of content in images. In this paper, we explore how a number of factors relate to human perception of importance. Proposed factors fall into 3 broad types: 1) factors related to composition, e.g. size, location, 2) factors related to semantics, e.g. category of object or scene, and 3) contextual factors related to the likelihood of attribute-object, or object-scene pairs. We explore these factors using what people describe as a proxy for importance. Finally, we build models to predict what will be described about an image given either known image content, or image content estimated automatically by recognition systems.
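
As an illustration of the prediction setup, a small scikit-learn sketch that predicts whether an object is mentioned from composition and semantic features; the feature names and data are made up for the example.

```python
# Sketch of predicting whether a human description mentions an object from
# composition (relative size, distance from image center) and semantic
# (object category) features. Data and feature names are illustrative.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

objects = [
    {"category": "person", "relative_size": 0.30, "dist_from_center": 0.10},
    {"category": "tree",   "relative_size": 0.25, "dist_from_center": 0.45},
    {"category": "dog",    "relative_size": 0.08, "dist_from_center": 0.15},
    {"category": "pole",   "relative_size": 0.02, "dist_from_center": 0.48},
]
mentioned = [1, 0, 1, 0]  # whether a human description mentioned the object

model = make_pipeline(DictVectorizer(sparse=False), LogisticRegression())
model.fit(objects, mentioned)
print(model.predict_proba([{"category": "dog", "relative_size": 0.2,
                            "dist_from_center": 0.05}])[:, 1])
```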


International Joint Conference on Natural Language Processing | 2015

deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets

Michel Galley; Chris Brockett; Alessandro Sordoni; Yangfeng Ji; Michael Auli; Chris Quirk; Margaret Mitchell; Jianfeng Gao; Bill Dolan

We introduce Discriminative BLEU (∆BLEU), a novel metric for intrinsic evaluation of generated text in tasks that admit a diverse range of possible outputs. Reference strings are scored for quality by human raters on a scale of [−1, +1] to weight multi-reference BLEU. In tasks involving generation of conversational responses, ∆BLEU correlates reasonably with human judgments and outperforms sentence-level and IBM BLEU in terms of both Spearman’s ρ and Kendall’s τ.
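
A rough sketch of the central weighting idea, in which each matched hypothesis n-gram is credited with the best human rating among the references containing it; this is a conceptual illustration, not the exact published ∆BLEU definition (which retains BLEU's clipping, higher-order n-grams, and brevity penalty).

```python
# Rough sketch of rating-weighted, multi-reference n-gram precision: each
# matched hypothesis n-gram is credited with the best human rating among the
# references that contain it. Conceptual illustration only.
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def weighted_precision(hyp, rated_refs, n=1):
    """rated_refs: list of (reference_tokens, human_rating in [-1, 1])."""
    hyp_counts = Counter(ngrams(hyp, n))
    credit = 0.0
    for gram, count in hyp_counts.items():
        weights = [w for ref, w in rated_refs if gram in ngrams(ref, n)]
        if weights:
            credit += count * max(weights)
    return credit / max(sum(hyp_counts.values()), 1)

refs = [("i would love to".split(), 0.8),
        ("no thanks".split(), -0.6)]
print(weighted_precision("i would love that".split(), refs, n=1))  # 0.6
```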


meeting of the association for computational linguistics | 2007

Syntactic complexity measures for detecting Mild Cognitive Impairment

Brian Roark; Margaret Mitchell; Kristy Hollingshead

We consider the diagnostic utility of various syntactic complexity measures when extracted from spoken language samples of healthy and cognitively impaired subjects. We examine measures calculated from manually built parse trees, as well as the same measures calculated from automatic parses. We show statistically significant differences between clinical subject groups for a number of syntactic complexity measures, and these differences are preserved with automatic parsing. Different measures show different patterns for our data set, indicating that using multiple, complementary measures is important for such an application.
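
A small NLTK sketch of deriving complexity measures from a parse tree; the two measures shown (tree depth and words per clause) are simple stand-ins for the fuller set of syntactic complexity measures studied here.

```python
# Sketch of computing simple syntactic complexity measures from a parse tree
# using NLTK's Tree class; the measures below are illustrative stand-ins for
# richer measures such as Yngve and Frazier scores.
from nltk import Tree

parse = Tree.fromstring(
    "(S (NP (DT the) (NN patient)) "
    "(VP (VBD described) (NP (DT the) (NN picture)) "
    "(PP (IN in) (NP (NN detail)))))")

def tree_depth(tree):
    return tree.height() - 1  # edges on the longest root-to-leaf path

def words_per_clause(tree):
    clauses = len(list(tree.subtrees(lambda t: t.label().startswith("S")))) or 1
    return len(tree.leaves()) / clauses

print(tree_depth(parse), words_per_clause(parse))
```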


Meeting of the Association for Computational Linguistics | 2016

Generating Natural Questions About an Image

Nasrin Mostafazadeh; Ishan Misra; Jacob Devlin; Margaret Mitchell; Xiaodong He; Lucy Vanderwende

There has been an explosion of work in the vision & language community during the past few years from image captioning to video transcription, and answering questions about images. These tasks have focused on literal descriptions of the image. To move beyond the literal, we choose to explore how questions about an image are often directed at commonsense inference and the abstract events evoked by objects in the image. In this paper, we introduce the novel task of Visual Question Generation (VQG), where the system is tasked with asking a natural and engaging question when shown an image. We provide three datasets which cover a variety of images from object-centric to event-centric, with considerably more abstract training data than provided to state-of-the-art captioning systems thus far. We train and test several generative and retrieval models to tackle the task of VQG. Evaluation results show that while such models ask reasonable questions for a variety of images, there is still a wide gap with human performance which motivates further work on connecting images with commonsense knowledge and pragmatics. Our proposed task offers a new challenge to the community which we hope furthers interest in exploring deeper connections between vision & language.
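
A sketch of the kind of retrieval baseline mentioned above: represent images as feature vectors and return the question attached to the nearest training image. The feature vectors here are random placeholders for real image features.

```python
# Sketch of a nearest-neighbor retrieval baseline for Visual Question
# Generation: reuse the question of the closest training image in feature
# space. Features and questions below are placeholders.
import numpy as np

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(3, 512))  # stand-in image feature vectors
train_questions = [
    "What happened to the car?",
    "Why is everyone gathered outside?",
    "How old is this building?",
]

def retrieve_question(image_feat, feats, questions):
    dists = np.linalg.norm(feats - image_feat, axis=1)
    return questions[int(np.argmin(dists))]

query = rng.normal(size=512)
print(retrieve_question(query, train_feats, train_questions))
```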


Autism | 2010

Computational prosodic markers for autism

Jan P. H. van Santen; Emily Prud'hommeaux; Lois M. Black; Margaret Mitchell

We present results obtained with new instrumental methods for the acoustic analysis of prosody to evaluate prosody production by children with Autism Spectrum Disorder (ASD) and Typical Development (TD). Two tasks elicit focal stress - one in a vocal imitation paradigm, the other in a picture-description paradigm; a third task also uses a vocal imitation paradigm, and requires repeating stress patterns of two-syllable nonsense words. The instrumental methods differentiated significantly between the ASD and TD groups in all but the focal stress imitation task. The methods also showed smaller differences in the two vocal imitation tasks than in the picture-description task, as was predicted. In fact, in the nonsense word stress repetition task, the instrumental methods showed better performance for the ASD group. The methods also revealed that the acoustic features that predict auditory-perceptual judgment are not the same as those that differentiate between groups. Specifically, a key difference between the groups appears to be a difference in the balance between the various prosodic cues, such as pitch, amplitude, and duration, and not necessarily a difference in the strength or clarity with which prosodic contrasts are expressed.
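
A sketch of extracting the three prosodic cue families named above (pitch, amplitude, duration) from a recording using librosa; the audio path is a placeholder, and these summary statistics are far coarser than the study's instrumental measures.

```python
# Sketch of coarse prosodic features (pitch, amplitude, duration) from an
# audio file using librosa. The file path is a placeholder.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)            # placeholder path
f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)   # pitch track
rms = librosa.feature.rms(y=y)[0]                           # frame amplitudes

features = {
    "duration_sec": len(y) / sr,
    "mean_f0_hz": float(np.nanmean(f0)),          # NaN frames are unvoiced
    "f0_range_hz": float(np.nanmax(f0) - np.nanmin(f0)),
    "mean_rms": float(rms.mean()),
}
print(features)
```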

Collaboration


Dive into Margaret Mitchell's collaborations.

Top Co-Authors

Ehud Reiter

University of Aberdeen
