Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Patrick Ehlen is active.

Publication


Featured research published by Patrick Ehlen.


Meeting of the Association for Computational Linguistics | 2002

MATCH: An Architecture for Multimodal Dialogue Systems

Michael Johnston; Srinivas Bangalore; Gunaranjan Vasireddy; Amanda Stent; Patrick Ehlen; Marilyn A. Walker; Steve Whittaker; Preetam Maloor

Mobile interfaces need to allow the user and system to adapt their choice of communication modes according to user preferences, the task at hand, and the physical and social environment. We describe a multimodal application architecture which combines finite-state multimodal language processing, a speech-act based multimodal dialogue manager, dynamic multimodal output generation, and user-tailored text planning to enable rapid prototyping of multimodal interfaces with flexible input and adaptive output. Our testbed application MATCH (Multimodal Access To City Help) provides a mobile multimodal speech-pen interface to restaurant and subway information for New York City.
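
For illustration, here is a toy sketch of the kind of speech-pen fusion such an architecture performs, grounding a deictic expression in a map gesture. The class and function names are hypothetical, and the actual MATCH system uses finite-state multimodal language processing rather than this ad hoc rule.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class SpeechHypothesis:
    text: str      # e.g. "cheap italian restaurants in this neighborhood"
    score: float   # recognizer confidence

@dataclass
class PenGesture:
    kind: str                                   # e.g. "area" or "point"
    region: Tuple[float, float, float, float]   # map bounding box

def fuse(speech: SpeechHypothesis, gesture: Optional[PenGesture]) -> Dict:
    """Ground a deictic expression ("this", "here") in the user's pen gesture."""
    interpretation = {"query": speech.text, "confidence": speech.score}
    if gesture is not None and any(w in speech.text for w in ("this", "here")):
        # Resolve the deictic reference against the circled/pointed map region.
        interpretation["location"] = gesture.region
    return interpretation

print(fuse(SpeechHypothesis("cheap italian restaurants in this neighborhood", 0.87),
           PenGesture("area", (40.72, -74.01, 40.74, -73.99))))
```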


Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2008

Modelling and Detecting Decisions in Multi-party Dialogue

Raquel Fernández; Matthew Frampton; Patrick Ehlen; Matthew Purver; Stanley Peters

We describe a process for automatically detecting decision-making sub-dialogues in transcripts of multi-party, human-human meetings. Extending our previous work on action item identification, we propose a structured approach that takes into account the different roles utterances play in the decision-making process. We show that this structured approach outperforms the accuracy achieved by existing decision detection systems based on flat annotations, while enabling the extraction of more fine-grained information that can be used for summarization and reporting.
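
As a rough illustration of the structured approach, the sketch below tags utterances with hypothetical decision-making roles and flags dialogue windows where those roles co-occur. The cue lists and windowing heuristic are invented stand-ins for the paper's learned classifiers.

```python
ROLE_CUES = {
    "issue":      ["should we", "what about", "do we want"],
    "resolution": ["let's", "we could", "i propose"],
    "agreement":  ["yeah", "sounds good", "agreed", "okay"],
}

def tag_roles(utterance: str) -> set:
    """Assign the roles an utterance may play in the decision-making process."""
    text = utterance.lower()
    return {role for role, cues in ROLE_CUES.items()
            if any(cue in text for cue in cues)}

def detect_decision_windows(utterances, window=5):
    """Return index ranges where issue, resolution and agreement all occur."""
    tagged = [tag_roles(u) for u in utterances]
    hits = []
    for start in range(len(tagged)):
        roles = set().union(*tagged[start:start + window])
        if {"issue", "resolution", "agreement"} <= roles:
            hits.append((start, min(start + window, len(tagged))))
    return hits
```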


PLOS ONE | 2015

Precision and Disclosure in Text and Voice Interviews on Smartphones

Michael F. Schober; Frederick G. Conrad; Christopher Antoun; Patrick Ehlen; Stefanie Fail; Andrew L. Hupp; Michael V. Johnston; Lucas Vickers; H. Yanna Yan; Chan Zhang

As people increasingly communicate via asynchronous non-spoken modes on mobile devices, particularly text messaging (e.g., SMS), longstanding assumptions and practices of social measurement via telephone survey interviewing are being challenged. In the study reported here, 634 people who had agreed to participate in an interview on their iPhone were randomly assigned to answer 32 questions from US social surveys via text messaging or speech, administered either by a human interviewer or by an automated interviewing system. 10 interviewers from the University of Michigan Survey Research Center administered voice and text interviews; automated systems launched parallel text and voice interviews at the same time as the human interviews were launched. The key question was how the interview mode affected the quality of the response data, in particular the precision of numerical answers (how many were not rounded), variation in answers to multiple questions with the same response scale (differentiation), and disclosure of socially undesirable information. Texting led to higher quality data—fewer rounded numerical answers, more differentiated answers to a battery of questions, and more disclosure of sensitive information—than voice interviews, both with human and automated interviewers. Text respondents also reported a strong preference for future interviews by text. The findings suggest that people interviewed on mobile devices at a time and place that is convenient for them, even when they are multitasking, can give more trustworthy and accurate answers than those in more traditional spoken interviews. The findings also suggest that answers from text interviews, when aggregated across a sample, can tell a different story about a population than answers from voice interviews, potentially altering the policy implications from a survey.
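
To make the data-quality measures concrete, here is a minimal sketch of how rounding and differentiation might be computed from a respondent's answers. The coding rules (multiples of 5 counted as "rounded", distinct scale points as differentiation) are simplifying assumptions, not the study's exact definitions.

```python
def rounded_share(numeric_answers):
    """Fraction of numeric answers that look rounded (here: multiples of 5)."""
    return sum(1 for a in numeric_answers if a % 5 == 0) / len(numeric_answers)

def differentiation(scale_answers):
    """Number of distinct scale points used across a battery with a shared scale."""
    return len(set(scale_answers))

voice = {"numeric": [10, 25, 30, 5], "scale": [3, 3, 3, 3, 3]}
text_ = {"numeric": [12, 27, 31, 4], "scale": [2, 4, 3, 5, 1]}
print(rounded_share(voice["numeric"]), differentiation(voice["scale"]))  # 1.0 1
print(rounded_share(text_["numeric"]), differentiation(text_["scale"]))  # 0.0 5
```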


Discourse Processes | 2007

Modeling Speech Disfluency to Predict Conceptual Misalignment in Speech Survey Interfaces

Patrick Ehlen; Michael F. Schober; Frederick G. Conrad

Computer-based interviewing systems could use models of respondent disfluency behaviors to predict a need for clarification of terms in survey questions. This study compares simulated speech interfaces that use two such models, a generic model and a stereotyped model that distinguishes between the speech of younger and older speakers, to several non-modeling speech interfaces in a task where respondents provided answers to survey questions from fictional scenarios. The modeling procedure found that the best predictor of conceptual misalignment was a critical Goldilocks range for response latency, that is, a response time that is neither too slow nor too fast, outside of which responses are more likely to be conceptually misaligned. Different Goldilocks ranges are effective for younger and older speakers.
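
A minimal sketch of how such a latency-based predictor could drive clarification, assuming age-dependent bounds. The millisecond thresholds below are invented placeholders, not the paper's fitted values.

```python
GOLDILOCKS_MS = {
    "younger": (400, 2500),   # hypothetical lower/upper latency bounds
    "older":   (600, 3500),
}

def needs_clarification(latency_ms: float, age_group: str) -> bool:
    """Flag likely conceptual misalignment when latency falls outside the range."""
    low, high = GOLDILOCKS_MS[age_group]
    return not (low <= latency_ms <= high)

# A speech survey interface might offer clarification of the question's terms when True.
print(needs_clarification(150, "younger"))   # True: suspiciously fast answer
print(needs_clarification(1200, "older"))    # False: within range
```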


Intelligent User Interfaces | 2008

Meeting adjourned: off-line learning interfaces for automatic meeting understanding

Patrick Ehlen; Matthew Purver; John Niekrasz; Kari Lee; Stanley Peters

Upcoming technologies will automatically identify and extract certain types of general information from meetings, such as topics and the tasks people agree to do. We explore interfaces for presenting this information to users after a meeting is completed, using two post-meeting interfaces that display information from topics and action items respectively. These interfaces also provide an excellent forum for obtaining user feedback about the performance of classification algorithms, allowing the system to learn and improve with time. We describe how we manage the delicate balance of obtaining necessary feedback without overburdening users. We also evaluate the effectiveness of feedback from one interface on improvement of future action item detection.
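
A rough sketch of the feedback loop such an interface enables: accept/reject judgments on hypothesized action items become labeled examples for retraining. The function names and data layout are hypothetical.

```python
def collect_feedback(hypotheses, user_judgments):
    """Pair each detected action item with the user's accept/reject decision."""
    return [(h["utterances"], accepted)
            for h, accepted in zip(hypotheses, user_judgments)]

def update_training_set(training_set, feedback):
    """Fold lightweight user feedback back into the detector's training data."""
    for utterances, accepted in feedback:
        # Accepted items become positive examples, rejected ones negative examples,
        # so the detector can improve between meetings without extra annotation effort.
        training_set.append({"text": " ".join(utterances), "label": int(accepted)})
    return training_set
```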


International Conference on Multimodal Interfaces | 2010

Location grounding in multimodal local search

Patrick Ehlen; Michael Johnston

Computational models of dialog context have often focused on unimodal spoken dialog or text, using the language itself as the primary locus of contextual information. But as we move from spoken interaction to situated multimodal interaction on mobile platforms supporting a combination of spoken dialog with graphical interaction, touch-screen input, geolocation, and other non-linguistic contextual factors, we will need more sophisticated models of context that capture the influence of these factors on semantic interpretation and dialog flow. Here we focus on how users establish the location they deem salient from the multimodal context by grounding it through interactions with a map-based query system. While many existing systems rely on geolocation to establish the location context of a query, we hypothesize that this approach often ignores the grounding actions users make, and provide an analysis of log data from one such system that reveals errors that arise from that faulty treatment of grounding. We then explore and evaluate, using live field data from a deployed multimodal search system, several different context classification techniques that attempt to learn the location contexts users make salient by grounding them through their multimodal actions.
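
As an illustration of the classification problem, here is a rule-based stand-in that picks the salient location context from a few multimodal features. The feature names and priorities are assumptions; the paper evaluates learned classifiers on live field data.

```python
def salient_location(features):
    """
    features: dict with keys such as
      'seconds_since_map_gesture', 'query_names_place', 'map_moved_recently'
    Returns one of: 'gesture', 'stated_place', 'map_view', 'geolocation'.
    """
    if features.get("seconds_since_map_gesture", float("inf")) < 10:
        return "gesture"          # user just circled or touched a map location
    if features.get("query_names_place"):
        return "stated_place"     # e.g. "pizza near union square"
    if features.get("map_moved_recently"):
        return "map_view"         # user panned the map somewhere else
    return "geolocation"          # fall back to where the device currently is
```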


Meeting of the Association for Computational Linguistics | 2009

Who is "You"? Combining Linguistic and Gaze Features to Resolve Second-Person References in Dialogue

Matthew Frampton; Raquel Fernández; Patrick Ehlen; C. Mario Christoudias; Trevor Darrell; Stanley Peters

We explore the problem of resolving the second person English pronoun you in multi-party dialogue, using a combination of linguistic and visual features. First, we distinguish generic and referential uses, then we classify the referential uses as either plural or singular, and finally, for the latter cases, we identify the addressee. In our first set of experiments, the linguistic and visual features are derived from manual transcriptions and annotations, but in the second set, they are generated through entirely automatic means. Results show that a multimodal system is often preferable to a unimodal one.
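
A sketch of the three-stage cascade's control flow with stand-in rules; in the paper each stage is a trained classifier over linguistic and gaze features, and the cues below are hypothetical.

```python
def resolve_you(utterance_features, gaze_features, participants):
    """Resolve 'you' in multi-party dialogue via a three-stage cascade."""
    # Stage 1: generic vs. referential use
    if utterance_features.get("generic_cue"):            # e.g. "you never know"
        return {"use": "generic"}
    # Stage 2: referential plural vs. singular
    if utterance_features.get("plural_cue"):             # e.g. "you guys", "you all"
        return {"use": "referential", "number": "plural", "addressees": participants}
    # Stage 3: identify the single addressee, e.g. whoever the speaker looks at most
    addressee = max(participants, key=lambda p: gaze_features.get(p, 0.0))
    return {"use": "referential", "number": "singular", "addressee": addressee}
```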


International Conference on Machine Learning | 2006

Detecting action items in multi-party meetings: annotation and initial experiments

Matthew Purver; Patrick Ehlen; John Niekrasz

This paper presents the results of initial investigation and experiments into automatic action item detection from transcripts of multi-party human-human meetings. We start from the flat action item annotations of [1], and show that automatic classification performance is limited. We then describe a new hierarchical annotation schema based on the roles utterances play in the action item assignment process, and propose a corresponding approach to automatic detection that promises improved classification accuracy while also enabling the extraction of useful information for summarization and reporting.
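
One way to picture the hierarchical scheme is as a data structure in which an action item aggregates utterances by the role they play in the assignment process (description, owner, timeframe, agreement). The class layout below is an illustrative assumption, not the authors' representation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RoleSpan:
    role: str                 # "description" | "owner" | "timeframe" | "agreement"
    utterance_ids: List[int]  # indices of the utterances filling this role

@dataclass
class ActionItem:
    spans: List[RoleSpan] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A hypothesized action item is more reliable when several roles are present.
        return {"description", "owner"} <= {s.role for s in self.spans}
```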


Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2014

MVA: The Multimodal Virtual Assistant

Michael Johnston; John Chen; Patrick Ehlen; Hyuckchul Jung; Jay Lieske; Aarthi M. Reddy; Ethan O. Selfridge; Svetlana Stoyanchev; Brant J. Vasilieff; Jay Gordon Wilpon

The Multimodal Virtual Assistant (MVA) is an application that enables users to plan an outing through an interactive multimodal dialog with a mobile device. MVA demonstrates how a cloud-based multimodal language processing infrastructure can support mobile multimodal interaction. This demonstration will highlight incremental recognition, multimodal speech and gesture input, contextually-aware language understanding, and the targeted clarification of potentially incorrect segments within user input.


North American Chapter of the Association for Computational Linguistics | 2006

Shallow Discourse Structure for Action Item Detection

Matthew Purver; Patrick Ehlen; John Niekrasz

We investigated automatic action item detection from transcripts of multi-party meetings. Unlike previous work (Gruenstein et al., 2005), we use a new hierarchical annotation scheme based on the roles utterances play in the action item assignment process, and propose an approach to automatic detection that promises improved classification accuracy while enabling the extraction of useful information for summarization and reporting.

Collaboration


Dive into Patrick Ehlen's collaborations.

Top Co-Authors

Matthew Purver

Queen Mary University of London