Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ryan Lowe is active.

Publication


Featured research published by Ryan Lowe.


Empirical Methods in Natural Language Processing | 2016

How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

Chia-Wei Liu; Ryan Lowe; Iulian Vlad Serban; Michael Noseworthy; Laurent Charlin; Joelle Pineau

We investigate evaluation metrics for dialogue response generation systems where supervised labels, such as task completion, are not available. Recent work in response generation has adopted metrics from machine translation to compare a model's generated response to a single target response. We show that these metrics correlate very weakly with human judgements in the non-technical Twitter domain, and not at all in the technical Ubuntu domain. We provide quantitative and qualitative results highlighting specific weaknesses in existing metrics, and provide recommendations for future development of better automatic evaluation metrics for dialogue systems.
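The word-overlap metrics the paper critiques compare a generated response against a single reference response. A minimal sketch of clipped unigram precision (the BLEU-1 building block; a hypothetical illustration, not code from the paper) shows why two perfectly acceptable replies can score near zero against one reference:

```python
from collections import Counter

def unigram_overlap(candidate: str, reference: str) -> float:
    """Clipped unigram precision of candidate against a single reference."""
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand)
    clipped = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    return clipped / len(cand)

# A reasonable reply that shares no words with the single reference scores 0:
print(unigram_overlap("sure, i can help with that",
                      "yes of course, what do you need"))  # 0.0
```

Real BLEU additionally combines higher-order n-grams and a brevity penalty, but the single-reference brittleness shown here is the core weakness the paper measures.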


Meeting of the Association for Computational Linguistics | 2017

Towards an automatic Turing test: Learning to evaluate dialogue responses

Ryan Lowe; Michael Noseworthy; Iulian Vlad Serban; Nicolas Angelard-Gontier; Yoshua Bengio; Joelle Pineau

Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem. Unfortunately, existing automatic evaluation metrics are biased and correlate very poorly with human judgements of response quality. Yet having an accurate automatic evaluation procedure is crucial for dialogue research, as it allows rapid prototyping and testing of new models with fewer expensive human evaluations. In response to this challenge, we formulate automatic dialogue evaluation as a learning problem. We present an evaluation model (ADEM) that learns to predict human-like scores for input responses, using a new dataset of human response scores. We show that the ADEM model's predictions correlate significantly, and at a level much higher than word-overlap metrics such as BLEU, with human judgements at both the utterance and system level. We also show that ADEM can generalize to evaluating dialogue models unseen during training, an important step for automatic dialogue evaluation.
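The claim that ADEM correlates with human judgements is measured with standard correlation statistics. A minimal sketch of Pearson correlation between metric scores and human ratings (the data below is hypothetical, purely to illustrate the comparison):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

human = [1, 2, 3, 4, 5]                     # hypothetical human ratings
learned_metric = [1.2, 2.1, 2.9, 4.2, 4.8]  # tracks human scores closely
word_overlap = [3.0, 1.0, 4.0, 2.0, 3.5]    # barely related to human scores
print(pearson(human, learned_metric), pearson(human, word_overlap))
```

A learned metric like ADEM aims to make the first correlation high where word-overlap metrics leave the second near zero.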


Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2016

On the Evaluation of Dialogue Systems with Next Utterance Classification.

Ryan Lowe; Iulian Vlad Serban; Michael Noseworthy; Laurent Charlin; Joelle Pineau

An open challenge in constructing dialogue systems is developing methods for automatically learning dialogue strategies from large amounts of unlabelled data. Recent work has proposed Next-Utterance-Classification (NUC) as a surrogate task for building dialogue systems from text data. In this paper we investigate the performance of humans on this task to validate the relevance of NUC as a method of evaluation. Our results show three main findings: (1) humans are able to correctly classify responses at a rate much better than chance, thus confirming that the task is feasible, (2) human performance levels vary across task domains (we consider 3 datasets) and expertise levels (novice vs experts), thus showing that a range of performance is possible on this type of task, (3) automated dialogue systems built using state-of-the-art machine learning methods have similar performance to the human novices, but worse than the experts, thus confirming the utility of this class of tasks for driving further research in automated dialogue systems.
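Next-Utterance-Classification asks a model to pick the true next utterance from a set of candidates, and is typically scored with Recall@k. A minimal sketch of that scoring rule (hypothetical candidate IDs, not data from the paper):

```python
def recall_at_k(ranked_candidates, true_response, k):
    """NUC metric for one context: 1.0 if the true next utterance
    appears among the model's top-k ranked candidates, else 0.0."""
    return 1.0 if true_response in ranked_candidates[:k] else 0.0

# Hypothetical model ranking of candidate responses for one dialogue context;
# the true response "r1" was ranked third.
ranking = ["r3", "r7", "r1", "r9", "r2"]
print(recall_at_k(ranking, "r1", 1))  # 0.0: not the top choice
print(recall_at_k(ranking, "r1", 5))  # 1.0: within the top 5
```

Averaging this indicator over many contexts gives the Recall@k figures on which humans and machines are compared in the paper.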


Meeting of the Association for Computational Linguistics | 2016

Leveraging Lexical Resources for Learning Entity Embeddings in Multi-Relational Data.

Teng Long; Ryan Lowe; Jackie Chi Kit Cheung; Doina Precup

Recent work in learning vector-space embeddings for multi-relational data has focused on combining relational information derived from knowledge bases with distributional information derived from large text corpora. We propose a simple approach that leverages the descriptions of entities or phrases available in lexical resources, in conjunction with distributional semantics, in order to derive a better initialization for training relational models. Applying this initialization to the TransE model results in significant new state-of-the-art performances on the WordNet dataset, decreasing the mean rank from the previous best of 212 to 51. It also results in faster convergence of the entity representations. We find that there is a trade-off between improving the mean rank and the hits@10 with this approach. This illustrates that much remains to be understood regarding performance improvements in relational models.
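The mean rank and hits@10 figures quoted above are the standard link-prediction metrics: for each test triple, rank the correct entity among all candidates, then aggregate. A minimal sketch of the two aggregations (the ranks below are hypothetical):

```python
def mean_rank(ranks):
    """Average rank of the correct entity across test triples (lower is better)."""
    return sum(ranks) / len(ranks)

def hits_at_10(ranks):
    """Percentage of test triples where the correct entity ranks in the top 10
    (higher is better)."""
    return 100.0 * sum(1 for r in ranks if r <= 10) / len(ranks)

# Hypothetical ranks of the correct entity on five test triples
ranks = [1, 4, 12, 90, 3]
print(mean_rank(ranks))   # 22.0
print(hits_at_10(ranks))  # 60.0
```

The example shows the trade-off the paper observes: a single badly-ranked triple (rank 90) inflates the mean rank heavily while barely affecting hits@10, so the two metrics can move in opposite directions.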


Archive | 2018

The First Conversational Intelligence Challenge

Mikhail Burtsev; Varvara Logacheva; Valentin Malykh; Iulian Vlad Serban; Ryan Lowe; Shrimai Prabhumoye; Alan W. Black; Alexander I. Rudnicky; Yoshua Bengio

The first Conversational Intelligence Challenge was conducted over 2017, with finals at the NIPS conference. The challenge was aimed at evaluating the state of the art in non-goal-driven dialogue systems (chatbots) and at collecting a large dataset of human-to-machine and human-to-human conversations manually labelled for quality. We established a task for the formal human evaluation of chatbots that allows testing a chatbot's capabilities in topic-oriented dialogue. Instead of traditional chit-chat, participating systems and humans were given the task of discussing a short text. Ten dialogue systems participated in the competition. The majority of them combined multiple conversational models, such as question answering and chit-chat systems, to make conversations more natural. The evaluation of chatbots was performed by human assessors. Almost 1,000 volunteers were recruited, and over 4,000 dialogues were collected during the competition. The final dialogue-quality score for the best bot was 2.7, compared to 3.8 for humans. This demonstrates that current technology can support dialogue on a given topic, but with quality significantly lower than that of humans. To close this gap, we plan to continue the experiments by organising the next conversational intelligence competition. This future work will benefit from the data we collected and the dialogue systems that we made available after the competition presented in this paper.


Archive | 2018

Introduction to NIPS 2017 Competition Track

Sergio Escalera; Markus Weimer; Mikhail Burtsev; Valentin Malykh; Varvara Logacheva; Ryan Lowe; Iulian Vlad Serban; Yoshua Bengio; Alexander I. Rudnicky; Alan W. Black; Shrimai Prabhumoye; Łukasz Kidziński; Sharada Prasanna Mohanty; Carmichael F. Ong; Jennifer L. Hicks; Sergey Levine; Marcel Salathé; Scott L. Delp; Iker Huerga; Alexander Grigorenko; Leifur Thorbergsson; Anasuya Das; Kyla Nemitz; Jenna Sandker; Stephen King; Alexander S. Ecker; Leon A. Gatys; Matthias Bethge; Jordan L. Boyd-Graber; Shi Feng

Competitions have become a popular tool in the data science community to solve hard problems, assess the state of the art and spur new research directions. Companies like Kaggle and open source platforms like Codalab connect people with data and a data science problem to those with the skills and means to solve it. Hence, the question arises: What, if anything, could NIPS add to this rich ecosystem?


Numerical Solution of PDE Eigenvalue Problems, Workshop ID: 1347 | 2013

Computation of the H∞-Norm for Large-Scale Systems

Peter Benner; Ryan Lowe; Matthias Voigt

$E\dot{x}(t) = Ax(t) + Bu(t)$, $y(t) = Cx(t)$, where $E, A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{p \times n}$, $x(t) \in \mathbb{R}^n$ is the descriptor vector, $u(t) \in \mathbb{R}^m$ is the input vector, and $y(t) \in \mathbb{R}^p$ is the output vector. Assuming that the pencil $\lambda E - A$ is regular, the relationship between inputs and outputs in the frequency domain is given by the transfer function $G(s) := C(sE - A)^{-1}B$. By $\mathcal{RH}^{p \times m}_\infty$ we denote the Banach space of all rational $p \times m$ matrix-valued functions that are analytic and bounded in the open right half-plane $\mathbb{C}_+ := \{s \in \mathbb{C} : \operatorname{Re}(s) > 0\}$. For $G \in \mathcal{RH}^{p \times m}_\infty$, the $\mathcal{H}_\infty$-norm is defined by $\|G\|_{\mathcal{H}_\infty} := \sup_{s \in \mathbb{C}_+} \|G(s)\|_2 = \sup_{\omega \in \mathbb{R}} \|G(i\omega)\|_2$.
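The second supremum suggests a naive way to estimate the norm: sweep a frequency grid and take the largest singular value of G(iω) at each point. A minimal sketch under that definition (hypothetical 2×2 example with E = I; real large-scale algorithms, like the one the abstract concerns, avoid this dense sweep):

```python
import numpy as np

def hinf_norm_grid(A, B, C, E, omegas):
    """Naive H-infinity norm estimate: max over a frequency grid of the
    largest singular value of G(i*omega) = C (i*omega*E - A)^{-1} B."""
    best = 0.0
    for w in omegas:
        G = C @ np.linalg.solve(1j * w * E - A, B)
        best = max(best, np.linalg.svd(G, compute_uv=False)[0])
    return best

# Hypothetical stable system: G(s) = 1/(s+1) + 1/(s+2), which peaks at omega = 0
A = np.array([[-1.0, 0.0], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 1.0]])
E = np.eye(2)
omegas = np.linspace(-100.0, 100.0, 2001)
print(hinf_norm_grid(A, B, C, E, omegas))  # ~1.5, attained at omega = 0
```

The grid approach costs one dense linear solve per frequency point, which is exactly what becomes infeasible for large n and motivates specialized large-scale methods.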


Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2015

The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems

Ryan Lowe; Nissan Pow; Iulian Vlad Serban; Joelle Pineau


arXiv: Computation and Language | 2016

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

Iulian Vlad Serban; Alessandro Sordoni; Ryan Lowe; Laurent Charlin; Joelle Pineau; Aaron C. Courville; Yoshua Bengio


International Conference on Learning Representations | 2017

An Actor-Critic Algorithm for Sequence Prediction

Dzmitry Bahdanau; Philemon Brakel; Kelvin Xu; Anirudh Goyal; Ryan Lowe; Joelle Pineau; Aaron C. Courville; Yoshua Bengio

Collaboration


Dive into Ryan Lowe's collaborations.

Top Co-Authors
Yoshua Bengio

Université de Montréal


Mikhail Burtsev

Moscow Institute of Physics and Technology


Valentin Malykh

Moscow Institute of Physics and Technology
