Fifteenth ACM Conference on Recommender Systems | 2021

Learning a Voice-based Conversational Recommender using Offline Policy Optimization


Abstract


Voice-based conversational recommenders offer a natural way to improve recommendation quality by asking the user for missing information. This talk details how we use offline policy optimization to learn a dialog manager that determines which items to present and which clarifying questions to ask in order to maximize the success of the conversation. Counterfactual learning allows us to compare various modeling techniques using only logged conversational data. Our approach is applied to Amazon Music’s first voice browsing experience (“Alexa, help me find music”), which interleaves disambiguation questions and music sample suggestions. Offline policy evaluation results show that an XGBoost reward regressor outperforms linear and neural policies on held-out data. A first user-facing A/B test confirms our offline results: it increases our task completion rate by 8% relative to our production rule-based conversational recommender, while reducing the number of turns needed to complete the task by 20%. A second A/B test shows that extending the set of candidate items to present and adding an embedding-based user-item affinity action feature further improves the task success rate by 4% relative, while reducing the number of turns by a further 13%. These results suggest that offline policy optimization from conversation logs is a viable way to foster conversational recommender research while minimizing the number of user-facing experiments needed to determine the optimal dialog policy.
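
To make the approach concrete, below is a minimal sketch of reward-model-based offline policy optimization as the abstract describes it: an XGBoost regressor is fit on logged (dialog state, action, reward) tuples, and the resulting dialog policy greedily picks the action with the highest predicted reward. All data, feature names, dimensions, and hyperparameters here are illustrative assumptions, not the authors' actual implementation; the embedding-based user-item affinity is sketched as a simple dot product.

```python
# Sketch: direct-method offline policy optimization from conversation logs.
# Everything below (features, shapes, hyperparameters) is hypothetical.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)

# Hypothetical logged data: one row per (dialog state, action) turn.
n_logged, d_state, d_emb = 5000, 8, 16
state_feats = rng.normal(size=(n_logged, d_state))        # e.g., turn index, candidate entropy
ask_flag = rng.integers(0, 2, size=(n_logged, 1)).astype(float)  # ask question vs. present item
user_emb = rng.normal(size=(n_logged, d_emb))
item_emb = rng.normal(size=(n_logged, d_emb))
# Embedding-based user-item affinity action feature: a dot product.
affinity = np.sum(user_emb * item_emb, axis=1, keepdims=True)

X = np.hstack([state_feats, ask_flag, affinity])
reward = rng.integers(0, 2, size=n_logged).astype(float)  # 1 = task completed

# Fit the reward regressor on the logged tuples.
model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X, reward)

def choose_action(state, candidates):
    """Greedy dialog policy: score each candidate action, take the argmax.

    state: (d_state,) state features; candidates: list of (ask_flag, affinity).
    """
    feats = np.array([np.concatenate([state, [a], [s]]) for a, s in candidates])
    return int(np.argmax(model.predict(feats)))
```

In practice one would validate such a direct-method estimate with counterfactual estimators (e.g., inverse propensity scoring) on held-out logs before committing to a user-facing A/B test, which is the workflow the abstract advocates.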

DOI: 10.1145/3460231.3474600
Language: English
Journal: Fifteenth ACM Conference on Recommender Systems
