Pavel Dmitriev
Microsoft
Publications
Featured research published by Pavel Dmitriev.
Conference on Information and Knowledge Management | 2016
Pavel Dmitriev; Xian Wu
You get what you measure, and you can't manage what you don't measure. Metrics are a powerful tool used in organizations to set goals, decide which new products and features should be released to customers, which new tests and experiments should be conducted, and how resources should be allocated. To a large extent, metrics drive the direction of an organization, and getting metrics right is one of the most important and difficult problems an organization needs to solve. However, creating good metrics that capture long-term company goals is difficult. They try to capture abstract concepts such as success, delight, loyalty, engagement, lifetime value, etc. How can one determine that a metric is a good one? Or, that one metric is better than another? In other words, how do we measure the quality of metrics? Can the evaluation process be automated so that anyone with an idea of a new metric can quickly evaluate it? In this paper we describe the metric evaluation system deployed at Bing, where we have been working on designing and improving metrics for over five years. We believe that by applying a data-driven approach to metric evaluation we have been able to substantially improve our metrics and, as a result, ship better features and improve the search experience for Bing's users.
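The abstract does not spell out how metric quality is scored, but one common way to operationalize the idea, sketched below under stated assumptions, is to grade a candidate metric against a labeled corpus of past experiments by its sensitivity (how often it moves significantly) and its directional agreement with human judgments. The LabeledExperiment type, thresholds, and sample data are hypothetical illustrations, not the system described in the paper.

```python
# Hypothetical sketch: scoring a candidate metric against a labeled corpus of
# past experiments. "label" is a human judgment of whether the tested change
# was good (+1), bad (-1), or neutral (0); "delta" and "p_value" describe the
# candidate metric's movement in that experiment. Names and thresholds are
# illustrative, not from the paper.
from dataclasses import dataclass

@dataclass
class LabeledExperiment:
    label: int        # +1 good change, -1 bad change, 0 neutral
    delta: float      # candidate metric's movement (treatment vs. control)
    p_value: float    # significance of that movement

def evaluate_metric(corpus: list[LabeledExperiment], alpha: float = 0.05):
    """Return (sensitivity, directional agreement) of a candidate metric."""
    non_neutral = [e for e in corpus if e.label != 0]
    significant = [e for e in non_neutral if e.p_value < alpha]
    sensitivity = len(significant) / len(non_neutral) if non_neutral else 0.0
    agreeing = [e for e in significant if (e.delta > 0) == (e.label > 0)]
    agreement = len(agreeing) / len(significant) if significant else 0.0
    return sensitivity, agreement

corpus = [
    LabeledExperiment(label=+1, delta=0.8, p_value=0.01),
    LabeledExperiment(label=-1, delta=-0.3, p_value=0.20),
    LabeledExperiment(label=-1, delta=0.5, p_value=0.03),  # moves the wrong way
]
print(evaluate_metric(corpus))  # -> (0.666..., 0.5)
```

A metric that rarely reaches significance, or that frequently moves against the labeled verdict, would score poorly on both axes in this kind of offline evaluation.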
Software Engineering and Advanced Applications | 2017
Aleksander Fabijan; Pavel Dmitriev; Helena Holmström Olsson; Jan Bosch
Online controlled experiments (for example, A/B tests) are increasingly being performed to guide product development and accelerate innovation in online software product companies. The benefits of controlled experiments have been shown in many cases with incremental product improvement as the objective. In this paper, we demonstrate that the value of controlled experimentation at scale extends beyond this recognized scenario. Based on an exhaustive and collaborative case study in a large software-intensive company with a highly developed experimentation culture, we inductively derive the benefits of controlled experimentation. The contribution of our paper is twofold. First, we present a comprehensive list of benefits and illustrate our findings with five case examples of controlled experiments conducted at Microsoft. Second, we provide guidance on how to achieve each of the benefits. With our work, we aim to provide practitioners in the online domain with knowledge on how to use controlled experimentation to maximize the benefits at the portfolio, product, and team levels.
International Conference on Software Engineering | 2017
Aleksander Fabijan; Pavel Dmitriev; Helena Holmström Olsson; Jan Bosch
Software development companies are increasingly aiming to become data-driven by trying to continuously experiment with the products used by their customers. Although familiar with the competitive edge that A/B testing technology delivers, they seldom succeed in evolving and adopting the methodology. In this paper, based on exhaustive and collaborative case study research in a large software-intensive company with a highly developed experimentation culture, we present the evolution process of moving from ad-hoc customer data analysis towards continuous controlled experimentation at scale. Our main contribution is the Experimentation Evolution Model, in which we detail three phases of evolution: technical, organizational, and business evolution. With our contribution, we aim to provide guidance to practitioners on how to develop and scale continuous experimentation in software organizations with the purpose of becoming data-driven at scale.
Knowledge Discovery and Data Mining | 2017
Pavel Dmitriev; Somit Gupta; Dong Woo Kim; Garnet J. Vaz
Online controlled experiments (e.g., A/B tests) are now regularly used to guide product development and accelerate innovation in software. Product ideas are evaluated as scientific hypotheses, and tested in web sites, mobile applications, desktop applications, services, and operating systems. One of the key challenges for organizations that run controlled experiments is to come up with the right set of metrics [1][2][3]. Having good metrics, however, is not enough. In our experience of running thousands of experiments with many teams across Microsoft, we observed again and again how incorrect interpretations of metric movements may lead to wrong conclusions about the experiment's outcome, which if deployed could hurt the business by millions of dollars. Inspired by Steven Goodman's twelve p-value misconceptions [4], in this paper we share twelve common metric interpretation pitfalls which we observed repeatedly in our experiments. We illustrate each pitfall with a puzzling example from a real experiment, and describe processes, metric design principles, and guidelines that can be used to detect and avoid the pitfall. With this paper, we aim to increase experimenters' awareness of metric interpretation issues, leading to improved quality and trustworthiness of experiment results and better data-driven decisions.
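One frequently cited guardrail in this area is the sample ratio mismatch (SRM) check: if the observed Treatment/Control split deviates from the configured split, metric movements should not be trusted until the cause is understood. A minimal sketch follows, assuming a scipy dependency and using illustrative counts and thresholds rather than figures from the paper.

```python
# Minimal sketch of a sample ratio mismatch (SRM) guardrail check.
# Counts and the 50/50 design split are illustrative, not from the paper.
from scipy.stats import chisquare

control_users, treatment_users = 50_800, 49_200    # observed assignment counts
total = control_users + treatment_users
expected = [total * 0.5, total * 0.5]               # designed 50/50 split

stat, p_value = chisquare(f_obs=[control_users, treatment_users], f_exp=expected)
if p_value < 0.001:
    # A very small p-value means the observed split is unlikely under the
    # configured ratio; metric movements from such an experiment are suspect.
    print(f"Possible SRM: p = {p_value:.2e}; investigate before trusting metrics")
else:
    print(f"No SRM detected (p = {p_value:.3f})")
```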
International Conference on Big Data | 2016
Pavel Dmitriev; Brian Frasca; Somit Gupta; Ron Kohavi; Garnet J. Vaz
Online controlled experiments (e.g., A/B tests) are now regularly used to guide product development and accelerate innovation in software. Product ideas are evaluated as scientific hypotheses, and tested on web sites, mobile applications, desktop applications, services, and operating system features. One of the key challenges for organizations that run controlled experiments is to select an Overall Evaluation Criterion (OEC), i.e., the criterion by which to evaluate the different variants. The difficulty is that short-term changes to metrics may not predict the long-term impact of a change. For example, raising prices likely increases short-term revenue but also likely reduces long-term revenue (customer lifetime value) as users abandon. Degrading search results in a search engine causes users to search more, thus increasing query share short-term, but increasing abandonment and thus reducing long-term customer lifetime value. Ideally, an OEC is based on metrics in a short-term experiment that are good predictors of long-term value. To assess long-term impact, one approach is to run long-term controlled experiments and assume that long-term effects are represented by observed metrics. In this paper we share several examples of long-term experiments and the pitfalls associated with running them. We discuss cookie stability, survivorship bias, selection bias, and perceived trends, and share methodologies that can be used to partially address some of these issues. While there is clearly value in evaluating long-term trends, experimenters running long-term experiments must be cautious, as results may be due to the above pitfalls more than the true delta between the Treatment and Control. We hope our real examples and analyses will sensitize readers to the issues and encourage the development of new methodologies for this important problem.
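To make the survivorship-bias pitfall concrete, the sketch below simulates synthetic data (not data from the paper) in which a Treatment that lowers user satisfaction also causes the least satisfied users to churn; comparing only the users still present at the end of a long-term experiment then understates the harm relative to comparing all assigned users.

```python
# Illustrative sketch (synthetic data, not from the paper) of survivorship bias
# in a long-term experiment: the Treatment hurts satisfaction, and unhappy
# users churn, so a survivors-only comparison looks rosier than reality.
import random

random.seed(7)

def simulate(arm: str, n: int = 100_000):
    users = []
    for _ in range(n):
        satisfaction = random.gauss(0.50, 0.15)
        if arm == "treatment":
            satisfaction -= 0.05          # the change actually hurts users
        churned = satisfaction < 0.35     # the least satisfied users leave
        users.append((satisfaction, churned))
    return users

for arm in ("control", "treatment"):
    users = simulate(arm)
    all_mean = sum(s for s, _ in users) / len(users)
    survivors = [s for s, churned in users if not churned]
    surv_mean = sum(survivors) / len(survivors)
    print(f"{arm:9s}  all users: {all_mean:.3f}   survivors only: {surv_mean:.3f}")
# The survivors-only delta is smaller than the true (intent-to-treat) delta,
# because the Treatment's most dissatisfied users are no longer observed.
```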
International World Wide Web Conference | 2012
Ovidiu Dan; Pavel Dmitriev; Ryen W. White
Search engines record a large amount of metadata each time a user issues a query. While efficiently mining this data can be challenging, the results can be useful in multiple ways, including monitoring search engine performance, improving search relevance, prioritizing research, and optimizing day-to-day operations. In this poster, we describe an approach for mining query log data for actionable insights - specific query segments (sets of queries) that require attention, and actions that need to be taken to improve the segments. Starting with a set of important metrics, we identify query segments that are interesting with respect to these metrics using a distributed frequent itemset mining algorithm.
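The poster relies on a distributed frequent itemset mining algorithm; as a rough single-machine sketch of the underlying idea, the code below enumerates attribute combinations over toy query-log rows and flags frequent segments whose success rate falls well below the overall average. The attributes, rows, and thresholds are illustrative assumptions, not the production pipeline.

```python
# Single-machine sketch of segment mining on query logs: enumerate attribute
# combinations (itemsets), keep the frequent ones, and surface those whose
# success metric is notably below the overall average. All data is toy data.
from itertools import combinations
from collections import defaultdict

log = [
    # each row: (attributes of the query, success flag such as a result click)
    ({"locale": "en-US", "device": "mobile", "vertical": "local"}, 0),
    ({"locale": "en-US", "device": "mobile", "vertical": "web"}, 1),
    ({"locale": "en-US", "device": "desktop", "vertical": "web"}, 1),
    ({"locale": "fr-FR", "device": "mobile", "vertical": "local"}, 0),
    ({"locale": "fr-FR", "device": "mobile", "vertical": "local"}, 0),
    ({"locale": "en-US", "device": "desktop", "vertical": "web"}, 1),
]

counts = defaultdict(lambda: [0, 0])   # itemset -> [query count, success count]
for attrs, success in log:
    items = tuple(sorted(attrs.items()))
    for size in (1, 2):
        for itemset in combinations(items, size):
            counts[itemset][0] += 1
            counts[itemset][1] += success

overall_rate = sum(s for _, s in log) / len(log)
min_support = 2
for itemset, (n, successes) in counts.items():
    rate = successes / n
    if n >= min_support and rate < overall_rate - 0.2:
        segment = ", ".join(f"{k}={v}" for k, v in itemset)
        print(f"segment [{segment}]: {n} queries, success rate {rate:.2f}")
```

In a production setting this enumeration would be distributed and pruned aggressively, but the output has the same shape: query segments that are large enough to matter and underperform on the chosen metric.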
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2017
Alex Deng; Pavel Dmitriev; Somit Gupta; Ron Kohavi; Paul Raff; Lukas Vermeer
The Internet provides developers of connected software, including web sites, applications, and devices, an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using controlled experiments, also known as A/B tests. From front-end user-interface changes to backend algorithms, from search engines (e.g., Google, Bing, Yahoo!) to retailers (e.g., Amazon, eBay, Etsy) to social networking services (e.g., Facebook, LinkedIn, Twitter) to travel services (e.g., Expedia, Airbnb, Booking.com) to many startups, online controlled experiments are now utilized to make data-driven decisions at a wide range of companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher's experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and evaluation of online controlled experiments at scale (100s of concurrently running experiments) across a variety of web sites, mobile apps, and desktop applications presents many pitfalls and new research challenges. In this tutorial we will give an introduction to A/B testing, share key lessons learned from scaling experimentation at Bing to thousands of experiments per year, present real examples, and outline promising directions for future work. The tutorial will go beyond applications of A/B testing in information retrieval and will also discuss practical and research challenges arising in experimentation on web sites and mobile and desktop apps. Our goal in this tutorial is to teach attendees how to scale experimentation for their teams, products, and companies, leading to better data-driven decisions. We also want to inspire more academic research in the relatively new and rapidly evolving field of online controlled experimentation.
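For readers new to the topic, the statistical core of a basic A/B comparison is a two-sample test on a metric of interest. A minimal sketch using a two-proportion z-test on a conversion-style metric is shown below, with illustrative counts rather than any numbers from the tutorial.

```python
# Minimal sketch of the statistical core of an A/B test: a two-proportion
# z-test comparing a conversion-style metric between Control and Treatment.
# Counts are illustrative, not from the tutorial.
from math import sqrt, erf

def two_proportion_ztest(conv_c, n_c, conv_t, n_t):
    p_c, p_t = conv_c / n_c, conv_t / n_t
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # two-sided
    return p_t - p_c, z, p_value

delta, z, p = two_proportion_ztest(conv_c=10_300, n_c=100_000,
                                   conv_t=10_650, n_t=100_000)
print(f"delta = {delta:+.4f}, z = {z:.2f}, p = {p:.4f}")
# A small p-value (e.g. < 0.05) suggests the Treatment moved the metric, but
# decisions should also weigh practical significance and guardrail metrics.
```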
Archive | 2011
Pavel Dmitriev; Wei Zhuang
Archive | 2011
Timothy Edgar; Ambarish Chitnis; Ryen W. White; Pavel Dmitriev; Rajanikanth Ageeru; Ovidiu Dan; Lin Tang