Mihajlo Grbovic | Researchain

Publication

Featured research published by Mihajlo Grbovic.

knowledge discovery and data mining | 2015

E-commerce in Your Inbox: Product Recommendations at Scale

Mihajlo Grbovic; Vladan Radosavljevic; Nemanja Djuric; Narayan Bhamidipati; Jaikit Savla; Varun Bhagwan; Doug Sharp

In recent years online advertising has become increasingly ubiquitous and effective. Advertisements shown to visitors fund sites and apps that publish digital content, manage social networks, and operate e-mail services. Given such a large variety of internet resources, determining an appropriate type of advertising for a given platform has become critical to financial success. Native advertisements, namely ads that are similar in look and feel to content, have had great success in news and social feeds. However, to date there has not been a winning formula for ads in e-mail clients. In this paper we describe a system that leverages user purchase history determined from e-mail receipts to deliver highly personalized product ads to Yahoo Mail users. We propose to use a novel neural language-based algorithm specifically tailored for delivering effective product recommendations, which was evaluated against baselines that included showing popular products and products predicted based on co-occurrence. We conducted rigorous offline testing using a large-scale product purchase data set, covering purchases of more than 29 million users from 172 e-commerce websites. Ads in the form of product recommendations were successfully tested on online traffic, where we observed a steady 9% lift in click-through rates over other ad formats in mail, as well as a comparable lift in conversion rates. Following successful tests, the system was launched into production during the holiday season of 2014.

international world wide web conferences | 2015

Hierarchical Neural Language Models for Joint Representation of Streaming Documents and their Content

Nemanja Djuric; Hao Wu; Vladan Radosavljevic; Mihajlo Grbovic; Narayan Bhamidipati

We consider the problem of learning distributed representations for documents in data streams. The documents are represented as low-dimensional vectors and are jointly learned with distributed vector representations of word tokens using a hierarchical framework with two embedded neural language models. In particular, we exploit the context of documents in streams and use one of the language models to model the document sequences, and the other to model word sequences within them. The models learn continuous vector representations for both word tokens and documents such that semantically similar documents and words are close in a common vector space. We discuss extensions to our model, which can be applied to personalized recommendation and social relationship mining by adding further user layers to the hierarchy, thus learning user-specific vectors to represent individual preferences. We validated the learned representations on a public movie rating data set from MovieLens, as well as on a large-scale Yahoo News data set comprising three months of user activity logs collected on Yahoo servers. The results indicate that the proposed model can learn useful representations of both documents and word tokens, outperforming the current state of the art by a large margin.

international acm sigir conference on research and development in information retrieval | 2015

Context- and Content-aware Embeddings for Query Rewriting in Sponsored Search

Mihajlo Grbovic; Nemanja Djuric; Vladan Radosavljevic; Fabrizio Silvestri; Narayan Bhamidipati

Search engines represent one of the most popular web services, visited by more than 85% of internet users on a daily basis. Advertisers are interested in making use of this vast business potential, as the very clear intent signal communicated through the issued query allows effective targeting of users. This idea is embodied in a sponsored search model, where each advertiser maintains a list of keywords they deem indicative of increased user response rate with regard to their business. According to this targeting model, when a query is issued all advertisers with a matching keyword are entered into an auction according to the amount they bid for the query, and the winner gets to show their ad. One of the main challenges is the fact that a query may not match many keywords, resulting in lower auction value, lower ad quality, and lost revenue for advertisers and publishers. A possible solution, called query rewriting, is to expand a query into a set of related queries and use them to increase the number of matched ads. To this end, we propose a rewriting method based on a novel query embedding algorithm, which jointly models query content as well as its context within a search session. As a result, queries with similar content and context are mapped into vectors close in the embedding space, which allows expansion of a query via simple K-nearest neighbor search in the projected space. The method was trained on more than 12 billion sessions, one of the largest corpora reported thus far, and evaluated on both a public TREC data set and an in-house sponsored search data set. The results show the proposed approach significantly outperformed the existing state of the art, strongly indicating its benefits and the monetization potential.
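The K-nearest neighbor expansion step mentioned in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy three-dimensional vectors, the query strings, the helper names `cosine` and `expand_query`, and the choice of cosine similarity as the closeness measure. The abstract does not specify these details, and the sketch does not reproduce the paper's embedding training.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_query(query, embeddings, k=2):
    """Return the k queries whose vectors are most similar to `query`.

    `embeddings` maps a query string to its vector (hypothetical toy data
    here; in practice these would come from a trained embedding model).
    """
    target = embeddings[query]
    scored = [
        (cosine(target, vec), other)
        for other, vec in embeddings.items()
        if other != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


# Hypothetical, hand-made vectors used only to exercise the function.
TOY_EMBEDDINGS = {
    "cheap flights": [0.9, 0.1, 0.0],
    "low cost airfare": [0.8, 0.2, 0.1],
    "airline tickets": [0.7, 0.3, 0.0],
    "pizza delivery": [0.0, 0.1, 0.9],
}

rewrites = expand_query("cheap flights", TOY_EMBEDDINGS, k=2)
print(rewrites)
```

A brute-force scan like this is only practical for small vocabularies; at the scale described in the abstract an approximate nearest-neighbor index would presumably be needed.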

international world wide web conferences | 2015

Evolution of Conversations in the Age of Email Overload

Farshad Kooti; Luca Maria Aiello; Mihajlo Grbovic; Kristina Lerman; Amin Mantrach

Email is a ubiquitous communications tool in the workplace and plays an important role in social interactions. Previous studies of email were largely based on surveys and limited to relatively small populations of email users within organizations. In this paper, we report results of a large-scale study of more than 2 million users exchanging 16 billion emails over several months. We quantitatively characterize the replying behavior in conversations within pairs of users. In particular, we study the time it takes the user to reply to a received message and the length of the reply sent. We consider a variety of factors that affect the reply time and length, such as the stage of the conversation, user demographics, and use of portable devices. In addition, we study how increasing load affects emailing behavior. We find that as users receive more email messages in a day, they reply to a smaller fraction of them, using shorter replies. However, their responsiveness remains intact, and they may even reply to emails faster. Finally, we predict the time to reply, length of reply, and whether the reply ends a conversation. We demonstrate considerable improvement over the baseline in all three prediction tasks, showing the significant role that the factors we uncover play in determining replying behavior. We rank these factors based on their predictive power. Our findings have important implications for understanding human behavior and designing better email management applications for tasks like ranking unread emails.

IEEE Transactions on Industrial Informatics | 2013

Cold Start Approach for Data-Driven Fault Detection

Mihajlo Grbovic; Weichang Li; Niranjan A. Subrahmanya; Adam K. Usadi; Slobodan Vucetic

A typical assumption in supervised fault detection is that abundant historical data are available prior to model learning, where all types of faults have already been observed at least once. This assumption is likely to be violated in practical settings as new fault types can emerge over time. In this paper we study this often overlooked cold start learning problem in data-driven fault detection, where in the beginning only normal operation data are available and faulty operation data become available as the faults occur. We explored how to leverage the strengths of unsupervised and supervised approaches to build a model capable of detecting faults even if none have yet been observed, and of improving over time as new fault types are observed. The proposed framework was evaluated on the benchmark Tennessee Eastman Process data. The proposed fusion model performed better on both unseen and seen faults than the stand-alone unsupervised and supervised models.

european conference on machine learning | 2011

Tracking concept change with incremental boosting by minimization of the evolving exponential loss

Mihajlo Grbovic; Slobodan Vucetic

Methods involving ensembles of classifiers, such as bagging and boosting, are popular due to the strong theoretical guarantees for their performance and their superior results. Ensemble methods are typically designed by assuming the training data set is static and completely available at training time. As such, they are not suitable for online and incremental learning. In this paper we propose IBoost, an extension of AdaBoost for incremental learning via optimization of an exponential cost function which changes over time as the training data changes. The resulting algorithm is flexible and allows a user to customize it based on the computational constraints of the particular application. The new algorithm was evaluated on stream learning in the presence of concept change. Experimental results showed that IBoost achieves better performance than the original AdaBoost trained from scratch each time the data set changes, and that it also outperforms the previously proposed Online Coordinate Boost, Online Boost and its non-stationary modifications, Fast and Light Boosting, ADWIN Online Bagging and DWM algorithms.
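For readers unfamiliar with the exponential loss that this abstract builds on, the following is a minimal sketch of one round of standard batch AdaBoost on binary labels in {-1, +1}. It is not IBoost: the incremental, time-varying loss proposed in the paper is not reproduced here, and the function name `adaboost_round` and the toy labels are assumptions made for illustration.

```python
import math


def adaboost_round(weights, labels, predictions):
    """One round of standard (batch) AdaBoost reweighting.

    weights:     current example weights, assumed to sum to 1
    labels:      true labels in {-1, +1}
    predictions: weak learner outputs in {-1, +1}
    Returns (alpha, new_weights), where alpha is the weak learner's vote.
    """
    # Weighted error of the weak learner (assumed to lie strictly in (0, 1)).
    error = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Each weight is scaled by exp(-alpha * y * h): misclassified examples
    # (y * h = -1) grow and correctly classified examples shrink.
    scaled = [
        w * math.exp(-alpha * y * h)
        for w, y, h in zip(weights, labels, predictions)
    ]
    total = sum(scaled)
    return alpha, [w / total for w in scaled]


# Toy data: four examples, the weak learner misclassifies the second one.
labels = [1, 1, -1, -1]
predictions = [1, -1, -1, -1]
weights = [0.25, 0.25, 0.25, 0.25]

alpha, new_weights = adaboost_round(weights, labels, predictions)
print(alpha, new_weights)
```

After the update the misclassified examples carry half of the total weight, which is what forces the next weak learner to focus on them. Retraining with this procedure from scratch whenever the data change is the baseline the abstract compares against.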

international world wide web conferences | 2015

Search Retargeting using Directed Query Embeddings

Mihajlo Grbovic; Nemanja Djuric; Vladan Radosavljevic; Narayan Bhamidipati

Determining the user audience for online ad campaigns is a critical problem for companies competing in the online advertising space. One of the most popular strategies is search retargeting, which involves targeting users that issued search queries related to the advertiser's core business, commonly specified by advertisers themselves. However, advertisers often fail to include many relevant queries, which results in suboptimal campaigns and negatively impacts revenue for both advertisers and publishers. To address this issue, we use recently proposed neural language models to learn low-dimensional, distributed query embeddings, which can be used to expand query lists with related queries through simple nearest neighbor searches in the embedding space. Experiments on a real-world data set strongly suggest the benefits of the approach.

international symposium on neural networks | 2009

Learning Vector Quantization with adaptive prototype addition and removal

Mihajlo Grbovic; Slobodan Vucetic

Learning Vector Quantization (LVQ) is a popular class of nearest prototype classifiers for multiclass classification. Learning algorithms from this family are widely used because of their intuitively clear learning process and ease of implementation. They run efficiently and in many cases provide state-of-the-art performance. In this paper we propose a modification of the LVQ algorithm that addresses the problems of determining the appropriate number of prototypes, sensitivity to initialization, and sensitivity to noise in the data. The proposed algorithm allows adaptive addition of prototypes at potentially beneficial locations and removal of harmful or less useful prototypes. The prototype addition and removal steps can be easily implemented on top of many existing LVQ algorithms. Experimental results on synthetic and benchmark datasets showed that the proposed modifications can significantly improve LVQ classification accuracy while at the same time determining the appropriate number of prototypes and avoiding the problems of initialization.
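As background for this abstract, here is a minimal sketch of the basic LVQ1 update that such modifications are layered on: the nearest prototype is pulled toward a training point of its own class and pushed away from a point of a different class. The adaptive addition and removal of prototypes proposed in the paper is not shown. The function names, the learning rate, and the toy prototypes are assumptions made for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_index(prototypes, x):
    """Index of the prototype closest to x."""
    return min(
        range(len(prototypes)),
        key=lambda i: squared_distance(prototypes[i][0], x),
    )


def lvq1_step(prototypes, x, label, lr=0.1):
    """Apply one LVQ1 update in place and return the updated index.

    prototypes: list of (vector, class_label) pairs
    """
    i = nearest_index(prototypes, x)
    vec, proto_label = prototypes[i]
    # Attract on a label match, repel on a mismatch.
    sign = 1.0 if proto_label == label else -1.0
    updated = [p + sign * lr * (a - p) for p, a in zip(vec, x)]
    prototypes[i] = (updated, proto_label)
    return i


def predict(prototypes, x):
    """Classify x with the label of its nearest prototype."""
    return prototypes[nearest_index(prototypes, x)][1]


# Toy example with one prototype per class (hypothetical values).
prototypes = [([0.0, 0.0], "a"), ([1.0, 1.0], "b")]
lvq1_step(prototypes, [0.2, 0.0], "a", lr=0.5)  # pulls prototype "a" toward x
print(prototypes[0][0], predict(prototypes, [0.9, 0.8]))
```

Because the number of prototypes and their starting positions are fixed up front in this baseline, it exhibits exactly the sensitivity to prototype count and initialization that the abstract sets out to address.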

international conference on data mining | 2014

Hidden Conditional Random Fields with Deep User Embeddings for Ad Targeting

Nemanja Djuric; Vladan Radosavljevic; Mihajlo Grbovic; Narayan Bhamidipati

Estimating a user's propensity to click on a display ad or purchase a particular item is a critical task in targeted advertising, a burgeoning online industry worth billions of dollars. Better and more accurate estimation methods result in improved online user experience, as only relevant and interesting ads are shown, and may also lead to large benefits for advertisers, as targeted users are more likely to click or make a purchase. In this paper we address this important problem, and propose an approach for improved estimation of ad click or conversion probability based on a sequence of a user's online actions, modeled using a Hidden Conditional Random Fields (HCRF) model. In addition, in order to address the sparsity issue at the input side of the HCRF model, we propose to learn distributed, low-dimensional representations of user actions through a directed skip-gram, a neural architecture suitable for sequential data. Experimental results on a real-world data set comprising thousands of user sessions collected at Yahoo servers clearly indicate the benefits and the potential of the proposed approach, which outperformed competing state-of-the-art algorithms and obtained significant improvements in terms of retrieval measures.

international acm sigir conference on research and development in information retrieval | 2016

Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising

Mihajlo Grbovic; Nemanja Djuric; Vladan Radosavljevic; Fabrizio Silvestri; Ricardo A. Baeza-Yates; Andrew Feng; Erik Ordentlich; Lee Yang; Gavin Owens

Sponsored search represents a major source of revenue for web search engines. The advertising model brings a unique possibility for advertisers to target direct user intent communicated through a search query, usually done by displaying their ads alongside organic search results for queries deemed relevant to their products or services. However, due to the large number of unique queries, it is particularly challenging for advertisers to identify all relevant queries. For this reason search engines often provide a service of advanced matching, which automatically finds additional relevant queries for advertisers to bid on. We present a novel advance match approach based on the idea of semantic embeddings of queries and ads. The embeddings were learned using a large data set of user search sessions, consisting of search queries, clicked ads and search links, while utilizing contextual information such as dwell time and skipped ads. To address the large-scale nature of our problem, both in terms of data and vocabulary size, we propose a novel distributed algorithm for training the embeddings. Finally, we present an approach for overcoming the cold-start problem associated with new ads and queries. We report results of editorial evaluation and online tests on actual search traffic. The results show that our approach significantly outperforms baselines in terms of relevance, coverage and incremental revenue. Lastly, as part of this study, we open-sourced query embeddings that can be used to advance the field.
