Anmol Bhasin | Researchain

Publication

Featured research published by Anmol Bhasin.

international world wide web conferences | 2013

Is it time for a career switch?

Jian Wang; Yi Zhang; Christian Posse; Anmol Bhasin

Tenure is a critical factor for an individual to consider when making a job transition. For instance, software engineers make the transition to senior software engineer in about 2 years on average, while it takes approximately 3 years for realtors to become brokers. While most existing work on recommender systems focuses on what to recommend to a user, this paper emphasizes when to make appropriate recommendations and how that timing affects item selection in the context of a job recommender system. The approach we propose, however, is general and can be applied to any recommendation scenario where the decision-making process depends on the tenure (i.e., the time interval) between successive decisions. Our approach is inspired by the proportional hazards model in statistics. It models the tenure between two successive decisions and related factors. We further extend the model with a hierarchical Bayesian framework to address the problem of data sparsity. The proposed model estimates the likelihood of a user's decision to make a job transition at a certain time, which is denoted as the tenure-based decision probability. New and appropriate evaluation metrics are designed to analyze the model's performance in deciding when it is the right time to recommend a job to a user. We validate the soundness of our approach by evaluating it on an anonymized job application dataset spanning 140+ industries on LinkedIn. Experimental results show that the hierarchical proportional hazards model better predicts users' decision times, which in turn helps the recommender system achieve higher utility and user satisfaction.
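The abstract names the proportional hazards model without spelling it out. As a rough illustration only (not the paper's hierarchical Bayesian model), the sketch below assumes a Weibull baseline, so the cumulative baseline hazard is H0(t) = (t / scale)^shape, and covariates x scale the hazard by exp(beta · x). The probability of a transition in the window (t, t + window], given none by tenure t, is then 1 - exp(-[H0(t + window) - H0(t)] · exp(beta · x)). The function names, the Weibull choice, and all parameter values are hypothetical.

```python
import math


def weibull_cum_hazard(t: float, scale: float, shape: float) -> float:
    """Hypothetical Weibull cumulative baseline hazard H0(t) = (t / scale) ** shape."""
    return (t / scale) ** shape


def transition_probability(tenure, window, coefs, features, scale, shape):
    """P(transition in (tenure, tenure + window] | no transition by tenure).

    Under proportional hazards the covariates multiply the baseline hazard
    by exp(beta . x); survival is S(t) = exp(-H0(t) * exp(beta . x)).
    """
    if tenure < 0 or window < 0:
        raise ValueError("tenure and window must be non-negative")
    risk = math.exp(sum(b * x for b, x in zip(coefs, features)))
    delta_h = (weibull_cum_hazard(tenure + window, scale, shape)
               - weibull_cum_hazard(tenure, scale, shape))
    # 1 - S(t + window) / S(t)
    return 1.0 - math.exp(-delta_h * risk)


if __name__ == "__main__":
    # Made-up parameters: shape > 1 means the hazard rises with tenure.
    params = dict(coefs=[0.3], features=[1.0], scale=2.0, shape=2.0)
    p_early = transition_probability(0.5, 0.5, **params)
    p_late = transition_probability(2.0, 0.5, **params)
    print(f"P(move within 6 months | tenure 0.5y) = {p_early:.3f}")
    print(f"P(move within 6 months | tenure 2.0y) = {p_late:.3f}")
```

With shape above 1 the same user gets a higher transition probability after 2 years than after 6 months, which is the kind of timing signal a "when to recommend" system could act on.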

knowledge discovery and data mining | 2015

From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks

Ya Xu; Nanyu Chen; Addrian Fernandez; Omar Sinno; Anmol Bhasin

A/B testing, also known as bucket testing, split testing, or controlled experimentation, is a standard way to evaluate user engagement or satisfaction with a new service, feature, or product. It is widely used by websites, including social network sites such as Facebook, LinkedIn, and Twitter, to make data-driven decisions. At LinkedIn, we have seen tremendous growth in controlled experiments over time, with over 400 concurrent experiments now running per day. General A/B testing frameworks and methodologies, including challenges and pitfalls, have been discussed extensively in several previous KDD papers [7, 8, 9, 10]. In this paper, we describe in depth the experimentation platform we have built at LinkedIn and the challenges that arise particularly when running A/B tests at large scale in a social network setting. We start with an introduction to the experimentation platform and how it is built to handle each step of the A/B testing process at LinkedIn, from designing and deploying experiments to analyzing them. This is followed by discussions of several more sophisticated A/B testing scenarios, such as running offline experiments and addressing the network effect, where one user's action can influence that of another. Lastly, we talk about features and processes that are crucial for building a strong experimentation culture.
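The abstract does not show the platform's statistics. As a generic, hypothetical illustration of the "analyzing" step, here is a standard two-proportion z-test comparing conversion rates between a control bucket (A) and a treatment bucket (B). It assumes users respond independently, which is exactly what the network effect mentioned above can violate; the function name and the counts are made up.

```python
import math


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (lift, z, two-sided p-value) for conversion rates of B vs. A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both buckets convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("degenerate data: pooled rate is 0 or 1")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return p_b - p_a, z, p_value


if __name__ == "__main__":
    # Hypothetical buckets of 20,000 users each.
    lift, z, p = two_proportion_ztest(1000, 20000, 1100, 20000)
    print(f"lift={lift:.4f} z={z:.2f} p={p:.4f}")
```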

knowledge discovery and data mining | 2014

Modeling professional similarity by mining professional career trajectories

Ye Xu; Zang Li; Abhishek Gupta; Ahmet Bugdayci; Anmol Bhasin

For decades, large corporations as well as labor placement services have maintained extensive yet static resume databanks. Online professional networks like LinkedIn have turned these resume databanks into a dynamic, constantly updated, massive-scale professional profile dataset spanning career records from hundreds of industries, millions of companies, and hundreds of millions of people worldwide. Using this dataset, this paper models each individual's profile as a time series of nodes, where each node represents one position or job experience in the individual's career trajectory. These career trajectory models can be employed in various applications, including career trajectory planning for students in schools and universities using knowledge inferred from real-world career outcomes. They can also be employed to decode sequences that uncover paths leading to certain professional milestones from a user's current professional status. We deploy the proposed technique to ascertain professional similarity between two individuals by developing a similarity measure, SimCareers (Similar Career Paths). The measure employs sequence alignment between two career trajectories to quantify the professional similarity between career paths. To the best of our knowledge, SimCareers is the first framework to model professional similarity between two people by taking into account their career trajectory information. We posit that, for this application, using the temporal and structural features of a career trajectory to model profile similarity is a far superior approach to applying similarity measures to a semi-structured attribute representation of a profile. We validate our hypothesis through extensive quantitative evaluations on a gold dataset of similar profiles generated from the activity logs of actual recruiters using LinkedIn.
In addition, we show significant improvements in engagement by running an A/B test on a real-world application called Similar Profiles on LinkedIn, the world's largest online professional network.
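The abstract says SimCareers relies on sequence alignment but gives no scoring scheme. The sketch below uses plain Needleman-Wunsch global alignment over sequences of position titles, with hypothetical match, mismatch, and gap scores; it illustrates the general idea of aligning two career trajectories and is not the paper's measure.

```python
from typing import Sequence


def alignment_score(a: Sequence[str], b: Sequence[str],
                    match: float = 1.0, mismatch: float = -1.0,
                    gap: float = -1.0) -> float:
    """Needleman-Wunsch global alignment score of two position sequences.

    The scoring values are illustrative assumptions, not SimCareers' scheme.
    """
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # align the two positions
                           dp[i - 1][j] + gap,      # skip a position in a
                           dp[i][j - 1] + gap)      # skip a position in b
    return dp[n][m]


if __name__ == "__main__":
    path_1 = ["engineer", "senior engineer", "manager"]
    path_2 = ["engineer", "manager"]
    print(alignment_score(path_1, path_2))
```

A real system would likely replace exact string equality with a similarity between standardized titles, companies, and durations; exact matching keeps the sketch short.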

international world wide web conferences | 2015

Network A/B Testing: From Sampling to Estimation

Huan Gui; Ya Xu; Anmol Bhasin; Jiawei Han

A/B testing, also known as bucket testing, split testing, or controlled experimentation, is a standard way to evaluate user engagement or satisfaction with a new service, feature, or product. It is widely used by websites, including social network sites such as Facebook, LinkedIn, and Twitter, to make data-driven decisions. The goal of A/B testing is to estimate the treatment effect of a new change, which becomes intricate when users interact, i.e., when the treatment effect on one user may spill over to other users via underlying social connections. When conducting these online controlled experiments, it is common practice to make the Stable Unit Treatment Value Assumption (SUTVA), i.e., that each individual's response is affected by their own treatment only. Though this assumption simplifies the estimation of the treatment effect, it does not hold when network interference is present, and relying on it may even lead to wrong conclusions. In this paper, we study the problem of network A/B testing in real networks, which have substantially different characteristics from the simulated random networks studied in previous work. First, we examine the existence of a network effect in a recent online experiment conducted at LinkedIn. Second, we propose an efficient and effective estimator of the Average Treatment Effect (ATE) that accounts for interference between users in real online experiments. Finally, we apply our method in both simulations and a real-world online experiment. The simulation results show that our estimator achieves better performance with respect to both bias and variance reduction. The real-world online experiment not only demonstrates that large-scale network A/B testing is feasible but also further validates many of our observations from the simulation studies.
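To make the interference problem concrete, the hypothetical sketch below contrasts a naive difference in means with a simple exposure-based estimator that only compares treated users whose neighbors are mostly treated against control users whose neighbors are mostly control. This neighborhood-exposure idea is a common approach in the network-experiment literature and is not claimed to be the estimator proposed in the paper; the adjacency-dict graph format, the 0.75 threshold, and the function names are assumptions.

```python
from statistics import mean


def neighbor_treated_fraction(user, graph, assignment):
    """Fraction of a user's neighbors in treatment (None if isolated)."""
    nbrs = graph.get(user, [])
    if not nbrs:
        return None
    return sum(assignment[v] for v in nbrs) / len(nbrs)


def naive_ate(assignment, outcome):
    """Difference in means that ignores the network (valid under SUTVA)."""
    treated = [outcome[u] for u, z in assignment.items() if z == 1]
    control = [outcome[u] for u, z in assignment.items() if z == 0]
    return mean(treated) - mean(control)


def exposure_ate(graph, assignment, outcome, threshold=0.75):
    """Compare 'mostly treated' neighborhoods against 'mostly control' ones."""
    treated, control = [], []
    for u, z in assignment.items():
        frac = neighbor_treated_fraction(u, graph, assignment)
        if frac is None:
            continue
        if z == 1 and frac >= threshold:
            treated.append(outcome[u])
        elif z == 0 and frac <= 1 - threshold:
            control.append(outcome[u])
    if not treated or not control:
        raise ValueError("no users satisfy the exposure conditions")
    return mean(treated) - mean(control)


if __name__ == "__main__":
    # Toy undirected graph: a treated cluster {0,1,2}, a control cluster {3,4,5},
    # plus users 6 (treated) and 7 (control) attached to the opposite cluster.
    graph = {0: [1, 2, 7], 1: [0, 2], 2: [0, 1], 3: [4, 5, 6],
             4: [3, 5], 5: [3, 4], 6: [3], 7: [0]}
    assignment = {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0, 6: 1, 7: 0}
    # Simulated outcome: direct effect 1 plus a spillover from treated neighbors,
    # so treating everyone vs. no one changes the outcome by 2.
    outcome = {u: 1 + z + neighbor_treated_fraction(u, graph, assignment)
               for u, z in assignment.items()}
    print("naive:", naive_ate(assignment, outcome))
    print("exposure-based:", exposure_ate(graph, assignment, outcome))
```

With positive spillover, the naive contrast tends to understate the effect of treating everyone, because control users with treated neighbors also benefit. The exposure-based estimate recovers the total effect in this toy case, at the cost of discarding users who meet neither exposure condition, which is one reason randomization design (e.g., cluster-level assignment) matters in practice.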

knowledge discovery and data mining | 2015

Transfer Learning for Bilingual Content Classification

Qian Sun; Mohammad Shafkat Amin; Baoshi Yan; Craig Martell; Vita Markman; Anmol Bhasin; Jieping Ye

LinkedIn Groups provide a platform on which professionals with similar backgrounds, goals, and specialties can share content, take part in discussions, and form opinions on industry topics. As in most online social communities, spam content in LinkedIn Groups poses great challenges to the user experience and could eventually lead to a substantial loss of active users. Building an intelligent and scalable spam detection system is highly desirable but faces difficulties such as a lack of labeled training data, particularly for languages other than English. In this paper, we take Spanish spam job posting detection as the target problem and build a generic machine learning pipeline for multilingual spam detection. The main components are feature generation and knowledge migration via transfer learning. Specifically, in the feature generation phase, a relatively large labeled data set is generated via machine translation. Together with a large set of unlabeled human-written Spanish data, this data is used to generate frequency-based unigram features. In the second phase, the machine-translated data are reweighted to capture their discrepancy from human-written data, and classifiers are built on top of them. To make effective use of the small portion of labeled human-written Spanish data available, an adaptive transfer learning algorithm is proposed to further improve performance. We evaluate the proposed method on LinkedIn's production data, and the promising results verify the efficacy of the proposed algorithm. The pipeline is ready for production.

conference on recommender systems | 2014

Improving the discriminative power of inferred content information using segmented virtual profile

Haishan Liu; Anuj Goyal; Trevor Walker; Anmol Bhasin

We present a novel component of a hybrid recommender system at LinkedIn, where item features are augmented by a virtual profile based on observed user-item interactions. A virtual profile is generated by representing an item in the user feature space and leveraging the overrepresented user features from users who interacted with the item. It is a way to think about collaborative filtering with content features. The core principle is that if a feature occurs with high probability among the users who interacted with an item (henceforth termed relevant users) versus those who did not (henceforth termed non-relevant users), then that feature is a good candidate for inclusion in the virtual profile of the item in question. However, this scheme suffers from the data imbalance problem, because observed relevant users are usually an extremely small minority compared to the whole user base. Feature selection in this skewed setting is prone to noise from the overwhelming number of non-relevant examples that belong to the majority group. To alleviate the problem, we propose a method to select the most relevant non-relevant examples from the majority group by segmenting users on certain intelligently selected feature dimensions. The resulting virtual profile is called the segmented virtual profile. Empirical evaluation on a real-world, large-scale recommender system at LinkedIn shows that our segmentation strategies yield significantly better results.
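The core principle above (keep features that are much more frequent among relevant users than among non-relevant ones) can be sketched as a smoothed lift test. This is a minimal illustration under stated assumptions, not the paper's method: users are hypothetical sets of feature strings, lift is the ratio of Laplace-smoothed feature rates, the 2.0 cutoff is arbitrary, and the segmentation step simply keeps non-relevant users who share a segment feature (here a hypothetical "industry:" prefix) with some relevant user.

```python
from collections import Counter
from typing import Dict, List, Set


def virtual_profile(relevant: List[Set[str]], non_relevant: List[Set[str]],
                    min_lift: float = 2.0, alpha: float = 1.0) -> Dict[str, float]:
    """Features over-represented among relevant users, mapped to their lift."""
    rel_counts = Counter(f for user in relevant for f in user)
    non_counts = Counter(f for user in non_relevant for f in user)
    profile = {}
    for feature, count in rel_counts.items():
        # Laplace-smoothed rate of users having the feature in each group.
        p_rel = (count + alpha) / (len(relevant) + 2 * alpha)
        p_non = (non_counts[feature] + alpha) / (len(non_relevant) + 2 * alpha)
        lift = p_rel / p_non
        if lift >= min_lift:
            profile[feature] = lift
    return profile


def segment_non_relevant(relevant: List[Set[str]], non_relevant: List[Set[str]],
                         segment_prefix: str) -> List[Set[str]]:
    """Keep non-relevant users sharing a segment feature with a relevant user."""
    segments = {f for user in relevant for f in user if f.startswith(segment_prefix)}
    return [user for user in non_relevant if user & segments]


if __name__ == "__main__":
    relevant = [{"industry:software", "skill:python"},
                {"industry:software", "skill:python"},
                {"industry:software", "skill:java"}]
    non_relevant = ([{"industry:finance", "skill:excel"}] * 50
                    + [{"industry:software", "skill:java"}] * 10
                    + [{"industry:software", "skill:python"}])
    print(sorted(virtual_profile(relevant, non_relevant)))
    segmented = segment_non_relevant(relevant, non_relevant, "industry:")
    print(sorted(virtual_profile(relevant, segmented)))
```

In this toy data the unsegmented profile keeps features that merely describe the software segment, while comparing against software users only leaves the feature that actually distinguishes the relevant users.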

advances in social networks analysis and mining | 2011

Entity Resolution Using Social Graphs for Business Applications

Baoshi Yan; Lokesh Bajaj; Anmol Bhasin

Social networks such as LinkedIn maintain profiles for their members in a semi-structured format. Many business applications, such as ad targeting and content recommendation, rely on the canonicalization of data elements like companies, titles, and schools to enable fine-grained advertising or to recommend candidates for job postings. In this paper, we explore the issues involved in resolving company names for hundreds of millions of member positions to known company entities using the social graph. We propose a machine learning approach leveraging three feature sets: the social graph, social behavior, and various content and demographic features. Experiments show that our approach achieves high precision at reasonable coverage and is significantly superior to a baseline content-based approach.

knowledge discovery and data mining | 2015

Distributed Personalization

Xu Miao; Chun-Te Chu; Lijun Tang; Yitong Zhou; Joel Young; Anmol Bhasin

Personalization is a long-standing problem in data mining and machine learning. Companies make personalized product recommendations to millions of users every second. Beyond recommendation, the emergence of personal devices means that many conventional problems, e.g., recognition, need to be personalized as well. Moreover, as the number of users grows, solving personalization becomes quite challenging. In this paper, we formalize the generic personalization problem as an optimization problem. We propose several ADMM algorithms to solve this problem in a distributed way, including a new Asynchronous ADMM that removes all synchronization barriers to maximize training throughput. We provide a mathematical analysis showing that the proposed Asynchronous ADMM algorithm achieves a linear convergence rate, which to our knowledge is the best known. Distributed personalization allows training to be performed either on a cluster or even on a user's device. This can improve privacy protection, as no personal data is uploaded, while personal models can still be shared with each other. We apply this approach to two industry problems, Facial Expression Recognition and Job Recommendation. Experiments demonstrate more than 30% relative error reduction on both problems. Asynchronous ADMM allows faster training for problems with millions of users, since it eliminates all network I/O waiting time to maximize cluster CPU throughput. Experiments show it to be 4 times faster than the original synchronous ADMM algorithm.
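The abstract does not spell out the personalization objective, so the sketch below shows only the textbook synchronous consensus ADMM (scaled dual form) on toy scalar losses f_i(x) = 0.5 · (x - a_i)^2, one per user. It illustrates the barrier-synchronized loop that the paper's Asynchronous ADMM is described as removing; it is not the paper's algorithm, and the function name, penalty rho, and iteration count are assumptions.

```python
from typing import List, Tuple


def consensus_admm(targets: List[float], rho: float = 1.0,
                   iters: int = 100) -> Tuple[float, List[float]]:
    """Synchronous consensus ADMM for min sum_i 0.5*(x_i - a_i)^2 s.t. x_i = z."""
    n = len(targets)
    x = [0.0] * n  # local (per-user) models
    u = [0.0] * n  # scaled dual variables
    z = 0.0        # shared consensus model
    for _ in range(iters):
        # Local step, parallelizable per user: closed-form minimizer of
        # 0.5*(x - a_i)^2 + (rho/2)*(x - z + u_i)^2.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Global step: a synchronization barrier, since it needs every x_i.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each user's disagreement with the consensus.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z, x


if __name__ == "__main__":
    z, x = consensus_admm([1.0, 2.0, 6.0])
    print(f"consensus z = {z:.6f}")  # the minimizer here is the mean of the targets
```

For these quadratic losses the consensus solution is the mean of the targets, which makes the loop easy to check; an asynchronous variant would let the global step proceed with stale local updates instead of waiting for all of them.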

ieee international conference on data science and advanced analytics | 2015

A context-aware approach to detection of short irrelevant texts

Sihong Xie; Jing Wang; Mohammad Shafkat Amin; Baoshi Yan; Anmol Bhasin; Clement T. Yu; Philip S. Yu

This paper presents a simple and effective framework that can detect irrelevant short texts posted after blogs, news articles, etc., in a context-aware and timely fashion. Nowadays, websites such as Linkedin.com and CNN.com allow their visitors to leave comments after articles, and spammers exploit this feature to post irrelevant content. Visited by millions of readers per day, these websites have extremely high visibility, and irrelevant comments have a detrimental effect on their traffic and revenue. Therefore, it is critical to eliminate these irrelevant comments as accurately and as early as possible. Unlike traditional text mining tasks, comments following news and blog articles are characterized by brevity and context-dependent semantics, making it difficult to measure semantic relevance. What's worse, there may be only a handful of comments soon after an article is posted, leading to a severe lack of information for measuring semantics and relevance. We propose to infer “context-aware semantics” to address these challenges in a unified framework. Specifically, we construct contexts for comments using either blocks of surrounding comments or comments collected via a principled transfer learning approach. The constructed contexts mitigate sparseness and sharply define the context-dependent semantics of comments, even at the early stage of commenting activity, allowing traditional dimension reduction methods to better capture the semantics of short texts in a context-aware way. We confirm the effectiveness of the proposed method on two real-world datasets consisting of news and blog articles and their comments, with a maximum improvement of 20% in Area Under the Precision-Recall Curve.

conference on recommender systems | 2013

Beyond friendship: the art, science and applications of recommending people to people in social networks

Luiz Augusto Sangoi Pizzato; Anmol Bhasin

While recommender systems are powerful drivers of engagement and transactional utility in social networks, people recommenders are a fairly involved and diverse subdomain. Consider that movies are recommended to be watched and news is recommended to be read; people, however, are recommended for a plethora of reasons, such as people to befriend, follow, or partner with, targets for an advertisement or service, recruiting, romantic partnering, and joining thematic interest groups. This tutorial aims first to describe the problem domain and touch upon classical approaches like link analysis and collaborative filtering, and then to take a rapid deep dive into the unique aspects of this problem space, such as reciprocity, understanding the intent of both the recommender and the recommendee, contextual people recommendations in communication flows, and social referrals, a paradigm for delivering recommendations using the social graph. These aspects will be discussed in the context of published original work developed by the authors and their collaborators, in many cases deployed in massive-scale real-world applications on professional networks such as LinkedIn.
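Reciprocity means a people recommendation should suit both sides. One common way to express this, assumed here for illustration since the abstract does not prescribe a formula, is to score a pair by the harmonic mean of the two directional preference estimates, which stays low unless both directions are high. The preference dictionary, its values, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def reciprocal_score(p_ab: float, p_ba: float) -> float:
    """Harmonic mean of P(a is interested in b) and P(b is interested in a)."""
    if p_ab <= 0.0 or p_ba <= 0.0:
        return 0.0  # no mutual interest if either side has none
    return 2.0 * p_ab * p_ba / (p_ab + p_ba)


def rank_reciprocal(user: str, candidates: List[str],
                    pref: Dict[Tuple[str, str], float]) -> List[str]:
    """Order candidates by reciprocal score with respect to `user`."""
    return sorted(candidates,
                  key=lambda c: reciprocal_score(pref[(user, c)], pref[(c, user)]),
                  reverse=True)


if __name__ == "__main__":
    # "a" is very appealing to "u" but barely interested back;
    # "b" is moderately interesting in both directions.
    pref = {("u", "a"): 0.9, ("a", "u"): 0.1,
            ("u", "b"): 0.5, ("b", "u"): 0.5}
    print(rank_reciprocal("u", ["a", "b"], pref))
```

A one-directional recommender would put "a" first; the reciprocal score prefers "b", where interest is mutual.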
