Baoshi Yan | Researchain

Publication

Featured research published by Baoshi Yan.

knowledge discovery and data mining | 2015

Transfer Learning for Bilingual Content Classification

Qian Sun; Mohammad Shafkat Amin; Baoshi Yan; Craig Martell; Vita Markman; Anmol Bhasin; Jieping Ye

LinkedIn Groups provide a platform on which professionals with similar backgrounds, goals and specialities can share content, take part in discussions and establish opinions on industry topics. As in most online social communities, spam content in LinkedIn Groups poses great challenges to the user experience and could eventually lead to a substantial loss of active users. Building an intelligent and scalable spam detection system is highly desirable but faces difficulties such as a lack of labeled training data, particularly for languages other than English. In this paper, we take the detection of spam job postings in Spanish as the target problem and build a generic machine learning pipeline for multilingual spam detection. The main components are feature generation and knowledge migration via transfer learning. Specifically, in the feature generation phase, a relatively large labeled data set is generated via machine translation. Together with a large set of unlabeled human-written Spanish data, it is used to generate frequency-based unigram features. In the second phase, the machine-translated data are reweighted to capture their discrepancy from the human-written data, and classifiers are built on top of them. To make effective use of the small portion of labeled data available in human-written Spanish, an adaptive transfer learning algorithm is proposed to further improve performance. We evaluate the proposed method on LinkedIn's production data, and the promising results verify the efficacy of our proposed algorithm. The pipeline is ready for production.
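
The abstract does not say how the machine-translated data are reweighted, so the following is only a minimal sketch of one common idea, not the paper's method. It weights each translated document by a length-normalised unigram likelihood ratio between a human-written corpus and the translated corpus. The function names, the Laplace smoothing, and the toy corpora are all assumptions made for illustration.

```python
from collections import Counter
import math


def unigram_counts(docs):
    """Count whitespace-separated, lower-cased tokens over a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def log_prob(tokens, counts, vocab_size, total):
    """Laplace-smoothed unigram log-probability of a token sequence."""
    return sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens)


def importance_weights(translated_docs, human_docs):
    """Weight each translated document by how 'human-like' its unigrams are.

    Illustrative assumption: weight = exp of the per-token log-likelihood
    ratio (human corpus vs. translated corpus), rescaled to have mean 1.
    """
    mt_counts = unigram_counts(translated_docs)
    hw_counts = unigram_counts(human_docs)
    vocab_size = len(set(mt_counts) | set(hw_counts))
    mt_total = sum(mt_counts.values())
    hw_total = sum(hw_counts.values())

    weights = []
    for doc in translated_docs:
        tokens = doc.lower().split()
        if not tokens:
            weights.append(0.0)
            continue
        log_ratio = (
            log_prob(tokens, hw_counts, vocab_size, hw_total)
            - log_prob(tokens, mt_counts, vocab_size, mt_total)
        ) / len(tokens)
        weights.append(math.exp(log_ratio))

    mean = sum(weights) / len(weights)
    return [w / mean for w in weights] if mean > 0 else weights


if __name__ == "__main__":
    human = ["busco trabajo en madrid", "oferta de trabajo en madrid"]
    translated = ["oferta de trabajo en madrid", "dinero dinero clic clic aqui"]
    print(importance_weights(translated, human))
```

On this toy input the first translated document, which shares its vocabulary with the human-written corpus, receives a larger weight than the second. Such weights could then be passed as per-sample weights to whatever classifier is trained on the translated data.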

advances in social networks analysis and mining | 2011

Entity Resolution Using Social Graphs for Business Applications

Baoshi Yan; Lokesh Bajaj; Anmol Bhasin

Social networks such as LinkedIn maintain profiles for their members in a semi-structured format. Many business applications, such as ad targeting and content recommendation, rely on canonicalization of data elements like companies, titles and schools to enable fine-grained advertising or to recommend candidates for job postings. In this paper we explore the issues around resolving company names for hundreds of millions of member positions to known company entities using the social graph. We propose a machine learning approach leveraging three families of features: the social graph, social behavior, and various content and demographic features. Experiments show that our approach achieves high precision at a reasonable coverage and is significantly superior to a baseline content-based approach.
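
To illustrate why a social-graph signal can help where content alone is ambiguous, here is a small sketch under stated assumptions. It is not the paper's model: the paper learns from several feature families, whereas this example hand-combines just two hypothetical features (string similarity of the company name, and the share of a member's connections employed at each candidate) with a fixed weight. All names and data structures are invented for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(raw_name, canonical_name):
    """Content feature: character-level similarity between two company names."""
    return SequenceMatcher(None, raw_name.lower(), canonical_name.lower()).ratio()


def connection_share(member, company, connections, employer):
    """Graph feature: fraction of the member's connections who work at `company`."""
    friends = connections.get(member, [])
    if not friends:
        return 0.0
    return sum(1 for f in friends if employer.get(f) == company) / len(friends)


def resolve(member, raw_name, candidates, connections, employer, graph_weight=0.5):
    """Pick the candidate entity with the best combined score.

    The fixed linear combination is an illustrative stand-in for a
    learned model; `graph_weight` is a hypothetical parameter.
    """
    def score(candidate):
        return (
            (1 - graph_weight) * name_similarity(raw_name, candidate)
            + graph_weight * connection_share(member, candidate, connections, employer)
        )
    return max(candidates, key=score)


if __name__ == "__main__":
    connections = {"m1": ["a", "b", "c"]}
    employer = {"a": "Apple Bank", "b": "Apple Bank", "c": "Apple Inc."}
    print(resolve("m1", "Apple", ["Apple Inc.", "Apple Bank"], connections, employer))
```

In this toy case the raw name "Apple" is equally similar to both candidates, so the connection share is what separates them.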

ieee international conference on data science and advanced analytics | 2015

A context-aware approach to detection of short irrelevant texts

Sihong Xie; Jing Wang; Mohammad Shafkat Amin; Baoshi Yan; Anmol Bhasin; Clement T. Yu; Philip S. Yu

This paper presents a simple and effective framework that can detect irrelevant short text content following blogs, news articles and the like, in a context-aware and timely fashion. Nowadays, websites such as LinkedIn.com and CNN.com allow their visitors to leave comments after articles, and spammers are exploiting this feature to post irrelevant content. Visited by millions of readers per day, these websites have extremely high visibility, and irrelevant comments have a detrimental effect on their traffic and revenue. Therefore, it is critical to eliminate these irrelevant comments as accurately and early as possible. Unlike traditional text mining tasks, comments following news and blog articles are characterized by brevity and context-dependent semantics, making it difficult to measure semantic relevance. What's worse, there may be only a handful of comments soon after an article is posted, leading to a severe lack of information for measuring semantics and relevance. We propose to infer “context-aware semantics” to address the above challenges in a unified framework. Specifically, we construct contexts for comments using either blocks of surrounding comments or comments collected via a principled transfer learning approach. The constructed contexts mitigate the sparseness and sharply define the context-dependent semantics of comments, even at the early stage of commenting activity, allowing traditional dimension reduction methods to better capture the semantics of short texts in a context-aware way. We confirm the effectiveness of the proposed method on two real-world datasets consisting of news and blog articles and comments, with a maximal improvement of 20% in Area Under Precision-Recall Curve.
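
The idea of building a context from a block of surrounding comments can be sketched as follows. This is a simplified illustration and not the paper's framework: it scores each comment by bag-of-words cosine similarity against the article plus its neighbouring comments, and it omits the dimension reduction and transfer learning steps entirely. The function names, the `window` parameter, and the sample texts are assumptions.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Lower-cased, whitespace-tokenised term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def context_scores(article, comments, window=1):
    """Score each comment against the article plus its neighbouring comments.

    `window` (hypothetical) is how many comments on each side form the block.
    The comment being scored is excluded from its own context.
    """
    scores = []
    for i, comment in enumerate(comments):
        lo = max(0, i - window)
        hi = min(len(comments), i + window + 1)
        neighbours = comments[lo:i] + comments[i + 1:hi]
        context = bag_of_words(" ".join([article] + neighbours))
        scores.append(cosine(bag_of_words(comment), context))
    return scores


if __name__ == "__main__":
    article = "the election results were announced today"
    comments = [
        "the results surprised many voters",
        "election turnout was high today",
        "buy cheap watches online now",
    ]
    print(context_scores(article, comments))
```

With these sample texts the third comment shares no terms with its context and scores zero, while the first two score above zero. A low score would flag a comment as a candidate for review rather than prove it irrelevant.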

Archive | 2013