Yelong Shen
Microsoft
Publications
Featured research published by Yelong Shen.
Conference on Information and Knowledge Management | 2014
Yelong Shen; Xiaodong He; Jianfeng Gao; Li Deng; Grégoire Mesnil
In this paper, we propose a new latent semantic model that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional, semantic vector representations for search queries and Web documents. To capture the rich contextual structure in a query or a document, we first project each word, together with its temporal context window in the word sequence, into a local feature vector that directly captures contextual features at the word n-gram level. Next, the model discovers the salient word n-gram features in the sequence and aggregates them into a sentence-level feature vector. Finally, a non-linear transformation is applied to extract high-level semantic information and generate a continuous vector representation for the full text string. The proposed convolutional latent semantic model (CLSM) is trained on clickthrough data and is evaluated on a Web document ranking task using a large-scale, real-world data set. Results show that the proposed model effectively captures salient semantic information in queries and documents for the task while significantly outperforming previous state-of-the-art semantic models.
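To make the CLSM pipeline concrete, the following is a minimal PyTorch sketch of the stages the abstract describes: word-level lookup, convolution over a context window, max-pooling to a sentence-level vector, and a non-linear semantic projection. All dimensions, the 3-word window, and the plain word-embedding lookup (the paper uses letter-trigram word hashing) are illustrative assumptions, not the paper's settings.

import torch
import torch.nn as nn

class CLSMSketch(nn.Module):
    # Sketch of the CLSM encoder; vocab_size and dims are assumed values.
    def __init__(self, vocab_size=5000, embed_dim=300, conv_dim=300, sem_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # stand-in for word hashing
        self.conv = nn.Conv1d(embed_dim, conv_dim, kernel_size=3, padding=1)
        self.sem = nn.Linear(conv_dim, sem_dim)           # semantic projection

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        h = torch.tanh(self.conv(x))               # contextual word n-gram features
        v, _ = h.max(dim=2)                        # max-pooling -> sentence-level vector
        return torch.tanh(self.sem(v))             # high-level semantic vector

At ranking time, a query vector and a document vector produced by this encoder would be compared by cosine similarity.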
International World Wide Web Conference | 2014
Yelong Shen; Xiaodong He; Jianfeng Gao; Li Deng; Grégoire Mesnil
This paper presents a series of new latent semantic models based on a convolutional neural network (CNN) to learn low-dimensional semantic vectors for search queries and Web documents. By using the convolution-max pooling operation, local contextual information at the word n-gram level is modeled first. Then, salient local features in a word sequence are combined to form a global feature vector. Finally, the high-level semantic information of the word sequence is extracted to form a global vector representation. The proposed models are trained on clickthrough data by maximizing the conditional likelihood of clicked documents given a query, using stochastic gradient ascent. The new models are evaluated on a Web document ranking task using a large-scale, real-world data set. Results show that our model significantly outperforms other semantic models, which were state-of-the-art in retrieval performance prior to this work.
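The training objective stated above, maximizing the conditional likelihood of the clicked document given a query, can be sketched as a softmax over similarity scores. The smoothing factor gamma and the convention that the clicked document sits in column 0 are assumptions for illustration.

import torch
import torch.nn.functional as F

def clickthrough_loss(q_vec, doc_vecs, gamma=10.0):
    # q_vec: (batch, dim); doc_vecs: (batch, n_docs, dim), clicked document first,
    # followed by sampled unclicked documents. gamma is an assumed smoothing factor.
    sims = F.cosine_similarity(q_vec.unsqueeze(1), doc_vecs, dim=2)  # (batch, n_docs)
    log_probs = F.log_softmax(gamma * sims, dim=1)
    return -log_probs[:, 0].mean()  # negative log-likelihood of the clicked document

Minimizing this loss with a stochastic optimizer corresponds to the stochastic gradient ascent on the conditional likelihood described in the abstract.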
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Hamid Palangi; Li Deng; Yelong Shen; Jianfeng Gao; Xiaodong He; Jianshu Chen; Xinying Song; Rabab K. Ward
This paper develops a model that addresses sentence embedding, a hot topic in current natural language processing research, using recurrent neural networks (RNN) with Long Short-Term Memory (LSTM) cells. The proposed LSTM-RNN model sequentially takes each word in a sentence, extracts its information, and embeds it into a semantic vector. Due to its ability to capture long-term memory, the LSTM-RNN accumulates increasingly richer information as it goes through the sentence, and when it reaches the last word, the hidden layer of the network provides a semantic representation of the whole sentence. In this paper, the LSTM-RNN is trained in a weakly supervised manner on user click-through data logged by a commercial web search engine. Visualization and analysis are performed to understand how the embedding process works. The model is found to automatically attenuate unimportant words and detect the salient keywords in the sentence. Furthermore, these detected keywords are found to automatically activate different cells of the LSTM-RNN, where words belonging to a similar topic activate the same cell. As a semantic representation of the sentence, the embedding vector can be used in many different applications. These automatic keyword detection and topic allocation abilities enabled by the LSTM-RNN allow the network to perform document retrieval, a difficult language processing task, where the similarity between the query and documents can be measured by the distance between their corresponding sentence embedding vectors computed by the LSTM-RNN. On a web search task, the LSTM-RNN embedding is shown to significantly outperform several existing state-of-the-art methods. We emphasize that the proposed model generates sentence embedding vectors that are especially useful for web document retrieval tasks. A comparison with a well-known general sentence embedding method, the Paragraph Vector, is also performed. The results show that the proposed method significantly outperforms the Paragraph Vector method on the web document retrieval task.
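A minimal sketch of the core encoding step, under assumed dimensions: the LSTM consumes one word at a time, and the hidden state after the final word is taken as the sentence embedding, as the abstract describes.

import torch
import torch.nn as nn

class LSTMSentenceEmbedder(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):            # (batch, seq_len), words left to right
        _, (h_n, _) = self.lstm(self.embed(token_ids))
        return h_n[-1]                       # hidden state at the last word = sentence vector

For retrieval, query and document vectors from this encoder would be compared by a distance such as cosine, as in the web search experiments.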
Knowledge Discovery and Data Mining | 2017
Yelong Shen; Po-Sen Huang; Jianfeng Gao; Weizhu Chen
Teaching a computer to read and answer general questions pertaining to a document is a challenging yet unsolved problem. In this paper, we describe a novel neural network architecture called the Reasoning Network (ReasoNet) for machine comprehension tasks. ReasoNets make use of multiple turns to effectively exploit and then reason over the relations among queries, documents, and answers. Unlike previous approaches that use a fixed number of turns during inference, ReasoNets introduce a termination state to relax this constraint on the reasoning depth. Using reinforcement learning, ReasoNets can dynamically determine whether to continue the comprehension process after digesting intermediate results, or to terminate reading when they conclude that existing information is adequate to produce an answer. ReasoNets achieve superior performance on machine comprehension datasets, including the unstructured CNN and Daily Mail datasets, the Stanford SQuAD dataset, and a structured Graph Reachability dataset.
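The sketch below shows the multi-turn control flow the abstract describes: attend over a document memory, update an internal state, and consult a termination gate after each turn. The dimensions, the greedy 0.5 threshold, and the GRU controller are assumptions; the paper trains the termination decision with reinforcement learning, which is omitted here.

import torch
import torch.nn as nn

class ReasoNetSketch(nn.Module):
    def __init__(self, hidden=128, max_turns=5):
        super().__init__()
        self.cell = nn.GRUCell(hidden, hidden)  # internal-state controller
        self.term_gate = nn.Linear(hidden, 1)   # termination probability
        self.max_turns = max_turns

    def forward(self, state, memory):           # state: (batch, h); memory: (batch, m, h)
        for _ in range(self.max_turns):
            attn = torch.einsum('bh,bmh->bm', state, memory).softmax(dim=1)
            context = torch.einsum('bm,bmh->bh', attn, memory)
            state = self.cell(context, state)   # digest the intermediate result
            if torch.sigmoid(self.term_gate(state)).mean() > 0.5:
                break                           # information judged adequate: stop reading
        return state                            # fed to an answer module (not shown)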
International Conference on Data Mining | 2012
Lin Liu; Ruoming Jin; Charu C. Aggarwal; Yelong Shen
Many graphs in practical applications are not deterministic, but are probabilistic in nature because the existence of the edges is inferred with the use of a variety of statistical approaches. In this paper, we will examine the problem of clustering uncertain graphs. Uncertain graphs are best clustered with the use of a possible worlds model in which the most reliable clusters are discovered in the presence of uncertainty. Reliable clusters are those which are not likely to be disconnected in the context of different instantiations of the uncertain graph. We present experimental results which illustrate the effectiveness of our model and approach.
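A small Monte Carlo sketch of the possible-worlds view: sample instantiations of the uncertain graph and count how often a candidate cluster stays connected. The function and the (u, v, p) edge format are illustrative, not the paper's algorithm.

import random
import networkx as nx

def cluster_reliability(prob_edges, cluster, samples=1000):
    # prob_edges: iterable of (u, v, p), where edge (u, v) exists with probability p
    hits = 0
    for _ in range(samples):
        world = nx.Graph()
        world.add_nodes_from(cluster)
        for u, v, p in prob_edges:
            if u in cluster and v in cluster and random.random() < p:
                world.add_edge(u, v)
        if nx.is_connected(world):  # the cluster survives in this possible world
            hits += 1
    return hits / samples

edges = [(1, 2, 0.9), (2, 3, 0.8), (1, 3, 0.4)]
print(cluster_reliability(edges, {1, 2, 3}))  # estimated reliability of the cluster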
International Conference on Data Mining | 2012
Yelong Shen; Ruoming Jin; Dejing Dou; Nafisa Afrin Chowdhury; Junfeng Sun; Brigitte Piniewski; David Kil
Modeling and predicting human behaviors, such as activity level and intensity, is key to preventing the cascade of obesity and to helping spread wellness and healthy behavior through a social network. In this work, we propose a Socialized Gaussian Process (SGP) for socialized human behavior modeling. The SGP naturally incorporates an individual's personal behavior factor and a social correlation factor into a unified model: a basic Gaussian Process model captures each individual's personal behavior pattern, and we then extend it to the SGP, which aims to capture social correlation phenomena in the social network. A detailed experimental evaluation shows that the SGP model achieves the best prediction accuracy compared with other baseline methods.
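One way to read the SGP idea in code: run standard Gaussian Process regression on an individual's own activity history (the personal factor) and blend the prediction with the friends' average activity (the social correlation factor). The RBF kernel, the blending weight alpha, and the sgp_predict helper are hypothetical choices for illustration, not the paper's formulation.

import numpy as np

def sgp_predict(times, activities, t_new, friend_mean, alpha=0.5, ls=1.0, noise=0.1):
    # GP regression with an RBF kernel over the individual's own history
    K = np.exp(-0.5 * (times[:, None] - times[None, :]) ** 2 / ls**2)
    k_star = np.exp(-0.5 * (times - t_new) ** 2 / ls**2)
    personal = k_star @ np.linalg.solve(K + noise * np.eye(len(times)), activities)
    # blend in the social correlation factor
    return (1 - alpha) * personal + alpha * friend_mean

times = np.array([0.0, 1.0, 2.0, 3.0])
acts = np.array([2.0, 2.5, 3.0, 2.8])          # past activity levels
print(sgp_predict(times, acts, 4.0, friend_mean=2.2))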
Meeting of the Association for Computational Linguistics | 2017
Yelong Shen; Po-Sen Huang; Ming-Wei Chang; Jianfeng Gao
Recent studies on knowledge base completion, the task of recovering missing relationships based on recorded relations, demonstrate the importance of learning embeddings from multi-step relations. However, due to the size of knowledge bases, learning multi-step relations directly on top of observed triplets can be costly. Hence, a manually designed inference procedure is often used when training such models. In this paper, we propose Implicit ReasoNets (IRNs), which are designed to perform multi-step inference implicitly through a controller and shared memory. Without a human-designed inference procedure, IRNs use training data to learn to perform multi-step inference in a neural embedding space through the shared memory and controller. While the inference procedure does not explicitly operate on observed triplets, our proposed model outperforms all previous approaches on the popular FB15k benchmark by more than 5.7%.
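A sketch of one plausible reading of the controller/shared-memory loop for knowledge base completion: a (head, relation) query state repeatedly attends over a learned shared memory before scoring candidate tail entities. All sizes, the fixed step count, and the scoring rule are assumptions.

import torch
import torch.nn as nn

class IRNSketch(nn.Module):
    def __init__(self, n_entities=1000, n_relations=100, dim=64, mem_slots=32, steps=3):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.memory = nn.Parameter(torch.randn(mem_slots, dim))  # shared memory
        self.cell = nn.GRUCell(dim, dim)                         # controller
        self.steps = steps

    def forward(self, head, relation):
        state = self.ent(head) + self.rel(relation)          # initial query state
        for _ in range(self.steps):
            attn = (state @ self.memory.T).softmax(dim=1)    # attend over shared memory
            state = self.cell(attn @ self.memory, state)     # controller update
        return state @ self.ent.weight.T                     # scores over candidate tails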
International Conference on Data Mining | 2015
Yelong Shen; Ruoming Jin; Jianshu Chen; Xiaodong He; Jianfeng Gao; Li Deng
Co-occurrence data is a common and important information source in many areas, such as word co-occurrence in sentences, friend co-occurrence in social networks, and product co-occurrence in commercial transaction data; it contains rich correlation and clustering information about the items. In this paper, we study co-occurrence data using a general energy-based probabilistic model, and we analyze three categories of energy-based models, namely the L1, L2, and Lk models, which capture different levels of dependency in the co-occurrence data. We also discuss how several typical existing models relate to these three types of energy models, including the Fully Visible Boltzmann Machine (FVBM) (L2), Matrix Factorization (L2), Log-BiLinear (LBL) models (L2), and the Restricted Boltzmann Machine (RBM) model (Lk). We then derive a Deep Embedding Model (DEM) (an Lk model) from the energy model in a principled manner. Furthermore, motivated by the observation that the partition function in the energy model is intractable and the fact that the major objective of modeling co-occurrence data is prediction via the conditional probability, we apply the maximum pseudo-likelihood method to learn the DEM. As a consequence, the developed model and its learning method naturally avoid the above difficulties and can easily be used to compute the conditional probability in prediction. Interestingly, our method is equivalent to learning a special structured deep neural network using back-propagation and a special sampling strategy, which makes it scalable to large-scale datasets. Finally, in experiments, we show that the DEM achieves comparable or better results than state-of-the-art methods on datasets across several application domains.
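To see why pseudo-likelihood sidesteps the partition function, consider the simplest (L2) case named above, a fully visible Boltzmann machine over binary co-occurrence vectors: each item is predicted from all the others, so only normalized conditionals appear. A minimal sketch under that assumption:

import torch
import torch.nn as nn
import torch.nn.functional as F

class FVBMPseudoLikelihood(nn.Module):
    # L2 energy model: E(x) = -x^T W x / 2 - b^T x over binary item vectors x
    def __init__(self, n_items=100):
        super().__init__()
        self.W = nn.Parameter(torch.zeros(n_items, n_items))
        self.b = nn.Parameter(torch.zeros(n_items))

    def neg_pseudo_loglik(self, x):              # x: (batch, n_items), float in {0, 1}
        W = (self.W + self.W.T) / 2              # symmetric pairwise weights
        W = W - torch.diag(torch.diag(W))        # no self-connections
        logits = x @ W + self.b                  # p(x_i = 1 | x_-i) = sigmoid(logits_i)
        return F.binary_cross_entropy_with_logits(logits, x)

Each conditional is a logistic regression on the remaining items, so training needs no partition function; the DEM applies the same principle to a deeper (Lk) energy.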
Conference on Information and Knowledge Management | 2017
Zhen Liao; Xinying Song; Yelong Shen; Saekoo Lee; Jianfeng Gao; Ciya Liao
In this paper, we present a new study of Web query entity disambiguation (QED), the task of disambiguating different candidate entities in a knowledge base given their mentions in a query. QED is particularly challenging because queries are often too short to provide the rich contextual information that traditional entity disambiguation methods require. We propose several methods to tackle the problem of QED. First, we explore the use of a deep neural network (DNN) for capturing character-level textual information in queries. Our DNN approach maps queries and their candidate reference entities to feature vectors in a latent semantic space where the distance between a query and its correct reference entity is minimized. Second, we utilize the Web search results of queries to generate large amounts of weakly supervised training data for the DNN model. Third, we propose a two-stage training method that combines large-scale weakly supervised data with a small amount of human-labeled data, which can significantly boost the performance of a DNN model. The effectiveness of our approach is demonstrated in experiments using large-scale real-world datasets.
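A sketch of the character-level matching idea: embed query strings and candidate entity names in one latent space and pick the nearest candidate by cosine similarity. The character-convolution encoder and the rank helper are hypothetical, one plausible realization of the DNN described above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CharEncoder(nn.Module):
    def __init__(self, n_chars=128, dim=64):
        super().__init__()
        self.embed = nn.Embedding(n_chars, dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)  # character n-grams

    def forward(self, char_ids):                       # (batch, length)
        h = torch.tanh(self.conv(self.embed(char_ids).transpose(1, 2)))
        return h.max(dim=2).values                     # (batch, dim) latent vector

enc = CharEncoder()

def rank(query_ids, candidate_ids):
    # query_ids: (1, length); candidate_ids: (n_candidates, length)
    sims = F.cosine_similarity(enc(query_ids), enc(candidate_ids))
    return sims.argmax()  # index of the candidate closest to the query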
Meeting of the Association for Computational Linguistics | 2018
Xiaodong Liu; Yelong Shen; Kevin Duh; Jianfeng Gao