Is this you? Create Your Porfile

Yi Tay

Nanyang Technological University

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Yi Tay is active.

Explore More

Publication

Featured researches published by Yi Tay.

international acm sigir conference on research and development in information retrieval | 2017

Learning to Rank Question Answer Pairs with Holographic Dual LSTM Architecture

Yi Tay; Minh C. Phan; Luu Anh Tuan; Siu Cheung Hui

We describe a new deep learning architecture for learning to rank question answer pairs. Our approach extends the long short-term memory (LSTM) network with holographic composition to model the relationship between question and answer representations. As opposed to the neural tensor layer that has been adopted recently, the holographic composition provides the benefits of scalable and rich representational learning approach without incurring huge parameter costs. Overall, we present Holographic Dual LSTM (HD-LSTM), a unified architecture for both deep sentence modeling and semantic matching. Essentially, our model is trained end-to-end whereby the parameters of the LSTM are optimized in a way that best explains the correlation between question and answer representations. In addition, our proposed deep learning architecture requires no extensive feature engineering. Via extensive experiments, we show that HD-LSTM outperforms many other neural architectures on two popular benchmark QA datasets. Empirical studies confirm the effectiveness of holographic composition over the neural tensor layer.

conference on information and knowledge management | 2017

NeuPL: Attention-based Semantic Matching and Pair-Linking for Entity Disambiguation

Minh C. Phan; Aixin Sun; Yi Tay; Jialong Han; Chenliang Li

Entity disambiguation, also known as entity linking, is the task of mapping mentions in text to the corresponding entities in a given knowledge base, e.g. Wikipedia. Two key challenges are making use of mentions context to disambiguate (i.e. local objective), and promoting coherence of all the linked entities (i.e. global objective). In this paper, we propose a deep neural network model to effectively measure the semantic matching between mentions context and target entity. We are the first to employ the long short-term memory (LSTM) and attention mechanism for entity disambiguation. We also propose Pair-Linking, a simple but effective and significantly fast linking algorithm. Pair-Linking iteratively identifies and resolves pairs of mentions, starting from the most confident pair. It finishes linking all mentions in a document by scanning the pairs of mentions at most once. Our neural network model combined with Pair-Linking, named NeuPL, outperforms state-of-the-art systems over different types of documents including news, RSS, and tweets.

knowledge discovery and data mining | 2018

Multi-Pointer Co-Attention Networks for Recommendation

Yi Tay; Anh Tuan Luu; Siu Cheung Hui

Many recent state-of-the-art recommender systems such as D-ATT, TransNet and DeepCoNN exploit reviews for representation learning. This paper proposes a new neural architecture for recommendation with reviews. Our model operates on a multi-hierarchical paradigm and is based on the intuition that not all reviews are created equal, i.e., only a selected few are important. The importance, however, should be dynamically inferred depending on the current target. To this end, we propose a review-by-review pointer-based learning scheme that extracts important reviews from user and item reviews and subsequently matches them in a word-by-word fashion. This enables not only the most informative reviews to be utilized for prediction but also a deeper word-level interaction. Our pointer-based method operates with a gumbel-softmax based pointer mechanism that enables the incorporation of discrete vectors within differentiable neural architectures. Our pointer mechanism is co-attentive in nature, learning pointers which are co-dependent on user-item relationships. Finally, we propose a multi-pointer learning scheme that learns to combine multiple views of user-item interactions. We demonstrate the effectiveness of our proposed model via extensive experiments on 24 benchmark datasets from Amazon and Yelp. Empirical results show that our approach significantly outperforms existing state-of-the-art models, with up to 19% and 71% relative improvement when compared to TransNet and DeepCoNN respectively. We study the behavior of our multi-pointer learning mechanism, shedding light on evidence aggregation patterns in review-based recommender systems.

conference on information and knowledge management | 2017

Dyadic Memory Networks for Aspect-based Sentiment Analysis

Yi Tay; Luu Anh Tuan; Siu Cheung Hui

This paper proposes Dyadic Memory Networks (DyMemNN), a novel extension of end-to-end memory networks (memNN) for aspect-based sentiment analysis (ABSA). Originally designed for question answering tasks, memNN operates via a memory selection operation in which relevant memory pieces are adaptively selected based on the input query. In the problem of ABSA, this is analogous to aspects and documents in which the relationship between each word in the document is compared with the aspect vector. In the standard memory networks, simple dot products or feed forward neural networks are used to model the relationship between aspect and words which lacks representation learning capability. As such, our dyadic memory networks ameliorates this weakness by enabling rich dyadic interactions between aspect and word embeddings by integrating either parameterized neural tensor compositions or holographic compositions into the memory selection operation. To this end, we propose two variations of our dyadic memory networks, namely the Tensor DyMemNN and Holo DyMemNN. Overall, our two models are end-to-end neural architectures that enable rich dyadic interaction between aspect and document which intuitively leads to better performance. Via extensive experiments, we show that our proposed models achieve the state-of-the-art performance and outperform many neural architectures across six benchmark datasets.

web search and data mining | 2018

Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering

Yi Tay; Luu Anh Tuan; Siu Cheung Hui

The dominant neural architectures in question answer retrieval are based on recurrent or convolutional encoders configured with complex word matching layers. Given that recent architectural innovations are mostly new word interaction layers or attention-based matching mechanisms, it seems to be a well-established fact that these components are mandatory for good performance. Unfortunately, the memory and computation cost incurred by these complex mechanisms are undesirable for practical applications. As such, this paper tackles the question of whether it is possible to achieve competitive performance with simple neural architectures. We propose a simple but novel deep learning architecture for fast and efficient question-answer ranking and retrieval. More specifically, our proposed model, HyperQA, is a parameter efficient neural network that outperforms other parameter intensive models such as Attentive Pooling BiLSTMs and Multi-Perspective CNNs on multiple QA benchmarks. The novelty behind HyperQA is a pairwise ranking objective that models the relationship between question and answer embeddings in Hyperbolic space instead of Euclidean space. This empowers our model with a self-organizing ability and enables automatic discovery of latent hierarchies while learning embeddings of questions and answers. Our model requires no feature engineering, no similarity matrix matching, no complicated attention mechanisms nor over-parameterized layers and yet outperforms and remains competitive to many models that have these functionalities on multiple benchmarks.Many state-of-the-art deep learning models for question answer retrieval are highly complex, ox89en having a huge number of parameters or complicated word interaction mechanisms. x8cis paper studies if it is possible to achieve equally competitive performance with smaller and faster neural architectures. Overall, our proposed approach is a simple neural network that performs question-answer matching and ranking in Hyperbolic space. We show that QA embeddings learned in Hyperbolic space results in highly competitive performance on multiple benchmarks, outperforming models with signix80cantly much larger parameters. Our proposed approach (90K parameters) remains competitive to models with millions of parameters such as Ax8aentive Pooling BiLSTMs or Multi-Perspective Convolutional Neural Networks (MP-CNN).

international acm sigir conference on research and development in information retrieval | 2017

Cross-Device User Linking: URL, Session, Visiting Time, and Device-log Embedding

Minh C. Phan; Aixin Sun; Yi Tay

Cross-Device User Linking is the task of detecting same users given their browsing logs on different devices (e.g., tablet, mobile phone, PC, etc.). The problem was introduced in CIKM Cup 2016 together with a new dataset provided by Data-Centric Alliance (DCA). In this paper, we present insightful analysis on the dataset and propose a solution to link users based on their visited URLs, visiting time, and profile embeddings. We cast the problem as pairwise classification and use gradient boosting as the leaning-to-rank model. Our model works on a set of features exacted from URLs, titles, time and session data derived from user device-logs. The model outperforms the best solution in the CIKM Cup by a large margin.

conference on information and knowledge management | 2017

Multi-Task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs

Yi Tay; Luu Anh Tuan; Minh C. Phan; Siu Cheung Hui

Many popular knowledge graphs such as Freebase, YAGO or DBPedia maintain a list of non-discrete attributes for each entity. Intuitively, these attributes such as height, price or population count are able to richly characterize entities in knowledge graphs. This additional source of information may help to alleviate the inherent sparsity and incompleteness problem that are prevalent in knowledge graphs. Unfortunately, many state-of-the-art relational learning models ignore this information due to the challenging nature of dealing with non-discrete data types in the inherently binary-natured knowledge graphs. In this paper, we propose a novel multi-task neural network approach for both encoding and prediction of non-discrete attribute information in a relational setting. Specifically, we train a neural network for triplet prediction along with a separate network for attribute value regression. Via multi-task learning, we are able to learn representations of entities, relations and attributes that encode information about both tasks. Moreover, such attributes are not only central to many predictive tasks as an information source but also as a prediction target. Therefore, models that are able to encode, incorporate and predict such information in a relational learning context are highly attractive as well. We show that our approach outperforms many state-of-the-art methods for the tasks of relational triplet classification and attribute value prediction.

knowledge discovery and data mining | 2018

Multi-Cast Attention Networks

Yi Tay; Luu Anh Tuan; Siu Cheung Hui

Attention is typically used to select informative sub-phrases that are used for prediction. This paper investigates the novel use of attention as a form of feature augmentation, i.e, casted attention. We propose Multi-Cast Attention Networks (MCAN), a new attention mechanism and general model architecture for a potpourri of ranking tasks in the conversational modeling and question answering domains. Our approach performs a series of soft attention operations, each time casting a scalar feature upon the inner word embeddings. The key idea is to provide a real-valued hint (feature) to a subsequent encoder layer and is targeted at improving the representation learning process. There are several advantages to this design, e.g., it allows an arbitrary number of attention mechanisms to be casted, allowing for multiple attention types (e.g., co-attention, intra-attention) and attention variants (e.g., alignment-pooling, max-pooling, mean-pooling) to be executed simultaneously. This not only eliminates the costly need to tune the nature of the co-attention layer, but also provides greater extents of explainability to practitioners. Via extensive experiments on four well-known benchmark datasets, we show that MCAN achieves state-of-the-art performance. On the Ubuntu Dialogue Corpus, MCAN outperforms existing state-of-the-art models by 9%. MCAN also achieves the best performing score to date on the well-studied TrecQA dataset.

arXiv: Computation and Language | 2018