Nam Khanh Tran
Leibniz University of Hanover
Publications
Featured research published by Nam Khanh Tran.
Web Search and Data Mining | 2015
Nam Khanh Tran; Andrea Ceroni; Nattiya Kanhabua; Claudia Niederée
Fully understanding an older news article requires context knowledge from the time of article creation. Finding information about such context is a tedious and time-consuming task, which distracts the reader. Simple contextualization via Wikification is not sufficient here. The retrieved context information has to be time-aware, concise (not full Wikipedia pages) and focused on the coherence of the article topic. In this paper, we present an approach for time-aware recontextualization, which takes those requirements into account in order to improve the reading experience. For this purpose, we propose (1) different query formulation methods for retrieving contextualization candidates and (2) ranking methods that take into account topical and temporal relevance as well as complementarity with respect to the original text. We evaluate our proposed approaches through extensive experiments using real-world datasets and a ground truth consisting of over 9,400 article/context pairs. Our experimental results show that our approaches retrieve contextualization information for older articles from the New York Times Archive with high precision and significantly outperform the baselines.
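As a rough illustration of the ranking idea named in the abstract (a combination of topical relevance, temporal relevance and complementarity), a minimal sketch might score candidates with a weighted sum. The feature values, weights and the `Candidate` structure below are hypothetical, not the paper's actual model:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    topical_rel: float      # e.g. a retrieval score against the article topic
    temporal_rel: float     # closeness of the candidate's time scope to the article date
    complementarity: float  # how much new, non-redundant information it adds

def rank_candidates(candidates, w_topic=0.5, w_time=0.3, w_comp=0.2):
    """Order contextualization candidates by a weighted combination of the
    three signals; the weights here are illustrative only."""
    score = lambda c: (w_topic * c.topical_rel
                       + w_time * c.temporal_rel
                       + w_comp * c.complementarity)
    return sorted(candidates, key=score, reverse=True)
```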
European Semantic Web Conference | 2017
Nam Khanh Tran; Tuan A. Tran; Claudia Niederée
Entities and their relatedness are useful information in various tasks such as entity disambiguation, entity recommendation or search. In many cases, entity relatedness is highly affected by dynamic contexts, which is reflected in the outcomes of different applications. However, the role of context is largely unexplored in existing entity relatedness measures. In this paper, we introduce the notion of contextual entity relatedness and show its usefulness in the new yet important problem of context-aware entity recommendation. We propose a novel method for computing contextual relatedness with integrated time and topic models. By exploiting an entity graph and enriching it with an entity embedding method, we show that our proposed relatedness measure can effectively recommend entities while taking contexts into account. We conduct large-scale experiments on a real-world data set, and the results show considerable improvements of our solution over the state of the art.
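To make the notion of contextual relatedness concrete, here is a speculative sketch: a static embedding similarity between two entities, modulated by how well their topic distributions and temporal scopes agree in a given context. All three signals and their combination are assumptions for illustration; the paper's actual formulation differs:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def contextual_relatedness(emb_a, emb_b, topic_a, topic_b, time_overlap):
    """Hypothetical sketch: context-independent embedding similarity scaled by
    topical agreement (topic-model vectors) and temporal overlap in [0, 1]."""
    static_sim = cosine(emb_a, emb_b)    # from an entity embedding model
    topic_sim = cosine(topic_a, topic_b) # from a topic model over the context
    return static_sim * topic_sim * time_overlap
```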
International Conference on Theory and Practice of Digital Libraries | 2016
Nattiya Kanhabua; Philipp Kemkes; Wolfgang Nejdl; Tu Ngoc Nguyen; Felipe Reis; Nam Khanh Tran
Significant parts of our cultural heritage have been produced on the web during the last decades. While easy accessibility to the current web is a good baseline, optimal access to the past web faces several challenges. These include dealing with large-scale web archive collections and the lack of usage logs, which contain the implicit human feedback most relevant for today's web search. In this paper, we propose an entity-oriented search system to support retrieval and analytics on the Internet Archive. We use Bing to retrieve a ranked list of results from the current web. In addition, we link the retrieved results to the Wayback Machine, thus allowing keyword search on the Internet Archive without processing and indexing its raw archived content. Our search system complements existing web archive search tools through a user-friendly interface, which comes close to the functionalities of modern web search engines (e.g., keyword search, query auto-completion and related query suggestions), and provides the great benefit of taking user feedback on the current web into account for web archive search as well. Through extensive experiments, we conduct quantitative and qualitative analyses in order to provide insights that enable further research on, and practical applications of, web archives.
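One simple way to link a live search result to its archived counterpart is the Internet Archive's public Wayback availability API; the sketch below shows that general linking idea, though the paper's system may use a different mechanism internally:

```python
import requests

def closest_snapshot(url, timestamp="2016"):
    """Look up the archived snapshot closest to `timestamp` (YYYY[MMDD...])
    via the Wayback Machine availability API; returns None if not archived."""
    resp = requests.get(
        "https://archive.org/wayback/available",
        params={"url": url, "timestamp": timestamp},
        timeout=10,
    )
    resp.raise_for_status()
    closest = resp.json().get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest and closest.get("available") else None

# e.g. closest_snapshot("nytimes.com", "20100101")
```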
International World Wide Web Conference | 2015
Nam Khanh Tran; Andrea Ceroni; Nattiya Kanhabua; Claudia Niederée
Fully understanding an older news article requires context knowledge from the time of article creation. Finding information about such context is a tedious and time-consuming task, which distracts the reader. Simple contextualization via Wikification is not sufficient here. The retrieved context information has to be time-aware, concise (not full Wikipedia pages) and focused on the coherence of the article topic. In this paper, we present Contextualizer, a web-based system that acquires additional information to support the interpretation of a news article of interest; this requires a mapping, in this case a kind of time-travel translation, between present context knowledge and context knowledge at the time of text creation. For a given article, the system provides a GUI that allows users to highlight keywords of interest, which are then used to construct appropriate queries for retrieving contextualization candidates. Contextualizer exploits different kinds of information, such as temporal similarity and textual complementarity, to re-rank the candidates and presents them to users in a friendly, interactive web-based interface.
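A toy sketch of the query-formulation step: turn the user-highlighted keywords plus the article's publication date into a small set of time-restricted queries. The query shapes and the one-year window are assumptions, not the system's actual formulation:

```python
from datetime import date

def build_context_queries(keywords, article_date: date, window_years=1):
    """Combine all highlighted keywords into one query plus one query per
    keyword, each restricted to a window around the publication year."""
    window = (article_date.year - window_years, article_date.year + window_years)
    queries = [{"q": " ".join(keywords), "year_range": window}]
    queries += [{"q": kw, "year_range": window} for kw in keywords]
    return queries
```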
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2014
Andrea Ceroni; Nam Khanh Tran; Nattiya Kanhabua; Claudia Niederée
Understanding a text that was written some time ago can be compared to translating a text from another language. Complete interpretation requires a mapping, in this case a kind of time-travel translation, between present context knowledge and context knowledge at the time of text creation. In this paper, we study time-aware re-contextualization, the challenging problem of retrieving concise and complementing information in order to bridge this temporal context gap. We propose an approach based on learning-to-rank techniques using sentence-level context information extracted from Wikipedia. The employed ranking combines relevance, complementarity and time-awareness. The effectiveness of the approach is evaluated by contextualizing articles from a news archive collection using more than 7,000 manually judged relevance pairs. We show that our approach is able to retrieve a significant amount of relevant context information for a given news article.
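For readers unfamiliar with learning to rank, a minimal pairwise reduction (RankSVM-style) looks like the sketch below: train a linear classifier on feature differences of (more relevant, less relevant) candidate pairs. The three-feature layout stands in for the relevance, complementarity and time-awareness signals named in the abstract; it is an illustration, not the paper's model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pairwise_ltr(features, labels):
    """features: (n_candidates, 3) array, e.g. columns =
    [topical_relevance, complementarity, temporal_relevance];
    labels: graded relevance judgments. Returns learned feature weights."""
    diffs, targets = [], []
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] > labels[j]:
                diffs.append(features[i] - features[j]); targets.append(1)
                diffs.append(features[j] - features[i]); targets.append(0)
    model = LogisticRegression().fit(np.array(diffs), np.array(targets))
    return model.coef_[0]  # score unseen candidates with features @ weights
```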
International Conference on Theory and Practice of Digital Libraries | 2013
Nam Khanh Tran; Sergej Zerr; Kerstin Bischoff; Claudia Niederée; Ralf Krestel
Topic modeling has gained a lot of popularity as a means of identifying and describing the topical structure of textual documents and whole corpora. There are, however, many document collections, such as qualitative studies in the digital humanities, that cannot easily benefit from this technology. The limited size of those corpora leads to poor-quality topic models. Higher-quality topic models can be learned by incorporating additional domain-specific documents with similar topical content. This, however, requires finding or even manually composing such corpora, which takes considerable effort. To solve this problem, we propose a fully automated, adaptable process of topic cropping. For learning topics, this process automatically tailors a domain-specific Cropping corpus from a general corpus such as Wikipedia. The learned topic model is then mapped to the working corpus via topic inference. Evaluation with a real-world data set shows that the learned topics are of higher quality than those learned from the working corpus alone. In detail, we analyzed the learned topics with respect to coherence, diversity, and relevance.
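The two-stage structure (learn topics on a larger cropped corpus, then infer on the small working corpus) can be sketched with gensim's LDA implementation. How the Cropping corpus is selected is the paper's contribution and is assumed to have happened already in this sketch:

```python
from gensim import corpora, models

def crop_and_infer(cropping_docs, working_docs, num_topics=20):
    """cropping_docs / working_docs: lists of token lists. Train LDA on the
    domain-specific Cropping corpus, then map its topics onto the working
    corpus via inference."""
    dictionary = corpora.Dictionary(cropping_docs)
    bow_crop = [dictionary.doc2bow(d) for d in cropping_docs]
    lda = models.LdaModel(bow_crop, num_topics=num_topics, id2word=dictionary)
    # Topic inference on the (small) working corpus using the learned model
    return [lda.get_document_topics(dictionary.doc2bow(d)) for d in working_docs]
```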
International World Wide Web Conference | 2014
Nam Khanh Tran
In the past, various studies have addressed the capacity of machines to perceive and comprehend language in articles or human communication. Recently, researchers have focused on the higher semantic levels that humans need in order to understand the content of articles. While humans can smoothly interpret documents when they know the documents' context, they have difficulty doing so once that context is lost or has changed. In this PhD proposal, we address three novel research questions: detecting uninterpretable pieces in documents, retrieving contextual information, and constructing compact context for the documents. We then propose approaches to these tasks and discuss related issues.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2018
Nam Khanh Tran; Claudia Niederée
Attention-based neural network models have been successfully applied to answer selection, which is an important subtask of question answering (QA). These models often represent a question by a single vector and find its corresponding matches by attending to candidate answers. However, questions and answers might be related to each other in complicated ways that cannot be captured by single-vector representations. In this paper, we propose Multihop Attention Networks (MAN), which aim to uncover these complex relations for ranking question and answer pairs. Unlike previous models, we do not collapse the question into a single vector; instead, we use multiple vectors which focus on different parts of the question for its overall semantic representation, and we apply multiple steps of attention to learn representations for the candidate answers. For each attention step, in addition to common attention mechanisms, we adopt sequential attention, which utilizes context information for computing context-aware attention weights. Via extensive experiments, we show that MAN outperforms state-of-the-art approaches on popular benchmark QA datasets. Empirical studies confirm the effectiveness of sequential attention over other attention mechanisms.
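A minimal numpy sketch of the multihop idea, under simplifying assumptions (no learned parameters, no sequential attention): each hop re-reads the question to focus on a different part, then attends over the answer tokens, with the result conditioning the next hop. This illustrates the mechanism only; it is not the published MAN model:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multihop_attention(Q, A, hops=3):
    """Q: (m, d) question token vectors; A: (n, d) answer token vectors.
    Returns a final answer representation after `hops` attention steps."""
    state = Q.mean(axis=0)              # initial question summary
    for _ in range(hops):
        q_weights = softmax(Q @ state)  # re-read the question ...
        q_view = q_weights @ Q          # ... focusing on a new part of it
        a_weights = softmax(A @ q_view) # attend over the answer tokens
        state = a_weights @ A           # answer summary conditions next hop
    return state                        # representation used for scoring
```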
Archive | 2018
Mark A. Greenwood; Nam Khanh Tran; Konstantinos Apostolidis; Vasileios Mezaris
Without context, words have no meaning, and the same is true for documents, in that often a wider context is required to fully interpret the information they contain. For example, a family photo is practically useless if you do not know who the people portrayed in it are, and likewise, a document that refers to the president of the US is of little use without knowing who held the job at the time the document was written. This becomes even more important when considering the long-term preservation of documents, as not only is human memory fallible, but over long periods the people accessing the documents will change (e.g., photos passed down through generations), as will their understanding and knowledge of the world. While preserving the context associated with a document is an important first step in ensuring information remains useful over long periods of time, we also need to consider how information evolves. Over any significant time period, the meaning of information changes. This evolution can range from changes in the meaning of individual words to more general terms or concepts, such as who holds a specific position in an organization. In this chapter, we look in detail at all of these challenges and describe the development of a conceptual framework in which context information can be collected, preserved, evolved and used to access and interpret documents. A number of techniques are presented showing real examples of context in action that fit within the framework and apply to both text documents and image collections.
Companion Proceedings of The Web Conference 2018 (WWW '18) | 2018
Nam Khanh Tran; Claudia Niederée
In this paper, we present a neural network based framework for answering non-factoid questions. The framework consists of two main components: an Answer Retriever and an Answer Ranker. In the first component, we leverage off-the-shelf retrieval models (e.g., BM25) to retrieve a pool of candidate answers relevant to the input question. The Answer Ranker is then used to select the most suitable answer. In this work, we adopt two typical deep learning based frameworks for our Answer Ranker component: one is based on the Siamese architecture and the other on the Compare-Aggregate framework. The Answer Ranker component is evaluated separately on popular answer selection datasets. Our overall system is evaluated using the FiQA dataset, a newly released dataset for the financial domain, and shows promising results.
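The Answer Retriever stage can be approximated with an off-the-shelf BM25 implementation; the rank_bm25 package used below is our choice for illustration, not necessarily the one used in the paper. The returned pool would then be re-ranked by the neural Answer Ranker:

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

def retrieve_candidates(question, corpus, k=10):
    """Return the top-k candidate answers for `question` from `corpus`
    (a list of answer strings) using BM25 over whitespace tokens."""
    tokenized = [doc.lower().split() for doc in corpus]
    bm25 = BM25Okapi(tokenized)
    return bm25.get_top_n(question.lower().split(), corpus, n=k)
```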