Duy Dinh
University of Toulouse
Publication
Featured research published by Duy Dinh.
ACM Symposium on Applied Computing | 2011
Duy Dinh; Lynda Tamine
It is well known that the main objective of conceptual retrieval models is to go beyond simple term matching by relaxing the term independence assumption through concept recognition. In this paper, we present an approach to semantic indexing and retrieval of biomedical documents that identifies domain concepts extracted from the Medical Subject Headings (MeSH) thesaurus. Our indexing approach relies on a purely statistical vector space model, which represents medical documents and MeSH concepts as term vectors. By combining the bag-of-words concept representation with word positions in the textual features, we demonstrate that our mapping method is able to extract valuable concepts from documents. The output of this semantic mapping serves as the input to our document relevance scoring in response to a query. Experiments on the OHSUMED collection show that our semantic indexing method significantly outperforms state-of-the-art baselines that employ word or term statistics.
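As a rough illustration of representing documents and concepts as term vectors, the sketch below ranks concepts by cosine similarity to a document. It is a minimal sketch under stated assumptions, not the method of the paper: it uses raw term frequencies only, ignores the word-position information the abstract mentions, and the concept entries and function names (`term_vector`, `cosine`, `rank_concepts`) are made up for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_concepts(document, concepts):
    """Return (concept name, similarity) pairs, best match first."""
    doc_vec = term_vector(document)
    scored = [(name, cosine(doc_vec, term_vector(terms)))
              for name, terms in concepts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical concept entries; a real thesaurus is far larger.
concepts = {
    "Myocardial Infarction": "myocardial infarction heart attack",
    "Diabetes Mellitus": "diabetes mellitus insulin glucose",
}
ranking = rank_concepts("patient admitted after a heart attack", concepts)
```

Here `ranking` places "Myocardial Infarction" first because it shares the terms "heart" and "attack" with the document, while the other concept shares none.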
European Conference on Information Retrieval | 2011
Duy Dinh; Lynda Tamine
In the context of biomedical information retrieval (IR), this paper explores the relationship between the document's global context and the query's local context in an attempt to overcome the term mismatch problem between the user query and documents in the collection. Most solutions to this problem have focused on expanding the query by discovering its context, either global or local. In a global strategy, all documents in the collection are used to examine word occurrences and relationships in the corpus as a whole, and this information is used to expand the original query. In a local strategy, the top-ranked documents retrieved for a given query are examined to determine terms for query expansion. We propose to combine the document's global context and the query's local context in an attempt to increase the term overlap between the user query and documents in the collection via document expansion (DE) and query expansion (QE). The DE technique is based on a statistical (IR-based) method to extract the most appropriate concepts (global context) from each document. The QE technique is based on a blind feedback approach using the top-ranked documents (local context) obtained in the first retrieval stage. A comparative experiment on the TREC 2004 Genomics collection demonstrates that the combination of the document's global context and the query's local context shows a significant improvement over the baseline. MAP rises significantly from 0.4097 to 0.4532, an improvement of +10.62% over the baseline. The IR performance of the combined method in terms of MAP is also superior to the official runs that participated in TREC 2004 Genomics and is comparable to the performance of the best run (0.4075).
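The local strategy described above (blind feedback over the top-ranked documents) can be sketched as follows. This is a simplified illustration, not the paper's term-selection model: it picks expansion terms by raw frequency, and the function name `expand_query`, its parameters, and the sample documents are assumptions made for the example.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_k=2, n_terms=1):
    """Append the n_terms most frequent new terms found in the top_k documents."""
    known = set(query_terms)
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        # Count only terms that are not already part of the query.
        counts.update(t for t in doc.lower().split() if t not in known)
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return list(query_terms) + expansion


# Hypothetical first-stage ranking for the query "gene expression".
ranked_docs = [
    "gene expression in tumor cells",
    "tumor suppressor gene expression",
    "unrelated text about something else",
]
expanded = expand_query(["gene", "expression"], ranked_docs)
```

With these inputs, "tumor" is the only non-query term that occurs in both feedback documents, so it is the term added to the query. A practical system would also filter stopwords and weight candidate terms rather than count them.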
Journal of Web Semantics | 2012
Duy Dinh; Lynda Tamine
In the context of document retrieval in the biomedical domain, this paper introduces a novel approach to searching for biomedical information using contextual semantic information. More specifically, we propose to combine the contextual semantic information in documents and user queries in an attempt to improve the performance of biomedical information retrieval (IR) systems. Contextual information provides knowledge about a domain in a global context, or statistical properties of a subcollection of documents related to a given query in a local context. In our context-sensitive IR approach, terms denoting concepts are extracted from each document using several biomedical terminologies. Preferred terms denoting concepts are used to enrich the semantics of the document content via document expansion. The user query is expanded using terms extracted from the top-ranked expanded documents via a blind feedback query expansion approach. In addition, we aim to evaluate the utility of incorporating several terminologies within the proposed context-sensitive approach. The experiments carried out on the TREC Genomics 2004 and 2005 test sets show that our context-sensitive IR approach significantly outperforms state-of-the-art baseline approaches.
Conference on Information and Knowledge Management | 2013
Julio Cesar Dos Reis; Duy Dinh; Cédric Pruski; Marcos Da Silveira; Chantal Reynaud-Delaître
The highly dynamic nature of domain ontologies has a direct impact on semantic mappings established between concepts from different ontologies. Mappings must therefore be maintained according to ongoing ontology changes. Since many software applications exploit mappings for managing information and knowledge, it is important to define appropriate adaptation strategies to apply to existing mappings in order to preserve their validity over time. In this article, we propose a set of mapping adaptation actions and present how they are used to keep mappings up to date based on ontology change operations of different kinds. We conduct an experimental evaluation using life sciences ontologies and mappings. We measure the evolution of mappings based on the proposed approach to mapping adaptation. The results confirm that mappings must be individually adapted according to the different types of ontology change.
Artificial Intelligence in Medicine | 2013
Duy Dinh; Lynda Tamine; Fatiha Boubekeur
OBJECTIVE: The aim of this work is to evaluate a set of indexing and retrieval strategies based on the integration of several biomedical terminologies on the available TREC Genomics collections for an ad hoc information retrieval (IR) task. MATERIALS AND METHODS: We propose a multi-terminology based concept extraction approach that selects the best concepts from free text by means of voting techniques. We instantiate this general approach on four terminologies (MeSH, SNOMED, ICD-10 and GO). We particularly focus on the effect of integrating terminologies into a biomedical IR process, and the utility of using voting techniques for combining the concepts extracted from each document in order to provide a list of unique concepts. RESULTS: Experimental studies conducted on the TREC Genomics collections show that our multi-terminology IR approach based on voting techniques yields statistically significant improvements over the baseline. For example, tested on the 2005 TREC Genomics collection, our multi-terminology based IR approach provides an improvement rate of +6.98% in terms of MAP (mean average precision) (p<0.05) compared to the baseline. In addition, our experimental results show that document expansion using preferred terms, in combination with query expansion using terms from top-ranked expanded documents, improves biomedical IR effectiveness. CONCLUSION: We have evaluated several voting models for combining concepts issued from multiple terminologies. Through this study, we presented many factors affecting the effectiveness of biomedical IR systems, including term weighting, query expansion, and document expansion models. The appropriate combination of those factors could be useful to improve IR performance.
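To make the idea of combining per-terminology concept lists into one list of unique concepts concrete, here is a minimal sketch of a simple voting scheme. The abstract does not say which voting models were evaluated, so this is only an assumed example: concepts are ordered by how many terminologies proposed them, with the summed score as a tie-break. The function name `vote`, the scores, and the concept labels are hypothetical, and the sketch assumes concepts from different terminologies can be matched by label.

```python
from collections import defaultdict


def vote(concept_lists):
    """Merge scored concept lists into one list of unique concepts.

    Concepts are ordered by number of votes (terminologies proposing them),
    then by summed score, then alphabetically for determinism.
    """
    votes = defaultdict(int)
    total = defaultdict(float)
    for scored in concept_lists.values():
        for concept, score in scored:
            votes[concept] += 1
            total[concept] += score
    return sorted(votes, key=lambda c: (-votes[c], -total[c], c))


# Hypothetical extraction output for one document.
extracted = {
    "MeSH": [("neoplasms", 0.9), ("gene expression", 0.4)],
    "SNOMED": [("neoplasms", 0.7), ("apoptosis", 0.5)],
    "GO": [("apoptosis", 0.6), ("gene expression", 0.3)],
}
merged = vote(extracted)
```

Every concept here receives two votes, so the summed scores decide the order: "neoplasms" first, then "apoptosis", then "gene expression".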
Applications of Natural Language to Data Bases | 2010
Duy Dinh; Lynda Tamine
This paper tackles the problem of term ambiguity, especially in biomedical literature. We propose and evaluate two methods of Word Sense Disambiguation (WSD) for biomedical terms and integrate them into a sense-based document indexing and retrieval framework. Ambiguous biomedical terms in documents and queries are disambiguated using the Medical Subject Headings (MeSH) thesaurus and semantically indexed with their associated correct sense. The experimental evaluation carried out on the TREC9-FT 2000 collection shows that our approach to WSD and sense-based indexing and retrieval outperforms the baseline.
European Semantic Web Conference | 2014
Duy Dinh; Julio Cesar Dos Reis; Cédric Pruski; Marcos Da Silveira; Chantal Reynaud-Delaître
Ontology versions are periodically released to ensure their usefulness and reliability over time. This potentially impacts dependent artefacts such as mappings and annotations. Dealing with this impact requires a fine-grained characterization of the changes to ontology entities between ontology versions. This article proposes to identify change patterns in attribute values when an ontology evolves, in order to track textual statements describing concepts. We empirically evaluate our approach using biomedical ontologies, for which new versions are frequently released. Our results suggest the feasibility of the proposed techniques.
Artificial Intelligence in Medicine in Europe | 2011
Duy Dinh; Lynda Tamine
We are interested in retrieving relevant information from biomedical documents according to healthcare professionals' information needs. It is well known that biomedical documents are indexed using conceptual descriptors issued from terminologies for better retrieval performance. Our attempt to develop a conceptual retrieval framework relies on the hypothesis that there are several broad categories of knowledge that could be captured from different terminologies and processed by retrieval algorithms. With this in mind, we propose a multi-terminology based indexing approach for selecting the most representative concepts for each document. We instantiate this general approach on four terminologies, namely MeSH (Medical Subject Headings), SNOMED (Systematized Nomenclature of Medicine), ICD-10 (International Classification of Diseases) and GO (Gene Ontology). Experimental studies were conducted on large, official document test collections of real-world clinical queries and associated judgments extracted from MEDLINE scientific collections, namely TREC Genomics 2004 & 2005. The obtained results demonstrate the advantages of our multi-terminology based biomedical information retrieval approach over state-of-the-art approaches.
Artificial Intelligence in Medicine | 2015
Julio Cesar Dos Reis; Duy Dinh; Marcos Da Silveira; Cédric Pruski; Chantal Reynaud-Delaître
BACKGROUND: Mappings established between life science ontologies require significant effort to keep up to date due to the size and frequent evolution of these ontologies. Consequently, automatic methods for applying modifications to mappings are in high demand. The accuracy of such methods relies on the available description of the evolution of ontologies, especially regarding concepts involved in mappings. However, from one ontology version to another, a deeper understanding of the ontology changes relevant for supporting mapping adaptation is typically lacking. METHODS: This research work defines a set of change patterns at the level of concept attributes, and proposes original methods to automatically recognize instances of these patterns based on the similarity between attributes denoting the evolving concepts. This investigation evaluates the benefits of the proposed methods and the influence of the recognized change patterns on the selection of strategies for mapping adaptation. RESULTS: The summary of the findings is as follows: (1) the Precision (>60%) and Recall (>35%) achieved by comparing manually identified change patterns with the automatic ones; (2) a set of potential impacts of recognized change patterns on the way mappings are adapted. We found that the detected correlations cover ∼66% of the mapping adaptation actions with a positive impact; and (3) the influence of the similarity coefficient calculated between concept attributes on the performance of the recognition algorithms. CONCLUSIONS: The experimental evaluations conducted with real life science ontologies showed the effectiveness of our approach in accurately characterizing ontology evolution at the level of concept attributes. This investigation confirmed the relevance of the proposed change patterns to support decisions on mapping adaptation.
Information Systems and E-business Management | 2015
Duy Dinh; Lynda Tamine
With the explosive growth of biomedical information volumes, there is an increasing need to develop effective and efficient tools for indexing and retrieval. Automatic indexing and retrieval in the biomedical domain faces several challenges, such as the recognition of terms denoting concepts and term disambiguation. In this paper, we are interested in identifying (sub-)domains of concepts in ontologies. We propose two algorithms for identifying the most appropriate (sub-)domain of a concept in the context of a document/query. We integrate these methods into a semantic indexing and retrieval framework. The experimental evaluation carried out on the OHSUMED collection shows that our approaches to semantic indexing and retrieval outperform the state-of-the-art approach.