Mai-Vu Tran

Vietnam National University, Hanoi

Publication

Featured research published by Mai-Vu Tran.

PLOS ONE | 2013

Learning to Recognize Phenotype Candidates in the Auto-Immune Literature Using SVM Re-Ranking

Nigel Collier; Mai-Vu Tran; Hoang-Quynh Le; Quang-Thuy Ha; Anika Oellrich; Dietrich Rebholz-Schuhmann

The identification of phenotype descriptions in the scientific literature, case reports and patient records is a rewarding task for bio-medical text mining. Any progress will support knowledge discovery and linkage to other resources. However, because of their wide variation, a number of challenges still remain in terms of their identification and semantic normalisation before they can be fully exploited for research purposes. This paper presents novel techniques for identifying potential complex phenotype mentions by exploiting a hybrid model based on machine learning, rules and dictionary matching. A systematic study is made of how to combine sequence labels from these modules as well as the merits of various ontological resources. We evaluated our approach on a subset of Medline abstracts cited by the Online Mendelian Inheritance in Man database related to auto-immune diseases. Using partial matching, the best micro-averaged F-score for phenotypes and five other entity classes was 79.9%. A best performance of 75.3% was achieved for phenotype candidates using all semantic resources. We observed the advantage of using SVM-based learn-to-rank for sequence label combination over maximum entropy and a priority list approach. The results indicate that the identification of simple entity types such as chemicals and genes is robustly supported by single semantic resources, whereas phenotypes require combinations. Altogether we conclude that our approach coped well with the compositional structure of phenotypes in the auto-immune domain.
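As a rough illustration of the micro-averaged F-score mentioned in this abstract, the sketch below pools true positives, false positives and false negatives over all entity classes before computing precision and recall. The function name `micro_f1`, the class names and the counts are hypothetical and are not taken from the paper; how a partial match is credited as a true positive is also left outside the sketch.

```python
from typing import Dict, Tuple


def micro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 over classes.

    `counts` maps a class name to (true positives, false positives,
    false negatives). Counts are pooled across classes first, so frequent
    classes weigh more than rare ones.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (assumption), not figures from the paper.
example_counts = {"phenotype": (60, 20, 20), "gene": (30, 5, 10)}
example_score = micro_f1(example_counts)
```

Macro-averaging would instead compute one F-score per class and average them, which gives rare classes the same weight as frequent ones.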

Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi) | 2014

The impact of near domain transfer on biomedical named entity recognition

Nigel Collier; Mai-Vu Tran; Ferdinand Paster

Current research in fully supervised biomedical named entity recognition (bioNER) is often conducted in a setting of low sample sizes. Whilst experimental results show strong performance in-domain, it has been recognised that quality suffers when models are applied to heterogeneous text collections. However, the causal factors have until now been uncertain. In this paper we describe a controlled experiment into near domain bias for two Medline corpora on hereditary diseases. Five strategies are employed for mitigating the impact of near domain transference: simple transference, pooling, stacking, class re-labeling and feature augmentation. We measure their effect on F-score performance against an in-domain baseline. Stacking and feature augmentation mitigate F-score loss but do not necessarily result in superior performance except for selected classes. Simple pooling of data across domains failed to exploit size effects for most classes. We conclude that we can expect lower performance and higher annotation costs if we do not adequately compensate for the distributional dissimilarities of domains during learning.
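The abstract does not say which feature augmentation scheme was used. One widely known variant, described by Daumé III (2007), copies every feature into a shared version and a domain-specific version, so that a learner can fit weights that generalise across domains alongside weights tied to one domain. The sketch below assumes that scheme; the function name `augment_features`, the feature names and the domain labels are hypothetical.

```python
from typing import Dict


def augment_features(features: Dict[str, float], domain: str) -> Dict[str, float]:
    """Duplicate each feature into a shared copy and a domain-specific copy."""
    augmented: Dict[str, float] = {}
    for name, value in features.items():
        augmented[f"shared:{name}"] = value    # visible to every domain
        augmented[f"{domain}:{name}"] = value  # visible to this domain only
    return augmented


# Hypothetical token features drawn from two near-domain corpora.
source_example = augment_features({"word=fever": 1.0}, "corpus_a")
target_example = augment_features({"word=fever": 1.0}, "corpus_b")
```

Both examples share the `shared:word=fever` feature, while the domain-prefixed copies do not overlap.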

Knowledge and Systems Engineering | 2012

VnLoc: A Real-Time News Event Extraction Framework for Vietnamese

Mai-Vu Tran; Minh Hoang Nguyen; Sy-Quan Nguyen; Minh-Tien Nguyen; Xuan-Hieu Phan

Event extraction is a complex and interesting topic in information extraction that covers methods for extracting events from free text or web data. The output of event extraction systems can be used in several fields, such as risk analysis systems, online monitoring systems and decision support tools. In this paper, we introduce a method that combines lexico-semantic and machine learning techniques to extract events from Vietnamese news. We then describe VnLoc, an online event monitoring system built on this method for extracting events in Vietnamese. In the experiments, we evaluated the method using precision, recall and the F1 measure. So far, we have investigated three types of event: FIRE, CRIME and TRANSPORT ACCIDENT.

International Conference on Asian Language Processing | 2011

An Integrated Approach Using Conditional Random Fields for Named Entity Recognition and Person Property Extraction in Vietnamese Text

Hoang-Quynh Le; Mai-Vu Tran; Nhat-Nam Bui; Nguyen-Cuong Phan; Quang-Thuy Ha

Personal names are among the most frequently searched items in web search engines, and a person entity is always associated with numerous properties. In this paper, we propose an integrated model for Vietnamese that simultaneously recognizes person entities and extracts the values of a pre-defined set of properties related to each person. We also design a rich feature set drawing on various kinds of knowledge resources and apply conditional random fields (CRFs), a well-known machine learning method, to improve the results. The results show that our method is suitable for Vietnamese, with an average of 84% precision, 82.56% recall and 83.39% F-measure. Moreover, running time is good, and the results also show the effectiveness of our feature set.

Asia-Pacific Services Computing Conference | 2011

A Solution for Grouping Vietnamese Synonym Feature Words in Product Reviews

Huyen-Trang Pham; Tien-Thanh Vu; Mai-Vu Tran; Quang-Thuy Ha

Feature-based opinion mining is an interesting opinion mining problem in which feature words/phrases are discovered at the sentence level. However, customers often use different words/phrases to refer to the same feature in reviews. To produce a meaningful summary, synonymous feature words/phrases within a domain need to be grouped under the same feature. This paper proposes a solution for grouping synonym features in Vietnamese customer reviews based on semi-supervised SVM-kNN classification and HAC clustering. Experimental results on reviews in the mobile phone domain demonstrate that the proposed method is promising for the task. The Purity and Accuracy measures are 0.68 and 0.65, respectively.
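For readers unfamiliar with the Purity measure reported here, a minimal sketch under the standard definition follows: each cluster is credited with the size of its majority gold class, and the credits are summed and divided by the number of items. The function name `purity` and the toy data are hypothetical, and the paper may compute the measure with details not stated in the abstract.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def purity(cluster_ids: Sequence[Hashable], gold_labels: Sequence[Hashable]) -> float:
    """Fraction of items that fall in the majority gold class of their cluster."""
    if len(cluster_ids) != len(gold_labels):
        raise ValueError("cluster_ids and gold_labels must have the same length")
    if not cluster_ids:
        return 0.0
    per_cluster: Dict[Hashable, Counter] = {}
    for cluster, label in zip(cluster_ids, gold_labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(c.values()) for c in per_cluster.values())
    return majority_total / len(cluster_ids)


# Toy data (assumption): five feature phrases grouped into two clusters.
toy_clusters = [0, 0, 0, 1, 1]
toy_gold = ["screen", "screen", "battery", "battery", "battery"]
toy_purity = purity(toy_clusters, toy_gold)
```

Purity rises trivially as the number of clusters grows, which is one reason it is usually reported alongside a second measure such as accuracy.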

International Conference on Asian Language Processing | 2010

User Interest Analysis with Hidden Topic in News Recommendation System

Mai-Vu Tran; Xuan-Tu Tran; Huy-Long Uong

The Internet is a vast but complicated information resource, and recommendation systems help users find the information they need by providing personalized suggestions. This research area is receiving increasing attention from researchers, and such systems are used on well-known websites such as eBay and Amazon. In this paper, we propose a recommendation system for Vietnamese electronic newspapers that uses content-based filtering combined with the user's interests as captured in the user's profile. These interests are determined by inferring a set of common hidden topics from the documents that users preferred. Experimental results show that the approach is feasible, with positive results, and has potential for real-world deployment.

International Conference on Asian Language Processing | 2011

Co-reference Resolution in Vietnamese Documents Based on Support Vector Machines

Duc-Trong Le; Mai-Vu Tran; Tri-Thanh Nguyen; Quang-Thuy Ha

Co-reference resolution still poses many challenges due to the complexity of the Vietnamese language and the lack of standard Vietnamese linguistic resources. Based on the mention-pair model of Rahman and Ng (2009) and the characteristics of Vietnamese, this paper proposes a model that uses support vector machines (SVM) to resolve co-reference in Vietnamese documents. The corpus used to evaluate the proposed model was constructed from 200 articles in the cultural and social categories of the vnexpress.net newspaper website. In initial experiments, the proposed model achieved 76.51% accuracy, compared with 73.79% for the baseline model using similar features.
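In a mention-pair model, co-reference is cast as binary classification over pairs of mentions: a pair is positive if both mentions refer to the same entity. The sketch below shows only this instance-generation step in its simplest form, pairing every mention with every later one. The function name `mention_pairs`, the example mentions and the entity ids are hypothetical; the paper's actual pair-selection strategy and features are not described in the abstract.

```python
from itertools import combinations
from typing import List, Tuple

Mention = Tuple[str, int]  # (surface text, gold entity id)


def mention_pairs(mentions: List[Mention]) -> List[Tuple[Tuple[str, str], int]]:
    """Build (antecedent, anaphor) pairs in document order.

    The label is 1 if both mentions share a gold entity id, else 0.
    """
    return [
        ((first[0], second[0]), int(first[1] == second[1]))
        for first, second in combinations(mentions, 2)
    ]


# Hypothetical mentions in document order with assumed gold entity ids.
doc_mentions = [("ông Nam", 1), ("ông", 1), ("Hà Nội", 2)]
pairs = mention_pairs(doc_mentions)
```

Each labelled pair would then be turned into a feature vector and passed to a classifier such as an SVM.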

ICCSAMA | 2015

A Method for Building a Labeled Named Entity Recognition Corpus Using Ontologies

Ngoc Trinh Vu; Van-Hien Tran; Thi-Huyen-Trang Doan; Hoang-Quynh Le; Mai-Vu Tran

Building a labeled corpus with sufficient data and good coverage while keeping cost, effort and time manageable is a popular research topic in natural language processing. The automatic or semi-automatic construction of training data has become a concern of the research community. For this reason, we consider the problem of building a corpus for phenotype entity recognition, with class-specific feature detectors from unlabeled data, based on over 10260 unique terms (more than 15000 synonyms) describing human phenotypic features in the Human Phenotype Ontology (HPO) and about 9000 unique terms (about 24000 synonyms) of mouse abnormal phenotype descriptions in the Mammalian Phenotype Ontology. The corpus was evaluated on three corpora (the Khordad corpus, Phenominer 2012 and Phenominer 2013) with Maximum Entropy and Beam Search methods. The reported F-scores are 31.71% and 35.77% on the Phenominer 2012 and Phenominer 2013 corpora, respectively, and 78.36% on the Khordad corpus.
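The abstract does not describe how ontology terms are matched against text. One simple way to label text automatically from a term list is greedy longest-match dictionary lookup with BIO tags, sketched below. The function name `label_with_dictionary`, the `PHENO` tag, the whitespace tokenisation and the two example terms are all assumptions for illustration, not the authors' pipeline.

```python
from typing import List, Set, Tuple


def label_with_dictionary(
    tokens: List[str], terms: Set[Tuple[str, ...]], tag: str = "PHENO"
) -> List[str]:
    """Assign BIO labels by greedy longest match against lower-cased terms."""
    labels = ["O"] * len(tokens)
    longest = max((len(term) for term in terms), default=0)
    lowered = [token.lower() for token in tokens]
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shrink.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in terms:
                matched = n
                break
        if matched:
            labels[i] = f"B-{tag}"
            for j in range(i + 1, i + matched):
                labels[j] = f"I-{tag}"
            i += matched
        else:
            i += 1
    return labels


# Illustrative term list (assumption), standing in for ontology terms and synonyms.
example_terms = {("short", "stature"), ("seizures",)}
example_tokens = "Patients showed short stature and seizures".split()
example_labels = label_with_dictionary(example_tokens, example_terms)
```

Labels produced this way are noisy, since exact matching misses paraphrases and compositional mentions.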

International Conference on Computational Linguistics | 2012