Meijing Li | Researchain

Publication

Featured research published by Meijing Li.

Journal of Cheminformatics | 2015

Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations.

Tsendsuren Munkhdalai; Meijing Li; Khuyagbaatar Batsuren; Hyeon Ah Park; Nak Hyeon Choi; Keun Ho Ryu

Background: Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite task before effective text mining can begin for biochemical-text data. Exploiting unlabeled text data to improve system performance has been an active and challenging research topic in text mining due to the recent growth in the amount of biomedical literature. We present a semi-supervised learning method that efficiently exploits unlabeled data in order to incorporate domain knowledge into a named entity recognition model and to improve system performance. The proposed method includes Natural Language Processing (NLP) tasks for text preprocessing, learning word representation features from a large amount of text data for feature extraction, and conditional random fields for token classification. Other than the free text in the domain, the proposed method does not rely on any lexicon or dictionary, which keeps the system applicable to other NER tasks in bio-text data. Results: We extended BANNER, a biomedical NER system, with the proposed method. This yields an integrated system that can be applied to chemical and drug NER or biomedical NER. We call our branch of the BANNER system BANNER-CHEMDNER. It is scalable over millions of documents, processing about 530 documents per minute, is configurable via XML, and can be plugged into other systems by using the BANNER Unstructured Information Management Architecture (UIMA) interface. BANNER-CHEMDNER achieved an 85.68% and an 86.47% F-measure on the testing sets of the CHEMDNER Chemical Entity Mention (CEM) and Chemical Document Indexing (CDI) subtasks, respectively, and achieved an 87.04% F-measure on the official testing set of the BioCreative II gene mention task, showing remarkable performance in both chemical and biomedical NER. The BANNER-CHEMDNER system is available at: https://bitbucket.org/tsendeemts/banner-chemdner.

Journal of Information Processing Systems | 2013

A Feature Selection-based Ensemble Method for Arrhythmia Classification

Erdenetuya Namsrai; Tsendsuren Munkhdalai; Meijing Li; Jung-Hoon Shin; Oyun-Erdene Namsrai; Keun Ho Ryu

In this paper, a novel method is proposed to build an ensemble of classifiers by using a feature selection schema. The feature selection schema identifies the best feature sets that affect the arrhythmia classification. First, a number of feature subsets are extracted by applying the feature selection schema to the original dataset. Then classification models are built using each feature subset. Finally, we combine the classification models by adopting a voting approach to form a classification ensemble. The voting approach in our method involves both the classification error rate and the feature selection rate to calculate the score of each classifier in the ensemble. In our method, the feature selection rate depends on the order in which the feature subsets are extracted. In the experiment, we applied our method to an arrhythmia dataset and generated the top three disjoint feature sets. We then built three classifiers based on the top three feature subsets and formed the classifier ensemble by using the voting approach. Our method can improve the classification accuracy on high-dimensional datasets. The performance of each classifier and the performance of their ensemble were higher than the performance of the classifier that was based on the whole feature space of the dataset. The classification performance was improved and a more stable classification model could be constructed with the proposed approach.
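The voting step described above can be sketched as follows. The abstract does not give the scoring formula, so the score `(1 - error_rate) * selection_rate` is an assumption made for illustration, as are the function names and the example rates.

```python
from collections import defaultdict


def classifier_score(error_rate, selection_rate):
    """Hypothetical score: accurate classifiers built on earlier-extracted subsets weigh more."""
    return (1.0 - error_rate) * selection_rate


def weighted_vote(predictions, scores):
    """Return the label with the largest total score.

    predictions: one predicted label per classifier
    scores: one non-negative weight per classifier
    """
    totals = defaultdict(float)
    for label, score in zip(predictions, scores):
        totals[label] += score
    return max(totals, key=totals.get)


# Three classifiers, one per feature subset; earlier subsets get a higher (assumed) rate.
scores = [
    classifier_score(error_rate=0.20, selection_rate=1.0),
    classifier_score(error_rate=0.25, selection_rate=0.5),
    classifier_score(error_rate=0.30, selection_rate=0.25),
]
label = weighted_vote(["normal", "arrhythmia", "arrhythmia"], scores)
```

With these assumed rates, the first classifier outweighs the other two combined, so the ensemble follows its prediction even though it is in the minority.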

Journal of Information Science | 2014

MapReduce-based web mining for prediction of web-user navigation

Meijing Li; Xiuming Yu; Keun Ho Ryu

Predicting web user behaviour is typically an application for finding frequent sequence patterns. With the rapid growth of the Internet, a large amount of information is stored in web logs. Traditional frequent-sequence-pattern-mining algorithms are hard pressed to analyse information from within big datasets. In this paper, we propose an efficient way to predict navigation patterns of web users by improving frequent-sequence-pattern-mining algorithms based on the programming model of MapReduce, which can handle huge datasets efficiently. In the experiments, we show that our proposed MapReduce-based algorithm is more efficient than traditional frequent-sequence-pattern-mining algorithms, and by comparing our proposed algorithm with existing algorithms in web-usage mining, we also show that using the MapReduce programming model saves time.
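The general idea of counting sequence patterns with a map step and a reduce step can be sketched in plain Python. This is not the authors' algorithm: it assumes patterns are contiguous page subsequences of a fixed length, counts each pattern once per session, and simulates both phases in one process. All names are hypothetical.

```python
from collections import defaultdict


def map_session(session, length):
    """Map step: emit (pattern, 1) once for each distinct contiguous pattern in a session."""
    patterns = {tuple(session[i:i + length]) for i in range(len(session) - length + 1)}
    return [(pattern, 1) for pattern in patterns]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each pattern."""
    support = defaultdict(int)
    for pattern, count in pairs:
        support[pattern] += count
    return dict(support)


def frequent_patterns(sessions, length, min_support):
    """Run map over all sessions, reduce, and keep patterns meeting min_support."""
    pairs = [pair for session in sessions for pair in map_session(session, length)]
    return {p: c for p, c in reduce_counts(pairs).items() if c >= min_support}


sessions = [
    ["home", "news", "sports"],
    ["home", "news", "weather"],
    ["news", "sports", "home"],
]
result = frequent_patterns(sessions, length=2, min_support=2)
```

Because each session is mapped independently, the map step is the part that a MapReduce framework could distribute across workers.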

Advanced Information Networking and Applications | 2012

Bio Named Entity Recognition Based on Co-training Algorithm

Tsendsuren Munkhdalai; Meijing Li; Taewook Kim; Oyun-Erdene Namsrai; Seon-Phil Jeong; Jungpil Shin; Keun Ho Ryu

One essential task in extracting information from biomedical literature is the bio Named Entity Recognition (NER) process, which defines the boundaries between typical words and biomedical terminology in particular text data, and assigns them based on domain knowledge. This paper presents a semi-supervised integration of completely different classifiers to capture knowledge from unlabeled data in order to recognize bio named entities in text. We modified the original co-training, a semi-supervised learning algorithm, with a scalable feature processing schema, which extracts bio NER features from a large amount of unlabeled data and converts different types of feature sets. Our base result shows that the classifiers of co-training achieve significant learning from unlabeled data.

Journal of Information Processing Systems | 2012

An Active Co-Training Algorithm for Biomedical Named-Entity Recognition

Tsendsuren Munkhdalai; Meijing Li; Unil Yun; Oyun-Erdene Namsrai; Keun Ho Ryu

Exploiting unlabeled text data with a relatively small labeled corpus has been an active and challenging research topic in text mining, due to the recent growth in the amount of biomedical literature. Biomedical named-entity recognition is an essential prerequisite task before effective text mining of biomedical literature can begin. This paper proposes an Active Co-Training (ACT) algorithm for biomedical named-entity recognition. ACT is a semi-supervised learning method in which two classifiers based on two different feature sets iteratively learn from informative examples that have been queried from the unlabeled data. We design a new classification problem to measure the informativeness of an example in unlabeled data. In this classification problem, the examples are classified, based on a joint view of a feature set, as informative or non-informative to both classifiers. To form the training data for the classification problem, we adopt a query-by-committee method. Therefore, in ACT, both classifiers are considered to be one committee, which is used on the labeled data to give the informativeness label to each example. The ACT method outperforms the traditional co-training algorithm in terms of F-measure as well as the number of training iterations performed to build a good classification model. The proposed method tends to efficiently exploit a large amount of unlabeled data by selecting a small number of examples having not only useful information but also a comprehensive pattern.

International Conference of the IEEE Engineering in Medicine and Biology Society | 2012

Trigger Learning and ECG Parameter Customization for Remote Cardiac Clinical Care Information System

Mohamed Ezzeldin A. Bashir; Dong Gyu Lee; Meijing Li; Jang-Whan Bae; Ho Sun Shon; Myung Chan Cho; Keun Ho Ryu

Coronary heart disease is recognized as the largest single cause of death worldwide. The aim of a cardiac clinical information system is to achieve the best possible diagnosis of cardiac arrhythmias by electronic data processing. A cardiac information system designed to offer remote monitoring of patients who need continuous follow-up is in demand. However, intra- and interpatient variation of electrocardiogram (ECG) morphological descriptors over time, as well as computational limits, pose significant challenges for practical implementations. The former requires that the classification model be adjusted continuously, and the latter requires a reduction in the number and types of ECG features, and thus the computational burden, necessary to classify different arrhythmias. We propose the use of adaptive learning to automatically train the classifier on up-to-date ECG data, and employ adaptive feature selection to define unique feature subsets pertinent to different types of arrhythmia. Experimental results show that this hybrid technique outperforms conventional approaches and is, therefore, a promising new intelligent diagnostic tool.

Archive | 2012

Application of Closed Gap-Constrained Sequential Pattern Mining in Web Log Data

Xiuming Yu; Meijing Li; Dong Gyu Lee; Kwang Deuk Kim; Keun Ho Ryu

Discovery of information in web log data is a very popular research area in the field of data mining. Two common objectives of its applications are to obtain useful information about web users’ behavior and to analyze the structure of web sites. In this paper, we suggest a novel approach to generate web sequential patterns using the gap-constrained method in web log data. The mining process in the proposed approach is as follows. First, the raw web log data is pre-processed by removing irrelevant or redundant items, grouping the records of the same user, and transforming the web log data into a set of tuples (sequence identifier, sequence) constrained by visiting time. Second, web access patterns, which are closed sequential patterns with gap constraints, are generated from the web log data using the Gap-BIDE algorithm with two parameters: minimum support threshold and gap constraint. In the experiment, a data set is derived from http://www.vtsns.edu.rs/maja/, which is proposed in [1]. The results show that, by applying sequential pattern mining to web log data as presented in this paper, we can find information about the navigational behavior of web users, and the structure of the web page can be designed more reasonably according to the order of the obtained patterns.
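To illustrate what a gap constraint means for a sequential pattern, here is a minimal sketch. It is not Gap-BIDE (which mines closed patterns); it only checks whether a given pattern occurs in a sequence with at most `max_gap` skipped items between consecutive matched elements, and counts support. The function names and the single upper-bound form of the constraint are assumptions.

```python
def occurs_with_gap(sequence, pattern, max_gap):
    """True if pattern is a subsequence of sequence with at most max_gap
    skipped items between consecutive matched elements."""
    def match_from(seq_pos, pat_pos):
        if pat_pos == len(pattern):
            return True
        # The next match must fall within max_gap skipped items of the previous one.
        last = min(len(sequence), seq_pos + max_gap + 1)
        return any(
            sequence[i] == pattern[pat_pos] and match_from(i + 1, pat_pos + 1)
            for i in range(seq_pos, last)
        )

    if not pattern:
        return True
    # The first pattern element may match anywhere in the sequence.
    return any(
        sequence[i] == pattern[0] and match_from(i + 1, 1)
        for i in range(len(sequence))
    )


def support(sequences, pattern, max_gap):
    """Number of sequences that contain the pattern under the gap constraint."""
    return sum(occurs_with_gap(s, pattern, max_gap) for s in sequences)


visits = [["A", "B", "C"], ["A", "C"], ["A", "B", "B", "C"]]
count = support(visits, ["A", "C"], max_gap=1)
```

The search backtracks over candidate positions instead of matching greedily, since the first occurrence of an element is not always the one that satisfies the gap constraint for the rest of the pattern.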

Database and Expert Systems Applications | 2012

Prediction of Web User Behavior by Discovering Temporal Relational Rules from Web Log Data

Xiuming Yu; Meijing Li; Incheon Paik; Keun Ho Ryu

The Web has become a very popular and interactive medium in our lives. With the rapid development and proliferation of e-commerce and Web-based information systems, web mining has become an essential tool for discovering specific information on the Web. Many web mining techniques have been proposed. In this paper, an approach of temporal interval relational rule mining is applied to discover knowledge from web log data. Unlike previous web mining techniques, our approach takes the timestamp attribute of web log data into account. First, temporal intervals of accessing web pages are formed by folding over a periodicity. Then relational rules are discovered under the constraint of these temporal intervals. In the experiment, we analyze the resulting relational rules and the effect of important parameters used in the mining approach.
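One possible reading of "folding over a periodicity" can be sketched as follows: timestamps are reduced modulo a period (here one day), and nearby folded offsets are merged into intervals. The abstract does not specify the procedure, so the merging rule, the `max_gap` parameter, and the function names are assumptions for illustration only.

```python
def fold(timestamps, period):
    """Map each timestamp (in seconds) to its offset within the period, sorted."""
    return sorted(t % period for t in timestamps)


def form_intervals(offsets, max_gap):
    """Merge sorted offsets into (start, end) intervals; a new interval
    starts when the gap to the previous offset exceeds max_gap."""
    intervals = []
    for offset in offsets:
        if intervals and offset - intervals[-1][1] <= max_gap:
            intervals[-1][1] = offset
        else:
            intervals.append([offset, offset])
    return [tuple(interval) for interval in intervals]


DAY = 24 * 3600
# Accesses spread over three days, clustered around 09:00 and 20:00.
accesses = [9 * 3600, DAY + 9 * 3600 + 600, 2 * DAY + 20 * 3600, 20 * 3600 + 300]
intervals = form_intervals(fold(accesses, DAY), max_gap=1800)
```

This sketch does not handle intervals that wrap around the period boundary (for example, activity spanning midnight).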

Mathematical Problems in Engineering | 2015

A Novel Approach for Protein-Named Entity Recognition and Protein-Protein Interaction Extraction

Meijing Li; Tsendsuren Munkhdalai; Xiuming Yu; Keun Ho Ryu

Many researchers focus on developing protein-named entity recognition (Protein-NER) or protein-protein interaction (PPI) extraction systems. However, studies on these two topics have not been integrated well, so the Protein-NER component of existing PPI extraction systems still needs improvement. In this paper, we developed a protein-protein interaction extraction system named PPIMiner based on Support Vector Machine (SVM) and parsing tree. PPIMiner consists of three main models: a natural language processing (NLP) model, a Protein-NER model, and a PPI discovery model. The Protein-NER model, which is named ProNER, identifies protein names based on two methods: a dictionary-based method and a machine learning-based method. ProNER is capable of identifying more proteins than the dictionary-based Protein-NER models in other existing systems. The final PPIs extracted via the PPI discovery model are represented in detail, showing the protein interaction types and the occurrence frequency through two different methods. In the experiments, the results show that the performances achieved by our ProNER and PPI discovery model are better than those of other existing tools. PPIMiner applies this protein-named entity recognition approach and the parsing tree-based PPI extraction method to improve the performance of PPI extraction. We also provide an easy-to-use interface to access the PPI database and an online system for PPI extraction and Protein-NER.

Osong Public Health and Research Perspectives | 2014

A New Direction of Cancer Classification: Positive Effect of Low-Ranking MicroRNAs

Feifei Li; Minghao Piao; Yongjun Piao; Meijing Li; Keun Ho Ryu

Objectives: Many studies based on microRNA (miRNA) expression profiles showed a new aspect of cancer classification. Because one characteristic of miRNA expression data is the high dimensionality, feature selection methods have been used to facilitate dimensionality reduction. The feature selection methods have one shortcoming thus far: they consider only the cases where the feature-to-class relationship is 1:1 or n:1. However, because one miRNA may influence more than one type of cancer, such human miRNAs are ranked low by traditional feature selection methods and are removed most of the time. In view of the limited number of miRNAs, low-ranking miRNAs are also important to cancer classification. Methods: We considered both high- and low-ranking features to cover all cases (1:1, n:1, 1:n, and m:n) in cancer classification. First, we used the correlation-based feature selection method to select the high-ranking miRNAs, and chose the support vector machine, Bayes network, decision tree, k-nearest-neighbor, and logistic classifier to perform cancer classification. Then, we chose the Chi-square test, information gain, gain ratio, and Pearson's correlation feature selection methods to build the m:n feature subset, and used the selected miRNAs to perform cancer classification. Results: Using low-ranking miRNA expression profiles achieved higher classification accuracy than using only the high-ranking miRNAs selected by traditional feature selection methods. Conclusion: Our results demonstrate that the m:n feature subset reveals the positive effect of low-ranking miRNAs in cancer classification.
