Yonghong Peng
University of Bradford
Publication
Featured research published by Yonghong Peng.
Journal of Biomedical Informatics | 2010
Yonghong Peng; Zhi Qing Wu; Jianmin Jiang
This paper presents a novel feature selection approach to deal with the high dimensionality of biomedical data in classification. Extensive research has been performed in the field of pattern recognition and machine learning. Dozens of feature selection methods have been developed in the literature, which can be classified into three main categories: filter, wrapper and hybrid approaches. Filter methods apply an independent test without involving any learning algorithm, while wrapper methods require a predetermined learning algorithm for feature subset evaluation. Filter and wrapper methods each have their own drawbacks and are complementary to each other: filter approaches have low computational cost but insufficient reliability in classification, while wrapper methods tend to have superior classification accuracy but require great computational power. The approach proposed in this paper integrates filter and wrapper methods into a sequential search procedure with the aim of improving the classification performance of the selected features. The proposed approach is characterized by (1) adding a pre-selection step to make the search for feature subsets with improved classification performance more effective and (2) using Receiver Operating Characteristic (ROC) curves to characterize the performance of individual features and feature subsets in the classification. In a comparison with the conventional Sequential Forward Floating Search (SFFS), which has been considered one of the best feature selection methods in the literature, experimental results demonstrate that (i) the proposed approach is able to select feature subsets with better classification performance than the SFFS method and (ii) the integrated feature pre-selection mechanism, by means of a new selection criterion and filter method, helps to solve over-fitting problems and reduces the chance of settling on a locally optimal solution.
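The combination of a filter step and a wrapper-style sequential search described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the selection criterion, so the AUC threshold `min_auc`, the helper names, the toy data and the `sum_score_auc` scorer (a stand-in for a trained classifier) are all assumptions, and the search shown is plain forward selection rather than the floating variant (SFFS).

```python
from typing import Callable, List, Sequence, Tuple

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def preselect(X, y, min_auc: float = 0.6) -> List[int]:
    """Filter step (assumed criterion): keep features whose single-feature
    AUC, taken in either direction, reaches min_auc."""
    keep = []
    for j in range(len(X[0])):
        a = auc([row[j] for row in X], y)
        if max(a, 1 - a) >= min_auc:
            keep.append(j)
    return keep

def sum_score_auc(X, y, subset: List[int]) -> float:
    """Hypothetical stand-in for a classifier: AUC of the summed feature values."""
    return auc([sum(row[j] for j in subset) for row in X], y)

def forward_select(X, y, candidates: List[int],
                   score_subset: Callable, max_size: int) -> Tuple[List[int], float]:
    """Wrapper step: greedily add the candidate that most improves the subset
    score, stopping when no candidate gives a strict improvement."""
    selected: List[int] = []
    best = 0.0
    while len(selected) < max_size:
        trials = [(score_subset(X, y, selected + [j]), j)
                  for j in candidates if j not in selected]
        if not trials:
            break
        top_score, top_j = max(trials)
        if top_score <= best:
            break
        selected.append(top_j)
        best = top_score
    return selected, best

# Toy data (invented for illustration): rows are samples, columns are features.
X = [
    [0.9, 0.5, 0.2],
    [0.8, 0.1, 0.6],
    [0.7, 0.9, 0.4],
    [0.3, 0.4, 0.5],
    [0.2, 0.8, 0.1],
    [0.1, 0.2, 0.3],
]
y = [1, 1, 1, 0, 0, 0]

candidates = preselect(X, y)
subset, score = forward_select(X, y, candidates, sum_score_auc, max_size=2)
print(candidates, subset, score)
```

In this toy run the filter step discards the weakest feature before the wrapper search starts, which is the cost-saving role the abstract attributes to pre-selection.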
Journal of Biomedical Informatics | 2010
Xuezhong Zhou; Yonghong Peng; Baoyan Liu
Extracting meaningful information and knowledge from free text is the subject of considerable research interest in the machine learning and data mining fields. Text data mining (or text mining) has become one of the most active research sub-fields in data mining. Significant developments in the area of biomedical text mining during the past years have demonstrated its great promise for supporting scientists in developing novel hypotheses and new knowledge from the biomedical literature. Traditional Chinese medicine (TCM) provides a distinct methodology with which to view human life. It is one of the most complete and distinguished traditional medicines, with a history of several thousand years of studying and practicing the diagnosis and treatment of human disease. It has been shown that the TCM knowledge obtained from clinical practice has become a significant complementary source of information for modern biomedical sciences. TCM literature obtained from the historical period and from modern clinical studies has recently been transformed into digital data in the form of relational databases or text documents, which provide an effective platform for information sharing and retrieval. This motivates and facilitates research and development into knowledge discovery approaches and the modernization of TCM. In order to contribute to this still growing field, this paper presents (1) a comparative introduction to TCM and modern biomedicine, (2) a survey of the related information sources of TCM, (3) a review and discussion of the state of the art and the development of text mining techniques with applications to TCM, and (4) a discussion of the research issues around TCM text mining and its future directions.
Artificial Intelligence in Medicine | 2006
Yonghong Peng; Bin Yao; Jianmin Jiang
Objectives. The presence of microcalcifications (MCs), clusters of tiny calcium deposits that appear as small bright spots in a mammogram, has been considered a very important indicator for breast cancer diagnosis. Much research has been performed on developing computer-aided systems for the accurate identification of MCs. However, the computer-based automatic detection of MCs has proven difficult because of the complicated nature of the surrounding breast tissue and the variation of MCs in shape, orientation, brightness and size. Methods and materials. This paper presents a new approach for the effective detection of MCs by incorporating a knowledge-discovery mechanism in the genetic algorithm (GA). In the proposed approach, called knowledge-discovery incorporated genetic algorithm (KD-GA), the genetic algorithm is used to search for the bright spots in a mammogram and a knowledge-discovery mechanism is integrated to improve the performance of the GA. The function of the knowledge-discovery mechanism includes evaluating the possibility of a bright spot being a true MC, and adaptively adjusting the associated fitness values. The adjustment of fitness indirectly guides the GA to extract the true MCs and eliminate the false MCs (FMCs) accordingly. Results and conclusions. The experimental results demonstrate that the incorporation of the knowledge-discovery mechanism into the genetic algorithm is able to eliminate the FMCs and produce improved performance compared with the conventional GA methods. Furthermore, the experimental results show that the proposed KD-GA method provides a promising and generic approach for the development of computer-aided diagnosis for breast cancer.
Artificial Intelligence in Medicine | 2010
Yonghong Peng; Yufeng Zhang; Lipo Wang
The advances of high-throughput biotechnologies have shifted the focus of biomedical science from studying individual molecules towards analysing the interactions of the complex molecular and cellular networks that control whole biological systems. This greatly fosters the collaborative interactions between engineering, informatics, and biomedical science, and prompts the emergence of systems biology and systems medicine that aims to understand how the individual components of a biological system interact in time and space to determine the functioning of the system and how an appropriate approach can be developed for the effective treatment of diseases. A general framework of systems biology and medicine consists of (1) the identification of system elements and components; (2) the modeling of systems: the development of appropriate models that represent the physical and functional structure of biological systems and the complex interactions within the system; (3) the understanding of systems: the extraction of emergent properties of biological systems by means of analysis of the structural properties and dynamics of systems; (4) the control of systems: the identification of control targets and the development of appropriate approaches that regulate the behaviour of the targets, which helps the development of effective treatments for complex diseases. Artificial intelligence (AI) is an area of computer science, which has been developed since the 1950s, specialising in dealing with problems considered difficult by traditional computer scientists through the use of knowledge and of probabilities and other kinds of uncertainties. AI has been playing many different roles in scientific research and the literature has shown that AI is promising in solving complex problems in many applications, particularly in areas with huge amounts of data but very little theory. 
There has recently been growing interest in the application of AI techniques in biomedical engineering and informatics, ranging from knowledge-based reasoning for disease classification to learning and discovering novel biomedical knowledge for disease treatment.
International Conference on Pattern Recognition | 2008
Zhi Qing Wu; Jianmin Jiang; Yonghong Peng
Many features have been proposed for the detection of microcalcification clusters (MCCs) or the classification of benign/malignant MCCs. However, most of them were designed based on the characteristics of MCCs. In this paper, 16 features, which have been commonly adopted in many applications, are examined and six new features based on the linear structure are proposed. To evaluate the effectiveness of these six features, 800 suspicious regions detected from 320 full-field mammograms are equally divided into two parts for training and testing respectively. Experiments demonstrate that the area under the receiver operating characteristic (ROC) curve increases from 0.86 to 0.89 after the new features are added to the set used for feature selection. In the best feature sequence selected by the sequential floating forward search (SFFS) algorithm, the newly proposed features make up half of the features in the sequence.
Evidence-based Complementary and Alternative Medicine | 2015
Yubing Li; Xuezhong Zhou; Runshun Zhang; Yinghui Wang; Yonghong Peng; Jingqing Hu; Qi Xie; Yanxing Xue; Lili Xu; Xiao-Fang Liu; Baoyan Liu
Background. Traditional Chinese medicine (TCM) is an individualized form of medicine based on observing the symptoms and signs (symptoms in brief) of patients. We aim to extract meaningful herb-symptom relationships from large-scale TCM clinical data. Methods. To investigate the correlations between patients' symptoms and the herbs prescribed for them, we use four clinical data sets collected from TCM outpatient clinical settings and calculate the similarities between patient pairs in terms of the herb constituents of their prescriptions and their manifesting symptoms using the cosine measure. To address the large-scale multiple testing problems in the detection of herb-symptom associations and the dependence between herbs with similar efficacies, we propose a network-based correlation analysis (NetCorrA) method to detect the herb-symptom associations. Results. The results show that there are strong positive correlations between symptom similarity and herb similarity, which indicates that herb-symptom correspondence is a clinical principle adhered to by most TCM physicians. Furthermore, the NetCorrA method obtains meaningful herb-symptom associations and performs better than the chi-square correlation method by filtering out false positive associations. Conclusions. Symptoms play a significant role in the prescription of herb treatments. The herb-symptom correspondence principle indicates that clinical phenotypic targets (i.e., symptoms) of herbs exist and would be valuable for further investigation.
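The pairwise similarity step in the Methods can be sketched as follows. This is an illustration under assumptions, not the NetCorrA method itself: each patient is reduced to a set of symptoms and a set of herbs (binary vectors), the placeholder identifiers and the toy records are invented, and Pearson correlation is used here only as one plausible way to relate the two similarity lists.

```python
from itertools import combinations
from math import sqrt

def cosine(a: set, b: set) -> float:
    """Cosine similarity of two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Invented toy records: patient id -> (symptom set, herb set).
patients = {
    "p1": ({"s1", "s2"}, {"hA", "hB"}),
    "p2": ({"s1", "s2"}, {"hA", "hB"}),
    "p3": ({"s3"}, {"hC"}),
    "p4": ({"s2", "s3"}, {"hB", "hC", "hD"}),
}

symptom_sims, herb_sims = [], []
for u, v in combinations(sorted(patients), 2):
    symptom_sims.append(cosine(patients[u][0], patients[v][0]))
    herb_sims.append(cosine(patients[u][1], patients[v][1]))

r = pearson(symptom_sims, herb_sims)
print(f"correlation between symptom and herb similarity: {r:.3f}")
```

A positive `r` on real data would correspond to the abstract's finding that patients with similar symptoms tend to receive similar herbs; the toy records are constructed to show that pattern and carry no clinical meaning.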
Frontiers of Medicine in China | 2014
Xuezhong Zhou; Yubing Li; Yonghong Peng; Jingqing Hu; Runshun Zhang; Liyun He; Yinghui Wang; Lijie Jiang; Shiyan Yan; Peng Li; Qi Xie; Baoyan Liu
Traditional Chinese medicine (TCM) investigates the clinical diagnosis and treatment regularities in a typical schema of personalized medicine, which means that individual patients with the same diseases would obtain distinct diagnoses and optimal treatments from different TCM physicians. This principle has been recognized and adhered to by TCM clinical practitioners for thousands of years. However, the underlying mechanisms of TCM personalized medicine have not been fully investigated so far and remain unknown. This paper discusses the framework of TCM personalized medicine in the classic literature and in real-world clinical settings, and investigates the underlying mechanisms of TCM personalized medicine from the perspective of network medicine. Based on 246 well-designed outpatient records on insomnia, by evaluating the personal biases of manifestation observation and preferences of herb prescriptions, we noted significant similarities among herb prescriptions and among the symptoms of encounters. To investigate the underlying mechanisms of TCM personalized medicine, we constructed a clinical phenotype network (CPN), in which clinical phenotype entities such as symptoms and diagnoses are represented as nodes and the correlations between these entities as links. This CPN is used to investigate the promiscuous boundary of syndromes and the co-occurrence of symptoms. Small-world topological characteristics with high clustering structures are noted in the CPN, which provide insight into the rationality of TCM personalized diagnosis and treatment. The investigation of this network would help us to understand the underlying mechanism of TCM personalized medicine and would offer a new perspective for the refinement of TCM individualized clinical skills.
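As a rough illustration of the network construction described here, the sketch below links phenotype entities that appear in the same record and computes the average clustering coefficient, one of the quantities usually examined when checking for small-world structure. Using simple co-occurrence counts as the link criterion, the `min_count` threshold and the toy records are assumptions; the abstract does not specify how correlations were measured.

```python
from collections import Counter
from itertools import combinations

def build_network(records, min_count: int = 1) -> dict:
    """Link two entities when they co-occur in at least min_count records."""
    counts = Counter()
    for entities in records:
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    adj: dict = {}
    for (a, b), c in counts.items():
        if c >= min_count:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj

def clustering(adj: dict, node) -> float:
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def average_clustering(adj: dict) -> float:
    """Mean of the local clustering coefficients over all nodes."""
    return sum(clustering(adj, n) for n in adj) / len(adj)

# Invented toy records: each record lists the phenotype entities of one encounter.
records = [["a", "b", "c"], ["c", "d"]]
cpn = build_network(records)
print(sorted(cpn), round(average_clustering(cpn), 3))
```

A full small-world check would also compare the average shortest path length and the clustering coefficient against those of a random graph of the same size, which is omitted here.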
BioMed Research International | 2014
Xing Li; Xuezhong Zhou; Yonghong Peng; Baoyan Liu; Runshun Zhang; Jingqing Hu; Jian Yu; Caiyan Jia; Changkai Sun
Background. Symptoms and signs (symptoms in brief) are the essential clinical manifestations for individualized diagnosis and treatment in traditional Chinese medicine (TCM). To gain insights into the molecular mechanism of symptoms, we develop a computational approach to identify the candidate genes of symptoms. Methods. This paper presents a network-based approach for the integrated analysis of multiple phenotype-genotype data sources and the prioritization of candidate genes for the associated symptoms. The method first calculates the similarities between symptoms and diseases based on the symptom-disease relationships retrieved from the PubMed bibliographic database. Then the disease-gene associations and protein-protein interactions are used to construct a phenotype-genotype network. Finally, the PRINCE algorithm is used to rank the potential genes for the associated symptoms. Results. The proposed method produces a reliable gene rank list with an AUC (area under the curve) of 0.616 in classification. Some novel genes such as CALCA, ESR1, and MTHFR were predicted to be associated with headache symptoms; these associations are not recorded in the benchmark data set but have been reported in recently published literature. Conclusions. Our study demonstrated that integrating phenotype-genotype relationships into a complex network framework provides an effective approach to identifying candidate genes of symptoms.
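The ranking step can be sketched as iterative score propagation over a weighted network, in the style commonly associated with PRINCE: scores are repeatedly smoothed over degree-normalized edges and mixed with the prior. The update rule as written here, the value of `alpha`, the fixed iteration count and the toy four-node network with placeholder names are all assumptions for illustration, not details taken from the paper.

```python
from math import sqrt

def propagate(adj: dict, prior: dict, alpha: float = 0.8, iters: int = 100) -> dict:
    """Iterate F <- alpha * W' F + (1 - alpha) * Y, where W' is the adjacency
    matrix with each edge weight divided by sqrt(deg(u) * deg(v))."""
    deg = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    score = {u: prior.get(u, 0.0) for u in adj}
    for _ in range(iters):
        score = {
            u: alpha * sum(w / sqrt(deg[u] * deg[v]) * score[v]
                           for v, w in adj[u].items())
               + (1 - alpha) * prior.get(u, 0.0)
            for u in adj
        }
    return score

# Invented toy network: a chain g1 - g2 - g3 - g4 with unit edge weights.
network = {
    "g1": {"g2": 1.0},
    "g2": {"g1": 1.0, "g3": 1.0},
    "g3": {"g2": 1.0, "g4": 1.0},
    "g4": {"g3": 1.0},
}
prior = {"g1": 1.0}  # prior knowledge: only g1 is associated with the query

scores = propagate(network, prior)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In the toy chain, nodes closer to the seed receive higher scores, which is the intuition behind ranking candidate genes by their network proximity to known associations.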
Knowledge Acquisition, Modeling and Management | 2004
Bin Yao; Jianmin Jiang; Yonghong Peng
In this paper, we propose a case-based reasoning (CBR) driven genetic algorithm to detect microcalcification clusters in digital mammograms for computer-aided breast cancer screening. Embedded inside the genetic algorithm (GA), the CBR acts as an “evaluator” and a “guide” in the proposed GA system. To form a base of cases, we adopted a competitive learning neural network to organize the MC feature vectors and construct the cases. Experiments are carried out on the DDSM database and the performance of the proposed algorithm is evaluated by the FROC curve, which shows that the CBR-driven genetic algorithm can achieve 98% accuracy at a low false detection rate. Even in dense mammograms, the system can still detect the MCs correctly.
Archive | 2014
Xuezhong Zhou; Baoyan Liu; Xiaoping Zhang; Qi Xie; Runshun Zhang; Yinghui Wang; Yonghong Peng
The real-world clinical setting is the major arena of traditional Chinese medicine (TCM), which has accumulated long-term practical clinical experience and developed established theoretical knowledge and clinical solutions suitable for personalized treatment. Clinical phenotypes, which are diverse and change dynamically in real-world clinical settings, have been the most important features captured by TCM for diagnosis and treatment. Together with clinical prescriptions containing multiple herbal ingredients, TCM clinical activities generate a large amount of valuable high-dimensional data for knowledge distillation and hypothesis generation. In China, with the curation of large-scale real-world clinical data from regular clinical activities, transforming the data into clinically insightful knowledge has increasingly become a hot topic in the TCM field. This chapter introduces the application of data warehouse techniques and data mining approaches for utilizing real-world TCM clinical data, which come mainly from electronic medical records. The main framework of clinical data mining applications in the TCM field is also introduced, with an emphasis on related work in this field. The key points and issues for improving research quality are discussed and future directions are proposed.