Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where David Tian is active.

Publication


Featured research published by David Tian.


International Journal of Approximate Reasoning | 2011

Core-generating approximate minimum entropy discretization for rough set feature selection in pattern classification

David Tian; Xiao-Jun Zeng; John A. Keane

Rough set feature selection (RSFS) can be used to improve classifier performance. RSFS removes redundant attributes whilst retaining important ones that preserve the classification power of the original dataset. Reducts are the feature subsets selected by RSFS. The core is the intersection of all the reducts of a dataset. RSFS can only handle discrete attributes; hence, continuous attributes need to be discretized before being input to RSFS. Discretization determines the core size of a discrete dataset. However, current discretization methods do not consider the core size during discretization. Earlier work proposed the core-generating approximate minimum entropy discretization (C-GAME) algorithm, which selects the maximum number of minimum entropy cuts capable of generating a non-empty core within a discrete dataset. The contributions of this paper are as follows: (1) the C-GAME algorithm is improved by adding a new type of constraint to eliminate the possibility that only a single reduct is present in a C-GAME-discretized dataset; (2) performance evaluation of C-GAME in comparison to C4.5, multi-layer perceptron, RBF network and k-nearest neighbours classifiers on ten datasets chosen from the UCI Machine Learning Repository; (3) performance evaluation of C-GAME in comparison to Recursive Minimum Entropy Partition (RMEP), Chimerge, Boolean Reasoning and Equal Frequency discretization algorithms on the ten datasets; (4) evaluation of the effects of C-GAME and the other four discretization methods on the sizes of reducts; (5) definition of an upper bound on the total number of reducts within a dataset; (6) analysis of the effects of different discretization algorithms on the total number of reducts; (7) performance analysis of two RSFS algorithms (a genetic algorithm and Johnson's algorithm).
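To illustrate the minimum-entropy cuts that C-GAME builds on, the sketch below scores candidate cut points on one continuous attribute by the weighted class entropy of the binary split they induce. This is a generic minimum-entropy discretization step on toy data, not the authors' C-GAME solver, which additionally constrains cut selection so that the discretized dataset has a non-empty core.

```python
# A minimal sketch of minimum-entropy cut scoring (toy data; not the paper's
# C-GAME solver): pick the cut whose split has the lowest weighted entropy.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(values, labels, cut):
    left = [l for v, l in zip(values, labels) if v <= cut]
    right = [l for v, l in zip(values, labels) if v > cut]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)

values = [1.2, 1.9, 2.5, 3.1, 4.0, 4.8]   # one continuous attribute
labels = [0, 0, 0, 1, 1, 1]               # class decisions
cuts = [(a + b) / 2 for a, b in zip(values, values[1:])]
print(min(cuts, key=lambda c: split_entropy(values, labels, c)))  # -> 2.8
```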


Systems, Man and Cybernetics | 2013

An Ontology for Clinical Trial Data Integration

Ratnesh Sahay; Dimitrios Ntalaperas; Eleni Kamateri; Panagiotis Hasapis; Oya Deniz Beyan; Marie-Pierre F. Strippoli; Christiana A. Demetriou; Thomai Gklarou-Stavropoulou; Matthias Brochhausen; Konstantinos A. Tarabanis; Thanassis Bouras; David Tian; Aristos Aristodimou; Athos Antoniades; Christos Georgousopoulos; Manfred Hauswirth; Stefan Decker

A set of well-integrated clinical terminologies is at the core of delivering an efficient clinical trial system. The design and outcomes of a clinical trial can be improved significantly through an unambiguous and consistent set of clinical terminologies used in a participating clinical institute. However, due to a lack of generalised legal and technical standards, heterogeneity exists between prominent clinical terminologies, as well as within and between clinical systems, at several levels, e.g., data, schema, and medical codes. This article specifically addresses the problem of integrating local or proprietary clinical terminologies with globally defined universal concepts or terminologies. To deal with the problem of ambiguous, inconsistent, and overlapping clinical terminologies, domain and knowledge representation specialists have repeatedly advocated the use of formal ontologies. We address two key challenges in developing an ontology-based clinical terminology: (1) an ontology building methodology for clinical terminologies that are separated into global and local layers, and (2) aligning global and local clinical terminologies. We present the Semantic Electronic Health Record (SEHR) ontology, which covers multiple sub-domains of Healthcare and Life Sciences (HCLS) through specialisation of the upper-level Basic Formal Ontology (BFO). One of the main features of SEHR is the layering and adaptation of local clinical terminologies with the upper-level BFO. Our empirical evaluation shows agreement among clinical experts, confirming SEHR's usability in clinical trials.
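The layering idea can be pictured with a toy RDF sketch: a local clinical term is declared as a specialisation of a globally defined concept, and two local codes for the same notion are aligned by equivalence. The namespaces and terms below are hypothetical; this is not the SEHR ontology itself.

```python
# A toy sketch of global/local terminology layering and alignment
# (hypothetical namespaces and terms; not the actual SEHR ontology).
from rdflib import Graph, Namespace, OWL, RDFS

GLOBAL = Namespace("http://example.org/sehr/global#")      # global layer
LOCAL = Namespace("http://example.org/hospitalA/terms#")   # local layer

g = Graph()
g.bind("g", GLOBAL)
g.bind("l", LOCAL)

# A local term specialises a globally defined concept (layering).
g.add((LOCAL.SystolicBP_mmHg, RDFS.subClassOf, GLOBAL.BloodPressureMeasurement))
# Two local codes denoting the same notion are aligned by equivalence.
g.add((LOCAL.SystolicBP_mmHg, OWL.equivalentClass, LOCAL.SBP))

print(g.serialize(format="turtle"))
```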


Transactions on Rough Sets | 2011

Core-generating discretization for rough set feature selection

David Tian; Xiao-Jun Zeng; John A. Keane

Rough set feature selection (RSFS) can be used to improve classifier performance. RSFS removes redundant attributes whilst keeping important ones that preserve the classification power of the original dataset. The feature subsets selected by RSFS are called reducts. The intersection of all reducts is called the core. However, RSFS handles discrete attributes only, so datasets containing real-valued attributes are discretized before RSFS is applied. Discretization controls the core of the discrete dataset. Moreover, the core may critically affect the classification performance of reducts. This paper defines core-generating discretization, a type of discretization method; analyzes the properties of core-generating discretization; models core-generating discretization using constraint satisfaction; defines core-generating approximate minimum entropy (C-GAME) discretization; models C-GAME using constraint satisfaction; and evaluates the performance of C-GAME as a pre-processor for RSFS using ten datasets from the UCI Machine Learning Repository.
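The core-generating property itself can be checked directly: discretize a real-valued table with a chosen set of cuts and test whether the resulting discrete table has a non-empty core. The sketch below does this by brute force over object pairs, using the standard rough-set fact that an attribute is in the core when some pair of objects with different decisions differs on that attribute alone. The data and cuts are hypothetical; this is not the paper's constraint-satisfaction model.

```python
# A minimal sketch of the core-generating test (hypothetical data and cuts;
# not the paper's constraint model).
import bisect

def discretize(table, cuts_per_attr):
    # Map each real value to the index of the interval its cuts define.
    return [tuple(bisect.bisect(sorted(cuts), v)
                  for v, cuts in zip(row, cuts_per_attr)) for row in table]

def has_nonempty_core(rows, decisions):
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if decisions[i] != decisions[j]:
                diff = [a for a in range(len(rows[i])) if rows[i][a] != rows[j][a]]
                if len(diff) == 1:        # a core-determining pair of objects
                    return True
    return False

table = [(1.0, 5.2), (1.4, 3.0), (2.9, 5.1), (3.2, 2.8)]
decisions = [0, 1, 0, 1]
print(has_nonempty_core(discretize(table, [(2.0,), (4.0,)]), decisions))
```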


IEEE International Conference on Fuzzy Systems | 2007

Core-generating Approximate Minimum Entropy Discretization for Rough Set Feature Selection: An Experimental Investigation

David Tian; John A. Keane; Xiao-Jun Zeng

Rough set feature selection (RSFS) can be used to improve classifier performance. RSFS removes redundant attributes whilst keeping important ones that preserve the classification power of the original dataset. The feature subsets selected by RSFS are termed reducts. The intersection of all reducts is termed the core. As RSFS works on discrete attributes only, for real-valued datasets the real attributes are discretized before RSFS. The core size of the discretized dataset is determined by the discretization process. Previous work has shown that the core size of the discretized dataset critically affects the performance of RSFS. This paper proposes a type of discretization termed core-generating approximate minimum entropy discretization (C-GAME), which selects a set of minimum entropy cuts capable of generating discrete data with non-empty cores. The paper defines C-GAME and then models it as a constraint satisfaction optimization problem, which is solved using the branch and bound algorithm. Experiments were performed on two datasets from the UCI repository to investigate the performance of C-GAME as a pre-processing step for RSFS. Results show that, for these datasets, C-GAME outperforms both the recursive minimal entropy partition (RMEP) discretization method and the original decision trees without feature selection.
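A generic picture of the branch-and-bound search is sketched below: it looks for the largest subset of candidate cuts that passes a feasibility test, pruning branches that cannot beat the incumbent. The candidate cuts and the stand-in feasibility test are hypothetical; the paper's actual solver works on the full constraint satisfaction optimization model.

```python
# A generic branch-and-bound sketch (hypothetical cuts and feasibility test;
# not the paper's solver): find the largest feasible subset of candidate cuts.
def branch_and_bound(cuts, feasible, chosen=(), best=()):
    # Bound: even selecting every remaining cut cannot beat the incumbent.
    if len(chosen) + len(cuts) <= len(best):
        return best
    if len(chosen) > len(best) and feasible(chosen):
        best = chosen
    for k in range(len(cuts)):                # branch: include cuts[k] next
        best = branch_and_bound(cuts[k + 1:], feasible, chosen + (cuts[k],), best)
    return best

# Stand-in feasibility test, e.g. "the discretized table has a non-empty core".
print(branch_and_bound((2.8, 3.6, 4.4), lambda s: 2.8 in s))
```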


International Conference on Conceptual Structures | 2013

Txt2vz: A New Tool for Generating Graph Clouds

Laurie Hirsch; David Tian

We present txt2vz (txt2vz.appspot.com), a new tool for automatically generating a visual summary of unstructured text data found in documents or web sites. The main purpose of the tool is to give the user information about the text so that they can quickly get a good idea about the topics covered. Txt2vz is able to identify important concepts from unstructured text data and to reveal relationships between those concepts. We discuss other approaches to generating diagrams from text and highlight the differences between tag clouds, word clouds, tree clouds and graph clouds.
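A rough sketch of the underlying idea, not the txt2vz implementation: treat frequent terms as the important concepts and link terms that co-occur in the same sentence; the resulting weighted graph is what a graph cloud lays out visually. The sample text below is arbitrary.

```python
# A minimal co-occurrence graph sketch (illustrative; not txt2vz itself).
import itertools, re
from collections import Counter

text = ("Rough set feature selection removes redundant attributes. "
        "Feature selection improves classifier performance. "
        "Rough set reducts preserve classification power.")

sentences = [re.findall(r"[a-z]+", s.lower()) for s in text.split(".") if s.strip()]
freq = Counter(itertools.chain.from_iterable(sentences))
terms = {w for w, c in freq.items() if c >= 2}           # important concepts

edges = Counter()                                        # concept relationships
for s in sentences:
    for a, b in itertools.combinations(sorted(set(s) & terms), 2):
        edges[(a, b)] += 1

print(terms)
print(edges.most_common(5))
```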


Granular Computing | 2006

Evaluating the effect of rough set feature selection on the performance of decision trees

David Tian; John A. Keane; Xiao-Jun Zeng

Feature selection is a pre-processing step for the training of classifiers in order to improve their performance. Rough Set Feature Selection (RSFS) is a novel feature selection approach. RSFS removes the redundant attributes only, while keeping all the important ones that preserve the classification power of the original dataset. The feature subsets selected by RSFS are called reducts. The intersection of all reducts is called the core. This paper investigates the effect of RSFS on the performance of decision trees in terms of classification accuracy and number of tree nodes. Nine datasets from different domains are used. For all datasets, there exists at least one reduct that improves the performance of decision trees, and the minimal reduct is not the best-quality reduct for improving decision tree performance. The effect of RSFS on the performance of the decision trees is shown to be related to the ratio of core size to dataset dimensionality. The core size is shown to be determined by the presence of pairs of core-determining objects within the dataset.
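The kind of comparison reported above can be sketched as follows: train a decision tree on all attributes and on a reduct, then compare test accuracy and node counts. The dataset and the reduct here are stand-ins, not those of the paper.

```python
# An illustrative sketch (stand-in dataset and reduct; not the paper's
# experiment): decision tree on all attributes versus on a reduct.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
reduct = [2, 3]                      # stand-in reduct: petal length and width

for name, cols in [("all attributes", slice(None)), ("reduct", reduct)]:
    clf = DecisionTreeClassifier(random_state=0).fit(X_tr[:, cols], y_tr)
    print(name, clf.score(X_te[:, cols], y_te), clf.tree_.node_count)
```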


Systems, Man and Cybernetics | 2013

A Bayesian Association Rule Mining Algorithm

David Tian; Ann Gledson; Athos Antoniades; Aristo Aristodimou; Ntalaperas Dimitrios; Ratnesh Sahay; Jianxin Pan; Stavros Stivaros; Goran Nenadic; Xiao-Jun Zeng; John A. Keane

This paper proposes a Bayesian association rule mining algorithm (BAR) which combines the Apriori association rule mining algorithm with Bayesian networks. Two interestingness measures of association rules, Bayesian confidence (BC) and Bayesian lift (BL), which measure conditional dependence and independence relationships between items, are defined based on the joint probabilities represented by the Bayesian networks of the association rules. BAR outputs the best rules according to BC and BL. BAR is evaluated using two anonymized clinical phenotype datasets from the UCI Repository: Thyroid Disease and Diabetes. The results show that BAR is capable of finding the best rules, which have the highest BC and BL together with very high support, confidence and lift.
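The two measures can be pictured with a toy sketch. The paper derives the joint distribution from a Bayesian network fitted to the rules' items; below, an empirical joint over a handful of hypothetical transactions stands in for that network, with BC read as P(consequent | antecedent) and BL as BC / P(consequent), by analogy with ordinary confidence and lift.

```python
# A toy sketch (hypothetical transactions; the paper computes these measures
# from a fitted Bayesian network's joint distribution, not raw counts).
transactions = [
    {"thyroid_abnormal", "fatigue"}, {"thyroid_abnormal", "fatigue"},
    {"fatigue"}, {"thyroid_abnormal"}, set(), {"thyroid_abnormal", "fatigue"},
]

def p(itemset):  # probability that every item in `itemset` is present
    return sum(itemset <= t for t in transactions) / len(transactions)

def bayesian_confidence(antecedent, consequent):
    return p(antecedent | consequent) / p(antecedent)

def bayesian_lift(antecedent, consequent):
    return bayesian_confidence(antecedent, consequent) / p(consequent)

rule = ({"thyroid_abnormal"}, {"fatigue"})
print(bayesian_confidence(*rule), bayesian_lift(*rule))  # 0.75 1.125
```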


Neurocomputing | 2018

A constraint-based genetic algorithm for optimizing neural network architectures for detection of loss of coolant accidents of nuclear power plants

David Tian; Jiamei Deng; Gopika Vinod; T. V. Santhosh; Hissam Tawfik

The loss of coolant accident (LOCA) of a nuclear power plant (NPP) is a severe accident in the nuclear energy industry. Neural networks have been trained on nuclear simulation transient datasets to detect LOCA. This paper proposes a constraint-based genetic algorithm (GA) to find optimised 2-hidden-layer network architectures for detecting LOCA of an NPP. The GA uses a proposed constraint satisfaction algorithm, called the random walk heuristic, to create an initial population of high-performance neural network architectures. At each generation, the GA population is split into a sub-population of feature subsets and a sub-population of 2-hidden-layer architectures, and offspring are bred from each sub-population independently in order to generate a wide variety of network architectures. During the breeding of 2-hidden-layer architectures, a constraint-based nearest neighbour search algorithm is proposed to find the nearest neighbours of the offspring population generated by mutation. The results showed that, for LOCA detection, the GA-optimised network outperformed a random search, an exhaustive search and an RBF kernel support vector regression (SVR) in terms of generalization performance. For the SkillCraft dataset of the UCI Machine Learning Repository, the GA-optimised network has a similar performance to the RBF kernel SVR and outperformed the other approaches.
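An outline of the outer loop is sketched below under heavy simplification: individuals are pairs of hidden-layer sizes, the constraint is a stand-in weight budget, and the fitness is a dummy in place of validation error. The paper's random walk heuristic, split sub-populations and constraint-based nearest neighbour search are not reproduced here.

```python
# A heavily simplified GA sketch (hypothetical sizes, constraint and fitness;
# not the paper's algorithm). Individuals are (h1, h2) hidden-layer sizes.
import random
random.seed(0)

N_IN, N_OUT, MAX_WEIGHTS = 37, 1, 2000           # stand-in problem sizes

def n_weights(h1, h2):                            # weights of a 2-hidden-layer MLP
    return N_IN * h1 + h1 * h2 + h2 * N_OUT

def satisfies(ind):                               # the architecture constraint
    return n_weights(*ind) <= MAX_WEIGHTS

def fitness(ind):                                 # dummy stand-in for validation score
    return -abs(ind[0] - 20) - abs(ind[1] - 10)

pop = []
while len(pop) < 20:                              # feasible initial population
    ind = (random.randint(1, 40), random.randint(1, 40))
    if satisfies(ind):
        pop.append(ind)

for _ in range(50):                               # generations
    parents = sorted(pop, key=fitness, reverse=True)[:10]
    children = []
    while len(children) < 10:
        (a1, _), (_, b2) = random.sample(parents, 2)
        child = (a1, b2)                          # one-point crossover
        if random.random() < 0.3:                 # mutate the first layer size
            child = (max(1, child[0] + random.choice((-2, 2))), child[1])
        if satisfies(child):                      # keep only feasible offspring
            children.append(child)
    pop = parents + children

print(max(pop, key=fitness))                      # best feasible architecture found
```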


Applications of Big Data Analytics | 2018

A Neural Networks Design Methodology for Detecting Loss of Coolant Accidents in Nuclear Power Plants

David Tian; Jiamei Deng; Gopika Vinod; T. V. Santhosh; Hissam Tawfik

Artificial intelligence methods have been successfully applied to monitor the safety of nuclear power plants (NPPs). One major safety issue of an NPP is the loss of coolant accident (LOCA), which is caused by the occurrence of a large break in the inlet headers (IH) of a nuclear reactor. Neural networks can be trained on transient datasets of an NPP to detect LOCA. However, the transient datasets exhibit big data characteristics, and designing an optimised neural network by exhaustively training all possible neural network architectures on big data can be very time-consuming because the number of possible architectures is large. This work proposes a neural network (NN) design methodology in three stages to detect the break sizes of the IHs of an NPP. In stage one, an optimised 1-hidden-layer multilayer perceptron (MLP) is obtained by training and testing a number of 1-hidden-layer MLP architectures which are determined empirically. In stage two, a number of 2-hidden-layer MLP architectures are determined based on the number of weights of the optimised 1-hidden-layer MLP; then, an optimised 2-hidden-layer MLP is obtained by training and testing these architectures. In stage three, break sizes not present in the transient dataset are generated using a linear interpolation method; then, the optimised 2-hidden-layer MLP is trained and tested iteratively 100 times on the transient dataset augmented with the linear interpolation dataset. The results show that the proposed methodology outperformed the MLP of the previous work. The proposed methodology is also faster than exhaustively training all 2-hidden-layer architectures, while its optimised 2-hidden-layer MLP achieves similar performance to exhaustive training. We consider this work an engineering application of predictive data analytics in which neural networks are used as the primary tool.
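The stage-three augmentation is essentially one call to a linear interpolation routine. A minimal sketch with hypothetical break sizes and sensor readings:

```python
# A minimal sketch of stage three (hypothetical break sizes and readings):
# synthesise training data for break sizes absent from the transient dataset
# by linear interpolation between neighbouring recorded break sizes.
import numpy as np

recorded_breaks = np.array([20.0, 40.0, 60.0, 100.0])   # % break sizes in the data
sensor_at_break = np.array([0.8, 1.9, 3.1, 5.2])        # a transient feature value

new_breaks = np.array([30.0, 50.0, 80.0])               # sizes to synthesise
synthetic = np.interp(new_breaks, recorded_breaks, sensor_at_break)
print(dict(zip(new_breaks.tolist(), synthetic.tolist())))
```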


Health and Technology | 2017

Advancing clinical research by semantically interconnecting aggregated medical data information in a secure context

Athos Antoniades; Aristos Aristodimou; Christos Georgousopoulos; Nikolaus Forgó; Ann Gledson; Panagiotis Hasapis; Caroline L. Vandeleur; Konstantinos Perakis; Ratnesh Sahay; Muntazir Mehdi; Christiana A. Demetriou; Marie-Pierre F. Strippoli; Vasiliki Giotaki; Myrto Ioannidi; David Tian; Federica Tozzi; John A. Keane; Constantinos S. Pattichis

Collaboration


Dive into David Tian's collaboration.

Top Co-Authors

John A. Keane (University of Manchester)
Xiao-Jun Zeng (University of Manchester)
Ann Gledson (University of Manchester)
Ratnesh Sahay (National University of Ireland)
Goran Nenadic (University of Manchester)
Hissam Tawfik (Leeds Beckett University)
Jiamei Deng (Leeds Beckett University)