Thomas Brendan Murphy
University College Dublin
Publications
Featured research published by Thomas Brendan Murphy.
Statistics and Computing | 2008
Paul D. McNicholas; Thomas Brendan Murphy
Parsimonious Gaussian mixture models are developed using a latent Gaussian model that is closely related to the factor analysis model. These models provide a unified modeling framework that includes the mixtures of probabilistic principal component analyzers and mixtures of factor analyzers models as special cases. In particular, a class of eight parsimonious Gaussian mixture models based on the mixtures of factor analyzers model is introduced, and the maximum likelihood estimates of the parameters in these models are found using an AECM algorithm. The class includes parsimonious models that have not previously been developed. These models are applied to the analysis of the chemical and physical properties of Italian wines and the chemical properties of coffee, where they are shown to give excellent clustering performance.
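The parsimony comes from constraining a factor-analytic covariance. The sketch below counts the free covariance parameters in eight such models, assuming each component covariance has the form Sigma_g = Lambda_g Lambda_g^T + Psi_g with p variables, q latent factors and G components. The three-letter labels (C = constrained, U = unconstrained) and the function names are illustrative assumptions, not taken from the paper's software.

```python
from itertools import product


def loading_params(p: int, q: int) -> int:
    # A p x q loading matrix has p*q entries, but it is only identified up to
    # an orthogonal rotation, which removes q*(q-1)/2 degrees of freedom.
    return p * q - q * (q - 1) // 2


def covariance_params(label: str, p: int, q: int, G: int) -> int:
    """Free covariance parameters for a hypothetical three-letter label.

    Letter 1: loadings Lambda shared across groups (C) or not (U).
    Letter 2: noise Psi shared across groups (C) or not (U).
    Letter 3: Psi isotropic, psi * I (C), or a general diagonal (U).
    """
    lam_shared, psi_shared, psi_isotropic = (c == "C" for c in label)
    n_lambda = loading_params(p, q) * (1 if lam_shared else G)
    per_group_psi = 1 if psi_isotropic else p
    n_psi = per_group_psi * (1 if psi_shared else G)
    return n_lambda + n_psi


if __name__ == "__main__":
    p, q, G = 10, 2, 3
    for label in ("".join(t) for t in product("CU", repeat=3)):
        print(label, covariance_params(label, p, q, G))
```

With p = 10, q = 2 and G = 3, even the least constrained model (UUU) needs 87 covariance parameters, compared with 165 for three unrestricted 10 x 10 covariance matrices.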
Journal of Proteome Research | 2011
James N. Arnold; Radka Saldova; Marie Galligan; Thomas Brendan Murphy; Yuka Mimura-Kimura; Jayne E. Telford; Andrew K. Godwin; Pauline M. Rudd
Lung cancer has a poor prognosis and a 5-year survival rate of 15%. Therefore, early detection is vital. Diagnostic testing of serum for cancer-associated biomarkers is a noninvasive detection method. Glycosylation is the most frequent post-translational modification of proteins, and it has been shown to be altered in cancer. In this paper, high-throughput HILIC technology was applied to serum samples from 100 lung cancer patients and 84 age-matched controls, and significant alterations in N-linked glycosylation were identified. Increases were detected in glycans containing Sialyl Lewis X, in monoantennary glycans and in highly sialylated glycans, while decreases were observed in core-fucosylated biantennary glycans, some of which were detectable as early as Stage I. The N-linked glycan profile of haptoglobin showed alterations similar to those found in the total serum glycome. The most significantly altered HILIC peak in lung cancer samples includes predominantly disialylated and tri- and tetra-antennary glycans. This potential disease marker is significantly increased across all disease groups compared with controls, and a strong disease effect remains visible even after the effect of smoking is accounted for. The combination of all glyco-biomarkers had the highest sensitivity and specificity. This study identifies candidates for further study as potential biomarkers for the disease.
Bioinformatics | 2010
Paul D. McNicholas; Thomas Brendan Murphy
MOTIVATION In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation-maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets. RESULTS The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data. AVAILABILITY The reduced, preprocessed data that were analysed are available at www.paulmcnicholas.info
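The adjusted Rand index compares an estimated clustering with known class labels, correcting the raw pair-agreement count for agreement expected by chance. A minimal standard-library sketch of the usual contingency-table formula follows; the function name is our own choice.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b) -> float:
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have equal length")
    total = comb(len(labels_a), 2)
    if total == 0:
        return 1.0  # fewer than two items: nothing to disagree on
    # Contingency cell counts n_ij, row sums a_i and column sums b_j.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total      # chance-expected agreeing pairs
    max_index = (sum_a + sum_b) / 2       # upper bound on agreeing pairs
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_ij - expected) / (max_index - expected)
```

Cluster labels are arbitrary, so `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` scores perfect agreement (1.0). An assignment unrelated to the truth scores near zero or below.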
Computational Statistics & Data Analysis | 2010
Paul D. McNicholas; Thomas Brendan Murphy; Aaron F. McDaid; Dermot Frost
Model-based clustering using a family of Gaussian mixture models with a parsimonious, factor analysis-like covariance structure is described, and an efficient algorithm for its implementation is presented. This algorithm uses the alternating expectation-conditional maximization (AECM) variant of the expectation-maximization (EM) algorithm. Two central issues in implementing this family of models, namely model selection and convergence criteria, are discussed. These issues also have implications for other model-based clustering techniques and, more generally, for the implementation of techniques like the EM algorithm. The Bayesian information criterion (BIC) is used for model selection, and Aitken's acceleration, which is shown to outperform the lack of progress criterion, is used to determine convergence. A brief introduction to parallel computing is then given before the algorithm is parallelized within the master-slave paradigm. A simulation study is then carried out to confirm the effectiveness of this parallelization. The resulting software is applied to two data sets to demonstrate its effectiveness compared with existing software.
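Aitken's acceleration estimates the limit the log-likelihood is heading towards and stops once the current value is close to it. The sketch below uses one common form of the rule: a = (l[k+1] - l[k]) / (l[k] - l[k-1]), l_inf = l[k] + (l[k+1] - l[k]) / (1 - a), then stop when |l_inf - l[k]| < tol. The tolerance default and the function name are assumptions, not the paper's settings.

```python
from typing import Sequence


def aitken_converged(loglik: Sequence[float], tol: float = 1e-2) -> bool:
    """Aitken-based stopping rule for an EM/AECM log-likelihood trace.

    loglik holds the log-likelihood after each iteration; at least three
    values are needed before the rule can be evaluated.
    """
    if len(loglik) < 3:
        return False
    l_prev, l_curr, l_next = loglik[-3], loglik[-2], loglik[-1]
    denom = l_curr - l_prev
    if denom == 0:
        return True  # the likelihood has stalled completely
    a = (l_next - l_curr) / denom  # Aitken acceleration estimate
    if a >= 1:
        return False  # increments are not shrinking yet; no usable limit
    l_inf = l_curr + (l_next - l_curr) / (1 - a)  # asymptotic log-likelihood
    return abs(l_inf - l_curr) < tol
```

The contrast with the lack of progress criterion (stop when l[k+1] - l[k] < tol) matters when EM converges slowly. With a = 0.99, each increment is tiny even though the remaining gap is roughly 100 times larger. Aitken's estimate accounts for that remaining distance.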
Computational Statistics & Data Analysis | 2003
Thomas Brendan Murphy; Donal Martin
Ranking data arises when judges are asked to rank some or all of a group of objects. Examples of ranking data arise in many areas, including the Irish electoral system and the Irish college admission system. Mixture models can be used to study heterogeneous populations. The study of these populations is achieved by thinking of the population as being composed of a finite number of homogeneous sub-populations. Mixtures of distance-based models are used to analyze ranking data from heterogeneous populations. Results from simulations are included, as well as an application to the well-known American Psychological Association election data set.
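A distance-based ranking model assigns higher probability to rankings closer to a modal (central) ranking, P(pi) proportional to exp(-theta * d(pi, centre)). A mixture then gives each sub-population its own centre and spread. The sketch below uses Kendall's distance (the number of discordant pairs) because its normalising constant has a closed form. Which distances the paper actually uses is not stated in this abstract, so this choice and all function names are assumptions.

```python
import math
from itertools import combinations
from typing import Sequence


def kendall_distance(r1: Sequence[int], r2: Sequence[int]) -> int:
    """Count item pairs ordered differently by two full rankings.

    Each ranking lists item labels from most to least preferred.
    """
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def log_normaliser(n: int, theta: float) -> float:
    """log of sum_pi exp(-theta * d_K(pi, centre)) over all n! rankings."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    # Closed form: prod_{j=1}^{n} (1 - e^{-j theta}) / (1 - e^{-theta}).
    return sum(
        math.log((1 - math.exp(-j * theta)) / (1 - math.exp(-theta)))
        for j in range(1, n + 1)
    )


def mixture_pmf(ranking, centres, thetas, weights) -> float:
    """Probability of a full ranking under a mixture of distance-based models."""
    n = len(ranking)
    return sum(
        w * math.exp(-theta * kendall_distance(ranking, centre)
                     - log_normaliser(n, theta))
        for centre, theta, w in zip(centres, thetas, weights)
    )
```

Summing `mixture_pmf` over all 24 orderings of four items returns 1 (up to rounding), which is a quick check that the closed-form normaliser is right.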
Stroke | 2011
Rose Galvin; Tara Cusack; Eleanor O'Grady; Thomas Brendan Murphy; Emma Stokes
Background and Purpose— Additional exercise therapy has been shown to have a positive impact on function after acute stroke and research is now focusing on methods to increase the amount of therapy that is delivered. This randomized controlled trial examined the impact of additional family-mediated exercise (FAME) therapy on outcome after acute stroke. Methods— Forty participants with acute stroke were randomly assigned to either a control group who received routine therapy with no formal input from their family members or a FAME group, who received routine therapy and additional lower limb FAME therapy for 8 weeks. The primary outcome measure used was the lower limb section of the Fugl-Meyer Assessment modified by Lindmark. Other measures of impairment, activity, and participation were completed at baseline, postintervention, and at a 3-month follow-up. Results— Statistically significant differences in favor of the FAME group were noted on all measures of impairment and activity postintervention (P<0.05). These improvements persisted at the 3-month follow-up but only walking was statistically significant (P<0.05). Participants in the FAME group were also significantly more integrated into their community at follow-up (P<0.05). Family members in the FAME group reported a significant decrease in their levels of caregiver strain at the follow-up when compared with those in the control group (P<0.01). Conclusions— This evidence-based FAME intervention can serve to optimize patient recovery and family involvement after acute stroke at the same time as being mindful of available resources.
Journal of Proteome Research | 2011
Yue Fan; Thomas Brendan Murphy; Jennifer C. Byrne; Lorraine Brennan; John M. Fitzpatrick; R. W. G. Watson
In recent years, Prostate Specific Antigen (PSA) testing has become widespread and has been associated with decreased mortality rates; however, this testing has raised concerns about overdiagnosis and overtreatment. It is clear that additional biomarkers are required. To identify them, we carried out proteomic and metabolomic expression profiling of serum samples from patients with benign prostatic hyperplasia (BPH) and with Gleason score 5 and Gleason score 7 disease, using two-dimensional difference in gel electrophoresis (2D-DIGE) and nuclear magnetic resonance spectroscopy (NMR). Panels of serum protein biomarkers were identified by applying Random Forests to the 2D-DIGE data. Evaluation of the selected biomarker panels showed that they can provide higher prediction accuracy than the current diagnostic standard. With careful validation, these serum biomarker panels may help to reduce unnecessary invasive diagnostic procedures and more accurately direct the urologist to curative surgery.
Statistical Analysis and Data Mining | 2012
Michael Salter-Townshend; Arthur White; Isabella Gollini; Thomas Brendan Murphy
The analysis of network data is a rapidly growing area, both within and outside the discipline of statistics. This review provides a concise summary of methods and models used in the statistical analysis of network data, including the Erdős–Rényi model, the exponential family class of network models, and recently developed latent variable models. Many of the methods and models are illustrated by application to the well-known Zachary karate club dataset. Software routines available for implementing the methods are emphasized throughout. The aim of this paper is to provide a review with enough detail about many common classes of network models to whet the appetite and to point the way to further reading.
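The Erdős–Rényi model is the simplest of these: every pair of nodes is joined independently with the same probability p. The maximum likelihood estimate of p is therefore the observed edge count divided by the number of possible edges. A minimal standard-library sketch for undirected graphs follows; the function names are our own.

```python
import random
from itertools import combinations


def simulate_er(n: int, p: float, seed: int = 0) -> set[tuple[int, int]]:
    """Draw an undirected Erdős–Rényi graph G(n, p) as a set of edges (i, j), i < j."""
    rng = random.Random(seed)
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}


def er_mle(n: int, edges) -> float:
    """MLE of p: each of the n(n-1)/2 dyads is an independent Bernoulli(p) trial."""
    return len(edges) / (n * (n - 1) / 2)
```

For the karate club network (34 members, 78 ties), this gives p-hat = 78 / 561 ≈ 0.139. Such a single density parameter cannot capture the community structure that latent variable models are designed to reveal.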
Journal of the American Statistical Association | 2008
Isobel Claire Gormley; Thomas Brendan Murphy
Irish elections use a voting system called proportional representation by means of a single transferable vote (PR–STV). Under this system, voters express their vote by ranking some (or all) of the candidates in order of preference. Which candidates are elected is determined through a series of counts where candidates are eliminated and surplus votes are distributed. The electorate in any election forms a heterogeneous population; that is, voters with different political and ideological persuasions would be expected to have different preferences for the candidates. The purpose of this article is to establish the presence of voting blocs in the Irish electorate, to characterize these blocs, and to estimate their size. A mixture modeling approach is used to explore the heterogeneity of the Irish electorate and to establish the existence of clearly defined voting blocs. The voting blocs are characterized by their voting preferences, which are described using a ranking data model. In addition, the care with which voters choose lower tier preferences is estimated in the model. The methodology is used to explore data from two Irish elections. Data from eight opinion polls taken during the six weeks prior to the 1997 Irish presidential election are analyzed. These data reveal the evolution of the structure of the electorate during the election campaign. In addition, data that record the votes from the Dublin West constituency of the 2002 Irish general election are analyzed to reveal distinct voting blocs within the electorate; these blocs are characterized by party politics, candidate profile, and political ideology.
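In a mixture model of this kind, each voting bloc has its own preference parameters, and a voter's bloc membership is inferred from their ballot. The sketch below shows that E-step calculation, with one simplification. It scores ballots with a plain Plackett–Luce model, which handles partial rankings naturally but omits the paper's term for how carefully voters choose lower-tier preferences. The support values, weights and function names are illustrative assumptions.

```python
import math
from typing import Sequence


def plackett_luce_log_prob(ranking: Sequence[int], support: Sequence[float]) -> float:
    """Log-probability of a (possibly partial) ballot under Plackett-Luce.

    ranking lists candidate indices in preference order; unlisted candidates
    are unranked. support[j] > 0 is candidate j's support parameter.
    """
    remaining = sum(support)
    log_p = 0.0
    for cand in ranking:
        # Choose cand from the candidates not yet ranked.
        log_p += math.log(support[cand]) - math.log(remaining)
        remaining -= support[cand]
    return log_p


def bloc_membership(ranking, bloc_supports, bloc_weights) -> list[float]:
    """Posterior probability that a voter belongs to each bloc (E-step)."""
    logs = [math.log(w) + plackett_luce_log_prob(ranking, s)
            for s, w in zip(bloc_supports, bloc_weights)]
    m = max(logs)  # log-sum-exp trick for numerical stability
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

A ballot ranking candidate 0 first and candidate 1 second is assigned with probability 15/16 to a bloc that strongly supports candidate 0 (support 5, 1, 1) rather than one favouring candidate 2 (support 1, 1, 5), given equal bloc sizes.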
International Conference on Machine Learning | 2006
Isobel Claire Gormley; Thomas Brendan Murphy
Rank data consist of ordered lists of objects. A particular example of these data arises in Irish elections using the proportional representation by means of a single transferable vote (PR-STV) system, where voters list candidates in order of preference. A latent space model is proposed for rank (voting) data, where both voters and candidates are located in the same D-dimensional latent space. The relative proximity of candidates to a voter determines the probability of the voter giving high preferences to a candidate. The votes are modelled using a Plackett-Luce model, which allows the ranked nature of the data to be modelled directly. Data from the 2002 Irish general election are analyzed using the proposed model, which is fitted in a Bayesian framework. The estimated candidate positions suggest that party politics plays an important role in this election. Methods for choosing D, the dimensionality of the latent space, are discussed, and models with D = 1 or D = 2 are proposed for the 2002 Irish general election data.
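To link proximity to preference, a voter's Plackett–Luce support for each candidate can be made a decreasing function of their distance in the latent space. The sketch below assumes support proportional to exp(-squared Euclidean distance), normalised over candidates. Treat that link function, the example positions and the function names as assumptions for illustration. The Bayesian fitting of positions is not shown.

```python
import math
from typing import Sequence

Point = Sequence[float]


def support_from_positions(voter: Point, candidates: Sequence[Point]) -> list[float]:
    """Assumed link: support decays with squared Euclidean distance in D dimensions."""
    weights = [math.exp(-sum((v - c) ** 2 for v, c in zip(voter, cand)))
               for cand in candidates]
    total = sum(weights)
    return [w / total for w in weights]


def ranking_log_prob(ranking: Sequence[int], voter: Point,
                     candidates: Sequence[Point]) -> float:
    """Plackett-Luce log-probability of a (possibly partial) ballot."""
    support = support_from_positions(voter, candidates)
    remaining = sum(support)
    log_p = 0.0
    for cand in ranking:
        log_p += math.log(support[cand] / remaining)
        remaining -= support[cand]  # ranked candidates leave the choice set
    return log_p
```

With D = 2, a voter at (0, 0) and candidates at (0, 0), (3, 0) and (0, 3) is far more likely to rank the co-located candidate first. The same code works unchanged for D = 1 by passing one-element coordinates.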