Truyen Tran
Deakin University
Publication
Featured research published by Truyen Tran.
Knowledge Discovery and Data Mining | 2010
Sunil Kumar Gupta; Dinh Q. Phung; Brett Adams; Truyen Tran; Svetha Venkatesh
Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can seriously affect the precision of a social tag-based web retrieval system. Improving the precision of such systems has therefore become an increasingly important research topic. To this end, we propose a shared subspace learning framework to leverage a secondary source to improve retrieval performance from a primary dataset. This is achieved by learning a shared subspace between the two sources under a joint Nonnegative Matrix Factorization in which the level of subspace sharing can be explicitly controlled. We derive an efficient algorithm for learning the factorization, analyze its complexity, and provide proof of convergence. We validate the framework on image and video retrieval tasks in which tags from the LabelMe dataset are used to improve image retrieval performance from a Flickr dataset and video retrieval performance from a YouTube dataset. This has implications for how to exploit and transfer knowledge from readily available auxiliary tagging resources to improve another social web retrieval system. Our shared subspace learning framework is applicable to a range of problems where one needs to exploit the complementary strengths of multiple, heterogeneous datasets.
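The core of the framework is a joint nonnegative matrix factorization in which the two tag sources share a block of basis vectors. The sketch below illustrates that idea with plain multiplicative updates; the fixed number of shared components and the initialization are assumptions for illustration, whereas the paper controls the degree of sharing explicitly, so treat this as a sketch rather than the authors' algorithm.

```python
# Sketch of joint NMF with a shared basis block (illustrative, not the paper's code).
import numpy as np

def joint_nmf(X1, X2, k_shared, k1, k2, n_iter=200, eps=1e-9):
    """X1, X2: nonnegative (features x documents) matrices over the same tag vocabulary."""
    d = X1.shape[0]
    rng = np.random.default_rng(0)
    S = rng.random((d, k_shared))           # shared basis
    U1 = rng.random((d, k1))                # source-1 specific basis
    U2 = rng.random((d, k2))                # source-2 specific basis
    H1 = rng.random((k_shared + k1, X1.shape[1]))
    H2 = rng.random((k_shared + k2, X2.shape[1]))
    for _ in range(n_iter):
        W1, W2 = np.hstack([S, U1]), np.hstack([S, U2])
        # multiplicative updates for the encodings
        H1 *= (W1.T @ X1) / (W1.T @ W1 @ H1 + eps)
        H2 *= (W2.T @ X2) / (W2.T @ W2 @ H2 + eps)
        # the shared basis sees reconstruction error from both sources
        num = X1 @ H1[:k_shared].T + X2 @ H2[:k_shared].T
        den = W1 @ H1 @ H1[:k_shared].T + W2 @ H2 @ H2[:k_shared].T + eps
        S *= num / den
        U1 *= (X1 @ H1[k_shared:].T) / (W1 @ H1 @ H1[k_shared:].T + eps)
        U2 *= (X2 @ H2[k_shared:].T) / (W2 @ H2 @ H2[k_shared:].T + eps)
    return S, U1, U2, H1, H2
```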
Knowledge Discovery and Data Mining | 2016
Trang Pham; Truyen Tran; Dinh Q. Phung; Svetha Venkatesh
Personalized predictive medicine necessitates modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, a deep dynamic neural network that reads medical records and predicts future medical outcomes. At the data level, DeepCare models patient health state trajectories with explicit memory of illness. Built on Long Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle irregular timing by moderating the forgetting and consolidation of illness memory. DeepCare also incorporates medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are then aggregated through multiscale temporal pooling, before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling and readmission prediction in diabetes, a chronic disease with large economic burden. The results show improved modeling and risk prediction accuracy.
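The key mechanism is an LSTM whose forgetting is moderated by the elapsed time between visits. The sketch below shows one such recurrent step; the specific decay form 1/(1 + decay*dt) and the parameter layout are assumptions for illustration, not DeepCare's published equations.

```python
# Sketch of a time-moderated LSTM step (assumed decay form, not DeepCare's exact equations).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def time_aware_lstm_step(x, h_prev, c_prev, dt, params, decay=0.1):
    """One step over a visit vector x; dt is the time gap (e.g. in months) since the last visit."""
    Wf, Wi, Wo, Wc, bf, bi, bo, bc = params        # each W maps [x; h_prev] -> hidden size
    z = np.concatenate([x, h_prev])
    f = sigmoid(Wf @ z + bf)
    f = f / (1.0 + decay * dt)                      # forget more when visits are far apart
    i = sigmoid(Wi @ z + bi)
    o = sigmoid(Wo @ z + bo)
    c = f * c_prev + i * np.tanh(Wc @ z + bc)       # illness memory update
    h = o * np.tanh(c)                              # current health state
    return h, c
```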
Journal of Biomedical Informatics | 2015
Truyen Tran; Tu Dinh Nguyen; Dinh Q. Phung; Svetha Venkatesh
The electronic medical record (EMR) offers promise for novel analytics. However, manual feature engineering from EMR is labor intensive because EMR data are complex - they contain temporal, mixed-type and multimodal data packed in irregular episodes. We present a computational framework to harness EMR with minimal human supervision via a restricted Boltzmann machine (RBM). The framework derives a new representation of medical objects by embedding them in a low-dimensional vector space. This new representation facilitates algebraic and statistical manipulations such as projection onto a 2D plane (thereby offering intuitive visualization), object grouping (hence enabling automated phenotyping), and risk stratification. To enhance model interpretability, we introduce two constraints on the model parameters: (a) nonnegative coefficients, and (b) structural smoothness. These result in a novel model called eNRBM (EMR-driven nonnegative RBM). We demonstrate the capability of the eNRBM on a cohort of 7578 mental health patients under suicide risk assessment. The derived representation not only shows clinically meaningful feature grouping but also facilitates short-term risk stratification. The F-scores, 0.21 for moderate risk and 0.36 for high risk, are significantly higher than those obtained by clinicians and competitive with the results obtained by support vector machines.
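The eNRBM is an RBM whose weights are constrained to be nonnegative (plus a structural smoothness term). The sketch below shows a single contrastive-divergence (CD-1) update with a simple nonnegativity projection standing in for the paper's penalty-based constraints; it is an illustration under those assumptions, not the authors' training code.

```python
# Sketch of CD-1 for a binary RBM with projected nonnegative weights (stand-in for eNRBM).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(V, W, b, c, lr=0.01, seed=0):
    """V: (batch x visible) binary EMR features; W: (hidden x visible); b, c: biases."""
    rng = np.random.default_rng(seed)
    ph0 = sigmoid(V @ W.T + c)                       # positive phase
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W + b)                        # reconstruction
    ph1 = sigmoid(pv1 @ W.T + c)                     # negative phase
    W += lr * (ph0.T @ V - ph1.T @ pv1) / V.shape[0]
    b += lr * (V - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    W[:] = np.maximum(W, 0.0)                        # nonnegativity -> interpretable groupings
    return W, b, c
```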
BMJ Open | 2014
Sunil Kumar Gupta; Truyen Tran; Wei Luo; Dinh Q. Phung; Richard L. Kennedy; Adam Broad; David Campbell; David Kipp; Madhu Singh; Mustafa Khasraw; Leigh Matheson; David M. Ashley; Svetha Venkatesh
Objectives Using the prediction of cancer outcome as a model, we have tested the hypothesis that through analysing routinely collected digital data contained in an electronic administrative record (EAR), using machine-learning techniques, we could enhance conventional methods in predicting clinical outcomes. Setting A regional cancer centre in Australia. Participants Disease-specific data from a purpose-built cancer registry (Evaluation of Cancer Outcomes (ECO)) from 869 patients were used to predict survival at 6, 12 and 24 months. The model was validated with data from a further 94 patients, and results compared to the assessment of five specialist oncologists. Machine-learning prediction using ECO data was compared with that using EAR and a model combining ECO and EAR data. Primary and secondary outcome measures Survival prediction accuracy in terms of the area under the receiver operating characteristic curve (AUC). Results The ECO model yielded AUCs of 0.87 (95% CI 0.848 to 0.890) at 6 months, 0.796 (95% CI 0.774 to 0.823) at 12 months and 0.764 (95% CI 0.737 to 0.789) at 24 months. Each was slightly better than the performance of the clinician panel. The model performed consistently across a range of cancers, including rare cancers. Combining ECO and EAR data yielded better prediction than the ECO-based model (AUCs ranging from 0.757 to 0.997 for 6 months, AUCs from 0.689 to 0.988 for 12 months and AUCs from 0.713 to 0.973 for 24 months). The best prediction was for genitourinary, head and neck, lung, skin, and upper gastrointestinal tumours. Conclusions Machine learning applied to information from a disease-specific (cancer) database and the EAR can be used to predict clinical outcomes. Importantly, the approach described made use of digital data that is already routinely collected but underexploited by clinical health systems.
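At its core the evaluation trains a classifier per survival horizon and reports the held-out AUC. The sketch below shows that pipeline with a logistic regression standing in for the study's machine-learning models; the data variables are placeholders, and the 6/12/24-month horizons are taken from the abstract above.

```python
# Sketch of per-horizon AUC evaluation (illustrative; the study's models are not reproduced here).
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def evaluate_horizon(X_train, y_train, X_valid, y_valid):
    """y_* are binary labels: 1 if the patient survived to the given horizon."""
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    scores = model.predict_proba(X_valid)[:, 1]
    return roc_auc_score(y_valid, scores)

# Usage (placeholder data splits, one per horizon):
# for horizon, (Xtr, ytr, Xva, yva) in splits_by_horizon.items():   # "6m", "12m", "24m"
#     print(horizon, evaluate_horizon(Xtr, ytr, Xva, yva))
```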
Journal of Biomedical Informatics | 2017
Trang Pham; Truyen Tran; Dinh Q. Phung; Svetha Venkatesh
Personalized predictive medicine necessitates the modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, stored in electronic medical records, are episodic and irregular in time. We introduce DeepCare, an end-to-end deep dynamic neural network that reads medical records, stores previous illness history, infers current illness states and predicts future medical outcomes. At the data level, DeepCare represents care episodes as vectors and models patient health state trajectories by the memory of historical records. Built on Long Short-Term Memory (LSTM), DeepCare introduces methods to handle irregularly timed events by moderating the forgetting and consolidation of memory. DeepCare also explicitly models medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are then aggregated through multiscale temporal pooling, before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling, intervention recommendation, and future risk prediction. On two important cohorts with heavy social and economic burden - diabetes and mental health - the results show improved prediction accuracy.
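One distinctive component is the multiscale temporal pooling of per-visit illness states before the final prediction network. The sketch below shows one plausible reading of that step, max-pooling hidden states over several look-back windows and concatenating the results; the window sizes and the use of max pooling are assumptions, not DeepCare's exact configuration.

```python
# Sketch of multiscale temporal pooling over per-visit states (assumed windows and pooling).
import numpy as np

def multiscale_pool(states, months_before_now, windows=(12, 24, 36)):
    """states: (n_visits x dim) hidden states; months_before_now: age of each visit in months."""
    states = np.asarray(states, dtype=float)
    months = np.asarray(months_before_now, dtype=float)
    pooled = []
    for w in windows:                                # one pooled vector per look-back window
        mask = months <= w
        if mask.any():
            pooled.append(states[mask].max(axis=0))
        else:
            pooled.append(np.zeros(states.shape[1]))  # no visit falls inside this window
    return np.concatenate(pooled)                     # fed to the outcome-estimation network
```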
Knowledge and Information Systems | 2015
Truyen Tran; Dinh Q. Phung; Wei Luo; Svetha Venkatesh
The recent wide adoption of electronic medical records (EMRs) presents great opportunities and challenges for data mining. EMR data are largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMR. First, a conceptual view of the EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied for predicting cumulative or progressive risk. The challenge is to build a transparent predictive model that works with a large number of weakly predictive features and, at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework to a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform with mental health knowledge, and produce models with enhanced stability.
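The stabilization idea is to combine an l1 sparsity penalty with a smoothness penalty defined by the feature-interaction graph (via its Laplacian). The sketch below shows this for a binary logistic stand-in for the ordinal model, trained by proximal gradient descent; the binary simplification and the hyperparameters are assumptions.

```python
# Sketch of l1 + graph-Laplacian regularized logistic regression (stand-in for the ordinal model).
import numpy as np

def soft_threshold(w, t):
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_graph_stabilized_logistic(X, y, L, alpha=0.01, beta=0.1, lr=0.1, n_iter=500):
    """X: (n x d) features, y: {0,1} labels, L: (d x d) Laplacian of the feature network."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) / n + 2.0 * beta * (L @ w)   # log-loss + smoothness gradient
        w = soft_threshold(w - lr * grad, lr * alpha)      # l1 proximal step keeps the model sparse
    return w
```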
Australian Health Review | 2014
Santu Rana; Truyen Tran; Wei Luo; Dinh Q. Phung; Richard L. Kennedy; Svetha Venkatesh
OBJECTIVE Readmission rates are high following acute myocardial infarction (AMI), but risk stratification has proved difficult because known risk factors are only weakly predictive. In the present study, we applied hospital data to identify the risk of unplanned admission following AMI hospitalisations. METHODS The study included 1660 consecutive AMI admissions. Predictive models were derived from 1107 randomly selected records and tested on the remaining 553 records. The electronic medical record (EMR) model was compared with a seven-factor predictive score known as the HOSPITAL score and a model derived from Elixhauser comorbidities. All models were evaluated for the ability to identify patients at high risk of 30-day ischaemic heart disease readmission and those at risk of all-cause readmission within 12 months following the initial AMI hospitalisation. RESULTS The EMR model has higher discrimination than other models in predicting ischaemic heart disease readmissions (area under the curve (AUC) 0.78; 95% confidence interval (CI) 0.71-0.85 for 30-day readmission). The positive predictive value was significantly higher with the EMR model, which identifies cohorts that were up to threefold more likely to be readmitted. Factors associated with readmission included emergency department attendances, cardiac diagnoses and procedures, renal impairment and electrolyte disturbances. The EMR model also performed better than other models (AUC 0.72; 95% CI 0.66-0.78), and with greater positive predictive value, in identifying 12-month risk of all-cause readmission. CONCLUSIONS Routine hospital data can help identify patients at high risk of readmission following AMI. This could lead to decreased readmission rates by identifying patients suitable for targeted clinical interventions.
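The practical comparison hinges on positive predictive value when only the highest-risk patients are flagged. The sketch below computes PPV for the top decile of predicted risk; the 10% flagging threshold is an assumption used purely for illustration.

```python
# Sketch of positive predictive value at a top-risk threshold (illustrative threshold).
import numpy as np

def ppv_at_top_fraction(risk_scores, readmitted, fraction=0.10):
    """risk_scores: predicted readmission risks; readmitted: 0/1 observed outcomes."""
    risk_scores = np.asarray(risk_scores, dtype=float)
    readmitted = np.asarray(readmitted)
    n_flagged = max(1, int(round(fraction * len(risk_scores))))
    flagged = np.argsort(-risk_scores)[:n_flagged]     # indices of the highest-risk patients
    return readmitted[flagged].mean()                   # fraction of flagged patients readmitted
```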
Knowledge Discovery and Data Mining | 2010
Thin Nguyen; Dinh Q. Phung; Brett Adams; Truyen Tran; Svetha Venkatesh
Automatic data-driven analysis of mood from text is an emerging problem with many potential applications. Unlike generic text categorization, mood classification based on textual features is complicated by various factors, including its context- and user-sensitive nature. We present a comprehensive study of different feature selection schemes in machine learning for the problem of mood classification in weblogs. Notably, we introduce the novel use of a feature set based on the affective norms for English words (ANEW) lexicon studied in psychology. This feature set has the advantage of being computationally efficient while maintaining accuracy comparable to other state-of-the-art feature sets we evaluated. In addition, we present results of data-driven clustering on a dataset of over 17 million blog posts with mood groundtruth. Our analysis reveals an interesting, and readily interpreted, structure to the linguistic expression of emotion, one that comprises valuable empirical evidence in support of existing psychological models of emotion, and in particular the dipoles pleasure–displeasure and activation–deactivation.
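The ANEW-based feature set scores each post by the affective ratings of its words. The sketch below computes mean valence, arousal and dominance over matched lexicon words; the lexicon format and the use of simple means are assumptions, not the paper's exact feature design.

```python
# Sketch of ANEW-style affective features for a blog post (lexicon format is assumed).
import numpy as np

def anew_features(text, lexicon):
    """lexicon: dict mapping word -> (valence, arousal, dominance), e.g. loaded from ANEW."""
    tokens = text.lower().split()
    scores = np.array([lexicon[t] for t in tokens if t in lexicon], dtype=float)
    if scores.size == 0:
        return np.zeros(3)                   # no affective words matched in this post
    return scores.mean(axis=0)               # (mean valence, mean arousal, mean dominance)

# features = anew_features(post_text, anew_lexicon)   # then fed to a mood classifier
```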
Knowledge Discovery and Data Mining | 2013
Truyen Tran; Dinh Q. Phung; Wei Luo; Richard Harvey; Michael Berk; Svetha Venkatesh
Suicide is a major concern in society. Despite great attention from the community and very substantive medico-legal implications, there has been no satisfactory method that can reliably predict future attempted or completed suicide. We present an integrated machine learning framework to tackle this challenge. Our proposed framework consists of a novel feature extraction scheme, an embedded feature selection process, a set of risk classifiers and, finally, a risk calibration procedure. For temporal feature extraction, we cast the patient's clinical history into a temporal image to which a bank of one-sided filters is applied. The responses are then partly transformed into mid-level features and selected in an l1-norm framework under extreme value theory. A set of probabilistic ordinal risk classifiers is then applied to compute the risk probabilities and further re-rank the features. Finally, the predicted risks are calibrated. Together with our Australian partner, we perform a comprehensive study on data collected for the mental health cohort, and the experiments validate that our proposed framework outperforms risk assessment instruments used by medical practitioners.
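The temporal-image idea treats a patient's history as a features-by-time-bins matrix and applies one-sided (causal) filters that emphasize recent events. The sketch below uses exponential-decay filters with a few half-lives as a stand-in for the paper's filter bank; the bin size and decay rates are assumptions.

```python
# Sketch of one-sided filter responses over a patient "temporal image" (assumed decay filters).
import numpy as np

def one_sided_responses(temporal_image, half_lives=(3, 6, 12)):
    """temporal_image: (n_features x n_bins) event counts, bin 0 = most recent period."""
    n_features, n_bins = temporal_image.shape
    t = np.arange(n_bins)
    responses = []
    for hl in half_lives:                    # one causal filter per decay rate
        kernel = 0.5 ** (t / hl)             # weight 1 for the present, halving every `hl` bins
        responses.append(temporal_image @ kernel)
    return np.concatenate(responses)          # features later selected under the l1 penalty
```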
Automated Software Engineering | 2015
Morakot Choetkiertikul; Hoa Khanh Dam; Truyen Tran; Aditya K. Ghose
Software projects have a high risk of cost and schedule overruns, which has long been a source of concern for the software engineering community. One of the challenges in software project management is to make reliable predictions of delays in the context of the constant and rapid changes inherent in software projects. This paper presents a novel approach to providing automated support for project managers and other decision makers in predicting whether a subset of software tasks (among the hundreds to thousands of ongoing tasks) in a software project are at risk of being delayed. Our approach makes use of not only features specific to individual software tasks (i.e. local data) -- as done in previous work -- but also their relationships (i.e. networked data). In addition, using collective classification, our approach can simultaneously predict the degree of delay for a group of related tasks. Our evaluation results show a significant improvement over traditional approaches that perform classification on each task independently: achieving 46% -- 97% precision (49% improvement), 46% -- 97% recall (28% improvement), 56% -- 75% F-measure (39% improvement), and 78% -- 95% Area Under the ROC Curve (16% improvement).
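Collective classification lets the prediction for one task depend on the current predictions for related tasks. The sketch below is a generic iterative-classification loop in that spirit, with a single relational feature (the fraction of linked tasks currently predicted delayed); the classifier, feature choice and number of rounds are assumptions, not the paper's exact method.

```python
# Sketch of iterative collective classification over a task network (generic, not the paper's algorithm).
import numpy as np

def collective_predict(clf, local_X, neighbours, n_rounds=10):
    """local_X: (n_tasks x d) local features; neighbours[i]: indices of tasks linked to task i.
    `clf` is any scikit-learn-style classifier already trained on (local + relational) features."""
    n = local_X.shape[0]
    delayed = np.zeros(n)                                      # initial guess: no delays
    for _ in range(n_rounds):
        rel = np.array([delayed[nb].mean() if len(nb) else 0.0 for nb in neighbours])
        X_aug = np.hstack([local_X, rel[:, None]])             # append the relational feature
        delayed = clf.predict(X_aug)                           # re-label all tasks jointly
    return delayed
```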