Mahnoosh Kholghi
Queensland University of Technology
Publications
Featured research published by Mahnoosh Kholghi.
Journal of the American Medical Informatics Association | 2016
Mahnoosh Kholghi; Laurianne Sitbon; Guido Zuccon; Anthony Nguyen
OBJECTIVE This paper presents an automatic, active learning-based system for the extraction of medical concepts from clinical free-text reports. Specifically, it determines (1) the contribution of active learning in reducing the annotation effort and (2) the robustness of an incremental active learning framework across different selection criteria and data sets. MATERIALS AND METHODS The comparative performance of an active learning framework and a fully supervised approach was investigated to study how active learning reduces the annotation effort while achieving the same effectiveness as a supervised approach. Conditional random fields were used as the supervised method, with least confidence and information density as the 2 selection criteria for the active learning framework. The effect of incremental learning vs standard learning on the robustness of the models within the active learning framework with different selection criteria was also investigated. The following 2 clinical data sets were used for evaluation: the Informatics for Integrating Biology and the Bedside/Veteran Affairs (i2b2/VA) 2010 natural language processing challenge and the Shared Annotated Resources/Conference and Labs of the Evaluation Forum (ShARe/CLEF) 2013 eHealth Evaluation Lab. RESULTS The annotation effort saved by active learning to achieve the same effectiveness as supervised learning is up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared with the random sampling baseline, the saving is at least doubled. CONCLUSION Incremental active learning is a promising approach for building effective and robust medical concept extraction models while significantly reducing the burden of manual annotation.
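The least confidence criterion used above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a hypothetical `model` object exposing a `best_sequence_probability(x)` method that returns the probability the trained CRF assigns to its most likely label sequence for input x, and it ranks unlabelled sequences by how unsure the model is about them.

```python
# Minimal sketch of least-confidence sample selection for sequence labelling.
# `model` is a hypothetical wrapper around a trained CRF exposing
# best_sequence_probability(x) = P(y* | x), the probability of the most
# likely label sequence; this is an assumption, not the paper's exact API.

def least_confidence_batch(model, unlabelled, batch_size):
    """Return the `batch_size` sequences the model is least confident about."""
    scored = [(1.0 - model.best_sequence_probability(x), x) for x in unlabelled]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # least confident first
    return [x for _, x in scored[:batch_size]]

# Schematic incremental active learning round:
#   1. train or incrementally update the CRF on the labelled pool
#   2. batch = least_confidence_batch(model, unlabelled_pool, k)
#   3. have annotators label `batch`, then move it to the labelled pool
```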
Journal of the Association for Information Science and Technology | 2017
Mahnoosh Kholghi; Lance De Vine; Laurianne Sitbon; Guido Zuccon; Anthony Nguyen
This article demonstrates the benefits of using sequence representations based on word embeddings to inform the seed selection and sample selection processes in an active learning pipeline for clinical information extraction. Seed selection refers to choosing an initial sample set to label in order to form an initial learning model. Sample selection refers to selecting informative samples with which to update the model at each iteration of the active learning process. Compared to supervised machine learning approaches, active learning offers the opportunity to build statistical classifiers from fewer manually annotated training samples. Reducing the manual annotation effort can support automating the clinical information extraction process. This is particularly beneficial in the clinical domain, where manual annotation is a time‐consuming and costly task that requires extensive labor from clinical experts. Our empirical findings demonstrate that (a) using sequence representations along with sequence length for seed selection shows potential for building more effective initial models, and (b) using sequence representations for sample selection leads to significantly lower manual annotation effort, with up to 3% and 6% fewer tokens and concepts requiring annotation, respectively, compared to state‐of‐the‐art query strategies.
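As a rough illustration of the seed selection idea, the sketch below builds a sequence representation by averaging word embeddings and then picks one long sequence per cluster as a seed. The `embeddings` lookup, the use of k-means clustering, and the "longest sequence per cluster" rule are assumptions made for the example, not the article's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch of embedding-based seed selection. `embeddings` maps a token to a
# pre-trained word vector of dimension `dim`; both it and the clustering /
# "longest sequence per cluster" heuristic are assumptions for illustration.

def sequence_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens to represent a sequence."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)

def select_seeds(sequences, embeddings, dim=100, n_seeds=10):
    """Cluster sequence vectors and return one long sequence per cluster."""
    X = np.stack([sequence_vector(toks, embeddings, dim) for toks in sequences])
    labels = KMeans(n_clusters=n_seeds, n_init=10).fit_predict(X)
    seeds = []
    for c in range(n_seeds):
        members = [i for i, lab in enumerate(labels) if lab == c]
        seeds.append(max(members, key=lambda i: len(sequences[i])))
    return seeds  # indices of the sequences to annotate as the initial seed set
```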
conference on information and knowledge management | 2015
Mahnoosh Kholghi; Laurianne Sitbon; Guido Zuccon; Anthony Nguyen
This paper presents a new active learning query strategy for information extraction, called Domain Knowledge Informativeness (DKI). Active learning is often used to reduce the amount of annotation effort required to obtain training data for machine learning algorithms. A key component of an active learning approach is the query strategy, which is used to iteratively select samples for annotation. Knowledge resources have been used in information extraction as a means to derive additional features for sample representation. DKI is, however, the first query strategy that exploits such resources to inform sample selection. To evaluate the merits of DKI, in particular the reduction in annotation effort that the new query strategy achieves, we conduct a comprehensive empirical comparison of active learning query strategies for information extraction within the clinical domain. The clinical domain was chosen for this work because of the availability of extensive structured knowledge resources, which have often been exploited for feature generation. In addition, the clinical domain offers a compelling use case for active learning because of the high costs and hurdles associated with obtaining annotations in this domain. Our experimental findings demonstrate that (1) among existing query strategies, those based on the classification model's confidence are a better choice for clinical data, as they perform equally well with a much lighter computational load, and (2) significant reductions in annotation effort are achievable by exploiting knowledge resources within active learning query strategies, with up to 14% fewer tokens and concepts to annotate manually than with state-of-the-art query strategies.
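The abstract does not spell out how DKI combines domain knowledge with informativeness, so the sketch below is only a hypothetical reading: it weights the model's uncertainty by how much of a sequence is covered by a clinical term dictionary. The `model`, `knowledge_terms`, and the linear weighting are all assumptions introduced for illustration.

```python
# Hypothetical knowledge-informed query strategy (not the published DKI formula).
# `knowledge_terms` is assumed to be a set of terms drawn from a clinical
# knowledge resource; `model.best_sequence_probability` is the same assumed
# CRF wrapper as in the earlier sketch.

def dki_style_score(model, tokens, knowledge_terms, alpha=0.5):
    """Blend model uncertainty with coverage by domain-knowledge terms."""
    uncertainty = 1.0 - model.best_sequence_probability(tokens)
    coverage = sum(t.lower() in knowledge_terms for t in tokens) / max(len(tokens), 1)
    return alpha * uncertainty + (1.0 - alpha) * coverage

def select_batch(model, unlabelled, knowledge_terms, batch_size):
    """Pick the sequences that are both uncertain and rich in domain terms."""
    ranked = sorted(unlabelled,
                    key=lambda toks: dki_style_score(model, toks, knowledge_terms),
                    reverse=True)
    return ranked[:batch_size]
```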
Methods in Ecology and Evolution | 2018
Mahnoosh Kholghi; Yvonne F. Phillips; Michael W. Towsey; Laurianne Sitbon; Paul Roe
1. This paper presents an active learning framework for the classification of one-minute audio recordings derived from long-duration recordings of the environment. The goal of the framework is to investigate the efficacy of active learning in reducing the manual annotation effort required to label a large volume of acoustic data according to its dominant sound source, while ensuring the high quality of automatically labelled data.
2. We present a comprehensive empirical comparison, through extensive simulation experiments, of a range of active learning approaches against a random sampling baseline for soundscape classification. Random Forest is used as a benchmark supervised approach to build classifiers in the active learning framework, and twelve summary indices extracted for each one-minute segment of a 13-month recording are used as features for training the classifiers.
3. Our experimental findings demonstrate that (1) among existing query strategies, those based on classifier confidence and diversity of samples are more effective for very large datasets where the classes are imbalanced in size; and (2) when a practical target performance is considered (i.e., F-measure equal to or greater than 0.8, 0.85, and 0.9), only 5-16 hours of manual annotation effort is required to build a classifier that automatically annotates a large amount (13 months) of unlabelled audio data.
4. Active learning has a key role to play in alleviating the burden of manual annotation required to build classifiers which can support effective monitoring of species diversity in at-risk ecosystems.
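A single round of the soundscape active learning loop described in point 2 could look roughly like the sketch below, using a Random Forest over the twelve summary indices and least-confidence selection. The batch size and the use of least confidence (rather than the diversity-aware strategies also compared in the paper) are illustrative choices, not the paper's exact configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# One uncertainty-driven active learning round for soundscape classification.
# Each row of X is the vector of twelve summary indices for one one-minute
# recording; y holds the dominant sound-source class. Batch size and the
# least-confidence criterion are assumptions for this sketch.

def active_learning_round(X_labelled, y_labelled, X_unlabelled, batch_size=60):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_labelled, y_labelled)
    proba = clf.predict_proba(X_unlabelled)       # class probabilities per minute
    confidence = proba.max(axis=1)                # confidence in predicted class
    query = np.argsort(confidence)[:batch_size]   # least-confident minutes
    return clf, query  # `query` indexes the minutes to send for manual labelling
```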
Science & Engineering Faculty | 2015
Mahnoosh Kholghi; Laurianne Sitbon; Guido Zuccon; Anthony Nguyen
Science & Engineering Faculty | 2015
Lance De Vine; Mahnoosh Kholghi; Guido Zuccon; Laurianne Sitbon; Anthony Nguyen
Science & Engineering Faculty | 2014
Mahnoosh Kholghi; Laurianne Sitbon; Guido Zuccon; Anthony Nguyen
School of Electrical Engineering & Computer Science; Science & Engineering Faculty | 2018
Mahnoosh Kholghi; Yvonne F. Phillips; Michael W. Towsey; Laurianne Sitbon; Paul Roe
School of Electrical Engineering & Computer Science; Science & Engineering Faculty | 2017
Mahnoosh Kholghi; Laurianne Sitbon; Guido Zuccon; Anthony Nguyen
School of Electrical Engineering & Computer Science; Science & Engineering Faculty | 2017
Mahnoosh Kholghi; Lance De Vine; Laurianne Sitbon; Guido Zuccon; Anthony Nguyen
Collaboration
Dive into Mahnoosh Kholghi's collaborations.
Commonwealth Scientific and Industrial Research Organisation