Aleksandar Kovačević
University of Novi Sad
Publication
Featured research published by Aleksandar Kovačević.
Journal of the American Medical Informatics Association | 2013
Aleksandar Kovačević; Azad Dehghan; Michele Filannino; John A. Keane; Goran Nenadic
OBJECTIVE Identification of clinical events (eg, problems, tests, treatments) and associated temporal expressions (eg, dates and times) are key tasks in extracting and managing data from electronic health records. As part of the i2b2 2012 Natural Language Processing for Clinical Data challenge, we developed and evaluated a system to automatically extract temporal expressions and events from clinical narratives. The extracted temporal expressions were additionally normalized by assigning type, value, and modifier. MATERIALS AND METHODS The system combines rule-based and machine learning approaches that rely on morphological, lexical, syntactic, semantic, and domain-specific features. Rule-based components were designed to handle the recognition and normalization of temporal expressions, while conditional random fields models were trained for event and temporal recognition. RESULTS The system achieved micro F scores of 90% for the extraction of temporal expressions and 87% for clinical event extraction. The normalization component for temporal expressions achieved accuracies of 84.73% (expressions type), 70.44% (value), and 82.75% (modifier). DISCUSSION Compared to the initial agreement between human annotators (87-89%), the system provided comparable performance for both event and temporal expression mining. While (lenient) identification of such mentions is achievable, finding the exact boundaries proved challenging. CONCLUSIONS The system provides a state-of-the-art method that can be used to support automated identification of mentions of clinical events and temporal expressions in narratives either to support the manual review process or as a part of a large-scale processing of electronic health databases.
Journal of Biomedical Informatics | 2015
Azad Dehghan; Aleksandar Kovačević; George Karystianis; John A. Keane; Goran Nenadic
A recent promise to access unstructured clinical data from electronic health records on a large scale has revitalized the interest in automated de-identification of clinical notes, which includes the identification of mentions of Protected Health Information (PHI). We describe the methods developed and evaluated as part of the i2b2/UTHealth 2014 challenge to identify PHI defined by 25 entity types in longitudinal clinical narratives. Our approach combines knowledge-driven (dictionaries and rules) and data-driven (machine learning) methods with a large range of features to address de-identification of specific named entities. In addition, we have devised a two-pass recognition approach that creates a patient-specific run-time dictionary from the PHI entities identified in the first pass with high confidence, which is then used in the second pass to identify mentions that lack specific clues. The proposed method achieved overall micro F1-measures of 91% on strict and 95% on token-level evaluation on the test dataset (514 narratives). Whilst most PHI entities can be reliably identified, mentions of Organizations and Professions were particularly challenging. Still, the overall results suggest that automated text mining methods can be used to reliably process clinical notes to identify personal information, thus providing a crucial step in large-scale de-identification of unstructured data for further clinical and epidemiological studies.
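The two-pass idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' system: the title-based rule in `first_pass`, the function names, and the sample notes are all hypothetical, chosen only to show how entities found with high confidence in one note can seed a patient-specific dictionary that catches clue-less mentions elsewhere.

```python
import re

def first_pass(note):
    """Hypothetical high-confidence rule: a capitalised word following a title."""
    return set(re.findall(r"\b(?:Dr|Mr|Mrs|Ms)\.\s+([A-Z][a-z]+)", note))

def second_pass(note, dictionary):
    """Tag every whole-word occurrence of a dictionary entry, with or without a clue."""
    spans = []
    for name in dictionary:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", note):
            spans.append((m.start(), m.end(), name))
    return sorted(spans)

def two_pass(notes):
    """Build one run-time dictionary per patient, then re-scan all of that patient's notes."""
    dictionary = set()
    for note in notes:
        dictionary |= first_pass(note)
    return [second_pass(note, dictionary) for note in notes]

# Made-up notes for a single patient (illustration only).
notes = ["Seen by Dr. Smith today.", "Smith recommends follow-up in two weeks."]
result = two_pass(notes)
print(result)
```

In the second note the name has no title in front of it, so the first-pass rule alone misses it; the dictionary built from the first note is what recovers it.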
International Symposium on Intelligent Systems and Informatics | 2010
Dragan Miljković; Ljubiša Gajić; Aleksandar Kovačević; Zora Konjović
Sport result prediction is nowadays very popular among fans around the world, which has contributed in particular to the expansion of sports betting. This makes the problem of predicting the results of sporting events a new and interesting challenge. Consequently, systems dealing with this problem are developed every day. This paper presents one such system, which uses data mining techniques to predict the outcomes of basketball games in the NBA (National Basketball Association) league. The problem of predicting the game result is formalized as a classification problem, for which the Naive Bayes method is used. Besides the actual result, the system calculates the spread for each game using multivariate linear regression. The software system is implemented following the MVC Model 2 pattern. The system was evaluated on a dataset comprising 778 games from the regular part of the 2009/2010 NBA season, and it correctly predicted the winners of about 67% of matches.
Computer Speech & Language | 2012
Aleksandar Kovačević; Zora Konjović; Branko Milosavljevic; Goran Nenadic
The task of reviewing scientific publications and keeping up with the literature in a particular domain is extremely time-consuming. Extraction and exploration of methodological information, in particular, requires systematic understanding of the literature, but in many cases is performed within a limited context of publications that can be manually reviewed by an individual or group. Automated methodology identification could provide an opportunity for systematic retrieval of relevant documents and for exploring developments within a given discipline. In this paper we present a system for the identification of methodology mentions in scientific publications in the area of natural language processing, and in particular in automatic terminology recognition. The system comprises two major layers: the first layer is an automatic identification of methodological sentences; the second layer highlights methodological phrases (segments). Each mention is categorised into one of four semantic categories: Task, Method, Resource/Feature and Implementation. Extraction and classification of the segments is formalised as a sequence tagging problem and four separate phrase-based Conditional Random Fields are used to accomplish the task. The system has been evaluated on a manually annotated corpus comprising 45 full text articles. The results for the segment level annotation show an F-measure of 53% for identification of Task and Method mentions (with 70% precision), whereas the F-measures for Resource/Feature and Implementation identification were 61% (with 67% precision) and 75% (with 86% precision) respectively. At the document level, F-measures of 72% (with 81% precision) for Task mentions, 60% (with 81% precision) for Method mentions, 74% (with 78% precision) for the Resource/Feature and 79% (with 81% precision) for the Implementation categories were achieved. We provide a detailed analysis of errors and explore the impact that the particular groups of features have on the extraction of methodological segments.
Engineering of Computer Based Systems | 2009
Dusan Majstorovic; Zoltan Pele; Aleksandar Kovačević; Nikola Celanovic
This paper identifies a highly optimized computer architecture and FPGA technology as the most feasible approach to satisfy the challenging requirements of emulating power electronics hardware with sub-microsecond latency and sampling time. The proposed commercial off-the-shelf computational platforms and the accompanying software tools, based on the industry standard software platform, have the potential to bring qualitative improvements in how power electronics software is designed, how it is tested and how its performance is verified.
Journal of Biomedical Informatics | 2015
George Karystianis; Azad Dehghan; Aleksandar Kovačević; John A. Keane; Goran Nenadic
Heart disease is the leading cause of death globally and a significant part of the human population lives with it. A number of risk factors have been recognized as contributing to the disease, including obesity, coronary artery disease (CAD), hypertension, hyperlipidemia, diabetes, smoking, and family history of premature CAD. This paper describes and evaluates a methodology to extract mentions of such risk factors from diabetic clinical notes, which was a task of the i2b2/UTHealth 2014 Challenge in Natural Language Processing for Clinical Data. The methodology is knowledge-driven and the system implements local lexicalized rules (based on syntactical patterns observed in notes) combined with manually constructed dictionaries that characterize the domain. A part of the task was also to detect the time interval in which the risk factors were present in a patient. The system was applied to an evaluation set of 514 unseen notes and achieved a micro-average F-score of 88% (with 86% precision and 90% recall). While the identification of CAD family history, medication and some of the related disease factors (e.g. hypertension, diabetes, hyperlipidemia) showed quite good results, the identification of CAD-specific indicators proved to be more challenging (F-score of 74%). Overall, the results are encouraging and suggest that automated text mining methods can be used to process clinical notes to identify risk factors and monitor progression of heart disease on a large scale, providing necessary data for clinical and epidemiological studies.
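As a worked check of the figures quoted above, the F-score is the harmonic mean of precision and recall, and a micro-average pools true positives, false positives, and false negatives over all classes before computing it. The sketch below is illustrative only: the function names and the per-class counts are made up and are not taken from the paper; only the 86% precision and 90% recall come from the abstract.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def micro_average(counts):
    """Pool (tp, fp, fn) over all classes, then compute precision, recall and F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)

# Precision and recall as reported in the abstract.
reported = f_score(0.86, 0.90)
print(round(reported, 2))  # 0.88, consistent with the reported 88%

# Hypothetical per-class counts (tp, fp, fn), for illustration only.
example = {"hypertension": (90, 10, 5), "CAD": (30, 10, 15)}
print(micro_average(example))
```

Because the counts are pooled, frequent classes dominate a micro-average, which is why a weaker class (such as the CAD-specific indicators mentioned above) can coexist with a high overall score.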
International Symposium on Intelligent Systems and Informatics | 2010
Jelena Slivka; Aleksandar Kovačević; Zora Konjović
The performance of a classification model depends not only on the algorithm by which the model is learned, but also on the training set. Manual annotation of the training data is a tedious and time-consuming job. In order to overcome the problem of laborious hand-labeling of a large training set, a set of techniques called semi-supervised learning was designed. Co-training is one of the major semi-supervised learning methods. Its setting applies to datasets that have a natural separation of their features into two disjoint sets. However, in the great majority of practical situations, the natural split of features does not exist. In this paper we propose a new co-training based algorithm which can be applied to such datasets.
Multimedia Tools and Applications | 2010
Aleksandar Kovačević; Branko Milosavljevic; Zora Konjović; Milan Vidaković
This paper presents a tunable content-based music retrieval (CBMR) system suitable for the retrieval of music audio clips. The audio clips are represented as extracted feature vectors. The CBMR system is expert-tunable by altering the feature space. The feature space is tuned according to the expert-specified similarity criteria expressed in terms of clusters of similar audio clips. The main goal of tuning the feature space is to improve retrieval performance, since some features may have more impact on perceived similarity than others. The tuning process utilizes our genetic algorithm. The R-tree index for efficient retrieval of audio clips is based on the clustering of feature vectors. For each cluster a minimal bounding rectangle (MBR) is formed, thus providing objects for indexing. Inserting new nodes into the R-tree is efficiently performed because of the chosen Quadratic Split algorithm. Our CBMR system implements the point query and the n-nearest neighbors query with O(log n) time complexity. Different objective functions based on cluster similarity and dissimilarity measures are used for the genetic algorithm. We have found that all of them have similar impact on the retrieval performance in terms of precision and recall. The paper includes experimental results in measuring retrieval performance, reporting significant improvement over the untuned feature space.
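The minimal bounding rectangle mentioned above is simply the per-dimension minimum and maximum over a cluster's feature vectors. The following sketch shows that construction and the containment test a point query relies on; the function names and the three-dimensional feature vectors are hypothetical, and the R-tree itself (node splitting, nearest-neighbour search) is not reproduced here.

```python
def mbr(vectors):
    """Minimal bounding rectangle: per-dimension (lows, highs) over a cluster."""
    lows = tuple(min(column) for column in zip(*vectors))
    highs = tuple(max(column) for column in zip(*vectors))
    return lows, highs

def contains(rect, point):
    """True if the point lies inside the rectangle in every dimension."""
    lows, highs = rect
    return all(lo <= x <= hi for lo, x, hi in zip(lows, point, highs))

# Hypothetical cluster of three feature vectors (illustration only).
cluster = [(0.2, 1.0, 3.5), (0.4, 0.8, 3.0), (0.1, 1.2, 3.2)]
rect = mbr(cluster)
print(rect)
print(contains(rect, (0.3, 1.0, 3.1)), contains(rect, (0.5, 1.0, 3.1)))
```

An index over such rectangles can discard a whole cluster with one containment or overlap test, which is what makes logarithmic-time queries plausible when the tree stays balanced.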
Journal of Biomedical Informatics | 2017
Azad Dehghan; Aleksandar Kovačević; George Karystianis; John A. Keane; Goran Nenadic
De-identification of clinical narratives is one of the main obstacles to making healthcare free text available for research. In this paper we describe our experience in expanding and tailoring two existing tools as part of the 2016 CEGS N-GRID Shared Tasks Track 1, which evaluated de-identification methods on a set of psychiatric evaluation notes for up to 25 different types of Protected Health Information (PHI). The methods we used rely on machine learning on either a large or small feature space, with additional strategies, including two-pass tagging and multi-class models, which both proved to be beneficial. The results show that the integration of the proposed methods can identify Health Insurance Portability and Accountability Act (HIPAA) defined PHIs with overall F1-scores of ∼90% and above. Yet, some classes (Profession, Organization) again proved to be challenging given the variability of expressions used to reference given information.
International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems | 2016
Vladimir Dimitrieski; Gajo Petrovic; Aleksandar Kovačević; Ivan Luković; Hamido Fujita
In the era of the Internet, high connectivity and openness have introduced an opportunity for a new kind of approach to healthcare information system integration. Such an approach may utilize semantic-based technologies to represent and communicate knowledge between these systems. Resource Description Framework (RDF) in conjunction with Web Ontology Language (OWL) can be considered a de facto standard when it comes to semantic web and linked data technologies, and represents a foundation for defining healthcare ontologies. The goal of this paper is to provide an overview and critical review of existing healthcare ontologies and approaches to healthcare IS integration, focusing on OWL/RDF based solutions. With this review we want to show that although a lot of work has been done in this area, no universal or omnipresent solution has surfaced to allow automatic or at least semi-automatic integration of healthcare ISs. As there is a large number of established and emerging ontologies covering this subject, our review will not provide an exhaustive collection of all the references in the area, but will present the most notable standards, ontologies, taxonomies, and integration approaches.