Publication


Featured research published by Alexandre Kouznetsov.


Journal of Biomedical Semantics | 2011

Assessment of NER solutions against the first and second CALBC Silver Standard Corpus

Dietrich Rebholz-Schuhmann; Antonio Jimeno Yepes; Chen Li; Senay Kafkas; Ian Lewin; Ning Kang; Peter Corbett; David Milward; Ekaterina Buyko; Elena Beisswanger; Kerstin Hornbostel; Alexandre Kouznetsov; René Witte; Jonas B. Laurila; Christopher J. O. Baker; Cheng-Ju Kuo; Simon Clematide; Fabio Rinaldi; Richárd Farkas; György Móra; Kazuo Hara; Laura I. Furlong; Michael Rautschka; Mariana Neves; Alberto Pascual-Montano; Qi Wei; Nigel Collier; Faisal Mahbub Chowdhury; Alberto Lavelli; Rafael Berlanga

Background: Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). Preparing a GSC is time-consuming and costly, and the final corpus consists of at most a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus with four different semantic groups by harmonising the annotations of automatic text mining solutions: the first version of the Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). This corpus was used for the First CALBC Challenge, which asked participants to annotate the corpus with their text processing solutions.

Results: All four PPs from the CALBC project and, in addition, 12 challenge participants (CPs) contributed annotated data sets for an evaluation against the SSC-I. CPs could ignore the training data and deliver the annotations from their own annotation system, or they could train a machine-learning approach on the provided pre-annotated data. In general, the annotation solutions performed worse on entities from the CHED and PRGE categories than on entities categorised as DISO and SPE. The best performance across all semantic groups was achieved by two annotation solutions that had been trained on the SSC-I. The data sets from participants that had not used the SSC-I annotations for training were used to generate the harmonised Silver Standard Corpus II (SSC-II), and the participants' solutions were then evaluated again against the SSC-II. Again, the annotation solutions performed better on DISO and SPE than on CHED and PRGE.

Conclusions: The SSC-I delivers a large set of annotations (1,121,705) for a large number of documents (100,000 Medline abstracts). The annotations cover four different semantic groups and are sufficiently homogeneous to be reproduced with a trained classifier, leading to an average F-measure of 85%. Benchmarking the annotation solutions against the SSC-II leads to better performance for the CPs' annotation solutions than benchmarking against the SSC-I.
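
The abstract does not spell out how annotations are harmonised or matched. The sketch below assumes a simplified scheme: an annotation enters the silver standard when at least `min_votes` systems propose exactly the same span and semantic group, and a system is then scored with exact-match precision, recall and F-measure. The real CALBC harmonisation and boundary criteria may differ; `harmonise`, `precision_recall_f` and the toy system outputs are hypothetical.

```python
from collections import Counter

# An annotation: (document id, start offset, end offset, semantic group).
Annotation = tuple[str, int, int, str]


def harmonise(system_outputs: list[set[Annotation]], min_votes: int) -> set[Annotation]:
    """Keep annotations that at least `min_votes` systems propose with identical span and group."""
    votes = Counter(a for output in system_outputs for a in output)
    return {a for a, n in votes.items() if n >= min_votes}


def precision_recall_f(predicted: set[Annotation], reference: set[Annotation]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F-measure of `predicted` against `reference`."""
    tp = len(predicted & reference)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Toy outputs of three hypothetical systems on one abstract.
sys_a = {("d1", 0, 7, "CHED"), ("d1", 20, 25, "SPE")}
sys_b = {("d1", 0, 7, "CHED"), ("d1", 30, 38, "DISO")}
sys_c = {("d1", 0, 7, "CHED"), ("d1", 20, 25, "SPE"), ("d1", 40, 44, "PRGE")}

silver = harmonise([sys_a, sys_b, sys_c], min_votes=2)
print(sorted(silver))                     # the CHED and SPE annotations survive the vote
print(precision_recall_f(sys_b, silver))  # (0.5, 0.5, 0.5)
```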


Journal of the American Medical Informatics Association | 2010

A new algorithm for reducing the workload of experts in performing systematic reviews

Stan Matwin; Alexandre Kouznetsov; Diana Inkpen; Oana Frunza; Peter O'Blenis

OBJECTIVE: To determine whether a factorized version of the complement naïve Bayes (FCNB) classifier can reduce the time spent by experts reviewing journal articles for inclusion in systematic reviews of drug class efficacy for disease treatment.

DESIGN: The proposed classifier was evaluated on a test collection built from 15 systematic drug class reviews used in previous work. The FCNB classifier was constructed to classify each article as either containing high-quality, drug class-specific evidence or not. Weight engineering (WE) techniques were added to reduce underestimation for Medical Subject Headings (MeSH)-based and Publication Type (PubType)-based features. Cross-validation experiments were performed to evaluate the classifier's parameters and performance.

MEASUREMENTS: Work saved over sampling (WSS) at no less than 95% recall was used as the main performance measure.

RESULTS: The minimum workload reduction for a systematic review of a single topic, achieved with an FCNB/WE classifier, was 8.5%; the maximum was 62.2%, and the average over the 15 topics was 33.5%. This is 15.0% higher than the average workload reduction obtained using a voting perceptron-based automated citation classification system.

CONCLUSION: The FCNB/WE classifier is simple, easy to implement, and produces significantly greater workload reductions than previously achieved. The results suggest that it is a useful algorithm for machine-learning-based automation of systematic reviews of drug class efficacy for disease treatment.
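
WSS at a recall threshold can be computed from any classifier's ranking. A minimal sketch, assuming the common definition WSS = (TN + FN)/N − (1 − R), where reviewers read abstracts in ranked order and stop once the target recall is reached. The FCNB/WE classifier itself is not reproduced here; `wss_at_recall` and the toy ranking are hypothetical.

```python
import math


def wss_at_recall(ranked_labels: list[int], target_recall: float = 0.95) -> float:
    """Work saved over sampling if reviewers stop reading once `target_recall` is reached.

    ranked_labels holds gold labels (1 = relevant, 0 = irrelevant) sorted by
    descending classifier score. Everything below the cutoff is left unread,
    so the unread documents are the TN + FN, and
        WSS = (TN + FN) / N - (1 - R),
    where R is the recall actually obtained at the cutoff.
    """
    n = len(ranked_labels)
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("need at least one relevant document")
    needed = math.ceil(target_recall * total_relevant)  # relevant documents that must be read
    found = 0
    for k, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            recall = found / total_relevant
            unread = n - k  # TN + FN
            return unread / n - (1 - recall)
    raise ValueError("target recall is unreachable")  # not reached for target_recall <= 1


# Ten abstracts ranked by a hypothetical classifier; four are relevant.
ranking = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
print(wss_at_recall(ranking))  # 0.4: 95% of 4 rounds up to all 4, found by position 6
```

Using the recall actually reached at the cutoff, rather than a fixed 0.95, keeps the measure honest when the number of relevant abstracts is small and 95% cannot be hit exactly.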


BMC Genomics | 2010

Algorithms and semantic infrastructure for mutation impact extraction and grounding

Jonas B. Laurila; Nona Naderi; René Witte; Alexandre Riazanov; Alexandre Kouznetsov; Christopher J. O. Baker

Background: Mutation impact extraction is a task that state-of-the-art mutation extraction systems have not yet accomplished. Protein mutations and their impacts on protein properties are hidden in the scientific literature, making them poorly accessible to protein engineers and inaccessible to phenotype-prediction systems that currently depend on manually curated genomic variation databases.

Results: We present the first rule-based approach for extracting the impacts of mutations on protein properties, categorizing their directionality as positive, negative or neutral. Furthermore, protein and mutation mentions are grounded to their respective UniProtKB IDs, and selected protein properties, namely protein functions, are grounded to concepts in the Gene Ontology. The extracted entities are used to populate an OWL-DL Mutation Impact ontology, which supports complex SPARQL queries for mutation impacts. We illustrate the retrieval of proteins and mutant sequences for a given direction of impact on specific protein properties. Moreover, we provide programmatic access to the data through semantic web services built with the SADI (Semantic Automated Discovery and Integration) framework.

Conclusion: We address the problem of accessing legacy mutation data in unstructured form by creating novel mutation impact extraction methods, which are evaluated on a corpus of full-text articles on haloalkane dehalogenases tagged by domain experts. Our approaches achieve state-of-the-art precision and recall for mutation grounding, and respectable precision but lower recall for mutant-impact relation extraction. The system is deployed using text mining and semantic web technologies with the goal of publishing to a broad spectrum of consumers.
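
The rules themselves are not given in the abstract. As an illustration of the general idea of rule-based directionality, here is a minimal sketch that finds point-mutation mentions with a regular expression and assigns the direction of the earliest impact cue in the sentence. The cue lexicon, the one-letter mutation pattern and the function names are hypothetical, the example sentences are invented, and grounding to UniProtKB or the Gene Ontology is not attempted.

```python
import re
from typing import Optional

# Hypothetical cue lexicon; the rules used in the paper are not reproduced here.
IMPACT_CUES = {
    "positive": ("increased", "enhanced", "improved", "higher"),
    "negative": ("decreased", "reduced", "abolished", "lower", "loss of"),
    "neutral": ("no effect", "unchanged", "did not affect"),
}

# Point mutations written as wild-type residue, position, mutant residue, e.g. "W175F".
AA = "ACDEFGHIKLMNPQRSTVWY"
MUTATION = re.compile(rf"\b[{AA}]\d+[{AA}]\b")


def impact_direction(sentence: str) -> Optional[str]:
    """Return the direction of the earliest impact cue in the sentence, if any."""
    lowered = sentence.lower()
    best_pos, best_label = None, None
    for label, cues in IMPACT_CUES.items():
        for cue in cues:
            match = re.search(rf"\b{re.escape(cue)}\b", lowered)
            if match and (best_pos is None or match.start() < best_pos):
                best_pos, best_label = match.start(), label
    return best_label


def extract_impacts(sentence: str) -> list[tuple[str, str]]:
    """Pair every mutation mention in the sentence with the sentence's impact direction."""
    direction = impact_direction(sentence)
    if direction is None:
        return []
    return [(m.group(0), direction) for m in MUTATION.finditer(sentence)]


print(extract_impacts("The W175F mutant showed reduced activity towards 1-chlorobutane."))
print(extract_impacts("Mutant A172V: thermostability was unchanged."))
```

Sentence-level pairing is the crudest possible relation rule, which is one reason why relation extraction of this kind tends to trade recall for precision.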


Canadian Conference on Artificial Intelligence | 2009

Classifying Biomedical Abstracts Using Committees of Classifiers and Collective Ranking Techniques

Alexandre Kouznetsov; Stan Matwin; Diana Inkpen; Amir Hossein Razavi; Oana Frunza; Morvarid Sehatkar; Leanne Seaward; Peter O'Blenis

The purpose of this work is to reduce the workload of human experts in building systematic reviews of published articles, which are used in evidence-based medicine. We propose to use a committee of classifiers to rank biomedical abstracts based on their predicted relevance to the topic under review. In our approach, we identify two subsets of abstracts: one that represents the top of the ranked list and another that represents the bottom. These subsets, identified using machine learning (ML) techniques, are considered zones where abstracts are labeled with high confidence as relevant or irrelevant to the topic of the review. Early experiments with this approach, using different classifiers and different representation techniques, show a significant workload reduction.
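
A minimal sketch of the ranking-and-zones idea, assuming that the committee's scores are combined by a simple mean and that the top and bottom zones have fixed sizes. The paper's actual collective ranking technique and zone selection may differ. `keyword_scorer` is a toy stand-in for a trained classifier, and all names and example abstracts are hypothetical.

```python
from typing import Callable

Scorer = Callable[[str], float]  # maps an abstract to a relevance score in [0, 1]


def keyword_scorer(keywords: set[str]) -> Scorer:
    """Toy committee member: fraction of its keywords present in the abstract."""
    def score(text: str) -> float:
        return len(set(text.lower().split()) & keywords) / len(keywords)
    return score


def committee_zones(abstracts: list[str], committee: list[Scorer], top_k: int, bottom_k: int):
    """Rank abstracts by the committee's mean score and split off top and bottom zones.

    Returns (top, middle, bottom) as lists of indices into `abstracts`; the top zone is
    treated as confidently relevant, the bottom zone as confidently irrelevant, and the
    middle is left for human review.
    """
    if top_k + bottom_k > len(abstracts):
        raise ValueError("zones overlap")
    scores = [sum(member(a) for member in committee) / len(committee) for a in abstracts]
    order = sorted(range(len(abstracts)), key=lambda i: scores[i], reverse=True)
    cut = len(order) - bottom_k
    return order[:top_k], order[top_k:cut], order[cut:]


abstracts = [
    "randomized trial of statin efficacy",
    "statin dosage in mice",
    "history of hospital architecture",
    "randomized trial of placebo",
]
committee = [keyword_scorer({"randomized", "trial"}), keyword_scorer({"statin", "efficacy"})]
print(committee_zones(abstracts, committee, top_k=1, bottom_k=1))  # ([0], [3, 1], [2])
```

The workload saving comes from the two zones: only the middle indices would still need an expert's attention.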


International Conference on Data Mining | 2009

Parameterized Contrast in Second Order Soft Co-occurrences: A Novel Text Representation Technique in Text Mining and Knowledge Extraction

Amir Hossein Razavi; Stan Matwin; Diana Inkpen; Alexandre Kouznetsov

In this article, we present a novel statistical representation method for knowledge extraction from a corpus of short texts. We then introduce a contrast parameter that can be adjusted to target different conceptual levels in text mining and knowledge extraction. The method is based on second-order co-occurrence vectors, whose efficiency for representing meaning has been established in many applications, especially for representing word senses in different contexts and for disambiguation. We evaluate our method on two tasks: classifying textual descriptions of dreams, and classifying medical abstracts for systematic reviews.
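
The core of a second-order co-occurrence representation is that a text is described by the contexts of its words rather than by the words themselves. A minimal sketch, assuming window-based co-occurrence counts and averaging of word vectors; the paper's contrast parameter is not modeled because its definition is not given here, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def first_order_vectors(corpus: list[list[str]], window: int = 2) -> dict[str, dict[str, float]]:
    """Map each word to counts of the words occurring within `window` positions of it."""
    vectors: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for tokens in corpus:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1.0
    return vectors


def second_order_vector(text: list[str], vectors: dict[str, dict[str, float]]) -> dict[str, float]:
    """Represent a short text as the average first-order vector of its known words."""
    known = [w for w in text if w in vectors]
    total: dict[str, float] = defaultdict(float)
    for word in known:
        for context, count in vectors[word].items():
            total[context] += count / len(known)
    return dict(total)


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


corpus = [["doctor", "treats", "patient"], ["nurse", "treats", "patient"]]
vectors = first_order_vectors(corpus)
doctor = second_order_vector(["doctor"], vectors)
nurse = second_order_vector(["nurse"], vectors)
print(round(cosine(doctor, nurse), 6))  # 1.0: no shared words, identical contexts
```

A bag-of-words comparison of "doctor" and "nurse" would give a similarity of zero; the second-order vectors match because both words occur with the same neighbours, which is what makes the representation useful for short texts.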


Canadian Conference on Artificial Intelligence | 2010

Using classifier performance visualization to improve collective ranking techniques for biomedical abstracts classification

Alexandre Kouznetsov; Nathalie Japkowicz

The purpose of this work is to improve the selection of algorithms for classifier committees that are applied to reducing the workload of human experts in building systematic reviews used in evidence-based medicine. We focus on clustering pre-selected classifiers based on a multi-measure evaluation of their prediction performance, expressed as a projection from a high-dimensional space to a two-dimensional one that can be visualized. The best classifier from each cluster was selected and included in the committee. We applied the committee of classifiers to rank biomedical abstracts based on their predicted relevance to the topic under review. We identified a subset of abstracts that represents the bottom of the ranked list (predicted as irrelevant). We used false negatives (relevant articles mistakenly ranked at the bottom) as the final performance measure. Our early experiments demonstrate that the classifier committee built using our new approach outperformed committees of classifiers created arbitrarily from the same list of pre-selected classifiers.
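
A minimal sketch of selecting a committee by clustering classifiers on their performance vectors, assuming plain k-means run directly in the multi-measure space (the paper clusters after projecting to two dimensions for visualization) and "best" meaning highest on one chosen measure. All classifier names and scores are hypothetical. The false-negative count mirrors the final measure described above.

```python
import math

Vector = tuple[float, ...]


def kmeans(points: list[Vector], k: int, iterations: int = 20) -> list[int]:
    """Plain k-means with deterministic farthest-point initialisation; returns cluster ids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_committee(performance: dict[str, Vector], k: int, measure: int = 0) -> list[str]:
    """Cluster classifiers by their performance vectors; keep the best one per cluster."""
    names = list(performance)
    labels = kmeans([performance[n] for n in names], k)
    best: dict[int, str] = {}
    for name, c in zip(names, labels):
        if c not in best or performance[name][measure] > performance[best[c]][measure]:
            best[c] = name
    return sorted(best.values())


def false_negatives_in_bottom(ranked_labels: list[int], bottom_k: int) -> int:
    """Relevant abstracts (label 1) that fall into the bottom `bottom_k` positions."""
    return sum(ranked_labels[len(ranked_labels) - bottom_k:])


# Hypothetical (accuracy, recall, AUC) scores for four pre-selected classifiers.
performance = {
    "nb": (0.80, 0.90, 0.85),
    "cnb": (0.82, 0.88, 0.86),
    "svm": (0.90, 0.60, 0.88),
    "tree": (0.88, 0.62, 0.80),
}
print(select_committee(performance, k=2))                      # ['cnb', 'svm']
print(false_negatives_in_bottom([1, 1, 0, 1, 0, 0, 1, 0], 3))  # 1
```

Picking one member per cluster keeps the committee diverse: two classifiers with near-identical behaviour (here the two Bayesian ones) would otherwise both take seats.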


Advanced Information Networking and Applications | 2011

Algorithm for Population of Object Property Assertions Derived from Telecom Contact Centre Product Support Documentation

Alexandre Kouznetsov; Jonas B. Laurila; Christopher J. O. Baker; Bradley Shoebottom

The relay of information from technical documentation by contact center workers to assist clients is limited by industry-standard storage formats and query mechanisms. Here we present and evaluate a new methodology for processing technical documents and tagging them against a Telecom Hardware domain ontology. We deploy classical ontological NLP approaches to extract information from both text segments and tables, identifying text segments, named entities and relations between named entities described by an existing T-Box. We describe a method for scoring candidate object property assertions derived from text before populating the Telecom Hardware ontology. In our algorithm, we leverage customized gazetteer lists, including lists specific to object property synonyms, and use functions of the distance between co-occurring terms to score candidate A-Box object property assertions. We review the performance of this approach with a use case in which Tier 1 and Tier 2 call centre agents use a visual query tool, TopBraid Live, to interrogate the instantiated Telecom Hardware ontology for information relevant to the needs of clients.
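
The abstract names the ingredients (entity gazetteers, object property synonym lists, a function of the distance between co-occurring terms, and an existing T-Box) but not the exact scoring function. A minimal sketch, assuming a score of 1/distance in tokens and a domain/range check against a toy T-Box. `ENTITIES`, `PROPERTY_SYNONYMS`, `TBOX`, `candidate_assertions` and the product names are all hypothetical.

```python
# Hypothetical gazetteers and T-Box fragment; none of these names come from the paper.
ENTITIES = {"x100": "Router", "psu-20": "PowerSupply", "card-7": "LineCard"}
PROPERTY_SYNONYMS = {"includes": "hasComponent", "contains": "hasComponent", "supports": "supportsCard"}
TBOX = {"hasComponent": ("Router", "PowerSupply"), "supportsCard": ("Router", "LineCard")}  # (domain, range)


def candidate_assertions(sentence: str, max_distance: int = 8) -> list[tuple[str, str, str, float]]:
    """Score (subject, property, object) candidates by the token distance between the two entities."""
    tokens = [t.strip(".,;:()").lower() for t in sentence.split()]
    mentions = [(i, t) for i, t in enumerate(tokens) if t in ENTITIES]
    candidates = []
    for a, (i, subject) in enumerate(mentions):
        for j, obj in mentions[a + 1:]:
            distance = j - i
            if distance > max_distance:
                continue
            for token in tokens[i + 1:j]:  # property synonyms between the two mentions
                prop = PROPERTY_SYNONYMS.get(token)
                if prop and TBOX[prop] == (ENTITIES[subject], ENTITIES[obj]):
                    candidates.append((subject, prop, obj, 1.0 / distance))
                    break  # keep the first property that fits the T-Box domain and range
    return sorted(candidates, key=lambda c: c[3], reverse=True)


for assertion in candidate_assertions("The X100 includes PSU-20 and supports CARD-7."):
    print(assertion)
```

The T-Box check is what rejects the spurious pairing of PSU-20 with "supports" CARD-7; distance alone would have ranked it above the correct X100 assertion.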


Journal of the American Medical Informatics Association | 2011

Performance of SVM and Bayesian classifiers on the systematic review classification task

Stan Matwin; Alexandre Kouznetsov; Diana Inkpen; Oana Frunza; Peter O'Blenis


Semantic Mining in Biomedicine | 2010

Assessment of NER solutions against the first and second CALBC silver standard corpus

Dietrich Rebholz-Schuhmann; Antonio Jimeno-Yepes; Chen Li; Senay Kafkas; Ian Lewin; Ning Kang; Peter Corbett; David Milward; Ekaterina Buyko; Elena Beisswanger; Kerstin Hornbostel; Alexandre Kouznetsov; René Witte; Jonas B. Laurila; Christopher J. O. Baker; Cheng-Ju Kuo; Simon Clematide; Fabio Rinaldi; Richárd Farkas; György Móra; Kazuo Hara; Laura I. Furlong; Michael Rautschka; Mariana L. Neves; Alberto Pascual-Montano; Qi Wei; Nigel Collier; Md. Faisal Mahbub Chowdhury; Alberto Lavelli; Rafael Berlanga Llavori


OWL: Experiences and Directions | 2010

Leverage of OWL-DL axioms in a Contact Centre for Technical Product Support

Alexandre Kouznetsov; Bradley Shoebottom; René Witte; Christopher J. O. Baker

Collaboration


Top Co-Authors

Jonas B. Laurila

University of New Brunswick

Chen Li

European Bioinformatics Institute

David Milward

St John's Innovation Centre
