Publication


Featured research published by Filip Ginter.


BMC Bioinformatics | 2008

Comparative analysis of five protein-protein interaction corpora

Sampo Pyysalo; Antti Airola; Juho Heimonen; Jari Björne; Filip Ginter; Tapio Salakoski

Background: Growing interest in the application of natural language processing methods to biomedical text has led to an increasing number of corpora and methods targeting protein-protein interaction (PPI) extraction. However, there is no general consensus regarding PPI annotation, and consequently resources are largely incompatible and methods are difficult to evaluate.

Results: We present the first comparative evaluation of the diverse PPI corpora, performing quantitative evaluation using two separate information extraction methods as well as detailed statistical and qualitative analyses of their properties. For the evaluation, we unify the corpus PPI annotations to a shared level of information, consisting of undirected, untyped binary interactions of non-static types with no identification of the words specifying the interaction, no negations, and no interaction certainty.

We find that the F-score performance of a state-of-the-art PPI extraction method varies on average by 19 percentage points, and in some cases by over 30 percentage points, between the different evaluated corpora. The differences stemming from the choice of corpus can thus be substantially larger than the differences between PPI extraction methods, which places definite limits on the ability to compare methods evaluated on different resources. We analyse a number of potential sources for these differences and identify factors explaining approximately half of the variance. We further suggest ways in which the difficulty of the PPI extraction tasks codified by different corpora can be determined to advance comparability. Our analysis also identifies points of agreement and disagreement in PPI corpus annotation that are rarely stated explicitly by the authors of the corpora.

Conclusions: Our comparative analysis uncovers key similarities and differences between the diverse PPI corpora, thus taking an important step towards standardization. A major practical contribution of this study is the conversion of the corpora into a shared format. The conversion software is freely available at http://mars.cs.utu.fi/PPICorpora.
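The unification of PPI annotations to a shared level of information can be illustrated with a minimal Python sketch. The `Interaction` record, its field names and the `STATIC_TYPES` set are hypothetical stand-ins, not the format used by the released conversion software, and negation handling is left out. The sketch only shows how type, direction, trigger words and certainty are discarded so that each annotation collapses to an unordered pair of entities.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Interaction:
    """Hypothetical corpus-specific PPI annotation (field names are illustrative)."""
    agent: str                       # entity ID of the first participant
    target: str                      # entity ID of the second participant
    itype: str                       # interaction type label
    directed: bool = False           # whether the corpus marks direction
    trigger: Optional[str] = None    # words specifying the interaction
    certainty: Optional[str] = None  # e.g. "speculative"


# Hypothetical placeholder: which labels count as static varies by corpus.
STATIC_TYPES = {"static"}


def unify(interactions: Iterable[Interaction]) -> Set[Tuple[str, str]]:
    """Reduce annotations to undirected, untyped binary interactions.

    Static relation types are dropped; type, direction, trigger words and
    certainty are discarded; the two participants are sorted so that
    A-B and B-A become the same pair, and duplicates collapse in the set.
    """
    pairs: Set[Tuple[str, str]] = set()
    for ann in interactions:
        if ann.itype in STATIC_TYPES:
            continue
        a, b = sorted((ann.agent, ann.target))
        pairs.add((a, b))
    return pairs
```

Sorting the two entity IDs, rather than storing a `frozenset`, keeps a self-interaction as a proper two-element pair.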


Bioinformatics | 2010

Complex event extraction at PubMed scale

Jari Björne; Filip Ginter; Sampo Pyysalo; Jun’ichi Tsujii; Tapio Salakoski

Motivation: There has recently been a notable shift in biomedical information extraction (IE) from relation models toward the more expressive event model, facilitated by the maturation of basic tools for biomedical text analysis and the availability of manually annotated resources. The event model allows detailed representation of complex natural language statements and can support a number of advanced text mining applications ranging from semantic search to pathway extraction. A recent collaborative evaluation demonstrated the potential of event extraction systems, yet there have so far been no studies of the generalization ability of the systems or of the feasibility of large-scale extraction.

Results: This study considers event-based IE at PubMed scale. We introduce a system combining publicly available, state-of-the-art methods for domain parsing, named entity recognition and event extraction, and test the system on a representative 1% sample of all PubMed citations. We present the first evaluation of the generalization performance of event extraction systems at this scale and show that, despite its computational complexity, event extraction from the entire PubMed is feasible. We further illustrate the value of the extraction approach through a number of analyses of the extracted information.

Availability: The event detection system and extracted data are open source licensed and available at http://bionlp.utu.fi/.


BMC Bioinformatics | 2012

University of Turku in the BioNLP'11 Shared Task

Jari Björne; Filip Ginter; Tapio Salakoski

Background: We present a system for extracting biomedical events (detailed descriptions of biomolecular interactions) from research articles, developed for the BioNLP'11 Shared Task. Our goal is to develop a system easily adaptable to different event schemes, following the theme of the BioNLP'11 Shared Task: generalization, the extension of event extraction to varied biomedical domains. Our system extends the Turku Event Extraction System, winner of the BioNLP'09 Shared Task, which uses support vector machines first to detect event-defining words and then to detect the relationships between them.

Results: Our current system successfully predicts events for every domain case introduced in the BioNLP'11 Shared Task, being the only system to participate in all eight tasks and all of their subtasks and achieving the best performance in four tasks. Following the Shared Task, we improve the system's F-score on the Infectious Diseases task from 42.57% to 53.87%, bringing performance into line with the similar GENIA Event Extraction and Epigenetics and Post-translational Modifications tasks. We evaluate the machine learning performance of the system by calculating learning curves for all tasks, identifying areas where additional annotated data could improve performance. Finally, we evaluate the use of system output on external articles as additional training data, a form of self-training.

Conclusions: We show that the updated Turku Event Extraction System can easily be adapted to all presently available event extraction targets, with competitive performance in most tasks. The scale of the performance gains between the 2009 and 2011 BioNLP Shared Tasks indicates that event extraction is still a new field requiring more work. We provide several analyses of event extraction methods and performance, highlighting potential directions for continued development.


International Journal of Medical Informatics | 2006

Evaluation of two dependency parsers on biomedical corpus targeted at protein-protein interactions.

Sampo Pyysalo; Filip Ginter; Tapio Pahikkala; Jorma Boberg; Jouni Järvinen; Tapio Salakoski

We present an evaluation of Link Grammar and Connexor Machinese Syntax, two major broad-coverage dependency parsers, on a custom hand-annotated corpus consisting of sentences regarding protein-protein interactions. In the evaluation, we apply the notion of an interaction subgraph, which is the subgraph of a dependency graph expressing a protein-protein interaction. We measure the performance of the parsers for recovery of individual dependencies, fully correct parses, and interaction subgraphs. For Link Grammar, an open system that can be inspected in detail, we further perform a comprehensive failure analysis, report specific causes of error, and suggest potential modifications to the grammar. We find that both parsers perform worse on biomedical English than previously reported on general English. While Connexor Machinese Syntax significantly outperforms Link Grammar, the failure analysis suggests specific ways in which the latter could be modified for better performance in the domain.


BMC Bioinformatics | 2005

Contextual weighting for Support Vector Machines in literature mining: an application to gene versus protein name disambiguation

Tapio Pahikkala; Filip Ginter; Jorma Boberg; Jouni Järvinen; Tapio Salakoski

Background: The ability to distinguish between genes and proteins is essential for understanding biological text. Support Vector Machines (SVMs) have proven very efficient in general data mining tasks. We explore their capability for the gene versus protein name disambiguation task.

Results: We incorporated into the conventional SVM a weighting scheme based on the distances of context words from the word to be disambiguated. This weighting scheme increased the performance of SVMs by five percentage points, to better than 85% as measured by the area under the ROC curve, and the weighted SVM outperformed both the Weighted Additive Classifier, which also incorporates the weighting, and the Naive Bayes classifier.

Conclusion: We show that the performance of SVMs can be improved by the proposed weighting scheme. Furthermore, our results suggest that in this study the increase in classification performance due to the weighting is greater than that obtained by selecting the underlying classifier or the kernel part of the SVM.
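The distance-based weighting can be sketched as a sparse bag-of-words vector in which each context word contributes a weight that shrinks with its distance from the ambiguous name; a linear SVM then compares two examples through the dot product of their vectors. The `1/d` weight, the window size and the function names below are assumptions made for illustration; the abstract does not specify the weighting function used in the paper.

```python
from collections import defaultdict
from typing import Dict, List


def weighted_context_features(tokens: List[str], target_index: int,
                              window: int = 3) -> Dict[str, float]:
    """Bag-of-words features weighted by distance from the ambiguous word.

    ASSUMPTION: a context word at distance d gets weight 1/d; repeated
    words accumulate their weights. The target word itself is skipped.
    """
    features: Dict[str, float] = defaultdict(float)
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for pos in range(lo, hi):
        if pos == target_index:
            continue
        d = abs(pos - target_index)
        features[tokens[pos].lower()] += 1.0 / d
    return dict(features)


def linear_kernel(x: Dict[str, float], y: Dict[str, float]) -> float:
    """Dot product of two sparse feature vectors, as seen by a linear SVM."""
    if len(x) > len(y):
        x, y = y, x  # iterate over the smaller vector
    return sum(v * y.get(k, 0.0) for k, v in x.items())
```

With `window=2`, the sentence "the expression of p53 gene was" with target "p53" gives weight 1.0 to "of" and "gene" and 0.5 to "expression" and "was"; "the" falls outside the window.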


BMC Bioinformatics | 2015

Application of the EVEX resource to event extraction and network construction: Shared Task entry and result analysis

Kai Hakala; Sofie Van Landeghem; Tapio Salakoski; Yves Van de Peer; Filip Ginter

Background: Modern methods for mining biomolecular interactions from literature typically make predictions based solely on the immediate textual context, in effect a single sentence. No prior work has been published on extending this context to information automatically gathered from the whole biomedical literature. Our motivation for this study is therefore to explore whether mutually supporting evidence, aggregated across several documents, can be used to improve the performance of state-of-the-art event extraction systems.

In this paper, we describe our participation in the latest BioNLP Shared Task using the large-scale text mining resource EVEX. We participated in the Genia Event Extraction (GE) and Gene Regulation Network (GRN) tasks with two separate systems. In the GE task, we implemented a re-ranking approach to improve the precision of an existing event extraction system, incorporating features from the EVEX resource. In the GRN task, our system relied solely on the EVEX resource and used a rule-based conversion algorithm between the EVEX and GRN formats.

Results: In the GE task, our re-ranking approach led to a modest performance increase and achieved first rank in the official Shared Task results with an F-score of 50.97%. Additionally, in this paper we explore and evaluate the use of distributed vector representations for this challenge. In the GRN task, we ranked fifth in the official results with strict and relaxed SER scores of 0.92 and 0.81, respectively. To improve upon these results, we have implemented a novel machine learning based conversion system and benchmarked its performance against the original rule-based system.

Conclusions: For the GRN task, we were able to produce a gene regulatory network from the EVEX data, warranting the use of such generic large-scale text mining data in network biology settings. A detailed performance and error analysis provides more insight into the relatively low recall rates. In the GE task, we demonstrate that both the re-ranking approach and the word vectors can provide slight performance improvements. A manual evaluation of the re-ranking results pinpoints some of the challenges faced in applying large-scale text mining knowledge to event extraction.
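The GRN results are reported as SER (slot error rate), an error rate for which lower is better. A minimal sketch of the usual definition, SER = (S + D + I) / N, where S, D and I count substituted, deleted (missed) and inserted (spurious) edges and N is the number of edges in the reference network. The matching rule (edges keyed by a (regulator, target) pair, with a type mismatch counted as a substitution) and all names are assumptions for illustration, not the official GRN task scorer.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (regulator, target); assumed edge key


def slot_error_rate(reference: Dict[Edge, str],
                    predicted: Dict[Edge, str]) -> float:
    """SER = (S + D + I) / N; 0.0 is perfect and values can exceed 1.0.

    ASSUMPTION: an edge matches on its (regulator, target) pair; a matched
    edge whose interaction type differs counts as a substitution.
    """
    if not reference:
        raise ValueError("reference network must be non-empty")
    matched = reference.keys() & predicted.keys()
    s = sum(1 for e in matched if reference[e] != predicted[e])  # wrong type
    d = len(reference.keys() - predicted.keys())                 # missed edges
    i = len(predicted.keys() - reference.keys())                 # spurious edges
    return (s + d + i) / len(reference)
```

Because insertions are counted against the size of the reference, a system that predicts many spurious edges can score above 1.0, while predicting nothing always scores exactly 1.0.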


International Conference on Natural Language Processing | 2006

Regular approximation of link grammar

Filip Ginter; Sampo Pyysalo; Jorma Boberg; Tapio Salakoski

We present a regular approximation of Link Grammar, a dependency-type formalism with context-free expressive power, as a first step toward a finite-state joint inference system. The approximation is implemented by limiting the maximum nesting depth of links, and otherwise retains the features of the original formalism. We present a string encoding of Link Grammar parses and describe finite-state machines implementing the grammar rules as well as the planarity, connectivity, ordering and exclusion axioms constraining grammatical Link Grammar parses. The regular approximation is then defined as the intersection of these machines. Finally, we implement two approaches to finite-state parsing using the approximation and discuss their feasibility. We find that parsing in the intersection grammars framework using the approximation is feasible, although inefficient, and we discuss several approaches to improve the efficiency.
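The axioms listed above can be checked directly on a single parse, which helps show what the finite-state machines must enforce. This Python sketch is not the finite-state construction from the paper: it represents a parse as a list of (left, right) word-index links and tests exclusion, planarity, connectivity and a bounded nesting depth, using one plausible definition of depth (the largest number of links stacked over one another). The ordering axiom, which depends on connector order in the lexicon, is not modelled, and all function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Link = Tuple[int, int]  # (left word index, right word index), left < right


def satisfies_exclusion(links: List[Link]) -> bool:
    """Exclusion: at most one link between any pair of words."""
    return len(set(links)) == len(links)


def is_planar(links: List[Link]) -> bool:
    """Planarity: no two links cross when drawn above the sentence."""
    for (a, b), (c, d) in combinations(links, 2):
        if a < c < b < d or c < a < d < b:
            return False
    return True


def is_connected(n_words: int, links: List[Link]) -> bool:
    """Connectivity: the links join all words into one component (union-find)."""
    parent = list(range(n_words))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)
    return len({find(w) for w in range(n_words)}) == 1


def nesting_depth(links: List[Link]) -> int:
    """Largest number of links stacked over one another (0 if no links)."""
    depth = 0
    for c, d in links:
        enclosing = sum(1 for (a, b) in links
                        if (a, b) != (c, d) and a <= c and d <= b)
        depth = max(depth, enclosing + 1)
    return depth


def within_approximation(n_words: int, links: List[Link], max_depth: int) -> bool:
    """Accept a parse only if it is grammatical and its depth is bounded."""
    return (satisfies_exclusion(links) and is_planar(links)
            and is_connected(n_words, links)
            and nesting_depth(links) <= max_depth)
```

Bounding the depth is what makes the language regular: a finite automaton only needs to remember a bounded stack of open links, which is the intuition behind limiting nesting depth in the approximation.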


BMC Bioinformatics | 2008

All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning

Antti Airola; Sampo Pyysalo; Jari Björne; Tapio Pahikkala; Filip Ginter; Tapio Salakoski


The Florida AI Research Society | 2005

Kernels Incorporating Word Positional Information in Natural Language Disambiguation Tasks.

Tapio Pahikkala; Sampo Pyysalo; Filip Ginter; Jorma Boberg; Jouni Järvinen; Tapio Salakoski


NODALIDA | 2009

Parsing Clinical Finnish: Experiments with Rule-Based and Statistical Dependency Parsers

Katri Haverinen; Filip Ginter; Veronika Laippala; Tapio Salakoski

Collaboration

Top Co-Authors

Sampo Pyysalo

Information Technology University

Jari Björne

Turku Centre for Computer Science

Jorma Boberg

Turku Centre for Computer Science

Jouni Järvinen

Turku Centre for Computer Science
