Publication


Featured research published by Juho Heimonen.


BMC Bioinformatics | 2008

Comparative analysis of five protein-protein interaction corpora

Sampo Pyysalo; Antti Airola; Juho Heimonen; Jari Björne; Filip Ginter; Tapio Salakoski

Background: Growing interest in the application of natural language processing methods to biomedical text has led to an increasing number of corpora and methods targeting protein-protein interaction (PPI) extraction. However, there is no general consensus regarding PPI annotation and consequently resources are largely incompatible and methods are difficult to evaluate.

Results: We present the first comparative evaluation of the diverse PPI corpora, performing quantitative evaluation using two separate information extraction methods as well as detailed statistical and qualitative analyses of their properties. For the evaluation, we unify the corpus PPI annotations to a shared level of information, consisting of undirected, untyped binary interactions of non-static types with no identification of the words specifying the interaction, no negations, and no interaction certainty. We find that the F-score performance of a state-of-the-art PPI extraction method varies on average 19 percentage units and in some cases over 30 percentage units between the different evaluated corpora. The differences stemming from the choice of corpus can thus be substantially larger than differences between the performance of PPI extraction methods, which suggests definite limits on the ability to compare methods evaluated on different resources. We analyse a number of potential sources for these differences and identify factors explaining approximately half of the variance. We further suggest ways in which the difficulty of the PPI extraction tasks codified by different corpora can be determined to advance comparability. Our analysis also identifies points of agreement and disagreement in PPI corpus annotation that are rarely explicitly stated by the authors of the corpora.

Conclusions: Our comparative analysis uncovers key similarities and differences between the diverse PPI corpora, thus taking an important step towards standardization. In the course of this study we have created a major practical contribution in converting the corpora into a shared format. The conversion software is freely available at http://mars.cs.utu.fi/PPICorpora.
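As a rough illustration of the evaluation setting described in this abstract, the sketch below reduces interaction annotations to undirected, untyped binary pairs and scores a prediction set by F-score. The function names and the toy protein identifiers are hypothetical, and this is not the authors' conversion software or evaluation code, only a minimal example of the idea.

```python
def to_undirected(pairs):
    """Reduce (a, b) interaction annotations to undirected, untyped pairs.

    Direction is discarded by storing each pair as a frozenset.
    Self-pairs (a == b) are dropped, a simplifying assumption of this sketch.
    """
    return {frozenset(p) for p in pairs if p[0] != p[1]}


def f_score(gold_pairs, predicted_pairs):
    """Balanced F-score of predicted pairs against gold pairs."""
    gold = to_undirected(gold_pairs)
    pred = to_undirected(predicted_pairs)
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up identifiers: ("B", "A") matches ("A", "B")
# because direction is ignored.
gold = [("A", "B"), ("B", "C")]
predicted = [("B", "A"), ("C", "D")]
score = f_score(gold, predicted)
```

Here one of two predictions is correct and one of two gold pairs is found, so precision and recall are both 0.5.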


Computational Intelligence | 2011

Extracting Contextualized Complex Biological Events with Rich Graph-Based Feature Sets

Jari Björne; Juho Heimonen; Filip Ginter; Antti Airola; Tapio Pahikkala; Tapio Salakoski

We describe a system for extracting complex events among genes and proteins from biomedical literature, developed in the context of the BioNLP’09 Shared Task on Event Extraction. For each event, the system extracts its text trigger, class, and arguments. In contrast to the approaches prevailing prior to the shared task, events can be arguments of other events, resulting in a nested structure that better captures the underlying biological statements. We divide the task into independent steps which we approach as machine learning problems. We define a wide array of features and in particular make extensive use of dependency parse graphs. A rule‐based postprocessing step is used to refine the output in accordance with the restrictions of the extraction task. In the shared task evaluation, the system achieved an F‐score of 51.95% on the primary task, the best performance among the participants. Currently, with modifications and improvements described in this article, the system achieves 52.86% F‐score on Task 1, the primary task, improving on its original performance. In addition, we also extend the system to Tasks 2 and 3, gaining F‐scores of 51.28% and 50.18%, respectively. The system thus addresses the BioNLP’09 Shared Task in its entirety and achieves the best performance on all three subtasks.


Meeting of the Association for Computational Linguistics | 2007

On the unification of syntactic annotations under the Stanford dependency scheme: A case study on BioInfer and GENIA

Sampo Pyysalo; Filip Ginter; Veronika Laippala; Katri Haverinen; Juho Heimonen; Tapio Salakoski

Several incompatible syntactic annotation schemes are currently used by parsers and corpora in biomedical information extraction. The recently introduced Stanford dependency scheme has been suggested to be a suitable unifying syntax formalism. In this paper, we present a step towards such unification by creating a conversion from the Link Grammar to the Stanford scheme. Further, we create a version of the BioInfer corpus with syntactic annotation in this scheme. We present an application-oriented evaluation of the transformation and assess the suitability of the scheme and our conversion to the unification of the syntactic annotations of BioInfer and the GENIA Treebank. We find that a highly reliable conversion is both feasible to create and practical, increasing the applicability of both the parser and the corpus to information extraction.


Global Media and China | 2017

Automated quantification of Reuters news using a receiver operating characteristic curve analysis: The Western media image of China

Jukka Aukia; Juho Heimonen; Tapio Pahikkala; Tapio Salakoski

Country images are increasingly popular but controversial policy concepts. The Western news media image of China is consequently a well-explored academic subject. This study contributes to the discussion by considering the news stream of an international news agency, in contrast to prior studies that analyze small-scale data sets. An automated dictionary method is proposed to analyze two Reuters data corpora, RCV1 (1996–1998) and TRC2 (2008–2009) (N = 1,386,000). The area under the receiver operating characteristic curve is employed and its development over time is statistically analyzed. China’s media image was found to be relatively positive in comparison with Japan, South Korea, and Taiwan. The results also suggest that economy has a greater positive impact on China’s image in Western media than culture, while politics has a negative impact. The results of the automated method are similar to those of a previous study in which a moderately sized data set was manually examined. This suggests that automated sentiment analysis can provide observations as reliable as those of manual analysis but with smaller labor costs.
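The abstract does not spell out how document scores and classes were defined, so the following is only a generic sketch of the statistic it names: the area under the ROC curve, computed as the probability that a randomly chosen item from one group scores higher than a randomly chosen item from the other (ties count as one half). The function name and the example scores are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive item has the higher score; ties contribute 0.5.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up dictionary-based sentiment scores for two groups of news items.
group_a = [0.9, 0.8, 0.3]
group_b = [0.5, 0.2]
value = auc(group_a, group_b)  # 5 of the 6 pairs favour group_a
```

A value of 0.5 means the two groups are indistinguishable by score, while values near 1 or 0 indicate a consistent difference in one direction. The quadratic loop is fine for a sketch; a rank-based computation would be preferable on corpora of the size mentioned above.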


S+SSPR 2014 Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition - Volume 8621 | 2014

Properties of Object-Level Cross-Validation Schemes for Symmetric Pair-Input Data

Juho Heimonen; Tapio Salakoski; Tapio Pahikkala

In bioinformatics, many learning tasks involve pair-input data, i.e., inputs representing object pairs where inputs are not independent. Two cross-validation schemes for symmetric pair-input data are considered. The mean and variance of cross-validation estimate deviations from respective generalization performances are examined in the situation where the learned model is applied to pairs of two previously unseen objects. In experiments with the task of learning protein functional similarities, large positive mean deviations were observed with the relaxed scheme due to training–validation dependencies, while the strict scheme yielded small negative mean deviations and higher variances. The properties of the strict scheme can be explained by the reduction in cross-validation training set sizes when avoiding training–validation dependencies. The results suggest that the strict scheme is preferable in the given setting.
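The abstract does not define the two schemes precisely, so the sketch below shows one plausible reading, stated here as an assumption: validation pairs consist of two held-out objects; the strict scheme trains only on pairs with no held-out object, while the relaxed scheme also trains on pairs that share one object with the validation pairs. The function name and protein identifiers are hypothetical.

```python
from itertools import combinations


def pair_split(objects, held_out, strict=True):
    """Split all unordered object pairs into training and validation pairs.

    Assumed reading of the schemes (not taken from the paper):
      - validation: both objects are held out
      - strict training: neither object is held out
      - relaxed training: additionally includes pairs with exactly one
        held-out object, which creates training-validation dependencies
    """
    held_out = set(held_out)
    train, valid = [], []
    for a, b in combinations(objects, 2):
        n_held = (a in held_out) + (b in held_out)
        if n_held == 2:
            valid.append((a, b))
        elif n_held == 0 or not strict:
            train.append((a, b))
    return train, valid


proteins = ["p1", "p2", "p3", "p4", "p5"]
strict_train, strict_valid = pair_split(proteins, ["p4", "p5"], strict=True)
relaxed_train, relaxed_valid = pair_split(proteins, ["p4", "p5"], strict=False)
```

With five objects and two held out, the strict split trains on 3 pairs and the relaxed split on 9, which illustrates the reduction in training set size that the abstract attributes to the strict scheme.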


International Conference on Well-Being in the Information Society | 2014

Documentation of the Clinical Phase of the Cardiac Rehabilitation Process in a Finnish University Hospital District

Lotta Kauhanen; Laura-Maria Murtola; Juho Heimonen; Tuija Leskinen; Kari K. Kalliokoski; Elina Raivo; Tapio Salakoski; Sanna Salanterä

Cardiac rehabilitation (CR) is an essential part of the treatment and recovery process of cardiac patients by which mortality can be reduced. CR is documented in the patient’s health records to ensure continuity of care. The aim of this study was to describe and evaluate the contents of the clinical phase documentation of CR according to the care notes of physical therapists and physiatrists. The data set used in this register-based study consisted of the electronic health records (EHR) of patients with any type of cardiac problem admitted to a Finnish university hospital district between 2005 and 2009. The main findings indicate that 1) only a small part of the eligible patients’ records include CR documentation, 2) the patients with CR documentation are relatively old when compared to the age distribution of all cardiac patients (p < 0.001), 3) the documentation does not systematically follow the national guidelines, 4) the evaluation of treatment is rarely documented, and 5) the most commonly documented therapy concerned walking and breathing exercises.


International Conference on Well-Being in the Information Society | 2012

Avoiding Hazards – What Can Health Care Learn from Aviation?

Olli Sjöblom; Juho Heimonen; Lotta Kauhanen; Veronika Laippala; Heljä Lundgrén-Laine; Laura-Maria Murtola; Tapio Salakoski; Sanna Salanterä

Effective methods are needed to identify and analyze risks to improve patient safety. Analysing patient records and learning from “touch and go” situations is one possible way to prevent hazardous conditions. The likelihood of an incident or accident may be markedly reduced if the risks can be efficiently diagnosed. With this approach, flight safety has been successfully improved over decades. Aviation and health care share many important similarities, so methods for improving safety could be transferred between the domains. In this paper, text mining, and especially clustering, is applied to identify lethal trends in both patient records and aviation, in order to compare and evaluate these trends in the two fields.


North American Chapter of the Association for Computational Linguistics | 2009

Extracting Complex Biological Events with Rich Graph-Based Feature Sets

Jari Björne; Juho Heimonen; Filip Ginter; Antti Airola; Tapio Pahikkala; Tapio Salakoski


Artificial Intelligence in Medicine | 2016

Comparison of automatic summarisation methods for clinical free text notes

Hans Moen; Laura-Maria Peltonen; Juho Heimonen; Antti Airola; Tapio Pahikkala; Tapio Salakoski; Sanna Salanterä


NODALIDA | 2009

Learning to Extract Biological Event and Relation Graphs

Jari Björne; Filip Ginter; Juho Heimonen; Sampo Pyysalo; Tapio Salakoski

Collaboration


Top Co-Authors

Jari Björne

Turku Centre for Computer Science
