
Publication


Featured research published by Pieter W. Adriaans.


Genome Biology | 2008

Overview of BioCreative II gene mention recognition

Larry Smith; Lorraine K. Tanabe; Rie Johnson nee Ando; Cheng-Ju Kuo; I-Fang Chung; Chun-Nan Hsu; Yu-Shi Lin; Roman Klinger; Christoph M. Friedrich; Kuzman Ganchev; Manabu Torii; Hongfang Liu; Barry Haddow; Craig A. Struble; Richard J. Povinelli; Andreas Vlachos; William A. Baumgartner; Lawrence Hunter; Bob Carpenter; Richard Tzong-Han Tsai; Hong-Jie Dai; Feng Liu; Yifei Chen; Chengjie Sun; Sophia Katrenko; Pieter W. Adriaans; Christian Blaschke; Rafael Torres; Mariana Neves; Preslav Nakov

Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used and the results varied with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use of the lowest scoring submissions.
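The paper's own combination of submissions is more elaborate, but the core idea of pooling many systems' outputs can be sketched as span-level majority voting; the data format and vote threshold below are illustrative, not taken from the paper:

```python
from collections import Counter

def combine_mentions(system_outputs, min_votes):
    """Combine gene-mention spans from several systems by voting.

    system_outputs: list of sets of (start, end) character spans,
    one set per participating system (a hypothetical format).
    A span is kept if at least `min_votes` systems proposed it.
    """
    votes = Counter(span for spans in system_outputs for span in spans)
    return {span for span, n in votes.items() if n >= min_votes}

# Three toy systems proposing spans in one sentence:
outputs = [{(0, 4), (10, 15)}, {(0, 4)}, {(0, 4), (20, 25)}]
print(combine_mentions(outputs, min_votes=2))  # {(0, 4)}
```

Requiring fewer votes trades precision for recall, which is one way low-scoring submissions can still improve a combined result.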


KDECB'06 Proceedings of the 1st international conference on Knowledge discovery and emergent complexity in bioinformatics | 2006

Learning relations from biomedical corpora using dependency trees

Sophia Katrenko; Pieter W. Adriaans

In this paper we address the relation learning problem in the biomedical domain. We propose a representation which takes the syntactic information into account and allows for using different machine learning methods. To carry out the syntactic analysis, three parsers were used: LinkParser, Minipar, and the Charniak parser. The results we have obtained are comparable to the performance of relation learning systems in the biomedical domain and in some cases outperform them. In addition, we have studied the impact of ensemble methods on learning relations using the proposed representation. Given that recall is very important for relation learning, we explored ways of improving it. It has been shown that ensemble methods provide higher recall and precision than individual classifiers alone.
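As a rough illustration of how syntactic structure can feed a relation learner, the sketch below recovers the path between two candidate entities in a toy dependency tree; the tree, entity names, and path representation are hypothetical, not the paper's actual format:

```python
from collections import deque

def shortest_path(edges, a, b):
    """Shortest path between two tokens in the undirected view of a
    dependency tree, given as {head: [dependent, ...]}."""
    adj = {}
    for head, deps in edges.items():
        for dep in deps:
            adj.setdefault(head, []).append(dep)
            adj.setdefault(dep, []).append(head)
    prev, queue = {a: None}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:  # reconstruct the path back to the start
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

# Toy parse of "IL-2 activates NF-kB": the verb heads both arguments.
tree = {"activates": ["IL-2", "NF-kB"]}
print(shortest_path(tree, "IL-2", "NF-kB"))  # ['IL-2', 'activates', 'NF-kB']
```

The tokens along such a path can then serve as features for any standard classifier, which is what makes the representation parser-agnostic.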


international colloquium on grammatical inference | 2002

The EMILE 4.1 Grammar Induction Toolbox

Pieter W. Adriaans; Marco Vervoort

The EMILE 4.1 toolbox is intended to help researchers analyze the grammatical structure of free text. The basic theoretical concepts behind the EMILE algorithm are expressions and contexts. The idea is that expressions of the same syntactic type can be substituted for each other in the same context. By performing a large statistical cluster analysis on the sentences of the text, EMILE tries to identify traces of expressions that have this substitutability relation. If enough statistical evidence exists for the existence of a grammatical type, EMILE creates such a type. Fundamental notions in the EMILE 4.1 algorithm are the so-called characteristic expressions and contexts. An expression of type T is characteristic for T if it only appears in contexts of type T. The notion of characteristic contexts and expressions boosts the learning capacities of the EMILE 4.1 algorithm. The EMILE algorithm is relatively scalable: it can easily analyze texts of up to 100,000 sentences on a workstation. The EMILE tool has been used in various domains, among others biomedical research [Adriaans, 2001b] and the identification of ontologies and semantic learning [Adriaans et al., 1993].
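A minimal sketch of the substitutability idea, not EMILE's actual algorithm (which is statistical and far more elaborate): collect (context, expression) pairs from sentences and group expressions that occur in the same context.

```python
from collections import defaultdict

def context_expression_pairs(sentences, max_len=3):
    """Yield (context, expression) pairs: the context is the sentence
    with the expression cut out, split into a left and right part."""
    pairs = set()
    for s in sentences:
        words = s.split()
        for i in range(len(words)):
            for j in range(i + 1, min(i + 1 + max_len, len(words) + 1)):
                expr = tuple(words[i:j])
                ctx = (tuple(words[:i]), tuple(words[j:]))
                pairs.add((ctx, expr))
    return pairs

def substitution_classes(sentences):
    """Group expressions sharing a context: a crude proxy for EMILE's
    clustering of substitutable expressions into grammatical types."""
    by_ctx = defaultdict(set)
    for ctx, expr in context_expression_pairs(sentences):
        by_ctx[ctx].add(expr)
    return {ctx: exprs for ctx, exprs in by_ctx.items() if len(exprs) > 1}

classes = substitution_classes(["the cat sleeps", "the dog sleeps"])
# The context ("the" _ "sleeps") groups ("cat",) and ("dog",) together.
```

Where EMILE additionally demands statistical evidence and characteristic expressions before committing to a type, this sketch promotes any shared context to a class.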


conference on current trends in theory and practice of informatics | 2000

Towards High Speed Grammar Induction on Large Text Corpora

Pieter W. Adriaans; Marten Trautwein; Marco Vervoort

In this paper we describe an efficient and scalable implementation for grammar induction based on the EMILE approach [2,3,4,5,6]. The current EMILE 4.1 implementation [11] is one of the first efficient grammar induction algorithms that work on free text. Although EMILE 4.1 is far from perfect, it enables researchers to do empirical grammar induction research on various types of corpora. The EMILE approach is based on notions from categorial grammar (cf. [10]), which is known to generate the class of context-free languages. EMILE learns from positive examples only (cf. [1,7,9]). We describe the algorithms underlying the approach and some interesting practical results on small and large text collections. As shown in the articles mentioned above, in the limit EMILE learns the correct grammatical structure of a language from sentences of that language. The conducted experiments show that, put into practice, EMILE 4.1 is efficient and scalable. The current implementation learns a subclass of the shallow context-free languages. This subclass seems sufficiently rich to be of practical interest. EMILE especially seems to be a valuable tool in the context of syntactic and semantic analysis of large text corpora.


workflows in support of large scale science | 2007

WS-VLAM: towards a scalable workflow system on the grid

Vladimir Korkhov; Dmitry Vasyunin; Adianto Wibisono; Víctor Guevara-Masís; Adam Belloum; Cees de Laat; Pieter W. Adriaans; Louis O. Hertzberger

Large scale scientific applications require extensive support from middleware and frameworks that provide the capabilities for distributed execution in the Grid environment. One example of such a framework is a Grid-enabled workflow management system. In this paper we present the WS-VLAM workflow management system, describe its current design, and outline the developments aimed at supporting efficient and scalable execution of large workflow applications on the Grid.


international colloquium on grammatical inference | 2006

Using MDL for grammar induction

Pieter W. Adriaans; Ceriel J. H. Jacobs

In this paper we study the application of the Minimum Description Length principle (or two-part-code optimization) to grammar induction in the light of recent developments in Kolmogorov complexity theory. We focus on issues that are important for the construction of effective compression algorithms. We define an independent measure for the quality of a theory given a data set: the randomness deficiency. This is a measure of how typical the data set is for the theory. It cannot be computed, but in many relevant cases it can be approximated. An optimal theory has minimal randomness deficiency. Using results from [4] and [2] we show that:

– Shorter code does not necessarily lead to better theories. We prove that, in DFA induction, divergence of randomness deficiency and MDL code can occur already as a result of a single deterministic merge of two nodes.
– Contrary to what is suggested by the results of [6], there is no fundamental difference between positive and negative data from an MDL perspective.
– MDL is extremely sensitive to the correct calculation of code length: model code and data-to-model code.

These results show why the applications of MDL to grammar induction so far have been disappointing. We show how the theoretical results can be deployed to create an effective algorithm for DFA induction. However, we believe that, since MDL is a global optimization criterion, MDL-based solutions will in many cases be less effective in problem domains where local optimization criteria can be easily calculated. The algorithms were tested on the Abbadingo problems ([10]). The code was written in Java, using the Satin ([17]) divide-and-conquer system that runs on top of the Ibis ([18]) Grid programming environment.
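As a toy illustration of two-part MDL scoring for DFA induction (every encoding choice below is illustrative; the paper's constructions are finer-grained):

```python
import math
from itertools import product

def accepted(dfa, start, finals, alphabet, max_len):
    """Enumerate all strings up to max_len accepted by the DFA
    (dfa: dict mapping (state, symbol) -> state)."""
    out = []
    for n in range(max_len + 1):
        for w in product(alphabet, repeat=n):
            state = start
            for sym in w:
                state = dfa[(state, sym)]
            if state in finals:
                out.append(w)
    return out

def two_part_code(dfa, start, finals, alphabet, data, max_len):
    """Toy two-part MDL score: model bits plus data-to-model bits."""
    states = {s for (s, _) in dfa} | set(dfa.values())
    # Model code: each transition names one of |states| target states.
    model_bits = len(dfa) * math.log2(len(states)) if len(states) > 1 else 0.0
    # Data-to-model code: index each data string within the accepted
    # set M, at log2 |M| bits per string.
    m = accepted(dfa, start, finals, alphabet, max_len)
    if any(w not in m for w in data):
        return math.inf  # the model does not cover the data
    data_bits = len(data) * math.log2(len(m))
    return model_bits + data_bits

# Two-state DFA over {a, b} accepting strings with an even number of b's:
dfa = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 0}
print(two_part_code(dfa, 0, {0}, ["a", "b"], [("a", "b", "b")], 3))  # 7.0
```

Merging states shrinks the model bits but grows the accepted set, and hence the data-to-model bits; the paper's point is that minimizing this sum does not by itself guarantee a good fit (low randomness deficiency).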


cluster computing and the grid | 2007

Using Jade agent framework to prototype an e-Science workflow bus

Zhiming Zhao; Adam Belloum; C. de Laat; Pieter W. Adriaans; Bob Hertzberger

Most of the existing scientific workflow management systems (SWMS) are driven by applications from specific domains and are developed in academic projects. Introducing an existing SWMS to a new domain is challenging: not only do the workflow model and description language not easily fit new problem domains, but the unstable development state of existing systems also fails to provide all the functionality required by the new applications, and thus carries a high development risk. Aggregating different workflow systems into one generic environment enables the sharing of both components and processes between experiments, and promotes knowledge transfer between domains. The workflow bus approach integrates different e-Science workflow engines via a software bus. In this paper, we present the basic idea of the workflow bus and discuss how the Jade agent framework can be used to prototype the runtime infrastructure of a workflow bus.


international conference on conceptual structures | 2007

WS-VLAM: A GT4 Based Workflow Management System

Adianto Wibisono; Dmitry Vasyunin; Vladimir Korkhov; Zhiming Zhao; Adam Belloum; Cees de Laat; Pieter W. Adriaans; Bob Hertzberger

Generic Grid middleware, e.g., Globus Toolkit 4 (GT4), provides basic services for scientific workflow management systems to discover, store and integrate workflow components. Using state-of-the-art Grid services can advance the functionality of a workflow engine in orchestrating distributed Grid resources. In this paper, we present our work on migrating VLAM-G, a Grid workflow engine based on GT2, to GT4. We discuss how the rich set of services provided by GT4 is used in the new design to realize user interactivity, interoperability and monitoring. The experimental results show that use cases from previous systems can be migrated seamlessly into the new architecture.


Journal of Artificial Intelligence Research | 2010

Using local alignments for relation recognition

Sophia Katrenko; Pieter W. Adriaans; Maarten van Someren

This paper discusses the problem of marrying structural similarity with semantic relatedness for Information Extraction from text. Aiming at accurate recognition of relations, we introduce local alignment kernels and explore various possibilities of using them for this task. We give a definition of a local alignment (LA) kernel based on the Smith-Waterman score as a sequence similarity measure and proceed with a range of possibilities for computing similarity between elements of sequences. We show how distributional similarity measures obtained from unlabeled data can be incorporated into the learning task as semantic knowledge. Our experiments suggest that the LA kernel yields promising results on various biomedical corpora, outperforming two baselines by a large margin. An additional series of experiments has been conducted on data sets of seven general relation types, where the performance of the LA kernel is comparable to the current state-of-the-art results.
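The Smith-Waterman score underlying the LA kernel is a standard dynamic program; a minimal token-level version is sketched below (the similarity function and gap cost are placeholders, not the parameters used in the paper):

```python
def smith_waterman(s, t, sim, gap=1.0):
    """Smith-Waterman local alignment score between two token
    sequences: the best-scoring pair of contiguous subsequences,
    allowing matches, mismatches, and gaps."""
    n, m = len(s), len(t)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0.0,                                   # restart
                          H[i - 1][j - 1] + sim(s[i - 1], t[j - 1]),
                          H[i - 1][j] - gap,                     # gap in t
                          H[i][j - 1] - gap)                     # gap in s
            best = max(best, H[i][j])
    return best

match = lambda a, b: 2.0 if a == b else -1.0
print(smith_waterman("protein binds target".split(),
                     "enzyme binds target".split(), match))  # 4.0
```

Replacing the exact-match `sim` with a distributional similarity over words is where the unlabeled-data semantics enters in the paper's setting.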


IEEE Transactions on Information Theory | 2009

Approximation of the Two-Part MDL Code

Pieter W. Adriaans; Paul M. B. Vitányi

Approximation of the optimal two-part minimum description length (MDL) code for given data, through successive monotonically length-decreasing two-part MDL codes, has the following properties: (i) computation of each step may take arbitrarily long; (ii) we may not know when we reach the optimum, or whether we will reach the optimum at all; (iii) the sequence of models generated may not monotonically improve the goodness of fit; but (iv) the model associated with the optimum has (almost) the best goodness of fit. To express the practically interesting goodness of fit of individual models for individual data sets we have to rely on Kolmogorov complexity.
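The two-part code in question can be stated compactly; a standard formulation from the Kolmogorov structure function literature (for finite-set models M containing the data x) is:

```latex
\Lambda(x) \;=\; \min_{M \ni x} \bigl[\, K(M) + \log_2 |M| \,\bigr],
```

where \(K(M)\) is the model cost and \(\log_2 |M|\) the data-to-model cost, and the goodness of fit of a model \(M\) is measured by the randomness deficiency \(\delta(x \mid M) = \log_2 |M| - K(x \mid M)\), which is small exactly when x is typical for M.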

Collaboration


Dive into Pieter W. Adriaans's collaborations.

Top Co-Authors

Adam Belloum
University of Amsterdam

Marco Roos
Leiden University Medical Center