Richard Eckart de Castilho

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Richard Eckart de Castilho is active.

Explore More

Publication

Featured researches published by Richard Eckart de Castilho.

international conference on computational linguistics | 2014

A broad-coverage collection of portable NLP components for building shareable analysis pipelines

Richard Eckart de Castilho; Iryna Gurevych

Due to the diversity of natural language processing (NLP) tools and resources, combining them into processing pipelines is an important issue, and sharing these pipelines with others remains a problem. We present DKPro Core, a broad-coverage component collection integrating a wide range of third-party NLP tools and making them interoperable. Contrary to other recent endeavors that rely heavily on web services, our collection consists only of portable components distributed via a repository, making it particularly interesting with respect to sharing pipelines with other researchers, embedding NLP pipelines in applications, and the use on high-performance computing clusters. Our collection is augmented by a novel concept for automatically selecting and acquiring resources required by the components at runtime from a repository. Based on these contributions, we demonstrate a way to describe a pipeline such that all required software and resources can be automatically obtained, making it easy to share it with others, e.g. in order to reproduce results or as examples in teaching, documentation, or publications.

Proceedings of the 2011 workshop on Data infrastructurEs for supporting information retrieval evaluation | 2011

A lightweight framework for reproducible parameter sweeping in information retrieval

Richard Eckart de Castilho; Iryna Gurevych

Information retrieval experiments consist of multiple tasks, such as preprocessing and evaluation, each subject to various parameters affecting their results. Dependencies between tasks exist such that one task may have to use the output of another. Many scientific workflow systems come with sophisticated graphical authoring tools but do not integrate well with integrated development environments used for programming. The framework for dataflow-based parameter sweeping experiments introduced in this paper is lightweight, provides support for declaratively setting up experiments, and integrates seamlessly with Java-based development environments. To reduce the computational effort of running an experiment with many different parameter settings, the framework uses dataflow dependency information to maintain and reuse intermediate results.

meeting of the association for computational linguistics | 2014

Automatic Annotation Suggestions and Custom Annotation Layers in WebAnno

Seid Muhie Yimam; Chris Biemann; Richard Eckart de Castilho; Iryna Gurevych

In this paper, we present a flexible approach to the efficient and exhaustive manual annotation of text documents. For this purpose, we extend WebAnno (Yimam et al., 2013) an open-source web-based annotation tool. 1 While it was previously limited to specific annotation layers, our extension allows adding and configuring an arbitrary number of layers through a web-based UI. These layers can be annotated separately or simultaneously, and support most types of linguistic annotations such as spans, semantic classes, dependency relations, lexical chains, and morphology. Further, we tightly integrate a generic machine learning component for automatic annotation suggestions of span annotations. In two case studies, we show that automatic annotation suggestions, combined with our split-pane UI concept, significantly reduces annotation time.

Database | 2016

Text mining resources for the life sciences

Piotr Przybyła; Matthew Shardlow; Sophie Aubin; Robert Bossy; Richard Eckart de Castilho; Stelios Piperidis; John McNaught; Sophia Ananiadou

Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable—those that have the crucial ability to share information, enabling smooth integration and reusability.

linguistic annotation workshop | 2017

Representation and Interchange of Linguistic Annotation. An In-Depth, Side-by-Side Comparison of Three Designs

Richard Eckart de Castilho; Nancy Ide; Emanuele Lapponi; Stephan Oepen; Keith Suderman; Erik Velldal; Marc Verhagen

For decades, most self-respecting linguistic engineering initiatives have designed and implemented custom representations for various layers of, for example, morphological, syntactic, and semantic analysis. Despite occasional efforts at harmonization or even standardization, our field today is blessed with a multitude of ways of encoding and exchanging linguistic annotations of these types, both at the levels of ‘abstract syntax’, naming choices, and of course file formats. To a large degree, it is possible to work within and across design plurality by conversion, and often there may be good reasons for divergent design reflecting differences in use. However, it is likely that some abstract commonalities across choices of representation are obscured by more superficial differences, and conversely there is no obvious procedure to tease apart what actually constitute contentful vs. mere technical divergences. In this study, we seek to conceptually align three representations for common types of morpho-syntactic analysis, pinpoint what in our view constitute contentful differences, and reflect on the underlying principles and specific requirements that led to individual choices. We expect that a more in-depth understanding of these choices across designs may led to increased harmonization, or at least to more informed design of future representations.

ieee international conference semantic computing | 2011

Semantic Service Retrieval Based on Natural Language Querying and Semantic Similarity

Richard Eckart de Castilho; Iryna Gurevych

In this paper, we address the task of semantic service retrieval based on natural language queries. We analyze identifiers of services, operations, and parameters extracted from WSDL service descriptions with respect to their semantic content. In order to measure the semantic similarity between query and service description, we introduce a novel computationally efficient document similarity measure based on information content and fuzzy set theory.

meeting of the association for computational linguistics | 2015

In-tool Learning for Selective Manual Annotation in Large Corpora

Erik-Lân Do Dinh; Richard Eckart de Castilho; Iryna Gurevych

We present a novel approach to the selective annotation of large corpora through the use of machine learning. Linguistic search engines used to locate potential instances of an infrequent phenomenon do not support ranking the search results. This favors the use of high-precision queries that return only a few results over broader queries that have a higher recall. Our approach introduces a classifier used to rank the search results and thus helping the annotator focus on those results with the highest potential of being an instance of the phenomenon in question, even in low-precision queries. The classifier is trained in an in-tool fashion, except for preprocessing relying only on the manual annotations done by the users in the querying tool itself. To implement this approach, we build upon CSniper1, a web-based multi-user search and annotation tool.

Archive | 2017

Collaborative Web-Based Tools for Multi-layer Text Annotation

Chris Biemann; Kalina Bontcheva; Richard Eckart de Castilho; Iryna Gurevych; Seid Muhie Yimam

Effectively managing the collaboration of many annotators is a crucial ingredient for the success of larger annotation projects. For collaboration, web-based tools offer a low-entry way gathering annotations from distributed contributors. While the management structure of annotation tools is more or less stable across projects, the kind of annotations vary widely between projects. The challenge for web-based tools for multi-layer text annotation is to combine ease of use and availability through the web with maximal flexibility regarding the types and layers of annotations. In this chapter, we outline requirements for web-based annotation tools in detail and review a variety of tools in respect to these requirements. Further, we discuss two web-based multi-layer annotation tools in detail: GATE Teamware and WebAnno. While differing in some aspects, both tools largely fulfill the requirements for today’s web-based annotation tools. Finally, we point out further directions, such as increased schema flexibility and tighter integration of automation for annotation suggestions.

meeting of the association for computational linguistics | 2013