Matúš Kalaš
University of Bergen
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Matúš Kalaš.
Bioinformatics | 2013
Jon C. Ison; Matúš Kalaš; Inge Jonassen; Dan Bolser; Mahmut Uludag; Hamish McWilliam; James Malone; Rodrigo Lopez; Steve Pettifer; Peter Rice
Motivation: Advancing the search, publication and integration of bioinformatics tools and resources demands consistent machine-understandable descriptions. A comprehensive ontology allowing such descriptions is therefore required. Results: EDAM is an ontology of bioinformatics operations (tool or workflow functions), types of data and identifiers, application domains and data formats. EDAM supports semantic annotation of diverse entities such as Web services, databases, programmatic libraries, standalone tools, interactive applications, data schemas, datasets and publications within bioinformatics. EDAM applies to organizing and finding suitable tools and data and to automating their integration into complex applications or workflows. It includes over 2200 defined concepts and has successfully been used for annotations and implementations. Availability: The latest stable version of EDAM is available in OWL format from http://edamontology.org/EDAM.owl and in OBO format from http://edamontology.org/EDAM.obo. It can be viewed online at the NCBO BioPortal and the EBI Ontology Lookup Service. For documentation and license please refer to http://edamontology.org. This article describes version 1.2 available at http://edamontology.org/EDAM_1.2.owl. Contact: [email protected]
Nucleic Acids Research | 2010
Steve Pettifer; Jon Ison; Matúš Kalaš; Dave Thorne; Philip McDermott; Inge Jonassen; Ali Liaquat; José María Fernández; Jose Manuel Rodriguez; David G. Pisano; Christophe Blanchet; Mahmut Uludag; Peter Rice; Edita Bartaseviciute; Kristoffer Rapacki; Maarten L. Hekkelman; Olivier Sand; Heinz Stockinger; Andrew B. Clegg; Erik Bongcam-Rudloff; Jean Salzemann; Vincent Breton; Teresa K. Attwood; Graham Cameron; Gert Vriend
The EMBRACE (European Model for Bioinformatics Research and Community Education) web service collection is the culmination of a 5-year project that set out to investigate issues involved in developing and deploying web services for use in the life sciences. The project concluded that in order for web services to achieve widespread adoption, standards must be defined for the choice of web service technology, for semantically annotating both service function and the data exchanged, and a mechanism for discovering services must be provided. Building on this, the project developed: EDAM, an ontology for describing life science web services; BioXSD, a schema for exchanging data between services; and a centralized registry (http://www.embraceregistry.net) that collects together around 1000 services developed by the consortium partners. This article presents the current status of the collection and its associated recommendations and standards definitions.
Journal of Biomedical Semantics | 2014
Toshiaki Katayama; Mark D. Wilkinson; Kiyoko F. Aoki-Kinoshita; Shuichi Kawashima; Yasunori Yamamoto; Atsuko Yamaguchi; Shinobu Okamoto; Shin Kawano; Jin Dong Kim; Yue Wang; Hongyan Wu; Yoshinobu Kano; Hiromasa Ono; Hidemasa Bono; Simon Kocbek; Jan Aerts; Yukie Akune; Erick Antezana; Kazuharu Arakawa; Bruno Aranda; Joachim Baran; Jerven T. Bolleman; Raoul J. P. Bonnal; Pier Luigi Buttigieg; Matthew Campbell; Yi An Chen; Hirokazu Chiba; Peter J. A. Cock; K. Bretonnel Cohen; Alexandru Constantin
The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.
Nucleic Acids Research | 2016
Jon Ison; Kristoffer Rapacki; Hervé Ménager; Matúš Kalaš; Emil Rydza; Piotr Jaroslaw Chmura; Christian Anthon; Niall Beard; Karel Berka; Dan Bolser; Tim Booth; Anthony Bretaudeau; Jan Brezovsky; Rita Casadio; Gianni Cesareni; Frederik Coppens; Michael Cornell; Gianmauro Cuccuru; Kristian Davidsen; Gianluca Della Vedova; Tunca Doğan; Olivia Doppelt-Azeroual; Laura Emery; Elisabeth Gasteiger; Thomas Gatter; Tatyana Goldberg; Marie Grosjean; Björn Grüning; Manuela Helmer-Citterich; Hans Ienasescu
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR—the European infrastructure for biological information—that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.
Nucleic Acids Research | 2013
Geir Kjetil Sandve; Sveinung Gundersen; Morten Johansen; Ingrid K. Glad; Krishanthi Gunathasan; Lars Holden; Marit Holden; Knut Liestøl; Ståle Nygård; Vegard Nygaard; Jonas Paulsen; Halfdan Rydbeck; Kai Trengereid; Trevor Clancy; Finn Drabløs; Egil Ferkingstad; Matúš Kalaš; Tonje G. Lien; Morten Beck Rye; Arnoldo Frigessi; Eivind Hovig
The immense increase in availability of genomic scale datasets, such as those provided by the ENCODE and Roadmap Epigenomics projects, presents unprecedented opportunities for individual researchers to pose novel falsifiable biological questions. With this opportunity, however, researchers are faced with the challenge of how to best analyze and interpret their genome-scale datasets. A powerful way of representing genome-scale data is as feature-specific coordinates relative to reference genome assemblies, i.e. as genomic tracks. The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genomic track data. Through the provision of several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser opens for a range of genomic investigations, related to, e.g., gene regulation, disease association or epigenetic modifications of the genome.
european conference on computational biology | 2010
Matúš Kalaš; Pæl Puntervoll; Alexandre Joseph; Edita Bartaševičiūtė; Armin Töpfer; Prabakar Venkataraman; Steve Pettifer; Jan Christian Bryne; Jon Ison; Christophe Blanchet; Kristoffer Rapacki; Inge Jonassen
Motivation: The world-wide community of life scientists has access to a large number of public bioinformatics databases and tools, which are developed and deployed using diverse technologies and designs. More and more of the resources offer programmatic web-service interface. However, efficient use of the resources is hampered by the lack of widely used, standard data-exchange formats for the basic, everyday bioinformatics data types. Results: BioXSD has been developed as a candidate for standard, canonical exchange format for basic bioinformatics data. BioXSD is represented by a dedicated XML Schema and defines syntax for biological sequences, sequence annotations, alignments and references to resources. We have adapted a set of web services to use BioXSD as the input and output format, and implemented a test-case workflow. This demonstrates that the approach is feasible and provides smooth interoperability. Semantics for BioXSD is provided by annotation with the EDAM ontology. We discuss in a separate section how BioXSD relates to other initiatives and approaches, including existing standards and the Semantic Web. Availability: The BioXSD 1.0 XML Schema is freely available at http://www.bioxsd.org/BioXSD-1.0.xsd under the Creative Commons BY-ND 3.0 license. The http://bioxsd.org web page offers documentation, examples of data in BioXSD format, example workflows with source codes in common programming languages, an updated list of compatible web services and tools and a repository of feature requests from the community. Contact: [email protected]; [email protected]; [email protected]
BMC Bioinformatics | 2011
Sveinung Gundersen; Matúš Kalaš; Osman Abul; Arnoldo Frigessi; Eivind Hovig; Geir Kjetil Sandve
BackgroundWith the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. The variation in biological context, and the increasingly dispersed mode of data generation, imply a need for precise, interoperable and flexible representations of genomic features through formats that are easy to parse. A host of alternative formats are currently available and in use, complicating analysis and tool development. The issue of whether and how the multitude of formats reflects varying underlying characteristics of data has to our knowledge not previously been systematically treated.ResultsWe here identify intrinsic distinctions between genomic features, and argue that the distinctions imply that a certain variation in the representation of features as genomic tracks is warranted. Four core informational properties of tracks are discussed: gaps, lengths, values and interconnections. From this we delineate fifteen generic track types. Based on the track type distinctions, we characterize major existing representational formats and find that the track types are not adequately supported by any single format. We also find, in contrast to the XML formats, that none of the existing tabular formats are conveniently extendable to support all track types. We thus propose two unified formats for track data, an improved XML format, BioXSD 1.1, and a new tabular format, GTrack 1.0.ConclusionsThe defined track types are shown to capture relevant distinctions between genomic annotation tracks, resulting in varying representational needs and analysis possibilities. The proposed formats, GTrack 1.0 and BioXSD 1.1, cater to the identified track distinctions and emphasize preciseness, flexibility and parsing convenience.
New Biotechnology | 2013
Tomas Klingström; Larissa Soldatova; Robert Stevens; T. Erik Roos; Morris A. Swertz; Kristian M. Müller; Matúš Kalaš; Patrick Lambrix; Michael J. Taussig; Jan-Eric Litton; Ulf Landegren; Erik Bongcam-Rudloff
Management of data to produce scientific knowledge is a key challenge for biological research in the 21st century. Emerging high-throughput technologies allow life science researchers to produce big data at speeds and in amounts that were unthinkable just a few years ago. This places high demands on all aspects of the workflow: from data capture (including the experimental constraints of the experiment), analysis and preservation, to peer-reviewed publication of results. Failure to recognise the issues at each level can lead to serious conflicts and mistakes; research may then be compromised as a result of the publication of non-coherent protocols, or the misinterpretation of published data. In this report, we present the results from a workshop that was organised to create an ontological data-modelling framework for Laboratory Protocol Standards for the Molecular Methods Database (MolMeth). The workshop provided a set of short- and long-term goals for the MolMeth database, the most important being the decision to use the established EXACT description of biomedical ontologies as a starting point.
International Journal on Software Tools for Technology Transfer | 2016
Hervé Ménager; Matúš Kalaš; Kristoffer Rapacki; Jon Ison
The diversity and complexity of bioinformatics resources presents significant challenges to their localisation, deployment and use, creating a need for reliable systems that address these issues. Meanwhile, users demand increasingly usable and integrated ways to access and analyse data, especially within convenient, integrated “workbench” environments. Resource descriptions are the core element of registry and workbench systems, which are used to both help the user find and comprehend available software tools, data resources, and Web Services, and to localise, execute and combine them. The descriptions are, however, hard and expensive to create and maintain, because they are volatile and require an exhaustive knowledge of the described resource, its applicability to biological research, and the data model and syntax used to describe it. We present here the Workbench Integration Enabler, a software component that will ease the integration of bioinformatics resources in a workbench environment, using their description provided by the existing ELIXIR Tools and Data Services Registry.
GigaScience | 2017
Olivia Doppelt-Azeroual; Fabien Mareuil; Éric Deveaud; Matúš Kalaš; Nicola Soranzo; Marius van den Beek; Björn Grüning; Jon Ison; Hervé Ménager
Abstract Background: Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. Findings: We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. Conclusions: ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on Github at https://github.com/C3BI-pasteur-fr/ReGaTE.