Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Petr Škoda is active.

Publication


Featured research published by Petr Škoda.


Journal of Cheminformatics | 2014

Molpher: a software framework for systematic chemical space exploration

David Hoksza; Petr Škoda; Milan Voršilák; Daniel Svozil

Background: Chemical space is a virtual space occupied by all chemically meaningful organic compounds. It is an important concept in contemporary chemoinformatics research, and its systematic exploration is vital to the discovery of either novel drugs or new tools for chemical biology. Results: In this paper, we describe Molpher, an open-source framework for the systematic exploration of chemical space. Through a process we term ‘molecular morphing’, Molpher produces a path of structurally related compounds. This path is generated by the iterative application of so-called ‘morphing operators’ that represent simple structural changes, such as the addition or removal of an atom or a bond. Molpher incorporates an optimized parallel exploration algorithm, compound logging and a two-dimensional visualization of the exploration process. Its feature set can be easily extended by implementing additional morphing operators, chemical fingerprints, similarity measures and visualization methods. Molpher not only offers an intuitive graphical user interface, but can also be run in batch mode. This enables users to easily incorporate molecular morphing into their existing drug discovery pipelines. Conclusions: Molpher is an open-source software framework for the design of virtual chemical libraries focused on a particular mechanistic class of compounds. These libraries, represented by a morphing path and its surroundings, provide valuable starting data for future in silico and in vitro experiments. Molpher is highly extensible and can be easily incorporated into any existing computational drug design pipeline.
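The morphing loop can be pictured with a small sketch. The following is an illustration only, not Molpher's implementation: the structural operators are hypothetical callables, and a simple greedy selection stands in for Molpher's parallel tree exploration; only the RDKit fingerprint and Tanimoto calls are real library APIs.

```python
# Minimal sketch of molecular morphing (not Molpher's actual algorithm).
# `operators` are hypothetical callables: each takes a SMILES string and
# returns candidate SMILES after one small structural change.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def similarity(smiles_a, smiles_b):
    """Tanimoto similarity of Morgan fingerprints, a common 2D similarity measure."""
    fp_a, fp_b = (
        AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=2048)
        for s in (smiles_a, smiles_b)
    )
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)

def morph(start_smiles, target_smiles, operators, max_iterations=100):
    """Greedy morphing: repeatedly apply operators (add/remove an atom or bond, ...)
    and keep the candidate closest to the target, building a path of structurally
    related compounds from start toward target."""
    path, current = [start_smiles], start_smiles
    for _ in range(max_iterations):
        candidates = [c for op in operators for c in op(current)]
        candidates = [c for c in candidates if Chem.MolFromSmiles(c) is not None]
        if not candidates:
            break
        current = max(candidates, key=lambda c: similarity(c, target_smiles))
        path.append(current)
        if similarity(current, target_smiles) > 0.99:   # effectively reached the target
            break
    return path
```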


International Semantic Web Conference | 2016

LinkedPipes ETL: Evolved Linked Data Preparation

Jakub Klímek; Petr Škoda

As Linked Data gains traction, proper support for its publication and consumption is more important than ever. Even though there is a multitude of tools for the preparation of Linked Data, they are still either quite limited, difficult to use, or not compliant with recent W3C Recommendations. In this demonstration paper, we present LinkedPipes ETL, a lightweight Linked Data preparation tool. It focuses mainly on a smooth user experience, including on mobile devices, ease of integration based on full API coverage, and universal usage thanks to its library of components. We build on the experience gained through the development and use of UnifiedViews, our previous Linked Data ETL tool, and present four use cases in which our new tool excels in comparison.


European Semantic Web Conference | 2014

UnifiedViews: An ETL Framework for Sustainable RDF Data Processing

Tomáš Knap; Maria Kukhar; Bohuslav Macháč; Petr Škoda; Jiří Tomeš; Ján Vojt

We present UnifiedViews, an Extract-Transform-Load (ETL) framework that allows users to define, execute, monitor, debug, schedule, and share ETL data processing tasks, which may employ custom plugins created by users. UnifiedViews differs from other ETL frameworks by natively supporting RDF data and ontologies. We are persuaded that UnifiedViews helps RDF/Linked Data consumers to address the problem of sustainable RDF data processing; we support this claim with a list of projects and other activities in which UnifiedViews is successfully exploited.


OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" | 2017

Speeding up Publication of Linked Data Using Data Chunking in LinkedPipes ETL

Jakub Klímek; Petr Škoda

There is a multitude of tools for preparing Linked Data from data sources such as CSV and XML files. These tools usually perform as expected when processing examples or smaller real-world data. However, most of them become hard to use when faced with a larger dataset, such as a CSV file hundreds of megabytes in size. Tools which load the entire resulting RDF dataset into memory usually have memory requirements that commodity hardware cannot satisfy. This is the case for RDF-based ETL tools. Their limits can be avoided by running them on powerful and expensive hardware, which is, however, not an option for the majority of data publishers. Tools which process the data in a streamed way tend to have limited transformation options. This is the case for text-based transformations, such as XSLT, or per-item SPARQL transformations such as the streamed version of TARQL. In this paper, we show how the power and transformation options of RDF-based ETL tools can be combined with the ability to transform large datasets on common consumer hardware for so-called chunkable data, i.e. data which can be split in a certain way. We demonstrate our approach in our RDF-based ETL tool, LinkedPipes ETL. We include experiments on selected real-world datasets and a comparison of the performance and memory consumption of available tools.
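The chunking idea can be sketched independently of LinkedPipes ETL. The snippet below is an assumption-laden illustration, not the tool's code: a hypothetical two-column CSV (id, name) is converted to RDF one chunk at a time with rdflib, so memory use is bounded by the chunk size rather than by the dataset size; N-Triples output is line-based, so the chunk results can simply be concatenated.

```python
# Illustrative chunked CSV-to-RDF conversion; column names are hypothetical.
import csv
from itertools import islice
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

EX = Namespace("http://example.com/resource/")   # hypothetical vocabulary

def chunks(rows, size):
    """Yield lists of at most `size` rows so only one chunk is held in memory."""
    while True:
        chunk = list(islice(rows, size))
        if not chunk:
            return
        yield chunk

def csv_to_ntriples(csv_path, out_path, chunk_size=10_000):
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as out:
        for rows in chunks(csv.DictReader(src), chunk_size):
            g = Graph()                            # fresh graph per chunk
            for row in rows:
                g.add((EX[row["id"]], RDFS.label, Literal(row["name"])))
            out.write(g.serialize(format="nt"))    # rdflib >= 6 returns a str here
```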


International Workshop on Combinatorial Algorithms | 2009

Computability of Width of Submodular Partition Functions

Petr Škoda

The notion of submodular partition functions generalizes many well-known tree decompositions of graphs. For fixed k, there are polynomial-time algorithms to determine whether a graph has tree-width, branch-width, etc. at most k. In contrast to these results, we show that there is no sub-exponential algorithm for determining whether the width of a given submodular partition function is at most two. In addition, we develop another dual notion for submodular partition functions which is analogous to loose tangles for connectivity functions.
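For orientation, the classical notion being generalized here: a set function f on a ground set E is submodular if

    f(X) + f(Y) ≥ f(X ∪ Y) + f(X ∩ Y)   for all X, Y ⊆ E.

Branch-width, for instance, is defined via symmetric submodular connectivity functions of this kind; roughly speaking, submodular partition functions impose an analogous condition on partitions of E rather than on pairs of subsets.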


Electronic Government and the Information Systems Perspective | 2017

Practical Use Cases for Linked Open Data in eGovernment Demonstrated on the Czech Republic

Jakub Klímek; Petr Škoda

The motivation for publishing data as Open Data and its benefits are already clear to many public authorities. However, most open data is published as 3-star data according to the 5-star deployment scheme. When it comes to publishing 5-star data, i.e. Linked Open Data (LOD), the benefits and motivation become abstract and unclear to many authorities. In this paper, we introduce a playground which clarifies these benefits to public authorities in the Czech Republic using their own datasets. The playground consists of 73 real datasets transformed to LOD and two mature tools for LOD processing, visualization and analysis. We demonstrate the benefits on two concrete datasets provided by the Ministry of the Interior of the Czech Republic and show how other public authorities may perform a similar demonstration on their own datasets. The paper is by no means limited to public authorities of the Czech Republic, as the same principles and processes are applicable elsewhere. Our example can be used to demonstrate the benefits of publishing 5-star data on real datasets, and as motivation and guidelines for building a similar playground in other countries.


Bioinformatics and Biomedicine | 2016

Benchmarking platform for ligand-based virtual screening

Petr Škoda; David Hoksza

Virtual screening (VS) of databases of chemical compounds has become a common step in the drug discovery process. Ligand-based virtual screening is a variant of VS in which similarity to known active compounds is used to discover new bioactive molecules. The cornerstone that determines the success of virtual screening is the molecular similarity measure used. Currently, there is no superior approach to modeling molecular similarity, and the design of new similarity approaches is an active research field in cheminformatics. Proper benchmarking is therefore of utmost importance. In this paper, we describe common pitfalls in the current approach to benchmarking new methods. We focus on the importance of reproducibility and on the design of benchmarking datasets. Moreover, we identify dataset difficulty as an important, yet not widely utilized, property of the benchmarking data. To address the identified issues, we present a new benchmarking platform. The platform implements the most commonly used molecular representations and includes datasets of varying difficulty levels, as well as scripts which make the platform easy to use and extend. The existing representations are benchmarked using the proposed platform and results are presented. The benchmarking platform is available at https://github.com/skodapetr/lbvs-environment.
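As a purely illustrative picture of the benchmarked task (not the platform's own code), the sketch below ranks a hypothetical compound library by maximum Tanimoto similarity to known actives and scores the ranking with ROC AUC; the SMILES strings, labels and the choice of Morgan fingerprints are assumptions.

```python
# Toy ligand-based virtual screening benchmark: similarity ranking + ROC AUC.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.metrics import roc_auc_score

def fingerprint(smiles):
    return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)

def screen(known_actives, library):
    """Score each library compound by its maximum Tanimoto similarity
    to any known active (a common ligand-based VS baseline)."""
    active_fps = [fingerprint(s) for s in known_actives]
    return [
        max(DataStructs.TanimotoSimilarity(fingerprint(s), a) for a in active_fps)
        for s in library
    ]

# Hypothetical benchmark data: 1 = active, 0 = decoy.
library = ["CCO", "c1ccccc1O", "CCN(CC)CC", "CC(=O)Oc1ccccc1C(=O)O"]
labels  = [0, 1, 0, 1]
scores  = screen(known_actives=["c1ccccc1O"], library=library)
print("ROC AUC:", roc_auc_score(labels, scores))
```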


International Symposium on Bioinformatics Research and Applications | 2014

2D Pharmacophore Query Generation

David Hoksza; Petr Škoda

Using pharmacophores in the virtual screening of large chemical compound libraries has proved to be a valuable concept in computer-aided drug design. Traditionally, pharmacophore-based screening is performed in 3D space, where crystallized or predicted structures of ligands are superposed and pharmacophore features are identified and compiled into a 3D pharmacophore model. However, in many cases the structures of the ligands are not known, which results in the use of a 2D pharmacophore model.
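To make the 2D setting concrete: pharmacophore features such as donors, acceptors and aromatic rings can be read directly from the molecular graph, without any 3D coordinates. The sketch below uses RDKit's generic feature definitions purely as a stand-in; it does not reproduce the query-generation method of the paper.

```python
# Illustrative only: extract graph-based (2D) pharmacophore features with RDKit's
# default feature definitions; no conformer or 3D coordinates are required.
import os
from rdkit import Chem, RDConfig
from rdkit.Chem import ChemicalFeatures

factory = ChemicalFeatures.BuildFeatureFactory(
    os.path.join(RDConfig.RDDataDir, "BaseFeatures.fdef")
)

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin, as an example ligand
for feature in factory.GetFeaturesForMol(mol):
    # Feature family (e.g. Donor, Acceptor, Aromatic) and the atom indices it covers
    print(feature.GetFamily(), feature.GetAtomIds())
```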


Annual ACIS International Conference on Computer and Information Science | 2013

Chemical space visualization using ViFrame

Petr Škoda; David Hoksza

Exploration of the chemical space is an important component of the drug discovery process, and its importance grows with increasing computational power, which allows larger areas of the chemical space to be explored. New algorithms have recently emerged that automatically generate and search for compounds (objects in the chemical space) with desired properties. Although these approaches can be a big help, human interaction is usually still needed in the end. Visualization of the space can help make sense of the generated data, and visualization techniques are therefore usually an integral part of any task related to chemical space exploration. Methods for visualizing the chemical space exist, but there is no framework supporting the simple development of new methods. The purpose of this paper is to introduce such a modular framework, called ViFrame. ViFrame makes it possible to implement every part of the visualization pipeline, which consists of steps such as reading and merging molecules from multiple data sources, applying transformations and, finally, visualizing the data set in 2D space. The advantage of the framework lies in providing an environment where the user can focus on developing these tasks while the framework handles seamless integration of the developed components. The framework also includes an application that provides the user with a graphical interface for manipulating modules and presenting the visualization results. So that the application can be used without implementing one's own module, several visualization methods have already been implemented.
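A tiny sketch of the kind of pipeline ViFrame modularizes (read molecules, derive a numeric representation, project to 2D) is given below; it uses generic RDKit descriptors and PCA purely as stand-ins, since ViFrame's actual modules and transformations are supplied by its users.

```python
# Illustrative chemical-space projection: descriptors -> PCA -> 2D coordinates.
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.decomposition import PCA

def descriptor_vector(mol):
    """A tiny descriptor-based representation of a molecule."""
    return [
        Descriptors.MolWt(mol),
        Descriptors.MolLogP(mol),
        Descriptors.TPSA(mol),
        Descriptors.NumHDonors(mol),
        Descriptors.NumHAcceptors(mol),
    ]

def project_to_2d(smiles_list):
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    features = np.array([descriptor_vector(m) for m in mols if m is not None])
    return PCA(n_components=2).fit_transform(features)   # one (x, y) row per molecule

print(project_to_2d(["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]))
```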


Information Integration and Web-based Applications & Services | 2017

LinkedPipes ETL in use: practical publication and consumption of linked data

Jakub Klímek; Petr Škoda

Companies and institutions now realize the potential of Linked Open Data (LOD) and are starting to publish their own data as LOD. However, publishing LOD is still a challenging task. One of the main reasons is the lack of user-friendly tooling that properly supports the whole LOD publishing process. The process typically consists of source data extraction, transformation to RDF, alignment with commonly used vocabularies, linking to other datasets, computing metadata, publishing on the web as a dump, loading into a triplestore and recording the dataset in a data catalog such as CKAN. In this paper, we present LinkedPipes ETL, a tool for ETL-like LOD publishing, which focuses mainly on supporting such LOD publishing workflows in a user-friendly way. In addition, the tool also eases the consumption of existing LOD sources, as it addresses some of the associated practical issues. Finally, the tool itself uses Linked Data technologies to represent the ETL processes. We describe LinkedPipes ETL and its main distinguishing features in the context of the use cases in which the tool has already been deployed. They include an institution of public administration, a municipality, a university, a software company and an open data initiative.
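One step of this process, describing a published dump so that a catalog can record it, can be sketched with plain rdflib and the DCAT vocabulary. The URLs and titles below are hypothetical, and the snippet is not LinkedPipes ETL's own metadata component; it only illustrates the kind of record a catalog such as CKAN can harvest.

```python
# Illustrative DCAT metadata for a published RDF dump; all URIs are hypothetical.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
dataset = URIRef("http://example.com/dataset/addresses")
distribution = URIRef("http://example.com/dataset/addresses/ttl")

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Address register", lang="en")))
g.add((dataset, DCAT.distribution, distribution))
g.add((distribution, RDF.type, DCAT.Distribution))
g.add((distribution, DCAT.downloadURL, URIRef("http://example.com/dumps/addresses.ttl")))
g.add((distribution, DCAT.mediaType,
       URIRef("https://www.iana.org/assignments/media-types/text/turtle")))

print(g.serialize(format="turtle"))   # the record a data catalog can harvest
```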

Collaboration


Dive into Petr Škoda's collaborations.

Top Co-Authors

David Hoksza | Charles University in Prague
Jakub Klímek | Charles University in Prague
Jan Jelínek | Charles University in Prague
Tomáš Knap | Charles University in Prague
Bohuslav Macháč | Charles University in Prague
Daniel Svozil | Academy of Sciences of the Czech Republic
Jiří Tomeš | Charles University in Prague
Ján Vojt | Charles University in Prague
Maria Kukhar | Charles University in Prague
Milan Voršilák | Institute of Chemical Technology in Prague