Bartosz Dobrzelecki
University of Edinburgh
Publications
Featured research published by Bartosz Dobrzelecki.
Philosophical Transactions of the Royal Society A | 2010
Bartosz Dobrzelecki; Amrey Krause; Alastair Hume; Alistair Grant; Mario Antonioletti; Tilaye Y. Alemu; Malcolm P. Atkinson; Mike Jackson; Elias Theocharopoulos
OGSA-DAI (Open Grid Services Architecture Data Access and Integration) is a framework for building distributed data access and integration systems. Until recently, it lacked the built-in functionality that would allow easy creation of federations of distributed data sources. The latest release of the OGSA-DAI framework introduced the OGSA-DAI DQP (Distributed Query Processing) resource. The new resource encapsulates a distributed query processor that is able to orchestrate distributed data sources when answering declarative user queries. The query processor has many extensibility points, making it easy to customize. We have also introduced a new OGSA-DAI Views resource that provides a flexible method for defining views over relational data. The interoperability of the two new resources, together with the flexibility of the OGSA-DAI framework, allows the building of highly customized data integration solutions.
Grid and Cloud Database Management | 2011
Mike Jackson; Mario Antonioletti; Bartosz Dobrzelecki; Neil Chue Hong
OGSA–DAI provides a framework for sharing and managing distributed data. OGSA–DAI is highly customizable and can be used to manage, share and process distributed data (e.g. relational, XML, files and RDF triples). It does this by executing workflows that can encapsulate complex distributed data management scenarios in which data from one or more sources can be accessed, updated, combined and transformed. Moreover, the data processing capabilities provided by OGSA–DAI are further augmented by a powerful distributed query processor and relational views component that allow distributed data sources to be viewed and queried as if they were a single resource. OGSA–DAI allows researchers and business users to move away from logistical and technical concerns such as data locations, data models, data transfers and optimization strategies for data integration and instead focus on application-specific data analysis and processing.
Journal of Bioinformatics and Computational Biology | 2010
Mizanur Khondoker; Till T. Bachmann; Muriel Mewissen; Paul Dickinson; Bartosz Dobrzelecki; Colin J. Campbell; Andrew R. Mount; Anthony J. Walton; Jason Crain; Holger Schulze; Gerard Giraud; Alan J. Ross; Ilenia Ciani; Stuart W. J. Ember; Chaker Tlili; Jonathan G. Terry; Eilidh Grant; Nicola McDonnell; Peter Ghazal
Machine learning and statistical model-based classifiers have increasingly been used with more complex and high-dimensional biological data obtained from high-throughput technologies. Understanding the impact of various factors associated with large and complex microarray datasets on the predictive performance of classifiers is computationally intensive, under-investigated, yet vital in determining the optimal number of biomarkers for various classification purposes aimed towards improved detection, diagnosis, and therapeutic monitoring of diseases. We investigate the impact of microarray-based data characteristics on the predictive performance for various classification rules using simulation studies. Our investigation using Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour shows that the predictive performance of classifiers is strongly influenced by training set size, biological and technical variability, replication, fold change and correlation between biomarkers. The optimal number of biomarkers for a classification problem should therefore be estimated taking account of the impact of all these factors. A database of average generalization errors is built for various combinations of these factors. The database of generalization errors can be used for estimating the optimal number of biomarkers for given levels of predictive accuracy as a function of these factors. Examples show that curves from actual biological data resemble those from simulated data with corresponding levels of data characteristics. An R package optBiomarker implementing the method is freely available for academic use from the Comprehensive R Archive Network (http://www.cran.r-project.org/web/packages/optBiomarker/).
Parallel Computing | 2006
Mario Antonioletti; Malcolm P. Atkinson; Neil Chue Hong; Bartosz Dobrzelecki; Alastair Hume; Mike Jackson; Konstantinos Karasavvas; Amy Krause; Jennifer M. Schopf; Tom Sugden; Elias Theocharopoulos
OGSA-DAI (Open Grid Services Architecture - Data Access and Integration) provides an extensible software framework allowing data resources, such as files, relational and XML databases, to be exposed through Web services acting within collaborative Grid environments or, more modestly, in stand-alone mode. OGSA-DAI may be deployed to WSRF-based platforms, such as the Globus Toolkit 4, as well as non-WSRF-based ones, such as the UK OMII Server or standard versions of Tomcat and Axis. Regardless of the platform, the core functionality provided remains the same. OGSA-DAI allows data resources to be accessed and integrated into the main infrastructures presently being used to construct Grids. OGSA-DAI provides a number of optimisations that reduce unnecessary data movement by shifting work to the Web service and encapsulating multiple client-Web service interactions into a single one, and allows for functionality to be added or customised based on the application. OGSA-DAI is widely used and is available from www.ogsadai.org.uk. It is also bundled with the OMII-UK and Globus Toolkit distributions. This paper gives an overview of what OGSA-DAI is, how it works, presents some usage scenarios, and outlines future enhancements.
Grid and Cloud Database Management | 2011
Bartosz Dobrzelecki; Amrey Krause; Michal Piotrowski; Neil Chue Hong
Database management techniques using distributed processing services have evolved to address the issues of distributed, heterogeneous data collections held across dynamic, virtual organisations [1-3]. These techniques, originally developed for data grids in domains such as high-energy particle physics [4], have been adapted to make use of the emerging cloud infrastructures [5].
Philosophical Transactions of the Royal Society A | 2009
Jeremy Nowell; Charaka Palansuriya; Michal Piotrowski; Florian Scharinger; Paul Graham; Bartosz Dobrzelecki; Arthur Trew
As large grid infrastructures, such as Enabling Grids for E-sciencE, mature, they are being used by scientists around the world in their daily work, running thousands of concurrent computational jobs and transferring large amounts of data. The successful and sustainable operation of such grid infrastructures is only possible through the use of monitoring tools. The underlying networks upon which grid infrastructures are built are critical to their operation; therefore, network monitoring becomes an important part of the overall grid monitoring strategy. In this paper, the design and implementation of a set of tools for providing access to federated network monitoring data are presented, based on standards developed within the Open Grid Forum Network Measurements Working Group (NM-WG). These tools give access to data collected by heterogeneous, NM-WG compliant network monitoring tools.
High Performance Distributed Computing | 2010
Savvas Petrou; Terence Sloan; Muriel Mewissen; Thorsten Forster; Michal Piotrowski; Bartosz Dobrzelecki
The statistical language R and Bioconductor package are favoured by many biostatisticians for processing microarray data. The amount of data produced by these analyses has reached the limits of many common bioinformatics computing infrastructures. High Performance Computing (HPC) systems offer a solution to this issue. The Simple Parallel R INTerface (SPRINT) is a package that provides biostatisticians with easy access to HPC systems and allows the addition of parallelized functions to R. This paper will present how we added a parallelized permutation testing function in R using SPRINT and how this function performs on a supercomputer for executions of up to 512 processes.
International Conference on Networking and Services | 2009
Paul Graham; Jeremy Nowell; Florian Scharinger; Charaka Palansuriya; Bartosz Dobrzelecki; Arthur Trew
The coordination and scheduling of affiliated tasks to be run at different sites is a challenging problem, specifically in the domain of network performance monitoring. This paper presents a software implementation of the Probes Coordination Protocol (PCP) which provides a solution to this problem. The PCP allows tasks to be executed regularly on a multitude of sites without the need for repeated user or administrator intervention. In addition, it provides a robust mechanism for handling site failures, and a VOMS-based security model to ensure appropriate usage. The paper provides the original motivation for the PCP, describes the PCP itself and discusses the new software implementation and its application.
IEEE International Conference on eScience | 2008
Alistair Grant; Mario Antonioletti; Alastair Hume; Amy Krause; Bartosz Dobrzelecki; Mike Jackson; Mark Parsons; Malcolm P. Atkinson; Elias Theocharopoulos
In modern distributed computing, vast amounts of data are stored in many different formats, employing different storage solutions. OGSA-DAI 3.0 is a middleware software solution. It provides application developers with the means to access data distributed across multiple platforms with different native access mechanisms. Data integration can take place at the server and deliver results using a variety of protocols and mechanisms within OGSA-DAI. It accomplishes this by using a highly flexible and extensible framework which can accommodate different types of data resources, such as XML databases, relational databases or files, different operations such as transformation to different formats, selection or filter operations. The framework can be extended by a developer to provide customized functionality for project specific tasks while using generic functions for common tasks such as database querying. This paper presents an overview of OGSA-DAI and how it tackles data access and integration through a set of example use cases.