Jano van Hemert
University of Edinburgh
Publications
Featured research published by Jano van Hemert.
Parallel Processing and Applied Mathematics | 2007
Adam Barker; Jano van Hemert
Workflow technologies are emerging as the dominant approach to coordinating groups of distributed services. However, with a space filled with competing specifications, standards and frameworks from multiple domains, choosing the right tool for the job is not always straightforward. Researchers are often unaware of the range of technology that already exists and focus on implementing yet another proprietary workflow system. As an antidote to this common problem, this paper presents a concise survey of existing workflow technology from the business and scientific domains and makes a number of key suggestions for the future development of scientific workflow systems.
Advanced Data Mining and Applications | 2009
Liangxiu Han; Jano van Hemert; Richard Baldock; Malcolm P. Atkinson
It is of high biomedical interest to identify gene interactions and networks that are associated with developmental and physiological functions in the mouse embryo. There are now large datasets with both spatial and ontological annotation of the spatio-temporal patterns of gene expression that provide a powerful resource for discovering potential mechanisms of embryo organisation. Ontological annotation of gene expression consists of labelling images with terms from the anatomy ontology for mouse development. Currently, annotation is performed manually by domain experts, which is both time-consuming and costly. In this paper, we present a new data mining framework to automatically annotate gene expression patterns in images with anatomical terms. This framework integrates images stored in file systems with ontology terms stored in databases, and combines pattern recognition with image processing techniques to identify the anatomical components that exhibit gene expression patterns in images. Experimental results show that the framework performs well.
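As an illustration of the annotation task, here is a minimal sketch that casts it as multi-label classification: a toy intensity-histogram feature stands in for the paper's image-processing pipeline, and scikit-learn's one-vs-rest logistic regression stands in for its pattern-recognition component. The feature choice and all function names are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch: annotate images with anatomy-ontology terms via
# multi-label classification. Feature extraction is a toy stand-in.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

def image_features(image: np.ndarray, bins: int = 32) -> np.ndarray:
    """Toy feature vector: a normalised intensity histogram (values in [0, 1])."""
    hist, _ = np.histogram(image, bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def train_annotator(images, annotations):
    """images: list of 2D arrays; annotations: list of sets of ontology terms."""
    X = np.stack([image_features(img) for img in images])
    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(annotations)          # binary term-indicator matrix
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
    return clf, mlb

def annotate(clf, mlb, image, threshold=0.5):
    """Return the set of ontology terms predicted for one image."""
    probs = clf.predict_proba(image_features(image).reshape(1, -1))[0]
    return {term for term, p in zip(mlb.classes_, probs) if p >= threshold}
```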
High Performance Distributed Computing | 2010
Chee Sun Liew; Malcolm P. Atkinson; Jano van Hemert; Liangxiu Han
Modern scientific collaborations have opened up the opportunity of solving complex problems that involve multi-disciplinary expertise and large-scale computational experiments. These experiments usually involve large amounts of data that are located in distributed data repositories running various software systems, and managed by different organisations. A common strategy to make the experiments more manageable is executing the processing steps as a workflow. In this paper, we look into the implementation of fine-grained data-flow between computational elements in a scientific workflow as streams. We model the distributed computation as a directed acyclic graph where the nodes represent processing elements that incrementally implement specific subtasks. The processing elements are connected in a pipelined streaming manner, which allows task executions to overlap. We further optimise the execution by splitting pipelines across processes and by introducing extra parallel streams. We identify performance metrics and design a measurement tool to evaluate each enactment. We conducted experiments to evaluate our optimisation strategies with a real-world problem in the Life Sciences: EURExpress-II. The paper presents our distributed data-handling model, the optimisation and instrumentation strategies, and the evaluation experiments. We demonstrate linear speed-up and argue that this use of data streaming to enable both overlapped pipelining and parallelised enactment is a generally applicable optimisation strategy.
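A minimal sketch of the pipelined streaming model: each processing element incrementally consumes items from an input queue and emits results to an output queue, so stage executions overlap. Threads and in-memory queues are used here for brevity; the paper's enactment engine additionally splits pipelines across processes and introduces extra parallel streams, which this sketch does not show.

```python
# A minimal sketch of overlapped pipeline streaming between processing elements.
from queue import Queue
from threading import Thread

SENTINEL = object()  # marks the end of a stream

def processing_element(fn, inbox: Queue, outbox: Queue):
    """Incrementally transform items; downstream work overlaps upstream work."""
    while (item := inbox.get()) is not SENTINEL:
        outbox.put(fn(item))
    outbox.put(SENTINEL)

def run_pipeline(source, stages):
    """Connect the stages (a list of callables) with queues and stream the source through."""
    queues = [Queue() for _ in range(len(stages) + 1)]
    workers = [Thread(target=processing_element, args=(fn, q_in, q_out))
               for fn, q_in, q_out in zip(stages, queues, queues[1:])]
    for w in workers:
        w.start()
    for item in source:              # stream items into the first element
        queues[0].put(item)
    queues[0].put(SENTINEL)
    results = []
    while (item := queues[-1].get()) is not SENTINEL:
        results.append(item)
    for w in workers:
        w.join()
    return results

print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * x]))  # [1, 4, 9, 16, 25]
```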
Parallel Problem Solving from Nature | 2004
Jano van Hemert; J.A. La Poutré
We introduce the concept of fruitful regions in a dynamic routing context: regions that have a high potential of generating loads to be transported. The objective is to maximise the number of loads transported, while keeping to capacity and time constraints. Loads arrive while the problem is being solved, which makes it a real-time routing problem. The solver is a self-adaptive evolutionary algorithm that ensures feasible solutions at all times. We investigate under what conditions the exploration of fruitful regions improves the effectiveness of the evolutionary algorithm.
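As a sketch of the objective, the function below counts how many loads a single vehicle can serve while respecting its capacity and each load's deadline, skipping infeasible loads so the solution stays feasible at all times. The Load fields, the single-vehicle simplification and the Euclidean travel model are assumptions; the self-adaptive evolutionary algorithm and the fruitful-region heuristic themselves are not shown.

```python
# A minimal sketch of the routing objective: maximise loads transported
# under capacity and time constraints.
from dataclasses import dataclass

@dataclass
class Load:
    pickup: tuple[float, float]
    delivery: tuple[float, float]
    size: float
    deadline: float

def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def transported(route: list[Load], capacity: float, speed: float = 1.0) -> int:
    """Count loads a vehicle can serve in order, respecting capacity and deadlines."""
    t, pos, served = 0.0, (0.0, 0.0), 0
    for load in route:
        if load.size > capacity:
            continue                     # too big for the vehicle: skip
        arrival = t + (dist(pos, load.pickup) + dist(load.pickup, load.delivery)) / speed
        if arrival <= load.deadline:     # commit the move only if the deadline holds
            t, pos, served = arrival, load.delivery, served + 1
    return served
```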
Learning and Intelligent Optimization | 2010
Kate Smith-Miles; Jano van Hemert; Xin Yu Lim
Whether the goal is performance prediction or insight into the relationships between algorithm performance and instance characteristics, a comprehensive set of meta-data from which relationships can be learned is needed. This paper provides a methodology to determine whether the meta-data is sufficient, and demonstrates the critical role played by instance generation methods. Instances of the Travelling Salesman Problem (TSP) are evolved using an evolutionary algorithm to produce distinct classes of instances that are intentionally easy or hard for certain algorithms. A comprehensive set of features is used to characterise instances of the TSP, and the impact of these features on difficulty for each algorithm is analysed. Finally, performance predictions are achieved with high accuracy on unseen instances, both for predicting search effort and for identifying the algorithm likely to perform best.
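A minimal sketch of the meta-learning step: summarise each TSP instance with a handful of features and fit a regressor mapping features to measured search effort. The five features below are illustrative assumptions; the paper's feature set is considerably richer.

```python
# A minimal sketch: instance features -> predicted search effort.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def tsp_features(cities: np.ndarray) -> np.ndarray:
    """cities: (n, 2) coordinates. Returns a small, illustrative feature vector."""
    d = np.linalg.norm(cities[:, None, :] - cities[None, :, :], axis=-1)
    # mask the zero diagonal so the row minimum is the nearest-neighbour distance
    nn = np.sort(d + np.eye(len(cities)) * d.max(), axis=1)[:, 0]
    pair = d[np.triu_indices(len(cities), k=1)]   # all pairwise distances
    return np.array([pair.mean(), pair.std(), nn.mean(), nn.std(),
                     cities.std(axis=0).mean()])

def fit_effort_model(instances, effort):
    """instances: list of (n, 2) arrays; effort: measured search effort per instance."""
    X = np.stack([tsp_features(c) for c in instances])
    return RandomForestRegressor(n_estimators=200).fit(X, effort)
```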
Annals of Mathematics and Artificial Intelligence | 2011
Kate Smith-Miles; Jano van Hemert
The suitability of an optimisation algorithm selected from within an algorithm portfolio depends upon the features of the particular instance to be solved. Understanding the relative strengths and weaknesses of different algorithms in the portfolio is crucial for effective performance prediction, for automated algorithm selection, and for generating knowledge about the ideal conditions for each algorithm, which in turn can inform better algorithm design. Relying on well-studied benchmark instances, or randomly generated instances, limits our ability to truly challenge each of the algorithms in a portfolio and determine these ideal conditions. Instead, we use an evolutionary algorithm to evolve instances that are uniquely easy or hard for each algorithm, providing a more direct method for studying the relative strengths and weaknesses of each algorithm. The proposed methodology ensures that the meta-data is sufficient for learning the features of the instances that uniquely characterise the ideal conditions for each algorithm. A case study is presented based on a comprehensive study of the performance of two heuristics on the Travelling Salesman Problem. The results show that both search effort and the best-performing algorithm for a given instance can be predicted with high accuracy.
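A minimal sketch of the instance-evolution loop, assuming effort_a and effort_b are callables measuring each algorithm's search effort on an instance: maximising their difference evolves instances that are uniquely hard for one algorithm and easy for the other. Truncation selection and the generic mutate/random_instance hooks are deliberately simple placeholders, not the paper's operators.

```python
# A minimal sketch: evolve instances that discriminate between two algorithms.
import random

def evolve_discriminating_instance(effort_a, effort_b, random_instance,
                                   mutate, generations=200, pop_size=30):
    pop = [random_instance() for _ in range(pop_size)]
    fitness = lambda inst: effort_a(inst) - effort_b(inst)  # hard for A, easy for B
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]                    # truncation selection
        pop = parents + [mutate(random.choice(parents)) for _ in parents]
    return max(pop, key=fitness)
```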
British Journal of Ophthalmology | 2016
Colin S. Tan; Milton C. Chew; Jano van Hemert; Michael Singer; Darren Bell; Srinivas R Sadda
Objective: To determine the calculated, anatomically correct area of retinal non-perfusion and the total area of visible retina on ultra-widefield fluorescein angiography (UWF FA) in retinal vein occlusion (RVO), and to compare the corrected measures of non-perfusion with the ischaemic index. Methods: Uncorrected UWF FA images from 32 patients with RVO were graded manually for capillary non-perfusion, which was calculated as a percentage of the total visible retina (uncorrected ischaemic index). The annotated images were converted using novel stereographic projection software to calculate precise areas of non-perfusion in mm², which were expressed as a percentage of the total area of visible retina ('corrected non-perfusion percentage') and compared with the ischaemic index. Results: The precise areas of peripheral non-perfusion ranged from 0 mm² to 365.4 mm² (mean 95.1 mm²), while the mean total visible retinal area was 697.0 mm². The mean corrected non-perfusion percentage was similar to the uncorrected ischaemic index (13.5% vs 14.8%, p=0.239). The corrected non-perfusion percentage correlated with the uncorrected ischaemic index (R=0.978, p<0.001), but the difference in non-perfusion percentage between the corrected and uncorrected metrics was as high as 14.8%. Conclusions: Using stereographic projection software, lesion areas on UWF images can be calculated in anatomically correct physical units (mm²). Eyes with RVO show large areas of peripheral retinal non-perfusion.
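A minimal sketch of the underlying geometry, assuming the UWF image approximates a stereographic projection of a spherical eye of radius R (roughly 12 mm): a stereographic projection inflates peripheral areas by the squared local scale factor m = 1 + (ρ/2R)², so each annotated pixel's planar area is divided by m² before summing. The commercial projection software models the device optics far more faithfully; all parameter names here are assumptions.

```python
# A minimal sketch: correct planar pixel areas for stereographic distortion.
import numpy as np

def true_area_mm2(mask: np.ndarray, mm_per_pixel: float,
                  centre: tuple[float, float], eye_radius_mm: float = 12.0) -> float:
    """Sum per-pixel areas of a binary lesion mask, de-weighting peripheral
    pixels whose planar area is inflated by the projection.
    centre is the (x, y) pixel at which the projection is tangent to the eye."""
    ys, xs = np.nonzero(mask)                         # annotated (non-perfused) pixels
    rho = np.hypot(xs - centre[0], ys - centre[1]) * mm_per_pixel
    # local linear scale of a stereographic projection: m = 1 + (rho / 2R)^2,
    # so a planar pixel of area a corresponds to a / m^2 on the sphere
    m = 1.0 + (rho / (2.0 * eye_radius_mm)) ** 2
    return float(np.sum((mm_per_pixel ** 2) / m ** 2))
```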
High Performance Distributed Computing | 2008
Adam Barker; Jon B. Weissman; Jano van Hemert
Efficient execution of large-scale, data-intensive workflows such as Montage must take into account the volume and pattern of communication. When orchestrating data-centric workflows, the centralised servers common to standard workflow systems can become a performance bottleneck. However, standards-based workflow systems that rely on centralisation, e.g., Web service based frameworks, have many other benefits, such as a wide user base and sustained support. This paper presents and evaluates a light-weight hybrid architecture which maintains the robustness and simplicity of centralised orchestration, but facilitates choreography by allowing services to exchange data directly with one another. Furthermore, our architecture is standards compliant, flexible, and non-disruptive: service definitions do not have to be altered prior to enactment. Our architecture could be realised within any existing workflow framework; in this paper, we focus on a Web service based framework. Taking inspiration from Montage, a number of common workflow patterns (sequence, fan-in and fan-out), input-output data size relationships and network configurations are identified and evaluated. The performance analysis concludes that a substantial reduction in communication overhead results in a 2-4-fold performance benefit across all patterns. An end-to-end pattern through the Montage workflow results in an 8-fold performance benefit and demonstrates how the advantage of using our hybrid architecture increases as the complexity of a workflow grows.
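A minimal sketch of the hybrid idea: the orchestrator still sequences invocations, but services keep their outputs locally and hand back lightweight references, so bulk data is pulled directly between peers rather than routed through the centre. The Service class and reference format below are hypothetical stand-ins for the Web-service proxies the paper describes.

```python
# A minimal sketch: centralised control flow, decentralised data flow.
class Service:
    def __init__(self, name, fn):
        self.name, self.fn, self.store = name, fn, {}

    def invoke(self, ref_or_value, services):
        value = self._resolve(ref_or_value, services)  # pull input from a peer directly
        key = f"{self.name}:{len(self.store)}"
        self.store[key] = self.fn(value)               # keep the bulk result locally
        return ("ref", self.name, key)                 # hand back only a small reference

    @staticmethod
    def _resolve(ref_or_value, services):
        if isinstance(ref_or_value, tuple) and ref_or_value[0] == "ref":
            _, owner, key = ref_or_value
            return services[owner].store[key]          # direct peer-to-peer transfer
        return ref_or_value

# The orchestrator sees only references; data moves service-to-service.
services = {"A": Service("A", lambda x: x * 2), "B": Service("B", lambda x: x + 1)}
ref = services["A"].invoke(21, services)
print(Service._resolve(services["B"].invoke(ref, services), services))  # 43
```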
Electronic Commerce | 2006
Jano van Hemert
This paper demonstrates how evolutionary computation can be used to acquire difficult-to-solve combinatorial problem instances. As a result of this technique, the corresponding algorithms used to solve these instances are stress-tested. The technique is applied in three important domains of combinatorial optimisation: binary constraint satisfaction, Boolean satisfiability, and the travelling salesman problem. The problem instances acquired through this technique are more difficult than those found in popular benchmarks. In this paper, these evolved instances are analysed with the aim of explaining their difficulty in terms of structural properties, thereby exposing the weaknesses of the corresponding algorithms.
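A minimal sketch of the difficulty measure that serves as the evolutionary fitness, here for binary constraint satisfaction: search effort is counted as the number of failed extensions a chronological backtracker performs on an instance. The instance encoding (a global set of conflicting value pairs) is an illustrative simplification.

```python
# A minimal sketch: search effort of a chronological backtracker on a binary CSP.
def search_effort(domains, conflicts):
    """domains: list of value lists, one per variable;
    conflicts: set of ((i, vi), (j, vj)) forbidden value pairs."""
    effort = 0

    def consistent(assignment, var, val):
        return all(((i, v), (var, val)) not in conflicts and
                   ((var, val), (i, v)) not in conflicts
                   for i, v in enumerate(assignment))

    def backtrack(assignment):
        nonlocal effort
        var = len(assignment)
        if var == len(domains):
            return True                  # full consistent assignment found
        for val in domains[var]:
            if consistent(assignment, var, val):
                if backtrack(assignment + [val]):
                    return True
            effort += 1                  # count every failed extension attempt
        return False

    backtrack([])
    return effort
```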
BMC Genomics | 2010
Robert R. Kitchen; Vicky S. Sabine; Andrew H. Sims; E. Jane Macaskill; Lorna Renshaw; Jeremy Thomas; Jano van Hemert; J. Michael Dixon; John M.S. Bartlett
Background: Microarray technology is a popular means of producing whole-genome transcriptional profiles; however, high cost and scarcity of mRNA have led many studies to be conducted based on the analysis of single samples. We exploit the design of the Illumina platform, specifically multiple arrays on each chip, to evaluate intra-experiment technical variation using repeated hybridisations of universal human reference RNA (UHRR) and duplicate hybridisations of primary breast tumour samples from a clinical study. Results: A clear batch-specific bias was detected in the measured expression of both the UHRR and the clinical samples. This bias was found to persist following standard microarray normalisation techniques. However, when mean-centring or empirical Bayes batch-correction methods (ComBat) were applied to the data, inter-batch variation in the UHRR and clinical samples was greatly reduced. Correlation between replicate UHRR samples improved by two orders of magnitude following batch correction using ComBat (ranging from 0.9833-0.9991 to 0.9997-0.9999), which also increased the consistency of the gene-lists from the duplicate clinical samples, from 11.6% in quantile-normalised data to 66.4% in batch-corrected data. The use of UHRR as an inter-batch calibrator provided a small additional benefit when used in conjunction with ComBat, further increasing the agreement between the two gene-lists, up to 74.1%. Conclusion: In the interests of practicality and cost, these results suggest that single samples can generate reliable data, but only after careful compensation for technical bias in the experiment. We recommend that investigators appreciate the propensity for such variation in the design stages of a microarray experiment and that the use of suitable correction methods becomes routine during the statistical analysis of the data.
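A minimal sketch of per-batch mean-centring, the simpler of the two corrections compared in the paper; ComBat, the empirical Bayes method, is available in R's sva package and in Python ports. The genes-by-samples matrix layout and function name are assumptions for illustration.

```python
# A minimal sketch: remove additive batch effects by per-batch mean-centring.
import numpy as np

def mean_center_batches(expr: np.ndarray, batches: np.ndarray) -> np.ndarray:
    """expr: genes x samples expression matrix; batches: one batch id per sample."""
    corrected = expr.astype(float).copy()
    for b in np.unique(batches):
        cols = batches == b
        # subtract each gene's batch-specific mean within this batch
        corrected[:, cols] -= corrected[:, cols].mean(axis=1, keepdims=True)
    # restore each gene's global mean so overall expression levels are preserved
    return corrected + expr.mean(axis=1, keepdims=True)
```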