Shoaib Sufi
University of Manchester
Publications
Featured research published by Shoaib Sufi.
Nucleic Acids Research | 2013
Katherine Wolstencroft; Robert Haines; Donal Fellows; Alan R. Williams; David Withers; Stuart Owen; Stian Soiland-Reyes; Ian Dunlop; Aleksandra Nenadic; Paul Fisher; Jiten Bhagat; Khalid Belhajjame; Finn Bacall; Alex Hardisty; Abraham Nieva de la Hidalga; Maria Paula Balcazar Vargas; Shoaib Sufi; Carole A. Goble
The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments), using the Taverna Server. In bioinformatics, Taverna workflows are typically used in the areas of high-throughput omics analyses (for example, proteomics or transcriptomics), or for evidence gathering methods involving text mining or data mining. Through Taverna, scientists have access to several thousand different tools and resources that are freely available from a large range of life science institutions. Once constructed, the workflows are reusable, executable bioinformatics protocols that can be shared, reused and repurposed. A repository of public workflows is available at http://www.myexperiment.org. This article provides an update to the Taverna tool suite, highlighting new features and developments in the workbench and the Taverna Server.
International Conference on e-Science | 2010
Sean Bechhofer; John Ainsworth; Jiten Bhagat; Iain Buchan; Philip A. Couch; Don Cruickshank; David De Roure; Mark Delderfield; Ian Dunlop; Matthew Gamble; Carole A. Goble; Danius T. Michaelides; Paolo Missier; Stuart Owen; David R. Newman; Shoaib Sufi
Scientific data stands to represent a significant portion of the linked open data cloud, and science itself stands to benefit from the data fusion capability that this will afford. However, simply publishing linked data into the cloud does not necessarily meet the requirements of reuse. Publishing carries requirements of provenance, quality, credit, attribution and methods in order to provide the reproducibility that allows validation of results. In this paper we make the case for a scientific data publication model on top of linked data and introduce the notion of Research Objects as first-class citizens for sharing and publishing.
Computing in Science and Engineering | 2013
Stephen Crouch; Neil Chue Hong; Simon Hettrick; Mike Jackson; Aleksandra Pawlik; Shoaib Sufi; Les Carr; David De Roure; Carole A. Goble; Mark Parsons
To effect change, the Software Sustainability Institute works with researchers, developers, funders, and infrastructure providers to identify and address key issues with research software.
Nature Neuroscience | 2017
Stephen J. Eglen; Ben Marwick; Yaroslav O. Halchenko; Michael Hanke; Shoaib Sufi; Padraig Gleeson; R. Angus Silver; Andrew P. Davison; Linda J. Lanyon; Mathew Abrams; Thomas Wachtler; David Willshaw; Christophe Pouzat; Jean-Baptiste Poline
Computational techniques are central in many areas of neuroscience and are relatively easy to share. This paper describes why computer programs underlying scientific publications should be shared and lists simple steps for sharing. Together with ongoing efforts in data sharing, this should aid reproducibility of research.
International Conference on e-Science | 2009
Damian Flannery; Brian Matthews; Tom Griffin; Juan Bicarregui; Michael Gleaves; Laurent Lerusse; Roger Downing; Alun Ashton; Shoaib Sufi; Glen Drinkwater; Kerstin Kleese
Scientific facilities, in particular large-scale photon and neutron sources, have demanding requirements to manage the increasing quantities of experimental data they generate in a systematic and secure way. In this paper, we describe the ICAT infrastructure for cataloguing facility-generated experimental data which has been in development within STFC and DLS for several years. We consider the factors which have influenced its design and describe its architecture and metadata model, a key tool in the management of data. We go on to give an outline of its current implementation and use, with plans for its future development.
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing | 2011
Clemens Neudecker; Sven Schlarb; Zeki Mustafa Dogan; Paolo Missier; Shoaib Sufi; Alan R. Williams; Katy Wolstencroft
The paper presents a novel web-based platform for experimental workflow development in historical document digitisation and analysis. The platform has been developed as part of the IMPACT project, providing a range of tools and services for transforming physical documents into digital resources. It explains the main drivers in developing the technical framework and its architecture, how and by whom it can be used and presents some initial results. The main idea lies in setting up an interoperable and distributed infrastructure based on loose coupling of tools via web services that are wrapped in modular workflow templates which can be executed, combined and evaluated in many different ways. As the workflows are registered through a Web 2.0 environment, which is integrated with a workflow management system, users can easily discover, share, rate and tag workflows and thereby support the building of capacity across the whole community. Where ground truth is available, the workflow templates can also be used to compare and evaluate new methods in a transparent and flexible way.
Proceedings of the Tenth ECMWF Workshop on the Use of High Performance Computers in Meteorology | 2003
Kerstin Kleese van Dam; Shoaib Sufi; Glen Drinkwater; Lisa Blanshard; Ananta Manandhar; Rik Tyer; Robert J Allan; Kevin O’Neill; Michael Doherty; Mark Williams; Andrew Woolf; Lakshmi Sastry
Current developments for an e-Science Environment for Environmental Science, integrating data discovery and retrieval, computation and visualisation, will be presented. The paper focuses on three developments of the CLRC e-Science Centre: the Dataportal, the HPCPortal and the VisualisationPortal. The Dataportal technology is to be used, for example, by all departments of CLRC (the Central Laboratory of the Research Councils of the UK), the Natural Environment Research Council DataGrid and the Environment from the Molecular Level project. The HPCPortal will provide access to code libraries and compute resources on the UK Science Grid. Finally, the VisualisationPortal is to be used, for example, by the projects mentioned above and the GODIVA project to provide access to suitable visualisation tools. It is our aim to provide easy access and support for the usage of data, substantial computing and visualisation resources across Europe by using Grid technologies such as grid services and Globus, via user-configurable web access points (personal workbenches).
Knowledge and Data Management in GRIDs | 2007
Shoaib Sufi; Brian Matthews
A general model for the representation of scientific study metadata does not exist. The e-Science enablement of the data holdings of CCLRC requires such a model to allow access to the data resources of the facilities in a uniform way. By proposing a model and an implementation, the adoption of such a system would aid the interoperability of scientific information systems in the organisation and form a specification of the types and categories of metadata that studies should capture about their investigations and the data they produce, inside and outside of CCLRC. This allows further exploitation of scientific studies and associated datasets, eases citation, facilitates collaboration, and allows the easy integration of pre-Grid metadata into a common Grid/e-Science-enabled scientific information platform. In this paper, we describe a science metadata model developed at CCLRC, with its motivation, overall design, usage and future development.
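The kind of hierarchy a study-level metadata model implies can be sketched in a few lines. This is a minimal illustration only, assuming a study → investigation → dataset nesting; the class and field names here are hypothetical and are not the actual CCLRC schema.

```python
# Hypothetical sketch of a hierarchical study-metadata record
# (study -> investigation -> dataset); names are illustrative,
# not the CCLRC model's actual terms.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    name: str
    location: str  # e.g. a facility file-store URI


@dataclass
class Investigation:
    title: str
    instrument: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Study:
    identifier: str  # a stable ID, which is what eases citation
    investigators: List[str]
    investigations: List[Investigation] = field(default_factory=list)


study = Study(
    identifier="STUDY-0001",
    investigators=["A. Scientist"],
    investigations=[
        Investigation(
            title="Powder diffraction run",
            instrument="HRPD",
            datasets=[Dataset("run-42", "file:///archive/run-42.raw")],
        )
    ],
)
print(study.identifier, len(study.investigations))
```

A uniform record shape like this is what lets heterogeneous facility holdings be queried through one interface, as the abstract describes.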
Journal of Medical Internet Research | 2016
Caroline Jay; Simon Harper; Ian Dunlop; Samuel G. Smith; Shoaib Sufi; Carole A. Goble; Iain Buchan
Background: Data discovery, particularly the discovery of key variables and their inter-relationships, is key to secondary data analysis, and in turn, to the evolving field of data science. Interface designers have presumed that their users are domain experts, and so they have provided complex interfaces to support these “experts.” Such interfaces hark back to a time when searches needed to be accurate the first time, as there was a high computational cost associated with each search. Our work is part of a governmental research initiative between the medical and social research funding bodies to improve the use of social data in medical research. Objective: The cross-disciplinary nature of data science means no assumptions can be made regarding the domain expertise of a particular scientist, whose interests may intersect multiple domains. Here we consider the common requirement for scientists to seek archived data for secondary analysis. This has more in common with the search needs of the “Google generation” than with their single-domain, single-tool forebears. Our study compares a Google-like interface with traditional ways of searching for noncomplex health data in a data archive. Methods: Two user interfaces were evaluated on the same set of tasks in extracting data from surveys stored in the UK Data Archive (UKDA). One interface, Web search, is “Google-like,” enabling users to browse, search for, and view metadata about study variables, whereas the other, traditional search, has a standard multi-option user interface. Results: Using a comprehensive set of tasks with 20 volunteers, we found that the Web search interface met data discovery needs and expectations better than the traditional search. A task × interface repeated-measures analysis showed a main effect indicating that answers found through the Web search interface were more likely to be correct (F(1,19)=37.3, P<.001), with a main effect of task (F(3,57)=6.3, P<.001).
Further, participants completed the tasks significantly faster using the Web search interface (F(1,19)=18.0, P<.001). There was also a main effect of task (F(2,38)=4.1, P=.025, Greenhouse-Geisser correction applied). Overall, participants were asked to rate learnability, ease of use, and satisfaction. Paired mean comparisons showed that the Web search interface received significantly higher ratings than the traditional search interface for learnability (P=.002, 95% CI 0.6-2.4), ease of use (P<.001, 95% CI 1.2-3.2), and satisfaction (P<.001, 95% CI 1.8-3.5). The results show the superior cross-domain usability of Web search, which is consistent with its general familiarity and with enabling queries to be refined as the search proceeds, treating serendipity as part of the refinement. Conclusions: The results provide clear evidence that data science should adopt single-field natural language search interfaces for variable search, supporting in particular: query reformulation; data browsing; faceted search; surrogates; relevance feedback; summarization, analytics, and visual presentation.
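The F(1,19) statistics above reflect a repeated-measures design: 20 participants each used both interfaces, giving 1 condition degree of freedom and 19 error degrees of freedom. A minimal sketch of that test structure, computed on clearly synthetic scores (the data and effect size here are invented for illustration, not the study's measurements):

```python
# Hedged sketch: one-way repeated-measures ANOVA on synthetic data,
# mirroring the F(1,19) structure of a 20-participant, 2-interface design.
import numpy as np


def rm_anova_one_way(scores):
    """scores: (n_subjects, n_conditions) array.
    Returns (F, df_condition, df_error)."""
    n, k = scores.shape
    grand = scores.mean()
    # Partition total variability into condition, subject, and error terms.
    ss_cond = n * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_subj = k * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_total = ((scores - grand) ** 2).sum()
    ss_err = ss_total - ss_cond - ss_subj
    df_cond, df_err = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df_cond) / (ss_err / df_err), df_cond, df_err


rng = np.random.default_rng(0)
# 20 participants scored under two interfaces; "web" is shifted upward
# to simulate an interface effect.
traditional = rng.normal(5.0, 1.0, 20)
web = traditional + rng.normal(1.0, 0.5, 20)
scores = np.column_stack([traditional, web])

F, df1, df2 = rm_anova_one_way(scores)
print(f"F({df1},{df2}) = {F:.2f}")
```

With only two conditions, this F is the square of a paired t-statistic; the study's full task × interface analysis adds a second within-subject factor, which the sketch omits.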
PLOS Computational Biology | 2018
Shoaib Sufi; Aleksandra Nenadic; Raniere Silva; Beth Duckles; Iveta Simera; Jennifer A. de Beyer; Caroline Struthers; Terhi Nurmikko-Fuller; Louisa Bellis; Wadud Miah; Adriana Wilde; Iain Emsley; Olivier Philippe; Melissa Balzano; Sara Coelho; Heather Ford; Catherine Jones; Vanessa Higgins
Workshops are used to explore a specific topic, to transfer knowledge, to solve identified problems, or to create something new. In funded research projects and other research endeavours, workshops are the mechanism used to gather the wider project, community, or interested people together around a particular topic. However, natural questions arise: how do we measure the impact of these workshops? Do we know whether they are meeting the goals and objectives we set for them? What indicators should we use? In response to these questions, this paper outlines rules that will improve the measurement of the impact of workshops.