Salman Zubair Toor
Uppsala University
Publication
Featured research published by Salman Zubair Toor.
SIAM Journal on Scientific Computing | 2016
Brian Drawert; Michael Trogdon; Salman Zubair Toor; Linda R. Petzold; Andreas Hellander
Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools, a complex software stack, and large, scalable compute and data analysis resources, owing to the high computational cost of Monte Carlo workflows. The complexity of setting up and managing a large-scale distributed computing environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This creates a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for the distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for developing sharable and reproducible distributed parallel computational experiments.
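The sketch below illustrates, under stated assumptions, the kind of embarrassingly parallel Monte Carlo workflow that MOLNs distributes: many independent stochastic realizations run in parallel and are then aggregated. It is not the PyURDME or MOLNs API; the toy birth-death model and all parameter names are hypothetical.

```python
# Illustrative sketch only: a toy ensemble of Gillespie-style stochastic
# realizations run in parallel, mimicking the Monte Carlo workflow that
# MOLNs orchestrates for PyURDME models. Model and parameters are hypothetical.
import random
from multiprocessing import Pool

def birth_death_realization(seed, k_birth=10.0, k_death=0.1, t_end=50.0):
    """Run one stochastic realization and return the final copy number."""
    rng = random.Random(seed)
    t, x = 0.0, 0
    while True:
        a_birth, a_death = k_birth, k_death * x
        a_total = a_birth + a_death
        dt = rng.expovariate(a_total)            # time to next reaction
        if t + dt > t_end:
            break
        t += dt
        if rng.random() * a_total < a_birth:     # choose which reaction fires
            x += 1
        else:
            x -= 1
    return x

if __name__ == "__main__":
    with Pool() as pool:                          # one realization per worker
        finals = pool.map(birth_death_realization, range(1000))
    mean = sum(finals) / len(finals)
    print(f"ensemble mean copy number: {mean:.1f} (expected ~{10.0 / 0.1:.0f})")
```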
international conference on networking, architecture, and storage | 2012
Salman Zubair Toor; Rainer Toebbicke; Maitane Zotes Resines; Sverker Holmgren
We present a first case study in which an open source storage cloud based on OpenStack Swift is used for handling data from CERN experiments using the ROOT software framework. Storage clouds of this type promise to be easy to deploy and to provide transparent access to data using standardized protocols. We examine the scalability and performance of the system using test cases derived from normal usage patterns and the structure of the ROOT software. The results show that cloud solutions like the Swift storage system could fulfill the requirements of the CERN scientific community. Verifying this will require a more extensive effort with many more tests and use cases. However, the impact of providing alternative storage solutions is large, and further work is motivated.
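As a rough illustration of the kind of object-store access the study measures, the following sketch times simple put/get operations against a Swift-compatible store with python-swiftclient. The endpoint, credentials, container and object names are placeholders; the actual tests drove Swift through ROOT-derived access patterns rather than this simple benchmark.

```python
# Illustrative sketch only: timing put/get operations against a Swift-compatible
# object store with python-swiftclient. Endpoint, credentials, and names are
# placeholders, not the configuration used in the study.
import os
import time
from swiftclient.client import Connection

conn = Connection(
    authurl="https://swift.example.org/auth/v1.0",  # placeholder endpoint
    user="account:user",                            # placeholder credentials
    key="secret",
)

container, name = "root-data", "events-000.bin"
payload = os.urandom(64 * 1024 * 1024)              # 64 MiB test object

conn.put_container(container)

t0 = time.perf_counter()
conn.put_object(container, name, contents=payload)
t1 = time.perf_counter()
_headers, body = conn.get_object(container, name)
t2 = time.perf_counter()

size_mib = len(payload) / 2**20
print(f"upload:   {size_mib / (t1 - t0):.1f} MiB/s")
print(f"download: {size_mib / (t2 - t1):.1f} MiB/s")
```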
Journal of Physics: Conference Series | 2014
Salman Zubair Toor; Lirim Osmani; Paula Eerola; O Kraemer; T. Lindén; Sasu Tarkoma; John White
The challenge of providing a resilient and scalable computational and data management solution for massive-scale research environments requires continuous exploration of new technologies and techniques. In this project the aim has been to design a scalable and resilient infrastructure for CERN HEP data analysis. The infrastructure is based on OpenStack components for structuring a private cloud with the Gluster File System, integrating state-of-the-art cloud technologies with the traditional Grid middleware infrastructure. Our test results show that the adopted approach provides a scalable and resilient solution for managing resources without compromising performance or high availability.
IEEE Transactions on Services Computing | 2018
Lirim Osmani; Salman Zubair Toor; Miika Komu; Matti J Kortelainen; T. Lindén; John White; Rasib Khan; Paula Eerola; Sasu Tarkoma
Cloud computing improves utilization and flexibility in allocating computing resources while reducing infrastructural costs. However, in many cases cloud technology is still proprietary and tainted by security issues rooted in the multi-user and hybrid cloud environment. A lack of secure connectivity in a hybrid cloud environment hinders the adoption of clouds by scientific communities that need to scale out their local infrastructure using publicly available resources for large-scale experiments. In this article, we present a case study of the DII-HEP secure cloud infrastructure and propose an approach to securely scale out a private cloud deployment to public clouds in order to support hybrid cloud scenarios. A challenge in such scenarios is that cloud vendors may offer varying and possibly incompatible ways to isolate and interconnect virtual machines located in different cloud networks. Our approach is tenant driven in the sense that the tenant provides its own connectivity mechanism. We provide a qualitative and quantitative analysis of a number of alternatives to solve this problem. We have chosen one of the standardized alternatives, the Host Identity Protocol, for further experimentation in a production system because it supports legacy applications in a topologically independent and secure way.
international conference on e-science | 2013
Andrej Andrejev; Salman Zubair Toor; Andreas Hellander; Sverker Holmgren; Tore Risch
Data-intensive applications in e-Science require scalable solutions for storage as well as interactive tools for analysis of scientific data. It is important to be able to query the data in a storage-independent way and to obtain the results of the data analysis incrementally (in contrast to traditional batch solutions). We use the RDF data model, extended with multidimensional numeric arrays, to represent the results, parameters, and other metadata describing scientific experiments, and SciSPARQL, an extension of the SPARQL language, to combine massive numeric array data and metadata in queries. To address the scalability problem we present an architecture that enables the same SciSPARQL queries to be executed on the RDF dataset whether it is stored in a relational DBMS or mapped over a specialized, geographically distributed e-Science data store. In order to minimize access and communication costs, we represent the arrays with proxy objects and retrieve their content lazily. We formulate typical analysis tasks from a computational biology application in terms of SciSPARQL queries and compare the query processing performance with manually written MATLAB scripts.
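The lazy array-proxy idea can be sketched in plain Python, under the assumption of a simple in-memory stand-in for the remote store; this is not the actual SciSPARQL/data-store implementation, and the identifiers used are hypothetical.

```python
# Illustrative sketch only: query results carry lightweight proxies, and array
# content is fetched from a (hypothetical) back end only when it is accessed.
import numpy as np

class ArrayProxy:
    """Stands in for a numeric array stored remotely; fetches lazily."""

    def __init__(self, array_id, fetch):
        self.array_id = array_id
        self._fetch = fetch      # callable that retrieves the real data
        self._data = None

    def _materialize(self):
        if self._data is None:   # first access triggers the transfer
            self._data = self._fetch(self.array_id)
        return self._data

    def __getitem__(self, index):
        return self._materialize()[index]

    def mean(self):
        return float(self._materialize().mean())

# Stand-in for the remote store: here just a dict of NumPy arrays.
remote_store = {"run-42/trajectory": np.linspace(0.0, 1.0, 1_000_000)}

proxy = ArrayProxy("run-42/trajectory", fetch=remote_store.__getitem__)
# No data has moved yet; a metadata query could return many such proxies
# cheaply. Content is transferred only when the analysis touches it:
print(proxy[::250_000])   # first access materializes the array
print(proxy.mean())
```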
ieee international conference on escience | 2011
Salman Zubair Toor; Manivasakan Sabesan; Sverker Holmgren; Tore Risch
The massive increase in the size of the data provided by e-Science applications requires not only increasing the capacity of available resources but also designing new strategies for their efficient utilization. In this paper we present a scalable approach to extending a file-oriented storage system, Chelonia, with geographically distributed databases defined by a generic database schema. The schema is able to model the data from typical e-Science applications, and the system includes a web-service-based query service that allows e-Science applications to query the required data.
Journal of Physics: Conference Series | 2010
Oxana Smirnova; D. Cameron; P Dóbé; M. Ellert; Thomas Frågåt; Michael Grønager; Daniel Johansson; J Jönemo; Josva Kleist; Marek Kocan; Aleksandr Konstantinov; Balazs Konya; Iván Márton; Steffen Möller; Bjarte Mohn; Zs Nagy; J. K. Nilsen; F. Ould Saada; Weizhong Qiang; Alexander Lincoln Read; P Rosendahl; G Roczei; M Savko; M Skou Andersen; P Stefán; Ferenc Szalai; A. Taga; Salman Zubair Toor; Anders Wäänänen
The Advanced Resource Connector (ARC) middleware introduced by NorduGrid is one of the basic Grid solutions used by scientists worldwide. While well proven in daily use by a wide variety of scientific applications, both at large-scale infrastructures like the Nordic DataGrid Facility (NDGF) and in smaller projects, today's production ARC is still largely based on conventional Grid technologies and custom interfaces introduced a decade ago. In order to guarantee sustainability, true cross-system portability, and standards-based interoperability, the ARC community has undertaken a major effort to implement a modular Web Service (WS) approach in the middleware. With support from the EU KnowARC project, new components were introduced and the key existing ARC services were extended with standards-compliant, WS-based interfaces following a service-oriented architecture. These components include the hosting environment framework, the resource-coupled execution service, the re-engineered client library, the self-healing storage solution, and the peer-to-peer information system, to name a few. Gradual introduction of these new services and client tools into the production middleware releases is carried out together with NDGF, ensuring a smooth transition to the next generation of Grid middleware. Standard interfaces and the modularity of the new component design are essential for ARC contributions to the planned Universal Middleware Distribution of the European Grid Initiative.
Journal of Cheminformatics | 2018
Laeeq Ahmed; Valentin Georgiev; Marco Capuccini; Salman Zubair Toor; Wesley Schaal; Erwin Laure; Ola Spjuth
Background: Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or large workstations in a brute-force manner, by docking and scoring all available ligands. Contribution: In this study we propose a strategy based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remainder of the ligands in order to exclude those predicted to be low-scoring. Another set of ligands is then docked, the model is retrained, and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling. Results: We show on four different targets that conformal prediction based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining, on average, 94% accuracy for the top 30 hits and achieving a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.
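A minimal sketch of the iterative screen-train-filter loop follows, using scikit-learn in place of the paper's Spark-based pipeline and a plain probability threshold in place of conformal prediction. The dock_and_score function, descriptor generation, and all thresholds are hypothetical placeholders, not the CPVS implementation.

```python
# Illustrative sketch only: iterative docking of batches, training an SVM on
# docked ligands, and pruning ligands the model confidently predicts to be
# low-scoring. Placeholder scoring function; not the spark-cpvs code.
import numpy as np
from sklearn.svm import SVC

n_ligands, n_features = 10_000, 16
rng = np.random.default_rng(0)
true_w = rng.normal(size=n_features)                  # hidden "true" scoring weights

def dock_and_score(features):
    """Placeholder for the expensive docking step (returns docking scores)."""
    return features @ true_w + 0.1 * rng.normal(size=len(features))

library = rng.normal(size=(n_ligands, n_features))    # precomputed descriptors
remaining = np.arange(n_ligands)

X_train = np.empty((0, n_features))
y_train = np.empty(0, dtype=int)

for _ in range(5):                                     # a few training iterations
    batch, remaining = remaining[:500], remaining[500:]
    scores = dock_and_score(library[batch])            # dock this batch
    labels = (scores > np.percentile(scores, 80)).astype(int)  # top 20% = "high"
    X_train = np.vstack([X_train, library[batch]])
    y_train = np.concatenate([y_train, labels])

    model = SVC(probability=True).fit(X_train, y_train)
    p_high = model.predict_proba(library[remaining])[:, 1]
    remaining = remaining[p_high > 0.1]                 # drop confident low-scorers

print(f"ligands still requiring docking: {len(remaining)} of {n_ligands}")
```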
international conference on e-science | 2017
Salman Zubair Toor; Mathias Lindberg; Ingemar Falman; Andreas Vallin; Olof Mohill; Pontus Freyhult; Linus Nilsson; Martin Agback; Lars Viklund; Henric Zazzik; Ola Spjuth; Marco Capuccini; Joakim Moller; Donal P. Murtagh; Andreas Hellander
The cloud computing paradigm has fundamentally changed the way computational resources are offered. Although the number of large-scale providers in academia is still relatively small, there is rapidly increasing interest in and adoption of cloud Infrastructure-as-a-Service in the scientific community. The added flexibility in how applications can be implemented, compared to traditional batch computing systems, is one of the key success factors for the paradigm, and scientific cloud computing promises to increase the adoption of simulation and data analysis in scientific communities that are not traditionally users of large-scale e-Infrastructure, the so-called "long tail of science". In 2014, the Swedish National Infrastructure for Computing (SNIC) initiated a project to investigate the cost and constraints of offering cloud infrastructure for Swedish academia. The aim was to build a platform where academics could evaluate cloud computing for their use cases. SNIC Science Cloud (SSC) has since evolved into a national-scale cloud infrastructure based on three geographically distributed regions. In this article we present the SSC vision, architectural details, and user stories. We summarize the experiences gained from running a national-scale cloud facility as "ten simple rules" for starting up a science cloud project based on OpenStack. We also highlight some key areas that require careful attention in order to offer cloud infrastructure for ubiquitous academic needs, in particular scientific workloads.
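For a flavor of what Infrastructure-as-a-Service means for a user of a cloud like SSC, the sketch below launches a single VM with the openstacksdk library. The cloud entry ("ssc"), image, flavor, network, and keypair names are placeholders drawn from a hypothetical clouds.yaml, not the actual SSC configuration.

```python
# Illustrative sketch only: launching one VM on an OpenStack cloud via
# openstacksdk. All names below are placeholders.
import openstack

conn = openstack.connect(cloud="ssc")                  # credentials read from clouds.yaml

image = conn.compute.find_image("Ubuntu 22.04")        # placeholder image name
flavor = conn.compute.find_flavor("ssc.small")         # placeholder flavor name
network = conn.network.find_network("default-net")     # placeholder network name

server = conn.compute.create_server(
    name="analysis-node-1",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
    key_name="my-keypair",                             # placeholder keypair
)
server = conn.compute.wait_for_server(server)          # block until ACTIVE
print(server.name, server.status)
```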
Journal of Physics: Conference Series | 2011
J. K. Nilsen; Salman Zubair Toor; Zsombor Nagy; Alex Read
Chelonia is a novel grid storage system designed to fill the gap between the requirements of large, sophisticated scientific collaborations that have adopted the grid paradigm for their distributed storage needs and those of corporate business communities gravitating towards the cloud paradigm. Chelonia is an integrated system of heterogeneous, geographically dispersed storage sites that is easily and dynamically expandable and optimized for high availability and scalability. The architecture and implementation, in terms of web services running inside the Advanced Resource Connector Hosting Environment Daemon (ARC HED), are described, and results of tests in both local-area and wide-area networks that demonstrate the fault tolerance, stability, and scalability of Chelonia are presented. In addition, example setups for production deployments for small and medium-sized VOs are described.