Mohammad Rezwanul Huq
University of Twente
Publications
Featured research published by Mohammad Rezwanul Huq.
Data Management for Sensor Networks | 2010
Mohammad Rezwanul Huq; Andreas Wombacher; Peter M.G. Apers
E-science applications use fine-grained data provenance to maintain the reproducibility of scientific results, i.e., for each processed data tuple, the source data used to process the tuple as well as the processing approach is documented. Since most e-science applications perform on-line processing of sensor data using overlapping time windows, the overhead of maintaining fine-grained data provenance is huge, especially in longer data processing chains, because each data item is used by many time windows. In this paper, we propose an approach that reduces the storage costs of achieving fine-grained data provenance by maintaining data provenance at the relation level instead of at the tuple level and by making the content of the used database reproducible. The approach has been implemented prototypically for streaming and manually sampled data.
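To make the relation-level idea concrete (the abstract is prose only; all names, numbers and rules below are illustrative assumptions, not the paper's implementation), one can timestamp every tuple so that the state of a relation at any past processing time can be rebuilt, and the window that produced an output can be re-derived from a single provenance record per relation:

from dataclasses import dataclass
from typing import List

@dataclass
class SensorTuple:
    value: float
    valid_from: float                 # transaction-time timestamp when the tuple was inserted
    valid_to: float = float("inf")    # timestamp when it was deleted or overwritten

# Provenance is kept once per relation/processing step, not once per output tuple.
RELATION_PROVENANCE = {
    "avg_temperature": {"source": "raw_temperature",
                        "operation": "sliding_average",
                        "window_size": 10.0, "slide": 5.0},
}

def relation_state(relation: List[SensorTuple], at_time: float) -> List[SensorTuple]:
    # Reproduce the content of the timestamped relation at a past instant.
    return [t for t in relation if t.valid_from <= at_time < t.valid_to]

def reconstruct_window(relation: List[SensorTuple], output_time: float, window_size: float):
    # Re-derive which source tuples a windowed step consumed for an output produced
    # at output_time, using only the relation-level provenance record above.
    state = relation_state(relation, output_time)
    return [t for t in state if output_time - window_size < t.valid_from <= output_time]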
Database and Expert Systems Applications | 2011
Mohammad Rezwanul Huq; Andreas Wombacher; Peter M.G. Apers
Fine-grained data provenance ensures reproducibility of results in decision making, process control and e-science applications. However, maintaining this provenance is challenging in stream data processing because of its massive storage consumption, especially with large overlapping sliding windows. In this paper, we propose an approach to infer fine-grained data provenance by using a temporal data model and the coarse-grained data provenance of the processing. The approach has been evaluated on a real dataset, and the results show that our inference method provides provenance information as accurate as explicit fine-grained provenance at a reduced storage cost.
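A minimal sketch of the inference idea, under assumed names and window parameters (none of which come from the paper): given only the coarse-grained description of a processing step and the timestamps from a temporal data model, the contributing input tuples of an output can be recomputed instead of stored.

# Coarse-grained provenance of one processing step (illustrative values).
step = {"operation": "avg", "window_size": 30.0, "slide": 10.0, "delay": 0.5}

# Temporal data model: each input tuple carries a timestamp.
inputs = [(float(ts), ts * 0.1) for ts in range(0, 100, 2)]   # (timestamp, value)

def infer_provenance(output_ts, inputs, step):
    # Infer the input tuples that contributed to an output tuple, using only the
    # output's timestamp, the window specification, and the input timestamps.
    window_end = output_ts - step["delay"]          # instant at which the window fired
    window_start = window_end - step["window_size"]
    return [(ts, v) for ts, v in inputs if window_start < ts <= window_end]

# Example: which inputs explain the output emitted at t = 40.5?
contributing = infer_provenance(40.5, inputs, step)   # tuples with timestamps in (10, 40]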
Extending Database Technology | 2013
Mohammad Rezwanul Huq; Peter M.G. Apers; Andreas Wombacher
The increasing data volume and the highly complex models used in different domains make it difficult to debug models in case of anomalies. Data provenance provides scientists with sufficient information to investigate their models. In this paper, we propose a tool that can infer fine-grained data provenance based on a given script. The tool is demonstrated using a hydrological model and has also been tested successfully on other scripts in different contexts.
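Purely as an illustration of deriving provenance structure from a script (the example script, function names and the use of Python's ast module are assumptions, not the tool's actual mechanism), a coarse dataflow graph can be extracted by statically inspecting which variables each assignment reads:

import ast

SCRIPT = """
precip = load("precipitation.csv")
evap = load("evaporation.csv")
runoff = precip - evap
store(runoff, "runoff.csv")
"""

def workflow_graph(source: str):
    # Map each assigned variable to the variables it was computed from.
    graph = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
            used = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            graph[node.targets[0].id] = sorted(used - {"load", "store"})
    return graph

print(workflow_graph(SCRIPT))   # {'precip': [], 'evap': [], 'runoff': ['evap', 'precip']}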
IEEE International Conference on eScience | 2011
Mohammad Rezwanul Huq; Andreas Wombacher; Peter M.G. Apers
In stream data processing, data arrives continuously and is processed by decision making, process control and e-science applications. To control and monitor these applications, reproducibility of results is a vital requirement. However, it requires a massive amount of storage space to store fine-grained provenance data, especially for transformations with overlapping sliding windows. In this paper, we propose techniques that can significantly reduce storage costs while achieving high accuracy. Our evaluation shows that the adaptive inference technique achieves almost 100% accurate provenance information for a given dataset at lower storage costs than the other techniques. Moreover, we present a guideline on choosing among the provenance collection techniques described in this paper, based on the transformation operation and the stream characteristics.
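The following decision function only illustrates the flavor of such a guideline; the criteria and technique names are simplified assumptions, not the paper's exact rules:

def choose_technique(window_overlap_ratio, sampling_is_regular, processing_delay_is_constant):
    # Pick a provenance collection technique from stream and transformation characteristics.
    if window_overlap_ratio == 0:
        return "explicit fine-grained provenance"   # affordable when windows do not overlap
    if sampling_is_regular and processing_delay_is_constant:
        return "basic inference"                    # provenance is fully predictable
    return "adaptive inference"                     # adapt to observed delays and arrivals

print(choose_technique(window_overlap_ratio=0.8,
                       sampling_is_regular=False,
                       processing_delay_is_constant=False))   # "adaptive inference"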
Statistical and Scientific Database Management | 2012
Mohammad Rezwanul Huq; Peter M.G. Apers; Andreas Wombacher
Many applications employ a data processing chain, i.e. a workflow, to process data. Results of intermediate processing steps are often not persisted, since reproducing them is inexpensive and they are hardly reusable. However, in stream data processing, where data arrives continuously, documenting fine-grained provenance explicitly for a processing chain in order to reproduce results is not feasible, since the provenance data may grow to a multiple of the actual sensor data. In this paper, we propose a multi-step provenance inference technique that infers provenance data for the entire workflow with non-materialized intermediate views. Our solution provides a high-quality provenance graph.
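A rough sketch of multi-step inference, under assumed window sizes and step names (not taken from the paper): because the intermediate view between the two steps is never stored, the interval of contributing source tuples is derived by walking the workflow backwards through the window specifications alone.

# Workflow: source -> step1 (20 s window) -> step2 (60 s window) -> result.
steps = [{"name": "step1", "window": 20.0},
         {"name": "step2", "window": 60.0}]

def multi_step_provenance(result_ts, steps, source):
    # Infer which source tuples a final result depends on, widening the covered
    # time interval step by step instead of looking up non-materialized views.
    start, end = result_ts, result_ts
    for step in reversed(steps):
        start -= step["window"]
    return [(ts, v) for ts, v in source if start < ts <= end]

source = [(float(ts), ts % 7) for ts in range(0, 200)]
print(len(multi_step_provenance(150.0, steps, source)))   # source tuples in (70, 150]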
Database and Expert Systems Applications | 2012
Mohammad Rezwanul Huq; Peter M.G. Apers; Andreas Wombacher
Decision making, process control and e-science applications process stream data, mostly produced by sensors. To control and monitor these applications, reproducibility of results is a vital requirement. However, it requires a massive amount of storage space to store fine-grained provenance data, especially for transformations with overlapping sliding windows. In this paper, we propose a probabilistic technique to infer fine-grained provenance that can also estimate its accuracy beforehand. Our evaluation shows that the probabilistic inference technique achieves the same level of accuracy as the other approaches, with minimal prior knowledge.
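As a deliberately simplified stand-in for the paper's probabilistic model (the formula and parameters below are assumptions for illustration only), an a-priori accuracy estimate can relate the uncertainty in processing delay to the spacing of input tuples: inference only errs when a window boundary falls inside that uncertainty margin.

def estimated_accuracy(avg_sampling_interval, delay_uncertainty):
    # Probability that an inferred window boundary does not cut through the
    # delay-uncertainty margin, i.e. that inference matches explicit provenance,
    # assuming uniformly spaced tuple arrivals.
    if delay_uncertainty >= avg_sampling_interval:
        return 0.0
    return 1.0 - (delay_uncertainty / avg_sampling_interval)

# E.g. tuples every 2 s and processing delay known to within 0.1 s:
print(estimated_accuracy(avg_sampling_interval=2.0, delay_uncertainty=0.1))   # 0.95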
IEEE Transactions on Geoscience and Remote Sensing | 2013
Mohammad Rezwanul Huq; Peter M.G. Apers; Andreas Wombacher
Data provenance allows scientists to validate their model as well as to investigate the origin of an unexpected value. Furthermore, it can be used as a replication recipe for output data products. However, capturing provenance requires enormous effort from scientists in terms of time and training. First, they need to design the workflow of the scientific model, i.e., the workflow provenance; in practice, scientists may not document any workflow provenance before model execution due to the lack of time and training. Second, they need to capture provenance while the model is running, i.e., fine-grained data provenance. Explicit documentation of fine-grained provenance is not feasible because of the massive storage consumed by provenance data in such applications, including those in the geoscience domain, where data arrive continuously and are processed. In this paper, we propose an inference-based framework that provides both workflow and fine-grained data provenance at minimal cost in terms of time, training, and disk consumption. Our proposed framework is applicable to any given scientific model and is capable of handling different model dynamics, such as variation in processing time as well as in the arrival pattern of input data products. Our evaluation of the framework in a real use case with geospatial data shows that the proposed framework is relevant and suitable for scientists in the geoscience domain.
International Conference on e-Science | 2012
Mohammad Rezwanul Huq; Peter M.G. Apers; Andreas Wombacher; Yoshihide Wada; Ludovicus P. H. van Beek
Scientists require provenance information either to validate their model or to investigate the origin of an unexpected value. However, in practice they do not maintain any provenance information, and even designing the processing workflow is rare. Therefore, in this paper, we propose a solution that builds the workflow provenance graph by interpreting the scripts used for the actual processing. Further, scientists can request fine-grained provenance information based on the inferred workflow provenance. We also provide a guideline for customizing the workflow provenance graph according to user preferences. Our evaluation shows that the proposed approach is relevant and suitable for scientists to manage provenance.
Conference on Information and Knowledge Management | 2010
Mohammad Rezwanul Huq; Andreas Wombacher; Peter M.G. Apers
One of the major requirements for e-science applications handling sensor data is reproducibility of results. Several optimization and scalability problems arise while the reproducibility of results must remain guaranteed. Firstly, various data streams need to be coordinated to optimize the accuracy and processing of the results. Secondly, because of the high volume of streaming data and the series of processing steps to be performed on that data, the demand for disk space may grow unacceptably high. Lastly, reproducibility in a decentralized scenario may be difficult to achieve because of data replication. This paper introduces and addresses these challenges, which arise when optimizing the process of achieving reproducibility of results.
International Conference on Information Networking | 2008
Mohammad Rezwanul Huq; Young-Koo Lee; Byeong-Soo Jeong; Sungyoung Lee
Sharing files is very common in a collaborative environment. Users may want to share each other's files for more effective and meaningful collaboration, and it is often preferable to adapt a file so that it provides the required information with minimal overhead. Moreover, users may not want to share files in their original format, and due to device heterogeneity, file sharing in the original format is less meaningful. Therefore, data adaptation is needed for effective file sharing among users. In this paper, we propose a framework for file sharing and adaptation to enable effective collaboration among users in an advanced collaborative environment, which enhances the degree of collaboration. Moreover, we propose a hybrid approach for adapting data that considers user preferences and device capabilities at the time of adaptation; the goal of this approach is to provide the best possible adaptation strategy based on both. We also discuss our first prototype implementation, which at its current stage only deals with image files. The prototype realizes our designed framework and demonstrates the viability of our approach.
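A minimal sketch of such a hybrid selection (device fields, preference keys and thresholds are invented for illustration and are not the paper's actual strategy): combine device capabilities with a user preference into an image adaptation plan.

def adaptation_strategy(device, preference):
    # Choose target width from the smaller of the screen and the user's limit,
    # and pick an encoding quality from bandwidth and the stated priority.
    width = min(device["screen_width"], preference.get("max_width", device["screen_width"]))
    if preference.get("priority") == "quality" and device["bandwidth_kbps"] > 512:
        quality = "high"
    elif device["bandwidth_kbps"] > 128:
        quality = "medium"
    else:
        quality = "low"
    return {"target_width": width, "quality": quality, "format": device["supported_format"]}

phone = {"screen_width": 480, "bandwidth_kbps": 256, "supported_format": "jpeg"}
print(adaptation_strategy(phone, {"priority": "size", "max_width": 800}))
# {'target_width': 480, 'quality': 'medium', 'format': 'jpeg'}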