Mukhtaj Khan
Brunel University London
Publication
Featured research published by Mukhtaj Khan.
IEEE Transactions on Parallel and Distributed Systems | 2016
Mukhtaj Khan; Yong Jin; Maozhen Li; Yang Xiang; Changjun Jiang
MapReduce has become a major computing model for data-intensive applications. Hadoop, an open source implementation of MapReduce, has been adopted by an increasingly growing user community. Cloud computing service providers such as Amazon EC2 Cloud offer Hadoop users the opportunity to lease a certain amount of resources and pay for their use. However, a key challenge is that cloud service providers do not have a resource provisioning mechanism to satisfy user jobs with deadline requirements. Currently, it is solely the user's responsibility to estimate the required amount of resources for running a job in the cloud. This paper presents a Hadoop job performance model that accurately estimates job completion time and further provisions the required amount of resources for a job to be completed within a deadline. The proposed model builds on historical job execution records and employs a Locally Weighted Linear Regression (LWLR) technique to estimate the execution time of a job. Furthermore, it employs the Lagrange Multipliers technique for resource provisioning to satisfy jobs with deadline requirements. The proposed model is initially evaluated on an in-house Hadoop cluster and subsequently evaluated in the Amazon EC2 Cloud. Experimental results show that the accuracy of the proposed model in job execution estimation is in the range of 94.97 to 95.51 percent, and jobs are completed within the required deadlines following the resource provisioning scheme of the proposed model.
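A minimal sketch of the LWLR estimation step, assuming a Gaussian weighting kernel and illustrative job features; the paper's exact feature set and kernel choice are not reproduced here:

```python
# Minimal sketch of Locally Weighted Linear Regression (LWLR) for estimating
# job execution time from historical records. Feature names are illustrative,
# not taken from the paper.
import numpy as np

def lwlr_predict(x_query, X, y, tau=1.0):
    """Predict the target for x_query using locally weighted linear regression.

    X   : (n, d) matrix of historical job features (e.g. input data size)
    y   : (n,) vector of observed job execution times
    tau : bandwidth controlling how quickly the influence of distant points decays
    """
    # Add an intercept column to the design matrix and the query point.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    xq = np.hstack([1.0, x_query])

    # Gaussian kernel weights: nearby historical jobs dominate the fit.
    dists = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-dists / (2.0 * tau ** 2))
    W = np.diag(w)

    # Weighted least squares: theta = (X^T W X)^-1 X^T W y
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return xq @ theta

# Toy usage: execution time grows roughly linearly with input size (GB).
X_hist = np.array([[10.0], [20.0], [40.0], [80.0]])
y_hist = np.array([120.0, 230.0, 460.0, 900.0])
print(lwlr_predict(np.array([50.0]), X_hist, y_hist, tau=20.0))
```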
Power and Energy Society General Meeting | 2015
Mukhtaj Khan; Phillip M. Ashton; Maozhen Li; Gareth A. Taylor; Ioana Pisica; Junyong Liu
Phasor measurement units (PMUs) are being rapidly deployed in power grids due to their high sampling rates and synchronized measurements. The devices' high data reporting rates present major computational challenges in the requirement to process potentially massive volumes of data, in addition to new issues surrounding data storage. Fast algorithms capable of processing massive volumes of data are now required in the field of power systems. This paper presents a novel parallel detrended fluctuation analysis (PDFA) approach for fast event detection on massive volumes of PMU data, taking advantage of a cluster computing platform. The PDFA algorithm is evaluated using data from PMUs installed on the transmission system of Great Britain in terms of speedup, scalability, and accuracy. The speedup of the PDFA in computation is initially analyzed through Amdahl's Law. A revision to the law is then proposed, suggesting enhancements to its capability to analyze the performance gain in computation when parallelizing data-intensive applications in a cluster computing environment.
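An illustrative sketch of the core detrended fluctuation analysis (DFA) computation that PDFA parallelizes; the window sizes and the event-detection thresholding used in the paper are assumptions here:

```python
# Core DFA fluctuation for one window size: integrate the signal, remove a
# linear trend in each window, and take the RMS of the residuals.
import numpy as np

def dfa_fluctuation(signal, window_size):
    """Return the DFA fluctuation F(n) of a 1-D signal for one window size."""
    # Integrate the mean-subtracted signal (the "profile").
    profile = np.cumsum(signal - np.mean(signal))

    n_windows = len(profile) // window_size
    rms = []
    x = np.arange(window_size)
    for k in range(n_windows):
        segment = profile[k * window_size:(k + 1) * window_size]
        # Fit and subtract a linear trend within the window.
        coeffs = np.polyfit(x, segment, 1)
        trend = np.polyval(coeffs, x)
        rms.append(np.sqrt(np.mean((segment - trend) ** 2)))
    return np.mean(rms)

# Toy usage on a synthetic frequency-like signal.
rng = np.random.default_rng(0)
signal = 50.0 + 0.01 * np.cumsum(rng.standard_normal(5000))
print(dfa_fluctuation(signal, window_size=100))
```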
Fuzzy Systems and Knowledge Discovery | 2014
Mukhtaj Khan; Maozhen Li; Phillip M. Ashton; Gareth A. Taylor; Junyong Liu
Phasor Measurement Units (PMUs) are being rapidly deployed in power grids due to their high sampling rates. PMUs offer a more current and accurate visibility of the power grids than traditional SCADA systems. However, the high sampling rates of PMUs bring in two major challenges that need to be addressed to fully benefit from these PMU measurements. On one hand, any transient events captured in the PMU measurements can negatively impact the performance of steady state analysis. On the other hand, processing the high volumes of PMU data in a timely manner poses another challenge in computation. This paper presents PDFA, a parallel detrended fluctuation analysis approach for fast detection of transient events on massive PMU measurements utilizing a computer cluster. The performance of PDFA is evaluated from the aspects of speedup, scalability and accuracy in comparison with the standalone DFA approach.
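A hedged sketch of the data-parallel structure behind this kind of approach: the PMU measurement stream is split into segments and each segment is analyzed independently. The paper uses a computer cluster; a local process pool stands in here purely to illustrate the pattern, and the per-segment analysis is a placeholder rather than the paper's DFA implementation.

```python
# Map an analysis function over segments of a large PMU measurement stream
# in parallel on a multi-core machine.
from multiprocessing import Pool
import numpy as np

def analyse_segment(segment):
    """Placeholder per-segment analysis; in PDFA this would be the DFA step."""
    profile = np.cumsum(segment - np.mean(segment))
    return float(np.sqrt(np.mean(profile ** 2)))

if __name__ == "__main__":
    measurements = np.random.default_rng(1).standard_normal(1_000_000)
    segments = np.array_split(measurements, 16)   # one chunk per worker task
    with Pool(processes=4) as pool:
        results = pool.map(analyse_segment, segments)
    print(max(results))
```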
Journal of Theoretical Biology | 2018
M. Fazli Sabooh; Nadeem Iqbal; Mukhtaj Khan; M. S. Khan; Hafiz Farhan Maqbool
This study examines an accurate and efficient computational method for the identification of 5-methylcytosine sites in RNA modification. The occurrence of 5-methylcytosine (m5C) plays a vital role in a number of biological processes. For a better comprehension of these biological functions and mechanisms, it is necessary to recognize m5C sites in RNA precisely. Laboratory techniques and procedures are available to identify m5C sites in RNA, but they require a lot of time and resources. This study develops a new computational method for extracting the features of an RNA sequence. In this method, the RNA sequence is first encoded via a composite feature vector; then, for the selection of discriminative features, the minimum-redundancy-maximum-relevance algorithm is used. Secondly, the classification is based on a support vector machine evaluated with a jackknife cross-validation test. The suggested method efficiently distinguishes m5C sites from non-m5C sites, achieving an accuracy of 93.33% with a sensitivity of 90.0% and a specificity of 96.66% on benchmark datasets. The results show that the proposed algorithm delivers significantly better identification performance than existing computational techniques. This study extends the knowledge about the occurrence sites of RNA modification, which paves the way for a better comprehension of its biological functions and mechanism.
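A minimal sketch of the classification stage described above: a support vector machine evaluated with a jackknife (leave-one-out) test. The composite sequence encoding and the mRMR feature selection are assumed to have already produced the feature matrix; the data below is random and purely illustrative.

```python
# SVM with jackknife (leave-one-out) cross-validation on illustrative data.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(42)
X = rng.standard_normal((60, 20))     # 60 RNA samples, 20 selected features
y = rng.integers(0, 2, size=60)       # 1 = m5C site, 0 = non-m5C site

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("jackknife accuracy:", scores.mean())
```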
Concurrency and Computation: Practice and Experience | 2017
Godwin Caruana; Maozhen Li; Man Qi; Mukhtaj Khan; Omer Farooq Rana
MapReduce has become a major programming model for data-intensive applications in cloud computing environments. Hadoop, an open source implementation of MapReduce, has been adopted by an increasingly wide user community. However, Hadoop suffers from task scheduling performance degradation in heterogeneous contexts because of its homogeneous design focus. This paper presents gSched, a resource-aware Hadoop scheduler that takes into account both the heterogeneity of computing resources and provisioning charges when allocating tasks in cloud computing environments. gSched is initially evaluated in an experimental Hadoop cluster and demonstrates enhanced performance compared with the default Hadoop scheduler. Further evaluations conducted on the Amazon EC2 cloud demonstrate the effectiveness of gSched in task allocation in heterogeneous cloud computing environments.
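The abstract does not describe gSched's internal heuristics; the sketch below only illustrates, under assumed node attributes, the kind of trade-off a resource- and cost-aware placement decision involves. It is not gSched itself.

```python
# Illustrative resource- and cost-aware node selection for a task in a
# heterogeneous cluster: balance estimated runtime against provisioning charge.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    compute_score: float   # relative processing capability (higher is faster)
    hourly_cost: float     # provisioning charge for the node

def pick_node(nodes, est_work, cost_weight=0.5):
    """Pick the node minimizing a weighted mix of runtime and monetary cost."""
    def objective(node):
        runtime = est_work / node.compute_score      # hours of compute
        money = runtime * node.hourly_cost           # charge for the run
        return (1 - cost_weight) * runtime + cost_weight * money
    return min(nodes, key=objective)

cluster = [Node("m1.small", 1.0, 0.05), Node("c3.xlarge", 4.0, 0.21)]
print(pick_node(cluster, est_work=8.0).name)
```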
Concurrency and Computation: Practice and Experience | 2017
Mukhtaj Khan; Zhengwen Huang; Maozhen Li; Gareth A. Taylor; Mushtaq Khan
Hadoop MapReduce has become a major computing technology in support of big data analytics. The Hadoop framework has over 190 configuration parameters, and some of them can have a significant effect on the performance of a Hadoop job. Manually tuning these parameters to optimum or near-optimum values is a challenging and time-consuming task. This paper optimizes the performance of Hadoop by automatically tuning its configuration parameter settings. The proposed work first employs a gene expression programming technique to build an objective function based on historical job running records, which represents a correlation among the Hadoop configuration parameters. It then employs a particle swarm optimization technique, which uses the objective function to search for optimal or near-optimal parameter settings. Experimental results show that the proposed work enhances the performance of Hadoop significantly compared with the default settings. Moreover, it outperforms both rule-of-thumb settings and the Starfish model in Hadoop performance optimization.
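A minimal particle swarm optimization sketch searching over configuration parameters. In the paper the objective function is built by gene expression programming from historical job records; here a hand-written surrogate stands in for it, and the two parameters shown are illustrative assumptions.

```python
# PSO over a surrogate objective representing predicted job runtime as a
# function of two (illustrative) Hadoop configuration parameters.
import numpy as np

def objective(params):
    """Surrogate cost over two parameters, e.g. a sort buffer size (MB) and
    the number of reduce tasks (illustrative only)."""
    sort_mb, reducers = params
    return (sort_mb - 200) ** 2 / 1000.0 + (reducers - 24) ** 2 / 10.0

def pso(bounds, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    pos = rng.uniform(lo, hi, size=(n_particles, len(bounds)))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)]
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Velocity update: inertia + pull toward personal and global bests.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)]
    return gbest

print(pso(bounds=[(50, 1000), (1, 100)]))   # near-optimal parameter settings
```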
International Conference on Future Intelligent Vehicular Technologies | 2016
Fazlullah Khan; Izaz ur Rahman; Mukhtaj Khan; Nadeem Iqbal; Muhammad Alam
The Internet of Things (IoT) is a broad vision that incorporates real-world devices from everyday life. These objects coordinate with each other to share the information gathered from phenomena of interest. IoT is a broad term and has attained popularity with the integration of Cloud Computing and Big Data. The partnership among these technologies is revolutionizing the world in which we live and interact with different devices. On the downside, there is much speculation and forecasting about the scale of IoT products expected to be available in the market. Most of these products are vendor-specific and as such are not interoperable; they lack a unified standard and are not compatible with each other. Another major issue with these products is the lack of security features. Although IoT devices are resource-rich, they are not capable of communicating in the absence of embedded sensor nodes. The presence of resource-constrained sensors at the core of each IoT device makes it resource-starved and as such requires extremely lightweight yet secure algorithms to combat various attacks and prevent malevolent entities from spreading malicious data. In this paper we propose an extremely lightweight mutual handshaking algorithm for authentication. The proposed scheme verifies the identity of each participating device before establishing communication. Our scheme is based on a client-server interaction model using the Constrained Application Protocol (CoAP). A 4-byte header, extremely lightweight parsing complexity and JSON-based payload encryption make it a lightweight scheme for IoT objects. The proposed scheme can be used as an alternative to the DTLS schemes that are common nowadays for IoT objects.
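The abstract does not give the handshake's message formats; the sketch below only illustrates a JSON-encoded challenge-response exchange of the kind such a scheme might use. The CoAP transport is omitted, and an HMAC over a pre-shared key stands in for the paper's payload encryption purely to show the mutual-verification idea.

```python
# Illustrative JSON challenge-response handshake with a pre-shared key.
import hmac, hashlib, json, os

PSK = b"pre-shared-device-key"   # assumed to be provisioned on both sides

def make_challenge(device_id):
    """Server side: issue a fresh nonce bound to the device identity."""
    nonce = os.urandom(8).hex()
    return json.dumps({"id": device_id, "nonce": nonce}), nonce

def respond(challenge_json):
    """Device side: prove knowledge of the key by tagging the nonce."""
    msg = json.loads(challenge_json)
    tag = hmac.new(PSK, msg["nonce"].encode(), hashlib.sha256).hexdigest()
    return json.dumps({"id": msg["id"], "tag": tag})

def verify(response_json, nonce):
    """Server side: recompute the tag and compare in constant time."""
    msg = json.loads(response_json)
    expected = hmac.new(PSK, nonce.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(msg["tag"], expected)

challenge, nonce = make_challenge("sensor-42")
print(verify(respond(challenge), nonce))   # True when both sides share the key
```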
Fuzzy Systems and Knowledge Discovery | 2014
Mukhtaj Khan; Yang Liu; Maozhen Li
MapReduce has become a major programming model that supports distributed and parallel processing for large-scale data-intensive applications such as Web data mining, network traffic analysis, machine learning and scientific simulation. Hadoop is the most popular open-source implementation of the MapReduce programming model. In Hadoop, input files are divided into many data blocks and these blocks are distributed over several nodes in a cluster. To efficiently process the data blocks, Hadoop should provide an efficient scheduling mechanism for enhancing the performance of the system in a shared cluster environment. Scheduling performance in Hadoop is mainly constrained by data locality issues arising from limited network bandwidth. After introducing the scheduling issues related to data locality, this paper reviews different data-locality-aware scheduling algorithms that address these issues. In addition, this paper evaluates their features, strengths and weaknesses, and provides some guidelines on how to further improve these scheduling algorithms.
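A small sketch of the locality preference the surveyed schedulers exploit: when a node asks for work, tasks whose input block already resides on that node are assigned first, with a fallback to remote tasks. This illustrates the general idea only, not any specific algorithm from the survey.

```python
# Prefer tasks whose input block is stored on the requesting node.
def assign_task(requesting_node, pending_tasks, block_locations):
    """Return (task, locality) for requesting_node.

    pending_tasks   : list of task ids
    block_locations : dict mapping task id -> set of nodes holding its block
    """
    for task in pending_tasks:
        if requesting_node in block_locations.get(task, set()):
            return task, "node-local"
    # No local block available: fall back to the first remote task.
    return (pending_tasks[0], "remote") if pending_tasks else (None, None)

tasks = ["t1", "t2", "t3"]
locations = {"t1": {"nodeB"}, "t2": {"nodeA", "nodeC"}, "t3": {"nodeC"}}
print(assign_task("nodeA", tasks, locations))   # ('t2', 'node-local')
```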
International Conference on Future Intelligent Vehicular Technologies | 2016
Fazlullah Khan; Mukhtaj Khan; Zafar Iqbal; Izaz ur Rahman; Muhammad Alam
A sensor network is a network of autonomous devices equipped with sensors that are spatially distributed to sense the physical environment for parameters such as temperature, humidity and pollution. Sensor networks have various applications, such as volcanic eruption monitoring, inventory tracking, military surveillance, home and industrial automation, and automobiles. Different sensors are used for specific purposes, including temperature, humidity, light, ultrasonic and multimedia sensors, each dedicated to its own task. In this system, we use an ultrasonic sensor for defense and security purposes. The ultrasonic sensor constantly transmits ultrasonic sound (transmitter) which, on striking an obstacle, bounces back; the bounced wave is received by the sensor (receiver), and from this reflection the distance between the sensor and the obstacle is calculated. So when a person comes close to a dangerous area, such as an electric field, a riverside or explosive material, the system will detect the person and sound an alarm to inform the authorities. The proposed scheme is implemented and the generated results validate its functionality.
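An illustrative sketch of the distance calculation behind this kind of ultrasonic detection: the echo round-trip time is converted to a one-way distance, and an alarm is raised when someone comes within a safety threshold. The pin handling, the alarm hardware and the threshold value are assumptions, not details from the paper.

```python
# Convert an ultrasonic echo round-trip time to a distance and check a threshold.
SPEED_OF_SOUND = 343.0   # metres per second in air at roughly 20 degrees C

def echo_to_distance(round_trip_seconds):
    """Return the one-way distance (m) implied by an echo round-trip time."""
    return SPEED_OF_SOUND * round_trip_seconds / 2.0

def check_intrusion(round_trip_seconds, threshold_m=2.0):
    distance = echo_to_distance(round_trip_seconds)
    if distance < threshold_m:
        print(f"ALARM: obstacle detected at {distance:.2f} m")
    return distance

check_intrusion(0.006)   # ~1.03 m -> alarm
check_intrusion(0.020)   # ~3.43 m -> no alarm
```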
Journal of Grid Computing | 2018
Abdul Majid; Mukhtaj Khan; Nadeem Iqbal; Mian Ahmad Jan; Mushtaq Khan; Salman
With rapid advancement in the fields of bioinformatics and computational biology, the collected DNA dataset is growing exponentially, doubling every 18 months. Due to the large scale and complex structure of DNA datasets, the analysis of DNA sequences is becoming a computationally challenging issue in bioinformatics and computational biology. Fast algorithms capable of analyzing large-scale DNA sequences are now required in the field of bioinformatics. This paper presents a novel Parallel Vector Space Model (PVSM) approach that supports the analysis of large-scale DNA sequences by taking advantage of multi-core systems. The proposed approach is built on top of a modified Vector Space Model (VSM). In order to evaluate the performance of PVSM, the proposed technique is extensively evaluated using DNA sequences of varied sizes in terms of computational efficiency and accuracy. The performance of PVSM is compared with the sequential modified VSM. The sequential VSM is implemented on a single processor, whereas the proposed method is initially parallelized on 4 processors and subsequently on 12 processors. The experimental results show that PVSM performed better than the sequential VSM, achieving approximately 2× speedup compared with the sequential approach without affecting the accuracy level. Moreover, the proposed PVSM is highly scalable with an increase in the number of processing cores and supports the analysis of large-scale DNA sequences.
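A hedged sketch of a vector space model over DNA sequences parallelized on a multi-core machine: each sequence is encoded as a k-mer frequency vector and compared to a query by cosine similarity. The paper's modifications to VSM are not reproduced; k = 3 and the pool size are illustrative choices.

```python
# Parallel k-mer vector space comparison of DNA sequences against a query.
from multiprocessing import Pool
from collections import Counter
from itertools import product
import math

K = 3
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_vector(seq):
    """Encode a DNA sequence as a fixed-length k-mer frequency vector."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    return [counts.get(kmer, 0) for kmer in KMERS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def score_against_query(args):
    seq, query_vec = args
    return cosine(kmer_vector(seq), query_vec)

if __name__ == "__main__":
    query = "ACGTACGTGGCTA"
    database = ["ACGTACGTGGTTA", "TTTTGGGGCCCCAAAA", "ACGTACGTGGCTA"]
    qv = kmer_vector(query)
    with Pool(processes=4) as pool:
        sims = pool.map(score_against_query, [(s, qv) for s in database])
    print(sims)
```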