Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where S. D. Madhu Kumar is active.

Publication


Featured research published by S. D. Madhu Kumar.


International Journal of Big Data Intelligence | 2015

Optimising virtual machine allocation in MapReduce cloud for improved data locality

T P Shabeera; S. D. Madhu Kumar

Big data is getting increasing attention in today's world. Although MapReduce is successful in processing big data, it suffers from performance bottlenecks when deployed in the cloud, and data locality plays an important role among them. The focus of this paper is on improving data locality in the MapReduce cloud by allocating adjacent VMs for executing MapReduce jobs. Good data locality reduces cross-network traffic and hence results in high performance. When a user requests a set of virtual machines (VMs), the VMs are chosen based on the physical distance between them. We propose a greedy algorithm for creating clusters of VMs, but greedy methods do not guarantee an optimal solution. The second method for allocating VMs is the partitioning around medoids (PAM) method, which always finds a local minimum, so the resulting allocation may not be globally optimal. We also present a dynamic programming approach that is guaranteed to find an optimal solution from the users' perspective.
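
As a rough illustration of the greedy step described in this abstract, the following sketch picks a set of mutually close VM slots from a pairwise physical-distance matrix; the distance values and function names are assumptions for this example, not the paper's implementation.

```python
# Illustrative sketch (not the paper's exact algorithm): greedily pick a set of
# VMs that are mutually close, given a pairwise physical-distance matrix.

def greedy_vm_cluster(dist, num_vms_requested):
    """dist[i][j]: assumed physical distance between VM slots i and j."""
    n = len(dist)
    best_seed = min(range(n), key=lambda i: sum(dist[i]))  # most central slot
    chosen = {best_seed}
    while len(chosen) < num_vms_requested:
        # Pick the unchosen VM slot with the smallest total distance to the cluster.
        candidate = min(
            (v for v in range(n) if v not in chosen),
            key=lambda v: sum(dist[v][c] for c in chosen),
        )
        chosen.add(candidate)
    return sorted(chosen)

# Example: 5 slots, request 3 adjacent VMs.
dist = [
    [0, 1, 4, 5, 6],
    [1, 0, 3, 5, 6],
    [4, 3, 0, 2, 5],
    [5, 5, 2, 0, 1],
    [6, 6, 5, 1, 0],
]
print(greedy_vm_cluster(dist, 3))  # [2, 3, 4]
```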


International Conference on Emerging Applications of Information Technology | 2011

Enhancing the K-means Clustering Algorithm by Using a O(n logn) Heuristic Method for Finding Better Initial Centroids

K. A. Abdul Nazeer; S. D. Madhu Kumar; M. P. Sebastian

With the advent of modern techniques for scientific data collection, large quantities of data are accumulating in various databases. Systematic data analysis methods are necessary to extract useful information from rapidly growing data banks. Cluster analysis is one of the major data mining methods, and the k-means clustering algorithm is widely used in many practical applications. However, the original k-means algorithm is computationally expensive, and the quality of the resulting clusters depends heavily on the choice of initial centroids. Several methods have been proposed in the literature for improving the performance of the k-means algorithm. This paper proposes an improvement on the classic k-means algorithm to produce more accurate clusters. The proposed algorithm comprises an O(n log n) heuristic method, based on sorting and partitioning the input data, for finding initial centroids in accordance with the data distribution. Experimental results show that the proposed algorithm produces better clusters in less computation time.
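
The following sketch illustrates the general idea of a sorting-and-partitioning seed for k-means (sort the points on a one-dimensional projection, split into k contiguous groups, and average each group); the projection used here is an assumption for illustration and not necessarily the paper's exact heuristic.

```python
# A minimal sketch, assuming a sum-of-attributes projection for the O(n log n)
# sort; the paper's concrete heuristic may differ.
import numpy as np

def sorted_partition_centroids(X, k):
    order = np.argsort(X.sum(axis=1))      # O(n log n) sort on a 1-D projection
    groups = np.array_split(X[order], k)   # k contiguous partitions
    return np.array([g.mean(axis=0) for g in groups])

def kmeans(X, k, iters=100):
    centroids = sorted_partition_centroids(X, k)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

X = np.random.rand(200, 2)
labels, centroids = kmeans(X, k=3)
```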


Bioinformation | 2013

A novel harmony search-K means hybrid algorithm for clustering gene expression data

K. A. Abdul Nazeer; M. P. Sebastian; S. D. Madhu Kumar

Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. DNA microarray technology makes it possible to analyze a large number of genes simultaneously across different samples. Clustering of microarray data can reveal hidden gene expression patterns in large quantities of expression data, which in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k-means clustering algorithm is widely used in many practical applications, but the original k-means algorithm has several drawbacks: it is computationally expensive and generates locally optimal solutions that depend on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-means algorithm. Harmony search, a meta-heuristic optimization algorithm, helps find near-globally optimal solutions by searching the entire solution space. The low clustering accuracy of existing algorithms limits their use in many crucial applications of the life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy than the existing algorithms.
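
A minimal sketch of one way such a hybrid can be structured is shown below: harmony search explores candidate centroid sets globally, and the best harmony is then handed to k-means for local refinement. The parameter values and fitness function are illustrative assumptions, not the exact HSKH settings.

```python
# Illustrative harmony search over centroid sets; HMS, HMCR, PAR and the SSE
# fitness are assumed values, not the paper's configuration.
import numpy as np

def sse(X, centroids):
    d = ((X[:, None] - centroids[None]) ** 2).sum(-1)
    return d.min(axis=1).sum()

def harmony_search_centroids(X, k, hms=10, hmcr=0.9, par=0.3, iters=200, rng=None):
    rng = rng or np.random.default_rng(0)
    lo, hi = X.min(axis=0), X.max(axis=0)
    dim = X.shape[1]
    # Harmony memory: hms candidate centroid sets, each of shape (k, dim).
    memory = rng.uniform(lo, hi, size=(hms, k, dim))
    fitness = np.array([sse(X, m) for m in memory])
    bw = 0.05 * (hi - lo)                      # pitch-adjustment bandwidth
    for _ in range(iters):
        new = np.empty((k, dim))
        for j in range(k):
            for d_ in range(dim):
                if rng.random() < hmcr:        # memory consideration
                    new[j, d_] = memory[rng.integers(hms), j, d_]
                    if rng.random() < par:     # pitch adjustment
                        new[j, d_] += rng.uniform(-1, 1) * bw[d_]
                else:                          # random selection
                    new[j, d_] = rng.uniform(lo[d_], hi[d_])
        f = sse(X, new)
        worst = fitness.argmax()
        if f < fitness[worst]:                 # replace the worst harmony
            memory[worst], fitness[worst] = new, f
    return memory[fitness.argmin()]            # best centroid set found

# The best harmony would then seed an ordinary k-means refinement pass.
```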


Advances in Computing and Communications | 2012

Authenticated and persistent skip graph: a data structure for cloud based data-centric applications

T P Shabeera; Priya Chandran; S. D. Madhu Kumar

Cloud computing has evolved into a popular computing environment. In data-centric applications hosted on the cloud, data is accessed and updated in a purely distributed manner. The distributed data structures used for dynamic storage in such applications require two fundamental qualities, authentication and persistence, which are not completely met by existing distributed data structures. Authentication is a crucial requirement for data structures used on the cloud, as users need to be convinced of the validity of the data they receive. Moreover, the data structure has to be persistent, so that changes can be made without losing old data or old versions of the data structure, which may be required by different users in the distributed environment. In this paper we present the authenticated and persistent skip graph, an enhanced variant of the existing skip graph, based on an event-based model that corresponds to the requirements of distributed data-centric applications. Skip graphs are randomized data structures based on skip lists, proposed for use in distributed applications, but they are neither persistent nor authenticated. The fat node technique is adapted here for providing persistence, and role-based access control combined with a cryptographic scheme is used to ensure valid data updates and to verify the validity of retrieved data. Our data structure is shown to be efficient in terms of time and space complexity.
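
The following simplified sketch shows the flavor of the fat node technique for persistence (every update is retained with a version stamp, so earlier versions stay readable), with an HMAC tag standing in for the paper's cryptographic scheme; the class, field and key names are illustrative assumptions.

```python
# Simplified fat-node field: versions are kept rather than overwritten, and
# each stored value carries an HMAC tag (a stand-in for the paper's scheme).
import bisect, hmac, hashlib

class FatNodeField:
    """A single node field that keeps every version instead of overwriting."""
    def __init__(self, key: bytes):
        self._versions = []   # (version, value, tag), assumed appended in version order
        self._key = key

    def _tag(self, version: int, value: bytes) -> bytes:
        return hmac.new(self._key, version.to_bytes(8, "big") + value,
                        hashlib.sha256).digest()

    def update(self, version: int, value: bytes):
        # Persist a new version; earlier versions are never destroyed.
        self._versions.append((version, value, self._tag(version, value)))

    def read(self, version: int) -> bytes:
        # Return the latest value whose version is <= the requested one,
        # verifying its authentication tag before handing it back.
        i = bisect.bisect_right([v for v, _, _ in self._versions], version) - 1
        if i < 0:
            raise KeyError("no value at or before this version")
        v, value, tag = self._versions[i]
        if not hmac.compare_digest(tag, self._tag(v, value)):
            raise ValueError("authentication check failed")
        return value

f = FatNodeField(key=b"shared-secret")
f.update(1, b"succ=node7")
f.update(5, b"succ=node9")
assert f.read(3) == b"succ=node7"   # an old version is still retrievable
```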


ACS/IEEE International Conference on Computer Systems and Applications | 2014

Efficient information retrieval using Lucene, LIndex and HIndex in Hadoop

Anita Brigit Mathew; Priyabrat Pattnaik; S. D. Madhu Kumar

The growth of unstructured and partially structured data in biological networks, social media, geographical information and other web-based applications presents an open challenge to the cloud database community. Hence, approaches to exhaustive Big Data analysis that integrate structured and unstructured data processing have become increasingly critical in today's world. MapReduce has recently emerged as a popular framework for extensive data analytics. The use of powerful indexing techniques allows users to significantly speed up query processing in MapReduce jobs. A number of indexing techniques currently exist, such as Hadoop++, HAIL, LIAH and adaptive indexing, but none of them provides an optimized technique for text-based selection operations. This paper proposes two indexing approaches in HDFS, namely LIndex and HIndex, which are found to perform selection operations better than the existing Lucene index approach. A fast retrieval technique is suggested for the MapReduce framework with the new LIndex and HIndex approaches. LIndex provides a full-text index that directs the Hadoop execution engine to scan only those data blocks that contain the terms of interest. LIndex also enhances throughput (minimizes response time) and overcomes drawbacks such as the upfront cost and long idle time of index creation. It gives better performance than Lucene but falls short in response and computation time, hence a new index named HIndex is suggested. This scheme is found to perform better than LIndex in response and computation time.
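
The block-pruning idea behind such an index can be sketched as follows: map each term to the data blocks that contain it, and at query time scan only the blocks returned by the index. Block identifiers and tokenization here are illustrative assumptions, not the actual LIndex or HIndex implementation.

```python
# Illustrative block-level inverted index; real HDFS block ids and analyzers
# would replace the toy inputs below.
from collections import defaultdict

def build_block_index(blocks):
    """blocks: iterable of (block_id, text)."""
    index = defaultdict(set)
    for block_id, text in blocks:
        for term in set(text.lower().split()):
            index[term].add(block_id)
    return index

def blocks_to_scan(index, query_terms):
    # Only blocks containing *all* query terms need to be read by the job.
    sets = [index.get(t.lower(), set()) for t in query_terms]
    return set.intersection(*sets) if sets else set()

blocks = [("blk_001", "protein folding networks"),
          ("blk_002", "social media graphs"),
          ("blk_003", "protein interaction graphs")]
idx = build_block_index(blocks)
print(blocks_to_scan(idx, ["protein", "graphs"]))   # {'blk_003'}
```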


Advances in Computing and Communications | 2013

A Dynamic Data Placement Scheme for Hadoop Using Real-time Access Patterns

Viju P. Poonthottam; S. D. Madhu Kumar

Hadoop has become a popular platform for large-scale data analytics. In this paper, we identify a major performance bottleneck of Hadoop: its inability to place data close to the users who need it. We propose a new data placement scheme that uses real-time access patterns to improve the performance of Hadoop. With this strategy, data is placed nearer to the users who require it, optimizing access time and bandwidth. Simulation experiments indicate that our strategy performs much better than the default HDFS block placement.
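
A minimal sketch of the underlying idea, assuming a simple per-block access counter, is shown below: track which node accesses each block most often and use that node as the preferred placement target. The class name and threshold are illustrative assumptions, not the paper's scheme.

```python
# Illustrative access-pattern tracker: the node that accesses a block most
# often becomes the preferred placement/replication target.
from collections import Counter, defaultdict

class AccessPatternPlacer:
    def __init__(self, threshold=10):
        self.access_counts = defaultdict(Counter)  # block_id -> Counter of nodes
        self.threshold = threshold

    def record_access(self, block_id, client_node):
        self.access_counts[block_id][client_node] += 1

    def preferred_node(self, block_id):
        counts = self.access_counts[block_id]
        if not counts or counts.most_common(1)[0][1] < self.threshold:
            return None                       # not enough evidence yet
        return counts.most_common(1)[0][0]    # place/replicate the block here

placer = AccessPatternPlacer(threshold=3)
for _ in range(5):
    placer.record_access("blk_042", "node-rack2-07")
print(placer.preferred_node("blk_042"))       # node-rack2-07
```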


IEEE Recent Advances in Intelligent Computational Systems | 2013

Bandwidth-aware data placement scheme for Hadoop

T P Shabeera; S. D. Madhu Kumar

We are living in a data-rich era, and the size of data is increasing exponentially. Social networking applications, scientific experiments and similar sources are the major contributors of Big Data, which can be structured, semi-structured or unstructured. Big Data management solutions can be implemented in-house within an organization, or the data can be stored in the cloud. Whether stored in-house or in the cloud, the placement of data is very important. In general, users demand that data be available whenever they request it. Many parameters affect data retrieval time in Hadoop; among them, this paper focuses on the available bandwidth. To minimize data retrieval time, data must be placed on the DataNode with the maximum bandwidth. We propose a solution for bandwidth-aware data placement in Hadoop that periodically measures the bandwidth between clients and DataNodes and places data blocks on the DataNodes with the maximum end-to-end bandwidth.
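
The selection rule described above can be sketched as follows, assuming the periodic bandwidth measurements are supplied externally; the function and DataNode names are illustrative.

```python
# Illustrative bandwidth-aware selection: given measured end-to-end bandwidths
# between a client and candidate DataNodes, pick the fastest one.

def choose_datanode(bandwidth_mbps, exclude=()):
    """bandwidth_mbps: {datanode: measured bandwidth to the client (Mb/s)}."""
    candidates = {dn: bw for dn, bw in bandwidth_mbps.items() if dn not in exclude}
    if not candidates:
        raise ValueError("no eligible DataNode")
    return max(candidates, key=candidates.get)

measurements = {"dn1": 480.0, "dn2": 910.0, "dn3": 725.0}
print(choose_datanode(measurements))                   # dn2
print(choose_datanode(measurements, exclude={"dn2"}))  # dn3 (e.g., next replica)
```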


International Conference on Signal Processing | 2015

Telecom grade cloud computing: Challenges and opportunities

Binesh Jose; S. D. Madhu Kumar

Cloud computing and virtualization are two key technology priorities for telecom service providers. Besides total cost reduction, there are many strategic objectives in adopting cloud technology in the telecom sector. Telecom service providers' core assets and strengths lie in their communication networks, but these alone are not enough to keep the industry at the level it once enjoyed. By combining cloud computing technology with their networks, telecom service providers can become a significant force in the cloud provider domain and, more importantly, return to a growth path. This work evaluates the new challenges and opportunities offered by the adoption of cloud and virtualization technologies in the telecom sector and their impact on the industry value chain and operational model. Results indicate that even though many technical and non-technical challenges still exist, security remains the primary concern holding cloud adoption back. The study also brings out the fact that, like other technologies adopted in business, cloud technology brings many new advantages as well as a few disadvantages. While several studies and research works have addressed cloud computing for the IT sector, limited research has been done on cloud computing for telecommunications, and the majority of the research in this area is from an industrial research perspective. The significance of our work comes in this context.


International Conference on Distributed Computing Systems | 2017

Towards a Complete Virtual Data Center Embedding Algorithm Using Hybrid Strategy

M. P. Gilesh; S. D. Madhu Kumar; Lillykutty Jacob; Umesh Bellur

A Virtual Data Center (VDC) is a set of virtual machines (VMs) connected by a Virtual Network (VN) topology. Today's cloud data centers support dynamic requests for VDCs by using software-defined embedding strategies that allow them to map multiple VDCs onto their Physical Data Center (PDC) network and machines. In this paper, we present a solution to this VDC embedding problem that achieves a higher acceptance rate than existing strategies by minimizing fragmentation, while at the same time minimally disrupting the existing VDCs.
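
As a generic illustration only (not the paper's hybrid embedding algorithm, and ignoring the virtual network links), the sketch below places each requested VM on the physical host that would have the smallest leftover capacity after placement, a best-fit rule that tends to reduce fragmentation and keep large hosts free for future VDC requests.

```python
# Illustrative best-fit VM-to-host placement; host names and CPU units are
# assumed inputs, and link embedding is deliberately omitted.

def embed_vms(vm_demands, host_capacity):
    """vm_demands: list of CPU demands; host_capacity: {host: free CPU}."""
    free = dict(host_capacity)
    placement = {}
    for vm_id, demand in enumerate(vm_demands):
        feasible = {h: c for h, c in free.items() if c >= demand}
        if not feasible:
            return None                       # request rejected
        host = min(feasible, key=lambda h: feasible[h] - demand)  # best fit
        placement[vm_id] = host
        free[host] -= demand
    return placement

hosts = {"h1": 16, "h2": 8, "h3": 4}
print(embed_vms([4, 4, 8], hosts))   # {0: 'h3', 1: 'h2', 2: 'h1'}
```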


International Conference on Data Science and Engineering | 2016

Erasure coded storage systems for cloud storage — challenges and opportunities

Ojus Thomas Lee; S. D. Madhu Kumar; Priya Chandran

Erasure coded storage schemes offer a promising future for cloud storage. A highlight of erasure coded storage systems is that they offer the same level of fault tolerance as replication at a lower storage footprint. In the big data era, cloud storage systems based on data replication are of dubious usability due to the 200% storage overhead of replication. This has prompted storage service providers to use erasure coded storage as an alternative to replication. Refinements are required in various aspects of erasure coded storage systems to make them real contenders against replication-based storage systems. The huge bandwidth requirements during recovery of failed nodes, inefficient update operations, the effect of topology on recovery, and the consistency requirements of erasure coded storage systems are some of the areas that need attention. This paper presents an in-depth study of the challenges faced, and the research pursued, in some of these areas. The survey shows that more research is required to turn erasure coded storage systems from bandwidth crunchers into efficient storage systems. Another challenge that emerged from the study is the need for elaborate research to take erasure coded storage systems beyond mere archival storage by providing better update methods. Provision of multiple consistency levels in erasure coded storage is yet another research opportunity identified in this work. A brief introduction to open source libraries available for erasure coded storage is also presented in the paper.
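
A small worked example of the storage-overhead comparison mentioned above: three-way replication stores three copies (200% overhead), while a Reed-Solomon code with k data and m parity blocks stores (k + m)/k times the data. The RS(10, 4) parameters are a common illustrative choice, not a configuration taken from the paper.

```python
# Storage overhead: three-way replication vs. an illustrative RS(10, 4) code.

def overhead(stored_bytes, raw_bytes):
    return (stored_bytes - raw_bytes) / raw_bytes * 100.0

raw = 1_000_000_000                 # 1 GB of user data
replication = 3 * raw               # three-way replication
k, m = 10, 4
erasure = (k + m) / k * raw         # Reed-Solomon with k data, m parity blocks

print(f"replication overhead: {overhead(replication, raw):.0f}%")  # 200%
print(f"RS(10,4) overhead:    {overhead(erasure, raw):.0f}%")      # 40%
```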

Collaboration


Dive into S. D. Madhu Kumar's collaborations.

Top Co-Authors

T P Shabeera, National Institute of Technology Calicut
Anu Mary Chacko, National Institute of Technology Calicut
Priya Chandran, National Institute of Technology Calicut
Anita Brigit Mathew, National Institute of Technology Calicut
Lillykutty Jacob, National Institute of Technology Calicut
Umesh Bellur, Indian Institute of Technology Bombay
Binesh Jose, National Institute of Technology Calicut
K. A. Abdul Nazeer, National Institute of Technology Calicut
M. P. Gilesh, National Institute of Technology Calicut