Rakhi Garg | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Rakhi Garg is active.

Explore More

Publication

Featured researches published by Rakhi Garg.

International Journal of Computer Applications | 2014

Preprocessing Techniques in Web Usage Mining: A Survey

Mitali Srivastava; Rakhi Garg; P. K. Mishra

Due to huge, unstructured and scattered amount of data available on web, it is very tough for users to get relevant information in less time. To achieve this, improvement in design of web site, personalization of contents, prefetching and caching activities are done according to user’s behavior analysis. User’s activities can be captured into a special file called log file. There are various types of log: Server log, Proxy server log, Client/Browser log. These log files are used by web usage mining to analyze and discover useful patterns. The process of web usage mining involves three interdependent steps: Data preprocessing, Pattern discovery and Pattern analysis. Among these steps, Data preprocessing plays a vital role because of unstructured, redundant and noisy nature of log data. To improve later phases of web usage mining like Pattern discovery and Pattern analysis several data preprocessing techniques such as Data Cleaning, User Identification, Session Identification, Path Completion etc. have been used. In this paper all these techniques are discussed in detail. Moreover these techniques are also categorized and incorporated with their advantage and disadvantage that will help scientist, researchers and academicians working in this direction.

International Journal of Computer Applications | 2015

Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster

Sudhakar Singh; Rakhi Garg; P. K. Mishra

Mining frequent itemsets from massive datasets is always being a most important problem of data mining. Apriori is the most popular and simplest algorithm for frequent itemset mining. To enhance the efficiency and scalability of Apriori, a number of algorithms have been proposed addressing the design of efficient data structures, minimizing database scan and parallel and distributed processing. MapReduce is the emerging parallel and distributed technology to process big datasets on Hadoop Cluster. To mine big datasets it is essential to re-design the data mining algorithm on this new paradigm. In this paper, we implement three variations of Apriori algorithm using data structures hash tree, trie and hash table trie i.e. trie with hash technique on MapReduce paradigm. We emphasize and investigate the significance of these three data structures for Apriori algorithm on Hadoop cluster, which has not been given attention yet. Experiments are carried out on both real life and synthetic datasets which shows that hash table trie data structures performs far better than trie and hash tree in terms of execution time. Moreover the performance in case of hash tree becomes worst.

Proceedings of the 2015 International Conference on Advanced Research in Computer Science Engineering & Technology (ICARCSET 2015) | 2015

Analysis of Data Extraction and Data Cleaning in Web Usage Mining

Mitali Srivastava; Rakhi Garg; P. K. Mishra

Data preprocessing is considered as an important phase of Web usage mining due to unstructured, heterogeneous and noisy nature of log data. Complete and effective data preprocessing insures the efficiency and scalability of algorithms used in pattern discovery phase of Web usage mining. Data preprocessing generally includes the steps-Data fusion, Data cleaning, User identification, Session identification, Path completion etc. Data cleaning is the initial and important step in preprocessing to extract cleaned data for further processing. It is important to apply data extraction before data cleaning on raw log data in analysis of specific time-duration i.e. one day, one week or one month etc. In this paper we have mainly focused on data fusion, data extraction and data cleaning steps of preprocessing and proposed an algorithm for data extraction which extracts log data according to analysis of time duration. This algorithm also sorts log entries according to their date and time which will be further used in prediction of browsing sequence of user. After that we have applied data cleaning algorithm on extracted real Web server log. In data cleaning almost all irrelevant files, irrelevant HTTP methods and wrong HTTP status codes are considered and after experiment it is analyzed that raw log data reduces to almost 80% which shows the importance of initial phases of data preprocessing.

Computers & Electrical Engineering | 2017

Performance optimization of MapReduce-based Apriori algorithm on Hadoop cluster

Sudhakar Singh; Rakhi Garg; P. K. Mishra

Abstract Many techniques have been proposed to implement the Apriori algorithm on MapReduce framework but only a few have focused on performance improvement. FPC (Fixed Passes Combined-counting) and DPC (Dynamic Passes Combined-counting) algorithms combine multiple passes of Apriori in a single MapReduce phase to reduce the execution time. In this paper, we propose improved MapReduce based Apriori algorithms VFPC (Variable Size based Fixed Passes Combined-counting) and ETDPC (Elapsed Time based Dynamic Passes Combined-counting) over FPC and DPC. Further, we optimize the multi-pass phases of these algorithms by skipping pruning step in some passes, and propose Optimized-VFPC and Optimized-ETDPC algorithms. Quantitative analysis reveals that counting cost of additional un-pruned candidates produced due to skipped-pruning is less significant than reduction in computation cost due to the same. Experimental results show that VFPC and ETDPC are more robust and flexible than FPC and DPC whereas their optimized versions are more efficient in terms of execution time.

2013 IEEE International Conference in MOOC, Innovation and Technology in Education (MITE) | 2013

Diversified courseware technology: A new hope to enhance the educational achievement of students with and without special needs in the inclusive classroom

Pankaj Singh; Rakhi Garg; P. K. Mishra

The real classroom conditions are very challenging for the teachers and administrators because of having the diversified students with and without special needs in an inclusive classroom. It seems like the different colours of flowers in a single flask. To tackle the individual differences and to maximize the educational achievement of the every student it becomes necessary to upgrade the educational programs and course content in accordance with the diversified classroom situations. Diversified courseware technology honours the individual difference and satisfied the specific educational need of each student. Diversified Courseware technology enhances self-learning to students, teachers and trainers with the help of diagrams, animation, assessment tools, teaching notes and exercises. It has been observed that the usage of computer learning and courseware technology in education system has increased dramatically since its inception. The time and place is no more big issue in communication because of internet. However, more work is required to be done in the area of graphics to improve Diversified courseware technology to enhance the Educational Achievement of Students with and without Special Needs in the Inclusive Classroom. Through this paper we try to focus on the requirements and the challenges in the field of Diversified Courseware technology to enhance the Educational Achievement of Students with and without Special Needs in the Inclusive Classroom. We also emphasized on reading skill, application of computer in distance/regular mode, rural/tribal education and courseware technology for classroom aids, assessment software, reference software and various technology used in computer learning.

international conference on computing communication and automation | 2016

Observations on factors affecting performance of MapReduce based Apriori on Hadoop cluster

Sudhakar Singh; Rakhi Garg; P. K. Mishra

Designing fast and scalable algorithm for mining frequent itemsets is always being a most eminent and promising problem of data mining. Apriori is one of the most broadly used and popular algorithm of frequent itemset mining. Designing efficient algorithms on MapReduce framework to process and analyze big datasets is contemporary research nowadays. In this paper, we have focused on the performance of MapReduce based Apriori on homogeneous as well as on heterogeneous Hadoop cluster. We have investigated a number of factors that significantly affects the execution time of MapReduce based Apriori running on homogeneous and heterogeneous Hadoop Cluster. Factors are specific to both algorithmic and non-algorithmic improvements. Considered factors specific to algorithmic improvements are filtered transactions and data structures. Experimental results show that how an appropriate data structure and filtered transactions technique drastically reduce the execution time. The non-algorithmic factors include speculative execution, nodes with poor performance, data locality & distribution of data blocks, and parallelism control with input split size. We have applied strategies against these factors and fine tuned the relevant parameters in our particular application. Experimental results show that if cluster specific parameters are taken care of then there is a significant reduction in execution time. Also we have discussed the issues regarding MapReduce implementation of Apriori which may significantly influence the performance.

International Journal of Computer Applications | 2010

Parallel Association Rule Mining on Heterogeneous System

Rakhi Garg; P. K. Mishra

Association Rule Mining from transaction–oriented databases is one of the important process that finds relation between items and plays important role in decision making. Parallel algorithms are required because of large size of the database to be mined. Most of the algorithms designed were for homogeneous system uses static load balancing technique which is far from reality. A parallel algorithm for heterogeneous system is regarded as one of the most promising platforms for association rule mining. In this paper we propose a simple parallel algorithm for association rule mining on heterogeneous system with dynamic load balancing based on Par-Maxclique algorithm. We maintain one linked list at the scheduler end that keep track of load value of every processor and each processor is having a job queue associated with it which is served in First come first basis. On the basis of load value scheduler directs the migration of task from heavy loaded to least loaded processor in the cluster during the execution and thus balances load dynamically in a cluster.

Journal of Information and Optimization Sciences | 2018

Analysis of iterative methods in PageRank computation

Atul K. Srivastava; Rakhi Garg; P. K. Mishra

Abstract PageRank is one of the basic metric used in web search technology to rank the web pages. It uses power method to compute principal eigenvector of the web matrix of several billion nodes. PageRank method incorporates a parameter α called damping factor that plays a major role in PageRank computation. In this study, we have observed experimentally the efficiency of various iterative methods on hyperlink graph for different value of α. We conclude from experiment that Power method is effective and more competitive for the well condition problem i.e. small value of α. However, for α → 1 Power method becomes more complex, and other methods such as Aitken-Power, SOR, and Gauss-Seidel are more efficient than it in respect of CPU time as well as the number of iteration needed for convergence.

international conference on computer and automation engineering | 2009