Poonam Goyal

Birla Institute of Technology and Science

Publication

Featured researches published by Poonam Goyal.

advances in mobile multimedia | 2009

Designing self-adaptive websites using online hotlink assignment algorithm

Poonam Goyal; Navneet Goyal; Ashish Gupta; T. S. Rahul

An online hotlink assignment algorithm is proposed for designing adaptive websites. The objective is to reach desired pages on a website in the minimum number of clicks, thereby reducing the load on the web server. As a consequence, the traffic on the internet is also reduced. The hotlinks are assigned based on the frequency of access of pages. We model a website as a single-source directed graph. The optimal hotlink assignment problem is NP-hard for general graphs. The website graph is reduced to a Breadth First Search (BFS) tree which maintains the semantic relationships between web pages. The proposed online algorithm can place at most k hotlinks per page with a maximum of l hotlinks on the entire website, where k ≪ l. The input stream is simulated using the Zipf distribution. The results presented in the paper compare the performance of the online algorithm with that of the optimal offline algorithm.
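
The cost measure described above can be illustrated with a minimal sketch. It is not the paper's hotlink assignment algorithm: it only computes BFS click depths from the home page and the expected number of clicks under Zipf-distributed page popularity. The function names, the toy site graph, and the popularity ranking are all assumptions made for illustration.

```python
from collections import deque


def bfs_depths(graph, root):
    """Return the BFS depth (number of clicks from root) of every reachable page."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, []):
            if nxt not in depth:
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return depth


def zipf_weights(n, s=1.0):
    """Normalized Zipf probabilities for ranks 1..n (rank 1 is the most popular)."""
    raw = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def expected_clicks(depth, popularity):
    """Expected clicks per request, given page depths and access probabilities."""
    return sum(prob * depth[page] for page, prob in popularity.items())


# Hypothetical site: "home" links to "a" and "b", which link to deeper pages.
site = {"home": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": [], "d": []}
# Assumed popularity ranking: "d" is requested most often.
popularity = dict(zip(["d", "c", "a", "b"], zipf_weights(4)))

before = expected_clicks(bfs_depths(site, "home"), popularity)

# Adding a hotlink home -> d shortens the path to the most popular page.
site_with_hotlink = dict(site, home=site["home"] + ["d"])
after = expected_clicks(bfs_depths(site_with_hotlink, "home"), popularity)
```

In this toy setting, a single hotlink to the most frequently accessed deep page lowers the expected click count, which is the quantity a hotlink assignment tries to minimize.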

international conference of distributed computing and networking | 2015

Parallelizing OPTICS for Commodity Clusters

Poonam Goyal; Sonal Kumari; Dhruv Kumar; Sundar Balasubramaniam; Navneet Goyal; Saiyedul Islam; Jagat Sesh Challa

In this paper, we propose DOPTICS, a parallelized version of the popular density-based cluster-ordering algorithm OPTICS. Parallelizing OPTICS is challenging because of its strongly sequential data access behavior. To achieve high parallelism, a data-parallel approach that exploits the underlying indexing structure is proposed. We implement the proposed algorithm for processor nodes in a commodity cluster as well as across cores in a processor. Moreover, the clusters obtained by our algorithm are exactly the same as those of classical OPTICS, unlike those of the only existing parallel OPTICS implementation. We demonstrate the performance of the proposed algorithm on a commodity cluster, which is typically a combination of distributed and shared memory systems. Experimental results on several large real and synthetic data sets with varying dimensions are presented to show the speedup and scalability achieved. The speedup obtained is substantial and scales well with an increasing number of processing elements. The performance improvements of the proposed DOPTICS algorithm are due to algorithmic optimizations and the parallelization strategy.

Journal of Information Science | 2013

A robust approach for finding conceptually related queries using feature selection and tripartite graph structure

Poonam Goyal; N. Mehala; Ankur Bansal

The information explosion on the Internet has placed high demands on search engines. Despite the improvements in search engine technology, the precision of current search engines is still unsatisfactory. Moreover, the queries submitted by users are short, ambiguous and imprecise. This leads to a number of problems in dealing with similar queries. The problems include a lack of common keywords, the selection of different documents by the search engine, and a lack of common clicks. These problems render the traditional query clustering methods unsuitable for query recommendations. In this paper, we propose a new query recommendation system. For this, we have identified conceptually related queries by capturing users’ preferences using click-through graphs of web search logs and by extracting the best features, relevant to the queries, from the snippets. The proposed system has an online feature extraction phase and an offline phase in which feature filtering and query clustering are performed. Query clustering is carried out by a new tripartite agglomerative clustering algorithm, Query-Document-Concept Clustering, in which the documents are used innovatively to decouple queries and features/concepts in a tripartite graph structure. This results in clusters of similar queries, associated clusters of documents and clusters of features. We model the query recommendation problem in four different ways. Two are content-ignorant models, one non-personalized and one personalized; the other two are content-aware models, again one non-personalized and one personalized. Three similarity measures are introduced to estimate different kinds of similarities. Experimental results show that the proposed approach has better precision, recall and F-measure than the existing approaches.

ieee international conference on high performance computing data and analytics | 2016

Scalable Parallel Algorithms for Shared Nearest Neighbor Clustering

Sonal Kumari; Saurabh Maurya; Poonam Goyal; Sundar Balasubramaniam; Navneet Goyal

Clustering is a popular data mining technique which discovers structure in unlabeled data by grouping objects together on the basis of a similarity criterion. Traditional similarity measures lose their meaning as the number of dimensions increases and, as a consequence, distance- or density-based clustering algorithms become less meaningful. Shared Nearest Neighbor (SNN) is a solution to clustering high-dimensional data with the ability to find clusters of varying density. SNN assigns objects that share a large number of their nearest neighbors to the same cluster. However, SNN is compute and memory intensive for data of large size and/or dimensionality. Nearest neighbor queries are responsible for a major proportion of the computations in SNN, resulting in lower efficiency for higher values of the number of nearest neighbors (k). The main motivation of this work is to improve the efficiency of SNN and to parallelize it so that it can be used for clustering large high-dimensional datasets and for large values of k. Existing SNN algorithms become inefficient in these situations. In this paper, we present a new sequential SNN algorithm, R-SNN, which uses an R-tree to execute neighborhood queries efficiently and exploits spatial locality to minimize memory usage. R-SNN is benchmarked against the best available implementation of SNN and is found to be up to 77 times faster when tested on various real datasets. R-SNN is parallelized for distributed memory, shared memory, and hybrid systems. The significant speedup and scalability achieved can be attributed to parallelization and good load balancing strategies, and also to the exploitation of spatial locality. Experimental results demonstrate the same for datasets of varying dimensionality and size. The maximum speedups achieved for the shared, distributed, and hybrid models are 427.19 using 48 threads, 394.24 using 32 processes, and 1380.69 on 32 nodes (with each node spawning 4 threads), respectively. Super-linear speedup for some datasets is attributed to optimized neighborhood queries. All the proposed algorithms produce clustering results identical to those of classical SNN.
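
The shared-nearest-neighbor idea can be sketched as follows. This is a brute-force illustration, not R-SNN: it uses no R-tree and simply counts how many of their k nearest neighbors two points have in common. The helper names are hypothetical, and the similarity is simplified (some SNN formulations additionally require the two points to appear in each other's neighbor lists).

```python
import math


def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbors (brute force)."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Simplified SNN similarity: number of k-nearest neighbors shared by points i and j."""
    return len(neighbors[i] & neighbors[j])


# Two well-separated groups of points (illustrative data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10)]
neighbors = knn_sets(points, k=2)
```

Points in the same dense group share neighbors, while points in different groups share none. The brute-force neighbor search also makes clear why the cost grows with both the dataset size and k.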

ieee international conference on data science and advanced analytics | 2016

A Parallel Framework for Grid-Based Bottom-Up Subspace Clustering

Poonam Goyal; Sonal Kumari; Shubham Singh; Vivek Kishore; Sundar Balasubramaniam; Navneet Goyal

Clustering is a popular data mining and machine learning technique which discovers interesting patterns from unlabeled data by grouping similar objects together. Clustering high-dimensional data is a challenging task, as points in high-dimensional space are nearly equidistant from each other, rendering commonly used similarity measures ineffective. Subspace clustering has emerged as a possible solution to the problem of clustering high-dimensional data. In subspace clustering, we try to find clusters in different subspaces within a dataset. Many subspace clustering algorithms have been proposed in the last two decades to find clusters in multiple overlapping subspaces of high-dimensional data. Subspace clustering algorithms iteratively find the best subset of dimensions for a cluster from the 2^d - 1 possible combinations in d-dimensional data. Subspace clustering is extremely compute intensive because of the exhaustive search of subspaces, especially in bottom-up subspace clustering algorithms. To address this issue, an efficient parallel framework for grid-based bottom-up subspace clustering algorithms is developed, considering popular algorithms belonging to this category. The framework is implemented for shared memory, distributed memory, and hybrid systems and is tested for three grid-based bottom-up subspace clustering algorithms: CLIQUE, MAFIA, and ENCLUS. All parallel implementations exhibit impressive speedup and scalability on real datasets.
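
The size of the subspace search space can be made concrete with a small sketch that enumerates every non-empty subset of dimensions in bottom-up order (one-dimensional subspaces first). The function name is an assumption. Practical bottom-up algorithms prune this enumeration rather than visiting every subset.

```python
from itertools import combinations


def nonempty_subspaces(d):
    """Yield every non-empty subset of the dimensions 0..d-1, smallest subsets first."""
    dims = range(d)
    for size in range(1, d + 1):
        yield from combinations(dims, size)


subspaces_4d = list(nonempty_subspaces(4))
```

For d = 4 this yields 2^4 - 1 = 15 subspaces. The count doubles with every added dimension, which is why an exhaustive search quickly becomes compute intensive.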

high performance computing and communications | 2016

A Fast, Scalable SLINK Algorithm for Commodity Cluster Computing Exploiting Spatial Locality

Poonam Goyal; Sonal Kumari; Sumit Sharma; Dhruv Kumar; Vivek Kishore; Sundar Balasubramaniam; Navneet Goyal

The single linkage (SLINK) hierarchical clustering algorithm is preferred over traditional partitioning-based clustering because it does not require the number of clusters as input. However, due to its high time complexity and inherent data dependencies, it does not scale well to large datasets. To the best of our knowledge, all existing parallel SLINK algorithms are based on the traditional SLINK algorithm and thus require a large number of computing resources. In this paper, we present a novel optimization of the SLINK algorithm, GridSLINK, which is an order of magnitude faster than the existing state-of-the-art implementation. The optimization in GridSLINK comes from a reduction in the number of distance calculations required by SLINK. This reduction is achieved by exploiting the spatial locality of data points and using an adaptive gridding technique. GridSLINK is parallelized for distributed memory systems. Scalable performance is achieved for an increasing number of compute nodes. The proposed parallel algorithm, dGridSLINK, is benchmarked against the best existing parallel algorithm in the literature and found to outperform it on all the real datasets considered. dGridSLINK can cluster millions of data points in a few seconds to minutes using a small number of processing elements, without compromising the quality of clustering.

bangalore annual compute conference | 2015

A concurrent k-NN search algorithm for R-tree

Jagat Sesh Challa; Poonam Goyal; S. Nikhil; Sundar Balasubramaniam; Navneet Goyal

k-nearest neighbor (k-NN) search is one of the most commonly used queries in database systems. It has applications in various domains such as data mining, decision support systems, information retrieval, and multimedia and spatial databases. When k-NN search is performed over large data sets, spatial data indexing structures such as R-trees are commonly used to improve query efficiency. The best-first k-NN (BF-kNN) algorithm is the fastest known k-NN algorithm over R-trees. We present CBF-kNN, a concurrent BF-kNN for R-trees, which is the first concurrent version of k-NN for R-trees that we know of. CBF-kNN uses mound, one of the most efficient known concurrent priority queues. CBF-kNN overcomes the concurrency limitations of priority queues by using a tree-parallel mode of execution. CBF-kNN has an estimated speedup of O(p/k) for p threads. Experimental results on various real datasets show that the speedup in practice is close to this estimate.
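
A sequential best-first k-NN search can be sketched as below. This is not CBF-kNN: it is single-threaded, uses Python's `heapq` rather than a mound, and runs over a hand-built two-level tree of bounding boxes standing in for a real R-tree. The node layout (dicts with `mbr`, `children`, and `points` keys) and the function names are assumptions made for illustration.

```python
import heapq
import itertools
import math


def min_dist(query, mbr):
    """Smallest Euclidean distance from query to an axis-aligned box given as (lo, hi)."""
    lo, hi = mbr
    return math.sqrt(sum(max(l - q, 0.0, q - h) ** 2 for q, l, h in zip(query, lo, hi)))


def best_first_knn(root, query, k):
    """Return up to k (distance, point) pairs, nearest first.

    The heap holds both tree nodes (keyed by box distance) and data points
    (keyed by exact distance); whatever is closest is expanded or reported next.
    """
    tie = itertools.count()  # unique tie-breaker so heap never compares nodes or points
    heap = [(min_dist(query, root["mbr"]), next(tie), root)]
    result = []
    while heap and len(result) < k:
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, tuple):  # a data point: it is the next nearest neighbor
            result.append((dist, item))
        elif "points" in item:  # a leaf node: enqueue its points
            for p in item["points"]:
                heapq.heappush(heap, (math.dist(query, p), next(tie), p))
        else:  # an internal node: enqueue its children
            for child in item["children"]:
                heapq.heappush(heap, (min_dist(query, child["mbr"]), next(tie), child))
    return result


# Hand-built toy index with two leaves (illustrative only).
leaf_a = {"mbr": ((0, 0), (1, 1)), "points": [(0, 0), (1, 1)]}
leaf_b = {"mbr": ((5, 5), (6, 6)), "points": [(5, 5), (6, 6)]}
root = {"mbr": ((0, 0), (6, 6)), "children": [leaf_a, leaf_b]}

nearest = best_first_knn(root, (0.9, 0.9), k=2)
```

Because every step pops from a single priority queue, that queue is the natural point of contention when several threads share one search.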

bangalore annual compute conference | 2014

Parallelizing OPTICS for multicore systems

Poonam Goyal; Sonal Kumari; Dhruv Kumar; Sundar Balasubramaniam; Navneet Goyal

Parallelizing algorithms to leverage multiple cores in a processor or multiple nodes in a cluster setup is the only way forward to handle ever-increasing volumes of data. OPTICS is a well-known density-based clustering algorithm for identifying arbitrarily shaped clusters. Since the hierarchical cluster ordering of OPTICS is sensitive to the order in which data is processed, a priority queue is typically used to maintain the order. This sequential access order makes it difficult to parallelize OPTICS. Moreover, the execution time of OPTICS increases with the density of the data. We propose a parallel version of OPTICS for shared memory multi-core systems using a master-slave pattern for parallelization. The master runs concurrently with the slaves and distributes data to the slaves. Each slave performs neighborhood queries for a subset of the data. Our approach ensures that the cluster ordering matches that of classical OPTICS. Our solution runs in a mostly data-parallel mode, yielding scalable performance. We also argue that our approach is particularly well suited for dense datasets.
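
The neighborhood queries mentioned above feed the two quantities that define the OPTICS ordering: core distance and reachability distance. The sketch below computes both by brute force. It is not the parallel algorithm, and it assumes the common convention that a point counts as a member of its own neighborhood. The function names and the toy data are illustrative.

```python
import math


def core_distance(points, i, eps, min_pts):
    """Distance from point i to its min_pts-th nearest point (counting i itself).

    Returns None if fewer than min_pts points lie within eps, i.e. i is not a core point.
    """
    dists = sorted(math.dist(points[i], q) for q in points)
    if len(dists) < min_pts or dists[min_pts - 1] > eps:
        return None
    return dists[min_pts - 1]


def reachability_distance(points, o, p, eps, min_pts):
    """Reachability distance of p with respect to o; None if o is not a core point."""
    core = core_distance(points, o, eps, min_pts)
    if core is None:
        return None
    return max(core, math.dist(points[o], points[p]))


# Three nearby points and one outlier (illustrative data).
points = [(0, 0), (1, 0), (2, 0), (10, 0)]
```

Each call scans all points, so denser data or larger neighborhoods mean more work per query, which is the part that lends itself to data-parallel execution.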

international conference on multimedia retrieval | 2018

Linguistic Patterns and Cross Modality-based Image Retrieval for Complex Queries

Chandramani Chaudhary; Poonam Goyal; Joel Ruben Antony Moniz; Navneet Goyal; Yi-Ping Phoebe Chen

With the rising prevalence of social media, coupled with the ease of sharing images, people with specific needs and applications, such as known-item search and multimedia question answering, have started searching for visual content expressed in terms of complex queries. A complex query consists of multiple concepts and their attributes, arranged to convey semantics. It is less effective to answer such queries by simply appending the search results gathered from individual concepts or subsets of concepts present in the query. In this paper, we propose to exploit the query constituents and the relationships among them. The proposed approach finds image-query relevance by integrating three models: the linguistic pattern-based textual model, the visual model, and the cross-modality model. We extract linguistic patterns from complex queries, gather their related crawled images, and assign relevance scores to images in the corpus. The relevance scores are then used to rank the images. We experiment on more than 140k images and compare the NDCG@n scores with those of state-of-the-art image ranking methods for complex queries. Moreover, the image ranking obtained by our approach outperforms that obtained by a popular search engine.

international conference on distributed computing and internet technology | 2017

A Domain Specific Language for Clustering

Saiyedul Islam; Sundar Balasubramaniam; Poonam Goyal; Mohit Sati; Navneet Goyal

Clustering of large volumes of data is a complex problem which requires the use of sophisticated algorithms as well as High Performance Computing hardware such as a cluster of computers. It is highly desirable that data mining experts have a solution which, on the one hand, provides a simple interface for expressing their algorithms in terms of domain-specific idioms and, on the other hand, automatically generates parallel code that can run on a cluster of multicore nodes. The proposed Domain Specific Language (DSL), along with its parallelizing compiler, attempts to provide such a solution. In this paper, we give the design of the DSL, called DWARF. Various language constructs are described along with the rationale behind their inclusion in the language. The abstraction provided by DWARF is compared qualitatively with MapReduce, Spark, and MPI-based implementations to establish the usefulness of the proposed clustering DSL.
