Komal Kumar Bhatia
YMCA University of Science and Technology
Publication
Featured research published by Komal Kumar Bhatia.
International Journal of Computer Trends and Technology | 2014
Sonali Gupta; Komal Kumar Bhatia
A large amount of data on the WWW remains inaccessible to the crawlers of Web search engines because it is exposed only on demand, as users fill out and submit forms. The Hidden Web refers to the collection of Web data that a crawler can access only by interacting with a Web-based search form, not simply by traversing hyperlinks. Research on the Hidden Web emerged almost a decade ago, with the main line of work exploring ways to access the content of online databases that are usually hidden behind search forms. Efforts in the area mainly concentrate on designing Hidden Web crawlers that learn forms and fill them with meaningful values. The paper gives an insight into the various Hidden Web crawlers developed for this purpose, noting the advantages and shortcomings of the techniques employed in each.
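A minimal sketch of the core step these crawlers share: locating a search form on a page, pairing its fields with candidate values, and submitting it to expose records that hyperlink traversal would never reach. The URL handling and the `CANDIDATE_VALUES` table are illustrative assumptions, not any particular crawler's method.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# assumed label -> value table for one domain; a real crawler learns these values
CANDIDATE_VALUES = {"title": "databases", "author": "smith"}

def fill_and_submit(page_url):
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")
    if form is None:
        return None                                   # no search interface on this page
    action = urljoin(page_url, form.get("action", page_url))
    data = {}
    for field in form.find_all("input"):
        name = field.get("name")
        if not name:
            continue
        # pick a meaningful value when the field matches a known attribute,
        # otherwise keep whatever default the form already carries
        data[name] = CANDIDATE_VALUES.get(name.lower(), field.get("value", ""))
    if form.get("method", "get").lower() == "post":
        return requests.post(action, data=data, timeout=10).text
    return requests.get(action, params=data, timeout=10).text   # hidden-web result page
```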
grid computing | 2010
Komal Kumar Bhatia; A. K. Sharma; Rosy Madaan
Existing search engines crawl and index the surface web, ignoring the hidden web, which contains more than 500 times the information of the publicly indexable Web (PIW). In this paper, a Domain-specific Hidden Web Crawler (AKSHR) is proposed. The framework extracts hidden web pages by exploiting its three unique features: 1) automatic downloading of search interfaces to crawl hidden web databases, 2) identification of semantic mappings between search interface elements using a novel approach called DSIM (Domain-specific Interface Mapper), and 3) the ability to fill search interfaces automatically. The effectiveness of the proposed framework has been evaluated through experiments on real web sites, with encouraging preliminary results.
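The sketch below illustrates, in a generic way, what mapping search-interface elements onto one another can look like: labels are normalised and matched against a small domain vocabulary of synonyms. It is an assumption-laden illustration, not the actual DSIM algorithm; the book-domain vocabulary is invented for the example.

```python
# assumed book-domain vocabulary mapping canonical attributes to label synonyms
DOMAIN_SYNONYMS = {
    "author": {"author", "writer", "written by"},
    "title":  {"title", "book title", "name of book"},
    "isbn":   {"isbn", "isbn number"},
}

def normalise(label: str) -> str:
    # lower-case, drop punctuation, collapse whitespace
    return " ".join(label.lower().replace(":", " ").split())

def map_field(label: str):
    """Return the canonical domain attribute a form label corresponds to, if any."""
    cleaned = normalise(label)
    for attribute, synonyms in DOMAIN_SYNONYMS.items():
        if cleaned in synonyms:
            return attribute
    return None

print(map_field("Written By:"))   # -> "author"
print(map_field("ISBN Number"))   # -> "isbn"
```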
grid computing | 2012
Sonali Gupta; Komal Kumar Bhatia
With the proliferation of document corpora (commonly called HTML documents or web pages) on the WWW, efficient ways of exploring relevant documents are of increasing importance [4, 8]. The key challenge lies in tackling the sheer volume of documents on the Web and evaluating relevancy for such a huge number. Efficient exploration needs a web crawler that can semantically understand and predict the domain of a web page through analytical processing. This will not only facilitate efficient exploration but also help in the better organization of web content. As a search engine classifies search results by keyword matches, link analysis and other such mechanisms, the paper proposes a solution to the domain identification problem by finding keywords or key terms that are representative of the page's content through elements like <META> and <TITLE> in the HTML structure of the webpage [11]. This paper proposes a two-step framework that first automatically identifies the domain of the specified web page and then, with the domain information thus obtained, classifies the web content into different prespecified categories. The former uses the various HTML elements present in the web page, while the latter is achieved using Artificial Neural Networks (ANN).
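A minimal sketch of the first step the abstract describes: pulling candidate domain key terms out of the <TITLE> and <META> elements of a page's HTML. The sample HTML and the keyword handling are illustrative assumptions.

```python
from bs4 import BeautifulSoup

def extract_key_terms(html: str):
    soup = BeautifulSoup(html, "html.parser")
    terms = []
    if soup.title and soup.title.string:
        terms.extend(soup.title.string.lower().split())
    for meta in soup.find_all("meta", attrs={"name": "keywords"}):
        content = meta.get("content", "")
        terms.extend(t.strip().lower() for t in content.split(",") if t.strip())
    return terms

sample = """<html><head><title>Cheap Flights and Hotels</title>
<meta name="keywords" content="airline tickets, hotel booking, travel"></head></html>"""
print(extract_key_terms(sample))
# terms such as 'flights' and 'airline tickets' point the crawler to a travel domain
```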
IOSR Journal of Computer Engineering | 2012
Pikakshi Manchanda; Sonali Gupta; Komal Kumar Bhatia
The World Wide Web is growing at an uncontrollable rate. Hundreds of thousands of web sites appear every day, with the added challenge of keeping web directories up-to-date. Further, the uncontrolled nature of the web presents difficulties for Web page classification. As the number of Internet users grows, so does the need to classify web pages with greater precision in order to present users with web pages of their desired class. However, web page classification has mostly been accomplished using textual categorization methods. Herein, we propose a novel approach for web page classification that uses the HTML information present in a web page for its classification. There are many ways of classifying web pages into various domains. This paper proposes an entirely new dimension to web page classification using Artificial Neural Networks (ANN). Index Terms: World Wide Web, Web page classification, textual categorization, HTML, Artificial Neural Networks, ANN.
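A hedged sketch of what an ANN-based classification stage can look like: each page is reduced to a small feature vector (here, counts of domain-indicative terms taken from its HTML) and a feedforward network assigns it to one of the pre-specified classes. The feature scheme, class names, and toy training data are assumptions for illustration, not the paper's setup.

```python
from sklearn.neural_network import MLPClassifier

# toy feature vectors: [travel-term count, book-term count, health-term count]
X_train = [[5, 0, 1], [4, 1, 0], [0, 6, 1], [1, 5, 0], [0, 1, 7], [1, 0, 5]]
y_train = ["travel", "travel", "books", "books", "health", "health"]

# small feedforward network standing in for the ANN classifier
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)

print(clf.predict([[3, 0, 1]]))   # expected to label the page as 'travel'
```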
International Journal of Information Technology and Web Engineering | 2012
Vandana Dhingra; Komal Kumar Bhatia
Ontologies are the backbone of knowledge representation on the Semantic Web. The challenges involved in building ontologies concern time, effort, skill, and domain-specific knowledge. One of the major advantages of ontologies that minimizes these challenges is their potential for "reuse", currently supported by various search engines like Swoogle and OntoKhoj. As the number of ontologies that search engines such as Swoogle, OntoKhoj, and Falcon can find increases, so will the need for a proper ranking method to order the returned lists of ontologies by their relevancy to the query, which can save a lot of time and effort. This paper deals with the analysis of various ontology ranking algorithms. Based on this analysis, a comparative study is done to find out their relative strengths and limitations against various parameters, providing a significant research direction for the ranking of ontologies on the Semantic Web.
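A generic illustration, under assumed data, of the kind of relevancy scoring an ontology ranking method applies: each returned ontology is scored by how well its concept labels cover the query terms. The systems surveyed in the paper use richer structural and popularity measures; this sketch only shows the basic idea.

```python
def rank_ontologies(query_terms, ontologies):
    """ontologies maps an ontology URI to the set of concept labels it defines."""
    query = {t.lower() for t in query_terms}
    scores = {}
    for uri, concepts in ontologies.items():
        labels = {c.lower() for c in concepts}
        scores[uri] = len(query & labels) / max(len(query), 1)   # fraction of query covered
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

catalogue = {
    "http://example.org/onto/travel": {"Flight", "Hotel", "Destination"},
    "http://example.org/onto/library": {"Book", "Author", "Publisher"},
}
print(rank_ontologies(["hotel", "flight"], catalogue))   # travel ontology ranks first
```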
International Journal of Information Retrieval Research archive | 2016
Surbhi Bhatia; Manisha Sharma; Komal Kumar Bhatia
Due to the sudden and explosive increase in web technologies, a huge quantity of user-generated content is available online. The experiences of people and their opinions play an important role in the decision making process. Although facts provide the ease of searching information on a topic, retrieving opinions is still a crucial task. Many studies on opinion mining have to be undertaken efficiently in order to extract constructive opinionated information from these reviews. The present work focuses on the design and implementation of an Opinion Crawler which downloads opinions from various sites, ignoring the rest of the web. Besides, it also detects web pages which frequently undergo updates by calculating the timestamp for their revisit in order to extract relevant opinions. The performance of the Opinion Crawler is justified on real data sets, which prove it to be much more accurate in terms of the precision and recall quality attributes.
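An illustrative sketch, assumed rather than taken from the paper, of how a crawler can decide when to revisit a review page: pages whose content changed since the last fetch get their revisit interval shortened, unchanged pages get it lengthened, keeping the opinion repository fresh at low cost.

```python
import hashlib
from datetime import datetime, timedelta

class RevisitScheduler:
    def __init__(self, base_hours: float = 24.0):
        self.interval = timedelta(hours=base_hours)
        self.last_hash = None

    def record_fetch(self, page_content: str) -> datetime:
        digest = hashlib.sha1(page_content.encode("utf-8")).hexdigest()
        if self.last_hash is not None:
            if digest != self.last_hash:
                # page updated: come back sooner, but no more often than hourly
                self.interval = max(self.interval / 2, timedelta(hours=1))
            else:
                # stable page: back off, up to a monthly revisit
                self.interval = min(self.interval * 2, timedelta(days=30))
        self.last_hash = digest
        return datetime.utcnow() + self.interval   # timestamp of the next revisit

scheduler = RevisitScheduler()
print(scheduler.record_fetch("<html>first batch of reviews</html>"))
print(scheduler.record_fetch("<html>new review added</html>"))   # interval halves
```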
Ingénierie Des Systèmes D'information | 2015
Vandana Dhingra; Komal Kumar Bhatia
The Web is considered the largest information pool, and the search engine a tool for extracting information from it, but due to the unorganized structure of the web it is getting difficult to find relevant information with search engine tools. Future search engine tools will not be based merely on keyword search; they will be able to interpret the meaning of web contents to produce relevant results. The design of such tools requires extracting information from contents that support logic and inferential capability. This paper discusses the conceptual differences between the traditional web and the semantic web, specifying the need for crawling semantic web documents. In this paper a framework is proposed for crawling ontologies/semantic web documents. The proposed framework is implemented and validated on different collections of web pages. The system extracts heterogeneous documents from the web, filters the ontology-annotated web pages, and extracts triples from them, which supports better inferential capability.
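A minimal sketch of the triple-extraction step: once a crawled document is identified as RDF-annotated, its statements can be loaded into a graph and emitted as subject-predicate-object triples. The inline RDF snippet is an illustrative assumption.

```python
from rdflib import Graph

sample_rdf = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/page1">
    <dc:title>Semantic Web Crawling</dc:title>
    <dc:creator>Example Author</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""

g = Graph()
g.parse(data=sample_rdf, format="xml")       # only pages that parse as RDF pass the filter
for subject, predicate, obj in g:
    print(subject, predicate, obj)           # triples feed the inference/indexing layer
```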
international conference on contemporary computing | 2014
Vandana Dhingra; Komal Kumar Bhatia
To make machines understand the semantics of a web page, representation languages other than HTML and XML are needed. The Semantic Web allows information to be represented in a well-defined manner using languages like RDF and OWL, which enhance the inference power of machines and make the contents machine-interpretable. Indexing the fetched web contents expressed in these representation languages for effective information retrieval is the key research issue. In this paper a comparative analysis of existing indexing techniques for RDF documents is done, and a framework is proposed and implemented for indexing crawled ontologies represented in a Semantic Web language such as RDF.
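A hedged sketch of one common way to index RDF triples for retrieval: keep three maps (by subject, by predicate, by object) so that lookups on any single component are direct. This is a generic illustration, not the specific framework proposed in the paper.

```python
from collections import defaultdict

class TripleIndex:
    def __init__(self):
        self.by_subject = defaultdict(list)
        self.by_predicate = defaultdict(list)
        self.by_object = defaultdict(list)

    def add(self, s: str, p: str, o: str) -> None:
        triple = (s, p, o)
        self.by_subject[s].append(triple)
        self.by_predicate[p].append(triple)
        self.by_object[o].append(triple)

    def lookup(self, s=None, p=None, o=None):
        """Return triples matching whichever components are given."""
        candidates = (self.by_subject[s] if s else
                      self.by_predicate[p] if p else
                      self.by_object[o] if o else [])
        return [t for t in candidates
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

idx = TripleIndex()
idx.add("ex:page1", "dc:title", "Semantic Web Crawling")
idx.add("ex:page1", "dc:creator", "Example Author")
print(idx.lookup(s="ex:page1", p="dc:title"))
```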
International Conference on Advances in Computing, Communication and Control | 2013
Pikakshi Manchanda; Sonali Gupta; Komal Kumar Bhatia
A huge amount of data has been made available on the WWW [3] lately, most of which remains inaccessible to usual Web crawlers because those web pages are generated dynamically in response to users' queries through Web-based search form interfaces [5, 6, 9]. A Hidden Web crawler must be able to automatically annotate such Hidden Web data. The goal can only be accomplished if the crawler has been provided with some knowledge or data pertaining to a domain similar to that of the search form interface. The paper provides a solution in this regard by exploiting the information present in the HTML structure of Web pages, efficiently obtaining domain-specific data to facilitate the crawler's access to dynamic web pages through automatic processing of these search form interfaces. Finding the domain of the webpage further eases the organization and understanding of the web content.
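A minimal sketch, under assumed markup, of reading the labels that the HTML structure attaches to a search form's fields; such labels are one source of the domain-specific hints a crawler needs to process the form automatically.

```python
from bs4 import BeautifulSoup

sample_form = """
<form action="/search">
  <label for="auth">Author</label><input id="auth" name="author">
  <label for="ttl">Book Title</label><input id="ttl" name="title">
</form>
"""

soup = BeautifulSoup(sample_form, "html.parser")
field_labels = {}
for label in soup.find_all("label"):
    target = label.get("for")
    field = soup.find("input", id=target) if target else None
    if field is not None and field.get("name"):
        field_labels[field["name"]] = label.get_text(strip=True)

print(field_labels)   # {'author': 'Author', 'title': 'Book Title'} -> hints at a books domain
```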
international conference on contemporary computing | 2010
Rosy Madaan; Ashutosh Dixit; A. K. Sharma; Komal Kumar Bhatia
The Hidden Web's broad and relevant coverage of dynamic, high-quality contents, coupled with the high change frequency of web pages, poses a challenge for maintaining and fetching up-to-date information. For this purpose, it must be verified whether a web page has changed or not, which is another challenge. Therefore, a mechanism needs to be introduced for adjusting the time period between two successive revisits based on the probability of the web page being updated. In this paper, an architecture is proposed that introduces a technique to continuously update/refresh the Hidden Web repository.
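An illustrative sketch, assumed rather than the paper's exact mechanism, of deriving the revisit period from an estimated update probability: the more often past fetches found the page modified, the shorter the gap before the next revisit.

```python
def next_revisit_hours(change_history, min_hours: float = 6.0, max_hours: float = 720.0) -> float:
    """change_history[i] is True if fetch i found the page modified."""
    if not change_history:
        return max_hours                                        # nothing observed yet: revisit rarely
    p_update = sum(change_history) / len(change_history)        # observed update probability
    # linear interpolation: high update probability -> short revisit period
    return max_hours - p_update * (max_hours - min_hours)

print(next_revisit_hours([True, True, False, True]))   # frequently updated page
print(next_revisit_hours([False, False, False]))       # stable page, revisit rarely
```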