Publication


Featured research published by Neeraj Agrawal.


knowledge discovery and data mining | 2003

A bag of paths model for measuring structural similarity in Web documents

Sachindra Joshi; Neeraj Agrawal; Raghu Krishnapuram; Sumit Negi

Structural information (such as layout and look-and-feel) has been extensively used in the literature for extraction of interesting or relevant data, efficient storage, and query optimization. Traditionally, tree models (such as DOM trees) have been used to represent structural information, especially in the case of HTML and XML documents. However, computation of structural similarity between documents based on the tree model is computationally expensive. In this paper, we propose an alternative scheme for representing the structural information of documents based on the paths contained in the corresponding tree model. Since the model includes partial information about parents, children and siblings, it allows us to define a new family of meaningful (and at the same time computationally simple) structural similarity measures. Our experimental results based on the SIGMOD XML data set as well as HTML document collections from ibm.com, dell.com, and amazon.com show that the representation is powerful enough to produce good clusters of structurally similar pages.
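
The path-based representation described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify which paths are kept or which similarity measures are used, so the sketch assumes root-to-node tag paths and substitutes a weighted Jaccard ratio as one simple measure. The names `bag_of_paths` and `path_similarity` and the two sample pages are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def bag_of_paths(markup: str) -> Counter:
    """Count every root-to-node tag path in a well-formed XML/XHTML string."""
    root = ET.fromstring(markup)
    bag = Counter()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        bag[path] += 1
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return bag


def path_similarity(a: Counter, b: Counter) -> float:
    """Weighted Jaccard (an assumed measure): shared path counts over total path counts."""
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total if total else 1.0


# Hypothetical pages: same skeleton, slightly different content blocks.
page_a = "<html><body><div><p/><p/></div></body></html>"
page_b = "<html><body><div><p/></div><table/></body></html>"

print(path_similarity(bag_of_paths(page_a), bag_of_paths(page_b)))
```

Comparing two bags of paths takes time linear in the number of distinct paths, which is the kind of saving over tree-to-tree comparison that the abstract points to.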


international conference on data engineering | 2004

EShopMonitor: a Web content monitoring tool

Neeraj Agrawal; Rema Ananthanarayanan; Rahul Gupta; Sachindra Joshi; Raghu Krishnapuram; Sumit Negi

Data presented on commerce sites runs into thousands of pages, and is typically delivered from multiple back-end sources. This makes it difficult to identify incorrect, anomalous, or interesting data such as $9.99 air fares, missing links, drastic changes in prices and addition of new products or promotions. We describe a system that monitors Web sites automatically and generates various types of reports so that the content of the site can be monitored and the quality maintained. The solution designed and implemented by us consists of a site crawler that crawls dynamic pages, an information miner that learns to extract useful information from the pages based on examples provided by the user, and a reporter that can be configured by the user to answer specific queries. The tool can also be used for identifying price trends and new products or promotions at competitor sites. A pilot run of the tool has been successfully completed at the ibm.com site.


IBM Journal of Research and Development | 2004

The eShopmonitor: a comprehensive data extraction tool for monitoring web sites

Neeraj Agrawal; Rema Ananthanarayanan; Rahul Gupta; Sachindra Joshi; Raghu Krishnapuram; Sumit Negi

Typical commercial Web sites publish information from multiple back-end data sources; these data sources are also updated very frequently. Given the size of most commercial sites today, it becomes essential to have an automated means of checking for correctness and consistency of data. The eShopmonitor allows users to specify items of interest to be tracked, monitors these items on the Web pages, and reports on any changes observed. Our solution comprises a crawler, a miner, a reporter, and a user component that work together to achieve the above functionality. The miner learns to locate the items of interest on a class of pages based on just one sample supplied by the user, via the user interface (UI) provided. The learning algorithm is based on the XPaths of the Document Object Model (DOM) of the page.


Archive | 2003

Determining structural similarity in semi-structured documents

Neeraj Agrawal; Sachindra Joshi; Raghuram Krishnapuram; Sumit Negi
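
The eShopmonitor abstract in this listing says the miner locates items of interest on a class of pages from a single user-supplied sample, using XPaths of the page's DOM. The sketch below is only a toy illustration of that idea and makes no claim to match the published learning algorithm: it assumes well-formed markup, derives one positional path from the sample, and reuses it on a second page with the same layout. The function name `learn_path` and both sample pages are hypothetical.

```python
from typing import Optional
import xml.etree.ElementTree as ET


def learn_path(sample: ET.Element, target_text: str) -> Optional[str]:
    """Return a positional path to the first element whose text equals target_text."""

    def walk(node: ET.Element, path: str) -> Optional[str]:
        if (node.text or "").strip() == target_text:
            return path
        seen = {}  # per-tag counter, so positions match tag[n] semantics
        for child in node:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            found = walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")
            if found:
                return found
        return None

    return walk(sample, ".")


# Hypothetical product pages that share one template.
sample_page = ET.fromstring(
    "<html><body><div><span>Laptop</span><span>999.00</span></div></body></html>"
)
other_page = ET.fromstring(
    "<html><body><div><span>Desktop</span><span>749.00</span></div></body></html>"
)

# The user marks the price on the sample page; the learned path is reused elsewhere.
price_path = learn_path(sample_page, "999.00")
print(price_path)
print(other_page.find(price_path).text)
```

A purely positional path like this breaks as soon as the template shifts, so a practical miner would presumably need to generalize beyond it; the abstract does not say how.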


Archive | 2008

System and a method for focused re-crawling of Web sites

Neeraj Agrawal; Sreeram V. Balakrishnan; Sachindra Joshi


Archive | 2005

System and method for on-demand analysis of unstructured text data returned from a database

Neeraj Agrawal; Scott Holmes; Kiran Mehta; Sumit Negi


Archive | 2005

Automated process for identifying and delivering domain specific unstructured content for advanced business analysis

Neeraj Agrawal; Scott Holmes; Ana Lelescu; Kiran Mehta; Hongcheng Mi


conference on management of data | 2005

TAP: A Platform for Enabling Enterprises to Develop Business Specific Text Analytic Applications

Neeraj Agrawal; Scott Holmes; Sachindra Joshi; Sumit Negi


Archive | 2008

Policy-Based Usage of Computing Assets

Charles Dyer Bridgham; Neeraj Agrawal


Archive | 2006

Automated process for identifying and delivering domain specific unstructured content for advanced business analysis

Neeraj Agrawal; Scott Holmes; Ana Lelescu; Kiran Mehta; Hongcheng Mi
