Neeraj Agrawal
IBM
Publication
Featured research published by Neeraj Agrawal.
Knowledge Discovery and Data Mining | 2003
Sachindra Joshi; Neeraj Agrawal; Raghu Krishnapuram; Sumit Negi
Structural information (such as layout and look-and-feel) has been extensively used in the literature for extraction of interesting or relevant data, efficient storage, and query optimization. Traditionally, tree models (such as DOM trees) have been used to represent structural information, especially in the case of HTML and XML documents. However, computation of structural similarity between documents based on the tree model is computationally expensive. In this paper, we propose an alternative scheme for representing the structural information of documents based on the paths contained in the corresponding tree model. Since the model includes partial information about parents, children and siblings, it allows us to define a new family of meaningful (and at the same time computationally simple) structural similarity measures. Our experimental results based on the SIGMOD XML data set as well as HTML document collections from ibm.com, dell.com, and amazon.com show that the representation is powerful enough to produce good clusters of structurally similar pages.
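The path-based idea in this abstract can be illustrated with a minimal sketch. It assumes well-formed XML input, represents each document as the set of its root-to-node tag paths, and compares two documents with Jaccard overlap. The function names tag_paths and path_similarity are hypothetical, and Jaccard overlap is a stand-in chosen for simplicity; it is not claimed to be one of the similarity measures defined in the paper, and it ignores the sibling information the abstract mentions.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths in a well-formed XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        # Extend the parent's path with this node's tag and record it.
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def path_similarity(doc_a, doc_b):
    """Jaccard overlap of the two path sets (an illustrative measure only)."""
    a, b = tag_paths(doc_a), tag_paths(doc_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two pages with the same layout and one with a different layout (made-up data).
page_1 = "<html><body><div><p>x</p></div></body></html>"
page_2 = "<html><body><div><p>y</p><p>z</p></div></body></html>"
page_3 = "<html><body><table><tr><td>1</td></tr></table></body></html>"

same_layout = path_similarity(page_1, page_2)
different_layout = path_similarity(page_1, page_3)
```

Because each document is reduced to a set of strings, comparing two documents costs set operations rather than a tree-to-tree alignment, which is the kind of simplification the abstract argues for.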
International Conference on Data Engineering | 2004
Neeraj Agrawal; Rema Ananthanarayanan; Rahul Gupta; Sachindra Joshi; Raghu Krishnapuram; Sumit Negi
Data presented on commerce sites runs into thousands of pages, and is typically delivered from multiple back-end sources. This makes it difficult to identify incorrect, anomalous, or interesting data such as
IBM Journal of Research and Development | 2004
Neeraj Agrawal; Rema Ananthanarayanan; Rahul Gupta; Sachindra Joshi; Raghu Krishnapuram; Sumit Negi
9.99 air fares, missing links, drastic changes in prices and addition of new products or promotions. We describe a system that monitors Web sites automatically and generates various types of reports so that the content of the site can be monitored and the quality maintained. The solution designed and implemented by us consists of a site crawler that crawls dynamic pages, an information miner that learns to extract useful information from the pages based on examples provided by the user, and a reporter that can be configured by the user to answer specific queries. The tool can also be used for identifying price trends and new products or promotions at competitor sites. A pilot run of the tool has been successfully completed at the ibm.com site.
Archive | 2003
Neeraj Agrawal; Sachindra Joshi; Raghuram Krishnapuram; Sumit Negi
Typical commercial Web sites publish information from multiple back-end data sources; these data sources are also updated very frequently. Given the size of most commercial sites today, it becomes essential to have an automated means of checking for correctness and consistency of data. The eShopmonitor allows users to specify items of interest to be tracked, monitors these items on the Web pages, and reports on any changes observed. Our solution comprises a crawler, a miner, a reporter, and a user component that work together to achieve the above functionality. The miner learns to locate the items of interest on a class of pages based on just one sample supplied by the user, via the user interface (UI) provided. The learning algorithm is based on the XPaths of the Document Object Model (DOM) of the page.
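As a rough illustration of learning from a single sample, the sketch below records the positional path to the element holding a user-supplied value on one page and then reuses that path on another page of the same class. It assumes the pages are well-formed XML (for example XHTML) and uses only the limited XPath subset supported by Python's xml.etree.ElementTree. The names learn_path and extract are hypothetical, and this is not the eShopmonitor learning algorithm itself, whose details the abstract does not give.

```python
import xml.etree.ElementTree as ET


def learn_path(sample_page, sample_value):
    """Return a positional path to the first element whose text equals sample_value."""
    root = ET.fromstring(sample_page)

    def walk(node, path):
        if (node.text or "").strip() == sample_value:
            return path
        counts = {}
        for child in node:
            # Index each child among siblings that share its tag (1-based).
            counts[child.tag] = counts.get(child.tag, 0) + 1
            found = walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")
            if found is not None:
                return found
        return None

    return walk(root, ".")


def extract(page, path):
    """Apply a learned path to another page and return the text found there."""
    node = ET.fromstring(page).find(path)
    return None if node is None else (node.text or "").strip()


# Made-up product pages that share one layout.
sample = "<html><body><div><span>Laptop</span><span>999</span></div></body></html>"
other = "<html><body><div><span>Phone</span><span>499</span></div></body></html>"

price_path = learn_path(sample, "999")
other_price = extract(other, price_path)
```

A monitor built along these lines could compare the value extracted on successive crawls and report a change when the two differ.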
Archive | 2008
Neeraj Agrawal; Sreeram V. Balakrishnan; Sachindra Joshi
Archive | 2005
Neeraj Agrawal; Scott Holmes; Kiran Mehta; Sumit Negi
Archive | 2005
Neeraj Agrawal; Scott Holmes; Ana Lelescu; Kiran Mehta; Hongcheng Mi
Conference on Management of Data | 2005
Neeraj Agrawal; Scott Holmes; Sachindra Joshi; Sumit Negi
Archive | 2008
Charles Dyer Bridgham; Neeraj Agrawal
Archive | 2006
Neeraj Agrawal; Scott Holmes; Ana Lelescu; Kiran Mehta; Hongcheng Mi