Neel Sundaresan
IBM
Publication
Featured research published by Neel Sundaresan.
international world wide web conferences | 2000
Marc Girardot; Neel Sundaresan
XML is poised to take the World Wide Web to the next level of innovation. XML data, large or small, with or without an associated schema, will be exchanged between an increasing number of applications running on diverse devices. Efficient storage and transportation of such data is an important issue. We have designed a system called Millau for efficient encoding and streaming of XML structures. In this paper we describe the Millau algorithms for compression of XML structures and data. Millau compression algorithms, in addition to separating structure and text for compression, take advantage of the associated schema (if available) in compressing the structure. Millau also defines a programming model, corresponding to the XML DOM and SAX APIs, for Millau streams of XML documents. Our experiments have shown significant performance gains of our algorithms and APIs. We describe some of these results in this paper. We also describe some applications of XML-based remote procedure calls and client-server applications based on Millau that take advantage of the compression and streaming technology defined by the system.
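The idea of separating structure from text before compression can be illustrated with a minimal Python sketch. It is not the Millau encoding: the token layout, the reserved `END`/`TEXT` codes, the function names, and the use of `zlib` are all assumptions made for illustration, and attributes and schema-driven encoding are ignored.

```python
import xml.etree.ElementTree as ET
import zlib

END, TEXT = 0, 1  # reserved tokens (hypothetical): end-of-element, text placeholder


def split_structure_and_text(xml_string):
    """Return (tag_codes, structure, texts) for an XML document.

    structure is a list of small integers: one code per element start,
    TEXT where character data occurs, END where an element closes.
    texts holds the character data in document order.
    """
    root = ET.fromstring(xml_string)
    tag_codes = {}   # element name -> integer code (starting at 2)
    structure = []
    texts = []

    def visit(elem):
        code = tag_codes.setdefault(elem.tag, len(tag_codes) + 2)
        structure.append(code)
        if elem.text and elem.text.strip():
            structure.append(TEXT)
            texts.append(elem.text.strip())
        for child in elem:
            visit(child)
            if child.tail and child.tail.strip():
                structure.append(TEXT)
                texts.append(child.tail.strip())
        structure.append(END)

    visit(root)
    return tag_codes, structure, texts


def compress_streams(structure, texts):
    """Compress the two streams independently.

    bytes(structure) only works while every code is below 256,
    i.e. for documents with fewer than 254 distinct element names.
    """
    packed_structure = zlib.compress(bytes(structure))
    packed_text = zlib.compress("\0".join(texts).encode("utf-8"))
    return packed_structure, packed_text
```

Keeping the two streams apart lets the repetitive tag sequence and the free text each be compressed on their own terms, which is the intuition behind the separation described in the abstract.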
international conference on multimedia and expo | 2000
Marc Georges Girardot; Neel Sundaresan
As large XML documents with text and multimedia are exchanged and streamed over the Internet, techniques for the compact and efficient representation and exchange of this data become essential. We have designed a system called Millau for the efficient encoding and streaming of XML structures. In this paper, we describe the Millau algorithms for the compression and streaming of XML structures and data. In order to be able to transmit the most important information first, Millau builds a solution for fragmenting an XML document, associating a priority to each fragment, streaming each fragment independently, and rebuilding the whole document or parts of the document according to the user's preferences or the browser's capabilities. This solution can be applied to the streaming of structured XML documents with text or multimedia data.
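The fragment-and-prioritize scheme can be sketched as follows in Python. This is an illustrative sketch only, not the Millau protocol: the `Fragment` class, its fields, and the `max_priority` filter are hypothetical names, and fragments are plain strings rather than encoded XML subtrees.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Fragment:
    priority: int                       # lower value = more important, sent earlier
    position: int                       # original document order, used as tie-breaker
    payload: str = field(compare=False) # fragment content, not used for ordering


def stream_by_priority(fragments):
    """Yield fragments most-important first."""
    heap = list(fragments)
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)


def rebuild(received, max_priority=None):
    """Reassemble fragments in document order.

    If max_priority is given, fragments less important than that are
    dropped, standing in for a user preference or a limited client.
    """
    kept = [f for f in received
            if max_priority is None or f.priority <= max_priority]
    return "".join(f.payload for f in sorted(kept, key=lambda f: f.position))
```

Because each fragment carries its own position, the receiver can rebuild a partial document at any point during the transfer, or the complete one once every fragment has arrived.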
international conference on data engineering | 2002
Christina Yip Chung; Michael Gertz; Neel Sundaresan
Despite the advancement of XML, the majority of documents on the Web is still marked up with HTML for visual rendering purposes only, thus building a huge amount of legacy data. In order to facilitate querying Web based data in a way more efficient and effective than just keyword based retrieval, enriching such Web documents with both structure and semantics is necessary. We describe a novel approach to the integration of topic specific HTML documents into a repository of XML documents. In particular, we describe how topic specific HTML documents are transformed into XML documents. The proposed document transformation and semantic element tagging process utilizes document restructuring rules and minimum information about the topic in the form of concepts. For the resulting XML documents, a majority schema is derived that describes common structures among the documents in the form of a DTD. We explore and discuss different techniques, and rules for document conversion and majority schema discovery. We finally demonstrate the feasibility and effectiveness of our approach by applying it to a set of resume HTML documents gathered by a Web crawler.
international world wide web conferences | 2000
Neel Sundaresan; Jeonghee Yi
The Web is a vast source of information. However, due to the disparate authorship of Web pages, this information is buried in its amorphous and chaotic structure. At the same time, with the pervasiveness of Web access, an increasing number of users is relying on Web search engines for interesting information. We are interested in identifying how pieces of information are related as they are presented on the Web. One such problem is studying patterns of occurrences of related phrases in Web documents and identifying relationships between these phrases. We call these the duality problems of the Web. Duality problems are materialized in trying to define and identify two sets of inter-related concepts, and are solved by iteratively refining mutually dependent coarse definitions of these concepts. In this paper we define and formalize the general duality problem of relations on the Web. Duality of patterns and relationships is of importance because it allows us to define the rules of patterns and relationships iteratively through the multitude of their occurrences. Our solution includes Web crawling to iteratively refine the definition of patterns and relations. As an example we solve the problem of identifying acronyms and their expansions through patterns of occurrences of (acronym, expansion) pairs as they occur in Web pages.
electronic commerce and web technologies | 2000
Jeonghee Yi; Neel Sundaresan; Anita W. Huang
As the World-Wide-Web grows at an exponential rate, we are faced with the issue of rating pages in terms of quality and trust. In this situation, with significant linkage among web pages, what other pages say about a web page can be as important as and more objective than what the page says about itself. The cumulative knowledge of such recommendations (or lack of them) can help a system to decide whether to pursue a page or not. This metadata information can also be used by a web robot program, for example, to derive summary information about web documents written in a foreign language. In this paper, we describe how we exploit this type of metadata to drive a web information gathering system, which forms the backend of a topic-specific search engine. The system uses metadata from hyperlinks to guide itself to crawl the web staying focused on a target topic. The crawler follows links that point to information related to the topic and avoids following links to irrelevant pages. Moreover, the system uses the metadata to improve its definition of the target topic through association mining. Ultimately, the guided crawling system builds a rich repository of metadata information, which is used to serve the search engine.
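A link-metadata-guided crawl of this kind can be sketched in Python with a priority frontier. The sketch makes several assumptions that are not from the paper: the web is an in-memory dictionary rather than live pages, hyperlink metadata is reduced to anchor text, relevance is a simple term-overlap score, and the association-mining step that refines the topic definition is omitted. All function and parameter names are hypothetical.

```python
import heapq


def anchor_score(anchor_text, topic_terms):
    """Fraction of topic terms that appear in the anchor text."""
    words = set(anchor_text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def focused_crawl(web, start_url, topic_terms, threshold=0.0, limit=10):
    """Visit pages in order of how relevant their inbound anchor text looks.

    web maps each URL to a list of (anchor_text, target_url) links.
    Links scoring at or below threshold are never followed.
    """
    frontier = [(-1.0, start_url)]  # max-heap via negated scores
    seen = {start_url}
    visited = []
    while frontier and len(visited) < limit:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for anchor_text, target in web.get(url, []):
            score = anchor_score(anchor_text, topic_terms)
            if score > threshold and target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-score, target))
    return visited
```

The decision to follow a link is made from what the linking page says about the target, before the target is fetched, which is what keeps the crawl on topic.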
International Workshop on the World Wide Web and Databases | 2000
Jeonghee Yi; Neel Sundaresan; Anita Huang
With the web at close to a billion pages and growing at an exponential rate, we are faced with the issue of rating pages in terms of quality and trust. In this situation, what other pages say about a web page can be as important as what the page says about itself. The cumulative knowledge of these types of recommendations (or the lack thereof) can be objective enough to help a user or robot program to decide whether or not to pursue a web document. In addition, these annotations or metadata can be used by a web robot program to derive summary information about web documents that are written in a language that the robot does not understand. We use this idea to drive a web information gathering system that forms the core of a topic-specific search engine. In this paper, we describe how our system uses annotations about the hyperlinks contained in web pages to guide itself to crawl the web. It sifts through useful information related to a particular topic to eliminate the traversal of links that may not be of interest. Thus, the guided crawling system stays focused on the target topic. It builds a rich repository of link information that includes annotations. This repository is used to build quality metadata, which ultimately serves a search engine.
web information and data management | 1999
Jeonghee Yi; Neel Sundaresan
The Web is a rich source of information, but this information is scattered and hidden in the diversity of web pages. Search engines are windows to the web. However, the current search engines, designed to identify pages with specified phrases have very limited power. For example, they cannot search for phrases related in a particular way (e.g. books and their authors). In this paper we present a solution for identifying a set of inter-related information on the web using the duality concept. Duality problems arise when one tries to identify a pair of inter-related phrases such as (book, author), (name, email) or (acronym, expansion) relations. We propose a solution to this problem that iteratively refines mutually dependent approximations to their identifications. Specifically, we iteratively refine i) pairs of phrases related in a specific way, and ii) the patterns of their occurrences in web pages, i.e. the ways in which the related phrases are marked in the pages. We cast light on the general solution of the duality problems in the web by concentrating on one paradigmatic duality problem i.e. identifying (acronym, expansion) pairs in terms of the patterns of their occurrences in the web pages. The solution to this problem involves two mutually dependent duality problems of 1) the duality between the related pairs and their patterns, and 2) the duality between the related pairs and the acronym formulation rules.
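The iterative refinement between related pairs and their occurrence patterns can be sketched in Python as below. This is a simplified illustration, not the authors' algorithm: a "pattern" is reduced to the literal separator between an expansion and its acronym, the only acronym formulation rule is "initial letters of the last words", and the corpus is a list of strings instead of crawled pages. All names are hypothetical.

```python
import re

WORDS = r"((?:[A-Z][a-z]+\s)*[A-Z][a-z]+)"  # run of capitalized words
ACRONYM = r"([A-Z]{2,})"


def find_pairs(corpus, separators):
    """Use known patterns (separators) to extract (acronym, expansion) pairs."""
    pairs = set()
    for sep in separators:
        regex = re.compile(WORDS + re.escape(sep) + ACRONYM)
        for text in corpus:
            for words, acronym in regex.findall(text):
                tail = words.split()[-len(acronym):]
                # Formulation rule: acronym letters are the word initials.
                if "".join(w[0] for w in tail) == acronym:
                    pairs.add((acronym, " ".join(tail)))
    return pairs


def find_separators(corpus, pairs, max_len=6):
    """Use known pairs to discover new patterns (separators)."""
    separators = set()
    for acronym, expansion in pairs:
        regex = re.compile(re.escape(expansion) + r"(.{1,%d}?)" % max_len
                           + re.escape(acronym) + r"\b")
        for text in corpus:
            separators.update(regex.findall(text))
    return separators


def refine(corpus, seed_separators, max_rounds=10):
    """Alternate between the two steps until neither set grows."""
    separators, pairs = set(seed_separators), set()
    for _ in range(max_rounds):
        new_pairs = pairs | find_pairs(corpus, separators)
        new_separators = separators | find_separators(corpus, new_pairs)
        if new_pairs == pairs and new_separators == separators:
            break
        pairs, separators = new_pairs, new_separators
    return pairs, separators
```

Starting from a single seed pattern such as `" ("`, each round lets pairs found by known patterns reveal new patterns, which in turn reveal new pairs; that mutual dependence is the duality the abstract describes.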
international world wide web conferences | 2000
Sami Rollins; Neel Sundaresan
The World Wide Web is a rich source of information that has become a universal means of communication. XML promises to be the future of the World Wide Web. However, as HTML is replaced by its more powerful counterpart, traditional browsers are not sufficient to display the information communicated in an XML document. Today's browsers are capable of showing only a textual version of an XML document. This is limiting not only for a viewer in a traditional scenario, but is also a barrier for a user who wishes to access the information without having access to a traditional keyboard, mouse, and monitor. This paper presents a framework for developing non-traditional, schema-driven, customizable interfaces used to navigate and modify XML documents that may be served over the Web. Our system, Audio XmL (AXL), focuses on developing a speech-based component within that framework. At the most basic level, we provide a Speech DOM, a spoken equivalent to the Document Object Model. Beyond that, we provide an intuitive set of commands based upon schema as well as a customization language. AXL enables voice-based Web browsing, but without requiring extra effort on the part of the Web page designer. Given any XML document, AXL allows the user to navigate, modify and traverse the structure and links of the document entirely by voice. To illustrate, we focus on how AXL can enable a user to browse the Web via a cellular phone.
international conference on multimedia and expo | 2000
Sami Rollins; Neel Sundaresan
The eXtensible Markup Language (XML) is emerging as a new way to store and communicate data. Even though its primary application is as the future of the World Wide Web, it can be used in a variety of situations to structure electronic data. As XML becomes ubiquitous, there is a need to develop tools to allow users to view, navigate, and modify the underlying XML data via a high-level, multi-modal interface. Moreover, because XML can be used in a variety of situations, the tools must allow a user to access the data via non-traditional interfaces. The Web, eCommerce, and digital classrooms are all possible applications for XML. The paper presents a framework for developing multi-modal tools to view, navigate, and modify XML structures.
international conference on multimedia and expo | 2000
Wayne Niblack; Stanley Yue; Reiner Kraft; Arnon Amir; Neel Sundaresan
We describe methods for crawling, summarizing, and displaying visual multimedia data. The methods combine text and image similarity query with fast browsing and flexible display using animated and/or compressed visual summaries.