Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Susan Gauch is active.

Publication


Featured research published by Susan Gauch.


web intelligence | 2005

Personalized Search Based on User Search Histories

Mirco Speretta; Susan Gauch

User profiles, descriptions of user interests, can be used by search engines to provide personalized search results. Many approaches to creating user profiles collect user information through proxy servers (to capture browsing histories) or desktop bots (to capture activities on a personal computer). Both of these techniques require the user's participation to install the proxy server or the bot. In this study, we explore the use of a less-invasive means of gathering user information for personalized search. In particular, we build user profiles based on activity at the search site itself and study the use of these profiles to provide personalized search results. By implementing a wrapper around the Google search engine, we were able to collect information about individual user search activities. In particular, we collected the queries for which at least one search result was examined, and the snippets (titles and summaries) for each examined result. User profiles were created by classifying the collected information (queries or snippets) into concepts in a reference concept hierarchy. These profiles were then used to re-rank the search results, and the rank-orders of the user-examined results before and after re-ranking were compared. Our study found that user profiles based on queries were as effective as those based on snippets. We also found that our personalized re-ranking resulted in a 34% improvement in the rank-order of the user-selected results.
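The re-ranking idea can be sketched roughly as follows; the profile structure, the concept weights, and the linear blend of engine score with profile match are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: each result's snippet has been classified into
# weighted concepts; its personalized score blends the engine's score
# with how well those concepts match the user's profile.

def personalized_rank(results, profile, alpha=0.5):
    """Re-rank results by blending engine score with profile match.

    results: list of (doc_id, engine_score, {concept: weight})
    profile: {concept: weight} accumulated from past queries/snippets
    alpha:   blend factor (assumed value, for illustration only)
    """
    def profile_match(concepts):
        return sum(w * profile.get(c, 0.0) for c, w in concepts.items())

    scored = [
        (doc_id, alpha * engine_score + (1 - alpha) * profile_match(concepts))
        for doc_id, engine_score, concepts in results
    ]
    return [doc_id for doc_id, _ in sorted(scored, key=lambda x: -x[1])]

results = [
    ("d1", 0.9, {"sports": 1.0}),
    ("d2", 0.8, {"machine-learning": 1.0}),
]
profile = {"machine-learning": 0.9}
print(personalized_rank(results, profile))  # d2 rises above d1
```

Here the lower-ranked result on a topic the user cares about overtakes the higher-ranked off-topic one, which is the effect the study measures via rank-order improvement.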


Lecture Notes in Computer Science | 2007

User profiles for personalized information access

Susan Gauch; Mirco Speretta; Aravind Chandramouli; Alessandro Micarelli

The amount of information available online is increasing exponentially. While this information is a valuable resource, its sheer volume limits its value. Many research projects and companies are exploring the use of personalized applications that manage this deluge by tailoring the information presented to individual users. These applications all need to gather, and exploit, some information about individuals in order to be effective. This area is broadly called user profiling. This chapter surveys some of the most popular techniques for collecting information about users and for representing and building user profiles. In particular, explicit information techniques are contrasted with implicitly collected user information using browser caches, proxy servers, browser agents, desktop agents, and search logs. We discuss in detail user profiles represented as weighted keywords, semantic networks, and weighted concepts. We review how each of these profiles is constructed and give examples of projects that employ each of these techniques. Finally, a brief discussion of the importance of privacy protection in profiling is presented.


international acm sigir conference on research and development in information retrieval | 2000

Incorporating quality metrics in centralized/distributed information retrieval on the World Wide Web

Xiaolan Zhu; Susan Gauch

Most information retrieval systems on the Internet rely primarily on similarity ranking algorithms based solely on term frequency statistics. Information quality is usually ignored. This leads to the problem that documents are retrieved without regard to their quality. We present an approach that combines similarity-based ranking with quality ranking in centralized and distributed search environments. Six quality metrics (currency, availability, information-to-noise ratio, authority, popularity, and cohesiveness) were investigated. Search effectiveness was significantly improved when the currency, availability, information-to-noise ratio and page cohesiveness metrics were incorporated in centralized search. The improvement seen when the availability, information-to-noise ratio, popularity, and cohesiveness metrics were incorporated in site selection was also significant. Finally, incorporating the popularity metric in information fusion resulted in a significant improvement. In summary, the results show that incorporating quality metrics can generally improve search effectiveness in both centralized and distributed search environments.
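A minimal sketch of blending a similarity score with quality metrics via a weighted sum; the 0.7/0.3 split, the metric weights, and the sample values are invented for illustration and are not the weights reported in the paper.

```python
# Sketch: combine a term-based similarity score with normalized
# quality metrics (weights and values below are assumptions).

def blended_score(similarity, quality, weights, sim_weight=0.7):
    """similarity: float in [0,1]; quality/weights: {metric: value}."""
    q = sum(weights[m] * quality.get(m, 0.0) for m in weights)
    total = sum(weights.values())
    quality_term = q / total if total else 0.0
    return sim_weight * similarity + (1 - sim_weight) * quality_term

weights = {"currency": 1.0, "availability": 1.0, "info_to_noise": 1.0}
good = blended_score(
    0.6, {"currency": 0.9, "availability": 1.0, "info_to_noise": 0.8}, weights)
poor = blended_score(
    0.6, {"currency": 0.1, "availability": 0.4, "info_to_noise": 0.2}, weights)
print(good > poor)  # identical similarity; quality breaks the tie
```

Two pages with identical term similarity end up ranked apart, which is the behavior the quality-metric experiments evaluate.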


Lecture Notes in Computer Science | 2007

Personalized search on the world wide web

Alessandro Micarelli; Fabio Gasparetti; Filippo Sciarrone; Susan Gauch

With the exponential growth of the available information on the World Wide Web, a traditional search engine, even if based on sophisticated document indexing algorithms, has difficulty meeting the efficiency and effectiveness performance demanded by users searching for relevant information. Users surfing the Web in search of resources to satisfy their information needs have less and less time and patience to formulate queries, wait for the results and sift through them. Consequently, it is vital in many applications - for example in an e-commerce Web site or in a scientific one - for the search system to find the right information very quickly. Personalized Web environments that build models of short-term and long-term user needs based on user actions, browsed documents or past queries are playing an increasingly crucial role: they form a winning combination, able to satisfy the user better than unpersonalized search engines based on traditional Information Retrieval (IR) techniques. Several important user personalization approaches and techniques developed for the Web search domain are illustrated in this chapter, along with examples of real systems currently being used on the Internet.


ACM Transactions on Information Systems | 1999

A corpus analysis approach for automatic query expansion and its extension to multiple databases

Susan Gauch; Jianying Wang; Satya Mahesh Rachakonda

Searching online text collections can be both rewarding and frustrating. While valuable information can be found, typically many irrelevant documents are also retrieved, while many relevant ones are missed. Terminology mismatches between the user's query and document contents are a main cause of retrieval failures. Expanding a user's query with related words can improve search performance, but finding and using related words is an open problem. This research uses corpus analysis techniques to automatically discover similar words directly from the contents of the databases, which are not tagged with part-of-speech labels. Using these similarities, user queries are automatically expanded, resulting in conceptual retrieval rather than requiring exact word matches between queries and documents. We are able to achieve a 7.6% improvement for TREC 5 queries and up to a 28.5% improvement on the narrow-domain Cystic Fibrosis collection. This work has been extended to multidatabase collections where each subdatabase has a collection-specific similarity matrix associated with it. If the best matrix is selected, substantial search improvements are possible. Various techniques to select the appropriate matrix for a particular query are analyzed, and a 4.8% improvement in the results is validated.
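The expansion step can be sketched as follows, assuming a precomputed term-term similarity matrix; the matrix entries, the top-k cutoff, and the threshold are fabricated for illustration and are not the paper's parameters.

```python
# Sketch: expand each query term with its strongest neighbors from a
# corpus-derived term-term similarity matrix (values here are invented).

def expand_query(query_terms, sim_matrix, k=2, threshold=0.3):
    """Return the query plus up to k related terms per original term."""
    expanded = list(query_terms)
    for term in query_terms:
        neighbors = sim_matrix.get(term, {})
        best = sorted(neighbors.items(), key=lambda x: -x[1])[:k]
        expanded += [w for w, s in best
                     if s >= threshold and w not in expanded]
    return expanded

sim = {"car": {"automobile": 0.8, "vehicle": 0.6, "road": 0.2}}
print(expand_query(["car"], sim))  # adds automobile and vehicle
```

The expanded query now matches documents that use "automobile" instead of "car", which is the terminology-mismatch problem the corpus analysis addresses.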


conference on information and knowledge management | 2000

Personal ontologies for web navigation

Jason Chaffee; Susan Gauch

The publicly indexable Web contains an estimated 800 million pages; however, the largest search engine is estimated to contain only 300 million of these pages. As the number of Internet users and the number of accessible Web pages grows, it is becoming increasingly difficult for users to find documents that are relevant to their particular needs. Often users must browse through a large hierarchy of categories to find the information for which they are looking. To provide the user with the most useful information in the least amount of time, we need a system that uses each user's view of the world for classification. This paper explores a way to use a user's personal arrangement of concepts to navigate the Web. This system is built by using the characterizations for a particular site created by the Ontology Based Informing Web Agent Navigation (OBIWAN) system and mapping from them to the user's personal ontologies. OBIWAN allows users to explore multiple sites via the same browsing hierarchy. This paper extends OBIWAN to allow users to explore multiple sites via their own browsing hierarchy. The mapping of the reference ontology to the personal ontology is shown to have a promising level of correctness and precision.
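One simple way to realize such a mapping, sketched here under assumed data structures, is to pair each reference-ontology category with the personal category whose representative term vector is most similar; the category names and term weights below are invented for illustration, not OBIWAN's actual mapping procedure.

```python
# Sketch: map reference categories to personal categories by cosine
# similarity of their term vectors (all names/weights are assumptions).
import math

def cosine(a, b):
    dot = sum(a.get(t, 0.0) * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def map_ontology(reference, personal):
    """reference/personal: {category: {term: weight}}.
    Returns {reference_category: best_matching_personal_category}."""
    return {
        ref_cat: max(personal, key=lambda p: cosine(vec, personal[p]))
        for ref_cat, vec in reference.items()
    }

reference = {"Sports/Hockey": {"hockey": 1.0, "ice": 0.5}}
personal = {"MyTeams": {"hockey": 0.9, "playoffs": 0.4},
            "Cooking": {"recipe": 1.0}}
print(map_ontology(reference, personal))
```

Once each reference category is paired with a personal one, pages classified under the shared reference hierarchy can be presented under the user's own browsing hierarchy.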


acm international conference on digital libraries | 1996

Vision: a digital video library

Wei Li; Susan Gauch; John M. Gauch; Kok Meng Pua

The goal of the VISION (Video Indexing for Searching Over Networks) project is to establish a comprehensive, online digital video library. We are developing automatic mechanisms to populate the library and provide content-based search and retrieval over computer networks. The salient feature of our approach is the integrated application of mature image and video processing, information retrieval, speech feature extraction and word-spotting technologies for efficient creation and exploration of the library materials. First, full-motion video is captured in real time with flexible qualities to meet the requirements of library patrons connected via a wide range of network bandwidths. Then, the videos are automatically segmented into a number of logically meaningful video clips by our novel two-step algorithm based on video and audio contents. A closed caption decoder and/or word-spotter is being incorporated into the system to extract textual information to index the video clips by their contents. Finally, all information is stored in a full-text information retrieval system for content-based exploration of the library over networks of varying bandwidths.


acm conference on hypertext | 2008

Document similarity based on concept tree distance

Praveen Lakkaraju; Susan Gauch; Mirco Speretta

The Web is quickly moving from the era of search engines to the era of discovery engines. Whereas search engines help you find information you are looking for, discovery engines help you find things that you never knew existed. A common discovery technique is to automatically identify and display objects similar to ones previously viewed by the user. Core to this approach is an accurate method to identify similar documents. In this paper, we present a new approach to identifying similar documents based on a conceptual tree-similarity measure. We represent each document as a concept tree using the concept associations obtained from a classifier. Then, we employ a tree-similarity measure based on a tree edit distance to compute similarities between concept trees. Experiments on documents from the CiteSeer collection showed that our algorithm performed significantly better than document similarity based on the traditional vector space model.
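The core idea can be illustrated with a deliberately naive sketch: a simplified top-down alignment of two concept trees, converted to a similarity in (0, 1]. Full tree edit distance algorithms (e.g. Zhang-Shasha) are more general than this level-by-level alignment, and the trees below are invented examples, not CiteSeer data.

```python
# Naive sketch of concept-tree similarity. A node is (label, [children]);
# cost 1 per relabel/insert/delete, children aligned positionally.

def tree_dist(a, b):
    if a is None and b is None:
        return 0
    if a is None:                      # insert whole subtree b
        return 1 + sum(tree_dist(None, c) for c in b[1])
    if b is None:                      # delete whole subtree a
        return 1 + sum(tree_dist(c, None) for c in a[1])
    cost = 0 if a[0] == b[0] else 1    # relabel cost
    ka, kb = a[1], b[1]
    for ca, cb in zip(ka, kb):
        cost += tree_dist(ca, cb)
    for extra in ka[len(kb):]:
        cost += tree_dist(extra, None)
    for extra in kb[len(ka):]:
        cost += tree_dist(None, extra)
    return cost

def similarity(a, b):
    return 1.0 / (1.0 + tree_dist(a, b))

t1 = ("AI", [("search", []), ("learning", [])])
t2 = ("AI", [("search", []), ("planning", [])])
t3 = ("Biology", [("genetics", [])])
print(similarity(t1, t2) > similarity(t1, t3))  # True
```

Documents whose concept trees share structure as well as labels score higher than ones that merely share a few terms, which is what distinguishes this measure from a flat vector-space comparison.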


intelligence and security informatics | 2004

ChatTrack: Chat Room Topic Detection Using Classification

Jason Bengel; Susan Gauch; Eera Mittur; Rajan Vijayaraghavan

The traditional analysis of Internet chat room discussions places a resource burden on the intelligence community because of the time required to monitor thousands of continuous chat sessions. Chat rooms are used to discuss virtually any subject, including computer hacking and bomb making, creating a virtual sanctuary for criminals to collaborate. Given the upsurge of interest in homeland security issues, we have developed a text classification system that creates a concept-based profile that represents a summary of the topics discussed in a chat room or by an individual participant. We then discuss this basic chat profiling system and demonstrate the ability to selectively augment the standard concept database with new concepts of significance to an agent. Finally, we show how an investigator can, once alerted to a user or session of interest via the profile, retrieve details about the chat session through our chat archiving and search system.
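A minimal sketch of the profiling step: score each message against per-concept vocabularies and accumulate a concept-based profile for the session. The concept names and word lists here are invented stand-ins; the actual system classifies against a reference concept database.

```python
# Sketch: concept-based profile of a chat session (concepts and
# vocabularies below are illustrative assumptions).
from collections import Counter

CONCEPTS = {
    "hacking": {"exploit", "password", "rootkit"},
    "sports": {"game", "score", "team"},
}

def profile_session(messages):
    """Count concept-vocabulary hits per message; return the profile."""
    profile = Counter()
    for msg in messages:
        words = set(msg.lower().split())
        for concept, vocab in CONCEPTS.items():
            profile[concept] += len(words & vocab)
    return profile

chat = ["that rootkit exploit needs a password",
        "the team won the game last night"]
p = profile_session(chat)
print(p.most_common(1)[0][0])  # dominant topic of the session
```

An investigator alerted by a profile dominated by a sensitive concept could then pull the underlying messages from the chat archive for review.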


Information Processing and Management | 1999

Real time video scene detection and classification

John M. Gauch; Susan Gauch; Sylvain Bouix; Xiaolan Zhu

The VISION (video indexing for searching over networks) digital video library system has been developed in our laboratory as a testbed for evaluating automatic and comprehensive mechanisms for video archive creation and content-based search, filtering and retrieval of video over local and wide area networks. In order to provide access to video footage within seconds of broadcast, we have developed a new pipelined digital video processing architecture which is capable of digitizing, processing, indexing and compressing video in real time on an inexpensive general purpose computer. These videos were automatically partitioned into short scenes using video, audio and closed-caption information. The resulting scenes are indexed based on their captions and stored in a multimedia database. A client-server-based graphical user interface was developed to enable users to remotely search this archive and view selected video segments over networks of different bandwidths. Additionally, VISION classifies the incoming videos with respect to a taxonomy of categories and will selectively send users videos which match their individual profiles.
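The partitioning step can be sketched as a shot-boundary detector: flag a new scene when consecutive frame histograms differ by more than a threshold. The toy histograms and threshold below are assumptions for illustration; VISION's actual two-step algorithm also uses audio and closed-caption cues.

```python
# Sketch: histogram-difference shot-boundary detection
# (frames are toy grey-level histograms, not real video).

def scene_boundaries(histograms, threshold=0.5):
    """histograms: list of equal-length count lists.
    Returns indices where a new scene appears to start."""
    boundaries = []
    for i in range(1, len(histograms)):
        prev, cur = histograms[i - 1], histograms[i]
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / sum(prev)
        if diff > threshold:
            boundaries.append(i)
    return boundaries

frames = [[8, 1, 1], [8, 1, 1], [1, 1, 8], [1, 1, 8]]
print(scene_boundaries(frames))  # [2]
```

Because each frame is compared only to its predecessor, this runs in a single streaming pass, which is what makes real-time segmentation on modest hardware plausible.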

Collaboration


Dive into Susan Gauch's collaborations.

Top Co-Authors
Joshua Eno

University of Arkansas


John B. Smith

University of North Carolina at Chapel Hill


Qiang Wang

University of Arkansas
