Lee A. Hollaar
University of Utah
Publication
Featured research published by Lee A. Hollaar.
ACM Transactions on Database Systems | 1983
Roger L. Haskin; Lee A. Hollaar
The design and operation of a new class of hardware-based pattern matchers, such as would be used in a backended database processor in a full-text or other retrieval system, is presented. This recognizer is based on a unique implementation technique for finite state automata consisting of partitioning the state table among a number of simple digital machines. It avoids the problems generally associated with implementing finite state machines, such as large state table memories, complex control mechanisms, and state encodings. Because it consists primarily of memory, with its high regularity and density, needs only limited static interconnections, and operates at a relatively low speed, it can be easily constructed using integrated circuit techniques. After a brief discussion of other pattern-matching hardware, the structure and operation of the partitioned finite state automaton is given, along with a simplified discussion of how the state tables are partitioned. The expected performance of the resulting system and the state table partitioning programs is then discussed.
international acm sigir conference on research and development in information retrieval | 1983
Lee A. Hollaar
As databases become very large, conventional digital computers cannot provide satisfactory response time. This is particularly true for text databases, which must often be several orders of magnitude larger than formatted databases to store a useful amount of information. Even the standard techniques for improving system performance (such as inverted files) may not be sufficient to give the desired performance, and the use of an unconventional hardware organization may become necessary. A variety of different organizations has been proposed to enhance processing of text retrieval operations. Most of these have concentrated on the design of fast, efficient search engines. These can be divided into three classes: associative memories, cellular pattern matchers, and finite state automata. The advantages and disadvantages inherent in each of these approaches are discussed, along with a number of proposed implementations. Finally, the text retrieval system under development at the University of Utah is discussed in more detail.
ACM Computing Surveys | 1996
Ellen Riloff; Lee A. Hollaar
The goal of a traditional information retrieval (IR) system is to search an information repository, such as a text database, and retrieve documents that are potentially relevant to a query. Since query-based IR systems must operate in real time, they must be able to search large volumes of text quickly and efficiently. Other information-retrieval applications, such as text categorization, text routing, and text filtering, are also becoming increasingly important. These applications are generally concerned with long-term information needs, where a topic is expected to be of interest for an extended period of time. Text categorization systems assign predefined category labels to texts. For example, a text categorization system for computer science might use categories such as operating systems, programming languages, artificial intelligence, or information retrieval. Text routing systems typically accept a set of user profiles and automatically classify texts so that relevant texts can be routed to appropriate users [Harman 1994]. Text filtering systems accept a list of topics that are, or are not, of interest and allow only texts that satisfy the filter to pass through to the user [Belkin and Croft 1992]. Text categorization systems are typically applied to static databases, while text routing and text filtering systems are usually applied to incoming data streams. Information-retrieval systems must grapple with all of the ambiguities and idiosyncrasies inherent in natural language, such as synonymy (e.g., “start”, “begin”, and “initiate” have essentially the same meaning) and polysemy (e.g., “shot” has many different meanings, including the act of shooting, an injection, a quantity of liquor, a photograph, pellets, or an attempt). Phrases also require special attention because multiword expressions often have a composite meaning different from the individual words. 
For example, a “hot dog” does not usually refer to a warm canine, and an “operating system” does not usually refer to a system that is simply operating. Most information-retrieval systems preprocess a document collection into an inverted file that allows the system to determine quickly which words appear in each document. Stopword lists are commonly used to remove highly frequent words, such as “the” and “of,” under the assumption that they don’t contribute much to the meaning of a text. Stemming algorithms are sometimes used to reduce a word to its root form so that different morphological variations will match [Frakes and Baeza-Yates 1992]. An alternative text-representation scheme uses superimposed codewords to produce a fixed-length vector from the binary representations of words. The fixed-length vector is especially useful for parallel and hardware systems, but this method can sometimes hallucinate words that don’t actually appear in the original document. Traditional information-retrieval methods retrieve documents by searching for relevant words or phrases. Most commercial IR systems allow the user to define a query using keywords and standard Boolean operators. These systems retrieve documents that precisely match the query. The vector-space model [Salton
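The inverted-file and stopword preprocessing described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the stopword list, the sample documents, and the function names `build_inverted_file` and `boolean_and` are all hypothetical, and the query step shows only a simple Boolean AND over single words (no stemming, no phrase handling).

```python
from collections import defaultdict

# Hypothetical stopword list; real systems use much longer ones.
STOPWORDS = {"the", "of", "a", "an", "and", "is", "to"}


def build_inverted_file(docs):
    """Map each non-stopword term to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            word = token.strip(".,;:!?\"'()")
            if word and word not in STOPWORDS:
                index[word].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, *terms):
    """Return ids of documents that contain every query term."""
    postings = [set(index.get(t.lower(), ())) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical three-document collection.
docs = {
    1: "The design of an operating system.",
    2: "A hot dog is not a warm canine.",
    3: "Operating a retrieval system.",
}
index = build_inverted_file(docs)
both = boolean_and(index, "operating", "system")
```

Because the index stores single words, the query for "operating" AND "system" matches any document containing both words, whether or not they form the phrase "operating system". That is the phrase problem the abstract points out.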
design automation conference | 1984
Lee A. Hollaar; Brent E. Nelson; Tony M. Carter; Raymond A. Lorie
An important use for a database management system is in the storage and handling of information for engineering design, particularly integrated circuit design. However, most discussions on this topic have concentrated on the layout of shapes necessary to form the various circuit elements, or connections between user-defined cells. Equally important, but often disregarded, is the necessity to support other design tools in addition to graphics for circuit layout. These include simulators and automatic layout programs that take a description of a circuit at one level and convert it to a lower level. In addition, if cells are part of a library defined and maintained by others, operations must be included to handle the maintenance of generations or versions of a cell design. These aspects of a database management system for engineering design are discussed in light of the tools being developed at the University of Utah and an extended version of System R, developed at the IBM San Jose Research Laboratory. The Utah approach emphasizes the use of previously designed and tested cells, with interconnects at fixed locations, placed on a grid. Because it is unlikely that the designers of circuits designed all (or any) of the cells used in their circuits, special database management operations are necessary to assure that a consistent, working circuit results.
hawaii international conference on system sciences | 1992
Lee A. Hollaar
The Utah Retrieval System Architecture (URSA) was initially developed in 1981 as an alternative to central-processor-based information retrieval systems. It combined distributed processing and a windowed user interface with a hardware-based search server, used together with document surrogates such as partially-inverted files. The authors have now started the development and testing of a medium-scale (about 10 gigabyte) parallel backend search server to demonstrate its operation and to gather data on the use of such a backend processor in actual operation, including information about query complexity and arrival rates. This searcher, based on a hardware-augmented RISC processor, builds on their experience developing and operating the custom VLSI FSA-based search engine. The use of a programmable processor allows the easy implementation of complex search patterns, such as numeric range matching, while the special hardware augmentation provides considerably better performance than would be available from a standard RISC processor server.
IWDM | 1985
Lee A. Hollaar
The Utah Text Search Engine is a special-purpose backend processor capable of scanning serial data from a mass storage device for occurrences of complex patterns. It is based on a new form of finite state automaton, the partitioned FSA, which is well suited for VLSI implementation. Custom integrated circuits have been developed and tested, and an initial prototype configuration has been used to replace the software-based search module in the prototype retrieval system running on the University of Utah Computer Science Department Apollo workstation network. A new prototype is currently under development. The basic structure of the search engine, the implementation of the two prototypes, and plans for future extensions and improvements are discussed.
international acm sigir conference on research and development in information retrieval | 1985
Lee A. Hollaar
The Utah Retrieval System Architecture provides an excellent testbed for the development and testing of new algorithms or techniques for information retrieval. URSA™ is a message-based structure capable of running on a variety of system configurations, ranging from a single mainframe processor to a system distributed across a number of dissimilar processors. It can readily support a variety of specialized backend processors, such as high-speed search engines. The architecture divides the components of a text retrieval system into two classes: servers and clients. A triple of servers (index, search, and document access) for each database provide the capabilities normally associated with a retrieval system. Possible clients for these servers include a window-based user interface, whose query language can be easily modified, a connection to a mainframe host processor, or AI-based query modification programs that wish to use the database. Any module in the system can be replaced by a new module using a different algorithm as long as the new module complies with the message formats for that function. In fact, with some care this module switch can occur while the system is running, without affecting the users. A monitor program collects statistics on all system messages, giving information regarding query complexity, processing time for each module, queueing times, and bandwidths between every module. This paper discusses the background of URSA and its structure, with particular emphasis on the features that make it a good testbed for information retrieval techniques.
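The idea that a module can be replaced as long as it honors the message format can be sketched roughly as follows. This is an illustrative Python sketch only, not URSA's actual message format or code: `SearchRequest`, `SearchReply`, `Router`, and both searcher classes are hypothetical names, and the message counter is a crude stand-in for the monitor program's statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchRequest:
    """Hypothetical request message sent to a search server."""
    query: str


@dataclass(frozen=True)
class SearchReply:
    """Hypothetical reply message listing matching document ids."""
    doc_ids: tuple


class LinearScanSearcher:
    """Search server that scans every document for the query substring."""

    def __init__(self, docs):
        self.docs = docs

    def handle(self, msg):
        hits = sorted(i for i, text in self.docs.items() if msg.query in text.lower())
        return SearchReply(tuple(hits))


class WordSetSearcher:
    """A different algorithm behind the same message format (whole-word lookup)."""

    def __init__(self, docs):
        self.words = {i: set(text.lower().split()) for i, text in docs.items()}

    def handle(self, msg):
        hits = sorted(i for i, words in self.words.items() if msg.query in words)
        return SearchReply(tuple(hits))


class Router:
    """Delivers messages to whichever server is currently registered for a role."""

    def __init__(self):
        self.servers = {}
        self.message_count = 0  # stand-in for the monitor's statistics

    def register(self, role, server):
        self.servers[role] = server

    def send(self, role, msg):
        self.message_count += 1
        return self.servers[role].handle(msg)


docs = {1: "text retrieval system", 2: "circuit design database"}
router = Router()
router.register("search", LinearScanSearcher(docs))
first = router.send("search", SearchRequest("retrieval"))

# Swap the search module while the router keeps running; clients are unchanged.
router.register("search", WordSetSearcher(docs))
second = router.send("search", SearchRequest("retrieval"))
```

The client code that calls `router.send` is identical before and after the swap, which is the property the abstract attributes to complying with the message formats.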
acm sigops european workshop | 1986
Lee A. Hollaar
The Utah Retrieval System Architecture project is an ongoing research program at the University of Utah to demonstrate a new overall system architecture for information retrieval and handling. While systems for searching large full-text or bibliographic databases have been available for well over a decade (for example, LEXIS and Westlaw for legal material), almost without exception these systems reflect the computer technology of the late 60s. They are organized around a large mainframe computer, with a number of terminals available for users. Video displays emulate hardcopy terminals, with commands being entered at the bottom of the screen and information scrolling off the top, to be lost forever. The system commands tend to be rather cryptic, especially those for altering a past query so that it can be reissued. In contrast, the URSA user interface consists of a single powerful search command, allowing the specification of a very complex search expression, coupled with a multiple-window display with an editor function present in each window. Because text can be moved freely between the windows, no special commands are necessary for modifying and reissuing a past query, saving results or queries, or including retrieved information into a new report or document. Unlike the past monolithic, mainframe-based systems, which were generally based on some key algorithm (such as a novel search or index technique), URSA is based on a small number of well-defined functions which may be implemented using a number of strategies. This architecture is well suited for a distributed environment, as well as a centralized processor.
Information passed to and from a module consists of a series of messages, using a means appropriate to the particular implementation (subroutine calls or an interprocess communications facility for a single-processor implementation, shared memory for a tightly-coupled multiprocessor version, or network communications for a loosely-coupled system). The same modules could be used in a variety of configurations, with only the underlying communications routines being changed. In the current distributed configuration, all system communications are handled by a portable, network-transparent communications system (NTCS). It is built on top of existing virtual circuit systems, such as the Apollo MBX system or TCP. The NTCS consists of a module which is bound to each application module, and has three layers: a network-dependent layer, which drives the particular virtual circuit system; a multinet layer, supporting multiple physical circuits through peer gateways; and a logical connection maintenance layer, …
acm annual conference on range of computing | 1985
Lee A. Hollaar
The Utah Retrieval System Architecture is a distributed system organization supporting a wide range of information retrieval system requirements. It was originally developed to serve as a testbed for the evaluation of different system features and algorithms, and to provide a contemporary retrieval system with a multi-window user interface, in part to demonstrate a specialized hardware-based backend search engine. The actual retrieval process is supported by a number of backend server processes, each providing a basic function (such as index lookup, text searching, or fetching and formatting of indicated documents), connected to user workstations by a communications network. A resource allocation protocol takes requests for particular server functions and establishes the appropriate linkages. A portable network-transparent communications system, built on top of existing virtual circuit systems, supports a variety of communications styles, including both remote procedure calls and asynchronous communications for the user processes. The primary user interface is through multiple windows. Queries are entered in a window corresponding to a database of interest, and results are returned in a separate window corresponding to the query. Other windows can be used for word processing, electronic mail, and other applications programs. Information can be freely moved between the windows, providing a simple means of capturing retrieved information and eliminating the need for many of the commands of a conventional retrieval system.
ACM Sigarch Computer Architecture News | 1983
Lee A. Hollaar
This book is unique. It is not a basic logic design textbook, teaching how to minimize circuits or build simple clocked machines, nor is it a discussion of computer architecture. Both topics are included. It is about the design of non-trivial systems, using a sixteen-bit processor as the example. In examining various implementations of this processor (hardwired, microprogrammed, and bit-sliced), the reader gains an understanding of how a real system is implemented, and why certain design decisions are made. The concepts are readily transferable to other, more common design projects (how often does one design a computer?), such as interfaces and controllers. It is assumed that the reader has already had an introduction to logical design and boolean switching theory. Appendix C provides a good review of the area and presents the author's style of logic design. The minimization of functions is purposely downplayed, while the more important concepts of designing using buses and MSI parts like selectors, ROMs, and PLAs are emphasized. Appendix B discusses the physical implementation of switching circuits as well as loading and propagation delays. Appendix D provides a review of sequential circuits, emphasizing the timing considerations that can be the root of many sequential circuit problems when ignored by the designer. After a brief introduction to the design process, the author discusses the considerations in the design of a computer's instruction set architecture: memory organization, data types, addressing modes, flow of control and data manipulation instructions, and input/output operations. Included are a number of examples drawn from actual machines like the PDP-8 and PDP-11. Then a simplified sixteen-bit machine, the TM-16, is introduced. The steps in its design and implementation are discussed, including the development of the data flow paths and the control section.
After a discussion of memory system design, including the extra logic needed for a dynamic memory and a discussion of cache memory architecture, a more complex sixteen-bit processor, the SC-16, is presented. Its implementation as a hardwired processor is detailed, using techniques similar to those previously used for the TM-16. Then two alternative implementations are presented, one using a microprogrammed control unit and the other based on bit-sliced chips (in particular, the Am2903). In Appendix A, the author presents a very complete course outline, including a design project of a processor similar to those presented in the text. Even the criteria for evaluating and grading the project is …