David Lefkovitz
University of Pennsylvania
Publication
Featured research published by David Lefkovitz.
Archive | 1969
David Lefkovitz
An information system is a very complex and sophisticated communication system, but, unlike a telephone system, its major complexity lies not in the switching of lines among various users, but rather in the provision of file structures as a central medium of communication through which relatively complicated data processing takes place. Another difference is that in a telephone system, the several users approach and use the system in essentially the same mode of operation. That is, they are usually transmitters as well as receptors of information. Furthermore, processing of the information that they transmit or receive is usually considered to be a detractive or undesirable property of the communication system, if it in any way modifies the information per se, its meaning or content. The information system, on the other hand, generally has two different classes of users, each of which approaches the system in a different way and for a different purpose, although certain individuals may belong to both classes. Figure 1 presents the model of an information system, and these two classes of people are there labeled Generator and User. The medium of communication, as indicated within the ovals in the center of the diagram, is the files. The Generators of information transmit to the files via depositions of various types that may generically be called Documents.
Proceedings of the 1976 annual conference on | 1976
Donald B. Crouch; David Lefkovitz; Julius T. Tou; Chung-Shu Yang
This SIGIR tutorial session on retrieval systems presents two experts in their field discussing two major types of retrieval systems. Gerard Salton will present his view of the automated information storage and retrieval system, i.e., a system which is designed to retrieve documents or document surrogates in response to the specific request of a user. He will emphasize file organization, content analysis and indexing, search strategies, and retrieval processes. On the other side of the retrieval fence, Ed Sibley will discuss Data Base Management Systems, with an emphasis on the data structures and interfaces utilized. Terminology, techniques, and problem areas will be discussed with the aim of (1) introducing newcomers to the subject areas, (2) giving the more experienced practitioners an up-to-date look at current technology, and (3) giving users and designers of each type of system insight into the types of techniques which are utilized in both familiar and unfamiliar situations, in order that they may be aided in solving their own retrieval problems.
Archive | 1969
David Lefkovitz
The query language reflects all of the system functional requirements discussed in the preceding chapter, because it is the interface between the user of the file system and the automated search and storage components of the system. Hence, all of the design capabilities that are built into the file system by virtue of its structure must be made available to the user through the medium of the query language. It is important to realize that most users of an information system will probably be aware of the information structure of the files (i.e., whether it is hierarchic, whether it is associative, etc.), but they may be disinterested in the file organization (i.e., whether the lists are threaded or inverted, or whatever other means of file partitioning are employed). Furthermore, command of the greater capabilities that are designed into the system, by virtue of sophisticated DASDs, consoles, communications, processors, and advanced techniques, can only be exercised by the nonprogrammer user through a properly designed, task-oriented query language. This represents a considerable design challenge, because the nonprogrammer user has certain traditional thought processes regarding file handling, which are not, in general, compatible with the way in which the computer has organized its files internally.
Archive | 1969
David Lefkovitz
The principal reason for on-line, real-time update of a file is to serve the operational needs of those systems in which the Update transaction must be usable in the file within a very short time from its declaration. Cost comparisons between on-line and batched updating are deceptive because of the manner in which one may choose to account for such fixed costs as the DASD, terminals, and software. Therefore, the decision should be qualitatively based upon a requirement for timely updates, unless the hardware and software accounting is so clear-cut as to enable a cost comparison. For example, if the software were supplied with the equipment, and all equipment were charged strictly on a time, type, and capacity basis (i.e., broken down by equipment type and quantity of direct access storage), then one could readily compare costs. Other than in this kind of clear-cut situation, the difficulty arises in apportioning the share of hardware cost to be carried by the update functions, because, notwithstanding on-line update, the same DASD and terminal costs would have to be sustained in support of retrieval functions. Software design and implementation costs, however, would be higher, and these should be accounted for. A single update typically takes from a few tens to a few hundreds of milliseconds of processing time, depending on equipment type.
Archive | 1969
David Lefkovitz
The discussions of the subsequent chapters in this book require that certain definitions be made at this point, and that a framework be established for the effective presentation of design and programming concepts. This chapter is, therefore, devoted to certain definitions, underlying concepts, and the statement of functional requirements that are placed upon the design of files for on-line systems by virtue of their incorporation into the kind of system environment discussed in Chapter I.
Archive | 1969
David Lefkovitz
The direct access storage device, or DASD, as it will be referred to, is a generic name for peripheral computer storages that have the approximate data access characteristic shown in Fig. 7. This characteristic is also compared in the figure with those corresponding to magnetic tape and core. No time scale is implied by this diagram, but, as drawn, it would be roughly logarithmic. The access time of core memories is independent of the addressing distance between accessions, and is on the order of 1 to 10 microseconds. The access time of magnetic tape is very nearly linear over a single reel of tape, which spans an addressing distance corresponding to approximately 15 million bytes, or 3 to 4 million words. The typical slope of this line is around 10^-5 sec per byte for a 90 KB/S tape drive. The DASD, in contrast, has linear portions of its characteristic, which are separated by vertical jumps, where the number and height of these jumps are determined by the mechanical construction of the device. In fact, the DASDs can be further classified according to the number and height of such breaks in their characteristic, as shown in Table 3. It is also necessary, in the case of the DASD, to redefine the notion of distance between addresses, as will be shown after the discussion of mechanical construction.
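The three access characteristics described in this abstract can be sketched roughly in Python. This is only an illustration under stated assumptions: the function names are hypothetical, "90 KB/S" is read as 90,000 bytes per second, the core figure is one arbitrary value inside the quoted 1 to 10 microsecond range, and the DASD is reduced to a single class of equally spaced jumps whose size and spacing are placeholders, not values from Fig. 7 or Table 3.

```python
"""Rough access-time models for core, magnetic tape, and a DASD (illustrative only)."""

TAPE_RATE_BYTES_PER_SEC = 90_000  # "90 KB/S" tape drive, assuming K = 1000
REEL_BYTES = 15_000_000           # approximate span of one reel, from the text
CORE_ACCESS_SEC = 5e-6            # assumed value within the 1 to 10 microsecond range


def core_access_time(distance_bytes: int) -> float:
    """Core: access time does not depend on the addressing distance."""
    return CORE_ACCESS_SEC


def tape_slope_sec_per_byte(rate: float = TAPE_RATE_BYTES_PER_SEC) -> float:
    """Slope of the tape characteristic: seconds needed to pass one byte."""
    return 1.0 / rate


def tape_access_time(distance_bytes: int,
                     rate: float = TAPE_RATE_BYTES_PER_SEC) -> float:
    """Tape: access time grows linearly with the addressing distance."""
    return distance_bytes * tape_slope_sec_per_byte(rate)


def dasd_access_time(distance_bytes: int, segment_bytes: int,
                     slope: float, jump: float) -> float:
    """DASD (simplified): linear portions separated by vertical jumps.

    Assumption: one fixed-height jump is paid each time the distance crosses
    a multiple of segment_bytes (for example, a mechanical repositioning).
    """
    jumps = distance_bytes // segment_bytes
    return slope * distance_bytes + jump * jumps
```

With these figures the tape slope is about 1.1 x 10^-5 sec per byte, consistent with the value quoted in the abstract, and passing a full 15 million byte reel takes roughly 167 seconds.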
Archive | 1969
David Lefkovitz
In Chapter V the two-step retrieval process of Decoding and File search was described, and it was indicated that these two processes were relatively independent. As a further demonstration of this independence, Chapter VI presented detailed descriptions of these decoders with minimal reference to or contingencies upon the data file organization. Only in one case, the randomizer associated with Fig. 33, was there a restriction placed on the file structure design. Thus, the designer is relatively free to select a decoder, based almost entirely upon the criteria outlined in the last chapter. Similarly, the selection of a data file structure is independent of the decoder.
Archive | 1969
David Lefkovitz
In Chapter I an overview of the information system was presented, and in Chapter III the functional requirements it imposes upon the file structure were discussed. Then three definitions were introduced: information structure, file structure (organization), and data structure. The first was said to be outside the purview of the automated system designer, since it was a relatively fixed property of the data files presented to him. The second and third are of direct interest to the designer, but the third is relatively trivial as compared with the second because it primarily concerns record formats and list structuring controls. The second, file structure, is of primary interest to the designer because it is here that he makes decisions regarding the file partitioning techniques, the type of directory construction, file and directory maintenance techniques, and executive system and query processor interactions. Chapters VI, VII, and VIII are devoted exclusively to these subjects, and in order to assemble the various concepts and techniques into a coherent body of useable design information, a simple classification is to be made in this chapter of the techniques for file structuring. This systematization is also helpful when comparing techniques and designating trade-offs. The first step is to isolate the file structuring problem from file manipulation programs, and both of these from executive and query processing functions.
Archive | 1969
David Lefkovitz
Four of the decoding techniques shown in Fig. 27 are to be described and analyzed. These are (1) the truncated fixed length key-word tree, (2) the unique truncation variable length key-word tree, (3) the complete variable length key-word tree, and (4) the randomizer. For convenience these descriptions will be shortened to the fixed tree, truncated variable tree, variable tree, and randomized methods, respectively. The procedure followed will be to describe and illustrate the methods of each, then to formulate expressions for retrieval time and storage requirements, and finally to compare them with respect to programming complexity, decoding speed, and memory requirement.
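To make the contrast between a tree decoder and a randomizer concrete, the following is a loose Python sketch of two of the four methods: a key-word tree over keys truncated to a fixed length, and a randomizer read here as a hash into buckets. It is not the book's design. The class names, the use of nested dictionaries for tree nodes, the CRC-32 hash, and the chaining of colliding keys are all assumptions made for illustration; each decoder simply maps a key word to a hypothetical list address in the data file.

```python
import zlib

_ENTRIES = ""  # marker under which a leaf stores its (full key, address) pairs


class FixedTree:
    """Key-word tree over keys truncated (or padded) to a fixed width."""

    def __init__(self, width: int):
        self.width = width
        self.root = {}

    def _truncate(self, key: str) -> str:
        return key[:self.width].ljust(self.width)

    def insert(self, key: str, address: int) -> None:
        node = self.root
        for ch in self._truncate(key):
            node = node.setdefault(ch, {})
        # Keys sharing a truncated prefix land on the same leaf, so the full
        # key is kept alongside the address to tell them apart.
        node.setdefault(_ENTRIES, []).append((key, address))

    def decode(self, key: str):
        node = self.root
        for ch in self._truncate(key):
            node = node.get(ch)
            if node is None:
                return None
        for full_key, address in node.get(_ENTRIES, []):
            if full_key == key:
                return address
        return None


class Randomizer:
    """Hash ("randomize") the key to a bucket; collisions are chained."""

    def __init__(self, n_buckets: int):
        self.buckets = [[] for _ in range(n_buckets)]

    def _slot(self, key: str) -> int:
        return zlib.crc32(key.encode("utf-8")) % len(self.buckets)

    def insert(self, key: str, address: int) -> None:
        self.buckets[self._slot(key)].append((key, address))

    def decode(self, key: str):
        for full_key, address in self.buckets[self._slot(key)]:
            if full_key == key:
                return address
        return None


if __name__ == "__main__":
    tree, rand = FixedTree(width=4), Randomizer(n_buckets=8)
    for decoder in (tree, rand):
        decoder.insert("COMPUTER", 100)
        decoder.insert("COMPUTATION", 200)
    print(tree.decode("COMPUTATION"), rand.decode("COMPUTATION"))
```

In this sketch the tree walks one node per character up to the truncation width, while the randomizer computes a bucket directly; both then compare full keys to resolve entries that share a leaf or a bucket. That trade-off between traversal steps, storage, and collision handling is the kind of comparison the chapter sets out to quantify.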
IEEE Spectrum | 1964
David Lefkovitz
Automation and the Library of Congress — Library of Congress, Washington, D.C., 1964; 88 pages, illus.