Clement T. Yu | Researchain

Publication

Featured researches published by Clement T. Yu.

Journal of the Association for Information Science and Technology | 1974

A Theory of Term Importance in Automatic Text Analysis

Gerard Salton; Chung-Shu Yang; Clement T. Yu

Most existing automatic content analysis and indexing techniques are based on word frequency characteristics applied largely in an ad hoc manner. Contradictory requirements arise in this connection, in that terms exhibiting high occurrence frequencies in individual documents are often useful for high recall performance (to retrieve many relevant items), whereas terms with low frequency in the whole collection are useful for high precision (to reject nonrelevant items).
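
The tension between within-document frequency and collection-wide frequency can be illustrated with a small sketch. It combines the two signals as a tf x idf product, which is a widely used formulation and is not claimed to be the measure developed in the paper. The function names and the toy documents are assumptions made for illustration only.

```python
import math
from collections import Counter


def term_frequencies(doc_tokens):
    """Raw occurrence counts of each term inside one document."""
    return Counter(doc_tokens)


def inverse_document_frequency(docs):
    """log(N / df) per term, where df is the number of documents containing the term."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tf_idf(docs):
    """Weight each term by (frequency in document) * (rarity in collection)."""
    idf = inverse_document_frequency(docs)
    return [
        {term: tf * idf[term] for term, tf in term_frequencies(tokens).items()}
        for tokens in docs
    ]


# Hypothetical toy collection of tokenized documents.
docs = [
    ["retrieval", "retrieval", "index", "term"],
    ["term", "weight", "index"],
    ["term", "cluster"],
]
weights = tf_idf(docs)
```

In this toy collection, "term" occurs in every document and therefore receives a weight of zero, while "retrieval" is both frequent in the first document and rare in the collection, so it receives the largest weight there.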

Journal of the ACM | 1976

Precision Weighting—An Effective Automatic Indexing Method

Clement T. Yu; Gerard Salton

A great many automatic indexing methods have been implemented and evaluated over the last few years, and automatic procedures comparable in effectiveness to conventional manual ones are now easy to generate. Two drawbacks of the available automatic indexing methods are the absence of reliable linguistic inputs during the indexing process and the lack of formal, analytical proofs concerning the effectiveness of the proposed methods. The precision weighting procedure described in the present study uses relevance criteria to weight the terms occurring in user queries as a function of the balance between relevant and nonrelevant documents in which these terms occur; this approximates a semantic know-how of term importance. Formal mathematical proofs are given under well-defined conditions of the effectiveness of the method.

Journal of the ACM | 1976

A Statistical Model for Relevance Feedback in Information Retrieval

Clement T. Yu; W. S. Luk; T. Y. Cheung

A statistical model is presented for the investigation of a practical method used in relevance feedback. A necessary and sufficient condition for the two parameters used in this method to define a better query than the original query is given. A region in the plane of the parameters is shown to satisfy the sufficient condition. While the points for producing optimal queries are not exactly located, they are shown to be lying on a finite portion of a hyperbola. Experimental results support some of the theoretical findings.
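
The abstract does not name the feedback method, so the following is only a sketch under an assumption: a Rocchio-style update in which the original query is moved toward the relevant documents by a parameter `alpha` and away from the nonrelevant ones by a parameter `beta`. The function name, the dictionary-based vectors, and the sample values are all hypothetical.

```python
def feedback_query(query, relevant, nonrelevant, alpha, beta):
    """Return query + alpha * sum(relevant) - beta * sum(nonrelevant).

    Vectors are dicts mapping a term to its weight. This two-parameter
    form is an assumed, Rocchio-style update, not the paper's own model.
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms.update(doc)

    new_query = {}
    for term in terms:
        weight = query.get(term, 0.0)
        weight += alpha * sum(doc.get(term, 0.0) for doc in relevant)
        weight -= beta * sum(doc.get(term, 0.0) for doc in nonrelevant)
        new_query[term] = weight
    return new_query


# Hypothetical example: one relevant and one nonrelevant document.
updated = feedback_query(
    query={"index": 1.0},
    relevant=[{"index": 1.0, "term": 1.0}],
    nonrelevant=[{"cluster": 1.0}],
    alpha=0.5,
    beta=0.25,
)
```

Each choice of `(alpha, beta)` is one point in the parameter plane discussed in the abstract; whether the resulting query is better than the original depends on the condition derived in the paper, which this sketch does not test.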

International Conference on Management of Data | 1977

A study on the protection of statistical data bases

Clement T. Yu; Francis Y. L. Chin

We study a number of protection schemes with respect to their effectiveness in providing security for statistical data bases, their feasibility and their ease of implementation. A new method is proposed, and two implementations presented. One implementation guarantees perfect protection against leakage of information about individuals; the other requires very little implementation effort, but has a small probability of leakage.

ACM Transactions on Database Systems | 1978

On the estimation of the number of desired records with respect to a given query

Clement T. Yu; W. S. Luk; M. K. Siu

The importance of the estimation of the number of desired records for a given query is outlined. Two algorithms for the estimation in the “closest neighbors problem” are presented. The numbers of operations of the algorithms are O(ml²) and O(ml), where m is the number of clusters and l is the “length” of the query.

Communications of the ACM | 1977

Effective information retrieval using term accuracy

Clement T. Yu; Gerard Salton

The performance of information retrieval systems can be evaluated in a number of different ways. Much of the published evaluation work is based on measuring the retrieval performance of an average user query. Unfortunately, formal proofs are difficult to construct for the average case. In the present study, retrieval evaluation is based on optimizing the performance of a specific user query. The concept of query term accuracy is introduced as the probability of occurrence of a query term in the documents relevant to that query. By relating term accuracy to the frequency of occurrence of the term in the documents of a collection it is possible to give formal proofs of the effectiveness with respect to a given user query of a number of automatic indexing systems that have been used successfully in experimental situations. Among these are inverse document frequency weighting, thesaurus construction, and phrase generation.

Information Processing and Management | 1976

Automatic Indexing Using Term Discrimination and Term Precision Measurements

Gerard Salton; A. Wong; Clement T. Yu

A variety of abstract automatic indexing models have been developed in recent times in an effort to produce indexing methods that are both effective and usable in practice. Among these are the term discrimination model and the term precision system. These two indexing systems are briefly described and experimental evidence is cited showing that a combination of both theories produces better retrieval performance than either one alone. Appropriate conclusions are reached concerning viable automatic indexing procedures usable in practice.

ACM Transactions on Database Systems | 1979

Experiments on the determination of the relationships between terms

Vijay V. Raghavan; Clement T. Yu

The retrieval effectiveness of an automatic method that uses relevance judgments for the determination of positive as well as negative relationships between terms is evaluated. The term relationships are incorporated into the retrieval process by using a generalized similarity function that has a term match component, a positive term relationship component, and a negative term relationship component. Two strategies, query partitioning and query clustering, for the evaluation of the effectiveness of the term relationships are investigated. The latter appears to be more attractive from linguistic as well as economic points of view. The positive and the negative relationships are verified to be effective both when used individually, and in combination. The importance attached to the term relationship components relative to that of term match component is found to have a substantial effect on the retrieval performance. The usefulness of discriminant analysis as a technique for determining the relative importance of these components is investigated.

Information Processing Letters | 1976

On the complexity of finding the set of candidate keys for a given set of functional dependencies

Clement T. Yu; D. T. Johnson

... model proposed by E. F. Codd [1-3] ... most promising models of data base. In ... views of the users are expressed as tables or relations, with the rows denoting ... columns representing the domains of the ... suggestions for the automatic construction of the relations is the identification of ... domains, called candidate keys, which ... the values of the remaining domains. ... and Fadous and Forsyth [5] ... different algorithms for finding the set ... relational data base, given the ... on the data base. Unfortunately, ... computational complexity ... algorithms is not provided. In this paper, ... number of candidate keys can be ... of the number of functional dependencies. ... is no “fast” or polynomial algorithm for ... Much of the notation in the next section ...
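
To make the notion of a candidate key concrete, here is a minimal brute-force sketch: it computes the closure of an attribute set under the functional dependencies and keeps every minimal set whose closure covers all attributes. This is a textbook enumeration, not the algorithm of any work cited in the abstract, and its running time is exponential in the number of attributes. The function names and the sample dependencies are assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Enumerate minimal attribute sets whose closure is the full attribute set."""
    attributes = set(attributes)
    keys = []
    # Try smaller sets first so any superset of a found key can be skipped.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# Hypothetical dependencies: A -> B, B -> A, AC -> D.
fds = [({"A"}, {"B"}), ({"B"}, {"A"}), ({"A", "C"}, {"D"})]
keys = candidate_keys({"A", "B", "C", "D"}, fds)
```

For these sample dependencies the sketch finds two candidate keys, {A, C} and {B, C}, because A and B determine each other and either one combined with C determines D.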

Journal of the Association for Information Science and Technology | 1974

A Clustering Algorithm Based on User Queries.

Clement T. Yu

A clustering algorithm which is tree-like in structure, and is based on user queries, is presented. It is compared to Bonner's Method, Rocchio's Method, Dattola's Method and the Single Link Method in three different aspects, namely system effectiveness, system efficiency and the time required for clustering. Experimental results using the Cranfield 424 collection indicate that the proposed method is superior to the other methods.
