Antonio Zamora
Chemical Abstracts Service
Publication
Featured research published by Antonio Zamora.
Information Processing and Management | 1981
Erena Mitsuchieru Zamora; Joseph J. Pollock; Antonio Zamora
Work performed under the SPElling Error Detection COrrection Project (SPEEDCOP), supported by the National Science Foundation (NSF) at Chemical Abstracts Service (CAS), to devise effective automatic methods of detecting and correcting misspellings in scholarly and scientific text is described. The investigation was applied to 50,000 word/misspelling pairs collected from six datasets: Chemical Industry Notes (CIN), Biological Abstracts (BA), Chemical Abstracts (CA), American Chemical Society primary journal keyboarding (ACS), Information Science Abstracts (ISA), and Distributed On-Line Editing (DOLE), a CAS internal dataset especially suited to spelling error studies. The purpose of this study was to determine the utility of trigram analysis in the automatic detection and/or correction of misspellings. Computer programs were developed to collect data on trigram distribution in each dataset and to explore the potential of trigram analysis for detecting spelling errors, verifying correctly spelled words, locating the error site within a misspelling, and distinguishing between the basic kinds of spelling errors. The results of the trigram analysis were largely independent of the dataset to which it was applied, but trigram compositions varied with the dataset. The trigram analysis technique developed determined the error site within a misspelling accurately, but did not distinguish effectively between different error types or between valid words and misspellings. However, methods for increasing its accuracy are suggested.
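The idea of locating an error site from trigram statistics can be illustrated with a minimal Python sketch. It is not the SPEEDCOP software: the function names, the space padding, and the rule "flag the first trigram never seen in the reference words" are assumptions made for illustration only.

```python
from collections import Counter


def trigrams(word):
    """Return the overlapping letter trigrams of a word, padded with one space on each side."""
    padded = f" {word.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_trigram_counts(words):
    """Count how often each trigram occurs in a reference word list (hypothetical helper)."""
    counts = Counter()
    for word in words:
        counts.update(trigrams(word))
    return counts


def likely_error_site(word, counts):
    """Return (position, trigram) of the first unseen trigram, or None if every trigram is known.

    The position is an index into the space-padded word. This decision rule is an
    assumption for illustration, not the procedure described in the abstract.
    """
    grams = trigrams(word)
    if not grams:
        return None
    freq, position, gram = min((counts[g], i, g) for i, g in enumerate(grams))
    return (position, gram) if freq == 0 else None


reference = ["chemical", "chemistry", "abstract", "abstracts", "service"]
counts = build_trigram_counts(reference)
print(likely_error_site("chemcial", counts))
print(likely_error_site("chemical", counts))
```

With so small a reference list, many valid words would also contain unseen trigrams, which is in line with the abstract's finding that trigram analysis alone did not separate valid words from misspellings effectively.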
Journal of the Association for Information Science and Technology | 1983
Joseph J. Pollock; Antonio Zamora
The SPEEDCOP (SPElling Error Detection COrrection Project) project recently completed at Chemical Abstracts Service (CAS) extracted over 50,000 misspellings from approximately 25,000,000 words of text from seven scientific and scholarly databases. The misspellings were automatically classified and the error types analyzed. The results, which were consistent over the different databases, showed that the expected incidence of misspelling is 0.2%, that 90–95% of spelling errors have only a single mistake, that substitution is homogeneous while transposition is heterogeneous, that omission is the commonest type of misspelling, and that inadvertent doubling of a letter is the most important cause of insertion errors. The more frequently a letter occurs in the text, the more likely it is to be involved in a spelling error. Most misspellings collected by SPEEDCOP are of the type colloquially referred to as “typos” and approximately 90% are unlikely to be repeated in normal spans of text.
Journal of Chemical Information and Computer Sciences | 1976
Antonio Zamora
This paper describes an algorithm which finds the smallest set of smallest rings of a ring system without the necessity of finding all rings in the ring system. The algorithm first finds the smallest rings in which unused atoms occur and then progresses to find the smallest rings in which unused edges and faces occur until the smallest set of rings required to describe the complete ring system is found. The algorithm converges quickly because the lengths of the paths that need to be scanned to discover each new ring decrease when a smaller ring is found.
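The first step described above, finding the smallest ring in which a given atom occurs, can be sketched in Python with a breadth-first search. This is only an illustration of that single step under assumed conventions (an adjacency-list dictionary and hypothetical function names); it does not reproduce the paper's handling of unused edges and faces or its selection of the final ring set.

```python
from collections import deque


def shortest_path(adj, start, goal, banned_edge):
    """Breadth-first search from start to goal that never crosses banned_edge."""
    banned = frozenset(banned_edge)
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj[node]:
            if frozenset((node, nxt)) == banned or nxt in prev:
                continue
            prev[nxt] = node
            queue.append(nxt)
    return None


def smallest_ring_through(adj, atom):
    """Return the atoms of a smallest ring containing atom, or None if atom is not in a ring.

    For each bond (atom, neighbor), the shortest path from neighbor back to atom
    that avoids that bond closes a ring; the shortest such path is kept.
    """
    best = None
    for nbr in adj[atom]:
        path = shortest_path(adj, nbr, atom, (atom, nbr))
        if path is not None and (best is None or len(path) < len(best)):
            best = path
    return best


# Hypothetical fused system: a three-membered ring (0, 1, 2) sharing bond 1-2
# with a four-membered ring (1, 2, 3, 4).
adj = {0: [1, 2], 1: [0, 2, 4], 2: [0, 1, 3], 3: [2, 4], 4: [3, 1]}
print(smallest_ring_through(adj, 0))
print(smallest_ring_through(adj, 3))
```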
Journal of the Association for Information Science and Technology | 1980
Karen A. Hamill; Antonio Zamora
An experimental computer program has been developed to classify documents according to the 80 sections and five major section groupings of Chemical Abstracts (CA). The program uses pattern recognition techniques supplemented by heuristics. During the “training” phase, words from pre‐classified documents are selected, and the probability of occurrence of each word in each section of CA is computed and stored in a reference dictionary. The “classification” phase matches each word of a document title against the dictionary and assigns a section number to the document using weights derived from the probabilities in the dictionary. Heuristic techniques are used to normalize word variants such as plurals, past tenses, and gerunds in both the training phase and the classification phase. The dictionary lookup technique is supplemented by the analysis of chemical nomenclature terms into their component word roots to influence the section to which the documents are assigned. Program performance and human consistency have been evaluated by comparing the program results against the published sections of CA and by conducting an experiment with people experienced in the assignment of documents to CA sections. The program assigned approximately 78% of the documents to the correct major section groupings of CA and 67% to the correct sections or cross‐references at a rate of 100 documents per second.
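The two-phase scheme described above can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the section labels and titles are made up, the weight is taken to be the fraction of a word's training occurrences that fall in each section, and the normalizer strips only a plural "s" rather than implementing the program's actual heuristics or its nomenclature analysis.

```python
from collections import Counter, defaultdict


def normalize(word):
    """Crude stand-in for variant normalization: lowercase and strip a plural 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    return word


def train(labeled_titles):
    """Training phase: build {word: {section: weight}} from (title, section) pairs."""
    counts = defaultdict(Counter)
    for title, section in labeled_titles:
        for word in title.split():
            counts[normalize(word)][section] += 1
    dictionary = {}
    for word, by_section in counts.items():
        total = sum(by_section.values())
        dictionary[word] = {s: n / total for s, n in by_section.items()}
    return dictionary


def classify(title, dictionary):
    """Classification phase: sum the weights of the title's words and pick the best section."""
    scores = Counter()
    for word in title.split():
        for section, weight in dictionary.get(normalize(word), {}).items():
            scores[section] += weight
    if not scores:
        return None  # no title word appears in the dictionary
    return max(scores, key=scores.get)


# Hypothetical training data; the section labels are illustrative only.
training = [
    ("Polymers of styrene", "Plastics"),
    ("Enzyme kinetics", "Biochemistry"),
    ("Polymer synthesis", "Plastics"),
    ("Enzymes in yeast", "Biochemistry"),
]
dictionary = train(training)
print(classify("Styrene polymer films", dictionary))
```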
Journal of the Association for Information Science and Technology | 1980
Antonio Zamora
On-line bibliographic search systems tend to increase the visibility of spelling errors through the use of indexes of unique terms; even low error rates in a data base can result in large numbers of misspelled terms in these indexes. This article describes the techniques used to detect and correct spelling errors in the data base of Chemical Abstracts Service. A computer program for spelling error detection achieves a high level of performance using hashing techniques for dictionary look-up and compression. Heuristic procedures extend the dictionary and increase the proportion of misspelled words in the words flagged. Automatic correction procedures are applied only to words which are known to be misspelled; other corrections are performed manually during the normal editorial cycle. The constraints imposed on the selection of a spelling error detection technique by a complex data base, human factors, and high-volume production are discussed.
Journal of the Association for Information Science and Technology | 1984
Joseph J. Pollock; Antonio Zamora
The SPEEDCOP project recently completed at Chemical Abstracts Service (CAS) extracted over 50,000 misspellings from approximately 25,000,000 words of text from seven scientific and scholarly databases. The misspellings were automatically classified and analyzed and the results used to design and implement a program that proved capable of correcting most such errors. Analysis of the performance of the spelling error detection and correction programs highlighted the features that should be incorporated into a powerful and user-friendly interactive system suitable for nonprogrammers. These include document level thresholds for misspelling detection, automatic reuse of user decisions, and user verification and control of correction. An advantage of the proposed design is that the system automatically customizes itself to its environment. This article is primarily concerned with system design, not implementation details.
Journal of Chemical Information and Computer Sciences | 1976
Antonio Zamora; David L. Dayton
The Chemical Abstracts Service Chemical Registry System is a computer-based system that uniquely identifies chemical substances on the basis of their structural features. The Registry System currently contains records for more than 3.4 million different substances. Although there are several ways of entering data into the CAS Chemical Registry System, the majority of the transactions for storage or retrieval of data are entered using chemical typewriters. This paper describes the conventions used for typing structure diagrams, the computer programs which decode the typed structure into a connection table, and the edits which are performed during decoding.
Journal of Chemical Information and Computer Sciences | 1977
Ronald G. Dunn; William Fisanick; Antonio Zamora
In January 1976 Chemical Abstracts Service (CAS) began operation of an experimental chemical substance search service which offers both retrospective and current awareness searches for specific substances and for substances containing specified substructures. To support this service, CAS has developed an experimental computer-based substructure search system which provides for batch mode serial searches on files containing Chemical Abstracts Index Nomenclature and structurally related data. The search system supplements basic text search methods with screening techniques to improve search efficiency and with extended logic capabilities to improve the precision of search results.
Journal of Chemical Information and Computer Sciences | 1976
Tommy Ebe; Antonio Zamora
Three lessons in Computer-Aided-Instruction are offered to the students. The first lesson introduces the concept of basic steady-state kinetics and shows the student the steady-state treatment of Michaelis and Menten as well as the various forms of the equations. A typical output from this lesson is given in Figure 3. The second “lesson” concentrates on inhibition of steady-state enzyme kinetics and attempts to bring the basic concepts to the full attention of the student. Various plots are produced, allowing the student to draw inferences. One such plot is presented in Figure 4. The use of the bicyclic scheme allows not only the use of a single cycle, but provides also rather extensive freedom in the choice of values for the constants and analytical concentrations. The user may also choose various values for the rate constants, which generate the product. Two types of curves are available to the user, presented in Figures 5 and 6.
Journal of Chemical Information and Computer Sciences | 1975
Joseph J. Pollock; Antonio Zamora