Stephen K. Boyer
IBM
Publication
Featured research published by Stephen K. Boyer.
Pacific Symposium on Biocomputing | 2006
James J. Rhodes; Stephen K. Boyer; Jeffrey Thomas Kreulen; Ying Chen; Patricia Ordóñez
Text analytics is becoming an increasingly important tool in biomedical research. While advances continue to be made in the core algorithms for entity identification and relation extraction, a need for practical applications of these technologies arises. We developed a system that allows users to explore the US Patent corpus using molecular information. The core of our system contains three main technologies: a high-performing chemical annotator that identifies chemical terms and converts them to structures, a similarity search engine based on the emerging IUPAC International Chemical Identifier (InChI) standard, and a set of on-demand data mining tools. By leveraging this technology, we were able to rapidly identify and index 3,623,248 unique chemical structures from 4,375,036 US Patents and Patent Applications. Using this system, a user may go to a web page, draw a molecule, search for related Intellectual Property (IP), and analyze the results. Our results prove that this is a far more effective way of identifying IP than traditional keyword-based approaches.
International Conference on Document Analysis and Recognition | 1993
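The abstract above does not say how its InChI-based similarity search is computed. As a rough illustration of structure similarity search in general, here is a minimal sketch that swaps in a common, generic technique: Tanimoto (Jaccard) similarity over sets of structural features. The feature names, document ids, and threshold are hypothetical and are not taken from the paper.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)


def search(index, query, threshold=0.5):
    """Return (score, doc_id) pairs at or above threshold, best match first."""
    hits = [(tanimoto(features, query), doc_id) for doc_id, features in index.items()]
    return sorted((hit for hit in hits if hit[0] >= threshold), reverse=True)


# Hypothetical toy index: document id -> structural features of one molecule.
index = {
    "doc-1": {"benzene_ring", "hydroxyl", "carboxyl"},
    "doc-2": {"benzene_ring", "amine"},
    "doc-3": {"alkyl_chain", "ester"},
}
query = {"benzene_ring", "hydroxyl", "carboxyl", "acetyl"}

print(search(index, query))  # only doc-1 clears the 0.5 threshold (score 0.75)
```

A production system would derive features from real structure representations rather than hand-written labels; the sketch only shows the ranking step.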
Richard G. Casey; Stephen K. Boyer; Paul Donald Healey; Alex Miller; Bernadette Oudot; Karl S. Zilles
A prototype system for encoding chemical structure diagrams from scanned printed documents is described. The system distinguishes a structure diagram from other printed material on a page image using size and spacing characteristics. It distinguishes line graphics from symbols in an intermediate vectorization stage. Line information is mapped into a connection diagram that represents atomic bonds. Atomic symbols are identified by means of chemical drawing conventions and optical character recognition. The final coded output interfaces to conventional chemistry software for database storage and retrieval, publishing, and modeling.
International Conference on Data Mining | 2009
Ying Chen; W. Scott Spangler; Jeffrey Thomas Kreulen; Stephen K. Boyer; Thomas D. Griffin; Alfredo Alba; Amit Behal; Bin He; Linda Kato; Ana Lelescu; Cheryl A. Kieliszewski; Xian Wu; Li Zhang
Intellectual Properties (IP), such as patents and trademarks, are among the most critical assets in today’s enterprises and research organizations. They represent the core innovation and differentiators of an organization. When leveraged effectively, they not only protect a business from its competition, but also generate significant opportunities in licensing, execution, long-term research, and innovation. In certain industries, e.g., the pharmaceutical industry, patents lead to multi-billion-dollar revenue per year. In this paper, we present a holistic information mining solution, called SIMPLE, which mines a large corpus of patents and scientific literature for insights. Unlike much prior work that deals with specific aspects of analytics, SIMPLE is an integrated, end-to-end IP analytics solution that addresses a wide range of challenges in patent analytics, such as data complexity, scale, and nomenclature issues. It encompasses techniques for patent data processing and modeling, analytics algorithms, and a web interface and web services for analytics service delivery and end-user interaction. We use real-world case studies to demonstrate the effectiveness of SIMPLE.
Knowledge Discovery and Data Mining | 2015
Meenakshi Nagarajan; Angela D. Wilkins; Benjamin J. Bachman; Ilya B. Novikov; Shenghua Bao; Peter J. Haas; María E. Terrón-Díaz; Sumit Bhatia; Anbu Karani Adikesavan; Jacques Joseph Labrie; Sam Regenbogen; Christie M. Buchovecky; Curtis R. Pickering; Linda Kato; Andreas Martin Lisewski; Ana Lelescu; Houyin Zhang; Stephen K. Boyer; Griff Weber; Ying Chen; Lawrence A. Donehower; W. Scott Spangler; Olivier Lichtarge
We present KnIT, the Knowledge Integration Toolkit, a system for accelerating scientific discovery and predicting previously unknown protein-protein interactions. Such predictions enrich biological research and are pertinent to drug discovery and the understanding of disease. Unlike a prior study, KnIT is now fully automated and demonstrably scalable. It extracts information from the scientific literature, automatically identifying direct and indirect references to protein interactions, which is knowledge that can be represented in network form. It then reasons over this network with techniques such as matrix factorization and graph diffusion to predict new, previously unknown interactions. The accuracy and scope of KnIT's knowledge extractions are validated using comparisons to structured, manually curated data sources as well as by performing retrospective studies that predict subsequent literature discoveries using literature available prior to a given date. The KnIT methodology is a step towards automated hypothesis generation from text, with potential application to other scientific domains.
Proceedings of the Third Forum on Research and Technology Advances in Digital Libraries | 1996
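The KnIT abstract names graph diffusion as one technique for predicting new interactions but gives no details. The following is a minimal sketch of one common form of graph diffusion, a random walk with restart, used to rank nodes that are not yet linked to a seed node. It is not KnIT's implementation; the toy graph, node names, restart probability, and step count are all hypothetical.

```python
def diffuse(adjacency, seed, restart=0.3, steps=50):
    """Random walk with restart from `seed` over an undirected graph."""
    score = {node: 0.0 for node in adjacency}
    score[seed] = 1.0
    for _ in range(steps):
        nxt = {node: 0.0 for node in adjacency}
        for node, neighbours in adjacency.items():
            if not neighbours:
                continue  # isolated node: nothing to spread
            share = (1.0 - restart) * score[node] / len(neighbours)
            for other in neighbours:
                nxt[other] += share
        nxt[seed] += restart  # return a fixed fraction of mass to the seed
        score = nxt
    return score


def predict_links(adjacency, seed, top=3):
    """Rank nodes not already linked to `seed` by their diffusion score."""
    score = diffuse(adjacency, seed)
    known = adjacency[seed] | {seed}
    candidates = [(n, s) for n, s in score.items() if n not in known]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:top]


# Hypothetical toy interaction network (symmetric adjacency sets).
toy = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

print(predict_links(toy, "A"))  # D ranks above E as a candidate partner for A
```

Nodes that share many neighbours with the seed accumulate more diffused mass, which is why D, reachable through both B and C, outranks E here.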
David M. Choy; Cynthia Dwork; Jeffrey Bruce Lotspiech; Laura C. Anderson; Stephen K. Boyer; Thomas D. Griffin; Bruce Albert Hoenig; M. J. Jackson; W. Kaka; James M. McCrossin; Alex Miller; Robert J. T. Morris; Norman J. Pass
As part of IBM's Digital Library Initiative, IBM's Almaden Research Center has teamed with the Institute for Scientific Information in a joint project to deliver on-line access to the bibliographic information and abstracts from the scientific journal articles indexed in Current Contents/Life Sciences as well as articles offered by the respective publishers. This requires both adaptation of existing technologies and development of new capabilities, especially regarding copyright protection. Since the Fall of 1995, a pilot system has been installed at four universities, two corporate libraries, and a major public research library, beginning a study that involves many publishers, libraries, and users to test the system and to experiment with new economic models. This article describes some requirements we identified for this system, and the solutions we have devised for these requirements.
International Conference on Data Mining | 2010
W. Scott Spangler; Ying Chen; Jeffrey Thomas Kreulen; Stephen K. Boyer; Thomas D. Griffin; Alfredo Alba; Linda Kato; Ana Lelescu; Su Yan
Intellectual Properties (IP), such as patents and trademarks, are among the most critical assets in today’s enterprises and research organizations. They represent the core innovation and differentiators of an organization. When leveraged effectively, they not only protect freedom of action, but also generate significant opportunities in licensing, execution, long-term research, and innovation. In this paper, we expand upon a previous paper describing a solution called SIMPLE, which mines a large corpus of patents and scientific literature for insights. Here we focus on the interactive analytics aspects of SIMPLE, which allow the analyst to explore large unstructured information collections containing mixed information in a dynamic way. We use real-world case studies to demonstrate the effectiveness of interactive analytics in SIMPLE.
ADL '95 Selected Papers from the Digital Libraries, Research and Technology Advances | 1995
David M. Choy; Cynthia Dwork; Jeffrey Bruce Lotspiech; Robert J. T. Morris; Norman J. Pass; Laura C. Anderson; Alan E. Bell; Stephen K. Boyer; Thomas D. Griffin; Bruce Albert Hoenig; James M. McCrossin; Alex Miller; Florian Pestoni; Deidra S. Picciano
In this chapter we describe the architecture for the Almaden Distributed Digital Library System, which is intended to support an emerging “information marketplace”. Using a distributed server approach and accommodating heterogeneous environments, the system is designed to meet the diverse needs of the publishers, distributors, and users of scientific journal information at low cost, while protecting the information assets of the publishers and the privacy of the users. A prototype is currently being implemented in a joint effort by IBM Almaden Research Center and the Institute for Scientific Information. A pilot is planned to test the system and to explore new economic models.
Proceedings of the National Academy of Sciences of the United States of America | 2018
Byung-Kwon Choi; Tajhal Dayaram; Neha Parikh; Angela D. Wilkins; Meena Nagarajan; Ilya B. Novikov; Benjamin J. Bachman; Sung Yun Jung; Peter J. Haas; Jacques L. Labrie; Curtis R. Pickering; Anbu Karani Adikesavan; Sam Regenbogen; Linda Kato; Ana Lelescu; Christie M. Buchovecky; Houyin Zhang; Sheng Hua Bao; Stephen K. Boyer; Griff Weber; Kenneth L. Scott; Ying Chen; Scott Spangler; Lawrence A. Donehower; Olivier Lichtarge
Significance: We adapted natural language processing to the biological literature and demonstrated end-to-end automated knowledge discovery by exploring subtle word connections. General text mining scanned 21 million publication abstracts and selected a reliable 130,000 from which hypothesis generation algorithms predicted kinases not known to phosphorylate p53, but likely to do so. Six of these p53 kinase candidates passed experimental validation. Among them NEK2 was examined in depth and shown to repress p53 and promote cell division. This work demonstrates the possibility of integrating vast corpora of written knowledge to compute valuable hypotheses that will often test true and fuel discovery.
Scientific progress depends on formulating testable hypotheses informed by the literature. In many domains, however, this model is strained because the number of research papers exceeds human readability. Here, we developed computational assistance to analyze the biomedical literature by reading PubMed abstracts to suggest new hypotheses. The approach was tested experimentally on the tumor suppressor p53 by ranking its most likely kinases, based on all available abstracts. Many of the best-ranked kinases were found to bind and phosphorylate p53 (P value = 0.005), suggesting six likely p53 kinases so far. One of these, NEK2, was studied in detail. A known mitosis promoter, NEK2 was shown to phosphorylate p53 at Ser315 in vitro and in vivo and to functionally inhibit p53. These bona fide validations of text-based predictions of p53 phosphorylation, and the discovery of an inhibitory p53 kinase of pharmaceutical interest, suggest that automated reasoning using a large body of literature can generate valuable molecular hypotheses and has the potential to accelerate scientific discovery.
Archive | 2000
Stephen K. Boyer; Alex Miller
Archive | 1991
Stephen K. Boyer; Richard G. Casey; Alex Miller; Bernadette Oudot; Karl S. Zilles