S. Goldberg | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where S. Goldberg is active.

Explore More

Publication

Featured researches published by S. Goldberg.

international conference on management of data | 2016

Ontological Pathfinding

Yang Chen; S. Goldberg; Daisy Zhe Wang; Soumitra Siddharth Johri

Recent years have seen a drastic rise in the construction of web-scale knowledge bases (e.g., Freebase, YAGO, DBPedia). These knowledge bases store structured information about real-world people, places, organizations, etc. However, due to limitations of human knowledge and information extraction algorithms, these knowledge bases are still far from complete. In this paper, we study the problem of mining first-order inference rules to facilitate knowledge expansion. We propose the Ontological Pathfinding algorithm (OP) that scales to web-scale knowledge bases via a series of parallelization and optimization techniques: a relational knowledge base model to apply inference rules in batches, a new rule mining algorithm that parallelizes the join queries, a novel partitioning algorithm to break the mining tasks into smaller independent sub-tasks, and a pruning strategy to eliminate unsound and resource-consuming rules before applying them. Combining these techniques, we develop the first rule mining system that scales to Freebase, the largest public knowledge base with 112 million entities and 388 million facts. We mine 36,625 inference rules in 34 hours; no existing approach achieves this scale.

very large data bases | 2016

ScaLeKB: scalable learning and inference over large knowledge bases

Yang Chen; Daisy Zhe Wang; S. Goldberg

Recent years have seen a drastic rise in the construction of web knowledge bases (e.g., Freebase, YAGO, DBPedia). These knowledge bases store structured information about real-world people, places, organizations, etc. However, due to the limitations of human knowledge, web corpora, and information extraction algorithms, the knowledge bases are still far from complete. To infer the missing knowledge, we propose the Ontological Pathfinding (OP) algorithm to mine first-order inference rules from these web knowledge bases. The OP algorithm scales up via a series of optimization techniques, including a new parallel-rule-mining algorithm, a pruning strategy to eliminate unsound and inefficient rules before applying them, and a novel partitioning algorithm to break the learning task into smaller independent sub-tasks. Combining these techniques, we develop a first rule mining system that scales to Freebase, the largest public knowledge base with 112 million entities and 388 million facts. We mine 36,625 inference rules in 34 h; no existing system achieves this scale.Based on the mining algorithm and the optimizations, we develop an efficient inference engine. As a result, we infer 0.9 billion new facts from Freebase in 17.19 h. We use cross validation to evaluate the inferred facts and estimate a degree of expansion by 0.6 over Freebase, with a precision approaching 1.0. Our approach outperforms state-of-the-art mining algorithms and inference engines in terms of both performance and quality.

international conference on multimedia information networking and security | 2012

Landmine Detection Using Two-Tapped Joint Orthogonal Matching Pursuits

S. Goldberg; Taylor C. Glenn; Joseph N. Wilson; Paul D. Gader

Joint Orthogonal Matching Pursuits (JOMP) is used here in the context of landmine detection using data obtained from an electromagnetic induction (EMI) sensor. The response from an object containing metal can be decomposed into a discrete spectrum of relaxation frequencies (DSRF) from which we construct a dictionary. A greedy iterative algorithm is proposed for computing successive residuals of a signal by subtracting away the highest matching dictionary element at each step. The nal condence of a particular signal is a combination of the reciprocal of this residual and the mean of the complex component. A two-tap approach comparing signals on opposite sides of the geometric location of the sensor is examined and found to produce better classication. It is found that using only a single pursuit does a comparable job, reducing complexity and allowing for real-time implementation in automated target recognition systems. JOMP is particularly highlighted in comparison with a previous EMI detection algorithm known as String Match.

very large data bases | 2016

SigmaKB: multiple probabilistic knowledge base fusion

Miguel Rodríguez; S. Goldberg; Daisy Zhe Wang

The interest in integrating web-scale knowledge bases (KBs) has intensified in the last several years. Research has focused on knowledge base completion between two KBs with complementary information, lacking any notion of uncertainty or method of handling conflicting information. We present SigmaKB, a knowledge base system that utilizes Consensus Maximization Fusion and user feedback to integrate and improve the query results of a total of 71 KBs. This paper presents the architecture and demonstration details.

north american chapter of the association for computational linguistics | 2016

Consensus Maximization Fusion of Probabilistic Information Extractors.

Miguel Rodríguez; S. Goldberg; Daisy Zhe Wang

Current approaches to Information Extraction (IE) are capable of extracting large amounts of facts with associated probabilities. Because no current IE system is perfect, complementary and conflicting facts are obtained when different systems are run over the same data. Knowledge Fusion (KF) is the problem of aggregating facts from different extractors. Existing methods approach KF using supervised learning or deep linguistic knowledge, which either lack sufficient data or are not robust enough. We propose a semi-supervised application of Consensus Maximization to the KF problem, using a combination of supervised and unsupervised models. Consensus Maximization Fusion (CM Fusion) is able to promote high quality facts and eliminate incorrect ones. We demonstrate the effectiveness of our system on the NIST Slot Filler Validation contest, which seeks to evaluate and aggregate multiple independent information extractors. Our system achieved the highest F1 score relative to other system submissions.

Journal of Data and Information Quality | 2017

A Probabilistically Integrated System for Crowd-Assisted Text Labeling and Extraction

S. Goldberg; Daisy Zhe Wang; Christan Grant

The amount of text data has been growing exponentially in recent years, giving rise to automatic information extraction methods that store text annotations in a database. The current state-of-the-art structured prediction methods, however, are likely to contain errors and it is important to be able to manage the overall uncertainty of the database. On the other hand, the advent of crowdsourcing has enabled humans to aid machine algorithms at scale. In this article, we introduce pi-CASTLE, a system that optimizes and integrates human and machine computing as applied to a complex structured prediction problem involving Conditional Random Fields (CRFs). We propose strategies grounded in information theory to select a token subset, formulate questions for the crowd to label, and integrate these labelings back into the database using a method of constrained inference. On both a text segmentation task over academic citations and a named entity recognition task over tweets we show an order of magnitude improvement in accuracy gain over baseline methods.

north american chapter of the association for computational linguistics | 2012