Publication


Featured research published by Shi Gao.


International Conference on Management of Data | 2014

ABS: a system for scalable approximate queries with accuracy guarantees

Kai Zeng; Shi Gao; Jiaqi Gu; Barzan Mozafari; Carlo Zaniolo

Approximate Query Processing (AQP) based on sampling is critical for supporting timely and cost-effective analytics over big data. To be applied successfully, AQP must be accompanied by reliable estimates on the quality of sample-produced approximate answers; the two main techniques used in the past for this purpose are (i) closed-form analytic error estimation, and (ii) the bootstrap method. Approach (i) is extremely efficient but lacks generality, whereas (ii) is general but suffers from high computational overhead. Our recently introduced Analytical Bootstrap method combines the strengths of both approaches and provides the basis for our ABS system, which will be demonstrated at the conference. The ABS system models bootstrap by a probabilistic relational model, and extends relational algebra with operations on probabilistic relations to predict the distributions of the AQP results. Thus, ABS entails a very fast computation of bootstrap-based quality measures for a general class of SQL queries, which is several orders of magnitude faster than the standard simulation-based bootstrap. In this demo, we will demonstrate the generality, automaticity, and ease of use of the ABS system, and its superior performance over the traditional approaches described above.
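The trade-off the abstract describes, between slow-but-general simulation bootstrap and fast-but-narrow closed-form estimates, can be sketched in a few lines. The following is only an illustrative comparison for the mean aggregate, with made-up data; it is not the ABS system or its probabilistic relational model.

```python
import random
import statistics

def simulation_bootstrap_se(sample, trials=500, seed=0):
    """Standard simulation bootstrap: resample with replacement many
    times and measure the spread of the recomputed aggregate."""
    rng = random.Random(seed)
    estimates = [
        statistics.mean(rng.choices(sample, k=len(sample)))
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)

def analytic_se(sample):
    """Closed-form error estimate for the mean: s / sqrt(n).
    Very fast, but only available for simple aggregates."""
    return statistics.stdev(sample) / len(sample) ** 0.5

rng = random.Random(1)
sample = [rng.gauss(100, 15) for _ in range(200)]
print(simulation_bootstrap_se(sample))  # hundreds of resampling passes
print(analytic_se(sample))              # a single pass over the sample
```

The two estimates agree closely here, but the bootstrap recomputes the whole aggregate hundreds of times — the overhead that analytical bootstrap methods aim to eliminate while keeping the bootstrap's generality.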


Very Large Data Bases | 2013

IBminer: a text mining tool for constructing and populating InfoBox databases and knowledge bases

Hamid Mousavi; Shi Gao; Carlo Zaniolo

Knowledge bases and structured summaries are playing a crucial role in many applications, such as text summarization, question answering, essay grading, and semantic search. Although many systems (e.g., DBpedia and YAGO2) provide massive knowledge bases of such summaries, they all suffer from incompleteness, inconsistencies, and inaccuracies. These problems can be addressed and much improved by combining and integrating different knowledge bases, but their very large sizes and their reliance on different terminologies and ontologies make the task very difficult. In this demo, we will demonstrate a system that achieves good results on this task by: i) employing available interlinks in the current knowledge bases (e.g., externalLink and redirect links in DBpedia) to combine information on individual entities, and ii) using widely available text corpora (e.g., Wikipedia) and our IBminer text-mining system to generate and verify structured information, and to reconcile terminologies across different knowledge bases. We will also demonstrate two tools designed to support the integration process in close collaboration with IBminer. The first is the InfoBox Knowledge-Base Browser (IBKB), which provides structured summaries and their provenance, and the second is the InfoBox Editor (IBE), which suggests relevant attributes for a user-specified subject, whereby the user can easily improve the knowledge base without requiring any knowledge about the internal terminology of individual systems.


Very Large Data Bases | 2013

Discovering attribute and entity synonyms for knowledge integration and semantic web search

Hamid Mousavi; Shi Gao; Carlo Zaniolo

We propose the Context-aware Synonym Suggestion System (CS³), which learns synonyms from text by using our NLP-based text-mining framework, called SemScape, and also from existing evidence in the current knowledge bases (KBs). Using CS³ and our previously proposed knowledge extraction system IBminer, we integrate some of the publicly available knowledge bases into one of superior quality and coverage, called IKBstore.


International Conference on Management of Data | 2014

Text-Mining, Structured Queries, and Knowledge Management on Web Document Corpora

Hamid Mousavi; Maurizio Atzori; Shi Gao; Carlo Zaniolo

Wikipedia's InfoBoxes play a crucial role in advanced applications and provide the main knowledge source for DBpedia and the powerful structured queries it supports. However, InfoBoxes, which were created by crowdsourcing for human rather than computer consumption, suffer from incompleteness, inconsistencies, and inaccuracies. To overcome these problems, we have developed (i) the IBminer system, which extracts InfoBox information by text-mining Wikipedia pages, (ii) the IKBStore system, which integrates the information derived by IBminer with that of DBpedia, YAGO2, WikiData, WordNet, and other sources, and (iii) SWiPE and the InfoBox Editor (IBE), which provide user-friendly interfaces for querying and revising the knowledge base. Thus, IBminer uses a deep NLP-based approach to extract from text a semantic representation structure called TextGraph, from which the system detects patterns and derives subject-attribute-value relations, as well as domain-specific synonyms for the knowledge base. IKBStore and IBE complement the powerful, user-friendly, by-example structured queries of SWiPE by supporting validation and provenance history for the information contained in the knowledge base, along with the ability to upgrade its knowledge when it is found incomplete, incorrect, or outdated.


Information and Computation | 2017

User-friendly temporal queries on historical knowledge bases

Carlo Zaniolo; Shi Gao; Maurizio Atzori; Muhao Chen; Jiaqi Gu

DBpedia and other RDF-encoded Knowledge Bases (KBs) give users access to encyclopedic knowledge via SPARQL queries. As the world evolves, the KBs are updated, and the history of entities and their properties becomes of great interest. Thus, we need powerful tools and friendly interfaces to query histories and flash back to the past. Here, we propose (i) a point-based temporal extension of SPARQL, called SPARQL^T, which enables simple and concise expression of temporal queries, and (ii) an extension of Wikipedia Infoboxes to support user-friendly by-example temporal queries implemented by mapping them into SPARQL^T. Our main-memory RDF-TX system supports such queries efficiently using Multi-Version B+ trees, compressed indexes, and query optimization techniques, which achieve performance and scalability, as demonstrated by experiments on historical datasets including Cliopedia, derived from Wikipedia dumps. We finally discuss how provenance information can be used to add valid-time features to these transaction-time KBs.
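The point-based temporal model can be illustrated informally: each fact carries a time point, and a query asks for a property's value as of a given instant. This is a toy sketch over a hypothetical timestamped-fact store with made-up data; it is not the actual query syntax, data model, or the RDF-TX engine.

```python
# Point-based temporal lookup over timestamped facts (illustrative only).
from bisect import bisect_right

# (subject, predicate) -> version list sorted by (valid_from_year, value).
# All entities and numbers below are hypothetical.
history = {
    ("Berlin", "population"): [
        (2005, 3_390_000),
        (2010, 3_460_000),
        (2015, 3_520_000),
    ],
}

def value_at(subject, predicate, year):
    """Return the property value as of a time point: the latest
    recorded version whose timestamp is <= year, or None."""
    versions = history[(subject, predicate)]
    i = bisect_right(versions, (year, float("inf")))
    return versions[i - 1][1] if i else None

print(value_at("Berlin", "population", 2012))  # -> 3460000
```

Keeping the versions sorted makes each point query a binary search; a multi-version index plays the analogous role at scale.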


IEEE International Conference on Fuzzy Systems | 2016

Converting spatiotemporal data among heterogeneous granularity systems

Muhao Chen; Shi Gao; X. Sean Wang

Spatiotemporal data are often expressed in terms of granularities to indicate the measurement units of the data. A granularity system usually consists of a set of granularities that share a “common refined granularity” (CRG) to enable granular comparison and data conversion within the system. However, if data from multiple granularity systems need to be used in a unified application, it is necessary to extend data conversion and comparison within a single granularity system to multiple granularity systems. This paper proposes a formal framework to enable such an extension. The framework essentially provides preconditions and properties for verifying the existence of a CRG and for unifying conversions with incongruous semantics, and it supports an approach that integrates multiple systems into one, so that granular interoperation across systems can be processed just as in a single system. Quantification of the uncertainty in granularity conversion is also considered, to improve the precision of granular comparison.
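The role of a common refined granularity can be sketched with temporal units: to convert between any two granularities, refine both down to the shared finest unit. The granularities and factors below are illustrative, not the paper's formal model.

```python
# Conversion through a common refined granularity (CRG) — minimal sketch.
# How many units of the CRG ("minute") each granularity contains.
TO_CRG = {"minute": 1, "hour": 60, "day": 1440, "week": 10080}

def convert(amount, src, dst):
    """Convert via the CRG: refine src down to minutes, then
    coarsen back up to dst. Exact only when the result divides evenly;
    otherwise the fraction reflects conversion uncertainty."""
    in_crg = amount * TO_CRG[src]
    return in_crg / TO_CRG[dst]

print(convert(2, "day", "hour"))      # -> 48.0
print(convert(90, "minute", "hour"))  # -> 1.5
```

When two granularity systems are merged, a CRG shared by both must first be shown to exist — here, "minute" plays that role for every unit in the table.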


International Journal of Semantic Computing | 2014

Mining Semantic Structures from Syntactic Structures in Web Document Corpora

Hamid Mousavi; Shi Gao; Deirdre Kerr; Markus Iseli; Carlo Zaniolo

The Web is making possible many advanced text-mining applications, such as news summarization, essay grading, question answering, semantic search and structured queries on corpora of Web documents. For many such applications, statistical text-mining techniques are of limited effectiveness since they do not utilize the morphological structure of the text. On the other hand, many approaches use NLP-based techniques that parse the text into parse trees and then use patterns to mine and analyze those trees, which are often unnecessarily complex. To reduce this complexity and ease the entire process of text mining, we propose a weighted-graph representation of text, called TextGraphs, which captures the grammatical and semantic relations between words and terms in the text. TextGraphs are generated using a new text-mining framework which is the main focus of this paper. Our framework, SemScape, uses a statistical parser to generate a few of the most probable parse trees for each sentence and employs a novel two-step pattern-based technique to extract candidate terms and their grammatical relations from the parse trees. Moreover, SemScape resolves coreferences by a novel technique, generates domain-specific TextGraphs by consulting ontologies, and provides a SPARQL-like query language and an optimized engine for semantically querying and mining TextGraphs.
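The idea of a weighted graph of grammatical relations can be shown in miniature: nodes are terms, and each edge records a typed relation with a confidence weight. This toy sketch and its relation names are hypothetical; it is not the SemScape/TextGraph representation or its query language.

```python
# Toy weighted text graph: terms as nodes, typed weighted relations as edges.
from collections import defaultdict

graph = defaultdict(list)

def add_relation(head, relation, dependent, weight):
    """Record a weighted relation edge between two terms."""
    graph[head].append((relation, dependent, weight))

# Relations that might be mined from "The cat chased the mouse."
add_relation("chased", "subject", "cat", 0.9)
add_relation("chased", "object", "mouse", 0.8)

def neighbors(term, min_weight=0.0):
    """Query the graph, keeping only sufficiently confident edges."""
    return [(r, d) for r, d, w in graph[term] if w >= min_weight]

print(neighbors("chased", min_weight=0.85))  # -> [('subject', 'cat')]
```

Weights let downstream mining trade recall for precision by thresholding, which is one motivation for a graph representation over raw parse trees.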


International Conference on Conceptual Modeling | 2012

Supporting database provenance under schema evolution

Shi Gao; Carlo Zaniolo

Database schema upgrades are common in modern information systems, where the provenance of the schema is of much interest, and is actually required to explain the provenance of contents generated by the database conversion that is part of such upgrades. Thus, an integrated management of data and metadata is needed, and the Archived Metadata and Provenance Manager (AM&PM) system is the first to address this requirement by building on recent advances in schema mappings and database upgrade automation. AM&PM therefore (i) extends the Information Schema with the capability of archiving the provenance of the schema and other metadata, (ii) provides a timestamp-based representation for the provenance of the actual data, and (iii) supports powerful queries on the provenance of the data and on the history of the metadata. In this paper, we present the design and main features of AM&PM, and the results of various experiments to evaluate its performance.
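A timestamp-based representation of metadata history can be sketched as follows: each archived schema version carries the interval during which it was current, and a provenance query asks which schema was in effect at a given date. The table, column names, and dates below are hypothetical; this is only in the spirit of such a system, not AM&PM itself.

```python
# Illustrative timestamped archive of schema versions (all names made up).
from datetime import date

schema_history = [
    {"valid_from": date(2010, 1, 1), "valid_to": date(2012, 6, 1),
     "columns": {"employee": ["id", "name", "salary"]}},
    {"valid_from": date(2012, 6, 1), "valid_to": date(9999, 12, 31),
     "columns": {"employee": ["id", "name", "base_salary", "bonus"]}},
]

def schema_as_of(when):
    """Metadata-provenance query: which schema was in effect on a date?"""
    for version in schema_history:
        if version["valid_from"] <= when < version["valid_to"]:
            return version["columns"]
    return None

print(schema_as_of(date(2011, 3, 15)))  # pre-upgrade schema
```

Half-open validity intervals make every date fall in exactly one version, so data provenance can always be traced back to the schema under which it was produced.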


International Conference on Management of Data | 2014

The analytical bootstrap: a new method for fast error estimation in approximate query processing

Kai Zeng; Shi Gao; Barzan Mozafari; Carlo Zaniolo


Extending Database Technology | 2016

RDF-TX: A Fast, User-Friendly System for Querying the History of RDF Knowledge Bases

Shi Gao; Jiaqi Gu; Carlo Zaniolo

Collaboration


Dive into Shi Gao's collaborations.

Top Co-Authors

Carlo Zaniolo (University of California)

Jiaqi Gu (University of California)

Muhao Chen (University of California)

Hamid Mousavi (University of California)

Kai Zeng (University of California)

Deirdre Kerr (University of California)

Markus Iseli (University of California)