Dezhao Song | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Dezhao Song is active.

Explore More

Publication

Featured researches published by Dezhao Song.

Journal of Data and Information Quality | 2013

Domain-Independent Entity Coreference for Linking Ontology Instances

Dezhao Song; Jeff Heflin

The objective of entity coreference is to determine if different mentions (e.g., person names, place names, database records, ontology instances, etc.) refer to the same real word object. Entity coreference algorithms can be used to detect duplicate database records and to determine if two Semantic Web instances represent the same underlying real word entity. The key issues in developing an entity coreference algorithm include how to locate context information and how to utilize the context appropriately. In this article, we present a novel entity coreference algorithm for ontology instances. For scalability reasons, we select a neighborhood of each instance from an RDF graph. To determine the similarity between two instances, our algorithm computes the similarity between comparable property values in the neighborhood graphs. The similarity of distinct URIs and blank nodes is computed by comparing their outgoing links. In an attempt to reduce the impact of distant nodes on the final similarity measure, we explore a distance-based discounting approach. To provide the best possible domain-independent matches, we propose an approach to compute the discriminability of triples in order to assign weights to the context information. We evaluated our algorithm using different instance categories from five datasets. Our experiments show that the best results are achieved by including both our discounting and triple discrimination approaches.

conference on information and knowledge management | 2010

Domain-independent entity coreference in RDF graphs

Dezhao Song; Jeff Heflin

In this paper, we present a novel entity coreference algorithm for Semantic Web instances. The key issues include how to locate context information and how to utilize the context appropriately. To collect context information, we select a neighborhood (consisting of triples) of each instance from the RDF graph. To determine the similarity between two instances, our algorithm computes the similarity between comparable property values in the neighborhood graphs. The similarity of distinct URIs and blank nodes is computed by comparing their outgoing links. To provide the best possible domain-independent matches, we examine an appropriate way to compute the discriminability of triples. To reduce the impact of distant nodes, we explore a distance-based discounting approach. We evaluated our algorithm using different instance categories in two datasets. Our experiments show that the best results are achieved by including both our triple discrimination and discounting approaches.

Journal of Web Semantics | 2014

Exploring Linked Data with Contextual Tag Clouds

Xingjian Zhang; Dezhao Song; Sambhawa Priya; Zachary A. Daniels; Kelly Reynolds; Jeff Heflin

In this paper we present the contextual tag cloud system: a novel application that helps users explore a large scale RDF dataset. Unlike folksonomy tags used in most traditional tag clouds, the tags in our system are ontological terms (classes and properties), and a user can construct a context with a set of tags that defines a subset of instances. Then in the contextual tag cloud, the font size of each tag depends on the number of instances that are associated with that tag and all tags in the context. Each contextual tag cloud serves as a summary of the distribution of relevant data, and by changing the context, the user can quickly gain an understanding of patterns in the data. Furthermore, the user can choose to include RDFS taxonomic and/or domain/range entailment in the calculations of tag sizes, thereby understanding the impact of semantics on the data. In this paper, we describe how the system can be used as a query building assistant, a data explorer for casual users, or a diagnosis tool for data providers. To resolve the key challenge of how to scale to Linked Data, we combine a scalable preprocessing approach with a specially-constructed inverted index, use three approaches to prune unnecessary counts for faster online computations, and design a paging and streaming interface. Together, these techniques enable a responsive system that in particular holds a dataset with more than 1.4 billion triples and over 380,000 tags. Via experimentation, we show how much our design choices benefit the responsiveness of our system.

IEEE Transactions on Medical Imaging | 2015

Multimodal Entity Coreference for Cervical Dysplasia Diagnosis

Dezhao Song; Edward Kim; Xiaolei Huang; Joseph E Patruno Md; Héctor Muñoz-Avila; Jeff Heflin; L. Rodney Long; Sameer K. Antani

Cervical cancer is the second most common type of cancer for women. Existing screening programs for cervical cancer, such as Pap Smear, suffer from low sensitivity. Thus, many patients who are ill are not detected in the screening process. Using images of the cervix as an aid in cervical cancer screening has the potential to greatly improve sensitivity, and can be especially useful in resource-poor regions of the world. In this paper, we develop a data-driven computer algorithm for interpreting cervical images based on color and texture. We are able to obtain 74% sensitivity and 90% specificity when differentiating high-grade cervical lesions from low-grade lesions and normal tissue. On the same dataset, using Pap tests alone yields a sensitivity of 37% and specificity of 96%, and using HPV test alone gives a 57% sensitivity and 93% specificity. Furthermore, we develop a comprehensive algorithmic framework based on Multimodal Entity Coreference for combining various tests to perform disease classification and diagnosis. When integrating multiple tests, we adopt information gain and gradient-based approaches for learning the relative weights of different tests. In our evaluation, we present a novel algorithm that integrates cervical images, Pap, HPV, and patient age, which yields 83.21% sensitivity and 94.79% specificity, a statistically significant improvement over using any single source of information alone.

international semantic web conference | 2013

Infrastructure for Efficient Exploration of Large Scale Linked Data via Contextual Tag Clouds

Xingjian Zhang; Dezhao Song; Sambhawa Priya; Jeff Heflin

In this paper we present the infrastructure of the contextual tag cloud system which can execute large volumes of queries about the number of instances that use particular ontological terms. The contextual tag cloud system is a novel application that helps users explore a large scale RDF dataset: the tags are ontological terms (classes and properties), the context is a set of tags that defines a subset of instances, and the font sizes reflect the number of instances that use each tag. It visualizes the patterns of instances specified by the context a user constructs. Given a request with a specific context, the system needs to quickly find what other tags the instances in the context use, and how many instances in the context use each tag. The key question we answer in this paper is how to scale to Linked Data; in particular we use a dataset with 1.4 billion triples and over 380,000 tags. This is complicated by the fact that the calculation should, when directed by the user, consider the entailment of taxonomic and/or domain/range axioms in the ontology. We combine a scalable preprocessing approach with a specially-constructed inverted index and use three approaches to prune unnecessary counts for faster intersection computations. We compare our system with a state-of-the-art triple store, examine how pruning rules interact with inference and analyze our design choices.

web intelligence | 2012

Accuracy vs. Speed: Scalable Entity Coreference on the Semantic Web with On-the-Fly Pruning

Dezhao Song; Jeff Heflin

One challenge for the Semantic Web is to scalably establish high quality owl: same As links between co referent ontology instances in different data sources, traditional approaches that exhaustively compare every pair of instances do not scale well to large datasets. In this paper, we propose a pruning-based algorithm for reducing the complexity of entity co reference. First, we discard candidate pairs of instances that are not sufficiently similar to the same pool of other instances. A sigmoid function based thresholding method is proposed to automatically adjust the threshold for such commonality on-the-fly. In our prior work, each instance is associated with a context graph consisting of neighboring RDF nodes. In this paper, we speed up the comparison for a single pair of instances by pruning insignificant context in the graph, this is accomplished by evaluating its potential contribution to the final similarity measure. We evaluate our system on three Semantic Web instance categories. We verify the effectiveness of our thresholding and context pruning methods by comparing to nine state-of-the-art systems. We show that our algorithm frequently outperforms those systems with a runtime speedup factor of 18 to 24 while maintaining competitive F1-scores. For datasets of up to 1 million instances, this translates to as much as 370 hours improvement in runtime.

international semantic web conference | 2012

Scalable and domain-independent entity coreference: establishing high quality data linkages across heterogeneous data sources

Dezhao Song

Due to the decentralized nature of the Semantic Web, the same real world entity may be described in various data sources and assigned syntactically distinct identifiers. In order to facilitate data utilization in the Semantic Web, without compromising the freedom of people to publish their data, one critical problem is to appropriately interlink such heterogeneous data. This interlinking process can also be referred to as Entity Coreference, i.e., finding which identifiers refer to the same real world entity. This proposal will investigate algorithms to solve this entity coreference problem in the Semantic Web in several aspects. The essence of entity coreference is to compute the similarity of instance pairs. Given the diversity of domains of existing datasets, it is important that an entity coreference algorithm be able to achieve good precision and recall across domains represented in various ways. Furthermore, in order to scale to large datasets, an algorithm should be able to intelligently select what information to utilize for comparison and determine whether to compare a pair of instances to reduce the overall complexity. Finally, appropriate evaluation strategies need to be chosen to verify the effectiveness of the algorithms.

international semantic web conference | 2011