Yinle Zhou | Researchain

Publication

Featured research published by Yinle Zhou.

Handbook of Data Quality | 2013

A Practical Guide to Entity Resolution with OYSTER

John R. Talburt; Yinle Zhou

This chapter discusses the concepts and methods of entity resolution (ER) and how they can be applied in practice to eliminate redundant data records and support master data management programs. The chapter is organized into two main parts. The first part discusses the components of ER, with particular emphasis on approximate matching algorithms and the activities that comprise identity information management. The second part provides a step-by-step guide to building an ER process, including data profiling, data preparation, identity attribute selection, rule development, ER algorithm considerations, deciding on an identity management strategy, results analysis, and rule refinement. Each step in the process is illustrated with an actual example using OYSTER, an open-source entity resolution system.

Archive | 2014

Information quality and governance for business intelligence

William Yeoh; John R. Talburt; Yinle Zhou

This book presents the latest exchange of academic research on all aspects of practicing and managing information using a multidisciplinary approach that examines its quality for organizational growth.

International Conference on Information Technology: New Generations | 2013

User-Defined Inverted Index in Boolean, Rule-Based Entity Resolution Systems

Yinle Zhou; John R. Talburt; Eric D. Nelson

This paper discusses the design, analysis, and measurement of user-defined inverted indexes in Boolean rule-based entity resolution (ER) systems. It first describes the features of a Boolean rule-based ER system and then shows how to design a user-defined inverted index for better performance. It illustrates how the index is aligned with the matching rules and discusses three index measurements: reduction ratio, index precision, and recall. The final part suggests two strategies for designing the index.
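To make the index measurements more concrete, the sketch below builds a small inverted index over toy records and computes a reduction ratio, pair precision, and pair recall. It uses the definitions common in the record-linkage literature, which may differ from the paper's exact formulations. The records, the index key (first letter of the last name plus ZIP code), the set of true matches, and all function names are hypothetical and are not taken from the paper or from OYSTER.

```python
from collections import defaultdict
from itertools import combinations


def build_index(records, key_fn):
    """Map each index key to the set of record ids that share it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        index[key_fn(rec)].add(rec_id)
    return index


def candidate_pairs(index):
    """Only records that share an index key are compared by the match rules."""
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def reduction_ratio(n_records, n_candidates):
    """Fraction of all possible pairwise comparisons that the index avoids."""
    total_pairs = n_records * (n_records - 1) // 2
    return 1 - n_candidates / total_pairs


def pair_precision_recall(candidates, true_matches):
    """Precision and recall of the candidate pairs against known matches."""
    hits = len(candidates & true_matches)
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(true_matches) if true_matches else 0.0
    return precision, recall


# Hypothetical toy data: record id -> attributes.
records = {
    1: {"last": "Smith", "zip": "72204"},
    2: {"last": "Smyth", "zip": "72204"},
    3: {"last": "Jones", "zip": "72201"},
    4: {"last": "Smith", "zip": "72201"},
}
# Assumed ground truth for illustration: records 1, 2, and 4 overlap as shown.
true_matches = {(1, 2), (1, 4)}

index = build_index(records, lambda r: r["last"][0] + r["zip"])
pairs = candidate_pairs(index)
rr = reduction_ratio(len(records), len(pairs))
precision, recall = pair_precision_recall(pairs, true_matches)
print(pairs, rr, precision, recall)
```

In this toy case the index avoids most comparisons but misses the true match between records 1 and 4, because a rule that matches on last name alone is not aligned with an index key that also includes the ZIP code.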

International Journal of Business Intelligence Research | 2012

Optimizing the Accuracy of Entity-Based Data Integration of Multiple Data Sources Using Genetic Programming Methods

Yinle Zhou; Ali Kooshesh; John R. Talburt

Entity-based data integration (EBDI) is a form of data integration in which information related to the same real-world entity is collected and merged from different sources. It often happens that not all of the sources will agree on one value for a common attribute. These cases are typically resolved by invoking a rule that will select one of the non-null values presented by the sources. One of the most commonly used selection rules is called the naïve selection operator that chooses the non-null value provided by the source with the highest overall accuracy for the attribute in question. However, the naïve selection operator will not always produce the most accurate result. This paper describes a method for automatically generating a selection operator using methods from genetic programming. It also presents the results from a series of experiments using synthetic data that indicate that this method will yield a more accurate selection operator than either the naïve or naïve-voting selection operators.
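The naïve selection operator described above can be sketched in a few lines. This is a minimal illustration that assumes per-source accuracy for the attribute is already known. The source names, accuracy figures, attribute values, and the function name `naive_select` are hypothetical, and the sketch does not cover the genetic-programming method from the paper.

```python
from typing import Dict, Optional


def naive_select(values: Dict[str, Optional[str]],
                 accuracy: Dict[str, float]) -> Optional[str]:
    """Return the non-null value from the source with the highest accuracy.

    `values` maps source name -> value reported for one attribute (None = null).
    `accuracy` maps source name -> assumed overall accuracy for that attribute.
    """
    non_null = [src for src, value in values.items() if value is not None]
    if not non_null:
        return None  # every source reported null
    best_source = max(non_null, key=lambda src: accuracy[src])
    return values[best_source]


# Hypothetical example: source A is the most accurate but reports null,
# so the operator falls back to the next most accurate source, B.
city_values = {"A": None, "B": "Little Rock", "C": "Conway"}
city_accuracy = {"A": 0.95, "B": 0.80, "C": 0.70}
selected = naive_select(city_values, city_accuracy)
print(selected)
```

The operator ignores how many sources agree on a value, which is one reason it does not always pick the most accurate result.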