Yinle Zhou
University of Arkansas at Little Rock
Publication
Featured research published by Yinle Zhou.
Handbook of Data Quality | 2013
John R. Talburt; Yinle Zhou
This chapter discusses the concepts and methods of entity resolution (ER) and how they can be applied in practice to eliminate redundant data records and support master data management programs. The chapter is organized into two main parts. The first part discusses the components of ER, with particular emphasis on approximate matching algorithms and the activities that comprise identity information management. The second part provides a step-by-step guide to building an ER process, including data profiling, data preparation, identity attribute selection, rule development, ER algorithm considerations, deciding on an identity management strategy, results analysis, and rule refinement. Each step in the process is illustrated with an actual example using the OYSTER open-source entity resolution system.
Archive | 2014
William Yeoh; John R. Talburt; Yinle Zhou
This book presents the latest exchange of academic research on all aspects of practicing and managing information using a multidisciplinary approach that examines its quality for organizational growth.
International Conference on Information Technology: New Generations | 2013
Yinle Zhou; John R. Talburt; Eric D. Nelson
This paper discusses the design, analysis, and measurement of user-defined inverted indexes in Boolean rule-based entity resolution (ER) systems. It first describes the features of a Boolean rule-based ER system and then shows how to design a user-defined inverted index for better performance. An illustration of how the index aligns with the matching rules is given. The paper also discusses three index measurements: reduction ratio, index precision, and index recall. The final part gives two suggested strategies for designing the index.
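As a rough illustration of the reduction ratio mentioned above, the sketch below builds a small inverted index and compares the number of candidate pairs it produces with the number of pairs an exhaustive comparison would need. The record layout, the `key_fn` index key (last name plus first initial), and all function names are hypothetical and are not taken from the paper. The formula used is the commonly cited definition of reduction ratio (one minus candidate pairs over total pairs), which may differ from the paper's exact formulation. Index precision and recall are left out because the abstract does not define them.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(records, key_fn):
    """Map each index key to the set of record ids that generate it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        index[key_fn(rec)].add(rec_id)
    return index


def candidate_pairs(index):
    """Collect every pair of record ids that share an index key."""
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def reduction_ratio(n_records, n_candidates):
    """Fraction of the exhaustive pairwise comparisons avoided by the index."""
    total = n_records * (n_records - 1) // 2
    return 1 - n_candidates / total


# Hypothetical sample records, for illustration only.
records = {
    1: {"first": "John", "last": "Smith"},
    2: {"first": "Jon", "last": "Smith"},
    3: {"first": "Mary", "last": "Jones"},
    4: {"first": "Maria", "last": "Jones"},
}


def key_fn(rec):
    # Assumed user-defined index key: last name plus first initial.
    return rec["last"].upper() + rec["first"][0].upper()


index = build_inverted_index(records, key_fn)
pairs = candidate_pairs(index)
rr = reduction_ratio(len(records), len(pairs))
print(sorted(pairs), round(rr, 3))
```

In this toy data the index leaves two candidate pairs out of six possible pairs. A key that is too coarse lowers the reduction ratio, while a key that is too fine risks separating records that a matching rule would have linked, which is why the index needs to be aligned with the rules.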
International Journal of Business Intelligence Research | 2012
Yinle Zhou; Ali Kooshesh; John R. Talburt
Entity-based data integration (EBDI) is a form of data integration in which information related to the same real-world entity is collected and merged from different sources. It often happens that not all of the sources will agree on one value for a common attribute. These cases are typically resolved by invoking a rule that will select one of the non-null values presented by the sources. One of the most commonly used selection rules is called the naïve selection operator, which chooses the non-null value provided by the source with the highest overall accuracy for the attribute in question. However, the naïve selection operator will not always produce the most accurate result. This paper describes a method for automatically generating a selection operator using methods from genetic programming. It also presents the results from a series of experiments using synthetic data that indicate that this method will yield a more accurate selection operator than either the naïve or naïve-voting selection operators.
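The naïve selection operator described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `naive_select`, the dictionary layout, and the accuracy figures are all hypothetical, and ties between equally accurate sources are broken arbitrarily.

```python
def naive_select(values, accuracy):
    """Return the non-null value from the most accurate source.

    values   -- dict mapping source name to its value (None means null)
    accuracy -- dict mapping source name to its assumed overall accuracy
                for the attribute in question
    """
    non_null = {src: val for src, val in values.items() if val is not None}
    if not non_null:
        return None  # every source was null, so there is nothing to select
    best_source = max(non_null, key=lambda src: accuracy[src])
    return non_null[best_source]


# Hypothetical accuracies and values, for illustration only.
accuracy = {"A": 0.9, "B": 0.8, "C": 0.7}
values = {"A": None, "B": "72204", "C": "72201"}
selected = naive_select(values, accuracy)
print(selected)
```

Source A is the most accurate but supplies no value here, so the operator falls back to source B. The operator never considers agreement among the less accurate sources, which is one way it can miss the correct value.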
Entity Information Life Cycle for Big Data: Master Data Management and Information Integration | 2015
John R. Talburt; Yinle Zhou
This chapter discusses the new International Organization for Standardization (ISO) standards related to the exchange of master data. It includes an in-depth look at the ISO 8000 family of standards, including ISO 8000-110, -120, -130, and -140, and their relationship to the ISO 22745-10, -30, and -40 standards. It also explains simple versus strong ISO 8000-110 compliance and discusses the value proposition for ISO 8000 compliance.
Entity Information Life Cycle for Big Data: Master Data Management and Information Integration | 2015
John R. Talburt; Yinle Zhou
This chapter goes into detail about the design considerations surrounding the entity resolution and entity identity information management processes that support the CSRUD life cycle.
Entity Information Life Cycle for Big Data: Master Data Management and Information Integration | 2015
John R. Talburt; Yinle Zhou
This chapter gives a definition of master data management (MDM) and describes how it generates value for organizations. It also provides an overview of Big Data and the challenges it brings to MDM.
Entity Information Life Cycle for Big Data: Master Data Management and Information Integration | 2015
John R. Talburt; Yinle Zhou
This chapter explores the issues around maintaining entity identity integrity over time as entity identity information changes. It explains why both automated and manual update processes are critical for successful ER and MDM processes. It also covers the management and retirement of entity identifiers.
Entity Information Life Cycle for Big Data: Master Data Management and Information Integration | 2015
John R. Talburt; Yinle Zhou
This chapter describes the start of the CSRUD Life Cycle with initial capture and storage of entity identity information. It also discusses the importance of understanding the characteristics of the data, properly preparing the data, selecting identity attributes, and coming up with matching strategies. Perhaps most importantly, it discusses the methods and techniques for evaluating ER outcomes.
Entity Information Life Cycle for Big Data: Master Data Management and Information Integration | 2015
John R. Talburt; Yinle Zhou
This chapter describes how a distributed processing environment such as Hadoop Map/Reduce can be used to support the CSRUD Life Cycle for Big Data. The examples shown in this chapter use the match key blocking described in Chapter 9 as a data partitioning strategy to perform ER on large datasets. The chapter includes an iterative algorithm for finding the transitive closure of multiple match keys in a distributed processing environment while minimizing the amount of local memory required for each processor. It also outlines a structure for an identity knowledge base in a distributed key-value data store, and describes strategies and distributed processing workflows for the capture and update phases of the CSRUD life cycle using both record-based and attribute-based cluster-to-cluster structure projections.
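To make concrete what the transitive closure of multiple match keys computes, the sketch below groups records that are connected through any chain of shared keys. It is not the chapter's distributed, iterative algorithm: it is a single-machine union-find stand-in that holds everything in memory, and the record ids, key names, and function name are hypothetical.

```python
def transitive_closure(record_keys):
    """Cluster records that are linked through any chain of shared match keys.

    record_keys -- dict mapping record id to the set of match keys it generates
    Returns a sorted list of clusters, each a sorted list of record ids.
    """
    parent = {rec: rec for rec in record_keys}

    def find(x):
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as the root so the result is deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # match key -> first record that generated it
    for rec, keys in record_keys.items():
        for key in keys:
            if key in first_seen:
                union(rec, first_seen[key])
            else:
                first_seen[key] = rec

    clusters = {}
    for rec in record_keys:
        clusters.setdefault(find(rec), set()).add(rec)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical records and match keys, for illustration only.
record_keys = {
    "r1": {"k1"},
    "r2": {"k1", "k2"},
    "r3": {"k2"},
    "r4": {"k3"},
}
clusters = transitive_closure(record_keys)
print(clusters)
```

Records r1 and r3 share no key directly, yet they end up in the same cluster because r2 links them. In a distributed setting this chaining is what forces repeated passes over the data, since no single key partition sees all three records at once.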