Hye-Chung Kum
Texas A&M University
Publication
Featured research published by Hye-Chung Kum.
Data Mining and Knowledge Discovery | 2006
Hye-Chung Kum; Joong Hyuk Chang; Wei Wang
To efficiently find global patterns from a multi-database, information in each local database must first be mined and summarized at the local level. Then only the summarized information is forwarded to the global mining process. However, conventional sequential pattern mining methods based on support cannot summarize the local information and are ineffective for global pattern mining from multiple data sources. In this paper, we present an alternative local mining approach for finding sequential patterns in the local databases of a multi-database. We propose the theme of approximate sequential pattern mining, roughly defined as identifying patterns approximately shared by many sequences. Approximate sequential patterns can effectively summarize and represent the local databases by identifying the underlying trends in the data. We present a novel algorithm, ApproxMAP, to mine approximate sequential patterns, called consensus patterns, from large sequence databases in two steps. First, sequences are clustered by similarity. Then, consensus patterns are mined directly from each cluster through multiple alignment. We conduct an extensive and systematic performance study over synthetic and real data. The results demonstrate that ApproxMAP is effective and scalable in mining large sequence databases with long patterns. Hence, ApproxMAP can efficiently summarize a local database and reduce the cost of global mining. Furthermore, we present an elegant and uniform model to identify both high-vote sequential patterns and exceptional sequential patterns from the collection of consensus patterns mined from each local database.
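The two-step idea above can be illustrated with a toy sketch. This is not the published ApproxMAP implementation: it skips the clustering step, aligns every sequence in an already-formed cluster against the longest member (rather than building a true multiple alignment), and treats sequences of single items instead of itemsets. All function names are hypothetical.

```python
from collections import Counter

def align(ref, seq):
    """Global pairwise alignment (Needleman-Wunsch: match +1, mismatch/gap -1).
    Returns gapped copies of ref and seq, using None as the gap symbol."""
    n, m = len(ref), len(seq)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = -i
    for j in range(1, m + 1):
        D[0][j] = -j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = D[i-1][j-1] + (1 if ref[i-1] == seq[j-1] else -1)
            D[i][j] = max(diag, D[i-1][j] - 1, D[i][j-1] - 1)
    ga, gb, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i-1][j-1] + (1 if ref[i-1] == seq[j-1] else -1):
            ga.append(ref[i-1]); gb.append(seq[j-1]); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i-1][j] - 1:
            ga.append(ref[i-1]); gb.append(None); i -= 1
        else:
            ga.append(None); gb.append(seq[j-1]); j -= 1
    return ga[::-1], gb[::-1]

def consensus(cluster, theta=0.5):
    """Derive a consensus pattern from one cluster: align every sequence to the
    longest member, vote per reference position, and keep the majority item at
    each position supported by at least a theta fraction of the cluster."""
    ref = max(cluster, key=len)
    votes = [Counter() for _ in ref]
    for seq in cluster:
        gref, gseq = align(ref, seq)
        pos = 0
        for r, s in zip(gref, gseq):
            if r is not None:            # a reference position
                if s is not None:
                    votes[pos][s] += 1
                pos += 1
    cutoff = theta * len(cluster)
    return [v.most_common(1)[0][0] for v in votes
            if v and v.most_common(1)[0][1] >= cutoff]
```

On a small cluster such as {abcd, abc, acd, abd}, every sequence votes for the positions it aligns to, and the consensus recovers the underlying trend abcd even though no single support-based frequent pattern would.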
Journal of Parallel and Distributed Computing | 1997
Lars S. Nyland; Jan F. Prins; R. H. Yun; Jan Hermans; Hye-Chung Kum; Lei Wang
To achieve scalable parallel performance in molecular dynamics simulations, we have modeled and implemented several dynamic spatial domain decomposition algorithms. The modeling is based upon the bulk synchronous parallel architecture model (BSP), which describes supersteps of computation, communication, and synchronization. Using this model, we have developed prototypes that explore the differing costs of several spatial decomposition algorithms and then use this data to drive implementation of our molecular dynamics simulator, Sigma. The parallel implementation is not bound to the limitations of the BSP model, allowing us to extend the spatial decomposition algorithm. For an initial decomposition, we use one of the successful decomposition strategies from the BSP study and then subsequently use performance data to adjust the decomposition, dynamically improving the load balance. The motivating reason to use historical performance data is that the computation to predict a better decomposition increases in cost with the quality of prediction, while the measurement of past work often has hardware support, requiring only a slight amount of work to modify the decomposition for future simulation steps. In this paper, we present our adaptive spatial decomposition algorithms, the results of modeling them with the BSP, the enhanced spatial decomposition algorithm, and its performance results on computers available locally and at the national supercomputer centers.
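The BSP cost model referenced above can be made concrete with a small sketch. In BSP, one superstep costs roughly w + h*g + L, where w is the maximum local computation on any processor, h the largest number of words any processor sends or receives, g the per-word communication gap, and L the barrier synchronization latency. The functions below are a hypothetical illustration of how such a model lets one compare candidate decompositions on paper; they are not code from the Sigma simulator.

```python
def bsp_superstep_cost(work, words, g, L):
    """Cost of one BSP superstep: the slowest processor dominates both the
    computation term and the h-relation communication term.
    work[i]  = local computation on processor i
    words[i] = words sent/received by processor i (its h-relation size)"""
    return max(work) + max(words) * g + L

def cheaper_decomposition(decomps, g, L):
    """Pick the decomposition with the lowest modeled superstep cost.
    Each decomposition is a (work, words) pair of per-processor lists."""
    return min(decomps, key=lambda d: bsp_superstep_cost(d[0], d[1], g, L))
```

A well-balanced decomposition with slightly higher total work can model as cheaper than a skewed one, since the max() terms, not the sums, determine the superstep cost.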
Journal of the American Medical Informatics Association | 2014
Hye-Chung Kum; Ashok Krishnamurthy; Ashwin Machanavajjhala; Michael K. Reiter; Stanley C. Ahalt
OBJECTIVE Record linkage to integrate uncoordinated databases is critical in biomedical research using Big Data. Balancing privacy protection against the need for high-quality record linkage requires a human-machine hybrid system to safely manage uncertainty in the ever-changing streams of chaotic Big Data. METHODS In the computer science literature, private record linkage is the most published area. It investigates how to apply a known linkage function safely when linking two tables. However, in practice, the linkage function is rarely known. Thus, there are many data linkage centers whose main role is to be the trusted third party to determine the linkage function manually and link data for research via a master population list for a designated region. Recently, a more flexible computerized third-party linkage platform, Secure Decoupled Linkage (SDLink), has been proposed based on: (1) decoupling data via encryption, (2) obfuscation via chaffing (adding fake data) and universe manipulation; and (3) minimum information disclosure via recoding. RESULTS We synthesize this literature to formalize a new framework for privacy preserving interactive record linkage (PPIRL) with tractable privacy and utility properties and then analyze the literature using this framework. CONCLUSIONS Human-based third-party linkage centers for privacy preserving record linkage are the accepted norm internationally. We find that a computer-based third-party platform that can precisely control the information disclosed at the micro level and allow frequent human interaction during the linkage process is an effective human-machine hybrid system that significantly improves on the linkage center model in terms of both privacy and utility.
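The chaffing idea mentioned in (2) can be illustrated with a toy sketch: the data holder mixes fake rows into the table before handing it to the third party and privately remembers which rows are real, so the third party cannot tell real individuals from chaff. This is a hypothetical simplification, not the SDLink protocol; it omits the encryption-based decoupling and recoding steps, and all names are made up.

```python
import random
import string

def chaff(records, n_fake, seed=0):
    """Mix n_fake synthetic rows into `records` and shuffle.
    Returns (mixed_table, real_positions); the data holder keeps
    real_positions secret so only it can separate real rows from chaff."""
    rng = random.Random(seed)
    fake = [{"name": "".join(rng.choices(string.ascii_lowercase, k=6)),
             "dob": "19%02d-%02d-%02d" % (rng.randint(10, 99),
                                          rng.randint(1, 12),
                                          rng.randint(1, 28))}
            for _ in range(n_fake)]
    tagged = [(r, True) for r in records] + [(f, False) for f in fake]
    rng.shuffle(tagged)
    mixed = [row for row, _ in tagged]
    real_positions = [i for i, (_, is_real) in enumerate(tagged) if is_real]
    return mixed, real_positions
```

After linkage, the data holder discards any match decisions involving chaff positions; the third party only ever sees the mixed table.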
IEEE Computer | 2014
Hye-Chung Kum; Ashok Krishnamurthy; Ashwin Machanavajjhala; Stanley C. Ahalt
Data-intensive research using distributed, federated, person-level datasets in near real time has the potential to transform social, behavioral, economic, and health sciences--but issues around privacy, confidentiality, access, and data integration have slowed progress in this area. When technology is properly used to manage both privacy concerns and uncertainty, big data will help move the growing field of population informatics forward.
Data and Knowledge Engineering | 2007
Hye-Chung Kum; Joong Hyuk Chang; Wei Wang
Recently, there has been increasing interest in new intelligent mining methods that find more meaningful and compact results. In intelligent data mining research, assessing the quality and usefulness of the results from different mining methods is essential. However, there are no general benchmarking criteria to evaluate whether these new methods are indeed more effective than the traditional methods. Here we propose novel benchmarking criteria that can systematically evaluate the effectiveness of any sequential pattern mining method under a variety of situations. The benchmark evaluates how well a mining method finds known common patterns in synthetic data. Such an evaluation provides a comprehensive empirical understanding of the patterns generated by any mining method. In this paper, the criteria are applied to conduct a detailed comparison study of the support-based sequential pattern model with an approximate pattern model based on sequence alignment. The study suggests that the alignment model gives a good summary of the sequential data in the form of a set of common patterns in the data. In contrast, the support model generates massive amounts of frequent patterns with much redundancy. This suggests that the results of the support model require more post-processing before they can be of actual use in real applications.
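The benchmark's core measurement, how well mined patterns recover the known patterns planted in synthetic data, can be sketched as follows. This is a simplified item-level proxy (it ignores sequence order) with a hypothetical name, not the published evaluation criteria.

```python
def recoverability(planted, mined):
    """For each known (planted) pattern, find the mined pattern that covers
    the largest fraction of its items, then average over planted patterns.
    1.0 means every planted pattern is fully recovered by some mined pattern."""
    def best_cover(pattern):
        p = set(pattern)
        return max((len(p & set(m)) / len(p) for m in mined), default=0.0)
    return sum(best_cover(p) for p in planted) / len(planted)
```

A complementary redundancy measure (e.g. the ratio of mined patterns to planted patterns) would capture the other finding above: the support model can score well on recovery while emitting far more patterns than were planted.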
Government Information Quarterly | 2009
Hye-Chung Kum; Dean F. Duncan; C. Joy Stewart
Abstract The business sector has already recognized the importance of information flow for good management, with many businesses adopting new technology in data mining and data warehousing for intelligent operation based on the free flow of information. Free flow of information in government agencies is just as important. For example, in child welfare, entities that fund social services programs have increasingly demanded improved outcomes for clients in return for continued financial support. To this end, most child welfare agencies are paying more attention to the outcomes of children in their care. In North Carolina, many county departments of social services have successfully adopted the self-evaluation model to monitor the effects of their programs on the outcomes of children. Such self-evaluation efforts require good information flow from the state division of social services to the county departments of social services. In this paper, we propose a comprehensive KDD (Knowledge Discovery and Data mining) information system that could upgrade information flow in government agencies. We present the key elements of the information system and demonstrate how such a system could be successfully implemented via a case study in North Carolina. The next-generation infrastructure in digital government must incorporate such an information system to enable effective information flow in government agencies without compromising individual privacy.
Information Sciences | 2009
Joong Hyuk Chang; Hye-Chung Kum
Usually the data generation rate of a data stream is unpredictable, and some data elements of the stream cannot be processed in real time if the generation rate exceeds the capacity of a data stream processing algorithm. To handle this situation gracefully, a load shedding technique is recommended. This paper proposes a frequency-based load shedding technique over a data stream of tuples. In many data stream processing applications, such as mining frequent patterns, data elements having high frequency can be considered more significant than those having low frequency. Based on this observation, in the proposed technique only frequent elements of a data stream are processed in real time while the others are trimmed. The decision whether to shed load from the data stream is controlled automatically by the data generation rate of the stream. Consequently, unnecessary load shedding operations are avoided in the proposed technique.
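The idea of shedding only infrequent elements, and only under overload, can be sketched in a few lines. This is a hypothetical toy model, not the paper's algorithm: it uses exact running counts instead of an approximate frequency summary, and models capacity as a fixed per-window tuple budget.

```python
from collections import Counter

class FrequencyShedder:
    """Shed infrequent tuples, but only when a window's arrival volume
    exceeds the processing budget; under normal load nothing is dropped."""
    def __init__(self, min_frac=0.5, budget=100):
        self.counts = Counter()   # running frequency of each element
        self.seen = 0             # total tuples observed so far
        self.min_frac = min_frac  # keep elements at >= this fraction of the stream
        self.budget = budget      # tuples we can fully process per window

    def process_window(self, window):
        kept = []
        overload = len(window) > self.budget   # shedding engages only on overload
        for x in window:
            self.counts[x] += 1
            self.seen += 1
            if not overload or self.counts[x] >= self.min_frac * self.seen:
                kept.append(x)
        return kept
```

When a burst arrives, only the elements whose running frequency clears the threshold survive; once the rate drops back under the budget, the shedder passes everything through, so no unnecessary shedding occurs.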
Lecture Notes in Computer Science | 1998
Lars S. Nyland; Jan F. Prins; R. H. Yun; Jan Hermans; Hye-Chung Kum; Lei Wang
To achieve scalable parallel performance in molecular dynamics simulation, we have modeled and implemented several dynamic spatial domain decomposition algorithms. The modeling is based upon Valiant's Bulk Synchronous Parallel architecture model (BSP), which describes supersteps of computation, communication, and synchronization. We have developed prototypes that estimate the differing costs of several spatial decomposition algorithms using the BSP model.
Child Maltreatment | 2008
Dean F. Duncan; Hye-Chung Kum; Elizabeth C. Weigensberg; Kimberly Flair; C. Joy Stewart
Proper management and implementation of an effective child welfare agency requires the constant use of information about the experiences and outcomes of children involved in the system, emphasizing the need for comprehensive, timely, and accurate data. In the past 20 years, there have been many advances in technology that can maximize the potential of administrative data to promote better evaluation and management in the field of child welfare. Specifically, this article discusses the use of knowledge discovery and data mining (KDD), which makes it possible to create longitudinal data files from administrative data sources, extract valuable knowledge, and make the information available via a user-friendly public Web site. This article demonstrates a successful project in North Carolina where knowledge discovery and data mining technology was used to develop a comprehensive set of child welfare outcomes available through a public Web site to facilitate information sharing of child welfare data to improve policy and practice.
Journal of Rural Health | 2015
Chinedum O. Ojinnaka; Yong Choi; Hye-Chung Kum; Jane N. Bolin
PURPOSE The purpose of this study was to explore the associations between sociodemographic factors such as residence, health care access, and colorectal cancer (CRC) screening among residents of Texas. METHODS Using the 2012 Behavioral Risk Factor Surveillance Survey, we performed logistic regression analyses to determine predictors of CRC screening among Texas residents, including rural versus urban differences. Our outcomes of interest were previous (1) CRC screening using any CRC test, (2) fecal occult blood test (FOBT), or (3) endoscopy, as well as up-to-date screening using (4) any CRC test, (5) FOBT, or (6) endoscopy. The independent variable of interest was rural versus urban residence; we controlled for other sociodemographic and health care access variables such as lack of health insurance. RESULTS Multivariate analysis showed that individuals who were residents of a rural/non-Metropolitan Statistical Area (MSA) location (OR = 0.70, 95% CI = 0.51-0.97) or a suburban county (OR = 0.61, 95% CI = 0.39-0.95) were less likely to report ever having any CRC screening compared to residents of a center city of an MSA. Residents of a rural/non-MSA location were less likely (OR = 0.49, 95% CI = 0.28-0.87) than residents of a center city of an MSA to be up-to-date using FOBT. There was decreased likelihood of ever being screened for CRC among the uninsured (OR = 0.43, 95% CI = 0.31-0.59). CONCLUSIONS Effective development and implementation of strategies to improve screening rates should aim at improving access to health care, taking into account demographic characteristics such as rural versus urban residence.
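The odds ratios and confidence intervals reported above come from logistic regression coefficients; the conversion is worth making explicit. A minimal sketch (the coefficients and standard errors in the test are made-up numbers, not values from the study):

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Turn a logistic-regression coefficient and its standard error into an
    odds ratio with a Wald confidence interval: OR = exp(beta),
    CI = (exp(beta - z*se), exp(beta + z*se)). z = 1.96 gives a 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

An OR below 1 with a CI that excludes 1 (e.g. the rural/non-MSA estimate OR = 0.70, CI = 0.51-0.97) indicates a significantly lower screening likelihood relative to the reference group.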