Liqiang Geng
National Research Council
Publication
Featured research published by Liqiang Geng.
ACM Computing Surveys | 2006
Liqiang Geng; Howard J. Hamilton
Interestingness measures play an important role in data mining, regardless of the kind of patterns being mined. These measures are intended for selecting and ranking patterns according to their potential interest to the user. Good measures also allow the time and space costs of the mining process to be reduced. This survey reviews the interestingness measures for rules and summaries, classifies them from several perspectives, compares their properties, identifies their roles in the data mining process, gives strategies for selecting appropriate measures for applications, and identifies opportunities for future research in this area.
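As an illustration of what an interestingness measure for rules looks like in practice, the sketch below computes three widely used objective measures (support, confidence, and lift) for an association rule over a small set of transactions. The abstract does not name these measures; they are chosen here only as common examples, and the function names and toy data are hypothetical.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent) -> float:
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support(transactions, set(antecedent) | set(consequent)) / base


def lift(transactions, antecedent, consequent) -> float:
    """Confidence divided by the consequent's support; 1.0 suggests independence."""
    cons = support(transactions, consequent)
    if cons == 0:
        raise ValueError("consequent never occurs; lift is undefined")
    return confidence(transactions, antecedent, consequent) / cons


# Toy data (hypothetical): four market-basket transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
rule_lift = lift(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence, rule_lift)
```

Ranking candidate rules by one of these values, or pruning rules below a threshold during mining, is one way such measures can both select patterns and reduce mining cost.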
Quality Measures in Data Mining | 2007
Liqiang Geng; Howard J. Hamilton
Interestingness measures play an important role in data mining regardless of the kind of patterns being mined. Good measures should select and rank patterns according to their potential interest to the user. Good measures should also reduce the time and space cost of the mining process. This survey reviews the interestingness measures for rules and summaries, classifies them from several perspectives, compares their properties, identifies their roles in the data mining process, and reviews the analysis methods and selection principles for appropriate measures for applications.
Journal of Applied Logic | 2006
Howard J. Hamilton; Liqiang Geng; Leah Findlater; Dee Jay Randall
We describe a method for spatio-temporal data mining based on GenSpace graphs. Using familiar calendar and geographical concepts, such as workdays, weeks, climatic regions, and countries, spatio-temporal data can be aggregated into summaries in many ways. We automatically search for a summary with a distribution that is anomalous, i.e., far from user expectations. We repeatedly rank possible summaries according to current expectations, and then allow the user to adjust these expectations. We also choose a propagation path in the GenSpace subgraph that reduces the storage and time costs of the mining process.
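The idea of ranking summaries by how far their distributions lie from user expectations can be sketched as follows. This is not the GenSpace algorithm itself: the distance function (total variation distance), the summary names, and the counts are all assumptions made for illustration, and the abstract does not say which distance the authors use.

```python
from typing import Dict, List, Tuple


def normalize(counts: Dict[str, float]) -> Dict[str, float]:
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def total_variation(observed: Dict[str, float], expected: Dict[str, float]) -> float:
    """Total variation distance between two discrete distributions (assumed measure)."""
    keys = set(observed) | set(expected)
    return 0.5 * sum(abs(observed.get(k, 0.0) - expected.get(k, 0.0)) for k in keys)


def rank_summaries(
    summaries: Dict[str, Dict[str, float]],
    expectations: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Order summaries from most to least anomalous relative to expectations."""
    scored = [
        (name, total_variation(normalize(counts), expectations[name]))
        for name, counts in summaries.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical summaries of the same data at two levels of aggregation.
summaries = {
    "by_day_type": {"workday": 50, "weekend": 50},
    "by_region": {"north": 30, "south": 70},
}
# Hypothetical user expectations for each summary.
expectations = {
    "by_day_type": {"workday": 5 / 7, "weekend": 2 / 7},
    "by_region": {"north": 0.3, "south": 0.7},
}

ranking = rank_summaries(summaries, expectations)
print(ranking)
```

In an interactive loop, the user would inspect the top-ranked summary, revise the corresponding entry in `expectations`, and rank again.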
international symposium on methodologies for intelligent systems | 2009
Liqiang Geng; Scott Buffett; Bruce Hamilton; Xin Wang; Larry Korba; Hongyu Liu; Yunli Wang
Workflow mining aims to find graph-based process models based on activities, emails, and various event logs recorded in computer systems. Current workflow mining techniques mainly deal with well-structured and well-symbolized event logs. In most real applications where workflow management software tools are not installed, these structured and symbolized logs are not available. Instead, the artifacts of daily computer operations may be readily available. In this paper, we propose a method to map these artifacts and content-based logs to structured logs so as to bridge the gap between the unstructured logs of real-life situations and the status quo of workflow mining techniques. Our method consists of two tasks: discovering workflow instances and discovering activity types. We use a clustering method to tackle the first task and a classification method to tackle the second. We propose a method to combine these two tasks to improve the performance of the two as a whole. Experimental results on simulated data show the effectiveness of our method.
canadian conference on artificial intelligence | 2008
Liqiang Geng; Larry Korba; Yunli Wang; Xin Wang; Yonghua You
In this paper, we present a method to identify topics in email messages. Formal concept analysis is adopted as a semantic analysis method to group emails containing the same keywords into concepts. Fuzzy membership functions are used to rank the concepts based on the features of the emails, such as the senders, recipients, time span, and frequency of emails in the concepts. The highly ranked concepts are then identified as email topics. Experimental results on the Enron email dataset illustrate the effectiveness of the method.
business process management | 2008
Scott Buffett; Liqiang Geng
We investigate a method designed to improve the accuracy of workflow mining in the case that the identification of task labels for log events is uncertain. Here we consider how the accuracy of an independent task identifier, such as a classification or clustering engine, can be improved by examining workflow. We first briefly introduce the notion of iterative workflow mining, where the mined workflow is used to help improve the true task labelings which, when re-mined, will produce a more accurate workflow model. We then demonstrate a Bayesian updating approach to determining posterior probabilities for each label for a given event, by considering the probabilities from the previous step as well as information about the beliefs in the labels that can be gained by examining the workflow model. Experiments show that labeling accuracy can be increased significantly, resulting in more accurate workflow models.
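The Bayesian updating step described in this abstract can be illustrated with a minimal sketch. It assumes the task identifier supplies a prior probability for each candidate label and that the mined workflow model supplies a likelihood for each label given the event's context. The label names, the numbers, and the `bayes_update` helper are hypothetical, and the abstract does not specify how the authors derive workflow-based evidence.

```python
from typing import Dict


def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Return posterior label probabilities proportional to prior * likelihood."""
    unnormalized = {
        label: prior[label] * likelihood.get(label, 0.0) for label in prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        # The workflow evidence rules out every label; fall back to the prior.
        return dict(prior)
    return {label: value / total for label, value in unnormalized.items()}


# Prior from the independent task identifier (hypothetical values).
prior = {"review": 0.5, "approve": 0.3, "archive": 0.2}
# How plausible each label is at this position in the mined workflow
# (hypothetical values standing in for workflow-derived evidence).
likelihood = {"review": 0.2, "approve": 0.7, "archive": 0.1}

posterior = bayes_update(prior, likelihood)
best_label = max(posterior, key=posterior.get)
print(posterior, best_label)
```

In an iterative scheme, the posterior from one round would serve as the prior for the next, after the workflow is re-mined from the updated labels.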
cooperative design visualization and engineering | 2007
Larry Korba; Ronggong Song; George Yee; Andrew S. Patrick; Scott Buffett; Yunli Wang; Liqiang Geng
Organizations are under increasing pressure to manage all of the personal data concerning their customers and employees in a responsible manner. With the advancement of information and communication technologies, improved collaboration, and the pressures of marketing, it is very difficult to locate where personal data is, let alone manage its use. In this paper, we outline the challenges of managing personally identifiable information in a collaborative environment, and describe a software prototype we call SNAP (Social Networking Applied to Privacy). SNAP uses automated workflow discovery and analysis, in combination with various text mining techniques, to support automated enterprise management of personally identifiable information.
trust and privacy in digital business | 2011
Liqiang Geng; Yonghua You; Yunli Wang; Hongyu Liu
Privacy compliance for free text documents is a challenge facing many organizations. Named entity recognition techniques and machine learning methods can be used to detect private information, such as personally identifiable information (PII) and personal health information (PHI), in free text documents. However, these methods cannot measure the level of privacy embodied in the documents. In this paper, we propose a framework to measure the privacy content in free text documents. The measure consists of two factors: the probability that the text can be used to uniquely identify a person and the degree of sensitivity of the private entities associated with the person. We then instantiate the framework in the scenario of detection and protection of PHI in medical records, which is a challenge for many hospitals, clinics, and other medical institutions. We conducted experiments on a real dataset to show the effectiveness of the proposed measure.
cooperative design visualization and engineering | 2008
Larry Korba; Yunli Wang; Liqiang Geng; Ronggong Song; George Yee; Andrew S. Patrick; Scott Buffett; Hongyu Liu; Yonghua You
With the growing use of computers and the Internet, it has become difficult for organizations to locate and effectively manage sensitive personally identifiable information (PII). This problem becomes even more evident in collaborative computing environments. PII may be hidden anywhere within the file system of a computer. As well, in the course of different activities, via collaboration or not, personally identifiable information may migrate from computer to computer. This makes meeting the organizational privacy requirements all the more complex. Our particular interest is to develop technology that would automatically discover workflow across organizational collaborators that would include private data. Since in this context it is important to understand where and when the private data is discovered, in this paper we focus on PII discovery, i.e., automatically identifying private data existing in semi-structured and unstructured (free text) documents. The first part of the process involves identifying PII via named entity recognition. The second part determines relationships between those entities based upon a supervised machine learning method. We present test results of our methods using publicly available data generated from different collaborative activities to provide an assessment of scalability in cooperative computing environments.
international symposium on temporal representation and reasoning | 2003
Howard J. Hamilton; Liqiang Geng; Leah Findlater; Dee Jay Randall
We describe a method for spatio-temporal data mining based on expected distribution domain generalization (ExGen) graphs. Using familiar calendar and geographical concepts, such as workdays, weeks, climatic regions, and countries, spatio-temporal data can be aggregated into summaries in many ways. We automatically search for a summary with a distribution that is anomalous, i.e., far from user expectations. We repeatedly rank possible summaries according to current expectations, and then allow the user to adjust these expectations.