Yongjian Fu | Researchain

Publication

Featured research published by Yongjian Fu.

International Conference on Parallel and Distributed Information Systems | 1996

A fast distributed algorithm for mining association rules

David W. Cheung; Jiawei Han; Vincent T. Y. Ng; Ada Wai-Chee Fu; Yongjian Fu

With the existence of many large transaction databases, the huge amounts of data, the high scalability of distributed systems, and the easy partitioning and distribution of a centralized database, it is important to investigate efficient methods for distributed mining of association rules. The study discloses some interesting relationships between locally large and globally large item sets and proposes a distributed association rule mining algorithm, FDM (fast distributed mining of association rules), which generates a small number of candidate sets and substantially reduces the number of messages passed when mining association rules. A performance study shows that FDM outperforms the direct application of a typical sequential algorithm. Further performance enhancement leads to a few variations of the algorithm.
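One relationship between locally and globally large item sets can be checked directly: an item set that is large over the whole database must be large at one site at least, so the union of the locally large sets is a safe candidate pool. The following is a minimal sketch of that property only, not the FDM algorithm itself; the toy data, the threshold, and all function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def support_counts(transactions, k):
    """Count how many transactions contain each k-item subset."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(set(t)), k):
            counts[itemset] += 1
    return counts


def large_itemsets(transactions, k, min_support):
    """Return k-itemsets whose count is at least min_support * len(transactions)."""
    threshold = min_support * len(transactions)
    return {s for s, c in support_counts(transactions, k).items() if c >= threshold}


# Hypothetical toy data: three sites, each holding one partition of the database.
sites = [
    [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]],
    [["a", "b"], ["a", "b", "d"], ["b", "d"]],
    [["c", "d"], ["a", "c", "d"], ["c", "d"], ["a", "b"], ["d"]],
]
MIN_SUPPORT = 0.4  # assumed relative support threshold
K = 2              # item set size examined in this sketch

# Each site finds its own large item sets; their union is the candidate pool.
locally_large = [large_itemsets(site, K, MIN_SUPPORT) for site in sites]
candidates = set().union(*locally_large)

# Reference answer computed over the whole (centralized) database.
all_transactions = [t for site in sites for t in site]
globally_large = large_itemsets(all_transactions, K, MIN_SUPPORT)

print(globally_large <= candidates)  # every globally large set is a candidate
```

Only the candidates need their support counts exchanged between sites, which is where the reduction in messages would come from.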

IEEE Transactions on Knowledge and Data Engineering | 1996

Efficient mining of association rules in distributed databases

David W. Cheung; Vincent T. Y. Ng; Ada Wai-Chee Fu; Yongjian Fu

Many sequential algorithms have been proposed for the mining of association rules. However, very little work has been done on mining association rules in distributed databases. A direct application of sequential algorithms to distributed databases is not effective, because it incurs a large amount of communication overhead. In this study, an efficient algorithm called DMA (Distributed Mining of Association rules) is proposed. It generates a small number of candidate sets and requires only O(n) messages for support-count exchange for each candidate set, where n is the number of sites in a distributed database. The algorithm has been implemented on an experimental testbed, and its performance is studied. The results show that DMA outperforms the direct application of a popular sequential algorithm in distributed databases.
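The abstract does not spell out how the O(n) bound is reached. As a rough illustration, the sketch below compares two message patterns for a single candidate set: every site broadcasting its local count to every other site, and every site reporting to one coordinating site that then sends back the result. The coordinating-site arrangement and the function names are assumptions, not details taken from the abstract.

```python
def messages_all_to_all(n_sites):
    """Every site sends its local count to every other site."""
    return n_sites * (n_sites - 1)


def messages_with_coordinator(n_sites):
    """Other sites send counts to one coordinator, which sends back the result."""
    return 2 * (n_sites - 1)


for n in (2, 4, 8, 16):
    print(n, messages_all_to_all(n), messages_with_coordinator(n))
```

The first pattern grows quadratically with the number of sites, while the second grows linearly, which is consistent with an O(n) bound per candidate set.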

Knowledge Discovery and Data Mining | 1999

A Generalization-Based Approach to Clustering of Web Usage Sessions

Yongjian Fu; Kanwalpreet Sandhu; Ming-Yi Shih

The clustering of Web usage sessions based on the access patterns is studied. Access patterns of Web users are extracted from Web server log files, and then organized into sessions which represent episodes of interaction between the Web users and the Web server. Using attribute-oriented induction, the sessions are then generalized according to a page hierarchy which organizes pages based on their contents. These generalized sessions are finally clustered using a hierarchical clustering method. Our experiments on a large real data set show that the approach is efficient and practical for Web mining applications.
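The generalize-then-cluster pipeline can be sketched as follows. The page hierarchy, the sessions, the Jaccard distance, and the single-link merging rule are all assumptions chosen for illustration; the abstract names only attribute-oriented induction and a hierarchical clustering method, so this is not the authors' implementation.

```python
# Hypothetical page hierarchy: each page or category maps to its parent category.
PARENT = {
    "/sports/nba/scores.html": "nba",
    "/sports/nba/news.html": "nba",
    "/sports/nfl/scores.html": "nfl",
    "/tech/ai/intro.html": "ai",
    "/tech/web/css.html": "web",
    "nba": "sports",
    "nfl": "sports",
    "ai": "tech",
    "web": "tech",
}


def generalize(page, levels):
    """Climb the hierarchy `levels` steps, stopping early at a top-level category."""
    node = page
    for _ in range(levels):
        if node not in PARENT:
            break
        node = PARENT[node]
    return node


def generalize_session(session, levels):
    """Replace each page by its ancestor; duplicates collapse into one category."""
    return frozenset(generalize(p, levels) for p in session)


def jaccard_distance(a, b):
    """Distance between two non-empty sets of categories."""
    return 1.0 - len(a & b) / len(a | b)


def single_link_clusters(items, max_distance):
    """Merge the closest pair of clusters until the smallest gap exceeds max_distance."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = min(jaccard_distance(items[i], items[j])
                        for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > max_distance:
            break
        _, x, y = best
        clusters[x].extend(clusters.pop(y))
    return clusters


sessions = [
    ["/sports/nba/scores.html", "/sports/nba/news.html"],
    ["/sports/nfl/scores.html", "/sports/nba/scores.html"],
    ["/tech/ai/intro.html", "/tech/web/css.html"],
    ["/tech/ai/intro.html"],
]
generalized = [generalize_session(s, levels=2) for s in sessions]
print(single_link_clusters(generalized, max_distance=0.5))
```

Generalizing before clustering shrinks each session to a handful of categories, so sessions that touch different pages on the same topic end up close together.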

IEEE Transactions on Knowledge and Data Engineering | 1996

Intelligent query answering by knowledge discovery techniques

Jiawei Han; Yue Huang; Nick Cercone; Yongjian Fu

Knowledge discovery facilitates querying database knowledge and intelligent query answering in database systems. We investigate the application of discovered knowledge, concept hierarchies, and knowledge discovery tools for intelligent query answering in database systems. A knowledge-rich data model is constructed to incorporate discovered knowledge and knowledge discovery tools. Queries are classified into data queries and knowledge queries. Both types of queries can be answered directly by simple retrieval or intelligently by analyzing the intent of the query and providing generalized, neighborhood, or associated information using stored or discovered knowledge. Techniques have been developed for intelligent query answering using discovered knowledge and/or knowledge discovery tools, which include generalization, data summarization, concept clustering, rule discovery, query rewriting, deduction, lazy evaluation, application of multiple-layered databases, etc. Our study shows that knowledge discovery substantially broadens the spectrum of intelligent query answering and may have deep implications for query answering in data- and knowledge-base systems.

Conference on Information and Knowledge Management | 2001

Reorganizing web sites based on user access patterns

Yongjian Fu; Mario Creado; Chunhua Ju

In this paper, an approach for reorganizing Web sites based on user access patterns is proposed. The approach consists of three steps: preprocessing, page classification, and site reorganization. In preprocessing, pages on a Web site are processed to create an internal representation of the site, and page access information of its users is extracted from its server log. In page classification, the Web pages on the site are classified into two categories, index pages and content pages, based on the page access information. After the pages are classified, in site reorganization, the Web site is examined to find better ways to organize and arrange the pages on the site. Our experiments on a large real data set show that the approach is efficient and practical for adaptive Web sites.

International Conference on Management of Data | 1994

DBLearn: a system prototype for knowledge discovery in relational databases

Jiawei Han; Yongjian Fu; Yue Huang; Yandong Cai; Nick Cercone

A prototype data mining system, DBLearn, has been developed, which efficiently and effectively extracts different kinds of knowledge rules from relational databases. It has the following features: high-level learning interfaces, tight integration with commercial relational database systems, automatic refinement of concept hierarchies, efficient discovery algorithms, and good performance. Substantial extensions of its knowledge discovery power towards knowledge mining in object-oriented, deductive, and spatial databases are under research and development.

Knowledge Discovery and Data Mining | 2008

On privacy in time series data mining

Ye Zhu; Yongjian Fu; Huirong Fu

Traditional research on preserving privacy in data mining focuses on time-invariant privacy issues. With the emergence of time series data mining, traditional snapshot-based privacy issues need to be extended to be multi-dimensional with the addition of a time dimension. We find that current techniques to preserve privacy in data mining are not effective in preserving time-domain privacy. We present a data flow separation attack on privacy in time series data mining, which is based on blind source separation techniques from statistical signal processing. Our experiments with real data show that this attack is effective. By combining the data flow separation method and the frequency matching method, an attacker can identify data sources and compromise time-domain privacy. We propose possible countermeasures to the data flow separation attack in the paper.

Lecture Notes in Computer Science | 1999

Improving High-Dimensional Indexing with Heuristics for Content-Based Image Retrieval

Yongjian Fu; Jui-Che Teng

Most high-dimensional indexing structures proposed for similarity queries in content-based image retrieval (CBIR) systems are tree-structured. The quality of a high-dimensional tree-structured index is mainly determined by its insertion algorithm. Our approach focuses on an important phase in insertion, that is, the tree descending phase, when the tree is explored to find a host node to accommodate the vector to be inserted. We propose to integrate a heuristic algorithm in tree descending in order to find a better host node and thus improve the quality of the resulting index. A heuristic criterion for child selection has been developed, which takes into account both the similarity-based distance and the radius increase of the potential host node. Our approach has been implemented and tested on an image database. Our experiments show that the proposed approach can improve the quality of high-dimensional indices without much run-time overhead.

International Journal of Data Warehousing and Mining | 2011

Preserving Privacy in Time Series Data Mining

Yongjian Fu; Ye Zhu; Huirong Fu

Time series data mining poses new challenges to privacy. Through extensive experiments, the authors find that existing privacy-preserving techniques such as aggregation and adding random noise are insufficient against privacy attacks such as the data flow separation attack. This paper also presents a general model for publishing and mining time series data and its privacy issues. Based on the model, a spectrum of privacy-preserving methods is proposed. For each method, effects on classification accuracy, aggregation error, and privacy leak are studied. Experiments are conducted to evaluate the performance of the methods. The results show that the methods can effectively preserve privacy without losing much classification accuracy and within a specified limit of aggregation error.

International Journal of Data Mining, Modelling and Management | 2010

On privacy-preserving time series data classification

Ye Zhu; Yongjian Fu; Huirong Fu

In this paper, we propose discretisation-based schemes to preserve privacy in time series data mining. Traditional research on preserving privacy in data mining focuses on time-invariant privacy issues. With the emergence of time series data mining, traditional snapshot-based privacy issues need to be extended to be multi-dimensional with the addition of a time dimension. We define three threat models based on the trust relationship between the data miner and data providers, and we propose a different scheme for each threat model. The proposed schemes are extensively evaluated against publicly available time series datasets. Our experiments show that the proposed schemes can preserve privacy at the cost of reduced mining accuracy. For most datasets, the proposed schemes can achieve low privacy leakage with a slight reduction in classification accuracy. We also study the effect of the parameters of the proposed schemes.
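The abstract does not describe the three schemes themselves. To give a feel for what discretising a time series means, the sketch below uses plain equal-width binning, a generic technique assumed here for illustration; the function name, the sample readings, and the bin count are hypothetical.

```python
def discretise(series, n_bins):
    """Map each value to the index of its equal-width bin over the series range."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant series carries a single level; put everything in bin 0.
        return [0] * len(series)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in series]


readings = [0.0, 1.0, 2.0, 3.0, 4.0, 8.0]  # hypothetical sensor readings
print(discretise(readings, n_bins=4))
```

Fewer bins hide more detail of the original values, which would typically reduce what a miner can learn and also what a classifier can use, in line with the trade-off the abstract reports.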
