Zhou Shuigeng
Fudan University
Publication
Featured research published by Zhou Shuigeng.
Journal of Computer Science and Technology | 2000
Zhou Aoying; Zhou Shuigeng; Cao Jing; Fan Ye; Hu Yunfa
The huge amount of information stored in databases owned by corporations (e.g., retail, financial, telecom) has spurred tremendous interest in knowledge discovery and data mining. Clustering is a useful data mining technique for discovering interesting data distributions and patterns in the underlying data, and it has many application fields, such as statistical data analysis, pattern recognition, image processing, and other business applications. Although researchers have worked on clustering algorithms for decades and have developed many of them, there is still no efficient algorithm for clustering very large databases and high-dimensional data. As an outstanding representative of clustering algorithms, the DBSCAN algorithm performs well in spatial data clustering. However, for large spatial databases, DBSCAN requires a large volume of memory and could incur substantial I/O costs because it operates directly on the entire database. In this paper, several approaches are proposed to scale the DBSCAN algorithm to large spatial databases. To begin with, a fast DBSCAN algorithm is developed, which considerably speeds up the original DBSCAN algorithm. Then a sampling-based DBSCAN algorithm, a partitioning-based DBSCAN algorithm, and a parallel DBSCAN algorithm are introduced in turn. Following that, a synthetic algorithm based on the above algorithms is also given. Finally, some experimental results are given to demonstrate the effectiveness and efficiency of these algorithms.
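For orientation, the baseline that the abstract above sets out to scale can be sketched as follows. This is a minimal, in-memory version of the original DBSCAN procedure, not the fast, sampling-based, partitioning-based, or parallel variants proposed in the paper. The function names `region_query` and `dbscan` and the parameter names `eps` and `min_pts` are illustrative choices, and the linear-scan neighborhood query is an assumption made for brevity.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels
```

Calling `dbscan(points, eps=1.5, min_pts=3)` on a list of coordinate tuples returns one label per point. Every call to `region_query` here scans the whole data set held in memory, which illustrates why operating directly on the entire database becomes expensive as it grows.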
Journal of Computer Science and Technology | 2000
Zhou Aoying; Jin Wen; Zhou Shuigeng; Qian Weining; Tian Zengping
Semistructured data are specified without any fixed and rigid schema, even though some implicit structure typically appears in the data. The huge number of on-line applications makes it important and imperative to mine the schema of semistructured data, both for users (e.g., to gather useful information and facilitate querying) and for systems (e.g., to optimize access). The critical problem is to discover the hidden structure in the semistructured data. Current methods for extracting Web data structure are either general and independent of application background, or bound to some concrete environment such as HTML or XML. Both, however, are costly and have difficulty keeping up with the frequent and complicated changes in Web data. In this paper, the problem of incrementally mining the schema of semistructured data after the raw data are updated is discussed. An algorithm for incrementally mining the schema of semistructured data is provided, and some experimental results are also given, which show that incremental mining for semistructured data is more efficient than non-incremental mining.
Chinese Physics Letters | 2015
Li Ling; Guan Ji-hong; Zhou Shuigeng
Control, especially efficiency control of dynamical processes, has become a major challenge in many complex systems. We study an important dynamical process, the random walk, because of its wide range of applications in modeling transport and search processes. Because control methods for random walks on various structures are lacking, a control technique is presented for a class of weighted treelike scale-free networks with a deep trap at a hub node. The weighted networks are obtained from the original models by introducing a weight parameter. We compute analytically the mean first passage time (MFPT) as an indicator for quantitatively measuring the efficiency of the random walk process. The results show that the MFPT increases exponentially with the network size, and the exponent varies with the weight parameter. The MFPT, therefore, can be controlled by the weight parameter to behave superlinearly, linearly, or sublinearly with the system size. This work provides further useful insights into controlling efficiency in scale-free complex networks.
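The paper derives the MFPT analytically for a specific family of networks. As a rough numerical illustration only, the quantity itself can be computed on any small weighted graph by solving the linear system T_i = 1 + sum_j p_ij T_j with T_trap = 0. The sketch below assumes that the walker moves along an edge with probability proportional to its weight, which is a common convention for weighted random walks but is not confirmed by the abstract. The function name `mean_first_passage_times` and the edge-dictionary input format are hypothetical.

```python
from fractions import Fraction

def mean_first_passage_times(weights, trap):
    """Exact MFPT from every node to `trap` on a connected undirected weighted graph.

    weights: dict mapping (u, v) edge tuples to positive weights.
    Assumption: transition probability p_ij = w_ij / sum_k w_ik.
    """
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = Fraction(w)
        adj.setdefault(v, {})[u] = Fraction(w)
    nodes = [n for n in adj if n != trap]
    index = {n: k for k, n in enumerate(nodes)}
    size = len(nodes)
    # Augmented matrix for (I - P) T = 1, restricted to non-trap nodes.
    rows = [[Fraction(0)] * (size + 1) for _ in range(size)]
    for n in nodes:
        row = rows[index[n]]
        strength = sum(adj[n].values())
        row[index[n]] += 1
        row[size] = Fraction(1)
        for m, w in adj[n].items():
            if m != trap:  # T_trap = 0, so its column drops out
                row[index[m]] -= w / strength
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(size):
        pivot = next(k for k in range(col, size) if rows[k][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]
        for k in range(size):
            if k != col and rows[k][col] != 0:
                f = rows[k][col]
                rows[k] = [a - f * b for a, b in zip(rows[k], rows[col])]
    result = {n: rows[index[n]][size] for n in nodes}
    result[trap] = Fraction(0)
    return result
```

For a three-node path 0-1-2 with the trap at node 0, raising the weight of the edge toward the trap shortens the expected trapping time from node 1, a toy version of tuning efficiency through a weight parameter.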
Geo-spatial Information Science | 2004
Wang Leichun; Guan Jihong; Zhou Shuigeng
In recent years, Web services and Peer-to-Peer (or simply P2P) computing have emerged as two of the hottest research topics in network computing. On the one hand, by adopting a decentralized, network-based style, P2P technologies can enhance the overall reliability and fault tolerance of systems, increase autonomy, and enable ad hoc communication and collaboration. On the other hand, Web services provide a good approach to integrating various heterogeneous systems and applications into a cooperative environment. This paper presents techniques for combining Web services and P2P technologies in GIS to construct a new generation of GIS that is more flexible and cooperative. As a case study, an ongoing project, JGWS, is introduced, which is an experimental GIS Web services platform built on JXTA. This paper also explores schemes for building GIS Web services in a P2P environment.
Journal of Computer Science and Technology | 2000
Zhou Aoying; Zhou Shuigeng; Jin Wen; Tian Zengping
The problem of association rule mining has gained considerable prominence in the data mining community for its use as an important tool for knowledge discovery from large-scale databases, and there has been a spurt of research activity around this problem. Traditional association rule mining is limited to intra-transaction associations. Only recently was the concept of the N-dimensional inter-transaction association rule (NDITAR) proposed by H.J. Lu. This paper modifies and extends Lu’s definition of NDITAR based on an analysis of its limitations, and subsequently introduces the generalized multidimensional association rule (GMDAR), which is more general, flexible, and reasonable than NDITAR.
Archive | 2005
Guan Jihong; Zhou Shuigeng; Bian Fuling
Caai Transactions on Intelligent Systems | 2011
Zhou Shuigeng
Computer Engineering and Applications | 2008
Zhou Shuigeng
Archive | 2014
Zhou Shuigeng; Guan Jihong; Li Danqing; Zhu Xiaoran; Zhou Ye; Wang Haiqing
Communications in Theoretical Physics | 2010
Huang Dingjiang; Zhou Shuigeng; Mei Jian-Qin; Zhang Hong-Qing