Shuigeng Zhou | Researchain

Publication

Featured researches published by Shuigeng Zhou.

Archive | 2004

Conceptual Modeling – ER 2004

Paolo Atzeni; Wesley W. Chu; Hongjun Lu; Shuigeng Zhou; Tok Wang Ling

The envisioned Semantic Web aims to provide richly annotated and explicitly structured Web pages in XML, RDF, or description logics, based upon underlying ontologies and thesauri. Ideally, this should enable a wealth of query processing and semantic reasoning capabilities using XQuery and logical inference engines. However, we believe that the diversity and uncertainty of terminologies and schema-like annotations will make precise querying on a Web scale extremely elusive if not hopeless, and the same argument holds for large-scale dynamic federations of Deep Web sources. Therefore, ontology-based reasoning and querying needs to be enhanced by statistical means, leading to relevance-ranked lists as query results. This paper presents steps towards such a “statistically semantic” Web and outlines technical challenges. We discuss how statistically quantified ontological relations can be exploited in XML retrieval, how statistics can help in making Web-scale search efficient, and how statistical information extracted from users’ query logs and click streams can be leveraged for better search result ranking. We believe these are decisive issues for improving the quality of next-generation search engines for intranets, digital libraries, and the Web, and they are crucial also for peer-to-peer collaborative Web search.

1 The Challenge of “Semantic” Information Search

The age of information explosion poses tremendous challenges regarding the intelligent organization of data and the effective search of relevant information in business and industry (e.g., market analyses, logistic chains), society (e.g., health care), and virtually all sciences that are more and more data-driven (e.g., gene expression data analyses and other areas of bioinformatics). The problems arise in intranets of large organizations, in federations of digital libraries and other information sources, and in the most humongous and amorphous of all data collections, the World Wide Web and its underlying numerous databases that reside behind portal pages. The Web bears the potential of being the world’s largest encyclopedia and knowledge base, but we are very far from being able to exploit this potential. Database-system and search-engine technologies provide support for organizing and querying information; but all too often they require excessive manual preprocessing, such as designing a schema and cleaning raw data or manually classifying documents into a taxonomy for a good Web portal, or manual postprocessing such as browsing through large result lists with too many irrelevant items or surfing in the vicinity of promising but not truly satisfactory approximate matches. The following are a few example queries where current Web and intranet search engines fall short or where data …

P. Atzeni et al. (Eds.): ER 2004, LNCS 3288, pp. 3–17, 2004.

international conference on data engineering | 2007

GString: A Novel Approach for Efficient Search in Graph Databases

Haoliang Jiang; Haixun Wang; Philip S. Yu; Shuigeng Zhou

Graphs are widely used for modeling complicated data, including chemical compounds, protein interactions, XML documents, and multimedia. Information retrieval against such data can be formulated as a graph search problem, and finding an efficient solution to the problem is essential for many applications. A popular approach is to represent both graphs and queries on graphs by sequences, thus converting graph search to subsequence matching. State-of-the-art sequencing methods work at the finest granularity - each node (or edge) in the graph will appear as an element in the resulting sequence. Clearly, such methods are not semantic conscious, and the resulting sequences are not only bulky but also prone to complexities arising from graph isomorphism and other problems in searching. In this paper, we introduce a novel sequencing method to capture the semantics of the underlying graph data. We find meaningful components in graph structures and use them as the most basic units in sequencing. It not only reduces the size of resulting sequences, but also enables semantic-based searching. In this paper, we base our approach on chemical compound databases, although it can be applied to searching other complicated graphs, such as protein structures. Experiments demonstrate that our approach outperforms state-of-the-art graph search methods.

IEEE Transactions on Parallel and Distributed Systems | 2008

Distributed Localization Using a Moving Beacon in Wireless Sensor Networks

Bin Xiao; Hekang Chen; Shuigeng Zhou

The localization of sensor nodes is a fundamental problem in sensor networks and can be implemented using powerful and expensive beacons. Beacons, the fewer the better, can acquire their position knowledge either from GPS devices or by virtue of being manually placed. In this paper, we propose a distributed method for the localization of sensor nodes using a single moving beacon, where sensor nodes compute their position estimate based on the range-free technique. Two parameters are critical to the location accuracy of sensor nodes: the radio transmission range of the beacon and how often the beacon broadcasts its position. Theoretical analysis shows that these two parameters determine the upper bound of the estimation error when the traverse route of the beacon is a straight line. We extend the position estimate to the case in which the traverse route of the beacon is randomly chosen in a real-world situation, where radio irregularity might cause a node to miss some crucial coordinate information from the beacon. We further point out that the movement pattern of the beacon plays a pivotal role in the localization task for sensors. To minimize estimation errors, sensor nodes can carry out a variety of algorithms in accordance with the movement of the beacon. Simulation results compare variants of the distributed method in a variety of testing environments. Real experiments show that the proposed method is feasible and can estimate the location of sensor nodes accurately, given a single moving beacon.
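To make the range-free idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes an idealized circular radio model and a centroid estimator, neither of which the abstract specifies; the function names and the toy route are hypothetical.

```python
import math

def heard_positions(node, beacon_route, radio_range):
    """Beacon broadcasts that reach the node (assumed ideal circular radio)."""
    return [p for p in beacon_route if math.dist(node, p) <= radio_range]

def estimate_position(heard):
    """Assumed range-free estimator: centroid of the beacon positions heard."""
    if not heard:
        return None  # the node never heard the beacon
    xs = [x for x, _ in heard]
    ys = [y for _, y in heard]
    return (sum(xs) / len(heard), sum(ys) / len(heard))

def straight_route(x_start, x_end, step):
    """Beacon moves along y = 0 and broadcasts its position every `step` units."""
    n = int(round((x_end - x_start) / step))
    return [(x_start + i * step, 0.0) for i in range(n + 1)]

true_node = (5.0, 3.0)                      # unknown to the node itself
route = straight_route(0.0, 10.0, 1.0)      # 11 broadcasts
heard = heard_positions(true_node, route, radio_range=5.5)
estimate = estimate_position(heard)         # (5.0, 0.0) in this toy setup
```

In this toy setup the x-coordinate is recovered but the offset from the beacon's line is not, which is one way to see why the transmission range, the broadcast interval, and the beacon's movement pattern all affect accuracy.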

Nature Communications | 2014

Cassava genome from a wild ancestor to cultivated varieties

Wenquan Wang; Feng B; Jingfa Xiao; Zhiqiang Xia; Xuefeng Zhou; Li P; Weixiong Zhang; Ying Wang; Birger Lindberg Møller; Peng Zhang; Luo Mc; Xiao G; J. B. Liu; Junhui Yang; Suting Chen; Pablo D. Rabinowicz; Xu Chen; Haiying Zhang; Hernán Ceballos; Lou Q; Zou M; Carvalho Lj; Changying Zeng; Jing Xia; Shixiang Sun; Yun Xin Fu; Huizhong Wang; Cheng Lu; Ruan M; Shuigeng Zhou

Cassava is a major tropical food crop in the Euphorbiaceae family that has high carbohydrate production potential and adaptability to diverse environments. Here we present the draft genome sequences of a wild ancestor and a domesticated variety of cassava and comparative analyses with a partial inbred line. We identify 1,584 and 1,678 gene models specific to the wild and domesticated varieties, respectively, and discover high heterozygosity and millions of single-nucleotide variations. Our analyses reveal that genes involved in photosynthesis, starch accumulation and abiotic stresses have been positively selected, whereas those involved in cell wall biosynthesis and secondary metabolism, including cyanogenic glucoside formation, have been negatively selected in the cultivated varieties, reflecting the result of natural selection and domestication. Differences in microRNA genes and retrotransposon regulation could partly explain an increased carbon flux towards starch accumulation and reduced cyanogenic glucoside accumulation in domesticated cassava. These results may contribute to genetic improvement of cassava through better understanding of its biology.

very large data bases | 2012

Shortest path and distance queries on road networks: an experimental evaluation

Lingkun Wu; Xiaokui Xiao; Dingxiong Deng; Gao Cong; Andy Diwen Zhu; Shuigeng Zhou

Computing the shortest path between two given locations in a road network is an important problem that finds applications in various map services and commercial navigation products. The state-of-the-art solutions for the problem can be divided into two categories: spatial-coherence-based methods and vertex-importance-based approaches. The two categories of techniques, however, have not been compared systematically under the same experimental framework, as they were developed from two independent lines of research that do not refer to each other. This renders it difficult for a practitioner to decide which technique should be adopted for a specific application. Furthermore, the experimental evaluation of the existing techniques, as presented in previous work, falls short in several aspects. Some methods were tested only on small road networks with up to one hundred thousand vertices; some approaches were evaluated using distance queries (instead of shortest path queries), namely, queries that ask only for the length of the shortest path; a state-of-the-art technique was examined based on a faulty implementation that led to incorrect query results. To address the above issues, this paper presents a comprehensive comparison of the most advanced spatial-coherence-based and vertex-importance-based approaches. Using a variety of real road networks with up to twenty million vertices, we evaluated each technique in terms of its preprocessing time, space consumption, and query efficiency (for both shortest path and distance queries). Our experimental results reveal the characteristics of different techniques, based on which we provide guidelines on selecting appropriate methods for various scenarios.

international conference on management of data | 2013

Shortest path and distance queries on road networks: towards bridging theory and practice

Andy Diwen Zhu; Hui Ma; Xiaokui Xiao; Siqiang Luo; Youze Tang; Shuigeng Zhou

Given two locations s and t in a road network, a distance query returns the minimum network distance from s to t, while a shortest path query computes the actual route that achieves the minimum distance. These two types of queries find important applications in practice, and a plethora of solutions have been proposed in the past few decades. The existing solutions, however, are optimized for either practical or asymptotic performance, but not both. In particular, the techniques with enhanced practical efficiency are mostly heuristic-based, and they offer unattractive worst-case guarantees in terms of space and time. On the other hand, the methods that are worst-case efficient often entail prohibitive preprocessing or space overheads, which render them inapplicable for the large road networks (with millions of nodes) commonly used in modern map applications. This paper presents Arterial Hierarchy (AH), an index structure that narrows the gap between theory and practice in answering shortest path and distance queries on road networks. On the theoretical side, we show that, under a realistic assumption, AH answers any distance query in Õ(log α) time, where α = dmax/dmin, and dmax (resp. dmin) is the largest (resp. smallest) L∞ distance between any two nodes in the road network. In addition, any shortest path query can be answered in Õ(k + log α) time, where k is the number of nodes on the shortest path. On the practical side, we experimentally evaluate AH on a large set of real road networks with up to twenty million nodes, and we demonstrate that (i) AH outperforms the state of the art in terms of query time, and (ii) its space and pre-computation overheads are moderate.
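The difference between the two query types can be illustrated with an index-free baseline. The sketch below uses plain Dijkstra search, not Arterial Hierarchy; the adjacency-list format, the function names, and the toy network are assumptions made for illustration.

```python
import heapq

def dijkstra(graph, s, t):
    """Return (distance, path) from s to t; graph maps node -> [(neighbor, weight)]."""
    dist = {s: 0.0}
    parent = {}
    heap = [(0.0, s)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == t:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    if t not in done:
        return float("inf"), []  # t is unreachable from s
    path = [t]
    while path[-1] != s:
        path.append(parent[path[-1]])  # walk back along parent pointers
    path.reverse()
    return dist[t], path

def distance_query(graph, s, t):
    """Only the length of the shortest path."""
    return dijkstra(graph, s, t)[0]

def shortest_path_query(graph, s, t):
    """The actual route achieving the minimum distance."""
    return dijkstra(graph, s, t)[1]

# Hypothetical toy road network with symmetric edge weights.
roads = {
    "a": [("b", 4), ("c", 1)],
    "b": [("a", 4), ("c", 2), ("d", 5)],
    "c": [("a", 1), ("b", 2), ("d", 8)],
    "d": [("b", 5), ("c", 8)],
}
```

Here `distance_query(roads, "a", "d")` returns `8.0` and `shortest_path_query(roads, "a", "d")` returns `["a", "c", "b", "d"]`. A baseline like this explores the graph on every query, which is the cost that preprocessing-based indexes aim to avoid.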

IEEE Transactions on Knowledge and Data Engineering | 2009

Distributed Skyline Retrieval with Low Bandwidth Consumption

Lin Zhu; Yufei Tao; Shuigeng Zhou

We consider skyline computation when the underlying data set is horizontally partitioned onto geographically distant servers that are connected to the Internet. The existing solutions are not suitable for our problem, because they have at least one of the following drawbacks: (1) applicable only to distributed systems adopting vertical partitioning or restricted horizontal partitioning, (2) effective only when each server has limited computing and communication abilities, and (3) optimized only for skyline search in subspaces but inefficient in the full space. This paper proposes an algorithm, called feedback-based distributed skyline (FDS), to support arbitrary horizontal partitioning. FDS aims at minimizing the network bandwidth, measured in the number of tuples transmitted over the network. The core of FDS is a novel feedback-driven mechanism, where the coordinator iteratively transmits certain feedback to each participant. Participants can leverage such information to prune a large amount of local data, which otherwise would need to be sent to the coordinator. Extensive experimentation confirms that FDS significantly outperforms alternative approaches in both effectiveness and progressiveness.
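The pruning effect of feedback can be sketched on a toy example. This is not the FDS algorithm: the two-round protocol, the choice of the tuple with the smallest coordinate sum as each server's representative, the smaller-is-better convention, and all names are assumptions for illustration. Tuples are assumed distinct across servers.

```python
def dominates(p, q):
    """p dominates q: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Tuples not dominated by any other tuple (quadratic-time reference version)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def distributed_skyline(partitions):
    """Return (global skyline, number of tuples uploaded by participants)."""
    local = [skyline(part) for part in partitions]
    # Round 1: each participant uploads one representative tuple (hypothetical choice).
    reps = [min(ls, key=sum) for ls in local if ls]
    feedback = skyline(reps)
    uploaded = len(reps)
    # Round 2: the coordinator sends the feedback; participants prune with it
    # and upload only the local skyline tuples that survive.
    candidates = list(feedback)
    for ls in local:
        survivors = [p for p in ls
                     if p not in feedback
                     and not any(dominates(f, p) for f in feedback)]
        uploaded += len(survivors)
        candidates.extend(survivors)
    return skyline(candidates), uploaded

# Hypothetical horizontal partitioning over three servers.
partitions = [
    [(1, 9), (2, 2), (8, 8)],
    [(3, 3), (9, 1), (7, 7)],
    [(4, 4), (5, 5), (6, 0)],
]
result, uploaded = distributed_skyline(partitions)
naive_upload = sum(len(skyline(part)) for part in partitions)
```

On this data the participants upload 4 tuples instead of the 6 they would send if every local skyline were shipped unpruned, and the result matches the skyline of the combined data.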

BMC Bioinformatics | 2010

MiRenSVM: towards better prediction of microRNA precursors using an ensemble SVM classifier with multi-loop features

Jiandong Ding; Shuigeng Zhou; Jihong Guan

Background: MicroRNAs (miRNAs) are derived from larger hairpin RNA precursors and play essential regulatory roles in both animals and plants. A number of computational methods for finding miRNA genes have been proposed in the past decade, yet the problem is far from solved, especially when considering the imbalance between known miRNAs and unidentified miRNAs, and pre-miRNAs with multi-loops or higher minimum free energy (MFE). This paper presents a new computational approach, miRenSVM, for finding miRNA genes. Aiming at better prediction performance, an ensemble support vector machine (SVM) classifier is established to deal with the imbalance issue, and multi-loop features are included for identifying pre-miRNAs with multi-loops. Results: We collected a representative dataset, which contains 697 real miRNA precursors identified by experimental procedures and other computational methods, and 5428 pseudo ones from several datasets. Experiments showed that miRenSVM achieved 96.5% specificity and 93.05% sensitivity on the dataset. Compared with state-of-the-art approaches, miRenSVM obtained better prediction results. We also applied our method to predict 14 Homo sapiens pre-miRNAs and 13 Anopheles gambiae pre-miRNAs that first appeared in miRBase 13.0; miRenSVM achieved a 100% prediction rate. Furthermore, performance evaluation was conducted over 27 additional species in miRBase 13.0, and 92.84% (4863/5238) of animal pre-miRNAs were correctly identified by miRenSVM. Conclusion: miRenSVM is an ensemble support vector machine (SVM) classification system for better detecting miRNA genes, especially those with multi-loop secondary structure.
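For readers less familiar with the metrics, the short sketch below gives the standard definitions of sensitivity and specificity and rechecks the one rate for which the abstract reports raw counts (4863 of 5238 animal pre-miRNAs). The function and variable names are illustrative; treating that rate as a sensitivity assumes all 5238 sequences are true pre-miRNAs.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real positives that the classifier identifies."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of real negatives that the classifier rejects."""
    return true_neg / (true_neg + false_pos)

# Counts reported in the abstract for animal pre-miRNAs in miRBase 13.0.
identified, total = 4863, 5238
animal_rate = sensitivity(identified, total - identified)
animal_rate_percent = round(100 * animal_rate, 2)  # 92.84
```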

Archive | 2004

Conceptual Modeling for Advanced Application Domains

Shan Wang; Katsumi Tanaka; Shuigeng Zhou; Tok Wang Ling; Jihong Guan; Dongqing Yang; Fabio Grandi; Eleni Mangina; Il-Yeol Song; Heinrich C. Mayr

In this paper a joint topology-geometry model is proposed for dealing with multiple representations and topology management to support map generalization. This model offers a solution for efficiently managing both geometry and topology during the map generalization process. Both geometry-oriented generalization techniques and topology-oriented techniques are integrated within this model. Furthermore, by encoding vertical links in this model, the joint topology-geometry model provides support for hierarchical navigation and browsing across the different levels as well as for the proper reconstruction of maps at intermediate levels.

EPL | 2007

Maximal planar scale-free Sierpinski networks with small-world effect and power law strength-degree correlation

Zhongzhi Zhang; Shuigeng Zhou; Lujun Fang; Jihong Guan; Yichao Zhang

Many real networks share three generic properties: they are scale-free, display a small-world effect, and show a power law strength-degree correlation. In this paper, we propose a class of deterministically growing networks called Sierpinski networks, which are induced by the famous Sierpinski fractals and constructed in a simple iterative way. We derive analytical expressions for degree distribution, strength distribution, clustering coefficient, and strength-degree correlation, which agree well with the characterizations of various real-life networks. Moreover, we show that the introduced Sierpinski networks are maximal planar graphs.
