Weidi Dai | Researchain

Publication

Featured research published by Weidi Dai.

International Conference on Machine Learning and Cybernetics | 2005

A clustering algorithm based on building a density-tree

Weidi Dai; Yue-Xian Hou; Pi-Lian He; Xiao-Shen Zheng

A new clustering algorithm called CABDET is presented in this paper. CABDET creates a tree structure for every cluster, in which the neighborhood radius of the current object is calculated from the local density of its parent node. Unprocessed objects in the neighborhood of the current object are added to extend the tree until no new object is found. Each density tree is regarded as one cluster. CABDET requires only one input parameter, the initial radius of the root node, and imposes no density threshold. It can also discover clusters of arbitrary shape and handle noisy data. Our experimental results demonstrate that CABDET is significantly more accurate than DBSCAN at discovering clusters of varying density, and that CABDET is less sensitive to input parameters.
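A minimal Python sketch of density-tree growth, assuming 2-D points and Euclidean distance. The abstract does not give the exact formula linking a parent node's local density to its children's search radius, so the rule used here (children search within `scale` times the parent's mean distance to its neighbours, an inverse local-density estimate) and the parameters `scale` and `min_size` are hypothetical; only `r0`, the initial root radius, comes from the description.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]

def density_tree_clustering(points: Sequence[Point], r0: float,
                            scale: float = 1.5, min_size: int = 2) -> List[int]:
    """Grow one density tree per cluster; returns one label per point (-1 = noise)."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    next_label = 0
    for root in range(n):
        if labels[root] is not None:
            continue
        labels[root] = next_label
        members = [root]
        stack = [(root, r0)]  # (tree node, radius used to search its neighbourhood)
        while stack:
            node, radius = stack.pop()
            nbrs = [j for j in range(n)
                    if j != node and math.dist(points[node], points[j]) <= radius]
            if not nbrs:
                continue
            # Hypothetical rule: the parent's mean distance to its neighbours serves as an
            # inverse local-density estimate, so sparser parents give children a wider radius.
            mean_d = sum(math.dist(points[node], points[j]) for j in nbrs) / len(nbrs)
            child_radius = scale * mean_d
            for j in nbrs:
                if labels[j] is None:  # only unprocessed objects extend the tree
                    labels[j] = next_label
                    members.append(j)
                    stack.append((j, child_radius))
        if len(members) < min_size:
            for m in members:
                labels[m] = -1  # tree too small to count as a cluster: treat as noise
        else:
            next_label += 1
    return [lab if lab is not None else -1 for lab in labels]
```

Because each child's radius is derived from its parent's neighbourhood rather than from a global threshold, the tree can keep growing through regions whose density drifts gradually, which is the behaviour the abstract contrasts with DBSCAN.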

International Conference on Computer Science and Network Technology | 2012

CloudAssoc: A pipeline for imputation based genome wide association study on cloud

Weidi Dai; Qiuwen Wang; Meng Gao; Lu Zhang

Genome-wide association studies (GWAS) have proven to be an efficient approach to identifying susceptibility genes for complex diseases. To increase the power to detect disease-causal variants, imputation is used to predict genotype dosages of untyped variants on the basis of linkage disequilibrium estimated from public data. However, as data volumes grow, the computation time of imputation-based association studies becomes extremely large. We developed a cloud-based pipeline on the Map/Reduce framework that implements data format conversion, imputation, quality control and association testing. It can help biologists accelerate the identification and evaluation of susceptibility genes for complex diseases and makes it easier to combine GWAS data from around the world for meta-analysis.
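The Map/Reduce decomposition of the association step can be illustrated with a single-process Python sketch. The record layout `(variant_id, phenotype, dosage)`, the function names and the allelic chi-square test on expected allele counts are all assumptions for illustration; the abstract does not specify the pipeline's test statistic, and the shuffle a Map/Reduce framework performs between the phases is simulated with a dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record: (variant_id, phenotype: 1 = case / 0 = control, imputed dosage in [0, 2])
Record = Tuple[str, int, float]

def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, Tuple[int, float]]]:
    """Emit one key/value pair per genotype call, keyed by variant."""
    for variant, phenotype, dosage in records:
        yield variant, (phenotype, dosage)

def shuffle(pairs: Iterable[Tuple[str, Tuple[int, float]]]) -> Dict[str, List[Tuple[int, float]]]:
    """Group values by key, as the framework would do between map and reduce."""
    groups: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(variant: str, values: List[Tuple[int, float]]) -> Tuple[str, float]:
    """Allelic 2x2 chi-square statistic from expected (dosage-based) allele counts."""
    alt = [0.0, 0.0]    # expected alternate-allele count, indexed by phenotype
    total = [0.0, 0.0]  # total allele count (2 per diploid sample)
    for phenotype, dosage in values:
        alt[phenotype] += dosage
        total[phenotype] += 2.0
    table = [[alt[1], total[1] - alt[1]], [alt[0], total[0] - alt[0]]]
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [table[0][c] + table[1][c] for c in range(2)]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = rows[r] * cols[c] / n if n else 0.0
            if expected > 0:
                chi2 += (table[r][c] - expected) ** 2 / expected
    return variant, chi2

def run_association(records: Iterable[Record]) -> Dict[str, float]:
    """Map, shuffle and reduce in one process; one statistic per variant."""
    return dict(reduce_phase(v, vals) for v, vals in shuffle(map_phase(records)).items())
```

Each reducer sees only the records for one variant, which is what lets the association step scale out across machines. Production GWAS tools more commonly regress phenotype on dosage; the chi-square here only keeps the sketch short.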

International Conference on Computer Science and Network Technology | 2012

A semantic integration system for heterogeneous bioinformatics data

Weidi Dai; Jianlai Cheng; Qiuwen Wang

A data integration system designed to provide unified access to multiple heterogeneous biological and medical data sources is proposed in this paper. Unlike analogous systems, it applies a linear regression model to the results of string-based ontology alignment algorithms to improve the efficiency of ontology alignment. We build a domain ontology from an object-oriented perspective to integrate bioinformatics information stored in relational databases, and describe the heterogeneous data integration process in detail.
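The idea of combining string-based alignment results through linear regression can be sketched by treating each string matcher's score as a feature and learning weights that combine them. The Python sketch below assumes two illustrative matchers (the `difflib.SequenceMatcher` ratio and character-trigram Jaccard similarity) and fits ordinary least squares via the normal equations; the matchers, feature set and training labels are hypothetical, not taken from the paper.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Set, Tuple

def trigrams(s: str) -> Set[str]:
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)} or {s}

def features(a: str, b: str) -> List[float]:
    """[bias, edit-based similarity, trigram Jaccard similarity] for two concept labels."""
    seq = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    ta, tb = trigrams(a), trigrams(b)
    return [1.0, seq, len(ta & tb) / len(ta | tb)]

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit(pairs: Sequence[Tuple[str, str, float]]) -> List[float]:
    """Least-squares weights for the combined alignment score (normal equations)."""
    xs = [features(a, b) for a, b, _ in pairs]
    ys = [y for _, _, y in pairs]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)

def score(weights: Sequence[float], a: str, b: str) -> float:
    """Combined similarity of two concept labels under the learned weights."""
    return sum(w * f for w, f in zip(weights, features(a, b)))
```

Training requires a handful of concept pairs with known alignment scores; once fitted, `score` ranks candidate matches with a single weighted sum instead of hand-tuned thresholds per matcher.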

Fuzzy Systems and Knowledge Discovery | 2010

Riemannian Manifolds clustering via Geometric median

Yang Wang; Weidi Dai; Xiaodi Huang

In this paper, we propose a new kernel function that makes use of Riemannian geodesic distances among data points, and present a geometric median shift algorithm over Riemannian manifolds. Relying on the geometric median shift together with geodesic distances, our approach is able to effectively cluster data points distributed on Riemannian manifolds. In addition to improving the clustering results, we compare the geometric median shift and mean shift algorithms for clustering on synthetic and real data sets, in both Riemannian manifolds and Euclidean spaces.
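The building block of a geometric median shift is the geometric median under geodesic rather than Euclidean distance. A hedged Python sketch on the unit sphere S² follows; it uses the Riemannian Weiszfeld iteration as a swapped-in solver and an optional Gaussian kernel on geodesic distance (`gaussian_kernel`, bandwidth `h`) for weighting, both assumptions rather than the paper's exact kernel or algorithm. A full median-shift clusterer would repeatedly move each point to the kernel-weighted median of its neighbourhood and group points that converge to the same mode.

```python
import math
from typing import List, Optional, Sequence

Vec = List[float]

def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def _normalize(a: Sequence[float]) -> Vec:
    s = math.sqrt(_dot(a, a))
    return [x / s for x in a]

def geodesic(x: Sequence[float], y: Sequence[float]) -> float:
    """Great-circle distance between unit vectors x and y."""
    return math.acos(max(-1.0, min(1.0, _dot(x, y))))

def log_map(x: Vec, y: Vec) -> Vec:
    """Tangent vector at x whose length is the geodesic distance to y (y not antipodal to x)."""
    c = _dot(x, y)
    u = [yi - c * xi for xi, yi in zip(x, y)]  # component of y orthogonal to x
    s = math.sqrt(_dot(u, u))                  # equals sin(theta) for unit vectors
    if s < 1e-15:
        return [0.0] * len(x)
    theta = math.atan2(s, c)
    return [theta * ui / s for ui in u]

def exp_map(x: Vec, v: Vec) -> Vec:
    """Follow the geodesic from x in direction v for length |v|."""
    t = math.sqrt(_dot(v, v))
    if t < 1e-15:
        return list(x)
    return _normalize([math.cos(t) * xi + math.sin(t) * vi / t for xi, vi in zip(x, v)])

def gaussian_kernel(d: float, h: float) -> float:
    """Assumed kernel on geodesic distance d with bandwidth h."""
    return math.exp(-(d / h) ** 2)

def riemannian_geometric_median(points: Sequence[Vec], x0: Vec,
                                weights: Optional[Sequence[float]] = None,
                                iters: int = 500, tol: float = 1e-12) -> Vec:
    """Weighted geometric median on the unit sphere via the Riemannian Weiszfeld iteration."""
    w = list(weights) if weights is not None else [1.0] * len(points)
    x = _normalize(x0)
    for _ in range(iters):
        num = [0.0] * len(x)
        den = 0.0
        for p, wi in zip(points, w):
            v = log_map(x, p)
            d = math.sqrt(_dot(v, v))
            if d < 1e-12:
                continue  # Weiszfeld is undefined at a data point; skipping it is a simplification
            num = [ni + wi * vi / d for ni, vi in zip(num, v)]
            den += wi / d
        if den == 0.0:
            break
        step = [ni / den for ni in num]  # weighted average of unit directions, in the tangent space
        x = exp_map(x, step)
        if math.sqrt(_dot(step, step)) < tol:
            break
    return x
```

Working in the tangent space via `log_map`/`exp_map` is what replaces Euclidean averaging; swapping these two functions for identity-like Euclidean versions recovers the ordinary Weiszfeld algorithm.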

International Conference on Intelligent Computing | 2006

A Local Computing-Based Hierarchical Clustering Algorithm Building Density Trees

Weidi Dai; Jie-Liu; Da-yi Zhao; Zhen-hua Liu; Jun-xian Zhang; Pi-Lian He

A new clustering algorithm called LOCHDET (LOcal Computing-based Hierarchical clustering algorithm building DEnsity Trees) is presented in this paper. LOCHDET generates a density tree for each potential cluster according to its local density distribution. Each cluster is regarded as a tightly coupled structure, and “closer” clusters are merged if certain conditions are satisfied. To reduce computation time, a local computing technique is introduced. By generalizing density-based, hierarchical and locality-based methods, LOCHDET supports a wide range of parameter settings, achieves good accuracy in discovering clusters of arbitrary shape, handles noisy data sets well and is only weakly sensitive to input parameters. Our experimental results confirm these properties.
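The abstract does not define LOCHDET's merge conditions or its local computing technique. As one plausible reading, the Python sketch below uses a uniform grid so that each neighbourhood query scans only the 3x3 block of cells around a point, and merges two sub-clusters with union-find when any cross-cluster pair of points lies within `merge_radius` (a single-linkage condition). The grid, the merge rule and all function names are assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]

def grid_index(points: Sequence[Point], cell: float) -> Dict[Cell, List[int]]:
    """Bucket point indices into square cells of side `cell`."""
    index: Dict[Cell, List[int]] = defaultdict(list)
    for i, (x, y) in enumerate(points):
        index[(math.floor(x / cell), math.floor(y / cell))].append(i)
    return index

def local_neighbors(points: Sequence[Point], index: Dict[Cell, List[int]],
                    i: int, radius: float, cell: float) -> List[int]:
    """Neighbours of point i within radius, scanning only the 3x3 block of cells (radius <= cell)."""
    cx, cy = math.floor(points[i][0] / cell), math.floor(points[i][1] / cell)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in index.get((cx + dx, cy + dy), ()):
                if j != i and math.dist(points[i], points[j]) <= radius:
                    found.append(j)
    return found

def merge_close_clusters(points: Sequence[Point], labels: Sequence[int],
                         merge_radius: float) -> List[int]:
    """Union sub-clusters that have any cross pair within merge_radius; -1 stays noise."""
    parent = {lab: lab for lab in set(labels) if lab >= 0}

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    index = grid_index(points, merge_radius)
    for i, li in enumerate(labels):
        if li < 0:
            continue
        for j in local_neighbors(points, index, i, merge_radius, merge_radius):
            lj = labels[j]
            if lj >= 0 and find(li) != find(lj):
                parent[find(lj)] = find(li)
    return [find(lab) if lab >= 0 else -1 for lab in labels]
```

With cells as wide as the query radius, each lookup touches a constant number of cells instead of the whole data set, which is the kind of saving a local computing step is meant to provide.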

Applied Mechanics and Materials | 2011

An Ontology-Based Data Mining Framework in Traffic Domain

Ru Guang Wang; Weidi Dai; Jie Ru Cheng

Traditional data mining often focuses on models and methods without considering the specific requirements of the application domain. This paper proposes a data mining framework for the traffic domain, the Traffic Domain Data-Mining Framework (TDDMF), which is a domain-driven data mining framework based on ontology. An ontology model for traffic domain data based on TDDMF is also built. A prototype system is developed to demonstrate the feasibility and effectiveness of TDDMF.

International Symposium on Information Processing | 2008

Partition-Based Parallel Constructing-Density-Tree Clustering

Yunpeng Zhang; Zhengjun Zhai; Lu Zhang; Yifei Bao; Weidi Dai; Fei Zuo

A parallel constructing-density-tree clustering algorithm based on data partitioning (PCAP) was presented. PCAP automatically partitioned the global data space into load-balanced subspaces, which were distributed to different processors to cluster each subspace. The clustering result for the global data space was obtained by merging strongly associated clusters through checking the association intensity of their leaves' similarity. The method for computing the association intensity between clusters was described in detail. Finally, the relationship between the speedup and the number of processors was discussed. Experimental results on artificial and real datasets show that PCAP parallelizes the constructing-density-tree clustering algorithm and efficiently improves clustering speed while preserving sufficient clustering precision. This approach is well suited to handling large datasets.
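The partition, cluster and merge flow can be sketched in Python as below. Equal-count slabs along the x-axis stand in for load-balanced subspaces, a thread pool stands in for separate processors, connected components under `eps` stand in for per-partition density-tree clustering, and association intensity is approximated as the number of cross-boundary point pairs within `eps` (clusters merge when it reaches the hypothetical `min_links`). All of these are simplifying assumptions, not the paper's definitions.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

def partition(points: Sequence[Point], k: int) -> List[List[int]]:
    """Split point indices into k slabs of (almost) equal size along the x-axis."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    size = math.ceil(len(order) / k)
    return [order[s:s + size] for s in range(0, len(order), size)]

def local_cluster(points: Sequence[Point], ids: List[int], eps: float) -> Dict[int, int]:
    """Connected components under eps inside one partition (stand-in clusterer)."""
    labels: Dict[int, int] = {}
    current = 0
    for seed in ids:
        if seed in labels:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in ids:
                if j not in labels and math.dist(points[i], points[j]) <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def pcap_sketch(points: Sequence[Point], k: int, eps: float, min_links: int = 1) -> List[int]:
    parts = partition(points, k)
    with ThreadPoolExecutor() as pool:  # one task per partition
        local = list(pool.map(lambda ids: local_cluster(points, ids, eps), parts))
    key: Dict[int, Tuple[int, int]] = {}  # point -> (partition, local cluster)
    for p, labels in enumerate(local):
        for i, lab in labels.items():
            key[i] = (p, lab)
    # Association intensity: cross-boundary point pairs within eps, counted per cluster pair.
    # Only adjacent slabs are compared, which assumes every slab is wider than eps.
    links: Dict[Tuple[Tuple[int, int], Tuple[int, int]], int] = defaultdict(int)
    for p in range(len(parts) - 1):
        for i in parts[p]:
            for j in parts[p + 1]:
                if math.dist(points[i], points[j]) <= eps:
                    links[(key[i], key[j])] += 1
    parent = {c: c for c in set(key.values())}

    def find(c: Tuple[int, int]) -> Tuple[int, int]:
        while parent[c] != c:
            c = parent[c]
        return c

    for (a, b), count in links.items():
        if count >= min_links:  # strongly associated clusters are merged
            parent[find(b)] = find(a)
    ids: Dict[Tuple[int, int], int] = {}
    return [ids.setdefault(find(key[i]), len(ids)) for i in range(len(points))]
```

Raising `min_links` makes merging more conservative, which mirrors the trade-off between speed and precision the abstract mentions.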

International Conference on Machine Learning and Cybernetics | 2005

A clustering algorithm based on density kernel extension

Weidi Dai; Pi-Lian He; Yue-Xian Hou; Xiao-Dong Kang

A new clustering algorithm called CADEKE is presented in this paper. CADEKE creates an extended density kernel structure for every cluster by using its neighborhood coefficient. Unprocessed objects found in the current kernel structure are added to extend it until no new object is found. Each density kernel structure is regarded as one cluster. CADEKE requires only one input parameter, the initial radius for finding the density kernel, and imposes no density threshold. It can also discover clusters of arbitrary shape and handle noisy data. Our experimental results demonstrate that CADEKE is significantly more accurate than DBSCAN at discovering clusters of varying density, and that CADEKE is less sensitive to input parameters.
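For comparison, the DBSCAN baseline that CADEKE is evaluated against needs both a radius `eps` and a density threshold `min_pts`. The compact Python sketch below implements standard DBSCAN (not CADEKE) and shows the failure mode the abstract alludes to: with parameters tuned for a dense cluster, a sparser cluster is labelled entirely as noise.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]

def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Standard DBSCAN; returns cluster ids, with -1 for noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = unvisited

    def region(i: int) -> List[int]:
        # eps-neighbourhood of point i, including i itself
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:  # not a core point (may later become a border point)
            labels[i] = -1
            continue
        labels[i] = cluster
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours_j = region(j)
            if len(neighbours_j) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(neighbours_j)
        cluster += 1
    return [lab if lab is not None else -1 for lab in labels]
```

For example, five points spaced 0.1 apart and five points spaced 1.0 apart, clustered with `eps=0.15` and `min_pts=3`, yield one cluster for the dense group and noise for every sparse point; no single `eps` serves both densities, which is the limitation a density-adaptive method targets.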

International Symposium on Neural Networks | 2004

Nonlinear Prediction Model Identification and Robust Prediction of Chaotic Time Series

Yue-Xian Hou; Weidi Dai; Pi-Lian He

Although in theory a neural network can fit, model and predict any continuous deterministic system, the lack of a complete theory of model identification still prevents neural networks from being applied more widely and effectively. This paper addresses this issue by introducing a universal method for nonlinear model identification. The proposed method is based on information entropy theory and an extension of it called nonlinear irreducible autocorrelation. The latter, first defined in this paper, determines the optimal autoregressive order of nonlinear autoregression models by investigating the irreducible auto-dependency of the time series under study. Building on this, robust prediction of chaotic time series becomes achievable. Our approach is well supported by computer simulations.
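The nonlinear irreducible autocorrelation itself is not defined in the abstract, so it is not reproduced here. As a related, standard entropy-based dependency measure, the Python sketch below estimates the mutual information between x_t and x_{t+lag} with an equal-width histogram; applied to a chaotic logistic map it detects strong nonlinear lag-1 dependence that disappears once the series is shuffled. The bin count, map parameter `r`, starting value and shuffle seed are arbitrary choices for illustration.

```python
import math
import random
from collections import Counter
from typing import List, Sequence

def logistic_series(n: int, r: float = 3.9, x0: float = 0.123) -> List[float]:
    """Chaotic logistic-map trajectory x_{t+1} = r * x_t * (1 - x_t)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def shuffled_copy(series: Sequence[float], seed: int = 0) -> List[float]:
    """Same values in random order: destroys temporal dependence, keeps the marginal."""
    out = list(series)
    random.Random(seed).shuffle(out)
    return out

def lagged_mutual_information(series: Sequence[float], lag: int, bins: int = 8) -> float:
    """Histogram estimate (in bits) of I(x_t; x_{t+lag}) using equal-width bins."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0

    def to_bin(v: float) -> int:
        return min(int((v - lo) / width), bins - 1)

    pairs = [(to_bin(series[t]), to_bin(series[t + lag])) for t in range(len(series) - lag)]
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    # I = sum p(a,b) * log2(p(a,b) / (p(a) p(b))); with counts this is log2(c * n / (c_a * c_b))
    return sum(c / n * math.log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items())
```

Unlike the linear autocorrelation, which can be close to zero for a logistic map, mutual information captures the deterministic but nonlinear relation between consecutive values, which is the kind of dependency an entropy-based order-selection criterion builds on.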

International Conference on Education Technology and Computer | 2010