Yanlong Wen | Researchain

Publication

Featured research published by Yanlong Wen.

asia-pacific web conference | 2014

Summarizing Relational Database Schema Based on Label Propagation

Xiaojie Yuan; Xinkun Li; Man Yu; Xiangrui Cai; Ying Zhang; Yanlong Wen

Real enterprise databases usually comprise hundreds of tables, which makes querying a complex database hard for non-expert users, especially when documentation is lacking. Schema summarization improves the usability of databases by providing a succinct overview of the entire schema. In this paper, we introduce a novel three-step schema summarization method based on label propagation. First, we exploit various similarity properties of the database schema and propose a table similarity measure based on the Radial Basis Function (RBF) kernel, which captures these properties comprehensively. Second, we select representative tables as labeled data and annotate the schema graph with them. Finally, we run a label propagation algorithm on the labeled schema graph to classify the tables and create a schema summary. Extensive evaluations demonstrate the effectiveness of our approach.
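
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the per-table feature vectors, the `gamma` parameter, the fixed iteration count, and the function names are all assumptions made for the example, and the paper's actual similarity properties are not reproduced here.

```python
import math


def rbf_similarity(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def propagate_labels(features, seeds, gamma=1.0, iterations=50):
    """Spread labels from representative (seed) tables to all other tables.

    features: {table_name: feature_vector}  (hypothetical features)
    seeds:    {table_name: label} for the representative tables
    """
    tables = sorted(features)
    labels = sorted(set(seeds.values()))
    # Seed tables start one-hot; unlabeled tables start with zero scores.
    scores = {
        t: {l: 1.0 if seeds.get(t) == l else 0.0 for l in labels}
        for t in tables
    }
    for _ in range(iterations):
        updated = {}
        for t in tables:
            if t in seeds:
                updated[t] = scores[t]  # clamp the labeled tables
                continue
            weights = {
                u: rbf_similarity(features[t], features[u], gamma)
                for u in tables
                if u != t
            }
            total = sum(weights.values())
            # New score is the similarity-weighted average of the neighbours.
            updated[t] = {
                l: sum(w * scores[u][l] for u, w in weights.items()) / total
                for l in labels
            }
        scores = updated
    return {t: max(labels, key=lambda l: scores[t][l]) for t in tables}
```

Tables that end up with the same label form one cluster of the summary, with the seed table acting as its representative.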

Neurocomputing | 2018

Discriminative extraction of features from time series

Zhenguo Zhang; Haiwei Zhang; Yanlong Wen; Ying Zhang; Xiaojie Yuan

A primary challenge of time series classification is how to extract powerful features from training samples. Two kinds of classification methods, global-based and local-based, have been studied widely in recent years. Global-based methods, such as 1-Nearest Neighbor (1-NN), take the entire series as features and therefore cannot indicate the intrinsic characteristics of a class. Local-based methods overcome this weakness by employing discriminative time series subsequences, called shapelets, as features. However, most local-based methods are computationally expensive because of the massive number of shapelet candidates. In this paper, we propose a novel shapelet extraction method that treats each time series as a high-dimensional data point and then finds the discriminative dimensions corresponding to the positions of shapelets. More specifically, the discriminative dimensions are determined by combining Local Fisher Discriminant Analysis (LFDA) with two sparse restrictions that encourage the continuity of time series. Extensive experimental results show that the proposed method achieves significant improvement over existing shapelet-based methods in both classification accuracy and running time on commonly used time series datasets. In addition, compared with the accepted time series classification methods NNDTW and COTE, our method still obtains better results.
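
To make the notion of a shapelet feature concrete, the sketch below computes the usual shapelet-to-series distance: the minimum Euclidean distance between the shapelet and any window of the same length in the series. It illustrates the general local-based idea only, not the LFDA-based extraction proposed in the paper; the function names and the threshold-based `classify` helper are assumptions for the example.

```python
import math


def subsequence_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    equal-length window of the series."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(
            sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        )
        best = min(best, d)
    return best


def classify(series, shapelet, threshold, near_label, far_label):
    """Hypothetical one-shapelet classifier: a series that contains a
    close match to the shapelet gets near_label, otherwise far_label."""
    if subsequence_distance(series, shapelet) <= threshold:
        return near_label
    return far_label
```

Because every window of every training series is a potential shapelet, scanning all candidates with this distance is what makes naive shapelet discovery expensive.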

asia-pacific web conference | 2016

Accelerating Time Series Shapelets Discovery with Key Points

Zhenguo Zhang; Haiwei Zhang; Yanlong Wen; Xiaojie Yuan

Shapelets are discriminative subsequences in a time series dataset that provide good interpretability for time series classification results. For this reason, time series shapelets have attracted great interest in the time series data mining community. Although shapelets perform satisfactorily on many time series datasets, discovering them quickly is still a challenge because any subsequence of a time series may be a shapelet candidate. Several methods have been proposed in recent years to speed up shapelet discovery. However, these methods are still time-consuming when dealing with large datasets or long time series. In this paper, we propose a preprocessing step based on time series key points for shapelet discovery, which makes full use of prior knowledge about shapelets. Combined with the SAX-based shapelet discovery method (Fast-Shapelets), it finds shapelets quickly on all benchmark datasets of the UCR archive, while the classification accuracy is almost the same as that of current methods.

web age information management | 2015

Efficient Foreign Key Discovery Based on Nearest Neighbor Search

Xiaojie Yuan; Xiangrui Cai; Man Yu; Chao Wang; Ying Zhang; Yanlong Wen

With the rapid growth of data size and schema complexity, many data sets are structured in tables but lack explicit foreign key definitions. Automatically identifying foreign keys among relations benefits query optimization, schema matching, data integration, and database design. This paper formulates foreign key discovery as a nearest neighbor search problem and proposes a fast foreign key discovery algorithm. To reduce the number of foreign key candidates, we first detect inclusion dependencies. We then choose statistical features to represent each attribute and define the distance between two attributes. Finally, foreign keys are discovered by finding the nearest neighbors of all primary keys. Experimental results on real and synthetic data sets show that our algorithm discovers foreign keys efficiently.
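
The pipeline described above (inclusion dependency filter, attribute features, nearest neighbor search) can be sketched as follows. The abstract does not list the statistical features or the distance function, so the four features used here (minimum, maximum, mean, distinct-value ratio), the Euclidean distance, the restriction to numeric columns, and all function names are assumptions for illustration.

```python
import math


def is_included(dependent, referenced):
    """Inclusion dependency: every dependent value occurs in the referenced column."""
    return set(dependent) <= set(referenced)


def attribute_features(values):
    """Hypothetical statistical features of a numeric column."""
    n = len(values)
    return (min(values), max(values), sum(values) / n, len(set(values)) / n)


def attribute_distance(a, b):
    """Euclidean distance between the feature vectors of two columns."""
    return math.dist(attribute_features(a), attribute_features(b))


def nearest_foreign_key(primary_key, candidates):
    """Return the name of the candidate column closest to the primary key
    among those satisfying the inclusion dependency, or None if none does.

    candidates: {column_name: list_of_values}
    """
    valid = {
        name: values
        for name, values in candidates.items()
        if values and is_included(values, primary_key)
    }
    if not valid:
        return None
    return min(valid, key=lambda name: attribute_distance(valid[name], primary_key))
```

The inclusion check discards most columns cheaply, so the distance computation only runs on the few candidates whose values all appear in the primary key.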

asia-pacific web conference | 2014

Discovery of Unique Column Combinations with Hadoop

Shupeng Han; Xiangrui Cai; Chao Wang; Haiwei Zhang; Yanlong Wen

A unique column combination is an important kind of structural information in relations. From a data management perspective, discovering unique column combinations is a crucial step in understanding and utilizing the data. It benefits data modeling, data integration, anomaly detection, query optimization, and indexing. Nevertheless, discovering all unique column combinations is an NP-hard problem. Therefore, efficiency is a tremendous challenge.

database systems for advanced applications | 2011

Effective keyword search for candidate fragments of XML documents

Yanlong Wen; Haiwei Zhang; Ying Zhang; Lu Zhang; Lei Xu; Xiaojie Yuan

In this paper, we focus on the problem of answering XML keyword searches effectively and efficiently. We first show the weakness of existing SLCA (Smallest Lowest Common Ancestor) based solutions, and then propose the concept of a Candidate Fragment. A Candidate Fragment is a meaningful subtree of the XML document tree that has an appropriate granularity. To efficiently compute Candidate Fragments as the answers to an XML keyword search, we design the Node Match algorithm and the Path Match algorithm. Finally, we conduct extensive experiments to show that our approach is both effective and efficient.
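
For readers unfamiliar with the SLCA baseline mentioned above, the following sketch computes SLCA nodes by brute force. It does not implement Candidate Fragments or the Node Match and Path Match algorithms. Nodes are assumed to be identified by Dewey labels (tuples giving the child index at each level), and the function names are illustrative.

```python
from itertools import product


def lca(a, b):
    """Lowest common ancestor of two Dewey labels: their longest common prefix."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def is_ancestor(a, b):
    """True if node a is a proper ancestor of node b."""
    return len(a) < len(b) and b[: len(a)] == a


def slca(keyword_matches):
    """keyword_matches holds one list of Dewey labels per keyword.

    Returns the nodes whose subtree contains every keyword and that have
    no descendant with the same property. Brute force: enumerates every
    combination of matches, so it is only suitable for small inputs.
    """
    if not keyword_matches or any(not m for m in keyword_matches):
        return []
    candidates = set()
    for combo in product(*keyword_matches):
        node = combo[0]
        for other in combo[1:]:
            node = lca(node, other)
        candidates.add(node)
    # Keep only candidates that are not ancestors of another candidate.
    return sorted(
        c for c in candidates if not any(is_ancestor(c, d) for d in candidates)
    )
```

An SLCA result is a single node, which says nothing about how much of the surrounding subtree should be returned to the user; that granularity question is what the Candidate Fragment concept addresses.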

database systems for advanced applications | 2018

KAT: Keywords-to-SPARQL Translation Over RDF Graphs

Yanlong Wen; Yudong Jin; Xiaojie Yuan

In this paper, we focus on the problem of translating keywords into SPARQL queries effectively and propose a novel approach called KAT. KAT takes into account the context of each input keyword and reduces the ambiguity of input keywords by building a keyword index that contains the class information of keywords in the RDF data. To explore the RDF data graph efficiently, KAT also builds a graph index. Moreover, a context-aware ranking method is proposed to find the most relevant SPARQL query. Extensive experiments are conducted to show that KAT is both effective and efficient.

database systems for advanced applications | 2018

Nearest Subspace with Discriminative Regularization for Time Series Classification

Zhenguo Zhang; Yanlong Wen; Ying Zhang; Xiaojie Yuan

For the time series classification (TSC) problem, many studies focus on elastic distance measures for comparing time series and complete the task with the help of a Nearest Neighbour (NN) classifier. This is mainly because the order of variables is a crucial factor for time series. Unlike the NN classifier, which considers only one training sample, the improved Nearest Subspace (NS) classifier we propose in this paper uses all training time series of a class to classify new time series. By adding a discriminative regularization term, the improved NS classifier takes full advantage of all training time series of one class. Two kinds of discriminative regularization terms are employed in our method. One is calculated directly from the Euclidean distance between time series. For the other, we obtain the regularization terms from a lower-dimensional subspace. Two well-known dimensionality reduction methods, the Generalized Eigenvector Method (GEM) and Local Fisher Discriminant Analysis (LFDA), are employed to complete this task. Furthermore, we combine these improved NS classifiers through ensemble schemes to accommodate different time series datasets. Through extensive experiments on all UCR and UEA datasets, we demonstrate that the proposed method performs better than NN classifiers with different elastic distance measures and other classifiers.

web information systems engineering | 2017

Time Series Classification by Modeling the Principal Shapes

Zhenguo Zhang; Yanlong Wen; Ying Zhang; Xiaojie Yuan

Time series classification has attracted significant interest in the research community and has many challenging applications. In this work, we present a novel time series classification method based on the statistical information of each time series class, called the Principal Shape Model (PSM), which can quickly and effectively classify time series even if they are very long and the dataset is very large. In PSM, the time series with the same class label in the training set are gathered to extract the principal shapes, which are then used to generate the classification model. For each test sample, we predict its label by comparing the minimum distance between the sample and each generated model. Meanwhile, the principal shapes reveal the intrinsic shape variation of time series of the same class. Extensive experimental results show that PSM is orders of magnitude faster than state-of-the-art time series classification methods while achieving comparable or even better classification accuracy on commonly used and large datasets.

web age information management | 2016

Efficient Unique Column Combinations Discovery Based on Data Distribution

Chao Wang; Shupeng Han; Xiangrui Cai; Haiwei Zhang; Yanlong Wen

Discovering all unique column combinations in a relation is a fundamental research problem for modern data management and knowledge discovery applications. With the rapid growth of data volume and the popularity of distributed platforms, some algorithms attempt to discover uniques in large-scale datasets. However, their performance is not always satisfactory for datasets that have few unique values in each column. This paper proposes a parallel algorithm to discover unique column combinations in large-scale datasets on Hadoop. We first construct a prefix tree to depict all unique candidates. Then we parallelize the verification of candidates in the same layer of the prefix tree. Two parallel strategies can be chosen: one parallelizes across all subtrees, and the other parallelizes only within a single subtree. The parallel strategies and pruning methods are self-adaptive based on the data distribution. Finally, experimental results demonstrate the advantages of the proposed method.
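
The layer-by-layer verification described above can be illustrated with a sequential sketch. This is not the paper's Hadoop algorithm: it enumerates candidates with `itertools.combinations` rather than an explicit prefix tree, applies only the standard superset pruning rule, and omits the parallel and self-adaptive strategies. Function names are assumptions for the example.

```python
from itertools import combinations


def is_unique(rows, columns):
    """True if no two rows agree on all of the given column indices."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_uniques(rows, num_columns):
    """Level-wise search for minimal unique column combinations.

    Layer k holds all candidates with k columns. A candidate that contains
    an already discovered unique is skipped, because it is unique but not
    minimal.
    """
    found = []
    for size in range(1, num_columns + 1):
        for candidate in combinations(range(num_columns), size):
            if any(set(u) <= set(candidate) for u in found):
                continue  # pruned: superset of a known unique
            if is_unique(rows, candidate):
                found.append(candidate)
    return found
```

Within one layer, no candidate's check depends on the result of another candidate in the same layer, which is what makes per-layer verification a natural unit for parallel execution.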
