Dejun Yue
Northeastern University
Publication
Featured research published by Dejun Yue.
Computers & Mathematics With Applications | 2009
Tiancheng Zhang; Dejun Yue; Yu Gu; Yi Wang; Ge Yu
Correlation analysis is a very useful technique for similarity search in data stream mining. The traditional method is not suitable for real-time processing, especially when the number of stream sequences is very large. In this paper, we propose HBR (Hierarchical Boolean Representation), a novel technique for correlation analysis in stream time series. The original stream sequences are transformed first into the Macro-Boolean series and then into the Micro-Boolean series, and the candidate correlation set can be easily obtained by simple bit operations. With a huge number of stream series, this method can quickly find the correlated pairs of series by avoiding complicated calculations while using little space. Meanwhile, this approach can update the Boolean series incrementally at very low cost and adaptively adjust some important coefficients according to the stream features. The experimental evaluations show that HBR has low computational complexity and high accuracy.
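The idea of filtering candidate pairs with bit operations can be illustrated with a minimal sketch. It is not the HBR algorithm itself: the single-level rise/fall encoding, the function names and the 0.8 threshold are all assumptions made for illustration, and the Macro/Micro hierarchy and incremental update are left out.

```python
from itertools import combinations

def to_boolean(series):
    """Encode a series as an int bitmask: bit i is 1 when series[i+1] >= series[i].
    (Assumed encoding for illustration; the abstract does not specify the rule.)"""
    bits = 0
    for i in range(len(series) - 1):
        if series[i + 1] >= series[i]:
            bits |= 1 << i
    return bits

def agreement(a_bits, b_bits, n_bits):
    """Fraction of positions where two Boolean series agree, via XOR and popcount."""
    differing = bin(a_bits ^ b_bits).count("1")
    return 1.0 - differing / n_bits

def candidate_pairs(streams, threshold=0.8):
    """Return name pairs whose Boolean series agree on at least `threshold` of bits.
    Assumes all series have the same length."""
    n_bits = len(next(iter(streams.values()))) - 1
    encoded = {name: to_boolean(s) for name, s in streams.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(encoded), 2)
        if agreement(encoded[a], encoded[b], n_bits) >= threshold
    ]

streams = {
    "s1": [1, 2, 3, 2, 4, 5],
    "s2": [10, 12, 15, 11, 13, 18],
    "s3": [5, 4, 3, 4, 2, 1],
}
print(candidate_pairs(streams))  # [('s1', 's2')]
```

Only the pairs that survive this cheap filter would then need an exact correlation computation.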
Conference on Information and Knowledge Management | 2007
Tiancheng Zhang; Dejun Yue; Yu Gu; Ge Yu
Correlation analysis is a basic problem in the field of data stream mining. Typical approaches apply a sliding window to data streams to get the most recent results, but the user-defined window length is always fixed, which is not suitable for a changing stream environment. We propose a data-adaptive method, based on a Boolean representation, for correlation analysis among a large number of time series streams. The periodic trends of each stream series are monitored to choose the most suitable window size and to group series with the same trends together. Instead of performing complex pair-wise calculations, we can quickly obtain the correlated pairs of series at the optimal window sizes. All the processing is realized by simple Boolean operations. Both the theoretical analysis and the experimental evaluations show that our method has good computational efficiency and high accuracy.
IEEE International Conference on Dependable, Autonomic and Secure Computing | 2014
Dejun Yue; Ge Yu; Derong Shen; Xiaocong Yu
Many challenging problems can be solved better by exploiting crowdsourcing platforms than by traditional machine-based methods. However, data quality in crowdsourcing applications has become a crucial concern, since crowdsourcing workers may have different capabilities. In this paper, we propose a novel weighted aggregation rule (WAR) to improve result accuracy in crowdsourcing systems. According to the agreement among the answers given by the workers, we classify all tasks into high-agreement tasks and low-agreement tasks. For the high-agreement tasks, we use simple majority voting to select the correct answer while ensuring result accuracy. For the low-agreement tasks, we adopt a weighted majority voting strategy, which assigns a weight to each worker according to the worker's performance on the high-agreement tasks. We evaluate the effectiveness of our proposed method using three real-world datasets on AMT. The experimental results show that our method achieves excellent result accuracy.
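The two-stage aggregation described above can be sketched roughly as follows. This is an illustrative reading, not the paper's exact WAR rule: the agreement threshold of 0.75, the weight formula (share of high-agreement tasks on which a worker matched the majority) and the default weight for unseen workers are all assumptions.

```python
from collections import Counter, defaultdict

def aggregate(answers, agreement_threshold=0.75):
    """answers: {task: {worker: label}} -> {task: chosen label}."""
    results, low_tasks = {}, []
    # Pass 1: plain majority voting on high-agreement tasks.
    for task, votes in answers.items():
        label, count = Counter(votes.values()).most_common(1)[0]
        if count / len(votes) >= agreement_threshold:
            results[task] = label
        else:
            low_tasks.append(task)
    # Worker weight: share of high-agreement tasks where the worker matched the majority.
    hits, seen = defaultdict(int), defaultdict(int)
    for task, label in results.items():
        for worker, vote in answers[task].items():
            seen[worker] += 1
            hits[worker] += vote == label
    weight = {w: hits[w] / seen[w] for w in seen}
    # Pass 2: weighted majority voting on low-agreement tasks.
    for task in low_tasks:
        scores = defaultdict(float)
        for worker, vote in answers[task].items():
            # Workers with no high-agreement history get 0.5 (an assumption).
            scores[vote] += weight.get(worker, 0.5)
        results[task] = max(scores, key=scores.get)
    return results

answers = {
    "t1": {"ann": "cat", "dan": "cat", "eve": "cat", "bob": "dog"},
    "t2": {"ann": "red", "dan": "red", "eve": "red", "cy": "blue"},
    "t3": {"ann": "yes", "bob": "no", "cy": "no"},
}
print(aggregate(answers))  # {'t1': 'cat', 't2': 'red', 't3': 'yes'}
```

In this toy input, plain majority voting would pick "no" for t3, but the two workers who voted "no" were both wrong on the high-agreement tasks, so the weighted vote selects "yes".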
Chinese Control and Decision Conference | 2011
Tiancheng Zhang; Dejun Yue; Yanqiu Wang; Ge Yu
Correlation analysis is a key problem in data stream analysis. In this paper, we propose a correlation analysis method for multi-dimensional data streams, based on the Boolean lag representation and PCA (Principal Component Analysis). Firstly, the raw stream sequence is transformed into a Boolean sequence. By analyzing the correlation of Boolean sequences, we can easily find the sequence pairs with lag correlations by means of simple bit operations. Secondly, we compute the lag time and synchronize the multi-dimensional data stream. Thirdly, the PCA method is applied to reduce the multiple data streams, so that the data streams can be reconstructed from a few principal components. The experimental evaluations show that the method has high computational performance and high accuracy.
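A rough sketch of the first two steps (Boolean encoding and lag detection) is given below. The rise/fall encoding, the function names and the agreement score are assumptions for illustration, lists stand in for packed bit words, and the PCA step is omitted.

```python
def to_bits(series):
    """1 where the series rises or stays level between consecutive points, else 0."""
    return [int(b >= a) for a, b in zip(series, series[1:])]

def best_lag(x, y, max_lag):
    """Return (lag, agreement) for the shift of y that best matches x.
    A lag of k means y repeats the pattern of x k steps later."""
    bx, by = to_bits(x), to_bits(y)
    best = (0, -1.0)
    for lag in range(max_lag + 1):
        pairs = list(zip(bx, by[lag:]))  # compare only the overlapping part
        if not pairs:
            break
        score = sum(a == b for a, b in pairs) / len(pairs)
        if score > best[1]:
            best = (lag, score)
    return best

x = [1, 2, 4, 3, 2, 5, 6, 4]
y = [9, 8] + x  # the same pattern, delayed by two steps
print(best_lag(x, y, max_lag=3))  # (2, 1.0)
```

Once the lag is known, the second stream can be shifted by that amount so that the streams are synchronized before any dimensionality reduction.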
Fuzzy Systems and Knowledge Discovery | 2007
Tiancheng Zhang; Dejun Yue; Ge Yu; Yu Gu
Correlation analysis is a basic problem in the field of data stream mining. The traditional method is not suitable for real-time processing of huge amounts of stream data. We propose a hierarchical Boolean representation method for correlation analysis among time series data streams. The original streaming series are transformed first into the Macro-Boolean series and then into the Micro-Boolean series, and the candidates can be easily obtained by simple bit operations. With a huge number of streaming series, this method can quickly find the correlated pairs of series by avoiding heavy calculation while using little space. The experimental evaluations show that our method has better computational complexity with high accuracy.
Web Age Information Management | 2012
Haixu Miao; Tiezheng Nie; Dejun Yue; Tiancheng Zhang; Jinshen Liu
As XML becomes the standard for data presentation and information exchange, how to efficiently query information from XML documents has become a hot topic. However, for larger XML documents and complicated XQueries, the performance of query processing executed on a single node can seldom meet the needs of users. In this paper, the algebra PPXA (Pure Parallel XQuery Algebra) is proposed to support parallel processing of XQuery statements. Based on the algebra, a strategy for query plan decomposition is proposed for complex path queries and Twig queries. Then, we propose three optimization algorithms based on PPXA. The logical parallel execution plan is optimized by rules on operators, which reduce the local query execution costs. We implement the algebra and the query decomposition strategy in a native XML database system, PureXBase. The experimental results show that it supports parallel XQuery processing effectively and can significantly improve the efficiency of query processing.
Computational Intelligence and Security | 2012
Tiancheng Zhang; Yifang Yin; Dejun Yue; Qian Ma; Ge Yu
Radio Frequency Identification (RFID) offers multiple advantages over traditional barcodes, such as hands-off detection, longer read range and more data storage. In addition, the declining cost of RFID systems, along with improved sensitivity and durability, has increased its potential for use in a variety of domains such as logistics, planning and supply chain processes. However, the deployment of RFID facilities in real-world scenarios always takes time and money. Once significant design weaknesses appear, the facilities must be deployed all over again. In this paper, we present an RFID simulation platform, RFIDSim, which lets users build their own virtual scenario and deploy RFID facilities in it instead. This simulation platform, which relies on a discrete event simulator, is designed to implement part of the ISO 18000-6C communication protocol and to support path loss, backscatter, capture and tag mobility models. In addition, the reader models are programmable through a special language, so that users can adapt the readers to different applications. All the data collected during the simulation is stored in a database so that users can judge whether a certain deployment is appropriate.
Workshop on Information Security Applications | 2011
Dejun Yue; Ge Yu; Jinshen Liu; Tiancheng Zhang; Tiezheng Nie; FangFang Li
Keyword search is a widely popular way of querying XML documents. However, the increasing volume of XML data poses new challenges to keyword search processing. A parallel database is an efficient solution to this problem. In this paper, we study the problem of effective keyword search for the SLCA (smallest lowest common ancestor) in parallel XML databases. We propose two efficient algorithms, SONB (Scan once with no buffer) and MSOP (Merge strategy based on ordered partition), to compute the SLCA in a parallel environment. We have performed an extensive experimental study, and the results show that our proposed approach achieves high efficiency for keyword search.
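To make the SLCA notion concrete, here is a naive single-node baseline written with Python's standard `xml.etree.ElementTree`. It is not SONB or MSOP and does no parallel processing; the function name `slca` and the rule of matching keywords against tag names and whitespace-separated text tokens are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def slca(root, keywords):
    """Return elements whose subtree contains every keyword while no child subtree does."""
    wanted = {k.lower() for k in keywords}
    result = []

    def visit(node):
        # Keywords matched by this node's own tag and text (tail text is ignored).
        words = {node.tag.lower()}
        words.update((node.text or "").lower().split())
        found = wanted & words
        child_is_full = False
        for child in node:
            child_found = visit(child)
            # A child covering all keywords means this node is not a smallest ancestor.
            child_is_full = child_is_full or child_found == wanted
            found |= child_found
        if found == wanted and not child_is_full:
            result.append(node)
        return found

    visit(root)
    return result

doc = ET.fromstring(
    "<bib>"
    "<book><title>XML search</title><author>Yu</author></book>"
    "<book><title>Parallel XML</title><author>Nie</author></book>"
    "</bib>"
)
hits = slca(doc, ["xml", "yu"])
print([(h.tag, h.find("author").text) for h in hits])  # [('book', 'Yu')]
```

The root `bib` also contains both keywords, but it is excluded because one of its children already covers them.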
Database Systems for Advanced Applications | 2011
Tiezheng Nie; Ge Yu; Derong Shen; Yue Kou; Dejun Yue
Web pages contain a large amount of structured data, which is useful for many advanced applications. Existing work has mainly focused on extracting structured data from web pages with individual wrappers but has ignored the quality of the underlying web pages, which in fact seriously affects the extraction results. Thus, we define the quality of a web page by the data quality a wrapper can achieve in extraction. This paper proposes a novel approach to assess the quality of web pages in the deep web. In our approach, we first define the schema of web data with a hierarchical model. Then web pages are treated as XML documents and parsed into a DOM tree. The data units and attribute values in the web page are annotated with the schema semantics and the XPath of their position in the DOM tree. Based on the annotation, we build an assessment model for the quality of web pages with two dimensions: the structure complexity and the text complexity of nodes in the DOM tree. The quality is partitioned into three quality levels in our model, and the quality of web pages within the same level is compared using the proposed formulas. Moreover, since most existing wrappers cannot handle data with a hierarchical structure, we design an XQuery-based wrapper to extract data from the web page and validate our quality model. The wrapper generates XQuery statements to extract web data using the annotation information. The experimental results demonstrate that our approach is accurate in assessing the data quality of web pages. It is very helpful for data quality control in deep web related applications.
Web Age Information Management | 2007
Yu Gu; Ge Yu; Shanshan Wu; Xiaojing Li; Yanfei Lv; Dejun Yue
With the expansion of query requirements, event-related semantics have appeared in data stream applications. This paper builds a simple but effective event-driven data stream model, EQM, based on event semantics and the features of data streams. Furthermore, some improved approaches to event detection and queries, as well as the related efficiency evaluation, are discussed. Experiments show that our approaches achieve better performance than available data stream models and processing approaches as far as event-related problems are concerned.