Yubin Bao | Researchain

Publication

Featured research published by Yubin Bao.

international conference on management of data | 2002

XBase: making your gigabyte disk queriable

Hongjun Lu; Guoren Wang; Ge Yu; Yubin Bao; Jianhua Lv; Yaxin Yu

With the rapid development of the Internet and the World Wide Web (WWW), a very large amount of information is available and ready for downloading, most of it free of charge. At the same time, hard disks with large capacity are available at affordable prices. Most of us nowadays dump large numbers of documents of various types onto our computers without much thought. On the other hand, file systems have not changed much during the past decades. Most of them organize files in directories that form a tree structure, and a file is identified by its name and pathname in the directory tree. Remembering the names of files created some time ago and digging them out from a disk holding dozens of gigabytes of data in hundreds of thousands of files is never an easy task. Tools available for helping such a search are still far from satisfactory. XBase (XML-based document BASE) is a prototype system that aims to address this problem. By XML-based, we mean that XML is used to define the metadata. The current version of XBase stores text-based files, including semi-structured data such as XML and HTML, plain text documents (e.g., tex files, computer programs), and files that can be converted into text (e.g., PostScript files, PDF files). In XBase, a file name is optional. Users can load a file into XBase without giving a name or the directory where it should be stored. XBase automatically associates it with attributes such as the time when the file was saved, its source, its size, and its type. To retrieve files, XBase provides three access methods: explorative browsing, querying using query languages, and keyword-based search.

computer science and software engineering | 2008

Study on Log-Based Change Data Capture and Handling Mechanism in Real-Time Data Warehouse

Jingang Shi; Yubin Bao; Fangling Leng; Ge Yu

This paper proposes a framework for change data capture and data extraction, which captures changed data through log analysis and processes the captured data further to improve data quality. The processed data are then pushed to a data queue, which the system processes using a priority-based scheduling algorithm. Finally, the processed data are loaded into the real-time data warehouse to support decision analysis. Analysis of a test case shows that this method can capture all changed data from the source in time without changing the structure of the source system, and has little impact on the performance of the source system. In addition, the real-time scheduling algorithm can effectively improve the data quality and data freshness of the real-time data warehouse, giving better data support for routine tactical business decisions.

web age information management | 2004

CD-Trees: An Efficient Index Structure for Outlier Detection

Huanliang Sun; Yubin Bao; Faxin Zhao; Ge Yu; Daling Wang

Outlier detection aims to find objects that do not comply with the general behavior of the data. Partitioning is a method of dividing the data space into a set of non-overlapping rectangular cells. Real-life datasets are often heavily skewed, so partitioning produces many empty cells. Cell-based algorithms for outlier detection do not pay enough attention to the existence of many empty cells, which affects their efficiency. In this paper, we propose the concept of Skew of Data (SOD) to measure the degree of data skew; it approximates the percentage of empty cells under a partition of a dataset. An efficient index structure called the CD-Tree and the related algorithms are designed, and the CD-Tree is applied to detect outliers. Compared with cell-based algorithms on real-life datasets, the CD-Tree-based algorithm is at least 4 times faster, and the number of dimensions it can process also increases noticeably.
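
The abstract does not give the formula for SOD, only that it approximates the percentage of empty cells under a partition. As a minimal sketch of that underlying quantity (not the paper's SOD definition), the Python function below partitions a bounded data space into an equal-width grid and reports the fraction of cells that contain no point. The function name, the equal-width grid, and the parameters are assumptions made for illustration.

```python
def empty_cell_fraction(points, bins, lower, upper):
    """Fraction of grid cells that contain no point.

    points: list of equal-length tuples of numbers
    bins:   number of equal-width cells per dimension (illustrative choice)
    lower, upper: per-dimension bounds of the data space
    """
    dims = len(lower)
    occupied = set()
    for p in points:
        cell = []
        for d in range(dims):
            width = (upper[d] - lower[d]) / bins
            idx = int((p[d] - lower[d]) / width)
            # Clamp so that points on the upper boundary fall in the last cell.
            cell.append(min(max(idx, 0), bins - 1))
        occupied.add(tuple(cell))
    total = bins ** dims
    return (total - len(occupied)) / total
```

For three points clustered in two opposite corners of the unit square with `bins=2`, two of the four cells stay empty, so the function returns 0.5. Storing only occupied cells, as the set does here, is the kind of saving that matters when most cells are empty.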

international conference on web based learning | 2002

Using Page Classification and Association Rule Mining for Personalized Recommendation in Distance Learning

Daling Wang; Yubin Bao; Ge Yu; Guoren Wang

With the rapid development of the Internet, distance learning applications over the Internet have become more and more popular. This paper introduces a personalized learning system for web-based distance learning and focuses on the web usage mining techniques aimed at a personalized recommendation service. First, this paper presents a web page classification method, which uses attribute-oriented induction according to related domain knowledge represented by a concept hierarchy tree. Second, the paper presents an algorithm for mining association rules with one-support using Freq-Set-Tree. Third, recommendation pages are generated and presented to the students based on their current access patterns, page classes at the home site, page integration from other sites, and the rules discovered in mining.

Journal of Computer Science and Technology | 2003

Managing very large document collections using semantics

Guoren Wang; Hongjun Lu; Ge Yu; Yubin Bao

In this paper, a system is presented where documents are no longer identified by their file names. Instead, a document is represented by its semantics in terms of descriptor and content vector. The descriptor of a document consists of a set of attributes, such as date of creation, its type, its size, annotations, etc. The content vector of a document consists of a set of terms extracted from the document. In this paper, a semantic document management system XBASE is designed and implemented based on the semantics and the functions of three main modules, X-Loader, X-Explorer and X-Query.

computer and information technology | 2010

A Triggering and Scheduling Approach for ETL in a Real-time Data Warehouse

Jie Song; Yubin Bao; Jingang Shi

In real-time data warehouses, ETL is no longer executed periodically during the idle time of the data warehouse, but runs continuously. Thus the triggering of ETL tasks and the scheduling of updates and queries become the key issues. This paper proposes IBSA (Integration Based Scheduling Approach), which includes the triggering rule and algorithm for starting an ETL task, and a scheduling algorithm that balances queries and updates through thread control. We also propose an implementation framework. A series of experiments shows that IBSA can adjust the running order of tasks reasonably and use system resources effectively to provide faster query response and higher real-time capability of ETL.

international conference hybrid intelligent systems | 2009

Priority-Based Balance Scheduling in Real-Time Data Warehouse

Jingang Shi; Yubin Bao; Fangling Leng; Ge Yu

In real-time data warehouses, data import is no longer performed in a batched, periodic way during the idle time of the data warehouse, but runs continuously. Updates to a real-time data warehouse conflict with queries against it, so the scheduling of updates and queries becomes a key issue. This paper proposes a priority-based balance scheduling algorithm (PBBS). First, according to the response time requirements of queries and the different import levels of the data being updated, the algorithm assigns different priorities to all tasks. It then schedules tasks in parallel, considering the task priorities, the execution conditions of the task queues, and feedback on system resources. It also proposes a method that ensures data consistency for parallel tasks. Finally, experiments show that the algorithm can adjust the resource allocation for updates and queries in accordance with user requirements, make rational use of system resources, and ensure that high-priority tasks are processed first. It thus reduces the response time of important queries and improves the freshness of important data.
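
The abstract does not spell out how PBBS assigns or balances priorities, so the following is only a minimal sketch of the general idea that update and query tasks share one queue and higher-priority tasks are dispatched first. It is single-threaded and omits the parallel scheduling, resource feedback, and consistency mechanisms the paper describes. The class name, the task names, and the convention that a lower number means a higher priority are all assumptions for illustration.

```python
import heapq
import itertools


class PriorityScheduler:
    """Toy dispatcher: lowest priority number first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so equal priorities stay FIFO.
        self._counter = itertools.count()

    def submit(self, priority, kind, name):
        # heapq is a min-heap, so a smaller priority value is popped earlier.
        heapq.heappush(self._heap, (priority, next(self._counter), kind, name))

    def run_order(self):
        """Drain the queue and return tasks in the order they would run."""
        order = []
        while self._heap:
            _, _, kind, name = heapq.heappop(self._heap)
            order.append((kind, name))
        return order
```

Submitting an update and a query at priority 2, a query at priority 1, and an update at priority 0 yields the priority-0 update first, then the priority-1 query, then the two priority-2 tasks in submission order.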

database systems for advanced applications | 2014

Label and Distance-Constraint Reachability Queries in Uncertain Graphs

Minghan Chen; Yu Gu; Yubin Bao; Ge Yu

A fundamental research problem concerning edge-labeled uncertain graphs is the label and distance-constraint reachability query (LDCR): given two vertices u and v, query the probability that u can reach v through paths whose labels and lengths are constrained by a label set and a distance threshold, respectively. Since LDCR is #P-complete and therefore intractable, we aim to propose effective and efficient approximate solutions for it. We first introduce a subpath-based filtering strategy that combines a divide-and-conquer algorithm with branch path pruning to compress the original graph and reduce the scale of the DC-tree. Then, to approximate LDCR, several estimators are presented based on different sampling mechanisms, and a path/cut bound is proposed to prune large-deviation values. An extensive experimental evaluation on both real and synthetic datasets demonstrates that our approaches perform well in terms of query time and accuracy.
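
To make the query concrete, here is a minimal sketch of a naive Monte Carlo estimator for LDCR. It is not one of the paper's estimators and uses none of its filtering or bounding techniques. It assumes directed edges that exist independently with a given probability, and it measures path length in hops; both are assumptions, as are the function name and the edge tuple layout.

```python
import random
from collections import deque


def ldcr_estimate(edges, u, v, labels, max_dist, samples=2000, seed=0):
    """Estimate P(u reaches v) over sampled possible worlds.

    edges: list of (src, dst, label, prob) directed edges (assumed layout)
    labels: set of allowed edge labels
    max_dist: maximum number of hops allowed on a path
    """
    rng = random.Random(seed)
    # Edges with a disallowed label can never be on a valid path.
    usable = [(s, t, p) for s, t, lab, p in edges if lab in labels]
    hits = 0
    for _ in range(samples):
        # Sample one possible world: keep each edge with its probability.
        adj = {}
        for s, t, p in usable:
            if rng.random() < p:
                adj.setdefault(s, []).append(t)
        # Breadth-first search from u, bounded by max_dist hops.
        dist = {u: 0}
        queue = deque([u])
        reached = (u == v)
        while queue and not reached:
            node = queue.popleft()
            if dist[node] == max_dist:
                continue
            for nxt in adj.get(node, []):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    if nxt == v:
                        reached = True
                        break
                    queue.append(nxt)
        hits += reached
    return hits / samples
```

Each sample costs a full graph traversal, which is one reason smarter sampling schemes and pruning bounds such as those the abstract mentions are worth pursuing.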

international conference for young computer scientists | 2008

A Role and Context Based Access Control Model with UML

Yubin Bao; Jie Song; Daling Wang; Derong Shen; Ge Yu

As access control models are widely used in systems, a more agile access control model is required to solve complicated modeling, user authorization, and verification problems. In this paper, an access control model based on the concepts of role, attribute and context, named C-RBAC, is proposed. This model builds on and further improves role-based access control (RBAC). The proposed model adds system conditions to access control, distinguishes users that belong to one role by user attributes, provides an agile and dynamic role model by adopting the concept of conditional role, and designs a more flexible access authorization mechanism to reinforce the role model of RBAC. The implementation and the UML modeling approaches of the proposed model are also explained in this paper. Theoretical analysis and experiments show that the new access control model is more effective than the traditional RBAC model.

web age information management | 2004

Performance Optimization of Fractal Dimension Based Feature Selection Algorithm

Yubin Bao; Ge Yu; Huanliang Sun; Daling Wang

Feature selection is a key issue in advanced application fields such as data mining, multi-dimensional statistical analysis, multimedia indexing, and document classification. Exploiting fractal dimension to reduce the dimensionality of feature spaces is a novel approach. The best-known method is the fractal dimension based feature selection algorithm FDR proposed by Traina Jr. et al. This paper proposes an optimized algorithm, OptFDR, which scans the dataset only once and avoids the efficiency problems caused by FDR's multiple scans of large datasets. Performance experiments evaluate the OptFDR algorithm using a real-world image feature dataset and a synthetic dataset with fractal characteristics. The experimental results show that the OptFDR algorithm outperforms the FDR algorithm.
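
The abstract does not describe how FDR or OptFDR compute the fractal dimension, so the following is only a generic sketch of one common way to estimate a fractal dimension, box counting, and not either algorithm. It assumes points normalized to the unit hypercube, counts occupied grid cells at several resolutions, and fits the slope of log(occupied cells) against log(cells per side). Note that this naive version scans the data once per resolution, which is exactly the kind of repeated scanning OptFDR is said to avoid. The function name and default resolutions are illustrative.

```python
import math


def box_counting_dimension(points, bin_counts=(2, 4, 8, 16, 32)):
    """Estimate the box-counting dimension of points in [0, 1]^d.

    points: list of tuples with coordinates in [0, 1]
    bin_counts: grid resolutions (cells per side) to evaluate
    """
    xs, ys = [], []
    for bins in bin_counts:
        # One pass over the data per resolution: collect the occupied cells.
        occupied = {
            tuple(min(int(c * bins), bins - 1) for c in p) for p in points
        }
        xs.append(math.log(bins))
        ys.append(math.log(len(occupied)))
    # Least-squares slope of log(count) versus log(resolution).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

Points spread along the diagonal of the unit square give an estimate close to 1 even though they have two attributes, while points filling the square give an estimate close to 2. That gap between the number of attributes and the estimated dimension is the intuition behind using fractal dimension to spot redundant features.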
