Qingsong Yao
York University
Publication
Featured research published by Qingsong Yao.
database systems for advanced applications | 2005
Qingsong Yao; Aijun An; Xiangji Huang
A database user session is a sequence of queries issued by a user (or an application) to achieve a certain task. Analysis of task-oriented database user sessions provides useful insight into the query behavior of database users. In this paper, we describe novel algorithms for identifying sessions from database traces and for grouping the sessions into different classes. We also present experimental results.
database and expert systems applications | 2003
Qingsong Yao; Aijun An
In this paper, we propose a solution that partly solves the selection and replacement problems for semantic query caching. We believe that the queries submitted by a client are not random: they have a certain meaning and may follow certain rules. We use user access graphs to represent the query execution orders and propose algorithms that use such information for semantic query caching. Unlike previous approaches, ours anticipates incoming queries based on the queries that have been submitted, analyzes the semantic relationship between them, and rewrites and caches the current query to answer multiple queries. Our initial experimental results show that our solution improves cache performance.
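The idea of anticipating incoming queries from a user access graph can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the paper: the graph is taken to be a first-order transition count between query labels, and the names `build_access_graph`, `predict_next`, and the `min_conf` cutoff are hypothetical.

```python
from collections import Counter, defaultdict


def build_access_graph(sessions):
    """Count how often each query is directly followed by another.

    `sessions` is a list of query-label sequences (an assumed input format).
    """
    graph = defaultdict(Counter)
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            graph[cur][nxt] += 1
    return graph


def predict_next(graph, current, min_conf=0.5):
    """Return the most frequent follower of `current`, or None.

    A follower is only returned when its share of all observed
    transitions out of `current` reaches `min_conf` (assumed cutoff).
    """
    followers = graph.get(current)
    if not followers:
        return None
    nxt, count = followers.most_common(1)[0]
    if count / sum(followers.values()) >= min_conf:
        return nxt
    return None


sessions = [["q1", "q2", "q3"], ["q1", "q2"], ["q1", "q4"]]
graph = build_access_graph(sessions)
print(predict_next(graph, "q1"))
```

A cache could use such a prediction to decide whether rewriting the current query so that its result also answers the anticipated one is worthwhile; how that rewriting is done is outside this sketch.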
Knowledge and Information Systems | 2006
Xiangji Huang; Qingsong Yao; Aijun An
A database session is a sequence of requests presented to the database system by a user or an application to achieve a certain task. Session identification is an important step in discovering useful patterns from database trace logs. The discovered patterns can be used to improve the performance of database systems by prefetching predicted queries, rewriting the current query or conducting effective cache replacement. In this paper, we present an application of a new session identification method based on statistical language modeling to database trace logs. Several problems of the language modeling based method are revealed in the application, which include how to select values for the parameters of the language model, how to evaluate the accuracy of the session identification result and how to learn a language model without well-labeled training data. All of these issues are important in the successful application of the language modeling based method for session identification. We propose solutions to these open issues. In particular, new methods for determining an entropy threshold and the order of the language model are proposed. New performance measures are presented to better evaluate the accuracy of the identified sessions. Furthermore, three types of learning methods, namely, learning from labeled data, learning from semi-labeled data and learning from unlabeled data, are introduced to learn language models from different types of training data. Finally, we report experimental results that show the effectiveness of the language model based method for identifying sessions from the trace logs of an OLTP database application and the TPC-C Benchmark.
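To make the roles of the language model order and the entropy threshold concrete, here is a simplified sketch. It is an assumption-laden stand-in rather than the method of the paper: it fixes the order at two (a bigram model with add-one smoothing), learns from labeled sessions only, and places a boundary wherever the surprisal of a request given its predecessor exceeds a hand-picked threshold. The function names and the example request labels are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sessions):
    """Collect context and bigram counts from labeled sessions."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for s in sessions:
        vocab.update(s)
        for a, b in zip(s, s[1:]):
            contexts[a] += 1
            bigrams[(a, b)] += 1
    return contexts, bigrams, len(vocab)


def surprisal(model, prev, cur):
    """Return -log2 P(cur | prev) with add-one smoothing."""
    contexts, bigrams, vocab_size = model
    p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
    return -math.log2(p)


def split_sessions(model, trace, threshold):
    """Cut the trace wherever a request is too surprising given the last one."""
    if not trace:
        return []
    sessions, current = [], [trace[0]]
    for prev, cur in zip(trace, trace[1:]):
        if surprisal(model, prev, cur) > threshold:
            sessions.append(current)
            current = [cur]
        else:
            current.append(cur)
    sessions.append(current)
    return sessions


model = train_bigram([["login", "browse", "buy"]] * 3)
trace = ["login", "browse", "buy", "login", "browse", "buy"]
print(split_sessions(model, trace, threshold=1.0))
```

In this toy setting the threshold of 1.0 bit is chosen by hand; choosing it, and the model order, in a principled way is exactly the kind of open issue the abstract describes.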
database and expert systems applications | 2004
Qingsong Yao; Aijun An
Much work has been done on characterizing the workload of a database system. Previous studies focused on providing different types of statistical summaries, and modeling the run-time behavior on the physical resource level. In this paper, we focus on characterizing the database system’s workload from the view of database users. We use user access patterns to describe how a client application or a group of users accesses the data of a database system. The user access patterns include a set of user access events that represent the format of the queries and a set of user access graphs that represent the query execution orders. User access patterns can help database administrators tune the system, help database users optimize queries, and help to predict and cache future queries. In this paper, we present several approaches to using user access patterns to improve system performance, and report some experimental results.
data warehousing and knowledge discovery | 2005
Qingsong Yao; Xiangji Huang; Aijun An
In this paper, we describe a novel co-training based algorithm for identifying database user sessions from database traces. The algorithm learns to identify positive data (session boundaries) and negative data (non-session boundaries) incrementally by using two methods interactively over several iterations. In each iteration, previously identified positive and negative data are used to build better models, which in turn can label some new data and improve the performance of further iterations. We also present experimental results.
international symposium on methodologies for intelligent systems | 2006
Qingsong Yao; Aijun An; Xiangji Huang
We present our approach to mining and modeling the behavior of database users. In particular, we propose graphic models to capture database users' dynamic behavior and focus on applying data mining techniques to the problem of mining and modeling database user behaviors from database trace logs. The experimental results show that our approach can discover and model user behaviors successfully.
international symposium on methodologies for intelligent systems | 2005
Qingsong Yao; Aijun An; Xiangji Huang
In this paper, we present a distance-based clustering algorithm for grouping database user sessions. The algorithm considers both local and global similarities between sessions and incorporates three distance metrics in the computation of the distance between two sessions. We describe the three metrics and discuss the rationale for combining them. The algorithm is evaluated on two datasets: a clinic OLTP workload file and the TPC-W benchmark. The evaluation results are reported.
web age information management | 2003
Qingsong Yao; Aijun An
Database users often submit similar queries to retrieve certain information from the database. We use a user access event to represent a set of similar queries. A user access event contains an SQL template and a set of parameters, where the value of a parameter can be a constant or a variable. For example, event (“select name from customer where id =%”,101) represents a single query that retrieves the name of customer 101, while event (“select name from customer where id =%”,g_cid) represents a set of queries that retrieve the name of a given customer. The event execution orders are represented by dependency graphs, which are called user access paths.
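The split of a query into an SQL template plus parameters can be illustrated with a short sketch. It assumes a deliberately naive rule that is not taken from the paper: every numeric or single-quoted string literal becomes a `%` placeholder, and the template is whitespace-normalized and lowercased. The helper name `to_event` is hypothetical.

```python
import re

# Naive literal matcher (an assumption for illustration): single-quoted
# strings, with '' as an escaped quote, or standalone numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def to_event(query):
    """Return (template, params) for one SQL query string."""
    params = []

    def replace_literal(match):
        params.append(match.group(0))
        return "%"

    template = _LITERAL.sub(replace_literal, query.strip())
    # Collapse whitespace and lowercase so equivalent queries share a template.
    template = re.sub(r"\s+", " ", template).lower()
    return template, tuple(params)


print(to_event("SELECT name FROM customer WHERE id = 101"))
```

Queries that differ only in their constants then map to the same template, which is what allows them to be treated as instances of one event. A real trace would need a proper SQL tokenizer to handle comments, quoted identifiers, and dialect-specific literals.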
Lecture Notes in Computer Science | 2006
Qingsong Yao; Aijun An; Xiangji Huang
Lecture Notes in Computer Science | 2004
Qingsong Yao; Aijun An