
Yi-Hung Wu

National Tsing Hua University


Publication

Featured research published by Yi-Hung Wu.

IEEE Transactions on Knowledge and Data Engineering | 2007

Hiding Sensitive Association Rules with Limited Side Effects

Yi-Hung Wu; Chia-Ming Chiang; Arbee L. P. Chen

Data mining techniques have been widely used in various applications. However, the misuse of these techniques may lead to the disclosure of sensitive information. Researchers have recently made efforts at hiding sensitive association rules. Nevertheless, undesired side effects, e.g., nonsensitive rules falsely hidden and spurious rules falsely generated, may be produced in the rule hiding process. In this paper, we present a novel approach that strategically modifies a few transactions in the transaction database to decrease the supports or confidences of sensitive rules without producing the side effects. Since the correlation among rules can make it impossible to achieve this goal, we propose heuristic methods for increasing the number of hidden sensitive rules and reducing the number of modified entries. The experimental results show the effectiveness of our approach, i.e., undesired side effects are avoided in the rule hiding process. The results also show that in most cases, all the sensitive rules are hidden without spurious rules falsely generated. Moreover, we observe the good scalability of our approach in terms of database size and the influence of the correlation among rules on rule hiding.
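To make the terms support and confidence concrete, the following is a minimal sketch, not the paper's method. It computes both measures for a rule and then lowers the confidence with a naive strategy: it removes one consequent item from supporting transactions until the confidence drops below a threshold. The function names, the `min_conf` parameter, and the toy database are assumptions for illustration, and the sketch does not check for the side effects (falsely hidden or spurious rules) that the paper's heuristics are designed to avoid.

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def support(transactions, itemset):
    """Fraction of transactions that contain itemset."""
    return count_containing(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent -> consequent: count(A and C) / count(A)."""
    ante = count_containing(transactions, antecedent)
    if ante == 0:
        return 0.0
    both = count_containing(transactions, set(antecedent) | set(consequent))
    return both / ante


def naive_hide(transactions, antecedent, consequent, min_conf):
    """Naive illustration only: drop one consequent item from supporting
    transactions until the rule's confidence falls below min_conf.
    Returns the modified copy and the number of modified transactions."""
    db = [set(t) for t in transactions]
    full = set(antecedent) | set(consequent)
    victim = sorted(consequent)[0]  # hypothetical choice of item to remove
    modified = 0
    for t in db:
        if confidence(db, antecedent, consequent) < min_conf:
            break
        if full <= t:
            t.discard(victim)
            modified += 1
    return db, modified


if __name__ == "__main__":
    toy = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
    print(confidence(toy, {"a"}, {"b"}))
    hidden, n = naive_hide(toy, {"a"}, {"b"}, min_conf=0.5)
    print(n, confidence(hidden, {"a"}, {"b"}))
```

The sketch modifies a copy of the database, so the original transactions stay available for comparing rule sets before and after hiding.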

international workshop on research issues in data engineering | 2001

Enabling personalized recommendation on the Web based on user interests and behaviors

Yi-Hung Wu; Yong-Chuan Chen; Arbee L. P. Chen

The dramatic growth of the Web has brought about the rapid accumulation of data and the increasing possibility of information sharing. As the population on the Web grows, the analysis of user interests and behaviors will provide hints on how to improve the quality of service. We define user interests and behaviors based on the documents read by the user. A method for mining such user interests and behaviors is then presented. In this way, each user is associated with a set of interests and behaviors, which is stored in the user profile. In addition, we define six types of user profiles and a distance measure to classify users into clusters. Finally, three kinds of recommendation services using the clustered results are realized. For performance evaluation, we implement these services on the Web and run experiments with real data and users. The results show that the average acceptance rates of these services range from 71.5% to 94.6%.

international conference on data engineering | 2004

An efficient algorithm for mining frequent sequences by a new strategy without support counting

Ding-Ying Chiu; Yi-Hung Wu; Arbee L. P. Chen

Mining sequential patterns in large databases is an important research topic. The main challenge of mining sequential patterns is the high processing cost due to the large amount of data. We propose a new strategy called direct sequence comparison (abbreviated as DISC), which can find frequent sequences without having to compute the support counts of nonfrequent sequences. The main difference between the DISC strategy and previous works is the way nonfrequent sequences are pruned. Previous works are based on the antimonotone property, which prunes nonfrequent sequences according to the frequent sequences of shorter lengths. In contrast, the DISC strategy prunes nonfrequent sequences according to other sequences of the same length. Moreover, we summarize three strategies used in previous works and design an efficient algorithm called DISC-all to take advantage of all four strategies. The experimental results show that the DISC-all algorithm outperforms the PrefixSpan algorithm on mining frequent sequences in large databases. In addition, we analyze these strategies to design the dynamic version of our algorithm, which achieves much better performance.

international world wide web conferences | 2002

Prediction of Web Page Accesses by Proxy Server Log

Yi-Hung Wu; Arbee L. P. Chen

As the population of web users grows, the variety of user behaviors in accessing information also grows, which has a great impact on network utilization. Recently, many efforts have been made to analyze user behaviors on the WWW. In this paper, we represent user behaviors by sequences of consecutive web page accesses, derived from the access log of a proxy server. Moreover, the frequent sequences are discovered and organized as an index. Based on the index, we propose a scheme for predicting user requests and a proxy-based framework for prefetching web pages. We perform experiments on real data. The results show that our approach makes predictions with a high degree of accuracy and little overhead. In the experiments, the best hit ratio of the prediction reaches 75.69%, while the slowest prediction takes only 2.3 ms.
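The idea of indexing frequent consecutive accesses and using the index for prediction can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's index or prediction scheme: it only considers pairs of consecutive pages, and the names `build_index`, `predict_next`, and the `min_count` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(sessions, min_count=2):
    """Count consecutive page pairs and keep only the frequent ones.

    sessions: list of page-access sequences (e.g. derived from a proxy log).
    Returns a dict mapping a page to a Counter of frequent next pages.
    """
    counts = defaultdict(Counter)
    for pages in sessions:
        for cur, nxt in zip(pages, pages[1:]):
            counts[cur][nxt] += 1
    return {
        cur: Counter({n: c for n, c in nxts.items() if c >= min_count})
        for cur, nxts in counts.items()
    }


def predict_next(index, current_page):
    """Return the most frequent successor of current_page, or None."""
    nxts = index.get(current_page)
    if not nxts:
        return None
    return nxts.most_common(1)[0][0]


if __name__ == "__main__":
    sessions = [
        ["/", "/a", "/b"],
        ["/", "/a", "/c"],
        ["/", "/a", "/b"],
    ]
    index = build_index(sessions)
    print(predict_next(index, "/a"))
```

A proxy could use the predicted page as a prefetch candidate; longer prefixes than a single page would need a richer index than this sketch provides.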

database systems for advanced applications | 2004

Music Classification Using Significant Repeating Patterns

Chang-Rong Lin; Ning-Han Liu; Yi-Hung Wu; Arbee L. P. Chen

With the popularity of multimedia applications, a large amount of music data has accumulated on the Internet. Automatic classification of music data has become a critical technique for providing efficient and effective retrieval of music data. In this paper, we propose a new approach for classifying music data based on their contents. In this approach, we focus on monophonic music features represented as rhythmic and melodic sequences. Moreover, we use repeating patterns of music data to classify music. For each pattern discovered from a group of music data, we employ a series of measurements to estimate its usefulness for classifying this group of music data. According to the patterns contained in a music piece, we determine which class it should be assigned to. We perform a series of experiments, and the results show that our approach performs on average better than the approach based on the probability distribution of contextual information in music.

siam international conference on data mining | 2006

Discovering Frequent Tree Patterns over Data Streams

Mark Cheng-Enn Hsieh; Yi-Hung Wu; Arbee L. P. Chen

Since tree-structured data such as XML files are widely used for data representation and exchange on the Internet, discovering frequent tree patterns over tree-structured data streams becomes an interesting issue. In this paper, we propose an online algorithm to continuously discover the current set of frequent tree patterns from the data stream. A novel and efficient technique is introduced to incrementally generate all candidate tree patterns without duplicates. Moreover, a framework for counting the approximate frequencies of the candidate tree patterns is presented. Combining these techniques, the proposed approach is able to compute frequent tree patterns with guarantees of completeness and accuracy.

multimedia information retrieval | 2003

Efficient K-NN search in polyphonic music databases using a lower bounding mechanism

Ning-Han Liu; Yi-Hung Wu; Arbee L. P. Chen

Querying polyphonic music from a large data collection is an interesting and challenging topic. Recently, researchers have attempted to provide efficient techniques for content-based retrieval in polyphonic music databases where queries can also be polyphonic. However, most of these techniques do not perform approximate matching well. In this paper, we present a novel method to efficiently retrieve the k music works that contain segments most similar to the user query based on the edit distance. A list-based index structure is first constructed using the features of the polyphony. A set of candidate approximate answers is then generated for the user query. A lower bounding mechanism is proposed to prune these candidates such that the k answers can be obtained efficiently. The efficiency of the proposed method is evaluated on a real data set and a synthetic data set, showing significant improvement in response time over existing approaches.
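The general pattern of pruning k-NN candidates with a lower bound on the edit distance can be sketched as below. This is an illustration under assumptions, not the paper's mechanism: it compares whole sequences rather than segments, it uses the length difference as a generic lower bound in place of the paper's bound, and it has no list-based index. The names `edit_distance` and `knn` are hypothetical.

```python
import heapq


def edit_distance(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def knn(query, database, k):
    """Return the k nearest sequences and the number of full distance
    computations. The length difference never exceeds the edit distance,
    so it is a valid lower bound for pruning."""
    def lower_bound(i):
        return abs(len(database[i]) - len(query))

    heap = []       # max-heap via negated distances: (-distance, index)
    computed = 0
    # Visit candidates in increasing lower-bound order.
    for i in sorted(range(len(database)), key=lower_bound):
        if len(heap) == k and lower_bound(i) >= -heap[0][0]:
            break   # no remaining candidate can beat the current k-th best
        d = edit_distance(query, database[i])
        computed += 1
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i))
    best = sorted((-nd, i) for nd, i in heap)
    return [(database[i], d) for d, i in best], computed


if __name__ == "__main__":
    db = ["CEG", "CEA", "CDEFGAB", "C", "CEGCEGCEGCEG"]
    print(knn("CEG", db, k=2))
```

Because candidates are visited in lower-bound order, the search can stop as soon as the bound of the next candidate reaches the current k-th best distance, which is where the savings in full distance computations come from.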

symposium on large spatial databases | 2007

Continuous evaluation of fastest path queries on road networks

Chia-Chen Lee; Yi-Hung Wu; Arbee L. P. Chen

The one-shot shortest path query has been studied for decades. However, in applications on road networks, users are actually interested in the path with the minimum travel time (the fastest path), which varies over time. This motivates us to study the continuous evaluation of fastest path queries in order to capture the dynamics of road networks. Repeatedly evaluating a large number of fastest path queries at every moment is infeasible due to its high computational cost. We propose a novel approach that employs the concept of the affecting area and the tolerance parameter to avoid reevaluation while the travel time of the current answer is close enough to that of the fastest path. Furthermore, a grid-based index is designed to process multiple queries efficiently. Experiments on real datasets show a significant reduction in the total amount of reevaluation and therefore in the cost of reevaluating a query.

database systems for advanced applications | 2005

An efficient approach to extracting approximate repeating patterns in music databases

Ning-Han Liu; Yi-Hung Wu; Arbee L. P. Chen

Pattern extraction from music strings is an important problem. The patterns extracted from music strings can be used as features for music retrieval or analysis. Previous works on music pattern extraction focus only on exact repeating patterns. However, music segments with minor differences may sound similar. The concept of the prototypical melody has therefore been proposed to represent these similar music segments. In musicology, the number of music segments that are similar to a prototypical melody indicates the importance of the prototypical melody to the music work. In this paper, a novel approach is developed to extract all the prototypical melodies in a music work. Our approach considers each music segment as a candidate for the prototypical melody and uses the edit distance to determine the set of music segments that are similar to this candidate. A lower bounding mechanism, which estimates the number of similar music segments for each candidate and prunes the impossible candidates, is designed to speed up the process. Experiments are performed on a real data set, and the results show a significant improvement of our approach over the existing approaches in the average response time.

very large data bases | 2011

On-line rule matching for event prediction

Chung-Wen Cho; Yi-Hung Wu; Show-Jane Yen; Ying Zheng; Arbee L. P. Chen

The prediction of future events is of great importance in many applications. The prediction is based on episode rules, which are composed of events and two time constraints that require all the events in the episode rule and in the predicate of the rule, respectively, to occur within a time interval. In an event stream, a sequence of events that matches the predicate of the rule and satisfies the specified time constraint is called an occurrence of the predicate. After finding the occurrence, the consequent event, which will occur within a time interval, can be predicted. However, the time intervals computed from some occurrences for predicting the event can be contained in the time intervals computed from other occurrences and thus become redundant. As a result, designing an efficient and effective event predictor in a stream environment is challenging. In this paper, an effective scheme is proposed to avoid matching the predicate events corresponding to redundant time intervals for prediction. Based on the scheme, we consider two methodologies, forward retrieval and backward retrieval, for the efficient matching of predicate events over event streams. The approach based on forward retrieval constructs a queue structure to incrementally maintain parts of the matched results as events arrive, and thus it avoids backward scans of the event stream. On the other hand, the approach based on backward retrieval maintains the recently arrived events in a tree structure. The matching of predicate events is triggered by identifiable events and achieved by an efficient retrieval on the tree structure, which avoids exhaustive scans of the arrived events. By running a series of experiments, we show that each of the proposed approaches has its advantages on particular data distributions and parameter settings.
