Chi-Yao Tseng
Center for Information Technology
Publication
Featured research published by Chi-Yao Tseng.
IEEE Transactions on Multimedia | 2008
Ken-Hao Liu; Ming-Fang Weng; Chi-Yao Tseng; Yung-Yu Chuang; Ming-Syan Chen
Automatic semantic concept detection in video is important for effective content-based video retrieval and mining and has gained great attention recently. In this paper, we propose a general post-filtering framework to enhance robustness and accuracy of semantic concept detection using association and temporal analysis for concept knowledge discovery. Co-occurrence of several semantic concepts could imply the presence of other concepts. We use association mining techniques to discover such inter-concept association relationships from annotations. With discovered concept association rules, we propose a strategy to combine associated concept classifiers to improve detection accuracy. In addition, because video is often visually smooth and semantically coherent, detection results from temporally adjacent shots could be used for the detection of the current shot. We propose temporal filter designs for inter-shot temporal dependency mining to further improve detection accuracy. Experiments on the TRECVID 2005 dataset show our post-filtering framework is both efficient and effective in improving the accuracy of semantic concept detection in video. Furthermore, it is easy to integrate our framework with existing classifiers to boost their performance.
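The temporal filtering idea in the abstract above (using detection results from adjacent shots to refine the current shot) can be illustrated with a minimal sketch. The window weights and the function name `temporal_filter` are assumptions made for illustration; the abstract does not specify the paper's actual filter designs.

```python
def temporal_filter(scores, weights=(0.25, 0.5, 0.25)):
    """Smooth per-shot concept scores with a symmetric weighted window.

    `scores` holds one classifier confidence per shot, in temporal order.
    `weights` is a hypothetical window; it is not taken from the paper.
    """
    half = len(weights) // 2
    smoothed = []
    for i in range(len(scores)):
        total = 0.0
        norm = 0.0
        for offset, w in enumerate(weights):
            j = i + offset - half
            # Skip neighbors that fall outside the video.
            if 0 <= j < len(scores):
                total += w * scores[j]
                norm += w
        # Renormalize so boundary shots are not biased toward zero.
        smoothed.append(total / norm)
    return smoothed


if __name__ == "__main__":
    print(temporal_filter([0.1, 0.9, 0.2, 0.8, 0.7]))
```

An isolated high or low score is pulled toward its neighbors, which reflects the assumption that video is semantically coherent across adjacent shots.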
Computational Science and Engineering | 2009
Chi-Yao Tseng; Ming-Syan Chen
Recently, the huge volume of email spam has caused serious problems for essential email communication. Traditional spam filters analyze email content to characterize the features that commonly appear in spam. However, the tricks designed to evade content-based filters appear to be endless, owing to the economic benefits of sending spam. In view of this situation, much research effort has gone into spam detection based on the reputation of senders rather than on what is contained in emails. Motivated by the fact that spammers tend to exhibit unusual behavior and specific patterns of email communication, exploring email social networks to detect spam has received much attention. Nevertheless, previous works generally suffer from two problems: (1) the system is not robust in diverse environments, and (2) no update scheme is provided to capture the feature changes of evolving networks. In this paper, we propose an incremental support vector machine (SVM) model for spam detection on dynamic email social networks. A complete spam detection system, MailNET, is devised to better adjust to diverse networks. Several features of each user in the network are extracted to train an SVM model. Moreover, to capture the evolving nature of email communication, we present an incremental update scheme to efficiently re-train an SVM model. We evaluate MailNET on a live data set from a university-scale email server and show that the proposed model is efficient and effective, and thus applicable to the real world.
International Congress on Big Data | 2013
Chun-Chieh Chen; Chi-Yao Tseng; Ming-Syan Chen
Sequential pattern mining is an essential data mining technique that has been widely applied to many real-world applications. However, traditional algorithms generally suffer from scalability problems when dealing with big data. In this paper, we aim to significantly increase the scale and propose the Sequential PAttern Mining algorithm based on the MapReduce model on the Cloud (abbreviated as SPAMC). Building on the prior SPAM algorithm, we design an iterative MapReduce framework to efficiently generate and prune candidate patterns when constructing the lexical sequence tree. This framework not only distributes the sub-tasks of tree construction to independent mappers in parallel, but also enables the parallel processing of support counting. We conduct extensive experiments in a cloud environment of 32 virtual machines with up to 12.8 million transactional sequences. Experimental results show that SPAMC can significantly reduce mining time with big data, achieve extremely high scalability, and provide perfect load balancing on the cloud cluster.
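The parallel support counting mentioned in the abstract above can be sketched as a map phase that emits one count per sequence containing a candidate, followed by a reduce phase that sums counts and prunes infrequent candidates. This is a single-process simulation under simplifying assumptions: sequences are plain tuples of single items, and the helper names (`contains`, `map_phase`, `reduce_phase`, `count_support`) are hypothetical. It does not reproduce SPAMC's lexical sequence tree or the bitmap representation of the SPAM algorithm.

```python
from collections import Counter
from itertools import chain


def contains(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` as an ordered subsequence."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def map_phase(sequence, candidates):
    """Mapper: emit (candidate, 1) for each candidate the sequence supports."""
    return [(c, 1) for c in candidates if contains(sequence, c)]


def reduce_phase(pairs, min_support):
    """Reducer: sum counts per candidate and keep the frequent ones."""
    counts = Counter()
    for candidate, n in pairs:
        counts[candidate] += n
    return {c: n for c, n in counts.items() if n >= min_support}


def count_support(sequences, candidates, min_support):
    """Run the map phase over all sequences, then reduce."""
    pairs = chain.from_iterable(map_phase(s, candidates) for s in sequences)
    return reduce_phase(pairs, min_support)


if __name__ == "__main__":
    data = [("a", "b", "c"), ("a", "c"), ("b", "a")]
    cands = [("a", "c"), ("a", "b"), ("b", "a")]
    print(count_support(data, cands, min_support=2))
```

Because each mapper call depends only on its own sequence, the map phase can be distributed across workers without coordination, which is the property the abstract relies on.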
IEEE Transactions on Knowledge and Data Engineering | 2011
Chi-Yao Tseng; Pin-Chieh Sung; Ming-Syan Chen
E-mail communication is indispensable nowadays, but the e-mail spam problem continues to grow drastically. In recent years, the notion of collaborative spam filtering with a near-duplicate similarity matching scheme has been widely discussed. The primary idea of the similarity matching scheme for spam detection is to maintain a known spam database, formed by user feedback, to block subsequent near-duplicate spam. To achieve efficient similarity matching and reduce storage utilization, prior works mainly represent each e-mail by a succinct abstraction derived from the e-mail content text. However, these abstractions cannot fully capture the evolving nature of spam, and are thus not effective enough in near-duplicate detection. In this paper, we propose a novel e-mail abstraction scheme, which uses the e-mail layout structure to represent e-mails. We present a procedure to generate the e-mail abstraction using the HTML content of an e-mail, and this newly devised abstraction can more effectively capture the near-duplicate phenomenon of spam. Moreover, we design a complete spam detection system, Cosdes (standing for COllaborative Spam DEtection System), which possesses an efficient near-duplicate matching scheme and a progressive update scheme. The progressive update scheme enables Cosdes to keep the most up-to-date information for near-duplicate detection. We evaluate Cosdes on a live data set collected from a real e-mail server and show that our system outperforms prior approaches in detection results and is applicable to the real world.
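To make the idea of a layout-based abstraction concrete, the sketch below reduces an HTML e-mail to its ordered sequence of tags and ignores the text, so two messages with the same layout but different wording map to the same abstraction. This is only an illustration using Python's standard `html.parser`; the class and function names are hypothetical, and the actual abstraction procedure used by Cosdes is not described in the abstract.

```python
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Collect opening and closing tags in document order, ignoring text."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Attributes are dropped; only the layout skeleton is kept.
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")


def layout_abstraction(html):
    """Return the tag sequence of `html` as a hashable tuple."""
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    return tuple(parser.tags)


if __name__ == "__main__":
    first = "<html><body><p>Cheap watches</p><a href='http://x'>Buy</a></body></html>"
    second = "<html><body><p>Best w4tches!!</p><a href='http://y'>Order</a></body></html>"
    print(layout_abstraction(first) == layout_abstraction(second))
```

Returning a tuple makes the abstraction usable as a dictionary key, so a known-spam database could be probed with a single hash lookup in this simplified setting.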
IEEE Transactions on Knowledge and Data Engineering | 2015
Cheng-Ying Liu; Ming-Syan Chen; Chi-Yao Tseng
This paper focuses on the problem of short text summarization on the comment stream of a specific message from social network services (SNS). Due to the high popularity of SNS, the quantity of comments may increase at a high rate right after a social message is published. Motivated by the fact that users may wish to get a brief understanding of a comment stream without reading the whole comment list, we attempt to group comments with similar content together and generate a concise opinion summary for this message. Since different users may request the summary at any moment, existing clustering methods cannot be directly applied and cannot meet the real-time needs of this application. In this paper, we model a novel incremental clustering problem for comment stream summarization on SNS. Moreover, we propose the IncreSTS algorithm, which can incrementally update clustering results with the latest incoming comments in real time. Furthermore, we design an at-a-glance visualization interface to help users easily and rapidly get an overview summary. From extensive experimental results and a real case demonstration, we verify that IncreSTS offers high efficiency, high scalability, and better handling of outliers, which justifies the practicability of IncreSTS on the target problem.
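The incremental clustering setting described above, where each new comment updates the existing grouping instead of triggering a full re-clustering, can be sketched generically as follows. This is not the IncreSTS algorithm: the Jaccard similarity on word sets, the threshold value, and the class name `IncrementalClusterer` are all assumptions chosen to keep the example small.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class IncrementalClusterer:
    """Assign each incoming comment to the most similar cluster, or open a new one."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # hypothetical similarity cut-off
        self.clusters = []          # each cluster: {"terms": set, "members": list}

    def add(self, comment):
        terms = set(comment.lower().split())
        best, best_sim = None, 0.0
        for cluster in self.clusters:
            sim = jaccard(terms, cluster["terms"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= self.threshold:
            # Update only the affected cluster; others stay untouched.
            best["members"].append(comment)
            best["terms"] |= terms
        else:
            self.clusters.append({"terms": terms, "members": [comment]})


if __name__ == "__main__":
    clusterer = IncrementalClusterer()
    for text in ["great phone battery", "great battery life", "shipping was slow"]:
        clusterer.add(text)
    for cluster in clusterer.clusters:
        print(cluster["members"])
```

Each insertion costs one pass over the current clusters, so a summary can be produced at any moment without reprocessing earlier comments.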
International Conference on Multimedia and Expo | 2011
Chi-Yao Tseng; Ming-Syan Chen
Recently, uploading photos and adding identity tags on social network services have become prevalent. Although some researchers have considered leveraging context to facilitate the process of tagging, these approaches still rely mainly on face recognition techniques that use visual features of photos. However, since the computational and storage costs of these approaches are generally high, they are not directly applicable to large-scale web services. To resolve this problem, we explore using only social network context to generate the top-k list of photo identity tag suggestions. The proposed method is based on various co-occurrence contexts that are related to the question of who may appear in a given photo. An efficient ranking algorithm is designed to satisfy the real-time needs of this application. We use public album data of 400 volunteers from Facebook to verify that our approach can efficiently provide accurate suggestions with lower additional storage requirements.
Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2013
Cheng-Ying Liu; Chi-Yao Tseng; Ming-Syan Chen
Sharing URLs has recently emerged as an important way to exchange information in online social networks (OSN). Our investigation of several social streams shows that the percentage of messages with embedded URLs ranges from 54% to 92%. Due to the extremely high volume of evolving messages in OSN, finding interesting and significant URLs in social streams poses numerous challenges, such as real-time requirements, noisy content, and the variety of URL shortening services. In this paper, we propose the Significant URLs MINing algorithm, abbreviated as SURLMINE, to produce an up-to-date ranking list of significant URLs without any pre-learning process. The key strategy of SURLMINE is to incrementally update the significance coefficients of all collected URLs using four pivotal features: Follower-Friend ratio, language distribution, topic duration and period, and a decay model. Moreover, its capability for incremental updates enables SURLMINE to achieve real-time processing. To evaluate the effectiveness and efficiency of SURLMINE, we apply the proposed framework to the Twitter platform and conduct experiments for 30 days (over 75 million tweets). The experimental results show that the precision of SURLMINE can reach up to 92%, and that its execution performance satisfies the real-time requirements of large-scale social streams.
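The combination of incremental updates and a decay model described above can be illustrated with a small sketch in which each URL carries a score that decays exponentially over time and is bumped whenever the URL appears in a new message. The half-life, the per-message `weight`, and the class name `DecayedUrlRanker` are hypothetical; the abstract does not give SURLMINE's actual significance coefficient or how its four features are combined.

```python
import math


class DecayedUrlRanker:
    """Keep an exponentially decayed score per URL, updated one message at a time."""

    def __init__(self, half_life=3600.0):
        # After `half_life` seconds without mentions, a score halves.
        self.decay_rate = math.log(2) / half_life
        self.scores = {}  # url -> (score, time of last update)

    def _decayed(self, url, now):
        score, last = self.scores.get(url, (0.0, now))
        return score * math.exp(-self.decay_rate * (now - last))

    def observe(self, url, weight, now):
        """Record one mention; `weight` stands in for per-message features."""
        self.scores[url] = (self._decayed(url, now) + weight, now)

    def top_k(self, k, now):
        """Return the k URLs with the highest decayed score at time `now`."""
        current = {url: self._decayed(url, now) for url in self.scores}
        return sorted(current, key=current.get, reverse=True)[:k]


if __name__ == "__main__":
    ranker = DecayedUrlRanker(half_life=10.0)
    ranker.observe("http://example.com/a", 1.0, now=0.0)
    ranker.observe("http://example.com/b", 0.6, now=10.0)
    print(ranker.top_k(2, now=10.0))
```

Storing the last update time alongside the score means decay is applied lazily, so an update touches only the URL that was just observed.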
International Conference on Web Services | 2012
Chi-Yao Tseng; Yu-Jen Chen; Ming-Syan Chen
Online social network services, such as Facebook and Twitter, have become increasingly popular recently. More and more users are accustomed to regularly reading the latest news feeds and interacting with friends on these social websites. However, when the number of friends increases to a large extent, users will receive hundreds of messages in a day and may be overwhelmed by the information overload. To alleviate this problem, we propose a novel visualization technique for social news feeds summarization on social web services. The proposed system SocFeedViewer can produce an egocentric network graph based on the news feeds generated in an arbitrary period of time. This graph provides an overview of those who have generated news feeds during this time period. To enhance the reading experience, we incorporate community detection, connectivity analysis, and importance analysis into our system to make users capable of preferentially surfing news feeds that are more significant and interesting.
Mobile Data Management | 2013
Chi-Yao Tseng; Shih-Han Lin; Ming-Syan Chen
Place query is one of the most fundamental applications, and traditional use cases include finding the exact spatial location of a place and searching for a specific type of place in a given spatial range. However, there is another scenario: you may want to recommend a place you visited to friends but have forgotten its complete name. You have only a vague impression of it and remember the place type, the rough range of the place, and some places near it. To enable query by impression, which has not been fully explored in the literature, we define a new place query problem called Place Query with Adjacency Constraints (abbreviated as PQAC). We propose a naive approach and two enhancement algorithms, a distance pre-calculating algorithm and a grid indexing algorithm, to achieve the efficiency needed to satisfy the real-time requirements of this place query service. We implement the Query By Impression (abbreviated as QBI) system with a real metropolitan place dataset consisting of more than 40,000 place records from Google Place API. Several experiments are conducted to validate the efficiency and effectiveness of the proposed QBI system.
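A grid index of the kind mentioned above can be sketched as a dictionary from grid cells to the places they contain, so that a neighbor check only scans cells near the candidate instead of the whole dataset. The sketch assumes planar coordinates and a single adjacency constraint ("a place of this type with a named neighbor within a radius"); the spatial range constraint is omitted for brevity. `GridIndex` and `query_by_impression` are hypothetical names, not the QBI system's API.

```python
import math
from collections import defaultdict


class GridIndex:
    """Bucket places into square cells so nearby places can be found quickly."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> [(name, kind, x, y)]

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def add(self, name, kind, x, y):
        self.cells[self._cell(x, y)].append((name, kind, x, y))

    def near(self, x, y, radius):
        """Yield places within `radius` of (x, y), scanning only nearby cells."""
        cx, cy = self._cell(x, y)
        reach = math.ceil(radius / self.cell_size)
        for i in range(cx - reach, cx + reach + 1):
            for j in range(cy - reach, cy + reach + 1):
                # .get avoids creating empty cells in the defaultdict.
                for place in self.cells.get((i, j), ()):
                    if math.hypot(place[2] - x, place[3] - y) <= radius:
                        yield place


def query_by_impression(index, kind, neighbor_name, radius):
    """Find places of type `kind` that have `neighbor_name` within `radius`."""
    results = []
    for places in list(index.cells.values()):
        for name, place_kind, x, y in places:
            if place_kind != kind:
                continue
            if any(n[0] == neighbor_name for n in index.near(x, y, radius)):
                results.append(name)
    return sorted(results)


if __name__ == "__main__":
    idx = GridIndex(cell_size=1.0)
    idx.add("Cafe A", "cafe", 0.5, 0.5)
    idx.add("Cafe B", "cafe", 5.0, 5.0)
    idx.add("City Library", "library", 1.2, 0.5)
    print(query_by_impression(idx, "cafe", "City Library", radius=1.0))
```

The cell size trades memory for scan cost: smaller cells mean fewer distance computations per neighbor check but more cells to visit for a given radius.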
Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2011
Chi-Yao Tseng; Ming-Syan Chen
Given a social network, identifying significant nodes in the network is highly desirable in many applications. In different networks formed by diverse kinds of social connections, the definition of a significant node differs with circumstances. In the literature, most previous works focus on expertise finding in specific social networks. In this paper, we propose a general node ranking model that can be adopted to satisfy a variety of service demands. We devise an unsupervised learning method that produces a ranking list of the top-k significant nodes. A characteristic of this method is that it can generate different ranking lists when diverse sets of features are considered. To demonstrate a real application of the proposed method, we design the system DblpNET, an author ranking system based on the co-author network of the DBLP computer science bibliography. We discuss further extensions and evaluate DblpNET empirically on the public DBLP dataset. The evaluation results show that the proposed method can be effectively applied to real-world applications.