Flora S. Tsai
Nanyang Technological University
Publication
Featured research published by Flora S. Tsai.
Expert Systems With Applications | 2009
Flora S. Tsai; Wenchou Han; Junwei Xu; Hock Chuan Chua
The proliferation of wireless and mobile devices such as personal digital assistants and mobile phones has created a large demand for mobile software applications such as social networking software. In addition, the realization and widespread usage of peer-to-peer (P2P) networking have drastically increased the number of applications utilizing these technologies. The convergence of mobile and P2P networking has generated increasing interest in the mobile peer-to-peer (MP2P) community. In this paper, we describe the design and development of a mobile social software (MoSoSo) application based on a P2P network architecture using Juxtapose (JXTA) and Juxtapose for Java MicroEdition (JXME). The MoSoSo application allows users to discover, communicate, and share resources with one another. We present three facets of designing the MoSoSo: object-oriented software design, network infrastructure design, and user-interface design. The software has been fully implemented and tested on a variety of mobile devices for use in a campus setting. By studying the design and implementation of the MoSoSo, we hope to benefit the entire mobile application development community by providing common models and insights into developing MP2P software.
Expert Systems With Applications | 2008
Yun Chen; Flora S. Tsai; Kap Luk Chan
Weblogs, or blogs, have rapidly gained in popularity over the past few years. In particular, the growth of business blogs that are written by or provide commentary on businesses and companies opens up new opportunities for developing blog-specific search and mining techniques. In this paper, we propose probabilistic models for blog search and mining using two machine learning techniques, latent semantic analysis (LSA) and probabilistic latent semantic analysis (PLSA). We implement the models in our database of business blogs, BizBlogs07, with the aim of achieving higher precision and recall. The probabilistic model is able to segment the business blogs into separate topic areas, which is useful for keyword detection in the blogosphere. Various term-weighting schemes and factor values were also studied in detail, revealing interesting patterns in our database of business blogs. Our multi-functional business blog system is found to be very different from existing blog search engines, as it aims to provide better relevance and precision of the search.
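The abstract above mentions term-weighting schemes without naming them. As an illustration only, the following is a minimal sketch of one common scheme, TF-IDF, which could be used to weight a term-document representation before applying LSA or PLSA. The function name `tf_idf`, the whitespace tokenizer, and the sample posts are all assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    Weight = (term count / document length) * log(N / document frequency).
    This is one common TF-IDF variant, chosen here purely for illustration.
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Made-up sample posts, not data from BizBlogs07.
posts = [
    "company blog announces new product",
    "company blog reviews quarterly earnings",
    "company blog discusses hiring plans",
]
weights = tf_idf(posts)
```

In this sketch, terms that occur in every post (here "company" and "blog") receive a weight of zero, while terms unique to one post receive the highest weights, which is the behavior that makes such schemes useful for keyword detection.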
IEEE Intelligent Systems | 2010
Flora S. Tsai; Minoru Etoh; Xing Xie; Wang-Chien Lee; Qiang Yang
The new frontier of mobile information retrieval will combine context awareness and content adaptation.
Pacific Asia Workshop on Intelligence and Security Informatics | 2007
Flora S. Tsai; Kap Luk Chan
Organizations and governments are becoming vulnerable to a wide variety of security breaches against their information infrastructure. The magnitude of this threat is evident from the increasing rate of cyber attacks against computers and critical infrastructure. Weblogs, or blogs, have also rapidly gained in numbers over the past decade. Weblogs may provide up-to-date information on the prevalence and distribution of various cyber security threats as well as terrorism events. In this paper, we analyze weblog posts for various categories of cyber security threats related to the detection of cyber attacks, cyber crime, and terrorism. Existing studies on intelligence analysis have focused on analyzing news or forums for cyber security incidents, but few have looked at weblogs. We use probabilistic latent semantic analysis to detect keywords from cyber security weblogs with respect to certain topics. We then demonstrate how this method can present the blogosphere in terms of topics with measurable keywords, hence tracking popular conversations and topics in the blogosphere. By applying a probabilistic approach, we can improve information retrieval in weblog search and keyword detection, and provide an analytical foundation for the future of security intelligence analysis of weblogs.
International Conference on Signal Processing | 2007
Kok Wah Ng; Flora S. Tsai; Lihui Chen; Kiat Chong Goh
To determine novel information from raw text documents, a novelty detection recommender system was developed to explore the method of comparing various types of entities within sentences. We first detected novel sentences using named entity recognition to extract the entity types of person, place, time, and organization. In addition, part-of-speech tagging was performed to tag each word in the documents, allowing syntactic structures of noun, verb, and adjective to be used for comparisons. WordNet, an English lexical database of concepts and relations, was also incorporated to generate synonyms for the entities and parts of speech as well as to determine the similarity of sentences. The novelty score of each sentence was determined by using two different metrics, UniqueComparison and ImportanceValue. UniqueComparison calculated the number of matched entities, whereas ImportanceValue took into account the total weight of matched words that coexisted in both the test and history sentences. The results look promising when compared to the benchmark scores for the Text Retrieval Conference (TREC) Novelty Track 2004. This demonstrates that the combination of named entity recognition and part-of-speech tagging is capable of detecting novelty with good results.
Expert Systems With Applications | 2011
Flora S. Tsai
Exploratory data analysis often relies heavily on visual methods because of the power of the human eye to detect structures. For large, multidimensional data sets which cannot be easily visualized, the number of dimensions of the data can be reduced by applying dimensionality reduction techniques. This paper reviews current linear and nonlinear dimensionality reduction techniques in the context of data visualization. The dimensionality reduction techniques were used in our case study of business blogs. The superior techniques were able to discriminate the various categories of blogs quite accurately. To our knowledge, this is the first study using dimensionality reduction techniques for visualization of blogs. In summary, we have applied dimensionality reduction for visualization of real-world blog data, with potential applications in the ever-growing digital realm of social media.
Information Sciences | 2010
Flora S. Tsai; Wenyin Tang; Kap Luk Chan
This work addresses the problem of detecting novel sentences from an incoming stream of text data by studying the performance of different novelty metrics and proposing a mixed metric that is able to adapt to different performance requirements. Existing novelty metrics can be divided into two types, symmetric and asymmetric, based on whether the ordering of sentences is taken into account. After a comparative study of several different novelty metrics, we observe complementary behavior in the two types of metrics. This finding motivates a new framework of novelty measurement, i.e., the mixture of both symmetric and asymmetric metrics. This new framework achieves superior performance under different performance requirements, varying from high-precision to high-recall, as well as for data with different percentages of novel sentences. Because it does not require any prior information, the new metric is very suitable for real-time knowledge base applications such as novelty mining systems where no training data is available beforehand.
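The idea of mixing a symmetric and an asymmetric novelty metric can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the metrics or how they are combined, so the choice of cosine similarity as the symmetric metric, new-word ratio as the asymmetric metric, the linear weight `alpha`, the `threshold`, and all function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Symmetric: swapping the two token lists gives the same value."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def symmetric_novelty(sentence, history):
    """1 minus the highest cosine similarity to any earlier sentence."""
    if not history:
        return 1.0
    return 1.0 - max(cosine_similarity(sentence, h) for h in history)


def asymmetric_novelty(sentence, history):
    """Fraction of tokens in the current sentence not seen in earlier ones.

    Asymmetric: the result depends on which sentence came first.
    """
    if not sentence:
        return 0.0
    seen = {t for h in history for t in h}
    return sum(1 for t in sentence if t not in seen) / len(sentence)


def mixed_novelty(sentence, history, alpha=0.5):
    """Linear mixture of the two scores (the combination rule is an assumption)."""
    return alpha * symmetric_novelty(sentence, history) + (
        1.0 - alpha
    ) * asymmetric_novelty(sentence, history)


def detect_novel(sentences, threshold=0.5, alpha=0.5):
    """Flag each sentence in a stream as novel or not; needs no training data."""
    history, flags = [], []
    for text in sentences:
        tokens = text.lower().split()
        flags.append(mixed_novelty(tokens, history, alpha) >= threshold)
        history.append(tokens)
    return flags


# Made-up stream: the second sentence repeats the first.
stream = [
    "the server was attacked",
    "the server was attacked",
    "hackers stole customer passwords",
]
flags = detect_novel(stream)
```

In this sketch, raising `threshold` flags fewer sentences as novel and lowering it flags more, while `alpha` shifts the balance between the two metric types; this is one simple way to expose the precision/recall trade-off the abstract describes.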
Knowledge Discovery and Data Mining | 2009
Agus Trisnajaya Kwee; Flora S. Tsai; Wenyin Tang
Novelty detection (ND) is a process for identifying novel information from an incoming stream of documents. Although there are many studies of ND on English language documents, to the best of our knowledge, none has been reported on Malay documents. This issue is important because there are many documents with a mixture of both English and Malay languages. This paper examines multilingual sentence-level ND in English and Malay documents using TREC 2003 and TREC 2004 Novelty Track data. We describe the text processing for multilingual ND, which consists of language translation, stop word removal, automatic stemming, and novel sentence detection. We compare the results for sentence-level ND on English and Malay documents and find that the results are fairly similar. Therefore, after preprocessing is performed on Malay documents, our ND algorithm appears to be robust in detecting novel sentences, and can possibly be extended to other alphabet-based languages.
Proceedings of the 2007 International Workshop on Domain Driven Data Mining | 2007
Yun Chen; Flora S. Tsai; Kap Luk Chan
Weblogs, or blogs, have rapidly gained in popularity over the past few years. In particular, the growth of business blogs written by or providing commentary on businesses and companies opens up new opportunities for developing blog-specific search and mining techniques. In this paper, we propose probabilistic models for blog search and mining using two machine learning techniques, Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA). We implement the models in our database of business blogs, with the aim of achieving higher precision and recall. The probabilistic model is able to segment the business blogs into separate topic areas, which is useful for keyword detection in the blogosphere. Various term-weighting schemes and factor values were also studied in detail, revealing interesting patterns in our database of business blogs. From our study, we can uncover domain-driven data mining techniques that can strengthen business intelligence in complex enterprise applications.
Expert Systems With Applications | 2011
Flora S. Tsai
Blog mining addresses the problem of mining information from blog data. Although mining blogs may share many similarities with mining Web and text documents, existing techniques need to be reevaluated and adapted for the multidimensional representation of blog data, which exhibits dimensions not present in traditional documents, such as tags. Blog tags are semantic annotations in blogs which can be valuable sources of additional labels for the myriad of blog documents. In this paper, we present a tag-topic model for blog mining, which is based on the Author-Topic model and Latent Dirichlet Allocation. The tag-topic model determines the most likely tags and words for a given topic in a collection of blog posts. The model has been successfully implemented and evaluated on real-world blog data.