Tianjun Fu
University of Arizona
Publication
Featured research published by Tianjun Fu.
ACM Transactions on Information Systems | 2012
Tianjun Fu; Ahmed Abbasi; Daniel Zeng; Hsinchun Chen
Despite the increased prevalence of sentiment-related information on the Web, there has been limited work on focused crawlers capable of effectively collecting not only topic-relevant but also sentiment-relevant content. In this article, we propose a novel focused crawler that incorporates topic and sentiment information as well as a graph-based tunneling mechanism for enhanced collection of opinion-rich Web content regarding a particular topic. The graph-based sentiment (GBS) crawler uses a text classifier that employs both topic and sentiment categorization modules to assess the relevance of candidate pages. This information is also used to label nodes in web graphs that are employed by the tunneling mechanism to improve collection recall. Experimental results on two test beds revealed that GBS was able to provide better precision and recall than seven comparison crawlers. Moreover, GBS was able to collect a large proportion of the relevant content after traversing far fewer pages than comparison methods. GBS outperformed comparison methods on various categories of Web pages in the test beds, including collection of blogs, Web forums, and social networking Web site content. Further analysis revealed that both the sentiment classification module and graph-based tunneling mechanism played an integral role in the overall effectiveness of the GBS crawler.
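The general idea of a focused crawl that may pass through a few irrelevant pages to reach relevant ones can be sketched as follows. This is a minimal illustration under stated assumptions, not the GBS crawler itself: the in-memory `graph`, the `relevance` table (standing in for a topic and sentiment classifier), the 0.5 threshold, and the `max_tunnel` parameter are all hypothetical, and the bounded-run rule is a much simpler form of tunneling than the graph-based mechanism described in the abstract.

```python
import heapq

def focused_crawl(graph, relevance, seeds, max_tunnel=1, threshold=0.5):
    """Best-first crawl over a toy in-memory link graph.

    graph:      {url: [linked urls]} (hypothetical stand-in for the Web)
    relevance:  {url: score in [0, 1]} (stand-in for a page classifier)
    max_tunnel: number of consecutive irrelevant pages a path may cross
    """
    # Frontier entries are (negated priority, url, irrelevant pages so far).
    frontier = [(-1.0, seed, 0) for seed in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    collected = []
    while frontier:
        _, url, irrelevant_run = heapq.heappop(frontier)
        score = relevance.get(url, 0.0)  # simulate fetching and classifying
        if score >= threshold:
            collected.append(url)
            irrelevant_run = 0
        else:
            irrelevant_run += 1
            if irrelevant_run > max_tunnel:
                continue  # path has been irrelevant for too long; drop it
        for link in graph.get(url, []):
            if link not in seen:
                seen.add(link)
                # Outgoing links inherit the score of the page that cites them.
                heapq.heappush(frontier, (-score, link, irrelevant_run))
    return collected

# Hypothetical toy data: a relevant page hidden behind an irrelevant hub.
graph = {"seed": ["hub"], "hub": ["review"], "review": []}
relevance = {"seed": 0.9, "hub": 0.1, "review": 0.8}
with_tunneling = focused_crawl(graph, relevance, ["seed"], max_tunnel=1)
without_tunneling = focused_crawl(graph, relevance, ["seed"], max_tunnel=0)
```

In this toy graph, allowing one irrelevant hop lets the crawl reach `review`, while `max_tunnel=0` stops at the irrelevant hub, which is the recall benefit tunneling is meant to provide.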
intelligence and security informatics | 2008
Hsinchun Chen; Sven Thoms; Tianjun Fu
As part of the NSF-funded Dark Web research project, this paper presents an exploratory study of cyber extremism on Web 2.0 media: blogs, YouTube, and Second Life. We examine international Jihadist extremist groups that use each of these media. We observe that these new, interactive, multimedia-rich forms of communication provide effective means for extremists to promote their ideas, share resources, and communicate with each other. The development of automated collection and analysis tools for Web 2.0 can help policy makers, intelligence analysts, and researchers to better understand extremists' ideas and communication patterns, which may lead to strategies that can counter the threats posed by extremists in the second-generation Web.
international conference on social computing | 2013
Ahmed Abbasi; Tianjun Fu; Daniel Zeng; Donald A. Adjeroh
Social intelligence derived from Health 2.0 content has become significantly important for various applications, including post-marketing drug surveillance, competitive intelligence, and the assessment of health-related opinions and sentiments. However, the volume, velocity, variety, and quality of online health information present challenges, necessitating enhanced facilitation mechanisms for medical social computing. In this study, we propose a focused crawler for online medical content. The crawler leverages enhanced credibility and context information. An extensive evaluation was performed against several comparison methods, on an online Health 2.0 test bed encompassing millions of pages. The results revealed that the proposed method was able to collect relevant content with considerably higher precision and recall rates than comparison methods, on content associated with medical websites, forums, blogs, and social networking sites. Furthermore, an example was used to illustrate the usefulness of the crawler for accurately representing online drug sentiments. Overall, the results have important implications for social computing, where a high-quality data and information foundation is imperative to the success of any overlying social intelligence initiative.
intelligence and security informatics | 2009
Tianjun Fu; Chun-Neng Huang; Hsinchun Chen
Web 2.0 has become an effective grassroots communication platform for extremists to promote their ideas, share resources, and communicate with each other. As an important component of Web 2.0, online video sharing sites such as YouTube and Google Video have also been utilized by extremist groups to distribute videos. This study presented a framework for identifying extremist videos in online video sharing sites by using user-generated text content such as comments, video descriptions, and titles without downloading the videos. Text features including lexical features, syntactic features, and content-specific features were first extracted. Then Information Gain was used for feature selection, and a Support Vector Machine was deployed for classification. The exploratory experiment showed that our proposed framework is effective for identifying online extremist videos, with an F-measure as high as 82%.
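The feature selection step named above can be illustrated with a small sketch of information gain for a binary "term present / term absent" feature. This is a textbook formulation under stated assumptions, not the paper's implementation: the function names, the token-set document representation, and the toy data are hypothetical, and the classification step is omitted.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(docs, labels, term):
    """Entropy reduction from splitting documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    remainder = 0.0
    for subset in (with_term, without_term):
        if subset:  # skip empty partitions to avoid dividing by zero
            remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: each document is a set of tokens, label 1 or 0.
docs = [{"alpha", "beta"}, {"alpha", "gamma"}, {"beta", "gamma"}, {"gamma", "delta"}]
labels = [1, 1, 0, 0]

vocabulary = sorted(set().union(*docs))
ranked = sorted(vocabulary, key=lambda t: information_gain(docs, labels, t), reverse=True)
```

Terms at the front of `ranked` separate the two classes best on this toy data; a pipeline of the kind described would keep the top-ranked features and pass them to a classifier.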
intelligence and security informatics | 2008
Tianjun Fu; Hsinchun Chen
Cyberactivism refers to the use of the Internet to advocate vigorous or intentional actions to bring about social or political change. Cyberactivism analysis aims to improve the understanding of cyber activists and their online communities. In this paper, we present a case study of online Free Tibet activities. For web site analysis, we use the inlink and outlink information of five selected seed URLs to construct the network of Free Tibet web sites. The network shows the close relationships between our five seed sites. Centrality measures reveal that tibet.org is probably an information hub site in the network. Further content analysis tells us that common hub site words are most popular in tibet.org whereas dalailama.com focuses mostly on religious words. For forum analysis, descriptive statistics such as the number of posts each month and the post distribution of forum users illustrate that the two large forums FreeTibetAndYou and RFAnews-Tibbs have experienced significant reduction in activities in recent years and that a small percentage of their users contribute the majority of posts. Important phrases of several long threads and active forum users are identified by using mutual information and TF-IDF scores. Such topical analyses help us understand the topics discussed in the forums and the ideas and interest of those forum users. Finally, social network analyses of the forum users are conducted to reflect their interactions and the social structure of their online communities.
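The TF-IDF scoring mentioned above can be sketched as follows. This uses one common weighting (relative term frequency times the natural log of inverse document frequency); the abstract does not say which variant was used, and the function name and toy threads are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical toy data: three forum threads, already tokenized.
threads = [
    ["march", "rally", "rally"],
    ["march", "news"],
    ["march", "photo"],
]
thread_scores = tf_idf(threads)
top_term_first_thread = max(thread_scores[0], key=thread_scores[0].get)
```

A term that appears in every thread, such as `march` here, receives a score of zero, so the terms that distinguish a thread from the others rise to the top.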
intelligence and security informatics | 2006
Jau-Hwang Wang; Tianjun Fu; Hong Ming Lin; Hsinchun Chen
This paper examines the “Gray Web Forums” in Taiwan. We study their characteristics and develop an analysis framework for assisting investigations on forum communities. Based on the statistical data collected from online forums, we found that the relationship between a posting and its responses is highly correlated to the forum nature. In addition, hot threads extracted based on the proposed metric can be used to assist analysts in identifying illegal or inappropriate contents. Furthermore, members’ roles and activities in a virtual community can be identified by member level analysis.
intelligence and security informatics | 2008
Jau-Hwang Wang; Tianjun Fu; Hong Ming Lin; Hsinchun Chen
Our society is in a state of transformation toward a “virtual society.” However, due to anonymity and low observability, internet activities have become more diverse and obscure. As a result, unscrupulous individuals or criminals may exploit the internet as a channel for their illegal activities to avoid apprehension by law enforcement officials. This paper examines the “Gray Web Forums” in Taiwan. We study their characteristics and develop an analysis framework for assisting investigations on forum communities. Based on the statistical data collected from online forums, we found that the relationship between a posting and its responses is highly correlated to the forum nature. In addition, hot threads extracted based on posting activity and our proposed metric can be used to assist analysts in identifying illegal or inappropriate contents. Furthermore, a member’s role and his/her activities in a virtual community can be identified by member level analysis. Two schemes based on content analysis were also developed to search for illegal information items in gray forums. The experimental results show that hot threads are correlated to illegal information items, but the retrieval effectiveness can be significantly improved by search schemes based on content analysis.
intelligence and security informatics | 2007
Tianjun Fu; Ahmed Abbasi; Hsinchun Chen
Interaction coherence analysis (ICA) attempts to accurately identify and construct interaction networks by using various features and techniques. It is useful for identifying user roles, users' social and information value, and the social network structure of Dark Web communities. In this study, we applied interaction coherence analysis to Dark Web forums using the hybrid interaction coherence (HIC) algorithm. Our algorithm utilizes both system features, such as header information and quotations, and linguistic features, such as direct address and lexical relation. Furthermore, several similarity-based methods, for example the vector space model, the Dice equation, and a sliding window, are used to address various types of noise. Two experiments have been conducted to compare our HIC algorithm with a traditional linkage-based method, a similarity-based method, and a simplified HIC method that does not address noise issues. The results demonstrate the effectiveness of our HIC algorithm for identifying interactions in Dark Web forums.
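Two of the similarity measures named above, the Dice coefficient and vector space (cosine) similarity, can be sketched as follows. This is only an illustration of the measures on tokenized messages, not the HIC algorithm: the function names, the toy messages, and the "pick the most similar earlier message" heuristic are assumptions made for the example.

```python
import math
from collections import Counter

def dice_coefficient(tokens_a, tokens_b):
    """Dice coefficient over the sets of distinct tokens."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between raw term-count vectors."""
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy data: guess which earlier message a reply responds to.
earlier_messages = [
    ["server", "moved", "new", "address"],
    ["meeting", "time", "changed"],
]
reply = ["what", "is", "the", "new", "server", "address"]
likely_parent = max(
    range(len(earlier_messages)),
    key=lambda i: cosine_similarity(earlier_messages[i], reply),
)
```

Both functions return 0.0 for empty input rather than raising, which keeps the comparison usable on very short or empty forum posts.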
IEEE Transactions on Knowledge and Data Engineering | 2008
Ahmed Abbasi; Hsinchun Chen; Sven Thoms; Tianjun Fu
Journal of the Association for Information Science and Technology | 2010
Tianjun Fu; Ahmed Abbasi; Hsinchun Chen