Hsinchun Chen | Researchain

Publication

Featured research published by Hsinchun Chen.

Management Information Systems Quarterly | 2012

Business intelligence and analytics: from big data to big impact

Hsinchun Chen; Roger H. L. Chiang; Veda C. Storey

Business intelligence and analytics (BI&A) has emerged as an important area of study for both practitioners and researchers, reflecting the magnitude and impact of data-related problems to be solved in contemporary business organizations. This introduction to the MIS Quarterly Special Issue on Business Intelligence Research first provides a framework that identifies the evolution, applications, and emerging research areas of BI&A. BI&A 1.0, BI&A 2.0, and BI&A 3.0 are defined and described in terms of their key characteristics and capabilities. Current research in BI&A is analyzed and challenges and opportunities associated with BI&A research and education are identified. We also report a bibliometric study of critical BI&A publications, researchers, and research topics based on more than a decade of related academic and industry publications. Finally, the six articles that comprise this special issue are introduced and characterized in terms of the proposed BI&A research framework.

Decision Support Systems | 2004

Credit rating analysis with support vector machines and neural networks: a market comparative study

Zan Huang; Hsinchun Chen; Chia Jung Hsu; Wun-Hwa Chen; Soushan Wu

Corporate credit rating analysis has attracted considerable research interest in the literature. Recent studies have shown that Artificial Intelligence (AI) methods achieved better performance than traditional statistical methods. This article introduces a relatively new machine learning technique, support vector machines (SVM), to the problem in an attempt to provide a model with better explanatory power. We used a backpropagation neural network (BNN) as a benchmark and obtained prediction accuracy around 80% for both BNN and SVM methods for the United States and Taiwan markets. However, SVM showed only a slight improvement over BNN. Another direction of the research is to improve the interpretability of the AI-based models. We applied recent research results in neural network model interpretation and obtained the relative importance of the input financial variables from the neural network models. Based on these results, we conducted a market comparative analysis of the differences in determining factors between the United States and Taiwan markets.

ACM Transactions on Information Systems | 2008

Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums

Ahmed Abbasi; Hsinchun Chen; Arab Salem

The Internet is frequently used as a medium for exchange of information and opinions, as well as propaganda dissemination. In this study the use of sentiment analysis methodologies is proposed for classification of Web forum opinions in multiple languages. The utility of stylistic and syntactic features is evaluated for sentiment classification of English and Arabic content. Specific feature extraction components are integrated to account for the linguistic characteristics of Arabic. The entropy weighted genetic algorithm (EWGA) is also developed, which is a hybridized genetic algorithm that incorporates the information-gain heuristic for feature selection. EWGA is designed to improve performance and get a better assessment of key features. The proposed features and techniques are evaluated on a benchmark movie review dataset and U.S. and Middle Eastern Web forum postings. The experimental results using EWGA with SVM indicate high performance levels, with accuracies of over 91% on the benchmark dataset as well as the U.S. and Middle Eastern forums. Stylistic features significantly enhanced performance across all testbeds while EWGA also outperformed other feature selection methods, indicating the utility of these features and techniques for document-level classification of sentiments.
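The abstract above names the information-gain heuristic as the ingredient that EWGA adds to a genetic algorithm. As a rough illustration only, the sketch below computes information gain for a single binary feature over labeled documents. It does not reproduce EWGA itself; the function names and the toy data are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_present, labels):
    """Entropy reduction obtained by splitting the labels on a binary feature."""
    with_feature = [y for x, y in zip(feature_present, labels) if x]
    without_feature = [y for x, y in zip(feature_present, labels) if not x]
    total = len(labels)
    conditional = (
        (len(with_feature) / total) * entropy(with_feature)
        + (len(without_feature) / total) * entropy(without_feature)
    )
    return entropy(labels) - conditional


# Hypothetical toy data: four forum postings with sentiment labels.
labels = ["pos", "pos", "neg", "neg"]
separating_feature = [True, True, False, False]   # occurs only in positive posts
uninformative_feature = [True, False, True, False]  # occurs equally in both classes

gain_separating = information_gain(separating_feature, labels)
gain_uninformative = information_gain(uninformative_feature, labels)
```

A feature that splits the classes cleanly receives the full entropy of the label set as its gain, while a feature spread evenly across classes receives none. A feature selector could use such scores to rank or weight candidate features.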

IEEE Computer | 2004

Crime data mining: a general framework and some examples

Hsinchun Chen; Wingyan Chung; Jennifer Jie Xu; Gang Wang; Yi Qin; Michael Chau

A major challenge facing all law-enforcement and intelligence-gathering organizations is accurately and efficiently analyzing the growing volumes of crime data. Detecting cybercrime can likewise be difficult because busy network traffic and frequent online transactions generate large amounts of data, only a small portion of which relates to illegal activities. Data mining is a powerful tool that enables criminal investigators who may lack extensive training as data analysts to explore large databases quickly and efficiently. We present a general framework for crime data mining that draws on experience gained with the Coplink project, which researchers at the University of Arizona have been conducting in collaboration with the Tucson and Phoenix police departments since 1997.

ACM Transactions on Information Systems | 2009

Textual analysis of stock market prediction using breaking financial news: The AZFin text system

Robert P. Schumaker; Hsinchun Chen

Our research examines a predictive machine learning approach for financial news article analysis using several different textual representations: bag of words, noun phrases, and named entities. Through this approach, we investigated 9,211 financial news articles and 10,259,042 stock quotes covering the S&P 500 stocks during a five-week period. We applied our analysis to estimate a discrete stock price twenty minutes after a news article was released. Using a support vector machine (SVM) derivative specially tailored for discrete numeric prediction and models containing different stock-specific variables, we show that the model containing both article terms and stock price at the time of article release had the best performance in closeness to the actual future stock price (MSE 0.04261), the same direction of price movement as the future price (57.1% directional accuracy) and the highest return using a simulated trading engine (2.06% return). We further investigated the different textual representations and found that a Proper Noun scheme performs better than the de facto standard of Bag of Words in all three metrics.
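Two of the metrics reported in the abstract above, closeness (MSE) and directional accuracy, can be sketched in a few lines. The abstract does not give the paper's exact formulas, so the definitions below are assumptions: MSE is the mean of squared differences between predicted and actual prices, and a prediction counts as directionally correct when it moves the same way as the actual price relative to the price at article release. All prices are made-up toy values.

```python
def mean_squared_error(predicted, actual):
    """Mean of squared differences between predicted and actual prices."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def directional_accuracy(price_at_release, predicted, actual):
    """Fraction of predictions that move the same way as the actual price.

    Direction is measured relative to the price when the article was
    released (an assumed definition, not taken from the paper).
    """
    hits = 0
    for start, p, a in zip(price_at_release, predicted, actual):
        same_direction = (p - start) * (a - start) > 0
        both_unchanged = p == start and a == start
        if same_direction or both_unchanged:
            hits += 1
    return hits / len(actual)


# Hypothetical toy prices for four articles.
price_at_release = [10.0, 20.0, 30.0, 40.0]
predicted = [10.5, 19.5, 30.5, 40.5]   # model estimate twenty minutes later
actual = [10.25, 19.0, 29.5, 41.0]     # observed price twenty minutes later

mse = mean_squared_error(predicted, actual)
accuracy = directional_accuracy(price_at_release, predicted, actual)
```

In this toy data three of the four predictions move in the same direction as the actual price, so the two metrics can disagree: a model can be close in price terms yet wrong about direction, which is why the abstract reports them separately.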

IEEE Intelligent Systems | 2005

Applying authorship analysis to extremist-group Web forum messages

Ahmed Abbasi; Hsinchun Chen

The speed, ubiquity, and potential anonymity of Internet media - email, Web sites, and Internet forums - make them ideal communication channels for militant groups and terrorist organizations. Analyzing Web content has therefore become increasingly important to the intelligence and security agencies that monitor these groups. Authorship analysis can assist this activity by automatically extracting linguistic features from online messages and evaluating stylistic details for patterns of terrorist communication. However, authorship analysis techniques are rooted in work with literary texts, which differ significantly from online communication. To explore these problems, we modified an existing framework for analyzing online authorship and applied it to Arabic and English Web forum messages associated with known extremist groups. We developed a special multilingual model - the set of algorithms and related features - to identify Arabic messages, gearing this model toward the language's unique characteristics. Furthermore, we incorporated a complex message extraction component to allow the use of a more comprehensive set of features tailored specifically toward online messages. Evaluating the linguistic features of Web messages and comparing them to known writing styles offers the intelligence community a tool for identifying patterns of terrorist communication.

ACM Transactions on Information Systems | 2008

Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace

Ahmed Abbasi; Hsinchun Chen

One of the problems often associated with online anonymity is that it hinders social accountability, as substantiated by the high levels of cybercrime. Although identity cues are scarce in cyberspace, individuals often leave behind textual identity traces. In this study we proposed the use of stylometric analysis techniques to help identify individuals based on writing style. We incorporated a rich set of stylistic features, including lexical, syntactic, structural, content-specific, and idiosyncratic attributes. We also developed the Writeprints technique for identification and similarity detection of anonymous identities. Writeprints is a Karhunen-Loeve transforms-based technique that uses a sliding window and pattern disruption algorithm with individual author-level feature sets. The Writeprints technique and extended feature set were evaluated on a testbed encompassing four online datasets spanning different domains: email, instant messaging, feedback comments, and program code. Writeprints outperformed benchmark techniques, including SVM, Ensemble SVM, PCA, and standard Karhunen-Loeve transforms, on the identification and similarity detection tasks with accuracy as high as 94% when differentiating between 100 authors. The extended feature set also significantly outperformed a baseline set of features commonly used in previous research. Furthermore, individual-author-level feature sets generally outperformed use of a single group of attributes.

IEEE Transactions on Systems, Man, and Cybernetics | 1992

Automatic construction of networks of concepts characterizing document databases

Hsinchun Chen; Kevin J. Lynch

Two East-bloc computing knowledge bases, both based on a semantic network structure, were created automatically from large, operational textual databases using two statistical algorithms. The knowledge bases were evaluated in detail in a concept-association experiment based on recall and recognition tests. In the experiment, one of the knowledge bases, which exhibited the asymmetric link property, outperformed four experts in recalling relevant concepts in East-bloc computing. The knowledge base, which contained 20,000 concepts (nodes) and 280,000 weighted relationships (links), was incorporated as a thesaurus-like component in an intelligent retrieval system. The system allowed users to perform semantics-based information management and information retrieval via interactive, conceptual relevance feedback.

Journal of Visual Communication and Image Representation | 1996

Internet Categorization and Search: A Self-Organizing Approach

Hsinchun Chen; Chris Schuffels; Richard E. Orwig

The problems of information overload and vocabulary differences have become more pressing with the emergence of increasingly popular Internet services. The main information retrieval mechanisms provided by the prevailing Internet WWW software are based on either keyword search (e.g., the Lycos server at CMU, the Yahoo server at Stanford) or hypertext browsing (e.g., Mosaic and Netscape). This research aims to provide an alternative concept-based categorization and search capability for WWW servers based on selected machine learning algorithms. Our proposed approach, which is grounded on automatic textual analysis of Internet documents (homepages), attempts to address the Internet search problem by first categorizing the content of Internet documents. We report results of our recent testing of a multilayered neural network clustering algorithm employing the Kohonen self-organizing feature map to categorize (classify) Internet homepages according to their content. The category hierarchies created could serve to partition the vast Internet services into subject-specific categories and databases and improve Internet keyword searching and/or browsing.
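To make the Kohonen self-organizing feature map mentioned in the abstract above more concrete, here is a minimal sketch of a single-layer, one-dimensional map trained on small term-weight vectors. It is not the multilayered algorithm from the paper: the map size, learning-rate schedule, neighborhood function, and the toy "homepage" vectors are all assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Index of the unit whose weight vector is closest to the input."""
    def squared_distance(w):
        return sum((wi - vi) ** 2 for wi, vi in zip(w, vector))
    return min(range(len(weights)), key=lambda j: squared_distance(weights[j]))


def train_som(vectors, n_units, epochs=50, lr0=0.5, seed=0):
    """Train a one-dimensional Kohonen map on equal-length input vectors."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    radius0 = n_units / 2
    for epoch in range(epochs):
        decay = 1 - epoch / epochs
        lr = lr0 * decay                    # learning rate shrinks over time
        radius = max(radius0 * decay, 0.5)  # so does the neighborhood width
        for v in vectors:
            bmu = best_matching_unit(weights, v)
            for j in range(n_units):
                # Units near the winner on the map are pulled along with it.
                influence = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    weights[j][k] += lr * influence * (v[k] - weights[j][k])
    return weights


# Hypothetical term-weight vectors for four homepages, two per topic.
pages = [
    [1.0, 0.9, 0.0, 0.1],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
]
som = train_som(pages, n_units=4)
categories = [best_matching_unit(som, page) for page in pages]
```

Each entry of `categories` is the map unit assigned to a page. Pages with similar vectors tend to land on the same or neighboring units, which is the property a categorization scheme of this kind relies on.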

IEEE Intelligent Systems | 2010

AI and Opinion Mining

Hsinchun Chen; David Zimbra

The advent of Web 2.0 and social media content has stirred much excitement and created abundant opportunities for understanding the opinions of the general public and consumers toward social events, political movements, company strategies, marketing campaigns, and product preferences. Many new and exciting social, geopolitical, and business-related research questions can be answered by analyzing the thousands, even millions, of comments and responses expressed in various blogs (such as the blogosphere), forums (such as Yahoo Forums), social media and social network sites (including YouTube, Facebook, and Flickr), virtual worlds (such as Second Life), and tweets (Twitter). Opinion mining, a subdiscipline within data mining and computational linguistics, refers to the computational techniques for extracting, classifying, understanding, and assessing the opinions expressed in various online news sources, social media comments, and other user-generated content. Sentiment analysis is often used in opinion mining to identify sentiment, affect, subjectivity, and other emotional states in online text.
