Hsin-Min Lu
National Taiwan University
Publication
Featured research published by Hsin-Min Lu.
Journal of Biomedical Informatics | 2008
Hsin-Min Lu; Daniel Zeng; Lea Trujillo; Ken Komatsu; Hsinchun Chen
Emergency department free-text chief complaints (CCs) are a major data source for syndromic surveillance. CCs need to be classified into syndromic categories for subsequent automatic analysis. However, the lack of a standard vocabulary and of high-quality encodings of CCs hinders effective classification. This paper presents a new ontology-enhanced automatic CC classification approach. By exploiting semantic relations in a medical ontology, this approach aims to address the CC vocabulary variation problem in general and to meet the specific need for a classification approach capable of handling multiple sets of syndromic categories. We report an experimental study comparing our approach with two popular CC classification methods using a real-world dataset. This study indicates that our ontology-enhanced approach performs significantly better than the benchmark methods in terms of sensitivity, F measure, and F2 measure.
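For reference, the F and F2 measures named in this abstract are both members of the standard F-beta family. The definition below is the textbook one, not a formula quoted from the paper; PPV denotes positive predictive value (precision) and Sens denotes sensitivity (recall).

```latex
F_\beta = (1 + \beta^2)\,
  \frac{\mathrm{PPV} \cdot \mathrm{Sens}}
       {\beta^2 \cdot \mathrm{PPV} + \mathrm{Sens}},
\qquad
F = F_1 = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{Sens}}{\mathrm{PPV} + \mathrm{Sens}},
\qquad
F_2 = \frac{5 \cdot \mathrm{PPV} \cdot \mathrm{Sens}}{4 \cdot \mathrm{PPV} + \mathrm{Sens}}
```

With beta = 2, sensitivity is weighted more heavily than PPV, which suits surveillance settings where a missed case costs more than a false alarm.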
International Journal of Medical Informatics | 2009
Hsin-Min Lu; Hsinchun Chen; Daniel Dajun Zeng; Chwan-Chuen King; Fuh-Yuan Shih; Tsung-Shu Joseph Wu; Jin-Yi Hsiao
Purpose: Syndromic surveillance is aimed at early detection of disease outbreaks. An important data source for syndromic surveillance is free-text chief complaints (CCs), which may be recorded in different languages. For automated syndromic surveillance, CCs must be classified into predefined syndromic categories to facilitate subsequent data aggregation and analysis. Despite the fact that syndromic surveillance is largely an international effort, existing CC classification systems do not provide adequate support for processing CCs recorded in non-English languages. This paper reports a multilingual CC classification effort, focusing on CCs recorded in Chinese. Methods: We propose a novel Chinese CC classification system leveraging a Chinese-English translation module and an existing English CC classification approach. A set of 470 Chinese key phrases was extracted from about one million Chinese CC records using statistical methods. Based on the extracted key phrases, the system translates Chinese text into English and classifies the translated CCs to syndromic categories using an existing English CC classification system. Results: Compared to alternative approaches using a bilingual dictionary and a general-purpose machine translation system, our approach performs significantly better in terms of positive predictive value (PPV or precision), sensitivity (recall), specificity, and F measure (the harmonic mean of PPV and sensitivity), based on a computational experiment using real-world CC records. Conclusions: Our design provides satisfactory performance in classifying Chinese CCs into syndromic categories for public health surveillance. The overall design of our system also points out a potentially fruitful direction for multilingual CC systems that need to handle languages beyond English and Chinese.
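The four evaluation measures listed in the Results can be computed from a binary confusion matrix for one syndromic category. The following is a minimal sketch; the `Counts` class, the `evaluate` function, and the example counts are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one syndromic category (hypothetical container)."""
    tp: int  # CCs correctly assigned to the category
    fp: int  # CCs wrongly assigned to the category
    tn: int  # CCs correctly left out of the category
    fn: int  # CCs that belong to the category but were missed


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(c: Counts) -> dict:
    """Compute PPV, sensitivity, specificity, and the F measure."""
    ppv = _ratio(c.tp, c.tp + c.fp)           # precision
    sensitivity = _ratio(c.tp, c.tp + c.fn)   # recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    # F measure: harmonic mean of PPV and sensitivity.
    f_measure = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(evaluate(Counts(tp=8, fp=2, tn=85, fn=5)))
```

Specificity is reported alongside the F measure because the F measure ignores true negatives, which dominate when a syndrome is rare.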
ACM Transactions on Management Information Systems | 2012
Hsin-Min Lu; Feng Tse Tsai; Hsinchun Chen; Mao-Wei Hung; Shu-Hsing Li
Credit ratings convey credit risk information to participants in financial markets, including investors, issuers, intermediaries, and regulators. Accurate credit rating information plays a crucial role in supporting sound financial decision-making processes. Most previous studies on credit rating modeling are based on accounting and market information. Text data are largely ignored despite the potential benefit of conveying timely information regarding a firm’s outlook. To leverage the additional information in news full-text for credit rating prediction, we designed and implemented a news full-text analysis system that provides firm-level coverage, topic, and sentiment variables. The novel topic-specific sentiment variables contain a large fraction of missing values because of uneven news coverage. The missing value problem creates a new challenge for credit rating prediction approaches. We address this issue by developing a missing-tolerant multinomial probit (MT-MNP) model, which imputes missing values based on the Bayesian theoretical framework. Our experiments using seven and a half years of real-world credit ratings and news full-text data show that (1) the overall news coverage can explain future credit rating changes while the aggregated news sentiment cannot; (2) topic-specific news coverage and sentiment have statistically significant impact on future credit rating changes; (3) topic-specific negative sentiment has a more salient impact on future credit rating changes compared to topic-specific positive sentiment; (4) MT-MNP performs better in predicting future credit rating changes compared to support vector machines (SVM). The performance gap as measured by macroaveraging F-measure is small but consistent.
Systems, Man and Cybernetics | 2006
Hsin-Min Lu; Daniel Zeng; Hsinchun Chen
This paper presents a novel ontology-based approach to classifying free-text chief complaints (CCs) into syndrome categories. This approach exploits the semantic relations in a medical ontology to address the CC word variation problem. Initial computational experiments indicate that this ontology-based approach significantly improves the probability that a CC is correctly classified into a syndrome.
Pacific Asia Workshop on Intelligence and Security Informatics | 2009
Hsin-Min Lu; Nina WanHsin Huang; Zhu Zhang; Tsai-Jyh Chen
Textual data are an important information source for risk management for business organizations. To effectively identify, extract, and analyze risk-related statements in textual data, these processes need to be automated. We developed an annotation framework for firm-specific risk statements guided by previous economic, managerial, linguistic, and natural language processing research. A manual annotation study using news articles from the Wall Street Journal was conducted to verify the framework. We designed and constructed an automated risk identification system based on the annotation framework. The evaluation using manually annotated risk statements in news articles showed promising results for automated risk identification.
Review of Pacific Basin Financial Markets and Policies | 2007
Jow-Ran Chang; Mao-Wei Hung; Cheng-Few Lee; Hsin-Min Lu
We use square-root stochastic volatility models, with and without jumps, to study the heteroskedasticity and jump behavior of the Thai Baht. Bayes factors are used to evaluate the explanatory power of the competing models. The square-root stochastic volatility model with independent jumps in the observation and state equations (SVIJ) has the best explanatory power for our sample. Using the estimation results of the SVIJ model, we are able to link the major events of the Asian financial crisis to the jump behavior of either the volatility or the observation process.
Journal of Biomedical Informatics | 2016
Hsin-Min Lu; Chih-Ping Wei; Fei-Yuan Hsiao
Information and communications technologies have enabled healthcare institutions to accumulate large amounts of healthcare data that include diagnoses, medications, and additional contextual information such as patient demographics. To gain a better understanding of big healthcare data and to develop better data-driven clinical decision support systems, we propose a novel multiple-channel latent Dirichlet allocation (MCLDA) approach for modeling diagnoses, medications, and contextual information in healthcare data. The proposed MCLDA model assumes that a latent health status group structure is responsible for the observed co-occurrences among diagnoses, medications, and contextual information. Using a real-world research testbed that includes one million healthcare insurance claim records, we investigate the utility of MCLDA. Our empirical evaluation results suggest that MCLDA is capable of capturing the comorbidity structures and linking them with the distribution of medications. Moreover, MCLDA is able to identify the pairing between diagnoses and medications in a record based on the assigned latent groups. MCLDA can also be employed to predict missing medications or diagnoses given partial records. Our evaluation results also show that, in most cases, MCLDA outperforms alternative methods such as logistic regressions and the k-nearest-neighbor (KNN) model for two prediction tasks, i.e., medication and diagnosis prediction. Thus, MCLDA represents a promising approach to modeling healthcare data for clinical decision support.
Intelligence and Security Informatics | 2008
Hsin-Min Lu; Daniel Zeng; Hsinchun Chen
The threat of infectious disease outbreaks and bioterrorism attacks has stimulated the development of syndromic surveillance systems, which focus on using pre-diagnostic data such as emergency department chief complaints and over-the-counter (OTC) drug sales to detect bioterrorism events in a timely manner. A key function of syndromic surveillance systems is detecting possible bioterrorism events from time series data. In this paper, we propose a novel temporal outbreak detection method based on the Markov switching model, a special case of hidden Markov models. The model is motivated to address several computational problems with existing detection schemes concerning the inconsistency in parameter estimation and the resulting undesired detection performance. Preliminary evaluation using simulated outbreaks injected on authentic time series shows that our method outperforms benchmark methods in terms of outbreak detection speed and detection sensitivity at given levels of false alarm rates.
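To illustrate the general idea of a Markov switching model for outbreak detection, the sketch below runs a forward filter over a two-state model (baseline versus outbreak) with Gaussian observations. It is an assumption-laden toy, not the paper's method: the state means, standard deviations, and transition probabilities are treated as known rather than estimated, and the function names and example series are hypothetical.

```python
import math


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def filtered_outbreak_probs(series, means, stds, stay_probs, prior_outbreak=0.05):
    """Return P(state_t = outbreak | y_1..y_t) for each time step.

    means, stds: (baseline, outbreak) observation parameters (assumed known).
    stay_probs:  (P(stay in baseline), P(stay in outbreak)).
    """
    p_stay_base, p_stay_out = stay_probs
    prob_out = prior_outbreak
    probs = []
    for y in series:
        # Prediction step: propagate the state probability through the Markov chain.
        pred_out = prob_out * p_stay_out + (1.0 - prob_out) * (1.0 - p_stay_base)
        pred_base = 1.0 - pred_out
        # Update step: reweight by how well each state explains the observation.
        like_base = gaussian_pdf(y, means[0], stds[0])
        like_out = gaussian_pdf(y, means[1], stds[1])
        numerator = pred_out * like_out
        denominator = numerator + pred_base * like_base
        prob_out = numerator / denominator if denominator > 0 else pred_out
        probs.append(prob_out)
    return probs


if __name__ == "__main__":
    # Made-up daily counts: a stable baseline followed by a jump.
    counts = [10, 11, 9, 10, 18, 20, 19]
    probs = filtered_outbreak_probs(
        counts, means=(10.0, 18.0), stds=(2.0, 2.0), stay_probs=(0.95, 0.8)
    )
    for day, p in enumerate(probs, start=1):
        print(f"day {day}: P(outbreak) = {p:.3f}")
```

An alarm rule would then threshold these filtered probabilities; the threshold controls the trade-off between detection speed and false alarm rate that the abstract refers to.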
Intelligence and Security Informatics | 2007
Hsin-Min Lu; Chwan-Chuen King; Tsung-Shu Joseph Wu; Fuh-Yuan Shih; Jin-Yi Hsiao; Daniel Dajun Zeng; Hsinchun Chen
There is a critical need for the development of chief complaint (CC) classification systems capable of processing non-English CCs as syndromic surveillance is being increasingly practiced around the world. In this paper, we report on an ongoing effort to develop a Chinese CC classification system based on the analysis of Chinese CCs collected from hospitals in Taiwan. We found that Chinese CCs contain important symptom-related information and provide a valid source of information for syndromic surveillance. Our technical approach consists of two key steps: (a) mapping Chinese CCs to English CCs using a mutual information-based mapping method, and (b) reusing existing English CC classification systems to process translated Chinese CCs. We demonstrate the effectiveness of this proposed approach through a preliminary evaluation study using a real-world dataset.
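Step (a) can be pictured with a small sketch that maps each Chinese phrase to the English term with which it has the highest pointwise mutual information (PMI). This is only one plausible reading of "mutual information-based mapping": the abstract does not give the exact procedure, and the `pmi_mapping` function, the paired-record input format, and the example records are all hypothetical.

```python
import math
from collections import Counter


def pmi_mapping(records):
    """Map each Chinese phrase to its highest-PMI English term.

    records: list of (set of Chinese phrases, set of English terms) pairs,
             one pair per record (hypothetical input format).
    """
    n = len(records)
    zh_count, en_count, joint_count = Counter(), Counter(), Counter()
    for zh_terms, en_terms in records:
        # Sets ensure each term is counted at most once per record.
        zh_count.update(zh_terms)
        en_count.update(en_terms)
        for zh in zh_terms:
            for en in en_terms:
                joint_count[(zh, en)] += 1

    best = {}
    for (zh, en), c in joint_count.items():
        # PMI = log( P(zh, en) / (P(zh) * P(en)) ), using record-level frequencies.
        pmi = math.log((c * n) / (zh_count[zh] * en_count[en]))
        if zh not in best or pmi > best[zh][1]:
            best[zh] = (en, pmi)
    return {zh: en for zh, (en, _) in best.items()}


if __name__ == "__main__":
    # Made-up records, for illustration only.
    records = [
        ({"發燒"}, {"fever"}),
        ({"發燒", "咳嗽"}, {"fever", "cough"}),
        ({"咳嗽"}, {"cough"}),
        ({"腹痛"}, {"abdominal pain"}),
    ]
    print(pmi_mapping(records))
```

The mapped English terms would then be passed to an existing English CC classifier, as in step (b). PMI tends to overrate rare pairs, so a real system would likely apply a minimum-frequency cutoff.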
Industrial Management and Data Systems | 2015
Yu-Tai Chien; Hsin-Min Lu
Purpose – Websites have become an important channel for firms to communicate with their stakeholders. Higher website traffic could represent effective information disclosure and higher investor recognition. Both may reduce firm risk by lowering information asymmetry and by facilitating a more complete market that reaches more potential investors. The purpose of this paper is to investigate the impact of firm website traffic on firm risk. Design/methodology/approach – The authors conducted a cross-sectional study of the risk and website traffic data of 4,122 US public firms. Findings – After controlling for confounding factors, website traffic is significantly negatively associated with three firm risk measures: cost of equity, return volatility, and analyst forecast dispersion. Originality/value – The results provide new insights into the economic impact of website traffic. Compared with previous studies that mostly investigated the relationships between website traffic ...