
Publication


Featured research published by Liang Sian Lin.


European Journal of Operational Research | 2013

A new approach to assess product lifetime performance for small data sets

Der Chiang Li; Liang Sian Lin

Because of cost and time constraints, the number of samples is usually small in the early stages of a manufacturing system, and the scarcity of actual data causes problems in decision-making. To address this, this paper constructs a counter-intuitive hypothesis testing method that chooses the maximal p-value under a two-parameter Weibull distribution to improve the estimate of a nonlinear and asymmetrical product lifetime distribution. Further, we systematically generate virtual data to extend the small data set and improve the learning robustness of product lifetime performance. Simulated data sets and two practical examples demonstrate that the proposed method is a more appropriate technique for increasing the estimation accuracy of product lifetime for normal or non-normal data with small sample sizes.
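The maximal-p-value idea can be sketched as follows; this is an illustrative simplification, not the paper's exact procedure: it scans a grid of Weibull shape parameters and keeps the fit whose Kolmogorov-Smirnov goodness-of-fit p-value is largest.

```python
import numpy as np
from scipy import stats

def max_p_weibull(samples, shapes=np.linspace(0.5, 5.0, 46)):
    """Fit a two-parameter Weibull by scanning shape values and keeping
    the (shape, scale) pair with the largest KS-test p-value."""
    best = (None, None, -1.0)  # (shape, scale, p-value)
    for c in shapes:
        # Maximum-likelihood scale estimate for a fixed shape c
        scale = np.mean(samples ** c) ** (1.0 / c)
        p = stats.kstest(samples, stats.weibull_min(c, scale=scale).cdf).pvalue
        if p > best[2]:
            best = (c, scale, p)
    return best

rng = np.random.default_rng(0)
small_set = stats.weibull_min(2.0, scale=3.0).rvs(size=15, random_state=rng)
shape, scale, p = max_p_weibull(small_set)
```

With only 15 samples, many parameter pairs pass a conventional test; selecting the maximal p-value picks the distribution most consistent with the small sample.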


Decision Support Systems | 2014

Improving learning accuracy by using synthetic samples for small datasets with non-linear attribute dependency

Der Chiang Li; Liang Sian Lin; Li Jhong Peng

Small-data problems are commonly encountered in the early stages of a new manufacturing procedure, presenting challenges to both academics and practitioners, as good performance is difficult to achieve with learning models when sufficient data is lacking. Virtual sample generation (VSG) has been shown to be an effective method to overcome this issue in a wide range of studies in various fields. Such works usually assume that the attributes are independent of each other and produce synthetic data from the individual attributes' sample distributions. However, the VSG technique may be ineffective if the real data has interrelated attributes. Therefore, this research provides a novel procedure to generate related virtual samples with non-linear attribute dependency. To construct a relational model between the independent and dependent attributes, we employ gene expression programming (GEP) to find the most suitable mathematical model. One practical dataset and three real UCI datasets are presented in this paper to verify the effectiveness of the proposed method, and the results show that the proposed approach achieves better learning accuracy with a back-propagation neural network (BPN) than the well-known mega-trend-diffusion (MTD) and multiple regression analysis (MRA) approaches.
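The dependency-preserving generation step can be sketched under a simplifying assumption: here an ordinary polynomial fit stands in for the GEP-learned relation between an independent and a dependent attribute (GEP itself requires a dedicated evolutionary-programming library).

```python
import numpy as np

def related_virtual_samples(x, y, n_virtual=30, degree=2, seed=0):
    """Generate virtual samples that preserve an attribute dependency.
    A polynomial fit is a stand-in here for the GEP-learned model y = f(x)."""
    coeffs = np.polyfit(x, y, degree)          # surrogate functional model
    rng = np.random.default_rng(seed)
    vx = rng.uniform(x.min(), x.max(), n_virtual)
    resid_sd = np.std(y - np.polyval(coeffs, x))
    # Dependent values follow the fitted relation plus residual-scale noise
    vy = np.polyval(coeffs, vx) + rng.normal(0.0, resid_sd, n_virtual)
    return vx, vy

x = np.linspace(0.0, 5.0, 12)
y = 1.5 * x ** 2 - x + np.random.default_rng(42).normal(0, 0.2, 12)
vx, vy = related_virtual_samples(x, y)
```

The point of the design is that the virtual dependent values are computed *from* the virtual independent values, so the attribute relation survives in the extended training set.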


Decision Support Systems | 2014

Generating information for small data sets with a multi-modal distribution

Der Chiang Li; Liang Sian Lin

Virtual sample generation approaches have been used with small data sets to enhance classification performance in a number of reports. The appropriate estimation of the data distribution plays an important role in this process, with performance usually better for data sets that have a simple distribution rather than a complex one. Mixed-type data sets often have a multi-modal distribution instead of a simple, uni-modal one. This study thus proposes a new approach to detect multi-modality in data sets, to avoid the problem of inappropriately using a uni-modal distribution. We utilize the common k-means clustering method to detect possible clusters and, based on the clustered sample sets, develop a Weibull variate for each of these to produce multi-modal virtual data. In this approach, the degree of error variation in the Weibull skewness between the original and virtual data is measured and used as the criterion for determining the sizes of virtual samples. Six data sets with different training data sizes are employed to check the performance of the proposed method, and comparisons are made based on the classification accuracies. The results using non-parametric testing show that the proposed method has better classification performance than that of the recently presented Mega-Trend-Diffusion method.

Highlights:
- We propose a new method to assess the modality of small data.
- We construct a unique scheme to systematically generate multi-modal virtual samples.
- In this scheme, we suggest a criterion to control the size of virtual samples.
- We utilize six data sets to show the superiority of the proposed method.
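The cluster-then-generate idea can be sketched as follows. This simplified version uses k-means plus a per-cluster Weibull fit with proportional allocation, and omits the paper's Weibull-skewness criterion for choosing virtual sample sizes.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

def multimodal_virtual_samples(x, k=2, n_virtual=100, seed=0):
    """Cluster a 1-D sample with k-means, fit a Weibull distribution to
    each cluster, and draw virtual samples from the per-cluster fits."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(x.reshape(-1, 1))
    virtual = []
    for cluster in range(k):
        members = x[labels == cluster]
        # Anchor the location just below the cluster, then fit shape and scale
        floc = members.min() - 0.1 * np.ptp(members)
        c, loc, scale = stats.weibull_min.fit(members, floc=floc)
        n = int(round(n_virtual * len(members) / len(x)))  # proportional allocation
        virtual.append(stats.weibull_min(c, loc=loc, scale=scale).rvs(size=n, random_state=rng))
    return np.concatenate(virtual)

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(2.0, 0.3, 20), rng.normal(8.0, 0.5, 20)])
virtual = multimodal_virtual_samples(data, k=2)
```

Fitting one Weibull per detected cluster is what lets the generated data stay multi-modal instead of smearing samples across the gap between modes.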


International Journal of Production Research | 2017

The attribute-trend-similarity method to improve learning performance for small datasets

Der Chiang Li; Wu Kuo Lin; Liang Sian Lin; Chien Chih Chen; Wen Ting Huang

Small data-set learning problems are attracting more attention because of the short product lifecycles caused by the increasing pressure of global competition. Although statistical approaches and machine learning algorithms are widely applied to extract information from such data, these are basically developed on the assumption that training samples can represent the properties of the whole population. However, as the properties that the training samples contain are limited, the knowledge that the learning algorithms extract may also be deficient. Virtual sample generation approaches, used as a kind of data pretreatment, have proved their effectiveness when handling small data-set problems. By considering the relationships among attributes in the value generation procedure, this research proposes a non-parametric process to learn the trend similarities among attributes, and then uses these to estimate the corresponding ranges that attribute values may be located in when other attribute values are given. The ranges of the attribute values of the virtual samples are then stepwise estimated using the triangular membership functions (MFs) built to represent the attribute sample distributions. In the experiment, two real cases are examined with four modelling tools, including the M5′ model tree (M5′), multiple linear regression, support vector regression and back-propagation neural network. The results show that the forecasting accuracies of the four modelling tools are improved when training sets contain virtual samples. In addition, the outcomes of the proposed procedure show significantly smaller predictive errors than those of other approaches.
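A triangular membership function of the kind used to represent an attribute's sample distribution can be sketched like this; placing the peak at the sample mean is an illustrative choice, not necessarily the paper's.

```python
import numpy as np

def triangular_mf(a, b, c):
    """Return a triangular membership function with feet at a and c
    and peak (membership 1.0) at b."""
    def mu(x):
        x = np.asarray(x, dtype=float)
        left = np.clip((x - a) / (b - a), 0.0, 1.0) if b > a else (x >= b).astype(float)
        right = np.clip((c - x) / (c - b), 0.0, 1.0) if c > b else (x <= b).astype(float)
        return np.minimum(left, right)
    return mu

# Build an MF from a small attribute sample: feet at min/max, peak at the mean
sample = np.array([2.0, 3.5, 4.0, 4.2, 6.0])
mu = triangular_mf(sample.min(), sample.mean(), sample.max())
```

Given such MFs per attribute, plausible value ranges for virtual samples can be read off as the x-intervals where membership exceeds a chosen threshold.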


PLOS ONE | 2017

Detecting representative data and generating synthetic samples to improve learning accuracy with imbalanced data sets

Der Chiang Li; Susan C. Hu; Liang Sian Lin; Chun Wu Yeh

It is difficult for learning models to achieve high classification performance with imbalanced data sets: when one class is much larger than the others, most machine learning and data mining classifiers are overly influenced by the larger classes and ignore the smaller ones. As a result, classification algorithms often have poor learning performance due to slow convergence in the smaller classes. To balance such data sets, this paper presents a strategy that involves reducing the size of the majority data and generating synthetic samples for the minority data. In the reducing operation, we use the box-and-whisker plot approach to exclude outliers and the Mega-Trend-Diffusion method to find representative data in the majority class. To generate the synthetic samples, we propose a counter-intuitive hypothesis to find the distributed shape of the minority data, and then produce samples according to this distribution. Four real data sets were used to examine the performance of the proposed approach. We used paired t-tests to compare the Accuracy, G-mean, and F-measure scores of the proposed data pre-processing method merged with the D3C method (PPDP+D3C) against those of one-sided selection (OSS), the well-known SMOTEBoost (SB) method, the normal distribution-based over-sampling (NDO) approach, and the PPDP method alone. The results indicate that the classification performance of the proposed approach is better than that of the above-mentioned methods.
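The box-and-whisker outlier-exclusion step applied to the majority data follows the standard fence rule, which can be sketched as:

```python
import numpy as np

def whisker_filter(x, k=1.5):
    """Keep only values inside the box-and-whisker fences
    [Q1 - k*IQR, Q3 + k*IQR], the usual rule for flagging outliers."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return x[(x >= lo) & (x <= hi)]

majority = np.array([5.0, 5.2, 4.9, 5.1, 5.3, 4.8, 5.0, 12.0])  # 12.0 is an outlier
clean = whisker_filter(majority)
```

Removing fence-crossing points before undersampling keeps the retained majority data representative of its bulk rather than its extremes.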


Decision Support Systems | 2018

Rebuilding sample distributions for small dataset learning

Der Chiang Li; Wu Kuo Lin; Chien Chih Chen; Hung Yu Chen; Liang Sian Lin

Over the past few decades, a few learning algorithms have been proposed to extract knowledge from data. The majority of these algorithms have been developed with the assumption that training sets can denote populations. When the training sets contain only a few properties of their populations, the algorithms may extract minimal and/or biased knowledge for decision makers. This study develops a systematic procedure based on fuzzy theories to create new training sets by rebuilding the possible sample distributions, where the procedure contains new functions that estimate domains and a sample generating method. In this study, two real cases of a leading company in the thin film transistor liquid crystal display (TFT-LCD) industry are examined. Two learning algorithms—a back-propagation neural network and support vector regression—are employed for modeling, and two sample generation approaches—bootstrap aggregating (bagging) and the synthetic minority over-sampling technique (SMOTE)—are employed to compare the accuracy of the models. The results indicate that the proposed method outperforms bagging and the SMOTE with the greatest amount of statistical support.


International Conference on Advanced Applied Informatics | 2016

Improving Virtual Sample Generation for Small Sample Learning with Dependent Attributes

Liang Sian Lin; Der Chiang Li; Chih Wei Pan

Since product life cycles are getting shorter and shorter, the issue of small data set learning has drawn more and more attention in both academia and industry. Many methods have been proposed to improve learning performance on small data sets; among these, virtual sample generation is the most popular technique. In the virtual sample generation process, attribute independence is a key factor in the resulting learning performance, because it is a necessary assumption before generating virtual samples. However, in the real world, the attributes in a data set are usually not mutually independent. Therefore, this paper proposes a new process to generate independent virtual samples based on box-and-whisker plot domain estimation. To validate the effectiveness of the proposed method, one data set is used to calculate the average and standard deviation of classification accuracy with a support vector machine. The experimental results show that the presented method has superior classification performance to other methods.
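The box-and-whisker domain-estimation idea can be sketched as follows; drawing virtual samples uniformly inside the estimated domain is a deliberate simplification of the paper's generation procedure.

```python
import numpy as np

def vsg_from_whisker_domain(x, n_virtual=50, k=1.5, seed=0):
    """Estimate an attribute's plausible domain from box-plot fences and
    draw virtual samples uniformly inside it (intentionally simple scheme)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo = min(x.min(), q1 - k * iqr)   # extend the observed range to the fences
    hi = max(x.max(), q3 + k * iqr)
    rng = np.random.default_rng(seed)
    return rng.uniform(lo, hi, size=n_virtual)

attr = np.array([3.1, 3.4, 2.9, 3.8, 3.3, 3.6])
virtual = vsg_from_whisker_domain(attr)
```

Using the whisker fences rather than the raw min/max lets the virtual data extend slightly beyond the few observed values, which is the point of domain estimation for small samples.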


International Conference on Advanced Applied Informatics | 2016

A Learning Approach with Under- and Over-Sampling for Imbalanced Data Sets

Chun Wu Yeh; Der Chiang Li; Liang Sian Lin; Tung I. Tsai

It is difficult for learning models to achieve high classification performance with imbalanced data sets. To address this problem, this study presents a strategy involving reducing the size of the majority data set and generating synthetic samples for the minority data set. The Parkinson's disease data set is used to examine and compare the performance of the classification methods. Paired t-tests are also used to show the effectiveness of the proposed method compared with that of the other methods.


Neurocomputing | 2018

An attribute extending method to improve learning performance for small datasets

Liang Sian Lin; Der Chiang Li; Hung Yu Chen; Yu Chun Chiang

A small dataset often makes it difficult to build a reliable learning model, and thus some researchers have proposed virtual sample generation (VSG) methods to add artificial samples into small datasets to extend the data size. However, for some datasets the assumption of the distribution of data in the VSG methods may be vague, and when data only has a few attributes, such approaches may not work effectively. Other researchers thus proposed attribute extension methods to generate attributes to convert data into a higher dimensional space. Unfortunately, the resulting dataset may become a sparse dataset with many null or zero values in extended attributes, and then a large quantity of such attributes will reduce the representativeness of instances for the learning model. Therefore, based on fuzzy theories, this paper proposes a novel sample attribute extending (SEA) method to extend a suitable quantity of attributes to improve small dataset learning. In order to verify the validity of the SEA method, using SVR and BPNN, this paper adopts two real cases and two public datasets to conduct the learning of the predictive model, and uses the paired t-test to statistically examine the significance of improvement. The experimental results show that the proposed SEA method can effectively improve the learning accuracy of small datasets.
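The attribute-extension idea, appending fuzzy membership values as new features, can be sketched like this; using one triangular membership per column, peaked at the column mean, is an illustrative assumption rather than the SEA method itself.

```python
import numpy as np

def extend_attributes(X):
    """Append, for each original column, a fuzzy membership feature:
    a triangular membership with feet at the column min/max and peak
    at the column mean (an illustrative choice)."""
    mins, means, maxs = X.min(0), X.mean(0), X.max(0)
    left = (X - mins) / np.where(means > mins, means - mins, 1.0)
    right = (maxs - X) / np.where(maxs > means, maxs - means, 1.0)
    mu = np.clip(np.minimum(left, right), 0.0, 1.0)
    return np.hstack([X, mu])

X = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 20.0], [4.0, 14.0]])
X_ext = extend_attributes(X)
```

Because the new columns are dense membership degrees in [0, 1] rather than sparse indicator attributes, they widen the feature space without the null/zero sparsity problem the abstract warns about.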


International Conference on Advanced Applied Informatics | 2016

Extending sample information for small data set prediction

Hung Yu Chen; Der Chiang Li; Liang Sian Lin

This paper proposes a method that creates new data attributes using fuzzy operations to solve small dataset learning problems. Using the idea of fuzzy rules, the membership value of the antecedents in each rule can be extracted from each data point. In this research, those membership values are treated as new data features, extending the data dimensionality. To test the effectiveness of the proposed method, the data set with new data features and the one with no special treatment are each used to build predictive models. A paired t-test is carried out to see how effectively the proposed method improves learning on the basis of small sample sets.

Collaboration


Dive into Liang Sian Lin's collaborations.

Top Co-Authors

Der Chiang Li, National Cheng Kung University
Hung Yu Chen, National Cheng Kung University
Chien Chih Chen, National Cheng Kung University
Wu Kuo Lin, National Cheng Kung University
Chih Wei Pan, National Cheng Kung University
Li Jhong Peng, National Cheng Kung University
Mei Lan Su, National Cheng Kung University
Susan C. Hu, National Cheng Kung University
Tung I. Tsai, National Cheng Kung University