
Publication


Featured research published by Yunlong Gao.


Neurocomputing | 2010

Edited AdaBoost by weighted kNN

Yunlong Gao; Feng Gao

Any realistic model of learning from samples must address the issue of noisy data. AdaBoost is known as an effective method for improving the performance of base classifiers, both theoretically and empirically. However, previous studies have shown that AdaBoost is prone to overfitting, especially in noisy domains. On the other hand, the kNN rule is one of the oldest and simplest methods for pattern classification. Nevertheless, it often yields competitive results, and in certain domains, when cleverly combined with prior knowledge, it has significantly advanced the state of the art. In this paper, an edited AdaBoost by weighted kNN (EAdaBoost) is designed in which AdaBoost and kNN naturally complement each other. First, AdaBoost is run on the training data to capitalize on statistical regularities in the data. Then, a weighted kNN algorithm is run on the feature space composed of the classifiers produced by AdaBoost. The weighted kNN algorithm is used to edit the data sets, improving the quality of the training data so that AdaBoost can enhance classification accuracy and avoid overfitting. Experiments performed on ten different UCI data sets show that the new Boosting algorithm almost always achieves considerably better classification accuracy than AdaBoost. Furthermore, experiments on data with artificially controlled noise indicate that the new Boosting algorithm is robust to noise.
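The editing step described above can be sketched as follows. This is a minimal illustration of distance-weighted kNN editing, assuming an inverse-distance vote over raw features; the paper instead votes in the space of AdaBoost's base-classifier outputs, and the function name and parameters here are illustrative, not the authors' exact formulation.

```python
import numpy as np

def weighted_knn_edit(X, y, k=3, eps=1e-12):
    """Flag samples whose distance-weighted kNN vote disagrees with their label.

    Each sample is scored by its k nearest neighbours (excluding itself),
    with votes weighted by inverse distance. Samples whose weighted vote
    disagrees with their own label are treated as likely noise and edited
    out of the training set before the final boosting run.
    """
    n = len(X)
    keep = np.ones(n, dtype=bool)
    # pairwise Euclidean distances; the diagonal is set to inf to exclude self
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    for i in range(n):
        nbrs = np.argsort(d[i])[:k]
        w = 1.0 / (d[i, nbrs] + eps)      # inverse-distance weights
        votes = {}
        for j, wj in zip(nbrs, w):
            votes[y[j]] = votes.get(y[j], 0.0) + wj
        if max(votes, key=votes.get) != y[i]:
            keep[i] = False               # weighted vote disagrees: edit out
    return keep
```

A subsequent AdaBoost run would then be trained only on `X[keep], y[keep]`.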


Knowledge-Based Systems | 2012

A novel two-level nearest neighbor classification algorithm using an adaptive distance metric

Yunlong Gao; Jinyan Pan; Guoli Ji; Zijiang Yang

When there are infinitely many samples in the training set, the outcome of nearest neighbor classification (kNN) is independent of the adopted distance metric. In practice, however, the number of training samples is always finite, so the choice of distance metric becomes crucial in determining the performance of kNN. We propose a novel two-level nearest neighbor algorithm (TLNN) to minimize the mean absolute difference between the misclassification rates of kNN with finite and infinite numbers of training samples. At the low level, Euclidean distance is used to determine a local subspace centered at an unlabeled test sample. At the high level, AdaBoost is used as guidance for local information extraction. TLNN maintains data invariance and produces neighborhoods that are highly stretched or elongated along different directions. The TLNN algorithm can reduce the excessive dependence on statistical methods that learn prior knowledge from the training data: even a linear combination of a few base classifiers produced by the weak learner in AdaBoost can yield much better kNN classifiers. Experiments on both synthetic and real-world data sets provide justification for the proposed method.
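The two-level structure can be sketched as below. This is only an illustration under simplifying assumptions: the low level selects a Euclidean neighbourhood as described, but the high level substitutes a simple between/within-class variance ratio for the AdaBoost-guided local information extraction of the paper; all names and parameters are hypothetical.

```python
import numpy as np

def two_level_nn(X, y, query, k_low=15, k_high=3, eps=1e-12):
    """Illustrative two-level nearest-neighbour classification.

    Low level: select a local subspace of k_low Euclidean neighbours
    around the query. High level: re-weight each feature by its local
    discriminative power (overall variance over within-class variance),
    stretching the neighbourhood along relevant directions, then classify
    the query by k_high neighbours under the adapted metric.
    """
    # low level: plain Euclidean neighbourhood around the query
    d = np.linalg.norm(X - query, axis=1)
    local = np.argsort(d)[:k_low]
    Xl, yl = X[local], y[local]
    # high level: per-feature relevance weights inside the local subspace
    overall = Xl.var(axis=0) + eps
    within = np.mean([Xl[yl == c].var(axis=0) for c in np.unique(yl)], axis=0)
    w = overall / (within + eps)          # stretch discriminative directions
    # adaptive weighted distance and majority vote
    da = np.sqrt(((Xl - query) ** 2 * w).sum(axis=1))
    nbrs = yl[np.argsort(da)[:k_high]]
    vals, counts = np.unique(nbrs, return_counts=True)
    return vals[np.argmax(counts)]
```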


International Journal of Systems Science | 2013

Self-balancing dynamic scheduling of electrical energy for energy-intensive enterprises

Yunlong Gao; Feng Gao; Qiaozhu Zhai; Xiaohong Guan

Balancing production and consumption with self-generation capacity in energy-intensive enterprises has substantial economic and environmental benefits. However, it is a challenging task, since energy production and consumption must be balanced in real time according to criteria specified by the power grid. In this article, a mathematical model for minimising production cost with an exactly realisable energy delivery schedule is formulated, and a dynamic programming (DP)-based self-balancing dynamic scheduling algorithm is developed to obtain the complete solution set for this multiple-optimal-solutions problem. For each stage, a set of conditions is established to determine whether a feasible control trajectory exists. The state space under these conditions is partitioned into subsets, each viewed as an aggregate state; the cost-to-go function is then expressed as a function of the initial and terminal generation levels of each stage and is proved to be a staircase function with finitely many steps. This avoids computing the cost-to-go of every state and thereby resolves the issue of dimensionality in the DP algorithm. In the backward sweep of the algorithm, an optimal policy is determined to maximise the realisability of the energy delivery schedule across the entire time horizon. Then, in the forward sweep, the feasible region of the optimal policy with the initial and terminal state at each stage is identified. Different feasible control trajectories can be identified based on this region, so optimisation over feasible control trajectories is performed with economic and reliability objectives taken into account.
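The backward/forward-sweep structure can be illustrated with a generic stage-wise DP over discrete generation levels. This sketch enumerates every state, which is exactly what the paper's aggregate-state and staircase cost-to-go construction avoids; the ramp constraint and cost table here are hypothetical stand-ins for the grid-specified balancing criteria.

```python
import numpy as np

def backward_sweep(costs, ramp):
    """Backward DP over discrete generation levels.

    costs: (T, G) array, costs[t, g] = cost of running at level g in stage t.
    ramp:  maximum allowed change in level index between consecutive stages.
    Returns the (T, G) cost-to-go table V, where V[t, g] is the minimal
    total cost of stages t..T-1 starting from level g.
    """
    T, G = costs.shape
    V = np.zeros((T, G))
    V[T - 1] = costs[T - 1]
    for t in range(T - 2, -1, -1):
        for g in range(G):
            lo, hi = max(0, g - ramp), min(G, g + ramp + 1)
            V[t, g] = costs[t, g] + V[t + 1, lo:hi].min()
    return V

def forward_sweep(costs, ramp, g0):
    """Recover one optimal generation trajectory from the cost-to-go table."""
    V = backward_sweep(costs, ramp)
    T, G = costs.shape
    traj = [g0]
    for t in range(1, T):
        g = traj[-1]
        lo, hi = max(0, g - ramp), min(G, g + ramp + 1)
        traj.append(lo + int(np.argmin(V[t, lo:hi])))  # best reachable level
    return traj
```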


Systems, Man and Cybernetics | 2012

A Dynamic AdaBoost Algorithm With Adaptive Changes of Loss Function

Yunlong Gao; Guoli Ji; Zijiang Yang; Jinyan Pan

AdaBoost is a method to improve a given learning algorithm's classification accuracy by combining its hypotheses. Adaptivity, one of the significant advantages of AdaBoost, makes AdaBoost maximize the smallest margin, giving it good generalization ability. However, when the samples with large negative margins are noisy or atypical, the maximized margin is actually a “hard margin.” The adaptive feature makes AdaBoost sensitive to sampling fluctuations and prone to overfitting. Traditional schemes therefore prevent AdaBoost from overfitting by heavily damping the influence of samples with large negative margins. However, samples with large negative margins are not always noisy or atypical, so these traditional schemes may not be reasonable. To learn a classifier with high generalization performance while preventing overfitting, it is necessary to perform statistical analysis on the margins of the training samples. Here, the Hoeffding inequality is adopted as a statistical tool to divide training samples into reliable samples and temporarily unreliable samples. A new boosting algorithm, named DAdaBoost, is introduced to deal with reliable and temporarily unreliable samples separately. Since DAdaBoost adjusts its weighting scheme dynamically, its loss function is not fixed; in fact, it is a series of nonconvex functions that gradually approach the 0-1 loss as the algorithm evolves. By defining a virtual classifier, the dynamically adjusted weighting scheme is unified into the progress of DAdaBoost, and an upper bound on the training error is deduced. Experiments on both synthetic and real-world data show that DAdaBoost has many merits and can effectively prevent AdaBoost from overfitting.
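The Hoeffding-based split can be sketched as follows. This is a minimal illustration, assuming per-round margins in [-1, 1] and a one-sided Hoeffding confidence radius; the exact criterion and threshold used by DAdaBoost may differ, and the function name and `delta` parameter are illustrative.

```python
import math

def split_by_hoeffding(margins, delta=0.05):
    """Split samples into reliable / temporarily unreliable index lists.

    For a sample whose margin over t boosting rounds averages m_bar, the
    Hoeffding inequality for values in [-1, 1] (range 2) gives
        P(true margin <= m_bar - eps) <= exp(-t * eps**2 / 2),
    so eps = 2 * sqrt(ln(1/delta) / (2 * t)) is a confidence radius at
    level delta. Samples whose lower confidence bound m_bar - eps stays
    above 0 are treated as reliably classified; the rest are temporarily
    unreliable and handled separately.
    """
    reliable, unreliable = [], []
    for i, ms in enumerate(margins):      # ms: per-round margins in [-1, 1]
        t = len(ms)
        m_bar = sum(ms) / t
        eps = 2 * math.sqrt(math.log(1 / delta) / (2 * t))
        (reliable if m_bar - eps > 0 else unreliable).append(i)
    return reliable, unreliable
```

A dynamic boosting scheme would then damp the weights only of the unreliable set at each round, rather than of every large-negative-margin sample.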


World Congress on Intelligent Control and Automation | 2010

Improved Boosting algorithm with adaptive filtration

Yunlong Gao; Feng Gao; Xiaohong Guan

AdaBoost is known as an effective method to improve the performance of base classifiers, both theoretically and empirically. However, previous studies have shown that AdaBoost is prone to overfitting, especially in noisy cases. In addition, most current work on Boosting assumes that the loss function is fixed and therefore does not take the distinction between noisy and noise-free cases into consideration. In this paper, an improved Boosting algorithm with adaptive filtration is proposed. First, a filtering algorithm based on the Hoeffding inequality is designed to identify mislabeled or atypical samples. By introducing the filtering algorithm, the loss function is modified such that the influence of mislabeled or atypical samples is penalized. Experiments performed on eight different UCI data sets show that the new Boosting algorithm almost always obtains considerably better classification accuracy than AdaBoost. Furthermore, experiments on data with artificially controlled noise indicate that the new Boosting algorithm is more robust to noise than AdaBoost.


International Conference on Computer Science and Information Technology | 2010

Improved boosting algorithm through weighted k-nearest neighbors classifier

Yunlong Gao; Jinyan Pan; Feng Gao

AdaBoost is known as an effective method for improving the performance of base classifiers, both theoretically and empirically. However, previous studies have shown that AdaBoost is prone to overfitting, especially in noisy domains. On the other hand, the k-nearest neighbors (kNN) rule is one of the oldest and simplest methods for pattern classification; when cleverly combined with prior knowledge, it often yields competitive results. In this paper, an improved boosting algorithm is proposed in which AdaBoost and kNN naturally complement each other. First, AdaBoost is run on the training data to capitalize on statistical regularities in the data. Then, a weighted kNN algorithm designed in this paper is run on the feature space composed of the classifiers produced by AdaBoost. The weighted kNN algorithm is used to edit the data sets, improving the quality of the training data so that AdaBoost can enhance classification accuracy and avoid overfitting. Experiments performed on ten different UCI data sets show that the new Boosting algorithm almost always achieves considerably better classification accuracy than AdaBoost.


World Congress on Intelligent Control and Automation | 2008

On the balance model of power demand/supply for large-scale power consuming corporation

Yunlong Gao; Feng Gao; Qiaozhu Zhai; Xiaohong Guan; Dianmin Zhou; Chuan Zhang

With the progress of electricity market reform, a large-scale power-consuming corporation can reduce its electricity expenses and improve its competitiveness by exploiting the generation capacity of its autonomous power plants. However, load forecasting accuracy is low due to the particular characteristics of such corporations, making it difficult to satisfy the requirements of economic dispatch. In this paper, gateway balance is defined, the balance between electricity generation and consumption of a large-scale power-consuming corporation is analyzed, and a mathematical model is built. A loss function is employed to further reduce the prediction error and enhance the gateway balance ratio.


World Congress on Intelligent Control and Automation | 2014

Nearest neighbor classification method based on the mutual information distance measure

Yunlong Gao; Peng Yan; Jinyan Pan

The k-nearest neighbor classification method predicts the class label of a query pattern based on its nearest neighbors. Two key problems in nearest neighbor methods are therefore which samples to select as the nearest neighbors of the query pattern and how to use those neighbors to predict its class label. Based on a mutual information distance measure, a new classification method, named the mutual information distance measure nearest neighbor method, is proposed in this paper. The method is validated and compared against other techniques on real-world data; experimental results show that the new algorithm achieves excellent performance and robustness.
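The abstract does not define the distance measure itself, so the sketch below is one plausible reading, assuming a kNN rule whose Euclidean distance is weighted by each feature's estimated mutual information with the class label; the histogram estimator, bin count, and function names are all illustrative choices, not the paper's.

```python
import numpy as np

def mi_feature_weights(X, y, bins=5):
    """Estimate I(feature; label) per feature with a histogram estimator."""
    n, d = X.shape
    weights = np.zeros(d)
    for j in range(d):
        # discretise the feature into `bins` equal-width bins
        edges = np.histogram_bin_edges(X[:, j], bins)[1:-1]
        xj = np.digitize(X[:, j], edges)
        mi = 0.0
        for xv in np.unique(xj):
            for yv in np.unique(y):
                pxy = np.mean((xj == xv) & (y == yv))
                if pxy > 0:
                    px, py = np.mean(xj == xv), np.mean(y == yv)
                    mi += pxy * np.log(pxy / (px * py))
        weights[j] = mi
    return weights

def mi_knn_predict(X, y, query, k=3, bins=5):
    """kNN prediction under a mutual-information-weighted distance."""
    w = mi_feature_weights(X, y, bins)
    d = np.sqrt(((X - query) ** 2 * w).sum(axis=1))
    nbrs = y[np.argsort(d)[:k]]
    vals, counts = np.unique(nbrs, return_counts=True)
    return vals[np.argmax(counts)]
```

Informative features thus dominate the neighbourhood, while features carrying no information about the label are effectively ignored.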


Applied Surface Science | 2008

Enhanced field emission characteristics of zinc oxide mixed carbon nano-tubes films

J.Y. Pan; C.C. Zhu; Yunlong Gao


Science China Technological Sciences | 2011

A time-series modeling method based on the boosting gradient-descent theory

Yunlong Gao; Jinyan Pan; Guoli Ji; Feng Gao

Collaboration


Dive into Yunlong Gao's collaborations.

Top Co-Authors

Feng Gao (Xi'an Jiaotong University)
Xiaohong Guan (Xi'an Jiaotong University)
Qiaozhu Zhai (Xi'an Jiaotong University)
C.C. Zhu (Xi'an Jiaotong University)
Chuan Zhang (Xi'an Jiaotong University)
J.Y. Pan (Xi'an Jiaotong University)