Balazs Feil | Researchain

Publication

Featured researches published by Balazs Feil.

Fuzzy Sets and Systems | 2005

Modified Gath-Geva clustering for fuzzy segmentation of multivariate time-series

János Abonyi; Balazs Feil; Sándor Németh; Peter Arva

Partitioning a time-series into internally homogeneous segments is an important data-mining problem. The changes of the variables of a multivariate time-series are usually vague and do not focus on any particular time point. Therefore, it is not practical to define crisp bounds of the segments. Although fuzzy clustering algorithms are widely used to group overlapping and vague objects, they cannot be directly applied to time-series segmentation, because the clusters need to be contiguous in time. This paper proposes a clustering algorithm for the simultaneous identification of local probabilistic principal component analysis (PPCA) models used to measure the homogeneity of the segments and fuzzy sets used to represent the segments in time. The algorithm favors contiguous clusters in time and is able to detect changes in the hidden structure of multivariate time-series. A fuzzy decision-making algorithm based on a compatibility criterion of the clusters has been worked out to determine the required number of segments, while the required number of principal components is determined from the scree plots of the eigenvalues of the fuzzy covariance matrices. The application example shows that this new technique is a useful tool for the analysis of historical process data.
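The scree-plot step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common membership-weighted form of the fuzzy covariance matrix with a fuzzifier exponent `m`; the function names, the two-dimensional toy data, and the membership values are hypothetical and are not taken from the paper.

```python
import math

def fuzzy_covariance_2d(points, memberships, m=2.0):
    """Membership-weighted center and 2x2 covariance of one fuzzy cluster."""
    weights = [mu ** m for mu in memberships]  # assumed weighting: mu^m
    total = sum(weights)
    cx = sum(w * x for w, (x, _) in zip(weights, points)) / total
    cy = sum(w * y for w, (_, y) in zip(weights, points)) / total
    sxx = sum(w * (x - cx) ** 2 for w, (x, _) in zip(weights, points)) / total
    syy = sum(w * (y - cy) ** 2 for w, (_, y) in zip(weights, points)) / total
    sxy = sum(w * (x - cx) * (y - cy) for w, (x, y) in zip(weights, points)) / total
    return (cx, cy), ((sxx, sxy), (sxy, syy))

def eigenvalues_sym_2x2(cov):
    """Eigenvalues of a symmetric 2x2 matrix, largest first (closed form)."""
    (a, b), (_, d) = cov
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    return mean + radius, mean - radius

def explained_fractions(eigenvalues):
    """Share of total variance per eigenvalue, i.e. the heights of a scree plot."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

# Toy data close to the line y = x; memberships fade toward the segment edges.
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9), (4.0, 4.1)]
memberships = [0.2, 0.8, 1.0, 0.8, 0.2]
center, cov = fuzzy_covariance_2d(points, memberships)
fractions = explained_fractions(eigenvalues_sym_2x2(cov))
```

In this toy case the first eigenvalue carries almost all of the variance, so a scree plot would suggest keeping a single principal component for this cluster.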

Intelligent Data Analysis | 2003

Fuzzy Clustering Based Segmentation of Time-Series

János Abonyi; Balazs Feil; Sándor Németh; Peter Arva

The segmentation of time-series is a constrained clustering problem: the data points should be grouped by their similarity, but with the constraint that all points in a cluster must come from successive time points. The changes of the variables of a time-series are usually vague and do not focus on any particular time point. Therefore, it is not practical to define crisp bounds of the segments. Although fuzzy clustering algorithms are widely used to group overlapping and vague objects, they cannot be directly applied to time-series segmentation. This paper proposes a clustering algorithm for the simultaneous identification of fuzzy sets which represent the segments in time and the local PCA models used to measure the homogeneity of the segments. The algorithm is applied to the monitoring of the production of high-density polyethylene.
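One way to picture fuzzy sets that represent segments in time is with normalized Gaussian membership functions over the time axis. The sketch below is only an illustration under that assumption; the function name, segment centers, and widths are hypothetical.

```python
import math

def time_memberships(t, centers, widths):
    """Normalized Gaussian memberships of time point t in each segment."""
    raw = [math.exp(-0.5 * ((t - c) / s) ** 2) for c, s in zip(centers, widths)]
    total = sum(raw)
    return [r / total for r in raw]

# Two hypothetical segments centered at t = 25 and t = 75.
centers = [25.0, 75.0]
widths = [15.0, 15.0]

early = time_memberships(10.0, centers, widths)     # dominated by segment 1
boundary = time_memberships(50.0, centers, widths)  # shared equally
late = time_memberships(90.0, centers, widths)      # dominated by segment 2
```

Because each membership function has a single peak in time, every segment is contiguous, while the transition between neighboring segments stays gradual rather than crisp.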

International Conference on Advances in System Simulation | 2009

Comparison of Monte Carlo and Quasi Monte Carlo Sampling Methods in High Dimensional Model Representation

Balazs Feil; Sergei S. Kucherenko; Nilay Shah

A number of new techniques which improve the efficiency of Random Sampling-High Dimensional Model Representation (RS-HDMR) are presented. A comparison shows that Quasi Monte Carlo based HDMR (QRS-HDMR) significantly outperforms RS-HDMR. RS/QRS-HDMR based methods also show faster convergence than the Sobol method for sensitivity indices calculation. Numerical tests show that the developed methods for choosing optimal orders of polynomials and the number of sampled points are robust and efficient.
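The difference between the two sampling schemes can be sketched on a toy integral. The example below is not the HDMR method itself: it only contrasts pseudo-random points with a Halton low-discrepancy sequence (swapped in here as a simple stand-in for a quasi Monte Carlo generator) when estimating the mean of a hypothetical test function over the unit square.

```python
import random

def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton_points(n, bases=(2, 3)):
    """First n points of the Halton sequence (index starts at 1)."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, n + 1)]

def random_points(n, dim=2, seed=0):
    """n pseudo-random points in the unit hypercube, reproducible via seed."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(dim)) for _ in range(n)]

def estimate_mean(func, points):
    """Sample mean of func over the given points."""
    return sum(func(*p) for p in points) / len(points)

def product(x, y):
    return x * y  # exact integral over the unit square is 0.25

n = 1024
qmc_error = abs(estimate_mean(product, halton_points(n)) - 0.25)
mc_error = abs(estimate_mean(product, random_points(n)) - 0.25)
```

For smooth integrands such as this one, the low-discrepancy estimate typically has the smaller error at the same sample size, although a single pseudo-random run can occasionally come close by chance.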

Archive | 2007

Geodesic Distance Based Fuzzy Clustering

Balazs Feil; János Abonyi

Clustering is a widely applied data-mining tool for detecting the hidden structure of complex multivariate datasets. Clustering solves two kinds of problems simultaneously: it partitions the dataset into clusters of objects that are similar to each other, and it describes the clusters by cluster prototypes to provide some information about the distribution of the data. In most cases these cluster prototypes describe the clusters as simple geometrical objects, such as spheres, ellipsoids, lines, or linear subspaces, and the cluster prototype defines a special distance function. Unfortunately, in most cases the user has no prior knowledge about the number of clusters, nor about the proper shape of the prototypes. The real distribution of data is generally much more complex than these simple geometrical objects, and the number of clusters depends much more on how well the chosen cluster prototypes fit the distribution of data than on the real groups within the data. This is especially true when the clusters are used for local linear modeling purposes.

Soft Computing | 2006

Visualization of fuzzy clusters by fuzzy Sammon mapping projection: application to the analysis of phase space trajectories

Balazs Feil; Balazs Balasko; János Abonyi

Since high-dimensional data are clustered in practical data mining problems, the resulting clusters are high-dimensional geometrical objects, which are difficult to analyze and interpret. Cluster validity measures try to solve this problem by providing a single numerical value. As a low-dimensional graphical representation of the clusters could be much more informative than such a single value, this paper proposes a new tool for the visualization of fuzzy clustering results. By using the basic properties of fuzzy clustering algorithms, this new tool maps the cluster centers and the data such that the distances between the clusters and the data points are preserved. During the iterative mapping process, the algorithm uses the membership values of the data and minimizes an objective function similar to that of the original clustering algorithm. Compared with the original Sammon mapping, not only are reliable cluster shapes obtained, but the numerical complexity of the algorithm is also drastically reduced. The developed tool has been applied to the visualization of reconstructed phase space trajectories of chaotic systems. The case study demonstrates that the proposed FUZZSAMM algorithm is a useful tool in user-guided clustering.
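The kind of objective described above can be sketched as a membership-weighted mismatch between point-to-center distances in the original space and in the projection. The exact form used by FUZZSAMM is not given in the abstract, so the weighting with exponent `m`, the function name, and the toy data below are assumptions made for illustration.

```python
import math

def fuzzy_sammon_stress(data, centers, memberships, data_2d, centers_2d, m=2.0):
    """Membership-weighted squared mismatch of point-to-center distances."""
    stress = 0.0
    for k, (x, y) in enumerate(zip(data, data_2d)):
        for i, (v, z) in enumerate(zip(centers, centers_2d)):
            d_high = math.dist(x, v)  # distance in the original space
            d_low = math.dist(y, z)   # distance in the projection
            stress += memberships[k][i] ** m * (d_high - d_low) ** 2
    return stress

# Toy 3-D data lying in the plane z = 0, so dropping z preserves all distances.
data = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
centers = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
memberships = [[0.9, 0.1], [0.5, 0.5], [0.3, 0.7]]

exact_data = [p[:2] for p in data]
exact_centers = [c[:2] for c in centers]
shrunk_data = [(0.5 * a, 0.5 * b) for a, b in exact_data]
shrunk_centers = [(0.5 * a, 0.5 * b) for a, b in exact_centers]

stress_exact = fuzzy_sammon_stress(data, centers, memberships, exact_data, exact_centers)
stress_shrunk = fuzzy_sammon_stress(data, centers, memberships, shrunk_data, shrunk_centers)
```

Only N x c point-to-center distances are evaluated here, instead of the N(N-1)/2 pairwise distances of the classical Sammon stress, which is consistent with the reduced complexity the abstract reports.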

Computers & Chemical Engineering | 2005

Monitoring process transitions by Kalman filtering and time-series segmentation

Balazs Feil; János Abonyi; Sándor Németh; Peter Arva

The analysis of historical process data of technological systems plays an important role in process monitoring, modelling and control. Time-series segmentation algorithms are often used to detect homogeneous periods of operation based on input–output process data. However, historical process data alone may not be sufficient for the monitoring of complex processes. This paper incorporates the first-principle model of the process into the segmentation algorithm. The key idea is to use a model-based non-linear state-estimation algorithm to detect the changes in the correlation among the state-variables. The homogeneity of the time-series segments is measured using a PCA similarity factor calculated from the covariance matrices given by the state-estimation algorithm. The whole approach is applied to the monitoring of an industrial high-density polyethylene plant.
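A PCA similarity factor of the Krzanowski type compares two principal subspaces through the squared cosines of the angles between their basis vectors. The sketch below assumes that form and assumes the orthonormal principal directions have already been extracted from the covariance matrices; the function name and the example bases are hypothetical.

```python
import math

def pca_similarity(basis_a, basis_b):
    """Average squared cosine between two sets of orthonormal basis vectors.

    Returns 1.0 for identical subspaces and 0.0 for orthogonal ones.
    """
    p = len(basis_a)
    total = 0.0
    for u in basis_a:
        for v in basis_b:
            dot = sum(ui * vi for ui, vi in zip(u, v))
            total += dot ** 2
    return total / p

# Two-dimensional subspace of a three-variable process, compared with itself.
plane_xy = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
same = pca_similarity(plane_xy, plane_xy)

# One-dimensional subspaces: orthogonal directions, then a 45-degree rotation.
orthogonal = pca_similarity([(1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0)])
s = math.sqrt(0.5)
rotated = pca_similarity([(1.0, 0.0)], [(s, s)])
```

A value that drops well below 1 between consecutive windows would indicate that the correlation structure among the variables has changed, which is the kind of event a segmentation algorithm looks for.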

International Conference on Artificial Intelligence and Soft Computing | 2004

Semi-mechanistic models for state-estimation - Soft sensor for polymer melt index prediction

Balazs Feil; János Abonyi; Peter Pach; Sándor Németh; Peter Arva; Miklos Nemeth; Gábor Nagy

Nonlinear state estimation is a useful approach to the monitoring of industrial (polymerization) processes. This paper investigates how this approach can be applied to the development of a soft sensor of the product quality (melt index). The bottleneck in the successful application of advanced state estimation algorithms is the identification of models that can accurately describe the process. This paper presents a semi-mechanistic modeling approach where neural networks describe the unknown phenomena of the system that cannot be formulated by differential equations based on prior knowledge. Since in the presented semi-mechanistic model structure the neural network is a part of a nonlinear algebraic-differential equation set, there are no direct input-output data available to train the weights of the network. To handle this problem, a simple yet practically useful spline-smoothing based technique has been used. The results show that the developed semi-mechanistic model can be efficiently used for on-line state estimation.

Systems, Man and Cybernetics | 2006

Process-data-warehousing-based operator support system for complex production technologies

Ferenc Peter Pach; Balazs Feil; Sándor Németh; Peter Arva; János Abonyi

Process manufacturing is increasingly being driven by market forces, customer needs, and perceptions, resulting in more and more complex multiproduct manufacturing technologies. The increasing automation and tighter quality constraints related to these processes make the operator's job more and more difficult. This makes decision support systems (DSSs) for the operator more important than ever before. A traditional operator support system (OSS) focuses only on specific tasks that are performed. In the case of complex processes, the design of an integrated information system is extremely important. The proposed data-warehouse-based OSS makes it possible to link complex and isolated production units based on the integration of the heterogeneous information collected from the production units of a complex production process. The developed OSS is based on a data warehouse designed by following the proposed focus-on-process data-warehouse-design approach, which means a stronger focus on the material and information flow through the entire enterprise. The resulting OSS follows the process through the organization instead of focusing on separate tasks of the isolated process units. For human-computer interaction, front-end tools have been worked out, where exploratory data analysis and advanced multivariate statistical models are applied to extract the most informative features of the operation of the technology. The concept is illustrated by an industrial case study, where the OSS is designed for the monitoring and control of a high-density polyethylene (HDPE) plant.

Systems, Man and Cybernetics | 2004

State-space reconstruction and prediction of chaotic time series based on fuzzy clustering

János Abonyi; Balazs Feil; Sándor Németh; R. Arva; R. Babuska

Selecting the embedding dimension of a dynamic system is a key step toward the analysis and prediction of nonlinear and chaotic time-series. This paper proposes a clustering-based algorithm for this purpose. The clustering is applied in the reconstructed space defined by the lagged output variables. The intrinsic dimension of the reconstructed space is then estimated based on the analysis of the eigenvalues of the fuzzy cluster covariance matrices, while the correct embedding dimension is inferred from the prediction performance of the local models of the clusters. The main advantage of the proposed solution is that three tasks are simultaneously solved during clustering: selection of the embedding dimension, estimation of the intrinsic dimension, and identification of a model that can be used for prediction.
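The reconstructed space defined by lagged output variables can be built with a simple delay embedding. The sketch below covers only that preprocessing step, not the clustering or dimension estimation; the function name, the lag convention, and the toy series are assumptions.

```python
def delay_embed(series, dim, lag=1):
    """Rows [y(t), y(t-lag), ..., y(t-(dim-1)*lag)] for every valid t."""
    start = (dim - 1) * lag  # first index with a full history available
    return [
        tuple(series[t - j * lag] for j in range(dim))
        for t in range(start, len(series))
    ]

series = [0, 1, 2, 3, 4, 5]
embedded = delay_embed(series, dim=3, lag=2)
```

Each row of `embedded` is one point of the reconstructed space; repeating the embedding for increasing `dim` yields the candidate spaces in which a clustering-based method could then be compared.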

Archive | 2003

Determining the Model Order of Nonlinear Input-Output Systems by Fuzzy Clustering

Balazs Feil; János Abonyi; Ferenc Szeifert

Selecting the order of an input-output model of a dynamical system is a key step toward the goal of system identification. By determining the smallest regression vector dimension that allows accurate prediction of the output, the false nearest neighbors algorithm (FNN) is a useful tool for linear and also for nonlinear systems. The one parameter that needs to be determined before performing FNN is the threshold constant that is used to compute the percentage of false neighbors. For this purpose heuristic rules can be followed. However, for nonlinear systems choosing a suitable threshold is extremely important, and the optimal choice of this parameter depends on the system. While this advanced FNN uses nonlinear input-output data based models, the computational effort of the method increases along with the number of data and the dimension of the model. To increase the efficiency of the method, this paper proposes the application of a fuzzy clustering algorithm. The advantage of the generated solution is that it remains in the horizon of the data, hence there is no need to apply nonlinear model identification tools. The efficiency of the algorithm is supported by a data-driven identification of a polymerization reactor.
