Markus Hegland | Researchain

Publication

Featured research published by Markus Hegland.

Knowledge Discovery and Data Mining | 2004

Febrl – A Parallel Open Source Data Linkage System

Peter Christen; Tim Churches; Markus Hegland

In many data mining projects, information from multiple data sources needs to be integrated, combined or linked in order to allow more detailed analysis. The aim of such linkages is to merge all records relating to the same entity, such as a patient or a customer. Most of the time the linkage process is challenged by the lack of a common unique entity identifier, and thus becomes non-trivial. Linking today's large data collections becomes increasingly difficult using traditional linkage techniques. In this paper we present an innovative data linkage system called Febrl, which includes a new probabilistic approach for improved data cleaning and standardisation, innovative indexing methods, a parallelisation approach which is implemented transparently to the user, and a data set generator which allows the random creation of records containing names and addresses. Implemented as open source software, Febrl is an ideal experimental platform for new linkage algorithms and techniques.

Mathematics of Computation | 1999

For numerical differentiation, dimensionality can be a blessing!

R. S. Anderssen; Markus Hegland

Finite difference methods, such as the mid-point rule, have been applied successfully to the numerical solution of ordinary and partial differential equations. If such formulas are applied to observational data in order to determine derivatives, the results can be disastrous. The reason for this is that measurement errors, and even rounding errors in computer approximations, are strongly amplified in the differentiation process, especially if small step-sizes are chosen and higher derivatives are required. A number of authors have examined the use of various forms of averaging which allow the stable computation of low order derivatives from observational data. The size of the averaging set acts like a regularization parameter and has to be chosen as a function of the grid size h. In this paper, it is initially shown how first (and higher) order single-variate numerical differentiation of higher dimensional observational data can be stabilized with a smaller loss of accuracy than occurs for the corresponding differentiation of one-dimensional data. The result is then extended to the multivariate differentiation of higher dimensional data. The nature of the trade-off between convergence and stability is explicitly characterized, and the complexity of various implementations is examined.

Applicable Analysis | 1995

Variable Hilbert scales and their interpolation inequalities with applications to Tikhonov regularization

Markus Hegland

Variable Hilbert scales are constructed using the spectral theory of self-adjoint operators in Hilbert spaces. An embedding and an interpolation theorem (based on Jensen's inequality) are proved. They generalize known results about “ordinary” Hilbert scales derived by Natterer [Applic. Anal., 18 (1984), pp. 29-37]. Bounds on best possible and actual errors for regularization methods are obtained by applying the interpolation inequality. These bounds extend the standard ones, and, in particular, include exponential and logarithmic error laws. Similar results were established earlier by Hegland [SIAM J. Numer. Anal., 29 (1992), pp. 1446-1461] for compact operators only. Here, they are generalized to include unbounded operators. A detailed discussion of Tikhonov regularization by Nair et al. [Tech. Rep. MR8-94, CMA, Aust. Nat. Uni., 1994] indicates that parameter choice strategies, which were thought to be suboptimal, can give substantially higher convergence rates than the so-called optimal choices! This im...

International Journal of High Speed Computing | 2000

Parallel Performance of Fast Wavelet Transforms

Ole Nielsen; Markus Hegland

We present a parallel 2D wavelet transform algorithm with modest communication requirements. Data are transmitted between nearest neighbors only and the amount is independent of the problem size as well as the number of processors. An analysis of the theoretical performance shows that the algorithm is scalable approaching perfect speedup as the problem size is increased. This performance is realized in practice on the IBM SP2 as well as on the Fujitsu VPP300 where it will form part of the Scientific Software Library.

SIAM Journal on Numerical Analysis | 1992

An optimal order regularization method which does not use additional smoothness assumptions

Markus Hegland

This paper defines an optimal method to reconstruct the solutions of operator equations of the first kind. Only the case of compact operators is considered. The method is in principle a discrepancy method. It does not require any additional knowledge about the solution and is optimal for all standard smoothness assumptions. In order to analyze the properties of the new regularization method variable Hilbert scales are introduced and several well-known results for Hilbert scales are generalized. Convergence theorems for classes of optimal and suboptimal methods are derived from a generalization of the interpolation inequality.
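As a rough illustration of what a discrepancy method does, the sketch below applies the classical Morozov discrepancy principle to Tikhonov regularization. This is a textbook technique swapped in for illustration, not the method defined in the paper. The diagonal test operator, the noise level, the safety factor `tau`, and all function names are assumptions made for the example.

```python
import math
import random


def tikhonov(singular_values, data, alpha):
    """Tikhonov solution for a diagonal operator with the given singular values."""
    return [s * y / (s * s + alpha) for s, y in zip(singular_values, data)]


def residual_norm(singular_values, data, alpha):
    """Norm of A x_alpha - y; each entry has size alpha * |y| / (s^2 + alpha)."""
    return math.sqrt(sum((alpha * y / (s * s + alpha)) ** 2
                         for s, y in zip(singular_values, data)))


def discrepancy_alpha(singular_values, data, delta, tau=1.1, iterations=200):
    """Bisect on a log scale for the alpha whose residual equals tau * delta."""
    low, high = 1.0e-16, 1.0e4
    for _ in range(iterations):
        mid = math.sqrt(low * high)
        # The residual grows monotonically with alpha, so bisection applies.
        if residual_norm(singular_values, data, mid) > tau * delta:
            high = mid
        else:
            low = mid
    return math.sqrt(low * high)


def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical test problem (an assumption, not taken from the paper).
rng = random.Random(0)
n = 50
singular_values = [1.0 / k ** 2 for k in range(1, n + 1)]
x_true = [1.0 / k for k in range(1, n + 1)]
noise = [rng.gauss(0.0, 1.0e-3) for _ in range(n)]
data = [s * x + e for s, x, e in zip(singular_values, x_true, noise)]
delta = math.sqrt(sum(e * e for e in noise))  # noise level, assumed known

tau = 1.1
alpha = discrepancy_alpha(singular_values, data, delta, tau)
x_reg = tikhonov(singular_values, data, alpha)
x_naive = [y / s for s, y in zip(singular_values, data)]

reg_error = distance(x_reg, x_true)
naive_error = distance(x_naive, x_true)
print(f"alpha = {alpha:.3e}, regularized error = {reg_error:.3f}, "
      f"unregularized error = {naive_error:.3f}")
```

The parameter is chosen from the data and the noise level alone, with no smoothness information about the solution supplied to the selection rule.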

Bulletin of The Australian Mathematical Society | 1998

A stable finite difference ansatz for higher order differentiation of non-exact data

Bob Anderssen; Frank de Hoog; Markus Hegland

If standard central difference formulas are used to compute second or third order derivatives from measured data, even quite precise data can lead to totally unusable results due to the basic instability of the differentiation process. Here an averaging procedure is presented and analysed which allows the stable computation of low order derivatives from measured data. The new method first averages the data, then samples the averages and finally applies standard difference formulas. The size of the averaging set acts like a regularization parameter and has to be chosen as a function of the grid size h.
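The average-sample-difference idea can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' scheme: it uses non-overlapping block averages, a hypothetical test function sin(x), uniform noise of size 1e-4, and an arbitrarily chosen block size `m`.

```python
import math
import random


def central_second_difference(values, h):
    """Standard central difference for the second derivative at interior points."""
    return [
        (values[i - 1] - 2.0 * values[i] + values[i + 1]) / (h * h)
        for i in range(1, len(values) - 1)
    ]


def averaged_second_difference(values, h, m):
    """Average blocks of m consecutive samples, then difference the block means.

    The block means live on a coarser grid with spacing m * h, so the block
    size m plays the role of the regularization parameter.
    """
    n_blocks = len(values) // m
    means = [sum(values[k * m:(k + 1) * m]) / m for k in range(n_blocks)]
    return central_second_difference(means, m * h)


def max_error(estimates, exact):
    return max(abs(a - b) for a, b in zip(estimates, exact))


# Hypothetical noisy measurements of sin(x) on a fine grid (assumed setup).
rng = random.Random(0)
h = 1.0e-3
n = 6000
noise_level = 1.0e-4
xs = [i * h for i in range(n)]
noisy = [math.sin(x) + rng.uniform(-noise_level, noise_level) for x in xs]

# Naive differencing divides the noise by h**2 and amplifies it enormously.
naive = central_second_difference(noisy, h)
naive_err = max_error(naive, [-math.sin(x) for x in xs[1:-1]])

# Averaged differencing: block k covers indices k*m .. (k+1)*m - 1.
m = 200
averaged = averaged_second_difference(noisy, h, m)
centers = [(k * m + (m - 1) / 2.0) * h for k in range(n // m)]
avg_err = max_error(averaged, [-math.sin(c) for c in centers[1:-1]])

print(f"naive max error:    {naive_err:.3g}")
print(f"averaged max error: {avg_err:.3g}")
```

Larger blocks suppress more noise but coarsen the effective grid, which is the trade-off that ties the choice of the averaging set to the grid size h.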

Computing | 2009

Fitting multidimensional data using gradient penalties and the sparse grid combination technique

Jochen Garcke; Markus Hegland

Sparse grids, combined with gradient penalties, provide an attractive tool for regularised least squares fitting. It has earlier been found that the combination technique, which builds a sparse grid function using a linear combination of approximations on partial grids, is here not as effective as it is in the case of elliptic partial differential equations. We argue that this is due to the irregular and random data distribution, as well as the proportion of the number of data to the grid resolution. These effects are investigated both in theory and experiments. As part of this investigation we also show how overfitting arises when the mesh size goes to zero. We conclude with a study of modified “optimal” combination coefficients which prevent the amplification of the sampling noise that occurs when the original combination coefficients are used.

Acta Numerica | 2001

Data mining techniques

Markus Hegland

Methods for knowledge discovery in data bases (KDD) have been studied for more than a decade. New methods are required owing to the size and complexity of data collections in administration, business and science. They include procedures for data query and extraction, for data cleaning, data analysis, and methods of knowledge representation. The part of KDD dealing with the analysis of the data has been termed data mining. Common data mining tasks include the induction of association rules, the discovery of functional relationships (classification and regression) and the exploration of groups of similar data objects in clustering. This review provides a discussion of and pointers to efficient algorithms for the common data mining tasks in a mathematical framework. Because of the size and complexity of the data sets, efficient algorithms and often crude approximations play an important role.

SIAM Journal on Numerical Analysis | 2003

Approximation of a Thin Plate Spline Smoother Using Continuous Piecewise Polynomial Functions

Stephen Roberts; Markus Hegland; Irfan Altas

A new smoothing method is proposed which can be viewed as a finite element thin plate spline. This approach combines the favorable properties of finite element surface fitting with those of thin plate splines. The method is based on first order techniques similar to mixed finite element techniques for the biharmonic equation. The existence of a solution to our smoothing problem is demonstrated, and the approximation theory for uniformly spread data is presented in the case of both exact and noisy data. This convergence analysis seems to be the first for a discrete smoothing spline with data perturbed by white noise. Numerical results are presented which verify our theoretical results and demonstrate our method on a large real life data set.

Intelligent Data Analysis | 2003

A Logical Formalisation of the Fellegi-Holt Method of Data Cleaning

Agnes Boskovitz; Rajeev Goré; Markus Hegland

The Fellegi-Holt method automatically “corrects” data that fail some predefined requirements. Computer implementations of the method were used in many national statistics agencies but are less used now because they are slow. We recast the method in propositional logic, and show that many of its results are well-known results in propositional logic. In particular we show that the Fellegi-Holt method of “edit generation” is essentially the same as a technique for automating logical deduction called resolution. Since modern implementations of resolution are capable of handling large problems efficiently, they might lead to more efficient implementations of the Fellegi-Holt method.
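Propositional resolution, the deduction technique the abstract refers to, can be sketched as follows. This is a generic saturation loop, not the paper's formalisation of edit generation. The clause encoding (literals as strings, `~` for negation) and the example rules about `child`, `married` and `has_spouse` are hypothetical.

```python
from itertools import combinations


def negate(literal):
    """Flip a literal: 'p' <-> '~p'."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolve(clause_a, clause_b):
    """Return all non-tautological resolvents of two clauses (frozensets of literals)."""
    resolvents = set()
    for literal in clause_a:
        if negate(literal) in clause_b:
            resolvent = (clause_a - {literal}) | (clause_b - {negate(literal)})
            # Skip tautologies such as {p, ~p, ...}; they carry no information.
            if not any(negate(lit) in resolvent for lit in resolvent):
                resolvents.add(frozenset(resolvent))
    return resolvents


def saturate(clauses):
    """Add resolvents until no new clause can be derived."""
    known = set(clauses)
    while True:
        new = set()
        for a, b in combinations(known, 2):
            new |= resolve(a, b) - known
        if not new:
            return known
        known |= new


# Hypothetical consistency rules written as clauses:
#   a child is not married; anyone with a spouse is married.
rules = {
    frozenset({"~child", "~married"}),
    frozenset({"married", "~has_spouse"}),
}
derived = saturate(rules)
for clause in sorted(derived, key=sorted):
    print(" | ".join(sorted(clause)))
```

Resolving the two rules on `married` yields the implied rule that a child has no spouse, which was not stated explicitly. Deriving implied rules of this kind is the role the abstract attributes to edit generation.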
