Jürgen Kleffe | Researchain

Publication

Featured research published by Jürgen Kleffe.

Bioinformatics | 1992

First and second moment of counts of words in random texts generated by Markov chains

Jürgen Kleffe; Mark Borodovsky

An exact expression for the variance of random frequency that a given word has in text generated by a Markov chain is presented. The result is applied to periodic Markov chains, which describe the protein-coding DNA sequences better than simple Markov chains. A new solution to the problem of word overlap is proposed. It was found that the expected frequency and overlapping properties determine most of the variance. The expectation and variance of counts for triplets are compared with experimental counts in Escherichia coli coding sequences.
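
The first moment mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simple (non-periodic) first-order Markov chain started from its stationary distribution and counts overlapping occurrences; the function names and the dictionary layout of the transition matrix are illustrative assumptions, not taken from the paper, and the exact variance expression is not reproduced here.

```python
from math import prod


def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution of P by power iteration.

    P is a nested dict: P[a][b] = probability of letter b following letter a.
    (Hypothetical helper; assumes the chain is ergodic.)
    """
    states = list(P)
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iters):
        pi = {t: sum(pi[s] * P[s][t] for s in states) for t in states}
    return pi


def expected_word_count(word, n, P, pi):
    """Expected number of (possibly overlapping) occurrences of `word`
    in a text of length n generated by the chain (P, pi).

    By linearity of expectation, this is the number of start positions
    times the probability that the word occurs at one fixed position.
    """
    m = len(word)
    if n < m:
        return 0.0
    p_word = pi[word[0]] * prod(P[a][b] for a, b in zip(word, word[1:]))
    return (n - m + 1) * p_word
```

Under a uniform model over A, C, G, T, for example, the triplet ATG has an expected count of 98/64 (about 1.53) in a text of 100 letters.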

Bioinformatics | 1990

Exact computation of pattern probabilities in random sequences generated by Markov chains

Jürgen Kleffe; Uwe Langbecker

Observed patterns in macromolecular sequences are often considered as words and compared with their probabilities of occurring in random sequences. Calculation of these probabilities, however, often lacks rigour. We have developed an algorithm for exact computation of such probabilities for stochastic sequences that follow a Markov chain model. The method is applicable to the case that a random sequence contains one out of two given patterns P and Q, or both simultaneously. Another application yields the probability function P(x) that a sequence contains pattern P exactly x times. An application to patterns that include wild-card characters yields probabilities for homonucleotide clusters of a given length. We prove the probability of multiple runs of single nucleotides in the SV40 genome to be in accordance with the dinucleotide composition of the sequence, although it is in conflict with mononucleotide composition.
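
One way to compute such a probability exactly is dynamic programming over a pattern-matching automaton. The sketch below is an illustration under stated assumptions, not the authors' algorithm: it handles a single pattern without wild cards, a first-order Markov chain, and only the event "the pattern occurs at least once". The names `kmp_automaton` and `prob_contains` are hypothetical.

```python
def kmp_automaton(pattern, alphabet):
    """delta[q][c] = length of the longest prefix of `pattern` that is a
    suffix of pattern[:q] + c (the classic string-matching automaton)."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for q in range(m + 1):
        for c in alphabet:
            s = pattern[:q] + c
            k = min(m, len(s))
            while k > 0 and s[-k:] != pattern[:k]:
                k -= 1
            delta[q][c] = k
    return delta


def prob_contains(pattern, n, P, init):
    """Exact probability that a length-n sequence contains `pattern`.

    init[c] is the probability of the first letter; P[a][b] is the
    probability of letter b following letter a. Assumes len(pattern) >= 1.
    """
    if n <= 0:
        return 0.0
    alphabet = list(init)
    m = len(pattern)
    delta = kmp_automaton(pattern, alphabet)
    # dist[(q, c)]: probability of being in automaton state q < m with
    # last letter c, the pattern not having occurred so far.
    dist = {}
    hit = 0.0  # probability mass absorbed after the first occurrence
    for c in alphabet:
        q = delta[0][c]
        if q == m:
            hit += init[c]
        else:
            dist[(q, c)] = dist.get((q, c), 0.0) + init[c]
    for _ in range(n - 1):
        nxt = {}
        for (q, a), p in dist.items():
            for c in alphabet:
                w = p * P[a][c]
                q2 = delta[q][c]
                if q2 == m:
                    hit += w
                else:
                    nxt[(q2, c)] = nxt.get((q2, c), 0.0) + w
        dist = nxt
    return hit
```

The last letter is tracked alongside the automaton state because state 0 alone does not determine which letter the Markov chain just emitted.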

German Conference on Bioinformatics | 1998

GeneGenerator: a flexible algorithm for gene prediction and its application to maize sequences

Jürgen Kleffe; Klaus Hermann; Wolfgang Vahrson; Burghardt Wittig; Volker Brendel

MOTIVATION: We developed GeneGenerator because of the need for a tool to predict gene structure without knowing in advance how to score potential exons and introns in order to obtain the best results, pertinent in particular to less well-studied organisms for which suitable training sets are small. GeneGenerator is a very flexible algorithm that, for a given genomic sequence, generates a number of feasible gene structures satisfying user-defined constraints. The specific implementation described in detail requires minimum scores for translation start and for donor and acceptor splice sites according to previously trained logitlinear models. In addition, potential exons and introns are required to exceed specified minimal lengths and threshold scores for coding or non-coding potential derived as log-likelihood ratios of appropriate Markov sequence models. RESULTS: A database of 46 non-redundant genomic sequences from maize is used for illustration. It is shown that the correct gene structures do not always maximize the considered target function. However, in most cases, the correct or nearly correct structures are found in a small set of high-scoring structures. A critical review of the generated structures sometimes allows the choices to be narrowed by considering additional variables such as predicted splice site strength or local optimality of splice site scores. Summary statistics for prediction accuracy over all 46 maize genes are derived under cross-validation and non-cross-validation training conditions for the Markov sequence models. The algorithm achieved exon sensitivity of 0.81 and specificity of 0.75 on an independent set of 14 novel maize genomic segments. AVAILABILITY: GeneGenerator runs under Borland-Pascal 7.0 using MS-DOS and C on UNIX workstations. The source code is available upon request. CONTACT: [email protected]
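
The coding-potential score described in this abstract, a log-likelihood ratio of Markov sequence models, can be sketched as follows. This is a simplified illustration that assumes two first-order models with strictly positive transition probabilities; the models in GeneGenerator itself may differ in order and structure, and the function name is hypothetical.

```python
from math import log


def log_likelihood_ratio(seq, coding, noncoding):
    """Log-likelihood ratio of `seq` under two first-order Markov models.

    coding[a][b] and noncoding[a][b] give the probability of letter b
    following letter a under each model (assumed strictly positive).
    Positive scores favour the coding model, negative the non-coding one.
    The initial-letter term is omitted for simplicity.
    """
    return sum(
        log(coding[a][b] / noncoding[a][b])
        for a, b in zip(seq, seq[1:])
    )
```

A candidate exon would then be kept only if its score exceeds a chosen threshold, in the spirit of the constraint filtering the abstract describes.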

Bioinformatics | 1996

Object-oriented sequence analysis: SCL - a C++ class library

Wolfgang Vahrson; Klaus Hermann; Jürgen Kleffe; Burghardt Wittig

SCL (Sequence Class Library) is a class library written in the C++ programming language. Designed using object-oriented programming principles, SCL consists of classes of objects performing tasks typically needed for analyzing DNA or protein sequences. Among them are very flexible sequence classes, classes accessing databases in various formats, classes managing collections of sequences, as well as classes performing higher-level tasks like calculating a pairwise sequence alignment. SCL also includes classes that provide general programming support, like a dynamically growing array, sets, matrices, strings, classes performing file input/output, and utilities for error handling. By providing these components, SCL fosters an explorative programming style: experimenting with algorithms and alternative implementations is encouraged rather than punished. A description of SCL's overall structure as well as an overview of its classes is given. Important aspects of the work with SCL are discussed in the context of a sample program.

Statistics | 1973

Principal components of random variables with values in a separable Hilbert space

Jürgen Kleffe

We extend the theory of principal components to random variables X with values in a separable Hilbert space and prove optimal properties well known for finite-dimensional spaces. Further, we give an estimate of the variance operator D from a series of independent observations and prove the strong consistency of the estimate for nuclear D. In the case that D has single eigenvalues, we find that with probability 1 the limiting properties of the sample principal components computed from such an estimate of D are those of the exact principal components.

Statistics | 1998

Minimum Norm Estimation Under Parameter Constraints with an Application to Insurance

Jürgen Kleffe; Ragnar Norberg

A Minimum Norm Quadratic Estimator is developed for situations where some fixed effects and variance components coincide. The study is motivated by a semiparametric latent variable model frequently encountered in insurance, where a Poisson assumption induces identity between the mean and the within-unit variance. The method is applied to authentic group life insurance data, and its performance is also illustrated by simulations.

Communications in Statistics-theory and Methods | 2001

Minimum norm estimation of variance components for life insurance data

Jürgen Kleffe; Ragnar Norberg

Life insurance companies want to predict the average claimed sums they have to pay in events of death for specific groups of customers in order to derive group-specific premiums. This requires estimation of the variability of claims across groups. We derive a corresponding mixed linear model for claim data from many groups of customers that incorporates group-specific age distributions, the Gompertz-Makeham mortality function and an unknown group-specific random hazard factor. It takes the form of a generalized replicated model with two variance components where the between-blocks variance component depends on the common mean of all observations. Two methods of parameter estimation are derived along the lines of C. R. Rao's MINQUE and generalized least squares estimation. Simulations show both methods to work well for large sets of data.

Bioinformatics | 1995

DNASTAT: a Pascal unit for the statistical analysis of DNA and protein sequences

Jürgen Kleffe; Klaus Hermann; W. Gunia; Wolfgang Vahrson; Burghardt Wittig

DNASTAT is a collection of Pascal routines for researchers who develop their own application programs for statistical analysis of DNA and protein sequences. Dynamic and file-based data structures allow users to process sets of sequences by simple loop control without limitations on the number of sequences and their individual sizes. This frees the programmer from potentially error-prone tasks like dynamic memory allocation and controlling array sizes. Sequences can be stored in databases along with biological and statistical attributes. Individual sequences can be accessed by column name and row number as with spread-sheets. DNASTAT allows large sets of sequences to be processed using a PC with standard configuration. Its small size, simplicity and free availability make it attractive to students of mathematical biology. Use of DNASTAT is illustrated by two sample programs that generate a database of coding regions from the GenBank entry of the tobacco chloroplast genome. A version of DNASTAT written in ANSI-C for PCs and Unix workstations is also available.

Bioinformatics | 1993

The joint distribution of patterns in random sequences with application to the RC-measure for expressivity

Jürgen Kleffe; E. Grau

A method was previously developed for computation of pattern probabilities in random sequences under Markov chain models. We extend this method to the calculation of the joint distribution for two patterns. An application yields the distribution of the right choice measure for expressivity and how significance bounds depend on sequence length. These bounds are used to show that the choice of pyrimidine in codon position 3 of Escherichia coli genes deviates considerably from a general Markov process model for coding regions. We also derive some statistical evidence that this significant deviation is limited to codon position 3.

Nucleic Acids Research | 1998