Károly Héberger
Hungarian Academy of Sciences
Publication
Featured research published by Károly Héberger.
Journal of Cheminformatics | 2015
Dávid Bajusz; Anita Rácz; Károly Héberger
Background: Cheminformaticians are equipped with a very rich toolbox when carrying out molecular similarity calculations. A large number of molecular representations exist, and there are several methods (similarity and distance metrics) to quantify the similarity of molecular representations. In this work, eight well-known similarity/distance metrics are compared on a large dataset of molecular fingerprints with sum of ranking differences (SRD) and ANOVA analysis. The effects of molecular size, selection methods and data pretreatment methods on the outcome of the comparison are also assessed.
Results: A supplier database (https://mcule.com/) was used as the source of compounds for the similarity calculations in this study. A large number of datasets, each consisting of one hundred compounds, were compiled, molecular fingerprints were generated and similarity values between a randomly chosen reference compound and the rest were calculated for each dataset. Similarity metrics were compared based on their ranking of the compounds within one experiment (one dataset) using sum of ranking differences (SRD), while the results of the entire set of experiments were summarized on box and whisker plots. Finally, the effects of various factors (data pretreatment, molecule size, selection method) were evaluated with analysis of variance (ANOVA).
Conclusions: This study complements previous efforts to examine and rank various metrics for molecular similarity calculations. Here, however, an entirely general approach was taken to neglect any a priori knowledge on the compounds involved, as well as any bias introduced by examining only one or a few specific scenarios. The Tanimoto index, Dice index, Cosine coefficient and Soergel distance were identified to be the best (and in some sense equivalent) metrics for similarity calculations, i.e. these metrics could produce the rankings closest to the composite (average) ranking of the eight metrics. The similarity metrics derived from Euclidean and Manhattan distances are not recommended on their own, although their variability and diversity from other similarity metrics might be advantageous in certain cases (e.g. for data fusion). Conclusions are also drawn regarding the effects of molecule size, selection method and data pretreatment on the ranking behavior of the studied metrics.
Graphical abstract: A visual summary of the comparison of similarity metrics with sum of ranking differences (SRD).
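The four metrics named in the conclusions can be illustrated for binary fingerprints. The sketch below is not the authors' code: it assumes each fingerprint is stored as a Python set of "on" bit indices, uses the standard binary forms of the formulas, and the function names and example bit sets are hypothetical.

```python
import math

# Assumption: a fingerprint is a non-empty set of "on" bit indices.
# a = bits set in A, b = bits set in B, c = bits set in both.

def _counts(fp_a, fp_b):
    return len(fp_a), len(fp_b), len(fp_a & fp_b)

def tanimoto(fp_a, fp_b):
    a, b, c = _counts(fp_a, fp_b)
    return c / (a + b - c)

def dice(fp_a, fp_b):
    a, b, c = _counts(fp_a, fp_b)
    return 2 * c / (a + b)

def cosine(fp_a, fp_b):
    a, b, c = _counts(fp_a, fp_b)
    return c / math.sqrt(a * b)

def soergel(fp_a, fp_b):
    # A distance, not a similarity; for binary data it equals 1 - Tanimoto.
    a, b, c = _counts(fp_a, fp_b)
    return (a + b - 2 * c) / (a + b - c)

# Hypothetical fingerprints: 4 and 6 bits set, 2 bits in common.
reference = {1, 2, 3, 4}
candidate = {3, 4, 5, 6, 7, 8}

print(tanimoto(reference, candidate))
print(dice(reference, candidate))
print(cosine(reference, candidate))
print(soergel(reference, candidate))
```

On binary data Dice is a monotonic function of Tanimoto and Soergel is its complement, so these three always rank compounds identically, which is consistent with the "in some sense equivalent" remark in the abstract.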
Journal of Chemometrics | 2011
Károly Héberger; Klára Kollár-Hunek
This paper describes the theoretical background, algorithm and validation of a recently developed novel method of ranking based on the sum of ranking differences [TrAC Trends Anal. Chem. 2010; 29: 101–109]. The ranking is intended to compare models, methods, analytical techniques, panel members, etc. and it is entirely general. First, the objects to be ranked are arranged in the rows and the variables (for example model results) in the columns of an input matrix. Then, the results of each model for each object are ranked in the order of increasing magnitude. The difference between the rank of the model results and the rank of the known, reference or standard results is then computed. (If the golden standard ranking is known the rank differences can be completed easily.) In the end, the absolute values of the differences are summed together for all models to be compared. The sum of ranking differences (SRD) arranges the models in a unique and unambiguous way. The closer the SRD value to zero (i.e. the closer the ranking to the golden standard), the better is the model. The proximity of SRD values shows similarity of the models, whereas large variation will imply dissimilarity. Generally, the average can be accepted as the golden standard in the absence of known or reference results, even if bias is also present in the model results in addition to random error. Validation of the SRD method can be carried out by using simulated random numbers for comparison (permutation test). A recursive algorithm calculates the discrete distribution for a small number of objects (n < 14), whereas the normal distribution is used as a reasonable approximation if the number of objects is large. The theoretical distribution is visualized for random numbers and can be used to identify SRD values for models that are far from being random. The ranking and validation procedures are called Sum of Ranking differences (SRD) and Comparison of Ranks by Random Numbers (CRNN), respectively. 
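The ranking steps described in this abstract can be sketched as follows. This is a minimal illustration rather than the published implementation: it assumes tied values receive the average of their ranks, uses the row average as the reference when none is supplied, omits the permutation-test validation, and the function names and example matrix are hypothetical.

```python
def rank_values(values):
    """Rank values in increasing order; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def srd(matrix, reference=None):
    """Sum of ranking differences for each column (model) of the matrix.

    Rows are objects, columns are models. If no reference column is
    given, the row average is used as the reference.
    """
    n_cols = len(matrix[0])
    if reference is None:
        reference = [sum(row) / n_cols for row in matrix]
    reference_ranks = rank_values(reference)
    result = []
    for c in range(n_cols):
        column_ranks = rank_values([row[c] for row in matrix])
        result.append(sum(abs(m - r) for m, r in zip(column_ranks, reference_ranks)))
    return result

# Hypothetical data: 4 objects (rows) scored by 3 models (columns).
# The third model orders the objects in exactly the reverse direction.
scores = [
    [1.0, 1.2, 4.0],
    [2.0, 1.9, 3.0],
    [3.0, 3.3, 2.0],
    [4.0, 3.9, 1.0],
]
print(srd(scores))
```

In this toy matrix the first two models reproduce the reference ranking (SRD of zero), while the reversed third model reaches 8, the largest value possible for four objects.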
Journal of Chromatography A | 2002
Annamaria Jakab; Károly Héberger; Esther Forgács
Different vegetable oil samples (almond, avocado, corngerm, grapeseed, linseed, olive, peanut, pumpkin seed, soybean, sunflower, walnut, wheatgerm) were analyzed using high-performance liquid chromatography-atmospheric pressure chemical ionization mass spectrometry. A gradient elution technique was applied using acetone-acetonitrile eluent systems on an ODS column (Purospher, RP-18e, 125 × 4 mm, 5 µm). Identification of triacylglycerols (TAGs) was based on the pseudomolecular ion [M+1]+ and the diacylglycerol fragments. The positional isomers of triacylglycerol were identified from the relative intensities of the [M-RCO2]+ fragments. Linear discriminant analysis (LDA) as a common multivariate mathematical-statistical calculation was successfully used to distinguish the oils based on their TAG composition. LDA showed that 97.6% of the samples were classified correctly.
Journal of Agricultural and Food Chemistry | 2009
Rosa M. Alonso-Salces; Francesca Serra; Fabiano Reniero; Károly Héberger
Green coffee beans of the two main commercial coffee varieties, Coffea arabica (Arabica) and Coffea canephora (Robusta), from the major growing regions of America, Africa, Asia, and Oceania were studied. The contents of chlorogenic acids, cinnamoyl amides, cinnamoyl glycosides, free phenolic acids, and methylxanthines of green coffee beans were analyzed by liquid chromatography coupled with UV spectrophotometry to determine their botanical and geographical origins. The analysis of caffeic acid, 3-feruloylquinic acid, 5-feruloylquinic acid, 4-feruloylquinic acid, 3,4-dicaffeoylquinic acid, 3-caffeoyl-5-feruloylquinic acid, 3-caffeoyl-4-feruloylquinic acid, 3-p-coumaroyl-4-caffeoylquinic acid, 3-caffeoyl-4-dimethoxycinnamoylquinic acid, 3-caffeoyl-5-dimethoxycinnamoylquinic acid, p-coumaroyl-N-tryptophan, feruloyl-N-tryptophan, caffeoyl-N-tryptophan, and caffeine enabled the unequivocal botanical characterization of green coffee beans. Moreover, some free phenolic acids and cinnamate conjugates of green coffee beans showed great potential as means for the geographical characterization of coffee. Thus, p-coumaroyl-N-tyrosine, caffeoyl-N-phenylalanine, caffeoyl-N-tyrosine, 3-dimethoxycinnamoyl-5-feruloylquinic acid, and dimethoxycinnamic acid were found to be characteristic markers for Ugandan Robusta green coffee beans. Multivariate data analysis of the phenolic and methylxanthine profiles provided preliminary results that allowed showing their potential for the determination of the geographical origin of green coffees. Linear discriminant analysis (LDA) and partial least-squares discriminant analysis (PLS-DA) provided classification models that correctly identified all authentic Robusta green coffee beans from Cameroon and Vietnam and 94% of those from Indonesia. 
Moreover, PLS-DA afforded independent models for Robusta samples from these three countries with sensitivities and specificities of classifications close to 100% and for Arabica samples from America and Africa with sensitivities of 86 and 70% and specificities to the other class of 90 and 97%, respectively.
Molecules | 2004
Orsolya Farkas; Judit Jakus; Károly Héberger
A quantitative structure-antioxidant activity relationship (QSAR) study of 36 flavonoids was performed using the partial least squares projection of latent structures (PLS) method. The chemical structures of the flavonoids have been characterized by constitutional descriptors, two-dimensional topological and connectivity indices. Our PLS model gave a proper description and a suitable prediction of the antioxidant activities of a diverse set of flavonoids having clustering tendency.
Journal of Agricultural and Food Chemistry | 2010
Rosa M. Alonso-Salces; José Manuel Moreno-Rojas; Margaret V. Holland; Fabiano Reniero; Claude Guillou; Károly Héberger
¹H NMR fingerprints of virgin olive oils (VOOs) from the Mediterranean basin (three harvests) were analyzed by principal component analysis, linear discriminant analysis (LDA), and partial least-squares discriminant analysis (PLS-DA) to determine their geographical origin at the national, regional, or PDO level. Further δ¹³C and δ²H measurements were performed by isotope ratio mass spectrometry (IRMS). LDA and PLS-DA achieved consistent results for the characterization of PDO Riviera Ligure VOOs. PLS-DA afforded the best model: for the Liguria class, 92% of the oils were correctly classified in the modeling step, and 88% of the oils were properly predicted in the external validation; for the non-Liguria class, 90 and 86% of hits were obtained, respectively. A stable and robust PLS-DA model was obtained to authenticate VOOs from Sicily: the recognition abilities were 98% for Sicilian oils and 89% for non-Sicilian ones, and the prediction abilities were 93 and 86%, respectively. More than 85% of the oils of both categories were properly predicted in the external validation. Greek and non-Greek VOOs were properly classified by PLS-DA: >90% of the samples were correctly predicted in the cross-validation and external validation. Stable isotopes provided complementary geographical information to the ¹H NMR fingerprints of the VOOs.
Journal of Chromatography A | 1999
Károly Héberger; Miklós Görgényi
Principal component analysis was performed on a data matrix consisting of Kovats indices of 35 aliphatic ketones and aldehydes. The calculations were carried out on the correlation matrices of Kovats indices. The Kovats indices were determined on capillary columns with four different stationary phases, namely bonded methyl- (HP-1), methylphenyl- (HP-50), and trifluoropropylmethylsiloxane (DB-210), as well as polyethylene glycol (HP-Innowax) at four different temperatures. It was found that one principal component accounts for more than 94% of the total variance in the data, indicating that the temperature does not change the dominant pattern in the data. The physical meaning attributable to the most influential principal components is as follows: the first principal component accounts for the boiling point (and/or the molecular mass) of carbonyl compounds, whereas the second is responsible for the temperature dependence. The plots of component loadings showed a characteristic pattern (counterclockwise increasing temperature), whereas that of component scores showed a triangular structure and some groupings of oxo compounds. Abstract retention data (free from the influence of temperature and column polarity) are non-linear functions of the boiling points of solutes. Similarity among the solutes from the point of view of retention is represented by characteristic plots.
Analytica Chimica Acta | 2001
Tamás Körtvélyesi; Miklós Görgényi; Károly Héberger
Kovats retention indices determined on four different capillary columns (OV-1, HP-50, DB-210 and HP-Innowax) were correlated with molecular structural parameters calculated by the semiempirical quantum-chemical method PM3. Multivariate techniques (principal component analysis and cluster analysis) were applied to extract the data structure. Multiple linear regression was carried out in a forward stepwise manner to select suitable variables for the model. Basic correlations were found between the retention indices on different columns and the molecular surface, the energies of the highest occupied and lowest unoccupied molecular orbitals (E_HOMO and E_LUMO), polarizability, and dipole moments. These correlations provide insights into the mechanism of chromatographic retention on a molecular level. They support the view that the more polar the stationary phase, the greater the effect of the polarity (polarizability and dipole moment) of solute molecules on the retention phenomena.
Food Chemistry | 2012
László Sipos; Zoltán Kovács; Virág Sági-Kiss; Tímea Csiki; Zoltán Kókai; András Fekete; Károly Héberger
Mineral, spring and tap water samples of different geographical origins (7 classes) were distinguished by various methods, such as sensory evaluation, electronic tongue measurement, inductively coupled plasma atomic emission spectroscopy and ion chromatography. Samples from the same geographical origin were correctly classified by chemical analysis and electronic tongue (100%), but only an 80% classification rate could be achieved by sensory evaluation. Different water brands (different brand names) from the same geographical origin did not show definite differences, as expected. A forward stepwise algorithm selected three chemical parameters, namely chloride (Cl⁻), sulphate (SO₄²⁻) and magnesium (Mg) content, and two electronic tongue sensor signals (ZZ and HA) to discriminate according to the geographical origins.
Journal of Chromatography A | 2002
Károly Héberger; Miklós Görgényi; Teresa Kowalska
The temperature dependence of the Kováts retention index (I) was measured for some aliphatic ketones and aldehydes on a poly(dimethyl siloxane) (HP-1) stationary phase. An interesting minimum (non-linearity) was observed for the I versus isothermal column temperature (T) relationships. A novel empirical model is proposed: I = A + B/T + C ln T, where A, B and C are equation constants and B/C = T_min. A detailed statistical analysis clearly shows the superiority of the extended model (i.e., the one containing the logarithm of the temperature, ln T) over the earlier established Antoine-type reciprocal equation. The minimum temperature (and the energy-like quantity RT_min, where R is the gas constant) changes in a systematic manner. The factors affecting the RT_min term are as follows: (i) this term decreases with increasing molecular mass of the respective oxo compounds; (ii) ketones have higher absolute values of RT_min than aldehydes; (iii) branching of the carbon chain lowers RT_min. This enthalpy term is unambiguously bound to the polarity of solutes.
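The relation between the model constants and the minimum temperature follows from setting the derivative dI/dT = -B/T² + C/T to zero, which gives T = B/C. The sketch below checks this numerically; the constants A, B and C are made-up illustrative values, not fitted results from the paper, and the function names are hypothetical.

```python
import math

def retention_index(temperature, a, b, c):
    """Extended model from the abstract: I = A + B/T + C ln T (T in kelvin)."""
    return a + b / temperature + c * math.log(temperature)

def minimum_temperature(b, c):
    """Stationary point of the model: dI/dT = -B/T**2 + C/T = 0 at T = B/C."""
    return b / c

# Hypothetical constants chosen only for illustration (B > 0 and C > 0,
# so the stationary point is a minimum).
A, B, C = 500.0, 4000.0, 10.0

t_min = minimum_temperature(B, C)
print(t_min)

# The model value at t_min is lower than at nearby temperatures.
for temperature in (t_min - 10.0, t_min, t_min + 10.0):
    print(temperature, retention_index(temperature, A, B, C))
```

The second derivative at T = B/C is B/T³, so the stationary point is a minimum only when B is positive; the constants above were picked to satisfy that condition.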