Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Rüdiger Mutz is active.

Publication


Featured research published by Rüdiger Mutz.


Journal of Informetrics | 2011

A multilevel meta-analysis of studies reporting correlations between the h index and 37 different h index variants

Lutz Bornmann; Rüdiger Mutz; Sven E. Hug; Hans-Dieter Daniel

This paper presents the first meta-analysis of studies that computed correlations between the h index and variants of the h index (such as the g index; 37 different variants in total) that have been proposed and discussed in the literature. A high correlation between the h index and its variants would indicate that the variants hardly provide information beyond the h index. This meta-analysis included 135 correlation coefficients from 32 studies. The studies were based on a total sample size of N = 9,005; on average, each study had a sample size of n = 257. The results of a three-level cross-classified mixed-effects meta-analysis show a high correlation between the h index and its variants: depending on the model, the mean correlation coefficient varies between .8 and .9. This means that most of the h index variants are redundant with the h index. The correlation coefficients nevertheless show statistically significant study-to-study variation. The lowest correlations with the h index are found for the MII and the m index; hence, these variants make a non-redundant contribution beyond the h index.
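
To make the two indices concrete, here is a minimal sketch (not the authors' code; the citation records are hypothetical) of how the h index and one variant, the g index, are computed from citation counts, and how a per-study correlation between such indices, of the kind pooled in this meta-analysis, could be obtained.

```python
# Minimal sketch (hypothetical data, not the authors' code): compute the h index
# and the g index from citation records, then correlate them across scientists.
import numpy as np

def h_index(citations):
    """h = largest h such that h papers have at least h citations each."""
    c = sorted(citations, reverse=True)
    return sum(1 for i, ci in enumerate(c, start=1) if ci >= i)

def g_index(citations):
    """g = largest g such that the top g papers have at least g^2 citations in total."""
    c = sorted(citations, reverse=True)
    total, g = 0, 0
    for i, ci in enumerate(c, start=1):
        total += ci
        if total >= i * i:
            g = i
    return g

# Hypothetical citation records for a few scientists
records = [
    [25, 18, 12, 9, 7, 4, 2, 1, 0],
    [40, 31, 15, 10, 3, 1],
    [8, 6, 5, 5, 2, 2, 1],
    [60, 22, 14, 11, 9, 6, 5, 0],
]

h = np.array([h_index(r) for r in records])
g = np.array([g_index(r) for r in records])

# A Pearson correlation like this, computed per study sample, is the kind of
# coefficient the meta-analysis aggregates across the 32 studies.
print(np.corrcoef(h, g)[0, 1])
```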


Journal of Informetrics | 2011

Further steps towards an ideal method of measuring citation performance: The avoidance of citation (ratio) averages in field-normalization

Lutz Bornmann; Rüdiger Mutz

For many years, the so-called crown indicator (CPP/FCSm) for field normalization of the Centre for Science and Technology Studies (CWTS, Leiden, The Netherlands) has been regarded as the standard in evaluative bibliometric practice in many different contexts. The publication of the paper by Opthof and Leydesdorff (2010) was a starting point in the field of evaluative bibliometrics for challenging this CWTS standard indicator for evaluative purposes. Meanwhile, the paper of Opthof and Leydesdorff (2010) has been extensively discussed (Van Raan, Van Leeuwen, Visser, Van Eck, & Waltman, 2010), e.g., during the Science and Technology Indicators conference 2010 in Leiden (Anon, 2010), and a new crown indicator was presented by CWTS: the mean normalized citation score (MNCS) (Waltman, van Eck, van Leeuwen, Visser, & van Raan, 2011). At heart, the previous and the new indicator differ as follows: whereas for the previous crown indicator the average citation rate over all papers in a publication set is calculated and then this citation rate is field-normalized, for the new crown indicator each paper's citation impact in a paper set is field-normalized before an average value (harmonic average) over the field-normalized impact values is calculated. Not only the previous but also the new crown indicator has the disadvantage of resting on the arithmetic average (Leydesdorff & Opthof, in press): the mean citation impact values calculated for different fields are arithmetic averages, and the crown indicators rest on arithmetic averages of ratios or ratios of arithmetic averages. A number of publications have pointed out that in the face of non-normally distributed citation data, the arithmetic mean value is not appropriate as a measure of central tendency. It can give a distorted picture of the kind of distribution (Bornmann, Mutz, Neuhaus, & Daniel, 2008), and it is "a rather crude statistic" (Joint Committee on Quantitative Assessment of Research, 2008, p. 2). As the distribution of citation counts is usually right-skewed, distributed according to a power law (Joint Committee on Quantitative Assessment of Research, 2008), arithmetic average citation rates mainly show where papers with high citation counts are to be found. According to Evidence Ltd (2007), "where bibliometric data must stand-alone, they should be treated as distributions and not as averages" (p. 10). What is more, the calculation of ratios runs into serious problems regarding the interpretation of citation impact as low or high (or excellent). The interpretation is more or less arbitrary without using reference distributions. For many years, the evaluation of the citation performance of research groups as far below (<0.5) or far above (>1.5) the international citation impact standard has been based on cut-off points developed from personal experience (see van Raan, 2005). In the following we present an extension of the proposals of Bornmann (2010) for an improved practice of field-normalized citation performance measurement. The extension is intended to calculate a single measure for the citation impact of a group of scientists that is not based on the arithmetic average but uses reference distributions. The measure allows, similar to the previous and new crown indicators, comparison of groups of scientists using one single number. The proposals of Bornmann (2010) are based on the calculation of percentiles. The use of percentiles for citation data is very advantageous because no assumptions have to be made as to the underlying distribution of citations (Boyack, 2004). With percentiles, each paper in a paper set of a research group can be field-normalized with a matching reference standard. To generate the reference standard for a paper in question, all papers published in the same year, with the same document type, and belonging to the same field (defined by a discipline-oriented database) are categorized into six percentile impact classes: 99th (top 1%), 95th, 90th, 75th, 50th, and <50th (bottom 50%), following the approach of the National Science Board (2010) (see also Bornmann, de Moya-Anegón, & Leydesdorff, 2010). Through the use of the citation limits that define these classes, the paper in question can be assigned to one of the six citation impact classes. This procedure is repeated for all papers published by a research group. (Using the percentile for each paper instead of the corresponding percentile impact class might be preferable, e.g., by yielding higher power, but this is a very cumbersome and expensive task for a bigger publication set.) Bornmann (2010) presents the results of an evaluative citation analysis calculated with fictitious bibliometric data for three research groups. Table 1 shows these data with some additional numbers. As the table reveals, the papers of the groups were categorized into the six percentile impact classes (99th: top 1%, 95th, 90th, 75th, 50th, and <50th: bottom 50%).
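
As an illustration of the percentile approach described above, the following is a minimal sketch under assumed inputs (a simulated reference set and a fictitious research group, not the paper's data): citation limits are derived from the reference papers, and each paper of the group is assigned to one of the six percentile impact classes.

```python
# Minimal sketch (assumptions, not the paper's code): assign papers to the six
# percentile impact classes (99th, 95th, 90th, 75th, 50th, <50th) using citation
# thresholds computed from a hypothetical reference set of papers published in
# the same year, with the same document type, in the same field.
import numpy as np

# Simulated reference set of citation counts (stand-in for a field/year/doc-type set)
reference_citations = np.random.default_rng(0).negative_binomial(1, 0.1, size=5000)

# Citation limits that define the classes (top 1%, top 5%, top 10%, top 25%, top 50%)
limits = {p: np.percentile(reference_citations, p) for p in (99, 95, 90, 75, 50)}

def impact_class(citations):
    """Return the percentile impact class label for a single paper."""
    for p in (99, 95, 90, 75, 50):
        if citations >= limits[p]:
            return f">= {p}th percentile"
    return "< 50th percentile (bottom 50%)"

# Papers of a fictitious research group
group_papers = [3, 12, 45, 0, 110, 7]
for c in group_papers:
    print(c, impact_class(c))
```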


Review of Educational Research | 2009

Gender Effects in the Peer Reviews of Grant Proposals: A Comprehensive Meta-Analysis Comparing Traditional and Multilevel Approaches

Herbert W. Marsh; Lutz Bornmann; Rüdiger Mutz; Hans-Dieter Daniel; Alison J O'Mara

Peer review is valued in higher education but is also widely criticized for potential biases, particularly gender bias. We evaluate gender differences in peer reviews of grant applications, extending Bornmann, Mutz, and Daniel's meta-analyses, which reported small gender differences in favor of men (d = .04) but substantial heterogeneity in effect sizes that compromised the robustness of their results. We contrast these findings with the most comprehensive single primary study (Marsh, Jayasinghe, and Bond), which found no gender differences for grant proposals. We juxtapose traditional (fixed- and random-effects) and multilevel models, demonstrating important advantages of the multilevel approach. Consistent with Marsh et al.'s primary study, there were no gender differences for the 40 (of 66) effect sizes from Bornmann et al. that were based on grant proposals. This lack of a gender effect for grant proposals was very robust, generalizing over country, discipline, and publication year.
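
For readers unfamiliar with the traditional approaches being contrasted here, this is a minimal sketch (with hypothetical effect sizes, not the study's data) of fixed-effect versus DerSimonian-Laird random-effects pooling of gender differences d; the multilevel models the paper advocates go beyond this.

```python
# Minimal sketch (hypothetical effect sizes, not the study's data): fixed-effect
# versus DerSimonian-Laird random-effects pooling of standardized differences d,
# i.e., the traditional approaches the paper contrasts with multilevel models.
import numpy as np

d = np.array([0.10, -0.02, 0.06, 0.01, 0.08])      # study effect sizes (hypothetical)
v = np.array([0.004, 0.010, 0.006, 0.003, 0.008])  # their sampling variances

# Fixed-effect model: inverse-variance weights
w_fe = 1.0 / v
d_fe = np.sum(w_fe * d) / np.sum(w_fe)

# Random-effects model: add the between-study variance tau^2 (DerSimonian-Laird)
Q = np.sum(w_fe * (d - d_fe) ** 2)
k = len(d)
c = np.sum(w_fe) - np.sum(w_fe ** 2) / np.sum(w_fe)
tau2 = max(0.0, (Q - (k - 1)) / c)
w_re = 1.0 / (v + tau2)
d_re = np.sum(w_re * d) / np.sum(w_re)

print(f"fixed-effect d = {d_fe:.3f}, random-effects d = {d_re:.3f}, tau^2 = {tau2:.4f}")
```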


PLOS ONE | 2010

A Reliability-Generalization Study of Journal Peer Reviews: A Multilevel Meta-Analysis of Inter-Rater Reliability and Its Determinants

Lutz Bornmann; Rüdiger Mutz; Hans-Dieter Daniel

Background: This paper presents the first meta-analysis of the inter-rater reliability (IRR) of journal peer reviews. IRR is defined as the extent to which two or more independent reviews of the same scientific document agree. Methodology/Principal Findings: Altogether, 70 reliability coefficients (Cohen's Kappa, intra-class correlation [ICC], and Pearson product-moment correlation [r]) from 48 studies were taken into account in the meta-analysis. The studies were based on a total of 19,443 manuscripts; on average, each study had a sample size of 311 manuscripts (minimum: 28, maximum: 1,983). The results of the meta-analysis confirmed the findings of the narrative literature reviews published to date: the level of IRR (mean ICC/r2 = .34, mean Cohen's Kappa = .17) was low. To explain the study-to-study variation of the IRR coefficients, meta-regression analyses were calculated using seven covariates. Two covariates emerged in the meta-regression analyses as statistically significant for attaining approximate homogeneity of the intra-class correlations: firstly, the more manuscripts a study is based on, the smaller the reported IRR coefficients are; secondly, if a study reported information on the rating system for reviewers, this was associated with a smaller IRR coefficient than if the information was not conveyed. Conclusions/Significance: Studies that report a high level of IRR are to be considered less credible than those with a low level of IRR. According to our meta-analysis, the IRR of peer assessments is quite limited and needs improvement (e.g., a reader system).
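
As a concrete illustration of one of the reliability coefficients entering this meta-analysis, here is a minimal sketch (hypothetical ratings, not data from any of the 48 studies) of Cohen's Kappa for two reviewers judging the same manuscripts.

```python
# Minimal sketch (hypothetical ratings): Cohen's Kappa for two reviewers'
# categorical decisions on the same manuscripts; coefficients like this one
# (alongside ICC and Pearson r) are the inputs to the meta-analysis.
import numpy as np

reviewer_a = np.array(["accept", "reject", "accept", "reject", "accept", "reject", "accept", "accept"])
reviewer_b = np.array(["accept", "reject", "reject", "reject", "accept", "accept", "accept", "reject"])

categories = np.unique(np.concatenate([reviewer_a, reviewer_b]))

# Observed agreement
p_o = np.mean(reviewer_a == reviewer_b)

# Chance agreement from each reviewer's marginal category proportions
p_e = sum(np.mean(reviewer_a == c) * np.mean(reviewer_b == c) for c in categories)

kappa = (p_o - p_e) / (1.0 - p_e)
print(f"observed agreement = {p_o:.2f}, chance agreement = {p_e:.2f}, kappa = {kappa:.2f}")
```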


Journal of Informetrics | 2010

The h index research output measurement: Two approaches to enhance its accuracy

Lutz Bornmann; Rüdiger Mutz; Hans-Dieter Daniel

The h index is a widely used indicator to quantify an individual's scientific research output. But it has been criticized for its insufficient accuracy, that is, the ability to discriminate reliably between meaningful amounts of research output. As a single measure it cannot capture the complete information on the citation distribution over a scientist's publication list. An extensive data set with bibliometric data on scientists working in the field of molecular biology is taken as an example to introduce two approaches providing additional information to the h index: (1) h2 lower, h2 center, and h2 upper are proposed, which allow quantification of three areas within a scientist's citation distribution: the low-impact area (h2 lower), the area captured by the h index (h2 center), and the area of publications with the highest visibility (h2 upper). (2) Given the existence of different areas in the citation distribution, the segmented regression model (sRM) is proposed as a method to statistically estimate the number of papers in a scientist's publication list with the highest visibility. However, such sRM values should be compared across individuals with great care.
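
The three areas can be illustrated with a small sketch; note that the exact normalization used here (shares of total citations) is our assumption for illustration and may differ from the paper's definition, and the citation counts are hypothetical.

```python
# Minimal sketch (hypothetical data; the normalization is our assumption): split a
# scientist's total citations into the area of papers outside the h core (h2 lower),
# the h x h square itself (h2 center), and the excess citations of the h-core
# papers (h2 upper), each expressed as a share of all citations.
import numpy as np

citations = np.array(sorted([45, 30, 22, 15, 12, 9, 6, 4, 2, 1, 0], reverse=True))

h = sum(1 for i, c in enumerate(citations, start=1) if c >= i)
total = citations.sum()

h2_center = h * h                              # the h x h square
h2_upper = citations[:h].sum() - h2_center     # citations of h-core papers beyond the square
h2_lower = citations[h:].sum()                 # citations of papers outside the h core

for name, area in [("h2 lower", h2_lower), ("h2 center", h2_center), ("h2 upper", h2_upper)]:
    print(f"{name}: {area} citations ({100 * area / total:.1f}% of all citations)")
```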


Journal of the Association for Information Science and Technology | 2013

Multilevel-Statistical Reformulation of Citation-Based University Rankings: The Leiden Ranking 2011/2012

Lutz Bornmann; Rüdiger Mutz; Hans-Dieter Daniel

Since the 1990s, with heightened competition and the strong growth of the international higher education market, an increasing number of rankings have been created that measure the scientific performance of institutions based on data. The Leiden Ranking 2011/2012 (LR) was published early in 2012. Starting from Goldstein and Spiegelhalter's (1996) recommendations for conducting quantitative comparisons among institutions, in this study we undertook a reformulation of the LR by means of multilevel regression models. First, with our models we replicated the ranking results; second, the reanalysis of the LR data showed that only 5% of the total variation in PPtop10% is attributable to differences between universities. Beyond that, about 80% of the variation between universities can be explained by differences among countries. If covariates are included in the model, the differences among most of the universities become meaningless. Our findings have implications for conducting university rankings in general and for the LR in particular. For example, with Goldstein-adjusted confidence intervals it is possible to interpret the significance of differences among universities meaningfully: rank differences among universities should be interpreted as meaningful only if their confidence intervals do not overlap.
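
The variance-partitioning idea behind the "only 5% between universities" result can be sketched roughly as follows, using simulated data and a simple one-way ANOVA-style estimator rather than the paper's multilevel regression models; all names and parameter values are hypothetical.

```python
# Rough sketch (simulated data, one-way ANOVA-style estimator, not the paper's
# multilevel regression): how much of the total variation in a PPtop10%-like
# indicator lies between universities versus within them.
import numpy as np

rng = np.random.default_rng(1)
n_universities, n_fields = 50, 20   # fields/subjects observed per university

# Simulate: small between-university spread, large within-university spread
uni_effects = rng.normal(0.0, 0.5, n_universities)                       # between-university SD
values = uni_effects[:, None] + rng.normal(0.0, 2.0, (n_universities, n_fields))

group_means = values.mean(axis=1)
ms_between = n_fields * group_means.var(ddof=1)
ms_within = values.var(axis=1, ddof=1).mean()

var_between = max(0.0, (ms_between - ms_within) / n_fields)
icc = var_between / (var_between + ms_within)
print(f"share of variation between universities: {100 * icc:.1f}%")
```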


Zeitschrift fur Psychologie | 2012

Does Gender Matter in Grant Peer Review?: An Empirical Investigation Using the Example of the Austrian Science Fund.

Rüdiger Mutz; Lutz Bornmann; Hans-Dieter Daniel

One of the most frequently voiced criticisms of the peer review process is gender bias. In this study we evaluated the grant peer review process (external reviewers' ratings and the board of trustees' final decision: approval or no approval for funding) at the Austrian Science Fund with respect to gender. The data consisted of 8,496 research proposals (census) across all disciplines from 1999 to 2009, which were rated on a scale from 1 to 100 (poor to excellent) by 18,357 external reviewers in 23,977 reviews. In line with the current state of research, we found that the final decision was not associated with the applicant's gender or with any correspondence between the gender of applicants and reviewers. However, the decisions on the grant applications showed a robust female reviewer salience effect: the approval probability decreases (by up to 10%) when there is parity or a majority of women in the group of reviewers. Our results confirm an overall gender null hypothesis for the peer review process of men's and women's grant applications, in contrast to claims that women's grants are systematically downrated.


PLOS ONE | 2012

Heterogeneity of Inter-Rater Reliabilities of Grant Peer Reviews and Its Determinants: A General Estimating Equations Approach

Rüdiger Mutz; Lutz Bornmann; Hans-Dieter Daniel

Background: One of the most important weaknesses of the peer review process is that different reviewers' ratings of the same grant proposal typically differ. Studies on the inter-rater reliability of peer reviews mostly report only average values across all submitted proposals, but inter-rater reliabilities can vary depending on the scientific discipline or the requested grant sum, for instance. Goal: Taking the Austrian Science Fund (FWF) as an example, we aimed to investigate empirically the heterogeneity of inter-rater reliabilities (intraclass correlation) and its determinants. Methods: The data consisted of N = 8,329 proposals with N = 23,414 overall ratings by reviewers, which were statistically analyzed using the generalized estimating equations (GEE) approach. Results: We found an overall intraclass correlation (ICC) of reviewers' ratings of ρ = .259 with a 95% confidence interval of [.249, .279]. In the humanities the ICCs were statistically significantly higher than in all other research areas except the technical sciences. The ICC in the biosciences deviated statistically significantly from the average ICC. Other factors (besides research area), such as the grant sum requested, had negligible influence on the ICC. Conclusions: Especially in the biosciences, the number of reviewers of each proposal should be increased so as to increase the ICC.
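
To illustrate how inter-rater reliabilities can differ between research areas, here is a minimal sketch with simulated ratings and a simple ANOVA-based ICC(1) estimator rather than the paper's GEE approach; the area names and parameter values are hypothetical.

```python
# Minimal sketch (simulated ratings, simple ANOVA-based ICC(1), not the paper's
# GEE analysis): inter-rater reliability can differ by research area, estimated
# here separately for two hypothetical areas with 3 reviews per proposal.
import numpy as np

rng = np.random.default_rng(2)

def simulate_ratings(n_proposals, sd_proposal, sd_reviewer, n_reviews=3):
    """Ratings = proposal quality + reviewer noise, on a 1-to-100-like scale."""
    quality = rng.normal(60, sd_proposal, n_proposals)
    return quality[:, None] + rng.normal(0, sd_reviewer, (n_proposals, n_reviews))

def icc_oneway(ratings):
    """ICC(1) from a one-way random-effects ANOVA on a proposals x reviews matrix."""
    n_proposals, k = ratings.shape
    ms_between = k * ratings.mean(axis=1).var(ddof=1)
    ms_within = ratings.var(axis=1, ddof=1).mean()
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical areas: reviewers agree more in the humanities-like area
ratings_humanities = simulate_ratings(500, sd_proposal=15, sd_reviewer=10)
ratings_biosciences = simulate_ratings(500, sd_proposal=8, sd_reviewer=14)

print(f"ICC humanities:  {icc_oneway(ratings_humanities):.2f}")
print(f"ICC biosciences: {icc_oneway(ratings_biosciences):.2f}")
```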


Journal of the Association for Information Science and Technology | 2013

Do universities or research institutions with a specific subject profile have an advantage or a disadvantage in institutional rankings?

Lutz Bornmann; Félix de Moya Anegón; Rüdiger Mutz

Using data compiled for the SCImago Institutions Ranking, we examine whether the subject area type to which an institution (university or research-focused institution) belongs (in terms of the fields researched) influences its ranking position. We used latent class analysis to categorize institutions based on their publications in certain subject areas. Even though this categorization does not relate directly to scientific performance, our results show that it exercises an important influence on the outcome of a performance measurement: certain subject area types of institutions have an advantage in ranking position when compared with others. This advantage manifests itself not only when performance is measured with an indicator that is not field-normalized but also with indicators that are field-normalized.


Research Evaluation | 2009

Are there really two types of h index variants? A validation study by using molecular life sciences data

Lutz Bornmann; Rüdiger Mutz; Hans-Dieter Daniel; Anna Ledin

Due to the disadvantages of the h index that have been identified since Hirsch's first publication of the index in 2005 (Hirsch, 2005), a number of variants intended to compensate for its weaknesses have been proposed. Bornmann et al. (2008a, 2009b) tested (1) whether the variants developed make an incremental contribution for evaluation purposes beyond the h index, (2) whether there is any need at all for the h index and its variants besides standard bibliometric measures, and (3) which of the h index and its variants best predict peer assessments of scientific performance. As all results of Bornmann et al. (2008a, 2009b) are based on bibliometric data on post-doctoral research fellowship recipients of the Boehringer Ingelheim Fonds, it is important to test whether the results can be validated using other data sets. Therefore, in this study we examined, using 693 applicants to the Long-Term Fellowship programme of the European Molecular Biology Organization, whether the results found by Bornmann et al. (2008a, 2009b) can be validated using another data set and further h index variants. All in all, the findings in this study validate all results on the h index and its variants reported in Bornmann et al. (2008a, 2009b).

Collaboration


Dive into Rüdiger Mutz's collaborations.

Top Co-Authors

Ute Seeling
University of Freiburg

Félix de Moya Anegón
Spanish National Research Council