Data are increasingly used to govern science. Research evaluations that were once bespoke and performed by peers are now routine and reliant on metrics.
The problem is that evaluation is now led by the data rather than by judgement. Metrics have proliferated: usually well intentioned, not always well informed, often ill applied. We risk damaging the system with the very tools designed to improve it, as evaluation is increasingly implemented by organizations without knowledge of, or advice on, good practice and interpretation.
Before 2000, there was the Science Citation Index on CD-ROM from the Institute for Scientific Information (ISI), used by experts for specialist analyses. In 2002, Thomson Reuters launched an integrated web platform, making the Web of Science database widely accessible.
Competing citation indices were created: Elsevier's Scopus (released in 2004) and Google Scholar (beta version released in 2004). Web-based tools to easily compare institutional research productivity and impact were introduced, such as InCites (using the Web of Science) and SciVal (using Scopus), as well as software to analyse individual citation profiles using Google Scholar (Publish or Perish, released in 2007).
In 2005, Jorge Hirsch, a physicist at the University of California, San Diego, proposed the h-index, popularizing citation counting for individual researchers. Interest in the journal impact factor grew steadily after 1995 (see ‘Impact-factor obsession’).
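The h-index has a deceptively simple definition: a researcher has index h if h of their papers have at least h citations each. A minimal sketch of the calculation (the function name and example citation counts are illustrative, not drawn from any real dataset):

```python
def h_index(citations):
    """Return the h-index: the largest h such that h papers
    have at least h citations each (Hirsch, 2005)."""
    # Rank papers by citation count, highest first, then find the
    # last 1-based rank whose citation count still meets that rank.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Five papers with 10, 8, 5, 4 and 3 citations: four papers have
# at least four citations each, so the h-index is 4.
print(h_index([10, 8, 5, 4, 3]))  # -> 4
```

Note that the index rewards a sustained body of cited work rather than a single highly cited paper: a researcher with one paper cited 1,000 times still has an h-index of 1.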
Recently, metrics related to social usage and online commenting have gained momentum – F1000Prime was founded in 2002, Mendeley in 2008, and Altmetric.com (supported by Macmillan Science & Education, which owns Nature Publishing Group) in 2011.
As scientists, social scientists and research administrators, we have watched with increasing alarm the pervasive misuse of indicators in the evaluation of scientific performance. The following are just a few of the many examples. Across the world, universities have become obsessed with their position in global rankings (such as the Shanghai Ranking and the Times Higher Education list), even though such lists are, in our view, based on inaccurate data and arbitrary indicators.
Some recruiters request h-index values for candidates. Many universities make promotion decisions based on threshold h-index values and the number of articles in ‘high-impact’ journals. Researchers’ CVs have become opportunities to make claims about these scores, especially in biomedicine. Everywhere, supervisors ask PhD students to publish in high-impact journals and obtain external funding before they are ready.
In Scandinavia and China, some universities allocate research funding or bonuses on the basis of a number: for example, by calculating individual impact scores to allocate ‘performance resources’, or by giving researchers a bonus for a publication in a journal with an impact factor higher than 15 (ref. 2).
In many cases, researchers and evaluators still make balanced decisions. Yet the misuse of research metrics has become too widespread to ignore.
We therefore present the Leiden Manifesto, named after the conference at which it crystallized (see http://sti2014.cwts.nl). Its ten principles are not news to scientometricians, although none of us would be able to recite them in their entirety because, until now, they have lacked codification. Luminaries of the field, such as Eugene Garfield (founder of the ISI), are on record stating some of these principles.
But they are not in the room when evaluators report back to university administrators who are not expert in the relevant methodology. Scientists searching for literature with which to contest an evaluation find the material scattered in what are, to them, obscure journals to which they lack access.
We offer this distillation of best practice in metrics-based research assessment so that researchers can hold evaluators to account, and evaluators can hold their indicators to account.
1) Quantitative evaluation should support qualitative, expert assessment. Quantitative metrics can challenge bias tendencies in peer review and facilitate deliberation. This should strengthen peer review, because making judgements about colleagues is difficult without a range of relevant information.
However, assessors must not be tempted to cede decision-making to the numbers. Indicators must not substitute for informed judgement. Everyone retains responsibility for their assessments.
2) Measure performance against the research missions of the institution, group or researcher. Program goals should be stated at the outset, and the indicators used to evaluate performance should be clearly related to those goals.