By not being able to differentiate between valid and phoney papers on the web, Scholar Metrics stands poised to become untrustworthy.

I spotted an interesting pre-print on the arXiv server last week about how Google Scholar's citation metrics can be manipulated, easily and maliciously, to inflate the H-index of researchers whose work is on the web.

As the paper describes, a group of researchers from the Universities of Granada and Navarra ran an experiment: they created false working papers that cited existing papers by other individuals, and uploaded them to a personal page on the University of Granada website.

When Google Scholar (GS) indexed the page a month later, on 12 May, 2012, it saw that many of the papers had cited a few articles, and consequently increased the citation counts of those articles and the H-indices of their authors.

The H-index is a tricky but very useful measure of an author's productivity and citation impact. To wit, if you have published 10 papers, and two of those papers have at least two citations each, and the remaining eight have no more than two citations each, then your H-index is 2.

In other words, if you ranked your papers in descending order of citations along the x-axis, and plotted the number of citations each paper has received on the y-axis, you'd see a curve going from top-left to bottom-right. The side-length of the largest square that can be fit under this curve is your H-index.
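The definition can be sketched in a few lines of Python. This is only an illustration of the standard H-index definition, not Google Scholar's actual code; the function name `h_index` and the sample citation counts are made up for the example.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited (the x-axis of the curve).
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the square of side `rank` still fits under the curve
        else:
            break      # the curve has dropped below the diagonal
    return h


# Hypothetical author with 10 papers: two have two citations each,
# and the remaining eight have no more than two.
counts = [2, 2, 1, 1, 0, 0, 0, 0, 0, 0]
print(h_index(counts))  # prints 2
```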

And now, by not being able to differentiate between valid and phoney papers, Google seems to have left itself open to a problem with damaging consequences for the online referencing community. For perspective: Here's a screenshot from the paper on the effect the experiment seems to have had on the "victim" researchers' work.

The H-index of each of the three authors has increased by at least 2 and at most 5. Consider Lopez-Cozar's case especially: While his H-index across all his papers has jumped by 2, his H-index for work since 2007 has jumped by 5. Among the three researchers, Lopez-Cozar is also the one to have received the most citations and, therefore, the biggest i10-index increase (by 22).
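To see why a handful of fake documents can move these numbers so much, here is a rough sketch under loudly stated assumptions: the citation counts below are invented (they are not the figures from the paper), and each fake document is assumed to cite every one of the victim's papers exactly once. The i10-index is the number of papers with at least 10 citations; the helper names are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)


def add_fake_citations(citations, fake_docs):
    """Assumption: every fake document cites every paper once."""
    return [c + fake_docs for c in citations]


# Invented citation record for a hypothetical "victim" author.
before = [12, 9, 8, 7, 6, 5, 5, 4, 2, 1]
after = add_fake_citations(before, fake_docs=6)

print("H-index:  ", h_index(before), "->", h_index(after))
print("i10-index:", i10_index(before), "->", i10_index(after))
```

With these made-up numbers, six fake documents lift the H-index from 5 to 8 and the i10-index from 1 to 8, because many papers sat just below the thresholds.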

That's alarming. Since the H-index is an important qualifying metric as well - one that could decide if you're tenured or not in colleges! - falsifying it by getting pseudo-papers to cite your work is a dangerous proposition.

Even worse, my H-index could be falsified by strangers at the other end of the world, while I end up getting the cane from my boss. Another possibility is editors engineering the citation counts of papers submitted to them in order to boost the rankings of the publications they control!

I think Google has fallen short in its assessment of its impact. While the search engine itself is envisioned as a data-mining tool, the moment it decided to evaluate papers, it should have moved into the arena with a rigid monitoring system in place. At the least, it could have allowed other researchers to "mark down" existing citations on GS so that the bibliometric numbers stay meaningful.

This solution, however, is easier said than done.

For the moment, no model envisioned to function purely online can replace the traditional peer-reviewing system. Note that this system also comprises a controlling authority between the process of citing and the acknowledgment of a citation. It is down this slippery slope that Google Scholar has slipped.