Round numbers are always false.
The very use of the word "data" (from the Latin "given") suggests that those who receive data for analysis do so in the spirit of a gift. To question the veracity of data is then to look the gift horse in the mouth. It may be wise, however, to have at least the level of suspicion that the Trojans were advised to have when confronted with their gift horse.
The emphasis in statistical textbooks and training is on analysing the data, assuming they are genuine. Checking the data is sometimes mentioned,1,2 although this is directed mainly at the possibility of accidental errors rather than at deliberate falsification. Altman notes that "It is the large errors that can influence statistical analyses."2 Large accidental errors clearly affect the analysis, but data that have been altered or invented will have been manipulated in a way that attempts to conceal their false nature, so the features of such false data will not be the same as those of ordinary errors. In spite of this, careful use of the best procedures for data checking, with some simple extensions, will go a long way towards detecting many instances of fraud.

This chapter will outline routine methods for checking data that will help to correct accidental errors as well as to detect the problems that are the target of this book. Accidental errors in data are (one hopes) more frequent than deliberate ones, so effort expended in checking data will result in better-quality reports of medical science even in the absence of attempts to cheat.
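As a minimal illustration of why routine checks catch large accidental errors but not carefully made-up values, the Python sketch below applies a simple range check. The variable names, plausible limits and records are hypothetical, chosen only for illustration; real limits would come from the study protocol or clinical knowledge.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical plausible limits, for illustration only.
PLAUSIBLE_LIMITS: Dict[str, Tuple[float, float]] = {
    "systolic_bp": (60.0, 260.0),  # mmHg
    "weight_kg": (30.0, 250.0),    # kg
}


def range_check(
    records: List[Dict[str, float]],
    limits: Dict[str, Tuple[float, float]],
) -> List[Tuple[int, str, float]]:
    """Return (record index, variable, value) for every value outside its limits."""
    flagged: List[Tuple[int, str, float]] = []
    for i, record in enumerate(records):
        for var, (low, high) in limits.items():
            value: Optional[float] = record.get(var)
            # Missing values are skipped here; a fuller check would report them too.
            if value is not None and not (low <= value <= high):
                flagged.append((i, var, value))
    return flagged


# Hypothetical records.
records = [
    {"systolic_bp": 128.0, "weight_kg": 72.5},
    {"systolic_bp": 1350.0, "weight_kg": 80.0},  # likely a keying error (extra digit)
    {"systolic_bp": 120.0, "weight_kg": 70.0},   # plausible, so passes even if invented
]

print(range_check(records, PLAUSIBLE_LIMITS))  # [(1, 'systolic_bp', 1350.0)]
```

The keying error in the second record is flagged, while the third record passes whether it is genuine or invented. Checks designed for accidental errors therefore need extending before they can reveal deliberate falsification.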
The power of modern computer programs for statistical data analysis is a great help with this type of data checking, but it also makes fabrication of data easier. More emphasis needs to be given to this aspect of data analysis in the training of statisticians, since a perfect analysis of the wrong data can be much more dangerous than an imperfect analysis of correct data. It has been suggested that statisticians should not publish too much on methods for detecting fraud, since those wishing to pervert science without being caught would then learn how to avoid detection. Whilst this could be partly true, the vigilant statistician will always be able to invent new, as yet unpublished, methods to detect invented data.

*The author is grateful for comments from Dr P Lachenbruch in revising this chapter.