In biological and especially in medical investigation there is usually considerable variation in any phenomenon studied. A major contribution of statistical methodology is in quantifying variability and in distinguishing random variation from genuine effects. Statistical significance tests (or confidence intervals) are one way of classifying genuine as opposed to spurious effects. The object of most fraud is to demonstrate a "statistically significant" effect that the genuine data would not show.
Most medical research looks for evidence of some effect, such as the effect of a new treatment for a disease. The effect will be shown by differences between a treated and a control group. In a statistical significance test (or confidence interval), there are three components that are combined to show the strength of the evidence:
• the magnitude of the effect
• the variability of individuals in the study
• the number of individuals studied.
Evidence for the existence of an effect is strongest when the first and third are as large as possible and the second is as small as possible. Most fraud intended to demonstrate an effect consists of reducing variability and increasing the number of individuals.
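How the three components combine can be sketched with a two-sample t statistic. This is only an illustration under simplifying assumptions that are not in the text: two groups of equal size with a common standard deviation. The function name `two_sample_t` and all the numbers are hypothetical.

```python
import math


def two_sample_t(mean_diff, sd, n_per_group):
    """t statistic for two equal-sized groups sharing one SD (an assumption).

    mean_diff   -- magnitude of the effect (difference between group means)
    sd          -- variability of individuals (common standard deviation)
    n_per_group -- number of individuals studied in each group
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return mean_diff / standard_error


# Invented illustrative values, not data from any real study.
baseline = two_sample_t(mean_diff=5.0, sd=10.0, n_per_group=20)
less_variable = two_sample_t(mean_diff=5.0, sd=5.0, n_per_group=20)
more_patients = two_sample_t(mean_diff=5.0, sd=10.0, n_per_group=80)

print(round(baseline, 2))       # 1.58
print(round(less_variable, 2))  # 3.16
print(round(more_patients, 2))  # 3.16
```

With the effect held fixed, halving the variability or quadrupling the number of individuals each doubles the statistic, which is why invented data that are too uniform or too numerous can manufacture "significance".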
In some senses the opposite problem occurs when a study is designed to show the similarity of two treatments. These "equivalence trials", or often in practice, "non-inferiority trials", are particularly vulnerable to invented data. These types of trial have an increasing place in the development of drugs by the pharmaceutical industry. There are ethical pressures to avoid the use of placebos, but the consequence is that introducing a lot of "noise" can make treatments look similar. Increase in individual variability leads to failing to find statistically significant differences, especially if the mean is similar in the two groups, as happens when "noise" is introduced. Invented data can make treatments appear similar, and hence a new product could be licensed on the basis that it is similar to an older product for which there is evidence of efficacy. This can have adverse consequences for public health. Adverse effects could also appear to be similar between treatments if data are invented.
The large amount of variability in genuine data not only makes the use of statistical methods important in medical research, but also tends to be hidden when summary statistics of data are presented. Whenever summaries of data are presented, they should be accompanied by some measure of the variability of the data.
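A minimal sketch of why a summary needs a measure of variability: the two hypothetical sets of readings below share the same mean but differ widely in spread. The values and the helper name `summarize` are invented for illustration.

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


# Invented illustrative readings, not data from any real study.
tight = [118, 120, 122, 119, 121]
spread = [100, 140, 120, 95, 145]

tight_mean, tight_sd = summarize(tight)
spread_mean, spread_sd = summarize(spread)

print(tight_mean, round(tight_sd, 2))
print(spread_mean, round(spread_sd, 2))
```

Reporting the mean alone would make the two sets look identical; the standard deviation is what separates them.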
Wherever measurement involves human judgment, even in reading data from an instrument, the data show special features. One of these features is "digit preference". This is well known in blood pressure, where clinical measurements tend to be recorded to the nearest 5 mmHg or 10 mmHg. Where it is recorded to 5 mmHg, it is still more usual to find values with a last digit of 0 than of 5. In research, measurements may be made to the nearest 2 mmHg.
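Digit preference can be examined by tallying the last digit of each reading. The sketch below also computes a chi-square statistic against equal use of all ten digits; that test is one common choice, not a method stated in the text. The readings and the function names are hypothetical.

```python
from collections import Counter


def terminal_digit_counts(readings):
    """Count how often each last digit 0-9 occurs in whole-number readings."""
    counts = Counter(int(r) % 10 for r in readings)
    return [counts.get(digit, 0) for digit in range(10)]


def chi_square_uniform(counts):
    """Chi-square statistic against equal expected counts in every cell."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


# Invented blood pressure readings (mmHg): mostly ending in 0, some in 5.
readings = [120] * 60 + [125] * 25 + [122, 124, 126, 128, 121] * 3

counts = terminal_digit_counts(readings)
statistic = chi_square_uniform(counts)

# 16.92 is the 5% critical value of chi-square with 9 degrees of freedom.
shows_preference = statistic > 16.92
print(counts, round(statistic, 1), shows_preference)
```

A large statistic only shows that the last digits are unevenly used. Because genuine clinical readings show this pattern, it is its absence, or an unusual form of it, that may deserve a closer look.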
Another example is in recording babies' birthweights. Figure 14.1 shows an example of data from a large national study done in 1958. In this diagram the weights are shown to the nearest ounce, and it is clear that certain values are much preferred to others. Whole pounds are the most frequent, with half and quarter pounds and two ounces being progressively less "popular", while very few weights are recorded to the nearest ounce. In data measured in the metric system, the preferences are for values recorded to the nearest 500, 250 or 100 grams.
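The same kind of check can be sketched for weights recorded in pounds and ounces, by classifying each weight according to the division of a pound it falls on. The category names, the example weights and the per-value adjustment are assumptions made for illustration; they are not taken from the 1958 study.

```python
from collections import Counter

# Number of distinct ounce values (0-15) that fall in each category.
VALUES_PER_POUND = {
    "whole pound": 1,    # 0 oz
    "half pound": 1,     # 8 oz
    "quarter pound": 2,  # 4, 12 oz
    "two ounces": 4,     # 2, 6, 10, 14 oz
    "single ounce": 8,   # odd ounces
}


def ounce_category(weight_oz):
    """Classify a weight in whole ounces by the coarsest division it sits on."""
    ounces = weight_oz % 16
    if ounces == 0:
        return "whole pound"
    if ounces == 8:
        return "half pound"
    if ounces % 4 == 0:
        return "quarter pound"
    if ounces % 2 == 0:
        return "two ounces"
    return "single ounce"


# Invented birthweights in ounces (112 oz = 7 lb), for illustration only.
weights_oz = [112, 112, 120, 128, 116, 112, 124, 114, 128, 120, 113, 112]

tally = Counter(ounce_category(w) for w in weights_oz)

# Divide by the number of possible values so categories are comparable.
per_value = {name: tally[name] / n for name, n in VALUES_PER_POUND.items()}
print(per_value)
```

Dividing by the number of possible values matters because there are eight odd-ounce values in every pound but only one whole-pound value, so raw counts alone would understate the preference.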
[Figure 14.1 Value preference in recording babies' birthweights. Horizontal axis: birthweight in pounds (5 to 14), subdivided into divisions of a pound. Source: National child development survey.]
Errors of measurement and of recording occur in all genuine data, and some individuals show an idiosyncratic response. These factors result in "outliers" occurring in the data. When they are genuine values they should be included in the analysis, but when there is good external evidence that they are errors (for example, age = 142 years) they should be corrected or eliminated. Most effort in data checking is directed towards these values as noted above.
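A range check of the kind used to catch impossible values such as an age of 142 years might look like the sketch below. The plausibility limits (0 to 120 years), the record layout and the function name are assumptions chosen for illustration. Flagged records are returned for review rather than deleted, since a value should be corrected or eliminated only when there is good external evidence that it is an error.

```python
def flag_implausible(records, field, low, high):
    """Return the records whose value for `field` lies outside [low, high]."""
    return [r for r in records if not (low <= r[field] <= high)]


# Invented records; the limits 0-120 years are an assumed plausibility range.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 142},
    {"id": 3, "age": 67},
]

flagged = flag_implausible(records, "age", low=0, high=120)
print([r["id"] for r in flagged])  # [2]
```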