## Variability

Medicine, by one definition, is applied biological science, and biological systems are characterised by variability. Three major types of variability are recognised:

• within individual,

• between individuals,

If the serum sodium concentration is measured in the same individual on different occasions it will be different on each occasion, although lying within a fairly narrow range. The same phenomenon is found with many other parameters. Similarly, if the same parameter is measured in different individuals on the same occasion, a marked variation will be noted. Height is an obvious example. The existence of such variability means that care must be exercised to ensure that differences between readings before and after treatment or between different treatment groups are not just a result of natural variation.

Where variability exists, the individuals within the sample can be arranged into a frequency distribution, showing the number of observations at different values or within certain ranges. When the sample is large enough, the frequency distribution will approach that of the population and will form a smooth curve with high frequencies at the centre of the distribution (peak) and low frequencies at the ends (lower and upper tails). Many biological variables are described by a symmetrical, bell-shaped distribution, known as a normal or Gaussian distribution (Fig. 10.1), but other distributions are also found. Many of the commonly used techniques of statistical analysis are based on the assumption that the data being tested are normally distributed. If this is not the case, the deductions made from the statistical analysis may be incorrect. Checks for normality of data should be made before using these statistical tests, or else alternative, non-parametric tests should be used. Non-parametric tests are less sensitive but more robust.
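As a minimal sketch of how observations are grouped into a frequency distribution, the following counts values within fixed ranges. The height values and the 5 cm range width are invented for illustration and do not come from the text.

```python
from collections import Counter

# Hypothetical heights in cm (made-up values, for illustration only).
heights_cm = [158, 162, 165, 167, 168, 170, 171, 172, 174, 175, 178, 183]

# Assumed width of each range; any sensible width could be chosen.
bin_width = 5

# Map each height to the lower edge of its range, then count per range.
counts = Counter((h // bin_width) * bin_width for h in heights_cm)

# Print a simple text histogram, one row per range.
for lower in sorted(counts):
    print(f"{lower}-{lower + bin_width - 1} cm: {'#' * counts[lower]}")
```

With a sample this small the outline is ragged; the smooth curve described above only emerges as the number of observations grows.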

Quantitative data can be summarised by two measurements: one indicating the average value and the other the spread of the values. The commonest measure of the average value is the arithmetic mean, which is usually referred to simply as the mean. This is calculated as the sum of the observations divided by the number of observations. The median is the value that divides the distribution in half, so that half of the observations lie below it and half above. When the distribution is skewed, the median is a better estimate of the average value than the mean. When the distribution is normal, the two values coincide. The mode is the value which occurs most frequently, but it is seldom used.
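The difference between these three measures can be sketched with Python's standard `statistics` module. The data are invented for illustration: a small, skewed set of hypothetical lengths of hospital stay in which one long stay pulls the mean upwards.

```python
import statistics

# Hypothetical lengths of stay in days (made-up values, for illustration only).
# The single long stay of 40 days skews the distribution to the right.
stay_days = [2, 3, 3, 4, 5, 6, 40]

mean_stay = statistics.mean(stay_days)      # sum of observations / number of observations
median_stay = statistics.median(stay_days)  # middle value of the sorted observations
mode_stay = statistics.mode(stay_days)      # most frequently occurring value

print("mean:", mean_stay)
print("median:", median_stay)
print("mode:", mode_stay)
```

In this sketch the mean (9 days) exceeds all but one of the observations, whereas the median (4 days) is closer to a typical stay, which is why the median is preferred for skewed data.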

The simplest indicator of variability is the range, which is the difference between the highest and lowest values. It does not give any indication of how the observations are arranged between the extremes, and a single extreme reading can lead to a very large range. It is better to express variation in terms of deviation from the mean. This value, the variance, is calculated as the average of the squares of the differences of each observation from the mean. The variance is useful in statistical theory, but for understanding the relevance of the observations (in terms of the units of measurement) it is customary to take the square root of the variance, which is known as the standard deviation. For a normal distribution, approximately 68% of the results lie within 1 standard deviation of the mean and approximately 95% lie within 2 standard deviations.

What has been measured is the mean and standard deviation of the sample. These will almost certainly differ from the mean and standard deviation of the population from which the sample is drawn, and ultimately it is the results for the population which actually matter. If a sufficient number of samples is taken, a frequency distribution of sample means will be obtained. The mean of this distribution would be the population mean, and the standard deviation of this distribution is the standard error of the mean, which gives an estimate of how close the sample mean is to the population mean. The standard error of the mean can be calculated without taking multiple samples and is equal to the standard deviation divided by the square root of the number of observations. The standard error of the mean therefore takes into account the size of the sample as well as the variability.
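The calculations described above can be sketched as follows. The serum sodium readings are invented for illustration. One assumption goes beyond the text: when the standard deviation is estimated from a sample, the sum of squared deviations is conventionally divided by n − 1 rather than n, and that convention is used here for the standard deviation and the standard error.

```python
import math
import statistics

# Hypothetical serum sodium readings in mmol/L (made-up values, for illustration only).
sodium = [138, 140, 141, 139, 142, 140, 137, 143]

n = len(sodium)
mean = statistics.mean(sodium)

# Range: difference between the highest and lowest values.
value_range = max(sodium) - min(sodium)

# Variance as the plain average of squared deviations from the mean (divisor n).
variance_n = statistics.pvariance(sodium)

# Assumption: sample variance with divisor n - 1, the usual choice for a sample.
variance_sample = statistics.variance(sodium)

# Standard deviation: square root of the variance, in the original units.
sd = math.sqrt(variance_sample)

# Standard error of the mean: standard deviation / square root of n.
sem = sd / math.sqrt(n)

print(f"mean={mean}, range={value_range}, SD={sd:.2f}, SEM={sem:.2f}")
```

Because the standard error divides by the square root of n, quadrupling the number of observations halves it, provided the standard deviation stays the same.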

It is possible to calculate a range of likely values for the true population mean based on the sample mean and the standard error of the mean. This is usually expressed as the 95% confidence interval and is now regarded as the preferred way of expressing results.
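A minimal sketch of one common way to obtain such an interval is shown below. It assumes the normal approximation, in which the 95% confidence interval is the sample mean plus or minus about 1.96 standard errors; for small samples a multiplier from the t distribution is generally more appropriate. The function name `confidence_interval_95` and the data are hypothetical.

```python
import math
import statistics

def confidence_interval_95(values):
    """Return (lower, upper) limits of an approximate 95% confidence interval
    for the population mean, using the normal approximation (an assumption)."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation / square root of n.
    sem = statistics.stdev(values) / math.sqrt(n)
    # Multiplier that leaves 2.5% in each tail of a standard normal (about 1.96).
    z = statistics.NormalDist().inv_cdf(0.975)
    return mean - z * sem, mean + z * sem

# Hypothetical serum sodium readings in mmol/L (made-up values, for illustration only).
sodium = [138, 140, 141, 139, 142, 140, 137, 143]
lower, upper = confidence_interval_95(sodium)
print(f"95% CI: {lower:.1f} to {upper:.1f} mmol/L")
```

For these invented readings the interval runs from roughly 138.6 to 141.4 mmol/L around a sample mean of 140 mmol/L; a larger sample with the same variability would give a narrower interval.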
