The statistical methods used to analyze SMART data, which make it possible to characterize a result as positive, weakly positive, negative, or inconclusive, were first presented by Frei and Wurgler (54).

Some biological aspects of the wing spot test must be pointed out before considering the statistical analysis. First, the number of target cells in the wing primordium is not precisely known. However, we know that during the larval and early pupal stages, cells of the wing primordia undergo approx 12 rounds of division, beginning with some 10-30 cells after embryogenesis and ending up with approx 30,000 cells when cell division ceases at the onset of metamorphosis (54). In chronic exposure experiments, the number of clones per wing divided by the number of cells contained in a wing provides an overall estimate of the clone induction frequency per cell and per cell division. Second, clone size reflects time of induction, according to the number of cell division cycles undergone between induction and metamorphosis. For continuous exposure, the expected clone size distribution in the ideal situation therefore corresponds to a geometric series with frequencies decreasing by a factor of 2 as clone size (measured in numbers of cells) increases by a factor of 2 (55).
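As an illustration of the ideal distribution described above, the following Python sketch lists the expected relative frequencies of the clone size classes when each class is half as frequent as the preceding one. The function name, the truncation at a fixed number of classes, and the normalization to a sum of 1 are assumptions made for the example and are not part of the published method.

```python
def expected_class_proportions(n_classes: int) -> list[float]:
    """Ideal proportions for clone size classes 1..n_classes.

    Class i covers clones of up to 2**(i - 1) cells; each class is assumed
    to be half as frequent as the one before it (geometric series).
    """
    if n_classes < 1:
        raise ValueError("need at least one size class")
    weights = [0.5 ** i for i in range(n_classes)]  # 1, 1/2, 1/4, ...
    total = sum(weights)
    return [w / total for w in weights]             # normalized to sum to 1


# Hypothetical example with five size classes (1, 2, 3-4, 5-8, 9-16 cells).
for size_class, share in enumerate(expected_class_proportions(5), start=1):
    print(f"class {size_class}: {share:.3f}")
```

Comparing an observed clone size distribution with these ideal proportions gives a rough impression of whether induction was spread evenly over the exposure period.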

In experiments designed to assess the mutagenicity of a given chemical, most often a treatment series is compared with a control series. One might like to decide whether the compound used in the treatment should be considered as mutagenic or nonmutagenic. The formulation of two alternative hypotheses allows one to distinguish among the possibilities of a positive, weakly positive, inconclusive, or negative result of an experiment. In the null hypothesis (H0), one assumes that there is no difference in the mutation frequency between the control and treated series. Rejection of the null hypothesis indicates that the treatment resulted in a statistically increased mutation frequency. The alternative hypothesis (HA) postulates a priori that the treatment results in an increased mutation frequency compared to the spontaneous frequency. This alternative hypothesis is rejected if the observed mutation frequency is significantly lower than the postulated increased frequency. Rejection indicates that the treatment did not produce the increase required to consider the compound as mutagenic. If neither of the two hypotheses is rejected, the results are considered inconclusive, as one cannot accept at the same time the two mutually exclusive hypotheses. In the practical application of the decision procedure, one defines a specific alternative hypothesis requiring that the mutation frequency in the treated series be m times that in the control series, which is then used together with the null hypothesis. It may happen in this case that both hypotheses have to be rejected. This would mean that the treatment is weakly mutagenic, but leads to a mutation frequency that is significantly lower than m times the control frequency (54).
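The decision procedure described above can be summarized in a few lines of Python. This is only a sketch: the function name and the boolean inputs are hypothetical, and the two test outcomes are assumed to have been obtained beforehand with one of the tests described in this section.

```python
def diagnose(reject_h0: bool, reject_ha: bool) -> str:
    """Map the outcomes of the tests against H0 and HA to a diagnosis."""
    if reject_h0 and reject_ha:
        # Increased over the control, but significantly below m times it.
        return "weakly positive"
    if reject_h0:
        return "positive"
    if reject_ha:
        # The increase required by HA was not reached.
        return "negative"
    return "inconclusive"


for h0 in (True, False):
    for ha in (True, False):
        print(f"reject H0: {h0!s:5}  reject HA: {ha!s:5}  ->  {diagnose(h0, ha)}")
```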

In the wing spot assay, it is customary to assess genotoxicity not only for the total number of spots recovered, but also to distinguish twin spots from single spots, because twin spots are uniquely produced by mitotic recombination, whereas single spots can be produced by various mechanisms.

To assess negative results, empirically chosen multiplication factors (m) were originally introduced for testing (4,54); these are m=2 for both total spots and small single spots, because of their high spontaneous frequencies, and m=5 for both the rare spontaneous large single spots and twin spots (56).

In order to minimize the chance of inconclusive results, the statistical tests should be made sufficiently powerful. This can be achieved by planning optimal experimental sample sizes. For an experiment with p<5%, and tested in both directions, we need a control sample of such a size as to make the expected average yield be 32.5 spots on all control flies together. This figure is independent of stocks and test systems and is determined exclusively by theoretical parameters (i.e., the significance level [p<5%] and the minimal risk [doubling effect]) we have opted for as well as the power we require (95% correct decisions). In the standard mwh/flr3 wing spot test with a spontaneous frequency regularly of approx 0.6 spots per fly, this corresponds to an optimal sample size of 55 flies (both wings analyzed) (56).

Determination of the optimal sample size depends on (1) the optimally sufficient number of spots expected in the control sample, which is a theoretical parameter, and (2) the mean frequency of spontaneous spots per individual, which is an empirical parameter. Although the former is independent of the particular strain or strain combinations used in experimentation, the latter is not. Therefore, working groups using the present method to find the optimal experimental sample size should base their sample size estimations on the specific spontaneous spot frequencies, which their strains or strain combinations show (56).
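As a rough sketch of this calculation, the optimal control sample size is the target number of control spots divided by the spontaneous frequency per fly, rounded up to whole flies. The function name and the rounding rule are assumptions made for the example; the value of 32.5 spots is the theoretical parameter quoted above.

```python
import math


def optimal_control_size(spots_per_fly: float, target_spots: float = 32.5) -> int:
    """Smallest number of control flies expected to yield target_spots spots."""
    if spots_per_fly <= 0:
        raise ValueError("spontaneous frequency must be positive")
    return math.ceil(target_spots / spots_per_fly)


# Standard mwh/flr3 cross with approx 0.6 spontaneous spots per fly:
# 32.5 / 0.6 = 54.17, rounded up to 55 flies.
print(optimal_control_size(0.6))
```

A laboratory whose strains show a different spontaneous frequency would pass its own empirical value as `spots_per_fly`.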

Normally, two or more experiments are performed with a test compound, and if no statistical differences are found between them, the data are pooled. Depending on the data, one can use different statistical tests to check for homogeneity/heterogeneity. If the individual series do not show overdispersion, the chi-squared test for proportions may be used. On the other hand, if there is overdispersion within samples, the Kruskal-Wallis H-test is more reliable, because the chi-squared (χ²) test may be too liberal (Frei, personal communication) (see Note 6).

Pooled negative controls may be useful to estimate parameters (e.g., an optimal estimation of spontaneous spot frequencies). However, because of the possibility of heterogeneity among control samples, it is always advisable to carry out a parallel control and, for significance testing, to compare the experimental samples with the parallel control (Frei, personal communication).

In order to minimize the risk of false-positive or false-negative test results, the minimum necessary requirements are (1) that each treatment series be accompanied by a concurrent control series, (2) that for each experiment the ratio between the number of treated flies and the number of control flies examined be the same, and (3) that for the control and the treatment group in each experiment, the ratio between females and males examined be the same (56).

To test the two hypotheses, several tests are suitable and almost equivalent: (1) the conditional binomial test (Kastenbaum and Bowman test) is recommended if the spot number is small; (2) the χ² test for proportions is used if the expected numbers of mutations in the control and treatment series are not too small (say, >5 each); (3) the G test (log-likelihood ratio test) and (4) the U-test (Mann-Whitney test) with correction for ties are used if the individual variability (within experiments, within sexes) contributes significantly to overdispersion.

3.4.6. The χ² Test for Proportions

3.4.6.1. Assessment of Positive Results: Testing Against the Null Hypothesis (H0)

H0: No difference between control and treatment group.

In an experiment with Nc untreated flies in the control and Nt treated flies in the treatment series, we test against the null hypothesis H0 that wing spots are not increased in frequency in the experimental group. The expectation of nc spots for the control flies and the expectation of nt spots for the treated flies is, in each case, proportional to the numbers of flies in each group, n being the total number of mutations recovered in both series together.

Provided the respective expected numbers of mutations in the control and treatment series are not too small (say, >5 each), the χ² test for proportions may be used to test against H0 and HA. It may be recalled that with a sufficiently large n, the χ² test is equivalent to the binomial test. Frei and Wurgler (54) proposed to use the χ² test with Yates' continuity correction, because with that approximation, the probabilities P0 and PA, corresponding to the respective calculated χ² values, become almost the same as with the conditional binomial test.

To illustrate how the calculations are carried out, we use the data from a treatment with docetaxel (0.005 mM) and the corresponding control data published in ref. 18. We test against the proportionality p0 : q0 among the observed total spots, whereby p0 and q0 are the proportions of control and treated flies respectively (note: p0 + q0 = 1).

The number of flies in the control was Nc = 100; the total number of spots in this series was nc = 46, which gives the frequency of spots per fly for the control: fc = nc/Nc = 46/100 = 0.460 (1)

In the experimental series, the number of flies was Nt = 60; the number of spots was nt = 47, with a resulting frequency of spots per fly of ft = nt/Nt = 47/60 = 0.783 (2)

From the data, one estimates that the frequency in the experimental series is 1.703 times the frequency in the control: me = ft/fc = 0.783/0.460 = 1.703 (3)

The proportion of wing spots expected in the control is p0 = Nc/(Nc + Nt) = 100/(100 + 60) = 0.625 (4)

and in the experimental series is q0 = 1 - p0 = Nt/(Nc + Nt) = 60/(100 + 60) = 0.375 (5)

if the general incidence were the same in the two groups. Considering the number of spots in the control and experimental series together (n = 93) and applying Yates' correction, it is possible to test against H0 by calculating

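$$
\chi^2 = \frac{\left(|n_c - p_0 n| - 1/2\right)^2}{p_0 n} + \frac{\left(|n_t - q_0 n| - 1/2\right)^2}{q_0 n}
= \frac{\left(|46 - 0.625 \times 93| - 1/2\right)^2}{0.625 \times 93} + \frac{\left(|47 - 0.375 \times 93| - 1/2\right)^2}{0.375 \times 93} = 6.200 \tag{6}
$$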

Use a χ² table and look up the probability p = α of the calculated χ². The test is one sided as long as we are only interested in proving an increase in spot frequency in the treated group. A two-sided test (p = 2α) is indicated in comparisons whose interest lies in significant disproportions in both directions (e.g., if we ask whether the two sexes in a treatment group react differently).

χ² = 6.200 exceeds the value χ² (α = 0.05, ν = 1) = 2.706 tabulated for the one-sided test; thus, H0 is rejected.
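The calculation against H0 can be retraced with a short Python sketch. The function and variable names are hypothetical; the critical value 2.706 is the tabulated one-sided value quoted above. The explicit check that the treated frequency exceeds the control frequency is an added safeguard, included because the χ² statistic by itself does not indicate the direction of the deviation.

```python
def chi2_yates_h0(n_c: int, n_t: int, N_c: int, N_t: int) -> float:
    """Yates-corrected chi-squared against H0 (spots proportional to fly numbers)."""
    n = n_c + n_t                       # total spots in both series
    p0 = N_c / (N_c + N_t)              # expected share of spots in the control
    q0 = 1.0 - p0                       # expected share in the treated series
    term_c = (abs(n_c - p0 * n) - 0.5) ** 2 / (p0 * n)
    term_t = (abs(n_t - q0 * n) - 0.5) ** 2 / (q0 * n)
    return term_c + term_t


CRITICAL_ONE_SIDED = 2.706              # tabulated value, alpha = 0.05, 1 df

# Docetaxel example: 46 spots on 100 control flies, 47 spots on 60 treated flies.
chi2 = chi2_yates_h0(n_c=46, n_t=47, N_c=100, N_t=60)
increased = 47 / 60 > 46 / 100          # one-sided test: only an increase counts
print(f"chi2 = {chi2:.3f}")
print("reject H0" if increased and chi2 > CRITICAL_ONE_SIDED else "do not reject H0")
```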

3.4.6.2. Assessment of Negative Results: Testing Against the Alternative Hypothesis (HA)

HA: Treated flies have m times more spots than untreated ones.

One may be interested in "proving" that a substance is not hazardous. In this case, one tries to exclude the possibility that the spots observed could be the results of a mutagenic effect of the substance. A minimal risk cannot be excluded, but one may be able to exclude significantly a certain multiple (m) of the spontaneous frequency; that is, one may be able to demonstrate that the effect is significantly below a doubling of the spontaneous frequency (m=2, used for small single and total spots).

Under this hypothesis, the expected spot numbers are also proportional to the fly numbers (pA : qA), but they differ in addition, because the theory postulates that spots are found in proportions 1 : m in control and treated groups (note: pA + qA = 1).

For testing against HA, the expectations change according to the multiple m we are testing against (here, m = 2, because me = 1.7). So, we have pA = Nc / (Nc + mNt) = 100 / (100 + 2 × 60) = 0.45455 (7)

and qA = 1 - pA = mNt / (Nc + mNt) = 2 × 60 / (100 + 2 × 60) = 0.54545 (8)

which represent the respective proportions in which the spots would be expected in the control and experimental series if HA was true. Again, using Yates' correction, we test against this hypothesis by calculating

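$$
\chi^2 = \frac{\left(|n_c - p_A n| - 1/2\right)^2}{p_A n} + \frac{\left(|n_t - q_A n| - 1/2\right)^2}{q_A n}
= \frac{\left(|46 - 0.45455 \times 93| - 1/2\right)^2}{0.45455 \times 93} + \frac{\left(|47 - 0.54545 \times 93| - 1/2\right)^2}{0.54545 \times 93} = 0.452 \tag{9}
$$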

which is less than the value χ² (β = 0.05, ν = 1) = 2.706 tabulated for the one-sided test; thus, HA is accepted. Having rejected H0 and accepted HA, we conclude that the test substance has a significant mutagenic effect (see Note 7).
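Both χ² tests differ only in the multiple m that enters the expected proportions, so they can be sketched with one function. The names below are hypothetical, m = 1 corresponds to H0 and m = 2 to the HA used here, and the direction checks (treated count above its expectation for H0, below its expectation for HA) are added assumptions that make the one-sided nature of each test explicit.

```python
def chi2_yates(n_c, n_t, N_c, N_t, m=1.0):
    """Yates-corrected chi-squared against spot proportions N_c : m * N_t."""
    n = n_c + n_t
    p = N_c / (N_c + m * N_t)           # expected control share (p0 or pA)
    q = 1.0 - p                         # expected treated share (q0 or qA)
    chi2 = ((abs(n_c - p * n) - 0.5) ** 2 / (p * n)
            + (abs(n_t - q * n) - 0.5) ** 2 / (q * n))
    return chi2, q * n                  # statistic and expected treated spots


CRITICAL = 2.706                        # tabulated one-sided value, 5%, 1 df
n_c, n_t, N_c, N_t = 46, 47, 100, 60    # docetaxel example

chi2_0, expected_t0 = chi2_yates(n_c, n_t, N_c, N_t, m=1)
chi2_a, expected_ta = chi2_yates(n_c, n_t, N_c, N_t, m=2)

reject_h0 = chi2_0 > CRITICAL and n_t > expected_t0   # more spots than under H0
reject_ha = chi2_a > CRITICAL and n_t < expected_ta   # fewer spots than under HA

print(f"chi2 against H0 = {chi2_0:.3f}, chi2 against HA = {chi2_a:.3f}")
print("reject H0:", reject_h0, "| reject HA:", reject_ha)
```

With the example data, only H0 is rejected, which corresponds to a positive diagnosis.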

In an experiment, the number of mutations in the control series can theoretically take any value, from 0 to n, and the number of mutations in the treated series can have any value, from n to 0. One calculates the binomial distributions (based on p0, q0, and n under H0, and based on pA, qA, and n under HA, already calculated for the χ² test) to determine the probabilities with which all the different possible results of an experiment are expected, with n mutations overall.

The respective significance levels at which we decide to test for rejection of H0 and HA were denoted by α and β, respectively. Conceptually, both tests are one sided. The opposite nature of the hypotheses requires that the cumulative probabilities (P0 and PA) be calculated from the opposite extreme ends of the respective binomial distributions (54).

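According to the rationale set out, H0 is rejected in the binomial test if

$$
P_0 = \sum_{r=0}^{n_c} \binom{n}{r} p_0^{\,r} q_0^{\,n-r} = \sum_{r=n_t}^{n} \binom{n}{r} q_0^{\,r} p_0^{\,n-r} < \alpha \tag{10}
$$

and, by analogy, HA is rejected if

$$
P_A = \sum_{r=0}^{n_t} \binom{n}{r} q_A^{\,r} p_A^{\,n-r} = \sum_{r=n_c}^{n} \binom{n}{r} p_A^{\,r} q_A^{\,n-r} < \beta \tag{11}
$$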
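The two cumulative probabilities can also be computed directly instead of being looked up. The following Python sketch uses only the standard library; the function names are hypothetical, and the data are those of the docetaxel example with m = 2 and significance levels of 5% assumed for both tests.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(k, n + 1))


def conditional_binomial(n_c, n_t, N_c, N_t, m):
    """Return (P0, PA) for the conditional binomial test."""
    n = n_c + n_t
    q0 = N_t / (N_c + N_t)              # treated share of spots under H0
    pA = N_c / (N_c + m * N_t)          # control share of spots under HA
    P0 = upper_tail(n_t, n, q0)         # at least n_t treated spots under H0
    PA = upper_tail(n_c, n, pA)         # at least n_c control spots under HA
    return P0, PA


ALPHA = BETA = 0.05                     # assumed significance levels
P0, PA = conditional_binomial(n_c=46, n_t=47, N_c=100, N_t=60, m=2)
print(f"P0 = {P0:.4f} -> reject H0: {P0 < ALPHA}")
print(f"PA = {PA:.4f} -> reject HA: {PA < BETA}")
```

Summing from the observed count to n in one series is equivalent to summing from 0 to the observed count in the other series, so each probability needs only one of the two forms given in the equations.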

The tables of Kastenbaum and Bowman (57) for the conditional binomial test can be used for the test of both hypotheses. For rejection of H0 and HA, the frequencies q0 and pA, respectively, should be used to look up the corresponding limit numbers in the tables. H0 is rejected if the number of mutations in the treated group (nt) is larger than or equal to the tabulated value; HA is rejected if the number of mutations in the control group (nc) is larger than or equal to the tabulated value.

Calculation Steps of the Mean mwh Clone Size Class (i), With and Without Clone Size Correction, Induced After Treatment With 0.05 mM of Camptothecin

Negative control (4% ethanol + 4% Tween-80): N = 78 flies. Camptothecin, 0.05 mM: N = 20 flies. Control-corrected values are given for camptothecin, 0.05 mM.

| Clone size category (i) | Control: mwh spot freq. | Control: mwh spot number | Control: Freq × i | Camptothecin: mwh spot freq. | Camptothecin: mwh spot number | Camptothecin: Freq × i | Corrected frequency | Corrected: Freq × i |
|---|---|---|---|---|---|---|---|---|
| 1 (1 cell) | 0.192 | 15 | 1 × 0.192 = 0.192 | 0.60 | 12 | 1 × 0.6 = 0.60 | 0.6 - 0.192 = 0.408 | 0.408 |
| 2 (2 cells) | 0.218 | 17 | 2 × 0.218 = 0.436 | 1.10 | 22 | 2 × 1.1 = 2.20 | 1.1 - 0.218 = 0.882 | 1.764 |
| 3 (3-4 cells) | 0.026 | 2 | 3 × 0.026 = 0.078 | 1.60 | 32 | 3 × 1.6 = 4.80 | 1.6 - 0.026 = 1.574 | 4.722 |
| 4 (5-8 cells) | 0.013 | 1 | 4 × 0.013 = 0.052 | 1.20 | 24 | 4 × 1.2 = 4.80 | 1.2 - 0.013 = 1.187 | 4.748 |
| 5 (9-16 cells) | 0.013 | 1 | 5 × 0.013 = 0.065 | 0.90 | 18 | 5 × 0.9 = 4.50 | 0.9 - 0.013 = 0.887 | |