Statistical Analysis

The statistical methods used to analyze SMART data, which make it possible to characterize a result as positive, weakly positive, negative, or inconclusive, were first presented by Frei and Wurgler (54).

Some biological aspects of the wing spot test must be pointed out before considering the statistical analysis. First, the number of target cells in the wing primordium is not precisely known. However, we know that during the larval and early pupal stages, cells of the wing primordia undergo approx 12 rounds of division, beginning with some 10-30 cells after embryogenesis and ending up with approx 30,000 cells when cell division ceases at the onset of metamorphosis (54). In chronic exposure experiments, the number of clones per wing divided by the number of cells contained in a wing provides an overall estimate of the clone induction frequency per cell and per cell division. Second, clone size reflects time of induction, according to the number of cell division cycles undergone between induction and metamorphosis. For continuous exposure, the expected clone size distribution in the ideal situation therefore corresponds to a geometric series with frequencies decreasing by a factor of 2 as clone size (measured in numbers of cells) increases by a factor of 2 (55).
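As a rough illustration of the ideal clone size distribution just described, the following Python sketch computes the relative frequencies of clones of 1, 2, 4, 8, ... cells under the assumption that each doubling in size halves the frequency. The function name and the choice of ten size steps are ours and serve only as an example.

```python
def expected_clone_size_distribution(n_steps: int) -> list[float]:
    """Relative frequencies for clones of 1, 2, 4, ... cells.

    Assumes the ideal geometric series: the frequency halves each time
    the clone size doubles. The result is normalized to sum to 1.
    """
    weights = [0.5 ** k for k in range(n_steps)]
    total = sum(weights)
    return [w / total for w in weights]


distribution = expected_clone_size_distribution(10)
for k, freq in enumerate(distribution):
    print(f"clone size {2 ** k:4d} cells: expected share {freq:.4f}")
```

Observed distributions that deviate strongly from this pattern are the reason clone size corrections are applied later in this section.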

In experiments designed to assess the mutagenicity of a given chemical, most often a treatment series is compared with a control series. One might like to decide whether the compound used in the treatment should be considered as mutagenic or nonmutagenic. The formulation of two alternative hypotheses allows one to distinguish among the possibilities of a positive, weakly positive, inconclusive, or negative result of an experiment. In the null hypothesis (H0), one assumes that there is no difference in the mutation frequency between the control and treated series. Rejection of the null hypothesis indicates that the treatment resulted in a statistically increased mutation frequency. The alternative hypothesis (HA) postulates a priori that the treatment results in an increased mutation frequency compared to the spontaneous frequency. This alternative hypothesis is rejected if the observed mutation frequency is significantly lower than the postulated increased frequency. Rejection indicates that the treatment did not produce the increase required to consider the compound as mutagenic. If neither of the two hypotheses is rejected, the results are considered inconclusive, as one cannot accept at the same time the two mutually exclusive hypotheses. In the practical application of the decision procedure, one defines a specific alternative hypothesis requiring that the mutation frequency in the treated series be m times that in the control series, which is then used together with the null hypothesis. It may happen in this case that both hypotheses have to be rejected. This would mean that the treatment is weakly mutagenic, but leads to a mutation frequency that is significantly lower than m times the control frequency (54).
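The decision procedure can be summarized in a few lines of Python. This is only a sketch of the logic described above; the function name is hypothetical, and the two Boolean inputs are assumed to come from whichever test is used for H0 and HA.

```python
def classify_result(reject_h0: bool, reject_ha: bool) -> str:
    """Map the outcome of the two tests onto the four possible diagnoses."""
    if reject_h0 and reject_ha:
        # Increased, but significantly less than m times the control.
        return "weakly positive"
    if reject_h0:
        return "positive"
    if reject_ha:
        return "negative"
    # Neither hypothesis can be rejected.
    return "inconclusive"


for h0 in (True, False):
    for ha in (True, False):
        print(f"reject H0={h0!s:5}  reject HA={ha!s:5}  ->  {classify_result(h0, ha)}")
```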

3.4.1. Distinguishing Different Spot Types

In the wing spot assay, it is customary to assess genotoxicity not only for the total number of spots recovered, but also to distinguish twin spots from single spots, because twin spots are uniquely produced by mitotic recombination, whereas single spots can be produced by various mechanisms.

To assess negative results, empirically chosen multiplication factors (m) were originally introduced for testing (4,54); these are m=2 for both total spots and small single spots, because of their high spontaneous frequencies, and m=5 for both the rare spontaneous large single spots and twin spots (56).

3.4.2. Optimal Sample Size

In order to minimize the chance of inconclusive results, the statistical tests should be made sufficiently powerful. This can be achieved by planning optimal experimental sample sizes. For an experiment with p<5%, and tested in both directions, we need a control sample of such a size as to make the expected average yield be 32.5 spots on all control flies together. This figure is independent of stocks and test systems and is determined exclusively by theoretical parameters (i.e., the significance level [p<5%] and the minimal risk [doubling effect]) we have opted for as well as the power we require (95% correct decisions). In the standard mwh/flr3 wing spot test with a spontaneous frequency regularly of approx 0.6 spots per fly, this corresponds to an optimal sample size of 55 flies (both wings analyzed) (56).

Determination of the optimal sample size depends on (1) the optimally sufficient number of spots expected in the control sample, which is a theoretical parameter, and (2) the mean frequency of spontaneous spots per individual, which is an empirical parameter. Although the former is independent of the particular strain or strain combinations used in experimentation, the latter is not. Therefore, working groups using the present method to find the optimal experimental sample size should base their sample size estimations on the specific spontaneous spot frequencies, which their strains or strain combinations show (56).
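The sample size calculation reduces to one division, sketched below in Python. The values 32.5 (expected control spots) and 0.6 (spontaneous spots per fly) are those quoted above; the function name is ours, and a laboratory would substitute the spontaneous frequency of its own strains.

```python
import math


def optimal_control_sample_size(expected_control_spots: float,
                                spontaneous_spots_per_fly: float) -> int:
    """Number of control flies needed to reach the expected spot yield."""
    return math.ceil(expected_control_spots / spontaneous_spots_per_fly)


# Standard mwh/flr3 cross, approx 0.6 spontaneous spots per fly.
flies_needed = optimal_control_sample_size(32.5, 0.6)
print(flies_needed)  # 55
```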

3.4.3. Pooling Data from Different Experiments

Normally, two or more experiments are performed with a test compound, and if no statistically significant differences are found between them, the data are pooled. Depending on the data, different statistical tests can be used to check for homogeneity or heterogeneity. If the individual series do not show overdispersion, the chi-squared (χ²) test for proportions may be used. If, on the other hand, there is overdispersion within samples, the Kruskal-Wallis H-test is more reliable, because the χ² test may be too liberal (Frei, personal communication) (see Note 6).

Pooled negative controls may be useful to estimate parameters (e.g., an optimal estimation of spontaneous spot frequencies). However, because of the possibility of heterogeneity among control samples, it is always advisable to carry out a parallel control and, for significance testing, to compare the experimental samples with the parallel control (Frei, personal communication).

3.4.4. Optimal Design

In order to minimize the risk of false-positive or false-negative test results, the minimum necessary requirements are (1) that each treatment series be accompanied by a concurrent control series, (2) that for each experiment the ratio between the number of treated flies and the number of control flies examined be the same, and (3) that for the control and the treatment group in each experiment, the ratio between females and males examined be the same (56).

3.4.5. Which Statistical Test to Use

To test the two hypotheses, several tests are suitable and almost equivalent: (1) the conditional binomial test (Kastenbaum and Bowman test) is recommended if the spot number is small; (2) the χ² test for proportions is used if the expected numbers of mutations in the control and treatment series are not too small (say, >5 each); (3) the G test (log-likelihood ratio test) and (4) the U-test (Mann-Whitney test) with correction for ties are used if the individual variability (within experiments, within sexes) contributes significantly to overdispersion.

3.4.6. The χ² Test for Proportions

3.4.6.1. Assessment of Positive Results: Testing Against the Null Hypothesis (H0)

H0: No difference between control and treatment group.

In an experiment with Nc untreated flies in the control and Nt treated flies in the treatment series, we test against the null hypothesis H0 that wing spots are not increased in frequency in the experimental group. The expectation of nc spots for the control flies and the expectation of nt spots for the treated flies is, in each case, proportional to the numbers of flies in each group, n being the total number of mutations recovered in both series together.

Provided the respective expected numbers of mutations in the control and treatment series are not too small (say, >5 each), the χ² test for proportions may be used to test against H0 and HA. It may be recalled that with a sufficiently large n, the χ² test is equivalent to the binomial test. Frei and Wurgler (54) proposed using the χ² test with Yates' continuity correction, because with that approximation, the probabilities P0 and PA, corresponding to the respective calculated χ² values, become almost the same as with the conditional binomial test.

To illustrate how the calculations are carried out, we use the data from a treatment with docetaxel (0.005 mM) and the corresponding control data published in ref. 18. We test against the proportionality p0 : q0 among the observed total spots, whereby p0 and q0 are the proportions of control and treated flies respectively (note: p0 + q0 = 1).

The number of flies in the control was Nc = 100; the total number of spots in this series was nc = 46, which gives the frequency of spots per fly for the control:

$$f_c = \frac{n_c}{N_c} = \frac{46}{100} = 0.460 \tag{1}$$

In the experimental series, the number of flies was Nt = 60; the number of spots was nt = 47, with a resulting frequency of spots per fly of

$$f_t = \frac{n_t}{N_t} = \frac{47}{60} = 0.783 \tag{2}$$

From the data, one estimates that the frequency in the experimental series is 1.703 times the frequency in the control:

$$m_e = \frac{f_t}{f_c} = \frac{47/60}{46/100} = 1.703 \tag{3}$$

The proportion of wing spots expected in the control is

$$p_0 = \frac{N_c}{N_c + N_t} = \frac{100}{100 + 60} = 0.625 \tag{4}$$

and in the experimental series is

$$q_0 = 1 - p_0 = \frac{N_t}{N_c + N_t} = \frac{60}{100 + 60} = 0.375 \tag{5}$$

if the general incidence were the same in the two groups. Considering the number of spots in the control and experimental series together (n = 93) and applying Yates' correction, it is possible to test against H0 by calculating

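$$
\chi^2 = \frac{\left(|n_c - p_0 n| - \tfrac{1}{2}\right)^2}{p_0 n} + \frac{\left(|n_t - q_0 n| - \tfrac{1}{2}\right)^2}{q_0 n}
= \frac{\left(|46 - 0.625 \times 93| - \tfrac{1}{2}\right)^2}{0.625 \times 93} + \frac{\left(|47 - 0.375 \times 93| - \tfrac{1}{2}\right)^2}{0.375 \times 93} = 6.200 \tag{6}
$$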

Use a χ² table and look up the probability p = α of the calculated χ². The test is one-sided as long as we are only interested in proving an increase in spot frequency in the treated group. A two-sided test (p = 2α) is indicated in comparisons whose interest lies in significant disproportions in both directions (e.g., if we ask whether the two sexes in a treatment group react differently).

χ² = 6.200 exceeds the value χ² (α = 0.05, ν = 1) = 2.706 tabulated for the one-sided test; thus, H0 is rejected.
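The test against H0 can be reproduced with a few lines of Python. This is a minimal sketch using the docetaxel data quoted above; the function and variable names are ours, and the critical value 2.706 is simply the tabulated one-sided value cited in the text.

```python
def yates_chi2(n_control: int, n_treated: int, p_control: float) -> float:
    """Chi-squared for proportions with Yates' correction (1 degree of freedom)."""
    n = n_control + n_treated
    expected_control = p_control * n
    expected_treated = (1.0 - p_control) * n
    return ((abs(n_control - expected_control) - 0.5) ** 2 / expected_control
            + (abs(n_treated - expected_treated) - 0.5) ** 2 / expected_treated)


N_c, N_t = 100, 60      # flies in control and treated series
n_c, n_t = 46, 47       # spots in control and treated series
p0 = N_c / (N_c + N_t)  # proportion of spots expected in the control under H0

chi2_h0 = yates_chi2(n_c, n_t, p0)
CRITICAL_ONE_SIDED = 2.706  # tabulated value, alpha = 0.05, 1 degree of freedom

# H0 is rejected only if spots are in excess in the treated series.
reject_h0 = (n_t > (1.0 - p0) * (n_c + n_t)) and chi2_h0 > CRITICAL_ONE_SIDED
print(round(chi2_h0, 3), reject_h0)
```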

3.4.6.2. Assessment of Negative Results: Testing Against the Alternative Hypothesis (HA)

HA: Treated flies have m-times more spots than untreated ones.

One may be interested in "proving" that a substance is not hazardous. In this case, one tries to exclude the possibility that the spots observed could be the results of a mutagenic effect of the substance. A minimal risk cannot be excluded, but one may be able to exclude significantly a certain multiple (m) of the spontaneous frequency; that is, one may be able to demonstrate that the effect is significantly below a doubling of the spontaneous frequency (m=2, used for small single and total spots).

Under this hypothesis, the expected spot numbers are also proportional to the fly numbers (pA : qA), but they differ in addition, because the theory postulates that spots are found in proportions 1 : m in control and treated groups (note: pA + qA = 1).

For testing against HA, the expectations change according to the multiple m we are testing against (here, m = 2, because me = 1.7). So, we have

$$p_A = \frac{N_c}{N_c + mN_t} = \frac{100}{100 + 2 \times 60} = 0.45455 \tag{7}$$

and

$$q_A = 1 - p_A = \frac{mN_t}{N_c + mN_t} = \frac{2 \times 60}{100 + 2 \times 60} = 0.54545 \tag{8}$$

which represent the respective proportions in which the spots would be expected in the control and experimental series if HA was true. Again, using Yates' correction, we test against this hypothesis by calculating

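$$
\chi^2 = \frac{\left(|n_c - p_A n| - \tfrac{1}{2}\right)^2}{p_A n} + \frac{\left(|n_t - q_A n| - \tfrac{1}{2}\right)^2}{q_A n}
= \frac{\left(|46 - 0.45455 \times 93| - \tfrac{1}{2}\right)^2}{0.45455 \times 93} + \frac{\left(|47 - 0.54545 \times 93| - \tfrac{1}{2}\right)^2}{0.54545 \times 93} = 0.452 \tag{9}
$$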

which is less than the value χ² (β = 0.05, ν = 1) = 2.706 tabulated for the one-sided test; thus, HA is accepted. Having rejected H0 and accepted HA, we conclude that the test substance has a significant mutagenic effect (see Note 7).
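The test against HA differs from the test against H0 only in the expected proportions. The Python sketch below makes the multiple m an explicit parameter; the function name and the returned direction flag are our own additions, included because HA should be rejected only when the treated series shows fewer spots than HA predicts.

```python
def chi2_against_multiple(n_c: int, n_t: int, N_c: int, N_t: int, m: float):
    """Yates-corrected chi-squared against 'treated = m x control'.

    Returns the chi-squared value and a flag telling whether the treated
    series has fewer spots than expected under the tested multiple.
    """
    n = n_c + n_t
    p_a = N_c / (N_c + m * N_t)  # expected share of spots in the control
    q_a = 1.0 - p_a              # expected share of spots in the treated series
    chi2 = ((abs(n_c - p_a * n) - 0.5) ** 2 / (p_a * n)
            + (abs(n_t - q_a * n) - 0.5) ** 2 / (q_a * n))
    return chi2, n_t < q_a * n


CRITICAL_ONE_SIDED = 2.706  # tabulated value, beta = 0.05, 1 degree of freedom

chi2_ha, treated_below_expectation = chi2_against_multiple(46, 47, 100, 60, m=2)
reject_ha = treated_below_expectation and chi2_ha > CRITICAL_ONE_SIDED
print(round(chi2_ha, 3), reject_ha)
```

Setting m = 1 in the same function gives the χ² value of the test against H0, which is a convenient consistency check.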

3.4.7. The Conditional Binomial Test

In an experiment, the number of mutations in the control series can theoretically take any value, from 0 to n, and the number of mutations in the treated series can have any value, from n to 0. One calculates the binomial distributions (based on p0, q0, and n under H0, and based on pA, qA, and n under HA, already calculated for the χ² test) to determine the probabilities with which all the different possible results of an experiment are expected, with n mutations overall.

The respective significance levels at which we decide to test for rejection of H0 and HA are denoted by α and β, respectively. Conceptually, both tests are one-sided. The opposite nature of the hypotheses requires that the cumulative probabilities (P0 and PA) be calculated from the opposite extreme ends of the respective binomial distributions (54).

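According to the rationale set out, H0 is rejected in the binomial test if

$$P_0 = \sum_{j=0}^{n_c} \binom{n}{j} p_0^{\,j} q_0^{\,n-j} = \sum_{r=n_t}^{n} \binom{n}{r} q_0^{\,r} p_0^{\,n-r} \le \alpha \tag{10}$$

and, by analogy, HA is rejected if

$$P_A = \sum_{s=0}^{n_t} \binom{n}{s} q_A^{\,s} p_A^{\,n-s} = \sum_{r=n_c}^{n} \binom{n}{r} p_A^{\,r} q_A^{\,n-r} \le \beta \tag{11}$$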
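Where tables are not at hand, the two cumulative probabilities can be computed directly. The following Python sketch applies the binomial sums to the docetaxel example used for the χ² test (nc = 46, nt = 47, Nc = 100, Nt = 60, m = 2); the helper name is ours, and rejecting when the probability does not exceed 0.05 is an assumption about how the significance levels are applied.

```python
from math import comb


def upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, r) * p ** r * (1.0 - p) ** (n - r) for r in range(k, n + 1))


N_c, N_t, m = 100, 60, 2
n_c, n_t = 46, 47
n = n_c + n_t

q0 = N_t / (N_c + N_t)      # share of spots expected in the treated series under H0
pA = N_c / (N_c + m * N_t)  # share of spots expected in the control under HA

P0 = upper_tail(n, n_t, q0)  # probability of n_t or more treated spots under H0
PA = upper_tail(n, n_c, pA)  # probability of n_c or more control spots under HA

ALPHA = BETA = 0.05
print(f"P0 = {P0:.4f}, reject H0: {P0 <= ALPHA}")
print(f"PA = {PA:.4f}, reject HA: {PA <= BETA}")
```

For these data, the binomial test leads to the same decisions as the χ² test: H0 is rejected and HA is not.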

The tables of Kastenbaum and Bowman (57) for the conditional binomial test can be used for the test of both hypotheses. For rejection of H0 and HA, the frequencies q0 and pA, respectively, should be used to look up the corresponding limit numbers in the tables. H0 is rejected if the number of mutations in the treated group (nt) is larger than or equal to the tabulated value; HA is rejected if the number of mutations in the control group (nc) is larger than or equal to the tabulated value (54).

Table 1

Calculation Steps of the Mean mwh Clone Size Class (î), With and Without Clone Size Correction, Induced After Treatment With 0.05 mM of Camptothecin^b

Control = negative control (4% ethanol + 4% Tween-80), N = 78 flies; CPT = camptothecin, 0.05 mM, N = 20 flies.

| Clone size category (i)^a | Control: Freq. | Control: mwh spot number | Control: Freq. × i | CPT: Freq. | CPT: mwh spot number | CPT: Freq. × i | CPT, control corrected: corrected frequency | CPT, control corrected: Freq. × i |
|---|---|---|---|---|---|---|---|---|
| 1 (1 cell) | 0.192 | 15 | 1 × 0.192 = 0.192 | 0.60 | 12 | 1 × 0.6 = 0.60 | 0.6 - 0.192 = 0.408 | 0.408 |
| 2 (2 cells) | 0.218 | 17 | 2 × 0.218 = 0.436 | 1.10 | 22 | 2 × 1.1 = 2.20 | 1.1 - 0.218 = 0.882 | 1.764 |
| 3 (3-4 cells) | 0.026 | 2 | 3 × 0.026 = 0.078 | 1.60 | 32 | 3 × 1.6 = 4.80 | 1.6 - 0.026 = 1.574 | 4.722 |
| 4 (5-8 cells) | 0.013 | 1 | 4 × 0.013 = 0.052 | 1.20 | 24 | 4 × 1.2 = 4.80 | 1.2 - 0.013 = 1.187 | 4.748 |
| 5 (9-16 cells) | 0.013 | 1 | 5 × 0.013 = 0.065 | 0.90 | 18 | 5 × 0.9 = 4.50 | 0.9 - 0.013 = 0.887 | 4.435 |
| 6 (17-32 cells) | 0.013 | 1 | 6 × 0.013 = 0.078 | 0.95 | 19 | 6 × 0.95 = 5.70 | 0.95 - 0.013 = 0.937 | 5.622 |
| 7 (33-64 cells) | 0 | 0 | 0 | 0.35 | 7 | 7 × 0.35 = 2.45 | 0.35 - 0 = 0.35 | 2.45 |
| 8 (65-128 cells) | 0 | 0 | 0 | 0.50 | 10 | 8 × 0.5 = 4.00 | 0.5 - 0 = 0.50 | 4.00 |
| 9 (129-256 cells) | 0 | 0 | 0 | 0.95 | 19 | 9 × 0.95 = 8.55 | 0.95 - 0 = 0.95 | 8.55 |
| 10 (> 256 cells) | 0 | 0 | 0 | 0.35 | 7 | 10 × 0.35 = 3.50 | 0.35 - 0 = 0.35 | 3.50 |
| Σ | 0.475 | 37 | 0.907 | 8.50 | 170 | 41.10 | 8.025 | |
| Mean mwh clone size class (î) | | | 1.90 | | | 4.84 | | 5.01 |

^a mwh clones from single and twin spots were classified according to the number of cells they contain.
^b Data were extracted from Cunha et al. (19).

Table 2

Clone Induction Frequency per 10^5 Cells and per Cell Division as Well as the Percentage of Recombination, With and Without Clone Size Correction, After Chronic Treatment with 0.5 mM of Camptothecin^a

mwh/flr3

| | Historical negative control (4% ethanol + 4% Tween-80) | Camptothecin (0.5 mM) |
|---|---|---|
| No. of flies (N) | 78 | 20 |
| Total mwh clones^b (n) | 37 | 170 |
| Mean mwh clone size class^b-d (î) | 1.89 [-] | 4.84 [5.01] |
| Geometric mean of clone size^b-d (2^(î-1)) | 1.86 [-] | 14.22 [16.11] |
| Clone induction frequency (per 10^5 cells per cell division) without clone size correction^c-e, ft = (n/NC) × 10^5 | 37/(78 × 48,800) × 10^5 = 0.97 [-] | 170/(20 × 48,800) × 10^5 = 17.42 [16.45] |
| Clone induction frequency (per 10^5 cells per cell division) with clone size correction^c-e, f't = 2^(î-2) × ft | 2^(1.89-2) × 0.97... = 0.91 [-] | 2^(4.84-2) × 17.41... = 124.31 [132.42] |
| Recombination (%) in mwh/flr3 flies^f, without clone size correction^c,d | | (1 - 1.54.../17.42...) × 100 = 91.18 [94.24] |
| Recombination (%) in mwh/flr3 flies^f, with clone size correction^c,d | | (1 - 1.77.../124.31...) × 100 = 98.58 [98.98] |

mwh/TM3

| | Historical negative control (4% ethanol + 4% Tween-80) | Camptothecin (0.5 mM) |
|---|---|---|
| No. of flies (N) | 80 | 20 |
| Total mwh clones^b (n) | 23 | 15 |
| Mean mwh clone size class^b-d (î) | 1.70 [-] | 2.20 [2.51] |
| Geometric mean of clone size^b-d (2^(î-1)) | 1.62 [-] | 2.30 [2.86] |
| Clone induction frequency (per 10^5 cells per cell division) without clone size correction^c-e, fh = (n/NC) × 10^5 | 23/(80 × 48,800) × 10^5 = 0.59 [-] | 15/(20 × 48,800) × 10^5 = 1.54 [0.95] |
| Clone induction frequency (per 10^5 cells per cell division) with clone size correction^c-e, f'h = 2^(î-2) × fh | 2^(1.7-2) × 0.58... = 0.48 [-] | 2^(2.20-2) × 1.54... = 1.77 [1.35] |

^a Data extracted from Cunha et al. (19).
^b Considering mwh clones from mwh single spots and from twin spots.
^c To make the table easier to read, only two decimals are shown.
^d Numbers in square brackets are control corrected.
^e C = 48,800 (i.e., approx number of cells examined per fly).
^f Recombinational frequency is calculated only in mwh/flr3 flies, using the clone induction frequencies obtained in mwh/TM3 flies, in which this event is suppressed.
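The calculation steps summarized in Tables 1 and 2 can be reproduced with the short Python sketch below. It uses the camptothecin data given in the tables (spot numbers per clone size class for mwh/flr3, N = 20 flies, C = 48,800 cells per fly, and the mean clone size class of 2.20 for mwh/TM3). The function names are ours, and control correction is left out to keep the example short.

```python
def mean_clone_size_class(spots_per_class: list[int]) -> float:
    """Mean clone size class; classes are numbered 1, 2, 3, ... in list order."""
    total = sum(spots_per_class)
    return sum(i * c for i, c in enumerate(spots_per_class, start=1)) / total


def clone_induction_frequency(n_clones: int, n_flies: int,
                              cells_per_fly: int = 48_800) -> float:
    """Clones per 10^5 cells per cell division, without size correction."""
    return n_clones / (n_flies * cells_per_fly) * 1e5


def size_corrected_frequency(frequency: float, mean_class: float) -> float:
    """Apply the clone size correction factor 2^(mean class - 2)."""
    return 2 ** (mean_class - 2) * frequency


def recombination_percent(freq_tm3: float, freq_flr3: float) -> float:
    """Share of mwh/flr3 clones attributed to recombination, in percent."""
    return (1 - freq_tm3 / freq_flr3) * 100


# mwh/flr3, camptothecin: spot numbers for size classes 1 to 10 (Table 1).
flr3_spots = [12, 22, 32, 24, 18, 19, 7, 10, 19, 7]
i_flr3 = mean_clone_size_class(flr3_spots)
f_t = clone_induction_frequency(sum(flr3_spots), 20)
f_t_corr = size_corrected_frequency(f_t, i_flr3)

# mwh/TM3, camptothecin: 15 clones in 20 flies, mean class 2.20 (Table 2).
f_h = clone_induction_frequency(15, 20)
f_h_corr = size_corrected_frequency(f_h, 2.20)

print(round(i_flr3, 2), round(f_t, 2), round(f_h, 2))
print(round(recombination_percent(f_h, f_t), 2))
print(round(recombination_percent(f_h_corr, f_t_corr), 2))
```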

3.4.8. The U-Test of Wilcoxon, Mann, and Whitney

If individual variability (within experiments, within the same sex) contributes significantly to overdispersion, the fidelity of the aforementioned tests may be seriously affected (see Note 8). This is particularly the case for antimutagenicity or comutagenicity experiments, where the so-called positive control is compared with the cotreatment or posttreatment series to check if the modulator is modifying the genotoxic effect of a specific mutagen, or in cases where two experimental conditions (e.g., genotypes) are investigated. Pronounced individual variability can be the result of differential individual sensitivity and/or variable uptake of compounds. In this case, the U-test of Wilcoxon, Mann, and Whitney (also called Wilcoxon II) based on the number of spots recovered in individual flies is indicated (56).
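To show what a test based on spots per individual fly involves, the following Python sketch computes the U statistic with midranks for ties and a normal approximation with tie correction. The spot counts are invented for illustration only, the function names are ours, and no continuity correction is applied; for small samples, exact tables or a statistics package should be preferred.

```python
import math
from collections import Counter


def midranks(values):
    """Return a {value: average rank} map and the tie counts of the pooled data."""
    counts = Counter(values)
    rank_of, position = {}, 1
    for v in sorted(counts):
        c = counts[v]
        rank_of[v] = position + (c - 1) / 2  # mean of ranks position..position+c-1
        position += c
    return rank_of, counts


def mann_whitney_u(control, treated):
    """U for the treated series, z score, and one-sided p (treated > control)."""
    n1, n2 = len(control), len(treated)
    n = n1 + n2
    rank_of, counts = midranks(list(control) + list(treated))
    rank_sum_treated = sum(rank_of[v] for v in treated)
    u_treated = rank_sum_treated - n2 * (n2 + 1) / 2
    tie_term = sum(c ** 3 - c for c in counts.values())
    # Variance with tie correction; it is zero if all observations are equal.
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_treated - n1 * n2 / 2) / math.sqrt(variance)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return u_treated, z, p_one_sided


# Hypothetical spots per fly, not data from this chapter.
control_spots = [0, 0, 1, 0, 1, 0, 2, 0, 1, 0]
treated_spots = [1, 2, 0, 3, 1, 2, 4, 1, 0, 2]

u, z, p = mann_whitney_u(control_spots, treated_spots)
print(f"U = {u}, z = {z:.3f}, one-sided p = {p:.4f}")
```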
