My understanding of statistics is very weak, so I apologise if this is unclear or if the question is unwarranted.

Undergraduate students set out to determine whether substrate type (four types on a coral reef) affected algal community structure. They used a stratified random sampling design with five replicates in each stratum, each replicate being a randomly thrown quadrat.

Using a 1 m × 1 m quadrat subdivided into 100 squares, they estimated the percentage cover of each algal species (i.e., the percentage of each quadrat occupied by that species). Their resolution was 0.25% (a quarter of one square).
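The quadrat arithmetic can be checked with a short sketch. It assumes each of the 100 squares is 10 cm × 10 cm and that cover was recorded as a whole number of quarter-squares between 0% and 100%; the variable names are illustrative only.

```python
from fractions import Fraction

# Quadrat geometry, assuming a 1 m x 1 m frame split into a 10 x 10 grid
QUADRAT_AREA_CM2 = 100 * 100                      # 1 m x 1 m = 10,000 cm^2
N_SQUARES = 100
SQUARE_AREA_CM2 = QUADRAT_AREA_CM2 // N_SQUARES   # 100 cm^2 per square

# Smallest recordable unit: a quarter of one square
resolution_cm2 = SQUARE_AREA_CM2 // 4             # 25 cm^2
# Same unit as a percentage of the quadrat, kept exact with Fraction
resolution_pct = Fraction(resolution_cm2, QUADRAT_AREA_CM2) * 100

# Cover can be any whole number of quarter-squares from 0 to 400
n_levels = N_SQUARES * 4 + 1

print(f"{resolution_cm2} cm^2 = {float(resolution_pct)}% of the quadrat")
print(f"{n_levels} possible cover values per species")
```

So, under these assumptions, each species' cover is one of 401 discrete values, whether it is expressed as a percentage or as an area.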

They wanted to run an ANOVA, so they needed continuous data. Instead of using the percentages, they used the actual area covered (each square is 10 cm × 10 cm, or 100 cm²).
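Because every square is the same size, converting percentage cover to area multiplies every value by the same constant (1% of a 10,000 cm² quadrat is 100 cm²). The minimal sketch below runs a one-way ANOVA in plain Python, so it needs no statistics library, and shows that this rescaling leaves the F statistic unchanged. The cover values are invented placeholders for four substrates with five quadrats each, not the students' data.

```python
import math

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)

    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical percentage cover of one species (invented for illustration):
# 4 substrates x 5 quadrats, all multiples of the 0.25% resolution
cover_pct = [
    [12.5, 20.0, 15.25, 18.0, 10.75],
    [30.0, 42.5, 35.0, 28.25, 39.5],
    [5.0, 2.25, 8.5, 6.0, 3.75],
    [22.0, 25.5, 19.75, 27.0, 24.25],
]

# 1% of a 10,000 cm^2 quadrat is 100 cm^2, so area is a constant multiple
cover_area = [[pct * 100 for pct in group] for group in cover_pct]

f_pct, df1, df2 = one_way_anova(cover_pct)
f_area, _, _ = one_way_anova(cover_area)
print(f"F({df1}, {df2}) on percentages: {f_pct:.4f}")
print(f"F({df1}, {df2}) on areas:       {f_area:.4f}")
print("Same F:", math.isclose(f_pct, f_area, rel_tol=1e-9))
```

The same invariance holds for any positive linear rescaling, so converting to area changes only the units. Whether the ANOVA assumptions (normal residuals, equal variances) hold is a separate question that such a rescaling cannot affect.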

A colleague disagreed with this strategy on the following counts:

  • they felt the percentage data should instead have been arcsine-transformed, and that converting to area was not a valid transformation for this purpose. In their view, applying an arithmetic conversion was not a satisfactory option.
  • the percentages, and hence the areas, were estimates rather than absolute measures. They said this means the values are likely to vary from one observer to the next (no questions were raised about whether the estimates were made by one student or several).
  • the resolution of measurement was a quarter of a square, or 25 cm², so they felt this was not really continuous data.

Are these valid concerns, and if so, which ones and why? In addressing the second and third concerns, please comment on how an arcsine transformation would have been better than converting to area. (I suspect the transformation would carry forward the same concerns: the data are still estimates, and they still have what I think my colleague meant by inadequate resolution.)
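To compare the two options side by side, the sketch below assumes the colleague meant the usual angular (arcsine square-root) transformation, y = arcsin(√p), where p is the proportion of the quadrat covered. It applies both that transformation and the area conversion to every recordable value from 0% to 100% in 0.25% steps. The area conversion keeps the values evenly spaced, while the arcsine transformation stretches the ends of the scale relative to the middle; both mappings are one-to-one, so neither changes the number of distinct values the data can take.

```python
import math

# Every recordable cover value: 0% to 100% in quarter-square (0.25%) steps,
# stored as a whole number of quarter-squares to avoid rounding error
quarter_squares = range(0, 401)

# Area conversion: each quarter-square is 25 cm^2 (a linear rescaling)
areas_cm2 = [q * 25 for q in quarter_squares]

# Angular (arcsine square-root) transformation of p = q / 400, in radians;
# assumed to be the "Arc Sine" transformation the colleague had in mind
angular = [math.asin(math.sqrt(q / 400)) for q in quarter_squares]

def steps(values):
    """Differences between consecutive recordable values."""
    return [b - a for a, b in zip(values, values[1:])]

area_steps = steps(areas_cm2)
angular_steps = steps(angular)

print("distinct area values:   ", len(set(areas_cm2)))
print("distinct angular values:", len(set(angular)))
print("area step sizes:", set(area_steps))
# Step from 0% to 0.25% compared with the step from 50% to 50.25%
print("angular step at 0%: ", round(angular_steps[0], 4))
print("angular step at 50%:", round(angular_steps[200], 4))
```

The transformation changes the spacing of the possible values but not their number.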
