Within a project on the geographical traceability of horticultural products, we would like to apply a classification model (e.g. LDA) to our data set to assess whether samples can be correctly classified by origin on the basis of 20-25 chemical variables.

We identified 5 cultivation areas and selected 41 orchards (experimental units) in total. In each orchard, 10 samples were collected (each sample from a different tree). The samples were analyzed separately. So, at the end, we have the results for 410 samples.

The question is: do the 10 samples per orchard have to be considered pseudoreplicates, since they belong to the same experimental unit (even though they were collected from independent trees)? Should the LDA be performed on 41 replicates (the 41 orchards, each represented by the average of its 10 samples), or should we run it on the whole data set of 410 samples?
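To make the two options concrete, here is a minimal sketch of how I imagine them, not a definitive analysis. Everything in it is an assumption for illustration: the data are synthetic, the split of the 41 orchards over the 5 areas (`ORCHARDS_PER_AREA`) is made up, and a simple nearest-centroid classifier stands in for LDA so that the example needs only the Python standard library. In both options the validation holds out one whole orchard at a time, so that trees from the same orchard never appear in both the training and the test set.

```python
import random

random.seed(0)  # reproducible synthetic data

N_VARS = 20
TREES_PER_ORCHARD = 10
ORCHARDS_PER_AREA = [9, 8, 8, 8, 8]  # assumed split of the 41 orchards over 5 areas

# Synthetic rows (area, orchard_id, features):
# area effect + orchard effect + tree-level noise.
samples = []
orchard_id = 0
for area, n_orchards in enumerate(ORCHARDS_PER_AREA):
    area_effect = [random.gauss(0, 1) for _ in range(N_VARS)]
    for _ in range(n_orchards):
        orchard_effect = [random.gauss(0, 1) for _ in range(N_VARS)]
        for _ in range(TREES_PER_ORCHARD):
            features = [a + o + random.gauss(0, 0.5)
                        for a, o in zip(area_effect, orchard_effect)]
            samples.append((area, orchard_id, features))
        orchard_id += 1


def column_means(rows):
    """Mean of each variable over a list of feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def fit_centroids(train):
    """Nearest-centroid stand-in for LDA: one mean vector per area."""
    by_area = {}
    for area, _, features in train:
        by_area.setdefault(area, []).append(features)
    return {area: column_means(rows) for area, rows in by_area.items()}


def predict(centroids, features):
    """Return the area whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((u - v) ** 2 for u, v in zip(centroid, features))
    return min(centroids, key=lambda area: sq_dist(centroids[area]))


def leave_one_orchard_out_accuracy(data):
    """Hold out every row of one orchard at a time, train on the rest."""
    hits = 0
    for held_out in sorted({oid for _, oid, _ in data}):
        train = [row for row in data if row[1] != held_out]
        test = [row for row in data if row[1] == held_out]
        centroids = fit_centroids(train)
        hits += sum(predict(centroids, x) == area for area, _, x in test)
    return hits / len(data)


# Option 1: one row per orchard (average of its 10 trees), i.e. 41 replicates.
by_orchard = {}
for area, oid, features in samples:
    by_orchard.setdefault(oid, (area, []))[1].append(features)
orchard_table = [(area, oid, column_means(rows))
                 for oid, (area, rows) in by_orchard.items()]
acc_orchard_means = leave_one_orchard_out_accuracy(orchard_table)

# Option 2: all 410 samples, but still validated orchard by orchard.
acc_all_samples = leave_one_orchard_out_accuracy(samples)

print("orchard means:", acc_orchard_means)
print("all samples, grouped by orchard:", acc_all_samples)
```

The part I am unsure about is exactly what this sketch tries to isolate: whether using all 410 samples is acceptable as long as the orchard is kept as the unit in the validation, or whether the model itself should only ever see the 41 orchard means.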

Thank you for your help.
