What is the ideal sample size for an HCA? I read that this clustering method is best suited for small sample sizes, but would it be appropriate for a sample size of n=340 observations?
The sample size for hierarchical cluster analysis depends on the number of clustering variables and the number of clusters. In the simplest case, where clusters are of equal size, Qiu and Joe recommend a sample size of at least ten times the number of clustering variables multiplied by the number of clusters. Dolnicar et al. recommend a sample size of 70 times the number of clustering variables. Overall, researchers should aim for roughly N = 20 to N = 30 observations per expected subgroup.
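To make those rules concrete, here is a quick back-of-the-envelope check in Python. The variable and cluster counts below are hypothetical placeholders, not taken from the question; substitute your own study design:

```python
# Rough sample-size check against the rules of thumb cited above.
n_vars = 5      # number of clustering variables (assumed for illustration)
n_clusters = 4  # number of expected clusters (assumed for illustration)
n = 340         # available sample size

qiu_joe = 10 * n_vars * n_clusters   # Qiu & Joe: >= 10 * variables * clusters
dolnicar = 70 * n_vars               # Dolnicar et al.: >= 70 * variables
per_group = 30 * n_clusters          # ~30 observations per expected subgroup

print(f"Qiu & Joe minimum:  {qiu_joe}")    # 200
print(f"Dolnicar minimum:   {dolnicar}")   # 350
print(f"Per-subgroup (30x): {per_group}")  # 120
print(f"n = {n} meets all three?", n >= max(qiu_joe, dolnicar, per_group))
```

Note that with five clustering variables, n = 340 would satisfy the Qiu & Joe and per-subgroup rules but fall just short of the Dolnicar et al. recommendation, so which rule you apply can matter.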
Hierarchical Cluster Analysis (HCA) is a clustering method that groups similar observations based on their characteristics. The ideal sample size for an HCA depends on several factors, including the complexity of the data, the number of variables, and the desired level of detail in the resulting clusters.
While HCA is often used with smaller sample sizes, it can still be appropriate for larger sample sizes, such as the n=340 observations you mentioned. In fact, HCA can handle datasets of varying sizes. However, there are a few considerations to keep in mind:
Computational Complexity: As the sample size increases, the computational complexity of the clustering algorithm also increases. HCA involves calculating distances between every pair of observations, so time and memory grow roughly quadratically with the number of observations (see the sketch after this list).
Interpretability: With a larger sample size, the resulting dendrogram (the tree-like structure representing the clusters) becomes more complex and harder to read. Identifying distinct clusters and drawing useful insights from the analysis can become challenging.
Variability and Stability: Larger samples can capture more heterogeneity in the data, which can affect the stability of the clustering results. It's important to assess the stability and robustness of the clusters obtained from HCA, particularly with larger sample sizes.
Preprocessing and Feature Selection: With a larger sample size, it becomes even more crucial to carefully preprocess the data and select relevant features. Dimensionality reduction techniques or feature selection methods may be necessary to handle high-dimensional data effectively.
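As a rough illustration of the first two points, here is a minimal SciPy sketch. The data are synthetic stand-ins (340 observations on 5 standardized variables), and the Ward linkage and 4-cluster cut are assumptions for the example, not recommendations for your data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
from scipy.spatial.distance import pdist
import matplotlib.pyplot as plt

# Synthetic stand-in for the real data: 340 observations, 5 variables.
rng = np.random.default_rng(42)
X = rng.normal(size=(340, 5))

# Condensed pairwise distance vector: 340 * 339 / 2 = 57,630 entries,
# trivial to store; memory only becomes a concern at n in the tens of
# thousands, where the full matrix runs into gigabytes.
d = pdist(X, metric="euclidean")

# Ward linkage is a common default for interval-scaled variables.
Z = linkage(d, method="ward")

# Cut the tree into, say, 4 clusters (the number is hypothetical).
labels = fcluster(Z, t=4, criterion="maxclust")

# A full dendrogram with 340 leaves is unreadable; truncating to the
# last 20 merged clusters keeps the structure visible.
dendrogram(Z, truncate_mode="lastp", p=20)
plt.show()
```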
In summary, while HCA is commonly used with smaller sample sizes, it can still be applied to larger datasets like n=340 observations. However, it's essential to consider the computational complexity, interpretability, variability, and preprocessing aspects when using HCA with larger sample sizes.
I don't think you'd have any problem applying any reasonable kind of HCA to 300-odd objects. It will certainly 'work' in the sense of giving you clusters, and it is at least an order of magnitude short of a size that would make a pairwise distance matrix impossible to store. In that sense, 300 _is_ 'small'.
It is probably too big for simple visual interpretation of a complete hierarchical clustering, though, if that's what you were intending - a full dendrogram goes down to individual objects/individuals, and it can get hard to see what's really going on.
Otherwise, for a lot of ordinary purposes, 300 is likely to be _big_ enough to be useful, and that's usually the more important problem if you want to draw any inferences or classify other objects later. Others above have pointed to recommendations; 30 (or more) per expected subgroup is a fair rule of thumb. But it does depend on how many variables you have and what the typical ranges are between and within groups. I'll stick my neck out a little and say there's _no_ simple way of getting an 'ideal' number for clustering in advance; you'd have to know quite a lot about your particular population and your intended use even to make a start on the question.
And an important rider, after all that, is to plan for some cross-validation to make sure your clustering isn't just rolling dice ...
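One simple way to do that check is to re-cluster random subsamples and compare the labelings with the adjusted Rand index (ARI). A sketch, assuming scikit-learn is available; the data, Ward linkage, and cluster count are hypothetical placeholders:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X = rng.normal(size=(340, 5))  # stand-in data; replace with your own
K = 4                          # hypothetical number of clusters

def cluster_labels(data, k):
    """Ward hierarchical clustering cut into k groups."""
    return fcluster(linkage(data, method="ward"), t=k, criterion="maxclust")

full = cluster_labels(X, K)

# Stability check: re-cluster random 80% subsamples and compare the
# resulting labels with the full-sample labels on the shared rows.
scores = []
for _ in range(50):
    idx = rng.choice(len(X), size=int(0.8 * len(X)), replace=False)
    sub = cluster_labels(X[idx], K)
    scores.append(adjusted_rand_score(full[idx], sub))

# ARI near 1 means the partition barely moves under resampling;
# values near 0 suggest the 'clusters' may be little more than noise.
print(f"mean ARI over 50 subsamples: {np.mean(scores):.2f}")
```

On pure noise like this synthetic example you should see a low mean ARI, which is exactly the 'rolling dice' outcome the rider above warns about.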