What would be the best statistical strategy to compare two groups with a different number of subjects in each group (imbalanced design; 17 vs. 8)?

Jochen Wilhelm

The "best" way to statistically analyze your data does NOT depend on the sample size but on two other things:

- your scientific question, and how you formulate this in a statistical model

- the kind or nature of your data, and how you represent it by a random variable (this includes the distributional assumptions that can reasonably be associated with the data)

As you say you used a t-test, I assume that your problem is to test an expected difference between two random variables, because this is the only analytical tool for this task.

If you had different variances in the groups, the question arises: why? How can the expected difference be interpreted when it refers to different kinds of random variables? Often, heteroskedasticity implies that the distribution of the random variable is also not normal (normality is an assumption made by the t-test) and that the entire statistical model is inappropriate (e.g. your data are counts and a Poisson model would be appropriate, or your data are proportions and a beta model would be in place, etc.).

Emmanuel Curis

Note also that, if you say « most of the variables have equal variance », it suggests you have a lot of variables available to compare your groups.

Depending on your exact question, this may call either for multiplicity corrections or for the use of multivariate techniques.

As Jochen wrote, without knowing the question and the experimental protocol, there is no way to answer your question. There is no universal "best" method.
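To make the multiplicity point concrete, here is a minimal sketch of the Holm step-down correction applied to several p-values, one per outcome variable. The p-values are invented for illustration and do not come from this thread; Holm is only one of several possible corrections.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values, one per outcome variable (assumption, not real data).
raw_p = [0.004, 0.030, 0.020, 0.400]
adjusted_p = holm_adjust(raw_p)
print(adjusted_p)
```

With four variables, a raw p-value of 0.03 no longer falls below 0.05 after adjustment, which is why the number of variables compared matters for the conclusion.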

Peter Samuels

Dear Patrice,

If your data is not normally distributed, an alternative nonparametric test is the Mann-Whitney U test. This also works with unequal sample sizes. Your sample sizes are rather low, by the way, but their appropriateness would depend upon the size of the effect(s) you are measuring. Large effects require smaller sample sizes to detect them.

Dear Peter,

that's not that simple. The U-test tests a different hypothesis than the t-test. It should be given by the scientific question what kind of hypothesis the researcher wants to test.
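A small sketch of why the two tests answer different questions: the U statistic divided by the number of pairs estimates the probability that a value from one group exceeds a value from the other, whereas the t-test concerns the difference in means. The data below are made up so that the two quantities point in opposite directions; the function name is only illustrative.

```python
from statistics import mean


def mann_whitney_u(x, y):
    """U statistic for x versus y: number of pairs with x_i > y_j, ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Made-up data (assumption): one extreme value in group_a drives its mean.
group_a = [1.0, 2.0, 3.0, 100.0]
group_b = [4.0, 5.0, 6.0]

u = mann_whitney_u(group_a, group_b)
# Estimated probability that a random value from group_a exceeds one from group_b.
prob_superiority = u / (len(group_a) * len(group_b))
mean_difference = mean(group_a) - mean(group_b)

print(u, prob_superiority, mean_difference)
```

Here the mean of `group_a` is much larger, yet a randomly chosen value from `group_a` is smaller than one from `group_b` in three pairs out of four. Which of these two statements matters is a scientific choice, not a statistical one.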

Patrice Brassard

Thank you for the follow-up guys. The question is rather simple in that case. I want to examine the impact of cardiorespiratory fitness on a specific determinant of blood flow regulation. For that, I am comparing athletes (n=17) to controls (n=8). The distribution of the random variables is normal for all variables. The variance is equal for most of the variables...

Khalid Hassan

Dear Patrice

In the case of your data (normal distribution and equal variance), the t-test for two independent samples will be suitable to test the null hypothesis of equality of the two means.
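For reference, a minimal sketch of the pooled-variance (Student) t statistic, which places no requirement on the two group sizes being equal. The numbers are invented and much smaller than the 17 vs. 8 design; in practice `scipy.stats.ttest_ind(x, y)` returns the same statistic together with a p-value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Student's two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance: the group variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    std_error = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(x) - mean(y)) / std_error, df


# Made-up values (assumption), with unequal group sizes on purpose.
athletes = [5.0, 6.0, 7.0, 8.0, 9.0]
controls = [3.0, 4.0, 5.0]

t_stat, df = pooled_t(athletes, controls)
print(t_stat, df)
```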

Good Luck

Thanoon Y Thanoon

You can have a different sample size in each group; this is not a problem. Please give more details about the analysis.

Best

According to your question, you're interested in « a specific determinant of blood flow regulation », which suggests you have a single variable. However, you then say « for all variables » and « most of the variables », which suggests you have many. There is apparently a self-contradiction here.

In addition, you do not describe what you expect for the CR fitness effect on this determinant, and what you're interested in. Any change is welcome => Mann-Whitney test is OK? Change in location => question may be solved by T-tests, or not, and it may turn out to be a very difficult question, especially with small sample sizes, if you want to detect changes in location even in the case of different variances or distributions...

Please explain your experimental plan better if you want a clear answer on what you should do…

In the most common case of a single variable compared between two groups, assuming a Gaussian distribution and equal variances (so that the only difference between the two groups can be in the means), unequal sample sizes may not be a problem for the T-test in principle, but compared with a balanced design of the same total size they will lower the power and make the test less robust to any departure from its assumptions.
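The loss of power can be seen in the standard error of the difference in means, which under equal variances is proportional to sqrt(1/n1 + 1/n2). The sketch below compares the 17 vs. 8 split with a nearly balanced split of the same 25 subjects; the function names are only illustrative.

```python
from math import sqrt


def se_factor(n1, n2):
    """Multiplier of the common SD in the standard error of the mean difference."""
    return sqrt(1.0 / n1 + 1.0 / n2)


def effective_n_per_group(n1, n2):
    """Per-group size of a balanced design with the same standard error (harmonic mean)."""
    return 2.0 * n1 * n2 / (n1 + n2)


unbalanced = se_factor(17, 8)
near_balanced = se_factor(13, 12)
equivalent_n = effective_n_per_group(17, 8)

print(unbalanced, near_balanced, equivalent_n)
```

Under these assumptions, 17 vs. 8 estimates the difference about as precisely as a balanced design with roughly 11 subjects per group, so about three of the 25 subjects are "lost" to the imbalance.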

David E Drehmer

The Welch–Satterthwaite approach is probably good enough to test for differences between means. It is distributed approximately as t with a degrees of freedom correction. Many computer programs implement some variation of this test for differences in means when the assumption of equal variances (observations drawn from the SAME population and the only effect is on the means) is not tenable. I would prefer to stay away from non-specific omnibus tests that don't involve the estimation and testing of parameter estimates (non-parametric) such as the Mann-Whitney U test when possible.
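As an illustration of the degrees-of-freedom correction, here is a minimal sketch of the Welch statistic with the Welch–Satterthwaite approximation, computed by hand on invented data. In SciPy the same test is available as `scipy.stats.ttest_ind(x, y, equal_var=False)`, which also returns the p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    # Squared standard error of each group mean, estimated separately (no pooling).
    v1 = variance(x) / n1
    v2 = variance(y) / n2
    t_stat = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# Made-up values (assumption), with unequal sizes and unequal variances.
group_1 = [5.0, 6.0, 7.0, 8.0, 9.0]
group_2 = [3.0, 4.0, 5.0]

welch_stat, welch_df = welch_t(group_1, group_2)
print(welch_stat, welch_df)
```

The corrected degrees of freedom always lie between the smaller group's n - 1 and the pooled value n1 + n2 - 2, and are usually not an integer.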

If you have unstable variances, then I would worry that the very small sample sizes and possibly large variances might not yield enough power to detect true differences, or might not be appropriate for pooling in effect size estimates.

When comparing groups, however, please don't stop with just comparing means (or some other estimate of location, such as the median, trimean, winsorized mean, etc.). What is the difference in spread (variance, range, midspread, etc.)? You might want to explore why the groups differ in spread. Next look at shape. Are the distributions of the two samples skewed, normal, Cauchy or some other shape? What about peakedness? Is the kurtosis the same or not? Are there unusual observations such as outliers (points sampled from a distribution other than what you think you've sampled from) or extreme but legitimate data points? These are the main ways that two samples may show differences in the populations from which they are drawn.
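A rough sketch of such a look at spread and shape, using moment-based skewness and excess kurtosis. The data are invented, and with groups as small as 8 these estimates are very noisy, so they are better read as descriptive hints than as tests.

```python
from statistics import mean, pstdev, variance


def shape_summary(x):
    """Sample variance plus moment-based skewness and excess kurtosis."""
    n = len(x)
    m = mean(x)
    s = pstdev(x)  # population SD, used to standardise the moments
    skewness = sum(((v - m) / s) ** 3 for v in x) / n
    excess_kurtosis = sum(((v - m) / s) ** 4 for v in x) / n - 3.0
    return {
        "variance": variance(x),
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }


# Made-up samples (assumption): one symmetric, one with a long right tail.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 1.0, 6.0]

summary_symmetric = shape_summary(symmetric)
summary_skewed = shape_summary(right_skewed)
print(summary_symmetric)
print(summary_skewed)
```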

Thank you guys for insightful comments and suggestions. I now have a game plan!

Yemane Hailu Fissuh

If your data is cross-sectional and fulfills the assumption of normality, the independent two-sample T-test will be fine. However, if you have doubts about the normality assumption and the constant-variance assumption for the error term, the non-parametric analog of the T-test, known as the Mann-Whitney test, will be preferable. In case your data is normal and is repeatedly measured/longitudinal, you can use a linear mixed-effects model.

@ Yang Li: 1) with so few data, that seems difficult and 2) to run a mixed-effects model, you need repetitions on the same patient/experimental unit, which was not implied by the design described in the question.

Ilya B. Gertsbakh

In addition to all the advice above, I would suggest trying a normal plot on normal probability paper for each sample separately, and for the combined sample of all 17+8=25. Possibly, these plots will confirm your assumptions or reveal departures from them.
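A minimal sketch of what such a normal plot computes: each ordered observation is paired with the standard normal quantile at plotting position (i - 0.5)/n, and roughly normal data fall close to a straight line. The eight values are invented, and the plotting position is one common convention among several.

```python
from math import sqrt
from statistics import NormalDist, mean


def normal_plot_points(sample):
    """Pairs (theoretical normal quantile, ordered observation) for a normal plot."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n for i = 1..n.
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(quantiles, ordered))


def straightness(points):
    """Pearson correlation of the plotted points; values near 1 mean a straight line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in points)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


# Made-up measurements for one small group (assumption, not data from the thread).
controls = [42.1, 47.3, 49.0, 50.2, 51.5, 53.8, 55.0, 60.4]

points = normal_plot_points(controls)
r = straightness(points)
print(points)
print(r)
```

The points can be passed to any plotting tool; the correlation is only a crude summary, and looking at the plot itself remains more informative for spotting curvature or isolated outliers.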

Ilya Gertsbakh