I need to conduct a study in which the treatment variable has two levels, say A and B. The group sizes are very different: group A has 1 million observations, while group B has 20,000. I will use Welch's t-test to compare means, and multiple linear regression or weighted least squares to predict outcomes such as cost and length of hospital stay. My concern is the large gap between the sample sizes. My questions are:
1.) I'm planning to take a subset of the larger group (group A) for the analysis, so that the difference in sample sizes between the two groups is not so large. Is this the right approach?
2.) Is there a minimum required proportion of the sample in each group? If so, what is it? What are the disadvantages of this?
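To make the setup concrete, here is a minimal sketch of the comparison I have in mind. It is only an illustration: the data are simulated (the means and standard deviations are made up, not from my study), and `welch_t` is a hypothetical helper I wrote for this post rather than a library function. It computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom once on the full groups and once after subsampling group A down to the size of group B.

```python
import math
import random


def welch_t(x, y):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    mx, my = math.fsum(x) / nx, math.fsum(y) / ny
    # Unbiased sample variances (divide by n - 1).
    vx = math.fsum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = math.fsum((v - my) ** 2 for v in y) / (ny - 1)
    # Squared standard error contributed by each group.
    se2x, se2y = vx / nx, vy / ny
    t = (mx - my) / math.sqrt(se2x + se2y)
    df = (se2x + se2y) ** 2 / (se2x ** 2 / (nx - 1) + se2y ** 2 / (ny - 1))
    return t, df


rng = random.Random(0)  # fixed seed so the illustration is reproducible

# Simulated stand-ins for an outcome such as cost; all parameters are assumptions.
group_a = [rng.gauss(100.0, 20.0) for _ in range(1_000_000)]
group_b = [rng.gauss(101.0, 25.0) for _ in range(20_000)]

# Analysis on the full, unbalanced data.
t_full, df_full = welch_t(group_a, group_b)

# Analysis after randomly subsampling group A to match the size of group B.
sub_a = rng.sample(group_a, len(group_b))
t_sub, df_sub = welch_t(sub_a, group_b)

print(f"full data:    t = {t_full:.3f}, df = {df_full:.1f}")
print(f"subsampled A: t = {t_sub:.3f}, df = {df_sub:.1f}")
```

The standard error in the denominator is sqrt(s_A^2/n_A + s_B^2/n_B), so the two runs differ only in how much group A contributes to it.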
Thank you.