For a retrospective cohort study, what is the best randomization method for selecting a sample of medical records from a large patient list?

Simple Randomization

The easiest method is simple randomization. When assigning subjects to two groups, A and B, each subject is assigned to a group purely at random, independently of every other assignment. Although this is the most basic approach, when the total number of samples is small the groups are likely to end up unequal in size. For this reason, we recommend using this method only when the total number of samples exceeds 100.
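A minimal sketch of simple randomization, assuming Python's standard `random` module; the function name `simple_randomization` is hypothetical. Each subject gets an independent, uniformly random group, so nothing forces the group sizes to match.

```python
import random

def simple_randomization(n_subjects, groups=("A", "B"), seed=None):
    """Assign each subject independently and uniformly at random to a group."""
    rng = random.Random(seed)  # seeded generator so the list can be reproduced
    return [rng.choice(groups) for _ in range(n_subjects)]

# With only 20 subjects, the two groups are often unequal in size.
assignments = simple_randomization(20, seed=1)
print(assignments.count("A"), assignments.count("B"))
```

Fixing the seed lets the allocation list be regenerated and audited later; omit it for a fresh sequence.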

Block Randomization

Block randomization keeps the group sizes equal by assigning subjects in blocks, each of which contains the same number of A and B assignments.

With a block size of two, there are two possible sequences, AB and BA; choosing one of them at random for each block keeps the two groups equal in size. With a block size of four, there are six possible sequences (AABB, ABAB, ABBA, BAAB, BABA, and BBAA), and again one of them is chosen at random for each block.
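The following is a sketch of fixed-size block randomization, assuming Python's standard library; `balanced_blocks` and `block_randomization` are hypothetical names. It enumerates every balanced ordering of a block (reproducing the six sequences listed for a block size of four) and then concatenates randomly chosen blocks.

```python
import itertools
import random

def balanced_blocks(block_size, groups=("A", "B")):
    """List every distinct ordering of a block with equal numbers of each group."""
    if block_size % len(groups):
        raise ValueError("block size must be a multiple of the number of groups")
    per_group = block_size // len(groups)
    base = [g for g in groups for _ in range(per_group)]
    # permutations() repeats identical orderings, so deduplicate with a set
    return sorted({"".join(p) for p in itertools.permutations(base)})

def block_randomization(n_subjects, block_size=4, seed=None):
    """Concatenate randomly chosen balanced blocks until n_subjects are covered."""
    rng = random.Random(seed)
    blocks = balanced_blocks(block_size)
    sequence = []
    while len(sequence) < n_subjects:
        sequence.extend(rng.choice(blocks))  # adds the block one letter at a time
    return sequence[:n_subjects]

print(balanced_blocks(4))
# → ['AABB', 'ABAB', 'ABBA', 'BAAB', 'BABA', 'BBAA']
```

Enumerating permutations grows factorially, which is fine for the small block sizes used in practice.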

However, block randomization has a disadvantage: the person carrying out the assignments can predict the next one. With a block size of two, once the first assignment in a block is known, the second is determined (B after A, or A after B); with a block size of four, the 4th assignment in each block can be predicted from the first three. This conflicts with the principle of randomization. To solve this problem, the allocator should hide the block size from the person carrying out the assignments and use randomly mixed block sizes, for example, block sizes of two, four, and six.
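A sketch of block randomization with randomly mixed block sizes of two, four, and six, assuming Python's standard library; the function names are hypothetical. The block sizes used are returned separately so that only the allocator keeps them.

```python
import itertools
import random

def balanced_blocks(block_size, groups=("A", "B")):
    """List every distinct ordering of a block with equal numbers of each group."""
    per_group = block_size // len(groups)
    base = [g for g in groups for _ in range(per_group)]
    return sorted({"".join(p) for p in itertools.permutations(base)})

def mixed_block_randomization(n_subjects, block_sizes=(2, 4, 6), seed=None):
    """Block randomization where each block's size is itself chosen at random.

    Returns the allocation list and the block sizes used; the block sizes are
    meant to stay with the allocator, hidden from whoever enrolls subjects.
    """
    rng = random.Random(seed)
    sequence, sizes_used = [], []
    while len(sequence) < n_subjects:
        size = rng.choice(block_sizes)  # random block length
        sizes_used.append(size)
        sequence.extend(rng.choice(balanced_blocks(size)))
    return sequence[:n_subjects], sizes_used

allocation, hidden_sizes = mixed_block_randomization(30, seed=5)
```

Because every block is balanced, the difference between the group counts never exceeds half the largest block size (three here), yet the end of a block can no longer be anticipated.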

Stratified Randomization

Randomization is important because it is practically the only way to balance all other variables between the groups, leaving only the factor of interest (A versus B) to differ. However, some very important confounding variables can still end up unequally distributed between the two groups by chance. This becomes more likely as the number of samples gets smaller; in that case, we can stratify by those variables so that they are distributed equally between the two groups.

For example, what if smoking status is very important? Using either of the two methods described above, we generate two separate random sequences, one for smokers and one for non-smokers. Smokers are assigned according to the smokers' sequence, and non-smokers according to the non-smokers' sequence. As a result, smokers and non-smokers are each distributed evenly between the two groups (exactly so when block randomization is used within each stratum).
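A sketch of stratified block randomization for the smoking example, assuming Python's standard library; the class `StratifiedAllocator` and the stratum labels are hypothetical. Each stratum draws from its own independent block-randomized sequence.

```python
import itertools
import random

def balanced_blocks(block_size, groups=("A", "B")):
    """List every distinct ordering of a block with equal numbers of each group."""
    per_group = block_size // len(groups)
    base = [g for g in groups for _ in range(per_group)]
    return sorted({"".join(p) for p in itertools.permutations(base)})

class StratifiedAllocator:
    """Keeps one independent block-randomized sequence per stratum."""

    def __init__(self, strata, block_size=4, seed=None):
        self.rng = random.Random(seed)
        self.block_size = block_size
        self.queues = {s: [] for s in strata}

    def assign(self, stratum):
        queue = self.queues[stratum]
        if not queue:  # this stratum's block is used up: draw a fresh random block
            queue.extend(self.rng.choice(balanced_blocks(self.block_size)))
        return queue.pop(0)

allocator = StratifiedAllocator(["smoker", "non-smoker"], seed=11)
arrivals = ["smoker"] * 8 + ["non-smoker"] * 12
random.Random(0).shuffle(arrivals)  # subjects arrive in an arbitrary order
allocations = [(s, allocator.assign(s)) for s in arrivals]
```

Since 8 and 12 are multiples of the block size, both strata here end up split exactly evenly between A and B.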

So we can use simple randomization with or without stratification, or block randomization with or without stratification. However, with multiple stratification variables the number of strata multiplies (two binary variables already give four strata), and it becomes difficult to fill both groups equally within every stratum. Usually, no more than two stratification variables are recommended.


EXAMPLES OF RANDOMIZATION

Although there are websites and common programs for randomization, let us use an Excel file. Download the attached file from http://cafe.naver.com/easy2know/6427. It is set to 'Read-only', but all of its functions work; the 'Read-only' setting only prevents accidental modification.

Because of the way Excel works, any change to the sheet generates new random numbers. If you enter any number in place of '2' in the orange-colored cell and press Enter, new random sequences are generated. These sequences are the result of simple randomization. The numbers in the right-hand column are the sample numbers. By default they go up to 1,000, but if you need more, you can extend them with Excel's AutoFill function.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3942596/

Jerry Miller

For the simple selection process this study needs, using random numbers to select patient records is the best approach.
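As an illustration of selecting records with random numbers, here is a minimal sketch of simple random sampling without replacement, assuming Python's standard `random` module; `sample_records` and the ID range are hypothetical. Every record has the same chance of being chosen and none is chosen twice.

```python
import random

def sample_records(record_ids, n, seed=None):
    """Draw n distinct records, each with the same chance of selection."""
    rng = random.Random(seed)  # record the seed so the sample can be reproduced
    return sorted(rng.sample(record_ids, n))  # sorted only to ease chart retrieval

patient_ids = list(range(1, 5001))  # hypothetical patient ID list
selected = sample_records(patient_ids, 250, seed=2024)
```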

You could, for example, use the last four digits of the patients' ID numbers and select every 20th one (e.g., patient numbers 1020, 1040, 1060, etc.). But make sure the patient ID numbers don't cluster on some other factor, for instance, all patients numbered under 5000 coming from only one district or one narrow age group. You want to ensure that the pool of patients you are selecting from represents the traits you want, and then that each patient has a theoretically equal chance of being sampled. Day of birth could be used in the same way.
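The "every 20th patient" idea is systematic sampling. A sketch, assuming Python's standard library and a hypothetical `systematic_sample` helper, picks a random starting point within the first 20 records so that each record has the same chance of selection; as noted above, the list order must not be related to the traits being studied.

```python
import random

def systematic_sample(record_ids, k, seed=None):
    """Take every k-th record after a random start within the first k positions."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random offset from 0 to k - 1
    return record_ids[start::k]

ids = list(range(1000, 3000))  # hypothetical ID list in a fixed order
picked = systematic_sample(ids, 20, seed=1)
```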

And, as a professor once told me, "randomization isn't always your friend." Especially if your sample size is small, you could end up with an unbalanced sample purely by chance even though it was chosen randomly: too many females, too few older people, and so on. To avoid this, it is useful to stratify, that is, to select randomly within subgroups defined by one or more key variables. For example, you might select half your patients randomly from the male patients and half randomly from the female patients. That way, half are guaranteed to be male and half female, but within each gender they are still selected randomly, for example based on their day of birth.
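A sketch of the half-male, half-female selection as stratified random sampling, assuming Python's standard library; `stratified_sample` and the record fields `id` and `sex` are hypothetical. Records are grouped by stratum and the same number is drawn at random from each.

```python
import random

def stratified_sample(records, stratum_of, n_per_stratum, seed=None):
    """Draw the same number of records at random from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for rec in records:
        strata.setdefault(stratum_of(rec), []).append(rec)
    sample = []
    for key in sorted(strata):  # fixed stratum order keeps results reproducible
        # raises ValueError if a stratum has fewer than n_per_stratum records
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample

# Hypothetical pool with twice as many males as females.
records = [{"id": i, "sex": "F" if i % 3 == 0 else "M"} for i in range(3000)]
sample = stratified_sample(records, lambda r: r["sex"], 100, seed=42)
```

Even though the pool is two-thirds male, the sample is exactly half male and half female, and selection within each group remains random.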
