You need to be much more specific. Are you testing means, percentages, or a regression model? Is it a Phase II trial with different strata or multiple stages? Is there pilot data that you can use for rough estimates to help determine sample size? You also need to decide what size of difference you want to detect: what difference from the control would be considered clinically meaningful?
Sample sizes for different tests and designs are calculated quite differently. The attached link has a lot of calculators for basic tests.
What is your population? All pregnant females, or a subset?
What is the minimum difference that you care about? If you had 4000 surveys and 1999 showed a negative effect while 2001 showed a positive effect, would you care? Is that a large enough difference to be a biologically meaningful outcome? How about if the numbers were 200 versus 3800? The closer these numbers are to each other, the larger the sample size you will need.
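To make the last point concrete, here is a rough sketch of how the required sample size grows as the split gets closer to 50/50. It assumes a one-sample test of a proportion against 0.5, using the usual normal approximation, with a two-sided alpha of 0.05 and 80% power. Those settings and the function name `n_one_proportion` are my own choices for illustration, not something from the question, and a different design would need a different formula.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_one_proportion(p1, p0=0.5, alpha=0.05, power=0.80):
    """Approximate n needed to detect a true proportion p1 against a null
    value p0 (normal approximation, two-sided test).

    Assumed defaults: alpha = 0.05, power = 0.80.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)


# The two splits mentioned above: 2001 vs 1999 and 3800 vs 200 out of 4000.
n_near_even = n_one_proportion(2001 / 4000)
n_lopsided = n_one_proportion(3800 / 4000)

print("2001 vs 1999:", n_near_even)
print("3800 vs 200: ", n_lopsided)
```

Under these assumptions the near-even split needs a sample in the millions, while the lopsided one needs only a handful of observations.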
You will also need some idea of the variability in your population.
You also need to be careful about how you define "safe" and "effective". These are two different questions. Suppose that out of 8000 mothers, 4000 receive the treatment and 4000 are controls. Half of the treatment group (2000) die after treatment, whereas none die in the control group. However, all 2000 surviving mothers in the treatment group get better, whereas only two get better in the control group. This treatment is effective but not safe.
There are sample size calculators available on the internet. I have played with G*Power 3.1 a bit. I find them less than useful because they require knowledge about the system that I typically don't have until after the experiment. However, you could put in a range of different values to see how sample size changes with different inputs. Then you can ask a different question: Given that I have resources to gather 4000 samples, and given that I will have a standard deviation of X, what is the smallest difference between treatments that I can detect as significant? Given this outcome, is the project worth doing? Or, given my confidence in what I entered for standard deviation, is my probability of failure acceptable? Or, given that my chance of success is 20%, do the people providing funding want to improve the odds by providing more funding? If nothing else, find other surveys and try this exercise using the variability that was present in their data.
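The "smallest detectable difference" question can be sketched in a few lines. This assumes a comparison of two group means with equal group sizes, a common standard deviation, the normal approximation, a two-sided alpha of 0.05 and 80% power. The function name and the standard deviation values in the loop are placeholders I made up; substitute whatever range seems plausible for your data.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_difference(n_per_group, sd, alpha=0.05, power=0.80):
    """Smallest difference in means detectable with the given power.

    Assumes two equal-sized groups with a common standard deviation and
    a two-sided z-test (normal approximation).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * sd * sqrt(2 / n_per_group)


# 4000 samples in total, split evenly between treatment and control.
n_per_group = 2000

# Hypothetical standard deviations; try the range you think is plausible.
for sd in (5, 10, 20):
    diff = min_detectable_difference(n_per_group, sd)
    print(f"sd = {sd:>2}: smallest detectable difference is about {diff:.2f}")
```

If the smallest detectable difference for a plausible standard deviation is larger than any difference you would consider meaningful, that is the point at which to ask whether the project is worth doing as budgeted.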
Ethical issues in using human subjects may override any other consideration in sample size. It would also be useful to consult a statistician at your university, especially one versed in this type of experimental design.