What parameters should we include in a propensity score matching (PSM) analysis? Is it possible to apply the technique to secondary data? What would be the ideal sample size for the treatment and control groups?
When conducting propensity score analysis (PSA), all variables that affect both treatment selection and the outcome must be included to avoid bias.
For example, if you are measuring the effectiveness of Drug A vs. Drug B, you must include age if users of Drug A tend to be older and your outcome is medical cost, which also trends higher for older users.
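To make the Drug A vs. Drug B example concrete, here is a small simulated sketch. Every number in it (the age range, the cost formula, the assumed true effect of 500) is invented for illustration and is not from any real study. It uses plain regression adjustment for age rather than PSA itself, only to show how much an omitted confounder can distort a naive comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical population: ages spread evenly between 30 and 80.
age = rng.uniform(30, 80, n)

# Treatment selection depends on age: older patients more often get Drug A.
p_drug_a = 1.0 / (1.0 + np.exp(-(age - 55) / 8))
drug_a = rng.random(n) < p_drug_a

# Outcome depends on age too: cost rises with age.
# The true effect of Drug A on cost is set to 500 (an assumption of this sketch).
true_effect = 500.0
cost = 1000 + 100 * age + true_effect * drug_a + rng.normal(0, 500, n)

# Naive comparison ignores age and mixes the drug effect with the age effect.
naive = cost[drug_a].mean() - cost[~drug_a].mean()

# Adjusting for age via least squares: cost ~ intercept + drug_a + age.
X = np.column_stack([np.ones(n), drug_a.astype(float), age])
beta, *_ = np.linalg.lstsq(X, cost, rcond=None)
adjusted = beta[1]

print(f"true effect:     {true_effect:.0f}")
print(f"naive estimate:  {naive:.0f}")
print(f"age-adjusted:    {adjusted:.0f}")
```

In this setup the naive difference is far above the true effect because the Drug A group is older, while the age-adjusted estimate lands close to it.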
There is no single standard for sample size requirements, because there are many kinds of PSA and because the requirement depends on how closely the case and control populations resemble each other on the selected confounders before PSA. Variants of PSA include PS stratification, PS weighting, and PS matching, along with doubly robust regression methods. I typically find that a 3:1 control-to-case ratio is necessary for PS matching, but that is a heuristic based only on my experience.
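As a rough illustration of two of these variants, the sketch below estimates a propensity score with a hand-rolled logistic regression and then applies PS weighting and 1:1 nearest-neighbor PS matching (with replacement). It is a minimal sketch under stated assumptions, not a recommended implementation: the data are simulated, age is the only confounder, the true effect of 500 is invented, and the helper `fit_logistic` is a hypothetical name. The treated share is set near 25% so the pool has roughly a 3:1 control-to-case ratio.

```python
import numpy as np

def fit_logistic(X, t, n_iter=25):
    """Logistic regression by Newton's method; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (t - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(1)
n = 20_000

# Simulated data (all values are assumptions): age drives both treatment and cost.
age = rng.normal(55, 10, n)
treated = rng.random(n) < 1.0 / (1.0 + np.exp(-(-1.3 + (age - 55) / 10)))
true_effect = 500.0
cost = 1000 + 100 * age + true_effect * treated + rng.normal(0, 500, n)

naive = cost[treated].mean() - cost[~treated].mean()

# Step 1: estimate the propensity score P(treated | age).
z = (age - age.mean()) / age.std()          # standardize for numerical stability
X = np.column_stack([np.ones(n), z])
beta = fit_logistic(X, treated.astype(float))
ps = 1.0 / (1.0 + np.exp(-(X @ beta)))

# Step 2a: PS weighting (normalized inverse probability weights).
w1 = treated / ps
w0 = (~treated) / (1.0 - ps)
ipw_ate = (w1 * cost).sum() / w1.sum() - (w0 * cost).sum() / w0.sum()

# Step 2b: PS matching. Each treated unit is paired with the control whose
# propensity score is closest; controls may be reused.
order = np.argsort(ps[~treated])
ps_c = ps[~treated][order]
cost_c = cost[~treated][order]
ps_t = ps[treated]
idx = np.clip(np.searchsorted(ps_c, ps_t), 1, len(ps_c) - 1)
use_left = np.abs(ps_t - ps_c[idx - 1]) <= np.abs(ps_c[idx] - ps_t)
nearest = np.where(use_left, idx - 1, idx)
att_matched = (cost[treated] - cost_c[nearest]).mean()

print(f"controls per case: {(~treated).sum() / treated.sum():.2f}")
print(f"naive difference:  {naive:.0f}")
print(f"PS weighting:      {ipw_ate:.0f}")
print(f"PS matching:       {att_matched:.0f}")
```

In practice you would also check covariate balance after weighting or matching, and the result is only as good as the set of confounders that went into the propensity model.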
An excellent, recently published book on PSA is Paul Rosenbaum’s Observation and Experiment. Along with Donald Rubin, Rosenbaum co-authored the paper that introduced PSA, published in Biometrika in 1983 (The Central Role of the Propensity Score in Observational Studies for Causal Effects). That paper is available online, but it is mostly devoted to the theory, which shows that, correctly implemented, PSA is as rigorous as Fisher’s randomized experiments. Rosenbaum’s text is much more helpful for understanding how to implement the method in practice.