My colleagues and I conducted a study on the prevalence of gambling addiction in Ohio. The study was based on a household telephone survey of 3,600 Ohioans (18+), conducted to obtain baseline prevalence data before the casinos opened. We used multistage random area probability samples (selection of an individual within a zip code), sampling people from the entire state (1,200) and oversampling people from each of the four regions where casinos were going to be located (600 x 4). The data were weighted on age, race, and gender to more closely reflect the population and to let us generalize from the sample to the population of adults in Ohio and to the four county clusters. Originally, we analyzed the weighted data from the state and from each region separately, but we are now considering combining all of the data into a single dataset in order to run more sophisticated analyses.
My question is this: If we combine the data, should we “rake” it so that the oversampled regions don't bias the analysis? If so, how should we do this, and what resources are there for learning the raking technique?
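
To make the question concrete, here is a minimal sketch of what I understand raking (iterative proportional fitting) to be: the weights are rescaled one variable at a time until the weighted share of every category matches a known population share. Everything in it is an assumption for illustration only. The `rake` and `weighted_shares` functions are hypothetical helpers, the ten-row `sample` is toy data rather than our survey, and the `targets` shares are invented, not census figures for Ohio.

```python
from collections import defaultdict


def weighted_shares(rows, weights, var):
    """Return the weighted share of each category of `var`."""
    totals = defaultdict(float)
    for row, w in zip(rows, weights):
        totals[row[var]] += w
    grand_total = sum(weights)
    return {cat: t / grand_total for cat, t in totals.items()}


def rake(rows, targets, max_iter=200, tol=1e-10):
    """Rake weights so weighted margins match the target shares.

    rows    : list of dicts, one per respondent
    targets : {variable: {category: population share}}, shares sum to 1
    """
    weights = [1.0] * len(rows)  # start from equal weights (an assumption)
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            current = weighted_shares(rows, weights, var)
            for i, row in enumerate(rows):
                # Scale each respondent by target share / current share.
                factor = shares[row[var]] / current[row[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:  # all margins already match
            break
    return weights


# Toy data: the casino region is 60% of the sample but (hypothetically)
# only 20% of the population, mimicking an oversample.
sample = (
    [{"region": "casino_region", "gender": "F"}] * 4
    + [{"region": "casino_region", "gender": "M"}] * 2
    + [{"region": "rest_of_state", "gender": "F"}] * 1
    + [{"region": "rest_of_state", "gender": "M"}] * 3
)
targets = {
    "region": {"casino_region": 0.20, "rest_of_state": 0.80},  # invented
    "gender": {"F": 0.52, "M": 0.48},                          # invented
}

raked = rake(sample, targets)
print(weighted_shares(sample, raked, "region"))
print(weighted_shares(sample, raked, "gender"))
```

In this sketch, region is one of the raking margins, which is how I imagine the oversampled areas would be pulled back to their population share. Whether that is the right approach for a combined file, as opposed to adjusting the design weights for the different selection probabilities first, is exactly what I am unsure about.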