Adilson, this expression sounds familiar but strange to me. Representativeness is always a concern in sampling. If I may be "philosophical": is there also such a thing as non-representative sampling? Please bear with me. Ed
Non-representative sampling is very common. Over-sampling some categories can improve efficiency, but may also introduce bias. Non-representative samples can prove very useful, but they have to be handled with caution. If you do not master the topic, it is simpler to use representative samples, i.e. samples in which the distribution of your variables of interest is the same as in your population.
Yes, simpler, but useless if the variable you record is categorical and some categories you are interested in occur in very small proportions: in this case, you may very well find yourself with no individuals from these categories.
Stratification (leading to such a "non-representative" sample) can help. But you are right, specific statistical tools then have to be used to analyse the data, and a drawback is that your sample does not allow you to estimate the proportions of the different categories you are interested in.
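To illustrate the point about rare categories, here is a small sketch with assumed numbers of my own (population size, category share, sample sizes are not from this discussion): a proportional sample can easily contain no units at all from a 0.2% category, while a stratified design that deliberately over-samples it guarantees coverage; the resulting sample then no longer reflects the population proportions without further adjustment.

```python
# Hypothetical illustration: a rare category can vanish from a proportional
# ("representative") sample, while a stratified design that deliberately
# over-samples it guarantees coverage.
import numpy as np

rng = np.random.default_rng(42)

N = 100_000                             # population size (assumed)
rare_share = 0.002                      # 0.2% of units belong to the rare category
category = rng.random(N) < rare_share   # True = rare-category member

n = 500                                 # total sample size

# Proportional (self-weighting) simple random sample
srs_idx = rng.choice(N, size=n, replace=False)
print("rare units in SRS:", category[srs_idx].sum())   # often 0 or 1

# Stratified design: force 100 of the 500 draws to come from the rare stratum
rare_idx = np.flatnonzero(category)
common_idx = np.flatnonzero(~category)
n_rare = min(100, rare_idx.size)
strat_idx = np.concatenate([
    rng.choice(rare_idx, size=n_rare, replace=False),
    rng.choice(common_idx, size=n - n_rare, replace=False),
])
print("rare units in stratified sample:", category[strat_idx].sum())  # 100
```

The unweighted stratified sample, of course, no longer mirrors the population proportions, which is exactly the drawback mentioned above.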
A sample is a subset of a defined population. We always try to draw a sample from the population, by probability or non-probability methods, with small sampling error and minimal bias. Probability sampling with small sampling error and minimal bias reflects the distribution of the defined population, so a probability sample may be representative of the population. Non-probability sampling represents only restricted areas.
"Representativeness" in a sample, as indicated by answers above, may mean different things to different people.
Consider this, regarding sampling from finite populations: At one place I worked for many years, I had a supervisor who threw that term (representative sample) around continuously. It made it into documents declaring that our samples were "statistically representative." What he likely really meant to convey was a sense that what we did was valid. I think most people do take that to mean that probability (random-based) methodology is used, which made it somewhat amusing when I noticed he was still using the same wording long after I switched to model-based estimation and quasi-cutoff sampling. :-)
In the 1940s there was much controversy between proponents of randomized sampling and 'purposive' sampling. Randomized sampling won. As I think Ken Brewer once put it, people conceive of a random sample as "fair." And I suppose that purposive sampling back then might generally have meant someone's expert opinion of a 'representative' sample. But even if it were better, how do you do inference from such a purposive sample? What would the variance be?
I read some old idea about what 'representative' might mean. I wish I could remember where I saw it. I think it was even different than I imagined.
Today there are a number of different forms of purposive sampling. Mike Brick (great name for a 1940s private detective, like Sam Spade or Mike Hammer, eh?) was the Washington Statistical Society's (ASA chapter) President's Invited Seminar speaker (last March, I think), and he talked about how varied the types of purposive sampling are. My feeling is that by far the best occurs when you have good regressor data to do model-based estimation, and you know how to stratify so that each model application is only applied to data which should be modeled together.
My experience is mostly in establishment surveys, with continuous data, and there are basically three methods in use. Ray Chambers (an Australian, as is Ken Brewer), had a seminar a couple of years ago, where he used this same breakdown, as would many other survey/mathematical statisticians:
(1) design-based (randomized) sampling and estimation methodology,
(2) model-based (regression) methodology, and
(3) model-assisted design-based methodology.
Note that using a model-assisted design-based method means that you sample using randomized methods, but your estimation takes into account 'auxiliary data' (which are really 'regressor' data under model-based estimation). The advantage is that the auxiliary data (on the entire population) adjusts your results to compensate for random sampling that may not be very 'representative.' That is, if you randomly picked only the smallest or only the largest members of a population in your random sample, the 'model-assisted' part would compensate to a degree, during the estimation phase.
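To make that "model-assisted" idea concrete, here is a minimal sketch with made-up numbers and variable names of my own (the lognormal population, the sample size, and so on are assumptions, not anything from this post): the simple ratio estimator uses the known population total of the auxiliary variable to correct an expansion estimate from an "unlucky" random draw.

```python
# Sketch of the model-assisted idea: a ratio estimator that uses
# population-wide auxiliary data x to correct a simple expansion estimate.
import numpy as np

rng = np.random.default_rng(1)

N = 2_000
x = rng.lognormal(mean=3.0, sigma=1.0, size=N)   # auxiliary data, known for all N units
y = 2.5 * x * rng.normal(1.0, 0.15, size=N)      # survey variable (only sampled values observed in practice)

true_total = y.sum()

n = 40
s = rng.choice(N, size=n, replace=False)         # simple random sample

# Pure design-based expansion estimator (no auxiliary information)
expansion = N * y[s].mean()

# Model-assisted ratio estimator: rescale by how well the sample covered x
ratio = x.sum() * (y[s].sum() / x[s].sum())

print(f"true total            : {true_total:,.0f}")
print(f"expansion (N * ybar)  : {expansion:,.0f}")
print(f"ratio (model-assisted): {ratio:,.0f}")
```

When the draw happens to under- or over-cover x, the ratio term scales the estimate accordingly, which is exactly the compensation described above.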
In random sampling (simple or stratified) you have survey weights. (Even probability proportionate to size - PPS - sampling has a sort of built in weight for each respondent.) For random sampling, we generally use a "w" for this survey design weight. For model-based estimation, we also generally use a "w" for the regression weight. However, one type of model-assisted design-based methodology uses "calibration weights," again "w," which combine both survey design weights (magically renamed "d"), and regression weights (now named "c" or another letter I forget at the moment). Calibration weights are survey weights that are modified to account for the better representativeness that a model can impose. Models can also be used to help decide what design-based method is used, as I have seen in work by Ken Brewer, and furthered later by Anders Holmberg (Statistics Sweden). Also, Ken Brewer wrote a book published in 2002 on better combining design-based and model-based methods.
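As a minimal sketch of the calibration idea, assuming a single auxiliary variable and a simple random sample (all names and numbers below are my own, not from the post): the design weights d are multiplied by one adjustment factor so that the weighted auxiliary total reproduces the known population total.

```python
# Minimal calibration-weighting sketch with one auxiliary total: design
# weights d are adjusted so that the weighted x-total matches X_pop.
import numpy as np

rng = np.random.default_rng(7)

N, n = 5_000, 100
x = rng.gamma(shape=2.0, scale=50.0, size=N)     # auxiliary variable, known for every unit
X_pop = x.sum()                                  # known population total of x

s = rng.choice(N, size=n, replace=False)         # simple random sample
d = np.full(n, N / n)                            # design weights d_i = 1 / pi_i

# Ratio calibration: one multiplicative adjustment g so that sum(w_i * x_i) = X_pop
g = X_pop / np.sum(d * x[s])
w = d * g                                        # calibration weights

print("weighted x total with d:", np.sum(d * x[s]).round())
print("weighted x total with w:", np.sum(w * x[s]).round(), "vs X_pop:", X_pop.round())
```

With several auxiliary totals, the adjustment is found by raking or a least-squares (GREG-type) fit rather than a single ratio, but the principle is the same.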
So what does representativeness mean? I think that many may mean that a "representative sample" should help you estimate well for a finite population. We may think random is best, but that is not necessarily true. As humans, we may first tend to think of randomization. But as noted more than once in the TV series "Numbers," we tend to confuse 'random' with 'uniform.' And here, as Sergio noted, we may mean that the sample has the same distribution as the population, finite or not.
I think that you can't beat having good "auxiliary"/regressor data, and good stratification. But to avoid the appearance of manipulation, having rules for selection is helpful. Some like "balanced" sampling, where you select a sample that has the same mean for its associated regressor (or linear combination of regressors) as the corresponding mean for the population. However, I have found that, looking at this from a "total survey error" point-of-view, it is much more accurate (representative?) for highly skewed establishment surveys not to do this. A technique that at least one other person and I have dubbed "quasi-cutoff sampling" appears to generally be much better for establishment surveys. But at any rate, the key is good regressor data and logical stratification. The goal here is to estimate aggregate-level information, however, so the sample is representative in the sense that it will yield good estimates for those aggregate values for the population - not necessarily in that it has the same distribution as the population.
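Here is a rough sketch of the quasi-cutoff idea as I understand it from the description above (my own simplification with assumed numbers, not necessarily the author's exact procedure): sample the largest units by the regressor x, then predict the contribution of the unsampled small units from the fitted ratio.

```python
# Cutoff-style sampling with ratio prediction for a skewed establishment
# population: observe the largest units by x, predict the remainder.
import numpy as np

rng = np.random.default_rng(3)

N = 1_000
x = rng.lognormal(mean=4.0, sigma=1.5, size=N)   # prior-period value (regressor), known for all units
y = 1.1 * x * rng.normal(1.0, 0.1, size=N)       # current value, observed only for sampled units

order = np.argsort(x)[::-1]                      # largest units first
take = order[:100]                               # cutoff sample: top 100 by x
rest = order[100:]

b = y[take].sum() / x[take].sum()                # ratio model fitted on the sampled units
estimate = y[take].sum() + b * x[rest].sum()     # observed part + predicted remainder

print(f"true total     : {y.sum():,.0f}")
print(f"cutoff estimate: {estimate:,.0f}")
print(f"x-total covered by sample: {x[take].sum() / x.sum():.1%}")
```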
Another view of "representativeness" in previous responses to the question above had to do with 'oversampling,' such as that proposed at times by the US Bureau of the Census, and others worldwide, I'm sure, to try to obtain good results for smaller segments of the population. That might be considered unrepresentative in some sense overall, or perhaps more representative, in order to not fail at estimating well for each and every segment of the population. I prefer to think of it as the latter. :-) At any rate, I think that this is yet another legitimate way to think of the word 'representative.'
For those who might be interested and do not read French, a short synopsis of the paper I mentioned above:
Section 2 discusses various "definitions" (or attempts at defining the term) in the literature and their often circular nature.
Section 3 gives a formal definition of a representative sample, as follows:
Definition 1
A characteristic of a population of size N is a vector of size N which records, for this population, the value taken by each population unit at a given time (e.g. the age of each person)
Definition 2
The set of characteristics of a population of size N is an N x K matrix which records, for this population, the values taken by each population unit for each of the K characteristics (e.g. age, height, socio-professional category (CSP), ...)
Definition 3: representative sample for a characteristic
A sample E composed of n units {u_i} (i in a set S) is representative of the characteristic C_k of a population P of size N if there is a probabilistic method for drawing a unit u_i from E such that the probability law of C_(i,k) (the value of this characteristic for a unit u_i taken at random in the sample) is equal to the empirical distribution F_N(C_k) of this characteristic in the population P
Definition 4: representative sample of a finite population
A sample E composed of n units {u_i} (i in a set S) is representative of a finite population P if there is a probabilistic method for drawing a unit u_i from E such that the joint probability law of (C_(i,1), ..., C_(i,K)) for a unit u_i taken at random in the sample is equal to the empirical joint distribution of the characteristics in the population P, that is F_{E_1}(C_1, ..., C_K) = F_N(C_1, ..., C_K)
Property 1
The population P is a representative sample of the population P
Property 2
Simple random sampling produces a representative sample of the population P
Property 3
Assume E is a sample of n individuals from a population P of size N, and that E has been obtained by a probabilistic sampling method; if there is a probabilistic way of drawing u_i in E with P(u_i is in E_1) = 1/N for all i = 1, ..., N, then E is a representative sample of P
Hence, "in words", a sample is representative if its construction is "equivalent" to the construction of a simple random sample
Property 4
If E is a sample of n individuals from a population P of size N, if E has been obtained by a probabilistic sampling method with known inclusion probabilities, and if these probabilities are all greater than or equal to 1/N, then E is a representative sample of P
Section 4 shows that the quota method builds a representative sample, in the sense of the definition of Section 3, if and only if each population unit has the same probability of being selected
Section 5 deals with a posteriori reweighting and reminds the reader that reweighting a non-representative sample does not achieve much in terms of representativeness
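As a small Monte Carlo check of Definition 3 and Property 2 (my own illustration, with assumed population sizes and an invented "age" characteristic): under simple random sampling, the law of the characteristic of a unit picked at random from the sample should match the empirical distribution F_N of that characteristic in the population.

```python
# Simulation check: draw a simple random sample E, then pick one unit u_i
# at random within E; over many repetitions the law of its characteristic
# should match the population's empirical distribution F_N.
import numpy as np

rng = np.random.default_rng(0)

N, n, reps = 1_000, 50, 20_000
ages = rng.integers(18, 90, size=N)                  # characteristic C_k: age of each unit

draws = np.empty(reps)
for r in range(reps):
    sample = rng.choice(N, size=n, replace=False)    # simple random sample E
    unit = rng.choice(sample)                        # unit u_i drawn at random within E
    draws[r] = ages[unit]

# Compare a few quantiles of the simulated law with F_N
qs = [0.25, 0.5, 0.75]
print("population quantiles:", np.quantile(ages, qs))
print("simulated quantiles :", np.quantile(draws, qs))
```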
I largely agree with your comments. However, let's examine this part of what you included: "In general, random samples provide a good approximation of the population and offer better assurance against sampling bias; thus are more representative than non-probability samples." Often that is true, but there are some drawbacks, even with regard to accuracy. First, for any kind of sampling, there are potential problems if you do not stratify when necessary. Perhaps most relevantly here, if the sample size (even by stratum) is small, then the chances of drawing a 'representative' sample at random can be very unsatisfactory. Depending upon the data distribution, the estimates of variance and bias can also be very inaccurate. But if you have auxiliary data on the population which can be used for modeling, then this will generally solve this problem. Then one might use a sample not necessarily taken at random. (See "balanced sampling," and for highly skewed establishment survey populations, cutoff or quasi-cutoff sampling with prediction and stratification.)
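As a hedged illustration of the small-sample point (the lognormal population and sample size below are my own assumptions): a single small random draw from a highly skewed population frequently gives a very poor estimate, even though the procedure is unbiased on average across repeated draws.

```python
# Repeated small simple random samples from a highly skewed population:
# unbiased on average, but any single draw can be badly off.
import numpy as np

rng = np.random.default_rng(11)

N, n, reps = 10_000, 20, 5_000
y = rng.lognormal(mean=0.0, sigma=2.0, size=N)   # highly skewed population values
true_mean = y.mean()

est = np.array([y[rng.choice(N, size=n, replace=False)].mean() for _ in range(reps)])

rel_err = np.abs(est - true_mean) / true_mean
print(f"average estimate / true mean : {est.mean() / true_mean:.3f}")   # close to 1 (unbiased)
print(f"share of draws off by > 25%  : {(rel_err > 0.25).mean():.1%}")  # often a large share
```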
...
The paper at the following link was written by KRW Brewer (Ken Brewer) as a result of his selection about three years ago as the Waksberg Award winner, for survey statistics, and explains the history of this thinking, and concludes with his recommendation as to when to use which method:
Brewer, K.R.W. (2013), "Three controversies in the history of survey sampling," Survey Methodology, Vol. 39, No. 2 (December 2013), pp. 249-262. Statistics Canada, Catalogue No. 12-001-X.
The key is to have good regressor data available, which often is true, especially for periodically collected official data. (Note that in the example model, he uses the model-based classical ratio estimator (CRE).)
...........
In addition, there is the following by Mike Brick, who was selected about three years ago as the Washington Statistical Society's (a chapter of the ASA) President's Invited Speaker, on the topic of the variety of types of nonprobability sampling:
J. Michael Brick on Inference from Nonprobability Sampling:
Adilson, you stated that "Probability sampling I know, but it [representativeness] seems be different." Yes, they are different. First, representativeness can have varying definitions, but regardless, one can say that probability-based sample selection is just an attempt at obtaining a representative sample. This would happen, on "average," if you were to repeat your sample selection infinitely many times. But you only select once, and not even bootstrapping can tell you what you did not select. Still, hoping for 'representativeness' this way is very often a reasonable thing to do, and often even the best available. Further, estimates of variance and bias can be obtained, though they can be quite inaccurate themselves, and most bias and much of the variance often come from nonsampling error, such as measurement error, which can basically just be modeled anyway. (Probability sampling with a small sample from a population with multiple modes, skewness, or other unusual distributional features can be particularly prone to substantial failure.)
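A small sketch of that "on average" point, with assumed numbers of my own: for a population whose smaller mode contains 15% of the units, the expected composition of a small random sample is exactly right, yet a single draw can contain no units at all from that mode.

```python
# Probability sampling is 'representative' only in expectation: track the
# sample share of a small second mode across repeated small random draws.
import numpy as np

rng = np.random.default_rng(5)

N, n, reps = 10_000, 15, 10_000
in_mode2 = rng.random(N) < 0.15      # 15% of units form the population's second mode

share = np.array([
    in_mode2[rng.choice(N, size=n, replace=False)].mean() for _ in range(reps)
])

print(f"average sample share of the mode : {share.mean():.3f}")         # ~0.15: right on average
print(f"draws containing NO unit from it : {(share == 0).mean():.1%}")  # close to 9% of draws
```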
Chalamalla noted the following -
Bevins, Duke, & Bevins: Representativeness - means that the characteristics of the population and the sample are congruent
This is a good definition, but note that, in many examples, a single draw of a probability sample may not come even close to achieving this.
Thus, Adilson, you are correct to recognize that representativeness and probability sampling "seem different." They are. The latter is only an attempt to obtain an approximation of the former characteristic for a sample. Very often it is a good idea. But not always. (See Brewer paper linked previously. He generally liked combining probability of selection methodology with regression modeling.)
First of all, "representative" should always be specified with respect to particular characteristic(s) (gender, age, education, ...).
Second, a representative sample is not the same as representative sampling.
* The former ensures representativeness EX POST, possibly using weights. You first draw a sample and then compute weights for each observation in the sample to make it representative, i.e. to equalize the (weighted) distribution of the characteristic(s) of interest in the sample and in the population (or to minimize the difference between the distributions).
* The latter ensures representativeness EX ANTE, but only statistically. You draw your sample with the same distribution as the population, but you may end up, by pure chance on the random draws, with different distributions.
Note that it may be efficient to over-sample small categories in the sampling strategy, and then to use weights (small weights on the over-sampled categories) to compensate and ensure ex post that your sample is representative. If you use the weights in all your statistics and regressions, your results will be unbiased, and more efficient (lower variance) than with a representative sampling strategy.
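A minimal sketch of this over-sample-then-reweight strategy, with an assumed population and inclusion probabilities of my own choosing: inverse-probability weights restore approximately unbiased population-level estimates from the deliberately non-representative sample.

```python
# Over-sample a small category, then reweight: inverse-probability weights
# recover the population mean from the non-representative sample.
import numpy as np

rng = np.random.default_rng(9)

N = 50_000
small = rng.random(N) < 0.05                     # 5% of units in the small category
y = np.where(small, rng.normal(30, 4, N), rng.normal(10, 4, N))

# Over-sample: inclusion probability 10x higher for the small category
pi = np.where(small, 0.02, 0.002)
selected = rng.random(N) < pi
w = 1.0 / pi[selected]                           # design (inverse-probability) weights

unweighted = y[selected].mean()                  # biased toward the over-sampled category
weighted = np.average(y[selected], weights=w)    # approximately unbiased

print(f"population mean : {y.mean():.2f}")
print(f"unweighted mean : {unweighted:.2f}")
print(f"weighted mean   : {weighted:.2f}")
```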