Different authors use different numbers of samples for source apportionment of particulate matter pollution. What is the minimum sample size that can be considered for PMF in a source apportionment study of atmospheric particulate matter?
In source apportionment, airborne particulate matter concentrations are apportioned among the contributing sources, either by chemical mass balance (CMB) analysis or, in environmental studies, by methods such as Positive Matrix Factorization (PMF) that estimate the number and compositions of the sources as well as their contributions to the samples taken at the receptors. In some cases, for example where prior knowledge of the composition of emission sources is available (as in CMB), the minimum number of samples may be determined using the criterion suggested by Henry et al. (1984):
n > 30 + (p + 3)/2
where n is the number of samples and p is the number of chemical components in each sample.
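As a worked example, here is a minimal sketch of that criterion, assuming it is read in the form in which it is usually quoted, n > 30 + (p + 3)/2, with the 30 outside the fraction. The function name `henry_min_samples` and the example values of p are illustrative only.

```python
import math


def henry_min_samples(p: int) -> int:
    """Smallest integer n that satisfies n > 30 + (p + 3) / 2.

    p is the number of chemical components measured in each sample.
    """
    threshold = 30 + (p + 3) / 2
    # The inequality is strict, so step to the next integer above the threshold.
    return math.floor(threshold) + 1


if __name__ == "__main__":
    for p in (10, 20, 33):
        print(f"p = {p:2d} components -> at least {henry_min_samples(p)} samples")
```

With roughly 33 measured species this gives 49 samples, which is in the same range as the "about 50" figure that comes up elsewhere in this thread.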
In environmental studies, receptor models are used to identify and apportion the contributions of various sources to airborne particulate matter concentrations. These studies use multivariate data analysis methods to estimate the number and compositions of the sources as well as their contributions to the samples taken at the receptors. Factor analysis (FA) and Principal Components Analysis (PCA) are the most common methods for this. Both need fairly large datasets with reasonable uncertainties to produce good results, but such datasets are often difficult and expensive to obtain. Moreover, it may be necessary to differentiate source contributions between seasons or during specific pollution episodes of limited duration, in which case a large dataset may not be available. As a rule of thumb, based on prior knowledge or the results of a pilot survey, the number of chemical species/elements multiplied by ten gives an appropriate sample size per season for FA and PCA.
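The ten-times rule of thumb is simple arithmetic, sketched below. The function name, the default factor argument, and the example numbers are assumptions for illustration; the only thing taken from the post is "number of species multiplied by ten, per season".

```python
def rule_of_thumb_samples(n_species: int, n_seasons: int = 1, factor: int = 10) -> dict:
    """Sample sizes implied by the 'species x 10 per season' rule of thumb."""
    per_season = n_species * factor
    return {"per_season": per_season, "total": per_season * n_seasons}


if __name__ == "__main__":
    # Hypothetical study: 12 measured species, apportioned separately in 2 seasons.
    result = rule_of_thumb_samples(n_species=12, n_seasons=2)
    print(result)
```

The rule becomes demanding quickly: a study that measures about 30 species would need about 300 samples per season, which is why smaller, better-characterised species lists are often used for seasonal or episode analyses.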
PMF works best with large datasets (n > 100). See Zhang, Y., Sheesley, R. J., Bae, M.-S., & Schauer, J. J. (2009). Sensitivity of a molecular marker based positive matrix factorization model to the number of receptor observations. Atmospheric Environment, 43(32), 4951-4958. doi: http://dx.doi.org/10.1016/j.atmosenv.2009.07.009
You should normally aim for at least 50 valid filter samples for PMF. Each sample is analyzed for about 20 elements + 10 water-soluble ions + BC, EC/OC (and fractions). More samples would be better, though, for this kind of statistical/multivariate analysis.
30 is the minimum or the “Magic Number” for Sample Size. The number 30 is bandied about as a sweet spot that should get the job done. Read on to find out why: http://www.jedcampbell.com/?p=262
It depends on the speciated dataset being analysed; however, 50 samples or more can give good results if the uncertainties are low and there are few missing values and few values below detection limits.
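One way to check those conditions before running a model is to count, for each species, how many values are missing or below the detection limit. The sketch below does this in plain Python; the data layout (a list of concentrations with `None` for missing values), the function name, and the numbers are all hypothetical.

```python
from typing import Optional, Sequence


def screening_summary(values: Sequence[Optional[float]], detection_limit: float) -> dict:
    """Count missing, below-detection-limit, and above-detection-limit values for one species."""
    n = len(values)
    missing = sum(v is None for v in values)
    below_dl = sum(v is not None and v < detection_limit for v in values)
    above_dl = n - missing - below_dl
    return {
        "n": n,
        "missing": missing,
        "below_dl": below_dl,
        "above_dl": above_dl,
        "above_dl_fraction": above_dl / n if n else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical concentrations for one species; None marks a missing value.
    concentrations = [0.8, None, 0.05, 1.2, 0.02, 0.9]
    print(screening_summary(concentrations, detection_limit=0.1))
```

A species for which only a small fraction of values lie above the detection limit contributes little information, so the effective sample size for that species is smaller than the number of filters collected.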
I have selected four representative sites (commercial, residential, industrial, bus terminal) in a city (approx. 70 sq. km) for source apportionment through PCA. Do I need 30 samples from each site, or a total of 30 samples from the whole city?
30 samples from each site, as you will be modelling each individual site separately.
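Because each site is modelled separately, the count has to be checked per site rather than for the city as a whole. A minimal sketch, assuming a hypothetical sample log with one site label per collected sample and the minimum of 30 suggested above:

```python
from collections import Counter
from typing import Dict, Iterable


def sites_below_minimum(site_labels: Iterable[str], minimum: int = 30) -> Dict[str, int]:
    """Return each site whose sample count is below the minimum, with its count."""
    counts = Counter(site_labels)
    return {site: n for site, n in counts.items() if n < minimum}


if __name__ == "__main__":
    # Hypothetical log: one entry per valid sample, labelled by site type.
    log = (["commercial"] * 32 + ["residential"] * 30
           + ["industrial"] * 18 + ["bus terminal"] * 25)
    print(sites_below_minimum(log))
```

In this made-up log the city total is 105 samples, yet two of the four sites still fall short of 30, which is the situation the per-site requirement is meant to catch.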
Can anyone please help with the frequency of sampling? I am doing a source apportionment study of a particular periodic event, and I have taken around 5 to 6 samples during that event. When I run the model with my dataset, do I have to input each month's data or only the period under consideration? The total number of samples is 30, taken over a period of six months. Also, does PMF automatically exclude outliers? For example, during one specific month the pollutant concentrations are almost double those of the other five months.
It would be very helpful if anyone can provide their expert views regarding this.
Several authors, including EPA (https://www.epa.gov/sites/production/files/2015-02/documents/pmf_5.0_user_guide.pdf), recommend at least 100 samples for PM2.5 datasets. According to Reff et al. (2007) (http://www.tandfonline.com/doi/abs/10.1080/10473289.2007.10465319), relative errors increase as sample size declines.