Mohamed Mahmoud, not really. General "area" units derived from some type of GC method using some type of GC detector (neither of which is specified, only generalized) generate results only in arbitrary units, without context. Measured values are needed.
The analytical technique of gas chromatography is very complex, with many variables. The "area" units obtained from a chromatogram do not necessarily correspond to real compounds, are not necessarily linear, do not necessarily follow a known concentration vs. area relationship, and/or may not be representative of any actual concentrations on their own. (This would be understood by someone with years of professional training in the technique. A great deal of testing and method development would be needed to get to that point, and such a general question, as posed, provides no specific details.)
Statistics can of course be applied to any data set, but when the data themselves are taken out of context, the results obtained are also out of context and of poor value. A biplot can aid in pattern recognition in data sets, but when the data are arbitrary and not well defined, the results may be misleading. We see this a lot today with some meta data experiments.
*Reminds me of the old expression: "Statistics do not lie, people do" (meaning the math behind statistics is solid, but people can use statistics incorrectly to "show" things that are invalid or untrue).
I don't totally agree with the answers. Of course the results obtained from an untargeted analysis are less reliable than those obtained from well-identified and quantified peaks. But it can still be done if you are confident that the chromatographic method is consistent and the samples have a similar nature (range of analyte concentrations, extractability, matrix effects...), which is often the case. This can be used as a preliminary study to gain knowledge about your system.
When doing this, spend some time studying the loadings of the PCA, and determine which peaks have the highest relative variance and best explain the differences between samples. Make sure that they are analytes and not column or solvent peaks etc. You can even determine which peaks you should focus on and quantify to best describe your samples.
Article GC-MS-based untargeted metabolomics reveals the key volatile...
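As a rough illustration of the loadings check described above, here is a minimal sketch in plain Python. Everything in it is hypothetical: the peak names, the area values, and the helper functions `mean_center` and `first_pc_loadings` are made up for the example, and power iteration stands in for whatever PCA routine your software actually uses. The data are only mean-centered here, so the example shows that a large but nearly constant peak (labelled as a solvent peak) gets a small loading, while the peaks that actually vary between samples rank highest.

```python
import math

def mean_center(rows):
    """Subtract each column (peak) mean so PCA describes variation between samples."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]

def first_pc_loadings(centered, iterations=200):
    """Approximate the PC1 loading vector by power iteration on X^T X."""
    n_peaks = len(centered[0])
    # Scatter matrix X^T X (proportional to the covariance matrix of the peaks).
    scatter = [[sum(row[i] * row[j] for row in centered) for j in range(n_peaks)]
               for i in range(n_peaks)]
    loadings = [1.0 / math.sqrt(n_peaks)] * n_peaks
    for _ in range(iterations):
        product = [sum(scatter[i][j] * loadings[j] for j in range(n_peaks))
                   for i in range(n_peaks)]
        norm = math.sqrt(sum(value * value for value in product))
        loadings = [value / norm for value in product]
    return loadings

# Hypothetical peak areas: rows are samples, columns are peaks.
peak_names = ["peak_A", "peak_B", "solvent"]
areas = [
    [10.0, 5.0, 100.0],
    [20.0, 9.0, 101.0],
    [30.0, 16.0, 99.0],
    [40.0, 20.0, 100.0],
]

loadings = first_pc_loadings(mean_center(areas))

# Rank peaks by the absolute size of their PC1 loading.
ranking = sorted(zip(peak_names, loadings), key=lambda pair: abs(pair[1]), reverse=True)
for name, loading in ranking:
    print(f"{name}: {loading:+.3f}")
```

A peak that ranks high in such a list still has to be checked against the chromatogram to confirm that it is an analyte and not a column or solvent artefact.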
Jokin, using GC 'area units' alone, as specified by Mohamed Mahmoud, is a sure recipe for failure and demonstrates a lack of knowledge about the technique.
Actual measured amounts, i.e. true concentrations (obtained from properly calibrated tables, using valid methods and techniques), could be used (and should be used instead), in context, but not 'area units', which have no reference point.
Area units are arbitrary units. Depending on how the sample type, method (column etc.), mode of detection and instrument settings are defined, you could collect millions of different "area unit" amounts for the same sample at the same concentration. There is no point to this at all, and it is something we teach in our entry-level classes so that students understand the meaning of any collected data and do not misuse it.
Let us always place emphasis on good techniques and understanding of the fundamentals.
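To make the point about arbitrary area units and calibration concrete, here is a minimal sketch in plain Python. All numbers are invented, the two "detectors" are hypothetical, and the response is assumed to be linear over the calibrated range, which is itself something that has to be verified for a real method. The helper names `fit_line` and `area_to_concentration` are illustrative only.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def area_to_concentration(area, slope, intercept):
    """Invert the calibration line to turn a peak area into a concentration."""
    return (area - intercept) / slope

# Hypothetical calibration standards (concentration in mg/L).
standards = [1.0, 2.0, 5.0, 10.0]

# Two hypothetical detector setups with very different (assumed linear) responses.
areas_a = [1500.0 * c + 200.0 for c in standards]
areas_b = [42.0 * c + 3.0 for c in standards]

calibration_a = fit_line(standards, areas_a)
calibration_b = fit_line(standards, areas_b)

# The same 4.0 mg/L sample gives very different raw areas on the two setups...
unknown_area_a = 1500.0 * 4.0 + 200.0
unknown_area_b = 42.0 * 4.0 + 3.0

# ...but each setup's own calibration maps its area back to the same concentration.
conc_a = area_to_concentration(unknown_area_a, *calibration_a)
conc_b = area_to_concentration(unknown_area_b, *calibration_b)
print(unknown_area_a, unknown_area_b)
print(round(conc_a, 3), round(conc_b, 3))
```

The raw areas differ by more than an order of magnitude, yet both calibrations recover the same concentration, which is why calibrated values carry meaning across methods and raw areas do not.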
William Letter, that's why I stated that "it can be done if you are confident that the chromatographic method is consistent and the samples have a similar nature". There are even some normalisation methods that can help minimise the analytical variability, such as dividing by the 1-norm or the 2-norm, apart from setting the max peak to 1, as most software packages do, and I assume Mohamed's data is normalised in that sense. Exploratory methods shouldn't be discarded because the data is variable, as long as you know how to manage the variability and you assess the strength of the results (or the lack of it).
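The three normalisations mentioned above can be sketched as follows, in plain Python. The `normalize` helper and the area values are hypothetical, and the example assumes the only difference between the two runs is an overall scale factor (for instance injection volume or detector gain), which is the kind of variability these normalisations can remove. They do not correct for compound-specific response differences.

```python
import math

def normalize(areas, method="max"):
    """Scale one sample's peak areas by its 1-norm, 2-norm, or largest peak."""
    if method == "l1":
        scale = sum(abs(a) for a in areas)
    elif method == "l2":
        scale = math.sqrt(sum(a * a for a in areas))
    elif method == "max":
        scale = max(abs(a) for a in areas)
    else:
        raise ValueError(f"unknown method: {method}")
    if scale == 0:
        raise ValueError("cannot normalise a sample whose areas are all zero")
    return [a / scale for a in areas]

# Hypothetical areas for one sample, and the same sample with a 10x overall response.
run_1 = [120.0, 30.0, 50.0]
run_2 = [a * 10.0 for a in run_1]

for method in ("l1", "l2", "max"):
    profile_1 = normalize(run_1, method)
    profile_2 = normalize(run_2, method)
    print(method, [round(v, 4) for v in profile_1], [round(v, 4) for v in profile_2])
```

After any of the three normalisations the two runs give the same relative profile, so the overall scale no longer contributes to the differences between samples.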
Never assume, Jokin Ezenarro (#1 rule in science). NO detailed information has been provided at all. Since the question asks about using peak 'area units' only, we know the poster is not familiar with or knowledgeable in the technique used, and that is why they are looking for guidance (a good question to ask). Someone without the required knowledge and experience, such as a college student, may ask such questions and even misuse raw data in their work. A sweeping generalization cannot be added to try to cover all of the unknowns. You may be assuming one hundred steps have already been taken, but the question posed is about using just the 'area units' (worthless). If the samples are known and measured (as in the article), then general comparisons can be made. This is understood by those with training in the technique(s) used. We do not use 'area units' alone for comparison to data taken from other analysis methods.
William Letter, I didn't assume anything; the question was whether he can apply PCA to chromatographic area data. The answer is yes if the requirements I described are met or if the data is handled correctly. Answering no is assuming they are not met and that it can never be done. I just wanted to offer more insights. I believe this is not the place for this discussion; we have both stated our points of view.
Jokin Ezenarro wrote: "I didn't assume anything"; But then you did assume, didn't you, when you wrote: "and I assume Mohamed's data is normalised in that sense."
There is no reason in chromatography to think (or assume) that someone's data is or was normalized. Please focus on leading them down the path that is most likely to result in them collecting and using valid data. *As a professional scientist I can assure you that peak "Area Units" by themselves should not be used for anything other than comparison to other peaks in the same exact sample analysis run. We do not use those units for anything other than exploring what relationship they may have regarding response vs. actual concentration (an unknown until it is known), relative to the other peaks detected by the same method, on the same exact instrument, in the sample. (We do not assume that because something shows a large area (peak) it represents a large amount of the total composition. Different samples may have different responses.)
By 'we' you mean 'you'. If you don't want to do any untargeted analysis, that's okay, but it is a field that exists and provides very useful results when working with complex samples, for instance, where you cannot rigorously quantify the analytes of interest.
Article Analytical challenges of untargeted GC-MS-based metabolomics...
Moreover, from the chromatographic point of view, you are saying that peak areas should not be compared between samples, but that's exactly what you do when you quantify the analytes, even if you use an internal standard. You can do that because you assume that they are comparable "relative to the other peaks detected by the same method, on the same exact instrument", and in that case you can apply PCA directly on the areas. Autoscaling prevents giving more importance to the bigger areas.
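Autoscaling, as mentioned above, can be sketched like this in plain Python. The `autoscale` helper and the area table are hypothetical; the sketch uses the sample standard deviation, and some software divides by the population standard deviation instead, so exact values can differ slightly between implementations.

```python
import math
import statistics

def autoscale(rows):
    """Mean-center each peak (column) and divide by its sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(column) for column in columns]
    sds = [statistics.stdev(column) for column in columns]
    if any(sd == 0 for sd in sds):
        # A peak with identical area in every sample carries no information for PCA.
        raise ValueError("constant column cannot be autoscaled")
    return [[(value - mean) / sd for value, mean, sd in zip(row, means, sds)]
            for row in rows]

# Hypothetical areas: rows are samples; the first peak is about 1000x larger than the second.
areas = [
    [1000.0, 1.0],
    [2000.0, 3.0],
    [3000.0, 2.0],
    [4000.0, 6.0],
]

scaled = autoscale(areas)
for column in zip(*scaled):
    print(round(statistics.mean(column), 6), round(statistics.stdev(column), 6))
```

After autoscaling, every peak has zero mean and unit standard deviation, so the large peak no longer dominates the PCA simply because of its magnitude. The flip side is that small, noisy peaks are weighted just as heavily, which is one more reason to inspect the loadings.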
PS: I believe we are all professional scientists here.
"Please focus on leading them down the path that is most likely to result in them collecting and using valid data." That's exactly what I'm trying to do, instead of saying they can't do anything correctly.
"untargeted analysis" is fine, if the data are valid, but they are not in this case and using them as you suggest is invalid. In any case, now Mohamed Mahmoud has been provided with professional advice that he should not use the raw area % units from his analysis data outside of the one analysis method. To write any more would be repetitive and a waste of Mohamed Mahmoud's time as his question has been answered.
I wish you luck with your classes at school, Jokin, and I thank Mohamed Mahmoud for his patience.