Is there a way to compare two spectra and quantify how closely they are related? I have six samples and want to study (and quantify) their closeness. What is the best technique to use? Thanks in advance.
You can compare two spectra by means of SAM (Spectral Angle Mapper), which treats the two spectra as vectors in spectral space and calculates the angle between them. The cosine of that angle is the uncentered correlation coefficient between the two spectra (it equals the Pearson correlation coefficient only if each spectrum is mean-centered first). To quantify the closeness of more than two spectra, may I suggest performing a PCA on them (possibly without centering) and computing some classical statistics on the scores.
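As a minimal sketch of the spectral angle idea above, assuming both spectra are sampled on the same wavelength grid (the function name `spectral_angle` and the sample values are illustrative only, not taken from any package):

```python
import math

def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    if len(a) != len(b):
        raise ValueError("spectra must share the same wavelength grid")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)

# Made-up spectra: s2 is s1 scaled by 2, s3 has a different shape.
s1 = [0.10, 0.40, 0.90, 0.40, 0.10]
s2 = [2 * v for v in s1]
s3 = [0.90, 0.40, 0.10, 0.40, 0.90]

print(spectral_angle(s1, s2))  # near zero: same shape, different intensity
print(spectral_angle(s1, s3))  # clearly larger: different shape
```

Because the angle ignores overall intensity, a pure scaling of a spectrum gives an angle close to zero; an additive offset, however, does change it.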
If you want to compare two spectra, you can use, for example, the Pearson correlation coefficient. To compare several samples, Principal Component Analysis (PCA) is very common; it is an unsupervised method. Your data should be arranged in a 2D matrix (1 row = 1 sample, 1 column = 1 wavelength).
You then compute the PCA of this matrix, choosing a number of principal components (how many you need depends on your data). PCA decomposes the data into a few principal components, each of which is a score multiplied by a loading. The loading is a weighting of the wavelengths based on the covariance, while the score is the scaling of that loading for each sample. You can plot PC1 scores vs. PC2 scores, which gives a nice visual representation of how similar or different the samples are. Mean centering the data prior to PCA is recommended. You can find a PCA function in Matlab, or you can use R.
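The PCA steps described in this answer can be sketched roughly as follows. This assumes NumPy is available and uses synthetic data (six made-up spectra in two groups); `pca_scores` is a hypothetical helper name, and the SVD route is just one common way to compute a mean-centered PCA.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Mean-centered PCA via SVD. Rows of X are samples, columns are wavelengths."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # mean-center each wavelength
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)    # fraction of variance per component
    return scores, loadings, explained[:n_components]

# Synthetic example (assumption): 6 samples x 50 wavelengths, two groups of three
# whose peaks sit at different positions, plus a little noise.
rng = np.random.default_rng(0)
wl = np.linspace(0.0, 1.0, 50)
peak_a = np.exp(-((wl - 0.3) / 0.05) ** 2)
peak_b = np.exp(-((wl - 0.6) / 0.05) ** 2)
X = np.vstack(
    [peak_a + 0.01 * rng.standard_normal(50) for _ in range(3)]
    + [peak_b + 0.01 * rng.standard_normal(50) for _ in range(3)]
)

scores, loadings, explained = pca_scores(X)
print(scores)      # column 0 = PC1 scores, column 1 = PC2 scores, one row per sample
print(explained)   # share of total variance captured by PC1 and PC2
```

Plotting the first column of `scores` against the second gives the PC1 vs. PC2 plot mentioned above; samples that sit close together in that plot have similar spectra.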
You have to use some statistical methods in order to define the similarity of the spectra. The most common method is principal component analysis (PCA). Using PCA you can create a graph where groups of similar datasets appear.
Everything above sounds good. If you have two groups (are these six distinct samples or two triplicates?), you can use the Mahalanobis distance rather than the Euclidean distance, since the former takes into account the dispersion inherent to a particular group (or cluster) of samples.
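To illustrate the difference, here is a small sketch assuming NumPy and a made-up two-dimensional group (for real spectra, with few samples and many wavelengths, the covariance matrix is singular, so this is usually done on a few PCA scores rather than on raw spectra). The name `mahalanobis_to_group` is hypothetical.

```python
import numpy as np

def mahalanobis_to_group(x, group):
    """Mahalanobis distance from point x to the mean of `group` (rows = samples)."""
    group = np.asarray(group, dtype=float)
    mean = group.mean(axis=0)
    cov = np.cov(group, rowvar=False)        # covariance between variables
    diff = np.asarray(x, dtype=float) - mean
    # pinv is used so a singular covariance does not raise an error.
    return float(np.sqrt(diff @ np.linalg.pinv(cov) @ diff))

# Made-up group (assumption): widely spread along the first axis, tight along the second.
group = [[-2.0, 0.1], [-1.0, -0.1], [0.0, 0.0], [1.0, 0.1], [2.0, -0.1]]

along = [1.0, 0.0]    # 1 unit from the group mean, along the spread
across = [0.0, 1.0]   # 1 unit from the group mean, across the spread

print(mahalanobis_to_group(along, group))
print(mahalanobis_to_group(across, group))
```

Both test points are at the same Euclidean distance from the group mean, but the Mahalanobis distance is much larger for the point lying across the direction in which the group naturally varies.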
Keep in mind that the statistical method that you choose to compare spectral data will depend on the main purpose of your comparison. There are different methods and software that you can apply to spectral data but you need to define the precision, heterogeneity, accuracy and main objective of the comparison to select the appropriate tool and minimize error rates.
To better assist you, could you please specify the following?
a) What technique are you using to acquire the spectra? (This gives insight into the precision, resolution and accuracy to expect.)
b) What is your matrix? (The heterogeneity of the material also plays an important role in deciding the match criteria.)
c) What is the objective of your analysis? (For example: comparing all 6 samples to each other to see which ones are closer/grouped; comparing the 6 samples to another one to determine a common source of origin; or comparing the samples to each other to detect significant differences associated with defects.)
d) How many replicates (spectra) per sample are you acquiring?
I agree with the above suggestions, particularly the use of Matlab, which is a very powerful matrix programming tool with many useful functions. It works very well if you have some acquaintance with simple programming. I would add that when comparing spectra or chromatograms it is sometimes important to align the signals properly first. There are many examples in the literature of aligning NMR signals, e.g. with COW (Correlation Optimised Warping), for which implementations written in Matlab can be found in the literature.
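COW itself is too involved for a short example. As a much simpler stand-in (explicitly not COW), the sketch below finds the single integer shift that best lines up a signal with a reference by maximising their overlap product. The function names and the toy signals are assumptions for illustration; real data usually need non-uniform warping and a normalised score.

```python
def best_shift(reference, signal, max_shift):
    """Integer shift s such that signal[i - s] best matches reference[i]."""
    n = len(reference)

    def score(s):
        # Unnormalised cross-correlation over the overlapping points only.
        return sum(reference[i] * signal[i - s]
                   for i in range(n) if 0 <= i - s < n)

    return max(range(-max_shift, max_shift + 1), key=score)

def apply_shift(signal, s, fill=0.0):
    """Shift a signal by s points, padding the exposed end with `fill`."""
    n = len(signal)
    return [signal[i - s] if 0 <= i - s < n else fill for i in range(n)]

# Toy signals (assumption): the same peak, displaced by two points.
reference = [0, 0, 0, 1, 3, 1, 0, 0, 0, 0]
signal = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0]

s = best_shift(reference, signal, max_shift=4)
aligned = apply_shift(signal, s)
print(s)
print(aligned)
```

A rigid shift like this only corrects a constant offset along the axis; peak-dependent drifts are where warping methods such as COW come in.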
I can add that when spectra are too similar to be discriminated, you can apply the strategies mentioned above to derivatives of the spectra (for instance, second derivatives).
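A minimal sketch of a second derivative by central finite differences, assuming evenly spaced wavelengths (in practice a smoothing derivative such as Savitzky-Golay is often preferred because plain differences amplify noise). The function name and data are illustrative.

```python
def second_derivative(y, step=1.0):
    """Central finite-difference second derivative; output is 2 points shorter."""
    return [(y[i + 1] - 2.0 * y[i] + y[i - 1]) / step ** 2
            for i in range(1, len(y) - 1)]

# Made-up spectra: s2 is s1 plus a sloping linear baseline.
s1 = [0.0, 0.1, 0.5, 1.0, 0.5, 0.1, 0.0]
s2 = [v + 0.5 + 0.1 * i for i, v in enumerate(s1)]

d1 = second_derivative(s1)
d2 = second_derivative(s2)
print(d1)
print(d2)  # same as d1 up to rounding: the linear baseline has vanished
```

This shows one reason derivatives help: constant and linear baseline differences drop out, so the remaining differences reflect band shape.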
Like Jean-Michel, I would go for the Spectral Angle Mapper. It is used for classification purposes in remote sensing, but it would of course work well with other spectral data.
The simplest way of analysing spectral data is to plot them in an Excel sheet. If you have reference values for the property you are looking at in the spectra, you can use Principal Component Analysis, Linear Discriminant Analysis, Mahalanobis distance or k-nearest neighbours (KNN) to check how close the spectra are to each other. For these analyses you need software that performs multivariate analysis, such as Unscrambler, WinISI, R, Matlab or XLSTAT.
With any procedure, it may be very useful to select a range of wavelengths that is related to the properties you want to compare. A specific range will probably be much more effective than the complete spectra. Also, the most relevant measure of distance must be selected. Several are available in Matlab and other programs.
For comparing spectra you should apply some kind of multivariate analysis, such as PCA or clustering. Let me give a piece of advice: use The Unscrambler® software, which offers a complete set of mathematical pre-treatments for spectra (smoothing, normalization, centering, detrending, derivatives). With this software you can find relevant variations in one data matrix (X) or the relationships between two data matrices (X and Y). It can also be used to resolve unknown mixtures, by finding the number of pure components and estimating their concentration profiles and spectra, or to classify unknown samples into various possible categories.
Without going so far as performing a full PCA or applying complex algorithms (SVM, KNN or the like), I like to simply calculate a Pearson correlation coefficient. It then takes time to learn what a satisfactory R value is, and that will of course depend on the application.
An easy start is to export the spectra to Excel and use the built-in PEARSON function. The only requirement is that the abscissae of both spectra be identical (or that the spectra be interpolated onto a common X axis, in wavenumber or another unit). It is also very useful to display the distribution of R values as a frequency histogram and look at how narrow the distribution is.
A useful reference follows: J. Clin. Microbiol. March 2009 vol. 47 no. 3 652-659
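For those who prefer a script to a spreadsheet, the Pearson approach described in this answer might look like the sketch below. It assumes all spectra share the same X axis; the sample names and values are made up, and `pearson` is a plain reimplementation of the textbook formula.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient between two equally sampled spectra."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Made-up spectra on a common X axis (assumption).
spectra = {
    "S1": [0.10, 0.40, 0.90, 0.40, 0.10],
    "S2": [0.12, 0.41, 0.88, 0.43, 0.09],
    "S3": [0.90, 0.40, 0.10, 0.40, 0.90],
}

# R for every pair of samples; with six samples this gives 15 values.
r_values = {(i, j): pearson(spectra[i], spectra[j])
            for i, j in combinations(spectra, 2)}
for pair, r in r_values.items():
    print(pair, round(r, 3))
```

Collecting the values of `r_values` for replicates of the same sample gives the R distribution suggested above for judging what counts as a satisfactory match.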
As said above, you can use Correlation or apply a PCA analysis. However, don't forget to apply a baseline correction before or your results will be biased.
I would propose an EMC correction for both baseline and scaling (homothety) corrections, something like S' = a*S + c, where a and c are optimized to bring each S as close as possible to Smean.
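One way to read S' = a*S + c is as an ordinary least-squares fit of each spectrum to the mean spectrum. The sketch below follows that reading, which is an assumption about what the poster intended; the function name and the data are illustrative.

```python
def fit_to_reference(spectrum, reference):
    """Return a*S + c with a, c chosen by least squares to match `reference`."""
    n = len(spectrum)
    mean_s = sum(spectrum) / n
    mean_r = sum(reference) / n
    a = (sum((s - mean_s) * (r - mean_r) for s, r in zip(spectrum, reference))
         / sum((s - mean_s) ** 2 for s in spectrum))
    c = mean_r - a * mean_s
    return [a * s + c for s in spectrum]

# Made-up spectra: the same band shape with different gains and offsets.
base = [0.10, 0.40, 0.90, 0.40, 0.10]
spectra = [
    base,
    [1.5 * v + 0.2 for v in base],
    [0.8 * v - 0.1 for v in base],
]

# Mean spectrum, computed wavelength by wavelength.
mean_spectrum = [sum(col) / len(spectra) for col in zip(*spectra)]
corrected = [fit_to_reference(s, mean_spectrum) for s in spectra]
for row in corrected:
    print([round(v, 4) for v in row])
```

In this toy case the three spectra differ only by gain and offset, so after correction they coincide; with real data, whatever remains different after the fit is the part worth comparing.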
If A and B have different molecular masses and their UV intensities differ, can this be explained by the absorption coefficient alone, assuming all other factors are constant? Which one would we expect to give the higher intensity: the one with the larger molecular mass, or does the intensity not depend on mass at all and only on concentration?
Before performing a mathematical/statistical comparison between two or more spectra, it is important to consider adequate pre-processing of the spectra. For example, it may be important to align the spectra (e.g. NMR, chromatographic, etc.).
If you are comparing UV-VIS-NIR reflectance spectra, you can convert them to absorbance using Absorbance = -log10(Reflectance), commonly referred to as log(1/R). Spectra acquired in transmission mode should likewise be converted to absorbance; spectra already in absorbance need no conversion. With only six spectra, first pre-process the data with a first derivative. Afterwards, select the wavelengths where the spectra differ; this reduces the data matrix, and PCA is then a good match for grouping your samples in an exploratory way. Finally, you can cluster your data.
You can also use other pre-processing here, such as MSC and SNV, to remove offsets from your data, and then explore with PCA.
Discriminant analysis can also be used, along with the Mahalanobis distance and related statistics.
For quantification, a model is developed first and its strength assessed, then it is validated (cross-validation, leave-one-out or other methods), and then it can be used for prediction; PLS tools are used for this.
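Two of the pre-processing steps mentioned in this answer, the log(1/R) conversion and SNV, can be sketched as below. This assumes reflectance is expressed as a fraction between 0 and 1 (not a percentage); the function names and the values are illustrative.

```python
import math
import statistics

def reflectance_to_absorbance(reflectance):
    """Apparent absorbance A = log10(1/R) = -log10(R), with R as a fraction in (0, 1]."""
    return [-math.log10(r) for r in reflectance]

def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum by its own mean and SD."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation (n - 1)
    return [(v - mean) / sd for v in spectrum]

# Made-up reflectance values (assumption).
reflectance = [0.80, 0.50, 0.20, 0.50, 0.80]
absorbance = reflectance_to_absorbance(reflectance)

# The same absorbance band with an added offset and a gain.
shifted = [1.3 * a + 0.05 for a in absorbance]

print(snv(absorbance))
print(snv(shifted))  # matches the line above up to rounding
```

SNV works on each spectrum separately, so it needs no reference spectrum; MSC, by contrast, corrects each spectrum against a reference such as the mean spectrum.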
The best way seems to be to use Mel-Frequency Cepstral Coefficients, which are (as far as I understand) the state of the art in signal processing for clustering/classifying signals and thereby discovering their similarities.
If you want to use some fancy stuff, you can model your samples as Gaussian mixtures and then measure the distance between your 2 signals using 1/ the likelihood itself, or 2/ sampling with a Kullback-Leibler distance (if your mixture is a univariate Gaussian model).
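For the simplest case mentioned here, a single univariate Gaussian per spectrum, the Kullback-Leibler divergence has a closed form. The sketch below makes a strong assumption that is not in the answer: each spectrum is treated as a non-negative intensity distribution over wavelength, and the Gaussian is fitted by its intensity-weighted mean and standard deviation. Function names and data are hypothetical.

```python
import math

def gaussian_moments(x, intensity):
    """Intensity-weighted mean and standard deviation (intensities must be >= 0)."""
    total = sum(intensity)
    mu = sum(xi * w for xi, w in zip(x, intensity)) / total
    var = sum(w * (xi - mu) ** 2 for xi, w in zip(x, intensity)) / total
    return mu, math.sqrt(var)

def kl_gaussian(mu_p, sd_p, mu_q, sd_q):
    """KL(p || q) for two univariate Gaussians, in nats."""
    return (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sd_q ** 2)
            - 0.5)

# Made-up single-band spectra on a common wavelength axis (assumption).
wavelengths = list(range(400, 410))
spec_a = [0, 1, 3, 6, 8, 6, 3, 1, 0, 0]
spec_b = [0, 0, 0, 1, 3, 6, 8, 6, 3, 1]

mu_a, sd_a = gaussian_moments(wavelengths, spec_a)
mu_b, sd_b = gaussian_moments(wavelengths, spec_b)

kl_ab = kl_gaussian(mu_a, sd_a, mu_b, sd_b)
kl_ba = kl_gaussian(mu_b, sd_b, mu_a, sd_a)
print(kl_ab, kl_ba)
print(kl_ab + kl_ba)  # symmetrised version
```

Note that the Kullback-Leibler divergence is not symmetric, so it is not a distance in the strict sense; summing both directions, as in the last line, is one common way around that.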
Hi, for example, the goodness of fit between the fitted spectrum and the observed spectrum can be evaluated by SI (Similarity). It can also be evaluated by the correlation coefficient (R) or DI (Deviation Index). Good luck!
Focus on your field (visible spectra? remote sensing? etc.) and there are sure to be publications on this, since it's a common problem. Here are a couple publications, and they seem to say "it depends on what matters to you".