In addition to mean and variance, what statistics can I use to discriminate between two signals? I want to determine which signals are more similar to each other and which are too different.
There are many ways to characterize a time series. Mean and standard deviation are simple ones, but not particularly good, because very different signals can have similar means and standard deviations. First, let's talk about a good way to characterize a single time series. Assuming you are talking about a continuous signal, a good summary is the power spectral density. This works well if the signal is fairly stationary, i.e., if, for example, your signal is the vibration of a bus seat cushion as the bus travels down a newly paved highway. However, if the bus moves from the highway onto a dirt road, the characteristics of the signal change. In that case your signal is non-stationary, and the power spectral density would mask such changes. You might then want to try a wavelet transform, which tells you how much power there is in a given frequency band as a function of time.
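A minimal sketch of both ideas with SciPy (the sampling rate and test signal are invented for illustration): `welch` estimates the power spectral density, and `spectrogram` gives a simple time-frequency picture, a short-time Fourier stand-in for a full wavelet transform.

```python
# Hypothetical example: a 50 Hz tone in noise, sampled at 1 kHz.
import numpy as np
from scipy import signal

fs = 1000.0                                    # assumed sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.random.randn(t.size)

f, pxx = signal.welch(x, fs=fs, nperseg=1024)  # PSD: power vs. frequency
f2, tt, sxx = signal.spectrogram(x, fs=fs)     # power vs. frequency AND time
```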
Everything up to this point has just considered characterizing a single signal.
Now, when you say you want to compare the similarity of two signals, you can start with the standard Pearson correlation coefficient. If your time series are fairly aperiodic, it is simple to calculate and gives a single number between -1 and 1: negative if your signals are roughly "opposite", positive if they are "in phase". Most standard routines also return a p-value, giving you an idea of how significant the result is.
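For example (a minimal sketch with hypothetical arrays), `scipy.stats.pearsonr` returns both the coefficient and the p-value:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = 0.7 * x + 0.3 * rng.standard_normal(500)   # correlated by construction

r, p = pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.3g}")
```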
If your signals have a strong periodicity, though, the correlation coefficient has problems. For example, two identical sine waves phase-shifted 90 degrees from each other have correlation 0, even though they are identical! A better measure is "coherence", which is based on the Fourier transform of the correlation function and should be independent of the phase difference between the signals. There are nice discussions with Python code online that might help you along.
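A small sketch of the 90-degree problem (signals and sampling rate are invented): the Pearson correlation of a sine and a cosine at the same frequency is near zero, while `scipy.signal.coherence` stays near one at that frequency.

```python
import numpy as np
from scipy import signal
from scipy.stats import pearsonr

fs = 1000.0
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.1 * np.random.randn(t.size)
y = np.cos(2 * np.pi * 50 * t) + 0.1 * np.random.randn(t.size)  # 90 deg shift

print(pearsonr(x, y)[0])                  # close to 0
f, cxy = signal.coherence(x, y, fs=fs)
print(cxy[np.argmin(np.abs(f - 50))])     # close to 1 at 50 Hz
```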
You can also start asking whether there might be some causal link between the two signals; these are issues I am starting to learn about myself. Granger analysis, directed coherence, etc. may be the way you want to go, but I think understanding standard coherence is a good first step.
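If you want to experiment with Granger analysis, statsmodels ships a ready-made test. A sketch with a made-up pair where y lags x by one step:

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(1)
x = rng.standard_normal(300)
y = np.roll(x, 1) + 0.5 * rng.standard_normal(300)  # y lags x by one step

# Column order matters: the test asks whether column 2 Granger-causes column 1.
res = grangercausalitytests(np.column_stack([y, x]), maxlag=3)
```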
My suggestion is: start with the least technical. Calculate the ACF up to a lag of roughly 0.3 × the series length, and plot both ACFs (for the two signals) on the same coordinate system, with lag on the x-axis and [-1, 1] on the y-axis. This simple calculation often reveals a whole lot, depending on the physical problem.
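A sketch of that plot, assuming two hypothetical series x1 and x2 and statsmodels' `acf`:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(2)
x1 = rng.standard_normal(500).cumsum()   # hypothetical signal 1
x2 = rng.standard_normal(500)            # hypothetical signal 2

nlags = int(0.3 * len(x1))               # lags up to ~0.3 * length
plt.plot(acf(x1, nlags=nlags), label="signal 1")
plt.plot(acf(x2, nlags=nlags), label="signal 2")
plt.xlabel("lag"); plt.ylim(-1, 1); plt.legend(); plt.show()
```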
You can then proceed to estimating the spectra of the two signals, and again plot the two on the same axes or next to each other.
If the two signals are two realizations of the same physical phenomenon, try to calculate and plot the distributions of values of the two signals.
You can also explore the variability of the variance along the time dimension; that can reveal hidden seasonalities, which are always important.
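A small sketch of the last two suggestions, with invented data: overlaid histograms for the value distributions, and a rolling variance to expose changes in variability over time.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x1 = rng.standard_normal(2000)
x2 = rng.standard_normal(2000) * (1 + 0.5 * np.sin(np.arange(2000) / 100))

plt.hist(x1, bins=50, density=True, alpha=0.5, label="signal 1")
plt.hist(x2, bins=50, density=True, alpha=0.5, label="signal 2")
plt.legend(); plt.show()

pd.Series(x2).rolling(window=100).var().plot()  # oscillating variance shows up
plt.show()
```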
Definitely estimate trends in both signals. Start with linear, move on to exponential, and so on.
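For the trend fits, a minimal sketch (the series here is synthetic): a linear trend via `polyfit`, then an exponential via `curve_fit`.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(4)
t = np.arange(200, dtype=float)
x = 2.0 * np.exp(0.01 * t) + 0.5 * rng.standard_normal(200)  # synthetic data

slope, intercept = np.polyfit(t, x, 1)        # start with a linear trend

def expo(t, a, b):
    return a * np.exp(b * t)

(a, b), _ = curve_fit(expo, t, x, p0=(1.0, 0.005))  # then an exponential one
```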
If you tell us more about the problem and the nature of the two signals, we can probably tell you more specific ideas!
Coherence might be described as a correlation coefficient that varies with frequency.
I don't know your application, but perhaps principal component analysis of the time series might help. You could identify which group of vectors accounts for most of the overall variance in the set.
You can try to find the probability distribution of each signal. Signals that belong to the same distribution are likely to be much closer to each other, and signals from different distributions much more different from each other. Good luck!
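One concrete way to test "same distribution" (my choice of test, not the poster's) is the two-sample Kolmogorov-Smirnov test on the raw values:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(5)
a = rng.normal(0.0, 1.0, 1000)
b = rng.normal(0.3, 1.0, 1000)          # slightly shifted distribution

stat, p = ks_2samp(a, b)
print(f"KS = {stat:.3f}, p = {p:.3g}")  # small p: distributions likely differ
```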
Matched filters and correlators are time-honored ways of carrying out discrimination. A bit more information on your signal characteristics might allow us to narrow the options. Are you trying to detect a signal in real time or offline? That will also have a bearing on how you do it.
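In case it helps, a minimal matched-filter sketch (template and signal are invented): correlate a known template against the data and look for a peak.

```python
import numpy as np

rng = np.random.default_rng(6)
template = np.hanning(64)            # assumed known pulse shape
x = rng.standard_normal(1000)
x[400:464] += 3 * template           # bury the template in noise

score = np.correlate(x, template, mode="valid")
print(np.argmax(score))              # peaks near sample 400
```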
The answer depends heavily on what you expect of these signals, i.e., whether they have some sort of structure and what is expected to differ between them.
For instance, frequency-domain techniques could be useful if the signals are expected to differ in the composition of their component frequencies. I see that some responders have talked about comparing power spectra, measuring coherence, matched filters, etc.
Time-domain techniques are often simpler, and could be useful if the signals differ in their mean, standard deviation, or higher-order moments like skewness and kurtosis. This is not very different from comparing the distributions of the series, which another responder has suggested.
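These time-domain summaries are one-liners in SciPy; a sketch with a skewed synthetic series:

```python
import numpy as np
from scipy.stats import skew, kurtosis

x = np.random.default_rng(7).gamma(2.0, size=1000)  # skewed example data
print(x.mean(), x.std(), skew(x), kurtosis(x))
```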
Or, if you know even more about the structure of the signals, you could compare them with models. For instance, it is possible that a certain model, like a periodic one, might fit one signal well, but not the other. These fits could be tested statistically, and give you inference with p-values.
Dear Mitchell Maltenfort, I applied PCA for dimensionality reduction, but I did not understand your idea about applying PCA to compare two signals. Could you explain more?
Some responders asked about the nature of the signals. They are the temperature, pressure, and flow rate of a chemical plant, measured in different states of the plant. Also, I would like to compare them offline.
Let us assume you are considering a signal-plus-noise representation for the two time series under study. In that case, you can apply the same filter to both time series in order to obtain their respective signals. Then, you can visually compare those signals (or trends) and calculate some descriptive measures of the signals.
By visual inspection of the two signals you may find the answer you are looking for and, if needed, calculate the difference between the signals to see where they are more or less different from each other.
A linear filter of a (discrete) time series can be applied relatively easily, but in order to establish valid comparisons between the two signals, the filters should produce the same smoothness of the data. To control the smoothness achieved by the filter, you should employ a procedure like the one proposed by Guerrero (2008).
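A bare-bones sketch of the idea (it applies the same moving-average filter to both series; it does not implement Guerrero's smoothness-control procedure):

```python
import numpy as np

def moving_average(x, w=25):
    # the SAME linear filter for both series, so the comparison is fair
    return np.convolve(x, np.ones(w) / w, mode="same")

rng = np.random.default_rng(8)
s1 = moving_average(rng.standard_normal(1000).cumsum())
s2 = moving_average(rng.standard_normal(1000).cumsum())
diff = s1 - s2    # where the smoothed signals are more or less different
```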
I hope this suggestion is useful to you. Good luck,
In addition to Fourier series analysis (--> DFT), I would add that, with some knowledge of your signals, you might decompose them into linear combinations of some cleverly chosen basis functions (--> vectors). Often these basis functions are chosen to be orthogonal. You are then left with comparing the coefficients, which can be compared using any measure you want.
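For instance, a sketch using the orthogonal Fourier basis via `rfft`, with cosine similarity as one arbitrary choice of measure:

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.standard_normal(512)
y = rng.standard_normal(512)

cx, cy = np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(y))  # basis coefficients
cos_sim = cx @ cy / (np.linalg.norm(cx) * np.linalg.norm(cy))
```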
Concerning PCA, you could look at the subspaces spanned by the two signals. For instance, you could compare the angle between the two subspaces (I am sorry I cannot provide a reference).
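One possible reading of this (my construction, not the poster's): delay-embed each signal, keep its leading principal directions, and compare the two subspaces with `scipy.linalg.subspace_angles`.

```python
import numpy as np
from scipy.linalg import subspace_angles

def leading_subspace(x, dim=10, k=3):
    # rows of H are sliding windows of length `dim` (a delay embedding)
    H = np.lib.stride_tricks.sliding_window_view(x, dim)
    _, _, Vt = np.linalg.svd(H - H.mean(axis=0), full_matrices=False)
    return Vt[:k].T                      # top-k directions, shape (dim, k)

rng = np.random.default_rng(10)
x, y = rng.standard_normal(500), rng.standard_normal(500)
angles = subspace_angles(leading_subspace(x), leading_subspace(y))
```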
Another possibility that comes to mind would be to look at information-theoretic measures like minimum description length.
In case you want to compare two series of tuples (i.e., multiple real-valued quantities), you may look into tensor algebra or (hyper-)complex numbers.
And, last but not least, some complementary references on research directions in this topic:
Aguilera, A.M., Escabias, M., Valderrama, M.J. (2006). Using principal components for estimating logistic regression with high-dimensional multicollinear data. Computational Statistics & Data Analysis, 50, 1905-1924.
Barker, M., Rayens, W. (2003). Partial least squares for discrimination. Journal of Chemometrics, 17, 166-173.
Costanzo, D., Preda, C., Saporta, G. (2006). Anticipated prediction in discriminant analysis on functional data for binary response. In COMPSTAT 2006, pp. 821-828. Physica-Verlag.
Hennig, C. (2000). Identifiability of models for clusterwise linear regression. Journal of Classification, 17, 273-296.
Preda, C., Saporta, G. (2005a). PLS regression on a stochastic process. Computational Statistics and Data Analysis, 48, 149-158.
Preda, C., Saporta, G. (2005b). Clusterwise PLS regression on a stochastic process. Computational Statistics and Data Analysis, 49, 99-108.
You can use non-linear similarity measures to compare two signals in the time domain, such as dynamic time warping and uniform scaling. You can find some methods at http://www.cs.ucr.edu/~eamonn/ or read my paper, which uses dynamic time warping to compare respiratory signals for sleep classification.
Article: Sleep and Wake Classification With Actigraphy and Respirator...
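A minimal textbook DTW distance, just to show the recursion (no window constraint; for real use, a library such as dtaidistance or tslearn is a better choice):

```python
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)   # accumulated-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))  # small: similar
```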
On a fundamental level, you can also have a look at higher-order statistics (moments).
Time series data are usually high-dimensional, and it may make sense to use regression or some other time-series modelling method to parsimoniously parametrise the time series (polynomial, HMM, AR, MA, ARMA, ARIMA, etc.). You can then discriminate in the parameter space based on the statistics of the parameters.
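A sketch of that idea with statsmodels (the model order and the distance in parameter space are arbitrary choices here): fit the same ARIMA model to both series and compare the fitted parameters.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(11)
x1 = rng.standard_normal(400).cumsum()   # hypothetical series 1
x2 = rng.standard_normal(400).cumsum()   # hypothetical series 2

p1 = ARIMA(x1, order=(2, 1, 1)).fit().params
p2 = ARIMA(x2, order=(2, 1, 1)).fit().params
print(np.linalg.norm(p1 - p2))           # one crude parameter-space distance
```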
Have a look at the field of speech recognition which requires robust discrimination/classification of time series. You may find some valuable methods there.
In order to be called a signal, the data must manifest a particular type of distribution. Assume you are presented with two signals and compare their distributions, e.g., with the Anderson-Darling test: if the distributions differ, the signals are different; that is a definite answer. If the distributions are the same, the result is inconclusive.
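SciPy ships a k-sample Anderson-Darling test; a sketch on two made-up samples:

```python
import numpy as np
from scipy.stats import anderson_ksamp

rng = np.random.default_rng(12)
a = rng.normal(0.0, 1.0, 500)
b = rng.normal(0.5, 1.0, 500)            # shifted, so the test should reject

res = anderson_ksamp([a, b])
print(res.statistic, res.significance_level)
```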
DIFFERENTIATE BY TREND TESTS
Verifying whether each series manifests a significant trend can be another approach. As with the distribution check, this is a negative test: if one series shows a significant trend and the other does not, then it is conclusive that they are different. However, if both are "trendy", it is not conclusive. Available trend tests include (i) the Laplace trend test, (ii) the Military Handbook test, and (iii) the reverse arrangement test.
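A sketch of the reverse arrangement test, using the standard normal approximation (mean n(n-1)/4 and variance n(n-1)(2n+5)/72 under no trend):

```python
import numpy as np
from scipy.stats import norm

def reverse_arrangements(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    A = sum(int(np.sum(x[i] > x[i + 1:])) for i in range(n - 1))  # reversals
    mean = n * (n - 1) / 4.0                 # expectation under no trend
    var = n * (n - 1) * (2 * n + 5) / 72.0   # variance under no trend
    z = (A - mean) / np.sqrt(var)
    return A, 2 * (1 - norm.cdf(abs(z)))     # two-sided p-value

rng = np.random.default_rng(13)
print(reverse_arrangements(np.arange(50) + rng.standard_normal(50)))
```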