I have a few time series, to be precise the daily closing prices of a few shares, and I want to measure the similarity between them. Could you please share your experience with this? Thanks in advance.
Well, that depends on what you want to research and what your objectives are for the comparison. As a historian, I am usually interested in correlations and oscillating profiles. It is also important for me to identify any conjunctural force that may have influenced the relationship between the variables.
Typically, when dealing with time series, there are some steps I follow:
(1) First, I plot the series on a dual-axis chart, with time on X and the two variables on Y and Y'. I do that to see whether there are visible similarities, and whether the similarities are lagged. For instance, you can see if the price of variable A is influenced by the previous day's price of variable B. From that I build my hypothesis (A explains B, B explains A, or both are explained by a third factor);
(2) I apply a correlation test to the 2 variables (or to modified variables, if a lag is needed), to see if there is statistical confirmation of my hypothesis;
(3) Then I plot the variables against each other in an X-Y scatter plot. This way I can see whether the 2 variables have any hidden relation other than linear. For instance, an exponential or harmonic relation will probably not show up in my first correlation test. If there is one, I transform my variable (taking logs, for example, if the relationship is exponential) and do step 2 again;
(4) Using theory as a base, I build a regression to try to identify more details of the supposed relationship. Usually, if the other steps were followed and we are working with only 2 variables (plus time), a least squares model (a linear, log-lin, log-log, or harmonic model, etc.) should be enough;
(5) I usually consider the following diagnostic statistics: the Durbin-Watson test for autocorrelation of the errors, the Shapiro-Wilk test for normality of the residuals, and the Breusch-Pagan and Goldfeld-Quandt tests for heteroscedasticity. This way I can identify conjunctural forces that may have disturbed the relationship, or even whether my model is spurious;
(6) If I suspect that conjunctural forces are altering the relationship between my two variables, I build a model with a dummy variable to try to identify them. If the dummy model resolves my statistical problems, then it is possible that the conjunctural factors did influence how the 2 variables reacted to each other. I look at historical information and at theory to try to build an explanation for what I discovered;
(7) I then use the fitted values from my model, and the residuals (the difference between the original data and the fit), to build an oscillating profile, and I examine that profile (I usually consider trend and cyclical behavior).
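A rough sketch of steps (2), (4) and (5) in Python, assuming NumPy is available. The data are synthetic and the function name `fit_and_diagnose` is made up for illustration; only the Durbin-Watson statistic is computed here (directly from its definition), and the other tests would normally come from a statistics package.

```python
import numpy as np

def fit_and_diagnose(x, y):
    """Least squares fit of y = a + b*x; returns coefficients, residuals, Durbin-Watson."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])        # intercept column + regressor
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef                             # original data minus fit
    # Durbin-Watson statistic: values near 2 suggest little first-order
    # autocorrelation in the residuals; it always lies between 0 and 4.
    dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
    return coef, resid, dw

# Synthetic series, for illustration only (not real share prices)
rng = np.random.default_rng(0)
b = 100 + np.cumsum(rng.normal(0, 1, 250))           # "variable B"
a = 5 + 0.8 * b + rng.normal(0, 0.5, 250)            # "variable A" depends on B

r = np.corrcoef(a, b)[0, 1]                          # step 2: plain correlation
coef, resid, dw = fit_and_diagnose(b, a)             # steps 4 and 5
print("correlation:", r)
print("intercept, slope:", coef)
print("Durbin-Watson:", dw)
```

For a lagged hypothesis, the same function can be called on shifted arrays, for example `fit_and_diagnose(b[:-1], a[1:])` to explain A by the previous day's B.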
This is just a basic methodology that I apply when I start to study a subject, but I hope it helps you.
For a preliminary analysis, correlation & covariance may be computed.
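As a minimal sketch, assuming NumPy and a small made-up price table (rows are days, columns are shares), the correlation and covariance matrices could be computed as below. Working on log returns rather than raw prices is my assumption here, not something stated above.

```python
import numpy as np

# Hypothetical closing prices: rows are days, columns are shares
prices = np.array([
    [100.0, 50.0, 20.0],
    [101.0, 50.5, 19.8],
    [102.0, 51.0, 20.1],
    [101.5, 50.8, 19.9],
    [103.0, 51.5, 20.3],
])

log_returns = np.diff(np.log(prices), axis=0)    # daily log returns per share
corr = np.corrcoef(log_returns, rowvar=False)    # share-by-share correlation
cov = np.cov(log_returns, rowvar=False)          # share-by-share covariance
print(corr)
print(cov)
```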
To measure similarity between time series, plot histograms of your data to check whether they have similar features. If they look roughly similar, plot their cumulative distribution functions to confirm the similarity. You can also use wavelet analysis: examine the covariance structure of the wavelet coefficients to ascertain similarity between the series. Hope it helps.
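One way to put a number on the CDF comparison is the largest vertical gap between the two empirical CDFs (the two-sample Kolmogorov-Smirnov statistic). This is a sketch assuming NumPy; the function name `ecdf_distance` is made up. Note that it compares distributions only and ignores the time ordering of the observations.

```python
import numpy as np

def ecdf_distance(a, b):
    """Largest gap between the empirical CDFs of a and b (0 = identical, 1 = disjoint)."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])                          # evaluate at every observed value
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

print(ecdf_distance([1, 2, 3], [1, 2, 3]))       # identical samples
print(ecdf_distance([1, 2, 3], [10, 11, 12]))    # non-overlapping samples
```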
Hello, thanks a lot. I have a few univariate time series; I used a simple correlation-based dissimilarity matrix and then performed clustering. If it is not a bother, could you please refer me to a primer on how wavelets can be used for similarity analysis? Thanking you in advance!
Thank you so much for your detailed answer, Apoena Canuto Cosenza. Not that I could follow everything, but this will work as future reference material for my time series bootstrapping.
Your approach seems fine. As I understand from your answer, you computed correlations and then performed a cluster analysis. If you used Mahalanobis distances for your clusters, that takes into account the correlations of the entire dataset. But I don't know on what basis you want to ascertain similarity: price data, return series, or volatility. That will be your chosen metric for clustering.
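To illustrate how the choice of metric changes the input to clustering, here is a sketch assuming NumPy and synthetic returns. The distance sqrt(2(1 - rho)) is one common correlation-based choice and is my assumption; absolute returns are used only as a crude stand-in for volatility.

```python
import numpy as np

def correlation_dissimilarity(series):
    """series: 2-D array (rows = days, columns = shares). Returns a square distance matrix."""
    corr = np.corrcoef(np.asarray(series, dtype=float), rowvar=False)
    # Clip guards against tiny negative values from floating-point rounding
    dist = np.sqrt(2.0 * np.clip(1.0 - corr, 0.0, None))
    np.fill_diagonal(dist, 0.0)
    return dist

# Synthetic daily returns, for illustration only
rng = np.random.default_rng(1)
common = rng.normal(0, 0.01, 500)                      # shared market factor
returns = np.column_stack([
    common + rng.normal(0, 0.002, 500),                # share 0 follows the factor
    common + rng.normal(0, 0.002, 500),                # share 1 follows the factor
    rng.normal(0, 0.01, 500),                          # share 2 is independent
])

by_returns = correlation_dissimilarity(returns)
by_volatility = correlation_dissimilarity(np.abs(returns))   # |return| as volatility proxy
print(by_returns)
print(by_volatility)
```

Either matrix can then be passed to whatever clustering routine you already use; the clusters may differ depending on which one you pick.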
Sometimes certain stocks tend to move after some other stocks have already moved, i.e. there is a time lag. If you have such stocks in your dataset, use cross-correlation, which measures the correlation of 'X' with lagged or shifted 'Y'. This will help bring out a relationship or similarity that is not evident from simple correlation.
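A small sketch of lagged cross-correlation, assuming NumPy, two series of equal length, and a made-up helper name `lagged_correlation`. The "follower" series is built artificially to trail the "leader" by two days.

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Pearson correlation of x[t] with y[t + lag], for lag = -max_lag..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:x.size - lag], y[lag:]       # y shifted forward in time
        else:
            xs, ys = x[-lag:], y[:y.size + lag]      # y shifted backward in time
        out[lag] = float(np.corrcoef(xs, ys)[0, 1])
    return out

# Synthetic returns, for illustration only: follower repeats leader 2 days later
rng = np.random.default_rng(2)
leader = rng.normal(size=300)
follower = np.roll(leader, 2) + 0.1 * rng.normal(size=300)

result = lagged_correlation(leader, follower, max_lag=5)
best_lag = max(result, key=result.get)
print("correlation at lag 0:", result[0])
print("best lag:", best_lag)
```

A positive best lag here means the second series moves after the first one.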
Finally, if these measures do not help bring out the similarity between your chosen data series, use the Fourier transform (FT) if your series is stationary and a wavelet transform if it is non-stationary. Wavelet analysis is a fairly recent method in social science (just 8-10 years old), though it was used earlier to analyse signals in the sciences. It consists of decomposing a time series, i.e. breaking it into its spectral components (trend, cycle and noise), and, after denoising, looking for a relationship or similarity between the data series. Hope it helps.
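To give a feel for the wavelet idea, here is a very small sketch using the Haar wavelet written by hand with NumPy. The Haar wavelet, the function names, and the use of per-scale correlation as the similarity measure are all my assumptions for illustration; in practice a dedicated wavelet library such as PyWavelets would be the usual choice.

```python
import numpy as np

def haar_details(x, levels):
    """Haar detail coefficients per scale, finest scale first."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        if approx.size < 2:
            break
        if approx.size % 2:                  # drop the last point so pairs line up
            approx = approx[:-1]
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))   # fluctuation at this scale
        approx = (even + odd) / np.sqrt(2.0)          # smoothed series for next scale
    return details

def scale_correlations(x, y, levels=3):
    """Correlation between the detail coefficients of x and y at each scale."""
    pairs = zip(haar_details(x, levels), haar_details(y, levels))
    return [float(np.corrcoef(dx, dy)[0, 1]) for dx, dy in pairs]

# Synthetic series, for illustration only
rng = np.random.default_rng(3)
x = np.cumsum(rng.normal(size=64))
y = 2.0 * x + 5.0                            # same shape, different level and scale
z = np.cumsum(rng.normal(size=64))           # unrelated series

print("x vs y:", scale_correlations(x, y))
print("x vs z:", scale_correlations(x, z))
```

Two series that are similar only at slow scales (say, the same cycle but different day-to-day noise) would show a high correlation at the coarser levels and a low one at the finest level.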