I went through the algorithm and some papers on Robust PCA, and I understand that a matrix M is decomposed into a low-rank matrix L and a sparse matrix S, i.e. M = L + S. If I'm correct, it is the sparse matrix S that can be used to identify the outliers. I have two questions about this, and any help from the RG community would be appreciated:
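For reference, here is a minimal NumPy sketch of the decomposition M = L + S via Principal Component Pursuit (PCP). It uses a standard ADMM (augmented Lagrangian) iteration: singular value thresholding updates L, and elementwise soft-thresholding updates S. The defaults λ = 1/√max(m, n) and μ = mn / (4‖M‖₁) are the values commonly used for PCP. The function names `pcp`, `shrink` and `svd_shrink` are hypothetical, and this is a sketch rather than a tuned solver.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft-thresholding: proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_shrink(X, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt   # scale each singular vector by its shrunk value

def pcp(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Solve min ||L||_* + lam * ||S||_1  subject to  L + S = M  with ADMM."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)               # Lagrange multiplier for the constraint
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_shrink(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        R = M - L - S                  # constraint residual
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * norm_M:
            break
    return L, S
```

Because the iteration drives the residual M − L − S to zero, the returned L + S should reproduce M to within `tol` once it has converged.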
Question 1: If the data is a 1-D time series, how do I convert it into a 2-D matrix? I tried two approaches. Say I have 7 days of data and collect 24 data points per day (just as an example). In the first approach, I reshaped the 1-D vector into a 2-D array of shape (# records per day, # days), i.e. (24, 7), so that the 1st column contains only the 1st day's data, the 2nd column only the 2nd day's data, and so on.
In the second approach, I simply reshaped the 1-D time series vector of shape (168, 1) into a 2-D matrix of shape (24, 7), but in this case the entries were filled row by row, i.e. the first 7 entries of the 1st day went into row 1, entries 8-14 into row 2, entries 15-21 into row 3, and so on.
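The two layouts can be sketched in NumPy as follows, assuming the hourly samples are stored in a 1-D array. Here a stand-in series `x` holding the values 0 to 167 makes the positions easy to read, and the variable names are hypothetical. Approach 1 corresponds to a column-major (`order="F"`) reshape, and approach 2 to NumPy's default row-major reshape:

```python
import numpy as np

days, per_day = 7, 24                          # sizes from the example above
x = np.arange(days * per_day, dtype=float)     # stand-in series: value = time index

# Approach 1: one column per day (column-major / Fortran order).
M_days = x.reshape(per_day, days, order="F")   # shape (24, 7); column j = day j + 1
# Equivalent construction: one row per day, then transpose.
assert np.array_equal(M_days, x.reshape(days, per_day).T)

# Approach 2: default row-major (C order) reshape into the same (24, 7) shape.
M_rows = x.reshape(per_day, days)              # row i holds samples 7*i .. 7*i + 6
```

In the day-per-column layout, a repeating daily pattern makes the columns similar, and that similarity is what the low-rank component L is meant to capture. In the row-major layout, each row holds 7 consecutive samples, so rows straddle day boundaries. For example, row 4 holds samples 22-28, which span the end of day 1 and the start of day 2.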
However, when I applied the R-PCA algorithm, the reconstruction (L + S) of the matrix built with the second approach almost matched the original time series (which is the purpose of PCP). The reconstruction from the first approach, by contrast, looked almost random and did not match the values of the original matrix M. Any idea how and why this happened?
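PCP enforces L + S = M entry by entry (up to the solver tolerance) regardless of layout. A reconstruction that looks random therefore suggests checking the surrounding bookkeeping rather than the decomposition itself. Since the code isn't shown, the following are only assumptions, but two things are worth ruling out. First, check that the solver actually converged: the relative residual below should be near zero. Second, check that the matrix is flattened back into a series using the same order it was built with. In NumPy, building the day-per-column matrix with `order="F"` but flattening it with the default C order interleaves the days. The helper name `relative_residual` is hypothetical.

```python
import numpy as np

def relative_residual(M, L, S):
    """||M - L - S||_F / ||M||_F; should be tiny if the RPCA solver converged."""
    return np.linalg.norm(M - L - S) / np.linalg.norm(M)

x = np.arange(168, dtype=float)     # stand-in for the 7 x 24 series (value = time index)
M = x.reshape(24, 7, order="F")     # approach 1: one column per day

x_same = M.ravel(order="F")         # flatten with the SAME order -> original series
x_mixed = M.ravel()                 # default C order -> reads across days row by row
```

Here `x_mixed` starts 0, 24, 48, ..., i.e. the same hour taken from successive days, so comparing it against the original series would look scrambled even if L + S were exact.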
Question 2: How would I use the sparse matrix S to detect the anomalies? I'm thinking of something along the lines of a distance measure, but any insights or references on this would be welcome.
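One simple option, offered as an assumption rather than the only approach, is a robust threshold on S. Because S holds the part of M that the low-rank model cannot explain, we can flag the time steps where |S| is large relative to a robust scale estimate. The sketch below first maps S back to time order, using the same reshape order as when M was built. It then uses the median absolute deviation (MAD) as the scale and flags entries more than `k` robust standard deviations from the median. The name `flag_anomalies` and the default `k=3` are hypothetical choices. If most entries of S are exactly zero, which is common after soft-thresholding, the MAD is zero, and every nonzero entry is flagged.

```python
import numpy as np

def flag_anomalies(S, k=3.0, order="F"):
    """Boolean mask over the original 1-D time axis marking large entries of S.

    S     : sparse component from RPCA, same shape/layout as the matrix M.
    k     : hypothetical threshold, in robust standard deviations.
    order : reshape order used to build M ("F" = one column per day).
    """
    s = np.asarray(S, dtype=float).ravel(order=order)  # back to time order
    med = np.median(s)
    mad = np.median(np.abs(s - med))
    scale = 1.4826 * mad   # MAD -> std estimate if the bulk of s is roughly Gaussian
    # If S is exactly zero almost everywhere, the MAD is 0: flag any nonzero deviation.
    threshold = k * scale if scale > 0 else 0.0
    return np.abs(s - med) > threshold
```

The mask can be turned into time indices with `np.flatnonzero(mask)`. For a (24, 7) day-per-column matrix, index 53, for example, corresponds to hour 6 of day 3.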