Robust Principal Component Analysis?

Abhinandan Malhotra @Abhinandan_Malhotra

07 July 2018

I went through the algorithm and some papers on Robust PCA, and as I understand it, a matrix M is decomposed into a low-rank matrix L and a sparse matrix S. If I'm correct, it is the sparse matrix S that can be used to identify the outliers. I have two questions about this, and any help from the RG community would be appreciated:
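For reference, the decomposition M = L + S described above is usually computed by Principal Component Pursuit (PCP). The following is only a minimal sketch, assuming the augmented-Lagrangian iteration and the default choices of lambda and mu from the Candès et al. paper linked later in this thread. The function names (shrink, svd_threshold, pcp) are illustrative and not taken from any library.

```python
import numpy as np

def shrink(X, tau):
    """Entrywise soft-thresholding: pull every value towards zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Soft-threshold the singular values of X (promotes low rank)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def pcp(M, tol=1e-7, max_iter=1000):
    """Sketch of PCP: split M into low-rank L and sparse S with L + S ~ M."""
    M = np.asarray(M, dtype=float)
    n1, n2 = M.shape
    lam = 1.0 / np.sqrt(max(n1, n2))        # weight on the l1 term (assumed default)
    mu = n1 * n2 / (4.0 * np.abs(M).sum())  # penalty parameter (assumed default)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                    # Lagrange multiplier
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(M):
            break
    return L, S
```

With this kind of iteration, L + S should reproduce M up to the stopping tolerance however the matrix was laid out, so a large mismatch usually points to a problem outside the decomposition itself.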

Question 1: If the data is a 1-D time series, how should I convert it into a 2-D matrix? I tried two approaches. Say I have 7 days of data and collect 24 data points each day (just as an example). In the first approach, I reshaped my 1-D vector into a 2-D array of shape (# records per day, # days), i.e. (24, 7), so that the 1st column contains only the 1st day's data, the 2nd column only the 2nd day's data, and so on.

In the second approach, I simply reshaped my 1-D time series vector of shape (168, 1) into a 2-D matrix of shape (24, 7), but in this case the entries were spread horizontally, i.e. the first 7 entries of the 1st day on row 1, entries 8-14 on row 2, entries 15-21 on row 3, and so on for each day.

However, when I applied the R-PCA algorithm, the values reconstructed from the 2-D matrix of the second approach almost matched the original time series (which is the purpose of PCP), whereas the reconstructed (L+S) values from the matrix of the first approach were almost random and didn't really match the values of the original matrix M. Any idea how and why this happened?
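The two layouts can be written down in NumPy as below. This is a sketch with a stand-in series (np.arange(168)), not the actual data. It also illustrates one possible source of the mismatch, which is only an assumption and not something confirmed here: if the column-per-day matrix is flattened back in the default row-major order, the values come out shuffled in time even when L + S equals M exactly.

```python
import numpy as np

x = np.arange(168)  # stand-in for 7 days x 24 readings per day

# Approach 1: one column per day (column-major fill).
M_days = x.reshape((24, 7), order="F")

# Approach 2: default row-major fill; each row holds 7 consecutive samples.
M_rows = x.reshape((24, 7))

# Column 0 of approach 1 is day 1; row 0 of approach 2 is the first 7 samples.
first_day = M_days[:, 0]
first_row = M_rows[0]

# Flattening has to use the same order as the reshape to recover the series.
back_ok = M_days.flatten(order="F")  # same order as the reshape
back_wrong = M_days.flatten()        # row-major read of a column-major layout
```

Here back_ok equals x while back_wrong does not, so it is worth checking which order was used when turning L + S back into a series for comparison.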

Question 2: How would I use the sparse matrix to determine the anomalies? I'm thinking of something along the lines of a distance measure, but any insights or references on this would also be welcome.
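One simple heuristic, which is not part of PCP itself, is to flag entries of S whose magnitude is large compared with a robust scale of the data. The sketch below assumes a median absolute deviation (MAD) scale and a cutoff k = 3. The helper name flag_anomalies, the cutoff, and the synthetic M and S are all hypothetical stand-ins, not output from a real decomposition.

```python
import numpy as np

def flag_anomalies(S, M, k=3.0):
    """Boolean mask of entries of S that exceed k robust scale units of M."""
    # 1.4826 * MAD approximates the standard deviation for Gaussian data.
    scale = 1.4826 * np.median(np.abs(M - np.median(M)))
    return np.abs(S) > k * scale

# Synthetic stand-ins for the data matrix and its sparse component.
rng = np.random.default_rng(0)
M = rng.normal(size=(24, 7))
S = np.zeros_like(M)
S[3, 2] = 10.0   # a large sparse entry that should be flagged
S[5, 1] = 0.01   # a negligible entry that should be ignored

mask = flag_anomalies(S, M)

# Map flagged cells back to positions in the 1-D series,
# assuming the one-column-per-day layout (column-major).
rows, cols = np.nonzero(mask)
time_index = cols * M.shape[0] + rows
```

The cutoff k trades false alarms against missed anomalies and would need tuning on the actual data.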

Ajit kumar Roy

I suggest reading this interesting article:

https://statweb.stanford.edu/~candes/papers/RobustPCA.pdf

David Eugene Booth

There is a literature on this problem that you can find by using Google with the term Robust PCA. Best wishes, David Booth

Here is a link:

https://www.google.com/search?q=robust+pca&ie=utf-8&oe=utf-8&client=firefox-b-1-ab

Abhinandan Malhotra

Dear Ajit sir, I had already gone through the paper you attached, as it is the base paper, and although it deals with images, my concern has more to do with a time series application, or for that matter any 1-D vector. Moreover, Mr. David, I did apply the algorithm, as I already stated in my question. I wanted to understand the different behaviour it exhibits depending on how a 1-D vector is converted into a 2-D array. I also wanted to understand how a distance measure could be incorporated into the algorithm to give some threshold value for detecting outliers. Any help in that direction would be appreciated.
