Does anyone have experience with rotation forest algorithm?

Xiaomeng Wu Popular answer

I did not carefully read the whole paper, but maybe I can answer some of your questions.

How and how many classes should be eliminated?

I did not find explanations from the paper on this issue. A possible solution is to fix the number of classes to be selected, say between 50% and 75% of the whole classes. Then perform random class selection. The authors mentioned that "running PCA on a subset of classes instead on the whole set is done in a bid to avoid identical coefficients if the same feature subset is chosen for different classifiers." So basically, if you can ensure that you can avoid occasionally identical selection of features across the L classifiers, then you do not need to perform class selection, i.e. you can use the whole classes. Actually, even if you could not avoid identical feature selection, I personally do not think it is a big issue.

In every iteration a new random subset has to be selected?

The answer is yes, but as discussed above, I do not think it is a must.

What if it is a two-class data set?

Then obviously, you do not need to (cannot) perform class selection.

Does it make sense to select a bootstrap, centering the data to do PCA and then to generate scores using the rearranged rotation matrix on the whole data?

Yes, I think it makes sense. The idea is quite similar to random forest in the sense that both select a subset of features (feature dimensions if my understanding is correct) to introduce variation and thus avoid correlation between the L classifiers. The difference lies in the fact that rotation forest casts an additional PCA to the selected features. If we can consider feature selection itself as a simple feature projection process, then both random forest and rotation forest project the original feature onto a feature subspace, where rotation forest also translates (covariance-PCA), scales (correlation-PCA) and rotates the subspace. Rotation forest separates the features into subsets, and performs translation, scaling, and rotation to the subset features for both training and testing instances. It is similar to classification based on multiple types of features, where we perform PCA to each individual feature type, concatenate the projected feature vectors to obtain a single vector, and then use the single vector for training and testing. So yes, it makes sense to me.

Xiaomeng Wu

I did not carefully read the whole paper, but maybe I can answer some of your questions.

How and how many classes should be eliminated?

In every iteration a new random subset has to be selected?

The answer is yes, but as discussed above, I do not think it is a must.

What if it is a two-class data set?

Then obviously, you do not need to (cannot) perform class selection.

Does it make sense to select a bootstrap, centering the data to do PCA and then to generate scores using the rearranged rotation matrix on the whole data?

Marcelo Bacher

Great explanation! Thanks!

Marcelo Bacher

I have an additional question: I suppose that the data has 12 features (1,2...,12) and I divide them aleatory into 3 sebsets of 4 features each one without overlapping. Then, the rotation matrix for a single drawn R(i) would be:

4x4 0x0 0x0

0x0 4x4 0x0

0x0 0x0 4x4

where 4x4 means the matrix of principal components whose elements are 4 eigenvectors and 0x0 a null matrix with dimension 4x4. In case I randomly select the following features sets: (5,6,7,8) (1,2,3,4) and (9,10,11,14), the rotation matrix Ra(i) would be after rearranging the columns:

0x0 4x4 0x0

4x4 0x0 0x0

0x0 0x0 4x4

The training data Xi = X*Ra'(i) where Ra'(i) is the transpose of Ra(i)

Is this correct?

Are known methods for ensemble of One-Class SVM that do not lead to overfitting?

Can anyone recommend approaches to compute of conditional entropy on real-valued multivariate data?

R Program for Mac (OS Leopard)

(Multivariate) Time-series analysis. Can someone recommend me introductinary, intermediate and advanced text books?

How can I deal with non-measured data in a multivariate measurement?

Could you recommend some articles on Urban Transportation System optimization and Innovation?

Feedback defines the constitution of an organism?

How to learn more about SPSS and its Application?

Is there a problem with my RNA pellet?

Text-Communication from the M1 Hand Area using BCI—and then there is Elon Musk?

Handling Missing Data and Building a Predictive Model with Incomplete Information ?

Has anyone applied Python in the field of textile engineering for data analysis, automation, or smart textiles?

Can we mark 'EFL Learners shifting from general digital to AI technologies' as technological transition?

What are examples of AI for good projects a teacher can assign to students?

RNA Extraction Using Hot Borate Method No Longer Working?