A random forest is basically an ensemble of decision trees. Each tree partitions the dataset through a sequence of single-variable (axis-aligned) splits, with each split chosen from a random subset of the variables. The number of trees in the forest and the size of the variable subset are hyper-parameters and must be chosen a priori. The number of trees is typically of the order of hundreds, while the variable subset is quite small compared with the total number of variables (a common default is the square root of that total).
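If it helps to see the two hyper-parameters concretely, here is a minimal sketch in Python with scikit-learn (my own illustration, not from any answer here; the dataset and parameter values are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# n_estimators: number of trees (typically hundreds)
# max_features: size of the random variable subset tried at each split;
#               "sqrt" gives about sqrt(20) ~ 4, small relative to 20
forest = RandomForestClassifier(n_estimators=300, max_features="sqrt",
                                oob_score=True, random_state=0)
forest.fit(X, y)
print(forest.oob_score_)  # out-of-bag accuracy estimate
```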
Random forests also provide a natural way of assessing the importance of input variables (predictors). This is achieved by permuting the values of one variable at a time in the out-of-bag samples and measuring how much the out-of-bag error increases. If the error grows noticeably, the variable is important for the decision.
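A rough sketch of this idea, continuing from the snippet above: scikit-learn's permutation_importance shuffles one column at a time and measures the score drop. Note it is applied here to a held-out split rather than to the strict out-of-bag samples Breiman used, so this is an approximation of that scheme:

```python
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
forest.fit(X_tr, y_tr)

# For each feature: shuffle its column n_repeats times, record the
# average accuracy drop; a large drop means the feature matters.
result = permutation_importance(forest, X_te, y_te, n_repeats=10,
                                random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {i}: mean importance {result.importances_mean[i]:.3f}")
```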
I used them once, a long time ago, so this description is certainly very superficial.
Leo Breiman is the statistician who introduced random forests, and his original paper (Breiman, "Random Forests", Machine Learning, 2001) is worth checking.
A random forest is an ensemble of unpruned decision trees. Each tree is built from a random (bootstrap) sample of the training dataset, and in each tree a random subset of the available variables is used to choose how to partition the data at each node. The resulting tree models together form the final ensemble. Unlike boosting, where the base models are trained sequentially and combined using a weighting scheme, the trees are trained independently and their predictions are simply combined. To classify a new object, put its input vector down each of the trees in the forest; every tree records a vote for a class, and the object is labeled as a member of the class with the most votes. The randomness introduced by the random forest model in selecting the data and the variables gives robustness to noise and delivers substantial computational efficiencies. Also, very little, if any, preprocessing of the data needs to be performed, and the need for a separate variable-selection step is avoided since the algorithm effectively does its own.
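To make the train-independently-then-vote mechanism concrete, here is a rough do-it-yourself sketch (my own illustration; the helper names fit_forest and predict_forest are hypothetical, and scikit-learn's DecisionTreeClassifier supplies the unpruned base trees):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    n = len(X)
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap sample of the rows
        # max_features="sqrt": random variable subset tried at each split;
        # no pruning parameters are set, so trees grow to full depth
        tree = DecisionTreeClassifier(max_features="sqrt")
        trees.append(tree.fit(X[idx], y[idx]))  # each tree fit independently
    return trees

def predict_forest(trees, X):
    votes = np.stack([t.predict(X) for t in trees])  # one row of votes per tree
    # majority vote per object: most frequent class label in each column
    return np.apply_along_axis(
        lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
```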
I would like to quote Albert Einstein: "The only source of knowledge is experience." Please construct your own knowledge by first searching for it in the deep sea of the internet, and later ask for guidance on very specific issues; that is what an advisor does. If you feel offended, I apologize in advance.