Training an AdaBoost ensemble in its purest form is hard to parallelize. The problem is that the weighting of the examples in the next iteration of the algorithm depends on the performance of the previous iteration, so a new iteration cannot start before the previous one finishes. Classification with an AdaBoost ensemble, on the other hand, is easily parallelized: each weak classifier can work independently of the others, so it can run on its own thread or on a remote machine. I've skimmed the paper suggested by Stephane Genaud, and it seems the authors experiment with a variation of the algorithm, not with its original formulation (they create a number of workers and a master; the workers each train a number of classifiers and return the best one to the master, which builds the final ensemble).
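A minimal sketch of the parallel classification step described above, assuming a trained ensemble given as hypothetical lists `stumps` (weak classifiers with a `predict` method returning +1/-1) and `alphas` (their weights); the names and threading choice are mine, not from the original answer.

from concurrent.futures import ThreadPoolExecutor
import numpy as np

def parallel_predict(stumps, alphas, X):
    # Each weak classifier is independent of the others, so its predictions
    # can be computed on its own thread (or shipped to a remote worker).
    with ThreadPoolExecutor() as pool:
        votes = list(pool.map(lambda clf: clf.predict(X), stumps))
    # Combine the weak predictions by the usual weighted majority vote.
    score = sum(a * v for a, v in zip(alphas, votes))
    return np.sign(score)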
To parallelize an algorithm, you need to understand its nature and see which operations are dependent and which are independent. Identifying the data parallelism, control parallelism, and temporal parallelism in the execution of the algorithm will help you understand the different ways in which it can be parallelized; the sketch below illustrates the first of these. The "best" way depends on the machine you are targeting. Textbooks on parallel programming can help you further, and there may also be publications on parallelizing classifiers that could get you started. Wish you all the best.
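A small illustrative sketch (my own example, not from this answer) of data parallelism: the data is split across workers while every worker runs the same operation, here counting misclassified examples per chunk.

from concurrent.futures import ProcessPoolExecutor
import numpy as np

def chunk_errors(args):
    # Same operation applied to each chunk: count misclassified examples.
    y_true, y_pred = args
    return int(np.sum(y_true != y_pred))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, 10_000)
    y_pred = rng.integers(0, 2, 10_000)
    # Split the data into four chunks, one per worker.
    chunks = list(zip(np.array_split(y_true, 4), np.array_split(y_pred, 4)))
    with ProcessPoolExecutor(max_workers=4) as pool:
        error_rate = sum(pool.map(chunk_errors, chunks)) / len(y_true)
    print(error_rate)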
I am not familiar with the AdaBoost algorithm. However, if you want a practical way to parallelize an algorithm efficiently, you have to fully understand its nature and the different ways it can be implemented. The easiest route is to use the parallel primitives in common libraries, which are usually well implemented; a sketch follows.
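A minimal sketch of that "use the library's parallel primitives" advice, using joblib.Parallel, a common choice in the Python ecosystem; the data and the weak learners here are illustrative assumptions, not part of the original answer.

from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

def fit_one(seed):
    # Fit one independent weak learner; independent fits parallelize trivially.
    return DecisionTreeClassifier(max_depth=1, random_state=seed).fit(X, y)

if __name__ == "__main__":
    # n_jobs=-1 lets joblib use all available cores.
    models = Parallel(n_jobs=-1)(delayed(fit_one)(s) for s in range(8))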
The parallelization scheme is based on distributing the classifiers over the processors. A master owns all the classifiers and distributes them to slaves, each of which computes its assigned classifiers using a local copy of the training dataset.
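A minimal sketch of that master/worker scheme under stated assumptions: the "classifiers" are scikit-learn estimators identified here by a hypothetical per-classifier parameter, and each worker process fits its assigned ones against its own copy of the training data.

from multiprocessing import Pool
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Training data; each worker process ends up with a local copy.
X, y = make_classification(n_samples=1000, random_state=0)

def fit_assigned(depth):
    # Worker: compute the classifier it has been assigned on local data.
    return DecisionTreeClassifier(max_depth=depth).fit(X, y)

if __name__ == "__main__":
    # Master: owns the list of classifiers and distributes them to workers.
    assignments = [1, 2, 3, 4]  # one illustrative parameter per classifier
    with Pool(processes=4) as pool:
        fitted = pool.map(fit_assigned, assignments)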