I'm going to assume you want to parallelize an existing algorithm. First you need to consider what kind of parallelization you want (listed in order of increasing complexity):
use all cores of a single computer (see the sketch after this list)
use a GPU (e.g. via CUDA or OpenCL)
use a cluster of multiple nodes (e.g. MLlib on Spark, Mahout on Hadoop, ...)
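As a minimal sketch of the first option, here is what multi-core parallelism can look like in Python, assuming scikit-learn and joblib are installed; the model and hyperparameter grid are placeholders of my own, not anything prescribed:

```python
# Spread independent work (here: one model fit per hyperparameter
# setting) across all local cores with joblib.
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

def fit_one(c):
    # Each call is independent of the others, so the fits can run in parallel.
    return LogisticRegression(C=c, max_iter=1000).fit(X, y).score(X, y)

# n_jobs=-1 uses every available core on the machine.
scores = Parallel(n_jobs=-1)(delayed(fit_one)(c) for c in [0.01, 0.1, 1.0, 10.0])
print(scores)
```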
Each approach has its advantages and disadvantages, and (particularly the last one) real costs. What you need depends on the size of your learning task and on what the algorithm of your choice lends itself to.
It is worth noting that not everything can be parallelized efficiently: some algorithms parallelize well, others don't. More information about your task would make it easier to give a specific answer.
That said, many parallel versions of learning algorithms are built around some variant of stochastic gradient descent, where the gradient computations for (batches of) training instances are spread across nodes or cores. You can find more information about that in the following NIPS paper:
Chu, Cheng-Tao, et al. "Map-Reduce for Machine Learning on Multicore." Advances in Neural Information Processing Systems 19 (2007): 281–288.
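As a hedged illustration of the paper's core pattern, here is a toy version in Python: each worker computes a partial gradient on its own data shard ("map"), the partial gradients are summed ("reduce"), and one descent step is taken. The shard count, learning rate, and synthetic data are my own illustrative choices, not values from the paper:

```python
import numpy as np
from multiprocessing import Pool

def partial_gradient(args):
    X_shard, y_shard, w = args
    # Gradient of 0.5 * ||X w - y||^2 restricted to this shard ("map" step).
    return X_shard.T @ (X_shard @ w - y_shard)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8_000, 10))
    y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=8_000)

    w = np.zeros(10)
    n_shards, lr = 4, 1e-4
    shards = list(zip(np.array_split(X, n_shards), np.array_split(y, n_shards)))

    with Pool(n_shards) as pool:
        for _ in range(100):
            grads = pool.map(partial_gradient, [(Xs, ys, w) for Xs, ys in shards])
            w -= lr * np.add.reduce(grads)  # "reduce": sum the partial gradients

    print(w)
```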
I'm a bit surprised that all other answers seem to go for cluster parallelization without considering other options.
The answer is: it depends. Some algorithms lend themselves to parallelization well, others don't. Some can run on distributed shared-nothing systems; some map nicely onto matrix-oriented GPU implementations. Some methods, like ensembles, practically beg for parallelization. There is no single "best way" to parallelize machine learning; you need to take it on a case-by-case basis.
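For instance, a random forest trains its trees independently, so scikit-learn can fan the fits out over all local cores; this example assumes scikit-learn is available and uses a synthetic dataset of my own:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=20_000, n_features=30, random_state=0)

# n_jobs=-1: each of the 500 trees is an independent task, spread over all cores.
forest = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
forest.fit(X, y)
print(forest.score(X, y))
```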
You might want to do this in Python. The language itself is quite easy to learn and use, and the following two slide decks explain an approach to parallelized machine learning in Python that is based essentially on distributing the cross-validation folds.
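Without the slides at hand, a minimal sketch of that idea using scikit-learn's built-in support for parallel cross-validation might look like this; the model and fold count are illustrative assumptions, not taken from the slides:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)

# Each of the 10 folds is fitted and scored as an independent job,
# and n_jobs=-1 distributes those jobs over all local cores.
scores = cross_val_score(SVC(C=1.0), X, y, cv=10, n_jobs=-1)
print(scores.mean(), scores.std())
```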
Hadoop is good, but distributed GraphLab is better because it is designed and optimized for machine learning workloads. In general, methods that avoid MapReduce's rigid batch model tend to suit iterative machine learning algorithms better.
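To make that point concrete, here is a deliberately simplified (and not GraphLab-specific) Python sketch of the difference: a MapReduce-style loop pays the data-loading cost on every iteration, while an in-memory framework in the GraphLab/Spark mold loads once and then iterates. The simulated I/O delay and toy update are purely hypothetical:

```python
import time
import numpy as np

def load_data():
    # Stand-in for reading a training set from distributed storage.
    time.sleep(0.5)  # simulated I/O cost per read (hypothetical)
    rng = np.random.default_rng(0)
    return rng.normal(size=(1_000, 10))

def one_iteration(X, w):
    # Toy gradient step on 0.5 * ||X w||^2, just to have iterative state.
    return w - 1e-3 * X.T @ (X @ w)

# MapReduce style: data is re-read on every one of the 20 iterations.
w = np.ones(10)
for _ in range(20):
    X = load_data()
    w = one_iteration(X, w)

# In-memory style: data is read once, iterations run on cached state.
w = np.ones(10)
X = load_data()
for _ in range(20):
    w = one_iteration(X, w)
```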
Using the word "parallelize" is a little ambiguous. Under the assumption that you mean multi-core systems such as modern Intel chips, you may like to check this paper, which applied the MapReduce paradigm to a wide variety of ML algorithms.