I'm going to assume you want to parallelize an existing algorithm. First you need to consider what kind of parallelization you want (listed in order of increasing complexity):
use all cores of a single computer (see the sketch after this list)
use a GPU (e.g. via CUDA or OpenCL)
use a cluster of multiple nodes (e.g. MLlib on Spark, Mahout on Hadoop, ...)
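As a minimal sketch of the first option, here is what multi-core parallelism can look like in Python, assuming scikit-learn and joblib are installed; the model and hyperparameter grid are placeholders of my own, not anything prescribed:

```python
# Spread independent work (here: one model fit per hyperparameter
# setting) across all local cores with joblib.
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

def fit_one(c):
    # Each call is independent of the others, so the fits can run in parallel.
    return LogisticRegression(C=c, max_iter=1000).fit(X, y).score(X, y)

# n_jobs=-1 uses every available core on the machine.
scores = Parallel(n_jobs=-1)(delayed(fit_one)(c) for c in [0.01, 0.1, 1.0, 10.0])
print(scores)
```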
Each approach has its advantages and disadvantages, and (particularly the last one) real costs. What you need depends on the size of your learning task and on what the algorithm of your choice lends itself to.
It is worth noting that not everything can be parallelized efficiently: some algorithms parallelize well, others don't. More information about your task would make it easier to give a specific answer.
That said, many parallel versions of learning algorithms are built around some variant of stochastic gradient descent, where the gradient computations for (batches of) training instances are spread across nodes or cores. You can find more information about that in the following NIPS paper:
Chu, Cheng-Tao, et al. "Map-Reduce for Machine Learning on Multicore." Advances in Neural Information Processing Systems 19 (2007): 281–288.
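As a hedged illustration of the paper's core pattern, here is a toy version in Python: each worker computes a partial gradient on its own data shard ("map"), the partial gradients are summed ("reduce"), and one descent step is taken. The shard count, learning rate, and synthetic data are my own illustrative choices, not values from the paper:

```python
import numpy as np
from multiprocessing import Pool

def partial_gradient(args):
    X_shard, y_shard, w = args
    # Gradient of 0.5 * ||X w - y||^2 restricted to this shard ("map" step).
    return X_shard.T @ (X_shard @ w - y_shard)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8_000, 10))
    y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=8_000)

    w = np.zeros(10)
    n_shards, lr = 4, 1e-4
    shards = list(zip(np.array_split(X, n_shards), np.array_split(y, n_shards)))

    with Pool(n_shards) as pool:
        for _ in range(100):
            grads = pool.map(partial_gradient, [(Xs, ys, w) for Xs, ys in shards])
            w -= lr * np.add.reduce(grads)  # "reduce": sum the partial gradients

    print(w)
```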
I'm a bit surprised that all other answers seem to go for cluster parallelization without considering other options.
The answer is: it depends. Some algorithms lend themselves to parallelization well, others don't. Some can run on distributed shared-nothing systems; some map nicely onto matrix-oriented GPU implementations. Some methods, like ensembles, practically beg for parallelization. There is no single "best way" to parallelize machine learning; you need to take it on a case-by-case basis.
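For instance, a random forest trains its trees independently, so scikit-learn can fan the fits out over all local cores; this example assumes scikit-learn is available and uses a synthetic dataset of my own:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=20_000, n_features=30, random_state=0)

# n_jobs=-1: each of the 500 trees is an independent task, spread over all cores.
forest = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
forest.fit(X, y)
print(forest.score(X, y))
```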
You might want to do this in Python. The language itself is quite easy to learn and use, and the following two slide decks explain an approach to parallelized machine learning in Python that is based essentially on distributing the cross-validation folds.
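Without the slides at hand, a minimal sketch of that idea using scikit-learn's built-in support for parallel cross-validation might look like this; the model and fold count are illustrative assumptions, not taken from the slides:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)

# Each of the 10 folds is fitted and scored as an independent job,
# and n_jobs=-1 distributes those jobs over all local cores.
scores = cross_val_score(SVC(C=1.0), X, y, cv=10, n_jobs=-1)
print(scores.mean(), scores.std())
```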
Hadoop is good, but distributed GraphLab is better because it is designed and optimized for machine learning workloads. In general, methods that avoid MapReduce's rigid batch model tend to suit iterative machine learning algorithms better.
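To make that point concrete, here is a deliberately simplified (and not GraphLab-specific) Python sketch of the difference: a MapReduce-style loop pays the data-loading cost on every iteration, while an in-memory framework in the GraphLab/Spark mold loads once and then iterates. The simulated I/O delay and toy update are purely hypothetical:

```python
import time
import numpy as np

def load_data():
    # Stand-in for reading a training set from distributed storage.
    time.sleep(0.5)  # simulated I/O cost per read (hypothetical)
    rng = np.random.default_rng(0)
    return rng.normal(size=(1_000, 10))

def one_iteration(X, w):
    # Toy gradient step on 0.5 * ||X w||^2, just to have iterative state.
    return w - 1e-3 * X.T @ (X @ w)

# MapReduce style: data is re-read on every one of the 20 iterations.
w = np.ones(10)
for _ in range(20):
    X = load_data()
    w = one_iteration(X, w)

# In-memory style: data is read once, iterations run on cached state.
w = np.ones(10)
X = load_data()
for _ in range(20):
    w = one_iteration(X, w)
```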
Using the word "parallelize" is a little ambiguous. Under the assumption that you mean multi-core systems such as modern Intel chips, you may like to check this paper, which applied the MapReduce paradigm to a wide variety of ML algorithms.