This depends on your capability with the tools. To be clear: Mahout is a machine learning library that lets you run a number of algorithms, but Mahout was built to work on top of Hadoop, which means that sooner or later you may need to write your own code, or modify existing code, to address specific problems. In practice, that means writing MapReduce jobs to solve your problem. Spark, however, removes much of that headache: it supports many machine learning algorithms out of the box, and they are easy to use. I would therefore advise you to go with Spark. You can see Spark in action by visiting this link:
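To show why hand-written MapReduce code gets tedious, here is a minimal pure-Python sketch of the map, shuffle, and reduce phases of the classic word-count example. All names here are illustrative; a real Hadoop job would be written against the Java MapReduce API and run across a cluster.

```python
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) pairs, like a Hadoop Mapper.
    return [(word, 1) for word in line.split()]

def shuffle_phase(pairs):
    # Group values by key, like the framework's shuffle/sort step.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the counts for each word, like a Hadoop Reducer.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data tools", "big data frameworks"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts)  # {'big': 2, 'data': 2, 'tools': 1, 'frameworks': 1}
```

Even this toy version needs three separate functions for one simple count; with Spark the same computation is a short chain of high-level operations, which is why I recommend it.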
As I said, it all depends on the problem you are trying to solve and your facility with the tools. You can also use ECL-ML (https://hpccsystems.com/download/free-modules/ecl-ml) to process large volumes of data. Also, do not forget that WEKA, the popular machine learning tool, supports distributed data mining, so you can run many of its algorithms in distributed mode (http://weka.sourceforge.net/packageMetaData/distributedWekaHadoop/index.html).
I think you can use Apache Hadoop. The essential problem in big data is how to distribute the computation, and Apache Hadoop is the right framework for that.