This depends on the method you want to use. I would recommend Apache Spark for most big data projects. As far as I remember, it is usually deployed on top of a Hadoop backend (HDFS/YARN), although it can also run in standalone mode without Hadoop.
The survey below might help you get an idea and an overview of scaling options. However, as has already been answered, it depends on your type of analysis and the type of data (e.g., image, video, etc.):
A survey of methods for distributed machine learning
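
To give a rough feeling for what "scaling" means in practice, here is a minimal sketch of the data-parallel pattern that frameworks such as Spark are built around: split the data into shards, let each worker compute a partial gradient, then combine the results and update the model. This is not Spark code. It uses only the Python standard library, threads stand in for cluster nodes, and the names `partial_gradient` and `train`, the toy dataset, and the learning rate are all my own assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_gradient(shard, w, b):
    """Summed squared-error gradients for y ~ w*x + b over one data shard."""
    gw = sum(2 * (w * x + b - y) * x for x, y in shard)
    gb = sum(2 * (w * x + b - y) for x, y in shard)
    return gw, gb, len(shard)


def train(data, n_workers=4, lr=0.01, epochs=2000):
    # "Map" side: split the data into one shard per (hypothetical) worker.
    shards = [data[i::n_workers] for i in range(n_workers)]
    shards = [s for s in shards if s]  # drop empty shards for tiny datasets
    w, b = 0.0, 0.0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(epochs):
            # Each worker computes the gradient on its own shard only.
            results = list(pool.map(lambda s: partial_gradient(s, w, b), shards))
            # "Reduce" side: combine partial gradients into one mean gradient.
            n = sum(r[2] for r in results)
            gw = sum(r[0] for r in results) / n
            gb = sum(r[1] for r in results) / n
            # A single shared model is updated with the combined gradient.
            w -= lr * gw
            b -= lr * gb
    return w, b


# Toy data generated from y = 3x + 1 (an assumption, for illustration only).
data = [(float(x), 3.0 * x + 1.0) for x in range(10)]
w, b = train(data)
print(f"w = {w:.3f}, b = {b:.3f}")
```

On a real cluster the shards would live on different machines, so the cost of shipping gradients around becomes the main bottleneck. Whether this pattern fits at all depends, again, on your type of analysis and data.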