I don't think there is a simple answer to this question. Maybe it's better to search for something like "how to reuse data mining and machine learning algorithms implemented for Weka in the Apache Hadoop ecosystem". That said, I can recommend taking a look at the publication "A Parallel Distributed Weka Framework for Big Data Mining using Spark", which describes how Weka was integrated with Apache Spark. Additionally, the authors published the source code on GitHub: https://github.com/ariskk/distributedWekaSpark