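The Apache Spark ecosystem is very good, and you have a choice of Java, Scala, or Python (via the PySpark API). If performance is a concern and you will mostly be using Spark DataFrames rather than raw RDDs, Python performs about the same as Scala for DataFrame work: the Catalyst optimizer turns DataFrame operations into an optimized plan that runs as JVM code, so the language you wrote them in makes little difference. The main exceptions are RDD code and Python UDFs, which have to pass data through Python worker processes and are slower.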