This is a question regarding a PySpark error (Py4JJavaError) in a Jupyter Notebook.
I'm running the demo code for linear least squares, Lasso, and ridge regression from https://spark.apache.org/docs/2.2.0/mllib-linear-methods.html
in a Jupyter Notebook using the [conda env:python2] kernel.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD, LinearRegressionModel

sc = SparkContext()

# Load and parse the data
def parsePoint(line):
    values = [float(x) for x in line.replace(',', ' ').split(' ')]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("data/mllib/ridge-data/lpsa.data")
parsedData = data.map(parsePoint)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The piece above runs fine. However, when I run the code below:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
parsedData.map(lambda lp: lp.features).mean()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
an error occurred:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I don't understand what this means.
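One thing I noticed while debugging: Spark transformations like `map(parsePoint)` are lazy, so `parsePoint` only actually runs when the action (`.mean()` here) forces evaluation, and any parsing failure then surfaces wrapped in a Py4JJavaError. To check whether my parsing logic itself could be the culprit, I reproduced it in plain Python outside Spark (my own sketch, not from the docs; the sample line is made up to match the lpsa.data format):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Plain-Python reproduction of the parsePoint logic, to show how a
# malformed line (e.g. an empty line) raises only when parsing happens.
def parse_values(line):
    return [float(x) for x in line.replace(',', ' ').split(' ')]

good = parse_values("-0.43,-1.63 -2.00 -1.86")  # a well-formed line
print(good)  # four floats: label followed by features

try:
    parse_values("")  # an empty/blank line
except ValueError as e:
    # float('') raises ValueError; under Spark this would only appear
    # when the action runs, buried inside the Py4JJavaError.
    print("parse failed:", e)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So if lpsa.data had a blank or malformed line, the error would appear exactly where I see it, even though the earlier cells ran fine.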
P.S. I've attached the files below.