02 February 2018

This is a question about a PySpark error (Py4JJavaError) in a Jupyter Notebook.

I'm running the demo code from https://spark.apache.org/docs/2.2.0/mllib-linear-methods.html on linear least squares, Lasso, and ridge regression, in a Jupyter Notebook running Python [conda env:python2].

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD, LinearRegressionModel

sc = SparkContext()

# Load and parse the data
def parsePoint(line):
    values = [float(x) for x in line.replace(',', ' ').split(' ')]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("data/mllib/ridge-data/lpsa.data")
parsedData = data.map(parsePoint)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
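For reference, the parsing step itself is plain Python and can be checked without Spark. Here is a standalone sketch of the `parsePoint` logic; the sample line is an assumption that follows the lpsa.data layout (a label, a comma, then space-separated features):

```python
# Standalone sketch of the parsePoint logic (no Spark needed).
# The sample line is hypothetical, in the lpsa.data format:
# "<label>,<feature1> <feature2> ..."
def parse_point(line):
    values = [float(x) for x in line.replace(',', ' ').split(' ')]
    return values[0], values[1:]

label, features = parse_point("-0.4307829,-1.637 -2.006 -1.999")
print(label)     # -0.4307829
print(features)  # [-1.637, -2.006, -1.999]
```

This shows the first value becomes the label and the rest become the feature vector passed to `LabeledPoint`.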

The code above runs fine. However, when I run the code below:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

parsedData.map(lambda lp: lp.features).mean()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

an error occurs:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I don't understand what this means.
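One thing worth checking (an assumption on my part, not a confirmed diagnosis for this case): the `Py4JJavaError` summary line hides the real cause, which appears in the full Java traceback below it, and a frequent culprit on Jupyter is a Python version mismatch between the driver and the workers. A minimal sketch that pins both to the notebook's interpreter, set before `SparkContext()` is created:

```python
import os
import sys

# Hypothetical fix sketch: point PySpark's driver and workers at the
# same interpreter the notebook runs on, to rule out a version mismatch.
# setdefault() leaves any existing configuration untouched.
os.environ.setdefault("PYSPARK_PYTHON", sys.executable)
os.environ.setdefault("PYSPARK_DRIVER_PYTHON", sys.executable)

print(os.environ["PYSPARK_PYTHON"])
```

If the traceback instead mentions the input file, the path `data/mllib/ridge-data/lpsa.data` is relative, so the notebook's working directory matters too.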

P.S. I've attached the files below.
