02 February 2018

This is a question about a PySpark error (Py4JJavaError) in a Jupyter Notebook.

I'm running the demo code from https://spark.apache.org/docs/2.2.0/mllib-linear-methods.html on linear least squares, Lasso, and ridge regression, in a Jupyter Notebook running Python [conda env:python2].

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD, LinearRegressionModel

sc = SparkContext()

# Load and parse the data
def parsePoint(line):
    values = [float(x) for x in line.replace(',', ' ').split(' ')]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("data/mllib/ridge-data/lpsa.data")
parsedData = data.map(parsePoint)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
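For reference, the parsing step itself is plain Python and can be checked without Spark. Here is a standalone sketch of the `parsePoint` logic; the sample line is an assumption that follows the lpsa.data layout (a label, a comma, then space-separated features):

```python
# Standalone sketch of the parsePoint logic (no Spark needed).
# The sample line is hypothetical, in the lpsa.data format:
# "<label>,<feature1> <feature2> ..."
def parse_point(line):
    values = [float(x) for x in line.replace(',', ' ').split(' ')]
    return values[0], values[1:]

label, features = parse_point("-0.4307829,-1.637 -2.006 -1.999")
print(label)     # -0.4307829
print(features)  # [-1.637, -2.006, -1.999]
```

This shows the first value becomes the label and the rest become the feature vector passed to `LabeledPoint`.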

The code above runs fine. However, when I run the code below:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

parsedData.map(lambda lp: lp.features).mean()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

an error occurs:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I don't understand what this means.
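One thing worth checking (an assumption on my part, not a confirmed diagnosis for this case): the `Py4JJavaError` summary line hides the real cause, which appears in the full Java traceback below it, and a frequent culprit on Jupyter is a Python version mismatch between the driver and the workers. A minimal sketch that pins both to the notebook's interpreter, set before `SparkContext()` is created:

```python
import os
import sys

# Hypothetical fix sketch: point PySpark's driver and workers at the
# same interpreter the notebook runs on, to rule out a version mismatch.
# setdefault() leaves any existing configuration untouched.
os.environ.setdefault("PYSPARK_PYTHON", sys.executable)
os.environ.setdefault("PYSPARK_DRIVER_PYTHON", sys.executable)

print(os.environ["PYSPARK_PYTHON"])
```

If the traceback instead mentions the input file, the path `data/mllib/ridge-data/lpsa.data` is relative, so the notebook's working directory matters too.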

P.S. I've attached the files below.
