I have two separate datasets. I am tasked with training my regression model on one dataset, while the assessment of its performance has to be done on the other. What do I do?
There are several ways to accomplish this (some software packages automate the step, others do not). Here's one approach that works regardless of the software (a code sketch follows the steps below):
1. Develop your regression model on the training set.
2. Now, upload the test set into your software. Use the final regression model from your training set to compute estimated/predicted values for the DV for the test set. For example, if the final equation from step 1 was Y-est = 2.44 + 0.35*IV1 + 1.27*IV2 - 0.69*IV3, then use those specific coefficients to compute estimated values for each test case.
3. To determine the performance of the model on the test set, compute:
a. r^2(Y, Y-est) across the test cases: the ordinary squared Pearson correlation between the actual and estimated values of the DV. This can be compared directly to the multiple R-squared from the training set. If the values are comparable, the model works about equally well when applied to new data (the test set).
b. To quantify the magnitude of the discrepancies, compute the residual for each test case: e = Y - Y-est. The SD or variance of these residuals can be compared to the standard error of estimate (or to the mean squared error, respectively) from the training set.
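The same workflow can be expressed in a few lines of code. Below is a minimal sketch in Python, assuming the two data sets sit in files named train.csv and test.csv with columns y, iv1, iv2, and iv3; those file and column names, like the use of scikit-learn, are illustrative assumptions rather than anything stated in the question.

```python
# Minimal sketch of the train-then-test workflow described above.
# Assumed inputs: train.csv and test.csv with columns y, iv1, iv2, iv3 (placeholders).
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
predictors = ["iv1", "iv2", "iv3"]

# Step 1: develop the regression model on the training set only.
model = LinearRegression().fit(train[predictors], train["y"])

# Step 2: apply the fitted coefficients, unchanged, to the test cases.
y_est = model.predict(test[predictors])

# Step 3a: squared Pearson correlation of actual vs. estimated DV values,
# to be compared with the multiple R-squared from the training run.
r = np.corrcoef(test["y"], y_est)[0, 1]
print("r^2(Y, Y-est) on test set:", r**2)
print("R^2 on training set:", model.score(train[predictors], train["y"]))

# Step 3b: residuals e = Y - Y-est; their SD/variance can be set against the
# training set's standard error of estimate / mean squared error.
e = test["y"] - y_est
print("Test residual SD:", e.std(ddof=1), " test MSE:", np.mean(e**2))
```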
I agree with the comments of David Morse, but I am confused by the comments of Glenn Wayne Jones. The question clearly states that one data set will be used for training and another for testing, so why would folding be needed? In my view, a k-fold procedure is not required here; it applies only when both training and testing have to be carried out on a single data set.
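For what it's worth, here is a small hypothetical contrast of the two situations, again in Python with scikit-learn; the simulated data, the coefficient values (borrowed from the example equation above), and the 5-fold choice are purely illustrative.

```python
# Contrast: k-fold cross-validation on a single data set vs. a separate held-out test set.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
coefs = np.array([0.35, 1.27, -0.69])          # illustrative true coefficients
X = rng.normal(size=(100, 3))
y = 2.44 + X @ coefs + rng.normal(scale=0.5, size=100)

# Single data set: k-fold cross-validation repeatedly splits it into
# training and validation folds to estimate out-of-sample performance.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("5-fold R^2 estimates:", scores)

# Separate training and test sets (as in the question): no folding needed;
# fit once on the training data, then score once on the held-out test data.
X_test = rng.normal(size=(40, 3))
y_test = 2.44 + X_test @ coefs + rng.normal(scale=0.5, size=40)
print("Held-out test R^2:", LinearRegression().fit(X, y).score(X_test, y_test))
```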