I have built a machine learning model that accurately predicts rice production. However, I am worried that the methodology I followed may not be on par with the Journal of Machine Learning Research's (JMLR) standards.

***

Here's the methodology that I came up with to make my machine learning model:

1) Dataset Creation

Two different datasets were created: Main and Variation. The Main dataset contains all of the following variables, while the Variation dataset contains all of them except the Quarter variable. This was done to lower input complexity in the hope of curbing overfitting during training. We deemed all of the other variables necessary for the research, so we could not drop them.

The features are:

- Area harvested (Hectares)

- Quarter (Q1, Q2, Q3, Q4)

- Region (e.g., Region 9, Region 10)

- Rice Field System (Rainfed or Irrigated)

- El Niño Monthly Average SST, Six-Month Span (Degrees Celsius)

- Monthly Average Rainfall, Six-Month Span (Millimeters)

The label is:

- Rice Harvested (Metric Tons)

Overall, 1,584 samples were formed during this step. Note that none of the variables were detrended in any way.
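The post does not say how the categorical features (Quarter, Region, Rice Field System) were encoded. As one common approach, here is a minimal sketch of building the Main and Variation feature sets with one-hot encoding; the column names and sample values are illustrative, not the actual schema:

```python
import pandas as pd

# Hypothetical mini-sample; column names and values are assumptions.
df = pd.DataFrame({
    "area_harvested_ha": [1200.0, 950.0, 1800.0],
    "quarter": ["Q1", "Q2", "Q3"],
    "region": ["Region 9", "Region 10", "Region 9"],
    "field_system": ["Rainfed", "Irrigated", "Rainfed"],
    "sst_6mo_avg": [27.1, 27.8, 28.3],
    "rainfall_6mo_avg_mm": [210.0, 180.0, 350.0],
    "rice_harvested_mt": [4800.0, 3900.0, 7200.0],  # label
})

# One-hot encode the categorical features for the Main dataset.
main = pd.get_dummies(
    df.drop(columns=["rice_harvested_mt"]),
    columns=["quarter", "region", "field_system"],
)

# The Variation dataset additionally drops the Quarter variable.
variation = pd.get_dummies(
    df.drop(columns=["rice_harvested_mt", "quarter"]),
    columns=["region", "field_system"],
)

labels = df["rice_harvested_mt"]
```

Whatever encoding was actually used (one-hot, ordinal, etc.) should be stated explicitly in the manuscript, since it affects the reported input dimensionality.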

2) Model Architecture Formation

Thirty-two different models were formulated with different machine learning techniques. Half of the 32 models employed ELU as their activation function, while the other half employed ReLU. Half of the 32 models utilized the Main dataset, while the other half utilized the Variation dataset. Half of the 32 models employed Batch Normalization after each hidden layer, while the other half did not. All of the models also used the following techniques:

- L2 Regularization

- Dropout after each hidden layer (25% Dropout)
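The three halved design factors above define a 2 × 2 × 2 grid of only eight base configurations, so the remaining variation across the 32 models (e.g., depth or width, which the post does not specify) should be documented in the manuscript. A sketch of the factor grid, with illustrative names:

```python
from itertools import product

# The three binary design factors described above.
activations = ["elu", "relu"]
datasets = ["main", "variation"]
batch_norm = [True, False]

# Enumerate the 2 * 2 * 2 = 8 base configurations.
configs = [
    {"activation": a, "dataset": d, "batch_norm": b}
    for a, d, b in product(activations, datasets, batch_norm)
]
```

Each configuration would then be instantiated as a network with L2 regularization and 25% dropout after each hidden layer, per the list above.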

3) Model Training (1st Phase)

Once the models were formulated, each model was trained for 400,000 epochs with the following hyperparameters:

- Number of Epochs: 400,000

- Optimizer: Adam

- Learning Rate: 0.0001

- Validation Split: 20% (80% training, 20% validation)

- Batch Size: 1024 samples

- Data Normalization Technique: MinMaxScaler (0 to 10)

- Loss Function: Mean Squared Error (MSE)
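The 0-to-10 scaling range is non-standard (MinMaxScaler defaults to 0-to-1), so it is worth showing explicitly. A minimal sketch of the preprocessing and split, using placeholder random data in place of the actual 1,584 samples:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Placeholder data standing in for the real 1,584-sample feature matrix.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5000, size=(1584, 10))
y = rng.uniform(0, 9000, size=(1584, 1))

# Scale features into the stated 0-to-10 range.
scaler = MinMaxScaler(feature_range=(0, 10))
X_scaled = scaler.fit_transform(X)

# 80% training / 20% validation split.
X_train, X_val, y_train, y_val = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)
```

Note that fitting the scaler on the full dataset before splitting, as sketched here, leaks validation statistics into training; fitting on the training portion only is the safer practice to report.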

Note that the following techniques were not employed during training or during model architecture formation (Step 2):

- Bayesian Hyperparameter Optimization

- Custom weight initialization schemes (framework defaults were used)

- LSTMs

4) Selection and Further Training (2nd Phase)

Once all 32 models had gone through 400,000 epochs of training, three model architectures were selected for further training. These models were selected based on the steepness of their validation-loss curves. Further training was conducted by restoring each model's weights from its 100th epoch and training from that point on. The dataset split is as follows:

- 70% Training

- 15% Validation

- 15% Testing

All other training hyperparameters were unchanged during this part of the methodology, and the techniques listed above as unused remained unused here as well.
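The post does not say how the 70/15/15 split was produced; one common way is two chained splits, sketched here with placeholder data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the 1,584 samples.
X = np.arange(1584 * 2, dtype=float).reshape(1584, 2)
y = np.arange(1584, dtype=float)

# First carve off 30%, then split that 30% half-and-half
# to obtain the 70% / 15% / 15% train / validation / test split.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42
)
```

The manuscript should also state whether the test portion was held out before the first training phase; if the same data used for Phase 1 validation later appears in the test set, the final evaluation is contaminated.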

5) Model Selection

The model with the lowest loss score will be presented as the final product of this paper. Its performance will be analyzed using the Testing portion of the dataset (the 15% split).
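The comparison metric, mean squared error, is simple enough to state directly; here is a generic sketch (not the author's code) of evaluating held-out predictions:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the loss used to rank models and to
    evaluate the final model on the held-out test portion."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))
```

Since the labels are in metric tons, also reporting RMSE (the square root of MSE) would put the error back in the label's units, which reviewers generally find easier to interpret.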

***

If this methodology seems childish, it is because I started this project at the beginning of my senior year and had no prior experience in academic research other than chemistry and biology labs.

If you have any feedback, please be as critical as you can. I am trying to make sure that my manuscript doesn't get rejected upon submission. Thank you!
