Hello everyone, and thank you for reading my question.

I have a data set with around 2000 data points. It has 5 inputs (4 well rates, plus time as the 5th) and 2 outputs (cumulative oil and cumulative water). See the attached image.

I want to build a proxy model to simulate the cumulative oil and water.

I have built 5 models (ANN, Extreme Gradient Boosting, Gradient Boosting, Random Forest, SVM) and used GridSearch to tune the hyperparameters, and the training results are good. Of course, I split the data set into training, validation and test sets.
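For illustration, this kind of workflow looks roughly like the sketch below. The data is synthetic, and the choice of RandomForestRegressor, the parameter grid and the split ratio are assumptions made for the example, not my actual setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the data: 4 well rates + time -> 2 cumulative outputs.
n_samples = 400
well_rates = rng.uniform(100.0, 1000.0, size=(n_samples, 4))
days = rng.uniform(0.0, 3650.0, size=(n_samples, 1))
X = np.hstack([well_rates, days])
cum_oil = 0.6 * well_rates.sum(axis=1) * days[:, 0]    # made-up relationship
cum_water = 0.4 * well_rates.sum(axis=1) * days[:, 0]  # made-up relationship
y = np.column_stack([cum_oil, cum_water])

# Hold out a test set; GridSearchCV creates the validation folds internally.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Small illustrative grid; a random forest handles two outputs natively.
param_grid = {"n_estimators": [50, 100], "max_depth": [5, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0), param_grid, cv=3, scoring="r2"
)
search.fit(X_train, y_train)

# R^2 averaged over both outputs on rows the search never saw.
test_r2 = search.score(X_test, y_test)
print(search.best_params_, round(test_r2, 3))
```

Note that the split here is random over rows, so a good test score only says the model interpolates well between rows that look like the training rows.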

I also have another data set that was not included in any of the training, validation or test sets. When I use the models to predict the outputs for this data set, the results are bad (the models fail to predict).

I think the problem lies in the data itself, because the only input parameter that changes is the time (days), while the others (the well rates) remain constant.
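One way to check this suspicion is to count how many distinct well-rate combinations the data actually contains, and to hold out whole combinations instead of random rows. The sketch below uses a tiny made-up table (3 runs of 5 time steps each) in place of my real data; the column order [q1, q2, q3, q4, days] is an assumption.

```python
import numpy as np

# Made-up stand-in: 3 simulation runs, each with fixed well rates and 5 time steps.
rates_per_run = np.array([
    [500.0, 300.0, 200.0, 400.0],
    [600.0, 250.0, 350.0, 100.0],
    [450.0, 450.0, 150.0, 300.0],
])
days = np.array([30.0, 60.0, 90.0, 120.0, 150.0])
X = np.array([[*rates, d] for rates in rates_per_run for d in days])

# Each distinct row of the 4 rate columns is one scenario; run_id labels every row.
unique_rates, run_id = np.unique(X[:, :4], axis=0, return_inverse=True)
run_id = run_id.ravel()  # some NumPy versions return a 2-D inverse here

n_rows, n_runs = X.shape[0], unique_rates.shape[0]
print(f"{n_rows} rows but only {n_runs} distinct well-rate combinations")

# Hold out whole runs rather than random rows, so the test rates are truly unseen.
test_runs = np.array([n_runs - 1])
test_mask = np.isin(run_id, test_runs)
X_train, X_test = X[~test_mask], X[test_mask]
```

If the number of distinct combinations is very small compared with the number of rows, a random row split mostly tests interpolation in time, which would be consistent with good test scores and poor predictions on new rates. For cross-validation, scikit-learn's GroupKFold applies the same idea with run_id as the groups.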

However, I can't remove the well rates or merge them into a single variable, because once the proxy model is built I want to optimize the well rates to maximize cumulative oil and minimize cumulative water.
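To show what I mean by optimizing the rates, here is a minimal sketch of a random search over the four rates at a fixed time horizon. The function proxy_predict is a hypothetical stand-in for a trained model, and its formula, the rate bounds, the 365-day horizon and the water_weight trade-off are all assumptions made for the example.

```python
import numpy as np

def proxy_predict(X):
    """Hypothetical proxy: rows of [q1, q2, q3, q4, days] -> [cum_oil, cum_water]."""
    rates, days = X[:, :4], X[:, 4]
    oil_frac = np.array([0.8, 0.6, 0.5, 0.3])  # made-up oil fraction per well
    cum_oil = (rates * oil_frac).sum(axis=1) * days
    # Made-up water term that grows faster than linearly with rate.
    cum_water = ((rates * (1 - oil_frac)).sum(axis=1)
                 + 0.001 * (rates ** 2).sum(axis=1)) * days
    return np.column_stack([cum_oil, cum_water])

def optimize_rates(predict, bounds, horizon_days, water_weight=1.0,
                   n_candidates=20000, seed=0):
    """Random search: sample rates within bounds, keep the best-scoring candidate."""
    rng = np.random.default_rng(seed)
    low, high = bounds[:, 0], bounds[:, 1]
    candidates = rng.uniform(low, high, size=(n_candidates, len(low)))
    X = np.column_stack([candidates, np.full(n_candidates, horizon_days)])
    pred = predict(X)
    # Single objective: reward oil, penalize water.
    score = pred[:, 0] - water_weight * pred[:, 1]
    best = np.argmax(score)
    return candidates[best], pred[best]

bounds = np.array([[100.0, 1000.0]] * 4)  # assumed min/max rate per well
best_rates, best_outputs = optimize_rates(proxy_predict, bounds, horizon_days=365.0)
print(best_rates, best_outputs)
```

This is why all four rates have to stay as separate inputs. It also means the optimizer will query the proxy at rate combinations it has never seen, so the proxy needs training data in which the rates actually vary.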

Is there a solution to this kind of issue?
