Actually, I am trying to solve a highly non-linear problem regarding energy systems. There are 8–14 variables involved in predicting the energy of the system, and 3–4 of them are uncertain. All the variables are floating-point numbers.
The question is not entirely clear. In any case, there is no limit on the amount of input data, but the inputs to an SVM should be finite, non-complex numbers. Prediction with floating-point data is therefore possible.
From what I can understand of the background of your problem, you may have few training samples, and some features may be unreliable. Regarding the size of the dataset, there is no particular rule of thumb. Your model needs as many training examples as possible so that it can learn the underlying pattern and generalize to the test set. I would recommend the following steps, if they help:
1) Augment your dataset using data augmentation techniques; this will increase your training size. However, make sure not to augment the dataset before splitting it into training and test sets; augment only the training set (see the first sketch after this list).
2) Use feature selection techniques, supervised or unsupervised, to keep only useful and reliable features (see the second sketch after this list).
3) Finally, consider other models such as a decision tree regressor, or boosting algorithms such as AdaBoost or XGBoost (see the third sketch after this list).
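To illustrate step 1, here is a minimal sketch of augmenting only the training split. The Gaussian-noise jitter, the 0.01 noise scale, and the placeholder X and y arrays are my own illustrative assumptions, not something prescribed above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))   # placeholder features (14 variables)
y = rng.normal(size=200)         # placeholder energy target

# Split first, so the test set is never touched by augmentation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Simple jitter augmentation for continuous tabular data: append noisy
# copies of the training samples, with noise scaled per feature.
noise = rng.normal(scale=0.01 * X_train.std(axis=0), size=X_train.shape)
X_train_aug = np.vstack([X_train, X_train + noise])
y_train_aug = np.concatenate([y_train, y_train])
```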
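For step 2, a sketch of supervised feature selection with scikit-learn, continuing from the arrays in the previous sketch; SelectKBest with mutual_info_regression and k=8 are illustrative choices only:

```python
from sklearn.feature_selection import SelectKBest, mutual_info_regression

# Keep the 8 features with the highest estimated mutual information
# with the target (k=8 is an arbitrary choice for illustration).
selector = SelectKBest(score_func=mutual_info_regression, k=8)
X_train_sel = selector.fit_transform(X_train_aug, y_train_aug)
X_test_sel = selector.transform(X_test)
print("kept feature indices:", selector.get_support(indices=True))
```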
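And for step 3, a sketch of the tree-based alternatives, again continuing from the sketches above; the hyperparameters are illustrative, and XGBRegressor from the separate xgboost package could be substituted if you prefer XGBoost:

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor

# Fit a single regression tree and a boosted ensemble on the selected features.
tree = DecisionTreeRegressor(max_depth=5, random_state=0)
tree.fit(X_train_sel, y_train_aug)
boosted = AdaBoostRegressor(n_estimators=200, random_state=0)
boosted.fit(X_train_sel, y_train_aug)

print("decision tree R^2:", tree.score(X_test_sel, y_test))
print("AdaBoost R^2:", boosted.score(X_test_sel, y_test))
```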
Kinza Qadeer, the basic rule is to have at least 10, and preferably 50, samples per variable. In your case, with 14 independent variables, you would need at least 140 and preferably 700 samples. I would suggest maintaining this ratio when you split the sample into training and test sets. Check the performance metrics on both training and test data to see whether the model is accurate and robust; cross-validation (CV) will help mitigate many problems, so try that as well (a sketch follows). Good luck!
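A minimal sketch of that check, assuming scikit-learn: the SVR settings and the synthetic 700 x 14 data (roughly 50 samples per variable) are placeholders for your own dataset:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)
X = rng.normal(size=(700, 14))   # ~50 samples per variable
y = rng.normal(size=700)

# 5-fold CV, reporting both training and validation scores so that
# overfitting (high train score, low test score) is easy to spot.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
scores = cross_validate(model, X, y, cv=5, scoring="r2", return_train_score=True)
print("mean train R^2:", scores["train_score"].mean())
print("mean test R^2:", scores["test_score"].mean())
```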
There is no rule of thumb as such; you need to look at the metrics to evaluate your model's performance. In fact, you may need to experiment with different algorithms to see which one fits your data best (a small comparison sketch follows).
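For example, a sketch of comparing a few candidate regressors on the same cross-validated metric; the candidates, their default settings, and the synthetic data are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 14))   # placeholder data
y = rng.normal(size=500)

candidates = {
    "SVR": SVR(),
    "DecisionTree": DecisionTreeRegressor(random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
}
for name, estimator in candidates.items():
    r2 = cross_val_score(estimator, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {r2:.3f}")
```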
Specifically, the algorithm imposes no upper limit on the amount of data. However, since this is a data-driven algorithm and you have about 14 variables, it will very likely perform poorly with too few samples (say, fewer than 2000). That said, if you don't have enough samples, I would recommend PCA, SVD, or some other dimension-reduction technique to extract the most significant components (a sketch follows).
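A minimal sketch of that idea with scikit-learn, assuming an SVR downstream; n_components=5 and the synthetic data are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 14))   # few samples, 14 variables
y = rng.normal(size=300)

# Standardize, project onto the 5 leading principal components,
# then fit the SVR on the reduced representation.
model = make_pipeline(StandardScaler(), PCA(n_components=5), SVR(kernel="rbf"))
print("mean CV R^2:", cross_val_score(model, X, y, cv=5, scoring="r2").mean())
```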
I agree with the expert replies above by Radha MOHAN Pattanayak, Rajdeep Kumar Nath, Mohammed Ashikur Rahman, Sushant K. Singh, Shahir Asfahan and Shibaji Chakrabarty. I would like to add some further reading: https://www.sciencedirect.com/topics/nursing-and-health-professions/support-vector-machine and the article "Using Support Vector Machines for Survey Research".
Do you have a reference or a paper that mentions this rule?
I have heard this rule before, but I don't know in which paper or article it is written. I really need to cite this information in my thesis because my dataset is small.