How can I train my dataset with SVM quickly?


Lior Shamir

What kernel are you using? I believe a linear kernel would be the fastest. In any case, your data have very low dimensionality (two), so I don't see why training should be slow.

Shailesh Chaudhari

Based on the dimensionality and the size of the feature vectors, you can select an appropriate SVM kernel function.

Muhammad Farooq

Using SVM with linear kernel will probably be the fastest given the dimensions of your data.

Santanu Ghorai

Hi Deepak,

I don't know which SVM program you are using. The LIBSVM toolbox trains quickly on this kind of two-dimensional data set. If it does not, you may try the other methods described below.

An SVM with a linear kernel may be fast, but it may not give good classification results if the data set is not linearly separable. In that case a nonlinear kernel can be combined with other tricks, such as the reduced kernel technique. In this method, 10 to 15% of the training data are selected at random to form the basis of the kernel matrix. Because the kernel matrix is smaller, it takes less time to compute and hence less time to train. It has been shown statistically that the reduced kernel contains almost the same information as the full kernel matrix, and the classification results are nearly identical.
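A minimal NumPy sketch of the reduced kernel construction, assuming a Gaussian kernel and a 10% random basis. The helper names `rbf_kernel` and `reduced_kernel_features` and the `gamma` value are illustrative, not from any library. It builds only the rectangular n × m matrix K(X, X_basis); training then roughly amounts to fitting a linear SVM on its m columns with any linear solver.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A * A).sum(axis=1)[:, None] + (B * B).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def reduced_kernel_features(X, fraction=0.1, gamma=1.0, seed=0):
    """Hypothetical helper: pick a random subset of rows as the kernel basis.

    Returns the n x m kernel matrix K(X, basis) with m = fraction * n,
    instead of the full n x n kernel matrix, plus the basis itself
    (needed again to compute K(X_test, basis) at prediction time).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = max(1, int(round(fraction * n)))
    idx = rng.choice(n, size=m, replace=False)  # random rows, no repeats
    basis = X[idx]
    return rbf_kernel(X, basis, gamma), basis
```

For scale: with the 30,000 training instances mentioned later in this thread, a full float64 kernel matrix needs about 30,000 × 30,000 × 8 bytes ≈ 7.2 GB, while a 10% basis needs about 30,000 × 3,000 × 8 bytes ≈ 0.72 GB.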

Minh-Tien Nguyen

You should first scale your data, then use a linear kernel.

Alternatively, you can use the SVM implementation in scikit-learn, written in Python. I think it supports multi-threading.
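A minimal sketch of the scaling step, assuming per-feature linear scaling to [-1, 1] (one of the ranges suggested in "A Practical Guide to Support Vector Classification"). The function names are hypothetical. The key point is that the scaling parameters are learned on the training data only and then reused unchanged on the test data.

```python
import numpy as np

def fit_minmax(X_train, lo=-1.0, hi=1.0):
    """Learn the per-feature minimum and range on the training data only."""
    mins = X_train.min(axis=0)
    maxs = X_train.max(axis=0)
    # Constant features get a span of 1 to avoid division by zero.
    span = np.where(maxs > mins, maxs - mins, 1.0)
    return mins, span, lo, hi

def apply_minmax(X, params):
    """Map each feature linearly so that the training range becomes [lo, hi]."""
    mins, span, lo, hi = params
    return lo + (X - mins) / span * (hi - lo)

X_train = np.array([[0.0, 100.0], [5.0, 300.0], [10.0, 200.0]])
params = fit_minmax(X_train)
print(apply_minmax(X_train, params))  # rows become [-1, -1], [0, 1], [1, 0]
```

Test points may fall slightly outside [-1, 1]; that is expected. scikit-learn offers the same operation as `MinMaxScaler(feature_range=(-1, 1))`.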

Ingo Steinwart

Whether a linear kernel has a chance of working in your case can easily be seen from a 2-D plot: plot the classes in different colors, and if most of the data look separable by a straight line, a linear kernel is good for you.

If not, you should use a nonlinear kernel such as the Gaussian kernel. Here you need to optimize two parameters of the SVM, which is usually time-consuming. A fast implementation that does this optimization for you is available here:
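The check above is visual. As a swapped-in numeric complement (not part of the original answer), the perceptron algorithm is guaranteed to find a zero-error line whenever one exists, so a quick run can confirm strict linear separability. A minimal NumPy sketch; `perceptron_separable` is a hypothetical helper and labels are assumed to be -1/+1.

```python
import numpy as np

def perceptron_separable(X, y, max_epochs=1000):
    """Return True if a perceptron finds a line with zero training errors.

    False is inconclusive: the data may be non-separable, or the epoch
    budget may simply be too small. "Mostly separable" data also gives False.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            if yi * (xi @ w) <= 0:  # misclassified or on the boundary
                w += yi * xi        # classic perceptron update
                errors += 1
        if errors == 0:             # a full pass with no mistakes
            return True
    return False
```

Because noisy data that is almost linearly separable also returns False, the plot remains the better guide for the "most data" judgement described above.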

http://www.isa.uni-stuttgart.de/software/
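For reference, in the standard soft-margin formulation (the one used in the LIBSVM guide), the two parameters of a Gaussian-kernel SVM are the penalty C and the kernel parameter γ:

```latex
K(\mathbf{x}, \mathbf{z}) = \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{z} \rVert^{2}\right), \qquad \gamma > 0

\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\big(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\big) \ge 1 - \xi_i, \;\; \xi_i \ge 0
```

A larger γ makes the kernel narrower (a more flexible decision boundary), and a larger C penalizes margin violations more heavily, which is why both usually have to be tuned together.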

Vinod Kumar Chauhan

Hi Deepak

To train the SVM quickly, you can try the following:

Use a linear SVM

Use the primal SVM form

Use scaled data

Use optimal parameter values.

Explanation:

1. Use a linear SVM (linear kernel), for example the LIBLINEAR library. The conditions for using a linear SVM are: (a) the data should be (nearly) linearly separable, otherwise test accuracy could be very low; you can check this with the method mentioned by Sir Ingo Steinwart; and (b) training time matters more to you than test accuracy, because the test accuracy of a linear SVM is often lower than that of a nonlinear SVM.

2. Use the primal SVM form for your problem. Because the number of features (2) is very small compared with the number of training instances (30,000), solving the primal should be much faster than solving the dual.
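LIBLINEAR itself uses other solvers (dual coordinate descent, trust-region Newton). The sketch below swaps in Pegasos, a simple stochastic subgradient method for the primal linear SVM, to illustrate why the primal is cheap here: each update costs O(d) with d = 2 features plus a bias, so one pass costs O(n·d), and no n × n kernel matrix is ever formed. The function names are hypothetical; `lam` relates to the usual C roughly as lam = 1/(n·C).

```python
import numpy as np

def pegasos_train(X, y, lam=1e-3, epochs=10, seed=0):
    """Stochastic subgradient descent (Pegasos) on the primal objective
        lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * w.x_i).
    A constant feature is appended, so the last weight acts as a bias
    (note: this bias is regularized too). Labels must be -1/+1.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    n, d = Xb.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)            # Pegasos step size
            margin = y[i] * (Xb[i] @ w)      # evaluated with the current w
            w *= (1.0 - eta * lam)           # gradient step for the regularizer
            if margin < 1.0:                 # hinge loss active: add its subgradient
                w += eta * y[i] * Xb[i]
    return w

def predict(w, X):
    """Sign of the linear decision function (bias feature appended)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0, 1, -1)
```

In scikit-learn, `LinearSVC(dual=False)` asks LIBLINEAR to solve the primal, and its documentation recommends that setting when the number of samples exceeds the number of features, as it does here.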

3. Use scaled data, as recommended in the paper "A Practical Guide to Support Vector Classification" and as Minh-Tien Nguyen also mentioned.

4. Use optimal parameter values to get the best results. You can find them with a grid search, as described in the same paper.
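A minimal sketch of a cross-validated grid search over (C, γ), using the exponentially spaced grids suggested in the LIBSVM guide: C = 2^-5, 2^-3, ..., 2^15 and γ = 2^-15, 2^-13, ..., 2^3. The `fit_predict` callback is a hypothetical wrapper around whatever SVM trainer you use; only the grid and k-fold logic is shown.

```python
import numpy as np

def guide_grid():
    """Exponentially spaced grids: C = 2^-5..2^15, gamma = 2^-15..2^3 (step 2^2)."""
    Cs = 2.0 ** np.arange(-5, 16, 2)
    gammas = 2.0 ** np.arange(-15, 4, 2)
    return Cs, gammas

def grid_search_cv(X, y, fit_predict, Cs, gammas, k=5, seed=0):
    """Return (best_C, best_gamma, best_accuracy) by k-fold cross-validation.

    fit_predict(X_train, y_train, X_test, C, gamma) -> predicted labels
    is a user-supplied (hypothetical) wrapper around an SVM trainer.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # shuffled, nearly equal folds
    best = (None, None, -1.0)
    for C in Cs:
        for gamma in gammas:
            correct = 0
            for f in range(k):
                test_idx = folds[f]
                train_idx = np.concatenate([folds[j] for j in range(k) if j != f])
                pred = fit_predict(X[train_idx], y[train_idx], X[test_idx], C, gamma)
                correct += int(np.sum(pred == y[test_idx]))
            acc = correct / len(y)  # pooled cross-validation accuracy
            if acc > best[2]:
                best = (C, gamma, acc)
    return best
```

The guide also suggests running this coarse grid first and then a finer grid around the best (C, γ). In scikit-learn the same idea is available as `GridSearchCV(SVC(kernel="rbf"), param_grid={"C": Cs, "gamma": gammas}, cv=5)`.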

Thanks

Vinod

Article: A Practical Guide to Support Vector Classification, Chih-Wei ...