In machine learning, categorical variables are commonly preprocessed with one-hot encoding to create binary independent variables. For example, if a categorical variable has 6 unique values (discrete states), one-hot encoding the feature results in 6 binary variables. If a sample belongs to the 1st of the 6 categories, the first binary variable will be '1' and the rest will be '0' (100000). If another sample belongs to the 2nd category, the second binary variable will be '1' and the rest will be '0' (010000), and so on.
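A minimal sketch of the encoding described above, assuming a hypothetical colour feature with 6 unique values (the category names are made up for illustration):

```python
# Hypothetical categorical feature with 6 unique values.
categories = ["red", "blue", "green", "yellow", "purple", "orange"]
index = {cat: i for i, cat in enumerate(categories)}

def one_hot(value):
    """Return a list of 6 binary indicators with a 1 at the value's position."""
    vec = [0] * len(categories)
    vec[index[value]] = 1
    return vec

print(one_hot("red"))   # 1st category -> [1, 0, 0, 0, 0, 0]
print(one_hot("blue"))  # 2nd category -> [0, 1, 0, 0, 0, 0]
```

In practice one would typically use a library routine such as `pandas.get_dummies` or scikit-learn's `OneHotEncoder` rather than hand-rolling this.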
I don't agree with the above answer. It may be useful only when the variable has a small number of possible values (two, maybe three). I don't understand why it is impossible to use direct coding (1, 2, 3, ...) for each value of the variable: for example, white = 1, black = 2, gray = 3, green = 4, etc.
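A small sketch, using the colour values suggested above, of how the two codings differ in the distances a distance-based model (e.g. kNN) actually sees; the category set is hypothetical:

```python
# Direct coding as proposed above: white=1, black=2, gray=3, green=4.
direct = {"white": 1, "black": 2, "gray": 3, "green": 4}

# Under direct coding, |white - green| = 3 but |white - black| = 1, so a
# Euclidean-distance model treats white as "closer" to black than to green,
# even though the colours have no numeric ordering.
print(abs(direct["white"] - direct["green"]))  # 3
print(abs(direct["white"] - direct["black"]))  # 1

# Under one-hot encoding, every pair of distinct categories is equidistant.
def one_hot(value, cats=("white", "black", "gray", "green")):
    return [1 if c == value else 0 for c in cats]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

print(sq_dist(one_hot("white"), one_hot("green")))  # 2
print(sq_dist(one_hot("white"), one_hot("black")))  # 2
```

Direct (ordinal) coding is reasonable when the categories genuinely have an order (e.g. small < medium < large); for unordered categories it injects an arbitrary ordering into the feature.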
Hello Nahian, yes, I had a binary classification problem, but I found the following:
Non-numerical data such as categorical data are common in practice. Some classification methods can handle categorical predictor variables natively, but others can only be applied to continuous numerical data. Among the three classification methods, only Kernel Density Classification can handle categorical variables in theory, while kNN and SVM cannot be applied directly since they are based on Euclidean distances. In order to define distance metrics for categorical variables, the first preprocessing step is to represent the categorical variables with dummy variables. Secondly, due to the distinct natures of categorical and numerical data, we usually need to standardize the numerical variables, so that the contributions to the Euclidean distances from a numerical variable and a categorical variable are on roughly the same level. Finally, the introduction of dummy variables usually increases the dimensionality significantly. In various experiments, we found that dimension reduction techniques such as PCA usually improve the performance of these three classifiers significantly. Following is the link
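The three-step preprocessing described above (dummy variables, standardization, then PCA) can be sketched with NumPy on a toy dataset; the column names and values here are made up for illustration, and PCA is computed directly via SVD rather than through a library class:

```python
import numpy as np

# Toy dataset: one numerical column and one categorical column (hypothetical).
ages = np.array([23.0, 35.0, 51.0, 29.0])   # numerical variable
colors = ["red", "blue", "red", "green"]    # categorical variable

# Step 1: dummy (one-hot) variables for the categorical column.
cats = sorted(set(colors))
dummies = np.array([[1.0 if c == cat else 0.0 for cat in cats] for c in colors])

# Step 2: standardize the numerical column so its contribution to Euclidean
# distances is on the same scale as the 0/1 dummy variables.
ages_std = (ages - ages.mean()) / ages.std()

X = np.column_stack([ages_std, dummies])    # 4 samples x 4 features

# Step 3: reduce the (now larger) dimension with PCA via SVD on the
# mean-centered matrix, keeping the top 2 principal components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:2].T

print(X_pca.shape)  # (4, 2)
```

With scikit-learn, the same pipeline would typically be built from `OneHotEncoder`, `StandardScaler`, and `PCA` inside a `ColumnTransformer`/`Pipeline`.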