How to use SVM for Data Sets with Missing Data?

Sergey Porotsky

Dear Imen and Fernando, thanks a lot for your answers.

Sorry, additional comment.

By "missing data" people usually mean that some objects have no labels. My situation is different: labels exist for every object in the training data set, but for some objects the values of some parameters are missing.

Samer Sarsam

Hi Sergey,

If the missing value is numeric, replace it with the mean of the associated attribute. Otherwise (if it is nominal), replace it with the mode.
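A minimal sketch of this rule in plain Python, assuming missing values are stored as `None`. The function name `impute_column` and the sample columns are made up for illustration and are not from any library.

```python
from statistics import mean, mode

def impute_column(values, numeric):
    """Fill None entries with the mean (numeric) or the mode (nominal)
    of the observed values in the same column."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in values]

# Hypothetical columns: one numeric attribute, one nominal attribute.
temperature = [20.0, None, 24.0, 22.0]
colour = ["red", "blue", None, "red"]

print(impute_column(temperature, numeric=True))   # gap filled with the mean
print(impute_column(colour, numeric=False))       # gap filled with the mode
```

In practice the fill values would be computed on the training set only and then reused for new objects, so that the SVM never sees statistics taken from the test data.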

HTH.

Samer

Zhuo Sun

Hi Sergey. For your case of "some objects without labels", maybe you can try semi-supervised methods.

Conference Paper Superpixel tracking via graph-based semi-supervised SVM and ...

Zhuo,

I'm afraid that semi-supervised learning is not useful here (at least at this stage), as it requires building a prediction model from complete data with NO missing values (whereas here the data have missing values) and then using that model to predict the unlabeled data (a later stage, not the current situation). In fact, Sergey needs to preprocess the data before building the prediction model, so he needs to handle missing values at the very beginning.

Hi Samer

I agree that semi-supervised learning is not a proper tool for the preprocessing step. However, if some of the data have labels and Sergey does not throw away the unlabeled data entirely, semi-supervised learning may help.

Dear all, thanks for your answers. To be clear, missing labels are not a problem for me: all objects have labels. The problem is missing parameter values. I will study your proposals carefully.

Dear Samer Sarsam, thanks once more for your answer ("If the missing value is a numeric one, replace it with the mean of the associated attribute. Otherwise, replace it with the mode (if it is nominal)"). In my opinion, it isn't fully correct. Some parameters may be strongly correlated, and in that case, to fill in a missing value of one parameter we should take into account the values of the other (not missing!) parameters and the correlations between them. Perhaps you know some articles that consider this approach? Thanks in advance. Regards, Sergey.

Dear Sergey,

No worries.

There is no universally perfect approach for recovering hidden information (whatever it is); even the prediction model you need to build (sometimes a highly complex one) has some percentage of error.

Regarding missing values, knowing your data is crucial, and you need to consider it at the very beginning. The strategy I suggested is an old and common one that several resources in the literature discuss. For example, kindly consult the book "Data Mining and Predictive Analytics", second edition, by Larose (2015).

On the other hand, generally, if you have a group of subjects, each subject's dataset is a combination of instances (examples) and attributes (features). In order to build a classifier model from all subjects' datasets, the datasets have to be compatible (same instance/attribute characteristics). If some of them have missing *instances*, you can use the strategy I suggested. But if a subject's dataset has missing *attributes*, then you can:

- either remove all these attributes from all subjects, or

- build a predictive model from all the subjects that have the full set of attributes and use this model to predict the missing attribute(s) in each participant's dataset.
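The second option can be sketched as follows, assuming the simplest possible predictive model: a one-variable least-squares line fitted on the complete rows. The helper names `fit_line` and `regression_impute` and the toy data are hypothetical; a real data set would likely need several predictors or a different model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def regression_impute(rows, target, predictor):
    """Predict missing values (None) in column `target` from column
    `predictor`, using a line fitted on rows where both are present."""
    complete = [r for r in rows
                if r[target] is not None and r[predictor] is not None]
    a, b = fit_line([r[predictor] for r in complete],
                    [r[target] for r in complete])
    filled = []
    for r in rows:
        r = list(r)  # copy, so the input rows are left unchanged
        if r[target] is None and r[predictor] is not None:
            r[target] = a + b * r[predictor]
        filled.append(r)
    return filled

# Toy data: column 1 is strongly correlated with column 0.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, None]]
filled = regression_impute(rows, target=1, predictor=0)
print(filled[3])  # the gap is filled from the fitted line, not the column mean
```

Unlike mean replacement, this uses the observed values of the other attribute, which matters when attributes are strongly correlated.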

Cheers,

I also see the following approach: step by step, prune (delete) the parameters that can have missing values, and solve the SVM task for the reduced set of input parameters. The drawback is that we have to solve several SVM tasks, e.g. for the full set of variables, without parameter 3, without parameter 7, etc.
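One possible reading of this idea, as a sketch in plain Python: group the data by which parameters are present, and for each such pattern collect all training objects that have at least those parameters. Each resulting set would then be passed to an SVM trainer (for example scikit-learn's `SVC`), and a new object would be classified by the model that matches its available parameters. The function names and the toy data are assumptions made for illustration.

```python
def present_features(row):
    """Indices of the features that are present (not None) in this row."""
    return tuple(i for i, v in enumerate(row) if v is not None)

def training_sets_by_pattern(X, y):
    """For each pattern of available features, collect every training row
    that has at least those features, restricted to those columns."""
    patterns = {present_features(row) for row in X}
    sets = {}
    for cols in patterns:
        sets[cols] = [(tuple(row[i] for i in cols), label)
                      for row, label in zip(X, y)
                      if all(row[i] is not None for i in cols)]
    return sets

# Toy data: all objects are labeled, some parameter values are missing.
X = [[1.0, 2.0, 3.0],
     [4.0, None, 6.0],
     [7.0, 8.0, 9.0],
     [1.5, 2.5, None]]
y = [0, 1, 0, 1]

sets = training_sets_by_pattern(X, y)
for cols, rows in sorted(sets.items()):
    print(cols, len(rows))  # one SVM task per feature subset
```

The number of SVM tasks equals the number of distinct missingness patterns, which is the drawback mentioned above.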

Maysam Toghraee

It can be done through a nonlinear support vector machine

Dear Maysam, please give me a link for this approach.