What would be the best approach to handling missing values in data mining?


Daqing Chen

There are mainly two strategies: ignoring any records that contain missing values, or finding replacements for the missing values. Which one to apply depends on the data set and on what you want to analyse. In brief, if you cannot afford to ignore any data items even though they contain missing values, you need to find replacements.
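To make the two strategies concrete, here is a minimal sketch in plain Python on a hypothetical four-record table, with `None` marking a missing value. The function names are illustrative, and the column mean is just one possible replacement rule.

```python
from statistics import mean

# Toy records (hypothetical data); None marks a missing value.
records = [
    {"age": 25, "income": 3000},
    {"age": None, "income": 4200},
    {"age": 40, "income": None},
    {"age": 35, "income": 3900},
]


def listwise_delete(rows):
    """Strategy 1: ignore every record that contains a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def replace_with_mean(rows):
    """Strategy 2: keep every record and replace each gap with its column mean."""
    columns = list(rows[0].keys())
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}
    return [{c: means[c] if r[c] is None else r[c] for c in columns} for r in rows]


print(len(listwise_delete(records)))  # only 2 of 4 records survive
print(replace_with_mean(records))     # all 4 records kept, gaps filled
```

Deleting halves this small data set, which is why replacement becomes attractive when every record matters.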

Orlando Grabiel Toledano López

You can replace missing values with the column's mean, minimum, maximum, most frequent value (mode), or another statistic. Which one works depends on the features of your data. Alternatively, as Daqing Chen says, you can ignore the instance or record that contains the missing value, but this decision can be costly for small datasets.
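A minimal sketch of statistic-based substitution for a single column, assuming missing entries are stored as `None`. Median is included as an example of "another statistic"; the mode also works for categorical columns. Libraries such as scikit-learn offer the same idea as `SimpleImputer`.

```python
from statistics import mean, median, mode


def impute_column(values, strategy="mean"):
    """Replace None entries with a statistic computed from the observed values."""
    observed = [v for v in values if v is not None]
    stats = {
        "mean": mean,
        "median": median,
        "min": min,
        "max": max,
        "mode": mode,  # most frequent value; also usable for categorical data
    }
    fill = stats[strategy](observed)
    return [fill if v is None else v for v in values]


column = [3, None, 5, 3, None, 9]
for s in ("mean", "min", "max", "mode"):
    print(s, impute_column(column, s))
```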

K. M. Azharul Hasan

If the data is unstructured text, it is better to create a taxonomy and find an appropriate value for the missing entry. If the data is structured, you can use bin boundaries, bin means, and many other methods.
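One way to read "bin mean" for structured data is sketched below, under a stated assumption: records are grouped into equal-width bins of a second, fully observed attribute `x`, and a missing `y` is filled with the mean of the observed `y` values in the same bin. The attribute names and the choice of equal-width bins are hypothetical.

```python
from statistics import mean


def bin_mean_impute(x, y, n_bins=3):
    """Fill missing y with the mean of observed y in the same equal-width bin of x.

    Assumption: x is fully observed and related to y.
    """
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal x

    def bin_of(v):
        # clamp so the maximum x falls into the last bin
        return min(int((v - lo) / width), n_bins - 1)

    bins = [bin_of(v) for v in x]
    bin_means = {}
    for b in set(bins):
        observed = [yi for yi, bi in zip(y, bins) if bi == b and yi is not None]
        if observed:
            bin_means[b] = mean(observed)
    # a bin with no observed y leaves its gaps as None
    return [bin_means.get(b) if yi is None else yi for yi, b in zip(y, bins)]


x = [1, 2, 3, 10, 11, 12, 20, 21, 22]
y = [5, None, 7, 50, 52, None, 90, None, 94]
print(bin_mean_impute(x, y))
```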

Amardeep Sharma

C&RT (classification and regression trees), the mean, etc.

Sourabh Shastri

By applying any classification and regression tree algorithm, or a measure of central tendency.
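As a sketch of tree-based imputation, the code below fits the simplest possible regression tree (a single split, chosen by minimising squared error as CART does) on rows where `y` is observed, then predicts `y` where it is missing. Using one predictor and depth 1 is an assumption made to keep the example short; in practice a library tree such as scikit-learn's `DecisionTreeRegressor` would be used.

```python
from statistics import mean


def fit_stump(x, y):
    """Fit a depth-1 regression tree: one threshold on x, a mean of y per side."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        ml, mr = mean(left), mean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    if best is None:  # no valid split: fall back to the overall mean
        overall = mean(y)
        return lambda v: overall
    _, threshold, ml, mr = best
    return lambda v: ml if v <= threshold else mr


def tree_impute(x, y):
    """Train on rows where y is observed, then predict y where it is missing."""
    known = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    predict = fit_stump([k[0] for k in known], [k[1] for k in known])
    return [predict(xi) if yi is None else yi for xi, yi in zip(x, y)]


x = [1, 2, 3, 4, 10, 11, 12, 13]
y = [2.0, None, 3.0, 2.0, 20.0, 21.0, None, 19.0]
print(tree_impute(x, y))
```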

Mind Kutyauripo

Quite a number of techniques are available to deal with missing values, such as replacing a missing value with (a) the closest value, (b) the mean value, or (c) the median value. Some algorithms, such as k-nearest neighbors, are also used to deal with the problem of missing values.
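A minimal k-nearest-neighbor imputation sketch, assuming numeric rows with `None` for missing entries. Distance is Euclidean over the features both rows observe (one common choice, not the only one), and the missing entry becomes the mean of that feature over the k nearest rows that have it. scikit-learn's `KNNImputer` implements a more complete version.

```python
import math
from statistics import mean


def knn_impute(rows, k=2):
    """Fill each missing entry with the mean of that feature over the k nearest rows."""
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_i, other in enumerate(rows):
                if other_i == i or other[j] is None:
                    continue  # a neighbour must observe the target feature
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                result[i][j] = mean(v for _, v in nearest)
    return result


data = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [0.9, 2.2, 2.9],
    [8.0, 9.0, 10.0],
]
print(knn_impute(data, k=2))
```

The far-away row `[8.0, 9.0, 10.0]` is ignored, so the gap is filled from its two close neighbours rather than from a global mean.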

Kernan Mzelikahle

This is a data enhancement problem. Attempts to enhance data have a bearing on the results and thus must be conducted with caution. The following is an algorithm for managing missing data:

1. Eliminate all instances with inconsistent data and perform an analysis.

2. Choose a data-filling method, such as averaging, apply it to the original data, and then perform your analysis.

3. Compare the results of Steps 1 and 2. If there is NO statistically significant difference between them, report the results. If there is a statistically significant difference, apply further data enhancement techniques to the original data and perform the analyses, yielding perhaps three sets of results. Then report the results conditionally. This algorithm helps minimise errors.
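A rough sketch of the three steps for a single numeric variable. Assumptions: mean filling is the Step 2 method, the "analysis" is any function of the data (here the mean or standard deviation), and a fixed `tolerance` stands in for a proper statistical test. All names are illustrative.

```python
from statistics import mean, stdev


def complete_cases(values):
    """Step 1: analyse only the observed values."""
    return [v for v in values if v is not None]


def mean_filled(values):
    """Step 2: fill gaps with the mean of the observed values."""
    m = mean(complete_cases(values))
    return [m if v is None else v for v in values]


def compare(values, analysis, tolerance):
    """Step 3: flag whether the two results agree within a (hypothetical) tolerance."""
    r1 = analysis(complete_cases(values))
    r2 = analysis(mean_filled(values))
    return r1, r2, abs(r1 - r2) <= tolerance


values = [4.0, None, 6.0, 5.0, None, 7.0, 3.0]
print(compare(values, mean, tolerance=0.1))   # mean filling leaves the mean unchanged
print(compare(values, stdev, tolerance=0.1))  # but shrinks the spread
```

The second comparison disagrees, which is the situation in which Step 3 calls for trying further methods and reporting results conditionally.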

Wojciech Indyk

The best technique depends on the characteristics and properties of your dataset. For example, I recently published a paper, "Generic Data Imputation and Feature Extraction for Signals from Multifunctional Printers" (http://ceur-ws.org/Vol-2322/dsi4-1.pdf), that focuses on IoT data.

Zouhair Chiba

See the attached paper.

Kris Villez

Here are some options, some of which have been discussed above:

Option 1: Ignore samples with missing data

Option 2: Ignore variables with missing data

Option 3: Build a model based on samples without missing data and use the model to impute missing data in the samples with missing data

Option 4: Build a model with all samples while estimating the model parameters jointly with the missing data
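Options 1 to 3 can be sketched on a small numeric table, assuming rows are lists with `None` for missing entries. The Option 3 model here is a straight-line least-squares fit of one column on another, trained only on complete rows; the choice of model and the column indices are assumptions. Option 4 (joint estimation, e.g. with an EM-type algorithm) is not sketched.

```python
def option1(rows):
    """Ignore samples (rows) that contain any missing value."""
    return [r for r in rows if None not in r]


def option2(rows):
    """Ignore variables (columns) that contain any missing value."""
    keep = [j for j in range(len(rows[0])) if all(r[j] is not None for r in rows)]
    return [[r[j] for j in keep] for r in rows]


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def option3(rows, target, predictor):
    """Fit on complete samples only, then impute the target column elsewhere."""
    complete = option1(rows)
    a, b = fit_line([r[predictor] for r in complete], [r[target] for r in complete])
    out = [list(r) for r in rows]
    for r in out:
        if r[target] is None and r[predictor] is not None:
            r[target] = a + b * r[predictor]
    return out


rows = [[1.0, 2.0, 0.5], [2.0, None, 0.7], [3.0, 6.0, None], [4.0, 8.0, 0.4]]
print(option1(rows))
print(option2(rows))
print(option3(rows, target=1, predictor=0))
```

Here the Option 3 model is fitted on only two complete rows, which illustrates why such a model can be hard to trust.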

Regardless of the chosen option, most applications of these four options assume that the data went missing randomly. There are, however, also cases where missing data are caused by sensor signals that are out of range, by measurements that are below detection limits, or by removal through other processes (e.g. remote-sensor battery management, deadband/swinging-door filtering). For such cases, a specialized, case-specific model is often required to handle the missing data or to obtain optimal estimates of them.

With options 3 and 4, one additionally faces the challenge that the chosen model must be trusted. Since calibrating or selecting a model is often the primary objective of the data mining effort in the first place, such a trusted model can be hard to obtain at the time of imputation; this is why options 1 and 2 are still often chosen.

Junjie Zhu

My comments are about time-series data prediction, so they may not apply to other research topics. Typical methods are listwise deletion (deleting any observation that contains at least one missing value), replacement (mean, median, neighbors), and imputation. Many studies use replacement or imputation because they have to fill the blanks to proceed with their investigations. If you have such a restriction, replacement (such as a moving-window average) or model-based imputation is recommended.
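A minimal moving-window-average replacement for a time series with `None` gaps. The centred window of `half_window` steps on each side is an assumption, and only originally observed values enter each average.

```python
from statistics import mean


def moving_window_fill(series, half_window=2):
    """Replace each None with the mean of observed values within +/- half_window steps."""
    filled = list(series)
    for t, v in enumerate(series):
        if v is not None:
            continue
        lo, hi = max(0, t - half_window), min(len(series), t + half_window + 1)
        window = [w for w in series[lo:hi] if w is not None]
        if window:  # leave the gap if the whole window is missing
            filled[t] = mean(window)
    return filled


series = [10.0, 12.0, None, 14.0, 13.0, None, None, 20.0]
print(moving_window_fill(series))
```

A centred window uses values after the gap, which may leak future information into a prediction task; a trailing window (only `series[t - half_window:t]`) avoids that at the cost of lagging behind trends.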

Alternatively, if you don't want to introduce "new information", you could use a modified listwise deletion method that I developed. Basically, it is "listwise deletion" + "variable selection". See https://doi.org/10.1061/(ASCE)EE.1943-7870.0001097 if you're interested.
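The sketch below is only a generic illustration of why pairing variable selection with listwise deletion can retain more observations without filling in any values. It is not the method from the cited paper: the rule of dropping variables whose missing fraction exceeds `max_missing` is a hypothetical stand-in.

```python
def missing_fraction(rows, j):
    """Share of rows in which column j is missing."""
    return sum(r[j] is None for r in rows) / len(rows)


def select_then_delete(rows, max_missing=0.25):
    """Drop sparse variables, then drop rows that are still incomplete."""
    keep = [j for j in range(len(rows[0])) if missing_fraction(rows, j) <= max_missing]
    reduced = [[r[j] for j in keep] for r in rows]
    return keep, [r for r in reduced if None not in r]


rows = [
    [1, 5, None],
    [2, 6, 0.1],
    [3, None, None],
    [4, 8, None],
]
print(sum(None not in r for r in rows))  # plain listwise deletion keeps 1 row
print(select_then_delete(rows))          # keeps columns 0 and 1, and 3 rows
```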

Mohsen Ghorbian

I think this link will be useful:

https://towardsdatascience.com/6-different-ways-to-compensate-for-missing-values-data-imputation-with-examples-6022d9ca0779