What are the common data preprocessing steps for materials datasets before applying machine learning algorithms?

31 July 2023

What are the common data preprocessing steps for materials datasets before applying machine learning algorithms? How do researchers deal with missing data or outliers?

Hossein Safar Yousefifard

Dear Ankur Taya,

There are many methods for data preprocessing, but most of them fall into these categories:

Dimensionality Reduction
Data Cleaning
Feature Engineering (Feature Extraction)
Sampling Data
Data Transformation (Normalization, Standardization, etc.)
Handling Imbalanced Data

See these links for more information:

1 - https://www.geeksforgeeks.org/data-preprocessing-in-data-mining/

2 - https://www.scalablepath.com/data-science/data-preprocessing-phase
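As a rough illustration of the data transformation and data cleaning categories above, here is a minimal Python sketch using only the standard library. The function names, the band gap values, and the choice of the 1.5 × IQR rule for flagging outliers are my own assumptions for the example, not a prescribed workflow; in practice most people would use a library such as scikit-learn or pandas for this.

```python
from statistics import mean, pstdev, quantiles


def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def iqr_outlier_flags(values, k=1.5):
    """Flag values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Hypothetical band gaps in eV; the last entry is a deliberately odd value.
band_gaps = [1.1, 1.3, 0.9, 1.2, 1.0, 9.5]

flags = iqr_outlier_flags(band_gaps)
cleaned = [v for v, is_outlier in zip(band_gaps, flags) if not is_outlier]
scaled = standardize(cleaned)
```

Whether a flagged point should be dropped, corrected, or kept is a judgment call: in materials data an extreme value may be a measurement error or a genuinely unusual compound, so it is worth checking the source before removing it.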

As for your second question, it is now very common for researchers to use predictive methods (mostly machine learning methods) to generate values that stand in for the missing data, an approach usually called imputation.
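To make the idea of predicting missing values concrete, below is a small sketch of k-nearest-neighbour imputation in plain Python. This is one simple technique among many and is my own choice for illustration; the function name `knn_impute`, the use of `None` to mark missing entries, and the sample rows are all assumptions. It also assumes the features have already been scaled to comparable ranges, since the distance would otherwise be dominated by the largest-valued column.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is the root-mean-square difference over the columns that
    both rows have observed. The input rows are not modified.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                # Skip the row itself and rows that also lack this column.
                if j == i or other[col] is None:
                    continue
                shared = [
                    c for c in range(len(row))
                    if row[c] is not None and other[c] is not None
                ]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[col]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:
                filled[i][col] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Hypothetical samples: two scaled descriptors and one measured property.
samples = [
    [1.0, 1.0, 10.0],
    [1.1, 0.9, 12.0],
    [5.0, 5.0, 50.0],
    [1.05, 0.95, None],  # property not measured for this sample
]

completed = knn_impute(samples, k=2)
```

Here the missing property of the last sample is filled from the two samples whose descriptors are closest to it. Imputed values are estimates, so it is good practice to keep track of which entries were filled in and to check how sensitive the final model is to them.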
