Hi. I have a small tabular dataset of 130 data points, but these are really just biological replicates of 20 samples. Each sample's replicates share the same input feature values and diverge only in the target variable, so I have 130 different target values but only 20 distinct sets of input features.
If I treat the replicates as grouped (which seems to be standard procedure), I really only have 20 samples, which is very few. I can compute confidence intervals for predictions from the known variability of the target variable across replicates, and use a Leave-One-Out strategy to report the accuracy of my model, but I think that still falls short with only 20 samples.
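For context, this is roughly the grouped Leave-One-Out setup I have in mind (a minimal sketch assuming scikit-learn; the RandomForestRegressor and the synthetic arrays are just placeholders for my actual model and data):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)

# X: one row per replicate (feature values repeated within a sample),
# y: one target value per replicate,
# groups: the sample ID of each replicate, so replicates of the same
# sample never end up split across train and test.
X = np.repeat(rng.normal(size=(20, 5)), repeats=7, axis=0)[:130]
y = rng.normal(size=130)
groups = np.repeat(np.arange(20), 7)[:130]

logo = LeaveOneGroupOut()
errors = []
for train_idx, test_idx in logo.split(X, y, groups):
    model = RandomForestRegressor(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    errors.append(np.mean((pred - y[test_idx]) ** 2))

print(f"Grouped LOO mean squared error: {np.mean(errors):.3f}")
```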
The particularity of my case is that the input features cannot be measured for each biological replicate, because most of the measurement techniques are destructive and time-consuming. So there is a good reason why the dataset looks the way it does, but I still need to increase my usable dataset size somehow.
The input features for a given sample were mostly computed from experimental data, so I know the mean and standard deviation of each feature for each sample; currently I just assign the sample mean to all of its replicates. What I thought of doing is replacing that fixed mean, for each replicate, with a random value drawn from the Gaussian distribution defined by that feature's mean and standard deviation for that sample.
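In code, the idea would look something like this (again a minimal sketch: feature_means, feature_stds, and sample_of_replicate are hypothetical arrays standing in for my per-sample experimental statistics and the replicate-to-sample mapping):

```python
import numpy as np

rng = np.random.default_rng(42)

n_samples, n_features, n_replicates = 20, 5, 130

# Hypothetical per-sample, per-feature statistics from the experiments.
feature_means = rng.normal(size=(n_samples, n_features))
feature_stds = np.abs(rng.normal(scale=0.1, size=(n_samples, n_features)))

# sample_of_replicate[i] = index of the sample replicate i belongs to.
sample_of_replicate = np.repeat(np.arange(n_samples), 7)[:n_replicates]

# Instead of assigning every replicate its sample's mean feature vector,
# draw a fresh value per replicate from N(mean, std) of its sample.
X_augmented = rng.normal(
    loc=feature_means[sample_of_replicate],
    scale=feature_stds[sample_of_replicate],
)
print(X_augmented.shape)  # (130, 5): one distinct feature row per replicate
```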
Essentially, I am trying to augment my data by adding noise, but this noise comes from a known distribution, and it is a tactic to turn my biological replicates into independent samples. The uncertainty in the input features for each sample is real and varies by feature and sample, so the added noise should also help keep the model from overtraining on the more uncertain features. What I don't like is that the input values are randomly drawn rather than given a more empirical value, but I see no other way. Using the target variable information to engineer the input features would surely be data leakage that many would not approve of.
Sorry for the ramble. My question is: what do you think of this approach, and has anyone seen this kind of strategy used in a publication they can reference?
Thank you kindly