Training and test sets for statistical analysis?

Jochen Wilhelm Popular answer

Performance measures of classifiers that are estimated from the same set of data are always biased. An unbiased estimate can be obtained only on new data. As J.A. said, splitting a coordinately acquired data set into a training and a test set reduces the bias but will not completely remove it.

So, as so often in statistics, there is no "correct" vs. "incorrect", there are only different shades of "usefulness", depending on the aims and on the context. Sometimes you only have biased estimates, and you may discuss the possible bias. There is also the possibility that the (potential) bias might not be relevant for your problem.

J. A. Hageman

I would say technically yes, but from a practical perspective not really. What you want to know is the classification accuracy for unknown samples (samples that were not used to construct the classification model). To achieve this, some form of cross validation can be used (or double cross validation if meta-parameters need to be optimised). The classification results for unknown samples will give an idea of the performance on new samples.

However, keep in mind that there will also be sample bias. Typically, samples are collected simultaneously and will be alike to a certain extent. If you start classifying samples collected at a later time, it may be much harder to classify them correctly than the cross validation initially indicated.
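To make the double (nested) cross validation mentioned above more concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the synthetic 1-D data, the toy nearest-neighbour classifier, the candidate values for the meta-parameter, and all function names. The point is only the structure: the inner loop picks the meta-parameter using training data alone, and the outer loop scores the resulting model on samples it has never seen.

```python
import random
from statistics import mean


def folds(indices, k, rng):
    """Shuffle the given indices and split them into k nearly equal folds."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, x, n_neighbors):
    """Majority vote among the n_neighbors nearest training points (1-D, labels 0/1)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, n_neighbors):
    """Fraction of test points classified correctly."""
    hits = sum(knn_predict(train, x, n_neighbors) == label for x, label in test)
    return hits / len(test)


def cv_accuracy(data, indices, n_neighbors, k, rng):
    """Plain k-fold cross validation restricted to the given indices."""
    scores = []
    for test_idx in folds(indices, k, rng):
        held_out = set(test_idx)
        train = [data[i] for i in indices if i not in held_out]
        test = [data[i] for i in test_idx]
        scores.append(accuracy(train, test, n_neighbors))
    return mean(scores)


def nested_cv(data, candidates=(1, 5, 15), outer_k=5, inner_k=5, seed=0):
    """Outer loop estimates performance; inner loop tunes the meta-parameter."""
    rng = random.Random(seed)
    outer_scores = []
    for test_idx in folds(range(len(data)), outer_k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(data)) if i not in held_out]
        # Inner loop: choose the meta-parameter from the training part only.
        best = max(candidates,
                   key=lambda c: cv_accuracy(data, train_idx, c, inner_k, rng))
        train = [data[i] for i in train_idx]
        test = [data[i] for i in test_idx]
        outer_scores.append(accuracy(train, test, best))
    return outer_scores


# Synthetic two-class data (an assumption, not real measurements).
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), 0) for _ in range(60)]
data += [(data_rng.gauss(2, 1), 1) for _ in range(60)]

outer_scores = nested_cv(data)
print("outer-fold accuracies:", [round(s, 2) for s in outer_scores])
print("mean accuracy:", round(mean(outer_scores), 3))
```

Note that this does nothing about the sample bias described above: all folds still come from the same collection of samples.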

Ariel Linden

An approach worth considering is cross validation. This is particularly useful when the amount of data available is limited (as you correctly note). A general approach often used when building models with data mining techniques (e.g., classification trees) is to apply a "10 times 10-fold cross validation." This means that you split the data into 10 folds, train the model on 90% of the data, and test it on the remaining 10%. You repeat this for each of the 10 holdout samples and collect all the goodness-of-fit measures. You then repeat the whole procedure 10 times, with a new random split each time, for a total of 100 train/test runs. A good model should give good goodness-of-fit statistics across the test samples.
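As a rough illustration of the "10 times 10-fold" procedure, the sketch below uses only the Python standard library. The synthetic data, the toy nearest-centroid classifier, and the function names are all assumptions chosen to keep the example self-contained; in practice you would plug in your own model and goodness-of-fit measure.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n, k, rng):
    """Shuffle range(n) and split it into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Toy classifier: store the mean feature value of each class (1-D features)."""
    return {c: mean([x for x, label in zip(X, y) if label == c]) for c in set(y)}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def repeated_cv(X, y, k=10, repeats=10, seed=0):
    """Return one accuracy per held-out fold: k * repeats values in total."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        # A fresh random partition for every repetition.
        for test_idx in k_fold_indices(len(X), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            model = fit_centroids([X[i] for i in train_idx],
                                  [y[i] for i in train_idx])
            correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
            scores.append(correct / len(test_idx))
    return scores


# Synthetic two-class data (an assumption, not real measurements).
data_rng = random.Random(42)
X = [data_rng.gauss(0, 1) for _ in range(100)]
X += [data_rng.gauss(2, 1) for _ in range(100)]
y = [0] * 100 + [1] * 100

scores = repeated_cv(X, y)
print("number of test folds:", len(scores))
print("mean accuracy:", round(mean(scores), 3))
print("spread across folds (sd):", round(stdev(scores), 3))
```

Looking at the spread of the 100 scores, and not only at their mean, shows how stable the model is across different splits.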

Daniel Wright

For good coverage of CV in a book that is available online, see

http://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLII_print4.pdf

Shuichi Shinmura

It is not a good idea to divide your data into a training and a test set, because you cannot obtain the 95% CI.

See my papers published after 2014, which use the k-fold cross validation method.