Hi,

I'm writing a paper about data augmentation and I'm looking for ways of increasing the size of a dataset. I'm already aware of the techniques used for images (transformations, PCA, blurs, etc.), but I'm having trouble finding augmentation techniques for classic numeric datasets such as Iris.

I have already explored some obvious methods, listed below (rough code sketches of each one follow the list):

1. Uniform Random Generation: This very naive method creates a new instance by drawing each feature value uniformly at random between the min and max of the existing values. (The min and max are computed each time from the values of that feature within the relevant class.)

2. Normal Random Generation: Same as Uniform, but the values are now drawn from a Gaussian, which is of course less naive since the generated values better fit the initial data distribution.

3. Adding Noise: This method is a little bit different, since it consists of cloning the initial instances and adding some noise to each copy. This method aims to strengthen the model and prevent overfitting.
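
To make 1. concrete, here is a minimal sketch of what I mean, assuming NumPy and scikit-learn's Iris loader (the function name augment_uniform and the per-class loop are just my own illustration):

import numpy as np
from sklearn.datasets import load_iris

def augment_uniform(X, y, n_new_per_class, seed=0):
    """Draw each feature uniformly between its per-class min and max."""
    rng = np.random.default_rng(seed)
    new_X, new_y = [], []
    for c in np.unique(y):
        Xc = X[y == c]                      # all instances of class c
        lo, hi = Xc.min(axis=0), Xc.max(axis=0)
        new_X.append(rng.uniform(lo, hi, size=(n_new_per_class, X.shape[1])))
        new_y.append(np.full(n_new_per_class, c))
    return np.vstack(new_X), np.concatenate(new_y)

# Example: add 50 synthetic instances per Iris class, stacked onto the original data
X, y = load_iris(return_X_y=True)
X_new, y_new = augment_uniform(X, y, n_new_per_class=50)
X_aug = np.vstack([X, X_new])
y_aug = np.concatenate([y, y_new])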
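
For 2., the only change in my sketch is fitting a per-class mean and standard deviation instead of a min/max; note that I treat the features as independent, so correlations between features are ignored:

import numpy as np

def augment_gaussian(X, y, n_new_per_class, seed=0):
    """Draw each feature from a Gaussian fitted per class (features independent)."""
    rng = np.random.default_rng(seed)
    new_X, new_y = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        mu, sigma = Xc.mean(axis=0), Xc.std(axis=0)   # per-class, per-feature fit
        new_X.append(rng.normal(mu, sigma, size=(n_new_per_class, X.shape[1])))
        new_y.append(np.full(n_new_per_class, c))
    return np.vstack(new_X), np.concatenate(new_y)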
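
And for 3., I simply clone the existing rows and perturb them with small Gaussian noise scaled by each feature's standard deviation (the 5% scale below is an arbitrary choice of mine):

import numpy as np

def augment_noise(X, y, noise_scale=0.05, seed=0):
    """Clone every instance and add Gaussian noise scaled by each feature's std."""
    rng = np.random.default_rng(seed)
    sigma = X.std(axis=0) * noise_scale     # per-feature noise level
    X_noisy = X + rng.normal(0.0, sigma, size=X.shape)
    return X_noisy, y.copy()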

I'm fairly new to ML and data augmentation, and these 3 methods may look pretty obvious, so I was wondering if anyone had ideas for different methods/algorithms for such a task.

Thanks for your advice.

DP

NB: Text augmentation is out of my project's scope, but if you have any info on that, please share!
