Corpus v. Sample? - FAQS.TIPS

Jan Gil Gozun Sarmiento

Reasons for studying a smaller sample:

1.) The responses are not very varied.

2.) Constraints on time, money, or resources in general.

Anna Małgorzata Kamińska

When machine learning techniques are involved, it is common practice to split the data set into a training set and a test set. This makes it possible to verify how well the model has learned using real data.
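As a rough illustration of the split described above, here is a minimal sketch in plain Python. The function name `train_test_split`, the 80/20 ratio, the seed, and the placeholder `tweets` list are all assumptions made for the example, not anything taken from the original study.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(items)                 # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical stand-in for a real collection of tweets.
tweets = [f"tweet {i}" for i in range(1000)]
train, test = train_test_split(tweets)
print(len(train), len(test))
```

The model would be fitted on `train` only and then evaluated on `test`, which it has never seen.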

If you did not need to verify the results, simply tell the reviewer that the large size of the data set did not limit the effectiveness of the research in any way, and that you therefore had no reason to reduce it.

Ette Etuk

Complete enumeration has its advantages, and so does sampling. The former is exhaustive and, if affordable, is preferable to sampling. With it there is no loss of information, as there is with sampling. However, it is more costly.

Saif Shahin

Many thanks, Jan, Anna, and Ette. These are very helpful answers/suggestions.

Prajakta Ratnaparkhi

Random sampling gave more exposure to our work. The reviewers must be thinking that random sampling may give a more accurate result. Just go for it. They are more experienced and learned people, so go forward and do as they suggest. You can also run a comparative analysis of the sample and the full corpus. It will add more to your research, and your efforts will not be wasted. All the very best.
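One way the comparative analysis suggested above could look is sketched below: draw a simple random sample from the corpus and compare how often the most common terms occur in the sample and in the full corpus. Everything here is an assumption made for illustration, including the helper names, the whitespace tokenization, and the toy corpus. A real study would compare whatever measure the analysis actually reports.

```python
import random
from collections import Counter


def term_shares(docs):
    """Return each term's share of all whitespace-separated tokens."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def compare_sample_to_corpus(corpus, sample_size, seed=0, top_n=5):
    """Map the corpus's top terms to (share in corpus, share in sample)."""
    sample = random.Random(seed).sample(corpus, sample_size)  # without replacement
    full, sampled = term_shares(corpus), term_shares(sample)
    top_terms = sorted(full, key=full.get, reverse=True)[:top_n]
    return {t: (full[t], sampled.get(t, 0.0)) for t in top_terms}


# Hypothetical toy corpus standing in for the real documents.
corpus = ["election news today", "bots and the election",
          "news about bots", "weather today"] * 250
for term, (in_corpus, in_sample) in compare_sample_to_corpus(corpus, 100).items():
    print(f"{term}: corpus={in_corpus:.3f} sample={in_sample:.3f}")
```

If the shares stay close across several seeds and sample sizes, that supports the claim that a sample and the full corpus lead to similar conclusions. Large gaps would suggest the sample is too small for the measure in question.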