Is there any harm in treating binary data as percentage data by grouping it?

Stephen Politzer-Ahles

There are many harms in this; see e.g. Jaeger 2008. I'm especially not sure what it means to arbitrarily lump 10 rows together rather than grouping by some meaningful factor. Finally, I'm not sure that failing to classify the 1s means your model has a problem in the first place; if there's not a reliable relationship between any predictor variable and this outcome, then that seems normal.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2613284/

Irshad Samdani Mujawar

@Stephen, thank you so much for the response. By "meaningful factor", do you mean that I should group the rows according to the values of one of the most important variables in the predictor set?

A meaningful grouping variable depends on your experiment: for example, the percentage of correct responses for each participant, or the percentage of some outcome within each level of some other factor. "The percentage of 1s in ten rows that happen to be adjacent in my data frame" does not sound very meaningful to me, but I don't know the design of your experiment. Anyway, even if you have a meaningful grouping variable, the problems mentioned above still hold, and overall I just don't see why you can't use logistic regression (you haven't explained why there is a problem with using a logistic model).
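To make the grouping concern concrete, here is a minimal Python sketch. The data are made up (100 binary rows with 17% ones), and `group_proportions` and `proportion_variance` are hypothetical helper names, not anything from the thread. It shows two things: collapsing adjacent rows leaves only 10 observations in place of 100, and the variance of a proportion depends on its mean, so grouped proportions are not equally noisy across the range.

```python
def group_proportions(rows, size):
    """Collapse 0/1 outcomes into proportions over consecutive blocks of `size` rows."""
    blocks = [rows[i:i + size] for i in range(0, len(rows), size)]
    return [sum(block) / len(block) for block in blocks]


def proportion_variance(p, n):
    """Variance of the proportion of 1s among n independent Bernoulli(p) rows."""
    return p * (1 - p) / n


# Made-up data (assumption): every sixth row is a 1, giving 17 ones in 100 rows.
rows = [1 if i % 6 == 0 else 0 for i in range(100)]

# Grouping 10 adjacent rows at a time leaves only 10 data points.
props = group_proportions(rows, 10)

# The variance of a grouped proportion changes with the underlying rate,
# so the equal-variance assumption of ordinary linear models is violated.
var_at_17_percent = proportion_variance(0.17, 10)
var_at_50_percent = proportion_variance(0.50, 10)

print(len(rows), "binary rows ->", len(props), "proportions")
print("variance at p=0.17:", var_at_17_percent)
print("variance at p=0.50:", var_at_50_percent)
```

Any row-level predictor would also have to be averaged within each arbitrary block, which discards its within-block variation.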

As I mentioned, the problem with logistic regression is that it is not classifying the 1s, even though the proportion of 1s is 16-17% (in both the testing and validation data). I tried oversampling and undersampling methods too; this helped, but it increased the misclassification rate.
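One possible reason a logistic model "does not classify 1s" is the default 0.5 cutoff: when 1s are rare, fitted probabilities may seldom exceed 0.5. The sketch below uses made-up labels and probabilities and hypothetical helper names. It is not a fit to the data discussed here, and lowering the cutoff is an illustration rather than a method proposed in this thread. It shows how a lower cutoff recovers the 1s while raising the overall error, the same trade-off reported with resampling.

```python
def classify(probs, threshold):
    """Turn predicted probabilities into 0/1 labels using a cutoff."""
    return [1 if p >= threshold else 0 for p in probs]


def recall_and_error(y_true, y_pred):
    """Return (share of true 1s recovered, overall misclassification rate)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return true_pos / sum(y_true), errors / len(y_true)


# Made-up example (assumption): 12 rows, 2 of them 1s (about 17%).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20, 0.26, 0.30, 0.35, 0.32, 0.45]

# Default cutoff: no probability reaches 0.5, so every row is labelled 0.
recall_default, error_default = recall_and_error(y_true, classify(probs, 0.5))

# Lower cutoff: both 1s are found, at the cost of three false positives.
recall_low, error_low = recall_and_error(y_true, classify(probs, 0.25))

print("cutoff 0.50: recall", recall_default, "error", error_default)
print("cutoff 0.25: recall", recall_low, "error", error_low)
```

Whether such a trade-off is acceptable depends on the relative cost of missing a 1 versus raising a false alarm, which only the study design can settle.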

One more thing I forgot to ask: what kinds of problems will my model suffer from if I group my 100 rows together and treat the result as a proportion?