What can be done about the problem that arises when fitting a beta distribution to a data set that contains zeros and ones (boundary points)?

Fabrice Clerot

The "supplementary material" on the page below discusses the issue:

http://supp.apa.org/psycarticles/supplemental/met_11_1_54/met_11_1_54_supp.html

Brian D. Gerber

What do you mean by problems? The beta distribution's support is [0,1], including the boundary values. In regression, estimation is often done by maximizing the likelihood over parameters on the real scale, using a link function such as the logit or probit, so that boundary issues in the estimation process are removed. See http://cran.r-project.org/web/packages/betareg/vignettes/betareg.pdf

Brian, a logit or probit transform does not help much when you have samples sitting right on the boundaries; as your reference points out:

"Furthermore, if y also assumes the extremes 0 and 1, a useful transformation in practice is (y · (n − 1) + 0.5)/n where n is the sample size (Smithson and Verkuilen 2006)."

(which is the reference I gave above)

However, this always seemed to me more like a "quick and not so dirty trick" than a clean theoretically grounded approach. I would be quite curious to learn about a cleaner treatment.
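For readers who want to try the quoted transformation, here is a minimal sketch in plain Python. The function name `squeeze_unit_interval` and the argument `s` are illustrative choices, not taken from the betareg vignette; `s` defaults to the 0.5 in the quote.

```python
def squeeze_unit_interval(values, s=0.5):
    """Map values in [0, 1] into the open interval (0, 1).

    Applies y' = (y * (n - 1) + s) / n, where n is the sample size.
    The name and the `s` argument are illustrative assumptions.
    """
    n = len(values)
    if not 0 < s < 1:
        raise ValueError("s must lie strictly between 0 and 1")
    for y in values:
        if not 0 <= y <= 1:
            raise ValueError("all values must lie in [0, 1]")
    return [(y * (n - 1) + s) / n for y in values]


data = [0.0, 0.12, 0.5, 0.93, 1.0]
squeezed = squeeze_unit_interval(data)
print(squeezed)  # boundary values are pulled strictly inside (0, 1)
```

Note that the amount of shrinkage depends on the sample size n, so the same raw value maps to different transformed values in data sets of different sizes. That dependence is part of why the transformation can feel ad hoc.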

Leila Delgarm

Dear Clerot Fabrice, thank you so much for your attention and the helpful document.

Dear Brian Gerber, the problem is an error that appears when running optimization routines (R's optim or nlminb) to estimate the parameters of the beta distribution in the presence of 0 or 1.
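The exact error message is not given in the thread, but one likely cause can be sketched. The beta log-likelihood contains the terms (a − 1)·log(y) and (b − 1)·log(1 − y), which are undefined at y = 0 and y = 1. The snippet below is an illustration in plain Python rather than R, and the function name `beta_log_likelihood` is hypothetical.

```python
import math


def beta_log_likelihood(data, a, b):
    """Log-likelihood of i.i.d. Beta(a, b) observations (illustrative)."""
    # log of the beta function B(a, b), computed via log-gamma
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    total = 0.0
    for y in data:
        # math.log raises ValueError for an argument of 0,
        # which is what happens when y == 0 or y == 1
        total += (a - 1) * math.log(y) + (b - 1) * math.log(1 - y) - log_norm
    return total


print(beta_log_likelihood([0.2, 0.5, 0.8], 2.0, 3.0))  # finite value

try:
    beta_log_likelihood([0.0, 0.5, 1.0], 2.0, 3.0)
except ValueError as err:
    print("boundary observation breaks the likelihood:", err)
```

An optimizer that receives an undefined or non-finite objective value at every candidate parameter has nothing to work with, which would be consistent with the failure described above.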

Subrata Chakraborty

The Impact Factor is about the journal as a whole and may not truly reflect the impact of individual papers in that journal, whereas the h-index is about the individual, so I prefer the h-index over the IF.

Richard David Gill

If the data come exactly from a beta distribution on [0, 1], the values 0 and 1 themselves can never occur. If you do have 0's or 1's, your model is inappropriate. You had better think up a model which respects this feature of the data. E.g., if the data are rounded to a small number of digits after the decimal point, then you essentially have grouped data from a beta distribution.
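The grouped-data idea can be sketched as follows: each recorded value stands for the interval of true values that round to it, and the likelihood uses the probability of that interval instead of the density at a point. This is only a sketch under stated assumptions: rounding to `decimals` digits is assumed, the function names are hypothetical, and the midpoint rule is a crude stand-in for a proper regularized incomplete beta function (it is least accurate when a or b is below 1).

```python
import math


def beta_pdf(y, a, b):
    """Beta(a, b) density at an interior point 0 < y < 1."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(y) + (b - 1) * math.log(1 - y) - log_norm)


def interval_probability(lo, hi, a, b, steps=2000):
    """Approximate P(lo < Y < hi) for Y ~ Beta(a, b) by the midpoint rule."""
    width = (hi - lo) / steps
    # Midpoints lie strictly inside (lo, hi), so the density is never
    # evaluated at 0 or 1 even when lo == 0 or hi == 1.
    return width * sum(beta_pdf(lo + (i + 0.5) * width, a, b) for i in range(steps))


def rounded_beta_log_likelihood(data, a, b, decimals=2):
    """Log-likelihood treating each value as rounded to `decimals` digits."""
    half = 0.5 * 10 ** (-decimals)
    total = 0.0
    for y in data:
        lo = max(0.0, y - half)  # a recorded 0 stands for [0, half)
        hi = min(1.0, y + half)  # a recorded 1 stands for (1 - half, 1]
        total += math.log(interval_probability(lo, hi, a, b))
    return total


print(rounded_beta_log_likelihood([0.0, 0.37, 1.0], 2.0, 2.0))
```

With this formulation, recorded zeros and ones contribute finite terms, so the objective can be handed to a numerical optimizer over (a, b).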

Juan Jose Egozcue

Dear Leila,

as mentioned by R.D. Gill, an important point is whether the zeros or ones in your sample are essential (true 0 or 1) or are "below the detection limit". In the first case, the beta model is inappropriate. Data below the detection limit are a special kind of censored compositional data. There are methods to replace these zeros (the ones correspond to zeros of the complementary part) or to perform multiple imputation in the framework of compositional data analysis. I would recommend having a look at the contributions by J.A. Martin-Fernandez and J. Palarea, as they are specialists on this topic.

However, from the compositional point of view, we recommend fitting more flexible distributions than the Beta-Dirichlet distribution. The alternative is the normal distribution on the simplex (also known as logistic-normal or logit-normal). However, the problem with the zeros is still the same as in the beta distribution.

Dear Professor Juan Jose Egozcue

Thank you so much for your note and advice.

I will study the references that you recommend.

Best regards,

Leila Delgarm.