Dear Choki, The question is not well-posed. These distributions do not fit "any non-normal data", but some such data. You must describe your situation and your data in some detail.
Do not divide data into "normal" and "non-normal". Think about what kind of data you have and how uncertainty about that kind of data may be formulated.
If the data come from a series of Bernoulli trials, where each trial is independent with the same success probability, the binomial model would be adequate. If the success probability is itself uncertain, a beta-binomial model could be more appropriate.
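For illustration, here is a minimal sketch in Python (using scipy.stats; all parameter values are made up) contrasting the two:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 20  # trials per observation

# Binomial: fixed success probability p
p = 0.3
binom_sample = stats.binom.rvs(n, p, size=1000, random_state=rng)

# Beta-binomial: p itself drawn from Beta(a, b) for each observation
a, b = 2.0, 5.0  # illustrative hyperparameters
betabinom_sample = stats.betabinom.rvs(n, a, b, size=1000, random_state=rng)

# The beta-binomial shows extra (over-)dispersion relative to the binomial
print(binom_sample.var(), betabinom_sample.var())
```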
If the data are generated by "draws without replacement" from a set of 0s and 1s, the resulting model is hypergeometric.
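A small illustration (the urn sizes are made up):

```python
from scipy import stats

# Draws without replacement: a population of M items, of which K are "1"s;
# we draw N items and count the 1s among them.
M, K, N = 50, 10, 12
rv = stats.hypergeom(M, K, N)

print(rv.pmf(3))   # probability of exactly 3 ones in the sample
print(rv.mean())   # equals N * K / M
```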
If the data are counts of events that happen at a given constant rate during a given interval, uncertainty can usually be modelled with the Poisson distribution. If the rate itself is uncertain, a gamma-Poisson model (a.k.a. negative binomial) will often do a good job. Sometimes the data-generating process may be more complex, and it might be necessary to stochastically model the event that the outcome is either (gamma-)Poisson or zero. This leads to zero-inflated models.
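A sketch with illustrative parameters, showing by simulation that the gamma-Poisson mixture is the negative binomial, and how zero inflation can be layered on top:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
shape, scale = 3.0, 2.0  # illustrative gamma parameters for the rate

# Gamma-Poisson mixture: draw a rate, then a Poisson count at that rate
rates = stats.gamma.rvs(shape, scale=scale, size=100_000, random_state=rng)
counts = stats.poisson.rvs(rates, random_state=rng)

# Equivalent negative binomial: n = shape, p = 1 / (1 + scale)
nb = stats.nbinom.rvs(shape, 1.0 / (1.0 + scale), size=100_000, random_state=rng)
print(counts.mean(), nb.mean())  # should agree closely

# Zero inflation: with probability pi the outcome is forced to zero
pi = 0.2  # illustrative inflation probability
inflated = np.where(rng.random(counts.size) < pi, 0, counts)
```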
If the data are proportions, the beta model is often suitable.
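For example, a beta distribution can be fit to observed proportions like this (the data here are made up):

```python
from scipy import stats

# Illustrative proportions, strictly inside (0, 1)
props = [0.12, 0.25, 0.31, 0.08, 0.44, 0.19, 0.27]

# Fit a beta distribution; floc/fscale pin location to 0 and scale to 1
a, b, loc, scale = stats.beta.fit(props, floc=0, fscale=1)
print(a, b)
```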
If the data are rates, masses, or concentrations, the gamma model is usually appropriate (it includes the exponential model as a special case).
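A quick check of that special case (the scale value is arbitrary): a gamma with shape 1 has the same density as an exponential.

```python
from scipy import stats

# Gamma with shape a = 1 reduces to the exponential distribution
gamma_rv = stats.gamma(a=1.0, scale=2.0)
expon_rv = stats.expon(scale=2.0)
print(gamma_rv.pdf(1.5), expon_rv.pdf(1.5))  # identical densities
```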
There are many different models for survival data, assuming different functional forms of the hazard rate (exponential, Weibull, log-logistic, etc.).
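To make the idea of a hazard rate concrete, here is a sketch computing h(t) = f(t)/S(t) for two of these models (the Weibull shape value is arbitrary):

```python
import numpy as np
from scipy import stats

t = np.linspace(0.1, 5, 5)

# Hazard rate h(t) = pdf / survival function
expon_hazard = stats.expon.pdf(t) / stats.expon.sf(t)  # constant over time
weibull_hazard = (stats.weibull_min.pdf(t, c=1.5)
                  / stats.weibull_min.sf(t, c=1.5))    # increasing for c > 1
print(expon_hazard)
print(weibull_hazard)
```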
There are also many special distribution models like Rayleigh, Gompertz, extreme value, Bose-Einstein, Dirac, Boltzmann, Pareto, Rice, Laplace, Kent, etc.
It may also be that the data you analyze have several distinct and independent sources of uncertainty. This can be modelled with mixture distributions. For example, if the variable is actually exponentially distributed but there is relatively large measurement noise, the appropriate model would be an exponentially modified normal distribution. Or the height of a human is modelled as normal, but the expected value depends on the gender. When gender is modelled as a dichotomous random variable, the result is a two-component normal mixture, which can be bimodal. Note that models like the beta-binomial and gamma-Poisson (negative binomial) are also mixtures.
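Both examples can be sketched in a few lines (all parameter values are made up; scipy calls the exponentially modified normal "exponnorm"):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Exponentially modified normal: exponential signal plus normal noise.
# scipy's exponnorm uses shape K = 1 / (sigma * lambda); values illustrative.
emg = stats.exponnorm.rvs(K=2.0, loc=0.0, scale=1.0, size=10_000, random_state=rng)

# Two-component normal mixture (e.g. height by gender; numbers illustrative)
gender = rng.random(10_000) < 0.5
heights = np.where(gender,
                   rng.normal(178, 7, 10_000),   # one component
                   rng.normal(165, 6, 10_000))   # the other component
```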
Thinking about the kind of data can be the most difficult step in the analysis, and it can be the step providing the most and deepest insight into the research project!
1. Is the data from a continuous or discrete process?
2. If the data is from a continuous process but significantly positively skewed, you could try a log transformation. That may transform the data to be approximately normal (see the first sketch after this list).
3. If you are a student, then you may be able to obtain a student trial version (cost = 0, I think) of Risk Solver Platform. There is a curve-fitting algorithm in the software that will fit the data to one of many distributions, display statistics regarding the goodness of fit, and show a chart of the fit to the data.
4. If the data is discrete and follows a Bernoulli process, then you may find it useful to examine the Bayesian approach to the process (prior = beta, likelihood = binomial, posterior = beta); see the second sketch after this list.
5. Finally, to piggy-back on the very useful responses above, there are some references that may be of interest to you, available as Kindle eBooks:
5a. Statistical Distributions, 4th ed., by Forbes et al. (2011)
5b. Handbook of Statistical Distributions with Applications, 2nd ed., by Krishnamoorthy (2016) [I've only used this text and the first edition selectively]
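Regarding item 2: a minimal sketch of the log transformation in Python, using made-up lognormal data as an example of positive skew:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # positively skewed (illustrative)

print(stats.skew(x))          # clearly positive
print(stats.skew(np.log(x)))  # near zero after the log transform
```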
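Regarding item 4: the conjugate beta-binomial update can be sketched as follows (the prior hyperparameters and data are illustrative):

```python
from scipy import stats

# Conjugate Bayesian update: beta prior + binomial likelihood = beta posterior
a_prior, b_prior = 1.0, 1.0   # uniform prior on the success probability p
successes, failures = 7, 13   # observed Bernoulli outcomes

posterior = stats.beta(a_prior + successes, b_prior + failures)
print(posterior.mean())           # posterior mean of p
print(posterior.interval(0.95))   # 95% credible interval
```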