I need to fit a negative binomial model to prevalence estimates collected over several years and then forecast from that model. How should I do it, and what software should I use?
BINOMIAL DISTRIBUTION: We need to start with the positive binomial as foundation material. Recall that the positive binomial distribution starts with the definition of p = probability of success, estimated from s successes in a set of n observations. The probability of failure is defined as q. Recall further that the probability of success under Laplace's rule of succession is given by:
(1) p = (s + 1) / (n + 2)
Therefore, the probability of failure is given as:
(2) q = 1 - p
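As a quick illustration of equations (1) and (2), here is a minimal Python sketch. The function name and the numbers (12 positive cases among 48 people examined) are made up for the example and are not from any real data set.

```python
def laplace_success_probability(successes: int, trials: int) -> float:
    """Equation (1): p = (s + 1) / (n + 2)."""
    if trials < 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return (successes + 1) / (trials + 2)


# Hypothetical data: 12 positive cases among 48 people examined.
p = laplace_success_probability(12, 48)
q = 1 - p  # equation (2)
print(p, q)
```

Note that the rule never returns exactly 0 or 1, which is useful when a year has no observed cases.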
NOW, in the negative binomial distribution, the focus of the analysis is q, NOT p: X counts the failures observed before the n-th success. We are using q as the term of reference. The general statement is given as:
(3) f(X) = A p^n (1 - p)^X
(3.1) A = (X + n - 1)! / [(n - 1)! X!]
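The probability function can be sketched in a few lines of Python. This assumes n is a whole number and uses the standard coefficient (X + n - 1)! / [(n - 1)! X!], computed with math.comb. The function name and the values n = 3, p = 0.4 are only for illustration.

```python
from math import comb


def neg_binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of x failures before the n-th success, as in (3) and (3.1)."""
    if x < 0:
        return 0.0
    a = comb(x + n - 1, x)  # (x + n - 1)! / ((n - 1)! x!)
    return a * p**n * (1 - p) ** x


# Illustrative parameters: the probabilities over x = 0, 1, 2, ... should sum to 1.
probs = [neg_binomial_pmf(x, n=3, p=0.4) for x in range(200)]
total = sum(probs)
print(round(total, 6))
```

Summing the probabilities is a cheap sanity check that the coefficient and the exponents are in the right places.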
The expected value is given by:
(4) E(X) = n(1 - p) / p
with variance:
(5) V(X) = n(1 - p) / p^2
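For the fitting part of the question, equations (4) and (5) can be inverted: p = E(X) / V(X) and n = E(X)^2 / (V(X) - E(X)). The sketch below applies this method-of-moments idea. It is only one possible approach, it assumes the data are counts (not proportions), it treats the years as independent with no trend, and it works only when the sample variance exceeds the sample mean. The function name and the yearly counts are hypothetical.

```python
from statistics import mean, variance


def fit_neg_binomial_moments(counts):
    """Method-of-moments estimates (n, p) obtained by inverting (4) and (5)."""
    m = mean(counts)
    v = variance(counts)  # sample variance
    if v <= m:
        raise ValueError("sample variance must exceed the mean (no overdispersion)")
    p = m / v
    n = m * m / (v - m)
    return n, p


# Hypothetical yearly case counts, for illustration only.
yearly_counts = [4, 9, 3, 12, 7, 15, 5, 9]
n_hat, p_hat = fit_neg_binomial_moments(yearly_counts)
print(n_hat, p_hat)
```

Plugging n_hat and p_hat back into (4) and (5) reproduces the sample mean and variance, which is all the method guarantees. A forecast that accounts for a trend over the years would need a regression-type model instead.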
and the moment generating function for the distribution is:
(6) ϕ(θ)_negbin = (p / [1 - (1 - p)e^θ])^n
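One way to check that (6) agrees with (4) is to differentiate the moment generating function numerically at θ = 0, since its first derivative there equals E(X). The sketch below does this with a central difference. The function name, the step size, and the values n = 3, p = 0.4 are illustrative choices.

```python
from math import exp


def neg_binomial_mgf(theta: float, n: float, p: float) -> float:
    """Equation (6): (p / [1 - (1 - p) e^theta])^n."""
    denom = 1 - (1 - p) * exp(theta)
    if denom <= 0:
        raise ValueError("the MGF only exists for theta < -log(1 - p)")
    return (p / denom) ** n


n, p, h = 3, 0.4, 1e-5
# The central difference approximates the first derivative at theta = 0.
slope_at_zero = (neg_binomial_mgf(h, n, p) - neg_binomial_mgf(-h, n, p)) / (2 * h)
expected_mean = n * (1 - p) / p  # equation (4)
print(slope_at_zero, expected_mean)
```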
Under equation (6), as n goes to infinity, the discrete data points become closer and closer together relative to the spread of the distribution, so the gaps effectively disappear and the distribution takes on the shape of a normal curve. Under this condition, the test statistic used for the negative binomial distribution is given by:
(7) Z_negbin = B + Z(C) + 0.50
(7.1) B = n(1 - p) / p
(7.2) C = sqrt[n(1 - p) / p^2]
Here B is the expected value from (4), C is the standard deviation from (5), and 0.50 is the continuity correction. If we use a 0.95 confidence level (one-sided), Z(0.95) = 1.645, or about 1.65. H0: Z_negbin < 1.65, or not significant. The alternative hypothesis is HA: Z_negbin > 1.65, or statistically significant. See attached table.
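A rough Python sketch of the normal approximation follows. It rests on an assumption about notation: Z(C) in (7) is read as the normal quantile multiplied by C, so (7) gives the largest count still compatible with H0, and an observed count is compared on the Z scale by standardizing it. The function names and the values n = 50, p = 0.4 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def neg_binomial_upper_limit(n: float, p: float, confidence: float = 0.95) -> float:
    """Equation (7), read as B + z * C + 0.5 (assumed interpretation)."""
    b = n * (1 - p) / p              # (7.1), the mean
    c = sqrt(n * (1 - p) / p**2)     # (7.2), the standard deviation
    z = NormalDist().inv_cdf(confidence)
    return b + z * c + 0.5


def standardized_count(x: float, n: float, p: float) -> float:
    """Observed count x on the Z scale, with the same continuity correction."""
    b = n * (1 - p) / p
    c = sqrt(n * (1 - p) / p**2)
    return (x - 0.5 - b) / c


limit = neg_binomial_upper_limit(50, 0.4)
print(limit, standardized_count(limit, 50, 0.4))
```

Under this reading, a count above the limit corresponds to a standardized value above about 1.65, which matches the decision rule stated above.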
Recall that in the case of the positive binomial distribution, the test statistic was given by:
(8) Z+bin = E / F
(8.1) E = (X / n) - p
(8.2) F = sqrt(pq/n)
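Equation (8) is simple enough to compute by hand, but a small Python sketch makes the pieces explicit. Here p is taken to be the proportion stated under H0, and the example numbers (30 cases in a sample of 100, tested against p = 0.2) are invented.

```python
from math import sqrt


def binomial_z(x: int, n: int, p0: float) -> float:
    """Equation (8): Z = ((x / n) - p0) / sqrt(p0 * q0 / n)."""
    q0 = 1 - p0
    e = (x / n) - p0          # (8.1)
    f = sqrt(p0 * q0 / n)     # (8.2)
    return e / f


# Hypothetical example: 30 cases out of 100, hypothesized proportion 0.2.
z = binomial_z(30, 100, 0.2)
print(z)
```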
THUS, the basis for negative binomial distribution modeling is given.
CAVEAT ON TIME SERIES: If time series conditions are introduced into the binomial distribution, modeling and testing become more complicated. If we are doing a point-in-time comparison, i.e. t_n versus t_(n+1), use the F-test for two Poisson counts. That F-test is given as:
(9) F = [(1/t1)(N1 + 0.5)] / [(1/t2)(N2 + 0.5)]
The F-table is tabulated by degrees of freedom (df), which are defined here as df1 = (2N1 + 1) and df2 = (2N2 + 1). Recall that N = frequency of occurrence and t = time period. The rate of occurrence is Ri = Ni/ti.
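The F statistic and its degrees of freedom can be computed as in the sketch below. It only produces the numbers to look up in an F-table and does not compute a p-value. The function name and the counts (30 events versus 12 events, each over 2 time units) are hypothetical.

```python
def poisson_rate_f(n1: int, t1: float, n2: int, t2: float):
    """Equation (9) with df1 = 2*N1 + 1 and df2 = 2*N2 + 1."""
    f = ((n1 + 0.5) / t1) / ((n2 + 0.5) / t2)
    df1 = 2 * n1 + 1
    df2 = 2 * n2 + 1
    return f, df1, df2


# Hypothetical counts: 30 events in period 1, 12 events in period 2, 2 time units each.
f_stat, df1, df2 = poisson_rate_f(30, 2.0, 12, 2.0)
print(f_stat, df1, df2)
```

If SciPy is available, the upper-tail probability can be obtained with scipy.stats.f.sf(f_stat, df1, df2) instead of a printed table.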
I hope this is helpful. I have attached the Z-table here for reference and future use. Cheers.
Good morning Ahmad, how are you doing? Are you familiar with the R software? It handles negative binomial fitting for time series better. I would advise you to start reading about R and how you can use it for this. R is free software that uses code (syntax) to fit the negative binomial model. If you run into problems from there, please contact me.