I am fitting random-parameter models of multi-vehicle crashes and found that 96% of my observations have zero crashes. What is the best method for handling excess zeros in the data while at the same time capturing unobserved heterogeneity?
Regarding your first sentence: a 4% crash rate isn't high enough for you! What if 4% of all trains or airplanes crashed? To increase the crash rate: raise speed limits, have drivers consume more alcohol, and have them text while driving.
Regarding your question: "both conditions" - what conditions?
Excess zeros are a typical problem in accident modelling. Some authors have proposed zero-inflated models, but there has been some controversy about their logic. See for example: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.454.2984&rep=rep1&type=pdf
Depending on the environment, traffic conditions and road typology, I don't see any problem in working with data with those features. Perhaps you can try to extend the observation period, but pay attention to time-trend effects and possible spatial correlation. I suggest you use Generalized Estimating Equations (GEE) instead of the classical GLM approach, together with an overdispersion parameter that varies with segment length. GEE will help you avoid mistakes related to the time trend, while an overdispersion parameter that varies with length represents the state of the art for the subsequent empirical Bayes (EB) analysis. I also suggest you avoid very short segments if you have any in your dataset (96% of zero crashes is strictly related to the segmentation approach you used).
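To make the point about a length-dependent overdispersion parameter concrete, here is a minimal sketch of one common EB formulation. It assumes a negative binomial model with variance mu + mu^2/phi and an inverse-dispersion parameter phi proportional to segment length. The function name, the parameter names and the numbers are hypothetical and chosen only for illustration; this is not necessarily the exact formulation meant above.

```python
def eb_estimate(observed, predicted, phi_per_km, length_km):
    """Empirical Bayes estimate of expected crashes on one segment.

    Assumption: NB variance = predicted + predicted**2 / phi,
    with phi = phi_per_km * length_km (inverse dispersion grows with length).
    """
    phi = phi_per_km * length_km
    # Weight given to the model prediction; the rest goes to the observed count.
    weight = 1.0 / (1.0 + predicted / phi)
    estimate = weight * predicted + (1.0 - weight) * observed
    return estimate, weight


# Made-up numbers: 2 observed crashes, 0.5 predicted, on a 0.5 km segment.
estimate, weight = eb_estimate(observed=2, predicted=0.5, phi_per_km=1.0, length_km=0.5)
print(f"weight on prediction = {weight:.2f}, EB estimate = {estimate:.2f}")

# Same prediction on a longer segment: the model prediction gets more weight.
_, weight_long = eb_estimate(observed=2, predicted=0.5, phi_per_km=1.0, length_km=2.0)
print(f"weight on a 2 km segment = {weight_long:.2f}")
```

Under this assumption, very short segments get a small phi, so the EB estimate leans heavily on the noisy observed count, which is one reason to avoid very short segments.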
Obviously, if you are dealing with intersections, what I wrote earlier remains valid apart from the segmentation and the length-dependent overdispersion parameter. If your models do not fit even with a GEE calibration methodology, I suggest you find a surrogate measure of safety for those sites; observed conflicts could be one of them (if there is some correlation with crashes).
If you are doing accident modelling with a Poisson or a negative binomial distribution, it is rather normal to have mostly zero crashes.
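As a quick check of this point, the sketch below computes the share of zeros implied by a plain Poisson and by a negative binomial (NB2, variance = mean + alpha * mean^2) for small mean crash counts. The means and the alpha value are made-up illustration values, not estimates from any dataset.

```python
import math


def poisson_zero_prob(mean):
    """P(Y = 0) for a Poisson count with the given mean."""
    return math.exp(-mean)


def negbin_zero_prob(mean, alpha):
    """P(Y = 0) for an NB2 count with variance mean + alpha * mean**2."""
    return (1.0 + alpha * mean) ** (-1.0 / alpha)


# Mean crash count per observation at which a plain Poisson gives 96% zeros.
mean_for_96 = -math.log(0.96)
print(f"Poisson mean giving 96% zeros: {mean_for_96:.4f}")

# Zero shares for a few small means (alpha = 2.0 is an arbitrary example).
for mean in (0.02, 0.04, 0.10):
    p_pois = poisson_zero_prob(mean)
    p_nb = negbin_zero_prob(mean, alpha=2.0)
    print(f"mean={mean:.2f}  Poisson P(0)={p_pois:.4f}  NB P(0)={p_nb:.4f}")
```

A high share of zeros alone therefore does not prove zero inflation; what matters is whether the observed share exceeds what the fitted Poisson or negative binomial already predicts.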
I once tried a theory that, for level crossings (LX) with very low traffic made up mainly of locals, the crash probability was P(acc) = P(non-compliant driver) × P(acc | non-compliant driver), since for small LX I had both an excess of zeros and a greater variance.
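The two-stage idea above can be sketched as a zero-inflated count. The sketch assumes, purely for illustration, that a site is "at risk" with probability p_noncompliant and that crash counts at at-risk sites are Poisson with mean lam. Both the Poisson assumption and the numbers are hypothetical and not part of the original theory.

```python
import math


def two_stage_moments(p_noncompliant, lam):
    """Zero share, mean and variance of a two-stage (zero-inflated Poisson) count.

    With probability 1 - p_noncompliant the count is a structural zero;
    otherwise it is Poisson(lam).
    """
    p_zero = (1.0 - p_noncompliant) + p_noncompliant * math.exp(-lam)
    mean = p_noncompliant * lam
    variance = mean * (1.0 + lam * (1.0 - p_noncompliant))
    return p_zero, mean, variance


# Made-up values: 5% of exposure is non-compliant, one crash expected when at risk.
p_zero, mean, variance = two_stage_moments(p_noncompliant=0.05, lam=1.0)
poisson_p_zero = math.exp(-mean)  # zero share of a plain Poisson with the same mean

print(f"two-stage P(0) = {p_zero:.4f}, plain Poisson P(0) = {poisson_p_zero:.4f}")
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

With these values the two-stage count has more zeros than a Poisson with the same mean and a variance larger than its mean, which matches the two symptoms described.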
Since you are concerned with the excess zeros, it is better to try zero-inflated negative binomial regression. You can use a random-effects zero-inflated negative binomial model. Please visit the link below. Hopefully you will get an idea.
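For reference, here is a minimal sketch of the probability mass function and log-likelihood of a zero-inflated negative binomial with fixed parameters. It leaves out covariates and random effects, which a real analysis would estimate with a statistics package. The function names and the parameter values are assumptions for illustration only.

```python
import math


def zinb_pmf(y, mu, alpha, pi_zero):
    """P(Y = y) for a zero-inflated negative binomial (NB2 parameterisation).

    mu: NB mean; alpha: NB overdispersion (variance = mu + alpha * mu**2);
    pi_zero: probability of a structural zero.
    """
    r = 1.0 / alpha
    log_nb = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
              + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    nb = math.exp(log_nb)
    if y == 0:
        # Zeros come from the structural-zero state or from the NB itself.
        return pi_zero + (1.0 - pi_zero) * nb
    return (1.0 - pi_zero) * nb


def zinb_loglik(counts, mu, alpha, pi_zero):
    """Log-likelihood of observed counts under fixed ZINB parameters."""
    return sum(math.log(zinb_pmf(y, mu, alpha, pi_zero)) for y in counts)


# Made-up parameters and a tiny made-up sample dominated by zeros.
params = dict(mu=0.5, alpha=1.5, pi_zero=0.8)
total_prob = sum(zinb_pmf(y, **params) for y in range(200))
sample = [0] * 24 + [1]
print(f"P(0) = {zinb_pmf(0, **params):.4f}, pmf sums to {total_prob:.6f}")
print(f"log-likelihood of sample = {zinb_loglik(sample, **params):.4f}")
```

Comparing this log-likelihood with that of a plain negative binomial on the same data is one way to judge whether the extra zero-inflation parameter is worth having.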