Hello,
I am using generalized linear (mixed?) models and ranking a set of candidate models with AICc (AIC corrected for small sample size) to determine how weather influences the migration of birds (Hook-billed Kites). My predictor variables are precipitation and temperature, summarized over various time periods and various regions. My response variable for the number of kites that migrate (magnitude) is the number of kites counted at a single count site in a year divided by the number of hours counted (kites/hour, a density or rate), so it is continuous and strictly positive, with NO zeros or negative numbers. My response variable for timing (phenology) is the Julian day by which 50% of the season's counted kites have been counted. I am using data from two count sites: Mexico, 1995-2019 (25 years), and Belize, 2013-2019 (7 years). I am using a different candidate model set and running a separate analysis for each response variable and for each site.
My research question is: how do precipitation and temperature at certain times of the year (breeding season, month prior to migration, entire year, year prior) influence the migration of kites? Both response variables vary a lot from year to year: in some years 700 kites migrate and in others 8000, and the timing also varies between years.
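For reference, the small-sample correction mentioned above can be sketched in a few lines of Python. This is only an illustration of the standard AICc formula, AICc = AIC + 2k(k+1)/(n-k-1); the function name `aicc`, the log-likelihood of -40 and the parameter count of 3 are made-up placeholders, not values from my models.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Small-sample corrected AIC: AIC + 2k(k+1) / (n - k - 1)."""
    denom = n_obs - n_params - 1
    if denom <= 0:
        # The correction is undefined when there are too few observations.
        raise ValueError("need n_obs > n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / denom


# Hypothetical model with 3 estimated parameters and log-likelihood -40,
# evaluated at the two sample sizes in this study (25 and 7 years).
aicc_mexico = aicc(-40.0, 3, 25)
aicc_belize = aicc(-40.0, 3, 7)
print(aicc_mexico, aicc_belize)
```

With only 7 years, the correction term grows quickly as parameters are added, which limits how many predictors each Belize candidate model can reasonably hold.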
My questions:
1) If I use a GLMM, my random effect would be year, since I am really only interested in how weather influences the migration and the years that were counted were "random." Is this the right approach? If I used a GLM instead, with year as a fixed effect, would this dredge my modeling?
2) I am considering using a Poisson or negative binomial (NB) distribution family since the response is a density and continuous. It seems like NB is more for count data that is discrete. Is this accurate, and what is the best way to test which family to use for a GL(M)M?
3) Now the issue is overdispersion of the data. Is the best way to test for this to run the GLM and divide the residual deviance by the residual degrees of freedom, with a ratio greater than 1 indicating overdispersion? Or is there a better way to determine this?
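To make the check in question 3 concrete, here is a minimal sketch in plain Python. It assumes a Poisson model that has already been fitted, so the fitted means are known. The function name `poisson_dispersion` and all the numbers are hypothetical placeholders, not my data. It computes the residual deviance divided by the residual df, and also the Pearson chi-square divided by the residual df, which is another commonly used version of the same ratio.

```python
import math


def poisson_dispersion(observed, fitted, n_params):
    """Return (deviance / residual df, Pearson chi-square / residual df)."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    resid_df = len(observed) - n_params
    if resid_df <= 0:
        raise ValueError("need more observations than parameters")
    deviance = 0.0
    pearson = 0.0
    for y, mu in zip(observed, fitted):
        # Poisson unit deviance; the y*log(y/mu) term is 0 when y == 0.
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        deviance += 2.0 * (log_term - (y - mu))
        pearson += (y - mu) ** 2 / mu
    return deviance / resid_df, pearson / resid_df


# Made-up counts and fitted means from a hypothetical 2-parameter model.
observed = [12, 30, 7, 55, 21, 40]
fitted = [15.0, 25.0, 10.0, 45.0, 24.0, 46.0]
dev_ratio, pearson_ratio = poisson_dispersion(observed, fitted, n_params=2)
print(dev_ratio, pearson_ratio)
```

As I understand it, ratios well above 1 are usually read as a sign of overdispersion relative to the Poisson assumption, but I do not know how far above 1 is too far with so few years of data.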