Hi, I am working with open-source data and trying to run a randomForest analysis on predictors of bacterial biomass. I have latitude and longitude for both the biomass data and the environmental data at the plot level (both are .csv files), but the two datasets don't share exactly the same coordinates (only some plot locations match).

If I want to upscale the data and look at site-level predictors, what is the best way to format my datasheet so that overall site measurements of biomass are paired with site environmental variables (like pH and TN)? I know one option is to match averages (one row per site, with the average biomass, pH, and TN as columns), but is there a way to keep the raw data in the analysis? Is there a more fitting approach than averaging these variables when modeling predictors?

I've tried to find examples, but most of the tutorials and papers I see get their environmental data from raster layers (.tiff files) with climatic variables, whereas my data are in spreadsheets, so I'm having trouble figuring out how to adapt them. Thank you so much for your time and help!
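
To make the averaging option mentioned above concrete, here is a minimal sketch of the layout I have in mind. It is written in Python with only the standard library for illustration (the same reshaping could be done in R before calling randomForest). The `site` column, the other column names, and all values are hypothetical placeholders, not my real data, and the sketch assumes each plot row already carries a site label.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical plot-level data; column names and values are made up for illustration.
BIOMASS_CSV = """site,lat,lon,biomass
A,45.10,-93.20,12.0
A,45.11,-93.21,14.0
B,40.50,-105.00,8.0
"""

ENV_CSV = """site,lat,lon,pH,TN
A,45.10,-93.20,6.0,0.20
A,45.12,-93.22,6.4,0.30
B,40.50,-105.00,7.5,0.10
"""

def site_means(csv_text, value_cols):
    """Average each value column over all plots that share a site label."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in value_cols:
            grouped[row["site"]][col].append(float(row[col]))
    return {
        site: {col: mean(vals) for col, vals in cols.items()}
        for site, cols in grouped.items()
    }

biomass = site_means(BIOMASS_CSV, ["biomass"])
env = site_means(ENV_CSV, ["pH", "TN"])

# One row per site, keeping only sites that appear in both tables.
site_table = {
    site: {**biomass[site], **env[site]}
    for site in biomass
    if site in env
}

for site, row in sorted(site_table.items()):
    print(site, row)
```

Because the plots are matched by site label rather than by coordinates, the mismatched plot locations do not matter here, but each site collapses to a single row, which is exactly the loss of raw data I am asking about.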
