For a forthcoming study on alpine taxa, I will use locality data (e.g. extracted from GBIF) combined with climate data on a ~1 km² grid (WorldClim). The goal is to determine whether a couple of closely related species occupy different climatic niches.
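
For context, here is a minimal sketch of how I intend to associate each record with the climate grid. This assumes a WorldClim GeoTIFF layer read with rasterio; the file name and the coordinates are placeholders, not real data.

```python
import rasterio

# Illustrative extraction of one WorldClim layer (e.g. annual mean temperature)
# at occurrence coordinates; file name and points below are placeholders.
points = [(10.98, 46.77), (7.66, 45.98)]  # (longitude, latitude) pairs, WGS84

with rasterio.open("wc2.1_30s_bio_1.tif") as bio1:
    # sample() yields one array of band values per point; take band 1
    values = [v[0] for v in bio1.sample(points)]

print(values)
```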

In the process of cleaning the GBIF data (removing records lacking geographical precision), I realized that only a small portion of them (at most 20%) would be suitable (precise to within 1 km²). I still have 20,000 records to go through and wonder whether it is worth the trouble to check them one by one. To my knowledge, no automatic filter is precise enough for this; the kind of basic filter I have in mind is sketched below.
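
As an illustration of what I mean by an automatic filter (a sketch only, assuming a GBIF occurrence download in the tab-separated "simple" format with the standard Darwin Core column coordinateUncertaintyInMeters; the file name is a placeholder):

```python
import pandas as pd

# Illustrative precision filter on a GBIF occurrence download.
occ = pd.read_csv("gbif_occurrences.tsv", sep="\t", low_memory=False)

# Keep only records whose stated coordinate uncertainty is <= 1000 m
# (roughly matching the ~1 km2 WorldClim grid). Records with no stated
# uncertainty are dropped, since their precision is unknown.
precise = occ[occ["coordinateUncertaintyInMeters"].notna()
              & (occ["coordinateUncertaintyInMeters"] <= 1000)]

print(f"{len(precise)} of {len(occ)} records pass the 1 km filter")
```

The problem is that many records either lack this field or report an uncertainty far larger than 1 km, which is how I arrive at the ~20% figure.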

Note that if the geolocation of an alpine record is off by only a few km, the climatic conditions at the "real" locality and at the locality fed into the analysis (the GBIF coordinates) may differ substantially because of the highly variable topography of most mountain systems, thus introducing errors. For example, a horizontal error of a couple of km that crosses a few hundred metres of elevation can easily translate into 2–3 °C of difference in mean temperature at a typical lapse rate of ~6.5 °C per 1000 m. With at most 20% of the data precise enough, I am questioning the validity of the automatic filtering approach.

Thus, would you advise living with the errors (possibly affecting up to 80% of the data), or verifying the locality data one by one (a tremendous amount of work)?
