
Statistical methods for modelling the spatial distribution of plant species from large masses of uncertain occurrences from citizen science programs

Abstract : Human botanical expertise is becoming too scarce to provide the field data needed to monitor plant biodiversity. Geolocated botanical observations from major citizen science projects, such as Pl@ntNet, open promising avenues for monitoring plant species distributions over time. Pl@ntNet provides automatically identified plant observations, each with a confidence score, and can thus supply data for species distribution models (SDMs). These models make it possible to monitor the distribution of invasive or rare plants, as well as the effects of global change on species, provided that we can (i) take identification uncertainty into account, (ii) correct for spatial sampling bias, and (iii) predict species abundances accurately at a fine spatial grain.

First, we ask whether realistic distributions of invasive plant species can be estimated from automatically identified Pl@ntNet occurrences, and what effect filtering by a confidence score threshold has. Raising the confidence threshold improves predictions until sample size becomes limiting. The predicted distributions are generally consistent with expert data, but they also indicate urban areas of abundance due to ornamental cultivation, as well as new areas of presence.

Next, we study the correction of spatial sampling bias in presence-only SDMs. We first analyze mathematically the bias that arises when the occurrences of a target group of species (Target Group Background, TGB) are used as background points, and compare it with the bias of a spatially uniform selection of background points. We then show that the TGB bias is due to variation in the cumulative abundance of the target group species across environmental space, which is difficult to control. Alternatively, the global observation effort can be modelled jointly with the abundances of several species. We model the observation effort as a spatial step function defined on a mesh of geographical cells. Adding massively observed species to the model then reduces the variance of the estimated observation effort, and thus the variance of the models of the other species.

Finally, we propose a new type of SDM based on convolutional neural networks that use environmental images as input variables. These models can capture complex spatial patterns of several environmental variables. We propose to share the architecture of the neural network between several species in order to extract common high-level predictors and regularize the model. Our results show that this model outperforms existing SDMs and that performance improves when many species are predicted simultaneously; both findings are confirmed by two cooperative SDM evaluation campaigns conducted on independent data sets. This supports the hypothesis that there are common environmental models describing the distribution of many species.

Our results support the use of Pl@ntNet occurrences for monitoring plant invasions. Joint modelling of multiple species and observation effort is a promising strategy that transforms the bias problem into a more controllable problem of estimation variance. However, with occurrence data alone, the effect of certain factors, such as the level of anthropization, on species abundance is difficult to separate from their effect on observation effort. This can be resolved by collecting additional data under a standardized protocol. The deep learning methods developed show good performance and could be used to deploy spatial species prediction services.
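The joint model of observation effort and species abundance mentioned in the abstract can be sketched as follows. This is only an illustrative formulation under stated assumptions, not necessarily the exact model of the thesis: the symbols $s$, $s_c$, $D_c$, $\lambda_i$, $\beta_i$ and $z$ are notation introduced here, and the log-linear form of $\lambda_i$ is an assumption.

```latex
% Assumed setting: occurrences of species i form an inhomogeneous Poisson
% point process whose observed intensity is the species intensity thinned
% by a spatial observation effort shared by all species.
\[
  \tilde{\lambda}_i(x) \;=\; s(x)\,\lambda_i(x), \qquad i = 1, \dots, N .
\]
% Observation effort as a step function on a mesh of C geographical cells
% D_1, ..., D_C (one non-negative level s_c per cell).
\[
  s(x) \;=\; \sum_{c=1}^{C} s_c \,\mathbf{1}\{x \in D_c\}, \qquad s_c \ge 0 .
\]
% Assumed log-linear species intensity in the environmental variables z(x).
\[
  \lambda_i(x) \;=\; \exp\!\bigl(\beta_i^{\top} z(x)\bigr).
\]
```

Because the cell levels $s_c$ are shared by all $N$ species, every species observed in a cell contributes information about that cell's effort, which is consistent with the abstract's statement that adding massively observed species reduces the variance of the effort estimate.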

Cited literature [501 references]
Contributor : Abes Star
Submitted on : Wednesday, September 23, 2020 - 10:08:11 AM
Last modification on : Monday, May 17, 2021 - 6:14:06 PM
Long-term archiving on : Thursday, December 3, 2020 - 3:51:10 PM


Version validated by the jury (STAR)


  • HAL Id : tel-02519161, version 3


Christophe Botella. Statistical methods for modelling the spatial distribution of plant species from large masses of uncertain occurrences from citizen science programs. Statistics [math.ST]. Université Montpellier, 2019. English. ⟨NNT : 2019MONTS135⟩. ⟨tel-02519161v3⟩


