
Statistical Learning Methodology to Leverage the Diversity of Environmental Scenarios in Crop Data : Application to the prediction of crop production at large-scale

Abstract : Crop yield prediction is a major issue in agriculture. Considerable research has been devoted to it, relying on various methodologies, which can generally be classified into model-driven and data-driven approaches.

Model-driven approaches are based on mechanistic crop modelling. They describe crop growth in interaction with the environment as a dynamical system. Since these models are based on a mechanistic description of biophysical processes, they potentially involve a large number of state variables and parameters, whose estimation is not straightforward. In particular, the resulting parameter estimation problems are typically non-linear, leading to non-convex optimisation problems in a multi-dimensional space. Moreover, data acquisition is very challenging and requires heavy, dedicated experimental work to obtain appropriate data for model identification.

Data-driven approaches for yield prediction, on the other hand, require data from a large number of environmental scenarios, but these data are quite easy to obtain: climatic data and final yield. However, the scope of this type of model is mostly limited to prediction.

An original contribution of this thesis is a statistical methodology for the parameterisation of potentially complex mechanistic models when datasets covering different environmental scenarios and large-scale production records are available, named the Multi-scenario Parameter Estimation Methodology (MuScPE).

The main steps are the following. First, we take advantage of prior knowledge on the parameters to assign them relevant prior distributions, and perform a global sensitivity analysis of the model parameters to screen the most important ones, which are estimated in priority. Then, we implement an efficient non-convex optimisation method, parallel particle swarm optimisation, to search for the MAP (maximum a posteriori) estimator of the parameters. Finally, we choose the best configuration regarding the number of estimated parameters using model selection criteria: in theory, estimating more parameters lets the calibrated model explain more of the variance of the output, but it also makes the optimisation more difficult, which adds uncertainty to the calibration.

This methodology is first tested with the CORNFLO model, a functional crop model for corn.

A second contribution of the thesis is the comparison of this model-driven method with classical data-driven methods. For this purpose, we consider two classes of regression methods, distinguished by how they handle model complexity: first, statistical methods derived from generalized linear regression, which simplify the model through dimension reduction, such as Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression; second, machine learning regression methods based on re-sampling techniques, such as Random Forest, k-Nearest Neighbour, Artificial Neural Network and Support Vector Machine (SVM) regression.

Finally, a weighted regression is applied to large-scale yield prediction, taking soft wheat production in France as an example. Model-driven and data-driven approaches are also compared on this task, which constitutes the third contribution of this thesis.
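As an illustration of the second MuScPE step, the following is a minimal sketch of a MAP search by particle swarm optimisation, pooling observations from several environmental scenarios. It is not the thesis implementation: the two-parameter `toy_model`, the Gaussian priors and noise level, the bounds and the swarm coefficients are all assumptions chosen for the example, and the swarm is evaluated serially rather than in parallel. The sensitivity analysis and model selection steps are not covered.

```python
import numpy as np

def toy_model(theta, env):
    """Hypothetical stand-in for a crop model: saturating response to an
    environmental driver. theta = (a, b) with b > 0."""
    a, b = theta
    return a * env / (b + env)

def neg_log_posterior(theta, env, yields, prior_mean, prior_sd, noise_sd):
    """Negative log-posterior up to a constant: Gaussian likelihood pooled
    over all scenarios plus independent Gaussian priors (assumed here)."""
    resid = yields - toy_model(theta, env)
    neg_log_lik = 0.5 * np.sum((resid / noise_sd) ** 2)
    neg_log_prior = 0.5 * np.sum(((theta - prior_mean) / prior_sd) ** 2)
    return neg_log_lik + neg_log_prior

def pso_minimise(objective, lower, upper, n_particles=30, n_iter=200, seed=0):
    """Basic global-best particle swarm within box bounds (serial sketch)."""
    rng = np.random.default_rng(seed)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                      # per-particle best positions
    best_val = np.array([objective(p) for p in pos])
    g = best_pos[np.argmin(best_val)].copy()   # swarm-wide best position
    w, c1, c2 = 0.7, 1.5, 1.5                  # inertia and attraction weights
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lower, upper)
        # This loop is the part a parallel variant would distribute.
        val = np.array([objective(p) for p in pos])
        improved = val < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = val[improved]
        g = best_pos[np.argmin(best_val)].copy()
    return g, best_val.min()

# Synthetic multi-scenario data (fabricated for the example only).
data_rng = np.random.default_rng(42)
true_theta = np.array([10.0, 2.0])
env = np.linspace(0.5, 8.0, 40)                # one driver value per scenario
noise_sd = 0.2
yields = toy_model(true_theta, env) + data_rng.normal(0.0, noise_sd, env.size)

prior_mean = np.array([8.0, 3.0])
prior_sd = np.array([5.0, 5.0])
lower = np.array([0.0, 0.1])
upper = np.array([20.0, 10.0])

def objective(theta):
    return neg_log_posterior(theta, env, yields, prior_mean, prior_sd, noise_sd)

theta_map, value = pso_minimise(objective, lower, upper)
print("MAP estimate:", theta_map, "objective:", value)
```

With a real mechanistic model, `toy_model` would be replaced by a simulation run per scenario, which is where evaluating the particles in parallel becomes worthwhile.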
Contributor : Abes Star
Submitted on : Tuesday, March 9, 2021 - 3:58:11 PM
Last modification on : Monday, June 21, 2021 - 3:28:34 AM


Version validated by the jury (STAR)


  • HAL Id : tel-03164008, version 1


Xiangtuo Chen. Statistical Learning Methodology to Leverage the Diversity of Environmental Scenarios in Crop Data : Application to the prediction of crop production at large-scale. Statistics [math.ST]. Université Paris Saclay (COmUE), 2019. English. ⟨NNT : 2019SACLC055⟩. ⟨tel-03164008⟩


