Comparison of variable selection procedures to model weather-pathogen relation in crops


Bibliographic Details
Main Authors: Suarez, Franco Marcelo, Bruno, Cecilia, Giménez Pecci, María de la Paz, Balzarini, Mónica
Format: Online
Language: Spanish (spa)
Published: Facultad de Ciencias Agropecuarias 2024
Online Access: https://revistas.unc.edu.ar/index.php/agris/article/view/40871
Description
Summary: Large volumes of georeferenced climatic data are now easily accessible. These data can be used to model the relationship between climatic conditions and disease using multiple meteorological variables, which are usually correlated and redundant. Variable selection identifies a subset of relevant regressors for building predictive models. Stepwise, Boruta, and LASSO are variable selection procedures of different natures, and their relative performance has scarcely been explored. The objective of this work was to compare these methods when applied simultaneously to build regression models that predict disease risk from climatic data. Three georeferenced databases with presence/absence records of different pathogens in maize crops in Argentina were used. For each scenario, climatic variables were obtained for the period from before sowing until harvest. All three variable selection methods yielded models with accuracy close to 70%. However, LASSO produced the best predictive model, selecting an intermediate number of variables relative to Stepwise (fewer) and Boruta (more). The results could be extended to other pathosystems and could inform the design of alert systems based on climatic variables.
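The record contains no code. As a rough illustration of how LASSO performs variable selection on a presence/absence response, here is a minimal Python sketch, assuming a binary disease outcome and a handful of hypothetical climatic predictors. The variable names (temp_mean, temp_max, rel_humidity, rainfall, wind_speed), the synthetic data, the penalty value lam, and the proximal-gradient fitting routine are all illustrative assumptions, not the authors' data or implementation. Predictors whose coefficients are shrunk exactly to zero are treated as not selected.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrinks z toward zero by t (exact zeros drive selection)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def standardize(X):
    """Center each column and scale it to unit variance (the L1 penalty is scale-sensitive)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        sds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lasso_logistic(X, y, lam, step=0.1, iters=1000):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|beta_j|); the intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # residuals: predicted probability of presence minus observed 0/1 label
        r = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, row))) - yi
             for row, yi in zip(X, y)]
        grad0 = sum(r) / n
        grad = [sum(ri * row[j] for ri, row in zip(r, X)) / n for j in range(p)]
        b0 -= step * grad0
        # gradient step followed by soft-thresholding (the L1 proximal step)
        beta = [soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(beta, grad)]
    return b0, beta


# Hypothetical synthetic data: only humidity and rainfall drive presence;
# temp_max is a redundant near-copy of temp_mean; wind_speed is pure noise.
rng = random.Random(42)
names = ["temp_mean", "temp_max", "rel_humidity", "rainfall", "wind_speed"]
X_raw, y = [], []
for _ in range(120):
    t = rng.gauss(22, 3)
    tmax = t + rng.gauss(6, 1)
    rh = rng.gauss(70, 10)
    rain = rng.gauss(100, 30)
    wind = rng.gauss(10, 3)
    logit = 0.15 * (rh - 70) + 0.02 * (rain - 100)
    y.append(1 if rng.random() < sigmoid(logit) else 0)
    X_raw.append([t, tmax, rh, rain, wind])

X = standardize(X_raw)
b0, beta = lasso_logistic(X, y, lam=0.05)
selected = [name for name, b in zip(names, beta) if b != 0.0]

# In-sample accuracy only; a real comparison would use held-out data.
preds = [1 if sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, row))) >= 0.5 else 0
         for row in X]
acc = sum(pred == yi for pred, yi in zip(preds, y)) / len(y)
print("selected:", selected, "accuracy:", round(acc, 3))
```

Larger values of lam zero out more coefficients, so in practice lam would be tuned (for example by cross-validation) and accuracy measured on held-out records. Stepwise and Boruta select variables by different criteria: Stepwise adds or removes terms according to a fit criterion, and Boruta compares random-forest importances of the real predictors against shuffled "shadow" copies.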