Cluster validation to depict population genetic structure

Since the beginnings of statistics, identifying the underlying number of groups in a population has been a research question, motivated by the need to give geneticists answers about the structure formed by similarities between individuals of one or more populations. Numerous indices have b...


Bibliographic Details
Main Authors: Videla, María Eugenia, Bruno, Cecilia
Format: Online
Language: spa
Published: Facultad de Ciencias Agropecuarias 2022
Subjects:
Online Access: https://revistas.unc.edu.ar/index.php/agris/article/view/34015
_version_ 1811172646450102272
author Videla, María Eugenia
Bruno, Cecilia
author_facet Videla, María Eugenia
Bruno, Cecilia
author_sort Videla, María Eugenia
collection Portal de Revistas
description Since the beginnings of statistics, identifying the underlying number of groups in a population has been a research question, motivated by the need to give geneticists answers about the structure formed by similarities between individuals of one or more populations. Numerous indices have been proposed to obtain the optimal number of groups that make up the population genetic structure (PGS). However, there is no consensus on which perform best. To determine the optimal number of groups constituting the PGS, a simulation study was conducted on nine PGS scenarios, combining three numbers of subpopulations (k = 2, 5, and 10) with three levels of genetic differentiation and recreating various maize genomes, to evaluate four internal validation indices: CH, Connectivity, Dunn, and Silhouette. The Dunn and Silhouette indices performed best at identifying the true number of underlying groups, while Connectivity performed worst. This study offers a robust alternative for revealing the existing PGS, thereby facilitating population studies and breeding strategies in maize programs. Moreover, the present findings may have implications for other crop species.
format Online
id oai:ojs.revistas.unc.edu.ar:article-34015
institution Universidad Nacional de Cordoba
language spa
publishDate 2022
publisher Facultad de Ciencias Agropecuarias
record_format ojs
spelling oai:ojs.revistas.unc.edu.ar:article-34015 2023-04-27T17:38:16Z
Cluster validation to depict population genetic structure
Validación de agrupamientos para representar estructura genética poblacional
Videla, María Eugenia
Bruno, Cecilia
genetic data
cluster analysis
index selection
exploratory data analysis
datos genéticos
análisis de conglomerados
índices de selección
análisis exploratorios de datos
Since the beginnings of statistics, identifying the underlying number of groups in a population has been a research question, motivated by the need to give geneticists answers about the structure formed by similarities between individuals of one or more populations. Numerous indices have been proposed to obtain the optimal number of groups that make up the population genetic structure (PGS). However, there is no consensus on which perform best. To determine the optimal number of groups constituting the PGS, a simulation study was conducted on nine PGS scenarios, combining three numbers of subpopulations (k = 2, 5, and 10) with three levels of genetic differentiation and recreating various maize genomes, to evaluate four internal validation indices: CH, Connectivity, Dunn, and Silhouette. The Dunn and Silhouette indices performed best at identifying the true number of underlying groups, while Connectivity performed worst. This study offers a robust alternative for revealing the existing PGS, thereby facilitating population studies and breeding strategies in maize programs. Moreover, the present findings may have implications for other crop species.
Facultad de Ciencias Agropecuarias
2022-06-30
info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
application/pdf
application/pdf
https://revistas.unc.edu.ar/index.php/agris/article/view/34015
10.31047/1668.298x.v39.n1.34015
AgriScientia; Vol. 39 No. 1 (2022); 59-69
1668-298X
10.31047/1668.298x.v39.n1
spa
https://revistas.unc.edu.ar/index.php/agris/article/view/34015/38047
https://revistas.unc.edu.ar/index.php/agris/article/view/34015/41122
Copyright 2022 María Eugenia Videla, Cecilia Bruno
https://creativecommons.org/licenses/by-sa/4.0
spellingShingle genetic data
cluster analysis
index selection
exploratory data analysis
datos genéticos
análisis de conglomerados
índices de selección
análisis exploratorios de datos
Videla, María Eugenia
Bruno, Cecilia
Cluster validation to depict population genetic structure
title Cluster validation to depict population genetic structure
title_alt Validación de agrupamientos para representar estructura genética poblacional
title_full Cluster validation to depict population genetic structure
title_fullStr Cluster validation to depict population genetic structure
title_full_unstemmed Cluster validation to depict population genetic structure
title_short Cluster validation to depict population genetic structure
title_sort cluster validation to depict population genetic structure
topic genetic data
cluster analysis
index selection
exploratory data analysis
datos genéticos
análisis de conglomerados
índices de selección
análisis exploratorios de datos
topic_facet genetic data
cluster analysis
index selection
exploratory data analysis
datos genéticos
análisis de conglomerados
índices de selección
análisis exploratorios de datos
url https://revistas.unc.edu.ar/index.php/agris/article/view/34015
work_keys_str_mv AT videlamariaeugenia clustervalidationtodepictpopulationgeneticstructure
AT brunocecilia clustervalidationtodepictpopulationgeneticstructure
AT videlamariaeugenia validaciondeagrupamientospararepresentarestructurageneticapoblacional
AT brunocecilia validaciondeagrupamientospararepresentarestructurageneticapoblacional