Title
NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses
Date Issued
01 January 2022
Access level
open access
Resource Type
journal article
Author(s)
Leal T.P.
Furlan V.C.
Gouveia M.H.
Saraiva Duarte J.M.
Fonseca P.A.
Tou R.
Scliar M.d.O.
Araujo G.S.d.
Costa L.F.
Zolini C.
Peixoto M.G.C.D.
Carvalho M.R.S.
Lima-Costa M.F.
Rodrigues M.R.
Universidad Federal de Minas Gerais
Publisher(s)
Elsevier B.V.
Abstract
Genetic and omics analyses frequently require independent observations, but independence is not guaranteed in real datasets. When relatedness cannot be accounted for, solutions involve removing related individuals (or observations) and, consequently, reducing the available data. We developed a network-based relatedness-pruning method that minimizes dataset reduction while removing unwanted relationships from a dataset. It uses the node degree centrality metric to identify highly connected nodes (or individuals) and implements heuristics that approximate the minimal reduction of a dataset, which allows its application to complex datasets. When compared with two other popular population genetics methodologies (PLINK and KING), NAToRA shows the best combination of removing all relatives while keeping the largest possible number of individuals in all datasets tested, with effects on the allele frequency spectrum and Principal Component Analysis similar to those of PLINK and KING. NAToRA is freely available both as a standalone tool that can be easily incorporated into a pipeline and as a graphical web tool that allows visualization of the relatedness networks. NAToRA also accepts a variety of relationship metrics as input, which facilitates its use. We also release the genealogy simulator software used for the tests performed in this study.
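The idea of pruning by node degree can be illustrated with a minimal sketch. It is not NAToRA's implementation: it assumes a simple greedy heuristic (repeatedly drop the individual with the most remaining relatives), and the function name `prune_related`, the sample identifiers, the coefficients, and the 0.1 cutoff are all hypothetical.

```python
from collections import defaultdict


def prune_related(pairs, threshold):
    """Return the set of individuals to remove so that no related pair remains.

    Greedy sketch (an assumption, not NAToRA's actual heuristics):
    repeatedly remove the node with the highest degree.
    """
    # Build an undirected graph with one edge per pair at or above the cutoff.
    neighbours = defaultdict(set)
    for a, b, coefficient in pairs:
        if a != b and coefficient >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    removed = set()
    while neighbours:
        # Highest-degree node; ties go to the larger identifier so the
        # result is deterministic.
        worst = max(neighbours, key=lambda n: (len(neighbours[n]), n))
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)
            if not neighbours[other]:
                # No relatives left for this individual, so it is kept.
                del neighbours[other]
        removed.add(worst)
    return removed


# Hypothetical relatedness table: (individual, individual, coefficient).
example_pairs = [
    ("A", "B", 0.25),
    ("A", "C", 0.25),
    ("A", "D", 0.5),
    ("E", "F", 0.125),
    ("B", "C", 0.02),  # below the cutoff, so treated as unrelated
]
removed = prune_related(example_pairs, threshold=0.1)
print(sorted(removed))
```

In this toy network, removing the hub individual A clears three relationships at once, whereas removing B, C and D instead would cost three individuals. A greedy choice of this kind only approximates the smallest possible removal set and is not guaranteed to be optimal on more complex networks.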
Start page
1821
End page
1828
Volume
20
Language
English
OECD Knowledge area
Genetics, Heredity; Industrial biotechnology
Scopus EID
2-s2.0-85128448851
Source
Computational and Structural Biotechnology Journal
ISSN of the container
2001-0370
Source funding
National Institute of Neurological Disorders and Stroke
Sponsor(s)
Minas Gerais State Research Agency; National Institute of Neurological Disorders and Stroke; National Research Council; Coordination for the Improvement of Higher Education Personnel; National Council for Scientific and Technological Development; Minas Gerais State Research Support Foundation; Ministry of Health
Sources of information: Directorio de Producción Científica; Scopus