Title
A bioinformatic platform to integrate target capture and whole genome sequences of various read depths for phylogenomics
Date Issued
01 December 2021
Access level
open access
Resource Type
journal article
Author(s)
G. Ribeiro P.
Torres Jiménez M.F.
Andermann T.
Antonelli A.
Bacon C.D.
Institute of Entomology
Publisher(s)
John Wiley and Sons Inc
Abstract
The increasing availability of short-read whole genome sequencing (WGS) provides unprecedented opportunities to study ecological and evolutionary processes. Although loci of interest can be extracted from WGS data and combined with target sequence data, this requires suitable bioinformatic workflows. Here, we test different assembly and locus extraction strategies and implement them in secapr, a pipeline that processes short-read data into multilocus alignments for phylogenetics and molecular ecology analyses. We integrate the processing of data from low-coverage WGS (<30×) and target sequence capture into a flexible framework, while optimizing de novo contig assembly and locus extraction. Specifically, we test different assembly strategies by contrasting their ability to recover loci from targeted butterfly protein-coding genes, using four data sets: a WGS data set across different average coverages (10×, 5× and 2×) and a data set for which these loci were enriched prior to sequencing via target sequence capture. Using the resulting de novo contigs, we account for potential errors within contigs and infer phylogenetic trees to evaluate the ability of each assembly strategy to recover species relationships. We demonstrate that choosing multiple k-mer sizes simultaneously for assembly results in the highest yield of extracted loci from de novo assembled contigs, while data sets derived from sequencing read depths as low as 5× recover the expected species relationships in phylogenetic trees. By making the tested assembly approaches available in the secapr pipeline, we hope to inspire future studies to incorporate complementary data and make an informed choice on the optimal assembly strategy.
Start page
6021
End page
6035
Volume
30
Issue
23
Language
English
OECD Knowledge area
Bioinformatics; Ecology; Genetics, Heredity
Scopus EID
2-s2.0-85118255763
PubMed ID
Source
Molecular Ecology
ISSN of the container
09621083
Sponsor(s)
We would like to thank editors Bridget O’Boyle, Benjamin Sibbet and Sangeet Lamichhaney, and the three anonymous reviewers, for their constructive suggestions that helped improve this manuscript. We are grateful to Rayner Núñez (Instituto de Ecología y Sistemática, Cuba), Yves Basset (Smithsonian Tropical Research Institute, STRI, Panama) and André Freitas (UNICAMP, Brazil) for providing samples that were used for target enrichment sequencing. We thank Leonardo Ré Jorge for thorough help with statistical analyses. We thank the Servicio Nacional Forestal y de Fauna Silvestre (SERFOR), Peru, for assistance in obtaining research and export permits (Permit no. 223‐2017‐SERFOR/DGGSPFFS). We acknowledge the computational resources for this study provided by the CESNET LM2015042 and the CERIT Scientific Cloud LM2015085, under the programme “Projects of Large Research, Development, and Innovations Infrastructures”. The Swedish Research Council (2017‐04980) provided funding to M.F.T.J. and C.D.B. T.A. received funding from the Swedish Research Council (2019‐04739), and funding to A.A. was provided by the Swedish Research Council, the Swedish Foundation for Strategic Research, and the Royal Botanic Gardens, Kew. The Grant Agency of the Czech Republic (GAČR grant: GJ20‐18566Y) and the Marie Skłodowska‐Curie Fellowship of the European Commission (MARIPOSAS‐704035) provided funding to P.M.M.
Sources of information: Directorio de Producción Científica Scopus