Title
FindMySequence: a neural-network-based approach for identification of unknown proteins in X-ray crystallography and cryo-EM
Date Issued
01 January 2022
Access level
open access
Resource Type
journal article
Author(s)
Chojnowski G.
Simpkin A.J.
Seifert-Davila W.
Keegan R.M.
Rigden D.J.
Publisher(s)
International Union of Crystallography
Abstract
Although experimental protein-structure determination usually targets known proteins, chains of unknown sequence are often encountered. They can be purified from natural sources, appear as an unexpected fragment of a well characterized protein or appear as a contaminant. Regardless of the source of the problem, the unknown protein always requires characterization. Here, an automated pipeline is presented for the identification of protein sequences from cryo-EM reconstructions and crystallographic data. The method's application to characterize the crystal structure of an unknown protein purified from a snake venom is presented. It is also shown that the approach can be successfully applied to the identification of protein sequences and validation of sequence assignments in cryo-EM protein structures.
Start page
86
End page
97
Volume
9
Language
English
OECD Knowledge area
Bioinformatics
Biochemistry, Molecular biology
Publication version
Version of Record
Scopus EID
2-s2.0-85122761369
Source
IUCrJ
ISSN of the container
2052-2525
Sponsor(s)
This work was partially supported by the Biotechnology and Biological Sciences Research Council (BB/S007105/1).
Sources of information:
Directorio de Producción Científica
Scopus