Title
Mining of sequential patterns applied to the prediction of protein folding
Other title
Minería de Patrones Secuenciales aplicada a la Predicción del Plegamiento de Proteínas
Date Issued
01 January 2019
Access level
open access
Resource Type
conference paper
Author(s)
Publisher(s)
Latin American and Caribbean Consortium of Engineering Institutions
Abstract
Sequence mining consists of finding statistically relevant patterns in collections of data represented sequentially. Sequences are an important type of data in which the order of the elements matters, and they have a wide range of applications in bioinformatics and computational biology. Protein structure prediction is one of these applications: a protein is essentially a sequence of amino acids that forms patterns known as alpha helices, beta sheets, and turns. For the purposes of this investigation, these secondary structures are treated as the itemsets, while the amino acids that make up the entire sequence are the items. Despite multiple attempts to predict protein folding, the algorithms developed to date reach an effectiveness of only 35%. We therefore propose SPMCcm, an algorithm based on the prediction of frequent sequences and a scheme of classifiers, which uses the information provided by the amino acid sequence in two stages. The first stage learns the interactions between the secondary structures of the proteins, which it extracts as frequent sequences, or itemsets. The second stage learns the interactions between the amino acids (the items) present in the interacting structures. The experimental evaluation showed that SPMCcm behaves similarly regardless of the base classifier used, reaching prediction accuracies of up to 48%, higher than the 35% reported in the literature, without requiring large computational resources and while retaining explanatory capacity.
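As an illustration of the general idea of frequent sequence mining mentioned in the abstract, the following minimal sketch counts contiguous patterns of secondary-structure labels and keeps those that appear in at least a given number of sequences. It is not the SPMCcm algorithm: the function name `frequent_subsequences`, the single-letter labels (H for alpha helix, E for beta sheet, T for turn), and the toy data are all hypothetical, and real sequential pattern miners typically also handle non-contiguous patterns.

```python
from collections import Counter


def frequent_subsequences(sequences, length, min_support):
    """Return contiguous patterns of `length` found in at least `min_support` sequences.

    Hypothetical helper for illustration only; not part of SPMCcm.
    """
    support = Counter()
    for seq in sequences:
        # Collect each pattern once per sequence, so that support equals
        # the number of sequences containing the pattern.
        seen = {tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)}
        support.update(seen)
    return {pattern: n for pattern, n in support.items() if n >= min_support}


# Hypothetical toy data: H = alpha helix, E = beta sheet, T = turn.
toy = [
    ["H", "T", "E", "T", "H"],
    ["E", "T", "H", "T", "E"],
    ["H", "T", "E", "E", "T"],
]
patterns = frequent_subsequences(toy, length=2, min_support=3)
print(patterns)
```

With this toy input, only the label pairs that occur in all three sequences are retained; lowering `min_support` admits rarer patterns.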
Volume
2019-July
Language
Spanish
OCDE Knowledge area
Information sciences; Medical sciences, Health sciences; Health-related biotechnology
Scopus EID
2-s2.0-85073627233
ISSN of the container
24146390
ISBN of the container
978-099934436-1
Conference
Proceedings of the LACCEI International Multi-Conference for Engineering, Education and Technology
Sponsor(s)
This research is a collaboration between the Centro de Bioplantas and the Universidad de Ciego de Ávila, Cuba; the Universidad Católica de Santa María, Arequipa, Peru; and the Instituto Tecnológico de Ciudad Cuauhtémoc, Chihuahua, Mexico. Special thanks to everyone who contributed their invaluable comments and considerations.
Sources of information: Directorio de Producción Científica Scopus