Title
Integrating biological knowledge based on functional annotations for biclustering of gene expression data
Date Issued
01 May 2015
Access level
open access
Resource Type
journal article
Author(s)
Nepomuceno J.
Troncoso A.
Nepomuceno-Chamorro I.
Universidad Pablo de Olavide
Publisher(s)
Elsevier Ireland Ltd
Abstract
Gene expression data analysis is based on the assumption that co-expressed genes imply co-regulated genes. This assumption is being reformulated because the co-expression of a group of genes may be the result of an independent activation with respect to the same experimental condition and not due to the same regulatory regime. For this reason, traditional techniques have recently been improved with the use of prior biological knowledge from open-access repositories together with gene expression data.
Biclustering is an unsupervised machine learning technique that searches for patterns in gene expression data matrices. A scatter search-based biclustering algorithm that integrates biological information is proposed in this paper. In addition to the gene expression data matrix, the only input of the algorithm is a direct annotation file that relates each gene to a set of terms from a biological repository where genes are annotated. Two different biological measures, FracGO and SimNTO, are proposed to integrate this information by adding it to the fitness function to be optimized in the scatter search scheme. The measure FracGO is based on biological enrichment, and SimNTO is based on the overlap among the GO annotations of pairs of genes. Experimental results evaluate the proposed algorithm on two datasets and show that the algorithm performs better when biological knowledge is integrated. Moreover, an analysis and comparison of the two biological measures is presented, and it is concluded that the differences depend on both the data source and how the annotation file has been built when GO is used. It is also shown that the proposed algorithm obtains a greater number of enriched biclusters than other classical biclustering algorithms typically used as benchmarks, and an analysis of the overlap among biclusters reveals that the biclusters obtained have low overlap.
The proposed methodology is a general-purpose algorithm which allows the integration of biological information from several sources and can be extended to other biclustering algorithms based on the optimization of a merit function.
Start page
163
End page
180
Volume
119
Issue
3
Language
English
OECD Knowledge area
Bioinformatics
Computer science
Systems and communications engineering
Scopus EID
2-s2.0-84927175257
PubMed ID
Source
Computer Methods and Programs in Biomedicine
ISSN of the container
01692607
Sponsor(s)
We would like to thank the Spanish Ministry of Science and Innovation, the Junta de Andalucía and the University Pablo de Olavide for the financial support under projects TIN2011-28956-C02-02, P12-TIC-1728 and APPB813097, respectively.
Sources of information: Directorio de Producción Científica, Scopus