Integrated Network Approach to Protein Function Prediction

Natalia Novoselova, Igar Tom


One of the main problems in functional genomics is the prediction of unknown gene and protein functions. With the rapid growth of high-throughput technologies, a vast amount of biological data describing different aspects of cellular functioning has become available, making it possible to use these data as additional information sources for function prediction and to improve prediction accuracy.
In this research, we describe an approach to protein function prediction based on the integration of several biological datasets. Initially, each dataset is represented as a graph (or network), where the nodes represent genes or their products and the edges represent physical, functional or chemical relationships between them. The integration process makes it possible to estimate the importance of each network for the prediction of a particular function, taking into account the imbalance in the functional annotations, notably the disproportion between positively and negatively annotated proteins. Protein function prediction then consists of applying a label propagation algorithm to the integrated biological network in order to annotate unknown proteins or to assign new functions to already known proteins. A comparative analysis of prediction efficiency across several integration schemes shows a positive effect in terms of several performance measures.
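The two steps named above, network integration followed by label propagation, can be illustrated with a minimal sketch. It is not the authors' implementation: the network weights are fixed by hand (the abstract does not state how the importance of each network is estimated), the imbalance between positive and negative annotations is not handled, and all names (`integrate_networks`, `propagate_labels`, `edges_to_matrix`) and the toy data are hypothetical. Propagation here is the simple iterative scheme in which annotated proteins keep their label and each unannotated protein takes the weighted average of its neighbours' scores.

```python
def edges_to_matrix(n, edges):
    """Build a symmetric n x n adjacency matrix from (i, j, weight) triples."""
    matrix = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        matrix[i][j] = w
        matrix[j][i] = w
    return matrix


def integrate_networks(networks, weights):
    """Combine several adjacency matrices into one by a weighted sum."""
    n = len(networks[0])
    return [
        [sum(w * net[i][j] for net, w in zip(networks, weights)) for j in range(n)]
        for i in range(n)
    ]


def propagate_labels(adjacency, labels, iterations=200):
    """Iterative label propagation on a weighted graph.

    labels[i] is +1 (positive), -1 (negative) or None (unannotated).
    Annotated nodes stay clamped; each unannotated node is repeatedly set to
    the weighted average of its neighbours' current scores.
    """
    n = len(adjacency)
    scores = [0.0 if y is None else float(y) for y in labels]
    for _ in range(iterations):
        updated = list(scores)
        for i in range(n):
            if labels[i] is not None:
                continue  # annotated proteins keep their label
            degree = sum(adjacency[i])
            if degree > 0:
                updated[i] = sum(adjacency[i][j] * scores[j] for j in range(n)) / degree
        scores = updated
    return scores


# Toy data (hypothetical): four proteins described by two networks.
net_a = edges_to_matrix(4, [(0, 1, 1.0), (1, 2, 1.0)])  # e.g. physical interactions
net_b = edges_to_matrix(4, [(2, 3, 1.0)])               # e.g. functional associations
weights = [0.7, 0.3]  # assumed network importances, chosen by hand

combined = integrate_networks([net_a, net_b], weights)
labels = [1, None, None, -1]  # protein 0 positive, protein 3 negative
scores = propagate_labels(combined, labels)
```

In this toy example the two unannotated proteins end up with positive scores, because the more heavily weighted network connects them to the positively annotated protein; a different choice of weights would shift the scores accordingly.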


Keywords: computational biology; data mining; functional association network; binary classification





DOI: 10.7250/itms-2018-0016



Copyright (c) 2018 Natalia Novoselova, Igar Tom

This work is licensed under a Creative Commons Attribution 4.0 International License.