Hybrid Classification Model for Biomedical Data Analysis

Natalia Novoselova, Igor Tom


The paper describes a method for constructing a hybrid classification model that combines several sources of biological information to build a classifier for identifying subtypes of complex diseases. The method has two distinctive features: its adaptive nature, i.e. the ability to build efficient classifiers regardless of data type, and a multi-criteria approach to evaluating classification effectiveness. Tests on real biomedical data showed the advantages of the proposed hybrid model over individual classifiers.
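The abstract does not say how the hybrid model combines its sources or which efficiency criteria it uses, so the following Python sketch is only a generic illustration of the idea, not the authors' method. It assumes one classifier per data source, combines their label predictions by majority vote, and scores the result with two criteria (accuracy and macro-averaged recall). The source names, the toy labels, and the helper functions `majority_vote` and `evaluate` are all hypothetical.

```python
from collections import Counter


def majority_vote(per_source_predictions):
    """Combine label predictions from several per-source classifiers.

    per_source_predictions: list of equal-length label lists, one per source.
    Ties are broken in favor of the earliest source in the list.
    """
    combined = []
    for votes in zip(*per_source_predictions):
        counts = Counter(votes)
        best = max(counts.values())
        # Take the first vote (in source order) that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Mean per-class recall, so rare subtypes count as much as common ones."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def evaluate(y_true, y_pred):
    """Score one set of predictions against several criteria at once."""
    return {
        "accuracy": accuracy(y_true, y_pred),
        "macro_recall": macro_recall(y_true, y_pred),
    }


# Hypothetical toy data: three disease subtypes, three data sources.
y_true = ["A", "A", "B", "B", "C"]
expression_pred = ["A", "B", "B", "B", "C"]   # assumed gene-expression classifier
methylation_pred = ["A", "A", "B", "C", "C"]  # assumed second source
clinical_pred = ["B", "A", "A", "B", "C"]     # assumed third source

hybrid_pred = majority_vote([expression_pred, methylation_pred, clinical_pred])

for name, pred in [
    ("expression", expression_pred),
    ("methylation", methylation_pred),
    ("clinical", clinical_pred),
    ("hybrid", hybrid_pred),
]:
    print(name, evaluate(y_true, pred))
```

In this toy case each individual classifier makes at least one error that the other two outvote, which is the kind of gain a combined model can offer. Real data would need cross-validation and a combination rule chosen for the data types involved.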


Classification; efficiency criteria; gene expression; hybrid classifier




DOI: 10.7250/itms-2022-0003



Copyright (c) 2022 Natalia Novoselova, Igor Tom

This work is licensed under a Creative Commons Attribution 4.0 International License.