The Application of Class Structure to Classification Tasks

Inese Polaka, Arkady Borisov

Abstract


This article presents an approach to bioinformatics data analysis and exploration that improves classification accuracy by learning the inner structure of the data. Diseases studied in bioinformatics (in diagnostic, prognostic and similar studies) often have known or as yet undiscovered subtypes, which can provide additional information when solving bioinformatics tasks. This study exploits such subtypes by examining inner class structures (probable disease subtypes): cluster analysis is used to find subclasses within each class, and these subclasses are then applied in classification tasks. The study also analyses which cluster merges would best describe the classes. Evaluation is carried out using four classification methods that can be successfully used in bioinformatics: Naïve Bayes classifiers, C4.5, Random Forests and Support Vector Machines.
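The idea of class decomposition can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function names (`farthest_first`, `kmeans`, `decompose`, `predict`), the use of k-means for the cluster analysis and the nearest-centroid rule for classification are all assumptions chosen for brevity, standing in for the clustering method and the four classifiers evaluated in the article.

```python
from math import dist  # Python 3.8+

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(dist(p, s) for s in seeds)))
    return [list(s) for s in seeds]

def kmeans(points, k, iters=20):
    """Tiny k-means (an illustrative stand-in for the cluster analysis)."""
    centroids = farthest_first(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def decompose(X, y, k):
    """Cluster each class separately; return (centroid, parent_class) pairs."""
    subclasses = []
    for label in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == label]
        for c in kmeans(members, min(k, len(members))):
            subclasses.append((c, label))
    return subclasses

def predict(subclasses, x):
    """Assign x to the nearest subclass, then report that subclass's parent class."""
    _, parent = min(subclasses, key=lambda s: dist(x, s[0]))
    return parent

# Hypothetical toy data: class "A" has two subtypes lying on either side of class "B".
X = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
y = ["A", "A", "A", "A", "B", "B"]

model = decompose(X, y, k=2)
print(predict(model, (1.0, 0.5)), predict(model, (9.0, 0.5)), predict(model, (5.0, 0.4)))
```

In this toy setting, a single centroid per class (`k=1`) places both classes at the same point, so the classes cannot be told apart; splitting class "A" into two subclasses separates them. Real data would call for a principled choice of the number of clusters per class, which the sketch fixes by hand.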


Keywords:

Bioinformatics; classification; class decomposition; data mining; data structure exploration






Copyright (c) 2013 Inese Polaka, Arkady Borisov

This work is licensed under a Creative Commons Attribution 4.0 International License.