Gastric Cancer Risk Analysis in Unhealthy Habits Data with Classification Algorithms

Arnis Kirshners, Inese Polaka, Ludmila Aleksejeva


Data mining methods are applied to a medical task: determining how Helicobacter pylori influences the increase in gastric cancer risk by analysing adverse factors in individual lifestyle. During data pre-processing, the data are cleaned of noise and other factors, reduced in dimensionality, transformed for the task, and cleared of non-informative attributes. Classification with the C4.5, CN2 and k-nearest neighbour algorithms is then carried out to find relationships between the analysed attributes and the class attribute, Helicobacter pylori presence, which could influence the risk of cancer development. The experimental analysis uses data from the database of the Latvian-based project “Interdisciplinary Research Group for Early Cancer Detection and Cancer Prevention”.


Keywords: classification; data pre-processing; gastric cancer risk analysis




Copyright (c) 2015 Arnis Kirshners, Inese Polaka, Ludmila Aleksejeva

This work is licensed under a Creative Commons Attribution 4.0 International License.