Artificial Neural Network Generalization and Simplification via Pruning

Andrey Bondarenko, Arkady Borisov


Artificial neural networks (ANNs) are well known for their classification abilities. However, choosing hyperparameters such as the number and size of neuron layers can be quite tedious. Pruning approaches assume that a sufficiently large ANN has already been trained and can then be simplified with an acceptable loss of classification accuracy. This paper presents a node pruning algorithm and reports experimental results comparing the accuracy of pruned networks with that of their non-pruned counterparts.
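As a rough illustration of what node pruning means in practice, the sketch below removes hidden nodes from an already trained one-hidden-layer network and checks that the outputs are preserved when the removed nodes were irrelevant. It is not the algorithm from the paper: the saliency measure (the L2 norm of each node's outgoing weights), the tanh activation, and the names `forward` and `prune_hidden_nodes` are all assumptions made for this example.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """One-hidden-layer network: tanh hidden units, linear output scores."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

def prune_hidden_nodes(W1, b1, W2, n_keep):
    """Keep the n_keep hidden nodes with the largest outgoing-weight norm.

    The saliency measure used here is an illustrative assumption,
    not the criterion used in the paper.
    """
    saliency = np.linalg.norm(W2, axis=1)            # one score per hidden node
    keep = np.sort(np.argsort(saliency)[-n_keep:])   # indices of surviving nodes
    # Drop the pruned nodes' incoming weights, biases, and outgoing weights.
    return W1[:, keep], b1[keep], W2[keep, :], keep

# Stand-in for a trained network: 4 inputs, 6 hidden nodes, 3 classes.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 6))
b1 = rng.normal(size=6)
W2 = rng.normal(size=(6, 3))
b2 = rng.normal(size=3)
W2[[1, 4], :] = 0.0  # make two hidden nodes irrelevant to the output

W1p, b1p, W2p, kept = prune_hidden_nodes(W1, b1, W2, n_keep=4)
x = rng.normal(size=(5, 4))
same = np.allclose(forward(x, W1, b1, W2, b2), forward(x, W1p, b1p, W2p, b2))
```

In this toy case the two removed nodes contribute nothing, so the pruned network reproduces the original outputs exactly. With a real trained network, pruning usually changes the outputs, and the accuracy of the pruned model would have to be compared against the original on held-out data.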


Artificial neural networks; generalization; overfitting; pruning







Copyright (c) 2014 Andrey Bondarenko, Arkady Borisov

This work is licensed under a Creative Commons Attribution 4.0 International License.