Clustering Algorithm for Travel Distance Analysis

Nadezda Zenina, Arkady Borisov

Abstract


An important problem in the application of cluster analysis is deciding how many clusters should be derived from the data. The aim of the paper is to determine the number of clusters using a distinctive breaking point (elbow), and to calculate the variance ratio criterion (VRC) of Calinski and Harabasz and the J-index in order to check the robustness of the cluster solutions. Agglomerative hierarchical clustering was used to group a data set whose complex structure makes it difficult to identify homogeneous groups. The stability of the cluster solutions was assessed by using different similarity measures and by reordering the cases in the data set.
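As an illustration of the procedure outlined in the abstract, the following is a minimal, dependency-free Python sketch, not the authors' implementation. It assumes average linkage with Euclidean distance and uses an invented two-dimensional toy data set (the abstract states neither the linkage, the similarity measures, nor the data). All function names are hypothetical. The VRC is computed as (SS_B / (k - 1)) / (SS_W / (n - k)), where SS_B and SS_W are the between-cluster and within-cluster sums of squares, n is the number of cases and k the number of clusters.

```python
import math
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def agglomerate(points, k):
    """Naive average-linkage agglomerative clustering, stopped at k clusters.

    Returns a list of clusters, each a list of indices into `points`.
    Linkage and distance are assumptions made for this sketch.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(math.sqrt(sq_dist(points[i], points[j]))
                        for i in clusters[a] for j in clusters[b])
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]                          # b > a, so index a stays valid
    return clusters

def variance_ratio_criterion(points, clusters):
    """Calinski-Harabasz VRC = (SS_B / (k - 1)) / (SS_W / (n - k))."""
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = within = 0.0
    for members in clusters:
        pts = [points[i] for i in members]
        c = centroid(pts)
        between += len(pts) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in pts)
    return (between / (k - 1)) / (within / (n - k))

def partition(points, k):
    """Cluster solution as a set of point sets, independent of case order."""
    return {frozenset(points[i] for i in c) for c in agglomerate(points, k)}

# Invented toy data: three well-separated groups of four cases each.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3),
    (9.0, 1.0), (9.2, 1.1), (8.9, 0.8), (9.1, 1.2),
]

# VRC for each candidate number of clusters; a larger value is better.
scores = {k: variance_ratio_criterion(points, agglomerate(points, k))
          for k in range(2, 6)}
best_k = max(scores, key=scores.get)

# Simple stability check: the solution should not depend on case order.
stable = partition(points, best_k) == partition(list(reversed(points)), best_k)
```

On real data the VRC maximum is usually less clear-cut than on this toy example, which is why it is typically read alongside the elbow plot and a stability check rather than on its own.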

Keywords:

Agglomerative hierarchical clustering; distinctive breaking point (elbow); J index; variance ratio criterion

References


R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, 1973.

G. W. Milligan and M. C. Cooper. An examination of procedures for determining the number of clusters in a data set. Psychometrika, Vol. 50, 1985, pp. 159-179.

A. Ben-Hur and I. Guyon. Detecting stable clusters using principal component analysis. In Functional Genomics: Methods and Protocols. M. J. Brownstein and A. Khodursky (eds.), Humana Press, 2003, pp. 159-182.

S. Salvador and P. Chan. Determining the Number of Clusters/Segments in Hierarchical Clustering/Segmentation Algorithms. 16th IEEE International Conference on Tools with Artificial Intelligence, 2004, pp. 576–584.

R. Tibshirani, G. Walther, D. Botstein and P. Brown. Cluster validation by prediction strength. Stanford Technical Report, Department of Statistics, Stanford University, USA, 2001.

N. Rajalingam and K. Ranjini. Hierarchical Clustering Algorithm - A Comparative Study. International Journal of Computer Applications, Volume 19, No. 3, April 2011.

J. Anable. ‘Complacent Car Addicts’ or ‘Aspiring Environmentalists’? Identifying travel behaviour segments using attitude theory. Transport Policy 12, 2005, pp. 65–78.

P. Hagel and R. Shaw. The Influence of Delivery Mode on Consumer Choice of University. European Advances in Consumer Research, Volume 8, 2008.

S. Limbourg and B. Jourquin. Rail-Road terminal locations: aggregation errors and best potential locations on large networks. EJTIR, Vol 7, no. 4, 2007, pp. 317-334.

T. Calinski and J. Harabasz. A dendrite method for cluster analysis. Communications in Statistics, Vol. 3, 1974, pp. 1–27.

E. Mooi and M. Sarstedt. A Concise Guide to Market Research. The Process, Data, and Methods Using IBM SPSS Statistics, 2011.

I. M. G. Dresen, T. Boe, J. Huesing, M. Neuhaeuser and K.-H. Joeckel. New resampling method for evaluating stability of clusters. BMC Bioinformatics, 2008.

G. W. Milligan and M. C. Cooper. A study of variable standardization. Journal of Classification, pp. 181–204, 1988.

J. Smith and M. Saito, “Creating Land-Use Scenarios by Cluster Analysis for Regional Land-Use and Transportation Sketch Planning”, Journal of Transportation and Statistics, vol. 04, no. 01, paper 03. [Online]. Available: http://www.rita.dot.gov/bts/sites/rita.dot.gov.bts/files/publications/journal_of_transportation_and_statistics/volume_04_number_01/paper_03/index.html. [Accessed June 15, 2013].

Y. Mingjin, “Methods of Determining the Number of Clusters in a Data Set and a New Clustering Criterion”, Dr. thesis. [Online]. Available: http://scholar.lib.vt.edu/theses/available/etd-12062005-153906/unrestricted/Proposal-Face.pdf. [Accessed July 8, 2013].

ArcGIS home page. [Online]. Available: http://resources.arcgis.com/en/help/main/10.1/index.html#//005p00000005000000. [Accessed May 29, 2013].


Copyright (c) 2013 Nadezda Zenina, Arkady Borisov

This work is licensed under a Creative Commons Attribution 4.0 International License.