Markov Chains in the Task of Author’s Writing Style Profile Construction

Pavels Osipovs, Andrejs Rinkevics, Galina Kuleshova, Arkady Borisov

Abstract


This paper examines the use of Markov chains for constructing a profile of an author’s writing style. The constructed profile can then be used to analyze other texts and calculate their level of similarity to it. Extracting a writing style profile that is characteristic of a specific person is a relevant task in many fields; one example is authorship detection for scientific and fiction texts. The paper describes the basic theoretical apparatus used for profile construction and the software implementation of the experimental system, and presents the experiments performed together with their results and analysis.
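
To make the general idea concrete, the following is a minimal sketch of how a Markov-chain style profile and a similarity score could be computed. It is not the authors' implementation, which the abstract does not describe. It assumes a first-order chain over lowercase word tokens, and the names `tokenize`, `build_profile`, `similarity` and the `floor` probability for unseen transitions are hypothetical choices made only for illustration.

```python
from collections import Counter, defaultdict
import math
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed tokenization)."""
    return re.findall(r"[a-z']+", text.lower())


def build_profile(text):
    """Estimate first-order transition probabilities P(next word | current word)."""
    tokens = tokenize(text)
    counts = defaultdict(Counter)
    # Count how often each word is followed by each other word.
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    profile = {}
    # Normalize each row of counts into a probability distribution.
    for current, followers in counts.items():
        total = sum(followers.values())
        profile[current] = {word: n / total for word, n in followers.items()}
    return profile


def similarity(profile, text, floor=1e-6):
    """Average log-probability of the text's word transitions under the profile.

    Transitions absent from the profile receive the assumed `floor` probability.
    Scores closer to 0 indicate a closer match to the profiled style.
    """
    tokens = tokenize(text)
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return float("-inf")
    total = 0.0
    for current, nxt in pairs:
        total += math.log(profile.get(current, {}).get(nxt, floor))
    return total / len(pairs)


if __name__ == "__main__":
    author_profile = build_profile("the cat sat on the mat")
    print(similarity(author_profile, "the cat sat"))
    print(similarity(author_profile, "dog ran away"))
```

In this sketch a text whose word transitions were seen in the profiled text scores higher than one made only of unseen transitions. A practical system would need a larger corpus and a more principled smoothing scheme than a fixed floor value.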


Keywords:

Formalization of author’s writing style; level of texts similarity; Markov chain




Copyright (c) 2014 Pavels Osipovs, Andrejs Rinkevics, Galina Kuleshova, Arkady Borisov

This work is licensed under a Creative Commons Attribution 4.0 International License.