Decision Tree Creation Methodology Using Propositionalized Attributes

Abstract. The aim of the article is to analyse in depth the methods of decision tree construction that use decision tree learning with propositionalized attributes. Classical decision tree learning algorithms, as well as decision tree learning with propositionalized attributes, are examined. The article provides a detailed analysis of one such methodology and of the importance of decision trees for knowledge presentation. A concept of using ontologies to develop decision-tree classification systems is proposed. Applying the methodology could improve classification accuracy.


I. INTRODUCTION
This paper continues last year's publication "Ontology-based Classification System Development Methodology" [7], whose main goal was "to analyse ontology-based classification systems with decision trees". Several methods were investigated there: ontology-based inductive learning systems with classification rules, and decision tree learning with a taxonomy of proposed attributes.
To achieve that goal, theoretical research was conducted on methods that use ontologies in decision tree classification systems. Ontologies were used in classification tasks with real and artificial data [5], [8]. Ontologies of the proposed attributes were recommended for improving classification quality on data sets with a small number of unique values.
To complete this topic, a detailed study of decision tree construction techniques using propositionalized attributes is undertaken.
Nowadays, the analysis and interpretation of data processing results are of great importance. It is usually desirable to express the results as rules, that is, as knowledge. Therefore, ways and methods of mining such knowledge must be sought.
Classification is one of the main tasks of data mining: determining to which of the predefined groups of objects an object belongs. These predefined groups are called classes, and the process is called classification. During the classification stage, a classification model, or classifier, is created; the model determines classes on the basis of rules derived during classification. There are many classic algorithms and techniques for carrying out classification, but they are rarely used in ontology-based classification and clustering. Nevertheless, the need for them is confirmed by the author of [6], who describes problems with the use of data mining in e-commerce and points out that hierarchical background knowledge is necessary. Solving such problems could be one of the possible uses of ontology-based classification, which motivates the authors to investigate methods of using ontologies in decision tree-based classification systems.

II. CLASSICAL DECISION TREE LEARNING METHODS
The data classification process consists of two stages: training on existing data and classification of new data. First, a model is trained with one of the classification algorithms, which yields a classification model, or classifier. The classifier is then applied to new data; this stage includes model testing and evaluation of the classification.
A classification system includes pre-treatment, classifier modelling, the classifier itself, and post-processing.
A decision tree is a way of representing rules in a hierarchical, coherent structure, where each object corresponds to a single node that gives the decision. A rule here is a logical structure of the form "If ... Then ...".
The application area of decision trees is wide, but all the problems they solve can be grouped into the following classes:
• Data description: decision trees allow information about the data to be stored in a compact form; instead of the data themselves, one can store a decision tree that contains an exact description of the objects.
• Classification: decision trees cope primarily with classification tasks, i.e., assigning objects to one of the previously known classes. The target variable must have discrete values.
Classification trees are considered in this article. A decision tree is a classifier based on recursive division of the instance space. A decision tree is composed of nodes and directed arcs. The root node has no incoming arcs; every other node has exactly one incoming arc. An internal node has an incoming arc and one or more outgoing arcs. The leaves of a decision tree are nodes that have an incoming arc but no outgoing arcs.
For example, a decision tree for the well-known Iris data set [12] is given in Fig. 1.
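The node-and-arc structure described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: the `Node` class and `classify` function are hypothetical, and the toy bank-client tree (attributes `income` and `purpose`, classes `repays` and `defaults`) is invented for the example. Internal nodes test one attribute and have one outgoing arc per value, leaves have no outgoing arcs and carry a class, and classifying an object means following arcs from the root down to a leaf.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Decision tree node: internal nodes test an attribute, leaves carry a class."""
    attribute: Optional[str] = None                            # attribute tested here (None for a leaf)
    children: Dict[str, "Node"] = field(default_factory=dict)  # outgoing arcs, one per attribute value
    label: Optional[str] = None                                # class label (used by leaves only)

    def is_leaf(self) -> bool:
        # A leaf has an incoming arc but no outgoing arcs.
        return not self.children


def classify(root: Node, obj: Dict[str, str]) -> str:
    """Follow arcs from the root (which has no incoming arc) down to a leaf."""
    node = root
    while not node.is_leaf():
        node = node.children[obj[node.attribute]]
    return node.label


# Hypothetical toy tree: test "income" first, then "purpose" for low-income clients.
tree = Node(attribute="income", children={
    "high": Node(label="repays"),
    "low": Node(attribute="purpose", children={
        "housing": Node(label="repays"),
        "consumer": Node(label="defaults"),
    }),
})

print(classify(tree, {"income": "low", "purpose": "consumer"}))  # defaults
```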

A. Related Studies
A literature review [9] identifies three approaches that use ontologies to achieve higher accuracy and generate shorter rules:
• attribute value taxonomy (AVT);
• word taxonomy (WT);
• propositionalized attribute taxonomy (PAT).
When an attribute value taxonomy is used to build a classifier, a separate taxonomy is used for each attribute in order to obtain classification rules at different levels of abstraction. This approach automatically generates a taxonomy of values for each attribute and uses a special decision tree learning algorithm to create the rules. For classification problems based on attribute value taxonomies, the Naive Bayes classifier is also used.
The word taxonomy approach groups words and sentences hierarchically, and this taxonomy is then used to classify the entries. This approach also uses the Naive Bayes classifier.
Next, the use of the propositionalized attribute taxonomy methodology in classification tasks is considered and analysed. Kang [9] proposes a new automatic method for obtaining a domain ontology from data sets; the ontology is then used to classify database entries into different categories with the help of a decision tree.
The method introduces a taxonomy of attributes transformed into propositions (a propositionalized attribute taxonomy, PAT) and a decision tree learning algorithm, PAT-DTL, which extends the C4.5 decision tree learning algorithm to work with the constructed PAT. PAT-DTL uses both top-down and bottom-up search in the PAT to find the level of abstraction needed for the classification task.
Propositionalization is the explicit transformation of a relational data set into a propositional data set. Its input is a relational database table, and its output is an attribute-value representation in a single table, where each example corresponds to one entry and is described by the values of a fixed set of attributes. The aim of propositionalization is to pre-process relational data so that they can later be analysed with machine learning tools that accept attribute-value input.
The operation of these algorithms is presented in detail in [9].
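A toy version of propositionalization is sketched below. It assumes a hypothetical one-to-many schema (a `clients` table and a `loans` table linked by `client_id`) and summarises the related rows with simple aggregates: a count, an existence test, and one Boolean feature per related value. This is one common way of flattening relational data, not the specific procedure of [9]; all names are illustrative.

```python
from collections import defaultdict

# Hypothetical relational data: a main table and a related one-to-many table.
clients = [
    {"client_id": 1, "gender": "female", "class": "K1"},
    {"client_id": 2, "gender": "male", "class": "K2"},
]
loans = [
    {"client_id": 1, "purpose": "housing", "overdue": False},
    {"client_id": 1, "purpose": "car", "overdue": False},
    {"client_id": 2, "purpose": "consumer", "overdue": True},
]


def propositionalize(main, related, key):
    """Flatten a main table and a related table into one attribute-value table.

    Each main entry becomes one row; its related rows are summarised by aggregate
    features (a count, an existence test, and one Boolean per related value).
    """
    grouped = defaultdict(list)
    for row in related:
        grouped[row[key]].append(row)
    purposes = sorted({row["purpose"] for row in related})
    table = []
    for entry in main:
        linked = grouped[entry[key]]
        flat = {k: v for k, v in entry.items() if k != key}  # keep the entry's own attributes
        flat["n_loans"] = len(linked)                        # aggregate: number of related rows
        flat["any_overdue"] = any(r["overdue"] for r in linked)  # aggregate: existence test
        for p in purposes:  # one Boolean feature per value seen in the related table
            flat[f"has_{p}_loan"] = any(r["purpose"] == p for r in linked)
        table.append(flat)
    return table


for row in propositionalize(clients, loans, "client_id"):
    print(row)
```

The result has one row per client with a fixed set of columns, which is the single-table, attribute-value form that decision tree learners expect.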

IV. DECISION TREE AND KNOWLEDGE PRESENTATION
The decision tree method makes it possible to predict whether objects belong to one class or another depending on the values of the attributes that characterise these objects. Decision trees allow "If-Then" rules to be constructed automatically from the available statistics; on that basis, a further decision is made about whether an observation or object belongs to a particular class.
Let the objects be represented by a set $X = \{x_1, \dots, x_n\}$, where each element of this set is described by the same set of attributes $A_j$, $j = 1, \dots, m$. Each propositionalized attribute (pseudo-attribute) $A_j$ can take values $a_{jl}$, $l = 1, \dots, L_j$, measured on an arbitrary scale.
Consider, for example, statistics on bank clients [1]: the clients themselves form the set $X$. Each client is characterised by a set of attributes: gender, age, purpose of the loan, total income, etc. These attributes are $A_1$, $A_2$, etc. Attribute $A_1$ (gender) can take two values, $a_{11}$ and $a_{12}$, i.e., $A_1 \in \{a_{11}, a_{12}\}$, and similarly for the other attributes. Let there be a set of classes $K = \{K_1, \dots, K_q\}$. Each object of $X$ (each bank client) has been assigned to a certain class, and this is recorded in the statistics. For example, for bank clients there can be two classes: $K_1$ (the borrower repays the loan on time) and $K_2$ (the borrower fails to repay the loan satisfactorily). The task is to construct classifying rules that identify the regularities between the attribute values of each object of $X$ and the class $K_i$ to which the object belongs.
A classifying rule has the following form: if the attributes $A_1, \dots, A_m$ of an object $x$ take the values $a_{1l_1}, \dots, a_{ml_m}$, then $x$ belongs to class $K_i$.
To construct classifying rules, a decision tree must first be built. Its top (root) checks the value of the first attribute of the presented object, its branches correspond to intermediate checks, and its leaves are the classes of objects. Such a tree may have the form shown in Fig. 2. A classifying rule is then obtained by following each branch from top to bottom.
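Reading rules off the branches can be illustrated with a short sketch. The nested-tuple tree representation, the `extract_rules` function and the attribute names are hypothetical; only the idea (one If-Then rule per root-to-leaf path, with the checks along the path forming the premise) comes from the text.

```python
from typing import List, Tuple, Union

# Hypothetical tree: internal nodes are (attribute, {value: subtree}); leaves are class names.
Tree = Union[str, Tuple[str, dict]]

tree: Tree = ("income", {
    "high": "K1",
    "low": ("purpose", {"housing": "K1", "consumer": "K2"}),
})


def extract_rules(node: Tree, conditions: Tuple[Tuple[str, str], ...] = ()) -> List[str]:
    """Follow every branch top-down and emit one If-Then rule per leaf."""
    if isinstance(node, str):
        # Leaf: the checks accumulated along the path form the premise of the rule.
        premise = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {premise} THEN class = {node}"]
    attribute, branches = node
    rules: List[str] = []
    for value, child in branches.items():
        rules.extend(extract_rules(child, conditions + ((attribute, value),)))
    return rules


for rule in extract_rules(tree):
    print(rule)
```

For this tree the three printed rules are `IF income = high THEN class = K1`, `IF income = low AND purpose = housing THEN class = K1` and `IF income = low AND purpose = consumer THEN class = K2`.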
Constructing the tree in Fig. 2 presents no problems when each set (or group of sets) of object attribute values can be put in one-to-one correspondence with a definite class of objects. In practical problems, however, such a correspondence is often probabilistic. In other words, each ordered sequence of object attribute values corresponds to one of the selected object classes only with a certain probability. In this case, classification is performed under probabilistic uncertainty.

Let there be a sample of statistics in which the corresponding classes are given for different sets of attribute values of the objects in the sample (Table I).

Table I columns: Name of object; Values of object attributes; Class of object.
In Table I, different classes may correspond to identical sets of attribute values of different objects, which makes the classification probabilistically uncertain. In other words, the event that establishes a correspondence between a chain of attribute values and a certain class is random and is characterised by a certain probability.
For example, if there is a single attribute $A_1$ with two possible values, $a_{11}$ and $a_{12}$, the table of statistics for 10 objects (set $X$) might look as follows (Table II).

Under probabilistic uncertainty, the corresponding classification rules can be written as follows:
• if $A_1 = a_{11}$, then with probability 5/7 the object belongs to class $K_1$;
• if $A_1 = a_{12}$, then with probability 2/3 the object belongs to class $K_2$.
Evidently, with a single attribute the automatic construction of rules is straightforward. When there are several attributes, the order in which their values are analysed must be chosen. It is advantageous to order the attributes by the principle of maximum removal of uncertainty, with information entropy as the measure of uncertainty.
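The probabilistic rules above can be reproduced by simple counting. Table II itself is not reproduced in this text, so the ten (value, class) pairs below are only one assignment consistent with the quoted probabilities (5/7 and 2/3 per value, and 6/10 and 4/10 overall); the function name `probabilistic_rules` is hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Ten (value of A1, class) pairs chosen to match the quoted frequencies;
# the row order is arbitrary and not taken from Table II.
sample = [("a11", "K1")] * 5 + [("a11", "K2")] * 2 + [("a12", "K1")] + [("a12", "K2")] * 2


def probabilistic_rules(pairs):
    """For each attribute value, return the most frequent class and its relative frequency."""
    by_value = defaultdict(Counter)
    for value, cls in pairs:
        by_value[value][cls] += 1
    rules = {}
    for value, counts in sorted(by_value.items()):
        cls, n = counts.most_common(1)[0]
        rules[value] = (cls, Fraction(n, sum(counts.values())))
    return rules


for value, (cls, p) in probabilistic_rules(sample).items():
    print(f"if A1 = {value}, then with probability {p} the object belongs to class {cls}")
```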
If no attributes were used, the probability of assigning any new object (not included in Table II) to class $K_1$ would be 6/10, and to class $K_2$, 4/10. In this case, the entropy is calculated by the formula
$$H(X) = -\sum_{i=1}^{q} P(K_i)\log_2 P(K_i) = -\left(\tfrac{6}{10}\log_2\tfrac{6}{10} + \tfrac{4}{10}\log_2\tfrac{4}{10}\right) \approx 0.971.$$
Using the single attribute $A_1$ from Table II for classification reduces the uncertainty. The corresponding entropy is the weighted entropy of the subsets $X_{1l}$ produced by the values of $A_1$:
$$H(X \mid A_1) = \sum_{l} \frac{|X_{1l}|}{|X|}\,H(X_{1l}) = \tfrac{7}{10}\,H\!\left(\tfrac{5}{7}, \tfrac{2}{7}\right) + \tfrac{3}{10}\,H\!\left(\tfrac{1}{3}, \tfrac{2}{3}\right) \approx 0.880,$$
where $H(p_1, p_2) = -p_1\log_2 p_1 - p_2\log_2 p_2$ and $0\log_2 0 = 0$.
When several attributes are available, the first attribute to analyse should be the one that provides the maximum reduction of classification uncertainty with respect to the original set. The analysis of the classifying ability of the second attribute, $A_2$ (Table III), shows that the subset $X_{21}$ (objects with $A_2 = a_{21}$) contains 7 objects with $P(K_1 \mid a_{21}) = 6/7$, and the subset $X_{22}$ contains 3 objects with $P(K_2 \mid a_{22}) = 1$. The entropy for the second attribute is
$$H(X \mid A_2) = \tfrac{7}{10}\,H\!\left(\tfrac{6}{7}, \tfrac{1}{7}\right) + \tfrac{3}{10}\cdot 0 \approx 0.414.$$
The second attribute removes much more of the uncertainty; therefore, in this example the chain of checks in the decision tree begins with the second attribute.
Let there be a set of objects $X$, where each element of the set is described by attributes $A_1, \dots, A_m$. A set of classes $K_i$, $i = 1, \dots, q$, is also given, and the class of each object of $X$ is known. The tree is constructed from the top down: the root is created first, then the descendants of the root, and so on.
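The entropy comparison in the example can be checked numerically. The sketch below assumes base-2 logarithms (entropy in bits) and uses class counts [K1, K2] that follow from the probabilities quoted in the example; the helper names are hypothetical.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def conditional_entropy(partition):
    """Weighted entropy after a split: partition holds one class-count list per attribute value."""
    n = sum(sum(counts) for counts in partition)
    return sum(sum(counts) / n * entropy(counts) for counts in partition)


# Class counts [K1, K2] consistent with the probabilities quoted in the text.
h0 = entropy([6, 4])                        # no attribute used: 6/10 versus 4/10
h1 = conditional_entropy([[5, 2], [1, 2]])  # split on the first attribute
h2 = conditional_entropy([[6, 1], [0, 3]])  # split on the second attribute
print(f"H0 = {h0:.3f}, H(A1) = {h1:.3f}, H(A2) = {h2:.3f}")
# H0 = 0.971, H(A1) = 0.880, H(A2) = 0.414
```

The second attribute leaves the least residual entropy, which is why the tree's first check uses it.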
We use the following algorithm to construct a decision tree [1].
Step 1. Start with an empty tree (only a root) and the initial set $X$ associated with the root. At the root, the statistical probabilities that a new object belongs to each class $K_i$, $i = 1, \dots, q$, are computed. Evidently, $P(K_i) = n_i / n$, where $n_i$ is the number of objects of $X$ belonging to class $K_i$ and $n = |X|$.
Step 2. The initial set $X$ must be divided into subsets. This is done by selecting one of the attributes, $A_j$, and enumerating all of its possible values. Each value $a_{jl}$ is associated with the subset $X_{jl} \subseteq X$ of objects for which $A_j$ takes this value. The division thus produces the subsets $X_{j1}, \dots, X_{jL_j}$, and $L_j$ descendants of the root are created, each assigned to its own subset. For each descendant (that is, for each subset $X_{jl}$), the probabilities that a new object belongs to each class $K_i$, $i = 1, \dots, q$, are calculated. Unlike the prior probabilities $P(K_i)$, these probabilities are conditional, since they depend on the value taken by $A_j$. The probabilities $P(K_i \mid a_{jl})$, $i = 1, \dots, q$, $l = 1, \dots, L_j$, are calculated by Bayes' formula:
$$P(K_i \mid a_{jl}) = \frac{P(a_{jl} \mid K_i)\,P(K_i)}{\sum_{k=1}^{q} P(a_{jl} \mid K_k)\,P(K_k)}.$$
Step 3. All the actions of Step 2 are repeated, but now each subset $X_{jl}$ is itself divided into subsets by the next selected attribute. Different subsets may be divided on different attributes. Bayes' formula changes, because the event is now that two attributes, $A_j$ and $A_r$, have taken certain values; assuming these values occur independently of each other,
$$P(K_i \mid a_{jl}, a_{rs}) = \frac{P(a_{jl} \mid K_i)\,P(a_{rs} \mid K_i)\,P(K_i)}{\sum_{k=1}^{q} P(a_{jl} \mid K_k)\,P(a_{rs} \mid K_k)\,P(K_k)}.$$
Step 4. All the actions of Step 3 are repeated with further division into subsets (creation of descendants), and Bayes' formula is extended each time another attribute appears, and so on. The attribute on which a subset $S \subseteq X$ is divided is chosen by the criterion of minimum entropy: $\min_{j} H(S \mid A_j)$.
Branching in a given direction continues until a node is reached in which the posterior probability of the object belonging to a certain class equals 1.
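Bayes' formula from Step 2 can be checked on the single-attribute example. The counts below are derived from the probabilities quoted for Table II (5/7, 2/3, 6/10, 4/10) rather than copied from the table, `Fraction` keeps the arithmetic exact, and the function names are hypothetical. Applied to such counts, Bayes' formula simply returns the relative frequency of each class among the objects that have the given attribute value.

```python
from fractions import Fraction

# counts[value][class] = number of objects with that value of A1 and that class.
counts = {"a11": {"K1": 5, "K2": 2}, "a12": {"K1": 1, "K2": 2}}
classes = ["K1", "K2"]
n = sum(sum(row.values()) for row in counts.values())


def prior(k):
    """P(K_k) = n_k / n, as computed at the root in Step 1."""
    return Fraction(sum(row[k] for row in counts.values()), n)


def likelihood(value, k):
    """P(a | K_k): share of class-K_k objects that have this attribute value."""
    n_k = sum(row[k] for row in counts.values())
    return Fraction(counts[value][k], n_k)


def posterior(k, value):
    """Bayes' formula: P(K_k | a) = P(a | K_k) P(K_k) / sum_i P(a | K_i) P(K_i)."""
    evidence = sum(likelihood(value, i) * prior(i) for i in classes)
    return likelihood(value, k) * prior(k) / evidence


print(posterior("K1", "a11"), posterior("K2", "a12"))  # 5/7 2/3
```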

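Steps 1 to 4 amount to an ID3-style recursion, sketched below under stated assumptions rather than as the algorithm of [1] verbatim. Class probabilities are estimated as relative frequencies within each subset (what Bayes' formula gives when applied to the counts without the independence assumption), the split attribute minimises the weighted class entropy, a node becomes a leaf when one class has probability 1, and, as an extra assumption not covered by the text, the majority class is returned when no attributes remain. The seven records and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of the class labels in a subset."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def split_entropy(rows, attr):
    """Weighted class entropy after dividing rows by the values of attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row["class"])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


def build_tree(rows, attributes):
    """Top-down construction: returns a class label (leaf) or (attribute, {value: subtree})."""
    labels = [row["class"] for row in rows]
    counts = Counter(labels)
    if len(counts) == 1:  # posterior probability of one class equals 1: stop branching
        return labels[0]
    if not attributes:    # assumption: no attributes left, so use the majority class
        return counts.most_common(1)[0][0]
    best = min(attributes, key=lambda a: split_entropy(rows, a))  # minimum-entropy criterion
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, rest)
    return (best, branches)


# Hypothetical records (not Table III).
data = [
    {"A1": "a11", "A2": "a21", "class": "K1"},
    {"A1": "a12", "A2": "a21", "class": "K1"},
    {"A1": "a11", "A2": "a21", "class": "K1"},
    {"A1": "a12", "A2": "a21", "class": "K1"},
    {"A1": "a11", "A2": "a22", "class": "K1"},
    {"A1": "a12", "A2": "a22", "class": "K2"},
    {"A1": "a12", "A2": "a22", "class": "K2"},
]
print(build_tree(data, ["A1", "A2"]))
# ('A2', {'a21': 'K1', 'a22': ('A1', {'a11': 'K1', 'a12': 'K2'})})
```

Here $A_2$ is checked first because splitting on it leaves less residual entropy (about 0.39 bits against about 0.57 for $A_1$), and only the impure branch $a_{22}$ is divided further.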
V. CONCEPT OF ONTOLOGY-BASED CLASSIFICATION SYSTEM
Decision tree learning with a pseudo-attribute taxonomy consists of the following stages (see Fig. 3): pre-treatment of the data sets; creation of a pseudo-attribute data set from the attributes of the data set; creation of the pseudo-attribute taxonomy; and decision tree learning and testing. The description below is based on the algorithms described in [9] and [7].
Stage 1. A wide range of data sets can be used, with some limitations: the data set must be complete, or missing values must make up only a small percentage of all values. If the data set contains continuous attributes, they must be discretized into intervals.
Stage 2. The pseudo-attribute set is created as follows. First, the domain of unique values is found for each attribute. Then, for each unique value, a pseudo-attribute consisting of the corresponding attribute-value pair is designed; each pseudo-attribute takes values in {0, 1} (or, equivalently, in a pair of Boolean values). After the new pseudo-attributes have been created, the data set is transformed by converting each entry into the new pseudo-attribute representation.
Stage 3. A pseudo-attribute taxonomy can be created from the new data set using similarity measures. The taxonomy is built agglomeratively, with the set of pseudo-attributes as the starting point. First, for each pseudo-attribute, the class distribution for pseudo-attribute value "1" is calculated. From these distributions, the J-divergence similarity measure is calculated for each pair of pseudo-attributes.
The pair of pseudo-attributes ($p_a$ and $p_b$) with the lowest J-divergence is found and merged into a new pseudo-attribute $p_{ab}$ by combining their values with logical OR. The class distribution for $p_{ab}$ is then calculated, and $p_{ab}$ is added to the taxonomy as the parent of $p_a$ and $p_b$.
After that, the data set is changed: the combined pseudo-attribute $p_{ab}$ is added to the data set, and $p_a$ and $p_b$ are removed from it. Then it is checked whether the current cut size equals 1. If it does, the goal has been achieved and the taxonomy is output. Otherwise, the J-divergence values are recalculated to determine which pseudo-attributes are to be combined next.
Stage 4. Decision tree creation and testing involve running the C4.5 algorithm multiple times on data sets formed according to the previously created taxonomy. After each decision tree is created, cross-validation is performed and the test accuracy is obtained. Then the set of parents of the cut elements is formed, and each element of the pseudo-attribute data set is replaced with its parent.
The process finishes when the whole taxonomy has been traversed or when the parent data set contains no more "valid" parents.
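Stages 1 and 2 of the procedure can be sketched as follows. Equal-width binning is assumed for the discretization in Stage 1 (the text does not prescribe a method), and the records and function names are hypothetical. Each (attribute, value) pair becomes a 0/1 pseudo-attribute, so every converted entry has exactly one 1 per original attribute.

```python
def discretize(values, bins):
    """Stage 1 (assumed equal-width binning): map continuous values to interval indices."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), bins - 1) for v in values]


def make_pseudo_attributes(rows, attributes):
    """Stage 2: one binary pseudo-attribute per (attribute, unique value) pair."""
    pairs = [(a, v) for a in attributes for v in sorted({row[a] for row in rows})]
    names = [f"{a}={v}" for a, v in pairs]
    table = [[1 if row[a] == v else 0 for a, v in pairs] for row in rows]
    return names, table


# Hypothetical records: a nominal attribute and a continuous one that must be discretized.
genders = ["m", "f", "f", "m", "f"]
ages = [23, 35, 47, 59, 61]
rows = [{"gender": g, "age": b} for g, b in zip(genders, discretize(ages, 2))]

names, table = make_pseudo_attributes(rows, ["gender", "age"])
print(names)  # ['gender=f', 'gender=m', 'age=0', 'age=1']
for entry in table:
    print(entry)
```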

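The agglomerative taxonomy construction of Stage 3 can be sketched as follows. J-divergence is taken here as the symmetrised Kullback-Leibler divergence, $J(P, Q) = \sum_i (p_i - q_i)\log(p_i / q_i)$; Laplace smoothing of the class counts (parameter `alpha`) is an added assumption that keeps the logarithms finite, and merged pseudo-attributes are named by joining their children with `OR`. The function names and the `color=...` pseudo-attributes are hypothetical, and the C4.5 runs of Stage 4 are not shown.

```python
from itertools import combinations
from math import log


def class_distribution(column, labels, classes, alpha=1.0):
    """Class distribution among entries where the pseudo-attribute equals 1.

    Laplace smoothing with alpha (an assumption, not from the text) avoids log(0).
    """
    counts = {k: alpha for k in classes}
    for bit, label in zip(column, labels):
        if bit:
            counts[label] += 1
    total = sum(counts.values())
    return [counts[k] / total for k in classes]


def j_divergence(p, q):
    """J-divergence (symmetrised Kullback-Leibler): sum of (p_i - q_i) * log(p_i / q_i)."""
    return sum((pi - qi) * log(pi / qi) for pi, qi in zip(p, q))


def build_taxonomy(columns, labels):
    """Merge the closest pair of pseudo-attributes (logical OR) until the cut has size 1.

    columns maps a pseudo-attribute name to its 0/1 column; returns a child -> parent map.
    """
    classes = sorted(set(labels))
    cut = dict(columns)  # current cut: name -> column
    parent = {}
    while len(cut) > 1:
        dist = {name: class_distribution(col, labels, classes) for name, col in cut.items()}
        a, b = min(combinations(sorted(cut), 2),
                   key=lambda pair: j_divergence(dist[pair[0]], dist[pair[1]]))
        merged = f"({a} OR {b})"
        cut[merged] = [x | y for x, y in zip(cut.pop(a), cut.pop(b))]  # OR of the two columns
        parent[a] = parent[b] = merged
    return parent


labels = ["K1", "K1", "K2", "K2"]
columns = {
    "color=red": [1, 1, 0, 0],
    "color=pink": [1, 0, 0, 0],
    "color=blue": [0, 0, 1, 1],
}
for child, par in build_taxonomy(columns, labels).items():
    print(child, "->", par)
```

With these data, `color=pink` and `color=red` (both associated mainly with $K_1$) are merged first, and `color=blue` joins them at the root, where the cut size reaches 1.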
VI. CONCLUSION
There are many methodologies for decision tree construction with propositionalized attributes. The methodology described in the article is one of many such methodologies used in decision tree learning and ontology application, and the authors' task has been to explore it. One methodology for decision tree construction with propositionalized attributes for knowledge presentation has been examined in the article. A concept of using ontologies in developing decision-tree classification systems has been proposed. Applying the methodology could improve classification accuracy. The use of ontologies in classification tasks, decision tree learning and analysis is very promising. In future research, this methodology will be evaluated against other similar methodologies.

TABLE I. CONFORMITY OF CLASSES TO DIFFERENT ATTRIBUTE VALUES OF THE SET OF OBJECTS
As an example, let us add one more attribute, $A_2$, with values $a_{21}$ and $a_{22}$, to Table II; this gives Table III.

TABLE III. CONFORMITY OF CLASSES TO DIFFERENT VALUES OF ATTRIBUTE