Building Minimum Spanning Trees under Maximum Edge Length Constraint

Given an initial set of planar nodes, the problem is to build a minimum spanning tree connecting the maximum possible number of nodes without exceeding the maximum edge length. To obtain a set of edges, a Delaunay triangulation is performed over the initial set of nodes. Distances between every pair of nodes in the respective edges are calculated and used as graph weights. The edges whose length exceeds the maximum edge length are removed. A minimum spanning tree is built over every disconnected graph. The minimum spanning trees covering a maximum of nodes are selected, among which the tree whose length is minimal is the solution. It is 1.17 % shorter on average for 10 to 80 nodes compared to a nonselected tree.

The task of building minimum spanning trees arises when complex networks are designed in order to connect a set of nodes [1], [2]. Usually these are supply networks which should cover their nodes at the least possible cost of the coverage [1], [3], [4]. At the same time, the number of covered nodes is naturally desired to be the maximum possible.
The coverage cost is mainly determined by distances between every pair of nodes directly linked in the network [5], [6]. The sum of these distances is to be minimized. Built over a set of nodes connected with edges, a minimum spanning tree is a subset of the edges of an undirected graph that connects all the nodes without any cycles and with the minimum possible total edge length [5], [7], [8]. In other words, the minimum spanning tree algorithm connects all the nodes by minimizing the cost of the connection [1], [5], [9], [10].
In many practical problems, building minimum spanning trees is constrained to using a limited number of connections (edges). Besides, the maximum edge length cannot be unlimited. For instance, while building a network of electrical power converters supplying the desired voltage to industrial and individual customers from electrical power stations, the length of any direct connection between two nodes is limited in order to minimize electrical losses and leakage [1], [11], [12]. A similar constraint is imposed on wired tree circuits [4], [6], computer [13], [14], measurement [15], broadcasting [16], telecommunication [17], and transportation networks [18].

II. RELATED WORKS AND MOTIVATION
Borůvka was the first to solve the minimum spanning tree problem, dealing with a task of efficient electrical coverage [19]. Borůvka's algorithm repeatedly finds the shortest edge leaving each tree of the current forest (initially, every node is a tree of its own) and adds all of those edges to the forest. Each repetition reduces the number of trees, within each connected component of the graph, to at most half of its former value. Two other commonly used algorithms for finding a minimum spanning tree are Prim's algorithm and Kruskal's algorithm. Step by step, Prim's algorithm builds a tree by adding the cheapest possible connection from the currently built tree to another node [20], [21]. The starting node is arbitrary. Kruskal's algorithm builds a forest by adding at each step the next shortest edge that will not form a cycle [6], [22], [23]. At the termination of the algorithm, the forest is a minimum spanning forest of the graph; if the graph is connected, the forest has a single component and forms a minimum spanning tree. The algorithms have nearly the same asymptotic time complexity varying from linear to polynomial [24], but Prim's algorithm is claimed to perform better on dense graphs [25], whereas Kruskal's algorithm is believed to perform acceptably on sparser graphs [1], [23], [26]. However, building minimum spanning trees under a maximum edge length constraint has not been studied yet. Moreover, the algorithms operate on a given set of edges (connections), which in real-world scenarios are yet to be determined for an initial set of nodes (locations) [27].
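As an illustration of the edge-by-edge construction described above, Kruskal's algorithm can be sketched in a few lines. This is a minimal sketch and not the implementation used in the study; the function name `kruskal`, the `(weight, u, v)` edge format, and the 4-node graph are assumptions made for the example.

```python
def kruskal(num_nodes, edges):
    """Return (forest_edges, total_length) for edges given as (weight, u, v)."""
    parent = list(range(num_nodes))  # disjoint-set forest: every node is its own tree

    def find(a):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    forest, total = [], 0.0
    for w, u, v in sorted(edges):      # shortest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                   # the edge joins two different trees, so no cycle
            parent[ru] = rv
            forest.append((u, v))
            total += w
    return forest, total


# Hypothetical 4-node graph (not taken from the paper).
edges = [(1.0, 0, 1), (2.0, 1, 2), (2.5, 0, 2), (4.0, 2, 3)]
forest, total = kruskal(4, edges)
print(forest, total)
```

If the input graph is disconnected, the same routine returns a minimum spanning forest, one tree per connected component.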

III. THE GOAL AND OBJECTIVES
Given a set of N planar nodes (points whose locations are two-component vectors), the problem is to build a minimum spanning tree connecting the maximum possible number of nodes without exceeding the maximum edge length. The root node, which is to be unconditionally included into the tree, is assigned at the start. This is, for instance, the starting node in Prim's algorithm. The root node can be used as an identifier of a minimum spanning tree, if there are a few such trees. Although such an identifier will not be unique for a tree, disconnected trees will be identified uniquely. To achieve this goal, the following eight objectives are to be fulfilled:
1. To define an initial set of planar nodes and describe how its topology is specified.
2. To suggest a way of obtaining a set of edges based on the given initial set of planar nodes.
3. To define the metric, by which the edge length is calculated. The lengths are to be used as weights of a graph, for which a minimum spanning tree will be built.
4. To impose a maximum edge length constraint on the problem solution. To suggest a way, by which the constraint is applied to the set of edges obtained from the given initial set of planar nodes.
5. To suggest a method of solving the problem under the maximum edge length constraint.
6. To compare the suggested method performance to solving the problem straightforwardly, without considering the root node (tree identifier) change.
7. To discuss the scientific and practical importance of the suggested method. Its drawbacks are to be described as well.
8. To conclude the study and outline what can be advanced in further research on building minimum spanning trees under constraints.

IV. INITIAL SET OF PLANAR NODES
Initially, as is common in practice, no edges are given. Instead, a set of $N$ planar nodes
$$\{P_i\}_{i=1}^{N}, \quad P_i = [x_i \;\; y_i], \tag{1}$$
is given, on which a minimum spanning tree is to be built without exceeding the maximum edge length denoted by $l_{\max}$. Components $x_i$ and $y_i$ are the horizontal and vertical coordinates of node $P_i$. Although the topology of set (1) is not generally specified, the distance between any two nodes is calculated using the common Euclidean metric in $\mathbb{R}^2$ [28], [29].

V. MAXIMUM EDGE LENGTH CONSTRAINT
To obtain a set of edges, over which a minimum spanning tree can be built, set (1) is triangulated. The triangulation is performed by the Delaunay approach [30], [31], mostly excluding sliver triangles [32]. Besides, the Delaunay triangulation does not maximize the edge length of the triangles [33], which suits the constraint of not exceeding the maximum edge length. Instead, the Delaunay triangulation maximizes the minimum of all the angles of the triangles [34], [35]. This property usually shows up (Fig. 1), but sometimes the slivering is not avoidable (Fig. 2).
Once the nodes of set (1) are triangulated, a set of $Q$ edges
$$\{E_q\}_{q=1}^{Q} \tag{2}$$
emerges, where edge $E_q$ is determined by nodes $P_{j_q}$ and $P_{k_q}$ connected by this edge, $q = 1, \dots, Q$. The number of edges connecting planar nodes after they are triangulated is not necessarily the same for a given $N$. For instance, there are 115 edges once 44 nodes are triangulated (Fig. 3), but triangulating another 44 nodes results in 125 edges (Fig. 4). It would be generally correct to claim that set (2) depends on the shape of planar data.
Fig. 1. A triangulated set of 36 nodes. The Delaunay triangulation "prefers" non-sliver triangles, although here are a few triangles with pretty sharp angles.
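How a set of edges (2) may be obtained from nodes (1) can be illustrated by a brute-force sketch that relies on the defining empty-circumcircle property of the Delaunay triangulation: a triangle is kept if no other node lies strictly inside its circumcircle. This is only an assumption-laden illustration for small node sets (it runs in $O(N^4)$ time and may return crossing edges when four or more nodes are cocircular); the study itself does not state which triangulation routine was used, and the names `circumcircle` and `delaunay_edges` are hypothetical.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None for collinear points."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_edges(points, eps=1e-9):
    """Collect edges of all triangles whose circumcircle contains no other node."""
    edges = set()
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue  # degenerate (collinear) triple
        ux, uy, r2 = circle
        empty = all(
            (px - ux) ** 2 + (py - uy) ** 2 >= r2 - eps
            for m, (px, py) in enumerate(points)
            if m not in (i, j, k)
        )
        if empty:
            edges.update({(i, j), (i, k), (j, k)})  # i < j < k, so pairs are ordered
    return sorted(edges)


# Hypothetical node set: four corners of a square and its center.
points = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
edges = delaunay_edges(points)
print(len(edges), edges)
```

For node sets of practical size, a library routine with $O(N \log N)$ complexity would be the natural replacement for this sketch.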
By the Euclidean metric in $\mathbb{R}^2$, the length of edge $E_q$ is
$$d_q = \sqrt{(x_{j_q} - x_{k_q})^2 + (y_{j_q} - y_{k_q})^2}. \tag{4}$$
If there were no maximum edge length constraint, a minimum spanning tree would be built for a graph with edges (2) and their respective weights
$$\{d_q\}_{q=1}^{Q}. \tag{5}$$
In this case, the constraint can still exist formally, but $l_{\max}$ is set either to a sufficiently great value (intentionally or not) or, more formally (intentionally), to infinity (Fig. 5). In the general case, nodes whose distance exceeds $l_{\max}$ cannot be connected directly. Thus, the respective edges should be removed. Therefore, a subset of $U$ edges
$$E = \{E_q : d_q \leqslant l_{\max}\} \tag{6}$$
is formed. Subset (6) contains only edges not exceeding $l_{\max}$, and then a minimum spanning tree is built over this subset.
Fig. 5. A minimum spanning tree (highlighted bold) built over 47 nodes primarily triangulated. None of the initial $Q = 128$ edges is removed because $l_{\max}$ is sufficiently great. It also formally means that subset (6) coincides with set (2), i.e., $U = 128$.
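The weighting and the edge removal described above can be sketched as follows. The function names `edge_lengths` and `prune_edges`, the node coordinates, and the threshold value are assumptions made for the example.

```python
import math


def edge_lengths(points, edges):
    """Euclidean length of every edge (j, k), used as the graph weight."""
    return [math.dist(points[j], points[k]) for j, k in edges]


def prune_edges(points, edges, l_max):
    """Keep only the edges whose length does not exceed l_max."""
    return [(j, k) for j, k in edges if math.dist(points[j], points[k]) <= l_max]


# Hypothetical nodes and edges (not taken from the paper).
points = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0), (10.0, 0.0)]
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

lengths = edge_lengths(points, edges)
kept = prune_edges(points, edges, l_max=5.0)
print(lengths, kept)
```

Here the edge of length 7 is removed, which leaves node 3 disconnected from the remaining nodes.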
Under too short a length $l_{\max}$, however, subset (6) may be torn into two or more disconnected sets (graphs). For instance, if $L^{*}$ is the total edge length and the maximum edge length for the set of 47 nodes in Fig. 5 is set by (7), then 48 out of 128 edges are removed and subset (6) is of 80 edges, but this subset is torn into three disconnected graphs (see Fig. 6, where the removed edges are dash-lined). So, building a minimum spanning tree over subset (6) may cause uncertainty when such cases of the disconnection occur.
Fig. 6. Subset (6) for 47 nodes in Fig. 5 under maximum edge length (7). The subset is a union of the three disconnected graphs. One of these graphs is of just three edges (the close-to-be-sliver triangle in the top right).
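Whether the remaining edges form a single graph or several disconnected graphs can be checked by a standard breadth-first search. The sketch below is one possible way to do it; the function name `connected_components` and the toy edge list are assumptions.

```python
from collections import defaultdict, deque


def connected_components(num_nodes, edges):
    """Split nodes 0..num_nodes-1 into connected components of an undirected graph."""
    adjacency = defaultdict(list)
    for j, k in edges:
        adjacency[j].append(k)
        adjacency[k].append(j)

    seen, components = set(), []
    for start in range(num_nodes):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical edges left after the removal; node 5 has lost all its edges.
kept_edges = [(0, 1), (1, 2), (3, 4)]
components = connected_components(6, kept_edges)
print(components)
```

A node that has lost all its edges forms a component of its own, which matters when the numbers of covered nodes are compared later.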

VI. RUNNING THROUGH ROOT NODES
Let subset (6) be a set of $T$ separate (disconnected) graphs, where graph $t$ is of $W(t)$ edges of set
$$E^{(t)} = \{E_w^{(t)}\}_{w=1}^{W(t)}, \quad t = 1, \dots, T. \tag{8}$$
In a particular case, when subset (6) forms a single connected graph, $T = 1$, $W(1) = U$, and $E^{(1)} = E$. Let the edges in set (8) connect $M(t)$ nodes, i.e., tree $t$ is built over $M(t)$ nodes (10), where the nodes for each tree are indexed separately. Let us denote by $L^{*}(t)$ the total edge length of tree $t$ built for graph $t$ with a root node. Note that $L^{*}(t)$ does not depend on the root node. Inasmuch as the best tree should connect the maximum possible number of nodes, only trees that cover the maximum number of nodes are considered further. Thus, the maximal integer is selected from the set of all possible numbers $\{M(t)\}_{t=1}^{T}$, and set
$$\mathcal{T}_{\max} = \arg\max_{t = 1, \dots, T} M(t) \tag{12}$$
contains all indices of trees that cover the maximum number of nodes. Therefore, the maximum number of nodes is
$$M_{\max} = M(t), \quad t \in \mathcal{T}_{\max}. \tag{13}$$
Then the minimum of total edge length
$$L^{**} = \min_{t \in \mathcal{T}_{\max}} L^{*}(t) \tag{14}$$
is found, and the tree attaining it is the solution. In the particular case of $T = 1$, no maximum by (12) is searched because it is already known as $M_{\max} = M(1)$. In other cases, when $T > 1$, the maximum number of covered nodes is found by (12) and (13), whereupon the set of all tree lengths $\{L^{*}(t)\}_{t \in \mathcal{T}_{\max}}$ is determined for (14) by running through $T_{\max} = |\mathcal{T}_{\max}|$ root nodes.
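The selection described in this section can be sketched as a two-stage procedure: a minimum spanning forest is built over the pruned edges, and then the tree covering the maximum number of nodes with the minimal total edge length is picked. This is a hedged illustration rather than the authors' code. Kruskal's algorithm is swapped in for building the forest, the function names `spanning_forest` and `select_tree` are hypothetical, and the tree identifier here is the internal union-find representative, not the root node in the sense of the paper.

```python
def spanning_forest(num_nodes, weighted_edges):
    """Kruskal's algorithm; returns {representative: (node_count, total_length)}."""
    parent = list(range(num_nodes))
    size = [1] * num_nodes       # nodes covered by the tree of each representative
    length = [0.0] * num_nodes   # total edge length of that tree

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for w, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            size[rv] += size[ru]
            length[rv] += length[ru] + w
    return {r: (size[r], length[r]) for r in range(num_nodes) if parent[r] == r}


def select_tree(trees):
    """Maximize the number of covered nodes first, then minimize the tree length."""
    m_max = max(count for count, _ in trees.values())
    candidates = {r: total for r, (count, total) in trees.items() if count == m_max}
    best = min(candidates, key=candidates.get)
    return best, m_max, candidates[best]


# Hypothetical pruned graph: two 3-node components and one isolated node (6).
weighted_edges = [(2.0, 0, 1), (3.0, 1, 2), (4.0, 0, 2), (1.0, 3, 4), (1.5, 4, 5)]
trees = spanning_forest(7, weighted_edges)
best, m_max, best_length = select_tree(trees)
print(trees, best, m_max, best_length)
```

In this toy case both 3-node trees cover the maximum number of nodes, so the shorter one (total length 2.5 against 5.0) is selected.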
To compare how the suggested method performs versus solving the problem by straightforwardly (randomly or taking the first tree) picking a tree out of $T_{\max}$ trees (when $T_{\max} > 1$), the number of initial nodes is varied as
$$N \in \{10, 20, 30, 40, 50, 60, 70, 80\}. \tag{15}$$
The maximum edge length is set by (16) using a constraint factor
$$\beta \in \{1, 1.1, 1.2, 1.3, 1.4, 1.5\}. \tag{17}$$
For $\beta = 1$ the maximum edge length by (16) is the average of the initial set of edge lengths (5). The node locations are generated by (18), where $\theta_i$, $\xi_i$ are values of two independent random variables distributed uniformly on the open interval $(0; 1)$ and $\vartheta_i$, $\zeta_i$ are values of two independent random variables distributed normally with unit variance and zero mean [36], [37]. For every triple (19) the problem is regenerated 100 times. An example of a problem with 50 initial nodes for $\beta = 1$ has two separate trees, and both cover 25 nodes. However, one tree has the total edge length of 209.8151 (Fig. 7), whereas the other one has the total edge length of 196.6618 (Fig. 8), being 6.269 % shorter. In this example, the number of disconnected trees covering a maximum number of nodes is 2: $T_{\max} = 2$. In the 4800 results of the computational set-up by (15)-(18), the number of disconnected trees covering a maximum number of nodes varies between 1 and 5: $T_{\max} \in \{1, 2, 3, 4, 5\}$.
Fig. 8. The minimum spanning tree of the other disconnected graph for the problem with 50 nodes and $\beta = 1$ in Fig. 7. The tree covers the other half of the nodes ($M_{\max} = 25$), but its total edge length is 196.6618. This tree is 6.269 % shorter than that in Fig. 7.
There are 4523 problem instances with $T_{\max} = 1$ and 233 instances with $T_{\max} = 2$. The tree triple covering a maximum number of nodes ($T_{\max} = 3$) has occurred rarely, just 34 times, and the tree quadruple ($T_{\max} = 4$) has been registered 8 times. The rarest instance of five trees covering a maximum number of nodes ($T_{\max} = 5$) occurred only 2 times, both under the severest maximum edge length constraint ($\beta = 1.5$): one with 10 nodes and the other one with 30 nodes (Fig. 9). The latter is solved to a tree whose length is 21.6192, while the lengths of the other four trees are 24.4593, 24.6687, 28.8194, 32.6425 (listed in ascending order of their least root nodes). Without considering the root node change, the average loss in the total edge length would be
$$\frac{21.6192 + 24.4593 + 24.6687 + 28.8194 + 32.6425}{5 \cdot 21.6192} \approx 1.2231.$$
In general, the average and maximal losses relevant for those 277 instances of $T_{\max} > 1$ are respectively calculated as
$$\bar{\lambda} = \frac{1}{T_{\max} L^{**}} \sum_{t \in \mathcal{T}_{\max}} L^{*}(t) \tag{20}$$
and
$$\lambda_{\max} = \frac{1}{L^{**}} \max_{t \in \mathcal{T}_{\max}} L^{*}(t) \tag{21}$$
for every triple (19), where $\mathcal{T}_{\max}$ is the set of indices of the $T_{\max}$ trees covering the maximum number of nodes and $L^{**} = \min_{t \in \mathcal{T}_{\max}} L^{*}(t)$ is the solution tree length by (14).
Fig. 9. The instance of 30 nodes and $\beta = 1.5$ (the severest maximum edge length constraint), for which the minimum spanning tree (highlighted bold) covers 5 nodes ($M_{\max} = 5$). The tree has the total edge length of 21.6192, while the other four trees covering 5 nodes have lengths 24.4593, 24.6687, 28.8194, 32.6425. These trees are easily spotted because each of them comprises 4 edges. The remaining, sixth, tree comprises just 2 edges, and therefore $T_{\max} = 5$ here. Compared to the solution tree, the four trees are longer by 13.137, 14.1055, 33.3045, and 50.9887 %, respectively. These are indeed very significant differences, so determining the shortest tree in this instance would be quite important and efficient in practice.
Average losses (20) averaged over 100 instances (19) are presented in Table I. Here and below, maximal values over the number of nodes are highlighted bold; maximal values over the constraint factor are highlighted dim. These values show that the average loss tends to increase as the maximum edge length constraint is made severer. The rightmost column, however, shows that the average loss decays as the number of nodes increases. Nevertheless, the overall average (here and below highlighted with a larger font) implies that, without solving problem (14), the solution tree is more than 1 % longer on average. The worst-case scenario cannot be excluded either. Thus, Table III presents maxima of average losses (20) over 100 instances (19). The bold-and-dim pattern, slightly differing from that in Tables I and II, fits the abovementioned loss trends. The greatest loss has occurred for an instance of 20 nodes and $\beta = 1.1$; the minimal maximum of average losses (20) has occurred for an instance of 60 nodes and the same constraint factor. The overall average implies that, without solving problem (14), the solution tree is more than 42 % longer in the worst case of the average loss. Table IV, presenting maxima of maximal losses (21) over 100 instances (19), has the same worst-case scenario pairs of $N$ and $\beta$. The worst loss is 5.217, but there are three trees ($T_{\max} = 3$) covering just 4 nodes ($M_{\max} = 4$). More practically valuable are the instances with 40 to 80 nodes, where more nodes are eventually covered by the best tree (just like the instance illustrated in Figs. 7 and 8), although the potential loss is not that great. Such an instance is illustrated in Fig. 10 for 70 nodes and $\beta = 1.3$, where the best tree covers 15 nodes having the length of 70.6549 (the lengths of the other two trees covering the same number of nodes are 78.7802 and 91.191, listed in ascending order of their least root nodes; the trees are shown in Fig. 11 in a simplified presentation style). Herein, the maximal loss is 1.2907 and the average loss is 1.1352 (both are seen in Tables IV and III), which are very significant if solving problem (14) is ignored. In fact, there are many such examples among those 27 instances. The bold-and-dim patterns in Tables III and IV also coincide. The overall average is 1.8077, i.e., the averaged maximum of maximal losses exceeds 80 %, which is quite significant.
All those results in Tables I-IV remain roughly the same in their trends if the part of the normal distribution in generating node locations is increased. If the part is increased 10 times by (22), there are 4592 problem instances with $T_{\max} = 1$, 186 instances with $T_{\max} = 2$, 17 instances with $T_{\max} = 3$, and 3 instances with $T_{\max} = 4$. The rarest instance of five trees covering a maximum number of nodes ($T_{\max} = 5$) occurred only 2 times, both for 10 nodes, but under $\beta = 1.4$ and $\beta = 1.5$, unlike the instances generated by (18). The overall averages are less than those in Tables I-IV (now they are 1.0092, 1.0175, 1.3204, 1.5908, respectively), but the general pattern described by Tables I-IV remains.
For larger sets of initial nodes this pattern remains as well, but the losses are weaker and rarer. Thus, when the number of initial nodes is varied by (23) under constraint factor (17) and node generation (22), and the problem is regenerated 100 times for every triple (19), the overall averages of average losses (20) and of maximal losses (21) are 1.0009 and 1.0018, being far less than those in Tables I and II. The overall averages of maxima of average losses (20) and of maxima of maximal losses (21) are 1.045 and 1.0899, though. Thus, in the worst-case scenario of a larger initial set of planar nodes, the worst average loss is about 4.5 %, whereas the worst maximal loss (on average) is about 8.99 %. The relationship between these two overall averages is almost the same as it is for problems with (15) generated by (18) and problems with (15) generated by (22): the worst maximal loss percentage is nearly twice as great as the worst average loss. For problems with (23) generated by (22), however, the worst average and worst maximal losses are about 10 times less than the overall averages in Tables III and IV. Nevertheless, there is an instance of 400 nodes under $\beta = 1.5$ (Fig. 12), for which the maximal loss by (21) is 1.324 (or, in percentage terms, 32.4 %). This is the instance with $T_{\max} = 2$, where the solution tree length is 65.0188, and the other tree has the length of 86.083. Each of the trees covers 33 nodes ($M_{\max} = 33$), and eliminating the 32.4 % loss is an efficient operation. The average loss by (20) is, obviously, 1.162 (its respective percentage of 16.2 % is half of the maximal loss percentage because there are only two trees).
Another instance with an impressive potential loss is a problem of 100 nodes under $\beta = 1.3$, where three trees cover 15 nodes each, having various lengths (Fig. 13). Although the maximal loss elimination here is 12.2 %, the solution tree is longer than that in Fig. 12, and it covers a relatively greater number of nodes compared to the trees in Fig. 12.
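The loss values quoted in this section can be reproduced from the reported tree lengths, assuming that the average loss is the mean length of the trees covering the maximum number of nodes divided by the shortest of them, and that the maximal loss is the longest length divided by the shortest one. Under this assumption, the sketch below recovers the values stated for Figs. 10 and 12; the function names are hypothetical.

```python
def average_loss(lengths):
    """Mean tree length relative to the shortest tree (random tree selection)."""
    return sum(lengths) / (len(lengths) * min(lengths))


def maximal_loss(lengths):
    """Longest tree length relative to the shortest tree (worst tree selection)."""
    return max(lengths) / min(lengths)


# Tree lengths as reported for Fig. 10 (70 nodes) and Fig. 12 (400 nodes).
fig10 = [70.6549, 78.7802, 91.191]
fig12 = [65.0188, 86.083]

print(round(average_loss(fig10), 4), round(maximal_loss(fig10), 4))  # 1.1352 1.2907
print(round(average_loss(fig12), 3), round(maximal_loss(fig12), 3))  # 1.162 1.324
```

A loss of 1 means that every candidate tree is as short as the solution tree, so nothing is gained by the selection.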

VII. DISCUSSION
Determining the number of disconnected graphs, upon removing the edges whose length exceeds $l_{\max}$, begins with finding a first graph ($t = 1$) and checking whether every next node (apart from the nodes in this graph) belongs to this graph. If there are no disconnections, i.e., subset (6) forms a single connected graph, the problem is solved by building the (single possible) minimum spanning tree, where any of the known efficient methods can be used. This case appears to be very likely for loose maximum edge length constraints. As the constraint is made severer, the disconnections are inevitable, and the likelihood of two or more disconnected graphs grows. The probability of two or more trees covering the same maximum of nodes (when $T_{\max} > 1$) is greater for fewer initial nodes. This can also be indirectly inferred from Tables I-IV.
The suggested method first removes the edges that are too long, and then it builds either minimum spanning trees over disconnected graphs or the single minimum spanning tree if there are no disconnections. However, might the trees be built first, followed by removing the edges that are too long? No, it would be unreasonable because in this case the removal might tear some trees already built, whereupon the trees would have to be rebuilt.
It may seem that (12) and (14) constitute a two-criterion optimization problem. In this problem, firstly, the number of nodes covered by a minimum spanning tree is maximized, and only then the tree length is minimized. Might these operations be accomplished in the inverse order? The answer is negative because these operations are not of the same priority. Obviously, a minimum spanning tree which would connect the maximum possible number of nodes is to be built. This is the prime purpose. The operation of the tree length minimization is run only if there are two or more trees each of which connects the same (maximum) number of nodes.
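The priority of the two criteria amounts to a lexicographic ordering, which can be illustrated with the two 25-node trees of Figs. 7 and 8. The third entry below is a hypothetical shorter tree covering fewer nodes; it is not part of that instance and is added only to show what the inverse order would select.

```python
# (covered nodes, total edge length) per candidate tree.
trees = {
    "Fig. 7": (25, 209.8151),
    "Fig. 8": (25, 196.6618),
    "hypothetical": (12, 50.0),  # assumption: a short tree covering fewer nodes
}

# Intended order: maximize the covered nodes first, then minimize the length.
best = min(trees, key=lambda name: (-trees[name][0], trees[name][1]))

# Inverse order: minimize the length first, which ignores the coverage purpose.
inverse = min(trees, key=lambda name: (trees[name][1], -trees[name][0]))

print(best, inverse)
```

With the inverse order the shortest tree wins regardless of how few nodes it covers, which contradicts the prime purpose of the maximum coverage.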
The suggested method builds the best minimum spanning tree under the maximum edge length constraint with respect to planar nodes by using Euclidean metric (4) to calculate graph weights (5). In fact, this is the Euclidean minimum spanning tree on the plane. Nevertheless, it does not mean that the suggested method is limited to the Euclidean metric. Any other metric can be used to calculate weights of $Q$ edges (2), and the method will still be valid. Moreover, the edge weight does not necessarily have to be tied to its length. Then, generally speaking, the suggested method will build the best minimum spanning tree under a maximum edge weight constraint with respect to planar nodes, whichever metric is used. The point is that using the Euclidean metric is very common in practice to evaluate the edge weight, which is a linear function of the edge length. In the considered model, the linear function is formally the nonscaled direct proportion without bias, by which distances (5) are taken as weights. However, even if the function that maps the edge length into its weight is nonlinear, it nonetheless relies on the Euclidean metric. This is why the constraint of the maximum edge length is pinned on the model. In other words, the suggested method will still build the best minimum spanning tree, whichever approach is used to obtain the weights of $Q$ edges (2).
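That the method does not depend on a particular metric can be illustrated by passing the metric as a parameter of the weighting step. The Manhattan metric, the function names, and the toy data below are assumptions chosen for the example; the study itself uses the Euclidean metric only.

```python
import math


def euclidean(p, q):
    return math.dist(p, q)


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def weigh_and_prune(points, edges, w_max, metric=euclidean):
    """Weigh every edge with the given metric and drop those heavier than w_max."""
    weighted = [(metric(points[j], points[k]), j, k) for j, k in edges]
    return [(w, j, k) for w, j, k in weighted if w <= w_max]


# Hypothetical nodes and edges (not taken from the paper).
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)]
edges = [(0, 1), (1, 2)]

by_euclidean = weigh_and_prune(points, edges, w_max=5.0)
by_manhattan = weigh_and_prune(points, edges, w_max=5.0, metric=manhattan)
print(by_euclidean, by_manhattan)
```

The same threshold removes a different set of edges under a different metric, so the subsequent tree building and selection stay unchanged while the resulting trees may differ.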
The study is a scientific contribution to the domain of graph theory in combinatorial optimization. It further extends and supplements the theory of minimum spanning trees, whose practical importance is very high. The main drawbacks are the additional identification of disconnected trees (if any) and ignoring potentially efficient routes based on trees covering fewer than $M_{\max}$ nodes. For instance, if the best minimum spanning tree is 10 % longer than a minimum spanning tree covering $M_{\max} - 1$ nodes, this may induce vagueness in making a decision on the solution tree for $M_{\max} > 10$. Such an issue is to be addressed in further research.

VIII. CONCLUSION
In order to build a minimum spanning tree connecting the maximum possible number of planar nodes without exceeding the maximum edge length, an approach has been suggested that successively involves triangulating the initial nodes, removing the edges that are too long, determining the disconnected graphs, selecting the minimum spanning trees covering a maximum of nodes, and selecting the tree whose length is minimal. The application of this operation sequence is more relevant for relatively small sets of initial nodes with severe maximum edge length constraints. If the last operation in the suggested method is not accomplished, but the tree selection is random, the solution tree becomes 1.17 % longer on average (see Table I) with the worst average case of 42 % (see Table III) for 10 to 80 nodes. If the worst tree is selected (the tree covering the maximum of nodes, but being the longest among the trees covering the same number of nodes), the solution tree becomes 2.33 % longer on average (see Table II). In the worst case, the loss exceeds 80 % (see Table IV).
In addition to the optimal coverage, the suggested method also has a strong relation and practical application to the metric facility location problem [3], [4], [38], [39], where planar objects are considered. The study can be advanced in further research on building minimum spanning trees for three-dimensional nodes. Minimum spanning trees of spatial graphs have practical application as well [40], [41]. The case when the node is of four or more coordinates is a matter of generalized research in this direction.

Fig. 2. An example of triangulating another set of 36 nodes, where the slivering is not avoidable. Here are (at least) four extremely sliver triangles (the arrows point at them), which are visually almost stretched into their edges.

Fig. 4. A set of 125 edges (2) is a result of triangulating another set of 44 nodes ($Q = 125$). Unlike that in Fig. 3, a few sliver triangles are noticeable here. It is also noticeable that the set of these 44 nodes is roughly inscribed into a quadrangle that is close to a rectangle. The quadrangle is the convex hull of the node set. In general, as the convex hull of the node set becomes more rectangle-shaped, the number of edges in set (2) tends to increase.

Fig. 7. The minimum spanning tree (highlighted bold; the removed edges are dash-lined) of one of two disconnected graphs for a problem with 50 nodes ($N = 50$) and $\beta = 1$. The tree covers exactly a half of the nodes ($M_{\max} = 25$) and its total edge length is 209.8151.

Fig. 10.The instance of 70 nodes and β = 1.3, for which the minimum spanning tree (highlighted bold) covers 15 nodes (Mmax = 15).The tree has the total edge length of 70.6549, while the other two trees (shown in Fig. 11) covering 15 nodes have lengths 78.7802 and 91.191.Compared to the solution tree, the two trees are longer by 11.5 and 29.07 %, respectively.

Fig. 12. The problem of 400 nodes and $\beta = 1.5$, where $T_{\max} = 2$ (the solution tree is on the left; these two trees are shown along with initial edges, but the removed edges are not highlighted to simplify the visualization). Each of the two trees covers 33 nodes ($M_{\max} = 33$), which is 8.25 % of the number of initial nodes. The solution tree total edge length is 65.0188. The other tree total edge length is 86.083, i.e., this tree is 32.4 % longer. If the tree selection is random, the potential loss is interpreted as the average one by (20), and it is 1.162 (or 16.2 %).

Fig. 13. The problem of 100 nodes and $\beta = 1.3$, where $T_{\max} = 3$ (these trees are shown along with initial edges, but the removed edges are not highlighted). Each of the three trees covers 15 nodes ($M_{\max} = 15$), which is 15 % of the number of initial nodes. Thus, the trees cover a relatively greater number of nodes compared to the trees in Fig. 12. The total edge lengths of the trees, from the left to the right, are 78.7003, 72.0263, and 70.1428 (the upper tree is the solution). Compared to the solution tree, the two trees are longer by 12.2 and 2.69 %, respectively. The longest tree is on the left, but it is hardly recognizable visually. The maximal loss is 1.122 (or 12.2 %). The average loss by (20) is $(78.7003 + 72.0263 + 70.1428) / (3 \cdot 70.1428) \approx 1.0496$.