Using Top Trees For Easy Programming of Tree Algorithms
Using Top Trees For Easy Programming of Tree Algorithms
Using Top Trees For Easy Programming of Tree Algorithms
Algorithms
Michal Vajbar
1 Introduction
There exist many algorithms that work over tree graphs. Any of them can use
different data structures to represent the same tree. If we need to run several of
these algorithms over one forest together each one could use its own representa-
tion of the forest. Moreover, if the forest dynamically changes over the time by
edge insertions and deletions all these changes have to be reflected into the data
structures of all algorithms. This is not efficient. Thus several data structures
have been proposed to solve this problem. Any of them is good at some prop-
erties, but each has a weak point. It appears that top trees provide the most
acceptable trade-off. In this paper, we present a complex solution that allows easy
using of top trees. The solution is built on our implementation of the structure [7]
and it brings Top Tree Friendly Language (TFL) and Top Tree Query Language
(TQL). The TFL is a special programming language which combines declarative
and procedural approaches that results in simpler and faster algorithm designing.
The query language TQL provides easy top trees administration.
The paper is organized as follows. In Section 2 we introduce top trees. Used
implementation of top trees is shortly presented in Section 3. Section 4 introduces
the TFL programming language and shows how to use it. In Section 5 we present
the query language TQL. Final remarks are made in Section 6.
K. Richta, J. Pokorný, V. Snášel (Eds.): Dateso 2009, pp. 68–79, ISBN 978-80-01-04323-3.
Using Top Trees For Easy Programming of Tree Algorithms 69
2 Top Trees
Top trees are a dynamic self-adjusting data structure that was proposed by
Alstrup et al. [1]. A top tree R is an ordinary binary tree with a root. It is used to
represent a tree graph T with defined information that some tree algorithm works
with. The structure R consists of clusters. The edges of the T form basic clusters
and two clusters connected through a common vertex create other cluster. So the
R records a way of clusters connecting into one root cluster that represents whole
T . Each cluster holds information of appropriate part of the T . A manner how
the information is computed during joining and splitting of clusters characterizes
used algorithm. That is a simplified idea. In the rest of this section, the structure
is described more precisely.
(a) (c)
Fig. 1. The cases of joining two neighbouring clusters into the parent cluster (assumed
from [1]). The • are the boundary vertices of the parent cluster. The ◦ are the boundary
vertices of children clusters that did not become the boundary vertices of the parent.
The dashed line presents the cluster path of the parent cluster. Moreover, there exist
symmetric variants for (b) and (c).
Algorithms working over top trees associate information that they need with
clusters or vertices. The local operations describe how to deal with the informa-
tion during the changes. Then to describe the algorithm it is enough to define
the manner of the local operations.
To keep the structure in a consistent state it is possible to change information
only in the root cluster or in the external boundary vertex. So the top tree must
be rebuilt by the expose operation when we want to change information saved in
the cluster specified by a path v . . . w or by a vertex v. Then the path becomes
the root path and specified vertices become external boundary vertices. The
structure does not allow to change any information outside of the root cluster.
Tree algorithms can be classified as local or nonlocal. The local algorithms deal
with local properties of the tree. The local property is defined in the following
way: if the property holds for an edge or a vertex in the whole tree T then
it holds for the same edge or the vertex in any subtree of T . An example can
be the searching for the heaviest edge or the finding out the distance between
two vertices. The local properties can be computed easily in bottom-up manner
in the top tree. The nonlocal algorithms are different. They deal with nonlocal
properties: this kind of property can be held by one edge (or a vertex) in the
whole tree and by another one in any subtree. For example the searching for the
center or the median of the tree. Evidently, the using of top trees for nonlocal
algorithms is more difficult than for local algorithms.
Top trees have to support a fourth operation called select to enable nonlocal
algorithms running. This operation is very complex so only basic idea is showed
here. The detailed description can be found in [6]. The select gets a top tree R and
it picks one of the children clusters of the root cluster. It is important to realize
that the R represents the whole tree T . The operation works recursively and it
is similar to the binary searching. One child of the root cluster is selected and
this child becomes the root for next iteration. The top tree is rebuilt to prepare
the new root after the selection (details in [6]). So the new root still represents
the whole tree T . The finding finishes when a basic cluster that represents one
edge is found. This cluster is the intersection of all selected clusters.
3 Used Implementation
during the local operations as their names imply. In the root blocks, there can
be used only the blocks of the second level called auxiliary blocks. The var and
the selectQuestion are exceptions without the local blocks.
The auxiliary blocks contain the procedural tools of the TQL language that
are depicted in the section 4.4. They can be read as the sequences of the com-
mands that are executed if some condition holds (e.g. the cluster represents a
path). The blocks are explained in more detail in the following subsection.
The root block var. The var is a special kind of the root block. There are
declared global variables that can be used in any other root block. The syntax of
the declaration is the same as in the vertex block. Arrays are not allowed there.
The root blocks create and destroy. These blocks describe the algorithm
during the local operations of the corresponding names. It is necessary to dis-
tinguish if the cluster represents a path or if it is a point cluster. So there are
two auxiliary blocks:
path – the current base cluster is a path cluster
point – the current base cluster is a point cluster
A content of the block is executed if the condition above holds. These auxiliary
blocks can occur in any order but each at most once.
74 Michal Vajbar
The root blocks join and split. The blocks describe the algorithm during
the appropriate local operations. There have to be considered the types of the
parent and both children clusters. According to these types the behaviour of the
algorithm can be described by following auxiliary blocks:
path child – current descendant of the parent cluster is a path cluster
point child – current descendant of the parent cluster is a point cluster
path parent – parent cluster is a path cluster
point parent – parent cluster is a point cluster
path and path – clusters represent the variant Fig.1a
path and point – clusters represent the variant Fig.1b
point and path – clusters represent a symmetry to the variant Fig.1b
point and point – clusters represent the variant Fig.1e
lpoint over rpoint – clusters represent the variant Fig.1c
rpoint over lpoint – clusters rep. a symmetry to the variant Fig.1c
lpoint and rpoint – clusters represent the variant Fig.1d
A content of the block is executed if the condition above holds. These auxiliary
blocks form three groups: * child, * parent and the variants from the figure 1.
In the join block, the groups have to occur in the mentioned order. In the split,
the order of * child and * parent is switched. Within the scope of the groups
the blocks can occur in any order.
The mentioned orders correspond to a succession that should be observed
during the algorithm designing. When two neighbours are connected by the join
then typically the data from them are prepared at first (* child ) and then the
data of the parent cluster are computed (* parent). If the algorithm needs more
information about the clusters then the blocks from third group can be used. The
procedure is analogical for the split. Firstly the data from the parent cluster are
prepared (* parent) and then the data for the children are computed (* child ).
Eventually, the blocks from the third group can be used of course.
The root block selectQuestion. The last of the root blocks describes the
way how the decision during one step of the select operation proceeds. The
selectQuestion can be seen as the fifth local operation. Its content is slightly
different from the other root blocks. There are no auxiliary blocks and the syntax
follows:
selectQuestion {
/* any code */
if (condition) {
/* any code */
select a;
}
else {
/* any code */
select b;
}
}
Using Top Trees For Easy Programming of Tree Algorithms 75
The command select (a|b); specifies the cluster that is selected. The a (b)
denotes the left (right) child. The usage of the cluster names can be found in
Section 4.5.
The language supports four data types - integer, real, string and boolean. It
should be sufficient for the most of tree algorithms. The string represents the
sequences of characters. The values must be surrounded by the quotation marks.
The boolean type was included to allow working with logical values true and
false.
The integral numbers are represented by the integer data type and the real
numbers by the real. The language does not enable a casting between these
two data types. The both numeral types support positive and negative infinity -
IpINF and ImINF for the integer and RpINF and RmINF for the real. This differen-
tiation between the types of the infinities results from the using of the auxiliary
variables. It is described in the Section 4.5. The work with the infinities abides
by the standard IEEE 754 [3].
The variables declarated in the vertices and the clusters are initialized by the
default values according to the data types. A null string "" is the default for the
string and the false for the boolean. The integer is initialized by integral zero
0 and the real by decimal zero 0.0.
By the procedural tools we try to cover usual needs of the algorithm designing.
We have implemented a lot of tree algorithms over the top trees and it has
revealed that there are only two essential commands. The first is the assignment
of a value into a variable and the second is if-else construction. It seems to be very
little, but it is not. There is no need to support more complicated constructions
like for -loops or while-loops and the need of switch-block can be substituted by
the if-else very easily.
The syntax of the if-else is the same as in other programming languages.
The construction can contain any number of elseif parts and the else part is not
necessary:
if (condition1) {
/* any code */
} elseif (condition2) {
/* any code */
} else {
/* any code */
}
operator & for string concatenation. The language also offers the logical AND &&
and the OR ||. The priorities of the operators are ordinary and the order can
be specified by parentheses (, ). The summary of operators with their priorities:
There are two kinds of variables in the TFL. The first kind includes the variables
declared in the vertex and the cluster block. The second kind are auxiliary vari-
ables. The names have to match the regular expression [a-zA-Z][a-zA-Z0-9 ]*.
To make accessing to clusters and vertices easier the labels were set up:
c current cluster
a left child of the current cluster
b right child of the current cluster
child mark for a and b in * child blocks
left left boundary vertex
right right boundary vertex
common common vertex of the children
border the only boundary vertex of the point cluster
The usage of the names employs dot notation similarly to the object-oriented
programming languages:
variable global (declared in var ) or auxiliary variable
cluster.variable auxiliary variable or declared in cluster
cluster.array[vertex] value attached to vertex in cluster array
cluster.vertex.variable auxiliary or declared variable of the vertex
vertex.variable abbreviation for c.vertex.variable
Auxiliary variables. In the root block (excepting the var ), the auxiliary vari-
ables can be used. They do not need to be declared. The TFL learns the type
from the first value that is assigned to the variable. Therefore each auxiliary vari-
able must be initialized by some value. This is the reason for using one type of
the infinity for the integer and another one for the real. The creation of auxiliary
arrays is not allowed.
The usage of the language is very easy as can be seen from the following example
- the finding out the length of a path. The path is specified by two vertices and
Using Top Trees For Easy Programming of Tree Algorithms 77
the technique was described by Alstrup et al. [1]. Each cluster holds its length.
The algorithm needs to implement only the join. If a point cluster is created
then its length is zero. The length of a path cluster is computed as the sum of
the children lengths. The produced source code is very short and simple:
/****** The length of the path ******/
the data types and the unique identifier cannot be used. It is possible to call this
command at any time. When the node is called without the parameters then it
displays the current order and the default values settings.
A new node is created in the following manner:
unique_name (value1, value2, ...) (param1=value3, param2=value4, ...);
The enumeration part must agree with the declaration of the node. In the assign-
ment part, the default values can be overwritten and other parameters defined.
The parts can occur in any order. Any part can be omitted, but at least one has
to be applied.
6 Final Remarks
We have developed a complex solution that allows using top trees for easy pro-
gramming of tree algorithms. The solution is built on our implementation of top
trees [7] and it is formed by Top Tree Friendly Language (TFL) and Top Tree
Query Language (TQL).
The TFL is a declarative programming language that makes the structure
available to users with only a basic knowledge of top trees. The declarativity
saves users from technical details of the implementation and it allows to focus
on the designing of tree algorithms. The TQL is a simple query language that was
developed to control top trees. The language allows the adding of the vertices,
joining and splitting of the edges or working with information stored in the top
trees. In addition, the language TQL can be extended by custom functions.
Both languages form an useful tool that enables to write a simple and short
source code quickly and then easily verify its functionality. So the tool makes
the algorithm designing easier and more comfortable.
References
1. S. Alstrup, J. Holm, K. D. Lichtenberg, and M. Thorup. Maintaining information
in fully dynamic trees with top trees. ACM Trans. Algorithms, 1(2):243–264, 2005.
2. G. N. Frederickson. Data structures for on-line updating of minimum spanning
trees, with applications. SIAM Journal of Computing, 14(4):781–798, 1985.
3. IEEE-Task-P754. ANSI/IEEE 754-1985, Standard for Binary Floating-Point Arith-
metic. IEEE, New York, aug 12 1985. A preliminary draft was published in the
January 1980 issue of IEEE Computer, together with several companion articles.
Available from the IEEE Service Center, Piscataway, NJ, USA.
4. D. D. Sleator and R. E. Tarjan. A data structure for dynamic trees. Journal of
Computer and System Sciences, 26(3):362–391, 1983.
5. R. E. Tarjan and R. F. Werneck. Self-adjusting top trees. In SODA ’05: Proceedings
of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, pages 813–
822, Philadelphia, PA, USA, 2005. Society for Industrial and Applied Mathematics.
6. M. Vajbar. Modelling dynamic trees. Master’s thesis, Department of Software
Engineering, Charles University in Prague, 2008.
7. M. Vajbar and K. Toman. Implementation of self-adjusting top trees. Technical
Report No 2008/3, Charles University, Prague, Czech Republic, 2008.
8. R. F. Werneck. Design and analysis of data structures for dynamic trees. PhD
thesis, Princeton University, Princeton, NJ, USA, 2006. Adviser-Robert E. Tarjan.