parser-implementation
parser-implementation
| Home | Algorithms |
A simple way to represent a mathematical expression into a form a calculator can easily evaluate is to represent it as a Binary
Expression Tree. This is a data structure based on the concept of a node represented by a circle containing an item (or value or ID) and
which may have a left-child, right-child or parent node. A child node extends down its parent. Representing the expression 5-6/2+3*4
gives:
In order to evaluate this expression, a post-order traversal of the tree is done. That is, starting from the root node, child nodes are first
evaluated recursively before their parent node is evaluated. So the tree is evaluated starting from the bottom all the way up to the top.
This gives:
(5-(6/2))+(3*4) = (5-3)+(3*4) = 2+12 = 14
This document presents a new, simple and very efficient iterative algorithm to parse a mathematical expression into a Binary Expression
Tree. The expression is given in the generally used form also known as infix notation. For an expression of size n, the performance is
O(n) and with very small implementation constant. The algorithm is a type of operator-precedence parser. It is originally developed to
parse the equation of a graph in the Graph Plotter 3D software.
Contents:
1. The basic idea
1. Get the first item and initialise the tree with it.
2. Currently the root node is also the current node. The current node is the node we currently lie on.
3. Get the next item. This will be called the new item.
4. Climb up the tree as long as the precedence of the current node item is greater than or equal to that of the new item. When this is
over, the current node item has a precedence strictly less than that of the new item.
5. Create a new node for the new item. Then set the left child of the new node to be the old right child of the current node. Finally set
the new right child of the current node to be the new node (also set the parent of the new node).
It can be observed that 1) we never move down the tree and 2) the current node is always on the right of its parent node. Notice that
steps 4 and 5 form the basic idea behind the algorithm. In step 4, if at the root node then a node is imagined to hold the root node as its
right child. Later we shall see that this imaginary node is in fact the open bracket node. In step 5 if there is nothing to set then nothing is
1 of 7 2017/09/24, 22:48
Expression parsing algorithm http://rhyscitlema.com/algorithms/expression-pa...
set.
Now consider the following list of operators and their precedence levels:
A number is given highest precedence. By applying the algorithm, below are the steps to parse the expression 5-6/2+3*4.
Step 4: Climb up the tree as long as the precedence of the current node item is strictly greater than that of the new item. When this
is over, the current node item has a precedence less than or equal to that of the new item.
Step 5: Set the new right child of the parent node to be the old right child of the current node, then delete the current node.
Step 6: Set the current node to be the parent node.
There is no need to set the new parent of the right child node because we never move down the tree, but do set it so to maintain tree
structure. Notice that the '(' node never has a left child.
2 of 7 2017/09/24, 22:48
Expression parsing algorithm http://rhyscitlema.com/algorithms/expression-pa...
Lets add the factorial '!' operator to our list of operators and give it a precedence of 6. Therefore it has a higher precedence than the
exponent '^'. Also it has only a left-operand to operate on. For example in 4!*2 = (4!)*2, the factorial of the operand '4' is evaluated. It is
inserted in the tree in the original basic way.
3 of 7 2017/09/24, 22:48
Expression parsing algorithm http://rhyscitlema.com/algorithms/expression-pa...
Lets also try to fit in the following three logical operators into our list of operators:
operator or with precedence 1.2
operator and with precedence 1.4
operator not with precedence 1.6
Notice that here the and operator is made to have higher precedence than the or operator.
The logical operators and and or are both associative. For example the expression (2=7 and 9=9 and 15=11) can be evaluated as ((2=7
and 9=9) and 15=11) or as (2=7 and (9=9 and 15=11)). The order does not matter. Therefore the algorithm as initially stated can be
applied.
The logical operator not has only a right-operand to operate on. For example in the expression (9=9 and not 2=7) same as (9=9 and (not
2=7)), the result from 2=7, which is false, is negated to true. So the whole expression evaluates to (true and true) = true. This operator
is inserted in the same way as the negative operator.
The comparison operators are non-associative. That is the expression (4=a=9) cannot be evaluated as ((4=a)=9) nor as (4=(a=9)). The
reason is that 4=a evaluates to either true or false, but then the expression (true=9) is invalid. Actually what we will want is for the
expression to evaluate as (4=a and a=9). Similarly we will want the expression (a<b<c<d) to evaluate as (a<b and b<c and c<d). This
requires a more advanced tree structure.
Therefore the algorithm is given n items, creates n nodes for each of these items, may visit a node again only once. This gives an
algorithmic performance of O(2n). However step 5 involves a larger implementation constant than step 4.
Correctness:
The necessary and sufficient condition for the constructed binary expression tree to be valid is that the tree respects the precedence
orders of operations, which in turn will ensure proper order of evaluation of the mathematical expression. Therefore the algorithm is
correct if it guarantees a valid constructed tree.
Respecting precedence orders implies that items with higher precedence levels must be found lower down the tree. Notice that this is
exactly what step 4 of the algorithm ensures when it climbs up. Then step 5 follows by placing these items on a left sub-tree, which is
never visited again since the current node is always on the right of its parent. With these two steps combined, it is guaranteed that a
post-order traversal of the constructed tree will follow proper order of evaluation.
Some operators exhibit the important behaviour of skipping step 4 of the algorithm. This simply goes by the nature of the operators
themselves. That is, for the '(' and '-ve' operators for example, their first encounter implies highest precedence on insertion, which
implies skipping step 4.
/*
To compile with GCC, execute:
gcc -Wall -ansi -pedantic -o output expression-parsing-algorithm.c -lm
*/
#include <stdio.h>
#include <string.h>
#include <malloc.h>
#include <math.h>
4 of 7 2017/09/24, 22:48
Expression parsing algorithm http://rhyscitlema.com/algorithms/expression-pa...
print_expression_tree(tree);
result = evaluate_expression_tree(tree);
delete_expression_tree(tree);
Below is the portion of the source code that implements the algorithm.
if(info != SkipClimbUp)
{
/* step 4: climb up */
if(info != RightAssociative)
{
/* for left-associative */
while(current->pre >= item.pre)
current = current->parent;
}
else
{
/* for right-associative */
while(current->pre > item.pre)
current = current->parent;
}
}
if(item.ID == CLOSEBRACKET)
{
/* step 5.1: remove the '(' node */
node = current->parent;
node->right = current->right;
if(current->right) current->right->parent = node;
5 of 7 2017/09/24, 22:48
Expression parsing algorithm http://rhyscitlema.com/algorithms/expression-pa...
the worst case it is O(n). Looking at the original algorithm which uses the binary expression tree, an important observation can be made,
which forms the basic idea behind the stack based algorithm: the left sub-tree can be evaluated during parsing.
It turns out that this stack-based version is essentially the Precedence Climbing algorithm, for the reason that it considers a binary tree
structure. The tree-based version can however be designed with a more advanced tree structure, such as would be required to parse
source code inside a compiler, such as is implemented in the Rhyscitlema computer application.
...to be updated...
As may be noticed from the source code below, this section is harder to explain...
Consider donating at http://rhyscitlema.com/donate to encourage improvement!
/*
To compile with GCC, execute:
gcc -Wall -ansi -pedantic -o output expression-parse-and-evaluate.c -lm
*/
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <math.h>
Below is the portion of the source code that implements the algorithm.
left = (current-1)->number;
right = (current+1)->number;
switch(current->ID)
{
case POSITIVE: ( current)->number = +right; break;
case NEGATIVE: ( current)->number = -right; break;
case PLUS: (--current)->number = left + right; break;
case MINUS: (--current)->number = left - right; break;
case TIMES: (--current)->number = left * right; break;
case DIVIDE: (--current)->number = left / right; break;
case EXPONENT: (--current)->number = pow(left, right); break;
case FACTORIAL: (--current)->number = factorial(left); break;
case SIN: ( current)->number = sin(right); break;
case COS: ( current)->number = cos(right); break;
case TAN: ( current)->number = tan(right); break;
default: break;
}
current->ID = NUMBER; /* force node to be a number */
current->pre = 10; /* set to highest precedence */
current--; /* prepare for next iteration */
}
switch(item.ID)
{
case CLOSEBRACKET:
*current = *(current+1);
/* current-=1; this part is optional */
break;
6 of 7 2017/09/24, 22:48
Expression parsing algorithm http://rhyscitlema.com/algorithms/expression-pa...
case PLUS:
case MINUS:
case TIMES:
case DIVIDE:
case EXPONENT:
case FACTORIAL: current+=2; break;
default: current+=1; break;
}
}
if(item.ID != CLOSEBRACKET)
*current = item;
return current;
}
Links
1. http://en.wikipedia.org/wiki/Binary_expression_tree
2. https://en.wikipedia.org/wiki/Tree_traversal
3. https://en.wikipedia.org/wiki/Infix_notation
4. https://en.wikipedia.org/wiki/Time_complexity#Linear_time
5. http://en.wikipedia.org/wiki/Operator-precedence_parser
6. https://github.com/rhyscitlema/librfet
See also
Evaluate any math expression online
Convert sorted list to complete BST
Generate nth palindromic number
7 of 7 2017/09/24, 22:48