Talk:Shunting yard algorithm

Latest comment: 1 year ago by Chorl in topic Better reference for Dijkstra's original?

Problems with algorithm description

edit

There are multiple points in the algorithm's description where the term 'operator' is used when it should be 'token' and other points where an ambiguity is created about whether a parentheses is to be considered an operator. As it stands, the algorithm description is technically inaccurate. — Preceding unsigned comment added by 24.170.37.40 (talk) 11:57, 5 March 2021 (UTC)Reply

https://brilliant.org/wiki/shunting-yard-algorithm/ has a much better description of the algorithm. i recommend it be used as a basis for improving the algorithm description — Preceding unsigned comment added by 24.170.37.40 (talk) 10:59, 6 March 2021 (UTC)Reply
Since it's an external link, I'll reproduce the contents in question here:
While there are tokens to be read:
      Read a token
      If it's a number add it to queue
      If it's an operator
             While there's an operator on the top of the stack with greater precedence:
                     Pop operators from the stack onto the output queue
             Push the current operator onto the stack
      If it's a left bracket push it onto the stack
      If it's a right bracket 
           While there's not a left bracket at the top of the stack:
                    Pop operators from the stack onto the output queue.
            Pop the left bracket from the stack and discard it
While there are operators on the stack, pop them to the queue

Early discussions

edit

--78.50.237.77 (talk) 07:40, 10 May 2012 (UTC)Reply

The code does not compile because of missing cast at line: 223, 226, 236, 240, 243, 257 for "&sc". I know it is implicite casting which is supported by C, but for example gcc doesn't compile.

The right form is

printf("%s;\n",(char *) &sc);

instead of

printf("%s;\n", &sc);

Please correct it.



I think the code of this page is not correct. In particular, the output shown by the example is not correct. Let me copy here the execution example (August 23rd, 2011)

input: a = D(f - b * c + d, !e, g)
output: afbc*-d+e!gD=
order: (arguments in reverse order)
_00 = c * b;
_01 = _00 - f;
_02 = d + _01;
_03 = ! e;
_04 = D(g, _03, _02)
_05 = _04 = a;
_05 is a result

Observe the output of "_01". It should be

_01 = f - _00

The problem is located in the execution_order function. The code that says

sc = stack[sl - 1];
sl--;
printf("%s %c ", &sc, c);
sc = stack[sl - 1];
sl--;
printf("%s;\n",&sc);

should be replaced by

sc = stack[sl - 2];
printf("%s %c ", &sc, c);
sc = stack[sl - 1];
sl--; sl--;
printf("%s;\n",&sc);

In this case the output of the execution is

input: a = D(f - b * c + d, !e, g)
output: afbc*-d+e!gD=
order: (arguments in reverse order)
_00 = b * c;
_01 = f - _00;
_02 = _01 + d;
_03 = ! e;
_04 = D(g, _03, _02)
_05 = a = _04;
_05 is a result


Lgoster (talk) 14:45, 23 August 2011 (UTC)Reply

I think this page needs correction: o1 is associative or left-associative and its precedence is greater than or equal should be less than or equal, and o1 is right-associative and its precedence is less than that of o2 should too. The method works fine when using that modification, but gives strange answers otherwise. The example also follows the rule as less than, not greater than.

Is the complex example correct? I'm new at RPN, but I think I remember my basic arithmetic. When I enter 3 4 2 * 1 5 - 2 ^ / + in an RPN calculator, I get 3.5, but when I calculate 3+4*2/(1-5)^2, I get 0.6875. I'll be really embarrassed if I miscalculated. lomedhi

3+4*2/(1-5)^2
3+4*2/(-4)^2
3+4*2/16
3+8/16
3+0.5
3.5

Italo Tasso 05:32, 24 October 2006 (UTC)Reply

Whoops. I screwed up my order of operations on 3+8/16. I saw it in my head as:

 

I am indeed duly embarrassed.

lomedhi 21:08, 7 November 2006 (UTC)Reply

Abstract syntax tree

edit

How to use this algorithm to produce an abstract syntax tree? exe 01:01, 11 March 2007 (UTC)Reply

Answer:

Every time you push an operator, pop off the necessary number of arguments, so it could look something like this (untested):

   struct node{
       val self;
       int argc;
       node *argv;
   }
   bool ast_add(node *stack,val new_val,int max_len,int len){
       if(max_len)return false;//could realloc
       node n;
       n.argv=(n.argc=val_get_arity(n.self=new_val))&&(node*)malloc(n.argc*sizeof(node));
       if(len<n.argc)
           return false;
       for(int i=0;i<n.argc;++i)
           n.argv[i]=stack[--len];
       stack[len++]=n;
   }

— Preceding unsigned comment added by 99.240.99.150 (talkcontribs) 8 September 2011


While the effort above is appreciated, it is really not much help. Because it's pretty stiff old-school C highly steeped in the idiosyncratic conventions of the C run-time and other compiler jargon terms (like "arity"), it's not really understandable by anyone other than an expert C programmer (most of whom already know this). More problematic is its dependence on the principle of popping off "the necessary number of arguments" (i.e., "arity"), information that the shunting-yard algorithm is not supposed to have nor need.
What's really needed here is an addition to the main article that provides a working pseudocode example of a Shunting-Yard algorithm that produces correct AST output with the information that is available. Otherwise, the claim should be removed. RBarryYoung (talk) 21:26, 25 November 2012 (UTC)Reply
After eight years there is still no substantiation of this unlikely claim, so I have removed it. -- RBarryYoung (talk) 14:27, 28 December 2020 (UTC)Reply
It's not unlikely, its pretty trivial. For example see [1] for a reference. All it takes is modifying the output stage. The output is now a stack of AST nodes. When a number, variable or other terminal is received, push a corresponding node onto the stack. When a binary operator is received, pop two nodes off the stack, and build an new operator node with the two nodes as children. Push this onto the output stack. At the end of the algorithm there should be a single node on the output, this is the root node for the AST.
Stack<Node> output = new Stack<>();
class TerminalNode extends Node { Token tok; TerminalNode(Token t) { tok=t; } }
class OperatorNode extends Node { 
   Token operator; 
   Node left, right; 
   OperatorNode(Token t,Node l,Node r) { tok=t; left=l; right=r;}
}
public void output(Token tok) {
      if(tok is an operator) {
         Node r = output.pop(); // need to pop in reverse order
         Node l = output.pop();
         OperatorNode node = new OperatorNode(tok,l,r);
         output.push(node);
      } else {
         TerminalNode leaf = new TerminalNode(tok);
         output.push(leaf);
      }
}

The above is Java like pseudocode, as you can see there is not much to it. --Salix alba (talk): 19:51, 28 December 2020 (UTC)Reply

Operator precedence??

edit

Hello, I have just implemented this algorithm and I believe that the latest corrections (by anonymous users) to the detailed description might be wrong. I believe it should read

1) while there is an operator, o2, at the top of the stack, and either

o1 is associative or left-associative and its precedence is less than (lower precedence) or equal to that of o2, or

o1 is right-associative and its precedence is less than (lower precedence) that of o2,

Earlier versions of this page seem to have had it this way originally

Also, ^2^3 in infix notation means ^8 by my reckoning, so the final answer I get for the complex example 3+4*2/(1−5)^2^3 is 3.0001220703125

Mike mueller nz 11:48, 4 April 2007 (UTC)Reply

After a bit of head scratching and rereading I was able to implement this in Perl with no problems besides the fact that this was my second program in Perl... I used at as one function of a command line calculator, with somewhat cleaned up math problems going in and an array where each element was an operand or operator making an RPN equation going out the other. 72.50.165.166 05:47, 5 April 2007 (UTC)Reply

-

How to handle functions in the stack?


Bjc world (talk) 14:02, 31 December 2007 (UTC) BEGINReply
I think 'The algorithm in detail' section needs to clarify how to process functions when the token being examined is an operator. In particular, the logic describes what to do if the top element on the stack is an operator, but it does not say what to do if the top element on the stack is a function or parenthesis (the stack may contain left parentheses, functions and operators). I believe the correct functionality is to treat functions like operators, but then there is the question of what precedence a function has. In my implementation, I have set functions to have a higher precedence than multiplication and division and lower than parenthesis and this seems to work.
Bjc world (talk) 14:02, 31 December 2007 (UTC) ENDReply

unary minus "-"

edit

Does this algorithm work with unary minus "-" ( -x + x = 0 ) ?

If so, are these fundamental properties or conventions satisfied? :

a-b = a+(-b)

-a^b = -(a^b)

a/-b = a/(-b)

a^-b = a^(-b)

a/-b^c = a/-(b^c)

a^-b^c = a^-(b^c)

Thanks.

yes but priorities must be such that
  • binary_on_stack < minus (=> pospounding output of binary_on_stack, so minus goes first)
  • and minus_on_stack < binary ( binary goes first )
  • ...

edit: basically you answered yourself ca1 (talk) 01:20, 11 May 2008 (UTC)Reply

I think you are correct. I stumbled on the same issue. The algorithm gives wrong result for the case sin(max(2,3)/5) Bob Ueland (talk) 21:20, 28 January 2016 (UTC)Reply

a a + + c

edit

something like that will parse ok. should not there be mentioned that input has to be correct infix? ca1 (talk) 01:03, 11 May 2008 (UTC)Reply

> I think that's a good point, and I added a paragraph to the article to clarify that the algorithm will accept some invalid expressions. 2A02:168:65FC:0:9EB6:D0FF:FEE3:5E4F (talk) 16:51, 18 December 2020 (UTC)Reply

Bitwise XOR operator in example is incorrect?

edit

The complex example section has these two statements:

  • "^ has higher precedence than /"
  • "^ is evaluated right-to-left"

Neither of these statements are true, at least in most programming languages I can think of:

http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B#Operator_precedence

Treating bitwise XOR like C++ should produce this final RPN output, presuming I didn't screw up my test program:

 3 4 2 * 1 5 - / + 2 ^ 3 ^

The current revision shoves "^ ^ / +" to the end of the output. Can someone verify that my assertion (and work!) are correct?

155.229.108.58 (talk) 21:32, 23 February 2009 (UTC)Ryan C. GordonReply

Thats because its not a bitwise XOR. Its a pow function. 2 ^ 3 is equivalent to pow(2, 3) 70.15.250.196 (talk) 04:39, 5 April 2015 (UTC)Reply

Exponential operator

edit

The operator '^' in this case isn't the bitwise XOR operator as in C++ but rather the exponential operator, so '^' is right-associative and has greater precedence than '/' in this example. Nevertheless, this could be explicitly stated in the article to avoid such confusion. —Preceding unsigned comment added by 190.232.14.227 (talk) 21:21, 6 March 2009 (UTC)Reply

C++ example code

edit

I just took a look at the C++ example code, and it is shocking.

Why on earth are enums not being used in place of those magic numbers? I would go ahead and change it myself but I assume there's a reason behind not using enums.

Can someone clarify this? 124.168.95.207 (talk) 04:43, 18 July 2010 (UTC)Reply

And what's the use of a huge block of code in the middle of an article? — Preceding unsigned comment added by 77.228.71.21 (talk) 09:09, 19 June 2013 (UTC)Reply

bug

edit

There is a bug in the argument-number checking both in the algorithm and in the code. The program crashes for some incorrect input (ie: "a + (b + )"). I don't know the solution so I am not modifying the article. Need an expert, someone can help? - —Preceding unsigned comment added by 90.80.39.41 (talk) 15:45, 20 July 2010 (UTC)Reply

The algorithm is assuming sane input, which is fine for illustrating the basics of how the algorithm works. I would say its not really wikipedias job to provide fully working solutions with full error checking. A quick fix is to add a sanity check after the output has been produced, basically count up the number of numbers and variables, n, count up the number of binary operators, m, to be sane m=n-1. This can actually be done by checking the return value from execution_order(output) which must be true.
A better approach would be to check the input, while parsing, using a formal grammar Backus–Naur Form. If we restrict ourselves to numbers, variable, binary operators and brackets then we basically have two states: next token must be a number, variable or left bracket; next token must be an operator or right bracket. This is the approach taken in [2] which does have a working implementation of the algorithm which I have successfully used myself.--Salix (talk): 19:18, 20 July 2010 (UTC)Reply

.svg is confusing from step d) to step e)

edit

The .svg graphic is confusing from step d) to step e). The transition between earlier steps only moves one item at a time. The transition from step d) to step e) moves three items at once. It's unclear as to which '+' was moved to the output and which plus is moved to the operand stack for that transition. Was the plus on the operand stack moved to the output and the plus at the input moved to the operand stack? Or, was the plus from the input moved to the output? If someone doesn't understand the algorithm, they won't know that the plus at the output is from the operand stack and the plus now on the operand stack was from the input. They won't recognize that both pluses were moved and one of them is now in the same place that the other one was. They'll think that the plus from the input was moved to the output. This can be resolved by moving fewer items per state transition, or by changing the second plus to a minus. —Preceding unsigned comment added by 98.243.108.232 (talk) 18:12, 21 December 2010 (UTC)Reply

C example

edit

The following should be added to the C example after the #include statements so it compiles in ANSI C:

 #define bool int
 #define false 0
 #define true 1

These terms are not supported. —Preceding unsigned comment added by 71.220.68.202 (talk) 03:54, 10 January 2011 (UTC)Reply

I see that change made it in the article. Another option is to include the c99 header <stdbool.h> which save two lines of code but break ANSI c89 compatibility. R4p70r (talk) 06:02, 18 February 2011 (UTC)Reply

I'm testing C example. I'm getting correct output when reversing operator precedence OR changed code to:

((op_left_assoc(c) && (op_preced(c) >= op_preced(sc))) ||
                          (op_preced(c) > op_preced(sc))))

Is it bug in algorithm ? (i'm analyzing resulting expression left-to-right). — Preceding unsigned comment added by 5.20.182.75 (talk) 19:03, 13 January 2013 (UTC)Reply

Parsing mathematical equations

edit

The article states that the shunting-yard algorithm is a method for parsing mathematical equations. I'm not too versed in mathematics but it seems that the word "equations" imply the equality between two expressions. The thing is I don't see an equal signs in the infix input to the algorithm. Is it correct to say that equations are being parsed in this context? R4p70r (talk) 05:01, 18 February 2011 (UTC)Reply

Indeed, it is probably not strictly correct. I would use the term "expressions". C3lticmatt (talk) 16:25, 28 March 2011 (UTC)Reply

Prefix notation

edit

I can make no sense of the sentence explaining prefix notation:

The shunting yard algorithm can also be applied to produce prefix notation (also known as polish notation). To do this one would simply start from the beginning of a string of tokens to be parsed and work backwards, and then reversing the output queue (therefore making the output queue an output stack).

Presumably the participle 'reversing' should be 'reverse'. But then how do you start at the beginning of a string and work backwards, without falling off the beginning of the string? Marinheiro (talk) 10:16, 8 December 2011 (UTC)Reply

edit

Python implementation is not correct. I opened a bug with an original author: https://github.com/ekg/shuntingyard/issues/1 — Preceding unsigned comment added by Yuriyzubarev (talkcontribs) 19:46, 23 March 2012 (UTC)Reply

A bug in the Algorithm description.

edit

It seems to be an error, because it does not work for the case sin(max(2,3)/5) If the current token is an operator o1, then I think when we look on top of the stack not just for other operators, but functions as well. And if it is a function we should pop it onto the output queue, because function calls have a very high precedence. Bob Ueland (talk) 21:17, 28 January 2016 (UTC)Reply

edit

Hello fellow Wikipedians,

I have just added archive links to one external link on Shunting-yard algorithm. Please take a moment to review my edit. If necessary, add {{cbignore}} after the link to keep me from modifying it. Alternatively, you can add {{nobots|deny=InternetArchiveBot}} to keep me off the page altogether. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

 Y An editor has reviewed this edit and fixed any errors that were found.

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—cyberbot IITalk to my owner:Online 04:48, 3 March 2016 (UTC)Reply

As the algorithm states, it does not work for composite functions. Hereisdx (talk) 04:25, 25 December 2019 (UTC)Reply

Shunting yard algorithm description ambiguous

edit

In the algorithm description under "The algorithm in detail", at the first "while", it is not clear whether "and the operator is left associative" refers to the operator on top of the stack or the operator found from the token. I assumed the former when implementing it, but ran into issues, which the latter approach happened to solve.

Could anybody clarify on this ambiguous statement and perhaps correct it?

103.9.76.196 (talk) 22:09, 15 October 2017 (UTC)Reply

The algorithm was reworded in June 2017. See the version at 12 April 2017 which has more detail and a relevant example. It's a long time since I looked the algorithm so I don't want to offer an opinion on which operator is intended at the moment. Johnuniq (talk) 03:31, 16 October 2017 (UTC)Reply

"The operator is left associative" refers to the current operator found from the token, not the one on the operator stack. It's implements the April 2017 version but is expressed with fewer comparisons. Granted, the new description _is_ ambiguous there. 91.152.24.108 (talk) 21:36, 27 February 2021 (UTC)Reply

Slight error in algorithm description

edit

It says: "The shunting yard algorithm can also be applied to produce prefix notation (also known as Polish notation). To do this one would simply start from the end of a string of tokens to be parsed and work backwards, reverse the output queue (therefore making the output queue an output stack), and flip the left and right parenthesis behavior (remembering that the now-left parenthesis behavior should pop until it finds a now-right parenthesis)."

Correct me if I'm wrong, but I don't think this is actually sufficient to produce prefix notation with the same semantics. This is because associativity would be flipped if you just used this method. As in, if you have 2*3/4 that would be 23*4/ in postfix notation assuming left-associativity ((2*3)/4). However, using the method stated here to get prefix notation, I get *2/34 (2*(3/4)), i.e. associativity has been switched! SorteKanin (talk) 18:38, 17 February 2018 (UTC)Reply

Implementation example in Pharo

edit

I am new to Wikipedia. I submitted an implementation of the algorithm fully working. I implemented the code based on the concept in this page and it was very straight forward. A few adjustments were done indeed for the unary sign (to answer question above). The code actually uses OOP to discover the operands and functions dynamically and builds all the required data structures on demand. I think could help many people that need to implement a similar one, reason why I placed it in the external reference. With the parser and the dynamic structures created, I also built a calculator. The code was written for Pharo Smalltalk. I used the GitHub URL which allows anybody to download the code. — Preceding unsigned comment added by Grpistoia (talkcontribs) 04:34, 24 January 2021 (UTC)Reply

Algorithm problem

edit

I think that there is a problem in the while cycle where we verify the operator stack before pop another one onto, instead of: [1]

we have to fix with something like:

       while ((there is a operator at the top of the operator stack)
              AND ((the operator at the top of the operator stack has greater precedence)
              or (the operator at the top of the operator stack has equal precedence and the token is left associative)))
             and (the operator at the top of the operator stack is not a left parenthesis):
           pop operators from the operator stack onto the output queue.

GPaolin (talk) 07:20, 13 May 2020 (UTC)Reply

References

  1. ^ while ((there is a operator at the top of the operator stack) or (there is an operator at the top of the operator stack with greater precedence) or (the operator at the top of the operator stack has equal precedence and the token is left associative)) and (the operator at the top of the operator stack is not a left parenthesis): pop operators from the operator stack onto the output queue.

Better reference for Dijkstra's original?

edit

The link given in the first paragraph for "Mathematisch Centrum report MR 34/61" points to a good enough source but I believe a better one exists here; the years of publishing do differ, however. Chorl (talk) 14:36, 26 August 2023 (UTC)Reply

Reversal

edit

I wonder whether there's already an article about the inverse operation: Converting RPN to an equivalent algebraic expression. If so, it should be linked, probably under "See also". I guess you just have to shuttle the tokens the other way and insert parentheses as necessary. Though maybe that algorithm doesn't have an agreed-upon name yet.