Skip to content

Context Sensitive Lexer #1054

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Conversation

marcioAlmada
Copy link
Contributor

This aims to be a consistent implementation of semi-reserved words grammar on OO context and OO access context.

RFC: https://wiki.php.net/rfc/context_sensitive_lexer

marcioAlmada added a commit to marcioAlmada/php-src that referenced this pull request Feb 5, 2015
@marcioAlmada marcioAlmada force-pushed the context branch 6 times, most recently from e3c8519 to 94a4100 Compare February 11, 2015 13:55
static unsigned int in_class = 0;
static zend_bool in_const_list = 0;

ZEND_ASSERT(in_class >= 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in_class is an unsigned int ... I would be very surprised if it became smaller than zero ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a test with static unsigned int in_class = -1; and it caused strange bugs without errors here :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the unsigned and kept the assertion because it's currently not very obvious to detect bugs if somebody decrements in_class in wrong conditions.

@laruence
Copy link
Member

@smalyshev @jpauli

@jpauli jpauli added the RFC label Feb 16, 2015
@marcioAlmada marcioAlmada changed the title Allow context sensitive grammar for object context Context sensitive lexer for OO context Feb 16, 2015
@marcioAlmada marcioAlmada force-pushed the context branch 3 times, most recently from 4c384aa to 636e210 Compare February 20, 2015 06:43
@POPSuL
Copy link

POPSuL commented Feb 24, 2015

It's great idea!

@marcioAlmada marcioAlmada force-pushed the context branch 5 times, most recently from a8effa2 to 1fe57cd Compare February 25, 2015 03:12
@HallofFamer
Copy link

I have a question though, does this also apply to classes, interfaces or traits, or only to properties, methods and constants? Lets say if scalar type hinting is passed and the scalar names become reserved words, while I have an Int class and a String class. Will this RFC solve the problem?

@marcioAlmada
Copy link
Contributor Author

Hi,

Will this RFC solve the problem?

No.

I tried a more ambitious RFC before that targeted namespaces, classes,
traits and interfaces. The other patch was fully functional but it had some
drawbacks and conflicted with other possible future RFCs. So the RFC was
reverted to this version, targeting only OO declaration context and OO
access context.

A pure lexical approach, without significant changes on the lexer
structure, will probably not solve this problem. I have some ideas for
future work targeting the wider issue though, involving multiple lexers or
maybe find a way to do lexing feedback without break tokenizer extension.
But it wont't/can't be finished before the feature freeze.

Anyway, the current solution is a great start and already mitigates a lot
of the BC breaks from other RFCs that will soon be put to vote, like:

Hope that's useful information.

Thanks,
Márcio.

@HallofFamer
Copy link

I see, thanks for your response, I will be looking forward to the more ambitious case-sensitive patch too. It's unfortunate that scalar names will become reserved words, I have mixed feelings about scalar type hinting. Part of me like it since its a step towards strongly typed PHP I was hoping for, but reserving scalar types makes it inconvenient for me. I know Levi Morrison proposed reserve some types in PHP 7, I am okay with his proposal though as I fully understand why this may be necessary.

@marcioAlmada marcioAlmada changed the title Context sensitive lexer for OO context Context Sensitive Lexer Mar 1, 2015
#define ZEND_RESERVED_PRIVATE 4
#define ZEND_RESERVED_PROTECTED 5
#define ZEND_RESERVED_PUBLIC 6
#define ZEND_RESERVED_CLASS 7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the distinction between the individual reserved keywords really relevant here? It look to me like it would be good enough to just return a boolean from the check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be useful in case one wants to check for a specific identifier to give more specialized err messages, like: https://github.com/php/php-src/pull/1054/files#diff-3a8139128d4026ce0cb0c86beba4e6b9R4511

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen the error message check, but the distinction seems really weird there ... it's pretty much the same message with slightly different wording. That does not seem worthwhile.

yy_push_state(ST_LOOKING_FOR_SR_NAME);
}
return ',';
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about const FOO = [1, isset(BAR[0])]; will the isset be lexed as T_ISSET or as T_STRING? It should be a T_ISSET. We currently don't support any language constructors in constants I think, but that's just because they don't happen to be implemented, we may want to add them in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

T_ISSET because the ST_LOOKING_FOR_SR_NAME pops itself after first match so in const FOO = [1, isset(BAR[0])]; only FOO will be the T_STRING.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand your response. In that case in_class and in_const_list should be 1, so that would push LOOKING_FOR_SR_NAME after the , - or not? How does it distinguish between A = 1, B = 2 and A = [1, B] for example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Fixed with marcioAlmada@a8b1f21. Thanks ^^

@marcioAlmada
Copy link
Contributor Author

superseded by #1158

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants