-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
[Routing] Match 77.7x faster by compiling routes in one regexp #26059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
afe32a7
to
6c2e2a3
Compare
$code .= sprintf(" case %s:\n", self::export($url)); | ||
foreach ($routes as $name => list($hasTrailingSlash, $route)) { | ||
$methods = $route->getMethods(); | ||
$supportsTrailingSlash = $supportsRedirections && (!$methods || in_array('HEAD', $methods) || in_array('GET', $methods)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Route methods must be uppercase strings, \in_array
calls could be made strict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is borrowed from existing code, nothing to do here: uppercase is already done by the very method you linked, and tweaking in_array() would have zero impact as this is dumping code, not critical at all
fb0de45
to
75f4efe
Compare
// dynamic | ||
return $this->mergeDefaults(array_replace($matches, array('_route' => 'dynamic')), array()); | ||
|
||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This break
statement seems useless.
|
||
// a_second | ||
if ('/a/22' === $pathinfo) { | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This break
statement seems useless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @phansys, you missed some :)
This is generated code, removing them would add a lot of complexity for no practical benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if unreachable code has a performance impact, as the opcodes are still generated (but never used).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's opcache's job, removing dead code
// a_wildcard | ||
return $this->mergeDefaults(array_replace($matches, array('_route' => 'a_wildcard')), array()); | ||
|
||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This break
statement seems useless.
// nested_wildcard | ||
return $this->mergeDefaults(array_replace($matches, array('_route' => 'nested_wildcard')), array()); | ||
|
||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This break
statement seems useless.
// regex_trailing_slash_no_methods | ||
return $this->mergeDefaults(array_replace($matches, array('_route' => 'regex_trailing_slash_no_methods')), array()); | ||
|
||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This break
statement seems useless.
// regex_not_trailing_slash_no_methods | ||
return $this->mergeDefaults(array_replace($matches, array('_route' => 'regex_not_trailing_slash_no_methods')), array()); | ||
|
||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This break
statement seems useless.
75f4efe
to
bfc2288
Compare
} | ||
switch ($m = $matches['MARK']) { | ||
case '_28': | ||
$matches = array('var' => $matches[1] ?? null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In most routes the $matches
var seems to be used only inside the $this->mergeDefaults()
call. If that's always the case, maybe we can avoid this temporary variable and always include it inside $this->mergeDefaults()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are dumped using independent methods, so cannot do it without heavy changes/complexity
since this is run-once code, there would be no practical benefit, so it's not worth it
} | ||
|
||
$regexOffset = 0; | ||
$regexList = array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's useful in this case, but the S
modifier makes PHP better analyze the regexps to improve performance: http://php.net/manual/en/reference.pcre.pattern.modifiers.php
S
When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. If this modifier is set, then this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
studying a pattern is useful only for non-anchored patterns
the pattern here is anchored, so this doesn't apply
In the benchmark code, all the route URLs have the same prefix. I wonder if that influences in any way the results (maybe it unlocks some internal optimizations somehow): $routes->add('r'.$i, new Route('/abc'.$i));
$routes->add('f'.$i, new Route('/abc{foo}/'.$i)); Maybe for the benchmark we could try the worst case possible (a different prefix for each route) and see if the results vary: for ($i = 0; $i < 400; ++$i) {
- $routes->add('r'.$i, new Route('/abc'.$i));
- $routes->add('f'.$i, new Route('/abc{foo}/'.$i));
+ $prefix = substr(str_shuffle('abcdefghijklmnopqrstuvwxyz'), -3);
+ $routes->add('r'.$i, new Route('/'.$prefix.$i));
+ $routes->add('f'.$i, new Route('/'.$prefix.'{foo}/'.$i));
}
// ...
while (--$i) {
- $router->match('/abcdef/399');
+ $router->match('/'.$prefix.'def/399');
}
// ... |
@javiereguiluz oh yes, please report your findings! |
d864e16
to
3b78aa5
Compare
PR ready :) |
foreach ($collection->all() as $name => $route) { | ||
$compiledRoute = $route->compile(); | ||
$regex = $compiledRoute->getRegex(); | ||
$methods = $route->getMethods(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems that $methods
is not used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed thanks
9ba72eb
to
54ec3e6
Compare
@nicolas-grekas it seems this is now too optimized for a common prefix? Out of interest I benchmarked it on my biggest Symfony 3.4 monolith app with around 540 routes. <?php
require_once 'vendor/autoload.php';
$kernel = new AppKernel('prod', false);
$kernel->boot();
$router = $kernel->getContainer()->get('router');
$routes = $router->getRouteCollection();
$router->getContext()->setHost('...');
$router->getContext()->setMethod('GET');
foreach (range(0, 10000) as $i) {
/** @var \Symfony\Component\Routing\Route $route */
foreach ($routes as $route) {
if (!$route->getCondition() && (!$route->getMethods() || in_array('GET', $route->getMethods()))) {
try {
$router->match(preg_replace('/\{[^{]+\}/', 'foo', $route->getPath()));
} catch (\Symfony\Component\Routing\Exception\ResourceNotFoundException $e) {
}
}
}
} This naive script matches 177 out of 541 routes. Benchmarks done with cleared & warmed up cache. With Symfony 3.4.4.:
Applied your changes for
So for me this takes roughly 3.5 times as long now. Do you see something wrong with my benchmark? |
0a38a41
to
a6012ae
Compare
@nicolas-grekas awesome job 🎉 For my huge monolith app I extended my simple test script a bit so it matches routes for all available hosts: https://gist.github.com/dmaicher/61f2e388ac0e65cc945dbb6e678e2ef7 Here the new numbers: Stock symfony 3.4.4: The total number of matched routes is identical 👍 Edit: but I guess you used my route collection already for some benchmarks yourself 😄 |
$code .= <<<EOF | ||
default: | ||
\$routes = array( | ||
{$this->indent($default, 4)} ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This array can contain severals static routes. With a big amount of route, this array and the $regexList
and $matchedPathinfo
array could be significants.
Isn't it possible to define them as static array and benefit of OPCode cache ? (But won't work on 3.4 with php55 requirement...) Like you've done with the php array adapter ;)
Anyway great job @nicolas-grekas !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A static array can be inline, opcache will work with them. So nothing to change here to benefit from shared memory.
1189067
to
95aa7d9
Compare
PR ready for a last round of review, all green on my side. |
throw $e; | ||
} | ||
if (!in_array($this->context->getMethod(), array('HEAD', 'GET'))) { | ||
throw new MethodNotAllowedException(array('GET')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be reverted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted
@nicolas-grekas I think the trailing slash logic is way to complicated now and makes it really hard to maintain. I think it would be much easier to do something similar as the RedirectableUrlMatcher. This would also allow to do the redirect the other way round very easily which has been asked for several times: GET /foo/ to redirect to GET /foo when /foo/ does not exist. |
95aa7d9
to
f933f70
Compare
@Tobion why not. For another PR if you don't mind? |
Thank you Nicolas for this inspiring work |
…e regexp (nicolas-grekas) This PR was merged into the 4.1-dev branch. Discussion ---------- [Routing] Match 77.7x faster by compiling routes in one regexp | Q | A | ------------- | --- | Branch? | master | Bug fix? | no | New feature? | no | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | - | License | MIT | Doc PR | - Builds on top of #25961 and http://nikic.github.io/2014/02/18/Fast-request-routing-using-regular-expressions.html to make routing 77.7x faster (measured when matching the last dynamic route of 400 static + 400 dynamic routes.) More benchs welcomed. - [x] check static routes first in a switch - [x] group same-modifiers regexps - [x] put condition-less routes in a static map, in a switch's "default" - [x] match host+pathinfo at the same time - [x] group same-host regexps in sub-patterns - [x] consider the host to move more static routes in the static switch - [x] move static single-route "case" to "default" by adding requirements to $routes - [x] group same-static-prefix in sub-patterns - [x] group consecutive same-regex in a single "case" - [x] move dynamic single-route "case" to "default" by adding requirements to $routes - [x] extract host variables from the main match and remove the current 2nd match - [x] extend same-prefix grouping to placeholders - [x] group same-suffix hosts together Here is my benchmarking code: ```php <?php namespace Symfony\Component\Routing; require 'vendor/autoload.php'; $routes = new RouteCollection(); for ($i = 0; $i < 400; ++$i) { $routes->add('r'.$i, new Route('/abc'.$i)); $routes->add('f'.$i, new Route('/abc{foo}/'.$i)); } $dumper = new Matcher\Dumper\PhpMatcherDumper($routes); eval('?'.'>'.$dumper->dump()); $router = new \ProjectUrlMatcher(new RequestContext()); $i = 10000; $s = microtime(1); while (--$i) { $router->match('/abcdef/399'); } echo 'Symfony: ', 1000 * (microtime(1) - $s), "\n"; namespace FastRoute; $dispatcher = simpleDispatcher(function(RouteCollector $r) { for ($i = 0; $i < 400; ++$i) { $r->addRoute('GET', '/abc'.$i, 'r'.$i); $r->addRoute('GET', '/abc{foo}/'.$i, 'f'.$i); } }); $i = 10000; $s = microtime(1); while (--$i) { $dispatcher->dispatch('GET', '/abcdef/399'); } echo 'FastRoute: ', 1000 * (microtime(1) - $s), "\n"; ``` Commits ------- f933f70 [Routing] Match 77.7x faster by compiling routes in one regexp
This is probably going to be a problem with pcre jit enabled. The stack limit there is 32k |
@mvrhov not sure this can happen on real world app. Routes are pretty simple usually, and pretty short. Matching them is straightforward - ie it doesn't involve much backtracking. But of course I may be wrong, please provide a reproducer if you have a case that doesn't work. There is another limit I just hit when trying to run https://github.com/tyler-sommer/php-router-benchmark: We may need chunking to workaround the issue. This should be easy to detect at compile time. edit: chunking done in #26169 |
The 2nd limit might come from pcre.backtrack_limit. |
@mvrhov Symfony generates possessive quantifiers for placeholders by default. And when defining custom requirements, it is generally possible to rely on possessive quantifiers too (not being able to use them would require having some crazy URL pattern in most cases). And so, backtracking should not happen too often. |
…-grekas) This PR was merged into the 4.1-dev branch. Discussion ---------- [Routing] Handle very large set of dynamic routes | Q | A | ------------- | --- | Branch? | master | Bug fix? | yes | New feature? | no | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | - | License | MIT | Doc PR | - Follow up of #26059; allows handling very long lists of dynamic routes. Allows running https://github.com/tyler-sommer/php-router-benchmark seamlessly. BTW, here are the result of running this benchmark: R3 extension is not loaded. Skipping initialization for "Worst-case matching" test using R3. R3 extension is not loaded. Skipping initialization for "First route matching" test using R3. ## Worst-case matching This benchmark matches the last route and unknown route. It generates a randomly prefixed and suffixed route in an attempt to thwart any optimization. 1,000 routes each with 9 arguments. This benchmark consists of 10 tests. Each test is executed 1,000 times, the results pruned, and then averaged. Values that fall outside of 3 standard deviations of the mean are discarded. Test Name | Results | Time | + Interval | Change --------- | ------- | ---- | ---------- | ------ Symfony4 - unknown route (1000 routes) | 995 | 0.0000085699 | +0.0000000000 | baseline Symfony4 - last route (1000 routes) | 999 | 0.0000086754 | +0.0000001055 | 1% slower FastRoute - unknown route (1000 routes) | 980 | 0.0000305154 | +0.0000219455 | 256% slower FastRoute - last route (1000 routes) | 999 | 0.0000529922 | +0.0000444223 | 518% slower Pux PHP - unknown route (1000 routes) | 972 | 0.0003162730 | +0.0003077032 | 3591% slower Pux PHP - last route (1000 routes) | 999 | 0.0004376847 | +0.0004291148 | 5007% slower Aura v2 - unknown route (1000 routes) | 976 | 0.0138277517 | +0.0138191818 | 161253% slower Aura v2 - last route (1000 routes) | 989 | 0.0138914190 | +0.0138828491 | 161996% slower ## First route matching This benchmark tests how quickly each router can match the first route. 1,000 routes each with 9 arguments. This benchmark consists of 5 tests. Each test is executed 1,000 times, the results pruned, and then averaged. Values that fall outside of 3 standard deviations of the mean are discarded. Test Name | Results | Time | + Interval | Change --------- | ------- | ---- | ---------- | ------ FastRoute - first route | 999 | 0.0000016928 | +0.0000000000 | baseline Pux PHP - first route | 999 | 0.0000017381 | +0.0000000453 | 3% slower Symfony4 - first route | 997 | 0.0000029818 | +0.0000012890 | 76% slower Aura v2 - first route | 977 | 0.0000376436 | +0.0000359508 | 2124% slower Commits ------- ee8b201 [Routing] Handle very large set of dynamic routes
I just published “Making Symfony’s Router 77.7x faster — 1/2” |
…n possible (nicolas-grekas) This PR was merged into the 4.1-dev branch. Discussion ---------- [Routing] Redirect from trailing slash to no-slash when possible | Q | A | ------------- | --- | Branch? | master | Bug fix? | no | New feature? | yes | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | #26207 | License | MIT | Doc PR | - Implemented as suggest by @Tobion in #26059 (comment) When a route for `/foo` exists but the request is for `/foo/`, we now redirect. (this complements the flipped side redirection, which already exists.) Commits ------- 69a4e94 [Routing] Redirect from trailing slash to no-slash when possible
Builds on top of #25961 and http://nikic.github.io/2014/02/18/Fast-request-routing-using-regular-expressions.html to make routing 77.7x faster (measured when matching the last dynamic route of 400 static + 400 dynamic routes.)
More benchs welcomed.
Here is my benchmarking code: