Skip to content

[Routing] Match 77.7x faster by compiling routes in one regexp #26059

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 12, 2018

Conversation

nicolas-grekas
Copy link
Member

@nicolas-grekas nicolas-grekas commented Feb 5, 2018

Q A
Branch? master
Bug fix? no
New feature? no
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets -
License MIT
Doc PR -

Builds on top of #25961 and http://nikic.github.io/2014/02/18/Fast-request-routing-using-regular-expressions.html to make routing 77.7x faster (measured when matching the last dynamic route of 400 static + 400 dynamic routes.)

More benchs welcomed.

  • check static routes first in a switch
  • group same-modifiers regexps
  • put condition-less routes in a static map, in a switch's "default"
  • match host+pathinfo at the same time
  • group same-host regexps in sub-patterns
  • consider the host to move more static routes in the static switch
  • move static single-route "case" to "default" by adding requirements to $routes
  • group same-static-prefix in sub-patterns
  • group consecutive same-regex in a single "case"
  • move dynamic single-route "case" to "default" by adding requirements to $routes
  • extract host variables from the main match and remove the current 2nd match
  • extend same-prefix grouping to placeholders
  • group same-suffix hosts together

Here is my benchmarking code:

<?php

namespace Symfony\Component\Routing;

require 'vendor/autoload.php';

$routes = new RouteCollection();

for ($i = 0; $i < 400; ++$i) {
    $routes->add('r'.$i, new Route('/abc'.$i));
    $routes->add('f'.$i, new Route('/abc{foo}/'.$i));
}

$dumper = new Matcher\Dumper\PhpMatcherDumper($routes);

eval('?'.'>'.$dumper->dump());

$router = new \ProjectUrlMatcher(new RequestContext());

$i = 10000;
$s = microtime(1);

while (--$i) {
    $router->match('/abcdef/399');
}

echo 'Symfony: ', 1000 * (microtime(1) - $s), "\n";

namespace FastRoute;

$dispatcher = simpleDispatcher(function(RouteCollector $r) {
    for ($i = 0; $i < 400; ++$i) {
        $r->addRoute('GET', '/abc'.$i, 'r'.$i);
        $r->addRoute('GET', '/abc{foo}/'.$i, 'f'.$i);
    }
});

$i = 10000;
$s = microtime(1);

while (--$i) {
    $dispatcher->dispatch('GET', '/abcdef/399');
}

echo 'FastRoute: ', 1000 * (microtime(1) - $s), "\n";

$code .= sprintf(" case %s:\n", self::export($url));
foreach ($routes as $name => list($hasTrailingSlash, $route)) {
$methods = $route->getMethods();
$supportsTrailingSlash = $supportsRedirections && (!$methods || in_array('HEAD', $methods) || in_array('GET', $methods));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Route methods must be uppercase strings, \in_array calls could be made strict

Copy link
Member Author

@nicolas-grekas nicolas-grekas Feb 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is borrowed from existing code, nothing to do here: uppercase is already done by the very method you linked, and tweaking in_array() would have zero impact as this is dumping code, not critical at all

@nicolas-grekas nicolas-grekas force-pushed the router-one-rx branch 3 times, most recently from fb0de45 to 75f4efe Compare February 5, 2018 21:30
// dynamic
return $this->mergeDefaults(array_replace($matches, array('_route' => 'dynamic')), array());

break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This break statement seems useless.


// a_second
if ('/a/22' === $pathinfo) {
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This break statement seems useless.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @phansys, you missed some :)
This is generated code, removing them would add a lot of complexity for no practical benefit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if unreachable code has a performance impact, as the opcodes are still generated (but never used).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's opcache's job, removing dead code

// a_wildcard
return $this->mergeDefaults(array_replace($matches, array('_route' => 'a_wildcard')), array());

break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This break statement seems useless.

// nested_wildcard
return $this->mergeDefaults(array_replace($matches, array('_route' => 'nested_wildcard')), array());

break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This break statement seems useless.

// regex_trailing_slash_no_methods
return $this->mergeDefaults(array_replace($matches, array('_route' => 'regex_trailing_slash_no_methods')), array());

break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This break statement seems useless.

// regex_not_trailing_slash_no_methods
return $this->mergeDefaults(array_replace($matches, array('_route' => 'regex_not_trailing_slash_no_methods')), array());

break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This break statement seems useless.

}
switch ($m = $matches['MARK']) {
case '_28':
$matches = array('var' => $matches[1] ?? null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most routes the $matches var seems to be used only inside the $this->mergeDefaults() call. If that's always the case, maybe we can avoid this temporary variable and always include it inside $this->mergeDefaults().

Copy link
Member Author

@nicolas-grekas nicolas-grekas Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are dumped using independent methods, so cannot do it without heavy changes/complexity
since this is run-once code, there would be no practical benefit, so it's not worth it

}

$regexOffset = 0;
$regexList = array(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's useful in this case, but the S modifier makes PHP better analyze the regexps to improve performance: http://php.net/manual/en/reference.pcre.pattern.modifiers.php

S

When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. If this modifier is set, then this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

studying a pattern is useful only for non-anchored patterns

the pattern here is anchored, so this doesn't apply

@symfony symfony deleted a comment from phansys Feb 6, 2018
@symfony symfony deleted a comment from phansys Feb 6, 2018
@symfony symfony deleted a comment from phansys Feb 6, 2018
@javiereguiluz
Copy link
Member

In the benchmark code, all the route URLs have the same prefix. I wonder if that influences in any way the results (maybe it unlocks some internal optimizations somehow):

$routes->add('r'.$i, new Route('/abc'.$i));
$routes->add('f'.$i, new Route('/abc{foo}/'.$i));

Maybe for the benchmark we could try the worst case possible (a different prefix for each route) and see if the results vary:

for ($i = 0; $i < 400; ++$i) {
-    $routes->add('r'.$i, new Route('/abc'.$i));
-    $routes->add('f'.$i, new Route('/abc{foo}/'.$i));
+    $prefix = substr(str_shuffle('abcdefghijklmnopqrstuvwxyz'), -3);
+    $routes->add('r'.$i, new Route('/'.$prefix.$i));
+    $routes->add('f'.$i, new Route('/'.$prefix.'{foo}/'.$i));
}

// ...

while (--$i) {
-    $router->match('/abcdef/399');
+    $router->match('/'.$prefix.'def/399');
}

// ...

@nicolas-grekas
Copy link
Member Author

@javiereguiluz oh yes, please report your findings!

@nicolas-grekas
Copy link
Member Author

PR ready :)

foreach ($collection->all() as $name => $route) {
$compiledRoute = $route->compile();
$regex = $compiledRoute->getRegex();
$methods = $route->getMethods();
Copy link
Contributor

@Simperfit Simperfit Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that $methods is not used

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed thanks

@nicolas-grekas nicolas-grekas force-pushed the router-one-rx branch 2 times, most recently from 9ba72eb to 54ec3e6 Compare February 6, 2018 10:48
@dmaicher
Copy link
Contributor

dmaicher commented Feb 6, 2018

@nicolas-grekas it seems this is now too optimized for a common prefix?

Out of interest I benchmarked it on my biggest Symfony 3.4 monolith app with around 540 routes.

<?php

require_once 'vendor/autoload.php';

$kernel = new AppKernel('prod', false);
$kernel->boot();

$router = $kernel->getContainer()->get('router');
$routes = $router->getRouteCollection();
$router->getContext()->setHost('...');
$router->getContext()->setMethod('GET');

foreach (range(0, 10000) as $i) {
    /** @var \Symfony\Component\Routing\Route $route */
    foreach ($routes as $route) {
        if (!$route->getCondition() && (!$route->getMethods() || in_array('GET', $route->getMethods()))) {
            try {
                $router->match(preg_replace('/\{[^{]+\}/', 'foo', $route->getPath()));
            } catch (\Symfony\Component\Routing\Exception\ResourceNotFoundException $e) {
            }
        }
    }
}

This naive script matches 177 out of 541 routes.

Benchmarks done with cleared & warmed up cache.

With Symfony 3.4.4.:

$ time php route_bench.php 

real	0m11.288s
user	0m11.256s
sys	0m0.024s

Applied your changes for PhpMatcherDumper and PhpGeneratorDumper:

$ time php route_bench.php 

real	0m37.907s
user	0m37.856s
sys	0m0.028s

So for me this takes roughly 3.5 times as long now. Do you see something wrong with my benchmark?

@dmaicher
Copy link
Contributor

dmaicher commented Feb 12, 2018

@nicolas-grekas awesome job 🎉

For my huge monolith app I extended my simple test script a bit so it matches routes for all available hosts: https://gist.github.com/dmaicher/61f2e388ac0e65cc945dbb6e678e2ef7

Here the new numbers:

Stock symfony 3.4.4: 0m22.874s
Your optimizations: 0m14.812s

The total number of matched routes is identical 👍

Edit: but I guess you used my route collection already for some benchmarks yourself 😄

$code .= <<<EOF
default:
\$routes = array(
{$this->indent($default, 4)} );
Copy link
Contributor

@ScullWM ScullWM Feb 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This array can contain severals static routes. With a big amount of route, this array and the $regexList and $matchedPathinfo array could be significants.

Isn't it possible to define them as static array and benefit of OPCode cache ? (But won't work on 3.4 with php55 requirement...) Like you've done with the php array adapter ;)

Anyway great job @nicolas-grekas !

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A static array can be inline, opcache will work with them. So nothing to change here to benefit from shared memory.

@nicolas-grekas nicolas-grekas force-pushed the router-one-rx branch 2 times, most recently from 1189067 to 95aa7d9 Compare February 12, 2018 18:01
@nicolas-grekas
Copy link
Member Author

PR ready for a last round of review, all green on my side.

throw $e;
}
if (!in_array($this->context->getMethod(), array('HEAD', 'GET'))) {
throw new MethodNotAllowedException(array('GET'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be reverted

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

@Tobion
Copy link
Contributor

Tobion commented Feb 12, 2018

@nicolas-grekas I think the trailing slash logic is way to complicated now and makes it really hard to maintain. I think it would be much easier to do something similar as the RedirectableUrlMatcher.
So it first matches with the original uri path. And only if it does not find anything then trigger a new internal match with the changed path using the same generated match method. Then we don't need to keep track of all the supportsRedirections and hashTrailingSlash logic and changing the regex based on this. And since it happens only when it does not match the original path, it has no real-world performance inpact as well. But I think it makes the code much more readable.

This would also allow to do the redirect the other way round very easily which has been asked for several times: GET /foo/ to redirect to GET /foo when /foo/ does not exist.
This is currently really hard to implement due to all the string manipulation magic and implementation details.

@nicolas-grekas
Copy link
Member Author

@Tobion why not. For another PR if you don't mind?

@Tobion
Copy link
Contributor

Tobion commented Feb 12, 2018

Thank you Nicolas for this inspiring work

@Tobion Tobion merged commit f933f70 into symfony:master Feb 12, 2018
Tobion added a commit that referenced this pull request Feb 12, 2018
…e regexp (nicolas-grekas)

This PR was merged into the 4.1-dev branch.

Discussion
----------

 [Routing] Match 77.7x faster by compiling routes in one regexp

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

Builds on top of #25961 and http://nikic.github.io/2014/02/18/Fast-request-routing-using-regular-expressions.html to make routing 77.7x faster (measured when matching the last dynamic route of 400 static + 400 dynamic routes.)

More benchs welcomed.

- [x] check static routes first in a switch
- [x] group same-modifiers regexps
- [x] put condition-less routes in a static map, in a switch's "default"
- [x] match host+pathinfo at the same time
- [x] group same-host regexps in sub-patterns
- [x] consider the host to move more static routes in the static switch
- [x] move static single-route "case" to "default" by adding requirements to $routes
- [x] group same-static-prefix in sub-patterns
- [x] group consecutive same-regex in a single "case"
- [x] move dynamic single-route "case" to "default" by adding requirements to $routes
- [x] extract host variables from the main match and remove the current 2nd match
- [x] extend same-prefix grouping to placeholders
- [x] group same-suffix hosts together

Here is my benchmarking code:

```php
<?php

namespace Symfony\Component\Routing;

require 'vendor/autoload.php';

$routes = new RouteCollection();

for ($i = 0; $i < 400; ++$i) {
    $routes->add('r'.$i, new Route('/abc'.$i));
    $routes->add('f'.$i, new Route('/abc{foo}/'.$i));
}

$dumper = new Matcher\Dumper\PhpMatcherDumper($routes);

eval('?'.'>'.$dumper->dump());

$router = new \ProjectUrlMatcher(new RequestContext());

$i = 10000;
$s = microtime(1);

while (--$i) {
    $router->match('/abcdef/399');
}

echo 'Symfony: ', 1000 * (microtime(1) - $s), "\n";

namespace FastRoute;

$dispatcher = simpleDispatcher(function(RouteCollector $r) {
    for ($i = 0; $i < 400; ++$i) {
        $r->addRoute('GET', '/abc'.$i, 'r'.$i);
        $r->addRoute('GET', '/abc{foo}/'.$i, 'f'.$i);
    }
});

$i = 10000;
$s = microtime(1);

while (--$i) {
    $dispatcher->dispatch('GET', '/abcdef/399');
}

echo 'FastRoute: ', 1000 * (microtime(1) - $s), "\n";
```

Commits
-------

f933f70 [Routing] Match 77.7x faster by compiling routes in one regexp
@nicolas-grekas nicolas-grekas deleted the router-one-rx branch February 13, 2018 08:50
@mvrhov
Copy link

mvrhov commented Feb 13, 2018

This is probably going to be a problem with pcre jit enabled. The stack limit there is 32k
https://bugs.php.net/bug.php?id=70110 and http://marc.info/?l=php-internals&m=143764967808306&w=2

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Feb 13, 2018

@mvrhov not sure this can happen on real world app. Routes are pretty simple usually, and pretty short. Matching them is straightforward - ie it doesn't involve much backtracking. But of course I may be wrong, please provide a reproducer if you have a case that doesn't work.

There is another limit I just hit when trying to run https://github.com/tyler-sommer/php-router-benchmark:
Compilation failed: regular expression is too large at offset 111543

We may need chunking to workaround the issue. This should be easy to detect at compile time.

edit: chunking done in #26169

@mvrhov
Copy link

mvrhov commented Feb 13, 2018

The 2nd limit might come from pcre.backtrack_limit.
I don't have reproducer I'm just saying that the some huge (in number of items) and long (in number of characters per route url) route collection benchmark should be run with pcre.jit set to true

@stof
Copy link
Member

stof commented Feb 13, 2018

@mvrhov Symfony generates possessive quantifiers for placeholders by default. And when defining custom requirements, it is generally possible to rely on possessive quantifiers too (not being able to use them would require having some crazy URL pattern in most cases). And so, backtracking should not happen too often.

nicolas-grekas added a commit that referenced this pull request Feb 14, 2018
…-grekas)

This PR was merged into the 4.1-dev branch.

Discussion
----------

[Routing] Handle very large set of dynamic routes

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

Follow up of #26059; allows handling very long lists of dynamic routes.
Allows running https://github.com/tyler-sommer/php-router-benchmark seamlessly.

BTW, here are the result of running this benchmark:

R3 extension is not loaded. Skipping initialization for "Worst-case matching" test using R3.
R3 extension is not loaded. Skipping initialization for "First route matching" test using R3.
## Worst-case matching
This benchmark matches the last route and unknown route. It generates a randomly prefixed and suffixed route in an attempt to thwart any optimization. 1,000 routes each with 9 arguments.

This benchmark consists of 10 tests. Each test is executed 1,000 times, the results pruned, and then averaged. Values that fall outside of 3 standard deviations of the mean are discarded.

Test Name | Results | Time | + Interval | Change
--------- | ------- | ---- | ---------- | ------
Symfony4 - unknown route (1000 routes) | 995 | 0.0000085699 | +0.0000000000 | baseline
Symfony4 - last route (1000 routes) | 999 | 0.0000086754 | +0.0000001055 | 1% slower
FastRoute - unknown route (1000 routes) | 980 | 0.0000305154 | +0.0000219455 | 256% slower
FastRoute - last route (1000 routes) | 999 | 0.0000529922 | +0.0000444223 | 518% slower
Pux PHP - unknown route (1000 routes) | 972 | 0.0003162730 | +0.0003077032 | 3591% slower
Pux PHP - last route (1000 routes) | 999 | 0.0004376847 | +0.0004291148 | 5007% slower
Aura v2 - unknown route (1000 routes) | 976 | 0.0138277517 | +0.0138191818 | 161253% slower
Aura v2 - last route (1000 routes) | 989 | 0.0138914190 | +0.0138828491 | 161996% slower

## First route matching
This benchmark tests how quickly each router can match the first route. 1,000 routes each with 9 arguments.

This benchmark consists of 5 tests. Each test is executed 1,000 times, the results pruned, and then averaged. Values that fall outside of 3 standard deviations of the mean are discarded.

Test Name | Results | Time | + Interval | Change
--------- | ------- | ---- | ---------- | ------
FastRoute - first route | 999 | 0.0000016928 | +0.0000000000 | baseline
Pux PHP - first route | 999 | 0.0000017381 | +0.0000000453 | 3% slower
Symfony4 - first route | 997 | 0.0000029818 | +0.0000012890 | 76% slower
Aura v2 - first route | 977 | 0.0000376436 | +0.0000359508 | 2124% slower

Commits
-------

ee8b201 [Routing] Handle very large set of dynamic routes
@nicolas-grekas
Copy link
Member Author

I just published “Making Symfony’s Router 77.7x faster — 1/2”
https://twitter.com/nicolasgrekas/status/964513325815074817

Tobion added a commit that referenced this pull request Feb 24, 2018
…n possible (nicolas-grekas)

This PR was merged into the 4.1-dev branch.

Discussion
----------

[Routing] Redirect from trailing slash to no-slash when possible

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #26207
| License       | MIT
| Doc PR        | -

Implemented as suggest by @Tobion in #26059 (comment)

When a route for `/foo` exists but the request is for `/foo/`, we now redirect.
(this complements the flipped side redirection, which already exists.)

Commits
-------

69a4e94 [Routing] Redirect from trailing slash to no-slash when possible
@fabpot fabpot mentioned this pull request May 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.