-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
[Routing] Handle very large set of dynamic routes #26169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
$code .= "\n .'{$rx}',"; | ||
$state->regex .= $rx; | ||
|
||
// if the regex is too large, throw a signaling exception to recompute with smaller chunk size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious... why not have (reasonably large) fixed limit instead? is it variable in case of PCRE? It could avoid any recomputing needed no?
edit: wait this is compile info.. right? It gives the best chunk limit possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I have absolutely no idea what would that mean in practice, and the current logic saves me from answering this question :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The large fixture is a result of this choice i guess.. in theory that could break tests in different envs.. 🤔 should it be mocked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we should care about such concerns.
…-grekas) This PR was merged into the 4.1-dev branch. Discussion ---------- [Routing] Handle very large set of dynamic routes | Q | A | ------------- | --- | Branch? | master | Bug fix? | yes | New feature? | no | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | - | License | MIT | Doc PR | - Follow up of #26059; allows handling very long lists of dynamic routes. Allows running https://github.com/tyler-sommer/php-router-benchmark seamlessly. BTW, here are the result of running this benchmark: R3 extension is not loaded. Skipping initialization for "Worst-case matching" test using R3. R3 extension is not loaded. Skipping initialization for "First route matching" test using R3. ## Worst-case matching This benchmark matches the last route and unknown route. It generates a randomly prefixed and suffixed route in an attempt to thwart any optimization. 1,000 routes each with 9 arguments. This benchmark consists of 10 tests. Each test is executed 1,000 times, the results pruned, and then averaged. Values that fall outside of 3 standard deviations of the mean are discarded. Test Name | Results | Time | + Interval | Change --------- | ------- | ---- | ---------- | ------ Symfony4 - unknown route (1000 routes) | 995 | 0.0000085699 | +0.0000000000 | baseline Symfony4 - last route (1000 routes) | 999 | 0.0000086754 | +0.0000001055 | 1% slower FastRoute - unknown route (1000 routes) | 980 | 0.0000305154 | +0.0000219455 | 256% slower FastRoute - last route (1000 routes) | 999 | 0.0000529922 | +0.0000444223 | 518% slower Pux PHP - unknown route (1000 routes) | 972 | 0.0003162730 | +0.0003077032 | 3591% slower Pux PHP - last route (1000 routes) | 999 | 0.0004376847 | +0.0004291148 | 5007% slower Aura v2 - unknown route (1000 routes) | 976 | 0.0138277517 | +0.0138191818 | 161253% slower Aura v2 - last route (1000 routes) | 989 | 0.0138914190 | +0.0138828491 | 161996% slower ## First route matching This benchmark tests how quickly each router can match the first route. 1,000 routes each with 9 arguments. This benchmark consists of 5 tests. Each test is executed 1,000 times, the results pruned, and then averaged. Values that fall outside of 3 standard deviations of the mean are discarded. Test Name | Results | Time | + Interval | Change --------- | ------- | ---- | ---------- | ------ FastRoute - first route | 999 | 0.0000016928 | +0.0000000000 | baseline Pux PHP - first route | 999 | 0.0000017381 | +0.0000000453 | 3% slower Symfony4 - first route | 997 | 0.0000029818 | +0.0000012890 | 76% slower Aura v2 - first route | 977 | 0.0000376436 | +0.0000359508 | 2124% slower Commits ------- ee8b201 [Routing] Handle very large set of dynamic routes
@nicolas-grekas: you can't use it directly as it's not PHP, but you might want to look at the prefix matching approach used by https://github.com/hhvm/hack-router ; our benchmarks put it at ~ 5x faster than fast-route's approach, which AIUI is what you've been following here:
In short, what we do is:
This means that each common constant part of the URL can be looked up in constant time - but it will be linear in terms of number of parts, and where regexes are needed, that part will have usual regex complexity. e.g. if we have /foo/bar => BarController, /foo/baz => BazController, and /herp/derp => DerpController, we'll make:
We'll then look up If we have |
@fredemmott happy to see you here :) That's interesting. I looked at the code but didn't run it. From what I understand, I'm not sure this would provide additional performances, because in the end there is still a regex call, and also because the regex we use embeds same-prefix tree optimizations. I also fear this approach is not compatible with our in-order logic. I'd be happy to be wrong though :) |
So, there's no regexp call for static routes - only when there's a variable. We're also not using the regexp for any static prefixes before a variable. The main advantage is that while the regexp is checking for common prefixes, we're looking them up with array key access; for large maps, this can be much faster. This is definitely not representative of real world performance, but illustrates the difference: <?php
// generate a bunch of /foo/{id} routes, then look them up
const LENGTH = 8;
const MAP_SIZE = 2500;
const LOOKUPS = 10000;
$random_parts = array_map(
function() { return bin2hex(random_bytes(intdiv(LENGTH, 2))); },
range(1, MAP_SIZE)
);
$prefix_map = [];
foreach ($random_parts as $part) {
$route = '/'.$part.'/';
$prefix_map[$route] = $part;
}
$regexp = '/^\/('.implode($random_parts, '|').')\/\d+$/';
$requests = array_map(
function() use ($random_parts) {
return [
$random_parts[random_int(0, count($random_parts) - 1)],
random_int(0, PHP_INT_MAX)
];
},
range(1, LOOKUPS)
);
// Optimize the requests for each map kind
$regexp_requests = array_map(
function($request) {
list($route, $id) = $request;
return '/'.$route.'/'.$id;
},
$requests
);
$prefix_requests = array_map(
function($request) {
list($route, $id) = $request;
return ['/'.$route.'/', (string) $id];
},
$requests
);
$regexp_results = [];
$prefix_results = [];
$start = microtime(true);
foreach ($regexp_requests as $request) {
$matches = [];
if (preg_match($regexp, $request, $matches) !== 1) {
throw new Exception('failed to match regexp');
}
$regexp_results[] = [$matches[1], $matches[2]];
}
$regexp_time = microtime(true) - $start;
$start = microtime(true);
foreach ($prefix_requests as list($prefix, $id)) {
$route = $prefix_map[$prefix] ?? null;
if ($route === null) {
throw new Exception('failed to match prefix');
}
$matches = [];
if (preg_match('/^\d+$/', $id, $matches) !== 1) {
throw new Exception('failed to match id regexp');
}
$prefix_results[] = [$route, $matches[1]];
}
$prefix_time = microtime(true) - $start;
printf(
"Regexp time:\t%fs\nPrefix time:\t%fs\nEqual: %s\n",
$regexp_time,
$prefix_time,
$prefix_results == $regexp_results ? 'yes' : 'no'
);
|
@fredemmott it would be great if you could run the router we have on master and compare it to yours! |
As for order: we have some hybrid approaches that preserve it, but not publicly. For hack-router, we:
This means that if your map is
We wont' preserve order: |
Sorry, not going to be able to do that any time soon; just tried, and the 'urgent' PHP7 compatibility list for HHVM just got a fair bit larger :/ In particular: |
Have fun, and thanks for the new ideas :) |
@fredemmott FYI, here is the kind of optims I'm was referring to: #26208. |
@fredemmott I wonder how your approach holds up against routes that have multiple variables for example |
Follow up of #26059; allows handling very long lists of dynamic routes.
Allows running https://github.com/tyler-sommer/php-router-benchmark seamlessly.
BTW, here are the result of running this benchmark:
R3 extension is not loaded. Skipping initialization for "Worst-case matching" test using R3.
R3 extension is not loaded. Skipping initialization for "First route matching" test using R3.
Worst-case matching
This benchmark matches the last route and unknown route. It generates a randomly prefixed and suffixed route in an attempt to thwart any optimization. 1,000 routes each with 9 arguments.
This benchmark consists of 10 tests. Each test is executed 1,000 times, the results pruned, and then averaged. Values that fall outside of 3 standard deviations of the mean are discarded.
First route matching
This benchmark tests how quickly each router can match the first route. 1,000 routes each with 9 arguments.
This benchmark consists of 5 tests. Each test is executed 1,000 times, the results pruned, and then averaged. Values that fall outside of 3 standard deviations of the mean are discarded.